IHC Marker Clustering: Quick Reference Card for Pathologists

Print this page and keep it at your desk!


30-Second Decision Guide

┌──────────────────────────────────────────────────────────┐
│ What type of IHC data do I have?                         │
└──────────────────────────────────────────────────────────┘
           │
           ├─→ Binary only (pos/neg)?
           │   └─→ Use CHI-SQUARED ⭐
           │
           ├─→ Continuous only (H-scores, %)?
           │   └─→ Use EUCLIDEAN ⭐
           │
           └─→ Mix of both?
               └─→ Use MIXED ⭐
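
The decision guide above amounts to a three-way lookup. The sketch below is illustrative only: `choose_metric` and the type labels `"binary"` and `"continuous"` are hypothetical names, not part of the jamovi module.

```python
def choose_metric(marker_types):
    """Return the card's suggested distance metric for a panel.

    marker_types: iterable of strings, each "binary" or "continuous"
    (hypothetical labels used only for this sketch).
    """
    kinds = set(marker_types)
    if not kinds:
        raise ValueError("no markers supplied")
    unknown = kinds - {"binary", "continuous"}
    if unknown:
        raise ValueError(f"unknown marker type(s): {sorted(unknown)}")
    if kinds == {"binary"}:
        return "chi-squared"   # pos/neg markers only
    if kinds == {"continuous"}:
        return "euclidean"     # H-scores or percentages only
    return "mixed"             # both kinds present


# Breast panel: ER, PR, HER2 scored pos/neg plus Ki67 as a percentage
print(choose_metric(["binary", "binary", "binary", "continuous"]))
```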

Distance Metrics at a Glance

Metric        Best For                When to Use                         Example Panel
──────────────────────────────────────────────────────────────────────────────────────────────
Chi-squared   Binary/ordinal          Default choice for pos/neg markers  TTF1, p40, CK7, CK20
Jaccard       Sparse binary           Many double-negatives               PAX8, RCC, CD10 (renal)
Euclidean     Continuous              H-scores, % positive                PSA, NKX3.1, Ki67
Manhattan     Continuous w/ outliers  Robust to extreme values            Ki67 with outliers
Mixed         Binary + Continuous     Automatic handling                  ER, PR, HER2, Ki67%
Mutual Info   Non-linear              Complex relationships               p53 vs Ki67 grading
Hamming       Ordinal                 Intensity scoring                   0/1+/2+/3+ markers
Cramér’s V    Normalized              Different table sizes               Any categorical
Correlation   Pattern                 Co-variation focus                  Related pathways

Interpreting Distance Values

Distance    Meaning                 Action
────────────────────────────────────────────────
0.0 - 0.2   Nearly identical       ⚠️ One marker likely redundant
0.2 - 0.4   Very similar           Consider clinical context
0.4 - 0.6   Moderately different   Probably keep both
0.6 - 0.8   Different              Definitely keep both
0.8 - 1.0   Completely distinct    Independent information
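
The table above can be turned into a small lookup for batch review of a distance matrix. This is a sketch, not the module's logic: `interpret_distance` is a hypothetical helper, and because the printed bands share their endpoints, it assumes a boundary value (for example exactly 0.2) belongs to the higher band.

```python
def interpret_distance(d):
    """Map a marker-marker distance in [0, 1] to the card's meaning and action."""
    if not 0.0 <= d <= 1.0:
        raise ValueError("distance must lie between 0 and 1")
    bands = [
        (0.2, "Nearly identical", "One marker likely redundant"),
        (0.4, "Very similar", "Consider clinical context"),
        (0.6, "Moderately different", "Probably keep both"),
        (0.8, "Different", "Definitely keep both"),
    ]
    for upper, meaning, action in bands:
        if d < upper:  # assumption: upper edge goes to the next band
            return meaning, action
    return "Completely distinct", "Independent information"


print(interpret_distance(0.15))
```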

Reading the Dendrogram

        CK7 ──┐         ← Low merge height
        MUC6──┴──┐      ← Markers are similar
                 │
        MUC5AC───┴─┐    ← Medium height
                   ├─── ← High merge height
        CDX2───────┤    ← Markers are different
        MUC2───────┘

Rules:
  • Low height = markers co-express (may be redundant)
  • High height = markers independent (keep both)
  • Red line = significance threshold
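
To see where merge heights come from, the sketch below builds a dendrogram's merge sequence by hand. Everything in it is an assumption for illustration: the pairwise distances are made up, and average linkage is used because it is a common default, not because this card states which linkage the module applies.

```python
from itertools import combinations

# Hypothetical pairwise marker distances (0 = identical, 1 = unrelated).
PAIRS = {
    ("CK7", "MUC6"): 0.15, ("CK7", "MUC5AC"): 0.40, ("MUC6", "MUC5AC"): 0.50,
    ("CDX2", "MUC2"): 0.20, ("CK7", "CDX2"): 0.90, ("CK7", "MUC2"): 0.85,
    ("MUC6", "CDX2"): 0.90, ("MUC6", "MUC2"): 0.80,
    ("MUC5AC", "CDX2"): 0.95, ("MUC5AC", "MUC2"): 0.90,
}
DIST = {frozenset(pair): value for pair, value in PAIRS.items()}


def average_linkage(names, dist):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, height)."""
    clusters = [(name,) for name in names]

    def between(a, b):
        # Average of all pairwise distances between members of a and b
        return sum(dist[frozenset((x, y))] for x in a for y in b) / (len(a) * len(b))

    merges = []
    while len(clusters) > 1:
        # Merge the two closest clusters; their distance is the merge height
        a, b = min(combinations(clusters, 2), key=lambda pair: between(*pair))
        merges.append((a, b, round(between(a, b), 3)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


merges = average_linkage(["CK7", "MUC6", "MUC5AC", "CDX2", "MUC2"], DIST)
for a, b, height in merges:
    print(f"{'+'.join(a)} joins {'+'.join(b)} at height {height}")
```

With these made-up distances, CK7 and MUC6 merge first at a low height, while the gastric-type and intestinal-type groups join last at a height near 0.88.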


Common Clinical Scenarios

Lung (Adeno vs Squamous)

Panel: TTF1, Napsin A, p40, CK5/6
Distance Metric: Chi-squared ⭐

Expected:
├─ TTF1 + Napsin A cluster (distance ~0.2)
└─ p40 + CK5/6 cluster (distance ~0.2)

Optimization:
✅ Keep: TTF1 + p40 (one from each cluster)
❌ Drop: Napsin A or CK5/6 (redundant)

Breast (Molecular Subtype)

Panel: ER, PR, HER2, Ki67%
Distance Metric: Mixed ⭐

Expected:
├─ ER + PR moderate clustering (distance ~0.3)
└─ HER2, Ki67 independent

Optimization:
✅ Keep all (prognostic value despite ER-PR redundancy)

GI (Site of Origin)

Panel: CK7, CK20, CDX2, SATB2
Distance Metric: Chi-squared or Jaccard

Expected:
├─ CDX2 + SATB2 cluster (distance ~0.1)
└─ CK7 independent

Optimization:
✅ Keep: CK7 + CDX2
❌ Drop: SATB2 (redundant with CDX2)

Renal (ccRCC vs pRCC)

Panel: PAX8, RCC, CD10, CK7
Distance Metric: Jaccard ⭐ (sparse positivity)

Expected:
├─ PAX8 + RCC + CD10 cluster (distance ~0.1)
└─ CK7 separate (discriminates subtypes)

Optimization:
✅ Keep: PAX8 + CK7
❌ Drop: RCC marker, CD10 (redundant)

Melanoma (Confirmation)

Panel: S100, SOX10, Melan-A, HMB45
Distance Metric: Hamming or Chi-squared

Expected:
├─ S100 + SOX10 cluster (nuclear, distance ~0.4)
└─ Melan-A + HMB45 separate (cytoplasmic)

Optimization:
✅ Keep: S100 + HMB45 (different patterns)
⚠️ Add SOX10 if S100 equivocal

Statistical Significance

P-value Interpretation

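P-value     Evidence       Interpretation
────────────────────────────────────────────
< 0.001     Very strong    Association very unlikely to be chance
< 0.01      Strong         Association unlikely to be chance
< 0.05      Moderate       Conventionally significant
≥ 0.05      Weak or none   No significant association detected

Note: a p-value measures evidence against independence, not how strongly two markers co-express. Use an effect size such as Cramér’s V to judge strength.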
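
For two binary markers, the p-value comes from a chi-squared test on the 2×2 table of pos/neg counts. The sketch below uses the Pearson statistic without continuity correction, which is an assumption; the module may apply a correction or a different test. `chi2_2x2` is a hypothetical helper name.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test for the 2x2 table [[a, b], [c, d]].

    a = both positive, b = first positive only,
    c = second positive only, d = both negative.
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a marker is constant; the test is undefined")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Upper-tail probability of a chi-squared variable with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up counts: 100 cases, two markers that mostly agree
stat, p = chi2_2x2(40, 10, 10, 40)
print(f"chi-squared = {stat:.1f}, p = {p:.2g}")
```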

Cramér’s V Effect Size

For 2×2 Tables (Binary Markers):

V           Interpretation     Clinical Meaning
────────────────────────────────────────────────────
0.0 - 0.1   Negligible        Independent markers
0.1 - 0.3   Weak              Slight relationship
0.3 - 0.5   Moderate          Notable co-expression
0.5 - 0.7   Strong            Often co-express
0.7 - 1.0   Very strong       Nearly always together
1.0         Perfect           Always co-express
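
Cramér’s V rescales the chi-squared statistic so that it runs from 0 to 1 regardless of sample size: V = sqrt(χ² / (n × (k − 1))), where k is the smaller of the number of rows and columns. A minimal sketch, with made-up counts and a hypothetical function name:

```python
import math


def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    k = min(len(row_totals), len(col_totals)) - 1
    if k < 1 or 0 in row_totals or 0 in col_totals:
        raise ValueError("need at least a 2x2 table with no empty row or column")
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))


# Rows: marker A pos/neg; columns: marker B pos/neg (hypothetical counts)
print(round(cramers_v([[40, 10], [10, 40]]), 2))
```

For a 2×2 table, k − 1 equals 1, so V reduces to sqrt(χ² / n).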

Jaccard Index (Co-positivity)

Jaccard     Interpretation     Action
────────────────────────────────────────────
> 0.8       Very high overlap  Consider dropping one
0.6 - 0.8   High overlap       Evaluate clinical need
0.4 - 0.6   Moderate overlap   Keep both
< 0.4       Low overlap        Independent markers
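
The Jaccard index counts only cases where at least one marker is positive, which is why it suits panels with many double-negatives. The sketch below uses made-up results coded 1 = positive, 0 = negative; `jaccard_index` is a hypothetical helper. The corresponding Jaccard distance is 1 minus the index.

```python
def jaccard_index(x, y):
    """Co-positivity of two binary markers: both positive / either positive."""
    if len(x) != len(y):
        raise ValueError("markers must be scored on the same cases")
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    if either == 0:
        raise ValueError("both markers are negative in every case")
    return both / either


# Hypothetical results for eight cases; double-negative cases are ignored
pax8 = [1, 1, 1, 0, 0, 1, 0, 0]
cd10 = [1, 1, 0, 0, 0, 1, 0, 0]
index = jaccard_index(pax8, cd10)
print(f"Jaccard index = {index:.2f}, distance = {1 - index:.2f}")
```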

Warning Signs (Red Flags)

⚠️ Don’t Drop Markers If:

  1. Established in Guidelines
    • Example: ER + PR in breast cancer
    • Even if clustered, both have clinical utility
  2. Different Diagnostic Contexts
    • Example: CK7/CK20 pattern changes by tumor type
    • Lower GI: CK7-/CK20+ vs Upper GI: CK7+/CK20+
  3. Prognostic Significance
    • Example: ER+/PR- = worse prognosis
    • Loss of PR is clinically meaningful
  4. Different Sensitivities
    • Example: S100 (sensitive) vs HMB45 (specific)
    • Both needed for melanoma confirmation
  5. Small Sample Size
    • Need n > 50 for reliable clustering
    • n < 30 = results may be unstable

Cost-Benefit Calculator

Example: Eliminating 2 redundant markers

Marker cost: $60 per antibody
Cases/year: 500
Markers eliminated: 2

Annual savings = $60 × 2 × 500 = $60,000

BUT check:
✓ Does diagnosis change in any cases?
✓ Is validation data convincing (≥95% concordance)?
✓ Do guidelines allow this modification?
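
The arithmetic above generalizes to any panel change. A minimal sketch, assuming every case would otherwise receive every eliminated marker; `annual_savings` is a hypothetical helper and the figures are the card's example values.

```python
def annual_savings(cost_per_marker, markers_eliminated, cases_per_year):
    """Reagent savings per year, assuming each case would have had every marker."""
    return cost_per_marker * markers_eliminated * cases_per_year


savings = annual_savings(cost_per_marker=60, markers_eliminated=2, cases_per_year=500)
print(f"Annual savings: ${savings:,}")
```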

When to Consult Biostatistics

  • Sample size < 50 cases
  • Unexpected clustering patterns
  • Proposing major panel changes
  • Publishing your findings
  • Implementing across institution

Workflow Checklist

□ 1. Collect at least 50 cases (ideally 100 or more) with complete IHC data
□ 2. Choose distance metric (Chi-squared/Euclidean/Mixed)
□ 3. Run clustering analysis in jamovi
□ 4. Review dendrogram for marker groups
□ 5. Check p-values and effect sizes
□ 6. Identify distance < 0.3 (potential redundancy)
□ 7. Review clinical literature for validation
□ 8. Calculate cost savings
□ 9. Pilot optimized panel on 20-30 new cases
□ 10. Require ≥95% concordance before adoption
□ 11. Document protocol and share with colleagues
□ 12. Monitor performance and adjust annually
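
Steps 9 and 10 reduce to comparing the diagnosis reached with the full panel against the diagnosis reached with the optimized panel, case by case. The sketch below is one way to compute that; the function name and the diagnosis labels are hypothetical.

```python
def concordance(full_panel_dx, reduced_panel_dx):
    """Fraction of pilot cases where both panels give the same diagnosis."""
    if len(full_panel_dx) != len(reduced_panel_dx) or not full_panel_dx:
        raise ValueError("need the same non-empty set of cases for both panels")
    agree = sum(a == b for a, b in zip(full_panel_dx, reduced_panel_dx))
    return agree / len(full_panel_dx)


# Hypothetical pilot of 30 cases with one discordant diagnosis
full = ["adeno"] * 18 + ["squamous"] * 12
reduced = ["adeno"] * 17 + ["squamous"] * 13
rate = concordance(full, reduced)
print(f"Concordance: {rate:.1%} -> {'adopt' if rate >= 0.95 else 'do not adopt'}")
```

In a pilot of 20 to 30 cases, two discordant diagnoses are enough to fall below the 95% threshold, so each discordant case deserves individual review.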

Example Interpretations

Good Clustering Result ✅

Dendrogram shows:
├─ Clear separation between diagnostic groups
├─ Markers within groups have distance < 0.3
└─ Between groups distance > 0.7

Interpretation: Well-designed panel
Action: Consider eliminating within-group redundancy

Problematic Clustering Result ⚠️

Dendrogram shows:
├─ All markers cluster together (distance 0.1-0.2)
└─ No clear groups

Interpretation: Over-redundant panel OR technical issue
Action:
1. Check if markers actually worked (not all negative)
2. Review case selection (diverse enough?)
3. Consider adding markers from different pathways

Unexpected Clustering Result 🔍

Dendrogram shows:
├─ TTF1 clusters with p40 (should be opposite!)
└─ Distance 0.15 (very close)

Interpretation: Data quality issue
Action:
1. Check for column swap in data entry
2. Verify case diagnoses
3. Review staining patterns (technical failure?)

Jamovi Settings Quick Guide

In ihccluster analysis:

1. Select Markers:
   ├─ Categorical IHC Markers: Binary/ordinal
   └─ Continuous IHC Markers: H-scores, %

2. Enable Marker Clustering:
   ☑ Perform Marker-Level Clustering

3. Choose Distance:
   ▼ Marker Clustering Distance
       Your choice from table above

4. Enable Tests:
   ☑ Test Marker Associations
   ☑ Auto-detect Marker Groups

5. Review Outputs:
   ├─ Marker Clustering Dendrogram (plot)
   ├─ Marker-Marker Association Tests (table)
   ├─ Marker Clustering Sequence (table)
   └─ Identified Marker Groups (table)

Common Questions

Q: Can I cluster both patients AND markers?
A: Yes! Patient clustering (existing) groups cases by IHC profile. Marker clustering (new) groups markers by co-expression. The two provide complementary information.

Q: Why does my dendrogram look different from my colleagues’?
A: You are probably using different distance metrics or different case cohorts. Use the same metric and validate with independent datasets.

Q: My markers cluster, but the literature says they’re independent. Why?
A: Possible reasons: (1) Different tumor types in your cohort, (2) Both markers are often negative (double-negative clustering), (3) Selection bias in your cases. Always validate against the literature.

Q: Can I publish panel optimization based on clustering?
A: Yes, but you need: (1) Sufficient sample size (n > 100), (2) A validation cohort, (3) A performance comparison with the original panel, (4) Clinical correlation.

Q: What if clustering contradicts CAP/ASCO guidelines?
A: Follow the guidelines! Clustering is exploratory. Never eliminate required markers based on clustering alone.


Emergency Troubleshooting

“All markers show distance ~1.0”

Cause: Markers are completely independent (each shows a different expression pattern)
Check: Is this expected? (e.g., lineage markers)

“All markers show distance ~0.1”

Cause: All redundant OR technical failure (all negative)
Check: Staining quality, case diversity

“Dendrogram doesn’t make biological sense”

Cause: Data entry error, technical issues, unusual cohort
Check: Raw data, swapped columns, whether the antibodies worked

“Chi-squared test fails”

Cause: Small sample size, sparse cells, constant values
Solution: Use Jaccard or increase sample size
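
The failure causes listed above can be screened for before running the test. The sketch below flags constant markers and sparse cells for a pair of binary markers coded 1/0. The threshold of 5 for the smallest expected count is the usual rule of thumb, not a documented setting of the module, and `chi2_problems` is a hypothetical helper.

```python
def chi2_problems(x, y, min_expected=5):
    """List reasons a chi-squared test on two 0/1 markers may be unreliable."""
    if len(x) != len(y) or not x:
        raise ValueError("markers must be scored on the same non-empty set of cases")
    n = len(x)
    pos_x, pos_y = sum(x), sum(y)
    if pos_x in (0, n) or pos_y in (0, n):
        return ["constant marker: every case has the same result"]
    # Expected count in each cell of the 2x2 table under independence
    expected = [rx * cy / n
                for rx in (pos_x, n - pos_x)
                for cy in (pos_y, n - pos_y)]
    if min(expected) < min_expected:
        return [f"sparse cells: smallest expected count is {min(expected):.1f}"]
    return []


# Hypothetical 20-case cohort in which the first marker is rarely positive
rare = [1, 1] + [0] * 18
common = [1, 0] * 10
print(chi2_problems(rare, common))
```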

Key References (Quick)

  1. Greenacre M (2017). Correspondence Analysis in Practice
    • The definitive reference for categorical data clustering
  2. Olsen LR et al. (2006). Modern Pathology 19:1238-1251
    • IHC panel optimization methodology
  3. Agresti A (2013). Categorical Data Analysis
    • Statistical foundations (Cramér’s V, chi-squared)

Support

For Technical Issues:
  • Check jamovi/ClinicoPath documentation
  • Consult biostatistics support

For Clinical Questions:
  • Review current literature
  • Discuss with pathology colleagues
  • Follow institutional guidelines


Version 1.0 | Updated: 2025-01-26



For detailed explanations and worked examples, see:
  • Full Guide: ihccluster-marker-distance-pathologist-guide.md
  • Technical Manual: MARKER_CLUSTERING_DISTANCES.md