IHC Marker Clustering: Quick Reference Card for Pathologists
Source: vignettes/ihccluster-quick-reference-card.Rmd
Print this page and keep it at your desk!
30-Second Decision Guide
┌──────────────────────────────────────────────────────────┐
│ What type of IHC data do I have? │
└──────────────────────────────────────────────────────────┘
│
├─→ Binary only (pos/neg)?
│ └─→ Use CHI-SQUARED ⭐
│
├─→ Continuous only (H-scores, %)?
│ └─→ Use EUCLIDEAN ⭐
│
└─→ Mix of both?
└─→ Use MIXED ⭐
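The decision guide above reduces to a check on which data types are present in the panel. A minimal Python sketch follows; the function name `suggest_metric` and the type labels `"binary"` and `"continuous"` are illustrative assumptions and are not part of the jamovi module.

```python
def suggest_metric(marker_types):
    """Return the starred default metric for a panel.

    marker_types maps marker name -> "binary" or "continuous"
    (hypothetical labels chosen for this sketch).
    """
    kinds = set(marker_types.values())
    if not kinds:
        raise ValueError("panel is empty")
    unknown = kinds - {"binary", "continuous"}
    if unknown:
        raise ValueError(f"unknown marker type(s): {sorted(unknown)}")
    if kinds == {"binary"}:
        return "chi-squared"
    if kinds == {"continuous"}:
        return "euclidean"
    return "mixed"  # both binary and continuous markers present


print(suggest_metric({"TTF1": "binary", "p40": "binary"}))
print(suggest_metric({"ER": "binary", "PR": "binary", "Ki67%": "continuous"}))
```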
Distance Metrics at a Glance
| Metric | Best For | When to Use | Example Panel |
|---|---|---|---|
| Chi-squared ⭐ | Binary/ordinal | Default choice for pos/neg markers | TTF1, p40, CK7, CK20 |
| Jaccard | Sparse binary | Many double-negatives | PAX8, RCC, CD10 (renal) |
| Euclidean ⭐ | Continuous | H-scores, % positive | PSA, NKX3.1, Ki67 |
| Manhattan | Continuous w/ outliers | Robust to extreme values | Ki67 with outliers |
| Mixed ⭐ | Binary + Continuous | Automatic handling | ER, PR, HER2, Ki67% |
| Mutual Info | Non-linear | Complex relationships | p53 vs Ki67 grading |
| Hamming | Ordinal | Intensity scoring | 0/1+/2+/3+ markers |
| Cramér’s V | Normalized | Different table sizes | Any categorical |
| Correlation | Pattern | Co-variation focus | Related pathways |
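To see why Jaccard is suggested when double-negatives dominate, the sketch below compares it with simple matching distance (what Hamming distance reduces to on 0/1 data). The formulas are the textbook definitions; whether the module computes them exactly this way is an assumption, and the ten-case cohort is invented for illustration.

```python
def jaccard_distance(a, b):
    """1 - (cases positive for both) / (cases positive for either).

    Cases negative for both markers do not enter the calculation.
    """
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1 - both / either


def simple_matching_distance(a, b):
    """Fraction of cases where the two markers disagree."""
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


# Hypothetical cohort: 1 = positive, 0 = negative; 7 of 10 cases are double-negative.
marker_a = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
marker_b = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

print(round(jaccard_distance(marker_a, marker_b), 2))   # 0.67
print(simple_matching_distance(marker_a, marker_b))     # 0.2
```

The two markers agree on only one of the three cases where either is positive, yet simple matching reports a small distance because the shared negatives count as agreement.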
Interpreting Distance Values
Distance Meaning Action
────────────────────────────────────────────────
0.0 - 0.2 Nearly identical ⚠️ One marker likely redundant
0.2 - 0.4 Very similar Consider clinical context
0.4 - 0.6 Moderately different Probably keep both
0.6 - 0.8 Different Definitely keep both
0.8 - 1.0 Completely distinct Independent information
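The bands above can be applied programmatically when screening many marker pairs. In this sketch the function name is hypothetical, and the treatment of boundary values (a distance of exactly 0.2 falls into the higher band) is an assumption, since the table does not say which band owns the boundary.

```python
def interpret_distance(d):
    """Map a normalized distance in [0, 1] to (meaning, action) from the table."""
    if not 0.0 <= d <= 1.0:
        raise ValueError("distance must be within [0, 1]")
    bands = [
        (0.2, "Nearly identical", "One marker likely redundant"),
        (0.4, "Very similar", "Consider clinical context"),
        (0.6, "Moderately different", "Probably keep both"),
        (0.8, "Different", "Definitely keep both"),
        (1.0, "Completely distinct", "Independent information"),
    ]
    for upper, meaning, action in bands:
        # Assumption: the upper bound is exclusive, except for the last band.
        if d < upper or upper == 1.0:
            return meaning, action


print(interpret_distance(0.15))
print(interpret_distance(0.72))
```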
Reading the Dendrogram
CK7 ──┐ ← Low merge height
MUC6──┴──┐ ← Markers are similar
│
MUC5AC───┴─┐ ← Medium height
├─── ← High merge height
CDX2───────┤ ← Markers are different
MUC2───────┘
Rules:
- Low height = markers co-express (may be redundant)
- High height = markers independent (keep both)
- Red line = significance threshold
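Merge height is simply the distance at which two groups are joined. The sketch below runs a naive average-linkage clustering on invented pairwise distances for the five markers in the diagram. The linkage method used by the module is not stated here, so average linkage is an assumption, and the numbers are illustrative rather than measured.

```python
from itertools import combinations


def average_linkage(names, dist):
    """Naive agglomerative clustering; returns merges as (group_a, group_b, height)."""
    clusters = [(n,) for n in names]
    merges = []

    def d(a, b):
        # Average pairwise distance between the members of two groups.
        return sum(dist[frozenset((x, y))] for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: d(*pair))
        merges.append((a, b, round(d(a, b), 3)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical marker-to-marker distances (0 = identical, 1 = unrelated).
pairs = {
    ("CK7", "MUC6"): 0.15, ("CK7", "MUC5AC"): 0.40, ("MUC6", "MUC5AC"): 0.50,
    ("CDX2", "MUC2"): 0.30,
    ("CK7", "CDX2"): 0.90, ("MUC6", "CDX2"): 0.90, ("MUC5AC", "CDX2"): 0.90,
    ("CK7", "MUC2"): 0.80, ("MUC6", "MUC2"): 0.80, ("MUC5AC", "MUC2"): 0.80,
}
dist = {frozenset(k): v for k, v in pairs.items()}
merges = average_linkage(["CK7", "MUC6", "MUC5AC", "CDX2", "MUC2"], dist)

for group_a, group_b, height in merges:
    print(f"{'+'.join(group_a)} joins {'+'.join(group_b)} at height {height}")
```

With these numbers CK7 and MUC6 merge first at a low height, MUC5AC joins them at a medium height, and the intestinal markers only join the rest at a high height.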
Common Clinical Scenarios
Lung (Adeno vs Squamous)
Panel: TTF1, Napsin A, p40, CK5/6
Distance Metric: Chi-squared ⭐
Expected:
├─ TTF1 + Napsin A cluster (distance ~0.2)
└─ p40 + CK5/6 cluster (distance ~0.2)
Optimization:
✅ Keep: TTF1 + p40 (one from each cluster)
❌ Drop: Napsin A or CK5/6 (redundant)
Breast (Molecular Subtype)
Panel: ER, PR, HER2, Ki67%
Distance Metric: Mixed ⭐
Expected:
├─ ER + PR moderate clustering (distance ~0.3)
└─ HER2, Ki67 independent
Optimization:
✅ Keep all (prognostic value despite ER-PR redundancy)
GI (Site of Origin)
Panel: CK7, CK20, CDX2, SATB2
Distance Metric: Chi-squared or Jaccard
Expected:
├─ CDX2 + SATB2 cluster (distance ~0.1)
└─ CK7 independent
Optimization:
✅ Keep: CK7 + CDX2
❌ Drop: SATB2 (redundant with CDX2)
Statistical Significance
P-value Interpretation
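P-value    Confidence level   Interpretation
────────────────────────────────────────────
< 0.001    99.9%              Very strong evidence of association
< 0.01     99%                Strong evidence of association
< 0.05     95%                Moderate evidence of association
≥ 0.05     < 95%              No significant association detected
Note: the p-value measures evidence against independence, not the strength of the association; judge strength with the effect size (Cramér’s V).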
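For two binary markers the p-value comes from a chi-squared test on the 2×2 table of positive/negative calls. The sketch below uses the Pearson statistic without continuity correction and the closed-form tail probability for one degree of freedom; whether the module applies a correction or an exact test is not stated here, and the counts are hypothetical.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared (no continuity correction) for the table [[a, b], [c, d]].

    a = both positive, b = first only, c = second only, d = both negative.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column is empty; the test is undefined")
    return n * (a * d - b * c) ** 2 / denom


def p_value_1df(chi2):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical 100-case cohort.
stat = chi2_2x2(40, 10, 5, 45)
p = p_value_1df(stat)
print(f"chi-squared = {stat:.2f}, significant at 0.001: {p < 0.001}")
```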
Cramér’s V Effect Size
For 2×2 Tables (Binary Markers):
V Interpretation Clinical Meaning
────────────────────────────────────────────────────
0.0 - 0.1 Negligible Independent markers
0.1 - 0.3 Weak Slight relationship
0.3 - 0.5 Moderate Notable co-expression
0.5 - 0.7 Strong Often co-express
0.7 - 1.0 Very strong Nearly always together
1.0 Perfect Always co-express
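Cramér’s V rescales the chi-squared statistic by sample size and table dimensions, V = sqrt(χ² / (n × min(rows − 1, cols − 1))), so it can be compared across tables. A minimal sketch using that standard formula, with hypothetical counts:

```python
import math


def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected == 0:
                raise ValueError("empty row or column; V is undefined")
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))


# Rows: marker A positive/negative; columns: marker B positive/negative.
print(round(cramers_v([[40, 10], [5, 45]]), 2))   # 0.7
print(cramers_v([[50, 0], [0, 50]]))              # 1.0
```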
Warning Signs (Red Flags)
⚠️ Don’t Drop Markers If:
- Established in Guidelines
  - Example: ER + PR in breast cancer
  - Even if clustered, both have clinical utility
- Different Diagnostic Contexts
  - Example: CK7/CK20 pattern changes by tumor type
  - Lower GI: CK7-/CK20+ vs Upper GI: CK7+/CK20+
- Prognostic Significance
  - Example: ER+/PR- = worse prognosis
  - Loss of PR is clinically meaningful
- Different Sensitivities
  - Example: S100 (sensitive) vs HMB45 (specific)
  - Both needed for melanoma confirmation
- Small Sample Size
  - Need n > 50 for reliable clustering
  - n < 30 = results may be unstable
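These red flags can act as guards before any pair is even proposed for removal. The sketch below flags pairs under a distance threshold of 0.3 but never proposes a protected marker and returns nothing for small cohorts. The function name, the `protected` set and the example distances are hypothetical; which markers are protected is a clinical decision, not something the code can determine.

```python
def redundancy_candidates(distances, protected, n_cases, threshold=0.3, min_cases=50):
    """Return (marker_a, marker_b, distance, droppable) for pairs that look redundant.

    distances: dict mapping (marker_a, marker_b) -> distance in [0, 1].
    protected: markers that must be kept (guidelines, prognosis, context).
    """
    if n_cases < min_cases:
        return []  # too few cases for stable clustering; propose nothing
    flagged = []
    for (a, b), d in sorted(distances.items()):
        if d >= threshold:
            continue
        droppable = [m for m in (a, b) if m not in protected]
        if droppable:
            flagged.append((a, b, d, droppable))
    return flagged


# Hypothetical GI panel distances.
distances = {("CDX2", "SATB2"): 0.1, ("CK7", "CDX2"): 0.8, ("CK7", "CK20"): 0.25}
protected = {"CK7", "CK20"}  # assumed protected: their pattern changes by tumor type

print(redundancy_candidates(distances, protected, n_cases=120))
print(redundancy_candidates(distances, protected, n_cases=25))
```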
Cost-Benefit Calculator
Example: Eliminating 2 redundant markers
Marker cost: $60 per antibody
Cases/year: 500
Markers eliminated: 2
Annual savings = $60 × 2 × 500 = $60,000
BUT check:
✓ Does diagnosis change in any cases?
✓ Is validation data convincing (≥95% concordance)?
✓ Do guidelines allow this modification?
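The calculation and the three checks can be kept together so that savings are never quoted without the safety conditions. A minimal sketch; the function names are illustrative, and the 95% figure is the concordance threshold given in the check above.

```python
def annual_savings(cost_per_marker, markers_eliminated, cases_per_year):
    """Reagent savings per year from dropping markers on every case."""
    return cost_per_marker * markers_eliminated * cases_per_year


def safe_to_adopt(diagnosis_changed, concordance, guidelines_allow):
    """All three checks must pass before the savings count."""
    return (not diagnosis_changed) and concordance >= 0.95 and guidelines_allow


savings = annual_savings(cost_per_marker=60, markers_eliminated=2, cases_per_year=500)
print(f"${savings:,}")  # prints $60,000
print(safe_to_adopt(diagnosis_changed=False, concordance=0.97, guidelines_allow=True))
```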
When to Consult Biostatistics
- Sample size < 50 cases
- Unexpected clustering patterns
- Proposing major panel changes
- Publishing your findings
- Implementing across institution
Workflow Checklist
□ 1. Collect at least 50 cases (ideally 100 or more) with complete IHC data
□ 2. Choose distance metric (Chi-squared/Euclidean/Mixed)
□ 3. Run clustering analysis in jamovi
□ 4. Review dendrogram for marker groups
□ 5. Check p-values and effect sizes
□ 6. Identify distance < 0.3 (potential redundancy)
□ 7. Review clinical literature for validation
□ 8. Calculate cost savings
□ 9. Pilot optimized panel on 20-30 new cases
□ 10. Require ≥95% concordance before adoption
□ 11. Document protocol and share with colleagues
□ 12. Monitor performance and adjust annually
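Steps 9 and 10 come down to comparing the diagnosis reached with the full panel against the one reached with the reduced panel, case by case. A minimal sketch with an invented 25-case pilot; the diagnosis labels and function name are hypothetical.

```python
def concordance(full_panel_dx, reduced_panel_dx):
    """Fraction of cases where both panels lead to the same diagnosis."""
    if not full_panel_dx or len(full_panel_dx) != len(reduced_panel_dx):
        raise ValueError("need two non-empty diagnosis lists of equal length")
    agree = sum(a == b for a, b in zip(full_panel_dx, reduced_panel_dx))
    return agree / len(full_panel_dx)


# Hypothetical pilot: 25 cases, one discordant call with the reduced panel.
full = ["adeno"] * 15 + ["squamous"] * 10
reduced = ["adeno"] * 15 + ["squamous"] * 9 + ["adeno"]

rate = concordance(full, reduced)
print(f"{rate:.0%}", "adopt" if rate >= 0.95 else "do not adopt")  # 96% adopt
```

With only 20 to 30 pilot cases, two discordant calls are enough to fall below 95%, so each discordant case deserves individual review.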
Example Interpretations
Good Clustering Result ✅
Dendrogram shows:
├─ Clear separation between diagnostic groups
├─ Markers within groups have distance < 0.3
└─ Between groups distance > 0.7
Interpretation: Well-designed panel
Action: Consider eliminating within-group redundancy
Problematic Clustering Result ⚠️
Dendrogram shows:
├─ All markers cluster together (distance 0.1-0.2)
└─ No clear groups
Interpretation: Over-redundant panel OR technical issue
Action:
1. Check if markers actually worked (not all negative)
2. Review case selection (diverse enough?)
3. Consider adding markers from different pathways
Jamovi Settings Quick Guide
In ihccluster analysis:
1. Select Markers:
├─ Categorical IHC Markers: Binary/ordinal
└─ Continuous IHC Markers: H-scores, %
2. Enable Marker Clustering:
☑ Perform Marker-Level Clustering
3. Choose Distance:
▼ Marker Clustering Distance
Your choice from table above
4. Enable Tests:
☑ Test Marker Associations
☑ Auto-detect Marker Groups
5. Review Outputs:
├─ Marker Clustering Dendrogram (plot)
├─ Marker-Marker Association Tests (table)
├─ Marker Clustering Sequence (table)
└─ Identified Marker Groups (table)
Common Questions
Q: Can I cluster both patients AND markers?
A: Yes! Patient clustering (existing) groups cases by IHC profile. Marker clustering (new) groups markers by co-expression. The two provide complementary information.
Q: Why does my dendrogram look different from my colleagues’?
A: You are probably using different distance metrics or different case cohorts. Use the same metric and validate with independent datasets.
Q: My markers cluster, but the literature says they’re independent. Why?
A: Possible causes: (1) Different tumor types in your cohort, (2) Both often negative (double-negative clustering), (3) Selection bias in your cases. Always validate against the literature.
Q: Can I publish a panel optimization based on clustering?
A: Yes, but you need: (1) Sufficient sample size (n>100), (2) Validation cohort, (3) Performance comparison with original panel, (4) Clinical correlation.
Q: What if clustering contradicts CAP/ASCO guidelines?
A: Follow the guidelines! Clustering is exploratory. Never eliminate required markers based on clustering alone.
Emergency Troubleshooting
“All markers show distance ~1.0”
Cause: Markers are completely independent OR all different
Check: Is this expected? (e.g., lineage markers)
“All markers show distance ~0.1”
Cause: All redundant OR technical failure (all negative)
Check: Staining quality, case diversity
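A quick way to rule out the technical-failure explanation is to look for markers that never vary across the cohort, since a marker that is negative (or positive) in every case carries no clustering information. A minimal sketch with hypothetical 0/1 calls:

```python
def staining_flags(panel):
    """Return markers whose binary calls show no variation across cases.

    panel maps marker name -> list of 0/1 results, one per case.
    """
    flags = {}
    for marker, calls in panel.items():
        positives = sum(calls)
        if positives == 0:
            flags[marker] = "all negative"
        elif positives == len(calls):
            flags[marker] = "all positive"
    return flags


# Hypothetical six-case cohort.
panel = {
    "TTF1": [1, 0, 1, 1, 0, 0],
    "p40": [0, 0, 0, 0, 0, 0],   # possible failed stain or non-diverse cohort
    "CK7": [1, 1, 1, 1, 1, 1],
}
print(staining_flags(panel))
```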
Key References (Quick)
- Greenacre M (2017). Correspondence Analysis in Practice
  - The definitive reference for categorical data clustering
- Olsen LR et al. (2006). Modern Pathology 19:1238-1251
  - IHC panel optimization methodology
- Agresti A (2013). Categorical Data Analysis
  - Statistical foundations (Cramér’s V, chi-squared)
Support
For Technical Issues:
- Check jamovi/ClinicoPath documentation
- Consult biostatistics support
For Clinical Questions:
- Review current literature
- Discuss with pathology colleagues
- Follow institutional guidelines
Version 1.0 | Updated: 2025-01-26
For detailed explanations and worked examples, see:
- Full Guide: ihccluster-marker-distance-pathologist-guide.md
- Technical Manual: MARKER_CLUSTERING_DISTANCES.md