
Clusters cases based on immunohistochemistry (IHC) staining patterns.

Usage

ihccluster(
  data,
  catVars = NULL,
  contVars = NULL,
  caseId = NULL,
  spatialCompartment = NULL,
  performSpatialAnalysis = FALSE,
  spatialComparisonMode = "both",
  method = "pam",
  distanceMethod = "gower",
  linkageMethod = "ward",
  nClusters = 3,
  autoSelectK = TRUE,
  kRange = "medium",
  scaleContVars = TRUE,
  weights = "",
  handleMissing = "pairwise",
  consensusClustering = FALSE,
  nBootstrap = 100,
  seed = 42,
  showSilhouette = TRUE,
  showHeatmap = TRUE,
  heatmapScale = "row",
  showDendrogram = TRUE,
  showPCAPlot = TRUE,
  showBoxplots = TRUE,
  markerSummary = TRUE,
  clusterProfiles = TRUE,
  associationTests = TRUE,
  multipleTestingCorrection = "bonferroni",
  markerOptimization = FALSE,
  showMarkerCorrelation = FALSE,
  clusterQualityMetrics = TRUE,
  iterativeRefinement = FALSE,
  refinementIterations = 3,
  reproducibilityTest = FALSE,
  nSplits = 10,
  supervisedClustering = FALSE,
  supervisedVariable = NULL,
  calculateRatios = FALSE,
  ratioNumerator = NULL,
  ratioDenominator = NULL,
  ratioName = "Marker_Ratio",
  ratioClassification = FALSE,
  ratioLowCutoff = 1,
  ratioHighCutoff = 2,
  exportClusters = FALSE,
  knownDiagnosis = NULL,
  calculateDiagnosticMetrics = FALSE,
  identifyOptimalPanel = FALSE,
  panelSize = "pairs",
  flagOutliers = TRUE,
  outlierThreshold = 0.25,
  clinicalVars = NULL,
  survivalTime = NULL,
  survivalEvent = NULL,
  colorPalette = "default",
  fontSize = "medium",
  plotContrast = FALSE,
  showInterpretation = FALSE,
  showTechnicalNotes = FALSE
)

Arguments

data

The data as a data frame.

catVars

Binary (pos/neg) or ordinal (0/1/2/3) stain results

contVars

H-scores (0-300) or other continuous measures
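The H-score is conventionally computed from the percentage of tumor cells at each staining intensity, weighted 1 (weak), 2 (moderate), and 3 (strong). That weighting is why it ranges from 0 to 300. Below is a minimal sketch of that standard formula. The helper name `h_score` is hypothetical and is not part of ihccluster.

```r
# Hypothetical helper: H-score = 1 * (% weak) + 2 * (% moderate) + 3 * (% strong).
# Percentages refer to all tumor cells, so together they cannot exceed 100.
h_score <- function(pct_weak, pct_moderate, pct_strong) {
  total <- pct_weak + pct_moderate + pct_strong
  if (any(total > 100)) stop("percentages of stained cells cannot exceed 100")
  1 * pct_weak + 2 * pct_moderate + 3 * pct_strong
}

h_score(10, 20, 30)   # 10 + 40 + 90 = 140
h_score(0, 0, 100)    # maximum possible score, 300
```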

caseId

Case identifier for tracking

spatialCompartment

Groups cases by spatial location (e.g., Central/Invasive, Preinvasive/Invasive, Primary/Metastatic)

performSpatialAnalysis

Compare clustering patterns across spatial compartments

spatialComparisonMode

How to analyze spatial compartments

method

Clustering algorithm to use

distanceMethod

Distance metric for clustering

linkageMethod

Hierarchical clustering linkage method

nClusters

Number of clusters to form (default 3). For automatic selection by silhouette width, see autoSelectK.

autoSelectK

Use silhouette width to find optimal number of clusters

kRange

Range to test when auto-selecting k
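The silhouette criterion behind autoSelectK can be sketched with the `cluster` package. Fit PAM for each candidate k, then keep the k with the largest mean silhouette width. This illustrates the general technique on made-up data and is not ihccluster's code. The candidate range 2:6 is an assumption, because the values behind kRange = "medium" are not given here.

```r
library(cluster)

# Toy data: three well-separated groups of 10 cases measured on 2 markers.
set.seed(42)
x <- rbind(
  matrix(rnorm(20, mean = 0,  sd = 0.5), ncol = 2),
  matrix(rnorm(20, mean = 10, sd = 0.5), ncol = 2),
  matrix(rnorm(20, mean = 20, sd = 0.5), ncol = 2)
)

k_range <- 2:6   # assumed candidate range, not ihccluster's documented preset
avg_sil <- vapply(k_range, function(k) {
  fit <- pam(x, k = k)
  fit$silinfo$avg.width   # mean silhouette width of this partition
}, numeric(1))

best_k <- k_range[which.max(avg_sil)]
best_k
```

Splitting a true group or merging two groups both lower the mean silhouette width. The three-group structure therefore gives the largest value here.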

scaleContVars

Z-score continuous markers before clustering

weights

Comma-separated weights for markers (e.g., "1,1,2,1" for 4 markers)
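A weights string has to become one numeric weight per marker before it can be used. Here is a minimal parsing sketch using a hypothetical helper, `parse_weights`. Treating an empty string as equal weights is an assumption, based on the default weights = "".

```r
# Hypothetical helper: turn a string such as "1,1,2,1" into one weight per marker.
# An empty string is assumed to mean equal weights for all markers.
parse_weights <- function(weights, n_markers) {
  if (!nzchar(trimws(weights))) return(rep(1, n_markers))
  w <- suppressWarnings(as.numeric(trimws(strsplit(weights, ",", fixed = TRUE)[[1]])))
  if (anyNA(w) || any(w < 0)) stop("weights must be non-negative numbers")
  if (length(w) != n_markers) {
    stop(sprintf("expected %d weights, got %d", n_markers, length(w)))
  }
  w
}

parse_weights("1,1,2,1", 4)   # c(1, 1, 2, 1)
parse_weights("", 3)          # c(1, 1, 1)
```

With Gower distance, a vector like this could be passed to the `weights` argument of `cluster::daisy()`.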

handleMissing

How to handle missing values

consensusClustering

Use bootstrap resampling for stable clusters

nBootstrap

Number of bootstrap samples for consensus
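Consensus clustering repeats the clustering on resampled data and records how often each pair of cases ends up in the same cluster. Pairs with values near 1 or 0 are stable; values in between indicate unstable assignments. The sketch below applies this to PAM on simulated data. It shows the general technique, not ihccluster's code. Dropping duplicated cases from each bootstrap draw is a simplifying assumption.

```r
library(cluster)

# Consensus matrix: for each pair of cases, the fraction of bootstrap runs
# (among runs that drew both cases) in which PAM put them in the same cluster.
consensus_matrix <- function(x, k, n_boot = 100, seed = 42) {
  set.seed(seed)
  n <- nrow(x)
  together <- matrix(0, n, n)   # runs in which i and j shared a cluster
  sampled  <- matrix(0, n, n)   # runs in which i and j were both drawn
  for (b in seq_len(n_boot)) {
    # Bootstrap draw; duplicated cases are dropped before clustering.
    idx <- sort(unique(sample.int(n, n, replace = TRUE)))
    cl  <- pam(x[idx, , drop = FALSE], k = k, cluster.only = TRUE)
    together[idx, idx] <- together[idx, idx] + outer(cl, cl, "==")
    sampled[idx, idx]  <- sampled[idx, idx] + 1
  }
  ifelse(sampled > 0, together / sampled, NA_real_)
}

set.seed(1)
x <- rbind(matrix(rnorm(30, 0, 0.5), ncol = 2),   # cases 1-15
           matrix(rnorm(30, 8, 0.5), ncol = 2))   # cases 16-30
cm <- consensus_matrix(x, k = 2, n_boot = 50)
mean(cm[1:15, 1:15], na.rm = TRUE)    # same-group pairs, close to 1
mean(cm[1:15, 16:30], na.rm = TRUE)   # cross-group pairs, close to 0
```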

seed

Random seed, for reproducibility

showSilhouette

Show silhouette width plot for cluster quality

showHeatmap

Show clustered heatmap of IHC expression

heatmapScale

How to scale heatmap values

showDendrogram

Show dendrogram for hierarchical clustering

showPCAPlot

Show dimension reduction plot with clusters

showBoxplots

Show boxplots of continuous markers by cluster

markerSummary

Summary statistics for each marker by cluster

clusterProfiles

Characteristic features of each cluster

associationTests

Test marker-cluster associations

multipleTestingCorrection

Correction method for marker association tests
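The sketch below tests each marker for association with cluster membership and then applies the default Bonferroni correction. The source does not say which tests ihccluster uses. The Kruskal-Wallis test for continuous markers is an assumption chosen for illustration, and the data are simulated.

```r
set.seed(1)
cluster_id <- factor(rep(1:3, each = 10))
markers <- data.frame(
  ER   = c(rnorm(10, 250, 20), rnorm(10, 50, 20), rnorm(10, 150, 20)),  # differs by cluster
  Ki67 = rnorm(30, 20, 5)                                              # no cluster effect
)

# One raw p-value per marker, then Bonferroni: min(1, p * number of tests).
raw_p <- vapply(markers, function(v) kruskal.test(v, cluster_id)$p.value, numeric(1))
adj_p <- p.adjust(raw_p, method = "bonferroni")
adj_p
```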

markerOptimization

Analyze marker importance and identify optimal panel

showMarkerCorrelation

Display correlation structure between markers

clusterQualityMetrics

Calculate PPV, purity, and cluster quality measures
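Purity measures how well clusters line up with a reference labeling, such as a known diagnosis. For each cluster, it is the share of cases that carry that cluster's most common label, which also equals the cluster's PPV for that label. Below is a minimal sketch of this standard definition using a hypothetical helper, `cluster_purity`. ihccluster's exact quality measures may differ.

```r
# Hypothetical helper: per-cluster and overall purity against a reference label.
cluster_purity <- function(cluster, truth) {
  tab <- table(cluster, truth)
  majority <- apply(tab, 1, max)          # size of the most common label per cluster
  list(per_cluster = majority / rowSums(tab),
       overall     = sum(majority) / sum(tab))
}

res <- cluster_purity(c(1, 1, 1, 2, 2, 2), c("a", "a", "b", "b", "b", "b"))
res$per_cluster   # cluster 1: 2/3, cluster 2: 1
res$overall       # 5/6
```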

iterativeRefinement

Perform iterative clustering with marker selection (advanced)

refinementIterations

Number of iterative refinement cycles

reproducibilityTest

Random split validation with Cohen's kappa (Sterlacci 2019)
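Cohen's kappa measures agreement between two sets of labels beyond what chance alone would produce: kappa = (po - pe) / (1 - pe). The sketch below computes it for two label vectors. The source does not describe how ihccluster splits the data or aligns cluster numbers between splits, so the hypothetical helper `cohen_kappa` assumes the labels are already matched.

```r
# Cohen's kappa between two label vectors for the same cases.
# Assumes labels are already aligned: cluster numbering is arbitrary, so
# "cluster 1" in one run must first be matched to its counterpart in the other.
cohen_kappa <- function(a, b) {
  lev <- sort(union(a, b))
  tab <- table(factor(a, levels = lev), factor(b, levels = lev))
  n  <- sum(tab)
  po <- sum(diag(tab)) / n                       # observed agreement
  pe <- sum(rowSums(tab) * colSums(tab)) / n^2   # agreement expected by chance
  (po - pe) / (1 - pe)
}

cohen_kappa(c(1, 1, 2, 2), c(1, 1, 2, 1))   # po = 0.75, pe = 0.5, kappa = 0.5
```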

nSplits

Number of random splits for reproducibility testing

supervisedClustering

Cluster within each known diagnosis group separately

supervisedVariable

Variable defining groups (e.g., histotype, diagnosis)

calculateRatios

Calculate ratios between continuous markers (e.g., CD4/CD8 ratio)

ratioNumerator

Numerator marker for ratio calculation

ratioDenominator

Denominator marker for ratio calculation

ratioName

Name for the computed ratio variable

ratioClassification

Classify ratio as Low/Intermediate/High using cutoffs

ratioLowCutoff

Values ≤ this are classified as Low

ratioHighCutoff

Values ≥ this are classified as High
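The ratio options amount to two steps. First, divide one continuous marker by another. Second, bin the result with the two cutoffs: values ≤ the low cutoff are Low, values ≥ the high cutoff are High, and anything in between is Intermediate. The sketch below uses the documented default cutoffs of 1 and 2. The helper names and the choice to return NA for a zero denominator are assumptions.

```r
# Hypothetical helpers mirroring the ratio options described above.
compute_ratio <- function(numerator, denominator) {
  # A zero denominator gives an undefined ratio; returned as NA here (assumption).
  ifelse(denominator == 0, NA_real_, numerator / denominator)
}

classify_ratio <- function(ratio, low_cutoff = 1, high_cutoff = 2) {
  stopifnot(low_cutoff < high_cutoff)
  label <- ifelse(ratio <= low_cutoff, "Low",
           ifelse(ratio >= high_cutoff, "High", "Intermediate"))
  factor(label, levels = c("Low", "Intermediate", "High"))
}

cd4 <- c(50, 100, 150, 200, 300, 80)
cd8 <- c(100, 100, 100, 100, 100, 0)
r <- compute_ratio(cd4, cd8)   # 0.5, 1, 1.5, 2, 3, NA
classify_ratio(r)              # Low, Low, Intermediate, High, High, <NA>
```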

exportClusters

Save cluster assignments to the dataset

knownDiagnosis

Known diagnoses for calculating marker performance metrics (sensitivity, specificity, PPV, NPV)

calculateDiagnosticMetrics

Compute sensitivity, specificity, PPV, NPV for each marker when diagnosis is known
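All four metrics follow from the 2x2 table of marker result against diagnosis. For a diagnosis with more than two categories, a common approach (assumed here) is one-vs-rest, with one table per diagnosis. Below is a sketch using a hypothetical helper, `diagnostic_metrics`.

```r
# Sensitivity, specificity, PPV and NPV of one binary marker against a known
# diagnosis (TRUE = case has the diagnosis of interest).
diagnostic_metrics <- function(marker_pos, has_dx) {
  tp <- sum(marker_pos & has_dx)
  fp <- sum(marker_pos & !has_dx)
  fn <- sum(!marker_pos & has_dx)
  tn <- sum(!marker_pos & !has_dx)
  c(sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    ppv         = tp / (tp + fp),
    npv         = tn / (tn + fn))
}

# Toy example: 10 cases with the diagnosis, 10 without.
has_dx     <- rep(c(TRUE, FALSE), each = 10)
marker_pos <- c(rep(TRUE, 8), rep(FALSE, 2),   # 8 true positives, 2 false negatives
                rep(TRUE, 1), rep(FALSE, 9))   # 1 false positive, 9 true negatives
diagnostic_metrics(marker_pos, has_dx)
# sensitivity 0.8, specificity 0.9, ppv 8/9, npv 9/11
```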

identifyOptimalPanel

Find minimal marker combinations with maximum diagnostic discrimination

panelSize

Size of antibody panel combinations to evaluate
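One way to search for a minimal panel is to enumerate every marker combination of the requested size with `combn()`, then score each combination against the known diagnosis. The scoring rule below is an assumption made for illustration: a panel calls a case positive only when all of its markers are positive, and panels are ranked by Youden's J (sensitivity + specificity - 1). The source does not document the criterion ihccluster uses, and `rank_panels` is a hypothetical helper.

```r
# Hypothetical helper: rank all marker panels of a given size by Youden's J,
# calling a case positive only if every marker in the panel is positive.
rank_panels <- function(markers, dx, size = 2) {
  panels <- combn(names(markers), size, simplify = FALSE)
  youden <- vapply(panels, function(p) {
    positive <- Reduce(`&`, markers[p])
    sens <- sum(positive & dx) / sum(dx)
    spec <- sum(!positive & !dx) / sum(!dx)
    sens + spec - 1
  }, numeric(1))
  res <- data.frame(panel = vapply(panels, paste, character(1), collapse = "+"),
                    youden = youden)
  res[order(-res$youden), ]
}

dx <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)
markers <- data.frame(
  A = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE),
  B = c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE),
  C = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE),
  D = rep(FALSE, 8)
)
rank_panels(markers, dx, size = 2)   # A+B ranks first: together they separate dx perfectly
```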

flagOutliers

Identify cases with ambiguous/atypical immunoprofiles based on silhouette scores

outlierThreshold

Silhouette score threshold below which cases are flagged as outliers
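Flagging by silhouette width can be illustrated with `cluster::silhouette()`. A case whose width falls below the threshold (0.25, the documented default) lies about as close to a neighboring cluster as to its own. The sketch below uses simulated data with one deliberately ambiguous case. It is not ihccluster's implementation.

```r
library(cluster)

set.seed(7)
x <- rbind(
  matrix(rnorm(20, mean = 0,  sd = 0.3), ncol = 2),   # cases 1-10
  matrix(rnorm(20, mean = 10, sd = 0.3), ncol = 2),   # cases 11-20
  c(5, 5)                                             # case 21: halfway between
)
fit <- pam(x, k = 2)

# Silhouette width per case, in the original case order.
sil <- silhouette(fit$clustering, dist(x))[, "sil_width"]

outlier_threshold <- 0.25
flagged <- which(sil < outlier_threshold)
flagged   # only the halfway case is flagged
```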

clinicalVars

Clinical variables to compare across clusters

survivalTime

Time variable for survival analysis

survivalEvent

Event variable for survival analysis

colorPalette

Color palette for plots (colorblind-safe options available)

fontSize

Base font size for all text elements

plotContrast

Enable high contrast mode for better visibility

showInterpretation

Display clinical interpretation guide

showTechnicalNotes

Display technical notes about the analysis

Value

A results object containing:

  • results$todo: an HTML element
  • results$binaryConversionNote: an HTML element
  • results$summary: an HTML element
  • results$clusterSizes: a table
  • results$silhouetteStats: a table
  • results$clusterProfiles: a table
  • results$markerSummary: a table
  • results$associationTests: a table
  • results$clinicalComparison: a table
  • results$consensusStats: a table
  • results$reproducibilityStats: a table
  • results$supervisedSummary: a table
  • results$supervisedResults: an HTML element
  • results$ratioSummary: a table
  • results$ratioClassificationTable: a table
  • results$markerImportance: a table
  • results$clusterQuality: a table
  • results$refinementHistory: a table
  • results$markerPerformance: sensitivity, specificity, PPV, and NPV for each marker by diagnosis
  • results$optimalPanels: ranked marker combinations for differential diagnosis
  • results$outlierCases: cases with ambiguous cluster assignment or low silhouette scores
  • results$silhouettePlot: an image
  • results$heatmapPlot: an image
  • results$dendrogramPlot: an image
  • results$pcaContributions: a table
  • results$pcaVariablePlot: an image
  • results$pcaPlot: an image
  • results$boxplotPlot: an image
  • results$markerCorrelationPlot: an image
  • results$survivalPlot: an image
  • results$medoidInfo: a table
  • results$interpretationGuide: an HTML element
  • results$technicalNotes: an HTML element
  • results$executiveSummary: preformatted text
  • results$spatialCompartmentSummary: a table
  • results$spatialConcordance: a table
  • results$spatialClusterComparison: a table
  • results$spatialMarkerDifferences: a table
  • results$spatialHeatmapPlot: an image
  • results$sizes: a table
  • results$distr: a table
  • results$assoc: a table
  • results$text: preformatted text

Tables can be converted to data frames with asDF or as.data.frame. For example:

results$clusterSizes$asDF

as.data.frame(results$clusterSizes)

Details

Variable Types:

  • Categorical: pos/neg, intensity levels (0/1/2/3), ordinal scales

  • Continuous: H-scores (0-300) and other continuous measures

  • Mixed: Any combination of categorical and continuous markers

By default (distanceMethod = "gower"), Gower distance is used so that categorical and continuous markers can be combined in a single dissimilarity measure.
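Gower distance puts each variable's contribution on a 0 to 1 scale and then averages over markers. A mismatch on a categorical marker counts 1, and a continuous marker contributes its absolute difference divided by its range. `cluster::daisy()` implements this metric and treats ordered factors as ordinal. Below is a small worked example on made-up data; ihccluster may prepare its inputs differently.

```r
library(cluster)

ihc <- data.frame(
  p53    = factor(c("pos", "neg", "pos")),   # binary stain, treated as nominal
  hscore = c(0, 150, 300)                    # continuous H-score, range 300
)
d <- as.matrix(daisy(ihc, metric = "gower"))
round(d, 2)
# case 1 vs 2: (1 + 150/300) / 2 = 0.75
# case 1 vs 3: (0 + 300/300) / 2 = 0.50
# case 2 vs 3: (1 + 150/300) / 2 = 0.75
```

Cases 1 and 3 share the same p53 result but sit at opposite ends of the H-score range. As a result, they are no farther apart than cases that differ on both markers by a smaller margin.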