
Essential IHC expression analysis with clustering and H-score calculation. Designed for routine clinical use with simplified options.

Usage

ihcbasic(
  data,
  markers,
  id = NULL,
  computeHScore = FALSE,
  clusterMethod = "hierarchical",
  nClusters = 3,
  distanceMetric = "gower",
  linkageMethod = "ward.D2",
  standardizeData = TRUE,
  showDendrogram = TRUE,
  showHeatmap = TRUE,
  silhouetteAnalysis = TRUE,
  scoringScale = "standard"
)

Arguments

data

The data as a data frame.

markers

IHC marker variables (categorical 0-3 or continuous H-scores)

id

Optional case/sample identifier for labeling

computeHScore

Generate H-score statistics (mean, median, range) for each marker. An H-score weights each staining intensity (1 = weak, 2 = moderate, 3 = strong) by the percentage of cells staining at that intensity and sums the results, giving a semi-quantitative measure of protein expression (range 0-300).
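As an illustration of the H-score arithmetic, here is a minimal base R sketch. It is not the package's internal implementation; the data frame, its column names, and the helper h_score() are hypothetical and exist only for this example.

```r
# Hypothetical per-case percentages of cells at each staining intensity
cases <- data.frame(
  case_id    = c("A", "B", "C"),
  pct_weak   = c(10, 30, 0),
  pct_mod    = c(20, 30, 0),
  pct_strong = c(50, 40, 100)
)

# H-score = 1 x %weak + 2 x %moderate + 3 x %strong, so it lies in 0-300
h_score <- function(pct_weak, pct_mod, pct_strong) {
  # Percentages of stained cells cannot add up to more than 100
  stopifnot(all(pct_weak + pct_mod + pct_strong <= 100))
  1 * pct_weak + 2 * pct_mod + 3 * pct_strong
}

cases$h_score <- with(cases, h_score(pct_weak, pct_mod, pct_strong))

# Summary statistics of the kind listed above: mean, median, range
h_stats <- c(
  mean   = mean(cases$h_score),
  median = median(cases$h_score),
  min    = min(cases$h_score),
  max    = max(cases$h_score)
)
```

Case C, with every cell staining strongly, reaches the maximum of 300.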

clusterMethod

Method for clustering samples by IHC expression patterns. Hierarchical clustering builds tree-like relationships and works well for small to medium datasets. K-means is faster for large datasets but assumes spherical clusters. PAM (Partitioning Around Medoids) is more robust to outliers.

nClusters

Number of distinct IHC expression patterns to identify. Start with 2-3 patterns for initial exploration. More patterns may reveal tumor heterogeneity but require larger sample sizes (rule of thumb: at least 3-5 samples per pattern).

distanceMetric

How to measure similarity between samples. Gower distance works with different IHC scoring scales (0-3, H-scores, binary) and is recommended for most analyses. Jaccard distance only considers presence/absence of staining.
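The difference between the two metrics can be sketched with the cluster package's daisy() and base R's dist(). This is an illustration under assumptions, not a description of what ihcbasic does internally; the data frames and their column names are made up for the example.

```r
library(cluster)  # provides daisy()

# Hypothetical data: one marker scored 0-3, one recorded as an H-score
ihc <- data.frame(
  ER_intensity = c(0, 3, 3, 1),
  Ki67_hscore  = c(30, 270, 240, 60)
)

# Gower: each numeric marker contributes |difference| / range of that marker,
# and the contributions are averaged, so both scales carry equal weight
gower <- as.matrix(daisy(ihc, metric = "gower"))

# Hypothetical presence/absence data (1 = any staining, 0 = none)
positive <- data.frame(
  ER   = c(1, 1, 0),
  PR   = c(1, 0, 0),
  HER2 = c(0, 0, 1)
)

# dist(method = "binary") is the Jaccard distance: among markers positive in
# at least one of the two samples, the share positive in only one of them
jaccard <- as.matrix(dist(positive, method = "binary"))
```

Samples 1 and 2 in the first data frame sit at opposite ends of both markers, so their Gower distance is 1. Samples 2 and 3 differ only by 30 H-score points out of a range of 240, giving 0.0625.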

linkageMethod

How to form groups in hierarchical clustering. Ward's method tends to produce compact groups of similar size and is recommended for most IHC analyses. Complete linkage creates tight, compact clusters. Average linkage is a compromise between the two.
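A rough equivalent of the default settings (Gower distance, Ward linkage) can be sketched with daisy(), hclust(), and cutree(). This is a sketch under assumptions rather than the function's actual code path; the six cases and their scores are invented.

```r
library(cluster)  # provides daisy()

# Hypothetical 0-3 intensity scores for six cases and three markers
ihc <- data.frame(
  ER   = c(3, 3, 2, 0, 0, 1),
  PR   = c(3, 2, 3, 0, 1, 0),
  HER2 = c(0, 0, 1, 3, 3, 2),
  row.names = paste0("case", 1:6)
)

# Pairwise Gower dissimilarities between cases
d <- daisy(ihc, metric = "gower")

# Agglomerative clustering with Ward's criterion on those dissimilarities
tree <- hclust(d, method = "ward.D2")

# Cut the tree into two expression patterns
groups <- cutree(tree, k = 2)
```

Here cases 1-3 (ER/PR high, HER2 low) and cases 4-6 (the reverse) fall into separate groups. Calling plot(tree) draws the corresponding dendrogram.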

standardizeData

Recommended when markers use different scales (e.g., mixing 0-3 scores with H-scores). Puts all markers on the same scale so no single marker dominates the analysis. Disable for binary data or when all markers use the same scale.
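The effect of standardization can be sketched with base R's scale(). The exact transformation ihcbasic applies is not stated here, so z-scoring is an assumption made for illustration, and the column names are hypothetical.

```r
# Hypothetical markers on very different scales
ihc <- data.frame(
  ER_intensity = c(0, 1, 2, 3),        # 0-3 intensity score
  Ki67_hscore  = c(20, 100, 180, 260)  # H-score, 0-300
)

# Euclidean distances on raw values are driven almost entirely by the H-score
raw_dist <- as.matrix(dist(ihc))

# z-score each marker: subtract its mean, divide by its standard deviation
z <- scale(ihc)

# After scaling, both markers have mean 0 and standard deviation 1
z_means <- colMeans(z)
z_sds   <- apply(z, 2, sd)
```

On the raw values the distance between the first two samples is about 80, nearly all of it from the H-score column; after scaling, both markers contribute equally.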

showDendrogram

Shows the hierarchical relationship between samples as a tree diagram. Helps visualize how expression patterns relate to each other and identify the optimal number of groups.

showHeatmap

Visual representation of IHC staining patterns across all samples. Each row represents a patient sample, each column represents a marker. Colors indicate expression levels, with samples grouped by similar patterns.

silhouetteAnalysis

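Recommended. Evaluates how well separated the expression patterns are. Silhouette widths range from -1 to 1: values near 1 indicate well-separated patterns, values near 0 indicate overlapping patterns, and negative values suggest samples assigned to the wrong pattern. Helps judge whether the identified patterns reflect real structure or noise.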
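One common way to use silhouette widths is to compare the average width across candidate numbers of clusters. The sketch below uses cluster::silhouette() on invented data; it is an assumption about how such a check might look, not the function's internal procedure.

```r
library(cluster)  # provides daisy() and silhouette()

# Hypothetical 0-3 intensity scores for six cases and three markers
ihc <- data.frame(
  ER   = c(3, 3, 2, 0, 0, 1),
  PR   = c(3, 2, 3, 0, 1, 0),
  HER2 = c(0, 0, 1, 3, 3, 2)
)

d    <- daisy(ihc, metric = "gower")
tree <- hclust(d, method = "ward.D2")

# Average silhouette width for k = 2, 3, 4 clusters
avg_width <- sapply(2:4, function(k) {
  sil <- silhouette(cutree(tree, k = k), d)
  mean(sil[, "sil_width"])
})
names(avg_width) <- 2:4

# The k with the largest average width is the best-separated solution
best_k <- as.integer(names(avg_width)[which.max(avg_width)])
```

With these made-up scores the two-cluster solution has the highest average width, matching the two visibly distinct expression profiles in the data.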

scoringScale

The IHC scoring system used in your data. Standard 0-3 scale is most common (0=no staining, 1=weak, 2=moderate, 3=strong staining). Binary scoring records only presence/absence. H-score combines intensity and percentage of positive cells (range 0-300).

Value

A results object containing:

results$instructions      an html
results$clinicalSummary   Plain-language summary with clinical context
results$reportText        Pre-formatted text for clinical reports
results$clusterSummary    a table
results$hscoreTable       a table
results$silhouetteTable   a table
results$markerSummary     a table
results$dendrogramPlot    an image
results$heatmapPlot       an image

Tables can be converted to data frames with asDF or as.data.frame. For example:

results$clusterSummary$asDF

as.data.frame(results$clusterSummary)

Examples