
Comprehensive validation metrics for image segmentation tasks in digital pathology and AI-based tissue analysis. Evaluates overlap and boundary accuracy between AI-predicted segmentations and expert-annotated ground truth. Essential metrics include Dice Coefficient (F1-score for spatial overlap), Jaccard Index (IoU - Intersection over Union), Hausdorff Distance (maximum boundary deviation), and Surface Distance metrics. Designed for tumor boundary delineation, gland segmentation, cell nuclei detection, tissue region classification, and any pixel-level or region-based segmentation task. Supports binary segmentation (single structure), multi-class segmentation (multiple tissue types), and instance segmentation (individual object detection). Provides statistical analysis across multiple images, stratification by image characteristics (magnification, staining, scanner), and clinical interpretation of segmentation quality. Critical for validating AI algorithms before deployment in diagnostic workflows, comparing segmentation methods, and establishing performance benchmarks for digital pathology systems.

Usage

segmentationmetrics(
  data,
  prediction_mask,
  ground_truth_mask,
  image_id,
  segmentation_type = "binary",
  positive_class,
  dice_coefficient = TRUE,
  jaccard_index = TRUE,
  volumetric_similarity = FALSE,
  sensitivity_specificity = TRUE,
  hausdorff_distance = TRUE,
  average_hausdorff = TRUE,
  surface_distance = TRUE,
  surface_overlap = FALSE,
  boundary_tolerance = 2,
  pixel_size_provided = FALSE,
  pixel_size_x = 0.5,
  pixel_size_y = 0.5,
  class_specific_metrics = TRUE,
  macro_average = TRUE,
  weighted_average = TRUE,
  object_detection_metrics = FALSE,
  iou_threshold = 0.5,
  count_metrics = FALSE,
  confidence_intervals = TRUE,
  bootstrap_ci = FALSE,
  bootstrap_samples = 1000,
  confidence_level = 0.95,
  quality_thresholds = TRUE,
  dice_threshold_excellent = 0.9,
  dice_threshold_good = 0.8,
  dice_threshold_acceptable = 0.7,
  stratified_analysis = FALSE,
  stratify_by,
  outlier_detection = TRUE,
  outlier_method = "iqr",
  plot_metric_distribution = TRUE,
  plot_scatter_comparison = TRUE,
  plot_boundary_error = TRUE,
  plot_confusion_matrix = FALSE,
  plot_performance_by_class = FALSE,
  application_context = "general",
  show_interpretation = TRUE,
  paired_analysis = FALSE,
  comparison_method,
  missing_handling = "complete",
  random_seed = 123
)

Arguments

data

The data as a data frame.

prediction_mask

AI-predicted segmentation mask. For binary segmentation, this is a binary variable (0/1 or background/foreground). For multi-class, this contains class labels. Can be pixel-level or region-level data.

ground_truth_mask

Expert-annotated ground truth segmentation mask. Must have the same encoding scheme as prediction_mask.

image_id

Variable identifying individual images or regions. Used to aggregate metrics per image and calculate summary statistics across images.

segmentation_type

Type of segmentation task. Binary for single structure (e.g., tumor vs background), multi-class for multiple tissue types (e.g., epithelium/stroma/necrosis), instance for individual object detection (e.g., separate cell nuclei).

positive_class

For binary segmentation, specify which level represents the foreground structure of interest (e.g., tumor, gland, nucleus).

dice_coefficient

Calculate Dice coefficient (also known as F1-score for segmentation). Measures spatial overlap: Dice = 2|A∩B| / (|A|+|B|). Range 0-1, where 1 = perfect overlap. Most commonly used segmentation metric.

jaccard_index

Calculate Jaccard Index (Intersection over Union). Measures overlap: IoU = |A∩B| / |A∪B|. Range 0-1. Related to Dice: IoU = Dice/(2-Dice). Standard metric in computer vision and AI segmentation.
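
The two overlap metrics above can be computed by hand from flattened binary masks. The following sketch (illustrative only, not the package's implementation; the mask values are made up) also verifies the documented relationship IoU = Dice/(2-Dice):

```r
# Two small flattened binary masks (hypothetical values)
pred  <- c(1, 1, 1, 0, 0, 1, 0, 0)  # AI-predicted mask
truth <- c(1, 1, 0, 0, 1, 1, 0, 0)  # expert ground truth

intersection <- sum(pred == 1 & truth == 1)

# Dice = 2|A∩B| / (|A| + |B|)
dice <- 2 * intersection / (sum(pred) + sum(truth))

# IoU = |A∩B| / |A∪B|
jaccard <- intersection / sum(pred == 1 | truth == 1)

dice                                    # 0.75
jaccard                                 # 0.6
all.equal(jaccard, dice / (2 - dice))   # TRUE
```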

volumetric_similarity

Calculate Volumetric Similarity coefficient for 3D segmentation or area-based similarity in 2D. Useful for volume/area preservation analysis.

sensitivity_specificity

Calculate pixel-wise sensitivity (true positive rate) and specificity (true negative rate). High sensitivity with low specificity indicates over-segmentation; low sensitivity with high specificity indicates under-segmentation.

hausdorff_distance

Calculate Hausdorff Distance - maximum distance from predicted boundary to ground truth boundary. Sensitive to outliers. Measures worst-case boundary error. Reported in pixels or mm if pixel size provided.

average_hausdorff

Calculate the Average Hausdorff Distance, reported at the 95th percentile (HD95). More robust to outliers than the maximum Hausdorff Distance and better represents typical boundary error.

surface_distance

Calculate average distance between predicted and ground truth boundaries. Mean of all point-to-surface distances. Provides average boundary error.
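
The boundary-distance metrics above can be illustrated with two small sets of boundary points. This is a hand-rolled sketch with made-up coordinates, not the package's implementation:

```r
# Boundary points as rows of (x, y) coordinates (hypothetical values)
pred_boundary  <- rbind(c(0, 0), c(0, 1), c(1, 0))
truth_boundary <- rbind(c(0, 0), c(0, 2), c(1, 0))

# Pairwise Euclidean distances: predicted points (rows) vs truth points (cols)
d <- as.matrix(dist(rbind(pred_boundary, truth_boundary)))[1:3, 4:6]

directed_a <- apply(d, 1, min)  # each predicted point -> nearest truth point
directed_b <- apply(d, 2, min)  # each truth point -> nearest predicted point

# Maximum Hausdorff Distance: worst-case boundary error
hausdorff <- max(max(directed_a), max(directed_b))        # 1

# Average surface distance: mean of all point-to-surface distances
avg_surface_distance <- mean(c(directed_a, directed_b))   # 1/3
```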

surface_overlap

Calculate Surface Dice (boundary-focused Dice). Only considers points near boundaries. Emphasizes boundary accuracy over volume accuracy.

boundary_tolerance

Tolerance distance for surface/boundary metrics in pixels. Points within this distance are considered boundary points. Typical: 1-5 pixels depending on magnification.

pixel_size_provided

Whether pixel/voxel physical dimensions are available for converting pixel-based metrics to physical distances (micrometers, millimeters).

pixel_size_x

Physical size of one pixel in X dimension (micrometers). Used to convert pixel-based distances to micrometers. Typical WSI: 0.25-0.5 μm/pixel.

pixel_size_y

Physical size of one pixel in Y dimension (micrometers).
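
The pixel-to-physical conversion these options enable is a simple scaling. A quick illustrative sketch (not package code; the measured distance is made up):

```r
# A boundary distance measured in pixels scales by the physical pixel size.
pixel_size_um <- 0.25   # um/pixel, within the typical WSI range above
hausdorff_px  <- 12     # hypothetical Hausdorff Distance in pixels

hausdorff_um <- hausdorff_px * pixel_size_um
hausdorff_um            # 3 micrometers
```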

class_specific_metrics

For multi-class segmentation, calculate metrics separately for each class (one-vs-rest approach). Shows performance per tissue type.

macro_average

Calculate macro-average (unweighted mean) of metrics across all classes. Treats all classes equally regardless of size.

weighted_average

Calculate weighted average of metrics, weighted by class prevalence/size. Emphasizes performance on larger structures.

object_detection_metrics

For instance segmentation, calculate object-level detection metrics: precision, recall, F1-score based on IoU threshold for matching objects.

iou_threshold

Minimum IoU between predicted and ground truth objects to consider them matched. Standard: 0.5 for object detection, 0.7 for strict matching.
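
The IoU-threshold matching described above can be sketched as follows. This is a simplified, greedy illustration (it does not enforce strictly one-to-one matching, and the IoU values are made up), not the package's matching algorithm:

```r
# ious[i, j] = IoU between predicted object i and ground-truth object j
ious <- rbind(c(0.82, 0.10),
              c(0.05, 0.64),
              c(0.20, 0.30))
iou_threshold <- 0.5

matched <- apply(ious, 1, max) >= iou_threshold   # per predicted object
tp <- sum(matched)                                # matched predictions
fp <- sum(!matched)                               # unmatched predictions
fn <- ncol(ious) - sum(apply(ious, 2, max) >= iou_threshold)  # missed truths

precision <- tp / (tp + fp)   # 2/3
recall    <- tp / (tp + fn)   # 1
```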

count_metrics

Calculate object counting accuracy for instance segmentation (e.g., cell count accuracy, absolute/relative counting error).

confidence_intervals

Calculate confidence intervals for metrics across images using bootstrap or normal approximation.

bootstrap_ci

Use bootstrap resampling for confidence intervals instead of normal approximation. More accurate for small sample sizes or skewed distributions.

bootstrap_samples

Number of bootstrap samples for confidence interval estimation.

confidence_level

Confidence level for interval estimation (default 95 percent).
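
A minimal sketch of the percentile bootstrap these options describe, assuming per-image Dice values are already available (the values below are made up; this is not the package's implementation):

```r
set.seed(123)   # mirrors the random_seed argument
dice_per_image <- c(0.91, 0.88, 0.93, 0.79, 0.85, 0.90)

# Resample images with replacement; recompute the mean Dice each time
boot_means <- replicate(1000, mean(sample(dice_per_image, replace = TRUE)))

# 95% percentile interval (confidence_level = 0.95)
ci <- quantile(boot_means, c(0.025, 0.975))
```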

quality_thresholds

Apply clinical quality thresholds to categorize segmentation performance (excellent/good/acceptable/poor) based on Dice/IoU values.

dice_threshold_excellent

Dice coefficient threshold for "excellent" segmentation quality. Typical: ≥0.90 for clinical deployment.

dice_threshold_good

Dice coefficient threshold for "good" segmentation quality.

dice_threshold_acceptable

Dice coefficient threshold for minimally "acceptable" segmentation. Below this may require human review.
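
How the default thresholds above partition per-image Dice values into quality categories can be sketched with base R's cut() (illustrative only; the Dice values are made up):

```r
dice_per_image <- c(0.95, 0.85, 0.72, 0.55)   # hypothetical per-image Dice

# Default thresholds: excellent >= 0.9, good >= 0.8, acceptable >= 0.7
quality <- cut(dice_per_image,
               breaks = c(-Inf, 0.7, 0.8, 0.9, Inf),
               labels = c("poor", "acceptable", "good", "excellent"),
               right = FALSE)
quality   # excellent, good, acceptable, poor
```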

stratified_analysis

Perform stratified analysis by image characteristics (scanner, magnification, staining protocol, tissue type) to assess consistency.

stratify_by

Variable for stratified analysis (e.g., scanner_type, magnification, stain).

outlier_detection

Detect outlier images with unusually poor segmentation performance. Flags images requiring expert review.

outlier_method

Method for outlier detection across images.

plot_metric_distribution

Plot distribution of metrics across images (histogram/violin plot). Shows variability in segmentation quality.

plot_scatter_comparison

Scatter plot showing relationship between Dice and IoU across images.

plot_boundary_error

Plot Hausdorff and surface distances to visualize boundary accuracy.

plot_confusion_matrix

For multi-class segmentation, show confusion matrix of pixel classifications.

plot_performance_by_class

Bar plot showing metrics for each class in multi-class segmentation.

application_context

Clinical/research application context for interpretation guidance.

show_interpretation

Provide interpretation of results including recommendations for clinical deployment based on performance metrics.

paired_analysis

Perform paired statistical tests comparing segmentation performance between different AI models or methods on the same images.

comparison_method

Variable identifying different segmentation methods for comparison.

missing_handling

How to handle images with missing or incomplete segmentations.

random_seed

Random seed for bootstrap sampling and other stochastic procedures.

Value

A results object containing:

results$instructions: a html
results$overallSummary: a table
results$overlapMetricsTable: a table
results$distanceMetricsTable: a table
results$multiclassMetricsTable: a table
results$instanceMetricsTable: a table
results$qualityAssessmentTable: a table
results$outlierImagesTable: a table
results$stratifiedAnalysisTable: a table
results$comparisonTable: a table
results$metricDistributionPlot: an image
results$scatterComparisonPlot: an image
results$boundaryErrorPlot: an image
results$confusionMatrixPlot: an image
results$performanceByClassPlot: an image
results$clinicalInterpretation: a html

Tables can be converted to data frames with asDF or as.data.frame. For example:

results$overallSummary$asDF

as.data.frame(results$overallSummary)

Examples

# \donttest{
result <- segmentationmetrics(
    data = segmentation_results,
    prediction_mask = "ai_segmentation",
    ground_truth_mask = "expert_annotation",
    image_id = "slide_id"
)
# }