
Comprehensive validation metrics for image segmentation tasks in digital pathology and AI-based tissue analysis. Evaluates overlap and boundary accuracy between AI-predicted segmentations and expert-annotated ground truth. Essential metrics include Dice Coefficient (F1-score for spatial overlap), Jaccard Index (IoU - Intersection over Union), Hausdorff Distance (maximum boundary deviation), and Surface Distance metrics. Designed for tumor boundary delineation, gland segmentation, cell nuclei detection, tissue region classification, and any pixel-level or region-based segmentation task. Supports binary segmentation (single structure), multi-class segmentation (multiple tissue types), and instance segmentation (individual object detection). Provides statistical analysis across multiple images, stratification by image characteristics (magnification, staining, scanner), and clinical interpretation of segmentation quality. Critical for validating AI algorithms before deployment in diagnostic workflows, comparing segmentation methods, and establishing performance benchmarks for digital pathology systems.

Usage

segmentationmetrics(
  data,
  prediction_mask,
  ground_truth_mask,
  image_id,
  segmentation_type = "binary",
  positive_class,
  dice_coefficient = TRUE,
  jaccard_index = TRUE,
  volumetric_similarity = FALSE,
  sensitivity_specificity = TRUE,
  hausdorff_distance = TRUE,
  average_hausdorff = TRUE,
  surface_distance = TRUE,
  surface_overlap = FALSE,
  boundary_tolerance = 2,
  pixel_size_provided = FALSE,
  pixel_size_x = 0.5,
  pixel_size_y = 0.5,
  class_specific_metrics = TRUE,
  macro_average = TRUE,
  weighted_average = TRUE,
  object_detection_metrics = FALSE,
  iou_threshold = 0.5,
  count_metrics = FALSE,
  confidence_intervals = TRUE,
  bootstrap_ci = FALSE,
  bootstrap_samples = 1000,
  confidence_level = 0.95,
  quality_thresholds = TRUE,
  dice_threshold_excellent = 0.9,
  dice_threshold_good = 0.8,
  dice_threshold_acceptable = 0.7,
  stratified_analysis = FALSE,
  stratify_by,
  outlier_detection = TRUE,
  outlier_method = "iqr",
  plot_metric_distribution = TRUE,
  plot_scatter_comparison = TRUE,
  plot_boundary_error = TRUE,
  plot_confusion_matrix = FALSE,
  plot_performance_by_class = FALSE,
  application_context = "general",
  show_interpretation = TRUE,
  paired_analysis = FALSE,
  comparison_method,
  missing_handling = "complete",
  random_seed = 123
)

Arguments

data

The data as a data frame.

prediction_mask

AI-predicted segmentation mask. For binary segmentation, this is a binary variable (0/1 or background/foreground). For multi-class, this contains class labels. Can be pixel-level or region-level data.

ground_truth_mask

Expert-annotated ground truth segmentation mask. Must have the same encoding scheme as prediction_mask.

image_id

Variable identifying individual images or regions. Used to aggregate metrics per image and calculate summary statistics across images.

segmentation_type

Type of segmentation task. Binary for single structure (e.g., tumor vs background), multi-class for multiple tissue types (e.g., epithelium/stroma/necrosis), instance for individual object detection (e.g., separate cell nuclei).

positive_class

For binary segmentation, specify which level represents the foreground structure of interest (e.g., tumor, gland, nucleus).

dice_coefficient

Calculate Dice coefficient (also known as F1-score for segmentation). Measures spatial overlap: Dice = 2|A∩B| / (|A|+|B|). Range 0-1, where 1 = perfect overlap. Most commonly used segmentation metric.

jaccard_index

Calculate Jaccard Index (Intersection over Union). Measures overlap: IoU = |A∩B| / |A∪B|. Range 0-1. Related to Dice: IoU = Dice/(2-Dice). Standard metric in computer vision and AI segmentation.
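
The two overlap metrics above can be computed by hand from flattened binary masks. The following sketch (illustrative only, not the package's implementation; the mask values are made up) also verifies the documented relationship IoU = Dice/(2-Dice):

```r
# Two small flattened binary masks (hypothetical values)
pred  <- c(1, 1, 1, 0, 0, 1, 0, 0)  # AI-predicted mask
truth <- c(1, 1, 0, 0, 1, 1, 0, 0)  # expert ground truth

intersection <- sum(pred == 1 & truth == 1)

# Dice = 2|A∩B| / (|A| + |B|)
dice <- 2 * intersection / (sum(pred) + sum(truth))

# IoU = |A∩B| / |A∪B|
jaccard <- intersection / sum(pred == 1 | truth == 1)

dice                                    # 0.75
jaccard                                 # 0.6
all.equal(jaccard, dice / (2 - dice))   # TRUE
```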

volumetric_similarity

Calculate Volumetric Similarity coefficient for 3D segmentation or area-based similarity in 2D. Useful for volume/area preservation analysis.

sensitivity_specificity

Calculate pixel-wise sensitivity (true positive rate) and specificity (true negative rate). High sensitivity with low specificity indicates over-segmentation; low sensitivity with high specificity indicates under-segmentation.

hausdorff_distance

Calculate Hausdorff Distance - maximum distance from predicted boundary to ground truth boundary. Sensitive to outliers. Measures worst-case boundary error. Reported in pixels or mm if pixel size provided.

average_hausdorff

Calculate the Average Hausdorff Distance, reported at the 95th percentile (HD95). More robust to outliers than the maximum Hausdorff Distance and better represents typical boundary error.

surface_distance

Calculate average distance between predicted and ground truth boundaries. Mean of all point-to-surface distances. Provides average boundary error.
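
The boundary-distance metrics above can be illustrated with two small sets of boundary points. This is a hand-rolled sketch with made-up coordinates, not the package's implementation:

```r
# Boundary points as rows of (x, y) coordinates (hypothetical values)
pred_boundary  <- rbind(c(0, 0), c(0, 1), c(1, 0))
truth_boundary <- rbind(c(0, 0), c(0, 2), c(1, 0))

# Pairwise Euclidean distances: predicted points (rows) vs truth points (cols)
d <- as.matrix(dist(rbind(pred_boundary, truth_boundary)))[1:3, 4:6]

directed_a <- apply(d, 1, min)  # each predicted point -> nearest truth point
directed_b <- apply(d, 2, min)  # each truth point -> nearest predicted point

# Maximum Hausdorff Distance: worst-case boundary error
hausdorff <- max(max(directed_a), max(directed_b))        # 1

# Average surface distance: mean of all point-to-surface distances
avg_surface_distance <- mean(c(directed_a, directed_b))   # 1/3
```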

surface_overlap

Calculate Surface Dice (boundary-focused Dice). Only considers points near boundaries. Emphasizes boundary accuracy over volume accuracy.

boundary_tolerance

Tolerance distance for surface/boundary metrics in pixels. Points within this distance are considered boundary points. Typical: 1-5 pixels depending on magnification.

pixel_size_provided

Whether pixel/voxel physical dimensions are available for converting pixel-based metrics to physical distances (micrometers, millimeters).

pixel_size_x

Physical size of one pixel in X dimension (micrometers). Used to convert pixel-based distances to micrometers. Typical WSI: 0.25-0.5 μm/pixel.

pixel_size_y

Physical size of one pixel in Y dimension (micrometers).
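
The pixel-to-physical conversion these options enable is a simple scaling. A quick illustrative sketch (not package code; the measured distance is made up):

```r
# A boundary distance measured in pixels scales by the physical pixel size.
pixel_size_um <- 0.25   # um/pixel, within the typical WSI range above
hausdorff_px  <- 12     # hypothetical Hausdorff Distance in pixels

hausdorff_um <- hausdorff_px * pixel_size_um
hausdorff_um            # 3 micrometers
```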

class_specific_metrics

For multi-class segmentation, calculate metrics separately for each class (one-vs-rest approach). Shows performance per tissue type.

macro_average

Calculate macro-average (unweighted mean) of metrics across all classes. Treats all classes equally regardless of size.

weighted_average

Calculate weighted average of metrics, weighted by class prevalence/size. Emphasizes performance on larger structures.

object_detection_metrics

For instance segmentation, calculate object-level detection metrics: precision, recall, F1-score based on IoU threshold for matching objects.

iou_threshold

Minimum IoU between predicted and ground truth objects to consider them matched. Standard: 0.5 for object detection, 0.7 for strict matching.
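
The IoU-threshold matching described above can be sketched as follows. This is a simplified, greedy illustration (it does not enforce strictly one-to-one matching, and the IoU values are made up), not the package's matching algorithm:

```r
# ious[i, j] = IoU between predicted object i and ground-truth object j
ious <- rbind(c(0.82, 0.10),
              c(0.05, 0.64),
              c(0.20, 0.30))
iou_threshold <- 0.5

matched <- apply(ious, 1, max) >= iou_threshold   # per predicted object
tp <- sum(matched)                                # matched predictions
fp <- sum(!matched)                               # unmatched predictions
fn <- ncol(ious) - sum(apply(ious, 2, max) >= iou_threshold)  # missed truths

precision <- tp / (tp + fp)   # 2/3
recall    <- tp / (tp + fn)   # 1
```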

count_metrics

Calculate object counting accuracy for instance segmentation (e.g., cell count accuracy, absolute/relative counting error).

confidence_intervals

Calculate confidence intervals for metrics across images using bootstrap or normal approximation.

bootstrap_ci

Use bootstrap resampling for confidence intervals instead of normal approximation. More accurate for small sample sizes or skewed distributions.

bootstrap_samples

Number of bootstrap samples for confidence interval estimation.

confidence_level

Confidence level for interval estimation (default 95 percent).
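
A minimal sketch of the percentile bootstrap these options describe, assuming per-image Dice values are already available (the values below are made up; this is not the package's implementation):

```r
set.seed(123)   # mirrors the random_seed argument
dice_per_image <- c(0.91, 0.88, 0.93, 0.79, 0.85, 0.90)

# Resample images with replacement; recompute the mean Dice each time
boot_means <- replicate(1000, mean(sample(dice_per_image, replace = TRUE)))

# 95% percentile interval (confidence_level = 0.95)
ci <- quantile(boot_means, c(0.025, 0.975))
```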

quality_thresholds

Apply clinical quality thresholds to categorize segmentation performance (excellent/good/acceptable/poor) based on Dice/IoU values.

dice_threshold_excellent

Dice coefficient threshold for "excellent" segmentation quality. Typical: ≥0.90 for clinical deployment.

dice_threshold_good

Dice coefficient threshold for "good" segmentation quality.

dice_threshold_acceptable

Dice coefficient threshold for minimally "acceptable" segmentation. Below this may require human review.
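
How the default thresholds above partition per-image Dice values into quality categories can be sketched with base R's cut() (illustrative only; the Dice values are made up):

```r
dice_per_image <- c(0.95, 0.85, 0.72, 0.55)   # hypothetical per-image Dice

# Default thresholds: excellent >= 0.9, good >= 0.8, acceptable >= 0.7
quality <- cut(dice_per_image,
               breaks = c(-Inf, 0.7, 0.8, 0.9, Inf),
               labels = c("poor", "acceptable", "good", "excellent"),
               right = FALSE)
quality   # excellent, good, acceptable, poor
```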

stratified_analysis

Perform stratified analysis by image characteristics (scanner, magnification, staining protocol, tissue type) to assess consistency.

stratify_by

Variable for stratified analysis (e.g., scanner_type, magnification, stain).

outlier_detection

Detect outlier images with unusually poor segmentation performance. Flags images requiring expert review.

outlier_method

Method for outlier detection across images.

plot_metric_distribution

Plot distribution of metrics across images (histogram/violin plot). Shows variability in segmentation quality.

plot_scatter_comparison

Scatter plot showing relationship between Dice and IoU across images.

plot_boundary_error

Plot Hausdorff and surface distances to visualize boundary accuracy.

plot_confusion_matrix

For multi-class segmentation, show confusion matrix of pixel classifications.

plot_performance_by_class

Bar plot showing metrics for each class in multi-class segmentation.

application_context

Clinical/research application context for interpretation guidance.

show_interpretation

Provide interpretation of results including recommendations for clinical deployment based on performance metrics.

paired_analysis

Perform paired statistical tests comparing segmentation performance between different AI models or methods on the same images.

comparison_method

Variable identifying different segmentation methods for comparison.

missing_handling

How to handle images with missing or incomplete segmentations.

random_seed

Random seed for bootstrap sampling and other stochastic procedures.

Value

A results object containing:

results$instructions: a html
results$overallSummary: a table
results$overlapMetricsTable: a table
results$distanceMetricsTable: a table
results$multiclassMetricsTable: a table
results$instanceMetricsTable: a table
results$qualityAssessmentTable: a table
results$outlierImagesTable: a table
results$stratifiedAnalysisTable: a table
results$comparisonTable: a table
results$metricDistributionPlot: an image
results$scatterComparisonPlot: an image
results$boundaryErrorPlot: an image
results$confusionMatrixPlot: an image
results$performanceByClassPlot: an image
results$clinicalInterpretation: a html

Tables can be converted to data frames with asDF or as.data.frame. For example:

results$overallSummary$asDF

as.data.frame(results$overallSummary)

Examples

# \donttest{
result <- segmentationmetrics(
    data = segmentation_results,
    prediction_mask = "ai_segmentation",
    ground_truth_mask = "expert_annotation",
    image_id = "slide_id"
)
# }