
Grey-zone ROC analysis for diagnostic tests in which uncertainty around the decision threshold calls for a "don't know" or "defer decision" option. Traditional binary ROC analysis forces every case into positive or negative. Grey-zone ROC analysis instead acknowledges diagnostic uncertainty by creating an indeterminate zone around the threshold, and cases falling in this zone are referred for additional testing or expert review. Typical applications include AI model deployment, where uncertain predictions should not drive clinical decisions; cytology with atypical findings that require repeat sampling; and biomarker cutoffs where borderline values need confirmation. The analysis determines grey-zone boundaries that maximize diagnostic certainty while minimizing inconclusive results, calculates performance metrics both excluding and including grey-zone cases, and provides clinical decision rules for handling uncertain classifications. It is particularly valuable for real-world implementation of diagnostic tests, where "I don't know" is a valid and often safer response than a potentially incorrect binary classification.

Usage

greyzoneroc(
  data,
  predictor,
  outcome,
  positive_level,
  grey_zone_method = "fixed_width",
  grey_zone_width = 0.1,
  lower_grey_boundary = 0.45,
  upper_grey_boundary = 0.55,
  confidence_threshold = 0.8,
  cost_false_positive = 1,
  cost_false_negative = 1,
  cost_grey_zone = 0.3,
  calculate_definite_performance = TRUE,
  calculate_all_cases_performance = TRUE,
  grey_zone_characteristics = TRUE,
  optimal_threshold = "youden",
  clinical_threshold = 0.5,
  prediction_intervals = FALSE,
  bootstrap_grey_zone = TRUE,
  bootstrap_samples = 1000,
  confidence_level = 0.95,
  grey_zone_action = "reflex",
  reflex_test_name = "Confirmatory test",
  plot_grey_zone_roc = TRUE,
  plot_threshold_distributions = TRUE,
  plot_grey_zone_size = TRUE,
  plot_uncertainty_map = FALSE,
  plot_cost_surface = FALSE,
  fuzzy_membership = FALSE,
  bayesian_grey_zone = FALSE,
  stratified_grey_zone = FALSE,
  stratify_by,
  clinical_scenario = "general",
  missing_handling = "complete",
  random_seed = 123
)

Arguments

data

The data as a data frame.

predictor

Continuous predictor variable or probability score from diagnostic test. For AI models, this is typically the predicted probability. For biomarkers, this is the measured concentration or expression level.

outcome

Binary gold standard outcome variable defining true disease status. Must have exactly 2 levels (positive/negative or diseased/healthy).

positive_level

Level of the outcome variable representing the positive/diseased state.

grey_zone_method

Method for defining the grey-zone boundaries. The fixed-width method places equal margins on either side of the optimal threshold; the confidence method uses the model's prediction uncertainty; the cost-benefit method chooses boundaries that minimize expected cost; the Youden method creates a symmetric zone that maximizes certainty; and the custom method uses the boundaries given in lower_grey_boundary and upper_grey_boundary.

grey_zone_width

Width of the grey zone when using the fixed_width method. For probability scores (0-1), a width of 0.1 creates a ±0.05 margin around the threshold. Larger values enlarge the indeterminate region but generally improve the reliability of the remaining definite classifications.
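As a minimal sketch of the fixed-width rule, assuming the grey zone is centred on the threshold and includes its end points (the package's own boundary convention may differ), the hypothetical helper `classify_fixed_width()` below splits scores into three classes:

```r
# Hypothetical helper, not part of the package API: three-way classification
# with a fixed-width grey zone centred on `threshold` (width 0.1 -> +/- 0.05).
classify_fixed_width <- function(score, threshold = 0.5, width = 0.1) {
  lower <- threshold - width / 2
  upper <- threshold + width / 2
  # Assumption: scores on the closed interval [lower, upper] are deferred
  label <- ifelse(score < lower, "negative",
                  ifelse(score > upper, "positive", "grey"))
  factor(label, levels = c("negative", "grey", "positive"))
}

classify_fixed_width(c(0.20, 0.47, 0.53, 0.90))
```

With the defaults the boundaries are 0.45 and 0.55, so the four scores are labelled negative, grey, grey and positive.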

lower_grey_boundary

Lower boundary of the grey zone when using the custom method. Values below this are classified as negative. Should be less than upper_grey_boundary.

upper_grey_boundary

Upper boundary of the grey zone when using the custom method. Values above this are classified as positive. Should be greater than lower_grey_boundary.

confidence_threshold

Minimum confidence required for a definite classification when using the confidence-based method. Predictions with lower confidence are assigned to the grey zone. Higher values enlarge the grey zone but improve the reliability of definite calls.
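One plausible reading of the confidence-based method is sketched below. It assumes a calibrated probability `p` and takes `max(p, 1 - p)` as the confidence of the call; the package may measure confidence differently, and `classify_by_confidence()` is a hypothetical helper. With `confidence_threshold = 0.8` this rule defers every case with 0.2 < p < 0.8.

```r
# Hypothetical sketch: the confidence of a calibrated probability p is taken
# to be max(p, 1 - p); calls below `confidence_threshold` go to the grey zone.
classify_by_confidence <- function(prob, confidence_threshold = 0.8) {
  confidence <- pmax(prob, 1 - prob)
  label <- ifelse(prob >= 0.5, "positive", "negative")
  label[confidence < confidence_threshold] <- "grey"
  factor(label, levels = c("negative", "grey", "positive"))
}

classify_by_confidence(c(0.05, 0.30, 0.65, 0.95))
```

Here 0.30 and 0.65 have confidences of 0.70 and 0.65, so both are deferred, while 0.05 and 0.95 receive definite calls.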

cost_false_positive

Relative cost of false positive classification. Used in cost-benefit optimization to determine grey-zone boundaries that minimize expected costs.

cost_false_negative

Relative cost of a false negative classification. In cost-benefit optimization, higher values extend the grey zone toward lower scores so that fewer positive cases are missed.

cost_grey_zone

Relative cost of classifying a case as grey zone (requiring additional testing). Typically lower than misclassification costs but higher than correct classification. Reflects the burden of deferred decisions and additional testing.
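To make the cost-benefit criterion concrete, the sketch below scores a pair of boundaries by its average cost per case and then searches a grid of candidate pairs. It assumes that correct definite calls cost nothing and that boundaries are inclusive; `expected_cost()` and `cost_grid()` are hypothetical helpers, not the package's implementation.

```r
# Hypothetical sketch: average cost per case for grey-zone boundaries
# [lower, upper]. Correct definite calls are assumed to cost 0.
expected_cost <- function(score, is_positive, lower, upper,
                          cost_fp = 1, cost_fn = 1, cost_grey = 0.3) {
  grey <- score >= lower & score <= upper
  fp <- score > upper & !is_positive   # called positive, actually healthy
  fn <- score < lower & is_positive    # called negative, actually diseased
  (cost_fp * sum(fp) + cost_fn * sum(fn) + cost_grey * sum(grey)) / length(score)
}

# Evaluate every (lower, upper) pair on a grid with lower <= upper,
# sorted so the cheapest configuration comes first.
cost_grid <- function(score, is_positive, grid = seq(0, 1, by = 0.05), ...) {
  pairs <- expand.grid(lower = grid, upper = grid)
  pairs <- pairs[pairs$lower <= pairs$upper, ]
  pairs$cost <- mapply(expected_cost, lower = pairs$lower, upper = pairs$upper,
                       MoreArgs = list(score = score, is_positive = is_positive, ...))
  pairs[order(pairs$cost), ]
}

score <- c(0.1, 0.4, 0.5, 0.6, 0.9)
is_positive <- c(FALSE, TRUE, FALSE, TRUE, TRUE)
expected_cost(score, is_positive, lower = 0.45, upper = 0.55)
head(cost_grid(score, is_positive), 3)
```

With boundaries 0.45 and 0.55 there is one deferral and one false negative, giving (1 + 0.3) / 5 = 0.26. The cheapest grid configurations defer both 0.4 and 0.5, at 0.6 / 5 = 0.12, because deferring two ambiguous cases is cheaper than misclassifying either of them.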

calculate_definite_performance

Calculate sensitivity, specificity, and AUC using only cases with definite classifications (excluding grey-zone). Shows the reliability of the test when it makes a definite call.
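A minimal sketch of the definite-only metrics is shown below. It assumes each case already carries a `"negative"`, `"grey"` or `"positive"` label; the helper name is hypothetical, and AUC is omitted for brevity.

```r
# Hypothetical sketch: sensitivity and specificity computed only on cases
# with a definite call; "grey" cases are dropped before counting.
definite_performance <- function(label, is_positive) {
  keep <- label != "grey"
  tp <- sum(label[keep] == "positive" & is_positive[keep])
  fn <- sum(label[keep] == "negative" & is_positive[keep])
  tn <- sum(label[keep] == "negative" & !is_positive[keep])
  fp <- sum(label[keep] == "positive" & !is_positive[keep])
  c(sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    definite_rate = mean(keep))   # share of cases receiving a definite call
}

label <- c("negative", "grey", "positive", "positive", "negative", "grey")
is_positive <- c(FALSE, TRUE, TRUE, TRUE, TRUE, FALSE)
definite_performance(label, is_positive)
```

Four of the six cases are definite. Among them, two of three diseased cases are called positive and the single healthy case is called negative, giving a sensitivity of 2/3 and a specificity of 1.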

calculate_all_cases_performance

Calculate performance treating grey-zone cases as misclassifications. This gives a worst-case scenario in which every deferred decision is counted as a failure.
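Under the assumption that each case carries a `"negative"`, `"grey"` or `"positive"` label, the worst-case view counts every grey case as wrong. The denominators are therefore all diseased and all healthy cases. `all_cases_performance()` is a hypothetical helper:

```r
# Hypothetical sketch: worst-case metrics in which a grey-zone call never
# counts as correct, so it lowers sensitivity or specificity.
all_cases_performance <- function(label, is_positive) {
  c(sensitivity = sum(label == "positive" & is_positive) / sum(is_positive),
    specificity = sum(label == "negative" & !is_positive) / sum(!is_positive))
}

label <- c("negative", "grey", "positive", "positive", "negative", "grey")
is_positive <- c(FALSE, TRUE, TRUE, TRUE, TRUE, FALSE)
all_cases_performance(label, is_positive)
```

Here both grey cases count against the test, so sensitivity is 2/4 and specificity is 1/2.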

grey_zone_characteristics

Analyze characteristics of cases falling in the grey zone, including prevalence distribution, risk factors, and suggestions for resolution.

optimal_threshold

Method for selecting the central threshold around which the grey zone is defined.
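For the default `"youden"` option, the central threshold is the cutoff that maximizes Youden's J = sensitivity + specificity - 1. The sketch below searches the observed scores and assumes a case is called positive when its score is at or above the cutoff. Tie-breaking and cutoff placement in the package may differ, and `youden_threshold()` is a hypothetical helper.

```r
# Hypothetical sketch: pick the observed score that maximizes Youden's J.
youden_threshold <- function(score, is_positive) {
  cutoffs <- sort(unique(score))
  j <- vapply(cutoffs, function(cut_off) {
    sens <- mean(score[is_positive] >= cut_off)   # diseased called positive
    spec <- mean(score[!is_positive] < cut_off)   # healthy called negative
    sens + spec - 1
  }, numeric(1))
  cutoffs[which.max(j)]   # first maximum wins on ties
}

youden_threshold(c(0.1, 0.2, 0.4, 0.6, 0.8, 0.9),
                 c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE))
```

In this toy data a cutoff of 0.6 separates the groups perfectly (J = 1). The grey zone would then be built around that value.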

clinical_threshold

Prespecified threshold used when optimal_threshold is set to clinical threshold selection. For probability scores, 0.5 is a common choice for balanced classification; the appropriate value depends on the clinical context.

prediction_intervals

Calculate prediction intervals around the ROC curve to visualize uncertainty. Particularly useful for AI models with calibrated probabilities.

bootstrap_grey_zone

Use bootstrap to assess stability of grey-zone boundaries and performance metrics. Provides confidence intervals for grey-zone size and definite classification rates.
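As a minimal sketch of the idea, the percentile bootstrap below estimates a confidence interval for the share of cases falling in a grey zone. It assumes the boundaries are held fixed. A full stability analysis, as the option above describes, would also re-estimate the boundaries within each resample, which is not shown. `bootstrap_grey_rate()` is a hypothetical helper.

```r
# Hypothetical sketch: percentile bootstrap CI for the grey-zone proportion,
# with the boundaries [lower, upper] held fixed across resamples.
bootstrap_grey_rate <- function(score, lower, upper, B = 1000,
                                conf_level = 0.95, seed = 123) {
  set.seed(seed)                                   # analogous to random_seed
  in_grey <- score >= lower & score <= upper
  rates <- replicate(B, mean(sample(in_grey, replace = TRUE)))
  alpha <- 1 - conf_level
  q <- quantile(rates, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  c(estimate = mean(in_grey), lower = q[1], upper = q[2])
}

scores <- seq(0, 1, length.out = 101)
bootstrap_grey_rate(scores, lower = 0.45, upper = 0.55)
```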

bootstrap_samples

Number of bootstrap samples for confidence interval estimation.

confidence_level

Confidence level for intervals around performance metrics and grey-zone boundaries.

grey_zone_action

Recommended clinical action for cases in the grey zone. This affects interpretation of diagnostic performance and workflow design.

reflex_test_name

Name of the reflex/confirmatory test used for grey-zone cases. For example: FISH for HER2 2+, HPV testing for ASCUS cytology.

plot_grey_zone_roc

Display the ROC curve with the grey-zone boundaries highlighted, together with performance metrics for definite-only and all-case classifications.

plot_threshold_distributions

Show the distributions of predictor scores for positive and negative cases with the grey-zone boundaries marked, visualizing their overlap and the resulting uncertainty.

plot_grey_zone_size

Plot showing how varying grey-zone width affects the trade-off between percentage of definite classifications and their accuracy.
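The data behind such a trade-off plot can be sketched as follows. This assumes a fixed-width grey zone centred on `threshold` with inclusive boundaries. `width_tradeoff()` is a hypothetical helper that tabulates, for each width, the percentage of definite calls and their accuracy.

```r
# Hypothetical sketch: for each grey-zone width, the percentage of cases with
# a definite call and the accuracy among those definite calls.
width_tradeoff <- function(score, is_positive, threshold = 0.5,
                           widths = seq(0, 0.4, by = 0.1)) {
  rows <- lapply(widths, function(w) {
    lower <- threshold - w / 2
    upper <- threshold + w / 2
    definite <- score < lower | score > upper
    correct <- (score > upper & is_positive) | (score < lower & !is_positive)
    data.frame(width = w,
               pct_definite = 100 * mean(definite),
               accuracy_definite = if (any(definite)) sum(correct) / sum(definite)
                                   else NA_real_)
  })
  do.call(rbind, rows)
}

width_tradeoff(c(0.1, 0.2, 0.45, 0.55, 0.8, 0.9),
               c(FALSE, FALSE, TRUE, FALSE, TRUE, TRUE),
               widths = c(0, 0.2))
```

In this toy example, widening the zone from 0 to 0.2 defers the two borderline cases. The share of definite calls drops from 100% to about 67%, and their accuracy rises from 4/6 to 1.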

plot_uncertainty_map

Create heatmap showing prediction uncertainty across the score range. Useful for AI models with calibrated confidence estimates.

plot_cost_surface

Visualize expected costs across different grey-zone boundary configurations. Helps identify cost-optimal grey-zone width.

fuzzy_membership

Use fuzzy set theory to model gradual transition between definite positive, grey zone, and definite negative. Provides soft boundaries instead of hard cutoffs.
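One simple way to obtain soft boundaries is sketched below using piecewise-linear memberships that sum to 1. The negative membership falls from 1 at the lower boundary to 0 at the midpoint, and the positive membership rises from 0 at the midpoint to 1 at the upper boundary. The membership shapes the package actually uses are not documented here, so `fuzzy_membership()` and its triangular form are assumptions.

```r
# Hypothetical sketch: triangular fuzzy memberships for negative / grey /
# positive that form a partition (each row sums to 1).
fuzzy_membership <- function(score, lower, upper) {
  mid <- (lower + upper) / 2
  clamp01 <- function(x) pmin(pmax(x, 0), 1)
  negative <- clamp01((mid - score) / (mid - lower))
  positive <- clamp01((score - mid) / (upper - mid))
  cbind(negative = negative, grey = 1 - negative - positive, positive = positive)
}

fuzzy_membership(c(0.40, 0.475, 0.50, 0.60), lower = 0.45, upper = 0.55)
```

In this choice the grey membership peaks at the midpoint and fades linearly to zero at the boundaries. Trapezoidal or sigmoid shapes are equally common alternatives.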

bayesian_grey_zone

Use Bayesian approach to define grey zone based on posterior probability intervals. Incorporates prior information and uncertainty quantification.
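As an illustrative sketch only, one Bayesian construction bins the scores and places a Beta prior on the disease probability in each bin. A bin is then treated as grey when the equal-tailed credible interval of the posterior straddles the decision threshold. The binning, the Beta(1, 1) prior and the helper name `bayes_grey_bins()` are all assumptions, not the package's documented method.

```r
# Hypothetical sketch: Beta-binomial posterior for P(disease) per score bin;
# a bin is "grey" if its credible interval contains the decision threshold.
bayes_grey_bins <- function(score, is_positive, breaks = seq(0, 1, by = 0.1),
                            threshold = 0.5, cred = 0.95, a0 = 1, b0 = 1) {
  bin <- cut(score, breaks = breaks, include.lowest = TRUE)
  n <- as.vector(table(bin))                 # cases per bin (0 if empty)
  k <- as.vector(table(bin[is_positive]))    # diseased cases per bin
  tail <- (1 - cred) / 2
  lo <- qbeta(tail, a0 + k, b0 + n - k)
  hi <- qbeta(1 - tail, a0 + k, b0 + n - k)
  data.frame(bin = levels(bin), n = n, lower = lo, upper = hi,
             grey = lo < threshold & hi > threshold)
}

score <- c(rep(0.05, 50), rep(0.95, 50))
is_positive <- c(rep(FALSE, 50), rep(TRUE, 50))
bayes_grey_bins(score, is_positive)
```

Bins with many cases of one class get narrow intervals away from 0.5 and receive definite calls. Empty bins keep the flat prior's wide interval and fall in the grey zone.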

stratified_grey_zone

Perform grey-zone analysis stratified by important covariates to assess whether grey-zone boundaries should vary by subpopulation.

stratify_by

Variable for stratified analysis. Each level receives a separate grey-zone analysis.
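A minimal sketch of a per-stratum summary is shown below. It assumes a data frame with hypothetical columns `stratum`, `score` and `is_positive`, and fixed inclusive boundaries. It reports each level's grey-zone percentage and the accuracy of its definite calls, which is one way to see whether a single set of boundaries suits all subpopulations.

```r
# Hypothetical sketch: grey-zone percentage and definite-call accuracy
# computed separately for each level of `stratum`.
stratified_grey_summary <- function(df, lower = 0.45, upper = 0.55) {
  per_level <- lapply(split(df, df$stratum), function(d) {
    grey <- d$score >= lower & d$score <= upper
    correct <- (d$score > upper & d$is_positive) | (d$score < lower & !d$is_positive)
    data.frame(stratum = as.character(d$stratum[1]),
               n = nrow(d),
               pct_grey = 100 * mean(grey),
               accuracy_definite = sum(correct) / max(sum(!grey), 1))
  })
  do.call(rbind, per_level)
}

df <- data.frame(stratum = c("A", "A", "A", "B", "B", "B", "B"),
                 score = c(0.1, 0.5, 0.9, 0.2, 0.3, 0.8, 0.52),
                 is_positive = c(FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE))
stratified_grey_summary(df)
```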

clinical_scenario

Clinical scenario for context-specific interpretation and recommendations.

missing_handling

Method for handling missing predictor or outcome data.

random_seed

Random seed for bootstrap and other stochastic procedures.

Value

A results object containing:

results$instructions: an HTML element
results$greyZoneSummary: a table
results$greyZoneBoundaries: a table
results$definitePerformanceTable: a table
results$allCasesPerformanceTable: a table
results$greyZoneCharacteristics: a table
results$classificationBreakdown: a table
results$costAnalysisTable: a table
results$tradeoffAnalysisTable: a table
results$stratifiedGreyZoneTable: a table
results$greyZoneROCPlot: an image
results$thresholdDistributionsPlot: an image
results$greyZoneSizePlot: an image
results$uncertaintyMapPlot: an image
results$costSurfacePlot: an image
results$clinicalRecommendations: an HTML element

Tables can be converted to data frames with asDF or as.data.frame. For example:

results$greyZoneSummary$asDF

as.data.frame(results$greyZoneSummary)

Examples

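# Simulated example data; replace with your own data frame
set.seed(123)
n <- 200
true_diagnosis <- factor(sample(c("healthy", "disease"), n, replace = TRUE),
                         levels = c("healthy", "disease"))
# Diseased cases tend to receive higher AI probabilities
ai_probability <- plogis(rnorm(n, mean = ifelse(true_diagnosis == "disease", 1, -1)))
ai_predictions <- data.frame(ai_probability, true_diagnosis)

result <- greyzoneroc(
    data = ai_predictions,
    predictor = "ai_probability",
    outcome = "true_diagnosis",
    positive_level = "disease",
    grey_zone_width = 0.1,
    confidence_threshold = 0.80
)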