Simplified AI model validation tool for comparing diagnostic performance. Calculates AUC, sensitivity, and specificity for each predictor variable and performs statistical comparison using the DeLong test.

Usage

aivalidation(
  data,
  predictorVars,
  outcomeVar = NULL,
  positiveLevel,
  compareModels = FALSE,
  youdensJ = FALSE,
  matthewsCC = FALSE,
  bootstrapCI = FALSE,
  nBootstrap = 1000,
  rocPlot = FALSE,
  crossValidation = "none",
  stratified = TRUE,
  randomSeed = 42,
  showExplanations = FALSE,
  showSummaries = FALSE
)

Arguments

data

the data as a data frame

predictorVars

a vector of strings naming the predictor variables (AI scores, human scores, biomarkers, etc.) from data. Only the first 5 are used for pairwise comparisons.

outcomeVar

a string naming the binary outcome variable (gold standard) from data

positiveLevel

the level of the outcome variable which represents the positive case

compareModels

perform statistical comparison between models using DeLong test for AUC comparison

youdensJ

calculate and display Youden's J statistic (Sensitivity + Specificity - 1)

matthewsCC

calculate and display Matthews Correlation Coefficient (MCC)

bootstrapCI

use bootstrap resampling for confidence intervals (more robust for small samples)

nBootstrap

number of bootstrap iterations (higher values are more accurate but slower)

rocPlot

generate ROC curves for all predictor variables

crossValidation

cross-validation method for model validation (simplified to avoid resource limits)

stratified

maintain outcome variable proportions across folds

randomSeed

random seed for reproducible cross-validation results

showExplanations

show detailed methodology explanations

showSummaries

show interpretation summaries of results
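
For intuition, the two optional metrics above can be computed by hand from a 2x2 confusion matrix. The sketch below is illustrative base R with made-up counts, not aivalidation's internal code:

```r
# Illustrative counts for a 2x2 confusion matrix (made up for this sketch)
tp <- 80; fn <- 15   # positive cases: correctly / incorrectly classified
tn <- 62; fp <- 38   # negative cases: correctly / incorrectly classified

sens <- tp / (tp + fn)        # sensitivity (true positive rate)
spec <- tn / (tn + fp)        # specificity (true negative rate)

youdens_j <- sens + spec - 1  # Youden's J: 0 = chance, 1 = perfect

# Matthews Correlation Coefficient; cast to double so the product in the
# denominator cannot overflow with large integer counts
mcc <- (tp * tn - fp * fn) /
  sqrt(as.numeric(tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

round(c(youdens_j = youdens_j, mcc = mcc), 3)
```

Unlike accuracy, both metrics stay informative when the outcome classes are imbalanced, which is why they are offered alongside sensitivity and specificity.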

Value

A results object containing:

results$instructions                an html
results$performanceTable            Performance metrics for each predictor variable
results$comparisonTable             Statistical comparison between predictor models using DeLong test
results$cvPerformanceTable          Cross-validated performance metrics for each predictor
results$rocPlot                     ROC curves for all predictor models
results$methodologyExplanation      an html
results$resultsInterpretation       an html

Tables can be converted to data frames with asDF or as.data.frame. For example:

results$performanceTable$asDF

as.data.frame(results$performanceTable)
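
The bootstrapCI and nBootstrap arguments resample the data to estimate confidence intervals. A minimal percentile-bootstrap sketch for an AUC 95% CI, using simulated stand-in data and a base-R Mann-Whitney AUC (not the function's internal implementation):

```r
set.seed(42)
outcome <- rep(c(1, 0), each = 60)            # 1 = positive case
score   <- c(rnorm(60, mean = 1), rnorm(60))  # higher scores in positives

# AUC via the Mann-Whitney identity: U statistic scaled by n1 * n0
auc_mw <- function(y, s) {
  r  <- rank(s)
  n1 <- sum(y == 1); n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

n_boot <- 1000                                 # cf. the nBootstrap argument
aucs <- replicate(n_boot, {
  idx <- sample(seq_along(score), replace = TRUE)
  auc_mw(outcome[idx], score[idx])
})
quantile(aucs, c(0.025, 0.975))                # percentile 95% CI
```

Because each bootstrap AUC comes from a resample of the same size as the data, the spread of `aucs` reflects sampling variability directly, which is why this approach is more robust for small samples than a normal-approximation interval.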

Examples

# \donttest{
data('medical_ai_data', package='ClinicoPath')

aivalidation(data = medical_ai_data,
            predictorVars = c('AI_score', 'human_score', 'biomarker1'),
            outcomeVar = 'diagnosis',
            positiveLevel = 'positive',
            compareModels = TRUE)
#> 
#>  AI MODEL VALIDATION
#> 
#>  Model Performance Metrics                                                                                               
#>  ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── 
#>    Predictor      AUC          AUC 95% CI Lower    AUC 95% CI Upper    Sensitivity    Specificity    Optimal Threshold   
#>  ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── 
#>    AI_score       0.6937368           0.6194502           0.7680235      0.8315789      0.5100000           0.38150000   
#>    human_score    0.6360526           0.5578771           0.7142281      0.4736842      0.7700000           0.58800000   
#>    biomarker1     0.6779474           0.6026435           0.7532512      0.7052632      0.6200000          -0.07500000   
#>  ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── 
#> 
#> 
#>  Model Comparison (DeLong Test)                                                              
#>  ─────────────────────────────────────────────────────────────────────────────────────────── 
#>    Comparison                   AUC (Model 1)    AUC (Model 2)    Difference     p-value     
#>  ─────────────────────────────────────────────────────────────────────────────────────────── 
#>    AI_score vs human_score          0.6937368        0.6360526     0.05768421    0.0063090   
#>    AI_score vs biomarker1           0.6937368        0.6779474     0.01578947    0.7837605   
#>    human_score vs biomarker1        0.6360526        0.6779474    -0.04189474    0.4655825   
#>  ─────────────────────────────────────────────────────────────────────────────────────────── 
#> 
#> 
#>  Cross-Validation Performance                                                                                     
#>  ──────────────────────────────────────────────────────────────────────────────────────────────────────────────── 
#>    Predictor    Mean AUC     SD AUC    Mean Sensitivity    SD Sensitivity    Mean Specificity    SD Specificity   
#>  ──────────────────────────────────────────────────────────────────────────────────────────────────────────────── 
#>  ──────────────────────────────────────────────────────────────────────────────────────────────────────────────── 
#> 
# }
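
compareModels = TRUE reports a DeLong test per pair of predictors, as in the comparison table above. A comparable standalone check can be run with the pROC package (assuming pROC is installed; the simulated data below stand in for real paired scores):

```r
library(pROC)

set.seed(42)
outcome     <- factor(rep(c("positive", "negative"), each = 100))
ai_score    <- c(rnorm(100, mean = 1.0), rnorm(100))  # stronger predictor
human_score <- c(rnorm(100, mean = 0.5), rnorm(100))  # weaker predictor

roc_ai    <- roc(outcome, ai_score,
                 levels = c("negative", "positive"), direction = "<")
roc_human <- roc(outcome, human_score,
                 levels = c("negative", "positive"), direction = "<")

# DeLong test for two correlated ROC curves; the test is paired because
# both scores were measured on the same cases
roc.test(roc_ai, roc_human, method = "delong")
```

The pairing matters: both scores are observed on the same cases, so their AUC estimates are correlated, and the DeLong variance accounts for that correlation rather than treating the two curves as independent.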