Advanced TNM Stage Migration Analysis: A Comprehensive Guide

Introduction

The stagemigration function in ClinicoPath provides state-of-the-art statistical methods for validating TNM staging system improvements. This comprehensive guide demonstrates how to use advanced features for robust staging validation in pathology research.

What is Stage Migration Analysis?

Stage migration analysis, also known as the Will Rogers phenomenon, occurs when patients are reclassified from one staging system to another. This analysis helps pathologists and clinicians evaluate whether a new staging system provides better prognostic discrimination than existing systems.

Key Applications

TNM Edition Transitions: Validating AJCC 7th to 8th edition changes
Biomarker Integration: Adding molecular markers to traditional staging
Institution-Specific Modifications: Customizing staging for local populations
Multi-Institutional Harmonization: Standardizing staging across centers

Clinical Significance

Proper staging validation is crucial because:

Treatment Planning: Staging guides therapeutic decisions
Prognosis: Patients and families need accurate survival estimates
Clinical Trials: Enrollment and stratification depend on accurate staging
Healthcare Resources: Staging influences follow-up intensity and costs

Getting Started

Installation and Setup

# Install ClinicoPath if not already installed
# install.packages("ClinicoPath")

# Load required libraries
library(ClinicoPath)
library(dplyr)
library(ggplot2)
library(survival)

Sample Data Overview

The package includes comprehensive test datasets for different cancer types:

# Load lung cancer staging data
load("stagemigration_lung_cancer.rda")

# Examine the data structure
head(lung_cancer_data)
str(lung_cancer_data)

# Check staging distribution
table(lung_cancer_data$old_stage, lung_cancer_data$new_stage)

Data Requirements

Your dataset should include:

Old staging variable: Original staging system (e.g., TNM 7th edition)
New staging variable: Proposed staging system (e.g., TNM 8th edition)
Survival time: Time to event or censoring (months recommended)
Event indicator: Death or event occurrence (1 = event, 0 = censored)
Patient characteristics: Age, sex, histology, performance status, etc.

Basic Stage Migration Analysis

Simple Comparison

Start with a basic analysis to understand migration patterns:

# Basic stage migration analysis
basic_result <- stagemigration(
  data = lung_cancer_data,
  oldStage = "old_stage",
  newStage = "new_stage",
  survivalTime = "survival_time",
  event = "event",
  eventLevel = "1",
  analysisType = "basic"
)

# View migration matrix
print(basic_result$results$migrationMatrix)

# View stage-specific survival statistics
print(basic_result$results$stageSummary)

Understanding Migration Patterns

The migration matrix shows how patients move between staging systems:

# Calculate migration statistics
migration_stats <- lung_cancer_data %>%
  mutate(
    migration_type = case_when(
      old_stage == new_stage ~ "No Change",
      as.numeric(old_stage) < as.numeric(new_stage) ~ "Upstaging",
      as.numeric(old_stage) > as.numeric(new_stage) ~ "Downstaging"
    )
  ) %>%
  count(migration_type) %>%
  mutate(percentage = round(n / sum(n) * 100, 1))

print(migration_stats)

Standard Validation Analysis

Comprehensive Statistical Assessment

For publication-quality staging validation, use the standard analysis:

# Standard staging validation
standard_result <- stagemigration(
  data = lung_cancer_data,
  oldStage = "old_stage",
  newStage = "new_stage",
  survivalTime = "survival_time",
  event = "event",
  eventLevel = "1",
  analysisType = "standard",
  
  # Advanced metrics
  calculateNRI = TRUE,
  calculateIDI = TRUE,
  performROCAnalysis = TRUE,
  performDCA = TRUE,
  
  # Validation
  performBootstrap = TRUE,
  bootstrapReps = 1000,
  
  # Cancer-specific guidance
  cancerType = "lung",
  
  # Visualization
  showMigrationHeatmap = TRUE,
  showROCComparison = TRUE,
  showForestPlot = TRUE
)

# View key results
print(standard_result$results$concordanceComparison)
print(standard_result$results$nriResults)
print(standard_result$results$clinicalInterpretation)

Key Metrics Explained

Harrell’s C-Index

The C-index measures discriminative ability:

Range: 0.5 (no discrimination) to 1.0 (perfect discrimination)
Clinical Significance: Improvement ≥ 0.02 is meaningful
Interpretation: Higher values indicate better prognostic separation

Net Reclassification Improvement (NRI)

NRI quantifies improvement in risk classification:

# Interpret NRI results
nri_results <- standard_result$results$nriResults$asDF

# NRI > 0.20 (20%) is considered clinically significant
# NRI > 0.60 (60%) is considered substantial improvement
for (i in 1:nrow(nri_results)) {
  time_point <- nri_results$TimePoint[i]
  nri_value <- nri_results$NRI[i]
  
  interpretation <- case_when(
    nri_value > 0.60 ~ "Substantial improvement",
    nri_value > 0.20 ~ "Clinically significant improvement",
    nri_value > 0.10 ~ "Modest improvement",
    nri_value > 0.00 ~ "Minimal improvement",
    TRUE ~ "No improvement or worsening"
  )
  
  cat(sprintf("Time %s months: NRI = %.3f (%s)\n", 
              time_point, nri_value, interpretation))
}

Integrated Discrimination Improvement (IDI)

IDI measures improvement in predicted probabilities:

# Interpret IDI results
idi_results <- standard_result$results$idiResults$asDF

# IDI > 0.02 is considered clinically meaningful
# IDI > 0.05 is considered substantial
idi_value <- idi_results$IDI[1]

idi_interpretation <- case_when(
  idi_value > 0.05 ~ "Substantial improvement in risk prediction",
  idi_value > 0.02 ~ "Clinically meaningful improvement",
  idi_value > 0.01 ~ "Modest improvement",
  idi_value > 0.00 ~ "Minimal improvement",
  TRUE ~ "No improvement"
)

cat(sprintf("IDI = %.4f (%s)\n", idi_value, idi_interpretation))

Advanced Analysis Features

Decision Curve Analysis

DCA evaluates clinical utility across decision thresholds:

# Analysis with Decision Curve Analysis
dca_result <- stagemigration(
  data = lung_cancer_data,
  oldStage = "old_stage",
  newStage = "new_stage",
  survivalTime = "survival_time",
  event = "event",
  eventLevel = "1",
  analysisType = "comprehensive",
  
  # Decision Curve Analysis
  performDCA = TRUE,
  performCalibration = TRUE,
  
  # Clinical thresholds
  clinicalSignificanceThreshold = 0.02,
  nriClinicalThreshold = 0.20,
  
  # Cancer-specific settings
  cancerType = "lung",
  
  # Comprehensive outputs
  showDecisionCurves = TRUE,
  showCalibrationPlots = TRUE,
  showClinicalInterpretation = TRUE,
  generateExecutiveSummary = TRUE
)

# View Decision Curve results
print(dca_result$results$dcaResults)

# View clinical interpretation
print(dca_result$results$clinicalInterpretation)

Bootstrap Validation

Internal validation with optimism correction:

# Comprehensive analysis with bootstrap validation
bootstrap_result <- stagemigration(
  data = lung_cancer_data,
  oldStage = "old_stage",
  newStage = "new_stage",
  survivalTime = "survival_time",
  event = "event",
  eventLevel = "1",
  analysisType = "comprehensive",
  
  # Bootstrap validation
  performBootstrap = TRUE,
  bootstrapReps = 2000,
  useOptimismCorrection = TRUE,
  
  # Cross-validation
  performCrossValidation = TRUE,
  cvFolds = 5,
  
  # All advanced methods
  calculateNRI = TRUE,
  calculateIDI = TRUE,
  performROCAnalysis = TRUE,
  performDCA = TRUE,
  
  # Model testing
  performLikelihoodTests = TRUE,
  calculatePseudoR2 = TRUE,
  performHomogeneityTests = TRUE,
  performTrendTests = TRUE,
  
  # Cancer-specific
  cancerType = "lung"
)

# View bootstrap results
print(bootstrap_result$results$bootstrapResults)

# View optimism-corrected performance
bootstrap_summary <- bootstrap_result$results$bootstrapResults$asDF
corrected_cindex <- bootstrap_summary$Corrected[bootstrap_summary$Metric == "C-Index"]
cat(sprintf("Optimism-corrected C-index improvement: %.4f\n", corrected_cindex))

Cancer-Specific Analysis

Lung Cancer Staging

# Lung cancer specific analysis
lung_result <- stagemigration(
  data = lung_cancer_data,
  oldStage = "old_stage",
  newStage = "new_stage",
  survivalTime = "survival_time",
  event = "event",
  eventLevel = "1",
  analysisType = "publication",
  
  # Lung cancer specific settings
  cancerType = "lung",
  
  # Time points relevant to lung cancer
  nriTimePoints = "6, 12, 24, 36, 60",
  rocTimePoints = "6, 12, 24, 36, 60",
  
  # Comprehensive analysis
  calculateNRI = TRUE,
  calculateIDI = TRUE,
  performROCAnalysis = TRUE,
  performDCA = TRUE,
  performBootstrap = TRUE,
  bootstrapReps = 1000,
  
  # Publication ready outputs
  showClinicalInterpretation = TRUE,
  generateExecutiveSummary = TRUE,
  showStatisticalSummary = TRUE,
  showMethodologyNotes = TRUE,
  includeEffectSizes = TRUE
)

# View lung cancer specific guidance
print(lung_result$results$clinicalInterpretation)

Breast Cancer Staging

# Load breast cancer data
load("stagemigration_breast_cancer.rda")

# Breast cancer specific analysis
breast_result <- stagemigration(
  data = breast_cancer_data,
  oldStage = "old_stage",
  newStage = "new_stage",
  survivalTime = "survival_time",
  event = "event",
  eventLevel = "1",
  analysisType = "comprehensive",
  
  # Breast cancer specific settings
  cancerType = "breast",
  
  # Breast cancer relevant time points (longer follow-up)
  nriTimePoints = "12, 24, 60, 120",
  rocTimePoints = "12, 24, 60, 120",
  
  # Clinical significance thresholds (more conservative for breast cancer)
  clinicalSignificanceThreshold = 0.015,
  nriClinicalThreshold = 0.15,
  
  # Comprehensive methods
  calculateNRI = TRUE,
  calculateIDI = TRUE,
  performROCAnalysis = TRUE,
  performDCA = TRUE,
  performBootstrap = TRUE,
  bootstrapReps = 1000,
  
  # Visualization
  showMigrationHeatmap = TRUE,
  showROCComparison = TRUE,
  showCalibrationPlots = TRUE,
  showDecisionCurves = TRUE,
  showForestPlot = TRUE
)

# Compare breast cancer specific recommendations
print(breast_result$results$clinicalInterpretation)

Colorectal Cancer Staging

# Load colorectal cancer data
load("stagemigration_colorectal_cancer.rda")

# Colorectal cancer specific analysis
colorectal_result <- stagemigration(
  data = colorectal_cancer_data,
  oldStage = "old_stage",
  newStage = "new_stage",
  survivalTime = "survival_time",
  event = "event",
  eventLevel = "1",
  analysisType = "comprehensive",
  
  # Colorectal cancer specific settings
  cancerType = "colorectal",
  
  # Colorectal cancer time points
  nriTimePoints = "12, 24, 36, 60",
  rocTimePoints = "12, 24, 36, 60",
  
  # Standard thresholds
  clinicalSignificanceThreshold = 0.02,
  nriClinicalThreshold = 0.20,
  
  # Full analysis
  calculateNRI = TRUE,
  calculateIDI = TRUE,
  performROCAnalysis = TRUE,
  performDCA = TRUE,
  performBootstrap = TRUE,
  bootstrapReps = 1000,
  
  # Advanced tests
  performLikelihoodTests = TRUE,
  calculatePseudoR2 = TRUE,
  performHomogeneityTests = TRUE,
  performTrendTests = TRUE,
  showWillRogersAnalysis = TRUE,
  
  # Comprehensive outputs
  showClinicalInterpretation = TRUE,
  generateExecutiveSummary = TRUE
)

# View colorectal cancer specific results
print(colorectal_result$results$clinicalInterpretation)

Interpreting Results

Statistical Significance vs Clinical Importance

# Framework for interpreting staging validation results
interpret_staging_validation <- function(result, cancer_type = "general") {
  
  # Extract key metrics
  concordance <- result$results$concordanceComparison$asDF
  nri <- result$results$nriResults$asDF
  clinical_interp <- result$results$clinicalInterpretation$asDF
  
  # C-index interpretation
  c_index_old <- concordance$C_Index[concordance$Model == "Old Staging"]
  c_index_new <- concordance$C_Index[concordance$Model == "New Staging"]
  c_index_diff <- c_index_new - c_index_old
  
  # NRI interpretation (using 24-month if available)
  nri_24 <- nri$NRI[nri$TimePoint == "24"]
  if (length(nri_24) == 0) nri_24 <- nri$NRI[1]
  
  # Clinical decision framework
  cat("=== STAGING VALIDATION INTERPRETATION ===\n\n")
  
  # C-index assessment
  cat("1. DISCRIMINATION (C-INDEX):\n")
  cat(sprintf("   Old staging C-index: %.3f\n", c_index_old))
  cat(sprintf("   New staging C-index: %.3f\n", c_index_new))
  cat(sprintf("   Improvement: %.4f\n", c_index_diff))
  
  if (c_index_diff >= 0.02) {
    cat("   ✓ CLINICALLY SIGNIFICANT improvement\n")
  } else if (c_index_diff > 0.01) {
    cat("   ⚠ MODEST improvement (borderline significance)\n")
  } else if (c_index_diff > 0.005) {
    cat("   ⚠ MINIMAL improvement\n")
  } else {
    cat("   ✗ NO meaningful improvement\n")
  }
  
  cat("\n2. RECLASSIFICATION (NRI):\n")
  cat(sprintf("   24-month NRI: %.3f\n", nri_24))
  
  if (nri_24 > 0.20) {
    cat("   ✓ CLINICALLY SIGNIFICANT reclassification improvement\n")
  } else if (nri_24 > 0.10) {
    cat("   ⚠ MODEST reclassification improvement\n")
  } else if (nri_24 > 0.05) {
    cat("   ⚠ MINIMAL reclassification improvement\n")
  } else {
    cat("   ✗ NO meaningful reclassification improvement\n")
  }
  
  # Overall recommendation
  cat("\n3. OVERALL RECOMMENDATION:\n")
  
  if (c_index_diff >= 0.02 && nri_24 > 0.20) {
    cat("   ✓ STRONGLY RECOMMEND adopting new staging system\n")
    cat("   • Both discrimination and reclassification show significant improvement\n")
    cat("   • Clinical benefit is well-established\n")
  } else if (c_index_diff >= 0.015 || nri_24 > 0.15) {
    cat("   ⚠ CONSIDER adopting new staging system\n")
    cat("   • Moderate improvement shown\n")
    cat("   • Additional validation recommended\n")
  } else {
    cat("   ✗ INSUFFICIENT evidence for adoption\n")
    cat("   • Improvements are below clinical significance thresholds\n")
    cat("   • Consider larger sample size or different approach\n")
  }
  
  cat("\n4. IMPLEMENTATION CONSIDERATIONS:\n")
  cat("   • Training requirements for new staging system\n")
  cat("   • Cost-benefit analysis of system change\n")
  cat("   • Integration with existing clinical workflows\n")
  cat("   • External validation in independent cohorts\n")
  
  return(invisible(list(
    c_index_improvement = c_index_diff,
    nri_improvement = nri_24,
    recommendation = ifelse(c_index_diff >= 0.02 && nri_24 > 0.20, "Adopt", 
                           ifelse(c_index_diff >= 0.015 || nri_24 > 0.15, "Consider", "Insufficient"))
  )))
}

# Apply interpretation framework
interpretation <- interpret_staging_validation(lung_result, "lung")

Sample Size Considerations

# Analyze different sample sizes
sample_sizes <- c(100, 200, 500, 1000)

sample_size_results <- list()

for (n in sample_sizes) {
  
  # Create subsample
  subsample <- lung_cancer_data %>% 
    slice_sample(n = n) %>%
    mutate(event = as.numeric(event))
  
  # Run analysis
  result <- stagemigration(
    data = subsample,
    oldStage = "old_stage",
    newStage = "new_stage",
    survivalTime = "survival_time",
    event = "event",
    eventLevel = "1",
    analysisType = "standard",
    calculateNRI = TRUE,
    calculateIDI = TRUE,
    cancerType = "lung",
    performBootstrap = FALSE  # Skip for speed
  )
  
  # Extract key metrics
  concordance <- result$results$concordanceComparison$asDF
  c_index_diff <- concordance$C_Index[concordance$Model == "New Staging"] - 
                  concordance$C_Index[concordance$Model == "Old Staging"]
  
  nri_result <- result$results$nriResults$asDF
  nri_24 <- nri_result$NRI[nri_result$TimePoint == "24"]
  if (length(nri_24) == 0) nri_24 <- nri_result$NRI[1]
  
  sample_size_results[[as.character(n)]] <- list(
    n = n,
    c_index_improvement = c_index_diff,
    nri_improvement = nri_24,
    events = sum(subsample$event),
    event_rate = mean(subsample$event)
  )
}

# Summarize sample size effects
sample_size_summary <- do.call(rbind, lapply(sample_size_results, function(x) {
  data.frame(
    SampleSize = x$n,
    Events = x$events,
    EventRate = round(x$event_rate, 3),
    CIndexImprovement = round(x$c_index_improvement, 4),
    NRIImprovement = round(x$nri_improvement, 3)
  )
}))

print(sample_size_summary)

# Recommendations for sample size
cat("\n=== SAMPLE SIZE RECOMMENDATIONS ===\n")
cat("• Minimum 200 patients for basic analysis\n")
cat("• Minimum 500 patients for reliable NRI/IDI estimation\n")
cat("• Minimum 1000 patients for comprehensive validation\n")
cat("• At least 100 events for stable results\n")
cat("• Consider power analysis for specific effect sizes\n")

Publication-Ready Analysis

Complete Workflow for Manuscript

# Complete publication-ready analysis
publication_result <- stagemigration(
  data = lung_cancer_data,
  oldStage = "old_stage",
  newStage = "new_stage",
  survivalTime = "survival_time",
  event = "event",
  eventLevel = "1",
  analysisType = "publication",
  
  # Complete statistical analysis
  calculateNRI = TRUE,
  calculateIDI = TRUE,
  performROCAnalysis = TRUE,
  performDCA = TRUE,
  performCalibration = TRUE,
  
  # Validation
  performBootstrap = TRUE,
  bootstrapReps = 2000,
  useOptimismCorrection = TRUE,
  performCrossValidation = TRUE,
  cvFolds = 10,
  
  # Model comparison
  performLikelihoodTests = TRUE,
  calculatePseudoR2 = TRUE,
  performHomogeneityTests = TRUE,
  performTrendTests = TRUE,
  showWillRogersAnalysis = TRUE,
  
  # Clinical assessment
  clinicalSignificanceThreshold = 0.02,
  nriClinicalThreshold = 0.20,
  cancerType = "lung",
  
  # Time points
  nriTimePoints = "6, 12, 18, 24, 36, 60",
  rocTimePoints = "6, 12, 18, 24, 36, 60",
  
  # Comprehensive outputs
  showClinicalInterpretation = TRUE,
  generateExecutiveSummary = TRUE,
  showStatisticalSummary = TRUE,
  showMethodologyNotes = TRUE,
  includeEffectSizes = TRUE,
  
  # Visualizations
  showMigrationHeatmap = TRUE,
  showROCComparison = TRUE,
  showCalibrationPlots = TRUE,
  showDecisionCurves = TRUE,
  showForestPlot = TRUE,
  
  # Confidence intervals
  showConfidenceIntervals = TRUE,
  confidenceLevel = 0.95
)

# Generate manuscript tables
cat("=== TABLE 1: PATIENT CHARACTERISTICS ===\n")
# This would typically be generated separately with tableone or similar

cat("\n=== TABLE 2: STAGING MIGRATION MATRIX ===\n")
print(publication_result$results$migrationMatrix)

cat("\n=== TABLE 3: DISCRIMINATION COMPARISON ===\n")
print(publication_result$results$concordanceComparison)

cat("\n=== TABLE 4: RECLASSIFICATION METRICS ===\n")
print(publication_result$results$nriResults)

cat("\n=== TABLE 5: CLINICAL INTERPRETATION ===\n")
print(publication_result$results$clinicalInterpretation)

cat("\n=== EXECUTIVE SUMMARY ===\n")
# Extract executive summary if available
if ("executiveSummary" %in% names(publication_result$results)) {
  print(publication_result$results$executiveSummary)
}

Statistical Reporting Template

# Template for statistical reporting
generate_methods_text <- function(result, cancer_type) {
  
  cat("=== METHODS SECTION TEXT ===\n\n")
  
  cat("Statistical Analysis:\n")
  cat("Stage migration analysis was performed using the ClinicoPath package in R. ")
  cat("We compared the discriminative ability of the original and new staging systems ")
  cat("using Harrell's C-index with 95% confidence intervals. Net Reclassification ")
  cat("Improvement (NRI) was calculated at 6, 12, 24, 36, and 60 months to quantify ")
  cat("improvement in risk classification. Integrated Discrimination Improvement (IDI) ")
  cat("was used to assess overall improvement in predicted survival probabilities.\n\n")
  
  cat("Time-dependent receiver operating characteristic (ROC) analysis was performed ")
  cat("to evaluate discriminative ability over time. Decision curve analysis (DCA) ")
  cat("was used to assess the clinical utility of each staging system across different ")
  cat("decision thresholds. Bootstrap validation with 2000 repetitions was performed ")
  cat("to assess internal validity and apply optimism correction.\n\n")
  
  cat("Clinical significance was defined as C-index improvement ≥0.02 and NRI >20%. ")
  cat("Likelihood ratio tests were used to compare nested models. All statistical ")
  cat("tests were two-sided with significance level α=0.05.\n\n")
  
  cat("=== RESULTS SECTION TEXT ===\n\n")
  
  # Extract key results
  concordance <- result$results$concordanceComparison$asDF
  nri <- result$results$nriResults$asDF
  
  c_index_old <- concordance$C_Index[concordance$Model == "Old Staging"]
  c_index_new <- concordance$C_Index[concordance$Model == "New Staging"]
  c_index_diff <- c_index_new - c_index_old
  
  nri_24 <- nri$NRI[nri$TimePoint == "24"]
  if (length(nri_24) == 0) nri_24 <- nri$NRI[1]
  
  cat(sprintf("The original staging system achieved a C-index of %.3f (95%% CI: %.3f-%.3f), ",
              c_index_old, 
              concordance$CI_Lower[concordance$Model == "Old Staging"],
              concordance$CI_Upper[concordance$Model == "Old Staging"]))
  
  cat(sprintf("while the new staging system achieved a C-index of %.3f (95%% CI: %.3f-%.3f), ",
              c_index_new,
              concordance$CI_Lower[concordance$Model == "New Staging"],
              concordance$CI_Upper[concordance$Model == "New Staging"]))
  
  cat(sprintf("representing an improvement of %.4f (p<0.001).\n\n", c_index_diff))
  
  cat(sprintf("The 24-month Net Reclassification Improvement was %.3f (95%% CI: %.3f-%.3f), ",
              nri_24,
              nri$NRI_CI_Lower[nri$TimePoint == "24"],
              nri$NRI_CI_Upper[nri$TimePoint == "24"]))
  
  if (nri_24 > 0.20) {
    cat("indicating clinically significant improvement in risk classification.\n\n")
  } else {
    cat("indicating modest improvement in risk classification.\n\n")
  }
  
  cat("Bootstrap validation with optimism correction confirmed the stability of these results.\n\n")
  
  return(invisible(NULL))
}

# Generate methods and results text
generate_methods_text(publication_result, "lung")

Troubleshooting and Best Practices

Common Issues and Solutions

Small Sample Sizes

# Load small sample data
load("stagemigration_small_sample.rda")

# Analyze small sample with appropriate methods
small_sample_result <- stagemigration(
  data = small_sample_data,
  oldStage = "old_stage",
  newStage = "new_stage",
  survivalTime = "survival_time",
  event = "event",
  eventLevel = "1",
  analysisType = "basic",  # Use basic analysis for small samples
  
  # Reduce bootstrap repetitions
  performBootstrap = TRUE,
  bootstrapReps = 500,
  
  # Skip computationally intensive methods
  performCrossValidation = FALSE,
  performDCA = FALSE,
  
  # Use wider confidence intervals
  confidenceLevel = 0.90,
  
  # Appropriate clinical thresholds
  clinicalSignificanceThreshold = 0.03,  # More conservative
  nriClinicalThreshold = 0.25
)

# Check sample size adequacy
n_patients <- nrow(small_sample_data)
n_events <- sum(small_sample_data$event)
event_rate <- mean(small_sample_data$event)

cat(sprintf("Sample size: %d patients, %d events (%.1f%%)\n", 
            n_patients, n_events, event_rate * 100))

if (n_patients < 200) {
  cat("WARNING: Small sample size may limit reliability of results\n")
}

if (n_events < 50) {
  cat("WARNING: Low event count may affect statistical power\n")
}

Handling Missing Data

# Check for missing data
check_missing_data <- function(data) {
  missing_summary <- data %>%
    summarise_all(~sum(is.na(.))) %>%
    gather(variable, missing_count) %>%
    mutate(missing_percent = round(missing_count / nrow(data) * 100, 1)) %>%
    filter(missing_count > 0) %>%
    arrange(desc(missing_count))
  
  if (nrow(missing_summary) > 0) {
    cat("Missing data detected:\n")
    print(missing_summary)
    
    # Recommendations
    cat("\nRecommendations:\n")
    cat("• Consider multiple imputation for <10% missing\n")
    cat("• Use complete case analysis for >10% missing\n")
    cat("• Assess missingness pattern (MCAR, MAR, MNAR)\n")
  } else {
    cat("No missing data detected\n")
  }
  
  return(missing_summary)
}

# Check example data
missing_summary <- check_missing_data(lung_cancer_data)

Problematic Migration Patterns

# Load problematic data
load("stagemigration_problematic.rda")

# Analyze problematic migration patterns
problematic_result <- stagemigration(
  data = problematic_data,
  oldStage = "old_stage",
  newStage = "new_stage",
  survivalTime = "survival_time",
  event = "event",
  eventLevel = "1",
  analysisType = "basic",
  
  # Conservative settings
  confidenceLevel = 0.90,
  clinicalSignificanceThreshold = 0.03,
  nriClinicalThreshold = 0.30,
  
  # Focus on basic metrics
  calculateNRI = TRUE,
  performBootstrap = FALSE,
  showClinicalInterpretation = TRUE
)

# Analyze migration patterns
migration_analysis <- problematic_data %>%
  count(old_stage, new_stage) %>%
  group_by(old_stage) %>%
  mutate(
    total_in_old_stage = sum(n),
    migration_rate = n / total_in_old_stage,
    migration_type = case_when(
      old_stage == new_stage ~ "No Change",
      as.numeric(old_stage) < as.numeric(new_stage) ~ "Upstaging",
      as.numeric(old_stage) > as.numeric(new_stage) ~ "Downstaging"
    )
  )

print(migration_analysis)

# Identify problematic patterns
problematic_patterns <- migration_analysis %>%
  filter(migration_rate > 0.5, migration_type != "No Change")

if (nrow(problematic_patterns) > 0) {
  cat("WARNING: Problematic migration patterns detected:\n")
  print(problematic_patterns)
  cat("\nConsider:\n")
  cat("• Reviewing staging criteria\n")
  cat("• Checking for systematic biases\n")
  cat("• Validating in independent cohort\n")
}

Best Practices Summary

Data Preparation

Sample Size Planning
- Minimum 200 patients for basic analysis
- Minimum 500 patients for reliable advanced metrics
- At least 100 events for stable results
Data Quality
- Verify staging consistency
- Check for missing data patterns
- Validate event definitions
Time Points
- Use clinically relevant time points
- Consider cancer-specific survival patterns
- Include both short-term and long-term outcomes

Statistical Analysis

Method Selection
- Use basic analysis for small samples
- Use standard analysis for validation studies
- Use comprehensive analysis for publications
Validation
- Always perform bootstrap validation
- Use optimism correction
- Consider cross-validation for large samples
Clinical Significance
- Define thresholds before analysis
- Use cancer-specific thresholds when available
- Consider both statistical and clinical significance

Interpretation

Clinical Context
- Consider cancer type and stage distribution
- Assess practical implementation challenges
- Evaluate cost-benefit implications
External Validation
- Validate in independent cohorts
- Consider different populations
- Assess generalizability
Reporting
- Follow statistical reporting guidelines
- Include confidence intervals
- Provide clinical interpretation

Conclusion

The enhanced stagemigration function provides comprehensive tools for validating TNM staging improvements. Key features include:

Advanced Statistical Methods

Harrell’s C-index with bootstrap confidence intervals
Net Reclassification Improvement (NRI) for risk classification assessment
Integrated Discrimination Improvement (IDI) for prediction accuracy
Time-dependent ROC analysis for temporal discrimination
Decision Curve Analysis for clinical utility assessment

Clinical Decision Support

Cancer-specific guidance with appropriate thresholds
Clinical significance assessment beyond statistical significance
Implementation recommendations based on evidence quality
Effect size interpretation for practical impact

Robust Validation

Bootstrap validation with optimism correction
Cross-validation for independent assessment
Stability testing across different scenarios
Sensitivity analysis for threshold selection

Publication Support

Comprehensive reporting with methodology notes
Statistical summary tables for manuscripts
Executive summaries for stakeholders
Visualization outputs for presentations

The analysis helps pathologists make evidence-based decisions about staging system adoption, ensuring that new systems provide meaningful improvements in patient care and clinical outcomes.

For additional support and examples, refer to the ClinicoPath documentation and the package vignettes covering specific cancer types and advanced use cases.

This vignette demonstrates the comprehensive capabilities of the stagemigration function in ClinicoPath. For the latest features and updates, visit the ClinicoPath GitHub repository.

ClinicoPath Team

2025-07-13