jSurvival 16: Cox Proportional Hazards Model Diagnostics
ClinicoPath Development Team
2025-07-13
Source:vignettes/jsurvival-16-cox-model-diagnostics.Rmd
Introduction
The Cox proportional hazards model is one of the most widely used statistical methods in survival analysis for clinical research. However, like all statistical models, the validity of Cox regression results depends on certain assumptions being met. The coxdiagnostics function in ClinicoPath provides comprehensive diagnostic tools to validate Cox model assumptions and assess model adequacy.
Learning Objectives:
- Understand the key assumptions of Cox proportional hazards models
- Learn to interpret diagnostic plots for Cox model validation
- Master the use of residual analysis for model assessment
- Apply multicollinearity detection using VIF analysis
- Develop skills in identifying influential observations
- Create comprehensive diagnostic reports for clinical publications
Why Cox Model Diagnostics Matter
Cox model diagnostics are essential for:
- Model Validation: Ensuring model assumptions are met
- Clinical Validity: Confirming results can be trusted for medical decisions
- Publication Quality: Meeting standards for peer-reviewed research
- Regulatory Compliance: Supporting the statistical documentation expected by the FDA and other regulators
- Patient Safety: Ensuring reliable prognostic models
Cox Model Assumptions
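The Cox proportional hazards model relies on several key assumptions:
- Proportional hazards: the hazard ratio for each covariate is constant over time
- Linearity: continuous covariates are linearly related to the log hazard
- No overly influential observations: no single observation dominates the coefficient estimates
- No severe multicollinearity: the covariates are not strongly correlated with one another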
Diagnostic Plot Types
Martingale Residuals
- Purpose: Detect non-linear relationships and outliers
- Interpretation: Should be randomly scattered around zero
- Warning Signs: Patterns suggest non-linear relationships
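As a minimal sketch of what a martingale residual check looks like outside the module, the block below uses the survival package directly rather than coxdiagnostics. It assumes the `colon` data from survival, restricted to death records, and uses `age` purely as an illustrative covariate.

```r
library(survival)

# survival::colon has two rows per patient (etype 1 = recurrence, 2 = death);
# keep the death records only
d <- subset(colon, etype == 2)

# Null model: its martingale residuals hold the risk not yet explained by any covariate
fit0 <- coxph(Surv(time, status) ~ 1, data = d)
mart <- residuals(fit0, type = "martingale")

# Plot the residuals against a candidate covariate with a lowess smooth;
# a roughly straight smooth supports entering the covariate linearly
plot(d$age, mart, xlab = "Age (years)", ylab = "Martingale residual (null model)")
lines(lowess(d$age, mart), col = "red", lwd = 2)
abline(h = 0, lty = 2)
```

Martingale residuals equal the observed event indicator minus the expected number of events, so they can never exceed 1 and are typically skewed toward negative values.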
Deviance Residuals
- Purpose: Identify poorly fitted observations
- Interpretation: Should be approximately normally distributed
- Warning Signs: Values > 2 or < -2 indicate poor fit
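The following sketch shows one way to extract deviance residuals and list the observations outside the ±2 band using the survival package directly. The dataset (`colon`, death records) and the covariates are illustrative assumptions, not the internals of coxdiagnostics.

```r
library(survival)

# Death records with complete covariates, so residuals line up with the rows of d
d <- na.omit(subset(colon, etype == 2,
                    select = c(time, status, age, sex, nodes)))

fit <- coxph(Surv(time, status) ~ age + sex + nodes, data = d)

# Deviance residuals are a more symmetric transformation of martingale residuals
dev_res <- residuals(fit, type = "deviance")

# Positions of observations whose residual lies outside the +/-2 band
flagged <- which(abs(dev_res) > 2)
d[flagged, ]

# Quick look at the overall shape of the distribution
hist(dev_res, breaks = 30, main = "Deviance residuals", xlab = "Residual")
```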
Schoenfeld Residuals
- Purpose: Test proportional hazards assumption
- Interpretation: Should show no trend over time
- Warning Signs: Slopes indicate time-varying effects
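For readers who want to see the underlying test, here is a hedged sketch using `cox.zph()` from the survival package, which tests for a trend in the scaled Schoenfeld residuals over time. The model and dataset are illustrative assumptions; coxdiagnostics may present the results differently.

```r
library(survival)

# Death records from survival::colon
d <- subset(colon, etype == 2)
fit <- coxph(Surv(time, status) ~ age + sex + nodes, data = d)

# One test per model term plus a GLOBAL test across all terms
zph <- cox.zph(fit)
print(zph)

# Smoothed scaled Schoenfeld residuals for the first term;
# a flat curve is consistent with proportional hazards
plot(zph, var = 1)
abline(h = coef(fit)[1], lty = 2)
```

Small p-values in the printed table point to terms whose effect appears to change over follow-up.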
Basic Cox Model Diagnostics
Example 1: Melanoma Dataset
Let’s start with a basic Cox model diagnostic analysis using the melanoma dataset.
# Basic Cox diagnostics for melanoma data
coxdiagnostics(
  data = melanoma,
  time = "time",
  event = "status",
  covariates = c("age", "sex"),
  show_martingale = TRUE,
  show_deviance = TRUE,
  show_ph_test = TRUE,
  show_model_summary = TRUE
)
Example 2: Multiple Covariates with VIF Analysis
# Cox diagnostics with VIF analysis for multicollinearity
coxdiagnostics(
  data = melanoma,
  time = "time",
  event = "status",
  covariates = c("age", "sex", "thickness", "ulcer"),
  show_martingale = TRUE,
  show_deviance = TRUE,
  show_vif = TRUE,
  vif_threshold = 5.0,
  show_interpretation = TRUE
)
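To make the VIF threshold concrete, the sketch below computes variance inflation factors in base R from the covariate design matrix. It relies on the identity that the VIFs are the diagonal of the inverse correlation matrix of the predictors. How coxdiagnostics computes VIF internally is not documented here, so treat this as an illustration on assumed data (`colon` from the survival package) rather than a reproduction.

```r
library(survival)  # used only for the colon dataset

# Complete cases of the covariates of interest (death records)
d <- na.omit(subset(colon, etype == 2,
                    select = c(age, sex, nodes, extent)))

# Design matrix without the intercept column
X <- model.matrix(~ age + sex + nodes + extent, data = d)[, -1]

# VIF_j = 1 / (1 - R^2_j), which equals the j-th diagonal element
# of the inverse of the predictor correlation matrix
vif <- diag(solve(cor(X)))
print(round(vif, 2))

# Covariates above the threshold used in the example (5)
vif[vif > 5]
```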
Advanced Diagnostic Analysis
Example 3: Comprehensive Diagnostic Panel
# Comprehensive Cox diagnostics with all plot types
coxdiagnostics(
  data = colon,
  time = "time",
  event = "status",
  covariates = c("age", "sex", "nodes", "differ"),
  show_martingale = TRUE,
  show_deviance = TRUE,
  show_score = TRUE,
  show_schoenfeld = TRUE,
  show_dfbeta = TRUE,
  show_ph_test = TRUE,
  show_vif = TRUE,
  show_model_summary = TRUE,
  show_interpretation = TRUE
)
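The DFBETA plots requested above summarize how much each coefficient would change if a single observation were removed. A minimal sketch with the survival package follows. The standardized `dfbetas` residuals and the 2/sqrt(n) cutoff are assumptions borrowed from common regression practice, not settings taken from coxdiagnostics.

```r
library(survival)

# Death records with complete covariates
d <- na.omit(subset(colon, etype == 2,
                    select = c(time, status, age, sex, nodes)))
fit <- coxph(Surv(time, status) ~ age + sex + nodes, data = d)

# Approximate change in each coefficient (in standard-error units)
# when an observation is dropped: one row per observation, one column per term
dfb <- residuals(fit, type = "dfbetas")
colnames(dfb) <- names(coef(fit))

# Heuristic cutoff (assumption): flag rows whose largest |dfbetas| exceeds 2 / sqrt(n)
cutoff <- 2 / sqrt(nrow(d))
influential <- which(apply(abs(dfb), 1, max) > cutoff)
head(d[influential, ])
```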
Example 4: Stratified Cox Model Diagnostics
# Cox diagnostics with stratification
coxdiagnostics(
  data = colon,
  time = "time",
  event = "status",
  covariates = c("age", "nodes"),
  strata_var = "rx",  # Stratify by treatment
  show_martingale = TRUE,
  show_deviance = TRUE,
  show_ph_test = TRUE,
  exclude_missing = TRUE
)
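For comparison, a stratified Cox model can be fitted directly with `strata()` from the survival package. This sketch assumes that `strata_var` corresponds to this kind of stratification, which the text above does not confirm. It shows that the stratifying variable gets its own baseline hazard and no coefficient.

```r
library(survival)

# Death records from survival::colon
d <- subset(colon, etype == 2)

# Each treatment arm (rx) gets a separate baseline hazard;
# age and nodes share common coefficients across strata
fit_strat <- coxph(Surv(time, status) ~ age + nodes + strata(rx), data = d)
print(coef(fit_strat))

# Proportional hazards test for the remaining covariates
print(cox.zph(fit_strat))
```

Because no hazard ratio is estimated for the stratifying variable, stratification suits nuisance factors better than the exposure of primary interest.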
Clinical Research Applications
Example 5: Pathology Research with Histopathology Data
# Cox diagnostics for pathology research
coxdiagnostics(
  data = histopathology,
  time = "OverallTime",
  event = "Outcome",
  covariates = c("Age", "Sex", "Grade", "TStage"),
  show_martingale = TRUE,
  show_deviance = TRUE,
  show_schoenfeld = TRUE,
  show_vif = TRUE,
  show_ph_test = TRUE,
  show_model_summary = TRUE,
  exclude_missing = TRUE,
  confidence_level = 0.95
)
Example 6: Biomarker Validation Study
# Create biomarker validation dataset
set.seed(123)
n_patients <- 200
biomarker_data <- data.frame(
  patient_id = 1:n_patients,
  survival_time = rweibull(n_patients, shape = 1.5, scale = 24),
  event_occurred = rbinom(n_patients, 1, 0.65),
  age = round(rnorm(n_patients, 65, 12)),
  sex = factor(sample(c("Male", "Female"), n_patients, replace = TRUE)),
  biomarker_score = rnorm(n_patients, 50, 15),
  tumor_stage = factor(sample(c("I", "II", "III", "IV"), n_patients,
                              replace = TRUE, prob = c(0.2, 0.3, 0.3, 0.2)))
)
# Ensure realistic ranges
biomarker_data$age <- pmax(18, pmin(85, biomarker_data$age))
biomarker_data$survival_time <- pmax(1, pmin(60, biomarker_data$survival_time))
biomarker_data$biomarker_score <- pmax(0, pmin(100, biomarker_data$biomarker_score))
# Cox diagnostics for biomarker validation
coxdiagnostics(
  data = biomarker_data,
  time = "survival_time",
  event = "event_occurred",
  covariates = c("age", "sex", "biomarker_score", "tumor_stage"),
  show_martingale = TRUE,
  show_deviance = TRUE,
  show_schoenfeld = TRUE,
  show_vif = TRUE,
  show_ph_test = TRUE,
  show_model_summary = TRUE,
  vif_threshold = 5.0,
  show_interpretation = TRUE
)
Plot Customization Options
Example 7: Custom Plot Settings
# Customized diagnostic plots
coxdiagnostics(
  data = melanoma,
  time = "time",
  event = "status",
  covariates = c("age", "thickness"),
  show_martingale = TRUE,
  show_deviance = TRUE,
  ox_scale = "observation.id",  # Different x-axis scale
  add_smooth = TRUE,            # Add smoothing line
  add_reference = TRUE,         # Add reference line at y=0
  point_size = 1.5,             # Larger points
  alpha_level = 0.7,            # Semi-transparent points
  show_interpretation = TRUE
)
Interpretation Guidelines
Martingale Residuals
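Normal Pattern:
- Scattered around zero with a smoothed trend that stays close to horizontal
- No obvious patterns or trends
- Values bounded above by 1 and skewed toward negative values (martingale residuals cannot exceed 1)
Warning Signs:
- Systematic patterns (curved relationships)
- Large negative values (for example, below -3), which mark subjects who survived much longer than the model predicted
- Funnel shapes suggesting heteroscedasticity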
Deviance Residuals
Normal Pattern:
- Approximately symmetric around zero
- Most values between -2 and +2
- No systematic patterns
Warning Signs:
- Values > 2 or < -2 (poorly fitted observations)
- Asymmetric distribution
- Systematic patterns
Clinical Decision Making
Publication-Ready Analysis
Example 8: Complete Diagnostic Report
# Complete diagnostic analysis for publication
results <- coxdiagnostics(
  data = colon,
  time = "time",
  event = "status",
  covariates = c("age", "sex", "nodes", "differ", "extent"),
  show_martingale = TRUE,
  show_deviance = TRUE,
  show_schoenfeld = TRUE,
  show_dfbeta = TRUE,
  show_ph_test = TRUE,
  show_vif = TRUE,
  show_model_summary = TRUE,
  show_interpretation = TRUE,
  confidence_level = 0.95,
  exclude_missing = TRUE
)
Publication Checklist:
- ✅ Model Summary: Coefficients, hazard ratios, confidence intervals
- ✅ Assumption Testing: Proportional hazards test results
- ✅ Residual Analysis: All diagnostic plots included
- ✅ Multicollinearity: VIF analysis performed
- ✅ Missing Data: Handling strategy documented
- ✅ Sample Size: Adequate for number of covariates
- ✅ Clinical Interpretation: Results explained in clinical context
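Two checklist items, the hazard ratio table and the sample size check, can be sketched with the survival package as follows. The events-per-variable figure of roughly 10 is a commonly cited heuristic and is treated here as an assumption, not a requirement stated by this vignette. The dataset and covariates mirror Example 8 but are fitted directly with `coxph()`.

```r
library(survival)

# Death records with complete covariates
d <- na.omit(subset(colon, etype == 2,
                    select = c(time, status, age, sex, nodes, differ, extent)))
fit <- coxph(Surv(time, status) ~ age + sex + nodes + differ + extent, data = d)

# Hazard ratios with 95% confidence intervals (exponentiated coefficients)
hr_table <- cbind(HR = exp(coef(fit)), exp(confint(fit, level = 0.95)))
print(round(hr_table, 3))

# Events per variable: observed events divided by the number of estimated coefficients
n_events <- fit$nevent
epv <- n_events / length(coef(fit))
cat("Events:", n_events, " Coefficients:", length(coef(fit)),
    " EPV:", round(epv, 1), "\n")
```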
Common Issues and Solutions
Issue 1: Non-Proportional Hazards
Detection:
- Significant Schoenfeld test (p < 0.05)
- Trends in Schoenfeld residual plots
Solutions:
- Stratify by offending variable
- Use time-dependent covariates
- Consider parametric models
Issue 2: Non-Linear Relationships
Detection:
- Patterns in martingale residuals
- Curved relationships in plots
Solutions:
- Add polynomial terms
- Use splines or smoothing
- Transform continuous variables
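As one hedged illustration of the spline option, the sketch below compares a linear term for age with a natural cubic spline using the survival and splines packages. The choice of `age`, of 3 degrees of freedom, and of the `colon` death records are illustrative assumptions.

```r
library(survival)
library(splines)

# Death records from survival::colon
d <- subset(colon, etype == 2)

# Linear effect of age versus a natural cubic spline with 3 degrees of freedom
fit_lin <- coxph(Surv(time, status) ~ age + sex, data = d)
fit_ns  <- coxph(Surv(time, status) ~ ns(age, df = 3) + sex, data = d)

# Likelihood ratio test of the nested models and an AIC comparison;
# a clear improvement for the spline model suggests a non-linear effect
print(anova(fit_lin, fit_ns))
print(AIC(fit_lin, fit_ns))
```

If the spline model does not improve the fit, keeping the simpler linear term is usually easier to report and interpret.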
Best Practices
Data Preparation
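# Example of proper data preparation
clean_data <- function(data) {
  # Remove missing or non-positive time values
  data <- data[!is.na(data$time) & data$time > 0, ]
  # Check event coding (should be 0 = censored, 1 = event).
  # This recode treats code 2 as an event, which is only valid when 0 marks
  # censoring; with a 1/2 coding it would turn every row into an event
  data$status <- ifelse(data$status == 2, 1, data$status)
  # Keep only rows that are complete on the analysis variables
  complete_vars <- c("time", "status", "age", "sex")
  data <- data[complete.cases(data[complete_vars]), ]
  return(data)
}
# Apply to melanoma data
melanoma_clean <- clean_data(melanoma)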
Systematic Diagnostic Approach
- Model Summary: Review coefficient estimates and model fit
- Proportional Hazards: Test the key assumption first
- Residual Analysis: Examine all residual types systematically
- Influence Analysis: Identify potentially problematic observations
- Multicollinearity: Check for correlation among predictors
- Clinical Review: Ensure results make biological sense
Documentation Standards
# Template for documenting Cox diagnostics
diagnostic_report <- function(model_name, dataset, findings) {
  cat("=== Cox Model Diagnostic Report ===\n")
  cat("Model:", model_name, "\n")
  cat("Dataset:", dataset, "\n")
  cat("Date:", format(Sys.Date()), "\n\n")
  cat("Diagnostic Findings:\n")
  cat("- Proportional Hazards:", findings$ph_test, "\n")
  cat("- Linearity:", findings$linearity, "\n")
  cat("- Influential Observations:", findings$influence, "\n")
  cat("- Multicollinearity:", findings$vif, "\n\n")
  cat("Recommendations:\n")
  cat("- Model Adequacy:", findings$adequacy, "\n")
  cat("- Required Actions:", findings$actions, "\n")
}
Advanced Topics
Time-Dependent Covariates
When the proportional hazards assumption is violated, consider time-dependent covariates:
# Example of a time-varying effect using the tt() mechanism of survival::coxph
# (a sketch: the covariate and the log(t) transform are illustrative choices)
library(survival)
cox_model_td <- coxph(
  Surv(time, status) ~ age + sex + tt(nodes),
  data = colon,
  tt = function(x, t, ...) x * log(t)
)
Stratified Analysis
For variables that don’t meet proportional hazards:
# Stratified Cox model diagnostics
coxdiagnostics(
  data = colon,
  time = "time",
  event = "status",
  covariates = c("age", "nodes"),
  strata_var = "differ",  # Stratify by differentiation
  show_ph_test = TRUE,
  show_martingale = TRUE
)
Summary
The coxdiagnostics function provides a comprehensive toolkit for validating Cox proportional hazards models in clinical research. Key takeaways:
Essential Diagnostics
- Proportional Hazards Test: Most critical assumption
- Residual Analysis: Multiple types for different purposes
- VIF Analysis: Detect multicollinearity
- Influence Diagnostics: Identify problematic observations
Clinical Applications
- Biomarker Validation: Ensure reliable prognostic models
- Treatment Efficacy: Validate survival endpoints
- Risk Stratification: Develop clinical prediction tools
- Regulatory Submissions: Meet statistical requirements
Quality Assurance
- Assumption Checking: Systematic validation approach
- Documentation: Complete diagnostic reports
- Interpretation: Clinical significance assessment
- Publication Standards: Peer-review ready analysis
Next Steps
- Practice: Apply to your own survival datasets
- Iterate: Refine models based on diagnostic findings
- Validate: Use external datasets when possible
- Collaborate: Work with biostatisticians for complex cases
The diagnostic capabilities of coxdiagnostics help you verify that your Cox regression analyses meet the standards expected for clinical research and publication.