Advanced Survival Analysis with jsurvival
ClinicoPath Development Team
2025-06-30
Source:vignettes/general-22-advanced-topics-legacy.Rmd
Overview
This vignette covers advanced survival analysis techniques and methodological considerations when using jsurvival. It addresses complex scenarios that researchers commonly encounter in clinical and epidemiological studies.
Advanced Cox Regression Topics
Checking Proportional Hazards Assumption
The Cox proportional hazards model assumes that hazard ratios remain constant over time. When this assumption is violated, a single hazard ratio averages over an effect that changes with time and can be misleading.
Visual Assessment
# Use survival analysis with extended follow-up time
ph_check <- survival(
data = mydata,
elapsedtime = time_months,
outcome = death,
outcomeLevel = "1",
explanatory = treatment,
ph_cox = TRUE,
endplot = 120, # Extended follow-up for visual inspection
sc = TRUE # Survival curve plot (ph_cox above requests the proportional hazards check)
)
Visual Signs of PH Violation:
- Crossing survival curves
- Changing hazard ratios over time
- Non-random patterns in Schoenfeld residuals
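The signs listed above can also be checked numerically outside jsurvival. The following is a minimal sketch, assuming the survival package is installed and using its bundled lung dataset rather than mydata. cox.zph() tests each covariate for a time trend in the scaled Schoenfeld residuals; a small p-value suggests the PH assumption is questionable for that covariate.

```r
library(survival)

# Cox model on the lung dataset shipped with the survival package
# (status is coded 1 = censored, 2 = dead, which Surv() accepts)
fit <- coxph(Surv(time, status) ~ sex + age, data = lung)

# Test for a time trend in the scaled Schoenfeld residuals:
# one row per covariate plus a GLOBAL row
ph_test <- cox.zph(fit)
print(ph_test)

# One panel per covariate; a roughly flat smooth is consistent with PH
plot(ph_test)
```

The plot is usually more informative than the test alone, because it shows when and in which direction the effect drifts.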
Solutions for PH Violations
Option 1: Stratified Cox Model
# When PH assumption is violated for a categorical variable
# Use stratification approach in multivariable analysis
stratified_result <- multisurvival(
data = mydata,
elapsedtime = time_months,
outcome = death,
outcomeLevel = "1",
explanatory = c(treatment, age, sex)
# Note: this call does not stratify yet. Stratification on the offending
# variable would need to be implemented; this is conceptual and the actual
# implementation may vary
)
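For comparison, stratification is available directly in the survival package. This is a sketch under stated assumptions, not jsurvival's implementation: it uses the bundled lung dataset and treats sex as the variable that violates PH. Wrapping a variable in strata() gives each level its own baseline hazard, so no hazard ratio is estimated for that variable.

```r
library(survival)

# Stratify on sex: separate baseline hazards per stratum,
# shared coefficients for age and ECOG performance score
fit_strat <- coxph(Surv(time, status) ~ age + ph.ecog + strata(sex),
                   data = lung)

# Only age and ph.ecog get hazard ratios; sex has none
summary(fit_strat)
```

The trade-off is that the effect of the stratifying variable can no longer be reported, so this suits nuisance variables better than the exposure of interest.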
Model Selection and Variable Selection
Forward/Backward Selection
# Start with univariate screening
candidate_vars <- c("age", "sex", "stage", "grade", "biomarker")
univariate_results <- list()
for(var in candidate_vars) {
univariate_results[[var]] <- survival(
data = mydata,
elapsedtime = time_months,
outcome = death,
outcomeLevel = "1",
explanatory = !!rlang::sym(var), # sym() comes from the rlang package
ph_cox = TRUE
)
}
# Variables with p < 0.20 in univariate analysis
# proceed to multivariable model
significant_vars <- c("age", "stage", "biomarker") # Based on results
final_model <- multisurvival(
data = mydata,
elapsedtime = time_months,
outcome = death,
outcomeLevel = "1",
explanatory = c(age, stage, biomarker) # The variables listed in significant_vars
)
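When the screening p-values need to be collected programmatically, the loop above can be approximated outside jsurvival. The sketch below assumes the survival package and its lung dataset; the candidate variables and the 0.20 threshold are illustrative, and the likelihood ratio test is only one reasonable choice of test.

```r
library(survival)

candidate_vars <- c("age", "sex", "ph.ecog", "wt.loss")

# Fit one Cox model per candidate and keep the likelihood ratio p-value
p_values <- sapply(candidate_vars, function(v) {
  f <- as.formula(paste("Surv(time, status) ~", v))
  fit <- coxph(f, data = lung)
  summary(fit)$logtest[["pvalue"]]
})

# Candidates passing the liberal screening threshold
selected <- names(p_values)[p_values < 0.20]
print(round(p_values, 4))
print(selected)
```

Univariate screening can discard variables that matter only after adjustment, so clinically important covariates are often forced into the model regardless of their screening p-value.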
Model Validation
# Internal validation using bootstrap or cross-validation
# External validation in independent dataset
validation_result <- multisurvival(
data = validation_data,
elapsedtime = time_months,
outcome = death,
outcomeLevel = "1",
explanatory = c(age, stage, biomarker) # Same variables as the final model
)
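Discrimination on held-out data is one common way to summarize validation. The sketch below assumes the survival package (version 3.x, where concordance() accepts a newdata argument) and uses a random split of the lung dataset as a stand-in for a true external cohort.

```r
library(survival)

# Complete cases for the variables used, then a 70/30 split
d <- na.omit(lung[, c("time", "status", "age", "sex", "ph.ecog")])
set.seed(123)
train_idx <- sample(nrow(d), floor(0.7 * nrow(d)))
train <- d[train_idx, ]
test <- d[-train_idx, ]

# Develop the model on the training part only
fit <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = train)

# Concordance (C-index) on the held-out part; 0.5 is chance, 1 is perfect
c_test <- concordance(fit, newdata = test)$concordance
print(c_test)
```

A random split of one dataset is still internal validation; an independent cohort from another site or period is a stronger test.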
Handling Missing Data
Complete Case Analysis
# Default approach - uses only complete cases
complete_result <- survival(
data = mydata,
elapsedtime = time_months,
outcome = death,
outcomeLevel = "1",
explanatory = treatment
)
Multiple Imputation Approach
# Conceptual approach for multiple imputation
# 1. Create multiple imputed datasets
# 2. Analyze each dataset separately
# 3. Pool results using Rubin's rules
# Example workflow (implementation details may vary):
imputed_results <- list()
for(i in 1:5) { # 5 imputed datasets
imputed_results[[i]] <- survival(
data = imputed_data[[i]],
elapsedtime = time_months,
outcome = death,
outcomeLevel = "1",
explanatory = treatment
)
}
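# Pool results with Rubin's rules on the log hazard ratio scale (conceptual;
# hazard_ratio and se are placeholder fields, with se the SE of log(HR))
m <- length(imputed_results)
log_hrs <- sapply(imputed_results, function(x) log(x$hazard_ratio))
within_var <- mean(sapply(imputed_results, function(x) x$se^2))
between_var <- var(log_hrs)
pooled_log_hr <- mean(log_hrs)
pooled_se <- sqrt(within_var + (1 + 1/m) * between_var)
pooled_hr <- exp(pooled_log_hr)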
Cut-point Optimization
Methodological Considerations
Multiple Cut-point Testing
# When testing multiple cut-points, adjust for multiple comparisons
cutpoint_analysis <- survivalcont(
data = biomarker_data,
elapsedtime = time_months,
outcome = death,
outcomeLevel = "1",
contexpl = biomarker_value,
findcut = TRUE,
# Consider correction for multiple testing
padjustmethod = "holm"
)
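The effect of the adjustment can be seen on a small set of p-values. The values below are hypothetical, chosen only to show what method = "holm" does in base R's p.adjust(); whether jsurvival calls the same routine internally is not confirmed here.

```r
# Hypothetical unadjusted p-values from three candidate cut-points
raw_p <- c(0.01, 0.04, 0.03)

# Holm step-down adjustment: the smallest p-value is multiplied by 3,
# the next by 2, the largest by 1, then monotonicity is enforced
adjusted <- p.adjust(raw_p, method = "holm")
print(adjusted)
# → 0.03 0.06 0.06
```

Only the first cut-point remains below 0.05 after adjustment, although all three were nominally significant before it.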
Cross-Validation of Cut-points
# Split data into training and validation sets
set.seed(123)
train_idx <- sample(nrow(biomarker_data), nrow(biomarker_data) * 0.7)
train_data <- biomarker_data[train_idx, ]
test_data <- biomarker_data[-train_idx, ]
# Find cut-point in training data
train_cutpoint <- survivalcont(
data = train_data,
elapsedtime = time_months,
outcome = death,
outcomeLevel = "1",
contexpl = biomarker_value,
findcut = TRUE
)
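# Apply cut-point to test data
# optimal_cutpoint is not created by the call above: assign it the cut-point
# value reported in the train_cutpoint results before running this step
test_data$biomarker_high <- ifelse(test_data$biomarker_value >= optimal_cutpoint,
"High", "Low")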
validation_result <- survival(
data = test_data,
elapsedtime = time_months,
outcome = death,
outcomeLevel = "1",
explanatory = biomarker_high
)
Alternative Approaches to Dichotomization
Tertiles or Quartiles
# Instead of single cut-point, use tertiles
biomarker_data$biomarker_tertile <- cut(biomarker_data$biomarker_value,
breaks = quantile(biomarker_data$biomarker_value,
c(0, 1/3, 2/3, 1), na.rm = TRUE),
labels = c("Low", "Medium", "High"),
include.lowest = TRUE)
tertile_analysis <- survival(
data = biomarker_data,
elapsedtime = time_months,
outcome = death,
outcomeLevel = "1",
explanatory = biomarker_tertile,
analysistype = "overall"
)
Time-Dependent Covariates
Landmark Analysis Implementation
Multiple Landmark Points
# Analyze survival at different landmark times
landmark_times <- c(6, 12, 24) # months
landmark_results <- list()
for(t in landmark_times) {
# Create landmark dataset: keep only patients still at risk at the landmark
# (those who died or were censored at or before t are excluded)
landmark_data <- subset(original_data, time_months > t)
# Reset the time origin to the landmark
landmark_data$time_from_landmark <- landmark_data$time_months - t
# Analyze; the landmark is already applied above, so uselandmark/landmark
# are not passed again, which would risk applying the landmark twice
landmark_results[[paste0("Month_", t)]] <- survival(
data = landmark_data,
elapsedtime = time_from_landmark,
outcome = death,
outcomeLevel = "1",
explanatory = response_status
)
}
Dynamic Prediction
# Conditional survival probabilities
# Probability of surviving additional 2 years given survival to 1 year
# Patients still at risk at 12 months (alive and under follow-up)
# Patients censored before 12 months are excluded: their status at 12 months is unknown
alive_12m <- subset(mydata, time_months > 12)
alive_12m$time_conditional <- alive_12m$time_months - 12
conditional_survival <- singlearm(
data = alive_12m,
elapsedtime = time_conditional,
outcome = death,
outcomeLevel = "1",
cutp = "24", # Additional 2 years
timetypeoutput = "months"
)
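Conditional survival can also be read off a single Kaplan-Meier curve, since S(t + s | t) = S(t + s) / S(t). The following is a minimal sketch, assuming the survival package and its lung dataset. Time in that dataset is in days and follow-up is short, so the example estimates survival for one further year given survival to one year, rather than the 12 and 24 months used above.

```r
library(survival)

# Kaplan-Meier estimate for the whole cohort
km <- survfit(Surv(time, status) ~ 1, data = lung)

# Survival probabilities at 1 and 2 years (time is in days)
s <- summary(km, times = c(365, 730))$surv

# P(alive at 2 years | alive at 1 year)
cond_surv <- s[2] / s[1]
print(cond_surv)
```

The ratio gives the same point estimate as refitting on the landmark subset; the refit is mainly useful when confidence intervals or group comparisons are needed.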
Competing Risks Analysis
Cause-Specific Hazards
# Separate analyses for different causes of death
# Cancer-specific mortality
cancer_death <- survival(
data = competing_data,
elapsedtime = time_months,
outcome = death_cancer, # 1 = cancer death, 0 = alive or other death
outcomeLevel = "1",
explanatory = treatment,
ph_cox = TRUE
)
# Other-cause mortality
other_death <- survival(
data = competing_data,
elapsedtime = time_months,
outcome = death_other, # 1 = other death, 0 = alive or cancer death
outcomeLevel = "1",
explanatory = treatment,
ph_cox = TRUE
)
Subdistribution Hazards (Fine-Gray Model)
# Fine-Gray model for the cumulative incidence of one event type
# Unlike the cause-specific approach, subjects who experience a competing
# event are not censored: they stay in the risk set after that event
# Implementation would require specialized competing risks functions
# Currently beyond basic jsurvival functionality
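One way to fit such a model outside jsurvival is finegray() from the survival package. The sketch below is not part of jsurvival and follows the pattern in that function's documentation. It assumes the bundled mgus2 dataset, with progression to plasma cell malignancy ("pcm") as the event of interest and death without progression as the competing event.

```r
library(survival)

# Build a single event time and a three-level event factor
# (the first level must be the censoring code)
d <- mgus2
d$etime <- with(d, ifelse(pstat == 0, futime, ptime))
d$event <- with(d, ifelse(pstat == 0, 2 * death, 1))
d$event <- factor(d$event, 0:2, labels = c("censor", "pcm", "death"))

# Expand the data with Fine-Gray weights for the event of interest
fg_data <- finegray(Surv(etime, event) ~ age + sex, data = d, etype = "pcm")

# Weighted Cox fit on the expanded data gives subdistribution hazard ratios
fg_fit <- coxph(Surv(fgstart, fgstop, fgstatus) ~ age + sex,
                weights = fgwt, data = fg_data)
print(exp(coef(fg_fit)))
```

Subdistribution hazard ratios describe effects on cumulative incidence and should not be read as cause-specific hazard ratios.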
Sample Size and Power Calculations
Post-hoc Power Analysis
# Given observed data, calculate achieved power
observed_events <- 120
observed_hr <- 0.75
alpha <- 0.05
# Use standard formulas or specialized software for power calculation
# Example calculation (conceptual):
log_hr <- log(observed_hr)
se_log_hr <- 2 / sqrt(observed_events) # Schoenfeld approximation, 1:1 allocation
power <- pnorm(abs(log_hr)/se_log_hr - qnorm(1-alpha/2))
Required Sample Size
# For planning future studies
target_hr <- 0.70 # Clinically meaningful difference
alpha <- 0.05 # Type I error
power <- 0.80 # Desired power
accrual_time <- 24 # months
followup_time <- 36 # months
median_survival <- 30 # months in control group
# Use specialized software or formulas for calculation
# Required events ≈ 4 * (Z_{1-α/2} + Z_{1-β})² / (log(HR))² for 1:1 allocation
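The events formula above can be evaluated directly with the planning values from this section. The sketch below goes one step further and converts events into patients. That step rests on assumptions not stated in the original: exponential survival, uniform accrual, followup_time counted from the end of accrual, and 1:1 allocation.

```r
target_hr <- 0.70
alpha <- 0.05
power <- 0.80
accrual_time <- 24     # months, uniform accrual assumed
followup_time <- 36    # months after accrual closes (assumed meaning)
median_survival <- 30  # months, control group

# Schoenfeld's formula for the required number of events (1:1 allocation)
required_events <- 4 * (qnorm(1 - alpha / 2) + qnorm(power))^2 / log(target_hr)^2
print(ceiling(required_events))
# → 247

# Probability that a patient has an event by the end of the study,
# under exponential survival with hazard lambda and uniform accrual
p_event <- function(lambda) {
  1 - (exp(-lambda * followup_time) -
         exp(-lambda * (accrual_time + followup_time))) /
    (lambda * accrual_time)
}
lambda_control <- log(2) / median_survival
p_overall <- mean(c(p_event(lambda_control),
                    p_event(lambda_control * target_hr)))

# Total patients needed to observe the required events
required_n <- ceiling(required_events / p_overall)
print(required_n)
```

Dedicated sample size software remains preferable for a real protocol, since it handles dropout, non-uniform accrual, and unequal allocation.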
Meta-Analysis of Survival Data
Individual Patient Data Meta-Analysis
# Combine multiple datasets
combined_data <- rbind(
transform(study1_data, study = "Study1"),
transform(study2_data, study = "Study2"),
transform(study3_data, study = "Study3")
)
# Analysis adjusted for study (study enters the model as a covariate)
meta_result <- multisurvival(
data = combined_data,
elapsedtime = time_months,
outcome = death,
outcomeLevel = "1",
explanatory = c(treatment, age, sex, study)
# Consider stratification by study
)
Fixed Effects vs. Random Effects
# Fixed effects: assumes same treatment effect across studies
fixed_effects <- survival(
data = combined_data,
elapsedtime = time_months,
outcome = death,
outcomeLevel = "1",
explanatory = treatment
)
# Random effects: allows heterogeneity between studies
# Would require specialized meta-analysis functions
Quality Control and Validation
Data Quality Checks
# Check for data inconsistencies
max_followup <- 120 # Placeholder: set to the study's maximum possible follow-up (months)
data_checks <- list(
# Negative survival times
negative_times = sum(mydata$time_months < 0, na.rm = TRUE),
# Events after last follow-up
impossible_events = sum(mydata$death == 1 & mydata$time_months > max_followup, na.rm = TRUE),
# Missing key variables
missing_time = sum(is.na(mydata$time_months)),
missing_event = sum(is.na(mydata$death)),
# Extreme values
extreme_times = sum(mydata$time_months > 200, na.rm = TRUE) # >16 years
)
print(data_checks)
Reporting and Interpretation
Effect Size Interpretation
# Hazard Ratio Interpretation:
# HR = 0.50: 50% reduction in hazard (strong effect)
# HR = 0.75: 25% reduction in hazard (moderate effect)
# HR = 0.90: 10% reduction in hazard (small effect)
# HR = 1.00: No effect
# HR = 1.25: 25% increase in hazard (moderate harm)
# HR = 2.00: 100% increase in hazard (strong harm)
# Number Needed to Treat (NNT) calculation
survival_control <- 0.60 # 5-year survival in control
survival_treatment <- 0.70 # 5-year survival in treatment
absolute_benefit <- survival_treatment - survival_control
nnt <- 1 / absolute_benefit # Number needed to treat
Confidence Interval Interpretation
# HR = 0.75 (95% CI: 0.60-0.95)
# Interpretation:
# - Point estimate suggests 25% reduction in hazard
# - We can be 95% confident the true HR is between 0.60 and 0.95
# - Since CI excludes 1.0, result is statistically significant
# - Minimum plausible benefit is 5% (HR=0.95)
# - Maximum plausible benefit is 40% (HR=0.60)
Conclusion
Advanced survival analysis requires careful consideration of:
- Model assumptions and their validation
- Missing data patterns and appropriate handling
- Multiple comparisons and adjustment strategies
- Clinical relevance beyond statistical significance
- Robust validation in independent datasets
Key principles for advanced analysis:
- Pre-specify analysis plans to avoid data dredging
- Validate findings in independent cohorts when possible
- Consider clinical context in statistical decisions
- Report limitations and assumptions clearly
- Collaborate with statisticians for complex analyses
For complex analyses beyond the scope of basic jsurvival functions, consider:
- Specialized R packages (survival, survminer, rms, cmprsk)
- Statistical software with advanced survival capabilities
- Consultation with biostatisticians
Additional resources:
- Survival Analysis Handbook
- Clinical Prediction Models
- STROBE Guidelines for observational studies