Advanced Survival Analysis with jsurvival

Overview

This vignette covers advanced survival analysis techniques and methodological considerations when using jsurvival. It addresses complex scenarios that researchers commonly encounter in clinical and epidemiological studies.

Advanced Cox Regression Topics

Checking Proportional Hazards Assumption

The Cox proportional hazards model assumes that hazard ratios remain constant over time. Violation of this assumption can lead to misleading results.

Visual Assessment

# Use survival analysis with extended follow-up time
ph_check <- survival(
  data = mydata,
  elapsedtime = time_months,
  outcome = death,
  outcomeLevel = "1",
  explanatory = treatment,
  ph_cox = TRUE,
  endplot = 120,  # Extended follow-up for visual inspection
  sc = TRUE       # Schoenfeld residuals (if available)
)

Visual Signs of PH Violation: - Crossing survival curves - Changing hazard ratios over time - Non-random patterns in Schoenfeld residuals

Solutions for PH Violations

Option 1: Stratified Cox Model

# When PH assumption is violated for a categorical variable
# Use stratification approach in multivariable analysis
stratified_result <- multisurvival(
  data = mydata,
  elapsedtime = time_months,
  outcome = death,
  outcomeLevel = "1",
  explanatory = c(treatment, age, sex),
  # Note: Stratification would need to be implemented
  # This is conceptual - actual implementation may vary
)

Option 2: Time-Dependent Effects

For variables with time-varying effects, consider: - Landmark analyses at multiple time points - Piecewise exponential models - Flexible parametric models

Model Selection and Variable Selection

Forward/Backward Selection

# Start with univariate screening
candidate_vars <- c("age", "sex", "stage", "grade", "biomarker")
univariate_results <- list()

for(var in candidate_vars) {
  univariate_results[[var]] <- survival(
    data = mydata,
    elapsedtime = time_months,
    outcome = death,
    outcomeLevel = "1",
    explanatory = !!sym(var),
    ph_cox = TRUE
  )
}

# Variables with p < 0.20 in univariate analysis
# proceed to multivariable model
significant_vars <- c("age", "stage", "biomarker")  # Based on results

final_model <- multisurvival(
  data = mydata,
  elapsedtime = time_months,
  outcome = death,
  outcomeLevel = "1",
  explanatory = significant_vars
)

Model Validation

# Internal validation using bootstrap or cross-validation
# External validation in independent dataset

validation_result <- multisurvival(
  data = validation_data,
  elapsedtime = time_months,
  outcome = death,
  outcomeLevel = "1",
  explanatory = final_model_variables
)

Handling Missing Data

Complete Case Analysis

# Default approach - uses only complete cases
complete_result <- survival(
  data = mydata,
  elapsedtime = time_months,
  outcome = death,
  outcomeLevel = "1",
  explanatory = treatment
)

Multiple Imputation Approach

# Conceptual approach for multiple imputation
# 1. Create multiple imputed datasets
# 2. Analyze each dataset separately
# 3. Pool results using Rubin's rules

# Example workflow (implementation details may vary):
imputed_results <- list()
for(i in 1:5) {  # 5 imputed datasets
  imputed_results[[i]] <- survival(
    data = imputed_data[[i]],
    elapsedtime = time_months,
    outcome = death,
    outcomeLevel = "1",
    explanatory = treatment
  )
}

# Pool results (conceptual)
pooled_hr <- mean(sapply(imputed_results, function(x) x$hazard_ratio))
pooled_se <- sqrt(mean(sapply(imputed_results, function(x) x$se^2)) + 
                  (1 + 1/5) * var(sapply(imputed_results, function(x) x$hazard_ratio)))

Cut-point Optimization

Methodological Considerations

Multiple Cut-point Testing

# When testing multiple cut-points, adjust for multiple comparisons
cutpoint_analysis <- survivalcont(
  data = biomarker_data,
  elapsedtime = time_months,
  outcome = death,
  outcomeLevel = "1",
  contexpl = biomarker_value,
  findcut = TRUE,
  # Consider correction for multiple testing
  padjustmethod = "holm"
)

Cross-Validation of Cut-points

# Split data into training and validation sets
set.seed(123)
train_idx <- sample(nrow(biomarker_data), nrow(biomarker_data) * 0.7)
train_data <- biomarker_data[train_idx, ]
test_data <- biomarker_data[-train_idx, ]

# Find cut-point in training data
train_cutpoint <- survivalcont(
  data = train_data,
  elapsedtime = time_months,
  outcome = death,
  outcomeLevel = "1",
  contexpl = biomarker_value,
  findcut = TRUE
)

# Apply cut-point to test data
test_data$biomarker_high <- ifelse(test_data$biomarker_value >= optimal_cutpoint, 
                                   "High", "Low")

validation_result <- survival(
  data = test_data,
  elapsedtime = time_months,
  outcome = death,
  outcomeLevel = "1",
  explanatory = biomarker_high
)

Alternative Approaches to Dichotomization

Tertiles or Quartiles

# Instead of single cut-point, use tertiles
biomarker_data$biomarker_tertile <- cut(biomarker_data$biomarker_value,
                                        breaks = quantile(biomarker_data$biomarker_value,
                                                         c(0, 1/3, 2/3, 1), na.rm = TRUE),
                                        labels = c("Low", "Medium", "High"),
                                        include.lowest = TRUE)

tertile_analysis <- survival(
  data = biomarker_data,
  elapsedtime = time_months,
  outcome = death,
  outcomeLevel = "1",
  explanatory = biomarker_tertile,
  analysistype = "overall"
)

Spline Analysis

# For truly continuous relationships
# Consider restricted cubic splines or other flexible approaches
# This would require additional implementation beyond basic jsurvival

Time-Dependent Covariates

Landmark Analysis Implementation

Multiple Landmark Points

# Analyze survival at different landmark times
landmark_times <- c(6, 12, 24)  # months
landmark_results <- list()

for(t in landmark_times) {
  # Create landmark dataset
  landmark_data <- subset(original_data, 
                         time_months > t | (time_months <= t & death == 0))
  
  # Adjust survival times
  landmark_data$time_from_landmark <- pmax(0, landmark_data$time_months - t)
  
  # Analyze
  landmark_results[[paste0("Month_", t)]] <- survival(
    data = landmark_data,
    elapsedtime = time_from_landmark,
    outcome = death,
    outcomeLevel = "1",
    explanatory = response_status,
    uselandmark = TRUE,
    landmark = t
  )
}

Dynamic Prediction

# Conditional survival probabilities
# Probability of surviving additional 2 years given survival to 1 year

# Patients alive at 12 months
alive_12m <- subset(mydata, time_months > 12 | (time_months <= 12 & death == 0))
alive_12m$time_conditional <- pmax(0, alive_12m$time_months - 12)

conditional_survival <- singlearm(
  data = alive_12m,
  elapsedtime = time_conditional,
  outcome = death,
  outcomeLevel = "1",
  cutp = "24",  # Additional 2 years
  timetypeoutput = "months"
)

Competing Risks Analysis

Cause-Specific Hazards

# Separate analyses for different causes of death
# Cancer-specific mortality
cancer_death <- survival(
  data = competing_data,
  elapsedtime = time_months,
  outcome = death_cancer,  # 1 = cancer death, 0 = alive or other death
  outcomeLevel = "1",
  explanatory = treatment,
  ph_cox = TRUE
)

# Other-cause mortality
other_death <- survival(
  data = competing_data,
  elapsedtime = time_months,
  outcome = death_other,   # 1 = other death, 0 = alive or cancer death
  outcomeLevel = "1",
  explanatory = treatment,
  ph_cox = TRUE
)

Subdistribution Hazards (Fine-Gray Model)

# Fine-Gray model for cumulative incidence
# This approach treats competing events as censoring at the time they occur
# but keeps subjects in the risk set

# Implementation would require specialized competing risks functions
# Currently beyond basic jsurvival functionality

Sample Size and Power Calculations

Post-hoc Power Analysis

# Given observed data, calculate achieved power
observed_events <- 120
observed_hr <- 0.75
alpha <- 0.05

# Use standard formulas or specialized software for power calculation
# Example calculation (conceptual):
log_hr <- log(observed_hr)
se_log_hr <- 1.96 / abs(qnorm(alpha/2))  # Approximate from CI
power <- pnorm(abs(log_hr)/se_log_hr - qnorm(1-alpha/2))

Required Sample Size

# For planning future studies
target_hr <- 0.70        # Clinically meaningful difference
alpha <- 0.05            # Type I error
power <- 0.80            # Desired power
accrual_time <- 24       # months
followup_time <- 36      # months
median_survival <- 30    # months in control group

# Use specialized software or formulas for calculation
# Required events ≈ 4 * (Z_α/2 + Z_β)² / (log(HR))²

Meta-Analysis of Survival Data

Individual Patient Data Meta-Analysis

# Combine multiple datasets
combined_data <- rbind(
  transform(study1_data, study = "Study1"),
  transform(study2_data, study = "Study2"),
  transform(study3_data, study = "Study3")
)

# Stratified analysis by study
meta_result <- multisurvival(
  data = combined_data,
  elapsedtime = time_months,
  outcome = death,
  outcomeLevel = "1",
  explanatory = c(treatment, age, sex, study)
  # Consider stratification by study
)

Fixed Effects vs. Random Effects

# Fixed effects: assumes same treatment effect across studies
fixed_effects <- survival(
  data = combined_data,
  elapsedtime = time_months,
  outcome = death,
  outcomeLevel = "1",
  explanatory = treatment
)

# Random effects: allows heterogeneity between studies
# Would require specialized meta-analysis functions

Quality Control and Validation

Data Quality Checks

# Check for data inconsistencies
data_checks <- list(
  # Negative survival times
  negative_times = sum(mydata$time_months < 0, na.rm = TRUE),
  
  # Events after last follow-up
  impossible_events = sum(mydata$death == 1 & mydata$time_months > max_followup),
  
  # Missing key variables
  missing_time = sum(is.na(mydata$time_months)),
  missing_event = sum(is.na(mydata$death)),
  
  # Extreme values
  extreme_times = sum(mydata$time_months > 200, na.rm = TRUE)  # >16 years
)

print(data_checks)

Model Diagnostics

# Concordance index (C-index)
# Measures discriminative ability of the model
# Values >0.7 generally considered good

# Calibration assessment
# Agreement between predicted and observed survival

# Model comparison using AIC/BIC
# Lower values indicate better fit

Reporting and Interpretation

Effect Size Interpretation

# Hazard Ratio Interpretation:
# HR = 0.50: 50% reduction in hazard (strong effect)
# HR = 0.75: 25% reduction in hazard (moderate effect)
# HR = 0.90: 10% reduction in hazard (small effect)
# HR = 1.00: No effect
# HR = 1.25: 25% increase in hazard (moderate harm)
# HR = 2.00: 100% increase in hazard (strong harm)

# Number Needed to Treat (NNT) calculation
survival_control <- 0.60  # 5-year survival in control
survival_treatment <- 0.70  # 5-year survival in treatment
absolute_benefit <- survival_treatment - survival_control
nnt <- 1 / absolute_benefit  # Number needed to treat

Confidence Interval Interpretation

# HR = 0.75 (95% CI: 0.60-0.95)
# Interpretation:
# - Point estimate suggests 25% reduction in hazard
# - We can be 95% confident the true HR is between 0.60 and 0.95
# - Since CI excludes 1.0, result is statistically significant
# - Minimum plausible benefit is 5% (HR=0.95)
# - Maximum plausible benefit is 40% (HR=0.60)

Conclusion

Advanced survival analysis requires careful consideration of:

Model assumptions and their validation
Missing data patterns and appropriate handling
Multiple comparisons and adjustment strategies
Clinical relevance beyond statistical significance
Robust validation in independent datasets

Key principles for advanced analysis:

Pre-specify analysis plans to avoid data dredging
Validate findings in independent cohorts when possible
Consider clinical context in statistical decisions
Report limitations and assumptions clearly
Collaborate with statisticians for complex analyses

For complex analyses beyond the scope of basic jsurvival functions, consider: - Specialized R packages (survival, survminer, rms, cmprsk) - Statistical software with advanced survival capabilities - Consultation with biostatisticians

Additional resources: - Survival Analysis Handbook - Clinical Prediction Models - STROBE Guidelines for observational studies

ClinicoPath Development Team

2025-06-30