Adaptive LASSO for Cox Models — adaptivelasso • ClinicoPath

Fits an adaptive LASSO for Cox proportional hazards models with data-driven penalty selection. Unlike the standard LASSO, the adaptive LASSO penalizes each coefficient with a weight derived from an initial parameter estimate, which gives consistent variable selection (the oracle property) and reduces bias in the non-zero coefficients. This implementation supports several weight calculation methods, cross-validation to choose the penalty, stability selection for robust variable identification, and model diagnostics. It is intended for high-dimensional survival data where unpenalized Cox regression is unstable, such as genomic studies with many biomarkers, and for clinical prediction models that need sparse, interpretable solutions. The method handles tied survival times and stratified analysis, and quantifies uncertainty through bootstrap procedures.
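The weighting idea can be sketched with glmnet, whose `penalty.factor` argument accepts per-variable penalty weights. The sketch below is an assumption about how such a fit could be assembled, not this package's internal code. It simulates data, takes initial estimates from a ridge Cox fit, forms weights w_j = 1 / |b_j|^gamma, and refits with a weighted LASSO penalty. The simulated data, the variable names, and the small floor on |b_j| are illustrative choices.

```r
# Two-step adaptive LASSO Cox sketch (assumes the glmnet package is installed)
library(glmnet)

set.seed(123)
n <- 300
p <- 8
x <- matrix(rnorm(n * p), n, p,
            dimnames = list(NULL, paste0("x", seq_len(p))))

# Simulated survival data: only x1 and x2 affect the hazard
lp          <- 1.0 * x[, 1] - 0.8 * x[, 2]
event_time  <- rexp(n, rate = exp(lp))
censor_time <- rexp(n, rate = 0.3)
y <- cbind(time   = pmin(event_time, censor_time),
           status = as.numeric(event_time <= censor_time))

# Step 1: initial estimates from a ridge-penalized Cox model (alpha = 0)
ridge_cv  <- cv.glmnet(x, y, family = "cox", alpha = 0, nfolds = 10)
beta_init <- as.numeric(coef(ridge_cv, s = "lambda.min"))

# Step 2: adaptive weights; the floor guards against division by zero
gamma <- 1
w <- 1 / pmax(abs(beta_init), 1e-8)^gamma

# Step 3: LASSO Cox fit (alpha = 1) with per-variable penalty weights
alasso_cv <- cv.glmnet(x, y, family = "cox", alpha = 1,
                       penalty.factor = w, nfolds = 10)
beta_hat <- as.numeric(coef(alasso_cv, s = "lambda.min"))
names(beta_hat) <- colnames(x)

# Variables retained at the cross-validated penalty
selected <- names(beta_hat)[beta_hat != 0]
print(selected)
```

Predictors with small initial estimates receive large weights and are penalized more heavily, which is what drives the sparser selection. Raising `gamma` widens that gap.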

Usage

adaptivelasso(
  data,
  time,
  event,
  predictors,
  strata,
  weight_method = "ridge",
  alpha = 1,
  gamma = 1,
  cv_folds = 10,
  cv_measure = "deviance",
  lambda_sequence = "auto",
  lambda_min_ratio = 0.001,
  n_lambda = 100,
  stability_selection = FALSE,
  stability_threshold = 0.6,
  bootstrap_samples = 100,
  subsampling_ratio = 0.8,
  proportional_hazards = TRUE,
  influence_diagnostics = FALSE,
  goodness_of_fit = TRUE,
  risk_groups = 3,
  time_points = "1, 2, 5",
  baseline_survival = TRUE,
  show_coefficients = TRUE,
  show_selection_path = TRUE,
  show_cv_results = TRUE,
  show_diagnostics = TRUE,
  plot_selection_path = TRUE,
  plot_cv_curve = TRUE,
  plot_stability = FALSE,
  plot_survival_curves = FALSE,
  plot_baseline_hazard = FALSE,
  plot_diagnostics = FALSE,
  tie_method = "breslow",
  standardize = TRUE,
  intercept = FALSE,
  parallel_computing = FALSE,
  n_cores = 2,
  convergence_threshold = 1e-07,
  max_iterations = 10000,
  random_seed = 123
)

Arguments

data: The data as a data frame.
time: Time to event variable (numeric). For right-censored data, this is the time from study entry to event or censoring.
event: Event indicator variable. For survival analysis: 0 = censored, 1 = event. For competing risks: 0 = censored, 1+ = different event types.
predictors: Variables to include in the adaptive LASSO Cox model. Can include continuous, ordinal, and nominal variables. Predictors are standardized by default (see standardize) so that the penalty treats variables on different scales comparably.
strata: Optional stratification variable for stratified Cox regression. Creates separate baseline hazards for each stratum.
weight_method: Method for calculating adaptive weights. Ridge provides stable weights through shrinkage; univariate fits a separate Cox regression for each predictor; full Cox uses the complete model (may be unstable); correlation uses marginal associations.
alpha: Elastic net mixing parameter: 1.0 = pure LASSO, 0.0 = ridge regression. Intermediate values combine both penalties, which helps with correlated predictors.
gamma: Power parameter for adaptive weights calculation. Higher values increase penalty differences between variables, promoting stronger variable selection but potentially increasing instability.
cv_folds: Number of folds for cross-validation to select optimal penalty parameter. More folds provide better estimates but increase computation time.
cv_measure: Performance measure for cross-validation. Deviance is computationally efficient; C-index focuses on discrimination; Brier score accounts for calibration; AUC provides time-specific performance.
lambda_sequence: How to specify the penalty parameter sequence. Automatic uses a data-driven range; custom allows user specification; single tests only one value.
lambda_min_ratio: Ratio of the smallest to the largest lambda value in the automatic sequence. Smaller values extend the path toward weaker penalties (less sparse models) but may cause computational difficulties.
n_lambda: Number of lambda values in the regularization path. More values provide finer resolution but increase computation time.
stability_selection: Perform stability selection to identify robust variable selection patterns across bootstrap samples. Provides more reliable variable selection than single model fitting.
stability_threshold: Minimum selection frequency for variables in stability selection. Higher thresholds provide more conservative variable selection but may miss important predictors.
bootstrap_samples: Number of bootstrap samples for stability selection and confidence intervals. More samples provide more stable estimates.
subsampling_ratio: Proportion of data used in each bootstrap sample for stability selection. Smaller ratios increase selection variability.
proportional_hazards: Test proportional hazards assumption for selected variables using scaled Schoenfeld residuals and time-interaction tests.
influence_diagnostics: Calculate influence diagnostics including dfbeta, leverage, and outlier detection for robust model assessment.
goodness_of_fit: Perform goodness of fit assessment including model deviance, concordance statistics, and calibration measures.
risk_groups: Number of risk groups for stratification based on linear predictor. Used for Kaplan-Meier curves and risk group analysis.
time_points: Comma-separated list of time points for survival probability predictions. Empty string uses data-driven quantiles.
baseline_survival: Estimate and plot baseline survival function for the adaptive LASSO Cox model using Breslow estimator.
show_coefficients: Display coefficient estimates for selected variables with standard errors and confidence intervals.
show_selection_path: Display complete regularization path showing how coefficients change with penalty parameter.
show_cv_results: Display cross-validation results including optimal lambda selection and performance across penalty values.
show_diagnostics: Display comprehensive model diagnostics including proportional hazards tests, influence measures, and residual analysis.
plot_selection_path: Plot coefficient paths as function of penalty parameter showing variable selection progression.
plot_cv_curve: Plot cross-validation performance curve with optimal lambda selection and confidence bands.
plot_stability: Plot stability selection results showing selection frequencies across bootstrap samples.
plot_survival_curves: Plot Kaplan-Meier survival curves for risk groups defined by adaptive LASSO linear predictor.
plot_baseline_hazard: Plot estimated baseline hazard function and cumulative baseline hazard from the final model.
plot_diagnostics: Create diagnostic plots including residuals, influential observations, and proportional hazards assessment.
tie_method: Method for handling tied survival times. Breslow (the default) is faster; Efron gives a better approximation when ties are numerous but is more computationally intensive.
standardize: Standardize predictors before fitting. Recommended for optimal penalty performance with mixed-scale variables.
intercept: Include an intercept term in the Cox model. Generally not needed, because the baseline hazard absorbs any intercept, but may be useful in special cases.
parallel_computing: Use parallel computing for cross-validation and bootstrap procedures to reduce computation time.
n_cores: Number of CPU cores for parallel computation when enabled. More cores speed up CV and bootstrap but use more memory.
convergence_threshold: Convergence threshold for coordinate descent algorithm. Smaller values provide more precise solutions but increase computation time.
max_iterations: Maximum iterations for coordinate descent algorithm. Increase if convergence warnings occur.
random_seed: Random seed for cross-validation folds and bootstrap sampling. Ensures reproducible results across analyses.
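To illustrate how `bootstrap_samples`, `subsampling_ratio`, and `stability_threshold` could interact, here is a minimal stability selection sketch built on glmnet. It is a sketch under stated assumptions and not this package's internal code: it subsamples without replacement, uses a plain LASSO Cox selector at one fixed penalty, and runs on simulated data. Adaptive weights could be supplied through `penalty.factor` in the same loop.

```r
# Stability selection sketch (assumes the glmnet package is installed)
library(glmnet)

set.seed(123)
n <- 300
p <- 8
x <- matrix(rnorm(n * p), n, p,
            dimnames = list(NULL, paste0("x", seq_len(p))))
lp          <- 1.0 * x[, 1] - 0.8 * x[, 2]
event_time  <- rexp(n, rate = exp(lp))
censor_time <- rexp(n, rate = 0.3)
y <- cbind(time   = pmin(event_time, censor_time),
           status = as.numeric(event_time <= censor_time))

bootstrap_samples   <- 50   # kept small here for speed
subsampling_ratio   <- 0.8
stability_threshold <- 0.6

# One penalty value, chosen once by cross-validation on the full data
lambda_fixed <- cv.glmnet(x, y, family = "cox", alpha = 1,
                          nfolds = 10)$lambda.min

m <- floor(subsampling_ratio * n)
selected_counts <- setNames(numeric(p), colnames(x))

for (b in seq_len(bootstrap_samples)) {
  idx <- sample.int(n, m)   # subsample without replacement (assumption)
  fit <- glmnet(x[idx, ], y[idx, ], family = "cox", alpha = 1)
  beta_b <- as.numeric(coef(fit, s = lambda_fixed))
  selected_counts <- selected_counts + (beta_b != 0)
}

# Share of subsamples in which each variable was selected
selection_freq <- selected_counts / bootstrap_samples
stable_vars <- names(selection_freq)[selection_freq >= stability_threshold]
print(round(selection_freq, 2))
print(stable_vars)
```

A variable counts as stable only if it is selected in at least the `stability_threshold` share of subsamples, so raising the threshold shortens the list.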

Value

A results object containing:

`results$instructions`					an html
`results$todo`					an html
`results$coefficients`					a table
`results$selectionPath`					a table
`results$cvResults`					a table
`results$stabilityResults`					a table
`results$modelDiagnostics`					a table
`results$performanceMetrics`					a table
`results$riskGroups`					a table
`results$predictions`					a table
`results$pathPlot`					an image
`results$cvPlot`					an image
`results$stabilityPlot`					an image
`results$survivalPlot`					an image
`results$baselineHazardPlot`					an image
`results$diagnosticsPlot`					an image

Tables can be converted to data frames with asDF or as.data.frame. For example:

results$coefficients$asDF

as.data.frame(results$coefficients)

Examples

result <- adaptivelasso(
    data = mydata,
    time = "time_to_event",
    event = "event_indicator",
    predictors = c("age", "stage", "biomarker1", "biomarker2"),
    weight_method = "ridge",
    cv_folds = 10,
    stability_selection = TRUE,
    alpha = 1.0
)
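The example above assumes a data frame `mydata` that already contains the named columns. For a quick trial, a simulated stand-in could be built as follows. The sample size, effect sizes, and distributions are arbitrary illustrative choices and do not come from any real study.

```r
# Simulated stand-in for `mydata` (all values are illustrative)
set.seed(123)
n <- 200

mydata <- data.frame(
  age        = round(rnorm(n, mean = 60, sd = 10)),
  stage      = factor(sample(c("I", "II", "III"), n, replace = TRUE)),
  biomarker1 = rnorm(n),
  biomarker2 = rnorm(n)
)

# Hazard rises with age and biomarker1; stage and biomarker2 are noise
lp          <- 0.03 * (mydata$age - 60) + 0.5 * mydata$biomarker1
event_time  <- rexp(n, rate = 0.1 * exp(lp))
censor_time <- runif(n, min = 0, max = 15)

# Observed time is the earlier of event and censoring; 1 = event, 0 = censored
mydata$time_to_event   <- pmin(event_time, censor_time)
mydata$event_indicator <- as.integer(event_time <= censor_time)

str(mydata)
```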