Adaptive LASSO for Cox proportional hazards models with data-driven penalty selection and improved variable selection properties. Unlike the standard LASSO, the adaptive LASSO weights each coefficient's penalty using initial parameter estimates, which gives consistent variable selection (the oracle property) and reduced bias for non-zero coefficients. This implementation supports multiple weight calculation methods, cross-validation for optimal penalty selection, stability selection for robust variable identification, and comprehensive model diagnostics. It is particularly effective for high-dimensional survival data where traditional methods fail, genomic studies with many biomarkers, and clinical prediction models that require sparse, interpretable solutions. The method automatically handles tied survival times and stratified analysis, and provides uncertainty quantification through bootstrap procedures.
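For reference, the adaptive LASSO estimator for a Cox model is usually written as shown below. This is the standard textbook formulation, not a statement of the exact objective this function optimizes. The symbols are assumptions introduced for illustration: \ell(\beta) is the log partial likelihood, \tilde{\beta}_j is an initial estimate of coefficient j, \lambda is the penalty parameter, and p is the number of predictors.

```latex
\hat{\beta} = \arg\min_{\beta} \left\{ -\ell(\beta) + \lambda \sum_{j=1}^{p} w_j \lvert \beta_j \rvert \right\},
\qquad
w_j = \frac{1}{\lvert \tilde{\beta}_j \rvert^{\gamma}}
```

Coefficients with a large initial estimate receive a small weight and are penalized lightly, while coefficients with an initial estimate near zero are penalized heavily.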

Usage

adaptivelasso(
  data,
  time,
  event,
  predictors,
  strata,
  weight_method = "ridge",
  alpha = 1,
  gamma = 1,
  cv_folds = 10,
  cv_measure = "deviance",
  lambda_sequence = "auto",
  lambda_min_ratio = 0.001,
  n_lambda = 100,
  stability_selection = FALSE,
  stability_threshold = 0.6,
  bootstrap_samples = 100,
  subsampling_ratio = 0.8,
  proportional_hazards = TRUE,
  influence_diagnostics = FALSE,
  goodness_of_fit = TRUE,
  risk_groups = 3,
  time_points = "1, 2, 5",
  baseline_survival = TRUE,
  show_coefficients = TRUE,
  show_selection_path = TRUE,
  show_cv_results = TRUE,
  show_diagnostics = TRUE,
  plot_selection_path = TRUE,
  plot_cv_curve = TRUE,
  plot_stability = FALSE,
  plot_survival_curves = FALSE,
  plot_baseline_hazard = FALSE,
  plot_diagnostics = FALSE,
  tie_method = "breslow",
  standardize = TRUE,
  intercept = FALSE,
  parallel_computing = FALSE,
  n_cores = 2,
  convergence_threshold = 1e-07,
  max_iterations = 10000,
  random_seed = 123
)

Arguments

data

The data as a data frame.

time

Time to event variable (numeric). For right-censored data, this is the time from study entry to event or censoring.

event

Event indicator variable. For survival analysis: 0 = censored, 1 = event. For competing risks: 0 = censored, 1+ = different event types.

predictors

Variables to include in the adaptive LASSO Cox model. Can include continuous, ordinal, and nominal variables. Automatic standardization is applied for optimal penalty performance.

strata

Optional stratification variable for stratified Cox regression. Creates separate baseline hazards for each stratum.

weight_method

Method for calculating adaptive weights. Ridge provides stable weights with shrinkage; univariate uses individual Cox regressions; full Cox uses the complete model (may be unstable); correlation uses marginal associations.

alpha

Elastic net mixing parameter. 1.0 = pure LASSO, 0.0 = ridge regression, intermediate values combine both penalties for better performance with correlated predictors.
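As a point of reference, the elastic net penalty is commonly parameterized as follows. This assumes the glmnet-style convention; the documentation above does not state the exact form used here, nor how the adaptive weights enter the ridge part.

```latex
P_{\alpha}(\beta) = \alpha \sum_{j=1}^{p} \lvert \beta_j \rvert + \frac{1 - \alpha}{2} \sum_{j=1}^{p} \beta_j^{2}
```

Setting \alpha = 1 leaves only the absolute-value (LASSO) term, and \alpha = 0 leaves only the squared (ridge) term.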

gamma

Power parameter for adaptive weights calculation. Higher values increase penalty differences between variables, promoting stronger variable selection but potentially increasing instability.
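The effect of gamma can be seen in a minimal sketch that assumes the usual adaptive LASSO weight, one over the absolute initial estimate raised to the power gamma. The initial estimates, the helper adaptive_weights, and the small constant eps are hypothetical and only serve the illustration; the function's internal calculation may differ.

```r
# Hypothetical initial estimates (made-up values, e.g. from a ridge Cox fit)
beta_init <- c(age = 0.50, stage = 0.25, biomarker1 = 0.05, biomarker2 = 0.01)

# Assumed weight formula: w_j = 1 / |beta_j|^gamma
# eps is an illustrative guard against division by zero
adaptive_weights <- function(beta, gamma = 1, eps = 1e-8) {
  1 / (abs(beta) + eps)^gamma
}

w1 <- adaptive_weights(beta_init, gamma = 1)
w2 <- adaptive_weights(beta_init, gamma = 2)

# Spread between the most and least penalized variable
spread1 <- max(w1) / min(w1)
spread2 <- max(w2) / min(w2)
print(round(rbind(gamma_1 = w1, gamma_2 = w2), 2))
```

With gamma = 2 the spread between the largest and smallest weight is the square of the spread at gamma = 1, which is why higher values separate variables more aggressively.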

cv_folds

Number of folds for cross-validation to select optimal penalty parameter. More folds provide better estimates but increase computation time.
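A minimal sketch of how observations might be assigned to folds, assuming simple random assignment with a fixed seed. The helper make_folds is hypothetical; the actual fold construction used by the function is not documented here.

```r
# Hypothetical helper: random, near-balanced fold assignment
make_folds <- function(n, cv_folds = 10, random_seed = 123) {
  set.seed(random_seed)
  # Repeat fold labels 1..cv_folds to length n, then shuffle
  sample(rep(seq_len(cv_folds), length.out = n))
}

folds_a <- make_folds(95)
folds_b <- make_folds(95)
fold_sizes <- table(folds_a)
print(fold_sizes)
```

Calling the helper twice with the same seed returns the same assignment, which is the reproducibility that random_seed is meant to provide.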

cv_measure

Performance measure for cross-validation. Deviance is computationally efficient; C-index focuses on discrimination; Brier score accounts for calibration; AUC provides time-specific performance.

lambda_sequence

How to specify the penalty parameter sequence. Automatic uses a data-driven range; custom allows user specification; single tests only one value.

lambda_min_ratio

Ratio of the smallest to the largest lambda value in the automatic sequence. Smaller values extend the path toward weaker penalties (less sparse models) but may lead to computational difficulties.

n_lambda

Number of lambda values in the regularization path. More values provide finer resolution but increase computation time.
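The interaction between lambda_min_ratio and n_lambda can be sketched as follows, assuming a log-spaced grid from the largest lambda down to lambda_max * lambda_min_ratio, as is conventional in penalized regression software. The helper lambda_grid and the value of lambda_max are hypothetical; the largest lambda is normally derived from the data.

```r
# Hypothetical helper: log-spaced grid, largest lambda first
lambda_grid <- function(lambda_max, lambda_min_ratio = 0.001, n_lambda = 100) {
  exp(seq(log(lambda_max),
          log(lambda_max * lambda_min_ratio),
          length.out = n_lambda))
}

grid <- lambda_grid(lambda_max = 0.5)  # 0.5 is an arbitrary illustrative value
print(range(grid))
```

A smaller lambda_min_ratio extends the grid to smaller lambda values, and a larger n_lambda places more points between the same two endpoints.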

stability_selection

Perform stability selection to identify robust variable selection patterns across bootstrap samples. Provides more reliable variable selection than single model fitting.

stability_threshold

Minimum selection frequency for variables in stability selection. Higher thresholds provide more conservative variable selection but may miss important predictors.

bootstrap_samples

Number of bootstrap samples for stability selection and confidence intervals. More samples provide more stable estimates.

subsampling_ratio

Proportion of data used in each bootstrap sample for stability selection. Smaller ratios increase selection variability.
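The sketch below shows how bootstrap_samples, subsampling_ratio, and stability_threshold could fit together. It is an illustration under stated assumptions, not the function's implementation: the data are simulated, the outcome is continuous rather than a survival time, and select_vars is a hypothetical stand-in that selects variables by marginal correlation instead of fitting an adaptive LASSO Cox model.

```r
set.seed(123)
n <- 200
x <- matrix(rnorm(n * 4), n, 4, dimnames = list(NULL, paste0("x", 1:4)))
y <- x[, "x1"] + rnorm(n)  # only x1 carries signal

# Stand-in selector (NOT adaptive LASSO): keep variables with |correlation| > cutoff
select_vars <- function(x, y, cutoff = 0.3) abs(cor(x, y))[, 1] > cutoff

bootstrap_samples   <- 100
subsampling_ratio   <- 0.8
stability_threshold <- 0.6

# One column of TRUE/FALSE selections per subsample
selected <- replicate(bootstrap_samples, {
  idx <- sample(n, size = floor(subsampling_ratio * n))
  select_vars(x[idx, , drop = FALSE], y[idx])
})

# Selection frequency per variable, then apply the threshold
freq   <- rowMeans(selected)
stable <- names(freq)[freq >= stability_threshold]
print(freq)
```

Variables whose selection frequency reaches stability_threshold are reported as stable; raising the threshold shortens that list.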

proportional_hazards

Test proportional hazards assumption for selected variables using scaled Schoenfeld residuals and time-interaction tests.

influence_diagnostics

Calculate influence diagnostics including dfbeta, leverage, and outlier detection for robust model assessment.

goodness_of_fit

Perform goodness of fit assessment including model deviance, concordance statistics, and calibration measures.

risk_groups

Number of risk groups for stratification based on linear predictor. Used for Kaplan-Meier curves and risk group analysis.
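One common way to form such groups is to cut the linear predictor at its quantiles. The sketch below assumes that approach; the cut points actually used by the function are not documented here, and the linear predictor values are simulated.

```r
set.seed(123)
lp <- rnorm(90)   # hypothetical linear predictor values
risk_groups <- 3

# Quantile cut points that split lp into equally sized groups
breaks <- quantile(lp, probs = seq(0, 1, length.out = risk_groups + 1))
group <- cut(lp, breaks = breaks, include.lowest = TRUE,
             labels = paste("Group", seq_len(risk_groups)))
print(table(group))
```

The resulting factor could then serve as the grouping variable for Kaplan-Meier curves.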

time_points

Comma-separated list of time points for survival probability predictions. Empty string uses data-driven quantiles.
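A comma-separated string of this kind can be turned into numeric time points as sketched below. The helper parse_time_points is hypothetical and only illustrates the expected input format; it returns an empty vector for an empty string rather than computing quantiles.

```r
# Hypothetical helper: split on commas, trim whitespace, convert to numeric
parse_time_points <- function(x) {
  parts <- trimws(strsplit(x, ",", fixed = TRUE)[[1]])
  parts <- parts[nzchar(parts)]  # drop empty pieces
  as.numeric(parts)
}

tp <- parse_time_points("1, 2, 5")
tp_empty <- parse_time_points("")
print(tp)
```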

baseline_survival

Estimate and plot baseline survival function for the adaptive LASSO Cox model using Breslow estimator.
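The Breslow estimator accumulates, at each event time, the number of events divided by the sum of exp(linear predictor) over subjects still at risk. The sketch below is a minimal base R version for illustration only; breslow_cumhaz and the toy data are hypothetical, and the function's own computation may differ in detail.

```r
# Hypothetical helper: Breslow cumulative baseline hazard and survival
breslow_cumhaz <- function(time, event, lp) {
  event_times <- sort(unique(time[event == 1]))
  increments <- vapply(event_times, function(t) {
    d <- sum(event == 1 & time == t)       # number of events at time t
    at_risk <- sum(exp(lp[time >= t]))     # risk set, weighted by exp(lp)
    d / at_risk
  }, numeric(1))
  cumhaz <- cumsum(increments)
  data.frame(time = event_times, cumhaz = cumhaz, surv = exp(-cumhaz))
}

# Toy data: with lp = 0 the estimator reduces to Nelson-Aalen
time  <- c(2, 3, 3, 5, 8)
event <- c(1, 1, 0, 1, 0)
lp    <- rep(0, 5)
bh <- breslow_cumhaz(time, event, lp)
print(bh)
```

With all linear predictors set to zero, the increments are simply events divided by the number at risk (1/5, 1/4, 1/2 in the toy data).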

show_coefficients

Display coefficient estimates for selected variables with standard errors and confidence intervals.

show_selection_path

Display complete regularization path showing how coefficients change with penalty parameter.

show_cv_results

Display cross-validation results including optimal lambda selection and performance across penalty values.

show_diagnostics

Display comprehensive model diagnostics including proportional hazards tests, influence measures, and residual analysis.

plot_selection_path

Plot coefficient paths as function of penalty parameter showing variable selection progression.

plot_cv_curve

Plot cross-validation performance curve with optimal lambda selection and confidence bands.

plot_stability

Plot stability selection results showing selection frequencies across bootstrap samples.

plot_survival_curves

Plot Kaplan-Meier survival curves for risk groups defined by adaptive LASSO linear predictor.

plot_baseline_hazard

Plot estimated baseline hazard function and cumulative baseline hazard from the final model.

plot_diagnostics

Create diagnostic plots including residuals, influential observations, and proportional hazards assessment.

tie_method

Method for handling tied survival times. Efron provides better approximation but is computationally more intensive.

standardize

Standardize predictors before fitting. Recommended for optimal penalty performance with mixed-scale variables.

intercept

Include an intercept term in the Cox model. The Cox partial likelihood absorbs any intercept into the baseline hazard, so this is generally not needed for survival analysis, but it may be useful in special cases.

parallel_computing

Use parallel computing for cross-validation and bootstrap procedures to reduce computation time.

n_cores

Number of CPU cores for parallel computation when enabled. More cores speed up CV and bootstrap but use more memory.

convergence_threshold

Convergence threshold for coordinate descent algorithm. Smaller values provide more precise solutions but increase computation time.

max_iterations

Maximum iterations for coordinate descent algorithm. Increase if convergence warnings occur.

random_seed

Random seed for cross-validation folds and bootstrap sampling. Ensures reproducible results across analyses.

Value

A results object containing:

results$instructions         an html
results$todo                 an html
results$coefficients         a table
results$selectionPath        a table
results$cvResults            a table
results$stabilityResults     a table
results$modelDiagnostics     a table
results$performanceMetrics   a table
results$riskGroups           a table
results$predictions          a table
results$pathPlot             an image
results$cvPlot               an image
results$stabilityPlot        an image
results$survivalPlot         an image
results$baselineHazardPlot   an image
results$diagnosticsPlot      an image

Tables can be converted to data frames with asDF or as.data.frame. For example:

results$coefficients$asDF

as.data.frame(results$coefficients)

Examples

result <- adaptivelasso(
    data = mydata,
    time = "time_to_event",
    event = "event_indicator",
    predictors = c("age", "stage", "biomarker1", "biomarker2"),
    weight_method = "ridge",
    cv_folds = 10,
    stability_selection = TRUE,
    alpha = 1.0
)