Adaptive LASSO for Cox proportional hazards models with data-driven penalty selection. Unlike the standard LASSO, the adaptive LASSO weights each coefficient's penalty by an initial parameter estimate, which gives consistent variable selection (the oracle property) under suitable conditions and reduces bias for non-zero coefficients. This implementation supports several weight calculation methods, cross-validation for choosing the penalty, stability selection for robust variable identification, and model diagnostics. It is intended for high-dimensional survival data where unpenalized Cox regression is unstable or cannot be fit, such as genomic studies with many biomarkers, and for clinical prediction models that need sparse, interpretable solutions. The method handles tied survival times and stratified analysis, and quantifies uncertainty through bootstrap procedures.
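As a rough sketch of the estimator described above, the adaptive LASSO for the Cox model can be written as a weighted L1-penalized partial likelihood problem. The notation is an assumption made for illustration: $\tilde\beta$ is the initial estimate produced by `weight_method`, $\gamma$ corresponds to the `gamma` argument, $\lambda$ is the penalty selected by cross-validation, $\delta_i$ is the event indicator, and $R(t_i)$ is the risk set at time $t_i$. The formula shows the pure LASSO case (`alpha = 1`) without ties. Scaling conventions, such as dividing the likelihood by the sample size, vary between implementations and are not specified here.

```latex
\ell(\beta) = \sum_{i:\,\delta_i = 1}
  \left[ x_i^\top \beta - \log \sum_{k \in R(t_i)} \exp\!\left(x_k^\top \beta\right) \right]

\hat\beta = \arg\min_{\beta}
  \left\{ -\ell(\beta) + \lambda \sum_{j=1}^{p} w_j \,\lvert \beta_j \rvert \right\},
\qquad
w_j = \frac{1}{\lvert \tilde\beta_j \rvert^{\gamma}}
```

Predictors with large initial estimates receive small weights and are penalized lightly, while predictors with initial estimates near zero receive large weights and are more likely to be removed.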
Usage
adaptivelasso(
data,
time,
event,
predictors,
strata,
weight_method = "ridge",
alpha = 1,
gamma = 1,
cv_folds = 10,
cv_measure = "deviance",
lambda_sequence = "auto",
lambda_min_ratio = 0.001,
n_lambda = 100,
stability_selection = FALSE,
stability_threshold = 0.6,
bootstrap_samples = 100,
subsampling_ratio = 0.8,
proportional_hazards = TRUE,
influence_diagnostics = FALSE,
goodness_of_fit = TRUE,
risk_groups = 3,
time_points = "1, 2, 5",
baseline_survival = TRUE,
show_coefficients = TRUE,
show_selection_path = TRUE,
show_cv_results = TRUE,
show_diagnostics = TRUE,
plot_selection_path = TRUE,
plot_cv_curve = TRUE,
plot_stability = FALSE,
plot_survival_curves = FALSE,
plot_baseline_hazard = FALSE,
plot_diagnostics = FALSE,
tie_method = "breslow",
standardize = TRUE,
intercept = FALSE,
parallel_computing = FALSE,
n_cores = 2,
convergence_threshold = 1e-07,
max_iterations = 10000,
random_seed = 123
)
Arguments
- data
The data as a data frame.
- time
Time to event variable (numeric). For right-censored data, this is the time from study entry to event or censoring.
- event
Event indicator variable. For survival analysis: 0 = censored, 1 = event. For competing risks: 0 = censored, 1+ = different event types.
- predictors
Variables to include in the adaptive LASSO Cox model. Can include continuous, ordinal, and nominal variables. Predictors are standardized before fitting when standardize = TRUE (the default), so that the penalty treats variables on different scales comparably.
- strata
Optional stratification variable for stratified Cox regression. Creates separate baseline hazards for each stratum.
- weight_method
Method for calculating adaptive weights (default "ridge"). Ridge gives stable, shrunken initial estimates; univariate fits a separate Cox regression for each predictor; full Cox uses the unpenalized full model, which may be unstable when predictors are numerous or strongly correlated; correlation uses marginal associations.
- alpha
Elastic net mixing parameter. 1.0 = pure LASSO, 0.0 = ridge regression, intermediate values combine both penalties for better performance with correlated predictors.
- gamma
Power parameter for adaptive weights calculation. Higher values increase penalty differences between variables, promoting stronger variable selection but potentially increasing instability.
- cv_folds
Number of folds for cross-validation to select optimal penalty parameter. More folds provide better estimates but increase computation time.
- cv_measure
Performance measure for cross-validation. Deviance is computationally efficient, C-index focuses on discrimination, Brier score accounts for calibration, AUC provides time-specific performance.
- lambda_sequence
How to specify the penalty parameter sequence. Automatic uses data-driven range, custom allows user specification, single tests only one value.
- lambda_min_ratio
Ratio of smallest to largest lambda value in automatic sequence. Smaller values extend the path toward weaker penalties (less sparse models), which may lead to computational difficulties.
- n_lambda
Number of lambda values in the regularization path. More values provide finer resolution but increase computation time.
- stability_selection
Perform stability selection to identify robust variable selection patterns across bootstrap samples. Provides more reliable variable selection than single model fitting.
- stability_threshold
Minimum selection frequency for variables in stability selection. Higher thresholds provide more conservative variable selection but may miss important predictors.
- bootstrap_samples
Number of bootstrap samples for stability selection and confidence intervals. More samples provide more stable estimates.
- subsampling_ratio
Proportion of data used in each bootstrap sample for stability selection. Smaller ratios increase selection variability.
- proportional_hazards
Test proportional hazards assumption for selected variables using scaled Schoenfeld residuals and time-interaction tests.
- influence_diagnostics
Calculate influence diagnostics including dfbeta, leverage, and outlier detection for robust model assessment.
- goodness_of_fit
Perform goodness of fit assessment including model deviance, concordance statistics, and calibration measures.
- risk_groups
Number of risk groups for stratification based on linear predictor. Used for Kaplan-Meier curves and risk group analysis.
- time_points
Comma-separated list of time points for survival probability predictions. Empty string uses data-driven quantiles.
- baseline_survival
Estimate and plot baseline survival function for the adaptive LASSO Cox model using Breslow estimator.
- show_coefficients
Display coefficient estimates for selected variables with standard errors and confidence intervals.
- show_selection_path
Display complete regularization path showing how coefficients change with penalty parameter.
- show_cv_results
Display cross-validation results including optimal lambda selection and performance across penalty values.
- show_diagnostics
Display comprehensive model diagnostics including proportional hazards tests, influence measures, and residual analysis.
- plot_selection_path
Plot coefficient paths as function of penalty parameter showing variable selection progression.
- plot_cv_curve
Plot cross-validation performance curve with optimal lambda selection and confidence bands.
- plot_stability
Plot stability selection results showing selection frequencies across bootstrap samples.
- plot_survival_curves
Plot Kaplan-Meier survival curves for risk groups defined by adaptive LASSO linear predictor.
- plot_baseline_hazard
Plot estimated baseline hazard function and cumulative baseline hazard from the final model.
- plot_diagnostics
Create diagnostic plots including residuals, influential observations, and proportional hazards assessment.
- tie_method
Method for handling tied survival times (default "breslow"). Efron provides a better approximation when ties are common but is more computationally intensive.
- standardize
Standardize predictors before fitting. Recommended for optimal penalty performance with mixed-scale variables.
- intercept
Include an intercept term in the model. The Cox model absorbs the intercept into the baseline hazard, so this is generally not needed (default FALSE), but it may be useful in special cases.
- parallel_computing
Use parallel computing for cross-validation and bootstrap procedures to reduce computation time.
- n_cores
Number of CPU cores for parallel computation when enabled. More cores speed up CV and bootstrap but use more memory.
- convergence_threshold
Convergence threshold for coordinate descent algorithm. Smaller values provide more precise solutions but increase computation time.
- max_iterations
Maximum iterations for coordinate descent algorithm. Increase if convergence warnings occur.
- random_seed
Random seed for cross-validation folds and bootstrap sampling. Ensures reproducible results across analyses.
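To make the role of gamma and the initial estimates concrete, the following base R sketch computes adaptive weights as the inverse of the absolute initial coefficients raised to the power gamma. The helper `adaptive_weights`, its `eps` floor, and the coefficient values are hypothetical and for illustration only. They are not part of this function's interface, and the package's internal calculation may differ.

```r
# Hypothetical helper (not part of the documented API): convert initial
# coefficient estimates into adaptive penalty weights w_j = 1 / |beta_j|^gamma.
adaptive_weights <- function(beta_init, gamma = 1, eps = 1e-8) {
  stopifnot(is.numeric(beta_init), gamma > 0, eps > 0)
  # Floor |beta_j| at eps so exact zeros get a very large but finite weight
  1 / pmax(abs(beta_init), eps)^gamma
}

# Made-up initial estimates, e.g. as they might come from a ridge Cox fit
beta_init <- c(age = 0.5, stage = -2, biomarker1 = 0.1, biomarker2 = 0)

w1 <- adaptive_weights(beta_init, gamma = 1)
w2 <- adaptive_weights(beta_init, gamma = 2)

# Ratio of largest to smallest weight among the non-zero estimates:
# a larger gamma widens the gap between strongly and weakly penalized variables
spread1 <- max(w1[1:3]) / min(w1[1:3])
spread2 <- max(w2[1:3]) / min(w2[1:3])

print(round(w1[1:3], 3))
print(c(gamma_1 = spread1, gamma_2 = spread2))
```

In glmnet-based workflows, weights of this kind are commonly supplied through the `penalty.factor` argument. Whether this function does so internally is not stated in this documentation.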
Value
A results object containing:
results$instructions | an html
results$todo | an html
results$coefficients | a table
results$selectionPath | a table
results$cvResults | a table
results$stabilityResults | a table
results$modelDiagnostics | a table
results$performanceMetrics | a table
results$riskGroups | a table
results$predictions | a table
results$pathPlot | an image
results$cvPlot | an image
results$stabilityPlot | an image
results$survivalPlot | an image
results$baselineHazardPlot | an image
results$diagnosticsPlot | an image
Tables can be converted to data frames with asDF or as.data.frame. For example:
results$coefficients$asDF
as.data.frame(results$coefficients)
Examples
result <- adaptivelasso(
data = mydata,
time = "time_to_event",
event = "event_indicator",
predictors = c("age", "stage", "biomarker1", "biomarker2"),
weight_method = "ridge",
cv_folds = 10,
stability_selection = TRUE,
alpha = 1.0
)
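The example above enables stability_selection. As a minimal sketch of the thresholding step this option relies on, the base R code below computes per-variable selection frequencies from a made-up selection matrix and keeps variables at or above stability_threshold. The matrix `selected` is hypothetical. In practice each row would come from refitting the model on one subsample, and the layout of results$stabilityResults may differ from what is shown here.

```r
# Hypothetical illustration: rows = subsamples, columns = predictors;
# TRUE means the predictor had a non-zero coefficient in that subsample.
selected <- rbind(
  c(TRUE, TRUE,  FALSE, FALSE),
  c(TRUE, TRUE,  FALSE, TRUE),
  c(TRUE, TRUE,  FALSE, FALSE),
  c(TRUE, TRUE,  TRUE,  FALSE),
  c(TRUE, FALSE, FALSE, FALSE)
)
colnames(selected) <- c("age", "stage", "biomarker1", "biomarker2")

# Selection frequency = share of subsamples in which each variable was kept
selection_freq <- colMeans(selected)

# Keep variables whose frequency reaches the threshold (default 0.6)
stability_threshold <- 0.6
stable_vars <- names(selection_freq)[selection_freq >= stability_threshold]

print(selection_freq)
print(stable_vars)
```

Raising the threshold makes the retained set smaller and more conservative, which matches the trade-off described for stability_threshold.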