Fits a group LASSO for Cox proportional hazards models, enabling simultaneous selection of pre-defined variable groups while maintaining within-group structure. The method extends the traditional LASSO by applying penalties at the group level, which suits categorical variables with multiple dummy codes, grouped biomarkers, and structured predictors such as gene pathways. The implementation supports overlapping groups, adaptive group weights, sparse group LASSO (combining group and individual penalties), and cross-validation for penalty selection. It is particularly useful for genomic survival analysis, clinical prediction models with natural variable groupings, and settings that require interpretable group-wise feature selection that preserves biological or clinical structure.
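To make the combination of group and individual penalties concrete, the sketch below computes a sparse group LASSO penalty for a given coefficient vector. It is an illustration only: `sgl_penalty` is a hypothetical helper, not a function of this package, and the exact penalty form used internally is an assumption. It uses square-root-of-group-size weights and the `alpha` convention described under Arguments (0 = pure group LASSO, 1 = pure individual LASSO).

```r
# Hypothetical helper (not part of the package): value of a sparse group
# LASSO penalty, assuming sqrt(group size) weights on the group term.
sgl_penalty <- function(beta, groups, lambda, alpha = 0.5) {
  # Split the coefficients by their group label
  by_group <- split(beta, groups)
  # Group term: sqrt(group size) times the Euclidean norm of each group
  group_term <- sum(vapply(
    by_group,
    function(b) sqrt(length(b)) * sqrt(sum(b^2)),
    numeric(1)
  ))
  # Individual term: ordinary L1 norm of all coefficients
  l1_term <- sum(abs(beta))
  lambda * ((1 - alpha) * group_term + alpha * l1_term)
}

beta   <- c(0.3, -0.4, 0, 0, 1.2)  # illustrative coefficients
groups <- c(1, 1, 2, 2, 3)         # group label of each coefficient

pen_group <- sgl_penalty(beta, groups, lambda = 0.1, alpha = 0)  # group term only
pen_l1    <- sgl_penalty(beta, groups, lambda = 0.1, alpha = 1)  # L1 term only
```

Because the group term uses an unsquared Euclidean norm, a group contributes nothing to the penalty only when all of its coefficients are zero, which is what drives whole-group selection.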
Usage
grouplasso(
  data,
  time,
  event,
  predictors,
  strata,
  group_definition = "automatic",
  group_structure = "",
  factor_grouping = TRUE,
  penalty_type = "group_lasso",
  alpha = 0.5,
  group_weights = "sqrt_size",
  custom_weights = "",
  cv_folds = 10,
  cv_measure = "deviance",
  lambda_sequence = "auto",
  n_lambda = 50,
  lambda_min_ratio = 0.01,
  algorithm = "coordinate",
  max_iterations = 1000,
  tolerance = 1e-06,
  selection_threshold = 1e-08,
  stability_selection = FALSE,
  bootstrap_samples = 100,
  stability_threshold = 0.6,
  nested_cv = FALSE,
  inner_cv_folds = 5,
  permutation_test = FALSE,
  n_permutations = 100,
  show_group_summary = TRUE,
  show_coefficients = TRUE,
  show_path_summary = TRUE,
  show_cv_results = TRUE,
  plot_regularization_path = TRUE,
  plot_cv_curve = TRUE,
  plot_group_importance = TRUE,
  plot_stability = FALSE,
  plot_group_structure = FALSE,
  standardize = TRUE,
  center_groups = FALSE,
  adaptive_weights_method = "ridge",
  warm_start = TRUE,
  parallel_computing = FALSE,
  random_seed = 123
)
Arguments
- data
The data as a data frame.
- time
Time to event variable (numeric). For right-censored data, this is the time from study entry to event or censoring.
- event
Event indicator variable. For survival analysis: 0 = censored, 1 = event. For competing risks: 0 = censored, 1+ = different event types.
- predictors
Variables to include in the group LASSO Cox model. Variables can be grouped based on biological, clinical, or statistical criteria. Factor variables are automatically converted to dummy variables.
- strata
Optional stratification variable for stratified Cox regression. Creates separate baseline hazards for each stratum.
- group_definition
Method for defining variable groups: automatic groups variables by data type, custom allows manual specification, factor-based groups the dummy variables derived from the same factor, and biological supports pathway-based groupings.
- group_structure
Custom group assignment as comma-separated list. Format: 'var1:group1, var2:group1, var3:group2'. Only used when group_definition is 'custom'. Groups can overlap.
- factor_grouping
Automatically group dummy variables from the same factor variable. Ensures that factor variables are selected/excluded as complete units.
- penalty_type
Type of group penalty: group LASSO selects entire groups, sparse group combines group and individual penalties, adaptive uses data-driven weights, and overlapping handles variables that belong to multiple groups.
- alpha
Mixing parameter for sparse group LASSO. 0 = pure group LASSO, 1 = pure individual LASSO, intermediate values combine both penalties. Only used for sparse group LASSO.
- group_weights
Method for calculating group-specific penalty weights: square root of group size is the standard choice, adaptive derives weights from initial estimates, and custom allows user-specified weights.
- custom_weights
Custom weights for each group as comma-separated values. Order should match group numbering. Only used with custom weights.
- cv_folds
Number of folds for cross-validation to select optimal penalty parameter. More folds provide better estimates but increase computation time.
- cv_measure
Performance measure for cross-validation: deviance is standard for Cox models, C-index focuses on discrimination, and Brier score provides calibration-aware selection.
- lambda_sequence
Specification of the penalty parameter sequence: automatic uses a data-driven range, while custom allows a user-defined range.
- n_lambda
Number of lambda values in the regularization path. More values provide finer resolution but increase computation time.
- lambda_min_ratio
Ratio of smallest to largest lambda in the automatic sequence. Smaller values extend the path toward weaker penalties, so more groups can enter the model at the end of the path.
- algorithm
Optimization algorithm for group LASSO: coordinate descent is standard and efficient, proximal gradient handles complex penalties, and ADMM works well for overlapping groups.
- max_iterations
Maximum iterations for optimization algorithm. Increase if convergence warnings occur.
- tolerance
Convergence tolerance for optimization. Smaller values provide more precise solutions but increase computation time.
- selection_threshold
Threshold for determining selected groups. Groups with maximum coefficient magnitude below this value are considered unselected.
- stability_selection
Perform stability selection across bootstrap samples to identify robust group selection patterns and reduce selection variability.
- bootstrap_samples
Number of bootstrap samples for stability selection. More samples provide more stable group selection.
- stability_threshold
Minimum selection frequency for groups in stability selection. Higher thresholds provide more conservative selection.
- nested_cv
Perform nested cross-validation for unbiased performance estimation. Provides honest assessment of model performance with optimal penalties.
- inner_cv_folds
Number of inner CV folds for nested cross-validation. Used for penalty selection within each outer fold.
- permutation_test
Perform permutation test to assess statistical significance of group selection and overall model performance.
- n_permutations
Number of permutations for significance testing. More permutations give finer p-value resolution, since the smallest attainable p-value is on the order of 1/n_permutations.
- show_group_summary
Display summary of group definitions, sizes, and selection status with penalty parameter information.
- show_coefficients
Display coefficient estimates for selected variables organized by groups with selection indicators.
- show_path_summary
Display summary of regularization path showing group entry and exit points along penalty sequence.
- show_cv_results
Display cross-validation results including optimal penalty selection and performance metrics.
- plot_regularization_path
Plot group-wise coefficient paths showing how groups enter/exit the model as penalty increases.
- plot_cv_curve
Plot cross-validation performance curve with optimal penalty selection and confidence bands.
- plot_group_importance
Visualize relative importance of selected groups based on coefficient norms and selection frequency.
- plot_stability
Plot stability selection results showing group selection frequencies across bootstrap samples.
- plot_group_structure
Visualize group structure and variable assignments with overlap indicators for complex grouping schemes.
- standardize
Standardize variables before fitting. Recommended for optimal penalty performance across different variable scales.
- center_groups
Center variables within their respective groups before applying penalties. Can improve performance for heterogeneous groups.
- adaptive_weights_method
Method for calculating adaptive weights for groups: ridge provides stable initial estimates, while univariate uses marginal effects.
- warm_start
Use warm start initialization for faster convergence along the regularization path.
- parallel_computing
Use parallel computing for cross-validation and bootstrap procedures to reduce computation time.
- random_seed
Random seed for cross-validation folds and bootstrap sampling. Ensures reproducible results across analyses.
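As an illustration of the `group_structure` format and the square-root-of-size weighting described above, the following sketch parses a specification string and derives one weight per group. `parse_group_structure` is a hypothetical helper written for this example. It is not exported by the package, and the package's own parsing may differ.

```r
# Hypothetical helper (not part of the package): turn a string such as
# "var1:group1, var2:group1, var3:group2" into a variable/group table.
parse_group_structure <- function(spec) {
  # Split on commas and drop surrounding whitespace and empty entries
  pairs <- trimws(strsplit(spec, ",", fixed = TRUE)[[1]])
  pairs <- pairs[nzchar(pairs)]
  # Each entry must have exactly one "variable:group" separator
  parts <- strsplit(pairs, ":", fixed = TRUE)
  ok <- lengths(parts) == 2
  if (!all(ok)) {
    stop("Malformed entries: ", paste(pairs[!ok], collapse = ", "))
  }
  data.frame(
    variable = trimws(vapply(parts, `[`, character(1), 1)),
    group    = trimws(vapply(parts, `[`, character(1), 2)),
    stringsAsFactors = FALSE
  )
}

assignments <- parse_group_structure("var1:group1, var2:group1, var3:group2")

# Number of variables per group, then sqrt(size) weights
group_sizes       <- lengths(split(assignments$variable, assignments$group))
sqrt_size_weights <- sqrt(group_sizes)
```

A variable that appears under two group labels simply yields two rows in `assignments`, which is one way overlapping groups could be represented.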
Value
A results object containing:
results$instructions | an html
results$todo | an html
results$groupSummary | a table
results$coefficients | a table
results$pathSummary | a table
results$cvResults | a table
results$stabilityResults | a table
results$modelPerformance | a table
results$nestedCVResults | a table
results$permutationResults | a table
results$pathPlot | an image
results$cvPlot | an image
results$importancePlot | an image
results$stabilityPlot | an image
results$groupStructurePlot | an image
Tables can be converted to data frames with asDF or as.data.frame. For example:
results$groupSummary$asDF
as.data.frame(results$groupSummary)
Examples
result <- grouplasso(
  data = mydata,
  time = "time_to_event",
  event = "event_indicator",
  predictors = c("age", "stage", "biomarker1", "biomarker2"),
  # group_structure is only used when group_definition is "custom"
  group_definition = "custom",
  group_structure = "age:1, stage:2, biomarker1:3, biomarker2:3",
  penalty_type = "group_lasso",
  cv_folds = 10
)
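The example above assumes that a data frame named `mydata` already exists. For trying out the call, a toy data set with the same column names could be simulated as sketched below. The sample size, distributions, and effect size are arbitrary choices made for illustration and do not come from the package.

```r
# Simulated toy data (assumed values, for illustration only)
set.seed(123)
n <- 200

mydata <- data.frame(
  age        = rnorm(n, mean = 60, sd = 10),
  stage      = factor(sample(c("I", "II", "III"), n, replace = TRUE)),
  biomarker1 = rnorm(n),
  biomarker2 = rnorm(n)
)

# Exponential event times whose hazard increases with biomarker1,
# plus independent exponential censoring times
event_time  <- rexp(n, rate = 0.1 * exp(0.5 * mydata$biomarker1))
censor_time <- rexp(n, rate = 0.05)

# Observed time is whichever comes first; event is 1 if the event was observed
mydata$time_to_event   <- pmin(event_time, censor_time)
mydata$event_indicator <- as.integer(event_time <= censor_time)
```

Here `stage` is a factor, so its dummy variables would form a natural group, and `event_indicator` follows the 0 = censored, 1 = event coding described under Arguments.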