Fits a group LASSO for Cox proportional hazards models, selecting pre-defined groups of variables as units while preserving within-group structure. The method extends the traditional LASSO by applying the penalty at the group level, which suits categorical variables coded as multiple dummy variables, grouped biomarkers, and structured predictors such as gene pathways. The implementation supports overlapping groups, adaptive group weights, sparse group LASSO (which combines group-level and individual penalties), and cross-validation for penalty selection. It is particularly useful for genomic survival analysis, clinical prediction models with natural variable groupings, and settings that need interpretable group-wise feature selection that respects biological or clinical structure.

Usage

grouplasso(
  data,
  time,
  event,
  predictors,
  strata,
  group_definition = "automatic",
  group_structure = "",
  factor_grouping = TRUE,
  penalty_type = "group_lasso",
  alpha = 0.5,
  group_weights = "sqrt_size",
  custom_weights = "",
  cv_folds = 10,
  cv_measure = "deviance",
  lambda_sequence = "auto",
  n_lambda = 50,
  lambda_min_ratio = 0.01,
  algorithm = "coordinate",
  max_iterations = 1000,
  tolerance = 1e-06,
  selection_threshold = 1e-08,
  stability_selection = FALSE,
  bootstrap_samples = 100,
  stability_threshold = 0.6,
  nested_cv = FALSE,
  inner_cv_folds = 5,
  permutation_test = FALSE,
  n_permutations = 100,
  show_group_summary = TRUE,
  show_coefficients = TRUE,
  show_path_summary = TRUE,
  show_cv_results = TRUE,
  plot_regularization_path = TRUE,
  plot_cv_curve = TRUE,
  plot_group_importance = TRUE,
  plot_stability = FALSE,
  plot_group_structure = FALSE,
  standardize = TRUE,
  center_groups = FALSE,
  adaptive_weights_method = "ridge",
  warm_start = TRUE,
  parallel_computing = FALSE,
  random_seed = 123
)

Arguments

data

The data as a data frame.

time

Time to event variable (numeric). For right-censored data, this is the time from study entry to event or censoring.

event

Event indicator variable. For survival analysis: 0 = censored, 1 = event. For competing risks: 0 = censored, 1+ = different event types.

predictors

Variables to include in the group LASSO Cox model. Variables can be grouped based on biological, clinical, or statistical criteria. Factor variables are automatically converted to dummy variables.

strata

Optional stratification variable for stratified Cox regression. Creates separate baseline hazards for each stratum.

group_definition

Method for defining variable groups: automatic groups variables by data type, custom allows manual specification, factor-based groups the dummy variables from the same factor, and biological supports pathway-based groupings.

group_structure

Custom group assignment as comma-separated list. Format: 'var1:group1, var2:group1, var3:group2'. Only used when group_definition is 'custom'. Groups can overlap.
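The 'var:group' format can be read into a variable-to-group mapping with base R. The following is a minimal sketch for checking a specification before passing it in; `parse_group_structure` is a hypothetical helper and is not part of the package, whose own parsing is not documented here.

```r
# Hypothetical helper: turn "var1:group1, var2:group1, var3:group2" into a
# named character vector mapping each variable to its group label.
parse_group_structure <- function(spec) {
  pairs <- trimws(strsplit(spec, ",", fixed = TRUE)[[1]])
  pairs <- pairs[nzchar(pairs)]
  parts <- strsplit(pairs, ":", fixed = TRUE)
  bad <- lengths(parts) != 2
  if (any(bad)) {
    stop("Malformed entries: ", paste(pairs[bad], collapse = ", "))
  }
  vars   <- trimws(vapply(parts, `[`, character(1), 1))
  groups <- trimws(vapply(parts, `[`, character(1), 2))
  stats::setNames(groups, vars)
}

groups <- parse_group_structure("age:1, stage:2, biomarker1:3, biomarker2:3")

# Variables belonging to each group
members <- split(names(groups), groups)
```

In this sketch a variable listed under two groups simply appears twice in the result, which is one way to represent overlapping groups.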

factor_grouping

Automatically group dummy variables from the same factor variable. Ensures that factor variables are selected/excluded as complete units.
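To see why dummy variables need a shared group, it can help to look at how base R expands a factor. The sketch below uses made-up toy data and `model.matrix()`; it illustrates the idea only and is not necessarily how the package builds its groups internally.

```r
# Toy data (made up for illustration)
df <- data.frame(
  age   = c(54, 61, 47, 70, 66, 58),
  stage = factor(c("I", "II", "III", "I", "II", "III"))
)

X <- model.matrix(~ age + stage, data = df)

# The "assign" attribute maps each design-matrix column to the term it came
# from; 0 marks the intercept, which a Cox model does not estimate.
group_index <- attr(X, "assign")[-1]
X <- X[, -1, drop = FALSE]
names(group_index) <- colnames(X)

group_index
```

Both `stageII` and `stageIII` carry the same index, so a group penalty keeps or drops them together.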

penalty_type

Type of group penalty: group LASSO selects entire groups, sparse group combines group-level and individual penalties, adaptive uses data-driven weights, and overlapping handles variables that belong to multiple groups.

alpha

Mixing parameter for sparse group LASSO. 0 = pure group LASSO, 1 = pure individual LASSO, intermediate values combine both penalties. Only used for sparse group LASSO.
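The role of alpha can be made concrete by evaluating the penalty directly. The sketch below assumes the common parameterization lambda * ((1 - alpha) * sum_g w_g * ||beta_g||_2 + alpha * ||beta||_1) with square-root group-size weights; the package's exact scaling is not stated here, and `sparse_group_penalty` is a hypothetical name.

```r
# Sketch of a sparse group LASSO penalty under the assumed parameterization
#   lambda * ((1 - alpha) * sum_g w_g * ||beta_g||_2 + alpha * ||beta||_1)
sparse_group_penalty <- function(beta, groups, lambda, alpha,
                                 weights = sqrt(table(groups))) {
  group_norms <- tapply(beta, groups, function(b) sqrt(sum(b^2)))
  w <- as.numeric(weights[names(group_norms)])
  lambda * ((1 - alpha) * sum(w * group_norms) + alpha * sum(abs(beta)))
}

# Toy coefficients in three groups
beta   <- c(0.5, -0.2, 0, 0, 1.0)
groups <- c("a", "a", "b", "b", "c")

sparse_group_penalty(beta, groups, lambda = 0.1, alpha = 0)  # group term only
sparse_group_penalty(beta, groups, lambda = 0.1, alpha = 1)  # L1 term only
```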

group_weights

Method for calculating group-specific penalty weights. The square root of the group size is the standard choice; adaptive derives weights from initial coefficient estimates; custom allows user-specified weights.

custom_weights

Custom weights for each group as comma-separated values. Order should match group numbering. Only used with custom weights.

cv_folds

Number of folds for cross-validation to select the optimal penalty parameter. More folds leave more data for training in each fit, which reduces bias in the error estimate, but increase computation time.

cv_measure

Performance measure for cross-validation. Deviance is the standard choice for Cox models; C-index focuses on discrimination; Brier score provides calibration-aware selection.

lambda_sequence

Specification of penalty parameter sequence. Automatic uses data-driven range, custom allows user-defined range.

n_lambda

Number of lambda values in the regularization path. More values provide finer resolution but increase computation time.

lambda_min_ratio

Ratio of the smallest to the largest lambda in the automatic sequence. Smaller values extend the path toward weaker penalties, allowing more groups to enter the model.
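As an illustration of how n_lambda and lambda_min_ratio interact, the sketch below builds a log-spaced grid, a convention used by several penalized regression packages. That the package spaces its grid this way is an assumption, `make_lambda_sequence` is a hypothetical helper, and `lambda_max = 1` is a placeholder for the smallest penalty that excludes every group.

```r
# Log-spaced penalty grid from lambda_max down to lambda_max * lambda_min_ratio.
make_lambda_sequence <- function(lambda_max, n_lambda = 50,
                                 lambda_min_ratio = 0.01) {
  exp(seq(log(lambda_max), log(lambda_max * lambda_min_ratio),
          length.out = n_lambda))
}

lambdas <- make_lambda_sequence(lambda_max = 1)
range(lambdas)  # smallest and largest penalty on the grid
```

Lowering lambda_min_ratio moves only the small end of the grid, so the extra models explored are the less heavily penalized ones.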

algorithm

Optimization algorithm for group LASSO. Coordinate descent is the standard, efficient choice; proximal gradient handles complex penalties; ADMM works well for overlapping groups.
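Block coordinate descent and proximal gradient methods for the group LASSO both rely on a group soft-thresholding step, which is what sets whole groups exactly to zero. The sketch below shows that operator in isolation; it is a generic illustration with a hypothetical function name, not the package's solver.

```r
# Group soft-thresholding: the proximal operator of t * ||z||_2.
# Shrinks the whole block toward zero and sets it exactly to zero
# when its Euclidean norm is at most t.
group_soft_threshold <- function(z, t) {
  z_norm <- sqrt(sum(z^2))
  if (z_norm <= t) {
    return(rep(0, length(z)))
  }
  (1 - t / z_norm) * z
}

group_soft_threshold(c(3, 4), t = 1)      # norm 5, block scaled by 0.8
group_soft_threshold(c(0.3, 0.4), t = 1)  # norm 0.5, block set to zero
```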

max_iterations

Maximum iterations for optimization algorithm. Increase if convergence warnings occur.

tolerance

Convergence tolerance for optimization. Smaller values provide more precise solutions but increase computation time.

selection_threshold

Threshold for determining selected groups. Groups with maximum coefficient magnitude below this value are considered unselected.
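The rule described above can be expressed in a few lines of base R. The coefficients and group labels below are toy values chosen for illustration.

```r
# Toy coefficient vector and group membership (illustration only)
beta   <- c(0.42, -0.10, 0, 0, 3e-10, -5e-9)
groups <- c("stage", "stage", "markers", "markers", "labs", "labs")
selection_threshold <- 1e-08

# A group counts as selected when its largest absolute coefficient
# reaches the threshold.
max_abs  <- tapply(abs(beta), groups, max)
selected <- max_abs >= selection_threshold

names(which(selected))
```

Here the "labs" group is treated as unselected even though its coefficients are not exactly zero, because both fall below the threshold.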

stability_selection

Perform stability selection across bootstrap samples to identify robust group selection patterns and reduce selection variability.

bootstrap_samples

Number of bootstrap samples for stability selection. More samples give more precise estimates of each group's selection frequency but increase computation time.

stability_threshold

Minimum selection frequency for groups in stability selection. Higher thresholds provide more conservative selection.
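How the threshold acts on selection frequencies can be sketched as follows. The indicator matrix is made up for illustration, and treating the threshold as inclusive (frequency at or above it) is an assumption based on the word "minimum".

```r
# Rows = bootstrap samples, columns = groups; TRUE means the group was
# selected in that sample. Toy values for illustration only.
selected <- rbind(
  c(stage = TRUE,  markers = TRUE,  labs = FALSE),
  c(stage = TRUE,  markers = TRUE,  labs = FALSE),
  c(stage = TRUE,  markers = TRUE,  labs = TRUE),
  c(stage = TRUE,  markers = FALSE, labs = FALSE),
  c(stage = FALSE, markers = TRUE,  labs = FALSE)
)
stability_threshold <- 0.6

# Proportion of bootstrap samples in which each group was selected
frequency <- colMeans(selected)
stable    <- names(frequency)[frequency >= stability_threshold]

frequency
stable
```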

nested_cv

Perform nested cross-validation for unbiased performance estimation. Provides honest assessment of model performance with optimal penalties.

inner_cv_folds

Number of inner CV folds for nested cross-validation. Used for penalty selection within each outer fold.

permutation_test

Perform permutation test to assess statistical significance of group selection and overall model performance.

n_permutations

Number of permutations for significance testing. More permutations provide more accurate p-values.
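The link between the number of permutations and p-value resolution can be seen from a standard permutation p-value estimate. The sketch below uses the add-one form; whether the package uses this exact estimator is not documented here, and `permutation_p_value` is a hypothetical name.

```r
# Permutation p-value: share of permuted statistics at least as extreme as
# the observed one. The add-one correction keeps the estimate above zero.
permutation_p_value <- function(observed, permuted) {
  (1 + sum(permuted >= observed)) / (1 + length(permuted))
}

# Toy statistics (illustration only): one of four permuted values
# reaches the observed value.
permutation_p_value(observed = 0.7, permuted = c(0.5, 0.6, 0.8, 0.55))

# Smallest p-value attainable with this estimator at the default
# n_permutations = 100
1 / (1 + 100)
```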

show_group_summary

Display summary of group definitions, sizes, and selection status with penalty parameter information.

show_coefficients

Display coefficient estimates for selected variables organized by groups with selection indicators.

show_path_summary

Display summary of regularization path showing group entry and exit points along penalty sequence.

show_cv_results

Display cross-validation results including optimal penalty selection and performance metrics.

plot_regularization_path

Plot group-wise coefficient paths showing how groups enter/exit the model as penalty increases.

plot_cv_curve

Plot cross-validation performance curve with optimal penalty selection and confidence bands.

plot_group_importance

Visualize relative importance of selected groups based on coefficient norms and selection frequency.

plot_stability

Plot stability selection results showing group selection frequencies across bootstrap samples.

plot_group_structure

Visualize group structure and variable assignments with overlap indicators for complex grouping schemes.

standardize

Standardize variables before fitting. Recommended for optimal penalty performance across different variable scales.

center_groups

Center variables within their respective groups before applying penalties. Can improve performance for heterogeneous groups.

adaptive_weights_method

Method for calculating adaptive weights for groups. Ridge provides stable estimates, univariate uses marginal effects.

warm_start

Use warm start initialization for faster convergence along the regularization path.

parallel_computing

Use parallel computing for cross-validation and bootstrap procedures to reduce computation time.

random_seed

Random seed for cross-validation folds and bootstrap sampling. Ensures reproducible results across analyses.
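The effect of fixing the seed can be illustrated with a simple random fold assignment. This is a sketch under the assumption of unstratified random folds; `assign_folds` is a hypothetical helper and the package may assign folds differently, for example by balancing events across folds.

```r
# Randomly assign n observations to cv_folds folds of near-equal size.
assign_folds <- function(n, cv_folds = 10, random_seed = 123) {
  set.seed(random_seed)
  sample(rep(seq_len(cv_folds), length.out = n))
}

folds_a <- assign_folds(95)
folds_b <- assign_folds(95)

identical(folds_a, folds_b)  # same seed, same fold assignment
table(folds_a)               # fold sizes differ by at most one
```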

Value

A results object containing:

results$instructions         a html
results$todo                 a html
results$groupSummary         a table
results$coefficients         a table
results$pathSummary          a table
results$cvResults            a table
results$stabilityResults     a table
results$modelPerformance     a table
results$nestedCVResults      a table
results$permutationResults   a table
results$pathPlot             an image
results$cvPlot               an image
results$importancePlot       an image
results$stabilityPlot        an image
results$groupStructurePlot   an image

Tables can be converted to data frames with asDF or as.data.frame. For example:

results$groupSummary$asDF

as.data.frame(results$groupSummary)

Examples

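# mydata is a placeholder for a data frame containing the columns named below.
result <- grouplasso(
    data = mydata,
    time = "time_to_event",
    event = "event_indicator",
    predictors = c("age", "stage", "biomarker1", "biomarker2"),
    group_definition = "custom",  # group_structure is only used with 'custom'
    group_structure = "age:1, stage:2, biomarker1:3, biomarker2:3",
    penalty_type = "group_lasso",
    cv_folds = 10
)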