Group LASSO for Survival Analysis — grouplasso • ClinicoPath

Fits group LASSO for Cox proportional hazards models, selecting pre-defined variable groups as whole units while preserving within-group structure. The method extends the traditional LASSO by applying penalties at the group level, which suits categorical variables with multiple dummy codes, grouped biomarkers, and structured predictors such as gene pathways. The implementation supports overlapping groups, adaptive group weights, sparse group LASSO (combining group and individual penalties), and cross-validation for penalty selection. It is particularly useful for genomic survival analysis, clinical prediction models with natural variable groupings, and settings that need interpretable group-wise feature selection that respects biological or clinical structure.
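
As a sketch of what "penalties at the group level" means, one common formulation of the sparse group LASSO Cox objective is shown below. This page does not state the exact objective the implementation minimizes, so the scaling by 1/n and the symbols used here are assumptions for illustration.

```latex
\hat{\beta} \;=\; \arg\min_{\beta}\;
  -\frac{1}{n}\,\ell(\beta)
  \;+\; \lambda \left[ (1-\alpha) \sum_{g=1}^{G} w_g \,\lVert \beta_g \rVert_2
  \;+\; \alpha \,\lVert \beta \rVert_1 \right]
```

Here \ell(\beta) is the Cox log partial likelihood, \beta_g is the coefficient sub-vector for group g, and w_g is the group weight (for example \sqrt{p_g} for a group of p_g columns). Setting \alpha = 0 leaves only the group term (pure group LASSO) and \alpha = 1 leaves only the individual term, matching the description of the alpha argument. Because the group term uses an unsquared Euclidean norm, a sufficiently large \lambda sets all coefficients of a group to zero together.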

Usage

grouplasso(
  data,
  time,
  event,
  predictors,
  strata,
  group_definition = "automatic",
  group_structure = "",
  factor_grouping = TRUE,
  penalty_type = "group_lasso",
  alpha = 0.5,
  group_weights = "sqrt_size",
  custom_weights = "",
  cv_folds = 10,
  cv_measure = "deviance",
  lambda_sequence = "auto",
  n_lambda = 50,
  lambda_min_ratio = 0.01,
  algorithm = "coordinate",
  max_iterations = 1000,
  tolerance = 1e-06,
  selection_threshold = 1e-08,
  stability_selection = FALSE,
  bootstrap_samples = 100,
  stability_threshold = 0.6,
  nested_cv = FALSE,
  inner_cv_folds = 5,
  permutation_test = FALSE,
  n_permutations = 100,
  show_group_summary = TRUE,
  show_coefficients = TRUE,
  show_path_summary = TRUE,
  show_cv_results = TRUE,
  plot_regularization_path = TRUE,
  plot_cv_curve = TRUE,
  plot_group_importance = TRUE,
  plot_stability = FALSE,
  plot_group_structure = FALSE,
  standardize = TRUE,
  center_groups = FALSE,
  adaptive_weights_method = "ridge",
  warm_start = TRUE,
  parallel_computing = FALSE,
  random_seed = 123
)

Arguments

data: The data as a data frame.
time: Time to event variable (numeric). For right-censored data, this is the time from study entry to event or censoring.
event: Event indicator variable. For survival analysis: 0 = censored, 1 = event. For competing risks: 0 = censored, 1+ = different event types.
predictors: Variables to include in the group LASSO Cox model. Variables can be grouped based on biological, clinical, or statistical criteria. Factor variables are automatically converted to dummy variables.
strata: Optional stratification variable for stratified Cox regression. Creates separate baseline hazards for each stratum.
group_definition: Method for defining variable groups. Automatic grouping groups variables by data type; custom grouping allows manual specification; factor-based grouping groups the dummy variables from the same factor; biological grouping supports pathway-based groupings.
group_structure: Custom group assignment as comma-separated list. Format: 'var1:group1, var2:group1, var3:group2'. Only used when group_definition is 'custom'. Groups can overlap.
factor_grouping: Automatically group dummy variables from the same factor variable. Ensures that factor variables are selected/excluded as complete units.
penalty_type: Type of group penalty. Group LASSO selects entire groups; sparse group LASSO combines group and individual penalties; adaptive group LASSO uses data-driven weights; overlapping group LASSO handles variables that belong to multiple groups.
alpha: Mixing parameter for sparse group LASSO. 0 = pure group LASSO, 1 = pure individual LASSO, intermediate values combine both penalties. Only used for sparse group LASSO.
group_weights: Method for calculating group-specific penalty weights. Square root of group size is the standard choice; adaptive weights are derived from initial estimates; custom weights are supplied by the user.
custom_weights: Custom weights for each group as comma-separated values. Order should match group numbering. Only used with custom weights.
cv_folds: Number of folds for cross-validation to select the optimal penalty parameter. More folds train each fit on a larger share of the data but increase computation time.
cv_measure: Performance measure for cross-validation. Deviance is standard for Cox models; C-index focuses on discrimination; Brier score provides calibration-aware selection.
lambda_sequence: Specification of penalty parameter sequence. Automatic uses data-driven range, custom allows user-defined range.
n_lambda: Number of lambda values in the regularization path. More values provide finer resolution but increase computation time.
lambda_min_ratio: Ratio of smallest to largest lambda in the automatic sequence. Smaller values extend the path toward weaker penalties, and therefore toward less sparse models.
algorithm: Optimization algorithm for group LASSO. Coordinate descent is standard and efficient; proximal gradient handles complex penalties; ADMM works well for overlapping groups.
max_iterations: Maximum iterations for optimization algorithm. Increase if convergence warnings occur.
tolerance: Convergence tolerance for optimization. Smaller values provide more precise solutions but increase computation time.
selection_threshold: Threshold for determining selected groups. Groups with maximum coefficient magnitude below this value are considered unselected.
stability_selection: Perform stability selection across bootstrap samples to identify robust group selection patterns and reduce selection variability.
bootstrap_samples: Number of bootstrap samples for stability selection. More samples give more stable estimates of group selection frequencies but increase computation time.
stability_threshold: Minimum selection frequency for groups in stability selection. Higher thresholds provide more conservative selection.
nested_cv: Perform nested cross-validation for unbiased performance estimation. Provides honest assessment of model performance with optimal penalties.
inner_cv_folds: Number of inner CV folds for nested cross-validation. Used for penalty selection within each outer fold.
permutation_test: Perform permutation test to assess statistical significance of group selection and overall model performance.
n_permutations: Number of permutations for significance testing. More permutations provide more accurate p-values.
show_group_summary: Display summary of group definitions, sizes, and selection status with penalty parameter information.
show_coefficients: Display coefficient estimates for selected variables organized by groups with selection indicators.
show_path_summary: Display summary of regularization path showing group entry and exit points along penalty sequence.
show_cv_results: Display cross-validation results including optimal penalty selection and performance metrics.
plot_regularization_path: Plot group-wise coefficient paths showing how groups enter/exit the model as penalty increases.
plot_cv_curve: Plot cross-validation performance curve with optimal penalty selection and confidence bands.
plot_group_importance: Visualize relative importance of selected groups based on coefficient norms and selection frequency.
plot_stability: Plot stability selection results showing group selection frequencies across bootstrap samples.
plot_group_structure: Visualize group structure and variable assignments with overlap indicators for complex grouping schemes.
standardize: Standardize variables before fitting. Recommended so that the penalty treats variables measured on different scales comparably.
center_groups: Center variables within their respective groups before applying penalties. Can improve performance for heterogeneous groups.
adaptive_weights_method: Method for calculating adaptive weights for groups. Ridge provides stable estimates, univariate uses marginal effects.
warm_start: Use warm start initialization for faster convergence along the regularization path.
parallel_computing: Use parallel computing for cross-validation and bootstrap procedures to reduce computation time.
random_seed: Random seed for cross-validation folds and bootstrap sampling. Ensures reproducible results across analyses.
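
The group_structure format, the square-root-of-size weights, and the role of n_lambda and lambda_min_ratio can be illustrated with a short base R sketch. This is not the package's internal code: parse_group_structure is a hypothetical helper, lambda_max is a placeholder value, and the log-spaced path is the convention used by many penalized regression packages rather than something this page confirms.

```r
# Hypothetical helper: split "var:group" pairs into a data frame.
parse_group_structure <- function(spec) {
  pairs <- trimws(strsplit(spec, ",", fixed = TRUE)[[1]])
  parts <- strsplit(pairs, ":", fixed = TRUE)
  data.frame(
    variable = vapply(parts, function(p) trimws(p[1]), character(1)),
    group    = vapply(parts, function(p) trimws(p[2]), character(1)),
    stringsAsFactors = FALSE
  )
}

groups <- parse_group_structure("age:1, stage:2, biomarker1:3, biomarker2:3")

# Square-root-of-size weights, one per group. Sizes here count variables;
# a factor expanded into dummy columns would contribute more columns.
group_sizes   <- table(groups$group)
group_weights <- sqrt(as.numeric(group_sizes))
names(group_weights) <- names(group_sizes)

# Log-spaced penalty path from lambda_max down to lambda_max * lambda_min_ratio.
lambda_max       <- 1     # placeholder; normally derived from the data
n_lambda         <- 50
lambda_min_ratio <- 0.01
lambda_seq <- exp(seq(log(lambda_max),
                      log(lambda_max * lambda_min_ratio),
                      length.out = n_lambda))
```

In this sketch the path starts at the strongest penalty and ends at the weakest, so lowering lambda_min_ratio lengthens the path toward less regularized models.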

Value

A results object containing:

`results$instructions`					an html
`results$todo`					an html
`results$groupSummary`					a table
`results$coefficients`					a table
`results$pathSummary`					a table
`results$cvResults`					a table
`results$stabilityResults`					a table
`results$modelPerformance`					a table
`results$nestedCVResults`					a table
`results$permutationResults`					a table
`results$pathPlot`					an image
`results$cvPlot`					an image
`results$importancePlot`					an image
`results$stabilityPlot`					an image
`results$groupStructurePlot`					an image

Tables can be converted to data frames with asDF or as.data.frame. For example:

results$groupSummary$asDF

as.data.frame(results$groupSummary)

Examples

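# mydata is assumed to be a data frame containing the columns named below
result <- grouplasso(
    data = mydata,
    time = "time_to_event",
    event = "event_indicator",
    predictors = c("age", "stage", "biomarker1", "biomarker2"),
    # group_structure is only used when group_definition is "custom"
    group_definition = "custom",
    group_structure = "age:1, stage:2, biomarker1:3, biomarker2:3",
    penalty_type = "group_lasso",
    cv_folds = 10
)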
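
The example refers to a data frame named mydata that is not defined on this page. A minimal sketch of a simulated data set with matching column names is given below. The values are purely random, so no real predictor effect is present, and the sample size and distributions are arbitrary assumptions.

```r
set.seed(123)  # for reproducibility of the simulated values
n <- 200

# Simulated survival data with the column names used in the example call
mydata <- data.frame(
  time_to_event   = rexp(n, rate = 0.1),                 # follow-up time
  event_indicator = rbinom(n, size = 1, prob = 0.7),     # 1 = event, 0 = censored
  age             = round(rnorm(n, mean = 60, sd = 10)),
  stage           = factor(sample(c("I", "II", "III"), n, replace = TRUE)),
  biomarker1      = rnorm(n),
  biomarker2      = rnorm(n)
)

str(mydata)
```

A data frame built this way can be passed as data = mydata. Because stage is a factor, it would be expanded into dummy variables, as described for the predictors argument.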