Group LASSO Cox Regression - Comprehensive Guide • ClinicoPath

Note: The grouplasso() function is designed for use within jamovi’s GUI. The code examples below show the R syntax for reference.

Group LASSO for Survival Analysis

Overview

Group LASSO fits penalized Cox proportional hazards models that select or drop predefined groups of variables as a unit. Standard LASSO selects individual variables. Group LASSO instead penalizes each group by the Euclidean (L2) norm of its coefficients and sums these norms across groups, which gives an L1/L2 mixed-norm penalty. The models are fitted with the grpreg R package. As a result, entire groups of variables (e.g., all dummy codes from a categorical variable, genes in a pathway, or biomarkers in a clinical panel) are selected or excluded together.
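The following minimal base-R sketch shows the group penalty and the group soft-thresholding step that group-descent solvers apply to one group at a time. The function names are hypothetical. The default weights assume the square root of group size, which is grpreg's default group multiplier. This illustrates the math and is not the module's code.

```r
# Group LASSO penalty: lambda * sum over groups of w_g * ||beta_g||_2
# (hypothetical helper; weights default to sqrt(group size))
group_lasso_penalty <- function(beta, group, lambda,
                                weights = sqrt(as.vector(table(group)))) {
  # L2 (Euclidean) norm of each group's coefficient sub-vector
  norms <- tapply(beta, group, function(b) sqrt(sum(b^2)))
  lambda * sum(weights * as.vector(norms))
}

# Group soft-thresholding: shrinks a whole group toward zero and returns
# exactly zero for every member when the group's norm is at most t
group_soft_threshold <- function(z, t) {
  norm_z <- sqrt(sum(z^2))
  if (norm_z <= t) return(rep(0, length(z)))
  (1 - t / norm_z) * z
}

beta  <- c(0.5, -0.2, 0, 0, 1.0)
group <- c(1, 1, 2, 2, 3)
group_lasso_penalty(beta, group, lambda = 0.1)
group_soft_threshold(c(0.3, 0.4), t = 0.6)  # norm 0.5 <= 0.6: c(0, 0)
group_soft_threshold(c(3, 4), t = 1)        # norm 5: scaled by 0.8 to c(2.4, 3.2)
```

The threshold acts on the group norm, so all members of a group are zeroed together. This is the mechanism behind the all-or-none group selection.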

The module supports four penalty types (Group LASSO, Group MCP, Group SCAD, and Adaptive Group LASSO) and several grouping strategies. It also offers stability selection for robust variable identification, nested cross-validation for an honest estimate of out-of-sample performance, and permutation testing to assess statistical significance.

This analysis is particularly valuable for genomic survival studies with natural pathway groupings, clinical prediction models with domain-based predictor sets, and any scenario where interpretable group-wise feature selection is preferred over individual variable selection.

Datasets Used in This Guide

Dataset	N	Predictors	Events	Primary Use
`grouplasso_biomarker`	200	15 (mixed numeric + factor)	~100	Breast cancer biomarker panel with 5 clinical groups
`grouplasso_genomic`	120	30 (all continuous)	~60	Gene expression data with 6 pathway groups
`grouplasso_small`	60	8 (mixed)	~30	Small clinical cohort for edge-case testing

1. Basic Group LASSO with Automatic Grouping

Default analysis with breast cancer biomarker data

This example uses automatic grouping, where each original variable (and its dummy codes for factors) forms a separate group.
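In R, such automatic groups can be derived from the `assign` attribute of `model.matrix()`, which records the original term behind each design-matrix column. The toy data below are hypothetical. This sketches the idea only, and the module's own grouping code may differ.

```r
# Hypothetical toy data standing in for a few biomarker columns
toy <- data.frame(
  age   = c(52, 61, 47, 70, 58, 66),
  grade = factor(c("G1", "G2", "G3", "G2", "G1", "G3")),
  her2  = factor(c("neg", "pos", "neg", "neg", "pos", "neg"))
)

# Expand factors into treatment-contrast dummy columns
X <- model.matrix(~ age + grade + her2, data = toy)
term_index <- attr(X, "assign")   # 0 = intercept, then one index per term

# Drop the intercept (a Cox model has none); the term index becomes the group id
X     <- X[, -1, drop = FALSE]
group <- term_index[-1]

split(colnames(X), group)
# groups: 1 = age, 2 = gradeG2 + gradeG3, 3 = her2pos
```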

biomarker <- read.csv(paste0(data_path, "grouplasso_biomarker.csv"))
str(biomarker)
table(biomarker$status)

grouplasso(
  data = biomarker,
  time = "time",
  event = "status",
  outcomeLevel = "Dead",
  censorLevel = "Alive",
  predictors = c("age", "bmi", "tumor_size", "grade", "lvi",
                 "er", "pr", "her2", "ki67",
                 "albumin", "ldh", "crp",
                 "chemo", "radiation", "hormonal"),
  group_definition = "automatic",
  factor_grouping = TRUE,
  penalty_type = "group_lasso",
  group_weights = "sqrt_size",
  cv_folds = 10,
  suitabilityCheck = TRUE
)

Look for: The suitability report (events per variable, sample size, multicollinearity), the group summary showing which groups were selected, the coefficient table with hazard ratios, and the three default plots.
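Events per variable (EPV) is the number of observed events divided by the number of candidate coefficients. Here is a minimal base-R sketch. It assumes factors are counted after dummy coding, since the module's exact counting rule is not documented here. The helper names and toy data are hypothetical.

```r
# Number of design-matrix columns after dummy coding, intercept excluded
n_model_columns <- function(predictors, data) {
  ncol(model.matrix(reformulate(predictors), data = data)) - 1L
}

# EPV = events / number of coefficients to be estimated
events_per_variable <- function(data, status, predictors, event_level) {
  sum(data[[status]] == event_level) / n_model_columns(predictors, data)
}

toy <- data.frame(
  status = c("Dead", "Alive", "Dead", "Dead", "Alive", "Dead"),
  age    = c(52, 61, 47, 70, 58, 66),
  grade  = factor(c("G1", "G2", "G3", "G2", "G1", "G3"))
)
n_model_columns(c("age", "grade"), toy)                        # 3: age, gradeG2, gradeG3
events_per_variable(toy, "status", c("age", "grade"), "Dead")  # 4 events / 3 columns
```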

2. Penalty Type Comparison

Group MCP - non-convex penalty with less bias

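Group MCP (minimax concave penalty) is a non-convex penalty that shrinks large coefficients less than Group LASSO does. Its estimates of strong effects are therefore less biased, and it can yield a sparser model with fewer selected groups.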
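The difference is easiest to see in the penalty's slope, which is the shrinkage rate applied to a group with norm t. Below is a base-R sketch of the standard MCP formulas. The values lambda = 0.1 and gamma = 3 are illustrative assumptions, not the module's defaults.

```r
# MCP penalty as a function of the group norm t >= 0
mcp_penalty <- function(t, lambda, gamma = 3) {
  ifelse(t <= gamma * lambda,
         lambda * t - t^2 / (2 * gamma),  # concave region near zero
         gamma * lambda^2 / 2)            # constant beyond gamma * lambda
}

# Slope (shrinkage rate): MCP tapers to 0, LASSO stays at lambda
mcp_slope   <- function(t, lambda, gamma = 3) pmax(lambda - t / gamma, 0)
lasso_slope <- function(t, lambda) rep(lambda, length(t))

norm_grid <- c(0, 0.15, 0.3, 1, 2)
rbind(lasso = lasso_slope(norm_grid, 0.1),
      mcp   = mcp_slope(norm_grid, 0.1))
```

Once a group's norm exceeds gamma * lambda, MCP stops shrinking it. Group LASSO, by contrast, keeps shrinking every selected group at the same rate.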

grouplasso(
  data = biomarker,
  time = "time",
  event = "status",
  outcomeLevel = "Dead",
  censorLevel = "Alive",
  predictors = c("age", "bmi", "tumor_size", "grade", "lvi",
                 "er", "pr", "her2", "ki67",
                 "albumin", "ldh", "crp",
                 "chemo", "radiation", "hormonal"),
  penalty_type = "group_mcp",
  suitabilityCheck = FALSE
)

Group SCAD
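Group SCAD (smoothly clipped absolute deviation) is another non-convex penalty. It shrinks small groups at the full LASSO rate, then tapers the shrinkage linearly to zero. Below is a base-R sketch of the standard SCAD formulas. The values lambda = 0.1 and gamma = 4 are illustrative assumptions (gamma must exceed 2), not the module's settings.

```r
# SCAD slope (shrinkage rate) as a function of the group norm t >= 0
scad_slope <- function(t, lambda, gamma = 4) {
  stopifnot(gamma > 2)
  ifelse(t <= lambda, lambda,                        # LASSO-like region
    ifelse(t <= gamma * lambda,
           (gamma * lambda - t) / (gamma - 1),       # linear taper
           0))                                       # no shrinkage
}

# SCAD penalty; continuous at t = lambda and at t = gamma * lambda
scad_penalty <- function(t, lambda, gamma = 4) {
  ifelse(t <= lambda, lambda * t,
    ifelse(t <= gamma * lambda,
           (2 * gamma * lambda * t - t^2 - lambda^2) / (2 * (gamma - 1)),
           lambda^2 * (gamma + 1) / 2))
}

norm_grid <- c(0.05, 0.2, 1)
scad_slope(norm_grid, lambda = 0.1)   # 0.1, 0.2 / 3, 0
```

MCP's slope begins falling immediately. SCAD, in contrast, keeps the full LASSO rate up to t = lambda before it starts to taper.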

grouplasso(
  data = biomarker,
  time = "time",
  event = "status",
  outcomeLevel = "Dead",
  censorLevel = "Alive",
  predictors = c("age", "bmi", "tumor_size", "grade", "lvi",
                 "er", "pr", "her2", "ki67",
                 "albumin", "ldh", "crp",
                 "chemo", "radiation", "hormonal"),
  penalty_type = "group_scad",
  suitabilityCheck = FALSE
)

Adaptive Group LASSO with ridge initialization

Adaptive Group LASSO derives group-specific weights from an initial ridge Cox model. Groups with small initial coefficients receive larger penalties, and groups with strong initial effects are penalized less.
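One common way to turn initial estimates into adaptive group weights is w_g = sqrt(p_g) / ||b_g||. Here b_g holds group g's initial (ridge) coefficients and p_g is the group size. The sketch below substitutes a hypothetical coefficient vector for a fitted ridge Cox model. The module's exact weight formula may differ.

```r
adaptive_group_weights <- function(beta_init, group, eps = 1e-8) {
  norms <- tapply(beta_init, group, function(b) sqrt(sum(b^2)))
  sizes <- tapply(beta_init, group, length)
  # Small initial norm -> large weight -> stronger penalty; eps avoids 1/0
  w <- as.numeric(sqrt(sizes) / pmax(as.numeric(norms), eps))
  setNames(w, names(norms))
}

beta_ridge <- c(0.40, -0.30, 0.02, 0.01, 0.80)  # hypothetical ridge estimates
group      <- c(1, 1, 2, 2, 3)
adaptive_group_weights(beta_ridge, group)
# group 2 (near-zero initial effects) receives by far the largest weight
```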

grouplasso(
  data = biomarker,
  time = "time",
  event = "status",
  outcomeLevel = "Dead",
  censorLevel = "Alive",
  predictors = c("age", "bmi", "tumor_size", "grade", "lvi",
                 "er", "pr", "her2", "ki67",
                 "albumin", "ldh", "crp",
                 "chemo", "radiation", "hormonal"),
  penalty_type = "adaptive_group",
  adaptive_weights_method = "ridge",
  suitabilityCheck = FALSE
)

3. Custom Group Definitions

Manual group assignment for clinical domains

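Define five clinical domain groups explicitly: (1) age and BMI, (2) tumor characteristics (tumor size, grade, LVI), (3) receptor and proliferation markers (ER, PR, HER2, Ki-67), (4) serum markers (albumin, LDH, CRP), and (5) treatments (chemotherapy, radiation, hormonal therapy).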

grouplasso(
  data = biomarker,
  time = "time",
  event = "status",
  outcomeLevel = "Dead",
  censorLevel = "Alive",
  predictors = c("age", "bmi", "tumor_size", "grade", "lvi",
                 "er", "pr", "her2", "ki67",
                 "albumin", "ldh", "crp",
                 "chemo", "radiation", "hormonal"),
  group_definition = "custom",
  group_structure = "age:1, bmi:1, tumor_size:2, grade:2, lvi:2, er:3, pr:3, her2:3, ki67:3, albumin:4, ldh:4, crp:4, chemo:5, radiation:5, hormonal:5",
  plot_group_structure = TRUE,
  suitabilityCheck = FALSE
)

Look for: The group structure plot showing the variable-to-group assignment, and the group summary table reflecting the custom grouping.
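The `group_structure` string is a comma-separated list of `variable:group` pairs. Below is a hypothetical base-R parser for checking a specification before entering it in the analysis. It does not reproduce the module's internal parser or its handling of factor dummy columns.

```r
parse_group_structure <- function(spec) {
  pairs <- trimws(strsplit(spec, ",", fixed = TRUE)[[1]])
  parts <- strsplit(pairs, ":", fixed = TRUE)
  vars  <- vapply(parts, function(p) trimws(p[1]), character(1))
  ids   <- vapply(parts, function(p) as.integer(trimws(p[2])), integer(1))
  setNames(ids, vars)
}

spec <- paste("age:1, bmi:1, tumor_size:2, grade:2, lvi:2,",
              "er:3, pr:3, her2:3, ki67:3, albumin:4, ldh:4, crp:4,",
              "chemo:5, radiation:5, hormonal:5")
groups <- parse_group_structure(spec)
table(groups)                  # group sizes: 2, 3, 4, 3, 3
split(names(groups), groups)   # variables listed per group
```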

4. Genomic Pathway Analysis

Gene pathway grouping with custom group structure

The genomic dataset has 30 gene expression variables organized into 6 biological pathways.

genomic <- read.csv(paste0(data_path, "grouplasso_genomic.csv"))
str(genomic)

grouplasso(
  data = genomic,
  time = "time",
  event = "status",
  outcomeLevel = "Progressed",
  censorLevel = "Stable",
  predictors = c("CCND1", "CCNE1", "CDK4", "CDK6", "RB1",
                 "PIK3CA", "AKT1", "PTEN", "MTOR", "TSC1",
                 "TP53", "MDM2", "ATM", "CHEK2", "CDKN2A",
                 "KRAS", "BRAF", "MAP2K1", "ERK1", "ERK2",
                 "BCL2", "BAX", "BIRC5", "CASP3", "CASP8",
                 "VEGFA", "FLT1", "KDR", "ANGPT1", "ANGPT2"),
  group_definition = "custom",
  group_structure = "CCND1:1, CCNE1:1, CDK4:1, CDK6:1, RB1:1, PIK3CA:2, AKT1:2, PTEN:2, MTOR:2, TSC1:2, TP53:3, MDM2:3, ATM:3, CHEK2:3, CDKN2A:3, KRAS:4, BRAF:4, MAP2K1:4, ERK1:4, ERK2:4, BCL2:5, BAX:5, BIRC5:5, CASP3:5, CASP8:5, VEGFA:6, FLT1:6, KDR:6, ANGPT1:6, ANGPT2:6",
  cv_folds = 5,
  suitabilityCheck = TRUE,
  plot_group_structure = TRUE
)

Look for: Which pathways (groups) are selected. The data were simulated with true signal in the Cell Cycle (1), p53 (3), and Angiogenesis (6) pathways.
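Typing a 30-gene `group_structure` string by hand is error-prone. This sketch builds the string, along with the `predictors` vector, from a named list of pathways. The list and helper are hypothetical conveniences. Comments name only the three pathways identified above.

```r
pathways <- list(
  "1" = c("CCND1", "CCNE1", "CDK4", "CDK6", "RB1"),     # Cell Cycle
  "2" = c("PIK3CA", "AKT1", "PTEN", "MTOR", "TSC1"),
  "3" = c("TP53", "MDM2", "ATM", "CHEK2", "CDKN2A"),    # p53
  "4" = c("KRAS", "BRAF", "MAP2K1", "ERK1", "ERK2"),
  "5" = c("BCL2", "BAX", "BIRC5", "CASP3", "CASP8"),
  "6" = c("VEGFA", "FLT1", "KDR", "ANGPT1", "ANGPT2")   # Angiogenesis
)

# "GENE:group" entries in list order, joined with ", "
build_group_structure <- function(groups) {
  entries <- unlist(Map(function(genes, id) paste0(genes, ":", id),
                        groups, names(groups)), use.names = FALSE)
  paste(entries, collapse = ", ")
}

predictors      <- unlist(pathways, use.names = FALSE)
group_structure <- build_group_structure(pathways)
```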

5. Group Weight Methods

Equal weights

grouplasso(
  data = biomarker,
  time = "time",
  event = "status",
  outcomeLevel = "Dead",
  censorLevel = "Alive",
  predictors = c("age", "bmi", "tumor_size", "grade", "lvi",
                 "er", "pr", "her2", "ki67",
                 "albumin", "ldh", "crp",
                 "chemo", "radiation", "hormonal"),
  group_weights = "equal",
  suitabilityCheck = FALSE
)

Custom weights - penalize treatment group more heavily

grouplasso(
  data = biomarker,
  time = "time",
  event = "status",
  outcomeLevel = "Dead",
  censorLevel = "Alive",
  predictors = c("age", "bmi", "tumor_size", "grade", "lvi",
                 "er", "pr", "her2", "ki67",
                 "albumin", "ldh", "crp",
                 "chemo", "radiation", "hormonal"),
  group_definition = "custom",
  group_structure = "age:1, bmi:1, tumor_size:2, grade:2, lvi:2, er:3, pr:3, her2:3, ki67:3, albumin:4, ldh:4, crp:4, chemo:5, radiation:5, hormonal:5",
  group_weights = "custom",
  custom_weights = "1.0, 1.0, 1.0, 1.0, 2.0",
  suitabilityCheck = FALSE
)

Look for: The treatment group (group 5) carries twice the penalty weight of the other groups, which makes it less likely to be selected.
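The three weighting options can be compared side by side for the five clinical groups above. This sketch makes two assumptions. First, each variable contributes one design column; a multi-level factor such as `grade` would add columns to its group. Second, each weight multiplies its group's penalty, as grpreg's `group.multiplier` argument does. How the module combines custom weights with group size is not documented here.

```r
group_of <- c(age = 1, bmi = 1, tumor_size = 2, grade = 2, lvi = 2,
              er = 3, pr = 3, her2 = 3, ki67 = 3,
              albumin = 4, ldh = 4, crp = 4,
              chemo = 5, radiation = 5, hormonal = 5)
sizes <- as.vector(table(group_of))   # 2 3 4 3 3

weights <- rbind(
  equal     = rep(1, length(sizes)),
  sqrt_size = sqrt(sizes),
  custom    = as.numeric(trimws(strsplit("1.0, 1.0, 1.0, 1.0, 2.0", ",")[[1]]))
)
colnames(weights) <- paste0("group", seq_along(sizes))
round(weights, 3)
```

With the custom row, group 5's penalty becomes lambda * 2 * ||beta_5||. In the group soft-thresholding update, its entry threshold therefore doubles.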

6. Stability Selection

Robust group identification via subsampling

Stability selection repeatedly fits the model on random subsamples and tracks which groups are consistently selected.

grouplasso(
  data = genomic,
  time = "time",
  event = "status",
  outcomeLevel = "Progressed",
  censorLevel = "Stable",
  predictors = c("CCND1", "CCNE1", "CDK4", "CDK6", "RB1",
                 "PIK3CA", "AKT1", "PTEN", "MTOR", "TSC1",
                 "TP53", "MDM2", "ATM", "CHEK2", "CDKN2A",
                 "KRAS", "BRAF", "MAP2K1", "ERK1", "ERK2",
                 "BCL2", "BAX", "BIRC5", "CASP3", "CASP8",
                 "VEGFA", "FLT1", "KDR", "ANGPT1", "ANGPT2"),
  group_definition = "custom",
  group_structure = "CCND1:1, CCNE1:1, CDK4:1, CDK6:1, RB1:1, PIK3CA:2, AKT1:2, PTEN:2, MTOR:2, TSC1:2, TP53:3, MDM2:3, ATM:3, CHEK2:3, CDKN2A:3, KRAS:4, BRAF:4, MAP2K1:4, ERK1:4, ERK2:4, BCL2:5, BAX:5, BIRC5:5, CASP3:5, CASP8:5, VEGFA:6, FLT1:6, KDR:6, ANGPT1:6, ANGPT2:6",
  stability_selection = TRUE,
  bootstrap_samples = 50,
  stability_threshold = 0.6,
  plot_stability = TRUE,
  cv_folds = 5,
  suitabilityCheck = FALSE
)

Look for: Groups with selection frequency above the 0.6 threshold (dashed red line in the plot) are considered stable. The stability results table shows per-group frequencies and scores.
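The core loop fits the model on many random subsamples, records which groups enter, and keeps groups whose selection frequency reaches the threshold. In this sketch, a hypothetical `selector` function stands in for a Group LASSO fit. The sketch draws half-size subsamples without replacement, as in Meinshausen and Bühlmann (2010). The module's resampling scheme may differ.

```r
stability_selection <- function(n, n_groups, selector, B = 50,
                                frac = 0.5, threshold = 0.6, seed = 1) {
  set.seed(seed)
  counts <- numeric(n_groups)
  for (b in seq_len(B)) {
    idx <- sample.int(n, size = floor(frac * n))  # half-size subsample
    chosen <- unique(selector(idx))               # groups selected on it
    counts[chosen] <- counts[chosen] + 1
  }
  freq <- counts / B
  list(frequency = freq, stable = which(freq >= threshold))
}

# Hypothetical selector: group 1 always enters, group 3 on roughly 30% of fits
toy_selector <- function(idx) if (runif(1) < 0.3) c(1, 3) else 1
res <- stability_selection(n = 120, n_groups = 6, selector = toy_selector)
res$frequency   # group 1 at 1.0, group 3 near 0.3, others 0
res$stable      # expected: only group 1 clears the 0.6 threshold
```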

7. Nested Cross-Validation

Unbiased performance estimation

Nested CV provides an honest estimate of out-of-sample performance by separating model selection (inner CV) from performance evaluation (outer CV).
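Nested CV depends on one property: inner folds are drawn only from each outer training set, so outer test rows never influence the choice of lambda. This base-R sketch shows the index bookkeeping with 5 outer and 3 inner folds, matching the example below. The helpers are hypothetical, and no model is fitted.

```r
# Random fold labels 1..k, as balanced as n allows
make_folds <- function(n, k) sample(rep_len(seq_len(k), n))

nested_cv_splits <- function(n, outer_k = 5, inner_k = 3, seed = 42) {
  set.seed(seed)
  outer <- make_folds(n, outer_k)
  lapply(seq_len(outer_k), function(f) {
    train <- which(outer != f)
    test  <- which(outer == f)
    # Inner folds partition the outer training rows only (used to tune lambda)
    inner <- split(train, make_folds(length(train), inner_k))
    list(train = train, test = test, inner = inner)
  })
}

splits <- nested_cv_splits(200)
sapply(splits, function(s) length(s$test))   # 40 test rows per outer fold
```

A full analysis would proceed in three steps for each outer fold. First, choose lambda by cross-validation over `s$inner`. Second, refit on `s$train`. Third, compute the C-index once on `s$test`. The average of these outer C-indices is the nested CV estimate.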

grouplasso(
  data = biomarker,
  time = "time",
  event = "status",
  outcomeLevel = "Dead",
  censorLevel = "Alive",
  predictors = c("age", "bmi", "tumor_size", "grade", "lvi",
                 "er", "pr", "her2", "ki67",
                 "albumin", "ldh", "crp",
                 "chemo", "radiation", "hormonal"),
  nested_cv = TRUE,
  cv_folds = 5,
  inner_cv_folds = 3,
  suitabilityCheck = FALSE
)

Look for: The nested CV results table showing per-fold performance (C-index), optimal lambda, and number of selected groups. Compare the average performance to the training C-index to assess overfitting.
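The C-index is a concordance probability. Below is a minimal base-R sketch of Harrell's C for right-censored data that ignores tied event times. In practice, `survival::concordance()` handles ties and runs much faster. The sketch illustrates the definition and is not the module's implementation.

```r
harrell_c <- function(time, status, risk) {
  concordant <- 0
  comparable <- 0
  for (i in seq_along(time)) {
    if (status[i] != 1) next       # the earlier time in a pair must be an event
    for (j in seq_along(time)) {
      if (time[j] > time[i]) {     # j was still at risk when i had the event
        comparable <- comparable + 1
        if (risk[i] > risk[j]) {
          concordant <- concordant + 1     # higher risk failed first
        } else if (risk[i] == risk[j]) {
          concordant <- concordant + 0.5   # tied risk scores count half
        }
      }
    }
  }
  concordant / comparable
}

time   <- c(1, 2, 3, 4)
status <- c(1, 1, 1, 0)
harrell_c(time, status, risk = c(4, 3, 2, 1))   # 1: perfect ranking
harrell_c(time, status, risk = c(1, 2, 3, 4))   # 0: reversed ranking
harrell_c(time, status, risk = rep(1, 4))       # 0.5: no discrimination
```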

8. Permutation Testing

Statistical significance of group selection

Permutation testing assesses whether the observed group selection and model performance are better than would be expected by chance.

grouplasso(
  data = biomarker,
  time = "time",
  event = "status",
  outcomeLevel = "Dead",
  censorLevel = "Alive",
  predictors = c("age", "bmi", "tumor_size", "grade", "lvi",
                 "er", "pr", "her2", "ki67",
                 "albumin", "ldh", "crp",
                 "chemo", "radiation", "hormonal"),
  permutation_test = TRUE,
  n_permutations = 50,
  suitabilityCheck = FALSE
)

Look for: The permutation results table with three test statistics (N Groups Selected, CV Deviance, Concordance Index). Small p-values indicate the model captures genuine signal.
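Each permutation shuffles the outcome relative to the predictors and recomputes the statistic. The p-value is the share of permuted statistics at least as extreme as the observed one. This sketch uses the (1 + count) / (1 + B) form, which never returns exactly zero. As a hypothetical stand-in for refitting the Cox model, the toy statistic is the absolute correlation on simulated data. The module's exact p-value formula is not documented here.

```r
perm_p_value <- function(observed, null_stats, larger_is_better = TRUE) {
  extreme <- if (larger_is_better) null_stats >= observed else null_stats <= observed
  (1 + sum(extreme)) / (1 + length(null_stats))
}

permutation_test <- function(stat_fun, n, B = 50, larger_is_better = TRUE, seed = 7) {
  set.seed(seed)
  observed   <- stat_fun(seq_len(n))   # original outcome order
  null_stats <- vapply(seq_len(B), function(b) stat_fun(sample.int(n)), numeric(1))
  list(observed = observed,
       p_value  = perm_p_value(observed, null_stats, larger_is_better))
}

# Toy data with a real association between x and y
set.seed(123)
x <- rnorm(60)
y <- x + rnorm(60)
stat_fun <- function(perm) abs(cor(x, y[perm]))   # statistic on permuted outcome
res <- permutation_test(stat_fun, n = 60, B = 50)
res$p_value   # small; the minimum possible with B = 50 is 1/51
```

A larger C-index is better. For CV deviance, smaller is better, so that statistic would use `larger_is_better = FALSE`.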

9. Clinical Output Panels

Results summary for reports

grouplasso(
  data = biomarker,
  time = "time",
  event = "status",
  outcomeLevel = "Dead",
  censorLevel = "Alive",
  predictors = c("age", "bmi", "tumor_size", "grade", "lvi",
                 "er", "pr", "her2", "ki67",
                 "albumin", "ldh", "crp",
                 "chemo", "radiation", "hormonal"),
  showSummary = TRUE,
  showExplanations = TRUE,
  suitabilityCheck = FALSE
)

Look for: The summary panel provides a plain-language paragraph suitable for tumor board notes, including penalty type, number of selected groups, hazard ratios for top variables, and C-index. The explanations panel covers what Group LASSO does, when to use it, assumptions, and interpretation tips.

10. Advanced Algorithm Settings

Unstandardized variables with tight convergence

grouplasso(
  data = biomarker,
  time = "time",
  event = "status",
  outcomeLevel = "Dead",
  censorLevel = "Alive",
  predictors = c("age", "bmi", "tumor_size", "grade", "lvi",
                 "er", "pr", "her2", "ki67",
                 "albumin", "ldh", "crp",
                 "chemo", "radiation", "hormonal"),
  standardize = FALSE,
  tolerance = 1e-6,
  max_iterations = 50000,
  selection_threshold = 1e-4,
  random_seed = 42,
  suitabilityCheck = FALSE
)

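Look for: With standardize = FALSE, the penalty acts on coefficients in each variable's original units, so variables measured on different scales are effectively penalized by different amounts. Raising selection_threshold (here 1e-4) can cause borderline variables with very small coefficients to be classified as “not selected.”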
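Units matter without standardization because the penalty acts on the coefficient, and the coefficient depends on the variable's units. This toy base-R sketch expresses a hypothetical tumor-size effect per mm and per cm. It also applies an assumed selection rule, |coefficient| > selection_threshold. That rule follows the description above and is not taken from the module's code.

```r
set.seed(1)
size_mm <- rnorm(100, mean = 25, sd = 8)
size_cm <- size_mm / 10

beta_mm <- 0.05            # hypothetical log-hazard ratio per mm
beta_cm <- beta_mm * 10    # the same effect expressed per cm
lambda  <- 0.1

# Unstandardized: identical effects incur penalties that differ 10-fold
c(per_mm = lambda * abs(beta_mm), per_cm = lambda * abs(beta_cm))

# Per-SD (standardized) coefficients agree, so the penalty no longer depends on units
c(per_mm = beta_mm * sd(size_mm), per_cm = beta_cm * sd(size_cm))

# Assumed selection rule: a coefficient counts as selected if |beta| > threshold
coefs <- c(tumor_size = 0.3, crp = 5e-5, ldh = 0)
names(coefs)[abs(coefs) > 1e-4]   # "tumor_size"
```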

11. Small Sample Edge Case

Small clinical cohort with reduced CV folds

small_data <- read.csv(paste0(data_path, "grouplasso_small.csv"))
str(small_data)
table(small_data$status)

grouplasso(
  data = small_data,
  time = "time",
  event = "status",
  outcomeLevel = "Dead",
  censorLevel = "Alive",
  predictors = c("age", "ecog", "tumor_size", "grade",
                 "hemoglobin", "wbc", "platelets", "ldh"),
  group_definition = "custom",
  group_structure = "age:1, ecog:1, tumor_size:2, grade:2, hemoglobin:3, wbc:3, platelets:3, ldh:3",
  cv_folds = 5,
  suitabilityCheck = TRUE
)

Look for: The suitability report should flag the small sample size. The model may select fewer groups with wider confidence intervals.

12. Minimal Input - Factor-Based Grouping

Using factor_based grouping method

grouplasso(
  data = biomarker,
  time = "time",
  event = "status",
  outcomeLevel = "Dead",
  censorLevel = "Alive",
  predictors = c("grade", "lvi", "her2", "chemo", "radiation", "hormonal"),
  group_definition = "factor_based",
  suitabilityCheck = FALSE
)

Look for: Each factor variable (grade, lvi, her2, chemo, radiation, hormonal) forms its own group with all its dummy variables included together.

References

Breheny P, Huang J (2015). “Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors.” Statistics and Computing, 25(2), 173-187.
Yuan M, Lin Y (2006). “Model selection and estimation in regression with grouped variables.” Journal of the Royal Statistical Society: Series B, 68(1), 49-67.
Meinshausen N, Bühlmann P (2010). “Stability selection.” Journal of the Royal Statistical Society: Series B, 72(4), 417-473.
Therneau TM (2026). survival: Survival Analysis. R package version 3.8-6.
Friedman J, Hastie T, Tibshirani R (2025). glmnet: Lasso and Elastic-Net Regularized Generalized Linear Models. R package version 4.1-10.