
Performs penalized Cox proportional hazards regression using regularization methods (LASSO, Ridge, Elastic Net) for high-dimensional survival data. This method is particularly useful when the number of variables is large relative to the number of observations, or when multicollinearity is present. Regularization shrinks coefficients to reduce overfitting; the LASSO and Elastic Net penalties can also set coefficients exactly to zero, which performs variable selection.

Usage

penalizedcox(
  data,
  elapsedtime = NULL,
  tint = FALSE,
  dxdate = NULL,
  fudate = NULL,
  timetypedata = "ymd",
  timetypeoutput = "months",
  outcome = NULL,
  outcomeLevel,
  predictors = NULL,
  penalty_type = "lasso",
  alpha = 0.5,
  lambda_selection = "1se",
  lambda_custom = 0.01,
  cv_folds = 10,
  cv_type = "deviance",
  variable_selection = TRUE,
  standardize = TRUE,
  include_intercept = FALSE,
  bootstrap_validation = FALSE,
  bootstrap_samples = 100,
  predict_risk = TRUE,
  survival_curves = FALSE,
  risk_groups = "3",
  coefficient_plot = TRUE,
  cv_plot = TRUE,
  variable_importance = FALSE,
  lambda_sequence = "",
  max_variables = 100,
  convergence_threshold = 1e-07,
  show_coefficients = TRUE,
  show_model_metrics = TRUE,
  show_lambda_path = FALSE,
  showSummaries = FALSE,
  showExplanations = FALSE,
  addRiskScore = FALSE,
  addRiskGroup = FALSE
)

Arguments

data

The dataset for analysis, provided as a data frame. Should contain survival variables and predictor variables for penalized regression.

elapsedtime

The numeric variable representing follow-up time until the event or censoring.

tint

If TRUE, survival time is calculated from the diagnosis and follow-up dates (dxdate and fudate) instead of being read from elapsedtime.

dxdate

Date of diagnosis or start of follow-up. Required if tint = TRUE.

fudate

Follow-up date or date of last observation. Required if tint = TRUE.

timetypedata

Specifies the format of date variables in the input data.

timetypeoutput

The units in which survival time is reported in the output.
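As a rough illustration of what a date-based follow-up time involves, the sketch below computes elapsed time in months from two date columns using base R. It is not the function's internal code: the "%Y-%m-%d" format string is assumed to correspond to timetypedata = "ymd", and the conversion factor of 30.4375 days per month is an assumption made here for illustration.

```r
# Hypothetical diagnosis and follow-up dates in year-month-day order
dxdate <- c("2020-01-15", "2019-06-01")
fudate <- c("2021-01-15", "2020-06-01")

# Parse the character dates (assumed equivalent to timetypedata = "ymd")
dx <- as.Date(dxdate, format = "%Y-%m-%d")
fu <- as.Date(fudate, format = "%Y-%m-%d")

# Elapsed time in days, then converted to months
# (30.4375 = 365.25 / 12 is an assumed average month length)
elapsed_days <- as.numeric(difftime(fu, dx, units = "days"))
elapsed_months <- elapsed_days / 30.4375

elapsed_months
```

Rows where the follow-up date precedes the diagnosis date would give negative times and are worth checking before fitting any model.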

outcome

The outcome variable indicating event status (e.g., death, recurrence).

outcomeLevel

The level of outcome considered as the event.

predictors

Variables to include in the penalized Cox regression model.

penalty_type

Type of penalty to apply. LASSO performs variable selection, Ridge shrinks coefficients, Elastic Net combines both.

alpha

Mixing parameter for Elastic Net, between 0 and 1. alpha = 1 is LASSO, alpha = 0 is Ridge. Used only when penalty_type = "elastic_net".
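To make the role of alpha concrete, the following sketch evaluates the elastic net penalty in the form used by glmnet, lambda * (alpha * sum(|beta|) + (1 - alpha) / 2 * sum(beta^2)). The helper name elastic_net_penalty and the coefficient values are hypothetical and exist only for this illustration.

```r
# Elastic net penalty term in the glmnet parameterization.
# elastic_net_penalty is a hypothetical helper, not part of any package.
elastic_net_penalty <- function(beta, lambda, alpha) {
  l1 <- sum(abs(beta))   # LASSO part
  l2 <- sum(beta^2)      # Ridge part
  lambda * (alpha * l1 + (1 - alpha) / 2 * l2)
}

beta <- c(1, -2)  # made-up coefficients

lasso_pen <- elastic_net_penalty(beta, lambda = 0.5, alpha = 1)    # pure L1
ridge_pen <- elastic_net_penalty(beta, lambda = 0.5, alpha = 0)    # pure L2
enet_pen  <- elastic_net_penalty(beta, lambda = 0.5, alpha = 0.5)  # mixture
```

With alpha = 0.5 the penalty is an equal-weight mixture of the two terms, which is why Elastic Net both shrinks coefficients and can set some of them to zero.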

lambda_selection

Method for selecting the regularization parameter lambda.

lambda_custom

Custom lambda value when lambda_selection = "custom".

cv_folds

Number of folds for cross-validation to select optimal lambda.

cv_type

Cross-validation error measure for model selection.
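The cross-validation options above map naturally onto arguments of glmnet::cv.glmnet. The sketch below calls glmnet directly on survival::lung to show what "min" and "1se" selection refer to. It is an approximation under stated assumptions, not necessarily how penalizedcox calls glmnet internally; the choice of the lung dataset and of complete-case filtering with na.omit is made here only for the example.

```r
library(survival)
library(glmnet)

# Complete cases only; glmnet does not accept missing values
dat <- na.omit(lung[, c("time", "status", "age", "sex", "ph.ecog", "ph.karno")])
x <- as.matrix(dat[, c("age", "sex", "ph.ecog", "ph.karno")])
y <- Surv(dat$time, dat$status == 2)  # in lung, status 2 = dead

set.seed(1)  # fold assignment is random
cv_fit <- cv.glmnet(
  x, y,
  family = "cox",
  alpha = 1,                  # LASSO
  nfolds = 10,                # analogous to cv_folds
  type.measure = "deviance"   # analogous to cv_type
)

cv_fit$lambda.min  # lambda with the lowest cross-validated error
cv_fit$lambda.1se  # largest lambda within one standard error of that minimum

# Coefficients at the more conservative "1se" choice
coef(cv_fit, s = "lambda.1se")
```

The "1se" rule trades a little cross-validated fit for a sparser model, which is often preferred when the goal is a short list of predictors.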

variable_selection

Extract and display selected variables (non-zero coefficients).

standardize

Whether to standardize predictor variables before fitting.

include_intercept

Whether to include an intercept in the model. Cox models normally have no intercept because it is absorbed into the baseline hazard, so this is usually FALSE.

bootstrap_validation

Perform bootstrap validation of the penalized model.

bootstrap_samples

Number of bootstrap samples for validation.

predict_risk

Calculate linear predictors (risk scores) for each observation.
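For reference, a linear predictor in a Cox model is the sum of each predictor value multiplied by its coefficient. The sketch below computes it by hand for two hypothetical patients; the coefficient values are made up for illustration and do not come from a fitted model.

```r
# Two hypothetical patients with two predictors
x <- matrix(
  c(60, 1,
    70, 2),
  nrow = 2, byrow = TRUE,
  dimnames = list(NULL, c("age", "sex"))
)

# Made-up penalized coefficients (log hazard ratios)
beta <- c(age = 0.02, sex = -0.5)

# Linear predictor (risk score): one value per observation
lp <- as.vector(x %*% beta)

# Hazard of patient 1 relative to patient 2
relative_hazard <- exp(lp[1] - lp[2])
```

Only differences between linear predictors are interpretable, because the baseline hazard is left unspecified in a Cox model.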

survival_curves

Generate survival curves stratified by risk score groups.

risk_groups

Number of risk groups for survival curve stratification.
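One common way to form risk groups is to cut the risk score at its quantiles. The sketch below does this in base R for three groups. Whether penalizedcox uses quantile cut points is an assumption here, and the risk scores and group labels are hypothetical.

```r
# Hypothetical linear predictors (risk scores)
lp <- c(-1.2, -0.4, 0.1, 0.3, 0.9, 1.5)

# risk_groups is passed as a string in the usage above, so convert it
n_groups <- as.integer("3")

# Quantile cut points (tertiles for three groups)
cuts <- quantile(lp, probs = seq(0, 1, length.out = n_groups + 1))

# Assign each observation to a group; labels are illustrative
risk_group <- cut(
  lp,
  breaks = cuts,
  include.lowest = TRUE,
  labels = c("Low", "Intermediate", "High")
)

table(risk_group)
```

Quantile-based cut points give groups of roughly equal size, which keeps stratified survival curves from being dominated by one very small group.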

coefficient_plot

Display coefficient paths showing shrinkage across lambda values.

cv_plot

Display cross-validation error plot for lambda selection.

variable_importance

Display variable importance based on coefficient magnitudes.
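A magnitude-based ranking can be sketched as follows: drop coefficients that were shrunk to zero, then order the rest by absolute value. The coefficients below are made-up values, and this is only one plausible reading of "importance based on coefficient magnitudes".

```r
# Made-up penalized coefficients; ph.karno was shrunk to exactly zero
coefs <- c(age = 0.015, sex = -0.42, ph.ecog = 0.38, ph.karno = 0)

# Keep the selected (non-zero) variables
selected <- coefs[coefs != 0]

# Rank by absolute coefficient size, largest first
importance <- sort(abs(selected), decreasing = TRUE)

importance
```

Such a ranking is only meaningful when the predictors are on a comparable scale, for example when they were standardized before fitting.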

lambda_sequence

Custom sequence of lambda values (comma-separated). If empty, glmnet will choose automatically.
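A comma-separated string such as the one expected here can be turned into a numeric lambda sequence with base R, as in the sketch below. The parsing shown is an assumption about how the option could be handled, not the function's actual code. Sorting in decreasing order follows the glmnet recommendation to supply lambda values from largest to smallest.

```r
# Hypothetical user input for lambda_sequence
lambda_sequence <- "0.1, 0.05, 0.01, 0.5"

# Split on commas, trim whitespace, convert to numeric
lambda_values <- as.numeric(
  trimws(strsplit(lambda_sequence, ",", fixed = TRUE)[[1]])
)

# glmnet expects a decreasing sequence
lambda_values <- sort(lambda_values, decreasing = TRUE)

lambda_values
```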

max_variables

Maximum number of variables to include in the model path.

convergence_threshold

Convergence threshold for coordinate descent algorithm.

show_coefficients

Display table of selected variables and their coefficients.

show_model_metrics

Display model performance metrics including deviance and C-index.

show_lambda_path

Display detailed information about lambda selection process.

showSummaries

Display natural language summaries alongside tables and plots for interpretation of penalized Cox regression results.

showExplanations

Display detailed explanations of penalized Cox regression methods and interpretation guidelines.

addRiskScore

Add calculated linear predictor (risk score) as new variable to dataset.

addRiskGroup

Add risk group classification as new variable to dataset.

Value

A results object containing:

results$todo                            a html
results$modelSummary                    a html
results$coefficientTable                a table
results$performanceTable                a table
results$crossValidationSummary          a html
results$lambdaTable                     a table
results$variableImportanceTable         a table
results$riskScoreTable                  a table
results$riskGroupTable                  a table
results$bootstrapTable                  a table
results$coefficientPlot                 an image
results$cvPlot                          an image
results$importancePlot                  an image
results$survivalPlot                    an image
results$analysisSummary                 a html
results$methodExplanation               a html
results$regularizationExplanation       a html
results$variableSelectionExplanation    a html
results$crossValidationExplanation      a html
results$performanceExplanation          a html
results$calculatedtime                  an output
results$riskScoreOutput                 an output
results$riskGroupOutput                 an output

Tables can be converted to data frames with asDF or as.data.frame. For example:

results$coefficientTable$asDF

as.data.frame(results$coefficientTable)

Examples

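# Example 1: LASSO penalized Cox regression
# lung_data is assumed to be a data frame with the columns of survival::lung
# (status: 1 = censored, 2 = dead)
library(survival)
library(glmnet)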

penalizedcox(
    data = lung_data,
    elapsedtime = "time",
    outcome = "status",
    outcomeLevel = "2",
    predictors = c("age", "sex", "ph.ecog", "ph.karno"),
    penalty_type = "lasso",
    cv_folds = 10
)

# Example 2: Elastic Net with variable selection
# genomic_data is a placeholder for a data frame with these columns
penalizedcox(
    data = genomic_data,
    elapsedtime = "survival_time",
    outcome = "event",
    outcomeLevel = "1",
    predictors = c("gene1", "gene2", "gene3", "gene4"),
    penalty_type = "elastic_net",
    alpha = 0.5,
    lambda_selection = "1se",
    variable_selection = TRUE
)