Recursive Partitioning Analysis (RPA) for survival data using CART methodology. Builds a decision tree that identifies cut-points for risk stratification, with cross-validation and tree pruning enabled by default (nfolds = 10, prunetree = TRUE). Optionally creates a new variable with risk group assignments (createnewvar = TRUE). Useful for developing prognostic staging systems that integrate multiple predictors.
Usage
rpasurvival(
  data,
  time,
  event,
  predictors,
  eventValue = "1",
  time_unit = "months",
  minbucket = 20,
  cp = 0.01,
  maxdepth = 3,
  nfolds = 10,
  prunetree = TRUE,
  riskgrouplabels = "auto",
  treeplot = TRUE,
  kmplot = TRUE,
  kmci = FALSE,
  risktable = TRUE,
  pval = TRUE,
  riskgrouptable = TRUE,
  cptable = FALSE,
  variableimportance = TRUE,
  createnewvar = FALSE,
  newvarname = "rpa_stage",
  showSummary = TRUE,
  showInterpretation = FALSE,
  showReport = TRUE
)
Arguments
- data
the data as a data frame
- time
a (non-negative valued) vector of survival times containing the (possibly censored) time to the event or time of last observation
- event
the status indicator; normally 0=alive/censored, 1=dead/event. Other choices are TRUE/FALSE (TRUE = death/event) or 1/2 (2=death/event)
- predictors
variables to use in recursive partitioning analysis for developing risk stratification groups
- eventValue
the value in the event variable that represents an event (death/failure)
- time_unit
unit of measurement for survival time. Used for calculating 5-year survival estimates in the risk group summary table.
- minbucket
the minimum number of observations in any terminal (leaf) node. Smaller values create more detailed trees but may overfit.
- cp
any split that does not decrease overall lack of fit by a factor of cp is not attempted. Smaller values grow larger trees.
- maxdepth
maximum depth of any node of the final tree, with the root node counted as depth 0. Values greater than 30 are unlikely to be useful.
- nfolds
number of cross-validation folds for pruning the tree. Use 0 to suppress cross-validation.
- prunetree
prune the tree using cross-validation to select optimal complexity parameter
- riskgrouplabels
labeling scheme for terminal nodes (risk groups)
- treeplot
TRUE/FALSE; show the decision tree plot of the recursive partitioning splits
- kmplot
TRUE/FALSE; show Kaplan-Meier survival curves stratified by RPA-derived risk groups
- kmci
.
- risktable
.
- pval
.
- riskgrouptable
.
- cptable
TRUE/FALSE; show the complexity parameter table
- variableimportance
.
- createnewvar
TRUE/FALSE; create and save the risk groups as a new variable (named by newvarname)
- newvarname
name for the new variable containing RPA stage assignments
- showSummary
.
- showInterpretation
.
- showReport
.
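The tuning arguments above (minbucket, cp, maxdepth, nfolds, prunetree) resemble the controls of the rpart package. The sketch below is not the implementation of rpasurvival; it assumes an rpart-style survival tree (method = "exp") and uses simulated data with a stage-dependent hazard to illustrate how such parameters typically interact with cross-validated pruning. All object names (d, fit, cp_tab, best_cp, pruned) are illustrative.

```r
library(survival)
library(rpart)

set.seed(1)
n <- 300
stage <- factor(sample(c("I", "II", "III"), n, replace = TRUE))
# Assumption for illustration: the event rate rises with stage,
# so the tree has a real signal to find
rate <- c(I = 0.01, II = 0.03, III = 0.08)[as.character(stage)]
d <- data.frame(
  time  = rexp(n, rate = rate),
  event = rbinom(n, 1, 0.8),
  stage = stage,
  age   = rnorm(n, 60, 10)
)

# Grow the tree; xval plays the role of nfolds in this sketch
fit <- rpart(
  Surv(time, event) ~ stage + age,
  data    = d,
  method  = "exp",
  control = rpart.control(minbucket = 20, cp = 0.01, maxdepth = 3, xval = 10)
)

# Pick the complexity parameter with the lowest cross-validated error
cp_tab  <- fit$cptable
best_cp <- cp_tab[which.min(cp_tab[, "xerror"]), "CP"]

# Prune back to that complexity and count the terminal nodes
pruned   <- prune(fit, cp = best_cp)
n_leaves <- sum(pruned$frame$var == "<leaf>")
```

Raising minbucket or cp, or lowering maxdepth, generally yields fewer terminal nodes, which is the direction taken by the conservative settings in the Examples section.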
Value
A results object containing:
results$instructions | an html
results$summary | an html
results$interpretation | an html
results$report | an html
results$treeplot | Decision tree showing recursive partitioning splits for survival risk stratification
results$riskgrouptable | a table
results$kmplot | Kaplan-Meier survival curves stratified by RPA-derived risk groups
results$logranktest | a table
results$cptable | a table
results$varimp | a table
results$coxmodel | a table
results$notices | an html
Tables can be converted to data frames with asDF or as.data.frame. For example:
results$riskgrouptable$asDF
as.data.frame(results$riskgrouptable)
Examples
# Example: Develop RPA staging system for survival data
# Generate example survival data with multiple predictors
set.seed(12345)
n <- 200
survData <- data.frame(
  time = rexp(n, rate = 0.02),
  event = rbinom(n, 1, 0.65),
  age = rnorm(n, 60, 15),
  stage = factor(sample(c("I", "II", "III", "IV"), n, replace = TRUE,
                        prob = c(0.3, 0.3, 0.25, 0.15))),
  grade = factor(sample(c("G1", "G2", "G3"), n, replace = TRUE,
                        prob = c(0.2, 0.5, 0.3))),
  LVI = factor(sample(c("Absent", "Present"), n, replace = TRUE,
                      prob = c(0.6, 0.4)))
)
# Basic RPA analysis
rpasurvival(
  data = survData,
  time = "time",
  event = "event",
  predictors = c("age", "stage", "grade", "LVI"),
  eventValue = "1",
  time_unit = "months",
  minbucket = 20,
  cp = 0.01,
  maxdepth = 3,
  prunetree = TRUE,
  treeplot = TRUE,
  kmplot = TRUE,
  riskgrouptable = TRUE,
  variableimportance = TRUE,
  riskgrouplabels = "auto",
  showSummary = TRUE,
  showReport = TRUE
)
#>
#> RECURSIVE PARTITIONING ANALYSIS FOR SURVIVAL
#>
#> Recursive Partitioning Analysis for Survival
#>
#> Purpose: Develop risk stratification groups using binary tree
#> partitioning on survival data.
#>
#> Method: CART (Classification and Regression Trees) for survival
#> endpoints.
#>
#> Output: Terminal nodes become risk groups (e.g., Stage I, II, III).
#>
#> Usage Notes:
#>
#> Requires Survival Time (numeric, non-negative) and Event Status (0/1
#> or TRUE/FALSE).
#> Select predictor variables: categorical (stage, grade) or continuous
#> (age, biomarker).
#> Tree is pruned using cross-validation unless disabled.
#> Minimum node size (default=20) controls tree complexity.
#> Log-rank criterion recommended for survival splitting.
#>
#> Example Application:
#>
#> Integrate LVI status + ypTNM stage to create RPA staging (as in Liu et
#> al., Br J Cancer 2026).
#>
#> ℹ Statistical Assumptions
#> RPA assumes proportional hazards (constant hazard ratio over time).
#> For competing risks or crossing hazards, consider alternative methods.
#>
#> ℹ Tree Pruned
#> Tree pruned using 10-fold cross-validation. Optimal CP = 0.0193. This
#> helps prevent overfitting.
#>
#> ⚠ No Splits Found
#> Tree has no splits. Try reducing minbucket or cp parameters.
# Create and save risk groups as new variable
rpasurvival(
  data = survData,
  time = "time",
  event = "event",
  predictors = c("stage", "LVI"),
  createnewvar = TRUE,
  newvarname = "rpa_risk_group",
  riskgrouplabels = "risk"
)
#>
#> RECURSIVE PARTITIONING ANALYSIS FOR SURVIVAL
#>
#> Recursive Partitioning Analysis for Survival
#>
#> Purpose: Develop risk stratification groups using binary tree
#> partitioning on survival data.
#>
#> Method: CART (Classification and Regression Trees) for survival
#> endpoints.
#>
#> Output: Terminal nodes become risk groups (e.g., Stage I, II, III).
#>
#> Usage Notes:
#>
#> Requires Survival Time (numeric, non-negative) and Event Status (0/1
#> or TRUE/FALSE).
#> Select predictor variables: categorical (stage, grade) or continuous
#> (age, biomarker).
#> Tree is pruned using cross-validation unless disabled.
#> Minimum node size (default=20) controls tree complexity.
#> Log-rank criterion recommended for survival splitting.
#>
#> Example Application:
#>
#> Integrate LVI status + ypTNM stage to create RPA staging (as in Liu et
#> al., Br J Cancer 2026).
#>
#> ℹ Statistical Assumptions
#> RPA assumes proportional hazards (constant hazard ratio over time).
#> For competing risks or crossing hazards, consider alternative methods.
#>
#> ℹ Tree Pruned
#> Tree pruned using 10-fold cross-validation. Optimal CP = 0.0028. This
#> helps prevent overfitting.
#>
#> ⚠ No Splits Found
#> Tree has no splits. Try reducing minbucket or cp parameters.
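How terminal nodes can become a saved risk-group variable is sketched below. This is an illustration under stated assumptions, not the code behind createnewvar: it assumes an rpart survival tree, simulated data in which LVI affects the hazard, and hypothetical labels of the form "Risk group 1", "Risk group 2" (the labels produced by riskgrouplabels may differ).

```r
library(survival)
library(rpart)

set.seed(2)
n <- 300
lvi <- factor(sample(c("Absent", "Present"), n, replace = TRUE))
# Assumption for illustration: LVI-present cases have a higher event rate
rate <- ifelse(lvi == "Present", 0.06, 0.01)
d <- data.frame(time = rexp(n, rate), event = rbinom(n, 1, 0.8), LVI = lvi)

fit <- rpart(
  Surv(time, event) ~ LVI,
  data    = d,
  method  = "exp",
  control = rpart.control(minbucket = 20, cp = 0.01, maxdepth = 3)
)

# fit$where gives, per observation, the row of fit$frame for its leaf;
# yval is that leaf's estimated relative event rate
leaf_rate <- fit$frame$yval[fit$where]

# Order leaves from lowest to highest rate and store them as a factor
groups <- sort(unique(leaf_rate))
d$rpa_risk_group <- factor(
  match(leaf_rate, groups),
  levels = seq_along(groups),
  labels = paste("Risk group", seq_along(groups))
)

# Cross-tabulate the new variable against the predictor
table(d$rpa_risk_group, d$LVI)
```

Ordering the leaves by their estimated rate keeps the group numbering monotone in risk, which is usually what a staging system needs.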
# Conservative settings for external validation
rpasurvival(
  data = survData,
  time = "time",
  event = "event",
  predictors = c("age", "stage", "grade"),
  minbucket = 30,   # Larger minimum node size
  cp = 0.02,        # More conservative pruning
  maxdepth = 2,     # Simpler tree
  nfolds = 10,      # 10-fold cross-validation
  prunetree = TRUE,
  cptable = TRUE    # Show complexity parameter table
)
#>
#> RECURSIVE PARTITIONING ANALYSIS FOR SURVIVAL
#>
#> Recursive Partitioning Analysis for Survival
#>
#> Purpose: Develop risk stratification groups using binary tree
#> partitioning on survival data.
#>
#> Method: CART (Classification and Regression Trees) for survival
#> endpoints.
#>
#> Output: Terminal nodes become risk groups (e.g., Stage I, II, III).
#>
#> Usage Notes:
#>
#> Requires Survival Time (numeric, non-negative) and Event Status (0/1
#> or TRUE/FALSE).
#> Select predictor variables: categorical (stage, grade) or continuous
#> (age, biomarker).
#> Tree is pruned using cross-validation unless disabled.
#> Minimum node size (default=20) controls tree complexity.
#> Log-rank criterion recommended for survival splitting.
#>
#> Example Application:
#>
#> Integrate LVI status + ypTNM stage to create RPA staging (as in Liu et
#> al., Br J Cancer 2026).
#>
#> ℹ Statistical Assumptions
#> RPA assumes proportional hazards (constant hazard ratio over time).
#> For competing risks or crossing hazards, consider alternative methods.
#>
#> ℹ Tree Pruned
#> Tree pruned using 10-fold cross-validation. Optimal CP = 0.0143. This
#> helps prevent overfitting.
#>
#> ⚠ No Splits Found
#> Tree has no splits. Try reducing minbucket or cp parameters.
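Once risk groups exist, they are typically summarised with Kaplan-Meier estimates, a 5-year survival figure and a log-rank test, which is what the kmplot, riskgrouptable and logranktest outputs describe. The sketch below shows one way to compute these with the survival package. It is not taken from rpasurvival: the data are simulated, the group labels are hypothetical, and the 5-year horizon is assumed to be 60 because time is taken to be in months.

```r
library(survival)

set.seed(42)
n <- 240
group <- factor(
  sample(c("Low", "Intermediate", "High"), n, replace = TRUE),
  levels = c("Low", "Intermediate", "High")
)
# Assumption for illustration: monthly event rates differ by risk group
rate <- c(Low = 0.005, Intermediate = 0.015, High = 0.04)[as.character(group)]
d <- data.frame(time = rexp(n, rate), event = rbinom(n, 1, 0.8), group = group)

# Kaplan-Meier curves stratified by risk group
km <- survfit(Surv(time, event) ~ group, data = d)

# 5-year survival: with time in months, the horizon is 5 * 12 = 60
horizon <- 5 * 12
s <- summary(km, times = horizon, extend = TRUE)
five_year <- data.frame(
  group    = s$strata,
  survival = s$surv,
  lower    = s$lower,
  upper    = s$upper
)

# Log-rank test across the three groups
lr <- survdiff(Surv(time, event) ~ group, data = d)
p_value <- pchisq(lr$chisq, df = length(lr$n) - 1, lower.tail = FALSE)
```

If time were recorded in years or days, the horizon would be 5 or roughly 1826 instead, which is why the time_unit argument matters for the 5-year estimates.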