
Recursive Partitioning Analysis (RPA) for survival data using CART methodology. Builds a decision tree that identifies optimal cut-points for risk stratification. By default the tree is cross-validated and pruned (see nfolds and prunetree). Can optionally create a new variable holding the risk group assignments (see createnewvar and newvarname). Useful for developing prognostic staging systems that integrate multiple predictors.

Usage

rpasurvival(
  data,
  time,
  event,
  predictors,
  eventValue = "1",
  time_unit = "months",
  minbucket = 20,
  cp = 0.01,
  maxdepth = 3,
  nfolds = 10,
  prunetree = TRUE,
  riskgrouplabels = "auto",
  treeplot = TRUE,
  kmplot = TRUE,
  kmci = FALSE,
  risktable = TRUE,
  pval = TRUE,
  riskgrouptable = TRUE,
  cptable = FALSE,
  variableimportance = TRUE,
  createnewvar = FALSE,
  newvarname = "rpa_stage",
  showSummary = TRUE,
  showInterpretation = FALSE,
  showReport = TRUE
)

Arguments

data

the data as a data frame

time

a (non-negative valued) vector of survival times containing the (possibly censored) time to the event or time of last observation

event

the status indicator; normally 0=alive/censored, 1=dead/event. Other choices are TRUE/FALSE (TRUE = death/event) or 1/2 (2=death/event)

predictors

variables to use in recursive partitioning analysis for developing risk stratification groups

eventValue

the value in the event variable that represents an event (death/failure)

time_unit

unit of measurement for survival time. Used for calculating 5-year survival estimates in the risk group summary table.

minbucket

the minimum number of observations in any terminal (leaf) node. Smaller values create more detailed trees but may overfit.

cp

any split that does not decrease overall lack of fit by a factor of cp is not attempted. Smaller values grow larger trees.

maxdepth

maximum depth of any node of the final tree, with the root node counted as depth 0. Values greater than 30 are unlikely to be useful.

nfolds

number of cross-validation folds for pruning the tree. Use 0 to suppress cross-validation.

prunetree

prune the tree using cross-validation to select optimal complexity parameter

riskgrouplabels

labeling scheme for terminal nodes (risk groups)

treeplot

.

kmplot

.

kmci

.

risktable

.

pval

.

riskgrouptable

.

cptable

.

variableimportance

.

createnewvar

.

newvarname

name for the new variable containing RPA stage assignments

showSummary

.

showInterpretation

.

showReport

.
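
The tree-growing arguments above (minbucket, cp, maxdepth, nfolds, prunetree) read like the controls of rpart::rpart.control(). The following is a minimal sketch of that correspondence, assuming rpasurvival() behaves like a survival tree fitted with rpart; this page does not confirm the implementation, and the simulated data frame d and its effect size are purely illustrative.

```r
library(survival)
library(rpart)

set.seed(1)
n <- 300
stage <- factor(sample(c("I", "II", "III", "IV"), n, replace = TRUE))

# Illustrative data: the hazard is higher for stage III/IV (made-up effect size)
d <- data.frame(
  time  = rexp(n, rate = ifelse(stage %in% c("III", "IV"), 0.08, 0.01)),
  event = rbinom(n, 1, 0.7),
  age   = rnorm(n, 60, 15),
  stage = stage
)

# Assumed mapping of the documented arguments onto rpart.control()
ctrl <- rpart.control(
  minbucket = 20,    # minbucket: minimum observations per terminal node
  cp        = 0.01,  # cp: minimum improvement required to attempt a split
  maxdepth  = 3,     # maxdepth: root node counts as depth 0
  xval      = 10     # nfolds: number of cross-validation folds
)

# A Surv() response makes rpart fit a survival tree
fit <- rpart(Surv(time, event) ~ age + stage, data = d, control = ctrl)

# prunetree: keep the subtree whose cp has the smallest cross-validated error
cp_tab <- fit$cptable
if ("xerror" %in% colnames(cp_tab)) {
  best_cp <- cp_tab[which.min(cp_tab[, "xerror"]), "CP"]
  pruned  <- prune(fit, cp = best_cp)
} else {
  pruned <- fit  # no cross-validation columns available, so nothing to prune
}

print(pruned)
```

Lowering minbucket or cp lets the tree grow further before pruning; raising them gives a simpler tree with fewer risk groups.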

Value

A results object containing:

results$instructions     a html
results$summary          a html
results$interpretation   a html
results$report           a html
results$treeplot         Decision tree showing recursive partitioning splits for survival risk stratification
results$riskgrouptable   a table
results$kmplot           Kaplan-Meier survival curves stratified by RPA-derived risk groups
results$logranktest      a table
results$cptable          a table
results$varimp           a table
results$coxmodel         a table
results$notices          a html

Tables can be converted to data frames with asDF or as.data.frame. For example:

results$riskgrouptable$asDF

as.data.frame(results$riskgrouptable)
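
The logranktest and coxmodel entries are listed only as tables. As a rough sketch of the kind of comparison such tables might summarise, the code below runs a log-rank test and a Cox model on a grouping variable using the survival package. The data frame d, the two-level group variable, and the hazard rates are hypothetical; this is not a description of what rpasurvival() computes internally.

```r
library(survival)

set.seed(42)
n <- 300
grp <- factor(sample(c("Low", "High"), n, replace = TRUE),
              levels = c("Low", "High"))

# Hypothetical risk groups with different hazards
d <- data.frame(
  time  = rexp(n, rate = ifelse(grp == "High", 0.06, 0.02)),
  event = rbinom(n, 1, 0.7),
  group = grp
)

# Log-rank test for a difference in survival between the groups
lr <- survdiff(Surv(time, event) ~ group, data = d)

# Cox model: hazard ratio of "High" relative to the reference level "Low"
cox <- coxph(Surv(time, event) ~ group, data = d)

lr$chisq         # log-rank chi-square statistic
exp(coef(cox))   # estimated hazard ratio
```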

Examples

# Example: Develop RPA staging system for survival data
# Generate example survival data with multiple predictors
set.seed(12345)
n <- 200

survData <- data.frame(
    time = rexp(n, rate = 0.02),
    event = rbinom(n, 1, 0.65),
    age = rnorm(n, 60, 15),
    stage = factor(sample(c("I", "II", "III", "IV"), n, replace = TRUE,
                         prob = c(0.3, 0.3, 0.25, 0.15))),
    grade = factor(sample(c("G1", "G2", "G3"), n, replace = TRUE,
                         prob = c(0.2, 0.5, 0.3))),
    LVI = factor(sample(c("Absent", "Present"), n, replace = TRUE,
                       prob = c(0.6, 0.4)))
)

# Basic RPA analysis
rpasurvival(
    data = survData,
    time = "time",
    event = "event",
    predictors = c("age", "stage", "grade", "LVI"),
    eventValue = "1",
    time_unit = "months",
    minbucket = 20,
    cp = 0.01,
    maxdepth = 3,
    prunetree = TRUE,
    treeplot = TRUE,
    kmplot = TRUE,
    riskgrouptable = TRUE,
    variableimportance = TRUE,
    riskgrouplabels = "auto",
    showSummary = TRUE,
    showReport = TRUE
)
#> 
#>  RECURSIVE PARTITIONING ANALYSIS FOR SURVIVAL
#> 
#>  Recursive Partitioning Analysis for Survival
#> 
#>  Purpose: Develop risk stratification groups using binary tree
#>  partitioning on survival data.
#> 
#>  Method: CART (Classification and Regression Trees) for survival
#>  endpoints.
#> 
#>  Output: Terminal nodes become risk groups (e.g., Stage I, II, III).
#> 
#>  Usage Notes:
#> 
#>  - Requires Survival Time (numeric, non-negative) and Event Status (0/1
#>    or TRUE/FALSE).
#>  - Select predictor variables: categorical (stage, grade) or continuous
#>    (age, biomarker).
#>  - Tree is pruned using cross-validation unless disabled.
#>  - Minimum node size (default=20) controls tree complexity.
#>  - Log-rank criterion recommended for survival splitting.
#> 
#>  Example Application:
#> 
#>  Integrate LVI status + ypTNM stage to create RPA staging (as in Liu et
#>  al., Br J Cancer 2026).
#> 
#>  ℹ Statistical Assumptions
#>  RPA assumes proportional hazards (constant hazard ratio over time).
#>  For competing risks or crossing hazards, consider alternative methods.
#> 
#>  ℹ Tree Pruned
#>  Tree pruned using 10-fold cross-validation. Optimal CP = 0.0193. This
#>  helps prevent overfitting.
#> 
#>  ⚠ No Splits Found
#>  Tree has no splits. Try reducing minbucket or cp parameters.

# Create and save risk groups as new variable
rpasurvival(
    data = survData,
    time = "time",
    event = "event",
    predictors = c("stage", "LVI"),
    createnewvar = TRUE,
    newvarname = "rpa_risk_group",
    riskgrouplabels = "risk"
)
#> 
#>  RECURSIVE PARTITIONING ANALYSIS FOR SURVIVAL
#> 
#>  Recursive Partitioning Analysis for Survival
#> 
#>  Purpose: Develop risk stratification groups using binary tree
#>  partitioning on survival data.
#> 
#>  Method: CART (Classification and Regression Trees) for survival
#>  endpoints.
#> 
#>  Output: Terminal nodes become risk groups (e.g., Stage I, II, III).
#> 
#>  Usage Notes:
#> 
#>  - Requires Survival Time (numeric, non-negative) and Event Status (0/1
#>    or TRUE/FALSE).
#>  - Select predictor variables: categorical (stage, grade) or continuous
#>    (age, biomarker).
#>  - Tree is pruned using cross-validation unless disabled.
#>  - Minimum node size (default=20) controls tree complexity.
#>  - Log-rank criterion recommended for survival splitting.
#> 
#>  Example Application:
#> 
#>  Integrate LVI status + ypTNM stage to create RPA staging (as in Liu et
#>  al., Br J Cancer 2026).
#> 
#>  ℹ Statistical Assumptions
#>  RPA assumes proportional hazards (constant hazard ratio over time).
#>  For competing risks or crossing hazards, consider alternative methods.
#> 
#>  ℹ Tree Pruned
#>  Tree pruned using 10-fold cross-validation. Optimal CP = 0.0028. This
#>  helps prevent overfitting.
#> 
#>  ⚠ No Splits Found
#>  Tree has no splits. Try reducing minbucket or cp parameters.

# Conservative settings for external validation
rpasurvival(
    data = survData,
    time = "time",
    event = "event",
    predictors = c("age", "stage", "grade"),
    minbucket = 30,      # Larger minimum node size
    cp = 0.02,           # More conservative pruning
    maxdepth = 2,        # Simpler tree
    nfolds = 10,         # 10-fold cross-validation
    prunetree = TRUE,
    cptable = TRUE       # Show complexity parameter table
)
#> 
#>  RECURSIVE PARTITIONING ANALYSIS FOR SURVIVAL
#> 
#>  Recursive Partitioning Analysis for Survival
#> 
#>  Purpose: Develop risk stratification groups using binary tree
#>  partitioning on survival data.
#> 
#>  Method: CART (Classification and Regression Trees) for survival
#>  endpoints.
#> 
#>  Output: Terminal nodes become risk groups (e.g., Stage I, II, III).
#> 
#>  Usage Notes:
#> 
#>  - Requires Survival Time (numeric, non-negative) and Event Status (0/1
#>    or TRUE/FALSE).
#>  - Select predictor variables: categorical (stage, grade) or continuous
#>    (age, biomarker).
#>  - Tree is pruned using cross-validation unless disabled.
#>  - Minimum node size (default=20) controls tree complexity.
#>  - Log-rank criterion recommended for survival splitting.
#> 
#>  Example Application:
#> 
#>  Integrate LVI status + ypTNM stage to create RPA staging (as in Liu et
#>  al., Br J Cancer 2026).
#> 
#>  ℹ Statistical Assumptions
#>  RPA assumes proportional hazards (constant hazard ratio over time).
#>  For competing risks or crossing hazards, consider alternative methods.
#> 
#>  ℹ Tree Pruned
#>  Tree pruned using 10-fold cross-validation. Optimal CP = 0.0143. This
#>  helps prevent overfitting.
#> 
#>  ⚠ No Splits Found
#>  Tree has no splits. Try reducing minbucket or cp parameters.
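
All three calls above end with "No Splits Found". In survData the survival times are drawn from a single exponential distribution, independently of age, stage, grade and LVI, so there is no prognostic signal for a tree to pick up. The sketch below simulates data in which stage does affect the hazard, then derives risk groups and 5-year survival estimates. It uses rpart and survival directly as a stand-in (that rpasurvival() is built on them is an assumption), and sigData, rpa_group and the 20-fold hazard increase are hypothetical choices made for illustration.

```r
library(survival)
library(rpart)

set.seed(2024)
n <- 400
stage <- factor(sample(c("I", "II", "III", "IV"), n, replace = TRUE))
lvi   <- factor(sample(c("Absent", "Present"), n, replace = TRUE))

# Hypothetical effect: stage III/IV multiplies the monthly hazard by 20
rate <- 0.005 * ifelse(stage %in% c("III", "IV"), 20, 1)

sigData <- data.frame(
  time  = rexp(n, rate = rate),   # survival time in months
  event = rbinom(n, 1, 0.8),
  stage = stage,
  LVI   = lvi
)

fit <- rpart(
  Surv(time, event) ~ stage + LVI,
  data    = sigData,
  control = rpart.control(minbucket = 20, cp = 0.01, maxdepth = 3)
)

# Each terminal node is a candidate risk group; fit$where gives the
# terminal node (row of fit$frame) that each observation falls into
sigData$rpa_group <- factor(fit$where)
table(sigData$rpa_group)

# 5-year survival per group: with time in months, 5 years = 60 months.
# extend = TRUE returns an estimate even if a group's follow-up ends earlier.
km <- survfit(Surv(time, event) ~ rpa_group, data = sigData)
summary(km, times = 60, extend = TRUE)
```

With a different time unit the evaluation time changes accordingly, for example 5 for years or roughly 1826 for days.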