Format

rpasurvival_test: Standard dataset with 200 observations and 11 variables:

patient_id

Character. Patient identifier (PT001-PT200)

time

Numeric. Survival time in months (range: 0.5-120, mean ~36)

event

Factor. Event indicator (0 = censored, 1 = death/event). Event rate ~65%

age

Numeric. Patient age in years (40-85, mean ~65)

stage

Ordered factor. Tumor stage (I, II, III, IV)

grade

Ordered factor. Tumor grade (G1, G2, G3)

LVI

Factor. Lymphovascular invasion (Absent, Present)

tumor_size

Numeric. Tumor size in centimeters (0.5-10)

ki67

Numeric. Ki-67 proliferation index, percentage (0-100). ~3% missing

performance_status

Ordered factor. ECOG performance status (0, 1, 2)

treatment

Factor. Treatment modality (Surgery only, Surgery + Chemo, Surgery + Radio, Trimodal)

rpasurvival_small: Minimal dataset with 50 observations and 6 variables:

patient_id

Character. Patient identifier (SM01-SM50)

time

Numeric. Survival time in months

event

Factor. Event indicator (0, 1)

age

Numeric. Patient age in years

stage

Factor. Tumor stage (Early, Advanced)

grade

Factor. Tumor grade (Low, High)

rpasurvival_large: Large dataset with 500 observations and 11 variables:

patient_id

Character. Patient identifier (LG0001-LG0500)

time

Numeric. Survival time in months

event

Factor. Event indicator (0, 1). Event rate ~70%

age

Numeric. Patient age in years

stage

Ordered factor. Detailed tumor stage (IA, IB, IIA, IIB, IIIA, IIIB, IV)

grade

Ordered factor. Tumor grade (1, 2, 3)

LVI

Factor. Lymphovascular invasion (No, Yes)

PNI

Factor. Perineural invasion (No, Yes)

tumor_size

Numeric. Tumor size in centimeters

nodes_positive

Numeric. Number of positive lymph nodes

biomarker1

Numeric. Continuous biomarker 1

biomarker2

Numeric. Continuous biomarker 2

Edge case datasets (for testing different event/time coding):

Generated synthetically using data-raw/rpasurvival_test_data.R. Seed: 12345. Generation date: 2026-01-31.

Synthetic datasets for testing and demonstrating the rpasurvival function (Recursive Partitioning Analysis for Survival Data). These datasets were generated using a seeded random number generator to produce realistic survival data with the following characteristics:

  • Survival times follow exponential distribution

  • Event rates are clinically realistic (60-70%)

  • Prognostic correlations built in (Stage IV → shorter survival)

  • Missing data pattern (~3%)

  • Events-per-variable (EPV) ratio > 10 for all datasets

The data generation process ensures:
  • Non-negative survival times

  • Proper factor level ordering (ordinal variables)

  • Realistic clinical distributions

  • Sufficient sample sizes for RPA analysis
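
The characteristics above can be illustrated with a short simulation. This is a minimal sketch only, not the contents of data-raw/rpasurvival_test_data.R: the hazard multipliers, baseline rate, censoring window, and the object name sim are all assumptions chosen for illustration. Only the seed value, the exponential distribution, the stage ordering, and the roughly 3% missingness come from the description.

```r
# Hypothetical sketch: NOT the package's actual data-generation script.
set.seed(12345)  # seed value reported in this documentation
n <- 200

# Ordered stage factor so that I < II < III < IV
stage_levels <- c("I", "II", "III", "IV")
stage <- factor(
  sample(stage_levels, n, replace = TRUE),
  levels = stage_levels,
  ordered = TRUE
)

# Assumed hazard multipliers: higher stage -> higher hazard -> shorter survival
hazard_mult <- c(I = 1, II = 1.5, III = 2.5, IV = 4)
base_rate   <- 1 / 60  # assumed baseline event rate per month
rate        <- base_rate * hazard_mult[as.character(stage)]

event_time  <- rexp(n, rate = rate)          # exponential event times
censor_time <- runif(n, min = 6, max = 120)  # assumed censoring window

sim <- data.frame(
  patient_id = sprintf("PT%03d", seq_len(n)),
  time       = pmin(event_time, censor_time),  # observed follow-up time
  event      = factor(as.integer(event_time <= censor_time), levels = c(0, 1)),
  stage      = stage,
  ki67       = round(runif(n, 0, 100), 1),
  stringsAsFactors = FALSE
)

# Set roughly 3% of ki67 values to missing
sim$ki67[sample(n, size = round(0.03 * n))] <- NA

# Quick checks of the simulated properties
mean(sim$event == 1)                 # observed event rate
tapply(sim$time, sim$stage, median)  # median follow-up by stage
```

Taking the minimum of an exponential event time and an independent censoring time is one common way to obtain a mix of events and censored observations; the real script may use a different mechanism.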

File Formats

Each dataset is available in multiple formats:
  • RDA: Native R format (use data())

  • CSV: Comma-separated values

  • XLSX: Excel format

  • OMV: jamovi native format
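
One practical difference between the formats is that CSV does not store factor levels or their ordering. The sketch below shows this with a small hypothetical data frame (df is a stand-in, not the shipped dataset) written to a temporary file, so no package file paths are assumed. After import, the ordered factor has to be rebuilt by hand.

```r
# Hypothetical stand-in for a few rows of rpasurvival_test
stage_levels <- c("I", "II", "III", "IV")
df <- data.frame(
  patient_id = c("PT001", "PT002", "PT003"),
  time       = c(12.5, 48.0, 3.2),
  event      = factor(c(1, 0, 1), levels = c(0, 1)),
  stage      = factor(c("II", "IV", "I"), levels = stage_levels, ordered = TRUE),
  stringsAsFactors = FALSE
)

# Round trip through a temporary CSV file
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)
from_csv <- read.csv(csv_path, stringsAsFactors = FALSE)

# The ordered factor comes back as plain character
class_after_import <- class(from_csv$stage)

# Restore the factor types expected by the analysis
from_csv$event <- factor(from_csv$event, levels = c(0, 1))
from_csv$stage <- factor(from_csv$stage, levels = stage_levels, ordered = TRUE)

str(from_csv)
```

The RDA format avoids this step because it stores the R objects directly, including factor levels.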

Usage Examples

See vignette("rpasurvival-examples") for comprehensive examples.

Basic usage:

# Load the package first so that data() can find its datasets
library(ClinicoPath)
data(rpasurvival_test)

# Standard RPA analysis
rpasurvival(
  data = rpasurvival_test,
  time = "time",
  event = "event",
  predictors = c("age", "stage", "grade", "LVI"),
  time_unit = "months"
)

# Test small sample warnings
data(rpasurvival_small)
rpasurvival(
  data = rpasurvival_small,
  time = "time",
  event = "event",
  predictors = c("stage", "grade")
)

# Test different event coding
data(rpasurvival_edge_truefalse)
rpasurvival(
  data = rpasurvival_edge_truefalse,
  time = "time",
  event = "event_tf",
  predictors = c("stage", "grade"),
  eventValue = "TRUE"
)

Testing Scenarios

The datasets support testing of:

  1. Standard analysis: Use rpasurvival_test with 4-6 predictors

  2. Small samples: Use rpasurvival_small, expect warnings

  3. Complex trees: Use rpasurvival_large with maxdepth=5

  4. Event coding: Test TRUE/FALSE and 1/2 coding schemes

  5. Time units: Test days, months, years with time_unit parameter

  6. Missing data: Verify handling of ~3% missing values

  7. Mixed predictors: Continuous, ordinal, and nominal variables
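
For the event coding scenario, it can help to see how different codings map onto a common 0/1 status. The helper below is a hypothetical illustration (normalize_event is not a ClinicoPath function, and it does not describe how rpasurvival handles eventValue internally). It compares values as character so that logical, numeric, and factor inputs are treated alike.

```r
# Hypothetical helper: map any event coding to 0/1 given the value that marks an event
normalize_event <- function(x, event_value) {
  # as.character() makes TRUE -> "TRUE", 2 -> "2", and factors -> their labels;
  # missing values stay NA after the comparison
  as.integer(as.character(x) == as.character(event_value))
}

status_tf    <- normalize_event(c(TRUE, FALSE, TRUE), "TRUE")         # TRUE/FALSE coding
status_12    <- normalize_event(c(1, 2, 2, 1), 2)                     # 1/2 coding, 2 = event
status_label <- normalize_event(factor(c("Dead", "Alive")), "Dead")   # labelled factor

status_tf
status_12
status_label
```

Passing the event value as a string, as in eventValue = "TRUE" in the usage example, fits this kind of character-based comparison.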

Validation

All datasets have been validated for:

  • Non-negative survival times

  • Appropriate event rates

  • Stage-survival correlation (higher stage → worse prognosis)

  • Sufficient EPV (events per variable > 10)

  • Realistic clinical distributions

  • Proper factor level ordering
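
Several of these checks are simple to reproduce in base R. The function below is a sketch under stated assumptions: check_survival_data and the toy data frame are hypothetical and are not part of the package. EPV is computed here as the number of events divided by the number of candidate predictors.

```r
# Hypothetical helper, not part of ClinicoPath
check_survival_data <- function(data, time, event, predictors, event_value = 1) {
  is_event <- as.character(data[[event]]) == as.character(event_value)
  n_events <- sum(is_event, na.rm = TRUE)
  list(
    nonnegative_times = all(data[[time]] >= 0, na.rm = TRUE),
    event_rate        = n_events / nrow(data),
    epv               = n_events / length(predictors),  # events per variable
    ordered_factors   = names(Filter(is.ordered, data[predictors]))
  )
}

# Toy data for illustration only
toy <- data.frame(
  time  = c(5, 12, 30, 44, 8, 61, 19, 2),
  event = factor(c(1, 1, 0, 1, 1, 0, 1, 1), levels = c(0, 1)),
  stage = factor(
    c("I", "II", "I", "III", "IV", "II", "III", "IV"),
    levels = c("I", "II", "III", "IV"),
    ordered = TRUE
  ),
  age   = c(55, 61, 48, 70, 66, 59, 73, 80)
)

checks <- check_survival_data(toy, "time", "event", c("stage", "age"))
str(checks)
```

The toy data has 6 events and 2 predictors, so its EPV is 3, well below the threshold of 10 that the shipped datasets are stated to meet. Running the same function on rpasurvival_test with the intended predictor set is one way to confirm the EPV claim.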

# Load standard test data
data(rpasurvival_test)

# Examine structure
str(rpasurvival_test)

# Summary statistics
summary(rpasurvival_test)

# Check event rate
table(rpasurvival_test$event)
prop.table(table(rpasurvival_test$event))

# Check stage distribution
table(rpasurvival_test$stage)

# Basic RPA analysis

Liu Y, et al. (2026). Recursive partitioning analysis for survival data.

rpasurvival for the main analysis function

vignette("rpasurvival-examples") for comprehensive usage examples