Conditional Inference Trees for survival analysis using unbiased recursive partitioning
This method addresses the variable selection bias inherent in traditional CART by employing conditional inference procedures. The algorithm separates variable selection from split-point selection, which yields unbiased tree structures and is particularly valuable for survival data with mixed-type predictors and complex censoring patterns. Features include stopping criteria based on statistical significance, handling of missing values without surrogate splits, and robust performance with correlated predictors. The method is particularly suitable for exploratory survival analysis and biomarker discovery in clinical research.
Usage
conditionalinference(
  data,
  time,
  event,
  predictors,
  strata,
  teststat = "quad",
  testtype = "Bonferroni",
  mincriterion = 0.95,
  minsplit = 20,
  minbucket = 7,
  maxdepth = 5,
  nresample = 9999,
  logrank_scores = TRUE,
  show_splits = TRUE,
  show_nodes = TRUE,
  show_importance = TRUE,
  plot_tree = TRUE,
  plot_survival = TRUE,
  plot_importance = FALSE,
  mtry,
  replace = FALSE,
  subset
)
Arguments
- data
The data as a data frame.
- time
Time to event variable (numeric). For right-censored data, this is the time from study entry to event or censoring.
- event
Event indicator variable. For survival analysis: 0 = censored, 1 = event. For competing risks: 0 = censored, and each positive integer (1, 2, and so on) denotes a distinct event type.
- predictors
Variables to use for tree construction. Can include numeric, ordinal, and nominal variables, in any combination.
- strata
Optional stratification variable for stratified survival analysis. Creates separate baseline hazards for each stratum.
- teststat
Test statistic to use. 'quad' uses quadratic form for multivariate tests, 'max' uses maximum-type statistic. Quadratic form is more powerful for small effects, maximum-type for large effects.
- testtype
Multiple testing correction method. Bonferroni provides a conservative but reliable correction. Monte Carlo computes permutation-based p-values, using the number of resamples set by nresample.
- mincriterion
Minimum criterion (1 - p-value) for variable selection. Higher values create more conservative trees with fewer splits. Default 0.95 corresponds to significance level of 0.05.
- minsplit
Minimum number of observations required in a node before splitting. Higher values create simpler trees and reduce overfitting.
- minbucket
Minimum number of observations in terminal nodes. Should be smaller than minsplit. Higher values increase tree stability.
- maxdepth
Maximum depth of the tree. Controls tree complexity and prevents overfitting. Deeper trees may overfit, shallower trees may underfit.
- nresample
Number of Monte Carlo resamples for p-value computation when using Monte Carlo test type. More resamples provide more accurate p-values but increase computation time.
- logrank_scores
Use log-rank scores for survival splitting. When TRUE, uses log-rank statistics optimized for survival data. When FALSE, uses standard conditional inference.
- show_splits
Display detailed split statistics including test statistics, p-values, and variable importance measures.
- show_nodes
Display detailed information for each terminal node including Kaplan-Meier estimates, risk tables, and survival summaries.
- show_importance
Calculate and display variable importance measures based on the improvement in prediction accuracy.
- plot_tree
Generate tree structure plot with nodes, splits, and survival curves.
- plot_survival
Plot Kaplan-Meier survival curves for each terminal node with confidence intervals and risk tables.
- plot_importance
Generate variable importance plot showing relative importance of predictors in tree construction.
- mtry
Number of variables randomly selected at each split. Leave empty to use all variables. Setting this creates a random forest-like approach.
- replace
Use bootstrap sampling with replacement for tree construction. Creates more robust trees but may reduce interpretability.
- subset
Fraction of observations to use for tree construction. Leave empty to use all observations. Useful for large datasets.
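The interplay between testtype and mincriterion can be sketched in base R. This is only an illustration under stated assumptions: the p-values below are made-up numbers, not output from this function, and the plain Bonferroni adjustment from p.adjust() stands in for whatever adjustment the underlying implementation actually applies.

```r
# Hypothetical unadjusted p-values, one per candidate predictor
# (invented for illustration; not produced by conditionalinference())
p_raw <- c(age = 0.004, stage = 0.030, biomarker = 0.200)

# Plain Bonferroni adjustment: multiply by the number of tests, cap at 1
p_adj <- p.adjust(p_raw, method = "bonferroni")

# The selection criterion is 1 - (adjusted) p-value
criterion <- 1 - p_adj

# A node is split only if the best criterion exceeds mincriterion
mincriterion <- 0.95
best <- names(which.max(criterion))
do_split <- max(criterion) > mincriterion

print(round(criterion, 3))
cat("Selected variable:", best, "- split:", do_split, "\n")
```

Raising mincriterion (for example to 0.99) makes the comparison stricter, so fewer nodes qualify for splitting.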
Value
A results object containing:
results$todo | a html
results$modelSummary | a table
results$splitStatistics | a table
results$nodeDetails | a table
results$variableImportance | a table
results$treePlot | an image
results$survivalPlot | an image
results$importancePlot | an image
Tables can be converted to data frames with asDF or as.data.frame. For example:
results$modelSummary$asDF
as.data.frame(results$modelSummary)
Examples
# mydata is a placeholder for a data frame containing the columns named below
result <- conditionalinference(
  data = mydata,
  time = "time_to_event",
  event = "event_indicator",
  predictors = c("age", "stage", "biomarker"),
  mincriterion = 0.95,
  minsplit = 20,
  minbucket = 7,
  maxdepth = 5
)
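For experimenting with the same tuning parameters outside this function, a roughly comparable conditional inference survival tree can be fitted with partykit::ctree(). This is a sketch under assumptions: this page does not state that conditionalinference() is built on partykit, the lung dataset from the survival package is used purely as example data, and incomplete rows are dropped only to keep the sketch simple.

```r
library(survival)   # Surv() and the lung example data
library(partykit)   # ctree() and ctree_control()

# Keep complete cases on the variables used (a simplification for this sketch)
d <- na.omit(lung[, c("time", "status", "age", "sex", "ph.ecog")])

# Settings mirror the defaults listed under Usage; the mapping of argument
# names to ctree_control() is an assumption, not documented behaviour
fit <- ctree(
  Surv(time, status) ~ age + sex + ph.ecog,
  data = d,
  control = ctree_control(
    teststat = "quadratic",
    testtype = "Bonferroni",
    mincriterion = 0.95,
    minsplit = 20,
    minbucket = 7,
    maxdepth = 5
  )
)

print(fit)
```

Calling plot(fit) on such an object draws the tree with a Kaplan-Meier curve in each terminal node.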