
Comprehensive comparison of decision tree algorithms for clinical research. Compares CART and Random Forest (and, optionally, Gradient Boosting, XGBoost, and conditional inference trees) using cross-validation, statistical testing, and clinical performance assessment.

Usage

treecompare(
  data,
  vars = NULL,
  facs = NULL,
  target,
  targetLevel,
  include_cart = TRUE,
  include_rf = TRUE,
  include_gbm = FALSE,
  include_xgboost = FALSE,
  include_ctree = FALSE,
  validation = "repeated_cv",
  cv_folds = 5,
  cv_repeats = 5,
  bootstrap_samples = 200,
  test_split = 0.25,
  stratified_sampling = TRUE,
  primary_metric = "bacc",
  statistical_testing = TRUE,
  correction_method = "holm",
  tune_parameters = TRUE,
  tuning_method = "grid",
  cart_max_depth = 5,
  cart_min_split = 20,
  rf_ntrees = 500,
  rf_mtry_method = "auto",
  clinical_context = "diagnosis",
  interpretability_weight = 0.3,
  show_comparison_table = TRUE,
  show_performance_plot = TRUE,
  show_roc_comparison = TRUE,
  show_statistical_tests = TRUE,
  show_ranking_table = TRUE,
  show_computational_time = TRUE,
  show_clinical_recommendations = TRUE,
  show_detailed_metrics = FALSE,
  ensemble_best_models = FALSE,
  save_best_models = FALSE,
  set_seed = TRUE,
  seed_value = 42,
  parallel_processing = TRUE,
  verbose_output = FALSE
)

Arguments

data

The data as a data frame for algorithm comparison.

vars

Names of the continuous predictor variables (for example, biomarker measurements or age).

facs

Names of the categorical (factor) predictor variables (for example, grade or stage).

target

Name of the outcome variable to be predicted.

targetLevel

Level of the target variable treated as the positive class (for example, "positive").

include_cart

Include the Classification and Regression Trees (CART) algorithm.

include_rf

Include the Random Forest ensemble method.

include_gbm

Include a Gradient Boosting Machine (requires the gbm package).

include_xgboost

Include the XGBoost algorithm (requires the xgboost package).

include_ctree

Include conditional inference trees (requires the party package).

validation

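Validation method applied to every algorithm so that the comparison is fair. The default, "repeated_cv", is repeated k-fold cross-validation with cv_folds folds, repeated cv_repeats times.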

cv_folds

Number of cross-validation folds (default 5).

cv_repeats

Number of times cross-validation is repeated when validation = "repeated_cv" (default 5).

bootstrap_samples

Number of bootstrap resamples, used when a bootstrap validation method is selected (default 200).

test_split

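Proportion of the data held out as a test set when a train/test split is used for validation (default 0.25).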

stratified_sampling

Preserve the class proportions of the target variable in each fold or split.

primary_metric

Primary metric for ranking algorithms. The default, "bacc", is balanced accuracy (the mean of sensitivity and specificity).

statistical_testing

Perform statistical tests to compare algorithm performance.

correction_method

Correction method for multiple pairwise comparisons. The default, "holm", applies Holm's step-down correction.

tune_parameters

Automatically tune key parameters for each algorithm.

tuning_method

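Search strategy used for parameter tuning when tune_parameters = TRUE (default "grid", a grid search).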

cart_max_depth

Maximum depth of the CART tree (default 5).

cart_min_split

Minimum number of observations a node must contain before CART attempts a split (default 20).

rf_ntrees

Number of trees grown in the Random Forest (default 500).

rf_mtry_method

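Method for choosing mtry, the number of candidate variables sampled at each Random Forest split (default "auto").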

clinical_context

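Clinical setting the comparison is intended for, used to tailor the clinical recommendations (default "diagnosis").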

interpretability_weight

Weight given to interpretability in the final recommendations (0 = performance only, 1 = interpretability only).

show_comparison_table

Display a comparison table with all metrics.

show_performance_plot

Display box plots comparing algorithm performance.

show_roc_comparison

Display overlaid ROC curves for all algorithms.

show_statistical_tests

Display pairwise statistical test results.

show_ranking_table

Display the final algorithm ranking with recommendations.

show_computational_time

Include computation time in the comparison.

show_clinical_recommendations

Provide clinical recommendations based on comparison results.

show_detailed_metrics

Show detailed metrics for each algorithm (sensitivity, specificity, etc.).

ensemble_best_models

Create an ensemble combining the top-performing algorithms.

save_best_models

Save the best-performing models for future use.

set_seed

Set a random seed so that resampling results are reproducible.

seed_value

Seed used when set_seed = TRUE (default 42).

parallel_processing

Use multiple cores for faster comparison (if available).

verbose_output

Show detailed progress during model comparison.
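The resampling arguments (validation = "repeated_cv", cv_folds, cv_repeats, stratified_sampling, seed_value) are only described briefly above. As an illustration, the sketch below shows one common way to build repeated, stratified k-fold assignments in base R, assuming cv_folds maps to k, cv_repeats to the number of repetitions, and seed_value to the RNG seed. The helper names stratified_folds() and repeated_stratified_cv() are hypothetical; this is not treecompare's internal code.

```r
# Hypothetical helper: assign each row to one of k folds, dealing the
# rows of each target class out separately so every fold keeps roughly
# the same class proportions (stratified k-fold).
stratified_folds <- function(y, k = 5) {
  y <- as.character(y)
  folds <- integer(length(y))
  for (lvl in unique(y)) {
    idx <- which(y == lvl)
    # Shuffle this class's rows, then assign folds 1..k in rotation
    folds[idx[sample.int(length(idx))]] <- rep_len(seq_len(k), length(idx))
  }
  folds
}

# Hypothetical helper: repeat the stratified split `repeats` times with a
# fixed seed so the whole resampling scheme can be reproduced.
repeated_stratified_cv <- function(y, k = 5, repeats = 5, seed = 42) {
  set.seed(seed)
  lapply(seq_len(repeats), function(r) stratified_folds(y, k))
}

# 30 positives and 70 negatives split into 5 folds, repeated 5 times
y <- factor(rep(c("positive", "negative"), times = c(30, 70)))
splits <- repeated_stratified_cv(y, k = 5, repeats = 5, seed = 42)
table(y, splits[[1]])  # every fold holds 6 positives and 14 negatives
```

Dealing each class out separately is what keeps the positive rate identical across folds here; without stratification, small folds can end up with very few positive cases, which makes sensitivity estimates unstable.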

Value

A results object containing:

results$instructions: an HTML block
results$algorithm_summary: an HTML block
results$comparison_table: a table
results$performance_plot: an image
results$roc_comparison: an image
results$statistical_tests: a table
results$ranking_table: a table
results$clinical_recommendations: an HTML block
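The ranking_table listed above combines the primary metric with interpretability_weight, but the exact formula is not documented. The sketch below assumes a linear blend on a 0 to 1 scale, which matches the stated endpoints (0 = performance only, 1 = interpretability only), and computes balanced accuracy, the default primary_metric "bacc". The functions balanced_accuracy() and rank_algorithms() and the interpretability scores are hypothetical inputs for illustration, not values produced by treecompare.

```r
# Balanced accuracy: the mean of sensitivity and specificity,
# with `positive` naming the level treated as the positive class.
balanced_accuracy <- function(truth, pred, positive) {
  truth <- as.character(truth)
  pred <- as.character(pred)
  sensitivity <- mean(pred[truth == positive] == positive)
  specificity <- mean(pred[truth != positive] != positive)
  (sensitivity + specificity) / 2
}

# Hypothetical ranking rule: blend performance and interpretability
# linearly, so w = 0 ranks on performance only and w = 1 on
# interpretability only. Both inputs are assumed to lie on a 0-1 scale.
rank_algorithms <- function(performance, interpretability, w = 0.3) {
  stopifnot(identical(names(performance), names(interpretability)))
  score <- (1 - w) * performance + w * interpretability
  ranked <- data.frame(
    algorithm = names(score),
    score = unname(score),
    stringsAsFactors = FALSE
  )
  ranked <- ranked[order(ranked$score, decreasing = TRUE), ]
  rownames(ranked) <- NULL
  ranked
}

# Toy predictions: sensitivity 1/2, specificity 2/3, so bacc = 7/12
truth <- c("positive", "positive", "negative", "negative", "negative")
pred  <- c("positive", "negative", "negative", "negative", "positive")
bacc <- balanced_accuracy(truth, pred, positive = "positive")

# Made-up scores for illustration only
performance      <- c(cart = 0.78, rf = 0.84, gbm = 0.85)
interpretability <- c(cart = 1.00, rf = 0.40, gbm = 0.30)
rank_algorithms(performance, interpretability, w = 0.3)  # cart ranks first
rank_algorithms(performance, interpretability, w = 0)    # gbm ranks first
```

With these toy numbers, the default weight of 0.3 is enough to lift the single CART tree above the slightly more accurate ensembles, which is the kind of trade-off the weight is meant to expose.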

Tables can be converted to data frames with asDF or as.data.frame. For example:

results$comparison_table$asDF

as.data.frame(results$comparison_table)

Examples

# Compare CART, Random Forest, and GBM (GBM requires the gbm package)
treecompare(
    data = clinical_data,
    vars = c("biomarker1", "biomarker2", "age"),
    facs = c("grade", "stage"),
    target = "outcome",
    targetLevel = "positive",
    include_cart = TRUE,
    include_rf = TRUE,
    include_gbm = TRUE,
    validation = "repeated_cv"
)
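The example assumes a data frame named clinical_data already exists. To try the call without real data, a synthetic data frame with the same column names can be simulated in base R, as sketched below. The values are random, the effect sizes are arbitrary, and the data carry no clinical meaning; coding the target as a factor with a "positive" level is an assumption about what treecompare expects.

```r
set.seed(42)
n <- 200

# Synthetic predictors with the column names used in the example
clinical_data <- data.frame(
  biomarker1 = rnorm(n, mean = 10, sd = 2),
  biomarker2 = rlnorm(n, meanlog = 1, sdlog = 0.5),
  age        = round(runif(n, min = 30, max = 85)),
  grade      = factor(sample(c("G1", "G2", "G3"), n, replace = TRUE)),
  stage      = factor(sample(c("I", "II", "III", "IV"), n, replace = TRUE))
)

# Outcome probability rises with biomarker1 and grade (purely synthetic)
lp <- -6 + 0.5 * clinical_data$biomarker1 +
  0.6 * (as.integer(clinical_data$grade) - 1)
clinical_data$outcome <- factor(
  ifelse(runif(n) < plogis(lp), "positive", "negative"),
  levels = c("negative", "positive")
)

str(clinical_data)
```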