Skip to contents

Introduction

The advancedtree function in ClinicoPath provides state-of-the-art decision tree algorithms specifically designed for clinical research and medical decision making. This module extends traditional decision tree functionality with modern algorithms including gradient boosting, conditional inference trees, ensemble methods, and advanced interpretability tools.

Key Features

  • Multiple Algorithms: CART, Conditional Inference, Random Forest, XGBoost, Extra Trees, Ensemble
  • Clinical Focus: Specialized metrics and visualizations for healthcare applications
  • Advanced Validation: Cross-validation, bootstrap, hold-out, and temporal validation
  • Feature Analysis: Automated selection, importance ranking, interaction analysis
  • Class Imbalance Handling: Multiple strategies for rare diseases and imbalanced outcomes
  • Interpretability: SHAP values, partial dependence plots, feature interactions
  • Clinical Context: Diagnostic, prognostic, treatment, risk stratification applications

Dataset Overview

For this tutorial, we’ll use the histopathology dataset included in ClinicoPath, which contains clinical and pathological data from 250 patients.

# Load the histopathology dataset
data(histopathology)

# Overview of the dataset
str(histopathology)

# Key variables for decision tree analysis
cat("Continuous Variables:\n")
continuous_vars <- c("Age", "Grade", "TStage", "OverallTime", "MeasurementA", "MeasurementB")
print(continuous_vars)

cat("\nCategorical Variables:\n")
categorical_vars <- c("Sex", "Group", "Grade_Level", "LVI", "PNI", "LymphNodeMetastasis", "Mortality5yr")
print(categorical_vars)

cat("\nOutcome Variables:\n")
outcome_vars <- c("Outcome", "Death", "Mortality5yr", "Outcome2")
print(outcome_vars)

Basic Decision Tree Analysis

Example 1: Simple CART Tree for Mortality Prediction

# Basic CART tree for mortality prediction
# Note: This is a jamovi module function. In R, use:
# library(jjstatsplot)
# basic_tree <- advancedtree(
#   data = histopathology,
#   vars = c("Age", "Grade", "TStage"),
#   facs = c("Sex", "LVI", "PNI"),
#   target = "Mortality5yr",
#   targetLevel = "Dead",
#   algorithm = "rpart",
#   validation = "cv",
#   show_tree_plot = TRUE,
#   show_importance_plot = TRUE,
#   show_performance_metrics = TRUE
# )

Example 2: Conditional Inference Tree

# Conditional inference tree with unbiased variable selection
ctree_model <- advancedtree(
  data = histopathology,
  vars = c("Age", "OverallTime", "MeasurementA", "MeasurementB"),
  facs = c("Grade_Level", "Group", "LymphNodeMetastasis"),
  target = "Outcome",
  targetLevel = "1",
  algorithm = "ctree",
  validation = "bootstrap",
  max_depth = 5,
  show_tree_plot = TRUE,
  show_roc_curve = TRUE,
  show_confusion_matrix = TRUE
)

Advanced Ensemble Methods

Example 3: Random Forest for Comprehensive Analysis

# Random Forest with feature selection and validation
rf_model <- advancedtree(
  data = histopathology,
  vars = c("Age", "Grade", "TStage", "OverallTime", "MeasurementA", "MeasurementB"),
  facs = c("Sex", "Group", "Grade_Level", "LVI", "PNI", "LymphNodeMetastasis"),
  target = "Mortality5yr",
  targetLevel = "Dead",
  algorithm = "randomforest",
  validation = "cv",
  cv_folds = 5,
  n_estimators = 200,
  feature_selection = TRUE,
  importance_method = "permutation",
  show_importance_plot = TRUE,
  show_performance_metrics = TRUE,
  show_validation_curves = TRUE,
  show_roc_curve = TRUE
)

Example 4: Gradient Boosting (XGBoost) with Hyperparameter Tuning

# XGBoost with automated hyperparameter optimization
xgb_model <- advancedtree(
  data = histopathology,
  vars = c("Age", "Grade", "TStage", "OverallTime", "MeasurementA", "MeasurementB"),
  facs = c("Sex", "Group", "Grade_Level", "LVI", "PNI"),
  target = "Outcome",
  targetLevel = "1",
  algorithm = "xgboost",
  validation = "holdout",
  test_split = 0.3,
  hyperparameter_tuning = TRUE,
  tuning_method = "random",
  n_estimators = 100,
  learning_rate = 0.1,
  max_depth = 6,
  show_importance_plot = TRUE,
  show_performance_metrics = TRUE,
  show_roc_curve = TRUE,
  show_calibration_plot = TRUE
)

Class Imbalance Handling

Example 5: Handling Imbalanced Clinical Data

# Create imbalanced outcome for demonstration
imbalanced_data <- histopathology %>%
  mutate(RareEvent = ifelse(Grade >= 3 & LVI == "Present", "Yes", "No"))

# Model with class imbalance handling
imbalanced_model <- advancedtree(
  data = imbalanced_data,
  vars = c("Age", "TStage", "OverallTime"),
  facs = c("Sex", "Group", "PNI", "LymphNodeMetastasis"),
  target = "RareEvent",
  targetLevel = "Yes",
  algorithm = "randomforest",
  validation = "cv",
  handle_imbalance = TRUE,
  imbalance_method = "smote",
  show_performance_metrics = TRUE,
  show_roc_curve = TRUE,
  show_confusion_matrix = TRUE
)

Clinical Context Applications

Example 6: Diagnostic Classification

# Diagnostic model for disease classification
diagnostic_model <- advancedtree(
  data = histopathology,
  vars = c("MeasurementA", "MeasurementB", "Age"),
  facs = c("Sex", "Grade_Level"),
  target = "Disease Status",
  targetLevel = "Ill",
  algorithm = "ensemble",
  validation = "cv",
  clinical_context = "diagnosis",
  cost_sensitive_thresholds = TRUE,
  fn_fp_ratio = 3.0,  # Higher cost for missing disease
  show_performance_metrics = TRUE,
  show_roc_curve = TRUE,
  show_calibration_plot = TRUE
)

Example 7: Prognosis Prediction with Bootstrap Confidence

# Prognostic model with uncertainty quantification
prognosis_model <- advancedtree(
  data = histopathology,
  vars = c("Age", "Grade", "TStage", "OverallTime"),
  facs = c("LVI", "PNI", "LymphNodeMetastasis"),
  target = "Mortality5yr",
  targetLevel = "Dead",
  algorithm = "xgboost",
  validation = "bootstrap",
  clinical_context = "prognosis",
  bootstrap_confidence = TRUE,
  n_bootstrap = 500,
  show_performance_metrics = TRUE,
  show_validation_curves = TRUE
)

Advanced Interpretability

Example 8: SHAP Analysis for Feature Explanation

# Model with SHAP analysis for interpretability
shap_model <- advancedtree(
  data = histopathology,
  vars = c("Age", "Grade", "TStage", "MeasurementA"),
  facs = c("Sex", "LVI", "PNI"),
  target = "Outcome",
  targetLevel = "1",
  algorithm = "randomforest",
  validation = "cv",
  interpretability = TRUE,
  shap_analysis = TRUE,
  partial_dependence = TRUE,
  interaction_analysis = TRUE,
  show_performance_metrics = TRUE
)

Example 9: Biomarker Discovery

# Biomarker discovery with comprehensive feature analysis
biomarker_model <- advancedtree(
  data = histopathology,
  vars = c("MeasurementA", "MeasurementB", "Age", "Grade", "TStage"),
  facs = c("Sex", "Group", "LVI", "PNI", "LymphNodeMetastasis"),
  target = "Outcome2",
  targetLevel = "DOD",  # Death of Disease
  algorithm = "ensemble",
  validation = "cv",
  clinical_context = "biomarker",
  feature_selection = TRUE,
  importance_method = "shap",
  interpretability = TRUE,
  partial_dependence = TRUE,
  show_importance_plot = TRUE,
  show_performance_metrics = TRUE
)

Treatment Response Prediction

Example 10: Personalized Treatment Selection

# Treatment response prediction model
treatment_model <- advancedtree(
  data = histopathology,
  vars = c("Age", "Grade", "TStage", "MeasurementA", "MeasurementB"),
  facs = c("Sex", "Grade_Level", "LVI", "PNI"),
  target = "Group",
  targetLevel = "Treatment",
  algorithm = "xgboost",
  validation = "temporal",  # For treatment sequence data
  clinical_context = "treatment",
  hyperparameter_tuning = TRUE,
  interpretability = TRUE,
  shap_analysis = TRUE,
  show_performance_metrics = TRUE,
  show_roc_curve = TRUE
)

Risk Stratification

Example 11: Patient Risk Categorization

# Risk stratification with cost-sensitive learning
risk_model <- advancedtree(
  data = histopathology,
  vars = c("Age", "Grade", "TStage", "OverallTime"),
  facs = c("Sex", "LVI", "PNI", "LymphNodeMetastasis"),
  target = "Mortality5yr",
  targetLevel = "Dead",
  algorithm = "ensemble",
  validation = "cv",
  clinical_context = "risk",
  cost_sensitive_thresholds = TRUE,
  fn_fp_ratio = 2.0,
  handle_imbalance = TRUE,
  imbalance_method = "cost_sensitive",
  show_performance_metrics = TRUE,
  show_roc_curve = TRUE,
  show_calibration_plot = TRUE
)

Model Export and Deployment

Example 12: Clinical Decision Support System

# Model for clinical decision support deployment
deployment_model <- advancedtree(
  data = histopathology,
  vars = c("Age", "Grade", "TStage", "MeasurementA"),
  facs = c("Sex", "LVI", "PNI"),
  target = "Mortality5yr",
  targetLevel = "Dead",
  algorithm = "xgboost",
  validation = "holdout",
  test_split = 0.2,
  export_model = TRUE,
  bootstrap_confidence = TRUE,
  n_bootstrap = 200,
  missing_data_handling = "model",
  show_performance_metrics = TRUE,
  clinical_context = "diagnosis"
)

Missing Data Handling

Example 13: Robust Analysis with Missing Data

# Create dataset with missing values for demonstration
missing_data <- histopathology
missing_data$Age[sample(nrow(missing_data), 20)] <- NA
missing_data$MeasurementA[sample(nrow(missing_data), 15)] <- NA

# Model with advanced missing data handling
missing_model <- advancedtree(
  data = missing_data,
  vars = c("Age", "Grade", "TStage", "MeasurementA", "MeasurementB"),
  facs = c("Sex", "Group", "LVI"),
  target = "Outcome",
  targetLevel = "1",
  algorithm = "randomforest",
  validation = "cv",
  missing_data_handling = "tree",  # Tree-based imputation
  show_performance_metrics = TRUE,
  show_importance_plot = TRUE
)

Screening Applications

Example 14: Population Screening Model

# Screening model with high sensitivity
screening_model <- advancedtree(
  data = histopathology,
  vars = c("Age", "MeasurementA", "MeasurementB"),
  facs = c("Sex", "Group"),
  target = "Disease Status",
  targetLevel = "Ill",
  algorithm = "ensemble",
  validation = "bootstrap",
  clinical_context = "screening",
  cost_sensitive_thresholds = TRUE,
  fn_fp_ratio = 5.0,  # Very high cost for missing disease in screening
  show_performance_metrics = TRUE,
  show_roc_curve = TRUE,
  show_calibration_plot = TRUE
)

Comparative Analysis

Example 15: Algorithm Comparison

# Compare multiple algorithms on the same dataset
algorithms <- c("rpart", "ctree", "randomforest", "xgboost")
results <- list()

for (algo in algorithms) {
  cat(paste("\nTraining", algo, "model...\n"))
  
  model <- advancedtree(
    data = histopathology,
    vars = c("Age", "Grade", "TStage", "MeasurementA"),
    facs = c("Sex", "LVI", "PNI"),
    target = "Mortality5yr",
    targetLevel = "Dead",
    algorithm = algo,
    validation = "cv",
    cv_folds = 5,
    show_performance_metrics = TRUE
  )
  
  results[[algo]] <- model
}

Advanced Visualization

Example 16: Comprehensive Visualization Suite

# Model with all visualization options enabled
viz_model <- advancedtree(
  data = histopathology,
  vars = c("Age", "Grade", "TStage", "OverallTime"),
  facs = c("Sex", "Group", "LVI", "PNI"),
  target = "Outcome",
  targetLevel = "1",
  algorithm = "randomforest",
  validation = "cv",
  interpretability = TRUE,
  show_tree_plot = TRUE,
  show_importance_plot = TRUE,
  show_performance_metrics = TRUE,
  show_validation_curves = TRUE,
  show_roc_curve = TRUE,
  show_calibration_plot = TRUE,
  show_confusion_matrix = TRUE,
  shap_analysis = TRUE,
  partial_dependence = TRUE,
  interaction_analysis = TRUE
)

Clinical Reporting

The advancedtree function provides comprehensive clinical reporting capabilities:

Performance Metrics

  • Accuracy: Overall classification accuracy
  • Sensitivity: True positive rate (important for disease detection)
  • Specificity: True negative rate (important for avoiding false alarms)
  • Positive Predictive Value (PPV): Probability of disease given positive test
  • Negative Predictive Value (NPV): Probability of no disease given negative test
  • AUC-ROC: Area under the receiver operating characteristic curve
  • Likelihood Ratios: Clinical utility metrics for decision making

Clinical Interpretation

The function provides context-specific interpretation based on the selected clinical application:

  • Diagnostic: Focus on sensitivity/specificity balance
  • Prognostic: Emphasis on long-term prediction accuracy
  • Treatment: Personalized treatment selection criteria
  • Risk: Patient stratification and management guidance
  • Biomarker: Feature importance and biological relevance
  • Screening: High sensitivity for population health

Best Practices

Data Preparation

  1. Variable Selection: Choose clinically relevant predictors
  2. Data Quality: Address missing values and outliers
  3. Feature Engineering: Create meaningful clinical variables
  4. Class Balance: Consider prevalence of outcomes

Model Selection

  1. Sample Size: Use appropriate algorithms for dataset size
  2. Interpretability: Balance performance with explainability
  3. Validation: Use appropriate validation strategies
  4. Clinical Context: Align model choice with application

Performance Evaluation

  1. Multiple Metrics: Don’t rely on accuracy alone
  2. Clinical Relevance: Consider cost of different errors
  3. Confidence Intervals: Quantify uncertainty
  4. External Validation: Test on independent datasets

Troubleshooting

Common Issues

  1. Convergence Problems: Reduce model complexity or increase sample size
  2. Overfitting: Use regularization or ensemble methods
  3. Class Imbalance: Apply balancing techniques
  4. Missing Data: Choose appropriate imputation strategy

Performance Optimization

  1. Feature Selection: Reduce dimensionality
  2. Hyperparameter Tuning: Optimize model parameters
  3. Ensemble Methods: Combine multiple models
  4. Cross-Validation: Use appropriate validation strategy

Conclusion

The advancedtree function in ClinicoPath provides a comprehensive toolkit for clinical decision tree analysis. With support for multiple modern algorithms, extensive validation options, and clinical-focused interpretability tools, it enables researchers and clinicians to build robust predictive models for various healthcare applications.

Key advantages include:

  • Modern Algorithms: Access to state-of-the-art tree-based methods
  • Clinical Focus: Specialized features for healthcare applications
  • Comprehensive Validation: Multiple strategies for performance assessment
  • Advanced Interpretability: SHAP values, partial dependence, interactions
  • Class Imbalance Handling: Techniques for rare diseases and events
  • Flexible Deployment: Model export for clinical decision support

The function complements existing ClinicoPath decision tree modules while providing enhanced functionality for complex clinical research scenarios.


For more information about ClinicoPath and its capabilities, visit the ClinicoPath GitHub repository.