Advanced Decision Tree Analysis in ClinicoPath
Modern Tree Algorithms for Clinical Research and Medical Decision Making
ClinicoPath
Source: vignettes/09-advancedtree.Rmd
Introduction
The advancedtree function in ClinicoPath provides decision tree
algorithms designed for clinical research and medical decision making.
This module extends traditional decision tree functionality with modern
algorithms, including gradient boosting, conditional inference trees,
and ensemble methods, together with advanced interpretability tools.
Key Features
- Multiple Algorithms: CART, Conditional Inference, Random Forest, XGBoost, Extra Trees, Ensemble
- Clinical Focus: Specialized metrics and visualizations for healthcare applications
- Advanced Validation: Cross-validation, bootstrap, hold-out, and temporal validation
- Feature Analysis: Automated selection, importance ranking, interaction analysis
- Class Imbalance Handling: Multiple strategies for rare diseases and imbalanced outcomes
- Interpretability: SHAP values, partial dependence plots, feature interactions
- Clinical Context: Diagnostic, prognostic, treatment, risk stratification applications
Dataset Overview
For this tutorial, we'll use the histopathology dataset included in
ClinicoPath, which contains clinical and pathological data from 250
patients.
# Load ClinicoPath and its bundled histopathology dataset
library(ClinicoPath)
data(histopathology)
# Overview of the dataset
str(histopathology)
# Key variables for decision tree analysis
cat("Continuous Variables:\n")
continuous_vars <- c("Age", "Grade", "TStage", "OverallTime", "MeasurementA", "MeasurementB")
print(continuous_vars)
cat("\nCategorical Variables:\n")
categorical_vars <- c("Sex", "Group", "Grade_Level", "LVI", "PNI", "LymphNodeMetastasis", "Mortality5yr")
print(categorical_vars)
cat("\nOutcome Variables:\n")
outcome_vars <- c("Outcome", "Death", "Mortality5yr", "Outcome2")
print(outcome_vars)
Basic Decision Tree Analysis
Example 1: Simple CART Tree for Mortality Prediction
# Basic CART tree for mortality prediction
# Note: advancedtree is a jamovi module function. To call it from R, use:
# library(ClinicoPath)
# basic_tree <- advancedtree(
#   data = histopathology,
#   vars = c("Age", "Grade", "TStage"),
#   facs = c("Sex", "LVI", "PNI"),
#   target = "Mortality5yr",
#   targetLevel = "Dead",
#   algorithm = "rpart",
#   validation = "cv",
#   show_tree_plot = TRUE,
#   show_importance_plot = TRUE,
#   show_performance_metrics = TRUE
# )
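To show what a CART fit looks like outside the jamovi wrapper, the following minimal sketch calls the rpart package directly on its bundled kyphosis data. It assumes that `algorithm = "rpart"` refers to this package; it is not the module's own code, and the dataset and the `maxdepth` setting were chosen only for illustration.

```r
# Illustration only: a CART classification tree fitted directly with rpart
# on rpart's bundled kyphosis data (not the histopathology dataset).
library(rpart)

fit <- rpart(
  Kyphosis ~ Age + Number + Start,
  data = kyphosis,
  method = "class",                      # classification tree
  control = rpart.control(maxdepth = 3)  # keep the tree shallow and readable
)

print(fit)  # text view of the fitted splits

# Predicted class for each training row
pred <- predict(fit, kyphosis, type = "class")

# Resubstitution confusion matrix (optimistic: same data used to fit and score)
table(observed = kyphosis$Kyphosis, predicted = pred)
```

The confusion matrix here is computed on the training data, so it overstates performance. The validation options shown in the following examples exist to avoid exactly that.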
Example 2: Conditional Inference Tree
# Conditional inference tree with unbiased variable selection
ctree_model <- advancedtree(
data = histopathology,
vars = c("Age", "OverallTime", "MeasurementA", "MeasurementB"),
facs = c("Grade_Level", "Group", "LymphNodeMetastasis"),
target = "Outcome",
targetLevel = "1",
algorithm = "ctree",
validation = "bootstrap",
max_depth = 5,
show_tree_plot = TRUE,
show_roc_curve = TRUE,
show_confusion_matrix = TRUE
)
Advanced Ensemble Methods
Example 3: Random Forest for Comprehensive Analysis
# Random Forest with feature selection and validation
rf_model <- advancedtree(
data = histopathology,
vars = c("Age", "Grade", "TStage", "OverallTime", "MeasurementA", "MeasurementB"),
facs = c("Sex", "Group", "Grade_Level", "LVI", "PNI", "LymphNodeMetastasis"),
target = "Mortality5yr",
targetLevel = "Dead",
algorithm = "randomforest",
validation = "cv",
cv_folds = 5,
n_estimators = 200,
feature_selection = TRUE,
importance_method = "permutation",
show_importance_plot = TRUE,
show_performance_metrics = TRUE,
show_validation_curves = TRUE,
show_roc_curve = TRUE
)
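The `validation = "cv"` and `cv_folds = 5` arguments request k-fold cross-validation. The sketch below shows the general procedure in base R on simulated data, with a logistic regression standing in for the tree model. The data, variable names, and the 0.5 cutoff are assumptions made for illustration and say nothing about how advancedtree implements its folds.

```r
# Illustration only: 5-fold cross-validation on simulated data.
set.seed(42)
n <- 250
sim <- data.frame(age = rnorm(n, 60, 10), marker = rnorm(n))
sim$event <- rbinom(n, 1, plogis(-0.5 + 0.8 * sim$marker))

k <- 5
# Randomly assign each row to one of k equally sized folds
fold <- sample(rep(seq_len(k), length.out = n))

acc <- numeric(k)
for (i in seq_len(k)) {
  train <- sim[fold != i, ]  # fit on the other k - 1 folds
  test  <- sim[fold == i, ]  # evaluate on the held-out fold
  fit   <- glm(event ~ age + marker, data = train, family = binomial)
  p     <- predict(fit, newdata = test, type = "response")
  acc[i] <- mean((p > 0.5) == (test$event == 1))
}

acc        # accuracy in each held-out fold
mean(acc)  # cross-validated accuracy estimate
```

Every observation is used for testing exactly once, so the averaged accuracy is a less optimistic estimate than accuracy measured on the training data.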
Example 4: Gradient Boosting (XGBoost) with Hyperparameter Tuning
# XGBoost with automated hyperparameter optimization
xgb_model <- advancedtree(
data = histopathology,
vars = c("Age", "Grade", "TStage", "OverallTime", "MeasurementA", "MeasurementB"),
facs = c("Sex", "Group", "Grade_Level", "LVI", "PNI"),
target = "Outcome",
targetLevel = "1",
algorithm = "xgboost",
validation = "holdout",
test_split = 0.3,
hyperparameter_tuning = TRUE,
tuning_method = "random",
n_estimators = 100,
learning_rate = 0.1,
max_depth = 6,
show_importance_plot = TRUE,
show_performance_metrics = TRUE,
show_roc_curve = TRUE,
show_calibration_plot = TRUE
)
Class Imbalance Handling
Example 5: Handling Imbalanced Clinical Data
# Create an imbalanced outcome for demonstration
library(dplyr)  # provides %>% and mutate()
imbalanced_data <- histopathology %>%
  mutate(RareEvent = ifelse(Grade >= 3 & LVI == "Present", "Yes", "No"))
# Model with class imbalance handling
imbalanced_model <- advancedtree(
data = imbalanced_data,
vars = c("Age", "TStage", "OverallTime"),
facs = c("Sex", "Group", "PNI", "LymphNodeMetastasis"),
target = "RareEvent",
targetLevel = "Yes",
algorithm = "randomforest",
validation = "cv",
handle_imbalance = TRUE,
imbalance_method = "smote",
show_performance_metrics = TRUE,
show_roc_curve = TRUE,
show_confusion_matrix = TRUE
)
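As a rough intuition for what rebalancing does, the sketch below performs simple random oversampling of the minority class in base R. This is deliberately not SMOTE, which synthesizes new minority cases by interpolating between neighbours; it is a simpler stand-in, and the toy data frame and its column names are hypothetical.

```r
# Illustration only: random oversampling of the minority class (not SMOTE).
set.seed(123)
toy <- data.frame(
  x = rnorm(100),
  RareEvent = factor(c(rep("Yes", 10), rep("No", 90)), levels = c("No", "Yes"))
)

counts   <- table(toy$RareEvent)
minority <- names(counts)[which.min(counts)]
n_extra  <- max(counts) - min(counts)  # rows needed to reach balance

# Resample minority rows with replacement and append them to the data
minority_rows <- which(toy$RareEvent == minority)
extra_rows    <- sample(minority_rows, n_extra, replace = TRUE)
balanced      <- toy[c(seq_len(nrow(toy)), extra_rows), ]

table(balanced$RareEvent)
```

Any resampling of this kind should be applied to the training portion only. Rebalancing before splitting leaks duplicated cases into the test data and inflates the performance estimate.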
Clinical Context Applications
Example 6: Diagnostic Classification
# Diagnostic model for disease classification
diagnostic_model <- advancedtree(
data = histopathology,
vars = c("MeasurementA", "MeasurementB", "Age"),
facs = c("Sex", "Grade_Level"),
target = "Disease Status",
targetLevel = "Ill",
algorithm = "ensemble",
validation = "cv",
clinical_context = "diagnosis",
cost_sensitive_thresholds = TRUE,
fn_fp_ratio = 3.0, # Higher cost for missing disease
show_performance_metrics = TRUE,
show_roc_curve = TRUE,
show_calibration_plot = TRUE
)
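One common way to turn a false-negative to false-positive cost ratio into a decision rule is the expected-cost threshold: classify as positive when the predicted probability exceeds 1 / (1 + ratio). The sketch below works this out for `fn_fp_ratio = 3.0`. Whether advancedtree derives its thresholds this way is an assumption; the helper `cost_threshold` and the label "Healthy" are hypothetical.

```r
# Illustration only: expected-cost decision threshold.
# Predicting "positive" is cheaper in expectation when
#   p * cost_FN > (1 - p) * cost_FP,  i.e.  p > 1 / (1 + cost_FN / cost_FP).
cost_threshold <- function(fn_fp_ratio) {
  1 / (1 + fn_fp_ratio)
}

threshold <- cost_threshold(3)  # 1 / (1 + 3) = 0.25
threshold

# Apply the threshold to some example predicted probabilities
p <- c(0.10, 0.30, 0.60)
labels <- ifelse(p > threshold, "Ill", "Healthy")
labels
```

With a ratio of 3 the cutoff drops from 0.5 to 0.25, so a patient with a predicted probability of 0.30 is flagged. The rule is only meaningful when the predicted probabilities are reasonably well calibrated.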
Example 7: Prognosis Prediction with Bootstrap Confidence
# Prognostic model with uncertainty quantification
prognosis_model <- advancedtree(
data = histopathology,
vars = c("Age", "Grade", "TStage", "OverallTime"),
facs = c("LVI", "PNI", "LymphNodeMetastasis"),
target = "Mortality5yr",
targetLevel = "Dead",
algorithm = "xgboost",
validation = "bootstrap",
clinical_context = "prognosis",
bootstrap_confidence = TRUE,
n_bootstrap = 500,
show_performance_metrics = TRUE,
show_validation_curves = TRUE
)
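The idea behind `bootstrap_confidence` with `n_bootstrap = 500` can be sketched as a percentile bootstrap interval for a performance metric. The example below resamples a fixed set of simulated predictions rather than refitting a model in each replicate, which is a simplification. All data here are simulated, and the module's actual procedure may differ.

```r
# Illustration only: percentile bootstrap CI for accuracy on simulated
# predictions (the model is treated as fixed; it is not refitted).
set.seed(2024)
observed  <- rbinom(200, 1, 0.3)
# Simulated predictions that agree with the truth about 80% of the time
predicted <- ifelse(runif(200) < 0.8, observed, 1 - observed)

n_bootstrap <- 500
boot_acc <- replicate(n_bootstrap, {
  idx <- sample(length(observed), replace = TRUE)  # resample cases
  mean(observed[idx] == predicted[idx])            # accuracy in the resample
})

ci <- quantile(boot_acc, c(0.025, 0.975))  # 95% percentile interval
ci
```

More replicates give smoother interval endpoints at the cost of run time, which is the trade-off behind choosing 500 here versus 200 in a later example.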
Advanced Interpretability
Example 8: SHAP Analysis for Feature Explanation
# Model with SHAP analysis for interpretability
shap_model <- advancedtree(
data = histopathology,
vars = c("Age", "Grade", "TStage", "MeasurementA"),
facs = c("Sex", "LVI", "PNI"),
target = "Outcome",
targetLevel = "1",
algorithm = "randomforest",
validation = "cv",
interpretability = TRUE,
shap_analysis = TRUE,
partial_dependence = TRUE,
interaction_analysis = TRUE,
show_performance_metrics = TRUE
)
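Of the interpretability outputs requested above, partial dependence is the simplest to compute by hand: fix one feature at a grid value for every patient, predict, and average. The sketch below does this in base R for a logistic regression on simulated data. The helper `partial_dependence` and the variables are hypothetical and are not part of ClinicoPath.

```r
# Illustration only: partial dependence computed by hand on simulated data.
set.seed(7)
n <- 300
sim <- data.frame(age = runif(n, 30, 90), marker = rnorm(n))
sim$event <- rbinom(n, 1, plogis(-4 + 0.05 * sim$age + 0.7 * sim$marker))
fit <- glm(event ~ age + marker, data = sim, family = binomial)

# For each grid value: set the feature to that value in every row,
# predict, and average the predicted probabilities.
partial_dependence <- function(model, data, feature, grid) {
  sapply(grid, function(value) {
    data[[feature]] <- value  # modifies a local copy only
    mean(predict(model, newdata = data, type = "response"))
  })
}

age_grid <- seq(30, 90, by = 10)
pd_age   <- partial_dependence(fit, sim, "age", age_grid)
data.frame(age = age_grid, mean_predicted_risk = pd_age)
```

The same function works with any model that has a `predict` method returning probabilities, because it only relies on predictions and never inspects the model internals.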
Example 9: Biomarker Discovery
# Biomarker discovery with comprehensive feature analysis
biomarker_model <- advancedtree(
data = histopathology,
vars = c("MeasurementA", "MeasurementB", "Age", "Grade", "TStage"),
facs = c("Sex", "Group", "LVI", "PNI", "LymphNodeMetastasis"),
target = "Outcome2",
targetLevel = "DOD", # Dead of disease
algorithm = "ensemble",
validation = "cv",
clinical_context = "biomarker",
feature_selection = TRUE,
importance_method = "shap",
interpretability = TRUE,
partial_dependence = TRUE,
show_importance_plot = TRUE,
show_performance_metrics = TRUE
)
Treatment Response Prediction
Example 10: Personalized Treatment Selection
# Treatment response prediction model
treatment_model <- advancedtree(
data = histopathology,
vars = c("Age", "Grade", "TStage", "MeasurementA", "MeasurementB"),
facs = c("Sex", "Grade_Level", "LVI", "PNI"),
target = "Group",
targetLevel = "Treatment",
algorithm = "xgboost",
validation = "temporal", # For treatment sequence data
clinical_context = "treatment",
hyperparameter_tuning = TRUE,
interpretability = TRUE,
shap_analysis = TRUE,
show_performance_metrics = TRUE,
show_roc_curve = TRUE
)
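Temporal validation generally means training on earlier cases and testing on later ones, so the model is never evaluated on data that precedes its training period. A minimal base R sketch follows, assuming a date column is available. The column name `surgery_date`, the 70/30 split, and the toy data are hypothetical, and the module may define its temporal split differently.

```r
# Illustration only: a time-ordered train/test split on toy data.
set.seed(11)
toy <- data.frame(
  surgery_date = as.Date("2015-01-01") + sample(0:1825, 100, replace = TRUE),
  marker = rnorm(100)
)

# Order cases chronologically, then train on the earliest 70%
toy     <- toy[order(toy$surgery_date), ]
n_train <- round(0.7 * nrow(toy))
train   <- toy[seq_len(n_train), ]
test    <- toy[(n_train + 1):nrow(toy), ]

range(train$surgery_date)  # earlier period
range(test$surgery_date)   # later period
```

Unlike a random hold-out split, this keeps the chronology intact, which matters when treatment practice or case mix drifts over time.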
Risk Stratification
Example 11: Patient Risk Categorization
# Risk stratification with cost-sensitive learning
risk_model <- advancedtree(
data = histopathology,
vars = c("Age", "Grade", "TStage", "OverallTime"),
facs = c("Sex", "LVI", "PNI", "LymphNodeMetastasis"),
target = "Mortality5yr",
targetLevel = "Dead",
algorithm = "ensemble",
validation = "cv",
clinical_context = "risk",
cost_sensitive_thresholds = TRUE,
fn_fp_ratio = 2.0,
handle_imbalance = TRUE,
imbalance_method = "cost_sensitive",
show_performance_metrics = TRUE,
show_roc_curve = TRUE,
show_calibration_plot = TRUE
)
Model Export and Deployment
Example 12: Clinical Decision Support System
# Model for clinical decision support deployment
deployment_model <- advancedtree(
data = histopathology,
vars = c("Age", "Grade", "TStage", "MeasurementA"),
facs = c("Sex", "LVI", "PNI"),
target = "Mortality5yr",
targetLevel = "Dead",
algorithm = "xgboost",
validation = "holdout",
test_split = 0.2,
export_model = TRUE,
bootstrap_confidence = TRUE,
n_bootstrap = 200,
missing_data_handling = "model",
show_performance_metrics = TRUE,
clinical_context = "diagnosis"
)
Missing Data Handling
Example 13: Robust Analysis with Missing Data
# Create a dataset with missing values for demonstration
set.seed(123)  # make the randomly chosen rows reproducible
missing_data <- histopathology
missing_data$Age[sample(nrow(missing_data), 20)] <- NA
missing_data$MeasurementA[sample(nrow(missing_data), 15)] <- NA
# Model with advanced missing data handling
missing_model <- advancedtree(
data = missing_data,
vars = c("Age", "Grade", "TStage", "MeasurementA", "MeasurementB"),
facs = c("Sex", "Group", "LVI"),
target = "Outcome",
targetLevel = "1",
algorithm = "randomforest",
validation = "cv",
missing_data_handling = "tree", # Tree-based imputation
show_performance_metrics = TRUE,
show_importance_plot = TRUE
)
Screening Applications
Example 14: Population Screening Model
# Screening model with high sensitivity
screening_model <- advancedtree(
data = histopathology,
vars = c("Age", "MeasurementA", "MeasurementB"),
facs = c("Sex", "Group"),
target = "Disease Status",
targetLevel = "Ill",
algorithm = "ensemble",
validation = "bootstrap",
clinical_context = "screening",
cost_sensitive_thresholds = TRUE,
fn_fp_ratio = 5.0, # Very high cost for missing disease in screening
show_performance_metrics = TRUE,
show_roc_curve = TRUE,
show_calibration_plot = TRUE
)
Comparative Analysis
Example 15: Algorithm Comparison
# Compare multiple algorithms on the same dataset
algorithms <- c("rpart", "ctree", "randomforest", "xgboost")
results <- list()
for (algo in algorithms) {
  cat(paste("\nTraining", algo, "model...\n"))
  model <- advancedtree(
    data = histopathology,
    vars = c("Age", "Grade", "TStage", "MeasurementA"),
    facs = c("Sex", "LVI", "PNI"),
    target = "Mortality5yr",
    targetLevel = "Dead",
    algorithm = algo,
    validation = "cv",
    cv_folds = 5,
    show_performance_metrics = TRUE
  )
  results[[algo]] <- model
}
Advanced Visualization
Example 16: Comprehensive Visualization Suite
# Model with all visualization options enabled
viz_model <- advancedtree(
data = histopathology,
vars = c("Age", "Grade", "TStage", "OverallTime"),
facs = c("Sex", "Group", "LVI", "PNI"),
target = "Outcome",
targetLevel = "1",
algorithm = "randomforest",
validation = "cv",
interpretability = TRUE,
show_tree_plot = TRUE,
show_importance_plot = TRUE,
show_performance_metrics = TRUE,
show_validation_curves = TRUE,
show_roc_curve = TRUE,
show_calibration_plot = TRUE,
show_confusion_matrix = TRUE,
shap_analysis = TRUE,
partial_dependence = TRUE,
interaction_analysis = TRUE
)
Clinical Reporting
The advancedtree function provides comprehensive clinical reporting
capabilities:
Performance Metrics
- Accuracy: Overall classification accuracy
- Sensitivity: True positive rate (important for disease detection)
- Specificity: True negative rate (important for avoiding false alarms)
- Positive Predictive Value (PPV): Probability of disease given positive test
- Negative Predictive Value (NPV): Probability of no disease given negative test
- AUC-ROC: Area under the receiver operating characteristic curve
- Likelihood Ratios: Clinical utility metrics for decision making
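The metrics listed above all derive from the four cells of a 2x2 confusion matrix. The worked example below uses made-up counts, chosen only to make the arithmetic easy to follow, and computes each metric with its standard definition.

```r
# Illustration only: metrics from a 2x2 confusion matrix with made-up counts.
tp <- 40   # diseased, test positive
fn <- 10   # diseased, test negative
fp <- 20   # healthy, test positive
tn <- 130  # healthy, test negative

accuracy    <- (tp + tn) / (tp + fn + fp + tn)  # 170 / 200 = 0.85
sensitivity <- tp / (tp + fn)                   # 40 / 50   = 0.80
specificity <- tn / (tn + fp)                   # 130 / 150 = 0.867
ppv         <- tp / (tp + fp)                   # 40 / 60   = 0.667
npv         <- tn / (tn + fn)                   # 130 / 140 = 0.929

# Likelihood ratios combine sensitivity and specificity
lr_pos <- sensitivity / (1 - specificity)       # 0.80 / 0.133 = 6
lr_neg <- (1 - sensitivity) / specificity       # 0.20 / 0.867 = 0.231

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, ppv = ppv, npv = npv,
        lr_pos = lr_pos, lr_neg = lr_neg), 3)
```

PPV and NPV depend on disease prevalence in the sample (25% in this example), whereas sensitivity, specificity, and the likelihood ratios do not, which is why the latter transfer more readily between populations.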
Clinical Interpretation
The function provides context-specific interpretation based on the selected clinical application:
- Diagnostic: Focus on sensitivity/specificity balance
- Prognostic: Emphasis on long-term prediction accuracy
- Treatment: Personalized treatment selection criteria
- Risk: Patient stratification and management guidance
- Biomarker: Feature importance and biological relevance
- Screening: High sensitivity for population health
Best Practices
Data Preparation
- Variable Selection: Choose clinically relevant predictors
- Data Quality: Address missing values and outliers
- Feature Engineering: Create meaningful clinical variables
- Class Balance: Consider prevalence of outcomes
Troubleshooting
Conclusion
The advancedtree function in ClinicoPath provides a comprehensive
toolkit for clinical decision tree analysis. With support for multiple
modern algorithms, extensive validation options, and clinically focused
interpretability tools, it enables researchers and clinicians to build
robust predictive models for various healthcare applications.
Key advantages include:
- Modern Algorithms: Access to state-of-the-art tree-based methods
- Clinical Focus: Specialized features for healthcare applications
- Comprehensive Validation: Multiple strategies for performance assessment
- Advanced Interpretability: SHAP values, partial dependence, interactions
- Class Imbalance Handling: Techniques for rare diseases and events
- Flexible Deployment: Model export for clinical decision support
The function complements existing ClinicoPath decision tree modules while providing enhanced functionality for complex clinical research scenarios.
For more information about ClinicoPath and its capabilities, visit the ClinicoPath GitHub repository.