Decision Tree Analysis with decisiongraph: A Comprehensive Guide
COVID-19 Treatment Strategy Optimization Using Decision Trees, Markov Models, and Cost-Effectiveness Analysis
meddecide Module
2025-10-09
Source: vignettes/meddecide-05-decisiongraph-comprehensive.Rmd
Introduction
The decisiongraph function in the meddecide package provides comprehensive decision tree analysis capabilities for medical decision-making. This vignette demonstrates how to use all major features of the function using a realistic COVID-19 treatment strategy dataset.
Key Features Covered
- Simple Decision Trees: Basic treatment comparisons
- Markov Models: Long-term health state modeling
- Cost-Effectiveness Analysis: Economic evaluation with ICER and NMB
- Sensitivity Analysis: One-way and probabilistic sensitivity analysis
- Advanced Features: Value of information, budget impact, correlation analysis
Dataset Overview
We’ll use a comprehensive COVID-19 treatment dataset with 40 patients comparing four treatment strategies across different patient populations.
# Load required libraries
library(meddecide)
library(dplyr)
library(ggplot2)
# Load the test dataset
covid_data <- read.csv("data/decisiongraph_test_data.csv")
# Display dataset structure
str(covid_data)
# Preview the first few rows
head(covid_data, 3)
# Summary of key variables
cat("Dataset contains:\n")
cat("- Patients:", nrow(covid_data), "\n")
cat("- Treatment strategies:", length(unique(covid_data$treatment_strategy)), "\n")
cat("- Age groups:", length(unique(covid_data$age_group)), "\n")
cat("- Comorbidity types:", length(unique(covid_data$comorbidities)), "\n")
1. Simple Decision Tree Analysis
Let’s start with a basic decision tree comparing treatment strategies based on recovery probability and treatment costs.
# Basic decision tree analysis
simple_analysis <- decisiongraph(
data = covid_data,
treeType = "simple",
# Core variables
decisions = "treatment_strategy",
probabilities = "recovery_prob",
costs = "treatment_cost",
utilities = "recovery_utility",
outcomes = "clinical_outcome",
# Visualization settings
layout = "horizontal",
nodeShapes = TRUE,
showProbabilities = TRUE,
showCosts = TRUE,
showUtilities = TRUE,
# Output options
calculateExpectedValues = TRUE,
summaryTable = TRUE,
# Performance settings
performanceMode = "standard"
)
Understanding Simple Decision Trees
The simple decision tree shows:
- Decision nodes (squares): Treatment strategy choices
- Chance nodes (circles): Probability of recovery
- Terminal nodes (triangles): Final outcomes with costs and utilities
- Expected values: Calculated for each treatment path
Key Insights:
- Compare treatments based on expected costs and utilities
- Identify dominant strategies (higher utility, lower cost)
- Visualize decision trade-offs clearly
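To make the expected-value roll-back concrete, here is a minimal base R sketch. It does not call decisiongraph; the strategy names, probabilities, costs, and utilities are made-up illustrative values, and the two-branch structure (recover or not recover) is an assumption chosen for simplicity.

```r
# Hypothetical inputs: one chance node (recover vs. not recover) per strategy.
strategies <- data.frame(
  strategy        = c("Standard care", "Antiviral"),
  p_recover       = c(0.70, 0.85),
  cost_recover    = c(1000, 3000),
  cost_no_recover = c(8000, 10000),
  util_recover    = c(0.90, 0.90),
  util_no_recover = c(0.40, 0.40),
  stringsAsFactors = FALSE
)

# Roll back each chance node: probability-weighted average over its branches.
strategies$expected_cost <- with(
  strategies,
  p_recover * cost_recover + (1 - p_recover) * cost_no_recover
)
strategies$expected_utility <- with(
  strategies,
  p_recover * util_recover + (1 - p_recover) * util_no_recover
)

print(strategies[, c("strategy", "expected_cost", "expected_utility")])
```

With these numbers the antiviral arm has the higher expected utility (0.825 vs. 0.75) and the higher expected cost (4050 vs. 3100), so neither strategy dominates and the choice depends on how the trade-off is valued.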
2. Markov Model Analysis
For long-term analysis, we’ll model patient transitions through health states over time.
# Markov model for long-term analysis
markov_analysis <- decisiongraph(
data = covid_data,
treeType = "markov",
# Markov-specific variables
healthStates = c("healthy_state", "mild_state", "severe_state",
"critical_state", "dead_state"),
transitionProbs = c("transition_healthy_mild", "transition_mild_severe",
"transition_severe_critical", "transition_critical_dead",
"transition_mild_healthy", "transition_severe_mild",
"transition_critical_severe"),
# Time parameters
cycleLength = 1, # 1-year cycles
timeHorizon = 10, # 10-year analysis
discountRate = 0.03, # 3% annual discount rate
# Markov options
cycleCorrection = TRUE, # Half-cycle correction
cohortTrace = TRUE, # Generate cohort trace
cohortSize = 1000, # Starting cohort size
# Cost and utility variables
costs = c("outpatient_cost", "hospitalization_cost", "icu_cost"),
utilities = c("outpatient_utility", "hospitalization_utility", "icu_utility"),
# Output options
summaryTable = TRUE
)
Understanding Markov Models
Markov models track patient populations through health states over multiple cycles:
- Health States: Healthy → Mild → Severe → Critical → Dead, with recovery transitions back to milder states (for example, Mild → Healthy)
- Transition Probabilities: Likelihood of moving between states each cycle
- Cohort Trace: Shows population distribution over time
- Discounting: Adjusts future costs and benefits to present value
Clinical Applications:
- Chronic disease progression modeling
- Long-term treatment effectiveness
- Lifetime cost-effectiveness analysis
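The mechanics behind a cohort trace can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the decisiongraph implementation: the transition matrix and state utilities below are hypothetical, and half-cycle correction is deliberately left out to keep the example short.

```r
states <- c("Healthy", "Mild", "Severe", "Critical", "Dead")

# Hypothetical per-cycle transition matrix (row = from state, column = to state).
P <- matrix(
  c(0.90, 0.10, 0.00, 0.00, 0.00,
    0.30, 0.55, 0.15, 0.00, 0.00,
    0.00, 0.25, 0.55, 0.20, 0.00,
    0.00, 0.00, 0.20, 0.50, 0.30,
    0.00, 0.00, 0.00, 0.00, 1.00),
  nrow = 5, byrow = TRUE, dimnames = list(states, states)
)
# Each row must sum to 1: everyone goes somewhere (or stays) each cycle.
stopifnot(all(abs(rowSums(P) - 1) < 1e-12))

n_cycles      <- 10     # 10 one-year cycles
cohort_size   <- 1000
discount_rate <- 0.03
# Hypothetical utility weight for one cycle spent in each state.
utilities <- c(Healthy = 1.0, Mild = 0.8, Severe = 0.5, Critical = 0.2, Dead = 0)

# Cohort trace: one row per cycle, starting with the whole cohort in "Healthy".
trace <- matrix(0, nrow = n_cycles + 1, ncol = length(states),
                dimnames = list(paste0("cycle_", 0:n_cycles), states))
trace[1, ] <- c(cohort_size, 0, 0, 0, 0)
for (k in seq_len(n_cycles)) {
  trace[k + 1, ] <- trace[k, ] %*% P
}

# Discounted QALYs per person over cycles 1..n_cycles (no half-cycle correction).
qalys_by_cycle <- as.vector(trace[-1, ] %*% utilities) / cohort_size
discount       <- 1 / (1 + discount_rate)^(1:n_cycles)
total_qalys    <- sum(qalys_by_cycle * discount)

round(trace, 1)
total_qalys
```

Because Dead is an absorbing state, its column in the trace can only grow, and every row of the trace still sums to the cohort size.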
3. Cost-Effectiveness Analysis
Now let’s perform a comprehensive economic evaluation comparing all treatment strategies.
# Comprehensive cost-effectiveness analysis
cea_analysis <- decisiongraph(
data = covid_data,
treeType = "costeffectiveness",
# Decision variables
decisions = "treatment_strategy",
costs = c("treatment_cost", "hospitalization_cost", "icu_cost"),
utilities = c("recovery_utility", "hospitalization_utility", "icu_utility"),
# Cost-effectiveness parameters
calculateNMB = TRUE,
willingnessToPay = 50000, # $50,000 per QALY
# ICER analysis
incrementalAnalysis = TRUE,
dominanceAnalysis = TRUE,
icer_confidence_intervals = FALSE, # Set to TRUE for CI calculations
# Output options
summaryTable = TRUE,
decisionComparison = TRUE,
# Visualization
layout = "horizontal",
colorScheme = "economic",
plotComparison = TRUE
)
Understanding Cost-Effectiveness Analysis
Key Metrics:
- Net Monetary Benefit (NMB): NMB = (Utility × WTP) - Cost
  - Higher NMB = more cost-effective
  - WTP = Willingness-to-pay threshold
- Incremental Cost-Effectiveness Ratio (ICER): ICER = (Cost₂ - Cost₁) / (Utility₂ - Utility₁)
  - Compare with WTP threshold
  - ICER < WTP = cost-effective (when strategy 2 is both more costly and more effective than strategy 1)
- Dominance Analysis:
  - Dominated: More expensive, less effective
  - Dominant: Less expensive, more effective
  - Extended dominance: Eliminated because a linear combination of two other strategies is more cost-effective
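The metrics above can be worked through by hand. The following base R sketch uses four hypothetical strategies with made-up expected costs and QALYs; it flags strictly dominated strategies and computes ICERs along the remaining ones, but it does not check for extended dominance, and it is not how decisiongraph computes these values internally.

```r
# Hypothetical expected cost and QALYs per strategy.
cea <- data.frame(
  strategy = c("A", "B", "C", "D"),
  cost     = c(1000, 3000, 5000, 4000),
  qaly     = c(0.70, 0.80, 0.85, 0.75),
  stringsAsFactors = FALSE
)
wtp <- 50000

# Net monetary benefit at the chosen willingness-to-pay threshold.
cea$nmb <- cea$qaly * wtp - cea$cost

# Sort by cost, then flag strategies that are strictly dominated:
# some other strategy costs no more, is no less effective, and is better on one.
cea <- cea[order(cea$cost), ]
cea$dominated <- vapply(seq_len(nrow(cea)), function(i) {
  any(cea$cost <= cea$cost[i] & cea$qaly >= cea$qaly[i] &
        (cea$cost < cea$cost[i] | cea$qaly > cea$qaly[i]))
}, logical(1))

# ICER of each non-dominated strategy versus the next cheaper one.
frontier <- cea[!cea$dominated, ]
frontier$icer <- c(NA, diff(frontier$cost) / diff(frontier$qaly))

print(cea)
print(frontier)
```

Here D is dominated by B (B is cheaper and more effective). The ICERs along A, B, C are 20,000 and 40,000 per QALY, both below the 50,000 threshold, which agrees with C having the highest NMB.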
4. Sensitivity Analysis
Let’s explore how robust our results are to parameter uncertainty.
4.1 One-Way Sensitivity Analysis
# One-way sensitivity analysis with tornado diagram
sensitivity_analysis <- decisiongraph(
data = covid_data,
treeType = "costeffectiveness",
# Core analysis
decisions = "treatment_strategy",
costs = "treatment_cost",
utilities = "recovery_utility",
# Sensitivity analysis
sensitivityAnalysis = TRUE,
tornado = TRUE,
# Cost-effectiveness
calculateNMB = TRUE,
willingnessToPay = 50000,
# Output options
summaryTable = TRUE
)
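For intuition about what a tornado diagram summarizes, here is a small base R sketch of a one-way sensitivity analysis. The NMB formula, base-case values, and parameter ranges are all hypothetical, and the helper name `nmb_model` is introduced only for this example: each parameter is moved to its low and high value while the others stay at base case, and parameters are ranked by the resulting swing in NMB.

```r
# Hypothetical single-strategy model: NMB = p_recover * utility * WTP - cost.
nmb_model <- function(p_recover, cost, utility_recover, wtp = 50000) {
  p_recover * utility_recover * wtp - cost
}

base_case <- list(p_recover = 0.80, cost = 3000, utility_recover = 0.90)
ranges <- list(
  p_recover       = c(0.70, 0.90),
  cost            = c(2000, 5000),
  utility_recover = c(0.80, 0.95)
)

# Vary one parameter at a time, holding the others at their base-case values.
tornado <- do.call(rbind, lapply(names(ranges), function(par) {
  out <- vapply(ranges[[par]], function(value) {
    args <- base_case
    args[[par]] <- value
    do.call(nmb_model, args)
  }, numeric(1))
  data.frame(parameter = par, nmb_low = min(out), nmb_high = max(out),
             stringsAsFactors = FALSE)
}))

# Widest bar first, as in a tornado diagram.
tornado$width <- tornado$nmb_high - tornado$nmb_low
tornado <- tornado[order(-tornado$width), ]
print(tornado)
```

With these ranges the recovery probability produces the largest swing in NMB, followed by the recovery utility and then the cost.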
4.2 Probabilistic Sensitivity Analysis (PSA)
# Probabilistic sensitivity analysis
psa_analysis <- decisiongraph(
data = covid_data,
treeType = "costeffectiveness",
# Core analysis
decisions = "treatment_strategy",
costs = c("treatment_cost", "hospitalization_cost"),
utilities = c("recovery_utility", "hospitalization_utility"),
probabilities = c("recovery_prob", "hospitalization_prob"),
# PSA settings
probabilisticAnalysis = TRUE,
numSimulations = 1000, # Increase for more precision
psa_distributions = "normal",
# Advanced PSA outputs
psa_advanced_outputs = TRUE,
ceacThresholds = "0,100000,5000", # $0 to $100k, $5k steps
# Parameter correlations
correlatedParameters = TRUE,
correlationMatrix = c("correlation_param1", "correlation_param2", "correlation_param3"),
# Cost-effectiveness
calculateNMB = TRUE,
willingnessToPay = 50000,
# Performance optimization
performanceMode = "standard",
memoryOptimization = TRUE,
parallelProcessing = FALSE # Set TRUE if you have multiple cores
)
Understanding PSA Results
PSA generates several key outputs:
- Cost-Effectiveness Scatter Plot: Shows uncertainty cloud on cost-effectiveness plane
- Cost-Effectiveness Acceptability Curve (CEAC): Probability of cost-effectiveness across WTP thresholds
- Net Monetary Benefit Distributions: Uncertainty in NMB estimates
- Confidence Intervals: For all key metrics
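To show how a CEAC is derived from PSA draws, here is a minimal base R sketch. The distributions and their parameters are assumptions made for illustration (normal costs, beta-distributed QALYs for two hypothetical strategies A and B); they are not taken from the dataset or from decisiongraph's internals.

```r
set.seed(42)
n_sim <- 1000

# Hypothetical PSA draws: one column per strategy, one row per simulation.
cost <- cbind(A = rnorm(n_sim, 1000, 200), B = rnorm(n_sim, 3000, 500))
qaly <- cbind(A = rbeta(n_sim, 70, 30),    B = rbeta(n_sim, 80, 20))

# Same grid as ceacThresholds = "0,100000,5000".
thresholds <- seq(0, 100000, by = 5000)

# For each threshold: share of simulations in which each strategy has the top NMB.
ceac <- t(vapply(thresholds, function(wtp) {
  nmb  <- qaly * wtp - cost
  best <- max.col(nmb, ties.method = "first")
  tabulate(best, nbins = ncol(nmb)) / n_sim
}, numeric(2)))
colnames(ceac) <- colnames(cost)
rownames(ceac) <- thresholds

print(ceac)
```

Each row of `ceac` sums to 1. At a threshold of 0 the cheaper strategy A is almost always preferred, and the probability that B is cost-effective rises as the threshold increases.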
5. Value of Information Analysis
Determine the value of reducing uncertainty in key parameters.
# Value of information analysis
voi_analysis <- decisiongraph(
data = covid_data,
treeType = "costeffectiveness",
# Core analysis
decisions = "treatment_strategy",
costs = "treatment_cost",
utilities = "recovery_utility",
probabilities = "recovery_prob",
# VOI analysis
valueOfInformation = TRUE,
evpi_parameters = c("evpi_param1", "evpi_param2"),
# Required: PSA for VOI
probabilisticAnalysis = TRUE,
numSimulations = 1000,
# Cost-effectiveness
calculateNMB = TRUE,
willingnessToPay = 50000
)
Understanding Value of Information
Expected Value of Perfect Information (EVPI):
- Maximum value of eliminating all parameter uncertainty
- Upper bound on research value
- Calculated as: EVPI = E[max(NMB)] - max(E[NMB])
Partial EVPI:
- Value of perfect information for specific parameters
- Helps prioritize research areas
- Identifies which parameters drive decision uncertainty most
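The EVPI formula maps directly onto a matrix of simulated NMB values. The sketch below is a base R illustration with hypothetical distributions for two strategies; it computes per-patient EVPI only and does not attempt partial EVPI.

```r
set.seed(1)
n_sim <- 1000
wtp   <- 50000

# Hypothetical simulated NMB: one row per PSA draw, one column per strategy.
nmb <- cbind(
  A = rbeta(n_sim, 70, 30) * wtp - rnorm(n_sim, 1000, 200),
  B = rbeta(n_sim, 80, 20) * wtp - rnorm(n_sim, 3000, 500)
)

# E[max(NMB)]: pick the best strategy separately in every simulation.
ev_perfect <- mean(apply(nmb, 1, max))
# max(E[NMB]): pick one strategy now, based on average NMB.
ev_current <- max(colMeans(nmb))

evpi <- ev_perfect - ev_current
evpi
```

EVPI is never negative: choosing the best option draw by draw can only match or beat committing to a single strategy in advance.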
6. Budget Impact Analysis
Estimate the financial impact of adopting new treatments at population level.
# Budget impact analysis
budget_analysis <- decisiongraph(
data = covid_data,
treeType = "costeffectiveness",
# Core analysis
decisions = "treatment_strategy",
costs = c("treatment_cost", "hospitalization_cost"),
utilities = "recovery_utility",
# Budget impact parameters
budgetImpactAnalysis = TRUE,
targetPopulationSize = 100000, # 100,000 eligible patients
marketPenetration = 0.3, # 30% uptake rate
cohortSize = 1000, # Analysis cohort size
# Time horizon
timeHorizon = 5,
# Cost-effectiveness
calculateNMB = TRUE,
willingnessToPay = 50000
)
Understanding Budget Impact
Key Calculations:
- Total eligible population: Patients who could receive treatment
- Market penetration: Proportion actually receiving new treatment
- Budget impact: (New treatment cost - Current cost) × Penetration × Population
Applications:
- Healthcare system planning
- Formulary decisions
- Resource allocation
- Policy impact assessment
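As a quick worked example of the budget impact formula, the sketch below reuses the population size and uptake rate from the call above. The two per-patient costs are hypothetical placeholders, not values from the dataset.

```r
target_population  <- 100000  # eligible patients (as in the call above)
market_penetration <- 0.3     # 30% uptake (as in the call above)

# Hypothetical per-patient costs, for illustration only.
cost_new     <- 3000
cost_current <- 1000

# Patients expected to switch to the new treatment.
patients_switched <- target_population * market_penetration

# (New treatment cost - Current cost) x Penetration x Population
budget_impact <- (cost_new - cost_current) * patients_switched
budget_impact
```

With these inputs, 30,000 patients switch at an extra 2,000 each, for a budget impact of 60 million.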
7. Advanced Features and Customization
7.1 Performance Optimization
# Large-scale analysis with performance optimization
large_analysis <- decisiongraph(
data = covid_data,
treeType = "costeffectiveness",
# Core analysis
decisions = "treatment_strategy",
costs = "treatment_cost",
utilities = "recovery_utility",
# Large PSA
probabilisticAnalysis = TRUE,
numSimulations = 5000, # Large simulation
# Performance settings
performanceMode = "fast", # Prioritize speed
memoryOptimization = TRUE, # Enable memory efficiency
parallelProcessing = TRUE, # Use multiple cores if available
# Cost-effectiveness
calculateNMB = TRUE,
willingnessToPay = 50000
)
7.2 Custom Visualization Settings
# Custom visualization and output
custom_analysis <- decisiongraph(
data = covid_data,
treeType = "simple",
# Core variables
decisions = "treatment_strategy",
probabilities = "recovery_prob",
costs = "treatment_cost",
utilities = "recovery_utility",
# Custom visualization
layout = "radial", # Radial tree layout
colorScheme = "medical", # Medical color scheme
nodeShapes = TRUE, # Different shapes per node type
nodeLabels = TRUE, # Show node labels
branchLabels = TRUE, # Show branch labels
# Detailed outputs
summaryTable = TRUE,
decisionComparison = TRUE,
calculateExpectedValues = TRUE
)
8. Interpreting Results
Key Decision Rules
For Simple Decision Trees:
- Highest Expected Utility: Best clinical outcomes
- Lowest Expected Cost: Most economical option
- Best Utility-Cost Ratio: Balance of effectiveness and cost
Clinical Decision Framework
# Example decision framework
cat("CLINICAL DECISION FRAMEWORK\n")
cat("==========================\n\n")
cat("1. EFFICACY (Primary consideration)\n")
cat(" - Recovery probability > 80%: Highly effective\n")
cat(" - Recovery probability 60-80%: Moderately effective\n")
cat(" - Recovery probability < 60%: Less effective\n\n")
cat("2. SAFETY (Secondary consideration)\n")
cat(" - ICU probability < 5%: Low risk\n")
cat(" - ICU probability 5-15%: Moderate risk\n")
cat(" - ICU probability > 15%: High risk\n\n")
cat("3. COST-EFFECTIVENESS (Tertiary consideration)\n")
cat(" - ICER < $50,000/QALY: Cost-effective\n")
cat(" - ICER $50,000-$100,000/QALY: Possibly cost-effective\n")
cat(" - ICER > $100,000/QALY: Not cost-effective\n\n")
cat("4. BUDGET IMPACT (Health system consideration)\n")
cat(" - Budget impact < 1% of total budget: Minimal impact\n")
cat(" - Budget impact 1-5% of total budget: Moderate impact\n")
cat(" - Budget impact > 5% of total budget: Major impact\n")
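The cost-effectiveness tier of this framework can also be applied programmatically. The helper below is a hypothetical convenience function, not part of meddecide; it encodes the thresholds printed above and assumes each ICER compares a more costly, more effective strategy with its comparator (negative ICERs need a dominance check instead).

```r
# Hypothetical helper: label ICERs (per QALY) using the framework's thresholds.
classify_icer <- function(icer) {
  ifelse(icer < 50000, "Cost-effective",
         ifelse(icer <= 100000, "Possibly cost-effective",
                "Not cost-effective"))
}

classify_icer(c(20000, 50000, 100000, 150000))
```

Values exactly at 50,000 or 100,000 fall into the middle band here, matching the "50,000-100,000" wording of the framework.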
9. Best Practices and Tips
9.1 Data Preparation
# Ensure proper data formatting
covid_data_clean <- covid_data %>%
  # Convert categorical columns to factors
  mutate(
    treatment_strategy = as.factor(treatment_strategy),
    age_group = as.factor(age_group),
    clinical_outcome = as.factor(clinical_outcome)
  ) %>%
  # Drop rows with missing values
  filter(complete.cases(.)) %>%
  # Keep only rows whose probabilities lie between 0 and 1
  filter(
    recovery_prob >= 0 & recovery_prob <= 1,
    hospitalization_prob >= 0 & hospitalization_prob <= 1
  ) %>%
  # Keep only rows with non-negative costs
  filter(
    treatment_cost >= 0,
    hospitalization_cost >= 0
  )
cat("Data validation complete. Cleaned dataset has", nrow(covid_data_clean), "rows.\n")
9.2 Analysis Tips
Performance Recommendations:
- Start Small: Begin with 1000 PSA simulations, increase if needed
- Use Fast Mode: For exploration, use performanceMode = "fast"
- Enable Memory Optimization: Always use memoryOptimization = TRUE
- Parallel Processing: Enable for >5000 simulations
Validation Checklist:
- ✅ Probabilities between 0 and 1
- ✅ Costs are non-negative
- ✅ Utilities typically between 0 and 1
- ✅ Transition probabilities out of each state sum to 1 (Markov models)
- ✅ Time horizon is reasonable for condition
- ✅ Discount rate is appropriate (typically 3-5%)
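Several items on this checklist can be automated before running an analysis. The function below is a hypothetical base R helper written for this example (it is not provided by meddecide); it returns a character vector describing any problems it finds, or an empty vector when the inputs pass.

```r
# Hypothetical helper; not part of meddecide.
check_inputs <- function(prob, cost, utility, transition = NULL, tol = 1e-8) {
  issues <- character(0)
  if (any(prob < 0 | prob > 1, na.rm = TRUE)) {
    issues <- c(issues, "probability outside [0, 1]")
  }
  if (any(cost < 0, na.rm = TRUE)) {
    issues <- c(issues, "negative cost")
  }
  if (any(utility < 0 | utility > 1, na.rm = TRUE)) {
    issues <- c(issues, "utility outside [0, 1] (review)")
  }
  # For Markov models: each row of the transition matrix must sum to 1.
  if (!is.null(transition) && any(abs(rowSums(transition) - 1) > tol)) {
    issues <- c(issues, "transition rows do not sum to 1")
  }
  issues
}

ok_matrix  <- matrix(c(0.9, 0.1, 0.0, 1.0), nrow = 2, byrow = TRUE)
bad_matrix <- matrix(c(0.9, 0.2, 0.0, 1.0), nrow = 2, byrow = TRUE)

check_inputs(c(0.2, 0.8), c(100, 0), c(0.5, 1), ok_matrix)    # no issues
check_inputs(c(0.2, 1.2), c(100, -5), c(0.5, 1), bad_matrix)  # three issues
```

The utility check is reported as something to review, since the checklist says utilities are typically, not always, between 0 and 1.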
9.3 Common Issues and Solutions
# Common error: Invalid CEAC format
# ❌ Wrong: ceacThresholds = "0 to 100000 by 5000"
# ✅ Correct: ceacThresholds = "0,100000,5000"
# Common error: Too many simulations
# ❌ Wrong: numSimulations = 50000 (may cause memory issues)
# ✅ Correct: numSimulations = 1000 to 5000
# Common error: Missing required variables
# ❌ Wrong: treeType = "markov" without healthStates
# ✅ Correct: Always include required variables for each tree type
# Performance optimization for large analyses
# ✅ Combine fast mode, memory optimization, and a moderate simulation count
large_data_analysis <- decisiongraph(
  data = covid_data,
  treeType = "costeffectiveness",
  decisions = "treatment_strategy",
  costs = "treatment_cost",
  utilities = "recovery_utility",
  probabilisticAnalysis = TRUE, # Enable PSA, as in the earlier examples that set numSimulations
  performanceMode = "fast", # Speed optimization
  memoryOptimization = TRUE, # Memory efficiency
  numSimulations = 2000 # Reasonable simulation count
)
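The ceacThresholds string is described earlier as a start value, an end value, and a step. The sketch below shows how such a string could be expanded into a numeric grid; `parse_thresholds` is a hypothetical helper for illustration, and decisiongraph's own parsing may differ.

```r
# Hypothetical helper: expand "min,max,step" into a vector of WTP thresholds.
parse_thresholds <- function(x) {
  parts <- suppressWarnings(as.numeric(strsplit(x, ",", fixed = TRUE)[[1]]))
  if (length(parts) != 3 || anyNA(parts) ||
      parts[3] <= 0 || parts[2] < parts[1]) {
    stop("expected 'min,max,step' with step > 0 and max >= min")
  }
  seq(parts[1], parts[2], by = parts[3])
}

grid <- parse_thresholds("0,100000,5000")
length(grid)  # 21 thresholds: 0, 5000, ..., 100000

# The free-text form from the "wrong" example is rejected:
try(parse_thresholds("0 to 100000 by 5000"))
```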
10. Conclusion
The decisiongraph function provides a comprehensive toolkit for medical decision analysis:
Key Strengths:
- Flexibility: Supports simple trees, Markov models, and cost-effectiveness analysis
- Comprehensive: Includes sensitivity analysis, value of information, and budget impact
- User-friendly: Intuitive interface with helpful validation and error messages
- Performance: Optimized for large-scale analyses with memory management
Applications:
- Clinical treatment guidelines
- Healthcare technology assessment
- Formulary and coverage decisions
- Research prioritization
- Policy analysis
Next Steps:
1. Apply to your own clinical data
2. Validate results with clinical experts
3. Conduct sensitivity analyses to test robustness
4. Consider value of information to guide future research
5. Estimate budget impact for implementation planning
This vignette demonstrates the full capabilities of the decisiongraph function. For questions or issues, please refer to the meddecide documentation or submit issues to the GitHub repository.