Decision Tree Analysis with decisiongraph: A Comprehensive Guide

Introduction

The decisiongraph function in the meddecide package performs decision tree analysis for medical decision-making. This vignette demonstrates its major features on a realistic COVID-19 treatment strategy dataset.

Key Features Covered

Simple Decision Trees: Basic treatment comparisons
Markov Models: Long-term health state modeling
Cost-Effectiveness Analysis: Economic evaluation with ICER and NMB
Sensitivity Analysis: One-way and probabilistic sensitivity analysis
Advanced Features: Value of information, budget impact, correlation analysis

Dataset Overview

We’ll use a COVID-19 treatment dataset of 40 patients covering four treatment strategies across different patient populations.

```r
# Load required libraries
library(meddecide)
library(dplyr)
library(ggplot2)

# Load the test dataset
covid_data <- read.csv("data/decisiongraph_test_data.csv")

# Display dataset structure
str(covid_data)

# Preview the first few rows
head(covid_data, 3)

# Summary of key variables
cat("Dataset contains:\n")
cat("- Patients:", nrow(covid_data), "\n")
cat("- Treatment strategies:", length(unique(covid_data$treatment_strategy)), "\n")
cat("- Age groups:", length(unique(covid_data$age_group)), "\n")
cat("- Comorbidity types:", length(unique(covid_data$comorbidities)), "\n")
```

1. Simple Decision Tree Analysis

Let’s start with a basic decision tree comparing treatment strategies based on recovery probability and treatment costs.

```r
# Basic decision tree analysis
simple_analysis <- decisiongraph(
  data = covid_data,
  treeType = "simple",

  # Core variables
  decisions = "treatment_strategy",
  probabilities = "recovery_prob",
  costs = "treatment_cost",
  utilities = "recovery_utility",
  outcomes = "clinical_outcome",

  # Visualization settings
  layout = "horizontal",
  nodeShapes = TRUE,
  showProbabilities = TRUE,
  showCosts = TRUE,
  showUtilities = TRUE,

  # Output options
  calculateExpectedValues = TRUE,
  summaryTable = TRUE,

  # Performance settings
  performanceMode = "standard"
)
```

Understanding Simple Decision Trees

The simple decision tree shows:

Decision nodes (squares): Treatment strategy choices
Chance nodes (circles): Probability of recovery
Terminal nodes (triangles): Final outcomes with costs and utilities
Expected values: Calculated for each treatment path

Key Insights:
- Compare treatments based on expected costs and utilities
- Identify dominant strategies (higher utility, lower cost)
- Visualize decision trade-offs clearly
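The expected values in the tree come from "rolling back" each chance node. Every branch's cost and utility is weighted by its probability and summed, and the decision node then compares those expected values across strategies. The base-R sketch below shows that arithmetic for two hypothetical strategies. The names and numbers are made-up illustration values, not taken from the test dataset, and the code does not reproduce meddecide's internal implementation.

```r
# Hypothetical two-strategy tree: each strategy leads to a chance node with
# a "recovered" branch and a "not recovered" branch.
tree <- data.frame(
  strategy     = c("Standard care", "Antiviral"),
  p_recover    = c(0.70, 0.85),    # probability of recovery (hypothetical)
  cost_recover = c(2000, 5000),    # cost if recovered (hypothetical)
  cost_fail    = c(15000, 16000),  # cost if not recovered (hypothetical)
  util_recover = c(0.95, 0.95),    # utility if recovered (hypothetical)
  util_fail    = c(0.60, 0.60)     # utility if not recovered (hypothetical)
)

# Roll back each chance node: probability-weighted sum over its branches
tree$expected_cost <- with(tree,
  p_recover * cost_recover + (1 - p_recover) * cost_fail)
tree$expected_utility <- with(tree,
  p_recover * util_recover + (1 - p_recover) * util_fail)

# At the decision node, compare the expected values of the strategies
print(tree[, c("strategy", "expected_cost", "expected_utility")])
```

With these numbers the antiviral gives a higher expected utility at a higher expected cost, so neither strategy dominates. Weighing that trade-off requires the cost-effectiveness tools in Section 3.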

2. Markov Model Analysis

For long-term analysis, we’ll model patient transitions through health states over time.

```r
# Markov model for long-term analysis
markov_analysis <- decisiongraph(
  data = covid_data,
  treeType = "markov",

  # Markov-specific variables
  healthStates = c("healthy_state", "mild_state", "severe_state",
                   "critical_state", "dead_state"),
  transitionProbs = c("transition_healthy_mild", "transition_mild_severe",
                      "transition_severe_critical", "transition_critical_dead",
                      "transition_mild_healthy", "transition_severe_mild",
                      "transition_critical_severe"),

  # Time parameters
  cycleLength = 1,     # 1-year cycles
  timeHorizon = 10,    # 10-year analysis
  discountRate = 0.03, # 3% annual discount rate

  # Markov options
  cycleCorrection = TRUE,  # Half-cycle correction
  cohortTrace = TRUE,      # Generate cohort trace
  cohortSize = 1000,       # Starting cohort size

  # Cost and utility variables
  costs = c("outpatient_cost", "hospitalization_cost", "icu_cost"),
  utilities = c("outpatient_utility", "hospitalization_utility", "icu_utility"),

  # Output options
  summaryTable = TRUE
)
```

Understanding Markov Models

Markov models track patient populations through health states over multiple cycles:

Health States: Healthy → Mild → Severe → Critical → Dead
Transition Probabilities: Likelihood of moving between states each cycle
Cohort Trace: Shows population distribution over time
Discounting: Adjusts future costs and benefits to present value

Clinical Applications:
- Chronic disease progression modeling
- Long-term treatment effectiveness
- Lifetime cost-effectiveness analysis
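A few lines of base R are enough to show how the cohort trace, discounting, and half-cycle correction work together. This sketch uses a hypothetical three-state model (Healthy, Sick, Dead) with made-up transition probabilities, costs, and utilities, not the five-state COVID-19 model above. The half-cycle correction shown is the trapezoidal variant, which weights the first and last cycles by one half. It is one common convention and not necessarily the one decisiongraph uses.

```r
# Hypothetical 3-state model; rows = state at start of cycle, columns = next state
states <- c("Healthy", "Sick", "Dead")
P <- matrix(c(0.90, 0.08, 0.02,
              0.10, 0.75, 0.15,
              0.00, 0.00, 1.00),
            nrow = 3, byrow = TRUE, dimnames = list(states, states))
stopifnot(all(abs(rowSums(P) - 1) < 1e-12))  # each row must sum to 1

n_cycles   <- 10                                      # ten 1-year cycles
disc_rate  <- 0.03                                    # 3% annual discount rate
cost_state <- c(Healthy = 500, Sick = 8000, Dead = 0)  # hypothetical annual costs
util_state <- c(Healthy = 0.95, Sick = 0.60, Dead = 0) # hypothetical utilities

# Cohort trace: row t + 1 holds the state distribution after t cycles
trace <- matrix(0, nrow = n_cycles + 1, ncol = length(states),
                dimnames = list(0:n_cycles, states))
trace[1, ] <- c(1000, 0, 0)  # start with 1000 healthy patients
for (t in seq_len(n_cycles)) {
  trace[t + 1, ] <- trace[t, ] %*% P
}

# Trapezoidal half-cycle correction: weight the first and last rows by 1/2
hcc <- rep(1, n_cycles + 1)
hcc[c(1, n_cycles + 1)] <- 0.5

# Discount factor for time t = 0, 1, ..., n_cycles
disc <- 1 / (1 + disc_rate)^(0:n_cycles)

# Discounted, half-cycle-corrected totals for the whole cohort
total_cost  <- sum(hcc * disc * (trace %*% cost_state))
total_qalys <- sum(hcc * disc * (trace %*% util_state))
c(cost_per_patient  = total_cost / 1000,
  qalys_per_patient = total_qalys / 1000)
```

Because the cycles are one year long, summing utility-weighted state occupancy gives QALYs directly. With a different cycle length, each cycle's contribution would need to be scaled by the cycle length.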

3. Cost-Effectiveness Analysis

Now let’s perform a comprehensive economic evaluation comparing all treatment strategies.

```r
# Comprehensive cost-effectiveness analysis
cea_analysis <- decisiongraph(
  data = covid_data,
  treeType = "costeffectiveness",

  # Decision variables
  decisions = "treatment_strategy",
  costs = c("treatment_cost", "hospitalization_cost", "icu_cost"),
  utilities = c("recovery_utility", "hospitalization_utility", "icu_utility"),

  # Cost-effectiveness parameters
  calculateNMB = TRUE,
  willingnessToPay = 50000,  # $50,000 per QALY

  # ICER analysis
  incrementalAnalysis = TRUE,
  dominanceAnalysis = TRUE,
  icer_confidence_intervals = FALSE, # Set to TRUE for CI calculations

  # Output options
  summaryTable = TRUE,
  decisionComparison = TRUE,

  # Visualization
  layout = "horizontal",
  colorScheme = "economic",
  plotComparison = TRUE
)
```

Understanding Cost-Effectiveness Analysis

Key Metrics:

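Net Monetary Benefit (NMB): NMB = (Utility × WTP) - Cost
- Higher NMB = more cost-effective at that WTP
- WTP = Willingness-to-pay threshold (e.g., $ per QALY)
Incremental Cost-Effectiveness Ratio (ICER):
```
ICER = (Cost₂ - Cost₁) / (Utility₂ - Utility₁)
```
- Compare the ICER with the WTP threshold
- If strategy 2 is more effective than strategy 1 and its ICER is below the WTP, strategy 2 is cost-effective relative to strategy 1
Dominance Analysis:
- Dominated: More expensive and less effective than an alternative
- Dominant: Less expensive and more effective than an alternative
- Extended dominance: A strategy whose ICER is higher than that of the next, more effective strategy; a mix (linear combination) of its neighbours would yield more QALYs at the same cost, so it is eliminated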
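These rules can be applied by hand to a small table of strategy summaries. The base-R sketch below uses four hypothetical strategies (A to D) with made-up expected costs and QALYs. It first removes strongly dominated strategies, then iteratively removes extendedly dominated ones, and finally computes ICERs on the remaining efficiency frontier. It illustrates the standard procedure and is not meddecide's implementation.

```r
# Hypothetical expected cost and QALYs per patient for four strategies
cea <- data.frame(
  strategy = c("A", "B", "C", "D"),
  cost     = c(1000, 3000, 4000, 9000),
  qaly     = c(0.80, 0.90, 0.85, 0.95)
)
wtp <- 50000  # $ per QALY

# Net monetary benefit: NMB = QALY x WTP - Cost
cea$nmb <- cea$qaly * wtp - cea$cost

# Strong dominance: some other strategy costs no more and yields at least as
# many QALYs, and is strictly better on at least one of the two
cea$dominated <- sapply(seq_len(nrow(cea)), function(i) {
  any(cea$cost <= cea$cost[i] & cea$qaly >= cea$qaly[i] &
        (cea$cost < cea$cost[i] | cea$qaly > cea$qaly[i]))
})

# Frontier: non-dominated strategies ordered by cost
frontier <- cea[!cea$dominated, ]
frontier <- frontier[order(frontier$cost), ]

# Extended dominance: while some ICER exceeds the next one, drop that strategy
repeat {
  frontier$icer <- c(NA, diff(frontier$cost) / diff(frontier$qaly))
  bad <- which(diff(frontier$icer) < 0)  # positions k where ICER[k] > ICER[k + 1]
  if (length(bad) == 0) break
  frontier <- frontier[-bad[1], ]
}

print(frontier[, c("strategy", "cost", "qaly", "icer")])
cea$strategy[which.max(cea$nmb)]  # strategy with the highest NMB at this WTP
```

With these numbers, C is dominated by B because it costs more and yields fewer QALYs. The frontier ICERs are $20,000/QALY for B versus A and $120,000/QALY for D versus B. B also has the highest NMB at $50,000/QALY, so both decision rules point to B.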

4. Sensitivity Analysis

Let’s explore how robust our results are to parameter uncertainty.

4.1 One-Way Sensitivity Analysis

```r
# One-way sensitivity analysis with tornado diagram
sensitivity_analysis <- decisiongraph(
  data = covid_data,
  treeType = "costeffectiveness",

  # Core analysis
  decisions = "treatment_strategy",
  costs = "treatment_cost",
  utilities = "recovery_utility",

  # Sensitivity analysis
  sensitivityAnalysis = TRUE,
  tornado = TRUE,

  # Cost-effectiveness
  calculateNMB = TRUE,
  willingnessToPay = 50000,

  # Output options
  summaryTable = TRUE
)
```
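A one-way analysis varies a single parameter between a low and a high value while holding all others at their base-case values. A tornado diagram then sorts the parameters by how much the result moves. The base-R sketch below does this for the incremental NMB of a hypothetical new treatment versus standard care. The parameter names, base values, and ranges are all hypothetical, and the model is a simple two-branch tree rather than what decisiongraph builds internally.

```r
wtp <- 50000

# Hypothetical base-case parameters
base <- list(p_new = 0.85, p_std = 0.70, cost_new = 5000, cost_std = 2000,
             u_rec = 0.95, u_fail = 0.60)

# Hypothetical low/high values for the parameters being varied
ranges <- list(p_new = c(0.75, 0.95), cost_new = c(3000, 8000),
               u_fail = c(0.40, 0.80))

# Incremental NMB (new minus standard) of a two-branch recovery tree
inc_nmb <- function(par) {
  eu_new <- par$p_new * par$u_rec + (1 - par$p_new) * par$u_fail
  eu_std <- par$p_std * par$u_rec + (1 - par$p_std) * par$u_fail
  (eu_new - eu_std) * wtp - (par$cost_new - par$cost_std)
}

# Vary one parameter at a time, holding the rest at base case
tornado <- do.call(rbind, lapply(names(ranges), function(nm) {
  low  <- base; low[[nm]]  <- ranges[[nm]][1]
  high <- base; high[[nm]] <- ranges[[nm]][2]
  data.frame(parameter = nm, low = inc_nmb(low), high = inc_nmb(high))
}))
tornado$width <- abs(tornado$high - tornado$low)
tornado <- tornado[order(tornado$width, decreasing = TRUE), ]  # widest bar on top

inc_nmb(base)  # base-case incremental NMB
print(tornado)
```

A parameter whose low-to-high range crosses zero incremental NMB can flip the decision on its own. These parameters are the ones a tornado diagram is designed to highlight.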

4.2 Probabilistic Sensitivity Analysis (PSA)

```r
# Probabilistic sensitivity analysis
psa_analysis <- decisiongraph(
  data = covid_data,
  treeType = "costeffectiveness",

  # Core analysis
  decisions = "treatment_strategy",
  costs = c("treatment_cost", "hospitalization_cost"),
  utilities = c("recovery_utility", "hospitalization_utility"),
  probabilities = c("recovery_prob", "hospitalization_prob"),

  # PSA settings
  probabilisticAnalysis = TRUE,
  numSimulations = 1000,  # Increase for more precision
  psa_distributions = "normal",

  # Advanced PSA outputs
  psa_advanced_outputs = TRUE,
  ceacThresholds = "0,100000,5000",  # $0 to $100k, $5k steps

  # Parameter correlations
  correlatedParameters = TRUE,
  correlationMatrix = c("correlation_param1", "correlation_param2", "correlation_param3"),

  # Cost-effectiveness
  calculateNMB = TRUE,
  willingnessToPay = 50000,

  # Performance optimization
  performanceMode = "standard",
  memoryOptimization = TRUE,
  parallelProcessing = FALSE  # Set TRUE if you have multiple cores
)
```

Understanding PSA Results

PSA generates several key outputs:

Cost-Effectiveness Scatter Plot: Shows uncertainty cloud on cost-effectiveness plane
Cost-Effectiveness Acceptability Curve (CEAC): Probability of cost-effectiveness across WTP thresholds
Net Monetary Benefit Distributions: Uncertainty in NMB estimates
Confidence Intervals: For all key metrics
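For two strategies, each point on the CEAC is the share of PSA simulations in which the new strategy has a positive incremental NMB at that WTP. The base-R sketch below computes a CEAC over the same $0 to $100,000 grid in $5,000 steps used in the PSA example. The incremental costs and QALYs are hypothetical normal draws standing in for real PSA output.

```r
set.seed(42)
n_sim <- 1000

# Hypothetical PSA draws: incremental cost and QALYs of new vs standard care
d_cost <- rnorm(n_sim, mean = 3000, sd = 1000)
d_qaly <- rnorm(n_sim, mean = 0.05, sd = 0.03)

# WTP grid: $0 to $100,000 in $5,000 steps
thresholds <- seq(0, 100000, by = 5000)

# CEAC: probability that the new strategy is cost-effective at each WTP,
# i.e. the share of simulations with incremental NMB > 0
ceac <- sapply(thresholds, function(wtp) mean(d_qaly * wtp - d_cost > 0))

data.frame(wtp = thresholds, prob_cost_effective = ceac)
```

The same idea extends to more than two strategies. At each WTP, count how often each strategy has the highest NMB across the simulations.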

5. Value of Information Analysis

Determine the value of reducing uncertainty in key parameters.

```r
# Value of information analysis
voi_analysis <- decisiongraph(
  data = covid_data,
  treeType = "costeffectiveness",

  # Core analysis
  decisions = "treatment_strategy",
  costs = "treatment_cost",
  utilities = "recovery_utility",
  probabilities = "recovery_prob",

  # VOI analysis
  valueOfInformation = TRUE,
  evpi_parameters = c("evpi_param1", "evpi_param2"),

  # Required: PSA for VOI
  probabilisticAnalysis = TRUE,
  numSimulations = 1000,

  # Cost-effectiveness
  calculateNMB = TRUE,
  willingnessToPay = 50000
)
```

Understanding Value of Information

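Expected Value of Perfect Information (EVPI):
- Maximum value of eliminating all parameter uncertainty
- Upper bound on the value of further research
- Calculated as: EVPI = E[max(NMB)] - max(E[NMB]), where E[max(NMB)] is the expected NMB if the best strategy could be chosen separately in every simulation, and max(E[NMB]) is the expected NMB of the single strategy chosen under current information

Partial EVPI:
- Value of perfect information for specific parameters
- Helps prioritize research areas
- Identifies which parameters drive decision uncertainty most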
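The EVPI formula can be evaluated directly from a matrix of simulated NMB values, with one row per PSA simulation and one column per strategy. The base-R sketch below uses hypothetical normal draws for two strategies. Partial EVPI for individual parameters needs additional machinery, such as nested simulation or regression-based approximations, and is not shown.

```r
set.seed(1)
n_sim <- 5000
wtp <- 50000

# Hypothetical PSA draws for two strategies
qaly_std <- rnorm(n_sim, mean = 0.80, sd = 0.05)
cost_std <- rnorm(n_sim, mean = 2000, sd = 500)
qaly_new <- rnorm(n_sim, mean = 0.85, sd = 0.05)
cost_new <- rnorm(n_sim, mean = 4000, sd = 800)

# NMB matrix: rows = simulations, columns = strategies
nmb <- cbind(standard = qaly_std * wtp - cost_std,
             new      = qaly_new * wtp - cost_new)

# EVPI = E[max(NMB)] - max(E[NMB])
ev_perfect_info <- mean(apply(nmb, 1, max))  # best strategy chosen in each simulation
ev_current_info <- max(colMeans(nmb))        # one strategy chosen on expected NMB
evpi_per_patient <- ev_perfect_info - ev_current_info
evpi_per_patient
```

Multiplying the per-patient EVPI by the (discounted) number of patients affected by the decision gives the population EVPI. This population figure is what is usually compared with the cost of further research.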

6. Budget Impact Analysis

Estimate the financial impact of adopting new treatments at the population level.

```r
# Budget impact analysis
budget_analysis <- decisiongraph(
  data = covid_data,
  treeType = "costeffectiveness",

  # Core analysis
  decisions = "treatment_strategy",
  costs = c("treatment_cost", "hospitalization_cost"),
  utilities = "recovery_utility",

  # Budget impact parameters
  budgetImpactAnalysis = TRUE,
  targetPopulationSize = 100000,   # 100,000 eligible patients
  marketPenetration = 0.3,         # 30% uptake rate
  cohortSize = 1000,               # Analysis cohort size

  # Time horizon
  timeHorizon = 5,

  # Cost-effectiveness
  calculateNMB = TRUE,
  willingnessToPay = 50000
)
```

Understanding Budget Impact

Key Calculations:
- Total eligible population: Patients who could receive treatment
- Market penetration: Proportion actually receiving the new treatment
- Budget impact: (New treatment cost - Current cost) × Penetration × Population

Applications:
- Healthcare system planning
- Formulary decisions
- Resource allocation
- Policy impact assessment
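The budget impact formula above is plain arithmetic. The base-R sketch below applies it year by year, using the example's population size of 100,000 eligible patients. The per-patient costs are hypothetical, as is the uptake ramp that reaches the 30% penetration used above. A real budget impact model would also account for population growth and changes in current care.

```r
population   <- 100000   # eligible patients per year (as in the example above)
cost_new     <- 6500     # hypothetical cost per patient, new treatment
cost_current <- 5900     # hypothetical cost per patient, current care

# Hypothetical uptake ramp over a 5-year horizon, reaching 30% penetration
penetration <- c(0.10, 0.20, 0.30, 0.30, 0.30)

# Budget impact = (new cost - current cost) x penetration x population
annual_impact <- (cost_new - cost_current) * penetration * population
names(annual_impact) <- paste("Year", seq_along(penetration))

annual_impact
sum(annual_impact)  # cumulative 5-year budget impact (undiscounted)
```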

7. Advanced Features and Customization

7.1 Performance Optimization

```r
# Large-scale analysis with performance optimization
large_analysis <- decisiongraph(
  data = covid_data,
  treeType = "costeffectiveness",

  # Core analysis
  decisions = "treatment_strategy",
  costs = "treatment_cost",
  utilities = "recovery_utility",

  # Large PSA
  probabilisticAnalysis = TRUE,
  numSimulations = 5000,  # Large simulation

  # Performance settings
  performanceMode = "fast",          # Prioritize speed
  memoryOptimization = TRUE,         # Enable memory efficiency
  parallelProcessing = TRUE,         # Use multiple cores if available

  # Cost-effectiveness
  calculateNMB = TRUE,
  willingnessToPay = 50000
)
```

7.2 Custom Visualization Settings

```r
# Custom visualization and output
custom_analysis <- decisiongraph(
  data = covid_data,
  treeType = "simple",

  # Core variables
  decisions = "treatment_strategy",
  probabilities = "recovery_prob",
  costs = "treatment_cost",
  utilities = "recovery_utility",

  # Custom visualization
  layout = "radial",              # Radial tree layout
  colorScheme = "medical",        # Medical color scheme
  nodeShapes = TRUE,              # Different shapes per node type
  nodeLabels = TRUE,              # Show node labels
  branchLabels = TRUE,            # Show branch labels

  # Detailed outputs
  summaryTable = TRUE,
  decisionComparison = TRUE,
  calculateExpectedValues = TRUE
)
```

8. Interpreting Results

Key Decision Rules

For Simple Decision Trees:

Highest Expected Utility: Best clinical outcomes
Lowest Expected Cost: Most economical option
Best Utility-Cost Ratio: Balance of effectiveness and cost

For Cost-Effectiveness Analysis:

Highest NMB: Most cost-effective at given WTP threshold
ICER < WTP: Acceptable cost per QALY gained
Non-dominated: Not eliminated by dominance analysis

For Markov Models:

Lifetime QALYs: Long-term quality-adjusted survival
Discounted Costs: Present value of all costs
Cost per life-year: Traditional healthcare metric

Clinical Decision Framework

```r
# Example decision framework
cat("CLINICAL DECISION FRAMEWORK\n")
cat("===========================\n\n")

cat("1. EFFICACY (Primary consideration)\n")
cat("   - Recovery probability > 80%: Highly effective\n")
cat("   - Recovery probability 60-80%: Moderately effective\n")
cat("   - Recovery probability < 60%: Less effective\n\n")

cat("2. SAFETY (Secondary consideration)\n")
cat("   - ICU probability < 5%: Low risk\n")
cat("   - ICU probability 5-15%: Moderate risk\n")
cat("   - ICU probability > 15%: High risk\n\n")

cat("3. COST-EFFECTIVENESS (Tertiary consideration)\n")
cat("   - ICER < $50,000/QALY: Cost-effective\n")
cat("   - ICER $50,000-$100,000/QALY: Possibly cost-effective\n")
cat("   - ICER > $100,000/QALY: Not cost-effective\n\n")

cat("4. BUDGET IMPACT (Health system consideration)\n")
cat("   - Budget impact < 1% of total budget: Minimal impact\n")
cat("   - Budget impact 1-5% of total budget: Moderate impact\n")
cat("   - Budget impact > 5% of total budget: Major impact\n")
```
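The cost-effectiveness tier of this framework can be turned into a small helper for labelling ICERs. The function name `classify_icer` is hypothetical and not part of meddecide. Its cut points are the $50,000 and $100,000 per QALY thresholds listed above, with $50,000 and $100,000 themselves falling in the middle tier. It assumes each ICER was computed against the next less effective non-dominated strategy, so negative ICERs, which signal dominance, are flagged rather than classified.

```r
# Hypothetical helper, not part of meddecide: label ICERs ($/QALY) using the
# thresholds from the framework above
classify_icer <- function(icer) {
  label <- ifelse(icer < 50000, "Cost-effective",
           ifelse(icer <= 100000, "Possibly cost-effective",
                  "Not cost-effective"))
  # A negative ICER means one strategy dominates the other; resolve it with a
  # dominance check instead of comparing it with the WTP threshold
  label[!is.na(icer) & icer < 0] <- "Check dominance"
  label
}

classify_icer(c(20000, 75000, 100000, 120000, -5000))
```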

9. Best Practices and Tips

9.1 Data Preparation

```r
# Ensure proper data formatting
covid_data_clean <- covid_data %>%
  # Convert categorical variables to factors
  mutate(
    treatment_strategy = as.factor(treatment_strategy),
    age_group = as.factor(age_group),
    clinical_outcome = as.factor(clinical_outcome)
  ) %>%
  # Drop rows with a missing value in any column
  filter(complete.cases(.)) %>%
  # Keep only probabilities within [0, 1]
  filter(
    recovery_prob >= 0 & recovery_prob <= 1,
    hospitalization_prob >= 0 & hospitalization_prob <= 1
  ) %>%
  # Keep only non-negative costs
  filter(
    treatment_cost >= 0,
    hospitalization_cost >= 0
  )

cat("Data validation complete. Cleaned dataset has", nrow(covid_data_clean), "rows.\n")
```

9.2 Analysis Tips

Performance Recommendations:

Start Small: Begin with 1000 PSA simulations, increase if needed
Use Fast Mode: For exploration, use performanceMode = "fast"
Enable Memory Optimization: Always use memoryOptimization = TRUE
Parallel Processing: Enable for >5000 simulations

Validation Checklist:

✅ Probabilities between 0 and 1
✅ Costs are non-negative
✅ Utilities typically between 0 and 1
✅ Transition probabilities out of each state (including staying in that state) sum to 1 (Markov models)
✅ Time horizon is reasonable for condition
✅ Discount rate is appropriate (typically 3-5%)
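Several of these checks are easy to automate before calling decisiongraph. The helper below is a hypothetical sketch, and the name `check_transition_matrix` is not part of meddecide. It assumes the transition probabilities are arranged as a square from-state by to-state matrix that includes the probabilities of staying in the same state. The test dataset stores individual transitions as separate columns, so they would need to be reshaped first.

```r
# Hypothetical helper, not part of meddecide: returns a character vector of
# problems found in a transition matrix (empty if none)
check_transition_matrix <- function(P, tol = 1e-8) {
  problems <- character(0)
  if (nrow(P) != ncol(P)) {
    problems <- c(problems, "matrix is not square")
  }
  if (any(P < 0 | P > 1)) {
    problems <- c(problems, "entries outside [0, 1]")
  }
  bad_rows <- which(abs(rowSums(P) - 1) > tol)
  if (length(bad_rows) > 0) {
    problems <- c(problems, paste("rows not summing to 1:",
                                  paste(bad_rows, collapse = ", ")))
  }
  problems
}

good <- matrix(c(0.9, 0.1, 0.0, 1.0), nrow = 2, byrow = TRUE)
bad  <- matrix(c(0.9, 0.1, 0.2, 0.7), nrow = 2, byrow = TRUE)
check_transition_matrix(good)
check_transition_matrix(bad)
```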

9.3 Common Issues and Solutions

```r
# Common error: Invalid CEAC format
# ❌ Wrong: ceacThresholds = "0 to 100000 by 5000"
# ✅ Correct: ceacThresholds = "0,100000,5000"

# Common error: Too many simulations
# ❌ Wrong: numSimulations = 50000 (may cause memory issues)
# ✅ Correct: numSimulations = 1000 to 5000

# Common error: Missing required variables
# ❌ Wrong: treeType = "markov" without healthStates
# ✅ Correct: Always include the required variables for each tree type

# Performance optimization for large analyses
# ✅ Combine fast mode, memory optimization, and a moderate simulation count
large_data_analysis <- decisiongraph(
  data = covid_data,
  treeType = "costeffectiveness",
  decisions = "treatment_strategy",
  costs = "treatment_cost",
  utilities = "recovery_utility",
  probabilisticAnalysis = TRUE,  # numSimulations is used by the PSA
  performanceMode = "fast",      # Speed optimization
  memoryOptimization = TRUE,     # Memory efficiency
  numSimulations = 2000          # Reasonable simulation count
)
```

10. Conclusion

The decisiongraph function provides a comprehensive toolkit for medical decision analysis:

Key Strengths:
- Flexibility: Supports simple trees, Markov models, and cost-effectiveness analysis
- Comprehensive: Includes sensitivity analysis, value of information, and budget impact
- User-friendly: Intuitive interface with helpful validation and error messages
- Performance: Optimized for large-scale analyses with memory management

Applications:
- Clinical treatment guidelines
- Healthcare technology assessment
- Formulary and coverage decisions
- Research prioritization
- Policy analysis

Next Steps:
1. Apply to your own clinical data
2. Validate results with clinical experts
3. Conduct sensitivity analyses to test robustness
4. Consider value of information to guide future research
5. Estimate budget impact for implementation planning

This vignette demonstrates the full capabilities of the decisiongraph function. For questions or issues, please refer to the meddecide documentation or submit issues to the GitHub repository.

COVID-19 Treatment Strategy Optimization Using Decision Trees, Markov Models, and Cost-Effectiveness Analysis

meddecide Module

2025-10-09