Grouped Hazard Forest Plots for Subgroup Analysis
ClinicoPath
2025-07-13
Source:vignettes/jsurvival-08-groupedforest-comprehensive.Rmd
jsurvival-08-groupedforest-comprehensive.Rmd
Introduction to Grouped Forest Plots
The Grouped Hazard Forest Plot module provides a powerful visualization method for comparing treatment effects across different patient subgroups. This analysis addresses a critical need in precision medicine and clinical research by performing separate Cox proportional hazards regression analyses for each subgroup and presenting the results in a unified forest plot.
This module was developed to address GitHub Issue #88: Create grouped forest plots showing treatment vs control for each variant.
Key Features
- Subgroup-Specific Analysis: Separate Cox regression for each group
- Visual Comparison: Side-by-side hazard ratios in forest plot format
- Statistical Testing: Individual p-values and interaction tests
- Clinical Applications: Perfect for biomarker studies, precision medicine, and clinical trials
- Flexible Grouping: Support for any categorical grouping variable
Getting Started
Required Data Structure
Your dataset should contain:
- Time Variable: Continuous variable for follow-up duration
- Event Variable: Binary indicator (0=censored, 1=event)
- Treatment Variable: Factor variable for treatment comparison
- Grouping Variable: Categorical variable defining subgroups
Example Dataset
Let’s examine the structure of our test data:
library(ClinicoPath)
library(dplyr)
library(ggplot2)
# Load example data
data("groupedforest_comprehensive_data")
# Examine data structure
str(groupedforest_comprehensive_data)
# Preview the data
head(groupedforest_comprehensive_data) %>%
knitr::kable(caption = "Sample of Comprehensive Grouped Forest Data")
Basic Grouped Forest Analysis
Simple Two-Group Comparison
Let’s start with a basic analysis comparing treatment effects between biomarker-positive and biomarker-negative patients:
# Basic grouped forest plot by biomarker status
result_biomarker <- groupedforest(
data = groupedforest_comprehensive_data,
time_var = "survival_months",
event_var = "death_event",
treatment_var = "treatment",
grouping_var = "biomarker_status",
show_statistics = TRUE,
show_overall = TRUE
)
# The result object contains the analysis
class(result_biomarker)
Tumor Stage Stratification
Compare treatment effects between early and advanced tumor stages:
# Grouped analysis by tumor stage
result_stage <- groupedforest(
data = groupedforest_comprehensive_data,
time_var = "survival_months",
event_var = "death_event",
treatment_var = "treatment",
grouping_var = "tumor_stage",
confidence_level = 0.95,
sort_by_hr = TRUE,
plot_title = "Treatment Effects by Tumor Stage"
)
Gender-Based Subgroup Analysis
Evaluate treatment effects across gender groups:
# Analysis by gender
result_gender <- groupedforest(
data = groupedforest_comprehensive_data,
time_var = "survival_months",
event_var = "death_event",
treatment_var = "treatment",
grouping_var = "gender",
show_counts = TRUE,
plot_theme = "clinical"
)
Advanced Analysis Features
Covariate Adjustment
Include covariates to adjust for confounding factors:
# Analysis with covariate adjustment
result_adjusted <- groupedforest(
data = groupedforest_comprehensive_data,
time_var = "survival_months",
event_var = "death_event",
treatment_var = "treatment",
grouping_var = "biomarker_status",
covariates = c("age", "performance_status"),
reference_treatment = "Control",
show_statistics = TRUE
)
Interaction Testing
Test for treatment-by-subgroup interactions:
# Test for interaction effects
result_interaction <- groupedforest(
data = groupedforest_comprehensive_data,
time_var = "survival_months",
event_var = "death_event",
treatment_var = "treatment",
grouping_var = "tumor_stage",
interaction_test = TRUE,
show_overall = TRUE
)
Custom Display Options
Customize the appearance and range of the forest plot:
# Custom forest plot settings
result_custom <- groupedforest(
data = groupedforest_comprehensive_data,
time_var = "survival_months",
event_var = "death_event",
treatment_var = "treatment",
grouping_var = "biomarker_status",
plot_title = "Biomarker-Stratified Treatment Analysis",
plot_theme = "publication",
hr_range = "wide",
sort_by_hr = TRUE,
confidence_level = 0.95
)
Multiple Subgroup Analysis
Molecular Subtype Analysis
Analyze treatment effects across molecular subtypes:
# Molecular subtype forest plot
result_molecular <- groupedforest(
data = groupedforest_multi_subgroups,
time_var = "time_to_event",
event_var = "event_occurred",
treatment_var = "intervention",
grouping_var = "molecular_subtype",
covariates = "patient_age",
show_statistics = TRUE,
show_counts = TRUE
)
Risk Category Stratification
Compare treatment effects across risk categories:
# Risk category analysis
result_risk <- groupedforest(
data = groupedforest_multi_subgroups,
time_var = "time_to_event",
event_var = "event_occurred",
treatment_var = "intervention",
grouping_var = "risk_category",
interaction_test = TRUE,
plot_theme = "minimal"
)
Precision Medicine Applications
Genomic Variant Analysis
Demonstrate precision medicine applications with genomic data:
# Load precision medicine dataset
data("groupedforest_precision_medicine")
# Examine genomic variants
table(groupedforest_precision_medicine$genomic_variant)
# Genomic variant forest plot
result_genomic <- groupedforest(
data = groupedforest_precision_medicine,
time_var = "progression_free_months",
event_var = "progression_event",
treatment_var = "therapy_type",
grouping_var = "genomic_variant",
reference_treatment = "Chemotherapy",
plot_title = "Precision Therapy by Genomic Variant"
)
Expression Level Stratification
Analyze treatment effects by biomarker expression levels:
# Expression level analysis
result_expression <- groupedforest(
data = groupedforest_precision_medicine,
time_var = "progression_free_months",
event_var = "progression_event",
treatment_var = "therapy_type",
grouping_var = "expression_level",
covariates = c("age_at_diagnosis", "tumor_size"),
show_overall = TRUE
)
Biomarker Stratification Studies
Biomarker Level Analysis
Work with biomarker stratification data:
# Load biomarker data
data("groupedforest_biomarker_data")
# Examine biomarker levels
table(groupedforest_biomarker_data$biomarker_level)
# Biomarker level forest plot
result_biomarker_level <- groupedforest(
data = groupedforest_biomarker_data,
time_var = "overall_survival_months",
event_var = "death_indicator",
treatment_var = "treatment_arm",
grouping_var = "biomarker_level",
plot_title = "Biomarker-Targeted Therapy by Expression Level",
hr_range = "custom",
custom_hr_min = 0.2,
custom_hr_max = 5.0
)
Pathway Status Analysis
Analyze treatment effects by pathway activation status:
# Pathway status analysis
result_pathway <- groupedforest(
data = groupedforest_biomarker_data,
time_var = "overall_survival_months",
event_var = "death_indicator",
treatment_var = "treatment_arm",
grouping_var = "pathway_status",
plot_theme = "publication",
show_counts = TRUE
)
Clinical Trial Applications
Multi-Factor Clinical Trial
Demonstrate complex clinical trial analysis:
# Load clinical trial data
data("groupedforest_interaction_data")
# Examine trial characteristics
table(groupedforest_interaction_data$genetic_profile)
table(groupedforest_interaction_data$disease_severity)
# Genetic profile analysis
result_clinical <- groupedforest(
data = groupedforest_interaction_data,
time_var = "event_free_survival",
event_var = "event_status",
treatment_var = "randomized_treatment",
grouping_var = "genetic_profile",
interaction_test = TRUE,
export_data = TRUE,
plot_title = "Clinical Trial: Treatment by Genetic Profile"
)
Disease Severity Stratification
Analyze treatment effects across disease severity levels:
# Disease severity analysis
result_severity <- groupedforest(
data = groupedforest_interaction_data,
time_var = "event_free_survival",
event_var = "event_status",
treatment_var = "randomized_treatment",
grouping_var = "disease_severity",
covariates = c("age_years", "baseline_severity_score"),
confidence_level = 0.90
)
Customization and Visualization Options
Plot Themes
Demonstrate different plot themes:
# Load simple dataset for theme demonstration
data("groupedforest_simple_data")
# Clinical theme (default)
result_theme_clinical <- groupedforest(
data = groupedforest_simple_data,
time_var = "time",
event_var = "event",
treatment_var = "treatment",
grouping_var = "subgroup",
plot_theme = "clinical",
plot_title = "Clinical Theme"
)
# Publication theme
result_theme_pub <- groupedforest(
data = groupedforest_simple_data,
time_var = "time",
event_var = "event",
treatment_var = "treatment",
grouping_var = "subgroup",
plot_theme = "publication",
plot_title = "Publication Theme"
)
Hazard Ratio Ranges
Demonstrate different HR range options:
# Wide range
result_wide <- groupedforest(
data = groupedforest_simple_data,
time_var = "time",
event_var = "event",
treatment_var = "treatment",
grouping_var = "subgroup",
hr_range = "wide",
plot_title = "Wide HR Range (0.1-10)"
)
# Narrow range
result_narrow <- groupedforest(
data = groupedforest_simple_data,
time_var = "time",
event_var = "event",
treatment_var = "treatment",
grouping_var = "subgroup",
hr_range = "narrow",
plot_title = "Narrow HR Range (0.5-2)"
)
Statistical Considerations
Confidence Intervals
Demonstrate different confidence levels:
# Different confidence levels
conf_levels <- c(0.80, 0.90, 0.95, 0.99)
for (conf_level in conf_levels) {
result <- groupedforest(
data = groupedforest_simple_data,
time_var = "time",
event_var = "event",
treatment_var = "treatment",
grouping_var = "subgroup",
confidence_level = conf_level,
plot_title = paste0(conf_level * 100, "% Confidence Intervals")
)
}
Interpretation Guidelines
Reading Forest Plots
Advanced Statistical Methods
Interaction Testing
Understanding treatment-by-subgroup interactions:
# Comprehensive interaction analysis
result_full_interaction <- groupedforest(
data = groupedforest_comprehensive_data,
time_var = "survival_months",
event_var = "death_event",
treatment_var = "treatment",
grouping_var = "biomarker_status",
interaction_test = TRUE,
covariates = c("age", "performance_status"),
show_overall = TRUE,
show_statistics = TRUE
)
Multiple Comparisons
Considerations for multiple subgroup testing:
# Multiple subgroup analysis
result_multiple <- groupedforest(
data = groupedforest_multi_subgroups,
time_var = "time_to_event",
event_var = "event_occurred",
treatment_var = "intervention",
grouping_var = "molecular_subtype", # 4 subgroups = multiple comparisons
show_statistics = TRUE,
plot_title = "Multiple Subgroup Analysis (Consider Adjustment)"
)
Troubleshooting and Common Issues
Data Requirements
Minimum Sample Sizes:
- Per subgroup: At least 10-15 events recommended
- Treatment arms: Balanced representation preferred
- Event rates: Sufficient events for Cox regression
Data Quality:
# Check data quality
check_data_quality <- function(data, time_var, event_var, treatment_var, grouping_var) {
cat("Data Quality Checks:\n")
cat("Total sample size:", nrow(data), "\n")
cat("Complete cases:", sum(complete.cases(data[, c(time_var, event_var, treatment_var, grouping_var)])), "\n")
cat("Event rate:", round(mean(data[[event_var]], na.rm = TRUE), 3), "\n")
# Subgroup sizes
cat("\nSubgroup sizes:\n")
print(table(data[[grouping_var]], data[[treatment_var]]))
# Event rates by subgroup
cat("\nEvent rates by subgroup:\n")
event_rates <- data %>%
group_by(.data[[grouping_var]], .data[[treatment_var]]) %>%
summarise(event_rate = round(mean(.data[[event_var]], na.rm = TRUE), 3), .groups = 'drop')
print(event_rates)
}
check_data_quality(
groupedforest_comprehensive_data,
"survival_months",
"death_event",
"treatment",
"biomarker_status"
)
Summary
The Grouped Hazard Forest Plot module provides a comprehensive solution for subgroup analysis in survival studies. Key benefits include:
Clinical Value:
- Precision Medicine: Identify patients most likely to benefit
- Treatment Personalization: Guide individual treatment decisions
- Research Planning: Inform future study designs
Statistical Rigor:
- Formal Testing: Cox regression for each subgroup
- Interaction Analysis: Test for differential treatment effects
- Visualization: Clear presentation of complex results
Practical Applications:
- Biomarker Studies: Evaluate predictive markers
- Clinical Trials: Subgroup efficacy analysis
- Drug Development: Support regulatory submissions
This module addresses the critical need for systematic subgroup analysis in clinical research, providing both statistical rigor and practical clinical insights for precision medicine applications.