Grouped Hazard Forest Plots for Subgroup Analysis

Introduction to Grouped Forest Plots

The Grouped Hazard Forest Plot module provides a powerful visualization method for comparing treatment effects across different patient subgroups. This analysis addresses a critical need in precision medicine and clinical research by performing separate Cox proportional hazards regression analyses for each subgroup and presenting the results in a unified forest plot.

This module was developed to address GitHub Issue #88: Create grouped forest plots showing treatment vs control for each variant.

Key Features

Subgroup-Specific Analysis: Separate Cox regression for each group
Visual Comparison: Side-by-side hazard ratios in forest plot format
Statistical Testing: Individual p-values and interaction tests
Clinical Applications: Well suited to biomarker studies, precision medicine, and clinical trials
Flexible Grouping: Support for any categorical grouping variable
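
To make the idea of "a separate Cox regression for each group" concrete, the following is a minimal sketch written directly against the survival package. It is not the module's internal code, which this vignette does not show. It uses the `veteran` dataset that ships with survival as a stand-in, with `celltype` playing the role of the grouping variable; the helper name `fit_subgroup` is hypothetical.

```r
library(survival)

# Veterans' lung cancer trial shipped with survival (trt: 1 = standard, 2 = test)
vet <- survival::veteran
vet$trt <- factor(vet$trt, levels = c(1, 2), labels = c("standard", "test"))

# Hypothetical helper: fit one Cox model within a subgroup and return
# the hazard ratio for test vs standard with its 95% confidence interval
fit_subgroup <- function(d) {
  fit <- coxph(Surv(time, status) ~ trt, data = d)
  ci  <- exp(confint(fit))
  data.frame(
    n      = nrow(d),
    events = sum(d$status),
    hr     = unname(exp(coef(fit))),
    lower  = ci[1, 1],
    upper  = ci[1, 2],
    p      = summary(fit)$coefficients[1, "Pr(>|z|)"]
  )
}

# One row per subgroup: the table a forest plot is drawn from
hr_table <- do.call(rbind, lapply(split(vet, vet$celltype), fit_subgroup))
hr_table$subgroup <- rownames(hr_table)
print(hr_table)
```

Each row of `hr_table` corresponds to one line of a forest plot: a point estimate plus an interval, alongside the number of patients and events that support it.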

Clinical Applications

Precision Medicine

Treatment efficacy by genetic variants
Biomarker-stratified therapy selection
Molecular subtype analysis

Clinical Trials

Subgroup efficacy analysis
Patient population identification
Treatment interaction detection

Biomarker Studies

Predictive biomarker evaluation
Treatment-biomarker interactions
Patient stratification strategies

Getting Started

Required Data Structure

Your dataset should contain:

Time Variable: Continuous variable for follow-up duration
Event Variable: Binary indicator (0=censored, 1=event)
Treatment Variable: Factor variable for treatment comparison
Grouping Variable: Categorical variable defining subgroups
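
As an illustration of this layout, the sketch below simulates a small data frame with the four required columns. The column names (`follow_up_months`, `event`, `arm`, `subgroup`) and all values are made up for the example; they are not taken from the package datasets.

```r
set.seed(42)  # reproducible illustration only
n <- 200

# Hypothetical column names; substitute the names used in your own data
example_data <- data.frame(
  follow_up_months = rexp(n, rate = 1 / 24),           # time: positive, continuous
  event            = rbinom(n, size = 1, prob = 0.6),  # 0 = censored, 1 = event
  arm              = factor(
    sample(c("Control", "Treatment"), n, replace = TRUE),
    levels = c("Control", "Treatment")                 # first level acts as reference
  ),
  subgroup         = factor(sample(c("Positive", "Negative"), n, replace = TRUE))
)

str(example_data)

# Cross-tabulate subgroup by arm to confirm every cell is populated
table(example_data$subgroup, example_data$arm)
```

Checking the subgroup-by-arm table before fitting is worthwhile, because a subgroup with patients in only one arm cannot yield a treatment hazard ratio.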

Example Dataset

Let’s examine the structure of our test data:

library(ClinicoPath)
library(dplyr)
library(ggplot2)

# Load example data
data("groupedforest_comprehensive_data")

# Examine data structure
str(groupedforest_comprehensive_data)

# Preview the data
head(groupedforest_comprehensive_data) %>%
  knitr::kable(caption = "Sample of Comprehensive Grouped Forest Data")

Basic Grouped Forest Analysis

Simple Two-Group Comparison

Let’s start with a basic analysis comparing treatment effects between biomarker-positive and biomarker-negative patients:

# Basic grouped forest plot by biomarker status
result_biomarker <- groupedforest(
  data = groupedforest_comprehensive_data,
  time_var = "survival_months",
  event_var = "death_event",
  treatment_var = "treatment",
  grouping_var = "biomarker_status",
  show_statistics = TRUE,
  show_overall = TRUE
)

# The result object contains the analysis
class(result_biomarker)

Tumor Stage Stratification

Compare treatment effects between early and advanced tumor stages:

# Grouped analysis by tumor stage
result_stage <- groupedforest(
  data = groupedforest_comprehensive_data,
  time_var = "survival_months",
  event_var = "death_event",
  treatment_var = "treatment",
  grouping_var = "tumor_stage",
  confidence_level = 0.95,
  sort_by_hr = TRUE,
  plot_title = "Treatment Effects by Tumor Stage"
)

Gender-Based Subgroup Analysis

Evaluate treatment effects across gender groups:

# Analysis by gender
result_gender <- groupedforest(
  data = groupedforest_comprehensive_data,
  time_var = "survival_months",
  event_var = "death_event",
  treatment_var = "treatment",
  grouping_var = "gender",
  show_counts = TRUE,
  plot_theme = "clinical"
)

Advanced Analysis Features

Covariate Adjustment

Include covariates to adjust for confounding factors:

# Analysis with covariate adjustment
result_adjusted <- groupedforest(
  data = groupedforest_comprehensive_data,
  time_var = "survival_months",
  event_var = "death_event",
  treatment_var = "treatment",
  grouping_var = "biomarker_status",
  covariates = c("age", "performance_status"),
  reference_treatment = "Control",
  show_statistics = TRUE
)

Interaction Testing

Test for treatment-by-subgroup interactions:

# Test for interaction effects
result_interaction <- groupedforest(
  data = groupedforest_comprehensive_data,
  time_var = "survival_months",
  event_var = "death_event",
  treatment_var = "treatment",
  grouping_var = "tumor_stage",
  interaction_test = TRUE,
  show_overall = TRUE
)

Custom Display Options

Customize the appearance and range of the forest plot:

# Custom forest plot settings
result_custom <- groupedforest(
  data = groupedforest_comprehensive_data,
  time_var = "survival_months",
  event_var = "death_event",
  treatment_var = "treatment",
  grouping_var = "biomarker_status",
  plot_title = "Biomarker-Stratified Treatment Analysis",
  plot_theme = "publication",
  hr_range = "wide",
  sort_by_hr = TRUE,
  confidence_level = 0.95
)

Multiple Subgroup Analysis

Multi-Subgroup Dataset

Let’s work with a dataset containing multiple molecular subtypes:

# Load multi-subgroup data
data("groupedforest_multi_subgroups")

# Examine the subgroups
table(groupedforest_multi_subgroups$molecular_subtype)

Molecular Subtype Analysis

Analyze treatment effects across molecular subtypes:

# Molecular subtype forest plot
result_molecular <- groupedforest(
  data = groupedforest_multi_subgroups,
  time_var = "time_to_event",
  event_var = "event_occurred",
  treatment_var = "intervention",
  grouping_var = "molecular_subtype",
  covariates = "patient_age",
  show_statistics = TRUE,
  show_counts = TRUE
)

Risk Category Stratification

Compare treatment effects across risk categories:

# Risk category analysis
result_risk <- groupedforest(
  data = groupedforest_multi_subgroups,
  time_var = "time_to_event",
  event_var = "event_occurred",
  treatment_var = "intervention",
  grouping_var = "risk_category",
  interaction_test = TRUE,
  plot_theme = "minimal"
)

Precision Medicine Applications

Genomic Variant Analysis

Demonstrate precision medicine applications with genomic data:

# Load precision medicine dataset
data("groupedforest_precision_medicine")

# Examine genomic variants
table(groupedforest_precision_medicine$genomic_variant)

# Genomic variant forest plot
result_genomic <- groupedforest(
  data = groupedforest_precision_medicine,
  time_var = "progression_free_months",
  event_var = "progression_event",
  treatment_var = "therapy_type",
  grouping_var = "genomic_variant",
  reference_treatment = "Chemotherapy",
  plot_title = "Precision Therapy by Genomic Variant"
)

Expression Level Stratification

Analyze treatment effects by biomarker expression levels:

# Expression level analysis
result_expression <- groupedforest(
  data = groupedforest_precision_medicine,
  time_var = "progression_free_months",
  event_var = "progression_event",
  treatment_var = "therapy_type",
  grouping_var = "expression_level",
  covariates = c("age_at_diagnosis", "tumor_size"),
  show_overall = TRUE
)

Biomarker Stratification Studies

Biomarker Level Analysis

Work with biomarker stratification data:

# Load biomarker data
data("groupedforest_biomarker_data")

# Examine biomarker levels
table(groupedforest_biomarker_data$biomarker_level)

# Biomarker level forest plot
result_biomarker_level <- groupedforest(
  data = groupedforest_biomarker_data,
  time_var = "overall_survival_months",
  event_var = "death_indicator",
  treatment_var = "treatment_arm",
  grouping_var = "biomarker_level",
  plot_title = "Biomarker-Targeted Therapy by Expression Level",
  hr_range = "custom",
  custom_hr_min = 0.2,
  custom_hr_max = 5.0
)

Pathway Status Analysis

Analyze treatment effects by pathway activation status:

# Pathway status analysis
result_pathway <- groupedforest(
  data = groupedforest_biomarker_data,
  time_var = "overall_survival_months",
  event_var = "death_indicator",
  treatment_var = "treatment_arm",
  grouping_var = "pathway_status",
  plot_theme = "publication",
  show_counts = TRUE
)

Clinical Trial Applications

Multi-Factor Clinical Trial

Demonstrate complex clinical trial analysis:

# Load clinical trial data
data("groupedforest_interaction_data")

# Examine trial characteristics
table(groupedforest_interaction_data$genetic_profile)
table(groupedforest_interaction_data$disease_severity)

# Genetic profile analysis
result_clinical <- groupedforest(
  data = groupedforest_interaction_data,
  time_var = "event_free_survival",
  event_var = "event_status",
  treatment_var = "randomized_treatment",
  grouping_var = "genetic_profile",
  interaction_test = TRUE,
  export_data = TRUE,
  plot_title = "Clinical Trial: Treatment by Genetic Profile"
)

Disease Severity Stratification

Analyze treatment effects across disease severity levels:

# Disease severity analysis
result_severity <- groupedforest(
  data = groupedforest_interaction_data,
  time_var = "event_free_survival",
  event_var = "event_status",
  treatment_var = "randomized_treatment",
  grouping_var = "disease_severity",
  covariates = c("age_years", "baseline_severity_score"),
  confidence_level = 0.90
)

Customization and Visualization Options

Plot Themes

Demonstrate different plot themes:

# Load simple dataset for theme demonstration
data("groupedforest_simple_data")

# Clinical theme (default)
result_theme_clinical <- groupedforest(
  data = groupedforest_simple_data,
  time_var = "time",
  event_var = "event",
  treatment_var = "treatment",
  grouping_var = "subgroup",
  plot_theme = "clinical",
  plot_title = "Clinical Theme"
)

# Publication theme
result_theme_pub <- groupedforest(
  data = groupedforest_simple_data,
  time_var = "time",
  event_var = "event",
  treatment_var = "treatment",
  grouping_var = "subgroup",
  plot_theme = "publication",
  plot_title = "Publication Theme"
)

Hazard Ratio Ranges

Demonstrate different HR range options:

# Wide range
result_wide <- groupedforest(
  data = groupedforest_simple_data,
  time_var = "time",
  event_var = "event",
  treatment_var = "treatment",
  grouping_var = "subgroup",
  hr_range = "wide",
  plot_title = "Wide HR Range (0.1-10)"
)

# Narrow range
result_narrow <- groupedforest(
  data = groupedforest_simple_data,
  time_var = "time",
  event_var = "event",
  treatment_var = "treatment",
  grouping_var = "subgroup",
  hr_range = "narrow",
  plot_title = "Narrow HR Range (0.5-2)"
)

Statistical Considerations

Sample Size and Power

Understanding the impact of subgroup sample sizes:

# Examine sample sizes by subgroup
groupedforest_comprehensive_data %>%
  group_by(biomarker_status, treatment) %>%
  summarise(
    n = n(),
    events = sum(death_event),
    event_rate = round(mean(death_event), 3),
    .groups = 'drop'
  ) %>%
  knitr::kable(caption = "Sample Sizes and Event Rates by Subgroup")

Confidence Intervals

Demonstrate different confidence levels:

# Different confidence levels
conf_levels <- c(0.80, 0.90, 0.95, 0.99)

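# Keep each fit in a named list; a plain for loop would overwrite the
# previous result and would not auto-print anything
results_by_level <- lapply(conf_levels, function(conf_level) {
  groupedforest(
    data = groupedforest_simple_data,
    time_var = "time",
    event_var = "event",
    treatment_var = "treatment",
    grouping_var = "subgroup",
    confidence_level = conf_level,
    plot_title = paste0(conf_level * 100, "% Confidence Intervals")
  )
})
names(results_by_level) <- paste0(conf_levels * 100, "%")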
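
The effect of the confidence level on interval width can be worked out by hand. A Wald interval for a hazard ratio is exp(log HR ± z × SE), where z is the two-sided normal critical value for the chosen level. The sketch below uses illustrative inputs (a log hazard ratio of log(0.70) and a standard error of 0.15), not estimates from any dataset in this vignette, and it assumes Wald intervals; the module may compute its intervals differently.

```r
# Illustrative inputs (not results from any dataset in this vignette)
log_hr <- log(0.70)  # estimated log hazard ratio
se     <- 0.15       # standard error of the log hazard ratio

conf_levels <- c(0.80, 0.90, 0.95, 0.99)
z <- qnorm(1 - (1 - conf_levels) / 2)  # two-sided normal critical values

# Wald interval on the log scale, back-transformed to the HR scale
ci_table <- data.frame(
  level = conf_levels,
  hr    = exp(log_hr),
  lower = exp(log_hr - z * se),
  upper = exp(log_hr + z * se)
)
print(round(ci_table, 3))
```

The point estimate is identical in every row; only the interval widens as the level increases, which is why a result can exclude 1 at the 90% level yet include it at 99%.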

Interpretation Guidelines

Reading Forest Plots

Key Elements:

Vertical line at HR = 1: No treatment effect
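Points to the left (HR < 1): Treatment reduces the hazard relative to the reference arm (beneficial when the event is adverse)
Points to the right (HR > 1): Treatment increases the hazard relative to the reference arm (harmful when the event is adverse)
Horizontal lines: Confidence intervals
Wider intervals: Less precise estimates (typically fewer events or smaller subgroups)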

Statistical Significance:

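Confidence interval excludes 1: Statistically significant at the matching level (for example, a 95% interval corresponds to p < 0.05)
Confidence interval includes 1: Not statistically significant at that level
P-values: Test the treatment effect within each subgroup; they do not test whether effects differ between subgroups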
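
The elements listed above can be reproduced in a few lines of ggplot2. This is a minimal sketch of a generic forest plot, not the plot the module draws, and the hazard ratios in `hr_table` are hypothetical values chosen only to show one subgroup on each side of the reference line.

```r
library(ggplot2)

# Hypothetical hazard ratios for illustration only
hr_table <- data.frame(
  subgroup = c("Subgroup A", "Subgroup B", "Subgroup C"),
  hr       = c(0.60, 0.95, 1.40),
  lower    = c(0.40, 0.60, 0.85),
  upper    = c(0.90, 1.50, 2.30)
)

forest_plot <- ggplot(hr_table, aes(x = hr, y = subgroup)) +
  geom_vline(xintercept = 1, linetype = "dashed") +   # line of no effect
  geom_pointrange(aes(xmin = lower, xmax = upper)) +  # estimate with its interval
  scale_x_log10() +                                   # ratios are symmetric on a log scale
  labs(x = "Hazard ratio (log scale)", y = NULL)

print(forest_plot)
```

In this sketch only Subgroup A has an interval that lies entirely to the left of 1; the intervals for Subgroups B and C cross the dashed line, so neither shows a statistically significant effect.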

Clinical Interpretation

Treatment Heterogeneity:

Consistent effects: Similar HRs across subgroups
Treatment interactions: Varying effects between subgroups
Precision medicine opportunities: Subgroups with differential benefit

Clinical Decision Making:

Patient selection: Identify subgroups most likely to benefit
Treatment recommendation: Consider subgroup-specific effects
Future research: Hypothesis generation for subsequent studies

Advanced Statistical Methods

Interaction Testing

Understanding treatment-by-subgroup interactions:

# Comprehensive interaction analysis
result_full_interaction <- groupedforest(
  data = groupedforest_comprehensive_data,
  time_var = "survival_months",
  event_var = "death_event",
  treatment_var = "treatment",
  grouping_var = "biomarker_status",
  interaction_test = TRUE,
  covariates = c("age", "performance_status"),
  show_overall = TRUE,
  show_statistics = TRUE
)
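
One common way to test a treatment-by-subgroup interaction is a likelihood ratio test that compares a Cox model with main effects only against one that adds the interaction term. The sketch below shows that approach on the `veteran` dataset from the survival package. Whether `interaction_test = TRUE` uses this exact test is an assumption; the vignette does not say.

```r
library(survival)

# Veterans' lung cancer trial shipped with survival (trt: 1 = standard, 2 = test)
vet <- survival::veteran
vet$trt <- factor(vet$trt, levels = c(1, 2), labels = c("standard", "test"))

# Nested models: main effects only vs. main effects plus interaction
fit_main <- coxph(Surv(time, status) ~ trt + celltype, data = vet)
fit_int  <- coxph(Surv(time, status) ~ trt * celltype, data = vet)

# Likelihood ratio statistic from the final partial log-likelihoods
lr_stat <- 2 * (fit_int$loglik[2] - fit_main$loglik[2])
df      <- length(coef(fit_int)) - length(coef(fit_main))
p_int   <- pchisq(lr_stat, df = df, lower.tail = FALSE)

cat("LR statistic:", round(lr_stat, 2), "on", df, "df; p =", signif(p_int, 3), "\n")
```

A single interaction p-value answers the question the forest plot raises visually, namely whether the subgroup hazard ratios differ by more than chance would explain. Comparing the individual subgroup p-values does not answer it.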

Multiple Comparisons

Considerations for multiple subgroup testing:

# Multiple subgroup analysis
result_multiple <- groupedforest(
  data = groupedforest_multi_subgroups,
  time_var = "time_to_event",
  event_var = "event_occurred",
  treatment_var = "intervention",
  grouping_var = "molecular_subtype",  # 4 subgroups = multiple comparisons
  show_statistics = TRUE,
  plot_title = "Multiple Subgroup Analysis (Consider Adjustment)"
)
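
If adjustment is wanted, base R's `p.adjust()` can be applied to the subgroup p-values after the analysis. The sketch below uses four hypothetical p-values, one per subgroup, to compare the Holm and Benjamini-Hochberg methods. It assumes the p-values are extracted from the results by hand, since this vignette does not show a built-in adjustment option.

```r
# Hypothetical subgroup p-values for illustration only
p_raw <- c(subtype_1 = 0.012, subtype_2 = 0.20, subtype_3 = 0.04, subtype_4 = 0.65)

adjusted <- data.frame(
  raw  = p_raw,
  holm = p.adjust(p_raw, method = "holm"),  # controls the family-wise error rate
  bh   = p.adjust(p_raw, method = "BH")     # controls the false discovery rate
)
print(round(adjusted, 3))
```

With these inputs the third subgroup is nominally significant (p = 0.04) but is no longer below 0.05 after either adjustment, which illustrates why unadjusted subgroup findings are best treated as hypothesis generating.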

Clinical Research Applications

Regulatory Submissions

Guidelines for regulatory use:

Best Practices:

Pre-specified analyses: Define subgroups before data analysis
Clinical rationale: Justify subgroup selections
Statistical plan: Document analysis methodology
Interaction testing: Formally test for treatment interactions
Multiple comparisons: Consider adjustment methods

Publication Guidelines

Reporting Standards:

CONSORT guidelines: Subgroup analysis reporting
Transparency: Report all analyzed subgroups
Effect sizes: Include confidence intervals
Clinical significance: Interpret beyond statistical significance

Troubleshooting and Common Issues

Data Requirements

Minimum Sample Sizes:

Per subgroup: At least 10-15 events recommended
Treatment arms: Balanced representation preferred
Event rates: Sufficient events for Cox regression

Data Quality:

# Check data quality
check_data_quality <- function(data, time_var, event_var, treatment_var, grouping_var) {
  cat("Data Quality Checks:\n")
  cat("Total sample size:", nrow(data), "\n")
  cat("Complete cases:", sum(complete.cases(data[, c(time_var, event_var, treatment_var, grouping_var)])), "\n")
  cat("Event rate:", round(mean(data[[event_var]], na.rm = TRUE), 3), "\n")
  
  # Subgroup sizes
  cat("\nSubgroup sizes:\n")
  print(table(data[[grouping_var]], data[[treatment_var]]))
  
  # Event rates by subgroup
  cat("\nEvent rates by subgroup:\n")
  event_rates <- data %>%
    group_by(.data[[grouping_var]], .data[[treatment_var]]) %>%
    summarise(event_rate = round(mean(.data[[event_var]], na.rm = TRUE), 3), .groups = 'drop')
  print(event_rates)
}

check_data_quality(
  groupedforest_comprehensive_data,
  "survival_months",
  "death_event", 
  "treatment",
  "biomarker_status"
)

Common Error Messages

Solutions:

“No valid Cox regression results”: Check subgroup sample sizes
“Insufficient events”: Verify event rates per subgroup
“Model convergence failed”: Consider removing small subgroups
“Missing data”: Address missing values in key variables

Summary

The Grouped Hazard Forest Plot module provides a comprehensive solution for subgroup analysis in survival studies. Key benefits include:

Clinical Value:

Precision Medicine: Identify patients most likely to benefit
Treatment Personalization: Guide individual treatment decisions
Research Planning: Inform future study designs

Statistical Rigor:

Formal Testing: Cox regression for each subgroup
Interaction Analysis: Test for differential treatment effects
Visualization: Clear presentation of complex results

Practical Applications:

Biomarker Studies: Evaluate predictive markers
Clinical Trials: Subgroup efficacy analysis
Drug Development: Support regulatory submissions

This module addresses the critical need for systematic subgroup analysis in clinical research, providing both statistical rigor and practical clinical insights for precision medicine applications.

References and Further Reading

Statistical Methods:

Cox Proportional Hazards Models
Forest Plot Methodology
Subgroup Analysis Guidelines
Treatment-Biomarker Interactions

Clinical Applications:

Precision Medicine Frameworks
Biomarker Validation Studies
Clinical Trial Design
Regulatory Guidelines

Software Implementation:

survival R package
ggplot2 visualization
jamovi integration
ClinicoPath module ecosystem

ClinicoPath

2025-07-13
