Skip to contents

Introduction to Grouped Forest Plots

The Grouped Hazard Forest Plot module provides a powerful visualization method for comparing treatment effects across different patient subgroups. This analysis addresses a critical need in precision medicine and clinical research by performing separate Cox proportional hazards regression analyses for each subgroup and presenting the results in a unified forest plot.

This module was developed to address GitHub Issue #88: Create grouped forest plots showing treatment vs control for each variant.

Key Features

  • Subgroup-Specific Analysis: Separate Cox regression for each group
  • Visual Comparison: Side-by-side hazard ratios in forest plot format
  • Statistical Testing: Individual p-values and interaction tests
  • Clinical Applications: Perfect for biomarker studies, precision medicine, and clinical trials
  • Flexible Grouping: Support for any categorical grouping variable

Clinical Applications

Precision Medicine

  • Treatment efficacy by genetic variants
  • Biomarker-stratified therapy selection
  • Molecular subtype analysis

Clinical Trials

  • Subgroup efficacy analysis
  • Patient population identification
  • Treatment interaction detection

Biomarker Studies

  • Predictive biomarker evaluation
  • Treatment-biomarker interactions
  • Patient stratification strategies

Getting Started

Required Data Structure

Your dataset should contain:

  1. Time Variable: Continuous variable for follow-up duration
  2. Event Variable: Binary indicator (0=censored, 1=event)
  3. Treatment Variable: Factor variable for treatment comparison
  4. Grouping Variable: Categorical variable defining subgroups

Example Dataset

Let’s examine the structure of our test data:

library(ClinicoPath)
library(dplyr)
library(ggplot2)

# Load example data
data("groupedforest_comprehensive_data")

# Examine data structure
str(groupedforest_comprehensive_data)
# Preview the data
head(groupedforest_comprehensive_data) %>%
  knitr::kable(caption = "Sample of Comprehensive Grouped Forest Data")

Basic Grouped Forest Analysis

Simple Two-Group Comparison

Let’s start with a basic analysis comparing treatment effects between biomarker-positive and biomarker-negative patients:

# Basic grouped forest plot by biomarker status
result_biomarker <- groupedforest(
  data = groupedforest_comprehensive_data,
  time_var = "survival_months",
  event_var = "death_event",
  treatment_var = "treatment",
  grouping_var = "biomarker_status",
  show_statistics = TRUE,
  show_overall = TRUE
)

# The result object contains the analysis
class(result_biomarker)

Tumor Stage Stratification

Compare treatment effects between early and advanced tumor stages:

# Grouped analysis by tumor stage
result_stage <- groupedforest(
  data = groupedforest_comprehensive_data,
  time_var = "survival_months",
  event_var = "death_event",
  treatment_var = "treatment",
  grouping_var = "tumor_stage",
  confidence_level = 0.95,
  sort_by_hr = TRUE,
  plot_title = "Treatment Effects by Tumor Stage"
)

Gender-Based Subgroup Analysis

Evaluate treatment effects across gender groups:

# Analysis by gender
result_gender <- groupedforest(
  data = groupedforest_comprehensive_data,
  time_var = "survival_months",
  event_var = "death_event",
  treatment_var = "treatment",
  grouping_var = "gender",
  show_counts = TRUE,
  plot_theme = "clinical"
)

Advanced Analysis Features

Covariate Adjustment

Include covariates to adjust for confounding factors:

# Analysis with covariate adjustment
result_adjusted <- groupedforest(
  data = groupedforest_comprehensive_data,
  time_var = "survival_months",
  event_var = "death_event",
  treatment_var = "treatment",
  grouping_var = "biomarker_status",
  covariates = c("age", "performance_status"),
  reference_treatment = "Control",
  show_statistics = TRUE
)

Interaction Testing

Test for treatment-by-subgroup interactions:

# Test for interaction effects
result_interaction <- groupedforest(
  data = groupedforest_comprehensive_data,
  time_var = "survival_months",
  event_var = "death_event",
  treatment_var = "treatment",
  grouping_var = "tumor_stage",
  interaction_test = TRUE,
  show_overall = TRUE
)

Custom Display Options

Customize the appearance and range of the forest plot:

# Custom forest plot settings
result_custom <- groupedforest(
  data = groupedforest_comprehensive_data,
  time_var = "survival_months",
  event_var = "death_event",
  treatment_var = "treatment",
  grouping_var = "biomarker_status",
  plot_title = "Biomarker-Stratified Treatment Analysis",
  plot_theme = "publication",
  hr_range = "wide",
  sort_by_hr = TRUE,
  confidence_level = 0.95
)

Multiple Subgroup Analysis

Multi-Subgroup Dataset

Let’s work with a dataset containing multiple molecular subtypes:

# Load multi-subgroup data
data("groupedforest_multi_subgroups")

# Examine the subgroups
table(groupedforest_multi_subgroups$molecular_subtype)

Molecular Subtype Analysis

Analyze treatment effects across molecular subtypes:

# Molecular subtype forest plot
result_molecular <- groupedforest(
  data = groupedforest_multi_subgroups,
  time_var = "time_to_event",
  event_var = "event_occurred",
  treatment_var = "intervention",
  grouping_var = "molecular_subtype",
  covariates = "patient_age",
  show_statistics = TRUE,
  show_counts = TRUE
)

Risk Category Stratification

Compare treatment effects across risk categories:

# Risk category analysis
result_risk <- groupedforest(
  data = groupedforest_multi_subgroups,
  time_var = "time_to_event",
  event_var = "event_occurred",
  treatment_var = "intervention",
  grouping_var = "risk_category",
  interaction_test = TRUE,
  plot_theme = "minimal"
)

Precision Medicine Applications

Genomic Variant Analysis

Demonstrate precision medicine applications with genomic data:

# Load precision medicine dataset
data("groupedforest_precision_medicine")

# Examine genomic variants
table(groupedforest_precision_medicine$genomic_variant)
# Genomic variant forest plot
result_genomic <- groupedforest(
  data = groupedforest_precision_medicine,
  time_var = "progression_free_months",
  event_var = "progression_event",
  treatment_var = "therapy_type",
  grouping_var = "genomic_variant",
  reference_treatment = "Chemotherapy",
  plot_title = "Precision Therapy by Genomic Variant"
)

Expression Level Stratification

Analyze treatment effects by biomarker expression levels:

# Expression level analysis
result_expression <- groupedforest(
  data = groupedforest_precision_medicine,
  time_var = "progression_free_months",
  event_var = "progression_event",
  treatment_var = "therapy_type",
  grouping_var = "expression_level",
  covariates = c("age_at_diagnosis", "tumor_size"),
  show_overall = TRUE
)

Biomarker Stratification Studies

Biomarker Level Analysis

Work with biomarker stratification data:

# Load biomarker data
data("groupedforest_biomarker_data")

# Examine biomarker levels
table(groupedforest_biomarker_data$biomarker_level)
# Biomarker level forest plot
result_biomarker_level <- groupedforest(
  data = groupedforest_biomarker_data,
  time_var = "overall_survival_months",
  event_var = "death_indicator",
  treatment_var = "treatment_arm",
  grouping_var = "biomarker_level",
  plot_title = "Biomarker-Targeted Therapy by Expression Level",
  hr_range = "custom",
  custom_hr_min = 0.2,
  custom_hr_max = 5.0
)

Pathway Status Analysis

Analyze treatment effects by pathway activation status:

# Pathway status analysis
result_pathway <- groupedforest(
  data = groupedforest_biomarker_data,
  time_var = "overall_survival_months",
  event_var = "death_indicator",
  treatment_var = "treatment_arm",
  grouping_var = "pathway_status",
  plot_theme = "publication",
  show_counts = TRUE
)

Clinical Trial Applications

Multi-Factor Clinical Trial

Demonstrate complex clinical trial analysis:

# Load clinical trial data
data("groupedforest_interaction_data")

# Examine trial characteristics
table(groupedforest_interaction_data$genetic_profile)
table(groupedforest_interaction_data$disease_severity)
# Genetic profile analysis
result_clinical <- groupedforest(
  data = groupedforest_interaction_data,
  time_var = "event_free_survival",
  event_var = "event_status",
  treatment_var = "randomized_treatment",
  grouping_var = "genetic_profile",
  interaction_test = TRUE,
  export_data = TRUE,
  plot_title = "Clinical Trial: Treatment by Genetic Profile"
)

Disease Severity Stratification

Analyze treatment effects across disease severity levels:

# Disease severity analysis
result_severity <- groupedforest(
  data = groupedforest_interaction_data,
  time_var = "event_free_survival",
  event_var = "event_status",
  treatment_var = "randomized_treatment",
  grouping_var = "disease_severity",
  covariates = c("age_years", "baseline_severity_score"),
  confidence_level = 0.90
)

Customization and Visualization Options

Plot Themes

Demonstrate different plot themes:

# Load simple dataset for theme demonstration
data("groupedforest_simple_data")

# Clinical theme (default)
result_theme_clinical <- groupedforest(
  data = groupedforest_simple_data,
  time_var = "time",
  event_var = "event",
  treatment_var = "treatment",
  grouping_var = "subgroup",
  plot_theme = "clinical",
  plot_title = "Clinical Theme"
)

# Publication theme
result_theme_pub <- groupedforest(
  data = groupedforest_simple_data,
  time_var = "time",
  event_var = "event",
  treatment_var = "treatment",
  grouping_var = "subgroup",
  plot_theme = "publication",
  plot_title = "Publication Theme"
)

Hazard Ratio Ranges

Demonstrate different HR range options:

# Wide range
result_wide <- groupedforest(
  data = groupedforest_simple_data,
  time_var = "time",
  event_var = "event",
  treatment_var = "treatment",
  grouping_var = "subgroup",
  hr_range = "wide",
  plot_title = "Wide HR Range (0.1-10)"
)

# Narrow range
result_narrow <- groupedforest(
  data = groupedforest_simple_data,
  time_var = "time",
  event_var = "event",
  treatment_var = "treatment",
  grouping_var = "subgroup",
  hr_range = "narrow",
  plot_title = "Narrow HR Range (0.5-2)"
)

Statistical Considerations

Sample Size and Power

Understanding the impact of subgroup sample sizes:

# Examine sample sizes by subgroup
groupedforest_comprehensive_data %>%
  group_by(biomarker_status, treatment) %>%
  summarise(
    n = n(),
    events = sum(death_event),
    event_rate = round(mean(death_event), 3),
    .groups = 'drop'
  ) %>%
  knitr::kable(caption = "Sample Sizes and Event Rates by Subgroup")

Confidence Intervals

Demonstrate different confidence levels:

# Different confidence levels
conf_levels <- c(0.80, 0.90, 0.95, 0.99)

for (conf_level in conf_levels) {
  result <- groupedforest(
    data = groupedforest_simple_data,
    time_var = "time",
    event_var = "event", 
    treatment_var = "treatment",
    grouping_var = "subgroup",
    confidence_level = conf_level,
    plot_title = paste0(conf_level * 100, "% Confidence Intervals")
  )
}

Interpretation Guidelines

Reading Forest Plots

Key Elements:

  • Vertical line at HR = 1: No treatment effect
  • Points to the left (HR < 1): Treatment reduces hazard (beneficial)
  • Points to the right (HR > 1): Treatment increases hazard (harmful)
  • Horizontal lines: Confidence intervals
  • Wider intervals: Less precise estimates (smaller sample sizes)

Statistical Significance:

  • Confidence interval excludes 1: Statistically significant effect
  • Confidence interval includes 1: Non-significant effect
  • P-values: Individual subgroup significance tests

Clinical Interpretation

Treatment Heterogeneity:

  • Consistent effects: Similar HRs across subgroups
  • Treatment interactions: Varying effects between subgroups
  • Precision medicine opportunities: Subgroups with differential benefit

Clinical Decision Making:

  • Patient selection: Identify subgroups most likely to benefit
  • Treatment recommendation: Consider subgroup-specific effects
  • Future research: Hypothesis generation for subsequent studies

Advanced Statistical Methods

Interaction Testing

Understanding treatment-by-subgroup interactions:

# Comprehensive interaction analysis
result_full_interaction <- groupedforest(
  data = groupedforest_comprehensive_data,
  time_var = "survival_months",
  event_var = "death_event",
  treatment_var = "treatment",
  grouping_var = "biomarker_status",
  interaction_test = TRUE,
  covariates = c("age", "performance_status"),
  show_overall = TRUE,
  show_statistics = TRUE
)

Multiple Comparisons

Considerations for multiple subgroup testing:

# Multiple subgroup analysis
result_multiple <- groupedforest(
  data = groupedforest_multi_subgroups,
  time_var = "time_to_event",
  event_var = "event_occurred",
  treatment_var = "intervention",
  grouping_var = "molecular_subtype",  # 4 subgroups = multiple comparisons
  show_statistics = TRUE,
  plot_title = "Multiple Subgroup Analysis (Consider Adjustment)"
)

Clinical Research Applications

Regulatory Submissions

Guidelines for regulatory use:

Best Practices:

  1. Pre-specified analyses: Define subgroups before data analysis
  2. Clinical rationale: Justify subgroup selections
  3. Statistical plan: Document analysis methodology
  4. Interaction testing: Formally test for treatment interactions
  5. Multiple comparisons: Consider adjustment methods

Publication Guidelines

Reporting Standards:

  • CONSORT guidelines: Subgroup analysis reporting
  • Transparency: Report all analyzed subgroups
  • Effect sizes: Include confidence intervals
  • Clinical significance: Interpret beyond statistical significance

Troubleshooting and Common Issues

Data Requirements

Minimum Sample Sizes:

  • Per subgroup: At least 10-15 events recommended
  • Treatment arms: Balanced representation preferred
  • Event rates: Sufficient events for Cox regression

Data Quality:

# Check data quality
check_data_quality <- function(data, time_var, event_var, treatment_var, grouping_var) {
  cat("Data Quality Checks:\n")
  cat("Total sample size:", nrow(data), "\n")
  cat("Complete cases:", sum(complete.cases(data[, c(time_var, event_var, treatment_var, grouping_var)])), "\n")
  cat("Event rate:", round(mean(data[[event_var]], na.rm = TRUE), 3), "\n")
  
  # Subgroup sizes
  cat("\nSubgroup sizes:\n")
  print(table(data[[grouping_var]], data[[treatment_var]]))
  
  # Event rates by subgroup
  cat("\nEvent rates by subgroup:\n")
  event_rates <- data %>%
    group_by(.data[[grouping_var]], .data[[treatment_var]]) %>%
    summarise(event_rate = round(mean(.data[[event_var]], na.rm = TRUE), 3), .groups = 'drop')
  print(event_rates)
}

check_data_quality(
  groupedforest_comprehensive_data,
  "survival_months",
  "death_event", 
  "treatment",
  "biomarker_status"
)

Common Error Messages

Solutions:

  1. “No valid Cox regression results”: Check subgroup sample sizes
  2. “Insufficient events”: Verify event rates per subgroup
  3. “Model convergence failed”: Consider removing small subgroups
  4. “Missing data”: Address missing values in key variables

Summary

The Grouped Hazard Forest Plot module provides a comprehensive solution for subgroup analysis in survival studies. Key benefits include:

Clinical Value:

  • Precision Medicine: Identify patients most likely to benefit
  • Treatment Personalization: Guide individual treatment decisions
  • Research Planning: Inform future study designs

Statistical Rigor:

  • Formal Testing: Cox regression for each subgroup
  • Interaction Analysis: Test for differential treatment effects
  • Visualization: Clear presentation of complex results

Practical Applications:

  • Biomarker Studies: Evaluate predictive markers
  • Clinical Trials: Subgroup efficacy analysis
  • Drug Development: Support regulatory submissions

This module addresses the critical need for systematic subgroup analysis in clinical research, providing both statistical rigor and practical clinical insights for precision medicine applications.


References and Further Reading

Statistical Methods:

  • Cox Proportional Hazards Models
  • Forest Plot Methodology
  • Subgroup Analysis Guidelines
  • Treatment-Biomarker Interactions

Clinical Applications:

  • Precision Medicine Frameworks
  • Biomarker Validation Studies
  • Clinical Trial Design
  • Regulatory Guidelines

Software Implementation:

  • survival R package
  • ggplot2 visualization
  • jamovi integration
  • ClinicoPath module ecosystem