Forest Plots from Regression Models with jforestmodel

library(ClinicoPath)
library(ggplot2)
library(dplyr)

Introduction

The jforestmodel function provides comprehensive forest plot visualization capabilities for regression models including linear regression, logistic regression, and Cox proportional hazards models. Unlike traditional forest plots that display pre-calculated effect sizes, this function fits regression models directly from raw data and creates publication-ready forest plots of the model coefficients.

Forest plots from regression models are essential tools in medical research for:

Visualizing the effect of multiple predictors simultaneously
Comparing the relative importance of different risk factors
Presenting odds ratios, hazard ratios, or linear coefficients with confidence intervals
Creating publication-ready figures for medical journals
Communicating statistical findings to clinical audiences

Key Features

The jforestmodel function supports:

Multiple Model Types: Linear regression (lm), logistic regression (glm), and Cox proportional hazards (coxph)
Automatic Model Fitting: Fits models directly from your data
Professional Visualization: Uses the forestmodel package for high-quality plots
Flexible Customization: Colors, fonts, sizes, and layout options
Clinical Interpretation: Automatic interpretation guidance
Export Options: Multiple formats for publication

Basic Linear Regression

Simple Linear Model

Let’s start with a basic linear regression using the histopathology dataset:

# Load histopathology data
data(histopathology, package = "ClinicoPath")

# Display the first few rows
head(histopathology[c("Age", "Grade", "TStage", "LVI", "PNI")])
#> # A tibble: 6 × 5
#>     Age Grade TStage LVI     PNI    
#>   <dbl> <dbl>  <dbl> <chr>   <chr>  
#> 1    27     2      4 Present Absent 
#> 2    36     2      4 Absent  Absent 
#> 3    65     1      4 Absent  Absent 
#> 4    51     3      4 Present Absent 
#> 5    58     2      1 Absent  Absent 
#> 6    53     2      4 Present Present

# Create a basic linear regression forest plot
linear_model <- jforestmodel(
  data = histopathology,
  dependent_var = "Age",
  predictor_vars = c("Grade", "TStage", "LVI", "PNI"),
  model_type = "lm",
  time_var = NULL,
  event_var = NULL,
  covariates = NULL,
  plot_title = "Linear Regression: Predictors of Age",
  x_axis_label = "Beta Coefficient (Years)",
  show_summary = TRUE,
  show_interpretation = TRUE
)

# The plot will be displayed in jamovi interface

Understanding Linear Regression Forest Plots

In linear regression forest plots:

Beta coefficients represent the change in the dependent variable for a one-unit increase in each predictor
Confidence intervals that don’t cross zero indicate statistically significant associations
Point estimates show the direction and magnitude of the effect
Reference line at zero represents no effect

Logistic Regression

Binary Outcome Models

Logistic regression is commonly used for binary outcomes. Let’s predict lymphovascular invasion (LVI):

# Logistic regression with odds ratios
logistic_model <- jforestmodel(
  data = histopathology,
  dependent_var = "LVI",
  predictor_vars = c("Age", "Grade", "TStage", "PNI"),
  model_type = "glm",
  time_var = NULL,
  event_var = NULL,
  covariates = NULL,
  family = "binomial",
  exponentiate = TRUE,
  plot_title = "Logistic Regression: Predictors of Lymphovascular Invasion",
  x_axis_label = "Odds Ratio (95% CI)",
  show_summary = TRUE,
  confidence_level = 0.95
)

Working with the Wisconsin Breast Cancer Dataset

Let’s use the classic Wisconsin Breast Cancer dataset for cancer diagnosis prediction:

# Load breast cancer data
data(BreastCancer, package = "ClinicoPath")

# Display dataset structure
str(BreastCancer[c("Class", "Cl.thickness", "Cell.size", "Cell.shape", "Marg.adhesion")])
#> tibble [699 × 5] (S3: tbl_df/tbl/data.frame)
#>  $ Class        : chr [1:699] "benign" "benign" "benign" "benign" ...
#>  $ Cl.thickness : num [1:699] 5 5 3 6 4 8 1 2 2 4 ...
#>  $ Cell.size    : num [1:699] 1 4 1 8 1 10 1 1 1 2 ...
#>  $ Cell.shape   : num [1:699] 1 4 1 8 1 10 1 2 1 1 ...
#>  $ Marg.adhesion: num [1:699] 1 5 1 1 3 8 1 1 1 1 ...

# Comprehensive breast cancer prediction model
breast_cancer_model <- jforestmodel(
  data = BreastCancer,
  dependent_var = "Class",
  predictor_vars = c("Cl.thickness", "Cell.size", "Cell.shape", 
                    "Marg.adhesion", "Epith.c.size", "Bare.nuclei"),
  model_type = "glm",
  time_var = NULL,
  event_var = NULL,
  covariates = NULL,
  family = "binomial",
  exponentiate = TRUE,
  plot_title = "Breast Cancer Diagnosis Prediction Model",
  x_axis_label = "Odds Ratio (95% CI)",
  color_scheme = "blue",
  factor_separate_line = TRUE,
  show_p_values = TRUE,
  show_confidence_intervals = TRUE
)

Different GLM Families

The function supports various GLM families:

# Poisson regression example
poisson_model <- jforestmodel(
  data = histopathology,
  dependent_var = "Age",  # Treating age as count for demonstration
  predictor_vars = c("Grade", "TStage", "LVI"),
  model_type = "glm",
  time_var = NULL,
  event_var = NULL,
  covariates = NULL,
  family = "poisson",
  exponentiate = TRUE,
  plot_title = "Poisson Regression Model",
  x_axis_label = "Rate Ratio"
)

# Gaussian GLM (equivalent to linear regression)
gaussian_model <- jforestmodel(
  data = histopathology,
  dependent_var = "Age",
  predictor_vars = c("Grade", "TStage", "LVI"),
  model_type = "glm",
  time_var = NULL,
  event_var = NULL,
  covariates = NULL,
  family = "gaussian",
  exponentiate = FALSE,
  plot_title = "Gaussian GLM Model",
  x_axis_label = "Coefficient"
)

Cox Proportional Hazards Models

Survival Analysis with Colon Cancer Data

Cox regression is used for time-to-event analysis. Let’s use the colon cancer dataset:

# Load colon cancer survival data
data(colon, package = "ClinicoPath")

# Display relevant variables
head(colon[c("time", "status", "age", "sex", "obstruct", "perfor", "adhere")])
#> # A tibble: 6 × 7
#>    time status   age   sex obstruct perfor adhere
#>   <dbl>  <dbl> <dbl> <dbl>    <dbl>  <dbl>  <dbl>
#> 1  1521      1    43     1        0      0      0
#> 2   968      1    43     1        0      0      0
#> 3  3087      0    63     1        0      0      0
#> 4  3087      0    63     1        0      0      0
#> 5   963      1    71     0        0      0      1
#> 6   542      1    71     0        0      0      1

# Cox proportional hazards model
cox_model <- jforestmodel(
  data = colon,
  dependent_var = "status",  # Required by interface but not used directly
  predictor_vars = c("age", "sex", "obstruct", "perfor", "adhere"),
  model_type = "coxph",
  time_var = "time",
  event_var = "status",
  covariates = NULL,
  exponentiate = TRUE,
  plot_title = "Cox Regression: Colon Cancer Survival",
  x_axis_label = "Hazard Ratio (95% CI)",
  confidence_level = 0.95,
  show_summary = TRUE
)

Melanoma Survival Analysis

Let’s also examine melanoma survival data:

# Load melanoma data
data(melanoma, package = "ClinicoPath")

# Display melanoma variables
head(melanoma[c("time", "status", "age", "sex", "thickness", "ulcer")])
#> # A tibble: 6 × 6
#>    time status   age   sex thickness ulcer
#>   <dbl>  <dbl> <dbl> <dbl>     <dbl> <dbl>
#> 1    10      3    76     1      6.76     1
#> 2    30      3    56     1      0.65     0
#> 3    35      2    41     1      1.34     0
#> 4    99      3    71     0      2.9      0
#> 5   185      1    52     1     12.1      1
#> 6   204      1    28     1      4.84     1

# Melanoma survival model
melanoma_model <- jforestmodel(
  data = melanoma,
  dependent_var = "status",
  predictor_vars = c("age", "sex", "thickness", "ulcer"),
  model_type = "coxph",
  time_var = "time",
  event_var = "status",
  covariates = NULL,
  exponentiate = TRUE,
  plot_title = "Melanoma Survival Analysis",
  x_axis_label = "Hazard Ratio (95% CI)",
  color_scheme = "red",
  show_interpretation = TRUE
)

Advanced Customization

Color Schemes and Visual Options

The function provides multiple color schemes and visual customization:

# Custom color scheme
custom_color_model <- jforestmodel(
  data = histopathology,
  dependent_var = "LVI",
  predictor_vars = c("Age", "Grade", "TStage"),
  model_type = "glm",
  time_var = NULL,
  event_var = NULL,
  covariates = NULL,
  family = "binomial",
  exponentiate = TRUE,
  color_scheme = "custom",
  custom_color = "#FF5722",
  point_size = 3.0,
  line_size = 1.2,
  plot_title = "Custom Styled Forest Plot"
)

# Professional medical color scheme
medical_style_model <- jforestmodel(
  data = histopathology,
  dependent_var = "LVI",
  predictor_vars = c("Age", "Grade", "TStage"),
  model_type = "glm",
  time_var = NULL,
  event_var = NULL,
  covariates = NULL,
  family = "binomial",
  exponentiate = TRUE,
  color_scheme = "green",
  plot_title = "Medical Style Forest Plot",
  show_reference_line = TRUE,
  reference_value = 1
)

Confidence Levels and Statistical Options

# 99% confidence intervals
high_confidence_model <- jforestmodel(
  data = histopathology,
  dependent_var = "LVI",
  predictor_vars = c("Age", "Grade", "TStage", "PNI"),
  model_type = "glm",
  time_var = NULL,
  event_var = NULL,
  covariates = NULL,
  family = "binomial",
  exponentiate = TRUE,
  confidence_level = 0.99,
  plot_title = "High Confidence Model (99% CI)"
)

# 90% confidence intervals
low_confidence_model <- jforestmodel(
  data = histopathology,
  dependent_var = "LVI",
  predictor_vars = c("Age", "Grade", "TStage", "PNI"),
  model_type = "glm",
  time_var = NULL,
  event_var = NULL,
  covariates = NULL,
  family = "binomial",
  exponentiate = TRUE,
  confidence_level = 0.90,
  plot_title = "Conservative Model (90% CI)"
)

Factor Variable Handling

# Factor variables on separate lines (default)
factor_separate_model <- jforestmodel(
  data = histopathology,
  dependent_var = "Death",
  predictor_vars = c("Age", "Grade", "TStage", "LVI"),
  model_type = "glm",
  time_var = NULL,
  event_var = NULL,
  covariates = NULL,
  family = "binomial",
  exponentiate = TRUE,
  factor_separate_line = TRUE,
  plot_title = "Factor Levels on Separate Lines"
)

# Factor variables on same line
factor_together_model <- jforestmodel(
  data = histopathology,
  dependent_var = "Death",
  predictor_vars = c("Age", "Grade", "TStage", "LVI"),
  model_type = "glm",
  time_var = NULL,
  event_var = NULL,
  covariates = NULL,
  family = "binomial",
  exponentiate = TRUE,
  factor_separate_line = FALSE,
  plot_title = "Factor Levels Together"
)

Variable Sorting and Selection

Sorting Options

The function provides several sorting options for variables:

# Sort by coefficient size
coefficient_sorted_model <- jforestmodel(
  data = histopathology,
  dependent_var = "LVI",
  predictor_vars = c("Age", "Grade", "TStage", "PNI"),
  model_type = "glm",
  time_var = NULL,
  event_var = NULL,
  covariates = NULL,
  family = "binomial",
  exponentiate = TRUE,
  sort_variables = "coefficient",
  plot_title = "Sorted by Coefficient Size"
)

# Sort by p-value
pvalue_sorted_model <- jforestmodel(
  data = histopathology,
  dependent_var = "LVI",
  predictor_vars = c("Age", "Grade", "TStage", "PNI"),
  model_type = "glm",
  time_var = NULL,
  event_var = NULL,
  covariates = NULL,
  family = "binomial",
  exponentiate = TRUE,
  sort_variables = "pvalue",
  plot_title = "Sorted by P-value"
)

# Alphabetical sorting
alphabetical_sorted_model <- jforestmodel(
  data = histopathology,
  dependent_var = "LVI",
  predictor_vars = c("Age", "Grade", "TStage", "PNI"),
  model_type = "glm",
  time_var = NULL,
  event_var = NULL,
  covariates = NULL,
  family = "binomial",
  exponentiate = TRUE,
  sort_variables = "alphabetical",
  plot_title = "Alphabetically Sorted"
)

Covariate Selection

You can specify which covariates to display in the plot:

# Display only specific covariates
selected_covariates_model <- jforestmodel(
  data = histopathology,
  dependent_var = "LVI",
  predictor_vars = c("Age", "Grade", "TStage", "PNI"),
  model_type = "glm",
  time_var = NULL,
  event_var = NULL,
  covariates = c("Age", "Grade"),  # Only show these in plot
  family = "binomial",
  exponentiate = TRUE,
  plot_title = "Selected Covariates Only"
)

Model Diagnostics and Interpretation

Model Summary Statistics

# Comprehensive model with full reporting
comprehensive_model <- jforestmodel(
  data = histopathology,
  dependent_var = "LVI",
  predictor_vars = c("Age", "Grade", "TStage", "PNI"),
  model_type = "glm",
  time_var = NULL,
  event_var = NULL,
  covariates = NULL,
  family = "binomial",
  exponentiate = TRUE,
  show_summary = TRUE,
  show_interpretation = TRUE,
  plot_title = "Comprehensive Model Analysis"
)

Linear Model Diagnostics

For linear models, additional diagnostic information is provided:

# Linear model with diagnostics
linear_diagnostics_model <- jforestmodel(
  data = histopathology,
  dependent_var = "Age",
  predictor_vars = c("Grade", "TStage", "LVI", "PNI"),
  model_type = "lm",
  time_var = NULL,
  event_var = NULL,
  covariates = NULL,
  show_summary = TRUE,
  plot_title = "Linear Model with Diagnostics"
)

Publication-Ready Examples

High-Quality Clinical Example

# Publication-ready breast cancer model
publication_model <- jforestmodel(
  data = BreastCancer,
  dependent_var = "Class",
  predictor_vars = c("Cl.thickness", "Cell.size", "Cell.shape", 
                    "Marg.adhesion", "Epith.c.size"),
  model_type = "glm",
  time_var = NULL,
  event_var = NULL,
  covariates = NULL,
  family = "binomial",
  exponentiate = TRUE,
  confidence_level = 0.95,
  plot_title = "Predictors of Breast Cancer Malignancy",
  x_axis_label = "Odds Ratio (95% Confidence Interval)",
  color_scheme = "blue",
  point_size = 2.5,
  line_size = 0.8,
  factor_separate_line = TRUE,
  show_p_values = TRUE,
  show_confidence_intervals = TRUE,
  show_reference_line = TRUE,
  reference_value = 1,
  show_summary = TRUE,
  show_interpretation = TRUE
)

Multi-Model Comparison

You can create multiple models for comparison:

# Model 1: Basic predictors
basic_model <- jforestmodel(
  data = histopathology,
  dependent_var = "Death",
  predictor_vars = c("Age", "Grade"),
  model_type = "glm",
  time_var = NULL,
  event_var = NULL,
  covariates = NULL,
  family = "binomial",
  exponentiate = TRUE,
  plot_title = "Basic Model: Age and Grade Only"
)

# Model 2: Extended predictors
extended_model <- jforestmodel(
  data = histopathology,
  dependent_var = "Death",
  predictor_vars = c("Age", "Grade", "TStage", "LVI", "PNI"),
  model_type = "glm",
  time_var = NULL,
  event_var = NULL,
  covariates = NULL,
  family = "binomial",
  exponentiate = TRUE,
  plot_title = "Extended Model: Multiple Predictors"
)

Layout and Panel Customization

Panel Width Ratios

# Custom panel width ratios
custom_layout_model <- jforestmodel(
  data = histopathology,
  dependent_var = "LVI",
  predictor_vars = c("Age", "Grade", "TStage"),
  model_type = "glm",
  time_var = NULL,
  event_var = NULL,
  covariates = NULL,
  family = "binomial",
  exponentiate = TRUE,
  panel_width_ratio = "2:3:1",
  plot_title = "Custom Panel Layout"
)

Reference Lines and Values

# Custom reference line for linear model
linear_reference_model <- jforestmodel(
  data = histopathology,
  dependent_var = "Age",
  predictor_vars = c("Grade", "TStage", "LVI"),
  model_type = "lm",
  time_var = NULL,
  event_var = NULL,
  covariates = NULL,
  show_reference_line = TRUE,
  reference_value = 0,
  plot_title = "Linear Model with Zero Reference"
)

# Custom reference line for logistic model
logistic_reference_model <- jforestmodel(
  data = histopathology,
  dependent_var = "LVI",
  predictor_vars = c("Age", "Grade", "TStage"),
  model_type = "glm",
  time_var = NULL,
  event_var = NULL,
  covariates = NULL,
  family = "binomial",
  exponentiate = TRUE,
  show_reference_line = TRUE,
  reference_value = 1,
  plot_title = "Logistic Model with OR=1 Reference"
)

Interpretation Guidelines

Understanding Different Effect Measures

Linear Regression Coefficients

Interpretation: Change in dependent variable per unit increase in predictor
Significance: Confidence intervals not including zero
Clinical relevance: Consider magnitude along with statistical significance

Odds Ratios (Logistic Regression)

OR = 1: No association
OR > 1: Increased odds of outcome
OR < 1: Decreased odds of outcome
95% CI: Must not include 1 for statistical significance

Hazard Ratios (Cox Regression)

HR = 1: No effect on hazard
HR > 1: Increased hazard (worse survival)
HR < 1: Decreased hazard (better survival)
Interpretation: Instantaneous risk at any given time

Model Assessment Guidelines

Model Fit Assessment

R-squared (linear): Proportion of variance explained
AIC (all models): Lower values indicate better fit
Residual analysis (linear/GLM): Check model assumptions
Proportional hazards assumption (Cox): Verify with diagnostics

Clinical Interpretation

Statistical vs Clinical Significance: Large effects may not be clinically meaningful
Confidence Intervals: Wide intervals suggest uncertainty
Multiple Comparisons: Consider adjustment for multiple testing
Model Limitations: Remember unmeasured confounders

Best Practices

Model Building

Variable Selection: Use clinical knowledge and statistical criteria
Sample Size: Ensure adequate power for the number of predictors
Missing Data: Handle appropriately before modeling
Assumption Checking: Verify model assumptions

Visualization

Appropriate Scale: Use exponentiation for ratios, raw coefficients for linear
Clear Labeling: Descriptive titles and axis labels
Consistent Formatting: Use standard conventions
Color Choices: Consider accessibility and journal requirements

Reporting

Model Details: Report model type, sample size, and selection criteria
Assumption Testing: Document assumption verification
Effect Sizes: Include both point estimates and confidence intervals
Clinical Context: Interpret findings in clinical context

Troubleshooting Common Issues

Data Preparation

# Ensure variables are properly formatted
data$outcome <- as.factor(data$outcome)  # For binary outcomes
data$time <- as.numeric(data$time)       # For survival time
data$event <- as.numeric(data$event)     # For event indicator

Model Convergence Issues

# For convergence problems, try:
# 1. Check for perfect separation in logistic regression
# 2. Scale continuous variables
# 3. Check for multicollinearity
# 4. Reduce number of predictors

Missing Data Handling

# Remove incomplete cases or use multiple imputation
complete_data <- data[complete.cases(data[c("outcome", "pred1", "pred2")]), ]

Advanced Features

Custom Model Objects

The function also supports custom model objects (future enhancement):

# Fit your own model first
my_model <- glm(LVI ~ Age + Grade + TStage, 
               data = histopathology, 
               family = binomial())

# Then create forest plot
custom_model_plot <- jforestmodel(
  model_object = my_model,  # Future feature
  model_type = "custom",
  exponentiate = TRUE,
  plot_title = "Custom Model Forest Plot"
)

Integration with Other Packages

The forest plots can be integrated with other R packages for enhanced analysis:

broom: For model tidying and extraction
performance: For model performance metrics
see: For additional visualization options
report: For automated interpretation

Conclusion

The jforestmodel function provides a comprehensive solution for creating professional forest plots directly from regression models. Key benefits include:

Direct Model Fitting: No need to pre-calculate effect sizes
Multiple Model Types: Support for linear, logistic, and Cox regression
Professional Quality: Publication-ready outputs
Extensive Customization: Colors, layouts, and formatting options
Clinical Integration: Appropriate for medical research applications

Forest plots from regression models are powerful tools for communicating statistical findings in medical research. The jforestmodel function makes it easy to create high-quality visualizations that meet publication standards and effectively communicate your research findings.

For advanced statistical modeling considerations, consult with a biostatistician and refer to specialized statistical texts and guidelines from your research domain.

References

Lewis S, Clarke M. Forest plots: trying to see the wood and the trees. BMJ. 2001;322(7300):1479-80.
Wickham H. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.
Therneau T. A Package for Survival Analysis in R. 2023.
Harrell FE. Regression Modeling Strategies. Springer, 2015.
Steyerberg EW. Clinical Prediction Models. Springer, 2019.

Professional Forest Plot Visualization for Linear, Logistic, and Cox Regression Models

ClinicoPath

2025-07-13