Forest Plots from Regression Models with jforestmodel
Professional Forest Plot Visualization for Linear, Logistic, and Cox Regression Models
ClinicoPath
2025-07-29
Source:vignettes/19-jforestmodel-comprehensive.Rmd
19-jforestmodel-comprehensive.RmdIntroduction
The jforestmodel function provides comprehensive forest
plot visualization capabilities for regression models including linear
regression, logistic regression, and Cox proportional hazards models.
Unlike traditional forest plots that display pre-calculated effect
sizes, this function fits regression models directly from raw data and
creates publication-ready forest plots of the model coefficients.
Forest plots from regression models are essential tools in medical research for:
- Visualizing the effect of multiple predictors simultaneously
- Comparing the relative importance of different risk factors
- Presenting odds ratios, hazard ratios, or linear coefficients with confidence intervals
- Creating publication-ready figures for medical journals
- Communicating statistical findings to clinical audiences
Key Features
The jforestmodel function supports:
- Multiple Model Types: Linear regression (lm), logistic regression (glm), and Cox proportional hazards (coxph)
- Automatic Model Fitting: Fits models directly from your data
- Professional Visualization: Uses the forestmodel package for high-quality plots
- Flexible Customization: Colors, fonts, sizes, and layout options
- Clinical Interpretation: Automatic interpretation guidance
- Export Options: Multiple formats for publication
Basic Linear Regression
Simple Linear Model
Let’s start with a basic linear regression using the histopathology dataset:
# Load histopathology data
data(histopathology, package = "ClinicoPath")
# Display the first few rows
head(histopathology[c("Age", "Grade", "TStage", "LVI", "PNI")])
# Create a basic linear regression forest plot
linear_model <- jforestmodel(
data = histopathology,
dependent_var = "Age",
predictor_vars = c("Grade", "TStage", "LVI", "PNI"),
model_type = "lm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
plot_title = "Linear Regression: Predictors of Age",
x_axis_label = "Beta Coefficient (Years)",
show_summary = TRUE,
show_interpretation = TRUE
)
# The plot will be displayed in jamovi interfaceUnderstanding Linear Regression Forest Plots
In linear regression forest plots:
- Beta coefficients represent the change in the dependent variable for a one-unit increase in each predictor
- Confidence intervals that don’t cross zero indicate statistically significant associations
- Point estimates show the direction and magnitude of the effect
- Reference line at zero represents no effect
Logistic Regression
Binary Outcome Models
Logistic regression is commonly used for binary outcomes. Let’s predict lymphovascular invasion (LVI):
# Logistic regression with odds ratios
logistic_model <- jforestmodel(
data = histopathology,
dependent_var = "LVI",
predictor_vars = c("Age", "Grade", "TStage", "PNI"),
model_type = "glm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
family = "binomial",
exponentiate = TRUE,
plot_title = "Logistic Regression: Predictors of Lymphovascular Invasion",
x_axis_label = "Odds Ratio (95% CI)",
show_summary = TRUE,
confidence_level = 0.95
)Working with the Wisconsin Breast Cancer Dataset
Let’s use the classic Wisconsin Breast Cancer dataset for cancer diagnosis prediction:
# Load breast cancer data
data(BreastCancer, package = "ClinicoPath")
# Display dataset structure
str(BreastCancer[c("Class", "Cl.thickness", "Cell.size", "Cell.shape", "Marg.adhesion")])
# Comprehensive breast cancer prediction model
breast_cancer_model <- jforestmodel(
data = BreastCancer,
dependent_var = "Class",
predictor_vars = c("Cl.thickness", "Cell.size", "Cell.shape",
"Marg.adhesion", "Epith.c.size", "Bare.nuclei"),
model_type = "glm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
family = "binomial",
exponentiate = TRUE,
plot_title = "Breast Cancer Diagnosis Prediction Model",
x_axis_label = "Odds Ratio (95% CI)",
color_scheme = "blue",
factor_separate_line = TRUE,
show_p_values = TRUE,
show_confidence_intervals = TRUE
)Different GLM Families
The function supports various GLM families:
# Poisson regression example
poisson_model <- jforestmodel(
data = histopathology,
dependent_var = "Age", # Treating age as count for demonstration
predictor_vars = c("Grade", "TStage", "LVI"),
model_type = "glm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
family = "poisson",
exponentiate = TRUE,
plot_title = "Poisson Regression Model",
x_axis_label = "Rate Ratio"
)
# Gaussian GLM (equivalent to linear regression)
gaussian_model <- jforestmodel(
data = histopathology,
dependent_var = "Age",
predictor_vars = c("Grade", "TStage", "LVI"),
model_type = "glm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
family = "gaussian",
exponentiate = FALSE,
plot_title = "Gaussian GLM Model",
x_axis_label = "Coefficient"
)Cox Proportional Hazards Models
Survival Analysis with Colon Cancer Data
Cox regression is used for time-to-event analysis. Let’s use the colon cancer dataset:
# Load colon cancer survival data
data(colon, package = "ClinicoPath")
# Display relevant variables
head(colon[c("time", "status", "age", "sex", "obstruct", "perfor", "adhere")])
# Cox proportional hazards model
cox_model <- jforestmodel(
data = colon,
dependent_var = "status", # Required by interface but not used directly
predictor_vars = c("age", "sex", "obstruct", "perfor", "adhere"),
model_type = "coxph",
time_var = "time",
event_var = "status",
covariates = NULL,
exponentiate = TRUE,
plot_title = "Cox Regression: Colon Cancer Survival",
x_axis_label = "Hazard Ratio (95% CI)",
confidence_level = 0.95,
show_summary = TRUE
)Melanoma Survival Analysis
Let’s also examine melanoma survival data:
# Load melanoma data
data(melanoma, package = "ClinicoPath")
# Display melanoma variables
head(melanoma[c("time", "status", "age", "sex", "thickness", "ulcer")])
# Melanoma survival model
melanoma_model <- jforestmodel(
data = melanoma,
dependent_var = "status",
predictor_vars = c("age", "sex", "thickness", "ulcer"),
model_type = "coxph",
time_var = "time",
event_var = "status",
covariates = NULL,
exponentiate = TRUE,
plot_title = "Melanoma Survival Analysis",
x_axis_label = "Hazard Ratio (95% CI)",
color_scheme = "red",
show_interpretation = TRUE
)Advanced Customization
Color Schemes and Visual Options
The function provides multiple color schemes and visual customization:
# Custom color scheme
custom_color_model <- jforestmodel(
data = histopathology,
dependent_var = "LVI",
predictor_vars = c("Age", "Grade", "TStage"),
model_type = "glm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
family = "binomial",
exponentiate = TRUE,
color_scheme = "custom",
custom_color = "#FF5722",
point_size = 3.0,
line_size = 1.2,
plot_title = "Custom Styled Forest Plot"
)
# Professional medical color scheme
medical_style_model <- jforestmodel(
data = histopathology,
dependent_var = "LVI",
predictor_vars = c("Age", "Grade", "TStage"),
model_type = "glm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
family = "binomial",
exponentiate = TRUE,
color_scheme = "green",
plot_title = "Medical Style Forest Plot",
show_reference_line = TRUE,
reference_value = 1
)Confidence Levels and Statistical Options
# 99% confidence intervals
high_confidence_model <- jforestmodel(
data = histopathology,
dependent_var = "LVI",
predictor_vars = c("Age", "Grade", "TStage", "PNI"),
model_type = "glm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
family = "binomial",
exponentiate = TRUE,
confidence_level = 0.99,
plot_title = "High Confidence Model (99% CI)"
)
# 90% confidence intervals
low_confidence_model <- jforestmodel(
data = histopathology,
dependent_var = "LVI",
predictor_vars = c("Age", "Grade", "TStage", "PNI"),
model_type = "glm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
family = "binomial",
exponentiate = TRUE,
confidence_level = 0.90,
plot_title = "Conservative Model (90% CI)"
)Factor Variable Handling
# Factor variables on separate lines (default)
factor_separate_model <- jforestmodel(
data = histopathology,
dependent_var = "Death",
predictor_vars = c("Age", "Grade", "TStage", "LVI"),
model_type = "glm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
family = "binomial",
exponentiate = TRUE,
factor_separate_line = TRUE,
plot_title = "Factor Levels on Separate Lines"
)
# Factor variables on same line
factor_together_model <- jforestmodel(
data = histopathology,
dependent_var = "Death",
predictor_vars = c("Age", "Grade", "TStage", "LVI"),
model_type = "glm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
family = "binomial",
exponentiate = TRUE,
factor_separate_line = FALSE,
plot_title = "Factor Levels Together"
)Variable Sorting and Selection
Sorting Options
The function provides several sorting options for variables:
# Sort by coefficient size
coefficient_sorted_model <- jforestmodel(
data = histopathology,
dependent_var = "LVI",
predictor_vars = c("Age", "Grade", "TStage", "PNI"),
model_type = "glm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
family = "binomial",
exponentiate = TRUE,
sort_variables = "coefficient",
plot_title = "Sorted by Coefficient Size"
)
# Sort by p-value
pvalue_sorted_model <- jforestmodel(
data = histopathology,
dependent_var = "LVI",
predictor_vars = c("Age", "Grade", "TStage", "PNI"),
model_type = "glm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
family = "binomial",
exponentiate = TRUE,
sort_variables = "pvalue",
plot_title = "Sorted by P-value"
)
# Alphabetical sorting
alphabetical_sorted_model <- jforestmodel(
data = histopathology,
dependent_var = "LVI",
predictor_vars = c("Age", "Grade", "TStage", "PNI"),
model_type = "glm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
family = "binomial",
exponentiate = TRUE,
sort_variables = "alphabetical",
plot_title = "Alphabetically Sorted"
)Covariate Selection
You can specify which covariates to display in the plot:
# Display only specific covariates
selected_covariates_model <- jforestmodel(
data = histopathology,
dependent_var = "LVI",
predictor_vars = c("Age", "Grade", "TStage", "PNI"),
model_type = "glm",
time_var = NULL,
event_var = NULL,
covariates = c("Age", "Grade"), # Only show these in plot
family = "binomial",
exponentiate = TRUE,
plot_title = "Selected Covariates Only"
)Model Diagnostics and Interpretation
Model Summary Statistics
# Comprehensive model with full reporting
comprehensive_model <- jforestmodel(
data = histopathology,
dependent_var = "LVI",
predictor_vars = c("Age", "Grade", "TStage", "PNI"),
model_type = "glm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
family = "binomial",
exponentiate = TRUE,
show_summary = TRUE,
show_interpretation = TRUE,
plot_title = "Comprehensive Model Analysis"
)Linear Model Diagnostics
For linear models, additional diagnostic information is provided:
# Linear model with diagnostics
linear_diagnostics_model <- jforestmodel(
data = histopathology,
dependent_var = "Age",
predictor_vars = c("Grade", "TStage", "LVI", "PNI"),
model_type = "lm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
show_summary = TRUE,
plot_title = "Linear Model with Diagnostics"
)Publication-Ready Examples
High-Quality Clinical Example
# Publication-ready breast cancer model
publication_model <- jforestmodel(
data = BreastCancer,
dependent_var = "Class",
predictor_vars = c("Cl.thickness", "Cell.size", "Cell.shape",
"Marg.adhesion", "Epith.c.size"),
model_type = "glm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
family = "binomial",
exponentiate = TRUE,
confidence_level = 0.95,
plot_title = "Predictors of Breast Cancer Malignancy",
x_axis_label = "Odds Ratio (95% Confidence Interval)",
color_scheme = "blue",
point_size = 2.5,
line_size = 0.8,
factor_separate_line = TRUE,
show_p_values = TRUE,
show_confidence_intervals = TRUE,
show_reference_line = TRUE,
reference_value = 1,
show_summary = TRUE,
show_interpretation = TRUE
)Multi-Model Comparison
You can create multiple models for comparison:
# Model 1: Basic predictors
basic_model <- jforestmodel(
data = histopathology,
dependent_var = "Death",
predictor_vars = c("Age", "Grade"),
model_type = "glm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
family = "binomial",
exponentiate = TRUE,
plot_title = "Basic Model: Age and Grade Only"
)
# Model 2: Extended predictors
extended_model <- jforestmodel(
data = histopathology,
dependent_var = "Death",
predictor_vars = c("Age", "Grade", "TStage", "LVI", "PNI"),
model_type = "glm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
family = "binomial",
exponentiate = TRUE,
plot_title = "Extended Model: Multiple Predictors"
)Layout and Panel Customization
Panel Width Ratios
# Custom panel width ratios
custom_layout_model <- jforestmodel(
data = histopathology,
dependent_var = "LVI",
predictor_vars = c("Age", "Grade", "TStage"),
model_type = "glm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
family = "binomial",
exponentiate = TRUE,
panel_width_ratio = "2:3:1",
plot_title = "Custom Panel Layout"
)Reference Lines and Values
# Custom reference line for linear model
linear_reference_model <- jforestmodel(
data = histopathology,
dependent_var = "Age",
predictor_vars = c("Grade", "TStage", "LVI"),
model_type = "lm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
show_reference_line = TRUE,
reference_value = 0,
plot_title = "Linear Model with Zero Reference"
)
# Custom reference line for logistic model
logistic_reference_model <- jforestmodel(
data = histopathology,
dependent_var = "LVI",
predictor_vars = c("Age", "Grade", "TStage"),
model_type = "glm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
family = "binomial",
exponentiate = TRUE,
show_reference_line = TRUE,
reference_value = 1,
plot_title = "Logistic Model with OR=1 Reference"
)Interpretation Guidelines
Understanding Different Effect Measures
Linear Regression Coefficients
- Interpretation: Change in dependent variable per unit increase in predictor
- Significance: Confidence intervals not including zero
- Clinical relevance: Consider magnitude along with statistical significance
Best Practices
Model Building
- Variable Selection: Use clinical knowledge and statistical criteria
- Sample Size: Ensure adequate power for the number of predictors
- Missing Data: Handle appropriately before modeling
- Assumption Checking: Verify model assumptions
Troubleshooting Common Issues
Data Preparation
# Ensure variables are properly formatted
data$outcome <- as.factor(data$outcome) # For binary outcomes
data$time <- as.numeric(data$time) # For survival time
data$event <- as.numeric(data$event) # For event indicatorModel Convergence Issues
# For convergence problems, try:
# 1. Check for perfect separation in logistic regression
# 2. Scale continuous variables
# 3. Check for multicollinearity
# 4. Reduce number of predictorsMissing Data Handling
# Remove incomplete cases or use multiple imputation
complete_data <- data[complete.cases(data[c("outcome", "pred1", "pred2")]), ]Advanced Features
Custom Model Objects
The function also supports custom model objects (future enhancement):
# Fit your own model first
my_model <- glm(LVI ~ Age + Grade + TStage,
data = histopathology,
family = binomial())
# Then create forest plot
custom_model_plot <- jforestmodel(
model_object = my_model, # Future feature
model_type = "custom",
exponentiate = TRUE,
plot_title = "Custom Model Forest Plot"
)Conclusion
The jforestmodel function provides a comprehensive
solution for creating professional forest plots directly from regression
models. Key benefits include:
- Direct Model Fitting: No need to pre-calculate effect sizes
- Multiple Model Types: Support for linear, logistic, and Cox regression
- Professional Quality: Publication-ready outputs
- Extensive Customization: Colors, layouts, and formatting options
- Clinical Integration: Appropriate for medical research applications
Forest plots from regression models are powerful tools for
communicating statistical findings in medical research. The
jforestmodel function makes it easy to create high-quality
visualizations that meet publication standards and effectively
communicate your research findings.
For advanced statistical modeling considerations, consult with a biostatistician and refer to specialized statistical texts and guidelines from your research domain.
References
- Lewis S, Clarke M. Forest plots: trying to see the wood and the trees. BMJ. 2001;322(7300):1479-80.
- Wickham H. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.
- Therneau T. A Package for Survival Analysis in R. 2023.
- Harrell FE. Regression Modeling Strategies. Springer, 2015.
- Steyerberg EW. Clinical Prediction Models. Springer, 2019.