Forest Plots from Regression Models with jforestmodel
Professional Forest Plot Visualization for Linear, Logistic, and Cox Regression Models
ClinicoPath
2025-07-13
Source:vignettes/jjstatsplot-19-jforestmodel-comprehensive.Rmd
jjstatsplot-19-jforestmodel-comprehensive.Rmd
Introduction
The jforestmodel
function provides comprehensive forest
plot visualization capabilities for regression models including linear
regression, logistic regression, and Cox proportional hazards models.
Unlike traditional forest plots that display pre-calculated effect
sizes, this function fits regression models directly from raw data and
creates publication-ready forest plots of the model coefficients.
Forest plots from regression models are essential tools in medical research for:
- Visualizing the effect of multiple predictors simultaneously
- Comparing the relative importance of different risk factors
- Presenting odds ratios, hazard ratios, or linear coefficients with confidence intervals
- Creating publication-ready figures for medical journals
- Communicating statistical findings to clinical audiences
Key Features
The jforestmodel
function supports:
- Multiple Model Types: Linear regression (lm), logistic regression (glm), and Cox proportional hazards (coxph)
- Automatic Model Fitting: Fits models directly from your data
- Professional Visualization: Uses the forestmodel package for high-quality plots
- Flexible Customization: Colors, fonts, sizes, and layout options
- Clinical Interpretation: Automatic interpretation guidance
- Export Options: Multiple formats for publication
Basic Linear Regression
Simple Linear Model
Let’s start with a basic linear regression using the histopathology dataset:
# Load histopathology data
data(histopathology, package = "ClinicoPath")
# Display the first few rows
head(histopathology[c("Age", "Grade", "TStage", "LVI", "PNI")])
#> # A tibble: 6 × 5
#> Age Grade TStage LVI PNI
#> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 27 2 4 Present Absent
#> 2 36 2 4 Absent Absent
#> 3 65 1 4 Absent Absent
#> 4 51 3 4 Present Absent
#> 5 58 2 1 Absent Absent
#> 6 53 2 4 Present Present
# Create a basic linear regression forest plot
linear_model <- jforestmodel(
data = histopathology,
dependent_var = "Age",
predictor_vars = c("Grade", "TStage", "LVI", "PNI"),
model_type = "lm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
plot_title = "Linear Regression: Predictors of Age",
x_axis_label = "Beta Coefficient (Years)",
show_summary = TRUE,
show_interpretation = TRUE
)
# The plot will be displayed in jamovi interface
Understanding Linear Regression Forest Plots
In linear regression forest plots:
- Beta coefficients represent the change in the dependent variable for a one-unit increase in each predictor
- Confidence intervals that don’t cross zero indicate statistically significant associations
- Point estimates show the direction and magnitude of the effect
- Reference line at zero represents no effect
Logistic Regression
Binary Outcome Models
Logistic regression is commonly used for binary outcomes. Let’s predict lymphovascular invasion (LVI):
# Logistic regression with odds ratios
logistic_model <- jforestmodel(
data = histopathology,
dependent_var = "LVI",
predictor_vars = c("Age", "Grade", "TStage", "PNI"),
model_type = "glm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
family = "binomial",
exponentiate = TRUE,
plot_title = "Logistic Regression: Predictors of Lymphovascular Invasion",
x_axis_label = "Odds Ratio (95% CI)",
show_summary = TRUE,
confidence_level = 0.95
)
Working with the Wisconsin Breast Cancer Dataset
Let’s use the classic Wisconsin Breast Cancer dataset for cancer diagnosis prediction:
# Load breast cancer data
data(BreastCancer, package = "ClinicoPath")
# Display dataset structure
str(BreastCancer[c("Class", "Cl.thickness", "Cell.size", "Cell.shape", "Marg.adhesion")])
#> tibble [699 × 5] (S3: tbl_df/tbl/data.frame)
#> $ Class : chr [1:699] "benign" "benign" "benign" "benign" ...
#> $ Cl.thickness : num [1:699] 5 5 3 6 4 8 1 2 2 4 ...
#> $ Cell.size : num [1:699] 1 4 1 8 1 10 1 1 1 2 ...
#> $ Cell.shape : num [1:699] 1 4 1 8 1 10 1 2 1 1 ...
#> $ Marg.adhesion: num [1:699] 1 5 1 1 3 8 1 1 1 1 ...
# Comprehensive breast cancer prediction model
breast_cancer_model <- jforestmodel(
data = BreastCancer,
dependent_var = "Class",
predictor_vars = c("Cl.thickness", "Cell.size", "Cell.shape",
"Marg.adhesion", "Epith.c.size", "Bare.nuclei"),
model_type = "glm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
family = "binomial",
exponentiate = TRUE,
plot_title = "Breast Cancer Diagnosis Prediction Model",
x_axis_label = "Odds Ratio (95% CI)",
color_scheme = "blue",
factor_separate_line = TRUE,
show_p_values = TRUE,
show_confidence_intervals = TRUE
)
Different GLM Families
The function supports various GLM families:
# Poisson regression example
poisson_model <- jforestmodel(
data = histopathology,
dependent_var = "Age", # Treating age as count for demonstration
predictor_vars = c("Grade", "TStage", "LVI"),
model_type = "glm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
family = "poisson",
exponentiate = TRUE,
plot_title = "Poisson Regression Model",
x_axis_label = "Rate Ratio"
)
# Gaussian GLM (equivalent to linear regression)
gaussian_model <- jforestmodel(
data = histopathology,
dependent_var = "Age",
predictor_vars = c("Grade", "TStage", "LVI"),
model_type = "glm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
family = "gaussian",
exponentiate = FALSE,
plot_title = "Gaussian GLM Model",
x_axis_label = "Coefficient"
)
Cox Proportional Hazards Models
Survival Analysis with Colon Cancer Data
Cox regression is used for time-to-event analysis. Let’s use the colon cancer dataset:
# Load colon cancer survival data
data(colon, package = "ClinicoPath")
# Display relevant variables
head(colon[c("time", "status", "age", "sex", "obstruct", "perfor", "adhere")])
#> # A tibble: 6 × 7
#> time status age sex obstruct perfor adhere
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1521 1 43 1 0 0 0
#> 2 968 1 43 1 0 0 0
#> 3 3087 0 63 1 0 0 0
#> 4 3087 0 63 1 0 0 0
#> 5 963 1 71 0 0 0 1
#> 6 542 1 71 0 0 0 1
# Cox proportional hazards model
cox_model <- jforestmodel(
data = colon,
dependent_var = "status", # Required by interface but not used directly
predictor_vars = c("age", "sex", "obstruct", "perfor", "adhere"),
model_type = "coxph",
time_var = "time",
event_var = "status",
covariates = NULL,
exponentiate = TRUE,
plot_title = "Cox Regression: Colon Cancer Survival",
x_axis_label = "Hazard Ratio (95% CI)",
confidence_level = 0.95,
show_summary = TRUE
)
Melanoma Survival Analysis
Let’s also examine melanoma survival data:
# Load melanoma data
data(melanoma, package = "ClinicoPath")
# Display melanoma variables
head(melanoma[c("time", "status", "age", "sex", "thickness", "ulcer")])
#> # A tibble: 6 × 6
#> time status age sex thickness ulcer
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 10 3 76 1 6.76 1
#> 2 30 3 56 1 0.65 0
#> 3 35 2 41 1 1.34 0
#> 4 99 3 71 0 2.9 0
#> 5 185 1 52 1 12.1 1
#> 6 204 1 28 1 4.84 1
# Melanoma survival model
melanoma_model <- jforestmodel(
data = melanoma,
dependent_var = "status",
predictor_vars = c("age", "sex", "thickness", "ulcer"),
model_type = "coxph",
time_var = "time",
event_var = "status",
covariates = NULL,
exponentiate = TRUE,
plot_title = "Melanoma Survival Analysis",
x_axis_label = "Hazard Ratio (95% CI)",
color_scheme = "red",
show_interpretation = TRUE
)
Advanced Customization
Color Schemes and Visual Options
The function provides multiple color schemes and visual customization:
# Custom color scheme
custom_color_model <- jforestmodel(
data = histopathology,
dependent_var = "LVI",
predictor_vars = c("Age", "Grade", "TStage"),
model_type = "glm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
family = "binomial",
exponentiate = TRUE,
color_scheme = "custom",
custom_color = "#FF5722",
point_size = 3.0,
line_size = 1.2,
plot_title = "Custom Styled Forest Plot"
)
# Professional medical color scheme
medical_style_model <- jforestmodel(
data = histopathology,
dependent_var = "LVI",
predictor_vars = c("Age", "Grade", "TStage"),
model_type = "glm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
family = "binomial",
exponentiate = TRUE,
color_scheme = "green",
plot_title = "Medical Style Forest Plot",
show_reference_line = TRUE,
reference_value = 1
)
Confidence Levels and Statistical Options
# 99% confidence intervals
high_confidence_model <- jforestmodel(
data = histopathology,
dependent_var = "LVI",
predictor_vars = c("Age", "Grade", "TStage", "PNI"),
model_type = "glm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
family = "binomial",
exponentiate = TRUE,
confidence_level = 0.99,
plot_title = "High Confidence Model (99% CI)"
)
# 90% confidence intervals
low_confidence_model <- jforestmodel(
data = histopathology,
dependent_var = "LVI",
predictor_vars = c("Age", "Grade", "TStage", "PNI"),
model_type = "glm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
family = "binomial",
exponentiate = TRUE,
confidence_level = 0.90,
plot_title = "Conservative Model (90% CI)"
)
Factor Variable Handling
# Factor variables on separate lines (default)
factor_separate_model <- jforestmodel(
data = histopathology,
dependent_var = "Death",
predictor_vars = c("Age", "Grade", "TStage", "LVI"),
model_type = "glm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
family = "binomial",
exponentiate = TRUE,
factor_separate_line = TRUE,
plot_title = "Factor Levels on Separate Lines"
)
# Factor variables on same line
factor_together_model <- jforestmodel(
data = histopathology,
dependent_var = "Death",
predictor_vars = c("Age", "Grade", "TStage", "LVI"),
model_type = "glm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
family = "binomial",
exponentiate = TRUE,
factor_separate_line = FALSE,
plot_title = "Factor Levels Together"
)
Variable Sorting and Selection
Sorting Options
The function provides several sorting options for variables:
# Sort by coefficient size
coefficient_sorted_model <- jforestmodel(
data = histopathology,
dependent_var = "LVI",
predictor_vars = c("Age", "Grade", "TStage", "PNI"),
model_type = "glm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
family = "binomial",
exponentiate = TRUE,
sort_variables = "coefficient",
plot_title = "Sorted by Coefficient Size"
)
# Sort by p-value
pvalue_sorted_model <- jforestmodel(
data = histopathology,
dependent_var = "LVI",
predictor_vars = c("Age", "Grade", "TStage", "PNI"),
model_type = "glm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
family = "binomial",
exponentiate = TRUE,
sort_variables = "pvalue",
plot_title = "Sorted by P-value"
)
# Alphabetical sorting
alphabetical_sorted_model <- jforestmodel(
data = histopathology,
dependent_var = "LVI",
predictor_vars = c("Age", "Grade", "TStage", "PNI"),
model_type = "glm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
family = "binomial",
exponentiate = TRUE,
sort_variables = "alphabetical",
plot_title = "Alphabetically Sorted"
)
Covariate Selection
You can specify which covariates to display in the plot:
# Display only specific covariates
selected_covariates_model <- jforestmodel(
data = histopathology,
dependent_var = "LVI",
predictor_vars = c("Age", "Grade", "TStage", "PNI"),
model_type = "glm",
time_var = NULL,
event_var = NULL,
covariates = c("Age", "Grade"), # Only show these in plot
family = "binomial",
exponentiate = TRUE,
plot_title = "Selected Covariates Only"
)
Model Diagnostics and Interpretation
Model Summary Statistics
# Comprehensive model with full reporting
comprehensive_model <- jforestmodel(
data = histopathology,
dependent_var = "LVI",
predictor_vars = c("Age", "Grade", "TStage", "PNI"),
model_type = "glm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
family = "binomial",
exponentiate = TRUE,
show_summary = TRUE,
show_interpretation = TRUE,
plot_title = "Comprehensive Model Analysis"
)
Linear Model Diagnostics
For linear models, additional diagnostic information is provided:
# Linear model with diagnostics
linear_diagnostics_model <- jforestmodel(
data = histopathology,
dependent_var = "Age",
predictor_vars = c("Grade", "TStage", "LVI", "PNI"),
model_type = "lm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
show_summary = TRUE,
plot_title = "Linear Model with Diagnostics"
)
Publication-Ready Examples
High-Quality Clinical Example
# Publication-ready breast cancer model
publication_model <- jforestmodel(
data = BreastCancer,
dependent_var = "Class",
predictor_vars = c("Cl.thickness", "Cell.size", "Cell.shape",
"Marg.adhesion", "Epith.c.size"),
model_type = "glm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
family = "binomial",
exponentiate = TRUE,
confidence_level = 0.95,
plot_title = "Predictors of Breast Cancer Malignancy",
x_axis_label = "Odds Ratio (95% Confidence Interval)",
color_scheme = "blue",
point_size = 2.5,
line_size = 0.8,
factor_separate_line = TRUE,
show_p_values = TRUE,
show_confidence_intervals = TRUE,
show_reference_line = TRUE,
reference_value = 1,
show_summary = TRUE,
show_interpretation = TRUE
)
Multi-Model Comparison
You can create multiple models for comparison:
# Model 1: Basic predictors
basic_model <- jforestmodel(
data = histopathology,
dependent_var = "Death",
predictor_vars = c("Age", "Grade"),
model_type = "glm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
family = "binomial",
exponentiate = TRUE,
plot_title = "Basic Model: Age and Grade Only"
)
# Model 2: Extended predictors
extended_model <- jforestmodel(
data = histopathology,
dependent_var = "Death",
predictor_vars = c("Age", "Grade", "TStage", "LVI", "PNI"),
model_type = "glm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
family = "binomial",
exponentiate = TRUE,
plot_title = "Extended Model: Multiple Predictors"
)
Layout and Panel Customization
Panel Width Ratios
# Custom panel width ratios
custom_layout_model <- jforestmodel(
data = histopathology,
dependent_var = "LVI",
predictor_vars = c("Age", "Grade", "TStage"),
model_type = "glm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
family = "binomial",
exponentiate = TRUE,
panel_width_ratio = "2:3:1",
plot_title = "Custom Panel Layout"
)
Reference Lines and Values
# Custom reference line for linear model
linear_reference_model <- jforestmodel(
data = histopathology,
dependent_var = "Age",
predictor_vars = c("Grade", "TStage", "LVI"),
model_type = "lm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
show_reference_line = TRUE,
reference_value = 0,
plot_title = "Linear Model with Zero Reference"
)
# Custom reference line for logistic model
logistic_reference_model <- jforestmodel(
data = histopathology,
dependent_var = "LVI",
predictor_vars = c("Age", "Grade", "TStage"),
model_type = "glm",
time_var = NULL,
event_var = NULL,
covariates = NULL,
family = "binomial",
exponentiate = TRUE,
show_reference_line = TRUE,
reference_value = 1,
plot_title = "Logistic Model with OR=1 Reference"
)
Interpretation Guidelines
Understanding Different Effect Measures
Linear Regression Coefficients
- Interpretation: Change in dependent variable per unit increase in predictor
- Significance: Confidence intervals not including zero
- Clinical relevance: Consider magnitude along with statistical significance
Best Practices
Model Building
- Variable Selection: Use clinical knowledge and statistical criteria
- Sample Size: Ensure adequate power for the number of predictors
- Missing Data: Handle appropriately before modeling
- Assumption Checking: Verify model assumptions
Troubleshooting Common Issues
Data Preparation
# Ensure variables are properly formatted
data$outcome <- as.factor(data$outcome) # For binary outcomes
data$time <- as.numeric(data$time) # For survival time
data$event <- as.numeric(data$event) # For event indicator
Model Convergence Issues
# For convergence problems, try:
# 1. Check for perfect separation in logistic regression
# 2. Scale continuous variables
# 3. Check for multicollinearity
# 4. Reduce number of predictors
Missing Data Handling
# Remove incomplete cases or use multiple imputation
complete_data <- data[complete.cases(data[c("outcome", "pred1", "pred2")]), ]
Advanced Features
Custom Model Objects
The function also supports custom model objects (future enhancement):
# Fit your own model first
my_model <- glm(LVI ~ Age + Grade + TStage,
data = histopathology,
family = binomial())
# Then create forest plot
custom_model_plot <- jforestmodel(
model_object = my_model, # Future feature
model_type = "custom",
exponentiate = TRUE,
plot_title = "Custom Model Forest Plot"
)
Conclusion
The jforestmodel
function provides a comprehensive
solution for creating professional forest plots directly from regression
models. Key benefits include:
- Direct Model Fitting: No need to pre-calculate effect sizes
- Multiple Model Types: Support for linear, logistic, and Cox regression
- Professional Quality: Publication-ready outputs
- Extensive Customization: Colors, layouts, and formatting options
- Clinical Integration: Appropriate for medical research applications
Forest plots from regression models are powerful tools for
communicating statistical findings in medical research. The
jforestmodel
function makes it easy to create high-quality
visualizations that meet publication standards and effectively
communicate your research findings.
For advanced statistical modeling considerations, consult with a biostatistician and refer to specialized statistical texts and guidelines from your research domain.
References
- Lewis S, Clarke M. Forest plots: trying to see the wood and the trees. BMJ. 2001;322(7300):1479-80.
- Wickham H. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.
- Therneau T. A Package for Survival Analysis in R. 2023.
- Harrell FE. Regression Modeling Strategies. Springer, 2015.
- Steyerberg EW. Clinical Prediction Models. Springer, 2019.