Visualizing Regression Models with Forest Plots

A forest plot is a compact way to visualize the results of a statistical model: it shows the estimated effect of several predictors on an outcome side by side, each with its confidence interval. In clinical research, forest plots are often used to display the results of regression models, showing the odds ratios or hazard ratios for various risk factors.

The Clinical Scenario

A researcher is working with the BreastCancer dataset and wants to understand which cellular characteristics are most strongly associated with a tumor being malignant. The research question is:

What are the risk factors for a breast tumor being malignant, and what is the strength of their association?

We will build a logistic regression model to answer this question and then visualize the results with a forest plot.

What is a Logistic Regression?

Before we make the plot, it’s helpful to understand the model behind it. A logistic regression is a statistical method for predicting a binary outcome (an outcome with only two possible values, such as “benign” or “malignant”) from a set of predictor variables. The model estimates a coefficient for each predictor on the log-odds scale. Exponentiating that coefficient gives an odds ratio, which tells us how the odds of the outcome change with a one-unit increase in the predictor while the other predictors are held constant.
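In symbols, let p be the probability that a tumor is malignant and x_1, …, x_k be the predictors. The model is linear on the log-odds scale, and exponentiating a coefficient gives that predictor’s odds ratio:

```latex
\log\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k,
\qquad
\mathrm{OR}_j = e^{\beta_j}
```

Increasing x_j by one unit while holding the other predictors fixed multiplies the odds p/(1 − p) by e^{β_j}. For example, a coefficient of 0.5 corresponds to an odds ratio of e^0.5 ≈ 1.65, which means about 65% higher odds per one-unit increase.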

Step 1: The Analysis in jamovi

Load the BreastCancer.omv dataset into jamovi.
From the main analysis ribbon, click on JJStatsPlot -> Advanced -> Forest Plot from Regression Model.

[Screenshot of the jamovi analysis ribbon showing the path to the Forest Plot.]

In the analysis window:
- Move the Class variable (which contains “benign” and “malignant”) to the Dependent Variable box. This is the outcome we want to predict.
- Move the predictor variables, such as Cl.thickness, Cell.size, and Cell.shape, to the Predictor Variables box.
- Under Model Type, select Logistic Regression (glm).
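If you want to check the jamovi output in R, you can reproduce the model this step specifies with base glm(). This is a minimal sketch with two assumptions. First, it uses the copy of BreastCancer from the mlbench package, where the 1 to 10 cellular scores are stored as factors; the code converts them to numbers. Second, it drops the rows with missing Bare.nuclei values. The jamovi module may prepare the data differently.

```r
# Sketch only: assumes the mlbench package is installed (install.packages("mlbench")).
data(BreastCancer, package = "mlbench")

predictors <- c("Cl.thickness", "Cell.size", "Cell.shape",
                "Marg.adhesion", "Epith.c.size", "Bare.nuclei")
bc <- BreastCancer[, c("Class", predictors)]

# mlbench stores the 1-10 scores as factors; convert them to numbers so that
# each odds ratio refers to a one-unit increase in the score.
bc[predictors] <- lapply(bc[predictors], function(x) as.numeric(as.character(x)))

# Bare.nuclei has missing values; drop those rows explicitly.
bc <- na.omit(bc)

# Class has levels "benign" then "malignant", so glm() models P(malignant).
fit <- glm(Class ~ ., data = bc, family = binomial)
summary(fit)
```

Converting the scores to numbers matters for interpretation. If they stayed as ordered factors, glm() would fit contrast terms instead of a single per-unit slope, and the “one-unit increase” reading would no longer apply.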

[Screenshot of the analysis window showing the variables being assigned.]

Step 2: The Output Plot

jamovi fits the logistic regression model behind the scenes and then draws a forest plot of the results. The equivalent R call, which fits the model and creates the plot in one step, is:

```r
# Load the data
data(BreastCancer, package = "ClinicoPath")

# The jforestmodel function fits the model and creates the plot
jforestmodel(
  data = BreastCancer,
  dependent_var = "Class",
  predictor_vars = c("Cl.thickness", "Cell.size", "Cell.shape", 
                    "Marg.adhesion", "Epith.c.size", "Bare.nuclei"),
  model_type = "glm",
  family = "binomial",
  exponentiate = TRUE,
  plot_title = "Predictors of Breast Cancer Malignancy",
  x_axis_label = "Odds Ratio (95% Confidence Interval)"
)
```
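The exponentiate = TRUE argument turns the model’s log-odds coefficients into the odds ratios shown on the x-axis. The sketch below shows that step in base R. It is not the jforestmodel internals, which may compute intervals differently. It uses the built-in infert case-control data as a stand-in because that dataset ships with R, and it uses Wald intervals from confint.default().

```r
# Stand-in example: built-in infert data (case = 1 for cases, 0 for controls).
fit <- glm(case ~ spontaneous + induced, data = infert, family = binomial)

# Coefficients are on the log-odds scale; exponentiating gives odds ratios.
odds_ratios <- exp(coef(fit))

# Wald 95% confidence intervals, exponentiated the same way.
ci <- exp(confint.default(fit))

or_table <- data.frame(
  term  = names(odds_ratios),
  OR    = unname(odds_ratios),
  lower = unname(ci[, 1]),
  upper = unname(ci[, 2])
)
# Drop the intercept: its exponentiated value is a baseline odds, not an odds ratio.
or_table <- or_table[-1, ]
print(or_table, digits = 3)
```

Each row of or_table corresponds to one point and one horizontal line on a forest plot.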

Step 3: Interpreting the Forest Plot

The forest plot shows the estimated odds ratio for each predictor variable, adjusted for the other predictors in the model.

The Vertical Line: The vertical line at an odds ratio of 1.0 represents “no effect”. If a predictor’s 95% confidence interval crosses this line, its association with malignancy is not statistically significant at the 5% level.
The Points: Each point is the estimated odds ratio for that predictor, the model’s “best guess” for the effect of that variable.
The Horizontal Lines: Each horizontal line is the 95% confidence interval around the estimate, a range of plausible values for the true odds ratio. Wider lines indicate less precise estimates.
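To see how these three elements map onto drawing commands, here is a minimal base-R forest plot. It is a sketch under stated assumptions, not how jforestmodel draws its plot. It uses the built-in infert data as a stand-in for BreastCancer, computes Wald intervals, and uses a log-scaled x-axis, a common convention for odds ratios.

```r
# Stand-in data: the built-in infert case-control study, not BreastCancer.
fit <- glm(case ~ age + spontaneous + induced, data = infert, family = binomial)

est  <- exp(coef(fit))[-1]               # odds ratios (intercept dropped)
ci   <- exp(confint.default(fit))[-1, ]  # Wald 95% confidence intervals
k    <- length(est)
rows <- seq_len(k)

par(mar = c(5, 8, 2, 2))  # wider left margin for the predictor names

# Points: estimated odds ratios on a log-scaled x-axis that always includes 1.
plot(est, rows, log = "x", pch = 15,
     xlim = range(ci, 1), ylim = c(0.5, k + 0.5), yaxt = "n",
     xlab = "Odds Ratio (95% CI, log scale)", ylab = "")
axis(2, at = rows, labels = names(est), las = 1)

segments(ci[, 1], rows, ci[, 2], rows)  # horizontal lines: 95% CIs
abline(v = 1, lty = 2)                  # vertical line: OR = 1, no effect
```

On a log scale, odds ratios of 0.5 and 2 sit the same distance from the line at 1, so harmful and protective effects of the same size look symmetric.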

How to read the results:

Cl.thickness (Clump Thickness): The odds ratio is greater than 1, and the confidence interval does not cross 1. This means that each one-unit increase in clump thickness is associated with significantly higher odds of the tumor being malignant, holding the other predictors constant.
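Cell.size (Uniformity of Cell Size): Read this variable the same way, keeping in mind what it measures. It is a score of how uniform the cells are in size, and higher scores indicate more variation in size, not larger cells. An odds ratio above 1 with an interval that does not cross 1 would mean that less uniform cell sizes are associated with higher odds of malignancy.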
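Apply the same check to every variable. A predictor is a statistically significant risk factor in this model only if its whole confidence interval lies on one side of the vertical line at 1.0. Some cellular features, such as cell size and cell shape, are closely related. When both are in the same model, one of them can have a wide interval that crosses 1 even if it is associated with malignancy on its own.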
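The reading rule above can be written as a small helper. This is a hypothetical function for illustration, not part of jamovi or ClinicoPath. It also labels the case the text does not cover: an interval entirely below 1, which indicates a predictor associated with lower odds.

```r
# Hypothetical helper: label a predictor by where its 95% CI sits relative to 1.
classify_or <- function(lower, upper) {
  ifelse(lower > 1, "significant: higher odds",
         ifelse(upper < 1, "significant: lower odds",
                "not significant: CI crosses 1"))
}

# Hypothetical intervals for illustration only (not results from this model).
classify_or(lower = c(1.20, 0.40, 0.85), upper = c(2.10, 0.90, 1.60))
# -> "significant: higher odds", "significant: lower odds",
#    "not significant: CI crosses 1"
```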

Step 4: Reporting the Results

When reporting the results from a forest plot, you should describe the model and then report the odds ratios and confidence intervals for the key predictors.

A logistic regression model was built to predict the likelihood of a malignant diagnosis based on cellular characteristics. The results are displayed in a forest plot. Several factors were found to be significantly associated with the odds of malignancy. For example, for each one-unit increase in clump thickness, the odds of the tumor being malignant increased by a factor of X (OR = X, 95% CI [Y, Z]). Uniformity of cell size was also a significant predictor (OR = A, 95% CI [B, C]).

(Note: You would fill in the X, Y, Z, A, B, and C values from the output table in jamovi.)
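If you copy many estimates into text, a small formatter helps keep the style consistent. This is a hypothetical helper written for this example, shown with made-up inputs rather than results from the BreastCancer model.

```r
# Hypothetical helper: format an odds ratio and its 95% CI for a results sentence.
format_or <- function(or, lower, upper, digits = 2) {
  f <- function(x) formatC(x, format = "f", digits = digits)
  sprintf("OR = %s, 95%% CI [%s, %s]", f(or), f(lower), f(upper))
}

# Made-up values for illustration only.
format_or(1.6487, 1.2, 2.31)
# [1] "OR = 1.65, 95% CI [1.20, 2.31]"
```

Because formatC() and sprintf() are vectorized, the same call also works on whole columns of an odds-ratio table.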