Advanced Heatmap Visualization with jggheatmap
Professional Heatmap Creation for Clinical Research and Data Analysis
ClinicoPath
2025-07-13
Source:vignettes/jjstatsplot-20-jggheatmap-comprehensive.Rmd
jjstatsplot-20-jggheatmap-comprehensive.Rmd
Introduction
The jggheatmap
function provides comprehensive heatmap
visualization capabilities for clinical research and data analysis. It
creates publication-ready heatmaps with advanced clustering, scaling,
and customization options, making it ideal for gene expression analysis,
correlation matrices, clinical parameter visualization, and other
matrix-based data.
Heatmaps are essential tools in clinical research for:
- Gene Expression Analysis: Visualizing expression patterns across samples and genes
- Correlation Matrices: Displaying relationships between multiple variables
- Clinical Parameter Clustering: Identifying patterns in patient characteristics
- Biomarker Discovery: Exploring associations in high-dimensional data
- Quality Control: Identifying outliers and batch effects in datasets
Key Features
The jggheatmap
function supports:
- Flexible Data Input: Matrix variables or pivot format (row/column/value)
- Advanced Clustering: Hierarchical clustering with multiple distance and linkage methods
- Data Scaling: Row, column, or global normalization options
- Rich Visualization: Multiple color schemes, cell shapes, and annotation options
- Interactive Elements: Dendrograms, value displays, and custom labeling
- Export Options: Multiple output formats and customizable dimensions
Basic Heatmap Creation
Matrix Variables Approach
The simplest way to create a heatmap is using the matrix variables approach with numeric data:
# Load histopathology data
data(histopathology, package = "ClinicoPath")
# Display the first few rows of relevant numeric variables
head(histopathology[c("Age", "TStage", "Grade", "LVI", "PNI")])
# Create a basic heatmap using matrix variables (correlation matrix)
basic_heatmap <- jggheatmap(
data = histopathology,
matrix_vars = c("Age", "TStage", "Grade"),
plot_title = "Clinical Parameters Correlation Heatmap",
show_colorbar = TRUE,
colorbar_title = "Correlation"
)
# The plot will be displayed in jamovi interface
Pivot Format Approach
Row/Column/Value Structure
For data that needs to be reshaped into matrix format, use the pivot approach:
# Create example data in pivot format
pivot_data <- histopathology[1:20, ] %>%
select(PatientID, Grade, TStage, Age) %>%
mutate(
PatientID = paste0("Patient_", row_number()),
Grade = paste0("Grade_", Grade)
) %>%
# Create long format for demonstration
pivot_longer(cols = c(TStage, Age), names_to = "Measure", values_to = "Value")
head(pivot_data)
# Create heatmap using pivot format
pivot_heatmap <- jggheatmap(
data = pivot_data,
row_var = "PatientID",
col_var = "Measure",
value_var = "Value",
plot_title = "Patient Measures Heatmap",
cluster_rows = TRUE,
cluster_cols = FALSE
)
Data Scaling Options
Row Scaling (Z-score by rows)
Row scaling normalizes each row to have mean=0 and sd=1, useful for comparing patterns across variables:
# Row scaling for cross-variable comparison
row_scaled_heatmap <- jggheatmap(
data = histopathology,
matrix_vars = c("Age", "TStage", "Grade", "LVI", "PNI"),
scaling = "row",
plot_title = "Row-Scaled Clinical Parameters",
colorbar_title = "Z-Score",
color_scheme = "blue_red"
)
Column Scaling
Column scaling normalizes each variable independently:
# Column scaling for variable standardization
column_scaled_heatmap <- jggheatmap(
data = histopathology,
matrix_vars = c("Age", "TStage", "Grade", "LVI", "PNI"),
scaling = "column",
plot_title = "Column-Scaled Clinical Parameters",
colorbar_title = "Standardized Value",
color_scheme = "viridis"
)
Global Scaling
Global scaling applies standardization across the entire matrix:
# Global scaling for overall comparison
global_scaled_heatmap <- jggheatmap(
data = histopathology,
matrix_vars = c("Age", "TStage", "Grade"),
scaling = "global",
plot_title = "Globally-Scaled Parameters",
colorbar_title = "Global Z-Score",
color_scheme = "plasma"
)
Clustering Analysis
Hierarchical Clustering
Enable clustering to reveal patterns and group similar rows/columns:
# Basic clustering with default settings
clustered_heatmap <- jggheatmap(
data = histopathology,
matrix_vars = c("Age", "TStage", "Grade", "LVI", "PNI"),
cluster_rows = TRUE,
cluster_cols = TRUE,
scaling = "row",
plot_title = "Clustered Clinical Parameters",
show_dendrograms = TRUE
)
Clustering Methods
Different clustering methods reveal different patterns:
# Ward's method clustering
ward_heatmap <- jggheatmap(
data = histopathology,
matrix_vars = c("Age", "TStage", "Grade", "LVI", "PNI"),
cluster_rows = TRUE,
cluster_cols = TRUE,
clustering_method = "ward.D2",
scaling = "row",
plot_title = "Ward's Method Clustering"
)
# Average linkage clustering
average_heatmap <- jggheatmap(
data = histopathology,
matrix_vars = c("Age", "TStage", "Grade"),
cluster_rows = TRUE,
cluster_cols = TRUE,
clustering_method = "average",
scaling = "row",
plot_title = "Average Linkage Clustering"
)
Distance Methods
Choose appropriate distance metrics for your data type:
# Pearson correlation distance
correlation_heatmap <- jggheatmap(
data = histopathology,
matrix_vars = c("Age", "TStage", "Grade", "LVI", "PNI"),
cluster_rows = TRUE,
cluster_cols = TRUE,
distance_method = "pearson",
clustering_method = "complete",
scaling = "row",
plot_title = "Pearson Correlation Distance"
)
# Manhattan distance
manhattan_heatmap <- jggheatmap(
data = histopathology,
matrix_vars = c("Age", "TStage", "Grade"),
cluster_rows = TRUE,
cluster_cols = TRUE,
distance_method = "manhattan",
scaling = "column",
plot_title = "Manhattan Distance Clustering"
)
Color Schemes and Visualization
Built-in Color Schemes
The function provides several professional color schemes:
# Viridis color scheme (perceptually uniform)
viridis_heatmap <- jggheatmap(
data = histopathology,
matrix_vars = c("Age", "TStage", "Grade"),
color_scheme = "viridis",
plot_title = "Viridis Color Scheme",
scaling = "row"
)
# Spectral color scheme (diverging)
spectral_heatmap <- jggheatmap(
data = histopathology,
matrix_vars = c("Age", "TStage", "Grade"),
color_scheme = "spectral",
plot_title = "Spectral Color Scheme",
scaling = "row"
)
# RdYlBu scheme (red-yellow-blue)
rdylbu_heatmap <- jggheatmap(
data = histopathology,
matrix_vars = c("Age", "TStage", "Grade"),
color_scheme = "rdylbu",
plot_title = "RdYlBu Color Scheme",
scaling = "row"
)
Cell Value Display
Show actual values within cells for detailed analysis:
# Display cell values with custom formatting
values_heatmap <- jggheatmap(
data = histopathology[1:15, ], # Smaller subset for readability
matrix_vars = c("Age", "TStage", "Grade"),
show_values = TRUE,
value_format = "decimal1",
text_size = 10,
plot_title = "Heatmap with Cell Values",
scaling = "none"
)
Advanced Customization
Label and Text Customization
Control the appearance of row and column labels:
# Custom label styling
custom_labels_heatmap <- jggheatmap(
data = histopathology,
matrix_vars = c("Age", "TStage", "Grade", "LVI"),
show_row_labels = TRUE,
show_col_labels = TRUE,
row_label_size = 8,
col_label_size = 12,
plot_title = "Custom Label Styling",
scaling = "row"
)
# Hide specific labels
minimal_labels_heatmap <- jggheatmap(
data = histopathology,
matrix_vars = c("Age", "TStage", "Grade"),
show_row_labels = FALSE,
show_col_labels = TRUE,
col_label_size = 14,
plot_title = "Minimal Labeling",
scaling = "row"
)
Plot Dimensions and Layout
Customize plot size and appearance:
# Large format for publication
publication_heatmap <- jggheatmap(
data = histopathology,
matrix_vars = c("Age", "TStage", "Grade", "LVI", "PNI"),
plot_width = 1000,
plot_height = 800,
cluster_rows = TRUE,
cluster_cols = TRUE,
scaling = "row",
plot_title = "Publication-Ready Heatmap",
colorbar_title = "Standardized Expression"
)
Cell Appearance Options
Customize cell shapes and colors:
# Custom cell appearance
custom_cells_heatmap <- jggheatmap(
data = histopathology,
matrix_vars = c("Age", "TStage", "Grade"),
cell_shape = "circle",
border_color = "black",
na_color = "gray50",
plot_title = "Custom Cell Appearance",
scaling = "row"
)
Correlation Matrix Visualization
Creating Correlation Heatmaps
Heatmaps are excellent for visualizing correlation matrices:
# Calculate correlation matrix
numeric_vars <- c("Age", "TStage", "Grade", "LVI", "PNI")
correlation_matrix <- cor(histopathology[numeric_vars], use = "complete.obs")
# Convert to long format for jggheatmap
correlation_long <- correlation_matrix %>%
as.data.frame() %>%
rownames_to_column("Variable1") %>%
pivot_longer(cols = -Variable1, names_to = "Variable2", values_to = "Correlation")
head(correlation_long)
# Create correlation heatmap
correlation_heatmap <- jggheatmap(
data = correlation_long,
row_var = "Variable1",
col_var = "Variable2",
value_var = "Correlation",
cluster_rows = TRUE,
cluster_cols = TRUE,
color_scheme = "blue_red",
plot_title = "Clinical Parameter Correlations",
colorbar_title = "Pearson r",
show_values = TRUE,
value_format = "decimal2"
)
Output Options and Data Export
Multiple Output Formats
The function supports different output formats:
# Plot only (default)
plot_only_heatmap <- jggheatmap(
data = histopathology,
matrix_vars = c("Age", "TStage", "Grade"),
output_format = "plot_only",
plot_title = "Plot Only Output"
)
# Data matrix output
data_matrix_heatmap <- jggheatmap(
data = histopathology,
matrix_vars = c("Age", "TStage", "Grade"),
output_format = "data_matrix",
scaling = "row",
plot_title = "Data Matrix Output"
)
# Both plot and data
both_output_heatmap <- jggheatmap(
data = histopathology,
matrix_vars = c("Age", "TStage", "Grade"),
output_format = "both",
cluster_rows = TRUE,
cluster_cols = TRUE,
scaling = "row",
plot_title = "Combined Output"
)
Clinical Research Applications
Gene Expression Analysis
For genomic data analysis (simulated example):
# Simulate gene expression data
set.seed(123)
gene_expression <- data.frame(
Patient = paste0("P", 1:20),
Gene1 = rnorm(20, 5, 2),
Gene2 = rnorm(20, 3, 1.5),
Gene3 = rnorm(20, 7, 2.5),
Gene4 = rnorm(20, 4, 1),
Gene5 = rnorm(20, 6, 2)
)
# Create gene expression heatmap
gene_heatmap <- jggheatmap(
data = gene_expression,
matrix_vars = c("Gene1", "Gene2", "Gene3", "Gene4", "Gene5"),
cluster_rows = TRUE,
cluster_cols = TRUE,
scaling = "row",
color_scheme = "viridis",
plot_title = "Gene Expression Heatmap",
colorbar_title = "Log2 Expression",
show_dendrograms = TRUE
)
Biomarker Analysis
Analyzing biomarker patterns:
# Create biomarker data subset
biomarker_data <- histopathology %>%
select(Age, TStage, Grade, LVI, PNI) %>%
slice_head(n = 30) # Focus on first 30 patients
# Comprehensive biomarker heatmap
biomarker_heatmap <- jggheatmap(
data = biomarker_data,
matrix_vars = c("Age", "TStage", "Grade", "LVI", "PNI"),
cluster_rows = TRUE,
cluster_cols = TRUE,
clustering_method = "ward.D2",
distance_method = "euclidean",
scaling = "column",
color_scheme = "spectral",
show_values = FALSE,
show_dendrograms = TRUE,
plot_title = "Clinical Biomarker Patterns",
colorbar_title = "Normalized Value"
)
Quality Control and Batch Effects
Identifying Outliers
Use heatmaps to identify unusual patterns:
# Create QC heatmap with highlighting
qc_heatmap <- jggheatmap(
data = histopathology,
matrix_vars = c("Age", "TStage", "Grade", "LVI", "PNI"),
cluster_rows = TRUE,
cluster_cols = FALSE,
scaling = "row",
color_scheme = "blue_red",
plot_title = "Quality Control Heatmap",
show_values = FALSE,
border_color = "white"
)
Interactive Features and Annotations
Dendrogram Display
Show clustering relationships:
# Detailed dendrogram display
dendrogram_heatmap <- jggheatmap(
data = histopathology,
matrix_vars = c("Age", "TStage", "Grade", "LVI", "PNI"),
cluster_rows = TRUE,
cluster_cols = TRUE,
show_dendrograms = TRUE,
dendrogram_height = 0.3,
clustering_method = "complete",
distance_method = "pearson",
scaling = "row",
plot_title = "Heatmap with Dendrograms"
)
Annotation Variables
Add contextual information (when available):
# Heatmap with annotations
annotated_heatmap <- jggheatmap(
data = histopathology,
matrix_vars = c("Age", "TStage", "Grade"),
annotation_var = "Grade",
annotation_colors = "set1",
cluster_rows = TRUE,
cluster_cols = TRUE,
scaling = "row",
plot_title = "Annotated Heatmap"
)
Interpretation Guidelines
Reading Heatmaps Effectively
Color Interpretation
- Hot colors (red/yellow): High values or positive associations
- Cool colors (blue/green): Low values or negative associations
- White/neutral: Baseline or no association
Best Practices
Data Preparation
- Quality Control: Remove samples/variables with excessive missing data
- Outlier Assessment: Identify and evaluate extreme values
- Normalization: Choose appropriate scaling method for your data type
- Variable Selection: Include relevant variables with sufficient variability
Troubleshooting Common Issues
Data Format Problems
# Ensure numeric variables
data$variable <- as.numeric(data$variable)
# Handle missing values
data <- data[complete.cases(data[variables]), ]
# Check for constant variables
apply(data[variables], 2, var, na.rm = TRUE) # Should be > 0
Visualization Issues
# For too many variables, consider subsetting
important_vars <- c("Gene1", "Gene2", "Gene3") # Select key variables
# For overcrowded labels, adjust sizes or hide them
row_label_size = 6
show_row_labels = FALSE
Advanced Applications
Time Series Heatmaps
For longitudinal data analysis:
# Simulate time series data
time_series_data <- expand.grid(
Patient = paste0("P", 1:10),
Timepoint = paste0("T", 1:5)
) %>%
mutate(
Biomarker1 = rnorm(50, 5, 2),
Biomarker2 = rnorm(50, 3, 1),
Biomarker3 = rnorm(50, 7, 2)
)
# Create time series heatmap
timeseries_heatmap <- jggheatmap(
data = time_series_data,
row_var = "Patient",
col_var = "Timepoint",
value_var = "Biomarker1",
cluster_rows = TRUE,
cluster_cols = FALSE, # Keep temporal order
color_scheme = "plasma",
plot_title = "Longitudinal Biomarker Changes"
)
Multi-Level Comparisons
For comparing different conditions:
# Create condition-specific data
condition_data <- histopathology %>%
mutate(Condition = ifelse(Grade > 2, "High_Grade", "Low_Grade")) %>%
select(Condition, Age, TStage, LVI, PNI) %>%
group_by(Condition) %>%
summarise(
across(c(Age, TStage, LVI, PNI), mean, na.rm = TRUE),
.groups = "drop"
)
# Create comparison heatmap
comparison_heatmap <- jggheatmap(
data = condition_data,
matrix_vars = c("Age", "TStage", "LVI", "PNI"),
cluster_rows = FALSE,
cluster_cols = TRUE,
scaling = "column",
color_scheme = "blue_red",
plot_title = "Condition Comparison Heatmap"
)
Conclusion
The jggheatmap
function provides a comprehensive
solution for advanced heatmap visualization in clinical research. Key
benefits include:
- Flexible Data Input: Support for both matrix and pivot data formats
- Advanced Analytics: Hierarchical clustering with multiple distance and linkage methods
- Professional Quality: Publication-ready outputs with extensive customization
- Clinical Integration: Designed for medical research applications
- Scalability: Handles datasets from small studies to large genomic analyses
Heatmaps are powerful tools for exploring complex datasets,
identifying patterns, and communicating findings effectively. The
jggheatmap
function makes it easy to create
professional-quality visualizations that meet publication standards and
enhance your research presentations.
For specific applications in your research domain, consider consulting with a biostatistician or bioinformatician to ensure appropriate analysis methods and interpretation of results.
References
- Wilkinson L, Friendly M. The History of the Cluster Heat Map. The American Statistician. 2009;63(2):179-184.
- Weinstein JN, et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nature Genetics. 2013;45(10):1113-20.
- Eisen MB, et al. Cluster analysis and display of genome-wide expression patterns. PNAS. 1998;95(25):14863-8.
- Ward JH. Hierarchical Grouping to Optimize an Objective Function. Journal of the American Statistical Association. 1963;58(301):236-244.
- Murtagh F, Contreras P. Algorithms for hierarchical clustering: an overview. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery. 2012;2(1):86-97.