Advanced Heatmap Visualization with jggheatmap

library(ClinicoPath)
library(ggplot2)
library(dplyr)
library(tidyr)

Introduction

The jggheatmap function provides comprehensive heatmap visualization capabilities for clinical research and data analysis. It creates publication-ready heatmaps with advanced clustering, scaling, and customization options, making it ideal for gene expression analysis, correlation matrices, clinical parameter visualization, and other matrix-based data.

Heatmaps are essential tools in clinical research for:

Gene Expression Analysis: Visualizing expression patterns across samples and genes
Correlation Matrices: Displaying relationships between multiple variables
Clinical Parameter Clustering: Identifying patterns in patient characteristics
Biomarker Discovery: Exploring associations in high-dimensional data
Quality Control: Identifying outliers and batch effects in datasets

Key Features

The jggheatmap function supports:

Flexible Data Input: Matrix variables or pivot format (row/column/value)
Advanced Clustering: Hierarchical clustering with multiple distance and linkage methods
Data Scaling: Row, column, or global normalization options
Rich Visualization: Multiple color schemes, cell shapes, and annotation options
Interactive Elements: Dendrograms, value displays, and custom labeling
Export Options: Multiple output formats and customizable dimensions

Basic Heatmap Creation

Matrix Variables Approach

The simplest way to create a heatmap is using the matrix variables approach with numeric data:

# Load histopathology data
data(histopathology, package = "ClinicoPath")

# Display the first few rows of relevant numeric variables
head(histopathology[c("Age", "TStage", "Grade", "LVI", "PNI")])

# Create a basic heatmap using matrix variables (correlation matrix)
basic_heatmap <- jggheatmap(
  data = histopathology,
  matrix_vars = c("Age", "TStage", "Grade"),
  plot_title = "Clinical Parameters Correlation Heatmap",
  show_colorbar = TRUE,
  colorbar_title = "Correlation"
)

# The plot will be displayed in jamovi interface

Understanding Matrix Variables

When using matrix variables:

Each selected variable becomes a column in the heatmap
Each row in your dataset becomes a row in the heatmap
Values are displayed as colors according to the selected color scheme
Only numeric variables should be included

Pivot Format Approach

Row/Column/Value Structure

For data that needs to be reshaped into matrix format, use the pivot approach:

# Create example data in pivot format
pivot_data <- histopathology[1:20, ] %>%
  select(PatientID, Grade, TStage, Age) %>%
  mutate(
    PatientID = paste0("Patient_", row_number()),
    Grade = paste0("Grade_", Grade)
  ) %>%
  # Create long format for demonstration
  pivot_longer(cols = c(TStage, Age), names_to = "Measure", values_to = "Value")

head(pivot_data)

# Create heatmap using pivot format
pivot_heatmap <- jggheatmap(
  data = pivot_data,
  row_var = "PatientID",
  col_var = "Measure",
  value_var = "Value",
  plot_title = "Patient Measures Heatmap",
  cluster_rows = TRUE,
  cluster_cols = FALSE
)

Data Scaling Options

Row Scaling (Z-score by rows)

Row scaling normalizes each row to have mean=0 and sd=1, useful for comparing patterns across variables:

# Row scaling for cross-variable comparison
row_scaled_heatmap <- jggheatmap(
  data = histopathology,
  matrix_vars = c("Age", "TStage", "Grade", "LVI", "PNI"),
  scaling = "row",
  plot_title = "Row-Scaled Clinical Parameters",
  colorbar_title = "Z-Score",
  color_scheme = "blue_red"
)

Column Scaling

Column scaling normalizes each variable independently:

# Column scaling for variable standardization
column_scaled_heatmap <- jggheatmap(
  data = histopathology,
  matrix_vars = c("Age", "TStage", "Grade", "LVI", "PNI"),
  scaling = "column",
  plot_title = "Column-Scaled Clinical Parameters",
  colorbar_title = "Standardized Value",
  color_scheme = "viridis"
)

Global Scaling

Global scaling applies standardization across the entire matrix:

# Global scaling for overall comparison
global_scaled_heatmap <- jggheatmap(
  data = histopathology,
  matrix_vars = c("Age", "TStage", "Grade"),
  scaling = "global",
  plot_title = "Globally-Scaled Parameters",
  colorbar_title = "Global Z-Score",
  color_scheme = "plasma"
)

Clustering Analysis

Hierarchical Clustering

Enable clustering to reveal patterns and group similar rows/columns:

# Basic clustering with default settings
clustered_heatmap <- jggheatmap(
  data = histopathology,
  matrix_vars = c("Age", "TStage", "Grade", "LVI", "PNI"),
  cluster_rows = TRUE,
  cluster_cols = TRUE,
  scaling = "row",
  plot_title = "Clustered Clinical Parameters",
  show_dendrograms = TRUE
)

Clustering Methods

Different clustering methods reveal different patterns:

# Ward's method clustering
ward_heatmap <- jggheatmap(
  data = histopathology,
  matrix_vars = c("Age", "TStage", "Grade", "LVI", "PNI"),
  cluster_rows = TRUE,
  cluster_cols = TRUE,
  clustering_method = "ward.D2",
  scaling = "row",
  plot_title = "Ward's Method Clustering"
)

# Average linkage clustering
average_heatmap <- jggheatmap(
  data = histopathology,
  matrix_vars = c("Age", "TStage", "Grade"),
  cluster_rows = TRUE,
  cluster_cols = TRUE,
  clustering_method = "average",
  scaling = "row",
  plot_title = "Average Linkage Clustering"
)

Distance Methods

Choose appropriate distance metrics for your data type:

# Pearson correlation distance
correlation_heatmap <- jggheatmap(
  data = histopathology,
  matrix_vars = c("Age", "TStage", "Grade", "LVI", "PNI"),
  cluster_rows = TRUE,
  cluster_cols = TRUE,
  distance_method = "pearson",
  clustering_method = "complete",
  scaling = "row",
  plot_title = "Pearson Correlation Distance"
)

# Manhattan distance
manhattan_heatmap <- jggheatmap(
  data = histopathology,
  matrix_vars = c("Age", "TStage", "Grade"),
  cluster_rows = TRUE,
  cluster_cols = TRUE,
  distance_method = "manhattan",
  scaling = "column",
  plot_title = "Manhattan Distance Clustering"
)

Color Schemes and Visualization

Built-in Color Schemes

The function provides several professional color schemes:

# Viridis color scheme (perceptually uniform)
viridis_heatmap <- jggheatmap(
  data = histopathology,
  matrix_vars = c("Age", "TStage", "Grade"),
  color_scheme = "viridis",
  plot_title = "Viridis Color Scheme",
  scaling = "row"
)

# Spectral color scheme (diverging)
spectral_heatmap <- jggheatmap(
  data = histopathology,
  matrix_vars = c("Age", "TStage", "Grade"),
  color_scheme = "spectral",
  plot_title = "Spectral Color Scheme",
  scaling = "row"
)

# RdYlBu scheme (red-yellow-blue)
rdylbu_heatmap <- jggheatmap(
  data = histopathology,
  matrix_vars = c("Age", "TStage", "Grade"),
  color_scheme = "rdylbu",
  plot_title = "RdYlBu Color Scheme",
  scaling = "row"
)

Cell Value Display

Show actual values within cells for detailed analysis:

# Display cell values with custom formatting
values_heatmap <- jggheatmap(
  data = histopathology[1:15, ],  # Smaller subset for readability
  matrix_vars = c("Age", "TStage", "Grade"),
  show_values = TRUE,
  value_format = "decimal1",
  text_size = 10,
  plot_title = "Heatmap with Cell Values",
  scaling = "none"
)

Advanced Customization

Label and Text Customization

Control the appearance of row and column labels:

# Custom label styling
custom_labels_heatmap <- jggheatmap(
  data = histopathology,
  matrix_vars = c("Age", "TStage", "Grade", "LVI"),
  show_row_labels = TRUE,
  show_col_labels = TRUE,
  row_label_size = 8,
  col_label_size = 12,
  plot_title = "Custom Label Styling",
  scaling = "row"
)

# Hide specific labels
minimal_labels_heatmap <- jggheatmap(
  data = histopathology,
  matrix_vars = c("Age", "TStage", "Grade"),
  show_row_labels = FALSE,
  show_col_labels = TRUE,
  col_label_size = 14,
  plot_title = "Minimal Labeling",
  scaling = "row"
)

Plot Dimensions and Layout

Customize plot size and appearance:

# Large format for publication
publication_heatmap <- jggheatmap(
  data = histopathology,
  matrix_vars = c("Age", "TStage", "Grade", "LVI", "PNI"),
  plot_width = 1000,
  plot_height = 800,
  cluster_rows = TRUE,
  cluster_cols = TRUE,
  scaling = "row",
  plot_title = "Publication-Ready Heatmap",
  colorbar_title = "Standardized Expression"
)

Cell Appearance Options

Customize cell shapes and colors:

# Custom cell appearance
custom_cells_heatmap <- jggheatmap(
  data = histopathology,
  matrix_vars = c("Age", "TStage", "Grade"),
  cell_shape = "circle",
  border_color = "black",
  na_color = "gray50",
  plot_title = "Custom Cell Appearance",
  scaling = "row"
)

Correlation Matrix Visualization

Creating Correlation Heatmaps

Heatmaps are excellent for visualizing correlation matrices:

# Calculate correlation matrix
numeric_vars <- c("Age", "TStage", "Grade", "LVI", "PNI")
correlation_matrix <- cor(histopathology[numeric_vars], use = "complete.obs")

# Convert to long format for jggheatmap
correlation_long <- correlation_matrix %>%
  as.data.frame() %>%
  rownames_to_column("Variable1") %>%
  pivot_longer(cols = -Variable1, names_to = "Variable2", values_to = "Correlation")

head(correlation_long)

# Create correlation heatmap
correlation_heatmap <- jggheatmap(
  data = correlation_long,
  row_var = "Variable1",
  col_var = "Variable2", 
  value_var = "Correlation",
  cluster_rows = TRUE,
  cluster_cols = TRUE,
  color_scheme = "blue_red",
  plot_title = "Clinical Parameter Correlations",
  colorbar_title = "Pearson r",
  show_values = TRUE,
  value_format = "decimal2"
)

Output Options and Data Export

Multiple Output Formats

The function supports different output formats:

# Plot only (default)
plot_only_heatmap <- jggheatmap(
  data = histopathology,
  matrix_vars = c("Age", "TStage", "Grade"),
  output_format = "plot_only",
  plot_title = "Plot Only Output"
)

# Data matrix output
data_matrix_heatmap <- jggheatmap(
  data = histopathology,
  matrix_vars = c("Age", "TStage", "Grade"),
  output_format = "data_matrix",
  scaling = "row",
  plot_title = "Data Matrix Output"
)

# Both plot and data
both_output_heatmap <- jggheatmap(
  data = histopathology,
  matrix_vars = c("Age", "TStage", "Grade"),
  output_format = "both",
  cluster_rows = TRUE,
  cluster_cols = TRUE,
  scaling = "row",
  plot_title = "Combined Output"
)

Clinical Research Applications

Gene Expression Analysis

For genomic data analysis (simulated example):

# Simulate gene expression data
set.seed(123)
gene_expression <- data.frame(
  Patient = paste0("P", 1:20),
  Gene1 = rnorm(20, 5, 2),
  Gene2 = rnorm(20, 3, 1.5),
  Gene3 = rnorm(20, 7, 2.5),
  Gene4 = rnorm(20, 4, 1),
  Gene5 = rnorm(20, 6, 2)
)

# Create gene expression heatmap
gene_heatmap <- jggheatmap(
  data = gene_expression,
  matrix_vars = c("Gene1", "Gene2", "Gene3", "Gene4", "Gene5"),
  cluster_rows = TRUE,
  cluster_cols = TRUE,
  scaling = "row",
  color_scheme = "viridis",
  plot_title = "Gene Expression Heatmap",
  colorbar_title = "Log2 Expression",
  show_dendrograms = TRUE
)

Biomarker Analysis

Analyzing biomarker patterns:

# Create biomarker data subset
biomarker_data <- histopathology %>%
  select(Age, TStage, Grade, LVI, PNI) %>%
  slice_head(n = 30)  # Focus on first 30 patients

# Comprehensive biomarker heatmap
biomarker_heatmap <- jggheatmap(
  data = biomarker_data,
  matrix_vars = c("Age", "TStage", "Grade", "LVI", "PNI"),
  cluster_rows = TRUE,
  cluster_cols = TRUE,
  clustering_method = "ward.D2",
  distance_method = "euclidean",
  scaling = "column",
  color_scheme = "spectral",
  show_values = FALSE,
  show_dendrograms = TRUE,
  plot_title = "Clinical Biomarker Patterns",
  colorbar_title = "Normalized Value"
)

Quality Control and Batch Effects

Identifying Outliers

Use heatmaps to identify unusual patterns:

# Create QC heatmap with highlighting
qc_heatmap <- jggheatmap(
  data = histopathology,
  matrix_vars = c("Age", "TStage", "Grade", "LVI", "PNI"),
  cluster_rows = TRUE,
  cluster_cols = FALSE,
  scaling = "row",
  color_scheme = "blue_red",
  plot_title = "Quality Control Heatmap",
  show_values = FALSE,
  border_color = "white"
)

Interactive Features and Annotations

Dendrogram Display

Show clustering relationships:

# Detailed dendrogram display
dendrogram_heatmap <- jggheatmap(
  data = histopathology,
  matrix_vars = c("Age", "TStage", "Grade", "LVI", "PNI"),
  cluster_rows = TRUE,
  cluster_cols = TRUE,
  show_dendrograms = TRUE,
  dendrogram_height = 0.3,
  clustering_method = "complete",
  distance_method = "pearson",
  scaling = "row",
  plot_title = "Heatmap with Dendrograms"
)

Annotation Variables

Add contextual information (when available):

# Heatmap with annotations
annotated_heatmap <- jggheatmap(
  data = histopathology,
  matrix_vars = c("Age", "TStage", "Grade"),
  annotation_var = "Grade",
  annotation_colors = "set1",
  cluster_rows = TRUE,
  cluster_cols = TRUE,
  scaling = "row",
  plot_title = "Annotated Heatmap"
)

Interpretation Guidelines

Reading Heatmaps Effectively

Color Interpretation

Hot colors (red/yellow): High values or positive associations
Cool colors (blue/green): Low values or negative associations
White/neutral: Baseline or no association

Clustering Patterns

Row clusters: Groups of samples with similar profiles
Column clusters: Groups of variables with similar patterns
Dendrogram height: Indicates dissimilarity; longer branches = more different

Data Scaling Impact

No scaling: Preserves original value relationships
Row scaling: Emphasizes patterns within each sample
Column scaling: Emphasizes patterns within each variable
Global scaling: Provides overall standardization

Statistical Considerations

Distance Metrics

Euclidean: Standard geometric distance, sensitive to scale
Manhattan: Sum of absolute differences, robust to outliers
Pearson: Based on linear correlation, good for expression data
Spearman: Based on rank correlation, robust to non-linear relationships

Clustering Methods

Complete: Conservative, creates compact clusters
Average: Balanced approach, widely used
Single: Liberal, can create elongated clusters
Ward: Minimizes within-cluster variance, good for even-sized groups

Best Practices

Data Preparation

Quality Control: Remove samples/variables with excessive missing data
Outlier Assessment: Identify and evaluate extreme values
Normalization: Choose appropriate scaling method for your data type
Variable Selection: Include relevant variables with sufficient variability

Visualization Design

Color Choice: Use perceptually uniform color schemes when possible
Label Clarity: Ensure labels are readable and informative
Plot Size: Match dimensions to intended use (screen vs. publication)
Legend: Include clear color bar with appropriate title

Interpretation

Pattern Recognition: Look for blocks of similar colors
Cluster Validation: Verify that clusters make biological sense
Statistical Significance: Remember that patterns may not be statistically significant
Replication: Validate patterns in independent datasets when possible

Troubleshooting Common Issues

Data Format Problems

# Ensure numeric variables
data$variable <- as.numeric(data$variable)

# Handle missing values
data <- data[complete.cases(data[variables]), ]

# Check for constant variables
apply(data[variables], 2, var, na.rm = TRUE)  # Should be > 0

Visualization Issues

# For too many variables, consider subsetting
important_vars <- c("Gene1", "Gene2", "Gene3")  # Select key variables

# For overcrowded labels, adjust sizes or hide them
row_label_size = 6
show_row_labels = FALSE

Performance Optimization

# For large datasets, consider sampling
sample_data <- data[sample(nrow(data), 100), ]

# Or focus on most variable features
var_scores <- apply(data[variables], 2, var, na.rm = TRUE)
top_vars <- names(sort(var_scores, decreasing = TRUE)[1:20])

Advanced Applications

Time Series Heatmaps

For longitudinal data analysis:

# Simulate time series data
time_series_data <- expand.grid(
  Patient = paste0("P", 1:10),
  Timepoint = paste0("T", 1:5)
) %>%
  mutate(
    Biomarker1 = rnorm(50, 5, 2),
    Biomarker2 = rnorm(50, 3, 1),
    Biomarker3 = rnorm(50, 7, 2)
  )

# Create time series heatmap
timeseries_heatmap <- jggheatmap(
  data = time_series_data,
  row_var = "Patient",
  col_var = "Timepoint",
  value_var = "Biomarker1",
  cluster_rows = TRUE,
  cluster_cols = FALSE,  # Keep temporal order
  color_scheme = "plasma",
  plot_title = "Longitudinal Biomarker Changes"
)

Multi-Level Comparisons

For comparing different conditions:

# Create condition-specific data
condition_data <- histopathology %>%
  mutate(Condition = ifelse(Grade > 2, "High_Grade", "Low_Grade")) %>%
  select(Condition, Age, TStage, LVI, PNI) %>%
  group_by(Condition) %>%
  summarise(
    across(c(Age, TStage, LVI, PNI), mean, na.rm = TRUE),
    .groups = "drop"
  )

# Create comparison heatmap
comparison_heatmap <- jggheatmap(
  data = condition_data,
  matrix_vars = c("Age", "TStage", "LVI", "PNI"),
  cluster_rows = FALSE,
  cluster_cols = TRUE,
  scaling = "column",
  color_scheme = "blue_red",
  plot_title = "Condition Comparison Heatmap"
)

Conclusion

The jggheatmap function provides a comprehensive solution for advanced heatmap visualization in clinical research. Key benefits include:

Flexible Data Input: Support for both matrix and pivot data formats
Advanced Analytics: Hierarchical clustering with multiple distance and linkage methods
Professional Quality: Publication-ready outputs with extensive customization
Clinical Integration: Designed for medical research applications
Scalability: Handles datasets from small studies to large genomic analyses

Heatmaps are powerful tools for exploring complex datasets, identifying patterns, and communicating findings effectively. The jggheatmap function makes it easy to create professional-quality visualizations that meet publication standards and enhance your research presentations.

For specific applications in your research domain, consider consulting with a biostatistician or bioinformatician to ensure appropriate analysis methods and interpretation of results.

References

Wilkinson L, Friendly M. The History of the Cluster Heat Map. The American Statistician. 2009;63(2):179-184.
Weinstein JN, et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nature Genetics. 2013;45(10):1113-20.
Eisen MB, et al. Cluster analysis and display of genome-wide expression patterns. PNAS. 1998;95(25):14863-8.
Ward JH. Hierarchical Grouping to Optimize an Objective Function. Journal of the American Statistical Association. 1963;58(301):236-244.
Murtagh F, Contreras P. Algorithms for hierarchical clustering: an overview. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery. 2012;2(1):86-97.

Professional Heatmap Creation for Clinical Research and Data Analysis

ClinicoPath

2025-07-13

Introduction

Key Features

Basic Heatmap Creation

Matrix Variables Approach

Understanding Matrix Variables

Pivot Format Approach

Row/Column/Value Structure

Data Scaling Options

Row Scaling (Z-score by rows)

Column Scaling

Global Scaling

Clustering Analysis

Hierarchical Clustering

Clustering Methods

Distance Methods

Color Schemes and Visualization

Built-in Color Schemes

Cell Value Display

Advanced Customization

Label and Text Customization

Plot Dimensions and Layout

Cell Appearance Options

Correlation Matrix Visualization

Creating Correlation Heatmaps

Output Options and Data Export

Multiple Output Formats

Clinical Research Applications

Gene Expression Analysis

Biomarker Analysis

Quality Control and Batch Effects

Identifying Outliers

Interactive Features and Annotations

Dendrogram Display

Annotation Variables

Interpretation Guidelines

Reading Heatmaps Effectively

Color Interpretation

Clustering Patterns

Data Scaling Impact

Statistical Considerations

Distance Metrics

Clustering Methods

Best Practices

Data Preparation

Visualization Design

Interpretation

Troubleshooting Common Issues

Data Format Problems

Visualization Issues

Performance Optimization

Advanced Applications

Time Series Heatmaps

Multi-Level Comparisons

Conclusion

References