jjcorrmat: Comprehensive Correlation Matrix Analysis

Overview

The jjcorrmat function creates correlation matrix plots annotated with the accompanying statistical results. It wraps ggstatsplot::ggcorrmat and ggstatsplot::grouped_ggcorrmat, so it can draw either a single correlation matrix or one matrix per level of a grouping variable.

Key Features

Multiple correlation methods: Parametric (Pearson), nonparametric (Spearman), robust, and Bayesian
Grouped analysis: Create separate correlation matrices for different groups
Performance optimized: Uses internal caching to eliminate redundant computations
Comprehensive visualization: High-quality correlation plots with statistical annotations

Installation and Setup

# Install ClinicoPath if not already installed (requires the devtools package)
if (!requireNamespace("ClinicoPath", quietly = TRUE)) {
  devtools::install_github("sbalci/ClinicoPathJamoviModule")
}

library(ClinicoPath)
library(ggplot2)

Quick Start

Basic Correlation Matrix

# Load test data
data(jjcorrmat_test_data)

# Basic correlation matrix
result <- jjcorrmat(
  data = jjcorrmat_test_data,
  dep = c("ki67_percent", "p53_score", "her2_intensity", "tumor_size_mm"),
  typestatistics = "parametric"
)

# View the plot
result$plot

Grouped Correlation Matrix

# Grouped correlation matrix by tumor grade
result_grouped <- jjcorrmat(
  data = jjcorrmat_test_data,
  dep = c("ki67_percent", "p53_score", "her2_intensity"),
  grvar = "tumor_grade",
  typestatistics = "nonparametric"
)

# View the grouped plot
result_grouped$plot2
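
Conceptually, a grouped analysis computes one correlation matrix per level of the grouping variable. The sketch below shows that idea in base R only. It uses the built-in iris data as a stand-in (not jjcorrmat_test_data) and is not how jjcorrmat itself is implemented.

```r
# Stand-in data: iris ships with base R (datasets package).
num_vars  <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
group_var <- "Species"

# Split the numeric columns by group, then compute one matrix per group.
by_group     <- split(iris[, num_vars], iris[[group_var]])
cor_by_group <- lapply(by_group, cor, method = "spearman")

# One 4 x 4 Spearman matrix per species
names(cor_by_group)
round(cor_by_group$setosa, 2)
```

Comparing the per-group matrices side by side is what the grouped plot does visually.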

Function Parameters

Core Parameters

data: Input data frame containing the variables to analyze
dep: Character vector naming the continuous variables to correlate (columns must be numeric; at least two are needed)
grvar: Optional grouping variable for creating separate correlation matrices
typestatistics: Type of correlation analysis to perform

Statistical Methods

The typestatistics parameter supports four different approaches:

“parametric” (default): Pearson product-moment correlation
- Inference assumes approximately normally distributed variables
- Measures the strength of linear relationships
- Most commonly used
“nonparametric”: Spearman’s rank correlation
- Distribution-free method
- Less sensitive to outliers than Pearson, because it works on ranks
- Captures monotonic relationships
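“robust”: Robust correlation
- Robust to outliers and non-normality
- Good compromise between parametric and nonparametric
- The estimator depends on the installed ggstatsplot version: older releases used the percentage bend correlation (WRS2::pbcor()), while recent releases report a Winsorized Pearson correlation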
“bayes”: Bayesian correlation analysis
- Provides Bayes factors for evidence assessment
- Incorporates prior beliefs
- Gives probabilistic interpretation
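
The difference between the parametric and nonparametric options can be seen with base R's cor(), independent of jjcorrmat. In this small sketch the toy vectors x and y are made up for illustration: the relationship is perfectly monotonic but not linear.

```r
# Toy data: y increases with x, but along a curve rather than a line.
x <- 1:10
y <- x^3

pearson  <- cor(x, y, method = "pearson")   # linear association
spearman <- cor(x, y, method = "spearman")  # monotonic association (rank-based)

# Spearman is exactly 1 because the ranks of x and y agree perfectly;
# Pearson is below 1 because the points do not fall on a straight line.
c(pearson = pearson, spearman = spearman)
```

A large gap between the two coefficients is a hint that the relationship is monotonic but curved, or that a few extreme values dominate the Pearson estimate.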

Performance Optimizations

Version History

The function has been significantly optimized for performance:

Previous Issues:
- Redundant data processing in .plot() and .plot2() methods
- Unused caching infrastructure
- Repeated formula construction and variable processing

Current Optimizations:
- Data Caching: Uses .prepareData() method to cache processed data
- Options Caching: Uses .prepareOptions() method to cache formula processing
- Eliminated Redundancy: Both plot methods now use cached results
- Better Progress Feedback: Clear user messaging during processing

Performance Benefits

# Performance comparison (conceptual)
# Before optimization: 
# - Data processed twice (once for each plot)
# - Formula construction repeated
# - Variable processing duplicated

# After optimization:
# - Data processed once and cached
# - Formula construction done once
# - Significant speedup for large datasets

Advanced Usage Examples

Multiple Statistical Methods

# Compare different correlation methods
methods <- c("parametric", "nonparametric", "robust", "bayes")
variables <- c("ki67_percent", "p53_score", "her2_intensity")

results <- list()
for (method in methods) {
  results[[method]] <- jjcorrmat(
    data = jjcorrmat_test_data,
    dep = variables,
    typestatistics = method
  )
}

# Each result contains the correlation matrix plot
# results$parametric$plot
# results$nonparametric$plot
# etc.

Clinical Research Example

# Analyze biomarker correlations in breast cancer data
biomarkers <- c("ki67_percent", "p53_score", "her2_intensity", "tumor_size_mm", "age_years")

# Overall correlation matrix
overall_corr <- jjcorrmat(
  data = jjcorrmat_test_data,
  dep = biomarkers,
  typestatistics = "parametric"
)

# Stratified by hormone receptor status
stratified_corr <- jjcorrmat(
  data = jjcorrmat_test_data,
  dep = biomarkers,
  grvar = "hormone_status",
  typestatistics = "parametric"
)

# Compare correlations across tumor grades
grade_corr <- jjcorrmat(
  data = jjcorrmat_test_data,
  dep = biomarkers[1:4],  # Exclude age for clarity
  grvar = "tumor_grade",
  typestatistics = "nonparametric"  # Spearman's rank correlation
)

Data Requirements

Input Data Structure

The input data should be a data frame with:

Continuous variables: Numeric columns for correlation analysis
Grouping variables: Factor or character columns for stratified analysis
Complete cases: Rows with missing values in the selected variables are excluded automatically
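
Before running the analysis, it can help to check these requirements directly. The helper below is a hypothetical base-R sketch (check_corr_input and the toy data frame are not part of ClinicoPath). It reports which selected columns are not numeric and how many rows would remain, assuming rows with any missing value in the selected variables are dropped.

```r
# Hypothetical helper: summarise whether `vars` in `data` are usable
# for a correlation analysis.
check_corr_input <- function(data, vars) {
  stopifnot(is.data.frame(data), all(vars %in% names(data)))
  is_num <- vapply(data[vars], is.numeric, logical(1))
  list(
    non_numeric = vars[!is_num],                     # columns to fix or drop
    n_rows      = nrow(data),                        # rows before exclusion
    n_complete  = sum(complete.cases(data[vars]))    # rows with no NA in vars
  )
}

# Toy data with missing values and one character column
toy <- data.frame(
  a = c(1.2, 2.4, NA, 4.1, 5.0),
  b = c(10, 9, 8, NA, 6),
  c = c("x", "y", "z", "x", "y")
)

res <- check_corr_input(toy, c("a", "b", "c"))
res
```

If n_complete is much smaller than n_rows, a single variable with many missing values may be responsible and is worth inspecting on its own.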

Example Data Structure

# Structure of test data
str(jjcorrmat_test_data)

# Key variables:
# - ki67_percent: Numeric (0-100)
# - p53_score: Numeric (0-50)
# - her2_intensity: Numeric (0-30)
# - tumor_size_mm: Numeric (5-50)
# - age_years: Numeric (18-90)
# - tumor_grade: Factor (Grade 1/2/3)
# - hormone_status: Factor (ER+/PR+, etc.)

Best Practices

Variable Selection

Choose appropriate variables: Select variables with a plausible theoretical relationship
Check distributions: Consider log transformation for skewed variables
Handle missing data: Decide on complete case analysis vs. imputation
Sample size: Ensure adequate sample size for stable correlations
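
To make the sample size point concrete, the sketch below computes an approximate confidence interval for a Pearson correlation using the Fisher z-transformation. The function name fisher_ci and the example values (r = 0.5, n = 20 and n = 200) are illustrative assumptions, not output from jjcorrmat.

```r
# Approximate CI for a Pearson correlation via the Fisher z-transformation.
# r: observed correlation, n: number of complete observations (n > 3).
fisher_ci <- function(r, n, level = 0.95) {
  stopifnot(abs(r) < 1, n > 3)
  z    <- atanh(r)                      # transform to the z scale
  se   <- 1 / sqrt(n - 3)               # standard error on the z scale
  crit <- qnorm(1 - (1 - level) / 2)    # two-sided critical value
  tanh(c(lower = z - crit * se, upper = z + crit * se))
}

ci_small <- fisher_ci(0.5, n = 20)
ci_large <- fisher_ci(0.5, n = 200)

# The same observed correlation is far less precisely estimated
# in the smaller sample.
rbind(n20 = ci_small, n200 = ci_large)
```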

Statistical Method Selection

Use parametric when variables are approximately normal
Use nonparametric for ordinal data or non-normal distributions
Use robust when there are outliers but you want parametric-like interpretation
Use Bayesian when you want to quantify evidence for/against correlations

Interpretation Guidelines

Correlation strength (absolute value of the coefficient; a rule of thumb, not a fixed standard):
- Below 0.30: Weak
- 0.30 to below 0.70: Moderate
- 0.70 to 1.00: Strong
Statistical significance: Consider both p-values and effect sizes
Clinical significance: Strong statistical correlation may not be clinically meaningful
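
The strength labels above can be applied programmatically. The helper below is a hypothetical sketch (classify_strength is not part of ClinicoPath). It works on the absolute value of the coefficient and assumes that the boundary values 0.30 and 0.70 belong to the higher category.

```r
# Hypothetical helper: label correlation coefficients by absolute size.
classify_strength <- function(r) {
  stopifnot(is.numeric(r), all(abs(r) <= 1, na.rm = TRUE))
  as.character(cut(
    abs(r),
    breaks = c(0, 0.30, 0.70, 1),
    labels = c("weak", "moderate", "strong"),
    right = FALSE,          # intervals are closed on the left: [0.30, 0.70)
    include.lowest = TRUE   # with right = FALSE this keeps |r| = 1 in "strong"
  ))
}

labels <- classify_strength(c(0.2, 0.3, -0.75, 1))
labels
```

These labels describe magnitude only; they say nothing about statistical or clinical significance.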

Troubleshooting

Common Issues

“Data contains no (complete) rows”
- Check for missing values in your variables
- Ensure at least some complete cases exist
No plot generated
- Verify that you have at least 2 continuous variables
- Check that variables are properly formatted as numeric
Slow performance
- The current version caches processed data, so data preparation runs once rather than once per plot
- For very large datasets, consider sampling
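
As a rough illustration of the sampling suggestion, the sketch below draws a random subset of rows from simulated data and compares the correlation in the subset with the full-data value. All names and sizes here (big, 100,000 rows, a subset of 5,000) are assumptions chosen for the example.

```r
set.seed(42)  # make the simulation and the sample reproducible

# Simulated "large" data set with a true correlation of about 0.6
n_full <- 1e5
big    <- data.frame(x = rnorm(n_full))
big$y  <- 0.6 * big$x + rnorm(n_full, sd = 0.8)

# Random subset of rows, sampled without replacement
idx    <- sample(nrow(big), size = 5000)
subset <- big[idx, ]

r_full   <- cor(big$x, big$y)
r_sample <- cor(subset$x, subset$y)
c(full = r_full, sample = r_sample)
```

For stratified analyses, sampling within each group (rather than across the whole data set) keeps small groups from being thinned out.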

Error Messages

# Example error handling (my_data, var1, and var2 are placeholders for your own data)
tryCatch({
  result <- jjcorrmat(
    data = my_data,
    dep = c("var1", "var2"),
    typestatistics = "parametric"
  )
}, error = function(e) {
  message("Error in correlation analysis: ", e$message)
  message("Check your data structure and variable types")
})

Technical Details

Underlying Functions

The jjcorrmat function is built on:

ggstatsplot::ggcorrmat: For single correlation matrices
ggstatsplot::grouped_ggcorrmat: For grouped analyses
jmvcore: For data handling and formula processing

Caching Implementation

# Internal caching structure (conceptual)
# private$.processedData: Cached cleaned data
# private$.processedOptions: Cached formula and variable processing
# 
# Benefits:
# - Eliminates redundant jmvcore::naOmit() calls
# - Avoids repeated formula construction
# - Shares processed data between plot methods
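
The caching idea can be sketched in plain R with a closure. This is a minimal illustration of the pattern only, not the package's actual R6 code: make_cached_prep, prepare, and runs are hypothetical names, stats::na.omit() stands in for jmvcore::naOmit(), and the built-in airquality data is used as example input.

```r
# Hypothetical sketch: prepare the data once, then reuse the cached result.
make_cached_prep <- function(data, vars) {
  cache  <- NULL   # holds the processed data after the first call
  n_runs <- 0L     # counts how often the expensive step actually ran

  prepare <- function() {
    if (is.null(cache)) {
      n_runs <<- n_runs + 1L
      cache  <<- stats::na.omit(data[vars])  # the "expensive" cleaning step
    }
    cache
  }

  list(prepare = prepare, runs = function() n_runs)
}

prep <- make_cached_prep(airquality, c("Ozone", "Solar.R", "Wind"))

plot_data_1 <- prep$prepare()  # first caller: cleans and caches the data
plot_data_2 <- prep$prepare()  # second caller: reuses the cached data
prep$runs()                    # number of times the cleaning step ran
```

Both callers receive the same cleaned data frame, while the cleaning step runs only on the first call.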

Conclusion

The optimized jjcorrmat function provides:

High performance: Significant speed improvements through caching
Flexibility: Multiple statistical methods and grouping options
Reliability: Comprehensive error handling and validation
Usability: Clear documentation and examples

The function is well-suited for clinical research, biomarker analysis, and any scenario requiring robust correlation matrix visualization.

Session Information

sessionInfo()

## R version 4.5.1 (2025-06-13)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sequoia 15.5
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: Europe/Istanbul
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.37     desc_1.4.3        R6_2.6.1          fastmap_1.2.0    
##  [5] xfun_0.52         cachem_1.1.0      knitr_1.50        htmltools_0.5.8.1
##  [9] rmarkdown_2.29    lifecycle_1.0.4   cli_3.6.5         sass_0.4.10      
## [13] pkgdown_2.1.3     textshaping_1.0.1 jquerylib_0.1.4   systemfonts_1.2.3
## [17] compiler_4.5.1    rstudioapi_0.17.1 tools_4.5.1       ragg_1.4.0       
## [21] bslib_0.9.0       evaluate_1.0.4    yaml_2.3.10       jsonlite_2.0.0   
## [25] rlang_1.1.6       fs_1.6.6          htmlwidgets_1.6.4

ClinicoPath

2025-07-13