jjcorrmat: Comprehensive Correlation Matrix Analysis
ClinicoPath
2025-07-13
Source: vignettes/jjstatsplot-25-jjcorrmat-comprehensive.Rmd
Overview
The jjcorrmat function provides a comprehensive interface for creating correlation matrices with advanced statistical analysis capabilities. This function is a wrapper around ggstatsplot::ggcorrmat and ggstatsplot::grouped_ggcorrmat, offering both single and grouped correlation matrix visualizations.
Key Features
- Multiple correlation methods: Parametric (Pearson), nonparametric (Spearman), robust, and Bayesian
- Grouped analysis: Create separate correlation matrices for different groups
- Performance optimized: Uses internal caching to eliminate redundant computations
- Comprehensive visualization: High-quality correlation plots with statistical annotations
Installation and Setup
# Install ClinicoPath from GitHub if it is not already installed
# (requires the devtools package)
if (!requireNamespace("ClinicoPath", quietly = TRUE)) {
  devtools::install_github("sbalci/ClinicoPathJamoviModule")
}
library(ClinicoPath)
library(ggplot2)
Function Parameters
Core Parameters
- data: Input data frame containing the variables to analyze
- dep: List of continuous variables for correlation analysis (must be numeric)
- grvar: Optional grouping variable for creating separate correlation matrices
- typestatistics: Type of correlation analysis to perform
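As a rough illustration of what a grouping variable does conceptually, the sketch below computes one correlation matrix per group level using only base R and the built-in iris data. It is not how jjcorrmat is implemented internally; the dataset and variable names are stand-ins chosen for illustration.

```r
# Illustration only: one Pearson correlation matrix per level of a grouping
# variable, using base R and the built-in iris data (jjcorrmat is not called).
numeric_vars <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# Split the numeric columns by the grouping factor
by_group <- split(iris[, numeric_vars], iris$Species)

# Compute a correlation matrix within each group
group_matrices <- lapply(by_group, cor, method = "pearson")

# One 4 x 4 matrix per species
round(group_matrices$setosa, 2)
```

A grouped plot presents the same idea visually: each panel corresponds to one level of the grouping variable.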
Statistical Methods
The typestatistics parameter supports four different approaches:
- "parametric" (default): Pearson product-moment correlation
  - Assumes normal distribution
  - Best for linear relationships
  - Most commonly used
- "nonparametric": Spearman's rank correlation
  - Distribution-free method
  - Robust to outliers
  - Captures monotonic relationships
- "robust": Percentage bend correlation
  - Robust to outliers and non-normality
  - Good compromise between parametric and nonparametric
  - Uses WRS2::pbcor() (recent ggstatsplot releases document the robust option as a Winsorized Pearson correlation, so the estimator may depend on the installed version)
- "bayes": Bayesian correlation analysis
  - Provides Bayes factors for evidence assessment
  - Incorporates prior beliefs
  - Gives probabilistic interpretation
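To make the difference between the parametric and nonparametric options concrete, here is a minimal base R sketch on made-up vectors. It calls stats::cor() directly rather than jjcorrmat, so it only illustrates the two coefficients, not the plotting function.

```r
# Illustration with base R only (jjcorrmat is not called here).
# A perfectly monotonic but nonlinear relationship:
x <- 1:10
y <- x^3

pearson_r  <- cor(x, y, method = "pearson")   # measures linear association
spearman_r <- cor(x, y, method = "spearman")  # measures monotonic association

# Spearman's rho is computed on ranks, so it equals 1 here,
# while Pearson's r falls below 1 because the relationship is curved.
c(pearson = pearson_r, spearman = spearman_r)

# A single extreme value lowers Pearson's r but leaves the ranks unchanged
y_outlier <- c(1:9, 100)
pearson_outlier  <- cor(x, y_outlier, method = "pearson")
spearman_outlier <- cor(x, y_outlier, method = "spearman")
```

When the two coefficients disagree noticeably, that is often a hint to inspect scatterplots for curvature or outliers before choosing a method.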
Performance Optimizations
Version History
The function has been significantly optimized for performance:
Previous Issues:
- Redundant data processing in .plot() and .plot2() methods
- Unused caching infrastructure
- Repeated formula construction and variable processing
Current Optimizations:
- Data Caching: Uses .prepareData() method to cache processed data
- Options Caching: Uses .prepareOptions() method to cache formula processing
- Eliminated Redundancy: Both plot methods now use cached results
- Better Progress Feedback: Clear user messaging during processing
Performance Benefits
# Performance comparison (conceptual)
# Before optimization:
# - Data processed twice (once for each plot)
# - Formula construction repeated
# - Variable processing duplicated
# After optimization:
# - Data processed once and cached
# - Formula construction done once
# - Significant speedup for large datasets
Advanced Usage Examples
Multiple Statistical Methods
# Compare different correlation methods
methods <- c("parametric", "nonparametric", "robust", "bayes")
variables <- c("ki67_percent", "p53_score", "her2_intensity")
results <- list()
for (method in methods) {
  results[[method]] <- jjcorrmat(
    data = jjcorrmat_test_data,
    dep = variables,
    typestatistics = method
  )
}
# Each result contains the correlation matrix plot
# results$parametric$plot
# results$nonparametric$plot
# etc.
Clinical Research Example
# Analyze biomarker correlations in breast cancer data
biomarkers <- c("ki67_percent", "p53_score", "her2_intensity", "tumor_size_mm", "age_years")
# Overall correlation matrix
overall_corr <- jjcorrmat(
  data = jjcorrmat_test_data,
  dep = biomarkers,
  typestatistics = "parametric"
)
# Stratified by hormone receptor status
stratified_corr <- jjcorrmat(
  data = jjcorrmat_test_data,
  dep = biomarkers,
  grvar = "hormone_status",
  typestatistics = "parametric"
)
# Compare correlations across tumor grades
grade_corr <- jjcorrmat(
  data = jjcorrmat_test_data,
  dep = biomarkers[1:4], # Exclude age for clarity
  grvar = "tumor_grade",
  typestatistics = "nonparametric" # Spearman's rank correlation
)
Data Requirements
Input Data Structure
The input data should be a data frame with:
- Continuous variables: Numeric columns for correlation analysis
- Grouping variables: Factor or character columns for stratified analysis
- Complete cases: Rows with missing values in the selected variables are excluded automatically
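Before passing data to the function, it can help to confirm that each column has the expected type. The following base R sketch uses a small made-up data frame; the column names are hypothetical and not part of jjcorrmat_test_data.

```r
# Hypothetical raw data; column names are made up for illustration.
raw_data <- data.frame(
  marker_a = c("12.5", "30.1", "45.0", "22.3"),  # numbers stored as text
  marker_b = c(3.2, 8.9, 5.5, 7.1),
  site     = c("A", "B", "A", "B"),
  stringsAsFactors = FALSE
)

# Inspect column classes before analysis
vapply(raw_data, function(col) class(col)[1], character(1))

# Convert text to numeric and the grouping column to a factor
clean_data <- transform(
  raw_data,
  marker_a = as.numeric(marker_a),
  site     = factor(site)
)

# Confirm the classes after conversion
vapply(clean_data, function(col) class(col)[1], character(1))
```

Note that as.numeric() returns NA (with a warning) for text that cannot be parsed as a number, so it is worth checking for newly introduced missing values after conversion.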
Example Data Structure
# Structure of test data
str(jjcorrmat_test_data)
# Key variables:
# - ki67_percent: Numeric (0-100)
# - p53_score: Numeric (0-50)
# - her2_intensity: Numeric (0-30)
# - tumor_size_mm: Numeric (5-50)
# - age_years: Numeric (18-90)
# - tumor_grade: Factor (Grade 1/2/3)
# - hormone_status: Factor (ER+/PR+, etc.)
Best Practices
Variable Selection
- Choose appropriate variables: Select variables that theoretically should be correlated
- Check distributions: Consider log transformation for skewed variables
- Handle missing data: Decide on complete case analysis vs. imputation
- Sample size: Ensure adequate sample size for stable correlations
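One way to judge whether a sample is large enough for a stable estimate is to look at the width of an approximate confidence interval for the correlation. The sketch below uses the standard Fisher z-transformation in base R; fisher_ci() is a hypothetical helper written for this illustration and is not part of ClinicoPath or ggstatsplot.

```r
# Approximate confidence interval for a Pearson correlation via the
# Fisher z-transformation (standard textbook formula; base R only).
# fisher_ci() is a hypothetical helper defined here for illustration.
fisher_ci <- function(r, n, conf_level = 0.95) {
  stopifnot(abs(r) < 1, n > 3)
  z  <- atanh(r)                          # Fisher z-transform of r
  se <- 1 / sqrt(n - 3)                   # approximate standard error of z
  q  <- qnorm(1 - (1 - conf_level) / 2)   # normal quantile for the interval
  tanh(c(lower = z - q * se, upper = z + q * se))  # back-transform to r scale
}

# Same observed correlation, different sample sizes
ci_small <- fisher_ci(r = 0.40, n = 20)
ci_large <- fisher_ci(r = 0.40, n = 200)

ci_small
ci_large
```

The interval for the smaller sample is much wider, which is the practical meaning of an unstable correlation estimate.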
Troubleshooting
Common Issues
- "Data contains no (complete) rows"
  - Check for missing values in your variables
  - Ensure at least some complete cases exist
- No plot generated
  - Verify that you have at least 2 continuous variables
  - Check that variables are properly formatted as numeric
- Slow performance
  - The cached implementation processes the data once rather than once per plot, which should reduce run time
  - For very large datasets, consider sampling
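The checks above can be run quickly in base R before calling the function. This is a minimal sketch on simulated data; the data frame and variable names are hypothetical, and the subsample size is an arbitrary choice for illustration.

```r
# Hypothetical large data set, simulated for illustration
set.seed(42)
big_data <- data.frame(
  x = rnorm(50000),
  y = rnorm(50000),
  z = rnorm(50000)
)
big_data$x[1:100] <- NA  # introduce some missing values

vars <- c("x", "y", "z")

# Check 1: how many complete rows are available?
n_complete <- sum(complete.cases(big_data[vars]))

# Check 2: are at least two of the selected variables numeric?
n_numeric <- sum(vapply(big_data[vars], is.numeric, logical(1)))

# Optional: draw a random subset of rows to speed up exploratory plotting
sample_rows <- sample(nrow(big_data), size = 5000)
small_data  <- big_data[sample_rows, ]
```

Setting a seed before sampling keeps the subset reproducible, which matters if the exploratory plot is later compared with the full analysis.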
Technical Details
Underlying Functions
The jjcorrmat function is built on:
- ggstatsplot::ggcorrmat: For single correlation matrices
- ggstatsplot::grouped_ggcorrmat: For grouped analyses
- jmvcore: For data handling and formula processing
Caching Implementation
# Internal caching structure (conceptual)
# private$.processedData: Cached cleaned data
# private$.processedOptions: Cached formula and variable processing
#
# Benefits:
# - Eliminates redundant jmvcore::naOmit() calls
# - Avoids repeated formula construction
# - Shares processed data between plot methods
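The compute-once idea behind this kind of caching can be sketched with a plain R closure. All names below are hypothetical and the code is not taken from the ClinicoPath source; stats::na.omit() stands in for the data-cleaning step, and the built-in airquality data is used only because it contains missing values.

```r
# Conceptual sketch of compute-once caching using a closure.
# All names here are hypothetical; this is not the ClinicoPath source code.
make_cached_preparer <- function(data, vars) {
  cache <- NULL       # holds the cleaned data after the first call
  n_computed <- 0     # counts how often the expensive step actually ran

  list(
    prepare = function() {
      if (is.null(cache)) {
        # Expensive step: keep selected columns and drop incomplete rows
        cache <<- stats::na.omit(data[, vars, drop = FALSE])
        n_computed <<- n_computed + 1
      }
      cache
    },
    times_computed = function() n_computed
  )
}

prep <- make_cached_preparer(airquality, c("Ozone", "Solar.R", "Wind"))
first  <- prep$prepare()   # computes and stores the result
second <- prep$prepare()   # returns the stored result without recomputing
prep$times_computed()
```

Two consumers (for example, two plot methods) can call prepare() independently and still trigger the cleaning step only once.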
Conclusion
The optimized jjcorrmat function provides:
- High performance: Significant speed improvements through caching
- Flexibility: Multiple statistical methods and grouping options
- Reliability: Comprehensive error handling and validation
- Usability: Clear documentation and examples
The function is well-suited for clinical research, biomarker analysis, and any scenario requiring robust correlation matrix visualization.
Session Information
## R version 4.5.1 (2025-06-13)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sequoia 15.5
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: Europe/Istanbul
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.37 desc_1.4.3 R6_2.6.1 fastmap_1.2.0
## [5] xfun_0.52 cachem_1.1.0 knitr_1.50 htmltools_0.5.8.1
## [9] rmarkdown_2.29 lifecycle_1.0.4 cli_3.6.5 sass_0.4.10
## [13] pkgdown_2.1.3 textshaping_1.0.1 jquerylib_0.1.4 systemfonts_1.2.3
## [17] compiler_4.5.1 rstudioapi_0.17.1 tools_4.5.1 ragg_1.4.0
## [21] bslib_0.9.0 evaluate_1.0.4 yaml_2.3.10 jsonlite_2.0.0
## [25] rlang_1.1.6 fs_1.6.6 htmlwidgets_1.6.4