jjdotplotstats: Comprehensive Dot Plot Analysis
ClinicoPath
2025-07-13
Source: vignettes/jjstatsplot-26-jjdotplotstats-comprehensive.Rmd
Overview
The jjdotplotstats function provides an interface for creating statistical dot plots that compare continuous variables across different groups. It is a wrapper around ggstatsplot::ggdotplotstats and ggstatsplot::grouped_ggdotplotstats, offering both single and grouped dot plot visualizations with accompanying statistical analysis.
Key Features
- Multiple statistical methods: Parametric (t-tests), nonparametric (Mann-Whitney U), robust, and Bayesian
- Grouped analysis: Create separate dot plots for different subgroups
- Performance optimized: Uses internal caching to eliminate redundant computations
- Effect size calculations: Multiple effect size options (Cohen’s d, Hedges’ g, eta-squared, omega-squared)
- Centrality measures: Optional display of mean, median, or robust centrality measures
- Comprehensive customization: Titles, labels, themes, and statistical results display
Installation and Setup
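# Install ClinicoPath if not already installed
# (requireNamespace() checks for the package without attaching it)
if (!requireNamespace("ClinicoPath", quietly = TRUE)) {
  # Assumes devtools is available; run install.packages("devtools") first if not
  devtools::install_github("sbalci/ClinicoPathJamoviModule")
}
library(ClinicoPath)
library(ggplot2)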
Quick Start
Basic Dot Plot
# Load test data
data(jjdotplotstats_test_data)
# Basic dot plot comparing CRP levels across disease severity
result <- jjdotplotstats(
  data = jjdotplotstats_test_data,
  dep = "crp_level",
  group = "disease_severity",
  typestatistics = "parametric"
)
# View the plot
result$plot
Grouped Dot Plot
# Grouped dot plot by treatment center
result_grouped <- jjdotplotstats(
  data = jjdotplotstats_test_data,
  dep = "esr_level",
  group = "disease_severity",
  grvar = "treatment_center",
  typestatistics = "nonparametric",
  centralityplotting = TRUE
)
# View the grouped plot
result_grouped$plot2
Function Parameters
Core Parameters
- data: Input data frame containing the variables to analyze
- dep: Continuous numeric variable for the dot plot (dependent variable)
- group: Categorical variable defining the groups for comparison
- grvar: Optional grouping variable for creating separate plots
- typestatistics: Type of statistical analysis to perform
Statistical Methods
The typestatistics parameter supports four different approaches:
- "parametric" (default): Independent t-tests or one-way ANOVA
  - Assumes normality and equal variances
  - Most powerful when assumptions are met
  - Effect sizes: Cohen’s d or Hedges’ g
- "nonparametric": Mann-Whitney U test or Kruskal-Wallis test
  - Distribution-free method
  - Robust to outliers and non-normality
  - Based on rank comparisons
- "robust": Robust statistical tests
  - Robust to outliers and violations of normality
  - Uses trimmed means and robust estimators
  - Good compromise between parametric and nonparametric
- "bayes": Bayesian statistical analysis
  - Provides Bayes factors for evidence assessment
  - Incorporates prior beliefs
  - Gives probabilistic interpretation of results
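To make the two frequentist families concrete, the following base R sketch runs the classical tests named above on simulated data. The data frame `toy` and its columns are hypothetical, and the calls illustrate the named tests themselves; they are not a claim about which routines jjdotplotstats or ggstatsplot call internally.

```r
# Simulated data: a hypothetical marker measured in three severity groups
set.seed(42)
toy <- data.frame(
  marker = c(rnorm(30, mean = 5, sd = 1),
             rnorm(30, mean = 6, sd = 1),
             rnorm(30, mean = 8, sd = 1)),
  severity = factor(rep(c("Mild", "Moderate", "Severe"), each = 30))
)

# Two-group subset for the two-sample tests (unused level dropped)
two <- droplevels(subset(toy, severity != "Moderate"))

# Parametric: Student's t-test (equal variances assumed) and one-way ANOVA
t_res   <- t.test(marker ~ severity, data = two, var.equal = TRUE)
aov_res <- oneway.test(marker ~ severity, data = toy, var.equal = TRUE)

# Nonparametric: Mann-Whitney U (Wilcoxon rank-sum) and Kruskal-Wallis
mw_res <- wilcox.test(marker ~ severity, data = two)
kw_res <- kruskal.test(marker ~ severity, data = toy)

# Collect the p-values side by side for comparison
p_values <- c(
  t_test         = t_res$p.value,
  anova          = aov_res$p.value,
  mann_whitney   = mw_res$p.value,
  kruskal_wallis = kw_res$p.value
)
print(signif(p_values, 3))
```

With well-separated groups like these, all four tests agree; the choice matters most when distributions are skewed or contain outliers.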
Performance Optimizations
Version History
The function has been significantly optimized for performance:
Previous Issues:
- Redundant data processing in .plot() and .plot2() methods
- No caching infrastructure
- Repeated variable conversion and formula construction
- Duplicated options processing
Current Optimizations:
- Data Caching: Uses .prepareData() method to cache processed data
- Options Caching: Uses .prepareOptions() method to cache option processing
- Eliminated Redundancy: Both plot methods now use cached results
- Better Progress Feedback: Clear user messaging during processing
- Efficient Variable Conversion: Numeric conversion done once and cached
Performance Benefits
# Performance comparison (conceptual)
# Before optimization:
# - Data processed twice (once for each plot)
# - Variable conversion repeated
# - Options processing duplicated
# After optimization:
# - Data processed once and cached
# - Variable conversion done once
# - Significant speedup for large datasets
Advanced Usage Examples
Multiple Statistical Methods Comparison
# Compare different statistical methods
methods <- c("parametric", "nonparametric", "robust", "bayes")
results <- list()
for (method in methods) {
  results[[method]] <- jjdotplotstats(
    data = jjdotplotstats_test_data,
    dep = "crp_level",
    group = "disease_severity",
    typestatistics = method
  )
}
# Each result contains the dot plot with method-specific statistics
# results$parametric$plot
# results$nonparametric$plot
# etc.
Clinical Biomarker Analysis
# Comprehensive biomarker analysis
biomarkers <- c("crp_level", "esr_level", "platelet_count", "hemoglobin_level")
# Analyze each biomarker across disease severity groups
biomarker_results <- list()
for (biomarker in biomarkers) {
  biomarker_results[[biomarker]] <- jjdotplotstats(
    data = jjdotplotstats_test_data,
    dep = biomarker,
    group = "disease_severity",
    typestatistics = "parametric",
    effsizetype = "unbiased",
    centralityplotting = TRUE,
    centralitytype = "parametric",
    mytitle = paste("Distribution of", biomarker, "by Disease Severity"),
    xtitle = biomarker,
    ytitle = "Disease Severity"
  )
}
Gender-Stratified Analysis
# Analyze hemoglobin levels by disease severity, stratified by gender
hgb_gender_analysis <- jjdotplotstats(
  data = jjdotplotstats_test_data,
  dep = "hemoglobin_level",
  group = "disease_severity",
  grvar = "gender",
  typestatistics = "parametric",
  effsizetype = "unbiased",
  centralityplotting = TRUE,
  mytitle = "Hemoglobin Levels by Disease Severity and Gender",
  xtitle = "Hemoglobin Level (g/dL)",
  ytitle = "Disease Severity"
)
Treatment Response Analysis
# Analyze biomarker levels by treatment response
# (excluding healthy patients who don't have treatment)
treated_patients <- subset(
  jjdotplotstats_test_data,
  treatment_response != "N/A"
)
response_analysis <- jjdotplotstats(
  data = treated_patients,
  dep = "crp_level",
  group = "treatment_response",
  grvar = "comorbidity_status",
  typestatistics = "nonparametric",
  centralityplotting = TRUE,
  centralitytype = "nonparametric",
  mytitle = "CRP Levels by Treatment Response and Comorbidity Status",
  xtitle = "CRP Level (mg/L)",
  ytitle = "Treatment Response"
)
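One detail worth checking when filtering rows this way: if the grouping column is a factor, subset() removes the rows but keeps the excluded value as an empty factor level. The sketch below uses a hypothetical toy data frame to show the effect and how droplevels() removes it. Whether jjdotplotstats drops unused levels on its own is not stated here, so treat the explicit call as a defensive step.

```r
# Hypothetical toy data mimicking a treatment_response column
toy <- data.frame(
  crp = c(12.1, 8.4, 3.2, 15.8, 2.9, 7.7),
  treatment_response = factor(
    c("Responder", "Non-responder", "N/A",
      "Responder", "N/A", "Non-responder")
  )
)

# subset() removes the rows but keeps "N/A" as an (empty) factor level
treated <- subset(toy, treatment_response != "N/A")
levels_before <- levels(treated$treatment_response)

# droplevels() removes factor levels that no longer occur in the data
treated <- droplevels(treated)
levels_after <- levels(treated$treatment_response)

print(levels_before)
print(levels_after)
print(table(treated$treatment_response))
```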
Data Requirements
Input Data Structure
The input data should be a data frame with:
- Continuous variable: Numeric column for the dot plot (dependent variable)
- Grouping variable: Factor or character column defining comparison groups
- Optional grouping variable: Additional factor for stratified analysis
- Complete cases: Missing values are automatically excluded
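These requirements can be checked before plotting. The helper below, `check_dotplot_data()`, is hypothetical (it is not part of ClinicoPath) and is only a minimal sketch of such a pre-flight check using base R, run here on made-up data.

```r
# Hypothetical helper: verifies the structure described above
check_dotplot_data <- function(data, dep, group) {
  stopifnot(is.data.frame(data), dep %in% names(data), group %in% names(data))
  if (!is.numeric(data[[dep]])) {
    stop("'", dep, "' must be numeric")
  }
  if (!is.factor(data[[group]]) && !is.character(data[[group]])) {
    stop("'", group, "' must be a factor or character column")
  }
  # Rows with no missing value in either variable
  complete <- complete.cases(data[, c(dep, group)])
  list(
    n_total     = nrow(data),
    n_complete  = sum(complete),
    n_per_group = table(data[[group]][complete])
  )
}

# Made-up data with one missing value in each column
toy <- data.frame(
  crp      = c(4.2, NA, 9.1, 12.5, 3.3, 7.8),
  severity = c("Mild", "Mild", "Severe", "Severe", NA, "Mild")
)

summary_info <- check_dotplot_data(toy, dep = "crp", group = "severity")
print(summary_info)
```

Reporting the number of complete cases per group makes it visible how many observations would be excluded before any statistics are computed.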
Example Data Structure
# Structure of test data
str(jjdotplotstats_test_data)
# Key variables:
# - crp_level: C-reactive protein levels (mg/L)
# - esr_level: Erythrocyte sedimentation rate (mm/hr)
# - platelet_count: Platelet count (×10³/μL)
# - hemoglobin_level: Hemoglobin levels (g/dL)
# - disease_severity: Ordered factor (Healthy, Mild, Moderate, Severe)
# - treatment_center: Factor (Center A, B, C, D)
# - gender: Factor (Male, Female)
Best Practices
Variable Selection and Preparation
- Choose appropriate variables:
  - Dependent variable should be continuous and numeric
  - Grouping variables should be categorical with meaningful levels
- Check distributions:
  - Examine histograms and Q-Q plots
  - Consider transformations for highly skewed data
- Handle missing data:
  - Decide on complete case analysis vs. imputation
  - Document missing data patterns
- Sample size considerations:
  - Ensure adequate sample size within each group
  - Consider power analysis for effect size detection
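Several of these checks can be done with base R before calling the plotting function. The sketch below uses simulated, deliberately right-skewed data (the `toy` data frame is hypothetical) to show group sizes, a per-group Shapiro-Wilk test on the raw and log scales, and a simple power calculation with stats::power.t.test().

```r
set.seed(1)

# Hypothetical right-skewed marker (log-normal) in two groups
toy <- data.frame(
  marker = c(rlnorm(40, meanlog = 1.0, sdlog = 0.8),
             rlnorm(40, meanlog = 1.5, sdlog = 0.8)),
  group = factor(rep(c("A", "B"), each = 40))
)

# Sample size within each group
group_sizes <- table(toy$group)
print(group_sizes)

# Shapiro-Wilk normality test per group, on the raw and the log scale
p_raw <- tapply(toy$marker, toy$group, function(x) shapiro.test(x)$p.value)
p_log <- tapply(log(toy$marker), toy$group, function(x) shapiro.test(x)$p.value)
print(round(rbind(raw = p_raw, log = p_log), 3))

# Per-group sample size needed to detect a standardized mean difference
# of 0.5 (a "medium" effect) with 80% power in a two-sided t-test
pw <- power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80)
n_needed <- ceiling(pw$n)
print(n_needed)
```

A formal normality test is only one input; with small groups it has little power, and with large groups it flags trivial deviations, so it is best read alongside histograms and Q-Q plots.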
Statistical Method Selection
- Use parametric when data is approximately normal with equal variances
- Use nonparametric for ordinal data, non-normal distributions, or small samples
- Use robust when there are outliers but parametric interpretation is desired
- Use Bayesian when you want to quantify evidence and incorporate prior knowledge
Effect Size Interpretation
- Cohen’s d / Hedges’ g:
  - 0.2: Small effect
  - 0.5: Medium effect
  - 0.8: Large effect
- Eta-squared / Omega-squared:
  - 0.01: Small effect (1% variance explained)
  - 0.06: Medium effect (6% variance explained)
  - 0.14: Large effect (14% variance explained)
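For readers who want to see where these numbers come from, the following sketch computes the four effect sizes by hand in base R on simulated data. It uses the textbook formulas (pooled standard deviation for Cohen's d, the common approximate small-sample correction for Hedges' g); values reported by jjdotplotstats may be computed by a different routine and can differ slightly.

```r
set.seed(7)
x <- rnorm(25, mean = 10, sd = 2)  # hypothetical group A
y <- rnorm(25, mean = 12, sd = 2)  # hypothetical group B
n1 <- length(x)
n2 <- length(y)

# Cohen's d: mean difference divided by the pooled standard deviation
sd_pooled <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2))
d <- (mean(y) - mean(x)) / sd_pooled

# Hedges' g: d multiplied by an approximate small-sample correction factor
J <- 1 - 3 / (4 * (n1 + n2) - 9)
g <- d * J

# Eta-squared and omega-squared from a one-way ANOVA with three groups
toy <- data.frame(
  value = c(x, y, rnorm(25, mean = 11, sd = 2)),
  group = factor(rep(c("A", "B", "C"), each = 25))
)
tab <- anova(lm(value ~ group, data = toy))
ss_between <- tab[["Sum Sq"]][1]
ss_within  <- tab[["Sum Sq"]][2]
ms_within  <- tab[["Mean Sq"]][2]
df_between <- tab[["Df"]][1]

eta_sq   <- ss_between / (ss_between + ss_within)
omega_sq <- (ss_between - df_between * ms_within) /
  (ss_between + ss_within + ms_within)

print(round(c(cohens_d = d, hedges_g = g, eta_sq = eta_sq, omega_sq = omega_sq), 3))
```

Hedges' g is always slightly smaller in magnitude than Cohen's d, and omega-squared is always smaller than eta-squared, because both apply a correction for small-sample bias.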
Customization Tips
# Highly customized dot plot
custom_plot <- jjdotplotstats(
  data = jjdotplotstats_test_data,
  dep = "crp_level",
  group = "disease_severity",
  typestatistics = "parametric",
  effsizetype = "unbiased",
  centralityplotting = TRUE,
  centralitytype = "parametric",
  mytitle = "C-Reactive Protein Levels Across Disease Severity Groups",
  xtitle = "CRP Level (mg/L)",
  ytitle = "Disease Severity",
  originaltheme = FALSE,  # Use jamovi theme
  resultssubtitle = TRUE  # Show statistical results
)
Troubleshooting
Common Issues
- “Data contains no (complete) rows”
  - Check for missing values in your variables
  - Ensure at least some complete cases exist
  - Consider imputation or subset analysis
- No plot generated
  - Verify that you have specified both dep and group
  - Check that variables exist in your dataset
  - Ensure proper variable types (numeric for dep, factor for group)
- Slow performance
  - The current version caches processed data and options, so it should be faster than earlier versions
  - For very large datasets, consider sampling
  - Check for memory constraints
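If sampling is used to speed up plotting, drawing the sample within each group keeps small groups from disappearing. The following is a minimal base R sketch on simulated data; the data frame `big` and the cap of 200 rows per group are arbitrary assumptions for illustration.

```r
set.seed(123)

# Hypothetical large dataset with unbalanced groups
big <- data.frame(
  value = rnorm(10000),
  group = factor(sample(c("A", "B", "C"), 10000,
                        replace = TRUE, prob = c(0.6, 0.3, 0.1)))
)

per_group <- 200  # arbitrary cap on rows kept per group

# Split row indices by group, then sample at most `per_group` from each
idx <- unlist(lapply(split(seq_len(nrow(big)), big$group), function(i) {
  if (length(i) <= per_group) i else sample(i, per_group)
}))
small <- big[idx, ]

print(table(small$group))
```

Statistics computed on the sample describe the sample, so any reported test results should state that a subsample was analyzed.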
Error Handling
# Example error handling
tryCatch({
  result <- jjdotplotstats(
    data = my_data,
    dep = "continuous_var",
    group = "group_var",
    typestatistics = "parametric"
  )
}, error = function(e) {
  message("Error in dot plot analysis: ", e$message)
  message("Check your data structure and variable types")
})
Technical Details
Underlying Functions
The jjdotplotstats function is built on:
- ggstatsplot::ggdotplotstats: For single dot plots
- ggstatsplot::grouped_ggdotplotstats: For grouped analyses
- jmvcore: For data handling and option processing
Caching Implementation
# Internal caching structure (conceptual)
# private$.processedData: Cached cleaned data with converted variables
# private$.processedOptions: Cached option processing and titles
#
# Benefits:
# - Eliminates redundant jmvcore::naOmit() calls
# - Avoids repeated variable conversion
# - Shares processed data between plot methods
# - Reduces option processing overhead
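The prepare-once idea can be illustrated outside of jamovi with a plain R closure. This is a minimal sketch of the general pattern only: `make_analysis()`, `plot_data()`, `plot2_data()`, and `times_prepared()` are hypothetical names, and the actual module stores its cache in private fields of its analysis class as outlined above.

```r
# Hypothetical sketch: two consumers share one cached, cleaned data frame
make_analysis <- function(data, dep, group) {
  cache <- NULL     # processed data, filled on first use
  n_prepared <- 0   # how many times the expensive step actually ran

  prepare_data <- function() {
    if (is.null(cache)) {
      n_prepared <<- n_prepared + 1
      # Drop incomplete rows and convert the dependent variable once
      keep <- complete.cases(data[, c(dep, group)])
      cleaned <- data[keep, , drop = FALSE]
      cleaned[[dep]] <- as.numeric(cleaned[[dep]])
      cache <<- cleaned
    }
    cache
  }

  list(
    plot_data      = function() prepare_data(),  # used by the single plot
    plot2_data     = function() prepare_data(),  # used by the grouped plot
    times_prepared = function() n_prepared
  )
}

toy <- data.frame(
  crp      = c(4.2, NA, 9.1, 12.5),
  severity = c("Mild", "Mild", "Severe", "Severe")
)

analysis <- make_analysis(toy, dep = "crp", group = "severity")
d1 <- analysis$plot_data()
d2 <- analysis$plot2_data()
print(analysis$times_prepared())
```

Both accessors return the same cleaned data, while the cleaning step runs only on the first request.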
Clinical Applications
Biomarker Studies
Dot plots are particularly useful for:
- Diagnostic biomarker evaluation: Comparing levels across disease groups
- Treatment monitoring: Assessing biomarker changes over time
- Cohort comparisons: Analyzing differences between study populations
- Quality control: Identifying outliers and batch effects
Conclusion
The optimized jjdotplotstats function provides:
- High performance: Significant speed improvements through caching
- Statistical rigor: Multiple statistical methods with effect sizes
- Flexibility: Extensive customization and grouping options
- Clinical relevance: Designed for biomarker and clinical research applications
- Usability: Clear documentation and comprehensive examples
The function is well-suited for clinical research, biomarker analysis, and any scenario requiring robust comparison of continuous variables across groups.
Session Information
## R version 4.5.1 (2025-06-13)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sequoia 15.5
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: Europe/Istanbul
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.37 desc_1.4.3 R6_2.6.1 fastmap_1.2.0
## [5] xfun_0.52 cachem_1.1.0 knitr_1.50 htmltools_0.5.8.1
## [9] rmarkdown_2.29 lifecycle_1.0.4 cli_3.6.5 sass_0.4.10
## [13] pkgdown_2.1.3 textshaping_1.0.1 jquerylib_0.1.4 systemfonts_1.2.3
## [17] compiler_4.5.1 rstudioapi_0.17.1 tools_4.5.1 ragg_1.4.0
## [21] bslib_0.9.0 evaluate_1.0.4 yaml_2.3.10 jsonlite_2.0.0
## [25] rlang_1.1.6 fs_1.6.6 htmlwidgets_1.6.4