jjdotplotstats: Comprehensive Dot Plot Analysis

Overview

The jjdotplotstats function creates statistical dot plots that compare a continuous variable across groups. It wraps ggstatsplot::ggdotplotstats and ggstatsplot::grouped_ggdotplotstats, and produces either a single dot plot or one plot per subgroup, each annotated with the results of the selected statistical test.

Key Features

Multiple statistical methods: Parametric (t-tests), nonparametric (Mann-Whitney U), robust, and Bayesian
Grouped analysis: Create separate dot plots for different subgroups
Performance optimized: Uses internal caching to eliminate redundant computations
Effect size calculations: Multiple effect size options (Cohen’s d, Hedges’ g, eta-squared, omega-squared)
Centrality measures: Optional display of mean, median, or robust centrality measures
Comprehensive customization: Titles, labels, themes, and statistical results display

Installation and Setup

# Install ClinicoPath if not already installed
# (requireNamespace() checks for the package without attaching it)
if (!requireNamespace("ClinicoPath", quietly = TRUE)) {
  devtools::install_github("sbalci/ClinicoPathJamoviModule")
}

library(ClinicoPath)
library(ggplot2)

Quick Start

Basic Dot Plot

# Load test data
data(jjdotplotstats_test_data)

# Basic dot plot comparing CRP levels across disease severity
result <- jjdotplotstats(
  data = jjdotplotstats_test_data,
  dep = "crp_level",
  group = "disease_severity",
  typestatistics = "parametric"
)

# View the plot
result$plot

Grouped Dot Plot

# Grouped dot plot by treatment center
result_grouped <- jjdotplotstats(
  data = jjdotplotstats_test_data,
  dep = "esr_level",
  group = "disease_severity",
  grvar = "treatment_center",
  typestatistics = "nonparametric",
  centralityplotting = TRUE
)

# View the grouped plot
result_grouped$plot2

Function Parameters

Core Parameters

data: Input data frame containing the variables to analyze
dep: Continuous numeric variable for the dot plot (dependent variable)
group: Categorical variable defining the groups for comparison
grvar: Optional grouping variable for creating separate plots
typestatistics: Type of statistical analysis to perform

Statistical Methods

The typestatistics parameter supports four different approaches:

“parametric” (default): Independent t-tests or one-way ANOVA
- Assumes normality and equal variances
- Most powerful when assumptions are met
- Effect sizes: Cohen’s d or Hedges’ g
“nonparametric”: Mann-Whitney U test or Kruskal-Wallis test
- Distribution-free method
- Robust to outliers and non-normality
- Based on rank comparisons
“robust”: Robust statistical tests
- Robust to outliers and violations of normality
- Uses trimmed means and robust estimators
- Good compromise between parametric and nonparametric
“bayes”: Bayesian statistical analysis
- Provides Bayes factors for evidence assessment
- Incorporates prior beliefs
- Gives probabilistic interpretation of results
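
As a rough illustration of the two frequentist families named above, the sketch below runs the corresponding base R tests on simulated data. The data frame sim and its columns are invented for this example, and the calls are the standard stats functions, not necessarily the code path jjdotplotstats follows internally. The robust and Bayesian variants need add-on packages and are left out.

```r
# Simulated data (hypothetical): one numeric outcome, one grouping factor
set.seed(42)
sim <- data.frame(
  value = c(rnorm(30, mean = 5), rnorm(30, mean = 6), rnorm(30, mean = 7)),
  grp   = factor(rep(c("Mild", "Moderate", "Severe"), each = 30))
)

# Two-group subset for the two-sample tests
two <- droplevels(subset(sim, grp != "Severe"))

# Parametric: t-test (two groups) and one-way ANOVA (three groups).
# var.equal = TRUE requests the classic equal-variance form;
# the base R default (FALSE) is the Welch version.
t_res     <- t.test(value ~ grp, data = two, var.equal = TRUE)
anova_res <- oneway.test(value ~ grp, data = sim, var.equal = TRUE)

# Nonparametric: Mann-Whitney U (two groups) and Kruskal-Wallis (three groups)
mw_res <- wilcox.test(value ~ grp, data = two)
kw_res <- kruskal.test(value ~ grp, data = sim)

# Collect the p-values side by side
p_values <- c(
  t_test         = t_res$p.value,
  anova          = anova_res$p.value,
  mann_whitney   = mw_res$p.value,
  kruskal_wallis = kw_res$p.value
)
print(signif(p_values, 3))
```

Comparing the parametric and rank-based p-values on the same data is a quick way to see how sensitive a conclusion is to the choice of method.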

Effect Size Options

The effsizetype parameter controls effect size calculations:

“biased”: Cohen’s d (default)
“unbiased”: Hedges’ g (small sample correction)
“eta”: Eta-squared (proportion of variance explained)
“omega”: Omega-squared (less biased than eta-squared)
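
To make the four measures concrete, here is a minimal base R sketch that computes each one by hand from simulated data. The values are made up, the Hedges' g correction uses the common approximation, and the formulas are the textbook definitions rather than a copy of what the package computes.

```r
# Simulated two-group data (hypothetical values)
set.seed(1)
x <- rnorm(20, mean = 10, sd = 2)
y <- rnorm(20, mean = 12, sd = 2)
n1 <- length(x)
n2 <- length(y)

# Cohen's d: mean difference divided by the pooled standard deviation
sd_pooled <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2))
cohens_d  <- (mean(y) - mean(x)) / sd_pooled

# Hedges' g: Cohen's d shrunk by a small-sample correction factor
# (approximation: 1 - 3 / (4 * df - 1), with df = n1 + n2 - 2)
correction <- 1 - 3 / (4 * (n1 + n2) - 9)
hedges_g   <- cohens_d * correction

# Eta-squared and omega-squared from a one-way ANOVA table
dat <- data.frame(value = c(x, y), grp = factor(rep(c("A", "B"), c(n1, n2))))
tab <- summary(aov(value ~ grp, data = dat))[[1]]
ss_between <- tab[["Sum Sq"]][1]
ss_within  <- tab[["Sum Sq"]][2]
df_between <- tab[["Df"]][1]
ms_within  <- tab[["Mean Sq"]][2]

eta_sq   <- ss_between / (ss_between + ss_within)
omega_sq <- (ss_between - df_between * ms_within) /
  (ss_between + ss_within + ms_within)

print(round(c(d = cohens_d, g = hedges_g, eta2 = eta_sq, omega2 = omega_sq), 3))
```

The corrected measures (g and omega-squared) come out slightly smaller in magnitude than their uncorrected counterparts, which is the point of the correction.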

Centrality Measures

When centralityplotting = TRUE, the centralitytype parameter controls the measure displayed:

“parametric”: Mean
“nonparametric”: Median
“robust”: Trimmed mean
“bayes”: MAP (Maximum A Posteriori) estimator
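
The following sketch shows how the four centrality measures can differ on the same skewed sample. The data are simulated, the 20% trimming level is an assumption chosen for illustration, and the MAP value is approximated here as the mode of a kernel density estimate rather than taken from a fitted Bayesian model.

```r
# Simulated sample (hypothetical): mostly around 50, plus a few extreme values
set.seed(7)
x <- c(rnorm(95, mean = 50, sd = 5), rep(150, 5))

# Mode of the kernel density estimate, used as a rough stand-in for MAP
dens    <- density(x)
map_est <- dens$x[which.max(dens$y)]

central <- c(
  mean         = mean(x),              # "parametric"
  median       = median(x),            # "nonparametric"
  trimmed_mean = mean(x, trim = 0.2),  # "robust" (20% trimmed, assumed level)
  map          = map_est               # "bayes" (density-mode approximation)
)
print(round(central, 2))
```

With outliers present, the mean is pulled toward the extreme values while the other three stay near the bulk of the data.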

Performance Optimizations

Version History

The function has been significantly optimized for performance:

Previous Issues:
- Redundant data processing in .plot() and .plot2() methods
- No caching infrastructure
- Repeated variable conversion and formula construction
- Duplicated options processing

Current Optimizations:
- Data Caching: Uses .prepareData() method to cache processed data
- Options Caching: Uses .prepareOptions() method to cache option processing
- Eliminated Redundancy: Both plot methods now use cached results
- Better Progress Feedback: Clear user messaging during processing
- Efficient Variable Conversion: Numeric conversion done once and cached

Performance Benefits

# Performance comparison (conceptual)
# Before optimization: 
# - Data processed twice (once for each plot)
# - Variable conversion repeated
# - Options processing duplicated

# After optimization:
# - Data processed once and cached
# - Variable conversion done once
# - Significant speedup for large datasets

Advanced Usage Examples

Multiple Statistical Methods Comparison

# Compare different statistical methods
methods <- c("parametric", "nonparametric", "robust", "bayes")

results <- list()
for (method in methods) {
  results[[method]] <- jjdotplotstats(
    data = jjdotplotstats_test_data,
    dep = "crp_level",
    group = "disease_severity",
    typestatistics = method
  )
}

# Each result contains the dot plot with method-specific statistics
# results$parametric$plot
# results$nonparametric$plot
# etc.

Clinical Biomarker Analysis

# Comprehensive biomarker analysis
biomarkers <- c("crp_level", "esr_level", "platelet_count", "hemoglobin_level")

# Analyze each biomarker across disease severity groups
biomarker_results <- list()

for (biomarker in biomarkers) {
  biomarker_results[[biomarker]] <- jjdotplotstats(
    data = jjdotplotstats_test_data,
    dep = biomarker,
    group = "disease_severity",
    typestatistics = "parametric",
    effsizetype = "unbiased",
    centralityplotting = TRUE,
    centralitytype = "parametric",
    mytitle = paste("Distribution of", biomarker, "by Disease Severity"),
    xtitle = biomarker,
    ytitle = "Disease Severity"
  )
}

Gender-Stratified Analysis

# Analyze hemoglobin levels by disease severity, stratified by gender
hgb_gender_analysis <- jjdotplotstats(
  data = jjdotplotstats_test_data,
  dep = "hemoglobin_level",
  group = "disease_severity",
  grvar = "gender",
  typestatistics = "parametric",
  effsizetype = "unbiased",
  centralityplotting = TRUE,
  mytitle = "Hemoglobin Levels by Disease Severity and Gender",
  xtitle = "Hemoglobin Level (g/dL)",
  ytitle = "Disease Severity"
)

Treatment Response Analysis

# Analyze biomarker levels by treatment response
# (excluding healthy patients who don't have treatment)
treated_patients <- subset(jjdotplotstats_test_data, 
                          treatment_response != "N/A")

response_analysis <- jjdotplotstats(
  data = treated_patients,
  dep = "crp_level",
  group = "treatment_response",
  grvar = "comorbidity_status",
  typestatistics = "nonparametric",
  centralityplotting = TRUE,
  centralitytype = "nonparametric",
  mytitle = "CRP Levels by Treatment Response and Comorbidity Status",
  xtitle = "CRP Level (mg/L)",
  ytitle = "Treatment Response"
)

Data Requirements

Input Data Structure

The input data should be a data frame with:

Continuous variable: Numeric column for the dot plot (dependent variable)
Grouping variable: Factor or character column defining comparison groups
Optional grouping variable: Additional factor for stratified analysis
Complete cases: Missing values are automatically excluded

Example Data Structure

# Structure of test data
str(jjdotplotstats_test_data)

# Key variables:
# - crp_level: C-reactive protein levels (mg/L)
# - esr_level: Erythrocyte sedimentation rate (mm/hr)  
# - platelet_count: Platelet count (×10³/μL)
# - hemoglobin_level: Hemoglobin levels (g/dL)
# - disease_severity: Ordered factor (Healthy, Mild, Moderate, Severe)
# - treatment_center: Factor (Center A, B, C, D)
# - gender: Factor (Male, Female)

Best Practices

Variable Selection and Preparation

Choose appropriate variables:
- Dependent variable should be continuous and numeric
- Grouping variables should be categorical with meaningful levels
Check distributions:
- Examine histograms and Q-Q plots
- Consider transformations for highly skewed data
Handle missing data:
- Decide on complete case analysis vs. imputation
- Document missing data patterns
Sample size considerations:
- Ensure adequate sample size within each group
- Consider power analysis for effect size detection
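
The checks above can be sketched in base R as follows. The data frame sim and its column names are invented for this example, and the Shapiro-Wilk test is only one of several reasonable ways to screen for non-normality.

```r
# Simulated right-skewed biomarker (hypothetical names and values)
set.seed(123)
sim <- data.frame(
  crp = c(rlnorm(40, meanlog = 1, sdlog = 0.8),
          rlnorm(40, meanlog = 2, sdlog = 0.8)),
  severity = factor(rep(c("Mild", "Severe"), each = 40))
)
sim$crp[c(3, 57)] <- NA  # inject two missing values for illustration

# Missing data pattern: number of missing values per group
missing_by_group <- tapply(is.na(sim$crp), sim$severity, sum)

# Sample size per group after dropping incomplete rows
complete <- sim[complete.cases(sim), ]
group_n  <- table(complete$severity)

# Normality screen per group, on the raw and the log scale
shapiro_p <- function(v) shapiro.test(v)$p.value
shapiro_raw <- tapply(complete$crp, complete$severity, shapiro_p)
shapiro_log <- tapply(log(complete$crp), complete$severity, shapiro_p)

print(missing_by_group)
print(group_n)
print(round(rbind(raw = shapiro_raw, log = shapiro_log), 4))
```

A small Shapiro-Wilk p-value on the raw scale alongside a larger one on the log scale suggests that a log transformation, or a nonparametric method, may be the better choice.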

Statistical Method Selection

Use parametric when data is approximately normal with equal variances
Use nonparametric for ordinal data, non-normal distributions, or small samples
Use robust when there are outliers but parametric interpretation is desired
Use Bayesian when you want to quantify evidence and incorporate prior knowledge

Effect Size Interpretation

Cohen’s d / Hedges’ g:
- 0.2: Small effect
- 0.5: Medium effect
- 0.8: Large effect
Eta-squared / Omega-squared:
- 0.01: Small effect (1% variance explained)
- 0.06: Medium effect (6% variance explained)
- 0.14: Large effect (14% variance explained)
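
As a small convenience, the conventional thresholds for d and g listed above can be turned into labels. The helper label_cohens_d below is hypothetical (it is not part of ClinicoPath), and the "negligible" label for values under 0.2 is an added assumption.

```r
# Hypothetical helper: map |d| onto the conventional benchmark labels.
# Intervals are closed on the left, so 0.2 counts as "small", 0.5 as "medium".
label_cohens_d <- function(d) {
  cut(
    abs(d),
    breaks = c(0, 0.2, 0.5, 0.8, Inf),
    labels = c("negligible", "small", "medium", "large"),
    right  = FALSE
  )
}

d_values <- c(0.1, 0.35, -0.6, 1.2)
d_labels <- label_cohens_d(d_values)
print(data.frame(d = d_values, label = d_labels))
```

These cut-offs are rules of thumb; what counts as a meaningful effect depends on the clinical context.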

Customization Tips

# Highly customized dot plot
custom_plot <- jjdotplotstats(
  data = jjdotplotstats_test_data,
  dep = "crp_level",
  group = "disease_severity",
  typestatistics = "parametric",
  effsizetype = "unbiased",
  centralityplotting = TRUE,
  centralitytype = "parametric",
  mytitle = "C-Reactive Protein Levels Across Disease Severity Groups",
  xtitle = "CRP Level (mg/L)",
  ytitle = "Disease Severity",
  originaltheme = FALSE,  # Use jamovi theme
  resultssubtitle = TRUE  # Show statistical results
)

Troubleshooting

Common Issues

“Data contains no (complete) rows”
- Check for missing values in your variables
- Ensure at least some complete cases exist
- Consider imputation or subset analysis
No plot generated
- Verify that you have specified both dep and group
- Check that variables exist in your dataset
- Ensure proper variable types (numeric for dep, factor for group)
Slow performance
- The current version caches processed data and options, so repeated plotting should not reprocess the data
- For very large datasets, consider sampling
- Check for memory constraints
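
Most of the issues above can be caught before calling the function. The sketch below is one possible pre-flight check; check_dotplot_input is a hypothetical helper written for this example and is not provided by ClinicoPath.

```r
# Hypothetical helper: returns a character vector of problems (empty if none)
check_dotplot_input <- function(data, dep, group) {
  # Stop early if a requested column does not exist
  missing_cols <- setdiff(c(dep, group), names(data))
  if (length(missing_cols) > 0) {
    return(paste("Column not found:", missing_cols))
  }

  problems <- character(0)
  if (!is.numeric(data[[dep]])) {
    problems <- c(problems, paste0("'", dep, "' is not numeric"))
  }
  if (!is.factor(data[[group]]) && !is.character(data[[group]])) {
    problems <- c(problems, paste0("'", group, "' is not a factor or character"))
  }
  # At least one row must be complete for both variables
  n_complete <- sum(complete.cases(data[, c(dep, group), drop = FALSE]))
  if (n_complete == 0) {
    problems <- c(problems, "no complete rows for the selected variables")
  }
  problems
}

# Example data frames (made up for illustration)
good  <- data.frame(value = c(1.2, 3.4, 2.2), grp = c("a", "b", "a"))
bad   <- data.frame(value = c("1", "2"), grp = c("a", "b"))
empty <- data.frame(value = c(NA_real_, NA_real_), grp = c("a", "b"))

ok_result      <- check_dotplot_input(good, "value", "grp")
bad_result     <- check_dotplot_input(bad, "value", "grp")
empty_result   <- check_dotplot_input(empty, "value", "grp")
missing_result <- check_dotplot_input(good, "value", "center")
```

An empty result means the basic requirements are met; any returned message points at the item in the list above that needs attention.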

Error Handling

# Example error handling
tryCatch({
  result <- jjdotplotstats(
    data = my_data,
    dep = "continuous_var",
    group = "group_var",
    typestatistics = "parametric"
  )
}, error = function(e) {
  message("Error in dot plot analysis: ", e$message)
  message("Check your data structure and variable types")
})

Technical Details

Underlying Functions

The jjdotplotstats function is built on:

ggstatsplot::ggdotplotstats: For single dot plots
ggstatsplot::grouped_ggdotplotstats: For grouped analyses
jmvcore: For data handling and option processing

Caching Implementation

# Internal caching structure (conceptual)
# private$.processedData: Cached cleaned data with converted variables
# private$.processedOptions: Cached option processing and titles
# 
# Benefits:
# - Eliminates redundant jmvcore::naOmit() calls
# - Avoids repeated variable conversion
# - Shares processed data between plot methods
# - Reduces option processing overhead
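
The compute-once, reuse-afterwards idea can be sketched with a plain closure. This is an illustration of the pattern only: make_cached_preparer and its fields are hypothetical names, the real implementation stores its cache in private fields of the analysis class as outlined above, and this simplified version does not detect whether the input data have changed.

```r
# Hypothetical sketch: wrap a preparation function so it runs at most once
make_cached_preparer <- function(prepare_fn) {
  cache <- NULL
  calls <- 0
  list(
    get = function(data) {
      if (is.null(cache)) {
        calls <<- calls + 1          # count real computations
        cache <<- prepare_fn(data)   # store the processed data
      }
      cache                          # later calls return the stored copy
    },
    n_calls = function() calls
  )
}

# Preparation step: numeric conversion plus removal of incomplete rows
prep <- make_cached_preparer(function(d) {
  d$value <- as.numeric(d$value)
  d[complete.cases(d), ]
})

raw <- data.frame(value = c("1.5", "2.0", NA), grp = c("a", "b", "a"))

first  <- prep$get(raw)  # processes the data
second <- prep$get(raw)  # served from the cache
prep$n_calls()           # 1
```

Both plot methods can then ask for the prepared data without caring which of them ran first.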

Clinical Applications

Biomarker Studies

Dot plots are particularly useful for:

Diagnostic biomarker evaluation: Comparing levels across disease groups
Treatment monitoring: Assessing biomarker changes over time
Cohort comparisons: Analyzing differences between study populations
Quality control: Identifying outliers and batch effects

Research Scenarios

Drug Development: Compare efficacy biomarkers across treatment arms
Epidemiology: Analyze exposure-outcome relationships
Clinical Trials: Assess treatment effects on continuous endpoints
Laboratory Medicine: Compare reference ranges across populations

Conclusion

The optimized jjdotplotstats function provides:

High performance: Significant speed improvements through caching
Statistical rigor: Multiple statistical methods with effect sizes
Flexibility: Extensive customization and grouping options
Clinical relevance: Designed for biomarker and clinical research applications
Usability: Clear documentation and comprehensive examples

The function is well-suited for clinical research, biomarker analysis, and any scenario requiring robust comparison of continuous variables across groups.

Session Information

sessionInfo()

## R version 4.5.1 (2025-06-13)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sequoia 15.5
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: Europe/Istanbul
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.37     desc_1.4.3        R6_2.6.1          fastmap_1.2.0    
##  [5] xfun_0.52         cachem_1.1.0      knitr_1.50        htmltools_0.5.8.1
##  [9] rmarkdown_2.29    lifecycle_1.0.4   cli_3.6.5         sass_0.4.10      
## [13] pkgdown_2.1.3     textshaping_1.0.1 jquerylib_0.1.4   systemfonts_1.2.3
## [17] compiler_4.5.1    rstudioapi_0.17.1 tools_4.5.1       ragg_1.4.0       
## [21] bslib_0.9.0       evaluate_1.0.4    yaml_2.3.10       jsonlite_2.0.0   
## [25] rlang_1.1.6       fs_1.6.6          htmlwidgets_1.6.4

ClinicoPath

2025-07-13