Skip to contents

Overview

The jjdotplotstats function provides a powerful interface for creating statistical dot plots that compare continuous variables across different groups. This function is a wrapper around ggstatsplot::ggdotplotstats and ggstatsplot::grouped_ggdotplotstats, offering both single and grouped dot plot visualizations with comprehensive statistical analysis.

Key Features

  • Multiple statistical methods: Parametric (t-tests), nonparametric (Mann-Whitney U), robust, and Bayesian
  • Grouped analysis: Create separate dot plots for different subgroups
  • Performance optimized: Uses internal caching to eliminate redundant computations
  • Effect size calculations: Multiple effect size options (Cohen’s d, Hedge’s g, eta-squared, omega-squared)
  • Centrality measures: Optional display of mean, median, or robust centrality measures
  • Comprehensive customization: Titles, labels, themes, and statistical results display

Installation and Setup

# Install ClinicoPath if not already installed
if (!require("ClinicoPath")) {
  devtools::install_github("sbalci/ClinicoPathJamoviModule")
}

library(ClinicoPath)
library(ggplot2)

Quick Start

Basic Dot Plot

# Load test data
data(jjdotplotstats_test_data)

# Basic dot plot comparing CRP levels across disease severity
result <- jjdotplotstats(
  data = jjdotplotstats_test_data,
  dep = "crp_level",
  group = "disease_severity",
  typestatistics = "parametric"
)

# View the plot
result$plot

Grouped Dot Plot

# Grouped dot plot by treatment center
result_grouped <- jjdotplotstats(
  data = jjdotplotstats_test_data,
  dep = "esr_level",
  group = "disease_severity",
  grvar = "treatment_center",
  typestatistics = "nonparametric",
  centralityplotting = TRUE
)

# View the grouped plot
result_grouped$plot2

Function Parameters

Core Parameters

  • data: Input data frame containing the variables to analyze
  • dep: Continuous numeric variable for the dot plot (dependent variable)
  • group: Categorical variable defining the groups for comparison
  • grvar: Optional grouping variable for creating separate plots
  • typestatistics: Type of statistical analysis to perform

Statistical Methods

The typestatistics parameter supports four different approaches:

  1. “parametric” (default): Independent t-tests or one-way ANOVA
    • Assumes normality and equal variances
    • Most powerful when assumptions are met
    • Effect sizes: Cohen’s d or Hedge’s g
  2. “nonparametric”: Mann-Whitney U test or Kruskal-Wallis test
    • Distribution-free method
    • Robust to outliers and non-normality
    • Based on rank comparisons
  3. “robust”: Robust statistical tests
    • Robust to outliers and violations of normality
    • Uses trimmed means and robust estimators
    • Good compromise between parametric and nonparametric
  4. “bayes”: Bayesian statistical analysis
    • Provides Bayes factors for evidence assessment
    • Incorporates prior beliefs
    • Gives probabilistic interpretation of results

Effect Size Options

The effsizetype parameter controls effect size calculations:

  • “biased”: Cohen’s d (default)
  • “unbiased”: Hedge’s g (small sample correction)
  • “eta”: Eta-squared (proportion of variance explained)
  • “omega”: Omega-squared (less biased than eta-squared)

Centrality Measures

When centralityplotting = TRUE, the centralitytype parameter controls the measure displayed:

  • “parametric”: Mean
  • “nonparametric”: Median
  • “robust”: Trimmed mean
  • “bayes”: MAP (Maximum A Posteriori) estimator

Performance Optimizations

Version History

The function has been significantly optimized for performance:

Previous Issues: - Redundant data processing in .plot() and .plot2() methods - No caching infrastructure - Repeated variable conversion and formula construction - Duplicated options processing

Current Optimizations: - Data Caching: Uses .prepareData() method to cache processed data - Options Caching: Uses .prepareOptions() method to cache option processing - Eliminated Redundancy: Both plot methods now use cached results - Better Progress Feedback: Clear user messaging during processing - Efficient Variable Conversion: Numeric conversion done once and cached

Performance Benefits

# Performance comparison (conceptual)
# Before optimization: 
# - Data processed twice (once for each plot)
# - Variable conversion repeated
# - Options processing duplicated

# After optimization:
# - Data processed once and cached
# - Variable conversion done once
# - Significant speedup for large datasets

Advanced Usage Examples

Multiple Statistical Methods Comparison

# Compare different statistical methods
methods <- c("parametric", "nonparametric", "robust", "bayes")

results <- list()
for (method in methods) {
  results[[method]] <- jjdotplotstats(
    data = jjdotplotstats_test_data,
    dep = "crp_level",
    group = "disease_severity",
    typestatistics = method
  )
}

# Each result contains the dot plot with method-specific statistics
# results$parametric$plot
# results$nonparametric$plot
# etc.

Clinical Biomarker Analysis

# Comprehensive biomarker analysis
biomarkers <- c("crp_level", "esr_level", "platelet_count", "hemoglobin_level")

# Analyze each biomarker across disease severity groups
biomarker_results <- list()

for (biomarker in biomarkers) {
  biomarker_results[[biomarker]] <- jjdotplotstats(
    data = jjdotplotstats_test_data,
    dep = biomarker,
    group = "disease_severity",
    typestatistics = "parametric",
    effsizetype = "unbiased",
    centralityplotting = TRUE,
    centralitytype = "parametric",
    mytitle = paste("Distribution of", biomarker, "by Disease Severity"),
    xtitle = biomarker,
    ytitle = "Disease Severity"
  )
}

Gender-Stratified Analysis

# Analyze hemoglobin levels by disease severity, stratified by gender
hgb_gender_analysis <- jjdotplotstats(
  data = jjdotplotstats_test_data,
  dep = "hemoglobin_level",
  group = "disease_severity",
  grvar = "gender",
  typestatistics = "parametric",
  effsizetype = "unbiased",
  centralityplotting = TRUE,
  mytitle = "Hemoglobin Levels by Disease Severity and Gender",
  xtitle = "Hemoglobin Level (g/dL)",
  ytitle = "Disease Severity"
)

Treatment Response Analysis

# Analyze biomarker levels by treatment response
# (excluding healthy patients who don't have treatment)
treated_patients <- subset(jjdotplotstats_test_data, 
                          treatment_response != "N/A")

response_analysis <- jjdotplotstats(
  data = treated_patients,
  dep = "crp_level",
  group = "treatment_response",
  grvar = "comorbidity_status",
  typestatistics = "nonparametric",
  centralityplotting = TRUE,
  centralitytype = "nonparametric",
  mytitle = "CRP Levels by Treatment Response and Comorbidity Status",
  xtitle = "CRP Level (mg/L)",
  ytitle = "Treatment Response"
)

Data Requirements

Input Data Structure

The input data should be a data frame with:

  • Continuous variable: Numeric column for the dot plot (dependent variable)
  • Grouping variable: Factor or character column defining comparison groups
  • Optional grouping variable: Additional factor for stratified analysis
  • Complete cases: Missing values are automatically excluded

Example Data Structure

# Structure of test data
str(jjdotplotstats_test_data)

# Key variables:
# - crp_level: C-reactive protein levels (mg/L)
# - esr_level: Erythrocyte sedimentation rate (mm/hr)  
# - platelet_count: Platelet count (×10³/μL)
# - hemoglobin_level: Hemoglobin levels (g/dL)
# - disease_severity: Ordered factor (Healthy, Mild, Moderate, Severe)
# - treatment_center: Factor (Center A, B, C, D)
# - gender: Factor (Male, Female)

Best Practices

Variable Selection and Preparation

  1. Choose appropriate variables:
    • Dependent variable should be continuous and numeric
    • Grouping variables should be categorical with meaningful levels
  2. Check distributions:
    • Examine histograms and Q-Q plots
    • Consider transformations for highly skewed data
  3. Handle missing data:
    • Decide on complete case analysis vs. imputation
    • Document missing data patterns
  4. Sample size considerations:
    • Ensure adequate sample size within each group
    • Consider power analysis for effect size detection

Statistical Method Selection

  • Use parametric when data is approximately normal with equal variances
  • Use nonparametric for ordinal data, non-normal distributions, or small samples
  • Use robust when there are outliers but parametric interpretation is desired
  • Use Bayesian when you want to quantify evidence and incorporate prior knowledge

Effect Size Interpretation

  • Cohen’s d / Hedge’s g:
    • 0.2: Small effect
    • 0.5: Medium effect
    • 0.8: Large effect
  • Eta-squared / Omega-squared:
    • 0.01: Small effect (1% variance explained)
    • 0.06: Medium effect (6% variance explained)
    • 0.14: Large effect (14% variance explained)

Customization Tips

# Highly customized dot plot
custom_plot <- jjdotplotstats(
  data = jjdotplotstats_test_data,
  dep = "crp_level",
  group = "disease_severity",
  typestatistics = "parametric",
  effsizetype = "unbiased",
  centralityplotting = TRUE,
  centralitytype = "parametric",
  mytitle = "C-Reactive Protein Levels Across Disease Severity Groups",
  xtitle = "CRP Level (mg/L)",
  ytitle = "Disease Severity",
  originaltheme = FALSE,  # Use jamovi theme
  resultssubtitle = TRUE  # Show statistical results
)

Troubleshooting

Common Issues

  1. “Data contains no (complete) rows”
    • Check for missing values in your variables
    • Ensure at least some complete cases exist
    • Consider imputation or subset analysis
  2. No plot generated
    • Verify that you have specified both dep and group
    • Check that variables exist in your dataset
    • Ensure proper variable types (numeric for dep, factor for group)
  3. Slow performance
    • The optimized version should be much faster
    • For very large datasets, consider sampling
    • Check for memory constraints

Error Handling

# Example error handling
tryCatch({
  result <- jjdotplotstats(
    data = my_data,
    dep = "continuous_var",
    group = "group_var",
    typestatistics = "parametric"
  )
}, error = function(e) {
  message("Error in dot plot analysis: ", e$message)
  message("Check your data structure and variable types")
})

Technical Details

Underlying Functions

The jjdotplotstats function is built on:

  • ggstatsplot::ggdotplotstats: For single dot plots
  • ggstatsplot::grouped_ggdotplotstats: For grouped analyses
  • jmvcore: For data handling and option processing

Caching Implementation

# Internal caching structure (conceptual)
# private$.processedData: Cached cleaned data with converted variables
# private$.processedOptions: Cached option processing and titles
# 
# Benefits:
# - Eliminates redundant jmvcore::naOmit() calls
# - Avoids repeated variable conversion
# - Shares processed data between plot methods
# - Reduces option processing overhead

Clinical Applications

Biomarker Studies

Dot plots are particularly useful for:

  • Diagnostic biomarker evaluation: Comparing levels across disease groups
  • Treatment monitoring: Assessing biomarker changes over time
  • Cohort comparisons: Analyzing differences between study populations
  • Quality control: Identifying outliers and batch effects

Research Scenarios

  1. Drug Development: Compare efficacy biomarkers across treatment arms
  2. Epidemiology: Analyze exposure-outcome relationships
  3. Clinical Trials: Assess treatment effects on continuous endpoints
  4. Laboratory Medicine: Compare reference ranges across populations

Conclusion

The optimized jjdotplotstats function provides:

  • High performance: Significant speed improvements through caching
  • Statistical rigor: Multiple statistical methods with effect sizes
  • Flexibility: Extensive customization and grouping options
  • Clinical relevance: Designed for biomarker and clinical research applications
  • Usability: Clear documentation and comprehensive examples

The function is well-suited for clinical research, biomarker analysis, and any scenario requiring robust comparison of continuous variables across groups.

Session Information

## R version 4.5.1 (2025-06-13)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sequoia 15.5
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: Europe/Istanbul
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.37     desc_1.4.3        R6_2.6.1          fastmap_1.2.0    
##  [5] xfun_0.52         cachem_1.1.0      knitr_1.50        htmltools_0.5.8.1
##  [9] rmarkdown_2.29    lifecycle_1.0.4   cli_3.6.5         sass_0.4.10      
## [13] pkgdown_2.1.3     textshaping_1.0.1 jquerylib_0.1.4   systemfonts_1.2.3
## [17] compiler_4.5.1    rstudioapi_0.17.1 tools_4.5.1       ragg_1.4.0       
## [21] bslib_0.9.0       evaluate_1.0.4    yaml_2.3.10       jsonlite_2.0.0   
## [25] rlang_1.1.6       fs_1.6.6          htmlwidgets_1.6.4