Comprehensive Bar Chart Analysis with jjbarstats
ClinicoPath Development Team
2025-07-29
Source: vignettes/23-jjbarstats-comprehensive.Rmd
Introduction to jjbarstats
The jjbarstats function is a powerful wrapper around the ggstatsplot package that creates publication-ready bar charts with automatic statistical testing. This function is designed specifically for analyzing categorical data relationships in clinical and research settings.
Key Features
- Automatic Statistical Testing: Performs chi-squared tests, Fisher’s exact tests, or other appropriate tests based on your data
- Multiple Variable Support: Handle single or multiple dependent variables simultaneously
- Grouped Analysis: Split analysis by additional grouping variables
- Flexible Statistical Methods: Choose from parametric, non-parametric, robust, or Bayesian approaches
- Pairwise Comparisons: Automatic post-hoc testing with multiple comparison correction
- Professional Visualization: Publication-ready plots with statistical annotations
When to Use jjbarstats
Use jjbarstats when you need to:
- Compare proportions across different groups
- Analyze treatment effectiveness in clinical trials
- Examine relationships between categorical variables
- Create publication-ready visualizations with statistical tests
- Perform quality improvement analysis
- Analyze survey responses and patient feedback
Basic Usage
Single Dependent Variable Analysis
Let’s start with a basic example using medical study data to compare treatment responses across different treatment groups:
# Basic bar chart comparing treatment response across groups
jjbarstats(
  data = medical_study_data,
  dep = response,
  group = treatment_group,
  grvar = NULL
)
This creates a bar chart showing the distribution of treatment responses (Complete Response, Partial Response, No Response) across different treatment groups (Control, Treatment A, Treatment B), along with chi-squared test results.
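To make the statistics behind this plot concrete, here is a minimal base R sketch of the same kind of analysis: a cross-tabulation, the per-group proportions that the bars display, and a Pearson chi-squared test. The study data frame is simulated for illustration and is not the medical_study_data set used above, and the sketch is not a reproduction of how jjbarstats computes its results internally.

```r
# Hypothetical data standing in for medical_study_data (simulated, illustration only)
set.seed(42)
study <- data.frame(
  treatment_group = factor(rep(c("Control", "Treatment A", "Treatment B"), each = 40)),
  response = factor(
    sample(c("No Response", "Partial Response", "Complete Response"),
           size = 120, replace = TRUE),
    levels = c("No Response", "Partial Response", "Complete Response")
  )
)

# Cross-tabulate response (rows) by treatment group (columns)
tab <- table(study$response, study$treatment_group)

# Column proportions: the share of each response within each treatment group
col_props <- prop.table(tab, margin = 2)

# Pearson chi-squared test of independence
test <- chisq.test(tab)

print(round(col_props, 2))
print(test)
```

Because the responses are drawn at random here, the test should usually find no association; with real data the proportions and p-value carry the substantive result.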
Advanced Statistical Options
Different Statistical Methods
The function supports four different statistical approaches:
Parametric Analysis (Default)
jjbarstats(
  data = patient_satisfaction_data,
  dep = satisfaction_level,
  group = service_type,
  typestatistics = "parametric"
)
Non-parametric Analysis
jjbarstats(
  data = diagnostic_test_data,
  dep = test_result,
  group = test_method,
  typestatistics = "nonparametric"
)
Robust Analysis
jjbarstats(
  data = quality_improvement_data,
  dep = implementation_status,
  group = improvement_category,
  typestatistics = "robust"
)
Bayesian Analysis
jjbarstats(
  data = medical_study_data,
  dep = severity,
  group = treatment_group,
  typestatistics = "bayes"
)
Grouped Analysis with Splitting Variables
Using the grvar Parameter
The grvar parameter allows you to split your analysis by an additional grouping variable, creating separate plots for each level:
# Analyze treatment response by treatment group, split by gender
jjbarstats(
  data = medical_study_data,
  dep = response,
  group = treatment_group,
  grvar = gender
)
This creates separate bar charts for male and female patients, allowing you to examine whether treatment effects differ by gender.
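Conceptually, splitting by a grouping variable amounts to running the same test once per stratum. The following base R sketch illustrates that idea under stated assumptions: the study data frame and its column values are simulated, and the code shows the general split-then-test pattern rather than the function's own implementation.

```r
# Hypothetical data (simulated, illustration only)
set.seed(1)
n <- 200
study <- data.frame(
  gender = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  treatment_group = factor(
    sample(c("Control", "Treatment A", "Treatment B"), n, replace = TRUE)
  ),
  response = factor(
    sample(c("No Response", "Partial Response", "Complete Response"), n, replace = TRUE)
  )
)

# Split the data by the grouping variable, then test each stratum separately
strata <- split(study, study$gender)
results <- lapply(strata, function(d) {
  chisq.test(table(d$response, d$treatment_group))
})

# Collect one p-value per stratum
p_values <- vapply(results, function(r) r$p.value, numeric(1))
print(p_values)
```

Each stratum has a smaller sample than the full dataset, so per-stratum tests have less power and expected cell counts are more likely to be small.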
Complex Grouped Analysis
# Patient satisfaction by service type, split by department
jjbarstats(
  data = patient_satisfaction_data,
  dep = satisfaction_level,
  group = service_type,
  grvar = department
)
Pairwise Comparisons and Multiple Testing
Enabling Pairwise Comparisons
When you have more than two groups, pairwise comparisons help identify which specific groups differ:
# jjbarstats(
#   data = clinical_trial_data,
#   dep = primary_outcome,
#   group = drug_dosage,
#   pairwisecomparisons = TRUE,
#   padjustmethod = "holm"
# )
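One common way to carry out pairwise comparisons on categorical data is to run a chi-squared test on each pair of groups and then adjust the resulting p-values. The sketch below does this in base R with a Holm correction. The table of counts and its labels are made up for illustration, and this is an assumption about the general approach, not a description of what jjbarstats does internally.

```r
# Hypothetical 3 x 3 table of counts (rows: outcome, columns: dosage group)
tab <- matrix(
  c(30, 20, 10,   # Low
    20, 20, 20,   # Medium
    10, 20, 30),  # High
  nrow = 3,
  dimnames = list(
    outcome = c("None", "Partial", "Complete"),
    dosage = c("Low", "Medium", "High")
  )
)

# All unordered pairs of dosage groups
pairs <- combn(colnames(tab), 2, simplify = FALSE)

# Chi-squared test on the sub-table formed by each pair of columns
raw_p <- vapply(pairs, function(p) chisq.test(tab[, p])$p.value, numeric(1))
names(raw_p) <- vapply(pairs, paste, character(1), collapse = " vs ")

# Holm adjustment across the three comparisons
adj_p <- p.adjust(raw_p, method = "holm")

print(cbind(raw = raw_p, holm = adj_p))
```

With these counts, only the Low vs High comparison remains below 0.05 after adjustment, which is the kind of result a "show only significant comparisons" display would highlight.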
Multiple Comparison Correction Methods
Different correction methods are available to control for multiple testing:
# Bonferroni correction (most conservative)
jjbarstats(
  data = diagnostic_test_data,
  dep = test_result,
  group = laboratory,
  pairwisecomparisons = TRUE,
  padjustmethod = "bonferroni"
)
# Benjamini-Hochberg correction (controls false discovery rate)
jjbarstats(
  data = quality_improvement_data,
  dep = priority_level,
  group = department_involved,
  pairwisecomparisons = TRUE,
  padjustmethod = "BH"
)
Controlling Pairwise Display
You can control which pairwise comparisons are displayed:
# Show only significant comparisons
jjbarstats(
  data = medical_study_data,
  dep = response,
  group = treatment_group,
  pairwisecomparisons = TRUE,
  pairwisedisplay = "significant"
)
# Show all comparisons
jjbarstats(
  data = patient_satisfaction_data,
  dep = staff_rating,
  group = service_type,
  pairwisecomparisons = TRUE,
  pairwisedisplay = "everything"
)
Real-World Clinical Applications
Treatment Efficacy Analysis
Analyzing treatment effectiveness across different patient subgroups:
# Comprehensive treatment analysis
jjbarstats(
  data = medical_study_data,
  dep = response,
  group = treatment_group,
  grvar = severity,
  typestatistics = "nonparametric",
  pairwisecomparisons = TRUE,
  padjustmethod = "BH"
)
Quality Improvement Analysis
Tracking implementation status across different improvement categories:
jjbarstats(
  data = quality_improvement_data,
  dep = c(implementation_status, priority_level),
  group = improvement_category,
  typestatistics = "parametric",
  pairwisecomparisons = TRUE
)
Diagnostic Test Evaluation
Comparing test performance across different methods and laboratories:
jjbarstats(
  data = diagnostic_test_data,
  dep = test_result,
  group = test_method,
  grvar = laboratory,
  typestatistics = "robust",
  pairwisecomparisons = TRUE,
  pairwisedisplay = "significant"
)
Patient Satisfaction Survey Analysis
Analyzing satisfaction levels across different service types and departments:
jjbarstats(
  data = patient_satisfaction_data,
  dep = satisfaction_level,
  group = service_type,
  grvar = department,
  typestatistics = "nonparametric",
  pairwisecomparisons = TRUE,
  padjustmethod = "holm"
)
Working with Real Histopathology Data
Using the histopathology dataset that comes with ClinicoPath:
# Analyze lymphovascular invasion by treatment group
jjbarstats(
  data = histopathology,
  dep = LVI,
  group = Group,
  typestatistics = "nonparametric",
  pairwisecomparisons = TRUE
)
# Multiple outcome analysis
jjbarstats(
  data = histopathology,
  dep = c(LVI, PNI),
  group = Grade_Level,
  typestatistics = "parametric"
)
# Grouped analysis by sex
jjbarstats(
  data = histopathology,
  dep = LymphNodeMetastasis,
  group = Grade_Level,
  grvar = Sex,
  typestatistics = "robust",
  pairwisecomparisons = TRUE
)
Customization and Theming
Using Original ggstatsplot Theme
jjbarstats(
  data = medical_study_data,
  dep = response,
  group = treatment_group,
  originaltheme = TRUE
)
Best Practices and Recommendations
Statistical Method Selection
- Parametric: Use when the data meet the test's assumptions (large sample sizes, expected frequencies ≥ 5)
- Non-parametric: Makes fewer assumptions; a reasonable choice for categorical data when those assumptions are in doubt
- Robust: Good middle ground, less sensitive to outliers
- Bayesian: When you want to incorporate prior knowledge or report Bayes factors
Multiple Comparison Correction
- Holm: Good balance between power and Type I error control
- Bonferroni: Most conservative, use when Type I error is critical
- BH (Benjamini-Hochberg): Controls false discovery rate, good for exploratory analysis
- None: Only when you have specific a priori hypotheses
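The practical difference between these corrections is easiest to see on a fixed set of p-values. The sketch below applies each method with base R's p.adjust; the five raw p-values are invented for illustration.

```r
# Hypothetical raw p-values from five pairwise comparisons
raw_p <- c(0.001, 0.012, 0.020, 0.040, 0.300)

# Apply each correction method to the same p-values
methods <- c(none = "none", BH = "BH", holm = "holm", bonferroni = "bonferroni")
adjusted <- sapply(methods, function(m) p.adjust(raw_p, method = m))

print(round(adjusted, 3))

# Number of comparisons still below 0.05 under each method
print(colSums(adjusted < 0.05))
```

For any set of p-values, the Bonferroni-adjusted values are at least as large as the Holm-adjusted ones, which in turn are at least as large as the Benjamini-Hochberg ones, so the number of comparisons declared significant can only shrink as the correction becomes more conservative.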
Sample Size Considerations
- The chi-squared approximation is generally considered reliable when expected frequencies are ≥ 5 in each cell
- Fisher’s exact test is automatically used for small samples
- Consider effect sizes, not just p-values
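To check these conditions by hand, you can inspect the expected cell counts and choose a test accordingly. The following base R sketch shows one possible version of that decision rule. The helper name choose_test and both tables are hypothetical, and the threshold of 5 is the rule of thumb stated above, not a setting taken from the function's source code.

```r
# Hypothetical helper: pick a test based on the expected cell counts
choose_test <- function(tab) {
  # Expected counts under independence (warnings about small counts are
  # suppressed here because we handle that case explicitly)
  expected <- suppressWarnings(chisq.test(tab))$expected
  if (any(expected < 5)) fisher.test(tab) else chisq.test(tab)
}

# Small sample: some expected counts fall below 5
small <- matrix(c(3, 1, 2, 6), nrow = 2,
                dimnames = list(response = c("Yes", "No"), group = c("A", "B")))

# Larger sample: all expected counts are comfortably above 5
large <- matrix(c(30, 10, 20, 60), nrow = 2,
                dimnames = list(response = c("Yes", "No"), group = c("A", "B")))

small_test <- choose_test(small)
large_test <- choose_test(large)

print(small_test$method)
print(large_test$method)
```

For tables much larger than 2 x 2, fisher.test can become slow or memory-hungry, so this rule is best treated as a starting point.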
Data Preparation Tips
# Ensure categorical variables are properly formatted
library(dplyr) # provides %>% and mutate()
medical_study_clean <- medical_study_data %>%
  mutate(
    treatment_group = factor(treatment_group,
      levels = c("Control", "Treatment A", "Treatment B")),
    response = factor(response,
      levels = c("No Response", "Partial Response", "Complete Response"))
  )
# Verify factor levels
str(medical_study_clean[c("treatment_group", "response")])
Troubleshooting Common Issues
Issue 1: Empty Cells or Small Counts
When you have empty cells or very small counts, the function automatically switches to appropriate tests:
# Create data with small counts
small_sample <- medical_study_data[1:20, ]
jjbarstats(
  data = small_sample,
  dep = response,
  group = treatment_group,
  typestatistics = "nonparametric"
)
Issue 2: Too Many Categories
When you have many categories, consider grouping them or using a different visualization:
# Example with multiple categories
jjbarstats(
  data = patient_satisfaction_data,
  dep = satisfaction_level,
  group = department,
  pairwisecomparisons = FALSE # Disable pairwise for clarity
)
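If you choose to group categories before plotting, one simple approach is to collapse rarely observed levels into a single "Other" level. The base R sketch below illustrates this; the department vector and the cutoff of 10 observations are arbitrary choices made for the example.

```r
# Hypothetical grouping variable with several rarely observed levels
department <- factor(c(
  rep("Cardiology", 40), rep("Oncology", 35),
  rep("Neurology", 4), rep("Dermatology", 3), rep("Urology", 2)
))

# Identify levels observed fewer than 10 times (arbitrary cutoff)
counts <- table(department)
rare <- names(counts)[counts < 10]

# Recode rare levels to "Other" and rebuild the factor
collapsed <- as.character(department)
collapsed[collapsed %in% rare] <- "Other"
collapsed <- factor(collapsed)

print(table(collapsed))
```

The collapsed factor can then be added to the data frame and passed as the grouping variable. Whether merging categories is defensible depends on the research question, so it is worth reporting which levels were combined.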
Issue 3: Missing Data
The function automatically handles missing data when excl = TRUE (default):
# Demonstrate missing data handling
data_with_na <- medical_study_data
data_with_na$response[1:5] <- NA
jjbarstats(
  data = data_with_na,
  dep = response,
  group = treatment_group,
  excl = TRUE # Exclude missing values
)
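Before relying on automatic exclusion, it helps to see how many observations are dropped. The base R sketch below counts missing values and builds the complete-case table by hand. The small data frame d is invented for illustration, and the listwise deletion shown here is an assumption about what excluding missing values means in practice.

```r
# Hypothetical data with missing responses
d <- data.frame(
  response = factor(c("Yes", "No", NA, "Yes", "No", "Yes", NA, "No")),
  group = factor(c("A", "A", "A", "A", "B", "B", "B", "B"))
)

# Show missing values as their own category
print(table(d$response, d$group, useNA = "ifany"))

# Keep only rows where both variables are observed
complete <- complete.cases(d[, c("response", "group")])
d_complete <- d[complete, ]

# Table actually used for testing after exclusion
tab <- table(d_complete$response, d_complete$group)
print(tab)
cat("Rows dropped:", sum(!complete), "of", nrow(d), "\n")
```

If missingness differs strongly between groups, the complete-case proportions may be biased, which is worth noting when reporting results.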
Interpretation Guidelines
Understanding the Statistical Output
- Chi-squared test: Tests independence between categorical variables
- Effect size (Cramér’s V): Measures strength of association (0 = no association, 1 = perfect association)
- Confidence intervals: Provide range of plausible values for the effect
- Pairwise comparisons: Show which specific groups differ
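For reference, the uncorrected form of Cramér's V is $V = \sqrt{\chi^2 / (n \cdot (k - 1))}$, where $n$ is the total count and $k$ is the smaller of the number of rows and columns. The base R sketch below computes it for three made-up tables. The helper name cramers_v is hypothetical, and the value shown on a plot may use a bias-corrected variant, so the numbers need not match exactly.

```r
# Hypothetical helper: uncorrected Cramér's V for a contingency table
cramers_v <- function(tab) {
  chi2 <- unname(suppressWarnings(chisq.test(tab, correct = FALSE))$statistic)
  n <- sum(tab)         # total number of observations
  k <- min(dim(tab))    # smaller of the number of rows and columns
  sqrt(chi2 / (n * (k - 1)))
}

# No association: every cell has the same count
independent <- matrix(20, nrow = 3, ncol = 3)

# Intermediate association
moderate <- matrix(c(30, 20, 10, 20, 20, 20, 10, 20, 30), nrow = 3)

# Perfect association: all counts on the diagonal
perfect <- diag(10, 3)

print(c(
  independent = cramers_v(independent),
  moderate = cramers_v(moderate),
  perfect = cramers_v(perfect)
))
```

The three tables cover the range described above: the first gives a value of 0, the last gives 1, and the middle table falls in between.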
Clinical Significance vs Statistical Significance
- Always consider clinical relevance alongside statistical significance
- Effect sizes help interpret practical importance
- Confidence intervals provide information about precision
Reporting Results
When reporting results from jjbarstats:
- Describe the statistical test used (chi-squared, Fisher’s exact, etc.)
- Report effect size (Cramér’s V) and confidence intervals
- Mention multiple comparison correction if applicable
- Provide sample sizes for each group
- Include the actual plot in your publication
Advanced Examples
Multi-stage Analysis Workflow
# Step 1: Overall analysis
# overall_result <- jjbarstats(
#   data = clinical_trial_data,
#   dep = primary_outcome,
#   group = drug_dosage,
#   typestatistics = "nonparametric",
#   pairwisecomparisons = TRUE,
#   padjustmethod = "BH"
# )
# Step 2: Subgroup analysis by study phase
# subgroup_result <- jjbarstats(
#   data = clinical_trial_data,
#   dep = primary_outcome,
#   group = drug_dosage,
#   grvar = study_phase,
#   typestatistics = "nonparametric",
#   pairwisecomparisons = TRUE
# )
Conclusion
The jjbarstats function provides a comprehensive solution for categorical data analysis in clinical and research settings. Its integration with the ggstatsplot ecosystem ensures both statistical rigor and visual appeal, making it an excellent choice for:
- Clinical trial analysis
- Quality improvement studies
- Survey research
- Diagnostic test evaluation
- Healthcare outcomes research
The function’s flexibility in statistical methods, multiple comparison corrections, and visualization options makes it suitable for both exploratory and confirmatory analysis phases of research projects.
Further Resources
- ggstatsplot documentation
- ClinicoPath package documentation
- Statistical methods references for categorical data analysis