jjStatsPlot 01: Statistical Visualization for Pathologists
ClinicoPath Development Team
2025-06-30
Source:vignettes/jjstatsplot-01-introduction.Rmd
jjstatsplot-01-introduction.Rmd
Introduction
jjstatsplot provides comprehensive statistical visualization tools
specifically designed for clinical research and pathology studies. Built
on the powerful ggstatsplot
package, it offers
publication-ready plots with integrated statistical testing, making
complex data analysis accessible to clinical researchers.
Learning Objectives:
- Master statistical visualization for clinical data analysis
- Learn publication-ready plot creation with integrated statistics
- Apply appropriate statistical tests for different data types
- Understand clinical interpretation of statistical visualizations
- Implement quality control and reproducible reporting workflows
Module Overview
jjstatsplot encompasses four main areas of statistical visualization:
1. Categorical Data Analysis
- Bar Charts: Group comparisons with chi-square testing
- Pie Charts: Proportion visualization for categorical outcomes
- Dot Plots: Summary statistics with confidence intervals
- Mosaic Plots: Multi-dimensional categorical relationships
2. Continuous Data Analysis
- Box-Violin Plots: Distribution comparisons between groups
- Histograms: Distribution analysis with normality testing
- Density Plots: Smooth distribution visualization
- Within-Subject Comparisons: Paired data analysis
Getting Started
Clinical Data Examples
We’ll demonstrate jjstatsplot functions using clinical pathology datasets that represent real-world scenarios.
# Load clinical pathology datasets
data(histopathology)
data(breast_cancer_data)
data(ihc_molecular_comprehensive)
# Prepare clinical variables for analysis
clinical_data <- histopathology %>%
mutate(
# Convert variables to appropriate types
Grade = factor(Grade, levels = 1:3, labels = c("Grade 1", "Grade 2", "Grade 3")),
TStage = factor(TStage),
Group = factor(Group),
LVI = factor(LVI, levels = c("FALSE", "TRUE"), labels = c("Absent", "Present")),
Age_Group = factor(ifelse(Age >= 65, "≥65 years", "<65 years")),
# Create clinical outcome variables
High_Risk = factor(ifelse(Grade == "Grade 3" | TStage %in% c("T3", "T4"),
"High Risk", "Low Risk")),
Recurrence_Risk = factor(case_when(
Grade == "Grade 1" & TStage %in% c("T1", "T2") ~ "Low",
Grade == "Grade 2" ~ "Intermediate",
Grade == "Grade 3" | TStage %in% c("T3", "T4") ~ "High",
TRUE ~ "Intermediate"
), levels = c("Low", "Intermediate", "High"))
)
cat("Clinical Dataset Overview:\n")
cat("Patients:", nrow(clinical_data), "\n")
cat("Variables:", ncol(clinical_data), "\n")
# Variable summary
clinical_summary <- clinical_data %>%
summarise(
median_age = median(Age, na.rm = TRUE),
grade3_pct = round(mean(Grade == "Grade 3", na.rm = TRUE) * 100, 1),
lvi_present_pct = round(mean(LVI == "Present", na.rm = TRUE) * 100, 1),
high_risk_pct = round(mean(High_Risk == "High Risk", na.rm = TRUE) * 100, 1)
)
print("Clinical Characteristics:")
print(clinical_summary)
Core Visualization Functions
1. Categorical Data Visualization
Bar Charts for Group Comparisons
Essential for comparing categorical outcomes between treatment groups.
# Bar Chart Example - Tumor Grade by Treatment Group
# In jamovi: JJStatsPlot > Categorical > Bar Charts
cat("Bar Chart Analysis - Tumor Grade by Treatment Group\n")
cat("==================================================\n\n")
# Basic bar chart with statistical testing
grade_by_group <- jjbarstats(
data = clinical_data,
dep = "Grade", # Dependent variable (outcome)
group = "Group", # Grouping variable (treatment)
title = "Tumor Grade Distribution by Treatment Group",
subtitle = "Chi-square test for independence",
xlab = "Treatment Group",
ylab = "Count"
)
# Results interpretation
cat("Statistical Test: Chi-square test of independence\n")
cat("Null Hypothesis: Grade distribution is independent of treatment group\n")
cat("Alternative: Grade distribution differs between treatment groups\n\n")
# Clinical interpretation guidelines
cat("Clinical Interpretation Guidelines:\n")
cat("- Low-grade tumors (Grade 1-2): Better prognosis, less aggressive\n")
cat("- High-grade tumors (Grade 3): Poorer prognosis, more aggressive\n")
cat("- Treatment group differences may indicate selection bias or treatment effect\n")
Pie Charts for Proportion Visualization
Useful for showing the composition of categorical variables.
# Pie Chart Example - Recurrence Risk Distribution
# In jamovi: JJStatsPlot > Categorical > Pie Charts
cat("Pie Chart Analysis - Recurrence Risk Distribution\n")
cat("================================================\n\n")
recurrence_pie <- jjpiestats(
data = clinical_data,
dep = "Recurrence_Risk",
title = "Patient Distribution by Recurrence Risk",
subtitle = "Low, Intermediate, and High Risk Categories"
)
# Risk stratification summary
risk_distribution <- table(clinical_data$Recurrence_Risk)
risk_percentages <- round(prop.table(risk_distribution) * 100, 1)
cat("Recurrence Risk Distribution:\n")
for(i in 1:length(risk_distribution)) {
cat(paste0(names(risk_distribution)[i], ": ", risk_distribution[i],
" patients (", risk_percentages[i], "%)\n"))
}
cat("\nClinical Utility:\n")
cat("- Risk stratification guides treatment intensity\n")
cat("- High-risk patients may benefit from adjuvant therapy\n")
cat("- Low-risk patients may be candidates for de-escalation\n")
2. Continuous Data Visualization
Box-Violin Plots for Group Comparisons
Standard approach for comparing continuous variables between groups.
# Box-Violin Plot Example - Age by Risk Group
# In jamovi: JJStatsPlot > Continuous > Between Groups
cat("Box-Violin Plot Analysis - Age by Risk Group\n")
cat("============================================\n\n")
age_by_risk <- jjbetweenstats(
data = clinical_data,
dep = "Age", # Continuous outcome variable
group = "High_Risk", # Grouping variable
type = "parametric", # Statistical test type
pairwise.comparisons = TRUE, # Post-hoc comparisons
title = "Age Distribution by Risk Category",
subtitle = "Two-sample t-test with effect size",
xlab = "Risk Category",
ylab = "Age (years)"
)
# Effect size interpretation
cat("Statistical Analysis:\n")
cat("Test: Two-sample t-test (parametric)\n")
cat("Effect size: Cohen's d\n")
cat("Assumptions: Normality, equal variances\n\n")
cat("Effect Size Interpretation (Cohen's d):\n")
cat("- Small effect: d = 0.2\n")
cat("- Medium effect: d = 0.5\n")
cat("- Large effect: d = 0.8\n\n")
cat("Clinical Relevance:\n")
cat("- Age may be a confounding factor in risk assessment\n")
cat("- Older patients may have different treatment tolerability\n")
cat("- Age-adjusted analysis may be necessary\n")
Histograms with Statistical Testing
Evaluate distribution characteristics and test against theoretical distributions.
# Histogram Example - Age Distribution Analysis
# In jamovi: JJStatsPlot > Continuous > Histograms
cat("Histogram Analysis - Age Distribution\n")
cat("====================================\n\n")
age_histogram <- jjhistostats(
data = clinical_data,
dep = "Age",
test.value = 65, # Test against clinically relevant age
type = "parametric",
normal.curve = TRUE, # Overlay normal distribution
title = "Patient Age Distribution",
subtitle = "One-sample t-test against age 65",
xlab = "Age (years)",
binwidth = 5
)
# Distribution characteristics
age_stats <- clinical_data %>%
summarise(
n = sum(!is.na(Age)),
mean_age = round(mean(Age, na.rm = TRUE), 1),
median_age = median(Age, na.rm = TRUE),
sd_age = round(sd(Age, na.rm = TRUE), 1),
min_age = min(Age, na.rm = TRUE),
max_age = max(Age, na.rm = TRUE),
elderly_pct = round(mean(Age >= 65, na.rm = TRUE) * 100, 1)
)
cat("Age Distribution Summary:\n")
cat("Sample size:", age_stats$n, "\n")
cat("Mean age:", age_stats$mean_age, "years\n")
cat("Median age:", age_stats$median_age, "years\n")
cat("Standard deviation:", age_stats$sd_age, "years\n")
cat("Age range:", age_stats$min_age, "-", age_stats$max_age, "years\n")
cat("Elderly patients (≥65):", age_stats$elderly_pct, "%\n\n")
cat("Clinical Implications:\n")
cat("- Patient age distribution affects generalizability\n")
cat("- Elderly patients may require modified treatment protocols\n")
cat("- Age-related comorbidities impact treatment selection\n")
3. Correlation and Association Analysis
Scatter Plots with Regression Analysis
Examine relationships between continuous variables.
# Scatter Plot Example - Biomarker Correlation
# In jamovi: JJStatsPlot > Correlation > Scatter Plots
cat("Scatter Plot Analysis - Biomarker Correlation\n")
cat("=============================================\n\n")
# Using measurement variables from histopathology data
biomarker_scatter <- jjscatterstats(
data = clinical_data,
x = "MeasurementA", # First biomarker
y = "MeasurementB", # Second biomarker
type = "parametric",
conf.level = 0.95,
title = "Biomarker A vs Biomarker B Correlation",
subtitle = "Pearson correlation with 95% confidence interval",
xlab = "Biomarker A Expression",
ylab = "Biomarker B Expression"
)
# Correlation strength interpretation
cat("Correlation Strength Interpretation:\n")
cat("- Negligible: |r| < 0.1\n")
cat("- Weak: 0.1 ≤ |r| < 0.3\n")
cat("- Moderate: 0.3 ≤ |r| < 0.5\n")
cat("- Strong: 0.5 ≤ |r| < 0.7\n")
cat("- Very strong: |r| ≥ 0.7\n\n")
cat("Clinical Applications:\n")
cat("- Biomarker co-expression patterns\n")
cat("- Pathway analysis and validation\n")
cat("- Quality control for technical replicates\n")
cat("- Surrogate endpoint validation\n")
Correlation Matrix for Multiple Variables
Comprehensive analysis of relationships between multiple variables.
# Correlation Matrix Example - Clinical Variables
# In jamovi: JJStatsPlot > Correlation > Correlation Matrix
cat("Correlation Matrix Analysis - Clinical Variables\n")
cat("===============================================\n\n")
# Select continuous clinical variables
continuous_vars <- clinical_data %>%
select(Age, MeasurementA, MeasurementB, OverallTime) %>%
na.omit()
if(nrow(continuous_vars) > 10) {
correlation_matrix <- jjcorrmat(
data = continuous_vars,
type = "parametric",
title = "Clinical Variables Correlation Matrix",
subtitle = "Pearson correlations with significance testing"
)
cat("Matrix Interpretation:\n")
cat("- Diagonal elements are always 1.0 (self-correlation)\n")
cat("- Upper triangle shows correlation coefficients\n")
cat("- Lower triangle may show significance levels\n")
cat("- Color intensity indicates correlation strength\n\n")
cat("Clinical Value:\n")
cat("- Identify multicollinearity in regression models\n")
cat("- Discover unexpected relationships between variables\n")
cat("- Guide feature selection for predictive models\n")
cat("- Quality assurance for related measurements\n")
}
Advanced Clinical Applications
Paired Data Analysis
Essential for before-after comparisons and matched samples.
# Paired Analysis Example - Pre/Post Treatment Comparison
# In jamovi: JJStatsPlot > Continuous > Within Subject
cat("Paired Data Analysis - Treatment Response\n")
cat("========================================\n\n")
# Create synthetic paired data for demonstration
set.seed(123)
paired_data <- clinical_data %>%
slice_head(n = 50) %>%
mutate(
Pre_Treatment = MeasurementA,
Post_Treatment = MeasurementA + rnorm(50, mean = -2, sd = 1.5),
Patient_ID = paste0("PT", sprintf("%03d", 1:50))
) %>%
select(Patient_ID, Pre_Treatment, Post_Treatment) %>%
tidyr::pivot_longer(cols = c(Pre_Treatment, Post_Treatment),
names_to = "Timepoint",
values_to = "Biomarker_Level")
if(nrow(paired_data) > 0) {
paired_comparison <- jjwithinstats(
data = paired_data,
x = "Timepoint",
y = "Biomarker_Level",
id = "Patient_ID",
paired = TRUE,
type = "parametric",
title = "Pre vs Post-Treatment Biomarker Levels",
subtitle = "Paired t-test with individual trajectories",
xlab = "Treatment Timepoint",
ylab = "Biomarker Level"
)
cat("Paired Analysis Advantages:\n")
cat("- Controls for individual patient baseline differences\n")
cat("- Higher statistical power than independent samples\n")
cat("- Shows individual patient trajectories\n")
cat("- Appropriate for longitudinal studies\n\n")
cat("Clinical Applications:\n")
cat("- Treatment response monitoring\n")
cat("- Before-after intervention studies\n")
cat("- Matched case-control designs\n")
cat("- Quality of life assessments\n")
}
Subgroup Analysis
Evaluate treatment effects across different patient populations.
cat("Subgroup Analysis - Treatment Effect by Grade\n")
cat("=============================================\n\n")
# Subgroup analysis by tumor grade
subgroup_results <- list()
for(grade_level in c("Grade 1", "Grade 2", "Grade 3")) {
subgroup_data <- clinical_data %>%
filter(Grade == grade_level & !is.na(MeasurementA))
if(nrow(subgroup_data) >= 10) {
cat(paste0("Analysis for ", grade_level, ":\n"))
cat(paste0("Sample size: ", nrow(subgroup_data), "\n"))
# Age comparison by group within grade
age_summary <- subgroup_data %>%
group_by(Group) %>%
summarise(
n = n(),
mean_age = round(mean(Age, na.rm = TRUE), 1),
sd_age = round(sd(Age, na.rm = TRUE), 1),
.groups = 'drop'
)
print(age_summary)
cat("\n")
}
}
cat("Subgroup Analysis Considerations:\n")
cat("- Reduced sample sizes may limit statistical power\n")
cat("- Multiple comparisons increase Type I error risk\n")
cat("- Pre-planned vs post-hoc analyses have different interpretations\n")
cat("- Clinical significance may differ from statistical significance\n")
Integration with Clinical Workflows
Quality Control Applications
Statistical visualization for laboratory and clinical QC.
cat("Quality Control Applications\n")
cat("===========================\n\n")
# QC metrics analysis
qc_data <- clinical_data %>%
mutate(
# Simulate QC metrics
Processing_Batch = sample(1:5, n(), replace = TRUE),
Staining_Quality = sample(c("Excellent", "Good", "Acceptable", "Poor"),
n(), replace = TRUE, prob = c(0.4, 0.3, 0.2, 0.1)),
Measurement_CV = abs(rnorm(n(), mean = 5, sd = 2)) # Coefficient of variation
) %>%
filter(!is.na(MeasurementA))
# Batch effect analysis
if(nrow(qc_data) > 20) {
cat("Batch Effect Analysis:\n")
batch_summary <- qc_data %>%
group_by(Processing_Batch) %>%
summarise(
n = n(),
mean_measurement = round(mean(MeasurementA, na.rm = TRUE), 2),
sd_measurement = round(sd(MeasurementA, na.rm = TRUE), 2),
cv_measurement = round(sd_measurement / mean_measurement * 100, 1),
.groups = 'drop'
)
print(batch_summary)
cat("\nQuality Control Guidelines:\n")
cat("- CV <10%: Excellent reproducibility\n")
cat("- CV 10-15%: Good reproducibility\n")
cat("- CV 15-20%: Acceptable reproducibility\n")
cat("- CV >20%: Poor reproducibility, investigate\n\n")
}
cat("QC Visualization Applications:\n")
cat("- Monitor analytical performance over time\n")
cat("- Detect systematic errors and drift\n")
cat("- Validate new analytical methods\n")
cat("- Support accreditation requirements\n")
Best Practices for Clinical Research
Statistical Considerations
cat("Statistical Best Practices for Clinical Visualization\n")
cat("====================================================\n\n")
cat("1. Choose Appropriate Statistical Tests:\n")
cat(" - Parametric: Normal distribution, continuous data\n")
cat(" - Non-parametric: Skewed data, ordinal scales\n")
cat(" - Robust: Outliers present, mixed distributions\n")
cat(" - Bayesian: Small samples, prior information available\n\n")
cat("2. Effect Size Reporting:\n")
cat(" - Always report effect sizes alongside p-values\n")
cat(" - Consider clinical significance, not just statistical significance\n")
cat(" - Use confidence intervals for uncertainty quantification\n\n")
cat("3. Multiple Comparisons:\n")
cat(" - Adjust p-values when testing multiple hypotheses\n")
cat(" - Pre-specify primary and secondary endpoints\n")
cat(" - Consider false discovery rate control\n\n")
cat("4. Sample Size Considerations:\n")
cat(" - Ensure adequate power for primary analyses\n")
cat(" - Be cautious with subgroup analyses (small n)\n")
cat(" - Report confidence intervals to show precision\n\n")
cat("5. Missing Data:\n")
cat(" - Report patterns and extent of missing data\n")
cat(" - Use appropriate missing data methods\n")
cat(" - Conduct sensitivity analyses\n")
Publication Guidelines
cat("Publication-Ready Visualization Guidelines\n")
cat("==========================================\n\n")
cat("1. Figure Quality:\n")
cat(" - High resolution (≥300 DPI) for print\n")
cat(" - Clear, readable fonts (≥10 point)\n")
cat(" - Appropriate color schemes (colorblind-friendly)\n")
cat(" - Consistent styling across figures\n\n")
cat("2. Statistical Reporting:\n")
cat(" - Include sample sizes in figure legends\n")
cat(" - Report exact p-values when possible\n")
cat(" - Specify statistical tests used\n")
cat(" - Include effect sizes and confidence intervals\n\n")
cat("3. Clinical Context:\n")
cat(" - Provide clinical interpretation of findings\n")
cat(" - Discuss clinical significance vs statistical significance\n")
cat(" - Include relevant reference ranges or cutpoints\n")
cat(" - Consider real-world application\n\n")
cat("4. Reproducibility:\n")
cat(" - Document software versions\n")
cat(" - Provide clear methods description\n")
cat(" - Share analysis code when possible\n")
cat(" - Include raw data summaries\n")
Conclusion
jjstatsplot provides comprehensive statistical visualization tools essential for modern clinical research and pathology studies. The integration of statistical testing with publication-ready graphics makes complex data analysis accessible to clinical researchers while maintaining scientific rigor.
Key Advantages
- Integrated Statistics: Automatic statistical testing with visualization
- Clinical Focus: Designed for healthcare research applications
- Publication Ready: High-quality graphics suitable for journals
- User Friendly: Accessible through jamovi interface
- Comprehensive: Covers all major visualization needs
Next Steps
To explore specific visualization techniques:
- Categorical Analysis: See detailed categorical plot vignettes
- Continuous Data: Explore advanced continuous data visualization
- Correlation Studies: Learn comprehensive correlation analysis
- Clinical Applications: Study real-world clinical research examples
This introduction provides the foundation for using jjstatsplot in clinical research. Explore the detailed vignettes for specific techniques and advanced applications.