Skip to contents

Introduction

jjstatsplot provides comprehensive statistical visualization tools specifically designed for clinical research and pathology studies. Built on the powerful ggstatsplot package, it offers publication-ready plots with integrated statistical testing, making complex data analysis accessible to clinical researchers.

Learning Objectives:

  • Master statistical visualization for clinical data analysis
  • Learn publication-ready plot creation with integrated statistics
  • Apply appropriate statistical tests for different data types
  • Understand clinical interpretation of statistical visualizations
  • Implement quality control and reproducible reporting workflows

Module Overview

jjstatsplot encompasses four main areas of statistical visualization:

1. Categorical Data Analysis

  • Bar Charts: Group comparisons with chi-square testing
  • Pie Charts: Proportion visualization for categorical outcomes
  • Dot Plots: Summary statistics with confidence intervals
  • Mosaic Plots: Multi-dimensional categorical relationships

2. Continuous Data Analysis

  • Box-Violin Plots: Distribution comparisons between groups
  • Histograms: Distribution analysis with normality testing
  • Density Plots: Smooth distribution visualization
  • Within-Subject Comparisons: Paired data analysis

3. Correlation and Association

  • Scatter Plots: Relationship visualization with regression
  • Correlation Matrices: Multi-variable association analysis
  • Heatmaps: Pattern recognition in large datasets
  • Network Plots: Complex relationship visualization

4. Advanced Visualizations

  • Ridge Plots: Multiple distribution comparisons
  • Raincloud Plots: Combined distribution and individual points
  • Forest Plots: Meta-analysis and effect size visualization
  • Sankey Diagrams: Flow and pathway analysis

Getting Started

Clinical Data Examples

We’ll demonstrate jjstatsplot functions using clinical pathology datasets that represent real-world scenarios.

# Load clinical pathology datasets
data(histopathology)
data(breast_cancer_data)
data(ihc_molecular_comprehensive)

# Prepare clinical variables for analysis
clinical_data <- histopathology %>%
  mutate(
    # Convert variables to appropriate types
    Grade = factor(Grade, levels = 1:3, labels = c("Grade 1", "Grade 2", "Grade 3")),
    TStage = factor(TStage),
    Group = factor(Group),
    LVI = factor(LVI, levels = c("FALSE", "TRUE"), labels = c("Absent", "Present")),
    Age_Group = factor(ifelse(Age >= 65, "≥65 years", "<65 years")),
    
    # Create clinical outcome variables
    High_Risk = factor(ifelse(Grade == "Grade 3" | TStage %in% c("T3", "T4"), 
                             "High Risk", "Low Risk")),
    Recurrence_Risk = factor(case_when(
      Grade == "Grade 1" & TStage %in% c("T1", "T2") ~ "Low",
      Grade == "Grade 2" ~ "Intermediate", 
      Grade == "Grade 3" | TStage %in% c("T3", "T4") ~ "High",
      TRUE ~ "Intermediate"
    ), levels = c("Low", "Intermediate", "High"))
  )

cat("Clinical Dataset Overview:\n")
cat("Patients:", nrow(clinical_data), "\n")
cat("Variables:", ncol(clinical_data), "\n")

# Variable summary
clinical_summary <- clinical_data %>%
  summarise(
    median_age = median(Age, na.rm = TRUE),
    grade3_pct = round(mean(Grade == "Grade 3", na.rm = TRUE) * 100, 1),
    lvi_present_pct = round(mean(LVI == "Present", na.rm = TRUE) * 100, 1),
    high_risk_pct = round(mean(High_Risk == "High Risk", na.rm = TRUE) * 100, 1)
  )

print("Clinical Characteristics:")
print(clinical_summary)

Core Visualization Functions

1. Categorical Data Visualization

Bar Charts for Group Comparisons

Essential for comparing categorical outcomes between treatment groups.

# Bar Chart Example - Tumor Grade by Treatment Group
# In jamovi: JJStatsPlot > Categorical > Bar Charts

cat("Bar Chart Analysis - Tumor Grade by Treatment Group\n")
cat("==================================================\n\n")

# Basic bar chart with statistical testing
grade_by_group <- jjbarstats(
  data = clinical_data,
  dep = "Grade",              # Dependent variable (outcome)
  group = "Group",            # Grouping variable (treatment)
  title = "Tumor Grade Distribution by Treatment Group",
  subtitle = "Chi-square test for independence",
  xlab = "Treatment Group",
  ylab = "Count"
)

# Results interpretation
cat("Statistical Test: Chi-square test of independence\n")
cat("Null Hypothesis: Grade distribution is independent of treatment group\n")
cat("Alternative: Grade distribution differs between treatment groups\n\n")

# Clinical interpretation guidelines
cat("Clinical Interpretation Guidelines:\n")
cat("- Low-grade tumors (Grade 1-2): Better prognosis, less aggressive\n")
cat("- High-grade tumors (Grade 3): Poorer prognosis, more aggressive\n")
cat("- Treatment group differences may indicate selection bias or treatment effect\n")

Pie Charts for Proportion Visualization

Useful for showing the composition of categorical variables.

# Pie Chart Example - Recurrence Risk Distribution
# In jamovi: JJStatsPlot > Categorical > Pie Charts

cat("Pie Chart Analysis - Recurrence Risk Distribution\n")
cat("================================================\n\n")

recurrence_pie <- jjpiestats(
  data = clinical_data,
  dep = "Recurrence_Risk",
  title = "Patient Distribution by Recurrence Risk",
  subtitle = "Low, Intermediate, and High Risk Categories"
)

# Risk stratification summary
risk_distribution <- table(clinical_data$Recurrence_Risk)
risk_percentages <- round(prop.table(risk_distribution) * 100, 1)

cat("Recurrence Risk Distribution:\n")
for(i in 1:length(risk_distribution)) {
  cat(paste0(names(risk_distribution)[i], ": ", risk_distribution[i], 
            " patients (", risk_percentages[i], "%)\n"))
}

cat("\nClinical Utility:\n")
cat("- Risk stratification guides treatment intensity\n")
cat("- High-risk patients may benefit from adjuvant therapy\n")
cat("- Low-risk patients may be candidates for de-escalation\n")

2. Continuous Data Visualization

Box-Violin Plots for Group Comparisons

Standard approach for comparing continuous variables between groups.

# Box-Violin Plot Example - Age by Risk Group
# In jamovi: JJStatsPlot > Continuous > Between Groups

cat("Box-Violin Plot Analysis - Age by Risk Group\n")
cat("============================================\n\n")

age_by_risk <- jjbetweenstats(
  data = clinical_data,
  dep = "Age",                    # Continuous outcome variable
  group = "High_Risk",            # Grouping variable
  type = "parametric",            # Statistical test type
  pairwise.comparisons = TRUE,    # Post-hoc comparisons
  title = "Age Distribution by Risk Category",
  subtitle = "Two-sample t-test with effect size",
  xlab = "Risk Category",
  ylab = "Age (years)"
)

# Effect size interpretation
cat("Statistical Analysis:\n")
cat("Test: Two-sample t-test (parametric)\n")
cat("Effect size: Cohen's d\n")
cat("Assumptions: Normality, equal variances\n\n")

cat("Effect Size Interpretation (Cohen's d):\n")
cat("- Small effect: d = 0.2\n")
cat("- Medium effect: d = 0.5\n")
cat("- Large effect: d = 0.8\n\n")

cat("Clinical Relevance:\n")
cat("- Age may be a confounding factor in risk assessment\n")
cat("- Older patients may have different treatment tolerability\n")
cat("- Age-adjusted analysis may be necessary\n")

Histograms with Statistical Testing

Evaluate distribution characteristics and test against theoretical distributions.

# Histogram Example - Age Distribution Analysis
# In jamovi: JJStatsPlot > Continuous > Histograms

cat("Histogram Analysis - Age Distribution\n")
cat("====================================\n\n")

age_histogram <- jjhistostats(
  data = clinical_data,
  dep = "Age",
  test.value = 65,               # Test against clinically relevant age
  type = "parametric",
  normal.curve = TRUE,           # Overlay normal distribution
  title = "Patient Age Distribution",
  subtitle = "One-sample t-test against age 65",
  xlab = "Age (years)",
  binwidth = 5
)

# Distribution characteristics
age_stats <- clinical_data %>%
  summarise(
    n = sum(!is.na(Age)),
    mean_age = round(mean(Age, na.rm = TRUE), 1),
    median_age = median(Age, na.rm = TRUE),
    sd_age = round(sd(Age, na.rm = TRUE), 1),
    min_age = min(Age, na.rm = TRUE),
    max_age = max(Age, na.rm = TRUE),
    elderly_pct = round(mean(Age >= 65, na.rm = TRUE) * 100, 1)
  )

cat("Age Distribution Summary:\n")
cat("Sample size:", age_stats$n, "\n")
cat("Mean age:", age_stats$mean_age, "years\n")
cat("Median age:", age_stats$median_age, "years\n")
cat("Standard deviation:", age_stats$sd_age, "years\n")
cat("Age range:", age_stats$min_age, "-", age_stats$max_age, "years\n")
cat("Elderly patients (≥65):", age_stats$elderly_pct, "%\n\n")

cat("Clinical Implications:\n")
cat("- Patient age distribution affects generalizability\n")
cat("- Elderly patients may require modified treatment protocols\n")
cat("- Age-related comorbidities impact treatment selection\n")

3. Correlation and Association Analysis

Scatter Plots with Regression Analysis

Examine relationships between continuous variables.

# Scatter Plot Example - Biomarker Correlation
# In jamovi: JJStatsPlot > Correlation > Scatter Plots

cat("Scatter Plot Analysis - Biomarker Correlation\n")
cat("=============================================\n\n")

# Using measurement variables from histopathology data
biomarker_scatter <- jjscatterstats(
  data = clinical_data,
  x = "MeasurementA",           # First biomarker
  y = "MeasurementB",           # Second biomarker
  type = "parametric",
  conf.level = 0.95,
  title = "Biomarker A vs Biomarker B Correlation",
  subtitle = "Pearson correlation with 95% confidence interval",
  xlab = "Biomarker A Expression",
  ylab = "Biomarker B Expression"
)

# Correlation strength interpretation
cat("Correlation Strength Interpretation:\n")
cat("- Negligible: |r| < 0.1\n")
cat("- Weak: 0.1 ≤ |r| < 0.3\n")
cat("- Moderate: 0.3 ≤ |r| < 0.5\n")
cat("- Strong: 0.5 ≤ |r| < 0.7\n")
cat("- Very strong: |r| ≥ 0.7\n\n")

cat("Clinical Applications:\n")
cat("- Biomarker co-expression patterns\n")
cat("- Pathway analysis and validation\n")
cat("- Quality control for technical replicates\n")
cat("- Surrogate endpoint validation\n")

Correlation Matrix for Multiple Variables

Comprehensive analysis of relationships between multiple variables.

# Correlation Matrix Example - Clinical Variables
# In jamovi: JJStatsPlot > Correlation > Correlation Matrix

cat("Correlation Matrix Analysis - Clinical Variables\n")
cat("===============================================\n\n")

# Select continuous clinical variables
continuous_vars <- clinical_data %>%
  select(Age, MeasurementA, MeasurementB, OverallTime) %>%
  na.omit()

if(nrow(continuous_vars) > 10) {
  correlation_matrix <- jjcorrmat(
    data = continuous_vars,
    type = "parametric",
    title = "Clinical Variables Correlation Matrix",
    subtitle = "Pearson correlations with significance testing"
  )
  
  cat("Matrix Interpretation:\n")
  cat("- Diagonal elements are always 1.0 (self-correlation)\n")
  cat("- Upper triangle shows correlation coefficients\n")
  cat("- Lower triangle may show significance levels\n")
  cat("- Color intensity indicates correlation strength\n\n")
  
  cat("Clinical Value:\n")
  cat("- Identify multicollinearity in regression models\n")
  cat("- Discover unexpected relationships between variables\n")
  cat("- Guide feature selection for predictive models\n")
  cat("- Quality assurance for related measurements\n")
}

Advanced Clinical Applications

Paired Data Analysis

Essential for before-after comparisons and matched samples.

# Paired Analysis Example - Pre/Post Treatment Comparison
# In jamovi: JJStatsPlot > Continuous > Within Subject

cat("Paired Data Analysis - Treatment Response\n")
cat("========================================\n\n")

# Create synthetic paired data for demonstration
set.seed(123)
paired_data <- clinical_data %>%
  slice_head(n = 50) %>%
  mutate(
    Pre_Treatment = MeasurementA,
    Post_Treatment = MeasurementA + rnorm(50, mean = -2, sd = 1.5),
    Patient_ID = paste0("PT", sprintf("%03d", 1:50))
  ) %>%
  select(Patient_ID, Pre_Treatment, Post_Treatment) %>%
  tidyr::pivot_longer(cols = c(Pre_Treatment, Post_Treatment),
                     names_to = "Timepoint", 
                     values_to = "Biomarker_Level")

if(nrow(paired_data) > 0) {
  paired_comparison <- jjwithinstats(
    data = paired_data,
    x = "Timepoint",
    y = "Biomarker_Level",
    id = "Patient_ID",
    paired = TRUE,
    type = "parametric",
    title = "Pre vs Post-Treatment Biomarker Levels",
    subtitle = "Paired t-test with individual trajectories",
    xlab = "Treatment Timepoint",
    ylab = "Biomarker Level"
  )
  
  cat("Paired Analysis Advantages:\n")
  cat("- Controls for individual patient baseline differences\n")
  cat("- Higher statistical power than independent samples\n")
  cat("- Shows individual patient trajectories\n")
  cat("- Appropriate for longitudinal studies\n\n")
  
  cat("Clinical Applications:\n")
  cat("- Treatment response monitoring\n")
  cat("- Before-after intervention studies\n")
  cat("- Matched case-control designs\n")
  cat("- Quality of life assessments\n")
}

Subgroup Analysis

Evaluate treatment effects across different patient populations.

cat("Subgroup Analysis - Treatment Effect by Grade\n")
cat("=============================================\n\n")

# Subgroup analysis by tumor grade
subgroup_results <- list()

for(grade_level in c("Grade 1", "Grade 2", "Grade 3")) {
  subgroup_data <- clinical_data %>%
    filter(Grade == grade_level & !is.na(MeasurementA))
  
  if(nrow(subgroup_data) >= 10) {
    cat(paste0("Analysis for ", grade_level, ":\n"))
    cat(paste0("Sample size: ", nrow(subgroup_data), "\n"))
    
    # Age comparison by group within grade
    age_summary <- subgroup_data %>%
      group_by(Group) %>%
      summarise(
        n = n(),
        mean_age = round(mean(Age, na.rm = TRUE), 1),
        sd_age = round(sd(Age, na.rm = TRUE), 1),
        .groups = 'drop'
      )
    
    print(age_summary)
    cat("\n")
  }
}

cat("Subgroup Analysis Considerations:\n")
cat("- Reduced sample sizes may limit statistical power\n")
cat("- Multiple comparisons increase Type I error risk\n")
cat("- Pre-planned vs post-hoc analyses have different interpretations\n")
cat("- Clinical significance may differ from statistical significance\n")

Integration with Clinical Workflows

Quality Control Applications

Statistical visualization for laboratory and clinical QC.

cat("Quality Control Applications\n")
cat("===========================\n\n")

# QC metrics analysis
qc_data <- clinical_data %>%
  mutate(
    # Simulate QC metrics
    Processing_Batch = sample(1:5, n(), replace = TRUE),
    Staining_Quality = sample(c("Excellent", "Good", "Acceptable", "Poor"), 
                             n(), replace = TRUE, prob = c(0.4, 0.3, 0.2, 0.1)),
    Measurement_CV = abs(rnorm(n(), mean = 5, sd = 2))  # Coefficient of variation
  ) %>%
  filter(!is.na(MeasurementA))

# Batch effect analysis
if(nrow(qc_data) > 20) {
  cat("Batch Effect Analysis:\n")
  batch_summary <- qc_data %>%
    group_by(Processing_Batch) %>%
    summarise(
      n = n(),
      mean_measurement = round(mean(MeasurementA, na.rm = TRUE), 2),
      sd_measurement = round(sd(MeasurementA, na.rm = TRUE), 2),
      cv_measurement = round(sd_measurement / mean_measurement * 100, 1),
      .groups = 'drop'
    )
  
  print(batch_summary)
  
  cat("\nQuality Control Guidelines:\n")
  cat("- CV <10%: Excellent reproducibility\n")
  cat("- CV 10-15%: Good reproducibility\n") 
  cat("- CV 15-20%: Acceptable reproducibility\n")
  cat("- CV >20%: Poor reproducibility, investigate\n\n")
}

cat("QC Visualization Applications:\n")
cat("- Monitor analytical performance over time\n")
cat("- Detect systematic errors and drift\n")
cat("- Validate new analytical methods\n")
cat("- Support accreditation requirements\n")

Best Practices for Clinical Research

Statistical Considerations

cat("Statistical Best Practices for Clinical Visualization\n")
cat("====================================================\n\n")

cat("1. Choose Appropriate Statistical Tests:\n")
cat("   - Parametric: Normal distribution, continuous data\n")
cat("   - Non-parametric: Skewed data, ordinal scales\n")
cat("   - Robust: Outliers present, mixed distributions\n")
cat("   - Bayesian: Small samples, prior information available\n\n")

cat("2. Effect Size Reporting:\n")
cat("   - Always report effect sizes alongside p-values\n")
cat("   - Consider clinical significance, not just statistical significance\n")
cat("   - Use confidence intervals for uncertainty quantification\n\n")

cat("3. Multiple Comparisons:\n")
cat("   - Adjust p-values when testing multiple hypotheses\n")
cat("   - Pre-specify primary and secondary endpoints\n")
cat("   - Consider false discovery rate control\n\n")

cat("4. Sample Size Considerations:\n")
cat("   - Ensure adequate power for primary analyses\n")
cat("   - Be cautious with subgroup analyses (small n)\n")
cat("   - Report confidence intervals to show precision\n\n")

cat("5. Missing Data:\n")
cat("   - Report patterns and extent of missing data\n")
cat("   - Use appropriate missing data methods\n")
cat("   - Conduct sensitivity analyses\n")

Publication Guidelines

cat("Publication-Ready Visualization Guidelines\n")
cat("==========================================\n\n")

cat("1. Figure Quality:\n")
cat("   - High resolution (≥300 DPI) for print\n")
cat("   - Clear, readable fonts (≥10 point)\n")
cat("   - Appropriate color schemes (colorblind-friendly)\n")
cat("   - Consistent styling across figures\n\n")

cat("2. Statistical Reporting:\n")
cat("   - Include sample sizes in figure legends\n")
cat("   - Report exact p-values when possible\n")
cat("   - Specify statistical tests used\n")
cat("   - Include effect sizes and confidence intervals\n\n")

cat("3. Clinical Context:\n")
cat("   - Provide clinical interpretation of findings\n")
cat("   - Discuss clinical significance vs statistical significance\n")
cat("   - Include relevant reference ranges or cutpoints\n")
cat("   - Consider real-world application\n\n")

cat("4. Reproducibility:\n")
cat("   - Document software versions\n")
cat("   - Provide clear methods description\n")
cat("   - Share analysis code when possible\n")
cat("   - Include raw data summaries\n")

Conclusion

jjstatsplot provides comprehensive statistical visualization tools essential for modern clinical research and pathology studies. The integration of statistical testing with publication-ready graphics makes complex data analysis accessible to clinical researchers while maintaining scientific rigor.

Key Advantages

  1. Integrated Statistics: Automatic statistical testing with visualization
  2. Clinical Focus: Designed for healthcare research applications
  3. Publication Ready: High-quality graphics suitable for journals
  4. User Friendly: Accessible through jamovi interface
  5. Comprehensive: Covers all major visualization needs

Next Steps

To explore specific visualization techniques:

  • Categorical Analysis: See detailed categorical plot vignettes
  • Continuous Data: Explore advanced continuous data visualization
  • Correlation Studies: Learn comprehensive correlation analysis
  • Clinical Applications: Study real-world clinical research examples

This introduction provides the foundation for using jjstatsplot in clinical research. Explore the detailed vignettes for specific techniques and advanced applications.