Professional Violin Plots with jviolin

Introduction

The jviolin function provides professional violin plots for visualizing the distribution of continuous data across groups. Violin plots combine the best features of box plots and density plots, showing both summary statistics and the shape of the data distribution.

What are Violin Plots?

Violin plots are mirror-symmetric density plots that display: - Distribution shape: The width shows data density at each value - Summary statistics: Can include quartiles, medians, and means - Group comparisons: Easy visual comparison across categories - Data points: Optional overlay of individual observations

Key Features

Advanced Overlays: Boxplots, individual points, mean indicators, quantile lines
Multiple Distributions: Handle normal, skewed, bimodal, and complex distributions
Professional Styling: Multiple themes and color palettes
Performance Optimized: Intelligent caching for large datasets
Flexible Customization: Extensive options for appearance and behavior
Clinical Focus: Designed for medical and clinical research applications

Installation and Setup

# Load required libraries
library(ClinicoPath)

## Warning: replacing previous import 'dplyr::as_data_frame' by
## 'igraph::as_data_frame' when loading 'ClinicoPath'

## Warning: replacing previous import 'DiagrammeR::count_automorphisms' by
## 'igraph::count_automorphisms' when loading 'ClinicoPath'

## Warning: replacing previous import 'dplyr::groups' by 'igraph::groups' when
## loading 'ClinicoPath'

## Warning: replacing previous import 'DiagrammeR::get_edge_ids' by
## 'igraph::get_edge_ids' when loading 'ClinicoPath'

## Warning: replacing previous import 'dplyr::union' by 'igraph::union' when
## loading 'ClinicoPath'

## Warning: replacing previous import 'dplyr::select' by 'jmvcore::select' when
## loading 'ClinicoPath'

## Warning: replacing previous import 'igraph::union' by 'lubridate::union' when
## loading 'ClinicoPath'

## Warning: replacing previous import 'igraph::%--%' by 'lubridate::%--%' when
## loading 'ClinicoPath'

## Warning: replacing previous import 'cutpointr::tnr' by 'mlr3measures::tnr' when
## loading 'ClinicoPath'

## Warning: replacing previous import 'cutpointr::precision' by
## 'mlr3measures::precision' when loading 'ClinicoPath'

## Warning: replacing previous import 'cutpointr::tn' by 'mlr3measures::tn' when
## loading 'ClinicoPath'

## Warning: replacing previous import 'cutpointr::fnr' by 'mlr3measures::fnr' when
## loading 'ClinicoPath'

## Warning: replacing previous import 'cutpointr::tp' by 'mlr3measures::tp' when
## loading 'ClinicoPath'

## Warning: replacing previous import 'cutpointr::npv' by 'mlr3measures::npv' when
## loading 'ClinicoPath'

## Warning: replacing previous import 'cutpointr::ppv' by 'mlr3measures::ppv' when
## loading 'ClinicoPath'

## Warning: replacing previous import 'cutpointr::auc' by 'mlr3measures::auc' when
## loading 'ClinicoPath'

## Warning: replacing previous import 'cutpointr::tpr' by 'mlr3measures::tpr' when
## loading 'ClinicoPath'

## Warning: replacing previous import 'cutpointr::fn' by 'mlr3measures::fn' when
## loading 'ClinicoPath'

## Warning: replacing previous import 'cutpointr::fp' by 'mlr3measures::fp' when
## loading 'ClinicoPath'

## Warning: replacing previous import 'cutpointr::fpr' by 'mlr3measures::fpr' when
## loading 'ClinicoPath'

## Warning: replacing previous import 'cutpointr::recall' by
## 'mlr3measures::recall' when loading 'ClinicoPath'

## Warning: replacing previous import 'cutpointr::specificity' by
## 'mlr3measures::specificity' when loading 'ClinicoPath'

## Warning: replacing previous import 'cutpointr::sensitivity' by
## 'mlr3measures::sensitivity' when loading 'ClinicoPath'

## Warning: replacing previous import 'igraph::as_data_frame' by
## 'tibble::as_data_frame' when loading 'ClinicoPath'

## Warning: replacing previous import 'igraph::crossing' by 'tidyr::crossing' when
## loading 'ClinicoPath'

## Warning: replacing previous import 'magrittr::extract' by 'tidyr::extract' when
## loading 'ClinicoPath'

## Warning: replacing previous import 'mlr3measures::sensitivity' by
## 'caret::sensitivity' when loading 'ClinicoPath'

## Warning: replacing previous import 'mlr3measures::specificity' by
## 'caret::specificity' when loading 'ClinicoPath'

## Registered S3 methods overwritten by 'useful':
##   method         from     
##   autoplot.acf   ggfortify
##   fortify.acf    ggfortify
##   fortify.kmeans ggfortify
##   fortify.ts     ggfortify

## Warning: replacing previous import 'jmvcore::select' by 'dplyr::select' when
## loading 'ClinicoPath'

## Registered S3 methods overwritten by 'ggpp':
##   method                  from   
##   heightDetails.titleGrob ggplot2
##   widthDetails.titleGrob  ggplot2

## Warning: replacing previous import 'DataExplorer::plot_histogram' by
## 'grafify::plot_histogram' when loading 'ClinicoPath'

## Warning: replacing previous import 'dplyr::select' by 'jmvcore::select' when
## loading 'ClinicoPath'

## Warning: replacing previous import 'mlr3measures::auc' by 'pROC::auc' when
## loading 'ClinicoPath'

## Warning: replacing previous import 'cutpointr::roc' by 'pROC::roc' when loading
## 'ClinicoPath'

## Warning: replacing previous import 'tibble::view' by 'summarytools::view' when
## loading 'ClinicoPath'

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(ggplot2)

# Set options for better output
options(digits = 3)
knitr::opts_chunk$set(
  fig.width = 12,
  fig.height = 8,
    dpi = 300,
  echo = TRUE,
  eval = FALSE,
  out.width = "100%"
)

# Check if packages are available
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  message("Note: ggplot2 package required for violin plots")
}

Data Preparation

Let’s create realistic datasets for different types of violin plot applications:

Clinical Trial Data

# Create clinical trial dataset
set.seed(123)
clinical_data <- data.frame(
  # Patient identifiers
  patient_id = paste0("PT_", sprintf("%03d", 1:240)),
  
  # Treatment groups
  treatment_arm = factor(sample(c("Placebo", "Low_Dose", "High_Dose"), 
                               240, replace = TRUE, prob = c(0.4, 0.3, 0.3))),
  study_site = factor(sample(c("Site_A", "Site_B", "Site_C"), 240, replace = TRUE)),
  
  # Patient characteristics
  age = round(rnorm(240, 58, 15)),
  gender = factor(sample(c("Male", "Female"), 240, replace = TRUE)),
  baseline_severity = factor(sample(c("Mild", "Moderate", "Severe"), 
                                   240, replace = TRUE, prob = c(0.4, 0.4, 0.2))),
  
  # Outcome measures
  baseline_score = rnorm(240, 45, 12),
  blood_pressure = rnorm(240, 130, 20),
  quality_of_life = rnorm(240, 60, 18),
  
  # Laboratory values (often log-normally distributed)
  biomarker_a = exp(rnorm(240, 2, 0.8)),
  biomarker_b = exp(rnorm(240, 3, 0.6)),
  inflammatory_marker = exp(rnorm(240, 1.5, 1.0))
) %>%
  mutate(
    # Create realistic dose-response relationships
    change_from_baseline = case_when(
      treatment_arm == "Placebo" ~ rnorm(240, 2, 8),
      treatment_arm == "Low_Dose" ~ rnorm(240, 8, 10),
      treatment_arm == "High_Dose" ~ rnorm(240, 15, 12)
    ) + 
      # Add baseline severity effect
      case_when(
        baseline_severity == "Mild" ~ rnorm(240, 5, 5),
        baseline_severity == "Moderate" ~ rnorm(240, 0, 5),
        baseline_severity == "Severe" ~ rnorm(240, -5, 5)
      ) +
      rnorm(240, 0, 6),
    
    # Quality of life improvement
    qol_improvement = case_when(
      treatment_arm == "Placebo" ~ rnorm(240, 5, 15),
      treatment_arm == "Low_Dose" ~ rnorm(240, 12, 18),
      treatment_arm == "High_Dose" ~ rnorm(240, 20, 20)
    ) + rnorm(240, 0, 10),
    
    # Biomarker changes (treatment reduces inflammatory marker)
    inflammatory_marker = inflammatory_marker * case_when(
      treatment_arm == "Placebo" ~ 1,
      treatment_arm == "Low_Dose" ~ 0.8,
      treatment_arm == "High_Dose" ~ 0.6
    ) * exp(rnorm(240, 0, 0.3))
  )

# Display data structure
str(clinical_data)

Educational Assessment Data

# Create educational assessment dataset
education_data <- data.frame(
  student_id = 1:300,
  
  # School characteristics
  school_type = factor(sample(c("Public", "Private", "Charter"), 
                             300, replace = TRUE, prob = c(0.6, 0.25, 0.15))),
  class_size = factor(sample(c("Small", "Medium", "Large"), 
                            300, replace = TRUE, prob = c(0.3, 0.5, 0.2))),
  
  # Student demographics
  grade_level = factor(sample(c("9th", "10th", "11th", "12th"), 300, replace = TRUE)),
  gender = factor(sample(c("Male", "Female"), 300, replace = TRUE)),
  socioeconomic = factor(sample(c("Low", "Middle", "High"), 
                               300, replace = TRUE, prob = c(0.3, 0.5, 0.2))),
  
  # Teaching interventions
  teaching_method = factor(sample(c("Traditional", "Technology", "Hybrid"), 300, replace = TRUE)),
  tutoring = factor(sample(c("No", "Yes"), 300, replace = TRUE, prob = c(0.7, 0.3))),
  
  # Test scores with realistic distributions
  math_score = rnorm(300, 75, 15),
  reading_score = rnorm(300, 78, 13),
  science_score = rnorm(300, 72, 14)
) %>%
  mutate(
    # Create realistic educational relationships
    math_score = pmax(0, pmin(100, math_score + 
      case_when(
        teaching_method == "Traditional" ~ 0,
        teaching_method == "Technology" ~ 5,
        teaching_method == "Hybrid" ~ 8
      ) +
      ifelse(tutoring == "Yes", 10, 0) +
      case_when(
        socioeconomic == "Low" ~ -8,
        socioeconomic == "Middle" ~ 0,
        socioeconomic == "High" ~ 8
      ) + rnorm(300, 0, 8))),
    
    reading_score = pmax(0, pmin(100, reading_score + 
      0.3 * (math_score - 75) + rnorm(300, 0, 6))),
    
    science_score = pmax(0, pmin(100, science_score + 
      0.4 * (math_score - 75) + rnorm(300, 0, 7)))
  )

str(education_data)

Basic Usage

Simple Violin Plot

The most basic violin plot shows the distribution of a continuous variable across groups:

# Basic violin plot
basic_result <- jviolin(
  data = clinical_data,
  dep = "change_from_baseline",
  group = "treatment_arm"
)

# The result contains the plot object
print(names(basic_result))

Understanding the Output

A basic violin plot shows: - Width: Indicates data density - wider areas have more observations - Shape: Shows distribution characteristics (normal, skewed, bimodal) - Symmetry: Violin plots are mirrored density plots - Comparison: Easy visual comparison between groups

Different Distribution Types

Violin plots excel at showing different distribution shapes:

# Normal distribution
normal_violin <- jviolin(
  data = clinical_data,
  dep = "change_from_baseline", 
  group = "treatment_arm",
  themex = "minimal"
)

# Skewed distribution (biomarker data)
skewed_violin <- jviolin(
  data = clinical_data,
  dep = "inflammatory_marker",
  group = "treatment_arm", 
  themex = "minimal"
)

# Multiple outcomes comparison
qol_violin <- jviolin(
  data = clinical_data,
  dep = "qol_improvement",
  group = "treatment_arm",
  themex = "minimal"
)

Group Comparisons

Violin plots are ideal for comparing distributions across multiple groups:

# Educational intervention comparison
education_violin <- jviolin(
  data = education_data,
  dep = "math_score",
  group = "teaching_method",
  themex = "classic"
)

# Socioeconomic impact
socio_violin <- jviolin(
  data = education_data,
  dep = "math_score",
  group = "socioeconomic",
  themex = "classic"
)

Advanced Features

Boxplot Overlay

Add boxplots inside violins to show summary statistics:

# Violin with boxplot overlay
violin_with_box <- jviolin(
  data = clinical_data,
  dep = "change_from_baseline",
  group = "treatment_arm",
  add_boxplot = TRUE,
  boxplot_width = 0.2,
  boxplot_alpha = 0.8,
  themex = "bw"
)

The boxplot overlay shows: - Median: Central line - Quartiles: Box boundaries (25th and 75th percentiles) - Whiskers: Extend to data within 1.5 × IQR - Outliers: Points beyond whiskers (hidden when points are shown)

Individual Data Points

Add individual observations as points:

# Violin with data points
violin_with_points <- jviolin(
  data = clinical_data,
  dep = "qol_improvement",
  group = "treatment_arm",
  add_points = TRUE,
  point_size = 1.5,
  point_alpha = 0.6,
  point_jitter = TRUE,
  themex = "light"
)

# Without jitter (stacked points)
violin_no_jitter <- jviolin(
  data = clinical_data,
  dep = "qol_improvement",
  group = "treatment_arm",
  add_points = TRUE,
  point_size = 1.0,
  point_alpha = 0.5,
  point_jitter = FALSE,
  themex = "light"
)

Point options: - Jitter: Horizontal spread to prevent overlap - Size: Point size adjustment - Alpha: Transparency level - Color: Automatically matches group colors

Mean Indicators

Add mean values as prominent markers:

# Violin with mean indicators
violin_with_means <- jviolin(
  data = clinical_data,
  dep = "change_from_baseline",
  group = "treatment_arm",
  add_boxplot = TRUE,
  add_mean = TRUE,
  themex = "minimal"
)

Mean indicators appear as red diamond-shaped points, making it easy to compare central tendencies across groups.

Quantile Lines

Add horizontal lines at specific quantiles:

# Standard quartiles
violin_quartiles <- jviolin(
  data = clinical_data,
  dep = "inflammatory_marker",
  group = "treatment_arm",
  draw_quantiles = TRUE,
  quantile_lines = "0.25,0.5,0.75",
  themex = "dark"
)

# Custom quantiles
violin_custom_quant <- jviolin(
  data = clinical_data,
  dep = "inflammatory_marker",
  group = "treatment_arm",
  draw_quantiles = TRUE,
  quantile_lines = "0.1,0.9",  # 10th and 90th percentiles
  themex = "dark"
)

# Five-number summary
violin_five_num <- jviolin(
  data = clinical_data,
  dep = "inflammatory_marker",
  group = "treatment_arm",
  draw_quantiles = TRUE,
  quantile_lines = "0.1,0.25,0.5,0.75,0.9",
  themex = "dark"
)

Quantile lines help identify: - Distribution spread: Distance between quantiles - Skewness: Asymmetric quantile spacing - Outliers: Values beyond extreme quantiles

Complex Combinations

Combine multiple features for comprehensive visualization:

# Kitchen sink violin plot
complex_violin <- jviolin(
  data = clinical_data,
  dep = "change_from_baseline",
  group = "treatment_arm",
  add_boxplot = TRUE,
  add_points = TRUE,
  add_mean = TRUE,
  draw_quantiles = TRUE,
  quantile_lines = "0.25,0.5,0.75",
  point_size = 1.2,
  point_alpha = 0.4,
  point_jitter = TRUE,
  boxplot_width = 0.15,
  boxplot_alpha = 0.9,
  themex = "minimal"
)

Violin Customization

Violin Scaling Options

Control how violin widths are scaled:

# Equal area (default) - all violins have same area
area_scaling <- jviolin(
  data = clinical_data,
  dep = "change_from_baseline",
  group = "treatment_arm",
  scale_violin = "area",
  add_boxplot = TRUE
)

# Equal count - areas proportional to sample size
count_scaling <- jviolin(
  data = clinical_data,
  dep = "change_from_baseline",
  group = "treatment_arm",
  scale_violin = "count",
  add_boxplot = TRUE
)

# Equal width - all violins have same maximum width
width_scaling <- jviolin(
  data = clinical_data,
  dep = "change_from_baseline",
  group = "treatment_arm",
  scale_violin = "width",
  add_boxplot = TRUE
)

Scaling options: - Area: Equal total area (default) - good for comparing shapes - Count: Area proportional to sample size - emphasizes group sizes
- Width: Equal maximum width - good for detailed shape comparison

Trimming and Width

Control violin appearance:

# Trimmed violins (default)
trimmed_violin <- jviolin(
  data = clinical_data,
  dep = "inflammatory_marker",
  group = "treatment_arm",
  trim_violin = TRUE,
  violin_width = 1.0,
  violin_alpha = 0.7
)

# Untrimmed violins - show full density estimation
untrimmed_violin <- jviolin(
  data = clinical_data,
  dep = "inflammatory_marker",
  group = "treatment_arm",
  trim_violin = FALSE,
  violin_width = 1.2,
  violin_alpha = 0.6
)

# Wide, transparent violins
wide_violin <- jviolin(
  data = clinical_data,
  dep = "inflammatory_marker",
  group = "treatment_arm",
  violin_width = 1.5,
  violin_alpha = 0.5,
  add_boxplot = TRUE
)

Color and Fill Variables

Use different variables for color and fill aesthetics:

# Different fill variable
fill_mapping <- jviolin(
  data = clinical_data,
  dep = "change_from_baseline",
  group = "treatment_arm",
  fill = "baseline_severity",
  add_boxplot = TRUE
)

# Different color and fill variables
color_fill_mapping <- jviolin(
  data = clinical_data,
  dep = "qol_improvement",
  group = "study_site",
  fill = "treatment_arm",
  col = "gender",
  add_boxplot = TRUE
)

# Education example with multiple groupings
education_mapping <- jviolin(
  data = education_data,
  dep = "math_score",
  group = "teaching_method",
  fill = "socioeconomic",
  add_points = TRUE,
  point_alpha = 0.4
)

Color Palettes and Themes

Color Palettes

Choose from various color schemes:

# Default ggplot2 colors
default_colors <- jviolin(
  data = clinical_data,
  dep = "change_from_baseline",
  group = "treatment_arm",
  color_palette = "default",
  add_boxplot = TRUE
)

# Viridis (color-blind friendly)
viridis_colors <- jviolin(
  data = clinical_data,
  dep = "change_from_baseline",
  group = "treatment_arm",
  color_palette = "viridis",
  add_boxplot = TRUE
)

# ColorBrewer palettes
brewer_colors <- jviolin(
  data = clinical_data,
  dep = "change_from_baseline",
  group = "treatment_arm",
  color_palette = "brewer",
  add_boxplot = TRUE
)

# Manual colors
manual_colors <- jviolin(
  data = clinical_data,
  dep = "change_from_baseline",
  group = "treatment_arm",
  color_palette = "manual",
  manual_colors = "red,blue,green",
  add_boxplot = TRUE
)

Theme Options

Apply different visual themes:

# Minimal theme (clean, publication-ready)
minimal_theme <- jviolin(
  data = clinical_data,
  dep = "qol_improvement",
  group = "treatment_arm",
  themex = "minimal",
  add_boxplot = TRUE
)

# Classic theme (traditional R plotting)
classic_theme <- jviolin(
  data = clinical_data,
  dep = "qol_improvement",
  group = "treatment_arm",
  themex = "classic",
  add_boxplot = TRUE
)

# Black and white (high contrast)
bw_theme <- jviolin(
  data = clinical_data,
  dep = "qol_improvement",
  group = "treatment_arm",
  themex = "bw",
  add_boxplot = TRUE
)

# Dark theme
dark_theme <- jviolin(
  data = clinical_data,
  dep = "qol_improvement",
  group = "treatment_arm",
  themex = "dark",
  add_boxplot = TRUE,
  color_palette = "viridis"
)

Available themes: - minimal: Clean, modern appearance - classic: Traditional statistical plots - bw: Black and white, high contrast - dark: Dark background for presentations - light, grey, void: Additional options - ipsum: Modern typography (requires hrbrthemes package)

Real-World Applications

Clinical Trial Analysis

Dose-Response Relationships

# Primary efficacy endpoint
efficacy_violin <- jviolin(
  data = clinical_data,
  dep = "change_from_baseline",
  group = "treatment_arm",
  add_boxplot = TRUE,
  add_mean = TRUE,
  draw_quantiles = TRUE,
  quantile_lines = "0.25,0.5,0.75",
  themex = "minimal",
  usexlabel = TRUE,
  xlabel = "Treatment Group",
  useylabel = TRUE,
  ylabel = "Change from Baseline (points)"
)

Safety Analysis

# Biomarker safety assessment
safety_violin <- jviolin(
  data = clinical_data,
  dep = "inflammatory_marker",
  group = "treatment_arm",
  add_boxplot = TRUE,
  add_points = TRUE,
  point_alpha = 0.5,
  scale_violin = "width",
  themex = "classic",
  usexlabel = TRUE,
  xlabel = "Treatment Group",
  useylabel = TRUE,
  ylabel = "Inflammatory Marker (ng/mL)"
)

Subgroup Analysis

# Efficacy by baseline severity
subgroup_violin <- jviolin(
  data = clinical_data,
  dep = "change_from_baseline",
  group = "baseline_severity",
  fill = "treatment_arm",
  add_boxplot = TRUE,
  color_palette = "viridis",
  themex = "minimal",
  usexlabel = TRUE,
  xlabel = "Baseline Disease Severity",
  useylabel = TRUE,
  ylabel = "Treatment Response"
)

Educational Research

Teaching Method Effectiveness

# Math score improvement by teaching method
teaching_violin <- jviolin(
  data = education_data,
  dep = "math_score",
  group = "teaching_method",
  add_boxplot = TRUE,
  add_mean = TRUE,
  themex = "classic",
  usexlabel = TRUE,
  xlabel = "Teaching Method",
  useylabel = TRUE,
  ylabel = "Math Score"
)

Equity Analysis

# Achievement gaps by socioeconomic status
equity_violin <- jviolin(
  data = education_data,
  dep = "math_score",
  group = "socioeconomic",
  fill = "school_type",
  add_boxplot = TRUE,
  add_points = TRUE,
  point_alpha = 0.4,
  color_palette = "brewer",
  themex = "minimal",
  usexlabel = TRUE,
  xlabel = "Socioeconomic Status",
  useylabel = TRUE,
  ylabel = "Math Achievement Score"
)

Intervention Impact

# Tutoring effectiveness across different contexts
tutoring_violin <- jviolin(
  data = education_data,
  dep = "math_score",
  group = "tutoring",
  fill = "teaching_method",
  add_boxplot = TRUE,
  add_mean = TRUE,
  color_palette = "viridis",
  themex = "light"
)

Advanced Techniques

Coordinate Flipping

Create horizontal violin plots:

# Horizontal violin plot
horizontal_violin <- jviolin(
  data = clinical_data,
  dep = "qol_improvement",
  group = "treatment_arm",
  add_boxplot = TRUE,
  flip = TRUE,
  themex = "minimal"
)

Horizontal plots are useful when: - Group names are long - Space is limited vertically - Following certain journal conventions

Missing Data Handling

Handle missing data appropriately:

# Create data with missing values
missing_data <- clinical_data
missing_data$change_from_baseline[sample(1:240, 30)] <- NA
missing_data$treatment_arm[sample(1:240, 10)] <- NA

# Exclude missing data
exclude_missing <- jviolin(
  data = missing_data,
  dep = "change_from_baseline",
  group = "treatment_arm",
  excl = TRUE,
  add_boxplot = TRUE,
  themex = "minimal"
)

# Include missing data (default behavior)
include_missing <- jviolin(
  data = missing_data,
  dep = "change_from_baseline",
  group = "treatment_arm", 
  excl = FALSE,
  add_boxplot = TRUE,
  themex = "minimal"
)

Large Dataset Optimization

The function automatically optimizes performance for large datasets:

# Create large dataset
large_data <- do.call(rbind, replicate(5, clinical_data, simplify = FALSE))
large_data$patient_id <- 1:nrow(large_data)

# Performance timing
system.time({
  large_violin <- jviolin(
    data = large_data,
    dep = "change_from_baseline",
    group = "treatment_arm",
    add_boxplot = TRUE,
    add_points = TRUE,
    point_alpha = 0.3
  )
})

# Second run should be faster due to caching
system.time({
  large_violin2 <- jviolin(
    data = large_data,
    dep = "change_from_baseline",
    group = "treatment_arm",
    add_boxplot = TRUE,
    add_points = TRUE,
    point_alpha = 0.3
  )
})

Performance features: - Intelligent caching: Automatic caching of prepared data and plots - Change detection: Cache invalidation when data or options change - Memory efficiency: Optimized data preparation and processing

Best Practices and Guidelines

When to Use Violin Plots

Use violin plots when you want to:

# Create best practices guide
best_practices <- data.frame(
  Use_Case = c(
    "Distribution comparison",
    "Outlier detection", 
    "Bimodal distributions",
    "Sample size differences",
    "Clinical endpoints",
    "Biomarker analysis"
  ),
  Description = c(
    "Compare shapes of distributions across groups",
    "Identify outliers and extreme values",
    "Visualize multi-modal data distributions", 
    "Show groups with different sample sizes",
    "Primary and secondary efficacy endpoints",
    "Laboratory values and biomarkers"
  ),
  Recommended_Options = c(
    "add_boxplot=TRUE for summary stats",
    "add_points=TRUE to show outliers",
    "trim_violin=FALSE to show full shape",
    "scale_violin='count' to show sample sizes",
    "add_mean=TRUE for clinical significance",
    "Use log scale for biomarkers if needed"
  )
)

knitr::kable(best_practices, caption = "Violin Plot Best Practices")

Statistical Interpretation

Distribution Shape Analysis

# Create interpretation guide
interpretation_guide <- data.frame(
  Pattern = c("Symmetric", "Right-skewed", "Left-skewed", "Bimodal", "Uniform"),
  Violin_Shape = c(
    "Symmetric around center",
    "Wider at bottom, tail extends up",
    "Wider at top, tail extends down", 
    "Two bulges or peaks visible",
    "Rectangular or even width"
  ),
  Typical_Examples = c(
    "Height, many biological measures",
    "Income, reaction times, biomarkers",
    "Test scores with ceiling effects",
    "Mixed populations, treatment responders",
    "Random assignments, uniform sampling"
  ),
  Analysis_Notes = c(
    "Use standard statistical tests",
    "Consider log transformation",
    "Check for ceiling/floor effects",
    "Investigate subgroups",
    "Verify randomization"
  )
)

knitr::kable(interpretation_guide, caption = "Distribution Shape Interpretation")

Customization Guidelines

Color Selection

# Clinical color recommendations
clinical_colors <- data.frame(
  Context = c("Dose groups", "Treatment vs Control", "Safety categories", "Efficacy levels"),
  Recommended_Palette = c("viridis", "manual: blue,red", "brewer", "manual: red,yellow,green"), 
  Rationale = c(
    "Sequential, color-blind safe",
    "Clear contrast, conventional colors",
    "Qualitative, distinct categories",
    "Intuitive traffic light system"
  )
)

knitr::kable(clinical_colors, caption = "Color Selection Guidelines")

Publication Requirements

# Publication-ready violin plot
publication_violin <- jviolin(
  data = clinical_data,
  dep = "change_from_baseline",
  group = "treatment_arm",
  add_boxplot = TRUE,
  add_mean = TRUE,
  color_palette = "manual",
  manual_colors = "grey80,lightblue,darkblue",
  themex = "classic",
  usexlabel = TRUE,
  xlabel = "Treatment Group",
  useylabel = TRUE,
  ylabel = "Change from Baseline (points)"
)

Publication considerations: - Grayscale compatibility: Ensure plots work in black and white - Font sizes: Use readable fonts for small print - Color choices: Consider journal color policies - Statistical annotations: Include sample sizes and significance tests

Error Handling and Troubleshooting

Common Issues

Data Validation

# Check data requirements
if (any(sapply(clinical_data[c("change_from_baseline")], function(x) !is.numeric(x)))) {
  message("Dependent variable must be numeric")
}

if (any(sapply(clinical_data[c("treatment_arm")], function(x) !is.factor(x) && !is.character(x)))) {
  message("Group variable should be categorical")
}

# Check for sufficient data
group_sizes <- table(clinical_data$treatment_arm)
print(group_sizes)
if (any(group_sizes < 5)) {
  message("Warning: Some groups have very few observations")
}

Edge Cases

# Single group
single_group_data <- clinical_data[clinical_data$treatment_arm == "Placebo", ]

single_group_violin <- jviolin(
  data = single_group_data,
  dep = "change_from_baseline",
  group = "baseline_severity",  # Different grouping
  add_boxplot = TRUE
)

# Very small groups
small_group_data <- clinical_data[1:15, ]

small_group_violin <- jviolin(
  data = small_group_data,
  dep = "change_from_baseline",
  group = "treatment_arm",
  add_points = TRUE,
  point_alpha = 0.7
)

Performance Optimization

# For very large datasets
performance_tips <- data.frame(
  Scenario = c(
    "Large dataset (>10k rows)",
    "Many groups (>10)",
    "Frequent replotting",
    "Interactive dashboards"
  ),
  Recommendation = c(
    "Use point_alpha < 0.5, avoid add_points",
    "Consider faceting instead of single plot",
    "Cache results, avoid changing data unnecessarily", 
    "Use moderate point_size and alpha values"
  ),
  Alternative = c(
    "Sample data for exploration",
    "Group into broader categories",
    "Use static plots for final versions",
    "Consider other plot types for many groups"
  )
)

knitr::kable(performance_tips, caption = "Performance Optimization Tips")

Integration with Research Workflows

Reproducible Analysis

# Set analysis parameters
analysis_params <- list(
  primary_endpoint = "change_from_baseline",
  treatment_var = "treatment_arm",
  covariates = c("baseline_severity", "age", "gender"),
  alpha_level = 0.05,
  plot_theme = "minimal"
)

# Systematic violin plot analysis
create_violin_analysis <- function(data, params) {
  results <- list()
  
  # Primary analysis
  results$primary <- jviolin(
    data = data,
    dep = params$primary_endpoint,
    group = params$treatment_var,
    add_boxplot = TRUE,
    add_mean = TRUE,
    themex = params$plot_theme
  )
  
  # Subgroup analyses
  for (covariate in params$covariates[1:2]) {  # Limit for example
    results[[paste0("subgroup_", covariate)]] <- jviolin(
      data = data,
      dep = params$primary_endpoint,
      group = covariate,
      fill = params$treatment_var,
      add_boxplot = TRUE,
      themex = params$plot_theme
    )
  }
  
  return(results)
}

# Run systematic analysis
study_violins <- create_violin_analysis(clinical_data, analysis_params)

Quality Control

# Data quality checks
qc_checks <- function(data, dep_var, group_var) {
  checks <- list()
  
  # Check for missing data
  checks$missing_dep <- sum(is.na(data[[dep_var]]))
  checks$missing_group <- sum(is.na(data[[group_var]]))
  
  # Check group sizes
  checks$group_sizes <- table(data[[group_var]], useNA = "ifany")
  checks$min_group_size <- min(checks$group_sizes)
  
  # Check distribution characteristics
  checks$dep_var_range <- range(data[[dep_var]], na.rm = TRUE)
  checks$dep_var_outliers <- sum(abs(scale(data[[dep_var]])) > 3, na.rm = TRUE)
  
  return(checks)
}

# Run QC
qc_results <- qc_checks(clinical_data, "change_from_baseline", "treatment_arm")
print(qc_results)

Summary

The jviolin function provides a comprehensive solution for distribution visualization in clinical and research settings. Key advantages include:

Core Capabilities

Advanced Overlays: Boxplots, points, means, and quantile lines for comprehensive visualization
Distribution Analysis: Excellent for comparing shapes, identifying outliers, and detecting bimodality
Flexible Grouping: Multiple grouping variables with color and fill mapping
Professional Output: Publication-ready plots with extensive customization

Performance Features

Intelligent Caching: Automatic optimization for repeated analyses and large datasets
Memory Efficiency: Optimized data preparation and processing
Scalability: Handles datasets from small samples to large studies
Responsiveness: Fast rendering with smart change detection

Clinical Applications

Efficacy Analysis: Primary and secondary endpoints with dose-response visualization
Safety Assessment: Biomarker distributions and adverse event analysis
Subgroup Analysis: Treatment effects across patient populations
Quality Control: Distribution assessment and outlier detection

Research Benefits

Educational Research: Teaching method effectiveness and achievement gap analysis
Survey Research: Response distribution and group comparison analysis
Biomedical Research: Expression data, biomarker analysis, and treatment responses
Epidemiological Studies: Population distribution comparisons

Next Steps

Explore different overlay combinations for your specific data types
Experiment with color palettes and themes for publication requirements
Try the performance optimization features with your large datasets
Consider violin plots for exploratory data analysis in your research workflow

sessionInfo()

Advanced Distribution Visualization for Clinical and Research Data

ClinicoPath

last-modified