Professional Violin Plots with jviolin
Advanced Distribution Visualization for Clinical and Research Data
ClinicoPath
last-modified
Source:vignettes/jjstatsplot-34-jviolin-comprehensive.Rmd
jjstatsplot-34-jviolin-comprehensive.Rmd
Introduction
The jviolin
function provides professional violin plots
for visualizing the distribution of continuous data across groups.
Violin plots combine the best features of box plots and density plots,
showing both summary statistics and the shape of the data
distribution.
What are Violin Plots?
Violin plots are mirror-symmetric density plots that display: - Distribution shape: The width shows data density at each value - Summary statistics: Can include quartiles, medians, and means - Group comparisons: Easy visual comparison across categories - Data points: Optional overlay of individual observations
Key Features
- Advanced Overlays: Boxplots, individual points, mean indicators, quantile lines
- Multiple Distributions: Handle normal, skewed, bimodal, and complex distributions
- Professional Styling: Multiple themes and color palettes
- Performance Optimized: Intelligent caching for large datasets
- Flexible Customization: Extensive options for appearance and behavior
- Clinical Focus: Designed for medical and clinical research applications
Installation and Setup
# Load required libraries
library(ClinicoPath)
## Warning: replacing previous import 'dplyr::as_data_frame' by
## 'igraph::as_data_frame' when loading 'ClinicoPath'
## Warning: replacing previous import 'DiagrammeR::count_automorphisms' by
## 'igraph::count_automorphisms' when loading 'ClinicoPath'
## Warning: replacing previous import 'dplyr::groups' by 'igraph::groups' when
## loading 'ClinicoPath'
## Warning: replacing previous import 'DiagrammeR::get_edge_ids' by
## 'igraph::get_edge_ids' when loading 'ClinicoPath'
## Warning: replacing previous import 'dplyr::union' by 'igraph::union' when
## loading 'ClinicoPath'
## Warning: replacing previous import 'dplyr::select' by 'jmvcore::select' when
## loading 'ClinicoPath'
## Warning: replacing previous import 'igraph::union' by 'lubridate::union' when
## loading 'ClinicoPath'
## Warning: replacing previous import 'igraph::%--%' by 'lubridate::%--%' when
## loading 'ClinicoPath'
## Warning: replacing previous import 'cutpointr::tnr' by 'mlr3measures::tnr' when
## loading 'ClinicoPath'
## Warning: replacing previous import 'cutpointr::precision' by
## 'mlr3measures::precision' when loading 'ClinicoPath'
## Warning: replacing previous import 'cutpointr::tn' by 'mlr3measures::tn' when
## loading 'ClinicoPath'
## Warning: replacing previous import 'cutpointr::fnr' by 'mlr3measures::fnr' when
## loading 'ClinicoPath'
## Warning: replacing previous import 'cutpointr::tp' by 'mlr3measures::tp' when
## loading 'ClinicoPath'
## Warning: replacing previous import 'cutpointr::npv' by 'mlr3measures::npv' when
## loading 'ClinicoPath'
## Warning: replacing previous import 'cutpointr::ppv' by 'mlr3measures::ppv' when
## loading 'ClinicoPath'
## Warning: replacing previous import 'cutpointr::auc' by 'mlr3measures::auc' when
## loading 'ClinicoPath'
## Warning: replacing previous import 'cutpointr::tpr' by 'mlr3measures::tpr' when
## loading 'ClinicoPath'
## Warning: replacing previous import 'cutpointr::fn' by 'mlr3measures::fn' when
## loading 'ClinicoPath'
## Warning: replacing previous import 'cutpointr::fp' by 'mlr3measures::fp' when
## loading 'ClinicoPath'
## Warning: replacing previous import 'cutpointr::fpr' by 'mlr3measures::fpr' when
## loading 'ClinicoPath'
## Warning: replacing previous import 'cutpointr::recall' by
## 'mlr3measures::recall' when loading 'ClinicoPath'
## Warning: replacing previous import 'cutpointr::specificity' by
## 'mlr3measures::specificity' when loading 'ClinicoPath'
## Warning: replacing previous import 'cutpointr::sensitivity' by
## 'mlr3measures::sensitivity' when loading 'ClinicoPath'
## Warning: replacing previous import 'igraph::as_data_frame' by
## 'tibble::as_data_frame' when loading 'ClinicoPath'
## Warning: replacing previous import 'igraph::crossing' by 'tidyr::crossing' when
## loading 'ClinicoPath'
## Warning: replacing previous import 'magrittr::extract' by 'tidyr::extract' when
## loading 'ClinicoPath'
## Warning: replacing previous import 'mlr3measures::sensitivity' by
## 'caret::sensitivity' when loading 'ClinicoPath'
## Warning: replacing previous import 'mlr3measures::specificity' by
## 'caret::specificity' when loading 'ClinicoPath'
## Registered S3 methods overwritten by 'useful':
## method from
## autoplot.acf ggfortify
## fortify.acf ggfortify
## fortify.kmeans ggfortify
## fortify.ts ggfortify
## Warning: replacing previous import 'jmvcore::select' by 'dplyr::select' when
## loading 'ClinicoPath'
## Registered S3 methods overwritten by 'ggpp':
## method from
## heightDetails.titleGrob ggplot2
## widthDetails.titleGrob ggplot2
## Warning: replacing previous import 'DataExplorer::plot_histogram' by
## 'grafify::plot_histogram' when loading 'ClinicoPath'
## Warning: replacing previous import 'dplyr::select' by 'jmvcore::select' when
## loading 'ClinicoPath'
## Warning: replacing previous import 'mlr3measures::auc' by 'pROC::auc' when
## loading 'ClinicoPath'
## Warning: replacing previous import 'cutpointr::roc' by 'pROC::roc' when loading
## 'ClinicoPath'
## Warning: replacing previous import 'tibble::view' by 'summarytools::view' when
## loading 'ClinicoPath'
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
# Set options for better output
options(digits = 3)
knitr::opts_chunk$set(
fig.width = 12,
fig.height = 8,
dpi = 300,
echo = TRUE,
eval = FALSE,
out.width = "100%"
)
# Check if packages are available
if (!requireNamespace("ggplot2", quietly = TRUE)) {
message("Note: ggplot2 package required for violin plots")
}
Data Preparation
Let’s create realistic datasets for different types of violin plot applications:
Clinical Trial Data
# Create clinical trial dataset
set.seed(123)
clinical_data <- data.frame(
# Patient identifiers
patient_id = paste0("PT_", sprintf("%03d", 1:240)),
# Treatment groups
treatment_arm = factor(sample(c("Placebo", "Low_Dose", "High_Dose"),
240, replace = TRUE, prob = c(0.4, 0.3, 0.3))),
study_site = factor(sample(c("Site_A", "Site_B", "Site_C"), 240, replace = TRUE)),
# Patient characteristics
age = round(rnorm(240, 58, 15)),
gender = factor(sample(c("Male", "Female"), 240, replace = TRUE)),
baseline_severity = factor(sample(c("Mild", "Moderate", "Severe"),
240, replace = TRUE, prob = c(0.4, 0.4, 0.2))),
# Outcome measures
baseline_score = rnorm(240, 45, 12),
blood_pressure = rnorm(240, 130, 20),
quality_of_life = rnorm(240, 60, 18),
# Laboratory values (often log-normally distributed)
biomarker_a = exp(rnorm(240, 2, 0.8)),
biomarker_b = exp(rnorm(240, 3, 0.6)),
inflammatory_marker = exp(rnorm(240, 1.5, 1.0))
) %>%
mutate(
# Create realistic dose-response relationships
change_from_baseline = case_when(
treatment_arm == "Placebo" ~ rnorm(240, 2, 8),
treatment_arm == "Low_Dose" ~ rnorm(240, 8, 10),
treatment_arm == "High_Dose" ~ rnorm(240, 15, 12)
) +
# Add baseline severity effect
case_when(
baseline_severity == "Mild" ~ rnorm(240, 5, 5),
baseline_severity == "Moderate" ~ rnorm(240, 0, 5),
baseline_severity == "Severe" ~ rnorm(240, -5, 5)
) +
rnorm(240, 0, 6),
# Quality of life improvement
qol_improvement = case_when(
treatment_arm == "Placebo" ~ rnorm(240, 5, 15),
treatment_arm == "Low_Dose" ~ rnorm(240, 12, 18),
treatment_arm == "High_Dose" ~ rnorm(240, 20, 20)
) + rnorm(240, 0, 10),
# Biomarker changes (treatment reduces inflammatory marker)
inflammatory_marker = inflammatory_marker * case_when(
treatment_arm == "Placebo" ~ 1,
treatment_arm == "Low_Dose" ~ 0.8,
treatment_arm == "High_Dose" ~ 0.6
) * exp(rnorm(240, 0, 0.3))
)
# Display data structure
str(clinical_data)
Educational Assessment Data
# Create educational assessment dataset
education_data <- data.frame(
student_id = 1:300,
# School characteristics
school_type = factor(sample(c("Public", "Private", "Charter"),
300, replace = TRUE, prob = c(0.6, 0.25, 0.15))),
class_size = factor(sample(c("Small", "Medium", "Large"),
300, replace = TRUE, prob = c(0.3, 0.5, 0.2))),
# Student demographics
grade_level = factor(sample(c("9th", "10th", "11th", "12th"), 300, replace = TRUE)),
gender = factor(sample(c("Male", "Female"), 300, replace = TRUE)),
socioeconomic = factor(sample(c("Low", "Middle", "High"),
300, replace = TRUE, prob = c(0.3, 0.5, 0.2))),
# Teaching interventions
teaching_method = factor(sample(c("Traditional", "Technology", "Hybrid"), 300, replace = TRUE)),
tutoring = factor(sample(c("No", "Yes"), 300, replace = TRUE, prob = c(0.7, 0.3))),
# Test scores with realistic distributions
math_score = rnorm(300, 75, 15),
reading_score = rnorm(300, 78, 13),
science_score = rnorm(300, 72, 14)
) %>%
mutate(
# Create realistic educational relationships
math_score = pmax(0, pmin(100, math_score +
case_when(
teaching_method == "Traditional" ~ 0,
teaching_method == "Technology" ~ 5,
teaching_method == "Hybrid" ~ 8
) +
ifelse(tutoring == "Yes", 10, 0) +
case_when(
socioeconomic == "Low" ~ -8,
socioeconomic == "Middle" ~ 0,
socioeconomic == "High" ~ 8
) + rnorm(300, 0, 8))),
reading_score = pmax(0, pmin(100, reading_score +
0.3 * (math_score - 75) + rnorm(300, 0, 6))),
science_score = pmax(0, pmin(100, science_score +
0.4 * (math_score - 75) + rnorm(300, 0, 7)))
)
str(education_data)
Basic Usage
Simple Violin Plot
The most basic violin plot shows the distribution of a continuous variable across groups:
Different Distribution Types
Violin plots excel at showing different distribution shapes:
# Normal distribution
normal_violin <- jviolin(
data = clinical_data,
dep = "change_from_baseline",
group = "treatment_arm",
themex = "minimal"
)
# Skewed distribution (biomarker data)
skewed_violin <- jviolin(
data = clinical_data,
dep = "inflammatory_marker",
group = "treatment_arm",
themex = "minimal"
)
# Multiple outcomes comparison
qol_violin <- jviolin(
data = clinical_data,
dep = "qol_improvement",
group = "treatment_arm",
themex = "minimal"
)
Advanced Features
Boxplot Overlay
Add boxplots inside violins to show summary statistics:
# Violin with boxplot overlay
violin_with_box <- jviolin(
data = clinical_data,
dep = "change_from_baseline",
group = "treatment_arm",
add_boxplot = TRUE,
boxplot_width = 0.2,
boxplot_alpha = 0.8,
themex = "bw"
)
The boxplot overlay shows: - Median: Central line - Quartiles: Box boundaries (25th and 75th percentiles) - Whiskers: Extend to data within 1.5 × IQR - Outliers: Points beyond whiskers (hidden when points are shown)
Individual Data Points
Add individual observations as points:
# Violin with data points
violin_with_points <- jviolin(
data = clinical_data,
dep = "qol_improvement",
group = "treatment_arm",
add_points = TRUE,
point_size = 1.5,
point_alpha = 0.6,
point_jitter = TRUE,
themex = "light"
)
# Without jitter (stacked points)
violin_no_jitter <- jviolin(
data = clinical_data,
dep = "qol_improvement",
group = "treatment_arm",
add_points = TRUE,
point_size = 1.0,
point_alpha = 0.5,
point_jitter = FALSE,
themex = "light"
)
Point options: - Jitter: Horizontal spread to prevent overlap - Size: Point size adjustment - Alpha: Transparency level - Color: Automatically matches group colors
Mean Indicators
Add mean values as prominent markers:
# Violin with mean indicators
violin_with_means <- jviolin(
data = clinical_data,
dep = "change_from_baseline",
group = "treatment_arm",
add_boxplot = TRUE,
add_mean = TRUE,
themex = "minimal"
)
Mean indicators appear as red diamond-shaped points, making it easy to compare central tendencies across groups.
Quantile Lines
Add horizontal lines at specific quantiles:
# Standard quartiles
violin_quartiles <- jviolin(
data = clinical_data,
dep = "inflammatory_marker",
group = "treatment_arm",
draw_quantiles = TRUE,
quantile_lines = "0.25,0.5,0.75",
themex = "dark"
)
# Custom quantiles
violin_custom_quant <- jviolin(
data = clinical_data,
dep = "inflammatory_marker",
group = "treatment_arm",
draw_quantiles = TRUE,
quantile_lines = "0.1,0.9", # 10th and 90th percentiles
themex = "dark"
)
# Five-number summary
violin_five_num <- jviolin(
data = clinical_data,
dep = "inflammatory_marker",
group = "treatment_arm",
draw_quantiles = TRUE,
quantile_lines = "0.1,0.25,0.5,0.75,0.9",
themex = "dark"
)
Quantile lines help identify: - Distribution spread: Distance between quantiles - Skewness: Asymmetric quantile spacing - Outliers: Values beyond extreme quantiles
Complex Combinations
Combine multiple features for comprehensive visualization:
# Kitchen sink violin plot
complex_violin <- jviolin(
data = clinical_data,
dep = "change_from_baseline",
group = "treatment_arm",
add_boxplot = TRUE,
add_points = TRUE,
add_mean = TRUE,
draw_quantiles = TRUE,
quantile_lines = "0.25,0.5,0.75",
point_size = 1.2,
point_alpha = 0.4,
point_jitter = TRUE,
boxplot_width = 0.15,
boxplot_alpha = 0.9,
themex = "minimal"
)
Violin Customization
Violin Scaling Options
Control how violin widths are scaled:
# Equal area (default) - all violins have same area
area_scaling <- jviolin(
data = clinical_data,
dep = "change_from_baseline",
group = "treatment_arm",
scale_violin = "area",
add_boxplot = TRUE
)
# Equal count - areas proportional to sample size
count_scaling <- jviolin(
data = clinical_data,
dep = "change_from_baseline",
group = "treatment_arm",
scale_violin = "count",
add_boxplot = TRUE
)
# Equal width - all violins have same maximum width
width_scaling <- jviolin(
data = clinical_data,
dep = "change_from_baseline",
group = "treatment_arm",
scale_violin = "width",
add_boxplot = TRUE
)
Scaling options: - Area: Equal total area (default)
- good for comparing shapes - Count: Area proportional
to sample size - emphasizes group sizes
- Width: Equal maximum width - good for detailed shape
comparison
Trimming and Width
Control violin appearance:
# Trimmed violins (default)
trimmed_violin <- jviolin(
data = clinical_data,
dep = "inflammatory_marker",
group = "treatment_arm",
trim_violin = TRUE,
violin_width = 1.0,
violin_alpha = 0.7
)
# Untrimmed violins - show full density estimation
untrimmed_violin <- jviolin(
data = clinical_data,
dep = "inflammatory_marker",
group = "treatment_arm",
trim_violin = FALSE,
violin_width = 1.2,
violin_alpha = 0.6
)
# Wide, transparent violins
wide_violin <- jviolin(
data = clinical_data,
dep = "inflammatory_marker",
group = "treatment_arm",
violin_width = 1.5,
violin_alpha = 0.5,
add_boxplot = TRUE
)
Color and Fill Variables
Use different variables for color and fill aesthetics:
# Different fill variable
fill_mapping <- jviolin(
data = clinical_data,
dep = "change_from_baseline",
group = "treatment_arm",
fill = "baseline_severity",
add_boxplot = TRUE
)
# Different color and fill variables
color_fill_mapping <- jviolin(
data = clinical_data,
dep = "qol_improvement",
group = "study_site",
fill = "treatment_arm",
col = "gender",
add_boxplot = TRUE
)
# Education example with multiple groupings
education_mapping <- jviolin(
data = education_data,
dep = "math_score",
group = "teaching_method",
fill = "socioeconomic",
add_points = TRUE,
point_alpha = 0.4
)
Color Palettes and Themes
Color Palettes
Choose from various color schemes:
# Default ggplot2 colors
default_colors <- jviolin(
data = clinical_data,
dep = "change_from_baseline",
group = "treatment_arm",
color_palette = "default",
add_boxplot = TRUE
)
# Viridis (color-blind friendly)
viridis_colors <- jviolin(
data = clinical_data,
dep = "change_from_baseline",
group = "treatment_arm",
color_palette = "viridis",
add_boxplot = TRUE
)
# ColorBrewer palettes
brewer_colors <- jviolin(
data = clinical_data,
dep = "change_from_baseline",
group = "treatment_arm",
color_palette = "brewer",
add_boxplot = TRUE
)
# Manual colors
manual_colors <- jviolin(
data = clinical_data,
dep = "change_from_baseline",
group = "treatment_arm",
color_palette = "manual",
manual_colors = "red,blue,green",
add_boxplot = TRUE
)
Theme Options
Apply different visual themes:
# Minimal theme (clean, publication-ready)
minimal_theme <- jviolin(
data = clinical_data,
dep = "qol_improvement",
group = "treatment_arm",
themex = "minimal",
add_boxplot = TRUE
)
# Classic theme (traditional R plotting)
classic_theme <- jviolin(
data = clinical_data,
dep = "qol_improvement",
group = "treatment_arm",
themex = "classic",
add_boxplot = TRUE
)
# Black and white (high contrast)
bw_theme <- jviolin(
data = clinical_data,
dep = "qol_improvement",
group = "treatment_arm",
themex = "bw",
add_boxplot = TRUE
)
# Dark theme
dark_theme <- jviolin(
data = clinical_data,
dep = "qol_improvement",
group = "treatment_arm",
themex = "dark",
add_boxplot = TRUE,
color_palette = "viridis"
)
Available themes: - minimal: Clean, modern appearance - classic: Traditional statistical plots - bw: Black and white, high contrast - dark: Dark background for presentations - light, grey, void: Additional options - ipsum: Modern typography (requires hrbrthemes package)
Real-World Applications
Clinical Trial Analysis
Dose-Response Relationships
# Primary efficacy endpoint
efficacy_violin <- jviolin(
data = clinical_data,
dep = "change_from_baseline",
group = "treatment_arm",
add_boxplot = TRUE,
add_mean = TRUE,
draw_quantiles = TRUE,
quantile_lines = "0.25,0.5,0.75",
themex = "minimal",
usexlabel = TRUE,
xlabel = "Treatment Group",
useylabel = TRUE,
ylabel = "Change from Baseline (points)"
)
Safety Analysis
# Biomarker safety assessment
safety_violin <- jviolin(
data = clinical_data,
dep = "inflammatory_marker",
group = "treatment_arm",
add_boxplot = TRUE,
add_points = TRUE,
point_alpha = 0.5,
scale_violin = "width",
themex = "classic",
usexlabel = TRUE,
xlabel = "Treatment Group",
useylabel = TRUE,
ylabel = "Inflammatory Marker (ng/mL)"
)
Subgroup Analysis
# Efficacy by baseline severity
subgroup_violin <- jviolin(
data = clinical_data,
dep = "change_from_baseline",
group = "baseline_severity",
fill = "treatment_arm",
add_boxplot = TRUE,
color_palette = "viridis",
themex = "minimal",
usexlabel = TRUE,
xlabel = "Baseline Disease Severity",
useylabel = TRUE,
ylabel = "Treatment Response"
)
Educational Research
Teaching Method Effectiveness
# Math score improvement by teaching method
teaching_violin <- jviolin(
data = education_data,
dep = "math_score",
group = "teaching_method",
add_boxplot = TRUE,
add_mean = TRUE,
themex = "classic",
usexlabel = TRUE,
xlabel = "Teaching Method",
useylabel = TRUE,
ylabel = "Math Score"
)
Equity Analysis
# Achievement gaps by socioeconomic status
equity_violin <- jviolin(
data = education_data,
dep = "math_score",
group = "socioeconomic",
fill = "school_type",
add_boxplot = TRUE,
add_points = TRUE,
point_alpha = 0.4,
color_palette = "brewer",
themex = "minimal",
usexlabel = TRUE,
xlabel = "Socioeconomic Status",
useylabel = TRUE,
ylabel = "Math Achievement Score"
)
Intervention Impact
# Tutoring effectiveness across different contexts
tutoring_violin <- jviolin(
data = education_data,
dep = "math_score",
group = "tutoring",
fill = "teaching_method",
add_boxplot = TRUE,
add_mean = TRUE,
color_palette = "viridis",
themex = "light"
)
Advanced Techniques
Coordinate Flipping
Create horizontal violin plots:
# Horizontal violin plot
horizontal_violin <- jviolin(
data = clinical_data,
dep = "qol_improvement",
group = "treatment_arm",
add_boxplot = TRUE,
flip = TRUE,
themex = "minimal"
)
Horizontal plots are useful when: - Group names are long - Space is limited vertically - Following certain journal conventions
Missing Data Handling
Handle missing data appropriately:
# Create data with missing values
missing_data <- clinical_data
missing_data$change_from_baseline[sample(1:240, 30)] <- NA
missing_data$treatment_arm[sample(1:240, 10)] <- NA
# Exclude missing data
exclude_missing <- jviolin(
data = missing_data,
dep = "change_from_baseline",
group = "treatment_arm",
excl = TRUE,
add_boxplot = TRUE,
themex = "minimal"
)
# Include missing data (default behavior)
include_missing <- jviolin(
data = missing_data,
dep = "change_from_baseline",
group = "treatment_arm",
excl = FALSE,
add_boxplot = TRUE,
themex = "minimal"
)
Large Dataset Optimization
The function automatically optimizes performance for large datasets:
# Create large dataset
large_data <- do.call(rbind, replicate(5, clinical_data, simplify = FALSE))
large_data$patient_id <- 1:nrow(large_data)
# Performance timing
system.time({
large_violin <- jviolin(
data = large_data,
dep = "change_from_baseline",
group = "treatment_arm",
add_boxplot = TRUE,
add_points = TRUE,
point_alpha = 0.3
)
})
# Second run should be faster due to caching
system.time({
large_violin2 <- jviolin(
data = large_data,
dep = "change_from_baseline",
group = "treatment_arm",
add_boxplot = TRUE,
add_points = TRUE,
point_alpha = 0.3
)
})
Performance features: - Intelligent caching: Automatic caching of prepared data and plots - Change detection: Cache invalidation when data or options change - Memory efficiency: Optimized data preparation and processing
Best Practices and Guidelines
When to Use Violin Plots
Use violin plots when you want to:
# Create best practices guide
best_practices <- data.frame(
Use_Case = c(
"Distribution comparison",
"Outlier detection",
"Bimodal distributions",
"Sample size differences",
"Clinical endpoints",
"Biomarker analysis"
),
Description = c(
"Compare shapes of distributions across groups",
"Identify outliers and extreme values",
"Visualize multi-modal data distributions",
"Show groups with different sample sizes",
"Primary and secondary efficacy endpoints",
"Laboratory values and biomarkers"
),
Recommended_Options = c(
"add_boxplot=TRUE for summary stats",
"add_points=TRUE to show outliers",
"trim_violin=FALSE to show full shape",
"scale_violin='count' to show sample sizes",
"add_mean=TRUE for clinical significance",
"Use log scale for biomarkers if needed"
)
)
knitr::kable(best_practices, caption = "Violin Plot Best Practices")
Statistical Interpretation
Distribution Shape Analysis
# Create interpretation guide
interpretation_guide <- data.frame(
Pattern = c("Symmetric", "Right-skewed", "Left-skewed", "Bimodal", "Uniform"),
Violin_Shape = c(
"Symmetric around center",
"Wider at bottom, tail extends up",
"Wider at top, tail extends down",
"Two bulges or peaks visible",
"Rectangular or even width"
),
Typical_Examples = c(
"Height, many biological measures",
"Income, reaction times, biomarkers",
"Test scores with ceiling effects",
"Mixed populations, treatment responders",
"Random assignments, uniform sampling"
),
Analysis_Notes = c(
"Use standard statistical tests",
"Consider log transformation",
"Check for ceiling/floor effects",
"Investigate subgroups",
"Verify randomization"
)
)
knitr::kable(interpretation_guide, caption = "Distribution Shape Interpretation")
Customization Guidelines
Color Selection
# Clinical color recommendations
clinical_colors <- data.frame(
Context = c("Dose groups", "Treatment vs Control", "Safety categories", "Efficacy levels"),
Recommended_Palette = c("viridis", "manual: blue,red", "brewer", "manual: red,yellow,green"),
Rationale = c(
"Sequential, color-blind safe",
"Clear contrast, conventional colors",
"Qualitative, distinct categories",
"Intuitive traffic light system"
)
)
knitr::kable(clinical_colors, caption = "Color Selection Guidelines")
Publication Requirements
# Publication-ready violin plot
publication_violin <- jviolin(
data = clinical_data,
dep = "change_from_baseline",
group = "treatment_arm",
add_boxplot = TRUE,
add_mean = TRUE,
color_palette = "manual",
manual_colors = "grey80,lightblue,darkblue",
themex = "classic",
usexlabel = TRUE,
xlabel = "Treatment Group",
useylabel = TRUE,
ylabel = "Change from Baseline (points)"
)
Publication considerations: - Grayscale compatibility: Ensure plots work in black and white - Font sizes: Use readable fonts for small print - Color choices: Consider journal color policies - Statistical annotations: Include sample sizes and significance tests
Error Handling and Troubleshooting
Common Issues
Data Validation
# Check data requirements
if (any(sapply(clinical_data[c("change_from_baseline")], function(x) !is.numeric(x)))) {
message("Dependent variable must be numeric")
}
if (any(sapply(clinical_data[c("treatment_arm")], function(x) !is.factor(x) && !is.character(x)))) {
message("Group variable should be categorical")
}
# Check for sufficient data
group_sizes <- table(clinical_data$treatment_arm)
print(group_sizes)
if (any(group_sizes < 5)) {
message("Warning: Some groups have very few observations")
}
Edge Cases
# Single group
single_group_data <- clinical_data[clinical_data$treatment_arm == "Placebo", ]
single_group_violin <- jviolin(
data = single_group_data,
dep = "change_from_baseline",
group = "baseline_severity", # Different grouping
add_boxplot = TRUE
)
# Very small groups
small_group_data <- clinical_data[1:15, ]
small_group_violin <- jviolin(
data = small_group_data,
dep = "change_from_baseline",
group = "treatment_arm",
add_points = TRUE,
point_alpha = 0.7
)
Performance Optimization
# For very large datasets
performance_tips <- data.frame(
Scenario = c(
"Large dataset (>10k rows)",
"Many groups (>10)",
"Frequent replotting",
"Interactive dashboards"
),
Recommendation = c(
"Use point_alpha < 0.5, avoid add_points",
"Consider faceting instead of single plot",
"Cache results, avoid changing data unnecessarily",
"Use moderate point_size and alpha values"
),
Alternative = c(
"Sample data for exploration",
"Group into broader categories",
"Use static plots for final versions",
"Consider other plot types for many groups"
)
)
knitr::kable(performance_tips, caption = "Performance Optimization Tips")
Integration with Research Workflows
Reproducible Analysis
# Set analysis parameters
analysis_params <- list(
primary_endpoint = "change_from_baseline",
treatment_var = "treatment_arm",
covariates = c("baseline_severity", "age", "gender"),
alpha_level = 0.05,
plot_theme = "minimal"
)
# Systematic violin plot analysis
create_violin_analysis <- function(data, params) {
results <- list()
# Primary analysis
results$primary <- jviolin(
data = data,
dep = params$primary_endpoint,
group = params$treatment_var,
add_boxplot = TRUE,
add_mean = TRUE,
themex = params$plot_theme
)
# Subgroup analyses
for (covariate in params$covariates[1:2]) { # Limit for example
results[[paste0("subgroup_", covariate)]] <- jviolin(
data = data,
dep = params$primary_endpoint,
group = covariate,
fill = params$treatment_var,
add_boxplot = TRUE,
themex = params$plot_theme
)
}
return(results)
}
# Run systematic analysis
study_violins <- create_violin_analysis(clinical_data, analysis_params)
Quality Control
# Data quality checks
qc_checks <- function(data, dep_var, group_var) {
checks <- list()
# Check for missing data
checks$missing_dep <- sum(is.na(data[[dep_var]]))
checks$missing_group <- sum(is.na(data[[group_var]]))
# Check group sizes
checks$group_sizes <- table(data[[group_var]], useNA = "ifany")
checks$min_group_size <- min(checks$group_sizes)
# Check distribution characteristics
checks$dep_var_range <- range(data[[dep_var]], na.rm = TRUE)
checks$dep_var_outliers <- sum(abs(scale(data[[dep_var]])) > 3, na.rm = TRUE)
return(checks)
}
# Run QC
qc_results <- qc_checks(clinical_data, "change_from_baseline", "treatment_arm")
print(qc_results)
Summary
The jviolin
function provides a comprehensive solution
for distribution visualization in clinical and research settings. Key
advantages include:
Core Capabilities
- Advanced Overlays: Boxplots, points, means, and quantile lines for comprehensive visualization
- Distribution Analysis: Excellent for comparing shapes, identifying outliers, and detecting bimodality
- Flexible Grouping: Multiple grouping variables with color and fill mapping
- Professional Output: Publication-ready plots with extensive customization
Performance Features
- Intelligent Caching: Automatic optimization for repeated analyses and large datasets
- Memory Efficiency: Optimized data preparation and processing
- Scalability: Handles datasets from small samples to large studies
- Responsiveness: Fast rendering with smart change detection
Clinical Applications
- Efficacy Analysis: Primary and secondary endpoints with dose-response visualization
- Safety Assessment: Biomarker distributions and adverse event analysis
- Subgroup Analysis: Treatment effects across patient populations
- Quality Control: Distribution assessment and outlier detection
Research Benefits
- Educational Research: Teaching method effectiveness and achievement gap analysis
- Survey Research: Response distribution and group comparison analysis
- Biomedical Research: Expression data, biomarker analysis, and treatment responses
- Epidemiological Studies: Population distribution comparisons