jjriverplot: Comprehensive River Plot and Flow Visualization
ClinicoPath
2025-07-13
Source:vignettes/jjstatsplot-30-jjriverplot-comprehensive.Rmd
jjstatsplot-30-jjriverplot-comprehensive.Rmd
Overview
The jjriverplot
function provides a comprehensive
interface for creating river plots (alluvial diagrams), Sankey diagrams,
and stream graphs to visualize flows, transitions, and pathways over
time or between categories. This function is built on
ggplot2
and ggalluvial
, offering multiple
visualization styles with performance optimization through internal
caching.
Key Features
- Multiple plot types: Alluvial diagrams, Sankey diagrams, and stream graphs
- Flexible data formats: Supports both longitudinal (long) and cross-sectional (wide) data
- Weight integration: Stream widths can represent quantities like costs, counts, or frequencies
- Rich customization: Fill patterns, curve types, labeling options, and themes
- Performance optimized: Uses internal caching to eliminate redundant computations
- Clinical research ready: Designed for patient pathway and treatment flow analysis
When to Use River Plots
River plots excel at visualizing:
- Patient treatment pathways: Response changes over time
- Customer journey analysis: Funnel progression and conversion flows
- Educational progressions: Student performance tracking through stages
- Process flows: Manufacturing, approval, or workflow stages
- Market transitions: Brand switching, category migration
- Longitudinal studies: Any categorical changes over time
Installation and Setup
# Install ClinicoPath if not already installed
if (!require("ClinicoPath")) {
devtools::install_github("sbalci/ClinicoPathJamoviModule")
}
library(ClinicoPath)
library(ggplot2)
Quick Start
Basic Alluvial Plot (Longitudinal Data)
# Load test data
data(jjriverplot_test_data_long)
# Basic treatment response flow over time
result <- jjriverplot(
data = jjriverplot_test_data_long,
time = "timepoint",
strata = "treatment_response",
plotType = "alluvial"
)
# View the plot
result$plot
Weighted River Plot
# River plot with treatment costs as weights
result_weighted <- jjriverplot(
data = jjriverplot_test_data_long,
time = "timepoint",
strata = "treatment_response",
weight = "treatment_cost",
plotType = "alluvial",
labelNodes = TRUE,
showCounts = TRUE
)
# View the plot
result_weighted$plot
Multi-Stage Pathway (Wide Format)
# Load wide format data
data(jjriverplot_test_data_wide)
# Multi-stage treatment pathway
result_pathway <- jjriverplot(
data = jjriverplot_test_data_wide,
strata = c("month3_response", "month6_response", "month12_response"),
plotType = "sankey",
fillType = "first",
mytitle = "Treatment Response Pathway"
)
# View the plot
result_pathway$plot
Function Parameters
Core Parameters
-
data
: Input data frame containing variables to analyze -
time
: Time or sequence variable (for longitudinal data) -
strata
: Categorical variables representing flow categories -
weight
: Optional numeric variable for stream width weighting -
id
: Optional identifier for tracking individual entities
Plot Types
Alluvial Diagrams
# Alluvial plots show flowing streams between time points
# - Curved connections emphasize continuity
# - Stream width represents frequency or quantity
# - Good for tracking changes in categorical variables over time
Best for: - Longitudinal studies: Patient outcomes over treatment periods - Cohort tracking: Changes in group membership over time - Transition analysis: Movement between categories
Sankey Diagrams
# Sankey diagrams show directed flows between stages
# - Straighter connections emphasize flow direction
# - Node width represents total flow through that stage
# - Good for process flows and sequential decision points
Best for: - Process analysis: Manufacturing or approval workflows - Resource allocation: Budget flows, staff assignments - Multi-stage conversions: Sales funnels, patient triage
Stream Graphs
# Stream graphs show stacked area plots over time
# - Shows composition changes continuously
# - Total height represents overall volume
# - Good for showing relative proportions over time
Best for: - Composition analysis: Market share changes over time - Population studies: Demographic changes - Portfolio analysis: Investment allocations over time
Performance Optimizations
Caching Implementation
The function includes significant performance optimizations:
Previous Issues: - Redundant data processing in plot method - Repeated factor conversion operations - Multiple jmvcore::naOmit() calls - Complex option processing repeated
Current Optimizations: - Data
Caching: Uses .prepareData()
method to cache
processed data - Options Caching: Uses
.prepareOptions()
method to cache option processing -
Factor Conversion: Variables converted once and cached
- Progress Feedback: Clear user messaging during
processing
Advanced Usage Examples
Clinical Trial Analysis
# Treatment response progression over time
clinical_pathway <- jjriverplot(
data = jjriverplot_test_data_long,
time = "timepoint",
strata = "treatment_response",
weight = "treatment_cost",
plotType = "alluvial",
fillType = "first",
labelNodes = TRUE,
mytitle = "Patient Treatment Response Pathways",
xtitle = "Study Timepoint",
ytitle = "Treatment Cost ($)"
)
# This reveals:
# - How patient responses change over time
# - Cost implications of different response patterns
# - Treatment effectiveness across timepoints
Quality of Life Progression
# Quality of life changes alongside treatment response
qol_flow <- jjriverplot(
data = jjriverplot_test_data_long,
time = "timepoint",
strata = "quality_of_life",
plotType = "alluvial",
fillType = "last",
curveType = "cardinal",
mytitle = "Quality of Life Progression",
xtitle = "Study Period",
ytitle = "Patient Count"
)
# This shows:
# - Patient quality of life trajectories
# - Correlation with treatment responses
# - Long-term outcome patterns
Treatment Arm Comparison
# Compare pathways by treatment arm using wide format
treatment_comparison <- jjriverplot(
data = jjriverplot_test_data_wide,
strata = c("month3_response", "month6_response", "month12_response"),
plotType = "sankey",
fillType = "first",
showCounts = TRUE,
mytitle = "Treatment Response Progression by Stage"
)
# Filter by treatment arm for separate analysis
control_data <- jjriverplot_test_data_wide[
jjriverplot_test_data_wide$treatment_arm == "Control", ]
control_pathway <- jjriverplot(
data = control_data,
strata = c("month3_response", "month6_response", "month12_response"),
plotType = "sankey",
mytitle = "Control Group Response Pathway"
)
Educational Journey Analysis
# Load education data
data(jjriverplot_education_data)
# Student performance progression
education_flow <- jjriverplot(
data = jjriverplot_education_data,
strata = c("year1_performance", "year2_performance",
"year3_performance", "final_outcome"),
plotType = "alluvial",
fillType = "last",
labelNodes = TRUE,
mytitle = "Student Academic Progression",
xtitle = "Academic Year",
ytitle = "Student Count"
)
# This reveals:
# - Academic performance trajectories
# - Points of student attrition
# - Success pathway identification
Marketing Funnel Analysis
# Load marketing data
data(jjriverplot_marketing_data)
# Customer journey through marketing funnel
marketing_funnel <- jjriverplot(
data = jjriverplot_marketing_data,
strata = c("awareness", "interest", "consideration",
"purchase", "loyalty"),
weight = "purchase_value",
plotType = "sankey",
fillType = "last",
showCounts = TRUE,
mytitle = "Customer Journey Analysis",
xtitle = "Funnel Stage",
ytitle = "Customer Value ($)"
)
# This shows:
# - Conversion rates between stages
# - Value-weighted customer flows
# - Drop-off points in the funnel
Multi-Site Quality Control
# Analyze treatment consistency across hospital centers
site_analysis <- jjriverplot(
data = jjriverplot_test_data_long,
time = "timepoint",
strata = "treatment_response",
plotType = "alluvial",
fillType = "first",
mytitle = "Treatment Response Patterns Across All Sites"
)
# Compare specific high-performing vs low-performing sites
high_performing_sites <- c("Center_A", "Center_B")
site_subset <- jjriverplot_test_data_long[
jjriverplot_test_data_long$hospital_center %in% high_performing_sites, ]
site_comparison <- jjriverplot(
data = site_subset,
time = "timepoint",
strata = "treatment_response",
weight = "treatment_cost",
plotType = "alluvial",
mytitle = "High-Performing Sites: Treatment Pathways"
)
Data Requirements
Data Formats
Long Format (Longitudinal Data)
# Structure for longitudinal river plots
# Required columns:
# - time: factor with ordered levels (e.g., "Baseline", "Month_3", "Month_6")
# - strata: factor with category levels (e.g., "Complete_Response", "Partial_Response")
# - Optional: weight (numeric), id (factor)
example_long <- data.frame(
patient_id = rep(c("P001", "P002"), each = 3),
timepoint = factor(rep(c("T1", "T2", "T3"), times = 2)),
response = factor(c("Good", "Good", "Excellent",
"Poor", "Fair", "Good")),
cost = c(1000, 1200, 800, 1500, 1300, 1100)
)
Wide Format (Cross-Sectional Stages)
# Structure for multi-stage river plots
# Required: multiple strata variables representing sequential stages
# Optional: weight variable, demographic variables
example_wide <- data.frame(
id = paste0("ID", 1:100),
stage1 = factor(sample(c("A", "B", "C"), 100, replace = TRUE)),
stage2 = factor(sample(c("X", "Y", "Z"), 100, replace = TRUE)),
stage3 = factor(sample(c("Success", "Failure"), 100, replace = TRUE)),
total_value = runif(100, 100, 1000)
)
Data Quality Requirements
# For optimal river plots:
# 1. Adequate sample sizes (n > 50 recommended)
# 2. Reasonable number of categories (2-8 per variable)
# 3. Meaningful category names (not just codes)
# 4. Proper factor ordering for time variables
# 5. Complete cases (missing values are excluded)
Best Practices
Plot Type Selection
# Use Alluvial plots when:
# - Tracking changes over time
# - Emphasizing flow continuity
# - Showing patient/customer journeys
# - Time points are clearly defined
# Use Sankey diagrams when:
# - Analyzing process flows
# - Emphasizing directed flows
# - Decision tree visualization
# - Resource allocation analysis
# Use Stream graphs when:
# - Showing composition over time
# - Continuous time variable
# - Relative proportions important
# - Market share analysis
Color and Fill Strategy
# fillType selection:
# - "first": Track where flows originated (good for source analysis)
# - "last": Track where flows end up (good for outcome analysis)
# - "frequency": Emphasize major flows (good for identifying patterns)
# Examples:
# Treatment analysis: use "first" to track initial response groups
# Outcome analysis: use "last" to track final outcomes
# Process optimization: use "frequency" to identify bottlenecks
Labeling and Clarity
# For clear river plots:
# 1. Use meaningful variable names and category labels
# 2. Enable node labels for interpretation (labelNodes = TRUE)
# 3. Show counts for quantitative analysis (showCounts = TRUE)
# 4. Provide clear axis labels and titles
# 5. Consider legend positioning based on plot complexity
# Avoid overwhelming plots:
# - Limit to 6-8 categories per variable
# - Combine rare categories when appropriate
# - Use clear, contrasting colors
# - Consider split plots for complex data
Advanced Techniques
Subset Analysis
# Analyze specific subgroups for detailed insights
# Example: High-cost patients only
high_cost_patients <- jjriverplot_test_data_long[
jjriverplot_test_data_long$treatment_cost > 5000, ]
high_cost_analysis <- jjriverplot(
data = high_cost_patients,
time = "timepoint",
strata = "treatment_response",
weight = "treatment_cost",
plotType = "alluvial",
mytitle = "High-Cost Patient Pathways"
)
# Example: By demographic groups
elderly_patients <- jjriverplot_test_data_long[
jjriverplot_test_data_long$age_group == "75+", ]
elderly_analysis <- jjriverplot(
data = elderly_patients,
time = "timepoint",
strata = "treatment_response",
plotType = "alluvial",
mytitle = "Elderly Patient Treatment Pathways"
)
Comparative Analysis
# Compare different treatments side by side
treatment_groups <- c("Treatment_A", "Treatment_B")
for (treatment in treatment_groups) {
subset_data <- jjriverplot_test_data_long[
jjriverplot_test_data_long$treatment_arm == treatment, ]
plot_result <- jjriverplot(
data = subset_data,
time = "timepoint",
strata = "treatment_response",
plotType = "alluvial",
mytitle = paste("Pathways for", treatment)
)
# Save or display each plot
print(plot_result$plot)
}
Quality Control Applications
# Multi-center consistency analysis
center_comparison <- function(center_id) {
center_data <- jjriverplot_test_data_long[
jjriverplot_test_data_long$hospital_center == center_id, ]
jjriverplot(
data = center_data,
time = "timepoint",
strata = "treatment_response",
plotType = "alluvial",
mytitle = paste("Treatment Patterns -", center_id),
labelNodes = TRUE,
showCounts = TRUE
)
}
# Analyze each center
centers <- unique(jjriverplot_test_data_long$hospital_center)
center_plots <- lapply(centers[1:3], center_comparison) # First 3 centers
Technical Details
Underlying Functions
The jjriverplot
function is built on:
- ggplot2: Core plotting framework
- ggalluvial: Specialized alluvial diagram geometries
- dplyr: Data manipulation for aggregation
- jmvcore: Data handling and option processing
Caching Details
# Internal caching structure (conceptual)
# private$.processedData: Cached cleaned data with factor conversion
# private$.processedOptions: Cached option processing including titles
#
# Benefits:
# - Eliminates redundant factor conversion calls
# - Avoids repeated jmvcore::naOmit() operations
# - Shares processed data across plot variations
# - Optimizes complex conditional logic processing
Clinical Applications
Research Scenarios
- Clinical Trials: Track patient responses across treatment periods
- Quality Improvement: Monitor care pathway adherence
- Cost Analysis: Visualize resource utilization flows
- Epidemiology: Study disease progression patterns
- Health Services: Analyze patient flow through care systems
Publication Guidelines
# For scientific publications:
# - Use clear, descriptive titles and axis labels
# - Include sample sizes in figure captions
# - Choose colorblind-friendly fill patterns
# - Provide detailed methodology in methods section
# - Consider grayscale compatibility for print journals
# Example publication-ready plot:
publication_plot <- jjriverplot(
data = clinical_data,
time = "study_timepoint",
strata = "response_category",
weight = "patient_count",
plotType = "alluvial",
fillType = "first",
labelNodes = TRUE,
showLegend = TRUE,
mytitle = "Patient Response Trajectories (N = 600)",
xtitle = "Study Timepoint",
ytitle = "Number of Patients"
)
Troubleshooting
Common Issues
-
“Data contains no (complete) rows”
- Check for missing values in required variables
- Ensure factor variables have valid levels
- Verify data filtering hasn’t excluded all observations
-
Empty or unexpected flows
- Check factor level definitions and ordering
- Verify time variable is properly formatted
- Ensure categories exist at multiple time points
-
Overlapping labels
- Reduce number of categories through grouping
- Disable labels (labelNodes = FALSE) for complex plots
- Consider using counts instead of labels
-
Poor flow visibility
- Adjust plot size settings
- Use weight variable to emphasize important flows
- Consider filtering to show only major pathways
Error Handling
# Example error handling
tryCatch({
result <- jjriverplot(
data = my_data,
time = "time_var",
strata = "category_var",
plotType = "alluvial"
)
}, error = function(e) {
message("Error in river plot: ", e$message)
message("Check your data structure and variable types")
# Diagnostic information
cat("Data structure:\n")
str(my_data)
cat("\nTime variable levels:\n")
print(levels(my_data$time_var))
cat("\nCategory variable levels:\n")
print(levels(my_data$category_var))
})
Performance Considerations
# For large datasets:
# - Consider sampling for initial exploration
# - Group rare categories to reduce complexity
# - Use weight variables to highlight important flows
# - Monitor memory usage with very wide datasets
# Optimal performance tips:
# - Ensure categorical variables are factors
# - Use meaningful factor level ordering
# - Remove unnecessary columns from data
# - Balance detail with readability
Interpretation Guidelines
What River Plots Reveal
# River plots effectively show:
# 1. Transition patterns: How categories change over time
# 2. Flow volumes: Relative sizes of different pathways
# 3. Stability: Which categories remain constant
# 4. Convergence: Multiple paths leading to same outcome
# 5. Divergence: Single starting points leading to multiple outcomes
# Key interpretation elements:
# - Stream width = quantity/frequency
# - Stream color = category grouping (based on fillType)
# - Node height = total volume at that stage
# - Flow crossings = category transitions
Common Patterns
# Clinical research patterns:
# Treatment response progression:
# - Stable flows: Consistent responders
# - Improving flows: Treatment success
# - Declining flows: Treatment failure
# - Complex flows: Mixed responses
# Patient journey analysis:
# - Funnel patterns: Progressive selection
# - Retention patterns: Stable participation
# - Dropout patterns: Loss to follow-up
# - Recovery patterns: Improvement over time
Comparison with Other Visualizations
When to Use River Plots vs Alternatives
# Use river plots when:
# - Tracking categorical changes over time
# - Showing flow volumes and transitions
# - Multiple pathways need visualization
# - Process or journey analysis required
# Use line plots when:
# - Continuous variables over time
# - Trends and patterns are focus
# - Statistical relationships important
# - Precise values needed
# Use bar charts when:
# - Single time point comparisons
# - Exact values are important
# - Categories don't flow/transition
# - Simple frequency distributions
# Use heatmaps when:
# - Showing correlation patterns
# - Matrix-style data relationships
# - Intensity rather than flow important
# - Compact representation needed
Conclusion
The optimized jjriverplot
function provides:
- Comprehensive flow visualization: Multiple plot types for different analytical needs
- High performance: Significant speed improvements through caching
-
Clinical relevance: Designed for healthcare pathway
and outcome analysis
- Flexibility: Extensive customization options for publication and presentation
- Usability: Clear documentation and comprehensive examples
River plots excel at revealing transition patterns, pathway volumes, and flow dynamics that traditional static visualizations cannot capture. They are particularly valuable in clinical research for tracking patient journeys, treatment responses, and outcome progressions over time.
The function is well-suited for longitudinal studies, quality improvement initiatives, clinical trial analysis, and any scenario requiring visualization of categorical transitions and flows between defined stages or time points.
Session Information
## R version 4.5.1 (2025-06-13)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sequoia 15.5
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: Europe/Istanbul
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.37 desc_1.4.3 R6_2.6.1 fastmap_1.2.0
## [5] xfun_0.52 cachem_1.1.0 knitr_1.50 htmltools_0.5.8.1
## [9] rmarkdown_2.29 lifecycle_1.0.4 cli_3.6.5 sass_0.4.10
## [13] pkgdown_2.1.3 textshaping_1.0.1 jquerylib_0.1.4 systemfonts_1.2.3
## [17] compiler_4.5.1 rstudioapi_0.17.1 tools_4.5.1 ragg_1.4.0
## [21] bslib_0.9.0 evaluate_1.0.4 yaml_2.3.10 jsonlite_2.0.0
## [25] rlang_1.1.6 fs_1.6.6 htmlwidgets_1.6.4