Simulated dataset for a multi-center randomized controlled trial with differential
retention across sites. Designed for testing complex flow scenarios with site-level
variability in the consortdiagram function.
Format
A data frame with 600 rows and 10 columns:
- participant_id: Unique participant identifier (MC-00001 to MC-00600)
- site: Study site (Site A/B/C/D)
- age: Age in years (mean = 65, sd = 10)
- sex: Sex (Male/Female)
- screening_failure: Screening exclusion reasons (inclusion/exclusion criteria, lab values)
- enrollment_issue: Enrollment exclusion reasons (consent, travel distance)
- arm: Randomized treatment arm (Experimental/Control)
- not_received: Allocation exclusion reasons (intervention unavailable, deterioration)
- followup_loss_reason: Follow-up loss reasons (lost, withdrew, site closure)
- analysis_issue: Analysis exclusion reasons (missing endpoint)
Details
This dataset simulates a realistic multi-center trial with:
- 600 participants assessed across 4 sites 
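- 25% screening failure
- 5% enrollment issues
- 1:1 randomization to Experimental vs Control
- 3% did not receive the allocated intervention
- 15% lost to follow-up
- 2% excluded from analysis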
- Final retention: 57.7%
- Site-specific retention rates: 52.5%
The dataset demonstrates realistic site variability in retention rates, which is common in multi-center trials due to differences in site infrastructure, patient populations, and study management.
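The overall retention figure can be sanity-checked against the stage percentages listed above. The following is a minimal sketch, assuming that 25, 5, 3, 15, and 2 are sequential attrition rates, each applied to the participants remaining after the previous stage; that interpretation, and the variable names, are assumptions rather than part of the dataset documentation.

```r
# Assumed stage-wise attrition rates, in flow order:
# screening, enrollment, allocation, follow-up, analysis
stage_loss <- c(0.25, 0.05, 0.03, 0.15, 0.02)

# Fraction of the 600 assessed participants remaining after each stage
remaining_fraction <- cumprod(1 - stage_loss)

# Expected overall retention in percent
expected_retention <- round(100 * remaining_fraction[length(remaining_fraction)], 1)

# Expected head count after each stage
expected_n <- round(600 * remaining_fraction)

print(expected_retention)
print(expected_n)
```

Under these assumptions the expected retention comes out at roughly 57.6%, close to the 57.7% reported for the simulated data; a small gap of this kind is to be expected when attrition is drawn at random.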
Usage
This dataset demonstrates:
- Multi-site trial flow visualization 
- Site-level retention variability 
- Complex exclusion patterns 
- Higher attrition rates typical of multi-center studies 
Examples
if (FALSE) { # \dontrun{
# Load dplyr and the dataset
library(dplyr)
data(multicenter_trial_data)
# Site-specific retention: a participant counts as analyzed when every
# exclusion-reason column is NA
multicenter_trial_data %>%
  group_by(site) %>%
  summarise(
    total = n(),
    analyzed = sum(is.na(screening_failure) &
                   is.na(enrollment_issue) &
                   is.na(not_received) &
                   is.na(followup_loss_reason) &
                   is.na(analysis_issue)),
    retention = analyzed / total * 100
  )
} # }
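The same exclusion-reason columns can also be summarized stage by stage, which is the shape of information a CONSORT flow diagram displays. The sketch below uses a small made-up data frame (`toy`, hypothetical values) with the documented column names so that it runs without the package installed. It assumes an `NA` reason means the participant was not excluded at that stage and that each participant has at most one exclusion reason.

```r
# Toy stand-in for multicenter_trial_data; values are invented for illustration
toy <- data.frame(
  site                 = c("Site A", "Site A", "Site B", "Site B", "Site C", "Site D"),
  screening_failure    = c(NA, "lab values", NA, NA, NA, NA),
  enrollment_issue     = c(NA, NA, "consent", NA, NA, NA),
  not_received         = c(NA, NA, NA, NA, NA, NA),
  followup_loss_reason = c(NA, NA, NA, "withdrew", NA, NA),
  analysis_issue       = c(NA, NA, NA, NA, NA, "missing endpoint"),
  stringsAsFactors     = FALSE
)

# Exclusion columns in trial-flow order
stage_cols <- c("screening_failure", "enrollment_issue", "not_received",
                "followup_loss_reason", "analysis_issue")

# Number of participants excluded at each stage (non-NA reason)
excluded <- colSums(!is.na(toy[stage_cols]))

# Participants still in the trial after each stage
remaining <- nrow(toy) - cumsum(excluded)

print(data.frame(stage = stage_cols, excluded = excluded, remaining = remaining,
                 row.names = NULL))
```

Replacing `toy` with `multicenter_trial_data` should give the per-stage counts for the full dataset, provided the assumptions above hold for it.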