Simulated dataset for a multi-center randomized controlled trial with differential
retention across sites. It is designed for testing complex participant-flow scenarios
with site-level variability in the consortdiagram() function.
Format
A data frame with 600 rows and 10 columns:
- participant_id
Unique participant identifier (MC-00001 to MC-00600)
- site
Study site (Site A/B/C/D)
- age
Age in years (mean=65, sd=10)
- sex
Sex (Male/Female)
- screening_failure
Screening exclusion reasons (inclusion/exclusion criteria, lab values)
- enrollment_issue
Enrollment exclusion reasons (consent, travel distance)
- arm
Randomized treatment arm (Experimental/Control)
- not_received
Allocation exclusion reasons (intervention unavailable, deterioration)
- followup_loss_reason
Follow-up loss reasons (lost, withdrew, site closure)
- analysis_issue
Analysis exclusion reasons (missing endpoint)
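The five exclusion columns follow the order of the trial flow: screening, enrollment, allocation, follow-up, and analysis. The example at the end of this page counts a participant as analyzed only when all five are NA. That suggests each column records a reason only for participants excluded at that stage. Assuming that coding, the minimal base-R sketch below labels where each participant left the flow. exit_stage() and the toy data, including its reason strings, are hypothetical and not part of the package.

```r
# Exclusion columns in assumed flow order; NA is assumed to mean
# "not excluded at this stage"
exclusion_cols <- c("screening_failure", "enrollment_issue", "not_received",
                    "followup_loss_reason", "analysis_issue")

# Hypothetical helper: for each row, the first exclusion column with a
# recorded reason, or "analyzed" if no reason is recorded anywhere
exit_stage <- function(df, cols = exclusion_cols) {
  excluded <- !is.na(as.matrix(df[cols]))
  apply(excluded, 1, function(row) {
    hit <- which(row)
    if (length(hit) == 0) "analyzed" else cols[hit[1]]
  })
}

# Toy stand-in for multicenter_trial_data (hypothetical values)
toy <- data.frame(
  participant_id = c("MC-00001", "MC-00002", "MC-00003", "MC-00004"),
  screening_failure = c(NA, "Lab values", NA, NA),
  enrollment_issue = NA,
  not_received = NA,
  followup_loss_reason = c(NA, NA, "Withdrew", NA),
  analysis_issue = NA
)

table(exit_stage(toy))
```

Tabulating the result gives one count per exit stage. This is one way to sanity-check the exclusion totals shown in a flow diagram.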
Details
This dataset simulates a realistic multi-center trial with:
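- 600 participants assessed across 4 sites
- 25% excluded at screening
- 5% excluded at enrollment
- 1:1 randomization to Experimental vs Control
- 3% did not receive the allocated intervention
- 15% lost to follow-up
- 2% excluded from analysis
- Final retention: 57.7%
- Site-specific retention rates from 52.5%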
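The stage figures above describe successive drop-off through the flow. Assume, as before, that a non-NA exclusion column marks the stage at which a participant was excluded. Under that assumption, the minimal sketch below counts how many participants remain after each stage. remaining_by_stage() and the toy data are hypothetical and are not taken from consortdiagram().

```r
exclusion_cols <- c("screening_failure", "enrollment_issue", "not_received",
                    "followup_loss_reason", "analysis_issue")

# Hypothetical helper: a participant remains after stage k when the
# exclusion columns for stages 1..k are all NA
remaining_by_stage <- function(df, cols = exclusion_cols) {
  # Running AND across stages: element k is TRUE for rows still in the flow
  still_in <- Reduce(`&`, lapply(df[cols], is.na), accumulate = TRUE)
  counts <- unname(vapply(still_in, sum, integer(1)))
  data.frame(
    stage = cols,
    remaining = counts,
    pct_of_assessed = round(100 * counts / nrow(df), 1)
  )
}

# Toy stand-in for multicenter_trial_data (hypothetical values)
toy <- data.frame(
  screening_failure = c(NA, "Lab values", NA, NA),
  enrollment_issue = NA,
  not_received = NA,
  followup_loss_reason = c(NA, NA, "Withdrew", NA),
  analysis_issue = NA
)

remaining_by_stage(toy)
```

If the NA-coding assumption holds, the last row's percentage on the full dataset should correspond to the overall retention figure.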
The dataset demonstrates realistic site variability in retention rates, which is common in multi-center trials due to differences in site infrastructure, patient populations, and study management.
Usage
This dataset demonstrates:
- Multi-site trial flow visualization
- Site-level retention variability
- Complex exclusion patterns
- Higher attrition rates typical of multi-center studies
Examples
if (FALSE) { # \dontrun{
# Load data
data(multicenter_trial_data)

# Site-specific retention: a participant counts as analyzed only if
# no exclusion reason is recorded at any stage
library(dplyr)
multicenter_trial_data %>%
  group_by(site) %>%
  summarise(
    total = n(),
    analyzed = sum(is.na(screening_failure) &
                     is.na(enrollment_issue) &
                     is.na(not_received) &
                     is.na(followup_loss_reason) &
                     is.na(analysis_issue)),
    retention = analyzed / total * 100
  )
} # }
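The example above relies on dplyr. The base-R sketch below computes the same per-site summary using the same rule: a participant counts as analyzed only when all five exclusion columns are NA. site_retention() and the toy data frame are hypothetical stand-ins for multicenter_trial_data.

```r
exclusion_cols <- c("screening_failure", "enrollment_issue", "not_received",
                    "followup_loss_reason", "analysis_issue")

# Hypothetical helper: per-site totals, analyzed counts and retention (%)
site_retention <- function(df, cols = exclusion_cols) {
  # TRUE when no exclusion reason is recorded at any stage
  analyzed <- rowSums(!is.na(df[cols])) == 0
  total <- tapply(analyzed, df$site, length)
  n_analyzed <- tapply(analyzed, df$site, sum)
  data.frame(
    site = names(total),
    total = as.vector(total),
    analyzed = as.vector(n_analyzed),
    retention = as.vector(100 * n_analyzed / total)
  )
}

# Toy stand-in for multicenter_trial_data (hypothetical values)
toy <- data.frame(
  site = c("Site A", "Site A", "Site B", "Site B", "Site B"),
  screening_failure = c(NA, "Lab values", NA, NA, NA),
  enrollment_issue = NA,
  not_received = NA,
  followup_loss_reason = c(NA, NA, "Withdrew", NA, NA),
  analysis_issue = NA
)

site_retention(toy)
```

Using rowSums() over the logical matrix avoids spelling out the five is.na() calls. New exclusion columns then only need to be added to exclusion_cols.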