Simulated dataset for a multi-center randomized controlled trial with differential retention across sites. Designed for testing complex flow scenarios with site-level variability in the consortdiagram function.

Usage

multicenter_trial_data

Format

A data frame with 600 rows and 10 columns:

participant_id

Unique participant identifier (MC-00001 to MC-00600)

site

Study site (Site A/B/C/D)

age

Age in years (mean=65, sd=10)

sex

Sex (Male/Female)

screening_failure

Screening exclusion reasons (inclusion/exclusion criteria, lab values)

enrollment_issue

Enrollment exclusion reasons (consent, travel distance)

arm

Randomized treatment arm (Experimental/Control)

not_received

Allocation exclusion reasons (intervention unavailable, deterioration)

followup_loss_reason

Follow-up loss reasons (lost, withdrew, site closure)

analysis_issue

Analysis exclusion reasons (missing endpoint)

Source

Generated using data-raw/create_clinical_trial_flow_data.R (seed: 20251005)

Details

This dataset simulates a realistic multi-center trial with:

  • 600 participants assessed across 4 sites

  • 25%

  • 5%

  • 1:1 randomization to Experimental vs Control

  • 3%

  • 15%

  • 2%

  • Final retention: 57.7%

  • Site-specific retention rates: 52.5%

The dataset demonstrates realistic site variability in retention rates, which is common in multi-center trials due to differences in site infrastructure, patient populations, and study management.
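The stage-by-stage attrition described above can be tallied directly from the exclusion columns. The following is a minimal sketch, not the package's own method: `flow_counts()` is a hypothetical helper, the `toy` data frame is made up for illustration (it is not the packaged dataset), and the code assumes that each exclusion column is `NA` for participants who were not excluded at that stage.

```r
# Hypothetical helper: count participants remaining after each exclusion stage.
# Assumption: a stage column is NA when the participant was NOT excluded there.
flow_counts <- function(df,
                        stages = c("screening_failure", "enrollment_issue",
                                   "not_received", "followup_loss_reason",
                                   "analysis_issue")) {
  remaining <- rep(TRUE, nrow(df))
  out <- data.frame(stage = "assessed", excluded = 0L, remaining = nrow(df),
                    stringsAsFactors = FALSE)
  for (s in stages) {
    # Only participants still in the flow can be excluded at this stage
    excluded_here <- remaining & !is.na(df[[s]])
    remaining <- remaining & is.na(df[[s]])
    out <- rbind(out, data.frame(stage = s,
                                 excluded = sum(excluded_here),
                                 remaining = sum(remaining),
                                 stringsAsFactors = FALSE))
  }
  # Cumulative retention relative to everyone assessed
  out$pct_of_assessed <- round(100 * out$remaining / nrow(df), 1)
  out
}

# Toy data with the same column names as the dataset (values are invented)
toy <- data.frame(
  screening_failure    = c("lab values", NA, NA, NA, NA, NA, NA, NA),
  enrollment_issue     = c(NA, "consent", NA, NA, NA, NA, NA, NA),
  not_received         = c(NA, NA, "deterioration", NA, NA, NA, NA, NA),
  followup_loss_reason = c(NA, NA, NA, "withdrew", "lost", NA, NA, NA),
  analysis_issue       = c(NA, NA, NA, NA, NA, "missing endpoint", NA, NA),
  stringsAsFactors = FALSE
)

flow <- flow_counts(toy)
print(flow)
```

If the assumption about `NA` coding holds for the packaged data, calling `flow_counts(multicenter_trial_data)` should give the per-stage counts, and the last row's `pct_of_assessed` would correspond to the final retention figure.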

Usage

This dataset demonstrates:

  • Multi-site trial flow visualization

  • Site-level retention variability

  • Complex exclusion patterns

  • Higher attrition rates typical of multi-center studies
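As one way to look at site-level variability and exclusion patterns together, loss reasons can be cross-tabulated by site in base R. This is a rough sketch on invented toy data: the reason strings (`"lost"`, `"withdrew"`, `"site closure"`) are assumptions based on the column description, and the actual values stored in the dataset may differ.

```r
# Toy data (hypothetical values, same column names as the dataset)
toy <- data.frame(
  site = c("Site A", "Site A", "Site B", "Site B", "Site C", "Site D"),
  followup_loss_reason = c("lost", NA, "withdrew", "lost", NA, "site closure"),
  stringsAsFactors = FALSE
)

# Cross-tabulate follow-up loss reasons by site; useNA = "ifany" keeps
# participants with no recorded loss in their own column.
loss_by_site <- table(toy$site, toy$followup_loss_reason, useNA = "ifany")
print(loss_by_site)

# Share of participants at each site with any follow-up loss recorded
loss_rate <- tapply(!is.na(toy$followup_loss_reason), toy$site, mean)
print(round(100 * loss_rate, 1))
```

Swapping `toy` for `multicenter_trial_data` should work unchanged, provided the `site` and `followup_loss_reason` columns are coded as described in the Format section.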

Examples

if (FALSE) { # \dontrun{
# Load data
data(multicenter_trial_data)

# Site-specific retention
library(dplyr)
multicenter_trial_data %>%
  group_by(site) %>%
  summarise(
    total = n(),
    analyzed = sum(is.na(screening_failure) &
                   is.na(enrollment_issue) &
                   is.na(not_received) &
                   is.na(followup_loss_reason) &
                   is.na(analysis_issue)),
    retention = analyzed / total * 100
  )
} # }