Clinical Trial Test Datasets for Advanced Raincloud Plot
Source: R/advancedraincloud_data.R
Comprehensive test datasets designed to demonstrate and test all features of the Advanced Raincloud Plot module. They simulate a realistic clinical trial with several outcome measures and missing data patterns.
Usage
advancedraincloud_data
advancedraincloud_baseline
advancedraincloud_endpoint
advancedraincloud_change
Format
- advancedraincloud_data
- Longitudinal dataset with 900 rows (300 patients × 3 visits) and 15 variables 
- advancedraincloud_baseline
- Cross-sectional baseline data with 300 rows and 13 variables 
- advancedraincloud_endpoint
- Cross-sectional endpoint data with 300 rows and 13 variables 
- advancedraincloud_change
- Change score data with 130 complete cases and 6 variables 
Each object is a tibble (class tbl_df, inheriting from tbl and data.frame) with the dimensions listed above.
Variables
Patient Identifiers:
- patient_id
- Unique patient identifier (PT001-PT300) 
- treatment_arm
- Treatment group - factor with levels "Placebo", "Drug A" 
- time_point
- Visit timepoint - factor with levels "Baseline", "Week 4", "Week 12" 
- visit_number
- Numeric visit number (1-3) 
Demographics:
- age
- Patient age in years (normally distributed, mean=55, sd=12) 
- gender
- Patient gender - factor with levels "Female", "Male" 
- disease_stage
- Disease stage - factor with levels "Early", "Advanced" 
- biomarker_status
- Biomarker status - factor with levels "Negative", "Positive" 
- age_group
- Age stratification - factor "< 65 years", "≥ 65 years" 
- baseline_biomarker_high
- Baseline biomarker level - factor "High", "Normal" 
Primary Outcomes:
- tumor_size_change
- Percent change in tumor size from baseline (negative = shrinkage) 
- biomarker_level
- Biomarker concentration (log-normally distributed, ~90 units baseline) 
- qol_score
- Quality of life score (0-100 scale, higher = better) 
- pain_score
- Pain intensity score - ordered factor (0-10 scale, higher = worse pain) 
Derived Variables:
- tumor_responder
- Response classification - factor with levels:
  - "Progressive Disease" (>10% increase)
  - "Stable Disease" (-10% to +10%)
  - "Partial Response" (-30% to -10%)
  - "Complete Response" (≤-30%)
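The response categories above can be reproduced from tumor_size_change with base R's `cut()`. This is a sketch, not the package's own code: the helper name `classify_response` is hypothetical. The documented ranges share their endpoints, so the category for a value sitting exactly on a boundary is an assumption. Here intervals are closed on the right, which places -30 in "Complete Response" (matching "≤-30%") and +10 in "Stable Disease" (matching ">10%" for progression).

```r
# Hypothetical helper: maps percent change in tumor size to the documented
# response categories. Boundary handling (right-closed intervals) is assumed.
classify_response <- function(pct_change) {
  resp <- cut(
    pct_change,
    breaks = c(-Inf, -30, -10, 10, Inf),
    labels = c("Complete Response", "Partial Response",
               "Stable Disease", "Progressive Disease"),
    right = TRUE  # intervals are (a, b], so exactly -30 counts as Complete Response
  )
  # Reorder the levels to follow the order listed in the documentation
  factor(resp, levels = c("Progressive Disease", "Stable Disease",
                          "Partial Response", "Complete Response"))
}

classify_response(c(-40, -30, -20, 0, 10, 15, NA))
```

Missing values stay `NA`, so patients without a measurement are not silently assigned a category.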
 
Change Score Variables (advancedraincloud_change only):
- tumor_change
- Change in tumor size (Week 12 - Baseline) 
- biomarker_change
- Change in biomarker (Baseline - Week 12, positive = reduction) 
- qol_change
- Change in quality of life (Week 12 - Baseline) 
- pain_change
- Change in pain score (Baseline - Week 12, positive = improvement) 
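Note that the sign conventions differ: tumor and quality-of-life changes are Week 12 minus Baseline, while biomarker and pain changes are Baseline minus Week 12, so that a positive value means reduction or improvement. The sketch below derives such scores from a long-format data frame shaped like advancedraincloud_data and keeps only complete cases. The function `make_change_scores` and the toy data are hypothetical illustrations; the package may have built advancedraincloud_change differently.

```r
# Hypothetical sketch: change scores with the documented sign conventions.
make_change_scores <- function(long) {
  to_num <- function(x) as.numeric(as.character(x))  # pain_score may be an ordered factor
  base <- long[long$time_point == "Baseline", ]
  wk12 <- long[long$time_point == "Week 12", ]
  # One row per patient; shared outcome columns get _bl / _w12 suffixes
  both <- merge(base, wk12, by = c("patient_id", "treatment_arm"),
                suffixes = c("_bl", "_w12"))
  out <- data.frame(
    patient_id       = both$patient_id,
    treatment_arm    = both$treatment_arm,
    tumor_change     = both$tumor_size_change_w12 - both$tumor_size_change_bl,   # Week 12 - Baseline
    biomarker_change = both$biomarker_level_bl - both$biomarker_level_w12,       # positive = reduction
    qol_change       = both$qol_score_w12 - both$qol_score_bl,                   # Week 12 - Baseline
    pain_change      = to_num(both$pain_score_bl) - to_num(both$pain_score_w12)  # positive = improvement
  )
  out[complete.cases(out), ]  # drop patients missing any change score
}

# Toy long-format data (hypothetical values); PT002 has a missing Week 12 tumor value
toy <- data.frame(
  patient_id        = rep(c("PT001", "PT002"), each = 2),
  treatment_arm     = rep(c("Drug A", "Placebo"), each = 2),
  time_point        = rep(c("Baseline", "Week 12"), 2),
  tumor_size_change = c(0, -25, 0, NA),
  biomarker_level   = c(90, 54, 88, 95),
  qol_score         = c(60, 68, 62, 60),
  pain_score        = c(5, 3, 4, 5)
)
make_change_scores(toy)
```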
Clinical Trial Design
This simulates a randomized, placebo-controlled trial with:
- N = 300 patients (150 per arm) 
- 3 timepoints: Baseline, Week 4, Week 12 
- Primary endpoint: Tumor size reduction 
- Secondary endpoints: Biomarker levels, quality of life, pain scores 
- Realistic dropout: ~15% overall, higher in placebo group 
- Missing data: Increases over time, varies by treatment 
Treatment Effects
The simulated treatment effects are:
- Placebo
- Slight tumor progression (+5% at Week 4, +8% at Week 12) 
- Drug A
- Tumor regression (-15% at Week 4, -25% at Week 12) with 30% non-responders 
- Biomarkers
- Drug A reduces levels by ~40%; Placebo shows a slight increase 
- Quality of Life
- Drug A improves scores (+8 points); Placebo shows a slight decline (-2 points) 
- Pain Scores
- Drug A reduces pain (-1.5 points); Placebo shows a slight increase (+0.5 points) 
Testing Features
These datasets are designed to test all Advanced Raincloud Plot features:
Clinical Significance:
- Clinical cutoffs (e.g., tumor_size_change = -30 for response) 
- Reference ranges (e.g., biomarker_level: 10-50 normal range) 
- MCID values (e.g., qol_score MCID = 10 points) 
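Each clinical-significance check above reduces to the share of patients on one side of a threshold. The helpers below are a minimal sketch under stated assumptions, not the package API: the names are hypothetical, missing values are excluded from the denominator, and thresholds are inclusive (response at or below -30, reference range closed at both ends, MCID reached at exactly 10 points).

```r
# Hypothetical helpers; all return percentages of non-missing values.

# Share at or below a clinical cutoff (e.g. tumor_size_change <= -30)
pct_at_or_below <- function(y, cutoff) 100 * mean(y[!is.na(y)] <= cutoff)

# Share inside a reference range (e.g. biomarker_level within 10-50)
pct_in_range <- function(y, lower, upper) {
  y <- y[!is.na(y)]
  100 * mean(y >= lower & y <= upper)
}

# Share of change scores reaching the MCID (e.g. qol_change >= 10 points)
pct_reaching_mcid <- function(change, mcid) 100 * mean(change[!is.na(change)] >= mcid)

pct_at_or_below(c(-45, -30, -12, 4, NA), cutoff = -30)
pct_in_range(c(8, 25, 50, 90, 40), lower = 10, upper = 50)
pct_reaching_mcid(c(12, 10, 3, -2, NA), mcid = 10)
```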
Effect Size Analysis:
- Between-group comparisons (Drug A vs Placebo) 
- Cohen's d, Hedges' g, Glass's delta calculations 
- Multiple timepoint comparisons 
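The three effect sizes differ only in the standardizer: Cohen's d divides the mean difference by the pooled SD, Hedges' g applies a small-sample correction to d, and Glass's delta divides by the control-group SD alone. A minimal base-R sketch, assuming the difference is taken as treatment minus control (e.g. Drug A minus Placebo) and using the common approximation J = 1 - 3/(4(n1 + n2) - 9); the package may use another sign convention or the exact correction. `effect_sizes` is a hypothetical name.

```r
# Hypothetical helper: standardized mean differences, treatment minus control.
effect_sizes <- function(treatment, control) {
  treatment <- treatment[!is.na(treatment)]
  control   <- control[!is.na(control)]
  n1 <- length(treatment)
  n2 <- length(control)
  diff <- mean(treatment) - mean(control)
  # Pooled SD weights each group's variance by its degrees of freedom
  s_pooled <- sqrt(((n1 - 1) * var(treatment) + (n2 - 1) * var(control)) /
                     (n1 + n2 - 2))
  d <- diff / s_pooled
  J <- 1 - 3 / (4 * (n1 + n2) - 9)  # approximate small-sample correction
  c(cohen_d = d, hedges_g = d * J, glass_delta = diff / sd(control))
}

effect_sizes(c(1, 2, 3, 4, 5), c(2, 3, 4, 5, 6))
```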
Change Score Analysis:
- Longitudinal change from baseline 
- Responder classifications (20% threshold default) 
- Missing data handling 
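One common way to apply a 20% responder threshold is on relative change from baseline. The sketch below assumes that definition; it is not confirmed as the package's rule, and `flag_responder` and its arguments are hypothetical. For outcomes where lower is better (pain, tumor size), a responder must fall by at least the threshold.

```r
# Hypothetical helper: responder if relative improvement >= threshold percent.
# Assumes baseline values are non-zero (a zero baseline gives Inf/NaN).
flag_responder <- function(baseline, followup, threshold = 20,
                           higher_is_better = TRUE) {
  pct <- 100 * (followup - baseline) / baseline  # percent change from baseline
  if (higher_is_better) pct >= threshold else pct <= -threshold
}

flag_responder(c(50, 50, 60), c(60, 55, 48))                 # e.g. QoL
flag_responder(c(6, 5), c(4, 5), higher_is_better = FALSE)   # e.g. pain
```

Missing follow-up values propagate as `NA`, leaving the choice of imputation or exclusion to the analysis.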
Biomarker Features:
- Log-normal distribution requiring transformation 
- Outliers (10 extreme values) for outlier handling tests 
- CV bands for assay variability assessment 
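The biomarker features can be illustrated on simulated log-normal data. In this sketch the helper names, the natural-log transform, the 5th/95th percentile winsorizing limits and `sdlog = 0.5` are all assumptions; only the median of roughly 90 units echoes the description of biomarker_level. The CV is computed as SD divided by mean, in percent.

```r
# Hypothetical helpers for skewed biomarker data.
winsorize <- function(x, probs = c(0.05, 0.95)) {
  q <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  pmin(pmax(x, q[1]), q[2])  # clip extremes to the chosen percentiles; NA stays NA
}
cv_percent <- function(x) 100 * sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)

set.seed(42)
biomarker <- rlnorm(300, meanlog = log(90), sdlog = 0.5)  # right-skewed, median ~90
log_biomarker <- log(biomarker)  # approximately normal after the log transform
w <- winsorize(biomarker)
cv_percent(biomarker)
```

Winsorizing keeps all 300 observations but caps their influence, whereas trimming would drop them.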
Publication Features:
- Sample size annotations 
- Missing data reporting 
- Statistical comparisons 
- Journal-specific formatting 
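Sample size annotations and missing-data reporting both come down to counting observed and missing values per group. A minimal sketch producing axis-label text; the function name and label format are hypothetical, not what the package prints.

```r
# Hypothetical helper: per-group label with observed and missing counts.
n_labels <- function(y, group) {
  n_obs  <- tapply(!is.na(y), group, sum)  # non-missing values per group
  n_miss <- tapply(is.na(y), group, sum)   # missing values per group
  paste0(names(n_obs), " (n = ", n_obs, ", missing = ", n_miss, ")")
}

n_labels(c(1, NA, 3, 4), c("Placebo", "Placebo", "Drug A", "Drug A"))
```

Groups appear in factor-level order (alphabetical for character input).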
Usage Examples
# Basic raincloud plot with clinical cutoff
advancedraincloud(
  data = advancedraincloud_baseline,
  y_var = "biomarker_level",
  x_var = "treatment_arm",
  clinical_cutoff = 100,
  show_sample_size = TRUE
)
# Longitudinal analysis with change scores
advancedraincloud(
  data = advancedraincloud_data,
  y_var = "tumor_size_change",
  x_var = "time_point",
  id_var = "patient_id",
  fill_var = "treatment_arm",
  show_longitudinal = TRUE,
  show_change_scores = TRUE,
  baseline_group = "Baseline",
  clinical_cutoff = -30,
  show_effect_size = TRUE
)
# Biomarker analysis with log transformation
advancedraincloud(
  data = advancedraincloud_endpoint,
  y_var = "biomarker_level",
  x_var = "treatment_arm",
  log_transform = TRUE,
  outlier_method = "winsorize",
  reference_range_min = 10,
  reference_range_max = 50,
  show_cv_bands = TRUE,
  journal_style = "nature"
)
# Quality of life with MCID analysis
advancedraincloud(
  data = advancedraincloud_data,
  y_var = "qol_score",
  x_var = "time_point",
  fill_var = "treatment_arm",
  show_mcid = TRUE,
  mcid_value = 10,
  show_change_scores = TRUE,
  generate_report = TRUE
)
# Likert scale analysis for pain scores
advancedraincloud(
  data = advancedraincloud_data,
  y_var = "pain_score",
  x_var = "treatment_arm",
  likert_mode = TRUE,
  show_comparisons = TRUE,
  p_value_position = "above"
)
Data Generation
These datasets were generated using realistic assumptions:
- Reproducible random seed (42) for consistent results 
- Clinically plausible effect sizes and variability 
- Realistic missing data patterns typical of clinical trials 
- Appropriate correlation structures for repeated measures 
- Standard clinical trial demographic distributions 
See also
- advancedraincloud() for the analysis function
- vignette("advancedraincloud_examples") for detailed examples
Examples
# Load the datasets
data("advancedraincloud_data")
data("advancedraincloud_baseline") 
data("advancedraincloud_endpoint")
data("advancedraincloud_change")
# Explore the structure
str(advancedraincloud_data)
summary(advancedraincloud_baseline)
# Check missing data patterns
table(is.na(advancedraincloud_data$tumor_size_change), 
      advancedraincloud_data$time_point,
      advancedraincloud_data$treatment_arm)
# View response rates by treatment
with(advancedraincloud_endpoint, 
     table(tumor_responder, treatment_arm, useNA = "ifany"))
# Check change score completeness
nrow(advancedraincloud_change)  # Complete cases for change analysis