Clinical Research Demographics with Treatment Groups
Source: R/data_toolssummary_docs.R
toolssummary_clinical_demographics.Rd
A simulated demographics dataset for a clinical research study, with three treatment groups, a broad set of clinical variables, and realistic missing-data patterns. It is designed to test demographic table generation, grouped summaries, cross-tabulations, and publication-ready formatting typical of clinical research, using the summarytools-enhanced features of toolssummary.
Format
A data frame with 300 observations and 16 variables:
- patient_id
Character. Unique patient identifier (PT_0001 to PT_0300)
- age
Integer. Patient age at enrollment (18-85 years)
- sex
Factor. Patient sex ("Male", "Female")
- treatment_group
Factor. Treatment assignment ("Control", "Treatment A", "Treatment B")
- study_site
Factor. Study enrollment site ("Site_A" to "Site_F")
- bmi
Numeric. Body mass index (16-45 kg/m²) with ~5% missing values
- systolic_bp
Integer. Systolic blood pressure (90-200 mmHg)
- diastolic_bp
Integer. Diastolic blood pressure (60-120 mmHg)
- diabetes
Factor. Diabetes status ("No", "Type 1", "Type 2")
- smoking_status
Factor. Smoking history ("Never", "Former", "Current")
- education
Ordered factor. Education level ("Less than HS" < "High School" < "Some College" < "Bachelor's" < "Graduate")
- hemoglobin
Numeric. Hemoglobin level (8-18 g/dL) with ~3% missing values
- glucose
Integer. Fasting glucose (70-400 mg/dL)
- cholesterol
Integer. Total cholesterol (120-350 mg/dL) with ~8% missing values
- enrollment_date
Date. Study enrollment date (2023-01-01 to 2024-12-31)
- followup_months
Integer. Follow-up duration (1-24 months)
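To make the structure above concrete, here is a minimal base R sketch that builds a stand-in data frame with a subset of the documented columns. It is an illustration only: the seed, the uniform sampling, and the object name `demo` are assumptions, and this is not the code used to generate the packaged dataset, whose distributions are described as more realistic.

```r
# Hypothetical stand-in, NOT the package's actual generation code.
set.seed(42)  # arbitrary seed, chosen only for reproducibility
n <- 300

edu_levels <- c("Less than HS", "High School", "Some College",
                "Bachelor's", "Graduate")

demo <- data.frame(
  patient_id = sprintf("PT_%04d", seq_len(n)),   # PT_0001 ... PT_0300
  age = sample(18:85, n, replace = TRUE),
  sex = factor(sample(c("Male", "Female"), n, replace = TRUE)),
  # Cycle through the three arms so group sizes are equal
  treatment_group = factor(
    rep(c("Control", "Treatment A", "Treatment B"), length.out = n)
  ),
  bmi = round(runif(n, min = 16, max = 45), 1),  # uniform draw, for illustration
  education = factor(
    sample(edu_levels, n, replace = TRUE),
    levels = edu_levels, ordered = TRUE
  ),
  enrollment_date = sample(
    seq(as.Date("2023-01-01"), as.Date("2024-12-31"), by = "day"),
    n, replace = TRUE
  ),
  stringsAsFactors = FALSE
)

# Set 5% of the bmi values (15 of 300) to missing
demo$bmi[sample(n, size = round(0.05 * n))] <- NA

str(demo)
```

Running `str()` on the packaged dataset after `data(toolssummary_clinical_demographics)` should show the same column types for these variables, plus the remaining columns listed above.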
Details
This dataset simulates the baseline characteristics of a clinical research study, with realistic distributions and missing-data patterns. It is designed to showcase summarytools capabilities: dfSummary for a complete data frame overview, freq for frequency tables of categorical variables, descr for numeric summaries, and ctable for cross-tabulations by treatment group.
Key Features:
- Realistic clinical variable distributions within physiological ranges
- Balanced treatment groups for meaningful comparisons
- Multiple categorical variables for frequency analysis
- An ordered factor (education) for testing ordinal handling
- Missing data patterns reflecting real clinical studies
- A date variable (enrollment_date) for temporal analysis
summarytools Integration Testing:
- dfSummary: Complete data frame overview with variable distributions
- freq: Frequency tables for categorical variables (sex, treatment_group, diabetes)
- descr: Descriptive statistics for numeric variables
- ctable: Cross-tabulations by treatment group or study site
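The four summarytools functions listed above can also be called directly, which may help when checking what toolssummary reports. The sketch below uses a small made-up data frame (`demo`, an assumption for illustration) rather than the packaged dataset, and it only runs the summarytools calls if that package is installed.

```r
# Small hypothetical stand-in with three of the documented columns.
set.seed(1)
demo <- data.frame(
  age = sample(18:85, 60, replace = TRUE),
  sex = factor(sample(c("Male", "Female"), 60, replace = TRUE)),
  treatment_group = factor(
    rep(c("Control", "Treatment A", "Treatment B"), each = 20)
  )
)

# Guarded so the script still runs where summarytools is not installed.
if (requireNamespace("summarytools", quietly = TRUE)) {
  print(summarytools::dfSummary(demo))                         # whole data frame overview
  print(summarytools::freq(demo$sex))                          # frequency table
  print(summarytools::descr(demo$age))                         # numeric descriptives
  print(summarytools::ctable(demo$sex, demo$treatment_group))  # cross-tabulation
}
```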
Recommended Usage Scenarios:
- Grouped summaries by treatment_group or study_site
- Cross-tabulation analysis for categorical associations
- Missing data pattern assessment
- Publication-ready demographic tables
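Three of the scenarios above (missing data assessment, grouped summaries, and cross-tabulation) can be sketched in base R as a cross-check on the package output. The data frame `demo`, its size, and the positions of the missing values are assumptions made for this illustration; with the packaged dataset, substitute `toolssummary_clinical_demographics`.

```r
# Hypothetical stand-in data; missing values are placed by hand.
set.seed(7)
n <- 90
demo <- data.frame(
  treatment_group = factor(
    rep(c("Control", "Treatment A", "Treatment B"), each = n / 3)
  ),
  diabetes = factor(sample(c("No", "Type 1", "Type 2"), n, replace = TRUE)),
  bmi = round(runif(n, 16, 45), 1),
  cholesterol = sample(120:350, n, replace = TRUE)
)
demo$bmi[c(5, 40, 77)] <- NA
demo$cholesterol[c(2, 18, 33, 51, 64, 70, 88)] <- NA

# Missing data assessment: percentage of NA values per variable
missing_pct <- round(100 * colMeans(is.na(demo)), 1)
print(missing_pct)

# Grouped summary: mean BMI per arm (the formula interface drops NA rows)
bmi_by_group <- aggregate(bmi ~ treatment_group, data = demo, FUN = mean)
print(bmi_by_group)

# Cross-tabulation: diabetes status by treatment group
xtab <- table(demo$diabetes, demo$treatment_group)
print(xtab)
```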
Examples
if (FALSE) { # \dontrun{
  # Load the dataset
  data(toolssummary_clinical_demographics)

  # Basic enhanced summary with summarytools
  result <- toolssummary(
    data = toolssummary_clinical_demographics,
    vars = c("age", "sex", "bmi", "diabetes", "treatment_group"),
    useSummarytools = TRUE,
    showDfSummary = TRUE,
    showDescr = TRUE,
    showFreq = TRUE
  )

  # Grouped analysis by treatment
  result_grouped <- toolssummary(
    data = toolssummary_clinical_demographics,
    vars = c("age", "bmi", "systolic_bp", "diabetes"),
    groupVar = "treatment_group",
    useSummarytools = TRUE,
    showCrosstabs = TRUE
  )
} # }