Comprehensive collection of test datasets for the timeinterval function, covering various date formats, clinical scenarios, and edge cases for robust testing of time interval calculations.
Clinical trial data with YYYY-MM-DD date format, treatment groups, and realistic enrollment/follow-up patterns for testing basic time interval calculations.
European clinical data with DD/MM/YYYY date format for testing international date format compatibility and disease progression scenarios.
Hospital admission/discharge data with MM/DD/YYYY HH:MM:SS datetime format for testing high-precision time calculations and data quality issues.
Population-based cohort study data with various exit reasons and extreme values for testing comprehensive data quality assessment features.
Cancer study dataset specifically designed for testing landmark analysis functionality with a 6-month landmark time point.
Deliberately challenging dataset with mixed date formats in the same columns for testing automatic date format detection capabilities.
Summary information for all timeinterval test datasets including observation counts, descriptions, and key features.
Documentation of specific test scenarios, recommended datasets, and expected results for comprehensive timeinterval testing.
Usage
timeinterval_clinical_trial
timeinterval_european_dates
timeinterval_us_datetime
timeinterval_epidemiological
timeinterval_landmark
timeinterval_mixed_formats
timeinterval_datasets_summary
timeinterval_test_scenarios
Format
Various data frames with different structures optimized for specific testing scenarios
timeinterval_clinical_trial: a data frame with 200 observations and 8 variables:
- patient_id
Character. Unique patient identifier (CT_001 to CT_200)
- treatment_group
Character. Treatment assignment: "Treatment A", "Treatment B", "Control"
- age
Numeric. Patient age at enrollment (mean=65, sd=12)
- sex
Character. Patient sex: "Male", "Female"
- enrollment_date_ymd
Character. Study enrollment date in YYYY-MM-DD format
- followup_date_ymd
Character. Last follow-up date in YYYY-MM-DD format (some missing)
- event_occurred
Numeric. Binary indicator of primary event (0/1)
- site_location
Character. Study site: "Site A", "Site B", "Site C"
timeinterval_european_dates: a data frame with 150 observations and 7 variables:
- study_id
Character. Unique study identifier (EU_001 to EU_150)
- country
Character. European country: "Germany", "France", "Italy", "Spain", "UK"
- diagnosis_date_dmy
Character. Diagnosis date in DD/MM/YYYY format
- last_visit_dmy
Character. Last clinical visit in DD/MM/YYYY format
- disease_stage
Character. Disease stage: "I", "II", "III", "IV"
- outcome_status
Character. Patient status: "Alive", "Deceased", "Lost to Follow-up"
- comorbidity_score
Numeric. Comorbidity burden score (0-10)
timeinterval_us_datetime: a data frame with 180 observations and 8 variables:
- record_id
Character. Unique record identifier (US_0001 to US_0180)
- hospital_unit
Character. Hospital unit: "ICU", "Emergency", "Surgery", "Medical", "Pediatric"
- admission_datetime
Character. Admission date/time in MM/DD/YYYY HH:MM:SS format
- discharge_datetime
Character. Discharge date/time in MM/DD/YYYY HH:MM:SS format
- primary_diagnosis
Character. Primary diagnosis category
- severity_score
Numeric. Illness severity score
- readmission_30d
Numeric. 30-day readmission indicator (0/1)
- insurance_type
Character. Insurance type: "Private", "Medicare", "Medicaid", "Uninsured"
timeinterval_epidemiological: a data frame with 250 observations and 9 variables:
- participant_id
Character. Unique participant identifier (EPI_0001 to EPI_0250)
- geographic_region
Character. Region: "Urban", "Suburban", "Rural"
- cohort_entry_date
Character. Cohort entry date in YYYY-MM-DD format
- exit_date
Character. Study exit date (some missing for testing)
- exit_reason
Character. Exit reason: "event", "censored", "death", "emigration"
- age_at_entry
Numeric. Age at cohort entry (mean=45, sd=15)
- exposure_status
Character. Exposure level: "High", "Medium", "Low", "None"
- socioeconomic_status
Character. SES: "High", "Medium", "Low"
- baseline_health_score
Numeric. Health score at baseline (mean=75, sd=12)
timeinterval_landmark: a data frame with 120 observations and 8 variables:
- patient_id
Character. Unique patient identifier (LM_001 to LM_120)
- cancer_type
Character. Cancer type: "Breast", "Lung", "Colorectal", "Prostate", "Lymphoma"
- diagnosis_date
Character. Diagnosis date (identical for all patients, for landmark testing)
- last_contact_date
Character. Last contact date
- vital_status
Character. Status: "Deceased", "Alive"
- treatment_received
Character. Treatment type: "Surgery Only", "Surgery + Chemo", etc.
- response_6m
Character. 6-month response: "Complete", "Partial", "Stable", "Progressive"
- landmark_eligible
Logical. TRUE if the patient survived past the 6-month landmark
timeinterval_mixed_formats: a data frame with 100 observations and 7 variables:
- sample_id
Character. Unique sample identifier (MX_001 to MX_100)
- data_source
Character. Data source: "Manual Entry", "Electronic Import", etc.
- start_date_mixed
Character. Start dates in MIXED formats (YMD, DMY, MDY, YDM)
- end_date_mixed
Character. End dates in MIXED formats (matching start format)
- data_quality_flag
Character. Quality flag: "High", "Medium", "Low"
- operator_id
Character. Data entry operator identifier
- verification_status
Character. Status: "Verified", "Pending", "Flagged"
timeinterval_datasets_summary: a data frame with 6 observations and 4 variables:
- Dataset
Character. Dataset name
- Observations
Numeric. Number of observations in dataset
- Description
Character. Brief dataset description
- Key_Features
Character. Key testing features
timeinterval_test_scenarios: a data frame with 10 observations and 3 variables:
- Scenario
Character. Test scenario name
- Dataset
Character. Recommended dataset for testing
- Expected_Result
Character. Expected outcome description
Details
This collection includes six specialized datasets designed to test different aspects of the timeinterval function:
Multiple date format parsing (YMD, DMY, MDY, datetime)
Clinical trial scenarios with treatment groups
International date format compatibility
High-precision datetime calculations
Epidemiological cohort studies
Landmark analysis test cases
Data quality assessment scenarios
Missing value and edge case handling
The timeinterval_clinical_trial dataset simulates a multicenter clinical trial with:
Staggered enrollment over 1 year (2020-2021)
Realistic follow-up periods (30-500 days, Poisson distributed)
~5% missing follow-up dates for robustness testing
Event rate of ~25% across all groups
Three treatment arms with different allocation probabilities
Ideal for testing:
Basic YMD date parsing
Missing value handling
Treatment group stratification
Clinical trial time-to-event analysis
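To illustrate what basic YMD parsing with missing follow-up dates involves for the clinical trial dataset, here is a minimal base R sketch. It uses a few made-up rows that borrow the documented column names. It is not the timeinterval implementation, and the 30.44 days-per-month divisor is an assumption of this sketch.

```r
# Made-up rows that reuse the documented column names (illustrative only).
toy_trial <- data.frame(
  patient_id = c("CT_001", "CT_002", "CT_003"),
  enrollment_date_ymd = c("2020-01-15", "2020-06-01", "2020-11-20"),
  followup_date_ymd = c("2020-07-15", NA, "2021-11-20"),
  stringsAsFactors = FALSE
)

# Parse YYYY-MM-DD strings; a missing follow-up date stays NA.
start_date <- as.Date(toy_trial$enrollment_date_ymd, format = "%Y-%m-%d")
end_date <- as.Date(toy_trial$followup_date_ymd, format = "%Y-%m-%d")

# Interval in days, then in approximate months
# (30.44 = average month length, an assumption of this sketch).
interval_days <- as.numeric(difftime(end_date, start_date, units = "days"))
interval_months <- interval_days / 30.44

# Share of rows without a follow-up date.
missing_share <- mean(is.na(end_date))
```

Rows with a missing follow-up date propagate NA through the calculation rather than raising an error, which is the behaviour the missing-value test cases are meant to exercise.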
The timeinterval_european_dates dataset represents a European multicenter study with:
DD/MM/YYYY date format (European standard)
Disease staging data for oncology analysis
Follow-up periods ranging from 6 months to 3 years
Realistic outcome distributions
Country-specific enrollment patterns
Ideal for testing:
DMY date format parsing
International date standards
Disease progression analysis
Multi-country study coordination
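Why DMY parsing needs its own test data can be seen from a small base R sketch. The same string yields different dates under DD/MM/YYYY and MM/DD/YYYY, and only some strings fail outright under the wrong format. The example strings are made up for illustration.

```r
# A string that is valid under both conventions.
ambiguous <- "03/04/2021"
as_dmy <- as.Date(ambiguous, format = "%d/%m/%Y")  # 3 April 2021
as_mdy <- as.Date(ambiguous, format = "%m/%d/%Y")  # 4 March 2021

# Reading a DMY value as MDY silently shifts the date.
shift_days <- as.numeric(as_dmy - as_mdy)

# A string with day > 12 cannot be read as MDY and becomes NA.
unambiguous <- "25/12/2021"
wrong_format <- as.Date(unambiguous, format = "%m/%d/%Y")
right_format <- as.Date(unambiguous, format = "%d/%m/%Y")
```

Only dates with a day above 12 expose a wrong format as NA. The rest parse without complaint, so intervals computed from them can be wrong without any warning.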
The timeinterval_us_datetime dataset simulates hospital electronic health record data with:
High-precision datetime stamps (hours/minutes/seconds)
Length of stay ranging from 2 hours to several days
Realistic hospital unit distributions
~5 intentional negative intervals (discharge before admission) for quality testing
Various diagnosis categories and severity levels
Ideal for testing:
MDY datetime format parsing
High-precision time calculations
Negative interval detection
Hospital length of stay analysis
Data quality assessment features
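The negative-interval check described for the hospital dataset can be sketched in base R as follows. The two rows are made up, the column names follow the documented format, and the UTC time zone is an assumption chosen to avoid daylight-saving effects. This shows the idea only and is not how timeinterval computes its quality metrics.

```r
# Made-up admissions; the second row has discharge before admission.
toy_stays <- data.frame(
  record_id = c("US_0001", "US_0002"),
  admission_datetime = c("03/01/2022 08:30:00", "03/05/2022 22:00:00"),
  discharge_datetime = c("03/01/2022 14:45:00", "03/04/2022 10:00:00"),
  stringsAsFactors = FALSE
)

# MM/DD/YYYY HH:MM:SS, parsed in UTC (time zone is an assumption).
fmt <- "%m/%d/%Y %H:%M:%S"
admitted <- as.POSIXct(toy_stays$admission_datetime, format = fmt, tz = "UTC")
discharged <- as.POSIXct(toy_stays$discharge_datetime, format = fmt, tz = "UTC")

# Length of stay in hours; negative values indicate a data quality problem.
stay_hours <- as.numeric(difftime(discharged, admitted, units = "hours"))
negative_stay <- stay_hours < 0
flagged_ids <- toy_stays$record_id[negative_stay]
```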
The timeinterval_epidemiological dataset represents a population-based epidemiological study with:
Follow-up periods from 2018-2023 (up to 6 years)
Multiple exit scenarios (15% events, 65% censored, 15% death, 5% emigration)
~8 extreme follow-up values (10-20 years) for outlier testing
~4 missing exit dates for missing data testing
Realistic demographic and exposure distributions
Ideal for testing:
Long-term follow-up calculations
Extreme value detection
Missing data handling
Population health analysis
Comprehensive data quality assessment
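Extreme-value and missing-data checks of the kind this dataset targets might look like the base R sketch below. The rows are made up, and the 10-year cut-off (`max_plausible_years`) is a hypothetical threshold taken from the documented 10-20 year range. The rule that timeinterval applies when `remove_extreme = TRUE` is not documented here and may differ.

```r
# Made-up cohort rows: one plausible, one extreme, one with a missing exit date.
toy_cohort <- data.frame(
  participant_id = c("EPI_0001", "EPI_0002", "EPI_0003"),
  cohort_entry_date = c("2018-03-01", "2018-06-15", "2019-01-10"),
  exit_date = c("2021-03-01", "2033-06-15", NA),
  stringsAsFactors = FALSE
)

entry <- as.Date(toy_cohort$cohort_entry_date, format = "%Y-%m-%d")
exit <- as.Date(toy_cohort$exit_date, format = "%Y-%m-%d")

# Follow-up in years (365.25 days per year, an assumption of this sketch).
followup_years <- as.numeric(difftime(exit, entry, units = "days")) / 365.25

# Hypothetical cut-off for implausibly long follow-up.
max_plausible_years <- 10
is_missing <- is.na(followup_years)
is_extreme <- !is_missing & followup_years > max_plausible_years

# Rows that survive both checks.
kept <- toy_cohort[!is_missing & !is_extreme, ]
```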
The timeinterval_landmark dataset is specifically designed for landmark analysis testing with:
Common diagnosis date for all patients (2020-03-01)
~30% of patients with events before the 6-month landmark (to be excluded)
~70% surviving past the landmark with varying additional follow-up
5 patients with exactly 6 months of follow-up for boundary testing
Realistic cancer treatment and response patterns
Ideal for testing:
Landmark analysis at 6 months
Conditional survival calculations
Patient exclusion criteria
Cancer survival analysis
Treatment response correlation
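The exclusion step behind a landmark analysis can be sketched in a few lines of base R. The follow-up values are made up. Keeping patients whose follow-up equals the landmark exactly is an assumption of this sketch, and how timeinterval treats that boundary is what the five boundary patients are meant to reveal.

```r
# Made-up follow-up times in months since diagnosis.
followup_months <- c(3.5, 6, 14.2, 27)
landmark_time <- 6

# Keep patients still under observation at the landmark
# (including the boundary is an assumption of this sketch).
eligible <- followup_months >= landmark_time

# For eligible patients, restart the clock at the landmark.
post_landmark_months <- followup_months[eligible] - landmark_time
```

The first patient is dropped, and the remaining three contribute 0, 8.2, and 21 months of post-landmark follow-up.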
The timeinterval_mixed_formats dataset intentionally mixes date formats within the same column:
25% YYYY-MM-DD format
25% DD/MM/YYYY format
25% MM/DD/YYYY format
25% YYYY/DD/MM format
First 3 entries contain obviously invalid dates for error testing
Simulates real-world data integration challenges
Ideal for testing:
Automatic date format detection
Mixed format handling
Error detection and reporting
Data quality assessment
Robust parsing algorithms
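One simple way to approach mixed-format columns is to try candidate formats in a fixed order and keep the first one that parses. The base R sketch below does this with a hypothetical helper, `parse_mixed()`, which is not part of the package and is not claimed to match the detection logic behind `time_format = "auto"`. The input strings are made up.

```r
# Hypothetical helper: try each format in turn, keep the first successful parse.
parse_mixed <- function(x,
                        formats = c("%Y-%m-%d", "%d/%m/%Y",
                                    "%m/%d/%Y", "%Y/%d/%m")) {
  out <- rep(as.Date(NA), length(x))
  for (fmt in formats) {
    # Only retry entries that are present but not yet parsed.
    todo <- is.na(out) & !is.na(x)
    if (!any(todo)) break
    out[todo] <- as.Date(x[todo], format = fmt)
  }
  out
}

# The same calendar day written four ways, plus one invalid entry.
mixed <- c("2021-12-25", "25/12/2021", "12/25/2021", "2021/25/12", "99/99/9999")
parsed <- parse_mixed(mixed)

# Entries that no format could parse are candidates for error reporting.
unparsed <- mixed[is.na(parsed)]
```

The order of `formats` decides ambiguous cases: a string such as "03/04/2021" is read as DMY here because that format is tried before MDY. A fallback of this kind cannot tell the two apart without additional context, which is why the dataset also serves error detection and quality reporting tests.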
Examples
if (FALSE) { # \dontrun{
# Load the dataset
data(timeinterval_clinical_trial)
# Basic time interval calculation
timeinterval(
  data = timeinterval_clinical_trial,
  dx_date = "enrollment_date_ymd",
  fu_date = "followup_date_ymd",
  time_format = "ymd",
  output_unit = "months"
)
# Treatment group analysis
library(dplyr)
timeinterval_clinical_trial %>%
  group_by(treatment_group) %>%
  summarise(
    n = n(),
    events = sum(event_occurred, na.rm = TRUE)
  )
} # }
if (FALSE) { # \dontrun{
# Load the dataset
data(timeinterval_european_dates)
# European date format analysis
timeinterval(
  data = timeinterval_european_dates,
  dx_date = "diagnosis_date_dmy",
  fu_date = "last_visit_dmy",
  time_format = "dmy",
  output_unit = "months"
)
# Disease stage analysis
library(dplyr)
timeinterval_european_dates %>%
  group_by(disease_stage) %>%
  summarise(
    n = n(),
    deceased = sum(outcome_status == "Deceased", na.rm = TRUE)
  )
} # }
if (FALSE) { # \dontrun{
# Load the dataset
data(timeinterval_us_datetime)
# High-precision datetime analysis
timeinterval(
  data = timeinterval_us_datetime,
  dx_date = "admission_datetime",
  fu_date = "discharge_datetime",
  time_format = "mdy", # Will auto-detect datetime
  output_unit = "days",
  include_quality_metrics = TRUE
)
# Hospital unit analysis
library(dplyr)
timeinterval_us_datetime %>%
  group_by(hospital_unit) %>%
  summarise(
    n = n(),
    readmissions = sum(readmission_30d, na.rm = TRUE)
  )
} # }
if (FALSE) { # \dontrun{
# Load the dataset
data(timeinterval_epidemiological)
# Epidemiological analysis with quality assessment
timeinterval(
  data = timeinterval_epidemiological,
  dx_date = "cohort_entry_date",
  fu_date = "exit_date",
  time_format = "ymd",
  output_unit = "years",
  include_quality_metrics = TRUE,
  remove_extreme = TRUE
)
# Exposure analysis
library(dplyr)
timeinterval_epidemiological %>%
  group_by(exposure_status, exit_reason) %>%
  summarise(n = n(), .groups = "drop")
} # }
if (FALSE) { # \dontrun{
# Load the dataset
data(timeinterval_landmark)
# Landmark analysis at 6 months
timeinterval(
  data = timeinterval_landmark,
  dx_date = "diagnosis_date",
  fu_date = "last_contact_date",
  time_format = "ymd",
  output_unit = "months",
  use_landmark = TRUE,
  landmark_time = 6
)
# Treatment response analysis
library(dplyr)
timeinterval_landmark %>%
  filter(landmark_eligible) %>%
  group_by(response_6m) %>%
  summarise(
    n = n(),
    alive = sum(vital_status == "Alive", na.rm = TRUE)
  )
} # }
if (FALSE) { # \dontrun{
# Load the dataset
data(timeinterval_mixed_formats)
# Test automatic format detection
timeinterval(
  data = timeinterval_mixed_formats,
  dx_date = "start_date_mixed",
  fu_date = "end_date_mixed",
  time_format = "auto", # Test auto-detection
  output_unit = "days",
  include_quality_metrics = TRUE
)
# Data quality analysis
library(dplyr)
timeinterval_mixed_formats %>%
  group_by(data_quality_flag) %>%
  summarise(
    n = n(),
    verified = sum(verification_status == "Verified", na.rm = TRUE)
  )
} # }