Corrects and standardizes date fields using several R packages (datefixR, anytime, lubridate). The module handles messy date formats common in clinical databases: inconsistent separators, numeric or named months, missing day or month components, and ambiguous day/month order. It returns standardized dates together with detailed correction reports and a quality assessment, and is intended for preprocessing clinical research data and standardizing databases.
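The separator and month-name problems mentioned above can be illustrated with a short base-R sketch. It assumes day-month-year order and English month names; `normalize_dmy()` is a hypothetical helper and does not reproduce how the module or datefixR actually parse dates.

```r
# Hypothetical helper (not part of the module): normalize day-month-year
# strings that use mixed separators and numeric or English month names.
normalize_dmy <- function(x) {
  # Split on any run of "/", ".", space or "-"
  parts <- strsplit(trimws(x), "[/. -]+")
  iso <- vapply(parts, function(p) {
    if (length(p) != 3) return(NA_character_)
    day <- suppressWarnings(as.integer(p[1]))
    month <- suppressWarnings(as.integer(p[2]))
    if (is.na(month)) {
      # Named month: match the first three letters against base R's
      # English month.abb, which avoids the locale dependence of %b
      month <- match(tolower(substr(p[2], 1, 3)), tolower(month.abb))
    }
    year <- suppressWarnings(as.integer(p[3]))
    if (anyNA(c(day, month, year))) return(NA_character_)
    sprintf("%04d-%02d-%02d", year, month, day)
  }, character(1))
  # An explicit format makes impossible dates (e.g. 31 February) NA
  as.Date(iso, format = "%Y-%m-%d")
}

normalize_dmy(c("15/03/2020", "15-Mar-2020"))  # both 2020-03-15
```

Real inputs also mix in year-first and month-first orders, which is why the module relies on dedicated parsing packages rather than a single fixed rule like this one.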
Usage
datecorrection(
data,
date_vars,
correction_method = "datefixr",
date_format = "auto",
day_impute = 1,
month_impute = 7,
handle_excel = TRUE,
timezone = "UTC",
show_correction_table = TRUE,
show_quality_assessment = TRUE,
show_format_analysis = TRUE,
show_correction_summary = TRUE,
show_interpretation = TRUE
)
Arguments
- data
The data as a data frame.
- date_vars
Variables containing dates in various formats that need correction and standardization. Character strings, numeric values, and factors holding date representations are accepted.
- correction_method
Method used for date correction. datefixR (the default) provides robust format detection, anytime offers flexible parsing, lubridate parses against a specified format, and consensus combines the results of several methods to improve reliability.
- date_format
Expected date format, used to resolve ambiguous values such as 03/04/2020, which could mean 3 April or 4 March. The default, "auto", infers the most likely format from patterns in the data.
- day_impute
Day of the month to impute when the day is missing (1-31). The default is 1 (the first of the month). If the value exceeds the number of days in the month, the last day of that month is used.
- month_impute
Month to impute when the month is missing (1-12). The default is 7 (July), a mid-year value commonly used in clinical research.
- handle_excel
Whether to convert numeric Excel date serials (Excel's 1900 date system, in which serial 1 corresponds to 1900-01-01). Useful for data exported from Excel spreadsheets.
- timezone
Time zone for output dates. Use UTC for standardization, or the local time zone if time-of-day information is critical.
- show_correction_table
Display a detailed table showing the original value, corrected value, and correction status of each observation.
- show_quality_assessment
Display a quality assessment including success rates, common problems, and recommendations for further correction.
- show_format_analysis
Display an analysis of the date formats and patterns detected in the original data.
- show_correction_summary
Display summary statistics of the correction process, including a before/after comparison and data quality metrics.
- show_interpretation
Display guidance on date correction methods, best practices, and recommendations for clinical research data.
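Because `date_vars` may hold factors, one general R pitfall is worth keeping in mind when preparing such columns by hand: `as.numeric()` on a factor returns its internal level codes, not the stored values. The snippet below shows the difference with a factor of Excel-style serial numbers; it illustrates base R behaviour only and makes no claim about how the module converts factors internally.

```r
# A factor holding Excel-style serial numbers as text
f <- factor(c("44927", "43831"))
levels(f)                              # "43831" "44927" (sorted)
codes <- as.numeric(f)                 # 2 1: positions in levels(f)
values <- as.numeric(as.character(f))  # 44927 43831: the stored values
```

Converting through `as.character()` first is the safe route for any factor that encodes dates, whether as serials or as text.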
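The documentation does not say exactly how the consensus method combines its parsers. One plausible strategy, sketched here purely as an assumption, is to run several parsers and accept a date only when every parser that succeeds returns the same value, flagging disagreements as NA. The `parsers` list and `consensus_parse()` are hypothetical, and plain `as.Date()` formats stand in for datefixR, anytime and lubridate.

```r
# Hypothetical stand-ins for different parsing methods
parsers <- list(
  iso = function(x) as.Date(x, format = "%Y-%m-%d"),
  dmy = function(x) as.Date(x, format = "%d/%m/%Y"),
  mdy = function(x) as.Date(x, format = "%m/%d/%Y")
)

consensus_parse <- function(x, parsers) {
  # One column per parser, one row per value (days since 1970-01-01)
  results <- do.call(cbind, lapply(parsers, function(p) as.numeric(p(x))))
  agreed <- apply(results, 1, function(r) {
    hits <- unique(r[!is.na(r)])
    # Keep the value only if all successful parsers agree on it
    if (length(hits) == 1) hits else NA_real_
  })
  as.Date(agreed, origin = "1970-01-01")
}
```

With this rule, "25/12/2020" is accepted because only the day-first reading is valid, while "03/04/2020" is rejected because the day-first and month-first readings disagree.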
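When `date_format` is left at "auto", the data can sometimes settle the day/month order by themselves: a first field above 12 can only be a day. The heuristic below is an assumed illustration of that idea, not the module's detection logic; `guess_order()` is hypothetical and only considers numeric values shaped like a/b/yyyy.

```r
# Hypothetical heuristic for numeric a/b/yyyy values
guess_order <- function(x) {
  ok <- grepl("^[0-9]{1,2}[/.-][0-9]{1,2}[/.-][0-9]{4}$", x)
  parts <- strsplit(x[ok], "[/.-]")
  first <- as.integer(vapply(parts, `[`, character(1), 1))
  second <- as.integer(vapply(parts, `[`, character(1), 2))
  day_first <- any(first > 12)   # only a day can exceed 12
  day_second <- any(second > 12)
  if (day_first && !day_second) "dmy"
  else if (day_second && !day_first) "mdy"
  else if (day_first && day_second) "mixed"
  else "ambiguous"  # the data alone cannot decide
}
```

An "ambiguous" result is the case where an explicit `date_format` is needed.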
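The `day_impute` and `month_impute` rules can be expressed as a small base-R function. `impute_date()` is a hypothetical helper written for this illustration. It takes separate year, month and day components, applies the documented defaults, and clamps an imputed day to the last day of the month.

```r
# Hypothetical helper applying the documented imputation rules
impute_date <- function(year, month = NA, day = NA,
                        day_impute = 1, month_impute = 7) {
  if (is.na(month)) month <- month_impute
  # Last day of the month: first day of the next month minus one day
  first_of_month <- as.Date(sprintf("%04d-%02d-01", year, month), format = "%Y-%m-%d")
  next_month <- seq(first_of_month, by = "month", length.out = 2)[2]
  last_day <- as.integer(format(next_month - 1, "%d"))
  if (is.na(day)) day <- min(day_impute, last_day)
  as.Date(sprintf("%04d-%02d-%02d", year, month, day), format = "%Y-%m-%d")
}

impute_date(2019)                         # 2019-07-01 (year only)
impute_date(2021, 2, day_impute = 31)     # 2021-02-28 (clamped)
```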
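For `handle_excel`, note that Excel's 1900 date system counts a nonexistent 29 February 1900 (serial 60), so from serial 61 (1 March 1900) onward the matching origin in R is 1899-12-30. Reading the serial literally as days since 1900-01-01 places modern dates two days late. `excel_to_date()` is a hypothetical one-line helper; workbooks saved with the Mac 1904 date system would need origin 1904-01-01 instead.

```r
# Hypothetical helper: Excel 1900-system serials to Dates.
# Correct for serials >= 61; serials of 60 and below are one day early.
excel_to_date <- function(serial) as.Date(serial, origin = "1899-12-30")

excel_to_date(c(61, 25569, 44927))  # 1900-03-01, 1970-01-01, 2023-01-01

# Taking "days since 1900-01-01" literally shifts dates two days late
as.Date(44927, origin = "1900-01-01")  # 2023-01-03
```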
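The effect of `timezone` is easiest to see with date-times. The base-R snippet below (variable names are illustrative) formats one instant in UTC and shows that the calendar date of a late-evening timestamp depends on the zone used for conversion. It assumes the system time-zone database includes America/New_York.

```r
# One timestamp recorded in New York local time (EST, UTC-5 in January)
visit <- as.POSIXct("2023-01-15 14:30", tz = "America/New_York")
visit_utc <- format(visit, "%Y-%m-%d %H:%M", tz = "UTC")  # "2023-01-15 19:30"

# Dates carry no time zone, but the calendar day of a late-evening
# timestamp depends on the zone used for the conversion
late <- as.POSIXct("2023-01-15 21:00", tz = "America/New_York")
day_utc   <- as.Date(late, tz = "UTC")               # 2023-01-16
day_local <- as.Date(late, tz = "America/New_York")  # 2023-01-15
```

This is why UTC suits standardization, while a local zone matters when the date of a clinical event must match the local calendar.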
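The correction table and quality assessment are described only in outline, so the sketch below assumes a simple layout: one row per observation with its original value, corrected date and status, plus a success rate over non-blank inputs. `correction_status()` and the toy values are hypothetical; the module's actual columns and metrics may differ.

```r
# Hypothetical status rule (the module's categories may differ)
correction_status <- function(original, corrected) {
  blank <- is.na(original) | trimws(original) == ""
  ifelse(blank, "missing", ifelse(is.na(corrected), "failed", "corrected"))
}

# Toy values for illustration only
original  <- c("2020-03-15", "15/03/2020", "", "not recorded", "44927")
corrected <- as.Date(c("2020-03-15", "2020-03-15", NA, NA, "2023-01-01"))
status <- correction_status(original, corrected)

correction_table <- data.frame(original, corrected, status)

# Success rate among observations that had something to correct
success_rate <- mean(status[status != "missing"] == "corrected")  # 0.75
```

Excluding blank inputs from the denominator keeps genuinely missing data from being counted as correction failures.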
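One common way to summarize the formats present in a column, assumed here rather than taken from the module, is to reduce each value to a shape signature (digits become `9`, letters become `A`) and tabulate the signatures. `format_signature()` is a hypothetical helper.

```r
# Hypothetical helper: reduce each value to its shape
format_signature <- function(x) {
  sig <- gsub("[[:digit:]]", "9", x)
  gsub("[[:alpha:]]", "A", sig)
}

x <- c("2020-03-15", "15/03/2020", "03/15/2020", "15-Mar-2020", "44927")
sig <- format_signature(x)
counts <- table(sig)  # one count per distinct shape
```

Signatures separate ISO, slash-separated, named-month and serial-number values, but cannot tell 15/03/2020 from 03/15/2020; that distinction requires looking at the field values themselves (for example, a first field above 12).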