
Corrects and standardizes date fields using several R packages (datefixR, anytime, lubridate). This module handles the messy date formats commonly found in clinical databases: differing separators, differing month representations, missing components, and ambiguous formats. It returns standardized dates together with detailed correction reports and a quality assessment, and is intended for clinical research data preprocessing and database standardization workflows.

Usage

datecorrection(
  data,
  date_vars,
  correction_method = "datefixr",
  date_format = "auto",
  day_impute = 1,
  month_impute = 7,
  handle_excel = TRUE,
  timezone = "UTC",
  show_correction_table = TRUE,
  show_quality_assessment = TRUE,
  show_format_analysis = TRUE,
  show_correction_summary = TRUE,
  show_interpretation = TRUE
)

Arguments

data

The data as a data frame.

date_vars

Variables containing date information in various formats that need correction and standardization. Accepts character strings, numeric values, and factors that represent dates.

correction_method

Method for date correction. datefixR provides robust format detection, anytime offers flexible parsing, lubridate allows format specification, and consensus combines methods for maximum reliability.
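The idea behind a consensus approach can be sketched in base R. This is an illustration only, not the module's implementation: the helper `consensus_parse()` and its list of candidate formats are hypothetical, and plain `as.Date()` formats stand in for the datefixR, anytime, and lubridate parsers. A value is accepted only when every format that parses it yields the same date.

```r
# Hypothetical helper: accept a date only when all successful parses agree.
consensus_parse <- function(x,
                            formats = c("%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y")) {
  vapply(x, function(value) {
    # Try every candidate format; failed parses come back as NA.
    parsed <- vapply(
      formats,
      function(f) as.numeric(as.Date(value, format = f)),
      numeric(1)
    )
    candidates <- unique(parsed[!is.na(parsed)])
    if (length(candidates) == 1) {
      format(as.Date(candidates, origin = "1970-01-01"))
    } else {
      NA_character_  # no parse succeeded, or the parses disagree
    }
  }, character(1), USE.NAMES = FALSE)
}

consensus_result <- consensus_parse(c("2021-03-04", "13/04/2021", "03/04/2021"))
consensus_result  # "2021-03-04" "2021-04-13" NA
```

The last value stays unresolved because day-first and month-first readings are both valid and disagree.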

date_format

Expected date format for ambiguous cases. Auto-detect tries to determine the most likely format based on the data patterns.
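To see why an expected format matters, the base R sketch below reads the same string under two format assumptions. It does not call the module, and the variable names are illustrative only.

```r
# The same string read under two different format assumptions (base R only).
ambiguous <- "03/04/2021"

as_dmy <- as.Date(ambiguous, format = "%d/%m/%Y")  # day first
as_mdy <- as.Date(ambiguous, format = "%m/%d/%Y")  # month first

format(as_dmy)  # "2021-04-03"
format(as_mdy)  # "2021-03-04"
```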

day_impute

Day of month to impute when the day is missing (1-31). The default is 1, the first of the month. If the value exceeds the number of days in the month, the last day of that month is used.

month_impute

Month to impute when the month is missing (1-12). The default is 7 (July), a mid-year value commonly used in clinical research.
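The imputation rules for day_impute and month_impute can be sketched in base R. The helper `impute_partial_date()` below is hypothetical and only accepts partial dates written as "YYYY", "YYYY-MM", or "YYYY-MM-DD"; the module itself may treat inputs differently.

```r
# Hypothetical helper, not part of the module: fills in missing date parts.
impute_partial_date <- function(x, day_impute = 1, month_impute = 7) {
  parts <- strsplit(x, "-", fixed = TRUE)
  out <- vapply(parts, function(p) {
    year  <- as.integer(p[1])
    month <- if (length(p) >= 2) as.integer(p[2]) else as.integer(month_impute)
    day   <- if (length(p) >= 3) as.integer(p[3]) else as.integer(day_impute)
    # Find the last day of the month so an oversized day can be clamped.
    first_of_month <- as.Date(sprintf("%04d-%02d-01", year, month))
    next_month <- seq(first_of_month, by = "month", length.out = 2)[2]
    last_day <- as.integer(format(next_month - 1, "%d"))
    sprintf("%04d-%02d-%02d", year, month, min(day, last_day))
  }, character(1))
  as.Date(out)
}

imputed <- impute_partial_date(c("2020", "2020-03", "2021-02"), day_impute = 31)
format(imputed)  # "2020-07-31" "2020-03-31" "2021-02-28"
```

With the defaults (day 1, month 7), the same inputs would become 2020-07-01, 2020-03-01, and 2021-02-01.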

handle_excel

Whether to convert Excel numeric date values (serial day counts in Excel's 1900 date system) to dates. Useful for data exported from Excel spreadsheets.
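For reference, this is how Excel serial numbers map to dates in base R. The sketch is independent of the module; it uses the origin "1899-12-30", which is the usual choice for Excel's 1900 date system because it compensates for Excel counting a non-existent 29 February 1900.

```r
# Excel serial numbers as they often appear after a spreadsheet export.
excel_serial <- c(44197, 43831)

# Convert with the conventional origin for Excel's 1900 date system.
excel_dates <- as.Date(excel_serial, origin = "1899-12-30")

format(excel_dates)  # "2021-01-01" "2020-01-01"
```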

timezone

Timezone for output dates. Use UTC for standardization, or local timezone if time-of-day information is critical.
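The choice of timezone can change the calendar date when a time of day is present. The base R sketch below uses a made-up late-evening timestamp to show this; it does not call the module.

```r
# A late-evening timestamp recorded in local (New York) time.
late_visit <- as.POSIXct("2021-03-01 23:30:00", tz = "America/New_York")

# The calendar date depends on the timezone used for the conversion.
local_date <- as.Date(late_visit, tz = "America/New_York")
utc_date   <- as.Date(late_visit, tz = "UTC")

format(local_date)  # "2021-03-01"
format(utc_date)    # "2021-03-02"
```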

show_correction_table

Display detailed table showing original values, corrected values, and correction status for each observation.

show_quality_assessment

Provide quality assessment including success rates, common problems, and recommendations for further correction.

show_format_analysis

Analyze detected date formats and patterns in the original data.

show_correction_summary

Summary statistics of the correction process including before/after comparison and data quality metrics.

show_interpretation

Display guidance on date correction methods, best practices, and recommendations for clinical research data.

Value

A results object containing:

results$todo                  a html
results$correction_table      a html
results$quality_assessment    a html
results$format_analysis       a html
results$correction_summary    a html
results$interpretation        a html

Examples

# \donttest{
# Example:
# 1. Select variables containing date information that need correction.
# 2. Choose correction method (automatic detection or specific format).
# 3. Configure missing value imputation settings.
# 4. Review correction results and quality assessment.
# }
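The steps above might look like the following call. This is a sketch under assumptions: the toy data frame and its column names are made up, and passing the column name as a character string to date_vars is an assumption rather than something stated in this page. The call is guarded so that it runs only when a package providing datecorrection() is attached.

```r
# Toy data; column names and values are made up for illustration.
messy <- data.frame(
  id = 1:4,
  visit_date = c("2021-03-04", "04/03/2021", "March 2021", "44197"),
  stringsAsFactors = FALSE
)

# Run only when a package providing datecorrection() is attached.
# Passing the column name as a character string is an assumption.
if (exists("datecorrection")) {
  results <- datecorrection(
    data = messy,
    date_vars = "visit_date",
    correction_method = "datefixr",
    day_impute = 1,
    month_impute = 7,
    handle_excel = TRUE
  )
  results$correction_table
}
```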