This module assesses data quality through duplicate detection, missing value analysis, and a data completeness summary, similar to the dup() function in the sumvar package.
Usage
dataquality(
  data,
  vars,
  check_duplicates = TRUE,
  check_missing = TRUE,
  complete_cases_only = FALSE,
  visual_analysis = TRUE,
  visdat_type = "vis_dat",
  missing_threshold_visual = 10,
  export_plots = FALSE
)
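As an illustration, the call below overrides a few of the defaults shown above. It is a sketch rather than a verified example: it assumes dataquality() is already available in the session (the providing package is not named here) and that vars accepts a character vector of column names. R's built-in airquality data is used purely as a stand-in. The exists() guard lets the snippet run harmlessly when the function is absent.

```r
# Stand-in data: R's built-in airquality set, which has missing values
# in Ozone and Solar.R. Any data frame would do.
df <- airquality

# Assumption: dataquality() is attached and vars takes column names.
if (exists("dataquality", mode = "function")) {
  dataquality(
    data = df,
    vars = c("Ozone", "Solar.R", "Wind"),
    check_duplicates = TRUE,
    check_missing = TRUE,
    visual_analysis = FALSE   # skip the visdat plots
  )
}
```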
Arguments
- data
The data as a data frame.
- vars
Variables to assess for data quality. If none are selected, the entire dataset is analyzed.
- check_duplicates
If TRUE, analyzes duplicate values within each variable or across the entire dataset.
- check_missing
If TRUE, provides detailed missing value statistics and patterns.
- complete_cases_only
If TRUE, analyzes completeness across all selected variables simultaneously.
- visual_analysis
If TRUE, enables visual data exploration through the visdat package, providing visual summaries of data types and missing-value patterns.
- visdat_type
The type of visdat visual analysis to perform. Defaults to "vis_dat".
- missing_threshold_visual
Threshold percentage of missing values used to highlight variables in the visual analysis. Defaults to 10.
- export_plots
If TRUE, enables export of the visual data quality plots.
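To make the quantities behind these arguments concrete, the sketch below computes comparable summaries in base R on a small made-up data frame. It is not the module's implementation. The toy data, the variable names, and the choice to flag variables whose missing percentage strictly exceeds the threshold are all assumptions made for illustration.

```r
# Toy data: one repeated row and a few missing values (illustrative only).
df <- data.frame(
  id    = c(1, 2, 2, 3, 4),
  score = c(10, NA, NA, 7, 7),
  group = c("a", "b", "b", NA, "a"),
  stringsAsFactors = FALSE
)

# Duplicates within each variable: values that repeat an earlier value.
# Note that duplicated() also counts a repeated NA as a duplicate.
dup_per_var <- vapply(df, function(x) sum(duplicated(x)), integer(1))

# Duplicates across the whole dataset: rows identical to an earlier row.
dup_rows <- sum(duplicated(df))

# Missing values per variable, as counts and as percentages of all rows.
na_count <- colSums(is.na(df))
na_pct   <- 100 * na_count / nrow(df)

# Complete cases: rows with no missing value in any variable.
n_complete <- sum(complete.cases(df))

# Variables to highlight, assuming "strictly above the threshold" (10 here,
# mirroring the default of missing_threshold_visual).
threshold <- 10
flagged <- names(na_pct)[na_pct > threshold]

print(dup_per_var)
print(round(na_pct, 1))
print(flagged)
```

Restricting df to a subset of columns before running these lines mimics assessing only the selected vars. Whether the module treats repeated missing values as duplicates, as duplicated() does, is not stated in this reference.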