This module assesses data quality through duplicate detection, missing value analysis, and a data completeness summary (similar to sumvar's dup() function).
Usage
dataquality(
data,
vars,
check_duplicates = TRUE,
check_missing = TRUE,
complete_cases_only = FALSE,
plot_data_overview = FALSE,
plot_missing_patterns = FALSE,
plot_data_types = FALSE,
plot_value_expectations = FALSE,
missing_threshold_visual = 10
)
Arguments
- data
The data as a data frame.
- vars
Variables to assess for data quality. If none are selected, the entire dataset is analyzed.
- check_duplicates
If TRUE, analyzes duplicate values within each variable or across the entire dataset.
- check_missing
If TRUE, provides detailed missing value statistics and patterns.
- complete_cases_only
If TRUE, analyzes completeness across all selected variables simultaneously.
- plot_data_overview
If TRUE, shows a data overview visualization displaying variable types and missing values.
- plot_missing_patterns
If TRUE, shows a visualization of missing value patterns.
- plot_data_types
If TRUE, shows a visualization of data type detection and validation.
- plot_value_expectations
If TRUE, shows a visualization of the value expectations analysis.
- missing_threshold_visual
Threshold percentage of missing values used to highlight variables in the visual analysis. Defaults to 10.
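Examples
The checks described above can be approximated in base R. This is a minimal sketch for illustration, not the module's actual implementation: the data frame `df` and its columns are hypothetical, and the strict `>` comparison against the threshold is an assumption.

```r
# Hypothetical example data: one repeated row and a few missing values.
df <- data.frame(
  id    = c(1, 2, 2, 3, 4),
  age   = c(34, 51, 51, NA, 29),
  grade = c("low", "high", "high", "low", NA),
  stringsAsFactors = FALSE
)

# Duplicates within each variable: values that repeat an earlier value.
dup_per_var <- sapply(df, function(x) sum(duplicated(x)))

# Duplicates across the dataset: rows identical to an earlier row.
dup_rows <- sum(duplicated(df))

# Missing values: percentage of NA per variable.
missing_pct <- colMeans(is.na(df)) * 100

# Completeness across all variables at once: rows with no NA anywhere.
n_complete <- sum(complete.cases(df))

# Variables whose missing percentage exceeds the threshold
# (strict comparison is an assumption, not confirmed by the documentation).
missing_threshold_visual <- 10
flagged <- names(missing_pct)[missing_pct > missing_threshold_visual]

# Corresponding call, assuming the package that provides dataquality() is
# loaded and that vars accepts a character vector of column names:
# dataquality(data = df, vars = c("id", "age", "grade"),
#             check_duplicates = TRUE, check_missing = TRUE)
```

Here `dup_per_var` and `missing_pct` mirror the per-variable view of check_duplicates and check_missing, while `dup_rows` and `n_complete` mirror the whole-dataset view.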