
This module assesses data quality through duplicate detection, missing value analysis, and a data completeness summary (similar to sumvar's dup() function).

Usage

dataquality(
  data,
  vars,
  check_duplicates = TRUE,
  check_missing = TRUE,
  complete_cases_only = FALSE,
  plot_data_overview = FALSE,
  plot_missing_patterns = FALSE,
  plot_data_types = FALSE,
  plot_value_expectations = FALSE,
  missing_threshold_visual = 10
)

Arguments

data

The data as a data frame.

vars

Variables to assess for data quality. If none are selected, the entire dataset is analyzed.

check_duplicates

If TRUE, analyzes duplicate values within each variable or across the entire dataset.
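The two kinds of duplicate check described here can be sketched in base R. This is only an illustration of the idea, not the module's own implementation; the data frame `df` and the object names are hypothetical, and the module may count duplicates differently.

```r
# Hypothetical example data; not taken from the module.
df <- data.frame(
  id    = c(1, 2, 2, 4, 5),
  group = c("a", "b", "b", "a", NA),
  stringsAsFactors = FALSE
)

# Duplicates within each variable: values that repeat an earlier value.
dup_per_var <- vapply(df, function(x) sum(duplicated(x)), integer(1))

# Duplicates across the dataset: rows identical to an earlier row.
dup_rows <- sum(duplicated(df))

print(dup_per_var)
print(dup_rows)
```

Here `duplicated()` marks only the second and later occurrences, so the first appearance of a value is not counted as a duplicate.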

check_missing

If TRUE, provides detailed missing value statistics and patterns.
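As a rough illustration of what missing value statistics and patterns can look like, the following base R sketch counts missing values per variable and tabulates row-wise missingness patterns. The data and object names are hypothetical, and the module's actual output format may differ.

```r
# Hypothetical example data; not taken from the module.
df <- data.frame(
  age    = c(34, NA, 51, 47, NA),
  sex    = c("f", "m", NA, "f", "m"),
  weight = c(70, 82, 65, 90, 77)
)

# Per-variable counts and percentages of missing values.
n_missing   <- colSums(is.na(df))
pct_missing <- 100 * n_missing / nrow(df)

missing_summary <- data.frame(
  variable    = names(df),
  n_missing   = n_missing,
  pct_missing = pct_missing,
  row.names   = NULL
)

# Missingness patterns: one string per row, 1 = missing, 0 = observed,
# in column order (age, sex, weight).
pattern  <- apply(is.na(df), 1, function(r) paste(as.integer(r), collapse = ""))
patterns <- table(pattern)

print(missing_summary)
print(patterns)
```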

complete_cases_only

If TRUE, analyzes completeness across all selected variables simultaneously.
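Completeness across several variables at once corresponds to counting rows with no missing value in any selected variable. A minimal base R sketch, assuming a hypothetical data frame `df` and a hypothetical selection `selected`:

```r
# Hypothetical example data; not taken from the module.
df <- data.frame(
  age    = c(34, NA, 51, 47, NA),
  sex    = c("f", "m", NA, "f", "m"),
  weight = c(70, 82, 65, 90, 77)
)

# Variables considered together (stands in for the `vars` selection).
selected <- c("age", "sex", "weight")

# A row is complete only if every selected variable is observed.
is_complete  <- complete.cases(df[, selected, drop = FALSE])
n_complete   <- sum(is_complete)
pct_complete <- 100 * mean(is_complete)

print(n_complete)
print(pct_complete)
```

Note that a dataset can have low per-variable missingness and still have few complete rows, because the gaps may fall on different rows.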

plot_data_overview

Show data overview visualization displaying variable types and missing values.

plot_missing_patterns

Show missing value patterns visualization.

plot_data_types

Show data type detection and validation visualization.

plot_value_expectations

Show value expectations analysis visualization.

missing_threshold_visual

Threshold percentage for highlighting variables with missing values in visual analysis.
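The effect of a percentage threshold can be sketched as follows. The comparison used here (strictly greater than the threshold) is an assumption, since the documentation does not state whether the boundary value is included; the data and object names are hypothetical.

```r
# Hypothetical example data with 20 rows; not taken from the module.
df <- data.frame(
  x = 1:20,                      # no missing values
  y = c(NA, 2:20),               # 1 of 20 missing (5%)
  z = c(rep(NA, 6), 7:20)        # 6 of 20 missing (30%)
)

# Same value as the documented default of missing_threshold_visual.
threshold <- 10

pct_missing <- 100 * colSums(is.na(df)) / nrow(df)

# Assumption: a variable is highlighted when its missing percentage
# exceeds the threshold.
flagged <- names(pct_missing)[pct_missing > threshold]

print(flagged)
```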

Value

A results object containing:

results$todo: an html
results$text: an html
results$plotDataOverview: an image
results$plotMissingPatterns: an image
results$plotDataTypes: an image
results$plotValueExpectations: an image

Examples

# Example workflow:
# 1. Load your data frame.
# 2. Select the variables to check for data quality issues.
# 3. Choose the analysis type (duplicates, missing values, or both).
# 4. Run the dataquality module to produce a comprehensive data quality report.
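The workflow above might translate into a call like the one below, built from the signature in the Usage section. The data frame `patients` is hypothetical, and passing `vars` as a character vector of column names is an assumption about how the function accepts variable selections. The call is guarded so the snippet still runs when the package that provides `dataquality()` is not loaded.

```r
# Hypothetical example data; not taken from the module.
patients <- data.frame(
  id  = c(1, 2, 2, 4, 5),
  age = c(34, NA, 51, 47, NA),
  sex = c("f", "m", NA, "f", "m")
)

# Run only if dataquality() is available in the current session.
if (exists("dataquality", mode = "function")) {
  res <- dataquality(
    data = patients,
    vars = c("age", "sex"),   # assumption: column names as strings
    check_duplicates = TRUE,
    check_missing = TRUE,
    plot_missing_patterns = TRUE
  )

  # The text report listed under Value.
  print(res$text)
}
```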