This module assesses data quality through duplicate detection, missing value analysis, and a data completeness summary (similar to sumvar's dup() function).

Usage

dataquality(
  data,
  vars,
  check_duplicates = TRUE,
  check_missing = TRUE,
  complete_cases_only = FALSE,
  visual_analysis = TRUE,
  visdat_type = "vis_dat",
  missing_threshold_visual = 10,
  export_plots = FALSE
)

Arguments

data

The data as a data frame.

vars

Variables to assess for data quality. If none are selected, the entire dataset is analyzed.

check_duplicates

If TRUE, analyzes duplicate values within each variable or across the entire dataset.

check_missing

If TRUE, provides detailed missing value statistics and patterns.

complete_cases_only

If TRUE, analyzes completeness across all selected variables simultaneously.

visual_analysis

If TRUE, enables visual data exploration through the visdat package, with visual summaries of data types and missing-value patterns.

visdat_type

The type of visual analysis to perform with visdat. Defaults to "vis_dat".

missing_threshold_visual

Threshold, as a percentage of missing values, for highlighting variables in the visual analysis. Defaults to 10.

export_plots

If TRUE, enables export of the visual data quality plots.
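
To make the arguments above concrete, the following base R sketch shows the kind of quantities that duplicate, missing-value, and complete-case checks typically report. It is not the module's implementation; the toy data frame and all variable names (df, dup_per_var, flagged, and so on) are hypothetical and chosen only for illustration.

```r
# Toy data: row 3 repeats row 2, and `age` has two missing values.
df <- data.frame(
  id  = c(1, 2, 2, 4),
  age = c(34, NA, NA, 51),
  sex = c("F", "M", "M", NA)
)

# Duplicate values within each variable (cf. check_duplicates).
# Note: duplicated() counts a repeated NA as a duplicate.
dup_per_var <- vapply(df, function(x) sum(duplicated(x)), integer(1))

# Fully duplicated rows across the whole data frame.
dup_rows <- sum(duplicated(df))

# Missing-value count and percentage per variable (cf. check_missing).
n_missing   <- colSums(is.na(df))
pct_missing <- 100 * n_missing / nrow(df)

# Variables whose missingness exceeds a threshold
# (cf. missing_threshold_visual; the strict ">" is an assumption).
threshold <- 10
flagged   <- names(pct_missing)[pct_missing > threshold]

# Rows with no missing value in any variable (cf. complete_cases_only).
n_complete <- sum(complete.cases(df))
```

Whether the module itself treats repeated missing values as duplicates, or compares against the threshold with a strict inequality, is not stated on this page, so those choices should be read as assumptions of the sketch.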

Value

A results object containing:

results$todo    an html
results$text    an html
results$plot    an html

Examples

# \donttest{
# Example:
# 1. Load your data frame.
# 2. Select variables to check for data quality issues.
# 3. Choose analysis type (duplicates, missing values, or both).
# 4. Run the dataquality module to see a comprehensive data quality report.
# }
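
The steps above might translate into a call like the one sketched below, which uses only the arguments listed under Usage. Two things are assumptions: that the package providing dataquality() is already attached (this page does not name it), and that vars accepts a character vector of column names. The call is guarded so the snippet still runs when the function is not available; the data frame df is hypothetical.

```r
# 1. A small data frame with a repeated id and missing ages (hypothetical data).
df <- data.frame(
  id  = c(1, 2, 2, 4),
  age = c(34, NA, NA, 51)
)

# 2-4. Select variables, request both checks, and run the module.
# Guarded: only runs if a function named dataquality() is on the search path.
if (exists("dataquality", mode = "function")) {
  results <- dataquality(
    data = df,
    vars = c("id", "age"),      # assumption: column names as character strings
    check_duplicates = TRUE,
    check_missing = TRUE,
    visual_analysis = FALSE     # skip the visdat plots for a quick text report
  )
  print(results$text)           # text component listed under Value
}
```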