
This module assesses data quality: it detects duplicates, analyzes missing values, and summarizes data completeness (similar to the dup() function in sumvar).

Usage

dataquality(
  data,
  vars,
  check_duplicates = FALSE,
  check_missing = FALSE,
  complete_cases_only = FALSE,
  plot_data_overview = FALSE,
  plot_missing_patterns = FALSE,
  plot_data_types = FALSE,
  missing_threshold_visual = 10,
  showSummary = TRUE,
  showRecommendations = TRUE,
  showExplanations = FALSE
)

Arguments

data

The data as a data frame.

vars

Variables to assess for data quality. If none are selected, the entire dataset is analyzed.

check_duplicates

If TRUE, analyzes duplicate values within each variable or across the entire dataset.

check_missing

If TRUE, provides detailed missing value statistics and patterns.

complete_cases_only

If TRUE, checks for duplicate rows across all selected variables. If FALSE, checks for duplicate values within each variable separately.
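The difference between the two duplicate checks can be illustrated with a minimal base R sketch. It uses a hypothetical toy data frame and base R's duplicated(), and is not taken from the module's own implementation, which may count duplicates differently.

```r
# Hypothetical toy data; base R only, not the module's internals.
df <- data.frame(
  id  = c(1, 2, 2, 3, 3),
  sex = c("F", "M", "M", "F", "M"),
  stringsAsFactors = FALSE
)

# Row-wise check: a row is a duplicate only when every selected
# variable matches an earlier row.
dup_rows <- sum(duplicated(df))

# Per-variable check: count repeated values within each column separately.
dup_per_var <- vapply(df, function(x) sum(duplicated(x)), integer(1))

print(dup_rows)
print(dup_per_var)
```

In this toy data only one row repeats in full, while each column on its own contains several repeated values, so the two modes can give very different counts.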

plot_data_overview

If TRUE, shows a data overview visualization displaying variable types and missing values.

plot_missing_patterns

If TRUE, shows a visualization of missing value patterns.

plot_data_types

If TRUE, shows a visualization of data type detection and validation.

missing_threshold_visual

Percentage of missing values used as the threshold for highlighting variables in the visual analysis.
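As a rough illustration of how a percentage threshold can be applied, the base R sketch below computes the share of missing values per variable and flags those above the threshold. The data, the variable names, and the strict "greater than" comparison are all assumptions made for illustration; the module's own rule may differ.

```r
# Hypothetical toy data with 20 rows; base R only.
df <- data.frame(
  age    = replace(21:40, 7, NA),                               # 1 of 20 missing
  weight = replace(seq(60, 98, by = 2), c(2, 5, 9, 14, 18), NA), # 5 of 20 missing
  sex    = rep(c("F", "M"), 10)                                 # none missing
)

# Assumed threshold, matching the default shown in Usage.
missing_threshold <- 10

# Percentage of missing values in each variable.
pct_missing <- 100 * colSums(is.na(df)) / nrow(df)

# Variables whose missing percentage exceeds the threshold (assumed rule).
flagged <- names(pct_missing)[pct_missing > missing_threshold]

print(pct_missing)
print(flagged)
```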

showSummary

If TRUE, displays a concise plain-language summary of quality issues and overall assessment.

showRecommendations

If TRUE, provides specific recommendations for addressing identified quality issues.

showExplanations

If TRUE, includes explanations of quality metrics and statistical tests used in the analysis.

Value

A results object containing:

results$todo: an html
results$text: an html
results$summary: an html
results$recommendations: an html
results$explanations: an html
results$plotDataOverview: an image
results$plotMissingPatterns: an image
results$plotDataTypes: an image

Examples

# \donttest{
# Example:
# 1. Load your data frame.
# 2. Select variables to check for data quality issues.
# 3. Choose analysis type (duplicates, missing values, or both).
# 4. Run the dataquality module to see comprehensive data quality report.
# }
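The steps above might look roughly like the following sketch. The data frame and its column names are hypothetical, and passing vars as a character vector is an assumption. The call uses only arguments listed under Usage and is guarded so that it runs only when dataquality() is available in the session.

```r
# Hypothetical example data; the column names are illustrative only.
patients <- data.frame(
  id  = c(101, 102, 102, 103, 104, 105),
  age = c(54, 61, 61, NA, 47, 39),
  sex = c("F", "M", "M", "F", NA, "M"),
  stringsAsFactors = FALSE
)

# Run the module only if dataquality() is available in this session.
if (exists("dataquality", mode = "function")) {
  results <- dataquality(
    data = patients,
    vars = c("id", "age", "sex"),  # assumption: a character vector is accepted
    check_duplicates = TRUE,
    check_missing = TRUE
  )
}
```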