
Single Variable Quality Check

Usage

checkdata(
  data,
  var,
  showOutliers = TRUE,
  showDistribution = FALSE,
  showDuplicates = FALSE,
  showPatterns = FALSE,
  rareCategoryThreshold = 5,
  clinicalValidation = TRUE,
  unitSystem = "auto",
  outlierTransform = "none",
  mcarTest = FALSE,
  cvMinMean = 0.01,
  showSummary = FALSE,
  showAbout = FALSE,
  showCaveats = FALSE
)
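A minimal call might look like the sketch below. It assumes the package that exports checkdata is already attached, and that var accepts a column name given as a string; neither is confirmed by this page. The patients data frame and its column names are made up for illustration.

```r
# Toy data; the column names are illustrative only
patients <- data.frame(
  age = c(34, 51, 47, NA, 62, 58, 45, 39, 71, 140),
  sex = c("F", "M", "F", "F", "M", "M", "F", "M", "F", "F")
)

# Guarded so the snippet does nothing when checkdata is not available
if (exists("checkdata", mode = "function")) {
  results <- checkdata(
    data = patients,
    var = "age",            # assumption: a quoted column name is accepted
    showDistribution = TRUE,
    showSummary = TRUE
  )
  print(results$missingVals)
}
```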

Arguments

data

The data as a data frame.

var

The variable in data to check.

showOutliers

Detect and display potential outliers using the z-score method (|z| > 3).

showDistribution

Display descriptive statistics and distribution characteristics.

showDuplicates

Identify and count duplicate values in the dataset.

showPatterns

Analyze patterns in missing data and value distributions.

rareCategoryThreshold

Percentage threshold below which a category is flagged as rare. Rare categories matter for chi-squared test assumptions and for modeling.
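The idea behind the threshold can be sketched in base R as follows. The helper flag_rare_categories is hypothetical and not part of the package; whether checkdata uses a strict comparison or excludes missing values from the denominator is an assumption here.

```r
# Hypothetical helper: flag categories whose share of non-missing values
# falls below a percentage threshold (strict "<" is an assumption).
flag_rare_categories <- function(x, threshold = 5) {
  counts <- table(x, useNA = "no")
  pct <- 100 * as.vector(counts) / sum(counts)
  data.frame(
    category = names(counts),
    n = as.vector(counts),
    percent = pct,
    rare = pct < threshold,
    stringsAsFactors = FALSE
  )
}

# Toy tumour stage variable: stage IV makes up 3% of the values
stage <- c(rep("I", 40), rep("II", 35), rep("III", 22), rep("IV", 3))
rare_table <- flag_rare_categories(stage, threshold = 5)
rare_table
```

With the default threshold of 5, only stage IV is flagged in this toy example.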

clinicalValidation

Perform context-specific validation for clinical variables (age, lab values, etc.). Heuristic ranges may need adjustment.

unitSystem

Unit system for clinical plausibility checks. The default, "auto", attempts to infer the unit system from the data ranges.

outlierTransform

Apply transformation before outlier detection to handle skewed distributions (especially right-skewed lab values).
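Why a transformation helps can be sketched with simulated data. This page does not list the available options other than "none", so the log transform below is an assumed example, and z_outliers is a hypothetical helper rather than the package's implementation.

```r
# Hypothetical helper: z-score rule, flags |z| > cutoff
z_outliers <- function(x, cutoff = 3) {
  z <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
  !is.na(z) & abs(z) > cutoff
}

set.seed(1)
# Simulated right-skewed "lab values" (log-normal); not real data
lab <- rlnorm(200, meanlog = 3, sdlog = 0.8)

raw_flags <- z_outliers(lab)
log_flags <- z_outliers(log(lab))  # log requires strictly positive values

# Compare how many points each scale flags
c(raw = sum(raw_flags), log = sum(log_flags))
```

On the raw scale, the long right tail inflates both the mean and the standard deviation, so values in the tail tend to be flagged even though they are typical of a skewed distribution. On the log scale the distribution is closer to symmetric, which is the situation the z-score rule is designed for.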

mcarTest

Perform Little's MCAR test when the naniar package is installed. This provides a formal test of missingness rather than a heuristic assessment.
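For reference, Little's test can be run directly with naniar::mcar_test(), as in the sketch below. How checkdata calls it internally is not documented on this page, so treat this only as an illustration of the underlying test. The built-in airquality data set is used because it contains missing values.

```r
# airquality ships with R and has missing values in Ozone and Solar.R
ozone_missing <- sum(is.na(airquality$Ozone))

if (requireNamespace("naniar", quietly = TRUE)) {
  # Returns a one-row tibble with the test statistic, degrees of freedom,
  # p-value, and number of missing-data patterns
  mcar_result <- naniar::mcar_test(airquality)
  print(mcar_result)
} else {
  message("naniar is not installed; skipping Little's MCAR test")
}
```

A small p-value is evidence against the hypothesis that values are missing completely at random. The test needs more than one variable, since it compares variable means across missing-data patterns.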

cvMinMean

Suppress the coefficient of variation (CV) when the absolute value of the mean is below this threshold. Dividing by a near-zero mean makes the CV unstable.
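The guard can be sketched as below. The helper safe_cv is hypothetical, and reporting the CV as a ratio rather than a percentage is an assumption.

```r
# Hypothetical helper: return NA instead of a CV when the mean is too
# close to zero for the ratio sd / |mean| to be meaningful.
safe_cv <- function(x, min_mean = 0.01) {
  m <- mean(x, na.rm = TRUE)
  if (is.na(m) || abs(m) < min_mean) {
    return(NA_real_)
  }
  sd(x, na.rm = TRUE) / abs(m)
}

safe_cv(c(-0.02, 0.01, 0.02, -0.01))  # mean is about 0, so the CV is suppressed (NA)
safe_cv(c(9, 10, 11))                 # sd 1, mean 10, so the CV is 0.1
```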

showSummary

Display a plain-language summary of data quality suitable for copying to reports.

showAbout

Display information about data quality assessment methodology.

showCaveats

Display important limitations and assumptions of the quality assessment.

Value

A results object containing:

results$todo                    an html element
results$qualityText             a preformatted text element
results$missingVals             a table
results$noOutliers              an html element
results$outliers                Shows outliers detected by at least 2 of 3 methods: z-score (|z| > 3), IQR (1.5 × IQR rule), and modified z-score (MAD-based, |z| > 3.5). Points flagged by only one method are not shown.
results$outlierMethodSummary    Summary of each outlier detection method. These are heuristic approaches; consider skewness and sample size when interpreting.
results$distribution            a table
results$duplicates              a table
results$patterns                a table
results$naturalSummary          an html element
results$aboutAnalysis           an html element
results$caveatsAssumptions      an html element
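The "at least 2 of 3 methods" rule described for results$outliers can be sketched in base R as follows. The function consensus_outliers is a hypothetical illustration, not the package's code; details such as the quantile type and the handling of a zero MAD are assumptions.

```r
# Hypothetical helper: flag a point when at least `min_votes` of three
# rules agree that it is an outlier.
consensus_outliers <- function(x, min_votes = 2) {
  # Z-score rule: |z| > 3
  z <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
  by_z <- abs(z) > 3

  # IQR rule: outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR]
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- 1.5 * (q[2] - q[1])
  by_iqr <- x < q[1] - fence | x > q[2] + fence

  # Modified z-score: 0.6745 * (x - median) / MAD, flagged when above 3.5
  # (a MAD of zero is not handled specially in this sketch)
  med <- median(x, na.rm = TRUE)
  raw_mad <- mad(x, center = med, constant = 1, na.rm = TRUE)
  by_mad <- abs(0.6745 * (x - med) / raw_mad) > 3.5

  votes <- by_z + by_iqr + by_mad
  !is.na(votes) & votes >= min_votes
}

values <- c(48, 50, 51, 49, 52, 50, 47, 53, 51, 49, 50, 120)
which(consensus_outliers(values))  # position 12 (the value 120)
```

Requiring agreement between methods reduces false positives from any single rule, at the cost of missing points that only one rule would catch.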

Tables can be converted to data frames with asDF or as.data.frame. For example:

results$missingVals$asDF

as.data.frame(results$missingVals)