Advanced batch effect detection and correction for tabular data analysis. Essential for digital pathology and multi-institutional studies where technical variation can confound biological signals ("garbage in, garbage out").

Usage

batcheffect(
  data,
  features,
  batch_var,
  biological_var,
  perform_pca = TRUE,
  perform_combat = TRUE,
  feature_quality = TRUE,
  redundancy_analysis = TRUE,
  variance_threshold = 0.01,
  correlation_threshold = 0.95,
  outlier_method = "robust",
  outlier_threshold = 3,
  pca_components = 3,
  missing_threshold = 20,
  show_plots = TRUE,
  combat_parametric = TRUE,
  save_corrected = FALSE
)

Arguments

data

The data as a data frame

features

Numeric feature variables to assess for batch effects (e.g., biomarkers, image features)

batch_var

Categorical variable indicating batch/technical groups (e.g., institution, date, instrument)

biological_var

Optional biological grouping variable to preserve during batch correction

perform_pca

Perform Principal Component Analysis to visualize batch effects

perform_combat

Apply ComBat correction to remove batch effects while preserving biological variation

feature_quality

Assess individual feature quality including distribution analysis and outlier detection

redundancy_analysis

Analyze feature redundancy and correlation structure

variance_threshold

Threshold for removing low-variance features (0 = no filtering)

correlation_threshold

Threshold for identifying highly correlated redundant features

outlier_method

Method for outlier detection in feature quality assessment

outlier_threshold

Threshold for outlier detection (standard deviations or IQR multiplier)

pca_components

Number of principal components to compute and visualize

missing_threshold

Maximum percentage of missing values allowed per feature

show_plots

Generate diagnostic plots including PCA plots and quality assessments

combat_parametric

If TRUE, use parametric ComBat (faster); if FALSE, use non-parametric ComBat (more robust)

save_corrected

Add ComBat-corrected features to the dataset
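
To make the threshold arguments more concrete, here is a minimal base R sketch of the kind of screening they describe. It is not the function's internal implementation. The data frame `df`, its column names, the strict `>` / `<` comparisons, and the reading of a "robust" outlier rule as a median/MAD z-score are all assumptions made for illustration.

```r
# Hypothetical data: one near-constant feature (f2), one feature that is
# almost a copy of another (f3 vs f1), and one with many missing values (f4).
set.seed(1)
n <- 100
df <- data.frame(
  f1 = rnorm(n),
  f2 = rnorm(n, sd = 0.001),
  f4 = rnorm(n)
)
df$f3 <- df$f1 + rnorm(n, sd = 0.01)
df$f4[1:30] <- NA  # 30% missing

# Same values as the documented defaults
missing_threshold     <- 20    # percent
variance_threshold    <- 0.01
correlation_threshold <- 0.95
outlier_threshold     <- 3

# missing_threshold: percentage of missing values per feature
pct_missing <- colMeans(is.na(df)) * 100
too_missing <- names(pct_missing)[pct_missing > missing_threshold]

# variance_threshold: flag features whose variance falls below the cutoff
feature_var  <- vapply(df, var, numeric(1), na.rm = TRUE)
low_variance <- names(feature_var)[feature_var < variance_threshold]

# correlation_threshold: list each highly correlated pair once
cor_mat <- cor(df, use = "pairwise.complete.obs")
cor_mat[upper.tri(cor_mat, diag = TRUE)] <- NA
high <- which(abs(cor_mat) > correlation_threshold, arr.ind = TRUE)
redundant_pairs <- data.frame(
  feature_a = rownames(cor_mat)[high[, "row"]],
  feature_b = colnames(cor_mat)[high[, "col"]],
  r         = cor_mat[high]
)

# outlier_threshold: one possible robust rule (assumed here), a z-score
# built from the median and the MAD instead of the mean and SD
x <- c(rnorm(50), 8)  # the last value is a planted outlier
robust_z <- (x - median(x)) / mad(x)
outliers <- which(abs(robust_z) > outlier_threshold)

print(too_missing)
print(low_variance)
print(redundant_pairs)
print(outliers)
```

Setting `variance_threshold = 0` in this sketch flags nothing, which matches the documented "0 = no filtering" behavior.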

Value

A results object containing:

results$instructions      Instructions for batch effect analysis and quality control
results$summary           Overall summary of data quality and batch effect assessment
results$batch_detection   Statistical tests and metrics for batch effect detection
results$feature_quality   Quality metrics for individual features
results$redundancy        Analysis of highly correlated and redundant features
results$combat_results    Results and effectiveness of ComBat batch correction
results$pca_plot          PCA plots showing batch effects before and after correction
results$quality_plot      Distribution and quality assessment plots for features
results$correlation_plot  Correlation matrix heatmap for redundancy analysis
results$interpretation    Clinical context and quality control recommendations

Tables can be converted to data frames with asDF or as.data.frame. For example:

results$summary$asDF

as.data.frame(results$summary)

Details

Implements PCA visualization, ComBat correction, feature quality assessment, and comprehensive quality control metrics for high-dimensional data.
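
As a rough illustration of the PCA step, the sketch below simulates two batches with a constant technical offset and checks whether the first principal component separates them. It uses only base R (`prcomp`, `aov`) and is not taken from the function's source. The simulated data, the site names, and the choice of a one-way ANOVA on PC1 as the batch test are assumptions.

```r
# Simulated data: 2 batches x 50 samples, 6 features.
# site_B receives a constant offset on every feature (the "batch effect").
set.seed(42)
n_per_batch <- 50
batch <- factor(rep(c("site_A", "site_B"), each = n_per_batch))
X <- matrix(rnorm(2 * n_per_batch * 6), ncol = 6)
X[batch == "site_B", ] <- X[batch == "site_B", ] + 2
colnames(X) <- paste0("feature_", 1:6)

# PCA on centered, scaled features; keep 3 components
# (the documented default for pca_components)
pca <- prcomp(X, center = TRUE, scale. = TRUE)
scores <- as.data.frame(pca$x[, 1:3])
scores$batch <- batch

# Proportion of variance explained by each retained component
var_explained <- (pca$sdev^2 / sum(pca$sdev^2))[1:3]

# One-way ANOVA of PC1 scores on batch: a small p-value suggests that
# the dominant axis of variation tracks the batch variable
fit <- summary(aov(PC1 ~ batch, data = scores))
p_value <- fit[[1]][["Pr(>F)"]][1]

print(round(var_explained, 3))
print(p_value)

# In an interactive session, the separation can be inspected visually:
# plot(scores$PC1, scores$PC2, col = as.integer(scores$batch), pch = 19)
```

If the same check is repeated on batch-corrected features, the PC1 association with batch should weaken. That is the before/after comparison the `pca_plot` output is described as showing.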

Examples

data('your_data')

batcheffect(data = your_data,
            features = feature_variables,
            batch_var = batch_id,
            biological_var = treatment_group,
            perform_pca = TRUE,
            perform_combat = TRUE)