Advanced batch effect detection and correction for tabular data analysis. Essential for digital pathology and multi-institutional studies, where technical variation (e.g., differences between institutions, instruments, or processing dates) can confound biological signals ("garbage in, garbage out").
Usage
batcheffect(
    data,
    features,
    batch_var,
    biological_var,
    perform_pca = TRUE,
    perform_combat = TRUE,
    feature_quality = TRUE,
    redundancy_analysis = TRUE,
    variance_threshold = 0.01,
    correlation_threshold = 0.95,
    outlier_method = "robust",
    outlier_threshold = 3,
    pca_components = 3,
    missing_threshold = 20,
    show_plots = TRUE,
    combat_parametric = TRUE,
    save_corrected = FALSE
)
Arguments
- data
the data as a data frame
- features
Numeric feature variables to assess for batch effects (e.g., biomarkers, image features)
- batch_var
Categorical variable indicating batch/technical groups (e.g., institution, date, instrument)
- biological_var
Optional biological grouping variable to preserve during batch correction
- perform_pca
Perform Principal Component Analysis to visualize batch effects
- perform_combat
Apply ComBat correction to remove batch effects while preserving biological variation
- feature_quality
Assess individual feature quality including distribution analysis and outlier detection
- redundancy_analysis
Analyze feature redundancy and correlation structure
- variance_threshold
Threshold for removing low-variance features (0 = no filtering)
- correlation_threshold
Threshold for identifying highly correlated redundant features
- outlier_method
Method for outlier detection in feature quality assessment
- outlier_threshold
Threshold for outlier detection (standard deviations or IQR multiplier)
- pca_components
Number of principal components to compute and visualize
- missing_threshold
Maximum percentage of missing values allowed per feature
- show_plots
Generate diagnostic plots including PCA plots and quality assessments
- combat_parametric
If TRUE, use parametric ComBat (faster); if FALSE, use non-parametric ComBat (slower, but more robust when the parametric prior assumptions do not hold)
- save_corrected
Add ComBat-corrected features to the dataset
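To make `missing_threshold` and `variance_threshold` concrete, here is a minimal base-R sketch of the kind of per-feature filtering they describe. It assumes that `missing_threshold` is a percentage of rows, that the variance is the raw (unscaled) column variance, and that a variance threshold of 0 disables filtering; `filter_features()` is a hypothetical helper, not part of the module, whose own implementation may differ.

```r
# Hypothetical helper (not part of the batcheffect module): flags numeric
# features by percentage of missing values and by raw variance.
filter_features <- function(df, missing_threshold = 20, variance_threshold = 0.01) {
  # Percentage of missing values per column
  pct_missing <- 100 * colMeans(is.na(df))
  # Variance per column, ignoring missing values
  variances <- vapply(df, function(x) stats::var(x, na.rm = TRUE), numeric(1))
  keep <- pct_missing <= missing_threshold
  # Assumption: a variance threshold of 0 disables variance filtering
  if (variance_threshold > 0) {
    keep <- keep & !is.na(variances) & variances > variance_threshold
  }
  data.frame(feature = names(df), pct_missing = pct_missing,
             variance = variances, keep = keep, row.names = NULL)
}

set.seed(1)
example_df <- data.frame(
  a = rnorm(10),                 # ordinary feature
  b = rep(5, 10),                # constant: zero variance
  c = c(rnorm(7), NA, NA, NA)    # 30% missing
)
res <- filter_features(example_df)
print(res)
```

With the default thresholds, only `a` is kept: `b` fails the variance check and `c` exceeds 20% missingness.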
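The redundancy check controlled by `correlation_threshold` can be illustrated with a pairwise correlation scan. This sketch assumes the threshold is applied to the absolute Pearson correlation and that missing values are handled pairwise; `find_redundant_pairs()` is a hypothetical helper, not the module's code.

```r
# Hypothetical helper: list feature pairs whose absolute correlation
# meets or exceeds the threshold (assumed to be |Pearson r|).
find_redundant_pairs <- function(df, correlation_threshold = 0.95) {
  cm <- stats::cor(df, use = "pairwise.complete.obs")
  # Upper triangle only, so each pair is reported once
  idx <- which(abs(cm) >= correlation_threshold & upper.tri(cm), arr.ind = TRUE)
  data.frame(
    feature_1 = rownames(cm)[idx[, "row"]],
    feature_2 = colnames(cm)[idx[, "col"]],
    correlation = cm[idx],
    stringsAsFactors = FALSE
  )
}

set.seed(2)
x <- rnorm(50)
corr_df <- data.frame(
  x = x,
  x_copy = 2 * x + rnorm(50, sd = 0.01),  # near-duplicate of x
  z = rnorm(50)                           # unrelated feature
)
pairs <- find_redundant_pairs(corr_df)
print(pairs)
```

Only the near-duplicate pair `x`/`x_copy` is reported; in practice one feature from each such pair would be a candidate for removal.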
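`outlier_threshold` is described as either a number of standard deviations or an IQR multiplier. The sketch below shows both readings: a robust z-score based on the median and MAD, and Tukey-style IQR fences. The method names `"robust"` and `"iqr"` and the function `flag_outliers()` are illustrative assumptions; the values the module actually accepts for `outlier_method`, other than the default `"robust"`, are not documented here.

```r
# Hypothetical helper: flag outliers in a numeric vector.
flag_outliers <- function(x, method = c("robust", "iqr"), threshold = 3) {
  method <- match.arg(method)
  if (method == "robust") {
    # Robust z-score: distance from the median in MAD units
    # (stats::mad rescales by 1.4826 so it estimates the SD under normality)
    z <- (x - stats::median(x, na.rm = TRUE)) / stats::mad(x, na.rm = TRUE)
    abs(z) > threshold
  } else {
    # Tukey-style fences: threshold is used as the IQR multiplier
    q <- stats::quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
    iqr <- q[2] - q[1]
    x < q[1] - threshold * iqr | x > q[2] + threshold * iqr
  }
}

vals <- c(10, 11, 9, 10, 12, 8, 10, 11, 9, 50)
which(flag_outliers(vals, "robust", 3))
which(flag_outliers(vals, "iqr", 3))
```

Both rules flag only the tenth value. Note that the robust z-score is undefined when more than half of the values are identical, because the MAD is then zero.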
Value
A results object containing:
results$instructions | Instructions for batch effect analysis and quality control
results$summary | Overall summary of data quality and batch effect assessment
results$batch_detection | Statistical tests and metrics for batch effect detection
results$feature_quality | Quality metrics for individual features
results$redundancy | Analysis of highly correlated and redundant features
results$combat_results | Results and effectiveness of ComBat batch correction
results$pca_plot | PCA plots showing batch effects before and after correction
results$quality_plot | Distribution and quality assessment plots for features
results$correlation_plot | Correlation matrix heatmap for redundancy analysis
results$interpretation | Clinical context and quality control recommendations
Tables can be converted to data frames with asDF or as.data.frame. For example:
results$summary$asDF
as.data.frame(results$summary)
Details
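Implements PCA visualization of batch structure, ComBat batch correction, feature quality assessment (missingness, variance, distribution, and outlier checks), and feature redundancy analysis, together providing quality control metrics for high-dimensional data.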
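One common way to quantify what a PCA batch plot shows is to test whether the leading principal component scores differ between batches. This is a minimal sketch, assuming standardized features, complete cases only, and a Kruskal-Wallis test per component; `pca_batch_check()` is hypothetical and the module's own batch detection statistics may differ.

```r
# Hypothetical helper: PCA on standardized features, then a
# Kruskal-Wallis test of each leading PC score against batch.
pca_batch_check <- function(df, batch, n_components = 3) {
  ok <- stats::complete.cases(df) & !is.na(batch)
  # scale. = TRUE fails on constant columns, so filter those first
  pca <- stats::prcomp(df[ok, , drop = FALSE], center = TRUE, scale. = TRUE)
  k <- min(n_components, ncol(pca$x))
  batch <- factor(batch[ok])
  pvals <- vapply(seq_len(k), function(i) {
    stats::kruskal.test(pca$x[, i], batch)$p.value
  }, numeric(1))
  var_explained <- (pca$sdev^2 / sum(pca$sdev^2))[seq_len(k)]
  data.frame(component = paste0("PC", seq_len(k)),
             var_explained = var_explained,
             batch_p_value = pvals)
}

set.seed(3)
n <- 60
site <- rep(c("site_A", "site_B"), each = n / 2)
shift <- ifelse(site == "site_B", 2, 0)  # simulated batch offset
pca_df <- data.frame(f1 = rnorm(n) + shift, f2 = rnorm(n) + shift,
                     f3 = rnorm(n) + shift, f4 = rnorm(n))
res_pca <- pca_batch_check(pca_df, site)
print(res_pca)
```

A small p-value on a high-variance component suggests that batch is a major source of variation in the feature set.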
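ComBat models each feature with an additive and a multiplicative batch effect, shrinks those per-batch estimates toward a common prior using empirical Bayes, and can protect biological covariates during adjustment. The sketch below is a deliberately simplified stand-in, not ComBat: it removes per-batch means and rescales per-batch spreads with no shrinkage and no biological covariate. `adjust_location_scale()` is hypothetical and assumes every batch has at least two observations and no missing batch labels.

```r
# Hypothetical helper: simplified per-batch location/scale adjustment
# (no empirical Bayes shrinkage, no covariates; NOT ComBat itself).
adjust_location_scale <- function(df, batch) {
  batch <- factor(batch)
  as.data.frame(lapply(df, function(x) {
    b_mean <- ave(x, batch, FUN = function(v) mean(v, na.rm = TRUE))
    b_sd <- ave(x, batch, FUN = function(v) stats::sd(v, na.rm = TRUE))
    # Pooled within-batch SD, used as the common target spread
    within_sd <- stats::sd(x - b_mean, na.rm = TRUE)
    # Standardize within batch, then map back to a common mean and spread
    (x - b_mean) / b_sd * within_sd + mean(x, na.rm = TRUE)
  }))
}

set.seed(4)
lab <- rep(c("A", "B", "C"), times = c(20, 25, 15))
offset <- unname(c(A = 0, B = 1.5, C = -1)[lab])
spread <- unname(c(A = 1, B = 2, C = 0.5)[lab])
raw <- data.frame(f1 = rnorm(60) * spread + offset, f2 = rnorm(60) + offset)
adj <- adjust_location_scale(raw, lab)
tapply(adj$f1, lab, mean)
```

After adjustment every batch shares the same mean and standard deviation. If biological groups are unevenly distributed across batches, this naive approach also removes biological signal, which is why ComBat's option to preserve a biological variable (`biological_var` here) matters.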
Examples
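# your_data, feature_variables, batch_id and treatment_group are placeholders:
# substitute your own data frame and its column names.
batcheffect(data = your_data,
    features = feature_variables,
    batch_var = batch_id,
    biological_var = treatment_group,
    perform_pca = TRUE,
    perform_combat = TRUE)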