
Biomarker discovery and validation workflow that combines machine learning with interpretability analysis. It covers data preprocessing, feature selection, biomarker ranking, pathway analysis, and clinical validation metrics. Designed for omics data, with optional documentation outputs intended to support regulatory submission and the clinical translation of candidate biomarkers.

Usage

biomarkerdiscovery(
  data,
  outcome_var,
  biomarker_vars,
  clinical_vars,
  batch_var,
  patient_id,
  time_var,
  event_var,
  discovery_method = "elastic_net",
  outcome_type = "binary",
  data_type = "genomics",
  data_preprocessing = TRUE,
  normalization_method = "z_score",
  batch_correction = FALSE,
  batch_method = "combat",
  filter_low_variance = TRUE,
  variance_threshold = 0.1,
  feature_selection_method = "univariate_stats",
  n_features_select = 50,
  fdr_threshold = 0.05,
  correlation_threshold = 0.9,
  validation_method = "cv_10fold",
  train_proportion = 0.7,
  hyperparameter_tuning = TRUE,
  n_bootstrap_samples = 1000,
  random_seed = 42,
  biomarker_ranking = TRUE,
  stability_analysis = TRUE,
  clinical_performance = TRUE,
  signature_development = TRUE,
  pathway_analysis = FALSE,
  interpretability = TRUE,
  shap_analysis = TRUE,
  lime_analysis = FALSE,
  feature_interaction = TRUE,
  partial_dependence = TRUE,
  biomarker_networks = FALSE,
  cutpoint_optimization = TRUE,
  risk_stratification = TRUE,
  nomogram_development = TRUE,
  decision_curve_analysis = TRUE,
  external_validation = TRUE,
  biomarker_generalizability = TRUE,
  robustness_testing = TRUE,
  quality_control = TRUE,
  outlier_detection = TRUE,
  missing_data_analysis = TRUE,
  detailed_results = TRUE,
  biomarker_report = TRUE,
  export_biomarkers = FALSE,
  save_signature = FALSE,
  regulatory_documentation = TRUE
)

Arguments

data

The data as a data frame

outcome_var

Primary outcome variable for biomarker discovery

biomarker_vars

Potential biomarker variables (genes, proteins, metabolites, etc.)

clinical_vars

Clinical variables to include in the analysis (age, stage, etc.)

batch_var

Batch or study identifier for batch effect correction

patient_id

Patient identifier for tracking

time_var

Time-to-event variable for survival biomarker analysis

event_var

Event indicator for survival biomarker analysis

discovery_method

Method for biomarker discovery and selection

outcome_type

Type of outcome variable

data_type

Type of biomarker data being analyzed

data_preprocessing

Perform data preprocessing and normalization

normalization_method

Method for data normalization

batch_correction

Perform batch effect correction

batch_method

Method for batch effect correction

filter_low_variance

Remove features with low variance

variance_threshold

Minimum variance threshold for feature filtering

feature_selection_method

Method for initial feature selection

n_features_select

Maximum number of features to select for analysis

fdr_threshold

False discovery rate threshold for multiple testing correction

correlation_threshold

Correlation threshold for removing highly correlated features

validation_method

Method for model validation

train_proportion

Proportion of the data used for training (e.g., 0.7 = 70 percent)

hyperparameter_tuning

Perform hyperparameter optimization

n_bootstrap_samples

Number of bootstrap samples for confidence intervals

random_seed

Random seed for reproducibility

biomarker_ranking

Rank biomarkers by importance and clinical relevance

stability_analysis

Assess biomarker selection stability across resampling

clinical_performance

Calculate clinical performance metrics for biomarkers

signature_development

Develop multi-biomarker signatures

pathway_analysis

Perform pathway enrichment analysis for discovered biomarkers

interpretability

Generate interpretability analyses (SHAP and/or LIME; see shap_analysis and lime_analysis)

shap_analysis

Generate SHAP values for biomarker explanation

lime_analysis

Generate LIME explanations for individual predictions

feature_interaction

Analyze interactions between biomarkers

partial_dependence

Generate partial dependence plots for key biomarkers

biomarker_networks

Analyze biomarker co-expression and interaction networks

cutpoint_optimization

Find optimal cutpoints for biomarker classification

risk_stratification

Create risk stratification based on biomarker signatures

nomogram_development

Develop clinical nomogram incorporating biomarkers

decision_curve_analysis

Assess clinical utility using decision curve analysis

external_validation

Prepare biomarkers for external validation

biomarker_generalizability

Assess biomarker generalizability across populations

robustness_testing

Test biomarker robustness to data perturbations

quality_control

Run quality control checks on biomarker data

outlier_detection

Detect and handle outliers in biomarker data

missing_data_analysis

Analyze and handle missing biomarker data

detailed_results

Include detailed analysis results in the output

biomarker_report

Generate a biomarker discovery report

export_biomarkers

Export the list of discovered biomarkers

save_signature

Save the trained biomarker signature model

regulatory_documentation

Include documentation for regulatory submission
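The data_preprocessing, filter_low_variance, variance_threshold and normalization_method = "z_score" options describe a common preprocessing pattern. The base R sketch below illustrates it under stated assumptions. The helper filter_and_scale() is hypothetical and not part of this package. It also assumes variance is measured on the raw values before scaling, because after z-scoring every feature has unit variance and the threshold would no longer discriminate.

```r
# Hypothetical helper (not the package's implementation).
# Assumption: low-variance filtering uses raw, pre-normalization variance;
# the surviving features are then z-scored (mean 0, standard deviation 1).
filter_and_scale <- function(x, variance_threshold = 0.1) {
  x <- as.matrix(x)
  # Column-wise sample variance, ignoring missing values
  v <- apply(x, 2, var, na.rm = TRUE)
  keep <- !is.na(v) & v >= variance_threshold
  # scale() centers each column and divides it by its standard deviation
  scaled <- scale(x[, keep, drop = FALSE])
  list(data = scaled, kept = colnames(x)[keep], dropped = colnames(x)[!keep])
}

set.seed(42)
expr <- data.frame(
  gene1 = rnorm(20, mean = 5, sd = 2),
  gene2 = rnorm(20, mean = 3, sd = 0.01),  # nearly constant feature
  protein1 = rnorm(20, mean = 10, sd = 1)
)
res <- filter_and_scale(expr, variance_threshold = 0.1)
res$kept     # "gene1" "protein1"
res$dropped  # "gene2"
```

Whether the package filters before or after normalization is not stated here. The order shown is the one for which a fixed variance_threshold remains meaningful.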
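With feature_selection_method = "univariate_stats", fdr_threshold and n_features_select, a typical screen tests each feature against the outcome, adjusts the p-values for multiple testing, and keeps at most n_features_select features that pass the FDR threshold. The hypothetical helper below assumes a binary outcome, a Wilcoxon rank-sum test per feature, and Benjamini-Hochberg adjustment via stats::p.adjust(). The package's actual test statistic and adjustment may differ.

```r
# Hypothetical helper (not the package's implementation).
# Per-feature Wilcoxon rank-sum test against a binary outcome, then
# Benjamini-Hochberg adjustment; returns at most n_features_select feature
# names with adjusted p-value (q) <= fdr_threshold, smallest q first.
univariate_fdr_screen <- function(x, y, fdr_threshold = 0.05,
                                  n_features_select = 50) {
  x <- as.matrix(x)
  y <- factor(y)
  stopifnot(nlevels(y) == 2)
  p <- apply(x, 2, function(col) wilcox.test(col ~ y, exact = FALSE)$p.value)
  q <- p.adjust(p, method = "BH")
  ranked <- order(q, p)                        # best features first
  passed <- ranked[q[ranked] <= fdr_threshold]
  head(names(q)[passed], n_features_select)
}

set.seed(1)
n <- 40
y <- rep(c("no", "yes"), each = n / 2)
x <- data.frame(
  gene1 = rnorm(n) + ifelse(y == "yes", 3, 0),  # strongly associated with y
  gene2 = rnorm(n),
  protein1 = rnorm(n)
)
univariate_fdr_screen(x, y)
```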
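The correlation_threshold option removes redundant, highly correlated features. One common way to do this is greedy filtering: walk the features in priority order and drop any feature whose absolute correlation with an already-kept feature exceeds the threshold. The sketch below assumes Pearson correlation, absolute values, and a caller-supplied priority order (for example, a univariate ranking). drop_correlated() is a hypothetical name, not part of this package.

```r
# Hypothetical helper (not the package's implementation).
# Greedy filter: keep a feature only if its |Pearson r| with every feature
# kept so far is <= correlation_threshold. Earlier features in `priority`
# win when two features are redundant.
drop_correlated <- function(x, correlation_threshold = 0.9,
                            priority = colnames(x)) {
  r <- abs(cor(as.matrix(x)[, priority, drop = FALSE],
               use = "pairwise.complete.obs"))
  kept <- character(0)
  for (f in priority) {
    if (length(kept) == 0 || all(r[f, kept] <= correlation_threshold)) {
      kept <- c(kept, f)
    }
  }
  kept
}

set.seed(7)
g1 <- rnorm(50)
x <- data.frame(
  gene1 = g1,
  gene1_probe2 = g1 + rnorm(50, sd = 0.05),  # near-duplicate of gene1
  protein1 = rnorm(50)
)
drop_correlated(x)  # "gene1" "protein1"
```

Because the result depends on which member of a correlated pair is visited first, the priority order matters as much as the threshold itself.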
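n_bootstrap_samples and random_seed control resampling-based confidence intervals. As an illustration only, the sketch below computes a percentile bootstrap interval for the AUC of a single biomarker. It uses the Mann-Whitney identity for the AUC and assumes a 0/1 outcome and stratified resampling (cases and controls resampled separately so every replicate contains both classes). The helpers auc_mw() and bootstrap_auc_ci() are hypothetical.

```r
# Hypothetical helper: AUC via the Mann-Whitney identity,
# P(case score > control score) + 0.5 * P(tie). y is coded 0/1.
auc_mw <- function(score, y) {
  pos <- score[y == 1]
  neg <- score[y == 0]
  diffs <- outer(pos, neg, "-")  # every case minus every control
  mean((diffs > 0) + 0.5 * (diffs == 0))
}

# Hypothetical helper: percentile bootstrap CI with stratified resampling.
bootstrap_auc_ci <- function(score, y, n_bootstrap_samples = 1000,
                             random_seed = 42, level = 0.95) {
  set.seed(random_seed)  # note: resets the global RNG state
  pos_idx <- which(y == 1)
  neg_idx <- which(y == 0)
  aucs <- replicate(n_bootstrap_samples, {
    idx <- c(pos_idx[sample.int(length(pos_idx), replace = TRUE)],
             neg_idx[sample.int(length(neg_idx), replace = TRUE)])
    auc_mw(score[idx], y[idx])
  })
  alpha <- (1 - level) / 2
  c(auc = auc_mw(score, y), quantile(aucs, c(alpha, 1 - alpha)))
}

set.seed(3)
y <- rep(c(0, 1), each = 30)
score <- rnorm(60, mean = ifelse(y == 1, 1, 0))
bootstrap_auc_ci(score, y, n_bootstrap_samples = 500)
```

Fixing the seed inside the function makes repeated calls return identical intervals, which is the reproducibility that random_seed is meant to provide.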

Value

A results object containing:

results$discovery_overview - a table
results$data_summary - a table
results$quality_control_summary - a table
results$outlier_analysis - a table
results$feature_selection_summary - a table
results$selected_biomarkers - a table
results$discovery_performance - a table
results$signature_performance - a table
results$biomarker_ranking - a table
results$stability_analysis_results - a table
results$shap_biomarker_importance - a table
results$biomarker_interactions - a table
results$optimal_cutpoints - a table
results$risk_stratification_results - a table
results$decision_curve_results - a table
results$pathway_enrichment - a table
results$cross_validation_results - a table
results$generalizability_assessment - a table
results$clinical_interpretation - a table
results$regulatory_summary - a table
results$biomarker_importance_plot - an image
results$roc_comparison_plot - an image
results$shap_summary_plot - an image
results$shap_dependence_plot - an image
results$stability_plot - an image
results$biomarker_correlation_plot - an image
results$risk_stratification_plot - an image
results$decision_curve_plot - an image
results$pathway_network_plot - an image
results$biomarker_distribution_plot - an image
results$nomogram_plot - an image

Tables can be converted to data frames with asDF or as.data.frame. For example:

results$discovery_overview$asDF

as.data.frame(results$discovery_overview)
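The cutpoint_optimization option and the results$optimal_cutpoints table do not state which optimality criterion is used. One widely used choice is Youden's J (sensitivity + specificity - 1). The base R sketch below assumes that criterion, a 0/1 outcome, and that higher biomarker values indicate the positive class. youden_cutpoint() is a hypothetical helper, not necessarily the package's method.

```r
# Hypothetical helper (not the package's implementation).
# Evaluates every observed value as a candidate cutpoint and returns the one
# maximizing Youden's J = sensitivity + specificity - 1.
# A sample is called positive when value >= cutpoint; y is coded 0/1.
youden_cutpoint <- function(value, y) {
  cuts <- sort(unique(value))
  tab <- t(sapply(cuts, function(cp) {
    pred <- value >= cp
    sens <- sum(pred & y == 1) / sum(y == 1)
    spec <- sum(!pred & y == 0) / sum(y == 0)
    c(cutpoint = cp, sensitivity = sens, specificity = spec,
      J = sens + spec - 1)
  }))
  tab[which.max(tab[, "J"]), ]  # first maximum if there are ties
}

value <- c(1.2, 2.0, 2.5, 3.1, 3.8, 4.4, 5.6, 6.3, 7.1)
y     <- c(0,   0,   0,   0,   1,   0,   1,   1,   1)
youden_cutpoint(value, y)
# cutpoint 3.8, sensitivity 1.0, specificity 0.8, J 0.8
```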
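decision_curve_analysis and the results$decision_curve_results table concern clinical utility. Decision curve analysis compares a model's net benefit at each threshold probability p_t against "treat all" and "treat none", with net benefit defined as TP/n - FP/n * p_t / (1 - p_t). The sketch below computes those three curves from predicted risks and a 0/1 outcome. net_benefit() is a hypothetical helper, and the package's threshold grid and output columns may differ.

```r
# Hypothetical helper (not the package's implementation).
# Net benefit of treating patients whose predicted risk is >= each threshold
# probability pt: NB = TP/n - FP/n * pt / (1 - pt). y is coded 0/1.
net_benefit <- function(risk, y, thresholds = seq(0.05, 0.5, by = 0.05)) {
  n <- length(y)
  prevalence <- mean(y == 1)
  model <- sapply(thresholds, function(pt) {
    treat <- risk >= pt
    tp <- sum(treat & y == 1)
    fp <- sum(treat & y == 0)
    tp / n - fp / n * pt / (1 - pt)
  })
  data.frame(
    threshold  = thresholds,
    model      = model,
    # Treating everyone: every case is a TP, every non-case an FP
    treat_all  = prevalence - (1 - prevalence) * thresholds / (1 - thresholds),
    treat_none = 0
  )
}

risk <- c(0.10, 0.20, 0.30, 0.60, 0.70, 0.90)
y    <- c(0,    0,    1,    0,    1,    1)
net_benefit(risk, y, thresholds = c(0.25, 0.5))
# model: 0.444, 0.167; treat_all: 0.333, 0.000; treat_none: 0, 0
```

A model is useful at a given threshold only where its net benefit exceeds both reference strategies.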

Examples

# \donttest{
data('biomarker_data')
#> Warning: data set ‘biomarker_data’ not found

biomarkerdiscovery(
    data = biomarker_data,
    outcome_var = "response",
    biomarker_vars = c("gene1", "gene2", "protein1"),
    discovery_method = "elastic_net",
    validation_method = "bootstrap",
    interpretability = TRUE
)
#> Error in biomarkerdiscovery(data = biomarker_data, outcome_var = "response",     biomarker_vars = c("gene1", "gene2", "protein1"), discovery_method = "elastic_net",     validation_method = "bootstrap", interpretability = TRUE): argument "batch_var" is missing, with no default
# }