Biomarker Discovery Platform with ML Interpretability
Source: R/biomarkerdiscovery.h.R
biomarkerdiscovery.Rd
Usage
biomarkerdiscovery(
  data,
  outcome_var,
  biomarker_vars,
  clinical_vars,
  batch_var,
  patient_id,
  time_var,
  event_var,
  discovery_method = "elastic_net",
  outcome_type = "binary",
  data_type = "genomics",
  data_preprocessing = TRUE,
  normalization_method = "z_score",
  batch_correction = FALSE,
  batch_method = "combat",
  filter_low_variance = TRUE,
  variance_threshold = 0.1,
  feature_selection_method = "univariate_stats",
  n_features_select = 50,
  fdr_threshold = 0.05,
  correlation_threshold = 0.9,
  validation_method = "cv_10fold",
  train_proportion = 0.7,
  hyperparameter_tuning = TRUE,
  n_bootstrap_samples = 1000,
  random_seed = 42,
  biomarker_ranking = TRUE,
  stability_analysis = TRUE,
  clinical_performance = TRUE,
  signature_development = TRUE,
  pathway_analysis = FALSE,
  interpretability = TRUE,
  shap_analysis = TRUE,
  lime_analysis = FALSE,
  feature_interaction = TRUE,
  partial_dependence = TRUE,
  biomarker_networks = FALSE,
  cutpoint_optimization = TRUE,
  risk_stratification = TRUE,
  nomogram_development = TRUE,
  decision_curve_analysis = TRUE,
  external_validation = TRUE,
  biomarker_generalizability = TRUE,
  robustness_testing = TRUE,
  quality_control = TRUE,
  outlier_detection = TRUE,
  missing_data_analysis = TRUE,
  detailed_results = TRUE,
  biomarker_report = TRUE,
  export_biomarkers = FALSE,
  save_signature = FALSE,
  regulatory_documentation = TRUE
)
Arguments
- data
- The data as a data frame
- outcome_var
- Primary outcome variable for biomarker discovery 
- biomarker_vars
- Potential biomarker variables (genes, proteins, metabolites, etc.) 
- clinical_vars
- Clinical variables to include in the analysis (age, stage, etc.) 
- batch_var
- Batch or study identifier for batch effect correction 
- patient_id
- Patient identifier for tracking 
- time_var
- Time to event variable for survival biomarker analysis 
- event_var
- Event indicator for survival biomarker analysis 
- discovery_method
- Method for biomarker discovery and selection 
- outcome_type
- Type of outcome variable 
- data_type
- Type of biomarker data being analyzed 
- data_preprocessing
- Perform data preprocessing and normalization 
- normalization_method
- Method for data normalization 
- batch_correction
- Perform batch effect correction 
- batch_method
- Method for batch effect correction 
- filter_low_variance
- Remove features with low variance 
- variance_threshold
- Minimum variance threshold for feature filtering 
- feature_selection_method
- Method for initial feature selection 
- n_features_select
- Maximum number of features to select for analysis 
- fdr_threshold
- False discovery rate threshold for multiple testing correction 
- correlation_threshold
- Correlation threshold for removing highly correlated features 
- validation_method
- Method for model validation 
- train_proportion
- Proportion of data for training (default 0.7, i.e. 70%)
- hyperparameter_tuning
- Perform hyperparameter optimization
- n_bootstrap_samples
- Number of bootstrap samples for confidence intervals
- random_seed
- Random seed for reproducibility
- biomarker_ranking
- Rank biomarkers by importance and clinical relevance
- stability_analysis
- Assess biomarker selection stability across resampling
- clinical_performance
- Calculate clinical performance metrics for biomarkers
- signature_development
- Develop multi-biomarker signatures
- pathway_analysis
- Perform pathway enrichment analysis for discovered biomarkers
- interpretability
- Generate interpretability analysis using SHAP/LIME
- shap_analysis
- Generate SHAP values for biomarker explanation
- lime_analysis
- Generate LIME explanations for individual predictions
- feature_interaction
- Analyze interactions between biomarkers
- partial_dependence
- Generate partial dependence plots for key biomarkers
- biomarker_networks
- Analyze biomarker co-expression and interaction networks
- cutpoint_optimization
- Find optimal cutpoints for biomarker classification
- risk_stratification
- Create risk stratification based on biomarker signatures
- nomogram_development
- Develop clinical nomogram incorporating biomarkers
- decision_curve_analysis
- Assess clinical utility using decision curve analysis
- external_validation
- Prepare biomarkers for external validation
- biomarker_generalizability
- Assess biomarker generalizability across populations
- robustness_testing
- Test biomarker robustness to data perturbations
- quality_control
- Comprehensive quality control for biomarker data
- outlier_detection
- Detect and handle outliers in biomarker data
- missing_data_analysis
- Analyze and handle missing biomarker data
- detailed_results
- Include comprehensive analysis results
- biomarker_report
- Generate comprehensive biomarker discovery report
- export_biomarkers
- Export list of discovered biomarkers
- save_signature
- Save trained biomarker signature model
- regulatory_documentation
- Include documentation for regulatory submission
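Several of the preprocessing and selection arguments (filter_low_variance, variance_threshold, normalization_method = "z_score", correlation_threshold, fdr_threshold) correspond to standard techniques that can be sketched in base R. The block below is only an illustration of those general techniques on hypothetical toy data. It is not the internal implementation of biomarkerdiscovery(), and the order of the steps, the use of t-tests for univariate screening, and the Benjamini-Hochberg adjustment are assumptions made for the example.

```r
# Illustrative sketch only: base-R analogues of several arguments,
# not the internals of biomarkerdiscovery().
set.seed(42)                                   # mirrors random_seed = 42

# Hypothetical toy data: 100 samples, 5 candidate biomarkers
n <- 100
x <- data.frame(
  gene1    = rnorm(n, mean = 5,  sd = 2),
  gene2    = rnorm(n, mean = 3,  sd = 1),
  protein1 = rnorm(n, mean = 10, sd = 3),
  flat     = rnorm(n, mean = 1,  sd = 0.01)    # near-constant feature
)
x$gene1_copy <- x$gene1 + rnorm(n, sd = 0.05)  # almost a duplicate of gene1
y <- rbinom(n, 1, plogis(0.8 * (x$gene1 - 5))) # binary outcome driven by gene1

# 1. Low-variance filter (variance_threshold = 0.1), applied before
#    scaling because every z-scored column has variance 1
vars <- vapply(x, var, numeric(1))
x_f  <- x[, vars >= 0.1, drop = FALSE]

# 2. Z-score normalization: centre each column and scale to unit SD
x_z <- as.data.frame(scale(x_f))

# 3. Correlation filter (correlation_threshold = 0.9): drop the later
#    member of every pair whose absolute correlation exceeds 0.9
cor_mat <- abs(cor(x_z))
cor_mat[lower.tri(cor_mat, diag = TRUE)] <- 0
drop  <- colnames(cor_mat)[apply(cor_mat, 2, function(col) any(col > 0.9))]
x_sel <- x_z[, setdiff(colnames(x_z), drop), drop = FALSE]

# 4. Univariate screening with FDR control (fdr_threshold = 0.05):
#    one t-test per feature, p-values adjusted with Benjamini-Hochberg
p_raw    <- vapply(x_sel, function(v) t.test(v ~ y)$p.value, numeric(1))
p_fdr    <- p.adjust(p_raw, method = "BH")
selected <- names(p_fdr)[p_fdr <= 0.05]

print(data.frame(feature = names(p_fdr), p_raw = p_raw, p_fdr = p_fdr,
                 row.names = NULL))
```

Filtering on variance before scaling matters in a sketch like this: once the features are z-scored they all have unit variance, so a variance threshold applied afterwards would no longer remove anything.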
Value
A results object containing:
| Component | Type |
| --- | --- |
| results$discovery_overview | a table |
| results$data_summary | a table |
| results$quality_control_summary | a table |
| results$outlier_analysis | a table |
| results$feature_selection_summary | a table |
| results$selected_biomarkers | a table |
| results$discovery_performance | a table |
| results$signature_performance | a table |
| results$biomarker_ranking | a table |
| results$stability_analysis_results | a table |
| results$shap_biomarker_importance | a table |
| results$biomarker_interactions | a table |
| results$optimal_cutpoints | a table |
| results$risk_stratification_results | a table |
| results$decision_curve_results | a table |
| results$pathway_enrichment | a table |
| results$cross_validation_results | a table |
| results$generalizability_assessment | a table |
| results$clinical_interpretation | a table |
| results$regulatory_summary | a table |
| results$biomarker_importance_plot | an image |
| results$roc_comparison_plot | an image |
| results$shap_summary_plot | an image |
| results$shap_dependence_plot | an image |
| results$stability_plot | an image |
| results$biomarker_correlation_plot | an image |
| results$risk_stratification_plot | an image |
| results$decision_curve_plot | an image |
| results$pathway_network_plot | an image |
| results$biomarker_distribution_plot | an image |
| results$nomogram_plot | an image |
Tables can be converted to data frames with asDF or as.data.frame. For example:
results$discovery_overview$asDF
as.data.frame(results$discovery_overview)
Comprehensive biomarker discovery and validation platform using machine learning algorithms with interpretability analysis. Includes feature selection, biomarker ranking, pathway analysis, and clinical validation metrics. Designed for omics data analysis with regulatory compliance features for biomarker development and clinical translation.
Examples
data('biomarker_data')
biomarkerdiscovery(
    data = biomarker_data,
    outcome_var = "response",
    biomarker_vars = c("gene1", "gene2", "protein1"),
    discovery_method = "elastic_net",
    validation_method = "bootstrap",
    interpretability = TRUE
)
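If the biomarker_data example dataset is not at hand, a small simulated data frame with the same column names can be used to try the call. The sketch below is a hedged illustration: sim_data, its size, and the outcome labels "responder" and "non_responder" are hypothetical, and the call is only attempted when a function named biomarkerdiscovery is already available in the session. It uses validation_method = "cv_10fold", the default shown in the usage above.

```r
# Hypothetical stand-in for biomarker_data, using the column names
# from the example above (response, gene1, gene2, protein1)
set.seed(42)
n <- 60
sim_data <- data.frame(
  gene1    = rnorm(n),
  gene2    = rnorm(n),
  protein1 = rnorm(n)
)
# Binary outcome loosely driven by gene1 (labels are illustrative)
sim_data$response <- factor(
  ifelse(sim_data$gene1 + rnorm(n) > 0, "responder", "non_responder")
)

# Only run the analysis if the function is available in this session
if (exists("biomarkerdiscovery", mode = "function")) {
  results <- biomarkerdiscovery(
    data = sim_data,
    outcome_var = "response",
    biomarker_vars = c("gene1", "gene2", "protein1"),
    discovery_method = "elastic_net",
    validation_method = "cv_10fold",
    interpretability = TRUE
  )
  # Convert one of the result tables to a data frame for inspection
  overview <- results$discovery_overview$asDF
  print(head(overview))
}
```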