Biomarker Discovery Platform with ML Interpretability
Source: R/biomarkerdiscovery.h.R
Comprehensive biomarker discovery and validation platform that combines machine learning algorithms with interpretability analysis. It includes feature selection, biomarker ranking, pathway analysis, and clinical validation metrics. Designed for omics data analysis, with optional documentation to support regulatory submission during biomarker development and clinical translation.
Usage
biomarkerdiscovery(
data,
outcome_var,
biomarker_vars,
clinical_vars,
batch_var,
patient_id,
time_var,
event_var,
discovery_method = "elastic_net",
outcome_type = "binary",
data_type = "genomics",
data_preprocessing = TRUE,
normalization_method = "z_score",
batch_correction = FALSE,
batch_method = "combat",
filter_low_variance = TRUE,
variance_threshold = 0.1,
feature_selection_method = "univariate_stats",
n_features_select = 50,
fdr_threshold = 0.05,
correlation_threshold = 0.9,
validation_method = "cv_10fold",
train_proportion = 0.7,
hyperparameter_tuning = TRUE,
n_bootstrap_samples = 1000,
random_seed = 42,
biomarker_ranking = TRUE,
stability_analysis = TRUE,
clinical_performance = TRUE,
signature_development = TRUE,
pathway_analysis = FALSE,
interpretability = TRUE,
shap_analysis = TRUE,
lime_analysis = FALSE,
feature_interaction = TRUE,
partial_dependence = TRUE,
biomarker_networks = FALSE,
cutpoint_optimization = TRUE,
risk_stratification = TRUE,
nomogram_development = TRUE,
decision_curve_analysis = TRUE,
external_validation = TRUE,
biomarker_generalizability = TRUE,
robustness_testing = TRUE,
quality_control = TRUE,
outlier_detection = TRUE,
missing_data_analysis = TRUE,
detailed_results = TRUE,
biomarker_report = TRUE,
export_biomarkers = FALSE,
save_signature = FALSE,
regulatory_documentation = TRUE
)
Arguments
- data
The data as a data frame
- outcome_var
Primary outcome variable for biomarker discovery
- biomarker_vars
Potential biomarker variables (genes, proteins, metabolites, etc.)
- clinical_vars
Clinical variables to include in the analysis (age, stage, etc.)
- batch_var
Batch or study identifier for batch effect correction
- patient_id
Patient identifier for tracking
- time_var
Time-to-event variable for survival biomarker analysis
- event_var
Event indicator for survival biomarker analysis
- discovery_method
Method for biomarker discovery and selection
- outcome_type
Type of outcome variable
- data_type
Type of biomarker data being analyzed
- data_preprocessing
Perform data preprocessing and normalization
- normalization_method
Method for data normalization
- batch_correction
Perform batch effect correction
- batch_method
Method for batch effect correction
- filter_low_variance
Remove features with low variance
- variance_threshold
Minimum variance threshold for feature filtering
- feature_selection_method
Method for initial feature selection
- n_features_select
Maximum number of features to select for analysis
- fdr_threshold
False discovery rate threshold for multiple testing correction
- correlation_threshold
Correlation threshold for removing highly correlated features
- validation_method
Method for model validation
- train_proportion
Proportion of the data used for training (e.g. 0.7 for 70 percent)
- hyperparameter_tuning
Perform hyperparameter optimization
- n_bootstrap_samples
Number of bootstrap samples for confidence intervals
- random_seed
Random seed for reproducibility
- biomarker_ranking
Rank biomarkers by importance and clinical relevance
- stability_analysis
Assess biomarker selection stability across resampling
- clinical_performance
Calculate clinical performance metrics for biomarkers
- signature_development
Develop multi-biomarker signatures
- pathway_analysis
Perform pathway enrichment analysis for discovered biomarkers
- interpretability
Generate interpretability analyses using SHAP and/or LIME
- shap_analysis
Generate SHAP values for biomarker explanation
- lime_analysis
Generate LIME explanations for individual predictions
- feature_interaction
Analyze interactions between biomarkers
- partial_dependence
Generate partial dependence plots for key biomarkers
- biomarker_networks
Analyze biomarker co-expression and interaction networks
- cutpoint_optimization
Find optimal cutpoints for biomarker classification
- risk_stratification
Create risk stratification based on biomarker signatures
- nomogram_development
Develop a clinical nomogram incorporating the biomarkers
- decision_curve_analysis
Assess clinical utility using decision curve analysis
- external_validation
Prepare biomarkers for external validation
- biomarker_generalizability
Assess biomarker generalizability across populations
- robustness_testing
Test biomarker robustness to data perturbations
- quality_control
Comprehensive quality control for biomarker data
- outlier_detection
Detect and handle outliers in biomarker data
- missing_data_analysis
Analyze and handle missing biomarker data
- detailed_results
Include comprehensive analysis results
- biomarker_report
Generate a comprehensive biomarker discovery report
- export_biomarkers
Export list of discovered biomarkers
- save_signature
Save the trained biomarker signature model
- regulatory_documentation
Include documentation for regulatory submission
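The preprocessing options (filter_low_variance with variance_threshold, correlation_threshold, and normalization_method = "z_score") can be illustrated with a minimal base-R sketch. The helper preprocess_features() is hypothetical, and the order of the steps (variance filter on the raw scale, then a greedy correlation filter, then z-scoring) is an assumption, not necessarily what biomarkerdiscovery() does internally.

```r
# Hypothetical helper, for illustration only; not biomarkerdiscovery()'s internals.
preprocess_features <- function(x, variance_threshold = 0.1,
                                correlation_threshold = 0.9) {
  x <- as.matrix(x)
  # 1. Drop low-variance features on the raw scale
  #    (after z-scoring every feature would have variance 1).
  v <- apply(x, 2, var, na.rm = TRUE)
  x <- x[, v >= variance_threshold, drop = FALSE]
  # 2. Greedy correlation filter: walk the columns in order and drop any
  #    feature whose |r| with an already-kept feature exceeds the threshold.
  if (ncol(x) > 1) {
    r <- abs(cor(x, use = "pairwise.complete.obs"))
    keep <- rep(TRUE, ncol(x))
    for (j in 2:ncol(x)) {
      earlier <- which(keep[seq_len(j - 1)])
      if (any(r[j, earlier] > correlation_threshold)) keep[j] <- FALSE
    }
    x <- x[, keep, drop = FALSE]
  }
  # 3. z-score normalization: center each feature and scale it to unit SD.
  scale(x)
}

# Usage on simulated data: g2 nearly duplicates g1, g4 is almost constant.
set.seed(1)
g1 <- rnorm(50)
features <- data.frame(g1 = g1, g2 = g1 + rnorm(50, sd = 0.01),
                       g3 = rnorm(50), g4 = rnorm(50, sd = 0.01))
out <- preprocess_features(features)
colnames(out)  # "g1" "g3"
```

Filtering on variance before scaling is the main design point: applied after z-scoring, a variance threshold would keep every feature.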
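For feature_selection_method = "univariate_stats" with fdr_threshold, one common approach is a per-feature two-group test followed by Benjamini-Hochberg adjustment. The sketch below assumes a binary outcome and uses Welch's t-test. Both the test choice and the helper univariate_screen() are assumptions for illustration.

```r
# Hypothetical helper: Welch t-test per feature + Benjamini-Hochberg FDR.
univariate_screen <- function(x, y, fdr_threshold = 0.05,
                              n_features_select = 50) {
  x <- as.matrix(x)
  y <- factor(y)
  stopifnot(nlevels(y) == 2)                    # binary outcome assumed
  p <- apply(x, 2, function(f) t.test(f ~ y)$p.value)
  q <- p.adjust(p, method = "BH")               # FDR-adjusted p-values
  res <- data.frame(feature = colnames(x), p_value = p, q_value = q,
                    row.names = NULL)
  res <- res[order(res$q_value), ]
  # Keep features passing the FDR threshold, capped at n_features_select.
  head(res[res$q_value <= fdr_threshold, ], n_features_select)
}

set.seed(2)
y <- rep(c("no", "yes"), each = 50)
x <- matrix(rnorm(100 * 20), nrow = 100,
            dimnames = list(NULL, paste0("gene", 1:20)))
x[y == "yes", 1:3] <- x[y == "yes", 1:3] + 2   # three true signals
selected <- univariate_screen(x, y)
selected$feature
```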
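The validation settings (validation_method, train_proportion, random_seed) imply seeded resampling of the data. As a sketch under stated assumptions, the hypothetical helpers below create a hold-out split and 10-fold assignments in base R. Stratifying by outcome class is a design choice made here, not documented behavior of biomarkerdiscovery().

```r
# Hypothetical helpers for seeded, outcome-stratified resampling.
split_train_test <- function(y, train_proportion = 0.7, random_seed = 42) {
  set.seed(random_seed)
  by_class <- split(seq_along(y), y)            # row indices per class
  train <- lapply(by_class, function(i)
    i[sample.int(length(i), round(train_proportion * length(i)))])
  sort(unlist(train, use.names = FALSE))        # training row indices
}

make_folds <- function(y, k = 10, random_seed = 42) {
  set.seed(random_seed)
  folds <- integer(length(y))
  for (cls in unique(y)) {
    i <- which(y == cls)
    labels <- rep_len(seq_len(k), length(i))    # balanced fold labels
    folds[i] <- labels[sample.int(length(labels))]
  }
  folds                                         # fold id (1..k) per row
}

y <- rep(c("no", "yes"), times = c(60, 40))
train <- split_train_test(y)
test <- setdiff(seq_along(y), train)
table(y[train])   # no: 42, yes: 28
folds <- make_folds(y)
table(folds)      # 10 rows in each of the 10 folds
```

Because both helpers call set.seed() with random_seed, rerunning them with the same seed reproduces the same split and folds.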
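n_bootstrap_samples sets how many resamples feed the confidence intervals. A minimal sketch, assuming a percentile bootstrap of the AUC, with the AUC computed from the Mann-Whitney statistic. The interval type and the helpers auc() and bootstrap_auc_ci() are illustrative assumptions.

```r
# AUC via the Mann-Whitney U statistic; y is 0/1 with 1 = case.
auc <- function(score, y) {
  pos <- score[y == 1]
  neg <- score[y == 0]
  r <- rank(c(pos, neg))                        # average ranks handle ties
  n1 <- length(pos)
  (sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2) / (n1 * length(neg))
}

# Percentile bootstrap confidence interval for the AUC.
bootstrap_auc_ci <- function(score, y, n_bootstrap_samples = 1000,
                             random_seed = 42, level = 0.95) {
  set.seed(random_seed)
  boot <- replicate(n_bootstrap_samples, {
    i <- sample.int(length(y), replace = TRUE)
    # a resample containing only one class has no defined AUC
    if (length(unique(y[i])) < 2) NA_real_ else auc(score[i], y[i])
  })
  alpha <- (1 - level) / 2
  c(auc = auc(score, y),
    quantile(boot, c(alpha, 1 - alpha), na.rm = TRUE))
}

set.seed(3)
y <- rep(0:1, each = 50)
score <- rnorm(100, mean = y)                   # cases score higher on average
bootstrap_auc_ci(score, y)
```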
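stability_analysis assesses how consistently biomarkers are selected across resampling. One simple measure is the fraction of bootstrap resamples in which each feature is selected. The sketch assumes that "selected" means "among the top-k features by absolute Welch t statistic". The helper selection_frequency() and this top-k rule are hypothetical.

```r
# Hypothetical helper: bootstrap selection frequency of each feature.
selection_frequency <- function(x, y, n_resamples = 100, top_k = 5,
                                random_seed = 42) {
  set.seed(random_seed)
  x <- as.matrix(x)
  y <- factor(y)
  counts <- setNames(numeric(ncol(x)), colnames(x))
  for (b in seq_len(n_resamples)) {
    i <- sample.int(nrow(x), replace = TRUE)
    yb <- y[i]
    tstat <- apply(x[i, , drop = FALSE], 2,
                   function(f) abs(unname(t.test(f ~ yb)$statistic)))
    top <- names(sort(tstat, decreasing = TRUE))[seq_len(top_k)]
    counts[top] <- counts[top] + 1
  }
  sort(counts / n_resamples, decreasing = TRUE)  # 1 = selected every time
}

set.seed(2)
y <- rep(c("no", "yes"), each = 50)
x <- matrix(rnorm(100 * 20), nrow = 100,
            dimnames = list(NULL, paste0("gene", 1:20)))
x[y == "yes", 1:3] <- x[y == "yes", 1:3] + 2   # three true signals
stability <- selection_frequency(x, y)
head(stability)
```

Features with frequencies near 1 are selected regardless of which patients are sampled. Noise features that enter the top k only occasionally show low frequencies.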
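shap_analysis and partial_dependence are model-agnostic in general. For a logistic model they have simple forms that are easy to sketch in base R. Assuming feature independence (interventional SHAP) and a model that is linear on the log-odds scale, the SHAP value of feature j for observation i is beta_j * (x_ij - mean(x_j)). A partial dependence curve averages the predicted probability while one feature is held at each grid value. The helper names and simulated data are hypothetical, and biomarkerdiscovery() may use other models and explainers.

```r
# Simulated data with two informative biomarkers and one noise feature.
set.seed(4)
n <- 200
df <- data.frame(gene1 = rnorm(n), gene2 = rnorm(n), protein1 = rnorm(n))
df$response <- rbinom(n, 1, plogis(1.5 * df$gene1 - df$gene2))
fit <- glm(response ~ gene1 + gene2 + protein1, data = df, family = binomial)

# Linear SHAP on the log-odds scale: phi_ij = beta_j * (x_ij - mean(x_j)).
linear_shap <- function(fit, data) {
  b <- coef(fit)[-1]                                # drop the intercept
  X <- as.matrix(data[names(b)])
  sweep(sweep(X, 2, colMeans(X)), 2, b, "*")
}
phi <- linear_shap(fit, df)
importance <- sort(colMeans(abs(phi)), decreasing = TRUE)  # mean |SHAP|

# Partial dependence: average predicted probability with one feature fixed.
partial_dependence <- function(fit, data, feature, n_grid = 10) {
  grid <- seq(min(data[[feature]]), max(data[[feature]]), length.out = n_grid)
  pd <- sapply(grid, function(v) {
    d <- data
    d[[feature]] <- v
    mean(predict(fit, newdata = d, type = "response"))
  })
  data.frame(value = grid, mean_prediction = pd)
}
pd_gene1 <- partial_dependence(fit, df, "gene1")
```

The SHAP values for each patient sum to that patient's log-odds minus the average log-odds (the "local accuracy" property). This is why mean |SHAP| is a reasonable global importance ranking.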
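cutpoint_optimization searches for a classification threshold on a biomarker. The criterion is not named in this documentation. One common choice, assumed here, is Youden's J = sensitivity + specificity - 1, maximized over the observed values. The helper youden_cutpoint() is hypothetical and treats higher scores as indicating the case class.

```r
# Hypothetical helper: exhaustive search for the Youden-optimal cutpoint.
# Assumes y is 0/1 (1 = case) and that higher scores indicate a case.
youden_cutpoint <- function(score, y) {
  cuts <- sort(unique(score))
  sens <- sapply(cuts, function(cp) mean(score[y == 1] >= cp))
  spec <- sapply(cuts, function(cp) mean(score[y == 0] < cp))
  j <- sens + spec - 1
  best <- which.max(j)                           # first cut with maximal J
  c(cutpoint = cuts[best], sensitivity = sens[best],
    specificity = spec[best], youden_j = j[best])
}

youden_cutpoint(score = c(1, 2, 3, 4, 5, 6), y = c(0, 0, 0, 1, 1, 1))
# cutpoint 4, sensitivity 1, specificity 1, youden_j 1
```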
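risk_stratification groups patients by a signature score. The grouping rule used by biomarkerdiscovery() is not documented here. This minimal sketch assumes tertile-based Low/Intermediate/High groups. stratify_risk() is a hypothetical helper.

```r
# Hypothetical helper: split a continuous risk score into quantile-based groups.
stratify_risk <- function(risk_score, probs = c(1/3, 2/3),
                          labels = c("Low", "Intermediate", "High")) {
  breaks <- quantile(risk_score, probs = c(0, probs, 1), na.rm = TRUE)
  # cut() requires distinct breaks, so heavily tied scores may need another rule.
  cut(risk_score, breaks = breaks, labels = labels, include.lowest = TRUE)
}

groups <- stratify_risk(1:9)
table(groups)  # Low 3, Intermediate 3, High 3
```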
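decision_curve_analysis compares a model's net benefit with the treat-all and treat-none strategies across threshold probabilities p_t. The standard net-benefit formula is NB = TP/n - (FP/n) * p_t / (1 - p_t), and the sketch below implements it directly. net_benefit() is a hypothetical helper, and the default threshold grid is an arbitrary choice.

```r
# Net benefit of treating patients whose predicted risk is >= each threshold.
net_benefit <- function(prob, y, thresholds = seq(0.05, 0.5, by = 0.05)) {
  n <- length(y)
  prevalence <- mean(y == 1)
  model <- sapply(thresholds, function(pt) {
    treat <- prob >= pt
    tp <- sum(treat & y == 1)
    fp <- sum(treat & y == 0)
    tp / n - fp / n * pt / (1 - pt)
  })
  data.frame(threshold = thresholds,
             model = model,
             treat_all = prevalence - (1 - prevalence) * thresholds / (1 - thresholds),
             treat_none = 0)
}

# A perfectly discriminating model reaches the maximum possible net benefit,
# which equals the prevalence (0.5 here) at every threshold.
net_benefit(prob = c(0, 0, 1, 1), y = c(0, 0, 1, 1), thresholds = c(0.1, 0.2))
```

The p_t / (1 - p_t) factor expresses how many false positives a clinician would accept per true positive at that threshold. A model adds clinical utility only where its curve lies above both reference strategies.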
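The methods behind quality_control, outlier_detection and missing_data_analysis are not specified here. As one possible per-feature summary, the sketch reports the percentage of missing values and flags outliers with the robust (median/MAD) z-score and the commonly used 3.5 cutoff. Both the method and the helper qc_summary() are assumptions.

```r
# Hypothetical per-feature QC summary: missingness and robust-z outliers.
qc_summary <- function(x, z_cut = 3.5) {
  x <- as.matrix(x)
  robust_z <- function(f) {
    # mad() is scaled by 1.4826 by default, so this equals the
    # "modified z-score" 0.6745 * (x - median) / MAD.
    (f - median(f, na.rm = TRUE)) / mad(f, na.rm = TRUE)
  }
  z <- apply(x, 2, robust_z)
  data.frame(feature = colnames(x),
             pct_missing = 100 * colMeans(is.na(x)),
             n_outliers = colSums(abs(z) > z_cut, na.rm = TRUE),
             row.names = NULL)
}

qc_summary(cbind(a = c(1, 2, 3, 4, 100), b = c(1, NA, 2, 3, 4)))
# a: 0% missing, 1 outlier (the value 100); b: 20% missing, 0 outliers
```

The median and MAD are used instead of the mean and SD because a single extreme value inflates the SD and can hide itself.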
Value
A results object containing:
results$discovery_overview | a table
results$data_summary | a table
results$quality_control_summary | a table
results$outlier_analysis | a table
results$feature_selection_summary | a table
results$selected_biomarkers | a table
results$discovery_performance | a table
results$signature_performance | a table
results$biomarker_ranking | a table
results$stability_analysis_results | a table
results$shap_biomarker_importance | a table
results$biomarker_interactions | a table
results$optimal_cutpoints | a table
results$risk_stratification_results | a table
results$decision_curve_results | a table
results$pathway_enrichment | a table
results$cross_validation_results | a table
results$generalizability_assessment | a table
results$clinical_interpretation | a table
results$regulatory_summary | a table
results$biomarker_importance_plot | an image
results$roc_comparison_plot | an image
results$shap_summary_plot | an image
results$shap_dependence_plot | an image
results$stability_plot | an image
results$biomarker_correlation_plot | an image
results$risk_stratification_plot | an image
results$decision_curve_plot | an image
results$pathway_network_plot | an image
results$biomarker_distribution_plot | an image
results$nomogram_plot | an image
Tables can be converted to data frames with asDF or as.data.frame. For example:
results$discovery_overview$asDF
as.data.frame(results$discovery_overview)
Examples
# \donttest{
# Simulated example data (data('biomarker_data') is not available)
set.seed(42)
biomarker_data <- data.frame(
  response = factor(rep(c("no", "yes"), each = 50)),
  gene1 = rnorm(100),
  gene2 = rnorm(100),
  protein1 = rnorm(100)
)
# The variable arguments below have no defaults and must be supplied;
# passing NULL for unused ones is assumed to be accepted.
biomarkerdiscovery(
  data = biomarker_data,
  outcome_var = "response",
  biomarker_vars = c("gene1", "gene2", "protein1"),
  clinical_vars = NULL,
  batch_var = NULL,
  patient_id = NULL,
  time_var = NULL,
  event_var = NULL,
  discovery_method = "elastic_net",
  validation_method = "bootstrap",
  interpretability = TRUE
)
# }