
Develop and validate clinical prediction models using machine learning algorithms with interpretability analysis. Includes comprehensive model comparison, feature selection, cross-validation, and explainable AI through SHAP and LIME methods. Designed for clinical research applications with regulatory compliance features for model validation and documentation.

Usage

clinicalprediction(
  data,
  outcome_var,
  predictor_vars,
  patient_id,
  time_var,
  event_var,
  stratify_var,
  model_type = "random_forest",
  problem_type = "classification",
  outcome_type = "binary",
  feature_selection = TRUE,
  selection_method = "recursive_fe",
  feature_engineering = TRUE,
  handle_missing = "mice_imputation",
  train_proportion = 0.7,
  validation_method = "cv_10fold",
  hyperparameter_tuning = TRUE,
  tuning_method = "random_search",
  random_seed = 42,
  interpretability = TRUE,
  shap_analysis = TRUE,
  lime_analysis = FALSE,
  permutation_importance = TRUE,
  partial_dependence = TRUE,
  individual_explanations = FALSE,
  n_explanations = 10,
  performance_metrics = TRUE,
  calibration_analysis = TRUE,
  clinical_metrics = TRUE,
  roc_analysis = TRUE,
  threshold_optimization = TRUE,
  compare_models = FALSE,
  baseline_models = TRUE,
  ensemble_models = FALSE,
  risk_stratification = TRUE,
  n_risk_groups = 3,
  nomogram = TRUE,
  decision_curve = TRUE,
  external_validation = TRUE,
  bootstrap_ci = 1000,
  stability_analysis = TRUE,
  bias_analysis = TRUE,
  detailed_output = TRUE,
  clinical_report = TRUE,
  save_model = FALSE,
  export_predictions = FALSE,
  regulatory_documentation = TRUE
)

Arguments

data

the data as a data frame

outcome_var

Target variable for prediction (binary, continuous, or survival)

predictor_vars

Variables to use as predictors in the model

patient_id

Patient identifier for tracking predictions

time_var

Time to event variable for survival prediction models

event_var

Event indicator for survival prediction models

stratify_var

Variable for stratified sampling and validation

model_type

Type of machine learning model to fit

problem_type

Type of prediction problem

outcome_type

Specific type of outcome variable

feature_selection

Perform automated feature selection

selection_method

Method for automatic feature selection

feature_engineering

Perform automated feature engineering

handle_missing

Method for handling missing data

train_proportion

Proportion of the data used for training, given as a fraction between 0 and 1 (e.g., 0.7 for 70 percent); the remainder is held out for testing

validation_method

Method for model validation

hyperparameter_tuning

Perform automated hyperparameter optimization

tuning_method

Method for hyperparameter tuning

random_seed

Random seed for reproducibility

interpretability

Generate model interpretability analysis

shap_analysis

Generate SHAP (SHapley Additive exPlanations) values

lime_analysis

Generate LIME (Local Interpretable Model-agnostic Explanations)

permutation_importance

Calculate permutation feature importance

partial_dependence

Generate partial dependence plots for key features

individual_explanations

Explain individual predictions using SHAP/LIME

n_explanations

Number of individual predictions to explain in detail

performance_metrics

Calculate comprehensive performance metrics

calibration_analysis

Assess model calibration

clinical_metrics

Calculate clinical decision analysis metrics

roc_analysis

Perform ROC curve analysis with confidence intervals

threshold_optimization

Optimize prediction threshold for clinical use

compare_models

Compare multiple model types

baseline_models

Include simple baseline models for comparison

ensemble_models

Create ensemble of best performing models

risk_stratification

Create risk stratification groups

n_risk_groups

Number of risk stratification groups (e.g., low/medium/high)

nomogram

Create clinical nomogram for risk calculation

decision_curve

Perform decision curve analysis for clinical utility

external_validation

Prepare model for external validation

bootstrap_ci

Number of bootstrap samples for confidence intervals

stability_analysis

Assess model stability across different samples

bias_analysis

Analyze model bias across demographic groups

detailed_output

Include detailed model diagnostics and explanations

clinical_report

Generate clinical interpretation report

save_model

Save trained model for future predictions

export_predictions

Export individual predictions with probabilities

regulatory_documentation

Include documentation for regulatory submission
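Several of the arguments above (train_proportion, random_seed, stratify_var, bootstrap_ci) name standard resampling steps. The following is a minimal base R sketch of what such steps typically involve, using simulated data. It is not the package's internal implementation: the data frame `dat`, its columns, and the choice of the test-set event rate as the bootstrapped statistic are assumptions made purely for illustration.

```r
# Illustrative sketch only: base R on simulated data, not the package's code.
set.seed(42)                                  # plays the role of random_seed = 42

n <- 200
dat <- data.frame(
  age       = rnorm(n, mean = 60, sd = 10),   # hypothetical predictor
  mortality = rbinom(n, size = 1, prob = 0.3) # hypothetical binary outcome
)

# Stratified split: draw 70% of the rows within each outcome level, so the
# event rate is roughly preserved (train_proportion = 0.7, with the outcome
# standing in for stratify_var).
train_proportion <- 0.7
idx_by_level <- split(seq_len(n), dat$mortality)
train_idx <- unlist(lapply(idx_by_level, function(i) {
  i[sample.int(length(i), size = round(train_proportion * length(i)))]
}))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Percentile bootstrap: resample the held-out outcomes with replacement
# n_boot times and take the 2.5% and 97.5% quantiles of the statistic
# (n_boot plays the role of bootstrap_ci = 1000).
n_boot <- 1000
boot_rates <- replicate(n_boot, mean(sample(test$mortality, replace = TRUE)))
ci <- quantile(boot_rates, probs = c(0.025, 0.975))

print(c(train = nrow(train), test = nrow(test)))
print(ci)
```

Fixing the seed before the split makes both the partition and the bootstrap draws reproducible, which is the usual motivation for exposing a seed argument.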

Value

A results object containing:

results$overview                     a table
results$dataset_info                 a table
results$feature_selection_results    a table
results$feature_engineering_summary  a table
results$performance_summary          a table
results$classification_metrics       a table
results$survival_metrics             a table
results$feature_importance           a table
results$shap_summary                 a table
results$individual_explanations      a table
results$model_comparison             a table
results$risk_stratification          a table
results$decision_curve_analysis      a table
results$cross_validation_results     a table
results$stability_analysis           a table
results$bias_fairness_analysis       a table
results$hyperparameter_results       a table
results$clinical_interpretation      a table
results$regulatory_summary           a table
results$roc_plot                     an image
results$calibration_plot             an image
results$feature_importance_plot      an image
results$shap_summary_plot            an image
results$shap_dependence_plot         an image
results$decision_curve_plot          an image
results$risk_distribution_plot       an image
results$model_comparison_plot        an image
results$stability_plot               an image
results$nomogram_plot                an image

Tables can be converted to data frames with asDF or as.data.frame. For example:

results$overview$asDF

as.data.frame(results$overview)

Examples

# \donttest{
data('clinical_data')
#> Warning: data set ‘clinical_data’ not found

clinicalprediction(
    data = clinical_data,
    outcome_var = "mortality",
    predictor_vars = c("age", "biomarker", "stage"),
    model_type = "random_forest",
    interpretability = TRUE,
    validation_method = "cv_10fold"
)
#> Error in clinicalprediction(data = clinical_data, outcome_var = "mortality",     predictor_vars = c("age", "biomarker", "stage"), model_type = "random_forest",     interpretability = TRUE, validation_method = "cv_10fold"): argument "event_var" is missing, with no default
# }
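
The warning above indicates that no dataset named clinical_data is available in this session. To experiment with the call, a stand-in data frame with the same column names can be simulated, as sketched below. Every value is random and hypothetical; the distributions, the stage labels, and the outcome coding are assumptions, not properties of any real dataset. The error above also reports that event_var has no default, so the call may need that argument supplied as well; what to pass for a non-survival outcome is not documented here.

```r
# Hypothetical stand-in for the missing `clinical_data`; all values are simulated.
set.seed(42)
n <- 150
clinical_data <- data.frame(
  age       = round(rnorm(n, mean = 62, sd = 11)),             # years (assumed)
  biomarker = rlnorm(n, meanlog = 1, sdlog = 0.5),             # arbitrary units
  stage     = factor(sample(c("I", "II", "III"), n, replace = TRUE)),
  mortality = factor(rbinom(n, size = 1, prob = 0.25),
                     levels = c(0, 1), labels = c("alive", "dead"))
)

# Inspect the structure before passing it as the `data` argument.
str(clinical_data)
```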