Clinical Prediction Model Builder
Comprehensive Guide to the modelbuilder Function
ClinicoPath
2025-07-13
Source: vignettes/jjstatsplot-38-modelbuilder-comprehensive.Rmd
Overview
The modelbuilder
function in ClinicoPath provides
comprehensive clinical prediction model development with advanced
validation and seamless integration with Decision Curve Analysis. This
function is specifically designed for medical research applications
where robust prediction models are essential for clinical
decision-making.
Key Features
- Multiple Model Types: Basic clinical, enhanced clinical, biomarker, and custom models
- Advanced Validation: Cross-validation, bootstrap validation, and optimism correction
- Comprehensive Metrics: AUC, calibration, NRI, IDI, and net benefit calculations
- Missing Data Handling: Complete cases, mean imputation, and multiple imputation
- DCA Integration: Seamless workflow to Decision Curve Analysis
- Clinical Risk Scores: Automatic generation of clinical risk scoring systems
When to Use Prediction Model Builder
Prediction modeling is essential when:
- Risk Stratification: Identify patients at high risk for adverse outcomes
- Clinical Decision Support: Provide evidence-based decision aids
- Biomarker Validation: Assess the clinical utility of new biomarkers
- Regulatory Submissions: Develop models for FDA/EMA approval
- Personalized Medicine: Create individualized treatment recommendations
Understanding Prediction Models
Types of Prediction Models
The modelbuilder function supports four types of prediction models:
1. Basic Clinical Models
- Purpose: Foundation models using core demographic and primary risk factors
- Variables: Age, sex, primary risk factors (diabetes, hypertension)
- Use Case: Initial risk assessment, screening tools
- Advantages: Simple, widely applicable, easy to implement
2. Enhanced Clinical Models
- Purpose: Extended models with additional clinical variables
- Variables: Basic predictors plus smoking, cholesterol, BMI, blood pressure
- Use Case: Comprehensive risk assessment, clinical guidelines
- Advantages: Better discrimination, clinical relevance
3. Biomarker Models
- Purpose: Models that add laboratory biomarkers to established clinical predictors
- Variables: Clinical predictors plus markers such as troponin and creatinine
- Use Case: Biomarker validation, incremental value assessment
- Advantages: Improved discrimination, quantifies a marker's added value
4. Custom Models
- Purpose: Fully user-specified variable combinations
- Variables: Any predictors, optionally with interactions and transformations
- Use Case: Specialized research questions, personalized medicine applications
- Advantages: Maximum flexibility
Model Development Process
The modelbuilder function follows established clinical prediction modeling guidelines:
1. Data Preparation: Validation, missing data handling, variable transformation
2. Model Fitting: Logistic regression with convergence monitoring
3. Variable Selection: Stepwise selection with AIC/BIC criteria
4. Performance Assessment: Discrimination, calibration, clinical utility
5. Validation: Cross-validation, bootstrap validation, external validation
6. Clinical Implementation: Risk score generation, DCA integration
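The fitting and selection steps above can be sketched in base R. This is a minimal illustration of the general workflow, not the package's internal code; the data and variable names below are simulated/hypothetical.

```r
# Simulated toy data (hypothetical names)
set.seed(42)
risk_data <- data.frame(
  age = rnorm(300, 60, 10),
  sex = factor(sample(c("Male", "Female"), 300, replace = TRUE)),
  diabetes = rbinom(300, 1, 0.2)
)
lp <- -6 + 0.08 * risk_data$age + 0.9 * risk_data$diabetes
risk_data$cardiovascular_event <- rbinom(300, 1, plogis(lp))

# Model fitting: logistic regression with convergence monitoring
fit <- glm(cardiovascular_event ~ age + sex + diabetes,
           data = risk_data, family = binomial())
stopifnot(fit$converged)

# Variable selection: stepwise with AIC (use k = log(n) for BIC)
fit_step <- step(fit, direction = "both", trace = 0)
summary(fit_step)
```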
Basic Usage
Simple Clinical Model
# Load example data
data(modelbuilder_test_data)
# Build basic clinical model
result <- modelbuilder(
data = modelbuilder_test_data,
outcome = "cardiovascular_event",
outcomePositive = "Yes",
basicPredictors = c("age", "sex", "diabetes", "hypertension"),
buildBasicModel = TRUE,
showModelSummary = TRUE,
showPerformanceMetrics = TRUE
)
Enhanced Clinical Model
# Build enhanced clinical model
result <- modelbuilder(
data = modelbuilder_test_data,
outcome = "cardiovascular_event",
outcomePositive = "Yes",
enhancedPredictors = c("age", "sex", "diabetes", "hypertension",
"smoking", "cholesterol", "bmi", "systolic_bp"),
buildEnhancedModel = TRUE,
splitData = TRUE,
crossValidation = TRUE,
showROCCurves = TRUE
)
Biomarker Model with Validation
# Build biomarker model with comprehensive validation
result <- modelbuilder(
data = modelbuilder_test_data,
outcome = "cardiovascular_event",
outcomePositive = "Yes",
biomarkerPredictors = c("age", "sex", "diabetes", "hypertension",
"smoking", "cholesterol", "troponin", "creatinine"),
buildBiomarkerModel = TRUE,
splitData = TRUE,
crossValidation = TRUE,
bootstrapValidation = TRUE,
missingDataMethod = "multiple_imputation",
showCalibrationPlots = TRUE,
compareModels = TRUE
)
Advanced Configuration
Missing Data Handling
# Complete cases analysis
result_complete <- modelbuilder(
data = modelbuilder_test_data,
outcome = "cardiovascular_event",
outcomePositive = "Yes",
basicPredictors = c("age", "sex", "diabetes", "troponin"),
buildBasicModel = TRUE,
missingDataMethod = "complete_cases"
)
# Multiple imputation
result_imputed <- modelbuilder(
data = modelbuilder_test_data,
outcome = "cardiovascular_event",
outcomePositive = "Yes",
basicPredictors = c("age", "sex", "diabetes", "troponin"),
buildBasicModel = TRUE,
missingDataMethod = "multiple_imputation",
imputationSets = 10
)
# Variable exclusion for high missing rates
result_excluded <- modelbuilder(
data = modelbuilder_test_data,
outcome = "cardiovascular_event",
outcomePositive = "Yes",
basicPredictors = c("age", "sex", "diabetes", "troponin"),
buildBasicModel = TRUE,
missingDataMethod = "exclude_missing"
)
Variable Transformations and Interactions
# Include variable transformations
result_transformed <- modelbuilder(
data = modelbuilder_test_data,
outcome = "cardiovascular_event",
outcomePositive = "Yes",
basicPredictors = c("age", "cholesterol", "troponin"),
buildBasicModel = TRUE,
transformVariables = TRUE,
transformMethod = "log"
)
# Include interaction terms
result_interactions <- modelbuilder(
data = modelbuilder_test_data,
outcome = "cardiovascular_event",
outcomePositive = "Yes",
basicPredictors = c("age", "sex", "diabetes", "smoking"),
buildBasicModel = TRUE,
includeInteractions = TRUE,
interactionTerms = "age*sex, diabetes*smoking"
)
Stepwise Variable Selection
# Stepwise selection with AIC
result_stepwise_aic <- modelbuilder(
data = modelbuilder_test_data,
outcome = "cardiovascular_event",
outcomePositive = "Yes",
enhancedPredictors = c("age", "sex", "diabetes", "hypertension",
"smoking", "cholesterol", "bmi", "systolic_bp"),
buildEnhancedModel = TRUE,
useStepwise = TRUE,
stepwiseDirection = "both",
selectionCriterion = "aic"
)
# Stepwise selection with BIC
result_stepwise_bic <- modelbuilder(
data = modelbuilder_test_data,
outcome = "cardiovascular_event",
outcomePositive = "Yes",
enhancedPredictors = c("age", "sex", "diabetes", "hypertension",
"smoking", "cholesterol", "bmi", "systolic_bp"),
buildEnhancedModel = TRUE,
useStepwise = TRUE,
stepwiseDirection = "both",
selectionCriterion = "bic"
)
Validation Methods
# Cross-validation
result_cv <- modelbuilder(
data = modelbuilder_test_data,
outcome = "cardiovascular_event",
outcomePositive = "Yes",
basicPredictors = c("age", "sex", "diabetes", "hypertension"),
buildBasicModel = TRUE,
crossValidation = TRUE,
cvFolds = 10
)
# Bootstrap validation
result_bootstrap <- modelbuilder(
data = modelbuilder_test_data,
outcome = "cardiovascular_event",
outcomePositive = "Yes",
basicPredictors = c("age", "sex", "diabetes", "hypertension"),
buildBasicModel = TRUE,
bootstrapValidation = TRUE,
bootstrapReps = 1000
)
# Combined validation
result_combined <- modelbuilder(
data = modelbuilder_test_data,
outcome = "cardiovascular_event",
outcomePositive = "Yes",
basicPredictors = c("age", "sex", "diabetes", "hypertension"),
buildBasicModel = TRUE,
crossValidation = TRUE,
bootstrapValidation = TRUE,
cvFolds = 5,
bootstrapReps = 500
)
Clinical Research Applications
Application 1: Cardiovascular Risk Prediction
# Cardiovascular risk assessment model
cv_risk_model <- modelbuilder(
data = modelbuilder_test_data,
outcome = "cardiovascular_event",
outcomePositive = "Yes",
# Build multiple models for comparison
buildBasicModel = TRUE,
basicPredictors = c("age", "sex", "diabetes", "hypertension"),
buildEnhancedModel = TRUE,
enhancedPredictors = c("age", "sex", "diabetes", "hypertension",
"smoking", "cholesterol", "bmi", "family_history"),
buildBiomarkerModel = TRUE,
biomarkerPredictors = c("age", "sex", "diabetes", "hypertension",
"smoking", "cholesterol", "troponin", "creatinine"),
# Configuration
splitData = TRUE,
crossValidation = TRUE,
missingDataMethod = "multiple_imputation",
# Outputs
compareModels = TRUE,
showROCCurves = TRUE,
showCalibrationPlots = TRUE,
generateRiskScore = TRUE,
exportForDCA = TRUE
)
Clinical Interpretation:
- Basic model provides foundation risk assessment
- Enhanced model adds lifestyle and laboratory factors
- Biomarker model incorporates cardiac-specific markers
- Model comparison guides clinical implementation
Application 2: Biomarker Validation Study
# Biomarker validation for new cardiac marker
biomarker_study <- modelbuilder(
data = biomarker_validation_data,
outcome = "major_adverse_cardiac_event",
outcomePositive = "Yes",
# Reference clinical model
buildBasicModel = TRUE,
basicPredictors = c("age", "sex", "diabetes", "hypertension", "chest_pain"),
basicModelName = "clinical_model",
# Biomarker-enhanced model
buildBiomarkerModel = TRUE,
biomarkerPredictors = c("age", "sex", "diabetes", "hypertension",
"chest_pain", "new_biomarker"),
biomarkerModelName = "biomarker_model",
# Validation
splitData = TRUE,
crossValidation = TRUE,
calculateNRI = TRUE,
calculateIDI = TRUE,
# Outputs
compareModels = TRUE,
showPerformanceMetrics = TRUE,
exportForDCA = TRUE
)
Application 3: Personalized Treatment Selection
# Treatment selection model
treatment_model <- modelbuilder(
data = treatment_response_data,
outcome = "treatment_response",
outcomePositive = "Responder",
# Personalized model
buildCustomModel = TRUE,
customPredictors = c("age", "sex", "disease_severity", "biomarker_profile",
"genetic_score", "comorbidity_index"),
customModelName = "personalized_treatment",
# Advanced options
includeInteractions = TRUE,
interactionTerms = "age*biomarker_profile, disease_severity*genetic_score",
transformVariables = TRUE,
useStepwise = TRUE,
# Validation
splitData = TRUE,
crossValidation = TRUE,
bootstrapValidation = TRUE,
# Clinical utility
generateRiskScore = TRUE,
exportForDCA = TRUE
)
Application 4: Multi-center Model Development
# Multi-center prediction model
multicenter_model <- modelbuilder(
data = multicenter_study_data,
outcome = "clinical_outcome",
outcomePositive = "Yes",
# Enhanced model with site adjustments
buildEnhancedModel = TRUE,
enhancedPredictors = c("age", "sex", "diabetes", "hypertension",
"smoking", "cholesterol", "site"),
# Robust validation
splitData = TRUE,
crossValidation = TRUE,
cvFolds = 10,
bootstrapValidation = TRUE,
# Missing data handling
missingDataMethod = "multiple_imputation",
imputationSets = 20,
# Comprehensive outputs
compareModels = TRUE,
showModelSummary = TRUE,
showPerformanceMetrics = TRUE,
showCalibrationPlots = TRUE,
exportForDCA = TRUE
)
Interpreting Results
Model Performance Metrics
Discrimination Metrics:
- AUC (Area Under the Curve): Measures the model’s ability to distinguish between outcomes
  - AUC > 0.8: Excellent discrimination
  - AUC 0.7-0.8: Good discrimination
  - AUC 0.6-0.7: Fair discrimination
  - AUC < 0.6: Poor discrimination
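As a sketch of where AUC comes from, it can be computed without extra packages using the Mann-Whitney rank identity; the `pred` and `y` vectors below are toy illustrations.

```r
# AUC via the rank (Mann-Whitney) identity: proportion of
# event/non-event pairs where the event has the higher prediction
auc_rank <- function(pred, y) {
  r <- rank(pred)
  n_pos <- sum(y == 1)
  n_neg <- sum(y == 0)
  (sum(r[y == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

y    <- c(0, 0, 1, 1, 0, 1)
pred <- c(0.1, 0.3, 0.7, 0.9, 0.4, 0.2)
auc_rank(pred, y)
# → 0.7777778 (7 of 9 event/non-event pairs correctly ordered)
```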
Calibration Metrics:
- Calibration Slope: Measures agreement between predicted and observed probabilities
  - Slope = 1.0: Perfect calibration
  - Slope < 1.0: Overfitting (predictions too extreme)
  - Slope > 1.0: Underfitting (predictions too conservative)
- Calibration Intercept: Measures systematic over- or under-prediction
  - Intercept = 0: No systematic bias
  - Intercept > 0: Systematic under-prediction
  - Intercept < 0: Systematic over-prediction
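One standard way to estimate these quantities is to regress the observed outcome on the logit of the predicted probabilities; this is a generic sketch with simulated, well-calibrated data, not the package's internal implementation.

```r
set.seed(1)
p <- runif(500, 0.05, 0.95)   # hypothetical predicted risks
y <- rbinom(500, 1, p)        # outcomes drawn from p, so calibration is ideal

lp <- qlogis(p)               # logit of predictions

# Calibration slope: coefficient on logit(p); 1.0 is ideal
slope_fit <- glm(y ~ lp, family = binomial())
cal_slope <- unname(coef(slope_fit)[2])

# Calibration intercept: fit with slope fixed at 1 via an offset; 0 is ideal
int_fit <- glm(y ~ offset(lp), family = binomial())
cal_intercept <- unname(coef(int_fit)[1])

round(c(slope = cal_slope, intercept = cal_intercept), 2)
# should be near 1 and 0 for well-calibrated predictions
```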
Overall Performance:
- Brier Score: Measures overall prediction accuracy (lower is better)
- Hosmer-Lemeshow Test: Tests goodness of fit (p > 0.05 indicates adequate fit)
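Both overall-performance checks are simple to compute by hand. Below is a sketch of the Brier score and a Hosmer-Lemeshow-style decile table (observed vs expected events per risk decile), using simulated data.

```r
set.seed(2)
p <- runif(400)          # hypothetical predicted risks
y <- rbinom(400, 1, p)   # simulated outcomes

# Brier score: mean squared error of the predicted probabilities
brier <- mean((p - y)^2)
brier

# Hosmer-Lemeshow-style grouping: compare observed and expected
# event counts within deciles of predicted risk
decile <- cut(p, quantile(p, probs = seq(0, 1, 0.1)), include.lowest = TRUE)
data.frame(observed = tapply(y, decile, sum),
           expected = tapply(p, decile, sum))
```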
Model Comparison
# Compare multiple models
comparison_result <- modelbuilder(
data = modelbuilder_test_data,
outcome = "cardiovascular_event",
outcomePositive = "Yes",
# Build all model types
buildBasicModel = TRUE,
basicPredictors = c("age", "sex", "diabetes", "hypertension"),
buildEnhancedModel = TRUE,
enhancedPredictors = c("age", "sex", "diabetes", "hypertension",
"smoking", "cholesterol", "bmi"),
buildBiomarkerModel = TRUE,
biomarkerPredictors = c("age", "sex", "diabetes", "hypertension",
"smoking", "cholesterol", "troponin"),
# Comparison settings
compareModels = TRUE,
showPerformanceMetrics = TRUE,
calculateNRI = TRUE,
calculateIDI = TRUE
)
Interpretation Guidelines:
- Higher AUC indicates better discrimination
- A calibration slope closer to 1.0 indicates better calibration
- A lower Brier score indicates better overall performance
- A positive NRI indicates net improvement in reclassification
- A positive IDI indicates improvement in discrimination
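The IDI has a simple closed form: the gain in mean predicted risk among events minus the gain among non-events when moving from the old to the new model. A small deterministic sketch (toy vectors):

```r
# Integrated Discrimination Improvement (IDI)
idi <- function(p_old, p_new, y) {
  (mean(p_new[y == 1]) - mean(p_old[y == 1])) -
  (mean(p_new[y == 0]) - mean(p_old[y == 0]))
}

# New model pushes events up and non-events down relative to the old model
idi(p_old = c(0.2, 0.2, 0.2, 0.2),
    p_new = c(0.1, 0.1, 0.3, 0.3),
    y     = c(0, 0, 1, 1))
# → 0.2
```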
Cross-Validation Results
Cross-Validation Metrics:
- CV AUC: Average AUC across folds
- CV AUC (SD): Standard deviation of AUC across folds
- Optimism: Difference between training and CV performance
- Optimism-Corrected AUC: Training AUC minus optimism
Interpretation:
- CV AUC provides an unbiased estimate of future performance
- A small SD indicates stable performance across folds
- Large optimism indicates overfitting
- Optimism-corrected estimates are more realistic
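The optimism arithmetic can be illustrated with a plain k-fold loop in base R. This is a sketch on simulated single-predictor data; note that in this simple scheme the corrected AUC works out to the mean CV AUC.

```r
# AUC via the rank identity (see discrimination section)
auc_rank <- function(pred, y) {
  r <- rank(pred)
  n_pos <- sum(y == 1); n_neg <- sum(y == 0)
  (sum(r[y == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

set.seed(4)
n <- 400
d <- data.frame(x = rnorm(n))
d$y <- rbinom(n, 1, plogis(d$x))

# 5-fold cross-validated AUC
folds <- sample(rep(1:5, length.out = n))
cv_auc <- sapply(1:5, function(k) {
  fit <- glm(y ~ x, data = d[folds != k, ], family = binomial())
  auc_rank(predict(fit, d[folds == k, ], type = "response"), d$y[folds == k])
})

train_auc <- auc_rank(fitted(glm(y ~ x, data = d, family = binomial())), d$y)
optimism  <- train_auc - mean(cv_auc)
corrected <- train_auc - optimism   # equals mean(cv_auc) here
c(train = train_auc, cv = mean(cv_auc), optimism = optimism)
```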
Best Practices
1. Study Design and Data Quality
# Ensure adequate sample size
n_events <- sum(clinical_data$outcome == "Yes")
n_predictors <- length(predictor_variables)
epv <- n_events / n_predictors
# Rule of thumb: EPV should be ≥ 10
if (epv < 10) {
warning("Events per variable (EPV) is low. Consider reducing predictors or increasing sample size.")
}
# Check data quality
summary(clinical_data)
2. Variable Selection
# Clinical relevance first
clinical_predictors <- c("age", "sex", "diabetes", "hypertension")
# Add variables based on clinical knowledge
enhanced_predictors <- c(clinical_predictors, "smoking", "cholesterol", "family_history")
# Consider biomarkers last
biomarker_predictors <- c(enhanced_predictors, "troponin", "creatinine", "bnp")
# Use stepwise selection judiciously
stepwise_model <- modelbuilder(
data = clinical_data,
outcome = "cardiovascular_event",
outcomePositive = "Yes",
enhancedPredictors = biomarker_predictors,
buildEnhancedModel = TRUE,
useStepwise = TRUE,
selectionCriterion = "bic" # BIC is more conservative than AIC
)
3. Validation Strategy
# Comprehensive validation approach
validation_model <- modelbuilder(
data = clinical_data,
outcome = "cardiovascular_event",
outcomePositive = "Yes",
basicPredictors = c("age", "sex", "diabetes", "hypertension"),
buildBasicModel = TRUE,
# Multiple validation methods
splitData = TRUE, # Internal validation
crossValidation = TRUE, # Cross-validation
bootstrapValidation = TRUE, # Bootstrap validation
cvFolds = 10,
bootstrapReps = 1000,
# Performance assessment
showPerformanceMetrics = TRUE,
showCalibrationPlots = TRUE,
compareModels = TRUE
)
4. Missing Data Strategy
# Assess missing data patterns (requires dplyr and tidyr)
library(dplyr)
library(tidyr)
missing_summary <- clinical_data %>%
summarise(across(everything(), ~ sum(is.na(.x)))) %>%
pivot_longer(everything(), names_to = "variable", values_to = "missing_count") %>%
mutate(missing_percent = missing_count / nrow(clinical_data) * 100) %>%
arrange(desc(missing_percent))
# Choose appropriate method
if (max(missing_summary$missing_percent) < 5) {
# Low missing data: complete cases
method <- "complete_cases"
} else if (max(missing_summary$missing_percent) < 20) {
# Moderate missing data: multiple imputation
method <- "multiple_imputation"
} else {
# High missing data: exclude variables
method <- "exclude_missing"
}
model_missing <- modelbuilder(
data = clinical_data,
outcome = "cardiovascular_event",
outcomePositive = "Yes",
basicPredictors = c("age", "sex", "diabetes", "hypertension"),
buildBasicModel = TRUE,
missingDataMethod = method
)
Clinical Risk Scores
Generating Risk Scores
# Generate clinical risk score
risk_score_model <- modelbuilder(
data = modelbuilder_test_data,
outcome = "cardiovascular_event",
outcomePositive = "Yes",
basicPredictors = c("age", "sex", "diabetes", "hypertension", "smoking"),
buildBasicModel = TRUE,
# Risk score generation
generateRiskScore = TRUE,
riskScorePoints = "framingham" # or "simple" or "deciles"
)
Risk Score Implementation
Framingham-Style Points:
- Age-based scaling
- Integer point assignments
- Clinically interpretable ranges
Simple Integer Weights:
- Coefficient-based scaling
- Easy calculation
- Suitable for electronic systems
Risk Decile Scoring:
- Percentile-based categories
- Population-specific ranges
- Suitable for quality metrics
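The simple integer-weights scheme can be sketched by scaling each logistic regression coefficient to the smallest absolute coefficient and rounding. The coefficients below are hypothetical, purely to show the arithmetic.

```r
# Hypothetical logistic regression coefficients (log-odds per unit)
coefs <- c(age_per10 = 0.35, male = 0.50, diabetes = 0.70, smoking = 0.90)

# Scale to the smallest absolute coefficient and round to integer points
points <- round(coefs / min(abs(coefs)))
points
# → 1 1 2 3 (age_per10, male, diabetes, smoking)

# A patient's total score is the sum of points for factors present,
# e.g. a male smoker with diabetes: 1 + 2 + 3 = 6 points
```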
Clinical Risk Score Example
# Example: Cardiovascular Risk Score
# Age: 65 years = 2 points
# Sex: Male = 3 points
# Diabetes: Yes = 4 points
# Hypertension: Yes = 2 points
# Smoking: Current = 5 points
# Total Score: 16 points
# Risk interpretation:
# 0-5 points: Low risk (<5%)
# 6-10 points: Moderate risk (5-15%)
# 11-15 points: High risk (15-30%)
# >15 points: Very high risk (>30%)
Integration with Decision Curve Analysis
DCA Preparation
# Build models for DCA
dca_models <- modelbuilder(
data = modelbuilder_test_data,
outcome = "cardiovascular_event",
outcomePositive = "Yes",
# Multiple models for comparison
buildBasicModel = TRUE,
basicPredictors = c("age", "sex", "diabetes", "hypertension"),
basicModelName = "clinical_model",
buildBiomarkerModel = TRUE,
biomarkerPredictors = c("age", "sex", "diabetes", "hypertension", "troponin"),
biomarkerModelName = "biomarker_model",
# DCA preparation
createPredictions = TRUE,
exportForDCA = TRUE
)
DCA Workflow
# After model building, use DCA module
# 1. Models create prediction columns automatically
# 2. Columns named: [model_name]_prob
# 3. Ready for direct use in DCA
# Example DCA setup:
# Outcome: cardiovascular_event
# Positive outcome: Yes
# Prediction models: clinical_model_prob, biomarker_model_prob
# Threshold range: 5% to 50%
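The quantity DCA plots has a closed form at each threshold probability t: net benefit = TP/n - FP/n * t/(1 - t). A minimal sketch with toy vectors (the DCA module computes this across the full threshold range):

```r
# Net benefit at threshold t: reward true positives, penalize false
# positives by the odds of the threshold
net_benefit <- function(pred, y, t) {
  treat <- pred >= t
  tp <- sum(treat & y == 1)
  fp <- sum(treat & y == 0)
  n  <- length(y)
  tp / n - fp / n * t / (1 - t)
}

y    <- c(1, 1, 0, 0, 0)
pred <- c(0.8, 0.4, 0.6, 0.2, 0.1)
net_benefit(pred, y, t = 0.3)
# → ~0.314 (2/5 true positives minus 1/5 false positives weighted by 0.3/0.7)
```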
Troubleshooting
Common Issues and Solutions
Issue 1: Model Non-Convergence
Problem: Model fitting fails or produces warnings
Solutions:
# Increase maximum iterations
result <- modelbuilder(
data = clinical_data,
outcome = "cardiovascular_event",
outcomePositive = "Yes",
basicPredictors = c("age", "sex", "diabetes"),
buildBasicModel = TRUE
# Convergence is handled automatically in the enhanced version
)
# Check for perfect separation
separation_check <- table(clinical_data$perfect_predictor, clinical_data$cardiovascular_event)
print(separation_check)
Issue 2: Perfect Separation
Problem: One predictor perfectly predicts outcome
Solutions:
# The enhanced modelbuilder automatically detects and warns about separation
# Remove or combine categories of problematic variables
# Use penalized regression (if available)
Issue 3: Low Event Rate
Problem: Too few events for stable modeling
Solutions:
# Check events per variable
n_events <- sum(clinical_data$outcome == "Yes")
n_predictors <- length(predictor_variables)
epv <- n_events / n_predictors
if (epv < 10) {
# Reduce number of predictors
essential_predictors <- c("age", "sex", "primary_risk_factor")
# Or combine outcome categories
# Or collect more data
}
Issue 4: High Missing Data
Problem: Substantial missing data in key variables
Solutions:
# Use multiple imputation
result <- modelbuilder(
data = clinical_data,
outcome = "cardiovascular_event",
outcomePositive = "Yes",
basicPredictors = c("age", "sex", "diabetes", "biomarker"),
buildBasicModel = TRUE,
missingDataMethod = "multiple_imputation",
imputationSets = 20
)
# Or exclude high-missing variables
result <- modelbuilder(
data = clinical_data,
outcome = "cardiovascular_event",
outcomePositive = "Yes",
basicPredictors = c("age", "sex", "diabetes", "biomarker"),
buildBasicModel = TRUE,
missingDataMethod = "exclude_missing"
)
Reporting Results
Statistical Reporting Guidelines
When reporting prediction model results, include:
- Model Development:
  - Sample size and event rate
  - Variable selection method
  - Missing data handling
  - Model fitting procedure
- Performance Metrics:
  - Discrimination (AUC with 95% CI)
  - Calibration (slope and intercept)
  - Overall performance (Brier score)
  - Classification metrics at optimal threshold
- Validation Results:
  - Cross-validation performance
  - Bootstrap validation (if performed)
  - Optimism-corrected estimates
  - External validation (if available)
- Clinical Utility:
  - Decision curve analysis results
  - Net benefit across thresholds
  - Clinical impact assessment
Example Report Template
# Generate comprehensive model report
report_model <- modelbuilder(
data = modelbuilder_test_data,
outcome = "cardiovascular_event",
outcomePositive = "Yes",
# Model development
buildBasicModel = TRUE,
basicPredictors = c("age", "sex", "diabetes", "hypertension"),
# Validation
splitData = TRUE,
crossValidation = TRUE,
bootstrapValidation = TRUE,
# Comprehensive outputs
showModelSummary = TRUE,
showPerformanceMetrics = TRUE,
showCalibrationPlots = TRUE,
compareModels = TRUE,
generateRiskScore = TRUE,
exportForDCA = TRUE
)
Sample Report Text: “We developed a clinical prediction model for cardiovascular events using logistic regression. The model included age, sex, diabetes, and hypertension as predictors. The dataset was split into training (70%, n=420) and validation (30%, n=180) sets. The model demonstrated good discrimination (AUC 0.74, 95% CI 0.69-0.79) and adequate calibration (slope 0.96, intercept 0.03). Five-fold cross-validation yielded an optimism-corrected AUC of 0.72. Decision curve analysis showed clinical utility across the 10-30% threshold range.”
Advanced Topics
Custom Prediction Models
# Complex custom model with interactions
custom_model <- modelbuilder(
data = clinical_data,
outcome = "cardiovascular_event",
outcomePositive = "Yes",
# Custom variable combination
buildCustomModel = TRUE,
customPredictors = c("age", "sex", "diabetes", "cholesterol",
"family_history", "exercise", "diet_score"),
# Advanced features
includeInteractions = TRUE,
interactionTerms = "age*sex, diabetes*cholesterol, family_history*exercise",
transformVariables = TRUE,
transformMethod = "polynomial",
# Variable selection
useStepwise = TRUE,
stepwiseDirection = "both",
selectionCriterion = "bic"
)
Model Updating and Recalibration
# Original model
original_model <- modelbuilder(
data = original_dataset,
outcome = "cardiovascular_event",
outcomePositive = "Yes",
basicPredictors = c("age", "sex", "diabetes", "hypertension"),
buildBasicModel = TRUE
)
# Updated model with new data
updated_model <- modelbuilder(
data = updated_dataset,
outcome = "cardiovascular_event",
outcomePositive = "Yes",
basicPredictors = c("age", "sex", "diabetes", "hypertension"),
buildBasicModel = TRUE,
# Compare to original
compareModels = TRUE
)
Multi-outcome Models
# Separate models for different outcomes
primary_outcome <- modelbuilder(
data = clinical_data,
outcome = "primary_endpoint",
outcomePositive = "Yes",
basicPredictors = c("age", "sex", "diabetes", "hypertension"),
buildBasicModel = TRUE,
basicModelName = "primary_model"
)
secondary_outcome <- modelbuilder(
data = clinical_data,
outcome = "secondary_endpoint",
outcomePositive = "Yes",
basicPredictors = c("age", "sex", "diabetes", "hypertension"),
buildBasicModel = TRUE,
basicModelName = "secondary_model"
)
Quality Assurance
Validation Steps
# 1. Data quality assessment
summary(clinical_data)
# 2. Model development
model_result <- modelbuilder(
data = clinical_data,
outcome = "cardiovascular_event",
outcomePositive = "Yes",
basicPredictors = c("age", "sex", "diabetes", "hypertension"),
buildBasicModel = TRUE,
showModelSummary = TRUE
)
# 3. Performance evaluation
performance_check <- modelbuilder(
data = clinical_data,
outcome = "cardiovascular_event",
outcomePositive = "Yes",
basicPredictors = c("age", "sex", "diabetes", "hypertension"),
buildBasicModel = TRUE,
showPerformanceMetrics = TRUE,
showCalibrationPlots = TRUE
)
# 4. Validation
validation_check <- modelbuilder(
data = clinical_data,
outcome = "cardiovascular_event",
outcomePositive = "Yes",
basicPredictors = c("age", "sex", "diabetes", "hypertension"),
buildBasicModel = TRUE,
crossValidation = TRUE,
bootstrapValidation = TRUE
)
# 5. Clinical utility
utility_check <- modelbuilder(
data = clinical_data,
outcome = "cardiovascular_event",
outcomePositive = "Yes",
basicPredictors = c("age", "sex", "diabetes", "hypertension"),
buildBasicModel = TRUE,
exportForDCA = TRUE
)
Conclusion
The modelbuilder
function provides a comprehensive
toolkit for clinical prediction model development. By combining robust
statistical methods, advanced validation techniques, and seamless
integration with decision analysis, it enables researchers to:
- Develop high-quality prediction models
- Validate model performance rigorously
- Assess clinical utility effectively
- Implement models in clinical practice
Key Takeaways
- Start with clinical knowledge when selecting predictors
- Use appropriate validation methods for your dataset size
- Handle missing data carefully with appropriate methods
- Assess both discrimination and calibration performance
- Evaluate clinical utility with decision curve analysis
- Report methods and results transparently for reproducibility
Further Reading
- Steyerberg, E.W. (2019). Clinical Prediction Models. Springer.
- Harrell, F.E. (2015). Regression Modeling Strategies. Springer.
- Collins, G.S., et al. (2015). Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD). BMJ, 350, g7594.
For more information and updates, visit the ClinicoPath documentation.
This vignette was generated using ClinicoPath version 0.0.3.56 on 2025-07-13.