A dataset for testing and developing predictive models, particularly for cardiovascular events. It contains patient demographics, clinical risk factors, lab values, and an outcome variable.
Usage
data(modelbuilder_test_data)
Format
A data frame with 600 rows and 16 variables:
- patient_id
Character. Unique patient identifier.
- hospital
Character. Hospital or study center identifier.
- age
Numeric. Patient's age in years.
- sex
Character. Patient's sex (e.g., "Male", "Female").
- diabetes
Character. Diabetes status (e.g., "Yes", "No").
- hypertension
Character. Hypertension status (e.g., "Yes", "No").
- smoking
Character. Smoking status (e.g., "Never", "Current", "Former").
- cholesterol
Numeric. Total cholesterol level.
- bmi
Numeric. Body Mass Index.
- systolic_bp
Numeric. Systolic blood pressure.
- family_history
Character. Family history of cardiovascular disease (e.g., "Yes", "No").
- troponin
Numeric. Cardiac troponin level. Contains missing values (NA).
- creatinine
Numeric. Serum creatinine level.
- cardiovascular_event
Factor with levels "No" and "Yes". Outcome variable indicating whether a cardiovascular event occurred. Contains missing values (NA).
- true_risk
Numeric. A simulated true underlying risk score for the patient. Contains missing values (NA).
- risk_category
Factor with 4 levels ("Low", "Moderate", "High", "Very High"). A pre-calculated risk category based on certain criteria. Contains missing values (NA).
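Several predictors are stored as character vectors, so they usually need converting to factors before modelling. The following is a minimal sketch of one way to do that, not a step the package itself prescribes. It uses a small stand-in data frame (the hypothetical name `toy`) whose values are copied from the first six rows printed in the Examples section, so it runs without the package installed. Choosing "Never" as the reference level for `smoking` is an assumption made for illustration.

```r
# Stand-in for modelbuilder_test_data: four columns from the first six rows
# shown by head() in the Examples section. `toy` is a hypothetical name.
toy <- data.frame(
  age      = c(81, 58, 69, 73, 70, 64),
  sex      = c("Male", "Male", "Female", "Male", "Female", "Female"),
  diabetes = c("Yes", "No", "No", "Yes", "No", "No"),
  smoking  = c("Never", "Current", "Never", "Former", "Never", "Former"),
  stringsAsFactors = FALSE
)

# Convert every character column to a factor; numeric columns are untouched.
is_chr <- vapply(toy, is.character, logical(1))
toy[is_chr] <- lapply(toy[is_chr], factor)

# Assumption: use "Never" as the baseline smoking category.
toy$smoking <- relevel(toy$smoking, ref = "Never")

smoking_levels <- levels(toy$smoking)
column_classes <- vapply(toy, function(x) class(x)[1], character(1))
```

The same two steps should apply unchanged to the full data frame after `data(modelbuilder_test_data)`, replacing `toy` with `modelbuilder_test_data`.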
Examples
data(modelbuilder_test_data)
str(modelbuilder_test_data)
#> 'data.frame': 600 obs. of 16 variables:
#> $ patient_id : chr "PT0001" "PT0002" "PT0003" "PT0004" ...
#> $ hospital : chr "University Medical Center" "General Hospital" "Community Hospital" "Community Hospital" ...
#> $ age : num 81 58 69 73 70 64 83 64 85 64 ...
#> $ sex : chr "Male" "Male" "Female" "Male" ...
#> $ diabetes : chr "Yes" "No" "No" "Yes" ...
#> $ hypertension : chr "No" "No" "No" "Yes" ...
#> $ smoking : chr "Never" "Current" "Never" "Former" ...
#> $ cholesterol : num 167 194 185 186 129 205 243 250 283 219 ...
#> $ bmi : num 20.2 28.2 21.3 25.1 21.8 34.3 29.6 28.1 30.6 32.2 ...
#> $ systolic_bp : num 132 121 134 131 130 170 107 148 132 130 ...
#> $ family_history : chr "No" "Yes" "No" "Yes" ...
#> $ troponin : num 7.36 2.32 3.43 2.41 NA 3.66 3.12 2.68 2.5 3.39 ...
#> $ creatinine : num 1.19 1.19 1.09 1.29 1.15 0.93 1.32 0.9 1.07 0.96 ...
#> $ cardiovascular_event: Factor w/ 2 levels "No","Yes": 1 2 1 2 NA 2 2 2 2 1 ...
#> $ true_risk : num 0.823 0.901 0.212 0.976 NA 0.875 0.764 0.792 0.987 0.811 ...
#> $ risk_category : Factor w/ 4 levels "Low","Moderate",..: 4 4 3 4 NA 4 4 4 4 4 ...
head(modelbuilder_test_data)
#>   patient_id                  hospital age    sex diabetes hypertension smoking
#> 1     PT0001 University Medical Center  81   Male      Yes           No   Never
#> 2     PT0002          General Hospital  58   Male       No           No Current
#> 3     PT0003        Community Hospital  69 Female       No           No   Never
#> 4     PT0004        Community Hospital  73   Male      Yes          Yes  Former
#> 5     PT0005 University Medical Center  70 Female       No          Yes   Never
#> 6     PT0006        Community Hospital  64 Female       No          Yes  Former
#>   cholesterol  bmi systolic_bp family_history troponin creatinine
#> 1         167 20.2         132             No     7.36       1.19
#> 2         194 28.2         121            Yes     2.32       1.19
#> 3         185 21.3         134             No     3.43       1.09
#> 4         186 25.1         131            Yes     2.41       1.29
#> 5         129 21.8         130             No       NA       1.15
#> 6         205 34.3         170            Yes     3.66       0.93
#>   cardiovascular_event true_risk risk_category
#> 1                   No     0.823     Very High
#> 2                  Yes     0.901     Very High
#> 3                   No     0.212          High
#> 4                  Yes     0.976     Very High
#> 5                 <NA>        NA          <NA>
#> 6                  Yes     0.875     Very High
summary(modelbuilder_test_data$bmi)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#>   18.00   23.80   26.70   26.72   29.40   40.20
table(modelbuilder_test_data$cardiovascular_event)
#>
#> No Yes
#> 169 358
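The counts above sum to 527 of the 600 rows, so 73 outcomes are missing; `table()` drops `NA` by default. A minimal sketch of how the missingness could be inspected follows. It uses a hypothetical stand-in data frame, `toy`, built from the six rows printed by `head()` above and restricted to a few columns, so that it runs without the package.

```r
# Stand-in for modelbuilder_test_data: values copied from the head() output.
# `toy` is a hypothetical name used only for this sketch.
toy <- data.frame(
  patient_id = sprintf("PT%04d", 1:6),
  troponin   = c(7.36, 2.32, 3.43, 2.41, NA, 3.66),
  creatinine = c(1.19, 1.19, 1.09, 1.29, 1.15, 0.93),
  cardiovascular_event = factor(c("No", "Yes", "No", "Yes", NA, "Yes"),
                                levels = c("No", "Yes")),
  stringsAsFactors = FALSE
)

# Number of missing values in each column.
na_counts <- colSums(is.na(toy))

# Outcome table that keeps the NA cell instead of silently dropping it.
outcome_counts <- table(toy$cardiovascular_event, useNA = "ifany")

# Rows with no missing value in any column (a complete-case subset).
complete_rows <- toy[complete.cases(toy), ]
```

On the full dataset the same calls apply with `modelbuilder_test_data` in place of `toy`. Whether complete-case analysis or imputation is appropriate depends on the modelling task.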