Simulated cardiovascular risk assessment dataset with traditional risk factors, patient demographics, and 5-year cardiovascular event outcomes. Designed to test risk prediction models, clinical decision support, and cardiovascular screening decision trees.
Format
A data frame with 400 patients (rows) and the 18 variables listed below:
- patient_id
Character. Unique patient identifier (CVD_0001 to CVD_0400)
- systolic_bp
Numeric. Systolic blood pressure (mmHg)
- diastolic_bp
Numeric. Diastolic blood pressure (mmHg)
- cholesterol
Numeric. Total cholesterol level (mg/dL)
- hdl
Numeric. HDL cholesterol level (mg/dL)
- ldl
Numeric. LDL cholesterol level (mg/dL)
- triglycerides
Numeric. Triglyceride level (mg/dL)
- bmi
Numeric. Body mass index (kg/m²)
- age
Integer. Patient age (years)
- smoking
Factor. Smoking status ("Never", "Former", "Current")
- diabetes
Factor. Diabetes status ("No", "Yes")
- family_history
Factor. Family history of cardiovascular disease ("No", "Yes")
- cv_event
Factor. Primary outcome: cardiovascular event within 5 years ("No", "Yes")
- study_cohort
Factor. Study cohort ("training", "testing")
- sex
Factor. Patient sex ("Male", "Female")
- ethnicity
Factor. Patient ethnicity ("White", "Black", "Hispanic", "Asian", "Other")
- x_coord, y_coord
Numeric. Spatial coordinates for geographic analysis
Details
This dataset simulates a cardiovascular risk assessment study that combines traditional risk factors with demographic variables. The simulated risk factors and 5-year outcomes follow established epidemiological patterns for cardiovascular disease.
Clinical Context:
- Cardiovascular disease risk prediction
- Primary prevention screening
- Clinical decision support for risk stratification
- Population health management
Key Features:
- Traditional Framingham Risk Score variables
- Realistic distributions of cardiovascular risk factors
- Diverse patient demographics and ethnicities
- Geographic variation modeling with spatial coordinates
- Simulated missing data (6-12% missing across variables)
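The stated missingness range can be checked directly once the data are loaded. The following is a minimal base R sketch; the helper name `missing_pct` is hypothetical and is not part of the package.

```r
# Hypothetical helper (not part of the package): percentage of missing
# values in each column, sorted from most to least missing.
missing_pct <- function(df) {
  pct <- colMeans(is.na(df)) * 100
  sort(round(pct, 1), decreasing = TRUE)
}

# Usage, assuming the dataset has been loaded with data(cardiovascular_risk):
# missing_pct(cardiovascular_risk)
```

Columns simulated without missing values would report 0 here.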
Recommended Analysis Scenarios:
- Risk prediction model development
- Clinical decision threshold optimization
- Population-based risk stratification
- Geographic variation analysis
- Multi-ethnic risk assessment
- Bootstrap validation for confidence intervals
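As an illustration of the bootstrap scenario, the sketch below computes a percentile bootstrap confidence interval for the 5-year event rate using base R only. The function name `bootstrap_event_rate_ci` and its defaults (2000 resamples, 95% level, fixed seed) are assumptions made for this example, not package functionality.

```r
# Hypothetical helper: percentile bootstrap CI for the proportion of "Yes"
# outcomes. Missing outcomes are dropped before resampling.
bootstrap_event_rate_ci <- function(events, n_boot = 2000, level = 0.95, seed = 1) {
  events <- events[!is.na(events)]
  is_event <- events == "Yes"

  set.seed(seed)  # fixed seed so the interval is reproducible
  boot_rates <- replicate(n_boot, mean(sample(is_event, replace = TRUE)))

  alpha <- 1 - level
  c(
    estimate = mean(is_event),
    lower = unname(quantile(boot_rates, alpha / 2)),
    upper = unname(quantile(boot_rates, 1 - alpha / 2))
  )
}

# Usage, assuming data(cardiovascular_risk) has been run:
# bootstrap_event_rate_ci(cardiovascular_risk$cv_event)
```

The same resampling pattern could be applied to a model performance measure by recomputing that measure on each resample instead of the event rate.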
Examples
if (FALSE) { # \dontrun{
# Load the dataset
data(cardiovascular_risk)
# Risk stratification analysis
result <- tree(
  data = cardiovascular_risk,
  vars = c("systolic_bp", "cholesterol", "hdl", "bmi", "age"),
  facs = c("smoking", "diabetes", "family_history"),
  target = "cv_event",
  targetLevel = "Yes",
  train = "study_cohort",
  trainLevel = "training",
  clinicalContext = "screening",
  riskStratification = TRUE,
  showInterpretation = TRUE,
  crossValidation = TRUE
)
} # }
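For comparison with the decision tree call above, the training/testing split can also be used with a plain logistic regression in base R. This is only a sketch: the function name `score_testing_cohort` and the choice of predictors are assumptions for illustration, and it assumes `cv_event` has levels ordered ("No", "Yes") so that the model estimates the probability of "Yes".

```r
# Hypothetical helper: fit a logistic regression on the training cohort and
# return the testing cohort with a predicted 5-year risk column added.
score_testing_cohort <- function(df) {
  # which() drops rows where study_cohort is NA
  train <- df[which(df$study_cohort == "training"), ]
  test <- df[which(df$study_cohort == "testing"), ]

  # glm() drops rows with missing predictors or outcome by default
  model <- glm(
    cv_event ~ age + systolic_bp + cholesterol + hdl + smoking + diabetes,
    data = train,
    family = binomial()
  )

  # Rows with missing predictors receive NA as predicted risk
  test$predicted_risk <- predict(model, newdata = test, type = "response")
  test
}

# Usage, assuming data(cardiovascular_risk) has been run:
# scored <- score_testing_cohort(cardiovascular_risk)
# summary(scored$predicted_risk)
```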