A simulated longitudinal dataset containing repeated PSA measurements and survival outcomes for 200 prostate cancer patients. It is designed to demonstrate joint longitudinal-survival modeling techniques.
Format
A data frame with 1369 observations and 8 variables:
- patient_id
Character. Unique patient identifier (PSA_001 to PSA_200)
- age
Numeric. Patient age at baseline (years)
- stage
Factor. Tumor stage (T1, T2, T3, T4)
- gleason_score
Numeric. Gleason score (6-10)
- visit_time
Numeric. Time of PSA measurement (months from baseline)
- psa_level
Numeric. PSA level (ng/mL)
- survival_time
Numeric. Time to death or last follow-up (months)
- death_status
Numeric. Event indicator (0 = censored, 1 = death)
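The columns age, stage, gleason_score, survival_time and death_status describe the patient rather than the visit, so they are presumably repeated on every visit row. Survival models need one row per patient. The helper below is a minimal sketch of that reduction; its name `to_patient_level` is hypothetical, and it assumes these columns are constant within each patient (it keeps each patient's earliest visit row).

```r
# Hypothetical helper: collapse the long data (one row per PSA visit) to one
# row per patient. Assumes the patient-level columns are constant within each
# patient, so the earliest visit row is kept.
to_patient_level <- function(long_data) {
  patient_cols <- c("patient_id", "age", "stage", "gleason_score",
                    "survival_time", "death_status")
  # Sort by patient, then visit time, so the first row is the baseline visit
  long_data <- long_data[order(long_data$patient_id, long_data$visit_time), ]
  out <- long_data[!duplicated(long_data$patient_id), patient_cols]
  rownames(out) <- NULL
  out
}
```

With the package data loaded, `surv_data <- to_patient_level(psa_joint_data)` should return 200 rows, and `mean(surv_data$death_status)` gives the proportion of patients who died.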
Details
The data are simulated to produce realistic PSA trajectories, such that:
- PSA levels generally increase over time.
- Higher tumor stage and Gleason score are associated with higher PSA levels.
- The current PSA level influences the survival hazard.
- Visit intervals are irregular, mimicking real clinical practice.
- The event (death) rate is 14.5%, with a median follow-up of 60 months.
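Because the current PSA level drives the hazard, a simple first analysis is a Cox model with PSA as a time-dependent covariate, carrying the last observed value forward between visits. This is a swapped-in, simpler technique, not a joint model: it treats observed PSA as error-free and constant between visits. The sketch assumes the `survival` package (a recommended package shipped with R); the helper name `fit_td_cox`, the log transform of PSA and the covariate set are illustrative assumptions.

```r
library(survival)

# Hypothetical helper: time-dependent Cox model in which the hazard at time t
# depends on the most recent PSA value (last observation carried forward).
# Not a joint model: PSA measurement error is ignored.
fit_td_cox <- function(long_data) {
  long_data <- long_data[order(long_data$patient_id, long_data$visit_time), ]
  # One row per patient with baseline covariates and the survival outcome
  base <- long_data[!duplicated(long_data$patient_id),
                    c("patient_id", "age", "stage", "gleason_score",
                      "survival_time", "death_status")]
  # Follow-up interval (0, survival_time] with the death indicator
  td <- tmerge(base, base, id = patient_id,
               death = event(survival_time, death_status))
  # Split follow-up at each visit; psa holds the latest observed value
  td <- tmerge(td, long_data, id = patient_id,
               psa = tdc(visit_time, psa_level))
  coxph(Surv(tstart, tstop, death) ~ log(psa) + age + stage + gleason_score,
        data = td)
}
```

For example, `summary(fit_td_cox(psa_joint_data))` reports the hazard ratio for a one-unit increase in log PSA, adjusted for the baseline covariates.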
Examples
data(psa_joint_data)
# Basic data exploration
head(psa_joint_data)
# Number of patients and visits
length(unique(psa_joint_data$patient_id)) # 200 patients
table(table(psa_joint_data$patient_id)) # Number of patients with each visit count
# Plot individual PSA trajectories for first 10 patients
library(ggplot2)
first_10 <- subset(psa_joint_data, patient_id %in% unique(patient_id)[1:10])
ggplot(first_10, aes(x = visit_time, y = psa_level, color = patient_id)) +
  geom_line() + geom_point() +
  labs(title = "PSA Trajectories", x = "Time (months)", y = "PSA (ng/mL)")
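A joint model pairs a survival submodel with a longitudinal submodel for the biomarker. As a minimal sketch of the longitudinal part, assuming the `nlme` package (shipped with R), the hypothetical helper below fits a linear mixed model to log PSA with a patient-specific random intercept and random slope in time. Modeling PSA on the log scale and the choice of fixed effects are assumptions for illustration, not part of this dataset's documentation.

```r
library(nlme)

# Hypothetical helper: linear mixed model for log PSA with a random intercept
# and a random slope in visit_time for each patient.
fit_psa_lme <- function(long_data) {
  long_data$log_psa <- log(long_data$psa_level)
  long_data$patient_id <- factor(long_data$patient_id)  # grouping factor
  lme(log_psa ~ visit_time + stage + gleason_score,
      random = ~ visit_time | patient_id,
      data = long_data,
      control = lmeControl(opt = "optim"))  # optim is often more robust here
}
```

For example, `psa_fit <- fit_psa_lme(psa_joint_data)` followed by `summary(psa_fit)`; the `visit_time` fixed effect is then the average monthly change in log PSA.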