A simulated longitudinal dataset containing prostate-specific antigen (PSA) measurements and survival outcomes for 200 prostate cancer patients. It is designed for demonstrating joint longitudinal-survival modeling techniques.
Format
A data frame in long format with 1369 observations (one row per PSA measurement) and 8 variables:
- patient_id: Character. Unique patient identifier (PSA_001 to PSA_200)
- age: Numeric. Patient age at baseline (years)
- stage: Factor. Tumor stage (T1, T2, T3, T4)
- gleason_score: Numeric. Gleason score (6-10)
- visit_time: Numeric. Time of PSA measurement (months from baseline)
- psa_level: Numeric. PSA level (ng/mL)
- survival_time: Numeric. Time to death or last follow-up (months)
- death_status: Numeric. Event indicator (0 = censored, 1 = death)
Details
The dataset simulates realistic PSA trajectories with the following properties:
- PSA levels generally increase over time
- Higher tumor stage and Gleason score are associated with higher PSA levels
- Current PSA level influences survival hazard
- Visit intervals are irregular, mimicking real clinical practice
- The event (death) rate is 14.5%, with a median follow-up of 60 months
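The irregular visit spacing described above can be checked by computing the gaps between consecutive visits within each patient. The following is a minimal sketch in base R: `visit_gaps()` is a hypothetical helper, not part of the package, and `toy_visits` is a made-up stand-in that only borrows the documented column names.

```r
# Hypothetical helper (not part of the package): gaps, in months, between
# consecutive visits of the same patient.
visit_gaps <- function(long_data) {
  # Sort so that each patient's visits are in time order
  ordered <- long_data[order(long_data$patient_id, long_data$visit_time), ]
  # Split visit times by patient and take successive differences
  per_patient <- split(ordered$visit_time, ordered$patient_id)
  unlist(lapply(per_patient, diff), use.names = FALSE)
}

# Toy stand-in with made-up values; only the column names match the dataset
toy_visits <- data.frame(
  patient_id = c("PSA_001", "PSA_001", "PSA_001", "PSA_002", "PSA_002"),
  visit_time = c(0, 5.5, 13, 0, 8)
)

gaps <- visit_gaps(toy_visits)
summary(gaps)
```

With the real data loaded, `summary(visit_gaps(psa_joint_data))` would give the observed distribution of gaps; a wide spread between the quartiles would be consistent with irregular scheduling.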
Examples
data(psa_joint_data)
# Basic data exploration
head(psa_joint_data)
# Number of patients and visits
length(unique(psa_joint_data$patient_id))  # 200 patients
table(table(psa_joint_data$patient_id))    # Visit distribution
# Plot individual PSA trajectories for first 10 patients
library(ggplot2)
first_10 <- subset(psa_joint_data, patient_id %in% unique(patient_id)[1:10])
ggplot(first_10, aes(x = visit_time, y = psa_level, color = patient_id)) +
  geom_line() + geom_point() +
  labs(title = "PSA Trajectories", x = "Time (months)", y = "PSA (ng/mL)")
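Survival analyses typically need one row per patient rather than one row per visit. The sketch below shows one way to collapse the long format, assuming that the baseline covariates, `survival_time`, and `death_status` are constant within each patient (this is implied by the variable descriptions but not stated explicitly). `patient_level()` is a hypothetical helper, not part of the package, and `toy` is a made-up stand-in with the documented column names.

```r
# Hypothetical helper (not part of the package): keep the first row of each
# patient and drop the visit-level columns.
patient_level <- function(long_data) {
  baseline_cols <- c("patient_id", "age", "stage", "gleason_score",
                     "survival_time", "death_status")
  wide <- long_data[!duplicated(long_data$patient_id), baseline_cols]
  rownames(wide) <- NULL
  wide
}

# Toy stand-in with made-up values; only the column names match the dataset
toy <- data.frame(
  patient_id    = c("PSA_001", "PSA_001", "PSA_002", "PSA_002", "PSA_002", "PSA_003"),
  age           = c(65, 65, 70, 70, 70, 58),
  stage         = factor(c("T2", "T2", "T3", "T3", "T3", "T1"),
                         levels = c("T1", "T2", "T3", "T4")),
  gleason_score = c(7, 7, 8, 8, 8, 6),
  visit_time    = c(0, 6.5, 0, 4, 11, 0),
  psa_level     = c(5.1, 6.0, 9.8, 12.4, 15.0, 3.2),
  survival_time = c(60, 60, 24, 24, 24, 60),
  death_status  = c(0, 0, 1, 1, 1, 0)
)

toy_patients <- patient_level(toy)
nrow(toy_patients)                # number of patients
mean(toy_patients$death_status)   # proportion of patients with an event
```

Applied to the real data, `patient_level(psa_joint_data)` should return 200 rows, and the mean of `death_status` can be compared against the documented 14.5% event rate, assuming that figure is computed per patient.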