
A longitudinal dataset containing prostate-specific antigen (PSA) measurements and survival outcomes for 200 prostate cancer patients. It is designed for demonstrating joint longitudinal-survival modeling techniques.

Usage

psa_joint_data

Format

A data frame in long format with 1369 rows (one per PSA measurement) and 8 variables:

patient_id

Character. Unique patient identifier (PSA_001 to PSA_200)

age

Numeric. Patient age at baseline (years)

stage

Factor. Tumor stage (T1, T2, T3, T4)

gleason_score

Numeric. Gleason score (6-10)

visit_time

Numeric. Time of PSA measurement (months from baseline)

psa_level

Numeric. PSA level (ng/mL)

survival_time

Numeric. Time to death or last follow-up (months)

death_status

Numeric. Event indicator (0 = censored, 1 = death)
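
Because each row is a visit while survival_time and death_status describe the patient, survival summaries are usually computed on one row per patient. The following is a minimal sketch of that reduction in base R. It runs on a small invented stand-in (toy) rather than on psa_joint_data itself, and it assumes that the survival columns are repeated unchanged on every row of a patient, which the documentation does not state explicitly. The helper patient_level() is hypothetical and not part of any package.

```r
# Toy stand-in with the documented column names (all values are invented)
toy <- data.frame(
  patient_id    = c("PSA_001", "PSA_001", "PSA_001", "PSA_002", "PSA_002", "PSA_003"),
  visit_time    = c(0, 5, 13, 0, 8, 0),
  psa_level     = c(4.1, 5.0, 7.2, 9.8, 12.5, 3.3),
  survival_time = c(60, 60, 60, 22, 22, 48),
  death_status  = c(0, 0, 0, 1, 1, 0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the first row of each patient so that every
# subject contributes exactly one survival record.
# Assumes survival_time and death_status are constant within patient_id.
patient_level <- function(df) {
  keep <- !duplicated(df$patient_id)
  df[keep, c("patient_id", "survival_time", "death_status")]
}

pts <- patient_level(toy)

nrow(pts)                  # number of patients
mean(pts$death_status)     # proportion of patients with an observed death
median(pts$survival_time)  # crude median of the recorded follow-up times (months)
```

Applied to the real data, patient_level(psa_joint_data) should return one row for each of the 200 patients if the assumption above holds. The plain median used here is only one way to summarise follow-up; the documentation does not say how its own follow-up figure was computed.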

Source

Simulated data based on typical prostate cancer cohort characteristics

Details

The dataset simulates realistic PSA trajectories where:

  • PSA levels generally increase over time

  • Higher tumor stage and Gleason score are associated with higher PSA levels

  • Current PSA level influences survival hazard

  • Visit intervals are irregular, mimicking real clinical practice

  • The event rate is 14.5%, with a median follow-up of 60 months
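
Since the current PSA level is what drives the hazard, a common first step before fitting a full joint model is to reshape the visits into (start, stop] intervals so that PSA can enter a Cox model as a time-varying covariate. The sketch below shows one way to build such intervals in base R. It is an illustration under stated assumptions, not the method used to generate or analyse this dataset: the toy data are invented, to_counting_process() is a hypothetical helper, and PSA is treated as constant between visits.

```r
# Toy stand-in with the documented column names (all values are invented)
toy <- data.frame(
  patient_id    = c("PSA_001", "PSA_001", "PSA_001", "PSA_002", "PSA_002", "PSA_003"),
  visit_time    = c(0, 5, 13, 0, 8, 0),
  psa_level     = c(4.1, 5.0, 7.2, 9.8, 12.5, 3.3),
  survival_time = c(60, 60, 60, 22, 22, 48),
  death_status  = c(0, 0, 0, 1, 1, 0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn long-format visits into (start, stop] intervals.
to_counting_process <- function(df) {
  df <- df[order(df$patient_id, df$visit_time), ]
  # Time of the next visit within the same patient; NA on the last visit
  next_visit <- ave(df$visit_time, df$patient_id,
                    FUN = function(t) c(t[-1], NA))
  is_last <- is.na(next_visit)
  df$start <- df$visit_time
  # The last interval of each patient runs until death or censoring
  df$stop  <- ifelse(is_last, df$survival_time, next_visit)
  # Only the interval ending at survival_time can carry the death indicator
  df$event <- ifelse(is_last, df$death_status, 0)
  # Intervals with stop <= start are dropped. If such a row carried the
  # event it would be lost, so check for this before using real data.
  df[df$stop > df$start, c("patient_id", "start", "stop", "event", "psa_level")]
}

cp <- to_counting_process(toy)
cp
```

A table in this shape can be passed to survival::coxph() with a Surv(start, stop, event) response, for example with log(psa_level) as the covariate. That is a time-varying Cox model rather than a joint model: it ignores measurement error in PSA and carries the last observed value forward, which are the limitations joint longitudinal-survival models are meant to address.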

Examples

data(psa_joint_data)

# Basic data exploration
head(psa_joint_data)

# Number of patients and visits
length(unique(psa_joint_data$patient_id))  # 200 patients
table(table(psa_joint_data$patient_id))    # Visit distribution

# Plot individual PSA trajectories for first 10 patients
library(ggplot2)
first_10 <- subset(psa_joint_data, patient_id %in% unique(patient_id)[1:10])
ggplot(first_10, aes(x = visit_time, y = psa_level, color = patient_id)) +
  geom_line() +
  geom_point() +
  labs(title = "PSA Trajectories", x = "Time (months)", y = "PSA (ng/mL)")