Relative Survival Analysis - Comprehensive Guide
Source: vignettes/jsurvival-relativesurvival-comprehensive.Rmd
Note: The relativesurvival() function is designed for use within jamovi’s GUI. The code examples below show the R syntax for reference. To run interactively, use devtools::load_all() and call the wrapper function directly.
Relative Survival Analysis
Overview
Relative survival analysis compares observed survival in a patient cohort to the survival expected in a matched general population. Rather than requiring (often inaccurate) cause-of-death data, it estimates disease-specific mortality indirectly through the ratio of observed to expected survival. This approach is the standard in population-based cancer epidemiology and is used by EUROCARE, CONCORD, and national cancer registries worldwide.
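As a minimal numeric sketch of the ratio described above, the following uses made-up survival proportions (they are illustrative assumptions, not values from the test dataset or output of the module):

```r
# Hypothetical cumulative survival proportions at 1, 3 and 5 years
observed <- c(0.85, 0.65, 0.50)  # all-cause survival in the patient cohort
expected <- c(0.98, 0.94, 0.90)  # survival of a matched general population

# Relative survival is the ratio of observed to expected survival
relative <- observed / expected
round(relative, 3)
```

Because expected survival is below 1, the relative survival is always somewhat higher than the observed survival at the same timepoint.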
The relativesurvival module provides:
- Four estimation methods: Pohar-Perme (recommended), Ederer I, Ederer II, Hakulinen
- 15 population rate tables: 4 from the relsurv package (US, Minnesota, France, Slovenia) and 10 WHO-based tables (Turkey, Germany, UK, Italy, Japan, Spain, Brazil, South Korea, China, India), plus a custom option
- Net survival and excess mortality estimation
- Crude probability of death decomposition (disease vs. other causes)
- ICSS age standardization for international comparisons
- Period analysis to track survival trends over diagnosis years
- Regression models: additive excess hazard, multiplicative, and flexible parametric (via rstpm2)
- Four publication-ready plots: observed, expected, relative survival, and excess mortality
Datasets
| Dataset | N | Events | Key Features |
|---|---|---|---|
| relativesurvival_test | 200 | ~116 deaths | 4 cancer sites (Colon, Breast, Lung, Prostate), stages I-IV, ages 30-90, diagnosis years 2000-2015, covariates (comorbidity, tumor_size) |
data(relativesurvival_test)
str(relativesurvival_test)
#> 'data.frame': 200 obs. of 11 variables:
#> $ patient_id : int 1 2 3 4 5 6 7 8 9 10 ...
#> $ followup_years : num 3.59 10 3.39 1.94 3.56 10 2.39 3.37 3.02 7.03 ...
#> $ vital_status : int 1 0 1 1 1 0 1 1 1 1 ...
#> $ age_at_diagnosis: num 81 58 69 73 70 64 83 64 89 64 ...
#> $ sex : Factor w/ 2 levels "female","male": 2 2 1 2 1 2 1 2 2 2 ...
#> $ diagnosis_year : int 2007 2003 2005 2009 2011 2010 2003 2007 2008 2014 ...
#> $ cancer_site : Factor w/ 4 levels "Breast","Colon",..: 4 4 2 3 3 2 3 2 2 3 ...
#> $ stage : Ord.factor w/ 4 levels "I"<"II"<"III"<..: 1 2 1 3 3 2 3 4 2 1 ...
#> $ grade : Factor w/ 3 levels "Moderately differentiated",..: 1 1 2 1 3 3 3 2 1 2 ...
#> $ comorbidity : int 0 0 0 0 2 2 2 2 2 3 ...
#> $ tumor_size : num 3.6 2.5 1.6 4 4.2 5.8 4.1 3.3 8.2 2.3 ...
summary(relativesurvival_test[, c("followup_years", "vital_status",
"age_at_diagnosis", "sex", "diagnosis_year")])
#> followup_years vital_status age_at_diagnosis sex diagnosis_year
#> Min. : 0.080 Min. :0.00 Min. :30.0 female: 82 Min. :2000
#> 1st Qu.: 1.518 1st Qu.:0.00 1st Qu.:58.0 male :118 1st Qu.:2004
#> Median : 3.460 Median :1.00 Median :65.0 Median :2008
#> Mean : 4.754 Mean :0.58 Mean :64.6 Mean :2008
#> 3rd Qu.: 8.590 3rd Qu.:1.00 3rd Qu.:73.0 3rd Qu.:2012
#> Max. :10.000 Max. :1.00 Max. :90.0 Max. :2015
1. Basic Analysis
The minimum call requires five variables: follow-up time, vital status, age at diagnosis, sex, and calendar year of diagnosis. By default, the module uses the Pohar-Perme estimator and the US population rate table.
This produces the main survival table (observed, expected, relative survival at 1, 3, 5, and 10 years), net survival table, excess mortality rates, crude probability of death, and all four plots.
relativesurvival(
data = relativesurvival_test,
time = "followup_years",
status = "vital_status",
age = "age_at_diagnosis",
sex = "sex",
year = "diagnosis_year"
)
#> Error in `relativesurvival()`:
#> ! argument "covariates" is missing, with no default
What the outputs mean
- Survival Estimates by Time: Observed (Kaplan-Meier), expected (population), and relative (net) survival at each requested timepoint.
- Net Survival Estimates: The probability of surviving the disease if other causes of death were removed. Values near 1.0 mean the disease contributes little to mortality.
- Excess Mortality Rates: The additional hazard attributable to the disease in each year interval. Higher values indicate more disease-specific mortality.
- Crude Probability of Death: Decomposes total mortality into disease-related and other-cause components, accounting for competing risks.
- Clinical Interpretation: Automated 5-year prognosis grading (Excellent >90%, Good >70%, Fair >50%, Poor <50%).
2. Estimation Methods
The module supports four estimation methods, each differing in how expected survival is calculated. The Pohar-Perme method (default) is the only unbiased estimator of net survival and is recommended by international guidelines.
2a. Pohar-Perme (default, recommended)
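Pohar-Perme weights each patient's contribution by the inverse of their expected (population) survival, which corrects for the informative censoring caused by deaths from other causes and yields unbiased net survival estimates. This is the EUROCARE/CONCORD standard.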
relativesurvival(
data = relativesurvival_test,
time = "followup_years",
status = "vital_status",
age = "age_at_diagnosis",
sex = "sex",
year = "diagnosis_year",
method = "poharperme"
)
#> Error in `relativesurvival()`:
#> ! argument "covariates" is missing, with no default
2b. Ederer II
Ederer II updates the expected survival at each event time. It was the traditional standard but is now known to be biased for net survival.
relativesurvival(
data = relativesurvival_test,
time = "followup_years",
status = "vital_status",
age = "age_at_diagnosis",
sex = "sex",
year = "diagnosis_year",
method = "ederer2"
)
#> Error in `relativesurvival()`:
#> ! argument "covariates" is missing, with no default
2c. Ederer I
Ederer I calculates expected survival based on the full cohort demographics at baseline, without updating. It tends to overestimate expected survival at later times.
relativesurvival(
data = relativesurvival_test,
time = "followup_years",
status = "vital_status",
age = "age_at_diagnosis",
sex = "sex",
year = "diagnosis_year",
method = "ederer1"
)
#> Error in `relativesurvival()`:
#> ! argument "covariates" is missing, with no default
2d. Hakulinen
Hakulinen weights expected survival by the censoring distribution, making it a compromise between Ederer I and Ederer II.
relativesurvival(
data = relativesurvival_test,
time = "followup_years",
status = "vital_status",
age = "age_at_diagnosis",
sex = "sex",
year = "diagnosis_year",
method = "hakulinen"
)
#> Error in `relativesurvival()`:
#> ! argument "covariates" is missing, with no default
3. Rate Tables
Population rate tables are the source of expected survival. The choice of rate table should match the country and calendar period of the patient cohort.
3a. US Population (default)
The survexp.us table from the survival package. It is not exported by relsurv, so reference it as survival::survexp.us rather than relsurv::survexp.us.
relativesurvival(
data = relativesurvival_test,
time = "followup_years",
status = "vital_status",
age = "age_at_diagnosis",
sex = "sex",
year = "diagnosis_year",
ratetable = "us"
)
#> Error in `relativesurvival()`:
#> ! argument "covariates" is missing, with no default
3b. Minnesota Population
A state-level rate table useful for regional cancer registries.
relativesurvival(
data = relativesurvival_test,
time = "followup_years",
status = "vital_status",
age = "age_at_diagnosis",
sex = "sex",
year = "diagnosis_year",
ratetable = "mn"
)
#> Error in `relativesurvival()`:
#> ! argument "covariates" is missing, with no default
3c. French Population
relativesurvival(
data = relativesurvival_test,
time = "followup_years",
status = "vital_status",
age = "age_at_diagnosis",
sex = "sex",
year = "diagnosis_year",
ratetable = "fr"
)
#> Error in `relativesurvival()`:
#> ! argument "covariates" is missing, with no default
3d. Slovenian Population
relativesurvival(
data = relativesurvival_test,
time = "followup_years",
status = "vital_status",
age = "age_at_diagnosis",
sex = "sex",
year = "diagnosis_year",
ratetable = "slovenia"
)
#> Error in `relativesurvival()`:
#> ! argument "covariates" is missing, with no default
3e. WHO-Based Tables (Turkey example)
WHO-based tables use Global Health Observatory life table data bundled with this module. They cover Turkey, Germany, UK, Italy, Japan, Spain, Brazil, South Korea, China, and India. If the table is not available, the module falls back to the US table with a notice.
relativesurvival(
data = relativesurvival_test,
time = "followup_years",
status = "vital_status",
age = "age_at_diagnosis",
sex = "sex",
year = "diagnosis_year",
ratetable = "turkey"
)
#> Error in `relativesurvival()`:
#> ! argument "covariates" is missing, with no default
4. Net Survival & Excess Mortality
By default, both net survival
(net_survival = TRUE) and excess mortality
(excess_mortality = TRUE) are calculated. You can toggle
each independently.
4a. Net Survival Only
Net survival is the probability of surviving the disease in a hypothetical world where the disease is the only possible cause of death.
relativesurvival(
data = relativesurvival_test,
time = "followup_years",
status = "vital_status",
age = "age_at_diagnosis",
sex = "sex",
year = "diagnosis_year",
net_survival = TRUE,
excess_mortality = FALSE,
crude_probability = FALSE,
plot_excess = FALSE
)
#> Error in `relativesurvival()`:
#> ! argument "covariates" is missing, with no default
4b. Excess Mortality Only
Excess mortality is the additional hazard due to the disease,
computed as -log(S_net(t) / S_net(t-1)) for each yearly
interval.
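The interval formula above can be sketched directly in R. The net survival values here are hypothetical and chosen only to illustrate the arithmetic; they are not output from the module:

```r
# Hypothetical net survival at the end of years 0 (diagnosis) to 5
s_net <- c(1.00, 0.90, 0.81, 0.75, 0.70, 0.66)

# Excess hazard per yearly interval: -log(S_net(t) / S_net(t - 1))
excess_hazard <- -log(s_net[-1] / s_net[-length(s_net)])
names(excess_hazard) <- paste0(0:4, "-", 1:5)
round(excess_hazard, 3)
```

The interval hazards sum to -log(S_net(5)), so the yearly values are a decomposition of the cumulative excess hazard over the whole follow-up.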
relativesurvival(
data = relativesurvival_test,
time = "followup_years",
status = "vital_status",
age = "age_at_diagnosis",
sex = "sex",
year = "diagnosis_year",
net_survival = FALSE,
excess_mortality = TRUE,
crude_probability = FALSE
)
#> Error in `relativesurvival()`:
#> ! argument "covariates" is missing, with no default
5. Crude Probabilities
Crude probabilities decompose total mortality into disease-specific
and other-cause components, using the method of Cronin and Feuer
(implemented in relsurv::cmp.rel). This accounts for
competing risks.
relativesurvival(
data = relativesurvival_test,
time = "followup_years",
status = "vital_status",
age = "age_at_diagnosis",
sex = "sex",
year = "diagnosis_year",
net_survival = FALSE,
excess_mortality = FALSE,
crude_probability = TRUE,
plot_observed = FALSE,
plot_expected = FALSE,
plot_relative = FALSE,
plot_excess = FALSE
)
#> Error in `relativesurvival()`:
#> ! argument "covariates" is missing, with no default
6. Age Standardization
Age-standardized relative survival uses ICSS (International Cancer Survival Standard) weights to make survival estimates comparable across populations with different age structures.
ICSS weight groups:
- 0-44 years: 0.07
- 45-54 years: 0.12
- 55-64 years: 0.23
- 65-74 years: 0.29
- 75+ years: 0.29
The module calculates Pohar-Perme net survival within each age group, then produces a weighted average.
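The weighted average can be illustrated with a short sketch. The weights are the ICSS values listed above; the age-specific net survival estimates are hypothetical and stand in for whatever the module computes per age group:

```r
# ICSS weights from the list above (they sum to 1)
icss_weights <- c("0-44" = 0.07, "45-54" = 0.12, "55-64" = 0.23,
                  "65-74" = 0.29, "75+" = 0.29)

# Hypothetical age-specific 5-year net survival estimates
net_surv_by_age <- c("0-44" = 0.80, "45-54" = 0.75, "55-64" = 0.70,
                     "65-74" = 0.60, "75+" = 0.45)

# Age-standardized net survival is the weighted average across age groups
age_std <- sum(icss_weights * net_surv_by_age[names(icss_weights)])
age_std
```

Indexing the estimates by the weight names keeps the two vectors aligned even if they were created in a different order.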
relativesurvival(
data = relativesurvival_test,
time = "followup_years",
status = "vital_status",
age = "age_at_diagnosis",
sex = "sex",
year = "diagnosis_year",
age_standardized = TRUE,
net_survival = FALSE,
excess_mortality = FALSE,
crude_probability = FALSE,
plot_observed = FALSE,
plot_expected = FALSE,
plot_relative = FALSE,
plot_excess = FALSE
)
#> Error in `relativesurvival()`:
#> ! argument "covariates" is missing, with no default
7. Period Analysis
Period analysis estimates the most recent survival experience by restricting follow-up to a specific calendar window. Unlike cohort analysis (which follows patients from diagnosis), period analysis captures the latest treatment effects.
7a. Default Period Analysis
Without a cohort definition, the module creates 5-year diagnosis periods automatically from the data range and computes 5-year relative survival for each period.
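One plausible way to form such 5-year diagnosis periods is sketched below. The binning rule (bins anchored at the earliest diagnosis year, left-closed intervals) is an assumption for illustration; the module's internal rule may differ:

```r
# Hypothetical diagnosis years spanning the range of the test dataset
diagnosis_year <- c(2000, 2003, 2004, 2005, 2009, 2010, 2014, 2015)

# 5-year bins anchored at the earliest year; the last bin may be only
# partly covered by the data
breaks <- seq(min(diagnosis_year), max(diagnosis_year) + 5, by = 5)
starts <- head(breaks, -1)
period <- cut(diagnosis_year, breaks = breaks, right = FALSE,
              labels = paste0(starts, "-", starts + 4))
table(period)
```

Each patient then belongs to exactly one period, and a 5-year relative survival estimate can be computed within each group.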
relativesurvival(
data = relativesurvival_test,
time = "followup_years",
status = "vital_status",
age = "age_at_diagnosis",
sex = "sex",
year = "diagnosis_year",
period_analysis = TRUE,
net_survival = FALSE,
excess_mortality = FALSE,
crude_probability = FALSE,
plot_observed = FALSE,
plot_expected = FALSE,
plot_relative = FALSE,
plot_excess = FALSE
)
#> Error in `relativesurvival()`:
#> ! argument "covariates" is missing, with no default
7b. With Cohort Definition
The cohort_year option restricts the analysis to
patients diagnosed within a specific year range. The format is
“start-end” (e.g., “2005-2015”).
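A minimal sketch of how a "start-end" string could be parsed and applied follows. The helper name parse_cohort_year is hypothetical, and treating both endpoints as inclusive is an assumption:

```r
# Hypothetical helper: split a "start-end" string into two integer years
parse_cohort_year <- function(x) {
  parts <- as.integer(trimws(strsplit(x, "-", fixed = TRUE)[[1]]))
  if (length(parts) != 2 || anyNA(parts) || parts[1] > parts[2]) {
    stop("cohort_year must look like \"2005-2015\"")
  }
  parts
}

range_years <- parse_cohort_year("2005-2015")

# Keep only diagnosis years inside the (assumed inclusive) range
diagnosis_year <- c(2001, 2005, 2010, 2015, 2016)
kept <- diagnosis_year[diagnosis_year >= range_years[1] &
                         diagnosis_year <= range_years[2]]
kept
```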
relativesurvival(
data = relativesurvival_test,
time = "followup_years",
status = "vital_status",
age = "age_at_diagnosis",
sex = "sex",
year = "diagnosis_year",
period_analysis = TRUE,
cohort_year = "2005-2015",
net_survival = FALSE,
excess_mortality = FALSE,
crude_probability = FALSE,
plot_observed = FALSE,
plot_expected = FALSE,
plot_relative = FALSE,
plot_excess = FALSE
)
#> Error in `relativesurvival()`:
#> ! argument "covariates" is missing, with no default
8. Regression Models
When additional covariates are provided, the module can fit three types of regression models to assess the effect of prognostic factors on excess mortality.
8a. Additive Excess Hazard Model
The additive model (relsurv::rsadd) assumes that excess
hazard is additive to the expected hazard. This is the classic Esteve
model.
relativesurvival(
data = relativesurvival_test,
time = "followup_years",
status = "vital_status",
age = "age_at_diagnosis",
sex = "sex",
year = "diagnosis_year",
covariates = "cancer_site",
regression_model = "additive",
net_survival = FALSE,
excess_mortality = FALSE,
crude_probability = FALSE,
plot_observed = FALSE,
plot_expected = FALSE,
plot_relative = FALSE,
plot_excess = FALSE
)
#> Error:
#> ! 'survexp.us' is not an exported object from 'namespace:relsurv'
8b. Multiplicative Model
The multiplicative model (relsurv::rsmul) assumes that the observed hazard is the population (expected) hazard multiplied by a covariate-dependent relative hazard, rather than the expected hazard plus an additive excess hazard.
relativesurvival(
data = relativesurvival_test,
time = "followup_years",
status = "vital_status",
age = "age_at_diagnosis",
sex = "sex",
year = "diagnosis_year",
covariates = "cancer_site",
regression_model = "multiplicative",
net_survival = FALSE,
excess_mortality = FALSE,
crude_probability = FALSE,
plot_observed = FALSE,
plot_expected = FALSE,
plot_relative = FALSE,
plot_excess = FALSE
)
#> Error:
#> ! 'survexp.us' is not an exported object from 'namespace:relsurv'
8c. Flexible Parametric Model
The flexible parametric model (rstpm2::stpm2) uses
restricted cubic splines to model the baseline excess hazard. The
spline_df option controls the degrees of freedom
(complexity) of the spline.
relativesurvival(
data = relativesurvival_test,
time = "followup_years",
status = "vital_status",
age = "age_at_diagnosis",
sex = "sex",
year = "diagnosis_year",
covariates = "cancer_site",
regression_model = "flexible",
spline_df = 3,
net_survival = FALSE,
excess_mortality = FALSE,
crude_probability = FALSE,
plot_observed = FALSE,
plot_expected = FALSE,
plot_relative = FALSE,
plot_excess = FALSE
)
#> Error:
#> ! 'survexp.us' is not an exported object from 'namespace:relsurv'
8d. Multiple Covariates
You can include multiple covariates (continuous and/or categorical).
relativesurvival(
data = relativesurvival_test,
time = "followup_years",
status = "vital_status",
age = "age_at_diagnosis",
sex = "sex",
year = "diagnosis_year",
covariates = c("cancer_site", "stage"),
regression_model = "additive",
net_survival = FALSE,
excess_mortality = FALSE,
crude_probability = FALSE,
plot_observed = FALSE,
plot_expected = FALSE,
plot_relative = FALSE,
plot_excess = FALSE
)
#> Error:
#> ! 'survexp.us' is not an exported object from 'namespace:relsurv'
9. Time Scales
The module accepts follow-up time in years (default), months, or
days. The time_scale option tells the module how to
interpret the time variable; it is internally converted to years (and
then to days for the relsurv rate table).
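The conversion chain described above can be sketched as follows. The helper name to_years is hypothetical, and the factors (12 months and 365.25 days per year) are assumptions consistent with the examples in this section rather than confirmed internals:

```r
# Hypothetical helper mirroring the conversion described above
to_years <- function(time, time_scale = c("years", "months", "days")) {
  time_scale <- match.arg(time_scale)
  switch(time_scale,
         years  = time,
         months = time / 12,
         days   = time / 365.25)
}

years <- to_years(c(6, 24, 60), "months")  # follow-up expressed in years
days  <- years * 365.25                    # day scale used with rate tables
```

Using match.arg() means an unsupported scale such as "weeks" fails early with a clear error instead of silently returning unconverted values.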
9a. Time in Months
# Create a copy with follow-up in months
test_months <- relativesurvival_test
test_months$followup_months <- test_months$followup_years * 12
relativesurvival(
data = test_months,
time = "followup_months",
status = "vital_status",
age = "age_at_diagnosis",
sex = "sex",
year = "diagnosis_year",
time_scale = "months",
net_survival = FALSE,
excess_mortality = FALSE,
crude_probability = FALSE,
plot_observed = FALSE,
plot_expected = FALSE,
plot_relative = FALSE,
plot_excess = FALSE
)
#> Error in `relativesurvival()`:
#> ! argument "covariates" is missing, with no default
9b. Time in Days
# Create a copy with follow-up in days
test_days <- relativesurvival_test
test_days$followup_days <- test_days$followup_years * 365.25
relativesurvival(
data = test_days,
time = "followup_days",
status = "vital_status",
age = "age_at_diagnosis",
sex = "sex",
year = "diagnosis_year",
time_scale = "days",
net_survival = FALSE,
excess_mortality = FALSE,
crude_probability = FALSE,
plot_observed = FALSE,
plot_expected = FALSE,
plot_relative = FALSE,
plot_excess = FALSE
)
#> Error in `relativesurvival()`:
#> ! argument "covariates" is missing, with no default
10. Custom Timepoints & Confidence Levels
10a. Custom Timepoints
The timepoints option accepts a comma-separated string
of time values (in years, regardless of the time_scale
setting). These determine at which follow-up times the survival
estimates are reported.
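For orientation, a comma-separated string like this can be turned into numeric timepoints in base R as sketched below. The cleaning steps (dropping non-positive or unparseable values, sorting, de-duplicating) are assumptions, not a description of the module's own parser:

```r
timepoints_string <- "0.5, 1, 2, 3, 5, 7"

# Split on commas, strip whitespace, convert to numeric
timepoints <- as.numeric(trimws(strsplit(timepoints_string, ",", fixed = TRUE)[[1]]))

# Keep valid positive values in increasing order (assumed cleaning rule)
timepoints <- sort(unique(timepoints[!is.na(timepoints) & timepoints > 0]))
timepoints
```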
relativesurvival(
data = relativesurvival_test,
time = "followup_years",
status = "vital_status",
age = "age_at_diagnosis",
sex = "sex",
year = "diagnosis_year",
timepoints = "0.5, 1, 2, 3, 5, 7",
excess_mortality = FALSE,
crude_probability = FALSE,
plot_observed = FALSE,
plot_expected = FALSE,
plot_relative = FALSE,
plot_excess = FALSE
)
#> Error in `relativesurvival()`:
#> ! argument "covariates" is missing, with no default
10b. Custom Confidence Level
The confidence_level option controls the width of all
confidence intervals in the analysis (default 0.95). Here we use
90%.
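To show how the confidence level affects interval width, the sketch below builds a normal-approximation interval on the log scale. This construction, the estimate, and the standard error are all illustrative assumptions; the vignette does not state how the module computes its intervals:

```r
confidence_level <- 0.90

# Two-sided critical value: about 1.645 for 90%, about 1.96 for 95%
z <- qnorm(1 - (1 - confidence_level) / 2)

# Hypothetical relative survival estimate and standard error on the log scale
log_rs    <- log(0.70)
se_log_rs <- 0.05
ci <- exp(log_rs + c(-1, 1) * z * se_log_rs)
round(ci, 3)
```

Lowering the confidence level shrinks z and therefore narrows every interval built this way.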
relativesurvival(
data = relativesurvival_test,
time = "followup_years",
status = "vital_status",
age = "age_at_diagnosis",
sex = "sex",
year = "diagnosis_year",
confidence_level = 0.90,
timepoints = "1, 3, 5",
excess_mortality = FALSE,
crude_probability = FALSE,
plot_observed = FALSE,
plot_expected = FALSE,
plot_relative = FALSE,
plot_excess = FALSE
)
#> Error in `relativesurvival()`:
#> ! argument "covariates" is missing, with no default
11. Edge Cases
11a. Small Sample Warning
The module requires at least 30 complete cases and at least 10 events. With 10-19 events, a strong warning is displayed; with 20-49, a moderate warning.
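The thresholds above can be written out as a small decision function. The helper name check_sample and the returned labels are hypothetical; only the cut-offs come from the text:

```r
# Hypothetical helper encoding the thresholds stated above
check_sample <- function(n_complete, n_events) {
  if (n_complete < 30) return("below minimum: fewer than 30 complete cases")
  if (n_events < 10)   return("below minimum: fewer than 10 events")
  if (n_events < 20)   return("strong warning")    # 10-19 events
  if (n_events < 50)   return("moderate warning")  # 20-49 events
  "ok"
}

check_sample(n_complete = 40, n_events = 15)
```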
# Subset to a small sample with few events
small_data <- relativesurvival_test[1:40, ]
relativesurvival(
data = small_data,
time = "followup_years",
status = "vital_status",
age = "age_at_diagnosis",
sex = "sex",
year = "diagnosis_year",
excess_mortality = FALSE,
crude_probability = FALSE,
plot_observed = FALSE,
plot_expected = FALSE,
plot_relative = FALSE,
plot_excess = FALSE
)
#> Error in `relativesurvival()`:
#> ! argument "covariates" is missing, with no default
11b. Full Analysis with All Options Enabled
This demonstrates the maximum output: all analysis options, all plots, period analysis with cohort definition, age standardization, and an additive regression model.
relativesurvival(
data = relativesurvival_test,
time = "followup_years",
status = "vital_status",
age = "age_at_diagnosis",
sex = "sex",
year = "diagnosis_year",
covariates = c("cancer_site", "stage"),
ratetable = "us",
method = "poharperme",
time_scale = "years",
net_survival = TRUE,
excess_mortality = TRUE,
crude_probability = TRUE,
age_standardized = TRUE,
period_analysis = TRUE,
cohort_year = "2000-2015",
regression_model = "additive",
plot_observed = TRUE,
plot_expected = TRUE,
plot_relative = TRUE,
plot_excess = TRUE,
confidence_level = 0.95,
timepoints = "1, 3, 5, 10"
)
#> Error:
#> ! 'survexp.us' is not an exported object from 'namespace:relsurv'
Complete Option Reference
| # | Option | Type | Default | Range/Choices | Section Demonstrated |
|---|---|---|---|---|---|
| 1 | time | Variable (numeric) | – | continuous | 1. Basic Analysis |
| 2 | status | Variable (factor/numeric) | – | binary 0/1 | 1. Basic Analysis |
| 3 | age | Variable (numeric) | – | 0-120 | 1. Basic Analysis |
| 4 | sex | Variable (factor) | – | male/female mappings | 1. Basic Analysis |
| 5 | year | Variable (numeric) | – | 1900-2100 | 1. Basic Analysis |
| 6 | covariates | Variables (numeric/factor) | – | any | 8. Regression Models |
| 7 | ratetable | List | us | us/mn/fr/slovenia/turkey/germany/uk/italy/japan/spain/brazil/south_korea/china/india/custom | 3. Rate Tables |
| 8 | method | List | poharperme | poharperme/ederer1/ederer2/hakulinen | 2. Estimation Methods |
| 9 | time_scale | List | years | years/months/days | 9. Time Scales |
| 10 | net_survival | Bool | true | – | 4. Net Survival |
| 11 | excess_mortality | Bool | true | – | 4. Excess Mortality |
| 12 | crude_probability | Bool | true | – | 5. Crude Probabilities |
| 13 | age_standardized | Bool | false | – | 6. Age Standardization |
| 14 | period_analysis | Bool | false | – | 7. Period Analysis |
| 15 | cohort_year | String | "" | e.g., “2010-2015” | 7b. Cohort Definition |
| 16 | regression_model | List | none | none/additive/multiplicative/flexible | 8. Regression Models |
| 17 | spline_df | Integer | 4 | 1-10 | 8c. Flexible Parametric |
| 18 | plot_observed | Bool | true | – | 1. Basic Analysis |
| 19 | plot_expected | Bool | true | – | 1. Basic Analysis |
| 20 | plot_relative | Bool | true | – | 1. Basic Analysis |
| 21 | plot_excess | Bool | true | – | 4b. Excess Mortality |
| 22 | confidence_level | Number | 0.95 | 0.50-0.99 | 10b. Custom Confidence |
All 22 .a.yaml options are covered in the examples above.
References
- Pohar Perme M, Stare J, Esteve J. On Estimation in Relative Survival. Biometrics, 2012;68:113-120.
- Ederer F, Axtell LM, Cutler SJ. The Relative Survival Rate: A Statistical Methodology. NCI Monograph, 1961;6:101-121.
- Hakulinen T. Cancer Survival Corrected for Heterogeneity in Patient Withdrawal. Biometrics, 1982;38:933-942.
- Corazziari I, Quinn M, Capocaccia R. Standard cancer patient population for age standardising survival ratios. Eur J Cancer, 2004;40:2307-2316.
- Dickman PW, Coviello E. Estimating and Modeling Relative Survival. The Stata Journal, 2015;15(1):186-215.
- Lambert PC, Royston P. Further Development of Flexible Parametric Models for Survival Analysis. The Stata Journal, 2009;9(2):265-290.
- Cronin KA, Feuer EJ. Cumulative Cause-Specific Mortality for Cancer Patients in the Presence of Other Causes: A Crude Analogue of Relative Survival. Statistics in Medicine, 2000;19:1729-1740.
This vignette is part of the jsurvival module of the ClinicoPath jamovi package.