Overview

The leaveonecenterout function performs leave-one-center-out cross-validation (LOOCV) for internal-external validation of multi-institutional prediction models. It iteratively holds out each center as a test set, trains on the remaining centers, and evaluates predictive performance.

This implements the validation framework recommended by the TRIPOD guidelines (Collins et al., 2015; see also Debray et al., 2015).

Key features:

  • Three model types: logistic, Cox, and linear regression
  • Optional LASSO regularization with lambda selection (minimum CV error or 1SE rule)
  • Per-center and pooled performance metrics (AUC, C-index, R-squared)
  • Forest plot visualization of center-specific performance
  • Clinical notices for data quality, EPV, discrimination, and heterogeneity
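The hold-out loop described above can be sketched in base R. This is a minimal illustration on simulated data, not the package's implementation: the data frame, its column names (center, age, marker, y), and the rank-based auc() helper are all hypothetical.

```r
# Simulated multi-center data; all names here are illustrative assumptions.
set.seed(42)
n <- 300
dat <- data.frame(
  center = factor(sample(paste0("C", 1:5), n, replace = TRUE)),
  age    = rnorm(n, 60, 10),
  marker = rnorm(n)
)
dat$y <- rbinom(n, 1, plogis(-0.5 + 0.03 * (dat$age - 60) + 1.2 * dat$marker))

# Rank-based (Mann-Whitney) AUC; tied predictions receive average ranks.
auc <- function(y, p) {
  r  <- rank(p)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

centers    <- levels(dat$center)
per_center <- setNames(numeric(length(centers)), centers)
dat$pred   <- NA_real_

for (ctr in centers) {
  held_out <- dat$center == ctr
  # Train on every center except the held-out one
  fit <- glm(y ~ age + marker, family = binomial, data = dat[!held_out, ])
  # Predict for the held-out center only
  dat$pred[held_out] <- predict(fit, newdata = dat[held_out, ], type = "response")
  per_center[ctr] <- auc(dat$y[held_out], dat$pred[held_out])
}

# One way to pool: compute the AUC over all held-out predictions together
pooled <- auc(dat$y, dat$pred)
print(round(per_center, 3))
print(round(pooled, 3))
```

Every observation is predicted exactly once, by a model that never saw its center. Pooling the held-out predictions is only one possible definition of a pooled AUC; the package may combine center-level estimates differently.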

Test Data

Two bundled datasets are available:

library(ClinicoPath)
data(loocv_multicenter)  # N=200, 5 centers
data(loocv_small)        # N=45, 3 centers

1. Logistic Regression (Binary Outcome)

leaveonecenterout(
    data = loocv_multicenter,
    outcome = "treatment_response",
    outcomeLevel = "Responder",
    predictors = c("age", "ki67_score", "tumor_size_mm", "grade"),
    centerVariable = "institution",
    elapsedtime = "os_time",
    modelType = "logistic"
)

Outputs: Study design summary, per-center AUC with 95% CI, pooled AUC, forest plot, interpretation with heterogeneity assessment.

2. Cox Regression (Survival Outcome)

leaveonecenterout(
    data = loocv_multicenter,
    outcome = "os_status",
    outcomeLevel = "Dead",
    predictors = c("age", "ki67_score", "grade"),
    centerVariable = "institution",
    elapsedtime = "os_time",
    modelType = "cox"
)

Metric: C-index (concordance) with 95% CI per center, clamped to [0, 1].
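As a rough illustration of what the C-index measures, the sketch below counts concordant pairs directly in base R. The harrell_c() helper and the toy data are hypothetical, and the handling of tied event times is simplified; it is not the code the package runs.

```r
# Harrell's C: among usable pairs (the earlier time is an observed event),
# the proportion in which the subject who fails first has the higher risk score.
harrell_c <- function(time, status, risk) {
  concordant <- 0
  tied       <- 0
  usable     <- 0
  n <- length(time)
  for (i in seq_len(n)) {
    if (status[i] != 1) next          # subject i must have an observed event
    for (j in seq_len(n)) {
      if (time[i] < time[j]) {        # subject j is still at risk when i fails
        usable <- usable + 1
        if (risk[i] > risk[j]) {
          concordant <- concordant + 1
        } else if (risk[i] == risk[j]) {
          tied <- tied + 1            # tied risk scores count as half
        }
      }
    }
  }
  (concordant + 0.5 * tied) / usable
}

time   <- c(2, 4, 6, 8, 10)
status <- c(1, 1, 0, 1, 0)            # 1 = event, 0 = censored
risk   <- c(0.9, 0.4, 0.7, 0.5, 0.1)  # higher = predicted to fail sooner

c_index <- harrell_c(time, status, risk)
print(c_index)  # 6 concordant out of 8 usable pairs = 0.75
```

In practice a tested implementation such as concordance() from the survival package would normally be preferred over a hand-rolled loop.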

3. Linear Regression (Continuous Outcome)

leaveonecenterout(
    data = loocv_multicenter,
    outcome = "tumor_shrinkage",
    outcomeLevel = "Responder",
    predictors = c("age", "ki67_score", "tumor_size_mm"),
    centerVariable = "institution",
    elapsedtime = "os_time",
    modelType = "linear"
)

Metric: Out-of-sample R-squared per center. It can be negative when the model predicts worse than simply using the mean outcome. RMSE is also reported.
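To make the negative R-squared case concrete, here is a small base R sketch. The oos_metrics() helper is hypothetical, and it assumes the reference model is the mean of the held-out center's outcome; the package may use a different baseline.

```r
# Out-of-sample R-squared and RMSE for one held-out center.
# Assumption: the baseline is the mean of the observed held-out values.
oos_metrics <- function(observed, predicted) {
  sse <- sum((observed - predicted)^2)        # model error
  sst <- sum((observed - mean(observed))^2)   # error of the mean-only baseline
  c(r_squared = 1 - sse / sst,
    rmse      = sqrt(mean((observed - predicted)^2)))
}

observed <- c(10, 12, 14, 16)
good     <- oos_metrics(observed, c(11, 12, 13, 17))  # close to the truth
bad      <- oos_metrics(observed, c(16, 14, 12, 10))  # trend reversed

print(good)  # r_squared = 0.85
print(bad)   # r_squared = -3: worse than predicting the mean
```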

4. LASSO Regularization

LASSO (L1 penalty) performs automatic variable selection within each CV fold. Recommended when the number of predictors is large relative to the number of events.

Lambda selection: 1SE rule (more parsimonious)

leaveonecenterout(
    data = loocv_multicenter,
    outcome = "treatment_response",
    outcomeLevel = "Responder",
    predictors = c("age", "ki67_score", "tumor_size_mm", "ctdna_level",
                   "grade", "stage", "gender"),
    centerVariable = "institution",
    elapsedtime = "os_time",
    modelType = "logistic",
    useLasso = TRUE,
    lambdaMethod = "lambda.1se"
)

Lambda selection: Minimum CV error (less shrinkage)

leaveonecenterout(
    data = loocv_multicenter,
    outcome = "treatment_response",
    outcomeLevel = "Responder",
    predictors = c("age", "ki67_score", "tumor_size_mm", "ctdna_level",
                   "grade", "stage", "gender"),
    centerVariable = "institution",
    elapsedtime = "os_time",
    modelType = "logistic",
    useLasso = TRUE,
    lambdaMethod = "lambda.min"
)
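The option names lambda.1se and lambda.min match the values returned by cv.glmnet() in the glmnet package. Whether leaveonecenterout uses glmnet internally is an assumption here; the sketch below only shows how the two rules differ, using simulated data and a hypothetical n_nonzero() helper.

```r
library(glmnet)

# Simulated data: 10 candidate predictors, only x1 and x2 carry signal.
set.seed(1)
n <- 200
p <- 10
x <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
y <- rbinom(n, 1, plogis(1.5 * x[, 1] - 1.0 * x[, 2]))

# Inner cross-validation over the lambda path (alpha = 1 is the LASSO penalty)
cvfit <- cv.glmnet(x, y, family = "binomial", alpha = 1, nfolds = 5)

# Count non-zero coefficients (intercept excluded) at a given lambda rule
n_nonzero <- function(s) {
  sum(as.matrix(coef(cvfit, s = s))[-1, 1] != 0)
}

print(c(lambda.min = cvfit$lambda.min, lambda.1se = cvfit$lambda.1se))
print(c(min = n_nonzero("lambda.min"), `1se` = n_nonzero("lambda.1se")))
```

lambda.1se is the largest penalty whose CV error is within one standard error of the minimum, so it is never smaller than lambda.min and usually retains fewer predictors.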

LASSO Cox

leaveonecenterout(
    data = loocv_multicenter,
    outcome = "os_status",
    outcomeLevel = "Dead",
    predictors = c("age", "ki67_score", "tumor_size_mm", "grade"),
    centerVariable = "institution",
    elapsedtime = "os_time",
    modelType = "cox",
    useLasso = TRUE
)

Note: LASSO is not available for linear regression. If useLasso=TRUE with modelType="linear", a warning notice is displayed and standard OLS is used.

5. Output Options

Toggle pooled performance and forest plot

# Minimal output: per-center results only
leaveonecenterout(
    data = loocv_multicenter,
    outcome = "treatment_response",
    outcomeLevel = "Responder",
    predictors = c("age", "ki67_score"),
    centerVariable = "institution",
    elapsedtime = "os_time",
    pooledPerformance = FALSE,
    forestPlot = FALSE
)

# With calibration (Brier score)
leaveonecenterout(
    data = loocv_multicenter,
    outcome = "treatment_response",
    outcomeLevel = "Responder",
    predictors = c("age", "ki67_score", "grade"),
    centerVariable = "institution",
    elapsedtime = "os_time",
    calibrationCheck = TRUE
)
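For reference, the Brier score is the mean squared difference between predicted probabilities and the observed 0/1 outcome. A minimal base R sketch with made-up values (the brier_score() helper is hypothetical, not a package function):

```r
# Brier score: 0 is perfect; a constant prediction of 0.5 scores 0.25.
brier_score <- function(y, p) mean((p - y)^2)

y <- c(1, 0, 1, 0)          # observed outcomes
p <- c(0.9, 0.2, 0.6, 0.4)  # predicted probabilities

bs <- brier_score(y, p)
print(bs)                                  # 0.0925
print(brier_score(y, rep(0.5, length(y)))) # 0.25, uninformative baseline
```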

6. Small Dataset (Edge Cases)

data(loocv_small)

leaveonecenterout(
    data = loocv_small,
    outcome = "diagnosis",
    outcomeLevel = "Positive",
    predictors = c("age", "marker1", "marker2"),
    centerVariable = "center",
    elapsedtime = "surv_time",
    modelType = "logistic"
)

With 3 centers and small sample sizes, expect:

  • A warning notice for centers with fewer than 5 cases
  • Wider confidence intervals
  • Some centers may be skipped if their outcome shows no variation (all events or all non-events)
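The checks behind these notices can be approximated before running the analysis. The sketch below is an assumption about what such checks might look like, using a toy data frame with hypothetical column names. Only the 5-case threshold comes from the text above; the EPV definition (smaller outcome class divided by number of predictors) is a common convention, not necessarily the package's.

```r
# Toy data: three centers, binary outcome (1 = event).
toy <- data.frame(
  center = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C"),
  event  = c(1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1)
)
n_predictors <- 3

n_by_center      <- table(toy$center)
events_by_center <- tapply(toy$event, toy$center, sum)

center_summary <- data.frame(
  center = names(events_by_center),
  n      = as.integer(n_by_center),
  events = as.integer(events_by_center)
)
# Flag small centers and centers where the outcome does not vary
center_summary$small        <- center_summary$n < 5
center_summary$no_variation <- center_summary$events == 0 |
                               center_summary$events == center_summary$n

# Events per variable, based on the less frequent outcome class
epv <- min(sum(toy$event), sum(1 - toy$event)) / n_predictors

print(center_summary)
print(epv)
```

Here center B is flagged on both counts (4 cases, no events), and center C is flagged as small.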

Interpretation Guide

Discrimination Assessment

Metric range (AUC or C-index / R-squared)   Logistic/Cox   Linear
>= 0.80 / >= 0.50                           Good           Good
0.70-0.80 / 0.20-0.50                       Acceptable     Moderate
< 0.70 / < 0.20                             Poor           Poor
– / < 0.00                                  –              Very Poor
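The thresholds above can be expressed as a small lookup function. This is a sketch only: label_discrimination() is a hypothetical helper, and assigning boundary values (for example exactly 0.70) to the higher category is an assumption.

```r
# Map a performance value to the labels in the discrimination table.
label_discrimination <- function(value,
                                 model_type = c("logistic", "cox", "linear")) {
  model_type <- match.arg(model_type)
  if (model_type == "linear") {
    # R-squared thresholds
    if (value >= 0.50) "Good"
    else if (value >= 0.20) "Moderate"
    else if (value >= 0) "Poor"
    else "Very Poor"
  } else {
    # AUC (logistic) or C-index (Cox) thresholds
    if (value >= 0.80) "Good"
    else if (value >= 0.70) "Acceptable"
    else "Poor"
  }
}

print(label_discrimination(0.83, "logistic"))  # "Good"
print(label_discrimination(0.72, "cox"))       # "Acceptable"
print(label_discrimination(-0.10, "linear"))   # "Very Poor"
```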

Heterogeneity

SD range     Interpretation
<= 0.05      Low heterogeneity: consistent performance
0.05-0.10    Moderate: some variation across centers
> 0.10       High: consider center-level covariates or stratification
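A short sketch of how these bands might be applied. It assumes the SD is the sample standard deviation of the center-specific metric; the text does not state this. The AUC values and the label_heterogeneity() helper are made up for illustration.

```r
# Classify between-center spread using the SD bands from the table.
label_heterogeneity <- function(metric_sd) {
  if (metric_sd <= 0.05) "Low"
  else if (metric_sd <= 0.10) "Moderate"
  else "High"
}

# Hypothetical per-center AUCs from a leave-one-center-out run
center_auc <- c(C1 = 0.78, C2 = 0.74, C3 = 0.81, C4 = 0.69, C5 = 0.76)

auc_sd <- sd(center_auc)   # sample SD (n - 1 denominator)
print(round(auc_sd, 3))
print(label_heterogeneity(auc_sd))
```

With only a handful of centers the SD itself is estimated imprecisely, so the label is best read as a rough guide.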

Notice Types

  • Warning (blue): Data quality concerns (small centers, LASSO+linear)
  • Strong Warning (yellow): Results may be unreliable (low EPV, poor discrimination)
  • Info (green): Confirmations (event level, completion summary)

References

  • Debray TPA et al. A new framework to enhance the interpretation of external validation studies of clinical prediction models. J Clin Epidemiol 2015;68:279-289.
  • Riley RD et al. Minimum sample size for developing a multivariable prediction model. Stat Med 2019;38:1262-1275.
  • Collins GS et al. Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD). Ann Intern Med 2015;162:55-63.