Analyzing Diagnostic Tests without a Gold Standard
ClinicoPath Development Team
2025-06-03
Source: vignettes/nogoldstandard.Rmd
Introduction
In clinical practice and research, evaluating diagnostic test
performance typically requires a “gold standard” reference test to
determine the true disease status. However, in many situations, a
perfect gold standard test is unavailable, expensive, invasive, or
unethical to perform. The nogoldstandard module in the ClinicoPath package provides methods for analyzing the performance of multiple diagnostic tests without requiring a gold standard reference.
This vignette demonstrates how to use the nogoldstandard module to:
- Estimate disease prevalence
- Calculate sensitivity and specificity for multiple tests
- Compute confidence intervals using bootstrap methods
- Visualize test agreement patterns
Why analyze tests without a gold standard?
When no perfect reference test exists, researchers typically face a choice among problematic options:
- Use an imperfect reference (introducing verification bias)
- Employ composite reference standards (potentially arbitrary)
- Exclude subjects without verification (introducing selection bias)
Statistical methods for “no gold standard” analysis offer an alternative approach, allowing estimation of test performance metrics by using the patterns of agreement and disagreement among multiple imperfect tests.
Methods Overview
The nogoldstandard module implements five different approaches:
1. Latent Class Analysis (LCA): Assumes disease status is an unobserved (latent) variable and models the relationship between this latent variable and the observed test results. In the standard single-population setting, at least three tests are needed for the model to be identifiable.
2. Composite Reference: Creates a reference standard by taking the majority result across all tests as the "true" status (see the sketch after this list).
3. All Tests Positive: Considers disease present only when all tests are positive (high-specificity approach).
4. Any Test Positive: Considers disease present when any test is positive (high-sensitivity approach).
5. Bayesian Analysis: Uses prior distributions and an EM algorithm to estimate parameters, potentially incorporating prior knowledge.
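As a minimal illustration of the three simple reference standards (items 2-4 above), the sketch below derives them from a logical matrix of test results. The matrix and object names here are invented for the example and are not the module's internals:
# Illustration only (not module internals): each column is one test,
# TRUE means a positive result
tests <- cbind(t1 = c(TRUE, TRUE, FALSE),
               t2 = c(TRUE, FALSE, FALSE),
               t3 = c(FALSE, TRUE, FALSE))
composite <- rowMeans(tests) > 0.5          # majority of tests positive
all_pos   <- rowSums(tests) == ncol(tests)  # every test positive
any_pos   <- rowSums(tests) > 0             # at least one test positive
cbind(composite, all_pos, any_pos)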
Installation
The nogoldstandard module is part of the ClinicoPath package:
# Install from CRAN
install.packages("ClinicoPath")
# Or the development version
# install.packages("devtools")
devtools::install_github("sbalci/ClinicoPath")
Basic Usage
Example Data
Let’s create a sample dataset with 4 diagnostic tests performed on 200 patients:
set.seed(123)
n <- 200 # Number of patients
# True disease status (unknown in practice)
true_status <- rbinom(n, 1, 0.3) # 30% prevalence
# Create 4 imperfect tests with different sensitivity/specificity
test1 <- ifelse(true_status == 1,
rbinom(n, 1, 0.90), # 90% sensitivity
rbinom(n, 1, 0.10)) # 90% specificity
test2 <- ifelse(true_status == 1,
rbinom(n, 1, 0.75), # 75% sensitivity
rbinom(n, 1, 0.05)) # 95% specificity
test3 <- ifelse(true_status == 1,
rbinom(n, 1, 0.85), # 85% sensitivity
rbinom(n, 1, 0.15)) # 85% specificity
test4 <- ifelse(true_status == 1,
rbinom(n, 1, 0.70), # 70% sensitivity
rbinom(n, 1, 0.05)) # 95% specificity
# Convert to categorical format
test1 <- factor(ifelse(test1 == 1, "pos", "neg"))
test2 <- factor(ifelse(test2 == 1, "pos", "neg"))
test3 <- factor(ifelse(test3 == 1, "pos", "neg"))
test4 <- factor(ifelse(test4 == 1, "pos", "neg"))
# Create data frame
data <- data.frame(
caseID = 1:n,
test1 = test1,
test2 = test2,
test3 = test3,
test4 = test4
)
# View the first few rows
head(data)
#>   caseID test1 test2 test3 test4
#> 1      1   pos   neg   pos   neg
#> 2      2   neg   pos   pos   pos
#> 3      3   pos   neg   neg   neg
#> 4      4   pos   pos   neg   pos
#> 5      5   pos   pos   pos   pos
#> 6      6   neg   pos   pos   neg
Running the Analysis
Now let’s analyze this data using the Latent Class Analysis method:
library(ClinicoPath)
result <- nogoldstandard(
data = data,
test1 = "test1",
test1Positive = "pos",
test2 = "test2",
test2Positive = "pos",
test3 = "test3",
test3Positive = "pos",
test4 = "test4",
test4Positive = "pos",
test5 = NULL,
method = "latent_class"
)
The result will contain:
- Estimated disease prevalence
- Sensitivity and specificity for each test
- A test agreement matrix visualization (an example of extracting these tables follows)
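The tables in the result object can be pulled into plain data frames with asDF(), the accessor used later in this vignette; the prevalence table name below is taken from that later example:
# Extract the prevalence table as a data frame
prev_df <- result$prevalence$asDF()
prev_df$estimate[1]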
Bootstrap Confidence Intervals
One of the key features of the nogoldstandard module is the ability to compute confidence intervals using bootstrap resampling. This provides a measure of uncertainty around the estimated parameters.
Understanding Bootstrap in NoGoldStandard Analysis
Bootstrap resampling involves the following steps (a generic sketch follows the list):
1. Randomly selecting observations from the original dataset with replacement
2. Running the selected analysis method on each resampled dataset
3. Computing the parameter of interest (prevalence, sensitivity, specificity)
4. Repeating steps 1-3 many times to build a distribution
5. Using the distribution to calculate confidence intervals
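Conceptually, the procedure looks like this generic percentile-bootstrap sketch; estimate_prevalence() is a hypothetical stand-in for whichever analysis method is selected, not a function exported by ClinicoPath:
# Generic percentile-bootstrap CI; estimate_prevalence() is a
# hypothetical placeholder for the selected analysis method
boot_ci <- function(data, estimate_prevalence, nboot = 1000, alpha = 0.05) {
  stats <- replicate(nboot, {
    resampled <- data[sample(nrow(data), replace = TRUE), ]
    estimate_prevalence(resampled)
  })
  quantile(stats, probs = c(alpha / 2, 1 - alpha / 2), na.rm = TRUE)
}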
Enabling Bootstrap with Progress Reporting
The enhanced version of the nogoldstandard module includes detailed progress reporting during bootstrap analysis:
result_with_ci <- nogoldstandard(
data = data,
test1 = "test1",
test1Positive = "pos",
test2 = "test2",
test2Positive = "pos",
test3 = "test3",
test3Positive = "pos",
test4 = "test4",
test4Positive = "pos",
method = "latent_class",
bootstrap = TRUE, # Enable bootstrap
nboot = 1000, # Number of bootstrap samples
alpha = 0.05 # For 95% confidence intervals
)
When running this analysis, you’ll see progress updates in the console:
=== Bootstrap Analysis ===
Starting bootstrap with 1000 iterations for latent_class method
Estimating confidence intervals for prevalence
50/1000 (5.0%) - 50 successful, 0 errors - 12.3 sec elapsed, ~234.7 sec remaining
100/1000 (10.0%) - 100 successful, 0 errors - 24.8 sec elapsed, ~223.2 sec remaining
...
950/1000 (95.0%) - 942 successful, 8 errors - 236.5 sec elapsed, ~12.5 sec remaining
1000/1000 (100.0%) - 991 successful, 9 errors - 249.3 sec elapsed, ~0.0 sec remaining
=== Bootstrap Complete ===
Total time: 249.3 seconds (4.01 iterations/sec)
Successful iterations: 991 (99.1%)
Failed iterations: 9 (0.9%)
Confidence interval (95.0%): [0.2145, 0.2987]
Expected Duration for Bootstrap Analysis
The time required for bootstrap analysis depends on several factors:
| Dataset Size | 100 Iterations | 1,000 Iterations | 10,000 Iterations |
|---|---|---|---|
| Small (<100 obs, 2-3 tests) | 5-30 sec | 30 sec - 5 min | 5-50 min |
| Medium (100-1,000 obs, 3-4 tests) | 30 sec - 2 min | 3-20 min | 30 min - 3 hrs |
| Large (>1,000 obs, 5 tests) | 1-5 min | 10-60 min | 1-8 hrs |
The analysis method also affects duration:
- Latent Class Analysis: slowest (multiple model-fitting attempts)
- Bayesian Analysis: moderate (iterative EM algorithm)
- Composite/All/Any Test: fastest (simple calculations)
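To budget wall-clock time for your own data, one option is to time a single un-bootstrapped run and multiply by the planned number of iterations; this ignores per-iteration variation, so treat it as a rough estimate:
# Time one plain (non-bootstrap) run of the chosen method
one_run <- system.time(
  nogoldstandard(
    data = data,
    test1 = "test1", test1Positive = "pos",
    test2 = "test2", test2Positive = "pos",
    test3 = "test3", test3Positive = "pos",
    test4 = "test4", test4Positive = "pos",
    method = "latent_class"
  )
)
# Rough projection for 1,000 bootstrap iterations, in seconds
unname(one_run["elapsed"]) * 1000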
Comparing Different Methods
Let’s compare the results from different analysis methods:
# Run analysis with each method
methods <- c("latent_class", "composite", "all_positive", "any_positive", "bayesian")
results <- list()
for (method in methods) {
results[[method]] <- nogoldstandard(
data = data,
test1 = "test1",
test1Positive = "pos",
test2 = "test2",
test2Positive = "pos",
test3 = "test3",
test3Positive = "pos",
test4 = "test4",
test4Positive = "pos",
method = method
)
}
# Extract prevalence estimates
prevalence_estimates <- sapply(results, function(x) {
x$prevalence$asDF()$estimate[1]
})
print(prevalence_estimates)
Different methods may produce different estimates. In general:
- Latent Class Analysis: Often considered the most theoretically sound but requires assumptions about conditional independence
- Composite Reference: Practical, but can be biased toward the tests that form the majority
- All Tests Positive: Conservative approach (low prevalence, high test specificity)
- Any Test Positive: Liberal approach (high prevalence, high test sensitivity)
- Bayesian Analysis: Flexible and can incorporate prior knowledge, but requires careful prior specification
Interpreting Results
Disease Prevalence
The estimated prevalence represents the proportion of the population expected to have the disease. Different methods will yield different prevalence estimates:
- All Tests Positive: Typically produces the lowest prevalence estimate
- Any Test Positive: Typically produces the highest prevalence estimate
- Latent Class Analysis: Usually produces an intermediate estimate
Test Performance
For each test, the module estimates:
- Sensitivity: The probability of a positive test result in patients with the disease
- Specificity: The probability of a negative test result in patients without the disease
Sensitivity and specificity can be used to:
- Compare test performance
- Inform test selection decisions
- Design optimal testing strategies (e.g., by computing predictive values, as sketched below)
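Once prevalence, sensitivity, and specificity have been estimated, the standard Bayes'-theorem identities give positive and negative predictive values. The helper below is an illustration, not part of the module:
# Illustrative helper (not part of the module): predictive values
# from prevalence, sensitivity, and specificity via Bayes' theorem
predictive_values <- function(prev, sens, spec) {
  ppv <- (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))
  npv <- (spec * (1 - prev)) / (spec * (1 - prev) + (1 - sens) * prev)
  c(PPV = ppv, NPV = npv)
}
predictive_values(prev = 0.30, sens = 0.90, spec = 0.90)
#>       PPV       NPV 
#> 0.7941176 0.9545455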
Advanced Topics
Handling Missing Data
The nogoldstandard module can handle missing test results. The enhanced implementation:
- Skips missing values when calculating agreement (illustrated in the sketch below)
- Uses available data for each test when estimating parameters
- Properly accounts for missing data in bootstrap resampling
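For instance, pairwise agreement between two tests can be computed on only the observations where both results are available. This small helper is an illustration, not the module's internal code:
# Illustration only: pairwise-complete agreement between two tests;
# rows with NA in either test are dropped before comparing
agreement <- function(x, y) {
  ok <- !is.na(x) & !is.na(y)
  mean(x[ok] == y[ok])
}
agreement(data$test1, data$test2)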
Bayesian Analysis with Prior Information
If you have prior knowledge about disease prevalence or test characteristics, the Bayesian method can incorporate this information:
# The module does not currently expose prior-specification options;
# incorporating informative priors would require customizing the
# Bayesian method code.
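As a hedged illustration of the underlying idea (this is not the module's API): with a Beta(a, b) prior on prevalence, the EM prevalence update can be replaced by a maximum a posteriori (MAP) step that shrinks the estimate toward the prior mean a / (a + b):
# Illustration only (not the module's API): MAP update for prevalence
# under a Beta(a, b) prior, given each subject's posterior probability
# of disease from the E-step: (sum(p_post) + a - 1) / (n + a + b - 2)
prev_map <- function(p_post, a = 2, b = 8) {
  (sum(p_post) + a - 1) / (length(p_post) + a + b - 2)
}
# Example: 200 posterior probabilities centred near 0.3
set.seed(1)
p_post <- rbeta(200, 3, 7)
prev_map(p_post)  # pulled slightly toward the prior mean of 0.2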
Conditional Dependence
The default Latent Class Analysis assumes conditional independence between tests given the true disease status. If this assumption is violated, results may be biased. Extensions to handle conditional dependence include:
- Including direct test-to-test associations in the model
- Using more than two latent classes
- Applying hierarchical latent class models
Recommendations for Practice
Method Selection
- Start with Latent Class Analysis as the primary method
- Compare with composite methods to check robustness
- Consider Bayesian analysis if prior information is available
- Report results from multiple methods for transparency
Technical Implementation
The Enhanced Bootstrap Function
Below is the implementation of the bootstrap function with progress reporting:
.calculateBootstrapCI = function(data, method, nboot, alpha, type, test_index = NULL) {
# Simple bootstrap implementation with progress indicators
n <- nrow(data)
boot_results <- numeric(nboot)
# Show starting message
message("\n=== Bootstrap Analysis ===")
message(sprintf("Starting bootstrap with %d iterations for %s method", nboot, method))
message(sprintf("Estimating confidence intervals for %s", type))
if (!is.null(test_index)) {
message(sprintf("Test index: %d", test_index))
}
# Progress tracking variables
start_time <- Sys.time()
last_update <- start_time
update_interval <- max(1, floor(nboot / 20)) # Update ~20 times during process
success_count <- 0
error_count <- 0
for (b in 1:nboot) {
# Resample data
boot_indices <- sample(n, n, replace = TRUE)
boot_data <- data[boot_indices, ]
# Run analysis on bootstrap sample
boot_result <- NULL
tryCatch({
if (method == "latent_class") {
boot_result <- private$.runLCA(boot_data, names(data), NULL)
} else if (method == "composite") {
boot_result <- private$.runComposite(boot_data)
} else if (method == "all_positive") {
boot_result <- private$.runAllPositive(boot_data)
} else if (method == "any_positive") {
boot_result <- private$.runAnyPositive(boot_data)
} else if (method == "bayesian") {
boot_result <- private$.runBayesian(boot_data)
}
success_count <- success_count + 1
}, error = function(e) {
# Count errors but continue the bootstrap; <<- is needed so the
# counter in the enclosing function is updated, since a plain <-
# would only create a copy local to this handler
error_count <<- error_count + 1
})
# Extract relevant statistic
if (!is.null(boot_result)) {
if (type == "prevalence") {
boot_results[b] <- boot_result$prevalence
} else if (type == "sensitivity" && !is.null(test_index)) {
boot_results[b] <- boot_result$sensitivities[test_index]
} else if (type == "specificity" && !is.null(test_index)) {
boot_results[b] <- boot_result$specificities[test_index]
}
} else {
boot_results[b] <- NA
}
# Show progress updates
current_time <- Sys.time()
if (b %% update_interval == 0 || b == nboot ||
as.numeric(difftime(current_time, last_update, units = "secs")) > 10) {
elapsed <- as.numeric(difftime(current_time, start_time, units = "secs"))
percent_done <- b / nboot * 100
est_total <- elapsed / percent_done * 100
est_remaining <- est_total - elapsed
message(sprintf(" %d/%d (%.1f%%) - %d successful, %d errors - %.1f sec elapsed, ~%.1f sec remaining",
b, nboot, percent_done, success_count, error_count,
elapsed, est_remaining))
last_update <- current_time
}
}
# Show final statistics
total_time <- as.numeric(difftime(Sys.time(), start_time, units = "secs"))
message("\n=== Bootstrap Complete ===")
message(sprintf("Total time: %.1f seconds (%.2f iterations/sec)",
total_time, nboot/total_time))
message(sprintf("Successful iterations: %d (%.1f%%)",
success_count, success_count/nboot*100))
message(sprintf("Failed iterations: %d (%.1f%%)",
error_count, error_count/nboot*100))
# Calculate percentile CI
boot_results <- boot_results[!is.na(boot_results)]
if (length(boot_results) > 0) {
ci <- quantile(boot_results, c(alpha/2, 1-alpha/2), na.rm=TRUE)
message(sprintf("Confidence interval (%.1f%%): [%.4f, %.4f]",
(1-alpha)*100, ci[1], ci[2]))
return(list(lower = ci[1], upper = ci[2]))
} else {
message("WARNING: No valid bootstrap results obtained. Returning NA.")
return(list(lower = NA, upper = NA))
}
}
References
Hui SL, Walter SD. Estimating the error rates of diagnostic tests. Biometrics. 1980;36(1):167-171.
Joseph L, Gyorkos TW, Coupal L. Bayesian estimation of disease prevalence and the parameters of diagnostic tests in the absence of a gold standard. Am J Epidemiol. 1995;141(3):263-272.
Albert PS, Dodd LE. A cautionary note on the robustness of latent class models for estimating diagnostic error without a gold standard. Biometrics. 2004;60(2):427-435.
Collins J, Huynh M. Estimation of diagnostic test accuracy without full verification: a review of latent class methods. Stat Med. 2014;33(24):4141-4169.
Dendukuri N, Joseph L. Bayesian approaches to modeling the conditional dependence between multiple diagnostic tests. Biometrics. 2001;57(1):158-167.
Session Information
sessionInfo()
#> R version 4.3.2 (2023-10-31)
#> Platform: aarch64-apple-darwin20 (64-bit)
#> Running under: macOS 15.5
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> time zone: Europe/Istanbul
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> loaded via a namespace (and not attached):
#> [1] digest_0.6.37 desc_1.4.3 R6_2.6.1 fastmap_1.2.0
#> [5] xfun_0.52 cachem_1.1.0 knitr_1.50 htmltools_0.5.8.1
#> [9] rmarkdown_2.29 lifecycle_1.0.4 cli_3.6.5 sass_0.4.10
#> [13] pkgdown_2.1.3 textshaping_1.0.1 jquerylib_0.1.4 systemfonts_1.2.3
#> [17] compiler_4.3.2 rstudioapi_0.17.1 tools_4.3.2 ragg_1.4.0
#> [21] bslib_0.9.0 evaluate_1.0.3 yaml_2.3.10 jsonlite_2.0.0
#> [25] rlang_1.1.6 fs_1.6.6 htmlwidgets_1.6.4