Vignette: Contingency Tables and Association Tests with ClinicoPath
Analysis by Claude
2025-07-13
Source: vignettes/clinicopath-descriptives-19-contingency-tables.Rmd
Introduction
The contTables function in ClinicoPath provides contingency table
analysis for examining associations between categorical variables. It
performs χ² tests of association and reports measures of association,
including the phi coefficient, Cramér’s V, odds ratios, relative risk,
and ordinal measures such as Gamma and Kendall’s tau-b.
When to Use Contingency Tables
Contingency tables are used to:
- Test independence between two categorical variables
- Examine associations in 2×2 tables (e.g., exposure vs. outcome)
- Analyze multi-way tables with stratification variables
- Calculate measures of association strength
- Perform exact tests for small samples
# Load required libraries
library(ClinicoPath)
# Use the histopathology dataset included with ClinicoPath
# For demonstration, ensure we have the data
if (!exists("histopathology")) {
  data(histopathology, package = "ClinicoPath")
}
# Examine the structure of relevant categorical variables
str(histopathology[c("Sex", "Mortality5yr", "Grade", "TStage", "Group", "LVI", "PNI")])
Basic 2×2 Contingency Table
Let’s start with a basic 2×2 table examining the association between sex and 5-year mortality:
# Basic contingency table analysis
basic_result <- contTables(
  data = histopathology,
  rows = "Sex",
  cols = "Mortality5yr",
  chiSq = TRUE,
  fisher = TRUE,
  obs = TRUE,
  pcRow = TRUE,
  pcCol = TRUE
)
# Display basic result structure
print(basic_result)
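To make the observed counts and the row and column percentages concrete, the sketch below rebuilds them with base R only. The `toy` data frame and its values are hypothetical and are not taken from the histopathology dataset; the code illustrates the quantities, not how contTables computes them internally.

```r
# Hypothetical individual-level data (NOT the histopathology dataset)
toy <- data.frame(
  sex     = factor(c(rep("Female", 50), rep("Male", 50))),
  outcome = factor(c(rep(c("Alive", "Dead"), times = c(35, 15)),
                     rep(c("Alive", "Dead"), times = c(25, 25))))
)

# Observed counts: rows = sex, columns = outcome
obs_tab <- table(toy$sex, toy$outcome)

# Row percentages: each row sums to 100
row_pct <- 100 * prop.table(obs_tab, margin = 1)

# Column percentages: each column sums to 100
col_pct <- 100 * prop.table(obs_tab, margin = 2)

# Expected counts under independence: (row total * column total) / n
exp_tab <- outer(rowSums(obs_tab), colSums(obs_tab)) / sum(obs_tab)

print(obs_tab)
print(round(row_pct, 1))
print(round(col_pct, 1))
print(exp_tab)
```

Row percentages answer "what share of each sex died?", while column percentages answer "what share of each outcome group is female?", so the choice depends on which variable is treated as the exposure.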
For 2×2 tables, we can also calculate odds ratios and relative risk:
# 2×2 table with comparative measures
measures_result <- contTables(
  data = histopathology,
  rows = "Sex",
  cols = "Mortality5yr",
  chiSq = TRUE,
  fisher = TRUE,
  odds = TRUE,
  logOdds = TRUE,
  relRisk = TRUE,
  ci = TRUE,
  ciWidth = 95,
  obs = TRUE,
  exp = TRUE,
  pcRow = TRUE,
  pcCol = TRUE,
  pcTot = TRUE
)
# Display measures result
print(measures_result)
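As a rough illustration of what the odds ratio and relative risk measure, the following base R sketch computes both from a hypothetical 2×2 table, with Wald-type 95% confidence intervals on the log scale. The counts are invented, and the use of Wald intervals is an assumption made for this example; the interval method used by contTables may differ.

```r
# Hypothetical 2x2 counts: rows = exposure, columns = outcome
tab <- matrix(c(20, 10, 30, 40), nrow = 2,
              dimnames = list(exposure = c("Exposed", "Unexposed"),
                              outcome  = c("Event", "No event")))
n11 <- tab[1, 1]; n12 <- tab[1, 2]
n21 <- tab[2, 1]; n22 <- tab[2, 2]

# Odds ratio: odds of the event in exposed vs. unexposed
odds_ratio <- (n11 * n22) / (n12 * n21)

# Relative risk: risk of the event in exposed vs. unexposed
rel_risk <- (n11 / (n11 + n12)) / (n21 / (n21 + n22))

# Wald standard errors on the log scale (an assumed method for this sketch)
se_log_or <- sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
se_log_rr <- sqrt(1 / n11 - 1 / (n11 + n12) + 1 / n21 - 1 / (n21 + n22))

z <- qnorm(0.975)  # two-sided 95% interval
or_ci <- exp(log(odds_ratio) + c(-1, 1) * z * se_log_or)
rr_ci <- exp(log(rel_risk) + c(-1, 1) * z * se_log_rr)

cat("Odds ratio:", round(odds_ratio, 2), "CI:", round(or_ci, 2), "\n")
cat("Relative risk:", round(rel_risk, 2), "CI:", round(rr_ci, 2), "\n")
```

Both measures depend on which row and column are treated as the reference, so it is worth checking the factor level order before interpreting the direction of an effect.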
Larger Contingency Tables
For variables with more than two categories, we can examine association patterns:
# Multi-category contingency table
multi_result <- contTables(
  data = histopathology,
  rows = "Grade",
  cols = "TStage",
  chiSq = TRUE,
  likeRat = TRUE,
  contCoef = TRUE,
  phiCra = TRUE,
  obs = TRUE,
  exp = TRUE,
  pcRow = TRUE,
  pcCol = TRUE
)
# Display multi-category result
print(multi_result)
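For tables larger than 2×2, Cramér’s V and the contingency coefficient are both derived from the Pearson χ² statistic. The sketch below shows the standard formulas in base R, using the built-in HairEyeColor table collapsed over sex rather than the histopathology data; it is meant as an illustration of the definitions, not of the package internals.

```r
# Collapse the 3-way HairEyeColor table to a 4 x 4 Hair x Eye table
hair_eye <- margin.table(HairEyeColor, margin = c(1, 2))

# Pearson chi-squared test of independence
test <- chisq.test(hair_eye)
chi2 <- unname(test$statistic)
n    <- sum(hair_eye)
k    <- min(dim(hair_eye))   # smaller of the number of rows and columns

# Cramer's V: chi-squared rescaled to the 0-1 range
cramers_v <- sqrt(chi2 / (n * (k - 1)))

# Pearson's contingency coefficient: always below 1
cont_coef <- sqrt(chi2 / (chi2 + n))

cat("Cramer's V:", round(cramers_v, 3), "\n")
cat("Contingency coefficient:", round(cont_coef, 3), "\n")
```

Because V is rescaled by the table dimensions, it is easier to compare across tables of different sizes than the raw χ² statistic.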
Ordinal Association Measures
When dealing with ordinal variables, we can use measures that account for ordering:
# Ordinal measures for Grade (ordinal) vs TStage (ordinal)
ordinal_result <- contTables(
  data = histopathology,
  rows = "Grade",
  cols = "TStage",
  gamma = TRUE,
  taub = TRUE,
  ci = TRUE,
  ciWidth = 95,
  obs = TRUE
)
# Display ordinal result
print(ordinal_result)
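Goodman and Kruskal’s Gamma compares the number of concordant and discordant pairs of observations and ignores ties. The helper below, `gk_gamma`, is a hypothetical function written for this illustration (it is not part of ClinicoPath), and the 3×3 table is invented. It assumes the rows and columns of the table are already in their natural ascending order.

```r
# Hypothetical helper: Goodman-Kruskal gamma from a two-way table
# whose rows and columns are both in ascending order
gk_gamma <- function(tab) {
  tab <- as.matrix(tab)
  nr <- nrow(tab)
  nc <- ncol(tab)
  concordant <- 0
  discordant <- 0
  for (i in seq_len(nr)) {
    for (j in seq_len(nc)) {
      # Concordant pairs: partner is higher on both variables
      if (i < nr && j < nc) {
        concordant <- concordant + tab[i, j] * sum(tab[(i + 1):nr, (j + 1):nc])
      }
      # Discordant pairs: partner is higher on rows but lower on columns
      if (i < nr && j > 1) {
        discordant <- discordant + tab[i, j] * sum(tab[(i + 1):nr, 1:(j - 1)])
      }
    }
  }
  (concordant - discordant) / (concordant + discordant)
}

# Invented 3 x 3 ordinal table (e.g., grade 1-3 by stage 1-3)
grade_stage_toy <- matrix(c(20, 10,  5,
                            10, 20, 10,
                             5, 10, 20), nrow = 3, byrow = TRUE)

gamma_toy <- gk_gamma(grade_stage_toy)
print(gamma_toy)
```

Kendall’s tau-b uses the same concordant-minus-discordant numerator but adjusts the denominator for ties, so it is typically smaller in magnitude than Gamma when the table has many tied pairs.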
Stratified Analysis
We can stratify our analysis by additional variables using layers:
# Stratified analysis by treatment group
stratified_result <- contTables(
  data = histopathology,
  rows = "Sex",
  cols = "Mortality5yr",
  layers = "Group",
  chiSq = TRUE,
  fisher = TRUE,
  odds = TRUE,
  ci = TRUE,
  obs = TRUE,
  pcRow = TRUE
)
# Display stratified result
print(stratified_result)
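To show why stratification matters, the sketch below computes a pooled odds ratio and stratum-specific odds ratios in base R. It uses the built-in UCBAdmissions table (Admit × Gender × Dept) instead of the histopathology data, and `odds_ratio_2x2` is a hypothetical helper defined only for this example.

```r
# Hypothetical helper: odds ratio of a 2 x 2 table
odds_ratio_2x2 <- function(tab) {
  (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
}

# Pooled analysis: collapse over the stratification variable (Dept)
pooled_tab <- margin.table(UCBAdmissions, margin = c(1, 2))
pooled_or  <- odds_ratio_2x2(pooled_tab)

# Stratified analysis: one odds ratio per department
stratum_or <- apply(UCBAdmissions, 3, odds_ratio_2x2)

cat("Pooled odds ratio:", round(pooled_or, 2), "\n")
print(round(stratum_or, 2))
```

When the pooled estimate and the stratum-specific estimates disagree, the stratification variable is a plausible confounder or effect modifier, and the pooled table alone can be misleading.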
Working with Count Data
The function can also handle aggregated count data, where each row represents a combination of categories and a separate column holds its frequency, such as the classic HairEyeColor dataset:
# Example with HairEyeColor data (if available)
if (require(datasets, quietly = TRUE)) {
  data(HairEyeColor)
  hair_eye_data <- as.data.frame(HairEyeColor)
  # Using count data
  count_result <- contTables(
    data = hair_eye_data,
    rows = "Hair",
    cols = "Eye",
    counts = "Freq",
    chiSq = TRUE,
    contCoef = TRUE,
    phiCra = TRUE,
    obs = TRUE
  )
  # Alternative: using formula interface
  formula_result <- contTables(
    formula = Freq ~ Hair:Eye,
    data = hair_eye_data,
    chiSq = TRUE
  )
  head(hair_eye_data)
}
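The relationship between aggregated counts and individual-level data can be checked in base R. The sketch below builds the same Hair × Eye table twice: once from the `Freq` column with `xtabs`, and once after expanding the counts to one row per person. It uses only base R functions and does not call contTables.

```r
hair_eye_df <- as.data.frame(HairEyeColor)

# Table built directly from aggregated counts
from_counts <- xtabs(Freq ~ Hair + Eye, data = hair_eye_df)

# Expand to one row per individual by repeating each row Freq times
expanded <- hair_eye_df[rep(seq_len(nrow(hair_eye_df)),
                            times = hair_eye_df$Freq),
                        c("Hair", "Eye", "Sex")]

# Table built from the individual-level rows
from_rows <- table(expanded$Hair, expanded$Eye)

# Both routes give the same cell counts
print(from_counts)
print(all(as.vector(from_counts) == as.vector(from_rows)))
```

Supplying a counts column is usually preferable to expanding the data, since it avoids creating a large data frame just to tabulate it again.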
Multiple Association Tests
For a comprehensive analysis, we can request multiple association measures:
# Comprehensive analysis
comprehensive_result <- contTables(
  data = histopathology,
  rows = "LVI",
  cols = "PNI",
  chiSq = TRUE,
  chiSqCorr = TRUE,
  likeRat = TRUE,
  fisher = TRUE,
  contCoef = TRUE,
  phiCra = TRUE,
  odds = TRUE,
  relRisk = TRUE,
  ci = TRUE,
  ciWidth = 95,
  obs = TRUE,
  exp = TRUE,
  pcRow = TRUE,
  pcCol = TRUE,
  pcTot = TRUE
)
# Display comprehensive result
print(comprehensive_result)
Interpreting Results
Chi-square Tests
- χ²: Tests the null hypothesis of independence
- χ² with continuity correction: Yates’ correction for 2×2 tables
- Likelihood ratio (G²): Alternative to Pearson’s χ² based on the ratio of observed to expected counts on the log scale
- Fisher’s exact test: Exact p-value for small samples
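The four tests listed above can be compared side by side in base R. The sketch below uses an invented 2×2 table and computes the likelihood ratio statistic by hand from the observed and expected counts; it illustrates the definitions and is not a description of how contTables implements them.

```r
# Invented 2 x 2 table of counts
tab <- matrix(c(12, 5, 6, 14), nrow = 2,
              dimnames = list(marker  = c("Positive", "Negative"),
                              outcome = c("Event", "No event")))

# Pearson chi-squared without and with Yates' continuity correction
pearson <- chisq.test(tab, correct = FALSE)
yates   <- chisq.test(tab, correct = TRUE)

# Likelihood ratio statistic: G2 = 2 * sum(observed * log(observed / expected))
# (assumes no zero cells, as in this table)
expected <- pearson$expected
g2   <- 2 * sum(tab * log(tab / expected))
g2_p <- pchisq(g2, df = pearson$parameter, lower.tail = FALSE)

# Fisher's exact test
exact <- fisher.test(tab)

results <- data.frame(
  test    = c("Pearson", "Yates", "Likelihood ratio", "Fisher exact"),
  p_value = c(pearson$p.value, yates$p.value, unname(g2_p), exact$p.value)
)
print(results)
```

The continuity correction always shrinks the χ² statistic, so the corrected p-value is never smaller than the uncorrected one.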
Association Measures
- Phi coefficient: For 2×2 tables (ranges from -1 to 1)
- Cramér’s V: Standardized measure for larger tables (0 to 1)
- Contingency coefficient: Pearson’s coefficient derived from χ² and the sample size (0 to less than 1)
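For a 2×2 table, the phi coefficient carries a sign that indicates the direction of the association, and its absolute value equals the square root of the uncorrected χ² statistic divided by the sample size. The base R sketch below checks this on an invented table.

```r
# Invented 2 x 2 table of counts
tab <- matrix(c(30, 10, 15, 45), nrow = 2)

# Signed phi: (n11 * n22 - n12 * n21) / sqrt(product of all four margins)
phi <- (tab[1, 1] * tab[2, 2] - tab[1, 2] * tab[2, 1]) /
  sqrt(prod(rowSums(tab), colSums(tab)))

# Uncorrected Pearson chi-squared statistic
chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)
n    <- sum(tab)

# |phi| equals sqrt(chi2 / n); for a 2 x 2 table this is also Cramer's V
print(phi)
print(all.equal(abs(phi), sqrt(chi2 / n)))
```

Swapping the two rows (or the two columns) flips the sign of phi but leaves Cramér’s V unchanged, which is why V is reported as non-negative.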
Clinical Examples
Example 1: Lymphovascular Invasion and 5-Year Mortality
# Analyzing lymphovascular invasion and outcome
lvi_outcome <- contTables(
  data = histopathology,
  rows = "LVI",
  cols = "Mortality5yr",
  chiSq = TRUE,
  fisher = TRUE,
  odds = TRUE,
  relRisk = TRUE,
  ci = TRUE,
  ciWidth = 95,
  obs = TRUE,
  pcRow = TRUE
)
# Display the result; this analysis helps determine whether LVI is associated with poor prognosis
print(lvi_outcome)
Example 2: Grading System Validation
# Testing association between grade and stage
grade_stage <- contTables(
  data = histopathology,
  rows = "Grade",
  cols = "TStage",
  chiSq = TRUE,
  gamma = TRUE,
  taub = TRUE,
  obs = TRUE,
  pcRow = TRUE,
  pcCol = TRUE
)
# Display the result; Gamma and tau-b are particularly useful for ordinal variables like grade and stage
print(grade_stage)
Tips for Clinical Research
- Choose appropriate tests: Use Fisher’s exact test for small samples, especially when any expected cell count is below 5
- Report effect sizes: P-values alone are insufficient; include odds ratios, relative risks, or association measures such as Cramér’s V
- Consider confidence intervals: They show how precisely an effect is estimated and whether the plausible range includes clinically relevant values
- Use stratified analysis: Examine the association within levels of a potential confounder using layers
- Check assumptions: Ensure adequate sample sizes and appropriate variable types
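The first tip can be turned into a simple check on expected counts. The helper below, `choose_test`, is a hypothetical function written for this illustration (it is not part of ClinicoPath); it uses the common rule of thumb of switching to Fisher’s exact test when any expected count falls below 5. The threshold and both tables are assumptions made for the example.

```r
# Hypothetical helper: pick a test based on the smallest expected count
choose_test <- function(tab, min_expected = 5) {
  # Expected counts under independence
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  if (any(expected < min_expected)) {
    list(method  = "Fisher's exact test",
         p.value = fisher.test(tab)$p.value)
  } else {
    list(method  = "Pearson chi-squared test",
         p.value = chisq.test(tab, correct = FALSE)$p.value)
  }
}

# Invented tables: one sparse, one with ample counts
small_tab <- matrix(c(3, 1, 2, 6), nrow = 2)
large_tab <- matrix(c(30, 10, 15, 45), nrow = 2)

small_res <- choose_test(small_tab)
large_res <- choose_test(large_tab)

cat(small_res$method, "p =", round(small_res$p.value, 3), "\n")
cat(large_res$method, "p =", round(large_res$p.value, 3), "\n")
```

In practice the choice of test should be specified before looking at the data; a check like this is best used to confirm that a planned χ² test is reasonable.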
Conclusion
The contTables function provides a toolkit for analyzing associations
between categorical variables in clinical research. From basic χ² tests
to ordinal and comparative measures of association, it covers the
contingency table analyses most often needed in pathology and clinical
studies.
The function is part of the ClinicoPath package and accepts both individual-level data and aggregated counts, so the same call pattern applies whether you start from patient records or from a published frequency table.