Comprehensive interrater reliability analysis including Cohen's kappa (2 raters), Fleiss' kappa (3+ raters), Krippendorff's alpha, and consensus analysis. Provides agreement statistics, visualization, and clinical interpretation for categorical rating data.

Usage

pathagreement(
  data,
  vars,
  sft = FALSE,
  heatmap = FALSE,
  heatmapDetails = FALSE,
  heatmapTheme = "viridis",
  wght = "unweighted",
  exct = FALSE,
  multiraterMethod = "auto",
  fleissCI = TRUE,
  kripp = FALSE,
  krippMethod = "nominal",
  consensus = FALSE,
  consensus_method = "majority",
  tie_breaking = "exclude",
  show_consensus_table = TRUE,
  showClinicalSummary = TRUE,
  showAboutAnalysis = FALSE,
  showAssumptions = FALSE,
  showWeightedKappaGuide = TRUE,
  showStatisticalGlossary = FALSE,
  diagnosticStyleAnalysis = FALSE,
  styleClusterMethod = "ward",
  styleDistanceMetric = "agreement",
  styleGroups = 3,
  raterCharacteristics = FALSE,
  experienceVar,
  trainingVar,
  institutionVar,
  specialtyVar,
  identifyDiscordantCases = FALSE,
  caseID,
  icc = FALSE,
  bootstrap = FALSE,
  bootstrapSamples = 1000,
  pairwiseAnalysis = FALSE,
  categoryAnalysis = FALSE,
  outlierAnalysis = FALSE,
  pathologyContext = FALSE,
  gwetAC = FALSE,
  pabak = FALSE,
  sampleSizePlanning = FALSE,
  targetKappa = 0.8,
  targetPrecision = 0.1,
  raterBiasAnalysis = FALSE,
  agreementTrendAnalysis = FALSE,
  caseDifficultyScoring = FALSE,
  agreementStabilityAnalysis = FALSE,
  performClustering = FALSE,
  clusteringMethod = "ward",
  nStyleGroups = 3,
  autoSelectGroups = FALSE,
  showClusteringHeatmap = TRUE,
  heatmapColorScheme = "diagnostic",
  identifyDiscordant = TRUE,
  discordantThreshold = 0.5,
  raterExperience = NULL,
  raterSpecialty = NULL,
  raterInstitution = NULL,
  raterVolume = NULL,
  referenceStandard = NULL,
  useMetadataRows = FALSE,
  showInlineComments = FALSE,
  showClusteringInterpretation = FALSE,
  enhancedErrorGuidance = TRUE,
  showProgressIndicators = TRUE
)

Arguments

data

The data as a data frame. Each row represents a case/subject, and columns represent different raters/observers.

vars

Variables representing different raters/observers. Each variable should contain the ratings/diagnoses given by each observer for the same set of cases.

sft

Show frequency tables for each rater and cross-tabulation tables for pairwise comparisons.

heatmap

Show agreement heatmap visualization with color-coded agreement levels.

heatmapDetails

Show detailed heatmap with kappa values and confidence intervals for all rater pairs.

heatmapTheme

Choose color scheme for the agreement heatmap visualization.

wght

Weighting scheme for kappa analysis. Use 'squared' (quadratic) or 'equal' (linear) weights only with ordinal variables. Weighted kappa accounts for the degree of disagreement between categories.

exct

Use exact method for Fleiss' kappa calculation with 3 or more raters. More accurate but computationally intensive.

multiraterMethod

Choose specific method for multi-rater agreement analysis or use automatic selection.

fleissCI

Calculate 95% confidence intervals for Fleiss' kappa using asymptotic standard errors.

kripp

Calculate Krippendorff's alpha, a generalized measure of reliability for any number of observers and data types.

krippMethod

Measurement level for Krippendorff's alpha calculation. Choose based on your data type.

consensus

Perform consensus scoring analysis to determine agreed-upon ratings from multiple raters.

consensus_method

Method for determining consensus scores from multiple raters.

tie_breaking

How to handle cases where no consensus can be reached using the selected method.

show_consensus_table

Display detailed consensus scoring results including individual rater scores and consensus outcomes.

showClinicalSummary

Show clinical summary with plain-language interpretation of agreement statistics and their practical implications.

showAboutAnalysis

Show educational information about inter-rater reliability analysis, when to use it, and what the outputs mean.

showAssumptions

Show important assumptions, data requirements, common pitfalls, and interpretation guidelines for the analysis.

showWeightedKappaGuide

Show explanatory guide for weighted kappa options, including when to use linear vs quadratic weighting schemes.
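
As a rough illustration (the formulas below are the conventional agreement weights for weighted kappa; that 'equal' corresponds to linear and 'squared' to quadratic weighting is an assumption based on this guide), the weights for k ordered categories can be computed as:

# Conventional agreement weights for k ordered categories (sketch)
k <- 4
d <- abs(outer(1:k, 1:k, "-"))      # category distance matrix
linear_w    <- 1 - d / (k - 1)      # 'equal' (linear) weights, assumed mapping
quadratic_w <- 1 - (d / (k - 1))^2  # 'squared' (quadratic) weights, assumed mapping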

showStatisticalGlossary

Show glossary of statistical terms (kappa, ICC, alpha, etc.) with clinical interpretations and usage guidelines.

diagnosticStyleAnalysis

Enable diagnostic style clustering analysis using the Usubutun method to identify pathologist "schools" or diagnostic approaches.

styleClusterMethod

Hierarchical clustering method for diagnostic style analysis. Ward's linkage is the method used in the original Usubutun study.

styleDistanceMetric

Distance metric for measuring diagnostic similarity between raters for style clustering.

styleGroups

Number of diagnostic style groups to identify. Usubutun et al. found 3 groups optimal for most analyses.

raterCharacteristics

Include rater background characteristics (experience, training, institution) in style analysis.

experienceVar

Optional variable containing rater experience information (years of experience, level of training, etc.)

trainingVar

Optional variable containing rater training institution or background information.

institutionVar

Optional variable containing rater current institution or location information.

specialtyVar

Optional variable containing rater medical specialty or subspecialty information.

identifyDiscordantCases

Identify cases that distinguish different diagnostic styles; useful for training and consensus development.

caseID

Optional variable containing case identifiers. If not specified, cases will be numbered automatically.

icc

Calculate the intraclass correlation coefficient (ICC) for continuous or ordinal data. Provides additional reliability measures beyond kappa.

bootstrap

Calculate bootstrap confidence intervals for Krippendorff's alpha and other statistics.

bootstrapSamples

Number of bootstrap samples for confidence interval calculation.

pairwiseAnalysis

Detailed analysis of agreement between each pair of raters.

categoryAnalysis

Agreement analysis for each diagnostic category separately.

outlierAnalysis

Identify cases with unusually poor agreement across raters.

pathologyContext

Calculate pathology-specific metrics including diagnostic accuracy, sensitivity, and specificity when gold standard is available.

gwetAC

Calculate Gwet's AC1 and AC2 coefficients, which are more robust than kappa for high agreement scenarios and less affected by prevalence.

pabak

Calculate Prevalence-Adjusted Bias-Adjusted Kappa (PABAK) to address prevalence and bias issues in agreement studies.

sampleSizePlanning

Perform sample size planning calculations for agreement studies with specified precision requirements.

targetKappa

Target kappa value for sample size planning calculations.

targetPrecision

Target precision for confidence interval width in sample size planning.

raterBiasAnalysis

Analyze systematic tendencies and biases for each rater compared to the consensus or average ratings.

agreementTrendAnalysis

Analyze how agreement changes over time or case sequence, useful for training effect assessment.

caseDifficultyScoring

Quantify inherent case difficulty based on inter-rater disagreement patterns and provide difficulty scores.

agreementStabilityAnalysis

Bootstrap-based stability measures to assess the consistency of agreement statistics across different samples.

performClustering

Identify diagnostic style groups among raters using hierarchical clustering. Implements methodology from Usubutun et al. (2012) Modern Pathology. Clusters raters based on diagnosis pattern similarity to reveal systematic differences in diagnostic approach.

clusteringMethod

Hierarchical clustering linkage method. Ward's method minimizes within-group variance and is recommended for identifying distinct diagnostic styles (Usubutun 2012).

nStyleGroups

Number of diagnostic style groups to identify. Original study found 3 groups: conservative (under-diagnosis), moderate (majority), and sensitive (aligns with expert).

autoSelectGroups

Use the silhouette method or within-cluster sum of squares to automatically determine the optimal number of style groups.

showClusteringHeatmap

Display Cases × Raters heatmap with hierarchical dendrograms showing diagnostic patterns and style groups.

heatmapColorScheme

Color scheme for the clustering heatmap. The 'diagnostic' scheme uses distinct colors per category, as in Usubutun (2012).

identifyDiscordant

Flag cases with high inter-rater disagreement that distinguish diagnostic style groups. These cases are useful for training and quality assurance discussions.

discordantThreshold

Minimum disagreement proportion for flagging discordant cases. 0.5 means at least 50% of raters disagree with the modal diagnosis.

raterExperience

Years of experience for each rater. Will be tested for association with style group membership.

raterSpecialty

Specialty or practice type (e.g., specialist vs generalist, subspecialty). Will be tested for association with style group membership.

raterInstitution

Training or current practice institution. Will be tested for association with style group membership. Usubutun (2012) found no association, suggesting diagnostic style is personal rather than institutional.

raterVolume

Number of cases seen per month or year. Will be tested for association with style group membership.

referenceStandard

Expert consensus or reference standard diagnosis. Used to compare style groups and identify which group aligns most closely with expert judgment.

useMetadataRows

Enable extraction of rater characteristics from special metadata rows in the dataset. Metadata rows should have case_id starting with "META_" (e.g., META_experience, META_specialty). Values in rater columns will be extracted as characteristics for association testing.
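
A minimal sketch of the expected layout (the column names below are illustrative; only the "META_" prefix in the case identifier column follows the documented convention):

# Hypothetical dataset with two metadata rows appended to the case rows
ratings <- data.frame(
  case_id = c("case_001", "case_002", "META_experience", "META_specialty"),
  rater1  = c("benign", "malignant", "15", "gynecologic"),
  rater2  = c("benign", "benign", "3", "general"),
  rater3  = c("malignant", "malignant", "8", "gynecologic")
)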

showInlineComments

Show detailed statistical explanations and interpretations inline with results for educational purposes.

showClusteringInterpretation

Display explanatory guide for interpreting clustering results, diagnostic style groups, and discordant cases. Useful for understanding clinical implications.

enhancedErrorGuidance

Provide detailed error messages and suggestions for resolving common issues in agreement analysis.

showProgressIndicators

Display progress indicators for computationally intensive operations like bootstrap calculations.

Value

A results object containing:

results$todo: a html
results$overviewTable: a table
results$kappaTable: a table
results$iccTable: a table
results$pairwiseTable: a table
results$categoryTable: a table
results$outlierTable: a table
results$diagnosticAccuracyTable: a table
results$diagnosticStyleTable: a table
results$styleSummaryTable: a table
results$discordantCasesTable: a table
results$krippTable: a table
results$consensusTable: a table
results$consensusSummary: a table
results$heatmapPlot: an image
results$pairwisePlot: an image
results$categoryPlot: an image
results$confusionMatrixPlot: an image
results$diagnosticStyleDendrogram: an image
results$diagnosticStyleHeatmap: an image
results$diagnosticStyleCombined: an image
results$raterFrequencyTables$frequencyTable: a table
results$crosstabTable: a table
results$clinicalSummary: a html
results$reportTemplate: a html
results$aboutAnalysis: a html
results$assumptions: a html
results$weightedKappaGuide: a html
results$statisticalGlossary: a html
results$gwetACTable: a table
results$pabakTable: a table
results$sampleSizeTable: a table
results$raterBiasTable: a table
results$agreementTrendTable: a table
results$caseDifficultyTable: a table
results$stabilityTable: a table
results$trendPlot: an image
results$biasPlot: an image
results$difficultyPlot: an image
results$inlineComments: a html
results$styleGroupSummary: a table
results$styleGroupProfiles: a table
results$discordantCasesCluster: a table
results$characteristicAssociations: a table
results$referenceComparison: a table
results$clusteringHeatmap: heatmap showing diagnostic patterns with dual dendrograms (raters and cases)
results$clusterDendrogram: dendrogram showing hierarchical relationships between raters
results$silhouettePlot: silhouette plot showing cluster separation and cohesion
results$clusteringInterpretation: explanatory guide for understanding diagnostic style groups and cluster results

Tables can be converted to data frames with asDF or as.data.frame. For example:

results$overviewTable$asDF

as.data.frame(results$overviewTable)

Examples

# Load example data
data('pathology_ratings', package = 'ClinicoPath')

# Basic agreement analysis with 2 raters
pathagreement(pathology_ratings,
          vars = c('rater1', 'rater2'))

# Advanced analysis with 3+ raters including visualization
pathagreement(pathology_ratings,
          vars = c('rater1', 'rater2', 'rater3'),
          multiraterMethod = 'fleiss',
          fleissCI = TRUE,
          heatmap = TRUE,
          heatmapDetails = TRUE,
          sft = TRUE)

# Krippendorff's alpha for ordinal data
pathagreement(pathology_ratings,
          vars = c('rater1', 'rater2', 'rater3'),
          multiraterMethod = 'krippendorff',
          kripp = TRUE,
          krippMethod = 'ordinal')

# Consensus analysis
pathagreement(pathology_ratings,
          vars = c('rater1', 'rater2', 'rater3'),
          consensus = TRUE,
          consensus_method = 'majority',
          tie_breaking = 'arbitration',
          show_consensus_table = TRUE)
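
# Weighted kappa for two raters (a sketch; assumes the example
# ratings are ordinal, as weighting requires)
pathagreement(pathology_ratings,
          vars = c('rater1', 'rater2'),
          wght = 'squared',
          showWeightedKappaGuide = TRUE)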
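
# Diagnostic style clustering (Usubutun method); a sketch using
# only arguments documented above
pathagreement(pathology_ratings,
          vars = c('rater1', 'rater2', 'rater3'),
          performClustering = TRUE,
          clusteringMethod = 'ward',
          nStyleGroups = 3,
          showClusteringHeatmap = TRUE,
          identifyDiscordant = TRUE,
          discordantThreshold = 0.5)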
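
# Prevalence-robust coefficients and sample size planning (sketch)
pathagreement(pathology_ratings,
          vars = c('rater1', 'rater2', 'rater3'),
          gwetAC = TRUE,
          pabak = TRUE,
          sampleSizePlanning = TRUE,
          targetKappa = 0.8,
          targetPrecision = 0.1)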
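
# Bootstrap confidence intervals for Krippendorff's alpha (sketch;
# resampling can be slow, so progress indicators are enabled)
pathagreement(pathology_ratings,
          vars = c('rater1', 'rater2', 'rater3'),
          kripp = TRUE,
          bootstrap = TRUE,
          bootstrapSamples = 1000,
          showProgressIndicators = TRUE)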