
A comprehensive test dataset designed for testing the chisqposttest function. It contains multiple categorical variables with known associations of varying strength, edge cases such as sparse cells, and missing-data patterns.

Usage

data("chisqposttest_test_data")

Format

A data frame with 300 observations and 14 variables:

PatientID

Patient identifier (1-300)

Treatment

Treatment group: "Standard", "Experimental"

Response

Treatment response: "No Response", "Response" (strongly associated with Treatment)

Sex

Patient sex: "Male", "Female" (balanced)

TumorGrade

Tumor grade: "Grade 1", "Grade 2", "Grade 3"

TumorStage

Tumor stage: "Stage I", "Stage II", "Stage III" (moderately associated with TumorGrade)

Institution

Hospital: "Hospital A", "Hospital B", "Hospital C", "Hospital D"

QualityScore

Quality rating: "High", "Low" (weakly associated with Institution)

RandomVar1

Random variable: "Group A", "Group B", "Group C" (no associations)

RandomVar2

Random variable: "Type X", "Type Y" (no associations)

RareCategory

Frequency category: "Common", "Uncommon", "Rare" (unbalanced)

BinaryOutcome

Binary outcome: "Negative", "Positive" (associated with RareCategory)

AgeGroup

Age category: "Young", "Middle", "Elderly"

BiomarkerStatus

Biomarker status: "Negative", "Positive" (moderately associated with AgeGroup)

Details

This dataset contains several types of associations designed to test different aspects of chi-square post-hoc analysis:

Strong Associations:

  • Treatment -> Response: Clear treatment effect with odds ratio ~5
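The odds ratio quoted above is the cross-product ratio of a 2x2 table. A minimal sketch of that arithmetic in base R follows. The counts are hypothetical (they are not drawn from chisqposttest_test_data) and were chosen so that the ratio comes out to exactly 5.

```r
# Hypothetical 2x2 counts (NOT taken from chisqposttest_test_data),
# chosen so that the odds ratio is exactly 5
tab <- matrix(
  c(80, 20,   # Standard:     No Response, Response
    40, 50),  # Experimental: No Response, Response
  nrow = 2, byrow = TRUE,
  dimnames = list(Treatment = c("Standard", "Experimental"),
                  Response  = c("No Response", "Response"))
)

# Odds of response within each treatment arm
odds_std <- tab["Standard", "Response"] / tab["Standard", "No Response"]         # 20 / 80 = 0.25
odds_exp <- tab["Experimental", "Response"] / tab["Experimental", "No Response"] # 50 / 40 = 1.25

# Odds ratio = ratio of the two odds = (a * d) / (b * c)
or <- odds_exp / odds_std
or  # 5

# Pearson chi-square test; for a 2x2 table chisq.test() applies
# Yates' continuity correction by default
res <- chisq.test(tab)
res$p.value
```

With roughly 100 patients per arm, an odds ratio of this size gives a very small p-value, which is why the Treatment/Response pair serves as the "strong association" case.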

Moderate Associations:

  • TumorGrade -> TumorStage: Higher grades associated with advanced stages

  • AgeGroup -> BiomarkerStatus: Age-related biomarker expression pattern
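One common cell-wise post-hoc check for a table such as TumorGrade x TumorStage is to inspect adjusted standardized residuals, which base R's chisq.test() returns as `$stdres`. The sketch below uses hypothetical counts with a grade/stage gradient and a Bonferroni-adjusted critical value. It illustrates the technique and is not necessarily the procedure chisqposttest itself uses.

```r
# Hypothetical counts: higher grades concentrate in more advanced stages
tab <- matrix(c(30, 15,  5,
                15, 25, 15,
                 5, 15, 30),
              nrow = 3, byrow = TRUE,
              dimnames = list(TumorGrade = c("Grade 1", "Grade 2", "Grade 3"),
                              TumorStage = c("Stage I", "Stage II", "Stage III")))

res <- chisq.test(tab)

# Adjusted standardized residuals:
# (O - E) / sqrt(E * (1 - row total / N) * (1 - column total / N))
round(res$stdres, 2)

# Bonferroni-adjusted two-sided critical value across all 9 cells
crit <- qnorm(1 - 0.05 / (2 * length(tab)))

# Cells that deviate from independence beyond the adjusted threshold
which(abs(res$stdres) > crit, arr.ind = TRUE)
```

Positive residuals on the diagonal (Grade 1/Stage I, Grade 3/Stage III) and negative ones in the corners are the pattern a "moderate association" is expected to produce.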

Weak Associations:

  • Institution -> QualityScore: Institutional quality differences

  • RareCategory -> BinaryOutcome: Effect located in the rare category, which has small cell counts
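With a rare category, some expected counts can fall below 5, the usual rule-of-thumb threshold below which the chi-square approximation becomes questionable (chisq.test() issues a warning in that case). The following minimal sketch uses hypothetical counts, checks the expected table, and falls back to Fisher's exact test. Whether chisqposttest applies such a fallback automatically is not stated on this page.

```r
# Hypothetical counts: the rare category is small but has a higher positive rate
tab <- matrix(c(150, 50,
                 60, 20,
                  6,  8),
              nrow = 3, byrow = TRUE,
              dimnames = list(RareCategory  = c("Common", "Uncommon", "Rare"),
                              BinaryOutcome = c("Negative", "Positive")))

# Expected counts under independence (the small-count warning is suppressed here)
exp_counts <- suppressWarnings(chisq.test(tab)$expected)
round(exp_counts, 2)  # Rare/Positive is about 3.71

# Rule of thumb: if any expected count is below 5, use an exact test instead
if (any(exp_counts < 5)) {
  result <- fisher.test(tab)
} else {
  result <- chisq.test(tab)
}
result$method
result$p.value
```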

No Associations:

  • RandomVar1 ⊥ RandomVar2: Independent random variables for null hypothesis testing
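Under the null hypothesis of independence, the expected count in cell (i, j) is E_ij = R_i * C_j / N (row total times column total, divided by the grand total), and the test has (r - 1)(c - 1) degrees of freedom. The sketch below simulates two independent factors shaped like RandomVar1 and RandomVar2 (an illustration, not how the dataset was generated) and checks the formula against chisq.test(). Because p-values are roughly uniform under the null, about 5% of such independent pairs would still be declared significant at alpha = 0.05.

```r
set.seed(42)
n <- 300

# Two factors drawn independently of each other
rv1 <- factor(sample(c("Group A", "Group B", "Group C"), n, replace = TRUE))
rv2 <- factor(sample(c("Type X", "Type Y"), n, replace = TRUE))
tab <- table(RandomVar1 = rv1, RandomVar2 = rv2)

# Expected counts by hand: E_ij = (row total_i * column total_j) / N
expected_manual <- outer(rowSums(tab), colSums(tab)) / sum(tab)

res <- chisq.test(tab)
all.equal(unname(expected_manual), unname(res$expected))  # TRUE
res$parameter  # df = (3 - 1) * (2 - 1) = 2
res$p.value
```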

The dataset includes approximately 5% missing values in the Treatment, Sex, and TumorGrade variables to test the missing-data handling options.
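The sketch below mimics this missing-data pattern on hypothetical simulated data: it sets 5% of two factors to NA completely at random and then applies complete-case (listwise) deletion. Treating `excl = TRUE` as listwise deletion on the two analysed variables is an assumption about chisqposttest, not something this page states.

```r
set.seed(1)
n <- 300
dat <- data.frame(
  Treatment = factor(sample(c("Standard", "Experimental"), n, replace = TRUE)),
  Sex       = factor(sample(c("Male", "Female"), n, replace = TRUE))
)

# Set 5% of each column (15 rows, chosen independently per column) to NA
for (v in c("Treatment", "Sex")) {
  dat[[v]][sample(n, size = round(0.05 * n))] <- NA
}
colSums(is.na(dat))

# Complete-case analysis: keep rows with no NA in either variable
cc <- dat[complete.cases(dat), ]
nrow(cc)

# table() silently drops NA; useNA = "ifany" makes the missing cells visible
table(dat$Treatment, dat$Sex)
table(dat$Treatment, dat$Sex, useNA = "ifany")
```

Because missingness is spread across two variables, the complete-case sample is smaller than 300 minus the missing count of either variable alone.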

Source

Simulated data created for testing purposes. The associations are modeled on realistic clinical scenarios, but the data are artificially generated.
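The generator used for this dataset is not shown. As a hypothetical sketch of how an association with a target odds ratio can be simulated, the code below fixes the baseline response odds in the Standard arm (0.25, an assumed value) and multiplies them by 5 in the Experimental arm. The sample odds ratio then varies around 5 from seed to seed.

```r
set.seed(123)
n <- 300
treatment <- factor(sample(c("Standard", "Experimental"), n, replace = TRUE),
                    levels = c("Standard", "Experimental"))

base_odds <- 0.25   # assumed odds of response in the Standard arm
target_or <- 5      # target odds ratio for Treatment -> Response

# Convert per-patient odds to response probabilities: p = odds / (1 + odds)
odds <- ifelse(treatment == "Experimental", base_odds * target_or, base_odds)
p_response <- odds / (1 + odds)

# Draw each patient's response from their probability
response <- factor(ifelse(runif(n) < p_response, "Response", "No Response"),
                   levels = c("No Response", "Response"))

tab <- table(Treatment = treatment, Response = response)
tab

# Sample odds ratio (a * d) / (b * c)
sample_or <- (tab["Experimental", "Response"] * tab["Standard", "No Response"]) /
             (tab["Experimental", "No Response"] * tab["Standard", "Response"])
sample_or
```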

See also

chisqposttest, histopathology

Examples

# Load the dataset
data(chisqposttest_test_data)

# Examine structure
str(chisqposttest_test_data)

# Example 1: Strong association (should be highly significant)
chisqposttest(
  data = chisqposttest_test_data,
  rows = "Treatment",
  cols = "Response",
  posthoc = "bonferroni"
)

# Example 2: Moderate association (should be significant with post-hoc differences)
chisqposttest(
  data = chisqposttest_test_data,
  rows = "TumorGrade",
  cols = "TumorStage",
  posthoc = "fdr"
)

# Example 3: No association (should be non-significant)
chisqposttest(
  data = chisqposttest_test_data,
  rows = "RandomVar1",
  cols = "RandomVar2",
  posthoc = "bonferroni"
)

# Example 4: Edge case with rare categories
chisqposttest(
  data = chisqposttest_test_data,
  rows = "RareCategory",
  cols = "BinaryOutcome",
  posthoc = "fdr"
)

# Example 5: Missing data handling
chisqposttest(
  data = chisqposttest_test_data,
  rows = "Treatment",
  cols = "Sex",
  excl = TRUE  # Exclude missing values
)
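The posthoc options used above name two multiplicity corrections. One common post-hoc scheme, sketched here with hypothetical counts and base R only, runs a chi-square test on every pair of rows and then adjusts the resulting p-values with p.adjust(); in base R, `method = "fdr"` is an alias for Benjamini-Hochberg. This is not necessarily the exact procedure chisqposttest implements.

```r
# Hypothetical TumorGrade x TumorStage counts
tab <- matrix(c(30, 15,  5,
                15, 25, 15,
                 5, 15, 30),
              nrow = 3, byrow = TRUE,
              dimnames = list(TumorGrade = c("Grade 1", "Grade 2", "Grade 3"),
                              TumorStage = c("Stage I", "Stage II", "Stage III")))

# All pairs of rows (grades)
pairs <- combn(rownames(tab), 2, simplify = FALSE)

# Chi-square test on each 2 x 3 sub-table
raw_p <- sapply(pairs, function(pr) chisq.test(tab[pr, ])$p.value)
names(raw_p) <- sapply(pairs, paste, collapse = " vs ")

# Compare unadjusted, Bonferroni and FDR (Benjamini-Hochberg) p-values
adj <- data.frame(
  raw        = raw_p,
  bonferroni = p.adjust(raw_p, method = "bonferroni"),
  fdr        = p.adjust(raw_p, method = "fdr")
)
round(adj, 4)
```

Bonferroni controls the family-wise error rate and is the more conservative choice; the FDR-adjusted p-values are never larger than the Bonferroni ones, which is why the moderate-association example uses `posthoc = "fdr"` to retain more power.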