Skip to contents

Introduction

The gtsummary function in ClinicoPath provides a comprehensive interface to the powerful gtsummary R package for creating publication-ready tables. This function is designed specifically for clinical and pathological research, offering multiple table types, statistical testing, and professional formatting options.

Key Features

  • Multiple Table Types: Summary tables, cross-tabulations, regression tables, and survival tables
  • Statistical Testing: Automatic or manual selection of appropriate statistical tests
  • Professional Formatting: Bold labels, italics, spanning headers, and custom styling
  • Publication Ready: HTML, LaTeX, Word, and Markdown output formats
  • Missing Value Handling: Flexible options for displaying missing data
  • Comprehensive Statistics: Mean/SD, median/IQR, frequencies, ranges, and more

Loading Test Datasets

ClinicoPath provides comprehensive test datasets specifically designed for gtsummary functionality:

# Load all gtsummary test datasets
data(gtsummary_clinical_trial, package = "ClinicoPath")
data(gtsummary_survey_data, package = "ClinicoPath")
data(gtsummary_laboratory_data, package = "ClinicoPath")
data(gtsummary_manufacturing_data, package = "ClinicoPath")
data(gtsummary_cross_data, package = "ClinicoPath")

# Display dataset dimensions
cat("Dataset Dimensions:\n")
## Dataset Dimensions:
cat("Clinical Trial:", nrow(gtsummary_clinical_trial), "×", ncol(gtsummary_clinical_trial), "\n")
## Clinical Trial: 300 × 26
cat("Survey Data:", nrow(gtsummary_survey_data), "×", ncol(gtsummary_survey_data), "\n")
## Survey Data: 500 × 19
cat("Laboratory Data:", nrow(gtsummary_laboratory_data), "×", ncol(gtsummary_laboratory_data), "\n")
## Laboratory Data: 400 × 29
cat("Manufacturing Data:", nrow(gtsummary_manufacturing_data), "×", ncol(gtsummary_manufacturing_data), "\n")
## Manufacturing Data: 350 × 24
cat("Cross-tabulation Data:", nrow(gtsummary_cross_data), "×", ncol(gtsummary_cross_data), "\n")
## Cross-tabulation Data: 600 × 15

Basic Summary Tables

Simple Summary Table

Let’s start with a basic summary table using clinical trial data:

# Create a basic summary table
result1 <- ClinicoPath::gtsummary(
  data = gtsummary_clinical_trial,
  vars = c("age", "gender", "bmi", "diabetes"),
  tableType = "summary",
  statistics = c("mean_sd", "median_iqr", "n_percent"),
  includeMissing = "ifany"
)

# Display basic information about the table
cat("Basic Summary Table Created Successfully\n")
## Basic Summary Table Created Successfully
cat("Variables included:", length(c("age", "gender", "bmi", "diabetes")), "\n")
## Variables included: 4
cat("Missing values:", ifelse(any(is.na(gtsummary_clinical_trial[c("age", "gender", "bmi", "diabetes")])), "Present", "None"), "\n")
## Missing values: None

Grouped Summary Table (Table 1)

The most common use case is creating a “Table 1” for clinical papers - a summary table grouped by treatment or exposure:

# Create a grouped summary table (classic "Table 1")
result2 <- ClinicoPath::gtsummary(
  data = gtsummary_clinical_trial,
  vars = c("age", "gender", "race", "bmi", "diabetes", "disease_stage", "hemoglobin"),
  byvar = "treatment_group",
  tableType = "summary",
  includeOverall = TRUE,
  addPValue = TRUE,
  testMethod = "auto",
  statistics = c("mean_sd", "median_iqr", "n_percent"),
  boldLabels = TRUE,
  boldPValues = TRUE,
  pValueThreshold = 0.05,
  showNHeader = TRUE,
  tableTitle = "Baseline Characteristics by Treatment Group",
  footnote = "Data presented as mean (SD), median (IQR), or n (%). P-values from appropriate statistical tests."
)

cat("Grouped Summary Table Created Successfully\n")
## Grouped Summary Table Created Successfully
cat("Treatment groups:", length(unique(gtsummary_clinical_trial$treatment_group)), "\n")
## Treatment groups: 3
cat("P-values included for group comparisons\n")
## P-values included for group comparisons

Cross-Tabulation Tables

Cross-tabulation tables are essential for analyzing relationships between categorical variables:

# Create a cross-tabulation table
result3 <- ClinicoPath::gtsummary(
  data = gtsummary_cross_data,
  vars = c("treatment_response", "drug_dosage"),
  tableType = "cross",
  addPValue = TRUE,
  percentType = "row",
  boldLabels = TRUE,
  tableTitle = "Treatment Response by Drug Dosage",
  footnote = "Row percentages shown. P-value from chi-square test."
)

cat("Cross-tabulation Table Created Successfully\n")
## Cross-tabulation Table Created Successfully
cat("Row variable: treatment_response (", length(unique(gtsummary_cross_data$treatment_response)), " levels)\n")
## Row variable: treatment_response ( 3  levels)
cat("Column variable: drug_dosage (", length(unique(gtsummary_cross_data$drug_dosage)), " levels)\n")
## Column variable: drug_dosage ( 3  levels)

Complex Cross-Tabulation

# Create a more complex cross-tabulation
result4 <- ClinicoPath::gtsummary(
  data = gtsummary_cross_data,
  vars = c("side_effects", "compliance"),
  tableType = "cross",
  addPValue = TRUE,
  percentType = "column",
  boldLabels = TRUE,
  boldPValues = TRUE,
  tableTitle = "Side Effects by Treatment Compliance",
  footnote = "Column percentages shown. Higher compliance associated with fewer side effects."
)

cat("Complex Cross-tabulation Created Successfully\n")
## Complex Cross-tabulation Created Successfully
cat("Analysis of side effects by compliance level\n")
## Analysis of side effects by compliance level

Regression Tables

Regression tables are useful for presenting model results:

# Create a regression table
result5 <- ClinicoPath::gtsummary(
  data = gtsummary_clinical_trial,
  vars = c("age", "gender", "bmi", "diabetes"),
  tableType = "regression",
  boldLabels = TRUE,
  boldPValues = TRUE,
  tableTitle = "Linear Regression: Age as Outcome",
  footnote = "Model: age ~ gender + bmi + diabetes. Coefficients with 95% CI shown."
)

cat("Regression Table Created Successfully\n")
## Regression Table Created Successfully
cat("Model includes demographic and clinical predictors\n")
## Model includes demographic and clinical predictors

Laboratory Data Analysis

Clinical laboratory data requires special consideration for reference ranges and flagged values:

# Create laboratory summary table
result6 <- ClinicoPath::gtsummary(
  data = gtsummary_laboratory_data,
  vars = c("glucose", "creatinine", "cholesterol_flag", "glucose_flag"),
  byvar = "department",
  tableType = "summary",
  addPValue = TRUE,
  testMethod = "auto",
  statistics = c("mean_sd", "median_iqr", "n_percent"),
  boldLabels = TRUE,
  boldPValues = TRUE,
  showNHeader = TRUE,
  tableTitle = "Laboratory Results by Department",
  footnote = "Normal ranges: Glucose 70-125 mg/dL, Creatinine <1.3 mg/dL, Cholesterol <200 mg/dL"
)

cat("Laboratory Analysis Created Successfully\n")
## Laboratory Analysis Created Successfully
cat("Departments analyzed:", length(unique(gtsummary_laboratory_data$department)), "\n")
## Departments analyzed: 5
cat("Laboratory parameters with reference ranges included\n")
## Laboratory parameters with reference ranges included

Survey Data Analysis

Survey data often requires special handling for Likert scales and categorical responses:

# Create survey data summary
result7 <- ClinicoPath::gtsummary(
  data = gtsummary_survey_data,
  vars = c("age_group", "income_level", "education_level", "overall_satisfaction"),
  byvar = "region",
  tableType = "summary",
  addPValue = TRUE,
  statistics = c("n_percent", "median_iqr"),
  boldLabels = TRUE,
  showNHeader = TRUE,
  includeMissing = "ifany",
  tableTitle = "Survey Demographics by Geographic Region",
  footnote = "Satisfaction scored 1-7 (Very Dissatisfied to Very Satisfied)"
)

cat("Survey Analysis Created Successfully\n")
## Survey Analysis Created Successfully
cat("Geographic regions:", length(unique(gtsummary_survey_data$region)), "\n")
## Geographic regions: 4
cat("Ordinal variables handled appropriately\n")
## Ordinal variables handled appropriately

Manufacturing Quality Control

Manufacturing data demonstrates the versatility of gtsummary for quality control applications:

# Create manufacturing quality control table
result8 <- ClinicoPath::gtsummary(
  data = gtsummary_manufacturing_data,
  vars = c("actual_temperature", "pressure_psi", "overall_grade", "within_specification"),
  byvar = "shift",
  tableType = "summary",
  addPValue = TRUE,
  statistics = c("mean_sd", "median_iqr", "n_percent"),
  boldLabels = TRUE,
  boldPValues = TRUE,
  showNHeader = TRUE,
  tableTitle = "Quality Control Metrics by Production Shift",
  footnote = "Temperature in °F, Pressure in PSI. Specification limits monitored."
)

cat("Manufacturing QC Analysis Created Successfully\n")
## Manufacturing QC Analysis Created Successfully
cat("Production shifts analyzed:", length(unique(gtsummary_manufacturing_data$shift)), "\n")
## Production shifts analyzed: 3
cat("Process control parameters included\n")
## Process control parameters included

Advanced Formatting Options

Custom Styling and Formatting

# Create a highly formatted table
result9 <- ClinicoPath::gtsummary(
  data = gtsummary_clinical_trial,
  vars = c("age", "gender", "bmi_category", "hypertension", "diabetes"),
  byvar = "treatment_group",
  tableType = "summary",
  includeOverall = TRUE,
  addPValue = TRUE,
  addQ = TRUE,  # Add FDR-adjusted p-values
  testMethod = "auto",
  statistics = c("mean_sd", "median_iqr", "n_percent"),
  boldLabels = TRUE,
  boldLevels = FALSE,
  boldPValues = TRUE,
  italicizeLabels = FALSE,
  pValueThreshold = 0.05,
  digitsOverall = 2,
  digitsByGroup = 2,
  digitsPValue = 3,
  showNHeader = TRUE,
  outputFormat = "html",
  tableTitle = "Comprehensive Clinical Characteristics",
  footnote = "Q-values shown for multiple comparison correction (FDR method)"
)

cat("Advanced Formatting Applied Successfully\n")
## Advanced Formatting Applied Successfully
cat("Features: FDR correction, custom digits, styling options\n")
## Features: FDR correction, custom digits, styling options

Multiple Statistics Display

# Show comprehensive statistics
result10 <- ClinicoPath::gtsummary(
  data = gtsummary_laboratory_data,
  vars = c("glucose", "creatinine", "hemoglobin", "total_cholesterol"),
  byvar = "patient_sex",
  tableType = "summary",
  statistics = c("mean_sd", "median_iqr", "range", "n_percent"),
  includeMissing = "always",
  addPValue = TRUE,
  boldLabels = TRUE,
  tableTitle = "Laboratory Values: Comprehensive Statistics",
  footnote = "All available statistics shown: mean (SD), median (IQR), range, and missing values"
)

cat("Comprehensive Statistics Display Created Successfully\n")
## Comprehensive Statistics Display Created Successfully
cat("Statistics included: mean, median, range, frequencies, missing values\n")
## Statistics included: mean, median, range, frequencies, missing values

R Code Generation

The gtsummary function can generate the equivalent R code for transparency and reproducibility:

# Create table with R code output
result11 <- ClinicoPath::gtsummary(
  data = gtsummary_clinical_trial,
  vars = c("age", "gender", "bmi"),
  byvar = "treatment_group",
  tableType = "summary",
  addPValue = TRUE,
  boldLabels = TRUE,
  showCode = TRUE,  # Generate R code
  exportTable = TRUE,  # Export information
  tableTitle = "Example with R Code Generation"
)

cat("R Code Generation Enabled\n")
## R Code Generation Enabled
cat("Generated code can be used for reproducible research\n")
## Generated code can be used for reproducible research

Statistical Testing Options

Automatic Test Selection

# Demonstrate automatic test selection
result12 <- ClinicoPath::gtsummary(
  data = gtsummary_clinical_trial,
  vars = c("age", "gender", "bmi", "diabetes", "disease_stage"),
  byvar = "treatment_group",
  tableType = "summary",
  addPValue = TRUE,
  testMethod = "auto",  # Automatic test selection
  pairedTest = FALSE,
  boldPValues = TRUE,
  pValueThreshold = 0.05,
  tableTitle = "Automatic Statistical Test Selection",
  footnote = "Appropriate tests selected automatically: t-test/ANOVA for continuous, chi-square/Fisher for categorical"
)

cat("Automatic Test Selection Applied\n")
## Automatic Test Selection Applied
cat("Tests selected based on variable type and group structure\n")
## Tests selected based on variable type and group structure

Manual Test Selection

# Demonstrate manual test selection
result13 <- ClinicoPath::gtsummary(
  data = gtsummary_clinical_trial,
  vars = c("age", "hemoglobin", "creatinine"),
  byvar = "treatment_group",
  tableType = "summary",
  addPValue = TRUE,
  testMethod = "kruskal.test",  # Manual test selection
  boldPValues = TRUE,
  tableTitle = "Manual Test Selection: Kruskal-Wallis",
  footnote = "Kruskal-Wallis test used for all continuous variables (non-parametric)"
)

cat("Manual Test Selection Applied\n")
## Manual Test Selection Applied
cat("Kruskal-Wallis test specified for non-parametric analysis\n")
## Kruskal-Wallis test specified for non-parametric analysis

Missing Value Handling

Different Missing Value Strategies

# Demonstrate different missing value handling
result14 <- ClinicoPath::gtsummary(
  data = gtsummary_clinical_trial,
  vars = c("age", "education", "biomarker_b", "qol_baseline"),
  byvar = "treatment_group",
  tableType = "summary",
  includeMissing = "always",  # Always show missing values
  addPValue = TRUE,
  boldLabels = TRUE,
  tableTitle = "Missing Value Handling: Always Show",
  footnote = "Missing values shown for all variables, including those with no missing data"
)

cat("Missing Value Handling Demonstrated\n")
## Missing Value Handling Demonstrated
cat("Strategy: Always show missing values\n")
## Strategy: Always show missing values
# Show missing values only when present
result15 <- ClinicoPath::gtsummary(
  data = gtsummary_clinical_trial,
  vars = c("age", "education", "biomarker_b", "qol_baseline"),
  byvar = "treatment_group",
  tableType = "summary",
  includeMissing = "ifany",  # Show missing only if present
  addPValue = TRUE,
  boldLabels = TRUE,
  tableTitle = "Missing Value Handling: If Any",
  footnote = "Missing values shown only for variables with missing data"
)

cat("Conditional Missing Value Display\n")
## Conditional Missing Value Display
cat("Strategy: Show missing values only when present\n")
## Strategy: Show missing values only when present

Output Formats

HTML Output (Default)

# HTML output format
result16 <- ClinicoPath::gtsummary(
  data = gtsummary_survey_data,
  vars = c("age_group", "income_level", "overall_satisfaction"),
  byvar = "region",
  tableType = "summary",
  outputFormat = "html",
  addPValue = TRUE,
  boldLabels = TRUE,
  tableTitle = "HTML Output Format Example"
)

cat("HTML Output Format Applied\n")
## HTML Output Format Applied
cat("Best for interactive viewing and web-based reports\n")
## Best for interactive viewing and web-based reports

Markdown Output

# Markdown output format
result17 <- ClinicoPath::gtsummary(
  data = gtsummary_survey_data,
  vars = c("age_group", "income_level", "overall_satisfaction"),
  byvar = "region",
  tableType = "summary",
  outputFormat = "markdown",
  addPValue = TRUE,
  boldLabels = TRUE,
  tableTitle = "Markdown Output Format Example"
)

cat("Markdown Output Format Applied\n")
## Markdown Output Format Applied
cat("Ideal for reproducible reports and documentation\n")
## Ideal for reproducible reports and documentation

Real-World Applications

Clinical Trial Table 1

# Create a comprehensive clinical trial Table 1
result18 <- ClinicoPath::gtsummary(
  data = gtsummary_clinical_trial,
  vars = c("age", "gender", "race", "education", "bmi_category", "hypertension", 
           "diabetes", "disease_stage", "hemoglobin", "creatinine", "cholesterol"),
  byvar = "treatment_group",
  tableType = "summary",
  includeOverall = TRUE,
  addPValue = TRUE,
  testMethod = "auto",
  statistics = c("mean_sd", "median_iqr", "n_percent"),
  boldLabels = TRUE,
  boldPValues = TRUE,
  showNHeader = TRUE,
  includeMissing = "ifany",
  tableTitle = "Patient Characteristics by Treatment Assignment",
  footnote = "Continuous variables: mean (SD) or median (IQR); Categorical variables: n (%). P-values from t-test, ANOVA, chi-square, or Fisher exact test as appropriate."
)

cat("Clinical Trial Table 1 Created\n")
## Clinical Trial Table 1 Created
cat("Publication-ready format with comprehensive baseline characteristics\n")
## Publication-ready format with comprehensive baseline characteristics

Biomarker Analysis

# Biomarker analysis table
result19 <- ClinicoPath::gtsummary(
  data = gtsummary_clinical_trial,
  vars = c("biomarker_a", "biomarker_b", "hemoglobin", "creatinine", "cholesterol"),
  byvar = "response",
  tableType = "summary",
  addPValue = TRUE,
  testMethod = "wilcox.test",  # Non-parametric for biomarkers
  statistics = c("median_iqr", "range"),
  boldLabels = TRUE,
  boldPValues = TRUE,
  tableTitle = "Biomarker Levels by Treatment Response",
  footnote = "Biomarker values shown as median (IQR) and range. Wilcoxon rank-sum test used for comparisons."
)

cat("Biomarker Analysis Table Created\n")
## Biomarker Analysis Table Created
cat("Non-parametric analysis appropriate for biomarker data\n")
## Non-parametric analysis appropriate for biomarker data

Best Practices and Tips

Variable Selection Guidelines

  1. Continuous Variables: Best displayed with mean (SD) and/or median (IQR)
  2. Categorical Variables: Show frequencies and percentages
  3. Ordinal Variables: Consider median (IQR) for analysis
  4. Missing Values: Always document how missing data is handled

Statistical Testing

  1. Automatic Selection: Usually appropriate for exploratory analysis
  2. Manual Selection: Use when you have specific hypotheses or assumptions
  3. Multiple Testing: Consider FDR correction (addQ = TRUE) when testing many variables
  4. Non-parametric: Use Kruskal-Wallis or Wilcoxon for non-normal data

Formatting for Publication

  1. Bold Labels: Always use for professional appearance
  2. P-value Formatting: Use 3 decimal places, bold significant values
  3. Table Titles: Be descriptive and specific
  4. Footnotes: Explain statistics, tests, and abbreviations
# Demonstrate best practices
result20 <- ClinicoPath::gtsummary(
  data = gtsummary_clinical_trial,
  vars = c("age", "gender", "bmi", "diabetes", "disease_stage", "hemoglobin"),
  byvar = "treatment_group",
  tableType = "summary",
  includeOverall = TRUE,
  addPValue = TRUE,
  addQ = TRUE,  # FDR correction
  testMethod = "auto",
  statistics = c("mean_sd", "median_iqr", "n_percent"),
  boldLabels = TRUE,
  boldPValues = TRUE,
  pValueThreshold = 0.05,
  showNHeader = TRUE,
  includeMissing = "ifany",
  digitsOverall = 2,
  digitsByGroup = 2,
  digitsPValue = 3,
  tableTitle = "Best Practices Example: Complete Clinical Table",
  footnote = "Data: mean (SD), median (IQR), or n (%). Tests: t-test/ANOVA (continuous), chi-square/Fisher (categorical). Q-values: FDR-adjusted p-values for multiple comparisons. Missing values shown when present."
)

cat("Best Practices Example Created\n")
## Best Practices Example Created
cat("Professional formatting with comprehensive documentation\n")
## Professional formatting with comprehensive documentation

Summary

The gtsummary function in ClinicoPath provides a powerful and flexible interface for creating publication-ready tables. Key advantages include:

  1. Comprehensive Statistics: Multiple options for displaying data appropriately
  2. Flexible Testing: Automatic or manual selection of statistical tests
  3. Professional Formatting: Bold labels, italics, and custom styling
  4. Multiple Output Formats: HTML, Markdown, LaTeX, and Word compatibility
  5. Missing Value Handling: Flexible options for displaying incomplete data
  6. R Code Generation: Transparency and reproducibility
  7. Specialized Datasets: Comprehensive test data for various research domains

This function is particularly valuable for: - Clinical trial reports (Table 1) - Laboratory data analysis - Survey research - Quality control applications - Biomarker studies - Cross-sectional studies

The generated tables meet publication standards for major medical and scientific journals, making it an essential tool for clinical and pathological research.


This vignette demonstrates the comprehensive capabilities of the gtsummary function in ClinicoPath. For more information about specific options and advanced features, refer to the function documentation and the gtsummary R package documentation.