
Specialized dataset containing edge cases, extreme values, special characters, and other challenging data patterns. It is designed to test how robustly enhanced summary implementations handle problematic data, including their error handling and graceful degradation when summarytools is used.

Usage

toolssummary_edge_cases

Format

A data frame with 150 observations and 12 variables:

id

Character. Unique identifier (EDGE_001 to EDGE_150)

scenario

Factor. Test scenario type (1-6)

numeric_extreme

Numeric. Variable with extreme values, zeros, and systematically missing values

numeric_decimal

Numeric. Variable with high-precision decimals (8 decimal places)

categorical_many

Factor. Categorical variable with 30 different levels

categorical_few

Factor. Binary categorical variable ("A", "B")

categorical_special

Factor. Categories with missing-like values ("N/A", "Missing", "")

constant_numeric

Numeric. Constant value (all 42) for edge case testing

constant_factor

Factor. Constant factor (all "SAME") for edge case testing

text_variable

Character. Text with varying lengths and special characters

date_variable

Date. Date variable spanning 2020-2025

binary_numeric

Integer. Binary numeric variable (0, 1)

Source

Simulated data generated using create_toolssummary_test_data.R

Details

This dataset is designed to stress-test enhanced summary implementations with edge cases and data patterns that can cause failures in standard analysis approaches. It also tests the robustness and error handling of the summarytools integration.

Scenario Types:

  • Scenario 1: Normal case (baseline comparison)

  • Scenario 2: Extreme values (very large positive and negative numbers)

  • Scenario 3: Very small values (near-zero decimals)

  • Scenario 4: High-precision decimals and complex numbers

  • Scenario 5: Zero and edge numeric values

  • Scenario 6: Systematic missing values

Quality Challenges:

  • Extreme numeric ranges and precision requirements

  • Categorical variables with many levels (30 categories)

  • Special characters and empty strings in categorical data

  • Systematic missing-value patterns (every 6th and 7th observation)

  • Constant values across all observations

  • Text variables with varying lengths and special characters
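Several of the challenges listed above can be reproduced with a few base R vectors. The sketch below uses toy stand-ins rather than the packaged data, and it assumes that "every 6th and 7th observation" means positions divisible by 6 or by 7; the actual generation script may define the pattern differently.

```r
# Toy stand-ins for the documented patterns (not the packaged dataset).
n <- 42
idx <- seq_len(n)

# Assumption: "every 6th and 7th observation" = positions divisible by 6 or 7.
numeric_extreme <- rep(c(1, 1e12, -1e12, 1e-10, 0, 5), length.out = n)
numeric_extreme[idx %% 6 == 0 | idx %% 7 == 0] <- NA

constant_numeric <- rep(42, n)
categorical_special <- factor(rep(c("N/A", "Missing", "", "OK"), length.out = n))

# Missing values propagate unless they are removed explicitly.
mean(numeric_extreme)                  # NA
mean(numeric_extreme, na.rm = TRUE)

# A constant column has zero variance, so standardising it yields NaN.
sd(constant_numeric)                   # 0
z <- (constant_numeric - mean(constant_numeric)) / sd(constant_numeric)
all(is.nan(z))                         # TRUE

# Missing-like labels ("N/A", "Missing", "") are ordinary levels, not NA.
anyNA(categorical_special)             # FALSE
table(categorical_special, useNA = "ifany")
```

A summary routine that silently treats "" or "N/A" as valid categories, or divides by a zero standard deviation, is the kind of behaviour this dataset is meant to surface.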

summarytools Integration Testing:

  • dfSummary: Robustness with problematic data structures

  • freq: Handling of many categories and special characters

  • descr: Extreme value processing and missing data management

  • ctable: Edge case cross-tabulation scenarios
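The four summarytools functions named above can also be called directly, which may help isolate whether a problem originates in summarytools or in the wrapper. This is a minimal sketch on a hypothetical toy data frame that mimics a few of the documented columns; it only runs the summarytools calls if the package is installed.

```r
# Toy data frame with a few of the documented column types
# (an illustration, not the packaged dataset).
toy <- data.frame(
  scenario = factor(rep(1:3, each = 4)),
  numeric_extreme = c(1, 2, NA, 4, 1e12, -1e12, NA, 0, 1e-10, 5, NA, 7),
  categorical_few = factor(rep(c("A", "B"), times = 6)),
  categorical_special = factor(rep(c("N/A", "Missing", ""), times = 4))
)

if (requireNamespace("summarytools", quietly = TRUE)) {
  # Whole-data overview, including the problematic columns
  print(summarytools::dfSummary(toy))

  # Frequency table for a factor with missing-like and empty levels
  print(summarytools::freq(toy$categorical_special))

  # Descriptive statistics for a numeric column with extremes and NAs
  print(summarytools::descr(toy$numeric_extreme))

  # Cross-tabulation of two categorical variables
  print(summarytools::ctable(toy$scenario, toy$categorical_few))
}
```

The same calls can be pointed at toolssummary_edge_cases once the dataset is loaded.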

Recommended Usage Scenarios:

  • Robustness testing for all summary functions

  • Error handling validation

  • Special character and encoding testing

  • Missing data pattern analysis
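For robustness testing and error handling validation, it can help to run each summary function inside a wrapper that records errors and warnings instead of stopping at the first failure. The helper below, run_safely(), is a hypothetical illustration written in base R and is not part of the package; the toy columns stand in for the dataset's variables.

```r
# Run one function on one column; capture errors and warnings
# rather than letting them interrupt the test run.
run_safely <- function(f, x) {
  warned <- character(0)
  value <- withCallingHandlers(
    tryCatch(
      f(x),
      error = function(e) {
        structure(conditionMessage(e), class = "captured_error")
      }
    ),
    warning = function(w) {
      warned <<- c(warned, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(ok = !inherits(value, "captured_error"), value = value, warnings = warned)
}

# Toy columns mimicking some documented edge cases.
columns <- list(
  constant_numeric = rep(42, 10),
  all_missing = rep(NA_real_, 10),
  text_variable = c("a", "long text with spaces", "sp&cial!", "")
)

checks <- list(
  mean = function(x) mean(x, na.rm = TRUE),
  max = function(x) max(x, na.rm = TRUE),
  log_sum = function(x) sum(log(x))
)

# Apply every check to every column.
results <- lapply(columns, function(col) lapply(checks, run_safely, x = col))

# Which checks completed without an error on the text column?
vapply(results$text_variable, function(r) r$ok, logical(1))

# Warnings raised when every value is missing
results$all_missing$max$warnings
```

Keeping warnings separate from errors makes it possible to distinguish graceful degradation (a result plus a warning) from an outright failure.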

Examples

if (FALSE) { # \dontrun{
# Load the dataset
data(toolssummary_edge_cases)

# Robustness testing with edge cases
result <- toolssummary(
  data = toolssummary_edge_cases,
  vars = c("numeric_extreme", "categorical_many", "text_variable"),
  useSummarytools = TRUE,
  showDfSummary = TRUE,
  showFreq = TRUE
)

# Scenario-based analysis
result_scenario <- toolssummary(
  data = toolssummary_edge_cases,
  vars = c("numeric_extreme", "categorical_special"),
  groupVar = "scenario",
  useSummarytools = TRUE,
  showCrosstabs = TRUE
)
} # }