jjtreemap: Comprehensive Treemap Visualization

Introduction to jjtreemap

The jjtreemap function wraps the treemap and ggplot2 R packages to create treemap visualizations of categorical data. A treemap displays hierarchical data as nested rectangles whose areas are proportional to a quantitative value, which makes it well suited to part-to-whole relationships such as portfolio composition, market share, and budget allocation.

Key Features

Hierarchical Visualization: Display nested categorical data structures
Size Mapping: Rectangle areas represent quantitative values
Color Coding: Additional categorical or quantitative dimensions through color
Flexible Labeling: Customizable text display with size, color, and alignment options
Performance Optimized: Enhanced caching and data preparation for faster rendering
Publication-Ready: High-quality outputs suitable for presentations and reports

Loading Required Libraries

library(ClinicoPath)
library(dplyr)
library(tidyr)

# This vignette uses simulated sample data; fix the seed so results are reproducible
set.seed(123)

Basic Treemap Creation

Simple Category Visualization

Let’s start with a basic treemap showing market share by company:

# Create sample market share data
market_data <- data.frame(
  company = factor(c("TechCorp", "DataSoft", "CloudNet", "AIWorks", "SecureIT", "WebDev")),
  market_share = c(28.5, 22.3, 18.7, 12.5, 10.2, 7.8),
  sector = factor(c("Software", "Software", "Cloud", "AI/ML", "Security", "Web"))
)

# Create basic treemap
result_basic <- jjtreemap(
  data = market_data,
  group = "company",
  size = "market_share",
  showLabels = TRUE,
  labelSize = 6
)

print(result_basic)

Understanding Treemap Components

Group Variable (Categories)

Defines the rectangles in the treemap
Can be hierarchical (multiple levels)
Each unique value becomes a separate rectangle

Size Variable (Rectangle Area)

Must be numeric and positive
Determines the area of each rectangle
Represents the quantitative dimension

Color Variable (Optional)

Additional categorical dimension
Helps distinguish groups or show patterns
Can represent a secondary classification
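
As a minimal sketch of the size mapping described above, each rectangle's share of the plot area is its value divided by the total of the size variable. The vector below reuses the market share figures from the basic example; the name area_share is introduced here for illustration only.

```r
# Values from the market share example; names are the group categories
market_share <- c(TechCorp = 28.5, DataSoft = 22.3, CloudNet = 18.7,
                  AIWorks = 12.5, SecureIT = 10.2, WebDev = 7.8)

# Fraction of the total treemap area each rectangle should occupy
area_share <- market_share / sum(market_share)

# Express as percentages for easier reading
round(100 * area_share, 1)
```

Because only the ratios matter, the size variable does not need to be a percentage; any positive quantity in consistent units works.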

Styling and Customization

Border Customization

Control the appearance of rectangle borders:

# Create data with hierarchical structure
dept_budget <- data.frame(
  department = factor(c("R&D", "Marketing", "Sales", "Operations", "HR", "IT")),
  budget = c(45.2, 32.5, 28.7, 38.9, 12.3, 22.5),
  division = factor(c("Innovation", "Growth", "Growth", "Core", "Support", "Infrastructure"))
)

# Treemap with custom borders
result_borders <- jjtreemap(
  data = dept_budget,
  group = "department",
  size = "budget",
  color = "division",
  borderWidth = 1.5,
  borderLevel1Width = 2,
  borderLevel2Width = 0.5,
  borderLevel1Color = "darkblue",
  borderLevel2Color = "lightgray",
  showLabels = TRUE
)

print(result_borders)

Label Customization

Font Styling

# Bold white labels on a semi-transparent dark background
result_bold <- jjtreemap(
  data = dept_budget,
  group = "department",
  size = "budget",
  labelFontFace = "bold",
  labelLevel1Size = 16,
  labelLevel1Color = "white",
  labelBackground = "#0000004D",  # black at about 30% opacity (R hex color with alpha)
  showLabels = TRUE
)

print(result_bold)
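
R graphics functions generally expect colors as names or hex strings, with transparency given as a trailing alpha byte (#RRGGBBAA). The sketch below uses base grDevices helpers to build semi-transparent colors; whether a particular jjtreemap label option accepts them is an assumption to check against the function's own documentation.

```r
# adjustcolor() and col2rgb() live in grDevices, which is attached by default
bg_dark  <- adjustcolor("black", alpha.f = 0.3)  # black at about 30% opacity
bg_light <- adjustcolor("white", alpha.f = 0.7)  # white at about 70% opacity

# Inspect the red, green, blue and alpha channels (0-255) of each color
col2rgb(c(bg_dark, bg_light), alpha = TRUE)
```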

Label Alignment

# Custom label alignment
result_aligned <- jjtreemap(
  data = dept_budget,
  group = "department",
  size = "budget",
  labelAlignH = "left",
  labelAlignV = "top",
  showLabels = TRUE,
  labelSize = 8
)

print(result_aligned)

Color Palettes and Themes

# Product portfolio with color coding
product_data <- data.frame(
  product = factor(c("Smartphones", "Laptops", "Tablets", "Headphones", 
                     "Smartwatches", "Cameras", "Speakers", "Monitors")),
  revenue = c(450, 380, 220, 180, 150, 120, 98, 85),
  category = factor(c("Mobile", "Computing", "Mobile", "Audio", 
                      "Wearables", "Imaging", "Audio", "Computing"))
)

# Treemap with category colors
result_colored <- jjtreemap(
  data = product_data,
  group = "product",
  size = "revenue",
  color = "category",
  showLabels = TRUE,
  labelSize = 6,
  title = "Product Revenue by Category",
  subtitle = "2023 Annual Report",
  caption = "Values in millions USD"
)

print(result_colored)

Real-World Applications

Market Capitalization Analysis

# Create illustrative market capitalization data
tech_market <- data.frame(
  company = factor(c("Apple", "Samsung", "Google", "Microsoft", "Amazon",
                     "Meta", "Tesla", "NVIDIA", "Intel", "Oracle")),
  market_cap = c(2850, 1450, 1680, 2450, 1580, 
                 890, 780, 1100, 190, 280),
  sector = factor(c("Consumer Tech", "Consumer Tech", "Internet", "Software", "E-commerce",
                    "Social Media", "Automotive", "Semiconductors", "Semiconductors", "Enterprise"))
)

# Market capitalization treemap
result_market <- jjtreemap(
  data = tech_market,
  group = "company",
  size = "market_cap",
  color = "sector",
  showLabels = TRUE,
  labelSize = 8,
  title = "Tech Giants Market Capitalization",
  subtitle = "By Sector Classification",
  caption = "Market cap in billions USD",
  aspectRatio = 1.4
)

print(result_market)

Budget Allocation Visualization

# Government budget example
gov_budget <- data.frame(
  category = factor(c("Healthcare", "Education", "Defense", "Social Security",
                      "Infrastructure", "Science & Tech", "Environment", 
                      "Agriculture", "Justice", "Other")),
  allocation = c(28.5, 22.3, 18.7, 25.2, 12.5, 8.3, 6.7, 5.8, 4.2, 3.8),
  type = factor(c("Social", "Social", "Security", "Social",
                  "Infrastructure", "Research", "Environment",
                  "Economic", "Administration", "Various"))
)

# Budget treemap with custom styling
result_budget <- jjtreemap(
  data = gov_budget,
  group = "category",
  size = "allocation",
  color = "type",
  showLabels = TRUE,
  labelSize = 7,
  labelFontFace = "bold",
  title = "Federal Budget Allocation",
  subtitle = "Fiscal Year 2024",
  caption = "Illustrative values; area shows share of total"
)

print(result_budget)

Portfolio Composition

# Investment portfolio breakdown
portfolio <- data.frame(
  asset = factor(c("US Stocks", "International Stocks", "Bonds", "Real Estate",
                   "Commodities", "Cash", "Crypto", "Private Equity")),
  value = c(45000, 28000, 35000, 22000, 12000, 8000, 5000, 15000),
  risk_level = factor(c("High", "High", "Low", "Medium", 
                        "High", "Low", "Very High", "High"))
)

# Portfolio treemap with risk coloring
result_portfolio <- jjtreemap(
  data = portfolio,
  group = "asset",
  size = "value",
  color = "risk_level",
  showLabels = TRUE,
  labelSize = 8,
  title = "Investment Portfolio Composition",
  subtitle = "Total Value: $170,000",
  caption = "Color indicates risk level"
)

print(result_portfolio)

Advanced Customization

Aspect Ratio Control

# Test different aspect ratios
sales_data <- data.frame(
  region = factor(c("North", "South", "East", "West", "Central")),
  sales = c(120, 95, 110, 88, 102)
)

# Wide aspect ratio
result_wide <- jjtreemap(
  data = sales_data,
  group = "region",
  size = "sales",
  aspectRatio = 2.5,
  showLabels = TRUE,
  title = "Wide Aspect Ratio (2.5)"
)

print(result_wide)

# Square aspect ratio
result_square <- jjtreemap(
  data = sales_data,
  group = "region",
  size = "sales",
  aspectRatio = 1,
  showLabels = TRUE,
  title = "Square Aspect Ratio (1.0)"
)

print(result_square)

Handling Small Values

# Data with very different scales
diverse_data <- data.frame(
  category = factor(c("Giant", "Large", "Medium", "Small", "Tiny", "Microscopic")),
  value = c(1000, 200, 50, 10, 2, 0.5)
)

# Treemap handles extreme differences
result_diverse <- jjtreemap(
  data = diverse_data,
  group = "category",
  size = "value",
  showLabels = TRUE,
  labelSize = 4,
  labelOverlap = 0.8,  # Allow more overlap for small rectangles
  title = "Handling Extreme Value Differences"
)

print(result_diverse)
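
Before plotting data with extreme differences, it can help to check which categories will occupy too little area to carry a label. The sketch below computes each category's share of the total; the 1% cutoff and the names share_pct, min_share_pct and hard_to_label are illustrative assumptions, not jjtreemap settings.

```r
diverse_data <- data.frame(
  category = c("Giant", "Large", "Medium", "Small", "Tiny", "Microscopic"),
  value = c(1000, 200, 50, 10, 2, 0.5),
  stringsAsFactors = FALSE
)

# Share of the total area per category, in percent
diverse_data$share_pct <- 100 * diverse_data$value / sum(diverse_data$value)

# Hypothetical cutoff: rectangles under 1% of the area are hard to label
min_share_pct <- 1
hard_to_label <- diverse_data$category[diverse_data$share_pct < min_share_pct]
hard_to_label
```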

Label Visibility Control

# Many categories - label management
many_categories <- data.frame(
  item = factor(paste0("Item_", LETTERS[1:20])),
  value = sort(runif(20, 10, 100), decreasing = TRUE)
)

# Control label display
result_many <- jjtreemap(
  data = many_categories,
  group = "item",
  size = "value",
  showLabels = TRUE,
  labelSize = 4,  # Minimum size for readability
  labelOverlap = 0.3,  # Less overlap tolerance
  title = "Many Categories with Smart Labeling"
)

print(result_many)

Clinical and Research Applications

Clinical Trial Enrollment

# Clinical trial sites and enrollment
trial_sites <- data.frame(
  site = factor(paste0("Site_", sprintf("%02d", 1:12))),
  enrolled = c(125, 98, 87, 76, 72, 68, 65, 58, 52, 48, 45, 42),
  region = factor(c(rep("North America", 3), rep("Europe", 3), 
                    rep("Asia Pacific", 3), rep("Latin America", 3))),
  site_type = factor(c("Academic", "Community", "Private", "Academic", 
                       "Community", "Academic", "Private", "Community",
                       "Academic", "Community", "Private", "Academic"))
)

# Enrollment treemap
result_clinical <- jjtreemap(
  data = trial_sites,
  group = "site",
  size = "enrolled",
  color = "region",
  showLabels = TRUE,
  labelSize = 6,
  title = "Clinical Trial Enrollment by Site",
  subtitle = "Phase III Multi-Center Study",
  caption = "Total enrolled: 836 patients"
)

print(result_clinical)

Research Funding Distribution

# Research grant distribution
research_grants <- data.frame(
  department = factor(c("Oncology", "Cardiology", "Neurology", "Immunology",
                        "Genetics", "Infectious Disease", "Pediatrics", "Surgery")),
  funding = c(12.5, 10.2, 9.8, 8.5, 7.2, 6.5, 5.8, 4.5),
  grant_type = factor(c("Federal", "Federal", "Mixed", "Private",
                        "Federal", "Mixed", "State", "Private"))
)

# Funding treemap
result_research <- jjtreemap(
  data = research_grants,
  group = "department",
  size = "funding",
  color = "grant_type",
  showLabels = TRUE,
  labelSize = 7,
  labelFontFace = "bold",
  title = "Research Funding Distribution",
  subtitle = "Academic Medical Center FY2024",
  caption = "Values in millions USD"
)

print(result_research)

Data Preparation Best Practices

Aggregating Data

# Raw transaction data
raw_sales <- data.frame(
  product = sample(c("A", "B", "C", "D"), 100, replace = TRUE),
  region = sample(c("North", "South", "East", "West"), 100, replace = TRUE),
  sales = runif(100, 10, 100)
)

# Aggregate before treemap
agg_sales <- raw_sales %>%
  group_by(product, region) %>%
  summarise(total_sales = sum(sales), .groups = 'drop') %>%
  arrange(desc(total_sales))

# Display top aggregated data
head(agg_sales, 10)

# Create treemap from aggregated data
result_agg <- jjtreemap(
  data = agg_sales,
  group = "product",
  size = "total_sales",
  color = "region",
  showLabels = TRUE,
  title = "Aggregated Sales by Product and Region"
)

print(result_agg)
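
If a single rectangle per product is wanted instead, the data can be collapsed to one row per product before plotting. The sketch below uses base R aggregate() rather than dplyr so that it has no dependencies; by_product is a name introduced here for illustration.

```r
set.seed(123)
raw_sales <- data.frame(
  product = sample(c("A", "B", "C", "D"), 100, replace = TRUE),
  sales = runif(100, 10, 100)
)

# Sum sales within each product: one row per product
by_product <- aggregate(sales ~ product, data = raw_sales, FUN = sum)
names(by_product)[names(by_product) == "sales"] <- "total_sales"
by_product$product <- factor(by_product$product)

by_product
```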

Handling Negative Values

# Data with negative values (profits/losses)
profit_data <- data.frame(
  division = factor(c("Electronics", "Software", "Services", "Hardware", 
                      "Consulting", "Support")),
  profit = c(25.5, 18.3, -5.2, 12.7, -2.1, 8.5)
)

# Function automatically converts negatives to small positive values
result_profit <- jjtreemap(
  data = profit_data,
  group = "division",
  size = "profit",
  showLabels = TRUE,
  labelSize = 8,
  title = "Division Performance",
  subtitle = "Note: Negative values shown as minimal size",
  caption = "Original negative values: Services (-5.2), Consulting (-2.1)"
)

print(result_profit)
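
An alternative to shrinking negative values is to size rectangles by magnitude and carry the sign in a separate variable. The sketch below prepares such a data frame; the column names magnitude and direction are assumptions introduced for this example.

```r
profit_data <- data.frame(
  division = factor(c("Electronics", "Software", "Services", "Hardware",
                      "Consulting", "Support")),
  profit = c(25.5, 18.3, -5.2, 12.7, -2.1, 8.5)
)

# Size rectangles by absolute value so losses remain visible
profit_data$magnitude <- abs(profit_data$profit)

# Record the sign separately so it can be mapped to color
profit_data$direction <- factor(
  ifelse(profit_data$profit < 0, "Loss", "Profit"),
  levels = c("Profit", "Loss")
)

profit_data
```

The prepared data could then be plotted with size = "magnitude" and color = "direction", following the argument pattern used throughout this vignette.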

Hierarchical Data Preparation

# Prepare hierarchical data structure
hierarchy_data <- data.frame(
  main_category = factor(rep(c("Electronics", "Clothing", "Food"), each = 3)),
  sub_category = factor(c("Phones", "Laptops", "Tablets",
                          "Shirts", "Pants", "Shoes",
                          "Fruits", "Vegetables", "Dairy")),
  sales = c(150, 120, 80, 60, 70, 90, 45, 38, 52)
)

# Create treemap with hierarchy indication through colors
result_hierarchy <- jjtreemap(
  data = hierarchy_data,
  group = "sub_category",
  size = "sales",
  color = "main_category",
  showLabels = TRUE,
  labelSize = 6,
  title = "Sales by Category and Subcategory",
  subtitle = "Color indicates main category"
)

print(result_hierarchy)
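
When preparing two-level data, it is worth confirming the parent totals and each subcategory's share within its parent before plotting. A minimal base R sketch, assuming the same data as above (category_total and share_in_parent are illustrative names):

```r
hierarchy_data <- data.frame(
  main_category = factor(rep(c("Electronics", "Clothing", "Food"), each = 3)),
  sub_category = factor(c("Phones", "Laptops", "Tablets",
                          "Shirts", "Pants", "Shoes",
                          "Fruits", "Vegetables", "Dairy")),
  sales = c(150, 120, 80, 60, 70, 90, 45, 38, 52)
)

# Total sales per main category (the size of each parent group)
category_total <- tapply(hierarchy_data$sales, hierarchy_data$main_category, sum)

# Share of each subcategory within its own parent category
hierarchy_data$share_in_parent <- hierarchy_data$sales /
  ave(hierarchy_data$sales, hierarchy_data$main_category, FUN = sum)

category_total
hierarchy_data
```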

Performance Optimization

Large Dataset Handling

The example below times a treemap built from a larger dataset:

# Timing check with a 50-category dataset
large_data <- data.frame(
  category = factor(paste0("Category_", 1:50)),
  value = runif(50, 100, 10000),
  group = factor(rep(paste0("Group_", LETTERS[1:5]), each = 10))
)

# This should render efficiently due to optimizations
start_time <- Sys.time()
performance_result <- jjtreemap(
  data = large_data,
  group = "category",
  size = "value",
  color = "group",
  showLabels = TRUE
)
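# Print inside the timed region so any drawing done at print time is counted
print(performance_result)
end_time <- Sys.time()

elapsed <- as.numeric(difftime(end_time, start_time, units = "secs"))
cat("Rendering time:", round(elapsed, 2), "seconds\n")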

Optimization Features

The function implements several performance enhancements:

Data Preparation Caching: Processed data is cached to avoid recomputation
Option Preprocessing: Common option processing is done once and cached
Treemap Data Caching: The treemap calculation is cached and reused
Hash-based Change Detection: Only reprocesses when inputs change
Efficient Memory Usage: Minimizes data copying and transformation overhead
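
The idea behind caching with change detection can be illustrated with a small sketch. This is not the package's internal code: it compares inputs with identical() in place of a hash, and make_cached, prepare and n_runs are hypothetical names introduced only for this example.

```r
# Wrap an expensive preparation step so repeated calls with unchanged
# input reuse the stored result instead of recomputing it
make_cached <- function(compute) {
  has_result <- FALSE
  last_input <- NULL
  last_result <- NULL
  function(input) {
    if (!has_result || !identical(input, last_input)) {
      last_result <<- compute(input)
      last_input <<- input
      has_result <<- TRUE
    }
    last_result
  }
}

# Count how often the underlying computation actually runs
n_runs <- 0
prepare <- make_cached(function(df) {
  n_runs <<- n_runs + 1
  df[order(-df$value), ]  # stand-in for real data preparation
})

d <- data.frame(category = c("A", "B", "C"), value = c(10, 30, 20))
first  <- prepare(d)  # computes
second <- prepare(d)  # same input: served from the cache
d$value[1] <- 40
third  <- prepare(d)  # input changed: recomputed
n_runs
```

A hash-based variant would store a digest of the input instead of the input itself, which saves memory for large data frames at the cost of computing the digest on each call.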

Troubleshooting Common Issues

Label Visibility

# Small rectangles with labels
small_rect_data <- data.frame(
  item = factor(c("Large", "Medium", "Small", "Tiny", "Micro")),
  value = c(100, 30, 10, 3, 1)
)

# Solution 1: Adjust minimum label size
result_min_label <- jjtreemap(
  data = small_rect_data,
  group = "item",
  size = "value",
  showLabels = TRUE,
  labelSize = 3,  # Smaller minimum size
  title = "Solution: Smaller Minimum Label Size"
)

print(result_min_label)

# Solution 2: Hide all labels
result_no_labels <- jjtreemap(
  data = small_rect_data,
  group = "item", 
  size = "value",
  showLabels = FALSE,  # Turns off every label, not only those on small rectangles
  title = "Solution: Hide Labels Entirely"
)

print(result_no_labels)

Color Contrast

# Ensure good contrast between labels and backgrounds
contrast_data <- data.frame(
  category = factor(c("A", "B", "C", "D")),
  value = c(40, 30, 20, 10),
  type = factor(c("Dark", "Dark", "Light", "Light"))
)

# Adjust label colors for contrast
result_contrast <- jjtreemap(
  data = contrast_data,
  group = "category",
  size = "value",
  color = "type",
  showLabels = TRUE,
  labelLevel1Color = "black",  # Dark labels
  labelBackground = "#FFFFFFB3",  # white at about 70% opacity (R hex color with alpha)
  title = "Improved Label Contrast"
)

print(result_contrast)
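
When fill colors vary, a label color can be chosen per rectangle from the brightness of its fill. The helper below is a sketch using the common Rec. 601 luma weights and a 0.5 threshold; pick_label_color is a hypothetical function, not part of jjtreemap.

```r
# Return "black" or "white" depending on how light each fill color is
pick_label_color <- function(fill) {
  rgb_vals <- col2rgb(fill)  # 3 x n matrix of 0-255 channel values
  # Perceived brightness on a 0-1 scale
  brightness <- (0.299 * rgb_vals["red", ] +
                 0.587 * rgb_vals["green", ] +
                 0.114 * rgb_vals["blue", ]) / 255
  unname(ifelse(brightness > 0.5, "black", "white"))
}

pick_label_color(c("navy", "lightyellow", "#444444"))
```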

Data Validation

# Function to validate treemap data
validate_treemap_data <- function(data, group_var, size_var) {
  errors <- c()
  
  # Check if variables exist
  if (!group_var %in% names(data)) {
    errors <- c(errors, "Group variable not found in data")
  }
  if (!size_var %in% names(data)) {
    errors <- c(errors, "Size variable not found in data")
  }
  
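  # Stop early: the remaining checks need both columns to exist
  if (length(errors) > 0) return(errors)
  
  # Check data types
  size_is_numeric <- is.numeric(data[[size_var]])
  if (!size_is_numeric) {
    errors <- c(errors, "Size variable must be numeric")
  }
  
  # Check for negative values (only meaningful for a numeric column)
  if (size_is_numeric && any(data[[size_var]] < 0, na.rm = TRUE)) {
    errors <- c(errors, "Warning: Negative values will be converted to 0.01")
  }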
  
  # Check for missing values
  complete_rows <- sum(complete.cases(data[c(group_var, size_var)]))
  if (complete_rows == 0) {
    errors <- c(errors, "No complete data rows")
  }
  
  if (length(errors) == 0) {
    return("Data validation passed!")
  } else {
    return(errors)
  }
}

# Test validation
test_data <- data.frame(
  category = c("A", "B", "C"),
  value = c(10, 20, 30)
)

validate_treemap_data(test_data, "category", "value")

Best Practices and Recommendations

Design Guidelines

Hierarchy Levels: Limit to 2-3 levels for clarity
Color Usage: Use color to represent meaningful categories
Label Density: Show labels only for significant rectangles
Aspect Ratio: Choose based on display medium (wide for presentations, square for reports)
Border Width: Use thicker borders for main categories

Data Preparation Tips

Aggregate First: Always aggregate data to appropriate level
Handle Negatives: Convert negative values or use alternative visualization
Sort by Size: Larger values create more visually prominent rectangles
Limit Categories: 5-15 categories work best for readability
Use Meaningful Names: Short, descriptive category names
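
To keep the category count in the recommended range, small categories can be lumped into a single "Other" group before plotting. A minimal base R sketch, assuming a cutoff of the nine largest items (top_n, ranked and lumped are illustrative names):

```r
set.seed(123)
items <- data.frame(
  item = paste0("Item_", LETTERS[1:20]),
  value = runif(20, 10, 100),
  stringsAsFactors = FALSE
)

top_n <- 9  # hypothetical cutoff; keeps 10 rectangles including "Other"

# Rank by value, keep the largest, and sum the remainder into "Other"
ranked <- items[order(-items$value), ]
lumped <- rbind(
  ranked[seq_len(top_n), ],
  data.frame(item = "Other", value = sum(ranked$value[-seq_len(top_n)]),
             stringsAsFactors = FALSE)
)

# Keep the display order: largest first, "Other" last
lumped$item <- factor(lumped$item, levels = lumped$item)
lumped
```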

Visual Hierarchy

# Example of good visual hierarchy
hierarchy_example <- data.frame(
  category = factor(c("Primary A", "Primary B", "Secondary C", 
                      "Secondary D", "Minor E", "Minor F")),
  value = c(350, 280, 120, 95, 45, 30),
  importance = factor(c("High", "High", "Medium", "Medium", "Low", "Low"))
)

# Use a distinct name so the earlier result_hierarchy object is not overwritten
result_visual_hierarchy <- jjtreemap(
  data = hierarchy_example,
  group = "category",
  size = "value",
  color = "importance",
  showLabels = TRUE,
  labelSize = 6,
  borderWidth = 1.5,
  title = "Visual Hierarchy in Treemap Design",
  subtitle = "Size and color reinforce importance"
)

print(result_visual_hierarchy)

Integration with Reporting Workflows

Export-Ready Plots

# High-quality treemap for reports
report_data <- data.frame(
  metric = factor(c("Revenue", "Costs", "R&D", "Marketing", "Operations")),
  value = c(150, 85, 25, 18, 42),
  category = factor(c("Income", "Expense", "Investment", "Investment", "Expense"))
)

# Publication-quality treemap
result_publication <- jjtreemap(
  data = report_data,
  group = "metric",
  size = "value",
  color = "category",
  showLabels = TRUE,
  labelSize = 10,
  labelFontFace = "bold",
  borderWidth = 2,
  title = "Financial Overview FY2024",
  subtitle = "All values in millions USD",
  caption = "Source: Annual Financial Report"
)

print(result_publication)

Interactive Exploration Tips

Although the treemaps produced here are static images, a sequence of them can support step-by-step exploration:

Start Broad: Show high-level categories first
Drill Down: Create separate treemaps for subcategories
Time Series: Compare treemaps across time periods
Filters: Create treemaps for filtered data subsets
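
The "start broad, then drill down" steps can be sketched in base R by building one overview table and one detail table per category. The data repeat the hierarchical example from earlier; overview and by_category are names introduced here.

```r
hierarchy_data <- data.frame(
  main_category = rep(c("Electronics", "Clothing", "Food"), each = 3),
  sub_category = c("Phones", "Laptops", "Tablets",
                   "Shirts", "Pants", "Shoes",
                   "Fruits", "Vegetables", "Dairy"),
  sales = c(150, 120, 80, 60, 70, 90, 45, 38, 52),
  stringsAsFactors = FALSE
)

# Start broad: one row per main category
overview <- aggregate(sales ~ main_category, data = hierarchy_data, FUN = sum)

# Drill down: one data frame per main category, largest subcategory first
by_category <- lapply(
  split(hierarchy_data, hierarchy_data$main_category),
  function(d) d[order(-d$sales), c("sub_category", "sales")]
)

overview
by_category
```

Each element of by_category could then be passed to a separate treemap call with group = "sub_category" and size = "sales".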

Summary

The jjtreemap function provides a comprehensive solution for hierarchical data visualization with:

Flexible customization for borders, labels, and colors
Performance optimizations for efficient rendering
Robust error handling and data validation
Publication-ready output quality
Wide applicability across business, research, and clinical domains

Treemaps are particularly effective for:

Part-to-whole relationships
Portfolio composition
Budget allocation
Market share analysis
Resource distribution
Hierarchical categorical data

Function Reference

For complete parameter documentation, see the treemap and ggplot2 package documentation:

CRAN treemap documentation
ggplot2 documentation

# Session information
sessionInfo()

ClinicoPath

2025-07-13