Introduction to jjtreemap

The jjtreemap function wraps the treemap and ggplot2 R packages to draw treemap visualizations for categorical data. A treemap displays hierarchical data as nested rectangles whose areas are proportional to a quantitative value, which makes it well suited to part-to-whole relationships such as portfolio composition, market share, and budget allocation.
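To make "area proportional to value" concrete, here is a minimal base R sketch that does not call jjtreemap. The values and the 8 x 5 canvas are made up for illustration; it only shows the arithmetic a treemap layout has to respect, not how the layout itself is computed.

```r
# Hypothetical values for three categories
values <- c(A = 50, B = 30, C = 20)

# Hypothetical canvas size (any units)
canvas_width <- 8
canvas_height <- 5

# Each rectangle receives the same fraction of the canvas
# as its value is of the total
area_fraction <- values / sum(values)
rect_area <- area_fraction * canvas_width * canvas_height

print(round(area_fraction, 2))
print(rect_area)
```

Whatever layout algorithm is used, the rectangle areas should add up to the canvas area and keep these ratios.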

Key Features

  • Hierarchical Visualization: Display nested categorical data structures
  • Size Mapping: Rectangle areas represent quantitative values
  • Color Coding: Additional categorical or quantitative dimensions through color
  • Flexible Labeling: Customizable text display with size, color, and alignment options
  • Performance Optimized: Cached data preparation avoids recomputation when inputs are unchanged
  • Publication-Ready: High-quality outputs suitable for presentations and reports

Loading Required Libraries

library(ClinicoPath)
library(dplyr)
library(tidyr)

# For this vignette, we'll create sample data
set.seed(123)

Basic Treemap Creation

Simple Category Visualization

Let’s start with a basic treemap showing market share by company:

# Create sample market share data
market_data <- data.frame(
  company = factor(c("TechCorp", "DataSoft", "CloudNet", "AIWorks", "SecureIT", "WebDev")),
  market_share = c(28.5, 22.3, 18.7, 12.5, 10.2, 7.8),
  sector = factor(c("Software", "Software", "Cloud", "AI/ML", "Security", "Web"))
)

# Create basic treemap
result_basic <- jjtreemap(
  data = market_data,
  group = "company",
  size = "market_share",
  showLabels = TRUE,
  labelSize = 6
)

print(result_basic)

Understanding Treemap Components

Group Variable (Categories)

  • Defines the rectangles in the treemap
  • Can be hierarchical (multiple levels)
  • Each unique value becomes a separate rectangle

Size Variable (Rectangle Area)

  • Must be numeric and positive
  • Determines the area of each rectangle
  • Represents the quantitative dimension

Color Variable (Optional)

  • Additional categorical dimension
  • Helps distinguish groups or show patterns
  • Can represent a secondary classification
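The roles above can be checked against a data frame before plotting. The sketch below uses a hypothetical helper, suggest_roles(), which is not part of ClinicoPath; it simply applies the rules listed above (positive numeric columns can serve as size, factor or character columns as group or color) to a small illustrative data set.

```r
# Illustrative data in the same shape as the market share example
market_data <- data.frame(
  company = factor(c("TechCorp", "DataSoft", "CloudNet")),
  market_share = c(28.5, 22.3, 18.7),
  sector = factor(c("Software", "Software", "Cloud"))
)

# Hypothetical helper: suggest which treemap role each column could fill
suggest_roles <- function(data) {
  vapply(data, function(col) {
    if (is.numeric(col) && all(col > 0, na.rm = TRUE)) {
      "size"
    } else if (is.factor(col) || is.character(col)) {
      "group or color"
    } else {
      "not usable as-is"
    }
  }, character(1))
}

roles <- suggest_roles(market_data)
print(roles)
```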

Styling and Customization

Border Customization

Control the appearance of rectangle borders:

# Create data with hierarchical structure
dept_budget <- data.frame(
  department = factor(c("R&D", "Marketing", "Sales", "Operations", "HR", "IT")),
  budget = c(45.2, 32.5, 28.7, 38.9, 12.3, 22.5),
  division = factor(c("Innovation", "Growth", "Growth", "Core", "Support", "Infrastructure"))
)

# Treemap with custom borders
result_borders <- jjtreemap(
  data = dept_budget,
  group = "department",
  size = "budget",
  color = "division",
  borderWidth = 1.5,
  borderLevel1Width = 2,
  borderLevel2Width = 0.5,
  borderLevel1Color = "darkblue",
  borderLevel2Color = "lightgray",
  showLabels = TRUE
)

print(result_borders)

Label Customization

Font Styling

# Bold labels with custom size and color
result_bold <- jjtreemap(
  data = dept_budget,
  group = "department",
  size = "budget",
  labelFontFace = "bold",
  labelLevel1Size = 16,
  labelLevel1Color = "white",
  labelBackground = "rgba(0,0,0,0.3)",
  showLabels = TRUE
)

print(result_bold)

Label Alignment

# Custom label alignment
result_aligned <- jjtreemap(
  data = dept_budget,
  group = "department",
  size = "budget",
  labelAlignH = "left",
  labelAlignV = "top",
  showLabels = TRUE,
  labelSize = 8
)

print(result_aligned)

Color Palettes and Themes

# Product portfolio with color coding
product_data <- data.frame(
  product = factor(c("Smartphones", "Laptops", "Tablets", "Headphones", 
                     "Smartwatches", "Cameras", "Speakers", "Monitors")),
  revenue = c(450, 380, 220, 180, 150, 120, 98, 85),
  category = factor(c("Mobile", "Computing", "Mobile", "Audio", 
                      "Wearables", "Imaging", "Audio", "Computing"))
)

# Treemap with category colors
result_colored <- jjtreemap(
  data = product_data,
  group = "product",
  size = "revenue",
  color = "category",
  showLabels = TRUE,
  labelSize = 6,
  title = "Product Revenue by Category",
  subtitle = "2023 Annual Report",
  caption = "Values in millions USD"
)

print(result_colored)

Real-World Applications

Market Share Analysis

# Create illustrative market capitalization data
tech_market <- data.frame(
  company = factor(c("Apple", "Samsung", "Google", "Microsoft", "Amazon",
                     "Meta", "Tesla", "NVIDIA", "Intel", "Oracle")),
  market_cap = c(2850, 1450, 1680, 2450, 1580, 
                 890, 780, 1100, 190, 280),
  sector = factor(c("Consumer Tech", "Consumer Tech", "Internet", "Software", "E-commerce",
                    "Social Media", "Automotive", "Semiconductors", "Semiconductors", "Enterprise"))
)

# Market capitalization treemap
result_market <- jjtreemap(
  data = tech_market,
  group = "company",
  size = "market_cap",
  color = "sector",
  showLabels = TRUE,
  labelSize = 8,
  title = "Tech Giants Market Capitalization",
  subtitle = "By Sector Classification",
  caption = "Market cap in billions USD",
  aspectRatio = 1.4
)

print(result_market)

Budget Allocation Visualization

# Government budget example
gov_budget <- data.frame(
  category = factor(c("Healthcare", "Education", "Defense", "Social Security",
                      "Infrastructure", "Science & Tech", "Environment", 
                      "Agriculture", "Justice", "Other")),
  allocation = c(28.5, 22.3, 18.7, 25.2, 12.5, 8.3, 6.7, 5.8, 4.2, 3.8),
  type = factor(c("Social", "Social", "Security", "Social",
                  "Infrastructure", "Research", "Environment",
                  "Economic", "Administration", "Various"))
)

# Budget treemap with custom styling
result_budget <- jjtreemap(
  data = gov_budget,
  group = "category",
  size = "allocation",
  color = "type",
  showLabels = TRUE,
  labelSize = 7,
  labelFontFace = "bold",
  title = "Federal Budget Allocation",
  subtitle = "Fiscal Year 2024",
  caption = "Illustrative allocation values (arbitrary units)"
)

print(result_budget)

Portfolio Composition

# Investment portfolio breakdown
portfolio <- data.frame(
  asset = factor(c("US Stocks", "International Stocks", "Bonds", "Real Estate",
                   "Commodities", "Cash", "Crypto", "Private Equity")),
  value = c(45000, 28000, 35000, 22000, 12000, 8000, 5000, 15000),
  risk_level = factor(c("High", "High", "Low", "Medium", 
                        "High", "Low", "Very High", "High"))
)

# Portfolio treemap with risk coloring
result_portfolio <- jjtreemap(
  data = portfolio,
  group = "asset",
  size = "value",
  color = "risk_level",
  showLabels = TRUE,
  labelSize = 8,
  title = "Investment Portfolio Composition",
  subtitle = "Total Value: $170,000",
  caption = "Color indicates risk level"
)

print(result_portfolio)
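A total quoted in a subtitle or caption is easy to get out of sync with the data. One option, sketched here in base R, is to build the text from the values themselves. The vector repeats the portfolio values above; subtitle_text is a hypothetical name for the string that would be passed to the subtitle argument.

```r
# Portfolio values from the example above
portfolio_value <- c(45000, 28000, 35000, 22000, 12000, 8000, 5000, 15000)

# Compute the total and format it with thousands separators
total_value <- sum(portfolio_value)
subtitle_text <- paste0(
  "Total Value: $",
  formatC(total_value, format = "d", big.mark = ",")
)

print(subtitle_text)
```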

Advanced Customization

Aspect Ratio Control

# Test different aspect ratios
sales_data <- data.frame(
  region = factor(c("North", "South", "East", "West", "Central")),
  sales = c(120, 95, 110, 88, 102)
)

# Wide aspect ratio
result_wide <- jjtreemap(
  data = sales_data,
  group = "region",
  size = "sales",
  aspectRatio = 2.5,
  showLabels = TRUE,
  title = "Wide Aspect Ratio (2.5)"
)

print(result_wide)

# Square aspect ratio
result_square <- jjtreemap(
  data = sales_data,
  group = "region",
  size = "sales",
  aspectRatio = 1,
  showLabels = TRUE,
  title = "Square Aspect Ratio (1.0)"
)

print(result_square)
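When exporting, it can help to match the output dimensions to the chosen ratio. The sketch below assumes that aspectRatio means width divided by height, which this vignette does not state explicitly; plot_dims() is a hypothetical helper and the default width of 10 is arbitrary.

```r
# Hypothetical helper: derive export dimensions from an aspect ratio,
# assuming aspect ratio = width / height
plot_dims <- function(aspect_ratio, width = 10) {
  stopifnot(is.numeric(aspect_ratio), aspect_ratio > 0)
  c(width = width, height = width / aspect_ratio)
}

wide_dims <- plot_dims(2.5)
square_dims <- plot_dims(1)

print(wide_dims)
print(square_dims)
```

The resulting width and height could then be handed to whatever graphics device or save function the report uses.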

Handling Small Values

# Data with very different scales
diverse_data <- data.frame(
  category = factor(c("Giant", "Large", "Medium", "Small", "Tiny", "Microscopic")),
  value = c(1000, 200, 50, 10, 2, 0.5)
)

# Treemap handles extreme differences
result_diverse <- jjtreemap(
  data = diverse_data,
  group = "category",
  size = "value",
  showLabels = TRUE,
  labelSize = 4,
  labelOverlap = 0.8,  # Allow more overlap for small rectangles
  title = "Handling Extreme Value Differences"
)

print(result_diverse)
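When the smallest rectangles are too small to read at any label size, an alternative is to merge them before plotting. The following base R sketch is a data preparation step only; lump_small() and the 1% threshold are illustrative assumptions, not jjtreemap features.

```r
# Same values as the example above
diverse_data <- data.frame(
  category = c("Giant", "Large", "Medium", "Small", "Tiny", "Microscopic"),
  value = c(1000, 200, 50, 10, 2, 0.5)
)

# Hypothetical helper: merge categories below a minimum share into "Other"
lump_small <- function(data, group_var, size_var,
                       min_share = 0.01, other_label = "Other") {
  share <- data[[size_var]] / sum(data[[size_var]])
  labels <- as.character(data[[group_var]])
  labels[share < min_share] <- other_label
  # Sum the size variable within each (possibly merged) label
  out <- aggregate(data[[size_var]], by = list(labels), FUN = sum)
  names(out) <- c(group_var, size_var)
  out[[group_var]] <- factor(out[[group_var]])
  out[order(-out[[size_var]]), ]
}

lumped <- lump_small(diverse_data, "category", "value")
print(lumped)
```

The total is preserved, so the remaining rectangles keep their relative areas.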

Label Visibility Control

# Many categories - label management
many_categories <- data.frame(
  item = factor(paste0("Item_", LETTERS[1:20])),
  value = sort(runif(20, 10, 100), decreasing = TRUE)
)

# Control label display
result_many <- jjtreemap(
  data = many_categories,
  group = "item",
  size = "value",
  showLabels = TRUE,
  labelSize = 4,  # Minimum size for readability
  labelOverlap = 0.3,  # Less overlap tolerance
  title = "Many Categories with Smart Labeling"
)

print(result_many)

Clinical and Research Applications

Clinical Trial Enrollment

# Clinical trial sites and enrollment
trial_sites <- data.frame(
  site = factor(paste0("Site_", sprintf("%02d", 1:12))),
  enrolled = c(125, 98, 87, 76, 72, 68, 65, 58, 52, 48, 45, 42),
  region = factor(c(rep("North America", 3), rep("Europe", 3), 
                    rep("Asia Pacific", 3), rep("Latin America", 3))),
  site_type = factor(c("Academic", "Community", "Private", "Academic", 
                       "Community", "Academic", "Private", "Community",
                       "Academic", "Community", "Private", "Academic"))
)

# Enrollment treemap
result_clinical <- jjtreemap(
  data = trial_sites,
  group = "site",
  size = "enrolled",
  color = "region",
  showLabels = TRUE,
  labelSize = 6,
  title = "Clinical Trial Enrollment by Site",
  subtitle = "Phase III Multi-Center Study",
  caption = "Total enrolled: 836 patients"
)

print(result_clinical)

Research Funding Distribution

# Research grant distribution
research_grants <- data.frame(
  department = factor(c("Oncology", "Cardiology", "Neurology", "Immunology",
                        "Genetics", "Infectious Disease", "Pediatrics", "Surgery")),
  funding = c(12.5, 10.2, 9.8, 8.5, 7.2, 6.5, 5.8, 4.5),
  grant_type = factor(c("Federal", "Federal", "Mixed", "Private",
                        "Federal", "Mixed", "State", "Private"))
)

# Funding treemap
result_research <- jjtreemap(
  data = research_grants,
  group = "department",
  size = "funding",
  color = "grant_type",
  showLabels = TRUE,
  labelSize = 7,
  labelFontFace = "bold",
  title = "Research Funding Distribution",
  subtitle = "Academic Medical Center FY2024",
  caption = "Values in millions USD"
)

print(result_research)

Data Preparation Best Practices

Aggregating Data

# Raw transaction data
raw_sales <- data.frame(
  product = sample(c("A", "B", "C", "D"), 100, replace = TRUE),
  region = sample(c("North", "South", "East", "West"), 100, replace = TRUE),
  sales = runif(100, 10, 100)
)

# Aggregate before treemap
agg_sales <- raw_sales %>%
  group_by(product, region) %>%
  summarise(total_sales = sum(sales), .groups = 'drop') %>%
  arrange(desc(total_sales))

# Display top aggregated data
head(agg_sales, 10)

# Create treemap from aggregated data
result_agg <- jjtreemap(
  data = agg_sales,
  group = "product",
  size = "total_sales",
  color = "region",
  showLabels = TRUE,
  title = "Aggregated Sales by Product"
)

print(result_agg)
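Note that a table aggregated by product and region contains each product once per region, so the group column repeats. How jjtreemap treats repeated group values is not described here. If one rectangle per row is wanted, two possible preparations are sketched below in base R: a combined label, or a table with one row per product. The column name label is a hypothetical choice.

```r
set.seed(123)
raw_sales <- data.frame(
  product = sample(c("A", "B", "C", "D"), 100, replace = TRUE),
  region = sample(c("North", "South", "East", "West"), 100, replace = TRUE),
  sales = runif(100, 10, 100)
)

# Option 1: one row per product-region pair with a unique combined label
agg <- aggregate(sales ~ product + region, data = raw_sales, FUN = sum)
agg$label <- factor(paste(agg$product, agg$region, sep = " / "))

# Option 2: one row per product
by_product <- aggregate(sales ~ product, data = raw_sales, FUN = sum)

# Sanity checks: labels are unique and totals are preserved
stopifnot(anyDuplicated(agg$label) == 0)
stopifnot(isTRUE(all.equal(sum(agg$sales), sum(raw_sales$sales))))

print(head(agg))
print(by_product)
```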

Handling Negative Values

# Data with negative values (profits/losses)
profit_data <- data.frame(
  division = factor(c("Electronics", "Software", "Services", "Hardware", 
                      "Consulting", "Support")),
  profit = c(25.5, 18.3, -5.2, 12.7, -2.1, 8.5)
)

# Function automatically converts negatives to small positive values
result_profit <- jjtreemap(
  data = profit_data,
  group = "division",
  size = "profit",
  showLabels = TRUE,
  labelSize = 8,
  title = "Division Performance",
  subtitle = "Note: Negative values shown as minimal size",
  caption = "Original negative values: Services (-5.2), Consulting (-2.1)"
)

print(result_profit)
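If the losing divisions should stay visible rather than shrink to a minimal rectangle, one alternative is to size by absolute magnitude and encode the sign as a category. This is a data preparation sketch only; magnitude and direction are hypothetical column names, which would then be supplied as the size and color variables.

```r
profit_data <- data.frame(
  division = factor(c("Electronics", "Software", "Services", "Hardware",
                      "Consulting", "Support")),
  profit = c(25.5, 18.3, -5.2, 12.7, -2.1, 8.5)
)

# Size by absolute value so every division gets a readable rectangle
profit_data$magnitude <- abs(profit_data$profit)

# Keep the sign as a factor that can drive the color
profit_data$direction <- factor(
  ifelse(profit_data$profit < 0, "Loss", "Profit"),
  levels = c("Profit", "Loss")
)

print(profit_data)
```

With this encoding, the caption should say that area shows the magnitude of profit or loss, since area no longer represents signed profit.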

Hierarchical Data Preparation

# Prepare hierarchical data structure
hierarchy_data <- data.frame(
  main_category = factor(rep(c("Electronics", "Clothing", "Food"), each = 3)),
  sub_category = factor(c("Phones", "Laptops", "Tablets",
                          "Shirts", "Pants", "Shoes",
                          "Fruits", "Vegetables", "Dairy")),
  sales = c(150, 120, 80, 60, 70, 90, 45, 38, 52)
)

# Create treemap with hierarchy indication through colors
result_hierarchy <- jjtreemap(
  data = hierarchy_data,
  group = "sub_category",
  size = "sales",
  color = "main_category",
  showLabels = TRUE,
  labelSize = 6,
  title = "Sales by Category and Subcategory",
  subtitle = "Color indicates main category"
)

print(result_hierarchy)

Performance Optimization

Large Dataset Handling

The function includes several performance optimizations:

# Performance test with larger dataset
large_data <- data.frame(
  category = factor(paste0("Category_", 1:50)),
  value = runif(50, 100, 10000),
  group = factor(rep(paste0("Group_", LETTERS[1:5]), each = 10))
)

# This should render efficiently due to optimizations
start_time <- Sys.time()
performance_result <- jjtreemap(
  data = large_data,
  group = "category",
  size = "value",
  color = "group",
  showLabels = TRUE
)
end_time <- Sys.time()

cat("Rendering time:", round(as.numeric(difftime(end_time, start_time, units = "secs")), 2), "seconds\n")
print(performance_result)

Optimization Features

The function implements several performance enhancements:

  1. Data Preparation Caching: Processed data is cached to avoid recomputation
  2. Option Preprocessing: Common option processing is done once and cached
  3. Treemap Data Caching: The treemap calculation is cached and reused
  4. Hash-based Change Detection: Only reprocesses when inputs change
  5. Efficient Memory Usage: Minimizes data copying and transformation overhead
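The caching idea in items 1 to 4 can be illustrated in a few lines of base R. This is a generic sketch of hash-based change detection, not the package's actual implementation: input_hash(), make_cached(), and slow_prepare() are hypothetical names, and the MD5 of the serialized inputs stands in for whatever hash the function really uses.

```r
# Hash any R object: serialize it to a temporary file and take the MD5 sum
input_hash <- function(x) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeBin(serialize(x, connection = NULL), tmp)
  unname(tools::md5sum(tmp))
}

# Wrap a function so results are reused when the inputs hash to the same key
make_cached <- function(fun) {
  cache <- new.env(parent = emptyenv())
  function(...) {
    key <- input_hash(list(...))
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, fun(...), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

# Stand-in for an expensive preparation step; counts how often it really runs
n_calls <- 0
slow_prepare <- function(data) {
  n_calls <<- n_calls + 1
  data[order(-data$value), ]
}
cached_prepare <- make_cached(slow_prepare)

d <- data.frame(item = c("A", "B", "C"), value = c(10, 30, 20))
first <- cached_prepare(d)
second <- cached_prepare(d)                               # unchanged input: cache hit
third <- cached_prepare(transform(d, value = value * 2))  # changed input: recomputed

print(n_calls)
```

The preparation step runs only when the hash of the inputs changes, which is the behavior items 3 and 4 describe.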

Troubleshooting Common Issues

Label Visibility

# Small rectangles with labels
small_rect_data <- data.frame(
  item = factor(c("Large", "Medium", "Small", "Tiny", "Micro")),
  value = c(100, 30, 10, 3, 1)
)

# Solution 1: Adjust minimum label size
result_min_label <- jjtreemap(
  data = small_rect_data,
  group = "item",
  size = "value",
  showLabels = TRUE,
  labelSize = 3,  # Smaller minimum size
  title = "Solution: Smaller Minimum Label Size"
)

print(result_min_label)

# Solution 2: Hide all labels (showLabels = FALSE applies to every rectangle)
result_no_labels <- jjtreemap(
  data = small_rect_data,
  group = "item", 
  size = "value",
  showLabels = FALSE,  # Hide labels for cleaner look
  title = "Solution: Hide All Labels"
)

print(result_no_labels)

Color Contrast

# Ensure good contrast between labels and backgrounds
contrast_data <- data.frame(
  category = factor(c("A", "B", "C", "D")),
  value = c(40, 30, 20, 10),
  type = factor(c("Dark", "Dark", "Light", "Light"))
)

# Adjust label colors for contrast
result_contrast <- jjtreemap(
  data = contrast_data,
  group = "category",
  size = "value",
  color = "type",
  showLabels = TRUE,
  labelLevel1Color = "black",  # Dark labels
  labelBackground = "rgba(255,255,255,0.7)",  # Semi-transparent white background
  title = "Improved Label Contrast"
)

print(result_contrast)

Data Validation

# Function to validate treemap data
# Returns a character vector of problems, or a success message if none are found
validate_treemap_data <- function(data, group_var, size_var) {
  errors <- character(0)
  
  # Check if variables exist
  if (!group_var %in% names(data)) {
    errors <- c(errors, "Group variable not found in data")
  }
  if (!size_var %in% names(data)) {
    errors <- c(errors, "Size variable not found in data")
  }
  
  # Stop early: the remaining checks need both columns
  if (length(errors) > 0) return(errors)
  
  # Check data types; only test for negatives when the column is numeric
  if (!is.numeric(data[[size_var]])) {
    errors <- c(errors, "Size variable must be numeric")
  } else if (any(data[[size_var]] < 0, na.rm = TRUE)) {
    errors <- c(errors, "Warning: Negative values will be converted to 0.01")
  }
  
  # Check for missing values
  complete_rows <- sum(complete.cases(data[c(group_var, size_var)]))
  if (complete_rows == 0) {
    errors <- c(errors, "No complete data rows")
  }
  
  if (length(errors) == 0) {
    return("Data validation passed!")
  } else {
    return(errors)
  }
}

# Test validation
test_data <- data.frame(
  category = c("A", "B", "C"),
  value = c(10, 20, 30)
)

validate_treemap_data(test_data, "category", "value")

Best Practices and Recommendations

Design Guidelines

  1. Hierarchy Levels: Limit to 2-3 levels for clarity
  2. Color Usage: Use color to represent meaningful categories
  3. Label Density: Show labels only for significant rectangles
  4. Aspect Ratio: Choose based on display medium (wide for presentations, square for reports)
  5. Border Width: Use thicker borders for main categories

Data Preparation Tips

  1. Aggregate First: Always aggregate data to appropriate level
  2. Handle Negatives: Convert negative values or use alternative visualization
  3. Sort by Size: Larger values create more visually prominent rectangles
  4. Limit Categories: 5-15 categories work best for readability
  5. Use Meaningful Names: Short, descriptive category names

Visual Hierarchy

# Example of good visual hierarchy
hierarchy_example <- data.frame(
  category = factor(c("Primary A", "Primary B", "Secondary C", 
                      "Secondary D", "Minor E", "Minor F")),
  value = c(350, 280, 120, 95, 45, 30),
  importance = factor(c("High", "High", "Medium", "Medium", "Low", "Low"))
)

# Use a distinct name so the earlier result_hierarchy object is not overwritten
result_visual_hierarchy <- jjtreemap(
  data = hierarchy_example,
  group = "category",
  size = "value",
  color = "importance",
  showLabels = TRUE,
  labelSize = 6,
  borderWidth = 1.5,
  title = "Visual Hierarchy in Treemap Design",
  subtitle = "Size and color reinforce importance"
)

print(result_visual_hierarchy)

Integration with Reporting Workflows

Export-Ready Plots

# High-quality treemap for reports
report_data <- data.frame(
  metric = factor(c("Revenue", "Costs", "R&D", "Marketing", "Operations")),
  value = c(150, 85, 25, 18, 42),
  category = factor(c("Income", "Expense", "Investment", "Investment", "Expense"))
)

# Publication-quality treemap
result_publication <- jjtreemap(
  data = report_data,
  group = "metric",
  size = "value",
  color = "category",
  showLabels = TRUE,
  labelSize = 10,
  labelFontFace = "bold",
  borderWidth = 2,
  title = "Financial Overview FY2024",
  subtitle = "All values in millions USD",
  caption = "Source: Annual Financial Report"
)

print(result_publication)

Interactive Exploration Tips

While treemaps are static, they can guide interactive exploration:

  1. Start Broad: Show high-level categories first
  2. Drill Down: Create separate treemaps for subcategories
  3. Time Series: Compare treemaps across time periods
  4. Filters: Create treemaps for filtered data subsets
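The "start broad, then drill down" steps above can be prepared with base R alone. The sketch below reuses the category and subcategory data from the hierarchical example; overview and by_main are hypothetical names. Each resulting data frame could be passed to its own plotting call.

```r
hierarchy_data <- data.frame(
  main_category = rep(c("Electronics", "Clothing", "Food"), each = 3),
  sub_category = c("Phones", "Laptops", "Tablets",
                   "Shirts", "Pants", "Shoes",
                   "Fruits", "Vegetables", "Dairy"),
  sales = c(150, 120, 80, 60, 70, 90, 45, 38, 52)
)

# Start broad: one row per main category
overview <- aggregate(sales ~ main_category, data = hierarchy_data, FUN = sum)

# Drill down: one data frame per main category, holding its subcategories
by_main <- split(hierarchy_data, hierarchy_data$main_category)

print(overview)
print(names(by_main))
print(by_main[["Food"]])
```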

Summary

The jjtreemap function provides a comprehensive solution for hierarchical data visualization with:

  • Flexible customization for borders, labels, and colors
  • Performance optimizations for efficient rendering
  • Robust error handling and data validation
  • Publication-ready output quality
  • Wide applicability across business, research, and clinical domains

Treemaps are particularly effective for:

  • Part-to-whole relationships
  • Portfolio composition
  • Budget allocation
  • Market share analysis
  • Resource distribution
  • Hierarchical categorical data

Function Reference

For complete parameter documentation, see the treemap and ggplot2 package documentation:

  • CRAN treemap documentation
  • ggplot2 documentation

# Session information
sessionInfo()