Introduction

What is Predictive Power Score (PPS)?

Predictive Power Score (PPS) is a model-based metric that scores how well one variable predicts another, covering both linear and non-linear relationships. Unlike traditional correlation analysis, PPS:

  • Detects non-linear relationships that correlation misses
  • Works with mixed data types (numeric, categorical)
  • Provides asymmetric scores (X→Y may differ from Y→X)
  • Scores a fitted model against a naive baseline (the Model Score and Baseline Score columns in the output)
  • Ranges from 0 (no predictive power) to 1 (perfect prediction)
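The properties listed above can be sketched in a few lines of base R. The helper below, `toy_pps()`, is a hypothetical name introduced only for illustration; it is not the ppsr or jpps implementation. It assumes a numeric target, a crude piecewise-constant model (bin medians) standing in for a decision tree, a single hold-out split, and mean absolute error normalised against a median baseline.

```r
# Hypothetical helper for illustration only; not the ppsr or jpps implementation.
toy_pps <- function(x, y, n_bins = 5) {
  # Hold out every second observation so errors are measured out of sample
  test_idx <- seq(2, length(x), by = 2)
  x_train <- x[-test_idx]; y_train <- y[-test_idx]
  x_test  <- x[test_idx];  y_test  <- y[test_idx]

  # Piecewise-constant model: median of y within quantile bins of x
  breaks <- unique(quantile(x_train, probs = seq(0, 1, length.out = n_bins + 1)))
  breaks[1] <- -Inf
  breaks[length(breaks)] <- Inf
  bin_medians <- tapply(y_train, cut(x_train, breaks), median)
  pred <- as.numeric(bin_medians[as.integer(cut(x_test, breaks))])

  # Naive baseline: always predict the training median
  baseline <- median(y_train)
  pred[is.na(pred)] <- baseline  # fall back if a bin had no training rows

  mae_model <- mean(abs(y_test - pred))
  mae_baseline <- mean(abs(y_test - baseline))

  # Normalise: 0 means no better than the baseline, 1 means no error at all
  max(0, 1 - mae_model / mae_baseline)
}

set.seed(1)
x <- seq(-3, 3, length.out = 50)
y <- x^2 + rnorm(50, 0, 0.5)

pps_x_to_y <- toy_pps(x, y)  # x determines y up to noise
pps_y_to_x <- toy_pps(y, x)  # y cannot reveal the sign of x
c(x_to_y = pps_x_to_y, y_to_x = pps_y_to_x)
```

Under these assumptions the score for x→y should come out clearly higher than the score for y→x, which is the asymmetry described in the list. Real PPS implementations use cross-validation and proper tree models, so their numbers will differ.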

Why Use PPS Instead of Correlation?

Feature               Correlation            PPS
Relationship Types    Linear only            Linear + Non-linear
Data Types            Numeric only           Numeric + Categorical
Symmetry              Symmetric (X↔Y)        Asymmetric (X→Y ≠ Y→X)
Method                Mathematical formula   Machine learning
Range                 -1 to +1               0 to 1
Outlier Sensitivity   High                   Moderate
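The "Outlier Sensitivity" row can be illustrated for the correlation column with base R alone. This is a small sketch on made-up numbers: a perfectly linear series, then the same series with one extreme point added. It says nothing about PPS itself; it only shows how strongly a single observation can move Pearson's coefficient, with the rank-based Spearman coefficient shown for comparison.

```r
# Perfectly linear data: Pearson correlation is 1
x <- 1:20
y <- 2 * x
pearson_clean <- cor(x, y)

# Add a single extreme observation
x_out <- c(x, 21)
y_out <- c(y, -100)

pearson_outlier  <- cor(x_out, y_out, method = "pearson")
spearman_outlier <- cor(x_out, y_out, method = "spearman")

round(c(clean = pearson_clean,
        pearson_with_outlier = pearson_outlier,
        spearman_with_outlier = spearman_outlier), 3)
```

One point out of 21 is enough to pull Pearson's coefficient close to zero here, while the rank-based measure degrades far less.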

The jpps Function

The jpps() function in ClinicoPath provides a comprehensive interface for PPS analysis with four analysis types:

  1. Single: One predictor → one target
  2. Predictors: Multiple predictors → one target
  3. Matrix: All variables → all variables
  4. Compare: PPS vs correlation comparison

Getting Started

# Load required packages
library(ClinicoPath)
library(dplyr)    # provides %>%, mutate(), case_when(), lag() and filter() used in later examples
library(ggplot2)

# Set random seed for reproducibility
set.seed(20250707)
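
The outputs later in this document show a "Package Required" message whenever the ppsr package is missing, so it may be worth checking for it before running any example. The snippet below is a minimal sketch; `has_ppsr` is a hypothetical variable name, and the suggested `install.packages()` call assumes ppsr is available from your configured repository.

```r
# Check whether the ppsr backend is installed without attaching it
has_ppsr <- requireNamespace("ppsr", quietly = TRUE)

if (!has_ppsr) {
  # Only print a hint; installation is left to the user
  message("ppsr is not installed; jpps() will return empty tables. ",
          "Consider running install.packages(\"ppsr\").")
}
```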

Quick Example

Let’s start with a simple example showing PPS detecting a non-linear relationship:

# Create data with non-linear relationship
data <- data.frame(
  x = seq(-3, 3, length.out = 50),
  y_linear = 2 * seq(-3, 3, length.out = 50) + rnorm(50, 0, 0.5),
  y_quadratic = seq(-3, 3, length.out = 50)^2 + rnorm(50, 0, 0.5)
)

# Analyze with PPS
result <- jpps(
  data = data,
  analysis_type = "predictors",
  target_var = "y_quadratic",
  predictor_vars = c("x"),
  algorithm = "tree"
)

# View results
result
## 
##  PREDICTIVE POWER SCORE ANALYSIS
## 
##  ❌ Package Required
## 
##  The 'ppsr' package is required for Predictive Power Score analysis.
## 
## 
##  PPS Scores                                                                         
##  ────────────────────────────────────────────────────────────────────────────────── 
##    Predictor    Target    PPS Score    Baseline Score    Model Score    CV Method   
##  ────────────────────────────────────────────────────────────────────────────────── 
##  ────────────────────────────────────────────────────────────────────────────────── 
## 
## 
##  Analysis Summary       
##  ────────────────────── 
##    Statistic    Value   
##  ────────────────────── 
##  ────────────────────── 
## 
## 
##  🔍 PPS Interpretation Guide
## 
##  PPS Score Interpretation:
## 
##  • 0.0: No predictive power (random prediction)
## 
##  • 0.0-0.2: Weak predictive relationship
## 
##  • 0.2-0.5: Moderate predictive relationship
## 
##  • 0.5-0.8: Strong predictive relationship
## 
##  • 0.8-1.0: Very strong to perfect prediction
## 
## 
##  Key Advantages of PPS:
## 
##  • Detects non-linear relationships that correlation might miss
## 
##  • Asymmetric: X→Y may differ from Y→X predictive power
## 
##  • Works with mixed data types (numeric, categorical)
## 
##  • Uses machine learning for robust relationship detection
## 
## 
##  ⚠️ Important Considerations:
## 
##  • PPS is a "quick and dirty" exploration tool
## 
##  • High PPS doesn't necessarily imply causation
## 
##  • Results may vary with different algorithms and sample sizes
## 
##  • Consider domain knowledge when interpreting relationships
## 
##  • Use PPS to guide further, more detailed analysis

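This example is meant to show PPS detecting the quadratic relationship between x and y_quadratic. In the run shown above the ppsr package was not installed, so the score tables are empty; with ppsr available, the PPS Scores table should list x as a predictor of y_quadratic.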
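
For contrast, Pearson correlation can be computed on the same kind of data with base R. This sketch regenerates the three columns with the seed used earlier and compares the linear and quadratic targets; it does not call jpps() and needs no extra packages.

```r
set.seed(20250707)

# Same construction as the Quick Example data
x <- seq(-3, 3, length.out = 50)
y_linear    <- 2 * x + rnorm(50, 0, 0.5)
y_quadratic <- x^2 + rnorm(50, 0, 0.5)

cor_linear    <- cor(x, y_linear)     # strong linear association
cor_quadratic <- cor(x, y_quadratic)  # symmetric parabola: little linear association

round(c(linear = cor_linear, quadratic = cor_quadratic), 3)
```

Because x is symmetric around zero, the quadratic target is almost uncorrelated with x even though x determines it up to noise. This is the situation where a PPS analysis is expected to add information.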

Analysis Types

Single Predictor Analysis

Use when you want to analyze one specific predictor-target relationship:

# Create test data
single_data <- data.frame(
  sales = c(100, 150, 120, 180, 140, 200, 160, 220, 180, 250),
  advertising = c(5, 8, 6, 10, 7, 12, 9, 13, 10, 15)
)

# Single predictor analysis
single_result <- jpps(
  data = single_data,
  analysis_type = "single",
  target_var = "sales",
  predictor_var = "advertising",
  algorithm = "auto"
)

single_result

Multiple Predictors Analysis

Use to identify the best predictors for a target variable:

# Create business data
# (all columns are simulated independently, so no predictor is expected to score high)
business_data <- data.frame(
  revenue = rnorm(40, 1000, 200),
  marketing_spend = rnorm(40, 50, 15),
  employee_count = round(rnorm(40, 25, 8)),
  customer_satisfaction = rnorm(40, 4.2, 0.8),
  market_share = rnorm(40, 15, 5)
)

# Multiple predictors analysis
predictors_result <- jpps(
  data = business_data,
  analysis_type = "predictors",
  target_var = "revenue",
  predictor_vars = c("marketing_spend", "employee_count", "customer_satisfaction", "market_share"),
  sort_results = "pps_desc",
  show_summary = TRUE
)

predictors_result
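
The simulated business data above draws every column independently, so none of the predictors carries real signal about revenue. A quick base R check, sketched below on a regenerated copy of the data, makes that visible before any PPS ranking is interpreted. The name `cor_with_revenue` is introduced here for illustration.

```r
set.seed(20250707)
n <- 40

# Same construction as business_data above: independent draws per column
business_data <- data.frame(
  revenue = rnorm(n, 1000, 200),
  marketing_spend = rnorm(n, 50, 15),
  employee_count = round(rnorm(n, 25, 8)),
  customer_satisfaction = rnorm(n, 4.2, 0.8),
  market_share = rnorm(n, 15, 5)
)

# Pearson correlation of each predictor with the target
predictors <- setdiff(names(business_data), "revenue")
cor_with_revenue <- sapply(business_data[predictors], cor, y = business_data$revenue)

round(cor_with_revenue, 3)
```

With data like this, any ordering of predictors mostly reflects sampling noise, so the example demonstrates the call and its sorting options rather than a meaningful ranking.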

Matrix Analysis

Use for comprehensive relationship exploration between all variables:

# Create data for the matrix analysis
matrix_data <- data.frame(
  var1 = rnorm(30, 50, 10),
  var2 = rnorm(30, 100, 20),
  var3 = rnorm(30, 25, 5),
  var4 = rnorm(30, 75, 15)
) %>%
  mutate(
    # Create some relationships
    var2 = var2 + 0.5 * var1,  # Linear relationship
    var3 = var1^0.5 + rnorm(30, 0, 2),  # Non-linear relationship
    var4 = ifelse(var1 > 50, var4 + 20, var4)  # Threshold relationship
  )

# Matrix analysis with heatmap
matrix_result <- jpps(
  data = matrix_data,
  analysis_type = "matrix",
  matrix_vars = c("var1", "var2", "var3", "var4"),
  show_heatmap = TRUE,
  color_scheme = "viridis",
  show_values_on_plot = TRUE
)

matrix_result
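
Because PPS is directional, a matrix analysis has to score every ordered predictor→target pair, whereas a correlation matrix only needs each unordered pair once. The sketch below enumerates the pairs for the four variables used above. Excluding each variable paired with itself is an assumption about how results are presented, not something taken from jpps().

```r
vars <- c("var1", "var2", "var3", "var4")

# Every ordered (predictor, target) combination, without self-pairs
pairs <- expand.grid(predictor = vars, target = vars, stringsAsFactors = FALSE)
pairs <- pairs[pairs$predictor != pairs$target, ]

n_directed   <- nrow(pairs)              # pairs a PPS matrix has to score
n_undirected <- choose(length(vars), 2)  # pairs a symmetric correlation matrix needs

c(directed = n_directed, undirected = n_undirected)
```

The number of models grows roughly with the square of the number of variables, which is worth keeping in mind before passing many columns to matrix_vars.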

Comparison Analysis

Use to compare PPS with correlation and identify where PPS provides additional insights:

# Create data where PPS outperforms correlation
comparison_data <- data.frame(
  x = runif(50, -3, 3)
) %>%
  mutate(
    linear_rel = 0.8 * x + rnorm(50, 0, 0.5),      # High correlation, high PPS
    quadratic_rel = x^2 + rnorm(50, 0, 0.5),       # Low correlation, high PPS
    no_rel = rnorm(50, 0, 1)                       # Low correlation, low PPS
  )

# Comparison analysis
comparison_result <- jpps(
  data = comparison_data,
  analysis_type = "compare",
  matrix_vars = c("x", "linear_rel", "quadratic_rel", "no_rel"),
  show_correlation_comparison = TRUE,
  correlation_method = "pearson"
)

comparison_result
## 
##  PREDICTIVE POWER SCORE ANALYSIS
## 
##  ❌ Package Required
## 
##  The 'ppsr' package is required for Predictive Power Score analysis.
## 
## 
##  PPS Scores                                                                         
##  ────────────────────────────────────────────────────────────────────────────────── 
##    Predictor    Target    PPS Score    Baseline Score    Model Score    CV Method   
##  ────────────────────────────────────────────────────────────────────────────────── 
##  ────────────────────────────────────────────────────────────────────────────────── 
## 
## 
##  PPS vs Correlation Comparison                                  
##  ────────────────────────────────────────────────────────────── 
##    Variable Pair    PPS Score    Correlation    PPS Advantage   
##  ────────────────────────────────────────────────────────────── 
##  ────────────────────────────────────────────────────────────── 
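
The correlation_method argument above is set to "pearson". Switching to a rank-based coefficient does not rescue a non-monotonic relationship, as the base R sketch below suggests. It uses an exact, noise-free parabola on a symmetric grid, and it assumes nothing about which methods jpps() accepts; the three names are simply the ones supported by stats::cor().

```r
# Symmetric grid and an exact parabola: y is fully determined by x
x <- (-25:25) * 0.12
y <- x^2

methods <- c("pearson", "spearman", "kendall")
cors <- sapply(methods, function(m) cor(x, y, method = m))

round(cors, 6)
```

All three coefficients are essentially zero even though y is a deterministic function of x, because each of them measures linear or monotonic association only.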

Algorithm Options

jpps() offers three settings for the algorithm argument, each suited to different situations:

Decision Tree Algorithm

Best for: Interpretable results, categorical variables, threshold relationships

# Data with clear thresholds
threshold_data <- data.frame(
  income = runif(40, 20000, 100000),
  age = round(runif(40, 25, 65))
) %>%
  mutate(
    loan_approved = factor(ifelse(income > 50000 & age > 30, "Yes", "No"))
  )

tree_result <- jpps(
  data = threshold_data,
  analysis_type = "single",
  target_var = "loan_approved",
  predictor_var = "income",
  algorithm = "tree"
)

tree_result
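
To see why tree-based models suit threshold relationships, the sketch below fits the simplest possible tree, a single split (a "stump"), in base R. It is purely illustrative and is not what ppsr or jpps() run internally. The data is a deterministic variant of the loan example (an evenly spaced income grid and a repeating age pattern) so the result does not depend on random draws.

```r
# Deterministic variant of the loan example
income <- seq(20000, 100000, length.out = 40)
age <- rep(c(28, 40, 55, 62), times = 10)
loan_approved <- income > 50000 & age > 30

# Candidate split points: midpoints between consecutive income values
sorted_income <- sort(income)
candidates <- (head(sorted_income, -1) + tail(sorted_income, -1)) / 2

# Misclassification count of the rule "approve when income > threshold"
errors <- vapply(
  candidates,
  function(th) sum((income > th) != loan_approved),
  numeric(1)
)

best_threshold <- candidates[which.min(errors)]    # first split with the fewest errors
stump_accuracy <- 1 - min(errors) / length(income)

c(threshold = best_threshold, accuracy = stump_accuracy)
```

The chosen split lands next to the 50000 cut-off built into the data. The remaining errors come from applicants with high income but age 30 or below, which income alone cannot explain.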

Random Forest Algorithm

Best for: Complex relationships, mixed data types, high accuracy

forest_result <- jpps(
  data = threshold_data,
  analysis_type = "single", 
  target_var = "loan_approved",
  predictor_var = "income",
  algorithm = "forest"
)

forest_result

Auto Algorithm

Best for: General use, automatic algorithm selection

auto_result <- jpps(
  data = threshold_data,
  analysis_type = "single",
  target_var = "loan_approved", 
  predictor_var = "income",
  algorithm = "auto"
)

auto_result

Working with Different Data Types

Mixed Data Types

PPS can score categorical and numerical predictors in the same analysis:

# Create mixed data
mixed_data <- data.frame(
  # Categorical variables
  education = factor(sample(c("High School", "Bachelor", "Master", "PhD"), 60, replace = TRUE)),
  department = factor(sample(c("Sales", "Marketing", "Engineering", "HR"), 60, replace = TRUE)),
  
  # Numerical variables
  experience_years = round(runif(60, 0, 20)),
  performance_score = rnorm(60, 75, 15)
) %>%
  mutate(
    # Salary influenced by education and experience.
    # rnorm(n(), ...) draws one noise value per row, matching the length case_when() expects
    salary = case_when(
      education == "PhD" ~ 80000 + experience_years * 3000 + rnorm(n(), 0, 5000),
      education == "Master" ~ 60000 + experience_years * 2500 + rnorm(n(), 0, 4000),
      education == "Bachelor" ~ 45000 + experience_years * 2000 + rnorm(n(), 0, 3000),
      TRUE ~ 30000 + experience_years * 1500 + rnorm(n(), 0, 2000)
    )
  )

# Analyze mixed data
mixed_result <- jpps(
  data = mixed_data,
  analysis_type = "predictors",
  target_var = "salary",
  predictor_vars = c("education", "department", "experience_years", "performance_score"),
  sort_results = "pps_desc"
)

mixed_result
## 
##  PREDICTIVE POWER SCORE ANALYSIS
## 
##  <div style='padding: 20px; color: #d32f2f;'>
##  ❌ Package Required
## 
##  The 'ppsr' package is required for Predictive Power Score analysis.
## 
## 
##  PPS Scores                                                                         
##  ────────────────────────────────────────────────────────────────────────────────── 
##    Predictor    Target    PPS Score    Baseline Score    Model Score    CV Method   
##  ────────────────────────────────────────────────────────────────────────────────── 
##  ────────────────────────────────────────────────────────────────────────────────── 
## 
## 
##  Analysis Summary       
##  ────────────────────── 
##    Statistic    Value   
##  ────────────────────── 
##  ────────────────────── 
## 
## 
##  <div style='font-family: -apple-system,BlinkMacSystemFont,Segoe
##  UI,Roboto,Arial,sans-serif; padding: 20px;'>
##  <h3 style='color: #3f51b5; margin-bottom: 15px;'>🔍 PPS Interpretation
##  Guide
## 
## 
##  <div style='background-color: #e8f5e8; padding: 15px; border-radius:
##  8px; border-left: 4px solid #4caf50; margin-bottom: 15px;'>
##  PPS Score Interpretation:
## 
##  • 0.0: No predictive power (random prediction)
## 
##  • 0.0-0.2: Weak predictive relationship
## 
##  • 0.2-0.5: Moderate predictive relationship
## 
##  • 0.5-0.8: Strong predictive relationship
## 
##  • 0.8-1.0: Very strong to perfect prediction
## 
## 
##  Key Advantages of PPS:
## 
##  • Detects non-linear relationships that correlation might miss
## 
##  • Asymmetric: X→Y may differ from Y→X predictive power
## 
##  • Works with mixed data types (numeric, categorical)
## 
##  • Uses machine learning for robust relationship detection
## 
## 
##  ⚠️ Important Considerations:
## 
##  • PPS is a "quick and dirty" exploration tool
## 
##  • High PPS doesn't necessarily imply causation
## 
##  • Results may vary with different algorithms and sample sizes
## 
##  • Consider domain knowledge when interpreting relationships
## 
##  • Use PPS to guide further, more detailed analysis

Time Series Data

PPS can detect temporal patterns:

# Create time series data
ts_data <- data.frame(
  month = 1:24,
  seasonal_factor = sin(2 * pi * (1:24) / 12)
) %>%
  mutate(
    sales = 1000 + 50 * month + 200 * seasonal_factor + rnorm(24, 0, 50),
    lagged_sales = dplyr::lag(sales, 1)  # previous month's sales
  ) %>%
  filter(!is.na(lagged_sales))

# Analyze temporal relationships
ts_result <- jpps(
  data = ts_data,
  analysis_type = "matrix",
  matrix_vars = c("month", "seasonal_factor", "sales", "lagged_sales"),
  show_heatmap = TRUE
)

ts_result
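
The `lagged_sales` column in the example shifts `sales` by one month, so the first row has no lagged value and is dropped by the filter. A base R sketch of the same shift, using a hypothetical helper `lag1()` and invented sales figures:

```r
# Hypothetical helper: shift a vector down by one position, padding with NA
lag1 <- function(x) c(NA, head(x, -1))

sales <- c(1050, 1210, 1330, 1290)
lagged <- lag1(sales)

# Rows with a missing lag are removed before analysis
keep <- !is.na(lagged)
ts_demo <- data.frame(sales = sales[keep], lagged_sales = lagged[keep])
ts_demo
```

Each remaining row pairs a month's sales with the previous month's value, which is what lets a model use past sales as a predictor.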

Advanced Features

Sample Size Management

For large datasets, use sample size limits for faster analysis:

# Create large dataset
large_data <- data.frame(
  x = rnorm(1000),
  y = rnorm(1000)
) %>%
  mutate(z = 0.6 * x + 0.4 * y + rnorm(1000, 0, 0.3))

# Use sampling for efficiency
sampled_result <- jpps(
  data = large_data,
  analysis_type = "matrix",
  matrix_vars = c("x", "y", "z"),
  sample_size = 200,  # Use only 200 samples
  algorithm = "tree"
)

sampled_result
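
How `jpps()` draws its sample internally is not shown in this vignette. As a rough illustration of the idea behind `sample_size`, the following base R sketch subsamples rows at random before any analysis; the seed and object names are assumptions made for the example.

```r
set.seed(123)  # make the subsample reproducible

large_demo <- data.frame(x = rnorm(1000), y = rnorm(1000))

# Draw up to 200 row indices without replacement and subset the data
sample_size <- 200
idx <- sample(nrow(large_demo), size = min(sample_size, nrow(large_demo)))
sampled_demo <- large_demo[idx, ]

nrow(sampled_demo)
```

Capping the size with `min()` keeps the sketch safe for data frames that are already smaller than the requested sample.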

Cross-Validation Options

Adjust cross-validation for different dataset sizes:

# Small dataset - use fewer folds
small_result <- jpps(
  data = mixed_data[1:20, ],
  analysis_type = "single",
  target_var = "salary",
  predictor_var = "experience_years",
  cv_folds = 3  # Fewer folds for small data
)

# Larger dataset - use more folds
large_result <- jpps(
  data = mixed_data,
  analysis_type = "single",
  target_var = "salary",
  predictor_var = "experience_years", 
  cv_folds = 10  # More folds for better validation
)
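
To see why fewer folds suit small datasets, it helps to look at how many rows end up in each test fold. The sketch below uses a hypothetical helper `assign_folds()` that assigns rows to folds at random; it illustrates k-fold splitting in general, not the splitting code used by `jpps()`.

```r
# Hypothetical helper: randomly assign n rows to k folds of near-equal size
assign_folds <- function(n, k) sample(rep(seq_len(k), length.out = n))

set.seed(1)
folds_3  <- table(assign_folds(20, 3))   # test-fold sizes with 3 folds
folds_10 <- table(assign_folds(20, 10))  # test-fold sizes with 10 folds

folds_3
folds_10
```

With 20 rows, 3 folds leave six or seven rows per test fold, while 10 folds leave only two, so each fold's error estimate rests on very little data.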

Threshold Filtering

Filter results by minimum PPS score:

# Show only relationships with PPS > 0.1
threshold_result <- jpps(
  data = matrix_data,
  analysis_type = "matrix",
  matrix_vars = c("var1", "var2", "var3", "var4"),
  min_pps_threshold = 0.1,
  sort_results = "pps_desc"
)

threshold_result
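
The combined effect of `min_pps_threshold` and `sort_results = "pps_desc"` can be pictured as a filter followed by a sort. The sketch below applies both to a hypothetical score table with invented values; it assumes a strict `>` comparison, as the comment in the example suggests.

```r
# Hypothetical score table (values invented for illustration)
scores <- data.frame(
  predictor = c("var2", "var3", "var4"),
  target    = c("var1", "var1", "var1"),
  pps       = c(0.42, 0.03, 0.17)
)

min_pps_threshold <- 0.1

# Keep rows above the threshold, then sort by score in descending order
kept <- scores[scores$pps > min_pps_threshold, ]
kept <- kept[order(kept$pps, decreasing = TRUE), ]
kept
```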

Visualization Options

Heatmap Customization

Customize heatmaps for matrix analysis:

# Custom color scheme
custom_heatmap <- jpps(
  data = matrix_data,
  analysis_type = "matrix",
  matrix_vars = c("var1", "var2", "var3", "var4"),
  show_heatmap = TRUE,
  color_scheme = "custom",
  custom_color_low = "#FFFFFF",
  custom_color_high = "#FF6B35",
  show_values_on_plot = TRUE,
  plot_title = "Custom PPS Heatmap"
)

custom_heatmap

Barplot Options

Customize barplots for predictor analysis:

# Custom barplot
barplot_result <- jpps(
  data = business_data,
  analysis_type = "predictors",
  target_var = "revenue",
  predictor_vars = c("marketing_spend", "employee_count", "customer_satisfaction"),
  show_barplot = TRUE,
  show_values_on_plot = TRUE,
  plot_title = "Revenue Predictors",
  sort_results = "pps_desc"
)

barplot_result

Practical Applications

Marketing Analytics

Identify which marketing channels drive sales:

# Marketing data
marketing_data <- data.frame(
  # Marketing channels
  tv_spend = runif(50, 1000, 10000),
  digital_spend = runif(50, 500, 5000),
  radio_spend = runif(50, 200, 2000),
  print_spend = runif(50, 100, 1000),
  
  # Customer metrics
  brand_awareness = runif(50, 20, 80),
  website_visits = round(runif(50, 1000, 10000))
) %>%
  mutate(
    # Sales influenced by digital and TV (non-linear)
    sales = 5000 + 
      sqrt(digital_spend) * 2 +  # Non-linear digital effect
      tv_spend * 0.3 +           # Linear TV effect
      ifelse(brand_awareness > 60, 2000, 0) +  # Threshold effect
      rnorm(50, 0, 1000)
  )

# Analyze marketing effectiveness
marketing_result <- jpps(
  data = marketing_data,
  analysis_type = "predictors",
  target_var = "sales",
  predictor_vars = c("tv_spend", "digital_spend", "radio_spend", "print_spend", "brand_awareness"),
  algorithm = "forest",
  show_summary = TRUE
)

marketing_result

Medical Research

Identify predictors of health outcomes:

# Clinical data
clinical_data <- data.frame(
  age = round(runif(80, 25, 75)),
  gender = factor(sample(c("Male", "Female"), 80, replace = TRUE)),
  bmi = rnorm(80, 26, 4),
  smoking = factor(sample(c("Never", "Former", "Current"), 80, replace = TRUE, prob = c(0.5, 0.3, 0.2))),
  exercise_hours = pmax(0, rnorm(80, 3, 2))
) %>%
  mutate(
    # Risk score with complex relationships.
    # Noise is drawn for all 80 rows so each branch matches the column length.
    risk_score = case_when(
      smoking == "Current" & age > 50 ~ 70 + rnorm(80, 0, 10),
      smoking == "Current" ~ 50 + rnorm(80, 0, 8),
      age > 60 ~ 40 + rnorm(80, 0, 12),
      TRUE ~ 20 + rnorm(80, 0, 8)
    ),
    risk_score = pmax(0, pmin(100, risk_score))
  )

# Analyze health risk factors
health_result <- jpps(
  data = clinical_data,
  analysis_type = "predictors",
  target_var = "risk_score",
  predictor_vars = c("age", "gender", "bmi", "smoking", "exercise_hours"),
  algorithm = "forest",
  show_summary = TRUE
)

health_result

Financial Analysis

Detect relationships in financial data:

# Financial indicators
financial_data <- data.frame(
  # Market indicators
  sp500_return = rnorm(60, 0.08, 0.15),
  volatility_index = abs(rnorm(60, 20, 8)),
  bond_yield = rnorm(60, 3.5, 1.2),
  
  # Economic indicators  
  gdp_growth = rnorm(60, 2.5, 1.0),
  inflation = rnorm(60, 2.8, 0.8),
  unemployment = rnorm(60, 5.2, 1.5)
) %>%
  mutate(
    # Stock performance with complex relationships
    stock_performance = 
      sp500_return * 1.2 +                           # Market beta
      -0.5 * log(volatility_index + 1) +            # Non-linear volatility effect
      ifelse(gdp_growth > 3, 0.05, 0) +             # Threshold GDP effect
      rnorm(60, 0, 0.1)
  )

# Analyze financial relationships
financial_result <- jpps(
  data = financial_data,
  analysis_type = "compare",
  matrix_vars = c("sp500_return", "volatility_index", "bond_yield", "gdp_growth", "stock_performance"),
  show_correlation_comparison = TRUE,
  correlation_method = "pearson"
)

financial_result
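
The reason for comparing PPS with correlation is that Pearson correlation only measures linear association. A small base R sketch with invented data shows the kind of relationship a correlation coefficient can miss entirely:

```r
x <- -10:10
y <- x^2   # y is fully determined by x, but the relationship is not linear

# Pearson correlation is effectively zero despite the exact dependence
r <- cor(x, y, method = "pearson")
r

# The dependence is also one-directional: x determines y exactly,
# while y only determines the absolute value of x
length(unique(y)) < length(unique(x))
```

A predictive score for `x → y` should be high here, and the reverse direction lower, which is the asymmetry described earlier in this vignette.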

Performance and Optimization

Caching System

The jpps() function caches its results, so repeating an analysis with the same data and parameters is faster than the first run:

# First run - computes PPS
system.time({
  result1 <- jpps(
    data = large_data,
    analysis_type = "matrix",
    matrix_vars = c("x", "y", "z")
  )
})

# Second run with same parameters - uses cache (much faster)
system.time({
  result2 <- jpps(
    data = large_data,
    analysis_type = "matrix", 
    matrix_vars = c("x", "y", "z")
  )
})
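
The caching mechanism inside `jpps()` is not documented here. As a general illustration of parameter-keyed caching, the sketch below stores results in an environment; `cached_analysis()`, the key format, and the use of `colMeans()` as a stand-in computation are all assumptions made for the example.

```r
# Hypothetical cache keyed on the analysis parameters (not jpps internals)
cache <- new.env()

cached_analysis <- function(data, vars) {
  # Simplistic key for illustration; a real cache would also hash the data
  key <- paste(c(nrow(data), vars), collapse = "|")
  if (exists(key, envir = cache, inherits = FALSE)) {
    return(get(key, envir = cache))            # cache hit: skip the computation
  }
  Sys.sleep(0.2)                               # stand-in for an expensive step
  result <- colMeans(data[, vars, drop = FALSE])
  assign(key, result, envir = cache)
  result
}

df <- data.frame(x = rnorm(100), y = rnorm(100))
t1 <- system.time(r1 <- cached_analysis(df, c("x", "y")))
t2 <- system.time(r2 <- cached_analysis(df, c("x", "y")))
identical(r1, r2)
```

The second call returns the stored result without repeating the slow step, which is the behaviour the timing comparison above is meant to show.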

Memory Management

For very large datasets, use strategic sampling:

# For datasets > 10,000 rows
huge_data <- data.frame(
  x = rnorm(50000),
  y = rnorm(50000),
  z = rnorm(50000)
)

# Use sample_size to manage memory
efficient_result <- jpps(
  data = huge_data,
  analysis_type = "matrix",
  matrix_vars = c("x", "y", "z"),
  sample_size = 1000,    # Sample for efficiency
  algorithm = "tree"     # Faster algorithm
)

Interpretation Guidelines

PPS Score Interpretation

PPS Score | Interpretation | Action
0.0 | No predictive power | No relationship
0.0 - 0.2 | Weak relationship | Investigate further
0.2 - 0.5 | Moderate relationship | Potentially useful
0.5 - 0.8 | Strong relationship | Very useful predictor
0.8 - 1.0 | Very strong to perfect | Excellent predictor
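
The bands above can be applied programmatically. The sketch below defines a hypothetical helper, `interpret_pps()`, that labels scores with `cut()`; assigning each boundary value to the lower band is an assumption, since the table does not say which band a score of exactly 0.2, 0.5, or 0.8 belongs to.

```r
# Hypothetical helper: map PPS scores to the interpretation bands
interpret_pps <- function(pps) {
  cut(
    pps,
    breaks = c(-Inf, 0, 0.2, 0.5, 0.8, 1),
    labels = c("No predictive power", "Weak", "Moderate",
               "Strong", "Very strong to perfect"),
    right = TRUE  # intervals are closed on the right, e.g. (0.2, 0.5]
  )
}

labels <- as.character(interpret_pps(c(0, 0.15, 0.35, 0.65, 0.92)))
labels
```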

When to Use PPS vs Correlation

Use PPS when:

  • Exploring unknown relationships
  • Working with mixed data types
  • Suspecting non-linear patterns
  • Needing asymmetric relationship detection
  • Wanting lower sensitivity to outliers than correlation offers

Use Correlation when:

  • Testing specifically for linear relationships
  • Needing fast computation
  • Working with continuous variables only
  • Requiring an exact mathematical interpretation

Clinical/Research Interpretation

# Example interpretation workflow
interpretation_data <- data.frame(
  treatment_dose = c(0, 5, 10, 15, 20, 25, 30),
  response_rate = c(10, 25, 45, 65, 75, 80, 82)
)

# Analyze dose-response relationship
dose_response <- jpps(
  data = interpretation_data,
  analysis_type = "single",
  target_var = "response_rate",
  predictor_var = "treatment_dose",
  algorithm = "tree"
)

dose_response

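Interpretation: If the analysis returns a high PPS score (>0.8), treatment dose is a strong predictor of response rate. In this data the response rises steeply at low doses and flattens at higher doses, a saturating dose-response pattern typical in pharmacology. With only seven observations, treat the score as indicative rather than conclusive.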
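
To make the score itself less of a black box, the sketch below computes a PPS-style value by hand for the dose-response data. It assumes the formulation used by the original Python ppscore library for numeric targets (one minus the ratio of model MAE to the MAE of a median baseline, floored at zero) and substitutes a leave-one-out linear model for the decision tree. The exact metric, baseline, and model used by `jpps()` may differ.

```r
dose <- c(0, 5, 10, 15, 20, 25, 30)
resp <- c(10, 25, 45, 65, 75, 80, 82)
n <- length(dose)

# Leave-one-out predictions from a simple linear model (stand-in for the tree)
pred_model <- vapply(seq_len(n), function(i) {
  train <- data.frame(dose = dose[-i], resp = resp[-i])
  fit <- lm(resp ~ dose, data = train)
  unname(predict(fit, newdata = data.frame(dose = dose[i])))
}, numeric(1))

# Naive baseline: predict the median of the training responses
pred_naive <- vapply(seq_len(n), function(i) median(resp[-i]), numeric(1))

mae_model <- mean(abs(resp - pred_model))
mae_naive <- mean(abs(resp - pred_naive))

# Normalised score, floored at 0 when the model is no better than the baseline
pps_like <- max(0, 1 - mae_model / mae_naive)
pps_like
```

The score answers one question: how much does knowing the dose reduce prediction error compared with always guessing a typical response?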

Best Practices

Data Preparation

  1. Clean missing values appropriately
  2. Scale/normalize when comparing vastly different ranges
  3. Consider transformations for heavily skewed data
  4. Encode categorical variables consistently
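
The four preparation steps can be sketched in base R as follows. The data frame `raw` and its values are invented for illustration, and dropping incomplete rows is only one of several reasonable ways to handle missing values.

```r
# Hypothetical raw data (names and values invented for illustration)
raw <- data.frame(
  income = c(32000, 45000, NA, 250000, 51000),
  age    = c(25, 40, 31, 58, NA),
  group  = c("a", "b", "a", "c", "b"),
  stringsAsFactors = FALSE
)

# 1. Handle missing values (here: keep complete rows only)
prepared <- raw[complete.cases(raw), ]

# 2. Centre and scale a numeric variable
prepared$age_z <- as.numeric(scale(prepared$age))

# 3. Transform a heavily right-skewed variable
prepared$income_log <- log1p(prepared$income)

# 4. Encode the categorical variable with an explicit, fixed set of levels
prepared$group <- factor(prepared$group, levels = c("a", "b", "c"))

str(prepared)
```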

Analysis Strategy

  1. Start with matrix analysis for exploration
  2. Use comparison analysis to validate against correlation
  3. Focus on predictors analysis for modeling
  4. Apply domain knowledge to interpretation

Performance Tips

  1. Use sampling for datasets >5,000 rows
  2. Choose appropriate algorithms for data type
  3. Adjust CV folds based on sample size
  4. Leverage caching for repeated analyses

Common Pitfalls and Solutions

Pitfall 1: Overfitting with Small Samples

Problem: High PPS scores with very small datasets

Solution: Use fewer CV folds and a simpler algorithm, and interpret the resulting scores cautiously

# Small sample - use conservative settings
small_data <- data.frame(
  x = 1:8,
  y = rnorm(8)
)

conservative_result <- jpps(
  data = small_data,
  analysis_type = "single",
  target_var = "y",
  predictor_var = "x",
  cv_folds = 3,  # Fewer folds for small data
  algorithm = "tree"  # Simpler algorithm
)

Pitfall 2: Ignoring Data Types

Problem: Treating categorical variables as numeric

Solution: Ensure proper factor encoding

# Correct categorical encoding
corrected_data <- data.frame(
  category = factor(c("A", "B", "C", "A", "B", "C")),  # Proper factor
  outcome = c(10, 20, 30, 12, 22, 28)
)

correct_result <- jpps(
  data = corrected_data,
  analysis_type = "single",
  target_var = "outcome",
  predictor_var = "category"
)
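
The pitfall usually appears when group codes are stored as numbers. The following sketch, with invented codes and labels, shows how to detect that situation and convert the column to a factor before analysis:

```r
# Group codes stored as numbers look like a quantity to R
coded <- data.frame(
  category = c(1, 2, 3, 1, 2, 3),
  outcome  = c(10, 20, 30, 12, 22, 28)
)
was_numeric <- is.numeric(coded$category)
was_numeric

# Convert the codes to a factor with explicit labels
coded$category <- factor(coded$category,
                         levels = c(1, 2, 3),
                         labels = c("A", "B", "C"))
is.factor(coded$category)
levels(coded$category)
```

Checking column types with `str()` or `is.factor()` before running an analysis catches this early.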

Pitfall 3: Misinterpreting Causation

Problem: Assuming high PPS implies causation

Solution: Remember PPS shows predictive power, not causation

Conclusion

The jpps() function provides a powerful, modern approach to relationship detection that goes beyond traditional correlation analysis. Key advantages include:

  • Detects non-linear relationships
  • Works with mixed data types
  • Provides asymmetric insights
  • Uses robust machine learning
  • Includes performance optimizations
  • Offers comprehensive visualization

Next Steps

  1. Explore your data with matrix analysis
  2. Compare with correlation to identify missed relationships
  3. Focus on high-PPS predictors for modeling
  4. Validate findings with domain knowledge
  5. Use for feature selection in machine learning pipelines

Remember: PPS is a “quick and dirty” exploration tool. Use it to guide more detailed analysis rather than as a final answer.


For more information about ClinicoPath functions, visit our documentation or explore other vignettes in this series.