
Performs SEQUENTIAL permutation-based significance testing to determine which principal components explain more variance than expected by chance. This gives an objective criterion for component retention that rests on a hypothesis test rather than on visual inspection.

Usage

pcacomponenttest(
  data,
  vars,
  ncomp = 5,
  nperm = 1000,
  stop_rule = TRUE,
  seed = 123,
  center = TRUE,
  scale = TRUE,
  conflevel = 0.95,
  adjustmethod = "BH",
  showpercent = TRUE,
  colororiginal = "#0072B2",
  colorpermuted = "#E69F00",
  showScreePlot = FALSE,
  showLoadingsPlot = FALSE,
  nLoadings = 5,
  plotwidth = 600,
  plotheight = 450,
  showSummary = FALSE,
  showGuide = FALSE
)

Arguments

data

The data as a data frame.

vars

Continuous variables to include in Principal Component Analysis. Select at least 3 numeric variables.

ncomp

Number of principal components to test for significance (1 to 20). Testing will be performed for PC1 through the specified number of components.

nperm

Number of permutations used to generate the null distribution (100-10000). Higher values give more precise p-values but take longer to run. The smallest attainable p-value is 1/(nperm+1), so reporting p<0.001 requires nperm>=1000.
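The relationship between nperm and the smallest reportable p-value can be checked with a few lines of base R. This is only an illustration of the 1/(nperm+1) formula stated above; the helper name min_perm_p is hypothetical and not part of the package.

```r
# Smallest p-value a permutation test can report for a given nperm,
# following the 1 / (nperm + 1) formula from the argument description.
# `min_perm_p` is a hypothetical helper used only for illustration.
min_perm_p <- function(nperm) 1 / (nperm + 1)

# Compare the allowed range of nperm values
nperm_grid <- c(100, 1000, 10000)
data.frame(nperm = nperm_grid, min_p = min_perm_p(nperm_grid))
```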

stop_rule

Stop testing after the first non-significant component (Sequential Testing). Recommended: TRUE to prevent Type I error inflation. If FALSE, all components up to ncomp are tested (Batch Testing).

seed

Random seed for reproducibility of permutation results.

center

Center variables to have mean = 0 before PCA. Recommended: TRUE for most analyses.

scale

Scale variables to have standard deviation = 1 before PCA. Recommended: TRUE when variables have different units or scales.

conflevel

Confidence level for confidence intervals (0.80-0.99). Default: 0.95 for 95 percent CI.

adjustmethod

Method for adjusting p-values for multiple testing. BH (Benjamini-Hochberg) controls false discovery rate.
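As a rough illustration of what a BH adjustment does, the sketch below applies stats::p.adjust to a made-up vector of p-values. It assumes, without confirmation from this page, that adjustmethod accepts the same method names as p.adjust; the p-values are invented for the example.

```r
# Hypothetical raw p-values for five components (illustrative only)
p_raw <- c(0.001, 0.004, 0.030, 0.200, 0.600)

# Benjamini-Hochberg adjustment with base R's p.adjust()
p_bh <- p.adjust(p_raw, method = "BH")

# Side-by-side view of raw and adjusted values
data.frame(component = paste0("PC", seq_along(p_raw)), p_raw, p_bh)
```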

showpercent

Display variance accounted for (VAF) as percentage (0-100) instead of proportion (0-1).
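For orientation, VAF can be computed from a plain stats::prcomp fit as each component's variance divided by the total. The sketch below shows the proportion and percentage forms; it uses prcomp directly and is not taken from the package's internals.

```r
# PCA on centered and scaled mtcars variables
vars <- c("mpg", "disp", "hp", "drat", "wt", "qsec")
pca <- prcomp(mtcars[, vars], center = TRUE, scale. = TRUE)

# Variance accounted for (VAF) per component
vaf_prop <- pca$sdev^2 / sum(pca$sdev^2)  # proportion scale (0-1)
vaf_pct  <- 100 * vaf_prop                # percentage scale (0-100)

round(data.frame(vaf_prop, vaf_pct), 3)
```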

colororiginal

Color for original VAF line/points. Use color names or hex codes. Default is color-blind safe blue.

colorpermuted

Color for permuted VAF line/points. Use color names or hex codes. Default is color-blind safe orange.

showScreePlot

Display a scree plot of eigenvalues.

showLoadingsPlot

Display a plot of variable loadings for significant components.

nLoadings

Number of top variables to display in loadings plot.

plotwidth

Width of the plot in pixels.

plotheight

Height of the plot in pixels.

showSummary

Display natural-language summary of results with clinical interpretation. Provides plain-language explanation of significant components and variance explained.

showGuide

Display guide explaining how to interpret VAF, p-values, and clinical implications. Includes definitions of key terms and guidance for clinical use.

Value

A results object containing:

results$todo            a html
results$warnings        a html
results$results         Statistical significance of principal components based on permutation testing
results$vafplot         Visualization comparing original VAF to permuted null distribution
results$screePlot       an image
results$loadingsPlot    an image
results$summary         a html
results$guide           a html

Tables can be converted to data frames with asDF or as.data.frame. For example:

results$results$asDF

as.data.frame(results$results)

Details

The test uses the Buja & Eyuboglu (1992) sequential method where:

  1. Each component is tested against a permutation null distribution

  2. Significant components have their variance REMOVED before testing the next component

  3. Testing STOPS when the first non-significant component is found (when stop_rule = TRUE)

This sequential approach prevents inflated Type I errors that occur when all components are tested against the same null distribution (batch testing).
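The three steps above can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the package's implementation and not necessarily the exact published procedure: the function name perm_pca_sequential is hypothetical, the null distribution is built by shuffling each column independently, and a significant component is removed by subtracting its rank-one contribution from the data.

```r
# Simplified sequential permutation test for principal components.
# `perm_pca_sequential` is a hypothetical name used for illustration only.
perm_pca_sequential <- function(x, ncomp = 3, nperm = 200,
                                alpha = 0.05, seed = 123) {
  set.seed(seed)                      # note: resets the global RNG state
  resid <- scale(as.matrix(x))        # center and scale each variable
  p_values <- rep(NA_real_, ncomp)

  for (k in seq_len(ncomp)) {
    sv <- svd(resid)
    # Share of the remaining variance carried by the leading component
    obs <- sv$d[1]^2 / sum(sv$d^2)

    # Step 1: null distribution from independently shuffled columns
    null <- replicate(nperm, {
      perm <- apply(resid, 2, sample)
      d <- svd(perm, nu = 0, nv = 0)$d
      d[1]^2 / sum(d^2)
    })
    p_values[k] <- (sum(null >= obs) + 1) / (nperm + 1)

    # Step 3: stop at the first non-significant component
    if (p_values[k] > alpha) break

    # Step 2: remove the significant component's variance before continuing
    resid <- resid - sv$d[1] * tcrossprod(sv$u[, 1], sv$v[, 1])
  }
  p_values  # NA marks components that were never tested
}

vars <- c("mpg", "disp", "hp", "drat", "wt", "qsec")
perm_pca_sequential(mtcars[, vars], ncomp = 3, nperm = 200)
```

Components after the stopping point are returned as NA rather than as p-values, which mirrors the idea that they are not tested at all under the sequential rule.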

CRITICAL: The data must be centered and scaled (center = TRUE, scale = TRUE) for a valid correlation-based interpretation. Without centering and scaling, the test compares raw variances instead of the correlation structure.
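The effect of scaling can be seen with stats::prcomp alone. In the sketch below, PCA on centered and scaled data reproduces the eigenvalues of the correlation matrix, while PCA on centered but unscaled data reproduces those of the covariance matrix, where variables with large raw variance (in mtcars, disp and hp) carry most of the weight. It uses only base R and says nothing about how the package performs these steps internally.

```r
vars <- c("mpg", "disp", "hp", "drat", "wt", "qsec")
x <- mtcars[, vars]

# Centered and scaled: component variances equal eigenvalues of cor(x)
pca_scaled <- prcomp(x, center = TRUE, scale. = TRUE)
eig_cor <- eigen(cor(x), symmetric = TRUE)$values
all.equal(pca_scaled$sdev^2, eig_cor)

# Centered only: component variances equal eigenvalues of cov(x)
pca_raw <- prcomp(x, center = TRUE, scale. = FALSE)
eig_cov <- eigen(cov(x), symmetric = TRUE)$values
all.equal(pca_raw$sdev^2, eig_cov)

# VAF under each choice, for comparison
round(rbind(
  scaled = pca_scaled$sdev^2 / sum(pca_scaled$sdev^2),
  raw    = pca_raw$sdev^2 / sum(pca_raw$sdev^2)
), 3)
```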

References

Buja A, Eyuboglu N. (1992). Remarks on Parallel Analysis. Multivariate Behavioral Research, 27(4):509-540.

Torres-Espin A, Chou A, Huie JR, et al. (2021). Reproducible analysis of disease space via principal components using the novel R package syndRomics. eLife, 10:e61812.

Examples

# \donttest{
# Example with mtcars dataset
data("mtcars")

# Test significance of first 5 principal components
pcacomponenttest(
  data = mtcars,
  vars = c("mpg", "disp", "hp", "drat", "wt", "qsec"),
  ncomp = 5,
  nperm = 1000,
  center = TRUE,
  scale = TRUE,
  conflevel = 0.95,
  adjustmethod = "BH"
)
#> Error in item$visible && !is(item, "Image"): invalid 'x' type in 'x && y'
# }