Usage

pcacomponenttest(
  data,
  vars,
  ncomp = 5,
  nperm = 1000,
  stop_rule = TRUE,
  seed = 123,
  center = TRUE,
  scale = TRUE,
  conflevel = 0.95,
  adjustmethod = "BH",
  showpercent = TRUE,
  colororiginal = "#0072B2",
  colorpermuted = "#E69F00",
  showScreePlot = FALSE,
  showLoadingsPlot = FALSE,
  nLoadings = 5,
  plotwidth = 600,
  plotheight = 450,
  showSummary = FALSE,
  showGuide = FALSE
)

Arguments

data

The data as a data frame.

vars

Continuous variables to include in Principal Component Analysis. Select at least 3 numeric variables.

ncomp

Number of principal components to test for significance (1 to 20). Testing will be performed for PC1 through the specified number of components.

nperm

Number of permutations to generate null distribution (100-10000). Higher values provide more accurate p-values but take longer. Minimum p-value = 1/(nperm+1). For p<0.001, use nperm>=1000.

stop_rule

Stop testing after the first non-significant component (Sequential Testing). Recommended: TRUE to prevent Type I error inflation. If FALSE, all components up to ncomp are tested (Batch Testing).

seed

Random seed for reproducibility of permutation results.

center

Center variables to have mean = 0 before PCA. Recommended: TRUE for most analyses.

scale

Scale variables to have standard deviation = 1 before PCA. Recommended: TRUE when variables have different units or scales.

conflevel

Confidence level for confidence intervals (0.80-0.99). Default: 0.95, i.e. 95% confidence intervals.

adjustmethod

Method for adjusting p-values for multiple testing. BH (Benjamini-Hochberg) controls the false discovery rate.

showpercent

Display variance accounted for (VAF) as a percentage (0-100) instead of a proportion (0-1).

colororiginal

Color for the original VAF line/points. Use color names or hex codes. Default is a color-blind safe blue.

colorpermuted

Color for the permuted VAF line/points. Use color names or hex codes. Default is a color-blind safe orange.

showScreePlot

Display a scree plot of eigenvalues.

showLoadingsPlot

Display a plot of variable loadings for significant components.

nLoadings

Number of top variables to display in the loadings plot.

plotwidth

Width of the plot in pixels.

plotheight

Height of the plot in pixels.

showSummary

Display a natural-language summary of results with clinical interpretation. Provides a plain-language explanation of significant components and variance explained.

showGuide

Display a guide explaining how to interpret VAF, p-values, and clinical implications. Includes definitions of key terms and guidance for clinical use.
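As a rough illustration of the nperm and adjustmethod arguments, the base-R sketch below computes the smallest attainable permutation p-value, 1/(nperm + 1), for a few permutation counts and applies a Benjamini-Hochberg adjustment with stats::p.adjust. The raw p-values are made-up numbers for illustration only, and whether pcacomponenttest calls p.adjust internally is an assumption.

```r
# Smallest attainable permutation p-value for several permutation counts
nperm <- c(100, 1000, 10000)
min_p <- 1 / (nperm + 1)
print(data.frame(nperm = nperm, min_p = min_p))

# Hypothetical raw p-values for PC1..PC5 (illustrative numbers, not real output)
p_raw <- c(0.001, 0.004, 0.030, 0.200, 0.600)

# Benjamini-Hochberg adjustment (controls the false discovery rate)
p_bh <- p.adjust(p_raw, method = "BH")
print(round(p_bh, 4))
```

With nperm = 100 the p-value floor is about 0.0099, so a threshold of p < 0.001 cannot be reached regardless of how strong the component is.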

Value

A results object containing:

results$todo           an html
results$warnings       an html
results$results        Statistical significance of principal components based on permutation testing
results$vafplot        Visualization comparing original VAF to permuted null distribution
results$screePlot      an image
results$loadingsPlot   an image
results$summary        an html
results$guide          an html

Tables can be converted to data frames with asDF or as.data.frame. For example:

results$results$asDF

as.data.frame(results$results)

Details

Performs SEQUENTIAL permutation-based significance testing to determine which principal components explain more variance than expected by random chance. This provides an objective, hypothesis-tested approach to component retention. The test uses the Buja & Eyuboglu (1992) sequential method, in which:

  1. Each component is tested against a permutation null distribution

  2. Significant components have their variance REMOVED before testing the next component

  3. Testing STOPS when the first non-significant component is found
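
The three steps above can be sketched in base R roughly as follows. This is a simplified illustration, not the package's implementation: the helper name seq_perm_pca, the column-wise shuffling used to build the null distribution, and the deflation step that projects out an accepted component are all assumptions made for the example.

```r
# Simplified sketch of a sequential permutation test for PCA components.
# ASSUMPTIONS: column-wise shuffling for the null and rank-1 deflation of
# accepted components; pcacomponenttest's internals may differ.
seq_perm_pca <- function(X, ncomp = 3, nperm = 200, alpha = 0.05, seed = 123) {
  set.seed(seed)
  X <- scale(as.matrix(X), center = TRUE, scale = TRUE)
  total_var <- sum(X^2)              # total variance of the original data (times n - 1)
  resid <- X
  rows <- list()

  for (k in seq_len(ncomp)) {
    sv <- svd(resid)
    vaf_obs <- sv$d[1]^2 / total_var # VAF of the leading remaining component

    # Step 1: null distribution from independently shuffled columns
    vaf_null <- replicate(nperm, {
      Xp <- apply(resid, 2, sample)
      svd(Xp, nu = 0, nv = 0)$d[1]^2 / total_var
    })

    # Permutation p-value with the +1 correction; its floor is 1 / (nperm + 1)
    p <- (sum(vaf_null >= vaf_obs) + 1) / (nperm + 1)
    rows[[k]] <- data.frame(component = k, vaf = vaf_obs, p_value = p)

    # Step 3: stop at the first non-significant component
    if (p > alpha) break

    # Step 2: remove the accepted component's variance before the next test
    resid <- resid - sv$d[1] * tcrossprod(sv$u[, 1], sv$v[, 1])
  }
  do.call(rbind, rows)
}

res <- seq_perm_pca(mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")])
print(res)
```

The returned data frame has one row per tested component, ending with the first non-significant one (or with component ncomp if all were significant).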

This sequential approach prevents the inflated Type I errors that occur when all components are tested against the same null distribution (batch testing).

CRITICAL: Requires centered and scaled data for valid correlation-based interpretation. Without centering/scaling, the test compares raw variance instead of correlation structure.

References

Buja A, Eyuboglu N. (1992). Remarks on Parallel Analysis. Multivariate Behavioral Research, 27(4):509-540.

Torres-Espin A, Chou A, Huie JR, et al. (2021). Reproducible analysis of disease space via principal components using the novel R package syndRomics. eLife, 10:e61812.

Examples

# Example with mtcars dataset
data("mtcars")

# Test significance of first 5 principal components
pcacomponenttest(
  data = mtcars,
  vars = c("mpg", "disp", "hp", "drat", "wt", "qsec"),
  ncomp = 5,
  nperm = 1000,
  center = TRUE,
  scale = TRUE,
  conflevel = 0.95,
  adjustmethod = "BH"
)
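
To see why the center and scale arguments matter, the following sketch compares the variance accounted for (VAF) by each component with and without scaling on the same mtcars variables. It uses stats::prcomp from base R purely for illustration; whether pcacomponenttest relies on prcomp internally is an assumption, and the helper vaf is a name introduced here.

```r
data("mtcars")
X <- mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")]

# Proportion of variance accounted for (VAF) by each component
vaf <- function(fit) fit$sdev^2 / sum(fit$sdev^2)

pca_scaled <- prcomp(X, center = TRUE, scale. = TRUE)   # correlation-based PCA
pca_raw    <- prcomp(X, center = TRUE, scale. = FALSE)  # covariance-based PCA

# With scaling, the component variances are the eigenvalues of cor(X)
stopifnot(isTRUE(all.equal(pca_scaled$sdev^2, eigen(cor(X))$values)))

print(round(rbind(scaled = vaf(pca_scaled), raw = vaf(pca_raw)), 3))
```

Without scaling, variables measured on large numeric scales (here disp and hp) dominate the first component, so its VAF reflects units of measurement more than shared correlation structure.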