
Visualizes patient similarity using dimensionality reduction techniques (PCA, t-SNE, UMAP, MDS). Projects high-dimensional patient data into 2D or 3D space to reveal natural patient groupings and subpopulations. Inspired by Orange Data Mining's interactive projection widgets, adapted for jamovi with comprehensive cluster analysis and statistical validation.

Usage

patientsimilarity(
  data,
  vars = NULL,
  method = "tsne",
  dimensions = "2",
  colorBy = NULL,
  perplexity = 30,
  iterations = 1000,
  umapNeighbors = 15,
  umapMinDist = 0.1,
  performClustering = FALSE,
  clusterMethod = "kmeans",
  nClusters = 3,
  showClusterStats = TRUE,
  survivalAnalysis = FALSE,
  survivalTime = NULL,
  survivalEvent = NULL,
  survivalEventLevel,
  scaleVars = TRUE,
  removeOutliers = FALSE,
  showLoadings = FALSE,
  show3DPlot = FALSE
)

Arguments

data

The dataset to be analyzed, provided as a data frame.

vars

Continuous variables to use for calculating patient similarity. These will be used to compute distances between patients. Categorical variables should be converted to numeric or one-hot encoded.

method

Method for dimensionality reduction:

- PCA: Linear method, preserves global structure
- t-SNE: Non-linear, excellent for visualization, preserves local structure
- UMAP: Non-linear, preserves both local and global structure, faster than t-SNE
- MDS: Classical method, preserves pairwise distances
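To illustrate how the two linear options relate, here is a minimal sketch using only base R (`prcomp` and `cmdscale`). It is not the module's own code, and it assumes Euclidean distances on standardized variables. Under those assumptions, classical MDS and PCA give the same 2D coordinates up to a sign flip on each axis.

```r
# Base-R sketch; not the internals of patientsimilarity()
X <- scale(iris[, 1:4])                    # standardize the four measurements

pca_xy <- prcomp(X)$x[, 1:2]               # PCA scores on the first two components
mds_xy <- cmdscale(dist(X), k = 2)         # classical MDS on Euclidean distances

# Axes may be mirrored, so compare absolute correlations per dimension
axis_agreement <- abs(diag(cor(pca_xy, mds_xy)))
print(round(axis_agreement, 6))
```

t-SNE and UMAP have no such equivalence. Their layouts depend on random initialization and on the tuning parameters described below.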

dimensions

Number of dimensions for projection. 2D is easier to interpret, 3D can reveal additional structure.

colorBy

Variable to use for coloring points. Typically an outcome variable (e.g., disease status, survival, response) to see if it corresponds to natural patient groupings.

perplexity

Perplexity parameter for t-SNE. Roughly corresponds to the number of nearest neighbors considered. Typical values: 5-50. Higher values preserve more global structure.
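Perplexity also has an upper bound that depends on sample size. Example 1 below loads Rtsne, so the sketch assumes that package runs the t-SNE step. Rtsne stops with an error when 3 * perplexity exceeds n - 1. The helper `max_perplexity` is hypothetical and only illustrates that bound.

```r
# Hypothetical helper: largest whole-number perplexity Rtsne accepts for n samples,
# based on its requirement that 3 * perplexity <= n - 1
max_perplexity <- function(n) floor((n - 1) / 3)

n <- nrow(iris)                       # 150 rows
max_perplexity(n)                     # 49, so the default of 30 is valid here
max_perplexity(60)                    # 19, so the default of 30 would be too large
```

With small cohorts, it may be necessary to lower perplexity below the default before t-SNE will run.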

iterations

Number of iterations for t-SNE optimization. More iterations improve convergence but take longer.

umapNeighbors

Number of nearest neighbors for UMAP. Controls the balance between local and global structure: smaller values preserve local structure, larger values preserve global structure.

umapMinDist

Minimum distance between points in UMAP. Controls how tightly points are packed. Smaller values create tighter clusters.

performClustering

Automatically identify patient clusters in the reduced-dimension space using the algorithm chosen in clusterMethod (for example, k-means or hierarchical clustering).

clusterMethod

Method for clustering patients in the reduced space.

nClusters

Number of clusters for k-means or hierarchical clustering. For DBSCAN, this is ignored.
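As a rough sketch of clustering in the reduced space, the base-R code below runs k-means and hierarchical clustering on a 2D PCA projection and asks each for three groups. The PCA stand-in projection, the Ward linkage, and `nstart = 25` are assumptions made for illustration. The module's actual settings are not documented here.

```r
set.seed(42)                                   # k-means starts from random centers
xy <- prcomp(scale(iris[, 1:4]))$x[, 1:2]      # stand-in 2D projection (PCA)

km <- kmeans(xy, centers = 3, nstart = 25)     # analogous to nClusters = 3
hc <- cutree(hclust(dist(xy), method = "ward.D2"), k = 3)  # hierarchical alternative

table(kmeans = km$cluster, hierarchical = hc)  # cross-tabulate the two labelings
```

Cluster numbers are arbitrary labels, so the cross-tabulation is a better guide to agreement between methods than comparing label values directly.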

showClusterStats

Display summary statistics for each cluster including size, characteristics, and outcome distribution.

survivalAnalysis

If survival data is available, compare survival across discovered clusters. Useful for identifying prognostic patient subtypes.

survivalTime

Time to event or censoring for survival analysis.

survivalEvent

Event indicator (1=event, 0=censored).

survivalEventLevel

Level indicating the event occurred.
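The sketch below shows one way to compare survival across clusters, using the `lung` dataset from the survival package as a stand-in. It assumes a log-rank test (`survdiff`) on k-means clusters from a PCA projection. The test that patientsimilarity actually applies is not stated in this page. In `lung`, status is coded 1 = censored and 2 = dead, so 2 plays the role of survivalEventLevel.

```r
library(survival)   # recommended package shipped with most R installations

# lung: status is coded 1 = censored, 2 = dead
d <- na.omit(lung[, c("time", "status", "age", "ph.karno", "wt.loss")])

set.seed(1)
xy <- prcomp(scale(d[, c("age", "ph.karno", "wt.loss")]))$x[, 1:2]
d$cluster <- factor(kmeans(xy, centers = 3, nstart = 25)$cluster)

# Log-rank test of equal survival across the three clusters
fit <- survdiff(Surv(time, status == 2) ~ cluster, data = d)
print(fit)
```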

scaleVars

Standardize variables to mean=0, sd=1 before analysis. Recommended when variables have different scales.
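The same standardization can be reproduced with base R's `scale()`, shown here as a minimal sketch on `iris` (whether the module calls `scale()` internally is an assumption):

```r
X <- iris[, 1:4]
Z <- scale(X)                         # center to mean 0, scale to sd 1, column by column

round(colMeans(Z), 10)                # all 0 (up to floating-point error)
apply(Z, 2, sd)                       # all 1
```

Without this step, variables measured on larger numeric scales dominate the distance calculations.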

removeOutliers

Remove outliers before analysis using IQR method. May improve visualization quality.
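A common form of the IQR rule flags values more than 1.5 × IQR below the first quartile or above the third. The sketch below assumes that multiplier and assumes a row is dropped when any selected variable is flagged. Both choices are illustrative, and `is_iqr_outlier` is a hypothetical helper, not part of the module.

```r
# Hypothetical helper: TRUE where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]
is_iqr_outlier <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  fence <- k * (q[2] - q[1])
  x < q[1] - fence | x > q[2] + fence
}

X <- iris[, 1:4]
flagged <- sapply(X, is_iqr_outlier)          # logical matrix, one column per variable
keep <- rowSums(flagged) == 0                 # assumption: drop a row if any variable is flagged
X_clean <- X[keep, ]

colSums(flagged)                              # outliers per variable
nrow(X) - nrow(X_clean)                       # rows removed
```

Removing rows changes the sample, so it is worth checking how many patients are dropped before interpreting the projection.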

showLoadings

Show how original variables contribute to each dimension. Only available for PCA and MDS.
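For PCA, loadings can be inspected directly with base R's `prcomp`, as in this minimal sketch. It treats the `rotation` matrix (unit-length eigenvectors) as the loadings. Some conventions rescale these by the component standard deviations, and which convention the module's loadings table follows is not stated here.

```r
pca <- prcomp(iris[, 1:4], scale. = TRUE)

pca$rotation[, 1:2]                   # loadings: variable weights on PC1 and PC2
summary(pca)$importance[2, 1:2]       # proportion of variance explained by each
```

t-SNE and UMAP are non-linear, so their axes have no comparable per-variable weights.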

show3DPlot

Generate an interactive 3D plot using plotly (only when dimensions = "3").

Value

A results object containing:

results$instructions             a html
results$summaryText              a preformatted
results$projectionPlot           an image
results$projection3D             an image
results$varianceTable            a table
results$loadingsTable            a table
results$clusterHeading           a html
results$clusterSummary           a table
results$clusterCharacteristics   a table
results$clusterOutcomes          a table
results$clusterQuality           a table
results$survivalHeading          a html
results$survivalTable            a table
results$survivalPlot             an image
results$survivalComparison       a table
results$exportClusters           an output
results$exportCoordinates        an output
results$interpretation           a html

Tables can be converted to data frames with asDF or as.data.frame. For example:

results$varianceTable$asDF

as.data.frame(results$varianceTable)

Examples

# Example 1: Basic t-SNE visualization
library(Rtsne)
data(iris)

patientsimilarity(
    data = iris,
    vars = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"),
    method = "tsne",
    colorBy = "Species"
)

# Example 2: UMAP with cluster analysis
patientsimilarity(
    data = clinical_data,
    vars = c("age", "tumor_size", "grade", "ki67"),
    method = "umap",
    colorBy = "survival_status",
    performClustering = TRUE,
    nClusters = 3,
    showClusterStats = TRUE
)

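# Example 3: PCA with survival comparison
# Survival is compared across discovered clusters, so clustering and
# survivalAnalysis are both switched on (each defaults to FALSE)
patientsimilarity(
    data = pathology_data,
    vars = c("age", "stage", "nodes", "size"),
    method = "pca",
    colorBy = "death",
    dimensions = "3",
    performClustering = TRUE,
    survivalAnalysis = TRUE,
    survivalTime = "months",
    survivalEvent = "death"
)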