Usage

tree(
  data,
  vars,
  facs,
  target,
  targetLevel,
  train,
  trainLevel,
  imputeMissing = FALSE,
  balanceClasses = FALSE,
  scaleFeatures = FALSE,
  clinicalMetrics = FALSE,
  featureImportance = FALSE,
  showInterpretation = FALSE,
  showPlot = FALSE,
  minCases = 10,
  maxDepth = 4,
  confidenceInterval = FALSE,
  riskStratification = FALSE,
  exportPredictions = FALSE,
  clinicalContext = "diagnosis",
  costRatio = 1,
  prevalenceAdjustment = FALSE,
  expectedPrevalence = 10
)

Arguments

data

The data as a data frame containing clinical variables, biomarkers, and patient outcomes.

vars

Continuous variables such as biomarker levels, age, laboratory values, or quantitative pathological measurements.

facs

Categorical variables such as tumor grade, stage, histological type, or patient demographics.

target

Primary outcome variable: disease status, treatment response, survival status, or diagnostic category.

targetLevel

Level representing disease presence, positive outcome, or event of interest.

train

Variable indicating training vs validation cohorts. If not provided, data will be split automatically.

trainLevel

Level indicating the training/discovery cohort.

imputeMissing

Impute missing values using medically appropriate methods (median within disease groups for continuous, mode for categorical).

balanceClasses

Balance classes to handle rare diseases or imbalanced outcomes. Recommended for disease prevalence <20%.

scaleFeatures

Standardize continuous variables (useful when combining biomarkers with different scales/units).

clinicalMetrics

Display sensitivity, specificity, predictive values, likelihood ratios, and other clinical metrics.

featureImportance

Identify most important clinical variables and biomarkers for the decision tree.

showInterpretation

Provide clinical interpretation of results including diagnostic utility and clinical recommendations.

showPlot

Display visual representation of the decision tree.

minCases

Minimum number of cases required in each terminal node (higher values prevent overfitting).

maxDepth

Maximum depth of decision tree (deeper trees may overfit).

confidenceInterval

Display confidence intervals for performance metrics.

riskStratification

Analyze risk stratification performance and create risk categories based on tree predictions.

exportPredictions

Add predicted classifications and probabilities to the dataset.

clinicalContext

Clinical context affects interpretation thresholds and recommendations (e.g., screening requires high sensitivity).

costRatio

Relative cost of missing a case vs false alarm. Higher values favor sensitivity over specificity.

prevalenceAdjustment

Adjust predictive values for expected disease prevalence in target population (different from study sample).

expectedPrevalence

Expected disease prevalence in target population for adjusted predictive value calculations.
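
The adjustment described for prevalenceAdjustment and expectedPrevalence can be illustrated with Bayes' theorem. What follows is a minimal base R sketch, not the package's internal code: the helper name adjusted_predictive_values is hypothetical, and it assumes prevalence is supplied as a percentage, as the default of 10 suggests.

```r
# Hypothetical helper (not part of the package): recompute PPV and NPV
# for a target-population prevalence using Bayes' theorem.
# Assumption: prevalence_pct is a percentage strictly between 0 and 100.
adjusted_predictive_values <- function(sensitivity, specificity, prevalence_pct) {
  stopifnot(sensitivity >= 0, sensitivity <= 1,
            specificity >= 0, specificity <= 1,
            prevalence_pct > 0, prevalence_pct < 100)
  p <- prevalence_pct / 100

  # P(disease | positive test)
  ppv <- (sensitivity * p) /
    (sensitivity * p + (1 - specificity) * (1 - p))

  # P(no disease | negative test)
  npv <- (specificity * (1 - p)) /
    (specificity * (1 - p) + (1 - sensitivity) * p)

  c(ppv = ppv, npv = npv)
}

# Same sensitivity and specificity, evaluated at two prevalences:
# a case-enriched study sample (50%) and a target population (10%).
study  <- adjusted_predictive_values(0.90, 0.80, prevalence_pct = 50)
target <- adjusted_predictive_values(0.90, 0.80, prevalence_pct = 10)
print(round(rbind(study, target), 3))
```

With sensitivity and specificity held fixed, moving from 50% to 10% prevalence lowers the positive predictive value and raises the negative predictive value, which is why predictive values from a case-enriched study sample may not transfer directly to a lower-prevalence population.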

Value

A results object containing:

results$todo                    a html
results$text1                   a preformatted
results$text2                   a preformatted
results$text2a                  a preformatted
results$text2b                  a preformatted
results$text3                   a preformatted
results$text4                   a html
results$dataQuality             a preformatted
results$missingDataReport       a table
results$modelSummary            a html
results$clinicalMetrics         a table
results$clinicalInterpretation  a html
results$featureImportance       a table
results$riskStratification      a table
results$confusionMatrix         a table
results$adjustedMetrics         a table
results$plot                    an image
results$deploymentGuidelines    a html
results$predictions             an output
results$probabilities           an output

Tables can be converted to data frames with asDF or as.data.frame. For example:

results$missingDataReport$asDF

as.data.frame(results$missingDataReport)

Enhanced decision tree analysis for medical research, pathology and oncology. Provides clinical performance metrics, handles missing data appropriately, and offers interpretations relevant to medical decision-making.

Examples

# Example for cancer diagnosis
data(cancer_biomarkers)
tree(
  data = cancer_biomarkers,
  vars = c("PSA", "age", "tumor_size"),
  facs = c("grade", "stage"),
  target = "diagnosis",
  targetLevel = "cancer",
  train = "cohort",
  trainLevel = "discovery",
  imputeMissing = TRUE,
  balanceClasses = TRUE
)
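
The imputeMissing argument describes median imputation within disease groups for continuous variables and mode imputation for categorical ones. The following base R sketch illustrates that idea on a toy data frame. The helper names and the data are hypothetical, and the package's own implementation may differ in details such as tie handling or whether categorical modes are also computed per group.

```r
# Hypothetical helpers illustrating the imputation strategy; not package code.

# Replace NA in a numeric vector with the median of its outcome group.
impute_median_by_group <- function(x, group) {
  # ave() applies FUN separately within each level of `group`
  ave(x, group, FUN = function(v) {
    v[is.na(v)] <- median(v, na.rm = TRUE)
    v
  })
}

# Replace NA in a factor or character vector with the most frequent value
# (ties go to the first value in table order).
impute_mode <- function(x) {
  counts <- table(x)  # table() drops NA by default
  x[is.na(x)] <- names(counts)[which.max(counts)]
  x
}

# Toy data: one continuous and one categorical predictor with gaps
toy <- data.frame(
  diagnosis = c("cancer", "cancer", "cancer", "benign", "benign", "benign"),
  PSA       = c(12, NA, 8, 2, 4, NA),
  grade     = factor(c("high", "high", NA, "low", "low", "high"))
)

toy$PSA   <- impute_median_by_group(toy$PSA, toy$diagnosis)
toy$grade <- impute_mode(toy$grade)
print(toy)
```

In the toy data, the missing PSA value in the cancer group is filled with 10 (the median of 12 and 8), the missing benign value with 3, and the missing grade with "high", the most frequent level.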