Getting Started - nolanlab/citrus GitHub Wiki

This page provides details on how to use the GUI to configure Citrus for analyzing your data.

Overview

A Citrus analysis generally involves four steps:

  1. You must tell Citrus what your experimental endpoint of interest is. For instance, if you are interested in identifying cellular subsets whose behavior is different between two experimental groups, you must tell Citrus which samples belong to each experimental group.
  2. Cell subsets are identified in all samples using hierarchical clustering. You can think of this step as roughly equivalent to manual gating. You will need to tell Citrus which measured markers to use to differentiate cell types (typically cell surface markers, e.g. CD markers), how many cells to cluster from each sample, and whether to transform the data before analysis. The clustering algorithm groups cells into clusters based on their similarity across the specified clustering markers.
  3. Properties of every identified cell subset are calculated on a per-sample basis. You will need to tell Citrus which properties to calculate for each cell subset. For example, you may instruct Citrus to calculate the abundance of each cell subset in each patient.
  4. Descriptive properties from all discovered cellular subsets are evaluated for an association with your experimental endpoint of interest. Citrus reports the cellular subsets that are likely to be predictive of, or correlated with, the experimental endpoint. You will need to specify which statistical model to use to detect an association with an experimental endpoint.

Data Requirements

  • Citrus requires 8 or more samples in each experimental group to work as expected. More samples per group is always better, and 8 per group may not be sufficient to detect subtle signals or signals in extremely rare cellular subsets. Running Citrus with fewer than 8 samples per group will likely produce spurious results.
  • Parameters must be measured on the same channels in each file.
  • The same parameters must be measured in all FCS files (no extras or missing parameters in any FCS file). If you are unsure if this is true, follow the directions here. If you need to remove parameters from an FCS file, the cytofCore package may be helpful.
  • Measured parameters and channels must appear in the same order in each FCS file.
  • Raw FCS data must already be compensated (fluorescence) or not require compensation (mass cytometry). Citrus does not apply compensation matrices stored in FCS files to the data.
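If you are unsure whether your files meet the parameter requirements above, a quick check can help. The R sketch below uses a hypothetical helper, `check_fcs_parameters` (not part of Citrus), that compares each file's parameter names against the first file and reports missing, extra, or reordered parameters. It assumes you have already extracted the names into a list; with the flowCore package installed, `colnames(flowCore::read.FCS(path))` is one way to obtain them.

```r
# Hypothetical helper (not part of Citrus): compare every file's parameter
# names with those of the first file and describe any mismatch.
check_fcs_parameters <- function(param_lists) {
  reference <- param_lists[[1]]
  problems <- character(0)
  for (file in names(param_lists)) {
    params <- param_lists[[file]]
    absent <- setdiff(reference, params)  # in the first file but not this one
    extra <- setdiff(params, reference)   # in this file but not the first one
    if (length(absent) > 0 || length(extra) > 0) {
      problems <- c(problems, sprintf("%s: missing [%s], extra [%s]", file,
                                      paste(absent, collapse = ", "),
                                      paste(extra, collapse = ", ")))
    } else if (!identical(params, reference)) {
      problems <- c(problems, sprintf("%s: same parameters, different order", file))
    }
  }
  problems
}

# Toy parameter lists; in practice these would be read from your FCS files.
files <- list(
  "patient1.fcs" = c("CD3", "CD4", "CD8", "pSTAT5"),
  "patient2.fcs" = c("CD3", "CD4", "CD8", "pSTAT5"),
  "patient3.fcs" = c("CD3", "CD8", "CD4", "pSTAT5"),
  "patient4.fcs" = c("CD3", "CD4", "pSTAT5")
)
check_fcs_parameters(files)
```

An empty result (`character(0)`) means every file has the same parameters in the same order as the first file.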

Preparing data for analysis

  • Remove dead cells, doublets, and debris from FCS files prior to analysis.
  • Pre-gate and remove any cells or cell lineages that you are not interested in analyzing (e.g. cells marked by a dump channel).
  • Place the FCS files (and only those FCS files) that you wish to analyze in a single directory. Results will be placed in this directory when the analysis is complete.

Starting a Citrus analysis

Citrus analyses may be run using the Citrus web GUI. After installing Citrus, start R, load the Citrus R package, and then launch the GUI:

 R> library("citrus")
 R> citrus.launchUI()

You will be prompted with a file selection box. Navigate to the directory containing the FCS files you wish to analyze and select any single FCS file in it. It does not matter which file you select; this simply tells Citrus where the data are stored.

Citrus configuration

Specifying an experimental endpoint

Click on the "Sample Group Setup" tab.

Here, the sample group is the experimental endpoint of interest. In other words, we are interested in finding cell subsets whose behavior differs between, or is predictive of, sample groups. Tell Citrus which samples belong to each group as follows:

  1. Specify the number of groups that you want to compare in the "Number of Sample Groups" select box.
  2. Assign a name to each group in the group text boxes. FCS files whose file names contain a group name will automatically be assigned to that group.
  3. Assign each file to a group (if files are not automatically assigned).
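The automatic assignment in step 2 can be pictured as a substring match between group names and file names. The R sketch below is a hypothetical illustration of that rule (`assign_groups` is not the GUI's actual code); it also assumes that files matching no group, or more than one group, are left unassigned for you to fix in step 3.

```r
# Hypothetical sketch of filename-based group assignment (not the GUI's code).
# A file is assigned to a group when its name contains that group's name;
# files matching no group or several groups get NA and need manual assignment.
assign_groups <- function(file_names, group_names) {
  vapply(file_names, function(f) {
    matches <- vapply(group_names, function(g) grepl(g, f, fixed = TRUE), logical(1))
    if (sum(matches) == 1) group_names[matches] else NA_character_
  }, character(1))
}

fcs_files <- c("healthy_01.fcs", "healthy_02.fcs", "diseased_01.fcs", "control_01.fcs")
assign_groups(fcs_files, c("healthy", "diseased"))
```

With plain substring matching, a group named `healthy` would also match a file such as `unhealthy_01.fcs`, so it is worth double-checking automatic assignments before running an analysis.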

At this time, the Citrus GUI only supports the identification of cell subsets that differ between experimental groups. Regression analysis against continuous or survival endpoints can be performed by using the Citrus package directly in R.

Configure Citrus Clustering

Click on the "Clustering Setup" tab.

Citrus uses clustering to identify many cell subsets that may be experimentally informative. In order to ensure that each sample is equally represented in the clustering, Citrus selects an equal number of events from each sample that are combined and clustered together. You must tell Citrus how many cells to select from each file, and the markers that will be used for clustering.
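To see why equal sampling matters, consider files of very different sizes: pooling every event would let the largest file dominate the clustering. The R sketch below draws the same number of events from each sample before pooling. `sample_events` and the toy data are hypothetical, and the handling of files with fewer events than requested is an assumption, not documented Citrus behavior.

```r
# Hypothetical sketch: draw the same number of events from every sample and
# pool them for clustering. Each sample is a numeric matrix (events x markers).
sample_events <- function(samples, events_per_file) {
  pooled <- lapply(names(samples), function(id) {
    x <- samples[[id]]
    # Assumption: if a file has fewer events than requested, use all of them.
    rows <- sample.int(nrow(x), min(events_per_file, nrow(x)))
    data.frame(sample = id, x[rows, , drop = FALSE])
  })
  do.call(rbind, pooled)
}

set.seed(42)
markers <- c("CD3", "CD4")
samples <- list(
  s1 = matrix(rnorm(5000 * 2), ncol = 2, dimnames = list(NULL, markers)),
  s2 = matrix(rnorm(20000 * 2), ncol = 2, dimnames = list(NULL, markers)),
  s3 = matrix(rnorm(8000 * 2), ncol = 2, dimnames = list(NULL, markers))
)
pooled <- sample_events(samples, events_per_file = 1000)
nrow(pooled)          # 1000 events per file x 3 files = 3000
table(pooled$sample)  # 1000 from each sample, despite unequal file sizes
```

When every file has at least the requested number of events, the pooled total equals events sampled per file times the number of samples, which is the estimate of clustered cells the GUI shows.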

Citrus identifies many clusters during clustering. However, "clusters" containing just one or two cells are likely uninformative. Therefore, you must specify a minimum cluster size threshold that tells Citrus which clusters to keep and which to ignore. This parameter is specified as a percentage of the total number of clustered events. For example, if you cluster 10,000 cells and specify a minimum cluster size of 5 percent, all clusters that contain fewer than 500 events (5% of 10,000) will be ignored by Citrus. Setting this parameter to a smaller number includes smaller, rarer cell subsets but also reduces statistical power to detect associations, because more clusters, and therefore more features, must be tested.
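The threshold can be pictured as a filter on the nodes of a clustering tree, where each node is a candidate cluster. The R sketch below is an illustration only: base R's `hclust()` with Ward linkage stands in for Citrus's own clustering (an assumption), it counts the events under every node, and it keeps the nodes at or above the minimum size. With 200 events and a 5 percent threshold, the cutoff is 10 events, just as 5 percent of 10,000 gives 500.

```r
# Sketch of the minimum cluster size filter. Base R's hclust() with Ward
# linkage stands in for Citrus's own clustering (assumption); every node of
# the tree is treated as a candidate cluster.
set.seed(7)
events <- rbind(
  matrix(rnorm(300, mean = 0), ncol = 2),  # 150 events around 0
  matrix(rnorm(100, mean = 4), ncol = 2)   # 50 events around 4
)
tree <- hclust(dist(events), method = "ward.D2")

# Row i of tree$merge joins two children: a negative entry is a single event,
# a positive entry is the node created in that earlier row.
node_sizes <- integer(nrow(tree$merge))
for (i in seq_len(nrow(tree$merge))) {
  for (child in tree$merge[i, ]) {
    node_sizes[i] <- node_sizes[i] + if (child < 0) 1L else node_sizes[child]
  }
}

min_cluster_pct <- 5
min_events <- nrow(events) * min_cluster_pct / 100  # 5% of 200 events = 10
kept <- which(node_sizes >= min_events)  # nodes at or above the threshold
length(kept)
```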

  1. Enter the number of cells to be selected from each file in the "Events Sampled Per File:" input box. Below this box, Citrus displays an estimate of the total number of cells that will be clustered: Clustered Cells = (events sampled per file × total number of samples).
  2. Specify the minimum cluster size percent of interest in the "Minimum Cluster Size" input box. This number is a percent, and values should be between 0 and 100.
  3. Check the names of parameters that will be used for clustering. Typically, these are any markers you would normally use for manual gating (e.g. CD markers).
  4. Select markers that should be transformed before analysis. Both clustering and functional markers should be transformed unless you know that your data is already transformed or does not need to be transformed.
  5. Data are transformed using the inverse hyperbolic sine (arcsinh) transform. You may adjust the transform cofactor in the "Transform Cofactor" box. Typically, 5 is a good value for mass cytometry data and 150 is a good value for fluorescence-based measurements.
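The role of the cofactor is easiest to see in the formula commonly used for cytometry data, asinh(x / cofactor). The R sketch below assumes Citrus applies this form; `arcsinh_transform` is a hypothetical helper, and the example values are arbitrary.

```r
# Inverse hyperbolic sine transform with a cofactor: asinh(x / cofactor).
# Assumption: this matches the form Citrus applies to selected markers.
arcsinh_transform <- function(x, cofactor = 5) {
  asinh(x / cofactor)
}

raw <- c(-10, 0, 5, 50, 500, 5000)
round(arcsinh_transform(raw, cofactor = 5), 3)    # typical for mass cytometry
round(arcsinh_transform(raw, cofactor = 150), 3)  # typical for fluorescence
```

For values much smaller than the cofactor the transform is nearly linear (about x / cofactor), and for much larger values it grows roughly logarithmically, so the cofactor sets where compression of high intensities begins. Unlike a log transform, it is defined for zero and negative values.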

Characterize Clusters

Click on the "Cluster Characterization" tab.

Citrus can characterize clusters using a number of metrics. These properties are calculated on a per-sample basis.

  • Abundance features quantify the proportion of a sample's cells that belong to a cluster (e.g. 20% of a sample's cells are assigned to cluster x).
  • Median features quantify the median value of a marker in a cluster's cells (e.g. the median level of functional marker y in cells from cluster x).

Specify the cluster properties that you are interested in examining by checking the appropriate boxes in the "Calculated Cluster Features" input. If you check the "Cluster medians" option, you will also need to specify which markers to calculate medians for. Typically, one would calculate the median level of functional markers. Again, unless data are already transformed or do not need to be transformed, you should specify that functional markers be transformed in the "Clustering Setup" tab.
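Both feature types are per-sample summaries of cluster membership. The R sketch below computes them from a hypothetical table of clustered events with base R's `prop.table()` and `tapply()`. For simplicity it assumes each event belongs to exactly one cluster; in a hierarchical clustering an event can fall within several nested clusters, but the per-cluster calculation is the same.

```r
# Hypothetical clustered events: sample ID, cluster ID, and the transformed
# level of one functional marker.
events <- data.frame(
  sample  = c("s1", "s1", "s1", "s1", "s2", "s2", "s2", "s2", "s2"),
  cluster = c("x",  "x",  "y",  "y",  "x",  "y",  "y",  "y",  "y"),
  pSTAT5  = c(1.0,  3.0,  0.5,  0.7,  2.0,  0.1,  0.3,  0.2,  0.4)
)

# Abundance: fraction of each sample's events that fall in each cluster.
abundance <- prop.table(table(events$sample, events$cluster), margin = 1)

# Median: median marker level of each cluster's events, per sample.
medians <- tapply(events$pSTAT5, list(events$sample, events$cluster), median)

abundance
medians
```

Each cell of these sample-by-cluster tables is one feature value; these are the values the association models later test against the experimental endpoint.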

Association Model Selection

Click the "Association Model Configuration" tab.

Citrus uses two types of models, predictive and correlative, to detect associations between cluster properties and experimental endpoints. Briefly, predictive models identify subsets of cluster properties that are the best predictors of the experimental endpoint. Correlative models detect subsets of cluster properties that are correlated with the experimental endpoint but that are not necessarily accurate predictors of an experimental outcome.

Select one or more models to be used to test for differences between groups.

Correlative Models

  • Significance Analysis of Microarrays (sam)
    • Works with two or more groups or continuous endpoint measures
    • Identifies cluster properties whose value is associated with an experimental group or continuous outcome
    • Does not provide a predictive model

Predictive Models

  • Nearest Shrunken Centroid (pamr)
    • Identifies cluster properties that are predictive of sample class
    • Works with two or more groups
    • May be used to predict the class of a new sample
  • L1-Penalized Regression (glmnet)
    • Identifies linear combinations of cluster properties that are predictive of an experimental endpoint (multivariate linear/logistic regression)
    • Works with exactly two groups or continuous endpoint measures.
    • May be used to predict an experimental endpoint in a new sample
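To make the nearest shrunken centroid idea concrete, the R sketch below re-implements its central shrinkage step on toy data. It is an illustrative version of one common formulation, not the pamr package's code: each group's centroid is expressed as a standardized distance from the overall centroid, every distance is soft-thresholded toward zero by `delta`, and a cluster feature is kept only if some group's distance stays non-zero. `shrunken_centroid_features` and the data are hypothetical, and the toy groups are far smaller than the 8 samples per group recommended above.

```r
# Illustrative re-implementation of the shrinkage step in nearest shrunken
# centroids (one common formulation; not pamr's actual code). x is a
# samples x features matrix, y gives each sample's group, delta is the
# shrinkage threshold. Returns the names of features that survive shrinkage.
shrunken_centroid_features <- function(x, y, delta) {
  y <- factor(y)
  n <- nrow(x)
  overall <- colMeans(x)
  centroids <- do.call(rbind, lapply(levels(y), function(k) {
    colMeans(x[y == k, , drop = FALSE])
  }))
  # Pooled within-group standard deviation of each feature.
  resid <- x - centroids[as.integer(y), , drop = FALSE]
  s <- sqrt(colSums(resid^2) / (n - nlevels(y)))
  s0 <- median(s)
  m <- sqrt(1 / as.vector(table(y)) - 1 / n)
  # Standardized distance of each group centroid from the overall centroid.
  d <- sweep(centroids, 2, overall) / outer(m, s + s0)
  # Soft-thresholding: shrink each distance toward zero by delta.
  d_shrunk <- sign(d) * pmax(abs(d) - delta, 0)
  # Keep a feature if any group's shrunken distance is still non-zero.
  colnames(x)[colSums(d_shrunk != 0) > 0]
}

x <- cbind(
  f1 = c(1, 2, 1, 2, 5, 6, 5, 6),          # large shift between groups
  f2 = c(1, 2, 1, 2, 2, 1, 2, 1),          # same mean in both groups
  f3 = c(1, 2, 1, 2, 1.5, 2.5, 1.5, 2.5)   # small shift between groups
)
y <- rep(c("A", "B"), each = 4)
shrunken_centroid_features(x, y, delta = 1)    # "f1"
shrunken_centroid_features(x, y, delta = 0.5)  # "f1" "f3"
```

Raising the threshold removes more features, which is why the method reports a small set of predictive cluster properties; in practice the threshold is chosen by cross-validation.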

Documentation of cross-validation folds to be added.

Write Citrus Configuration File and Run Citrus

Click the "Run!" tab.

If you want to run Citrus now, select the "Quit GUI and run Citrus in R" option in the Citrus Execution Options tab and click the "Run Citrus" button. Alternatively, you can write only the configuration file to the data directory. This is useful if you have a large analysis that you would like to run on another computer.
