Example Usage - nolanlab/spade GitHub Wiki

The vignette distributed with the spade package provides an example with synthetic data that is useful for getting started with CytoSPADE. We recommend that new users start with that example. When you are ready to try CytoSPADE with real data, this page offers larger examples using high-dimensional CyTOF data.

Bone Marrow Surface Panel

Running the CytoSPADE R package

All code blocks beginning with R> are intended to be executed in your R session. These instructions assume you have successfully installed and loaded the spade package.

  1. Download the sample data file Bendall_et_al_Science_2011_Marrow_1_SurfacePanel_Live_CD44pos_Singlets.fcs and place it in your R working directory.

  2. Build the CytoSPADE trees using the driver function. Note that this can take anywhere from 5 to 30 minutes, depending on the number of cores you have and the speed of your processor(s).

    This dataset, derived from Bendall et al., Science 2011, includes approximately 30 surface markers, and we will use all of them in the clustering process. We have also set a few other parameters, specifically the number of clusters (k) and the fraction of events to keep after downsampling (downsampling_target_percent), to best capture this data. We typically want to over-cluster by several-fold so that we capture transitional populations.

     R> markers <- c("Cd(110,111,112,114)","Cell_length","Dy(163.929)-Dual","Er(165.930)-Dual","Er(166.932)-Dual","Er(167.932)-Dual","Er(169.935)-Dual","Eu(150.919)-Dual","Eu(152.921)-Dual","Gd(155.922)-Dual","Gd(157.924)-Dual","Gd(159.927)-Dual","Ho(164.930)-Dual","In(114.903)-Dual","Ir(190.960)-Dual","La(138.906)-Dual","Lu(174.940)-Dual","Nd(141.907)-Dual","Nd(143.910)-Dual","Nd(144.912)-Dual","Nd(145.913)-Dual","Nd(147.916)-Dual","Nd(149.920)-Dual","Pr(140.907)-Dual","Sm(146.914)-Dual","Sm(151.919)-Dual","Sm(153.922)-Dual","Tb(158.925)-Dual","Tm(168.934)-Dual","Yb(170.936)-Dual","Yb(171.936)-Dual","Yb(173.938)-Dual","Yb(175.942)-Dual")
    
     R> PANELS <- list(list(panel_files=c("Bendall_et_al_Science_2011_Marrow_1_SurfacePanel_Live_CD44pos_Singlets.fcs"), median_cols=NULL,reference_files=c("Bendall_et_al_Science_2011_Marrow_1_SurfacePanel_Live_CD44pos_Singlets.fcs"),fold_cols=c()))
    
     R> SPADE.driver("Bendall_et_al_Science_2011_Marrow_1_SurfacePanel_Live_CD44pos_Singlets.fcs",
          out_dir="output", cluster_cols=markers, panels=PANELS, layout=SPADE.layout.arch,
          transforms=flowCore::arcsinhTransform(a=0, b=0.2),   # equivalent to asinh(x/5)
          downsampling_target_percent=0.1, downsampling_target_number=NULL,
          downsampling_target_pctile=NULL, downsampling_exclude_pctile=0.01,
          k=200, clustering_samples=50000)                      # k = number of clusters
    
  3. Generate PDFs of the SPADE trees annotated with the median intensity of each surface marker.

     R> library(igraph)   # provides read.graph(); newer igraph versions also offer read_graph()
     R> layout <- read.table(file.path("output","layout.table"))
     R> mst <- read.graph(file.path("output","mst.gml"),format="gml")
     R> SPADE.plot.trees(mst,"output",file_pattern="*fcs*Rsave",layout=as.matrix(layout),out_dir=file.path("output","pdf"),size_scale_factor=1.2)
    

    The resulting PDFs are written to the output/pdf directory, one for each parameter in the FCS file, and are labeled with the reporter isotope.
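For reference, the transforms argument in step 2 uses flowCore's arcsinhTransform, which computes asinh(a + b*x) + c. With a=0, b=0.2 and the default c=0, this reduces to asinh(x/5), i.e. an arcsinh transform with a cofactor of 5. The base-R sketch below reimplements that formula so you can see its effect on a few values; the function name arcsinh_transform and the input counts are made up for illustration and are not part of spade.

```r
# Sketch of the arcsinh transform passed to SPADE.driver in step 2.
# flowCore::arcsinhTransform(a, b, c) computes asinh(a + b * x) + c;
# this base-R version (hypothetical name) needs no packages.
arcsinh_transform <- function(x, a = 0, b = 0.2, c = 0) {
  asinh(a + b * x) + c
}

# Made-up raw ion counts for one channel (illustration only)
raw_counts <- c(0, 1, 5, 50, 500, 5000)

transformed <- arcsinh_transform(raw_counts)
print(round(transformed, 3))

# With a = 0 and b = 0.2 this equals asinh(raw_counts / 5): roughly linear
# for small counts and roughly logarithmic for large ones.
```

If flowCore is installed, applying the object returned by flowCore::arcsinhTransform(a=0, b=0.2) to the same values should give matching results.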

Interpreting the data

Navigate to the "output/pdf" folder and open the PDFs whose names contain "medians". These display the median expression of each marker in each node as a color overlay. The color scale is normalized independently for each marker, so a given color does not correspond to the same intensity on two different markers.
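To illustrate why colors should not be compared across markers, here is a minimal sketch of per-marker scaling: each marker's node medians are clamped to that marker's own 2nd and 98th percentiles and then mapped to the range 0 to 1. The percentile bounds, the function name scale_per_marker, and the example medians are assumptions for illustration; this is not necessarily the exact scheme SPADE.plot.trees uses.

```r
# Sketch: scale each marker's node medians to [0, 1] independently.
# Assumption: a simple 2nd-98th percentile clamp; illustrative only,
# not SPADE's actual coloring code.
scale_per_marker <- function(medians, lower = 0.02, upper = 0.98) {
  # medians: numeric matrix, one row per node, one column per marker
  apply(medians, 2, function(v) {
    bounds <- quantile(v, c(lower, upper), na.rm = TRUE, names = FALSE)
    if (bounds[2] == bounds[1]) {
      return(rep(0.5, length(v)))  # flat marker: put everything mid-scale
    }
    clamped <- pmin(pmax(v, bounds[1]), bounds[2])
    (clamped - bounds[1]) / (bounds[2] - bounds[1])
  })
}

# Hypothetical medians for 5 nodes and 2 markers on very different scales
medians <- cbind(CD3  = c(0.1, 0.5, 2.0, 4.0, 6.0),
                 CD34 = c(0.0, 0.0, 0.2, 0.3, 1.0))
scaled <- scale_per_marker(medians)
print(round(scaled, 2))
```

In this example, a median of 1.0 on CD34 and a median of 6.0 on CD3 both land at the top of their respective scales, even though the underlying intensities differ sixfold.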

Note that this demo dataset was manually gated on Cytobank.org with three gates:

  1. A "Singlets1" gate selecting DNA(pos) Cell_length(low) events. This helps exclude debris and doublets.
  2. A "Singlets2" gate excluding CD45(super-high) events. This helps exclude doublets.
  3. A "CD44pos" gate selecting CD44(pos) events. The cutoff for CD44 positivity was chosen to be somewhat permissive so that erythrocyte progenitors and CD34(pos) progenitors would be included. This helps exclude red blood cells and platelets, which were extremely plentiful in this sample despite Ficoll enrichment.
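Applied in sequence, the three gates keep only events that pass all of them. The sketch below expresses that logic as boolean filters on a data frame. It is not a reproduction of the actual Cytobank gates, whose boundaries are not given here: the simplified column names (DNA, Cell_length, CD45, CD44) stand in for the isotope-labeled channels, and every threshold and data value is a hypothetical placeholder.

```r
# Sketch: the three manual gates as boolean filters.
# All thresholds are hypothetical placeholders, not the real Cytobank gates.
apply_gates <- function(events,
                        dna_min = 4, length_max = 40,
                        cd45_max = 8, cd44_min = 2) {
  # "Singlets1": DNA(pos) and Cell_length(low)
  singlets1 <- events[["DNA"]] > dna_min & events[["Cell_length"]] < length_max
  # "Singlets2": exclude CD45(super-high) events
  singlets2 <- events[["CD45"]] < cd45_max
  # "CD44pos": keep CD44(pos) events
  cd44pos <- events[["CD44"]] > cd44_min
  events[singlets1 & singlets2 & cd44pos, , drop = FALSE]
}

# Hypothetical values for six events
events <- data.frame(
  DNA         = c(5.0, 5.2, 1.0, 5.1, 4.8, 5.3),
  Cell_length = c(20,  22,  15,  60,  25,  21),
  CD45        = c(5.0, 9.0, 4.0, 5.0, 5.5, 3.0),
  CD44        = c(3.0, 3.5, 3.0, 3.2, 0.5, 2.5)
)
gated <- apply_gates(events)
print(gated)
```

Each of events 2 to 5 fails exactly one gate (CD45 too high, DNA too low, Cell_length too high, CD44 too low), so only events 1 and 6 survive.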

What are we looking at?

This is fresh, Ficoll-enriched healthy human bone marrow, with the caveat that we are only looking at the CD44(pos) events as described above. Cells were live when stained. This is from the same fresh bone marrow sample referred to as Marrow 1 in Bendall et al., Science, 2011, but all data in that paper come from cells that were fixed with formaldehyde before staining.

What do the markers mean?

A wide range of cell types is visible, and it is not practical to give an exhaustive list here. Below are some landmarks to orient you; from this starting point, we hope you will find your own exciting correlations in the data. Markers are listed in alphabetical order of output filenames, with isotopes in parentheses. Additional information about the experiment can be found in the MIFlowCyt annotations available in the cytobank.org report.

  • CD3 (Cd.110.111.112.114): Canonical T cell marker. Note: The Cd(110,111,112,114) parameter is the one you should pay attention to. This is a derived parameter that is the sum of 4 cadmium isotopes, and therefore shows strong correlation with each of those cadmium isotopes individually. The summed signal provides the most sensitivity, and is therefore used for clustering.
  • Cell_length: Predominantly a singlet/doublet marker. A derived parameter that indicates the number of "slices" of the ion cloud that were summed together to identify each cell. The "length" refers to the length of the Gaussian peak, not the physical length of the cell. Cells carrying more total ions tend to have more slices. Recall that the data was already gated on this manually as described above.
  • CD15 (Dy.163.929): Predominantly granulocytes; also visible on myeloid progenitors and apparently some NK cells, though these might be doublets.
  • CD44 (Er.165.930): Broadly expressed on many cell types. Lost as erythrocytes and granulocytes mature. Recall that this was already used for manual gating, so the range observed on SPADE does not reflect the total dynamic range of this marker.
  • CD7 (Er.166.932): Dim or bright expression on T cells and bright on NK/NKT cells.
  • CD13 (Er.167.932): Myeloid progenitors.
  • CD56 (Er.169.935): NK and NKT cells.
  • CD123 (Eu.150.919): Brightest on myeloid progenitors; dim on early monocytes and mature B cells.
  • CD10 (Gd.155.922): Immature B cells.
  • CD33 (Gd.157.924): Bright on monocytes and granulocytes; dim on HSCs and progenitors.
  • CD14 (Gd.159.927): Bright on mature monocytes. Apparent coexpression on CD38(high) B cell progenitors is an artifact of spectral bleed from 159Tb-CD38 into the 160Gd-CD14 channel. This "+1 bleed" can happen in mass cytometry when the signal gets extremely high.
  • CD16 (Ho.164.930): Bright on NK cells and on some apparent erythroblasts, though the latter is puzzling and these cells may actually be granulocytes.
  • CD45 (In.114.903): Broadly expressed at high levels on mature lymphoid or monocytoid cells; dim on progenitors; low on granulocytes, platelets, erythrocytes, and plasma cells.
  • DNA-191 (Ir.190.960): An iridium-based metallointercalator. Predominantly used to discriminate debris, singlets, and doublets. While this reagent has an affinity for DNA, it tends to bind to all cell types and give a cell-size-dependent signal. Recall that the data was already gated on this manually as described above.
  • DNA-193 (Ir.192.962): The same as DNA-191, just a different isotope.
  • CD45RA (La.138.906): Bright on B, NK, and naive T cells; dim on erythroblasts. Negative on HSC and MPP.
  • CXCR4 (Lu.174.940): Also known as CD184. Bright on megakaryocyte progenitors and B cells, as well as some monocytes. Dim on T and NK cells.
  • CD19 (Nd.141.907): Canonical pan-B cell marker. Bright on mature B cells, but preceded by CD10 in development. Dim expression on CD235ab(high) erythroblasts is actually due to "+1 bleed" from 141Pr-CD235ab into the 142Nd-CD19 channel.
  • CD11b (Nd.143.910): Bright on mature monocytes and myeloid progenitors; dim on NK cells.
  • CD4 (Nd.144.912): Bright on helper T cells; dim on monocytes.
  • CD8 (Nd.145.913): Bright on cytotoxic T cells; dim on NK cells.
  • CD34 (Nd.147.916): Bright on HSCs and progenitors. Dim expression on mature B cells is due to "+1 bleed" from 147Sm-CD20 into the 148Nd-CD34 channel.
  • CD161 (Nd.149.920): NK and NKT cells.
  • CD235ab (Pr.140.907): Pro-erythroblasts and erythroblasts. Mature erythrocytes were excluded by the CD44+ gate.
  • Viability: A protein-reactive dye mixed with the live cells for 15 minutes. Primarily taken up by the pro-erythroblasts. Not very useful.
  • CD20 (Sm.146.914): Mature B cells.
  • CD41 (Sm.151.919): Platelet and megakaryocyte marker, but negative on almost everything here because platelets were excluded by the manual gate. Some residual expression visible on myeloid progenitors and B cells.
  • CD11c (Sm.153.922): Bright on mature monocytes; dim on promonocytes.
  • CD38 (Tb.158.925): Bright on CD10(pos) pro-B and CD34(pos) multilineage progenitor cells except HSCs, which are dim. Extremely bright on CD10(neg) CD19(neg) plasma B cells. Dim on B, NK, and CD4 T cells.
  • CD61 (Tm.168.934): Platelet and megakaryocyte marker, but negative on almost everything here because platelets were excluded by the manual gate. Some residual expression visible on monocytes and myeloid progenitors.
  • CD117 (Yb.170.936): Bright on multilineage progenitors; dim on pro-erythroblasts.
  • CD47 (Yb.171.936): Bright on early multilineage progenitors (MPP and HSC), pro-erythroblasts, and pre-B cells. Dim on many other mature cell types.
  • HLA-DR (Yb.173.938): Bright on late multilineage progenitors, promonocytes, mature B cells, and pro-B cells; dim on mature monocytes.
  • CD90 (Yb.175.942): Bright on pro-B and mature B cells; dim on HSC; negative on MPP; dim on other myeloid progenitors.
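The Cd(110,111,112,114) parameter used for CD3 is already present in the sample file. If you want to build a similar summed channel yourself, a minimal sketch follows. It assumes a plain matrix of raw counts with hypothetical column names Cd110, Cd111, Cd112, and Cd114, and it sums raw counts rather than transformed values so that the result stays a total signal across the four isotopes.

```r
# Sketch: derive a summed cadmium channel from four per-isotope channels.
# Column names Cd110, Cd111, Cd112, Cd114 are hypothetical; the sample FCS
# file already provides the summed parameter "Cd(110,111,112,114)".
sum_channels <- function(counts, channels) {
  missing <- setdiff(channels, colnames(counts))
  if (length(missing) > 0) {
    stop("missing channels: ", paste(missing, collapse = ", "))
  }
  # Sum raw counts per event (row), before any arcsinh transform
  rowSums(counts[, channels, drop = FALSE])
}

cd_channels <- c("Cd110", "Cd111", "Cd112", "Cd114")

# Hypothetical raw counts for three events
counts <- matrix(c(10, 12,  9, 11,
                    0,  1,  0,  2,
                   40, 38, 45, 41),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(NULL, cd_channels))
cd_sum <- sum_channels(counts, cd_channels)
print(cd_sum)
```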