scRNA seq - BgeeDB/expression-annotations Wiki

Dedicated page for scRNA-seq expression data annotation.

Follow this link for general annotation guidelines in Bgee.

Introduction

scRNA-seq gives access to transcriptome heterogeneity at the level of individual cells, which makes it possible to describe cell types.

https://www.nature.com/articles/s41368-021-00146-0

scRNA-seq methods can be low throughput (also known as plate-based methods) or high throughput (also known as droplet-based methods), and may target only nuclei (single-nucleus RNA-seq) instead of whole cells; see this review for detailed information.

There are many protocols to generate scRNA-seq libraries, but all follow the same workflow:

  • cell isolation

  • library preparation

  • sequencing

scRNA-seq annotation

We started with full-length (FL) scRNA-seq annotation, because its library preparation is closely related to that of bulk RNA-seq. However, many interesting datasets are generated with target-based (TB) scRNA-seq, so we adapted our annotation process to capture this type of scRNA-seq too.

cell isolation

This step is crucial for following the general Bgee rules of normality. As explained below, however, we had to reconsider our normality rules for scRNA-seq annotation, because many protocols use transgenic strains to facilitate cell isolation (even though cellular side effects have been reported, for example as a result of GFP expression, as described in this paper).

  • The ideal isolation protocol uses antibody staining followed by FACS or MACS, which isolate cells from tissues in a largely mechanical way.
  • We may accept transgenic cells with constitutive reporter genes (e.g. GFP, YFP, tdTomato) if this brings a large gain in interesting samples (see below on integrating data from large consortia).

Using DNA recombinant technology, scientists combine the Gfp gene to another gene that produces a protein that they want to study, and then they insert the complex into a cell. If the cell produces the green fluorescence, scientists infer that the cell expresses the target gene as well.

source: https://embryo.asu.edu/pages/green-fluorescent-protein

  • We aim to follow protocols used by dedicated cell atlas or single-cell expression data consortia, such as the STAR protocol 'Isolation and RNA sequencing of single nuclei from Drosophila tissues' used in the Fly Cell Atlas (FCA). We remain focused on our normality rules, however, so we report genotypes to allow later filtering of genotypes that are clearly far from the 'wild type' genotype.
  • We may therefore accept transgenic strains with driver lines such as the GAL4/UAS system; see this picture of the usual Drosophila cell staining protocol for cell isolation.
  • We do not accept induction lines: CRE-inducible protocols that involve injection of inducers such as tamoxifen to activate reporter genes are rejected.
  • We do not accept cells from culture or cell lines.

As part of the improvement of our annotation process, we report 'genotype' in the FL and TB annotation files in addition to the strain information.
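The isolation rules above can be summarised as a small lookup table. This is only an illustrative sketch: the category labels, the `Decision` enum and the `isolation_decision` function are hypothetical names made up for this example and are not values or tools used in the Bgee annotation files.

```python
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"
    CASE_BY_CASE = "case by case"
    REJECT = "reject"


# Hypothetical category labels that paraphrase the rules listed above.
ISOLATION_RULES = {
    # Antibody staining followed by FACS or MACS: the ideal case.
    "antibody_staining_sorting": Decision.ACCEPT,
    # Constitutive reporter genes (e.g. GFP, YFP): may be accepted.
    "constitutive_reporter": Decision.CASE_BY_CASE,
    # Driver lines such as the GAL4/UAS system: may be accepted.
    "driver_line": Decision.CASE_BY_CASE,
    # Induction lines (e.g. tamoxifen-induced reporters): rejected.
    "inducible_line": Decision.REJECT,
    # Cells from culture or cell lines: rejected.
    "cell_culture_or_cell_line": Decision.REJECT,
}


def isolation_decision(category: str) -> Decision:
    """Return the decision for a category, failing loudly on unknown labels."""
    try:
        return ISOLATION_RULES[category]
    except KeyError:
        raise ValueError(f"unknown isolation category: {category!r}") from None
```

The "case by case" value reflects the wording "we may accept": the final call still depends on the gain in interesting samples and on the reported genotype.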

scRNASeqFL (full-length scRNA-seq)

Full-length scRNA-seq libraries are single-cell libraries that each contain a single cell type. The cell type is either known a priori, or can be defined more precisely a posteriori by clustering.

currently accepted protocols

  • Smart-seq

  • Smart-seq2

table format for scRNASeqFLLibrary.tsv file

  • libraryId
  • experimentId
  • platform
  • anatId > the identifier used to map the anatomical structure, usually an UBERON id
  • anatName > the name associated with the anatId
  • cellTypeId_abInitio > the identifier used to map the cell type from column infoCellType_abInitio, as provided by the authors before any original analysis (clustering) that may refine the cell type (see column cellTypeId_Cluster)
  • cellTypeName_abInitio > the name associated with the cellTypeId_abInitio
  • markers > this column can report any marker associated with the cell type (usually found in the paper, and associated with the cluster)
  • stageId
  • stageName
  • infoOrgan
  • infoCellType_abInitio
  • infoStage
  • sampleTitle
  • clusterId
  • clusterName
  • cellTypeId_Cluster
  • cellTypeName_Cluster
  • cellTypeAnnotationStatus
  • stageAnnotationStatus
  • stageBiologicalStatus
  • sex
  • strain
  • genotype
  • speciesId
  • comment
  • annotatorId
  • lastModificationDate
  • replicate
  • infoReplicate
  • SRSId
  • tags
  • protocol
  • protocolType (= Full-length)
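As a rough illustration, the column list above can be turned into a basic consistency check. The sketch below assumes that the file is tab-separated, that its first line is a header using exactly these column names, and that the protocol values are spelled 'Smart-seq' and 'Smart-seq2'. The function name `check_fl_library` is hypothetical and is not part of the Bgee pipeline.

```python
import csv
import io

# Column names as listed above; the order is not checked in this sketch.
EXPECTED_COLUMNS = [
    "libraryId", "experimentId", "platform", "anatId", "anatName",
    "cellTypeId_abInitio", "cellTypeName_abInitio", "markers",
    "stageId", "stageName", "infoOrgan", "infoCellType_abInitio",
    "infoStage", "sampleTitle", "clusterId", "clusterName",
    "cellTypeId_Cluster", "cellTypeName_Cluster",
    "cellTypeAnnotationStatus", "stageAnnotationStatus",
    "stageBiologicalStatus", "sex", "strain", "genotype", "speciesId",
    "comment", "annotatorId", "lastModificationDate", "replicate",
    "infoReplicate", "SRSId", "tags", "protocol", "protocolType",
]

# Assumed spelling of the currently accepted full-length protocols.
ACCEPTED_FL_PROTOCOLS = {"Smart-seq", "Smart-seq2"}


def check_fl_library(handle):
    """Return a list of human-readable problems found in the file."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        # Without the expected columns the row checks are meaningless.
        return ["missing columns: " + ", ".join(missing)]

    problems = []
    # Data rows start on line 2, after the header line.
    for line_no, row in enumerate(reader, start=2):
        if row["protocolType"] != "Full-length":
            problems.append(
                f"line {line_no}: protocolType is {row['protocolType']!r}"
            )
        if row["protocol"] not in ACCEPTED_FL_PROTOCOLS:
            problems.append(
                f"line {line_no}: protocol {row['protocol']!r} is not accepted"
            )
    return problems
```

It could be called as `check_fl_library(open("scRNASeqFLLibrary.tsv", newline=""))`; an empty list means that no problem was detected by these two checks.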

scRNASeqTB (target-based scRNA-seq)

Target-based scRNA-seq libraries are single-cell libraries that each contain more than one cell type. To report and further annotate each cell type, barcodes/UMIs linked to each individual cell are required. See here for detailed information about barcodes.

The same barcode may report different cell types within the same experiment: this is normal, as the number of available barcodes is limited. Within a library, however, each barcode must be unique to a cell type.
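The barcode rule can be checked with a few lines of code. The sketch below assumes that the annotation has already been loaded as (library, barcode, cell type) triples; the function name `conflicting_barcodes` and the identifiers in the usage example are hypothetical.

```python
from collections import defaultdict


def conflicting_barcodes(records):
    """Find barcodes assigned to more than one cell type within a library.

    records: iterable of (library_id, barcode, cell_type_id) tuples.
    Returns {(library_id, barcode): sorted list of cell type ids}.
    """
    seen = defaultdict(set)
    for library_id, barcode, cell_type_id in records:
        # Barcodes are keyed per library, so reuse of a barcode across
        # libraries of the same experiment is not reported.
        seen[(library_id, barcode)].add(cell_type_id)
    return {
        key: sorted(cell_types)
        for key, cell_types in seen.items()
        if len(cell_types) > 1
    }


# Made-up identifiers, for illustration only.
example = [
    ("lib1", "AAAC", "cellTypeA"),
    ("lib2", "AAAC", "cellTypeB"),  # same barcode, other library: allowed
    ("lib1", "GGTT", "cellTypeA"),
    ("lib1", "GGTT", "cellTypeB"),  # same barcode, same library: conflict
]
conflicts = conflicting_barcodes(example)
```

Here `conflicts` contains a single entry, for barcode 'GGTT' in 'lib1'.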

Clustering information may be associated with a target-based experiment and allows precise cell types to be defined.

currently accepted protocols

  • 10X Genomics: several versions exist, which we report in file scRNASeqTBLibrary.tsv, column AE 'whiteList' (so far only 'whiteList' = v2 is accepted (?))

  • CEL-seq2

  • Drop-seq