Datasets - satijalab/seurat-data GitHub Wiki
Dataset Packages
SeuratData uses R packages to bundle and distribute datasets in an R-native manner. Because each dataset is an R package, neither SeuratData nor the user needs to know where the data are stored: R itself handles downloading and storing the datasets. R also handles loading, so the user can load a dataset from any working directory.
Storing the Data
Datasets can be bundled with packages in two forms: a Seurat object loadable through R's data() mechanism, or an h5Seurat file accessible with LoadData.
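As a rough sketch of the two loading routes, the helper below tries LoadData first and falls back to R's data() mechanism. The function name `load_cbmc` is hypothetical, and the code assumes the CBMC dataset package stores its object under the name `cbmc`; if the dataset package is not installed, the helper returns NULL instead of failing.

```r
# Hypothetical helper, not part of SeuratData: load the CBMC dataset if its
# package is installed, otherwise return NULL.
load_cbmc <- function() {
  if (!requireNamespace("cbmc.SeuratData", quietly = TRUE)) {
    return(NULL)
  }
  # Route 1: let SeuratData locate and load the dataset
  if (requireNamespace("SeuratData", quietly = TRUE)) {
    return(SeuratData::LoadData("cbmc"))
  }
  # Route 2: use R's data() mechanism directly.
  # Assumption: the bundled object is named "cbmc".
  env <- new.env()
  utils::data("cbmc", package = "cbmc.SeuratData", envir = env)
  env$cbmc
}

cbmc <- load_cbmc()
```

Neither route refers to a file path, so the call behaves the same from any working directory.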
Documentation and Citations
Datasets should be documented using standard Roxygen syntax. Dataset packages should also include citation information in a CITATION file located at inst/CITATION. The original source of the dataset should be listed as the citation.
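The following is a minimal sketch of what these two pieces might look like for the CBMC dataset. The file name R/data.R is an assumption, and every value in angle brackets is a placeholder to be replaced with details of the real source; none of them are actual citation data.

```r
# --- R/data.R (hypothetical file name): Roxygen block for the dataset ---

#' scRNAseq and 13-antibody sequencing of CBMCs
#'
#' @format A Seurat object
#' @source <original source of the dataset>
"cbmc"

# --- inst/CITATION: cite the original source of the data ---
# In a real CITATION file the bibentry() call usually appears on its own;
# it is assigned to a name here only so that it can be inspected.
dataset_citation <- utils::bibentry(
  bibtype = "Misc",
  title = "<title of the original publication>",
  author = utils::person(given = "<given name>", family = "<family name>"),
  year = "<year>",
  url = "<URL of the original source>"
)
```

Once such a package is installed, `citation("cbmc.SeuratData")` would return the entry defined in inst/CITATION.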
Metadata and Other Information
Dataset-level metadata (as opposed to the cell-level metadata stored in the Seurat object) is kept in the dataset package's DESCRIPTION file.
| Key | Value |
|---|---|
| Package | Name of the package; should be name_of_dataset.SeuratData |
| Date | Date the package was built, in YYYY-MM-DD format; used for versioning |
| Type | Should be Package |
| Title | Short description of the dataset |
| Version | Version of Seurat the dataset was built under |
| Author or Authors@R | Name(s) and contact information of the dataset package builders and maintainers |
| Description | ... |
| License | License of the data, typically a Creative Commons license (e.g. CC BY 4.0) |
| Encoding | Character encoding used by the package, typically UTF-8 |
| LazyData | ... |
| RoxygenNote | Version of Roxygen the dataset documentation was generated with |
| Suggests | Packages and package versions used to generate the dataset; should include a version of Seurat |
The DESCRIPTION for the CBMC dataset provided by SeuratData is as follows:
```
Package: cbmc.SeuratData
Date: 2019-07-17
Type: Package
Title: scRNAseq and 13-antibody sequencing of CBMCs
Version: 3.0.0
Authors@R: c(
    person(given = 'Satija', family = 'Lab', email = '[email protected]', role = c('aut', 'cre'))
    )
Description:
    species: human
    system: CBMC (cord blood)
    ncells: 8617
    tech: CITE-seq
    default.dataset: raw
License: CC BY 4.0
Encoding: UTF-8
LazyData: true
RoxygenNote: 6.1.1
Suggests:
    Seurat (>= 3.0.0)
```
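Because the Description field holds one `key: value` pair per line, the dataset metadata can be recovered with base R alone. The sketch below writes an abbreviated copy of the CBMC DESCRIPTION to a temporary file, reads it with `read.dcf`, and splits the Description field into a named vector. The helper `parse_dataset_description` is hypothetical and is not necessarily how SeuratData parses this field internally; the sketch also assumes the Description lines are indented as continuation lines, as the DCF format requires.

```r
# Hypothetical helper: turn the "key: value" lines of a Description field
# into a named character vector.
parse_dataset_description <- function(description) {
  lines <- trimws(strsplit(description, split = "\n", fixed = TRUE)[[1]])
  lines <- lines[nzchar(lines)]                # drop empty lines
  keys <- trimws(sub(":.*$", "", lines))       # text before the first colon
  values <- trimws(sub("^[^:]*:", "", lines))  # text after the first colon
  stats::setNames(values, keys)
}

# Abbreviated copy of the CBMC DESCRIPTION; continuation lines are indented
desc_file <- tempfile()
writeLines(c(
  "Package: cbmc.SeuratData",
  "Date: 2019-07-17",
  "Version: 3.0.0",
  "Description:",
  "    species: human",
  "    system: CBMC (cord blood)",
  "    ncells: 8617",
  "    tech: CITE-seq",
  "    default.dataset: raw",
  "License: CC BY 4.0"
), desc_file)

desc <- read.dcf(desc_file)
meta <- parse_dataset_description(desc[1, "Description"])
print(meta)
```

For an installed dataset package, `utils::packageDescription("cbmc.SeuratData")$Description` should supply the same field without reading the file by hand.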