C2M2 Table Summary - nih-cfde/published-documentation GitHub Wiki
Crosscut Metadata Model (C2M2) Common Vocabulary (CV) Tables
- All table files listed in this summary must be bundled together, along with the C2M2 datapackage JSON Schema file, to create a valid C2M2 datapackage for submission to CFDE
- TSV files for any empty (unused) tables must still be submitted, with only the (tab-separated) column-header row filled in
- Table (TSV) filenames must exactly match those listed in the JSON Schema file (and in these docs)
- Table column headers must exactly match those listed in the JSON Schema file (and in these docs)
- Table columns must appear in the order given in the JSON Schema file (and in these docs)
- Tables marked "CV term table" must be built using the CFDE submission prep script (wiki; code)
- Table (TSV) files must not contain any empty rows or extra lines
- Every TSV file must end with the final row of table data (or the column-header row, for an empty table), terminated by a newline
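
The formatting rules above can be checked mechanically before submission. The following is a minimal sketch, not part of the CFDE tooling: the function name `check_tsv` is hypothetical, and it assumes the caller already has the ordered column names for the table (copied from the JSON Schema file). It covers only the header, empty-row, and trailing-newline rules.

```python
from pathlib import Path


def check_tsv(path, expected_columns):
    """Return a list of human-readable problems found in one TSV file.

    `expected_columns` is the ordered list of column names for this table,
    as given in the JSON Schema file (assumed to be supplied by the caller).
    """
    problems = []
    # Read raw bytes so that no newline translation hides a formatting issue.
    text = Path(path).read_bytes().decode("utf-8")

    if not text:
        return ["file is empty: a header row is required even for unused tables"]

    # The last row (or the header row, for an unused table) must end with a newline.
    if not text.endswith("\n"):
        problems.append("final row is not terminated by a newline")

    # Drop the single terminating newline, then split into rows.
    body = text[:-1] if text.endswith("\n") else text
    rows = body.split("\n")

    # Column headers must match the schema exactly, in the same order.
    header = rows[0].rstrip("\r").split("\t")
    if header != list(expected_columns):
        problems.append(
            f"header mismatch: expected {list(expected_columns)}, found {header}"
        )

    # No empty (or whitespace-only) rows and no extra lines anywhere in the file.
    for number, row in enumerate(rows[1:], start=2):
        if row.strip() == "":
            problems.append(f"line {number} is empty")

    return problems
```

Calling `check_tsv("subject.tsv", columns)` returns an empty list when the file passes these checks. Filename matching and bundling with the JSON Schema file are not covered by this sketch.
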
Table (click for detailed information) | Construction | Can be empty? | Notes |
---|---|---|---|
analysis_type.tsv | Built by script | Y | CV term table |
anatomy.tsv | Built by script | Y | CV term table |
assay_type.tsv | Built by script | Y | CV term table |
biosample.tsv | Prepared by submitter | Y | This table will have one row for each biosample |
biosample_disease.tsv | Prepared by submitter | Y | For biosamples with disease metadata, this table will have one row for each disease associated with each biosample, along with a field distinguishing "exemplar of disease" from "disease specifically ruled out" |
biosample_from_subject.tsv | Prepared by submitter | Y | This table will have one row for each attribution of a biosample to a subject |
biosample_gene.tsv | Prepared by submitter | Y | For each biosample with a small group of associated genes (e.g. knockdown targets), this table will have one row for each association of a gene with a biosample |
biosample_in_collection.tsv | Prepared by submitter | Y | This table will have one row for each assignment of a biosample as a member of a collection |
biosample_substance.tsv | Prepared by submitter | Y | For biosamples with substance metadata, this table will have one row for each association of a substance with a biosample |
collection.tsv | Prepared by submitter | Y | This table will have one row for each collection |
collection_anatomy.tsv | Prepared by submitter | Y | Each row in this table is equivalent to the statement "the contents of collection X directly relate to the study of anatomy Y", for one particular (collection X, anatomy Y) pair |
collection_compound.tsv | Prepared by submitter | Y | Each row in this table is equivalent to the statement "the contents of collection X directly relate to the study of compound Y", for one particular (collection X, compound Y) pair |
collection_defined_by_project.tsv | Prepared by submitter | Y | This table will have one row for each collection that was generated directly by a project listed in the project.tsv table |
collection_disease.tsv | Prepared by submitter | Y | Each row in this table is equivalent to the statement "the contents of collection X directly relate to the study of disease Y", for one particular (collection X, disease Y) pair |
collection_gene.tsv | Prepared by submitter | Y | Each row in this table is equivalent to the statement "the contents of collection X directly relate to the study of gene Y", for one particular (collection X, gene Y) pair |
collection_in_collection.tsv | Prepared by submitter | Y | This table will have one row for each parent->child (collection->subcollection) relationship |
collection_phenotype.tsv | Prepared by submitter | Y | Each row in this table is equivalent to the statement "the contents of collection X directly relate to the study of phenotype Y", for one particular (collection X, phenotype Y) pair |
collection_protein.tsv | Prepared by submitter | Y | Each row in this table is equivalent to the statement "the contents of collection X directly relate to the study of protein Y", for one particular (collection X, protein Y) pair |
collection_substance.tsv | Prepared by submitter | Y | Each row in this table is equivalent to the statement "the contents of collection X directly relate to the study of substance Y", for one particular (collection X, substance Y) pair |
collection_taxonomy.tsv | Prepared by submitter | Y | Each row in this table is equivalent to the statement "the contents of collection X directly relate to the study of taxonomy Y", for one particular (collection X, taxonomy Y) pair |
compound.tsv | Built by script | Y | CV term table |
data_type.tsv | Built by script | Y | CV term table |
dcc.tsv (formerly primary_dcc_contact.tsv) | Prepared by submitter | N | This table will have exactly one row |
disease.tsv | Built by script | Y | CV term table |
file.tsv | Prepared by submitter | Y | This table will have one row for each file |
file_describes_biosample.tsv | Prepared by submitter | Y | This table will have one row for each association of a biosample with a describing file |
file_describes_collection.tsv | Prepared by submitter | Y | This table will have one row for each association of a collection with a describing file |
file_describes_subject.tsv | Prepared by submitter | Y | This table will have one row for each association of a subject with a describing file |
file_format.tsv | Built by script | Y | CV term table |
file_in_collection.tsv | Prepared by submitter | Y | This table will have one row for each assignment of a file as a member of a collection |
gene.tsv | Built by script | Y | CV term table |
id_namespace.tsv | Prepared by submitter | N | This table will have one row for each C2M2 identifier namespace registered with CFDE |
ncbi_taxonomy.tsv | Built by script | Y | CV term table |
phenotype.tsv | Built by script | Y | CV term table |
phenotype_disease.tsv | Built by script | Y | Each row in this table is equivalent to the statement "phenotype X is known to be associated with disease Y", for one particular (phenotype X, disease Y) pair; contents are autoloaded from HPO by the submission prep script, which will add relevant rows for every phenotype term and every disease term used in submitter-prepared tables |
phenotype_gene.tsv | Built by script | Y | Each row in this table is equivalent to the statement "phenotype X is known to be associated with gene Y", for one particular (phenotype X, gene Y) pair; contents are autoloaded from HPO by the submission prep script, which will add relevant rows for every phenotype term and every gene term used in submitter-prepared tables |
project.tsv | Prepared by submitter | N | This table will have one row for each project |
project_in_project.tsv | Prepared by submitter | Y* | This table will have one row for each parent->child (project->subproject) relationship. *If your project.tsv table lists more than one project, then you must populate this table with all of your program's top-level projects, listed as children of your program's root project. |
protein.tsv | Built by script | Y | CV term table |
protein_gene.tsv | Built by script | Y | Each row in this table is equivalent to the statement "protein X is known to be associated with gene Y", for one particular (protein X, gene Y) pair; contents are autoloaded from HPO by the submission prep script, which will add relevant rows for every protein term and every gene term used in submitter-prepared tables |
subject.tsv | Prepared by submitter | Y | This table will have one row for each subject |
subject_disease.tsv | Prepared by submitter | Y | For subjects with disease metadata, this table will have one row for each disease associated with each subject, along with a field distinguishing "disease detected" from "disease specifically ruled out" |
subject_in_collection.tsv | Prepared by submitter | Y | This table will have one row for each assignment of a subject as a member of a collection |
subject_phenotype.tsv | Prepared by submitter | Y | For every subject with phenotype metadata, this table will have one row for each phenotype associated with each subject, along with a field distinguishing "exemplar of phenotype" from "phenotype specifically ruled out" |
subject_race.tsv | Prepared by submitter | Y | This table will have one row for each subject with a race assertion |
subject_role_taxonomy.tsv | Prepared by submitter | Y | This table will have one row for each taxon assigned to a subject |
subject_substance.tsv | Prepared by submitter | Y | For subjects with substance metadata, this table will have one row for each substance associated with each subject |
substance.tsv | Built by script | Y | CV term table |
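
Unused submitter-prepared tables still have to be submitted with only their column-header row. The sketch below shows one way to generate such header-only files. It rests on an assumption: that the JSON Schema file follows a Frictionless-style layout, with `resources[].path` giving each filename and `resources[].schema.fields[].name` giving the ordered column names. The function name and its parameters are hypothetical. Tables marked "N" in the table above are rejected, and tables marked "Built by script" should be left to the submission prep script rather than stubbed this way.

```python
import json
from pathlib import Path

# Tables marked "N" under "Can be empty?" in the summary table above.
MUST_NOT_BE_EMPTY = {"dcc.tsv", "id_namespace.tsv", "project.tsv"}


def write_header_only_tables(schema_path, package_dir, unused_tables):
    """Write header-only TSV files for the named unused tables.

    Assumes a Frictionless-style schema file: resources[].path holds the
    filename and resources[].schema.fields[].name holds the column names.
    Existing files are never overwritten. Returns the filenames created.
    """
    schema = json.loads(Path(schema_path).read_text(encoding="utf-8"))
    columns_by_file = {
        resource["path"]: [field["name"] for field in resource["schema"]["fields"]]
        for resource in schema["resources"]
    }

    package_dir = Path(package_dir)
    created = []
    for name in unused_tables:
        if name in MUST_NOT_BE_EMPTY:
            raise ValueError(f"{name} may not be empty and cannot be stubbed")
        if name not in columns_by_file:
            raise KeyError(f"{name} is not listed in the schema file")
        target = package_dir / name
        if target.exists():
            continue  # keep whatever the submitter already prepared
        # Header row only, in schema order, terminated by a single newline.
        header = "\t".join(columns_by_file[name]) + "\n"
        target.write_bytes(header.encode("utf-8"))
        created.append(name)
    return created
```

Passing the list of unused tables explicitly, rather than stubbing every missing file, keeps the script from silently creating an empty version of a table that the submitter meant to populate.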