C2M2 Table Summary - nih-cfde/published-documentation GitHub Wiki
Crosscut Metadata Model (C2M2) Controlled Vocabulary (CV) Tables
These files can be assembled mostly automatically; please see the Submission Guide for instructions on assembling them.
- All table files listed in this summary will be bundled together, along with the C2M2 datapackage JSON Schema file which defines them, to create a valid C2M2 datapackage for submission to CFDE
- TSV files for any empty (unused) tables must still be submitted, with only the (tab-separated) column-header row filled in
- Table (TSV) filenames must exactly match those listed in the JSON Schema file (and in these docs)
- Table column headers must exactly match those listed in the JSON Schema file (and in these docs)
- Table columns must appear in the order given in the JSON Schema file (and in these docs)
- Tables marked "CV term table" will be built automatically with the CFDE tools (wiki)
- Table (TSV) files must not contain any empty rows or extra lines
- Every TSV file must end with the final row of table data, terminated by a newline
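The formatting rules above are easy to check mechanically before submission. The following is a minimal Python sketch, not an official CFDE tool. It assumes the datapackage JSON Schema file follows the Frictionless Data Package layout (a top-level `resources` list whose entries give each table's `path` and an ordered `schema.fields` list of column names), and the function name `check_tables` is hypothetical. It reports missing or unexpected TSV files, header rows whose names or order differ from the schema, blank lines, and a final row that is not newline-terminated.

```python
import json
from pathlib import Path


def check_tables(schema_path, table_dir):
    """Return human-readable problems found in the TSVs under table_dir.

    Assumes a Frictionless-style datapackage JSON (hypothetical layout check):
    resources[*].path is the TSV filename and resources[*].schema.fields
    lists the columns in their required order.
    """
    package = json.loads(Path(schema_path).read_text(encoding="utf-8"))
    table_dir = Path(table_dir)
    problems = []
    expected_names = set()

    for resource in package["resources"]:
        filename = resource["path"]
        expected_names.add(filename)
        # Column order in the schema is the required column order in the TSV.
        expected_header = [field["name"] for field in resource["schema"]["fields"]]
        tsv = table_dir / filename

        # Every table must be submitted, even if it is unused (header only).
        if not tsv.is_file():
            problems.append(f"{filename}: file is missing")
            continue

        text = tsv.read_text(encoding="utf-8")
        lines = text.splitlines()
        if not lines:
            problems.append(f"{filename}: no header row")
            continue
        if lines[0].split("\t") != expected_header:
            problems.append(f"{filename}: header does not match schema")
        # No blank rows anywhere, including extra lines after the last row.
        if any(not line.strip() for line in lines):
            problems.append(f"{filename}: contains empty lines")
        # The final row must be terminated by a newline.
        if not text.endswith("\n"):
            problems.append(f"{filename}: last row is not newline-terminated")

    # Filenames must exactly match the schema, so flag any unexpected TSVs.
    for tsv in sorted(table_dir.glob("*.tsv")):
        if tsv.name not in expected_names:
            problems.append(f"{tsv.name}: not listed in the schema")
    return problems
```

A typical call would be `for problem in check_tables("C2M2_datapackage.json", "submission_tables"): print(problem)`, where both paths are placeholders for your own schema file and table directory. An empty result means only that these structural rules pass; it does not validate cell contents.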
| Table (click for detailed information) | Construction | Can be empty? | Notes |
|---|---|---|---|
| analysis_type.tsv | Built by script | Y | CV term table |
| anatomy.tsv | Built by script | Y | CV term table |
| assay_type.tsv | Built by script | Y | CV term table |
| biofluid.tsv | Built by script | Y | CV term table |
| biosample.tsv | Prepared by submitter | Y | This table will have one row for each biosample |
| biosample_disease.tsv | Prepared by submitter | Y | For biosamples with disease metadata, this table will have one row for each disease associated with each biosample, along with a field distinguishing "exemplar of disease" from "disease specifically ruled out" |
| biosample_from_subject.tsv | Prepared by submitter | Y | This table will have one row for each attribution of a biosample to a subject |
| biosample_gene.tsv | Prepared by submitter | Y | For each biosample with a small group of associated genes (e.g. knockdown targets), this table will have one row for each association of a gene with a biosample |
| biosample_in_collection.tsv | Prepared by submitter | Y | This table will have one row for each assignment of a biosample as a member of a collection |
| biosample_ptm.tsv | Prepared by submitter | Y | For each biosample with a small group of associated PTMs, this table will have one row for each association of a PTM with a biosample |
| biosample_substance.tsv | Prepared by submitter | Y | For biosamples with substance metadata, this table will have one row for each association of a substance with a biosample |
| collection.tsv | Prepared by submitter | Y | This table will have one row for each collection |
| collection_anatomy.tsv | Prepared by submitter | Y | Each row in this table is equivalent to the statement "the contents of collection X directly relate to the study of anatomy Y", for one particular (collection X, anatomy Y) pair |
| collection_biofluid.tsv | Prepared by submitter | Y | Each row in this table is equivalent to the statement "the contents of collection X directly relate to the study of biofluid Y", for one particular (collection X, biofluid Y) pair |
| collection_compound.tsv | Prepared by submitter | Y | Each row in this table is equivalent to the statement "the contents of collection X directly relate to the study of compound Y", for one particular (collection X, compound Y) pair |
| collection_defined_by_project.tsv | Prepared by submitter | Y | This table will have one row for each collection that was generated directly by a project listed in the project.tsv table |
| collection_disease.tsv | Prepared by submitter | Y | Each row in this table is equivalent to the statement "the contents of collection X directly relate to the study of disease Y", for one particular (collection X, disease Y) pair |
| collection_gene.tsv | Prepared by submitter | Y | Each row in this table is equivalent to the statement "the contents of collection X directly relate to the study of gene Y", for one particular (collection X, gene Y) pair |
| collection_in_collection.tsv | Prepared by submitter | Y | This table will have one row for each parent->child (collection->subcollection) relationship |
| collection_phenotype.tsv | Prepared by submitter | Y | Each row in this table is equivalent to the statement "the contents of collection X directly relate to the study of phenotype Y", for one particular (collection X, phenotype Y) pair |
| collection_protein.tsv | Prepared by submitter | Y | Each row in this table is equivalent to the statement "the contents of collection X directly relate to the study of protein Y", for one particular (collection X, protein Y) pair |
| collection_ptm.tsv | Prepared by submitter | Y | Each row in this table is equivalent to the statement "the contents of collection X directly relate to the study of PTM Y", for one particular (collection X, PTM Y) pair |
| collection_substance.tsv | Prepared by submitter | Y | Each row in this table is equivalent to the statement "the contents of collection X directly relate to the study of substance Y", for one particular (collection X, substance Y) pair |
| collection_taxonomy.tsv | Prepared by submitter | Y | Each row in this table is equivalent to the statement "the contents of collection X directly relate to the study of taxonomy Y", for one particular (collection X, taxonomy Y) pair |
| compound.tsv | Built by script | Y | CV term table |
| data_type.tsv | Built by script | Y | CV term table |
| dcc.tsv (formerly primary_dcc_contact.tsv) | Prepared by submitter | N | This table will have exactly one row |
| disease.tsv | Built by script | Y | CV term table |
| domain_location.tsv | Prepared by submitter | Y | This table will have one row for each unique domain_location term in the ptm table |
| file.tsv | Prepared by submitter | Y | This table will have one row for each file |
| file_describes_biosample.tsv | Prepared by submitter | Y | This table will have one row for each association of a biosample with a describing file |
| file_describes_collection.tsv | Prepared by submitter | Y | This table will have one row for each association of a collection with a describing file |
| file_describes_subject.tsv | Prepared by submitter | Y | This table will have one row for each association of a subject with a describing file |
| file_format.tsv | Built by script | Y | CV term table |
| file_in_collection.tsv | Prepared by submitter | Y | This table will have one row for each assignment of a file as a member of a collection |
| gene.tsv | Built by script | Y | CV term table |
| id_namespace.tsv | Prepared by submitter | N | This table will have one row for each C2M2 identifier namespace registered with CFDE |
| ncbi_taxonomy.tsv | Built by script | Y | CV term table |
| phenotype.tsv | Built by script | Y | CV term table |
| phenotype_disease.tsv | Built by script | Y | Each row in this table is equivalent to the statement "phenotype X is known to be associated with disease Y", for one particular (phenotype X, disease Y) pair; contents are autoloaded from HPO by the submission prep script, which will add relevant rows for every phenotype term and every disease term used in submitter-prepared tables |
| phenotype_gene.tsv | Built by script | Y | Each row in this table is equivalent to the statement "phenotype X is known to be associated with gene Y", for one particular (phenotype X, gene Y) pair; contents are autoloaded from HPO by the submission prep script, which will add relevant rows for every phenotype term and every gene term used in submitter-prepared tables |
| project.tsv | Prepared by submitter | N | This table will have one row for each project |
| project_in_project.tsv | Prepared by submitter | Y* | This table will have one row for each parent->child (project->subproject) relationship. *If you have more than one project in your project.tsv table, you must populate this table with all of your program's top-level projects, listed as children of your program's root project. |
| protein.tsv | Built by script | Y | CV term table |
| protein_gene.tsv | Built by script | Y | Each row in this table is equivalent to the statement "protein X is known to be associated with gene Y", for one particular (protein X, gene Y) pair; contents are autoloaded from UniProt by the submission prep script, which will add relevant rows for every protein term and every gene term used in submitter-prepared tables |
| ptm.tsv | Prepared by submitter | Y | This table will have one row for each PTM |
| ptm_type.tsv | Prepared by submitter | Y | This table will have one row for each unique ptm_type term in the ptm table |
| ptm_subtype.tsv | Prepared by submitter | Y | This table will have one row for each unique ptm_subtype term in the ptm table |
| sample_prep_method.tsv | Built by script | Y | CV term table |
| subject.tsv | Prepared by submitter | Y | This table will have one row for each subject |
| subject_disease.tsv | Prepared by submitter | Y | For subjects with disease metadata, this table will have one row for each disease associated with each subject, along with a field distinguishing "disease detected" from "disease specifically ruled out" |
| subject_in_collection.tsv | Prepared by submitter | Y | This table will have one row for each assignment of a subject as a member of a collection |
| subject_phenotype.tsv | Prepared by submitter | Y | For every subject with phenotype metadata, this table will have one row for each phenotype associated with each subject, along with a field distinguishing "exemplar of phenotype" from "phenotype specifically ruled out" |
| subject_race.tsv | Prepared by submitter | Y | This table will have one row for each subject with a race assertion |
| subject_role_taxonomy.tsv | Prepared by submitter | Y | This table will have one row for each taxon assigned to a subject |
| subject_substance.tsv | Prepared by submitter | Y | For subjects with substance metadata, this table will have one row for each substance associated with each subject |
| substance.tsv | Built by script | Y | CV term table |
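The "Can be empty?" column and the project_in_project.tsv footnote can likewise be checked with a short script. This is a hypothetical sketch, and it assumes these C2M2 column names: `project_id_namespace`/`project_local_id` in dcc.tsv (pointing at the program's root project), `id_namespace`/`local_id` in project.tsv, and `child_project_id_namespace`/`child_project_local_id` in project_in_project.tsv. Confirm them against the JSON Schema file before relying on it. The sketch checks that dcc.tsv has exactly one row, that id_namespace.tsv and project.tsv are not empty, and, when there is more than one project, that every non-root project appears as a child in project_in_project.tsv (a project with no parent would be a top-level project missing from under the root).

```python
import csv
from pathlib import Path


def read_rows(path):
    """Read a tab-separated C2M2 table into a list of dicts keyed by header."""
    with open(path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE: treat quote characters as ordinary data in TSV cells.
        return list(csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))


def check_required_tables(table_dir):
    """Check the 'Can be empty? = N' rows and the project_in_project.tsv rule.

    Column names used below are assumptions; verify them against the schema.
    """
    table_dir = Path(table_dir)
    dcc = read_rows(table_dir / "dcc.tsv")
    namespaces = read_rows(table_dir / "id_namespace.tsv")
    projects = read_rows(table_dir / "project.tsv")
    links = read_rows(table_dir / "project_in_project.tsv")
    problems = []

    if len(dcc) != 1:
        problems.append(f"dcc.tsv: expected exactly 1 row, found {len(dcc)}")
    if not namespaces:
        problems.append("id_namespace.tsv: must have at least 1 row")
    if not projects:
        problems.append("project.tsv: must have at least 1 row")

    # Only meaningful with several projects and a single dcc row naming the root.
    if len(projects) > 1 and len(dcc) == 1:
        root = (dcc[0]["project_id_namespace"], dcc[0]["project_local_id"])
        children = {
            (row["child_project_id_namespace"], row["child_project_local_id"])
            for row in links
        }
        for project in projects:
            key = (project["id_namespace"], project["local_id"])
            # A non-root project that is nobody's child would be an unlisted
            # top-level project, which must instead be a child of the root.
            if key != root and key not in children:
                problems.append(
                    f"project_in_project.tsv: {key[1]} has no parent project"
                )
    return problems
```

The hierarchy check is deliberately shallow: it confirms that each non-root project has some parent, but it does not walk the parent chain to prove that every project ultimately descends from the root.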