C2M2 Table Summary - nih-cfde/published-documentation GitHub Wiki

Crosscut Metadata Model (C2M2) Common Vocabulary (CV) Tables

  • All table files listed in this summary must be bundled together, along with the C2M2 datapackage JSON Schema file, to create a valid C2M2 datapackage for submission to CFDE
  • TSV files for any empty (unused) tables must still be submitted, with only the (tab-separated) column-header row filled in
  • Table (TSV) filenames must exactly match those listed in the JSON Schema file (and in these docs)
  • Table column headers must exactly match those listed in the JSON Schema file (and in these docs)
  • Table columns must appear in the order given in the JSON Schema file (and in these docs)
  • Tables marked "CV term table" must be built using the CFDE submission prep script (wiki; code)
  • Table (TSV) files must not contain any empty rows or extra lines
  • Every TSV file must end with the final row of table data, terminated by a newline
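The formatting rules above can be checked automatically before submission. The Python sketch below is illustrative only and is not part of the CFDE submission prep script. It assumes the C2M2 datapackage JSON Schema follows the Frictionless Data Package layout: a top-level `resources` list whose entries carry a `path` and a `schema.fields[].name` list. The helper names `expected_headers`, `write_empty_table`, `check_tsv` and `check_package` are hypothetical.

```python
import json
from pathlib import Path


def expected_headers(schema_path):
    """Map each TSV filename to its ordered list of column names.

    ASSUMPTION: Frictionless-style datapackage JSON, i.e. a top-level
    "resources" list whose entries have "path" and "schema" -> "fields".
    """
    with open(schema_path, encoding="utf-8") as fh:
        package = json.load(fh)
    return {
        Path(res["path"]).name: [field["name"] for field in res["schema"]["fields"]]
        for res in package["resources"]
    }


def write_empty_table(path, columns):
    """Write an unused table: only the tab-separated header row, newline-terminated."""
    Path(path).write_text("\t".join(columns) + "\n", encoding="utf-8")


def check_tsv(path, columns):
    """Return a list of rule violations for one TSV file (empty list means OK)."""
    problems = []
    text = Path(path).read_text(encoding="utf-8")
    ends_with_newline = text.endswith("\n")
    if not ends_with_newline:
        problems.append("last row is not terminated by a newline")
    lines = text.split("\n")
    # A correctly terminated file leaves exactly one trailing "" after split();
    # drop it, so any remaining blank entry is an empty row or extra line.
    body = lines[:-1] if ends_with_newline else lines
    for number, line in enumerate(body, start=1):
        if line.strip() == "":
            problems.append(f"line {number} is empty")
    # Headers must match the schema exactly, including column order.
    header = body[0].split("\t") if body else []
    if header != columns:
        problems.append(f"header {header} != expected {columns}")
    return problems


def check_package(directory, schema_path):
    """Check every table named in the schema; return {filename: [problems]}."""
    report = {}
    for filename, columns in expected_headers(schema_path).items():
        path = Path(directory) / filename
        if not path.exists():
            report[filename] = ["file missing (empty tables must still be submitted)"]
            continue
        problems = check_tsv(path, columns)
        if problems:
            report[filename] = problems
    return report
```

Comparing the header as a list catches both misspelled column names and columns in the wrong order, which covers two of the rules above with one check.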
| Table | Construction | Can be empty? | Notes |
|---|---|---|---|
| analysis_type.tsv | Built by script | Y | CV term table |
| anatomy.tsv | Built by script | Y | CV term table |
| assay_type.tsv | Built by script | Y | CV term table |
| biosample.tsv | Prepared by submitter | Y | This table will have one row for each biosample |
| biosample_disease.tsv | Prepared by submitter | Y | For biosamples with disease metadata, this table will have one row for each disease associated with each biosample, along with a field distinguishing "exemplar of disease" from "disease specifically ruled out" |
| biosample_from_subject.tsv | Prepared by submitter | Y | This table will have one row for each attribution of a biosample to a subject |
| biosample_gene.tsv | Prepared by submitter | Y | For each biosample with a small group of associated genes (e.g. knockdown targets), this table will have one row for each association of a gene with a biosample |
| biosample_in_collection.tsv | Prepared by submitter | Y | This table will have one row for each assignment of a biosample as a member of a collection |
| biosample_substance.tsv | Prepared by submitter | Y | For biosamples with substance metadata, this table will have one row for each association of a substance with a biosample |
| collection.tsv | Prepared by submitter | Y | This table will have one row for each collection |
| collection_anatomy.tsv | Prepared by submitter | Y | Each row in this table is equivalent to the statement "the contents of collection X directly relate to the study of anatomy Y", for one particular (collection X, anatomy Y) pair |
| collection_compound.tsv | Prepared by submitter | Y | Each row in this table is equivalent to the statement "the contents of collection X directly relate to the study of compound Y", for one particular (collection X, compound Y) pair |
| collection_defined_by_project.tsv | Prepared by submitter | Y | This table will have one row for each collection that was generated directly by a project listed in the project.tsv table |
| collection_disease.tsv | Prepared by submitter | Y | Each row in this table is equivalent to the statement "the contents of collection X directly relate to the study of disease Y", for one particular (collection X, disease Y) pair |
| collection_gene.tsv | Prepared by submitter | Y | Each row in this table is equivalent to the statement "the contents of collection X directly relate to the study of gene Y", for one particular (collection X, gene Y) pair |
| collection_in_collection.tsv | Prepared by submitter | Y | This table will have one row for each parent->child (collection->subcollection) relationship |
| collection_phenotype.tsv | Prepared by submitter | Y | Each row in this table is equivalent to the statement "the contents of collection X directly relate to the study of phenotype Y", for one particular (collection X, phenotype Y) pair |
| collection_protein.tsv | Prepared by submitter | Y | Each row in this table is equivalent to the statement "the contents of collection X directly relate to the study of protein Y", for one particular (collection X, protein Y) pair |
| collection_substance.tsv | Prepared by submitter | Y | Each row in this table is equivalent to the statement "the contents of collection X directly relate to the study of substance Y", for one particular (collection X, substance Y) pair |
| collection_taxonomy.tsv | Prepared by submitter | Y | Each row in this table is equivalent to the statement "the contents of collection X directly relate to the study of taxonomy Y", for one particular (collection X, taxonomy Y) pair |
| compound.tsv | Built by script | Y | CV term table |
| data_type.tsv | Built by script | Y | CV term table |
| dcc.tsv (formerly primary_dcc_contact.tsv) | Prepared by submitter | N | This table will have exactly one row |
| disease.tsv | Built by script | Y | CV term table |
| file.tsv | Prepared by submitter | Y | This table will have one row for each file |
| file_describes_biosample.tsv | Prepared by submitter | Y | This table will have one row for each association of a biosample with a describing file |
| file_describes_collection.tsv | Prepared by submitter | Y | This table will have one row for each association of a collection with a describing file |
| file_describes_subject.tsv | Prepared by submitter | Y | This table will have one row for each association of a subject with a describing file |
| file_format.tsv | Built by script | Y | CV term table |
| file_in_collection.tsv | Prepared by submitter | Y | This table will have one row for each assignment of a file as a member of a collection |
| gene.tsv | Built by script | Y | CV term table |
| id_namespace.tsv | Prepared by submitter | N | This table will have one row for each C2M2 identifier namespace registered with CFDE |
| ncbi_taxonomy.tsv | Built by script | Y | CV term table |
| phenotype.tsv | Built by script | Y | CV term table |
| phenotype_disease.tsv | Built by script | Y | Each row in this table is equivalent to the statement "phenotype X is known to be associated with disease Y", for one particular (phenotype X, disease Y) pair; contents are autoloaded from HPO by the submission prep script, which will add relevant rows for every phenotype term and every disease term used in submitter-prepared tables |
| phenotype_gene.tsv | Built by script | Y | Each row in this table is equivalent to the statement "phenotype X is known to be associated with gene Y", for one particular (phenotype X, gene Y) pair; contents are autoloaded from HPO by the submission prep script, which will add relevant rows for every phenotype term and every gene term used in submitter-prepared tables |
| project.tsv | Prepared by submitter | N | This table will have one row for each project |
| project_in_project.tsv | Prepared by submitter | Y* | This table will have one row for each parent->child (project->subproject) relationship |
| protein.tsv | Built by script | Y | CV term table |
| protein_gene.tsv | Built by script | Y | Each row in this table is equivalent to the statement "protein X is known to be associated with gene Y", for one particular (protein X, gene Y) pair; contents are autoloaded by the submission prep script, which will add relevant rows for every protein term and every gene term used in submitter-prepared tables |
| subject.tsv | Prepared by submitter | Y | This table will have one row for each subject |
| subject_disease.tsv | Prepared by submitter | Y | For subjects with disease metadata, this table will have one row for each disease associated with each subject, along with a field distinguishing "disease detected" from "disease specifically ruled out" |
| subject_in_collection.tsv | Prepared by submitter | Y | This table will have one row for each assignment of a subject as a member of a collection |
| subject_phenotype.tsv | Prepared by submitter | Y | For subjects with phenotype metadata, this table will have one row for each phenotype associated with each subject, along with a field distinguishing "exemplar of phenotype" from "phenotype specifically ruled out" |
| subject_race.tsv | Prepared by submitter | Y | This table will have one row for each subject with a race assertion |
| subject_role_taxonomy.tsv | Prepared by submitter | Y | This table will have one row for each taxon assigned to a subject |
| subject_substance.tsv | Prepared by submitter | Y | For subjects with substance metadata, this table will have one row for each substance associated with each subject |
| substance.tsv | Built by script | Y | CV term table |

*If your project.tsv table contains more than one project, you must populate project_in_project.tsv with all of your program's top-level projects, listed as children of your program's root project.
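Several entries in the summary table imply row-count rules: dcc.tsv has exactly one row, id_namespace.tsv and project.tsv cannot be empty, and the footnote on project_in_project.tsv requires a project hierarchy whenever more than one project exists. The Python sketch below checks these rules; it is illustrative, not CFDE tooling. The column names it reads (`id_namespace` and `local_id` in project.tsv; `child_project_id_namespace` and `child_project_local_id` in project_in_project.tsv) are assumptions to verify against the JSON Schema, and the helper names are hypothetical. It also interprets the footnote as "exactly one project (the root) has no parent", which is one reading of the rule rather than an official definition.

```python
import csv
from pathlib import Path

# Tables whose "Can be empty?" entry is N in the summary table.
MUST_HAVE_ROWS = ("dcc.tsv", "id_namespace.tsv", "project.tsv")


def read_rows(path):
    """Read a TSV file into a list of dicts keyed by its header row."""
    with open(path, newline="", encoding="utf-8") as fh:
        # QUOTE_NONE: treat quote characters as ordinary data, not CSV quoting.
        return list(csv.DictReader(fh, delimiter="\t", quoting=csv.QUOTE_NONE))


def check_row_rules(directory):
    """Return a list of row-count / hierarchy violations (empty list means OK)."""
    d = Path(directory)
    problems = []
    counts = {name: len(read_rows(d / name)) for name in MUST_HAVE_ROWS}
    for name, n in counts.items():
        if n == 0:
            problems.append(f"{name} must not be empty")
    if counts["dcc.tsv"] > 1:
        problems.append("dcc.tsv must have exactly one row")

    # ASSUMED column names; projects are keyed by (namespace, local ID).
    projects = {(r["id_namespace"], r["local_id"]) for r in read_rows(d / "project.tsv")}
    if len(projects) > 1:
        children = {
            (r["child_project_id_namespace"], r["child_project_local_id"])
            for r in read_rows(d / "project_in_project.tsv")
        }
        # Every project except the root should appear as someone's child.
        parentless = projects - children
        if len(parentless) != 1:
            problems.append(
                f"expected exactly one root project (no parent), found {len(parentless)}"
            )
    return problems
```

A project that never appears as a child is treated as a root, so a top-level project missing from project_in_project.tsv shows up as a second root.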