TableInfo: subject.tsv - nih-cfde/published-documentation GitHub Wiki

The subject.tsv table will contain one row per subject in your program.
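Because each row describes exactly one subject, the local_id field description requires that the string formed by concatenating id_namespace and local_id be unique across all rows. Below is a minimal Python sketch of a duplicate check. It is not an official CFDE validator. The function name `duplicate_subject_keys` and the namespace value `tag:example.org,2021:` are hypothetical, and the sketch shows only the two key columns.

```python
import csv
import io
from collections import Counter


def duplicate_subject_keys(tsv_text: str) -> list[str]:
    """Return every id_namespace + local_id string that occurs on more than one row."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Concatenate the two key columns, as the local_id field description specifies.
    counts = Counter(row["id_namespace"] + row["local_id"] for row in reader)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical example data; a real subject.tsv also carries the other columns.
example = (
    "id_namespace\tlocal_id\n"
    "tag:example.org,2021:\tSUBJ-001\n"
    "tag:example.org,2021:\tSUBJ-002\n"
    "tag:example.org,2021:\tSUBJ-001\n"
)
print(duplicate_subject_keys(example))  # ['tag:example.org,2021:SUBJ-001']
```

To check a file on disk, pass `open(path, newline="").read()` in place of the example string. The sketch checks the concatenated string rather than the (id_namespace, local_id) pair, which is the stricter of the two readings.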

Note on human subjects and ethnicity metadata: ethnicity values are supplied by study subjects themselves as a vehicle for their self-identification. Obtaining, storing and using this information should not be taken to imply that any concept of ethnicity has a well-defined biological analogue: none does. Research involving ethnically self-identified populations has the potential to cause great harm and should be continuously and carefully scrutinized, especially for its potential to reinforce existing harmful disparities in medical care (or even to create new ones). So why store it at all? Because, used carefully, this information can also enable improvements to care that are unobtainable through other means: for example, by exposing systematic differences in treatment decisions or health outcomes that disproportionately affect ethnic minority communities.

Note on human subjects and sex metadata: sex, in the context of C2M2 metadata, is considered to be related to but fully distinct from gender. We do not currently model gender. This does not reflect any decision to exclude such metadata; at present, we have no DCC metadata available for this dimension. Gender is a self-affirmed concept expressing various fundamental qualities of subjective human existence and experience: as such, it can correlate (like race) with differences in health care outcomes, but (like race) the question of a direct biological correlate is epistemologically meaningless.

Sex, which is related to but not a proxy for gender, is meant in the context of C2M2 clinical metadata to broadly describe groups of people based solely on morphology and genetics (i.e., without regard to gender identity or expression). Currently modeled values include female, male, intersex, transsexual (female-to-male), transsexual (male-to-female), and indeterminate. We define "transsexual" as someone who has begun or completed any medical physiological transition process, be it surgical, hormonal, or otherwise. "Indeterminate" is a clinical category through which clinicians report that no sex determination has been made during the course of treatment or study.

| Field | Field Description | Required? | Field Value Type | Extra Info |
|---|---|---|---|---|
| id_namespace | A CFDE-cleared identifier representing the top-level data space containing this subject [part 1 of 2-component composite primary key] | Required | string | id_namespace is a unique URI prefix pre-registered with CFDE and attached to your program (or a subset of your program) that identifies anything labeled with it as belonging to you. Please see the technical documentation for a full discussion of how this information is built and used. |
| local_id | An identifier representing this subject, unique within this id_namespace [part 2 of 2-component composite primary key] | Required | string | The string formed by concatenating the id_namespace and local_id field values must be unique for each row in this table. Please see the technical documentation for a full discussion of how this information is to be used. |
| project_id_namespace | The id_namespace of the primary project within which this subject was observed [part 1 of 2-component composite foreign key] | Required | string | This will be the value of id_namespace in the row in project.tsv corresponding to the primary project that observed this subject. If your program has not registered multiple CFDE identifier namespaces, this will be exactly the same value for all rows. |
| project_local_id | The local_id of the primary project within which this subject was observed [part 2 of 2-component composite foreign key] | Required | string | This will be the value of local_id in the row in project.tsv corresponding to the primary project that observed this subject. |
| persistent_id | A persistent, resolvable (not necessarily retrievable) URI or compact ID permanently attached to this subject | Optional | string | Meant to serve as a permanent address to which landing pages (which summarize metadata associated with this subject) and other relevant annotations and functions can optionally be attached. Please see the technical documentation for a full discussion of how this information is to be used. |
| creation_time | An ISO 8601 / RFC 3339 (subset)-compliant timestamp documenting this subject's creation time | Optional | string, format YYYY-MM-DDTHH:MM:SS±NN:NN | Examples:<br>2021-01-08T00:00:00-00:00 ("Jan 8, 2021")<br>2021-00-00T00:00:00-00:00 ("2021")<br>2021-01-08T00:45:40-04:00 ("Jan 8, 2021, 12:45:40AM, Zulu minus 4")<br>Please see the technical documentation for a complete treatment. |
| granularity | A CFDE CV term categorizing this subject by multiplicity | Required | enum of strings | Table of allowed values |
| sex | The sex of this subject | Optional | enum of strings | Table of allowed values |
| ethnicity | The ethnicity of this subject | Optional | enum of strings | Table of allowed values |
| age_at_enrollment | The age in years (with a fixed precision of two digits past the decimal point) of this subject when they were first enrolled in the primary project within which they were studied | Optional | number | |
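The field rules above can be checked one row at a time. The following is a hypothetical helper, not part of any CFDE tool. It assumes rows have already been parsed into dicts of strings (as `csv.DictReader` produces). It also assumes the composite keys of project.tsv have been collected into a set of `(id_namespace, local_id)` tuples. It checks the required fields, the composite foreign key into project.tsv, the shape of creation_time and the two-decimal precision of age_at_enrollment. It does not check sex, ethnicity or granularity against their tables of allowed values. The names `row_problems`, `format_age` and `PROJ-1` and the granularity placeholder are illustrative only.

```python
import re

# Fields marked Required in the table above.
REQUIRED_FIELDS = (
    "id_namespace",
    "local_id",
    "project_id_namespace",
    "project_local_id",
    "granularity",
)

# Shape check only for YYYY-MM-DDTHH:MM:SS±NN:NN; it deliberately does not reject
# the "00" month/day placeholders used in the documented examples.
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2}")
AGE_RE = re.compile(r"\d+\.\d{2}")


def format_age(years: float) -> str:
    """Render an age in years with exactly two digits after the decimal point."""
    return f"{years:.2f}"


def row_problems(row: dict[str, str], project_keys: set[tuple[str, str]]) -> list[str]:
    """Return human-readable problems found in one subject.tsv row (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not row.get(field):
            problems.append(f"missing required field: {field}")
    # Composite foreign key into project.tsv.
    fk = (row.get("project_id_namespace", ""), row.get("project_local_id", ""))
    if fk not in project_keys:
        problems.append(f"no project.tsv row with key {fk}")
    # Optional fields are empty strings in a TSV when absent, so skip them then.
    ts = row.get("creation_time", "")
    if ts and not TIMESTAMP_RE.fullmatch(ts):
        problems.append(f"creation_time has unexpected shape: {ts!r}")
    age = row.get("age_at_enrollment", "")
    if age and not AGE_RE.fullmatch(age):
        problems.append(f"age_at_enrollment needs two decimal places: {age!r}")
    return problems


ns = "tag:example.org,2021:"  # hypothetical id_namespace
projects = {(ns, "PROJ-1")}
row = {
    "id_namespace": ns,
    "local_id": "SUBJ-001",
    "project_id_namespace": ns,
    "project_local_id": "PROJ-1",
    "granularity": "<term from the granularity table>",  # placeholder, not a real CV term
    "creation_time": "2021-01-08T00:45:40-04:00",
    "age_at_enrollment": format_age(34.5),
}
print(row_problems(row, projects))  # []
print(row_problems({**row, "project_local_id": "PROJ-2"}, projects))  # one foreign-key problem
```

The timestamp check is purely structural. A stricter validator could parse the value, but it would then have to special-case the documented `00` month and day placeholders, which a standard RFC 3339 parser rejects.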