Minutes_Standards_2021 12 - airr-community/airr-standards GitHub Wiki

Standards Call 2021-12

Agenda

  • Follow-up on Travis/Github Actions (#484, #541, #546)
  • Christian found some notes for Call 2021-08, which did not make it into the minutes. Did we actually discuss the following points?
    1. TSV output SHOULD be supported for all data that is commonly represented in tabular structures
    2. All API endpoints SHOULD return both JSON and TSV. Endpoints for strongly nested objects MAY be defined as JSON-only and reject TSV requests
    3. If an endpoint returns both JSON and TSV the content MUST be equivalent
  • Terminology documents (see Call 2021-10, #549).
  • Follow-up on example Cell schema files (#409), converted 10X data set is here: https://hub.dkfz.de/s/GXGJdLcgnJm9jJa
  • Can we resolve the issue around keywords_study? (#515)
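
Point 3 of the Call 2021-08 notes (JSON and TSV content MUST be equivalent) could be checked along the following lines. This is only a sketch: what "equivalent" means in detail was not defined in the notes, so the code assumes flat records holding only strings and integers, assumes the JSON body is a plain list of records, and compares values after converting them to strings. The function names are hypothetical.

```python
import csv
import io
import json


def rows_from_tsv(text):
    """Parse a TSV body (header line first) into a list of dicts of strings."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]


def rows_from_json(text):
    """Parse a JSON list of flat records, converting every value to a string.

    Assumption: null maps to the empty string, as it would appear in TSV.
    """
    records = json.loads(text)
    return [
        {key: "" if value is None else str(value) for key, value in rec.items()}
        for rec in records
    ]


def is_equivalent(json_text, tsv_text):
    """Return True if both bodies hold the same records in the same order."""
    return rows_from_json(json_text) == rows_from_tsv(tsv_text)


json_body = '[{"sequence_id": "s1", "junction_length": 45}]'
tsv_body = "sequence_id\tjunction_length\ns1\t45\n"
print(is_equivalent(json_body, tsv_body))
```

Booleans, floats and nested objects would need an agreed serialization before such a comparison is meaningful, which is one reason strongly nested endpoints might remain JSON-only.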

Minutes

Meta

  • Date: Mon, 2021-12-06 19:00 UTC
  • Present: Brian, Chaim, Christian, Jason, Kira, Ulrik
  • Regrets: Scott

Topics

  • Travis/Github Actions: Github Actions has worked fine for the most recent merges. Jason will disable Travis for now, and we will monitor things until the next meeting.
  • Call 2021-08 notes: Points were potentially discussed but clearly no decision on them was made. Brian will bring up points 2+3 with ComRepo later this week. Will revisit this topic in the next call.
  • Terminology documents: The general idea is to come up with the list of terms to be used in the future, but they do not necessarily need to be identical to the ones used in the past. However, the old terms will be included in the list, redirecting to the current term for the respective entity. Christian will put the merged list of terms from the various documents on Github by the end of the week for consolidation.
  • Cell schema files: Example JSON looks good; we should finally be able to close this soon. Further remarks from the call:
    • JSON format is rather large (625 MB for a 10X GEX data set), but compresses well (down to 23 MB). Also, it is simple to import into MongoDB.
    • While property is the term used by OpenAPI to describe entities within an object, feature_id would be the more generic term used in data science and would underscore the following point, namely that it SHOULD hold a CURIE.
    • To increase interoperability, property SHOULD hold PIDs that are appropriate for the data type at hand. This can be Ensembl for genes or an ID from the Antibody Registry for antibodies used in cytometry experiments. However, as experimentalists might use reagents that are not available in a global catalog, custom strings MUST also be supported. These MUST NOT look like CURIEs. Also, any given cell_id/property pair MUST be unique.
    • We should include an explanation in the docs that technically, the Cell object schema can be used to represent cells for which no rearrangement data is available. This could be due to failed amplification of IG/TR transcripts or due to the cell not expressing IG/TR at all. However, it should not be used to replicate arbitrary GEX data sets into the ADC, if there is no AIRR connection.
  • keywords_study: Will leave current fields in place, but change the proposed keywords to consistently distinguish between objects that exist in the data set in the repository and features of the experimental setup. Discussion will be continued and finalized in #569.
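
The rules for property from the Cell schema discussion above (CURIE where possible, custom strings that do not look like CURIEs, unique cell_id/property pairs) could be checked roughly as follows. This is a sketch under assumptions: the regular expression is a simplified notion of "looks like a CURIE" (prefix, colon, reference without whitespace), the set of known prefixes is supplied by the caller, and treating a CURIE-like value with an unknown prefix as a problem is one possible reading of the MUST NOT rule. The function names, prefixes and identifiers are illustrative placeholders.

```python
import re

# Simplified, assumed shape of a CURIE: "prefix:reference", no whitespace.
CURIE_PATTERN = re.compile(r"^[A-Za-z_][A-Za-z0-9_.\-]*:\S+$")


def looks_like_curie(value):
    """Return True if the value has the assumed prefix:reference shape."""
    return bool(CURIE_PATTERN.match(value))


def check_cell_properties(records, known_prefixes):
    """Return a list of human-readable problems found in the records.

    Each record is a dict with at least "cell_id" and "property".
    """
    problems = []
    seen = set()
    for rec in records:
        key = (rec["cell_id"], rec["property"])
        # Any given cell_id/property pair MUST be unique.
        if key in seen:
            problems.append(f"duplicate pair: {key}")
        seen.add(key)
        value = rec["property"]
        # Custom strings MUST NOT look like CURIEs, so anything CURIE-like
        # is expected to carry a prefix we recognize.
        if looks_like_curie(value):
            prefix = value.split(":", 1)[0]
            if prefix not in known_prefixes:
                problems.append(f"CURIE-like value with unknown prefix: {value}")
    return problems


example = [
    {"cell_id": "c1", "property": "ENSG:ENSG00000000001"},  # placeholder ID
    {"cell_id": "c1", "property": "in-house reagent 7"},    # custom string
    {"cell_id": "c1", "property": "ENSG:ENSG00000000001"},  # duplicate pair
    {"cell_id": "c2", "property": "foo:bar"},               # unknown prefix
]
for problem in check_cell_properties(example, known_prefixes={"ENSG"}):
    print(problem)
```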