Minutes_Standards_2021 10 - airr-community/airr-standards GitHub Wiki

Standards Call 2021-10

Agenda

  • Follow-up on Travis/Github Actions (#484, #541, #546)
  • Follow-up on implementation of CellProperties, both for on-disk as well as in the schema (#409).
  • JSON/YAML support: Clean up docs (#547)
  • Follow-up on "AIRR compliance" (Call 2021-09)
  • Follow-up on terminology documents (Call 2021-08, #549).
  • Pushed over from Call 2021-09:
    • Current status of the discussion on Germline/Genotype representation in the Schema (#530).
    • Review clonal abundance calculations (#543) and revisit *_count fields (#161).
    • DataProcessing sprint is done, need to turn the discussion and results into action items and Github issues.

Minutes

Meta

  • Date: Mon, 2021-10-04 18:00 UTC
  • Present: Brian, Felix, Jason, Kira, Christian, Francisco, Susanna, Scott
  • Regrets: Chaim

Topics

  • Travis/Github Actions: R-CMD is now running on both Travis and Github Actions. Both take around 4 min for a complete run, but Actions runs conditionally (i.e., it only tests if something was changed). Github Actions also seems to be better integrated into Github and would provide us with twice as many credits/minutes, at apparently equal execution speed. Jason will run one more performance comparison between the two and look at conditional testing for Travis.
  • CellProperty implementation:
    • Every property will have one (locally) unique feature_id (which is a label rather than an ID). Additional PIDs for a feature_id might be included as:
      • additional keys in the JSON
      • normalized representation in a relational DB backend
      • additional files for a matrix TSV, as defined by AnnData.
    • We agreed that the TSV and the JSON may look structurally different, i.e., the TSV is a matrix (one row per cell) while the JSON is a normalized hash-of-hashes.
    • Normalization (the division operation, not the data representation) is useful when data is queried in a repository, but people will usually want to download non-normalized data to apply their own routines. This might ultimately require storing at least two levels of the same data (i.e., two matrices).
    • To keep moving, we will only support a single feature_id for now, but keep a future extension of feature IDs/labels open. The API will deliver JSON-formatted data, which will be identical to the on-disk JSON file. The TSV will be a plain matrix with a column header and a cell_id index column. This may later be expanded with further AnnData files.
  • JSON/YAML support: Brian will work on the docs; discussion will continue next call.
  • Terminology documents: Agreement that there should be only one document, but it may exist on both RTD and Zenodo. Will need to discuss the hierarchy of the two platforms next time.
  • *_count fields: Pushed to next call, but will have priority then, as the SFU team would like to reach closure on clone counts (#543), and the various rearrangement counts (#161) are closely linked. Please comment on these issues soon.
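
Illustration for the CellProperty topic: a minimal Python sketch of how the two agreed on-disk shapes could relate, i.e., a JSON hash-of-hashes flattened into a plain TSV matrix with a header row and a cell_id index column. The JSON layout (cell_id as outer key, feature_id as inner key), the example labels and values, the function name and the fill value of 0 for absent features are all assumptions made for illustration; none of them has been decided in the schema.

```python
import csv
import io
import json

# Hypothetical hash-of-hashes: cell_id -> {feature_id -> value}.
# The real key layout has not been fixed in the schema yet.
cell_properties_json = """
{
  "cell_1": {"CD3": 12, "CD19": 0},
  "cell_2": {"CD3": 0, "CD19": 7, "CD8": 3}
}
"""


def hash_of_hashes_to_tsv(data, missing=0):
    """Flatten {cell_id: {feature_id: value}} into a TSV matrix string.

    One row per cell, one column per feature_id, preceded by a cell_id
    index column. `missing` (an assumption) fills features absent for a cell.
    """
    # Collect feature_ids in first-seen order to get stable columns.
    feature_ids = []
    for props in data.values():
        for feature_id in props:
            if feature_id not in feature_ids:
                feature_ids.append(feature_id)

    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["cell_id"] + feature_ids)
    for cell_id, props in data.items():
        row = [props.get(feature_id, missing) for feature_id in feature_ids]
        writer.writerow([cell_id] + row)
    return buf.getvalue()


data = json.loads(cell_properties_json)
tsv_text = hash_of_hashes_to_tsv(data)
print(tsv_text)
```

The sketch covers only a single feature_id label per column, matching the decision to support one feature_id for now; additional PIDs or AnnData side files would need a different structure.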