Minutes_Standards_2022 02 - airr-community/airr-standards GitHub Wiki

Standards Call 2022-02

Agenda

  • GitHub Actions: Follow-up with Jason (#484, #541, #546)
  • Report from the ExCom/Co-Leads meeting
  • AIRR VI is coming up, any organizational things that this WG wants to discuss?
  • Terminology documents (see Call 2021-10, #549).
  • Germline-related objects: Introduced and further updated in PRs #530, #566 and #568. Still numerous open questions (#559, #562) that require a discussion about scope, intended use cases, and potential MiAIRR inclusion.
  • Generic Contributor object: The schema now has multiple data structures that annotate researchers involved in a study and their roles. We should try to consolidate these. (#552)
  • Call 2021-08 notes: Follow up on ComRepo feedback (also see Call 2021-12), suggested restructuring:
    • TSV output SHOULD be supported for all data that is commonly represented in tabular structures
    • All API endpoints SHOULD return JSON encodings
    • For some API endpoints it is possible to request TSV files
    • For any endpoints supporting TSV encodings these will be clearly documented in the API Spec and/or Documentation
    • Endpoints that do not support TSV can reject TSV requests
    • If an API endpoint returns a field, then the content of that field in JSON and TSV MUST be equivalent
    • Bottom line: API Endpoints that are JSON or JSON+TSV should be defined in the spec/docs - the repositories don't get to choose.
  • Cell schema (#409) and associated pull request (#574)
  • keywords_study (#515, #569)
  • Restarted activity on (#404): Should we try to have a closer integration of the Receptor object and MHCGenotype from the Germline schema?
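
The equivalence requirement in the suggested ComRepo restructuring (a field returned in both JSON and TSV MUST have equivalent content) could be checked roughly as follows. This is a minimal sketch, not part of the API spec: the function names `tsv_cell` and `encodings_equivalent` are hypothetical, and the cell encoding (null as an empty string, booleans as `T`/`F`, everything else via `str`) is an assumption made for illustration.

```python
import csv
import io
import json


def tsv_cell(value):
    """Encode one JSON value as a TSV cell (assumed convention, not from the spec)."""
    if value is None:
        return ""
    if isinstance(value, bool):
        return "T" if value else "F"
    return str(value)


def encodings_equivalent(json_text, tsv_text):
    """Compare every field present in both encodings, row by row."""
    json_rows = json.loads(json_text)
    tsv_rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    if len(json_rows) != len(tsv_rows):
        return False
    for j_row, t_row in zip(json_rows, tsv_rows):
        # Only fields returned in both encodings are subject to the rule.
        for field in set(j_row) & set(t_row):
            if tsv_cell(j_row[field]) != t_row[field]:
                return False
    return True


# Hypothetical responses from one endpoint in both encodings.
json_text = '[{"sequence_id": "s1", "productive": true, "junction_length": 45}]'
tsv_text = "sequence_id\tproductive\tjunction_length\ns1\tT\t45\n"
print(encodings_equivalent(json_text, tsv_text))
```

Comparing only the shared fields reflects the wording of the rule, which constrains fields that are returned rather than requiring both encodings to expose the same set of fields.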

Minutes

Meta

  • Date: Mon, 2022-02-21 19:00 UTC
  • Present: Aditi, Adrien, Brian, Christian, Felix, Jason, Jingyun, Scott, William, Ulrik
  • Regrets: -

Topics

  • GitHub Actions: No issues so far. Travis is currently disabled, except for the release-1.3 branch, with only ~600 credits remaining.
  • Ontology update: Basic AIRR specification-based ontology checker working, uses a custom AIRR Spec at the moment so it will not work in general. Related to #524 discussion.
  • We should strive for v1.4 release before AIRR-C Meeting VI:
    • Set AIRR v1.4 Milestone due date to April 22, which is the Friday before the April ComRepo meeting, so ComRepo has some time to discuss before the AIRR-C meeting.
    • Major v1.4 issue for the ADC is the identifier definitions.
    • Everyone please update their issues and pull requests with the correct Project and target Milestone.
    • We will review the v1.4 Milestone and decide what to include at the March Standards WG meeting.
  • Terminology document: There is now a consolidated list of terms in a Google Sheet, please review and comment by 2022-02-27.
  • keywords_study: PR #569 has been merged.
  • Continuing Germline discussion:
    • Referencing and updating allele records (#559):
      • A GermlineSet cannot contain multiple versions of an allele, as there is no full-fledged support for this in the schema. However, the release_version and release_date fields can be used to indicate that changes took place in an AlleleDescription between different GermlineSet records.
      • In case "temporary labels" are assigned to alleles, these labels will - similar to regular gene symbols - be used for annotation in the Rearrangement.[vdj]_call fields.
      • There is no coordinated process to increment the release_version of an AlleleDescription and/or a GermlineSet. Users will copy a germline set from a germline set repository (e.g., OGRDB) and put it into their study to ensure tight linkage. If users change either of these records during their analysis procedure, they are expected to change the version and date, but these changes will remain local to the study using the modified records. Whether and how these changes flow back to the germline set repository is beyond the scope of the Standards WG. It should be noted that extended alleles will be treated like new alleles and thus also trigger a version increase. William will add some guidance on this in the Docs.
    • Extended discussion around identifiers and references (#562):
      • It seems clear that _id fields for ancillary objects (e.g., SequenceDelineationV) are not necessary.
      • The main purpose of _id fields for now is to allow local linkage between objects within a study, which is expected to include a copy of the utilized GermlineSet and Genotype records. This requires uniqueness of the ID, but not resolvability.
      • Updated versions of a utilized GermlineSet may be retrieved via the germline_set_ref, which may contain a CURIE. This reference needs to be resolvable, but not necessarily unique. It is not expected that AIRR-seq repositories use this reference to resolve queries (e.g., for a GermlineSet), as the records included in a study are considered to be the most accurate ones for this study.
      • The generic part of the discussion around IDs, PIDs, GIDs, persistence and resolvability has been moved to #347.
  • Next call on Mon, 2022-02-21. Mind the (daylight saving time) Gap!
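
The distinction drawn in the identifier discussion (#562), where _id values must be unique within a study while a germline_set_ref only needs to be resolvable, can be illustrated with a small check. This is a sketch under stated assumptions: the helper `duplicate_ids`, the record layout, and the example values (including the `EXAMPLE:` CURIE prefix) are hypothetical and not taken from the schema.

```python
from collections import Counter


def duplicate_ids(records, id_field):
    """Return the sorted ID values that occur more than once in a study."""
    counts = Counter(
        r[id_field] for r in records if r.get(id_field) is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)


# Hypothetical study-local copies of GermlineSet records. Both point at the
# same external reference, which is acceptable because the reference does not
# need to be unique; the local IDs, however, must differ.
germline_sets = [
    {"germline_set_id": "gs-1", "germline_set_ref": "EXAMPLE:set-a"},
    {"germline_set_id": "gs-2", "germline_set_ref": "EXAMPLE:set-a"},
]

print(duplicate_ids(germline_sets, "germline_set_id"))
```

An empty result means the local IDs can be used for linkage within the study; nothing in this check attempts to resolve the reference.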