Second Call

Notes from the second call on COrDa work after the Hackday - 15 March 2019

Introductions - a couple of new attendees, so everyone gave a quick overview of their role and institution.

Review of Hackday outputs

The write-up was reviewed and agreed to be near complete - Will & Rory to move the EPrints work into the code repository, Owen to provide the RingGold work.

All to review the site and remove placeholder information; AVM to update it to reflect progress.

Potential updates to the completed work were discussed, e.g. revising the connectors to generate outputs for the ORCID schema rather than the first-pass JSON (this work is more relevant given the later discussions):

https://github.com/ORCID/ORCID-Source/tree/master/orcid-model/src/main/resources/record_2.1
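As a strawman for that revision, a minimal sketch of what such a transform might look like - assuming a hypothetical first-pass connector record with `orcid`, `given` and `family` fields (those names are illustrative, not a fixed interface) and targeting the general shape of the ORCID record 2.1 JSON linked above:

```python
def to_orcid_record(raw: dict) -> dict:
    """Map a hypothetical first-pass connector record onto an
    ORCID-record-2.1-shaped dict. Field names on the input side
    are assumptions for discussion only."""
    orcid = raw["orcid"]
    return {
        "orcid-identifier": {
            "uri": f"https://orcid.org/{orcid}",
            "path": orcid,
            "host": "orcid.org",
        },
        "person": {
            "name": {
                "given-names": {"value": raw.get("given", "")},
                "family-name": {"value": raw.get("family", "")},
            }
        },
    }

# Example using ORCID's well-known fictional test researcher
print(to_orcid_record({"orcid": "0000-0002-1825-0097",
                       "given": "Josiah", "family": "Carberry"}))
```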

Discussion of need for a memory

It was obvious near the end of the hackday that the system would not be performant without some form of memory. This takes the initial one-stage, real-time concept off the roadmap in favour of a more nuanced storage-based model: queries to the connected systems are queued and their results returned as they complete - this may be in near real time, or a report-ready notification may need to be generated.
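A minimal sketch of that queued model, where `fetch_from_source`, `store` and `notify` are hypothetical placeholders for the real connectors, storage layer and notification mechanism - none of which are decided yet:

```python
import queue
import threading

jobs: "queue.Queue[dict]" = queue.Queue()

def fetch_from_source(job: dict) -> dict:
    # Placeholder for a real connector call (CRIS, EPrints, RingGold, ...)
    return {"source": job["source"], "orcid": job["orcid"], "payload": {}}

def store(result: dict) -> None:
    print("stored:", result)          # stand-in for the storage layer

def notify(job: dict) -> None:
    print("report ready for:", job)   # stand-in for a notification

def worker() -> None:
    # Drain the queue: execute each queued query, store the result,
    # then signal that a report is ready.
    while True:
        job = jobs.get()
        store(fetch_from_source(job))
        notify(job)
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()
jobs.put({"source": "cris", "orcid": "0000-0002-1825-0097"})
jobs.join()  # block until the queued query has been processed
```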

JAKE

The system design proposed in the link above was reviewed and discussed: to provide the system with a memory and a cache, the connectors would be linked to Elasticsearch ingestors.

It was noted that Elasticsearch is primarily an index, not a storage layer (cache); even though it has been leveraged as one before, it should have a proper storage layer underneath it.
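One way that pattern could look, purely as a sketch: the canonical record lives in real storage (SQLite here, chosen only for illustration) and Elasticsearch holds a derived, rebuildable index. This assumes the official `elasticsearch` Python client (8.x) and a cluster at localhost:9200; nothing here is a decided design.

```python
import json
import sqlite3
from elasticsearch import Elasticsearch  # official client, 8.x API assumed

db = sqlite3.connect("corda.db")
db.execute("CREATE TABLE IF NOT EXISTS records (orcid TEXT PRIMARY KEY, doc TEXT)")
es = Elasticsearch("http://localhost:9200")

def ingest(record: dict) -> None:
    """Write-through: store the canonical record first, then index it."""
    db.execute("INSERT OR REPLACE INTO records VALUES (?, ?)",
               (record["orcid"], json.dumps(record)))
    db.commit()
    # The index is disposable - it can always be rebuilt from the table above.
    es.index(index="corda-records", id=record["orcid"], document=record)
```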

The general consensus was that the priority is to develop a better understanding of the data and how to store and retrieve it, rather than how to index and display it.

In light of this, JAKE is on hold as something that might become a later phase once the storage / data model has been developed.

Relational model

In order to derive the best model for the data being passed around the system, it was proposed to do some formal modelling:

  • Entity-relationship modelling (ERM) was suggested as the main approach to the modelling process (a first strawman sketch follows below)

  • in preparation for, and during, the hack day we could look at deriving a model for producing the data, based on the ORCID schema and the target sources for the Abstract Layer
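As a starting point for that exercise, a deliberately naive sketch of candidate entities and relationships; every name and field here is an assumption to argue over, not a decision:

```python
from dataclasses import dataclass

@dataclass
class Source:
    name: str            # e.g. "eprints", "cris", "ringgold"
    base_url: str

@dataclass
class Person:
    orcid: str           # the ORCID iD - the natural key across sources

@dataclass
class Assertion:
    # One source claiming an iD for a person - the relationship entity
    person: Person
    source: Source
    authenticated: bool  # was the iD collected via the authenticated flow?
    provenance: str      # free text for now, e.g. "originally from Crossref"
```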

Real data processing

For the hack day, in order to set up test systems, we will be looking for volunteers to provide access to test data sets.

Provenance

It was noted that it is important to surface the source of each ORCID iD and any provenance information (if available), e.g. from a CRIS (originally from Crossref, metadata authenticated).
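Purely illustrative, one possible shape for carrying such a provenance chain alongside an iD - the structure and field names are assumptions, not an agreed format:

```python
# Oldest hop first, so the chain reads as the iD's journey to us
provenance = [
    {"system": "crossref", "note": "metadata authenticated"},
    {"system": "cris",     "note": "harvested from Crossref"},
]
record = {"orcid": "0000-0002-1825-0097", "provenance": provenance}
print(record)
```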

Going forwards

- Look for sources of test data for the Community Event test systems

- Design the data model and abstraction layer

- Devise test protocols - round trips between the registry and integrations, and basic dashboard reporting

- Stakeholder interactions - finalise the models to be presented before Manchester