ET WDC 2020 2 - wmo-im/et-acdm GitHub Wiki
Notes from the teleconference via BlueJeans, 13:00-15:00 CEST, 28 April 2020
Participants:
- Jörg Klausen (JKl, chair)
- Markus Fiebig (MF)
- Tom Kralidis (TK)
- Atsuya Kinoshita (AK)
- Kjetil Torseth (KT)
- Nate James (NJ)
- Judd Welton (JW)
- Anatoly Tsvetkov (AT)
- Enrico Fucile (EF)
- Claudia Volosciuk (CV)
- Debra Kollonige (DK)
- Gao Chen (GC)
- Oystein Godoy (OG)
- Christopher Lehmann (CL)
- Rayn Stauffer (RS)
- Stoyka Netcheva (SN, Rapporteur)
- Drasko Vasiljevic (DV, WMO consultant)
Excused
- Vincent-Henri Peuch
- Martine De Mazière
Agenda
- Welcome and acceptance of the proposed agenda (JKl, 5’)
- Approval of minutes from the meeting on 11 March – corrections, comments (JKl 5’)
- Round Table from last meeting (all, 60’). The purpose is to develop action items related to the presentations and examples, or to propose other experts to give a talk or clarification, and to try to reach agreement. The aim is to identify what is driving us and where we want to go, so that we can make progress in a harmonious way on how to develop the WDCs in future, and to propose action items for our path forward.
- Summary WMO contractor’s plan and progress (DV, 15’)
- Highlights from SSC review of ToR of ETs and SAGs (JKl, 5’)
- Proposed new method to follow up on Action items and use of wiki
- Review of Action Items with upcoming/past deadlines (SN, 20’)
- Next meeting (JKl, 10’).
1. Minutes from last meeting were approved.
2. Claudia Volosciuk will start working part time with the WDCs and GAW Contributing Networks for the next 6 months. She will document the data QA/QC protocols and automated tools adopted by individual centers and networks. The intention is to create a library of existing tools, descriptions, and training materials to help centers and networks that need to improve the quality of their data. Everybody’s cooperation will be highly appreciated.
One item added to the proposed agenda is a discussion of the current web presence and the proposed use of the wiki, to be held at the end.
3. Round Table from the last telecon:
DOIs have been discussed a few times in the past in connection with observations and archiving in the WDCs. At this point there is no clear vision of what the WDCs need and how they should evolve to get there. We want to hear what you already use in your DC, or what plans you have for implementing DOIs and other identifiers. Sharing what has already been implemented, and how, will help us arrive at a solution.
MF on NILU/EBAS/WDCRG use of DOIs: As a result of involvement in multiple projects, incl. ICOS, the center/network started using primary DOIs with fixed granularity and hierarchy. Results are expected at the end of the year. During the development phase, proper research should be done on considerations including future availability and how the concept might change in the future. DOIs are now assigned to all data sets in the current architecture. One data set is one continuous observation from one institution, one instrument, and one station, and it can include multiple observed variables. How a data set is defined is decided by each data center or data repository. Persistent identifiers (PIDs) are given to all data pre-products; this is a prerequisite for provenance, which requires having all entities in the data production flow, incl. issues (history), data holdings, and documentation on attribution. This is still not aligned with WIGOS XML. Some vocabularies are missing.
TK on WOUDC: DOIs are implemented at data set level for all data sets. There are 2 DOIs: a total ozone column DOI and an ozonesonde DOI. SAG O3 and UV requested that DOIs be assigned per station rather than per data set. This is needed for publications, to identify where the data set is. This approach might be better achieved in OSCAR or elsewhere upstream. The DOIs are included in the WIS metadata.
JKl: Any changes to the current setup of OSCAR should be requested by submitting a formal request to the OSCAR application board on behalf of this group. That will carry more weight and get appropriate attention. The general position of the OSCAR team has been that DOIs are not a priority, and receiving a request from the group could change this. OSCAR has something at station level, the WIGOS station ID, which acts as a PID and digital object identifier but is not registered in any registry. This could be a small thing to do, but it needs a united push.
EF: The main problem for DOIs is deciding what they reference. Is the goal to provide a unique ID or to reference a DOI? We already have an ID. What are the requirements for DOIs or identifiers when submitting for publication: to present the DOI of a data set, or a link to the available data? A unique ID for stations can be provided easily (the WIGOS ID), and a link could be provided through a URL.
TK: WOUDC assigns identifiers, but they are not DOIs.
JW: Using the station ID is problematic, with the South Pole station as an example: with different activities at one site, it could be difficult to identify the different projects and datasets.
JKl: The main question is what you want to describe. The WIGOS Station Identifier identifies a station and is not sufficient to describe a data set at a station. Can it be minted as a DOI? We can have a DOI for the particular data set you want to reference. We need to know what content is behind a DOI. In the WMO view, it will be the record in OSCAR. When selecting DOIs we need to know who will manage them and how.
GC: The South Pole experiment allowed DOIs to be managed by different groups (each data-providing group) at one and the same station. For some needs it could be important to identify programs, while for others it might not be. NOAA is doing it under their own DOIs. This “at station level” approach is valid for NOAA. It is not a problem if WMO looks at the South Pole at station level. We will have our own DOIs for specific purposes, and there is no conflict.
MF: We could end up with several DOIs per station: one per group, per data set. We use 1 DOI per continuous dataset per instrument per station. We allow an instrument swap if all instruments are calibrated and are the same model.
JKl: In WIGOS an instrument is tied to a deployment, i.e., an instrument change involves a new deployment. Several deployments are combined to create one observation. Observations are defined by variable and geometry at a station/platform/observing facility. Stations can be combined further into clusters/sets. This is a hierarchy for homogeneous data sets. We agreed as ET-WDC that we will work as DCs on DOIs, on how to move towards implementation, and on a work plan to follow. What can be done to accelerate the process? Some groups have implemented approaches, others have not. The WOUDC and NILU approaches are known. AI: Each DC to send a few lines to SN for the minutes.
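The hierarchy JKl describes can be illustrated with a minimal sketch. This is not the WIGOS metadata model itself: the class names, field names, and example values (including the station identifier) are assumptions chosen only to show how an instrument change adds a deployment without creating a new observation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Deployment:
    instrument: str  # hypothetical instrument label
    start: str       # ISO date on which this deployment began


@dataclass
class Observation:
    # An observation is defined by variable and geometry at a station.
    variable: str
    geometry: str
    deployments: List[Deployment] = field(default_factory=list)


@dataclass
class Station:
    wigos_id: str  # placeholder value below, not a real identifier
    observations: List[Observation] = field(default_factory=list)


def change_instrument(obs: Observation, instrument: str, start: str) -> None:
    """An instrument change is recorded as a new deployment of the same observation."""
    obs.deployments.append(Deployment(instrument, start))


ozone = Observation(variable="total ozone column", geometry="point")
station = Station(wigos_id="0-00000-0-EXAMPLE", observations=[ozone])
change_instrument(ozone, "instrument-A", "2015-01-01")
change_instrument(ozone, "instrument-B", "2019-06-01")
```

After the two calls the station still holds a single observation, now with two deployments, which is the level at which a homogeneous data set (and possibly a DOI) could be attached.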
JW: Would like DOIs per station that are related to the observations for MPLNET and GALION, but this will require scoping out. Wider would be better, but there is no time to implement this at the moment.
CL: UoI has experience with DOIs, which carry the university’s identifier, but at station level DOIs are created at the host institutions. Is there guidance on who should create them, for example whether the station itself should do so?
MF: DOIs carry the prefix of the institution, and a DOI has a framework and background that can be associated with multiple networks. A DOI is not meant for transporting additional information.
GC: DOIs are for record ID. They are not for crediting funding agencies.
MF: What mechanism should be used to account for credit? A prefix? Or metadata? But not DOIs.
GC: It needs to be simple and work for 10 years; it should be a simple number. Maybe as a first step we could decide on the order of elements: for example, start with the station or a prefix, then in second place the field campaign type and name (aircraft, campaign, suffix, or something like that), followed by instrument names and so on, including all needed information in an agreed order. Credit could be given on the landing page. The version is the most important part.
JKl: DOI is just an identifier – license plate for a digital entity. There are models for describing its content, e.g. for attribution, to describe licenses, ownership etc.
GC: The landing page needs to name the data provider, for credit and for questions: who made the measurement? It should also state in writing that the data provider is to be contacted regarding data use.
OG: DOIs are citable and traceable, unlike a plain identifier.
JKl: We are now trying to create a status document on where we are. Guidance will follow if there is not enough clarity. AI 2020-1: Create a status document. Action item for everybody to provide information on their data centre or network, to be discussed at our next meeting in May. The collection of DOI status information will be followed by a guidance document if clarity does not exist. A common understanding is important for the community to make progress.
AT: DOIs are an important discussion, but the radiation data center needs documenting and sharing of QC/QA procedures, systems, existing tools, and practices. This is a long-standing topic to revive.
AK: WDCGG is in the process of assigning DOIs this year.
JKl: Invites AK to consult with SAG GG and other people (this team) before implementation. Was external input considered? AI 2020-2: Atsuya to describe at the next telecon what the plan for DOI implementation is and who was consulted; to consult with the SAG and the team on implementation; and to report back at the next telecon of ET-WDC.
JKl: Back to Anatoly’s question on QC/QA. CV will be working on this outstanding topic and will contact everybody to document it. WDCs and networks are expected to provide information and share it when possible: what you do, how it works, and which tools are applied to time series or other data to determine whether there is a data issue. Let’s benefit from the experience of others. Everybody who wishes to can give presentations or send documents to CV.
KT: Comment on the collection of QC/QA documentation. Information probably exists to differing extents, and he is concerned about going into much detail in describing it for archiving purposes only. Suggests focusing on a review of how it is done, which will be useful for users and for improving practices and performance, rather than on collecting information.
JKl: Extracting and compiling everything in one place is not needed and is not the intention; such a compilation would be prone to becoming outdated. But DCs need to be more transparent and make information available to users, so that they can be challenged and appreciated. It would be good to make the information available on each DC’s website, in the same place and in a similar way. It depends on what is available and what approaches have been taken. There is a clear link to the SAGs: data QC is clearly related to the science, and the link needs to be maintained. To be reviewed in a future telecon.
4. Summary WMO contractor’s plan and progress
DV gave an overview of what he is working on to help implement WIGOS, his progress, and his plans. Most participants have already been contacted and work has been initiated; for some, discussions are already advanced. Gathering and encoding of information is in progress, with the intention of facilitating automated XML uploads to OSCAR.
JKl: One comment: OSCAR also provides metadata in XML. The first method is manual download; second, one can try the API to access XML files directly; the third option is the OAI-PMH server that has been set up and can be introduced to the group at the next telecon.
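As background for the third option, OAI-PMH requests are plain HTTP requests whose query string carries a `verb` and its arguments. The sketch below only builds such request URLs and performs no network access. The base URL is a placeholder, not the address of the OSCAR instance (which these notes do not give); `oai_dc` is used because the protocol requires every repository to support it, and any WIGOS-specific metadata prefix would have to be discovered with `ListMetadataFormats`.

```python
from urllib.parse import urlencode

# Placeholder endpoint: an assumption for illustration, not the OSCAR server.
BASE_URL = "https://example.org/oai"


def oai_request_url(verb: str, **params: str) -> str:
    """Build an OAI-PMH request URL for the given verb and arguments."""
    query = {"verb": verb, **params}
    return f"{BASE_URL}?{urlencode(query)}"


# Ask the repository to describe itself.
identify_url = oai_request_url("Identify")

# Ask which metadata formats (prefixes) the repository can deliver.
formats_url = oai_request_url("ListMetadataFormats")

# Harvest records in Dublin Core, the format every OAI-PMH server must offer.
records_url = oai_request_url("ListRecords", metadataPrefix="oai_dc")
```

A harvester would fetch `records_url`, parse the returned XML, and follow any `resumptionToken` the server returns to page through large result sets.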
AI 2020-3: Introduce OAI-PMH instance available (JKl)
MF: We have 2 servers: one operational and a second in development for the WIGOS version, where progress is held up by vocabulary issues. Help and collaboration on this are needed.
JKl: Enrico could help with a WIGOS task team discussion on what the issue is. He can invite the task team to the next telecon for discussion. A proposal for variables is to be prepared in advance with Drasko’s help and reported on at the next telecon. The document is to be shared with the team.
AI 2020-4: Clarify issues with vocabularies (MF, DV, JKl) and prepare CR for WIGOS metadata code tables.
5. Highlights from SSC review of ToR of ETs and SAGs
Discussions revolved around the responsibilities of each group, SAGs and ETs. The SAGs give direction on priorities, data submission, data processing, and calls for data submissions. ET-WDC, now named the Data Management group, is expected to be connected to all SAGs.
6. Proposed new method to follow up on Action items and use of wiki
JKl: Proposed a new method to track progress on Action Items via GitHub cards. Each Action Item is assigned to a person; when progress is made, the card is moved to the next box and later proposed for review and completion. Everybody needs to register/create an account so that tasks can be assigned to them and they can report on progress. The second proposal is to switch from the current group website on GitHub to the wiki. Advantages: unlike the website, it does not require a publishing step; it does not require additional software; it is less complex; and it removes the need to ask TK for help.
JW: Likes the current website. It is clean and well organized.
JKl: The wiki could be organized in the same way, with no difference, but we could leave things as they are now and, once the content is duplicated, review and ask for opinions again.
The team agrees on the proposed changes: AIs will be maintained in GitHub, and the website will be continued in the wiki.
7. Review of Action Items with upcoming/past deadlines
AI 2020-5: Collect input on progress on Action Items and create GitHub issues/project cards. Continue documentation of ET-WDC activities on wiki (SN)
8. Next meeting: May 26 13:00 CEST.
List of Action Items from the meeting:
AI 2020-1: To create status document. Action Item to everybody to provide information on each data centre and networks and discuss it on our next meeting in May (all).
AI 2020-2: Atsuya to describe at the next telecon what the plan for DOI implementation is and who was consulted; to consult with the SAG and the team on implementation; and to report back at the next telecon of ET-WDC (AK).
AI 2020-3: Introduce OAI-PMH instance available (JKl)
AI 2020-4: Clarify issues with vocabularies (MF, DV, JKl) and prepare CR for WIGOS metadata code tables.
AI 2020-5: Collect input on progress on Action Items and create GitHub issues/project cards. Continue documentation of ET-WDC activities on wiki (SN)