Data archival with Zenodo

Archival of datasets published to GBIF

Datasets should be archived directly into Zenodo.

  • The metadata published to GBIF should link to the Zenodo archive, and the Zenodo archive should link back to the DOI provided to the dataset by GBIF.

  • Add any archived datasets to the GBIF-Norway group on Zenodo.

  • Use the [email protected] account to upload datasets so they are not linked to your personal account (see the upload sketch after this list).

  • NB! When the source data are updated in the IPT, a new version should also be uploaded to Zenodo.
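
For reference, the upload can also be scripted against the Zenodo REST API. The sketch below is a minimal example rather than the established workflow: the access token, file name, GBIF DOI, relation type and the "gbif-norway" community identifier are all placeholder assumptions.

```python
# Minimal sketch: archive a dataset on Zenodo via its REST API.
# ZENODO_TOKEN, the file name, the GBIF DOI and the community
# identifier are placeholder assumptions.
import os
import requests

ZENODO_API = "https://zenodo.org/api/deposit/depositions"
TOKEN = os.environ["ZENODO_TOKEN"]

# 1. Create a new, empty deposition.
r = requests.post(ZENODO_API, params={"access_token": TOKEN}, json={})
r.raise_for_status()
deposition = r.json()

# 2. Upload the Darwin Core archive file to the deposition's file bucket.
bucket_url = deposition["links"]["bucket"]
with open("dwca-mydataset-v1.1.zip", "rb") as fh:
    requests.put(f"{bucket_url}/dwca-mydataset-v1.1.zip",
                 data=fh, params={"access_token": TOKEN}).raise_for_status()

# 3. Add metadata: link back to the GBIF DOI and add the record
#    to the GBIF-Norway community.
metadata = {
    "metadata": {
        "title": "Example dataset archived from GBIF.no",
        "upload_type": "dataset",
        "description": "Darwin Core archive and source data for ...",
        "creators": [{"name": "GBIF Norway"}],
        "communities": [{"identifier": "gbif-norway"}],  # assumed identifier
        "related_identifiers": [
            {"relation": "isIdenticalTo",  # or another suitable relation
             "identifier": "https://doi.org/10.15468/xxxxxx"}  # GBIF DOI
        ],
    }
}
requests.put(deposition["links"]["self"],
             params={"access_token": TOKEN}, json=metadata).raise_for_status()

# 4. Publish, which mints the Zenodo DOI to link from the IPT metadata.
requests.post(deposition["links"]["publish"],
              params={"access_token": TOKEN}).raise_for_status()
```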

Previously, data archiving was done in GitHub and Zenodo, and worked as follows:


Data archiving via GitHub & Zenodo requires each dataset to be in a separate repository: https://guides.github.com/activities/citable-code/

Archive dataset with GitHub and Zenodo

GitHub

  • Create a new repository for the dataset to be published via GBIF.no.
  • Configure write permissions for the team "GBIF.no" under Settings → Collaborators & teams.
  • Add the original source datafile provided by the data owner.
  • Add the Darwin Core archive file created by IPT.
  • Consider using Git LFS for large files: GitHub's per-file size limit is 100 MB, and Git in general does not handle large files well (see the size-check sketch after this list).
  • Create a simple README.md for the dataset repository.
  • Remember to include a text file LICENSE.txt (in GitHub).
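
Before committing, it can help to check for files that exceed GitHub's 100 MB limit and therefore need Git LFS. A small sketch; the "data" directory name follows the proposed layout further down and is an assumption.

```python
# Sketch: list files larger than GitHub's 100 MB per-file limit so they
# can be tracked with Git LFS before committing.
from pathlib import Path

LIMIT_BYTES = 100 * 1024 * 1024  # GitHub's hard per-file limit

def files_over_limit(root: str = "data") -> list[Path]:
    """Return paths under `root` whose size exceeds the GitHub limit."""
    return [p for p in Path(root).rglob("*")
            if p.is_file() and p.stat().st_size > LIMIT_BYTES]

if __name__ == "__main__":
    for path in files_over_limit():
        print(f"{path} is larger than 100 MB; track it with `git lfs track`")
```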

Zenodo

  • Log in to Zenodo and configure webhooks for the repositories (one-time setup).
  • Select your user profile, GitHub, and settings: https://zenodo.org/account/settings/github/
  • Toggle the switch from "off" to "on" for the GitHub repository of the dataset to archive.

Return to GitHub

  • Select the dataset repository and make a release (menu item "Releases"); a scripted alternative is sketched after this list.
  • Release-tag: v1.1
  • Release-title: [organization_short_dataset_name_version]

  • Add a DOI badge to the GitHub README.md (Zenodo provides the badge markdown for each archived release).
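
The release can also be created programmatically with the GitHub REST API; once the tag exists, the Zenodo webhook archives it as usual. A hedged sketch, with the owner, repository name and token as placeholders:

```python
# Sketch: create the dataset release (tag v1.1) via the GitHub REST API
# instead of the web UI. GITHUB_TOKEN, OWNER and REPO are placeholders.
import os
import requests

OWNER, REPO = "gbif-norway", "my-dataset-repo"  # assumed names
url = f"https://api.github.com/repos/{OWNER}/{REPO}/releases"
headers = {
    "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    "Accept": "application/vnd.github+json",
}
payload = {
    "tag_name": "v1.1",
    "name": "organization_short_dataset_name_version",  # release title
    "body": "Darwin Core archive and source data, archived to Zenodo.",
}
r = requests.post(url, headers=headers, json=payload)
r.raise_for_status()
print("Release created:", r.json()["html_url"])
# The Zenodo webhook picks up the release and creates the archive/DOI.
```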

GBIF.no IPT

  • Add the Zenodo DOI under Metadata → External links:
  • Name: Zenodo data archive
  • Download URL: https://doi.org/10.5281/zenodo....
  • Data format: Darwin Core archive

Dataset structure

Proposed layout (a small scaffolding sketch follows the tree):

├── LICENSE
├── README.md          <- README including basic metadata, or perhaps a separate file for the final metadata?
├── data
│   ├── raw            <- The original, immutable data dump, possibly also including scanned field forms etc.
│   ├── interim        <- Intermediate data that have been transformed into a machine-interpretable form
│   └── DwC-A          <- The final, mapped data (the Darwin Core archive)
│
├── docs               <- Supporting information, e.g. raw metadata and descriptions from the data owners in text form
│
└── code               <- The code used to transform and map the data
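
If it helps, the proposed layout can be scaffolded with a few lines of Python; the directory names simply mirror the tree above, and the repository name is a placeholder.

```python
# Sketch: scaffold the proposed dataset repository layout shown above.
from pathlib import Path

DIRS = ["data/raw", "data/interim", "data/DwC-A", "docs", "code"]
FILES = ["LICENSE", "README.md"]

def scaffold(root: str) -> None:
    """Create the proposed directory layout under `root`."""
    base = Path(root)
    for d in DIRS:
        (base / d).mkdir(parents=True, exist_ok=True)
    for f in FILES:
        (base / f).touch(exist_ok=True)

if __name__ == "__main__":
    scaffold("my-dataset-repo")  # placeholder repository name
```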