Data archival with Zenodo
Archival of datasets published to GBIF
- Datasets should be archived directly into Zenodo (a minimal upload sketch using the Zenodo API follows this list).
- The metadata published to GBIF should link to the Zenodo archive, and the Zenodo archive should link back to the DOI assigned to the dataset by GBIF.
- Add any archived datasets to the GBIF-Norway group.
- Use the [email protected] account to upload datasets so they are not linked to your personal account.
- NB! New versions of the source data updated in the IPT should also be updated in Zenodo.
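For reference, a minimal sketch of such an upload through the Zenodo deposition API rather than the web interface. The access token, file names, GBIF DOI, and the `gbif-norway` community identifier are placeholders/assumptions and must be replaced with the real values for the dataset.

```python
import requests

ZENODO_API = "https://zenodo.org/api/deposit/depositions"
TOKEN = "<access-token-for-the-shared-helpdesk-account>"  # placeholder

# 1. Create an empty deposition
r = requests.post(ZENODO_API, params={"access_token": TOKEN}, json={})
r.raise_for_status()
deposition = r.json()
dep_id = deposition["id"]
bucket_url = deposition["links"]["bucket"]

# 2. Upload the original source file and the Darwin Core archive from the IPT
for filename in ["source_data_original.xlsx", "dwca-dataset-v1.1.zip"]:  # placeholders
    with open(filename, "rb") as fp:
        requests.put(f"{bucket_url}/{filename}", data=fp,
                     params={"access_token": TOKEN}).raise_for_status()

# 3. Add metadata: link back to the GBIF DOI and add the record to the GBIF-Norway group
metadata = {
    "metadata": {
        "title": "Example dataset (source data and Darwin Core archive)",  # placeholder
        "upload_type": "dataset",
        "description": "Archived copy of a dataset published to GBIF via GBIF.no.",
        "creators": [{"name": "Owner, Data"}],  # placeholder
        "communities": [{"identifier": "gbif-norway"}],  # assumed community identifier
        "related_identifiers": [
            {"identifier": "https://doi.org/10.15468/xxxxxx",  # GBIF dataset DOI (placeholder)
             "relation": "isIdenticalTo"},
        ],
    }
}
requests.put(f"{ZENODO_API}/{dep_id}", params={"access_token": TOKEN},
             json=metadata).raise_for_status()

# 4. Publish the deposition (this mints the Zenodo DOI that the GBIF metadata should link to)
requests.post(f"{ZENODO_API}/{dep_id}/actions/publish",
              params={"access_token": TOKEN}).raise_for_status()
```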
Previously, data archiving was done in GitHub and Zenodo, and worked as follows:
Data archiving in GitHub & Zenodo requires each dataset to be in a separate repository: https://guides.github.com/activities/citable-code/
Archive dataset with GitHub and Zenodo
GitHub
- Create a new repository for the dataset to be published in GBIF.no (a scripted sketch using the GitHub API follows this list).
- Configure write permissions for the team "GBIF.no" under Settings → Collaborators & teams.
- Add the original source datafile provided by the data owner.
- Add the Darwin Core archive file created by IPT.
- Consider using Git LFS for large files. The GitHub file size limit is 100 MB, and Git in general does not handle large files well.
- Create a simple README.md for the dataset repository.
- Remember to include a LICENSE.txt text file (in GitHub).
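The repository creation and team permission steps can also be scripted against the GitHub REST API. A minimal sketch; the organisation name, team slug, repository name, and token are assumptions/placeholders:

```python
import requests

GITHUB_API = "https://api.github.com"
HEADERS = {
    "Authorization": "token <personal-access-token>",  # placeholder
    "Accept": "application/vnd.github+json",
}

ORG = "gbif-norway"                        # assumed organisation name
TEAM_SLUG = "gbif-no"                      # assumed slug of the "GBIF.no" team
REPO = "organization_short_dataset_name"   # placeholder repository name

# Create the dataset repository in the organisation
r = requests.post(f"{GITHUB_API}/orgs/{ORG}/repos", headers=HEADERS,
                  json={"name": REPO,
                        "description": "Source data and Darwin Core archive"})
r.raise_for_status()

# Grant the team write (push) permission on the new repository
r = requests.put(f"{GITHUB_API}/orgs/{ORG}/teams/{TEAM_SLUG}/repos/{ORG}/{REPO}",
                 headers=HEADERS, json={"permission": "push"})
r.raise_for_status()
```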
Zenodo
- Log in to Zenodo and configure webhooks on the repositories (once).
- Open your user profile settings and select GitHub: https://zenodo.org/account/settings/github/
- Toggle the switch from "off" to "on" for the GitHub repository of the dataset to archive.
Return to GitHub
- Select the dataset repository and make a release (menu item "Releases"); a sketch using the GitHub releases API follows this list.
- Release-tag: v1.1
- Release-title: [organization_short_dataset_name_version]
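Creating the release is what triggers the Zenodo webhook to archive the repository snapshot. A minimal sketch of doing the same through the GitHub releases API; the token, organisation and repository names are placeholders matching the naming used above:

```python
import requests

GITHUB_API = "https://api.github.com"
HEADERS = {
    "Authorization": "token <personal-access-token>",  # placeholder
    "Accept": "application/vnd.github+json",
}

ORG = "gbif-norway"                        # assumed organisation name
REPO = "organization_short_dataset_name"   # placeholder repository name

# Creating the release triggers the Zenodo webhook, which archives this
# snapshot of the repository and assigns it a Zenodo DOI.
r = requests.post(f"{GITHUB_API}/repos/{ORG}/{REPO}/releases", headers=HEADERS,
                  json={"tag_name": "v1.1",
                        "name": "organization_short_dataset_name_version"})
r.raise_for_status()
print(r.json()["html_url"])
```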
GBIF.no IPT
- Add the Zenodo DOI under Metadata → External links:
- Name: Zenodo data archive
- Download URL: https://doi.org/10.5281/zenodo....
- Data format: Darwin Core archive
Dataset structure
Proposed:
├── LICENSE
├── README.md   <- README including basic metadata, or perhaps a separate file for the final metadata?
├── data
│   ├── raw     <- The original, immutable data dump, possibly also including scanned field forms etc.
│   ├── interim <- Intermediate data that have been transformed into a machine-interpretable form
│   └── DwC-A   <- The final, mapped data
│
├── docs        <- Supporting information, e.g. raw metadata and descriptions from the data owners in text form
│
└── code        <- Whatever code is used to transform and map the data
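A small helper for scaffolding this layout when setting up a new dataset repository (a sketch; the repository name is a placeholder):

```python
from pathlib import Path

def scaffold(repo_root: str) -> None:
    """Create the proposed dataset repository layout under repo_root."""
    root = Path(repo_root)
    for sub in ("data/raw", "data/interim", "data/DwC-A", "docs", "code"):
        (root / sub).mkdir(parents=True, exist_ok=True)
    # Empty placeholders for the top-level files; fill in the real content afterwards.
    (root / "README.md").touch()
    (root / "LICENSE").touch()

scaffold("organization_short_dataset_name")  # placeholder repository name
```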