3. Structure and conventions - scpyork/data_management GitHub Wiki

General structure

  • DATABASES
  • PROJECT_DATA
    • 2016_Project_C
      • Raw
        • data_soy.csv
        • readme.md
        • metadata.csv
        • units.csv
      • Processed
        • data_soy_yield.csv
        • readme.md
        • metadata.csv
        • units.csv
    • 2019_Project_D
    • Archive
      • 2014-2015_Project_A
      • 2016_Project_B
  • NON-PROJECT_DATA
    • GIS
    • FAOSTAT
      • Agriculture

Folder structure

At the highest level, the SCP datastore is split into DATABASES, PROJECT_DATA and NON-PROJECT_DATA. Within the PROJECT_DATA folder, there will then be further folders for specific projects, and an "Archive" folder for completed projects. Within each project folder, we suggest a "Raw" and a "Processed" folder. Each data file should be accompanied by readme, metadata and units files within the same folder.
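
As a minimal sketch of the layout described above, the following Python helper creates the suggested "Raw" and "Processed" folders for a new project and places empty readme, metadata and units files in each. The function name scaffold_project and its arguments are illustrative assumptions, not part of any existing SCP tooling.

```python
from pathlib import Path

# Accompanying files expected next to the data files in each folder.
ACCOMPANYING_FILES = ("readme.md", "metadata.csv", "units.csv")


def scaffold_project(datastore_root, project_name):
    """Create PROJECT_DATA/<project_name>/{Raw,Processed} with empty accompanying files.

    Hypothetical helper: the names and behaviour here are assumptions for illustration.
    """
    project_dir = Path(datastore_root) / "PROJECT_DATA" / project_name
    for stage in ("Raw", "Processed"):
        stage_dir = project_dir / stage
        stage_dir.mkdir(parents=True, exist_ok=True)
        for file_name in ACCOMPANYING_FILES:
            # touch() creates the file if missing and leaves existing content alone.
            (stage_dir / file_name).touch(exist_ok=True)
    return project_dir
```

Calling scaffold_project("/path/to/datastore", "2019_Project_D") would create the two sub-folders under PROJECT_DATA/2019_Project_D; the datastore path is a placeholder.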

Capitalisation

  • Top-level folder names are in capitals, e.g. PROJECT_NAME
  • Second-level directories have the initial word capitalised, e.g. Source_data
  • Further nested tiers are all in lower case, e.g. extra_files

Archives

Archive folders will be used to store old projects/versions of files that we want to keep for any reason. They can be used within project or dataset folders too.

Project folder names

Project folders should be titled with the year of the project, followed by an underscore and then the name of the project.

If the project spans several years, the folder name should include both the start and end year once the project is finished, e.g. 2016-2018_ABC_carbon_footprint.

If it is an active project, only include the start year.
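
The naming rule above can be sketched in Python as follows. This is only an illustration: project_folder_name and is_valid_project_folder are hypothetical helpers, and the regular expression encodes one reading of the convention (a four-digit start year, an optional end year, an underscore, then a name without spaces).

```python
import re

# One possible reading of the convention: YYYY or YYYY-YYYY, underscore, name without spaces.
PROJECT_FOLDER_PATTERN = re.compile(r"\d{4}(-\d{4})?_\S+")


def project_folder_name(name, start_year, end_year=None):
    """Build a project folder name; pass end_year only once the project is finished."""
    safe_name = "_".join(name.split())  # underscores stand in for spaces
    if end_year is None or end_year == start_year:
        return f"{start_year}_{safe_name}"
    if end_year < start_year:
        raise ValueError("end_year must not be earlier than start_year")
    return f"{start_year}-{end_year}_{safe_name}"


def is_valid_project_folder(folder_name):
    """Return True if folder_name matches the assumed pattern."""
    return PROJECT_FOLDER_PATTERN.fullmatch(folder_name) is not None
```

For example, project_folder_name("ABC carbon footprint", 2016, 2018) returns "2016-2018_ABC_carbon_footprint", while omitting the end year gives the form used for active projects.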

General naming conventions

  • There should be no spaces.
  • Underscores represent spaces, e.g. data_soy_yield.csv
  • Dashes signify a relationship (belongs to, or is a subset of), e.g. trase-storage
  • Dates in file names must use the YYYY-MM-DD format. If only part of the date is required, the trailing components may be omitted (i.e. YYYY-MM or YYYY).
  • Day and month need to be zero-padded (02 not 2).
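
A short Python sketch of the date rules in the list above. The helper dated_file_name is hypothetical, and placing the date after the file stem is an assumption; the conventions only fix the date format itself.

```python
from datetime import date

# Number of characters of the ISO date (YYYY-MM-DD) to keep for each precision.
ISO_PREFIX_LENGTH = {"year": 4, "month": 7, "day": 10}


def dated_file_name(stem, when, precision="day", suffix=".csv"):
    """Return '<stem>_<date><suffix>' with a zero-padded, possibly shortened ISO date."""
    # isoformat() always zero-pads month and day, e.g. 2019-02-05.
    stamp = when.isoformat()[: ISO_PREFIX_LENGTH[precision]]
    safe_stem = "_".join(stem.split())  # no spaces in file names
    return f"{safe_stem}_{stamp}{suffix}"


print(dated_file_name("soy yield", date(2019, 2, 5)))            # soy_yield_2019-02-05.csv
print(dated_file_name("soy yield", date(2019, 2, 5), "month"))   # soy_yield_2019-02.csv
```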

Metadata

The general structure shown at the top of this page is only an example, but it illustrates some of the conventions we should implement. Each project/data folder should also contain three accompanying files:

  • a readme.md file containing: title, description, uses, users, useful scripts, url, how to install (if relevant), dependencies (if relevant).
  • a metadata.csv file containing: the date of last update, url to download data (if available).
  • a units.csv file containing a table with columns: variable, unit name, conversion to SI unit value, SI unit.
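
To illustrate the units.csv layout listed above, here is a minimal Python sketch that writes such a table with the standard csv module. The exact header spellings (unit_name, conversion_to_si, si_unit) and the helper units_csv_text are assumptions; the example rows use the standard factors 1 tonne = 1000 kg and 1 hectare = 10000 m2.

```python
import csv
import io

# Assumed header spellings for the four columns described above.
UNITS_COLUMNS = ["variable", "unit_name", "conversion_to_si", "si_unit"]


def units_csv_text(rows):
    """Serialise a list of row dictionaries into units.csv content."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=UNITS_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


example_rows = [
    {"variable": "production", "unit_name": "tonne", "conversion_to_si": 1000, "si_unit": "kg"},
    {"variable": "area", "unit_name": "hectare", "conversion_to_si": 10000, "si_unit": "m2"},
]
print(units_csv_text(example_rows))
```

The returned text can then be written to a units.csv file in the relevant project or data folder.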

Text file format

All readme and text documentation files should be Markdown files with the .md suffix. See the [Markdown Here cheatsheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Here-Cheatsheet).

Units

Where possible, units must be either Système international (SI) units or discipline-specific units, e.g. production in tonnes, area in hectares (ha) and yield in tonnes per hectare.

All units shall be consistent within a project. If units are not explicitly specified in each document, a conversion table between units, including the conversion to SI, shall be supplied. See the Units page.
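
As a rough sketch of how such a conversion table might be used, the Python snippet below reads units.csv-style content and converts a value to its SI unit by multiplying by the stored factor. The column names, the sample rows and the helpers load_conversions and to_si are assumptions made for illustration.

```python
import csv
import io

# Sample units.csv content; header spellings are assumed, factors are the standard ones.
UNITS_CSV = """variable,unit_name,conversion_to_si,si_unit
production,tonne,1000,kg
area,hectare,10000,m2
"""


def load_conversions(csv_text):
    """Map each variable to its (conversion factor, SI unit) pair."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["variable"]: (float(row["conversion_to_si"]), row["si_unit"])
        for row in reader
    }


def to_si(value, variable, conversions):
    """Convert a value recorded in the project unit to its SI equivalent."""
    factor, si_unit = conversions[variable]
    return value * factor, si_unit


conversions = load_conversions(UNITS_CSV)
print(to_si(2.5, "area", conversions))  # (25000.0, 'm2')
```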