Raw data format - sustainabledevelopment-rwanda/sdg-indicators-archived GitHub Wiki
How the Data Format Works
These guidelines are for anyone wishing to supply a data set for an SDG indicator. For data guidance and troubleshooting, please visit sustainabledevelopment-rwanda/sdg-data.
Column Headings
- The first row in the csv must be the column names. There can't be any extra rows with information, notes etc.
- The first column must be
Year
, case sensitive with no extra white space. - The last column must be
Value
, case sensitive with no extra white space. - If Geo codes are included, the second from last column must be ' GeoCode', case sensitive with no extra white space
- Any number of categorical columns may go in between e.g. ' Sex', ' Age', ' Geography'. Ensure you use consistent category naming across the indicators.
Fields
Year
should be integer, unquoted, e.g.2017
.- You do not need to have each year present, and order does not matter.
Value
should always be numeric (no notes, or < symbols etc), or blank (signifies missing).Units
see section below. This is a special field.- If a cell is blank (missing) this means aggregated over. So in the example below the first row is disaggregated by Grade=A, then Grade=B, and the last one is missing, which means “Both”. The blanks in the Fruit field mean “All fruits”. So blank in general means “all”. There is no need for any text saying ' Total'.
- There must not be trailing white-space in any categorical field. So "A Word" is OK, but "A " is not. See further info below.
Year | Grade | Fruit | Value |
---|---|---|---|
2016 | A | 1.1 | |
2016 | B | 1.2 | |
2016 | 1.15 |
If we have it broken down by fruit OR grade then it would look like the plot below. The last line, with all grouping columns blank, is the national level indicator.
Year | Grade | Fruit | Value |
---|---|---|---|
2016 | A | 1.1 | |
2016 | B | 1.2 | |
2016 | Apples | 1.0 | |
2016 | Oranges | 1.3 | |
2016 | Lemons | 1.05 | |
2016 | 1.15 |
Then if we introduce two levels it would go to something like (values just for illustration):
Year | Grade | Fruit | Value |
---|---|---|---|
2016 | A | Apples | 1.05 |
2016 | B | Apples | 0.95 |
2016 | Apples | 1.00 | |
2016 | A | Oranges | 1.35 |
2016 | B | Oranges | 1.25 |
2016 | Oranges | 1.3 | |
2016 | A | Lemons | 1.1 |
2016 | B | Lemons | 1.0 |
2016 | Lemons | 1.05 | |
2016 | A | 1.1 | |
2016 | B | 1.2 | |
2016 | 1.15 |
The idea is that the plot would start out with the all blank row (at the bottom) and then as you add disaggregations to the plot it would filter to show the right data.
Units
Some indicators are reported using different units of measure. A special column name, Units
is used to capture this (see 9-2-1 as an example). The Units
column is interpreted as a special top level disaggregation. You may only choose one at a time and it affects the chart axis labels. A Units column is not required if an indicator only includes one unit.
White space
You can double check for white space within an Excel file using a COUNTIF formula or by running a macro. We will add more information on this soon.