Data Formatting - TaxFoundation/data GitHub Wiki
Key Points
- Save data as CSV files (optional: you may publish Excel files on the Tax Foundation website in addition to CSV files, but never without making a CSV file of machine-readable data)
- Files should have specific and human-readable names in lowercase with words separated by hyphens. Example: tax-freedom-day-2014.csv
- Data variables should have specific and human-readable names without spaces, where the first letter of each word (besides the first word) is capitalized. Examples: incomeTaxRate, propertyTaxRate, effectiveTaxRate
- Numbers should include only digits and, if necessary, a decimal point; data should never include symbols like $, %, etc. Examples: 32445.84, 0.05
Why Does it Matter How We Publish Our Data?
Tax Foundation strives to be principled, insightful, and engaged. Part of living up to these ideals means that whenever we publish research, we also publish any data we used or created in the process. That data must be not only available, but highly usable. It should be formatted consistently and use common standards. Anyone in possession of our data and following our publicly stated methodologies should be able to reproduce our work.
How Should We Format Our Data?
All data should be published as comma-separated values (CSV) files. CSV is a commonly used file type that can be opened in Excel and many other programs. CSV files are simply text files where each value is separated by commas, and rows are separated by line breaks. Because a CSV file is plain text, the data is not stuck in any proprietary format. Also, no formatting (fonts, colors, etc.) is saved in CSV files.
Note that CSV files, unlike Excel files, do not support multiple worksheets: only one table of data can be saved per CSV file. If you're working with an Excel file that has multiple worksheets of data, you may need to save each worksheet individually as a CSV file when preparing for publication.
CSV also does not support saving Excel formulas; only the resulting values are saved to a CSV file.
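To make the structure concrete, here is a minimal sketch in Python (standard library only) that writes two rows taken from the Iowa example in this guide and shows the resulting plain text. The helper name `to_csv_text` is hypothetical and used only for illustration; it is not part of any Tax Foundation tooling.

```python
import csv
import io

# Two rows from the Iowa example data set in this guide.
rows = [
    {"countyId": 19001, "county": "Adair", "propertyTaxRate": 0.0307},
    {"countyId": 19003, "county": "Adams", "propertyTaxRate": 0.0276},
]


def to_csv_text(rows, fieldnames):
    """Return the rows as CSV text: one header line, then one line per row."""
    buffer = io.StringIO()
    # lineterminator="\n" keeps the output to one plain line break per row.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv_text(rows, ["countyId", "county", "propertyTaxRate"])
print(csv_text)
```

The printed text contains only values, commas, and line breaks. Fonts, colors, formulas, and additional worksheets have no place to live in this format, which is why each worksheet needs its own file.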
If using Excel to prepare your data, you can save CSV files by choosing File > Save As, and selecting from the save dialog box Save As Type: CSV (Comma Delimited).
Files should have human-readable names that accurately describe the file contents and the date of publication. File names should not use spaces; instead, separate words with hyphens. Avoid capitalization.
Good:
tax-freedom-day-2014.csv
state-and-local-sales-tax-rates-2014.csv
Bad:
TFD 2014.csv
State Local Tax.csv
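The naming rule can be checked or applied mechanically. The sketch below is one possible approach in Python; the function names `to_file_name` and `is_valid_file_name` are hypothetical, and the pattern assumes file names use only lowercase letters, digits, and hyphens. A script cannot judge whether a name is descriptive, so a name like TFD 2014.csv still needs a human to spell it out.

```python
import re


def to_file_name(title, extension="csv"):
    """Turn a descriptive title into a lowercase, hyphen-separated file name."""
    # Lowercase first, then collapse every run of other characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{slug}.{extension}"


def is_valid_file_name(name):
    """Check for lowercase words separated by single hyphens, ending in .csv."""
    return re.fullmatch(r"[a-z0-9]+(-[a-z0-9]+)*\.csv", name) is not None


print(to_file_name("Tax Freedom Day 2014"))
print(is_valid_file_name("State Local Tax.csv"))
```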
Variable names should be human-readable and should avoid codes or abbreviations whenever possible. The recommended way to write variable names is camelCase, where spaces are omitted and each word after the first word (including smaller words like "and") begins with a capital letter.
Good:
state
salesTaxRate
salesTaxCollections
Bad:
s
Tax Rate
collections
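As an illustration of the convention, the following Python sketch converts a spelled-out column label into camelCase. The function name `to_camel_case` is hypothetical, and the sketch assumes labels are made of letters, digits, and separators such as spaces. It lowercases everything after each word's first letter, so acronyms would need separate handling.

```python
import re


def to_camel_case(label):
    """Convert a label such as 'Sales Tax Rate' to 'salesTaxRate'."""
    words = re.findall(r"[A-Za-z0-9]+", label)
    if not words:
        return ""
    first, rest = words[0].lower(), words[1:]
    # Capitalize every later word, including small words like "and".
    return first + "".join(word[:1].upper() + word[1:].lower() for word in rest)


print(to_camel_case("Sales Tax Rate"))
print(to_camel_case("state and local sales tax rate"))
```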
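When publishing data in dollars, percents, or other specific number forms, the data should use decimal format without an accompanying $, %, etc. Percentages should be written as decimal fractions (0.395 rather than 39.5), and large numbers should not use commas as thousands separators.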
Good:
10267.91
0.395
Bad:
$10,267.91
39.5%
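Values copied from a formatted spreadsheet often arrive in the "bad" form. The Python sketch below shows one way to normalize them, assuming the only decorations are a dollar sign, thousands commas, and a trailing percent sign. The function name `clean_number` is hypothetical. It uses `Decimal` rather than `float` so that dividing a percentage by 100 does not introduce rounding noise.

```python
from decimal import Decimal


def clean_number(text):
    """Strip $, commas, and % from a formatted value and return plain decimal text."""
    stripped = text.strip()
    is_percent = stripped.endswith("%")
    digits = stripped.replace("$", "").replace(",", "").rstrip("%").strip()
    value = Decimal(digits)
    if is_percent:
        # A percentage is stored as a decimal fraction: 39.5% becomes 0.395.
        value = value / Decimal(100)
    # The "f" format avoids scientific notation in the output.
    return format(value, "f")


print(clean_number("$10,267.91"))
print(clean_number("39.5%"))
```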
Example of a Correctly Formatted Data Set
File name: iowa-property-tax-rates-2014.csv
countyId | county | propertyTaxRate |
---|---|---|
19001 | Adair | 0.0307 |
19003 | Adams | 0.0276 |
19005 | Allamakee | 0.0268 |
19007 | Appanoose | 0.0368 |
19009 | Audubon | 0.0306 |
19011 | Benton | 0.028 |
19013 | Black Hawk | 0.0351 |
19015 | Boone | 0.0309 |
19017 | Bremer | 0.0301 |
19019 | Buchanan | 0.0309 |
19021 | Buena Vista | 0.0304 |
19023 | Butler | 0.0293 |
19025 | Calhoun | 0.0282 |
19027 | Carroll | 0.0236 |
19029 | Cass | 0.0317 |
19031 | Cedar | 0.0277 |