Data Formatting - TaxFoundation/data GitHub Wiki

Key Points

  • Save data as CSV files. You may also publish Excel files on the Tax Foundation website, but never without an accompanying CSV file of machine-readable data.
  • Files should have specific and human-readable names in lowercase with words separated by hyphens. Example: tax-freedom-day-2014.csv.
  • Data variables should have specific and human-readable names without spaces, with the first letter of every word after the first capitalized. Examples: incomeTaxRate, propertyTaxRate, effectiveTaxRate.
  • Numbers should include only digits and decimal points if necessary; data should never include symbols like $, %, etc. Examples: 32445.84, 0.05.

Why Does it Matter How We Publish Our Data?

Tax Foundation strives to be principled, insightful, and engaged. Part of living up to these ideals means that whenever we publish research, we also publish any data we used or created in the process. That data must be not only available, but highly usable. It should be formatted consistently and use common standards. Anyone in possession of our data and following our publicly stated methodologies should be able to reproduce our work.

How Should We Format Our Data?

All data should be published as comma-separated value (CSV) files. CSV is a commonly used file type that can be opened in Excel and many other programs. CSV files are simply text files in which values are separated by commas and rows are separated by line breaks. Because a CSV file is plain text, the data is not locked into any proprietary format. Also, no formatting (fonts, colors, etc.) is saved in CSV files.

Take note that CSV files, unlike Excel files, do not support multiple worksheets: only one table of data can be saved per CSV file. If you're working with an Excel file that has multiple worksheets of data, you may need to save each worksheet individually as a CSV file when preparing for publication.

CSV also does not support saving Excel formulas—only values will be saved to a CSV file.

If using Excel to prepare your data, you can save CSV files by choosing File > Save As and selecting CSV (Comma Delimited) under Save As Type in the save dialog box.
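
For workbooks with several worksheets, the one-table-per-file rule can also be scripted. The following is a minimal sketch using only Python's standard csv module. It assumes the worksheets have already been reduced to plain values (no formulas); the function name write_sheets_as_csv, the file stems, and the placeholder rows are hypothetical and not part of any Tax Foundation tooling.

```python
import csv
import tempfile
from pathlib import Path


def write_sheets_as_csv(sheets, output_dir):
    """Write each worksheet (a list of rows) to its own CSV file.

    `sheets` maps a file stem (already lowercase and hyphenated)
    to the rows of one worksheet.
    """
    output_dir = Path(output_dir)
    written = []
    for file_stem, rows in sheets.items():
        path = output_dir / f"{file_stem}.csv"
        # newline="" lets the csv module control line endings itself.
        with path.open("w", newline="", encoding="utf-8") as handle:
            csv.writer(handle).writerows(rows)
        written.append(path)
    return written


# Hypothetical workbook with two worksheets; the values are placeholders, not real data.
workbook = {
    "example-table-one-2014": [["exampleId", "exampleRate"], ["1", "0.05"]],
    "example-table-two-2014": [["exampleId", "exampleAmount"], ["1", "32445.84"]],
}

with tempfile.TemporaryDirectory() as folder:
    for path in write_sheets_as_csv(workbook, folder):
        print(path.name)
```

Each worksheet ends up in a separate file, so no table is lost when moving away from the Excel format.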

Files should have human-readable names that accurately describe the file contents and the date of publication. File names should not use spaces; instead, separate words with hyphens. Avoid capitalization.

Good:

  • tax-freedom-day-2014.csv
  • state-and-local-sales-tax-rates-2014.csv

Bad:

  • TFD 2014.csv
  • State Local Tax.csv
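
The naming rule above can be applied mechanically. This is a small sketch in Python, assuming you start from a descriptive title; the helper name to_file_name is hypothetical.

```python
import re


def to_file_name(title, extension="csv"):
    """Turn a descriptive title into a lowercase, hyphen-separated file name."""
    # Replace every run of characters that is not a letter or digit with one hyphen,
    # then drop any hyphen left at the start or end.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{slug}.{extension}"


print(to_file_name("Tax Freedom Day 2014"))
# prints: tax-freedom-day-2014.csv
print(to_file_name("State and Local Sales Tax Rates 2014"))
# prints: state-and-local-sales-tax-rates-2014.csv
```

A script cannot make a vague title specific: "TFD 2014" would still become tfd-2014.csv, so the title itself has to be descriptive.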

Variable names should be human-readable and avoid codes or abbreviations whenever possible. The recommended way to write variable names is lower camel case (camelCase), where spaces are omitted and each word after the first word (including smaller words like "and") begins with a capital letter.

Good:

  • state
  • salesTaxRate
  • salesTaxCollections

Bad:

  • s
  • Tax Rate
  • collections
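
As an illustration of the capitalization rule, here is a short Python sketch that converts a spaced column label into this form. The function name to_camel_case is hypothetical, and lowercasing the remaining letters of each word is an assumption on my part (it turns "ID" into "Id").

```python
import re


def to_camel_case(label):
    """Convert a label such as 'Sales Tax Rate' to 'salesTaxRate'."""
    words = re.findall(r"[A-Za-z0-9]+", label)
    if not words:
        raise ValueError("label contains no letters or digits")
    # The first word is all lowercase; every later word starts with a capital.
    first, rest = words[0].lower(), words[1:]
    return first + "".join(word[:1].upper() + word[1:].lower() for word in rest)


print(to_camel_case("Sales Tax Rate"))
# prints: salesTaxRate
print(to_camel_case("State and Local Sales Tax Rate"))
# prints: stateAndLocalSalesTaxRate
```

The conversion only fixes spacing and capitalization. A name like "s" or "collections" still needs a person to make it specific.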

When publishing data in dollars, percents, or other specific number forms, the data should use decimal format without an accompanying $, %, etc. Large numbers should not use commas.

Good:

  • 10267.91
  • 0.395

Bad:

  • $10,267.91
  • 39.5%
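
The following Python sketch shows one way to clean formatted values like the bad examples above. It assumes that a trailing % means the value should be divided by 100, as the pairing of 39.5% with 0.395 suggests; the function name clean_number is hypothetical. Decimal is used instead of float so that the digits are preserved exactly.

```python
from decimal import Decimal


def clean_number(text):
    """Strip $, commas, and % from a formatted number and return plain digits."""
    stripped = text.strip()
    is_percent = stripped.endswith("%")
    digits = stripped.replace("$", "").replace(",", "").replace("%", "").strip()
    value = Decimal(digits)
    if is_percent:
        # Assumption: percentages are stored as decimal fractions (39.5% becomes 0.395).
        value = value / Decimal(100)
    # The "f" format avoids scientific notation for very small or large values.
    return format(value, "f")


print(clean_number("$10,267.91"))
# prints: 10267.91
print(clean_number("39.5%"))
# prints: 0.395
```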

Example of a Correctly Formatted Data Set

File name: iowa-property-tax-rates-2014.csv

countyId  county       propertyTaxRate
19001     Adair        0.0307
19003     Adams        0.0276
19005     Allamakee    0.0268
19007     Appanoose    0.0368
19009     Audubon      0.0306
19011     Benton       0.028
19013     Black Hawk   0.0351
19015     Boone        0.0309
19017     Bremer       0.0301
19019     Buchanan     0.0309
19021     Buena Vista  0.0304
19023     Butler       0.0293
19025     Calhoun      0.0282
19027     Carroll      0.0236
19029     Cass         0.0317
19031     Cedar        0.0277
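
A data set like this one can be checked against the rules on this page before publication. The sketch below uses Python's standard csv module; it is one possible check, not an official validator. The function name check_csv and both regular expressions are assumptions, and SAMPLE holds three rows from the table above written as comma-separated text.

```python
import csv
import io
import re

# Lowercase first word, then zero or more words that each start with a capital.
CAMEL_CASE = re.compile(r"[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)*")
# Digits with an optional decimal part; no $, %, or thousands separators.
PLAIN_NUMBER = re.compile(r"\d+(?:\.\d+)?")

SAMPLE = """countyId,county,propertyTaxRate
19001,Adair,0.0307
19003,Adams,0.0276
19013,Black Hawk,0.0351
"""


def check_csv(text, numeric_columns):
    """Return a list of formatting problems found in CSV text."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    for name in reader.fieldnames or []:
        if not CAMEL_CASE.fullmatch(name):
            problems.append(f"header {name!r} is not camelCase")
    # Data rows start on line 2 because line 1 holds the header.
    for line_number, row in enumerate(reader, start=2):
        for column in numeric_columns:
            value = row.get(column) or ""
            if not PLAIN_NUMBER.fullmatch(value):
                problems.append(
                    f"line {line_number}: {column}={value!r} is not a plain number"
                )
    return problems


print(check_csv(SAMPLE, ["countyId", "propertyTaxRate"]))
# prints: []
```

An empty list means no problems were found. A header such as "Tax Rate" or a value such as "39.5%" would each add one entry.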