Single Audit Data Sources - 18F/FAC-Distiller GitHub Wiki

Single Audit Data Sources

Two data sources are used by the Federal Audit Clearinghouse Distiller. To provide cross-database compatibility, loaders for each data source normalize the data as native Python objects and feed it through the Django ORM.
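As a rough illustration of what "normalizing to native Python objects" can look like, the sketch below converts raw CSV strings into typed values before anything is handed to the ORM. It is not the Distiller's loader code: the `AuditRow` class, the column names, and the `MM/DD/YYYY` date format are all assumptions made for the example.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional


@dataclass
class AuditRow:
    # Hypothetical fields, chosen only for illustration.
    audit_year: int
    dbkey: str
    fy_end_date: Optional[date]
    total_expended: Optional[int]


def parse_date(value: str) -> Optional[date]:
    """Return a date, or None for an empty cell (assumes MM/DD/YYYY)."""
    value = value.strip()
    if not value:
        return None
    return datetime.strptime(value, "%m/%d/%Y").date()


def parse_int(value: str) -> Optional[int]:
    """Return an int, or None for an empty cell."""
    value = value.strip()
    return int(value) if value else None


def normalize(raw: dict) -> AuditRow:
    """Turn one row of strings into typed, database-neutral values."""
    return AuditRow(
        audit_year=int(raw["AUDITYEAR"]),
        dbkey=raw["DBKEY"].strip(),
        fy_end_date=parse_date(raw["FYENDDATE"]),
        total_expended=parse_int(raw["TOTFEDEXPEND"]),
    )


# Made-up sample data with assumed column names.
sample = (
    "AUDITYEAR,DBKEY,FYENDDATE,TOTFEDEXPEND\n"
    "2018, 100001 ,06/30/2018,1250000\n"
    "2018,100002,,\n"
)
rows = [normalize(r) for r in csv.DictReader(io.StringIO(sample))]
```

Because the values are plain `int`, `date`, and `None` rather than database-specific literals, the same rows could be passed to Django model constructors regardless of which database backend is configured.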

Assistance listings from sam.gov

Well-formed CSV files of the CFDA listings are available for download here:

https://beta.sam.gov/data-services?domain=Assistance%20Listings%2Fdatagov

To load this data into the Distiller database, run:

pipenv run python manage.py load_assistance_listings

See implementation here.

Federal Audit Clearinghouse CSVs

There are individual downloads for each fiscal year back to 1997, and a single "all years" download. Table dumps are available here:

https://harvester.census.gov/facdissem/PublicDataDownloads.aspx

To import:

pipenv run python manage.py load_single_audit_db --all
pipenv run python manage.py load_single_audit_db --audit
pipenv run python manage.py load_single_audit_db --cfda
pipenv run python manage.py load_single_audit_db --finding
pipenv run python manage.py load_single_audit_db --findingtext

These fixed-width CSVs are unreliable to import, for the reasons listed below. See here for implementation.

  1. Some tables have inconsistent numbers of columns per row in the same source file.
  2. The schema changed from year to year, and the "all years" download appears to mix differently formatted rows into one file.
  3. Header counts don't always match column counts.
  4. Free-form text is not escaped.
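One defensive way to cope with rows whose width does not match the header is sketched below. This is a minimal illustration, not the Distiller's import code: the `read_rows` helper is hypothetical, and it assumes that any extra columns come from unescaped commas in a trailing free-text field, which may not hold for every table.

```python
import csv
import io


def read_rows(stream, pad=""):
    """Yield (line_number, row_dict, problem) for each data row.

    problem is None, "short", or "long" so callers can log or skip bad rows.
    """
    reader = csv.reader(stream)
    header = [name.strip() for name in next(reader)]
    width = len(header)
    for row in reader:
        problem = None
        if len(row) < width:
            # Too few columns: pad the missing trailing cells.
            problem = "short"
            row = row + [pad] * (width - len(row))
        elif len(row) > width:
            # Too many columns: assume the overflow belongs to the last
            # (free-text) column and glue it back together.
            problem = "long"
            row = row[:width - 1] + [",".join(row[width - 1:])]
        yield reader.line_num, dict(zip(header, row)), problem


# Made-up sample data for illustration.
sample = (
    "ID,YEAR,TEXT\n"
    "1,2018,ok\n"
    "2,2018\n"
    "3,2018,late, with comma\n"
)
results = list(read_rows(io.StringIO(sample)))
```

Reporting the problem alongside each row, instead of raising on the first mismatch, lets an import continue while still recording which source lines needed repair.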

The way multi-line text is handled in the findingstext table rules out a standard CSV parser, so a procedural file reader is used instead. See parse_findings_text_csv() in the implementation.
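The general idea of a procedural reader can be sketched as follows. This is not the actual parse_findings_text_csv() function: the record-start pattern (a four-digit year followed by a numeric key), the three-field layout, and the `read_records` name are assumptions chosen only to show the technique of accumulating continuation lines into the current record.

```python
import io
import re

# Assumption: a new record starts with a 4-digit year and a numeric key.
RECORD_START = re.compile(r"^\d{4},\d+,")
FIELD_COUNT = 3  # hypothetical layout: year, key, free-form text


def read_records(stream):
    """Yield one list of fields per record, joining continuation lines."""
    current = None
    for line in stream:
        line = line.rstrip("\r\n")
        if RECORD_START.match(line):
            if current is not None:
                yield current
            # maxsplit leaves commas inside the text field untouched.
            current = line.split(",", FIELD_COUNT - 1)
        elif current is not None:
            # Not a record start: treat it as more text for the last field.
            current[-1] += "\n" + line
        # Lines before the first record (such as a header) are skipped.
    if current is not None:
        yield current


# Made-up sample data for illustration.
sample = (
    "AUDITYEAR,DBKEY,TEXT\n"
    "2018,1001,Finding one, with a comma\n"
    "continues on a second line\n"
    "2018,1002,Short finding\n"
)
records = list(read_records(io.StringIO(sample)))
```

A heuristic like this will mis-split a record if a continuation line happens to look like a record start, so a real reader needs a pattern specific enough for the data at hand.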