Single Audit Data Sources - 18F/FAC-Distiller GitHub Wiki
Single Audit Data Sources
The Federal Audit Clearinghouse Distiller uses two data sources. To provide cross-database compatibility, the loader for each data source normalizes the data into native Python objects and loads it through the Django ORM.
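As a rough illustration of the normalization step, the sketch below converts one raw CSV row of strings into native Python values. The column names, the date format, and the `normalize_row` helper are assumptions made for this example and are not taken from the Distiller code.

```python
from datetime import date, datetime
from decimal import Decimal
from typing import Optional


def parse_date(value: str) -> Optional[date]:
    """Convert an MM/DD/YYYY string to a date; blank input becomes None."""
    value = value.strip()
    return datetime.strptime(value, "%m/%d/%Y").date() if value else None


def normalize_row(raw: dict) -> dict:
    """Map raw CSV strings to native Python values.

    The column names used here are hypothetical.
    """
    return {
        "audit_year": int(raw["AUDITYEAR"]),
        "fy_end_date": parse_date(raw["FYENDDATE"]),
        "total_expended": Decimal(raw["TOTFEDEXPEND"].strip() or "0"),
        "is_low_risk": raw["LOWRISK"].strip().upper() == "Y",
    }


row = normalize_row(
    {
        "AUDITYEAR": "2019",
        "FYENDDATE": "06/30/2019",
        "TOTFEDEXPEND": " 1250000 ",
        "LOWRISK": "y",
    }
)
```

A dict of native values like this could then be handed to a Django model, for example with `Model.objects.create(**row)`, so the database backend never sees source-specific string formats.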
Assistance listings from sam.gov
Well-formed CSVs of the CFDA listings are available for download here:
https://beta.sam.gov/data-services?domain=Assistance%20Listings%2Fdatagov
To load this data into the Distiller database, run:
pipenv run python manage.py load_assistance_listings
See implementation here.
Federal Audit Clearinghouse CSVs
There are individual downloads for each fiscal year back to 1997, and a single "all years" download. Table dumps are available here:
https://harvester.census.gov/facdissem/PublicDataDownloads.aspx
To import:
pipenv run python manage.py load_single_audit_db --all
pipenv run python manage.py load_single_audit_db --audit
pipenv run python manage.py load_single_audit_db --cfda
pipenv run python manage.py load_single_audit_db --finding
pipenv run python manage.py load_single_audit_db --findingtext
These fixed-width CSVs are unreliable to import; see here for implementation. Known problems:
- Some tables have an inconsistent number of columns per row within the same source file.
- The schema changed from year to year, and the "all years" download appears to mix differently formatted rows into one file.
- Header counts don't always match column counts.
- Free-form text is not escaped.
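One way to guard against rows like these is to compare each row's field count with the header and set mismatches aside for inspection. The following is a minimal sketch of that idea using Python's standard `csv` module; the `read_checked` function and the sample data are hypothetical and are not the Distiller's actual loader.

```python
import csv
import io


def read_checked(stream):
    """Return (header, good_rows, bad_rows) for a delimited text stream.

    Rows whose field count differs from the header are set aside together
    with their line number instead of being loaded.
    """
    reader = csv.reader(stream)
    header = next(reader)
    good, bad = [], []
    for row in reader:
        if len(row) == len(header):
            good.append(dict(zip(header, row)))
        else:
            bad.append((reader.line_num, row))
    return header, good, bad


# Made-up sample: one clean row, one with an unescaped comma, one short row.
sample = io.StringIO(
    "ID,YEAR,TEXT\n"
    "1,2018,ok\n"
    "2,2018,text with, a stray comma\n"
    "3,2019\n"
)
header, good, bad = read_checked(sample)
```

Here only the first data row is accepted; the row with the unescaped comma and the short row end up in `bad` with their line numbers, which makes it easier to decide per table whether to repair or skip them.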
The way multi-line text is escaped in the findingstext table required a procedural file reader rather than a CSV parser. See parse_findings_text_csv() in the implementation.
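To show what a procedural reader can look like, here is a simplified sketch that is not the code in parse_findings_text_csv(). It assumes, purely for illustration, that every record starts with a numeric ID and a four-digit year, and it treats any other line as a continuation of the previous record's text. The `RECORD_START` pattern, the `read_records` function, and the sample lines are all hypothetical.

```python
import re

# Assumption: a new record starts with two numeric fields (an ID and a year).
RECORD_START = re.compile(r"^(\d+),(\d{4}),(.*)$")


def read_records(lines):
    """Group physical lines into (id, year, text) records."""
    records = []
    current = None
    for line in lines:
        line = line.rstrip("\n")
        match = RECORD_START.match(line)
        if match:
            # A new record begins, so flush the one collected so far.
            if current is not None:
                records.append(tuple(current))
            current = [match.group(1), match.group(2), match.group(3)]
        elif current is not None:
            # Continuation of the previous record's free-form text.
            current[2] += "\n" + line
    if current is not None:
        records.append(tuple(current))
    return records


sample = [
    "101,2019,Finding: controls were not documented,\n",
    "and reviews, where performed, were unsigned.\n",
    "102,2019,No exceptions noted.\n",
]
records = read_records(sample)
```

Commas and line breaks inside the text stay in the final field because the reader never splits on them. The trade-off is that a continuation line that happens to begin with digits, a comma, and a four-digit number would be mistaken for a new record, so a real reader needs a stricter start-of-record test.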