Why Document Submission - GSA-TTS/document-extractor-poc GitHub Wiki

The TTS Public Benefits Studio was founded in 2022 to serve as an incubator and accelerator for technology-based infrastructure that improves the efficiency and effectiveness of government at all levels. In 2023, we launched our first shared service offering, Notify.gov, a text messaging service to send bulk one-way messages to members of the public.

Building on that success, in 2024 the team evaluated a short list of product opportunities and homed in on document submission processes as a key pain point for both the American public and government application processing staff.

The Document Submission Problem

Submitting documents is a critical part of almost all benefits applications. State and federal benefits programs today face three intertwined challenges that make document submission both a user pain point and an administrative quagmire:

  • Rising mobile submissions, growing back-end burden: Almost a third of Americans with incomes less than $30,000 rely solely on mobile phones for internet access, and since 2019, the adoption of mobile-responsive benefits applications has jumped by 25%. Yet while front-end uploaders have become more flexible, accepting PDFs, photos, even crumpled or handwritten forms, these unstructured files simply shift work downstream. Administrators now spend inordinate hours classifying, verifying, and keying in data from documents like W-2s, 1099s, DD214s, and others into data-management systems, delaying benefit determinations and diverting staff from higher-value tasks.

  • Manual work & error risk: Because forms arrive in inconsistent formats, staff must decipher legibility issues, parse multi-part names, handle non-English characters, and wrestle with poor image quality. This manual extraction process is slow, error-prone, and mentally taxing, leading to higher operational costs, approval delays, and mounting frustration for both processors and applicants.

  • Procurement & technology gaps: Although commercial OCR products exist, they are expensive and difficult to integrate into existing workflows. As a result, agencies typically procure them through large, multi-year modernization projects that leave tools outdated before launch. Off-the-shelf solutions often demand heavy customization and external expertise, locking agencies into costly contracts and stymieing rapid iteration.

The Solution

To break this cycle, government agencies need a secure, scalable document-processing solution that can:

  • Automatically classify commonly requested documents as part of application processes
  • Accurately extract key data fields—even from low-quality images
  • Continuously improve through model training and feedback loops
  • Work alongside existing agency case-management workflows and systems, integrating with them when possible
  • Deploy safely within a compliant government cloud environment

At the same time, teams must balance speed, ongoing costs, and long-term maintainability, including carefully weighing commercial versus open-source OCR options, so that upgrades stay current, affordable, and under agency control.
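
The capabilities listed above can be pictured as a small pipeline: classify the document, extract its fields, and route anything uncertain to a person for review. The following is a minimal sketch under stated assumptions, not the project's implementation. It assumes OCR text is already available, and every name in it (`DOC_KEYWORDS`, `FIELD_PATTERNS`, `classify`, `extract_fields`, `process`) is hypothetical. Simple keyword and regular-expression rules stand in for the trained models a real system would likely use.

```python
import re
from dataclasses import dataclass
from typing import Dict

# Hypothetical keyword rules standing in for a trained document classifier.
DOC_KEYWORDS = {
    "W-2": ("wage and tax statement",),
    "1099": ("1099", "nonemployee compensation"),
    "DD214": ("certificate of release or discharge",),
}

# Hypothetical per-document field patterns standing in for an extraction model.
FIELD_PATTERNS = {
    "W-2": {"wages": re.compile(r"wages[^\d]*([\d,]+\.\d{2})", re.IGNORECASE)},
}


@dataclass
class ExtractionResult:
    doc_type: str
    fields: Dict[str, str]
    needs_review: bool  # True when a staff member should check the result


def classify(ocr_text: str) -> str:
    """Return the first document type whose keywords appear in the text."""
    text = ocr_text.lower()
    for doc_type, keywords in DOC_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return doc_type
    return "unknown"


def extract_fields(doc_type: str, ocr_text: str) -> Dict[str, str]:
    """Pull out each field that has a pattern defined for this document type."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.get(doc_type, {}).items():
        match = pattern.search(ocr_text)
        if match:
            fields[name] = match.group(1)
    return fields


def process(ocr_text: str) -> ExtractionResult:
    """Classify, extract, and flag anything incomplete for manual review."""
    doc_type = classify(ocr_text)
    fields = extract_fields(doc_type, ocr_text)
    expected = FIELD_PATTERNS.get(doc_type, {})
    missing = set(expected) - set(fields)
    needs_review = doc_type == "unknown" or not expected or bool(missing)
    return ExtractionResult(doc_type, fields, needs_review)


if __name__ == "__main__":
    sample = "Form W-2 Wage and Tax Statement\nWages, tips, other compensation: 41,250.00"
    print(process(sample))
```

The `needs_review` flag is where a feedback loop could attach: corrections that staff make to flagged documents could be stored and later used to improve the classification and extraction rules or models.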

Potential Cost Savings from Automating Data Extraction

  • Processing could be 3x faster. Automation significantly reduces document processing times according to a USDA RPA case study.
  • 50 million hours of staff time could be saved. This figure considers the total number of hours saved in the human services industry alone.
  • Potential to reduce the manual error rate from 31%, which is currently the share of manual document flows that lead to inaccuracies (ABBYY).