Making Docling UI into Hugging Face Space - cereal-d3v/Docling GitHub Wiki
How to make a Hugging Face Space
1. Set up your Hugging Face Space (if you haven't already):
- Go to [Hugging Face Spaces](https://huggingface.co/spaces) and click "Create new Space."
- Give it a name (e.g., Docling-Ui), choose an SDK (Gradio, Streamlit, Docker, etc., based on what docling-serve is designed for), and set its visibility.
2. Add your Hugging Face Space as a remote to your docling-serve GitHub repository:
- Navigate to your local clone of the docling-serve GitHub repository.
cd path/to/your/docling-serve-repo
- Add the Hugging Face Space as a new remote. You'll use the URL of your Hugging Face Space, which follows the format https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME.git.
git remote add space https://huggingface.co/spaces/CerealDev/Docling-Ui.git
(Replace CerealDev/Docling-Ui with your own username and Space name.)
3. Push your docling-serve repository to the Hugging Face Space:
- Initial Push (Force Push Recommended for First Sync): For the very first push to a Space that is empty, or that you intend to overwrite, it's common to force push so that your repository replaces any placeholder content.
git push --force space main
(Or master if your docling-serve repo's main branch is master.)
You will be prompted for your Hugging Face username and a Hugging Face access token.
- How to get a Hugging Face Access Token:
- Go to [Hugging Face Settings](https://huggingface.co/settings/tokens).
- Click "New token."
- Give it a name (e.g., "Docling-Ui-Deploy").
- Set the role to "write" to allow pushing changes.
- Copy the generated token.
4. Set up GitHub Actions for automatic synchronization (Highly Recommended):
This is the best way to keep your Hugging Face Space in sync with your GitHub repository. Every time you push to your GitHub repo, the GitHub Action will automatically push those changes to your Hugging Face Space.
- Create a GitHub Actions workflow file: In your docling-serve GitHub repository, create a directory .github/workflows/ if it doesn't exist. Inside it, create a new file (e.g., sync_to_hf_space.yml):

    name: Sync to Hugging Face Hub
    on:
      push:
        branches: [main]  # or master, depending on your main branch name
      workflow_dispatch:  # Allows manual triggering from the GitHub Actions tab
    jobs:
      sync-to-hub:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v3
            with:
              fetch-depth: 0
              lfs: true  # Essential if you have large files tracked with Git LFS
          - name: Push to Hugging Face Space
            env:
              HF_TOKEN: ${{ secrets.HF_TOKEN }}  # This will be a GitHub Secret
            run: |
              git config --global user.email "[email protected]"
              git config --global user.name "GitHub Action"
              # Replace CerealDev and Docling-Ui with your actual username and Space name
              git push https://CerealDev:$HF_TOKEN@huggingface.co/spaces/CerealDev/Docling-Ui.git main

- Add your Hugging Face Token as a GitHub Secret:
- Go to your docling-serve GitHub repository.
- Click on "Settings" (in the repository's top navigation bar).
- Go to "Secrets and variables" > "Actions" > "Repository secrets."
- Click "New repository secret."
- Name it HF_TOKEN (must match the name in the workflow file).
- Paste your Hugging Face access token (the one you generated in step 3) into the "Secret value" field.
- Click "Add secret."
Now, whenever you push changes to the main branch of your docling-serve GitHub repository, the GitHub Action will automatically push those changes to your Hugging Face Space, and it will handle Git LFS files correctly.
This setup provides a robust and automated way to manage your Hugging Face Space content from your dedicated GitHub repository.
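The push that the workflow performs can also be run by hand, for example to check a new token before relying on the Action. The following is a minimal sketch, assuming a POSIX shell. The helper name space_push_url and the variables HF_USER and HF_SPACE are hypothetical; HF_TOKEN is the same write token stored as the GitHub secret.

```shell
# Hypothetical helper: build the authenticated push URL in the same form
# the workflow uses: https://USER:TOKEN@huggingface.co/spaces/USER/SPACE.git
space_push_url() {
  hf_user="$1"
  hf_token="$2"
  hf_space="$3"
  printf 'https://%s:%s@huggingface.co/spaces/%s/%s.git\n' \
    "$hf_user" "$hf_token" "$hf_user" "$hf_space"
}

# Usage from the repository root (left commented out because it pushes):
# git push "$(space_push_url "$HF_USER" "$HF_TOKEN" "$HF_SPACE")" main
```

Keeping the token in an environment variable, rather than typing it inline, helps keep it out of shell history.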
If the push to the Space is rejected, read the error message closely. The rejection may be explicitly for "binary files" (here, .png and .pdf) rather than for "files larger than 10 MiB." Hugging Face, like many Git hosting services, uses pre-receive hooks to enforce limits on repository size and content, particularly for large binary files. In both cases the recommended solution is the same: Git Large File Storage (Git LFS).
Even though docling-serve is a separate GitHub repo, if it contains these binary files and you haven't configured Git LFS for them, you'll encounter this error when pushing to Hugging Face. Hugging Face's infrastructure is designed to work well with Git LFS for assets like images, PDFs, models, and datasets.
Here's a detailed guide to fix this using Git LFS:
Understanding Git LFS
Git LFS replaces large files in your Git repository with small pointer files. When you clone or pull, Git LFS downloads the actual large files from a separate LFS server. Hugging Face's infrastructure supports this seamlessly.
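A pointer file is plain text with three fields: a spec version, the SHA-256 of the real content, and its size in bytes. The sketch below assembles one by hand to show the layout. It assumes sha256sum is available (on macOS, shasum -a 256 is the usual substitute), and lfs_pointer_preview is a hypothetical helper, not a Git LFS command. In normal use Git LFS writes pointers itself.

```shell
# Hypothetical helper: print what an LFS pointer for a file would look like.
lfs_pointer_preview() {
  file="$1"
  # First field of sha256sum output is the hex digest.
  oid="$(sha256sum "$file" | cut -d ' ' -f 1)"
  # tr strips the padding some wc implementations add.
  size="$(wc -c < "$file" | tr -d ' ')"
  printf 'version https://git-lfs.github.com/spec/v1\n'
  printf 'oid sha256:%s\n' "$oid"
  printf 'size %s\n' "$size"
}

# Usage:
# lfs_pointer_preview img/ui-output.png
```

With Git LFS installed, git lfs pointer --file=<path> prints the actual pointer for a file, which is the better check in practice.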
Steps to Implement Git LFS and Push Successfully
You'll need to do this from the local clone of the repository that contains the offending files (likely your docling-serve repository, or whichever local repo you're currently trying to push from).
- Install Git LFS (if you haven't already): If Git LFS isn't installed on your system, install it first.
- macOS (using Homebrew):
brew install git-lfs
- Windows (using Chocolatey):
choco install git-lfs
- Linux (using apt):
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs
- Other (or manual): Visit https://git-lfs.github.com/ for download links and instructions.
After installation, initialize Git LFS for your user:
git lfs install
- Navigate to your Repository: Make sure you are in the root directory of the docling-serve (or relevant) Git repository that contains the img/ and tests/ folders.
cd path/to/your/docling-serve-repo
- Track the Binary Files with Git LFS: You need to tell Git LFS which file types to track. The git lfs track command creates or updates a .gitattributes file in your repository, which Git uses to know which files are handled by LFS. Based on the error message, you need to track .png and .pdf files.
git lfs track "*.png"
git lfs track "*.pdf"
These commands add entries like
*.png filter=lfs diff=lfs merge=lfs -text
to your .gitattributes file.
- Add .gitattributes to Git (Commit it!): The .gitattributes file is crucial because it tells Git (and Hugging Face) how to handle these files. You must commit this file.
git add .gitattributes
- Re-add the Offending Files (Crucial Step for Existing Files): If the files were already committed to Git before you configured LFS, the index and the history still contain the raw binary data. Re-stage them so Git LFS can convert them to pointers. If Git reports nothing to commit afterwards, git add --renormalize forces the conversion.
git add img/ui-output.png
git add tests/2206.01062v1.pdf
git add tests/2408.09869v5.pdf
# Add any other new or existing large files that were rejected
Tip: If you have many such files, or they are scattered, stage everything again with
git add .
or pass a directory path to re-stage all files under it once LFS tracking is set up. Re-staging only affects the new commit. Earlier commits still hold the binaries, so if the push is still rejected because of them, git lfs migrate import can rewrite the existing history.
- Commit Your Changes: Commit the .gitattributes file and the re-added binary files.
git commit -m "Configure Git LFS for binary files and re-track existing ones"
- Push to Hugging Face Space: Now, when you push, Git LFS will intercept the large files and upload them to Hugging Face's LFS storage, while Git commits only the small pointer files.
git push origin main
(Or git push space main if you named your Hugging Face remote space, as in step 2 above.)
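Before pushing, it can be worth confirming that the tracking rules actually apply to the files that were rejected. The following is a minimal sketch using git check-attr, which only reads .gitattributes and so works even where Git LFS itself is not installed. The helper name is_lfs_tracked is hypothetical.

```shell
# Hypothetical helper: succeeds if .gitattributes routes the given path
# through the lfs filter. Run it from the repository root.
is_lfs_tracked() {
  # git check-attr prints "<path>: filter: <value>"
  [ "$(git check-attr filter -- "$1")" = "$1: filter: lfs" ]
}

# Usage:
# is_lfs_tracked img/ui-output.png && echo "handled by LFS"
# is_lfs_tracked tests/2206.01062v1.pdf && echo "handled by LFS"
```

After the commit, git lfs ls-files lists the files that are actually stored as LFS objects, which complements this attribute check.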
Why This Happens and Why LFS is the Solution:
- Git's Design: Git is optimized for tracking changes in text-based code. Large binary files, when committed directly, bloat the repository history, making cloning, pushing, and pulling slow and inefficient.
- Hugging Face's Restrictions: Hugging Face Spaces have limits on direct binary file size in the main Git repository to maintain performance and manage storage efficiently. They strongly encourage LFS for such assets.
- The Error "binary files": This is a specific check on the type of file, not just its size, indicating that Hugging Face wants these non-text files handled by LFS. The "larger than 10 MiB" rejection, by contrast, is a more general size check.
By following these steps, you should successfully push your docling-serve repository, including the binary assets, to your Hugging Face Space.
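To catch oversized files before the pre-receive hook does, the tracked files can be scanned locally. This is a sketch that assumes the 10 MiB figure from the rejection message. The helper name oversized_tracked_files is hypothetical, and paths with unusual characters (which git ls-files quotes) are not handled.

```shell
# Hypothetical helper: list tracked files larger than a byte limit.
# Run it from the repository root.
oversized_tracked_files() {
  limit_bytes="${1:-10485760}"   # 10 MiB by default
  git ls-files | while IFS= read -r path; do
    # Skip entries that are not regular files in the working tree.
    [ -f "$path" ] || continue
    size="$(wc -c < "$path" | tr -d ' ')"
    if [ "$size" -gt "$limit_bytes" ]; then
      printf '%s\n' "$path"
    fi
  done
}

# Usage:
# oversized_tracked_files            # anything over 10 MiB
# oversized_tracked_files 1048576    # anything over 1 MiB
```

Files already handled by LFS are listed too, because the working tree holds their full content. That is expected, and this scan checks size only, not the "binary files" rule.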