How to turn off OCR (useful if you only want metadata extraction) - chrismattmann/tika-python GitHub Wiki
Problem
Even if parser.from_file(x, service = 'meta')
is requested, Tika still extracts the content. For files that need OCR, this can take a long time.
Solution
Tika documents several ways to turn off OCR here. Because tika-python talks to a Tika Server, the last of them, a request header, can be used:
parser.from_file(x, service = 'meta', headers = {"X-Tika-OCRskipOcr": 'true'})
This also works with service = 'all'. In that case the result still includes any content that can be extracted without OCR.
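The header can be wrapped in a small helper so that every call skips OCR by default. The following is only a sketch: the names SKIP_OCR_HEADER, skip_ocr_headers and parse_without_ocr are hypothetical and not part of tika-python. It assumes that parser.from_file accepts the service and headers keyword arguments as in the call above, and that a Tika Server is reachable when the real parser is used.

```python
# Header name and value are taken from the call shown above.
# All helper names below are hypothetical, not part of tika-python.
SKIP_OCR_HEADER = {"X-Tika-OCRskipOcr": "true"}


def skip_ocr_headers(extra=None):
    """Return request headers that ask the Tika Server to skip OCR.

    Any caller-supplied headers in `extra` are kept; the skip-OCR
    header is added on top of them.
    """
    headers = dict(extra or {})
    headers.update(SKIP_OCR_HEADER)
    return headers


def parse_without_ocr(path, service="meta", parse=None):
    """Parse `path` with OCR disabled.

    `parse` defaults to tika's parser.from_file. It can be replaced
    by any callable with the same keyword arguments, which makes the
    helper usable without a running Tika Server (for example in tests).
    """
    if parse is None:
        # Imported lazily: requires tika-python and a reachable Tika Server.
        from tika import parser
        parse = parser.from_file
    return parse(path, service=service, headers=skip_ocr_headers())


# Example usage (the file name is a placeholder):
#   parsed = parse_without_ocr("scan.pdf")                  # metadata only
#   parsed = parse_without_ocr("scan.pdf", service="all")   # metadata plus non-OCR content
#   print(parsed.get("metadata"))
```

Passing the parse function as a parameter is a design choice for this sketch only; calling parser.from_file directly with the header, as shown above, has the same effect.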