Tesseract

[Source Code]
- galaxy/tools/tesseract.xml : The configuration file that describes the tool's usage, inputs, outputs, version, and other settings.
- galaxy/tools/run-tesseract.py : A Python wrapper that runs FFmpeg on the input video to extract frames. The frames are then passed to Tesseract, which runs OCR and produces JSON output. The JSON contains every text prediction, with its bounding box coordinates, for each frame.
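
The flow described for the wrapper can be sketched roughly as follows. This is a minimal illustration, not the actual contents of run-tesseract.py: the function names, the one-frame-per-second sampling rate, and the JSON field names (`frames`, `start`, `objects`, `text`, `score`, `box`) are all assumptions made for the example. It relies only on the `ffmpeg` command-line `fps` filter and on `pytesseract.image_to_data`.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path, frames_dir, fps=1.0):
    """Return an ffmpeg command that writes `fps` frames per second as PNG files."""
    pattern = str(Path(frames_dir) / "frame_%06d.png")
    return ["ffmpeg", "-i", str(video_path), "-vf", f"fps={fps}", pattern]


def words_from_tesseract_data(data, min_conf=0.0):
    """Convert the dict returned by pytesseract.image_to_data into text/box records.

    The record layout here is hypothetical, not the AMP schema.
    """
    records = []
    for i, text in enumerate(data["text"]):
        conf = float(data["conf"][i])
        if not text.strip() or conf < min_conf:
            continue  # skip empty entries and layout-only rows (conf == -1)
        left, top = data["left"][i], data["top"][i]
        records.append({
            "text": text,
            "score": conf,
            # Box as [x_min, y_min, x_max, y_max] in pixels.
            "box": [left, top, left + data["width"][i], top + data["height"][i]],
        })
    return records


def ocr_video(video_path, frames_dir, fps=1.0):
    """Extract frames with ffmpeg, OCR each one, and return a JSON-serializable dict."""
    # Imported lazily so the helpers above work without Tesseract installed.
    import pytesseract
    from PIL import Image

    Path(frames_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_ffmpeg_command(video_path, frames_dir, fps), check=True)

    frames = []
    for index, frame_file in enumerate(sorted(Path(frames_dir).glob("frame_*.png"))):
        with Image.open(frame_file) as image:
            data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
        # index / fps is only an approximate timestamp for the sampled frame.
        frames.append({"start": index / fps, "objects": words_from_tesseract_data(data)})
    return {"frames": frames}
```

Calling `json.dumps(ocr_video("input.mp4", "frames"))` would then give one JSON document covering all sampled frames; the file names here are placeholders.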

[Installations]
- $ sudo apt-get install ffmpeg
- $ pip install pytesseract
- $ sudo apt install tesseract-ocr
- $ sudo apt install libtesseract-dev
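
After installing, a quick check that the command-line tools are reachable can save a failed tool run later. The snippet below is an illustrative helper, not part of the AMP code base; `missing_tools` is a hypothetical name, and it only looks for the `ffmpeg` and `tesseract` executables on PATH.

```python
import shutil


def missing_tools(names=("ffmpeg", "tesseract")):
    """Return the command names from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("missing:", ", ".join(missing) if missing else "none")
```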

[Running]
- The tool can be invoked from the Galaxy UI like any other tool. The user needs to supply the input data as a video file.

[Parameters]
- input_video: The video file to run OCR on.
- dedupe: Whether to remove consecutive frames that contain the same text. Default: true.
- period: Time span in seconds over which consecutive frames with the same text are treated as duplicates. Default: 5 seconds.
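
One possible reading of how `dedupe` and `period` interact is sketched below. This is an assumption made for illustration, not the tool's confirmed behavior: a frame is dropped when its text matches the frame immediately before it and the two start no more than `period` seconds apart. The frame layout (`start`, `objects`, `text`) and the function names are hypothetical.

```python
def frame_text(frame):
    """Collect the recognized strings of a frame, in order, for comparison."""
    return tuple(obj["text"] for obj in frame["objects"])


def dedupe_frames(frames, period=5.0):
    """Drop frames that repeat the previous frame's text within `period` seconds."""
    kept = []
    last_text = None
    last_start = None
    for frame in frames:
        text = frame_text(frame)
        is_duplicate = (
            last_text is not None
            and text == last_text
            and frame["start"] - last_start <= period
        )
        if not is_duplicate:
            kept.append(frame)
        # Track the most recent frame seen, kept or not, so that a long run
        # of identical frames collapses to its first frame.
        last_text, last_start = text, frame["start"]
    return kept
```

Under this reading, a gap longer than `period` between two frames with identical text starts a new entry, so text that reappears later in the video is not lost.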

[Output]
- amp_vocr: The OCR output, with all the recognized text in each frame and the corresponding bounding boxes. It also includes other information such as frame rate and resolution.
- amp_vocr_dedupe: The AMP OCR JSON with duplicate frames removed.

[More information about Tesseract is here.]

Document generated by Confluence on Feb 25, 2025 10:39