Home - smalot/pdfparser GitHub Wiki
PdfParser, a standalone PHP library, provides various tools to extract data from a PDF file.
Currently, secured documents are not supported.
This library is under active maintenance. The author is not actively developing it at the moment, but pull requests that add or extend functionality are welcome!
This project is supported by Actualys.
Features
- Load/parse objects and headers
- Extract metadata (author, description, ...)
- Extract text from ordered pages
- Support for compressed PDFs
- Support for Mac OS Roman charset encoding
- Handling of hexadecimal and octal encoding in text sections
- PSR-0 compliant (autoloader)
- PSR-1 compliant (code styling)
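The features above can be illustrated with a short sketch. It assumes the library was installed with Composer and uses the `Parser::parseFile()`, `getDetails()`, `getPages()` and `getText()` calls from the library's public API. The helper names `formatDetails()` and `summarizePdf()`, and the autoloader path, are hypothetical and exist only for this example.

```php
<?php
// Sketch only: load Composer's autoloader if it exists at this (assumed) path.
$autoload = __DIR__ . '/vendor/autoload.php';
if (is_file($autoload)) {
    require_once $autoload;
}

/**
 * Hypothetical helper: flatten a metadata array into "Key: value" lines.
 * Values that are themselves flat arrays are joined with commas.
 */
function formatDetails(array $details): string
{
    $lines = [];
    foreach ($details as $property => $value) {
        if (is_array($value)) {
            $value = implode(', ', $value);
        }
        $lines[] = $property . ': ' . $value;
    }
    return implode("\n", $lines);
}

/**
 * Hypothetical helper: parse a PDF and collect its metadata,
 * its full text, and the text of each page in order.
 */
function summarizePdf(string $path): array
{
    $parser = new \Smalot\PdfParser\Parser();
    $pdf = $parser->parseFile($path);

    $pageTexts = [];
    foreach ($pdf->getPages() as $page) {
        $pageTexts[] = $page->getText();
    }

    return [
        'details' => formatDetails($pdf->getDetails()),
        'text'    => $pdf->getText(),
        'pages'   => $pageTexts,
    ];
}
```

Calling `summarizePdf('document.pdf')` (with a file name of your own) would return the three pieces side by side. Secured documents are not supported, so expect parsing to fail for them.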
Prerequisites
This library requires PHP 7.1+ (since v1.0.0).
PdfParser is built on top of the TCPDF parser.
Composer downloads this library automatically when run from the command line.
If you can't use Composer, you can include alt_autoload.php-dist in your project instead.
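As a minimal sketch of the two loading options, the helper below first looks for Composer's autoloader and falls back to `alt_autoload.php-dist`. The function name `loadPdfParser()` and both directory parameters are hypothetical. The sketch also assumes that `alt_autoload.php-dist` sits at the root of your copy of the library.

```php
<?php
/**
 * Hypothetical helper: load PdfParser through Composer when available,
 * otherwise through the bundled fallback autoloader.
 *
 * $projectDir: directory that would contain Composer's vendor/ folder (assumed).
 * $libraryDir: directory holding a manual copy of the library (assumed).
 */
function loadPdfParser(string $projectDir, string $libraryDir): bool
{
    $candidates = [
        $projectDir . '/vendor/autoload.php',
        $libraryDir . '/alt_autoload.php-dist',
    ];

    foreach ($candidates as $file) {
        if (is_file($file)) {
            require_once $file;
            return true;
        }
    }

    // Neither autoloader was found; the caller decides how to report this.
    return false;
}
```

Returning `false` instead of throwing lets the calling script decide how to react when neither file is present.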
License
This library is released under the LGPLv3 license.