pdf manipulation - jgrey4296/templates GitHub Wiki

Pdfs

Tools

calibre ebook-meta

https://manual.calibre-ebook.com/generated/en/ebook-meta.html

exiftool

https://exiftool.org/ https://exiftool.org/exiftool_pod.html

ffmpeg

https://ffmpeg.org/

ffmpeg --help
# convert wav to mp3:
ffmpeg -i input.wav -vn -ar 44100 -ac 2 -b:a 192k output.mp3

-i   : set input
-vn  : no video
-ar  : audio rate
-ac  : audio channels
-b:a : bitrate

from https://superuser.com/questions/384073
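The single-file conversion above extends naturally to a whole directory. The following is a minimal sketch, not a tested workflow: `mp3_name` and `wav_dir_to_mp3` are hypothetical helper names, and the ffmpeg flags are copied unchanged from the command above.

```shell
# mp3_name FILE.wav: print the matching .mp3 name (hypothetical helper).
mp3_name() {
    printf '%s.mp3\n' "${1%.wav}"
}

# wav_dir_to_mp3 [DIR]: convert every .wav in DIR (default: current directory).
wav_dir_to_mp3() {
    dir=${1:-.}
    for wav in "$dir"/*.wav; do
        [ -e "$wav" ] || continue   # the glob matched nothing
        ffmpeg -i "$wav" -vn -ar 44100 -ac 2 -b:a 192k "$(mp3_name "$wav")"
    done
}
```

Usage: `wav_dir_to_mp3 ./recordings` writes each .mp3 next to its source .wav.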

# ffmpeg -i <INPUT FILE> -ss 10 -f image2 -r 25 <OUTPUT FILE>

-i <INPUT FILE> : Specifies the input file. E.g. movie.mp4.
-ss <TIME>      : Specifies time position in seconds. "hh:mm:ss[.xxx]" is also supported.
-f image2       : Force/Set format.
-r 25           : Set frame rate (in Hz. Can either be a fraction or a number, default = 25).
<OUTPUT FILE>   : Set output file. E.g. image1.jpg.

https://stackoverflow.com/questions/10957412
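To grab exactly one frame, the command above can be wrapped as below. This is a sketch under assumptions: `to_timestamp` and `grab_frame` are hypothetical names, and `-frames:v 1` is an addition not present in the original command, used here so that ffmpeg stops after writing a single image.

```shell
# to_timestamp SECONDS: format whole seconds as hh:mm:ss (hypothetical helper).
to_timestamp() {
    s=$1
    printf '%02d:%02d:%02d\n' $((s / 3600)) $((s % 3600 / 60)) $((s % 60))
}

# grab_frame INPUT SECONDS OUTPUT: write one frame taken at the given position.
grab_frame() {
    # -frames:v 1 is an assumption added to the original command line.
    ffmpeg -i "$1" -ss "$(to_timestamp "$2")" -f image2 -frames:v 1 "$3"
}
```

Usage: `grab_frame movie.mp4 10 image1.jpg`.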

imagemagick

https://imagemagick.org/script/command-line-tools.php

animate   :: animate images when in x11
compare   :: differences between images
composite :: overlap images
conjure   :: scripting language interpreter
convert   :: between different formats
display   :: display image when in x11
identify  :: get format data
import    :: screenshot x11
mogrify   :: destructively modify
montage   :: combine without overlapping
stream    :: pixels at a time
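Because mogrify overwrites its inputs, one way to keep the originals is its `-path` option, which writes results into another directory. A minimal sketch, assuming a hypothetical wrapper name `resize_copies` and a width-only resize as the example operation:

```shell
# resize_copies WIDTH OUTDIR FILE...: resize into OUTDIR, leaving inputs untouched.
resize_copies() {
    width=$1 outdir=$2
    shift 2
    mkdir -p "$outdir" || return 1
    # "${width}x" sets the width only; the height follows the aspect ratio.
    mogrify -path "$outdir" -resize "${width}x" "$@"
}
```

Usage: `resize_copies 800 ./small *.png`.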

pdfimages

pdfimages --help
pdfimages version 22.12.0
Copyright 2005-2022 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011, 2022 Glyph & Cog, LLC
Usage: pdfimages [options] <PDF-file> <image-root>
  -f <int>       : first page to convert
  -l <int>       : last page to convert
  -png           : change the default output format to PNG
  -tiff          : change the default output format to TIFF
  -j             : write JPEG images as JPEG files
  -jp2           : write JPEG2000 images as JP2 files
  -jbig2         : write JBIG2 images as JBIG2 files
  -ccitt         : write CCITT images as CCITT files
  -all           : equivalent to -png -tiff -j -jp2 -jbig2 -ccitt
  -list          : print list of images instead of saving
  -opw <string>  : owner password (for encrypted files)
  -upw <string>  : user password (for encrypted files)
  -p             : include page numbers in output file names
  -q             : don't print any messages or errors
  -v             : print copyright and version info
  -h             : print usage information
  -help          : print usage information
  --help         : print usage information
  -?             : print usage information
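A typical use of the options above is to list the embedded images first and then save them all in their native formats. The sketch below assumes hypothetical helper names (`image_root`, `extract_images`) and uses only flags shown in the usage text.

```shell
# image_root PDF OUTDIR: build the <image-root> prefix, e.g. out/report.
image_root() {
    printf '%s/%s\n' "$2" "$(basename "$1" .pdf)"
}

# extract_images PDF OUTDIR: list the embedded images, then save them all.
extract_images() {
    mkdir -p "$2" || return 1
    pdfimages -list "$1"
    # -all keeps native formats; -p adds page numbers to the file names.
    pdfimages -all -p "$1" "$(image_root "$1" "$2")"
}
```

Usage: `extract_images docs/report.pdf out` writes files whose names start with `out/report`.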

pdftk

https://www.pdflabs.com/docs/pdftk-man-page/

# view metadata:
exiftool file.pdf
# or dump it to an editable file:
pdftk file.pdf dump_data_utf8 > file.info
# edit file.info, then write it back:
pdftk file.pdf update_info_utf8 file.info output file2.pdf
# For Creating Bookmarks/TOC in pdfs:
# BookmarkBegin
# BookmarkTitle:
# BookmarkLevel: 1
# BookmarkPageNumber:
pdftk ? dump_data > info.txt
# -- Add bookmarks
pdftk ? update_info info.txt output updated.pdf
# --
pdftk ? attach_files
pdftk ? dump_data_annots

# -- Add a custom info key, where ./info contains:
# InfoBegin
# InfoKey: JGData
# InfoValue: Blah,Blee
pdftk ? update_info ./info output out3.pdf
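The bookmark workflow (dump, add a record, update) can be scripted. This is a sketch, not a verified recipe: `bookmark` and `add_bookmark` are hypothetical names, and it assumes pdftk accepts a bookmark record appended to the end of the dumped data.

```shell
# bookmark TITLE LEVEL PAGE: print one bookmark record in pdftk's dump format.
bookmark() {
    printf 'BookmarkBegin\nBookmarkTitle: %s\nBookmarkLevel: %s\nBookmarkPageNumber: %s\n' \
        "$1" "$2" "$3"
}

# add_bookmark IN.pdf OUT.pdf TITLE LEVEL PAGE
add_bookmark() {
    info=$(mktemp) || return 1
    # Assumption: appending the record after the existing data is accepted.
    pdftk "$1" dump_data_utf8 > "$info" &&
        bookmark "$3" "$4" "$5" >> "$info" &&
        pdftk "$1" update_info_utf8 "$info" output "$2"
    status=$?
    rm -f "$info"
    return "$status"
}
```

Usage: `add_bookmark in.pdf out.pdf "Chapter 1" 1 5`.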

pdftotext

Usage: pdftotext [options] <PDF-file> [<text-file>]
  -f <int>             : first page to convert
  -l <int>             : last page to convert
  -r <fp>              : resolution, in DPI (default is 72)
  -x <int>             : x-coordinate of the crop area top left corner
  -y <int>             : y-coordinate of the crop area top left corner
  -W <int>             : width of crop area in pixels (default is 0)
  -H <int>             : height of crop area in pixels (default is 0)
  -layout              : maintain original physical layout
  -fixed <fp>          : assume fixed-pitch (or tabular) text
  -raw                 : keep strings in content stream order
  -nodiag              : discard diagonal text
  -htmlmeta            : generate a simple HTML file, including the meta information
  -tsv                 : generate a simple TSV file, including the meta information for bounding boxes
  -enc <string>        : output text encoding name
  -listenc             : list available encodings
  -eol <string>        : output end-of-line convention (unix, dos, or mac)
  -nopgbrk             : don't insert page breaks between pages
  -bbox                : output bounding box for each word and page size to html. Sets -htmlmeta
  -bbox-layout         : like -bbox but with extra layout bounding box data.  Sets -htmlmeta
  -cropbox             : use the crop box rather than media box
  -colspacing <fp>     : how much spacing we allow after a word before considering adjacent text to be a new column, as a fraction of the font size (default is 0.7, old releases had a 0.3 default)
  -opw <string>        : owner password (for encrypted files)
  -upw <string>        : user password (for encrypted files)
  -q                   : don't print any messages or errors
  -v                   : print copyright and version info
  -h                   : print usage information
  -help                : print usage information
  --help               : print usage information
  -?                   : print usage information
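Unless -nopgbrk is given, pdftotext separates pages with form feed characters, which makes it possible to report which pages contain a pattern. A minimal sketch, with hypothetical function names `pages_matching` and `find_in_pdf`; it passes `-` as the text file so that pdftotext writes to stdout.

```shell
# pages_matching PATTERN: read pdftotext output on stdin and print the
# numbers of the pages whose text matches PATTERN (an awk regex).
pages_matching() {
    awk -v pat="$1" 'BEGIN { RS = "\f" } $0 ~ pat { print NR }'
}

# find_in_pdf PATTERN PDF: page numbers are counted from the first page converted.
find_in_pdf() {
    pdftotext -layout "$2" - | pages_matching "$1"
}
```

Usage: `find_in_pdf 'invoice' file.pdf`.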

python exif

https://gitlab.com/TNThieding/exif

import exif
with open(file, 'rb') as f:
    data = exif.Image(f)

Then delete the user_comment attribute, set it again, and write the result to a file using the bytes returned by data.get_file().

tesseract

https://tesseract-ocr.github.io/tessdoc/Command-Line-Usage.html

Topics

images to pdf

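# strip the alpha channel, writing into ./temp:
convert ? -alpha off ./temp/`?`
# set the orientation metadata:
mogrify -orient bottom-left ?
# wrap the images in A4 pdf pages:
img2pdf --output `?`.pdf --pagesize A4 --auto-orient ?
# join the pdfs in the current directory:
pdftk * cat output diagrams.pdf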
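The convert and img2pdf steps can be combined into one function that produces a single multi-page pdf. This is a sketch under assumptions: `images_to_pdf` is a hypothetical name, the mogrify orientation step is left out, and img2pdf is given all images at once, so the final pdftk cat is not needed here.

```shell
# images_to_pdf OUT.pdf IMAGE...: one A4 page per image, in argument order.
images_to_pdf() {
    out=$1
    shift
    tmp=$(mktemp -d) || return 1
    n=0
    for img in "$@"; do
        n=$((n + 1))
        # A numbered prefix keeps argument order and avoids name clashes.
        dest=$(printf '%s/%04d-%s' "$tmp" "$n" "$(basename "$img")")
        convert "$img" -alpha off "$dest" || { rm -rf "$tmp"; return 1; }
    done
    img2pdf --output "$out" --pagesize A4 --auto-orient "$tmp"/*
    status=$?
    rm -rf "$tmp"
    return "$status"
}
```

Usage: `images_to_pdf diagrams.pdf page1.png page2.png`.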