docs.pdfs - jgrey4296/jgrey4296.github.io GitHub Wiki

Pdfs

Pdf Tools

exiftool

Usage

# List tags in file:
exiftool -forcePrint -duplicates -groupHeadings -unknown a/file.pdf
# Get a tag:
exiftool -tag a/file.pdf
# Write a tag:
exiftool -tag=value a/file.pdf

# Output as JSON:
exiftool -j a/file.pdf
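
The JSON output can be consumed from a script. A minimal Python sketch, assuming `exiftool` is on the PATH and that `-j` prints a JSON array with one object per file, each carrying a `SourceFile` key. The function names and the sample string are made up for illustration.

```python
import json
import subprocess


def parse_exiftool_json(text):
    """Turn `exiftool -j` output into {SourceFile: tag dict}."""
    records = json.loads(text)
    return {rec["SourceFile"]: rec for rec in records}


def read_tags(path):
    """Run `exiftool -j` on a single file and return its tag dict."""
    result = subprocess.run(
        ["exiftool", "-j", path],
        capture_output=True, text=True, check=True,
    )
    # One input file gives a one-element array.
    return json.loads(result.stdout)[0]


# Illustrative sample, not real exiftool output:
SAMPLE = '[{"SourceFile": "a/file.pdf", "Title": "Example"}]'
tags = parse_exiftool_json(SAMPLE)
```

`read_tags("a/file.pdf")` would then give a plain dict of tag names to values.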

Custom XMP Namespace

# In the ExifTool config file:
%Image::ExifTool::UserDefined = (
# Define a new namespace
    'Image::ExifTool::XMP::Main' => {
        # namespace definition for the bibtex tags
        bibtex => { # <-- must be the same as the NAMESPACE prefix
            SubDirectory => {
                TagTable => 'Image::ExifTool::UserDefined::bibtex',
                # (see the definition of this table below)
            },
        },
        # add more user-defined XMP namespaces here...
    },
);

# Then define its components
%Image::ExifTool::UserDefined::bibtex = (
    GROUPS     => { 0 => 'XMP', 1 => 'XMP-bib', 2 => 'bibtex' },
    NAMESPACE  => { 'bibtex' => 'http://www.bibtex.org/' },
    WRITABLE   => 'string', # (default to string-type tags)
    Full       => { Writable => 'string' },
    Tags       => { List     => 'Bag'},

    Entry      => {
        # the "Struct" entry defines the structure fields
        Struct => {
            # structure fields (very similar to tag definitions)
                   Key         => {},
                   Type        => {},
                   Title       => {},
                   Author      => {},
                   Editor      => {},
                   Journal     => {},
                   Booktitle   => {},
                   Institution => {},
                   Note        => {},
                   Publisher   => {},
                   Issn        => {},
                   Isbn        => {},
                   DOI         => {},
                   Url         => {},
                   Year        => { Writable => 'integer' },
        },
    },
);
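# In use:
exiftool -bibtex:full="blah" a/file.pdf
# Single-quote the struct so the shell passes it through unchanged:
exiftool -bibtex:entry='{Type=blah,Publisher=blah}' a/file.pdf
# Note there's no separator between Entry and Journal in the flattened tag name:
exiftool -bibtex:entryjournal="awegaweg" a/file.pdf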
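
Building the struct argument for `-bibtex:entry` by hand gets fiddly once values contain commas or braces. A hedged Python sketch follows. It assumes ExifTool's struct syntax is `{Field=value,Field=value}` and that `|` escapes `|`, `,`, `}` and `]` anywhere, plus `{`, `[` or whitespace at the start of a value. Check this against the ExifTool struct documentation before relying on it. `escape_value` and `to_struct` are hypothetical helper names.

```python
def escape_value(value):
    """Escape a field value with leading pipes (assumed ExifTool rules)."""
    text = str(value)
    out = []
    for i, ch in enumerate(text):
        if ch in "|,}]":
            out.append("|")
        elif i == 0 and (ch in "{[" or ch.isspace()):
            out.append("|")
        out.append(ch)
    return "".join(out)


def to_struct(fields):
    """Serialise a dict of field names to values as one struct string."""
    parts = [f"{name}={escape_value(value)}" for name, value in fields.items()]
    return "{" + ",".join(parts) + "}"


entry = to_struct({"Type": "article", "Publisher": "A, B"})
```

The result can be passed as a single argument, e.g. `["exiftool", f"-bibtex:entry={entry}", "a/file.pdf"]` with `subprocess.run`, which avoids shell quoting altogether.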

pdfimages

pdfimages --help
pdfimages version 22.12.0
Copyright 2005-2022 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011, 2022 Glyph & Cog, LLC
Usage: pdfimages [options] <PDF-file> <image-root>
  -f <int>       : first page to convert
  -l <int>       : last page to convert
  -png           : change the default output format to PNG
  -tiff          : change the default output format to TIFF
  -j             : write JPEG images as JPEG files
  -jp2           : write JPEG2000 images as JP2 files
  -jbig2         : write JBIG2 images as JBIG2 files
  -ccitt         : write CCITT images as CCITT files
  -all           : equivalent to -png -tiff -j -jp2 -jbig2 -ccitt
  -list          : print list of images instead of saving
  -opw <string>  : owner password (for encrypted files)
  -upw <string>  : user password (for encrypted files)
  -p             : include page numbers in output file names
  -q             : don't print any messages or errors
  -v             : print copyright and version info
  -h             : print usage information
  -help          : print usage information
  --help         : print usage information
  -?             : print usage information

pdftk

exiftool file.pdf

# or:
pdftk file.pdf dump_data_utf8 > file.info
# edit
pdftk file.pdf update_info_utf8 file.info output file2.pdf
# For Creating Bookmarks/TOC in pdfs:
# BookmarkBegin
# BookmarkTitle:
# BookmarkLevel: 1
# BookmarkPageNumber:
pdftk ? dump_data > info.txt
# -- Add bookmarks
pdftk ? update_info info.txt output updated.pdf
# --
pdftk ? attach_files
pdftk ? dump_data_annots

# --
pdftk ? update_info ./info output out3.pdf
# where ./info contains a custom metadata record:
InfoBegin
InfoKey: JGData
InfoValue: Blah,Blee
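
The bookmark records for `update_info` can be generated rather than typed. A small Python sketch, using only the four `Bookmark*` fields listed above. The function name and the `(title, level, page)` tuple layout are assumptions for illustration.

```python
def bookmark_records(bookmarks):
    """Render (title, level, page) tuples as pdftk bookmark records."""
    lines = []
    for title, level, page in bookmarks:
        lines += [
            "BookmarkBegin",
            f"BookmarkTitle: {title}",
            f"BookmarkLevel: {level}",
            f"BookmarkPageNumber: {page}",
        ]
    return "\n".join(lines) + "\n"


toc = bookmark_records([
    ("Introduction", 1, 1),
    ("Background", 2, 3),
])
```

Appending `toc` to the dumped `info.txt` and then running the `update_info` command above should add the entries.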

pdftotext

pdftotext [options] <PDF-file> [<text-file>]
Usage: pdftotext [options] <PDF-file> [<text-file>]
  -f <int>             : first page to convert
  -l <int>             : last page to convert
  -r <fp>              : resolution, in DPI (default is 72)
  -x <int>             : x-coordinate of the crop area top left corner
  -y <int>             : y-coordinate of the crop area top left corner
  -W <int>             : width of crop area in pixels (default is 0)
  -H <int>             : height of crop area in pixels (default is 0)
  -layout              : maintain original physical layout
  -fixed <fp>          : assume fixed-pitch (or tabular) text
  -raw                 : keep strings in content stream order
  -nodiag              : discard diagonal text
  -htmlmeta            : generate a simple HTML file, including the meta information
  -tsv                 : generate a simple TSV file, including the meta information for bounding boxes
  -enc <string>        : output text encoding name
  -listenc             : list available encodings
  -eol <string>        : output end-of-line convention (unix, dos, or mac)
  -nopgbrk             : don't insert page breaks between pages
  -bbox                : output bounding box for each word and page size to html. Sets -htmlmeta
  -bbox-layout         : like -bbox but with extra layout bounding box data.  Sets -htmlmeta
  -cropbox             : use the crop box rather than media box
  -colspacing <fp>     : how much spacing we allow after a word before considering adjacent text to be a new column, as a fraction of the font size (default is 0.7, old releases had a 0.3 default)
  -opw <string>        : owner password (for encrypted files)
  -upw <string>        : user password (for encrypted files)
  -q                   : don't print any messages or errors
  -v                   : print copyright and version info
  -h                   : print usage information
  -help                : print usage information
  --help               : print usage information
  -?                   : print usage information
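
A sketch of driving `pdftotext` from Python with a few of the options above. It assumes `pdftotext` is on the PATH and that passing `-` as the text file sends the text to stdout. The function names are hypothetical.

```python
import subprocess


def pdftotext_cmd(pdf, first=None, last=None, layout=False, out="-"):
    """Build the pdftotext argument list for a page range."""
    cmd = ["pdftotext"]
    if first is not None:
        cmd += ["-f", str(first)]
    if last is not None:
        cmd += ["-l", str(last)]
    if layout:
        cmd.append("-layout")
    return cmd + [pdf, out]


def extract_text(pdf, **options):
    """Run pdftotext and return the text (only useful with out='-')."""
    result = subprocess.run(
        pdftotext_cmd(pdf, **options),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


cmd = pdftotext_cmd("a/file.pdf", first=1, last=3, layout=True)
```

`extract_text("a/file.pdf", first=1, last=3)` would then return the first three pages as one string.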

python exif

https://gitlab.com/TNThieding/exif

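import exif

with open('image.jpg', 'rb') as f:  # placeholder path
    data = exif.Image(f)

# Delete the user_comment if present, set it,
# and write to a file using data.get_file()
if 'user_comment' in data.list_all():
    data.delete('user_comment')
data.user_comment = 'new comment'
with open('image_out.jpg', 'wb') as f:  # placeholder path
    f.write(data.get_file())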

qpdf

qpdf {file} [options] {file}
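# Check file structure
# exit status 0: no problems, 2: errors, 3: warnings
qpdf --check {file}
# Check if the pdf needs a password
# exit status 0: password required, 2: not encrypted, 3: encrypted but no password needed
qpdf --requires-password {file}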
# Remove owner restrictions
qpdf --decrypt {file} {unlocked_file}
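
The exit statuses of `qpdf --check` are easier to act on from a script. A minimal Python sketch, assuming `qpdf` is on the PATH. The 2 and 3 meanings come from the notes above, 0 meaning "no problems" is an assumption, and the function names are hypothetical.

```python
import subprocess

# Exit statuses of `qpdf --check` (0 is assumed to mean success).
CHECK_STATUS = {0: "ok", 2: "errors", 3: "warnings"}


def describe_check(code):
    """Translate a `qpdf --check` exit status into a short label."""
    return CHECK_STATUS.get(code, f"unexpected exit status {code}")


def check_pdf(path):
    """Run `qpdf --check` on a file and return the label for its status."""
    result = subprocess.run(
        ["qpdf", "--check", path],
        capture_output=True, text=True,
    )
    return describe_check(result.returncode)
```

`check=True` is left off on purpose, since a non-zero status is part of the expected result here.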

Ghostscript / gs

man gs

Howto

images to pdf

convert ? -alpha off ./temp/`?`
mogrify -orient bottom-left ?
img2pdf --output `?`.pdf --pagesize A4 --auto-orient ?
pdftk * cat output diagrams.pdf

Links
