docs.pdfs - jgrey4296/jgrey4296.github.io GitHub Wiki
# List tags in file:
exiftool -forcePrint -duplicates -groupHeadings -unknown a/file.pdf
# Get a tag:
exiftool -tag a/file.pdf
# Write a tag:
exiftool -tag=value a/file.pdf
# Output as JSON:
exiftool -j a/file.pdf
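The JSON form is the easiest one to consume from a script. A minimal sketch, assuming `exiftool` is on the PATH; `parse_exiftool_json` and `read_tags` are hypothetical helper names, not part of ExifTool. `exiftool -j` prints a JSON array with one object per file, each carrying a `SourceFile` key.

```python
import json
import subprocess


def parse_exiftool_json(text):
    """Map SourceFile -> tag dict from the JSON array printed by `exiftool -j`."""
    records = json.loads(text)
    return {rec["SourceFile"]: rec for rec in records}


def read_tags(path):
    """Run `exiftool -j` on one file and return its tags (needs exiftool installed)."""
    out = subprocess.run(
        ["exiftool", "-j", path],
        check=True, capture_output=True, text=True,
    ).stdout
    # Only one file was passed, so take the single record
    return next(iter(parse_exiftool_json(out).values()))
```

Usage would be something like `read_tags("a/file.pdf").get("Title")`.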
# In the ExifTool config file:
%Image::ExifTool::UserDefined = (
# Define a new namespace
'Image::ExifTool::XMP::Main' => {
# namespace definition for the bibtex tags
bibtex => { # <-- must be the same as the NAMESPACE prefix
SubDirectory => {
TagTable => 'Image::ExifTool::UserDefined::bibtex',
# (see the definition of this table below)
},
},
# add more user-defined XMP namespaces here...
},
);
# Then define its components
%Image::ExifTool::UserDefined::bibtex = (
GROUPS => { 0 => 'XMP', 1 => 'XMP-bib', 2 => 'bibtex' },
NAMESPACE => { 'bibtex' => 'http://www.bibtex.org/' },
WRITABLE => 'string', # (default to string-type tags)
Full => { Writable => 'string' },
Tags => { List => 'Bag'},
Entry => {
# the "Struct" entry defines the structure fields
Struct => {
# structure fields (very similar to tag definitions)
Key => {},
Type => {},
Title => {},
Author => {},
Editor => {},
Journal => {},
Booktitle => {},
Institution => {},
Note => {},
Publisher => {},
Issn => {},
Isbn => {},
DOI => {},
Url => {},
Year => { Writable => 'integer' },
},
},
);
# In Use:
exiftool -bibtex:full="blah" a/file.pdf
exiftool -bibtex:entry='{Type=blah,Publisher=blah}' a/file.pdf
# Note there's no separator between Entry and Journal in the flattened tag name:
exiftool -bibtex:entryjournal="awegaweg" a/file.pdf
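Writing the struct by hand gets awkward once values contain commas or braces. A rough sketch of building the `{Field=value,...}` argument from a dict; `escape_struct_value` and `struct_arg` are hypothetical names. The escaping rule used here (prefix `|`, `,`, `}` and `]` with a `|`) is my reading of ExifTool's struct serialization; values that start with whitespace or an opening bracket are not handled.

```python
def escape_struct_value(value):
    """Prefix characters that ExifTool reads as struct syntax with '|'."""
    out = []
    for ch in str(value):
        if ch in "|,}]":
            out.append("|")
        out.append(ch)
    return "".join(out)


def struct_arg(tag, fields):
    """Build e.g. -bibtex:entry={Type=article,Year=2020} from a dict of fields."""
    body = ",".join(f"{k}={escape_struct_value(v)}" for k, v in fields.items())
    return f"-{tag}={{{body}}}"
```

Passing the result as one list element, e.g. `subprocess.run(["exiftool", struct_arg("bibtex:entry", {...}), "a/file.pdf"])`, avoids the shell quoting problems entirely.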
pdfimages --help
exiftool file.pdf
# or:
pdftk file.pdf dump_data_utf8 > file.info
# edit file.info, then write it back:
pdftk file.pdf update_info_utf8 file.info output file2.pdf
# For Creating Bookmarks/TOC in pdfs:
# BookmarkBegin
# BookmarkTitle:
# BookmarkLevel: 1
# BookmarkPageNumber:
pdftk ? dump_data > info.txt
# -- Add bookmarks
pdftk ? update_info info.txt output updated.pdf
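For more than a handful of bookmarks it is easier to generate the records than to type them. A small sketch that emits the four-line record format listed above; `bookmark_block` and `bookmarks_text` are hypothetical names, and titles are assumed to be plain ASCII (otherwise use the `_utf8` variants of dump_data / update_info).

```python
def bookmark_block(title, page, level=1):
    """Return one bookmark record in pdftk's dump_data format."""
    return (
        "BookmarkBegin\n"
        f"BookmarkTitle: {title}\n"
        f"BookmarkLevel: {level}\n"
        f"BookmarkPageNumber: {page}\n"
    )


def bookmarks_text(entries):
    """entries: iterable of (title, page, level) tuples, in document order."""
    return "".join(bookmark_block(t, p, lvl) for t, p, lvl in entries)
```

The output of `bookmarks_text([("Intro", 1, 1), ("Setup", 3, 2)])` can be appended to info.txt before running update_info.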
# --
pdftk ? attach_files
pdftk ? dump_data_annots
# --
# ./info holds the custom Info entry listed below:
pdftk ? update_info ./info output out3.pdf
InfoBegin
InfoKey: JGData
InfoValue: Blah,Blee
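Reading a custom key such as JGData back out means scanning the dump_data text for InfoKey / InfoValue pairs. A minimal sketch, assuming each InfoKey line is followed by its InfoValue line as in the record above; `parse_info` and `info_block` are hypothetical names.

```python
def parse_info(dump_text):
    """Collect InfoKey/InfoValue pairs from pdftk dump_data output into a dict."""
    info = {}
    key = None
    for line in dump_text.splitlines():
        if line.startswith("InfoKey: "):
            key = line[len("InfoKey: "):]
        elif line.startswith("InfoValue: ") and key is not None:
            info[key] = line[len("InfoValue: "):]
            key = None
    return info


def info_block(key, value):
    """Return one Info record ready to append to an update_info file."""
    return f"InfoBegin\nInfoKey: {key}\nInfoValue: {value}\n"
```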
pdftotext [options] <PDF-file> [<text-file>]
Usage: pdftotext [options] <PDF-file> [<text-file>]
-f <int> : first page to convert
-l <int> : last page to convert
-r <fp> : resolution, in DPI (default is 72)
-x <int> : x-coordinate of the crop area top left corner
-y <int> : y-coordinate of the crop area top left corner
-W <int> : width of crop area in pixels (default is 0)
-H <int> : height of crop area in pixels (default is 0)
-layout : maintain original physical layout
-fixed <fp> : assume fixed-pitch (or tabular) text
-raw : keep strings in content stream order
-nodiag : discard diagonal text
-htmlmeta : generate a simple HTML file, including the meta information
-tsv : generate a simple TSV file, including the meta information for bounding boxes
-enc <string> : output text encoding name
-listenc : list available encodings
-eol <string> : output end-of-line convention (unix, dos, or mac)
-nopgbrk : don't insert page breaks between pages
-bbox : output bounding box for each word and page size to html. Sets -htmlmeta
-bbox-layout : like -bbox but with extra layout bounding box data. Sets -htmlmeta
-cropbox : use the crop box rather than media box
-colspacing <fp> : how much spacing we allow after a word before considering adjacent text to be a new column, as a fraction of the font size (default is 0.7, old releases had a 0.3 default)
-opw <string> : owner password (for encrypted files)
-upw <string> : user password (for encrypted files)
-q : don't print any messages or errors
-v : print copyright and version info
-h : print usage information
-help : print usage information
--help : print usage information
-? : print usage information
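The options most often needed from a script are the page range and `-layout`. A sketch of assembling the argument list, using only flags from the usage text above; `pdftotext_cmd` is a hypothetical helper, and running it assumes `pdftotext` (poppler) is installed.

```python
def pdftotext_cmd(pdf, txt=None, first=None, last=None, layout=False):
    """Build a pdftotext argument list for subprocess.run."""
    cmd = ["pdftotext"]
    if first is not None:
        cmd += ["-f", str(first)]   # first page to convert
    if last is not None:
        cmd += ["-l", str(last)]    # last page to convert
    if layout:
        cmd.append("-layout")       # maintain original physical layout
    cmd.append(pdf)
    if txt is not None:
        cmd.append(txt)
    return cmd
```

For example `subprocess.run(pdftotext_cmd("in.pdf", "out.txt", first=2, last=5), check=True)`.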
https://gitlab.com/TNThieding/exif
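import exif

# `file` is the path of the image to read
with open(file, 'rb') as f:
    data = exif.Image(f)
# then delete the user_comment, set it,
# and write to a file using data.get_file()
with open(new_file, 'wb') as f:  # new_file: output path (placeholder)
    f.write(data.get_file())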
qpdf [options] {infile} {outfile}
# Check file structure
# exit status 0: ok, 2: errors, 3: warnings
qpdf --check {file}
# Check if the pdf needs a password
# exit status 0: password required, 2: not encrypted, 3: encrypted but opens without a password
qpdf --requires-password {file}
# Remove owner restrictions
qpdf --decrypt {file} {unlocked_file}
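Since `--requires-password` reports its answer only through the exit status, a script has to map the code itself. A sketch, assuming `qpdf` is on the PATH and the status meanings from the qpdf manual (0: password required, 2: not encrypted, 3: encrypted but opens without a password); the function names are hypothetical.

```python
import subprocess

PASSWORD_STATUS = {
    0: "password required",
    2: "not encrypted",
    3: "encrypted, but opens without a password",
}


def describe_password_status(returncode):
    """Translate a `qpdf --requires-password` exit status into text."""
    return PASSWORD_STATUS.get(returncode, f"unexpected exit status {returncode}")


def password_status(path):
    """Run qpdf on one file and describe the result (needs qpdf installed)."""
    proc = subprocess.run(["qpdf", "--requires-password", path])
    return describe_password_status(proc.returncode)
```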
Ghostscript / gs
man gs
convert ? -alpha off ./temp/`?`
mogrify -orient bottom-left ?
img2pdf --output `?`.pdf --pagesize A4 --auto-orient ?
pdftk *.pdf cat output diagrams.pdf