Debian - SeasStarsRoses/IT GitHub Wiki
With Simple Scan you can scan several pages, crop them and save them as PDF. The tool is good for quick scans. Documents can be saved as PDF, JPEG and PNG.
In its Scanning Tips, the XSane Image Scanner recommends scanning text at 300 DPI in Line Art mode. For text, use the following options:
- Main screen
  - Number of pages to scan. Choose:
    - `1` for manual
    - `n` for Automatic Document Feeder (ADF)
  - Scan source, such as the document feeder: choose `Auto` or `ADF`.
  - Browse for image filename: choose this button to select a filename.
  - Target icon: `Viewer`. This lets you view the scanned image, rotate it and save it as PDF.
  - Type: set `PDF` as default. When you need PNG files, e.g. as input for Tesseract, set `PNG`.
  - Scan Mode: `Lineart`
  - Scan Resolution: `300`
- Settings: choose a directory via `Preferences` ➡️ `Save` ➡️ `Temporary Directory`.
- Viewer
  - Use the buttons to rotate the image.
  - Save it as PDF using `File` ➡️ `Save image`.
  - Close `Viewer` to scan the next page.
With SANE it is possible to scan using `scanimage` commands:
- Scan a page from the manual tray in PNG format to the file `outfile.png`: `scanimage -x 210 -y 297 --mode Gray --resolution 300 --format png > outfile.png`
  - Possible formats are e.g. `png` or `tiff`.
  - Paper size A4 is 210 by 297 millimetres according to Wikipedia.
- Available options: `man scanimage`
- Concatenate two files `input1.png` and `input2.png` vertically into a file `output.png`: `convert input1.png input2.png -append output.png`
- Rotate and crop an image with the Shotwell viewer.
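To scan several pages in a row from the manual tray, the `scanimage` call above can be wrapped in a loop. The following is a minimal sketch to paste into a shell or source from a file. The function names `page_file` and `scan_pages` and the `page-NN.png` naming scheme are hypothetical choices, and the sketch assumes you place each sheet by hand and press Enter before it is scanned.

```shell
# Hypothetical helper: zero-padded file name for page number $1,
# e.g. "page_file 3" prints page-03.png, so the pages sort correctly.
page_file() {
    printf 'page-%02d.png\n' "$1"
}

# Hypothetical helper: scan $1 A4 pages (210 x 297 mm) in grey at 300 DPI,
# waiting for Enter before each page so the next sheet can be placed.
scan_pages() {
    count=$1
    n=1
    while [ "$n" -le "$count" ]; do
        printf 'Place page %d and press Enter... ' "$n"
        read -r _
        scanimage -x 210 -y 297 --mode Gray --resolution 300 --format png \
            > "$(page_file "$n")" || return 1
        n=$((n + 1))
    done
}
```

For example, `scan_pages 3` writes `page-01.png` to `page-03.png`. The zero-padded names keep the pages in order for a later `convert page-*.png output.pdf`.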
Concatenate the files `input1.txt` and `input2.txt` into the file `output.txt`:
`cat input1.txt input2.txt > output.txt`
Tesseract logo (image). Copyright: https://commons.wikimedia.org/wiki/File:TesseractLogo.png
Tesseract-ocr is an optical character recognition engine. It is free software, released under the Apache License, Version 2.0. For more information, see Wikipedia.
The command-line usage is described on GitHub.
To read the text from the file `input.png` into the output file `output.txt`, use the following command (Tesseract appends `.txt` to the output base name):
`tesseract input.png output`
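Language support:
- English: `tesseract input.png output -l eng`
- German: `tesseract input.png output -l deu`
- Several languages at once, e.g. English and German: `tesseract input.png output -l eng+deu` (see "How to run tesseract with multiple languages one time")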
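A whole directory of scanned pages can be processed by repeating the single-file call in a loop. A minimal sketch, assuming the PNG files are in the current directory and the English and German language data are installed; `ocr_all` is a hypothetical function name. The value `eng+deu` tells Tesseract to use both languages.

```shell
# Hypothetical helper: run Tesseract on every PNG file in the current
# directory. For input.png, Tesseract writes input.txt (it appends .txt
# to the output base name). $1 sets the languages, default eng+deu.
ocr_all() {
    langs=${1:-eng+deu}
    for img in *.png; do
        [ -e "$img" ] || continue   # no PNG files: the glob stays unexpanded
        tesseract "$img" "${img%.png}" -l "$langs" || return 1
    done
}
```

`ocr_all` uses English and German, `ocr_all eng` uses English only. The resulting text files can be joined with `cat`, as shown for `input1.txt` and `input2.txt` above.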
Lios is free and open source software for converting print into text using a scanner or camera. It can also extract text from other sources, such as images, PDFs, or screenshots. Lios is released under the GPL3 licence.
- The official Debian packages include Lios. You do not need to install it via the shell; you can install it graphically via `Software`.
- Lios cannot read German umlauts out of the box, so install the German language data first (on Debian, e.g. the package `tesseract-ocr-deu`) to get better results.
- You can define rectangles in which the software searches for text.
- Lios uses Tesseract-ocr as its OCR engine.
- gImageReader is a simple frontend for `tesseract-ocr`.
- It is free software under the GNU General Public License v3.0.
- It is available in the official Debian repositories.
Zip a directory `mydir` into a file `myfile.zip`:
`zip -r myfile.zip mydir`
- The PDF Toolkit `pdftk` can concatenate several PDF files into one.
- Features: The PDF Toolkit is an open source, cross-platform tool for manipulating PDF documents. `pdftk` can split, merge, encrypt, decrypt, uncompress, recompress, and repair PDFs.
- Installation: `sudo apt-get install pdftk`
- Concatenate two files: `pdftk file1.pdf file2.pdf cat output all.pdf`
- Syntax: http://web.mit.edu/outland/arch/sun4x_58/build/pdftk-1.12/pdftk.1.html
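`pdftk` accepts any number of input files before the `cat` operation, so all PDF files of a directory can be merged in one call. A minimal sketch; `merge_pdfs` is a hypothetical function name, and the files are merged in the shell's alphabetical glob order, so zero-padded names such as `scan-01.pdf` keep the intended order.

```shell
# Hypothetical helper: merge all *.pdf files in the current directory
# (glob order) into the file $1, skipping the output file itself.
merge_pdfs() {
    out=$1
    set --                          # reuse the positional parameters as a file list
    for f in *.pdf; do
        [ -e "$f" ] || continue     # no PDF files: the glob stays unexpanded
        [ "$f" = "$out" ] && continue
        set -- "$@" "$f"
    done
    [ "$#" -gt 0 ] || { echo "No PDF files found" >&2; return 1; }
    pdftk "$@" cat output "$out"
}
```

For example, `merge_pdfs all.pdf` merges every other PDF in the directory into `all.pdf`.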
- PDFSAM is an open source tool. It can split, merge, and rotate PDF files. It can create a table of contents from the file names.
- Installation
  - PDFSAM is included in the Debian repositories, but in an old version.
  - Instructions for the newer version:
    - Set the `JAVA_HOME` environment variable to the JDK path, e.g. `export JAVA_HOME="/usr/lib/jvm/java-1.8.0-openjdk-amd64"`
    - Launch it from the command line with `sudo pdfsam`
Linuxquestions.org describes how to convert one or more PNG files to PDF:
`convert input1.png input2.png output.pdf`
The software GIMP (GNU Image Manipulation Program) can crop a PNG file and export the result as PDF via the menu File ➡️ Export As.
Document Viewer is the Debian standard tool for viewing PDF files. You can rotate PDF pages via the menu option View Options and then Rotate Left or Rotate Right.
PDF/A is an ISO-standardized version of the Portable Document Format (PDF) specialized for use in the archiving and long-term preservation of electronic documents.
StackExchange discusses tool options for converting PDF to PDF/A.
The PDF Association runs a competence center for long-term archiving.
LibreOffice can export files to PDF/A. For example, in LibreOffice Writer choose the menu File ➡️ Export as PDF.... In the PDF Options dialog, choose the option Archive PDF/A-1a (ISO 19005-1).
Ghostscript is an interpreter for the PostScript language and for PDF. Ghostscript is supplied as part of every major Linux distribution, e.g. Debian.
If Ghostscript is not installed, install it (see the FAQs): `sudo apt-get install ghostscript`
The `gs` command is described in the documentation.
The following script converts a file from PDF to PDF/A. Saved as `pdf2pdfa`, the call `pdf2pdfa myfile.pdf` converts `myfile.pdf` into `myfile_a.pdf`:
```shell
#!/bin/bash
# Convert a PDF file to PDF/A: PDF -> PostScript (pdftops) -> PDF/A (Ghostscript)
echo "Welcome to the converter PDF to PDF Archive"
echo "Usage with e.g. input.pdf: pdf2pdfa input.pdf"
echo "More info at https://unix.stackexchange.com/questions/79516/converting-pdf-to-pdf-a"
pdf_input=$1
if [ -z "$pdf_input" ]; then
    echo "Error: no input file given" >&2
    exit 1
fi
echo "pdf_input=${pdf_input}"
# Output names: input name without extension, plus .ps or _a.pdf
ps_output="${pdf_input%.*}.ps"
echo "ps_output=${ps_output}"
pdfa_output="${pdf_input%.*}_a.pdf"
echo "pdfa_output=${pdfa_output}"
echo "command=pdftops $pdf_input $ps_output"
pdftops "$pdf_input" "$ps_output" || exit 1
gs -dPDFA -dBATCH -dNOPAUSE -dNOOUTERSAVE -sProcessColorModel=DeviceCMYK \
   -sDEVICE=pdfwrite -sPDFACompatibilityPolicy=1 \
   -sOutputFile="$pdfa_output" "$ps_output"
```

A short help can be displayed with `gs -h`.
Some flags of the `gs` command are:
- `-dBATCH`: Causes Ghostscript to exit after processing all files named on the command line, rather than going into an interactive loop reading PostScript commands. Equivalent to putting `-c quit` at the end of the command line.
- `-dNOOUTERSAVE`: Suppresses the initial save that is used for compatibility with Adobe PS Interpreters that ordinarily run under a job server. If a job server is going to be used to set up the outermost save level, then `-dNOOUTERSAVE` should be used so that the restore between jobs will restore global VM as expected.
- `-dNOPAUSE`: no pause after each page
- `-sDEVICE=<devname>`: select device
- `-sOutputFile=<file>`: select output file
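To convert several files at once, the `pdf2pdfa` script can be called in a loop. A minimal sketch, assuming the script is saved as an executable `pdf2pdfa` in your `PATH`; `convert_all_to_pdfa` is a hypothetical function name. Files ending in `_a.pdf` are skipped so that the results of an earlier run are not converted again.

```shell
# Hypothetical helper: run the pdf2pdfa script on every PDF in the
# current directory, skipping files that are already outputs (*_a.pdf).
convert_all_to_pdfa() {
    for f in *.pdf; do
        [ -e "$f" ] || continue     # no PDF files: the glob stays unexpanded
        case $f in
            *_a.pdf) continue ;;    # skip results of an earlier run
        esac
        pdf2pdfa "$f" || return 1
    done
}
```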
`pdftops` is a Portable Document Format (PDF) to PostScript converter, according to its man page at https://linux.die.net/man/1/pdftops. The following command converts the PDF document `input.pdf` to the PostScript document `output.ps`:
`pdftops input.pdf output.ps`
In Debian Linux, right-click on a PDF file and select Properties. In the Document tab you can see the format, e.g.:
- `PDF-1.3` for a normal PDF document
- `PDF/A-1b` for an archive PDF document
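The PDF/A declaration can also be checked in a terminal. A PDF/A file declares its part and conformance level in its XMP metadata (`pdfaid:part` and `pdfaid:conformance`), and Ghostscript typically writes this metadata uncompressed, so `grep` can find it. The following is a heuristic sketch under that assumption; `pdfa_level` is a hypothetical function name. It fails on files whose metadata stream is compressed, and it only reads the declaration without validating the file against the standard.

```shell
# Hypothetical helper: print the PDF/A level declared in the XMP metadata
# of file $1, e.g. "1B", or return 1 if no declaration is found.
# Handles both forms pdfaid:part="1" and <pdfaid:part>1</pdfaid:part>.
pdfa_level() {
    part=$(grep -ao 'pdfaid:part[">=]*[0-9]' "$1" | head -n 1 | grep -o '[0-9]$')
    conf=$(grep -ao 'pdfaid:conformance[">=]*[ABUabu]' "$1" | head -n 1 | grep -o '[ABUabu]$')
    [ -n "$part" ] || return 1
    printf '%s%s\n' "$part" "$conf" | tr 'abu' 'ABU'
}
```

For example, `pdfa_level myfile_a.pdf` should print `1B` for a file that declares PDF/A-1b, matching the format shown in the Properties dialog.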
Enable Developer Mode on the smartphone:
- Tap `Settings -> About phone -> Build number` seven times.
- A message like `You are now a developer` or `Developer mode has been enabled` shows up.
- Enable USB debugging on the smartphone: choose `Settings -> Developer Options -> USB debugging -> on`
- To be able to flash the device, choose `Settings -> System -> Developer Options -> OEM Unlocking -> on`
On your Linux laptop, run the following commands.
- Launch the adb server: `sudo adb start-server`
- List your smartphone: `sudo adb devices`
- Get root: `sudo adb root`. You might need to confirm the question `Allow USB-Debugging` on your device with `Yes`. Then you might need to type `sudo adb root` again.
- Get a shell: `sudo adb shell`. You might need to confirm the question `Allow USB-Debugging` on your device with `Yes`.
- Copy all files in the `Download` directory from the smartphone to the laptop: `sudo adb pull /storage/self/primary/Download`
- Copy a file from the laptop to the smartphone's `Download` directory: `sudo adb push <filename> /storage/self/primary/Download`
- Remount for write access: `sudo adb remount`
- Restart in recovery mode: `sudo adb reboot recovery`
- ADB Sideload
  - Smartphone: Reboot in recovery mode. Activate sideload in TWRP.
  - Laptop: `sudo adb sideload <filename>`
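The pull command above can be wrapped in a small helper that copies the smartphone's `Download` directory into a dated folder on the laptop. A minimal sketch, assuming a single device is connected with USB debugging enabled and the adb server is already running; `backup_downloads` and the `phone-download-YYYY-MM-DD` folder name are hypothetical. The helper calls `adb` without `sudo`; add `sudo` if your setup needs it, as in the commands above. `adb wait-for-device` blocks until the device is available.

```shell
# Hypothetical helper: copy /storage/self/primary/Download from the
# connected smartphone into ./phone-download-<date> on the laptop
# and print the name of that folder.
backup_downloads() {
    target="phone-download-$(date +%Y-%m-%d)"
    mkdir -p "$target" || return 1
    adb wait-for-device || return 1   # block until the phone is connected
    adb pull /storage/self/primary/Download "$target" || return 1
    echo "$target"
}
```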

- Install the Java JDK
- Download the binary tar.gz archive `apache-maven-3.6.0-bin.tar.gz`
- Verify the checksum with `sha512sum apache-maven-3.6.0-bin.tar.gz`
- Extract the file
- Terminal
  - Set the `JAVA_HOME` environment variable to the JDK path, e.g. `export JAVA_HOME="/usr/lib/jvm/java-1.8.0-openjdk-amd64"`
  - Navigate to the `bin` directory
  - Check the version number using `./mvn --version`
  - Or use your installation path, e.g. `/home/user/SW/apache-maven/apache-maven-3.6.0-bin/apache-maven-3.6.0/bin/mvn --version`
- Set the

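Instead of comparing the output of `sha512sum` with the published checksum by eye, the checksum step can be scripted. A minimal sketch, assuming the checksum file was downloaded next to the archive as `apache-maven-3.6.0-bin.tar.gz.sha512` and contains the hash as its first field; `verify_sha512` is a hypothetical function name.

```shell
# Hypothetical helper: compare the SHA-512 hash of file $1 with the
# first field of the checksum file $2 (default: "$1.sha512").
# Returns 0 if they match, 1 otherwise.
verify_sha512() {
    file=$1
    sumfile=${2:-$1.sha512}
    actual=$(sha512sum "$file" | cut -d ' ' -f 1)
    expected=$(cut -d ' ' -f 1 "$sumfile" | head -n 1)
    if [ -n "$actual" ] && [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file" >&2
        return 1
    fi
}
```

For example, `verify_sha512 apache-maven-3.6.0-bin.tar.gz` prints `OK: apache-maven-3.6.0-bin.tar.gz` only if the archive matches the downloaded checksum.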