Debian - SeasStarsRoses/IT GitHub Wiki
With Simple Scan you can scan several pages and crop them. The tool is good for quick scans. Documents can be saved as PDF, JPEG, or PNG.
The XSane Image Scanner documentation recommends in its Scanning Tips to scan text at 300 DPI in Line Art mode. You can use XSane for text with the following options:
- Main screen
  - `Number of pages to scan`: Choose `1` for manual scanning, or `n` for the Automatic Document Feeder (ADF).
  - `Selects the scan source, such as document feeder`: Choose `Auto` or `ADF`.
  - `Browse for image filename`: Choose this button to select a filename.
  - `Target icon`: `Viewer`. This lets you view the scanned image, rotate it, and save it as PDF.
  - `Type`: Set `PDF` as the default. When the scan goes to a tool that works with PNG, such as Tesseract, set `PNG`.
  - `Scan Mode`: `Lineart`
  - `Set Scan Resolution`: `300`
- `Settings`: Choose a directory under `Preferences` ➡️ `Save` ➡️ `Temporary Directory`.
- `Viewer`
  - Use the buttons to rotate the image.
  - Save it as PDF using `File` ➡️ `Save image`.
  - Close `Viewer` to scan the next page.
With Sane it is possible to scan using scanimage commands:
- Scan a page from the manual tray in `png` format to the file `outfile.png`: `scanimage -x 210 -y 297 --mode Gray --resolution 300 --format png > outfile.png`
  - Possible formats are e.g. `png` or `tiff`.
  - Paper size A4 is 210 by 297 millimetres according to Wikipedia.
  - Available options: `man scanimage`
- Concatenate two files `input1.png` and `input2.png` to a file `output.png`: `convert input1.png input2.png -append output.png`
- Rotate and crop an image with the Shotwell viewer.
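The `scanimage` and `convert` steps above could be combined into a small script for multi-page documents. The following is only a sketch: the function names `page_name` and `scan_pages` and the `page-NNN.png` naming scheme are assumptions for illustration, and the scan options are the ones already used above.

```shell
#!/bin/bash
# Hypothetical helper: build a zero-padded file name so that
# the shell glob page-*.png lists the pages in scan order.
page_name() {
  printf 'page-%03d.png' "$1"
}

# Hypothetical helper: scan the given number of A4 pages from the
# manual tray, then stack them vertically into output.png.
scan_pages() {
  local count=$1 i
  for ((i = 1; i <= count; i++)); do
    read -r -p "Insert page $i and press Enter " _
    scanimage -x 210 -y 297 --mode Gray --resolution 300 --format png \
      > "$(page_name "$i")"
  done
  convert page-*.png -append output.png
}

# Usage (needs a scanner supported by SANE): scan_pages 3
```

The zero-padded names keep page 10 after page 9 when the glob is expanded.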
Concatenate the files input1.txt
and input2.txt
to the file output.txt
:
cat input1.txt input2.txt > output.txt
Tesseract logo. Copyright: https://commons.wikimedia.org/wiki/File:TesseractLogo.png
Tesseract-ocr is an optical character recognition engine. It is free software, released under the Apache License, Version 2.0. For more information, see Wikipedia.
The command-line usage is described on GitHub.
To read the text from a file `input.png` into an output file `output.txt`, use: `tesseract input.png output`.
Language support:
- English :
tesseract input.png output -l eng
- German:
tesseract input.png output -l deu
- How to run tesseract with multiple languages one time
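For several languages in one run, Tesseract accepts language codes joined with `+` in the `-l` option, for example `-l eng+deu`. The sketch below wraps this; the function names `join_langs` and `ocr_multi` are made up for illustration, and the language data for each code is assumed to be installed.

```shell
#!/bin/bash
# Join the given language codes with "+", the separator
# that tesseract expects in its -l option.
join_langs() {
  local IFS='+'
  echo "$*"
}

# Hypothetical wrapper: OCR one image with several languages at once.
# The text goes to a file named after the image, e.g. input.txt.
ocr_multi() {
  local image=$1
  shift
  tesseract "$image" "${image%.*}" -l "$(join_langs "$@")"
}

# Usage: ocr_multi input.png eng deu
```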
- Lios is free and open source software for converting print into text using a scanner or camera. It can also produce text from other sources, such as images, PDFs, or screenshots. Lios is released under the GPL-3 licence.
  - The official Debian packages include `Lios`. You do not need to install it via the shell; there is a graphical way via `Software`.
  - Lios does not recognize German umlauts until the German language is installed, so install it first to get better results.
  - Rectangles can be defined in which the software searches for text.
  - `LIOS` uses `Tesseract-ocr` as its OCR engine.
- gImageReader is a simple front end for `tesseract-ocr`.
  - It is free software under the GNU General Public License v3.0.
  - It is available in the official Debian repositories.
Zip a directory `mydir` into a file `myfile.zip`: `zip -r myfile.zip mydir`
- The PDF Toolkit pdftk can concatenate several PDF files into one.
  - Features: The PDF Toolkit is an open source cross-platform tool for manipulating PDF documents. pdftk is capable of splitting, merging, encrypting, decrypting, uncompressing, recompressing, and repairing PDFs.
  - Installation: `sudo apt-get install pdftk`
  - Concatenate two files: `pdftk file1.pdf file2.pdf cat output all.pdf`
  - Syntax: http://web.mit.edu/outland/arch/sun4x_58/build/pdftk-1.12/pdftk.1.html
- PDFSAM is an open source tool. It can split, merge, and rotate PDF files. It can create a table of contents from the file names.
  - Installation
    - PDFSAM is included in the Debian repositories, but in an old version.
    - Instruction for the newer version
    - Set the `JAVA_HOME` environment variable to the JDK path, e.g. `export JAVA_HOME="/usr/lib/jvm/java-1.8.0-openjdk-amd64"`
    - Launch from the command line with the command `sudo pdfsam`
Linuxquestions.org describes how to convert one or more png
files to pdf
:
convert input1.png input2.png output.pdf
The software GIMP
(GNU Image Manipulation Program) can crop a png
file and export the result as pdf
by using the menu File
➡️ Export As
.
Document Viewer is the standard Debian tool for viewing PDF files. It is possible to rotate PDF pages with the menu option `View Options` and then `Rotate Left` or `Rotate Right`.
PDF/A is an ISO-standardized version of the Portable Document Format (PDF) specialized for use in the archiving and long-term preservation of electronic documents.
StackExchange discusses tool options in Converting PDF to PDF/A.
The PDF Association runs a competence center about long-term archiving.
LibreOffice can export files to PDF/A. For example, in LibreOffice Writer choose the menu `File` ➡️ `Export as PDF...`. In the `PDF Options` dialog, choose the option `Archive PDF/A-1a (ISO 19005-1)`.
Ghostscript is an interpreter for the PostScript language and for PDF. Ghostscript is supplied as part of every major Linux distribution, e.g. Debian.
It might be necessary to install it, see the FAQs: apt-get install ghostscript
The command gs
is outlined in the documentation.
The following code example converts a file from the format PDF to PDF/A. For example pdf2pdfa myfile.pdf
converts a given file myfile.pdf
into myfile_a.pdf
:
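#!/bin/bash
# pdf2pdfa: convert a PDF file to PDF/A via an intermediate PostScript file.
echo "Welcome to the converter PDF to PDF Archive"
echo "Usage with e.g. input.pdf: pdf2pdfa input.pdf"
echo "more info at https://unix.stackexchange.com/questions/79516/converting-pdf-to-pdf-a"
# Stop early when no input file is given.
if [ -z "$1" ]; then exit 1; fi
pdf_input=$1
echo "pdf_input=${pdf_input}"
# Derive both output names by stripping the file extension from the input name.
ps_output=${pdf_input%.*}.ps
echo "ps_output=${ps_output}"
pdfa_output=${pdf_input%.*}_a.pdf
echo "pdfa_output=${pdfa_output}"
# Step 1: convert the PDF to PostScript. Quoting keeps file names with spaces intact.
echo "command=pdftops $pdf_input $ps_output"
pdftops "$pdf_input" "$ps_output"
# Step 2: let Ghostscript write a PDF/A file from the PostScript file.
gs -dPDFA -dBATCH -dNOPAUSE -dNOOUTERSAVE -sProcessColorModel=DeviceCMYK -sDEVICE=pdfwrite -sPDFACompatibilityPolicy=1 -sOutputFile="$pdfa_output" "$ps_output"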
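The script derives its output names with the bash parameter expansion `${name%.*}`, which removes the shortest suffix that starts with a dot. A minimal sketch of just that step follows; the function names `ps_name` and `pdfa_name` are invented for illustration and do not appear in the script.

```shell
#!/bin/bash
# ${1%.*} strips the last ".extension" from the first argument.
ps_name() {
  echo "${1%.*}.ps"
}

pdfa_name() {
  echo "${1%.*}_a.pdf"
}

ps_name "myfile.pdf"        # prints myfile.ps
pdfa_name "myfile.pdf"      # prints myfile_a.pdf
pdfa_name "scan.2024.pdf"   # prints scan.2024_a.pdf (only the last extension is removed)
```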
A short help can be displayed with `gs -h`.
Some flags of the `gs` command are:
- `-dBATCH`: Causes Ghostscript to exit after processing all files named on the command line, rather than going into an interactive loop reading PostScript commands. Equivalent to putting `-c quit` at the end of the command line.
- `-dNOOUTERSAVE`: Suppresses the initial save that is used for compatibility with Adobe PS Interpreters that ordinarily run under a job server. If a job server is going to be used to set up the outermost save level, then `-dNOOUTERSAVE` should be used so that the restore between jobs will restore global VM as expected.
- `-dNOPAUSE`: no pause after page
- `-sDEVICE=<devname>`: select device
- `-sOutputFile=<file>`: select output file
pdftops is a Portable Document Format (PDF) to PostScript converter, according to its man page at https://linux.die.net/man/1/pdftops. The following command converts the PDF document `input.pdf` to the PostScript document `output.ps`: `pdftops input.pdf output.ps`
In Debian, right-click a PDF file and select `Properties`. In the `Document` tab you can see the format, e.g.:
- `PDF-1.3` for a normal PDF document
- `PDF/A -1b` for an Archive PDF document
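The format can also be inspected roughly from a terminal. The sketch below rests on two assumptions: a PDF file starts with a header such as `%PDF-1.3`, and a PDF/A file carries a `pdfaid:part` entry in uncompressed XMP metadata. The function names `pdf_header` and `has_pdfa_marker` are made up for illustration. This is a heuristic, not a PDF/A validation.

```shell
#!/bin/bash
# Print the first 8 bytes of the file, e.g. "%PDF-1.3".
pdf_header() {
  head -c 8 "$1"
}

# Succeed when the file contains a PDF/A identification entry.
# -a treats the binary file as text, -q suppresses the output.
has_pdfa_marker() {
  grep -aq 'pdfaid:part' "$1"
}

# Usage:
#   pdf_header myfile_a.pdf
#   has_pdfa_marker myfile_a.pdf && echo "looks like PDF/A"
```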
Enable Developer Mode on the smartphone:
- Tap seven times on `Settings -> About phone -> Build number`
- A message shows up like: `You are now a developer` or `Developer mode has been enabled`
- Enable USB debugging on the smartphone: Choose `Settings -> Developer Options -> USB debugging -> on`
- To be able to flash the device: Choose `Settings -> System -> Developer Options -> OEM Unlocking -> on`
On your Linux laptop, run the following commands.
- Launch the adb server: `sudo adb start-server`.
- List your smartphone: `sudo adb devices`.
- Get root: `sudo adb root`. You might need to confirm the question `Allow USB-Debugging` on your device with `Yes`. Then you might need to type `sudo adb root` again.
- Get a shell: `sudo adb shell`. You might need to confirm the question `Allow USB-Debugging` on your device with `Yes`.
- Copy all files in the `Download directory` from the smartphone to the laptop: `sudo adb pull /storage/self/primary/Download`
- Copy a file from the laptop to the smartphone `Download directory`: `sudo adb push <filename> /storage/self/primary/Download`
- Remount for write rights: `sudo adb remount`.
- Restart in recovery mode: `sudo adb reboot recovery`
- ADB Sideload
  - Smartphone: Reboot in recovery mode. Activate sideload in TWRP.
  - Laptop: `sudo adb sideload <filename>`
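Before pulling or pushing files it can help to check that the smartphone is attached and authorized. `adb devices` prints a header line followed by one line per device, with the serial number and a state such as `device` or `unauthorized`. The helper name `count_ready_devices` below is an assumption for illustration.

```shell
#!/bin/bash
# Count the devices in state "device" from `adb devices` output on stdin.
# The first line ("List of devices attached") is skipped.
count_ready_devices() {
  awk 'NR > 1 && $2 == "device" { n++ } END { print n + 0 }'
}

# Usage:
#   if [ "$(sudo adb devices | count_ready_devices)" -ge 1 ]; then
#     sudo adb pull /storage/self/primary/Download
#   fi
```

A device listed as `unauthorized` is not counted; in that case confirm the USB debugging question on the phone first.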
Apache Maven can manage a project's build, reporting and documentation from a central piece of information.
- Install Java JDK
- Download the Binary tar.gz archive `apache-maven-3.6.0-bin.tar.gz`
- Verify the checksum with `sha512sum apache-maven-3.6.0-bin.tar.gz`
- Extract the file
- Terminal
  - Set the `JAVA_HOME` environment variable to the JDK path, e.g. `export JAVA_HOME="/usr/lib/jvm/java-1.8.0-openjdk-amd64"`
  - Navigate to the `bin` directory
  - Check the version number using `./mvn --version`
  - Or use your installation path, e.g. `/home/user/SW/apache-maven/apache-maven-3.6.0-bin/apache-maven-3.6.0/bin/mvn --version`
- Set the