Debian - SeasStarsRoses/IT GitHub Wiki

1. Scanning

1.1 Scanner

1.1.1 Simple Scan

With Simple Scan you can scan several pages, crop them, and save the result as PDF, JPEG or PNG. The tool is well suited to quick scans.

1.1.2 XSane Image Scanner

In its Scanning Tips, XSane Image Scanner recommends scanning text at 300 DPI in Line Art mode. For text documents, use the following options:

  • Main screen
    • Number of pages to scan. Choose:
      • 1 for manual
      • n for Automatic Document Feeder (ADF)
    • Selects the scan source, such as document feeder: Choose Auto or ADF.
    • Browse for image filename: Choose this button to select a filename.
    • Target icon: Viewer. This will let you view the scanned image, turn it and save as PDF.
    • Type: Set PDF as default. If you pass the scans to a tool that works on PNG, such as Tesseract, set PNG instead.
    • Scan Mode: Lineart
    • Set Scan Resolution: 300
  • Settings: Choose a directory under Preferences ➡️ Save ➡️ Temporary Directory.
  • Viewer
    • Use the buttons to turn the image.
    • Save it as PDF using File ➡️ Save image.
    • Close Viewer to scan the next page.

1.1.3 Sane Command Line

SANE provides the scanimage command for scanning from the command line:

  • Scan a page from the manual tray in PNG format to the file outfile.png:
    • scanimage -x 210 -y 297 --mode Gray --resolution 300 --format png > outfile.png
    • Possible formats include png and tiff.
    • -x and -y give the width and height of the scan area in millimetres. A4 paper is 210 by 297 millimetres according to Wikipedia.
  • Available options: man scanimage
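
For documents with several pages, the scanimage call above can be wrapped in a loop. The following is a minimal sketch, assuming a flatbed scanner where each page is placed by hand; the function names page_name and scan_pages and the page-NNN.png naming scheme are made up for illustration.

```shell
# Build a zero-padded file name for page N, e.g. 3 -> page-003.png.
# Zero padding keeps the files in page order when they are listed or globbed.
page_name() {
    printf 'page-%03d.png\n' "$1"
}

# Scan COUNT pages one by one as A4, gray, 300 DPI.
# Waits for Enter before each page so the paper can be changed.
scan_pages() {
    local count=$1 i=1 reply
    while [ "$i" -le "$count" ]; do
        printf 'Place page %d on the scanner and press Enter ' "$i"
        read -r reply
        scanimage -x 210 -y 297 --mode Gray --resolution 300 --format png > "$(page_name "$i")"
        i=$((i + 1))
    done
}

# Usage (needs a connected scanner):
#   scan_pages 3
```

scanimage also has a --batch option that numbers the output files itself; man scanimage describes it and may be the better choice with a document feeder.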

1.2 File concatenation and conversion

1.2.1 PNG

  • Concatenate two files input1.png and input2.png into a file output.png (-append stacks the images vertically):
    convert input1.png input2.png -append output.png
  • Rotate and crop an image with the Shotwell viewer.

1.2.3 TXT

Concatenate the files input1.txt and input2.txt to the file output.txt:

cat input1.txt input2.txt > output.txt
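
When there are many text files, for example one per scanned page, a glob saves listing them by hand. This is a small sketch, assuming the files follow a hypothetical zero-padded naming scheme page-001.txt, page-002.txt and so on; the function name join_pages is illustrative.

```shell
# Join all page-*.txt files of a directory into one output file.
# The shell expands the glob in sorted order, so zero-padded names
# (page-001.txt, page-002.txt, page-010.txt) stay in page order.
join_pages() {
    local dir=$1 out=$2
    cat "$dir"/page-*.txt > "$out"
}

# Usage:
#   join_pages . output.txt
```

The output name should not match the page-*.txt pattern itself, otherwise cat would try to read the file it is writing.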


1.3 OCR



1.3.1 Tesseract-ocr

Tesseract-ocr is an optical character recognition engine. It is free software, released under the Apache License, Version 2.0. For more information see Wikipedia.
The command-line usage is described on GitHub.
To read the text from a file input.png into an output file output.txt use the following command (Tesseract appends the .txt extension to the output name itself):
tesseract input.png output
Language support:
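
Tesseract selects the recognition language with the -l option, and tesseract --list-langs shows which language data is installed. The following is a minimal sketch for running OCR over a whole directory of scans, assuming German text and installed German data (on Debian typically the package tesseract-ocr-deu); the function names ocr_base and ocr_dir are made up for illustration.

```shell
# Derive the output base name that tesseract expects.
# tesseract appends ".txt" itself, so only the extension is stripped here:
# scans/page-001.png -> scans/page-001
ocr_base() {
    printf '%s\n' "${1%.*}"
}

# Run OCR on every PNG file in a directory, using the German language data.
ocr_dir() {
    local dir=$1 png
    for png in "$dir"/*.png; do
        [ -e "$png" ] || continue   # skip if the glob matched nothing
        tesseract "$png" "$(ocr_base "$png")" -l deu
    done
}

# Usage (needs tesseract and the German language data):
#   tesseract --list-langs
#   ocr_dir scans
```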

1.3.2 Lios

  • Lios is free and open source software for converting print into text using a scanner or camera. It can also produce text from other sources, such as images, PDF files, or screenshots. Lios is released under the GPL3 licence.
  • The official Debian packages include Lios. You do not need to install it via shell; there is a graphical way via Software.
  • Out of the box, Lios does not recognize German umlauts, so the German language has to be installed first to get better results.
  • Rectangles can be defined to restrict where the software searches for text.
  • Lios uses Tesseract-ocr as its OCR engine.

1.3.3 gImageReader

1.4 Compression

Zip a directory mydir into a file myfiles.zip:
zip -r myfiles.zip mydir


2. PDF

2.1 PDF Toolkit

2.2 PDFSAM

  • PDFSAM is an open source tool. It can split, merge, and rotate PDF files. It can create a table of contents from the file names.
  • Installation
    • PDFSAM is included in the Debian repositories, but in an old version.
    • Instructions for the newer version
    • Set the JAVA_HOME environment variable to the JDK path, e.g.
      export JAVA_HOME="/usr/lib/jvm/java-1.8.0-openjdk-amd64"
    • Launch from the command line with the command sudo pdfsam

2.3 PNG to PDF

Linuxquestions.org describes how to convert one or more PNG files to PDF:
convert input1.png input2.png output.pdf
GIMP (GNU Image Manipulation Program) can crop a PNG file and export the result as PDF via the menu File ➡️ Export As.

2.4 Rotate

Document Viewer is the Debian standard tool for viewing PDF files. It is possible to rotate the pages of a PDF file with the menu option View Options and then Rotate Left or Rotate Right.

2.5 PDF Archive PDF/A

PDF/A is an ISO-standardized version of the Portable Document Format (PDF) specialized for use in the archiving and long-term preservation of electronic documents.

StackExchange discusses tool options for converting PDF to PDF/A.

The PDF Association runs a competence center about long-term archiving.

2.5.1 LibreOffice

LibreOffice can export files to PDF/A. For example, in LibreOffice Writer choose the menu File ➡️ Export as PDF... and then, in the PDF Options dialog, select the option Archive PDF/A-1a (ISO 19005-1).

2.5.2 Ghostscript

Ghostscript is an interpreter for the PostScript language and for PDF. Ghostscript is supplied as part of every major Linux distribution, e.g. Debian.

It might be necessary to install it, see the FAQs: apt-get install ghostscript

The command gs is outlined in the documentation.

The following code example converts a file from the format PDF to PDF/A. For example pdf2pdfa myfile.pdf converts a given file myfile.pdf into myfile_a.pdf:

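#!/bin/bash
# pdf2pdfa: convert a PDF file to PDF/A, using PostScript as an intermediate format.

echo "Welcome to the converter PDF to PDF Archive"
echo "Usage with e.g. input.pdf: pdf2pdfa input.pdf"
echo "More info at https://unix.stackexchange.com/questions/79516/converting-pdf-to-pdf-a"
pdf_input=$1
echo "pdf_input=${pdf_input}"
# ${pdf_input%.*} strips the file extension, e.g. myfile.pdf -> myfile
ps_output=${pdf_input%.*}.ps
echo "ps_output=${ps_output}"
pdfa_output=${pdf_input%.*}_a.pdf
echo "pdfa_output=${pdfa_output}"
echo "command=pdftops ${pdf_input} ${ps_output}"
# Quote the variables so that file names containing spaces work.
pdftops "$pdf_input" "$ps_output"

# PDFACompatibilityPolicy takes an integer, so it is passed with -d rather than -s.
gs -dPDFA -dBATCH -dNOPAUSE -dNOOUTERSAVE -sProcessColorModel=DeviceCMYK -sDEVICE=pdfwrite -dPDFACompatibilityPolicy=1 -sOutputFile="$pdfa_output" "$ps_output"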

A short help can be displayed with gs -h

Some flags of the gs command are:

  • -dBATCH: Causes Ghostscript to exit after processing all files named on the command line, rather than going into an interactive loop reading PostScript commands. Equivalent to putting -c quit at the end of the command line.
  • -dNOOUTERSAVE: Suppresses the initial save that is used for compatibility with Adobe PS Interpreters that ordinarily run under a job server. If a job server is going to be used to set up the outermost save level, then -dNOOUTERSAVE should be used so that the restore between jobs will restore global VM as expected.
  • -dNOPAUSE: Disables the prompt and pause at the end of each page.
  • -sDEVICE=<devname>: Selects the output device.
  • -sOutputFile=<file>: Selects the output file.

pdftops is a Portable Document Format (PDF) to PostScript converter, according to its man page at https://linux.die.net/man/1/pdftops. The following command converts the PDF document input.pdf to the PostScript document output.ps:

pdftops input.pdf output.ps

2.5.3 How to verify if a PDF document is in the PDF/A format

In Debian Linux, right-click a PDF file and select Properties. In the Document tab you can see the Format, e.g.:

  • PDF-1.3 for a normal PDF document
  • PDF/A-1b for an Archive PDF document
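
The same check can be approximated on the command line. PDF/A files declare their level in the embedded XMP metadata through the fields pdfaid:part and pdfaid:conformance. The sketch below only reads that declaration, assuming the metadata is stored uncompressed; it does not validate the file against the standard, and the function name pdfa_claim is made up for illustration.

```shell
# Print the PDF/A level a file claims in its XMP metadata, e.g. "1B".
# Prints nothing and returns 1 if no claim is found.
# This reads the claim only; it is not a conformance check.
pdfa_claim() {
    local file=$1 part conformance
    # XMP may store the values as elements (<pdfaid:part>1</pdfaid:part>)
    # or as attributes (pdfaid:part="1"); the patterns accept both.
    # grep -a treats the binary PDF as text, -o prints only the match.
    part=$(grep -a -o -E 'pdfaid:part(>|=")[0-9]+' "$file" | head -n 1 | grep -o -E '[0-9]+$')
    conformance=$(grep -a -o -E 'pdfaid:conformance(>|=")[A-Za-z]' "$file" | head -n 1 | grep -o -E '[A-Za-z]$')
    [ -n "$part" ] || return 1
    printf '%s%s\n' "$part" "$conformance"
}

# Usage:
#   pdfa_claim myfile_a.pdf || echo "no PDF/A claim found"
```

For a real conformance check a dedicated PDF/A validator is needed; this sketch only tells whether the file identifies itself as PDF/A.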

3. Android Debug Bridge (adb)

3.1 Developer Mode

Enable Developer Mode on Smartphone:

  • Tap seven times on Settings -> About phone -> Build number
  • A message shows up like: You are now a developer or Developer mode has been enabled
  • Enable USB Debugging on Smartphone: Choose Settings -> Developer Options -> USB debugging -> on
  • To be able to flash the device: Choose Settings -> System -> Developer Options -> OEM Unlocking -> on

3.2 Run

On your Linux laptop run the following commands.

  • Launch adb server: sudo adb start-server.
  • List your smartphone: sudo adb devices.
  • Get root: sudo adb root. You might need to confirm the question Allow USB-Debugging on your device with Yes. Then you might need to type sudo adb root again.
  • Get a shell: sudo adb shell. You might need to confirm the question Allow USB-Debugging on your device with Yes.
  • Copy all files in Download directory from Smartphone to Laptop: sudo adb pull /storage/self/primary/Download
  • Copy a file from Laptop to Smartphone Download directory: sudo adb push <filename> /storage/self/primary/Download
  • Remount for write rights: sudo adb remount.
  • Restart in recovery mode: sudo adb reboot recovery
  • ADB Sideload
    • Smartphone: Reboot in recovery mode. Activate sideload in TWRP
    • Laptop: sudo adb sideload <filename>
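
Before pulling or pushing files it can help to check that the phone is actually listed and authorized. The sketch below parses the output of adb devices, assuming its usual layout of a header line followed by one serial<TAB>state line per device; the function names device_states and count_ready are hypothetical. A state of unauthorized usually means the Allow USB-Debugging prompt on the phone has not been confirmed yet.

```shell
# Print "serial: state" for every device line read from stdin.
# Device lines are the only ones with exactly two tab-separated fields,
# so the header and empty lines are skipped.
device_states() {
    awk -F '\t' 'NF == 2 { print $1 ": " $2 }'
}

# Print how many devices are in the "device" state, i.e. ready for adb commands.
count_ready() {
    awk -F '\t' 'NF == 2 && $2 == "device" { n++ } END { print n + 0 }'
}

# Usage (needs a running adb server and a connected phone):
#   sudo adb devices | device_states
#   sudo adb devices | count_ready
```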

4. Apache Maven

4.1 Info

Apache Maven can manage a project's build, reporting and documentation from a central piece of information.


4.2 Download and Installation

  • Install Java JDK
  • Download the Binary tar.gz archive apache-maven-3.6.0-bin.tar.gz
  • Verify the checksum with sha512sum apache-maven-3.6.0-bin.tar.gz
  • Extract the file
  • Terminal
    • Set the JAVA_HOME environment variable to the JDK path, e.g.
      export JAVA_HOME="/usr/lib/jvm/java-1.8.0-openjdk-amd64"
    • Navigate to the bin directory
    • Check the version number using ./mvn --version
    • Or use your installation path, e.g. /home/user/SW/apache-maven/apache-maven-3.6.0-bin/apache-maven-3.6.0/bin/mvn --version
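
The checksum step can be sketched as a small shell function. sha512sum prints the hash followed by the file name, so the sketch keeps only the first field and compares it with the value published on the download page. The function name verify_sha512 and the placeholder hash are illustrative.

```shell
# Return success if the SHA-512 hash of FILE equals EXPECTED.
# EXPECTED is the lowercase hex string published next to the download.
verify_sha512() {
    local file=$1 expected=$2 actual
    # sha512sum prints "<hash>  <file>"; keep only the hash.
    actual=$(sha512sum "$file" | cut -d ' ' -f 1)
    [ "$actual" = "$expected" ]
}

# Usage: paste the published hash in place of the placeholder.
#   if verify_sha512 apache-maven-3.6.0-bin.tar.gz "<published sha512 hash>"; then
#       tar -xzf apache-maven-3.6.0-bin.tar.gz
#   else
#       echo "Checksum mismatch, do not use this archive" >&2
#   fi
```

The comparison is case-sensitive; sha512sum prints lowercase hex, so a published hash in uppercase would need converting first.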

5. Disk Space

See Observe partition disk usage
