Text and Image Digitization Standards - noolahamfoundation/standards GitHub Wiki

  • Title: T10: NF’s Text and Image Digitalization Standards
  • Document Type: Standard Operating Procedure
  • Security Classification: Public
  • Department: NF Technology
  • Author(s): Natkeeran L. Kanthan
  • Approval Status:
  • Version: Version 1 - Jan 2015; Initial Draft - Dec 2012

Purpose of the Document

The purpose of this document is to provide standards and guidelines for text and image digitalization activities carried out by Noolaham Foundation.

Audience

This document is intended for Management, Staff, Technology Team and RB.

Executive Summary

  • Text materials must be scanned at a minimum of 400 dpi, in 24-bit color, as uncompressed TIFF.
  • Images should be scanned at a minimum of 600 dpi, in 24-bit color, as uncompressed TIFF.
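
The two rules above can be expressed as a small settings check. The following is a minimal sketch, not an NF tool: the names `ScanSettings` and `check_settings` are hypothetical, and it assumes 24 bit is a minimum rather than an exact requirement.

```python
from dataclasses import dataclass

# Minimums from the summary above; the names below are illustrative only.
MIN_DPI = {"text": 400, "image": 600}
MIN_BIT_DEPTH = 24          # assumption: treated as a minimum, not an exact value
REQUIRED_FORMAT = "TIFF"


@dataclass
class ScanSettings:
    material: str       # "text" or "image"
    dpi: int
    bit_depth: int      # total bits per pixel, e.g. 24 for 3 x 8-bit RGB
    file_format: str
    compressed: bool


def check_settings(s: ScanSettings) -> list:
    """Return a list of problems; an empty list means the settings comply."""
    if s.material not in MIN_DPI:
        return [f"unknown material type: {s.material!r}"]
    problems = []
    if s.dpi < MIN_DPI[s.material]:
        problems.append(f"{s.dpi} dpi is below the {MIN_DPI[s.material]} dpi minimum")
    if s.bit_depth < MIN_BIT_DEPTH:
        problems.append(f"{s.bit_depth}-bit depth is below {MIN_BIT_DEPTH} bit")
    if s.file_format.upper() != REQUIRED_FORMAT:
        problems.append(f"format {s.file_format} is not {REQUIRED_FORMAT}")
    if s.compressed:
        problems.append("file must be uncompressed")
    return problems


if __name__ == "__main__":
    # A text page scanned at 300 dpi fails only the resolution rule.
    print(check_settings(ScanSettings("text", 300, 24, "TIFF", False)))
```

Returning a list of problems instead of a single pass/fail flag lets an operator see every setting that needs correcting before rescanning.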

Background

NF’s digitalization activities began as an amateur effort to accelerate digital preservation. As the project evolved, NF began re-evaluating the equipment, standards, technologies and workflows used in digitalization. The aim of the re-evaluation is to align NF with international standards, and to improve overall quality and efficiency. This “T10: Digitalization Standards” document outlines the standards and the quality assurance workflow to be used for scanning.

Digitalization

Digitalization is the process of converting a physical source, such as a book, into an electronic format, such as digital images or a PDF book. In theory, digital data does not degrade, and can be easily copied, organized and preserved.

Goals of Digital Preservation

Noolaham Foundation’s objective is the long-term digital preservation of resources. These digital resources can be used by many user communities for various purposes. The goals of digitalization include:

  • Long-term digital preservation of resources
  • To ensure that the scanned text can be used for Optical Character Recognition.
  • To enable high-quality or reasonable reproduction of the originals in print and other media.
  • To enable efficient storage, browsing, and retrieval of the digital records.
  • To protect digital records against software and hardware technological obsolescence.

To fulfil the above goals, Noolaham Foundation aims to digitalize resources at the optimum standards possible.

Text/Image Documents Digitization Standards

| Quality Measure | Description | Standard | Reasoning |
|---|---|---|---|
| Optical or Spatial Resolution | Image or spatial resolution refers to the number of picture elements, or pixels, per unit of measurement. It is measured in dots per inch (DPI). | 400 dpi for text documents; min 600 dpi for images | Printed materials generally use 300 dpi; however, 400 dpi is recommended for OCR. Referenced Standards: http://www.library.unt.edu/d |
| Bit Depth or Signal Resolution or Tonal Resolution | Bit depth defines the number of shades each dot can represent. Usually this can be 1 bit (black or white), 8 bit (gray scale), 8 bit (color), or 24 bit (color). The higher the bit depth, the more accurately images can be scanned or represented. | 24 bit Color - RGB (3 x 8 bit channels) | For archiving purposes, the highest quality is desired. However, 48 bit color depth is not practical for text documents. |
| File Format | | TIFF lossless | TIFF is the standard for commercial printing, and the uncompressed TIFF format preserves all image data. TIFF is recommended for archival storage by the majority of standards. http://www.archives.gov/preservation/technical/guidelines.pdf |
| Document Size | | 100% to scale | 100% to scale |
| Color Management | | TBD | TBD |
| Post Processing | | TBD | TBD |
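
To make the storage implications of these standards concrete, the sketch below estimates the pixel dimensions and raw data size of an uncompressed scan from the page size, resolution and bit depth. The function names are illustrative, and the size is an approximation that ignores TIFF headers and metadata.

```python
def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> tuple:
    """Pixel width and height of a page scanned at 100% scale."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(width_in: float, height_in: float, dpi: int,
                       bit_depth: int = 24) -> int:
    """Approximate raw image data size in bytes (file headers not included)."""
    w, h = pixel_dimensions(width_in, height_in, dpi)
    return (w * h * bit_depth + 7) // 8   # round up to whole bytes


def tonal_levels(bit_depth: int) -> int:
    """Number of distinct shades or colors a pixel of this depth can represent."""
    return 2 ** bit_depth


if __name__ == "__main__":
    # An 8.5 x 11 inch page at the two resolutions named in the table.
    for dpi in (400, 600):
        size_mb = uncompressed_bytes(8.5, 11, dpi) / 1_000_000
        print(f"{dpi} dpi: {pixel_dimensions(8.5, 11, dpi)} px, {size_mb:.2f} MB")
    print(tonal_levels(24), "colors at 24 bit")
```

An 8.5 x 11 inch page at 400 dpi and 24 bit comes to 3400 x 4400 pixels, or about 44.9 MB of raw data; at 600 dpi the same page is about 101 MB. This is why resolution choices directly affect storage planning.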

Scanner Selection

There are three types of scanners. Regardless of the scanner type, the following parameters must be considered in purchasing scanners.

  • Labour
  • Spatial Resolution supported (minimum 600 dpi, and up to 1200 dpi)
  • Signal Resolution (minimum 16 bit gray-scale, and 24 bit color options)
  • Pages (scanned) Per Minute or PPM - (at different modes; at the highest quality, minimum 5 PPM)
  • Scanning Range: The typical paper size is A4 or Letter (8.5 x 11 inches). A range up to at least A3 is required.
  • Output formats: JPEG, TIFF and PDF output formats should be supported.
  • Software: Software for post-scanned processing and file handling should be provided.

| Scanner Type | Labour | Technology | Reasoning | Quality |
|---|---|---|---|---|
| Manual Scanners | High | Low | Low | High |
| Sheet-Feed Scanners | Medium | Low | Medium | High |
| Fully-Automated Scanners | Low | High | High | High |
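
The measurable purchasing parameters listed above can be checked against a vendor specification sheet. The sketch below is one hypothetical way to do so. `ScannerSpec` and its field names are assumptions, and the scanning range is assumed to be given as an ISO A-series label.

```python
from dataclasses import dataclass

REQUIRED_FORMATS = {"JPEG", "TIFF", "PDF"}


@dataclass
class ScannerSpec:
    max_dpi: int
    gray_bits: int
    color_bits: int
    ppm_at_highest_quality: float
    largest_sheet: str          # ISO A-series label such as "A3" or "A4"
    output_formats: frozenset


def covers_a3(sheet: str) -> bool:
    """True for A3 or larger; a smaller A-series number means a larger sheet."""
    return int(sheet.upper().lstrip("A")) <= 3


def unmet_requirements(spec: ScannerSpec) -> list:
    """List the purchasing parameters this scanner fails to meet."""
    unmet = []
    if spec.max_dpi < 600:
        unmet.append("spatial resolution below 600 dpi")
    if spec.gray_bits < 16:
        unmet.append("gray-scale depth below 16 bit")
    if spec.color_bits < 24:
        unmet.append("color depth below 24 bit")
    if spec.ppm_at_highest_quality < 5:
        unmet.append("fewer than 5 PPM at highest quality")
    if not covers_a3(spec.largest_sheet):
        unmet.append("scanning range smaller than A3")
    missing = REQUIRED_FORMATS - {f.upper() for f in spec.output_formats}
    if missing:
        unmet.append("missing output formats: " + ", ".join(sorted(missing)))
    return unmet


if __name__ == "__main__":
    candidate = ScannerSpec(600, 8, 24, 3, "A4", frozenset({"JPEG", "PDF"}))
    for item in unmet_requirements(candidate):
        print("-", item)
```

Labour and bundled software are left out of the check because the list gives no measurable threshold for them; they still need to be assessed by hand.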

Digitalization Workflow

  • Assess source document attributes (quality of the source, tone, color etc)
  • Scanning
  • Post processing
  • Quality assurance

Quality Assurance

Qualitative assessment must be undertaken to meet Acceptance Quality Limits of Level III. Acceptance sampling and record keeping are to be incorporated into the overall scanning workflow.

Quantitative assessment is to be undertaken by an RB-appointed audit team to ensure the overall quality of the digitalization work.
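
Acceptance sampling for a scanned batch can be sketched as follows. This is an illustration only: the sample size and acceptance number are placeholders that would have to be read from the sampling plan NF adopts, and `is_defective` stands in for whatever inspection an operator or audit team performs. None of these values come from this document.

```python
import random


def draw_sample(page_ids: list, sample_size: int, seed=None) -> list:
    """Pick pages to inspect at random, without replacement."""
    rng = random.Random(seed)   # a fixed seed makes the audit reproducible
    return rng.sample(page_ids, min(sample_size, len(page_ids)))


def accept_batch(sample: list, is_defective, acceptance_number: int) -> tuple:
    """Accept the batch if the defect count does not exceed the acceptance number.

    Returns (accepted, defect_count) so the result can be recorded.
    """
    defects = sum(1 for page in sample if is_defective(page))
    return defects <= acceptance_number, defects


if __name__ == "__main__":
    batch = list(range(1, 151))                 # hypothetical 150-page batch
    failed_pages = {17, 42}                     # hypothetical inspection results
    sample = draw_sample(batch, sample_size=20, seed=2015)   # placeholder size
    accepted, defects = accept_batch(sample, lambda p: p in failed_pages,
                                     acceptance_number=1)    # placeholder limit
    print(f"inspected {len(sample)} pages, {defects} defective, accepted={accepted}")
```

Recording the seed, the sampled page numbers and the defect count alongside each batch would satisfy the record-keeping requirement and let an audit team repeat the same draw.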
