Noolaham Foundation’s Technology Roadmap - noolahamfoundation/guiding-documents GitHub Wiki

Title: Noolaham Foundation’s Technology Roadmap
Document Type: Report
Security Classification: BOD, RB, Management
Department: NF Technology
Author(s): Natkeeran L. Kanthan
Version: Initial Draft - Sep 15, 2014; Second Draft - Sep 26, 2014 (incorporates comments from Ramanesh and Gopi)

Purpose of the Document

The purpose of this document is to analyze the current technological state of the Noolaham Foundation, assess its near-term and long-term technical needs, and identify strategies to develop the technical, organizational and resource infrastructure needed to support those needs.

Audience

This document is intended for the BOD, RB, Management and the Technology Team.

Methodology

This report was written primarily by the Noolaham Technical Lead, based on several years of experience contributing to Noolaham Foundation. A limited literature review and input from Noolaham executives also informed this report.

Table Of Contents

Purpose of the Document
Audience
Methodology
Background
Definitions
Challenges
Major Goals - Three Years
Strategic Priorities
Action Plan
  Approaches to Technology Development
  Research, Collaboration, Development and Innovation
  Develop Interoperable Digital Preservation/Repository System
  Systematization of Digitization and Preservation Services
Core Technical Competencies
  Digital Preservation Technologies
  Digital Library and Delivery Mechanisms
  Documentation Technologies (Information Capture)
  Enabling Technologies
  Operational Technologies
Key Projects
References
Appendix 1 - Metadata
Appendix 2 - E-book
  E-book Formats
  E-book Readers
  E-Book Creation and Conversion
Appendix 3 - Technology Project Development Lifecycle
Appendix 4 - Digital Library Software Evaluation Metrics - High Level (Draft)

Background

Noolaham Foundation’s core mission is to document, digitally preserve and make freely accessible all knowledge relating to Tamil-speaking communities of Sri Lanka. Digital preservation technology is at the heart of its mission. Technological infrastructure, along with organizational infrastructure and requisite resources, forms the “three-legged stool” on top of which the Foundation stands.

Digital preservation is a rapidly evolving, multifaceted field, encompassing library and information sciences, archival science, data sciences, computer science and digital humanities. Digital preservation systems consist of software, hardware, standards, and institutional and organizational setups. Such complexity poses significant challenges in meeting Noolaham Foundation’s technical needs.

In 2006, Noolaham Foundation began as an amateur and volunteer effort to bring printed Tamil books online. We had limited knowledge of the standards, processes and technologies employed by libraries and archives. Since 2012, through institutional and organizational development, Noolaham Foundation has matured into a leading digital preservation and archiving institution in Sri Lanka. However, technological infrastructure development has lagged and currently acts as a bottleneck for several key projects. Thus, Noolaham Foundation is at a crossroads and needs to develop its technical infrastructure to fully support its vision and objectives.

As Noolaham Foundation transitions into a mature archival institution, the executive/regulatory board of the Foundation requested a clear assessment of its technological infrastructure, challenges, needs and strategies. This report was prepared to address that request.

Definitions

This report will use various technical terms and concepts that may not be familiar to some readers. Thus, they are defined below:

A Resource is a physical or digital unit that has informational and/or cultural value.

A Digital Object or Digital Resource refers to a “logically meaningful unit that is deposited with an archive”. Essentially it consists of datastreams of the resource and metadata about the content of the datastreams.

Digital Preservation “is defined as the managed activities necessary:

  1. For the long term maintenance of a byte stream (including metadata) sufficient to reproduce a suitable facsimile of the original document and
  2. For the continued accessibility of the document contents through time and changing technology”.

Digital Repository or Digital Archive is an organizational and technical mechanism for collecting, storing, using and managing digital resources.

Digital Library is used to refer to the user gateways to digital resources.

Digitization refers to the process of converting physical resources to digital resources, for example by scanning.

Ingestion is the process whereby a collection of digital objects is automatically or semi-automatically deposited within a digital repository.
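
The definition of a Digital Object above (datastreams plus metadata about their content) can be illustrated with a small data structure. This is only a sketch under stated assumptions: the class names, field names and the identifier scheme are hypothetical and are not taken from any Noolaham Foundation system.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Datastream:
    """One byte stream belonging to a digital object (e.g. a scanned PDF)."""
    label: str
    mime_type: str
    content: bytes

    def checksum(self) -> str:
        # SHA-256 digest of the bytes, usable later to detect corruption.
        return hashlib.sha256(self.content).hexdigest()


@dataclass
class DigitalObject:
    """A logically meaningful unit: datastreams plus metadata about them."""
    identifier: str  # hypothetical identifier scheme
    metadata: Dict[str, str] = field(default_factory=dict)
    datastreams: List[Datastream] = field(default_factory=list)

    def manifest(self) -> Dict[str, str]:
        # Map each datastream label to its checksum.
        return {ds.label: ds.checksum() for ds in self.datastreams}


book = DigitalObject(
    identifier="example:0001",
    metadata={"title": "Sample book", "language": "ta"},
    datastreams=[Datastream("scan", "application/pdf", b"%PDF-sample")],
)
```

Keeping the checksum alongside each datastream is one simple way to support the "long term maintenance of a byte stream" part of the digital preservation definition.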

Challenges

Noolaham Foundation faces technical, organizational and resource challenges in meeting its evolving technology needs.

Although Noolaham Foundation has recognized the importance of technology, adequate resources are not allocated to address the demand. Noolaham has failed to invest in technology.

Noolaham Foundation lacks core technical competencies. Noolaham stakeholders, including the Board and the Executive, lack adequate understanding of the complexities involved in digital preservation. Thus, education and communication about issues around long-term digital preservation are vital.

Noolaham Foundation needs to build capacity to undertake research, development and maintenance of the technical infrastructure needed for a digital archiving institution. Noolaham Foundation, being an organization in transition, has little experience in managing volunteers or contractors for technical projects.

Noolaham Foundation lacks adequate secure backup systems for its digital objects. It lacks a LOCKSS (Lots of Copies Keep Stuff Safe) system that can aid in the preservation of public domain and Creative Commons licensed resources.

Noolaham Foundation’s organizational and technical setup does not adequately enforce copyright management policies.

Noolaham Foundation lacks adequate conceptualization or modeling for digital objects.

Noolaham Foundation aims to digitally preserve and share a diverse set of collections from a varied number of sources. Technical architecture for archival repositories that support different types of data such as images, video, audio, websites, presentations (ppts) and text is a major challenge.

Noolaham Foundation is organized into Chapters and undertakes a diverse set of projects with collaborating organizations. The current centralized digital repository architecture cannot adequately support this requirement, either technically or organizationally.

Interoperability, persistent digital object identification and metadata services are lacking in the current repository architecture.

Noolaham lacks Collections Services or Mechanisms other than a static portal.

Noolaham Foundation finds it difficult to adopt international standards for archival scanning, categorization, digital preservation and metadata (administrative, descriptive, structural/technical) development.

Currently, the digitization process is centralized, manual and labour intensive. Labour-intensive manual processes impede scaling up of operations.

Noolaham Foundation’s delivery mechanisms, including the Digital Library, are not designed to support archival repositories. For example, the digital library lacks accurate general search, metadata-based search, browsability and user community features such as reviews.

Noolaham Foundation lacks effective offline delivery mechanisms.

Noolaham Foundation does not take advantage of crowd sourcing or user driven project models.

Noolaham Foundation does not have review or audit mechanisms for its technical systems. It does not distill best practices from successful or failed projects.

A community of practice focused on digital preservation and related technologies has yet to be developed. Noolaham Foundation has also failed to build links with mainstream technically focused organizations.

Noolaham Foundation lacks a knowledge base (ideas, resources, notes, contacts) regarding technical matters relating to digital preservation, digital library, digitization, information and library sciences, museology, data sciences etc.

Major Goals - Three Years

  • Develop an OAIS compliant Digital Object model to handle various content types.
  • Develop an OAIS compliant digital preservation system and migrate the current digital resources into that system.
  • Streamline and bring online digitization and digital preservation processes.
  • Semi-automate or fully automate at least 60% of scanning.
  • Develop robust and user friendly discovery mechanisms with capabilities such as search, browse, multimedia and collaboration.
  • Identify and support activities involved in long-term preservation.
  • Create a LOCKSS server to distribute public domain resources.
  • Build capacity to develop and maintain complex software projects.
  • Develop an information portal (for example, a Noolaham Blog) to collect and share material on digital preservation and related subjects. Build a community around that portal.
  • Identify accessibility issues and undertake pilot projects to make Noolaham Foundation resources available to differently abled people.
  • Undertake pilot projects in collaboration with other organizations to develop Tamil Optical Character Recognition (OCR) software.

Strategic Priorities

To build and sustain an advanced technical infrastructure, the following strategies are formulated to address the issues and challenges (risks) identified above.

1. Adequate Resource Allocation for Technology

Noolaham Foundation must make an active commitment to invest in technology. Realistic cost assessments and resource allocation are the first steps in building the technical infrastructure required by the Foundation.

2. Implementation of Open Archival Information System (OAIS) based Standards and Technologies

OAIS is the internationally recognized reference model for digital archives. OAIS and related standards help ensure quality in all of Noolaham Foundation’s activities, including acquisition, rights management, metadata, administration and accessibility.

3. Build Technical Capacity and Communities of Practice

Noolaham Foundation needs to expand its expertise in executing technology projects such as software development. This requires a different set of processes and management expertise than digitization or documentation projects. Digital preservation technologies require expertise in specialized areas. Noolaham Foundation needs to develop resources and training opportunities for volunteers to develop skillsets in these areas.

4. User Centered Approach

All of Noolaham Foundation’s projects aim to provide immediate and practical results to its user community. Methods for user-centered development include listening to user input, measuring and improving user reach, and building user-friendly interfaces.

5. Develop Self Sufficient and Shared Infrastructure

As a maturing archiving organization, Noolaham Foundation strives to develop self-sufficient hardware, software and system infrastructure to support all its activities. When appropriate, Noolaham Foundation shall collaborate with partner organizations to develop shared infrastructure.

6. Free and Open Source Technologies

As an organization founded on the principles of Free and Open Access to knowledge resources, Noolaham Foundation is committed to using and developing Free and Open Source technologies at all levels of its operations, especially in digital preservation related technologies.

Action Plan

Approaches to Technology Development

Focus resources on developing core competencies (please see section below for details) in digital preservation and digital library services. Develop expertise in a set of FOSS technology ecosystems.

Noolaham Foundation should follow the Internet Archive model of keeping it simple. As stated by Brewster Kahle, “we don’t do anything that isn’t immediately obvious to college students with Linux on their dorm-room desktop”.

Use well defined, time bound development cycles to produce usable end products. Develop a project model to make use of varied resources such as experts, volunteers, contractors and staff.

Develop technical infrastructure as shared infrastructure in collaboration with partner organizations.

Research, Collaboration, Development and Innovation

Digital repository, digital preservation, digital library, metadata and related areas demand ongoing, focused research, collaboration, development and innovation. Noolaham Foundation must recognize this need and create special hybrid research and development groups to meet its needs.

Data analytics (process data, usage data and engagement data) should inform the technological and organizational development of the Foundation.

Develop Interoperable Digital Preservation/Repository System

All of Noolaham Foundation’s projects should make use of the following proposed interoperable digital repository and digital preservation services. Every resource deposited to a Noolaham Foundation project should have basic attributes such as metadata, identification and services associated with it. Thus, each digital object can be deposited to the Noolaham Foundation Primary Repository System with ease. If a resource is deposited with one project, collection mechanisms should have access to that resource and related services. For example, if a book is deposited with the Pallikoodam project, the Noolaham Digital Library project need not digitize it or add metadata again.

Noolaham Digital Library project should be able to harvest the resource and metadata from the Pallikoodam repository. Another example is if Noolaham Canada develops a digital repository for Canadian Tamil Publications, those resources should be harvested and searchable by Noolaham Digital Library or other collection mechanisms.
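
Harvesting of this kind is commonly done with OAI-PMH, which is listed among the evaluation criteria in Appendix 4. The following is a minimal sketch of reading Dublin Core records out of an OAI-PMH `ListRecords` response. The sample response, the record identifier and the function name `parse_records` are assumptions made for illustration; no actual Pallikoodam endpoint is implied, and a real harvester would also need to fetch pages over HTTP and follow resumption tokens.

```python
import xml.etree.ElementTree as ET

# Standard namespaces used by OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical response from a peer repository (illustration only).
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:pallikoodam.example:42</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample textbook</dc:title>
          <dc:language>ta</dc:language>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def parse_records(xml_text):
    """Return a list of {'identifier': ..., 'metadata': {...}} dicts."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for rec in root.findall(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        fields = {}
        if dc is not None:
            for el in dc:
                # Strip the "{namespace}" prefix to get the bare element name.
                name = el.tag.split("}", 1)[1]
                # Dublin Core elements are repeatable, so collect lists.
                fields.setdefault(name, []).append(el.text or "")
        records.append({"identifier": identifier, "metadata": fields})
    return records


harvested = parse_records(SAMPLE_RESPONSE)
```

Because the harvested record carries both an identifier and descriptive metadata, the harvesting project can index the resource without digitizing or cataloguing it again.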

Develop a Digital Object Model that can be used across all of Noolaham Foundation’s projects and other similar organizations.

Develop and deploy an Interoperable Digital Repository Management System(s) that will provide digital preservation services, including metadata management, storage, access control, workflows, search, ingestion, Persistent Identifiers (PID) and cataloging. Have mechanisms to ingest user-contributed content.

Setups to consider:

  • Peer Repositories can deposit into or update a Primary Repository
  • Primary Repository can deposit into Peer Repositories

The end result will be the same, but the workflow and administrative processing will be different.

Develop Digital Library solutions to provide browse, search, other discovery mechanisms such as keywords, export, and user community features such as review and social media.

Implement Digital Object Identification (DOI) solution for

Develop Collection Mechanisms that will provide search, browse, and cataloging solutions for collections using the distributed repository systems.

Develop LOCKSS System/Peer-to-Peer Systems to distribute Public Domain and Creative Commons licensed works.
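
The idea behind keeping many copies can be sketched as a checksum comparison across sites, where a copy that disagrees with the majority is replaced. This is a deliberately simplified illustration of the principle and not the actual LOCKSS polling protocol. The site names and function names below are hypothetical.

```python
import hashlib
from collections import Counter


def digest(data: bytes) -> str:
    """SHA-256 digest used to compare copies without trusting any one site."""
    return hashlib.sha256(data).hexdigest()


def audit_copies(copies):
    """copies maps site name -> bytes held at that site.

    Returns (majority_digest, sorted list of sites that disagree with it).
    On a tie, the digest seen first is treated as the majority.
    """
    digests = {site: digest(data) for site, data in copies.items()}
    majority, _count = Counter(digests.values()).most_common(1)[0]
    damaged = sorted(site for site, d in digests.items() if d != majority)
    return majority, damaged


def repair(copies):
    """Overwrite disagreeing copies with a copy that matches the majority."""
    majority, damaged = audit_copies(copies)
    good = next(data for data in copies.values() if digest(data) == majority)
    for site in damaged:
        copies[site] = good
    return damaged


# Three hypothetical sites; one copy has been corrupted.
copies = {"colombo": b"page-1", "jaffna": b"page-1", "toronto": b"page-X"}
repaired_sites = repair(copies)
```

After the repair step, a second call to `audit_copies(copies)` reports no damaged sites.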

Systematization of Digitization and Preservation Services

Identify all the tasks, processes and workflows involved in digitization and digital preservation.

Increase automation in areas such as scanning, ingestion and cataloging, in particular by moving to semi-automated and fully automated scanners.

Core Technical Competencies

The Digital Preservation System can be conceptualized as three layers. The bottom layer is the storage, backup and distribution of digital content. The next layer consists of preservation services such as metadata creation and management, rights and access management, creation of delivery and preservation formats, and cataloging. The top-most layer is the end-user facing layer, consisting of browse, search, export and user community features. The processes involved are conceptualized in the Open Archival Information System reference model. The range of functions may be available in one integrated system or in separate yet interoperable systems and services. Noolaham Foundation seeks to develop core competencies in digital preservation technologies.

Digital Preservation Technologies

Long-term preservation is a core objective of the Noolaham Foundation. The software (e.g. digital repository software), hardware (e.g. scanners, storage devices, servers) and system challenges in building end-to-end digital preservation solutions are immense. By focusing on and developing advanced expertise in this area, Noolaham Foundation can position itself as a leading digital preservation institution in South Asia. Please refer to “Research Challenges in Digital Archiving and Long-Term Preservation” (referenced below) for an extended discussion of areas and issues concerning digital preservation technologies.

Digital Library and Delivery Mechanisms

Noolaham Foundation projects are focused on making the knowledge of Sri Lankan Tamil-speaking communities accessible to all, practically and immediately. Digital Library gateways, Collection Mechanisms and other delivery mechanisms play a key role in achieving this objective. Getting this right is critical for Noolaham Foundation’s success. From advanced browse and search facilities to multimedia support and export features, Digital Library solutions present numerous challenges.

Documentation Technologies (Information Capture)

Noolaham Foundation works with many marginalized and undocumented communities. These communities lack formal records, such as publications or multimedia, of their knowledge resources. Thus, developing documentation expertise (photography, audio, video, artifacts) is key to capturing valuable knowledge systems from these communities. Noolaham Foundation also works with connected and dispersed communities. Creating a platform for members of these communities to contribute content (crowdsourcing) will aid in growing Noolaham Foundation’s collections.

Enabling Technologies

Enabling technologies are key technologies that fall outside traditional preservation technologies, yet are critical for digital preservation efforts. General-purpose Tamil Optical Character Recognition (Tamil OCR) is an enabling technology that Noolaham Foundation needs to invest in.

Operational Technologies

The website, a relationship management system (for volunteers, donors and content creators), email, an internal document management system, project management systems and analytics are critical for Noolaham Foundation’s success.

Key Projects

Critical Projects

  • Distributed Interoperable Digital Repository System and Preservation Services
  • Metadata
  • Integrated Metadata Harvester and Catalog
  • User Shared Collections
  • Born Digital Collections Harvester and Archive
  • Digital Library gateways
  • Multimedia Archive
  • Manuscript Archive
  • Microfilm Archive
  • Collection Mechanisms
  • Pallikoodam - Virtual Learning Environments
  • Web Archive & Aggregator
  • Low cost automated scanning solutions

High Value Projects

  • Offline delivery mechanisms
  • Marketing Website
  • Donor Engagement and Management Software
  • Library and Information Sciences, Archiving, Museology, Digital Libraries etc Information/Community Portal

Advanced Projects

  • Tamil Optical Character Recognition
  • Textual Search Tool - Search capability for Tamil Digital Library (linked with Tamil OCR)
  • Vertical Tamil Search Engine (can be linked with Web Archive Project)
  • North, East and Hill-country Geographic Information System

Other Projects

  • Integrated Village Portals
  • Local Knowledge Maps and Databases (Plants, Herbs, Crafts etc)
  • Personal Noolaham Reader & Contributor Tool
  • History Timeline
  • Offline delivery mechanisms
  • Tamil Data Visualization Project (ex 3D rendering, Graphics)

References

2003. Research Challenges in Digital Archiving and Long-Term Preservation

http://www.digitalpreservation.gov/documents/about_time2003.pdf

Defining Collections in Distributed Digital Libraries

http://cdigital.uv.mx/bitstream/123456789/6080/2/Bibliotecas%20digiitales%20colecciones.pdf

Digital Preservation Architecture and Technology for Trusted Digital Repositories

http://www.dlib.org/dlib/june05/jantz/06jantz.html

Coordination Action on Digital Library Interoperability, Best Practices and Modelling Foundation

https://www.coar-repositories.org/files/D3-4-Digital-Library-Technology-and-Methodology-Cookbook1.pdf

Library Technology Forecast for 2014 and Beyond

http://www.infotoday.com/cilmag/dec13/Breeding--Library-Technology-Forecast%20-for-2014-and-Beyond.shtml

Metadata: Standards & Structures

http://www.bslw.com/metadata/

Appendix 1 - Metadata

Descriptive Metadata describes a resource for purposes such as discovery and identification. It can include elements such as title, abstract, author, and keywords. Noolaham Foundation uses Dublin Core standards for descriptive metadata.

Administrative Metadata provides information to help manage a resource, such as when and how it was created, who can access it, copyrights information, selection criteria or archiving policy, contractual information and general administrative information. Administrative metadata is sometimes divided as Rights Management Metadata and Preservation Metadata.

Structural Metadata indicates how compound objects are put together, for example, how pages are ordered to form chapters. An e-book format is an example of structural metadata. METS (Metadata Encoding and Transmission Standard) is one standard for encoding it.

Technical Metadata describes the technical processes used to produce, or required to use a digital object.

Please refer to T11 NF Metadata Standards documents for details.
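
As a rough illustration of the metadata categories above, the sketch below groups a record's metadata by type and serializes only the descriptive part as Dublin Core XML. The record contents, the grouping keys and the function name `to_dublin_core` are hypothetical; the actual element choices are governed by the NF metadata standards document, which this sketch does not attempt to reproduce.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical record, grouped by the metadata types described above.
record = {
    "descriptive": {"title": "Sample book", "creator": "Unknown author", "language": "ta"},
    "administrative": {"rights": "Public domain", "ingested": "2014-09-26"},
    "structural": {"page_order": "001.tif,002.tif"},
    "technical": {"scan_resolution_dpi": "300"},
}


def to_dublin_core(fields):
    """Serialize descriptive fields as namespaced Dublin Core elements."""
    root = ET.Element("record")
    for name, value in fields.items():
        el = ET.SubElement(root, "{%s}%s" % (DC_NS, name))
        el.text = value
    return ET.tostring(root, encoding="unicode")


dc_xml = to_dublin_core(record["descriptive"])
```

Separating the categories in this way lets descriptive metadata be shared widely for discovery while administrative and technical metadata stay with the repository.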

Appendix 2 - E-book

An e-book is a book published in digital form, consisting of text, images and interactive elements. An e-book can be published in many formats. To read an e-book, you need an e-book reader on your device that supports the format in which the e-book is published.

E-book Formats

PDF is the most widely used e-book format. The format specification was published as an open standard (ISO 32000-1) in 2008. Although PDF is widely used and supported, creating PDF documents is a cumbersome process that requires commercial applications. Users can open PDF documents on all mobile devices, but PDFs have a fixed page layout, which makes them hard to read on small screens.

EPUB is a widely supported (with the notable exception of Amazon Kindle) open e-book format. It is published by the International Digital Publishing Forum (IDPF), an industry consortium. EPUB is built on HTML and CSS. Various high-quality open source tools are available to read and create EPUB e-books. EPUB3 is the latest version of the format, and it supports interactivity within an e-book. Unlike a PDF, an EPUB e-book reflows its content to fit the screen.

MOBI is an e-book format primarily developed and supported by Amazon. It has a large commercial market share, and it is similar in functionality to EPUB.

E-book Readers

EPUB e-books can be read in Firefox (EPUB Reader - www.addons.mozilla.org/en-US/firefox/addon/epubreader/) and Chrome (Readium - www.readium.org) using browser add-ons. Internet Explorer does not have an add-on, but you can visit www.epubread.com/app/reader.html to read EPUB files from Internet Explorer.

E-Book Creation and Conversion

An array of FOSS and commercial tools exists to create e-books and to convert an e-book from one format to another.

Appendix 3 - Technology Project Development Lifecycle

  • Business Requirements and Use Cases (Organizational, User based)
  • Requirements Specifications (Technical)
  • Request for Proposal (RFP)
  • Solution Proposal & Demo (external)
  • Solution Design (internal)
  • Solution Selection Approval
  • Implementation
  • Testing
  • User Acceptance Testing
  • Deployment
  • Maintenance, Documentation
  • End of Life Cycle Plan

Appendix 4 - Digital Library Software Evaluation Metrics - High Level (Draft)

Usability

  • Browse
  • Search
  • Key Words, Tags
  • Collections
  • Multimedia Support
  • Access Control
  • Rights Management
  • Social Media
  • Mobile Interface Support
  • Modern / Themable Interface
  • Import / Export Digital Objects as an item or in batch
  • Multilingual interface / Localized / Unicode
  • Basic CMS
  • User Contributions: Share, Review, Tag, Rate, Annotate, Add to Favourites, You may also like

Metadata

  • Descriptive metadata: Dublin Core
  • OAI-PMH, SWORD
  • Other (RDA)
  • METS
  • PREMIS

Storage

  • File based
  • Database based
  • Cloud based
  • Security
  • Backup
  • Distribute

Administration

  • Easy to upload a text file
  • Easy to upload an audio file
  • Easy to upload a small/medium size video file
  • Easy to add metadata
  • Easy to classify an item
  • User management
  • Workflow management
  • Open Source
  • Easy to Install
  • Low-medium resource requirement
  • Extendable
  • Expertise available

Appendix 5 - High Level System Diagram

Peer Collection / Repository System < SIP/DIP - DIP/AIP > Primary Repository System [AIP]
External Repositories < DIP/AIP - AIP/DIP > Primary Repository System [AIP]
Primary Repository System [AIP] -> Integrated Discovery Mechanism (long term)

Primary Repository System: All digital objects are collected here. It provides long-term storage, metadata, identification, search and preservation services. It is built on top of a modular and extensible digital object and repository architecture.

Peer Collection / Repository Systems: These can be stand-alone systems implementing a project for a designated user community (e.g. Pallikoodam, Norwegian Tamil Digital Library).

(Open) External Repositories: Repositories such as the British Library provide access to their resources, which can be ingested and incorporated into the NF Primary Repository System. This will become more important from an information services point of view.

Integrated Discovery Mechanism: This is our ideal long-term user interface, similar to the National Library of Australia’s Trove system.

Appendix 6 - Analytics

  • Usage Data
  • Internal Documentation