Some ideas - dtolpin/yycat GitHub Wiki

YUNG YiDiSH Catalogue

How the Book Collection looks now

YY has a vast collection of Yiddish books. The books come from

public libraries which close their Yiddish sections;
private collections (when the owner dies or otherwise loses control, I assume);
trash bins.

Some of the books are on the shelves,

either in alphabetical order by author/title;
or grouped by subject.

Most of the books are in boxes, and most of those are unsorted. There are many periodicals: literary journals, magazines, newspapers. There are also some manuscripts, not all attributed. An estimate is that there are about 40,000 units in the library, of which about 3,000 are on the shelves of YUNG YiDiSH Tel Aviv; the rest are in YUNG YiDiSH Jerusalem or in boxes. There is a partial catalogue (in Microsoft Access format) containing information about the attributes and locations of some of the books in the collection.

Objectives: Sorting and Cataloging

Two things need to be done:

The books in the boxes must be sorted and put on shelves.
All of the books must be registered in a computer catalogue.

Additionally, the books can be gradually digitized, fully or partially, but this is a long-term objective of lower priority.

Sorting

Leaving aside the logistical issues (enough space, enough workforce, air conditioning etc.), the main challenge is registering the books as the sorting goes on. I therefore suggest that the cataloging tool chain be prepared and tested on the books already on the shelves in YUNG YiDiSH Tel Aviv. As soon as that 3,000-book collection is registered and conveniently accessible, we can proceed to the boxes.

Cataloging

How the Perfect Catalogue Should Look and Work

Searching in the catalogue

The electronic catalogue is stored on a computer and is accessible from the Internet. The user can find a book record in the catalogue by searching for a combination of attributes (title, author, year) and/or by specifying a set of topics or tags (periodical, music, poetry etc.). A book record contains basic bibliographical information about the book, the book's location (or locations) in the collection, and images of the title page (optionally of several initial pages). One example is the catalogue of the Amazon book store: a heterogeneous book collection with images and relevant information, where a record provides an easy-to-grasp general idea of the book. Different entries may carry varying amounts of information.

Every item in the collection should have a unique identifier, like an ISBN or ISSN. Once a book is located in the catalogue, it can later be referenced by the unique identifier, which stays the same no matter how the catalogue is re-ordered, reorganized, expanded, or improved. Most of the books and periodicals are old and do not have assigned ISBN/ISSN numbers, hence a way to simulate ISBN/ISSN-like identifiers should be implemented. Should there be an ISMN (for manuscripts)? Should a different scheme for identifiers be used instead?
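One way to simulate ISBN-like identifiers is to reuse the 13-digit layout and check-digit rule of ISBN-13 with a locally chosen prefix and a running serial number. The sketch below is only an illustration under that assumption: the prefix `"200"`, the serial-number width, and the function names are hypothetical choices, not a decided scheme.

```python
def check_digit(first12: str) -> int:
    """ISBN-13 / EAN-13 check digit: weights alternate 1, 3 from the left."""
    if len(first12) != 12 or not (first12.isascii() and first12.isdigit()):
        raise ValueError("expected exactly 12 decimal digits")
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def make_identifier(serial: int, prefix: str = "200") -> str:
    """Build a 13-digit identifier from a local prefix and a serial number.

    The prefix is a hypothetical placeholder for 'assigned by this catalogue'.
    """
    body = f"{prefix}{serial:0{12 - len(prefix)}d}"
    if len(body) != 12:
        raise ValueError("serial number does not fit into the identifier")
    return body + str(check_digit(body))


def is_valid(identifier: str) -> bool:
    """Check length, digits, and the trailing check digit."""
    if len(identifier) != 13 or not (identifier.isascii() and identifier.isdigit()):
        return False
    return check_digit(identifier[:12]) == int(identifier[12])
```

Because the identifier is derived only from the serial number, it does not change when the catalogue is re-ordered, and the check digit catches most single-digit typing or scanning errors.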

Yiddish-specific: the catalogue should be in Yiddish with (automatic) transliteration of authors and titles. The user should be able to search in either Hebrew or Latin script, and spelling discrepancies must be accounted for automatically.
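Part of the spelling problem can be handled by comparing normalized search keys instead of raw strings. The sketch below is an assumption about how this could be done, not a full transliteration: it strips vowel points and other combining marks, expands the Yiddish ligature characters, folds final letter forms, and then uses approximate matching from Python's standard `difflib`. The function names and the cutoff value are hypothetical.

```python
import difflib
import unicodedata

# Final letter forms are folded to their regular forms.
FINAL_FORMS = str.maketrans({
    "\u05da": "\u05db",  # final kaf -> kaf
    "\u05dd": "\u05de",  # final mem -> mem
    "\u05df": "\u05e0",  # final nun -> nun
    "\u05e3": "\u05e4",  # final pe -> pe
    "\u05e5": "\u05e6",  # final tsadi -> tsadi
})

# Yiddish ligature characters are expanded to two plain letters.
LIGATURES = str.maketrans({
    "\u05f0": "\u05d5\u05d5",  # double vav -> vav vav
    "\u05f1": "\u05d5\u05d9",  # vav yod -> vav yod
    "\u05f2": "\u05d9\u05d9",  # double yod -> yod yod
})


def search_key(text: str) -> str:
    """Return a spelling-insensitive key for Hebrew- or Latin-script text."""
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop combining marks (Hebrew points, Latin accents).
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    return stripped.translate(LIGATURES).translate(FINAL_FORMS).casefold()


def find_titles(query: str, titles: list[str], cutoff: float = 0.8) -> list[str]:
    """Return titles whose keys are close to the query's key."""
    keys: dict[str, list[str]] = {}
    for title in titles:
        keys.setdefault(search_key(title), []).append(title)
    hits = difflib.get_close_matches(
        search_key(query), list(keys), n=5, cutoff=cutoff
    )
    return [title for key in hits for title in keys[key]]
```

Searching across scripts (a Latin query finding a Hebrew-script title) would additionally need a transliteration table, which is deliberately left out here.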

Book labels

Every book in the collection gets a label. The label contains basic information about the book (author, title, year, publisher, list of topics) and the book identifier. The information is provided in both human-readable and computer-readable forms: as text and as barcodes. I suggest that there be TWO barcodes, one representing the IS{B/S/M}N only, the other containing the basic information about the book, so that the user can clip the book information for later use without the need to instantly access the catalogue.

In this way, the user can scan the identifier to find the book record in the catalogue, or just scan/read the basic information for clipping/storage/reference. Any modern smartphone is powerful enough to scan barcodes of moderate size.
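The contents of the two barcodes could be prepared along the following lines. This is a sketch under assumptions: the record field names, the choice of compact JSON as the encoding, and the 300-byte budget for a "moderate size" barcode are all hypothetical, and turning the strings into barcode images is left to whatever barcode tool is chosen.

```python
import json

# Hypothetical size budget so that the second barcode stays easy to scan.
MAX_PAYLOAD_BYTES = 300

# Hypothetical record fields that go into the information barcode.
INFO_FIELDS = ("id", "author", "title", "year", "publisher", "topics")


def identifier_payload(record: dict) -> str:
    """Contents of the first barcode: the identifier only."""
    return record["id"]


def info_payload(record: dict, max_bytes: int = MAX_PAYLOAD_BYTES) -> str:
    """Contents of the second barcode: basic information as compact JSON."""
    data = {name: record[name] for name in INFO_FIELDS if record.get(name)}
    # ensure_ascii=False keeps Hebrew-script text as is instead of \u escapes.
    text = json.dumps(data, ensure_ascii=False, separators=(",", ":"))
    size = len(text.encode("utf-8"))
    if size > max_bytes:
        raise ValueError(f"payload is {size} bytes, budget is {max_bytes}")
    return text
```

Keeping the identifier inside the second payload as well means that a clipped record can still be looked up in the catalogue later.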

Realistic Implementation

I like the idea of creating a catalogue that costs little to create and is free (or almost free) to maintain, and believe this can be done without compromising quality or convenience.

Data storage

The catalogue will contain several tens of thousands of entries at most, a small table by today's standards. A full-blown database is not actually required: the table can be put online as a spreadsheet in Google Docs, with handy scripts for data entry, search, and record display. Some of the functionality is somewhat trickier to implement (like incorporating the images and figuring out where to store them), but I believe everything is doable.

Another advantage of storing the catalogue as a spreadsheet is the ability to export it and store it on a local computer or on a USB drive, for backup, sharing, or offline usage. This is handy and does not require any custom software.
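To illustrate how little is needed to work with an exported copy, here is a sketch of searching a CSV export with Python's standard `csv` module. The column names (`id`, `author`, `title`, `year`, `tags`, `shelf`), the semicolon-separated tag list, and the sample rows are made up for the example and do not describe an existing export.

```python
import csv
import io

# Made-up sample of what an exported spreadsheet might look like.
SAMPLE = """id,author,title,year,tags,shelf
2000000000015,Example Author,First Example Title,1920,prose;fiction,A3
2000000000022,Example Author,Second Example Title,1935,poetry,B1
"""


def load_catalogue(csv_file) -> list[dict]:
    """Read all rows of an exported catalogue into dictionaries."""
    return list(csv.DictReader(csv_file))


def search(records, tags=(), **attributes) -> list[dict]:
    """Find records having all given tags and matching all given attributes.

    Attribute values match as case-insensitive substrings.
    """
    wanted = {tag.lower() for tag in tags}
    results = []
    for record in records:
        record_tags = {
            tag.strip().lower()
            for tag in (record.get("tags") or "").split(";")
            if tag.strip()
        }
        if not wanted <= record_tags:
            continue
        if all(
            str(value).lower() in (record.get(field) or "").lower()
            for field, value in attributes.items()
        ):
            results.append(record)
    return results
```

For example, `search(load_catalogue(io.StringIO(SAMPLE)), tags=["poetry"])` returns only the second sample row; with a real export, an open file would take the place of `io.StringIO(SAMPLE)`.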

Scanning books

Flatbed scanners are no good for old fragile books, and professional book scanners are expensive. But those expensive professional scanners are actually copy stands with one or two digital cameras and lights (see the Wikipedia article on book scanning, for example).

Modern smartphones have good cameras, suitable lights can be bought for 50 shekels, and a perfect book bed can be made from a cardboard box. A small technical problem is how to bolt the phone to the tripod; I ordered a toy copy stand and a phone holder from eBay, and these two gadgets should do the trick. Since we are going to use our (at least my) smartphones for scanning the barcodes, accessing the database and so on, using them for scanning a few pages from every book we record is just natural.

An additional advantage of photographing the books with the phone camera is that smartphones are connected to the Internet, so the shots will be immediately available online for processing.

Generating, printing and using barcode labels

There are plenty of tools for generating barcode images, both online and downloadable. I will write a script (a small program) for generating the labels; we can print them in batches and then stick them on the books catalogued during the previous session. When the book has a label,

we know that the book is in the catalogue;
the catalogue has a field saying what shelf the book must be on;
the basic information about the book is readily available on the book itself, so we don't need to reconstruct it every time. Every library user with a smartphone has access to the data.
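The batch step of such a label script might look like the sketch below. It only prepares the human-readable text of each label and groups the labels into sheets; the field names, the `labelled` flag, and the figure of 24 labels per sheet are assumptions for illustration, and the barcode images would come from a separate tool.

```python
def format_label(record: dict) -> str:
    """Human-readable text of one label (field names are hypothetical)."""
    imprint = ", ".join(
        part for part in (record.get("publisher"), record.get("year")) if part
    )
    lines = [
        record["id"],
        f'{record["author"]}: {record["title"]}',
        imprint,
        "Shelf: " + record.get("shelf", "?"),
    ]
    return "\n".join(line for line in lines if line)


def label_batches(records, per_sheet: int = 24):
    """Yield sheets of label texts for records that have no label yet."""
    pending = [record for record in records if not record.get("labelled")]
    for start in range(0, len(pending), per_sheet):
        yield [format_label(r) for r in pending[start:start + per_sheet]]
```

After a sheet is printed and the labels are stuck on the books, the corresponding records would be marked as labelled so that the next session starts from the remaining ones.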