Legacy Devnagri to Unicode font converter for formatted documents like word, spreadsheet, presentation

Aim

The user uploads an office document (a word-processor file, spreadsheet or presentation) containing text in a legacy Devnagri font such as Shree-Dev or Kruti-Dev.

The program returns a modified version of the file for download, in which every instance of legacy-font text has been replaced with the same text in Unicode.

Formatting, tables and other layout stay intact, and any text that is not in a legacy font is left as it was.
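The behaviour described above could be sketched roughly as follows, assuming the document has already been parsed into a list of text runs that each carry their own font name. The run shape (`text`, `font`, other formatting fields), the font-name prefixes, the target font `Mangal` and the function names are all assumptions made for illustration, not part of any existing converter. The text converter itself is passed in as a parameter, so the existing JavaScript mapping logic could be plugged in unchanged.

```javascript
// Hypothetical list of legacy font-name prefixes; extend to match real documents.
const LEGACY_FONT_PREFIXES = ["kruti dev", "kruti-dev", "shree-dev"];

// Assumed target font for converted text; any Unicode Devnagri font would do.
const UNICODE_FONT = "Mangal";

// True when the font name starts with one of the legacy prefixes (case-insensitive).
function isLegacyFont(fontName) {
  const name = (fontName || "").toLowerCase();
  return LEGACY_FONT_PREFIXES.some((prefix) => name.startsWith(prefix));
}

// runs: array of { text, font, ...otherFormatting } (a hypothetical shape).
// convertLegacyText: function that maps legacy-encoded text to Unicode text.
// Only legacy-font runs are converted; every other run is returned untouched,
// and all formatting fields other than text and font are carried over.
function convertRuns(runs, convertLegacyText) {
  return runs.map((run) =>
    isLegacyFont(run.font)
      ? { ...run, text: convertLegacyText(run.text), font: UNICODE_FONT }
      : run
  );
}

// Usage with a stand-in converter that only marks what would be converted.
const converted = convertRuns(
  [
    { text: "Report: ", font: "Arial", bold: true },
    { text: "dke", font: "Kruti Dev 010", bold: false },
  ],
  (text) => `<${text}>`
);
console.log(converted);
```

Working at the level of runs rather than whole paragraphs is what lets English text and legacy text share a paragraph without interfering with each other. How runs are read from and written back to a real office file is outside this sketch.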

What we have

We have simple web pages, written in JavaScript, that do this job for unformatted text copy-pasted into the page. These converters are published at the links below.

Problem statement

The simple converters fall short when we have to convert entire formatted documents, especially documents that mix normal and legacy-font text. Their text boxes strip away all formatting. And because legacy-to-Unicode conversion works by character mapping, any normal (that is, English) text pasted in has its characters replaced according to the same mapping. See this screenshot for the character mappings of some popular legacy fonts
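The character-replacement problem can be seen in a small sketch. The three-entry table below is written in the style of a Kruti Dev mapping purely for illustration; it is an assumption, has not been checked against the font, and a real converter needs a much larger font-specific table plus reordering rules. `naiveConvert` is a hypothetical name for a converter that, like the text-box pages, has no idea which font each character was typed in.

```javascript
// Illustrative three-entry mapping (an assumption, not a verified font table).
const SAMPLE_MAP = new Map([
  ["d", "क"],
  ["k", "ा"],
  ["e", "म"],
]);

// Naive converter: replaces every mapped character, regardless of its font.
function naiveConvert(text) {
  return Array.from(text, (ch) => SAMPLE_MAP.get(ch) ?? ch).join("");
}

// Legacy-encoded input converts as intended under this sample table.
const legacyWord = naiveConvert("dke");

// Plain English input is corrupted, because "d", "e" and "k" are also mapped.
const englishWord = naiveConvert("desk");

console.log(legacyWord, englishWord);
```

Since the converter sees only characters, the English word comes out as a mix of Devnagri and Latin letters. Avoiding this requires knowing the font of each piece of text, which pasted plain text no longer carries.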

Use cases

Scores of documents in government offices have been written in legacy fonts over many years, and the offices have a standing order (Note: find the exact GR for Maharashtra and insert link here) to convert all of these documents to Unicode. Most people don't even know that the simple scripts linked above exist. An automated solution that converts entire documents, while taking care of the problems mentioned above, would greatly help in bringing all documents up to Unicode. Several non-governmental institutions and companies probably have the same issue. Anyone who was using Indian languages on computers before Unicode Devnagri became mainstream is probably facing this issue and can use this solution.

Links