Galron Lexicon migration design - abartov/bybeconv GitHub Wiki

WHAT IT IS

The Galron Lexicon is a collection of static files (edited with FrontPage). Its core is a bio-bibliographical lexicon of authors; its secondary parts are a variety of bibliographical resources and scans of actual publications.

OVERALL GOAL

Integrate its entire contents into the app and DB of Project Ben-Yehuda (PBY). Eventually, stop serving the static files completely, redirecting automatically from all legacy URLs.

DATA MODEL

  • LexEntry - polymorphic?

    • LexPerson - for people entries in the lexicon
      • calculate copyright status for each
      • eventually add 'genre' field to each LexPerson via some process
      • make it impressionable (i.e. record page views), to support top-ten lists
    • LexPeriodical - for journals or other collections in the Lexicon; may be just bibliography (plain markdown? structured?), or include links to LexTexts
      • journals can have multiple incarnations, each a run of consecutive publication (e.g. 1902–1910 and, after a pause, 1935–1962)
      • journals have LexIssues
      • LexIssue - representing a single issue of a LexPeriodical. LexIssues have an optional volume number, and an issue number (or letter), as well as a system-generated sequential number. They optionally have an editor (or several) associated with this particular issue, distinct from the overall editor of the journal.
      • for the Yiddish lexicon page (9480 in spec) and similar, support a per-publication boolean setting to show the A-Z navbar. The navbar would be simple jumps to anchors, so it depends entirely on the markdown containing H2 headings (##) titled א etc.
  • LexPublication - entity representing a single publication by a particular LexPerson, to potentially link to a Collection of type volume, and to associate with LexCitations that are specifically about it, rather than generally about the LexPerson.

  • LexText - entity for works made available via the lexicon. Initially, will link to a PDF; as a standard PBY edition is prepared, a link to the PBY edition would be added to the LexText entity as well.

  • LexCitation - entity representing a citation. It links to the cited source's author[s] in PBY and/or the Lexicon if available, and to Wikidata if available; otherwise the author is stored only as a plain string. It also links to an external full text if one is available somewhere online, e.g. Ha'aretz.co.il

  • LexLink - a collection of external links for entries and for the homepage (9100 in spec)

  • (UX note: when editing a lexicon page, the user will be prompted if there are PBY editions by this author/publication that are not yet linked from the lexicon entry markdown.)
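
As a rough illustration of the LexPeriodical part of the model above, here is a minimal Python sketch (Python is used only as a neutral notation). The names `PublicationRun`, `add_issue`, `seq`, `issue_label` and `show_az_navbar` are assumptions made for this example, not the app's actual schema. It shows a periodical with several incarnations, and issues that carry an optional volume, an issue number or letter, a system-generated sequential number, and optional per-issue editors.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PublicationRun:
    """One incarnation of a journal (hypothetical name)."""
    start_year: int
    end_year: Optional[int] = None  # None: no end year recorded


@dataclass
class LexIssue:
    issue_label: str               # issue number or letter, kept as text
    seq: int                       # system-generated sequential number
    volume: Optional[int] = None   # optional volume number
    editors: List[str] = field(default_factory=list)  # editors of this issue only


@dataclass
class LexPeriodical:
    title: str
    runs: List[PublicationRun] = field(default_factory=list)
    issues: List[LexIssue] = field(default_factory=list)
    show_az_navbar: bool = False   # assumed name for the per-publication setting

    def add_issue(self, issue_label, volume=None, editors=None):
        """Append an issue, assigning it the next sequential number."""
        issue = LexIssue(
            issue_label=issue_label,
            seq=len(self.issues) + 1,
            volume=volume,
            editors=list(editors or []),
        )
        self.issues.append(issue)
        return issue


# Two incarnations, mirroring the 1902-1910 / 1935-1962 example.
journal = LexPeriodical(
    title="Example Journal",
    runs=[PublicationRun(1902, 1910), PublicationRun(1935, 1962)],
)
first = journal.add_issue("1", volume=1)
second = journal.add_issue("ב", editors=["Guest Editor"])
```

Keeping the issue label as text lets numbered and lettered issues share one field, while the sequential number gives a stable ordering regardless of how a given journal labels its issues.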

PHASE ONE

  • Set up a hybrid routing system, serving legacy static files for all content that hasn't yet been fully migrated.

  • Identify lexicon entries that are about people (as distinct from literary journals or anthologies, which are also top-level entries in the lexicon)

  • Parse person entries and populate LexPerson entities, creating a LexEntry for each as well. (The main Lexicon is an alphabetical list of LexEntries; most are LexPersons, and some are LexPublications.)

    • bio converted to a markdown buffer
    • person's publications converted to a LexPublication
      • LexPublication linked to a Collection (volume) if its contents are already being served in PBY's main repository
    • reviews (LexCitations) of the person in general are linked to the LexPerson
    • reviews (LexCitations) of a particular publication by the person are linked to the LexPublication
    • external links are converted to a LexLink
  • Create a back-end interface for reviewing the auto-converted content, allowing reviewers to edit it and to link the LexPerson to an existing Authority record, if one exists. Ideally, try to auto-identify matching entities (taking into consideration an Authority's other designations as well), and get the human reviewer to confirm or correct the proposed identification.

  • Create a front-end view for the list of LexPerson entries (screen D9200 in SPEC)

  • Create a front-end view for a single LexPerson entry (screen D9300 in SPEC)

  • Create at least a basic back-end for editing LexPerson, LexCitation, LexPublication, and LexLink entities, to allow maintaining and updating migrated entries of the Lexicon while the rest is still served from (and maintained in) the static files.
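
The hybrid routing step above could work roughly as follows. This is a minimal Python sketch of the decision logic only, not the app's actual router. The function name `route_legacy_request`, the example paths, and the idea of an in-memory mapping are all assumptions for illustration. A migrated legacy URL is redirected to its new location; anything not yet migrated is still served from the static files.

```python
import posixpath


def route_legacy_request(path, migrated, static_files):
    """Decide how to answer a request for a legacy lexicon URL.

    Returns an (action, target) pair:
      ("redirect", new_url)   - the entry has been migrated into the app
      ("static", file_path)   - not migrated yet, serve the legacy file
      ("not_found", path)     - the path is unknown
    """
    # Collapse duplicate slashes and "." segments before looking up.
    normalized = posixpath.normpath(path)
    if normalized in migrated:
        return ("redirect", migrated[normalized])
    if normalized in static_files:
        return ("static", normalized)
    return ("not_found", normalized)


# Hypothetical data: entry-a is migrated, entry-b is still static only.
MIGRATED = {"/legacy/entry-a.html": "/lex/entries/1"}
STATIC_FILES = {"/legacy/entry-a.html", "/legacy/entry-b.html"}

migrated_result = route_legacy_request("/legacy/entry-a.html", MIGRATED, STATIC_FILES)
static_result = route_legacy_request("/legacy//entry-b.html", MIGRATED, STATIC_FILES)
missing_result = route_legacy_request("/legacy/missing.html", MIGRATED, STATIC_FILES)
```

Checking the migrated mapping first means an entry stops being served from its static file as soon as it is marked migrated, even while the file itself remains on disk.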
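
The auto-identification of matching Authority records mentioned above could start from something as simple as the following Python sketch. It assumes exact matching on normalized names, which is only one possible approach. The `Authority` class, its `other_designations` field, and the helper names are hypothetical stand-ins for this example. The result is a list of proposals for the human reviewer to confirm or correct, never an automatic link.

```python
import unicodedata
from dataclasses import dataclass, field
from typing import List


@dataclass
class Authority:
    """Hypothetical stand-in for an existing Authority record."""
    id: int
    name: str
    other_designations: List[str] = field(default_factory=list)


def normalize_name(name):
    """Lower-case, drop combining marks (e.g. niqqud), collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_marks.lower().split())


def propose_matches(lex_person_name, authorities):
    """Return authorities whose name or any other designation matches."""
    wanted = normalize_name(lex_person_name)
    proposals = []
    for authority in authorities:
        candidates = [authority.name, *authority.other_designations]
        if any(normalize_name(c) == wanted for c in candidates):
            proposals.append(authority)
    return proposals


# Placeholder records, not real data.
authorities = [
    Authority(1, "Example Author", ["E. Author", "Author, Example"]),
    Authority(2, "Another Writer"),
]
proposals = propose_matches("  e.  author ", authorities)
```

An empty list, or a list with more than one proposal, is a signal that the reviewer has to decide manually.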

LATER PHASES

  • per content type besides author entries, determine:
    • whether we ingest it into PBY as Manifestations in a Collection
    • whether we ingest it into a new Lex* type
    • whether we leave it forever as legacy static pages (unlikely)
    • something else?