Meeting record for 15th Dec 2021 - petermr/CEVOpen GitHub Wiki

Projects regroup and planning

docanalysis + "crops": we need to split the functionality into:

  • Dictionary
  • Corpus
    • writable:
      • JATS metadata (front) - searchable
      • search results
      • extracted objects (images)
  • unsupervised
    • phrase extraction (YAKE),
    • PKE - multiple unsupervised tools
  • supervised
    • dictionary-based
    • string matching - ?library routines
      • fuzzy
      • lexemes / lemmatise
      • capitalization
      • stemming
  • workflow (commandline)
  • entity in context / W3C annotation

The key target is to get unsupervised and dictionary-based searching working for our new interns.
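The dictionary-based string matching listed above (capitalization, stemming, fuzzy matching, entity in context) can be sketched in plain Python. This is a minimal sketch, assuming a toy dictionary. `normalise` and `find_terms` are hypothetical names, not docanalysis functions. The plural-stripping rule is a crude stand-in for a real stemmer or lemmatiser, and `difflib.SequenceMatcher` stands in for whichever fuzzy-matching library is eventually chosen. Each hit is returned in the shape of a W3C Web Annotation `TextQuoteSelector` (`exact`, `prefix`, `suffix`), which carries the entity together with its context.

```python
import re
from difflib import SequenceMatcher


def normalise(word):
    """Lower-case (capitalization) and strip a plural 's' (crude stand-in for stemming)."""
    word = word.lower()
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        word = word[:-1]
    return word


def find_terms(text, dictionary, cutoff=0.85, context=20):
    """Return one W3C-style TextQuoteSelector per dictionary hit in text."""
    # Tokens are runs of ASCII letters, kept with their character offsets
    tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"[A-Za-z]+", text)]
    hits = []
    for term in dictionary:
        words = term.split()
        term_norm = " ".join(normalise(w) for w in words)
        # Slide a window with the same number of words as the term over the text
        for i in range(len(tokens) - len(words) + 1):
            window = tokens[i:i + len(words)]
            window_norm = " ".join(normalise(tok) for tok, _, _ in window)
            # Fuzzy comparison: a similarity ratio tolerates small spelling variants
            if SequenceMatcher(None, term_norm, window_norm).ratio() >= cutoff:
                start, end = window[0][1], window[-1][2]
                hits.append({
                    "term": term,
                    "selector": {
                        "type": "TextQuoteSelector",
                        "exact": text[start:end],
                        "prefix": text[max(0, start - context):start],
                        "suffix": text[end:end + context],
                    },
                })
    return hits


text = "Eucalyptus oils are rich in Terpenes; the essential oil was distilled."
for hit in find_terms(text, ["essential oil", "terpene"]):
    print(hit["term"], "->", hit["selector"]["exact"])
```

Normalising both the dictionary entry and the text window means "Terpenes" matches the entry "terpene". The `cutoff` controls how loose the fuzzy match is.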

pyami

  • PyPI / installation
  • sectioning and sentences
  • searching
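For the pyami sectioning, sentences and searching items, here is a minimal sketch. It assumes a paper has already been split into named sections held as plain strings. `split_sentences` and `search_sections` are hypothetical helpers, not pyami's API. The regex splitter is deliberately naive and will break on abbreviations followed by a capitalised word.

```python
import re

# Naive splitter: a sentence ends at ., ! or ? followed by whitespace and a capital letter
SENTENCE_END = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def split_sentences(text):
    """Split one section's text into sentences."""
    return [s.strip() for s in SENTENCE_END.split(text.strip()) if s.strip()]


def search_sections(sections, query):
    """Yield (section_name, sentence_index, sentence) for each sentence containing query."""
    q = query.lower()  # case-insensitive substring search
    for name, text in sections.items():
        for i, sentence in enumerate(split_sentences(text)):
            if q in sentence.lower():
                yield name, i, sentence


sections = {
    "abstract": "Oils were extracted. Yields varied! Was it the season? Possibly.",
    "methods": "Leaves were steam-distilled. The oil was analysed by GC-MS.",
}
for hit in search_sections(sections, "oil"):
    print(hit)
```

Returning the section name and sentence index with each hit keeps enough location information to build an entity-in-context annotation later.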

imageanalysis @Anuv and @Peter Murray-Rust

We probably need to add a stateful `AmiImage` to manage conversions. `TextBox` now displays boxes on text; we'll need user testing. `AmiArrow` has been started.
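One way a stateful `AmiImage` could manage conversions is to keep the source image and cache each derived form (grayscale, binarized). Each conversion is then computed only once, and the cache is discarded when the source changes. This is a hypothetical sketch: the class body, the method names and the ITU-R BT.601 luma weights are assumptions. Images are plain nested lists to keep it dependency-free; a real implementation would more likely wrap NumPy or scikit-image arrays.

```python
class AmiImage:
    """Hypothetical stateful wrapper: one source image plus cached conversions."""

    def __init__(self, rgb):
        self.rgb = rgb      # rows of (r, g, b) tuples, values 0-255
        self._cache = {}    # conversion key -> derived image

    def set_rgb(self, rgb):
        """Replace the source; every cached conversion is now stale."""
        self.rgb = rgb
        self._cache.clear()

    def gray(self):
        """Grayscale via ITU-R BT.601 luma weights, computed once."""
        if "gray" not in self._cache:
            self._cache["gray"] = [
                [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
                for row in self.rgb
            ]
        return self._cache["gray"]

    def binary(self, threshold=128):
        """1 for dark (ink) pixels, 0 for light ones; cached per threshold."""
        key = ("binary", threshold)
        if key not in self._cache:
            self._cache[key] = [
                [1 if v < threshold else 0 for v in row] for row in self.gray()
            ]
        return self._cache[key]


img = AmiImage([[(0, 0, 0), (255, 255, 255)]])
print(img.binary())
```

Keying the cache on the conversion parameters (here the threshold) lets several binarizations coexist without recomputing the grayscale step.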

General workflow

  • pygetpapers
  • ami-search and corpus manipulation (section, glob, delete, filter) using docanalysis + pyami -> "results", perhaps as Excel, Pandas, CSV, etc.
  • downstream tools (python, display, analysis and ML) -> Pandas
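The hand-off from corpus search to downstream tools could be as simple as a CSV of hit counts, which Excel opens directly and Pandas reads with `pandas.read_csv`. This is a minimal sketch, assuming hits arrive as hypothetical `(paper_id, section, term)` tuples. The column names and paper IDs are illustrative only.

```python
import csv
import io
from collections import Counter


def hits_to_csv(hits):
    """Aggregate (paper_id, section, term) hits into counts and return CSV text."""
    counts = Counter(hits)
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["paper_id", "section", "term", "count"])
    # Sorted output keeps the "results" file stable between runs
    for (paper, section, term), n in sorted(counts.items()):
        writer.writerow([paper, section, term, n])
    return buf.getvalue()


hits = [
    ("PMC0001", "abstract", "terpene"),
    ("PMC0001", "abstract", "terpene"),
    ("PMC0002", "methods", "eucalyptus"),
]
print(hits_to_csv(hits))
```

For example, `pandas.read_csv(io.StringIO(hits_to_csv(hits)))` gives a DataFrame with one row per (paper, section, term), ready for display, analysis or ML.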