Meeting record for 15th Dec 2021 - petermr/CEVOpen GitHub Wiki
Projects regroup and planning
docanalysis + "crops"
we need to split:
- Dictionary
- Corpus
- writable:
- JATS metadata (front) - searchable
- search results
- extracted objects (images)
- writable:
- unsupervised
- phrase extraction (YAKE)
- PKE - multiple unsupervised tools
- supervised
- dictionary-based
- string matching - ?library routines
- fuzzy
- lexemes / lemmatise
- capitalization
- stemming
- workflow (commandline)
- entity in context / W3C annotation

The key target is to get unsupervised and dictionary-based searching working for our new interns.
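
As a starting point for the interns, the dictionary-based items above (string matching, fuzzy, capitalization, entity in context) could be sketched roughly as below. This is only an illustration using the Python standard library, not docanalysis or pyami code: `find_terms`, its parameters, and the sample dictionary are hypothetical names. The `prefix` / `exact` / `suffix` keys are borrowed from the W3C Web Annotation `TextQuoteSelector`; lemmatising and stemming are left out.

```python
import difflib
import re


def find_terms(text, dictionary, context=20, fuzzy_cutoff=0.85):
    """Return dictionary hits in `text`, each with its surrounding context.

    Hypothetical helper for illustration only (not part of docanalysis/pyami).
    """
    # Capitalization: compare lower-cased forms, but report the dictionary spelling.
    lowered = {term.lower(): term for term in dictionary}
    hits = []
    for m in re.finditer(r"[A-Za-z][A-Za-z-]*", text):
        word = m.group(0)
        key = word.lower()
        if key in lowered:
            term, kind = lowered[key], "exact"
        else:
            # Fuzzy: accept the closest dictionary entry above the similarity cutoff.
            close = difflib.get_close_matches(key, list(lowered), n=1, cutoff=fuzzy_cutoff)
            if not close:
                continue
            term, kind = lowered[close[0]], "fuzzy"
        hits.append({
            "term": term,
            "match": kind,
            # Entity in context, using TextQuoteSelector-style keys.
            "prefix": text[max(0, m.start() - context):m.start()],
            "exact": word,
            "suffix": text[m.end():m.end() + context],
        })
    return hits


text = "Limonene and linalol were found in the essential oil."
hits = find_terms(text, ["limonene", "linalool"])
for hit in hits:
    print(hit["term"], hit["match"], repr(hit["exact"]))
```

The cutoff of 0.85 is an arbitrary choice; it would need tuning against a real dictionary to balance missed misspellings against false hits.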
pyami
- pypi / installation
- sectioning and sentences
- searching
imageanalysis @Anuv and @Peter Murray-Rust
- We probably need to add a stateful `AmiImage` to manage conversions.
- `TextBox` now displays boxes on text. We'll need user testing.
- `AmiArrow` has been started.
General workflow
- pygetpapers
- ami-search and corpus manipulation (section, glob, delete, filter) (docanalysis + pyami) -> "results" maybe Excel, Pandas, CSV, etc.
- downstream tools (python, display, analysis and ML) -> Pandas
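
The "results" hand-off between the search step and downstream tools could look something like the sketch below, assuming results are flat rows written as CSV. The column names, the paper IDs, and `results_to_csv` are made up for illustration; the actual output format of docanalysis/pyami was not decided at this meeting.

```python
import csv
import io


def results_to_csv(rows, fieldnames):
    """Serialise a list of result dicts to CSV text (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder search results; the IDs and counts are invented examples.
rows = [
    {"paper": "PMC0000001", "section": "methods", "term": "limonene", "count": 3},
    {"paper": "PMC0000002", "section": "results", "term": "linalool", "count": 1},
]
csv_text = results_to_csv(rows, ["paper", "section", "term", "count"])
print(csv_text)
```

Downstream, the same text can be loaded with `pandas.read_csv(io.StringIO(csv_text))` if Pandas is the preferred analysis format.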