Meeting Agenda - unicode-org/inflection GitHub Wiki
2024-12-10
From George:
Some topics for the next meeting will likely include the following:
- Merge the pull request? Who?
- Generating lexical data from Wikidata. Status?
- Getting someone other than George to compile the code by following the instructions in the README.
- What is the minimum set of OSes and compilers to support for the initial release? Should we support anything beyond macOS and Linux with the Clang compiler for now?
- Do we keep the CoreFoundation dependency in the C API? If not, the following topics will have to be discussed:
  - How should the informative error messages carried in exceptions be conveyed? Should we use UErrorCode instead and lose the diagnostic information?
  - How should a locale be specified? As a const char *?
  - How should a string be specified? As a const char16_t * with a length parameter? The code currently requires C11 at minimum.
  - How should an array or a map of objects be represented? This is a less trivial mapping, but it’s doable.
  - Who should make this change?
- Should the project be renamed? If so, the following topics will have to be discussed:
  - Who will make the change?
  - The C API prefix will likely have to change if the new name does not start with an m.
  - Many directories and the C++ namespace will have to change too.
- Who should write instructions for adding a new language? Should it be George, with someone else actually adding a language? When should that be attempted?
- How should documentation be released? How about Doxygen? What minimum version of Doxygen should be supported? Any other documentation?
- If anyone wants to make a breaking API change or a grammatical structure change to correct a poor decision, the period before the initial release is a good time to discuss it. I have already made the important modifications in this area.
- Target date for the official release.
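The C API questions above (dropping CoreFoundation, reporting errors, passing locales and strings) can be made concrete with a small sketch. The C11 code below is a hypothetical illustration, not the project's API: `infl_copyWord`, `InflErrorCode`, and its values are made-up names. It follows ICU-style conventions: the locale is a NUL-terminated `const char *`, the string is a `const char16_t *` with an explicit length (-1 meaning NUL-terminated), and an in/out error code replaces exceptions, with positive values meaning failure as with ICU's `U_FAILURE`. It also shows ICU-style preflighting, where the function returns the required length and reports an overflow if the destination is too small. The stub only copies its input where a real implementation would inflect it, and it defines its own error enum instead of `UErrorCode` so it compiles without ICU.

```c
#include <stddef.h>  /* NULL */
#include <stdint.h>  /* int32_t */
#include <uchar.h>   /* char16_t (C11) */

/* Hypothetical error codes. As with ICU's UErrorCode, zero means success
   and positive values mean failure. */
typedef enum {
    INFL_ZERO_ERROR = 0,
    INFL_ILLEGAL_ARGUMENT_ERROR = 1,
    INFL_BUFFER_OVERFLOW_ERROR = 2
} InflErrorCode;

static int infl_failure(InflErrorCode code) {
    return code > INFL_ZERO_ERROR;
}

/*
 * Hypothetical API function (illustrative name, not part of the project).
 * locale: NUL-terminated locale ID such as "en"; this stub only validates it.
 * word:   UTF-16 input; wordLength is its length in code units, or -1 if NUL-terminated.
 * dest:   output buffer of destCapacity code units (may be NULL when destCapacity is 0).
 * status: in/out error code; the function does nothing if it already indicates failure.
 * Returns the full result length so the caller can preflight.
 * A real implementation would inflect `word`; this stub only copies it.
 */
int32_t infl_copyWord(const char *locale,
                      const char16_t *word, int32_t wordLength,
                      char16_t *dest, int32_t destCapacity,
                      InflErrorCode *status) {
    if (status == NULL || infl_failure(*status)) {
        return 0;
    }
    if (locale == NULL || word == NULL || wordLength < -1 || destCapacity < 0 ||
        (dest == NULL && destCapacity > 0)) {
        *status = INFL_ILLEGAL_ARGUMENT_ERROR;
        return 0;
    }
    if (wordLength == -1) {  /* Measure a NUL-terminated input. */
        wordLength = 0;
        while (word[wordLength] != 0) {
            wordLength++;
        }
    }
    /* Copy as much of the result as fits. */
    int32_t toCopy = wordLength < destCapacity ? wordLength : destCapacity;
    for (int32_t i = 0; i < toCopy; i++) {
        dest[i] = word[i];
    }
    if (wordLength < destCapacity) {
        dest[wordLength] = 0;  /* NUL-terminate when there is room. */
    } else if (wordLength > destCapacity) {
        *status = INFL_BUFFER_OVERFLOW_ERROR;  /* Caller should retry with a larger buffer. */
    }
    /* If the result exactly fills dest, it is left unterminated; ICU reports a
       warning in that case, which this sketch omits. */
    return wordLength;
}
```

A caller would typically preflight with `dest = NULL` and `destCapacity = 0`, read the returned length, allocate a buffer one unit longer for the terminating NUL, reset the status to `INFL_ZERO_ERROR`, and call again. This convention gives up the free-form diagnostic text an exception can carry, which is the cost raised in the error-message question.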
2024-11-26
CANCELED due to Thanksgiving week in the USA.
2024-11-12
- Go over the Apple PR
- Discuss license
2024-10-29
- Status of Apple code contribution - one more approval
- The presentation went well
2024-10-15
- Status of Apple code contribution
- Finalizing UTW presentation
- Gaming localization use case
2024-10-01
- Status of Apple code contribution
- Please add your name, position, etc. to the contributor slide (slide 16)
2024-09-17
- Discuss state of Apple code contribution
- Discuss UTW participation (talk was accepted for a 40-minute session)
2024-09-02
Canceled due to Labor Day in the USA; no agenda
2024-08-20
- Status of Apple code contribution
- UTW presentation (discuss abstract, what we want to cover)
2024-08-06
Short agenda after the break
- Status of Apple code contribution
- UTW presentation (does anybody want to co-present?)
Nebojša submitted a short abstract to the Unicode organizers:
"Noun inflection is an unsolved problem in message formatting/UI and affects 1.7B users from Slavic, Arabic, Hebrew, Indic and other languages. Most companies deploy UI workarounds that don't sound native or lose personalization available to English users.
I would like to evangelize the new Unicode WG effort and attract contributors, both engineers and linguists to help us scale to as many languages as possible."
2024-07-23 CANCELED
Many OOOs, no agenda, see email
2024-07-09 CANCELED
Many OOOs, no agenda, see email
2024-06-25
Covering the Apple contribution (from George's email). These are the main parts of the wrapper:
- https://developer.apple.com/documentation/foundation/inflectionrule
- https://developer.apple.com/documentation/foundation/morphology
- https://developer.apple.com/documentation/foundation/termofaddress
- https://developer.apple.com/documentation/foundation/inflectionconcept
- https://developer.apple.com/documentation/foundation/morphology/pronoun
Here are previous presentations that involve this wrapper code.
- UTW Automatic Grammar Agreement in Message Formatting
- WWDC23: Unlock the power of grammatical agreement | Apple
If Kyle joins we can discuss:
- Dictionary & Rules & ML approach
- Check if there's a way to attract NLP students to help scale
2024-06-11 (CANCELED - too many OOOs)
Same agenda as 2024-06-25 (carried over).
2024-05-28
- George sent an email about Apple inflection code open-sourcing
- Further discussion about FST & ML (LSTMs)
  - There is an open-source library for FST training
  - LSTM approach with >90% accuracy, code & video/paper
- Potential contributors from academia (no solid news here)
2024-05-14
- Getting month data from Wikidata (thanks, Denny)
  - Lexemes: https://w.wiki/A4ya
  - Forms: https://w.wiki/A4yg
  - Labels: https://w.wiki/A4yq
- Serbian rules PR to showcase more complex rules
- Rule generation using examples
- Multiple results from the API: some words can inflect in many ways depending on context (this can be handled with weighted FSTs), but higher-level logic needs to decide which one to use
2024-04-30
- Go over PRs
- Some projects/questions:
  - Expand the lexicon (entries like form1: attr1, attr2; form2: attr1, attr3;...)
  - Investigate pulling data from Wikidata (with a script)?
  - Use an FST model to work with dates in English (CLDR lexicon/dates)
  - Add a more complex example using Pynini (Serbian/Russian?)
- An interesting quote from the FST book
"In our opinion, finite-state methods still play a central role in speech and language technologies and are not going away any time soon. At Google, the OpenFst and OpenGrm libraries remain absolutely essential for latency-sensitive applications like voice search, automated captions in YouTube, and the Google Assistant. Many Google engineers and linguists working on speech and language processing specialize in WFST algorithms or grammar development.
While we cannot speak to practices elsewhere in the tech industry, Pusateri et al. (2017) reports that Apple’s Siri assistant uses finite-state grammars (hybridized with a neural network) for inverse normalization, i.e., to convert ASR transcripts to a human-readable form. The powerful Kaldi speech recognition toolkit, widely used by academic researchers, uses a WFST decoder, implemented with OpenFst.
Other technologies, including modern neural networks, have begun to encroach on the state of the art for speech technologies, and may ultimately render WFSTs obsolete, but such technologies still struggle to compete on latency, particularly for embedded platforms (e.g., mobile devices) lacking the specialized hardware needed to support large neural networks."
2024-04-16
- Go over PRs
- Go over next steps, e.g., how to do inflection.
2024-04-02
- Introductions
- Go over the discussion
2024-03-19
- Denny presents Wikidata
- Review “Issues” and prioritize them
2024-03-07
- Introduce members
- Discuss operations, e.g. meeting cadence/duration
- Discuss goals and non-goals
- Go over issues
- Discuss repository structure