Meeting Agenda - unicode-org/inflection GitHub Wiki

2025-08-19

Released v0.1
- Waiting for tutorial for Unicode blog post
GSoC updates
- Binary releases for macOS/Linux are done; Windows build research is in progress
- Malayalam pull request passes its tests. A few more changes are needed.
- Presentations to be done (email sent to mentors and mentees)
Unicode approved our talk. To do by Wednesday, August 27:
- Respond and confirm acceptance of your slot
- Complete your Speaker & Appearance Release Form.
- Send a photo to include on the event website to [email protected].
Any PRs to unblock?
Any issues to discuss?

2025-08-05

Handle CI state before releasing 0.1
- Most of the data/code is in
- Memory leak in Linux CI is being triggered
GSoC updates/unblocking
Any PRs to unblock?
Any issues to discuss?

2025-07-22

Permanent meeting time 9am?
Any PRs to unblock?
GSoC status
Progress on 0.1 update
UTW topic discussion, George has proposed the outline

2025-07-08

GSoC status (and survey on/after July 14th)
Progress on 0.1 update
Arabic/Hebrew update after new Wikidata drop
Doxygen update
UTW topic discussion

2025-06-24

Meeting cancelled

2025-06-10

GSoC status
0.1 status (was legal gating last time)
- Temporarily remove failing Arabic and Hebrew tests to complete transition of all lexical dictionaries to Wikidata? Good progress, but more work is needed.
- Legal status of contribution to transition rest of lexical dictionaries to Wikidata.

2025-05-27

GSoC status
UTW presentation abstract is submitted
LFS
Tagging with 0.1 (refresh memory)

2025-05-13

git LFS status (ICU discussion, what can we do on our end)
GSoC status
0.1 release readiness
Message Format scope inclusion, timing, and location
Wikidata integration progress update
Discuss feasibility of https://github.com/unicode-org/inflection/issues/112

2025-04-29

Waiting for LFS budget to reset for CIs
Discuss code/tests with George
GSoC status

2025-04-15

CI is disabled on pull requests due to LFS issues. Both CLDR and Inflection filed tickets with GitHub.
Update on decompounding contribution (approved by nciric)
Unicode Technology Workshop 2025 presentation, call for presenters is open
MF2.0 discussion (if Addison is present)

2025-04-01

Status update for Inflection-37 Support word decompounding for inflecting words
- Large contribution incoming at some point (larger than initial commit)
Status update for switching languages to Wikidata
- Help?
- Progress made so far
- Some more templating/importing tools for Wikidata
- Potential Wikidata donation of 4 Indic languages (and question of how to add them to WD)
Status update for Google Summer of Code project proposals

2025-03-18

Update from Wikidata meeting on Monday
Helping GSoC and other volunteers
- macOS CI added by future GSoC, some improvements to Ubuntu
Denny may be around for this meeting:
- Help with Wikidata submissions for German
- What is the minimum set of Senses for a new lexeme? (description? or also a link?)
Prioritize new languages, or steer volunteers to contribute to Wikidata for languages of interest to them?
Go over issues marked as "discuss"

2025-03-04

Problems with Wikidata upload, got a warning about incomplete items
- Should update the tool to include some senses, like description
GSoC contribution for packaging/release
Question about availability of this library on iOS (if info is shareable)
Talk about language progress.
- Danish and English are done.
- Spanish is close to done.
- Norwegian is not wrong.
- Italian is 1 word away from being done.
- Portuguese, Swedish, Dutch and Russian have improved, but need more work.
- French and Turkish may need a lot of customization beyond the scope of Wikidata. How much should we allow?
- German may be blocked by some conflicting opinions on how Wikidata lexemes should be structured regarding occupations and physical gender.
- Arabic, Hebrew, and Hindi need a lot of effort.
Go over issues marked as "discuss"

2025-02-18

Created Wikidata upload tool (based on Denny's script)
CI optimized to use a multicore build (2x speedup); thanks to George for the pointer
Supporting data upload to Wikidata - we need to make it simple
- How do we review the results after the bulk upload?
How to handle adjectives (align case, number and gender so one can have 3 genders)
- Keep them together as forms? Split them up by gender into 3 lexemes?
- [Bruno] better to avoid. We should follow the Mask definition (see explanation here)
Go over issues marked as "discuss"
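
The adjective question above (keep all gendered forms in one lexeme, or split into three lexemes by gender) can be sketched roughly as follows. This is only an illustration: the dictionary layout and the helper `split_by_gender` are hypothetical and do not reflect the actual Wikidata lexeme schema or the project's data format. The German forms of "klein" are the nominative singular strong-declension forms.

```python
from collections import defaultdict

# Option A: one lexeme, every form tagged with case, number and gender.
# Hypothetical layout, not the Wikidata schema.
single_lexeme = {
    "lemma": "klein",
    "forms": [
        {"form": "kleiner", "case": "nominative", "number": "singular", "gender": "masculine"},
        {"form": "kleine", "case": "nominative", "number": "singular", "gender": "feminine"},
        {"form": "kleines", "case": "nominative", "number": "singular", "gender": "neuter"},
    ],
}


def split_by_gender(lexeme):
    """Option B: derive one lexeme per gender from the combined lexeme."""
    grouped = defaultdict(list)
    for form in lexeme["forms"]:
        # Gender moves from the form up to the lexeme level.
        rest = {key: value for key, value in form.items() if key != "gender"}
        grouped[form["gender"]].append(rest)
    return [
        {"lemma": lexeme["lemma"], "gender": gender, "forms": forms}
        for gender, forms in grouped.items()
    ]


per_gender = split_by_gender(single_lexeme)
```

With option A, a lookup filters forms by all three attributes at once; with option B, the caller first has to pick the right lexeme, which is one reason the split may be better avoided.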

2025-02-04

Some lexicons are large, 30+MB. Should we use GitHub's LFS?
Basic CI is set up, works for Linux build and relies on system ICU (74)
Supporting data upload to Wikidata - we need to make it simple
- How do we review the results after the bulk upload?
Started discussion about use of AI for lexicon generation.
Created 3 milestones, 0.1, 1.0, Post 1.0. Use them with issues.
Granted write access to BrunoCartoni (for a language addition/triaging).
- Let me know if anybody else needs access, and provide your GitHub username.
Go over issues marked as "discuss"

2025-01-21

Dependency on CoreFoundation was removed which also fixed the Ubuntu problems.
Comment on Wikidata Java PR
Update on CI
Steps for first official 0.1 release (should start an issue about that)
Go over issues marked as "discuss"

2025-01-07

Happy New Year!

Go over Ubuntu build
- it may require changes to README on how to set up ICU
Go over open pull request(s)
- How-to needs approval
- Adding sr-RS can now be worked on as the build works/tests run. Need help with dictionary/lexicon to proceed.
Progress on topics from last meeting
- Generating lexical data from Wikidata - "Denny to generate data from Wikidata"
- CoreFoundation dependency removal (timeline?)
  - Remove the Core Foundation dependency in Morphuntion
- Renaming decision waits till Mark is back (he said he will attend)
- "RG: Doxygen may have issues with licensing, we’ll need to check"

2024-12-10

From George:

Some topics for the next meeting will likely include the following:

Merge the pull request? Who?
Generating lexical data from Wikidata. Status?
Getting someone else to compile the code besides George by following the instructions in the readme.
What is the minimum set of OSes and compilers to be supported for the initial release? Support anything beyond macOS and Linux with the clang compiler for now?
Do we keep the CoreFoundation dependency in the C API? If not, the following topics will have to be discussed.
How should informative error message in exceptions be conveyed? Use UErrorCode instead and lose the diagnostic information?
How should a locale be specified? const char *?
How should a string be specified? const char16_t * with length parameter? The code currently requires C11 as a minimum.
How should an array or a map of objects be represented? This is a less trivial mapping, but it’s doable.
Who should make this change?
Should the project name be renamed? If so, the following topics will have to be discussed.
Who is making the change?
The C API will likely have to change the prefix, if it’s not starting with an m.
Lots of directories and C++ namespace will have to change too.
Who should write instructions for adding a new language? Should it be George with someone else actually adding a language? When should that be attempted?
How should documentation be released? How about doxygen? What minimum version of doxygen should be supported? Any other documentation?
If someone wants to make a breaking API change or grammatical structure change due to a poor decision, before the initial release would be a good time to discuss such topics. I already made the important modifications for this area.
Target date for official release.

2024-11-26

CANCELED due to Thanksgiving week in the USA.

2024-11-12

Go over the Apple PR
Discuss license

2024-10-29

Status of Apple code contribution - one more approval
The presentation went well

2024-10-15

Status of Apple code contribution
Finalizing UTW presentation
Gaming localization use case

2024-10-01

Status of Apple code contribution
Please add your name, position, etc. to the contributor slide (slide 16)

2024-09-17

Discuss state of Apple code contribution
Discuss UTW participation (talk was accepted for a 40-minute session)

2024-09-02

Canceled due to Labor Day in USA, no agenda

2024-08-20

Status of Apple code contribution
UTW presentation (discuss abstract, what we want to cover)

2024-08-06

Short agenda after the break

Status of Apple code contribution
UTW presentation (anybody wants to co-present?)

Nebojša submitted a short abstract to the Unicode organizers:

"Noun inflection is an unsolved problem in message formatting/UI and affects 1.7B users from Slavic, Arabic, Hebrew, Indic and other languages. Most companies deploy UI workarounds that don't sound native or lose personalization available to English users.

I would like to evangelize the new Unicode WG effort and attract contributors, both engineers and linguists to help us scale to as many languages as possible."

2024-07-23 CANCELED

Many OOOs, no agenda, see email

2024-07-09 CANCELED

Many OOOs, no agenda, see email

2024-06-25

Covering Apple contribution (from George's email). These are the main parts of the wrapper.

Here are previous presentations that involve this wrapper code.

If Kyle joins we can discuss:

Dictionary & Rules & ML approach
Check if there's a way to attract NLP students to help scale

2024-06-11 (CANCELED - too many OOOs)

Covering Apple contribution (from George's email). These are the main parts of the wrapper.

Here are previous presentations that involve this wrapper code.

If Kyle joins we can discuss:

Dictionary & Rules & ML approach
Check if there's a way to attract NLP students to help scale

2024-05-28

George sent an email about Apple inflection code open-sourcing
Further discussion about FST & ML (LSTMs)
- There is an open-source library for FST training
- LSTM approach with >90% accuracy, code & video/paper
Potential contributors from academia (no solid news here)

2024-05-14

Getting month data from Wikidata (thanks Denny)
- Lexemes https://w.wiki/A4ya
- Forms https://w.wiki/A4yg
- Labels https://w.wiki/A4yq
Serbian rules PR to showcase more complex rules
Rule generation using examples
Multiple results from API - some words can inflect in many ways depending on context (can be done with FSTs with weights), but higher level logic needs to decide which one to use
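
A minimal sketch of the "multiple results" item above, assuming each candidate form carries a set of grammatical features and a weight where lower means more likely (as a weighted FST path cost would). The names `Candidate` and `pick_form`, and the weights themselves, are hypothetical and not part of the project's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One possible inflected form, as a weighted FST path might yield it."""
    form: str
    features: frozenset  # e.g. frozenset({"plural"})
    weight: float        # lower weight = more likely


def pick_form(candidates, required):
    """Higher-level logic: keep candidates that fit the context, then take the cheapest."""
    required = frozenset(required)
    matching = [c for c in candidates if required <= c.features]
    if not matching:
        return None  # the caller decides on a fallback
    return min(matching, key=lambda c: c.weight).form


# "person" has two plurals; the weights are made up for illustration.
candidates = [
    Candidate("person", frozenset({"singular"}), 0.0),
    Candidate("people", frozenset({"plural"}), 0.2),
    Candidate("persons", frozenset({"plural"}), 1.5),
]

chosen = pick_form(candidates, {"plural"})  # "people"
```

Keeping the weights in the API result and the choice in a separate function leaves room for context-aware logic (register, domain) to override the default ranking.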

2024-04-30

Go over PRs
Some projects/questions:
- Expand the lexicon - form1: attr1, attr2; form2: attr1, attr3;...
- Investigate pulling Wikidata (script)?
- Use FST model to work with dates in English (CLDR lexicon/dates)
- Add a more complex example using Pynini (Serbian/Russian?)
- An interesting quote from the FST book

"In our opinion, finite-state methods still play a central role in speech and language technologies and are not going away any time soon. At Google, the OpenFst and OpenGrm libraries remain absolutely essential for latency-sensitive applications like voice search, automated captions in YouTube, and the Google Assistant. Many Google engineers and linguists working on speech and language processing specialize in WFST algorithms or grammar development.

While we cannot speak to practices elsewhere in the tech industry, Pusateri et al. (2017) reports that Apple’s Siri assistant uses finite-state grammars, hybridized with a neural network for inverse normalization, i.e., to convert ASR transcripts to a human-readable form. The powerful Kaldi speech recognition toolkit, widely used by academic researchers, uses a WFST decoder, implemented with OpenFst.

Other technologies, including modern neural networks, have begun to encroach on the state of the art for speech technologies, and may ultimately render WFSTs obsolete, but such technologies still struggle to compete on latency, particularly for embedded platforms (e.g., mobile devices) lacking the specialized hardware needed to support large neural networks."

2024-04-16

Go over PRs
Go over next steps, e.g., how to do inflection.

2024-04-02

Introductions
Go over the discussion

2024-03-19

Denny presents Wikidata
Review “Issues” and prioritize them

2024-03-07

Introduce members
Discuss operations, e.g., meeting cadence/duration
Discuss goals and non-goals
Go over issues
Discuss repository structure