# ZZZ Other JA Models and Resources

In general, the best-maintained resource tracking the latest JA-related LLM stuff is: https://github.com/llm-jp/awesome-japanese-llm/

I'm also now curating a JP AI Twitter List: https://twitter.com/i/lists/1738064886427734518

See also latest HF JA text-generation Models: https://huggingface.co/models?pipeline_tag=text-generation&language=ja&sort=modified
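If you want to pull that same list programmatically, here's a minimal sketch using `huggingface_hub`'s `list_models` (the parameter and attribute names are from recent versions of the library and may differ in older ones):

```python
# Minimal sketch: list recently updated Japanese text-generation models
# on the Hugging Face Hub. Assumes a recent huggingface_hub release;
# parameter/attribute names may differ in older versions.
from huggingface_hub import HfApi

api = HfApi()
models = api.list_models(
    language="ja",
    pipeline_tag="text-generation",
    sort="last_modified",
    direction=-1,  # newest first
    limit=20,
)
for m in models:
    print(m.id, m.last_modified)
```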

We'll just use this page to track some of the more interesting Japanese releases and write-ups we come across:

## 2024-01-15 Karasu/Qarasu

Released at the end of 2023, but there's a new announcement here:

This is the first new model to use some of our dataset:

## 2023-12-21 Nekomata

Rinna has released new Qwen-based models (the ~150K-vocab Qwen tokenizer, +66B tokens of continued pre-training). Based on how strong Qwen-14B Chat was, I'm interested to see how this tune compares:

## 2023-12-20 ELYZA-tasks-100 Shootout

A very detailed write-up testing a lot of JA (and a few non-JA) LLMs using GPT-4 judging (with some analysis of the judging itself as well). For instruction following, I assume a decent, larger instruct-tuned model is pretty much necessary to answer many of these questions properly.
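As a rough sketch of what a GPT-4 judging loop like that looks like (my own minimal reconstruction, not the write-up's actual harness; the `elyza/ELYZA-tasks-100` field names, split name, and the 1-5 scoring prompt are assumptions):

```python
# Minimal sketch of GPT-4-as-judge scoring on ELYZA-tasks-100.
# This is a reconstruction, not the write-up's actual harness; the
# dataset split/field names and the 1-5 scoring prompt are assumptions.
from datasets import load_dataset
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
tasks = load_dataset("elyza/ELYZA-tasks-100", split="test")

def judge(task, model_answer):
    prompt = (
        "Score the answer from 1-5 against the reference and criteria.\n"
        f"Question: {task['input']}\n"
        f"Reference answer: {task['output']}\n"
        f"Grading criteria: {task['eval_aspect']}\n"
        f"Answer to grade: {model_answer}\n"
        "Reply with the score only."
    )
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return int(resp.choices[0].message.content.strip())
```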

## 2023-12-19 Swallow

## 2023-12-18 Kotoba-Recipes

Kazuki Fujii (who worked on training Swallow) published their PEFT, FSDP, and PEFT+FSDP recipes. (Aside: FSDP may still degrade Mistral, so it's better to use DeepSpeed, or Unsloth for small tunes.)
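For reference, the PEFT half of a recipe like this mostly boils down to a standard `peft` LoRA setup; a generic sketch (the base model and the rank/alpha/target-module values are illustrative placeholders, not Kotoba-Recipes' actual settings):

```python
# Generic LoRA fine-tuning setup with peft; hyperparameters and the
# base model are illustrative placeholders, not Kotoba-Recipes' settings.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", torch_dtype=torch.bfloat16
)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # sanity check: only adapters train
```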

## 2023-12-14 DeepSpeed Writeup

So, this is more useful for native Japanese speakers, but it's a fun write-up with code and animated diagrams explaining some DeepSpeed internals: https://zenn.dev/turing_motors/articles/d00c46a79dc976
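For anyone who hasn't used DeepSpeed, the heart of it is a JSON config; a minimal illustrative ZeRO stage-2 config (the values are placeholders) that you'd pass to `transformers.Trainer` via `TrainingArguments(deepspeed=...)` or to `deepspeed.initialize()`:

```python
# Minimal illustrative DeepSpeed ZeRO stage-2 config; values are
# placeholders. Pass to transformers via TrainingArguments(deepspeed=...)
# or directly to deepspeed.initialize().
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 8,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,                    # shard optimizer state + gradients
        "overlap_comm": True,          # overlap comms with backward pass
        "contiguous_gradients": True,  # reduce memory fragmentation
    },
}
```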

## 2023-11-01 NTT LLM - tsuzumi 7B

NTT's commercial LLM, targeted for a March 2024 release.

## 2023-10-21 ALMA Ja

An EN/JA translation model based on ALMA-7B.
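Inference follows ALMA's simple translation prompt style; a minimal sketch (the `webbigdata/ALMA-7B-Ja` checkpoint name and the exact prompt template are my assumptions, so check the model card):

```python
# Minimal EN->JA translation sketch in ALMA's prompt style. The
# checkpoint name and exact template are assumptions; check the
# model card for the canonical format.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "webbigdata/ALMA-7B-Ja"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = (
    "Translate this from English to Japanese:\n"
    "English: The weather is nice today.\n"
    "Japanese:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```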

## 2023-08-29 ELYZA-japanese-Llama-2-7b

## Sakura

A series of models being fine-tuned by a Chinese community on top of Chinese base models? Need to look into it more.

## Japanese Stability