ZZZ Other JA Models and Resources
In general, the best-maintained resource tracking the latest JA-related LLM stuff is: https://github.com/llm-jp/awesome-japanese-llm/
I'm also now curating a JP AI Twitter List: https://twitter.com/i/lists/1738064886427734518
See also the latest HF JA text-generation models: https://huggingface.co/models?pipeline_tag=text-generation&language=ja&sort=modified
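The same listing can also be pulled programmatically with `huggingface_hub`; a minimal sketch (both `text-generation` and `ja` are Hub tags, so they can go straight into the `filter` argument; sorted by downloads here rather than last-modified for simplicity):

```python
# Minimal sketch: list JA text-generation models on the HF Hub, roughly mirroring the
# web filter linked above. "text-generation" and "ja" are both Hub tags.
from huggingface_hub import HfApi

api = HfApi()
for model in api.list_models(filter=["text-generation", "ja"], sort="downloads", direction=-1, limit=20):
    print(model.id)
```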
We'll just use this to track some of the more interesting Japanese releases and write-ups we come across:
2024-01-15 Karasu/Qarasu
Released at the end of 2023, but there's a new announcement here:
- https://www.lightblue-tech.com/2024/01/15/20240115_news/
- https://note.com/peter_lightblue/n/ne08a7c8cc47a
- https://huggingface.co/lightblue
First new model to use one of our datasets (a quick loading sketch follows this list):
- https://huggingface.co/models?dataset=dataset:augmxnt/ultra-orca-boros-en-ja-v1
- Karasu 7B is based off of shisa-7b-v1
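For reference, the dataset itself can be pulled down with the `datasets` library; a minimal sketch (the `train` split name is an assumption, check the dataset card for the actual layout):

```python
# Minimal sketch: load the augmxnt ultra-orca-boros EN/JA dataset and peek at the schema.
# The split name ("train") is an assumption -- see the dataset card for the actual layout.
from datasets import load_dataset

ds = load_dataset("augmxnt/ultra-orca-boros-en-ja-v1", split="train")
print(ds)     # row count and column names
print(ds[0])  # one example
```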
2023-12-21 Nekomata
Rinna has released new Qwen-based models (150K Qwen tokenizer, +66B tokens of continued pre-training). Based on how strong Qwen-14B Chat was, I'm interested to see how this tune compares (a minimal loading sketch follows the links below):
- Tweet: https://twitter.com/rinna_research/status/1737648832345989428
- Announcement: https://rinna.co.jp/news/2023/12/20231221.html
- Collection: https://huggingface.co/collections/rinna/nekomata-6582b5134ee85531becbb9a9
- Benchmarks: https://rinnakk.github.io/research/benchmarks/lm/index.html
- Instruct model uses a mix of stuff (including mistranslations): https://huggingface.co/rinna/nekomata-14b-instruction
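For quick testing, a hedged loading sketch: since the base is Qwen (custom modeling code), `trust_remote_code=True` is required; the bare prompt below is just illustrative and not rinna's documented instruction template.

```python
# Hedged sketch: load rinna/nekomata-14b-instruction with transformers and generate.
# The plain prompt is illustrative only -- check the model card for the real template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rinna/nekomata-14b-instruction"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

inputs = tokenizer("日本の首都はどこですか？", return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0]))
```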
2023-12-20 ELYZA-tasks-100 Shootoff
A very detailed writeup testing a lot of JA (and a few non-JA) LLMs using GPT-4 judging (with some analysis of that approach as well). For instruction following, I assume a decent, large-ish instruct model is pretty much necessary to answer many of these questions properly (a minimal judging sketch follows the links below).
- https://qiita.com/wayama_ryousuke/items/105a164e5c80c150caf1
- Hey, in the addendum there's shisa-7b-v1: https://qiita.com/wayama_ryousuke/items/105a164e5c80c150caf1#appendix-3-%E3%81%95%E3%82%89%E3%81%AB%E4%BB%96%E3%81%AE%E3%83%A2%E3%83%87%E3%83%AB%E3%82%82%E8%A9%95%E4%BE%A1%E3%81%97%E3%81%A6%E3%81%BF%E3%81%9F
- Code: https://github.com/Northern-System-Service/gpt4-autoeval
- Spreadsheet: https://docs.google.com/spreadsheets/d/1nOWtneRdrkxwQbAN0rWmXqiJXR9IXK9lVkyDjQTqNGc/edit#gid=1023787356
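The general GPT-4-judging loop is simple enough to sketch. This is not the linked repo's actual prompts or rubric, just the shape of it; the `elyza/ELYZA-tasks-100` split and field names (`input`, `output`, `eval_aspect`) are assumptions from its dataset card, and it expects `OPENAI_API_KEY` in the environment.

```python
# Hedged sketch of GPT-4-as-judge over ELYZA-tasks-100 -- not the gpt4-autoeval repo's
# actual prompts/rubric. Dataset split and field names are assumptions from its card.
from datasets import load_dataset
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
tasks = load_dataset("elyza/ELYZA-tasks-100", split="test")

def judge(question: str, reference: str, rubric: str, answer: str) -> str:
    """Ask GPT-4 to grade one model answer on a 1-5 scale; returns the raw judgment."""
    prompt = (
        "You are grading a Japanese LLM answer on a 1-5 scale.\n\n"
        f"Question:\n{question}\n\nReference answer:\n{reference}\n\n"
        f"Grading notes:\n{rubric}\n\nModel answer:\n{answer}\n\n"
        "Reply with a single integer from 1 to 5."
    )
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content

example = tasks[0]
print(judge(example["input"], example["output"], example["eval_aspect"], "（モデルの回答をここに）"))
```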
2023-12-19 Swallow
- Site: https://tokyotech-llm.github.io/
- Announcement: https://tokyotech-llm.github.io/swallow-llama
- Technical Writeup: https://zenn.dev/tokyotech_lm/articles/d6cb3a8fdfc907
- Tokenizer extended to a 43,176-token vocab (I believe padding to a multiple of 64 would be better for perf; see the sketch after this list)
- 100B-token continued pretrain
- Instruct models use the same mistranslated datasets; I've run JA MT-Bench results that show the expected performance
- I've created the appropriate chat_template for their instruct format
- Some of my notes/testing in https://discord.com/channels/1147858054231105577/1147862078695149608/1187047159468675073
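On the multiple-of-64 point: transformers can pad the embedding matrix past the raw vocab size so the matmul dimensions stay Tensor-Core friendly. A minimal sketch (the `tokyotech-llm/Swallow-7b-hf` repo id is an assumption, check their HF org for the actual names):

```python
# Minimal sketch of padding the embedding matrix to a multiple of 64 for perf.
# The Swallow repo id here is an assumption -- check their HF org for actual names.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyotech-llm/Swallow-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

print(len(tokenizer))  # the extended ~43K vocab
model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=64)
print(model.get_input_embeddings().weight.shape[0])  # rounded up to a multiple of 64
```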
2023-12-18 Kotoba Recipes
Kazuki Fujii (who worked on training Swallow) published PEFT, FSDP, and PEFT+FSDP recipes (a minimal PEFT sketch follows the links below). (Aside: FSDP may still degrade Mistral, so it's better to use DeepSpeed, or Unsloth for small tunes.)
- https://medium.com/@kaz.tokyo.tech20/kotoba-recipes-library-5-minutes-to-start-llama-2-continual-learning-5f95c244a566
- https://github.com/kotoba-tech/kotoba-recipes
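Not their code, but for context, the PEFT side of a recipe boils down to something like this; the LoRA hyperparameters and Mistral target module names here are just typical defaults, not kotoba-recipes' settings.

```python
# Hedged LoRA/PEFT sketch -- not the kotoba-recipes implementation, just the usual shape.
# target_modules are the standard Llama/Mistral attention projection names.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable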
2023-12-14 DeepSpeed Writeup
Most useful for Japanese native speakers, but it has fun code and animated diagrams explaining some DeepSpeed concepts: https://zenn.dev/turing_motors/articles/d00c46a79dc976
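Unrelated to the article's own code, but as a reminder of what this looks like in practice, a hedged minimal ZeRO-3 config wired into the HF Trainer (whose `deepspeed` argument accepts a dict or a JSON file path):

```python
# Hedged sketch: a minimal DeepSpeed ZeRO-3 config passed to HF TrainingArguments.
# "auto" values let the HF integration fill in batch sizes from the Trainer settings.
from transformers import TrainingArguments

ds_config = {
    "zero_optimization": {
        "stage": 3,  # partition optimizer state, gradients, and parameters
        "overlap_comm": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

args = TrainingArguments(output_dir="out", bf16=True, deepspeed=ds_config)
```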
2023-11-01 NTT LLM - tsuzumi 7B
NTT's commercial LLM targeted for March 2024 release
2023-10-21 ALMA Ja
An EN/JA translation model based off of ALMA-7B
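A hedged prompting sketch: the `webbigdata/ALMA-7B-Ja` repo id and the exact template wording are assumptions from memory (check the actual model card), but ALMA models use a plain "Translate this from X to Y" template rather than a chat format.

```python
# Hedged sketch of ALMA-style EN->JA translation prompting. Model id and the exact
# template wording are assumptions -- check the actual model card.
from transformers import pipeline

translator = pipeline("text-generation", model="webbigdata/ALMA-7B-Ja", device_map="auto")

prompt = (
    "Translate this from English to Japanese:\n"
    "English: The weather in Tokyo is lovely today.\n"
    "Japanese:"
)
print(translator(prompt, max_new_tokens=128, do_sample=False)[0]["generated_text"])
```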
2023-08-29 ELYZA-japanese-Llama-2-7b
- Announcement: https://note.com/elyza/n/na405acaca130
- Technical Writeup: https://zenn.dev/elyza/articles/2fd451c944649d
- Evals (ELYZA-100 discussion): https://zenn.dev/elyza/articles/5e7d9373c32a98
Sakura
A series of models being fine-tuned by a Chinese community on Chinese base models? Need to look into it more
Japanese Stability
- Nov 2023: Beta 7B/70B (Llama 2) with an additional 100B-token pretrain on slightly filtered data (SlimPajama for EN, but unfiltered for JA), using the default tokenizer. The 70B is probably the strongest explicitly JA-focused open model, but it's kneecapped by a bad fine-tune (in native-speaker testing, Xwin-LM-70B-V0.1 generated significantly better Japanese chat responses!)
- Japanese announcement
- Japanese MT-Bench
- Not officially announced, but they also did a Mistral 7B pretrain called "Gamma"
- The instruct tune again used poorly translated datasets
- Gamma instruct: dolly, anthropic
- Beta instruct: Anthropic HH-RLHF, Databricks Dolly 15k, OpenAssistant Conversations Dataset
- August 2023: Stability AI JP released their first "Alpha" models (English announcement): Apache 2.0, 7B parameters, 750B-token pretrain (unfiltered datasets), GPT-NeoX architecture, 65K NovelAI/nerdstash-tokenizer-v1 - fluency is limited (a tokenizer comparison sketch follows this list)
- The fine-tune used an Alpaca translation, and the largely incorrect dolly, anthropic, and wikinews datasets from here
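To make the tokenizer point concrete, a hedged sketch comparing how many tokens the default Llama-2 tokenizer (Beta) and the 65K nerdstash tokenizer (Alpha) spend on the same Japanese sentence; the repo ids are assumptions, and the nerdstash tokenizer is loaded the way the Alpha model card does.

```python
# Hedged sketch: compare JA token counts between the default Llama-2 tokenizer (used by
# the Beta models) and the 65K NovelAI nerdstash tokenizer (used by the Alpha models).
# Repo ids are assumptions -- check the actual model/tokenizer cards.
from transformers import AutoTokenizer, LlamaTokenizer

text = "日本語のトークナイザー効率は推論コストと実効コンテキスト長に直結します。"

beta_tok = AutoTokenizer.from_pretrained("stabilityai/japanese-stablelm-base-beta-7b")
nerdstash_tok = LlamaTokenizer.from_pretrained("novelai/nerdstash-tokenizer-v1")

for name, tok in [("llama-2 default", beta_tok), ("nerdstash 65k", nerdstash_tok)]:
    ids = tok(text, add_special_tokens=False)["input_ids"]
    print(f"{name}: {len(ids)} tokens for {len(text)} characters")
```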