LLM fine tuning - HiroSung/Study GitHub Wiki

LLM fine-tuning (2024.09.02~03) / Instructor: 김형욱

LLM finetuning-v4.pdf colab-20240902T061043Z-001.zip

1. Revisit LLMs

  • ๋Œ€๋ถ€๋ถ„์˜ llm์€ Transformer ๊ธฐ๋ฐ˜ ์•„ํ‚คํ…์ฒ˜ (2017๋…„ ๋ถ€ํ„ฐ~. ๊ตฌ๊ธ€ ๊ฐœ๋ฐœ. ๋ฒˆ์—ญ๋ชจ๋ธ ๊ฐœ๋ฐœํ•˜๊ธฐ ์œ„ํ•ด์„œ. RNL๊ณ„์—ด . sequence to sequence)
  • GPT ๊ณ„์—ด(๋ฌธ์žฅ ์ƒ์„ฑ), BERT ๊ณ„์—ด (๋ฌธ์žฅ ๋งฅ๋ฝ ์ดํ•ด. ๋ถ„๋ฅ˜. ์ž…๋ ฅ๋œ ๋‹จ์–ด์˜ embedding ๋ฐฉ์‹)

2. BERT

2.1 Tokenization & Embedding

  • Tokenization: split the text into meaningful token units and map each token to a unique index number, e.g. [CLS] this is a input . [SEP] with special tokens added; the vocabulary is often 50,000+ entries ==> integer (index) encoding.
  • Embedding: the input layer that turns each token index into a dense vector.
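
The flow from text to token indices to embedding vectors can be seen directly with the Transformers library. A minimal sketch, assuming the bert-base-uncased checkpoint (any BERT checkpoint behaves the same way):

```python
# Tokenize text into integer indices, then look at the embeddings the model produces.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

encoded = tokenizer("this is a input .", return_tensors="pt")
print(encoded["input_ids"])       # integer index coding, with [CLS] and [SEP] added automatically
print(tokenizer.vocab_size)       # vocabulary size (~30k here; many models use 50k or more)

outputs = model(**encoded)
print(outputs.last_hidden_state.shape)   # (batch, sequence_length, hidden_size) contextual embeddings
```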

3. GPT

  • ์‚ฌ์ „ํ•™์Šต์„ ํ•˜์ง€๋งŒ Decoder๊ธฐ๋ฐ˜ ์–ธ์–ด ๋ชจ๋ธ
  • ๊ฐ๊ฐ์˜ time step ์œผ๋กœ ๋ ˆ์ด๋ธ” ํ•˜์—ฌ ์˜ˆ์ธก์˜ ๋‹จ์–ด๋ฅผ ๋ถ„๋ฅ˜ํ•˜์—ฌ Vocab ์‚ฌ์ „์—์„œ ๋‹จ์–ด๋ฅผ ์„ ํƒํ•˜๊ฒŒ ๋จ.

4. Transformer

4.1 ์ฃผ์š” ํŠน์ง•

  • ๋‹ค์–‘ํ•œ ๋ชจ๋ธ ์ง€์›: Transformers ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋Š” ์ˆ˜์‹ญ ์ข…์˜ Transformer ๊ธฐ๋ฐ˜ ๋ชจ๋ธ์„ ์ง€์›ํ•ฉ๋‹ˆ๋‹ค. ์ด ๋ชจ๋ธ๋“ค์€ ๋‹ค์–‘ํ•œ ์–ธ์–ด ๋ฐ ์ž‘์—…์— ๋Œ€ํ•ด ์‚ฌ์ „ ํ›ˆ๋ จ๋˜์–ด ์ œ๊ณต๋ฉ๋‹ˆ๋‹ค.
  • ์‰ฌ์šด ๋ชจ๋ธ ์‚ฌ์šฉ: ์‚ฌ์ „ ํ›ˆ๋ จ๋œ ๋ชจ๋ธ์„ ๋ช‡ ์ค„์˜ ์ฝ”๋“œ๋กœ ๋ถˆ๋Ÿฌ์™€ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด ๋ณต์žกํ•œ ๋ชจ๋ธ ์•„ํ‚คํ…์ฒ˜๋ฅผ ์ดํ•ดํ•˜์ง€ ์•Š์•„๋„ ๊ณ ์„ฑ๋Šฅ์˜ NLP ๋ชจ๋ธ์„ ์ ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
  • ์‚ฌ์šฉ์ž ์นœํ™”์ : PyTorch์™€ TensorFlow ๋ชจ๋‘๋ฅผ ์ง€์›ํ•˜๋ฉฐ, ์‰ฌ์šด API๋ฅผ ํ†ตํ•ด ๋‘ ํ”„๋ ˆ์ž„์›Œํฌ์—์„œ ๋ชจ๋‘ ์†์‰ฝ๊ฒŒ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋˜ํ•œ, ์ง๊ด€์ ์ธ ๋ฌธ์„œ์™€ ๋‹ค์–‘ํ•œ ํŠœํ† ๋ฆฌ์–ผ์ด ์ œ๊ณต๋˜์–ด ์‚ฌ์šฉ์ž๊ฐ€ ๋น ๋ฅด๊ฒŒ ํ•™์Šตํ•˜๊ณ  ์ ์šฉํ•  ์ˆ˜ ์žˆ๋„๋ก ๋„์™€์ค๋‹ˆ๋‹ค.
  • ํ™•์žฅ์„ฑ: ์‚ฌ์šฉ์ž๊ฐ€ ์ง์ ‘ ๋ชจ๋ธ์„ ์ˆ˜์ •ํ•˜๊ฑฐ๋‚˜ ์ƒˆ๋กœ์šด ํ˜•ํƒœ์˜ Transformer ๋ชจ๋ธ์„ ์‰ฝ๊ฒŒ ์ถ”๊ฐ€ํ•  ์ˆ˜ ์žˆ๋Š” ๊ตฌ์กฐ๋ฅผ ๊ฐ–์ถ”๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.
  • ์ปค๋ฎค๋‹ˆํ‹ฐ ๋ฐ ์ž์›: Hugging Face๋Š” ํ™œ๋ฐœํ•œ ์ปค๋ฎค๋‹ˆํ‹ฐ๋ฅผ ๋ณด์œ ํ•˜๊ณ  ์žˆ์œผ๋ฉฐ, ๋ชจ๋ธ ํ—ˆ๋ธŒ๋ฅผ ํ†ตํ•ด ์‚ฌ์šฉ์ž๊ฐ€ ๊ฐœ๋ฐœํ•œ ๋ชจ๋ธ์„ ๊ณต์œ ํ•˜๊ฑฐ๋‚˜ ๋‹ค๋ฅธ ์‚ฌ์šฉ์ž์˜ ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

4.2 Pipeline

4.2.1 Pipeline ์ข…๋ฅ˜

๊ฐ์ •๋ถ„์„
zero-shot text classification
text generation(Completion)
  • generator = pipeline("text-generation", model="kykim/gpt3-kor-small_based_on_gpt2")
Question answering
Summarization
NER
์‹ค์Šต

4.2.2 Model

  • Transformer์—๋Š” ๋‹ค์–‘ํ•œ ์•„ํ‚คํ…์ฒ˜๊ฐ€ ์žˆ์œผ๋ฉฐ, ๊ฐ๊ฐ์€ ํŠน์ • ์ž‘์—…์„ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ์„ค๊ณ„๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ForCausalLM - Text ์ƒ์„ฑ. ํ•™์Šต method๊ฐ€ ์ •์˜๋˜์–ด ์žˆ์Œ. ForMaskedLM - Bert ๋ชจ๋ธ ์‚ฌ์ „ํ•™์Šต์‹œ ์‚ฌ์šฉ ForMultipleChoice ForQuestionAnswering ForSequenceClassification - CLS๋ฅผ ๋ถ„๋ฅ˜๊ธฐ๋กœ ๋ฝ‘์•„์„œ ์‚ฌ์šฉ ForTokenClassification - ๊ฐ ํ† ํฐ๋ณ„๋กœ CLS๊ฐ€ ํ•„์š”ํ•  ๊ฒฝ์šฐ ์‚ฌ์šฉ. ์˜ˆ) ๋ฌธ๋ฒ• ๋ถ„์„๊ธฐ?

Training Language Model

Fine-tuning

  • 2๊ฐ€์ง€ ๋ฐฉ๋ฒ•์ด ์žˆ์Œ.
  • Scratch vs Transfer Learning

Scratch

  • Training from scratch: rarely used nowadays.

Transfer Learning

  • ์‚ฌ์ „ํ•™์Šต ๋ชจ๋ธ.
  • PLM (Pretrained Language Modeling)

Fine-tune ์ „๋žต

  • Pre-training > SFT (Supervised Fine-Tuning) > RLHF (reinforcement learning from human feedback) > fine-tuning & in-context learning
  • Hands-on practice (fine-tuning with PyTorch and the Transformers Trainer API): 5.Fine_tuning_a_model.ipynb - a minimal Trainer sketch follows this list.
  • ChatGPT was also built by applying these three stages.

SFT (Supervised Fine-Tuning)

RLHF (Reinforcement Learning from Human Feedback)

Question answering model

PEFT (Parameter-Efficient Fine-Tuning)

Foundation Model

  • ํ•™์Šต์‹œํ‚ฌ ์ž๋ฃŒ๊ฐ€ ์ปค์งˆ์ˆ˜๋ก ๋ฆฌ์†Œ์Šค ์‚ฌ์šฉ์ด ์ปค์ง€๊ฒŒ๋จ.
  • ๊ฑฐ๋Œ€ ๋ชจ๋ธ์„ ํŠœ๋‹๊นŒ์ง€ ํ•  ์ˆ˜ ์žˆ์„๊นŒ? ๋ฉ”๋ชจ๋ฆฌ ํšจ์œจ์  ์‚ฌ์šฉ์œ„ํ•ด์„œ ์‚ฌ์šฉ๋œ ๋ชจ๋ธ์ด PEFT ๋ชจ๋ธ์ž„

์ ์šฉ๋ฐฉ๋ฒ•

  1. Prompt-based methods
  2. Adapter modules

prefix-tuning

  • BERT ๋ชจ๋ธ. ์„œ๋น„์Šคํ•˜๊ณ ์ž ํ•˜๋Š” ๊ตฌ์„ฑ์— ๋”ฐ๋ผ ๋ชจ๋ธ์ด ์ƒ์„ฑ๋ ๊ฒƒ์ธ๋ฐ, attention layer ์— ๊ฐ€๋ณ€์  Prefix ์— ํ•ด๋‹นํ•˜๋Š” ํ† ํฐ๋“ค๋งŒ ํ•™์Šต์„ ์‹œํ‚ด

prompt tuning

  • prefix-tunig์„ ๋ฐœ์ „์‹œํ‚จ๊ฒƒ.
  • ์ž…๋ ฅ ํ† ํฐ๋งŒ ํ•™์Šต

LoRA

  • ๊ฐ€์žฅ ์ธ๊ธฐ ์žˆ๋Š” ๋ฐฉ๋ฒ•
  • attention layer์— Query / Key / Value ๋กœ ์ธ๋ฑ์‹ฑ ํ•˜๋Š” ํ–‰๋ ฌ์ด ๋ฐ”๋€” ๊ฒฝ์šฐ Transformer ์—ญํ• ์ด ์ปค์ง€๊ฒŒ ๋˜์–ด. ํ•ด๋‹น ๋ถ€๋ถ„์— LoRA๋ฅผ ์ ์šฉํ•จ.
  • ์„ฑ๋Šฅ๋„ ์ข‹๊ณ , ์—…๋ฐ์ดํŠธ ํ•˜๋Š” ํŒŒ๋ผ๋ฏธํ„ฐ ์ˆ˜๋„ ์ ์–ด์„œ ์—ฐ์‚ฐ์— ํ•„์š”ํ•œ ๋ฉ”๋ชจ๋ฆฌ๊ฐ€ ์ ๊ฒŒ ์‚ฌ์šฉ๋จ
  • LoRA ์‹ค์Šต https://colab.research.google.com/drive/1lr0nTEEaq5gEL8jghNCdFldiNImZ0iLO?usp=sharing


huggingface

  • access token (write) / 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
  • mistralai/Mistral-7B-v0.1 / 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

Instruction tuning

  • ๋‹ต๋ณ€ ์™„์„ฑ์ชฝ์— ํŠœ๋‹์ด ๋˜์–ด ์žˆ๋‹ค๋ฉด ์›ํ•˜๋Š” ๋‹ต๋ณ€์„ ํ•˜๊ธฐ ์œ„ํ•ด์„œ fine tuning ํ•˜๊ฒŒ ๋จ.
  • SFT (์ง€๋„์ ๊ฐ•ํ™”ํ•™์Šต) ์˜ ํ•˜๋‚˜์˜ ๋ฐฉ๋ฒ•

ํ•™์Šต๋ฐฉ๋ฒ•์€?

  • gpt ๋ชจ๋ธ ํ•™์Šต์‹œ ๊ธฐ์กด์˜ ์•Œ๊ณ ๋ฆฌ์ฆ˜๊ณผ ๋™์ผํ•˜๊ณ  data-set๋งŒ ๋‹ฌ๋ผ์ง

  • step1. Dataset ๊ตฌ์ถ• > step2. ํŠœ๋‹ ๊ตฌ์ถ• .

  • ์š”์ฆ˜์€ instruction๋„ AI๋ฅผ ํ†ตํ•ด์„œ ์ƒ์„ฑ. ChatGPT4, Gemini ...

  • Alpaca Dataset ( GPT-3.5 (text-davinci-003) https://crfm.stanford.edu/2023/03/13/alpaca.html)

  • Instruction tuning with GTP4 / https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM#how-good-is-the-data

  • wandb ๋”ฅ๋Ÿฌ๋‹ ์‚ฌ์šฉํ•˜๋Š” ์‚ฌ์šฉ์ž๊ฐ€ ๋งŽ์ด ์‚ฌ์šฉํ•จ
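
A minimal sketch of the Alpaca prompt format mentioned above (the template follows the stanford_alpaca repository; the example record is made up):

```python
# One Alpaca-style record is flattened into a single training prompt.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input that provides "
    "further context. Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{output}"
)

example = {
    "instruction": "Summarize the following sentence.",
    "input": "LLMs are pre-trained on large corpora and then fine-tuned for specific tasks.",
    "output": "LLMs are pre-trained on big data, then adapted to each task.",
}

print(ALPACA_TEMPLATE.format(**example))
```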

Alpaca ๋ฐ์ดํ„ฐ์…‹์„ ์‚ฌ์šฉํ•œ Llama ๋ชจ๋ธ instruction tuning

wandb
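
A minimal sketch of this kind of instruction tuning with trl's SFTTrainer, assuming the tatsu-lab/alpaca dataset (which ships a pre-formatted "text" column) and a Llama checkpoint; the keyword arguments follow older trl releases (pre-0.9), where newer versions move them into SFTConfig.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

model_name = "meta-llama/Llama-2-7b-hf"        # gated model: requires a Hugging Face access token
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

dataset = load_dataset("tatsu-lab/alpaca", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",                 # column holding the fully formatted Alpaca prompt
    max_seq_length=512,
    args=TrainingArguments(output_dir="alpaca-llama",
                           per_device_train_batch_size=2,
                           num_train_epochs=1,
                           report_to="wandb"),  # log the run to wandb, as noted above
)
trainer.train()
```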

RLHF

  • Reinforcement Learning from Human Feedback: reinforcement learning driven by human preference data.
  • InstructGPT uses PPO (the policy is trained to maximize the reward model's score). DPO is an algorithm that learns directly from human preferences without a reward model: it is given preferred and non-preferred responses and trains on them directly.
  • More robust and easier-to-apply ways of running the PPO training algorithm still need to be developed.
  • Applied in three stages (SFT > reward model training > RL fine-tuning).
  • Policy model: trained to make the right decision (produce the right response) for a given situation.
  • DPO hands-on practice (a DPOTrainer sketch follows below): https://colab.research.google.com/drive/1uugTDKSJvpoLYfz5ntT6qL7uEFRpt-U_?usp=sharing

์˜ˆ์ƒ๋ฌธ์ œ

[๋ฌธ์ œ] ffn์„ ๋„ฃ์–ด์ฃผ๋Š” ์ด์œ  - ๋น„์„ ํ˜•์„ฑ์„ ๋„ฃ์–ด์ฃผ๊ธฐ ์œ„ํ•ด์„œ์ž„

์ฐธ๊ณ ์ž๋ฃŒ

Huggingface ํ•˜์œ„ ํ”„๋กœ์ ํŠธ
Transformer
https://huggingface.co/docs/transformers/index
PEFT
https://huggingface.co/docs/peft/v0.10.0/en/index
https://github.com/huggingface/peft/blob/main/README.md
์œ ์šฉํ•œ ์•„ํ‹ฐํด
The Ultimate Guide to Fine-Tune LLaMA 2, With LLM Evaluations
https://www.confident-ai.com/blog/the-ultimate-guide-to-fine-tune-llama-2-with-llm-evaluations
How to fine-tune an LLM part1 : Preparing Dataset for Instruction Tuning
https://newsletter.ruder.io/p/instruction-tuning-vol-1
https://newsletter.ruder.io/p/instruction-tuning-vol-2
https://medium.com/aiguys/reinforcement-learning-from-human-feedback-instructgpt-and-chatgpt-693d00cb9c58
Generating a Clinical Instruction Dataset in Portuguese with Langchain and GPT-4
https://solano-todeschini.medium.com/generating-a-clinical-instruction-dataset-in-portuguese-with-langchain-and-gpt-4-6ee9abfa41ae
How to Generate Instruction Datasets from Any Documents for LLM Fine-Tuning
https://towardsdatascience.com/how-to-generate-instruction-datasets-from-any-documents-for-llm-fine-tuning-abb319a05d91)
Github source
https://github.com/mlabonne/llm-course
https://github.com/ashishpatel26/LLM-Finetuning
https://github.com/tatsu-lab/stanford_alpaca
https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM?tab=readme-ov-file#how-good-is-the-data
https://github.com/Eladlev/AutoPrompt