# From RLHF to DPO

Paper (arXiv): Direct Preference Optimization: Your Language Model is Secretly a Reward Model

RLHF vs DPO — TRL trainer docs:

- https://huggingface.co/docs/trl/main/en/online_dpo_trainer
- https://huggingface.co/docs/trl/en/dpo_trainer
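As a minimal illustration of the idea behind the linked paper (not code from the TRL docs), the DPO loss for a single preference pair can be sketched as below. The function name, arguments, and the default `beta=0.1` are illustrative assumptions; the inputs are sequence log-probabilities under the policy and a frozen reference model.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) preference pair.

    loss = -log sigmoid(beta * [(log pi(y_w) - log pi_ref(y_w))
                                - (log pi(y_l) - log pi_ref(y_l))])
    """
    # Implicit rewards: log-ratios of policy vs. reference model
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_logratio - rejected_logratio)
    # -log sigmoid(x) == log(1 + exp(-x)), computed stably with log1p
    return math.log1p(math.exp(-logits))
```

When the policy matches the reference model, both log-ratios are zero and the loss equals `log 2`; pushing the chosen response's implicit reward above the rejected one's drives the loss below that baseline.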