# RLHF - AshokBhat/ml GitHub Wiki

## About
* Reinforcement Learning from Human Feedback (RLHF)
* Uses methods from reinforcement learning to directly optimize a language model with human feedback
* Used in ChatGPT

## See also
* ChatGPT
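RLHF optimizes a language model "directly with human feedback". The minimal Python sketch below shows the two numeric pieces commonly used to do that, as in InstructGPT-style pipelines:

* **Reward model loss.** A reward model is trained on human preference pairs with a pairwise loss, `-log(sigmoid(r_chosen - r_rejected))`.
* **RL reward.** The reward optimized by the RL step (e.g. PPO) is the reward model's score minus a KL-style penalty, scaled by `beta`. The penalty keeps the fine-tuned policy close to a reference model.

The function names and the default `beta = 0.1` are illustrative assumptions, not taken from this page. In a real system these quantities come from neural networks and are computed over token log-probabilities.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise (Bradley-Terry style) loss for training a reward model.

    A human labeler preferred `chosen` over `rejected`. The loss is
    -log(sigmoid(r_chosen - r_rejected)): small when the reward model
    already scores the chosen response higher, large when it does not.
    """
    margin = reward_chosen - reward_rejected
    # Numerically stable -log(sigmoid(margin)); avoids overflow in exp().
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def shaped_reward(rm_score: float, logprob_policy: float,
                  logprob_ref: float, beta: float = 0.1) -> float:
    """Reward used during the RL fine-tuning step (hypothetical helper).

    The reward model score is reduced by beta times the log-probability
    ratio between the current policy and the frozen reference model,
    a per-sample KL penalty that discourages drifting too far.
    """
    return rm_score - beta * (logprob_policy - logprob_ref)


if __name__ == "__main__":
    # The loss is lower when the reward model agrees with the human label.
    print(preference_loss(2.0, 0.0), preference_loss(0.0, 2.0))
    # The policy assigns higher log-prob than the reference, so reward is reduced.
    print(shaped_reward(1.0, logprob_policy=-5.0, logprob_ref=-6.0))
```

The KL penalty is a design choice. Without it, the policy can "hack" the reward model by producing text that scores highly but no longer resembles fluent model output.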