- https://github.com/huggingface/trl/blob/main/examples/research_projects/stack_llama_2/scripts/dpo_llama2.py
- https://github.com/CarperAI/trlx/blob/main/examples/summarize_rlhf/reward_model/reward_model.py
- https://github.com/CarperAI/trlx/blob/main/trlx/models/modeling_ppo.py
RLHF training code
First published on 2024-01-23 10:15:28