RLHF
RAFT
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
RRHF
RRHF: Rank Responses to Align Language Models with Human Feedback without tears
$$p_i=\frac{\sum_{t}\log P_{\pi}(y_{i,t}\mid y_{i,<t})}{\|y_i\|}$$
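The score $p_i$ is the policy's per-token log-likelihood of response $y_i$, averaged over its length. A minimal pure-Python sketch follows. It assumes the policy's logits for each position (already conditioned on $y_{i,<t}$) are given as plain lists, and it takes $\|y_i\|$ to be the token count. The names `log_softmax` and `sequence_score` are hypothetical helpers for illustration, not part of the RRHF code.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float]) -> list[float]:
    # Numerically stable log-softmax: shift by the max logit before exponentiating.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sequence_score(step_logits: Sequence[Sequence[float]], token_ids: Sequence[int]) -> float:
    """Length-normalized log-probability p_i of one response y_i (illustrative sketch).

    step_logits[t] is an assumed logit vector from the policy at position t,
    conditioned on the prefix y_{i,<t}; token_ids[t] is the sampled token y_{i,t}.
    """
    if not token_ids or len(step_logits) != len(token_ids):
        raise ValueError("need one logit vector per response token")
    # Sum of log P_pi(y_{i,t} | y_{i,<t}) over all positions t.
    total = sum(log_softmax(logits)[tok] for logits, tok in zip(step_logits, token_ids))
    # Divide by ||y_i||, taken here as the number of tokens.
    return total / len(token_ids)
```

Dividing by length keeps long responses from being penalized simply for having more (negative) log-probability terms. In real training the same computation would be done on tensors so that gradients flow back to the policy.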
$$L_{rank}=\sum_{r_i<r_j}\max(0,\,p_i-p_j)$$
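$L_{rank}$ is a hinge loss over ordered pairs. Whenever response $i$ has a lower reward than response $j$ ($r_i<r_j$) but the policy scores it at least as high ($p_i \ge p_j$), the gap $p_i-p_j$ is added to the loss. The sketch below takes the scores $p_i$ and rewards $r_i$ as plain floats. `rrhf_rank_loss` is a hypothetical name, and pairs with equal rewards contribute nothing.

```python
from itertools import permutations
from typing import Sequence


def rrhf_rank_loss(scores: Sequence[float], rewards: Sequence[float]) -> float:
    """Hinge ranking loss L_rank = sum over r_i < r_j of max(0, p_i - p_j) (illustrative sketch).

    scores[i] is the length-normalized log-probability p_i of response i;
    rewards[i] is its reward r_i from the reward model or human ranking.
    """
    if len(scores) != len(rewards):
        raise ValueError("scores and rewards must have the same length")
    loss = 0.0
    # Visit every ordered pair (i, j), i != j; keep only those where i is ranked worse.
    for i, j in permutations(range(len(scores)), 2):
        if rewards[i] < rewards[j]:
            # Penalize only when the worse response is scored above the better one.
            loss += max(0.0, scores[i] - scores[j])
    return loss
```

For example, with `scores = [-1.0, -2.0, -0.5]` and `rewards = [0.1, 0.9, 0.5]`, the best-rewarded response has the lowest score, so the pairs (0, 1) and (2, 1) contribute 1.0 and 1.5, giving a loss of 2.5. If the scores already follow the reward order, the loss is 0.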