from Hacker News
Direct Preference Optimization: Your Language Model Is a Reward Model
by ntonozzi on 6/5/23, 2:04 AM with 0 comments