from Hacker News

Direct Preference Optimization: Your Language Model Is a Reward Model

by ntonozzi on 6/5/23, 2:04 AM with 0 comments