from Hacker News

Direct Preference Optimization: Your Language Model Is a Reward Model

by ntonozzi on 6/5/23, 2:04 AM with 0 comments