by extesy on 4/8/25, 11:07 PM with 24 comments
by doctoboggan on 4/9/25, 12:22 AM
Are these raters experts in the field the report was written on? Did they rate the reports on factuality, breadth, and insight?
These sorts of tests (and RLHF in general) are the reason that LLMs often respond with "Great question, you are exactly right to wonder..." or "Interesting insight, I agree that...". I do not want this obsequious behavior; I want "correct answers"[0]. We need some better benchmarks when it comes to human preference.
[0]: I know there is no objectively correct answer for some questions.
by jeffbee on 4/9/25, 12:14 AM
by DadBase on 4/9/25, 12:30 AM
by pizzly on 4/9/25, 1:24 AM
by infecto on 4/9/25, 12:16 AM