from Hacker News

Outcome-Based Reinforcement Learning to Predict the Future

by bturtel on 5/27/25, 1:33 PM with 15 comments

  • by ctoth on 5/27/25, 4:14 PM

    Do you want paperclips? Because this is how you get paperclips!

    Eliminate all agents, all sources of change, all complexity - anything that could introduce unpredictability, and it suddenly becomes far easier to predict the future, no?

  • by valine on 5/27/25, 6:26 PM

    So instead of next-token prediction, it's next-event prediction. At some point this just loops around and we're back to teaching models to predict the next token in the sequence.

  • by jldugger on 5/27/25, 10:25 PM

    From the abstract:

    > A simple trading rule turns this calibration edge into $127 of hypothetical profit versus $92 for o1 (p = 0.037).

    I'm lazy: is this hypothetical shooting fish in a barrel, or is it a real edge?
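
    The rule itself isn't quoted above, so here is a minimal sketch of the
    general idea rather than the paper's implementation (the 5% threshold,
    the $1 stake, and the field names are illustrative assumptions): bet on
    a binary market whenever the model's probability diverges from the
    market price, then settle at resolution.

        # Hypothetical calibration-edge rule; assumes market prices are
        # strictly between 0 and 1. Not the paper's actual trading rule.
        def trade(markets, edge_threshold=0.05, stake=1.0):
            profit = 0.0
            for m in markets:
                diff = m["model_p"] - m["price"]
                if abs(diff) < edge_threshold:
                    continue  # no perceived edge, stay out of this market
                if diff > 0:
                    # Buy YES at `price`; each share pays $1 if the event occurs.
                    shares = stake / m["price"]
                    profit += (shares if m["outcome"] else 0.0) - stake
                else:
                    # Buy NO at (1 - price); each share pays $1 if it doesn't.
                    shares = stake / (1.0 - m["price"])
                    profit += (0.0 if m["outcome"] else shares) - stake
            return profit

        markets = [
            {"model_p": 0.80, "price": 0.60, "outcome": True},
            {"model_p": 0.30, "price": 0.50, "outcome": False},
        ]
        print(trade(markets))  # hypothetical P&L in dollars

    Under a rule like this, profit accrues only when the model's probability
    sits on the right side of the market price more often than chance, which
    is why the abstract reports a p-value alongside the dollar figures.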

  • by amelius on 5/27/25, 9:56 PM

    Why would you use RL if you're not going to control the environment, but just predict it?

  • by garbagecoder on 5/28/25, 7:45 PM

    "a couple of wavy lines"

    bzzzzz "sorry, this isn't your lucky day"