from Hacker News

Outcome-Based Reinforcement Learning to Predict the Future

by bturtel on 5/27/25, 1:33 PM with 15 comments

  • by ctoth on 5/27/25, 4:14 PM

    Do you want paperclips? Because this is how you get paperclips!

    Eliminate all agents, all sources of change, all complexity - anything that could introduce unpredictability, and it suddenly becomes far easier to predict the future, no?

  • by valine on 5/27/25, 6:26 PM

    So instead of next-token prediction, it's next-event prediction. At some point this just loops around and we're back to teaching models to predict the next token in the sequence.

  • by jldugger on 5/27/25, 10:25 PM

    From the abstract:

    > A simple trading rule turns this calibration edge into $127 of hypothetical profit versus $92 for o1 (p = 0.037).

    I'm lazy: is this hypothetical shooting fish in a barrel, or is it a real edge?
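
    The rule itself isn't quoted above, so here is a minimal sketch of the
    general idea rather than the paper's implementation (the 5% threshold,
    the $1 stake, and the field names are illustrative assumptions): bet on
    a binary market whenever the model's probability diverges from the
    market price, then settle at resolution.

        # Hypothetical calibration-edge rule; assumes market prices are
        # strictly between 0 and 1. Not the paper's actual trading rule.
        def trade(markets, edge_threshold=0.05, stake=1.0):
            profit = 0.0
            for m in markets:
                diff = m["model_p"] - m["price"]
                if abs(diff) < edge_threshold:
                    continue  # no perceived edge, stay out of this market
                if diff > 0:
                    # Buy YES at `price`; each share pays $1 if the event occurs.
                    shares = stake / m["price"]
                    profit += (shares if m["outcome"] else 0.0) - stake
                else:
                    # Buy NO at (1 - price); each share pays $1 if it doesn't.
                    shares = stake / (1.0 - m["price"])
                    profit += (0.0 if m["outcome"] else shares) - stake
            return profit

        markets = [
            {"model_p": 0.80, "price": 0.60, "outcome": True},
            {"model_p": 0.30, "price": 0.50, "outcome": False},
        ]
        print(trade(markets))  # hypothetical P&L in dollars

    Under a rule like this, profit accrues only when the model's probability
    sits on the right side of the market price more often than chance, which
    is why the abstract reports a p-value alongside the dollar figures.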

  • by amelius on 5/27/25, 9:56 PM

    Why would you use RL if you're not going to control the environment, but just predict it?

  • by garbagecoder on 5/28/25, 7:45 PM

    "a couple of wavy lines"

    bzzzzz "sorry, this isn't your lucky day"