by kcorbitt on 10/28/24, 5:17 PM with 95 comments
by jerjerjer on 10/28/24, 6:29 PM
> Even if the model gets extremely good at predicting final_score_if_it_hits_front_page, there’s still the inherent randomness of probability_of_hitting_front_page that is fundamentally unpredictable.
In addition to date, you might want to include three fields:
- day of week (categorical)
- is weekend/holiday (boolean)
- hour or time of day (categorical; either all 24 hours, or coarser buckets like morning/afternoon/evening).
The probability of a post hitting the front page is usually affected by these factors, so including them can really help the model.
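A minimal pandas sketch of deriving those three fields from a hypothetical created_at timestamp column (a real is-holiday flag would also need a holiday calendar on top of the weekend check):

  import pandas as pd

  # Hypothetical input: one timestamp per submission.
  df = pd.DataFrame({"created_at": pd.to_datetime([
      "2024-10-28 17:17", "2024-10-26 09:05", "2024-10-29 02:40",
  ])})

  df["day_of_week"] = df["created_at"].dt.day_name()     # categorical
  df["is_weekend"] = df["created_at"].dt.dayofweek >= 5  # boolean
  df["hour"] = df["created_at"].dt.hour                  # categorical, 0-23
  # Coarser alternative to 24 hourly buckets:
  df["time_of_day"] = pd.cut(df["hour"], bins=[0, 6, 12, 18, 24],
                             labels=["night", "morning", "afternoon", "evening"],
                             right=False)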
by kelnos on 10/28/24, 6:52 PM
* 1 had a score reasonably close (off by 8.4%) to what the model predicted
* 4 had scores wildly lower than the model predicted
* 2 had scores wildly higher than the model predicted
* the remaining 3 were not wildly off, but weren't really that close either (25%-42% off)
Then there's a list of 10 submissions that the model predicted would score between 33 and 135, but every one of them actually received a score of just 1.
The graph shown paints a bit of a better picture, I guess, but it's still not all that compelling to me.
by youoy on 10/28/24, 6:27 PM
> The correlation is actually not bad (0.53), but our model is very consistently over-estimating the score at the low end, and underestimating it at the high end. This is surprising; some variation on any given data point is expected, but such a consistent mis-estimation trend isn’t what we’d expect.
This is a consequence of the model objective. If the model can't tell what is really going to happen, regressing toward the mean is a good way of reducing the overall error. If it instead tried to exactly predict the very highs and very lows, it would incur very large errors on those points, resulting in a bigger overall error.
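A quick way to see this: even a model that predicts the conditional mean perfectly will look like it over-estimates the lows and under-estimates the highs once you bin by realized score, because at the extremes the unpredictable "luck" component dominates. A minimal numpy sketch on synthetic data (all names are illustrative, not from the post):

  import numpy as np

  rng = np.random.default_rng(0)
  n = 100_000

  # Latent quality the model can see, plus luck it can't.
  quality = rng.normal(0.0, 1.0, n)
  luck = rng.normal(0.0, 1.0, n)
  outcome = quality + luck

  # The MSE-optimal prediction given only quality is E[outcome | quality] = quality.
  pred = quality

  # Bin by *realized* outcome and compare means.
  for lo, hi in [(-4.0, -2.0), (-1.0, 1.0), (2.0, 4.0)]:
      m = (outcome >= lo) & (outcome < hi)
      print(f"outcome in [{lo:+.0f},{hi:+.0f}): "
            f"mean outcome {outcome[m].mean():+.2f}, mean prediction {pred[m].mean():+.2f}")

With equal variance on quality and luck, the mean prediction in the extreme bins comes out at roughly half the mean outcome: the same compressed-range pattern the post describes.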
Apart from that, I want to comment on AI alignment here. For me, the objective of "most upvotes" is not fully correlated with where I get the most value on HN. Most of the time, I would have found the most upvoted posts anyway on other platforms; it's the middle range that I really value. So be careful implementing this algorithm at scale, as it could turn the website into another platform with shitty AI recommendations.
by oli5679 on 10/28/24, 5:59 PM
https://scikit-learn.org/dev/modules/generated/sklearn.isoto...
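The link points at scikit-learn's IsotonicRegression. One way it could be applied here (a sketch under that assumption, with made-up numbers) is as a monotone calibration map from raw model predictions to realized scores on a held-out set:

  import numpy as np
  from sklearn.isotonic import IsotonicRegression

  # Made-up held-out predictions and realized scores, purely illustrative.
  preds = np.array([2.0, 5.0, 9.0, 30.0, 80.0, 150.0])
  actual = np.array([1.0, 1.0, 4.0, 55.0, 120.0, 400.0])

  # Fit a non-decreasing map from raw predictions to scores,
  # then use it to calibrate new predictions.
  iso = IsotonicRegression(out_of_bounds="clip").fit(preds, actual)
  calibrated = iso.predict(np.array([10.0, 100.0]))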
I also agree with your intuition that if your output is censored at 0, with a large mass there, it's good to create two models, one for likelihood of zero karma, and another expected karma, conditional on it being non-zero.
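A minimal sketch of that two-model ("hurdle") setup, assuming scikit-learn and hypothetical X/karma arrays (synthetic stand-ins, not the post's actual features):

  import numpy as np
  from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

  # Synthetic zero-inflated stand-ins for a feature matrix and raw karma.
  rng = np.random.default_rng(0)
  X = rng.random((1000, 5))
  karma = rng.poisson(0.3, 1000) * rng.integers(1, 50, 1000)

  nonzero = karma > 0

  # Model 1: likelihood of getting any karma at all.
  clf = GradientBoostingClassifier().fit(X, nonzero)

  # Model 2: expected karma, conditional on it being non-zero.
  reg = GradientBoostingRegressor().fit(X[nonzero], karma[nonzero])

  # Combined estimate: P(karma > 0) * E[karma | karma > 0].
  expected_karma = clf.predict_proba(X)[:, 1] * reg.predict(X)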
by swyx on 10/28/24, 6:09 PM
by Arctic_fly on 10/28/24, 6:32 PM
Based on the later analysis in the post (which I agree with), the total score of a submission is disproportionately tied to whether it hits the front page, and of course how long it stays there. Regardless of the quality of the average post, starting in 2015 the sheer quantity would make it impossible for all but a few to stay on the front page for very long. Hacker News got more popular, so each story got less prime time.
by kcorbitt on 10/28/24, 5:23 PM
by sdflhasjd on 10/28/24, 6:00 PM
by pclmulqdq on 10/28/24, 5:47 PM
by manx on 10/29/24, 6:11 AM
by Nevermark on 10/29/24, 11:04 AM
You would do better to leave out dates and authors.
Do you really want the model to home in on dates & authors? If you just trained on those, would it create anything useful?
It can't for dates, since it never sees examples from future dates that would prepare it for them. I suppose you could argue that month & day matter, but surely that would be a much lower-quality discriminator than forcing the model to stay focused on title & content.
Similarly with author. You can find out which authors produce content with the most upvotes with a simple calculation.
But again, is that the discriminator you want the model to use? Or the title & content? Because it will use the easiest discriminator it can.
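For reference, the "simple calculation" above can be a one-line groupby; a hypothetical pandas sketch:

  import pandas as pd

  # Illustrative data: one row per post, with author and final score.
  posts = pd.DataFrame({
      "author": ["alice", "bob", "alice", "carol", "bob"],
      "score":  [120, 3, 45, 7, 220],
  })

  # Mean score (and post count) per author, highest first.
  author_stats = (posts.groupby("author")["score"]
                       .agg(["mean", "count"])
                       .sort_values("mean", ascending=False))
  print(author_stats)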
by gavin_gee on 10/28/24, 11:19 PM
by 6gvONxR4sf7o on 10/28/24, 6:36 PM
by Havoc on 10/28/24, 5:46 PM
Did you ever figure out what happened in 2016?
by 1024core on 10/28/24, 8:02 PM
by hnburnsy on 10/29/24, 5:22 PM
Maybe the reputation of the poster is also a factor?
by metalman on 10/30/24, 7:26 AM
by eugenekolo on 10/28/24, 5:43 PM
by hn_throwaway_99 on 10/28/24, 11:29 PM
Well, thanks HN, you were good while it lasted...
by suyash on 10/28/24, 6:20 PM
by octocop on 10/29/24, 8:35 AM
by floobertoober on 10/28/24, 10:16 PM
by chx on 10/28/24, 7:57 PM
This is dangerous talk.
It doesn't understand anything at all.
Reminder: we are more prone to anthropomorphizing LLMs than to humanizing suffering humans.
by ChrisArchitect on 10/28/24, 6:23 PM
by ivanovm on 10/29/24, 6:04 PM