from Hacker News

HN under the hood: Analyzing ~40M stories and comments

by osm3000 on 4/3/24, 2:33 PM with 5 comments

  • by osm3000 on 4/3/24, 2:33 PM

    Hi HN, I've spent the last few weeks working with the HN API to analyze stories and comments, about 40 million items in total. I've put together some stats on user activity and story scoring, and I've shared the code as well. If you're curious about the data or have insights to add, check it out. I'd love to hear your thoughts :)

  • by ColinWright on 4/3/24, 3:08 PM

    Is there a reason for the stray strong "i" in the middle of the text?

    Your LaTeX didn't render for $R^{2}=0.78$

    You're missing an "is" in this sentence:

    > So, if you want to want to win the HN game, consistency in sharing IS the winning strategy.

    I can't parse this sentence:

    > "Because I was still collecting the data, and the I had only until 2015 then"

    All in all, an interesting analysis, although I'm unsure of some of the conclusions. Many of them seem reasonable, but it would be nice for someone to use them to form hypotheses and go on to test them.

    But definitely an interesting article.

  • by distalx on 4/3/24, 2:54 PM

    Is there a regular data dump available for HN?