by hackandthink on 4/19/25, 1:12 PM with 29 comments
by gwern on 4/22/25, 12:04 AM
By the way, note that this applies to LLMs too. One of the biggest pons asinorums that people get hung up on is the idea that "it just imitates the data, therefore, it can never be better than the average datapoint (or at least, best datapoint); how could it possibly be better?"
Well, we know from a long history that this is not that hard: humans make random errors all the time, and even a linear model with a few parameters or a little flowchart can outperform them. So it shouldn't be surprising or a mystery if some much more complicated AI system could too.
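This is the classic "clinical versus statistical prediction" result (Meehl, Dawes), and its sharpest form is Goldberg's "model of the judge": a linear regression fit to a human rater's own past judgments beats the rater, because the fit keeps the rater's policy and discards the rater's trial-to-trial noise. A minimal simulation of that effect, with all numbers invented for illustration:

```python
# Toy "model of the judge" (Goldberg 1970): a linear model fit to a noisy
# human rater's own judgments outperforms the rater, because the model
# strips out the rater's random error. All parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 4
X = rng.normal(size=(n, d))                       # case features
w_true = np.array([0.9, 0.5, -0.3, 0.2])
y = X @ w_true + rng.normal(scale=0.5, size=n)    # true outcomes

# The "human": roughly the right weights, applied inconsistently.
w_human = w_true + rng.normal(scale=0.1, size=d)
human_pred = X @ w_human + rng.normal(scale=1.0, size=n)  # trial-to-trial noise

# Linear model fit to the human's own noisy judgments.
w_fit, *_ = np.linalg.lstsq(X, human_pred, rcond=None)
model_pred = X @ w_fit

print("human RMSE:", np.sqrt(np.mean((human_pred - y) ** 2)))
print("model RMSE:", np.sqrt(np.mean((model_pred - y) ** 2)))
```

The model's RMSE comes out well below the human's, because the inconsistency term never reaches the model's predictions even though the model was trained only on the human's output.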
by vintermann on 4/22/25, 5:49 AM
This reminds me of the many years machine translation was evaluated with BLEU against reference translations, because nobody knew a better way. It turns out that if you measure translation quality by n-gram precision against a reference translation, then methods that optimize n-gram overlap (such as the old pre-NMT Google Translate) are really hard to beat.
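For readers who have not met it, BLEU is essentially clipped n-gram precision against a reference, combined across n = 1..4 with a brevity penalty. A stripped-down single-reference sketch (real BLEU is corpus-level, handles multiple references, and is usually smoothed):

```python
# Simplified single-reference BLEU: modified n-gram precision for n = 1..4,
# geometric mean, brevity penalty. Enough to show what the metric rewards.
from collections import Counter
import math

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def modified_precision(candidate, reference, n):
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum(min(count, ref[g]) for g, count in cand.items())
    total = sum(cand.values())
    return overlap / total if total else 0.0

def bleu(candidate, reference, max_n=4):
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:          # any zero precision kills the geo-mean
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))  # brevity penalty
    return bp * math.exp(log_avg)

ref = "the cat sat on the mat".split()
print(bleu("the cat sat on the mat".split(), ref))      # 1.0: exact match
print(bleu("the cat sat on a mat".split(), ref))        # ~0.54: one word off
print(bleu("a cat was sitting on a mat".split(), ref))  # 0.0: same meaning, no shared bigrams
```

The last line is the problem described above: a fluent paraphrase that shares no bigrams with the reference scores zero, while systems that chase n-gram overlap score well by construction.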
by nitwit005 on 4/21/25, 8:01 PM
Humans are tool users. If you make a statistical table to consult for some medical issue, you're using a tool.
by rawgabbit on 4/21/25, 9:00 PM
The most recent example I can think of is "Frank". In 2021, JPMorgan Chase acquired Frank, a startup founded by Charlie Javice, for $175 million. Frank claimed to simplify the FAFSA process for students. Javice asserted the platform had over 4 million users, but in reality, it had fewer than 300,000. To support her claim, she allegedly hired a data science professor to generate synthetic data, creating fake user profiles. JPMorgan later discovered the discrepancy when a marketing campaign revealed a high rate of undeliverable emails. In March 2025, Javice was convicted of defrauding JPMorgan.
IMO a data expert could have recognized the fake user profiles: such an expert would have seen how messy real data is, would know the demographics of the would-be users of a service like Frank (wealthy, time-stressed families), and would know the telltale signs of fake data (clusters that follow obvious "first principles").
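A hedged sketch of what such checks might look like in pandas. The column names (email, birth_year, signup_ts) and the heuristics are hypothetical, not anything from the actual case; the common thread is that real user data is messy, while synthetic data tends to be too clean and too uniform:

```python
# Hypothetical sanity checks for a user table suspected of being synthetic.
# Column names and thresholds are assumptions for illustration only.
import pandas as pd

def smells_synthetic(df: pd.DataFrame) -> dict:
    checks = {}
    # Real data has missing fields; a near-zero null rate is suspicious.
    checks["null_rate"] = df.isna().mean().mean()
    # Real signups heap at a few big email providers; a flat domain
    # distribution suggests generated addresses.
    domains = df["email"].str.split("@").str[-1]
    checks["top_domain_share"] = domains.value_counts(normalize=True).iloc[0]
    # Real birth years cluster around the target demographic; near-uniform
    # years are a generator artifact.
    checks["birth_year_max_share"] = df["birth_year"].value_counts(normalize=True).max()
    # Real signup timestamps have daily and weekly rhythm; evenly spaced
    # timestamps (coefficient of variation near 0) are another tell.
    gaps = df["signup_ts"].sort_values().diff().dropna()
    checks["signup_gap_cv"] = gaps.std() / gaps.mean()
    return checks
```

JPMorgan's actual tell, per the timeline above, was even simpler: a marketing campaign with an implausibly high rate of undeliverable emails.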
by dominicq on 4/21/25, 8:51 PM
Or you might even be in a domain which, for whatever reason, is poorly represented by a statistical model, one where data points are hard to get.
by mwkaufma on 4/22/25, 1:28 AM
There is another aspect here: those averaged outcomes are also the output of statistical models. So it is kind of like asking whether statistical models are better than humans at agreeing with other statistical models.
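A toy version of that circularity, purely illustrative: score a noisy-but-unbiased "human" and a second model against benchmark labels that are themselves a linear model's output. The model from the same family wins against the benchmark while losing against the real outcome:

```python
# If the "outcome" we score against is itself a statistical model's output,
# a second model of the same family looks better than a noisy-but-unbiased
# human, partly because it shares the benchmark's functional form.
import numpy as np

rng = np.random.default_rng(1)
n = 5000
x = rng.uniform(-2, 2, size=n)
y_true = np.sin(2 * x) + rng.normal(scale=0.3, size=n)   # real outcome

# "Benchmark" labels: a linear fit to the real outcome.
a, b = np.polyfit(x, y_true, 1)
y_bench = a * x + b

# Human: tracks the real nonlinear signal, with some inconsistency.
human = np.sin(2 * x) + rng.normal(scale=0.4, size=n)
# Model: another linear fit, trained only on the human's judgments.
a2, b2 = np.polyfit(x, human, 1)
model = a2 * x + b2

rmse = lambda p, t: np.sqrt(np.mean((p - t) ** 2))
print("vs benchmark  human:", rmse(human, y_bench), " model:", rmse(model, y_bench))
print("vs true y     human:", rmse(human, y_true),  " model:", rmse(model, y_true))
```

Against the benchmark labels the model wins by a wide margin (two linear fits nearly coincide); against the true outcome, the human wins, because the benchmark threw away the nonlinear structure the human actually tracks.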
by delichon on 4/21/25, 6:44 PM
This seems to be a near restatement of the bitter lesson. It's not just that large enough statistical models outperform algorithms built from human expertise; they also outperform human expertise directly.
by whatever1 on 4/22/25, 4:52 AM
My point is that on many occasions, being right on average is less important than being right in the tail.
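A toy illustration of that average-versus-tail trade, with error distributions chosen purely to make the contrast visible: predictor A is usually sharper but occasionally catastrophic, predictor B is blandly mediocre everywhere. A wins on mean error and loses badly at the extreme percentile:

```python
# Two predictors' error distributions. A: accurate almost always, with a
# 0.1% chance of a catastrophic miss. B: uniformly mediocre. A has the
# lower average error; B is far safer in the tail. Numbers are made up.
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
catastrophic = rng.random(n) < 0.001
err_a = np.where(catastrophic,
                 rng.normal(scale=50.0, size=n),   # rare catastrophic miss
                 rng.normal(scale=0.5, size=n))    # usually very accurate
err_b = rng.normal(scale=1.0, size=n)

for name, e in [("A", np.abs(err_a)), ("B", np.abs(err_b))]:
    print(name,
          "mean |err|:", round(e.mean(), 3),
          "99.99th pct |err|:", round(np.percentile(e, 99.99), 2))
```

If a tail-percentile miss means a blown trade, a missed diagnosis, or a structural failure, the "worse on average" predictor B is the one you want.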