by chmaynard on 6/3/25, 9:31 PM with 160 comments
by b0a04gl on 6/4/25, 7:09 AM
Honestly, for straight-up classification? I’d pick SVM or logistic any day. Transformers are cool, but unless your data’s super clean, they just hallucinate confidently. Like giving GPT a multiple-choice test on gibberish—it will pick something, and say it with its chest.
Lately, I just steal embeddings from big models and slap a dumb classifier on top. Works better, runs faster, less drama.
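Concretely, that looks something like the sketch below (a minimal example; the sentence-transformers encoder, the specific model name, and the toy data are just illustrative choices, not requirements):

    # Frozen embeddings from a big pretrained model + a simple classifier on top.
    # Requires: pip install sentence-transformers scikit-learn
    from sentence_transformers import SentenceTransformer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import classification_report
    from sklearn.model_selection import train_test_split

    # Toy placeholder data; swap in your real texts and labels.
    texts = [
        "refund not received", "card charged twice", "billing error again", "invoice is wrong",
        "love the new update", "great customer support", "five stars, works perfectly", "really enjoying the app",
    ]
    labels = ["complaint"] * 4 + ["praise"] * 4

    # Encode once with the frozen model; no fine-tuning involved.
    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any decent pretrained encoder works
    X = encoder.encode(texts)

    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.25, stratify=labels, random_state=0)

    # The "dumb" classifier on top of the stolen embeddings.
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print(classification_report(y_te, clf.predict(X_te)))

Swapping LogisticRegression for sklearn.svm.LinearSVC gives the SVM flavour of the same idea.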
Appreciate this post. Needed that reality check before I fine-tune something stupid again.
by Kiyo-Lynn on 6/4/25, 6:28 AM
I believe that if we’re not even willing to carefully confirm whether our predictions match reality, then no matter how impressive the technology looks, it’s only a fleeting illusion.
by godelski on 6/3/25, 10:29 PM
> although later investigation suggests there may have been data leakage
I think this point is often forgotten. Everyone should assume data leakage until it is strongly evidenced otherwise. It is not on the reader or skeptic to prove that there is data leakage; the authors have the burden of proof.
It is easy to have data leakage even on small datasets, ones where you can look at everything. Data leakage is really easy to introduce, and you often do it unknowingly. Subtle things easily spoil data.
Now we're talking about gigantic datasets where there's no chance anyone can manually look through it all. We know the filter methods are imperfect, so how do we come to believe that there is no leakage? You can say you filtered it, but you cannot say there's no leakage (even a crude duplicate check, like the one sketched at the end of this comment, only catches the obvious cases).
Beyond that, we are constantly finding spoilage in the datasets we do have access to. So there's frequent evidence that it is happening.
So why do we continue to assume there's no spoilage? Hype? Honestly, it just sounds like a lie we tell ourselves because we want to believe. But we can't fix these problems if we lie to ourselves about them.
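Even a crude exact-match check after normalization catches a surprising amount of contamination on small datasets. A minimal sketch with made-up examples (it deliberately only catches near-verbatim duplicates, not paraphrases, shared sources, or label leakage):

    # Crude train/test contamination check: exact match after light normalization.
    import hashlib

    def normalize(text: str) -> str:
        # Lowercase and collapse whitespace so trivial formatting differences don't hide duplicates.
        return " ".join(text.lower().split())

    def fingerprints(examples):
        # One hash per normalized example.
        return {hashlib.sha1(normalize(t).encode("utf-8")).hexdigest() for t in examples}

    # Made-up stand-ins for real train/test splits.
    train = ["The cat sat on the mat.", "Gene X is upregulated in tissue Y."]
    test = ["the  cat sat on the MAT.", "A genuinely unseen sentence."]

    overlap = fingerprints(train) & fingerprints(test)
    print(f"{len(overlap)} duplicate(s) between train and test after normalization")

Near-duplicate detection (MinHash, embedding similarity) catches more, but the point stands: on giant crawled datasets even this level of checking is hard to do exhaustively, let alone the subtle cases.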
by boxed on 6/4/25, 6:56 AM
It's the same as "AI can code". It gets caught failing spectacularly, over and over again, whenever the problem isn't in the training set, and people are surprised every time.
by kenjackson on 6/3/25, 10:10 PM
Is this really not the case? I've read some of the AI papers in my field, and I know many other domain experts have as well. That said, I do think that CS/software-based work is generally easier to check than biology (or it may just be that I know very little bio).
by slt2021 on 6/3/25, 10:19 PM
This is basically another argument that deep learning works only as [generative] information retrieval, i.e. a stochastic parrot, because the training data is a very lossy representation of the underlying domain.
Because the data/labels of genes do not always represent the underlying domain (biology) perfectly, the output can be false/invalid/nonsensical.
In cases where it works very well, there is data leakage, because by design LLMs are information retrieval tools. From an information-theory standpoint, that is a fundamental "unknown unknown" for any model.
My takeaway is that it's not a fault of the algorithm; it's more a fault of the training dataset.
We humans operate fluidly in the domain of natural language, and even a kid can read and evaluate whether a text makes sense or not; this explains the success of models trained on natural language.
But in domains where the training data is a lossy representation of the underlying domain, the model will be imperfect.
by imiric on 6/4/25, 7:38 AM
This seems to be exactly the kind of result we would expect from a system that hallucinates, has no semantic understanding of the content, and is little more than a probabilistic text generator. This doesn't mean that it can't be useful when placed in the right hands, but it's also unsurprising that human non-experts would use it to cut corners in search of money, power, and glory, or, worse, to actively delude, scam, and harm others. Considering that the latter group is much larger, it's concerning how little thought and how few resources are put into implementing _actual_ safety measures, and not just ones that look good in PR statements.
by naasking on 6/13/25, 1:08 PM
This is not a phenomenon unique to AI. A study from a few years before AI exploded showed that sensational but wrong papers had a higher chance of being published in top journals and were cited more than the papers that showed they were incorrect. This is an outcome of the bad incentives in science publishing these days.
by choeger on 6/4/25, 6:46 AM
Why are people using transformers? Do they have any intuition that they could solve the challenge, let alone efficiently?
by 6stringmerc on 6/4/25, 1:02 PM
Until consequences and punishment are part of AI systems, they are missing the biggest real-world component of human decision making. If the AI models aren't held responsible, and the creators / maintainers / investors are not held accountable, then we're heading for a new Dark Age. Of course this is a disagreeable position, because humans reading this don't want to face negative repercussions, financially, reputationally, or regarding incarceration, so they will protest this perspective.
That only emphasizes how I'm right. AI doesn't give a fuck about human life or its freedom because it has neither. Grow up and start having real conversations about this flaw, or make peace with the fact that eventually society will have an epiphany about this and react accordingly.
by kgilpin on 6/4/25, 1:52 PM
Deep, accurate, real-time code review could be of huge assistance in improving the quality of both human- and AI-generated code. But all the hype is focused on LLMs spewing out more and more code.
by hbartab on 6/4/25, 3:03 PM
The danger with the use of LLMs is that managers do not see the diligent work needed to ensure that whatever the model comes up with is correct. They just see a slab of text that is a mixture of reality and confabulation, though mostly the latter, and it looks reasonable enough, so they think it is magic.
Executives who peddle this nonsense don't realize that the proper usage requires a huge amount of patience and careful checking. Not glamorous work, as the author states, but absolutely essential to get good results. Without it, you are just trusting a bullshit artist with whatever that person comes up with.
by aaron695 on 6/3/25, 10:23 PM
The worse science, the publish-or-perish pulp, got more academic karma (Altmetric/citations) -> $$$.
AI is the perfect academic: the science and curiosity are gone, and the ability to push out science-looking text is supermaxxed.
Tragic end solution: do the same and throw even more money at it.
> At a time when funding is being slashed, I believe we should be doing the opposite
AI has shown the world that academia is beyond broken in a way that can't be ignored, and academia won't get its head out of the granular sediments between 0.0625 mm and 2 mm in diameter.
Defund academia now.
by aucisson_masque on 6/3/25, 10:25 PM
Except that we can't compare Twitter to a journal like Nature. Science is supposed to be immune to this kind of bullshit thanks to reputable journals and peer review, blocking a publication before it does any harm.
Was that a failure of Nature?