by meetpateltech on 5/6/25, 3:10 PM with 686 comments
by segphault on 5/6/25, 3:34 PM
There are still significant limitations, no amount of prompting will get current models to approach abstraction and architecture the way a person does. But I'm finding that these Gemini models are finally able to replace searches and stackoverflow for a lot of my day-to-day programming.
by paulirish on 5/6/25, 5:54 PM
It'd make sense to rename WebDev Arena to React/Tailwind Arena. Its system prompt requires [1] those technologies and the entire tool breaks when requesting vanilla JS or other frameworks. The second-order implications of models competing on this narrow definition of webdev are rather troublesome.
[1] https://blog.lmarena.ai/blog/2025/webdev-arena/#:~:text=PROM...
by ranyume on 5/6/25, 3:30 PM
by laborcontract on 5/6/25, 3:57 PM
They measure the old gemini 2.5 generating proper diffs 92% of the time. I bet this goes up to ~95-98% https://aider.chat/docs/leaderboards/
Question for the google peeps who monitor these threads: Is gemini-2.5-pro-exp (free tier) updated as well, or will it go away?
Also, in the blog post, it says:
> The previous iteration (03-25) now points to the most recent version (05-06), so no action is required to use the improved model, and it continues to be available at the same price.
Does this mean gemini-2.5-pro-preview-03-25 now uses 05-06? Does the same apply to gemini-2.5-pro-exp-03-25?
Update: I just tried updating the date in the exp model (gemini-2.5-pro-exp-05-06) and that doesn't work.
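The behavior described in the blog post amounts to an alias: the old preview id is remapped to the new version, while hand-editing the date on the -exp- id fails. A minimal sketch of that mapping; the `ALIASES` table and `resolve` helper are illustrative, not part of any actual Google API:

```python
# Hypothetical sketch of the aliasing described in the blog post: the
# 03-25 preview id now serves the 05-06 version, while -exp- ids are
# not remapped (renaming the date by hand does not work).
ALIASES = {
    "gemini-2.5-pro-preview-03-25": "gemini-2.5-pro-preview-05-06",
}

def resolve(model_id: str) -> str:
    """Return the model version a given id actually serves."""
    return ALIASES.get(model_id, model_id)
```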
by mohsen1 on 5/6/25, 4:00 PM
+------------------------------+---------+--------------+
| Benchmark | o3 | Gemini 2.5 |
| | | Pro |
+------------------------------+---------+--------------+
| ARC-AGI (High Compute) | 87.5% | — |
| GPQA Diamond (Science) | 87.7% | 84.0% |
| AIME 2024 (Math) | 96.7% | 92.0% |
| SWE-bench Verified (Coding) | 71.7% | 63.8% |
| Codeforces Elo Rating | 2727 | — |
| MMMU (Visual Reasoning) | 82.9% | 81.7% |
| MathVista (Visual Math) | 86.8% | — |
| Humanity’s Last Exam | 26.6% | 18.8% |
+------------------------------+---------+--------------+
[1] https://storage.googleapis.com/model-cards/documents/gemini-...
by andy12_ on 5/6/25, 3:55 PM
[1] https://storage.googleapis.com/model-cards/documents/gemini-... [2] https://deepmind.google/technologies/gemini/
by planb on 5/7/25, 9:55 AM
What's up with AI companies and their model naming? So is this an updated 2.5 Pro and they indicate it by appending "Preview" to the name? Or was it always called 2.5 Preview and this is an updated "Preview"? Why isn't it 2.6 Pro or 2.5.1 Pro?
by killerstorm on 5/6/25, 3:51 PM
E.g. call it Gemini Pro 2.5.1.
by herpdyderp on 5/6/25, 3:36 PM
by arnaudsm on 5/6/25, 4:10 PM
I bet they kept training on coding, made everything worse along the way, and tried to sweep it under the rug because of the sunk costs.
by simonw on 5/6/25, 10:19 PM
https://gist.github.com/simonw/7ef3d77c8aeeaf1bfe9cc6fd68760...
30,408 input, 8,535 output = 12.336 cents.
8,500 is a very long output! Finally a model that obeys my instructions to "go long" when summarizing Hacker News threads. Here's the script I used: https://gist.github.com/simonw/7ef3d77c8aeeaf1bfe9cc6fd68760...
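The quoted cost is consistent with per-million-token rates of $1.25 for input and $10 for output; those rates are an assumption inferred from the arithmetic, not taken from an official price sheet:

```python
# Reproduce the quoted cost from the token counts in the comment.
# Assumed rates: $1.25 per million input tokens, $10 per million output.
input_tokens, output_tokens = 30_408, 8_535
cost_usd = input_tokens * 1.25 / 1e6 + output_tokens * 10 / 1e6
print(f"{cost_usd * 100:.3f} cents")  # → 12.336 cents
```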
by ionwake on 5/6/25, 3:40 PM
edit> It's gemini-2.5-pro-preview-05-06
edit> Cursor says it doesn't have "good support" yet, but I'm not sure if this is a default message when it doesn't recognize a model? Is this a big deal? Should I wait until it's officially supported by Cursor?
Just trying to save time here for everyone - anyone know the answer?
by franze on 5/6/25, 7:48 PM
by ramesh31 on 5/6/25, 3:27 PM
It really is wild to have seen this happen over the last year. The days of traditional "design-to-code" FE work are completely over. I haven't written a line of HTML/CSS in months. If you are still doing this stuff by hand, you need to adapt fast. In conjunction with an agentic coding IDE and a few MCP tools, weeks worth of UI work are now done in hours to a higher level of quality and consistency with practically zero effort.
by siwakotisaurav on 5/6/25, 3:28 PM
by minzi on 5/7/25, 11:29 AM
by mliker on 5/6/25, 3:49 PM
by crat3r on 5/6/25, 3:38 PM
I'm assuming large companies are mandating it, but ultimately the work that these LLMs seem poised for would benefit smaller companies most and I don't think they can really afford using them? Are people here paying for a personal subscription and then linking it to their work machines?
by jmward01 on 5/6/25, 10:16 PM
by djrj477dhsnv on 5/6/25, 3:42 PM
by zoogeny on 5/7/25, 6:46 PM
That process works pretty well but not perfectly. I have two examples where Gemini suggested improvements during the review stage that were actually breaking.
As an aside, I was investigating the OpenAI APIs and decided to use ChatGPT since I assumed it would have the most up-to-date information on its own APIs. It felt like a huge step back (it was the free model so I cut it some slack). It not only got its own APIs completely wrong [1], but when I pasted the url for the correct API doc into the chat it still insisted that what was written on the page was the wrong API and pointed me back to the page I had just linked to justify its incorrectness. It was only after I prompted that the new API was possibly outside of its training data that it actually got to the correct analysis. I also find the excessive use of emojis to be juvenile, distracting and unhelpful.
1. https://chatgpt.com/share/681ba964-0240-800c-8fb8-c23a2cae09...
by m_kos on 5/6/25, 5:33 PM
- [codes] showing up instead of references,
- raw search tool output sliding across the screen,
- Gemini continuously answering questions asked two or more messages before but ignoring the most recent one (you need to ask Gemini an unrelated question for it to snap out of this bug for a few minutes),
- weird messages including text irrelevant to any of my chats with Gemini, like baseball,
- confusing its own replies with mine,
- not being able to run its own Python code due to some unsolvable formatting issue,
- timeouts, and more.
by artdigital on 5/6/25, 11:44 PM
Just recently a lot of people (me included) got hit with a surprise bill, with some racking up $500 in cost for normal use
I certainly got burnt and removed my API key from my tools to not accidentally use it again
Example: https://x.com/pashmerepat/status/1918084120514900395?s=46
by snthpy on 5/7/25, 4:47 AM
Also how do i understand the OpenAI model names? I don't use OpenAI anymore since Ilya left but when looking at the benchmarks I'm constantly confused by their model names. We have semantic versioning - why do I need an AI or web search to understand your model name?
by thevillagechief on 5/6/25, 3:55 PM
by ramoz on 5/6/25, 4:26 PM
by wewewedxfgdf on 5/6/25, 8:50 PM
You must rename your files to .tsx.txt THEN IT ACCEPTS THEM and works perfectly fine writing TSX code.
This is absolutely bananas. How can such a powerful coding engine have this behavior?
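The workaround described amounts to copying each .tsx source under a .txt suffix before upload. A minimal sketch, assuming the sources live in a src/ directory (the path is illustrative):

```shell
# Copy TSX sources to .tsx.txt so the upload tool accepts them.
# The src/ directory is illustrative; adjust to your project layout.
for f in src/*.tsx; do
  [ -e "$f" ] || continue   # skip if no .tsx files match
  cp -- "$f" "$f.txt"       # src/App.tsx -> src/App.tsx.txt
done
```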
by llm_nerd on 5/6/25, 3:43 PM
Would be ideal if they incremented the version number or the like.
by xnx on 5/6/25, 3:34 PM
by martinald on 5/6/25, 3:43 PM
by seidleroni on 5/7/25, 12:11 PM
by qwertox on 5/6/25, 6:29 PM
It turns a well readable code-snippet of 5 lines into a 30 line snippet full of comments and mostly unnecessary error handling. Code which becomes harder to reason about.
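As a hypothetical illustration of the complaint (both functions are invented for this example, not from the comment), the same lookup written concisely and then in the over-defensive style described above:

```python
# Concise version: readable at a glance.
def port_for(service, table):
    return table.get(service, 8080)

# The over-defensive style described above: comments and error handling
# that add length without adding clarity.
def port_for_verbose(service, table):
    # Validate that the service name is a string.
    if not isinstance(service, str):
        raise TypeError("service must be a string")
    # Validate that the lookup table is a dict.
    if not isinstance(table, dict):
        raise TypeError("table must be a dict")
    # Attempt the lookup, falling back to the default port.
    try:
        return table[service]
    except KeyError:
        # Service not found; return the default port.
        return 8080
```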
But for sysadmin tasks, like dealing with ZFS and LVM, it is absolutely incredible.
by croemer on 5/7/25, 6:33 AM
No you haven't? At least not at 6am UTC on May 7. The PDF still lists (03-25) as the model's date.
What version do I get on gemini.google.com when I select "2.5 Pro (experimental)"? Has anything changed there or not (yet)?
by javiercr on 5/7/25, 3:37 PM
https://github.com/microsoft/vscode-copilot-release/issues/8...
by CSMastermind on 5/6/25, 3:57 PM
At first I was very impressed with its coding abilities, switching off of Claude for it, but recently I've been using GPT o3, which I find is much more concise and generally better at problem solving when you hit an error.
by childintime on 5/6/25, 4:57 PM
Also, why doesn't Ctrl+C work??
by elAhmo on 5/7/25, 9:34 AM
Google releasing a new model (as it has a blog post, announcement, can be chosen in the API) called 2.5 Pro Preview, while having a 2.5 Pro already out for months is ridiculous. I thought it was just OpenAI that is unable to use its dozens of billions of dollars to come up with a normal naming scheme - yet here we are with another trillion dollar company being unable to settle on a versioning scheme that is not confusing.
by niteshpant on 5/7/25, 8:59 PM
Good thinking otherwise.
by oellegaard on 5/6/25, 3:46 PM
by EliasWatson on 5/6/25, 4:00 PM
by seatac76 on 5/7/25, 4:57 PM
by mattmcknight on 5/7/25, 12:50 PM
by cadamsdotcom on 5/6/25, 8:43 PM
Now there’s a big nugget to chew (LLMs), and you’re seeing that latent capability come to life. This awakening feels more bottom-up driven than top-down. Google’s a war machine chugging along nicely in peacetime, but now it’s war again!
Hats off to the engineers working on the tech. Excited to try it out!
by mvdtnz on 5/6/25, 7:08 PM
by nashashmi on 5/6/25, 4:18 PM
How are they now? Sufficiently good? Competent? Competitive? Or limited? My needs are very consumer oriented, not programming/api stuff.
by seydor on 5/7/25, 6:26 AM
by jeswin on 5/6/25, 3:25 PM
by brap on 5/6/25, 3:51 PM
by xbmcuser on 5/6/25, 3:42 PM
by obsolete_wagie on 5/6/25, 5:14 PM
by panarchy on 5/6/25, 4:54 PM
by white_beach on 5/6/25, 3:43 PM
(aider joke)
by gitroom on 5/6/25, 4:02 PM
by ionwake on 5/6/25, 5:16 PM
by alana314 on 5/6/25, 8:45 PM
by xyst on 5/6/25, 4:55 PM
Oof. G and others are way behind