by bryanh on 5/1/25, 4:02 PM with 257 comments
by throwup238 on 5/1/25, 5:23 PM
LLMs were always a fun novelty for me until OpenAI Deep Research, which started to actually come up with useful results on more complex programming questions (where I needed to write all the code by hand but had to pull together lots of different libraries and APIs), but it was limited to 10/month on the cheaper plan. Then Google Deep Research upgraded to 2.5 Pro with paid usage limits of 20/day, which allowed me to just throw everything at it, to the point where I'm still working through reports that are a week or more old. Oh, and it searched up to 400 sources at a time, significantly more than OpenAI, which made it quite useful in historical research like identifying first-edition copies of books.
Now Claude is releasing the same research feature with integrations (excited to check out the Cloudflare MCP auth solution and hoping Val.town gets something similar), and a run time of up to 45 minutes. The pace of change was overwhelming half a year ago, now it's just getting ridiculous.
by meander_water on 5/1/25, 10:25 PM
However, there's a major concern that server hosts are on the hook to implement authorization themselves. Ongoing discussion here [1].
[0] https://modelcontextprotocol.io/specification/2025-03-26
[1] https://github.com/modelcontextprotocol/modelcontextprotocol...
by VSerge on 5/1/25, 5:36 PM
In case the above link doesn't work later on, the page for this demo day is here: https://demo-day.mcp.cloudflare.com/
by n_ary on 5/1/25, 5:47 PM
by sebstefan on 5/2/25, 9:14 AM
Edit: Actually right in the tickets themselves would probably be better and not require MCP... but still
by conroy on 5/1/25, 7:32 PM
When I hooked up our remote MCP server, Claude sent a GET request to the endpoint. According to the spec, clients that want to support both transports should first attempt to POST an InitializeRequest to the server URL; if that returns a 4xx, they should then fall back to the older SSE transport.
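That negotiation is easy to sketch. The POST-then-fall-back logic below follows the spec's backwards-compatibility guidance; the exact request body and transport names are abbreviated for illustration:

```python
import json
import urllib.request
import urllib.error

def pick_transport(status_code: int) -> str:
    """Map the status of the initial POST to a transport choice:
    success means the newer Streamable HTTP transport; a 4xx means
    fall back to the older HTTP+SSE transport."""
    if 200 <= status_code < 300:
        return "streamable-http"
    if 400 <= status_code < 500:
        return "http+sse"
    raise RuntimeError(f"unexpected status {status_code}")

def probe_server(url: str) -> str:
    """POST a minimal InitializeRequest (body shape abbreviated) and
    decide which transport to use from the response."""
    body = json.dumps({
        "jsonrpc": "2.0", "id": 1, "method": "initialize",
        "params": {"protocolVersion": "2025-03-26", "capabilities": {}},
    }).encode()
    req = urllib.request.Request(
        url, data=body,
        headers={"Content-Type": "application/json"}, method="POST")
    try:
        with urllib.request.urlopen(req) as resp:
            return pick_transport(resp.status)
    except urllib.error.HTTPError as e:
        return pick_transport(e.code)
```

A server that only speaks the old transport will reject the POST, so the client learns which mode to use from a single round trip.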
by joshwarwick15 on 5/1/25, 5:20 PM
by tkgally on 5/2/25, 1:12 AM
I ran two of the same prompts just now through Anthropic’s new Advanced Research. The results for it and for ChatGPT and Gemini appear below. Opinions might vary, but for my purposes Gemini is still the best. Claude’s responses were too short and simple and they didn’t follow the prompt as closely as I would have liked.
Writing conventions in Japanese and English
https://claude.ai/public/artifacts/c883a9a5-7069-419b-808d-0...
https://docs.google.com/document/d/1V8Ae7xCkPNykhbfZuJnPtCMH...
https://chatgpt.com/share/680da37d-17e4-8011-b331-6d4f3f5ca7...
Overview of an industry in Japan
https://claude.ai/public/artifacts/ba88d1cb-57a0-4444-8668-e...
https://docs.google.com/document/d/1j1O-8bFP_M-vqJpCzDeBLJa3...
https://chatgpt.com/share/680da9b4-8b38-8011-8fb4-3d0a4ddcf7...
The second task, by the way, is just a hypothetical case. Though I have worked as a translator in Japan for many years, I am not the person described in the prompt.
by zoogeny on 5/1/25, 5:57 PM
Perhaps I am just frivolous with my own time, but I tend to use LLMs in a more iterative way for research. I get partial answers, probe for more information, and direct the attention of the LLM away from areas I am familiar with and towards areas where I am less familiar. I feel if I just let it loose for 45 minutes it would spend too much time on areas I do not find valuable.
This seems more like a play for "replacement" rather than "augmentation". Although, I suppose if I had infinite wealth, I could kick off 10+ research agents, each taking 45 minutes, review their output as it became available, then kick off round 2, etc. That is, I could keep my process but run it asynchronously instead of interactively.
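The asynchronous fan-out described here is straightforward to sketch; the agent itself is a stand-in (a real research call would be an API request taking minutes, not a sleep):

```python
import asyncio

async def research_agent(topic: str) -> str:
    """Stand-in for a long-running research job; a real one would
    call a provider API and take up to 45 minutes."""
    await asyncio.sleep(0.01)
    return f"report on {topic}"

async def fan_out(topics: list[str]) -> list[str]:
    # Kick off all agents at once, then review each report as it
    # completes, rather than steering one session interactively.
    tasks = [asyncio.create_task(research_agent(t)) for t in topics]
    reports = []
    for done in asyncio.as_completed(tasks):
        reports.append(await done)
    return reports

reports = asyncio.run(fan_out(["topic-a", "topic-b", "topic-c"]))
```

Round 2 would simply be another `fan_out` call seeded with follow-up questions from reviewing round 1.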
by boh on 5/1/25, 5:12 PM
by WhitneyLand on 5/1/25, 5:34 PM
Hope one day it will be practical to do nightly finetunes of a model per company with all core corporate data stores.
This could create a seamless native model experience that knows about (almost) everything you’re doing.
by kostas_f on 5/1/25, 8:43 PM
Both OpenAI and Google continue to push the frontier on reasoning, multimodality, and efficiency whereas Claude's recent releases have felt more iterative. I'd love to see Anthropic push into model research again.
by rubenfiszel on 5/1/25, 5:19 PM
by myflash13 on 5/2/25, 6:01 AM
by bjornsing on 5/1/25, 7:13 PM
by edaemon on 5/1/25, 6:06 PM
by OJFord on 5/1/25, 6:21 PM
People will say 'aaah ad company' (me too sometimes) but I'd honestly trust a Google AI tool with this way more. Not just because it already has access to my Google Workspace obviously, but just because it's a huge established tech firm with decades of experience in trying not to lose (or have taken) user data.
Even if they get the permissions right and it can only read my stuff when I'm asking it to 'research', now Anthropic has all that and a target on their backs. And I don't even know what 'all that' is: whatever it explored and deemed potentially useful.
Maybe I'm just transitioning into old guy not savvy with latest tech, but I just can't trust any of this 'go off and do whatever seems correct or helpful with access to my filesystem/Google account/codebase/terminal' stuff.
I like chat-only (well, +web) interactions where I control the input and take the output, but even that experience doesn't give me any confidence in granting uncontrolled access to stuff and trusting it to always do something correct and reasonable. It's often confidently incorrect too! I wouldn't give an intern free rein in my shell either!
by bredren on 5/1/25, 5:16 PM
I’m a bit skeptical that it’s gonna work out of the box because of the number of custom fields that seem to be involved in making successful API requests in our case.
But I would welcome, not having to solve this problem. Jira’s interface is among the worst of all the ticket tracking applications I have encountered.
But I have found that an LLM conversation, paired with enough context about what's involved in successful POSTs against the API, allows me to create, update, and relate issues via curl.
It’s begging for a chat based LLM solution like this. I’d just prefer the underlying model not be locked to a vendor.
Atlassian should be solving this for its customers.
by drivingmenuts on 5/1/25, 5:54 PM
This does not sound like it would be learning general information helpful across an industry, but specific, actionable information.
If not available now, is that something that AI vendors are working toward? If so, what is to keep them from using that knowledge to benefit themselves or others of their choosing, rather than the people they are learning from?
While people understand ethics, morals, and legality (even if they sometimes ignore them), an AI does not seem to understand them in a way that might give it pause before taking an action.
by imbnwa on 5/1/25, 5:37 PM
by atonse on 5/1/25, 6:16 PM
Being Apple, they would have to come up with something novel like they did with push (where you have _one_ OS process running that delegates to apps rather than every app trying to handle push themselves) rather than having 20 MCP servers running. But I think if they did this properly, it would be so amazing.
I hope Apple is really re-thinking their absolutely comical start with AI. I hope they regroup and hit it out of the park (like how Google initially stumbled with Bard, but are now hitting it out of the park with Gemini).
by bdd_pomerium on 5/1/25, 11:37 PM
But the thread's security concerns—permissions, data protection, trust—are dead on. There is also a major authN/Z gap, especially for orgs that want MCP to access internal tools, not just curated SaaS.
Pushing complex auth logic (OAuth scopes, policy rules) into every MCP tool feels backwards.
* Access-control sprawl. Each tool reinvents security. Audits get messy fast.
* Static scopes vs. agent drift. Agents chain calls in ways no upfront scope list can predict. We need per-call, context-aware checks.
* Zero-Trust principles mismatch. Central policy enforcement is the point. Fragmenting it kills visibility and consistency.
We already see the cost of fragmented auth: supply-chain hits and credential reuse blowing up multiple tenants. Agents only raise the stakes.
I think a better path (and one, in full disclosure, we're actively working on at Pomerium) is to have:
* One single access point in front of all MCP resources.
* Single sign-on once, then short-lived signed claims flow downstream.
* AuthN separated from AuthZ with a centralized policy engine that evaluates every request, deny-by-default. Evaluation in both directions with hooks for DLP.
* Unified management, telemetry, audit log and policy surface.
I’m really excited about what MCP is putting us in the direction of being able to do with agents.
But without a higher level way to secure and manage the access, I’m afraid we’ll spend years patching holes tool by tool.
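A minimal sketch of the deny-by-default, per-call evaluation argued for above (all names and the rule shape are invented for illustration; a real policy engine would evaluate richer context than a user/tool/action triple):

```python
from dataclasses import dataclass, field

@dataclass
class PolicyEngine:
    """Single access point evaluating every MCP tool call against a
    central policy, instead of scattering scopes across each tool."""
    rules: list = field(default_factory=list)  # allowed (user, tool, action)

    def allow(self, user: str, tool: str, action: str) -> None:
        self.rules.append((user, tool, action))

    def evaluate(self, user: str, tool: str, action: str) -> bool:
        # Checked per call, at request time. Anything not explicitly
        # allowed is denied -- the Zero-Trust default.
        return (user, tool, action) in self.rules

engine = PolicyEngine()
engine.allow("alice", "jira", "read")
assert engine.evaluate("alice", "jira", "read")       # explicitly allowed
assert not engine.evaluate("alice", "jira", "write")  # denied by default
```

Because every call flows through one `evaluate`, you also get a single place to emit audit logs and apply DLP hooks, which is what the fragmented per-tool approach loses.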
by belter on 5/1/25, 6:21 PM
by rvz on 5/1/25, 6:41 PM
Increasing the number of "connections" to the LLM increases the risk of a leak, and it gives you more rope to hang yourself with when at least one connection becomes problematic.
Now is a great time to be a LLM security consultant.
by pton_xd on 5/1/25, 6:06 PM
Give us an LLM with better reasoning capabilities, please! All this other stuff just feels like a distraction.
by 6stringmerc on 5/1/25, 5:41 PM
by mkagenius on 5/1/25, 5:37 PM
by jes5199 on 5/2/25, 4:35 AM
which is to say: I’m not sure it actually wins, technically, over the OpenAI/OpenAPI idea from last year, which was at least easy to understand
by game_the0ry on 5/2/25, 1:42 PM
Btw, that speaks to how important it is to get clear business requirements for work.
by sagarpatil on 5/2/25, 5:07 AM
by Surac on 5/2/25, 5:09 AM
by artur_makly on 5/1/25, 9:54 PM
So I tested a basic prompt:
1. go to : SOME URL
2. copy all the content found VERBATIM, and show me all that content as markdown here.
Result: it FAILED miserably with a few basic HTML pages; it simply is not loading all the page content in its internal browser.
What worked well:
- Gemini 2.5 Pro (Experimental)
- GPT 4o-mini
- Gemini 2.0 Flash (not verbatim but summarized)
by ChicagoDave on 5/1/25, 6:14 PM
I love MCP (it’s way better than plain Claude) but even that runs into context walls.
by rafram on 5/1/25, 6:41 PM
by weinzierl on 5/1/25, 6:41 PM
Sometimes I want a pure model answer and I used to use Claude for that. For research tasks I preferred ChatGPT, but I found that you cannot reliably deny it web access. If you are asking it a research question, I am pretty sure it uses web search, even when "Search" and "Deep Research" are off.
by cruffle_duffle on 5/1/25, 6:16 PM
by elia_42 on 5/2/25, 9:24 AM
I think we are coming to a new automated technology ecosystem where LLMs will orchestrate many different parts of software with each other, speeding up the launch, evolution and monitoring of products.
by gonzan on 5/1/25, 7:47 PM
by hdjjhhvvhga on 5/1/25, 7:18 PM
by jarbus on 5/1/25, 4:59 PM
by the_clarence on 5/1/25, 8:41 PM
That + the agent SDK of openAI makes creating agentic flow so easy.
On the other hand you're kinda forced to run these tools / MCP servers in their own process which makes no sense to me.
by zhyder on 5/1/25, 5:22 PM
by abhisek on 5/2/25, 10:32 AM
by dimgl on 5/1/25, 7:05 PM
Even my wife, who normally used Claude to create interesting recipes to bake cookies, has noticed a huge downgrade in 3.7.
by davee5 on 5/1/25, 6:14 PM
> a new way to connect your apps and tools to Claude. We're also expanding... with an advanced mode that searches the web.
The notion of software eating the world, and AI accelerating that trend, always seems to forget that The World is a vast thing, a physical thing, a thing that by its very nature can never be fully consumed by the relentless expansion of our digital experiences. Your worldview /= the world.
The cynic would suggest that the teams that build these tools should go touch grass, but I think that misses the mark. The real indictment is of the sort of thinking that improvements to digital tools [intelligences?] in and of themselves can constitute truly substantial and far reaching changes.
The reach of any digital substrate is inherently limited, and this post unintentionally lays that bare. And while I hear accelerationists invoking "robots" as the means for digital agents to extend their impact deeper into the real world, I suggest this is the retort of those who spend all day in apps, tools, and the web. The impact and potential of AI are indeed enormous, but some perspective remains warranted, and occasional injections of humility and context would probably do these teams some good.
by indigodaddy on 5/1/25, 6:07 PM
by gianpaj on 5/1/25, 6:10 PM
by MarkMarine on 5/2/25, 6:47 AM
by deanc on 5/1/25, 9:41 PM
Keyword search is such a naive approach to information discovery and information sharing, and it renders Confluence in big orgs useless. Being able to discuss and ask questions is a more natural way of unpacking problems.
by behnamoh on 5/1/25, 4:06 PM
by franze on 5/2/25, 6:36 AM
by noisy_boy on 5/2/25, 2:08 AM
by arjie on 5/1/25, 4:58 PM
by abhisek on 5/1/25, 5:58 PM
by Nijikokun on 5/1/25, 6:06 PM
by ausbah on 5/2/25, 3:15 PM
by worldsayshi on 5/1/25, 7:53 PM
I might not dare to add an integration if it can potentially add a bunch of stuff to the backing systems without my approval. Confirmations and review should be part of the protocol.
by todsacerdoti on 5/1/25, 8:37 PM
by jngiam1 on 5/2/25, 4:28 AM
by xnx on 5/1/25, 5:09 PM
Lots of people are making moves in this space (including Anthropic), but nothing has broken through to the mainstream.