from Hacker News

LLM function calls don't scale; code orchestration is simpler, more effective

by jngiam1 on 5/21/25, 5:18 PM with 101 comments

  • by madrox on 5/22/25, 12:11 AM

    I've been saying for two years that "any sufficiently advanced agent is indistinguishable from a DSL."

    Rather than asking an agent to internalize its algorithm, you should teach it an API and then ask it to design an algorithm that you can run in user space. There are very few situations where I think it makes sense (for cost or accuracy) for an LLM to internalize its algorithm. It's like asking an engineer to step through a function in their head instead of just running it.
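
    For illustration, a minimal sketch of that split, with a hypothetical crm API: the model is shown the API surface and asked to write the algorithm, and the host runs it.

      # Hypothetical API the model is taught; it writes the algorithm below,
      # and the host executes it rather than the model stepping through the
      # loop "in its head".
      def top_overdue_invoices(crm, limit=10):
          invoices = crm.list_invoices(status="overdue")   # plain API call
          invoices.sort(key=lambda inv: inv["amount"], reverse=True)
          return [inv["id"] for inv in invoices[:limit]]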

  • by jacob019 on 5/22/25, 1:52 AM

    I've been building agentic systems for my ecommerce business. I evaluated smolagents. It's elegant and has a lot of appealing qualities, but it adds a lot of complexity to the system. For some tasks it's perfect; dynamic reporting environments that can sort and aggregate data without a schema might be a good one. For most tasks it's just overkill. Gemini and OpenAI both offer Python interpreters as tools, which can cover a lot of the use cases for code agents.

    It's true that cramming endless messages onto a stack of tool calls and interactions is not scalable; that is not a good way to use these tools. Most agentic workflows are short-lived. Complexity is managed with structure and discipline. These are well-known problems in software development, and the old lessons still apply to the new tools. Function calls can absolutely scale well in an agentic system, or they can become a mess, just like they can in any codebase.

    Personally, building a system that works well is as much about managing cognitive load for the developer as it is about managing control flow and runtime performance. A simple solution that works well enough is usually superior to a clever solution with great performance. Composing function calls is the simple solution. Structured data can still be parsed and transformed the old-fashioned way; if the structure is unknown, even the cheap models are great at parsing.

    Managing complexity in an agentic system can be broken down into a problem of carefully managing application state. The message stack can be manipulated as needed to feed the models the active context. It's memory management in a constrained environment.
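
    A minimal sketch of that last point, with a made-up message/role shape: keep the system prompt and the most recent turns, and replace stale tool output with a short placeholder instead of letting it accumulate.

      def prune_messages(messages: list[dict], keep_recent: int = 6) -> list[dict]:
          """Drop bulky tool results that are no longer part of the active context."""
          if len(messages) <= keep_recent + 1:
              return messages
          head, tail = messages[:1], messages[-keep_recent:]   # system prompt + recent turns
          middle = [
              {**m, "content": "[tool output elided]"} if m["role"] == "tool" else m
              for m in messages[1:-keep_recent]
          ]
          return head + middle + tail
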
  • by mehdibl on 5/21/25, 8:35 PM

    The issue is not in function calls but in HOW these MCPs are designed and used.

    Most MCPs just replicate an API, returning blobs of data.

    1. This burns a lot of input context on formatting as JSON, escaping JSON that is already inside JSON.
    2. It carries a lot of irrelevant information that could simply be dropped.

    So the issue is the MCP tool. It should instead flatten the data as much as possible, since it goes through JSON encoding again, and remove unneeded fields.

    The MCP SaaS offerings here are mainly API gateways.

    That brings all this noise! And most of all, they are not optimizing their MCPs.
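
    For illustration, a rough sketch of that flattening inside an MCP tool (field names are made up): keep only what the model needs, at the top level, instead of relaying the raw nested payload.

      def flatten_issue(raw: dict) -> dict:
          # Return a small, flat dict instead of nesting the raw API blob
          # inside the JSON-encoded tool result.
          return {
              "title": raw["fields"]["summary"],
              "status": raw["fields"]["status"]["name"],
              "assignee": (raw["fields"].get("assignee") or {}).get("displayName"),
          }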

  • by obiefernandez on 5/21/25, 7:34 PM

    My team at Shopify just open sourced Roast [1] recently. It lets us embed non-deterministic LLM jobs within orchestrated workflows. Essential when trying to automate work on codebases with millions of lines of code.

    [1] https://github.com/shopify/roast

  • by hintymad on 5/21/25, 6:40 PM

    I feel that the optimal solution is hybrid, not polarized. That is, we use a deterministic approach as much as we can, but leverage LLMs to handle the remaining complex parts that are hard to spec out or describe deterministically.
  • by padjo on 5/21/25, 9:07 PM

    Sorry I’ve been out of the industry for the last year or so, is this madness really what people are doing now?
  • by codyb on 5/21/25, 7:47 PM

    I'm slightly confused as to why you'd use an LLM to sort structured data in the first place?
  • by avereveard on 5/21/25, 5:54 PM

    That's kind of the entire premise of Hugging Face's smolagents, and while it does work really well when it works, it also increases the challenge of rolling back failed actions.

    I guess one could in principle wrap the entire execution block in a distributed transaction, but LLMs try to write code that is robust, which works against this pattern, as it makes failures hard to understand.

  • by bguberfain on 5/21/25, 8:41 PM

    I think there may be another solution for this: have the LLM write valid code that calls the MCPs as functions. Think of it as a Python script where each MCP is mapped to a function. A simple example:

      def process(param1, param2):
          my_data = mcp_get_data(param1)
          sorted_data = mcp_sort(my_data, by=param2)
          return sorted_data
  • by CSMastermind on 5/21/25, 8:35 PM

    LLMs clearly struggle when presented with JSON, especially large amounts of it.

    There's nothing stopping your endpoints from returning data in some other format. LLMs actually seem to excel with XML, for instance. But you could just use a template to define some narrative text.
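
    For instance, the same records could be rendered as XML or as plain narrative text with an ordinary template before they reach the model; a quick sketch, not tied to any particular framework:

      users = [{"id": 1, "name": "Ada", "plan": "pro"},
               {"id": 2, "name": "Bob", "plan": "free"}]

      # XML rendering
      as_xml = "\n".join(
          f'<user id="{u["id"]}" plan="{u["plan"]}">{u["name"]}</user>' for u in users
      )

      # Or plain narrative text
      as_text = "\n".join(
          f'{u["name"]} (id {u["id"]}) is on the {u["plan"]} plan.' for u in users
      )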

  • by arjunchint on 5/21/25, 9:48 PM

    I am kind of confused: why can't you just create a new MCP tool that encapsulates parsing and the other required steps together in a code block?

    Wouldn't this be more reliable than expecting the LLM to generate working code 100% of the time?

  • by stavros on 5/21/25, 10:19 PM

    I would really like to see output-aware LLM inference engines. For example, imagine if the LLM output some tokens that meant "I'm going to do a tool call now", and the inference engine (e.g. llama.cpp) changed the grammar on the fly so the next token could only be valid for the available tools.

    Or, if I gave the LLM a list of my users and asked it to filter based on some criteria, the grammar would change to only output user IDs that existed in my list.

    I don't know how useful this would be in practice, but at least it would make it impossible for the LLM to hallucinate for these cases.
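
    For a rough feel of the second idea, a character-level sketch (a real engine would constrain tokens, not characters): the "grammar" is recomputed on the fly from whichever IDs actually exist.

      def allowed_next_chars(generated: str, valid_ids: list[str]) -> set[str]:
          """Characters that keep the output a prefix of some valid user ID."""
          return {
              vid[len(generated)]
              for vid in valid_ids
              if vid.startswith(generated) and len(vid) > len(generated)
          }

      # With ids ["u_1042", "u_1077", "u_23"], after the model has emitted "u_10"
      # only "4" or "7" may follow, so "u_10x" can never be hallucinated.
      print(allowed_next_chars("u_10", ["u_1042", "u_1077", "u_23"]))  # {'4', '7'}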

  • by norcalkc on 5/21/25, 9:41 PM

    > Allowing an execution environment to also access MCPs, tools, and user data requires careful design to where API keys are stored, and how tools are exposed.

    If your tools are calling APIs on behalf of users, it's better to use OAuth flows so that users of the app give explicit consent to the APIs/scopes they want the tools to access. That way, tools use scoped tokens to make calls instead of hard-to-manage, hard-to-maintain API keys (or even client credentials).

  • by brainless on 5/22/25, 6:23 AM

    This is something I have been attempting for quite a while now. One simple tool I started is a deterministic data extraction system where AI helps find the data to be extracted, but then the code tries to "template" it. Once we have the template, extraction on any similar string happens deterministically.

    Think of extracting parts of an email subject. An LLM is great at going through unseen subject lines and telling us what can be extracted. We ask the LLM what it found, and where. For things like dates, times, cities, countries, etc., we can then re-run deterministically on new strings to extract them.

    https://github.com/pixlie/determined
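
    A rough sketch of that split, with the LLM's one-time findings hard-coded here: the model inspects a sample subject once and reports what sits where, and a plain regex template handles every later extraction deterministically.

      import re

      # Suppose the LLM looked at "Order #4821 shipped to Berlin on 2024-05-03"
      # and reported: order_id is the digits after '#', city follows 'to',
      # the date is ISO at the end. That report becomes a fixed template:
      TEMPLATE = re.compile(
          r"Order #(?P<order_id>\d+) shipped to (?P<city>[A-Za-z ]+) on (?P<date>\d{4}-\d{2}-\d{2})"
      )

      def extract(subject: str):
          m = TEMPLATE.match(subject)
          return m.groupdict() if m else None   # no LLM call on this path

      print(extract("Order #5190 shipped to Lisbon on 2024-06-11"))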

  • by darkteflon on 5/21/25, 9:43 PM

    We’ve been using smolagents, which takes this approach, and are impressed.

    Slight tangent, but as a long term user of OpenAI models, I was surprised at how well Claude Sonnet 3.7 through the desktop app handles multi-hop problem solving using tools (over MCP). As long as tool descriptions are good, it’s quite capable of chaining and “lateral thinking” without any customisation of the system or user prompts.

    For those of you using Sonnet over API: is this behaviour similar there out of the box? If not, does simply pasting the recently exfiltrated[1] “agentic” prompt into the API system prompt get you (most of the way) there?

    [1] https://news.ycombinator.com/item?id=43909409

  • by iLoveOncall on 5/21/25, 10:01 PM

    That's MCP for you.

    MCP is literally just a wrapper around an API call, but because it has some LLM buzz sprinkled on top, people expect it to do some magic, when they wouldn't expect the same magic from the underlying API.

  • by abelanger on 5/21/25, 5:57 PM

    > Most execution environments are stateful (e.g., they may rely on running Jupyter kernels for each user session). This is hard to manage and expensive if users expect to be able to come back to AI task sessions later. A stateless-but-persistent execution environment is paramount for long running (multi-day) task sessions.

    It's interesting how architectural patterns built at large tech companies (for completely different use-cases than AI) have become so relevant to the AI execution space.

    You see a lot of AI startups learning the hard way the value of event sourcing and (eventually) durable execution, but these patterns aren't commonly adopted on Day 1. I blame the AI frameworks.

    (disclaimer - currently working on a durable execution platform)
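
    The event-sourcing take on a "stateless but persistent" session is roughly: persist each step as an event, and rebuild state on demand by replaying the log (durable-execution engines then add retries and exactly-once guarantees on top). A minimal sketch with a made-up event shape:

      import json

      def append_event(log_path: str, event: dict) -> None:
          with open(log_path, "a") as f:
              f.write(json.dumps(event) + "\n")

      def rebuild_session(log_path: str) -> dict:
          """No live kernel to keep around: state is a pure function of the log."""
          state = {"messages": [], "variables": {}}
          with open(log_path) as f:
              for line in f:
                  event = json.loads(line)
                  if event["type"] == "message":
                      state["messages"].append(event["content"])
                  elif event["type"] == "set_variable":
                      state["variables"][event["name"]] = event["value"]
          return state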

  • by visarga on 5/21/25, 6:38 PM

    Maybe we just need models that can reference spans by start:end range. Then they can pass arguments by reference instead of explicit quotation. We can use these spans as answers in extractive QA tasks, or as arguments for a code block, or to construct a graph from pointers and do graph computation. If we define a "hide span" operation, the LLM could dynamically open and close its context, which could lead to context size reduction. Basically: add explicit indexing to context memory and make it powerful, and the LLM can act like a CPU.
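
    A toy version of that addressing scheme, with spans picked by hand here: arguments are start:end offsets into the context rather than re-quoted text, so a reference costs two integers instead of a copy of the span.

      context = "Invoice 1042 was issued to ACME Corp on 2024-05-03 for $1,200."

      def deref(span: tuple[int, int]) -> str:
          start, end = span
          return context[start:end]

      # The model emits spans instead of quoting, e.g.:
      answer_spans = {"customer": (27, 36), "date": (40, 50)}
      print({k: deref(v) for k, v in answer_spans.items()})
      # {'customer': 'ACME Corp', 'date': '2024-05-03'}
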
  • by fullstackchris on 5/21/25, 8:17 PM

    This is exactly what I've encountered, at least with Claude: it writes out huge artifacts (static ones retrieved from the file system or wherever) character for character. What I'm going to try this weekend is integrating a Redis cache or SQLite into the MCP tool calls, so Claude doesn't have to write everything out character by character... no idea if it will work as expected...

    also looking into "fire and forget" tools, to see if that is even possible
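
    One shape that could take (tool names and storage entirely hypothetical): tools put large artifacts into a local store and hand the model a short key, so content is passed around by reference instead of being retyped.

      import hashlib, sqlite3

      db = sqlite3.connect("artifacts.db")
      db.execute("CREATE TABLE IF NOT EXISTS artifacts (key TEXT PRIMARY KEY, body BLOB)")

      def put_artifact(body: bytes) -> str:
          key = hashlib.sha256(body).hexdigest()[:12]
          db.execute("INSERT OR IGNORE INTO artifacts VALUES (?, ?)", (key, body))
          db.commit()
          return key   # the model only ever sees this short handle

      def get_artifact(key: str) -> bytes:
          row = db.execute("SELECT body FROM artifacts WHERE key = ?", (key,)).fetchone()
          return row[0] if row else b""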

  • by zackify on 5/22/25, 3:34 AM

    It’s because MCP return types are so basic. It’s text. Or image. Or one other type in the protocol I forget.

    It’s not well thought out. I’ve been building one with the new auth spec and their official code and tooling is really lacking.

    It could have been so much simpler and straight forward by now.

    Instead you have 3 different server types, and one (SSE) is already deprecated. It's almost funny.

  • by darkteflon on 5/21/25, 9:54 PM

    What are the current best options for sandboxed execution environments? HuggingFace seems to have a tie-up with E2B, although by default smolagents runs something ephemeral in-process. I feel like there must be a good Docker container solution to this that doesn’t require signing up to yet another SaaS. Any recommendations?
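
    A bare-bones local option, assuming Docker is installed: run each snippet in a throwaway container with no network and a read-only filesystem, no extra SaaS involved (whether that is isolated enough depends on your threat model).

      import subprocess

      def run_sandboxed(code: str, timeout: int = 30) -> str:
          """Execute untrusted Python in a disposable, network-less container."""
          result = subprocess.run(
              ["docker", "run", "--rm", "--network=none", "--memory=256m",
               "--read-only", "python:3.12-slim", "python", "-c", code],
              capture_output=True, text=True, timeout=timeout,
          )
          return result.stdout if result.returncode == 0 else result.stderr

      print(run_sandboxed("print(sum(range(10)))"))  # 45
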
  • by deadbabe on 5/21/25, 11:30 PM

    I’m confused as to why no one is just having LLMs dynamically produce and expose new tools on the fly as combinations of many small tools or even write new functions from scratch, to handle cases where there isn’t an ideal tool to process some input with one efficient tool call.
  • by quotemstr on 5/22/25, 7:57 AM

    In which the industry reinvents the concept of a schema-ful API surface like the kind we've had for 30 years. Rediscovering the past shouldn't be revolutionary.
  • by yahoozoo on 5/21/25, 11:03 PM

    In the example request, they want a list of issues in their project but don’t need the ID of each issue. But, what about when you want a list of issues and DO want the ID?
  • by koakuma-chan on 5/21/25, 8:05 PM

    > TL;DR: Giving LLMs the full output of tool calls is costly and slow.

    Is this true for all tool calls? Even if the tool returns little data?