Why Frontier LLMs Struggle to Accurately & Reliably Call Functions

Asher Bond
3 min read · Jun 28, 2024

Here are some of the reasons why today’s frontier large language models still struggle to call the right functions accurately and reliably:

  1. Lack of Explicit Context: An LLM that lacks explicit contextual information cannot reliably identify the appropriate function to call. LLMs rely heavily on the context provided by the user, and that context is often vague or incomplete (see the first sketch after this list).
  2. Ambiguity in Natural Language: Natural language is inherently ambiguous and can be interpreted in multiple ways. Without clear and precise instructions, it can be difficult for an LLM to determine the exact function needed. This is a manageable problem and can be solved with a little domain-specific fine-tuning.
  3. Complexity of Task: If a task requires a combination of multiple functions, or a specific sequence of operations that is dynamic or demands a large plan, the LLM can get distracted from its task. This gap in complex task management, which is exactly what people want from AI, is why we have Langchain, BabyAGI, Elastic Supertransformation Platform, Poe, etc. There’s pretty much a PaaS for every agentic chaining solution, especially if it’s open-source. Determining the correct combination and sequence of calls requires a deep understanding of both the task and the available functions. Functionally atomic programming handles the task-complexity problem by breaking tasks down into atomic functions whose call structure can be communicated clearly and reliably through dynamic contexts (see the second sketch after this list). Complexity increases exponentially when you consider error handling, and exponentially again when you consider actually recovering from an error. But task planning is no big deal, so don’t get complacent about what you think these limitations are right now. Check back later and see what these capabilities look like with a little fine-tuning; you might be shocked by how fast innovation is evolving.
  4. Limited Training on Specific APIs: LLMs are trained on a broad range of text but may have no specific training on the exact API or function calls relevant to a particular task. Limited API training means a limited ability to choose the right functions without additional context or guidance. APIs are lies? Well, maybe that’s taking it too far, but documentation can be sparse and fall behind how the API actually works. We’ve seen solutions where developers reverse-engineer APIs to produce documentation, and in some cases even fully-featured, untethered, working open-source platforms (ex: OpenStack) built around API semantics. Over time this should solve itself as documentation drift begins to undrift thanks to rapid documentation capabilities. The challenge that remains is how to keep models trained and augmented with the latest available and recommended API calls (see the third sketch after this list).
  5. Evolving Nature of APIs: APIs and functions often evolve, with new ones being added and old ones deprecated. Keeping up with these changes is challenging for static models trained on historical data. To put it bluntly, platform developers can pull the rug out from under their application developers by changing things. They should do the right thing, because their application developers are their users, but in the state of nature they evolve quickly. True software design evolution, in the context of true platform developer evolution, requires clearly set developer expectations throughout the whole supply chain, not hyperscale platform developers changing APIs whenever they want while everyone else jumps from one hyperscaler to another. The new technology being born from this need is the cognitive API integrator: an API proxy that autonomously integrates APIs and automates some of the damage control around haphazard or undocumented API changes (see the fourth sketch after this list).
  6. User Intent Understanding: Accurately understanding the user’s intent is crucial; misinterpreting a request can lead to calling the wrong function. There’s a fine line between RLHF-level nagging of the user for repeated clarification and fully autonomous execution of the whole project, done wrong. People do big projects wrong and fail too, by the way. I know because I’m a VC investor. I know because I spent over 25 years as a software engineer. User intent and user understanding are what software design is all about, and LLMs can sometimes skip the important parts. You always gotta stay close to your users.
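
First sketch (for point 1): a minimal illustration of how explicit context fixes function choice. This assumes the OpenAI Python SDK and an API key in the environment, and the function name and schema are hypothetical; the same idea applies to any provider that accepts tool definitions.

```python
# Minimal sketch, assuming the OpenAI Python SDK (pip install openai) and an
# API key in the environment. The point: the model only picks the right
# function when each function's purpose and parameters are spelled out
# explicitly instead of left to vague user phrasing.
from openai import OpenAI

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_order_status",  # hypothetical function
            "description": "Look up the shipping status of an order by its ID.",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The order ID, e.g. 'A-1234'.",
                    }
                },
                "required": ["order_id"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Where is my package? Order A-1234."}],
    tools=tools,
)

# With explicit context the model emits a structured call instead of guessing.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```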
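
Second sketch (for point 3): a toy, self-contained illustration of the functionally atomic idea, where a task is decomposed into small single-purpose functions and an explicit plan (which an LLM would normally produce) sequences them, with per-step retries standing in for the much harder error-recovery problem. Every function name and the plan format are made up for illustration.

```python
# Toy sketch of functionally atomic task decomposition. All names hypothetical.
from typing import Any, Callable

ATOMIC_FUNCTIONS: dict[str, Callable[..., Any]] = {}

def atomic(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function as an atomic, independently callable step."""
    ATOMIC_FUNCTIONS[fn.__name__] = fn
    return fn

@atomic
def fetch_invoice(invoice_id: str) -> dict:
    return {"id": invoice_id, "total": 120.0}

@atomic
def apply_discount(invoice: dict, pct: float) -> dict:
    return {**invoice, "total": invoice["total"] * (1 - pct)}

@atomic
def send_email(invoice: dict) -> str:
    return f"sent invoice {invoice['id']} for ${invoice['total']:.2f}"

def run_plan(plan: list[dict], retries: int = 1) -> Any:
    """Execute a sequence of atomic steps, retrying each step on failure.

    Each step names a registered function; '$prev' refers to the previous
    step's result, which is how dynamic context flows between steps."""
    result = None
    for step in plan:
        fn = ATOMIC_FUNCTIONS[step["fn"]]
        args = [result if a == "$prev" else a for a in step.get("args", [])]
        for attempt in range(retries + 1):
            try:
                result = fn(*args)
                break
            except Exception:
                if attempt == retries:  # recovery is the hard, costly part
                    raise
    return result

plan = [
    {"fn": "fetch_invoice", "args": ["INV-42"]},
    {"fn": "apply_discount", "args": ["$prev", 0.10]},
    {"fn": "send_email", "args": ["$prev"]},
]
print(run_plan(plan))  # -> sent invoice INV-42 for $108.00
```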
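
Third sketch (for point 4): one way to hedge against documentation drift is to stop relying on what the model memorized in training and instead pull the service’s current OpenAPI spec at request time, translating it into tool definitions. The URL and the spec handling below are assumptions; real specs need a more careful translation of parameters into JSON Schema.

```python
# Hedged sketch: build tool definitions from a live OpenAPI spec so the model
# always sees the currently available operations. Spec URL is hypothetical.
import requests

def tools_from_openapi(spec_url: str) -> list[dict]:
    spec = requests.get(spec_url, timeout=10).json()
    tools = []
    for path, methods in spec.get("paths", {}).items():
        for method, op in methods.items():
            tools.append({
                "type": "function",
                "function": {
                    "name": op.get("operationId", f"{method}_{path}"),
                    "description": op.get("summary", ""),
                    # A real version would translate op["parameters"] and
                    # requestBody into a proper JSON Schema here.
                    "parameters": {"type": "object", "properties": {}},
                },
            })
    return tools

# tools = tools_from_openapi("https://api.example.com/openapi.json")
```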
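
Fourth sketch (for point 5): a bare-bones stand-in for the cognitive API integrator, where a hand-written migration table absorbs a breaking route change and field rename so existing callers keep working. The cognitive version would infer this table autonomously; every route and field here is hypothetical.

```python
# Bare-bones proxy sketch: translate deprecated routes and response fields so
# application code survives a platform's breaking change. Routes hypothetical.
import requests

# old route -> (new route, {old_field: new_field} renames in the response)
MIGRATIONS = {
    "/v1/users": ("/v2/accounts", {"user_name": "display_name"}),
}

def proxied_get(base: str, route: str) -> dict:
    """GET through the proxy, rewriting deprecated routes and field names."""
    new_route, renames = MIGRATIONS.get(route, (route, {}))
    payload = requests.get(base + new_route, timeout=10).json()
    # Re-shape the response so the caller still sees the old field names.
    translated = {old: payload.get(new) for old, new in renames.items()}
    passthrough = {k: v for k, v in payload.items() if k not in renames.values()}
    return {**passthrough, **translated}
```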
