Large language models are going to form the foundation of a burgeoning industry of autonomous software agents that can act as virtual assistants, internal search engines, and digital bots that will help us all be more productive. As large language models grow, their creators imbue them with increasingly sophisticated features that help them better respond to prompts. One of the latest of these is ChatGPT's "function calling" capability.
In June 2023, OpenAI, the maker of ChatGPT, added a feature to its API that lets developers specify a list of Python functions their code supports and lets ChatGPT respond with the function to invoke, along with its parameters, automatically. Developers were immediately excited, since this gave them a new way to let ChatGPT drive their code and build more sophisticated applications. But while automatic "function calling" is interesting, the way it's implemented in LLMs today won't be enough for many businesses and production-ready applications, and developers should be prepared to choose a more robust technique for creating their agents.
Diving deeper into function calling
ChatGPT's function calling, which is a bit of a misnomer since ChatGPT doesn't actually call functions for you, is an API-specific capability that lets developers describe the Python functions their app supports as a list of JSON objects. Developers send this list in the API's new "functions" parameter. ChatGPT then reads the user's prompt and determines whether a function call is appropriate. If it is, it responds with the name of the function to call and a JSON-formatted string of arguments, based on the LLM's understanding of the request. Your code is then responsible for parsing that response and invoking the matching function, for example by assembling a call string and running it with Python's exec().
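Here's a minimal sketch of that flow, written against the 2023-era openai Python package (pre-1.0, which used openai.ChatCompletion); the get_current_weather function and its schema are illustrative, and this version dispatches by name rather than exec()-ing a generated string.

```python
import json
import openai

# Hypothetical local function that the app supports.
def get_current_weather(city: str) -> str:
    return f"It is sunny in {city}."

# JSON description of the function, sent via the "functions" parameter.
functions = [
    {
        "name": "get_current_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. Boston"}
            },
            "required": ["city"],
        },
    }
]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=[{"role": "user", "content": "What's the weather in Boston?"}],
    functions=functions,
    function_call="auto",
)

message = response["choices"][0]["message"]
if message.get("function_call"):
    # The model returns a function name and a JSON string of arguments;
    # the application is still the one that actually invokes the function.
    name = message["function_call"]["name"]
    args = json.loads(message["function_call"]["arguments"])
    if name == "get_current_weather":
        print(get_current_weather(**args))
```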
In an ideal world, ChatGPT would provide the perfect code to run in any given scenario, allowing your application to simply execute a series of requests, responses, and actions using ChatGPT, ultimately enabling your software to do anything and everything. Unfortunately, function calling has a few downsides that might make it harder to use for business-specific applications.
Issues with function calling
If you've got a simple list of functions that don't need any special features, like OAuth authentication, adding Bearer tokens to your headers, or pulling user-specific data, then ChatGPT's built-in function calls can work pretty well. Calling a simple weather service API with today's date might be all you need for your app. But if you need to make more complex calls that have prerequisites (like the user-specific data mentioned above), or that require additional steps or data to complete, function calls might actually make your life harder.
For example, let's say your agent integrates with Google Drive and Gmail as part of your internal search engine. When one of your users asks your bot to find a PDF in your "Contracts" folder, your bot would need to use Google Drive's search and filter functionality to get a list of relevant documents, and then search through the contents of those documents to find the right one. But before your code can even connect to Google Drive, it needs to instantiate a search client and retrieve the OAuth token associated with the requesting user, and if that token has expired, it needs to refresh it before doing anything else. This is a multi-step process, and the function calling methodology is designed for more simplistic use cases.
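To make the point concrete, here's a rough sketch of the kind of setup a single Drive search requires, assuming google-auth and google-api-python-client; loading the user's stored credentials and scoping the query to a specific folder are omitted.

```python
from google.auth.transport.requests import Request
from googleapiclient.discovery import build

def search_contracts(creds, query: str):
    # Step 1: the user's stored OAuth token may have expired; refresh it first.
    if creds.expired and creds.refresh_token:
        creds.refresh(Request())

    # Step 2: build a Drive client for this specific user.
    service = build("drive", "v3", credentials=creds)

    # Step 3: search for matching PDFs (folder filtering omitted for brevity).
    results = service.files().list(
        q=f"name contains '{query}' and mimeType='application/pdf'",
        fields="files(id, name)",
    ).execute()
    return results.get("files", [])
```

None of these prerequisite steps are something the model can perform on its own; they all have to live somewhere in your code.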
Similarly, if your agent needs to support database calls, you might want the function calling feature to invoke a function that runs a simple SQL query based on the user's request. But without knowing what your database schema looks like, ChatGPT wouldn't know what kind of SQL query to generate, so it might end up hallucinating something that doesn't work (another issue with function calls). To get around this, you'd need to first pass in your database's schema so that ChatGPT has the right context when deciding how to invoke your function in the first place.
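A sketch of that workaround, assuming a SQLite database and the same 2023-era openai package: pull the schema from the database and prepend it to the prompt before asking for a query.

```python
import sqlite3
import openai

def get_schema(conn: sqlite3.Connection) -> str:
    # Pull the CREATE TABLE statements so the model knows what columns exist.
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type='table'"
    ).fetchall()
    return "\n".join(r[0] for r in rows if r[0])

def ask_for_query(conn: sqlite3.Connection, user_request: str) -> str:
    schema = get_schema(conn)
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        messages=[
            {"role": "system",
             "content": f"You write SQL queries for this schema:\n{schema}"},
            {"role": "user", "content": user_request},
        ],
    )
    return response["choices"][0]["message"]["content"]
```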
APIs might also request Bearer Tokens, user-specific API tokens, context-specific JSON data, or any number of other nuances that might not be available to ChatGPT, or might be onerous to provide it, when you pass in a list of functions that it can call.
All of these nuances make it difficult to use function calls as they're designed today. You might argue that, for each of the cases above, you could bake the prerequisites and required steps into the actual function that ChatGPT would invoke, but imagine doing this with an agent that supports hundreds of custom tools, each of which supports 10-30 different API calls or specific pieces of functionality. You'd have to design your code in a highly nuanced, and arguably unmaintainable, way.
The issues above may not even matter if you're not using Python, JavaScript, or a similar language that lets you execute a text string as code; in that case, you're out of luck with function calls to begin with.
It's clear that, as they're structured today, ChatGPT's function calls are unlikely to support sophisticated enterprise or business applications. So if you're designing an AutoGPT-style agent, you might want to go with a different strategy. Fortunately, there's a strategy that's as powerful as function calling but without the downsides, and you can implement it yourself easily: smart tool selection.
The alternative to function calling
One of the nice things about using LLMs for function calling is that they make the decision of which action your code should take next based on the context you provide. That intelligent decision-making ability is crucial to building an autonomous agent, but if you decouple that capability from the lower-level execution of the actual code that needs to be run, you can create an extremely capable, autonomous agent — one whose execution details you control more closely, providing it with the guardrails and hand-holding it needs to complete individual tasks.
In this "tool selection" framework, you use ChatGPT to select the next best tool to run, rather than have it provide the specific code to run next. The difference is slight but important. By having ChatGPT select your next tool, your code can take ownership for obtaining all of the pre-requisite data and running all of the necessary steps it needs while still keeping your code well-structured and maintainable.
The idea is simple. Rather than passing in descriptions of the functions ChatGPT can call, you pass in descriptions of the tools it has available. You provide a name, a description, and optional input and output formats for each tool, and you ask ChatGPT to select the next tool to run.
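Here's a minimal sketch of what that selection step might look like, again using the 2023-era openai package; the tool names and the routing prompt are illustrative, not a prescribed format.

```python
import openai

# Hypothetical registry of tools the agent supports.
TOOLS = [
    {"name": "google_drive_search",
     "description": "Finds documents in the user's Google Drive."},
    {"name": "database_query",
     "description": "Answers questions using the internal SQL database."},
    {"name": "send_email",
     "description": "Sends an email on behalf of the user via Gmail."},
]

def select_tool(user_request: str) -> str:
    # Describe each tool, then ask the model to pick exactly one by name.
    tool_list = "\n".join(f"- {t['name']}: {t['description']}" for t in TOOLS)
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        messages=[
            {"role": "system",
             "content": "You are a router. Given the available tools below, "
                        "reply with only the name of the single best tool "
                        "for the user's request.\n" + tool_list},
            {"role": "user", "content": user_request},
        ],
    )
    return response["choices"][0]["message"]["content"].strip()
```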
When it responds with the next tool, your code essentially hands off control to that tool, which is responsible for ensuring it executes properly. For example, the tool may have a list of sub-functions that it supports, and once it gains control of the execution of your app, it can ask ChatGPT which of its sub-functions it should run. It might not even need ChatGPT to figure out what to do next: if your code provides it the object representing the current user, and the only function it supports is retrieving data from one endpoint for that user, it already has everything it needs.
As long as the selected tool has all of the contextual information and runtime objects it needs to do its job, the tool's code can be responsible for all of the setup and teardown tasks related to its execution. The interface between your tool and the larger application can be as simple as a single method, leading to a cleaner and more maintainable codebase. This setup gives you the same power of using LLMs to decide what to do next without the issues that come from having them run arbitrary code that may not perform all of the necessary setup tasks. It works with whatever language you're using. And as an added benefit, you'll never run an arbitrary string as code, which provides added security and confidence that your application won't go totally off the rails.
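One way that single-method interface might look, as a sketch only; the Tool class and its methods are hypothetical, not an established API, and the tool name here is assumed to come from an LLM-based router like the select_tool sketch above.

```python
from abc import ABC, abstractmethod
from typing import Dict

class Tool(ABC):
    """Hypothetical single-method interface between the agent and each tool."""

    name: str = ""
    description: str = ""

    @abstractmethod
    def run(self, user_request: str, context: dict) -> str:
        """Handle all setup, execution, and teardown, then return a result."""

class GoogleDriveSearchTool(Tool):
    name = "google_drive_search"
    description = "Finds documents in the user's Google Drive."

    def run(self, user_request: str, context: dict) -> str:
        # The tool owns its prerequisites: load the user's credentials,
        # refresh them if they've expired, then perform the search.
        creds = context.get("google_credentials")
        # ... refresh the token, build the Drive client, run the query ...
        return "Found 3 matching contracts."

def handle_request(tool_name: str, user_request: str, context: dict,
                   tools: Dict[str, Tool]) -> str:
    # tool_name comes from the LLM-based router; once selected, the tool
    # takes over and manages its own execution end to end.
    return tools[tool_name].run(user_request, context)
```

The agent's outer loop just maps the router's choice onto this registry and calls run(), while each tool keeps its setup logic out of the rest of the codebase.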
Agents require a lot of work
While the newly created world of autonomous agents has a lot of potential, we're only in the early days, and creating high-functioning, robust agents will require a lot of engineering effort. If you're creating your own agent today, it's important to architect it in a way that lets it grow with you in the future. Choosing the right strategy for how it executes its functionality is crucial to ensuring it operates properly in all scenarios. We also recommend adding lots of logging along the way, so that you can always trace what an agent did, and why.
If you're interested in autonomous agents for search, information retrieval, virtual assistance, or business productivity, but you don't want to build your own, we've got you covered at Locusive. Our team can help you get up and running right away; just reach out.
---