Many LLMs do function calling (aka tool use) by taking advantage of JSON schema.
We’ll use docments to make getting JSON schema from Python functions as ergonomic as possible. Each parameter (and the return value) should have a type, and a docments comment with the description of what it is. Here’s an example:
def silly_sum(
    a:int, # First thing to sum
    b:int=1, # Second thing to sum
    c:list[int]=None, # A pointless argument
)->int: # The sum of the inputs
    "Adds a + b."
    return a + b
This is what docments makes of that:
d = docments(silly_sum, full=True)
d
{ 'a': { 'anno': <class 'int'>,
         'default': <class 'inspect._empty'>,
         'docment': 'First thing to sum'},
  'b': {'anno': <class 'int'>, 'default': 1, 'docment': 'Second thing to sum'},
  'c': {'anno': list[int], 'default': None, 'docment': 'A pointless argument'},
  'return': { 'anno': <class 'int'>,
              'default': <class 'inspect._empty'>,
              'docment': 'The sum of the inputs'}}
Note that this is an AttrDict so we can treat it like an object, or a dict:
d.a.docment, d['a']['anno']
('First thing to sum', int)
Exported source
def _types(t:type)->tuple[str,Optional[str]]:
    "Tuple of json schema type name and (if appropriate) array item name."
    if t is empty: raise TypeError('Missing type')
    tmap = {int:"integer", float:"number", str:"string", bool:"boolean", list:"array", dict:"object"}
    tmap.update({k.__name__: v for k, v in tmap.items()})
    if getattr(t, '__origin__', None) in (list,tuple):
        args = getattr(t, '__args__', None)
        item_type = "object" if not args else tmap.get(t.__args__[0].__name__, "object")
        return "array", item_type
    # if t is a string like 'int', directly use the string as the key
    elif isinstance(t, str): return tmap.get(t, "object"), None
    # if t is the type itself and a container
    elif get_origin(t): return tmap.get(get_origin(t).__name__, "object"), None
    # if t is the type itself like int, use the __name__ representation as the key
    else: return tmap.get(t.__name__, "object"), None
This internal function is needed to convert Python types into JSON schema types.
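For illustration, here's a minimal standalone sketch of the core mapping. The `py_to_json_type` helper is hypothetical (not part of the library); unlike `_types`, it skips the string-annotation and `empty`-sentinel cases, but the mapping table is the same:

```python
from typing import Optional, get_origin, get_args

# Same base mapping as `_types` uses (a sketch, not the library's code)
_TMAP = {int: "integer", float: "number", str: "string",
         bool: "boolean", list: "array", dict: "object"}

def py_to_json_type(t: type) -> tuple[str, Optional[str]]:
    "JSON schema type name, plus the item type for parameterized arrays."
    if get_origin(t) in (list, tuple):       # e.g. list[int], tuple[str, ...]
        args = get_args(t)
        item = _TMAP.get(args[0], "object") if args else "object"
        return "array", item
    return _TMAP.get(t, "object"), None

print(py_to_json_type(int))        # → ('integer', None)
print(py_to_json_type(list[int]))  # → ('array', 'integer')
```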
Generate JSON schema for a class, function, or method
Exported source
def get_schema(f:Union[callable,dict], pname='input_schema')->dict:
    "Generate JSON schema for a class, function, or method"
    if isinstance(f, dict): return f
    schema = _get_nested_schema(f)
    desc = f.__doc__
    assert desc, "Docstring missing!"
    d = docments(f, full=True)
    ret = d.pop('return')
    if (ret.anno is not empty) and (ret.anno is not None):
        desc += f'\n\nReturns:\n- type: {_types(ret.anno)[0]}'
    return {"name": f.__name__, "description": desc, pname: schema}
Usage examples
Putting this all together, we can now test getting a schema from silly_sum. The tool use spec doesn’t support return annotations directly, so we put that in the description instead.
s = get_schema(silly_sum)
desc = s.pop('description')
print(desc)
s
class Dummy:
    def sums(
        self,
        a:int,  # First thing to sum
        b:int=1 # Second thing to sum
    ) -> int: # The sum of the inputs
        "Adds a + b."
        print(f"Finding the sum of {a} and {b}")
        return a + b

get_schema(Dummy.sums)
get_schema also handles more complicated structures such as nested classes. This is useful for things like structured outputs.
class Turn:
    "Turn between two speakers"
    def __init__(
        self,
        speaker_a:str, # First speaker's message
        speaker_b:str, # Second speaker's message
    ): store_attr()

class Conversation:
    "A conversation between two speakers"
    def __init__(
        self,
        turns:list[Turn], # Turns of the conversation
    ): store_attr()

get_schema(Conversation)
class DictConversation:
    "A conversation between two speakers"
    def __init__(
        self,
        turns:dict[str,list[Turn]], # dictionary of topics and the Turns of the conversation
    ): store_attr()

get_schema(DictConversation)
{'name': 'DictConversation',
'description': 'A conversation between two speakers',
'input_schema': {'type': 'object',
'properties': {'turns': {'type': 'object',
'description': 'dictionary of topics and the Turns of the conversation',
'additionalProperties': {'type': 'array',
'items': {'$ref': '#/$defs/Turn'}}}},
'title': 'DictConversation',
'required': ['turns'],
'$defs': {'Turn': {'type': 'object',
'properties': {'speaker_a': {'type': 'string',
'description': "First speaker's message"},
'speaker_b': {'type': 'string',
'description': "Second speaker's message"}},
'title': 'Turn',
'required': ['speaker_a', 'speaker_b']}}}}
class SetConversation:
    "A conversation between two speakers"
    def __init__(
        self,
        turns:set[Turn], # the unique Turns of the conversation
    ): store_attr()

get_schema(SetConversation)
{'name': 'SetConversation',
'description': 'A conversation between two speakers',
'input_schema': {'type': 'object',
'properties': {'turns': {'type': 'array',
'description': 'the unique Turns of the conversation',
'items': {'$ref': '#/$defs/Turn'},
'uniqueItems': True}},
'title': 'SetConversation',
'required': ['turns'],
'$defs': {'Turn': {'type': 'object',
'properties': {'speaker_a': {'type': 'string',
'description': "First speaker's message"},
'speaker_b': {'type': 'string',
'description': "Second speaker's message"}},
'title': 'Turn',
'required': ['speaker_a', 'speaker_b']}}}}
Containers can also be used, both in their parameterized form (List[int]) or as their unparameterized raw type (List). In the latter case, the item type is mapped to object in JSON schema.
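The distinction is visible directly with the stdlib typing helpers, which is what lets the schema code pick a concrete item type for parameterized containers and fall back to object otherwise:

```python
from typing import List, get_origin, get_args

# A parameterized container carries its item type; a bare one does not
print(get_origin(List[int]), get_args(List[int]))  # list (int,)
print(get_origin(List), get_args(List))            # list ()
```

With no type arguments available, the only safe JSON schema item type is `object`.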
In language model clients it’s often useful to have a ‘code interpreter’: something that runs code and generally outputs the result of the last expression (i.e. like IPython or Jupyter).
In this section we’ll create the `python` function, which executes a string as Python code, with an optional timeout. If the last line is an expression, we’ll return it, just like in IPython or Jupyter, but without needing them installed.
Exported source
import ast, time, signal, traceback, io, sys
from fastcore.utils import *
Exported source
def _copy_loc(new, orig):
    "Copy location information from original node to new node and all children."
    new = ast.copy_location(new, orig)
    for field, o in ast.iter_fields(new):
        if isinstance(o, ast.AST): setattr(new, field, _copy_loc(o, orig))
        elif isinstance(o, list): setattr(new, field, [_copy_loc(value, orig) for value in o])
    return new
This is an internal function that’s needed for _run to ensure that location information is available in the abstract syntax tree (AST), since compiling an AST node without it raises an error.
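To see why: a hand-built AST node has no line or column info, so compile refuses it. A quick standalone demonstration, using the stdlib ast.fix_missing_locations to recurse into child nodes (the same job _copy_loc does, except _copy_loc copies the original node's locations):

```python
import ast

tree = ast.parse("x = 1")
orig = tree.body[0]
# Build a replacement node by hand: it has no line/column information
new = ast.Assign(targets=[ast.Name(id="y", ctx=ast.Store())], value=ast.Constant(2))
tree.body[0] = new
try:
    compile(tree, "<ast>", "exec")
    compiled_without_locs = True
except (TypeError, ValueError):
    compiled_without_locs = False  # Python complains about the missing 'lineno'

# Copy locations onto the node and recurse into its children, then it compiles
tree.body[0] = ast.fix_missing_locations(ast.copy_location(new, orig))
g = {}
exec(compile(tree, "<ast>", "exec"), g)
print(compiled_without_locs, g["y"])  # → False 2
```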
Exported source
def _run(code:str, glb:dict=None, loc:dict=None):
    "Run `code`, returning final expression (similar to IPython)"
    tree = ast.parse(code)
    last_node = tree.body[-1] if tree.body else None
    # If the last node is an expression, modify the AST to capture the result
    if isinstance(last_node, ast.Expr):
        tgt = [ast.Name(id='_result', ctx=ast.Store())]
        assign_node = ast.Assign(targets=tgt, value=last_node.value)
        tree.body[-1] = _copy_loc(assign_node, last_node)
    compiled_code = compile(tree, filename='<ast>', mode='exec')
    glb = glb or {}
    stdout_buffer = io.StringIO()
    saved_stdout = sys.stdout
    sys.stdout = stdout_buffer
    try: exec(compiled_code, glb, loc)
    finally: sys.stdout = saved_stdout
    _result = glb.get('_result', None)
    if _result is not None: return _result
    return stdout_buffer.getvalue().strip()
This is the internal function used to actually run the code: we pull off the last AST node to see if it’s an expression (i.e. something that returns a value), and if so, we assign it to a special _result variable so we can return it.
_run('import math;math.factorial(12)')
479001600
_run('print(1+1)')
'2'
We now have the machinery needed to create our python function.
Executes python code with timeout and returning final expression (similar to IPython).
         Type      Default  Details
code     str                Code to execute
glb      Optional  None     Globals namespace
loc      Optional  None     Locals namespace
timeout  int       3600     Maximum run time in seconds
Exported source
def python(
    code:str, # Code to execute
    glb:Optional[dict]=None, # Globals namespace
    loc:Optional[dict]=None, # Locals namespace
    timeout:int=3600 # Maximum run time in seconds
):
    "Executes python `code` with `timeout` and returning final expression (similar to IPython)."
    def handler(*args): raise TimeoutError()
    if glb is None: glb = inspect.currentframe().f_back.f_globals
    if loc is None: loc = glb
    signal.signal(signal.SIGALRM, handler)
    signal.alarm(timeout)
    try: return _run(code, glb, loc)
    except Exception as e: return traceback.format_exc()
    finally: signal.alarm(0)
There’s no built-in security here: you should generally use this in a sandbox, or prompt before running code. It can handle multiline function definitions, and pretty much any other normal Python syntax.
python("""
def factorial(n):
    if n == 0 or n == 1: return 1
    else: return n * factorial(n-1)
factorial(5)
""")
120
If the code takes longer than `timeout` then it returns an error string.
Traceback (most recent call last):
File "/var/folders/51/b2_szf2945n072c0vj2cyty40000gn/T/ipykernel_54602/2052945749.py", line 14, in python
try: return _run(code, glb, loc)
^^^^^^^^^^^^^^^^^^^^
File "/var/folders/51/b2_szf2945n072c0vj2cyty40000gn/T/ipykernel_54602/1858893181.py", line 18, in _run
try: exec(compiled_code, glb, loc)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<ast>", line 1, in <module>
File "/var/folders/51/b2_szf2945n072c0vj2cyty40000gn/T/ipykernel_54602/2052945749.py", line 9, in handler
def handler(*args): raise TimeoutError()
^^^^^^^^^^^^^^^^^^^^
TimeoutError
By default the caller’s global namespace is used.
python("a=1")
a
1
Pass a different `glb` if needed; this requires using `python_ns`.
glb = {}
python("a=3", glb=glb)
a, glb['a']
(1, 3)
get_schema(python)
{'name': 'python',
'description': 'Executes python `code` with `timeout` and returning final expression (similar to IPython).',
'input_schema': {'type': 'object',
'properties': {'code': {'type': 'string', 'description': 'Code to execute'},
'glb': {'type': 'object',
'description': 'Globals namespace',
'default': None,
'anyOf': [{'type': 'object'}, {'type': 'null'}]},
'loc': {'type': 'object',
'description': 'Locals namespace',
'default': None,
'anyOf': [{'type': 'object'}, {'type': 'null'}]},
'timeout': {'type': 'integer',
'description': 'Maximum run time in seconds',
'default': 3600}},
'required': ['code']}}
Tool Calling
Many LLM API providers offer tool calling, where an LLM can choose to call a given tool. This is also helpful for structured outputs, since the response from the LLM is constrained to the required arguments of the tool.
This section is dedicated to helper functions for calling tools. We don’t want to allow LLMs to call just any possible function (that would be a security disaster!) so we create a namespace: a dictionary mapping allowable function names to the functions themselves.
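Conceptually the dispatch is simple; here's a minimal sketch with a hypothetical `dispatch` helper (the library's actual `call_func` and `mk_ns` additionally handle methods, classes, and type coercion of string inputs):

```python
def sums(a: int, b: int = 1) -> int:
    "Adds a + b."
    return a + b

# The namespace restricts which functions the model may invoke
ns = {f.__name__: f for f in (sums,)}

def dispatch(name: str, inputs: dict, ns: dict):
    "Look up `name` in `ns` and call it with `inputs` as keyword arguments."
    if name not in ns: raise ValueError(f"Unknown tool: {name}")
    return ns[name](**inputs)

print(dispatch("sums", {"a": 3, "b": 4}, ns))  # → 7
```

Anything not in the namespace simply can't be called, no matter what name the model produces.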
Types that can be constructed from a plain str can be used directly, as long as they are in custom_types (which you can add to).
def path_test(
    a: Path, # a type hint
    b: Path  # b type hint
):
    "Mandatory docstring"
    return a/b

test_eq(call_func('path_test', {'a': '/home', 'b': 'user'}, ns=[path_test]), Path('/home/user'))
Async function calling
async def asums(a, b): return a + b
ns = mk_ns(asums); ns
Awaits the function fc_name with the given fc_inputs using namespace ns.
Exported source
async def call_func_async(fc_name, fc_inputs, ns, raise_on_err=True):
    "Awaits the function `fc_name` with the given `fc_inputs` using namespace `ns`."
    res = call_func(fc_name, fc_inputs, ns, raise_on_err=raise_on_err)
    if inspect.iscoroutine(res):
        try: res = await res
        except Exception as e:
            if raise_on_err: raise e from None
            else: return traceback.format_exc()
    return res
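The key pattern is: call normally, then await only if the result is a coroutine, so sync and async tools go through the same code path. A self-contained sketch with a hypothetical `maybe_call` helper:

```python
import asyncio, inspect

async def asums(a, b): return a + b
def sums(a, b): return a + b

async def maybe_call(f, *args):
    "Call `f`; await the result only if it turned out to be a coroutine."
    res = f(*args)
    if inspect.iscoroutine(res): res = await res
    return res

async def main():
    return await maybe_call(asums, 1, 2), await maybe_call(sums, 3, 4)

print(asyncio.run(main()))  # → (3, 7)
```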
Create a callable function from a JSON schema tool definition
mk_tool is the inverse of get_schema — it creates a callable Python function from a JSON schema tool definition. This is useful for MCP clients where tools are defined as schemas but need to be called as regular Python functions.
The created function has a proper signature, docstring, and annotations, so it works well with IDE autocomplete and introspection.
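A minimal sketch of the idea, using a hypothetical `schema_to_func` helper: given a tool schema and a handler, build a callable whose name, docstring, and annotations come from the schema. (The library's mk_tool goes further and reconstructs a proper signature with defaults.)

```python
def schema_to_func(schema: dict, handler):
    "Build a callable whose metadata is drawn from a JSON schema tool definition."
    props = schema["input_schema"]["properties"]
    types = {"integer": int, "number": float, "string": str,
             "boolean": bool, "array": list, "object": dict}
    def f(**kw): return handler(**kw)
    f.__name__ = schema["name"]
    f.__doc__ = schema["description"]
    f.__annotations__ = {k: types.get(v.get("type"), object) for k, v in props.items()}
    return f

schema = {"name": "sums", "description": "Adds a + b.",
          "input_schema": {"type": "object",
                           "properties": {"a": {"type": "integer"},
                                          "b": {"type": "integer"}},
                           "required": ["a"]}}
sums = schema_to_func(schema, lambda a, b=1: a + b)
print(sums.__name__, sums(a=3, b=4))  # → sums 7
```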