Function calling audit trail: where teams get it wrong

Function calling looks easy to audit, which is exactly why teams get a function calling audit trail wrong. The model emits a structured call with named arguments, so it feels like you already have a clean record. You do not. What the model proposed and what actually executed are two different events, and the gap between them is where the trouble lives.

Where it goes wrong: trusting the proposed call

The model's function call is a request, not a result. Teams log the proposed call and assume it ran as written. But arguments get mutated, calls get retried, some get denied, and some execute against systems the model never reasoned about. A function calling audit trail built on the model's output records intent, not effect.
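To make the distinction concrete, here is a minimal Python sketch that keeps the proposed call and the executed call as two separate records and reports where they diverge. The names `ProposedCall`, `ExecutedCall`, `diverges`, and the `query_orders` function are hypothetical illustrations, not part of any particular framework.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ProposedCall:
    """What the model asked for: intent only."""
    call_id: str
    function: str
    arguments: dict[str, Any]


@dataclass(frozen=True)
class ExecutedCall:
    """What actually ran at execution time: effect."""
    call_id: str                # links back to the proposal
    function: str
    arguments: dict[str, Any]   # arguments as actually executed
    attempt: int                # 1 for the first try, higher for retries
    outcome: str                # "ok", "error", or "denied"
    result: Optional[Any] = None


def diverges(proposed: ProposedCall, executed: ExecutedCall) -> list[str]:
    """List every way the executed call differs from the model's proposal."""
    differences = []
    if executed.function != proposed.function:
        differences.append("function")
    all_keys = proposed.arguments.keys() | executed.arguments.keys()
    changed = sorted(
        k for k in all_keys
        if proposed.arguments.get(k) != executed.arguments.get(k)
    )
    differences.extend(f"argument:{k}" for k in changed)
    if executed.attempt > 1:
        differences.append("retried")
    if executed.outcome != "ok":
        differences.append(f"outcome:{executed.outcome}")
    return differences
```

If a middleware clamped a requested `limit` of 10000 down to 500 and the call succeeded on its second attempt, `diverges` would report `["argument:limit", "retried"]`. A log built only from `ProposedCall` could never show either fact.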

Where it goes wrong: no identity on the call

Structured arguments tell you what was asked, not who asked. When every function executes under one shared credential, you cannot tie a call back to a run or a person, and the tidy JSON gives you false confidence that you have attribution when you do not.
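A minimal sketch of the alternative, assuming each agent run is started on behalf of a known principal and receives its own run ID and scope set. `RunIdentity`, `start_run`, `attribute`, and `who_ran` are hypothetical names used only for illustration.

```python
import secrets
from dataclasses import dataclass
from typing import Any, Iterable


@dataclass(frozen=True)
class RunIdentity:
    run_id: str
    principal: str            # the person or service that started the run
    scopes: frozenset[str]    # functions this run is allowed to call

    def allows(self, function: str) -> bool:
        return function in self.scopes


def start_run(principal: str, scopes: set[str]) -> RunIdentity:
    """Mint a fresh identity per run instead of sharing one credential."""
    return RunIdentity(f"run-{secrets.token_hex(4)}", principal, frozenset(scopes))


def attribute(record: dict[str, Any], identity: RunIdentity) -> dict[str, Any]:
    """Return a copy of a call record stamped with who and which run."""
    return {**record, "run_id": identity.run_id, "principal": identity.principal}


def who_ran(records: Iterable[dict[str, Any]], function: str) -> set[tuple]:
    """Answer 'who called this function, in which run?' from the records."""
    return {
        (r.get("principal"), r.get("run_id"))
        for r in records
        if r["function"] == function
    }
```

Under a single shared credential, `who_ran` can only return `(None, None)` or the same service account for every call. With a per-run identity, each record points back to one run and one person.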

Record the executed call, with identity

Capture the call where it executes, not where the model proposed it: the real arguments, the result, the identity behind the run, and any denial. That is the only version that holds up under review.
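As a rough illustration, the wrapper below records at the execution point rather than at the model's output. It captures the arguments actually passed, the identity, and whether the call succeeded, failed, or was denied. `execute_and_record` and its `allowed` policy callback are hypothetical; a real deployment would run this logic outside the agent process.

```python
import time
from typing import Any, Callable


def execute_and_record(
    fn: Callable[..., Any],
    arguments: dict[str, Any],
    identity: str,
    allowed: Callable[[str, dict[str, Any]], bool],
    log: list[dict[str, Any]],
) -> Any:
    """Run fn and append a record of what actually happened, including denials."""
    entry: dict[str, Any] = {
        "ts": time.time(),
        "function": fn.__name__,
        "arguments": dict(arguments),  # copy of the arguments as executed
        "identity": identity,
    }
    # Policy is checked before execution; a denial is still recorded.
    if not allowed(fn.__name__, arguments):
        entry["outcome"] = "denied"
        log.append(entry)
        raise PermissionError(f"{fn.__name__} denied for {identity}")
    try:
        result = fn(**arguments)
    except Exception as exc:
        # Failures are recorded too, then re-raised to the caller.
        entry.update(outcome="error", error=repr(exc))
        log.append(entry)
        raise
    entry.update(outcome="ok", result=result)
    log.append(entry)
    return result
```

The key design choice is that every path (allowed, denied, failed) appends exactly one entry, so the record covers what happened rather than only the calls that succeeded.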

The architecture that fixes both

Both mistakes come from recording in the wrong place. The requirement is to capture the executed call at a boundary outside the agent, under a scoped per-run identity, checked against policy before it runs, in a record the agent cannot edit. That is one control surface, and hoop.dev is built around it: it fronts the functions as an identity-aware proxy, records each executed call as a command-level audit entry, and masks sensitive arguments inline. In practice, you route the function calls through hoop.dev. The getting-started guide covers the first connection, and hoop.dev/learn explains the executed-call record.
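One common way to make a record tamper-evident is a hash chain, where each entry commits to the hash of the one before it. The sketch below combines that with inline masking of sensitive argument keys. It is a generic illustration of the idea, not a description of how hoop.dev stores its audit log; `ChainedAuditLog`, `mask`, and `SENSITIVE_KEYS` are hypothetical. A hash chain only makes edits detectable, so the log must still live somewhere the agent cannot write.

```python
import hashlib
import json
from typing import Any

# Hypothetical list of argument keys to redact before anything is stored.
SENSITIVE_KEYS = {"password", "ssn", "card_number"}


def mask(arguments: dict[str, Any]) -> dict[str, Any]:
    """Redact sensitive top-level argument values (nested values are not scanned)."""
    return {k: ("***" if k in SENSITIVE_KEYS else v) for k, v in arguments.items()}


def _digest(body: dict[str, Any]) -> str:
    # Records must be JSON-serializable; sort_keys makes the hash deterministic.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class ChainedAuditLog:
    """Append-only log where each entry commits to the previous entry's hash."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self._entries: list[dict[str, Any]] = []

    def append(self, record: dict[str, Any]) -> str:
        prev = self._entries[-1]["hash"] if self._entries else self.GENESIS
        body = {**record, "arguments": mask(record.get("arguments", {})), "prev": prev}
        digest = _digest(body)
        self._entries.append({**body, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = self.GENESIS
        for entry in self._entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```

Changing any stored field, even a single argument value, causes `verify()` to return `False` from that entry onward.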

The schema is not the safety

The deepest version of the mistake is assuming the structured schema gives you safety. Function calling produces clean, typed arguments, and that tidiness is reassuring in a way that raw shell access never was. But a well-formed call can still be the wrong call. A model can be steered into invoking a legitimate function with arguments that exfiltrate data, delete the wrong records, or reach a system the task never needed. The schema validates the shape of the request, not its intent or its authority. A function calling audit trail that records only that a valid call was made tells you the JSON parsed, not that the action was allowed.
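A small sketch of the gap between shape and authority, assuming a hypothetical `delete_records(table, where)` tool whose task is scoped to the `sessions` table. The schema check is deliberately hand-rolled to stay dependency-free, and the match-all filter list is a toy heuristic, not a real safety rule.

```python
from typing import Any

# Hypothetical tool schema: argument name -> required Python type.
DELETE_SCHEMA: dict[str, type] = {"table": str, "where": str}

# Toy list of filters that would match every row.
MATCH_ALL = {"", "1=1", "true"}


def schema_valid(args: dict[str, Any], schema: dict[str, type]) -> bool:
    """Shape check: exactly the expected keys, each with the expected type."""
    return args.keys() == schema.keys() and all(
        isinstance(args[k], t) for k, t in schema.items()
    )


def authorized(args: dict[str, Any], allowed_tables: set[str]) -> bool:
    """Authority check: is this call within what the task is allowed to touch?"""
    return (
        args["table"] in allowed_tables
        and args["where"].strip().lower() not in MATCH_ALL
    )
```

A call such as `{"table": "customers", "where": "1=1"}` passes `schema_valid` and fails `authorized`. A log that records only the first check would show it as unremarkable.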

A concrete example

Say your agent exposes a send_email function. The schema is simple: recipient, subject, body. An injected instruction convinces the agent to call it with the customer list pasted into the body and an external recipient. Every field is valid. The call is perfectly formed. If your record captures the proposed schema-valid call and nothing else, it looks unremarkable. If instead the executed call is recorded at the boundary, under the run's identity, and the recipient is checked against policy before it sends, the same call is denied and logged as the attempt it was. The difference is not better validation of the arguments. It is recording and checking the executed call where the agent cannot reach.
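The send_email scenario can be sketched as a guard that runs at the boundary, checks the recipient against a policy before sending, and logs a denial as an attempt. The allowed-domain policy, `check_send_email`, and `guarded_send_email` are hypothetical. The `send` callable stands in for whatever actually delivers mail.

```python
from typing import Any, Callable

# Hypothetical policy: only internal recipients are allowed.
ALLOWED_RECIPIENT_DOMAINS = {"example.com"}


def check_send_email(args: dict[str, Any]) -> tuple[bool, str]:
    """Policy check on the recipient, independent of whether the schema is valid."""
    domain = args["recipient"].rsplit("@", 1)[-1].lower()
    if domain not in ALLOWED_RECIPIENT_DOMAINS:
        return False, f"external recipient domain: {domain}"
    return True, "ok"


def guarded_send_email(
    args: dict[str, Any],
    run_id: str,
    send: Callable[..., None],
    log: list[dict[str, Any]],
) -> bool:
    """Check, then send or deny; either way, record the executed attempt."""
    allowed, reason = check_send_email(args)
    entry = {
        "function": "send_email",
        "run_id": run_id,
        "recipient": args["recipient"],
        "subject": args["subject"],
        "body_chars": len(args["body"]),  # length only, so the log never holds the body
    }
    if not allowed:
        log.append({**entry, "outcome": "denied", "reason": reason})
        return False
    send(**args)
    log.append({**entry, "outcome": "sent"})
    return True
```

Logging the body length instead of the body is one simple form of masking: the record shows that a large payload was sent to an external address without copying the customer list into the audit log.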

Try it on one function

hoop.dev is open source. From the GitHub repository, put one function behind it and compare the executed-call record against what the model proposed.

FAQ

Isn't the structured call already a good log?

It is a good record of intent. You need the executed call, with identity and result, to know what actually happened.

What about retries and failures?

Record them. A function calling audit trail that drops denied and failed calls hides the most useful signals.
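A minimal sketch of recording every attempt, not only the final one. `call_with_retries` is a hypothetical helper, and it omits backoff and jitter, which a production retry policy would normally add.

```python
from typing import Any, Callable


def call_with_retries(
    fn: Callable[[], Any],
    call_id: str,
    max_attempts: int,
    log: list[dict[str, Any]],
) -> Any:
    """Retry fn, appending one log entry per attempt, including failures."""
    last_exc: Exception | None = None
    for attempt in range(1, max_attempts + 1):
        try:
            result = fn()
        except Exception as exc:
            log.append({"call_id": call_id, "attempt": attempt,
                        "outcome": "error", "error": repr(exc)})
            last_exc = exc
            continue
        log.append({"call_id": call_id, "attempt": attempt, "outcome": "ok"})
        return result
    raise RuntimeError(f"{call_id} failed after {max_attempts} attempts") from last_exc
```

Because all attempts share one `call_id`, a reviewer can see that a call succeeded only on its third try, which is often the first sign of a flaky or overloaded backend.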

Does this work across different function backends?

Yes. Whether the functions hit an internal service, a database, or a third-party API, recording the executed call at the access boundary captures them in one consistent format. The function calling audit trail does not care what the function is, only that the call crossed the boundary where it was checked and logged.
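As an illustration of "one consistent format", the sketch below normalizes a SQL call and an HTTP call into the same record type. `AuditRecord`, `from_sql`, and `from_http` are hypothetical, and the field choices are assumptions made for the example rather than any product's schema.

```python
from dataclasses import asdict, dataclass
from typing import Any, Iterable


@dataclass(frozen=True)
class AuditRecord:
    run_id: str
    backend: str               # "sql", "http", ... (informational only)
    target: str                # database name, service URL, or API host
    operation: str             # statement verb or HTTP method
    arguments: dict[str, Any]
    outcome: str


def from_sql(run_id: str, database: str, statement: str,
             params: Iterable[Any], outcome: str) -> AuditRecord:
    """Normalize a database call into the shared record shape."""
    return AuditRecord(run_id, "sql", database, statement.split()[0].upper(),
                       {"statement": statement, "params": list(params)}, outcome)


def from_http(run_id: str, method: str, url: str,
              payload: dict[str, Any], status: int) -> AuditRecord:
    """Normalize an HTTP API call into the shared record shape."""
    outcome = "ok" if 200 <= status < 300 else f"http_{status}"
    return AuditRecord(run_id, "http", url, method.upper(), dict(payload), outcome)
```

Queries over the trail, such as "every non-ok outcome in run-1", then work the same way whether the call touched a database or a third-party API.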

Open source

Save the open-source gateway for agent data access

Hoop is MIT-licensed infrastructure for controlling how AI agents reach production data. Star hoophq/hoop so you can inspect it, deploy it, or share it when your team starts governing agent access.
