Function calling & tool use
Tool definitions, parallel function calls, tool-choice forcing, and multi-step agent loops. All OpenAI-compatible. Works against every hosted model and most routed ones.
The simplest possible example
const tools = [
  {
    type: "function",
    function: {
      name: "get_weather",
      description: "Get current weather for a city",
      parameters: {
        type: "object",
        properties: {
          city: { type: "string" },
          unit: { type: "string", enum: ["c", "f"] },
        },
        required: ["city"],
      },
    },
  },
];
const resp = await client.chat.completions.create({
  model: "deepseek/deepseek-v3.2-exp",
  messages: [{ role: "user", content: "What's the weather in Tokyo?" }],
  tools,
});
// resp.choices[0].message.tool_calls is populated:
// [{ id: "call_x", function: { name: "get_weather",
//     arguments: '{"city":"Tokyo","unit":"c"}' } }]

Parallel function calling
The newer hosted models (Llama 4 Maverick, DeepSeek V3.2, Qwen3-Max, Kimi K2) emit multiple tool calls in a single response when a request breaks down into independent subtasks. Each call has a unique id; execute the calls in parallel and send back one result per id.
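The "execute in parallel, respond per id" step can be sketched as a small helper. This is a minimal illustration, not part of the API: `runToolCalls`, `ToolHandler`, and the handler registry are hypothetical names, and the choice to report a failed call back to the model as an `{ error }` payload (instead of rejecting the whole batch) is an assumption about what you want.

```typescript
// Shape of one entry in message.tool_calls (OpenAI-compatible).
interface ToolCall {
  id: string;
  function: { name: string; arguments: string }; // arguments is a JSON string
}

interface ToolMessage {
  role: "tool";
  tool_call_id: string;
  content: string;
}

// Hypothetical local implementation of a tool.
type ToolHandler = (args: Record<string, unknown>) => Promise<unknown> | unknown;

async function runToolCalls(
  calls: ToolCall[],
  handlers: Map<string, ToolHandler>,
): Promise<ToolMessage[]> {
  // Promise.all runs the handlers concurrently and preserves input order.
  return Promise.all(
    calls.map(async (call): Promise<ToolMessage> => {
      let content: string;
      try {
        const handler = handlers.get(call.function.name);
        if (!handler) throw new Error(`Unknown tool: ${call.function.name}`);
        const args = JSON.parse(call.function.arguments) as Record<string, unknown>;
        const result = await handler(args);
        content = JSON.stringify(result ?? null);
      } catch (err) {
        // Assumption: surface the failure to the model rather than throwing.
        content = JSON.stringify({ error: err instanceof Error ? err.message : String(err) });
      }
      // Every result is tied back to its call through tool_call_id.
      return { role: "tool", tool_call_id: call.id, content };
    }),
  );
}
```

The returned array can be appended to `messages` as-is. The raw message shapes involved look like the example that follows.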
// User: "Compare weather in Tokyo and Singapore"
// Model emits two tool calls in one shot:
[
  { id: "call_1", function: { name: "get_weather", arguments: '{"city":"Tokyo"}' } },
  { id: "call_2", function: { name: "get_weather", arguments: '{"city":"Singapore"}' } },
]
// Execute both in parallel, then send results back.
// `messages` must already end with the assistant message that carried these tool calls.
messages.push(
  { role: "tool", tool_call_id: "call_1", content: "23°C, sunny" },
  { role: "tool", tool_call_id: "call_2", content: "31°C, humid" },
);
const finalResp = await client.chat.completions.create({
  model: "deepseek/deepseek-v3.2-exp",
  messages,
  tools,
});

Tool-choice forcing
Three modes:
- tool_choice: "auto" (default): model decides whether to call a tool or reply directly.
- tool_choice: "required": model must call at least one tool.
- tool_choice: { type: "function", function: { name: "get_weather" } }: model must call this specific tool.
Forced tool calls are implemented at the constrained-decoding layer (see structured output). Output is guaranteed valid by construction.
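As a sketch of how the forcing modes might be selected in client code, the helper below builds a `tool_choice` value and refuses to force a tool that is not in the request's `tools` array. `forceTool`, `ToolChoice`, and `ToolDef` are hypothetical names introduced for illustration; the local name check is an assumption about what is worth catching before the request is sent, not a documented requirement.

```typescript
// The three tool_choice forms described above.
type ToolChoice =
  | "auto"
  | "required"
  | { type: "function"; function: { name: string } };

interface ToolDef {
  type: "function";
  function: { name: string; description?: string; parameters?: object };
}

// Hypothetical helper: force a specific tool when `name` is given,
// otherwise force "some tool" via "required".
function forceTool(tools: ToolDef[], name?: string): ToolChoice {
  if (name === undefined) return "required";
  if (!tools.some((t) => t.function.name === name)) {
    throw new Error(`Cannot force unknown tool: ${name}`);
  }
  return { type: "function", function: { name } };
}
```

With the `tools` array from the first example, `tool_choice: forceTool(tools, "get_weather")` would go alongside `model`, `messages`, and `tools` in the request body.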
Tool-call accuracy benchmarks
We track tool-call correctness on BFCL v3 (Berkeley Function Calling Leaderboard) for every hosted model:
| Model | BFCL v3 (overall) | Parallel calls | Hallucination |
|---|---|---|---|
| Kimi K2 Instruct | 78.4 | 82.1 | 3.2% |
| Llama 4 Maverick | 76.9 | 80.5 | 4.1% |
| DeepSeek V3.2 | 76.2 | 78.8 | 3.8% |
| Qwen3-Max | 75.5 | 79.4 | 4.5% |
| GLM-4.6 | 73.8 | 76.2 | 5.1% |
| Magistral Medium | 70.4 | 72.1 | 6.3% |
Multi-step agent loops
Most agent frameworks (LangChain, LlamaIndex, Mastra, custom) loop on the same chat-completion call:
const MAX_STEPS = 10; // illustrative cap; tune for your workload

async function runAgent(userMessage: string) {
  const messages = [{ role: "user", content: userMessage }];
  for (let step = 0; step < MAX_STEPS; step++) {
    const resp = await client.chat.completions.create({
      model: "kimi/k2-instruct", // best for agent loops
      messages,
      tools,
      tool_choice: "auto",
    });
    const msg = resp.choices[0].message;
    messages.push(msg); // keep the assistant turn, including any tool_calls
    if (!msg.tool_calls?.length) return msg.content; // done
    // call.function.arguments is a JSON string; executeTool is expected to parse it
    const results = await Promise.all(
      msg.tool_calls.map(call => executeTool(call.function.name, call.function.arguments))
    );
    // Promise.all preserves order, so results[i] belongs to msg.tool_calls[i]
    for (const [i, result] of results.entries()) {
      messages.push({
        role: "tool",
        tool_call_id: msg.tool_calls[i].id,
        content: JSON.stringify(result),
      });
    }
  }
  throw new Error("Max steps reached");
}

What we handle automatically
- Tool-call retries: if the model emits invalid JSON arguments, we re-sample with constrained decoding and don't bill you for the failed attempt.
- Schema validation: tool definitions are validated up front against JSON Schema spec (Draft 2020-12). Malformed schemas return 400 immediately.
- Prefix caching: tool definitions are part of the cached system prefix (see KV cache), so you don't pay re-computation on every step of an agent loop.
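Since malformed tool definitions come back as a 400, a light local pre-check can catch a few common slips before the round trip. The sketch below is not Draft 2020-12 validation and does not reproduce the server-side checks; `lintTool` and `ToolDefinition` are hypothetical names, and the specific rules (a `"function"` type, a non-empty name, an object-typed `parameters`, and `required` keys that exist in `properties`) are assumptions about typical mistakes, some of which are stylistic rather than spec violations.

```typescript
interface ToolDefinition {
  type: string;
  function: {
    name?: string;
    parameters?: {
      type?: string;
      properties?: Record<string, unknown>;
      required?: string[];
    };
  };
}

// Returns a list of human-readable problems; an empty list means no issue was found.
function lintTool(tool: ToolDefinition): string[] {
  const problems: string[] = [];
  if (tool.type !== "function") problems.push('type must be "function"');
  if (!tool.function.name) problems.push("function.name is missing");

  const params = tool.function.parameters;
  if (params) {
    if (params.type !== "object") problems.push('parameters.type should be "object"');
    const props = params.properties ?? {};
    for (const key of params.required ?? []) {
      // Own-property check so inherited keys such as "toString" do not pass.
      if (!Object.prototype.hasOwnProperty.call(props, key)) {
        problems.push(`required key "${key}" is not in properties`);
      }
    }
  }
  return problems;
}
```

Running `tools.flatMap(lintTool)` before the first request gives one flat list of findings for the whole tool set.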
TL;DR
Pass tools like you would to OpenAI; parallel calls just work on every newer hosted model. Use Kimi K2 or Llama 4 Maverick for agent loops; they post the two highest overall BFCL v3 scores in the table above (78.4 and 76.9). We handle retries, validation, and prefix caching for free.