Structured output
Constrain model output to valid JSON, a JSON Schema, a regex, or any context-free grammar, enforced at the token sampling level. No retries, no prompt engineering, no parsing failures.
How it works
During sampling, instead of choosing from all 100K+ tokens in the vocabulary, we mask out tokens that would violate your schema. The model is forced to pick from valid tokens only. The generated text is guaranteed to parse — by construction, not by hope.
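Here is a toy sketch of the masking step. It assumes greedy decoding, a five-token vocabulary, and made-up logits. The allowed set is hard-coded here. In a real decoder it would come from the compiled schema at each step, and sampling (temperature, top-p) would run on the masked distribution instead of argmax.

```typescript
// Toy illustration of constrained (masked) greedy decoding for one step.
// Vocabulary and logits are made up; the allowed set would normally come
// from the compiled schema for the current position.

// Disallowed tokens get -Infinity, so softmax gives them probability 0.
function maskLogits(logits: number[], allowed: Set<number>): number[] {
  return logits.map((logit, id) => (allowed.has(id) ? logit : -Infinity));
}

function softmax(logits: number[]): number[] {
  const max = Math.max(...logits); // subtract the max for numerical stability
  const exps = logits.map((l) => Math.exp(l - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

function argmax(xs: number[]): number {
  let best = 0;
  for (let i = 1; i < xs.length; i++) {
    if (xs[i] > xs[best]) best = i;
  }
  return best;
}

const vocab = ["Sure", "!", "{", "\"", "hello"];
const logits = [3.2, 1.0, 2.5, 0.3, 2.9];
const allowed = new Set([2]); // suppose the output must start with "{"

const unconstrained = vocab[argmax(logits)];
const constrained = vocab[argmax(maskLogits(logits, allowed))];
const probs = softmax(maskLogits(logits, allowed));

console.log(unconstrained); // Sure
console.log(constrained);   // {
console.log(probs);         // [ 0, 0, 1, 0, 0 ]
```

Masked tokens end up with probability exactly 0. No amount of model confidence in "Sure" can leak into the output.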
Three modes are supported, from least to most control over the output:
- JSON mode: guarantees output is parseable JSON of any shape.
- JSON Schema: guarantees output matches a specific JSON Schema (Draft 2020-12).
- Grammar mode: guarantees output matches an arbitrary EBNF-style grammar (regex patterns, custom languages, code completion).
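The practical difference between the first two modes is who checks the shape. JSON mode only promises that JSON.parse succeeds, so the shape still has to be checked. JSON Schema mode promises the shape as well. The sketch below is a hypothetical hand-rolled validator (Schema and validate are illustrative names). It covers only the keywords used in the person example below and is not a full Draft 2020-12 validator. It shows the kind of check that JSON mode leaves to you.

```typescript
// Hypothetical minimal validator: supports only the keywords used below
// (type, properties, required, additionalProperties, enum, minimum).
type Schema = {
  type?: "object" | "string" | "integer";
  properties?: Record<string, Schema>;
  required?: string[];
  additionalProperties?: boolean;
  enum?: unknown[];
  minimum?: number;
};

function validate(value: unknown, schema: Schema): boolean {
  switch (schema.type) {
    case "string":
      if (typeof value !== "string") return false;
      break;
    case "integer":
      if (typeof value !== "number" || !Number.isInteger(value)) return false;
      if (schema.minimum !== undefined && value < schema.minimum) return false;
      break;
    case "object": {
      if (typeof value !== "object" || value === null || Array.isArray(value)) return false;
      const obj = value as Record<string, unknown>;
      // Every required key must be present.
      for (const key of schema.required ?? []) {
        if (!(key in obj)) return false;
      }
      // Known keys must validate; unknown keys fail if additionalProperties is false.
      for (const key of Object.keys(obj)) {
        const sub = schema.properties?.[key];
        if (sub === undefined) {
          if (schema.additionalProperties === false) return false;
        } else if (!validate(obj[key], sub)) {
          return false;
        }
      }
      break;
    }
  }
  if (schema.enum !== undefined && schema.enum.indexOf(value) === -1) return false;
  return true;
}

// Trimmed version of the person schema from the example below.
const personSchema: Schema = {
  type: "object",
  properties: {
    name: { type: "string" },
    age: { type: "integer", minimum: 0 },
    industry: { type: "string", enum: ["fintech", "healthtech", "edtech", "other"] },
  },
  required: ["name", "age", "industry"],
  additionalProperties: false,
};

// Both strings are valid JSON, so JSON mode could legally return either one.
const jsonModeOutput = JSON.parse('{"name": "Alice", "age": "thirty-two"}');
const schemaModeOutput = JSON.parse('{"name": "Alice", "age": 32, "industry": "fintech"}');

console.log(validate(jsonModeOutput, personSchema));   // false: "industry" missing, "age" not an integer
console.log(validate(schemaModeOutput, personSchema)); // true
```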
JSON Schema example
// Assumes `client` is an OpenAI-compatible client already configured for Luminet.
const resp = await client.chat.completions.create({
  model: "deepseek/deepseek-v3.2-exp",
  messages: [
    {
      role: "user",
      content: "Extract: 'Alice is 32 and works as a fintech CTO in NYC.'",
    },
  ],
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "person",
      strict: true,
      schema: {
        type: "object",
        properties: {
          name: { type: "string" },
          age: { type: "integer", minimum: 0 },
          role: { type: "string" },
          industry: { type: "string", enum: ["fintech", "healthtech", "edtech", "other"] },
          location: { type: "string" },
        },
        required: ["name", "age", "role", "industry", "location"],
        additionalProperties: false, // reject keys not listed above
      },
    },
  },
});
// resp.choices[0].message.content is guaranteed to parse and match the schema, provided
// generation finished normally (finish_reason "stop") and was not cut off by max_tokens.
const person = JSON.parse(resp.choices[0].message.content);
// e.g. { name: "Alice", age: 32, role: "CTO", industry: "fintech", location: "NYC" }
Grammar mode (advanced)
For shapes that JSON Schema can't express, such as a SQL subset, a domain-specific language, or a bare string that must be a valid IPv4 address, use grammar mode with an EBNF grammar.
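The IPv4 language in the example below is regular, so it can also be written as an ordinary JavaScript regex. A sketch like this is a cheap way to test locally which strings a grammar is meant to accept before you send it. It can also double-check outputs on the client. OCTET, IPV4 and isIPv4 are illustrative names, not part of Luminet's API.

```typescript
// Mirrors the grammar's octet rule alternative by alternative:
//   "25" [0-5]       -> 250-255
//   "2" [0-4] [0-9]  -> 200-249
//   "1" [0-9] [0-9]  -> 100-199
//   [1-9]? [0-9]     -> 0-99, without a leading zero
const OCTET = "(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])";
// Four octets separated by literal dots, anchored so nothing else may surround them.
const IPV4 = new RegExp(`^${OCTET}(?:\\.${OCTET}){3}$`);

function isIPv4(s: string): boolean {
  return IPV4.test(s);
}

console.log(isIPv4("192.168.1.42")); // true
console.log(isIPv4("256.1.1.1"));    // false: 256 matches no octet alternative
console.log(isIPv4("01.2.3.4"));     // false: leading zero
console.log(isIPv4("10.0.0"));       // false: only three octets
```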
const grammar = `
root ::= ipv4
ipv4 ::= octet "." octet "." octet "." octet
octet ::= "25" [0-5] | "2" [0-4] [0-9] | "1" [0-9] [0-9] | [1-9]? [0-9]
`;
const resp = await client.chat.completions.create({
  model: "meta/llama-4-scout",
  messages: [
    { role: "user", content: "Give me an IP address." },
  ],
  extra_body: { grammar },
});
// Output is guaranteed valid IPv4, e.g. "192.168.1.42"
Performance impact
Naive constrained decoding adds 20-40% overhead per token, because the mask has to be recomputed against the whole vocabulary at every step. Luminet's implementation compiles the schema into an automaton when the request arrives, then looks up each token's mask in O(1). Net overhead: < 4%.
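The general idea behind per-state mask lookup can be sketched on a hand-built DFA for the regular pattern [0-9]+(\.[0-9]+)?. This is the state-indexed vocabulary approach described by the Outlines project, not Luminet's code. The vocabulary is hypothetical. Recursive schemas and context-free grammars also need a stack (a pushdown automaton), which a flat table like this cannot capture.

```typescript
// Toy DFA for the pattern [0-9]+(\.[0-9]+)?
// States: 0 = start, 1 = integer digits (accepting),
//         2 = just read ".", 3 = fraction digits (accepting).
const DEAD = -1;
const NUM_STATES = 4;

function step(state: number, ch: string): number {
  const isDigit = ch >= "0" && ch <= "9";
  switch (state) {
    case 0: return isDigit ? 1 : DEAD;
    case 1: return isDigit ? 1 : ch === "." ? 2 : DEAD;
    case 2: return isDigit ? 3 : DEAD;
    case 3: return isDigit ? 3 : DEAD;
    default: return DEAD;
  }
}

// Feed a multi-character token through the DFA; DEAD if any character is rejected.
function walk(state: number, token: string): number {
  let s = state;
  for (let i = 0; i < token.length && s !== DEAD; i++) s = step(s, token[i]);
  return s;
}

// Hypothetical tokenizer vocabulary: token id = array index.
const vocab = ["1", "23", ".", ".5", "a", "4.2", "..", "7"];

// Compile time (once per schema): for every state, record which token ids
// are allowed and which state each one leads to.
const allowed: number[][] = [];
const transitions: Map<number, number>[] = [];
for (let s = 0; s < NUM_STATES; s++) {
  const ids: number[] = [];
  const next = new Map<number, number>();
  for (let id = 0; id < vocab.length; id++) {
    const t = walk(s, vocab[id]);
    if (t !== DEAD) {
      ids.push(id);
      next.set(id, t);
    }
  }
  allowed.push(ids);
  transitions.push(next);
}

// Decode time (every token): one array index for the mask, one map lookup to advance.
function allowedTokens(state: number): number[] {
  return allowed[state];
}

function advance(state: number, tokenId: number): number {
  const t = transitions[state].get(tokenId);
  if (t === undefined) throw new Error(`token ${tokenId} is masked in state ${state}`);
  return t;
}

let state = 0;
state = advance(state, 1); // "23" -> state 1
state = advance(state, 2); // "."  -> state 2
console.log(allowedTokens(state).map((id) => vocab[id]).join(" ")); // 1 23 7
```

The vocabulary scan happens once, in the compile loop. Per token, the decoder only indexes a precomputed array, which is where the O(1) per-token cost comes from.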
When NOT to use
- Free-form chat: there is no structure to enforce, so constraining only adds overhead.
- Very large schemas (≥ 50 KB): compile time becomes noticeable. Either simplify or pre-compile via our Grammars API.
- When you need an explanation: schema-constrained output gives you the answer but not the reasoning behind it. If you need both, add a "reasoning" field to your schema.
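Here is a sketch of a schema with a reasoning field. It declares "reasoning" first, on the assumption that properties are generated in the order they are declared. That is common in constrained decoders but not confirmed here for Luminet. The explanation is then written before the answer fields rather than after them. personWithReasoning and splitAnswer are hypothetical names.

```typescript
// Hypothetical variant of the person schema with a "reasoning" field declared first.
const personWithReasoning = {
  type: "object",
  properties: {
    reasoning: { type: "string" }, // first, so (assuming schema order) it is generated first
    name: { type: "string" },
    age: { type: "integer", minimum: 0 },
    industry: { type: "string", enum: ["fintech", "healthtech", "edtech", "other"] },
  },
  required: ["reasoning", "name", "age", "industry"],
  additionalProperties: false,
};

// Pass it the same way as in the JSON Schema example:
//   response_format: { type: "json_schema",
//     json_schema: { name: "person", strict: true, schema: personWithReasoning } }

// Split the parsed response into the explanation and the answer fields.
function splitAnswer(content: string): { reasoning: string; answer: Record<string, unknown> } {
  const { reasoning, ...answer } = JSON.parse(content) as Record<string, unknown>;
  return { reasoning: String(reasoning), answer };
}

const sample =
  '{"reasoning": "Age and industry are stated directly.", "name": "Alice", "age": 32, "industry": "fintech"}';
const { reasoning, answer } = splitAnswer(sample);
console.log(reasoning);  // Age and industry are stated directly.
console.log(answer.age); // 32
```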
TL;DR
Use response_format: json_schema to guarantee parseable output. Overhead is < 4%. Works on every hosted model and most routed models (where the upstream provider supports it).