Get a quote Book a call

Documentation

Build with confidence

From your first request to deploying custom fine-tunes — and the engineering deep-dives behind FireAttention's 2.4× speedup.

Getting started

Make your first call. Understand the API surface.

Quickstart

First API call in under 60 seconds.

API reference

Endpoints, parameters, response shapes.

SDKs

Use any OpenAI-compatible client.

Models & deployment

What we host, how routing works, and how to bring your own.

Hosted models

Open-weight models on FireAttention.

Routing & failover

How requests pick the best provider.

Bring your own model

Deploy any HF checkpoint or LoRA.

Fine-tuning

Train custom checkpoints, deploy in one click.

Inference optimization

Engineer-grade guides to the techniques behind FireAttention.

Quantization

FP8 vs INT4 vs BF16 — when to use what.

Speculative decoding

Draft models for 2-3× wallclock speedup.

Continuous batching

Iteration-level scheduling, the four knobs.

Structured output

JSON Schema & grammar-constrained sampling.

KV cache & prefix caching

PagedAttention, prefix reuse, KV quantization.

Expert parallelism (MoE)

TP / DP / EP, all-to-all cost, EP=16 on Blackwell.

Disaggregated prefill / decode

Two pools, two parallelism strategies, one rack.

Multi-LoRA serving

Hundreds of adapters, one base GPU.

Function calling

Tool definitions, parallel calls, agent loops.

Streaming optimization

SSE chunking, partial-token handling, TTFT.

Long-context inference

From 128K to 10M tokens — cost & speed.

Guides

End-to-end tutorials for common patterns.

Tutorials & guides

Streaming, vision, tools, RAG.

Hello world (TypeScript)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.luminet.ai/v1",
  apiKey: process.env.LUMINET_API_KEY,
});

const stream = await client.chat.completions.create({
  model: "deepseek/deepseek-v3.2-exp",
  messages: [{ role: "user", content: "Write a haiku about FP8." }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}