Model library
Open models, served at production speed
Open-weight models run on our FireAttention engine, which combines custom kernels, FP8, and continuous batching. Closed frontier models are passed through to their providers' official APIs, so you keep one endpoint, one bill, and one SDK.
Hosted
17 open-weight models
Served on Luminet's GPU clusters with FireAttention kernels. 2-3× the throughput of stock inference servers.
Routed
10 closed frontier models
Passthrough to OpenAI, Anthropic, Google, etc. via official APIs. Same SDK, one bill, no markup beyond a 5% routing fee.
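As a rough illustration of the routing fee, the sketch below adds 5% to a provider charge. It assumes the fee is a flat percentage of whatever the provider bills for the request; the page does not spell out how the fee is calculated, and `billed_amount` is a hypothetical helper, not part of any SDK.

```python
from decimal import Decimal

# The 5% routing fee quoted above.
ROUTING_FEE = Decimal("0.05")


def billed_amount(provider_cost_usd: Decimal) -> Decimal:
    """Return the provider charge plus the routing fee.

    Assumption: the fee is a flat percentage of the provider charge.
    """
    return provider_cost_usd * (Decimal("1") + ROUTING_FEE)


# A request that costs $10.00 at the provider would be billed at $10.50.
print(billed_amount(Decimal("10.00")))
```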
| Model | Provider | Serving | Context | Input / 1M tokens | Output / 1M tokens | Throughput | TTFT |
|---|---|---|---|---|---|---|---|
| Qwen3-235B A22B (Featured) `alibaba/qwen3-235b-a22b` | Alibaba | Hosted | 256K | $0.45 | $1.80 | 525 tok/s | 105ms |
| DeepSeek V4 (Featured) `deepseek/deepseek-v4` | DeepSeek | Hosted | 256K | $0.32 | $1.28 | 480 tok/s | 95ms |
| GLM-5 (Featured) `zhipu/glm-5` | Zhipu AI | Hosted | 1M | $0.60 | $2.40 | 410 tok/s | 110ms |
| Kimi K2.6 (Featured) `moonshot/kimi-k2.6` | Moonshot AI | Hosted | 200K | $0.55 | $2.20 | 320 tok/s | 130ms |
| Qwen3.5-Max (Featured) `alibaba/qwen3.5-max` | Alibaba | Hosted | 1M | $1.20 | $4.80 | 310 tok/s | 155ms |
| Llama 5 Instruct (Featured) `meta/llama-5-instruct` | Meta | Hosted | 2M | $0.85 | $2.60 | 285 tok/s | 145ms |
| Nemotron Ultra 340B (Featured) `nvidia/nemotron-ultra-340b` | NVIDIA | Hosted | 256K | $1.40 | $4.20 | 240 tok/s | 175ms |
| Phi-5 22B `microsoft/phi-5-22b` | Microsoft | Hosted | 32K | $0.09 | $0.18 | 760 tok/s | 62ms |
| Qwen3-Next 80B A3B `alibaba/qwen3-next-80b` | Alibaba | Hosted | 256K | $0.14 | $0.42 | 640 tok/s | 78ms |
| Yi-Lightning 2 `01-ai/yi-lightning-2` | 01.AI | Hosted | 32K | $0.08 | $0.08 | 620 tok/s | 68ms |
| Devstral 2 `mistral/devstral-2` | Mistral | Hosted | 128K | $0.10 | $0.30 | 580 tok/s | 92ms |
| Gemma 4 27B `google/gemma-4-27b` | Google | Hosted | 200K | $0.18 | $0.55 | 510 tok/s | 88ms |
| Llama 4 Scout `meta/llama-4-scout` | Meta | Hosted | 10M | $0.18 | $0.59 | 460 tok/s | 95ms |
| Qwen3-Coder 2 `alibaba/qwen3-coder-2` | Alibaba | Hosted | 256K | $0.55 | $2.20 | 270 tok/s | 150ms |
| Qwen3-VL 2 `alibaba/qwen3-vl-2` | Alibaba | Hosted | 128K | $0.70 | $2.80 | 240 tok/s | 175ms |
| Mistral Large 3 `mistral/mistral-large-3` | Mistral | Hosted | 256K | $1.80 | $5.40 | 230 tok/s | 165ms |
| DeepSeek R2 `deepseek/deepseek-r2` | DeepSeek | Hosted | 256K | $0.50 | $2.00 | 215 tok/s | 220ms |
| GPT-5 Nano `openai/gpt-5-nano` | OpenAI | Routed | 128K | $0.10 | $0.40 | 240 tok/s | 130ms |
| Gemini 2.5 Flash `google/gemini-2.5-flash` | Google | Routed | 1M | $0.08 | $0.30 | 192 tok/s | 160ms |
| Claude Haiku 4.5 `anthropic/claude-haiku-4.5` | Anthropic | Routed | 200K | $0.80 | $4.00 | 168 tok/s | 180ms |
| GPT-5 Mini `openai/gpt-5-mini` | OpenAI | Routed | 200K | $0.60 | $2.40 | 145 tok/s | 220ms |
| Command A `cohere/command-a` | Cohere | Routed | 256K | $2.50 | $10.00 | 90 tok/s | 410ms |
| Claude Sonnet 4.6 `anthropic/claude-sonnet-4.6` | Anthropic | Routed | 500K | $3.00 | $15.00 | 88 tok/s | 410ms |
| Gemini 3 Pro `google/gemini-3-pro` | Google | Routed | 2M | $1.50 | $12.00 | 82 tok/s | 480ms |
| GPT-5 `openai/gpt-5` | OpenAI | Routed | 400K | $5.00 | $15.00 | 78 tok/s | 480ms |
| Grok 4 `xai/grok-4` | xAI | Routed | 256K | $5.00 | $15.00 | 72 tok/s | 520ms |
| Claude Opus 4.7 `anthropic/claude-opus-4.7` | Anthropic | Routed | 1M | $15.00 | $75.00 | 42 tok/s | 920ms |
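
To compare rows, the table's columns can be turned into a per-request cost and a rough latency estimate. The sketch below is illustrative only and rests on several assumptions: prices are in USD per 1M tokens, the throughput column is output tokens per second for a single request, and TTFT covers everything before the first output token. The page does not say whether the listed prices for routed models already include the routing fee, so the fee is left out. `ModelRow`, `request_cost`, and `estimated_latency_s` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRow:
    """One row of the pricing table (values copied from the table)."""
    model_id: str
    in_per_m: float   # USD per 1M input tokens
    out_per_m: float  # USD per 1M output tokens
    tok_per_s: float  # throughput, assumed to be output tokens per second
    ttft_ms: float    # time to first token, in milliseconds


def request_cost(row: ModelRow, input_tokens: int, output_tokens: int) -> float:
    """Token cost in USD, excluding any routing fee."""
    return (input_tokens * row.in_per_m + output_tokens * row.out_per_m) / 1_000_000


def estimated_latency_s(row: ModelRow, output_tokens: int) -> float:
    """TTFT plus generation time at the listed throughput (a rough estimate)."""
    return row.ttft_ms / 1000 + output_tokens / row.tok_per_s


deepseek_v4 = ModelRow("deepseek/deepseek-v4", 0.32, 1.28, 480, 95)
gpt5 = ModelRow("openai/gpt-5", 5.00, 15.00, 78, 480)

# Example workload: 20,000 input tokens and 1,000 output tokens.
for row in (deepseek_v4, gpt5):
    cost = request_cost(row, 20_000, 1_000)
    latency = estimated_latency_s(row, 1_000)
    print(f"{row.model_id}: ${cost:.4f}, ~{latency:.2f}s")
```

Under these assumptions, the example workload comes to about $0.0077 on DeepSeek V4 and $0.115 on GPT-5. Real latency depends on prompt length, load, and batching, so treat the time estimate as a lower bound rather than a guarantee.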