Model library

Open models, served at production speed

Open-weight models run on our FireAttention engine, which uses custom kernels, FP8, and continuous batching. Frontier closed models are passed through to their providers' official APIs, so you keep one endpoint, one bill, and one SDK.

Hosted

17 open-weight models

Served on Luminet's GPU clusters with FireAttention kernels, at 2–3× the throughput of stock inference servers.

Routed

10 closed frontier models

Passed through to OpenAI, Anthropic, Google, and other providers via their official APIs. Same SDK and one bill; the only markup is a 5% routing fee.
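To see what the 5% routing fee means for a routed model, here is a minimal sketch that applies it to a provider's per-token list price. It assumes the fee is charged as a flat 5% on top of the provider price; this page does not say whether the routed prices in the table already include the fee, and the function name is hypothetical.

```python
# Minimal sketch, ASSUMING the 5% routing fee is added on top of the
# provider's list price. routed_price_per_million is a hypothetical helper.
ROUTING_FEE = 0.05  # 5% routing fee for routed (closed) models


def routed_price_per_million(provider_list_price: float) -> float:
    """Return the assumed billed price in USD per 1M tokens for a routed model."""
    return provider_list_price * (1 + ROUTING_FEE)


if __name__ == "__main__":
    # A provider list price of $3.00 per 1M tokens becomes $3.15 with the fee.
    print(f"${routed_price_per_million(3.00):.2f}")  # prints "$3.15"
```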

| Model | Model ID | Provider | Serving | Context | Input ($/1M tokens) | Output ($/1M tokens) | Throughput (tok/s) | TTFT (ms) |
|---|---|---|---|---|---|---|---|---|
| Qwen3-235B A22B (Featured) | alibaba/qwen3-235b-a22b | Alibaba | Hosted | 256K | $0.45 | $1.80 | 525 | 105 |
| DeepSeek V4 (Featured) | deepseek/deepseek-v4 | DeepSeek | Hosted | 256K | $0.32 | $1.28 | 480 | 95 |
| GLM-5 (Featured) | zhipu/glm-5 | Zhipu AI | Hosted | 1M | $0.60 | $2.40 | 410 | 110 |
| Kimi K2.6 (Featured) | moonshot/kimi-k2.6 | Moonshot AI | Hosted | 200K | $0.55 | $2.20 | 320 | 130 |
| Qwen3.5-Max (Featured) | alibaba/qwen3.5-max | Alibaba | Hosted | 1M | $1.20 | $4.80 | 310 | 155 |
| Llama 5 Instruct (Featured) | meta/llama-5-instruct | Meta | Hosted | 2M | $0.85 | $2.60 | 285 | 145 |
| Nemotron Ultra 340B (Featured) | nvidia/nemotron-ultra-340b | NVIDIA | Hosted | 256K | $1.40 | $4.20 | 240 | 175 |
| Phi-5 22B | microsoft/phi-5-22b | Microsoft | Hosted | 32K | $0.09 | $0.18 | 760 | 62 |
| Qwen3-Next 80B A3B | alibaba/qwen3-next-80b | Alibaba | Hosted | 256K | $0.14 | $0.42 | 640 | 78 |
| Yi-Lightning 2 | 01-ai/yi-lightning-2 | 01.AI | Hosted | 32K | $0.08 | $0.08 | 620 | 68 |
| Devstral 2 | mistral/devstral-2 | Mistral | Hosted | 128K | $0.10 | $0.30 | 580 | 92 |
| Gemma 4 27B | google/gemma-4-27b | Google | Hosted | 200K | $0.18 | $0.55 | 510 | 88 |
| Llama 4 Scout | meta/llama-4-scout | Meta | Hosted | 10M | $0.18 | $0.59 | 460 | 95 |
| Qwen3-Coder 2 | alibaba/qwen3-coder-2 | Alibaba | Hosted | 256K | $0.55 | $2.20 | 270 | 150 |
| Qwen3-VL 2 | alibaba/qwen3-vl-2 | Alibaba | Hosted | 128K | $0.70 | $2.80 | 240 | 175 |
| Mistral Large 3 | mistral/mistral-large-3 | Mistral | Hosted | 256K | $1.80 | $5.40 | 230 | 165 |
| DeepSeek R2 | deepseek/deepseek-r2 | DeepSeek | Hosted | 256K | $0.50 | $2.00 | 215 | 220 |
| GPT-5 Nano | openai/gpt-5-nano | OpenAI | Routed | 128K | $0.10 | $0.40 | 240 | 130 |
| Gemini 2.5 Flash | google/gemini-2.5-flash | Google | Routed | 1M | $0.08 | $0.30 | 192 | 160 |
| Claude Haiku 4.5 | anthropic/claude-haiku-4.5 | Anthropic | Routed | 200K | $0.80 | $4.00 | 168 | 180 |
| GPT-5 Mini | openai/gpt-5-mini | OpenAI | Routed | 200K | $0.60 | $2.40 | 145 | 220 |
| Command A | cohere/command-a | Cohere | Routed | 256K | $2.50 | $10.00 | 90 | 410 |
| Claude Sonnet 4.6 | anthropic/claude-sonnet-4.6 | Anthropic | Routed | 500K | $3.00 | $15.00 | 88 | 410 |
| Gemini 3 Pro | google/gemini-3-pro | Google | Routed | 2M | $1.50 | $12.00 | 82 | 480 |
| GPT-5 | openai/gpt-5 | OpenAI | Routed | 400K | $5.00 | $15.00 | 78 | 480 |
| Grok 4 | xai/grok-4 | xAI | Routed | 256K | $5.00 | $15.00 | 72 | 520 |
| Claude Opus 4.7 | anthropic/claude-opus-4.7 | Anthropic | Routed | 1M | $15.00 | $75.00 | 42 | 920 |
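The In / 1M and Out / 1M columns, together with Throughput and TTFT, are enough for a back-of-the-envelope estimate of what a single request costs and how long it takes. The sketch below copies three rows from the table and applies two simplifying assumptions: cost is billed linearly per token at the listed rates (routing fee not added), and end-to-end latency is roughly TTFT plus output tokens divided by the listed throughput, treated here as per-request decode speed (the page does not say how throughput was measured). `ModelSpec`, `estimate_cost`, and `estimate_latency` are hypothetical names, not part of any Luminet SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    model_id: str
    input_per_m: float     # USD per 1M input tokens ("In / 1M" column)
    output_per_m: float    # USD per 1M output tokens ("Out / 1M" column)
    throughput_tps: float  # output tokens per second ("Throughput" column)
    ttft_s: float          # time to first token in seconds ("TTFT" column, ms / 1000)


# Three rows copied from the table above.
CATALOG = {
    spec.model_id: spec
    for spec in [
        ModelSpec("deepseek/deepseek-v4", 0.32, 1.28, 480, 0.095),
        ModelSpec("microsoft/phi-5-22b", 0.09, 0.18, 760, 0.062),
        ModelSpec("anthropic/claude-sonnet-4.6", 3.00, 15.00, 88, 0.410),
    ]
}


def estimate_cost(spec: ModelSpec, input_tokens: int, output_tokens: int) -> float:
    """Linear per-token pricing: tokens x (USD per 1M tokens) / 1M."""
    return (input_tokens * spec.input_per_m + output_tokens * spec.output_per_m) / 1_000_000


def estimate_latency(spec: ModelSpec, output_tokens: int) -> float:
    """Rough model: wait TTFT, then stream output at the listed throughput."""
    return spec.ttft_s + output_tokens / spec.throughput_tps


if __name__ == "__main__":
    # Example request: 10,000 prompt tokens, 1,000 generated tokens.
    for spec in CATALOG.values():
        cost = estimate_cost(spec, 10_000, 1_000)
        latency = estimate_latency(spec, 1_000)
        print(f"{spec.model_id}: ${cost:.4f}, ~{latency:.1f} s")
```

For long generations the throughput term dominates, so a fast hosted model can finish a 1,000-token answer well before a slower routed model has streamed a tenth of it, even when their TTFTs are comparable.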