Model library

Open models, served at production speed

Open-weight models run on our FireAttention engine, which combines custom kernels, FP8, and continuous batching. Frontier closed models are routed through their official APIs as a passthrough, so you keep one endpoint, one bill, and one SDK.

Hosted

17 open-weight models

Served on Luminet's GPU clusters with FireAttention kernels, at 2–3× the throughput of stock inference servers.

Routed

10 closed frontier models

Passthrough to OpenAI, Anthropic, Google, etc. via official APIs. Same SDK, one bill, no markup beyond a 5% routing fee.
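As a rough illustration of the routing fee, the sketch below assumes the 5% is charged on top of the provider's own cost for a request. That reading is an assumption: this page does not say whether the fee is added at billing time or is already included in the listed Routed prices. The function name `routed_charge` is hypothetical and not part of any SDK.

```python
from decimal import Decimal

# The 5% routing fee stated above.
ROUTING_FEE_RATE = Decimal("0.05")


def routed_charge(provider_cost_usd: Decimal) -> Decimal:
    """Return the provider cost plus the routing fee.

    Assumption: the fee is a flat 5% applied to the provider's cost.
    """
    return provider_cost_usd * (Decimal("1") + ROUTING_FEE_RATE)


if __name__ == "__main__":
    # Under this assumption, $10.00 of provider usage is billed as $10.50.
    print(routed_charge(Decimal("10.00")))
```

Decimal arithmetic is used here so that billing-style amounts do not pick up binary floating-point rounding noise.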

Model	Model ID	Provider	Serving	Context	In / 1M tokens	Out / 1M tokens	Throughput	TTFT
Qwen3-235B A22B (Featured)	alibaba/qwen3-235b-a22b	Alibaba	Hosted	256K	$0.45	$1.80	525 tok/s	105ms
DeepSeek V4 (Featured)	deepseek/deepseek-v4	DeepSeek	Hosted	256K	$0.32	$1.28	480 tok/s	95ms
GLM-5 (Featured)	zhipu/glm-5	Zhipu AI	Hosted	1M	$0.60	$2.40	410 tok/s	110ms
Kimi K2.6 (Featured)	moonshot/kimi-k2.6	Moonshot AI	Hosted	200K	$0.55	$2.20	320 tok/s	130ms
Qwen3.5-Max (Featured)	alibaba/qwen3.5-max	Alibaba	Hosted	1M	$1.20	$4.80	310 tok/s	155ms
Llama 5 Instruct (Featured)	meta/llama-5-instruct	Meta	Hosted	2M	$0.85	$2.60	285 tok/s	145ms
Nemotron Ultra 340B (Featured)	nvidia/nemotron-ultra-340b	NVIDIA	Hosted	256K	$1.40	$4.20	240 tok/s	175ms
Phi-5 22B	microsoft/phi-5-22b	Microsoft	Hosted	32K	$0.09	$0.18	760 tok/s	62ms
Qwen3-Next 80B A3B	alibaba/qwen3-next-80b	Alibaba	Hosted	256K	$0.14	$0.42	640 tok/s	78ms
Yi-Lightning 2	01-ai/yi-lightning-2	01.AI	Hosted	32K	$0.08	$0.08	620 tok/s	68ms
Devstral 2	mistral/devstral-2	Mistral	Hosted	128K	$0.10	$0.30	580 tok/s	92ms
Gemma 4 27B	google/gemma-4-27b	Google	Hosted	200K	$0.18	$0.55	510 tok/s	88ms
Llama 4 Scout	meta/llama-4-scout	Meta	Hosted	10M	$0.18	$0.59	460 tok/s	95ms
Qwen3-Coder 2	alibaba/qwen3-coder-2	Alibaba	Hosted	256K	$0.55	$2.20	270 tok/s	150ms
Qwen3-VL 2	alibaba/qwen3-vl-2	Alibaba	Hosted	128K	$0.70	$2.80	240 tok/s	175ms
Mistral Large 3	mistral/mistral-large-3	Mistral	Hosted	256K	$1.80	$5.40	230 tok/s	165ms
DeepSeek R2	deepseek/deepseek-r2	DeepSeek	Hosted	256K	$0.50	$2.00	215 tok/s	220ms
GPT-5 Nano	openai/gpt-5-nano	OpenAI	Routed	128K	$0.10	$0.40	240 tok/s	130ms
Gemini 2.5 Flash	google/gemini-2.5-flash	Google	Routed	1M	$0.08	$0.30	192 tok/s	160ms
Claude Haiku 4.5	anthropic/claude-haiku-4.5	Anthropic	Routed	200K	$0.80	$4.00	168 tok/s	180ms
GPT-5 Mini	openai/gpt-5-mini	OpenAI	Routed	200K	$0.60	$2.40	145 tok/s	220ms
Command A	cohere/command-a	Cohere	Routed	256K	$2.50	$10.00	90 tok/s	410ms
Claude Sonnet 4.6	anthropic/claude-sonnet-4.6	Anthropic	Routed	500K	$3.00	$15.00	88 tok/s	410ms
Gemini 3 Pro	google/gemini-3-pro	Google	Routed	2M	$1.50	$12.00	82 tok/s	480ms
GPT-5	openai/gpt-5	OpenAI	Routed	400K	$5.00	$15.00	78 tok/s	480ms
Grok 4	xai/grok-4	xAI	Routed	256K	$5.00	$15.00	72 tok/s	520ms
Claude Opus 4.7	anthropic/claude-opus-4.7	Anthropic	Routed	1M	$15.00	$75.00	42 tok/s	920ms
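To compare rows in the table above for a specific workload, the following sketch estimates per-request cost and response time from the listed figures. It copies three rows from the table; the `ModelRow` class and both helper functions are hypothetical names introduced for illustration. The latency formula (TTFT plus output tokens divided by throughput) assumes the Throughput column is the per-request decode rate, which this page does not state, so treat the result as a ballpark figure. Any routing fee is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRow:
    model_id: str
    in_per_1m: float   # USD per 1M input tokens
    out_per_1m: float  # USD per 1M output tokens
    tok_per_s: float   # listed throughput, tokens per second
    ttft_ms: float     # listed time to first token, milliseconds


# Three rows copied from the table above.
ROWS = {
    row.model_id: row
    for row in (
        ModelRow("deepseek/deepseek-v4", 0.32, 1.28, 480, 95),
        ModelRow("microsoft/phi-5-22b", 0.09, 0.18, 760, 62),
        ModelRow("anthropic/claude-opus-4.7", 15.00, 75.00, 42, 920),
    )
}


def estimate_cost_usd(row: ModelRow, input_tokens: int, output_tokens: int) -> float:
    """Token cost at the listed per-1M prices (routing fee not included)."""
    total = input_tokens * row.in_per_1m + output_tokens * row.out_per_1m
    return total / 1_000_000


def estimate_latency_s(row: ModelRow, output_tokens: int) -> float:
    """Assumed model: time to first token, then steady decoding at the listed rate."""
    return row.ttft_ms / 1000 + output_tokens / row.tok_per_s


if __name__ == "__main__":
    # Example workload: 10,000 input tokens and 2,000 output tokens per request.
    for row in ROWS.values():
        cost = estimate_cost_usd(row, 10_000, 2_000)
        latency = estimate_latency_s(row, 2_000)
        print(f"{row.model_id}: ~${cost:.5f} per request, ~{latency:.1f} s")
```

For the example workload, the same request differs by well over an order of magnitude in both cost and estimated response time between the cheapest hosted row and the most expensive routed row, which is the kind of trade-off the table is meant to surface.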