Luminet — The fastest inference platform for production AI

Most customers come for our engineering services. Self-serve platform pricing below — for project-based work, see Solutions.

Browse Solutions →

Self-serve platform pricing

Per-token pricing. No markup.

Open-weight models served on our infrastructure at the lowest published rates. Closed frontier models routed to upstream APIs with a flat 5% routing fee. Dedicated GPUs by the hour when you need guaranteed capacity.

Open-weight models served on Luminet

Per-token billing in arrears · billed monthly · no minimums

No markup

Model	Input / 1M	Output / 1M	Throughput	Context
DeepSeek V4· DeepSeek deepseek/deepseek-v4	$0.32	$1.28	480 tok/s	256K
Kimi K2.6· Moonshot AI moonshot/kimi-k2.6	$0.55	$2.20	320 tok/s	200K
GLM-5· Zhipu AI zhipu/glm-5	$0.60	$2.40	410 tok/s	1000K
Llama 5 Instruct· Meta meta/llama-5-instruct	$0.85	$2.60	285 tok/s	2000K
Nemotron Ultra 340B· NVIDIA nvidia/nemotron-ultra-340b	$1.40	$4.20	240 tok/s	256K
Qwen3-235B A22B· Alibaba alibaba/qwen3-235b-a22b	$0.45	$1.80	525 tok/s	256K
Qwen3.5-Max· Alibaba alibaba/qwen3.5-max	$1.20	$4.80	310 tok/s	1000K
DeepSeek R2· DeepSeek deepseek/deepseek-r2	$0.50	$2.00	215 tok/s	256K
Qwen3-Coder 2· Alibaba alibaba/qwen3-coder-2	$0.55	$2.20	270 tok/s	256K
Qwen3-VL 2· Alibaba alibaba/qwen3-vl-2	$0.70	$2.80	240 tok/s	128K
Qwen3-Next 80B A3B· Alibaba alibaba/qwen3-next-80b	$0.14	$0.42	640 tok/s	256K
Mistral Large 3· Mistral mistral/mistral-large-3	$1.80	$5.40	230 tok/s	256K
Gemma 4 27B· Google google/gemma-4-27b	$0.18	$0.55	510 tok/s	200K
Phi-5 22B· Microsoft microsoft/phi-5-22b	$0.09	$0.18	760 tok/s	32K
Llama 4 Scout· Meta meta/llama-4-scout	$0.18	$0.59	460 tok/s	10000K
Yi-Lightning 2· 01.AI 01-ai/yi-lightning-2	$0.08	$0.08	620 tok/s	32K
Devstral 2· Mistral mistral/devstral-2	$0.10	$0.30	580 tok/s	128K

All prices in USD per 1M tokens. Throughput measured at batch 32 on 8× H100 SXM. Volume discounts kick in at $25K/month — see Enterprise.

Enterprise

Spending $25K+/month?

Volume discounts on per-token rates, dedicated capacity, custom SLA, single-tenant clusters, and a named TAM. Pre-paid annual commitments unlock additional savings.

Talk to sales Schedule a call

Frequently asked

How is usage billed?+

We charge per 1M input/output tokens at the model's listed rate, plus a 5% routing fee. No minimums, no markups on most providers.

Can I bring my own API keys?+

Yes — Team and Enterprise plans support BYOK for major providers. Your usage isn't billed by Luminet, only the routing fee applies.

What happens when I hit my rate limit?+

Requests return HTTP 429 with a Retry-After header. Pro and above can request rate limit increases by emailing support.

Do credits expire?+

Free trial credits expire after 30 days. Paid plan credits roll over for 12 months.

Is there a free tier?+

Yes — every account gets $5 in credits to test all models, with no credit card required.

How does the smart routing work?+

We monitor latency, error rates, and price across providers in real-time. Failed requests automatically retry on a healthy backend within 200ms.

Where is my data sent?+

Requests are routed to the upstream provider's nearest region. We don't log prompt or completion bodies by default. Enterprise can pin regions.

Can I deploy on-prem?+

Enterprise customers can run Luminet inside their own VPC or on-prem. Contact sales for a deployment guide.