Most customers come for our engineering services. Self-serve platform pricing below — for project-based work, see Solutions.
Browse Solutions →Per-token pricing. No markup.
Open-weight models served on our infrastructure at the lowest published rates. Closed frontier models routed to upstream APIs with a flat 5% routing fee. Dedicated GPUs by the hour when you need guaranteed capacity.
Open-weight models served on Luminet
Per-token billing in arrears · billed monthly · no minimums
All prices in USD per 1M tokens. Throughput measured at batch 32 on 8× H100 SXM. Volume discounts kick in at $25K/month — see Enterprise.
Spending $25K+/month?
Volume discounts on per-token rates, dedicated capacity, custom SLA, single-tenant clusters, and a named TAM. Pre-paid annual commitments unlock additional savings.
Frequently asked
How is usage billed?+
We charge per 1M input/output tokens at the model's listed rate, plus a 5% routing fee. No minimums, no markups on most providers.
Can I bring my own API keys?+
Yes — Team and Enterprise plans support BYOK for major providers. Your usage isn't billed by Luminet, only the routing fee applies.
What happens when I hit my rate limit?+
Requests return HTTP 429 with a Retry-After header. Pro and above can request rate limit increases by emailing support.
Do credits expire?+
Free trial credits expire after 30 days. Paid plan credits roll over for 12 months.
Is there a free tier?+
Yes — every account gets $5 in credits to test all models, with no credit card required.
How does the smart routing work?+
We monitor latency, error rates, and price across providers in real-time. Failed requests automatically retry on a healthy backend within 200ms.
Where is my data sent?+
Requests are routed to the upstream provider's nearest region. We don't log prompt or completion bodies by default. Enterprise can pin regions.
Can I deploy on-prem?+
Enterprise customers can run Luminet inside their own VPC or on-prem. Contact sales for a deployment guide.