luminet.ai
Blog

Engineering notes from the Luminet team

Inference benchmarks, kernel deep-dives, model launches, and lessons from running open-weight LLMs at production scale.

Models · May 11, 2026 · Marina Chen · 8 min

Why Qwen3-Next 80B beats Llama 3.3 70B at half the cost

Production benchmark of the new Qwen3-Next 3B-active MoE vs Meta's flagship 70B dense. 2× throughput, 6 of 8 eval wins, half the bill — and an Apache 2.0 license.

Engineering · May 8, 2026 · Sasha Petrov · 9 min

FireAttention v3: 2.4× faster inference for open-weight LLMs

Third-generation kernel ships today. FP8 fused prefill, speculative KV eviction, tree-based speculative decoding — and what each means in production.


More posts coming soon — subscribe via RSS for new releases.

© 2026 Luminet, Inc. All rights reserved.