Blog
Engineering notes from the Luminet team
Inference benchmarks, kernel deep-dives, model launches, and lessons from running open-weight LLMs at production scale.
Models · May 11, 2026 · Marina Chen · 8 min
Why Qwen3-Next 80B beats Llama 3.3 70B at half the cost
Production benchmark of the new Qwen3-Next 3B-active MoE vs Meta's flagship 70B dense. 2× throughput, 6 of 8 eval wins, half the bill — and an Apache 2.0 license.
Engineering · May 8, 2026 · Sasha Petrov · 9 min
FireAttention v3: 2.4× faster inference for open-weight LLMs
Third-generation kernel ships today. FP8 fused prefill, speculative KV eviction, tree-based speculative decoding — and what each means in production.
More posts coming soon. Subscribe via RSS for new releases.