Blog

Engineering notes from the Luminet team

Inference benchmarks, kernel deep-dives, model launches, and lessons from running open-weight LLMs at production scale.

Models · May 11, 2026 · Marina Chen · 8 min

Why Qwen3-Next 80B beats Llama 3.3 70B at half the cost

Production benchmark of the new Qwen3-Next MoE (3B active parameters) vs. Meta's flagship 70B dense model: 2× throughput, 6 of 8 eval wins, half the bill, and an Apache 2.0 license.

Engineering · May 8, 2026 · Sasha Petrov · 9 min

FireAttention v3: 2.4× faster inference for open-weight LLMs

The third-generation kernel ships today: FP8 fused prefill, speculative KV eviction, tree-based speculative decoding, and what each means in production.

More posts coming soon. Subscribe via RSS for new releases.