Team

The engineers behind your engagement.

Solutions engagements aren't outsourced to a partner network. Every project is staffed by named engineers from the team that built FireAttention.

Founders

Marina Rojas

Co-founder & CEO

Founder

Marina led inference infrastructure at a frontier AI lab for three years, where she shipped the FP8 production stack used in its flagship models. Before that, she worked on TPU compiler optimisation at a hyperscaler research org. She started Luminet because no inference platform was selling 'we make your GPUs faster' as a real product.

Previously
Inference infra @ frontier lab, TPU compiler @ hyperscaler
Specialties
Production inference, Customer engagement, Hiring

Sasha Petrov

Co-founder & CTO

Founder

Sasha wrote the first version of FireAttention in his apartment over a long weekend in 2024. He spent four years on a major open-model lab's serving team, where he was a primary author of the FP8 fused attention kernel that shipped with their flagship release. He holds a PhD in systems from a top-3 US program.

Previously
Llama-class serving team, PhD CS
Specialties
CUDA kernels, Speculative decoding, Systems

Engineering

The people who get assigned to your engagement. Not a partner network. Not contractors.

Anya Vasiliev

Forward Deployed Engineering Lead

Anya runs our customer-facing engineering team. She was previously VP Infrastructure at a Series-D AI company, then led professional services at a Ray-based inference startup. She's shipped over 30 customer migrations.

Previously
Pro Services @ inference startup, VP Infra @ Series-D AI
Specialties
Migration engagements, Performance audits, On-prem deployment

Marina Chen

Inference Lead

Marina is our deepest expert on MoE inference. She spent a year on a frontier-lab serving team for a 685B-parameter MoE model, and worked on Llama serving before that. She is the author of the Qwen3-Next vs Llama 3.3 70B benchmark study.

Previously
Frontier-lab serving (MoE), Llama serving
Specialties
MoE expert routing, Quantization, Model benchmarking

Daniel Kahn

Kernel Engineer

Daniel writes the kernels you don't want to write yourself. He worked on a major hyperscaler's TensorRT-LLM stack, then on Triton compiler internals at a frontier lab. If your forward pass has a custom op, Daniel can probably make it 2× faster in a week.

Previously
Triton compiler @ frontier lab, TensorRT-LLM @ hyperscaler
Specialties
CUDA kernels, Triton, CUTLASS

Erika Brandt

Solutions Engineer

Erika owns customer migrations end-to-end. She came from a serverless compute platform where she built the LoRA-serving integration that became the model for our multi-LoRA product. Her background before AI is in distributed systems.

Previously
Serverless compute platform, Streaming infrastructure
Specialties
Multi-LoRA, Customer integration, API design

Ravi Lakshmi

SRE Lead

Ravi keeps the lights on. He spent nine years at a major fintech in payments infrastructure, then three years building a multi-region inference fleet at an open-model serving startup. If a customer needs 99.99% availability, Ravi designs the architecture for it.

Previously
Multi-region inference @ open-model startup, Payments infra @ fintech
Specialties
Multi-region deployment, SLA design, Incident response

Julia Moreau

ML Performance Engineer

Julia evaluates every model we host. She came from a major model hub's evaluation team, where she led benchmark methodology. Her work shapes which models become 'featured' on the platform and which ones we recommend skipping.

Previously
Eval team @ model hub, ENS Paris
Specialties
Quality validation, Eval design, Quantization QA
Hiring

Want to be on the next case study?

We're hiring inference engineers, kernel writers, and customer-facing engineers. Equity-heavy comp, async-first, shipping every week.