Team

The engineers behind your engagement.

Solutions engagements aren't outsourced to a partner network. Every project is staffed by named engineers from the team that built FireAttention.

Founders

Marina Rojas

Co-founder & CEO

Marina led inference infrastructure at a frontier AI lab for three years, where she shipped the FP8 production stack used in their flagship models. Before that, she worked on compiler optimisation for TPUs at a hyperscaler research org. She started Luminet because no inference platform was selling 'we make your GPUs faster' as a real product.

Previously

Inference infra @ frontier lab, TPU compiler @ hyperscaler

Specialties

Production inference, Customer engagement, Hiring

Sasha Petrov

Co-founder & CTO

Sasha wrote the first version of FireAttention in his apartment over a long weekend in 2024. He spent four years on a major open-model lab's serving team, where he was a primary author of the FP8 fused attention kernel that shipped with their flagship release. He holds a PhD in systems from a top-3 US program.

Previously

Llama-class serving team, PhD CS

Specialties

CUDA kernels, Speculative decoding, Systems

Engineering

The people who get assigned to your engagement. Not a partner network. Not contractors.

Anya Vasiliev

Forward Deployed Engineering Lead

Anya runs our customer-facing engineering team. She was previously VP Infrastructure at a Series-D AI company, then led professional services at a Ray-based inference startup. She has shipped over 30 customer migrations.

Previously

Pro Services @ inference startup, VP Infra @ Series-D AI

Specialties

Migration engagements, Performance audits, On-prem deployment

Marina Chen

Inference Lead

Marina is our deepest expert on MoE inference. She spent a year on a frontier-lab serving team for a 685B-parameter MoE model, and worked on Llama serving before that. She is the author of the Qwen3-Next vs Llama 3.3 70B benchmark study.

Previously

Frontier-lab serving (MoE), Llama serving

Specialties

MoE expert routing, Quantization, Model benchmarking

Daniel Kahn

Kernel Engineer

Daniel writes the kernels you don't want to write yourself. He worked on a major hyperscaler's TensorRT-LLM stack, then on Triton compiler internals at a frontier lab. If your forward pass has a custom op, Daniel can probably make it 2× faster in a week.

Previously

Triton compiler @ frontier lab, TensorRT-LLM @ hyperscaler

Specialties

CUDA kernels, Triton, CUTLASS

Erika Brandt

Solutions Engineer

Erika owns customer migrations end-to-end. She came from a serverless compute platform where she built the LoRA-serving integration that became the model for our multi-LoRA product. Before AI, her background was in distributed systems.

Previously

Serverless compute platform, Streaming infrastructure

Specialties

Multi-LoRA, Customer integration, API design

Ravi Lakshmi

SRE Lead

Ravi keeps the lights on. He spent nine years at a major fintech in payments infrastructure, then three years building a multi-region inference fleet at an open-model serving startup. If a customer needs 99.99% availability, Ravi designs the architecture for it.

Previously

Multi-region inference @ open-model startup, Payments infra @ fintech

Specialties

Multi-region deployment, SLA design, Incident response

Julia Moreau

ML Performance Engineer

Julia evaluates every model we host. She came from a major model hub's evaluation team, where she led benchmark methodology. Her work shapes which models become 'featured' on the platform and which ones we recommend skipping.

Previously

Eval team @ model hub, ENS Paris

Specialties

Quality validation, Eval design, Quantization QA

Hiring

Want to be on the next case study?

We're hiring inference engineers, kernel writers, and customer-facing engineers. Equity-heavy comp, async-first, shipping every week.
