Luminet Audio

Transcription & text-to-speech, real-time.

Whisper variants and neural TTS, served at sub-300 ms streaming latency. Both are exposed through OpenAI-compatible endpoints (/audio/transcriptions, /audio/speech).
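Because the routes follow the OpenAI shape, a request can in principle be assembled without any SDK. The sketch below builds (but does not send) a POST to /audio/speech. It assumes the JSON body fields (`model`, `voice`, `input`) match OpenAI's speech contract and that OpenAI-style Bearer authentication applies; `buildSpeechRequest` is a hypothetical helper for illustration, not part of any Luminet SDK.

```typescript
// Hypothetical helper: describes a POST to /audio/speech without sending it.
const BASE_URL = "https://api.luminet.ai/v1"; // base URL as shown on this page

interface SpeechRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildSpeechRequest(apiKey: string, input: string, voice: string): SpeechRequest {
  return {
    url: `${BASE_URL}/audio/speech`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`, // assumption: OpenAI-style bearer auth
        "Content-Type": "application/json",
      },
      // Assumption: body fields mirror OpenAI's /audio/speech request.
      body: JSON.stringify({ model: "audio/kokoro-82m", voice, input }),
    },
  };
}

// Usage (needs network access and a real key):
//   const { url, init } = buildSpeechRequest(key, "Hello", "luminet/em");
//   const audio = await fetch(url, init).then((r) => r.arrayBuffer());
```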

Streaming TTS first byte: 180 ms
STT real-time factor: 0.04×
Languages (Whisper): 99

Speech-to-text

| Model | Model ID | Price (per audio hour) | RTF (lower = faster) | Languages |
| --- | --- | --- | --- | --- |
| Whisper Large V3 | audio/whisper-large-v3 | $0.36 / hr | 0.18× | 99 |
| Whisper Large V3 Turbo | audio/whisper-large-v3-turbo | $0.18 / hr | 0.06× | 99 |
| Distil-Whisper V3 | audio/distil-whisper-v3 | $0.12 / hr | 0.04× | 8 |
| Deepgram Nova-3 (routed) | audio/deepgram-nova-3 | $0.43 / hr | 0.08× | 30+ |
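To make the price and RTF columns concrete, here is a rough estimator. It assumes billing scales linearly with audio duration and that RTF means processing time divided by audio duration (the usual definition, consistent with "lower = faster"); actual billing granularity is not stated on this page. `estimateStt` is a hypothetical helper name.

```typescript
// Prices and RTFs copied from the speech-to-text table.
const STT_MODELS = {
  "audio/whisper-large-v3": { usdPerHour: 0.36, rtf: 0.18 },
  "audio/whisper-large-v3-turbo": { usdPerHour: 0.18, rtf: 0.06 },
  "audio/distil-whisper-v3": { usdPerHour: 0.12, rtf: 0.04 },
  "audio/deepgram-nova-3": { usdPerHour: 0.43, rtf: 0.08 },
} as const;

type SttModel = keyof typeof STT_MODELS;

// Assumptions: linear per-second billing; RTF = processing time / audio duration.
function estimateStt(model: SttModel, audioSeconds: number) {
  const { usdPerHour, rtf } = STT_MODELS[model];
  return {
    costUsd: (audioSeconds / 3600) * usdPerHour,
    processingSeconds: audioSeconds * rtf,
  };
}

// A 30-minute recording on Whisper Large V3 Turbo.
const sttEstimate = estimateStt("audio/whisper-large-v3-turbo", 1800);
console.log(sttEstimate.costUsd.toFixed(2), sttEstimate.processingSeconds.toFixed(0));
```

Under these assumptions, the 30-minute recording costs roughly $0.09 and takes a little under two minutes to transcribe.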

Text-to-speech

| Model | Model ID | Price (per 1M chars) | Streaming latency | Voices |
| --- | --- | --- | --- | --- |
| Kokoro 82M | audio/kokoro-82m | $5 / 1M chars | 180 ms | 54 |
| XTTS v2 | audio/xtts-v2 | $8 / 1M chars | 220 ms | Custom (clone) |
| ElevenLabs v3 (routed) | audio/elevenlabs-v3 | $22 / 1M chars | 240 ms | 5,000+ |
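Per-character pricing is easier to reason about with a quick calculation. The sketch below is illustrative only: it counts characters with `string.length` (UTF-16 code units), which is an assumption, since the page does not say how characters are counted for billing. `ttsCostUsd` is a hypothetical helper name.

```typescript
// Prices and streaming latencies copied from the text-to-speech table.
const TTS_MODELS = [
  { id: "audio/kokoro-82m", usdPerMillionChars: 5, streamingLatencyMs: 180 },
  { id: "audio/xtts-v2", usdPerMillionChars: 8, streamingLatencyMs: 220 },
  { id: "audio/elevenlabs-v3", usdPerMillionChars: 22, streamingLatencyMs: 240 },
];

function ttsCostUsd(modelId: string, text: string): number {
  const model = TTS_MODELS.find((m) => m.id === modelId);
  if (!model) throw new Error(`unknown TTS model: ${modelId}`);
  // Assumption: one billed character per UTF-16 code unit.
  return (text.length / 1e6) * model.usdPerMillionChars;
}

// A 500-character reply spoken with each model.
const sampleReply = "x".repeat(500);
for (const m of TTS_MODELS) {
  console.log(m.id, ttsCostUsd(m.id, sampleReply).toFixed(4));
}
```

At these rates a 500-character reply costs between about a quarter of a cent (Kokoro) and about one cent (ElevenLabs v3).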

Voice agents through one client

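// Transcribe → LLM → Speak, all in 800 ms
import fs from "node:fs";
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.luminet.ai/v1",
  apiKey: process.env.LUMINET_API_KEY,
});

// 1. STT: transcribe the caller's audio ("input.wav" is a placeholder path)
const audioFile = fs.createReadStream("input.wav");
const transcript = await client.audio.transcriptions.create({
  file: audioFile,
  model: "audio/whisper-large-v3-turbo",
});

// 2. LLM: generate a reply to the transcript
const reply = await client.chat.completions.create({
  model: "deepseek/deepseek-v3.2-exp",
  messages: [{ role: "user", content: transcript.text }],
});

// message.content can be null, so check before passing it to TTS
const replyText = reply.choices[0].message.content;
if (!replyText) throw new Error("LLM returned no text to speak");

// 3. TTS: synthesize the reply
const speech = await client.audio.speech.create({
  model: "audio/kokoro-82m",
  voice: "luminet/em",
  input: replyText,
});
// speech is a fetch-style Response; Buffer.from(await speech.arrayBuffer()) yields the audio bytes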
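To check an end-to-end figure like the 800 ms quoted above against your own traffic, it helps to time each stage separately. The following is a minimal sketch, not part of any SDK: `runTimed` and the `Stage` type are hypothetical names, and the stages are passed in as plain async functions so the helper can wrap the three calls above or be exercised with stand-ins.

```typescript
// Hypothetical helper: runs async stages in order, feeding each output into the
// next stage, and records the wall-clock time spent in each one.
type Stage = { name: string; run: (input: unknown) => Promise<unknown> };

async function runTimed(stages: Stage[], input: unknown) {
  const timings: Record<string, number> = {};
  let value = input;
  for (const stage of stages) {
    const start = Date.now();
    value = await stage.run(value);
    timings[stage.name] = Date.now() - start; // milliseconds
  }
  const totalMs = Object.values(timings).reduce((sum, ms) => sum + ms, 0);
  return { output: value, timings, totalMs };
}

// Usage with stand-in stages (swap in the STT, LLM, and TTS calls):
//   const result = await runTimed(
//     [
//       { name: "stt", run: async (audio) => "transcribed text" },
//       { name: "llm", run: async (text) => `reply to: ${String(text)}` },
//       { name: "tts", run: async (text) => new Uint8Array(0) },
//     ],
//     audioFile,
//   );
//   console.log(result.timings, result.totalMs <= 800);
```

Because the three stages run sequentially, the total is the sum of the stage times; the per-stage breakdown shows which model to swap if the budget is exceeded.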