Luminet Audio

Transcription & text-to-speech, real-time.

Whisper variants and neural TTS, served with sub-300 ms streaming latency behind the same OpenAI-compatible endpoints (/audio/transcriptions, /audio/speech).


Streaming TTS first byte: 180 ms

STT real-time factor: 0.04×

Languages (Whisper): 99
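Real-time factor (RTF) is processing time divided by audio duration, so an RTF below 1 means faster than real time. Below is a minimal sketch of what the 0.04× figure implies. It assumes processing time scales linearly with audio length, although queueing and network overhead would add to it in practice. `processingSeconds` is a hypothetical helper, not part of any SDK.

```javascript
// Real-time factor (RTF) = processing time / audio duration.
// Assumption: processing time scales linearly with audio length.
function processingSeconds(audioSeconds, rtf) {
  if (audioSeconds < 0 || rtf <= 0) {
    throw new RangeError("audioSeconds must be >= 0 and rtf > 0");
  }
  return audioSeconds * rtf;
}

// One hour of audio at the 0.04× RTF quoted above.
console.log(processingSeconds(3600, 0.04).toFixed(1)); // 144.0
```

In other words, an hour-long recording would take roughly two and a half minutes to transcribe at that rate.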

Speech-to-text

Model (ID)	Price (per audio hour)	RTF (lower = faster)	Languages
Whisper Large V3 (audio/whisper-large-v3)	$0.36 / hr	0.18×	99
Whisper Large V3 Turbo (audio/whisper-large-v3-turbo)	$0.18 / hr	0.06×	99
Distil-Whisper V3 (audio/distil-whisper-v3)	$0.12 / hr	0.04×	8
Deepgram Nova-3, routed (audio/deepgram-nova-3)	$0.43 / hr	0.08×	30+
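The price and RTF columns combine into a rough per-file estimate. This sketch copies the figures from the table above. The `estimateStt` helper is hypothetical, and it assumes pro-rata billing by the second, which the table does not state.

```javascript
// Prices (USD per audio hour) and RTFs copied from the speech-to-text table.
// Assumption: billing is pro-rata by the second, with no minimum charge.
const STT_MODELS = {
  "audio/whisper-large-v3": { usdPerHour: 0.36, rtf: 0.18 },
  "audio/whisper-large-v3-turbo": { usdPerHour: 0.18, rtf: 0.06 },
  "audio/distil-whisper-v3": { usdPerHour: 0.12, rtf: 0.04 },
  "audio/deepgram-nova-3": { usdPerHour: 0.43, rtf: 0.08 },
};

// Estimate cost and processing time for a file of the given length,
// assuming processing time scales linearly with the model's RTF.
function estimateStt(modelId, audioSeconds) {
  const model = STT_MODELS[modelId];
  if (!model) throw new Error(`Unknown STT model: ${modelId}`);
  const hours = audioSeconds / 3600;
  return {
    costUsd: hours * model.usdPerHour,
    processingSeconds: audioSeconds * model.rtf,
  };
}

// A 90-minute recording on Whisper Large V3 Turbo.
const est = estimateStt("audio/whisper-large-v3-turbo", 90 * 60);
console.log(est.costUsd.toFixed(2), est.processingSeconds.toFixed(0)); // 0.27 324
```

Keeping the figures in one lookup object makes it easy to compare models side by side before choosing one for a workload.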

Text-to-speech

Model (ID)	Price (per 1M chars)	Streaming latency	Voices
Kokoro 82M (audio/kokoro-82m)	$5 / 1M chars	180 ms	54
XTTS v2 (audio/xtts-v2)	$8 / 1M chars	220 ms	Custom (clone)
ElevenLabs v3, routed (audio/elevenlabs-v3)	$22 / 1M chars	240 ms	5,000+
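TTS is billed per character, so the cost of a reply is its length multiplied by the per-million rate. Below is a sketch using the table's prices. `estimateTtsCost` is a hypothetical helper, and treating each Unicode code point as one billable character is an assumption.

```javascript
// Prices (USD per 1M characters) copied from the text-to-speech table.
// Assumption: billable characters are Unicode code points, so an emoji
// counts once rather than as two UTF-16 units.
const TTS_USD_PER_MILLION_CHARS = {
  "audio/kokoro-82m": 5,
  "audio/xtts-v2": 8,
  "audio/elevenlabs-v3": 22,
};

function estimateTtsCost(modelId, text) {
  const price = TTS_USD_PER_MILLION_CHARS[modelId];
  if (price === undefined) throw new Error(`Unknown TTS model: ${modelId}`);
  const chars = [...text].length; // code points, not UTF-16 units
  return (chars / 1_000_000) * price;
}

// A 2,000-character reply on each model.
const sampleReply = "a".repeat(2000);
for (const id of Object.keys(TTS_USD_PER_MILLION_CHARS)) {
  console.log(id, estimateTtsCost(id, sampleReply).toFixed(3));
}
// audio/kokoro-82m 0.010
// audio/xtts-v2 0.016
// audio/elevenlabs-v3 0.044
```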

Voice agents through one API

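// Transcribe → LLM → Speak, all in 800 ms
import fs from "node:fs";
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.luminet.ai/v1",
  apiKey: process.env.LUMINET_API_KEY,
});

// Input recording to transcribe (the file name is a placeholder)
const audioFile = fs.createReadStream("question.wav");

// 1. STT: audio → text
const transcript = await client.audio.transcriptions.create({
  file: audioFile,
  model: "audio/whisper-large-v3-turbo",
});

// 2. LLM: transcript → reply
const reply = await client.chat.completions.create({
  model: "deepseek/deepseek-v3.2-exp",
  messages: [{ role: "user", content: transcript.text }],
});

// 3. TTS: reply → speech (message content can be null, so fall back to "")
const speech = await client.audio.speech.create({
  model: "audio/kokoro-82m",
  voice: "luminet/em",
  input: reply.choices[0].message.content ?? "",
});

// The response body is the encoded audio; save it to disk
// (MP3 is assumed here as the default output format)
const audio = Buffer.from(await speech.arrayBuffer());
await fs.promises.writeFile("reply.mp3", audio);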
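Waiting for the full LLM reply before calling TTS adds latency to every turn. One common approach, which the snippet above does not use, is to stream the chat completion (`stream: true` in the OpenAI SDK, assuming the Luminet endpoint supports it for the chosen model) and send each completed sentence to /audio/speech as soon as it arrives. Below is a minimal sketch of the sentence-splitting part. `SentenceChunker` is a hypothetical helper, and its rule is deliberately simple: `.`, `!` or `?` followed by whitespace ends a sentence.

```javascript
// Hypothetical helper: buffers streamed text deltas and emits complete
// sentences. A sentence ends at ".", "!" or "?" followed by whitespace;
// abbreviations such as "e.g. " are split too (a known simplification).
class SentenceChunker {
  constructor() {
    this.buffer = "";
  }

  // Append a delta and return any sentences that are now complete.
  push(delta) {
    this.buffer += delta;
    const sentences = [];
    const boundary = /[.!?]\s+/g;
    let start = 0;
    let match;
    while ((match = boundary.exec(this.buffer)) !== null) {
      const end = match.index + 1; // keep the punctuation mark
      sentences.push(this.buffer.slice(start, end).trim());
      start = match.index + match[0].length;
    }
    this.buffer = this.buffer.slice(start); // keep the unfinished tail
    return sentences.filter((s) => s.length > 0);
  }

  // Return whatever is left once the stream has ended.
  flush() {
    const rest = this.buffer.trim();
    this.buffer = "";
    return rest ? [rest] : [];
  }
}

// Simulated deltas, as they might arrive from a streamed chat completion.
const chunker = new SentenceChunker();
const out = [];
for (const delta of ["Hi there! How ", "can I help? I can ", "book a table"]) {
  out.push(...chunker.push(delta));
}
out.push(...chunker.flush());
console.log(JSON.stringify(out)); // ["Hi there!","How can I help?","I can book a table"]
```

A sentence is held back until whitespace follows its final punctuation, so a delta that ends in "." mid-stream is not sent to TTS prematurely.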