Luminet Audio
Transcription & text-to-speech, real-time.
Whisper variants and neural TTS, served with streaming latency under 300 ms. The endpoints follow the OpenAI API shape (/audio/transcriptions, /audio/speech).
Streaming TTS first-byte: 180 ms
STT real-time factor: 0.04×
Languages (Whisper): 99
Speech-to-text
| Model | Price (per audio hour) | RTF (lower = faster) | Languages |
|---|---|---|---|
| Whisper Large V3 (`audio/whisper-large-v3`) | $0.36 / hr | 0.18× | 99 |
| Whisper Large V3 Turbo (`audio/whisper-large-v3-turbo`) | $0.18 / hr | 0.06× | 99 |
| Distil-Whisper V3 (`audio/distil-whisper-v3`) | $0.12 / hr | 0.04× | 8 |
| Deepgram Nova-3, routed (`audio/deepgram-nova-3`) | $0.43 / hr | 0.08× | 30+ |
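To make the price and RTF columns concrete, here is a minimal sketch that estimates cost and processing time for a clip. The rates are copied from the table; the names `STT_MODELS` and `estimateStt` are illustrative, and the sketch assumes billing is proportional to audio duration with no rounding or minimum charge, which the table does not state.

```typescript
// Rates and real-time factors copied from the speech-to-text table.
interface SttModel {
  pricePerHourUsd: number;
  rtf: number; // processing time divided by audio duration
}

const STT_MODELS: Record<string, SttModel> = {
  "audio/whisper-large-v3": { pricePerHourUsd: 0.36, rtf: 0.18 },
  "audio/whisper-large-v3-turbo": { pricePerHourUsd: 0.18, rtf: 0.06 },
  "audio/distil-whisper-v3": { pricePerHourUsd: 0.12, rtf: 0.04 },
  "audio/deepgram-nova-3": { pricePerHourUsd: 0.43, rtf: 0.08 },
};

// Hypothetical helper: assumes pro-rata billing (an assumption, not documented).
function estimateStt(modelId: string, audioSeconds: number) {
  const model = STT_MODELS[modelId];
  if (!model) throw new Error(`Unknown STT model: ${modelId}`);
  return {
    costUsd: (audioSeconds / 3600) * model.pricePerHourUsd,
    processingSeconds: audioSeconds * model.rtf,
  };
}
```

For one hour of audio on `audio/whisper-large-v3-turbo`, this gives $0.18 and roughly 216 seconds of processing. Actual processing time will likely vary with load and audio content.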
Text-to-speech
| Model | Price (per 1M chars) | Streaming latency | Voices |
|---|---|---|---|
| Kokoro 82M (`audio/kokoro-82m`) | $5 / 1M chars | 180 ms | 54 |
| XTTS v2 (`audio/xtts-v2`) | $8 / 1M chars | 220 ms | Custom (clone) |
| ElevenLabs v3, routed (`audio/elevenlabs-v3`) | $22 / 1M chars | 240 ms | 5,000+ |
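The per-character pricing can be estimated the same way. This is a rough sketch: `estimateTtsCostUsd` is a hypothetical helper, and it assumes characters are counted as JavaScript string length (UTF-16 code units), which may differ from how the service actually counts them.

```typescript
// Prices in USD per 1M characters, copied from the text-to-speech table.
const TTS_PRICE_PER_MILLION_CHARS_USD: Record<string, number> = {
  "audio/kokoro-82m": 5,
  "audio/xtts-v2": 8,
  "audio/elevenlabs-v3": 22,
};

// Hypothetical helper; character counting by string length is an assumption.
function estimateTtsCostUsd(modelId: string, text: string): number {
  const price = TTS_PRICE_PER_MILLION_CHARS_USD[modelId];
  if (price === undefined) throw new Error(`Unknown TTS model: ${modelId}`);
  return (text.length / 1e6) * price;
}
```

Under these assumptions, 200,000 characters on `audio/kokoro-82m` would come to about $1.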
Voice agents through one client
```ts
// Transcribe → LLM → Speak, all in 800 ms
import fs from "node:fs";
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.luminet.ai/v1",
  apiKey: process.env.LUMINET_API_KEY,
});

// Audio to transcribe; "input.wav" is a placeholder path.
const audioFile = fs.createReadStream("input.wav");

// 1. STT: transcribe the audio
const transcript = await client.audio.transcriptions.create({
  file: audioFile,
  model: "audio/whisper-large-v3-turbo",
});

// 2. LLM: answer the transcribed text
const reply = await client.chat.completions.create({
  model: "deepseek/deepseek-v3.2-exp",
  messages: [{ role: "user", content: transcript.text }],
});

// 3. TTS: speak the reply (content can be null, so fall back to "")
const speech = await client.audio.speech.create({
  model: "audio/kokoro-82m",
  voice: "luminet/em",
  input: reply.choices[0].message.content ?? "",
});
```
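To check the 800 ms figure against your own traffic, one option is to time each stage separately. The sketch below is an illustration, not part of the Luminet or OpenAI SDKs: `timed`, `totalMs`, and `Timings` are hypothetical names, and the helper measures full request time per stage rather than first-byte latency.

```typescript
// Maps a stage label ("stt", "llm", "tts") to elapsed milliseconds.
type Timings = Record<string, number>;

// Hypothetical helper: runs one pipeline stage and records its wall-clock time.
async function timed<T>(
  label: string,
  timings: Timings,
  stage: () => Promise<T>,
): Promise<T> {
  const start = Date.now();
  try {
    return await stage();
  } finally {
    // Recorded even if the stage throws, so failed calls still show up.
    timings[label] = Date.now() - start;
  }
}

// Sum of all recorded stages, in milliseconds.
function totalMs(timings: Timings): number {
  return Object.keys(timings).reduce((sum, key) => sum + timings[key], 0);
}
```

Each call in the example can then be wrapped, for instance `await timed("stt", timings, () => client.audio.transcriptions.create({ file: audioFile, model: "audio/whisper-large-v3-turbo" }))`, and `totalMs(timings)` compared with the budget once all three stages finish.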