VoiceOS API turns speech into polished, context-aware text. Unlike a plain ASR endpoint, VoiceOS accepts both audio and an OpenAI-style message stack, then uses that context to recognize rare terms and correct likely transcription misses.Documentation Index
Fetch the complete documentation index at: https://docs.voiceos.com/llms.txt
Use this file to discover all available pages before exploring further.
Promptable ASR
Send conversation context with audio so the API understands product names, files, identifiers, and domain language.
Fast keyword extraction
Extract ASR vocabulary locally in under 10ms without an LLM or network call.
OpenAI-like request shape
Use multipart uploads with
file, messages, languages, and optional vocabulary.Polished transcripts
Get
raw_text from ASR and final text after LLM polish.What you can build
- Voice agents that understand the active conversation.
- Developer tools that recognize filenames, symbols, and code identifiers.
- Meeting or support transcription with product-specific vocabulary.
- Dictation experiences that return clean, polished text instead of raw ASR.
Current status
The current implementation is a local developer prototype:- Endpoint:
POST /v1/audio/transcriptions - Local base URL:
http://localhost:3000 - Auth: none in non-production local development
- Production API keys, billing, rate limits, and public deployment are not enabled yet
The public API contract is intentionally designed to look familiar to developers using OpenAI-style message arrays, while preserving VoiceOS-specific promptable ASR behavior.

