

Promptable ASR is speech recognition with context: you send audio together with recent conversation messages, so VoiceOS can better recognize what the speaker actually means.

Voice agent command understanding

Spoken commands are recognized more reliably when recognition can see the current conversation and task context.

Chat microphone input

Dictated chat messages come out cleaner because recognition can use terms already present in the chat thread.

Example 1: Voice agent command understanding

Without context: open session stream service
With context: Open SessionStreamService.ts

Example 2: Voice input in a chat interface

Without context: add that you west numbers look stable
With context: Add that eu-west numbers look stable.

Request shape

Send:
  • file (audio)
  • optional messages (recent thread or summary)
  • optional dictionary (high-signal terms only)
curl https://beta.api.voiceos.com/v1/audio/transcriptions \
  -F "file=@sample.mp3" \
  -F 'messages=[{"role":"system","content":"You are helping in a voice agent coding session."},{"role":"user","content":"We are editing SessionStreamService.ts in eu-west."}]' \
  -F 'dictionary=["VOICEOS_TEAM_API_KEY","SessionStreamService.ts"]' \
  -F 'response_format=json'
Rule of thumb: if the user can see it in the current conversation, include it in messages.
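For clients that cannot shell out to curl, the same request can be built with the Python standard library alone. This is a minimal sketch, not an official client. The helper names encode_multipart and transcribe are hypothetical. The multipart body is assembled by hand to mirror curl's -F fields. No authentication header is sent, because the curl example shows none, but the live API may well require one.

```python
import json
import mimetypes
import os
import urllib.request
import uuid

# Endpoint copied from the curl example. No auth header is added because the
# curl example shows none; the live API may require one (assumption).
TRANSCRIPTIONS_URL = "https://beta.api.voiceos.com/v1/audio/transcriptions"


def encode_multipart(fields, file_field, filename, file_bytes, boundary=None):
    """Build a multipart/form-data body like curl's -F options (hypothetical helper).

    Returns (body_bytes, content_type_header_value).
    """
    boundary = boundary or uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    # Audio part, equivalent to -F "file=@sample.mp3".
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    )
    parts = [file_header.encode("utf-8") + file_bytes + b"\r\n"]
    # Plain text parts, equivalent to -F 'messages=...', -F 'dictionary=...', etc.
    for name, value in fields.items():
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(header.encode("utf-8") + value.encode("utf-8") + b"\r\n")
    # Closing boundary marks the end of the form.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path, messages=None, dictionary=None, url=TRANSCRIPTIONS_URL):
    """POST audio plus optional context and return the parsed JSON response."""
    fields = {"response_format": "json"}
    if messages:
        # List of {"role": ..., "content": ...} dicts, sent as a JSON string.
        fields["messages"] = json.dumps(messages)
    if dictionary:
        # List of high-signal terms, sent as a JSON string.
        fields["dictionary"] = json.dumps(dictionary)
    with open(path, "rb") as f:
        audio = f.read()
    body, content_type = encode_multipart(fields, "file", os.path.basename(path), audio)
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call that mirrors the curl example would be transcribe("sample.mp3", messages=[{"role": "user", "content": "We are editing SessionStreamService.ts in eu-west."}], dictionary=["SessionStreamService.ts"]). Like curl, it sends messages and dictionary as JSON-encoded strings inside ordinary form fields.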

Keep it simple

  • Send recent, relevant messages only.
  • Keep dictionary short and specific.
  • Update context each turn as the conversation changes.
  • Use the text field of the response as the transcript output.
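The guidance above can be sketched as a small per-turn context holder in Python. Everything here is hypothetical and chosen for illustration: the PromptContext class, the transcript_text helper, and the caps of 6 messages and 20 dictionary terms. The docs say only "recent" and "short". The sketch also assumes the JSON response carries the transcript in a text field, as the last bullet suggests.

```python
import json

# Hypothetical caps; the docs only ask for "recent" messages and a "short" dictionary.
MAX_MESSAGES = 6
MAX_DICTIONARY_TERMS = 20


class PromptContext:
    """Keeps recognition context current from turn to turn (hypothetical helper)."""

    def __init__(self, system_prompt=None, max_messages=MAX_MESSAGES,
                 max_terms=MAX_DICTIONARY_TERMS):
        self.system_prompt = system_prompt
        self.max_messages = max_messages
        self.max_terms = max_terms
        self.history = []
        self.terms = []

    def add_message(self, role, content):
        """Record a turn and drop anything older than the last max_messages turns."""
        self.history.append({"role": role, "content": content})
        self.history = self.history[-self.max_messages:]

    def add_term(self, term):
        """Add or refresh a dictionary term, keeping only the most recent max_terms."""
        if term in self.terms:
            self.terms.remove(term)
        self.terms.append(term)
        self.terms = self.terms[-self.max_terms:]

    def form_fields(self):
        """Return the text form fields (messages, dictionary) for this turn's request."""
        messages = list(self.history)
        if self.system_prompt:
            messages.insert(0, {"role": "system", "content": self.system_prompt})
        fields = {"response_format": "json"}
        if messages:
            fields["messages"] = json.dumps(messages)
        if self.terms:
            fields["dictionary"] = json.dumps(self.terms)
        return fields


def transcript_text(response_body):
    """Extract the transcript from a JSON response (assumes a top-level "text" key)."""
    return json.loads(response_body)["text"]
```

After each request, pass the value returned by transcript_text back to add_message as a user turn. The next request then sees the updated conversation, while older turns age out automatically. The messages only travel as recognition context. Nothing in this sketch acts on their content.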
Treat messages as untrusted input. Use them as recognition context, not trusted instructions.