Voice Protocol
The message protocol used over the UpVoot WebSocket connection.
The UpVoot voice protocol is designed to be minimal. Most integrations only need to stream binary audio frames — the JSON control messages are optional signals you can listen to for logging, UI updates, or custom logic.
Message types
Binary frames
Every binary WebSocket frame is a chunk of raw PCM audio. There is no framing header — the entire payload is audio data.
- Client → server: Caller audio (microphone input)
- Server → client: Agent audio (to be played to the caller)
You can send chunks of any size. Smaller chunks (e.g. 20 ms, which is 640 bytes) reduce latency; larger chunks (e.g. 100 ms) reduce per-frame overhead. We recommend chunks of 20–40 ms.
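The chunk-size arithmetic can be sketched as follows. The figure of 640 bytes per 20 ms works out to 32,000 bytes per second, which matches 16-bit mono PCM at 16 kHz. This page does not state the audio format, so the sample rate and sample width below are assumptions, and `chunkBytes` and `splitPcm` are hypothetical helper names rather than part of the platform.

```javascript
// Assumed format: 16-bit mono PCM at 16 kHz (consistent with 20 ms = 640 bytes).
const SAMPLE_RATE_HZ = 16000;
const BYTES_PER_SAMPLE = 2;

// Number of bytes in a chunk of the given duration.
function chunkBytes(durationMs) {
  return (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * durationMs) / 1000;
}

// Split a Uint8Array of PCM data into chunks of the given duration.
// The final chunk may be shorter than the others.
function splitPcm(pcm, durationMs) {
  const size = chunkBytes(durationMs);
  const chunks = [];
  for (let offset = 0; offset < pcm.byteLength; offset += size) {
    chunks.push(pcm.subarray(offset, Math.min(offset + size, pcm.byteLength)));
  }
  return chunks;
}
```

Each chunk can then be sent as its own binary frame, for example `for (const chunk of splitPcm(pcm, 20)) ws.send(chunk);`.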
Call state machine
WebSocket open
│
▼
┌─────────────────┐
│ GREETING │ Agent speaks greeting message
└────────┬────────┘
│ audio_end
▼
┌─────────────────┐
│ LISTENING │ Waiting for caller to speak
└────────┬────────┘
│ caller speaks (binary audio)
▼
┌─────────────────┐
│ PROCESSING │ turn_start — LLM generating response
└────────┬────────┘
│ audio_start
▼
┌─────────────────┐
│ SPEAKING │ Agent speaking (binary audio to client)
└────────┬────────┘
│ audio_end (or barge-in)
▼
┌─────────────────┐
│ LISTENING │ Loop until max turns or end_call
└─────────────────┘
Barge-in handling
If the caller speaks while the agent is in SPEAKING state, the platform automatically:
- Stops the current agent audio stream
- Sends audio_end
- Begins processing the caller's interruption
Your telephony provider does not need to do anything special — just keep streaming caller audio even while receiving agent audio.
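If you want to mirror the call state on the client (for a UI indicator or logs), the diagram above can be expressed as a small transition table. This is a minimal sketch, assuming the control messages arrive as JSON with the names from the diagram in a `type` field. The `ENDED` state, the `TRANSITIONS` table, and the `nextState` function are hypothetical names introduced here, not part of the protocol.

```javascript
// Transition table derived from the state diagram. The "ENDED" state and the
// handling of call_end are assumptions added for this sketch.
const TRANSITIONS = {
  GREETING: { audio_end: "LISTENING" },
  LISTENING: { turn_start: "PROCESSING" },
  PROCESSING: { audio_start: "SPEAKING" },
  SPEAKING: { audio_end: "LISTENING" }, // normal end of speech or barge-in
};

// Return the next call state for a control message type.
// Unknown or out-of-order messages leave the state unchanged.
function nextState(state, messageType) {
  if (messageType === "call_end") return "ENDED";
  const row = TRANSITIONS[state] || {};
  return row[messageType] || state;
}
```

Because a barge-in surfaces as an ordinary audio_end, the tracker needs no special case for it: the state simply returns to LISTENING earlier than it otherwise would.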
Call termination
The call can end in several ways:
| Trigger | Who | Behavior |
|---|---|---|
| Client closes WebSocket | Your provider | Call ends immediately; call is logged |
| end_call message | Client | Agent says a goodbye, then closes |
| Max turns reached | Server | Server sends call_end with reason: "max_turns" |
| Balance exhausted | Server | Server sends call_end with reason: "balance_exhausted" |
| Silence timeout | Server | Server sends call_end with reason: "silence_timeout" |
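The table above can be turned into a small helper for logging and for ending a call from the client side. This is a sketch under stated assumptions: the three reason strings come from the table, but the exact shape of the end_call message is not given on this page, so the `{ type: "end_call" }` payload is a guess. `describeCallEnd` and `requestEndCall` are hypothetical names.

```javascript
// Human-readable descriptions for the call_end reasons listed in the table.
const CALL_END_REASONS = {
  max_turns: "Maximum number of turns reached",
  balance_exhausted: "Account balance exhausted",
  silence_timeout: "Caller was silent for too long",
};

// Map a call_end reason to a log-friendly description.
function describeCallEnd(reason) {
  return CALL_END_REASONS[reason] || `Unknown reason: ${reason}`;
}

// Ask the agent to say goodbye and close the call.
// Assumption: end_call is a JSON text message with only a "type" field.
function requestEndCall(ws) {
  ws.send(JSON.stringify({ type: "end_call" }));
}
```

Falling back to an "unknown reason" string keeps the handler from breaking if the server reports a reason that is not listed here.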
Minimal listener (JavaScript)
// Deliver binary frames as ArrayBuffer (browsers default to Blob).
ws.binaryType = "arraybuffer";

ws.addEventListener("message", (event) => {
  if (event.data instanceof ArrayBuffer) {
    // Binary: agent audio, to be played or forwarded to the caller
    playOrForward(event.data);
    return;
  }

  // Text: JSON control message
  const msg = JSON.parse(event.data);
  switch (msg.type) {
    case "transcript":
      console.log(`[${msg.role}]: ${msg.text}`);
      break;
    case "call_end":
      console.log("Call ended:", msg.reason);
      ws.close();
      break;
    case "error":
      console.error("Platform error:", msg.code, msg.message);
      break;
  }
});