UpVoot Docs

Voice Protocol

The message protocol used over the UpVoot WebSocket connection.

The UpVoot voice protocol is designed to be minimal. Most integrations only need to stream binary audio frames — the JSON control messages are optional signals you can listen to for logging, UI updates, or custom logic.

Message types

Binary frames

Every binary WebSocket frame is a chunk of raw PCM audio. There is no framing header — the entire payload is audio data.

  • Client → server: Caller audio (microphone input)
  • Server → client: Agent audio (to be played to the caller)

You can send chunks of any size. Smaller chunks (e.g. 20 ms = 640 bytes, a figure consistent with 16-bit mono PCM at 16 kHz) reduce latency; larger chunks (e.g. 100 ms) reduce per-frame overhead. We recommend 20–40 ms chunks.
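
The chunk-size arithmetic can be sketched as follows. This page does not state the audio format, so the defaults below (16 kHz, 16-bit, mono) are assumptions chosen only because they reproduce the 640-byte figure. The helper names `chunkBytes` and `splitIntoChunks` are hypothetical and not part of the protocol.

```javascript
// Bytes needed for `ms` milliseconds of raw PCM audio.
// ASSUMPTION: the defaults (16 kHz, 16-bit, mono) are chosen to match the
// "20ms = 640 bytes" example; confirm the real format for your integration.
function chunkBytes(ms, sampleRate = 16000, bytesPerSample = 2, channels = 1) {
  const samples = Math.round((sampleRate * ms) / 1000);
  return samples * bytesPerSample * channels;
}

// Split a PCM byte buffer (Uint8Array or Node Buffer) into fixed-duration
// chunks. The final chunk may be shorter than the others.
function splitIntoChunks(pcm, ms = 20) {
  const size = chunkBytes(ms);
  const chunks = [];
  for (let offset = 0; offset < pcm.byteLength; offset += size) {
    chunks.push(pcm.subarray(offset, offset + size));
  }
  return chunks;
}
```

Each element returned by `splitIntoChunks` could then be passed to the socket's `send` as one binary frame.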

Call state machine

           WebSocket open
                │
                ▼
      ┌─────────────────┐
      │   GREETING      │  Agent speaks greeting message
      └────────┬────────┘
               │  audio_end
               ▼
      ┌─────────────────┐
      │   LISTENING     │  Waiting for caller to speak
      └────────┬────────┘
               │  caller speaks (binary audio)
               ▼
      ┌─────────────────┐
      │   PROCESSING    │  turn_start — LLM generating response
      └────────┬────────┘
               │  audio_start
               ▼
      ┌─────────────────┐
      │   SPEAKING      │  Agent speaking (binary audio to client)
      └────────┬────────┘
               │  audio_end  (or barge-in)
               ▼
      ┌─────────────────┐
      │   LISTENING     │  Loop until max turns or end_call
      └─────────────────┘
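
For logging or UI updates, the diagram above can be mirrored on the client as a small transition table. This is a minimal sketch, not the platform's implementation. It assumes the signals `audio_end`, `turn_start`, and `audio_start` arrive as the `type` of a JSON control message, and it uses `turn_start` as the observable marker for entering PROCESSING. The names `nextState` and `createCallTracker` are hypothetical.

```javascript
// Transition table derived from the state diagram.
// ASSUMPTION: each event name is the `type` field of a JSON control message.
const TRANSITIONS = {
  GREETING: { audio_end: "LISTENING" },
  LISTENING: { turn_start: "PROCESSING" },
  PROCESSING: { audio_start: "SPEAKING" },
  SPEAKING: { audio_end: "LISTENING" }, // normal end of speech or barge-in
};

// Return the next state, or the current state if the event does not apply.
function nextState(state, eventType) {
  const row = TRANSITIONS[state];
  return (row && row[eventType]) || state;
}

// Hypothetical tracker; a call starts in GREETING when the WebSocket opens.
function createCallTracker() {
  let state = "GREETING";
  return {
    get state() {
      return state;
    },
    handle(eventType) {
      state = nextState(state, eventType);
      return state;
    },
  };
}
```

Unknown or out-of-order events leave the state unchanged, so the tracker stays purely informational and cannot stall the audio path.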

Barge-in handling

If the caller speaks while the agent is in the SPEAKING state, the platform automatically:

  1. Stops the current agent audio stream
  2. Sends audio_end
  3. Begins processing the caller's interruption

Your telephony provider does not need to do anything special — just keep streaming caller audio even while receiving agent audio.
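
One way to read this on the client side is that the uplink should never be gated on agent state. The sketch below assumes a standard WebSocket-like object with `readyState` and `send`. `createCallerUplink` and `sendCallerChunk` are hypothetical names.

```javascript
// WebSocket.OPEN is 1 in browsers and in the Node "ws" package.
const WS_OPEN = 1;

// Returns a function that forwards caller audio whenever the socket is open.
// It deliberately ignores whether agent audio is currently arriving, so the
// platform can detect a barge-in on its side.
function createCallerUplink(socket) {
  return function sendCallerChunk(chunk) {
    if (socket.readyState !== WS_OPEN) return false; // dropped: socket not open
    socket.send(chunk);
    return true;
  };
}
```

The boolean return value only reports whether the chunk was handed to the socket. It says nothing about how the platform handled the audio.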

Call termination

The call can end in several ways:

Trigger                    Who             Behavior
-------------------------  --------------  ------------------------------------------------------
Client closes WebSocket    Your provider   Call ends immediately; call is logged
end_call message           Client          Agent says a goodbye, then closes
Max turns reached          Server          Server sends call_end with reason: "max_turns"
Balance exhausted          Server          Server sends call_end with reason: "balance_exhausted"
Silence timeout            Server          Server sends call_end with reason: "silence_timeout"
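
The termination paths above could be handled with two small helpers, shown here as a sketch. The `reason` values come from the table. The exact payload of the `end_call` message is not specified on this page, so sending a JSON text frame with only a `type` field is an assumption. `describeCallEnd` and `requestEndCall` are hypothetical names.

```javascript
// Server-side termination reasons listed in the table.
const CALL_END_REASONS = {
  max_turns: "Max turns reached",
  balance_exhausted: "Balance exhausted",
  silence_timeout: "Silence timeout",
};

// Turn a parsed call_end message into a log-friendly string.
// Returns null for anything that is not a call_end message.
function describeCallEnd(msg) {
  if (!msg || msg.type !== "call_end") return null;
  return CALL_END_REASONS[msg.reason] || `Unrecognized reason: ${msg.reason}`;
}

// Ask the agent to say goodbye and close the call.
// ASSUMPTION: end_call is a JSON text frame carrying only a `type` field.
function requestEndCall(socket) {
  socket.send(JSON.stringify({ type: "end_call" }));
}
```

Falling back to a generic string for unrecognized reasons keeps the client working if further reasons exist beyond the three listed here.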

Minimal listener (JavaScript)

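// Assumes `ws` is an open WebSocket and `playOrForward` is your own playback
// or forwarding function. Browsers deliver binary frames as Blob by default,
// so request ArrayBuffer for the instanceof check below to match.
ws.binaryType = "arraybuffer";

ws.addEventListener("message", (event) => {
  if (event.data instanceof ArrayBuffer) {
    // Binary: agent audio, to be played or forwarded to the caller
    playOrForward(event.data);
    return;
  }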

  const msg = JSON.parse(event.data);
  switch (msg.type) {
    case "transcript":
      console.log(`[${msg.role}]: ${msg.text}`);
      break;
    case "call_end":
      console.log("Call ended:", msg.reason);
      ws.close();
      break;
    case "error":
      console.error("Platform error:", msg.code, msg.message);
      break;
  }
});
