UpVoot Docs

WebSocket Integration

Connect your telephony provider to UpVoot using a standard WebSocket.

Endpoint

wss://api.upvoot.com/voice/<api-key>

This is a standard WebSocket endpoint. No custom headers, SDKs, or libraries are required. Any telephony platform that supports WebSocket audio streaming can connect.

Connection flow

  1. Your telephony provider initiates a WebSocket upgrade to the endpoint above.
  2. UpVoot validates the API key. If the key is invalid, UpVoot responds with HTTP 401 and rejects the upgrade.
  3. The agent configuration (system prompt, greeting, voice settings) is loaded based on the API key.
  4. The agent speaks the greeting immediately after the connection is established.
  5. Your provider streams raw audio from the caller to UpVoot; UpVoot streams agent audio back to the caller.
  6. When the caller hangs up, your provider closes the WebSocket. The call is logged and points are deducted.
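
The flow above can be observed from a Node.js client as a handful of socket events. The following is a minimal sketch, assuming a client built on the `ws` package, which emits `unexpected-response` when the upgrade is refused with an HTTP status. `trackConnection` and the state names it reports are hypothetical and not part of UpVoot.

```javascript
// Hypothetical helper: reports connection-flow states for a `ws`-style client.
// `socket` is any object that emits the same events as the `ws` WebSocket client.
function trackConnection(socket, onState) {
  // Step 2: a rejected upgrade (for example HTTP 401) surfaces here in `ws`.
  socket.on("unexpected-response", (req, res) => {
    onState(res.statusCode === 401 ? "invalid_api_key" : `http_${res.statusCode}`);
    req.destroy(); // abandon the failed handshake
  });
  // Steps 3-5: the upgrade succeeded; the greeting and audio follow.
  socket.on("open", () => onState("connected"));
  // Step 6: the socket was closed and the call is over.
  socket.on("close", () => onState("closed"));
}
```

With a real client this might be wired up as `trackConnection(ws, console.log)` immediately after constructing the socket, so that a bad key is reported instead of surfacing as a generic error.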

Audio streaming

Audio is streamed as raw binary frames over the WebSocket in both directions:

Direction        Format     Sample rate  Channels  Bit depth
Client → UpVoot  PCM (raw)  16 kHz       Mono      16-bit signed little-endian
UpVoot → Client  PCM (raw)  16 kHz       Mono      16-bit signed little-endian

See Audio Format for conversion examples and codec compatibility.
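
The format above implies a fixed byte rate: 16,000 samples per second × 2 bytes per sample × 1 channel = 32,000 bytes per second. The sketch below shows that arithmetic and one way to cut a PCM buffer into fixed-duration frames before sending. The 20 ms frame length and both function names are assumptions for illustration; this page does not specify a required frame size.

```javascript
const SAMPLE_RATE_HZ = 16000; // 16 kHz, from the table above
const BYTES_PER_SAMPLE = 2;   // 16-bit samples
const CHANNELS = 1;           // mono

// Number of bytes that hold `ms` milliseconds of audio in this format.
function bytesForDuration(ms) {
  return (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * ms) / 1000;
}

// Split a PCM Buffer into fixed-duration frames.
// The 20 ms default is an assumed value; a trailing partial frame is kept as-is.
function splitIntoFrames(pcmBuffer, frameMs = 20) {
  const frameBytes = bytesForDuration(frameMs);
  const frames = [];
  for (let offset = 0; offset < pcmBuffer.length; offset += frameBytes) {
    frames.push(pcmBuffer.subarray(offset, offset + frameBytes));
  }
  return frames;
}
```

Under these assumptions a 20 ms frame is 640 bytes, and each frame can be passed to `ws.send()` as a binary message.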

Control messages

In addition to binary audio frames, JSON text frames signal call state. All of them are sent by the server except end_call, which the client sends:

Message type  Direction        Description
audio_start   Server → Client  Agent has started speaking; binary audio frames will follow
audio_end     Server → Client  Agent finished speaking; ready to listen
transcript    Server → Client  Final transcript of the last turn (agent or caller)
interim       Server → Client  Partial (streaming) transcript of caller speech
turn_start    Server → Client  LLM has started processing a turn
end_call      Client → Server  Instruct the agent to end the call gracefully
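
A client will typically branch on the message type. The sketch below is one way to do that. It assumes every control message is a JSON object with a `type` field, and that `transcript` and `interim` messages carry a `text` field (with `role` on transcripts), as the client example on this page suggests. The `end_call` payload of `{ "type": "end_call" }` is also an assumption, since this page does not show the exact shape. Both function names are hypothetical.

```javascript
// Turn one JSON text frame into a short description of the call state.
// Field names other than `type` are assumptions based on this page's example.
function describeControlMessage(text) {
  let msg;
  try {
    msg = JSON.parse(text);
  } catch {
    return "ignored: not valid JSON";
  }
  if (msg === null || typeof msg !== "object") return "ignored: not an object";

  switch (msg.type) {
    case "audio_start": return "agent speaking";
    case "audio_end":   return "agent listening";
    case "turn_start":  return "agent thinking";
    case "interim":     return `(partial) ${msg.text}`;
    case "transcript":  return `[${msg.role}] ${msg.text}`;
    default:            return `ignored: unknown type ${msg.type}`;
  }
}

// Build the client-to-server end_call frame (assumed payload shape).
function buildEndCallMessage() {
  return JSON.stringify({ type: "end_call" });
}
```

To end a call, a client would send the result as a text frame, for example `ws.send(buildEndCallMessage())`.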

Handling barge-in

UpVoot detects when the caller speaks while the agent is talking and interrupts the agent automatically. You do not need to handle this yourself: caller audio is always processed, whether or not the agent is speaking.

Connection limits

  • Maximum call duration: 30 minutes (configurable per agent)
  • Maximum turns per call: configurable in agent settings (default 20)
  • Silence timeout: 10 seconds of continuous silence ends the call
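
A client that wants to anticipate the silence timeout can estimate it locally. The sketch below measures the level of each outgoing PCM frame and accumulates the duration of consecutive quiet frames. It is only an approximation: how UpVoot itself decides that audio is silent is not documented here, and the RMS threshold of 500 and both function names are assumptions.

```javascript
const SILENCE_TIMEOUT_MS = 10000; // 10 seconds, from the list above
const PCM_SAMPLE_RATE_HZ = 16000; // 16 kHz mono, 16-bit little-endian

// Root-mean-square level of a 16-bit little-endian PCM Buffer.
function rmsLevel(pcmBuffer) {
  const sampleCount = Math.floor(pcmBuffer.length / 2);
  if (sampleCount === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < sampleCount; i++) {
    const sample = pcmBuffer.readInt16LE(i * 2);
    sumSquares += sample * sample;
  }
  return Math.sqrt(sumSquares / sampleCount);
}

// Returns a function to call with each outgoing frame. It returns true once
// consecutive quiet audio adds up to the timeout. Time is measured in audio
// duration, not wall-clock time. The threshold is an assumed value.
function createSilenceTracker(threshold = 500) {
  let silentMs = 0;
  return function onFrame(pcmBuffer) {
    const frameMs = (pcmBuffer.length / 2 / PCM_SAMPLE_RATE_HZ) * 1000;
    silentMs = rmsLevel(pcmBuffer) < threshold ? silentMs + frameMs : 0;
    return silentMs >= SILENCE_TIMEOUT_MS;
  };
}
```

Any frame above the threshold resets the counter, which matches the "continuous silence" wording above.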

Minimal client example (Node.js)

const WebSocket = require("ws");

const ws = new WebSocket(
  "wss://api.upvoot.com/voice/sk_live_YOUR_API_KEY"
);

ws.on("open", () => {
  console.log("Connected — agent will speak greeting");
});

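ws.on("message", (data, isBinary) => {
  // ws v8+ delivers text frames as Buffers and sets isBinary to false;
  // older versions deliver text frames as strings.
  const isText = typeof data === "string" || isBinary === false;
  if (isText) {
    const msg = JSON.parse(data.toString());
    if (msg.type === "transcript") {
      console.log(`[${msg.role}] ${msg.text}`);
    }
  } else {
    // Binary frame: PCM audio from the agent; forward it to the caller.
    // streamToCaller is your provider-side playback function (not defined here).
    streamToCaller(data);
  }
});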

// Stream PCM audio from the caller to UpVoot
function onCallerAudio(pcmBuffer) {
  if (ws.readyState === WebSocket.OPEN) {
    ws.send(pcmBuffer);
  }
}

Exotel-specific setup

In the Exotel Applet builder, create a Connect to WebSocket node and set the URL to:

wss://api.upvoot.com/voice/<api-key>

Set the audio codec to PCM / L16 / 16kHz (also called "raw linear"). No additional headers are required.

See Supported Providers for a full Exotel walkthrough.
