UpVoot — Voice AI Agents for Indian Businesses

Endpoint

wss://api.upvoot.com/voice/<api-key>

This is a standard WebSocket endpoint. No custom headers, SDKs, or libraries are required. Any telephony platform that supports WebSocket audio streaming can connect.

Connection flow

1. Your telephony provider initiates a WebSocket upgrade to the endpoint above.
2. UpVoot validates the API key. If invalid, it responds with HTTP 401 and rejects the upgrade.
3. The agent configuration (system prompt, greeting, voice settings) is loaded based on the API key.
4. The agent speaks the greeting immediately after the connection is established.
5. Your provider streams raw audio from the caller to UpVoot; UpVoot streams agent audio back to the caller.
6. When the caller hangs up, your provider closes the WebSocket. The call is logged and points are deducted.
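
The rejected-upgrade case in the flow above can be surfaced on the client side. The following is a minimal sketch, assuming a Node.js client built on the `ws` package, which reports a refused upgrade through its `error` event with a message of the form "Unexpected server response: <status>". The helper name `describeHandshakeError` is hypothetical, and matching on the message text is an assumption about `ws` rather than part of the UpVoot API.

```javascript
// Hypothetical helper: turns a `ws` handshake error into a readable reason.
// Assumption: `ws` reports a refused upgrade as
// "Unexpected server response: <status>".
function describeHandshakeError(err) {
  const match = /Unexpected server response: (\d+)/.exec(err.message);
  if (!match) {
    // Not an HTTP-level rejection (DNS failure, refused TCP connection, ...)
    return `connection error: ${err.message}`;
  }
  const status = Number(match[1]);
  return status === 401
    ? "upgrade rejected: invalid API key (HTTP 401)"
    : `upgrade rejected: HTTP ${status}`;
}

// Usage with a `ws` client socket named `ws`:
// ws.on("error", (err) => console.error(describeHandshakeError(err)));
// ws.on("close", () => console.log("socket closed"));
```

Keeping the classification in a small pure function makes it easy to unit-test without opening a real connection.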

Audio streaming

Audio is streamed as raw binary frames over the WebSocket in both directions:

Direction	Format	Sample rate	Channels	Bit depth
Client → UpVoot	PCM (raw)	16 kHz	Mono	16-bit signed little-endian
UpVoot → Client	PCM (raw)	16 kHz	Mono	16-bit signed little-endian

See Audio Format for conversion examples and codec compatibility.
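
To make the format in the table concrete: at 16 kHz, mono, 16-bit, one second of audio is 32,000 bytes, so a 20 ms frame is 640 bytes. The sketch below shows one way to pack samples and split a buffer into frames. The 20 ms frame size and the helper names are illustrative assumptions; this page does not specify a required frame size.

```javascript
const SAMPLE_RATE_HZ = 16000; // from the table above
const BYTES_PER_SAMPLE = 2;   // 16-bit samples, one channel

// Bytes of raw PCM needed for `ms` milliseconds of audio.
function bytesForDuration(ms) {
  return (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * ms) / 1000;
}

// Pack signed 16-bit integer samples into a little-endian Buffer.
function samplesToPcm(samples) {
  const pcm = Buffer.alloc(samples.length * BYTES_PER_SAMPLE);
  samples.forEach((s, i) => pcm.writeInt16LE(s, i * BYTES_PER_SAMPLE));
  return pcm;
}

// Split a PCM buffer into frames of `frameMs` each (the last may be shorter).
// Assumption: 20 ms is a common telephony frame size, not an UpVoot rule.
function splitIntoFrames(pcm, frameMs = 20) {
  const frameBytes = bytesForDuration(frameMs);
  if (frameBytes <= 0) throw new RangeError("frameMs must be positive");
  const frames = [];
  for (let offset = 0; offset < pcm.length; offset += frameBytes) {
    frames.push(pcm.subarray(offset, offset + frameBytes));
  }
  return frames;
}
```

Each frame returned by `splitIntoFrames` can be passed to `ws.send` as a binary message.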

Control messages

In addition to binary audio frames, the server sends JSON text frames to signal call state:

Message type	Direction	Description
`audio_start`	Server → Client	Agent has started speaking; binary audio frames will follow
`audio_end`	Server → Client	Agent finished speaking; ready to listen
`transcript`	Server → Client	Final transcript of the last turn (agent or caller)
`interim`	Server → Client	Partial (streaming) transcript of caller speech
`turn_start`	Server → Client	LLM has started processing a turn
`end_call`	Client → Server	Instruct the agent to end the call gracefully
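
A client might dispatch on the message types in the table as sketched below. The `role` and `text` fields are taken from the Node.js example on this page; the helper names are hypothetical, and the `end_call` payload shape (`{ "type": "end_call" }`) is an assumption based on the envelope the server messages appear to use, since the exact payload is not documented here.

```javascript
// Hypothetical per-call state: whether the agent is speaking, plus transcripts.
function createCallState() {
  return { agentSpeaking: false, transcripts: [] };
}

// Apply one JSON text frame from the server to the call state.
function handleControlMessage(state, rawText) {
  let msg;
  try {
    msg = JSON.parse(rawText);
  } catch {
    return state; // ignore malformed frames
  }
  if (msg === null || typeof msg !== "object") return state;

  switch (msg.type) {
    case "audio_start":
      state.agentSpeaking = true;
      break;
    case "audio_end":
      state.agentSpeaking = false;
      break;
    case "transcript":
      state.transcripts.push(`[${msg.role}] ${msg.text}`);
      break;
    default:
      // "interim", "turn_start", and unknown types need no state change here
      break;
  }
  return state;
}

// Assumption: end_call is sent as a JSON text frame with only a `type` field.
function endCallFrame() {
  return JSON.stringify({ type: "end_call" });
}
```

With a `ws` client socket, `ws.send(endCallFrame())` would send the request as a text frame.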

Handling barge-in

UpVoot detects when the caller speaks while the agent is talking and interrupts the agent automatically. You do not need to handle this yourself: caller audio is always processed, whether or not the agent is speaking.

Connection limits

Maximum call duration: 30 minutes (configurable per agent)
Maximum turns per call: configurable in agent settings (default 20)
Silence timeout: 10 seconds of continuous silence ends the call
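
The limits above are enforced by UpVoot, and this page does not describe how silence is measured. As a rough local estimate only, a client could track how long the caller audio has stayed quiet, for example to log a warning before the 10-second timeout. In the sketch below, the RMS threshold of 500, the 2-second warning margin, and all helper names are arbitrary assumptions, not UpVoot parameters.

```javascript
const PCM_RATE_HZ = 16000;         // caller audio sample rate
const PCM_SAMPLE_BYTES = 2;        // 16-bit samples
const SILENCE_TIMEOUT_MS = 10000;  // documented silence timeout

// Root-mean-square level of 16-bit little-endian PCM samples.
function rmsLevel(pcm) {
  const count = Math.floor(pcm.length / PCM_SAMPLE_BYTES);
  if (count === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < count; i++) {
    const sample = pcm.readInt16LE(i * PCM_SAMPLE_BYTES);
    sumSquares += sample * sample;
  }
  return Math.sqrt(sumSquares / count);
}

// Tracks continuous silence across caller frames.
// Assumption: frames with RMS below `threshold` count as silence.
function createSilenceTracker(threshold = 500) {
  let silentMs = 0;
  return {
    // Feed one caller frame; returns milliseconds of continuous silence so far.
    push(pcm) {
      const frameMs = (pcm.length / PCM_SAMPLE_BYTES / PCM_RATE_HZ) * 1000;
      silentMs = rmsLevel(pcm) < threshold ? silentMs + frameMs : 0;
      return silentMs;
    },
    // True once silence is within `marginMs` of the documented timeout.
    nearTimeout(marginMs = 2000) {
      return silentMs >= SILENCE_TIMEOUT_MS - marginMs;
    },
  };
}
```

This only approximates what the server may observe; the server's decision remains authoritative.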

Minimal client example (Node.js)

const WebSocket = require("ws");

const ws = new WebSocket(
  "wss://api.upvoot.com/voice/sk_live_YOUR_API_KEY"
);

ws.on("open", () => {
  console.log("Connected — agent will speak greeting");
});

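ws.on("message", (data, isBinary) => {
  // ws v8+ delivers every message as a Buffer and flags text frames with
  // isBinary === false; older versions deliver text frames as strings.
  const isText = typeof data === "string" || isBinary === false;
  if (isText) {
    const msg = JSON.parse(data.toString());
    if (msg.type === "transcript") {
      console.log(`[${msg.role}] ${msg.text}`);
    }
  } else {
    // Binary: PCM audio from the agent. streamToCaller is a placeholder for
    // your telephony provider's playback function.
    streamToCaller(data);
  }
});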

// Stream PCM audio from the caller to UpVoot
function onCallerAudio(pcmBuffer) {
  if (ws.readyState === WebSocket.OPEN) {
    ws.send(pcmBuffer);
  }
}

Exotel-specific setup

In the Exotel Applet builder, create a Connect to WebSocket node and set the URL to:

wss://api.upvoot.com/voice/<api-key>

Set the audio codec to PCM / L16 / 16kHz (also called "raw linear"). No additional headers are required.

See Supported Providers for a full Exotel walkthrough.
