UpVoot Docs

Audio Format

Required audio format for sending and receiving audio over the WebSocket.

Required format

Property               Value
Encoding               PCM signed 16-bit little-endian (L16)
Sample rate            16,000 Hz (16 kHz)
Channels               1 (mono)
Frame size             Any (chunk as you go)
WebSocket frame type   Binary

This is also known as raw PCM L16 16kHz mono. It is a common interchange format for real-time voice AI pipelines, and many telephony providers support it natively.
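Because any frame size is accepted, the chunk size is a latency choice rather than a protocol requirement. The arithmetic can be sketched as follows; the helper names `chunk_bytes` and `split_chunks` and the 20 ms default are illustrative assumptions, not part of the UpVoot API.

```python
SAMPLE_RATE_HZ = 16_000  # required sample rate
BYTES_PER_SAMPLE = 2     # 16-bit PCM
CHANNELS = 1             # mono


def chunk_bytes(duration_ms: int) -> int:
    """Bytes of raw PCM L16 16 kHz mono audio covering duration_ms."""
    samples = SAMPLE_RATE_HZ * duration_ms // 1000
    return samples * BYTES_PER_SAMPLE * CHANNELS


def split_chunks(pcm: bytes, duration_ms: int = 20) -> list[bytes]:
    """Split a PCM buffer into chunks of duration_ms (hypothetical helper).

    The final chunk may be shorter than the others.
    """
    size = chunk_bytes(duration_ms)
    return [pcm[i:i + size] for i in range(0, len(pcm), size)]
```

With these numbers, one second of audio is 32,000 bytes and a 20 ms chunk is 640 bytes, so each chunk would be sent as one 640-byte binary WebSocket frame.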

Converting from other formats

From mulaw 8kHz (Twilio default)

// Node.js — using the 'alawmulaw' package
const alawmulaw = require('alawmulaw');

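// mulaw → 16-bit PCM at 8kHz (decode returns an Int16Array, one element per sample)
const pcm8k = alawmulaw.mulaw.decode(mulawBuffer);

// Upsample 8kHz → 16kHz (simple 2x linear interpolation)
const pcm16k = Buffer.alloc(pcm8k.length * 2 * 2); // 2x samples, 2 bytes each
for (let i = 0; i < pcm8k.length; i++) {
  const sample = pcm8k[i];
  // Interpolated sample is the midpoint to the next one; the last sample is repeated
  const next = i + 1 < pcm8k.length ? pcm8k[i + 1] : sample;
  pcm16k.writeInt16LE(sample, i * 4);
  pcm16k.writeInt16LE((sample + next) >> 1, i * 4 + 2);
}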
ws.send(pcm16k);
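
For Python callers, the same conversion can be sketched without third-party packages (the standard-library `audioop` module was removed in Python 3.13). The function names below are illustrative assumptions; the decode step follows the usual G.711 mu-law expansion, and the upsampling is a simple 2x linear interpolation rather than a proper resampling filter.

```python
import struct


def mulaw_byte_to_pcm(u: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit sample."""
    u = ~u & 0xFF  # mu-law bytes are stored bit-inverted
    t = (((u & 0x0F) << 3) + 0x84) << ((u & 0x70) >> 4)
    return 0x84 - t if u & 0x80 else t - 0x84


def mulaw8k_to_pcm16k(mulaw: bytes) -> bytes:
    """Convert mu-law 8 kHz audio to PCM 16-bit little-endian at 16 kHz."""
    samples = [mulaw_byte_to_pcm(b) for b in mulaw]
    out = []
    for i, sample in enumerate(samples):
        # Midpoint to the next sample; the last sample is repeated
        nxt = samples[i + 1] if i + 1 < len(samples) else sample
        out.append(sample)
        out.append((sample + nxt) // 2)
    return struct.pack(f"<{len(out)}h", *out)
```

Each input byte becomes two 16-bit samples, so the output is four times the size of the input in bytes.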

From Float32 (browser Web Audio API)

// Convert Float32Array from AudioWorklet to Int16
function float32ToInt16(float32: Float32Array): ArrayBuffer {
  const int16 = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    int16[i] = Math.max(-32768, Math.min(32767, float32[i] * 32768));
  }
  return int16.buffer;
}

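// In your AudioWorkletProcessor. The AudioContext must run at 16 kHz
// (e.g. new AudioContext({ sampleRate: 16000 })); otherwise resample first.
process(inputs: Float32Array[][]): boolean {
  const channel = inputs[0]?.[0]; // undefined while no input is connected
  if (channel && channel.length > 0) {
    const pcmBuffer = float32ToInt16(channel);
    this.port.postMessage(pcmBuffer, [pcmBuffer]);
  }
  return true;
}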

From a WAV file (Python)

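import asyncio
import wave

import websockets

def play_audio(pcm: bytes) -> None:
    """Placeholder: route 16 kHz mono PCM to your own audio output."""

async def send_wav(ws, wav_path: str) -> None:
    with wave.open(wav_path, "rb") as wf:
        # The file must already be in the required format
        assert wf.getframerate() == 16000
        assert wf.getnchannels() == 1
        assert wf.getsampwidth() == 2  # 16-bit
        while chunk := wf.readframes(1024):
            await ws.send(chunk)

async def receive_audio(ws) -> None:
    # Receive and play agent audio until the server closes the connection
    async for msg in ws:
        if isinstance(msg, bytes):
            play_audio(msg)

async def play_wav(api_key: str, wav_path: str):
    uri = f"wss://api.upvoot.com/voice/{api_key}"
    async with websockets.connect(uri) as ws:
        # Send and receive concurrently so that waiting for a reply never blocks the upload
        await asyncio.gather(send_wav(ws, wav_path), receive_audio(ws))

asyncio.run(play_wav("sk_live_YOUR_KEY", "caller_audio.wav"))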

Receiving audio from UpVoot

Audio sent by the agent is also raw PCM L16 16kHz mono. If your telephony provider accepts this format, you can forward it unchanged. If the provider expects a different format (e.g. mulaw 8kHz), apply the reverse conversion, downsampling to 8kHz and encoding to mulaw, before sending the audio to the caller.
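
As a rough illustration of that reverse conversion, the sketch below downsamples by averaging each pair of samples (a crude low-pass, not a production-quality resampler) and then applies the usual G.711 mu-law compression. The function names are assumptions made for this example and are not part of the UpVoot API.

```python
import struct

BIAS = 0x84   # standard mu-law bias
CLIP = 32635  # largest magnitude that fits after adding the bias


def pcm_to_mulaw_byte(sample: int) -> int:
    """Encode one signed 16-bit sample as a G.711 mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    exponent = (magnitude >> 7).bit_length() - 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def pcm16k_to_mulaw8k(pcm: bytes) -> bytes:
    """Convert PCM 16-bit little-endian 16 kHz mono to mu-law 8 kHz."""
    count = len(pcm) // 2
    samples = struct.unpack(f"<{count}h", pcm[:count * 2])
    # Average each pair of samples to halve the rate; an odd trailing sample is dropped
    pairs = zip(samples[0::2], samples[1::2])
    return bytes(pcm_to_mulaw_byte((a + b) // 2) for a, b in pairs)
```

Every four bytes of agent audio become one mu-law byte, so a 640-byte chunk shrinks to 160 bytes.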
