Audio Format
Required audio format for sending and receiving audio over the WebSocket.
Required format
| Property | Value |
|---|---|
| Encoding | PCM signed 16-bit little-endian (L16) |
| Sample rate | 16,000 Hz (16 kHz) |
| Channels | 1 (mono) |
| Frame size | Any — chunk as you go |
| WebSocket frame type | Binary |
This is also known as raw PCM L16 16kHz mono. It is the standard interchange format for real-time voice AI pipelines — most telephony providers support it natively.
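The table above implies a fixed byte rate: 16,000 samples per second × 2 bytes per sample × 1 channel = 32,000 bytes per second. The sketch below shows one way to turn that into a chunk size and to slice a PCM buffer into frames. The 20 ms frame length and the helper names `bytes_for_duration` and `chunk_pcm` are illustrative assumptions, since any frame size is accepted.

```python
SAMPLE_RATE_HZ = 16_000   # from the required format
BYTES_PER_SAMPLE = 2      # 16-bit samples
CHANNELS = 1              # mono


def bytes_for_duration(ms: int) -> int:
    """Number of raw PCM bytes that represent `ms` milliseconds of audio."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * ms // 1000


def chunk_pcm(pcm: bytes, frame_ms: int = 20):
    """Yield successive frames of `frame_ms` milliseconds (the last may be shorter)."""
    size = bytes_for_duration(frame_ms)
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


one_second_of_silence = bytes(bytes_for_duration(1000))
frames = list(chunk_pcm(one_second_of_silence))
```

Each yielded frame can be sent as one binary WebSocket message. Smaller frames reduce latency at the cost of more messages.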
Converting from other formats
From mulaw 8kHz (Twilio default)
// Node.js, using the 'alawmulaw' package
const alawmulaw = require('alawmulaw');
// mulaw → PCM 8kHz; decode() returns an Int16Array of samples, not a Buffer
const pcm8k = alawmulaw.mulaw.decode(mulawBuffer);
// Upsample 8kHz → 16kHz (simple 2x linear interpolation)
const pcm16k = Buffer.alloc(pcm8k.length * 2 * 2); // 2x samples, 2 bytes each
for (let i = 0; i < pcm8k.length; i++) {
  const sample = pcm8k[i];
  // Interpolated sample: midpoint between this sample and the next (the last one is repeated)
  const next = i + 1 < pcm8k.length ? pcm8k[i + 1] : sample;
  pcm16k.writeInt16LE(sample, i * 4);
  pcm16k.writeInt16LE((sample + next) >> 1, i * 4 + 2);
}
ws.send(pcm16k);
From Float32 (browser Web Audio API)
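// Convert Float32Array from AudioWorklet to Int16
function float32ToInt16(float32: Float32Array): ArrayBuffer {
  const int16 = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    // Scale [-1, 1] to the 16-bit range and clamp so loud samples do not wrap around
    int16[i] = Math.max(-32768, Math.min(32767, float32[i] * 32768));
  }
  return int16.buffer;
}
// In your AudioWorkletProcessor:
// (assumes the AudioContext runs at 16 kHz, e.g. new AudioContext({ sampleRate: 16000 });
// otherwise resample to 16 kHz before sending)
process(inputs) {
  const channel = inputs[0][0];
  if (!channel) return true; // no input connected yet
  const pcmBuffer = float32ToInt16(channel);
  // Transfer the buffer to the main thread without copying it
  this.port.postMessage(pcmBuffer, [pcmBuffer]);
  return true;
}
From a WAV file (Python)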
import asyncio
import wave

import websockets

async def play_wav(api_key: str, wav_path: str):
    uri = f"wss://api.upvoot.com/voice/{api_key}"
    async with websockets.connect(uri) as ws:
        with wave.open(wav_path, "rb") as wf:
            # The file must already be in the required format
            assert wf.getframerate() == 16000
            assert wf.getnchannels() == 1
            assert wf.getsampwidth() == 2  # 16-bit
            while chunk := wf.readframes(1024):
                await ws.send(chunk)
                # Receive and play agent audio
                # (recv() waits until the server sends the next message)
                msg = await ws.recv()
                if isinstance(msg, bytes):
                    play_audio(msg)  # placeholder: supply your own playback function

asyncio.run(play_wav("sk_live_YOUR_KEY", "caller_audio.wav"))
Receiving audio from UpVoot
Audio sent by the agent is also raw PCM L16 16kHz mono. Your telephony provider should be able to play it directly. If your provider expects a different format (e.g. mulaw 8kHz), apply the reverse conversion (downsample 16kHz to 8kHz, then encode to mulaw) before sending the audio to the caller.
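As a rough illustration of the reverse conversion, the sketch below downsamples PCM L16 16kHz mono to 8kHz by averaging each pair of samples and then applies a G.711 mulaw encoder written in plain Python. The function names `linear_to_mulaw` and `pcm16k_to_mulaw8k` are hypothetical, and pair averaging is only a crude low-pass filter. A production pipeline would more likely use a proper resampler.

```python
import struct

BIAS = 0x84    # standard G.711 mulaw bias (132)
CLIP = 32635   # largest magnitude that fits after adding the bias


def linear_to_mulaw(sample: int) -> int:
    """Encode one signed 16-bit PCM sample as one mulaw byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Find the segment (exponent): position of the highest set bit, from bit 14 down to bit 7
    exponent, mask = 7, 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # mulaw bytes are stored bit-inverted
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def pcm16k_to_mulaw8k(pcm: bytes) -> bytes:
    """Convert PCM L16 16kHz mono bytes to mulaw 8kHz bytes."""
    count = len(pcm) // 2
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    out = bytearray()
    # Average each pair of samples to halve the sample rate; a trailing odd sample is dropped
    for i in range(0, count - 1, 2):
        out.append(linear_to_mulaw((samples[i] + samples[i + 1]) // 2))
    return bytes(out)


agent_audio = struct.pack("<4h", 0, 0, 32767, 32767)
mulaw_audio = pcm16k_to_mulaw8k(agent_audio)
```

Every 4 input bytes (two 16-bit samples) become 1 output byte, so the output is a quarter of the input size.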