Most LLMs are too slow for phone calls. We optimized the entire stack (VAD, STT, LLM, and TTS) to shave off every millisecond of delay.
Optimized for real-time calling. We host inference at the edge in Sydney to minimize network travel time for local calls.
Users can interrupt the AI mid-sentence. Our Voice Activity Detection (VAD) cuts audio playback the moment the caller speaks, so the AI yields just as a person would.
Tuned for 8kHz phone audio. It filters out background noise, static, and poor reception to understand intent clearly.
Define strict guardrails that control what the AI says about your products or competitors, rather than letting the AI improvise.
The engine distinguishes between a "pause for thought" and "end of turn," reducing those awkward moments where the AI cuts you off.
In voice, speed = intelligence. Slow responses make users hang up. Our infrastructure is peered directly with major AU carriers to ensure the fastest possible packet transit.
Don't start from zero. Clone these battle-tested JSON configurations.
High endpointing timeout (1200ms) for consultative calls where users speak in long paragraphs.
Aggressive turn-taking (400ms endpointing) and concise answers for fast-paced qualifying calls.
PII scrubbing enabled, strict state enforcement, and zero data retention mode.
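As a rough illustration of how an endpointing timeout separates a "pause for thought" from an "end of turn," the sketch below treats a turn as finished once trailing silence reaches the preset's timeout. Only the 1200ms and 400ms values come from the presets above. The preset names, the `endpointing_ms` key, and the 20ms frame length are assumptions for illustration, not the actual JSON schema.

```python
# Hypothetical presets: key names are illustrative, not the real config schema.
PRESETS = {
    "consultative": {"endpointing_ms": 1200},
    "qualifying": {"endpointing_ms": 400},
}

FRAME_MS = 20  # assumed VAD frame length


def end_of_turn_frame(vad_flags, endpointing_ms, frame_ms=FRAME_MS):
    """Return the index of the frame at which the turn ends, or None.

    vad_flags holds one boolean per frame (True = speech detected).
    The turn ends once the caller has spoken and then stayed silent
    for at least endpointing_ms.
    """
    heard_speech = False
    silence_ms = 0
    for i, is_speech in enumerate(vad_flags):
        if is_speech:
            heard_speech = True
            silence_ms = 0  # any speech resets the silence timer
        elif heard_speech:
            silence_ms += frame_ms
            if silence_ms >= endpointing_ms:
                return i
    return None


# Caller speaks 200ms, pauses 600ms mid-thought, speaks 200ms, then stops.
frames = [True] * 10 + [False] * 30 + [True] * 10 + [False] * 70

for name, preset in PRESETS.items():
    print(name, end_of_turn_frame(frames, preset["endpointing_ms"]))
```

On this input the 400ms setting ends the turn inside the 600ms mid-sentence pause, while the 1200ms setting waits until the caller has actually finished. That is the trade-off the two presets make between responsiveness and patience.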
Listen to raw audio output demonstrating edge-case handling.
User: "Actually, wait, stop." -> AI stops instantly and asks for clarification.
A rapid back-and-forth conversation checking name, address, and date without pauses.
User uses terms like "Arvo", "Rego", and "Ute" -> AI understands and responds appropriately.
Your telephony provider sends µ-law 8kHz audio via WebSocket. We buffer and process it, typically in under 50ms.
Our Voice Activity Detector separates speech from noise. The transcription model, optimized for Australian accents, converts speech to text.
The LLM determines the next action based on your defined State Graph. It checks guardrails before generating a single token.
The TTS engine begins streaming audio bytes back to the caller before the full sentence is even generated.
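To make the first step of the pipeline above concrete, here is a minimal sketch of decoding µ-law 8kHz audio into 16-bit PCM using the standard G.711 expansion. It is not the production decoder. The function names and the 160-byte (20ms) frame used in the example are assumptions for illustration.

```python
import struct

SAMPLE_RATE_HZ = 8000  # narrowband telephony rate, one mu-law byte per sample
BIAS = 0x84            # standard G.711 mu-law bias (132)


def ulaw_byte_to_pcm16(b: int) -> int:
    """Expand one mu-law byte to a signed 16-bit linear sample."""
    u = ~b & 0xFF                    # mu-law bytes are stored bit-inverted
    exponent = (u >> 4) & 0x07       # 3-bit segment number
    mantissa = u & 0x0F              # 4-bit step within the segment
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if u & 0x80 else magnitude


def decode_frame(payload: bytes) -> bytes:
    """Decode a mu-law frame to little-endian 16-bit PCM bytes."""
    samples = [ulaw_byte_to_pcm16(b) for b in payload]
    return struct.pack(f"<{len(samples)}h", *samples)


def frame_duration_ms(payload: bytes) -> float:
    """Duration of a mu-law frame: one byte per sample at 8kHz."""
    return len(payload) * 1000 / SAMPLE_RATE_HZ


# A 160-byte frame of mu-law silence (0xFF decodes to 0).
frame = bytes([0xFF] * 160)
pcm = decode_frame(frame)
print(len(pcm), frame_duration_ms(frame))
```

Because each µ-law sample is a single byte, a 160-byte payload carries 20ms of audio, which is why small frames leave room to buffer and process well inside a tight latency target.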
Connect via standard protocols. No proprietary hardware required.
One-click XML configuration to fork audio from Twilio Programmable Voice.
Native support for VXML and WebSocket audio forks.
Direct SIP-in capabilities for high-volume enterprise diallers.
Control the call logic from your own backend code in real time.
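For the Twilio option above, forking a call's audio to a WebSocket is done in TwiML with a `<Stream>` inside a `<Start>` verb. The sketch below builds such a document. The WebSocket URL is a placeholder, the helper name is hypothetical, and the exact XML produced by the one-click configuration may differ.

```python
import xml.etree.ElementTree as ET


def build_fork_twiml(stream_url: str, track: str = "inbound_track") -> str:
    """Return TwiML that forks call audio to stream_url over WebSocket.

    <Start><Stream> forks the audio and lets the call continue with
    whatever TwiML verbs follow it.
    """
    response = ET.Element("Response")
    start = ET.SubElement(response, "Start")
    ET.SubElement(start, "Stream", {"url": stream_url, "track": track})
    return ET.tostring(response, encoding="unicode")


# Placeholder endpoint; substitute the WebSocket URL for your account.
twiml = build_fork_twiml("wss://example.invalid/audio")
print(twiml)
```

The `track` attribute selects which side of the call is forked. `inbound_track` sends only the caller's audio, which is usually what a transcription pipeline needs.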
Check our Status Page for real-time latency metrics across all Australian capital cities.
On Australian networks, we aim for <800ms "Voice-to-Voice" latency. This includes network transit, transcription, LLM token generation, and TTS synthesis.
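One way to reason about that target is as a budget shared by the four stages listed above. The sketch below sums per-stage timings and reports the remaining headroom. The stage figures are made-up placeholders to show the arithmetic, not measurements, and the function names are hypothetical.

```python
BUDGET_MS = 800  # voice-to-voice target stated above

# Placeholder timings for illustration only; substitute your own measurements.
example_stages_ms = {
    "network_transit": 100,
    "transcription": 200,
    "llm_token_generation": 300,
    "tts_synthesis": 150,
}


def headroom_ms(stages_ms, budget_ms=BUDGET_MS):
    """Milliseconds left in the budget (negative means over budget)."""
    return budget_ms - sum(stages_ms.values())


def slowest_stage(stages_ms):
    """Name of the stage consuming the largest share of the budget."""
    return max(stages_ms, key=stages_ms.get)


print(headroom_ms(example_stages_ms), slowest_stage(example_stages_ms))
```

Tracking the budget per stage shows where a regression comes from when the end-to-end figure drifts.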