Voice Configuration

The Voice tab configures the three core services in your agent's pipeline: speech-to-text (STT), text-to-speech (TTS), and the LLM.

Speech-to-Text (STT)

Converts the caller's speech into text for the LLM.

Provider	Models	Languages	Notes
Deepgram	nova-3-general	30+ languages	Low latency, high accuracy
Sarvam	saarika:v2.5, saaras:v3	Indian languages	Optimized for Hindi, Tamil, etc.
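
As an illustration of how the table above could be used programmatically, the sketch below checks that a chosen STT model belongs to the chosen provider. The `STT_MODELS` mapping and the `validate_stt` helper are hypothetical names introduced for this example; they mirror the table only and are not the product's actual configuration schema.

```python
# Hypothetical catalog copied from the STT table above; not an official schema.
STT_MODELS = {
    "deepgram": ["nova-3-general"],
    "sarvam": ["saarika:v2.5", "saaras:v3"],
}


def validate_stt(provider: str, model: str) -> bool:
    """Return True if the provider/model pair appears in the table above."""
    return model in STT_MODELS.get(provider.lower(), [])
```

For example, `validate_stt("Sarvam", "saaras:v3")` accepts a pair listed in the table, while pairing a Sarvam provider with a Deepgram model is rejected.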

Text-to-Speech (TTS)

Converts the LLM's text response into speech.

Provider	Models	Voices	Notes
Deepgram	aura-2-helena-en	Multiple English voices	Natural sounding, fast
Sarvam	bulbul:v3-beta	anushka (F), shubh (M)	Indian language voices
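
A TTS selection has one more moving part than STT, because some providers expose a separate voice name. The sketch below models that as a small record and reports any mismatch against the table above. `TtsConfig`, `TTS_CATALOG`, and `check_tts` are hypothetical names for illustration only; the Deepgram voice list is left empty because the table does not enumerate it, so Deepgram voices are not checked here.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical catalog copied from the TTS table above; not an official schema.
# An empty "voices" list means the table names no voices, so none are checked.
TTS_CATALOG = {
    "deepgram": {"models": ["aura-2-helena-en"], "voices": []},
    "sarvam": {"models": ["bulbul:v3-beta"], "voices": ["anushka", "shubh"]},
}


@dataclass(frozen=True)
class TtsConfig:
    provider: str
    model: str
    voice: Optional[str] = None


def check_tts(cfg: TtsConfig) -> List[str]:
    """Return a list of problems; an empty list means the selection matches the table."""
    entry = TTS_CATALOG.get(cfg.provider.lower())
    if entry is None:
        return [f"unknown provider: {cfg.provider}"]
    problems = []
    if cfg.model not in entry["models"]:
        problems.append(f"model {cfg.model} is not listed for {cfg.provider}")
    if entry["voices"] and cfg.voice not in entry["voices"]:
        problems.append(f"voice {cfg.voice} is not listed for {cfg.provider}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a settings form show every mismatch at once.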

Large Language Model (LLM)

The LLM is the brain of your agent. It processes the conversation and generates responses.

Provider API keys can be configured at two levels:

Background Sound

Add ambient audio during calls for a more natural experience:

Sound	Description
None	Silent background (default)
Office	Office ambiance
Cafe	Coffee shop atmosphere
Rain	Rain sounds
White Noise	Consistent background noise
Nature	Birds, wind, outdoor sounds
Keyboard	Typing sounds

Adjust the volume slider (0–100%) to control how loud the background sound is relative to the agent's voice.
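
To make the slider's effect concrete, the sketch below shows one plausible way a 0–100% setting could scale a background track before it is mixed with the agent's voice. The linear percent-to-gain mapping, the function names, and the sample format (floats in the range -1.0 to 1.0) are all assumptions for illustration; the actual mixing behaviour is not documented here.

```python
from typing import List


def volume_to_gain(percent: float) -> float:
    """Map a 0-100 slider value to a 0.0-1.0 gain (assumed linear mapping)."""
    if not 0 <= percent <= 100:
        raise ValueError("volume must be between 0 and 100")
    return percent / 100.0


def mix(voice: List[float], background: List[float], percent: float) -> List[float]:
    """Add the scaled background to the voice, clipping each sample to [-1.0, 1.0]."""
    gain = volume_to_gain(percent)
    return [max(-1.0, min(1.0, v + gain * b)) for v, b in zip(voice, background)]
```

Under these assumptions, a setting of 0% leaves the voice samples unchanged, which matches the silent default of the None option.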