Building Production-Ready Voice Agents with LiveKit
Setting the stage for building real-time voice agents
You've probably talked to a voice AI recently. Maybe it was frustrating. Maybe it cut you off. Maybe it felt robotic. Building a voice agent that actually feels natural is harder than it looks. But once you understand the fundamentals, you can build something people genuinely want to talk to.
That's what this workshop is about.
Who this is for
This workshop assumes you're comfortable writing Python. If you can write functions, work with classes, and you've seen async/await before, you're good. You don't need prior experience with voice AI, machine learning, or real-time systems. Those concepts are explained as they come up.
Why now
Voice AI has hit an inflection point. Speech-to-text models are faster and more accurate. LLMs can handle nuanced conversations. Text-to-speech sounds human. The pieces exist. The challenge is wiring them together into something that feels responsive and natural.
A year ago, building this required specialized audio engineering knowledge and months of work. Today, you can get a working voice agent running in just a few minutes.
What you'll learn
Here's what you'll learn throughout the workshop:
- Voice pipeline fundamentals — how VAD, STT, LLM, and TTS work together
- Latency optimization — why milliseconds matter and how to minimize delays
- Natural turn-taking — techniques so your agent doesn't interrupt mid-sentence
- Personality and failover — customize prompts and voices, and set up provider redundancy
- Tool integration — connect external APIs and capabilities
- Workflows and handoffs — build real workflows for production use cases
By the end, you'll have an agent that listens, thinks, and responds in real time. Not a toy demo. A foundation you can actually ship. Each lesson adds a new capability, and each one builds on the last.
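To make the VAD, STT, LLM, and TTS stages from the list above concrete, here is a minimal sketch of one conversational turn. It does not use the LiveKit Agents SDK: every function is a hypothetical stand-in that sleeps briefly to simulate work, and the names (`detect_speech`, `transcribe`, `generate_reply`, `synthesize`, `run_turn`) are assumptions made for illustration. It also times each stage, since the delay a caller perceives is roughly the sum of the stages.

```python
import asyncio
import time


# Hypothetical stand-ins for real providers. Each one sleeps to simulate work.
async def detect_speech(audio: bytes) -> bytes:
    """VAD: keep only the part of the audio that contains speech."""
    await asyncio.sleep(0.01)
    return audio.strip(b"\x00")  # treat zero bytes as silence


async def transcribe(speech: bytes) -> str:
    """STT: turn speech audio into text."""
    await asyncio.sleep(0.02)
    return speech.decode("utf-8")


async def generate_reply(text: str) -> str:
    """LLM: produce a reply to the user's text."""
    await asyncio.sleep(0.03)
    return f"You said: {text}"


async def synthesize(reply: str) -> bytes:
    """TTS: turn the reply text back into audio."""
    await asyncio.sleep(0.02)
    return reply.encode("utf-8")


async def run_turn(audio: bytes):
    """Run one turn through all four stages, recording seconds per stage."""
    stages = [
        ("vad", detect_speech),
        ("stt", transcribe),
        ("llm", generate_reply),
        ("tts", synthesize),
    ]
    timings = {}
    value = audio
    for name, stage in stages:
        start = time.perf_counter()
        value = await stage(value)
        timings[name] = time.perf_counter() - start
    return value, timings


if __name__ == "__main__":
    audio_out, timings = asyncio.run(run_turn(b"\x00\x00hello\x00"))
    print(audio_out)
    for name, seconds in timings.items():
        print(f"{name}: {seconds * 1000:.1f} ms")
```

This sketch runs the stages strictly one after another to keep the data flow easy to follow. Production pipelines generally stream partial results between stages rather than waiting for each one to finish, which is one of the main ways to reduce perceived latency.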
Tech stack
This workshop uses the LiveKit Agents SDK for Python. LiveKit handles the real-time infrastructure so you can focus on the agent logic.
Note: While this workshop uses Python, LiveKit also offers a TypeScript Agents SDK if you prefer working in JavaScript or TypeScript.
The tools and model providers used throughout the workshop:
- Python 3.10+
- LiveKit Agents SDK
- AssemblyAI (STT)
- OpenAI (LLM)
- Cartesia (TTS)
These are not the only options. The Agents SDK supports a wide range of providers for each component. Want to use Azure for STT? Anthropic for your LLM? ElevenLabs for TTS? You can swap them out with just a few lines of code. The patterns you learn here apply regardless of which providers you choose.
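One reason a provider swap can be this small is that every provider for a given stage sits behind the same interface. The sketch below illustrates that idea in plain Python. It is not the Agents SDK API: `TextToSpeech`, `StubCartesiaTTS`, `StubElevenLabsTTS`, and `speak` are hypothetical names, and the stubs return tagged bytes instead of calling any real service.

```python
from typing import Protocol


class TextToSpeech(Protocol):
    """Hypothetical interface that every TTS provider in this sketch satisfies."""

    def synthesize(self, text: str) -> bytes: ...


class StubCartesiaTTS:
    """Stand-in only: tags the output instead of calling a real service."""

    def synthesize(self, text: str) -> bytes:
        return b"cartesia:" + text.encode("utf-8")


class StubElevenLabsTTS:
    """Stand-in only: tags the output instead of calling a real service."""

    def synthesize(self, text: str) -> bytes:
        return b"elevenlabs:" + text.encode("utf-8")


def speak(tts: TextToSpeech, text: str) -> bytes:
    """Agent logic depends on the interface, not on a specific provider."""
    return tts.synthesize(text)


if __name__ == "__main__":
    # Swapping providers changes one argument; speak() itself is untouched.
    print(speak(StubCartesiaTTS(), "hello"))
    print(speak(StubElevenLabsTTS(), "hello"))
```

Because `speak` only relies on the shared `synthesize` method, the rest of the agent logic does not need to change when the provider does.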
We'll access all of these models through LiveKit Inference, which comes with credits, so you can follow along without spending any money.
Prerequisites
Before starting, make sure you have:
- Python 3.10+ installed on your machine
- uv package manager (installation guide)
- A code editor ready
- A LiveKit Cloud account with API key and secret (sign up free)
Workshop roadmap
The workshop covers the following topics:
- Foundations — architecture, baseline agent
- Turn detection — improve conversational dynamics with semantic turn-taking
- Personality & fallbacks — customize prompts and voices, and add provider redundancy
- Metrics, deployment, and latency optimization — capture usage data and reduce response delays
- Tools & MCP — integrate external capabilities and control planes
- Consent & handoffs — build workflows for real customers
Each lesson introduces new LiveKit APIs and updates agent.py incrementally.