Project Showcase

Voice AI Demo

Self-Hosted Conversational Voice AI Assistant

A fully self-hosted, open-source conversational voice AI assistant powered by LiveKit Agents. Features real-time WebRTC audio transport, a configurable AI pipeline with VAD, STT, LLM, and TTS, a polished Next.js web UI, and Apple Silicon Metal GPU acceleration — all running locally with zero cloud dependencies.

Tech Stack

LiveKit AgentsFastAPIPython 3.12Next.js 15React 19Tailwind CSS 4WhisperOllamaEdge-TTSDockerWebRTC
Built with LiveKit Agents — Real-time voice pipeline with VAD, STT, LLM, and TTS

Key Features

Real-Time Voice Conversation

Low-latency WebRTC audio transport via LiveKit. Speak naturally and hear responses in real-time with sub-2s latency.

100% Self-Hosted

All AI models run locally — Silero VAD, faster-whisper STT, Ollama LLM, and Edge-TTS. No cloud dependencies, no API keys, no data leaving your machine.

Configurable AI Pipeline

Swap STT, LLM, and TTS providers at runtime via .env. Use local models or switch to Groq, OpenAI, Cartesia, or any OpenAI-compatible API.

Apple Silicon Optimized

Metal GPU acceleration for Whisper, Llama, and other models. Runs efficiently on MacBook hardware with minimal resource usage.

Architecture

LiveKit Agents SDK 1.5 for voice pipeline orchestration
FastAPI for JWT token generation and health check endpoints
Silero VAD for speech segment and utterance boundary detection
faster-whisper for local STT transcription (large-v3-turbo)
Ollama with llama3.2:3b for local LLM inference
Edge-TTS for speech synthesis with natural-sounding voices
Docker-based LiveKit WebRTC SFU for media routing
SafeSTT/SafeLLM/SafeTTS wrappers for graceful error handling

Diagrams

Voice AI Demo architecture
Four tiers: browser layer (Next.js 15 + LiveKit SDK), agent service (FastAPI + LiveKit Agent Worker), media layer (LiveKit Server in Docker), and AI voice pipeline (VAD → STT → LLM → TTS) with Safe wrappers for resilience.
Voice AI Demo request and audio streaming flow
Browser requests a JWT token from FastAPI, then connects to LiveKit via WebRTC. LiveKit dispatches a job to the Agent Worker which runs the voice pipeline — VAD, STT, LLM (Ollama), and TTS — and streams audio back through LiveKit to the browser.

API Usage

Start the Full Stack
One-command deployment...
# 1. Start LiveKit Server
docker compose up -d

# 2. Pull LLM model
ollama pull llama3.2:3b

# 3. Start Agent Service
cd agent && python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python main.py --test-mode  # or without --test-mode for real models

# 4. Start Frontend
cd frontend && npm install && npm run dev

# Open http://localhost:3000

4-Stage Voice Pipeline

1

VAD (Voice Activity Detection)

Detects speech segments and utterance boundaries using Silero VAD

SileroVAD.detect_voice(audio_chunk) → speech_segments
2

STT (Speech-to-Text)

Transcribes audio to text using local faster-whisper model (large-v3-turbo)

WhisperSTT.transcribe(audio) → "Hello, how can I help you?"
3

LLM (Language Model)

Generates natural language responses via Ollama with OpenAI-compatible API

OllamaLLM.generate(prompt) → "I can help you with that!"
4

TTS (Text-to-Speech)

Synthesizes response audio using Microsoft Edge TTS engine

EdgeTTS.synthesize(text) → audio_bytes

Provider Switching

Example .env Configurations

Groq LLM

LLM_PROVIDER=openai_compatible · LLM_MODEL=llama-3.3-70b-versatile

OpenAI STT

STT_PROVIDER=openai_compatible · STT_MODEL=whisper-1

Cartesia TTS

TTS_PROVIDER=openai_compatible · TTS_MODEL=sonic-2

Add corresponding API keys to .env. System auto-detects provider type at startup.

Impact

Real-time voice conversation with full privacy — all models run locally, zero API costs, sub-2s response latency on Apple Silicon.

4-stage

Voice Pipeline

Real-time

WebRTC Streaming

0 API

Cloud Costs