Project Showcase
Voice AI Demo
Self-Hosted Conversational Voice AI Assistant
A fully self-hosted, open-source conversational voice AI assistant powered by LiveKit Agents. Features real-time WebRTC audio transport, a configurable AI pipeline with VAD, STT, LLM, and TTS, a polished Next.js web UI, and Apple Silicon Metal GPU acceleration — all running locally with zero cloud dependencies.
Tech Stack
Key Features
Real-Time Voice Conversation
Low-latency WebRTC audio transport via LiveKit. Speak naturally and hear responses in real time, with under 2 seconds of response latency.
100% Self-Hosted
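The default stack needs no API keys and no paid cloud services: Silero VAD, faster-whisper STT, and an Ollama LLM run on your machine, and speech is synthesized with Edge-TTS. Edge-TTS calls Microsoft's online speech service, so the TTS stage needs a network connection unless a local TTS provider is configured.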
Configurable AI Pipeline
Swap STT, LLM, and TTS providers by editing .env, with no code changes; the settings are read when the agent starts. Use local models or switch to Groq, OpenAI, Cartesia, or any OpenAI-compatible API.
Apple Silicon Optimized
Metal GPU acceleration for Whisper, Llama, and other models. Runs efficiently on MacBook hardware with minimal resource usage.
Architecture
Diagrams
API Usage
# 1. Start LiveKit Server
docker compose up -d
# 2. Pull LLM model
ollama pull llama3.2:3b
# 3. Start Agent Service
cd agent && python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python main.py --test-mode # or without --test-mode for real models
# 4. Start Frontend (in a second terminal, from the project root, since the agent keeps running)
cd frontend && npm install && npm run dev
# Open http://localhost:3000
4-Stage Voice Pipeline
VAD (Voice Activity Detection)
Detects speech segments and utterance boundaries using Silero VAD
SileroVAD.detect_voice(audio_chunk) → speech_segments
STT (Speech-to-Text)
Transcribes audio to text using a local faster-whisper model (large-v3-turbo)
WhisperSTT.transcribe(audio) → "Hello, how can I help you?"
LLM (Language Model)
Generates natural language responses through Ollama's OpenAI-compatible API
OllamaLLM.generate(prompt) → "I can help you with that!"
TTS (Text-to-Speech)
Synthesizes response audio using the Microsoft Edge TTS engine
EdgeTTS.synthesize(text) → audio_bytes
Provider Switching
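Because each of the four stages (VAD, STT, LLM, TTS) exposes a single call, providers can sit behind a common interface and be swapped without touching the rest of the pipeline. The sketch below is a simplified, synchronous illustration of that idea, not the project's actual code: the method names follow the signatures shown for each stage, while the `VoicePipeline` class, the `Protocol` interfaces, and the stub stages (`FakeVAD`, `FakeSTT`, `EchoLLM`, `FakeTTS`) are hypothetical and exist only so the example runs without any models installed. The real agent streams audio through LiveKit Agents rather than processing one chunk at a time.

```python
from dataclasses import dataclass
from typing import Protocol


# Hypothetical stage interfaces; method names mirror the signatures in the text.
class VAD(Protocol):
    def detect_voice(self, audio_chunk: bytes) -> list[bytes]: ...


class STT(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LLM(Protocol):
    def generate(self, prompt: str) -> str: ...


class TTS(Protocol):
    def synthesize(self, text: str) -> bytes: ...


@dataclass
class VoicePipeline:
    """Chains the four stages; any object with the right method can be plugged in."""
    vad: VAD
    stt: STT
    llm: LLM
    tts: TTS

    def respond(self, audio_chunk: bytes) -> list[bytes]:
        """Run one audio chunk through VAD -> STT -> LLM -> TTS."""
        replies = []
        for segment in self.vad.detect_voice(audio_chunk):
            text = self.stt.transcribe(segment)
            if not text.strip():
                continue  # skip segments that produced no transcript
            replies.append(self.tts.synthesize(self.llm.generate(text)))
        return replies


# Stub stages (assumptions for illustration): bytes are treated as UTF-8 text.
class FakeVAD:
    def detect_voice(self, audio_chunk: bytes) -> list[bytes]:
        return [audio_chunk] if audio_chunk else []


class FakeSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class EchoLLM:
    def generate(self, prompt: str) -> str:
        return f"You said: {prompt}"


class FakeTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


pipeline = VoicePipeline(FakeVAD(), FakeSTT(), EchoLLM(), FakeTTS())
print(pipeline.respond(b"hello"))  # [b'You said: hello']
```

Swapping a provider then amounts to passing a different object for one field, for example a remote LLM client in place of `EchoLLM`, while the other three stages stay unchanged.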
Groq LLM
LLM_PROVIDER=openai_compatible · LLM_MODEL=llama-3.3-70b-versatile
OpenAI STT
STT_PROVIDER=openai_compatible · STT_MODEL=whisper-1
Cartesia TTS
TTS_PROVIDER=openai_compatible · TTS_MODEL=sonic-2
Add the corresponding API keys to .env. The system auto-detects the provider type at startup.
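One way such startup detection could work is to read the `*_PROVIDER` and `*_MODEL` variables for each stage and fall back to local defaults when they are absent. The following is a minimal sketch under stated assumptions, not the project's loader: the `openai_compatible` value and the model names come from the settings above, but the `local` provider name, the `*_API_KEY` variable names, and the `StageConfig`, `load_stage`, and `load_all` helpers are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumed defaults. "local" is a hypothetical provider name; the STT and LLM
# model names are the ones mentioned in the text, the TTS default is unknown.
DEFAULTS = {
    "STT": ("local", "large-v3-turbo"),
    "LLM": ("local", "llama3.2:3b"),
    "TTS": ("local", None),
}


@dataclass(frozen=True)
class StageConfig:
    stage: str
    provider: str
    model: Optional[str]
    api_key: Optional[str]

    @property
    def is_remote(self) -> bool:
        # A stage counts as remote when it points at an OpenAI-compatible API.
        return self.provider == "openai_compatible"


def load_stage(stage: str, env: Mapping[str, str] = os.environ) -> StageConfig:
    """Resolve one stage's settings from the environment, with local fallbacks."""
    default_provider, default_model = DEFAULTS[stage]
    cfg = StageConfig(
        stage=stage,
        provider=env.get(f"{stage}_PROVIDER", default_provider),
        model=env.get(f"{stage}_MODEL", default_model),
        api_key=env.get(f"{stage}_API_KEY"),  # hypothetical variable name
    )
    if cfg.is_remote and not cfg.api_key:
        raise ValueError(f"{stage}_PROVIDER=openai_compatible requires {stage}_API_KEY")
    return cfg


def load_all(env: Mapping[str, str] = os.environ) -> dict[str, StageConfig]:
    """Resolve all three configurable stages at once."""
    return {stage: load_stage(stage, env) for stage in DEFAULTS}


# Example: only the LLM is switched to a remote provider; STT and TTS stay local.
example_env = {
    "LLM_PROVIDER": "openai_compatible",
    "LLM_MODEL": "llama-3.3-70b-versatile",
    "LLM_API_KEY": "sk-example",  # placeholder, not a real key
}
configs = load_all(example_env)
for cfg in configs.values():
    print(cfg.stage, cfg.provider, cfg.model)
```

Failing fast on a missing key at startup, as this sketch does, surfaces a misconfigured .env before the first conversation rather than in the middle of one.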
Impact
Real-time voice conversation with full privacy — all models run locally, zero API costs, sub-2s response latency on Apple Silicon.
4-stage voice pipeline · Real-time WebRTC streaming · 0 API and cloud costs