LiveKit Vision Demo
March 10, 2026 · View on GitHub
Warning
This example is outdated. It still works, but we've since added built-in vision support to a whole set of new frontend starter apps for every platform, and live video is easy to add to the Python agent starter repository; see that repository for the latest example.
This LiveKit sample app shows a voice AI assistant with realtime audio and video input.
It contains a native iOS frontend, built on LiveKit's Swift SDK, and a backend agent, built on LiveKit's Python Agents framework and the Gemini Live API.
Features
Real-time Video & Audio
- Front and back camera support
- Natural voice conversations
- Live screen sharing
Background Support
- Continues running while using other apps
- Voice conversations in the background
- Screen monitoring while multitasking
The assistant can observe and interact with you seamlessly, whether you're actively using the app or working on other tasks.
Agent Architecture
The backend agent is built on the MultimodalAgent class hooked up to the Gemini Live API.
Video frames are sampled at 1 frame per second while the user speaks, and 0.3 frames per second otherwise. Images are sent as JPEG at 1024x1024 max size. For more information on video input, see the LiveKit Agents vision docs.
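The sampling policy above can be sketched as a small rate limiter. This is a hypothetical helper for illustration, not part of the LiveKit SDK; the actual agent's sampling lives inside the framework.

```python
class FrameSampler:
    """Decides which incoming video frames to forward to the model.

    Mirrors the policy described above: ~1 frame/s while the user
    is speaking, ~0.3 frames/s otherwise. (Illustrative sketch only.)
    """

    SPEAKING_INTERVAL = 1.0    # seconds between frames while speaking
    IDLE_INTERVAL = 1.0 / 0.3  # ~3.33 s between frames otherwise

    def __init__(self) -> None:
        self._last_sent = float("-inf")

    def should_send(self, now: float, user_speaking: bool) -> bool:
        interval = self.SPEAKING_INTERVAL if user_speaking else self.IDLE_INTERVAL
        if now - self._last_sent >= interval:
            self._last_sent = now
            return True
        return False


sampler = FrameSampler()
# Frames arriving at 10 fps while the user speaks: only ~1 per second passes.
sent = [sampler.should_send(t / 10, user_speaking=True) for t in range(30)]
```

Over 3 seconds of 10 fps input, only the frames at t=0s, 1s, and 2s are forwarded.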
Running Locally
This project is meant to be a starting point for your own project, and is easy to run locally.
Running the Agent
Prerequisites
- LiveKit Cloud project
- Google Gemini API Key
- Python 3
Setup
Put your LIVEKIT_URL, LIVEKIT_API_KEY, LIVEKIT_API_SECRET, GOOGLE_API_KEY into a file called agent/.env.
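For example, agent/.env might look like this (placeholder values; substitute your own project's credentials):

```
LIVEKIT_URL=wss://<your-project>.livekit.cloud
LIVEKIT_API_KEY=<your-api-key>
LIVEKIT_API_SECRET=<your-api-secret>
GOOGLE_API_KEY=<your-gemini-api-key>
```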
Then install the dependencies:
cd agent
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Finally, run the agent with:
python main.py dev
Using the Agents Playground
This project is fully compatible with LiveKit's Agents Playground, so you can easily test the agent in your browser without having to build the iOS app. Just go to the playground, pick your cloud project, and connect! There is a checkbox to "Enable camera" if you wish to share your camera feed with the agent.
Running the iOS App
This project includes a sample iOS app that you can build yourself.
Prerequisites
- Xcode 16
- Device with iOS 17+ (simulator is not supported)
- LiveKit Cloud project
- A token server (enable from your project's Options on the Settings page)
Setup
- Open swift-frontend/VisionDemo/VisionDemo.xcodeproj in Xcode.
- Create a file swift-frontend/VisionDemo/Resources/Secrets.xcconfig with LK_SANDBOX_TOKEN_SERVER_ID= and your token server's unique ID.
- Edit the bundle identifier for the VisionDemo target to a suitable value for your own use.
- Edit the bundle identifier for the BroadcastExtension target to <your-bundle-identifier>.broadcast.
- Create a new App Group called group.<your-bundle-identifier> and select it in the "Signing & Capabilities" section of the VisionDemo target.
- Build and run the app on your device.
Self-Hosted Options
This project uses the LiveKit Cloud token server (enable it from your project's Options on the Settings page) to make token generation easy. If you want to self-host or run a local LiveKit instance, you'll need to modify the swift-frontend/VisionDemo/Services/TokenService.swift file to fetch a token from your own server, and remove the noise-cancellation plugin from the agent (enhanced noise cancellation is a LiveKit Cloud feature).
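If you self-host, your own token endpoint just has to sign a LiveKit access token, which is a standard HS256 JWT carrying your API key, a participant identity, and a room-join grant. The sketch below builds one with only the standard library so the claim layout is visible; in a real server you should use the livekit-api package's AccessToken helper instead, which tracks grant changes for you.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def make_livekit_token(api_key: str, api_secret: str, identity: str, room: str) -> str:
    """Minimal HS256 JWT in the shape LiveKit expects (sketch only)."""
    header = {"alg": "HS256", "typ": "JWT"}
    now = int(time.time())
    payload = {
        "iss": api_key,    # your LiveKit API key identifies the signer
        "sub": identity,   # participant identity shown in the room
        "nbf": now,
        "exp": now + 3600, # 1-hour validity
        "video": {"roomJoin": True, "room": room},
    }
    signing_input = f"{b64url(json.dumps(header).encode())}.{b64url(json.dumps(payload).encode())}"
    sig = hmac.new(api_secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(sig)}"


token = make_livekit_token("devkey", "secret", "alice", "vision-demo")
```

The frontend's TokenService.swift would then fetch this string from your endpoint and pass it to the LiveKit room connection.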