Observer AI
May 16, 2026
YouTube | Tiktok | Instagram | Twitter | Discord
Build powerful micro-agents that observe, log and react, so you don't have to.
All while keeping your data 100% private and secure.
How Observer Agents Work
Sensors → Models → Tools
Base Agent Example
Sends an email when the Observer logo is on screen
System Prompt (uses $SCREEN for multimodal screen input)
You are an Observer agent, watch the screen and if you see the Observer logo say OBSERVER, if you don't, say CONTINUE.
$SCREEN
Code that uses the Email tool when the model identifies the Observer logo
if(response.includes("OBSERVER")){
sendEmail("your@email.com", response, screen); //sends the screen as an attached image
}
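To try this check outside Observer, the sketch below replaces `sendEmail` with a hypothetical stub that only records its arguments (the real tool is provided by Observer's sandbox) and runs the same logic against two sample model responses.

```javascript
// Hypothetical stand-in for Observer's sendEmail tool: it only records calls.
const sentEmails = [];
function sendEmail(email, message, images) {
  sentEmails.push({ email, message, images });
}

// The Code tab logic from the example, wrapped in a function so it can be
// run against different model responses.
function onResponse(response, screen) {
  if (response.includes("OBSERVER")) {
    sendEmail("your@email.com", response, screen); // attach the screenshot
  }
}

// Simulate two loop iterations: one without the logo, one with it.
onResponse("CONTINUE", "fake-screenshot-1");
onResponse("OBSERVER", "fake-screenshot-2");

console.log(sentEmails.length); // 1
```

`includes` is case-sensitive, which is why the prompt asks the model to answer with the exact uppercase keyword.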
What Observer AI Does Best
Intelligent Logging: Text & Visual Memory • Smart Screen Recording
Powerful Notifications: Email • Discord • Telegram • SMS • WhatsApp • Pushover
Building Your Own Agent
Creating your own Observer AI agent consists of three things:
- SENSORS - the inputs your model receives
- MODELS - small LLMs that process those inputs
- TOOLS - functions for your agent to use
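As a rough mental model, one agent iteration reads the sensors, sends the prompt to the model, and runs your code with the response, then waits for the loop interval. The sketch below is not Observer's implementation: `runAgent`, `readSensors`, `callModel` and `runCode` are hypothetical names, and the model is a stub.

```javascript
// Hypothetical sketch of an Observer-style agent loop: sensors -> model -> tools.
// None of these names come from Observer itself.
async function runAgent({ readSensors, callModel, runCode, intervalMs, iterations }) {
  for (let i = 0; i < iterations; i++) {
    const inputs = await readSensors();       // SENSORS: e.g. screen, clipboard
    const response = await callModel(inputs); // MODELS: a small LLM
    await runCode(response, inputs);          // TOOLS: your code reacts to the response
    if (i < iterations - 1) {
      // Wait for the loop interval before the next iteration.
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
}

// Example with stubbed sensors and model (no real inference).
const log = [];
runAgent({
  readSensors: async () => ({ clipboard: "hello" }),
  callModel: async (inputs) => `saw ${inputs.clipboard}`,
  runCode: async (response) => { log.push(response); },
  intervalMs: 10,
  iterations: 3,
}).then(() => console.log(log.length)); // 3
```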
Quick Start
- Navigate to the Agent Dashboard and click "Create New Agent"
- Fill in the "Configuration" tab with basic details (name, description, model, loop interval)
- Give your model a system prompt and Sensors. The available Sensors are:
- Screenshot ($SCREEN): Captures the screen as an image for multimodal models.
- Screen OCR ($SCREEN_OCR): Captures screen content as text via OCR.
- Camera ($CAMERA): Captures the camera feed as an image for multimodal models.
- Camera OCR ($CAMERA_OCR): Captures text from the camera feed via OCR.
- Agent Memory (MEMORY@agent_id): Accesses an agent's stored information (defaults to the current agent).
- Agent Image Memory (IMEMORY@agent_id): Accesses an agent's stored images (defaults to the current agent).
- Agent Image Memory OCR ($IMEMORY_OCR): Captures the text in the agent's image memory via OCR.
- Clipboard ($CLIPBOARD): Pastes the clipboard contents into the prompt.
- Microphone* ($MICROPHONE): Captures the microphone and adds a transcription.
- Screen Audio* ($SCREEN_AUDIO): Transcribes the audio of a shared screen or tab.
- All Audio* ($ALL_AUDIO): Mixes the microphone and screen audio and provides a transcription of both (useful for meetings).
* Uses a Whisper model running in Transformers.js.
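Text sensors appear as `$NAME` placeholders in the system prompt. A minimal sketch of how such placeholders could be expanded, assuming a hypothetical `expandSensors` helper and plain-text sensor values (image sensors such as `$SCREEN` are attached as images rather than inlined, and the `MEMORY@agent_id` form is not handled here). Since `$SCREEN` is a prefix of `$SCREEN_OCR` and `$SCREEN_AUDIO`, the sketch matches whole names greedily.

```javascript
// Hypothetical sketch: expand $SENSOR placeholders in a prompt template.
// The regex is greedy over [A-Z_], so "$SCREEN_OCR" is matched as one name
// and never mistaken for "$SCREEN" followed by "_OCR".
function expandSensors(template, values) {
  return template.replace(/\$[A-Z_]+/g, (placeholder) => {
    const name = placeholder.slice(1); // drop the leading "$"
    // Leave unknown placeholders untouched so missing sensors stay visible.
    return Object.prototype.hasOwnProperty.call(values, name) ? values[name] : placeholder;
  });
}

const prompt = expandSensors(
  "Summarize this text:\n$SCREEN_OCR\nClipboard: $CLIPBOARD",
  { SCREEN_OCR: "Build passed", CLIPBOARD: "npm test" }
);
console.log(prompt);
```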
Agent Tools:
- getMemory(agentId?)*: Retrieve stored memory
- setMemory(agentId?, content)*: Replace stored memory
- appendMemory(agentId?, content)*: Add to existing memory
- getImageMemory(agentId?)*: Retrieve images stored in memory
- setImageMemory(agentId?, images)*: Set images in memory
- appendImageMemory(agentId?, images)*: Add images to memory
- startAgent(agentId?)*: Starts an agent
- stopAgent(agentId?)*: Stops an agent
- time(): Gets the current time
- sleep(ms): Waits the given number of milliseconds
* agentId is optional and defaults to the agent running the code.
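`setMemory` replaces memory while `appendMemory` adds to it, so an agent that appends every loop grows its memory without bound. The sketch below keeps only the newest entries instead. It uses in-memory stand-ins for `getMemory`/`setMemory` (hypothetical; the real tools may return promises, in which case you would `await` them), and `appendCapped`, `MAX_LINES` and the agent id are hypothetical names.

```javascript
// Hypothetical in-memory stand-ins for Observer's memory tools, keyed by agent id.
// As in Observer, agentId is optional and defaults to the running agent.
const store = new Map();
const CURRENT_AGENT = "my_agent"; // hypothetical id

function getMemory(agentId = CURRENT_AGENT) {
  return store.get(agentId) ?? "";
}
function setMemory(...args) {
  // Accept setMemory(content) or setMemory(agentId, content).
  const [agentId, content] = args.length === 1 ? [CURRENT_AGENT, args[0]] : args;
  store.set(agentId, content);
}

// Keep only the newest MAX_LINES entries instead of growing memory forever.
const MAX_LINES = 3; // hypothetical limit
function appendCapped(entry) {
  const lines = getMemory().split("\n").filter((line) => line !== "");
  lines.push(entry);
  setMemory(lines.slice(-MAX_LINES).join("\n"));
}

for (const entry of ["a", "b", "c", "d"]) appendCapped(entry);
console.log(getMemory()); // prints b, c and d on separate lines
```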
Notification Tools:
- sendDiscord(discord_webhook, message, images?, videos?): Sends a Discord message directly to a server.
- sendTelegram(chat_id, message, images?, videos?): Sends a Telegram message with the Observer bot. Get your chat_id by messaging the bot @observer_notification_bot.
- sendEmail(email, message, images?, videos?): Sends an email. The address must be the email you signed in with.
- sendPushover(user_token, message, images?, title?): Sends a Pushover notification.
- call(phone_number, message)*: Makes an automated phone call that reads the message with text-to-speech. Requires whitelisting.
- sendWhatsapp(phone_number, message, images?, videos?)*: Sends a WhatsApp message with the Observer bot. Requires whitelisting.
- sendSms(phone_number, message, images?, videos?)*: Sends an SMS to a phone number. Blocked for US/Canada numbers due to A2P policy. Requires whitelisting.
- notify(title, options): Sends a browser notification. ⚠️ IMPORTANT: Some browsers block notifications.
* To get whitelisted, send an SMS or call +1 (863) 208-5341, or message +1 (555) 783-4727 on WhatsApp.
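Because agent code runs on every loop, a condition that stays true would send a notification on every loop. A minimal sketch of a cooldown, using a stub `sendTelegram` that only records calls; `notifyWithCooldown`, `COOLDOWN_MS` and the 10-minute value are hypothetical. Note that `chat_id` is the first argument and the message the second.

```javascript
// Stub for Observer's sendTelegram(chat_id, message, images?, videos?): records calls only.
const sent = [];
function sendTelegram(chatId, message, images) {
  sent.push({ chatId, message, images });
}

// Hypothetical cooldown so a condition that stays true does not notify every loop.
const COOLDOWN_MS = 10 * 60 * 1000; // 10 minutes (assumption)
let lastSentAt = -Infinity;

function notifyWithCooldown(chatId, message, nowMs) {
  if (nowMs - lastSentAt < COOLDOWN_MS) return false; // still cooling down
  lastSentAt = nowMs;
  sendTelegram(chatId, message);
  return true;
}

// Simulated loop iterations at t = 0 s, 30 s and 11 min (use Date.now() in practice).
notifyWithCooldown("12345678", "Build failed", 0);
notifyWithCooldown("12345678", "Build failed", 30 * 1000);
notifyWithCooldown("12345678", "Build failed", 11 * 60 * 1000);
console.log(sent.length); // 2
```

If variables in the Code tab do not persist between loops, the last-sent timestamp could be stored with `setMemory` and read back with `getMemory` instead.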
Video Recording Tools:
- startClip(): Starts recording any video media and saves it to the recording tab.
- stopClip(): Stops the active recording.
- markClip(label): Adds a label to the active recording; the label is displayed in the recording tab.
- getVideo(): Returns an array of the videos in the buffer.
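A common pattern is to record only while something is on screen. The sketch below expresses that as a small state machine, with stub versions of `startClip`, `stopClip` and `markClip` that only log calls; the `recording` flag, the `ERROR` keyword and `onLoop` are hypothetical, and the flag would need to persist between loops.

```javascript
// Stubs for Observer's video recording tools: they only log the calls.
const calls = [];
const startClip = () => calls.push("start");
const stopClip = () => calls.push("stop");
const markClip = (label) => calls.push(`mark:${label}`);

// Hypothetical state: whether a clip is currently being recorded.
let recording = false;

// Start a clip when the model reports the event, stop it when the event ends.
function onLoop(response) {
  const eventVisible = response.includes("ERROR");
  if (eventVisible && !recording) {
    startClip();
    markClip("error on screen");
    recording = true;
  } else if (!eventVisible && recording) {
    stopClip();
    recording = false;
  }
}

["CONTINUE", "ERROR", "ERROR", "CONTINUE"].forEach(onLoop);
console.log(calls); // [ 'start', 'mark:error on screen', 'stop' ]
```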
App Tools:
- ask(question, title="Confirmation"): Pops up a system confirmation dialog.
- message(message, title="Agent Message"): Pops up a system message.
- system_notify(body, title="Observer AI"): Sends a system notification.
- overlay(body): Pushes a message to the overlay.
- click('left'|'right'): Triggers a mouse click at the current cursor position; accepts 'left' or 'right' and defaults to 'left'.
- celebrate(): Triggers a celebration animation in the Observer UI.
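A sketch combining a few App Tools: it mirrors the model's latest status in the overlay on every loop and celebrates once when the model reports completion. `overlay`, `system_notify` and `celebrate` are stubbed here; the `DONE` keyword, `onLoop` and the `alreadyCelebrated` flag are hypothetical.

```javascript
// Stubs for Observer's App Tools: they only record what would be shown.
const shown = [];
const overlay = (body) => shown.push(`overlay:${body}`);
const system_notify = (body, title = "Observer AI") => shown.push(`notify:${title}:${body}`);
const celebrate = () => shown.push("celebrate");

let alreadyCelebrated = false; // hypothetical state kept between loops

function onLoop(response) {
  overlay(response.trim()); // mirror the model's latest status in the overlay
  if (response.includes("DONE") && !alreadyCelebrated) {
    system_notify("Task finished"); // uses the default title
    celebrate();
    alreadyCelebrated = true;
  }
}

onLoop(" Working on step 2 ");
onLoop("DONE");
onLoop("DONE");
console.log(shown.length); // 5
```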
Code Tab
The "Code" tab receives the following variables as context before running:
- response: The model's response
- agentId: The id of the agent running the code
- screen: The screen, if captured
- camera: The camera image, if captured
- imemory: The agent's current image in memory
- images: All images sent to the model
- prompt: The model's prompt
- microphone: Transcription from the microphone in this loop
- screenAudio: Transcription from screen audio in this loop
- allAudio: Transcription from the microphone and screen audio mixed in this loop
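One way to picture how these variables reach your code: the Code tab source behaves like a function body whose parameters are the context names. The hypothetical `runCodeTab` harness below does exactly that with the `Function` constructor; Observer's actual sandbox may work differently.

```javascript
// Hypothetical harness: run Code-tab source with context variables in scope.
function runCodeTab(source, context) {
  const names = Object.keys(context); // e.g. ["response", "agentId", ...]
  const values = names.map((name) => context[name]);
  // new Function(param1, param2, ..., body) creates a function with those parameters.
  const fn = new Function(...names, source);
  return fn(...values);
}

const result = runCodeTab(
  'return `${agentId}: ${response.length} chars, mic=${microphone ?? "none"}`;',
  { response: "OBSERVER", agentId: "logo_watcher", microphone: null }
);
console.log(result); // logo_watcher: 8 chars, mic=none
```

Variables such as `screen` or `microphone` are only meaningful when the matching sensor was used in that loop, so it is worth checking them before use.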
JavaScript agents run in the browser sandbox, making them ideal for passive monitoring and notifications:
// Remove <think>...</think> reasoning blocks (e.g. from DeepSeek models)
const cleanedResponse = response.replace(/<think>[\s\S]*?<\/think>/g, '').trim();
// Get the current time (named "now" so it does not shadow the time() tool)
const now = time();
// Append the cleaned response to memory with a timestamp
appendMemory(`[${now}] ${cleanedResponse}`);
// Send to Telegram if the model mentions a word
if (cleanedResponse.includes("word")) {
  sendTelegram("12345678", cleanedResponse); // Example chat_id, passed first
}
Getting Started with Local Inference
There are a few ways to get Observer up and running with local inference. I recommend the Observer App.
Option 1 (Easiest): Pull models using Transformers.js on WebApp
In the ModelHub, you can download Gemma 4 e2b and Gemma 4 e4b directly in the browser. This option is somewhat unstable and can crash mobile devices, but it requires no installation at all!
Option 2 (Easy and Stable): Download the Observer App and use bundled llama.cpp models
Download the Official App:
The Observer App bundles llama.cpp, so you can run any GGUF model! If you use multimodal inputs, be sure to load the model's mmproj file.
Option 3 (Most stable): Use Desktop App with any OpenAI compatible endpoint (Ollama, llama.cpp, vLLM)
Download Ollama for the best compatibility. Observer can connect directly to any server that provides a v1/chat/completions endpoint.
Set the Custom Model Server URL in the App to any OpenAI-compatible endpoint.
NOTE: Because of CORS restrictions, the browser app sends requests to localhost:3838, and the Observer App proxies them to your Custom Model Server URL.
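For reference, a request to an OpenAI-compatible `v1/chat/completions` endpoint with a screenshot attached looks roughly like the sketch below. The `messages` shape with `image_url` content parts follows the OpenAI Chat Completions format; the base URL and model name are placeholders, and whether a given server or model accepts images depends on your setup. It assumes a runtime with a global `fetch` (a browser or Node 18+).

```javascript
// Build an OpenAI-style chat completion request with text and an optional image.
function buildChatRequest(model, prompt, imageDataUrl) {
  const content = [{ type: "text", text: prompt }];
  if (imageDataUrl) {
    content.push({ type: "image_url", image_url: { url: imageDataUrl } });
  }
  return { model, messages: [{ role: "user", content }] };
}

// Send it to a server exposing v1/chat/completions (e.g. Ollama, llama.cpp, vLLM).
async function chat(baseUrl, body) {
  const res = await fetch(`${baseUrl}/v1/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content; // the model's reply text
}

// Usage (placeholders): Ollama listens on http://localhost:11434 by default.
// chat("http://localhost:11434", buildChatRequest("your-model", "Describe the screen.", "data:image/png;base64,..."))
//   .then(console.log);
```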
Option 4: Full Docker Setup (Deprecated)
For Docker setup instructions, see docker/DOCKER.md.
Setting Up Python (Jupyter Server) (Deprecated)
For Jupyter server setup instructions, see app/JUPYTER.md.
Deploy & Share
Save your agent, test it from the dashboard, and upload it to the community to share it with others!
Contributing
We welcome contributions from the community! Here's how you can help:
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'feat: add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Built with ❤️ by Roy Medina for the Observer AI Community. Special thanks to the Ollama team for being an awesome backbone to this project!