👁️ Observer AI

May 16, 2026



Build powerful micro-agents that observe, log and react, so you don't have to.

All while keeping your data 100% private and secure.

👁️ How Observer Agents Work

Sensors (Screen • Camera • Mic • Audio) → Models (Local LLMs) → Tools (Messaging • Notifications • Recording • Memory • Code)

🤖 Base Agent Example

Sends an email when the Observer logo is on screen

System Prompt (uses $SCREEN for multimodal screen input)

You are an Observer agent. Watch the screen: if you see the Observer logo, say OBSERVER; if you don't, say CONTINUE.
$SCREEN

Code that uses the Email tool if the model identified the Observer logo

if(response.includes("OBSERVER")){
  sendEmail("your@email.com", response, screen); //sends the screen as an attached image
}
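To check the Code tab logic before wiring up real email, one option is to wrap it in a function and pass in stand-ins for the tools. This is a sketch for local testing only: `runBaseAgent` and `makeFakeTools` are hypothetical names, and the fake `sendEmail` just records its arguments instead of sending anything.

```javascript
// Hypothetical harness: runs the base agent's Code tab logic with injected tools.
// In Observer, `response`, `screen` and `sendEmail` come from the runtime;
// here they are passed in so the logic can run in plain Node.
function runBaseAgent(response, screen, tools) {
  if (response.includes("OBSERVER")) {
    tools.sendEmail("your@email.com", response, screen); // same call as the Code tab example
  }
}

// Stand-in for the real tool: records each call instead of emailing.
function makeFakeTools() {
  const sent = [];
  return {
    sent,
    sendEmail: (email, message, images) => sent.push({ email, message, images }),
  };
}

const tools = makeFakeTools();
runBaseAgent("CONTINUE", "fake-screen-image", tools); // no email
runBaseAgent("OBSERVER", "fake-screen-image", tools); // one email
console.log(tools.sent.length); // 1
```

Note that `includes` is a substring match, so a reply such as "I don't see OBSERVER" would also trigger the email; asking the model to answer with a single word keeps this check reliable.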

🎯 What Observer AI Does Best

📊 Intelligent Logging

🧠 Text & Visual Memory

🎥 Smart Screen Recording

🚨 Powerful Notifications

📧 Email • 💬 Discord • 📱 Telegram • 📞 SMS • 💚 WhatsApp • Pushover

🏗️ Building Your Own Agent

Creating your own Observer agent comes down to three things:

SENSORS - the inputs your model receives
MODELS - small LLMs, typically run locally
TOOLS - functions your agent uses to act on the model's response

Quick Start

Navigate to the Agent Dashboard and click "Create New Agent"
Fill in the "Configuration" tab with basic details (name, description, model, loop interval)
Give your model a system prompt and Sensors. The currently available Sensors are:
- Screenshot ($SCREEN): captures the screen as an image for multimodal models.
- Screen OCR ($SCREEN_OCR): captures screen content as text via OCR.
- Camera ($CAMERA): captures the camera feed as an image for multimodal models.
- Camera OCR ($CAMERA_OCR): captures text from the camera feed via OCR.
- Agent Memory ($MEMORY or $MEMORY@agent_id): accesses an agent's stored information (defaults to the current agent).
- Agent Image Memory ($IMEMORY or $IMEMORY@agent_id): accesses an agent's stored images (defaults to the current agent).
- Agent Image Memory OCR ($IMEMORY_OCR): extracts text from the agent's image memory via OCR.
- Clipboard ($CLIPBOARD): pastes the clipboard contents.
- Microphone* ($MICROPHONE): captures the microphone and adds a transcription.
- Screen Audio* ($SCREEN_AUDIO): transcribes the audio of a screen-shared tab.
- All Audio* ($ALL_AUDIO): mixes the microphone and screen audio and provides a complete transcription of both (useful for meetings).

* Transcribed with a Whisper model running in Transformers.js.
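Sensors are referenced as `$NAME` placeholders in the system prompt, optionally followed by `@agent_id` for the memory sensors. The sketch below lists which sensors a prompt uses and flags unknown `$UPPERCASE` tokens, which can catch a typo before the agent runs. `checkPrompt`, the regex and the allowed characters in `agent_id` are assumptions based on the list above, not Observer's own prompt parser.

```javascript
// Sensor names from the list above (assumed complete for this sketch).
const SENSORS = [
  "SCREEN", "SCREEN_OCR", "CAMERA", "CAMERA_OCR",
  "MEMORY", "IMEMORY", "IMEMORY_OCR",
  "CLIPBOARD", "MICROPHONE", "SCREEN_AUDIO", "ALL_AUDIO",
];

// Matches $NAME or $NAME@agent_id. The (?!\w) lookahead stops $SCREEN from
// matching the start of $SCREEN_OCR. The agent_id character set is an assumption.
const SENSOR_RE = new RegExp(`\\$(${SENSORS.join("|")})(?!\\w)(?:@([\\w-]+))?`, "g");

function checkPrompt(prompt) {
  // Known sensors, in order of appearance.
  const used = [...prompt.matchAll(SENSOR_RE)].map((m) => ({
    sensor: m[1],
    agentId: m[2] ?? null, // null means "current agent"
  }));
  // Any $UPPERCASE token that is not a known sensor is probably a typo.
  const unknown = [...prompt.matchAll(/\$([A-Z][A-Z_]*)/g)]
    .map((m) => m[1])
    .filter((name) => !SENSORS.includes(name));
  return { used, unknown };
}

const result = checkPrompt("Watch the screen.\n$SCREEN\nNotes: $MEMORY@logger_1\n$SCREN_OCR");
console.log(result.used.map((u) => u.sensor)); // [ 'SCREEN', 'MEMORY' ]
console.log(result.unknown); // [ 'SCREN_OCR' ]
```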

Agent Tools:

getMemory(agentId?)* - Retrieves stored memory
setMemory(agentId?, content)* - Replaces stored memory
appendMemory(agentId?, content)* - Adds to existing memory
getImageMemory(agentId?)* - Retrieves images stored in memory
setImageMemory(agentId?, images)* - Replaces the images in memory
appendImageMemory(agentId?, images)* - Adds images to memory
startAgent(agentId?)* - Starts an agent
stopAgent(agentId?)* - Stops an agent
time() - Returns the current time
sleep(ms) - Waits the given number of milliseconds

* agentId is optional and defaults to the agent running the code.
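The memory tools take an optional leading `agentId`: `appendMemory(content)` writes to the current agent, while `appendMemory(otherId, content)` targets another agent. The in-memory mock below only illustrates that argument convention for local testing. `createMemoryMock` is a hypothetical name, and the newline separator between appended entries is an assumption; the real tools may be asynchronous, format entries differently, and persist data elsewhere.

```javascript
// Hypothetical in-memory stand-in for getMemory/setMemory/appendMemory.
// With one argument, that argument is the content and the current agent is used.
function createMemoryMock(currentAgentId) {
  const store = new Map();

  // Split (agentId?, content) into an explicit [agentId, content] pair.
  const resolve = (args) =>
    args.length >= 2 ? [args[0], args[1]] : [currentAgentId, args[0]];

  return {
    getMemory: (agentId = currentAgentId) => store.get(agentId) ?? "",
    setMemory: (...args) => {
      const [id, content] = resolve(args);
      store.set(id, content);
    },
    appendMemory: (...args) => {
      const [id, content] = resolve(args);
      const previous = store.get(id) ?? "";
      // Assumption: entries are joined with a newline; the real tool may differ.
      store.set(id, previous ? `${previous}\n${content}` : content);
    },
  };
}

const mem = createMemoryMock("logger");
mem.setMemory("status: idle");           // current agent ("logger")
mem.appendMemory("summarizer", "hello"); // another agent's memory
mem.appendMemory("status: busy");        // current agent again
console.log(JSON.stringify(mem.getMemory())); // "status: idle\nstatus: busy"
console.log(mem.getMemory("summarizer"));     // hello
```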

Notification Tools:

sendDiscord(discord_webhook, message, images?, videos?) - Sends a message to a Discord server through a webhook.
sendTelegram(chat_id, message, images?, videos?) - Sends a Telegram message through the Observer bot. To get your chat_id, message the bot @observer_notification_bot.
sendEmail(email, message, images?, videos?) - Sends an email. The address must be the email you are signed in with.
sendPushover(user_token, message, images?, title?) - Sends a Pushover notification.
call(phone_number, message)* - Places an automated phone call that reads the message with text-to-speech. Requires whitelisting.
sendWhatsapp(phone_number, message, images?, videos?)* - Sends a WhatsApp message through the Observer bot. Requires whitelisting.
sendSms(phone_number, message, images?, videos?)* - Sends an SMS to a phone number. Blocked for US/Canada numbers due to A2P policy. Requires whitelisting.
notify(title, options) - Sends a browser notification. ⚠️ IMPORTANT: some browsers block notifications.

* To get whitelisted, SMS or call +1 (863) 208-5341, or message +1 (555) 783-4727 on WhatsApp.
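Because an agent's code runs on every loop interval, a condition that stays true would send a notification on every loop. One way to avoid that is a cooldown. The sketch below is a hypothetical helper, not an Observer feature: `shouldNotify`, `maybeAlert` and the `state` object (which you would need to persist between loops yourself, e.g. in the agent's memory) are illustrative names, and `sendTelegram` is injected so the logic can run outside Observer.

```javascript
// Decide whether enough time has passed since the last notification.
// Timestamps are milliseconds (e.g. from Date.now()).
function shouldNotify(lastSentMs, nowMs, cooldownMs) {
  if (lastSentMs === null || lastSentMs === undefined) return true; // never sent
  return nowMs - lastSentMs >= cooldownMs;
}

// Sketch of a Code tab step: notify at most once every ten minutes.
function maybeAlert(state, nowMs, sendTelegram) {
  const TEN_MINUTES = 10 * 60 * 1000;
  if (!shouldNotify(state.lastSentMs, nowMs, TEN_MINUTES)) return false;
  sendTelegram("12345678", "Condition detected"); // chat_id first, then message
  state.lastSentMs = nowMs;
  return true;
}

// Dry run with a fake sendTelegram that records calls.
const sent = [];
const state = { lastSentMs: null };
const fakeTelegram = (chatId, message) => sent.push([chatId, message]);
maybeAlert(state, 0, fakeTelegram);      // sends
maybeAlert(state, 60000, fakeTelegram);  // suppressed: 1 minute later
maybeAlert(state, 600000, fakeTelegram); // sends: 10 minutes later
console.log(sent.length); // 2
```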

Video Recording Tools:

startClip() - Starts recording any active video media and saves the clip to the Recording tab.
stopClip() - Stops the active recording.
markClip(label) - Adds a label to the active recording; the label is displayed in the Recording tab.
getVideo() - Returns an array of the videos in the buffer.
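A common pattern is to record only while something is on screen: start a clip when the model first reports a match, label it, and stop once the match disappears. The decision can be kept in a pure function. `clipAction` is a hypothetical helper, the `recording` flag would need to be persisted between loops (e.g. in memory), and the actual tool calls appear only in comments.

```javascript
// Returns which recording action to take given the previous state and the
// current detection. Hypothetical helper, not part of Observer.
function clipAction(wasRecording, detected) {
  if (detected && !wasRecording) return "start"; // startClip(); markClip("logo visible");
  if (!detected && wasRecording) return "stop";  // stopClip();
  return "none";                                 // keep the current state
}

// Simulate four loops: nothing, match, match, nothing.
const detections = [false, true, true, false];
let recording = false;
const actions = detections.map((detected) => {
  const action = clipAction(recording, detected);
  if (action === "start") recording = true;
  if (action === "stop") recording = false;
  return action;
});
console.log(actions); // [ 'none', 'start', 'none', 'stop' ]

// In the Code tab this might read (sketch, not run here):
// const action = clipAction(wasRecording, response.includes("OBSERVER"));
// if (action === "start") { startClip(); markClip("logo visible"); }
// if (action === "stop") stopClip();
```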

App Tools:

ask(question, title="Confirmation") - Pops up a system confirmation dialog
message(message, title="Agent Message") - Pops up a system message
system_notify(body, title="Observer AI") - Sends a system notification
overlay(body) - Pushes a message to the overlay
click('left'|'right') - Triggers a mouse click at the current cursor position. Accepts 'left' or 'right'; defaults to 'left'.
celebrate() - Triggers a celebration animation in the Observer UI.

Code Tab

The "Code" tab receives the following variables as context before running:

response - The model's response
agentId - The ID of the agent running the code
screen - The screen image, if captured
camera - The camera image, if captured
imemory - The agent's current image in memory
images - All images sent to the model
prompt - The model's prompt
microphone - Transcription from the microphone in this loop
screenAudio - Transcription from screen audio in this loop
allAudio - Mixed transcription from the microphone and screen audio in this loop
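Since `screen` and `camera` are only available when the matching sensor was captured, it can help to guard them before attaching. This sketch assumes an uncaptured sensor shows up as `undefined`, `null` or an empty string, and that the `images` argument of the notification tools accepts an array; both are assumptions, and `collectImages` is a hypothetical helper.

```javascript
// Collect only the images that were actually captured this loop.
// Assumption: an uncaptured sensor is undefined, null or "".
function collectImages({ screen, camera } = {}) {
  return [screen, camera].filter((img) => img !== undefined && img !== null && img !== "");
}

console.log(collectImages({ screen: "data:image/png;base64,AAAA" }).length); // 1
console.log(collectImages({}).length); // 0

// In the Code tab this might read (sketch, not run here):
// const imgs = collectImages({ screen, camera });
// if (imgs.length > 0) sendEmail("your@email.com", response, imgs);
```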

JavaScript agents run in the browser sandbox, making them ideal for passive monitoring and notifications:

// Remove <think>...</think> blocks emitted by reasoning models such as DeepSeek
const cleanedResponse = response.replace(/<think>[\s\S]*?<\/think>/g, '').trim();

// Get the current time (naming this variable `time` would clash with the time() tool)
const now = time();

// Update memory with a timestamped entry
appendMemory(`[${now}] ${cleanedResponse}`);

// Send to Telegram if the model mentions a word (chat_id comes first, then the message)
if (cleanedResponse.includes("word")) {
  sendTelegram("12345678", cleanedResponse); // Example chat_id
}
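The `includes("word")` check is a substring match, so it also fires on "password" or "sword". If you only want whole-word mentions, a regex word boundary is one option. `mentionsWord` is a hypothetical helper; the case-insensitive flag is a design choice, and `\b` behaves as expected only for words that start and end with letters, digits or underscores.

```javascript
// True if `word` appears in `text` as a whole word (case-insensitive).
function mentionsWord(text, word) {
  // Escape regex metacharacters so the word is matched literally.
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`\\b${escaped}\\b`, "i").test(text);
}

console.log(mentionsWord("Please enter your password", "word")); // false
console.log(mentionsWord("The Word appeared on screen", "word")); // true

// In the Code tab (sketch, not run here):
// if (mentionsWord(cleanedResponse, "word")) sendTelegram("12345678", cleanedResponse);
```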

🚀 Getting Started with Local Inference

There are a few ways to get Observer up and running with local inference. I recommend the Observer App.

Option 1 (Easiest): Pull models using Transformers.js on WebApp

In the ModelHub, you can download Gemma 4 e2b and Gemma 4 e4b and run them directly in the browser. This option is somewhat unstable and can crash mobile devices, but it requires no installation at all!

Option 2 (Easy and Stable): Download the Observer App and use bundled llama.cpp models

Download the official Observer App.

The Observer App comes bundled with llama.cpp under the hood, so you can run any GGUF model. For multimodal input, be sure to also load the model's mmproj file.

Option 3 (Most stable): Use Desktop App with any OpenAI compatible endpoint (Ollama, llama.cpp, vLLM)

Download Ollama for the best compatibility. Observer can connect directly to any server that provides a /v1/chat/completions endpoint.

In the App, set the Custom Model Server URL to your OpenAI-compatible endpoint.

NOTE: Because of CORS, the browser app sends requests to localhost:3838, and the Observer App proxies them to your Custom Model Server URL.
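If Observer can't get answers from your server, it can help to send a similar request by hand. The sketch below builds a body in the OpenAI chat-completions format, with an optional image passed as a base64 data URL in an `image_url` content part, and posts it with `fetch` (built into browsers and Node 18+). `buildChatRequest`, `postChat`, the model name and the base URL are placeholders; this is not Observer's own request code, and servers differ in how fully they support image input.

```javascript
// Build an OpenAI-style chat completion body with optional image input.
function buildChatRequest(model, prompt, imageDataUrl) {
  const content = [{ type: "text", text: prompt }];
  if (imageDataUrl) {
    content.push({ type: "image_url", image_url: { url: imageDataUrl } });
  }
  return { model, messages: [{ role: "user", content }] };
}

// POST the body to <baseUrl>/v1/chat/completions and return the reply text.
async function postChat(baseUrl, body) {
  const res = await fetch(`${baseUrl}/v1/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}

// Example (needs a running server; Ollama listens on http://localhost:11434 by default):
// postChat("http://localhost:11434", buildChatRequest("your-model", "Describe this image", dataUrl))
//   .then(console.log);
```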

🤝 Contributing

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'feat: add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

Built with ❤️ by Roy Medina for the Observer AI Community. Special thanks to the Ollama team for being an awesome backbone to this project!
