Search API V1.2
May 16, 2026
Search API V1.2 is a local, non-agentic web search system designed for Large Language Models and built to integrate with Text Generation WebUI via a thin extension layer.
It provides explicit, controllable, and reproducible web search without relying on hidden or hallucinated browsing behavior.
This project was originally developed as the first building block of a larger local AI system (code-named “The Junior”), but is released as a standalone, production-ready component.
Documentation
- FAQ (settings, troubleshooting, known limitations): docs/FAQ.md
- Changelog (all releases): CHANGELOG.md
Why this exists
Most LLM integrations that claim “web access” suffer from at least one of these problems:
- the model hallucinates browsing behavior,
- search is implicit and uncontrollable,
- results are mixed with generation,
- behavior is unpredictable and hard to debug,
- installation breaks Python environments.
Search API V1 was designed to solve these problems explicitly and honestly.
Why this is useful
WebSearcher focuses on context optimization and structured retrieval, not raw webpage dumping into the prompt.
Design goals
- No hallucinated browsing claims
- Clear separation between search and generation
- Deterministic, debuggable behavior
- One search per user message (V1)
- Works with any LLM exposed via an OpenAI-compatible API
- Minimal impact on WebUI Python environment
- Simple installation (user or system, headless supported)
Core principles
1. Web search is explicit and controllable
The model does not perform hidden or implicit searches on its own.
Search is only possible via an explicit user marker:
???
This completely eliminates hallucinated browsing.
2. One search per user message (V1)
In V1, each user message can trigger at most one search cycle:
rewrite → search → rank → (fetch/extract) → (optional pack)
There are:
- no retries,
- no loops,
- no multi-step agent behavior.
This is intentional, to keep the system:
- fast,
- predictable,
- easy to reason about.
If the model determines that more data is needed, it outputs a new search query and asks the user to repeat the request manually.
3. Only the first trigger is processed
In V1:
- only the first ??? marker in a message is processed,
- multiple triggers in a single message are not supported.
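The trigger rule can be illustrated with a minimal sketch. It assumes the search request is whatever follows the first ??? marker, up to the end of the message, and that any later marker is left in place as literal text. The function name extract_search_request is hypothetical and is not taken from the project.

```python
from typing import Optional

TRIGGER = "???"


def extract_search_request(message: str) -> Optional[str]:
    """Return the text after the first trigger marker, or None if there is none."""
    # str.partition splits on the FIRST occurrence only, so later markers
    # stay inside the returned text and never start a second search.
    _before, marker, after = message.partition(TRIGGER)
    if not marker:
        return None  # no trigger: the message goes to the LLM without a search
    return after.strip()
```

A message without the marker yields None, which the caller can treat as "no search for this message".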
Architecture overview
The system is designed to work with Text Generation WebUI using its OpenAI-compatible API interface.
User
↓
WebUI (thin Python plugin)
↓
LLM
↓ (SEARCH_QUERY)
Searcher Service (Node.js)
├─ Search backend (SearXNG / DuckDuckGo)
├─ Snippet ranking (LLM, single call)
├─ Fetch & extract (local | jina)
├─ Cache (extracted text only)
↓
CONTEXT_PACK
↓
LLM final answer
Why a thin WebUI plugin + external service?
Experience shows that complex WebUI plugins often turn into Python dependency hell, breaking WebUI upgrades or entire environments.
This project deliberately uses:
- a thin Python plugin (UI + orchestration only),
- a separate search service with its own dependencies, lifecycle and systemd units.
This makes installation, upgrades and maintenance much safer and cleaner.
Search flow (V1)
For each user message containing ???:
1. Query source
Two modes:
- user_text: your text after ??? is used as-is.
- llm_query: the LLM rewrites your text into a concise search query.
You can switch modes at any time.
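The two query-source modes might be dispatched roughly as follows. This is only a sketch: build_search_query and the rewrite callback are hypothetical names, and the callback stands in for whatever LLM call the plugin actually makes.

```python
from typing import Callable


def build_search_query(user_text: str, mode: str, rewrite: Callable[[str], str]) -> str:
    """Produce the search query for the given query-source mode."""
    if mode == "user_text":
        # The user's own words are sent to the search backend unchanged.
        return user_text.strip()
    if mode == "llm_query":
        # One LLM call turns the request into a concise query.
        return rewrite(user_text).strip()
    raise ValueError(f"unknown query source mode: {mode!r}")
```

Passing the rewriter in as a function keeps the mode switch testable without a running model.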
2. Search backends
Supported in V1:
- SearXNG (primary, recommended)
- DuckDuckGo (fallback, limited)
The DuckDuckGo API is intentionally treated as a fallback because its responses are limited and often empty.
More backends (including commercial ones) are planned for future versions.
3. Snippet ranking (optional)
After search results are returned:
- the LLM performs a single ranking call,
- selects the most relevant snippets,
- deterministic fallback is used if ranking fails.
No loops, no retries, no agent behavior.
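A single ranking call with a deterministic fallback could look like the sketch below. Several details are assumptions made for illustration: the prompt wording, the JSON list reply format, the result fields title and snippet, and the choice of "first top_k results in backend order" as the fallback. The names rank_snippets and ask_llm are hypothetical.

```python
import json
from typing import Callable, Dict, List


def rank_snippets(
    question: str,
    results: List[Dict[str, str]],
    ask_llm: Callable[[str], str],
    top_k: int = 3,
) -> List[Dict[str, str]]:
    """Rank results with one LLM call; fall back to backend order on any failure."""
    fallback = results[:top_k]  # deterministic: same input, same output
    if not results:
        return fallback
    listing = "\n".join(
        f"{i}: {r['title']} | {r['snippet']}" for i, r in enumerate(results)
    )
    prompt = (
        f"Question: {question}\n{listing}\n"
        f"Reply with a JSON list of up to {top_k} snippet indices, most relevant first."
    )
    try:
        indices = json.loads(ask_llm(prompt))  # single call, never retried
    except Exception:
        return fallback
    if not isinstance(indices, list):
        return fallback
    picked: List[Dict[str, str]] = []
    seen = set()
    for i in indices:
        # Keep only valid, not yet used integer indices (bool is rejected on purpose).
        if isinstance(i, int) and not isinstance(i, bool) and 0 <= i < len(results) and i not in seen:
            seen.add(i)
            picked.append(results[i])
    return picked[:top_k] or fallback
```

Any malformed reply, exception, or empty selection ends in the same fallback, so the pipeline always continues with a predictable result set.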
4. Fetch & extract (optional, full mode)
Two extraction engines are supported:
local
- Direct HTTP fetch
- Mozilla Readability
- High privacy
- Does not handle JS-rendered or protected pages
jina
- Uses the external Jina Reader service
- Better results on complex pages
- Optional API key for higher limits
5. Context handling
Two modes:
inject
Extracted content is injected directly into the prompt.
llm_pack
All extracted pages are passed to the LLM, which:
- selects only information relevant to the full user question,
- produces a compact summary,
- significantly reduces context pollution.
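The difference between the two context modes can be sketched as below. The output layout (a CONTEXT_PACK header line followed by the body, with each page prefixed by its URL in brackets) is purely illustrative and is not the project's real format; build_context and the summarize callback are hypothetical names.

```python
from typing import Callable, Dict, List


def build_context(
    pages: List[Dict[str, str]],
    question: str,
    mode: str,
    summarize: Callable[[str, str], str],
) -> str:
    """Return the context text handed to the LLM for the final answer."""
    joined = "\n\n".join(f"[{p['url']}]\n{p['text']}" for p in pages)
    if mode == "inject":
        body = joined  # extracted text is passed through unchanged
    elif mode == "llm_pack":
        # One extra LLM call keeps only what is relevant to the full question.
        body = summarize(question, joined)
    else:
        raise ValueError(f"unknown context mode: {mode!r}")
    return f"CONTEXT_PACK\n{body}"
```

In this sketch inject costs no extra LLM call but grows the prompt with every fetched page, while llm_pack trades one call for a much smaller context.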
Privacy and proxy support
Search API V1 supports routing search requests and page fetching through a SOCKS5 proxy.
This can be useful for:
- increased privacy,
- network isolation,
- bypassing regional or network-level restrictions.
Configuration
Proxy usage is optional and disabled by default.
To enable it:
- Open the relevant configuration files.
- Uncomment the proxy sections.
- Specify your SOCKS5 proxy address.
Example:
proxy:
  socks_url: socks5://127.0.0.1:1080
Both search and fetch stages respect this setting.
OpenAI API autodetection
By default:
openai_api_base = auto
WebSearcher will probe local Text Generation WebUI OpenAI-compatible API endpoints on:
127.0.0.1:5000..5005
and automatically use the first working endpoint.
This improves compatibility with newer multi-instance Text Generation WebUI setups.
You can also force a specific instance manually:
http://127.0.0.1:5001/v1
This is useful when you run multiple WebUI instances and want WebSearcher to always use a specific LLM backend.
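The probing behavior can be approximated with the following sketch. It assumes that an endpoint counts as "working" when GET /v1/models returns HTTP 200. That path is part of the OpenAI-compatible API surface, but whether WebSearcher probes it this way is not stated here. The function names and the one-second timeout are hypothetical.

```python
import http.client
import urllib.request
from typing import Callable, Iterable, Optional


def endpoint_responds(base_url: str, timeout: float = 1.0) -> bool:
    """Return True if GET {base_url}/models answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status, or a malformed reply.
        return False


def detect_api_base(
    host: str = "127.0.0.1",
    ports: Iterable[int] = range(5000, 5006),  # 5000..5005 inclusive
    probe: Callable[[str], bool] = endpoint_responds,
) -> Optional[str]:
    """Return the first responding base URL, trying ports in ascending order."""
    for port in ports:
        base_url = f"http://{host}:{port}/v1"
        if probe(base_url):
            return base_url
    return None
```

Because ports are tried in order, the lowest-numbered working instance wins; forcing a specific URL is the way to target a different one.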
Cache (V1)
V1 includes a simple, robust cache by design.
- Caches extracted text only
- HTML is never stored
- Failed or empty extractions are not cached
- Key: sha1(engine + normalized_url)
- TTL-based cleanup using file mtime
- No index (intentional)
Cache locations:
- User install: ~/.cache/mistbyte-ai/websearch
- System install: /var/cache/mistbyte-ai/websearch
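An index-free file cache along these lines can be sketched in a few functions. The normalization rule used here (lowercase scheme and host, drop the fragment) is an assumption, since the exact rule is not described, and expiry is checked lazily on read rather than by a separate cleanup pass. All function names are hypothetical.

```python
import hashlib
import time
from pathlib import Path
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Assumed normalization: lowercase scheme and host, drop the fragment."""
    parts = urlsplit(url.strip())
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, "")
    )


def cache_key(engine: str, url: str) -> str:
    """sha1(engine + normalized_url) as a hex string, used as the file name."""
    return hashlib.sha1((engine + normalize_url(url)).encode("utf-8")).hexdigest()


def cache_put(cache_dir: Path, engine: str, url: str, text: str) -> bool:
    """Store extracted text; empty extractions are never cached."""
    if not text.strip():
        return False
    cache_dir.mkdir(parents=True, exist_ok=True)
    (cache_dir / cache_key(engine, url)).write_text(text, encoding="utf-8")
    return True


def cache_get(cache_dir: Path, engine: str, url: str, ttl_seconds: float) -> Optional[str]:
    """Return cached text, or None if it is missing or older than the TTL."""
    path = cache_dir / cache_key(engine, url)
    try:
        if time.time() - path.stat().st_mtime > ttl_seconds:
            path.unlink()  # expired: the file mtime is the only bookkeeping
            return None
        return path.read_text(encoding="utf-8")
    except FileNotFoundError:
        return None
```

Including the engine in the key means the same URL extracted by local and by jina is stored as two separate entries.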
System prompt contract (important)
This project relies on a compatible system prompt contract.
The recommended prompt is designed to work with:
- WebSearcher CONTEXT_PACK injection;
- modern tool-enabled LLM workflows;
- built-in Text Generation WebUI tools and web search.
At minimum, the system prompt should enforce the following rules:
- The assistant must not falsely claim that it searched, browsed, fetched, verified, or checked online information unless tools actually returned relevant content.
- Any CONTEXT_PACK must be treated as explicitly provided contextual input for the current task.
- Information from CONTEXT_PACK or tool output should not be ignored solely because of training cutoff limitations.
- If available tools fail or are unavailable, the assistant may ask the user to perform an additional search using:
  SEARCH_QUERY:
- The search query must be a single line with no explanations or extra formatting.
- The assistant should avoid inventing facts, sources, links, or citations.
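When adapting the prompt, it can help to check model replies against the single-line SEARCH_QUERY rule. The sketch below assumes the query is the remainder of the first line that starts with SEARCH_QUERY:. The helper parse_search_query is hypothetical and is not part of the plugin.

```python
import re
from typing import Optional

# Matches a whole line starting with the marker; "." never crosses a newline,
# so the captured query is always a single line.
_SEARCH_QUERY_LINE = re.compile(r"^SEARCH_QUERY:(.*)$", re.MULTILINE)


def parse_search_query(reply: str) -> Optional[str]:
    """Return the suggested query from a model reply, or None if absent or empty."""
    match = _SEARCH_QUERY_LINE.search(reply)
    if match is None:
        return None
    query = match.group(1).strip()
    return query or None
```

A reply that contains the marker with nothing after it is treated the same as a reply without the marker.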
The recommended reference prompt used during development is included in the repository.
Advanced users may adapt it, but incompatible prompt behavior may cause:
- hallucinated browsing claims;
- incorrect CONTEXT_PACK handling;
- conflicts with built-in tools;
- unnecessary context growth;
- degraded retrieval quality.
Full reference prompt:
docs/system-prompt.txt
Additional docs:
- FAQ: docs/FAQ.md
- Changelog: CHANGELOG.md
Installation
The project ships with a single installer:
install.sh
It supports:
- user-level installation (recommended),
- system-wide installation,
- headless environments,
- systemd user services.
User install vs system install
- Running as a regular user → user install
- Running as root → system install
User install is recommended to avoid writing into system directories.
Recommended setup: install in USER mode under the same Linux user account
that runs text-generation-webui.
This avoids permission mismatches between WebUI, the plugin, and systemd services,
and is the least error-prone configuration.
WebUI plugin installation
The WebUI plugin is installed under the identifier websearch-mistbyte.
The installer:
- attempts to auto-detect text-generation-webui,
- installs the plugin automatically if found.
If auto-detection fails:
- Create a directory in WebUI extensions: websearch_mistbyte
- Copy script.py into it.
- Restart WebUI backend (not just the browser UI).
(Note: the project identifier uses a dash, while the directory name uses an underscore.)
After restart, search settings appear below the input box.
Headless user setup
For headless systems or servers:
sudo loginctl enable-linger <user>
Logs:
journalctl --user -u searxng -f
journalctl --user -u websearch-mistbyte -f
Commands must be executed as the same user that owns the services.
Windows / WSL2 support
Search API V1 is Linux-first.
Supported on Windows via WSL2
- Windows 10 / 11
- WSL2
- Linux filesystem inside WSL (/home/...)
- systemd enabled inside WSL
Enable systemd in WSL by adding the following to /etc/wsl.conf:
[boot]
systemd=true
Then restart WSL:
wsl --shutdown
Native Windows service installation is not supported in V1.
Compatibility with newer Text Generation WebUI versions
Recent versions of Text Generation WebUI include built-in web search and tool support.
WebSearcher is designed to work alongside native tools instead of replacing them.
The recommended system prompt was updated to:
- avoid conflicts with built-in tools;
- treat CONTEXT_PACK as optional contextual enrichment;
- allow normal tool-enabled workflows.
See:
docs/system-prompt.txt
Troubleshooting
For a full troubleshooting guide and configuration reference, see docs/FAQ.md.
Podman pull fails with TLS handshake timeout
If podman pull hangs or fails:
curl -4 -I https://registry-1.docker.io/v2/
curl -6 -I https://registry-1.docker.io/v2/
If the IPv6 request fails but the IPv4 request works, your system is probably preferring IPv6 even though the IPv6 route to the registry is broken.
Fix (recommended):
sudo vim /etc/gai.conf
Add:
precedence ::ffff:0:0/96 100
This forces IPv4 preference and fixes most Podman/Docker TLS issues.
Limitations (V1)
- One search per user message
- Only the first ??? trigger is processed
- No agent loop
- No multi-step search
- No headless browser extraction
- Sequential page fetching only
These limitations are intentional.
Donations
This project is developed independently, without sponsors.
Donations directly accelerate development of roadmap features.
Releases
See CHANGELOG.md for the release history and what changed in each version.
Roadmap
See ROADMAP.md
About the author
More projects and technical background:
Summary
Search API V1 is a clean, honest, engineering-driven foundation.
It does not promise magic — it delivers predictable, controllable web search for LLMs.