Search API V1.2

May 16, 2026

Search API V1.2 is a local, non-agentic web search system designed for Large Language Models and built to integrate with Text Generation WebUI via a thin extension layer.

It provides explicit, controllable, and reproducible web search without relying on hidden or hallucinated browsing behavior.

This project was originally developed as the first building block of a larger local AI system (code-named “The Junior”), but is released as a standalone, production-ready component.

Documentation

FAQ (settings, troubleshooting, known limitations): docs/FAQ.md
Changelog (all releases): CHANGELOG.md

Why this exists

Most LLM integrations that claim “web access” suffer from at least one of these problems:

the model hallucinates browsing behavior,
search is implicit and uncontrollable,
results are mixed with generation,
behavior is unpredictable and hard to debug,
installation breaks Python environments.

Search API V1 was designed to solve these problems explicitly and honestly.

Why this is useful

WebSearcher focuses on context optimization and structured retrieval, not raw webpage dumping into the prompt.

Design goals

No hallucinated browsing claims
Clear separation between search and generation
Deterministic, debuggable behavior
One search per user message (V1)
Works with any LLM exposed via an OpenAI-compatible API
Minimal impact on WebUI Python environment
Simple installation (user or system, headless supported)

Core principles

1. Web search is explicit and controllable

The model does not perform hidden or implicit searches on its own.

Search is only possible via an explicit user marker:

???

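Because a search runs only when the user types this marker, the model never browses on its own initiative, which removes the main source of hallucinated browsing claims.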

2. One search per user message (V1)

In V1, each user message can trigger at most one search cycle:

rewrite → search → rank → (fetch/extract) → (optional pack)

There are:

no retries,
no loops,
no multi-step agent behavior.

This is intentional, to keep the system:

fast,
predictable,
easy to reason about.

If the model determines that more data is needed, it outputs a new search query and asks the user to repeat the request manually.

3. Only the first trigger is processed

In V1:

only the first ??? marker in a message is processed,
multiple triggers in a single message are not supported.
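The first-trigger rule can be illustrated with a minimal Python sketch. The function name `extract_search_text` is hypothetical, and the choices to take everything after the first marker and to ignore a bare marker with no text are assumptions, not a description of the actual plugin code.

```python
from typing import Optional

MARKER = "???"  # the explicit search trigger described above


def extract_search_text(message: str) -> Optional[str]:
    """Return the text after the first marker, or None if no search was requested."""
    # str.partition splits on the first occurrence only, so any later
    # markers stay inside the returned text and trigger nothing extra.
    _before, found, after = message.partition(MARKER)
    if not found:
        return None
    text = after.strip()
    # Assumption: a marker with nothing after it does not trigger a search.
    return text or None
```

A message without the marker returns `None`, so the normal generation path stays untouched.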

Architecture overview

The system is designed to work with Text Generation WebUI using its OpenAI-compatible API interface.

User
 ↓
WebUI (thin Python plugin)
 ↓
LLM
 ↓   (SEARCH_QUERY)
Searcher Service (Node.js)
 ├─ Search backend (SearXNG / DuckDuckGo)
 ├─ Snippet ranking (LLM, single call)
 ├─ Fetch & extract (local | jina)
 ├─ Cache (extracted text only)
 ↓
CONTEXT_PACK
 ↓
LLM final answer

Why a thin WebUI plugin + external service?

Experience shows that complex WebUI plugins often turn into Python dependency hell, breaking WebUI upgrades or entire environments.

This project deliberately uses:

a thin Python plugin (UI + orchestration only),
a separate search service with its own dependencies, lifecycle and systemd units.

This makes installation, upgrades and maintenance much safer and cleaner.

Search flow (V1)

For each user message containing ???:

1. Query source

Two modes:

user_text: your text after ??? is used as-is.
llm_query: the LLM rewrites your text into a concise search query.

You can switch modes at any time.

2. Search backends

Supported in V1:

SearXNG (primary, recommended)
DuckDuckGo (fallback, limited)

The DuckDuckGo API is intentionally treated as a fallback because its responses are limited and often empty.

More backends (including commercial ones) are planned for future versions.

3. Snippet ranking (optional)

After search results are returned:

the LLM performs a single ranking call,
selects the most relevant snippets,
a deterministic fallback is used if ranking fails.

No loops, no retries, no agent behavior.
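As an illustration of "single call plus deterministic fallback", here is a minimal Python sketch. The result shape, the `llm_rank` callable, the `top_n` parameter, and the choice of "backend order" as the fallback are all assumptions made for the example. The README does not specify what the real fallback does.

```python
from typing import Callable, Dict, List

Result = Dict[str, str]  # hypothetical shape, e.g. {"url": ..., "snippet": ...}


def rank_snippets(
    results: List[Result],
    llm_rank: Callable[[List[Result]], List[int]],
    top_n: int = 3,
) -> List[Result]:
    """Pick up to top_n results with one LLM call, else keep backend order."""
    fallback = results[:top_n]  # assumed deterministic fallback
    try:
        indices = llm_rank(results)  # the single ranking call, never retried
    except Exception:
        return fallback
    if not isinstance(indices, list):
        return fallback
    seen = set()
    picked = []
    for i in indices:
        # Reject malformed answers: non-integers, out-of-range or duplicate indices.
        valid = isinstance(i, int) and not isinstance(i, bool) and 0 <= i < len(results)
        if not valid or i in seen:
            return fallback
        seen.add(i)
        picked.append(results[i])
    return picked[:top_n] or fallback
```

Any failure path returns the same slice of the original list, so identical search results always produce identical output when the LLM call fails.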

4. Fetch & extract (optional, full mode)

Two extraction engines are supported:

`local`

Direct HTTP fetch
Mozilla Readability
High privacy
Does not handle JS-rendered or protected pages

`jina`

Uses the external Jina Reader service
Better results on complex pages
Optional API key for higher limits

5. Context handling

Two modes:

`inject`

Extracted content is injected directly into the prompt.

`llm_pack`

All extracted pages are passed to the LLM, which:

selects only information relevant to the full user question,
produces a compact summary,
significantly reduces context pollution.
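The difference between the two modes can be sketched in a few lines of Python. The function name, the `llm_summarize` callable, and the blank-line separator are hypothetical; only the mode names `inject` and `llm_pack` come from this README.

```python
from typing import Callable, List


def build_context_pack(
    pages: List[str],
    question: str,
    mode: str,
    llm_summarize: Callable[[str, List[str]], str],
) -> str:
    """Return the text to place in CONTEXT_PACK for the chosen mode."""
    if mode == "inject":
        # Extracted text goes into the prompt unchanged (separator is an assumption).
        return "\n\n".join(pages)
    if mode == "llm_pack":
        # One LLM pass condenses all pages with the full user question in view.
        return llm_summarize(question, pages)
    raise ValueError(f"unknown context mode: {mode!r}")
```

In this sketch `inject` costs no extra LLM call but grows the prompt with every page, while `llm_pack` spends one extra call to keep the final context small.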

Privacy and proxy support

Search API V1 supports routing search requests and page fetching through a SOCKS5 proxy.

This can be useful for:

increased privacy,
network isolation,
bypassing regional or network-level restrictions.

Configuration

Proxy usage is optional and disabled by default.

To enable it:

Open the relevant configuration files.
Uncomment the proxy sections.
Specify your SOCKS5 proxy address.

Example:

proxy:
  socks_url: socks5://127.0.0.1:1080

Both search and fetch stages respect this setting.

OpenAI API autodetection

By default:

openai_api_base = auto

WebSearcher will probe local Text Generation WebUI OpenAI-compatible API endpoints on:

127.0.0.1:5000..5005

and automatically use the first working endpoint.

This improves compatibility with newer multi-instance Text Generation WebUI setups.

You can also force a specific instance manually:

http://127.0.0.1:5001/v1

This is useful when you run multiple WebUI instances and want WebSearcher to always use a specific LLM backend.
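The probing logic can be pictured with the following Python sketch. It is not the service's actual code. The function names are hypothetical, and treating a 200 response from `/v1/models` as "working" is an assumption about what the probe checks.

```python
import http.client
import urllib.error
import urllib.request
from typing import Callable, Iterable, Optional


def http_probe(base_url: str, timeout: float = 1.0) -> bool:
    """Return True if GET {base_url}/models answers with HTTP 200 (assumed health check)."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        return False


def detect_api_base(
    configured: str = "auto",
    ports: Iterable[int] = range(5000, 5006),  # 5000..5005 inclusive
    probe: Callable[[str], bool] = http_probe,
) -> Optional[str]:
    """Resolve openai_api_base: an explicit URL wins, else the first port that answers."""
    if configured != "auto":
        return configured
    for port in ports:
        base = f"http://127.0.0.1:{port}/v1"
        if probe(base):
            return base
    return None  # nothing answered on any probed port
```

Passing the probe as a parameter keeps the port-scanning order testable without a running WebUI instance.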

Cache (V1)

V1 includes a simple, robust cache by design.

Caches extracted text only
HTML is never stored
Failed or empty extractions are not cached
Key: sha1(engine + normalized_url)
TTL-based cleanup using file mtime
No index (intentional)

Cache locations:

User install: ~/.cache/mistbyte-ai/websearch
System install: /var/cache/mistbyte-ai/websearch
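The cache rules listed above are simple enough to sketch in Python. This is an illustration under stated assumptions, not the service's implementation: the URL normalization (lowercase scheme and host, no fragment), the `.txt` suffix, and all function names are hypothetical. The key formula, the mtime-based TTL, and the refusal to cache empty extractions follow the list above.

```python
import hashlib
import os
import time
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Assumed normalization: lowercase scheme and host, drop the fragment."""
    p = urlsplit(url.strip())
    return urlunsplit((p.scheme.lower(), p.netloc.lower(), p.path or "/", p.query, ""))


def cache_path(cache_dir: str, engine: str, url: str) -> str:
    # Key: sha1(engine + normalized_url), so each engine gets its own entry.
    key = hashlib.sha1((engine + normalize_url(url)).encode("utf-8")).hexdigest()
    return os.path.join(cache_dir, key + ".txt")


def cache_get(cache_dir: str, engine: str, url: str, ttl_seconds: float) -> Optional[str]:
    path = cache_path(cache_dir, engine, url)
    try:
        age = time.time() - os.path.getmtime(path)  # TTL is judged by file mtime
    except OSError:
        return None  # no entry
    if age > ttl_seconds:
        try:
            os.remove(path)  # expired entries are cleaned up on access
        except OSError:
            pass
        return None
    with open(path, encoding="utf-8") as fh:
        return fh.read()


def cache_put(cache_dir: str, engine: str, url: str, text: str) -> bool:
    if not text or not text.strip():
        return False  # failed or empty extractions are never cached
    os.makedirs(cache_dir, exist_ok=True)
    with open(cache_path(cache_dir, engine, url), "w", encoding="utf-8") as fh:
        fh.write(text)
    return True
```

Because the file name is the key and the mtime is the timestamp, no separate index is needed, which matches the "No index" design choice.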

System prompt contract (important)

This project relies on a compatible system prompt contract.

The recommended prompt is designed to work with:

WebSearcher CONTEXT_PACK injection;
modern tool-enabled LLM workflows;
built-in Text Generation WebUI tools and web search.

At minimum, the system prompt should enforce the following rules:

The assistant must not falsely claim that it searched, browsed, fetched, verified, or checked online information unless tools actually returned relevant content.
Any CONTEXT_PACK must be treated as explicitly provided contextual input for the current task.
Information from CONTEXT_PACK or tool output should not be ignored solely because of training cutoff limitations.
If available tools fail or are unavailable, the assistant may ask the user to perform an additional search using:

SEARCH_QUERY:
The search query must be a single line with no explanations or extra formatting.
The assistant should avoid inventing facts, sources, links, or citations.
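To show how the single-line rule makes the model's output easy to parse, here is a small Python sketch. It assumes the query follows the `SEARCH_QUERY:` label on the same line. The function name and the regular expression are illustrative, and the reference prompt in the repository remains the authority on the exact format.

```python
import re
from typing import Optional

# Assumption: the query sits on the same line as the label.
_QUERY_RE = re.compile(r"^SEARCH_QUERY:[ \t]*(\S.*?)[ \t]*$", re.MULTILINE)


def parse_search_query(assistant_text: str) -> Optional[str]:
    """Return the first single-line query in the assistant output, or None."""
    match = _QUERY_RE.search(assistant_text)
    return match.group(1).strip() if match else None
```

A label with no query on its line yields `None`, so a malformed answer cannot accidentally pull the next paragraph in as a query.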

The recommended reference prompt used during development is included in the repository.

Advanced users may adapt it, but incompatible prompt behavior may cause:

hallucinated browsing claims;
incorrect CONTEXT_PACK handling;
conflicts with built-in tools;
unnecessary context growth;
degraded retrieval quality.

Full reference prompt: docs/system-prompt.txt

Additional docs:

FAQ: docs/FAQ.md
Changelog: CHANGELOG.md

Installation

The project ships with a single installer:

install.sh

It supports:

user-level installation (recommended),
system-wide installation,
headless environments,
systemd user services.

User install vs system install

Running as a regular user → user install
Running as root → system install

User install is recommended to avoid writing into system directories.

Recommended setup: install in USER mode under the same Linux user account that runs text-generation-webui. This avoids permission mismatches between WebUI, the plugin, and systemd services, and is the least error-prone configuration.

WebUI plugin installation

The WebUI plugin is installed under the identifier websearch-mistbyte.

The installer:

attempts to auto-detect text-generation-webui,
installs the plugin automatically if found.

If auto-detection fails:

Create a directory in WebUI extensions:
```
websearch_mistbyte
```
Copy script.py into it.
Restart WebUI backend (not just the browser UI).

(Note: the project identifier uses a dash, while the directory name uses an underscore.)

After restart, search settings appear below the input box.

Headless user setup

For headless systems or servers:

sudo loginctl enable-linger <user>

Logs:

journalctl --user -u searxng -f
journalctl --user -u websearch-mistbyte -f

Commands must be executed as the same user that owns the services.

Windows / WSL2 support

Search API V1 is Linux-first.

Supported on Windows via WSL2

Windows 10 / 11
WSL2
Linux filesystem inside WSL (/home/...)
systemd enabled inside WSL

Enable systemd in WSL:

# /etc/wsl.conf
[boot]
systemd=true

Then restart WSL:

wsl --shutdown

Native Windows service installation is not supported in V1.

Compatibility with newer Text Generation WebUI versions

Recent versions of Text Generation WebUI include built-in web search and tool support.

WebSearcher is designed to work alongside native tools instead of replacing them.

The recommended system prompt was updated to:

avoid conflicts with built-in tools;
treat CONTEXT_PACK as optional contextual enrichment;
allow normal tool-enabled workflows.

See:

docs/system-prompt.txt

Troubleshooting

For a full troubleshooting guide and configuration reference, see docs/FAQ.md.

Podman pull fails with TLS handshake timeout

If podman pull hangs or fails:

curl -4 -I https://registry-1.docker.io/v2/
curl -6 -I https://registry-1.docker.io/v2/

If the IPv6 request fails but the IPv4 request works, your system is probably preferring IPv6 addresses that it cannot actually reach.

Fix (recommended):

sudo vim /etc/gai.conf

Add:

precedence ::ffff:0:0/96 100

This forces IPv4 preference and fixes most Podman/Docker TLS issues.

Limitations (V1)

One search per user message
Only the first ??? trigger is processed
No agent loop
No multi-step search
No headless browser extraction
Sequential page fetching only

These limitations are intentional.

Donations

This project is developed independently, without sponsors.

Donations directly accelerate development of roadmap features.

https://web.tribute.tg/d/Ih8

https://home.vps.3-a.net/

Summary

Search API V1 is a clean, honest, engineering-driven foundation.

It does not promise magic; it delivers predictable, controllable web search for LLMs.