Example 04
March 6, 2026
Goal
Prevent agents from polluting each other’s context. A “Scholar” drafts an answer from evidence; an “Auditor” validates it against the same evidence using a strict handoff contract. If validation fails, we do not ship text—we ask for more evidence or retry retrieval. No SDKs, single-file runs.
Problem Map link
Targets No.6 Logic Collapse & Recovery (state consistency during hand-offs). Secondary reductions in No.1 (hallucination) and No.2 (intent split) because each agent operates inside a hard boundary.
Outcome
- Clean hand-off JSON with explicit scope, evidence ids, claims, and verdict
- Deterministic accept/reject logic; no “looks right” guesses
- Machine-readable agent traces for audits and regressions
1) Inputs
Use the same tiny corpus as earlier examples:
// data/chunks.json
[
  {"id":"p1#1","page":1,"text":"X is a constrained mapping."},
  {"id":"p2#1","page":2,"text":"Y is unrelated to X. It describes a separate protocol."}
]
Two test questions:
- Q1: “What is X?” → should pass with citation `[p1#1]`
- Q2: “Explain Z.” → should refuse as `not in context`
2) Handoff Contract (JSON)
This is the only way agents talk. Anything outside the contract is ignored.
// contract.schema (conceptual)
{
  "handoff_id": "string",                // unique
  "question": "string",
  "scope": { "allowed_ids": ["..."] },   // strict evidence boundary
  "scholar": {
    "claim": "string",                   // one-sentence claim
    "citations": ["..."],                // ids subset of allowed_ids
    "notes": "string"                    // optional, not binding
  },
  "auditor": {
    "verdict": "VALID | INVALID | NOT_IN_CONTEXT",
    "reason": "string",
    "citations": ["..."],                // subset of allowed_ids
    "corrected_claim": "string|null"     // for INVALID when fix is trivial
  }
}
Rules
- `citations ⊆ scope.allowed_ids` for both agents
- If `verdict != VALID`, the pipeline must not emit an answer
- If `verdict = NOT_IN_CONTEXT`, retrieval retries are allowed; otherwise surface a refusal
- Trace every hand-off as JSONL
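The contract rules can be checked mechanically before a hand-off reaches the acceptance gate. The following is a minimal sketch, not part of the example files: the helper name `validate_handoff` and its error strings are assumptions made for illustration. It returns a list of violations, so an empty list means the hand-off has the expected shape.

```python
# Hypothetical helper (not in agents.py): structural check of a hand-off dict.
# The function name and the error strings are illustrative assumptions.
VERDICTS = {"VALID", "INVALID", "NOT_IN_CONTEXT"}

def _is_id_list(value):
    """True when value is a list containing only strings."""
    return isinstance(value, list) and all(isinstance(v, str) for v in value)

def validate_handoff(handoff):
    """Return a list of contract violations; an empty list means the contract holds."""
    scope = handoff.get("scope")
    ids = scope.get("allowed_ids") if isinstance(scope, dict) else None
    if not _is_id_list(ids):
        return ["scope.allowed_ids must be a list of ids"]
    allowed = set(ids)
    errors = []
    for agent in ("scholar", "auditor"):
        section = handoff.get(agent)
        if not isinstance(section, dict):
            errors.append(f"{agent} section missing")
            continue
        cites = section.get("citations")
        if not _is_id_list(cites):
            errors.append(f"{agent}.citations must be a list of ids")
        elif not set(cites) <= allowed:
            errors.append(f"{agent} cites outside scope")
    auditor = handoff.get("auditor")
    if isinstance(auditor, dict) and auditor.get("verdict") not in VERDICTS:
        errors.append("auditor.verdict not recognised")
    return errors
```

A caller could treat any non-empty result as a rejection and write the violations into the trace next to the decision.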
3) Path A — Python (single file, no extra deps)
Create agents.py.
# agents.py -- two-agent boundary with strict JSON handoff
import json, os, time, uuid, urllib.request, sys

# Load the tiny corpus once at import time
with open("data/chunks.json", encoding="utf8") as f:
    CHUNKS = json.load(f)

def retrieve(question, k=2):
    # naive keyword overlap: count chunk words that also appear in the question
    qs = set(question.lower().split())
    scored = []
    for c in CHUNKS:
        score = sum(1 for w in c["text"].lower().split() if w in qs)
        scored.append((score, c))
    scored.sort(key=lambda x: x[0], reverse=True)
    picks = [c for _, c in scored[:k]]
    return picks

def build_scholar_prompt(q, chunks):
    ctx = "\n\n".join(f"[{c['id']}] {c['text']}" for c in chunks)
    return (
        "Role: Scholar.\n"
        "Use only the evidence. If not provable, reply exactly: not in context.\n"
        "Return JSON with fields: claim, citations (array of ids). No extra keys.\n\n"
        f"Question: {q}\n\nEvidence:\n{ctx}\n"
    )

def build_auditor_prompt(q, chunks, scholar_json):
    ctx = "\n\n".join(f"[{c['id']}] {c['text']}" for c in chunks)
    return (
        "Role: Auditor.\n"
        "Validate the Scholar strictly against the same evidence.\n"
        "Return JSON with fields: verdict (VALID|INVALID|NOT_IN_CONTEXT), reason, citations (array), corrected_claim (or null).\n"
        "Rules:\n"
        "- If the claim is outside evidence, verdict=INVALID.\n"
        "- If the question cannot be answered from evidence, verdict=NOT_IN_CONTEXT.\n"
        "- citations must only reference provided evidence ids.\n\n"
        f"Question: {q}\nEvidence:\n{ctx}\n\nScholar:\n{scholar_json}\n"
    )

def call_openai(prompt, model=os.getenv("OPENAI_MODEL","gpt-4o-mini")):
    api_key = os.getenv("OPENAI_API_KEY")
    if not api_key: raise RuntimeError("Set OPENAI_API_KEY")
    body = json.dumps({
        "model": model,
        "messages": [{"role":"user","content":prompt}],
        "temperature": 0
    }).encode("utf-8")
    req = urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=body,
        headers={"Content-Type":"application/json","Authorization":f"Bearer {api_key}"}
    )
    # urlopen raises HTTPError on non-2xx responses; the timeout avoids hanging forever
    with urllib.request.urlopen(req, timeout=60) as r:
        j = json.loads(r.read().decode("utf-8"))
    return j["choices"][0]["message"]["content"].strip()

def parse_json_block(txt):
    # best-effort: parse the span from the first "{" to the last "}" in the text
    start = txt.find("{")
    end = txt.rfind("}")
    if start == -1 or end == -1 or end <= start: return {}
    try:
        return json.loads(txt[start:end+1])
    except Exception:
        return {}

def enforce(handoff):
    allowed = set(handoff["scope"]["allowed_ids"])
    def subset(cites):
        # a missing or null citations field counts as empty;
        # anything else must be a list of in-scope string ids
        cites = [] if cites is None else cites
        return isinstance(cites, list) and all(isinstance(c, str) and c in allowed for c in cites)
    s = handoff["scholar"]
    a = handoff["auditor"]
    # basic checks
    if not subset(s.get("citations")):
        return "REJECT", "scholar cites outside scope"
    if not subset(a.get("citations")):
        return "REJECT", "auditor cites outside scope"
    # a missing verdict is treated as INVALID
    verdict = a.get("verdict", "INVALID")
    if verdict == "VALID":
        return "ACCEPT", "auditor validated"
    if verdict == "NOT_IN_CONTEXT":
        return "RETRY", "needs more/different evidence"
    return "REJECT", "auditor invalidated claim"

def run(question):
    handoff_id = str(uuid.uuid4())
    chunks = retrieve(question, k=2)
    scope_ids = [c["id"] for c in chunks]
    scholar_prompt = build_scholar_prompt(question, chunks)
    scholar_raw = call_openai(scholar_prompt)
    # parse_json_block returns {} on failure, which is falsy, so the fallback applies
    scholar = parse_json_block(scholar_raw) or {"claim":"not in context","citations":[]}
    auditor_prompt = build_auditor_prompt(question, chunks, json.dumps(scholar))
    auditor_raw = call_openai(auditor_prompt)
    auditor = parse_json_block(auditor_raw) or {"verdict":"NOT_IN_CONTEXT","reason":"parse failed","citations":[],"corrected_claim":None}
    handoff = {
        "handoff_id": handoff_id,
        "question": question,
        "scope": {"allowed_ids": scope_ids},
        "scholar": scholar,
        "auditor": auditor
    }
    decision, reason = enforce(handoff)
    # append one JSONL trace line per hand-off
    os.makedirs("runs", exist_ok=True)
    with open("runs/agent_trace.jsonl","a",encoding="utf8") as f:
        f.write(json.dumps({
            "ts": int(time.time()),
            "handoff": handoff,
            "decision": decision,
            "reason": reason
        }, ensure_ascii=False) + "\n")
    return decision, reason, handoff

if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: OPENAI_API_KEY=sk-xxx python agents.py \"your question\"")
        sys.exit(1)
    print(run(sys.argv[1]))
Run:
OPENAI_API_KEY=sk-xxx python agents.py "What is X?"
OPENAI_API_KEY=sk-xxx python agents.py "Explain Z."
Pass criteria
- Q1 prints `('ACCEPT', 'auditor validated', …)` and both agents cite only `[p1#1]`
- Q2 prints `('RETRY' or 'REJECT', …)` with `NOT_IN_CONTEXT` verdict; no free-text answer emitted
- Two lines appended to `runs/agent_trace.jsonl`
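Because each run appends one JSON object per line, the pass criteria can be checked from the trace file rather than by reading console output. A minimal sketch, assuming a hypothetical helper `last_decisions` (not part of `agents.py`) that maps each question to its most recent decision:

```python
import json

def last_decisions(path="runs/agent_trace.jsonl"):
    """Map each question to the decision of its most recent trace line."""
    latest = {}
    with open(path, encoding="utf8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines in the trace
            record = json.loads(line)
            latest[record["handoff"]["question"]] = record["decision"]
    return latest
```

After running both test questions, `last_decisions()` would be expected to show `ACCEPT` for “What is X?” and a non-`ACCEPT` decision for “Explain Z.”, subject to what the model actually returns.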
4) Path B — Node (single file, no deps)
Create agents.mjs.
// agents.mjs -- two-agent boundary in Node, strict JSON handoff
import fs from "node:fs";
import https from "node:https";
import crypto from "node:crypto";
import { pathToFileURL } from "node:url";

const CHUNKS = JSON.parse(fs.readFileSync("data/chunks.json","utf8"));

// naive keyword overlap: count chunk words that also appear in the question
function retrieve(q, k=2){
  const qs = new Set(q.toLowerCase().split(/\s+/));
  return [...CHUNKS]
    .map(c => [c.text.toLowerCase().split(/\s+/).filter(w=>qs.has(w)).length, c])
    .sort((a,b)=>b[0]-a[0]).slice(0,k).map(([_,c])=>c);
}

// template lines stay unindented so the prompt text mirrors agents.py
function buildScholarPrompt(q, chunks){
  const ctx = chunks.map(c => `[${c.id}] ${c.text}`).join("\n\n");
  return `Role: Scholar.
Use only the evidence. If not provable, reply exactly: not in context.
Return JSON with fields: claim, citations (array of ids). No extra keys.

Question: ${q}

Evidence:
${ctx}
`;
}

function buildAuditorPrompt(q, chunks, scholarJson){
  const ctx = chunks.map(c => `[${c.id}] ${c.text}`).join("\n\n");
  return `Role: Auditor.
Validate the Scholar strictly against the same evidence.
Return JSON with fields: verdict (VALID|INVALID|NOT_IN_CONTEXT), reason, citations (array), corrected_claim (or null).
Rules:
- If the claim is outside evidence, verdict=INVALID.
- If the question cannot be answered from evidence, verdict=NOT_IN_CONTEXT.
- citations must only reference provided evidence ids.

Question: ${q}
Evidence:
${ctx}

Scholar:
${scholarJson}
`;
}

async function callOpenAI(prompt, model=process.env.OPENAI_MODEL || "gpt-4o-mini"){
  const apiKey = process.env.OPENAI_API_KEY;
  if(!apiKey) throw new Error("Set OPENAI_API_KEY");
  const body = JSON.stringify({ model, messages:[{role:"user",content:prompt}], temperature:0 });
  const resp = await new Promise((resolve,reject)=>{
    const req = https.request("https://api.openai.com/v1/chat/completions",{
      method:"POST",
      headers:{
        "Content-Type":"application/json",
        "Authorization":`Bearer ${apiKey}`,
        "Content-Length":Buffer.byteLength(body)
      }
    }, r=>{
      let data=""; r.on("data",d=>data+=d);
      r.on("end",()=>{
        // reject on non-2xx or unparseable bodies instead of throwing inside the handler
        if(r.statusCode < 200 || r.statusCode >= 300) return reject(new Error(`HTTP ${r.statusCode}: ${data}`));
        try { resolve(JSON.parse(data)); } catch (err) { reject(err); }
      });
    });
    req.on("error",reject); req.write(body); req.end();
  });
  return resp.choices[0].message.content.trim();
}

// best-effort: parse the span from the first "{" to the last "}".
// Returns null on failure: an empty object is truthy in JS, so returning {}
// would make the `||` fallbacks in run() unreachable.
function parseJsonBlock(txt){
  const s = txt.indexOf("{"), e = txt.lastIndexOf("}");
  if(s<0 || e<=s) return null;
  try { return JSON.parse(txt.slice(s,e+1)); } catch { return null; }
}

function enforce(handoff){
  const allowed = new Set(handoff.scope.allowed_ids);
  // citations must be an array whose entries are all in scope
  const subset = cites => Array.isArray(cites) && cites.every(x => allowed.has(x));
  const s = handoff.scholar, a = handoff.auditor;
  if(!subset(s.citations || [])) return ["REJECT","scholar cites outside scope"];
  if(!subset(a.citations || [])) return ["REJECT","auditor cites outside scope"];
  if(a.verdict === "VALID") return ["ACCEPT","auditor validated"];
  if(a.verdict === "NOT_IN_CONTEXT") return ["RETRY","needs more/different evidence"];
  return ["REJECT","auditor invalidated claim"];
}

async function run(q){
  const chunks = retrieve(q,2);
  const scopeIds = chunks.map(c=>c.id);
  const scholarRaw = await callOpenAI(buildScholarPrompt(q, chunks));
  const scholar = parseJsonBlock(scholarRaw) || {claim:"not in context", citations:[]};
  const auditorRaw = await callOpenAI(buildAuditorPrompt(q, chunks, JSON.stringify(scholar)));
  const auditor = parseJsonBlock(auditorRaw) || {verdict:"NOT_IN_CONTEXT", reason:"parse failed", citations:[], corrected_claim:null};
  const handoff = {
    handoff_id: crypto.randomUUID(),
    question: q,
    scope: { allowed_ids: scopeIds },
    scholar, auditor
  };
  const [decision, reason] = enforce(handoff);
  fs.mkdirSync("runs",{recursive:true});
  // ts in whole seconds, matching agents.py
  fs.appendFileSync("runs/agent_trace.jsonl", JSON.stringify({ ts: Math.floor(Date.now()/1000), handoff, decision, reason })+"\n");
  return { decision, reason, handoff };
}

// run only when executed directly; pathToFileURL handles Windows paths and special characters
if (process.argv[1] && import.meta.url === pathToFileURL(process.argv[1]).href) {
  const q = process.argv.slice(2).join(" ");
  if(!q){ console.error("usage: OPENAI_API_KEY=sk-xxx node agents.mjs \"your question\""); process.exit(1); }
  console.log(await run(q));
}
Run:
OPENAI_API_KEY=sk-xxx node agents.mjs "What is X?"
OPENAI_API_KEY=sk-xxx node agents.mjs "Explain Z."
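Pass criteria are identical to Python, except that the Node script prints an object with `decision`, `reason`, and `handoff` fields instead of a tuple.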
5) Acceptance logic (deterministic)
A request only emits a final answer if:
- `auditor.verdict == VALID`
- `scholar.citations ⊆ scope.allowed_ids` and `auditor.citations ⊆ scope.allowed_ids`
- (Optional) The answer passes your template compliance checks from Example 02
If any check fails, you do not print text. You retry retrieval or respond with not in context.
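One way to wire the gate into a caller is a small loop that only returns text on `ACCEPT` and otherwise retries or refuses. This is a sketch under stated assumptions: `answer_or_refuse` is a hypothetical name, and `run_fn(question, k)` is an assumed variant of `run` that accepts a top-k value, which the single-file scripts do not expose as written.

```python
def answer_or_refuse(run_fn, question, max_retries=1, base_k=2):
    """Return the cited claim on ACCEPT, otherwise the fixed refusal string.

    run_fn(question, k) -> (decision, reason, handoff) is an assumed signature;
    the run() in agents.py takes only the question.
    """
    for attempt in range(max_retries + 1):
        # widen retrieval by one chunk per retry (an illustrative policy)
        decision, reason, handoff = run_fn(question, base_k + attempt)
        if decision == "ACCEPT":
            scholar = handoff.get("scholar", {})
            claim = scholar.get("claim")
            if not claim:
                break  # accepted but no claim to ship: refuse
            cites = " ".join(f"[{c}]" for c in scholar.get("citations", []))
            return f"{claim} {cites}".strip()
        if decision != "RETRY":
            break  # REJECT: do not retry, do not ship text
    return "not in context"
```

Keeping the refusal as a single fixed string means downstream code can detect it without parsing free text.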
6) Failure shapes & quick fixes
- Boundary leak: citations include an id not in `scope.allowed_ids` → fix retrieval scope or chunk ids
- Over-refusal: `NOT_IN_CONTEXT` but query tokens exist in evidence → increase top-k pre-rerank (see Example 03)
- Agreeing wrong: both agents agree on an incorrect statement → your evidence is wrong; fix chunking or data quality
- Parsing noise: models add prose around JSON → `parse_json_block` already strips; keep temperature at 0
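For the parsing-noise case, slicing from the first `{` to the last `}` fails when the surrounding prose itself contains braces. A possible alternative, sketched here with a hypothetical name `first_json_object`, uses the standard library's `json.JSONDecoder.raw_decode` to decode the first complete object and ignore whatever follows it:

```python
import json

def first_json_object(txt):
    """Return the first decodable JSON object in txt, or None if there is none."""
    decoder = json.JSONDecoder()
    pos = txt.find("{")
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at pos and ignores trailing text
            obj, _ = decoder.raw_decode(txt, pos)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass  # this "{" did not start a valid object; try the next one
        pos = txt.find("{", pos + 1)
    return None
```

Returning `None` rather than `{}` keeps the failure case falsy in Python and also unambiguous if the same idea is ported to JavaScript.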
7) Production tips
- Log `handoff` objects to a dedicated topic or file for audits
- Add a cooldown: if 2 consecutive `NOT_IN_CONTEXT` happen for the same user intent, escalate to fallback UX instead of re-query loops
- Surface `reason` to operators; it is much faster than re-reading raw answers
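The cooldown tip can be sketched as a small in-memory counter. Everything here is illustrative: the class name `NotInContextCooldown` is hypothetical, and `intent_key` stands in for however your system identifies "the same user intent", which this example does not define.

```python
from collections import defaultdict

class NotInContextCooldown:
    """Track consecutive NOT_IN_CONTEXT verdicts per intent (in-memory sketch)."""

    def __init__(self, limit=2):
        self.limit = limit
        self.streak = defaultdict(int)

    def record(self, intent_key, verdict):
        """Update the streak; return True when the caller should escalate."""
        if verdict == "NOT_IN_CONTEXT":
            self.streak[intent_key] += 1
        else:
            self.streak[intent_key] = 0  # any other verdict resets the streak
        return self.streak[intent_key] >= self.limit
```

A production version would likely persist the streak outside process memory and expire it over time; this sketch only shows the counting rule.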
8) Where to go next
- Combine with Example 03 (intersection + rerank) for better evidence
- Add a third agent Policy that blocks answers containing restricted terms, still within the same `scope` ids
- Wire this acceptance gate into your API so UI can never bypass it
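A Policy stage does not have to be another model call; a deterministic check is enough to show where it slots in. The sketch below assumes a hypothetical function `policy_check` and a caller-supplied list of restricted terms, neither of which appears in the example files. It re-checks the scope boundary and then scans the Scholar's claim.

```python
import re

def policy_check(handoff, restricted_terms):
    """Return (ok, reason) for a hand-off; blocks claims that contain restricted terms."""
    allowed = set(handoff["scope"]["allowed_ids"])
    scholar = handoff["scholar"]
    # the policy stage sees the same scope ids as the other two agents
    if not set(scholar.get("citations", [])) <= allowed:
        return False, "citations outside scope"
    claim = scholar.get("claim", "")
    for term in restricted_terms:
        # whole-word, case-insensitive match; suited to plain word terms
        if re.search(rf"\b{re.escape(term)}\b", claim, flags=re.IGNORECASE):
            return False, f"restricted term: {term}"
    return True, "policy passed"
```

Running it after `enforce` returns `ACCEPT`, and before any text is emitted, would keep the gate's ordering intact.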