Dingo Agent Architecture & Implementation Guide
April 2, 2026 · View on GitHub
Overview
Dingo's Agent system extends traditional rule and LLM-based evaluation with multi-step reasoning, tool usage, and adaptive context gathering capabilities. This document provides a comprehensive overview of the Agent architecture, file structure, and implementation patterns.
Table of Contents
- Architecture Overview
- File Structure
- Core Components
- Implementation Patterns
- Data Flow
- Configuration
- Examples
Architecture Overview
High-Level Architecture
┌─────────────────────────────────────────────────────────────┐
│ Dingo Evaluation System │
├─────────────────────────────────────────────────────────────┤
│ Data Input → Executor → [Rules | LLMs | Agents] → Results │
└─────────────────────────────────────────────────────────────┘
▼
┌─────────────────────┐
│ Agent Framework │
└─────────────────────┘
│
┌─────────────────────┼─────────────────────┐
▼ ▼ ▼
┌─────────┐ ┌──────────┐ ┌──────────┐
│ Base │ │ Tools │ │ LangChain│
│ Agent │◄────────│ Registry │ │ Adapter │
└─────────┘ └──────────┘ └──────────┘
│ │
▼ ▼
┌────────────────┐ ┌──────────────────┐
│ AgentFactCheck │ │ tavily_search │
│AgentHallucin..│ │ arxiv_search │
│ArticleFactChk │ │ claims_extractor│
│ (Custom) │ │ render_tool │
└────────────────┘ │ mineru_ocr_tool │
└──────────────────┘
Evaluation Flow Comparison
Traditional Evaluation:
┌──────┐ ┌─────────┐ ┌────────────┐
│ Data │─────▶│ Rule/LLM│─────▶│ EvalDetail │
└──────┘ └─────────┘ └────────────┘
Agent-Based Evaluation:
┌──────┐ ┌───────┐ ┌──────────┐ ┌─────┐ ┌────────────┐
│ Data │─────▶│ Agent │─────▶│Tool Calls│─────▶│ LLM │─────▶│ EvalDetail │
└──────┘ └───────┘ └──────────┘ └─────┘ └────────────┘
│ │
Web Search Reasoning &
OCR Tools Synthesis
File Structure
Current Implementation (Latest)
dingo/
├── model/
│ ├── llm/ # LLM-based evaluators
│ │ ├── agent/ # ✨ Agent Framework
│ │ │ ├── __init__.py # Package exports (BaseAgent, tools)
│ │ │ ├── base_agent.py # BaseAgent abstract class
│ │ │ ├── agent_fact_check.py # LangChain-based agent (framework-driven)
│ │ │ ├── agent_hallucination.py # Custom workflow agent (imperative)
│ │ │ ├── agent_article_fact_checker.py # Agent-First article fact-checker
│ │ │ ├── agent_wrapper.py # LangChain 1.0 integration wrapper
│ │ │ ├── langchain_adapter.py # Dingo ↔ LangChain tool adapter
│ │ │ └── tools/ # Agent tools
│ │ │ ├── __init__.py # Tool registry exports
│ │ │ ├── base_tool.py # BaseTool abstract class
│ │ │ ├── tool_registry.py # Tool registration & discovery
│ │ │ ├── claims_extractor.py # Claims extraction tool (LLM-based)
│ │ │ ├── arxiv_search.py # Academic paper search tool
│ │ │ ├── tavily_search.py # Web search tool (Tavily API)
│ │ │ ├── render_tool.py # HTML rendering tool
│ │ │ └── mineru_ocr_tool.py # OCR tool (MinerU integration)
│ │ ├── base_openai.py # Base class for OpenAI-compatible LLMs
│ │ └── ... # Other LLM evaluators
│ ├── model.py # ✏️ Central registry (@Model decorator)
│ └── rule/ # Rule-based evaluators
│
├── config/
│ └── input_args.py # ✏️ Configuration models (Pydantic)
│ # - InputArgs
│ # - EvaluatorArgs (includes agent_config)
│
├── exec/
│ ├── local.py # ✏️ Local executor with thread/process pools
│ │ # - Agents run in ThreadPoolExecutor (I/O-bound)
│ └── spark.py # Distributed executor (Spark)
│
├── io/
│ ├── input/
│ │ └── data.py # Data class (standardized input)
│ └── output/
│ └── eval_detail.py # EvalDetail (evaluation result)
│
└── utils/
└── log_util/ # Logging utilities
└── logger.py
examples/
└── agent/ # ✨ Agent usage examples
├── agent_executor_example.py # Basic agent execution
├── agent_hallucination_example.py # Hallucination detection example
└── agent_article_fact_checking_example.py # Article fact-checking example
test/
└── scripts/
└── model/
└── llm/
└── agent/ # ✨ Agent tests
├── test_agent_fact_check.py
├── test_agent_hallucination.py
├── test_article_fact_checker.py # ArticleFactChecker tests (88 tests)
├── test_async_article_fact_checker.py # Async/parsing tests (30 tests)
├── test_tool_registry.py
└── tools/
├── test_claims_extractor.py
├── test_arxiv_search.py
├── test_tavily_search.py
├── test_render_tool.py
└── test_mineru_ocr_tool.py
docs/
├── agent_development_guide.md # Comprehensive development guide
├── agent_architecture.md # This file
├── article_fact_checking_guide.md # ArticleFactChecker guide
└── quick_start_article_fact_checking.md # Quick start for article fact-checking
requirements/
└── agent.txt # Agent dependencies
# - langchain>=1.0.0
# - langchain-openai
# - tavily-python
# - etc.
.github/
└── env/
└── agent_hallucination.json # Example agent configuration
Key File Changes from "Old Version"
| Old Path | New Path | Notes |
|---|---|---|
dingo/model/agent/ | dingo/model/llm/agent/ | Moved under LLM module hierarchy |
| N/A | agent_wrapper.py | Added LangChain 1.0 integration |
| N/A | langchain_adapter.py | Added Dingo ↔ LangChain adapters |
agent_fact_check_web.py | agent_fact_check.py | Simplified naming |
| N/A | agent_hallucination.py | Added custom workflow example |
tools/web_search.py | tools/tavily_search.py | Specific implementation naming |
| N/A | tools/render_tool.py | Added HTML rendering |
| N/A | tools/mineru_ocr_tool.py | Added OCR capabilities |
Core Components
1. BaseAgent (base_agent.py)
Purpose: Abstract base class for all agent-based evaluators
Key Features:
- Extends
BaseOpenAIto inherit LLM functionality - Supports dual execution paths: Legacy (manual) and LangChain (framework-driven)
- Manages tool execution and configuration injection
- Provides agent orchestration methods
Core Methods:
class BaseAgent(BaseOpenAI):
# Configuration
available_tools: List[str] = [] # Tools this agent can use
max_iterations: int = 5 # Safety limit
use_agent_executor: bool = False # Enable LangChain path
# Abstract methods (must implement)
@abstractmethod
def plan_execution(cls, input_data: Data) -> List[Dict[str, Any]]
@abstractmethod
def aggregate_results(cls, input_data: Data, results: List[Any]) -> EvalDetail
# Main evaluation entry point
def eval(cls, input_data: Data) -> EvalDetail
# Tool execution
def execute_tool(cls, tool_name: str, **kwargs) -> Dict[str, Any]
def configure_tool(cls, tool_name: str, tool_class)
# LangChain integration
def _eval_with_langchain_agent(cls, input_data: Data) -> EvalDetail
def get_langchain_tools(cls)
def _format_agent_input(cls, input_data: Data) -> str
def _get_system_prompt(cls, input_data: Data) -> str
Execution Flow:
eval()
├─ use_agent_executor == True? (standard path)
│ ├─ Yes → _eval_with_langchain_agent()
│ │ ├─ get_langchain_tools()
│ │ ├─ get_langchain_llm()
│ │ ├─ AgentWrapper.create_agent()
│ │ ├─ AgentWrapper.invoke_and_format()
│ │ └─ aggregate_results()
│ │
│ └─ No → Legacy path
│ ├─ plan_execution()
│ ├─ Loop through plan steps
│ │ ├─ execute_tool() for tool steps
│ │ └─ send_messages() for LLM steps
│ └─ aggregate_results()
Note: ArticleFactChecker overrides eval() entirely and uses a two-phase
async parallel architecture (asyncio.run → _async_eval) instead of
the above base-class dispatch. See ArticleFactChecker section below.
2. Tool System
BaseTool (tools/base_tool.py)
Purpose: Abstract interface for all agent tools
class BaseTool(ABC):
name: str # Unique identifier
description: str # For LLM understanding
config: ToolConfig # Tool-specific config
@abstractmethod
def execute(cls, **kwargs) -> Dict[str, Any]
def validate_config(cls)
def update_config(cls, config_dict: Dict[str, Any])
ToolRegistry (tools/tool_registry.py)
Purpose: Central registry for tool discovery and management
Key Features:
- Auto-discovery via
@tool_register()decorator - Lazy loading (tools loaded on first use)
- Configuration injection from agent config
@tool_register("tavily_search")
class TavilySearch(BaseTool):
name = "tavily_search"
description = "Search the web using Tavily API"
@classmethod
def execute(cls, query: str, **kwargs) -> Dict[str, Any]:
# Implementation
return {
'success': True,
'results': [...],
'answer': "..."
}
Built-in Tools:
| Tool | File | Purpose | Dependencies |
|---|---|---|---|
claims_extractor | claims_extractor.py | LLM-based claims extraction | openai |
arxiv_search | arxiv_search.py | Academic paper search | arxiv |
tavily_search | tavily_search.py | Web search via Tavily API | tavily-python |
render_tool | render_tool.py | HTML rendering with Playwright | playwright |
mineru_ocr_tool | mineru_ocr_tool.py | OCR with MinerU | magic-pdf |
3. LangChain Integration
AgentWrapper (agent_wrapper.py)
Purpose: Wrapper for LangChain 1.0 create_agent API
Key Methods:
class AgentWrapper:
@staticmethod
def create_agent(llm, tools, system_prompt, **config)
# Uses langchain.agents.create_agent (LangGraph-based)
@staticmethod
def invoke_and_format(agent, input_text, input_data, max_iterations)
# Invokes agent and formats results for Dingo
@staticmethod
def get_openai_llm_from_dingo_config(dynamic_config)
# Creates ChatOpenAI from Dingo config
LangChain 1.0 Changes (Nov 2025):
- Uses
create_agent()instead of deprecatedAgentExecutor - Built on LangGraph for better state management
recursion_limitinstead ofmax_iterations- Message-based invocation interface
LangChain Adapter (langchain_adapter.py)
Purpose: Converts Dingo tools to LangChain StructuredTool format
def convert_dingo_tools(tool_names: List[str], agent_class) -> List[StructuredTool]:
# Wraps Dingo tools for LangChain compatibility
# Preserves Dingo's configuration injection mechanism
4. Agent Implementations
AgentFactCheck (agent_fact_check.py)
Pattern: LangChain-Based (Framework-Driven)
Key Characteristics:
- Sets
use_agent_executor = True - Overrides
_format_agent_input()for custom input formatting - Overrides
_get_system_prompt()for task-specific instructions - LangChain handles autonomous tool calling and reasoning
- Parses structured output in
aggregate_results()
Workflow:
Input: Question + Response + Context (optional)
↓
LangChain Agent decides:
- With context: MAY search for additional verification
- Without context: MUST search to verify facts
↓
Agent autonomously:
- Calls tavily_search tool as needed
- Reasons about results
- Returns structured output (HALLUCINATION_DETECTED: YES/NO)
↓
aggregate_results() parses output → EvalDetail
When to Use:
- ✅ Complex multi-step reasoning
- ✅ Benefit from LangChain's orchestration
- ✅ Prefer declarative style
- ✅ Rapid prototyping
AgentHallucination (agent_hallucination.py)
Pattern: Custom Workflow (Imperative)
Key Characteristics:
- Implements custom
eval()with explicit workflow - Manually calls
execute_tool()for searches - Manually calls
send_messages()for LLM interactions - Delegates to existing evaluator (LLMHallucination)
- Full control over execution flow
Workflow:
Input: Content + Context (optional)
↓
Check context availability
↓
├─ Has context? → Delegate to LLMHallucination
│
└─ No context? → Agent workflow:
1. Extract factual claims (LLM call)
2. Search web for each claim (Tavily tool)
3. Synthesize context (combine results)
4. Evaluate with synthesized context (LLMHallucination)
↓
Return EvalDetail with provenance
When to Use:
- Fine-grained control over steps
- Compose with existing evaluators
- Prefer explicit behavior
- Domain-specific workflows
- Conditional logic between steps
ArticleFactChecker (agent_article_fact_checker.py)
Pattern: Agent-First with Context Tracking (LangChain ReAct + Artifact Saving)
Key Characteristics:
- Sets
use_agent_executor = True(same as AgentFactCheck) - Overrides
eval()to add context tracking and file saving - Uses thread-local storage (
threading.local()) for concurrent safety - Extracts claims from tool_calls observation data
- Builds enriched per-claim verification records
- Saves intermediate artifacts (article, claims, verification, report)
- Produces dual-layer
EvalDetail.reason:[text_summary, structured_report_dict]
Workflow (two-phase parallel architecture):
Input: Article text (Markdown)
|
eval() override:
|- Save article content to output_path
|- asyncio.run(_async_eval())
|
Phase 1 — Claims Extraction:
|- ClaimsExtractor.execute(content) # Direct tool call, not via agent
|- Returns list of factual claims
|
Phase 2 — Parallel Claim Verification:
|- asyncio.gather() with Semaphore(max_concurrent_claims)
|- Each claim → independent LangChain mini-agent
│ |- _async_verify_single_claim()
│ |- AgentWrapper.async_invoke_and_format()
│ |- _parse_claim_json_robust() # 3-tier robust JSON parsing
│ └─ Returns per-claim verdict
|
Aggregation:
|- _aggregate_parallel_results()
|- _recalculate_summary()
|- Save artifacts (claims_extracted.jsonl, claims_verification.jsonl, report.json)
|- Return EvalDetail with dual-layer reason
When to Use:
- Article-level comprehensive fact-checking
- Need intermediate artifacts (claims list, per-claim details, full report)
- Benefit from transparent evidence chains
- Want structured report alongside text summary
Data Flow
Complete Evaluation Pipeline
┌───────────────────────────────────────────────────────────────┐
│ 1. Configuration Loading │
└───────────────────────────────────────────────────────────────┘
JSON Config → InputArgs (Pydantic) → EvaluatorArgs
├─ name: "AgentFactCheck"
├─ config.key: API key
├─ config.model: "gpt-4"
└─ config.agent_config:
├─ max_iterations: 10
└─ tools:
└─ tavily_search:
└─ api_key: "..."
┌───────────────────────────────────────────────────────────────┐
│ 2. Data Loading & Conversion │
└───────────────────────────────────────────────────────────────┘
DataSource.load() → Generator[raw_data]
↓
Converter.convert() → Data objects
├─ content: str
├─ prompt: Optional[str]
├─ context: Optional[List[str]]
└─ raw_data: Dict
┌───────────────────────────────────────────────────────────────┐
│ 3. Agent Execution (ThreadPoolExecutor) │
└───────────────────────────────────────────────────────────────┘
BaseAgent.eval(Data) → EvalDetail
│
├─ use_agent_executor?
│
├─ YES (LangChain Path):
│ ├─ _format_agent_input(Data) → input_text
│ ├─ _get_system_prompt(Data) → system_prompt
│ ├─ get_langchain_tools() → StructuredTool[]
│ ├─ get_langchain_llm() → ChatOpenAI
│ ├─ AgentWrapper.create_agent() → CompiledStateGraph
│ ├─ AgentWrapper.invoke_and_format()
│ │ ├─ Agent reasoning loop (ReAct)
│ │ ├─ Tool calls (autonomous)
│ │ └─ Final output
│ └─ aggregate_results() → EvalDetail
│
└─ NO (Legacy Path):
├─ plan_execution(Data) → plan: List[step]
├─ Loop through steps:
│ ├─ Tool step: execute_tool(name, **args)
│ │ ├─ ToolRegistry.get(name)
│ │ ├─ configure_tool()
│ │ └─ tool.execute()
│ └─ LLM step: send_messages(messages)
└─ aggregate_results(results) → EvalDetail
┌───────────────────────────────────────────────────────────────┐
│ 4. Result Aggregation │
└───────────────────────────────────────────────────────────────┘
EvalDetail
├─ metric: str # "AgentFactCheck"
├─ status: bool # True = issue detected
├─ score: Optional[float] # Numeric score
├─ label: List[str] # ["QUALITY_BAD.HALLUCINATION"]
└─ reason: List[Any] # Dual-layer reason:
# reason[0]: str (human-readable text)
# reason[1]: Dict (structured report, optional)
# ArticleFactChecker uses this for
# text summary + full report dict
┌───────────────────────────────────────────────────────────────┐
│ 5. Summary Generation │
└───────────────────────────────────────────────────────────────┘
ResultInfo → SummaryModel
├─ total_count: int
├─ good_count: int
├─ bad_count: int
├─ type_ratio: Dict[field, Dict[label, count]]
└─ metrics_score_stats: Dict[metric, stats]
Tool Execution Flow
BaseAgent.execute_tool(tool_name, **kwargs)
↓
Check if tool in available_tools
↓
ToolRegistry.get(tool_name) → tool_class
↓
configure_tool(tool_name, tool_class)
├─ Extract config from dynamic_config.agent_config.tools.{tool_name}
└─ tool_class.update_config(config_dict)
↓
tool_class.execute(**kwargs)
├─ Tool-specific logic (API calls, processing, etc.)
└─ Return Dict[str, Any] with 'success' key
↓
Return to agent for processing
Summary
Key Takeaways
- Architecture: Agents extend
BaseOpenAIand are registered via@Model.llm_register() - Location: All agent code lives under
dingo/model/llm/agent/ - Three Patterns: LangChain-based (declarative), Custom Workflow (imperative), Agent-First + Context (hybrid)
- Tool System: Centralized registry with configuration injection
- Execution: Runs in ThreadPoolExecutor alongside other LLMs
- Configuration:
agent_configis a top-level key in evaluator config (flat structure) - Artifact Saving: ArticleFactChecker auto-saves intermediate artifacts to a timestamped directory by default; override via
agent_config.output_path, or disable withagent_config.save_artifacts=false
Implementation Checklist
Creating a new agent:
- Choose pattern (LangChain vs Custom)
- Create agent file under
dingo/model/llm/agent/ - Extend
BaseAgent - Register with
@Model.llm_register("YourAgent") - Define
available_toolslist - Implement required methods based on pattern
- Add tests under
test/scripts/model/llm/agent/ - Update documentation
- Add example usage under
examples/agent/
Creating a new tool:
- Create tool file under
dingo/model/llm/agent/tools/ - Extend
BaseTool - Register with
@tool_register("your_tool") - Implement
execute()method - Define custom
ToolConfigif needed - Add tests under
test/scripts/model/llm/agent/tools/ - Update requirements/agent.txt if dependencies needed
Next Steps
- Read
docs/agent_development_guide.mdfor detailed implementation guide - Study
agent_fact_check.pyfor LangChain pattern example - Study
agent_hallucination.pyfor custom workflow example - Study
agent_article_fact_checker.pyfor Agent-First + artifact saving pattern - Review
examples/agent/for usage examples - Check
test/scripts/model/llm/agent/for testing patterns
Reference Links
- Agent Development Guide - Comprehensive development guide
- Article Fact-Checking Guide - ArticleFactChecker usage guide
- CLAUDE.md - Project overview and common commands
- LangChain Documentation - Agent concepts
- Tavily API - Web search tool documentation