DeepSeek Deep Research

February 6, 2025 · View on GitHub

DeepSeekDeepResearch is a Python-based AI research assistant that continuously searches for and extracts relevant information from the web based on a user query. It leverages a local LLM (DeepSeek-R1:7b via Ollama), the Google Custom Search API for search queries, and asynchronous webpage extraction (Newspaper3k or Jina) to generate a comprehensive, detailed report on a given topic.

The final report is saved as final_report.txt in the repository directory upon completion.

Features

Iterative Research Loop - Continuously refines search queries and gathers context until no additional queries are needed.
Asynchronous Processing - Performs searches, webpage fetching, and context extraction concurrently for improved speed.
Local LLM-Powered Decision Making - Uses DeepSeek-R1:7b via Ollama to:
- Generate precise search queries
- Evaluate webpage usefulness
- Extract relevant context from webpages
- Produce a final comprehensive report
Google Custom Search Integration - Uses the Google Custom Search JSON API to retrieve relevant links for each generated query.
Webpage Extraction - Extracts webpage content using Newspaper3k (with Jina as a fallback if needed).
Duplicate Filtering - Aggregates and deduplicates links across search rounds to ensure efficiency.
Final Report Generation and Saving - Compiles all extracted information into a detailed final report and saves it as final_report.txt.

Setup

1. Clone or Download the Repository

git clone https://github.com/yourusername/local-deep-researcher.git
cd local-deep-researcher

2. Configure API Keys

Open the main.py file.
Replace the placeholders for GOOGLE_API_KEY and GOOGLE_CX with your actual Google API Key and Custom Search Engine ID.
If using Jina for webpage extraction, update JINA_API_KEY and JINA_BASE_URL as needed.

3. Ensure Ollama is Installed

Verify that Ollama is installed on your system and that the following command runs without errors:

ollama run deepseek-r1:7b

4. Install Dependencies

Ensure you have Python 3.8+ and install the required dependencies:

pip install nest_asyncio aiohttp tenacity google-api-python-client newspaper3k

Usage

Run the Program

Execute the script from your terminal:

python main.py

Input Prompt

Research Query/Topic: Enter a research query or topic (e.g., "What is UAP?").
Maximum Number of Iterations: Optionally, specify the max iterations (default is 10).

Research Process

Initial Query & Search Generation:
- The local LLM (DeepSeek-R1:7b via Ollama) generates up to four distinct search queries based on your input.
Concurrent Search & Extraction:
- Each query is sent to the Google Custom Search API.
- The program aggregates, deduplicates links, and fetches webpage content asynchronously.
Evaluation & Context Extraction:
- The LLM evaluates webpage relevance and extracts useful context.
Iterative Refinement:
- The LLM determines if additional search queries are needed.
- The process repeats until:
  - The iteration limit is reached, or
  - No new queries are generated.
Final Report Generation & Saving:
- A detailed final report is compiled, printed to the console, and saved as final_report.txt.

How It Works

1. Input & Query Generation

The LLM processes the user’s query and generates up to four distinct search queries.

2. Concurrent Google Searches

Each query is sent concurrently to the Google Custom Search API.
The returned links are aggregated and deduplicated.

3. Webpage Processing

For each unique link:

Content Extraction → Extracted using Newspaper3k (or Jina as a fallback).
Usefulness Evaluation → The LLM checks if the content is relevant.
Context Extraction → Extracts key information from relevant pages.

The LLM reviews gathered context and decides if additional searches are needed.
If required, new search queries are generated; otherwise, the loop terminates.

5. Final Report Compilation

All relevant information is compiled and passed to the LLM.
The LLM generates a final comprehensive research report.
The report is printed to the console and saved as final_report.txt.

Troubleshooting

PSReadline Warning

If you see a PSReadline warning on Windows, you can safely ignore it.

Asyncio Runtime Errors

If you encounter:

RuntimeError: asyncio.run() cannot be called from a running event loop

Ensure nest_asyncio is applied at the start of the script:
```
import nest_asyncio
nest_asyncio.apply()
```

API Issues

Google API Errors: Verify that your Google API Key and Custom Search Engine ID are correct.
Quota Limits: Check if you've exceeded your Google API quota.

Ollama Errors

If the local LLM call fails:
- Ensure Ollama is installed correctly.
- Check that running:
```
ollama run deepseek-r1:7b
```
  Works without errors in your terminal.

License

This project is licensed under the MIT License. See LICENSE for details.

🔥 Contributions & Feedback

We welcome contributions! Feel free to fork, submit issues, or contribute to the project.