Using predictive-maintenance-mcp with Ollama
March 5, 2026
This guide explains how to use the Predictive Maintenance MCP Server with Ollama for fully local, air-gapped vibration analysis.
Why Ollama? All signal data stays on your machine. No API keys needed. Full privacy for sensitive industrial data.
Prerequisites
- Ollama installed and running (download)
- predictive-maintenance-mcp installed (see INSTALL.md)
- A model with tool-calling support (see below)
Step 1: Install a Compatible Model
MCP requires tool calling (function calling) support. Not all Ollama models support this. Recommended models:
| Model | Size | Tool Calling | Notes |
|---|---|---|---|
| qwen2.5:14b | ~9 GB | ✅ | Best balance of quality and size |
| qwen2.5:7b | ~4.7 GB | ✅ | Lighter, good for screening |
| qwen2.5:32b | ~20 GB | ✅ | Best quality for complex diagnosis |
| llama3.1:8b | ~4.7 GB | ✅ | Good general purpose |
| llama3.1:70b | ~40 GB | ✅ | High quality, needs >48 GB RAM |
| mistral-nemo:12b | ~7.1 GB | ✅ | Strong reasoning |
# Pull a model (example: Qwen 2.5 14B)
ollama pull qwen2.5:14b
Verify that the model was downloaded and is available locally:
ollama list
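The same check can be scripted, which is convenient in setup scripts. The sketch below assumes Ollama's default address and its `/api/tags` endpoint, which returns the locally available models. The helper names are illustrative and not part of this project.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default address.
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"


def installed_model_names(tags_response: dict) -> list:
    """Extract model names from a parsed /api/tags response."""
    return [model["name"] for model in tags_response.get("models", [])]


def is_model_installed(model: str, tags_response: dict) -> bool:
    """Return True if the exact model tag is present locally."""
    return model in installed_model_names(tags_response)


def fetch_tags(url: str = OLLAMA_TAGS_URL, timeout: float = 5.0) -> dict:
    """Query the running Ollama instance for its local models."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Usage (requires a running Ollama instance):
#   tags = fetch_tags()
#   print(is_model_installed("qwen2.5:14b", tags))
```

The parsing is kept separate from the network call so it can be checked without a running Ollama instance.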
Step 2: Connect via an MCP Client
Ollama does not speak MCP natively. You need an MCP client that bridges Ollama's API and MCP servers. Several options exist:
Option A: Open WebUI (Recommended)
Open WebUI supports both Ollama backends and MCP tool servers.
- Install and start Open WebUI: `pip install open-webui`, then `open-webui serve`
- In the Open WebUI settings, add the MCP server:
  - Go to Settings → Tools → MCP Servers
  - Add a new server with command:
    `/path/to/predictive-maintenance-mcp/.venv/bin/python /path/to/predictive-maintenance-mcp/src/machinery_diagnostics_server.py`
  - On Windows:
    `C:\path\to\predictive-maintenance-mcp\.venv\Scripts\python.exe C:\path\to\predictive-maintenance-mcp\src\machinery_diagnostics_server.py`
- Select your Ollama model and start analyzing.
Option B: Claude Code (CLI) with Ollama Backend
If you use Claude Code with an Ollama-compatible API proxy, configure the MCP server in your project's .mcp.json:
{
  "mcpServers": {
    "predictive-maintenance": {
      "command": "/path/to/.venv/bin/python",
      "args": ["/path/to/src/machinery_diagnostics_server.py"]
    }
  }
}
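Because the configuration needs absolute paths, it can help to generate the file instead of typing the paths by hand. This is a minimal sketch that assumes the repository layout used in this guide (`.venv` at the repository root and the server script under `src/`). The function names are hypothetical.

```python
import json
import sys
from pathlib import Path


def venv_python(repo_root: Path, windows: bool) -> Path:
    """Return the interpreter path inside the project's virtual environment."""
    if windows:
        return repo_root / ".venv" / "Scripts" / "python.exe"
    return repo_root / ".venv" / "bin" / "python"


def build_mcp_config(repo_root: Path, windows: bool = sys.platform == "win32") -> dict:
    """Build the .mcp.json structure with absolute paths."""
    root = repo_root.resolve()
    return {
        "mcpServers": {
            "predictive-maintenance": {
                "command": str(venv_python(root, windows)),
                "args": [str(root / "src" / "machinery_diagnostics_server.py")],
            }
        }
    }


def write_mcp_config(repo_root: Path, target: Path) -> None:
    """Write the configuration as formatted JSON."""
    config = build_mcp_config(repo_root)
    target.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")


# Usage, run from the predictive-maintenance-mcp checkout:
#   write_mcp_config(Path("."), Path(".mcp.json"))
```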
Option C: Custom Python Client
For programmatic access, use the MCP Python SDK directly together with Ollama's native chat API (`/api/chat`):
import asyncio

import httpx
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

OLLAMA_URL = "http://localhost:11434/api/chat"
MODEL = "qwen2.5:14b"


async def main():
    # Launch the MCP server as a subprocess and talk to it over stdio.
    # Run this from the repository root, or replace both values with absolute paths.
    server_params = StdioServerParameters(
        command="python",
        args=["src/machinery_diagnostics_server.py"],
    )

    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # List available tools
            tools = await session.list_tools()

            # Convert MCP tools to Ollama's function-calling format
            ollama_tools = [
                {
                    "type": "function",
                    "function": {
                        "name": tool.name,
                        "description": tool.description or "",
                        "parameters": tool.inputSchema,
                    },
                }
                for tool in tools.tools
            ]

            # Chat with Ollama
            messages = [
                {"role": "user", "content": "List available vibration signals"}
            ]
            # Local inference can be slow, so raise httpx's 5-second default timeout
            async with httpx.AsyncClient(timeout=300.0) as client:
                response = await client.post(OLLAMA_URL, json={
                    "model": MODEL,
                    "messages": messages,
                    "tools": ollama_tools,
                    "stream": False,
                })
            response.raise_for_status()
            result = response.json()

            # Handle tool calls from Ollama (the key is absent when the model answers in text)
            for call in result["message"].get("tool_calls") or []:
                tool_result = await session.call_tool(
                    call["function"]["name"],
                    arguments=call["function"]["arguments"],
                )
                print(f"Tool: {call['function']['name']}")
                print(f"Result: {tool_result.content}")


asyncio.run(main())
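The example above stops after printing the tool output. To get a written answer, the tool output is usually sent back to the model in a second request. The sketch below shows one way to build that follow-up message list. It assumes that Ollama's chat API accepts messages with the role `tool` and that MCP text results expose a `.text` attribute. The helper names are illustrative.

```python
from types import SimpleNamespace


def content_to_text(content) -> str:
    """Join the text parts of an MCP tool result; non-text parts are skipped."""
    return "\n".join(item.text for item in content if getattr(item, "text", None))


def extend_with_tool_results(messages, assistant_message, tool_outputs):
    """Return a new message list with the assistant turn and one 'tool' message per output."""
    followup = list(messages)
    followup.append(assistant_message)
    for output in tool_outputs:
        followup.append({"role": "tool", "content": output})
    return followup


# Demo with stand-in objects; real code would pass result["message"]
# and tool_result.content from the client shown above.
demo_content = [SimpleNamespace(type="text", text="real_train/baseline_1.csv")]
demo_assistant = {
    "role": "assistant",
    "content": "",
    "tool_calls": [{"function": {"name": "list_signals", "arguments": {}}}],
}
demo_messages = extend_with_tool_results(
    [{"role": "user", "content": "List available vibration signals"}],
    demo_assistant,
    [content_to_text(demo_content)],
)
```

Posting `demo_messages` to the same chat endpoint, again with the `tools` list attached, lets the model either summarize the result or request another tool call.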
Step 3: Verify the Connection
Once connected, ask the model to list available signals:
List available vibration signals
Expected: The model calls list_signals() and returns a list of .csv files from data/signals/.
Then try a quick screening:
Analyze statistics for real_train/baseline_1.csv
Performance Considerations
| Aspect | Recommendation |
|---|---|
| Model size | 14B+ for reliable tool calling; 7B works for simple queries |
| RAM | Model size × 1.2 + 4 GB for the MCP server and data |
| GPU | Strongly recommended for models ≥14B |
| Quantization | Q4_K_M is a good balance (default in Ollama) |
| Context window | 8K minimum; 32K+ recommended for multi-step diagnosis |
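Two rows of the table translate directly into code: the RAM rule of thumb and the context window. The sketch below assumes the context length is set per request through the `num_ctx` option of Ollama's chat API. Ollama's default context length depends on the version and may be smaller than the recommendation above, so setting it explicitly is the safer choice. Function names are illustrative.

```python
def estimate_ram_gb(model_size_gb: float) -> float:
    """Rule of thumb from the table: model size x 1.2 + 4 GB for the MCP server and data."""
    return model_size_gb * 1.2 + 4.0


def chat_payload(model, messages, tools, num_ctx=32768):
    """Build an /api/chat request body with an explicit context window."""
    return {
        "model": model,
        "messages": messages,
        "tools": tools,
        "stream": False,
        "options": {"num_ctx": num_ctx},
    }


# Download sizes are taken from the model table in Step 1.
for name, size_gb in [("qwen2.5:7b", 4.7), ("qwen2.5:14b", 9.0), ("qwen2.5:32b", 20.0)]:
    print(f"{name}: ~{estimate_ram_gb(size_gb):.1f} GB RAM")
```

A larger context window also increases memory use, so the estimate is a lower bound when `num_ctx` is raised.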
Limitations with Local Models
- Tool calling reliability: Smaller models (<7B) may format tool calls incorrectly or hallucinate tool names. Use 14B+ for production diagnostics.
- Multi-step workflows: Complex workflows like `diagnose_bearing` (8 steps) work best with 32B+ models. Consider breaking them into individual tool calls for smaller models.
- Evidence-based inference: Local models may not follow the evidence-based inference policy as strictly as Claude. Always verify diagnostic conclusions against the raw tool outputs.
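In a custom client, hallucinated tool names and malformed arguments can be caught before anything is forwarded to the MCP server. This is a minimal sketch, assuming tool calls arrive in the shape returned by Ollama's chat API (a `function` object with `name` and `arguments`) and that the set of known names comes from the MCP session's tool list. The function name and the invented tool name in the demo are hypothetical.

```python
def split_tool_calls(tool_calls, known_tools):
    """Separate calls to known tools with dict arguments from everything else."""
    valid, rejected = [], []
    for call in tool_calls:
        function = call.get("function", {})
        name = function.get("name")
        arguments = function.get("arguments")
        if name in known_tools and isinstance(arguments, dict):
            valid.append(call)
        else:
            rejected.append(call)
    return valid, rejected


# Demo: one real tool name from this guide and one invented by the model.
known = {"list_signals", "analyze_statistics", "diagnose_bearing"}
calls = [
    {"function": {"name": "list_signals", "arguments": {}}},
    {"function": {"name": "analyze_everything", "arguments": {}}},
    {"function": {"name": "analyze_statistics", "arguments": "not-a-dict"}},
]
valid_calls, rejected_calls = split_tool_calls(calls, known)
print(len(valid_calls), "valid,", len(rejected_calls), "rejected")
```

Rejected calls can be reported back to the model as an error message so it can retry with a valid tool name.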
Troubleshooting
Model doesn't call tools
- Ensure your model supports tool calling (`ollama show <model>` to check capabilities)
- Try a larger model (14B+)
- Some models need explicit prompting: "Use the `analyze_statistics` tool to..."
Connection refused
- Make sure Ollama is running; if it is not, start it with `ollama serve`
- Check the port: default is `http://localhost:11434`
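A quick way to tell "Ollama is not running" apart from "wrong address" is to test the TCP port directly. The sketch below uses only the standard library and assumes the default host and port. The function name is illustrative.

```python
import socket


def ollama_port_open(host: str = "localhost", port: int = 11434, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on the given host and port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


# Usage:
#   if not ollama_port_open():
#       print("Nothing is listening on localhost:11434; start Ollama with `ollama serve`.")
```

An open port only shows that some process is listening there; a request to the Ollama API is still needed to confirm it is Ollama.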
MCP server not starting
- Use absolute paths in the configuration
- Verify the venv: `.venv/Scripts/python.exe -c "import mcp; print('ok')"` (on Linux/macOS, use `.venv/bin/python` instead)
- Check the INSTALL.md troubleshooting section
Out of memory
- Use a smaller model or increase swap
- Close other applications
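- Try a different quantized tag. Exact tag names vary by model, so check the model's tag list in the Ollama library first; for Qwen 2.5 they take the form `ollama pull qwen2.5:14b-instruct-q4_0`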
Security Note
Running everything locally (Ollama + MCP server) means:
- Zero data leaves your network — ideal for proprietary industrial data
- No API keys — no cloud dependency
- Full air-gap capable — works offline after initial model download
See SECURITY.md for the full privacy architecture.