Using predictive-maintenance-mcp with Ollama

March 5, 2026

This guide explains how to use the Predictive Maintenance MCP Server with Ollama for fully local, air-gapped vibration analysis.

Why Ollama? All signal data stays on your machine. No API keys needed. Full privacy for sensitive industrial data.


Prerequisites

  • Ollama installed and running (download)
  • predictive-maintenance-mcp installed (see INSTALL.md)
  • A model with tool-calling support (see below)

Step 1: Install a Compatible Model

MCP requires tool calling (function calling) support. Not all Ollama models support this. Recommended models:

Model              Size       Notes
qwen2.5:14b        ~9 GB      Best balance of quality and size
qwen2.5:7b         ~4.7 GB    Lighter, good for screening
qwen2.5:32b        ~20 GB     Best quality for complex diagnosis
llama3.1:8b        ~4.7 GB    Good general purpose
llama3.1:70b       ~40 GB     High quality, needs >48 GB RAM
mistral-nemo:12b   ~7.1 GB    Strong reasoning

# Pull a model (example: Qwen 2.5 14B)
ollama pull qwen2.5:14b

Verify it's running:

ollama list
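
The same check can be scripted. The sketch below assumes Ollama is listening on its default port and uses the /api/tags endpoint, which lists pulled models. The helper names are illustrative, and the network call only happens when installed_models() is invoked.

```python
import json
import urllib.request

OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"  # default local endpoint


def model_names(tags_response: dict) -> list:
    """Extract model tags from an /api/tags response body."""
    return [m["name"] for m in tags_response.get("models", [])]


def installed_models(url: str = OLLAMA_TAGS_URL) -> list:
    """Ask the local Ollama server which models have been pulled."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return model_names(json.load(resp))


def is_installed(model: str, names: list) -> bool:
    """True if the exact tag (for example 'qwen2.5:14b') is present."""
    return model in names
```

With Ollama running, is_installed("qwen2.5:14b", installed_models()) should be true once the pull has finished.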

Step 2: Connect via an MCP Client

Ollama itself does not natively speak the MCP protocol. You need an MCP client that can bridge Ollama's API with MCP servers. Several options exist:

Option A: Open WebUI

Open WebUI supports both Ollama backends and MCP tool servers.

  1. Install Open WebUI:

    pip install open-webui
    open-webui serve
    
  2. In the Open WebUI settings, add the MCP server:

    • Go to Settings → Tools → MCP Servers
    • Add a new server with command:
      /path/to/predictive-maintenance-mcp/.venv/bin/python /path/to/predictive-maintenance-mcp/src/machinery_diagnostics_server.py
      
    • On Windows:
      C:\path\to\predictive-maintenance-mcp\.venv\Scripts\python.exe C:\path\to\predictive-maintenance-mcp\src\machinery_diagnostics_server.py
      
  3. Select your Ollama model and start analyzing.

Option B: Claude Code (CLI) with Ollama Backend

If you run Claude Code against Ollama through an API proxy, configure the MCP server in your project's .mcp.json:

{
  "mcpServers": {
    "predictive-maintenance": {
      "command": "/path/to/.venv/bin/python",
      "args": ["/path/to/src/machinery_diagnostics_server.py"]
    }
  }
}

Option C: Custom Python Client

For programmatic access, use the MCP Python SDK directly with Ollama's native chat API (/api/chat):

import asyncio

import httpx
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

OLLAMA_URL = "http://localhost:11434/api/chat"
MODEL = "qwen2.5:14b"

async def main():
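    # Connect to MCP server. These relative values only work when the script
    # is run from the repository root with the venv active; otherwise use
    # absolute paths to the .venv interpreter and the server script.
    server_params = StdioServerParameters(
        command="python",
        args=["src/machinery_diagnostics_server.py"]
    )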
    
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            
            # List available tools
            tools = await session.list_tools()
            
            # Convert MCP tools to Ollama format
            ollama_tools = []
            for tool in tools.tools:
                ollama_tools.append({
                    "type": "function",
                    "function": {
                        "name": tool.name,
                        "description": tool.description,
                        "parameters": tool.inputSchema
                    }
                })
            
            # Chat with Ollama
            messages = [
                {"role": "user", "content": "List available vibration signals"}
            ]
            
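            # Local models can take a while to answer, so raise httpx's
            # default 5-second timeout. The async client keeps the event
            # loop (and the MCP session) responsive while waiting.
            async with httpx.AsyncClient(timeout=120.0) as client:
                response = await client.post(OLLAMA_URL, json={
                    "model": MODEL,
                    "messages": messages,
                    "tools": ollama_tools,
                    "stream": False
                })
            response.raise_for_status()
            result = response.json()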
            
            # Handle tool calls from Ollama
            if result["message"].get("tool_calls"):
                for call in result["message"]["tool_calls"]:
                    tool_result = await session.call_tool(
                        call["function"]["name"],
                        arguments=call["function"]["arguments"]
                    )
                    print(f"Tool: {call['function']['name']}")
                    print(f"Result: {tool_result.content}")

asyncio.run(main())
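
The example above stops after printing the tool output. To let the model interpret that output, it is usually appended to the conversation and sent back in a second request. The following is a minimal sketch of that step, assuming the tool returns text content items and that Ollama accepts tool results as messages with role "tool". text_from_result and append_tool_result are hypothetical helper names, not part of either SDK.

```python
from types import SimpleNamespace


def text_from_result(tool_result) -> str:
    """Join the text parts of an MCP tool result; non-text parts are skipped."""
    parts = [c.text for c in tool_result.content if getattr(c, "type", "") == "text"]
    return "\n".join(parts)


def append_tool_result(messages: list, assistant_message: dict,
                       tool_name: str, text: str) -> list:
    """Return a new message list with the assistant's tool call and the tool output."""
    return messages + [
        assistant_message,
        # Assumption: "tool_name" is the field Ollama uses to link the result
        # to the call; check the API docs for your Ollama version.
        {"role": "tool", "tool_name": tool_name, "content": text},
    ]


# Stand-in for a real MCP result, for illustration only
fake = SimpleNamespace(content=[SimpleNamespace(type="text", text="baseline_1.csv")])
history = append_tool_result(
    [{"role": "user", "content": "List available vibration signals"}],
    {"role": "assistant", "content": "", "tool_calls": []},
    "list_signals",
    text_from_result(fake),
)
```

Posting history to /api/chat again, with the same tools list, gives the model the chance to produce a final answer or request another tool.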

Step 3: Verify the Connection

Once connected, ask the model to list available signals:

List available vibration signals

Expected: The model calls list_signals() and returns a list of .csv files from data/signals/.

Then try a quick screening:

Analyze statistics for real_train/baseline_1.csv

Performance Considerations

Aspect           Recommendation
Model size       14B+ for reliable tool calling; 7B works for simple queries
RAM              Model size × 1.2 + 4 GB for the MCP server and data
GPU              Strongly recommended for models ≥14B
Quantization     Q4_K_M is a good balance (default in Ollama)
Context window   8K minimum; 32K+ recommended for multi-step diagnosis
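
As a worked example of the RAM rule of thumb and the context-window recommendation, the sketch below computes the estimate for a ~9 GB model and builds an /api/chat request body that asks for a 32K context through the num_ctx option. The function names are illustrative, and Ollama's default context length may be lower than the 8K minimum, so setting it explicitly is a reasonable precaution.

```python
def estimate_ram_gb(model_size_gb: float) -> float:
    """Rule of thumb from the table: model size x 1.2 + 4 GB."""
    return model_size_gb * 1.2 + 4.0


def chat_payload(model: str, messages: list, tools: list,
                 num_ctx: int = 32768) -> dict:
    """Build an /api/chat request body that requests a larger context window."""
    return {
        "model": model,
        "messages": messages,
        "tools": tools,
        "stream": False,
        "options": {"num_ctx": num_ctx},
    }


# qwen2.5:14b at ~9 GB: 9 x 1.2 + 4 = 14.8 GB
print(round(estimate_ram_gb(9.0), 1))
```

A larger num_ctx increases memory use beyond the rule-of-thumb estimate, so treat the figure as a lower bound when running long multi-step sessions.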

Limitations with Local Models

  • Tool calling reliability: Smaller models (<7B) may format tool calls incorrectly or hallucinate tool names. Use 14B+ for production diagnostics.
  • Multi-step workflows: Complex workflows like diagnose_bearing (8 steps) work best with 32B+ models. Consider breaking into individual tool calls for smaller models.
  • Evidence-based inference: Local models may not follow the evidence-based inference policy as strictly as Claude. Always verify diagnostic conclusions against the raw tool outputs.
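
One way to guard against hallucinated tool names in a custom client is to check each requested call against the tool list reported by the MCP server before executing it. The sketch below is an illustration only; split_tool_calls is a hypothetical helper, and it assumes tool calls arrive in the shape used by the Option C example.

```python
import json


def split_tool_calls(tool_calls: list, known_tools: set) -> tuple:
    """Separate calls to known tools from calls with unknown names or bad arguments."""
    valid, rejected = [], []
    for call in tool_calls:
        fn = call.get("function", {})
        name, args = fn.get("name"), fn.get("arguments", {})
        if isinstance(args, str):  # some APIs deliver arguments as a JSON string
            try:
                args = json.loads(args)
            except json.JSONDecodeError:
                args = None
        if name in known_tools and isinstance(args, dict):
            valid.append((name, args))
        else:
            rejected.append(call)
    return valid, rejected


calls = [
    {"function": {"name": "list_signals", "arguments": {}}},
    {"function": {"name": "list_all_signals", "arguments": {}}},  # hallucinated name
]
valid, rejected = split_tool_calls(calls, {"list_signals", "analyze_statistics"})
```

Rejected calls can be reported back to the model as an error message so that it can retry with a valid tool name.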

Troubleshooting

Model doesn't call tools

  • Ensure your model supports tool calling (ollama show <model> to check capabilities)
  • Try a larger model (14B+)
  • Some models need explicit prompting: "Use the analyze_statistics tool to..."
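
The capability check can also be done over HTTP. The sketch below assumes a recent Ollama release whose /api/show response includes a capabilities list; older releases may omit the field, in which case supports_tools returns False and the ollama show output is the better source. Both function names are illustrative.

```python
import json
import urllib.request

OLLAMA_SHOW_URL = "http://localhost:11434/api/show"  # default local endpoint


def supports_tools(show_response: dict) -> bool:
    """True if the model reports the 'tools' capability."""
    return "tools" in show_response.get("capabilities", [])


def show_model(model: str, url: str = OLLAMA_SHOW_URL) -> dict:
    """Fetch model details from the local Ollama server."""
    body = json.dumps({"model": model}).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

With Ollama running, supports_tools(show_model("qwen2.5:14b")) indicates whether the model advertises tool calling.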

Connection refused

  • Make sure Ollama is running; if it is not, start it with ollama serve
  • Check the port: default is http://localhost:11434

MCP server not starting

  • Use absolute paths in the configuration
  • Verify the venv: .venv/bin/python -c "import mcp; print('ok')" (on Windows: .venv\Scripts\python.exe -c "import mcp; print('ok')")
  • Check INSTALL.md troubleshooting section
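
The path checks above can be automated for a single mcpServers entry. This sketch assumes, as in the example configuration, that the command and every argument are file paths; check_server_config is a hypothetical helper and would flag ordinary flags such as -m as non-absolute paths.

```python
import os


def check_server_config(server: dict) -> list:
    """Return human-readable problems for one mcpServers entry."""
    problems = []
    for path in [server.get("command", "")] + list(server.get("args", [])):
        if not os.path.isabs(path):
            problems.append(f"not an absolute path: {path!r}")
        elif not os.path.exists(path):
            problems.append(f"path does not exist: {path!r}")
    return problems


# The relative values from the Option C example are flagged
issues = check_server_config({
    "command": "python",
    "args": ["src/machinery_diagnostics_server.py"],
})
```

An empty list means both the interpreter and the server script were found at absolute paths.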

Out of memory

  • Use a smaller model or increase swap
  • Close other applications
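  • Try a more heavily quantized variant. Tag names differ between models, so check the model's tag list in the Ollama library first (for example: ollama pull qwen2.5:14b-instruct-q4_0)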

Security Note

Running everything locally (Ollama + MCP server) means:

  • Zero data leaves your network — ideal for proprietary industrial data
  • No API keys — no cloud dependency
  • Full air-gap capable — works offline after initial model download

See SECURITY.md for the full privacy architecture.