Julius: LLM Service Fingerprinting Tool

March 24, 2026

Julius - Open source LLM service fingerprinting tool for security professionals. Identify Ollama, vLLM, LiteLLM and 60+ AI services.

Identify Ollama, vLLM, LiteLLM, and 60+ AI services running on any endpoint in seconds.

Julius is an LLM service fingerprinting tool for security professionals. It detects which AI server software is running on network endpoints during penetration tests, attack surface discovery, and security assessments.

Unlike model fingerprinting tools that identify which LLM generated text, Julius identifies the server infrastructure: Is that endpoint running Ollama? vLLM? LiteLLM? A Hugging Face deployment? Julius answers in seconds.

The Problem

You've discovered an open port during a security assessment. Is it Ollama on port 11434? vLLM? LiteLLM? A Hugging Face endpoint? Some other AI service?

Manually checking each possibility is slow and error-prone. Different LLM services have different API signatures, default ports, and response patterns.

Julius solves this by automatically fingerprinting LLM services - sending targeted HTTP probes and matching response signatures to identify the exact service running.

Features

| Feature | Description |
| --- | --- |
| 63 LLM Services | Detects Ollama, vLLM, LiteLLM, LocalAI, Hugging Face TGI, AWS Bedrock, and 57 more |
| Fast Scanning | Concurrent probing with intelligent port-based prioritization |
| Model Discovery | Extracts available models from identified endpoints |
| Specificity Scoring | 1-100 scoring ranks results by most specific match (e.g., LiteLLM over generic OpenAI-compatible) |
| Multiple Inputs | Single target, file input, or stdin piping |
| Flexible Output | Table, JSON, or JSONL formats for easy integration |
| Extensible | Add new service detection via simple YAML probe files |
| Offline Operation | No cloud dependencies - runs entirely locally |
| Single Binary | Go-based tool compiles to one portable executable |

Quick Start

Installation

go install github.com/praetorian-inc/julius/cmd/julius@latest

Basic Usage

julius probe https://target.example.com

Example Output

+----------------------------+---------+-------------+-------------+--------+-------+
|           TARGET           | SERVICE | SPECIFICITY |  CATEGORY   | MODELS | ERROR |
+----------------------------+---------+-------------+-------------+--------+-------+
| https://target.example.com | ollama  |         100 | self-hosted |        |       |
+----------------------------+---------+-------------+-------------+--------+-------+

Supported LLM Services

Julius identifies 63 LLM platforms across self-hosted, gateway, RAG/orchestration, and cloud-managed categories:

Self-Hosted LLM Servers (25)

| Service | Default Port | Description |
| --- | --- | --- |
| Ollama | 11434 | Popular local LLM server with easy model management |
| vLLM | 8000 | High-throughput LLM inference engine |
| SGLang | 30000 | High-performance LLM serving engine |
| LocalAI | 8080 | OpenAI-compatible local AI server |
| llama.cpp | 8080 | CPU-optimized LLM inference |
| Hugging Face TGI | 3000 | Text Generation Inference server |
| NVIDIA NIM | 8000 | NVIDIA's enterprise inference microservices |
| NVIDIA TensorRT-LLM | 8000 | NVIDIA TensorRT-LLM inference server |
| NVIDIA Triton | 8000 | NVIDIA Triton Inference Server (KServe v2) |
| BentoML | 3000 | AI application framework for serving models |
| Ray Serve | 8265 | Scalable model serving on Ray cluster |
| Aphrodite Engine | 2242 | Large-scale LLM inference engine |
| Baseten Truss | 8080 | Open-source ML model serving framework |
| DeepSpeed-MII | 28080 | High-throughput inference powered by DeepSpeed |
| FastChat | 21001 | Open platform for LLM chatbots |
| GPT4All | 4891 | Run local models on any device |
| Gradio | 7860 | ML model demo interfaces |
| Jan | 1337 | Local OpenAI-compatible API server |
| KoboldCpp | 5001 | AI text-generation for GGML/GGUF models |
| LM Studio | 1234 | Desktop LLM application with API server |
| MLC LLM | 8000 | Universal deployment engine with ML compilation |
| Petals | 5000 | Decentralized BitTorrent-style LLM inference |
| PowerInfer | 8080 | CPU/GPU hybrid inference engine |
| TabbyAPI | 5000 | FastAPI-based server for ExLlama |
| Text Generation WebUI | 5000 | Local LLM interface with API |

Gateway/Proxy Services (8)

| Service | Default Port | Description |
| --- | --- | --- |
| LiteLLM | 4000 | Unified proxy for 100+ LLM providers |
| Bifrost | 8080 | High-performance unified LLM gateway |
| Envoy AI Gateway | 80 | Unified access to generative AI services |
| Helicone | 8585 | Open-source LLM observability platform and gateway |
| Kong AI Gateway | 8001 | Enterprise API gateway with AI plugins |
| OmniRoute | 20128 | AI gateway with smart routing and caching |
| Portkey AI Gateway | 8787 | Unified gateway for 200+ LLM providers |
| TensorZero | 3000 | Rust-based LLM gateway with observability |

RAG & Orchestration Platforms (18)

| Service | Default Port | Description |
| --- | --- | --- |
| AnythingLLM | 3001 | All-in-one AI application with RAG and agents |
| AstrBot | 6185 | Multi-platform LLM chatbot framework |
| BetterChatGPT | 3000 | Enhanced ChatGPT interface |
| Dify | 80 | LLM app development platform with workflow orchestration |
| Flowise | 3000 | Low-code platform for AI agents and workflows |
| h2oGPT | 7860 | Private local GPT with document Q&A |
| HuggingFace Chat UI | 3000 | Open source ChatGPT-style interface |
| Langflow | 7860 | Low-code platform for AI agents and RAG |
| LibreChat | 3080 | Multi-provider chat interface with RAG |
| LobeHub | 3210 | Multi-agent AI collaboration platform |
| NextChat | 3000 | Self-hosted ChatGPT-style interface |
| Onyx | 3000 | Enterprise search and chat with RAG |
| OpenClaw | 18789 | AI agent gateway and control plane |
| Open WebUI | 3000 | ChatGPT-style interface for local LLMs |
| PrivateGPT | 8001 | Private document Q&A with LLMs |
| Quivr | 5050 | RAG platform for AI assistants |
| RAGFlow | 80 | RAG engine with deep document understanding |
| SillyTavern | 8000 | Character-based chat application |

Cloud-Managed Services (11)

| Service | Default Port | Description |
| --- | --- | --- |
| AWS Bedrock | 443 | Foundation model hosting and inference |
| Azure OpenAI | 443 | Microsoft Azure OpenAI Service |
| Cloudflare AI Gateway | 443 | AI proxy with caching and observability |
| Databricks Model Serving | 443 | Real-time ML inference endpoints |
| Fireworks AI | 443 | Cloud inference platform for LLMs |
| Google Vertex AI | 443 | ML training and generative AI platform |
| Groq | 443 | LPU-accelerated cloud inference |
| Modal | 443 | Serverless AI compute platform |
| Replicate | 443 | Cloud ML platform with prediction API |
| Salesforce Einstein | 443 | Salesforce AI platform |
| Together AI | 443 | Cloud inference for open-source models |

Generic Detection

| Service | Description |
| --- | --- |
| OpenAI-compatible | Any server implementing OpenAI's API specification |

Usage

Single Target

Scan a single endpoint for LLM services:

julius probe https://target.example.com
julius probe https://target.example.com:11434
julius probe 192.168.1.100:8080
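
Targets can be given with or without a scheme, as the last command shows. The following is a minimal sketch of how such input might be normalized before probing. It assumes a bare `host:port` target is treated as plain HTTP, which this README does not confirm, and `normalizeTarget` is a hypothetical helper rather than part of Julius's API.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// normalizeTarget is a hypothetical helper. It assumes that a target without
// a scheme should be treated as plain HTTP, which may differ from what
// Julius actually does.
func normalizeTarget(raw string) (string, error) {
	raw = strings.TrimSpace(raw)
	if raw == "" {
		return "", fmt.Errorf("empty target")
	}
	// url.Parse rejects "host:port" without a scheme, so add one first.
	if !strings.Contains(raw, "://") {
		raw = "http://" + raw
	}
	u, err := url.Parse(raw)
	if err != nil {
		return "", fmt.Errorf("parse %q: %w", raw, err)
	}
	if u.Host == "" {
		return "", fmt.Errorf("target %q has no host", raw)
	}
	// Drop trailing slashes so probe paths can be appended consistently.
	u.Path = strings.TrimRight(u.Path, "/")
	return u.String(), nil
}

func main() {
	for _, t := range []string{"https://target.example.com/", "192.168.1.100:8080"} {
		n, err := normalizeTarget(t)
		fmt.Println(n, err)
	}
}
```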

Multiple Targets

Scan multiple endpoints efficiently:

# Command line arguments
julius probe https://target1.example.com https://target2.example.com

# From file (one target per line)
julius probe -f targets.txt

# From stdin (pipe from other tools)
cat targets.txt | julius probe -
echo "https://target.example.com" | julius probe -

Output Formats

Choose the output format that fits your workflow:

# Table format (default) - human-readable
julius probe https://target.example.com

# JSON format - structured output
julius probe -o json https://target.example.com

# JSONL format - one JSON object per line, ideal for piping
julius probe -o jsonl https://target.example.com | jq '.service'
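
For integration with other Go tooling, JSONL output can be decoded one line at a time. The sketch below assumes each line carries the `target`, `service`, and `models` fields shown in this README's sample JSON; the real output may include more fields. The `Result` type and both parse functions are illustrative names, not Julius's own.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// Result mirrors only the fields that appear in this README's sample JSON.
// The real output may contain additional fields.
type Result struct {
	Target  string   `json:"target"`
	Service string   `json:"service"`
	Models  []string `json:"models"`
}

// parseJSONL decodes one Result per non-empty line read from r.
func parseJSONL(r io.Reader) ([]Result, error) {
	var out []Result
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var res Result
		if err := json.Unmarshal([]byte(line), &res); err != nil {
			return nil, fmt.Errorf("decode line: %w", err)
		}
		out = append(out, res)
	}
	return out, sc.Err()
}

// parseJSONLString is a convenience wrapper for in-memory input.
func parseJSONLString(s string) ([]Result, error) {
	return parseJSONL(strings.NewReader(s))
}

func main() {
	// Made-up sample lines standing in for `julius probe -o jsonl ...` output.
	sample := `{"target":"https://ollama.example.com","service":"ollama","models":["llama2","mistral"]}
{"target":"https://gw.example.com","service":"litellm","models":[]}`
	results, err := parseJSONLString(sample)
	if err != nil {
		panic(err)
	}
	for _, r := range results {
		fmt.Printf("%s -> %s (%d models)\n", r.Target, r.Service, len(r.Models))
	}
}
```

To consume piped output instead of the embedded sample, pass `os.Stdin` to `parseJSONL`.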

Model Discovery

When Julius identifies an LLM service, it can also extract available models:

julius probe -o json https://ollama.example.com
{
  "target": "https://ollama.example.com",
  "service": "ollama",
  "models": ["llama2", "mistral", "codellama"]
}

Advanced Options

# Adjust concurrency (default: 10)
julius probe -c 20 https://target.example.com

# Increase timeout for slow endpoints (default: 5 seconds)
julius probe -t 10 https://target.example.com

# Use custom probe definitions
julius probe -p ./my-probes https://target.example.com

# Verbose output for debugging
julius probe -v https://target.example.com

# Quiet mode - only show matches
julius probe -q https://target.example.com

# List all available probes
julius list
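
The timeout option bounds how long each request may take. As a rough illustration of what a single probe request with a timeout looks like in Go, the sketch below issues one GET against a local test server. `probeOnce` and `newFakeService` are hypothetical names, the 1 MiB body cap is an arbitrary choice, and the `/health` response imitates the made-up service from the custom probe example later in this README; none of this is taken from Julius's source.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// probeOnce sends a single GET request and returns the status code and up to
// 1 MiB of the body. The size cap is an arbitrary choice for this sketch.
func probeOnce(baseURL, path string, timeoutSeconds int) (int, string, error) {
	client := &http.Client{Timeout: time.Duration(timeoutSeconds) * time.Second}
	resp, err := client.Get(baseURL + path)
	if err != nil {
		return 0, "", fmt.Errorf("GET %s%s: %w", baseURL, path, err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(io.LimitReader(resp.Body, 1<<20))
	if err != nil {
		return resp.StatusCode, "", fmt.Errorf("read body: %w", err)
	}
	return resp.StatusCode, string(body), nil
}

// newFakeService starts a local test server that answers /health like the
// hypothetical "my-llm" service and returns 404 for every other path.
func newFakeService() (baseURL string, stop func()) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/health" {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"service":"my-llm"}`)
	}))
	return srv.URL, srv.Close
}

func main() {
	base, stop := newFakeService()
	defer stop()
	status, body, err := probeOnce(base, "/health", 5)
	fmt.Println(status, body, err)
}
```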

How It Works

Julius uses HTTP-based service fingerprinting to identify LLM platforms:

flowchart LR
    A[Target URL] --> B[Load Probes]
    B --> C[HTTP Requests]
    C --> D[Rule Matching]
    D --> E{Match?}
    E -->|Yes| F[Report Service]
    E -->|No| G[Try Next Probe]
    G --> C

    subgraph Scanner
        C
        D
        E
    end

Detection Process

  1. Target Normalization: Validates and normalizes input URLs
  2. Probe Selection: Prioritizes probes matching the target's port
  3. HTTP Probing: Sends requests to service-specific endpoints
  4. Rule Matching: Compares responses against signature patterns
  5. Specificity Scoring: Orders results by most specific match first
  6. Model Extraction: Optionally retrieves available models via JQ expressions
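
Steps 2 and 5 are both ordering problems. The sketch below shows one way they could work: probes whose port hint matches the target port move to the front, and matches are ranked by descending specificity. The `Probe` and `Match` types, both function names, and the specificity values in `main` are assumptions made for illustration, not Julius's actual data structures or scores.

```go
package main

import (
	"fmt"
	"sort"
)

// Probe and Match are simplified, hypothetical types for illustration.
type Probe struct {
	Name     string
	PortHint int
}

type Match struct {
	Service     string
	Specificity int // 1-100, higher means a more specific match
}

// prioritize returns a copy of probes in which those whose port hint equals
// targetPort come first. The stable sort keeps the original relative order
// within each group.
func prioritize(probes []Probe, targetPort int) []Probe {
	out := append([]Probe(nil), probes...)
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].PortHint == targetPort && out[j].PortHint != targetPort
	})
	return out
}

// rank returns a copy of matches ordered from most to least specific.
func rank(matches []Match) []Match {
	out := append([]Match(nil), matches...)
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].Specificity > out[j].Specificity
	})
	return out
}

func main() {
	probes := []Probe{{"vllm", 8000}, {"ollama", 11434}, {"litellm", 4000}}
	fmt.Println(prioritize(probes, 11434))

	// Illustrative scores only.
	matches := []Match{{"openai-compatible", 10}, {"litellm", 90}}
	fmt.Println(rank(matches))
}
```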

Match Rules

Each probe defines rules that must all match for identification:

| Rule Type | Description | Example |
| --- | --- | --- |
| status | HTTP status code | 200, 404 |
| body.contains | Response body contains string | "models": |
| body.prefix | Response body starts with | {"object": |
| content-type | Content-Type header equals value | application/json |
| header.contains | Header contains value | X-Custom: foo |
| header.prefix | Header starts with value | text/ |

All rules support negation with not: true.
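
To make the "all rules must match" and negation semantics concrete, here is a small sketch of a rule evaluator. It covers only the `status`, `body.contains`, `body.prefix`, and `content-type` rule types, and the `Rule` and `Response` structs are simplified stand-ins invented for this example rather than Julius's real types.

```go
package main

import (
	"fmt"
	"strings"
)

// Rule is a simplified, hypothetical model of one match rule.
type Rule struct {
	Type   string // "status", "body.contains", "body.prefix", "content-type"
	Value  string // used by every type except "status"
	Status int    // used only by "status" rules
	Not    bool   // negation, as in `not: true`
}

// Response holds the parts of an HTTP response that the rules inspect.
type Response struct {
	Status      int
	ContentType string
	Body        string
}

// matches evaluates a single rule and then applies negation.
func (r Rule) matches(resp Response) bool {
	var ok bool
	switch r.Type {
	case "status":
		ok = resp.Status == r.Status
	case "body.contains":
		ok = strings.Contains(resp.Body, r.Value)
	case "body.prefix":
		ok = strings.HasPrefix(resp.Body, r.Value)
	case "content-type":
		ok = resp.ContentType == r.Value
	default:
		return false // unknown rule types never match in this sketch
	}
	return ok != r.Not
}

// matchAll reports whether every rule matches the response.
func matchAll(rules []Rule, resp Response) bool {
	for _, r := range rules {
		if !r.matches(resp) {
			return false
		}
	}
	return true
}

func main() {
	resp := Response{Status: 200, ContentType: "application/json", Body: `{"models":[]}`}
	rules := []Rule{
		{Type: "status", Status: 200},
		{Type: "body.contains", Value: `"models":`},
		{Type: "body.prefix", Value: `{"object":`, Not: true},
	}
	fmt.Println(matchAll(rules, resp))
}
```

Negation is applied after the base check, so a negated `body.prefix` rule passes exactly when the body does not start with the given value.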

Architecture

cmd/julius/          CLI entrypoint
pkg/
  runner/            Command execution (probe, list, validate)
  scanner/           HTTP client, response caching, model extraction
  rules/             Match rule engine (status, body, header patterns)
  output/            Formatters (table, JSON, JSONL)
  probe/             Probe loader (embedded YAML + filesystem)
  types/             Core data structures
probes/              YAML probe definitions (one per service)

Key Design Decisions

  • Concurrent scanning with bounded goroutine pools via errgroup
  • Response caching with MD5 deduplication and singleflight
  • Embedded probes compiled into binary for portability
  • Plugin-style rules for easy extension
  • Port-based prioritization for faster identification

Adding Custom Probes

Create a YAML file in probes/ to detect new LLM services:

name: my-llm-service
description: My custom LLM service detection
category: self-hosted
port_hint: 8080
api_docs: https://example.com/api-docs

requests:
  - path: /health
    method: GET
    match:
      - type: status
        value: 200
      - type: body.contains
        value: '"service":"my-llm"'

  - path: /api/version
    method: GET
    match:
      - type: status
        value: 200
      - type: content-type
        value: application/json

models:
  path: /api/models
  method: GET
  extract: ".models[].name"

Validate your probe:

julius validate ./probes

See CONTRIBUTING.md for the complete probe specification.
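
The `extract` expression in the `models` block above selects the `name` of every entry in a top-level `models` array. As a sketch of that selection in plain Go, assuming a response shaped like `{"models":[{"name":"..."}]}` where every entry has a string `name`, the hypothetical `extractModelNames` function below does the equivalent with `encoding/json`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// extractModelNames returns the name of each entry in the top-level
// "models" array, mirroring what ".models[].name" selects when every
// entry has a string name.
func extractModelNames(body []byte) ([]string, error) {
	var doc struct {
		Models []struct {
			Name string `json:"name"`
		} `json:"models"`
	}
	if err := json.Unmarshal(body, &doc); err != nil {
		return nil, fmt.Errorf("decode models response: %w", err)
	}
	names := make([]string, 0, len(doc.Models))
	for _, m := range doc.Models {
		names = append(names, m.Name)
	}
	return names, nil
}

func main() {
	// Made-up response body for the hypothetical /api/models endpoint.
	body := []byte(`{"models":[{"name":"llama2"},{"name":"mistral"}]}`)
	names, err := extractModelNames(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```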

FAQ

What is LLM service fingerprinting?

LLM service fingerprinting identifies what LLM server software (Ollama, vLLM, LiteLLM, etc.) is running on a network endpoint. This differs from model fingerprinting, which identifies which AI model generated a piece of text.

Julius answers: "What server is running on this port?" Model fingerprinting answers: "Which LLM wrote this text?"

How is Julius different from Shodan-based detection?

Tools like Cisco's Shodan-based Ollama detector query internet-wide scan databases. Julius instead actively probes the specific targets you give it and runs offline, with no external dependencies. It also covers 60+ services rather than a single one.

Is Julius safe for penetration testing?

Yes. Julius only sends standard HTTP requests - the same as a web browser or curl. It does not:

  • Exploit vulnerabilities
  • Attempt authentication bypass
  • Perform denial of service
  • Modify or delete data
  • Execute code on targets

Always ensure you have authorization before scanning targets.

How do I add support for a new LLM service?

  1. Create a YAML probe file in probes/ (e.g., probes/my-service.yaml)
  2. Define HTTP requests with match rules
  3. Validate with julius validate ./probes
  4. Test against a live instance
  5. Submit a pull request

See CONTRIBUTING.md for detailed examples.

Why doesn't Julius detect my LLM service?

Common reasons:

  1. Non-default port: Try specifying the full URL with port
  2. Authentication required: Julius doesn't handle auth; the endpoint may be protected
  3. Custom configuration: The service may have non-standard API paths
  4. Unsupported service: Consider adding a custom probe

Can Julius detect services behind reverse proxies?

Yes, if the proxy forwards requests to the backend LLM service endpoints. Julius matches on response content, not network-level signatures.

Why "Julius"?

Named after Julius Caesar - the original fingerprinter of Roman politics.

Troubleshooting

Error: "no matches found"

Cause: No probe signatures matched the target's responses.

Solutions:

  1. Verify the target URL is correct and accessible
  2. Check if the service requires authentication
  3. Try with verbose mode: julius probe -v https://target
  4. The service may not be in Julius's probe database - consider adding a custom probe

Error: "connection refused"

Cause: Target is not accepting connections on the specified port.

Solutions:

  1. Verify the target host and port are correct
  2. Check if a firewall is blocking the connection
  3. Ensure the LLM service is running

Error: "timeout"

Cause: Target didn't respond within the timeout period.

Solutions:

  1. Increase timeout: julius probe -t 15 https://target
  2. Check network connectivity to the target
  3. The service may be overloaded or unresponsive

Slow scanning performance

Cause: Default concurrency may be too low for many targets.

Solutions:

  1. Increase concurrency: julius probe -c 50 -f targets.txt
  2. Use JSONL output for faster streaming: julius probe -o jsonl -f targets.txt

Contributing

We welcome contributions! See CONTRIBUTING.md for:

  • Adding new LLM service probes
  • Creating new match rule types
  • Testing guidelines
  • Code style requirements

Security

Julius is designed for authorized security testing only. See SECURITY.md for:

  • Security considerations and responsible use
  • What Julius does and does not do
  • Reporting security issues

License

Apache 2.0 - Praetorian Security, Inc.


Built by Praetorian - Offensive Security Solutions