Julius: LLM Service Fingerprinting Tool

March 24, 2026

Julius - Open source LLM service fingerprinting tool for security professionals. Identify Ollama, vLLM, LiteLLM and 60+ AI services.

Identify Ollama, vLLM, LiteLLM, and 60+ AI services running on any endpoint in seconds.

Julius is an LLM service fingerprinting tool for security professionals. It detects which AI server software is running on network endpoints during penetration tests, attack surface discovery, and security assessments.

Unlike model fingerprinting tools that identify which LLM generated text, Julius identifies the server infrastructure: Is that endpoint running Ollama? vLLM? LiteLLM? A Hugging Face deployment? Julius answers in seconds.

The Problem
Features
Quick Start
Supported LLM Services
Usage
How It Works
Architecture
Adding Custom Probes
FAQ
Troubleshooting
Contributing
Security
Support
License

The Problem

You've discovered an open port during a security assessment. Is it Ollama on port 11434? vLLM? LiteLLM? A Hugging Face endpoint? Some other AI service?

Manually checking each possibility is slow and error-prone. Different LLM services have different API signatures, default ports, and response patterns.

Julius solves this by automatically fingerprinting LLM services - sending targeted HTTP probes and matching response signatures to identify the exact service running.

Features

Feature	Description
63 LLM Services	Detects Ollama, vLLM, LiteLLM, LocalAI, Hugging Face TGI, AWS Bedrock, and 57 more
Fast Scanning	Concurrent probing with intelligent port-based prioritization
Model Discovery	Extracts available models from identified endpoints
Specificity Scoring	1-100 scoring ranks results by most specific match (e.g., LiteLLM over generic OpenAI-compatible)
Multiple Inputs	Single target, file input, or stdin piping
Flexible Output	Table, JSON, or JSONL formats for easy integration
Extensible	Add new service detection via simple YAML probe files
Offline Operation	No cloud dependencies - runs entirely locally
Single Binary	Go-based tool compiles to one portable executable

Quick Start

Installation

go install github.com/praetorian-inc/julius/cmd/julius@latest

Basic Usage

julius probe https://target.example.com

Example Output

+----------------------------+---------+-------------+-------------+--------+-------+
|           TARGET           | SERVICE | SPECIFICITY |  CATEGORY   | MODELS | ERROR |
+----------------------------+---------+-------------+-------------+--------+-------+
| https://target.example.com | ollama  |         100 | self-hosted |        |       |
+----------------------------+---------+-------------+-------------+--------+-------+

Supported LLM Services

Julius identifies 63 LLM platforms across self-hosted, gateway, RAG/orchestration, and cloud-managed categories:

Self-Hosted LLM Servers (25)

Service	Default Port	Description
Ollama	11434	Popular local LLM server with easy model management
vLLM	8000	High-throughput LLM inference engine
SGLang	30000	High-performance LLM serving engine
LocalAI	8080	OpenAI-compatible local AI server
llama.cpp	8080	CPU-optimized LLM inference
Hugging Face TGI	3000	Text Generation Inference server
NVIDIA NIM	8000	NVIDIA's enterprise inference microservices
NVIDIA TensorRT-LLM	8000	NVIDIA TensorRT-LLM inference server
NVIDIA Triton	8000	NVIDIA Triton Inference Server (KServe v2)
BentoML	3000	AI application framework for serving models
Ray Serve	8265	Scalable model serving on Ray cluster
Aphrodite Engine	2242	Large-scale LLM inference engine
Baseten Truss	8080	Open-source ML model serving framework
DeepSpeed-MII	28080	High-throughput inference powered by DeepSpeed
FastChat	21001	Open platform for LLM chatbots
GPT4All	4891	Run local models on any device
Gradio	7860	ML model demo interfaces
Jan	1337	Local OpenAI-compatible API server
KoboldCpp	5001	AI text-generation for GGML/GGUF models
LM Studio	1234	Desktop LLM application with API server
MLC LLM	8000	Universal deployment engine with ML compilation
Petals	5000	Decentralized BitTorrent-style LLM inference
PowerInfer	8080	CPU/GPU hybrid inference engine
TabbyAPI	5000	FastAPI-based server for ExLlama
Text Generation WebUI	5000	Local LLM interface with API

Gateway/Proxy Services (8)

Service	Default Port	Description
LiteLLM	4000	Unified proxy for 100+ LLM providers
Bifrost	8080	High-performance unified LLM gateway
Envoy AI Gateway	80	Unified access to generative AI services
Helicone	8585	Open-source LLM observability platform and gateway
Kong AI Gateway	8001	Enterprise API gateway with AI plugins
OmniRoute	20128	AI gateway with smart routing and caching
Portkey AI Gateway	8787	Unified gateway for 200+ LLM providers
TensorZero	3000	Rust-based LLM gateway with observability

RAG & Orchestration Platforms (18)

Service	Default Port	Description
AnythingLLM	3001	All-in-one AI application with RAG and agents
AstrBot	6185	Multi-platform LLM chatbot framework
BetterChatGPT	3000	Enhanced ChatGPT interface
Dify	80	LLM app development platform with workflow orchestration
Flowise	3000	Low-code platform for AI agents and workflows
h2oGPT	7860	Private local GPT with document Q&A
HuggingFace Chat UI	3000	Open source ChatGPT-style interface
Langflow	7860	Low-code platform for AI agents and RAG
LibreChat	3080	Multi-provider chat interface with RAG
LobeHub	3210	Multi-agent AI collaboration platform
NextChat	3000	Self-hosted ChatGPT-style interface
Onyx	3000	Enterprise search and chat with RAG
OpenClaw	18789	AI agent gateway and control plane
Open WebUI	3000	ChatGPT-style interface for local LLMs
PrivateGPT	8001	Private document Q&A with LLMs
Quivr	5050	RAG platform for AI assistants
RAGFlow	80	RAG engine with deep document understanding
SillyTavern	8000	Character-based chat application

Cloud-Managed Services (11)

Service	Default Port	Description
AWS Bedrock	443	Foundation model hosting and inference
Azure OpenAI	443	Microsoft Azure OpenAI Service
Cloudflare AI Gateway	443	AI proxy with caching and observability
Databricks Model Serving	443	Real-time ML inference endpoints
Fireworks AI	443	Cloud inference platform for LLMs
Google Vertex AI	443	ML training and generative AI platform
Groq	443	LPU-accelerated cloud inference
Modal	443	Serverless AI compute platform
Replicate	443	Cloud ML platform with prediction API
Salesforce Einstein	443	Salesforce AI platform
Together AI	443	Cloud inference for open-source models

Generic Detection

Service	Description
OpenAI-compatible	Any server implementing OpenAI's API specification
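
Many of the services listed above also expose OpenAI-style endpoints, so a single target can satisfy both a service-specific probe and the generic one. The Python sketch below shows how ordering by specificity surfaces the more specific match. It is an illustration only, not Julius's Go implementation: the `Match` type, the `rank_matches` helper, and the scores 90 and 10 are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    service: str
    specificity: int  # 1-100, higher means a more specific signature


def rank_matches(matches):
    """Return matches ordered from most to least specific."""
    return sorted(matches, key=lambda m: m.specificity, reverse=True)


# Hypothetical scores, chosen only to illustrate the ordering.
matches = [
    Match("openai-compatible", 10),
    Match("litellm", 90),
]
best = rank_matches(matches)[0]
print(best.service)  # prints: litellm
```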

Usage

Single Target

Scan a single endpoint for LLM services:

julius probe https://target.example.com
julius probe https://target.example.com:11434
julius probe 192.168.1.100:8080

Multiple Targets

Scan multiple endpoints efficiently:

# Command line arguments
julius probe https://target1.example.com https://target2.example.com

# From file (one target per line)
julius probe -f targets.txt

# From stdin (pipe from other tools)
cat targets.txt | julius probe -
echo "https://target.example.com" | julius probe -
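
One way to produce a target list for file or stdin input is to expand a set of hosts against the default ports from the service tables. The Python sketch below does that. `build_targets` is a hypothetical helper and the addresses are placeholders; the `host:port` form follows the single-target examples above.

```python
from itertools import product

# Default ports taken from the service tables above (Ollama, vLLM, LiteLLM).
DEFAULT_PORTS = (11434, 8000, 4000)


def build_targets(hosts, ports=DEFAULT_PORTS):
    """Expand every host against every port into host:port targets."""
    return [f"{host}:{port}" for host, port in product(hosts, ports)]


# Placeholder addresses; replace with hosts you are authorized to scan.
targets = build_targets(["192.168.1.100", "192.168.1.101"])
print("\n".join(targets))
```

Saving the printed lines to targets.txt makes them usable with julius probe -f targets.txt, and piping them works with julius probe -.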

Output Formats

Choose the output format that fits your workflow:

# Table format (default) - human-readable
julius probe https://target.example.com

# JSON format - structured output
julius probe -o json https://target.example.com

# JSONL format - one JSON object per line, ideal for piping
julius probe -o jsonl https://target.example.com | jq '.service'

Model Discovery

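When Julius identifies an LLM service, it can also extract the models available on that endpoint. The command below filters the JSON output down to the models array; the full, unfiltered result for the same target follows it:

julius probe -o json https://ollama.example.com | jq '.models'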

{
  "target": "https://ollama.example.com",
  "service": "ollama",
  "models": ["llama2", "mistral", "codellama"]
}
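
Results can also be post-processed outside of jq. The Python sketch below groups JSONL records by detected service. It assumes each line is an object with the `target`, `service`, and `models` keys shown in the example object in this section; the real output may carry additional or differently named fields. `group_by_service` is a hypothetical helper and the sample records are invented.

```python
import json
from collections import defaultdict


def group_by_service(jsonl_text):
    """Map each detected service to the list of targets it was found on."""
    found = defaultdict(list)
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        service = record.get("service")
        if service:
            found[service].append(record.get("target"))
    return dict(found)


# Sample records shaped like the example object; not real scan output.
sample = "\n".join([
    '{"target": "https://ollama.example.com", "service": "ollama", "models": ["llama2"]}',
    '{"target": "https://gw.example.com", "service": "litellm", "models": []}',
])
print(group_by_service(sample))
```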

Advanced Options

# Adjust concurrency (default: 10)
julius probe -c 20 https://target.example.com

# Increase timeout for slow endpoints (default: 5 seconds)
julius probe -t 10 https://target.example.com

# Use custom probe definitions
julius probe -p ./my-probes https://target.example.com

# Verbose output for debugging
julius probe -v https://target.example.com

# Quiet mode - only show matches
julius probe -q https://target.example.com

# List all available probes
julius list

How It Works

Julius uses HTTP-based service fingerprinting to identify LLM platforms:

flowchart LR
    A[Target URL] --> B[Load Probes]
    B --> C[HTTP Requests]
    C --> D[Rule Matching]
    D --> E{Match?}
    E -->|Yes| F[Report Service]
    E -->|No| G[Try Next Probe]
    G --> C

    subgraph Scanner
        C
        D
        E
    end

Detection Process

Target Normalization: Validates and normalizes input URLs
Probe Selection: Prioritizes probes matching the target's port
HTTP Probing: Sends requests to service-specific endpoints
Rule Matching: Compares responses against signature patterns
Specificity Scoring: Orders results by most specific match first
Model Extraction: Optionally retrieves available models via JQ expressions
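
The first two steps can be sketched in Python as follows. This is an illustration under stated assumptions, not Julius's Go code: the helper names are hypothetical, the default scheme of `http` for bare `host:port` targets is a guess, and probes are modeled as plain dictionaries with the `port_hint` field used in probe files.

```python
from urllib.parse import urlsplit


def normalize_target(raw, default_scheme="http"):
    """Add a scheme when missing and strip any trailing slash."""
    raw = raw.strip()
    if "://" not in raw:
        raw = f"{default_scheme}://{raw}"  # assumed default, see lead-in
    parts = urlsplit(raw)
    if not parts.hostname:
        raise ValueError(f"invalid target: {raw!r}")
    return raw.rstrip("/")


def target_port(url):
    """Return the explicit port, or the scheme default when none is given."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return 443 if parts.scheme == "https" else 80


def prioritize(probes, port):
    """Put probes whose port hint equals the target port first."""
    # sorted() is stable, so probes keep their relative order within each group.
    return sorted(probes, key=lambda p: p["port_hint"] != port)


probes = [
    {"name": "vllm", "port_hint": 8000},
    {"name": "ollama", "port_hint": 11434},
    {"name": "litellm", "port_hint": 4000},
]
url = normalize_target("192.168.1.100:11434")
ordered = prioritize(probes, target_port(url))
print(url, [p["name"] for p in ordered])
# prints: http://192.168.1.100:11434 ['ollama', 'vllm', 'litellm']
```

Trying the port-matched probe first means the most likely service is checked before the rest, which is the point of the prioritization step.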

Match Rules

Each probe defines rules that must all match for identification:

Rule Type	Description	Example
`status`	HTTP status code	`200`, `404`
`body.contains`	Response body contains string	`"models":`
`body.prefix`	Response body starts with	`{"object":`
`content-type`	Content-Type header equals value	`application/json`
`header.contains`	Header contains value	`X-Custom: foo`
`header.prefix`	Header starts with value	`text/`

All rules support negation with not: true.
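
A minimal Python sketch of this rule logic is shown below, covering four of the rule types and negation. It is not Julius's Go rule engine: the `Response` type and function names are hypothetical, header names are assumed to be stored in lowercase, and the two header rule types are left out because their configuration shape is not shown here.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    status: int
    body: str = ""
    headers: dict = field(default_factory=dict)  # assumed lowercase keys


def rule_matches(rule, resp):
    """Evaluate one rule, then apply negation if the rule sets not: true."""
    kind, value = rule["type"], rule["value"]
    if kind == "status":
        hit = resp.status == value
    elif kind == "body.contains":
        hit = str(value) in resp.body
    elif kind == "body.prefix":
        hit = resp.body.startswith(str(value))
    elif kind == "content-type":
        hit = resp.headers.get("content-type", "") == value
    else:
        raise ValueError(f"unsupported rule type: {kind}")
    # A negated rule succeeds exactly when the plain check fails.
    return hit != rule.get("not", False)


def all_rules_match(rules, resp):
    """A request matches only when every one of its rules matches."""
    return all(rule_matches(rule, resp) for rule in rules)


resp = Response(200, '{"models": []}', {"content-type": "application/json"})
rules = [
    {"type": "status", "value": 200},
    {"type": "body.contains", "value": '"models":'},
    {"type": "body.prefix", "value": "<html", "not": True},
]
print(all_rules_match(rules, resp))  # prints: True
```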

Architecture

cmd/julius/          CLI entrypoint
pkg/
  runner/            Command execution (probe, list, validate)
  scanner/           HTTP client, response caching, model extraction
  rules/             Match rule engine (status, body, header patterns)
  output/            Formatters (table, JSON, JSONL)
  probe/             Probe loader (embedded YAML + filesystem)
  types/             Core data structures
probes/              YAML probe definitions (one per service)

Key Design Decisions

Concurrent scanning with bounded goroutine pools via errgroup
Response caching with MD5 deduplication and singleflight
Embedded probes compiled into binary for portability
Plugin-style rules for easy extension
Port-based prioritization for faster identification
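
The first two decisions can be approximated in Python for illustration. Julius uses Go's errgroup and singleflight; this sketch swaps in `ThreadPoolExecutor` for the bounded pool and a per-key `Future` for the in-flight deduplication. The `ResponseCache` class is hypothetical, and keying the cache on an MD5 digest of method plus URL is an assumption, since the list above does not say what the digest covers.

```python
import hashlib
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class ResponseCache:
    """Fetch each distinct request once, even when callers race."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable(method, url) -> response
        self._lock = threading.Lock()
        self._entries = {}   # cache key -> Future holding the response

    def get(self, method, url):
        # Assumption: the cache key is an MD5 digest of method + URL.
        key = hashlib.md5(f"{method} {url}".encode()).hexdigest()
        with self._lock:
            future = self._entries.get(key)
            owner = future is None
            if owner:
                future = self._entries[key] = Future()
        if owner:
            # Only the first caller for a key performs the real fetch.
            try:
                future.set_result(self._fetch(method, url))
            except Exception as exc:
                future.set_exception(exc)
        return future.result()  # other callers block here until it is ready


calls = []


def fake_fetch(method, url):
    calls.append(url)  # record how many real fetches happen
    return f"response for {url}"


cache = ResponseCache(fake_fetch)
urls = ["http://a.example/v1/models"] * 5 + ["http://b.example/health"]

# max_workers bounds concurrency, similar in spirit to the -c option.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(lambda u: cache.get("GET", u), urls))

print(len(results), sorted(calls))
```

Six lookups produce only two real fetches here, which is why caching matters when several probes request the same path on one target.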

Adding Custom Probes

Create a YAML file in probes/ to detect new LLM services:

name: my-llm-service
description: My custom LLM service detection
category: self-hosted
port_hint: 8080
api_docs: https://example.com/api-docs

requests:
  - path: /health
    method: GET
    match:
      - type: status
        value: 200
      - type: body.contains
        value: '"service":"my-llm"'

  - path: /api/version
    method: GET
    match:
      - type: status
        value: 200
      - type: content-type
        value: application/json

models:
  path: /api/models
  method: GET
  extract: ".models[].name"

Validate your probe:

julius validate ./probes

See CONTRIBUTING.md for the complete probe specification.
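
To make the extract expression in the probe above concrete, the Python sketch below evaluates a tiny subset of jq syntax (dotted keys with optional `[]` iteration) against a sample body. The `extract` helper and the response body are hypothetical; Julius's own evaluation of these expressions may support far more of the language.

```python
import json


def extract(expr, data):
    """Evaluate a tiny jq subset: dotted keys with optional [] iteration."""
    values = [data]
    for segment in expr.strip(".").split("."):
        iterate = segment.endswith("[]")
        key = segment[:-2] if iterate else segment
        next_values = []
        for value in values:
            item = value[key]
            if iterate:
                next_values.extend(item)  # fan out over each array element
            else:
                next_values.append(item)
        values = next_values
    return values


# Hypothetical response body from the /api/models path in the probe above.
body = '{"models": [{"name": "llama2"}, {"name": "mistral"}]}'
print(extract(".models[].name", json.loads(body)))
# prints: ['llama2', 'mistral']
```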

FAQ

What is LLM service fingerprinting?

LLM service fingerprinting identifies what LLM server software (Ollama, vLLM, LiteLLM, etc.) is running on a network endpoint. This differs from model fingerprinting, which identifies which AI model generated a piece of text.

Julius answers: "What server is running on this port?" Model fingerprinting answers: "Which LLM wrote this text?"

How is Julius different from Shodan-based detection?

Tools like Cisco's Shodan-based Ollama detector query internet-wide scan databases. Julius instead actively probes specific targets you control and runs locally without external dependencies. It also detects 60+ services rather than a single one.

Is Julius safe for penetration testing?

Yes. Julius only sends standard HTTP requests - the same as a web browser or curl. It does not:

Exploit vulnerabilities
Attempt authentication bypass
Perform denial of service
Modify or delete data
Execute code on targets

Always ensure you have authorization before scanning targets.

How do I add support for a new LLM service?

Create a YAML probe file in probes/ (e.g., probes/my-service.yaml)
Define HTTP requests with match rules
Validate with julius validate ./probes
Test against a live instance
Submit a pull request

See CONTRIBUTING.md for detailed examples.

Why doesn't Julius detect my LLM service?

Common reasons:

Non-default port: Try specifying the full URL with port
Authentication required: Julius doesn't handle auth; the endpoint may be protected
Custom configuration: The service may have non-standard API paths
Unsupported service: Consider adding a custom probe

Verify the target URL is correct and accessible
Check if the service requires authentication
Try with verbose mode: julius probe -v https://target
The service may not be in Julius's probe database - consider adding a custom probe