Usage Guide

June 8, 2026

Argus provides two independent tools: a CLI for quick JVM diagnostics and an Agent with a real-time web dashboard.

Argus CLI

Standalone diagnostic commands that work on any running JVM process. No agent attachment or application restart required.

How it works: Argus CLI uses jcmd, jstat, and JDK management APIs to query target JVM processes externally. Works on Java 11+.
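
To make the external-query approach concrete, here is a minimal sketch that shells out to jcmd from Java. It is not Argus's implementation: the class and method names are hypothetical, and it assumes a JDK's jcmd is on the PATH. GC.heap_info is a standard jcmd diagnostic command.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper, not part of Argus: queries a JVM from the outside via jcmd.
final class JcmdProbe {

    /** Builds the argv for one jcmd diagnostic command, e.g. "GC.heap_info". */
    static List<String> command(long pid, String diagnostic) {
        return List.of("jcmd", Long.toString(pid), diagnostic);
    }

    /** Runs the command and returns its combined output, or a short note if it cannot run. */
    static String run(long pid, String diagnostic) {
        try {
            Process p = new ProcessBuilder(command(pid, diagnostic))
                    .redirectErrorStream(true)   // merge stderr into stdout
                    .start();
            String out = new String(p.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
            p.waitFor();
            return out;
        } catch (IOException e) {
            return "jcmd not available: " + e.getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }
}
```

Calling `JcmdProbe.run(pid, "GC.heap_info")` with a PID from argus ps returns the same kind of raw text an external tool would then parse; the target JVM needs no agent and no restart.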

Quick Start

# Install
curl -fsSL https://raw.githubusercontent.com/rlaope/argus/master/install.sh | bash

# List JVM processes
argus ps

# Diagnose a process (replace <pid> with actual PID)
argus info <pid>
argus gc <pid>
argus threads <pid>
argus histo <pid>
argus heap <pid>

Key Commands

Command	What it shows	When to use
`argus ps`	Running JVM processes (version, uptime)	Find the PID of your target app
`argus info`	JVM version, uptime, CPU%, flags	Verify target JVM details
`argus gc`	GC collector stats, pause times	Investigate GC pressure
`argus gcutil`	Memory pool utilization %	Check generation fill levels
`argus gcrun`	Trigger System.gc() remotely	Force GC before heap dump
`argus heap`	Heap used/committed/max	Spot memory leaks
`argus histo`	Top heap-consuming classes	Find what's eating memory
`argus buffers`	NIO direct/mapped buffer pools	Diagnose direct buffer leaks
`argus threads`	Thread count by state, daemon, peak	Detect thread leaks or deadlocks
`argus threaddump`	Full thread dump with stack traces	jstack replacement
`argus deadlock`	Deadlocked threads	Production hang diagnosis
`argus sc`	Search loaded classes by pattern	Classpath conflicts
`argus logger`	View/change log levels at runtime	Production debugging
`argus events`	VM internal event log	Safepoint/deopt analysis
`argus compilerqueue`	JIT compilation queue	Slow startup diagnosis
`argus jfranalyze`	Analyze JFR recording files	Post-mortem analysis
`argus metaspace`	Metaspace usage	ClassLoader leak detection
`argus nmt`	Native memory breakdown	Off-heap memory investigation
`argus profile`	CPU profiling (async-profiler)	Find hot code paths
`argus doctor`	Health diagnosis with tuning recs	"Why is my app slow?"
`argus gclog`	GC log analysis (GCEasy alternative)	GC tuning
`argus flame`	One-shot flame graph + browser open	CPU hotspot analysis
`argus watch`	Real-time terminal dashboard	Continuous monitoring
`argus suggest`	JVM flag optimization by workload	Configuration tuning

All commands support --format=json for scripting and --source=auto|agent|jdk to choose the data source.

Argus Agent (Dashboard)

Real-time JVM monitoring via a web dashboard. Attach it to your application as a Java agent.

How it works: The agent uses JFR (Java Flight Recorder) streaming to capture GC, CPU, memory, thread, and virtual thread events with minimal overhead. Data is served via a built-in Netty web server.
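
For readers unfamiliar with JFR streaming, the sketch below subscribes to two standard JFR events in the current JVM using the JDK's `jdk.jfr.consumer.RecordingStream` (available since JDK 14). It only illustrates the mechanism; the class name is hypothetical and the events Argus actually enables may differ.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;
import jdk.jfr.consumer.RecordingStream;

// Illustrative only: not the Argus agent, just the JDK streaming API it is described as using.
final class JfrStreamSketch {

    /** Streams GC and CPU-load events for the given duration; returns how many events arrived. */
    static int streamFor(Duration duration) {
        AtomicInteger seen = new AtomicInteger();
        try (RecordingStream rs = new RecordingStream()) {
            rs.enable("jdk.GarbageCollection");
            rs.enable("jdk.CPULoad").withPeriod(Duration.ofSeconds(1));

            rs.onEvent("jdk.GarbageCollection", e -> {
                seen.incrementAndGet();
                System.out.printf("GC %s cause=%s pause=%d ms%n",
                        e.getValue("name"), e.getValue("cause"),
                        e.getDuration("sumOfPauses").toMillis());
            });
            rs.onEvent("jdk.CPULoad", e -> {
                seen.incrementAndGet();
                System.out.printf("CPU jvmUser=%.2f machineTotal=%.2f%n",
                        e.getFloat("jvmUser"), e.getFloat("machineTotal"));
            });

            rs.startAsync();                 // events are delivered on a background thread
            System.gc();                     // try to provoke a GC event for the demo
            rs.awaitTermination(duration);   // wait out the window; the stream closes on exit
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
        return seen.get();
    }
}
```

The count can be zero on a quiet JVM or when explicit GC is disabled, so treat the return value as informational rather than as a health signal.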

Quick Start

# Start your app with Argus agent
java -javaagent:argus-agent.jar \
  -Dargus.gc.enabled=true \
  -Dargus.cpu.enabled=true \
  -jar your-app.jar

# Open dashboard (macOS; on Linux use xdg-open)
open http://localhost:9202/

Dashboard Sections

Section	Description	Java Version
JVM Health	GC pause timeline, heap usage, GC overhead	17+
CPU Utilization	JVM and system CPU load over time	17+
GC Cause Distribution	Breakdown of why GC is triggered	17+
Allocation & Metaspace	Allocation rate, metaspace growth, top allocating classes	21+
Carrier Thread Distribution	Virtual thread load across carrier threads	21+
Profiling & Contention	Hot methods, lock contention hotspots	21+
Flame Graph	Interactive CPU flame graph from execution samples	21+
Recommendations	Auto-generated insights from cross-correlation analysis	21+
Virtual Threads	Thread events, pinning alerts, hotspot analysis	21+

Interactive Console

Access /console.html from the dashboard header to run diagnostic commands directly in the browser. Commands execute on the attached JVM using MXBeans — no separate CLI needed.
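
As a rough picture of what an MXBean-backed command looks like, the following sketch counts threads by state and checks for deadlocks with the standard `java.lang.management` API. The class and method names are hypothetical and are not taken from the Argus source.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

// Illustrative only: the kind of in-process query a console command could run.
final class MxBeanSketch {

    /** Counts live threads by state, similar in spirit to a "threads" command. */
    static Map<Thread.State, Integer> threadsByState() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : threads.getThreadInfo(threads.getAllThreadIds())) {
            if (info != null) {   // a thread may exit between the two calls
                counts.merge(info.getThreadState(), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Returns the number of deadlocked threads (0 when none are found). */
    static int deadlockedThreadCount() {
        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        return ids == null ? 0 : ids.length;
    }
}
```

Because these calls run inside the attached JVM, they need no jcmd process and no attach step, which matches the "no separate CLI" behaviour described above.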

Dashboard Modes

The local web UI has two operational modes:

Mode	URL	Data source	Status shown
Standalone JVM	`/` on an attached Argus agent	WebSocket events plus REST snapshots	`Live stream: connected/disconnected`
Cluster selected pod	`/` served by the aggregator with `?pod=<pod-id>`	REST snapshot polling through `/pod/{id}`	`Snapshot polling: selected pod, scrape status, last scrape`

The Fleet, Profiles, Console, and Dashboard pages share the selected pod through the argus.console.pod browser storage key and, where useful, through query parameters:

Page	Context params
Dashboard	`?pod=<pod-id>`
Profiles	`?pod=<pod-id>&event=cpu`
Console	`?pod=<pod-id>`
Fleet	`#pod/<pod-id>`

The Dashboard's incident synopsis ranks current heap, GC, CPU, pinning, contention, allocation, and scrape signals so the first screen explains why a pod or JVM looks unhealthy before the detailed charts.

Configuration

All settings are passed as -D system properties:

Property	Default	Description
`argus.server.port`	`9202`	Dashboard port
`argus.gc.enabled`	`true`	GC event collection
`argus.cpu.enabled`	`true`	CPU sampling
`argus.cpu.interval`	`1000`	CPU sample interval (ms)
`argus.allocation.enabled`	`false`	Object allocation tracking (high overhead)
`argus.profiling.enabled`	`false`	Method profiling (high overhead)
`argus.contention.enabled`	`false`	Lock contention tracking
`argus.metrics.prometheus.enabled`	`true`	Prometheus `/prometheus` endpoint
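
The sketch below shows how -D flags surface inside a JVM as system properties, using the property names and defaults from the table. The class itself is hypothetical; how Argus reads its configuration internally is not documented here.

```java
// Illustrative only: reads a few of the documented properties with their documented defaults.
final class ArgusSettingsSketch {

    /** -Dargus.server.port=9300 overrides the default of 9202. */
    static int serverPort() {
        return Integer.getInteger("argus.server.port", 9202);
    }

    /** GC event collection is on unless -Dargus.gc.enabled=false is passed. */
    static boolean gcEnabled() {
        return Boolean.parseBoolean(System.getProperty("argus.gc.enabled", "true"));
    }

    /** Allocation tracking is off unless explicitly enabled. */
    static boolean allocationEnabled() {
        return Boolean.parseBoolean(System.getProperty("argus.allocation.enabled", "false"));
    }

    /** CPU sample interval in milliseconds. */
    static long cpuIntervalMillis() {
        return Long.getLong("argus.cpu.interval", 1000L);
    }
}
```

System properties must be set before the agent starts, so pass them on the java command line next to -javaagent rather than setting them from application code.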

Spring Boot Integration

Add argus-spring-boot-starter to your Spring Boot 3.2+ application:

implementation("io.argus:argus-spring-boot-starter:1.5.0")

Choose a posture: `argus.mode`

`argus.mode`	What runs	When to use
`full` (default)	JFR streaming + ArgusServer on port 9202 + every diagnostic bean + actuator endpoints	Demos, local development, dedicated monitoring sidecars
`diagnostics`	Doctor / GC log / GC score beans + actuator endpoints + optional scheduled doctor — no JFR stream, no port 9202	Production self-diagnosis without a daemon listener
`off`	Nothing	Equivalent to `argus.enabled=false`

argus:
  mode: diagnostics                # production-safe posture; the default is full

Programmatic API

The starter exposes three injectable services so application code can run Argus analyses on demand:

@Autowired DoctorService doctor;

@GetMapping("/admin/health/jvm")
public List<Finding> jvmHealth() {
    return doctor.diagnoseLocal();        // or doctor.diagnoseRemote(pid)
}

Bean	API
`DoctorService`	`diagnoseLocal()`, `diagnoseRemote(pid)`, `diagnose(snapshot)`, `lastCollectionWarnings()`
`GcLogAnalyzerService`	`analyze(Path)`, `analyze(List<GcEvent>)`
`GcScoreService`	`score(analysis)`, `score(analysis, gcAlgorithm)`, `inferAlgorithm(analysis)`

All three are registered with @ConditionalOnMissingBean, so applications can override them with custom implementations.

Actuator endpoints

HTTP	Returns
`GET /actuator/argus-doctor`	JVM health findings (severity histogram + suggested flags + full finding list) for the local JVM
`GET /actuator/argus-doctor/{pid}`	Same shape, for a remote JVM via `jcmd`
`GET /actuator/argus-gc`	GC log analysis + score (reads `argus.doctor.gc-log-path`; status-coded response when the path is unset / missing / unparseable)
`GET /actuator/health/argus`	JFR engine + server status (only present in `mode=full`)

Opt in with the standard Actuator gate:

management:
  endpoints:
    web:
      exposure:
        include: argus-doctor,argus-gc,health
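
Once the endpoints are exposed, any HTTP client can read them. The sketch below uses the JDK's `java.net.http` client (Java 11+). The base URL is an assumption about where your application listens, the class name is hypothetical, and the body is returned as raw text because the exact JSON field names are not specified in this guide.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Illustrative client; baseUrl (e.g. "http://localhost:8080") is an assumption.
final class ArgusActuatorClient {

    /** Builds a GET request for the local-JVM doctor endpoint. */
    static HttpRequest doctorRequest(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/actuator/argus-doctor"))
                .timeout(Duration.ofSeconds(5))
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    /** Sends the request and returns the raw body, or null on a non-200 status or any failure. */
    static String fetchDoctor(String baseUrl) {
        try {
            HttpResponse<String> resp = HttpClient.newHttpClient()
                    .send(doctorRequest(baseUrl), HttpResponse.BodyHandlers.ofString());
            return resp.statusCode() == 200 ? resp.body() : null;
        } catch (java.io.IOException e) {
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```

A 404 from this call usually means the endpoint id is missing from the exposure list above.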

Scheduled doctor + structured logging

Opt in to a background doctor that runs on a fixed interval and emits one slf4j log line per finding — designed for direct ingestion by Loki / Datadog / Vector / Logstash:

argus:
  doctor:
    schedule:
      enabled: true             # off by default
      interval-ms: 60000        # 1 min

Output (logger name argus.doctor, severity mapped to log level):

argus.doctor severity=CRITICAL category=GC      title="Frequent GC events: 350 events/min"
argus.doctor severity=WARNING  category=Threads title="Blocked thread ratio: 12%"
argus.doctor severity=INFO     category=Memory  title="Heap usage: 4.2 GB / 8.0 GB"
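
If your log pipeline does not parse key=value pairs natively, the lines above are simple to split by hand. This is a minimal sketch of such a parser, written against the three sample lines only; the class name is hypothetical and the real output may carry additional fields.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative parser for the key=value lines shown above; not part of Argus.
final class DoctorLineParser {

    // Matches key="quoted value with spaces" or key=bareword.
    private static final Pattern PAIR = Pattern.compile("(\\w+)=(\"([^\"]*)\"|\\S+)");

    /** Extracts the key=value pairs from one log line, preserving their order. */
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(line);
        while (m.find()) {
            // Group 3 is set only for quoted values; otherwise use the bare token in group 2.
            String value = m.group(3) != null ? m.group(3) : m.group(2);
            fields.put(m.group(1), value);
        }
        return fields;
    }
}
```

The leading logger name contains no equals sign, so it is skipped, and only severity, category, and title end up in the map.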

Use the diagnostics library outside Spring

argus-diagnostics ships as a framework-agnostic JAR — usable from Quarkus, Micronaut, IDE plugins, or plain java -jar apps:

implementation("io.argus:argus-diagnostics:1.5.0")

import io.argus.diagnostics.doctor.DoctorEngine;
import io.argus.diagnostics.doctor.JvmSnapshotCollector;

var findings = DoctorEngine.diagnose(JvmSnapshotCollector.collectLocal());

Diagnose a ZGC Outage

A step-by-step walkthrough for investigating a ZGC-related latency spike or OOM.

Step 0 — Capture a healthy baseline (pre-incident)

During a known-good window, capture a baseline snapshot so you have something to diff against later:

argus zgc <PID> --save=baseline.txt

Store this file somewhere accessible (e.g. /tmp/zgc-baseline.txt or a shared volume). If you already have a baseline from a previous incident window, skip this step.

Step 1 — Get a verdict in 30 seconds

argus zgc <PID>

argus zgc attaches to the JVM via JMX, starts a 30-second JFR recording with settings=profile, and prints a HEALTHY / WARNING / UNHEALTHY verdict with allocation stall counts, cycle overlap status, SoftMax breach detection, and STW pause averages. When stalls are present, the output also includes a "Top alloc sources during capture" block listing the top 5 allocation call sites from the same JFR recording, so a separate profiling step is unnecessary in most cases.

If the target JVM is not using ZGC, the command exits immediately with a message showing the active collector. Confirm the collector with argus gc <PID>, then enable ZGC with -XX:+UseZGC (JDK 15+) or generational ZGC with -XX:+UseZGC -XX:+ZGenerational (JDK 21–23).

Step 2 — Run doctor for cross-cutting findings

If the verdict is WARNING or UNHEALTHY, run the full health check:

argus doctor <PID>

argus doctor fires all health rules, including the ZGC-specific ZgcSoftMaxBreachRule (WARNING) and ZgcCycleOverlapRule (CRITICAL), alongside general heap, CPU, and thread rules. Exit code 2 means critical findings require immediate action.

Step 3 — Profile allocations if stalls are present

If Step 1 reported allocation stalls, review the Top alloc sources block first. If the hot site is your own code, investigate it directly. If you need a longer or more detailed profile:

argus profile <PID> --event=alloc --duration=30

This shows the top allocation call sites by stack frame. Address the top allocators first — reducing allocation rate is often more effective than raising -Xmx alone.

Step 4 — Apply recommendations and confirm

After tuning (raise -Xmx, set -XX:SoftMaxHeapSize, raise -XX:ConcGCThreads, or fix hot allocation sites), confirm the verdict improved. If you captured a baseline in Step 0, use diff for a precise comparison:

argus zgc <PID> --diff=baseline.txt

A healthy post-tuning run shows no REGRESSION rows in the diff, and a standalone run returns Verdict: HEALTHY — ZGC is keeping up. with no stalls and no overlap.

CI/CD Integration

Add JVM health checks to your pipeline. Exit codes are machine-readable: 0=pass, 1=warnings, 2=critical.

- name: JVM Health Check
  uses: rlaope/Argus/action@master
  with:
    command: ci
    fail-on: critical
    format: github-annotations

argus ci --pid=auto --fail-on=critical --format=summary
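
For pipelines driven from JVM-based build tooling, the exit-code contract above can be wrapped as follows. This is a sketch under stated assumptions: the class is hypothetical, treating unknown exit codes as critical is a choice made here rather than documented behaviour, and the gate logic only approximates what --fail-on does.

```java
// Illustrative mapping of the documented exit codes; not taken from the Argus source.
final class ArgusExitCodes {

    enum Level { PASS, WARNINGS, CRITICAL }

    /** Maps exit codes 0 and 1 to PASS and WARNINGS; 2 and anything unexpected to CRITICAL (assumption). */
    static Level fromExitCode(int code) {
        switch (code) {
            case 0:  return Level.PASS;
            case 1:  return Level.WARNINGS;
            default: return Level.CRITICAL;
        }
    }

    /** True when the observed level is at or above the level chosen to fail the build. */
    static boolean failsBuild(int exitCode, Level failOn) {
        return fromExitCode(exitCode).compareTo(failOn) >= 0;
    }

    /** Runs a command such as the argus ci invocation above; returns 2 if it cannot be started. */
    static int run(String... command) {
        try {
            return new ProcessBuilder(command).inheritIO().start().waitFor();
        } catch (java.io.IOException e) {
            return 2;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return 2;
        }
    }
}
```

For example, `failsBuild(run("argus", "ci", "--pid=auto", "--format=summary"), Level.CRITICAL)` lets warnings through while still blocking on critical findings.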

For profile regression gates between builds:

argus profile <pid> --duration=30 --save=before.json
# ... deploy new build ...
argus profile <pid> --duration=30 --save=after.json
argus profile-gate before.json after.json --threshold=5 --annotate=github

Monitoring Stack (Prometheus / OTLP / Docker)

Native Prometheus + Grafana integration. Deploy to Kubernetes with the included Helm chart or import docs/grafana-dashboard.json directly into Grafana.

# Prometheus scrape endpoint (no extra config needed)
curl http://localhost:9202/prometheus

# Export to OpenTelemetry Collector
java -javaagent:"$HOME/.argus/argus-agent.jar" \
     -Dargus.otlp.enabled=true \
     -Dargus.otlp.endpoint=http://localhost:4318/v1/metrics \
     -jar your-app.jar

# Docker — diagnose any JVM on the host
docker run --pid=host ghcr.io/rlaope/argus doctor
docker run --pid=host ghcr.io/rlaope/argus watch

The Grafana dashboard provides datasource, namespace, deployment, pod, and instance variables, incident-first rows, GC pause percentiles, and local drilldown links back to Fleet, Dashboard, Profiles, and Console.

Helm chart, Grafana dashboard JSON, and K8s setup: docs/kubernetes.md

Continuous ZGC monitoring during a deploy

Use this workflow to catch ZGC regressions introduced by a new release before declaring the deploy stable.

Before the deploy, capture a pre-deploy baseline:
```
argus zgc <PID> --save=pre-deploy.txt
```
After the deploy (allow 1–2 minutes for JVM warm-up), run 10 minutes of continuous monitoring at 60-second intervals:
```
argus zgc <PID> --watch=10 --interval=60
```
Each iteration prints a 1-line summary showing heap, cycles, stalls, and mark-end delta from the previous iteration. Every 5th iteration prints the full diagnosis table. Ctrl-C at any point stops the loop, cleans up the JFR recording, and prints a final summary.
If any iteration shows ✘ stalls or ⚠ committed heap growth, diff against the pre-deploy baseline before declaring success:
```
argus zgc <PID> --diff=pre-deploy.txt
```
Any REGRESSION row (✘) needs investigation. New stalls and SoftMax breaches are the most critical signals and should block the deploy from going fully live.