Usage Guide
June 8, 2026
Argus provides two independent tools: a CLI for quick JVM diagnostics and an Agent with a real-time web dashboard.
Argus CLI
Standalone diagnostic commands that work on any running JVM process. No agent attachment or application restart required.
How it works: Argus CLI uses jcmd, jstat, and JDK management APIs to query target JVM processes externally. Works on Java 11+.
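jcmd commands can also be reached through the JDK's `DiagnosticCommand` MBean, which is one way a management-API-based tool can run them without shelling out. The sketch below illustrates that JDK mechanism only; it is not Argus's implementation. The class name `DiagnosticCommands` is hypothetical, and the sketch targets the current JVM. Reaching another process would need a JMX connection or the Attach API.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical helper: runs a jcmd-style diagnostic command against THIS JVM
// through the com.sun.management DiagnosticCommand MBean. Not Argus code.
final class DiagnosticCommands {
    private static final String DCMD_MBEAN = "com.sun.management:type=DiagnosticCommand";

    private DiagnosticCommands() {}

    /**
     * Operation names are jcmd names in camelCase, for example
     * "VM.version" -> "vmVersion" and "GC.class_histogram" -> "gcClassHistogram".
     * Arguments are passed as a single String[] (empty when there are none).
     */
    static String run(String operation, String... args) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            Object output = server.invoke(
                    new ObjectName(DCMD_MBEAN),
                    operation,
                    new Object[] {args},
                    new String[] {String[].class.getName()});
            return (String) output;
        } catch (JMException e) {
            throw new IllegalStateException("diagnostic command failed: " + operation, e);
        }
    }

    public static void main(String[] unused) {
        System.out.println(run("vmVersion"));
        // In-process equivalent of `jcmd <pid> GC.class_histogram`, first lines only
        run("gcClassHistogram").lines().limit(12).forEach(System.out::println);
    }
}
```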
Quick Start
```bash
# Install
curl -fsSL https://raw.githubusercontent.com/rlaope/argus/master/install.sh | bash

# List JVM processes
argus ps

# Diagnose a process (replace <pid> with actual PID)
argus info <pid>
argus gc <pid>
argus threads <pid>
argus histo <pid>
argus heap <pid>
```
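`argus ps` lists local JVMs. For comparison, the JDK's own Attach API (module `jdk.attach`) can enumerate similar processes from Java. The sketch below is an illustration with a hypothetical `LocalJvms` class, not how Argus implements `ps`. It shows only what the descriptor exposes (ID and display name), not version or uptime. Typically only JVMs started by the same OS user are visible.

```java
import com.sun.tools.attach.VirtualMachine;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: lists attachable JVMs visible to the current user,
// roughly the information `jps -l` prints. Not Argus code.
final class LocalJvms {
    private LocalJvms() {}

    static List<String> describe() {
        return VirtualMachine.list().stream()
                .map(vm -> vm.id() + "\t" + vm.displayName())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        describe().forEach(System.out::println);
        System.out.println("current JVM pid: " + ProcessHandle.current().pid());
    }
}
```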
Key Commands
| Command | What it shows | When to use |
|---|---|---|
| `argus ps` | Running JVM processes (version, uptime) | Find the PID of your target app |
| `argus info` | JVM version, uptime, CPU%, flags | Verify target JVM details |
| `argus gc` | GC collector stats, pause times | Investigate GC pressure |
| `argus gcutil` | Memory pool utilization % | Check generation fill levels |
| `argus gcrun` | Trigger System.gc() remotely | Force GC before heap dump |
| `argus heap` | Heap used/committed/max | Spot memory leaks |
| `argus histo` | Top heap-consuming classes | Find what's eating memory |
| `argus buffers` | NIO direct/mapped buffer pools | Diagnose direct buffer leaks |
| `argus threads` | Thread count by state, daemon, peak | Detect thread leaks or deadlocks |
| `argus threaddump` | Full thread dump with stack traces | jstack replacement |
| `argus deadlock` | Deadlocked threads | Production hang diagnosis |
| `argus sc` | Search loaded classes by pattern | Classpath conflicts |
| `argus logger` | View/change log levels at runtime | Production debugging |
| `argus events` | VM internal event log | Safepoint/deopt analysis |
| `argus compilerqueue` | JIT compilation queue | Slow startup diagnosis |
| `argus jfranalyze` | Analyze JFR recording files | Post-mortem analysis |
| `argus metaspace` | Metaspace usage | ClassLoader leak detection |
| `argus nmt` | Native memory breakdown | Off-heap memory investigation |
| `argus profile` | CPU profiling (async-profiler) | Find hot code paths |
| `argus doctor` | Health diagnosis with tuning recs | "Why is my app slow?" |
| `argus gclog` | GC log analysis (GCEasy alternative) | GC tuning |
| `argus flame` | One-shot flame graph + browser open | CPU hotspot analysis |
| `argus watch` | Real-time terminal dashboard | Continuous monitoring |
| `argus suggest` | JVM flag optimization by workload | Configuration tuning |
All commands support `--format=json` for scripting and `--source=auto|agent|jdk` to choose the data source.
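For scripting from a JVM language, one option is to shell out to the CLI and capture its JSON. The sketch below makes several assumptions:

- `argus` is on the `PATH`.
- `--format=json` may follow the PID.
- The class name `ArgusJson` is hypothetical.

The JSON is returned raw because its schema is not described here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical wrapper around the argus CLI; assumes `argus` is on the PATH.
final class ArgusJson {
    private ArgusJson() {}

    /** Builds e.g. [argus, heap, 4242, --format=json]. */
    static List<String> command(String subcommand, long pid) {
        return List.of("argus", subcommand, Long.toString(pid), "--format=json");
    }

    /** Runs the command and returns stdout; stderr goes to this process's stderr. */
    static String run(String subcommand, long pid) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(subcommand, pid))
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        String json = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exit = process.waitFor();
        if (exit != 0) {
            // doctor and ci use non-zero exits to report findings; relax this check for them
            throw new IOException("argus " + subcommand + " exited with " + exit);
        }
        return json;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("heap", Long.parseLong(args[0])));
    }
}
```

Hand the returned string to whichever JSON library the project already uses.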
Argus Agent (Dashboard)
Real-time JVM monitoring via a web dashboard. Attach as a Java agent to your application.
How it works: The agent uses JFR (Java Flight Recorder) streaming to capture GC, CPU, memory, thread, and virtual thread events with minimal overhead. Data is served via a built-in Netty web server.
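For a feel of what JFR event streaming looks like, here is a minimal in-process sketch using the JDK's `jdk.jfr.consumer.RecordingStream` (JDK 14+). It subscribes to two standard JDK events, `jdk.GarbageCollection` and `jdk.CPULoad`. It illustrates the mechanism only; it is not the Argus agent's code, and the `JfrStreamSketch` name is hypothetical.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;
import jdk.jfr.consumer.RecordingStream;

// Hypothetical sketch of JFR event streaming inside the current JVM. Not Argus code.
final class JfrStreamSketch {
    private JfrStreamSketch() {}

    /** Streams GC and CPU-load events for the given window; returns the GC event count. */
    static int streamFor(Duration window) {
        AtomicInteger gcEvents = new AtomicInteger();
        try (RecordingStream stream = new RecordingStream()) {
            stream.enable("jdk.GarbageCollection");
            stream.enable("jdk.CPULoad").withPeriod(Duration.ofSeconds(1));
            stream.onEvent("jdk.GarbageCollection", event -> {
                gcEvents.incrementAndGet();
                System.out.printf("GC #%d took %d ms%n",
                        event.getLong("gcId"), event.getDuration().toMillis());
            });
            stream.onEvent("jdk.CPULoad", event ->
                    System.out.printf("CPU jvmUser=%.2f machineTotal=%.2f%n",
                            event.getFloat("jvmUser"), event.getFloat("machineTotal")));
            stream.startAsync();   // events are delivered on a background thread
            System.gc();           // provoke a GC so there is something to observe
            Thread.sleep(window.toMillis());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return gcEvents.get();
    }

    public static void main(String[] args) {
        System.out.println("GC events seen: " + streamFor(Duration.ofSeconds(3)));
    }
}
```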
Quick Start
```bash
# Start your app with Argus agent
java -javaagent:argus-agent.jar \
  -Dargus.gc.enabled=true \
  -Dargus.cpu.enabled=true \
  -jar your-app.jar

# Open dashboard
open http://localhost:9202/
```
Dashboard Sections
| Section | Description | Java Version |
|---|---|---|
| JVM Health | GC pause timeline, heap usage, GC overhead | 17+ |
| CPU Utilization | JVM and system CPU load over time | 17+ |
| GC Cause Distribution | Breakdown of why GC is triggered | 17+ |
| Allocation & Metaspace | Allocation rate, metaspace growth, top allocating classes | 21+ |
| Carrier Thread Distribution | Virtual thread load across carrier threads | 21+ |
| Profiling & Contention | Hot methods, lock contention hotspots | 21+ |
| Flame Graph | Interactive CPU flame graph from execution samples | 21+ |
| Recommendations | Auto-generated insights from cross-correlation analysis | 21+ |
| Virtual Threads | Thread events, pinning alerts, hotspot analysis | 21+ |
Interactive Console
Open `/console.html` from the dashboard header to run diagnostic commands directly in the browser. Commands execute on the attached JVM through MXBeans, so no separate CLI is needed.
Dashboard Modes
The local web UI has two operational modes:
| Mode | URL | Data source | Status shown |
|---|---|---|---|
| Standalone JVM | / on an attached Argus agent | WebSocket events plus REST snapshots | Live stream: connected/disconnected |
| Cluster selected pod | / served by the aggregator with ?pod=<pod-id> | REST snapshot polling through /pod/{id} | Snapshot polling: selected pod, scrape status, last scrape |
The Fleet, Profiles, Console, and Dashboard pages share the selected pod through the `argus.console.pod` browser storage key and, where useful, through URL parameters:
| Page | Context params |
|---|---|
| Dashboard | ?pod=<pod-id> |
| Profiles | `?pod= |
| Console | ?pod=<pod-id> |
| Fleet | #pod/<pod-id> |
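If you generate links to a selected pod from another tool (a runbook or chat bot, for instance), the context parameters in the table are enough. The minimal sketch below makes these assumptions:

- The dashboard is served at `/` and the console at `/console.html`.
- The aggregator listens on `localhost:9202`.

The Fleet page's own path is not given, so only its fragment is built. Profiles is left out because its parameters beyond `?pod=` are not listed. The `PodLinks` name is hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical link builder for the pod context parameters listed in the table.
final class PodLinks {
    private PodLinks() {}

    // URL-encode the id; plain Kubernetes pod names (lowercase letters, digits, '-', '.')
    // pass through unchanged
    private static String encode(String podId) {
        return URLEncoder.encode(podId, StandardCharsets.UTF_8);
    }

    static String dashboard(String podId) { return "/?pod=" + encode(podId); }

    static String console(String podId) { return "/console.html?pod=" + encode(podId); }

    /** Fragment to append to the Fleet page URL. */
    static String fleetFragment(String podId) { return "#pod/" + encode(podId); }

    public static void main(String[] args) {
        String base = "http://localhost:9202"; // assumed aggregator address
        System.out.println(base + dashboard("checkout-7f9c6d"));
        System.out.println(base + console("checkout-7f9c6d"));
    }
}
```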
The Dashboard's incident synopsis ranks current heap, GC, CPU, pinning, contention, allocation, and scrape signals so the first screen explains why a pod or JVM looks unhealthy before the detailed charts.
Configuration
All settings are passed as -D system properties:
| Property | Default | Description |
|---|---|---|
| `argus.server.port` | 9202 | Dashboard port |
| `argus.gc.enabled` | true | GC event collection |
| `argus.cpu.enabled` | true | CPU sampling |
| `argus.cpu.interval` | 1000 | CPU sample interval (ms) |
| `argus.allocation.enabled` | false | Object allocation tracking (high overhead) |
| `argus.profiling.enabled` | false | Method profiling (high overhead) |
| `argus.contention.enabled` | false | Lock contention tracking |
| `argus.metrics.prometheus.enabled` | true | Prometheus `/prometheus` endpoint |
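The table maps directly onto property lookups with defaults. The sketch below is a hypothetical `ArgusAgentSettings` holder, not the agent's own class. It reads the same keys with the documented defaults, which helps check what a given set of `-D` flags resolves to. It parses booleans with `Boolean.parseBoolean`, which may be stricter or looser than the agent itself.

```java
import java.util.Properties;

// Hypothetical holder mirroring the documented argus.* keys and defaults. Not agent code.
final class ArgusAgentSettings {
    final int serverPort;
    final boolean gcEnabled;
    final boolean cpuEnabled;
    final long cpuIntervalMs;
    final boolean allocationEnabled;
    final boolean profilingEnabled;
    final boolean contentionEnabled;
    final boolean prometheusEnabled;

    private ArgusAgentSettings(Properties p) {
        serverPort = Integer.parseInt(p.getProperty("argus.server.port", "9202"));
        gcEnabled = flag(p, "argus.gc.enabled", true);
        cpuEnabled = flag(p, "argus.cpu.enabled", true);
        cpuIntervalMs = Long.parseLong(p.getProperty("argus.cpu.interval", "1000"));
        allocationEnabled = flag(p, "argus.allocation.enabled", false); // high overhead
        profilingEnabled = flag(p, "argus.profiling.enabled", false);   // high overhead
        contentionEnabled = flag(p, "argus.contention.enabled", false);
        prometheusEnabled = flag(p, "argus.metrics.prometheus.enabled", true);
    }

    private static boolean flag(Properties p, String key, boolean defaultValue) {
        return Boolean.parseBoolean(p.getProperty(key, Boolean.toString(defaultValue)));
    }

    static ArgusAgentSettings from(Properties p) { return new ArgusAgentSettings(p); }

    /** Resolves the settings from this JVM's -D system properties. */
    static ArgusAgentSettings fromSystemProperties() { return from(System.getProperties()); }

    public static void main(String[] args) {
        ArgusAgentSettings s = fromSystemProperties();
        System.out.printf("port=%d gc=%b cpu=%b every %d ms, allocation=%b profiling=%b%n",
                s.serverPort, s.gcEnabled, s.cpuEnabled, s.cpuIntervalMs,
                s.allocationEnabled, s.profilingEnabled);
    }
}
```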
Spring Boot Integration
Add argus-spring-boot-starter to your Spring Boot 3.2+ application:
```kotlin
implementation("io.argus:argus-spring-boot-starter:1.5.0")
```
Choose a posture: argus.mode
| `argus.mode` | What runs | When to use |
|---|---|---|
| `full` (default) | JFR streaming + ArgusServer on port 9202 + every diagnostic bean + actuator endpoints | Demos, local development, dedicated monitoring sidecars |
| `diagnostics` | Doctor / GC log / GC score beans + actuator endpoints + optional scheduled doctor; no JFR stream, no port 9202 listener | Production self-diagnosis without a daemon listener |
| `off` | Nothing | Equivalent to `argus.enabled=false` |
```yaml
argus:
  mode: diagnostics # recommended for production (the default is full)
```
Programmatic API
The starter exposes three injectable services so application code can run Argus analyses on demand:
```java
@RestController
class JvmHealthController {
    @Autowired DoctorService doctor; // injected by argus-spring-boot-starter

    @GetMapping("/admin/health/jvm")
    public List<Finding> jvmHealth() {
        return doctor.diagnoseLocal(); // or doctor.diagnoseRemote(pid)
    }
}
```
| Bean | API |
|---|---|
| `DoctorService` | `diagnoseLocal()`, `diagnoseRemote(pid)`, `diagnose(snapshot)`, `lastCollectionWarnings()` |
| `GcLogAnalyzerService` | `analyze(Path)`, `analyze(List<GcEvent>)` |
| `GcScoreService` | `score(analysis)`, `score(analysis, gcAlgorithm)`, `inferAlgorithm(analysis)` |
All three are @ConditionalOnMissingBean so applications can override with custom implementations.
Actuator endpoints
| HTTP | Returns |
|---|---|
| `GET /actuator/argus-doctor` | JVM health findings (severity histogram + suggested flags + full finding list) for the local JVM |
| `GET /actuator/argus-doctor/{pid}` | Same shape, for a remote JVM via jcmd |
| `GET /actuator/argus-gc` | GC log analysis + score (reads `argus.doctor.gc-log-path`; status-coded response when the path is unset / missing / unparseable) |
| `GET /actuator/health/argus` | JFR engine + server status (only present in `mode=full`) |
Opt in with the standard Actuator gate:
```yaml
management:
  endpoints:
    web:
      exposure:
        include: argus-doctor,argus-gc,health
```
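Once exposed, the endpoints are plain HTTP GETs. Below is a minimal client sketch with `java.net.http.HttpClient` (JDK 11+). The `ArgusDoctorClient` name is hypothetical. The example base URL assumes Actuator runs on Spring Boot's default port 8080 under the default `/actuator` base path. The body is returned as raw JSON rather than mapped to types.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical client for GET /actuator/argus-doctor and /actuator/argus-doctor/{pid}.
final class ArgusDoctorClient {
    private ArgusDoctorClient() {}

    /** pid == null targets the application's own JVM; otherwise the remote-JVM variant. */
    static URI doctorUri(String baseUrl, Long pid) {
        String base = baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
        return URI.create(base + "/actuator/argus-doctor" + (pid == null ? "" : "/" + pid));
    }

    static String fetch(String baseUrl, Long pid) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(doctorUri(baseUrl, pid))
                .header("Accept", "application/json")
                .GET()
                .build();
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("HTTP " + response.statusCode() + " from " + request.uri());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: Actuator is served on Spring Boot's default port 8080
        System.out.println(fetch("http://localhost:8080", null));
    }
}
```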
Scheduled doctor + structured logging
Opt in to a background doctor that runs on a fixed interval and emits one slf4j log line per finding, designed for direct ingestion by Loki / Datadog / Vector / Logstash:
```yaml
argus:
  doctor:
    schedule:
      enabled: true        # off by default
      interval-ms: 60000   # 1 min
```
Output (logger name argus.doctor, severity mapped to log level):
```text
argus.doctor severity=CRITICAL category=GC title="Frequent GC events: 350 events/min"
argus.doctor severity=WARNING category=Threads title="Blocked thread ratio: 12%"
argus.doctor severity=INFO category=Memory title="Heap usage: 4.2 GB / 8.0 GB"
```
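The lines are `key=value` pairs, with double quotes around values that contain spaces, so logfmt-style parsers (Loki's `logfmt` stage, Vector's `parse_logfmt`) can usually split them. To parse them in Java, for a test or a custom appender, a small regex is enough. The `DoctorLogLine` class is hypothetical and does not handle escaped quotes inside values.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the key=value layout shown above; escaped quotes are not supported.
final class DoctorLogLine {
    // key=value where value is either "a quoted string" or a run of non-space characters
    private static final Pattern PAIR = Pattern.compile("(\\w+)=(\"([^\"]*)\"|\\S+)");

    private DoctorLogLine() {}

    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(line);
        while (m.find()) {
            // group(3) is the unquoted content when the value was quoted
            String value = m.group(3) != null ? m.group(3) : m.group(2);
            fields.put(m.group(1), value);
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parse(
                "argus.doctor severity=WARNING category=Threads title=\"Blocked thread ratio: 12%\""));
    }
}
```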
Use the diagnostics library outside Spring
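argus-diagnostics ships as a framework-agnostic JAR, usable from Quarkus, Micronaut, IDE plugins, or plain `java -jar` apps:

```kotlin
implementation("io.argus:argus-diagnostics:1.5.0")
```

```java
import io.argus.diagnostics.doctor.DoctorEngine;
import io.argus.diagnostics.doctor.JvmSnapshotCollector;
// Snapshot the current JVM, then run every doctor rule against it
var findings = DoctorEngine.diagnose(JvmSnapshotCollector.collectLocal());
findings.forEach(System.out::println); // assumes a List of findings, like DoctorService.diagnoseLocal()
```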
Diagnose a ZGC Outage
A step-by-step walkthrough for investigating a ZGC-related latency spike or OOM.
Step 0 — Capture a healthy baseline (pre-incident)
During a known-good window, capture a baseline snapshot so you have something to diff against later:
```bash
argus zgc <PID> --save=baseline.txt
```

Store this file somewhere you can reach during an incident (e.g. `/tmp/zgc-baseline.txt` or a shared volume). If you already have a baseline from an earlier known-good window, skip this step.
Step 1 — Get a verdict in 30 seconds
```bash
argus zgc <PID>
```
`argus zgc` attaches to the JVM via JMX, starts a 30-second JFR recording with `settings=profile`, and prints a HEALTHY / WARNING / UNHEALTHY verdict covering allocation stall counts, cycle overlap status, SoftMax breach detection, and average STW pause times. When stalls are present, the output also includes a "Top alloc sources during capture" block listing the top-5 allocation call sites from the same JFR recording, so in most cases no separate profile step is needed.
If the target JVM is not using ZGC, the command exits immediately with a message showing the active collector. Confirm with argus gc <PID> and switch with -XX:+UseZGC (JDK 15+) or -XX:+UseZGC -XX:+ZGenerational (JDK 21–23).
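From inside a JVM, the same "is this ZGC?" check can be made with the standard `GarbageCollectorMXBean`s. ZGC registers beans whose names start with `ZGC`, for example `ZGC Cycles` and `ZGC Pauses`. The sketch below uses a hypothetical `CollectorCheck` class and inspects only the current JVM, whereas `argus gc <PID>` looks at another process.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical check of the active collector in the current JVM. Not Argus code.
final class CollectorCheck {
    private CollectorCheck() {}

    static List<String> activeCollectors() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
                .map(GarbageCollectorMXBean::getName)
                .collect(Collectors.toList());
    }

    /** ZGC's collector MXBean names start with "ZGC" (e.g. "ZGC Cycles", "ZGC Pauses"). */
    static boolean isZgc(List<String> collectorNames) {
        return collectorNames.stream().anyMatch(name -> name.startsWith("ZGC"));
    }

    public static void main(String[] args) {
        List<String> names = activeCollectors();
        System.out.println(names + " -> ZGC in use: " + isZgc(names));
    }
}
```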
Step 2 — Run doctor for cross-cutting findings
If the verdict is WARNING or UNHEALTHY, run the full health check:
```bash
argus doctor <PID>
```
argus doctor fires all health rules, including the ZGC-specific ZgcSoftMaxBreachRule (WARNING) and ZgcCycleOverlapRule (CRITICAL), alongside general heap, CPU, and thread rules. Exit code 2 means critical findings require immediate action.
Step 3 — Profile allocations if stalls are present
If Step 1 reported allocation stalls, review the Top alloc sources block first. If the hot site is your own code, investigate it directly. If you need a longer or more detailed profile:
```bash
argus profile <PID> --event=alloc --duration=30
```
This shows the top allocation call sites by stack frame. Address the top allocators first — reducing allocation rate is often more effective than raising -Xmx alone.
Step 4 — Apply recommendations and confirm
After tuning (raise `-Xmx`, set `-XX:SoftMaxHeapSize`, raise `-XX:ConcGCThreads`, or fix hot allocation sites), confirm the verdict improved. If you captured a baseline in Step 0, use `--diff` for a precise comparison:

```bash
argus zgc <PID> --diff=baseline.txt
```

A healthy post-tuning run shows no REGRESSION rows in the diff, and a standalone run prints `Verdict: HEALTHY — ZGC is keeping up.` with no stalls and no overlap.
CI/CD Integration
Add JVM health checks to your pipeline. Exit codes are machine-readable: 0=pass, 1=warnings, 2=critical.
```yaml
- name: JVM Health Check
  uses: rlaope/Argus/action@master
  with:
    command: ci
    fail-on: critical
    format: github-annotations
```

```bash
argus ci --pid=auto --fail-on=critical --format=summary
```
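Outside GitHub Actions, the 0/1/2 exit-code contract is easy to consume from any JVM-based build step. The sketch below runs `argus ci --pid=auto --format=summary` and applies its own fail threshold. `ArgusGate` and its `shouldFail` rule are hypothetical, written as one reading of what `--fail-on` does. Any exit code other than 0, 1, or 2 is treated as a tool error.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical CI gate on the documented exit codes: 0=pass, 1=warnings, 2=critical.
final class ArgusGate {
    enum Outcome { PASS, WARNINGS, CRITICAL } // declared in increasing severity

    private ArgusGate() {}

    static Outcome fromExitCode(int code) {
        switch (code) {
            case 0: return Outcome.PASS;
            case 1: return Outcome.WARNINGS;
            case 2: return Outcome.CRITICAL;
            default: throw new IllegalArgumentException("unexpected argus exit code " + code);
        }
    }

    /** Fails when the outcome is at least as severe as the threshold; PASS never fails. */
    static boolean shouldFail(Outcome outcome, Outcome failOn) {
        return outcome != Outcome.PASS && outcome.compareTo(failOn) >= 0;
    }

    static Outcome runCi() throws IOException, InterruptedException {
        Process process = new ProcessBuilder(List.of("argus", "ci", "--pid=auto", "--format=summary"))
                .inheritIO() // let the summary print to this console
                .start();
        return fromExitCode(process.waitFor());
    }

    public static void main(String[] args) throws Exception {
        Outcome outcome = runCi();
        System.out.println("argus ci: " + outcome);
        System.exit(shouldFail(outcome, Outcome.CRITICAL) ? 1 : 0);
    }
}
```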
For profile regression gates between builds:
```bash
argus profile <pid> --duration=30 --save=before.json
# ... deploy new build ...
argus profile <pid> --duration=30 --save=after.json
argus profile-gate before.json after.json --threshold=5 --annotate=github
```
Monitoring Stack (Prometheus / OTLP / Docker)
Native Prometheus + Grafana integration. Deploy to Kubernetes with the included Helm chart or import docs/grafana-dashboard.json directly into Grafana.
```bash
# Prometheus scrape endpoint (no extra config needed)
curl http://localhost:9202/prometheus

# Export to OpenTelemetry Collector
# ($HOME instead of ~: the shell does not expand ~ after "-javaagent:")
java -javaagent:$HOME/.argus/argus-agent.jar \
  -Dargus.otlp.enabled=true \
  -Dargus.otlp.endpoint=http://localhost:4318/v1/metrics \
  -jar your-app.jar

# Docker: diagnose any JVM on the host
docker run --pid=host ghcr.io/rlaope/argus doctor
docker run --pid=host ghcr.io/rlaope/argus watch
```
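Like any Prometheus scrape target, the `/prometheus` endpoint serves text samples (`name{labels} value` per line, with `#` comment lines), so a quick scripted check needs nothing beyond the JDK. The sketch below uses a hypothetical `PrometheusScrape` class. It fetches the endpoint on the default port 9202 and keeps sample lines that match a metric-name prefix you pass in. Argus's actual metric names are not listed here, so inspect the endpoint output to find them.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical scrape-and-filter helper for a Prometheus text endpoint.
final class PrometheusScrape {
    private PrometheusScrape() {}

    /** Keeps sample lines (skipping # HELP / # TYPE comments) whose metric name starts with the prefix. */
    static List<String> samples(String exposition, String metricPrefix) {
        return exposition.lines()
                .filter(line -> !line.isBlank() && !line.startsWith("#"))
                .filter(line -> line.startsWith(metricPrefix))
                .collect(Collectors.toList());
    }

    static String scrape(String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        String body = scrape("http://localhost:9202/prometheus");
        String prefix = args.length > 0 ? args[0] : ""; // empty prefix keeps every sample
        samples(body, prefix).forEach(System.out::println);
    }
}
```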
The Grafana dashboard provides datasource, namespace, deployment, pod, and instance variables, incident-first rows, GC pause percentiles, and local drilldown links back to Fleet, Dashboard, Profiles, and Console.
Helm chart, Grafana dashboard JSON, and K8s setup: docs/kubernetes.md
Continuous ZGC monitoring during a deploy
Use this workflow to catch ZGC regressions introduced by a new release before declaring the deploy stable.
- Before the deploy, capture a pre-deploy baseline:

  ```bash
  argus zgc <PID> --save=pre-deploy.txt
  ```

- After the deploy (allow 1–2 minutes for JVM warm-up), run 10 minutes of continuous monitoring at 60-second intervals:

  ```bash
  argus zgc <PID> --watch=10 --interval=60
  ```

  Each iteration prints a 1-line summary showing heap, cycles, stalls, and mark-end delta from the previous iteration. Every 5th iteration prints the full diagnosis table. Ctrl-C at any point stops the loop, cleans up the JFR recording, and prints a final summary.

- If any iteration shows ✘ stalls or ⚠ committed heap growth, diff against the pre-deploy baseline before declaring success:

  ```bash
  argus zgc <PID> --diff=pre-deploy.txt
  ```

  Any REGRESSION row (✘) needs investigation. New stalls and softMax breaches are the most critical signals and should block the deploy from going fully live.
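To turn the "any REGRESSION row blocks the deploy" rule into a pipeline step, one option is to capture the diff output and scan it for those rows. The sketch below assumes, without confirmation here, that regression rows contain the literal word `REGRESSION` or the ✘ marker. `ZgcDiffGate` is a hypothetical name. The exit status of `argus zgc --diff` itself is ignored because it is not documented above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical deploy gate over `argus zgc <PID> --diff=<baseline>` output.
final class ZgcDiffGate {
    private static final String CROSS_MARK = "\u2718"; // the ✘ marker

    private ZgcDiffGate() {}

    /** Returns the lines that look like regression rows (assumed markers, see lead-in). */
    static List<String> regressions(String diffOutput) {
        return diffOutput.lines()
                .filter(line -> line.contains("REGRESSION") || line.contains(CROSS_MARK))
                .collect(Collectors.toList());
    }

    static String runDiff(long pid, String baselineFile) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(
                List.of("argus", "zgc", Long.toString(pid), "--diff=" + baselineFile))
                .redirectErrorStream(true)
                .start();
        String output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        process.waitFor();
        return output;
    }

    public static void main(String[] args) throws Exception {
        List<String> hits = regressions(runDiff(Long.parseLong(args[0]), "pre-deploy.txt"));
        hits.forEach(System.out::println);
        System.exit(hits.isEmpty() ? 0 : 1); // non-zero tells the pipeline to hold the rollout
    }
}
```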