Usage Guide

June 8, 2026

Argus provides two independent tools: a CLI for quick JVM diagnostics and an Agent with a real-time web dashboard.


Argus CLI

Standalone diagnostic commands that work on any running JVM process. No agent attachment or application restart required.

How it works: Argus CLI uses jcmd, jstat, and JDK management APIs to query target JVM processes externally. Works on Java 11+.
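The external-query approach described above can be illustrated with a minimal Java sketch that shells out to `jcmd`. This is not Argus's own code: the class and method names are hypothetical, and `GC.heap_info` is just one standard `jcmd` diagnostic command chosen for illustration. It assumes a JDK `jcmd` binary is on the `PATH`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

final class JcmdQuerySketch {
    /** Builds the jcmd invocation for one diagnostic command against a PID. */
    static List<String> jcmdCommand(long pid, String diagnosticCommand) {
        return List.of("jcmd", Long.toString(pid), diagnosticCommand);
    }

    /** Runs the command and returns its combined stdout/stderr as text. */
    static String run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).redirectErrorStream(true).start();
        String output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        process.waitFor();
        return output;
    }

    public static void main(String[] args) throws Exception {
        // Default to this JVM's own PID so the sketch runs without arguments.
        long pid = args.length > 0 ? Long.parseLong(args[0]) : ProcessHandle.current().pid();
        System.out.println(run(jcmdCommand(pid, "GC.heap_info")));
    }
}
```

Because the query runs in a separate process, the target JVM needs no agent and no restart, which matches the behaviour the CLI advertises.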

Quick Start

# Install
curl -fsSL https://raw.githubusercontent.com/rlaope/argus/master/install.sh | bash

# List JVM processes
argus ps

# Diagnose a process (replace <pid> with actual PID)
argus info <pid>
argus gc <pid>
argus threads <pid>
argus histo <pid>
argus heap <pid>

Key Commands

| Command | What it shows | When to use |
|---|---|---|
| argus ps | Running JVM processes (version, uptime) | Find the PID of your target app |
| argus info | JVM version, uptime, CPU%, flags | Verify target JVM details |
| argus gc | GC collector stats, pause times | Investigate GC pressure |
| argus gcutil | Memory pool utilization % | Check generation fill levels |
| argus gcrun | Trigger System.gc() remotely | Force GC before heap dump |
| argus heap | Heap used/committed/max | Spot memory leaks |
| argus histo | Top heap-consuming classes | Find what's eating memory |
| argus buffers | NIO direct/mapped buffer pools | Diagnose direct buffer leaks |
| argus threads | Thread count by state, daemon, peak | Detect thread leaks or deadlocks |
| argus threaddump | Full thread dump with stack traces | jstack replacement |
| argus deadlock | Deadlocked threads | Production hang diagnosis |
| argus sc | Search loaded classes by pattern | Classpath conflicts |
| argus logger | View/change log levels at runtime | Production debugging |
| argus events | VM internal event log | Safepoint/deopt analysis |
| argus compilerqueue | JIT compilation queue | Slow startup diagnosis |
| argus jfranalyze | Analyze JFR recording files | Post-mortem analysis |
| argus metaspace | Metaspace usage | ClassLoader leak detection |
| argus nmt | Native memory breakdown | Off-heap memory investigation |
| argus profile | CPU profiling (async-profiler) | Find hot code paths |
| argus doctor | Health diagnosis with tuning recs | "Why is my app slow?" |
| argus gclog | GC log analysis (GCEasy alternative) | GC tuning |
| argus flame | One-shot flame graph + browser open | CPU hotspot analysis |
| argus watch | Real-time terminal dashboard | Continuous monitoring |
| argus suggest | JVM flag optimization by workload | Configuration tuning |

All commands support --format=json for scripting and --source=auto|agent|jdk to choose the data source.


Argus Agent (Dashboard)

Real-time JVM monitoring via a web dashboard. Attach as a Java agent to your application.

How it works: The agent uses JFR (Java Flight Recorder) streaming to capture GC, CPU, memory, thread, and virtual thread events with minimal overhead. Data is served via a built-in Netty web server.
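To make the JFR streaming idea concrete, here is a minimal sketch using the JDK's own `jdk.jfr.consumer.RecordingStream` API (JDK 14+). It subscribes to `jdk.GarbageCollection` events in the current JVM and prints each one. The class name and the `System.gc()` call that provokes an event are illustrative assumptions; the agent's actual event set and plumbing are not shown in this guide.

```java
import jdk.jfr.consumer.RecordingStream;

import java.time.Duration;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class JfrGcStreamSketch {
    /** Streams jdk.GarbageCollection events from this JVM and returns how many arrived. */
    static int observeGcEvents(Duration timeout) {
        AtomicInteger seen = new AtomicInteger();
        CountDownLatch firstEvent = new CountDownLatch(1);
        try (RecordingStream stream = new RecordingStream()) {
            stream.enable("jdk.GarbageCollection");
            stream.onEvent("jdk.GarbageCollection", event -> {
                Object name = event.getValue("name");
                Object cause = event.getValue("cause");
                long pauseMs = event.getDuration("longestPause").toMillis();
                System.out.printf("gc=%s cause=%s longestPause=%dms%n", name, cause, pauseMs);
                seen.incrementAndGet();
                firstEvent.countDown();
            });
            stream.startAsync();   // events are delivered on a background thread
            System.gc();           // provoke at least one collection for the demo
            firstEvent.await(timeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println("events observed: " + observeGcEvents(Duration.ofSeconds(15)));
    }
}
```

Streaming delivers events roughly once per flush interval rather than instantly, so a consumer like this typically sees a GC a second or so after it happens.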

Quick Start

# Start your app with Argus agent
java -javaagent:argus-agent.jar \
  -Dargus.gc.enabled=true \
  -Dargus.cpu.enabled=true \
  -jar your-app.jar

# Open dashboard
open http://localhost:9202/

Dashboard Sections

| Section | Description | Java Version |
|---|---|---|
| JVM Health | GC pause timeline, heap usage, GC overhead | 17+ |
| CPU Utilization | JVM and system CPU load over time | 17+ |
| GC Cause Distribution | Breakdown of why GC is triggered | 17+ |
| Allocation & Metaspace | Allocation rate, metaspace growth, top allocating classes | 21+ |
| Carrier Thread Distribution | Virtual thread load across carrier threads | 21+ |
| Profiling & Contention | Hot methods, lock contention hotspots | 21+ |
| Flame Graph | Interactive CPU flame graph from execution samples | 21+ |
| Recommendations | Auto-generated insights from cross-correlation analysis | 21+ |
| Virtual Threads | Thread events, pinning alerts, hotspot analysis | 21+ |

Interactive Console

Access /console.html from the dashboard header to run diagnostic commands directly in the browser. Commands execute on the attached JVM using MXBeans — no separate CLI needed.
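As a rough illustration of the kind of data MXBeans expose in-process, the sketch below reads heap and thread figures through the standard `java.lang.management` API. The class name and output format are hypothetical; the console's real command set and rendering are not documented here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadMXBean;

final class MxBeanSnapshotSketch {
    /** Heap figures in bytes; max is -1 when the JVM reports no defined maximum. */
    static String heapSummary() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return String.format("heap used=%d committed=%d max=%d",
                heap.getUsed(), heap.getCommitted(), heap.getMax());
    }

    /** Thread counts plus the number of threads currently in a deadlock cycle. */
    static String threadSummary() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] deadlocked = threads.findDeadlockedThreads(); // null when none are found
        return String.format("threads live=%d daemon=%d peak=%d deadlocked=%d",
                threads.getThreadCount(), threads.getDaemonThreadCount(),
                threads.getPeakThreadCount(), deadlocked == null ? 0 : deadlocked.length);
    }

    public static void main(String[] args) {
        System.out.println(heapSummary());
        System.out.println(threadSummary());
    }
}
```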

Dashboard Modes

The local web UI has two operational modes:

| Mode | URL | Data source | Status shown |
|---|---|---|---|
| Standalone JVM | / on an attached Argus agent | WebSocket events plus REST snapshots | Live stream: connected/disconnected |
| Cluster selected pod | / served by the aggregator with ?pod=<pod-id> | REST snapshot polling through /pod/{id} | Snapshot polling: selected pod, scrape status, last scrape |

The Fleet, Profiles, Console, and Dashboard pages share the selected pod through the argus.console.pod browser storage key and, where useful, through query parameters:

| Page | Context params |
|---|---|
| Dashboard | ?pod=<pod-id> |
| Profiles | ?pod=<pod-id>&event=cpu |
| Console | ?pod=<pod-id> |
| Fleet | #pod/<pod-id> |

The Dashboard's incident synopsis ranks current heap, GC, CPU, pinning, contention, allocation, and scrape signals so the first screen explains why a pod or JVM looks unhealthy before the detailed charts.

Configuration

All settings are passed as -D system properties:

| Property | Default | Description |
|---|---|---|
| argus.server.port | 9202 | Dashboard port |
| argus.gc.enabled | true | GC event collection |
| argus.cpu.enabled | true | CPU sampling |
| argus.cpu.interval | 1000 | CPU sample interval (ms) |
| argus.allocation.enabled | false | Object allocation tracking (high overhead) |
| argus.profiling.enabled | false | Method profiling (high overhead) |
| argus.contention.enabled | false | Lock contention tracking |
| argus.metrics.prometheus.enabled | true | Prometheus /prometheus endpoint |
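For readers unfamiliar with how `-D` flags surface inside a JVM, the following sketch resolves a few of the properties above with the documented defaults as fallbacks. It only illustrates standard system-property lookup; the class and helper names are hypothetical, and the agent's real configuration code may differ.

```java
final class ArgusPropertySketch {
    /** Dashboard port: -Dargus.server.port=N, falling back to the documented default. */
    static int serverPort() {
        return Integer.getInteger("argus.server.port", 9202);
    }

    /** Boolean switch such as argus.gc.enabled, with the table's default as fallback. */
    static boolean flag(String name, boolean defaultValue) {
        return Boolean.parseBoolean(System.getProperty(name, Boolean.toString(defaultValue)));
    }

    public static void main(String[] args) {
        // Try: java -Dargus.server.port=9300 -Dargus.allocation.enabled=true ArgusPropertySketch.java
        System.out.println("port=" + serverPort());
        System.out.println("gc=" + flag("argus.gc.enabled", true));
        System.out.println("allocation=" + flag("argus.allocation.enabled", false));
    }
}
```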

Spring Boot Integration

Add argus-spring-boot-starter to your Spring Boot 3.2+ application:

implementation("io.argus:argus-spring-boot-starter:1.5.0")

Choose a posture: argus.mode

| argus.mode | What runs | When to use |
|---|---|---|
| full (default) | JFR streaming + ArgusServer on port 9202 + every diagnostic bean + actuator endpoints | Demos, local development, dedicated monitoring sidecars |
| diagnostics | Doctor / GC log / GC score beans + actuator endpoints + optional scheduled doctor; no JFR stream, no port 9202 | Production self-diagnosis without a daemon listener |
| off | Nothing | Equivalent to argus.enabled=false |

argus:
  mode: diagnostics                # production-safe choice; overrides the default (full)

Programmatic API

The starter exposes three injectable services so application code can run Argus analyses on demand:

@Autowired DoctorService doctor;

@GetMapping("/admin/health/jvm")
public List<Finding> jvmHealth() {
    return doctor.diagnoseLocal();        // or doctor.diagnoseRemote(pid)
}

| Bean | API |
|---|---|
| DoctorService | diagnoseLocal(), diagnoseRemote(pid), diagnose(snapshot), lastCollectionWarnings() |
| GcLogAnalyzerService | analyze(Path), analyze(List<GcEvent>) |
| GcScoreService | score(analysis), score(analysis, gcAlgorithm), inferAlgorithm(analysis) |

All three are @ConditionalOnMissingBean so applications can override with custom implementations.

Actuator endpoints

| HTTP | Returns |
|---|---|
| GET /actuator/argus-doctor | JVM health findings (severity histogram + suggested flags + full finding list) for the local JVM |
| GET /actuator/argus-doctor/{pid} | Same shape, for a remote JVM via jcmd |
| GET /actuator/argus-gc | GC log analysis + score (reads argus.doctor.gc-log-path; status-coded response when the path is unset / missing / unparseable) |
| GET /actuator/health/argus | JFR engine + server status (only present in mode=full) |

Opt in with the standard Actuator gate:

management:
  endpoints:
    web:
      exposure:
        include: argus-doctor,argus-gc,health

Scheduled doctor + structured logging

Opt in to a background doctor that runs on a fixed interval and emits one slf4j log line per finding — designed for direct ingestion by Loki / Datadog / Vector / Logstash:

argus:
  doctor:
    schedule:
      enabled: true             # off by default
      interval-ms: 60000        # 1 min

Output (logger name argus.doctor, severity mapped to log level):

argus.doctor severity=CRITICAL category=GC      title="Frequent GC events: 350 events/min"
argus.doctor severity=WARNING  category=Threads title="Blocked thread ratio: 12%"
argus.doctor severity=INFO     category=Memory  title="Heap usage: 4.2 GB / 8.0 GB"
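If you consume these lines in your own tooling rather than a log pipeline, a small key=value parser is usually enough. The sketch below assumes the exact layout shown above (unquoted tokens plus a double-quoted title); the class name is hypothetical and the format may carry extra fields in practice.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DoctorLogLineSketch {
    // key=value where the value is either "double quoted" or a run of non-space characters.
    private static final Pattern FIELD = Pattern.compile("(\\w+)=(?:\"([^\"]*)\"|(\\S+))");

    /** Extracts the fields of one finding line, preserving their order. */
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher matcher = FIELD.matcher(line);
        while (matcher.find()) {
            String quoted = matcher.group(2);
            fields.put(matcher.group(1), quoted != null ? quoted : matcher.group(3));
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "argus.doctor severity=CRITICAL category=GC      "
                + "title=\"Frequent GC events: 350 events/min\"";
        // prints {severity=CRITICAL, category=GC, title=Frequent GC events: 350 events/min}
        System.out.println(parse(line));
    }
}
```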

Use the diagnostics library outside Spring

argus-diagnostics ships as a framework-agnostic JAR — usable from Quarkus, Micronaut, IDE plugins, or plain java -jar apps:

implementation("io.argus:argus-diagnostics:1.5.0")

import io.argus.diagnostics.doctor.DoctorEngine;
import io.argus.diagnostics.doctor.JvmSnapshotCollector;

var findings = DoctorEngine.diagnose(JvmSnapshotCollector.collectLocal());

Diagnose a ZGC Outage

A step-by-step walkthrough for investigating a ZGC-related latency spike or OOM.

Step 0 — Capture a healthy baseline (pre-incident)

During a known-good window, capture a baseline snapshot so you have something to diff against later:

argus zgc <PID> --save=baseline.txt

Store this file somewhere accessible (e.g. /tmp/zgc-baseline.txt or a shared volume). If you already have a baseline from a previous incident window, skip this step.

Step 1 — Get a verdict in 30 seconds

argus zgc <PID>

argus zgc attaches to the JVM via JMX, starts a 30-second JFR recording with settings=profile, and prints a HEALTHY / WARNING / UNHEALTHY verdict with allocation stall counts, cycle overlap status, SoftMax breach detection, and STW pause averages. When stalls are present, the output also shows a Top alloc sources during capture block with the top-5 allocation call sites from the same JFR recording — no separate profile step needed in most cases.
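The capture step can be approximated in-process with the JDK's `jdk.jfr` API. The sketch below records the current JVM with the built-in `profile` settings, dumps the recording, and counts `jdk.ZAllocationStall` events. It is only an illustration under stated assumptions: `argus zgc` attaches to a remote JVM over JMX and computes a fuller verdict, the class name is hypothetical, and the stall count is always zero on a JVM that is not running ZGC.

```java
import jdk.jfr.Configuration;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;

final class ZgcStallCaptureSketch {
    /** Records this JVM for the given window; returns {total events, allocation stall events}. */
    static long[] capture(Duration window) {
        try {
            Path dump = Files.createTempFile("zgc-capture", ".jfr");
            try (Recording recording = new Recording(Configuration.getConfiguration("profile"))) {
                recording.start();
                Thread.sleep(window.toMillis());
                recording.stop();
                recording.dump(dump);
            }
            long total = 0;
            long stalls = 0;
            for (RecordedEvent event : RecordingFile.readAllEvents(dump)) {
                total++;
                if ("jdk.ZAllocationStall".equals(event.getEventType().getName())) {
                    stalls++;
                }
            }
            Files.deleteIfExists(dump);
            return new long[] {total, stalls};
        } catch (Exception e) {
            // Sketch-level handling: surface any I/O, parse, or interrupt failure as one error.
            throw new IllegalStateException("JFR capture failed", e);
        }
    }

    public static void main(String[] args) {
        long[] counts = capture(Duration.ofSeconds(30)); // same window length as argus zgc
        System.out.printf("events=%d allocationStalls=%d%n", counts[0], counts[1]);
    }
}
```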

If the target JVM is not using ZGC, the command exits immediately with a message showing the active collector. Confirm with argus gc <PID> and switch with -XX:+UseZGC (JDK 15+) or -XX:+UseZGC -XX:+ZGenerational (JDK 21–23).

Step 2 — Run doctor for cross-cutting findings

If the verdict is WARNING or UNHEALTHY, run the full health check:

argus doctor <PID>

argus doctor fires all health rules, including the ZGC-specific ZgcSoftMaxBreachRule (WARNING) and ZgcCycleOverlapRule (CRITICAL), alongside general heap, CPU, and thread rules. Exit code 2 means critical findings require immediate action.

Step 3 — Profile allocations if stalls are present

If Step 1 reported allocation stalls, review the Top alloc sources block first. If the hot site is your own code, investigate it directly. If you need a longer or more detailed profile:

argus profile <PID> --event=alloc --duration=30

This shows the top allocation call sites by stack frame. Address the top allocators first — reducing allocation rate is often more effective than raising -Xmx alone.

Step 4 — Apply recommendations and confirm

After tuning (raise -Xmx, set -XX:SoftMaxHeapSize, raise -XX:ConcGCThreads, or fix hot allocation sites), confirm the verdict improved. If you captured a baseline in Step 0, use diff for a precise comparison:

argus zgc <PID> --diff=baseline.txt

A healthy post-tuning run shows no REGRESSION rows in the diff, and a standalone run returns Verdict: HEALTHY — ZGC is keeping up. with no stalls and no overlap.


CI/CD Integration

Add JVM health checks to your pipeline. Exit codes are machine-readable: 0=pass, 1=warnings, 2=critical.

- name: JVM Health Check
  uses: rlaope/Argus/action@master
  with:
    command: ci
    fail-on: critical
    format: github-annotations

argus ci --pid=auto --fail-on=critical --format=summary
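When wrapping the CLI in your own tooling, the exit codes map naturally onto an ordered severity. The sketch below shows one way to model that. The threshold rule (fail when the observed level is at or above the `fail-on` level) is an assumption about how `--fail-on` behaves, and all names are hypothetical.

```java
final class ExitCodeGateSketch {
    /** Ordered from least to most severe, matching exit codes 0, 1, 2. */
    enum Level { PASS, WARNINGS, CRITICAL }

    static Level fromExitCode(int code) {
        switch (code) {
            case 0: return Level.PASS;
            case 1: return Level.WARNINGS;
            case 2: return Level.CRITICAL;
            default: throw new IllegalArgumentException("unexpected exit code: " + code);
        }
    }

    /** Assumed rule: fail the gate when the observed level reaches the fail-on level. */
    static boolean shouldFail(Level observed, Level failOn) {
        return observed != Level.PASS && observed.compareTo(failOn) >= 0;
    }

    public static void main(String[] args) {
        for (int code = 0; code <= 2; code++) {
            Level level = fromExitCode(code);
            System.out.printf("exit=%d level=%s failOnCritical=%b%n",
                    code, level, shouldFail(level, Level.CRITICAL));
        }
    }
}
```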

For profile regression gates between builds:

argus profile <pid> --duration=30 --save=before.json
# ... deploy new build ...
argus profile <pid> --duration=30 --save=after.json
argus profile-gate before.json after.json --threshold=5 --annotate=github

Monitoring Stack (Prometheus / OTLP / Docker)

Native Prometheus + Grafana integration. Deploy to Kubernetes with the included Helm chart or import docs/grafana-dashboard.json directly into Grafana.

# Prometheus scrape endpoint (no extra config needed)
curl http://localhost:9202/prometheus

# Export to OpenTelemetry Collector
java -javaagent:"$HOME/.argus/argus-agent.jar" \
     -Dargus.otlp.enabled=true \
     -Dargus.otlp.endpoint=http://localhost:4318/v1/metrics \
     -jar your-app.jar

# Docker — diagnose any JVM on the host
docker run --pid=host ghcr.io/rlaope/argus doctor
docker run --pid=host ghcr.io/rlaope/argus watch
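For ad-hoc checks outside Prometheus, the scrape endpoint can be read directly, since it serves the plain-text exposition format. The sketch below fetches a URL with the JDK HTTP client and parses sample lines into a map. The metric names in the sample are hypothetical placeholders, not names Argus is known to export, and the parser handles only simple space-separated sample lines.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.LinkedHashMap;
import java.util.Map;

final class PrometheusTextSketch {
    /** Fetches the raw exposition text from a scrape endpoint. */
    static String fetch(String url) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
    }

    /** Maps "name{labels}" to its value, skipping comments and ignoring timestamps. */
    static Map<String, Double> parseSamples(String body) {
        Map<String, Double> samples = new LinkedHashMap<>();
        for (String raw : body.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) continue;
            int brace = line.lastIndexOf('}');
            int keyEnd = brace >= 0 ? brace + 1 : line.indexOf(' ');
            if (keyEnd <= 0) continue;
            String[] rest = line.substring(keyEnd).trim().split("\\s+");
            if (rest[0].isEmpty()) continue;
            samples.put(line.substring(0, keyEnd), parseValue(rest[0]));
        }
        return samples;
    }

    static double parseValue(String text) {
        switch (text) {
            case "+Inf": return Double.POSITIVE_INFINITY;
            case "-Inf": return Double.NEGATIVE_INFINITY;
            default: return Double.parseDouble(text);
        }
    }

    public static void main(String[] args) throws Exception {
        // Pass http://localhost:9202/prometheus to query a running agent.
        String body = args.length > 0 ? fetch(args[0])
                : "# TYPE demo_gc_pause_seconds_total counter\n"
                + "demo_gc_pause_seconds_total{collector=\"G1 Young\"} 1.25\n"
                + "demo_heap_used_bytes 4.2E9 1717849200000\n";
        parseSamples(body).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```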

The Grafana dashboard provides datasource, namespace, deployment, pod, and instance variables, incident-first rows, GC pause percentiles, and local drilldown links back to Fleet, Dashboard, Profiles, and Console.

Helm chart, Grafana dashboard JSON, and K8s setup: docs/kubernetes.md


Continuous ZGC monitoring during a deploy

Use this workflow to catch ZGC regressions introduced by a new release before declaring the deploy stable.

  1. Before the deploy, capture a pre-deploy baseline:

    argus zgc <PID> --save=pre-deploy.txt
    
  2. After the deploy (allow 1–2 minutes for JVM warm-up), run 10 minutes of continuous monitoring at 60-second intervals:

    argus zgc <PID> --watch=10 --interval=60
    

    Each iteration prints a 1-line summary showing heap, cycles, stalls, and mark-end delta from the previous iteration. Every 5th iteration prints the full diagnosis table. Ctrl-C at any point stops the loop, cleans up the JFR recording, and prints a final summary.

  3. If any iteration shows ✘ stalls or ⚠ committed heap growth, diff against the pre-deploy baseline before declaring success:

    argus zgc <PID> --diff=pre-deploy.txt
    

    Any REGRESSION row (✘) needs investigation. New stalls and softMax breaches are the most critical signals and should block the deploy from going fully live.