GPU Kill

December 27, 2025

A CLI tool for managing GPUs across NVIDIA, AMD, Intel, and Apple Silicon systems. Monitor, control, and secure your GPU infrastructure with ease.

Community & Support

Join our Discord community for discussions, support, and updates.

Features

Monitor GPUs: Real-time usage, memory, temperature, and processes
Kill Processes: Gracefully terminate stuck GPU processes
Security: Detect crypto miners and suspicious activity
Guard Mode: Policy enforcement to prevent resource abuse
Remote: Manage GPUs across multiple servers
Multi-Vendor: Works with NVIDIA, AMD, Intel, and Apple Silicon
AI Integration: MCP server for AI assistant integration

Requirements

Build Performance

For faster development builds:

# Fast release build (recommended for development)
cargo build --profile release-fast

# Standard release build (optimized for production)
cargo build --release

# Maximum optimization (slowest, best performance)
cargo build --profile release-max

Build times on typical hardware:

Debug build: ~3 seconds
Release-fast: ~28 seconds
Release: ~28 seconds (improved from 76 seconds)
Release-max: ~60+ seconds (maximum optimization)
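To compare these profiles on your own machine, a small timing helper can be sketched as follows. The function names `profile_args` and `time_build` are hypothetical and not part of the repository; only the `cargo build` invocations come from the commands above.

```shell
# Map a profile name from this README to the matching cargo arguments.
profile_args() {
  case "$1" in
    release) echo "--release" ;;
    release-fast|release-max) echo "--profile $1" ;;
    *) echo "unknown profile: $1" >&2; return 1 ;;
  esac
}

# Build with the given profile and print the elapsed wall-clock seconds.
time_build() {
  tb_args=$(profile_args "$1") || return 1
  tb_start=$(date +%s)
  # $tb_args is intentionally unquoted so "--profile NAME" splits into two words.
  cargo build $tb_args || return 1
  tb_end=$(date +%s)
  echo "$1: $((tb_end - tb_start))s"
}
```

Run `time_build release-fast` after `cargo clean` for a cold-build number; warm builds are much faster because Cargo reuses cached artifacts.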

System Dependencies

Linux (Ubuntu/Debian):

sudo apt install build-essential libssl-dev pkg-config

Linux (Fedora/RHEL/CentOS):

sudo dnf install gcc gcc-c++ pkg-config openssl-devel
# or for older systems:
# sudo yum install gcc gcc-c++ pkg-config openssl-devel

macOS:

# Install Xcode command line tools
xcode-select --install
# OpenSSL is included with macOS

Windows:

Install Visual Studio Build Tools
OpenSSL is handled automatically by vcpkg

GPU Drivers

NVIDIA: NVIDIA drivers installed
AMD: ROCm drivers installed
Intel: intel-gpu-tools package installed
Apple Silicon: macOS with Apple Silicon (M1/M2/M3/M4)

Build Requirements

OS: Linux, macOS, or Windows
Rust: 1.70+ (for building from source)
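Before building, it can help to confirm the toolchain meets the Rust 1.70+ requirement. The following is a minimal sketch, not a script shipped with GPU Kill; `version_ge` and `check_build_tools` are hypothetical helper names.

```shell
# Return 0 if version $1 is at least version $2.
# Compares major.minor only and expects both arguments to contain a dot.
version_ge() {
  vg_a_major=${1%%.*}; vg_a_rest=${1#*.}; vg_a_minor=${vg_a_rest%%.*}
  vg_b_major=${2%%.*}; vg_b_rest=${2#*.}; vg_b_minor=${vg_b_rest%%.*}
  [ "$vg_a_major" -gt "$vg_b_major" ] && return 0
  [ "$vg_a_major" -lt "$vg_b_major" ] && return 1
  [ "$vg_a_minor" -ge "$vg_b_minor" ]
}

# Check that cargo and rustc are on PATH and that rustc is 1.70 or newer.
check_build_tools() {
  cbt_missing=0
  for cbt_tool in cargo rustc; do
    if ! command -v "$cbt_tool" >/dev/null 2>&1; then
      echo "missing: $cbt_tool" >&2
      cbt_missing=1
    fi
  done
  [ "$cbt_missing" -eq 0 ] || return 1
  # "rustc --version" prints e.g. "rustc 1.75.0 (...)"; field 2 is the version.
  cbt_version=$(rustc --version | awk '{print $2}')
  if version_ge "$cbt_version" 1.70; then
    echo "rustc $cbt_version OK"
  else
    echo "rustc $cbt_version is older than 1.70" >&2
    return 1
  fi
}
```

Calling `check_build_tools` prints one line on success and returns a non-zero status otherwise, so it can gate a build step in a script.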

Quick Start

Install & Run

# Build from source (first build may take 2-3 minutes)
git clone https://github.com/treadiehq/gpu-kill.git
cd gpu-kill
cargo build --release

# Or install via Cargo
cargo install gpukill

# Or one-liner installers (recommended)
# macOS/Linux
curl -fsSL https://gpukill.com/install | sh
# Windows (PowerShell)
irm https://gpukill.com/install-windows | iex

# List your GPUs
gpukill --list

# Watch GPU usage in real-time
gpukill --list --watch

Dead-simple cheatsheet

# Live watch (alias)
gpukill watch            # = gpukill --list --watch

# Kill job by PID (positional alias)
gpukill 12345            # = gpukill --kill --pid 12345

# Free a specific GPU index (kill all jobs on GPU 0)
gpukill --kill --gpu 0   # add --batch to actually kill; preview without it

# Force reset a GPU (shorthand)
gpukill --reset 0        # = gpukill --reset --gpu 0

# Safe mode: dry-run first (no changes)
gpukill 12345 --safe     # alias: --dry-run
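The dry-run and kill commands above combine naturally into a "preview, then confirm" wrapper. This is a sketch only: `confirm_kill` is a hypothetical helper, and it assumes `gpukill` returns a non-zero exit status when the dry run fails, which this README does not state.

```shell
# Preview a kill with --safe, then ask before running the real command.
confirm_kill() {
  ck_pid=$1
  # Reject empty or non-numeric input before touching gpukill.
  case "$ck_pid" in
    ''|*[!0-9]*) echo "PID must be numeric: '$ck_pid'" >&2; return 2 ;;
  esac
  # Dry run first (no changes); assumed to fail with a non-zero status on error.
  gpukill "$ck_pid" --safe || return 1
  printf 'Kill PID %s for real? [y/N] ' "$ck_pid"
  read -r ck_answer
  case "$ck_answer" in
    y|Y) gpukill "$ck_pid" ;;
    *) echo "aborted" ;;
  esac
}
```

For example, `confirm_kill 12345` shows the dry-run output and only terminates the process if you answer `y`.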

GPU Kill Dashboard

The GPU Kill dashboard provides a modern web interface for GPU cluster monitoring. The dashboard is included in the repository for local development but is not required for core GPU Kill functionality.

Quick Start

# 1. Start the backend API server
gpukill --server --server-port 8080

# 2. In a new terminal, start the dashboard UI
cd dashboard
npm install  # First time only
npm run dev

# 3. Access the dashboard
open http://localhost:3000

Requirements:

Node.js 18+ and npm
GPU Kill backend server running (provides the API)

Note: You need both the backend server (port 8080) and frontend UI (port 3000) running for the dashboard to work.
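If the dashboard does not load, a quick way to see which of the two processes is missing is to probe both ports. The sketch below uses `curl` and treats any HTTP response as "up"; `service_up` and `check_dashboard` are hypothetical helper names, and no particular API path is assumed.

```shell
# Return 0 if something answers HTTP at the given URL.
# Any HTTP status counts as "up"; only connection failures are non-zero.
service_up() {
  curl --silent --output /dev/null --max-time 2 "$1"
}

# Report which of the two dashboard components is unreachable, if any.
check_dashboard() {
  cd_status=0
  service_up "http://localhost:8080" || {
    echo "backend API not reachable on port 8080" >&2
    cd_status=1
  }
  service_up "http://localhost:3000" || {
    echo "dashboard UI not reachable on port 3000" >&2
    cd_status=1
  }
  return "$cd_status"
}
```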

Dashboard Features

Real-time monitoring of all GPUs across your cluster
Security detection with threat analysis and risk scoring
Policy management for resource control and enforcement
Cluster overview with Magic Moment contention insights
Interactive controls for process management and GPU operations

Production Deployment

For production GPU monitoring solutions, check the Kill Suite website.

MCP Server

GPU Kill includes an MCP (Model Context Protocol) server that enables AI assistants to interact with GPU management functionality:

Resources: Read GPU status, processes, audit data, policies, and security scans
Tools: Kill processes, reset GPUs, scan for threats, create policies

# Start the MCP server
cargo run --release -p gpukill-mcp

# Server runs on http://localhost:3001/mcp
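MCP messages are JSON-RPC 2.0, so a rough smoke test is to post a `tools/list` request to the endpoint. This sketch assumes the server accepts a plain HTTP POST at that URL without a prior `initialize` handshake, which this README does not confirm; check mcp/README.md for the actual transport details. `mcp_payload` and `mcp_call` are hypothetical helper names.

```shell
# Build a minimal JSON-RPC 2.0 request body: mcp_payload METHOD ID
mcp_payload() {
  printf '{"jsonrpc":"2.0","id":%s,"method":"%s"}' "$2" "$1"
}

# POST the request to the MCP endpoint listed above (assumed plain HTTP POST).
mcp_call() {
  curl --silent --request POST \
    --header 'Content-Type: application/json' \
    --header 'Accept: application/json, text/event-stream' \
    --data "$(mcp_payload "$1" "${2:-1}")" \
    "http://localhost:3001/mcp"
}
```

With the server running, `mcp_call tools/list` should return the tool catalogue if the assumption about the transport holds.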

Usage

Ask your AI assistant to use the tools with prompts such as:

What GPUs do I have and what's their current usage?

Kill the Python process that's stuck on GPU 0

Kill all training processes that are using too much GPU memory

Show me GPU usage and kill any stuck processes

Scan for crypto miners and suspicious activity

Create a policy to limit user memory usage to 8GB

Reset GPU 1 because it's not responding

What processes are currently using my GPUs?

See mcp/README.md for detailed MCP server documentation.

Security & Policies

Detect Threats

# Scan for crypto miners and suspicious activity
gpukill --audit --rogue

# Configure detection rules
gpukill --audit --rogue-config

Policy Enforcement

# Enable Guard Mode
gpukill --guard --guard-enable

# Test policies safely
gpukill --guard --guard-test-policies
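One way to combine the two commands is to enable Guard Mode only after the policy test passes. This is a sketch under the assumption that `gpukill --guard --guard-test-policies` exits with a non-zero status when a policy fails, which is not documented here; `enable_guard_if_policies_pass` is a hypothetical helper name.

```shell
# Enable Guard Mode only when the policy test succeeds.
enable_guard_if_policies_pass() {
  # Assumption: a failed policy test is signalled by a non-zero exit status.
  if gpukill --guard --guard-test-policies; then
    gpukill --guard --guard-enable
  else
    echo "policy test failed; Guard Mode left unchanged" >&2
    return 1
  fi
}
```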

For detailed security and policy documentation, see DETAILED.md.

Remote Management

Manage GPUs across multiple servers via SSH:

# List GPUs on remote server
gpukill --remote staging-server --list

# Kill process on remote server
gpukill --remote prod-gpu-01 --kill --pid 1234

# Reset GPU on remote server
gpukill --remote gpu-cluster --reset --gpu 0
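To check several servers in one pass, the `--remote ... --list` command can be wrapped in a loop. The helper name `list_remote_gpus` is hypothetical, and the sketch assumes `gpukill` exits non-zero when a host cannot be reached.

```shell
# List GPUs on every host given as an argument; keep going on failures.
list_remote_gpus() {
  lrg_failed=0
  for lrg_host in "$@"; do
    echo "== $lrg_host =="
    gpukill --remote "$lrg_host" --list || {
      echo "failed: $lrg_host" >&2
      lrg_failed=1
    }
  done
  # Non-zero if at least one host failed.
  return "$lrg_failed"
}
```

For example, `list_remote_gpus staging-server prod-gpu-01` prints one section per host.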

Troubleshooting

Build Issues

OpenSSL not found:

# Ubuntu/Debian
sudo apt install build-essential libssl-dev pkg-config

# Fedora/RHEL/CentOS
sudo dnf install gcc gcc-c++ pkg-config openssl-devel

Other common build issues:

Ensure you have the latest Rust toolchain: rustup update
Clean and rebuild: cargo clean && cargo build --release
Check system dependencies are installed (see Requirements section)
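To narrow down an "OpenSSL not found" error on Linux, you can ask `pkg-config` directly whether it sees the OpenSSL development files. This sketch assumes the build locates OpenSSL through `pkg-config`, which is why it appears in the dependency lists above; `check_openssl` is a hypothetical helper name.

```shell
# Report whether pkg-config can find the OpenSSL development files.
check_openssl() {
  if ! command -v pkg-config >/dev/null 2>&1; then
    echo "pkg-config is not installed" >&2
    return 1
  fi
  if pkg-config --exists openssl; then
    echo "OpenSSL $(pkg-config --modversion openssl) found"
  else
    echo "OpenSSL development files not found by pkg-config" >&2
    return 1
  fi
}
```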

Need Help?

gpukill --help                    # Show all options
gpukill --version                 # Show version

CI/CD and Testing

GPU Kill uses a CI/CD pipeline with automatic GPU testing:

✅ Conditional GPU testing - Runs automatically when GPU hardware is available
✅ Multi-vendor GPU testing on real hardware (NVIDIA, AMD, Intel, Apple Silicon)
✅ Hot Aisle integration - Optional on-demand GPU instance provisioning for comprehensive testing
✅ Cross-platform compatibility testing
✅ Performance benchmarking and profiling
✅ Security auditing and compliance checks
✅ Stress testing for reliability validation

How GPU Testing Works

On GitHub hosted runners: GPU tests skip gracefully (no GPU hardware)
On self-hosted runners: GPU tests run automatically when GPU hardware is detected
On cloud instances: GPU tests run automatically when GPU hardware is available
On developer machines: GPU tests run automatically when GPU hardware is detected
Via Hot Aisle: On-demand GPU instance provisioning for comprehensive testing
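The "run when hardware is detected, otherwise skip" behaviour can be approximated in a local script. The sketch below is not the project's detection logic: it only checks for the presence of common vendor tools (`nvidia-smi`, `rocm-smi`, `intel_gpu_top`) or an Apple Silicon Mac, and the function names are hypothetical.

```shell
# Print a best-guess GPU vendor based on installed tooling, or "none".
detect_gpu_vendor() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    echo nvidia
  elif command -v rocm-smi >/dev/null 2>&1; then
    echo amd
  elif command -v intel_gpu_top >/dev/null 2>&1; then
    echo intel
  elif [ "$(uname -s)" = "Darwin" ] && [ "$(uname -m)" = "arm64" ]; then
    echo apple
  else
    echo none
  fi
}

# Run the GPU hardware tests only when some GPU tooling was detected.
run_gpu_tests_if_available() {
  rgt_vendor=$(detect_gpu_vendor)
  if [ "$rgt_vendor" = "none" ]; then
    echo "no GPU tooling detected; skipping GPU tests"
    return 0
  fi
  echo "detected $rgt_vendor GPU tooling; running GPU tests"
  cargo test --test gpu_hardware_tests
}
```

Tool presence is only a heuristic: a machine can have `nvidia-smi` installed without a working GPU, so treat a detected vendor as a hint rather than proof.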

Quick Setup

Option 1: Test Locally (Already Working)

cargo test --test gpu_hardware_tests  # Runs on your GPU hardware

Option 2: Set Up Cloud GPU (5 minutes)

# On any cloud GPU instance:
curl -sSL https://raw.githubusercontent.com/treadiehq/gpu-kill/main/scripts/setup-gpu-runner.sh | bash

Option 3: Self-Hosted Runner

See CI_CD.md for detailed information about our testing infrastructure and how to set up self-hosted runners with GPU hardware.

Option 4: Hot Aisle Integration (Optional)

# Build with Hot Aisle feature
cargo build --release --features hotaisle

# Integration tests run automatically (no API key required)
# For actual GPU testing:
# 1. Set up HOTAISLE_API_KEY in GitHub Secrets
# 2. Manually trigger "Hot Aisle GPU Testing" workflow
# 3. Tests run on real GPU hardware with automatic cleanup

Option 5: Cloud GPU Setup

See docs/CLOUD_GPU_SETUP.md for AWS, GCP, and Azure GPU instance setup.

Documentation

DETAILED.md - Complete documentation, API reference, and advanced features
CI_CD.md - CI/CD pipeline and testing infrastructure
docs/HOTAISLE_INTEGRATION.md - Hot Aisle integration guide
docs/CLOUD_GPU_SETUP.md - Cloud GPU setup guide (AWS, GCP, Azure)

License

This project is licensed under the FSL-1.1-MIT License. See the LICENSE file for details.