Overview

March 25, 2026

XPK (Accelerated Processing Kit, pronounced x-p-k) is a command-line interface that simplifies cluster creation and workload execution on Google Kubernetes Engine (GKE). XPK creates preconfigured, training-optimized clusters and lets you schedule workloads without any Kubernetes expertise.

XPK is recommended for quickly creating GKE clusters for proofs of concept and testing.

XPK decouples provisioning capacity from running jobs. It has two structures: clusters (provisioned VMs) and workloads (training jobs). Clusters represent the physical resources you have available. Workloads represent training jobs: at any given time, some have completed, others are running, and some are queued, waiting for cluster resources to become available.
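To make the split between clusters and workloads concrete, here is a toy Python model of a cluster as a fixed pool of accelerator chips with a queue of workloads in front of it. The Cluster and Workload classes are hypothetical illustrations, not XPK's API. In a real XPK cluster, Kueue (listed under Dependencies) decides when queued workloads are admitted, using policies more elaborate than the strict first-in, first-out rule sketched here.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Workload:
    """A training job and the number of accelerator chips it needs (toy model)."""
    name: str
    chips: int


class Cluster:
    """Hypothetical model of provisioned capacity; not XPK's real API."""

    def __init__(self, total_chips: int) -> None:
        self.free_chips = total_chips  # chips not used by any running job
        self.running: dict[str, Workload] = {}
        self.queue: deque[Workload] = deque()
        self.completed: list[str] = []

    def submit(self, workload: Workload) -> None:
        # Submitting never provisions hardware; the job waits for free chips.
        self.queue.append(workload)
        self._admit()

    def finish(self, name: str) -> None:
        # A completed job returns its chips to the shared pool.
        workload = self.running.pop(name)
        self.free_chips += workload.chips
        self.completed.append(name)
        self._admit()

    def _admit(self) -> None:
        # Strict FIFO (a simplification): start queued jobs in order while capacity allows.
        while self.queue and self.queue[0].chips <= self.free_chips:
            workload = self.queue.popleft()
            self.free_chips -= workload.chips
            self.running[workload.name] = workload


cluster = Cluster(total_chips=16)
cluster.submit(Workload("train-a", chips=8))
cluster.submit(Workload("train-b", chips=8))
cluster.submit(Workload("eval-c", chips=4))  # queued: no free chips left
cluster.finish("train-a")                    # frees 8 chips; eval-c is admitted
print(sorted(cluster.running), cluster.completed, cluster.free_chips)
# ['eval-c', 'train-b'] ['train-a'] 4
```

The cluster is created once with a fixed capacity; submitting and finishing workloads only moves chips between the free pool and running jobs, which mirrors the completed/running/queued states described above.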

The ideal workflow starts by provisioning clusters for all of the ML hardware you have reserved. Then, without re-provisioning, you submit jobs as needed. Because there is no re-provisioning between jobs, and because jobs use Docker containers with preinstalled dependencies and ahead-of-time cross-compilation, queued jobs start quickly. Further, because workloads return their hardware to the shared pool when they complete, developers can make better use of finite hardware resources. Automated tests can also run overnight, when resources tend to be underutilized.
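The provision-once, submit-many workflow can be scripted. The dry-run sketch below only builds and prints command lines. It assumes XPK's cluster create and workload create subcommands with the --cluster, --tpu-type, --num-slices, --reservation, --workload, --command, --project, and --zone flags as shown in the public XPK README, so confirm them against xpk cluster create --help and xpk workload create --help for your version. The project, zone, cluster, reservation, and script names are hypothetical placeholders.

```python
import shlex

# Hypothetical placeholder values: replace with your own.
PROJECT = "my-project"
ZONE = "my-zone"
CLUSTER = "xpk-test"
TPU_TYPE = "v5litepod-16"


def cluster_create_cmd(num_slices: int, reservation: str) -> list[str]:
    """Build the one-time provisioning command for reserved capacity."""
    return [
        "xpk", "cluster", "create",
        f"--cluster={CLUSTER}",
        f"--tpu-type={TPU_TYPE}",
        f"--num-slices={num_slices}",
        f"--reservation={reservation}",
        f"--project={PROJECT}",
        f"--zone={ZONE}",
    ]


def workload_create_cmd(name: str, command: str) -> list[str]:
    """Build a command that queues a job on the existing cluster."""
    return [
        "xpk", "workload", "create",
        f"--cluster={CLUSTER}",
        f"--workload={name}",
        f"--tpu-type={TPU_TYPE}",
        f"--command={command}",
        f"--project={PROJECT}",
        f"--zone={ZONE}",
    ]


if __name__ == "__main__":
    # Provision once...
    plan = [cluster_create_cmd(num_slices=2, reservation="my-reservation")]
    # ...then submit as many jobs as needed against the same cluster.
    plan += [workload_create_cmd(f"job-{i}", "python3 train.py") for i in range(3)]
    for argv in plan:
        # Dry run: print each command instead of executing it.
        print(shlex.join(argv))
```

To execute rather than print, each argv list could be passed to subprocess.run(argv, check=True). Keeping commands as lists avoids shell-quoting problems with the --command value.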

XPK supports a variety of hardware accelerators.

| Accelerator | Type | Recipes |
| --- | --- | --- |
| Ironwood | tpu7x | Run training workload with Ironwood and regular/gSC/DWS Calendar reservations using GCS Bucket storage; Run training workload with Ironwood with flex-start using Filestore storage; Run training workload with Ironwood and flex-start using Lustre storage |
| Trillium | v6e | Create Cluster; Create Workload |
| TPU v5p | v5p | Create Cluster; Create Workload |
| TPU v5e | v5e | Create Cluster; Create Workload |
| TPU v4 | v4 | Create Cluster; Create Workload |
| GPU A4X | gb200 | Create Cluster; Create Workload |
| GPU A4 | b200 | Create Cluster; Create Workload |
| GPU A3 Ultra | h200 | Create Cluster; Create Workload |
| GPU A3 Mega | h100-mega | Create Cluster; Create Workload |
| GPU A3 High | h100 | Create Cluster; Create Workload |
| GPU A100 | A100 | Create Cluster; Create Workload |
| CPU | n2-standard-32 | Create Cluster; Create Workload |

XPK also supports the following Google Cloud Storage solutions:

| Storage Type | Documentation |
| --- | --- |
| Cloud Storage FUSE | docs |
| Filestore | docs |
| Parallelstore | docs |
| Block storage (Persistent Disk, Hyperdisk) | docs |

Documentation

Dependencies

| Dependency | When used |
| --- | --- |
| Google Cloud SDK (gcloud) | Always |
| kubectl | Always (auto-installed) |
| ClusterToolkit | Provisioning GPU clusters (auto-installed) |
| Kueue | Scheduling workloads (auto-installed) |
| JobSet | Workload creation (auto-installed) |
| Crane | Building workload containers (auto-installed) |
| CoreDNS | Cluster setup (auto-installed) |
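Since only gcloud has to be installed by hand, a pre-flight check before running XPK can be small. The sketch below is a hypothetical helper, not part of XPK, and it only looks up command-line tools on PATH; Kueue, JobSet, and CoreDNS run inside the cluster, so a PATH lookup cannot detect them.

```python
import shutil
from typing import Callable, Iterable, Optional

# Hypothetical pre-flight helper, not part of XPK.
Which = Callable[[str], Optional[str]]


def missing_tools(names: Iterable[str], which: Which = shutil.which) -> list[str]:
    """Return the executables from names that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    # gcloud must be installed by you (Google Cloud SDK).
    for tool in missing_tools(["gcloud"]):
        print(f"Missing required tool: {tool}; install the Google Cloud SDK first.")
    # kubectl is listed as auto-installed, so its absence is only informational.
    for tool in missing_tools(["kubectl"]):
        print(f"{tool} not found on PATH; XPK is expected to install it automatically.")
```

The which parameter defaults to shutil.which but can be swapped for a stub, which keeps the check testable without touching the real PATH.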

Privacy notice

To help improve XPK, feature usage statistics are collected and sent to Google. You can opt out at any time by running the following shell command with the value false:

xpk config set send-telemetry <true/false>
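On automated machines such as CI runners, the same setting can be applied from a setup script. This is a minimal sketch assuming xpk is on PATH; telemetry_argv and set_telemetry are hypothetical helper names that only wrap the documented command, and the sketch defaults to a dry run.

```python
import subprocess


def telemetry_argv(enabled: bool) -> list[str]:
    """Build the documented command: xpk config set send-telemetry <true/false>."""
    return ["xpk", "config", "set", "send-telemetry", "true" if enabled else "false"]


def set_telemetry(enabled: bool, dry_run: bool = True) -> list[str]:
    """Return the command; run it only when dry_run is False (requires xpk on PATH)."""
    argv = telemetry_argv(enabled)
    if not dry_run:
        subprocess.run(argv, check=True)
    return argv


if __name__ == "__main__":
    # Opt out of usage statistics; prints: xpk config set send-telemetry false
    print(" ".join(set_telemetry(enabled=False)))
```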

Overall, XPK telemetry is handled in accordance with the Google Privacy Policy. When you use XPK to interact with or use GCP services, your information is handled in accordance with the Google Cloud Privacy Notice.

Contributing

Please read contributing.md for details on our code of conduct and the process for submitting pull requests.

Get involved

We'd love to hear from you! If you have questions or want to discuss ideas, join us on GitHub Discussions. Found a bug or have a feature request? Please let us know on GitHub Issues.

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.