AIStore K8s Deployment Guide

May 18, 2026

This document provides guidance for deploying AIStore clusters on Kubernetes (K8s).

Contents

  1. Prerequisites
  2. Deployment Steps
  3. Post-Deployment Steps
  4. Troubleshooting

Prerequisites

Generally, any recent version of K8s on a recent Linux OS will be sufficient for AIS. See the prerequisites doc to ensure your cluster is ready.

For network setup details, see the network configuration doc.

  • Ansible Host Config Playbooks: To help you prepare your hosts for AIStore, we include a set of Ansible playbooks for host configuration. For initial setup, we suggest following the ais_host_config_common guide, which tunes your system to meet AIStore's requirements.

  • Persistent Volumes:

    • The AIS Operator does NOT format disks or create persistent volumes -- we expect this to be done beforehand as it varies per deployment.
    • For details on PV requirements and the PVC naming convention, see Target Data Persistent Volumes.
    • See the create-pv Helm Chart for a reference template for creating node-local HostPath-type PVs.
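Because the operator expects PVs to exist before deployment, it can help to list what the cluster already has. The following is a minimal sketch, not part of the AIS tooling: list_pvs is a hypothetical helper name, and the column selection is only one reasonable choice.

```shell
# Hypothetical helper: list PersistentVolumes with the fields most
# relevant to checking readiness before an AIS deployment.
list_pvs() {
  kubectl get pv -o custom-columns='NAME:.metadata.name,CAPACITY:.spec.capacity.storage,STORAGECLASS:.spec.storageClassName,STATUS:.status.phase,CLAIM:.spec.claimRef.name'
}
```

Calling list_pvs prints one row per PV. Volumes intended for AIS targets would normally show as Available (or Bound, once a matching PVC claims them).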

Deployment Steps

Note: Please refer to the compatibility matrix for AIStore and ais-operator. We recommend, and only support, the latest versions of both.

Operator Deployment

With Kubernetes installed and the nodes properly configured, it's time to deploy the AIS Operator.

Operator Deployment Options:

Choose ONE of the following:

  1. Helm Chart -- Refer to the AIS Helm docs
  2. Local build (custom builds, development, and testing) -- Refer to the AIS Operator docs
  3. Default Manifest -- Apply a specific manifest with default values directly from the GitHub release artifact:
export AIS_OPERATOR_VERSION=v2.15.0
kubectl apply -f https://github.com/NVIDIA/ais-k8s/releases/download/$AIS_OPERATOR_VERSION/ais-operator.yaml

Wait for the operator to come up as ready:

kubectl wait --for=condition=available --timeout=120s deployment/ais-operator-controller-manager -n ais-operator-system

Optionally, use kubectl to check the status of the deployed pods:

kubectl get pods -n ais-operator-system

The AIS Operator pod should be in the Running state, indicating a successful deployment.

Once deployed, the AIS Operator will reconcile the state of any deployed AIStore custom resources.
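For the Default Manifest option, the apply, wait, and status commands above can be combined into one small script. This is a sketch under stated assumptions: operator_manifest_url and deploy_operator are hypothetical helper names, and the 120s default simply mirrors the timeout used above.

```shell
# Hypothetical helper: build the release-manifest URL for an operator version tag.
operator_manifest_url() {
  local version="$1"
  printf 'https://github.com/NVIDIA/ais-k8s/releases/download/%s/ais-operator.yaml\n' "$version"
}

# Hypothetical helper: apply the manifest, wait for the controller
# deployment to become available, then list the operator pods.
# Each step runs only if the previous one succeeded.
deploy_operator() {
  local version="$1" timeout="${2:-120s}"
  kubectl apply -f "$(operator_manifest_url "$version")" &&
    kubectl wait --for=condition=available --timeout="$timeout" \
      deployment/ais-operator-controller-manager -n ais-operator-system &&
    kubectl get pods -n ais-operator-system
}
```

For example, deploy_operator v2.15.0 applies that release's manifest with the default timeout. Pass a second argument such as 300s if image pulls are slow in your environment.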

AIStore Deployment

With the AIS Operator deployed, the next step is to configure and deploy an AIStore custom resource. Again, there are a few deployment options:

  1. Helm Charts (recommended) -- See AIS Helm Charts
  2. Ansible Playbooks (deprecated) -- Refer to the Ansible Playbook docs for details
  3. Manual resource creation (advanced)
    • If you want to manage everything yourself, it is possible to create the required namespace, PVs, secrets, and AIStore custom resource separately.
    • The AIS Operator will create all the other K8s resources based on the AIS spec (configmaps, statefulsets, services, pods, etc.).
    • Reference our samples, helm template, and commands used in the ansible playbooks.

Multihome Deployment:

  • For a multihome deployment using multiple network interfaces, some extra configuration is required before deploying the cluster.
  • Refer to the multihome deployment doc for details.

After deployment, verify all AIS pods are ready and running:

watch kubectl get pods -n <cluster-namespace>
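In scripts or CI, a blocking check may be more convenient than watching interactively. The sketch below uses kubectl wait for that. wait_for_ais_pods is a hypothetical helper name and the 300s default timeout is an assumption, not a documented AIS value.

```shell
# Hypothetical helper: block until every pod in the namespace reports
# Ready (or the timeout expires), then list the pods with node placement.
wait_for_ais_pods() {
  local namespace="$1" timeout="${2:-300s}"
  kubectl wait --for=condition=Ready pod --all -n "$namespace" --timeout="$timeout" &&
    kubectl get pods -n "$namespace" -o wide
}
```

Note that kubectl wait returns an error if the namespace contains no pods yet, and it can time out if the namespace also holds pods that never become Ready (for example, completed Job pods). In those cases a label selector may be a better fit than --all.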

Notes

  • In some Kubernetes deployments, the default cluster domain name differs from cluster.local. In that case, set the correct domain using the clusterDomain spec option.
  • For production environments, we recommend running one proxy and one target per K8s node, as shown in the playbooks referenced above. Multiple storage targets can also be deployed on a single K8s node for testing or higher availability.
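If you are unsure which cluster domain your deployment uses, one common approach is to inspect the DNS search path that Kubernetes writes into a pod's /etc/resolv.conf. The sketch below assumes the usual search-list layout, which contains an entry of the form svc.<cluster-domain>. cluster_domain_from_resolv is a hypothetical helper name.

```shell
# Hypothetical helper: read resolv.conf content on stdin and print the
# cluster domain, taken from the first search entry that starts with "svc.".
cluster_domain_from_resolv() {
  awk '$1 == "search" {
         for (i = 2; i <= NF; i++)
           if ($i ~ /^svc\./) { sub(/^svc\./, "", $i); print $i; exit }
       }'
}
```

A possible usage, with any running pod in the cluster:

kubectl exec <pod-name> -n <namespace> -- cat /etc/resolv.conf | cluster_domain_from_resolv

If the printed value is not cluster.local, that value is the one to set in clusterDomain.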

Configuring access

See the operator docs for configuring external access to AIS proxies and targets.

Post-Deployment Steps

Client Pod Access

We currently offer two options for deploying a client Pod within the cluster:

  • The adminClient option in the AIS spec creates a managed deployment with a pre-configured pod. See the operator documentation.
  • The ais-client Helm Chart offers an independent chart for configuring the deployment. See the chart documentation.

Monitoring

AIStore exposes a /metrics endpoint that provides Prometheus metrics, and it uses a sidecar container to output logs to the K8s standard logging interface. See the AIS docs on metrics and reference metrics.
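To confirm the endpoint is reachable before wiring up a full monitoring stack, you can fetch it manually and summarize what it returns. This is a rough sketch: scrape_metrics and list_metric_names are hypothetical helper names, and the URL (scheme, host, and port) depends on your deployment and is not specified here.

```shell
# Hypothetical helper: fetch the raw Prometheus text exposition from a URL.
# -f fails on HTTP errors, -sS stays quiet but still reports failures.
scrape_metrics() {
  curl -fsS "$1"
}

# Hypothetical helper: reduce Prometheus text format (read on stdin) to a
# sorted list of unique metric names, skipping comments and blank lines
# and stripping any {label="..."} suffix.
list_metric_names() {
  awk '!/^#/ && NF { name = $1; sub(/[{].*/, "", name); print name }' | sort -u
}
```

One way to use these is to forward a port to an AIS pod in a separate terminal and then scrape it locally:

kubectl port-forward -n <cluster-namespace> pod/<ais-pod> <local-port>:<pod-port>
scrape_metrics "http://localhost:<local-port>/metrics" | list_metric_names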

We also provide Helm charts for configuring our monitoring stack as a starting point or reference: Monitoring Resources.

Performance Testing with aisloader

For evaluating the performance of your AIS cluster, we provide the aisloader load generation tool. Additionally, aisloader-composer includes a variety of scripts and Ansible playbooks for running aisloader across multiple hosts.

Troubleshooting

If you encounter any problems during the deployment process, feel free to report them on the AIStore repository's issues page. We welcome your feedback and queries to enhance the deployment experience.

We also provide a troubleshooting doc with steps to resolve some of the issues you might come across.

Happy deploying! 🎉🚀🖥️