k8s-ops-toolkit

May 31, 2026

Production-grade Helm bundles and observability for Next.js apps on Kubernetes.

Built by Sarma Linux.


What this is

Most teams reach for Kubernetes when they outgrow Vercel or want to cut costs. Then they spend two weeks configuring the same things everyone else configures: ingress, cert-manager, monitoring, logging, autoscaling, secrets.

This toolkit is those things, ready to go. Drop your Next.js app into the chart, set the domain, install. It includes a full observability stack (Prometheus, Grafana, Loki 3.x, Alertmanager) preconfigured for the common Next.js failure modes, an OpenCost spend dashboard, and a GitOps path through ArgoCD when you want the platform reconciled from git rather than installed by hand.

Architecture

graph TD
  Internet[Internet] -->|443| Ing[ingress-nginx]
  Cert[cert-manager + Let's Encrypt] -.TLS certs.-> Ing
  Ing --> Svc[Next.js Service :80]
  Svc --> Pods[Next.js Pods x N :3000]
  HPA[HorizontalPodAutoscaler] -.scales on CPU.-> Pods
  Pods -->|/api/metrics| Prom[Prometheus]
  Pods -->|stdout| Promtail[Promtail] --> Loki[Loki 3.x]
  OpenCost[OpenCost] --> Prom
  Prom --> Graf[Grafana]
  Loki --> Graf
  Prom --> AM[Alertmanager]
  AM --> Slack[Slack]
  Argo[ArgoCD app-of-apps] -.reconciles.-> Ing
  Argo -.reconciles.-> Prom

What is in the box

  • charts/nextjs-app: Helm chart for any Next.js app. Deployment with a tuned rolling update strategy and hardened security context, ClusterIP service, ingress with cert-manager TLS, HorizontalPodAutoscaler, PodDisruptionBudget, liveness and readiness probes, inline and secret-backed environment injection, and a Prometheus ServiceMonitor.
  • scripts/install.sh: one-shot, version-pinned install of the surrounding platform on a fresh cluster. ingress-nginx, cert-manager with a Let's Encrypt production issuer, kube-prometheus-stack (Prometheus, Grafana, Alertmanager), Loki 3.x with Promtail for logs, and OpenCost for spend, with an optional Slack webhook for alerting.
  • scripts/load-dashboards.sh: loads the bundled Grafana dashboards into the cluster as sidecar ConfigMaps.
  • manifests/: the bundled Grafana dashboards (Next.js app, OpenCost spend), Prometheus alert rules, and the Alertmanager and Loki values files.
  • gitops/argocd/: an app-of-apps that reconciles the same pinned platform from git through ArgoCD, the alternative to the imperative installer.
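
The HorizontalPodAutoscaler in charts/nextjs-app scales on CPU. As a rough illustration of what that means for replica counts, the sketch below applies the scaling rule documented for the Kubernetes HPA, desired = ceil(current × currentMetric / targetMetric), with the default 10% tolerance. The function name, the 60% target, and the replica bounds in the usage comments are assumptions for the example, not values read from the chart. The real controller also applies stabilization windows and pod readiness handling, which are left out here.

```typescript
// Hypothetical helper: estimates the replica count an HPA would target.
// CPU values are average utilization as a percentage of the pods' CPU requests.
function desiredReplicas(
  currentReplicas: number,
  currentCpuPercent: number,
  targetCpuPercent: number,
  minReplicas: number,
  maxReplicas: number,
  tolerance = 0.1, // Kubernetes default; configurable on the controller
): number {
  if (targetCpuPercent <= 0) {
    throw new RangeError("targetCpuPercent must be greater than zero");
  }
  const ratio = currentCpuPercent / targetCpuPercent;

  // Inside the tolerance band the controller leaves the replica count alone.
  const unclamped =
    Math.abs(ratio - 1) <= tolerance
      ? currentReplicas
      : Math.ceil(currentReplicas * ratio);

  // The result is always kept within the configured bounds.
  return Math.min(maxReplicas, Math.max(minReplicas, unclamped));
}

// desiredReplicas(3, 90, 60, 2, 10)  -> 5   (ceil(3 * 1.5))
// desiredReplicas(3, 63, 60, 2, 10)  -> 3   (within 10% of target, no change)
// desiredReplicas(8, 120, 60, 2, 10) -> 10  (16 wanted, capped at maxReplicas)
```

Because utilization is measured against CPU requests, the numbers only make sense when the Deployment sets a CPU request for the container.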

When to use this, and when not to

Use this if you are moving a Next.js app off a managed platform onto your own Kubernetes cluster and you do not want to hand-write deployment, ingress, TLS, autoscaling, and monitoring manifests. It is a good fit for a platform team standardising several internal Next.js services on one consistent shape, and for cost-controlled staging environments that need real certificates and metrics without much spend.

Do not use this if you are happy on Vercel or another managed platform, because you would be taking on cluster operations you currently pay someone else to handle. It is the wrong tool if you do not run Next.js, since the chart probes /api/health and scrapes /api/metrics and assumes a container that serves on port 3000. It is also not a managed service: you own the cluster, the upgrades, and the on-call.

Quick start

git clone https://github.com/sarmakska/k8s-ops-toolkit.git
cd k8s-ops-toolkit
export KUBECONFIG=~/.kube/your-cluster.yaml
./scripts/install.sh \
  --domain example.com \
  --email you@example.com \
  --slack-webhook https://hooks.slack.com/...

In about 8 minutes you have ingress, TLS, monitoring, logging, cost tracking, and alerting working. Every upstream chart version is pinned in scripts/install.sh, so the same command produces the same platform every time.

Deploy a Next.js app

helm install my-app ./charts/nextjs-app \
  --set image.repository=ghcr.io/you/my-app \
  --set image.tag=v1.0.0 \
  --set ingress.host=app.example.com \
  --set replicas=3

GitOps install (ArgoCD)

Prefer to reconcile the platform from git rather than run a script? Point ArgoCD at the app-of-apps root once and it syncs the same pinned components and self-heals drift:

kubectl apply -n argocd -f gitops/argocd/root.yaml

The child Applications under gitops/argocd/apps/ pin ingress-nginx, cert-manager, kube-prometheus-stack, Loki, Promtail, and OpenCost to the same versions as the installer.
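
The "same versions as the installer" guarantee amounts to comparing two maps of chart name to pinned version. The repository checks this in its pytest suite; the TypeScript below is only an illustrative sketch of that comparison, not the project's test code. The function name is hypothetical and the version strings in the usage comment are placeholders, not the versions the toolkit pins.

```typescript
// Map of chart name to pinned chart version.
type Pins = Record<string, string>;

// Hypothetical helper: lists every chart whose pin differs between the
// imperative installer and the ArgoCD Applications, or exists in only one.
function findPinDrift(installer: Pins, gitops: Pins): string[] {
  // Union of chart names from both sources, de-duplicated and sorted
  // so the report order is stable.
  const names = Object.keys(installer)
    .concat(Object.keys(gitops))
    .filter((name, index, all) => all.indexOf(name) === index)
    .sort();

  const problems: string[] = [];
  for (const name of names) {
    const fromInstaller: string | undefined = installer[name];
    const fromGitops: string | undefined = gitops[name];
    if (fromInstaller === undefined) {
      problems.push(`${name}: pinned in GitOps only (${fromGitops})`);
    } else if (fromGitops === undefined) {
      problems.push(`${name}: pinned in installer only (${fromInstaller})`);
    } else if (fromInstaller !== fromGitops) {
      problems.push(`${name}: installer pins ${fromInstaller}, GitOps pins ${fromGitops}`);
    }
  }
  return problems;
}

// findPinDrift({ loki: "2.0.0" }, { loki: "2.1.0" })
//   -> ["loki: installer pins 2.0.0, GitOps pins 2.1.0"]
```

An empty result means both install paths would produce the same platform.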

Documentation

Full documentation lives in the project wiki.

Working example: build any container that serves on port 3000 and exposes /api/health, push it to a registry, then point the chart at the image:

helm install demo ./charts/nextjs-app \
  --set image.repository=ghcr.io/you/nextjs-demo \
  --set image.tag=latest \
  --set ingress.host=demo.example.com

With the Next.js App Router, a minimal route handler is enough to satisfy the probes:

// app/api/health/route.ts
export async function GET() {
  return Response.json({ ok: true })
}
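
The chart also scrapes /api/metrics, so the app needs a route that answers in the Prometheus text exposition format. The following is a minimal dependency-free sketch of such a route, assuming a simple in-memory counter store. The helper names and the `http_requests_total` metric in the usage note are illustrative, and they are not necessarily the series the bundled dashboards and alert rules query. In practice a client library such as prom-client is the more common choice.

```typescript
// app/api/metrics/route.ts
// Sketch only: helper names and the in-memory store are assumptions.
// In a real app the store would live in a shared module so other handlers
// can import it; it is kept unexported here because route files are meant
// to export only handlers and route config.

type Labels = Record<string, string>;

// Ask Next.js to run this handler on every request instead of caching it.
export const dynamic = "force-dynamic";

// metric name -> (rendered label set -> running total)
const counters = new Map<string, Map<string, number>>();

// Escape backslash, double quote, and newline in label values.
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Render labels in sorted key order so one label set always maps to one series.
function renderLabels(labels: Labels): string {
  const pairs = Object.entries(labels)
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([key, value]) => `${key}="${escapeLabelValue(value)}"`);
  return pairs.length > 0 ? `{${pairs.join(",")}}` : "";
}

// Add `by` to the counter identified by its name and label set.
function incrementCounter(name: string, labels: Labels = {}, by = 1): void {
  const series = counters.get(name) ?? new Map<string, number>();
  const key = renderLabels(labels);
  series.set(key, (series.get(key) ?? 0) + by);
  counters.set(name, series);
}

// Serialize every counter, one "# TYPE" line per metric followed by its samples.
function renderMetrics(): string {
  const lines: string[] = [];
  counters.forEach((series, name) => {
    lines.push(`# TYPE ${name} counter`);
    series.forEach((value, labels) => {
      lines.push(`${name}${labels} ${value}`);
    });
  });
  return lines.length > 0 ? lines.join("\n") + "\n" : "";
}

export async function GET(): Promise<Response> {
  return new Response(renderMetrics(), {
    headers: { "Content-Type": "text/plain; version=0.0.4; charset=utf-8" },
  });
}
```

Calling `incrementCounter("http_requests_total", { method: "GET", route: "/" })` from request-handling code would then surface as one sample line on the next scrape. The totals are per process, which suits this setup because Prometheus scrapes each pod separately.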

Tests

The chart and the bundled manifests are covered by an end-to-end pytest suite that renders charts/nextjs-app with real Helm and asserts on the resulting Kubernetes objects (selectors match pods, the service targets the container port, TLS wiring is correct, optional objects are gated off), plus checks that the GitOps Applications and the installer pin matching chart versions.

uv pip install --system pytest pyyaml
pytest -ra

CI runs helm lint, a template render of the chart and both fixtures, the pytest suite, dashboard JSON validation, and ShellCheck on every push and pull request.

Roadmap

Shipped:

  • Next.js Helm chart with probes, autoscaling, PDB, ingress, hardened security context
  • Observability stack (Prometheus, Grafana, Loki 3.x, Alertmanager)
  • cert-manager + ingress-nginx wired in via the version-pinned install script
  • OpenCost spend dashboard
  • GitOps install via ArgoCD app-of-apps
  • End-to-end test suite that renders the chart and asserts on the objects

Planned:

  • Disaster recovery scripts via Velero
  • HPA on custom metrics (requests per second from the ServiceMonitor)
  • ingress-nginx canary traffic split between two releases

License

MIT.


More open source by Sarma

Part of a portfolio of twelve production-shaped open-source repositories built and maintained by Sarma.

| Repository | What it is |
| --- | --- |
| Sarmalink-ai | Multi-provider OpenAI-compatible AI gateway with 14-engine failover and intent-based plugin auto-routing |
| agent-orchestrator | Durable multi-agent workflows in TypeScript with deterministic replay and Inspector UI |
| voice-agent-starter | Sub-second full-duplex voice agent loop. WebRTC, mediasoup, pluggable STT / LLM / TTS |
| ai-eval-runner | Evals as code. Python, DuckDB, FastAPI viewer, regression mode for CI |
| mcp-server-toolkit | Production Model Context Protocol server starter (Python / FastAPI) |
| local-llm-router | OpenAI-compatible proxy that routes to Ollama or cloud providers based on policy |
| rag-over-pdf | Minimal end-to-end RAG starter for PDF corpora |
| receipt-scanner | Vision OCR for receipts with Zod-validated JSON output |
| webhook-to-email | Webhook receiver that forwards events to email via Resend |
| k8s-ops-toolkit | Helm chart for shipping Next.js to Kubernetes with full observability stack |
| terraform-stack | Vercel + Supabase + Cloudflare + DigitalOcean modules in one Terraform repo |
| staff-portal | Open-source HR / ops portal for leave, attendance, expenses, kiosk mode |

Engineering essays at sarmalinux.com/blog · All projects at sarmalinux.com/open-source