Kubernetes publishing-bot production instance notes
July 19, 2023
What is this and what does it do?
The publishing-bot for the Kubernetes project runs in the publishing-bot namespace on a CNCF-sponsored GKE cluster named aaa in the kubernetes-public project.
How do I get access to this?
If you need access to any of the following, please update groups.yaml.
GKE instance
publishing-bot is running in a GKE cluster named aaa in the kubernetes-public project.
The cluster can be accessed by k8s-infra-rbac-publishing-bot@kubernetes.io. To access the cluster, please see these instructions.
What images does it use?
Publishing-bot images can be pushed by k8s-infra-staging-publishing-bot@kubernetes.io.
What commands are in this repo and how/when do I use them?
Make sure you are at the root of the publishing-bot repo before running these commands.
Populating repos
Run this script whenever a new staging repo is added in kubernetes/kubernetes:
hack/fetch-all-latest-and-push.sh kubernetes
Deploying the bot
make validate build-image push-image deploy CONFIG=configs/kubernetes
How to connect to the aaa cluster
Click Activate Cloud Shell in the GCP console, then run the following command in that shell:
gcloud container clusters get-credentials aaa --region us-central1 --project kubernetes-public
Then run kubectl commands to confirm that you can see what is running in the cluster.
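As a rough sketch of that check, the steps can be wrapped in one shell function. The function name `connect_to_aaa` is hypothetical; the cluster name, region, project, and namespace are the ones given in these notes, and the `kubectl auth can-i` step is an extra sanity check that is not part of the original instructions.

```shell
# Hypothetical helper: fetch credentials for the aaa cluster, then confirm
# that the current identity can actually see the publishing-bot namespace.
connect_to_aaa() {
  # Same command as in the notes above.
  gcloud container clusters get-credentials aaa \
    --region us-central1 --project kubernetes-public || return 1

  # Prints "yes" or "no"; a non-zero exit means access is missing
  # (see groups.yaml for how access is granted).
  kubectl auth can-i get pods -n publishing-bot || return 1

  # List what is running in the namespace.
  kubectl get pods -n publishing-bot
}
```

Calling `connect_to_aaa` from Cloud Shell should end with the pod listing if access is set up correctly.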
What is running there?
The publishing-bot runs in a separate Kubernetes namespace of the same name in the aaa cluster.
The manifests here have the definitions for these Kubernetes resources. Example below:
davanum@cloudshell:~ (kubernetes-public)$ kubectl get pv,pvc,replicaset,pod -n publishing-bot
NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                             STORAGECLASS   REASON   AGE
persistentvolume/pvc-084a4d52-0a57-4f70-a76a-5d2d2667429d   100Gi      RWO            Delete           Bound    publishing-bot/publisher-gopath   ssd                     8h

NAME                                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/publisher-gopath   Bound    pvc-084a4d52-0a57-4f70-a76a-5d2d2667429d   100Gi      RWO            ssd            8h

NAME                        DESIRED   CURRENT   READY   AGE
replicaset.apps/publisher   1         1         1       45d

NAME                  READY   STATUS    RESTARTS   AGE
pod/publisher-cdvwj   1/1     Running   0          9h
How do I know if/when the bot fails?
Follow Kubernetes issue #56876. When the bot fails, it reopens this issue with a fresh log, so subscribing to the issue notifies you of each failure.
How do I see what the publishing bot is doing?
You can stream the logs of the pod to see what the publishing-bot is doing:
kubectl -n publishing-bot logs pod/publisher-cdvwj -f
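The pod name suffix (cdvwj above) is generated by the replicaset and changes whenever the pod is recreated. A minimal sketch for looking up the current name first, assuming the pod name still starts with `publisher-`; the function name `stream_publisher_logs` is hypothetical:

```shell
# Hypothetical helper: find the current publisher pod and follow its logs.
stream_publisher_logs() {
  local pod
  # -o name prints entries such as "pod/publisher-cdvwj"; keep the first match.
  pod="$(kubectl -n publishing-bot get pods -o name | grep '^pod/publisher-' | head -n 1 || true)"
  if [ -z "$pod" ]; then
    echo "no publisher pod found in namespace publishing-bot" >&2
    return 1
  fi
  kubectl -n publishing-bot logs "$pod" -f
}
```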
What is the persistent volume for?
To do its work, the publishing-bot has to download all the repositories and perform git surgery on them, so it keeps the downloaded copies around and reuses them. For example, if the pod gets killed, the new pod can still work off the downloaded git repositories on the persistent volume. Occasionally, if we suspect the downloaded git repos are corrupted for some reason (say, GitHub flakiness), we may have to clean up the PV/PVC. In other words, the volume is only a cache. Wiping it is generally harmless, apart from the time it takes to recreate all the data.
How do I clean up the PVC?
Step 1: Scale down the replicaset
kubectl scale -n publishing-bot --replicas=0 replicaset publisher
Step 2: Delete the PVC
kubectl delete -n publishing-bot persistentvolumeclaim/publisher-gopath
Step 3: Make sure the PVC is deleted and removed from the namespace
kubectl get -n publishing-bot pvc
This should not list any PVCs.
Step 4: Re-create the PVC
kubectl apply -n publishing-bot -f artifacts/manifests/pvc.yaml
Step 5: Scale up the replicaset
kubectl scale -n publishing-bot --replicas=1 replicaset publisher
Step 6: Watch the pod start back up from Pending
kubectl -n publishing-bot get pods
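The six steps above can be sketched as a single shell function that stops at the first failure. This is only an illustration: the function name `cleanup_publisher_pvc` is hypothetical, and Step 3 is done here by asking for the PVC by name (which fails once it is gone) rather than by reading the list by eye. Run it from the root of the publishing-bot repo so the manifest path resolves.

```shell
# Hypothetical helper: wipe and recreate the publishing-bot cache volume.
cleanup_publisher_pvc() {
  local ns="publishing-bot"

  # Step 1: scale down so no pod is holding the volume.
  kubectl scale -n "$ns" --replicas=0 replicaset publisher || return 1

  # Step 2: delete the PVC.
  kubectl delete -n "$ns" persistentvolumeclaim/publisher-gopath || return 1

  # Step 3: a successful get means the PVC is still there, so bail out.
  if kubectl get -n "$ns" persistentvolumeclaim/publisher-gopath >/dev/null 2>&1; then
    echo "PVC publisher-gopath still exists; not recreating it" >&2
    return 1
  fi

  # Step 4: re-create the PVC from the manifest in this repo.
  kubectl apply -n "$ns" -f artifacts/manifests/pvc.yaml || return 1

  # Step 5: scale the replicaset back up.
  kubectl scale -n "$ns" --replicas=1 replicaset publisher || return 1

  # Step 6: show the pod; re-run this command to follow it out of Pending.
  kubectl -n "$ns" get pods
}
```

Stopping on the first error matters most between Steps 2 and 4: applying the manifest while the old PVC still exists would leave the suspect data in place.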