Node Problem Detector Add-on

February 21, 2023

node-problem-detector makes common Kubernetes node problems visible to the cluster management stack. It is a daemon which runs on each node, detects problems, and reports them to the apiserver.

For more details on the Kubernetes node-problem-detector, see the GitHub project.

The following is a sample API definition with the node-problem-detector add-on enabled. The empty dnsPrefix and keyData values are placeholders that you must fill in before deploying.

{
  "apiVersion": "vlabs",
  "properties": {
    "orchestratorProfile": {
      "kubernetesConfig": {
        "addons": [
          {
            "name": "node-problem-detector",
            "enabled": true
          }
        ]
      }
    },
    "masterProfile": {
      "count": 1,
      "dnsPrefix": "",
      "vmSize": "Standard_DS2_v2"
    },
    "agentPoolProfiles": [
      {
        "name": "agentpool",
        "count": 3,
        "vmSize": "Standard_DS2_v2",
        "availabilityProfile": "AvailabilitySet"
      }
    ],
    "linuxProfile": {
      "adminUsername": "azureuser",
      "ssh": {
        "publicKeys": [
          {
            "keyData": ""
          }
        ]
      }
    }
  }
}

You can validate that the add-on is running as expected with the following command. You should see a node-problem-detector pod running for each agent node in the cluster.

kubectl get pods -n kube-system
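As a rough cross-check of that output, the sketch below compares detector pod placement against a list of node names. It assumes the pods carry a `node-problem-detector` name prefix (an assumption about how the add-on's DaemonSet names its pods) and that the input is the parsed result of `kubectl get pods -n kube-system -o json`. The helper names `kubectl_json` and `nodes_without_detector` are hypothetical.

```python
import json
import subprocess

# Assumed pod name prefix; adjust if your DaemonSet is named differently.
POD_PREFIX = "node-problem-detector"


def kubectl_json(*args):
    """Run kubectl with -o json and return the parsed output."""
    out = subprocess.run(
        ["kubectl", *args, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def nodes_without_detector(pod_list, node_names):
    """Return the node names that have no Running detector pod scheduled on them."""
    covered = {
        pod["spec"].get("nodeName")
        for pod in pod_list.get("items", [])
        if pod["metadata"]["name"].startswith(POD_PREFIX)
        and pod.get("status", {}).get("phase") == "Running"
    }
    return sorted(set(node_names) - covered)


# Example use against a live cluster (not executed here):
#   pods = kubectl_json("get", "pods", "-n", "kube-system")
#   nodes = [n["metadata"]["name"] for n in kubectl_json("get", "nodes")["items"]]
#   print(nodes_without_detector(pods, nodes))
```

An empty result means every listed node has a running detector pod. Master nodes may legitimately show up in the result if you pass them in, since the expectation stated above only covers agent nodes.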

To test node-problem-detector in a running cluster, you can inject messages into the logs it is watching. See Try It Out in the node-problem-detector documentation for details.
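Once a test message has been injected, the detector is expected to surface the problem on the node object. The following is a minimal sketch for listing what a node currently reports, assuming the input is the parsed output of `kubectl get node <name> -o json`. `active_problem_conditions` is a hypothetical helper, and which condition types appear depends on the monitors you have configured.

```python
def active_problem_conditions(node):
    """Return (type, reason, message) for each non-Ready condition whose status is "True".

    By convention, a status of "True" on a condition other than Ready
    signals a problem (for example memory pressure, or a condition set
    by a problem daemon).
    """
    problems = []
    for cond in node.get("status", {}).get("conditions", []):
        if cond.get("type") != "Ready" and cond.get("status") == "True":
            problems.append(
                (cond["type"], cond.get("reason", ""), cond.get("message", ""))
            )
    return problems
```

This sketch only inspects node conditions. node-problem-detector reports temporary problems as events rather than conditions, so those would need a separate check.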

Configuration

| Name | Required | Description | Default Value |
| --- | --- | --- | --- |
| customPluginMonitor | no | Comma-separated list of custom plugin monitor config files | /config/kernel-monitor-counter.json,/config/systemd-monitor-counter.json |
| systemLogMonitor | no | Comma-separated list of system log monitor config files | /config/kernel-monitor.json,/config/docker-monitor.json,/config/systemd-monitor.json |
| systemStatsMonitor | no | Comma-separated list of system stats monitor config files | /config/system-stats-monitor.json |
| versionLabel | no | Version label used as DaemonSet selector | v0.8.4 |

Node Problem Detector

| Name | Required | Description | Default Value |
| --- | --- | --- | --- |
| name | no | container name | "node-problem-detector" |
| image | no | image | "registry.k8s.io/node-problem-detector:v0.8.1" |
| cpuRequests | no | cpu requests for the container | "20m" |
| memoryRequests | no | memory requests for the container | "20Mi" |
| cpuLimits | no | cpu limits for the container | "200m" |
| memoryLimits | no | memory limits for the container | "100Mi" |
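
To illustrate how the settings listed above might be overridden, here is a sketch that builds the add-on entry and prints it as JSON for pasting into the `addons` array. It assumes the add-on level settings go under a `config` map and the container settings under a `containers` list. Neither key is shown in this document, so verify them against your version before relying on this. `build_npd_addon` is a hypothetical helper and the override values are illustrative only.

```python
import json


def build_npd_addon(config=None, container=None):
    """Build the add-on entry, including only the settings that are overridden."""
    addon = {"name": "node-problem-detector", "enabled": True}
    if config:
        # Assumption: add-on level settings live under a "config" map of strings.
        addon["config"] = dict(config)
    if container:
        # Assumption: container settings live under a "containers" list,
        # matched to the container by its name.
        addon["containers"] = [{"name": "node-problem-detector", **container}]
    return addon


example = build_npd_addon(
    config={"systemLogMonitor": "/config/kernel-monitor.json"},
    container={"cpuLimits": "300m", "memoryLimits": "150Mi"},
)
print(json.dumps(example, indent=2))
```

Settings left out of the entry keep the default values from the tables, since every setting is marked as not required.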

Supported Orchestrators

Kubernetes

Contact