Node Problem Detector Add-on
February 21, 2023
node-problem-detector makes common Kubernetes node problems visible to the cluster management stack. It is a daemon that runs on each node, detects problems, and reports them to the apiserver.
For more details on the Kubernetes node-problem-detector, see the GitHub project.
The following is a sample API definition with the node-problem-detector addon enabled.
{
  "apiVersion": "vlabs",
  "properties": {
    "orchestratorProfile": {
      "kubernetesConfig": {
        "addons": [
          {
            "name": "node-problem-detector",
            "enabled": true
          }
        ]
      }
    },
    "masterProfile": {
      "count": 1,
      "dnsPrefix": "",
      "vmSize": "Standard_DS2_v2"
    },
    "agentPoolProfiles": [
      {
        "name": "agentpool",
        "count": 3,
        "vmSize": "Standard_DS2_v2",
        "availabilityProfile": "AvailabilitySet"
      }
    ],
    "linuxProfile": {
      "adminUsername": "azureuser",
      "ssh": {
        "publicKeys": [
          {
            "keyData": ""
          }
        ]
      }
    }
  }
}
You can validate that the add-on is running as expected with the following command. You should see a node-problem-detector pod running for each agent node in the cluster.
kubectl get pods -n kube-system
To test node-problem-detector in a running cluster, you can inject messages into the logs it is watching. See Try It Out in the node-problem-detector documentation for details.
Configuration
| Name | Required | Description | Default Value |
|---|---|---|---|
| customPluginMonitor | no | Comma-separated list of custom plugin monitor config files | /config/kernel-monitor-counter.json,/config/systemd-monitor-counter.json |
| systemLogMonitor | no | Comma-separated list of system log monitor config files | /config/kernel-monitor.json,/config/docker-monitor.json,/config/systemd-monitor.json |
| systemStatsMonitor | no | Comma-separated list of system stats monitor config files | /config/system-stats-monitor.json |
| versionLabel | no | Version label used as DaemonSet selector | v0.8.4 |
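As a rough sketch of how one of these values might be overridden, the fragment below sets a single key on the add-on entry inside `addons`. It assumes the add-on entry accepts a `config` object of string key/value pairs next to `name` and `enabled`. That field does not appear in the sample above, so check it against the AKS Engine version you are using. The value shown simply narrows `systemLogMonitor` to two of the three default files.

```json
{
  "name": "node-problem-detector",
  "enabled": true,
  "config": {
    "systemLogMonitor": "/config/kernel-monitor.json,/config/systemd-monitor.json"
  }
}
```

Keys left out of `config` would presumably keep the defaults listed in the table.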
Node Problem Detector
| Name | Required | Description | Default Value |
|---|---|---|---|
| name | no | Container name | "node-problem-detector" |
| image | no | Container image | "registry.k8s.io/node-problem-detector:v0.8.1" |
| cpuRequests | no | CPU request for the container | "20m" |
| memoryRequests | no | Memory request for the container | "20Mi" |
| cpuLimits | no | CPU limit for the container | "200m" |
| memoryLimits | no | Memory limit for the container | "100Mi" |
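A hedged illustration of how the container settings might be supplied: the fragment below assumes the add-on entry accepts a `containers` array whose items use the names from the table. The `containers` field is not shown in the sample API definition, so treat it as an assumption to confirm for your AKS Engine version. All values are the documented defaults except `memoryLimits`, which is raised to an arbitrary "200Mi" purely as an example.

```json
{
  "name": "node-problem-detector",
  "enabled": true,
  "containers": [
    {
      "name": "node-problem-detector",
      "cpuRequests": "20m",
      "memoryRequests": "20Mi",
      "cpuLimits": "200m",
      "memoryLimits": "200Mi"
    }
  ]
}
```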
Supported Orchestrators
Kubernetes
Contact
- If you have any questions or feedback regarding the node-problem-detector addon, please file an issue at https://github.com/Azure/aks-engine-azurestack/issues