Node Group Configuration
Configuration of the Node groups that Escalator will monitor is done through a YAML configuration file. This file is required for Escalator to run.
The configuration is validated by Escalator on start.
Example nodegroups_config.yaml configuration:
node_groups:
  - name: "shared"
    label_key: "customer"
    label_value: "shared"
    cloud_provider_group_name: "shared-nodes"
    min_nodes: 1
    max_nodes: 30
    dry_mode: false
    scale_on_starve: false
    exclude_tainted_node_pods: false
    taint_upper_capacity_threshold_percent: 40
    taint_lower_capacity_threshold_percent: 10
    slow_node_removal_rate: 2
    fast_node_removal_rate: 5
    scale_up_threshold_percent: 70
    scale_up_cool_down_period: 2m
    scale_up_cool_down_timeout: 10m
    soft_delete_grace_period: 1m
    hard_delete_grace_period: 10m
    taint_effect: NoExecute
    max_node_age: 24h
    aws:
      fleet_instance_ready_timeout: 1m
      launch_template_id: lt-1a2b3c4d
      launch_template_version: "1"
      lifecycle: on-demand
      instance_type_overrides: ["t2.large", "t3.large"]
      resource_tagging: false
Options
name
This is the name of the node group as displayed in Escalator's logs while it is running. It serves no purpose other than making it easier to debug a specific node group.
Each node group must have a unique name.
If you configure the name of the node group to be default, it will watch all pods that do not have any affinity or
node selectors specified. More information on the default option can be found here.
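As a minimal sketch, a default node group entry could look like the following. The field values are illustrative, and omitting the label selector fields for the default group is an assumption here; see the linked documentation for the authoritative behaviour.

node_groups:
  - name: "default"             # watches pods with no affinity or node selectors
    cloud_provider_group_name: "default-nodes"   # illustrative group name
    min_nodes: 1
    max_nodes: 10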
label_key and label_value
label_key and label_value are the key-value pair used to select the nodes and pods that are considered in the calculations
for a node group.
Pod and Node selectors are documented here.
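To illustrate the pairing, with label_key: "customer" and label_value: "shared" the matching Kubernetes metadata would look roughly like the fragments below (sketches only; the full selector semantics, including affinity, are covered in the linked documentation):

# Node label that places a node in this node group:
metadata:
  labels:
    customer: "shared"

# Pod nodeSelector whose resource requests count towards this node group:
spec:
  nodeSelector:
    customer: "shared"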
cloud_provider_group_name
cloud_provider_group_name is the node group in the cloud provider whose size Escalator will increase when scaling up,
and whose nodes it will terminate when scaling down.
- AWS: this is the name of the auto scaling group. More information on AWS deployments can be found here.
min_nodes and max_nodes
These are the required hard limits that Escalator will stay within when scaling up or down. If a scale up or scale down activity would exceed either of these values, Escalator will clamp the amount to stay within the min or max. For example:
- min_nodes is configured as 5 and the current node count is 7. Escalator needs to scale down and terminate 3 nodes. Terminating 3 nodes would take the node count below 5, so Escalator clamps the scale down amount and terminates only 2 nodes. The new node count is 5.
- max_nodes is configured as 20 and the current node count is 18. Escalator needs to scale up and create 3 nodes. Creating 3 nodes would take the node count above 20, so Escalator clamps the scale up amount and creates only 2 nodes. The new node count is 20.
min_nodes and max_nodes do not have to match the values configured in the cloud provider. Note: if Escalator's
min_nodes and max_nodes allow scaling beyond the cloud provider's own limits, the cloud provider may still block any
scaling activity that exceeds those limits.
Auto Discovery
min_nodes and max_nodes can be auto-discovered by Escalator and set to the min and max node values that are
configured for the node group in the cloud provider. These values are populated only when Escalator starts for the first
time. If the values in the cloud provider have changed, you will have to stop and restart Escalator for it to pick up
the new values.
To enable this, set min_nodes and max_nodes to 0 for the node group in nodegroups_config.yaml or simply remove
the two options from nodegroups_config.yaml.
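For example, either of the following node group entries enables auto-discovery (other options omitted for brevity):

node_groups:
  - name: "shared"
    min_nodes: 0   # 0 means: discover the value from the cloud provider
    max_nodes: 0   # 0 means: discover the value from the cloud provider
  - name: "shared-alt"
    # min_nodes and max_nodes omitted entirely: also auto-discovered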
dry_mode
This flag allows running a specific node group in dry mode. In dry mode, Escalator doesn't taint, cordon or otherwise modify the node group; it only logs the actions it would have performed. This is helpful for understanding what Escalator would do in specific scenarios.
Note: this flag is overridden by the --drymode command line flag.
scale_on_starve
This flag adds an additional check to Escalator's scaling algorithm that enforces a minimum scale up of 1 new node whenever there is a pod that cannot currently be scheduled because no node has the capacity to run it.
Warning: be extra careful that pod request sizes do not exceed node capacity when this option is enabled, or you may find that Escalator always creates the maximum number of nodes.
exclude_tainted_node_pods
When enabled, Escalator will exclude pods running on unavailable nodes from demand in the utilisation calculation. Unavailable nodes are tainted, force tainted, and cordoned nodes.
Unscheduled (pending) pods are still included so that scale-up decisions account for outstanding demand.
By default, Escalator aggregates pod resource requests across all pods in the node group, but calculates node capacity from only untainted nodes. This mismatch can inflate the utilisation percentage as pods on tainted or draining nodes are counted as demand, while the nodes they occupy are excluded from capacity.
The default value is false (disabled), preserving backward-compatible behaviour.
taint_upper_capacity_threshold_percent
This option defines the threshold at which Escalator will slowly start tainting nodes. The slow tainting will only occur when the utilisation of the node group goes below this value. For example:
If the node group utilisation is 38%, and taint_upper_capacity_threshold_percent is configured as 40,
Escalator will taint nodes at the rate defined in slow_node_removal_rate.
taint_lower_capacity_threshold_percent
This option defines the threshold at which Escalator will start tainting nodes quickly. The fast tainting will only occur when the utilisation of the node group goes below this value. For example:
If the node group utilisation is 5%, and taint_lower_capacity_threshold_percent is configured as 10,
Escalator will taint nodes at the rate defined in fast_node_removal_rate.
slow_node_removal_rate
The number of nodes to taint whenever the node group utilisation goes below the
taint_upper_capacity_threshold_percent value.
fast_node_removal_rate
The number of nodes to taint whenever the node group utilisation goes below the
taint_lower_capacity_threshold_percent value.
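Putting the two thresholds and the two rates together, the example configuration at the top of this page behaves as follows (the comments summarise the behaviour described above):

taint_upper_capacity_threshold_percent: 40   # below 40% utilisation: slow tainting
taint_lower_capacity_threshold_percent: 10   # below 10% utilisation: fast tainting
slow_node_removal_rate: 2   # taint up to 2 nodes at a time between 10% and 40%
fast_node_removal_rate: 5   # taint up to 5 nodes at a time below 10%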
scale_up_threshold_percent
This value defines the threshold at which Escalator will increase the size of the node group. Based on the current utilisation, Escalator will calculate how much to increase the node group so that utilisation drops below this value. For example:
scale_up_threshold_percent is configured as 70 and the current node group utilisation is 75%, Escalator will
increase the size of the node group so that the utilisation drops below 70.
Escalator calculates the utilisation based on the CPU and Memory limits/requests of containers running in the node group and uses whichever is the higher value as the utilisation. More information on utilisation calculation can be found here.
Slack space can be configured by leaving a gap between the
scale_up_threshold_percent and 100%, e.g. a value of 70 will mean 30% slack space.
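For instance, the slack space can be read directly from the option value:

scale_up_threshold_percent: 70   # scale up above 70% utilisation,
                                 # keeping roughly 30% slack space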
scale_up_cool_down_period and scale_up_cool_down_timeout
scale_up_cool_down_period is a grace period before Escalator can consider the scale up of the node group
a success. This grace period is managed by the scale up locking mechanism, which will not release the lock until
at least the scale_up_cool_down_period has passed.
A successful scale up is when the ASG size matches the ASG target size and the ready nodes in Kubernetes matches the ASG target size.
scale_up_cool_down_timeout is a timeout after which Escalator will release the scale lock so that it can perform future
scale up activities. This timeout is typically reached when the cloud provider couldn't bring the nodes up in time or
when a node failed to register with Kubernetes in time.
A scale up activity hitting the timeout isn't necessarily a bad thing; the timeout acts as a fail-safe so that the scale lock isn't held permanently when scaling activities take too long.
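As a sketch, the two options from the example configuration interact like this:

scale_up_cool_down_period: 2m    # hold the scale lock for at least 2m after a scale up
scale_up_cool_down_timeout: 10m  # release the lock after 10m even if the scale up
                                 # never completed successfully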
soft_delete_grace_period and hard_delete_grace_period
These values define how long Escalator waits before attempting to terminate a tainted node, and how long before it terminates the node forcefully.
When a node is tainted, the time at which it was tainted is recorded with the taint. soft_delete_grace_period defines
the time until Escalator can try to terminate the node. During the time period after soft_delete_grace_period and
before hard_delete_grace_period, Escalator will only terminate the node if it is considered empty. A node is
considered empty if it doesn't have any sacred pods running on it. Daemonsets are filtered out of this check.
If hard_delete_grace_period is reached, the node will be terminated regardless, even if there are sacred pods
running on it.
Take care when setting soft_delete_grace_period: a low value means nodes are terminated as soon as possible, but if
there is a sudden spike in pods there may not be a pool of tainted nodes available to untaint.
It is highly recommended to have the hard_delete_grace_period option set to a large value to give pods running on
nodes enough time to finish before the node is terminated.
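As an illustration using the values from the example configuration, both grace periods are measured from the time the node was tainted:

soft_delete_grace_period: 1m    # after 1m, terminate the node once it is empty
hard_delete_grace_period: 10m   # after 10m, terminate the node regardless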
Logic for determining if a node is empty can be found in pkg/k8s NodeEmpty()
taint_effect
This is an optional field. Its value defines the taint effect that will be applied to nodes when scaling down. The valid values are:
- NoSchedule
- NoExecute
- PreferNoSchedule
If not set, it defaults to NoSchedule.
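For example, with taint_effect: NoExecute a node tainted by Escalator would carry a taint similar to the fragment below. The taint key shown is for illustration only; the exact key is defined in Escalator's source.

spec:
  taints:
    - key: atlassian.com/escalator   # illustrative key
      effect: NoExecute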
aws.fleet_instance_ready_timeout
This is an optional field. The default value is 1 minute.
When using the one-shot capacity acquisition for AWS (see aws.launch_template_id), this is the maximum amount of time
that Escalator will block waiting for new EC2 instances to become ready so that they can be added to the node group.
This roughly corresponds to the amount of time it takes an instance to boot to multi-user mode and for the EC2 control
plane to notice that it is healthy. This generally takes much less than a minute.
Note: Escalator will block other scaling operations for, at most, this amount of time while new capacity comes online.
aws.launch_template_id
This value is the launch template ID (of the format lt-[a-f0-9]{17}) to use as a template for new instances that are
acquired using the AWS Fleet API. This value can be obtained through the AWS EC2 console on the Launch Templates page
or from the LaunchTemplateId field returned from the
create-launch-template CLI command
and AWS API call.
Setting this value and that of aws.launch_template_version allows Escalator to use the AWS Fleet API to acquire all
desired capacity for a scale-up operation at once rather than waiting for an auto-scaling group to add capacity. This
call may fail if AWS is unable to fulfill all capacity for some reason. Escalator will wait for the acquired capacity
to become ready on the AWS side and will attach it all to the Escalator managed auto-scaling-group.
The instance types configured in the launch template should match the instance types configured for the auto-scaling group. If the auto-scaling group was created using a launch template instead of a launch configuration then that template ID should be used here.
Note: the AWS Fleet API does not support balancing requested instances by availability zone. When using this functionality, the desired capacity will be acquired all at once in a single availability zone, or not at all. This may change in a future update to the API.
aws.launch_template_version
This value is the version of the launch template to use. See aws.launch_template_id above. This value should be a
numeric string. This value can be obtained through the AWS EC2 console on the Launch Templates page or from the
LatestVersionNumber or DefaultVersionNumber field returned from the
create-launch-template CLI command
and AWS API call.
aws.lifecycle
This option depends on aws.launch_template_id being specified.
This optional value is the lifecycle of the instances that will be launched. The accepted values are strings of either
on-demand or spot to request On-Demand or Spot instances respectively. If no value is specified this will default
to on-demand.
aws.instance_type_overrides
This option depends on aws.launch_template_id being specified.
An optional list of instance types to override the instance type within the launch template. Providing multiple instance types here increases the likelihood of a Spot request being successful. If omitted the instance type to request will be taken from the launch template.
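For example, a sketch of a Spot-based configuration (the instance types are illustrative):

aws:
  launch_template_id: lt-1a2b3c4d
  launch_template_version: "1"
  lifecycle: spot
  instance_type_overrides: ["m5.large", "m5a.large", "m4.large"]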
aws.resource_tagging
Tag the ASG and Fleet Request resources used by Escalator with the metadata key-value pair
k8s.io/atlassian-escalator/enabled:true. Tagging doesn't alter the functionality of Escalator. Read more about
tagging your AWS resources here.
max_node_age
max_node_age allows the configuration of a maximum age for nodes in the node group once the node group has scaled down
to its minimum size. When at the minimum node group size, Escalator will trigger a scale up by a minimum of 1
if any nodes exceed this maximum age, so that the old nodes are rotated out.
This is to ensure that nodes in the node group are kept relatively new to pick up any changes to the launch configuration or launch template.
When not at the minimum, the natural scaling up and down of the node group will ensure new nodes are introduced to the node group.
This is an optional feature and by default is disabled.
unhealthy_node_grace_period
Defines the minimum age of a node before it can be tested to check if it is unhealthy.
When enabled, instances can be tested periodically to determine if they are healthy. Escalator will pause all scaling activity and flush out unhealthy instances if they go above a configured threshold for the nodegroup. It will continuously do this until enough instances in the nodegroup are healthy and normal scaling activity can resume.
Cordoned nodes are skipped and can never be considered unhealthy.
This is an optional field. The default value is empty, which disables the feature.
health_check_newest_nodes_percent
[Only used if unhealthy_node_grace_period is set.]
The percentage of nodes (ordered by age from newer to older) in the nodegroup that are considered when checking for the maximum allowed unhealthy nodes in the nodegroup. The nodes captured by this percentage form the "test set" to be checked. Only nodes which are older than unhealthy_node_grace_period will be included in the test set.
This field is required when unhealthy_node_grace_period is set.
max_unhealthy_nodes_percent
[Only used if unhealthy_node_grace_period is set.]
The maximum percentage of unhealthy nodes in the test set from health_check_newest_nodes_percent. Beyond this threshold all scaling activity is paused and unhealthy nodes are flushed out.
Note: the valid range for max_unhealthy_nodes_percent is 0% to 99%.
This is an optional field. If not set, it will default to 0%.
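Putting the three health-check options together, an illustrative configuration (the values are examples only):

unhealthy_node_grace_period: 5m        # only nodes older than 5m are tested
health_check_newest_nodes_percent: 100 # consider all eligible nodes in the test set
max_unhealthy_nodes_percent: 20        # pause scaling and flush unhealthy nodes
                                       # once more than 20% of the test set is unhealthy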