Jenkins Guide
June 8, 2026
A guide on maintaining Node.js' Test and Release Jenkins clusters
Ansible
All machines in the clusters are managed using Ansible, with
playbooks that live in ansible/playbooks/jenkins.
The playbooks provision a machine and handle tasks such as installing compilers and Java versions, and managing the local Jenkins agent.
To see which playbooks correspond to which worker, check the services document.
Running playbooks
Before running playbooks, ensure that you have the secrets repo properly cloned and found by Ansible, as described in the README. If the machine secret is not available, you can always get it from the machine's Jenkins configuration page.
You can now run the playbook for the machine. Since
test-rackspace-freebsd10-x64-1 is being used for this example, you would want
to run the following command on your local machine, from within the ansible
directory:
ansible-playbook playbooks/jenkins/worker/create.yml --limit test-rackspace-freebsd10-x64-1
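Before applying changes, it can help to confirm which hosts a --limit pattern actually matches. The following is a minimal sketch, assuming it is used from within the ansible directory. The playbook_cmd helper name is hypothetical; --list-hosts is a standard ansible-playbook option that prints the matching hosts without executing any tasks.

```shell
#!/bin/sh
# Hypothetical helper: print the ansible-playbook command for a worker.
# An optional second argument is appended as an extra flag.
playbook_cmd() {
  worker="$1"
  extra="${2:-}"
  cmd="ansible-playbook playbooks/jenkins/worker/create.yml --limit $worker"
  if [ -n "$extra" ]; then
    cmd="$cmd $extra"
  fi
  printf '%s\n' "$cmd"
}

# Usage (from within the ansible directory):
#   playbook_cmd test-rackspace-freebsd10-x64-1 --list-hosts          # print for review
#   eval "$(playbook_cmd test-rackspace-freebsd10-x64-1 --list-hosts)" # preview hosts
#   eval "$(playbook_cmd test-rackspace-freebsd10-x64-1)"              # real run
```

Printing the command first keeps the preview and the real run identical apart from the extra flag.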
If all goes according to plan, then Ansible should be able to run the playbook with no errors. If you do encounter problems, there are usually some WG members available in the Node.js Build Slack channel, who can try and lend a hand.
Default security permissions
Default security permissions for the test CI are as shown:

Security releases
When security releases are due to go out, the Build WG plays an important role in facilitating their testing.
A tracking issue in the nodejs/build issue tracker
should have been created by the security release steward requesting Build WG
members to be available to support the security release. Avoid making any
unnecessary changes to the Jenkins job or machine configurations in the
lead up to the release.
Solving problems
Issues with the Jenkins clusters are usually reported to either the
Node.js Build Slack channel, or to the
nodejs/build issue tracker.
When trying to fix a worker, ensure that you mark the node as offline,
via the Jenkins worker configure UI, so more failures don't pile up.
Once you are done fixing the worker, ensure that you return the worker
to the "online" status.
The most common issues facing workers are explained below, with potential
solutions on how to remedy the problem. Most commands below are meant to
be run on the worker itself, after SSH-ing in and switching to the
iojs user. See the SSH guide on how to log into the machines.
Out of memory
First, get statistics on running processes for the machine: ps aux | grep node.
If there are a lot of "hanging" / "abandoned" processes, it's best to
remove them by running a command like: ps -ef | grep node | grep -v -e grep -e java -e docker | awk '{print $2}' | xargs kill.
Overall memory utilization can be found using the free command on most
workers, or the swap -s -h command on SmartOS workers.
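Because the cleanup command kills every matching process, it may be worth reviewing the PID list before piping it to kill. A minimal sketch of that split, using the same filters as the command above; the stray_node_pids function name is hypothetical.

```shell
#!/bin/sh
# Read `ps -ef` output on stdin and print the PIDs (column 2) of lines that
# mention "node", skipping grep itself and any java or docker processes.
stray_node_pids() {
  grep node | grep -v -e grep -e java -e docker | awk '{print $2}'
}

# Usage on a worker: review the list first, then kill only if it looks right.
#   ps -ef | stray_node_pids
#   ps -ef | stray_node_pids | xargs kill
```

Reading from stdin rather than calling ps inside the function makes the filter easy to try against saved output.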
Out of space
First, get statistics on how full (or not) the machine is by running the
df -h command.
If the Use% column appears very high for the worker's largest disk,
then it is probably appropriate to clean out part of the worker's
workspace (where Jenkins jobs are performed). To clean out part of the
workspace, run rm -rf ~/build/workspace/node-test-commit*.
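Spotting a full disk in df output can be scripted. This is a minimal sketch, assuming df -P is available (the POSIX output format, which keeps each filesystem on one line) and that mount points contain no spaces. The full_filesystems name and the default threshold of 90 are illustrative choices, not part of the Build WG tooling.

```shell
#!/bin/sh
# Read `df -P` output on stdin and print the mount point and usage of every
# filesystem at or above the given percentage (default 90, an arbitrary choice).
full_filesystems() {
  threshold="${1:-90}"
  awk -v limit="$threshold" 'NR > 1 {
    use = $5
    sub(/%$/, "", use)            # strip the trailing percent sign
    if (use + 0 >= limit + 0) print $6, $5
  }'
}

# Usage on a worker:
#   df -P | full_filesystems 85
```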
AIX Space Issues
The AIX machines are often under-provisioned in terms of disk space. You can increase the size of the mount points themselves.
To check space, run df -g:
$ df -g
Filesystem      GB blocks   Free  %Used  Iused  %Iused  Mounted on
/dev/hd4             0.09   0.06    41%   2543     16%  /
/dev/hd2             6.12   4.25    31%  38189      4%  /usr
/dev/hd9var          5.19   4.75     9%   1254      1%  /var
/dev/hd3             4.22   4.21     1%     42      1%  /tmp
/dev/hd1            20.03  17.73    12%  44911      2%  /home
/dev/hd11admin       0.12   0.12     1%      5      1%  /admin
/proc                   -      -      -      -       -  /proc
/dev/hd10opt        10.41   8.77    16%  24532      2%  /opt
/dev/livedump        0.25   0.25     1%      4      1%  /var/adm/ras/livedump
/dev/repo00          8.41   0.01   100%   2656     42%  /usr/sys/inst.images
To check the amount of space left on the volume group:
Find the volume group: lsvg
List information about the volume group: lsvg rootvg
Check the amount of FREE PPs to see how much space is left on the volume group:
VOLUME GROUP:       rootvg           VG IDENTIFIER:  00f6db0a00004c000000016fe89cbde6
VG STATE:           active           PP SIZE:        32 megabyte(s)
VG PERMISSION:      read/write       TOTAL PPs:      3838 (122816 megabytes)
MAX LVs:            256              FREE PPs:       2033 (65056 megabytes) *HERE*
LVs:                13               USED PPs:       1805 (57760 megabytes)
OPEN LVs:           12               QUORUM:         2 (Enabled)
TOTAL PVs:          2                VG DESCRIPTORS: 3
STALE PVs:          0                STALE PPs:      0
ACTIVE PVs:         2                AUTO ON:        yes
MAX PPs per VG:     32512
MAX PPs per PV:     16256            MAX PVs:        2
LTG size (Dynamic): 512 kilobyte(s)  AUTO SYNC:      no
HOT SPARE:          no               BB POLICY:      relocatable
PV RESTRICTION:     none             INFINITE RETRY: no
DISK BLOCK SIZE:    512              CRITICAL VG:    no
FS SYNC OPTION:     no
To increase the size of /home, run: sudo chfs -a size=+XG /home where X is the number of GBs by which to grow the filesystem.
If you hit an error about the maximum allocation of the logical volume you can increase the max with:
sudo chlv -x <size> <logical volume>
If you are unsure about any of the commands, please ask a Red Hat/IBM member of the build working group for assistance.
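The free space in a volume group is the FREE PPs count multiplied by the PP SIZE. In the lsvg rootvg output shown earlier that is 2033 × 32 = 65056 megabytes, matching the figure lsvg prints in parentheses. A minimal sketch of that arithmetic, assuming the lsvg output keeps the "PP SIZE:" and "FREE PPs:" labels; the vg_free_megabytes name is hypothetical.

```shell
#!/bin/sh
# Read `lsvg <vg>` output on stdin and print the free space in megabytes,
# computed as FREE PPs multiplied by PP SIZE.
vg_free_megabytes() {
  awk '{
    for (i = 1; i + 2 <= NF; i++) {
      if ($i == "PP" && $(i + 1) == "SIZE:") size = $(i + 2)
      if ($i == "FREE" && $(i + 1) == "PPs:") free = $(i + 2)
    }
  }
  END { if (size != "" && free != "") print size * free }'
}

# Usage on an AIX worker:
#   lsvg rootvg | vg_free_megabytes
```

Dividing the result by 1024 gives an upper bound for the X passed to chfs -a size=+XG.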
General issues with Jenkins agent: "normal machines" edition
Git errors or exceptions raised by the Jenkins agent can usually be fixed by taking a look at the agent's logs, and then restarting it.
To view the agent's logs on most modern Linux machines, you can run
journalctl -n 50 -u jenkins | less.
To view the status of the agent, you can run one of the following commands (based on the OS of the worker):
# Most modern Linux machines
systemctl status jenkins
# Older Linux machines
service jenkins status
# SmartOS
svcs -l svc:/application/jenkins:default
# macOS
sudo launchctl list | grep jenkins
# Other OSes
~iojs/start.sh
To restart the agent, you can run one of the following commands (based on the OS of the worker):
# Most modern Linux machines
systemctl restart jenkins
# Older Linux machines
service jenkins restart
# SmartOS
svcadm restart svc:/application/jenkins:default
# macOS
launchctl stop org.nodejs.osx.jenkins
launchctl start org.nodejs.osx.jenkins
# AIX
sudo /etc/rc.d/rc2.d/S20jenkins start
# Other OSes
~iojs/start.sh
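The per-OS restart commands above can be selected from the output of uname -s, which reports Linux, SunOS (on SmartOS), Darwin (on macOS), or AIX. A minimal sketch, assuming Linux workers are systemd-based (older machines use service jenkins restart instead); the agent_restart_cmd name is hypothetical and the function only prints the command rather than running it.

```shell
#!/bin/sh
# Print the agent restart command for a given `uname -s` value.
# Linux is assumed to be systemd-based; older machines use `service` instead.
agent_restart_cmd() {
  case "$1" in
    Linux)  echo "systemctl restart jenkins" ;;
    SunOS)  echo "svcadm restart svc:/application/jenkins:default" ;;
    Darwin) echo "launchctl stop org.nodejs.osx.jenkins && launchctl start org.nodejs.osx.jenkins" ;;
    AIX)    echo "sudo /etc/rc.d/rc2.d/S20jenkins start" ;;
    *)      echo "~iojs/start.sh" ;;
  esac
}

# Usage on a worker: print the suggested command for this OS.
#   agent_restart_cmd "$(uname -s)"
```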
Read-only filesystem
Sometimes a failure in the filesystem will cause it to enter a read-only state. If that happens, follow the steps below:
touch foo # to confirm system is read-only, don't proceed if this succeeds
sudo df # to determine the device for `/`
sudo e2fsck -y /dev/mmcblk0p2 # replace mmcblk0p2 with proper device for `/`
After running the steps above, follow the instructions below to restart the machine.
Note: a read-only root filesystem usually indicates a more concerning underlying issue. Remember to open an issue on nodejs/build to investigate further, as it might be necessary to reprovision the machine.
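As a non-writing alternative to the touch check, the mount options for / can be read directly. This is a minimal sketch, assuming a Linux worker that exposes /proc/mounts (fields: device, mount point, filesystem type, options); the root_mount_mode name is hypothetical.

```shell
#!/bin/sh
# Read /proc/mounts-style lines on stdin and print "ro" or "rw" for `/`.
# If `/` is listed more than once, the last entry wins.
root_mount_mode() {
  awk '$2 == "/" {
    mode = "rw"
    n = split($4, opts, ",")
    for (i = 1; i <= n; i++) if (opts[i] == "ro") mode = "ro"
  }
  END { if (mode != "") print mode }'
}

# Usage on a worker:
#   root_mount_mode < /proc/mounts
```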
Restart the machine
Sometimes something weird happens, and it's easier to just reboot the worker.
On Unix just do one of:
shutdown -r now
# or:
reboot
On the advice of the system administrators managing the AIX machines, please do not restart them without good reason, as doing so will make things worse. It may help to run the https://ci.nodejs.org/job/aix-cleanup/ job to kill any stray processes from earlier failed job runs.
Fixing machines with Docker
The above steps generally do not apply to workers that are either "Half Docker" or "Full Docker".
Below is a quick guide using test-softlayer-ubuntu1804_container-x64-1 as an example Jenkins worker.
- Figure out which machine hosts the container. It should be stated in the worker's view on Jenkins.
- Verify the existence of the container:
  - To view a list of all active Docker containers, you can run:
    docker ps
  - To view a list of all active services, you can run:
    systemctl list-units | grep jenkins
  - Each container should have a matching systemd service that starts and stops the container. Its name should be jenkins-${workername}, so in this example jenkins-test-softlayer-ubuntu1804_container-x64-1
- To view the logs for a service run:
  journalctl -u jenkins-test-softlayer-ubuntu1804_container-x64-1
- To restart a Docker container, restart the associated systemd service:
  systemctl restart jenkins-test-softlayer-ubuntu1804_container-x64-1
- The CONTAINER ID is needed to console into a Docker container. First run:
  docker ps -f "name=test-softlayer-ubuntu1804_container-x64-1"
  That will give you:
  CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
  9f3272e43017 node-ci:test-softlayer-ubuntu1804_container-x64-1 "/bin/sh -c 'cd /hom…" 19 minutes ago Up 19 minutes node-ci-test-softlayer-ubuntu1804_container-x64-1
  Then run the following, replacing CONTAINER_ID with the appropriate ID:
  docker exec -it ${CONTAINER_ID} /bin/bash
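The naming convention described in this list can be captured in a small helper so the service name does not have to be typed by hand. A minimal sketch; the service_for_worker name is hypothetical, and the commented docker line assumes that exactly one running container matches the name filter (docker ps -q prints only container IDs).

```shell
#!/bin/sh
# Print the systemd service name for a containerized worker, following the
# jenkins-${workername} convention.
service_for_worker() {
  printf 'jenkins-%s\n' "$1"
}

# Usage on the machine hosting the container:
#   worker=test-softlayer-ubuntu1804_container-x64-1
#   systemctl status "$(service_for_worker "$worker")"
#   journalctl -u "$(service_for_worker "$worker")"
#   docker exec -it "$(docker ps -q -f "name=$worker")" /bin/bash
```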
Windows
On Windows, running https://ci.nodejs.org/view/All/job/git-clean-windows/ and https://ci.nodejs.org/view/All/job/windows-update-reboot/ with force reboot may or may not help, but should be harmless.
IDK what to do
In case the above steps did not work, or you are unsure of what to try, this section is for you.
The first thing to remember is that, ultimately, all workers can be replaced with newly provisioned ones, so don't worry too much about messing up a worker.
The safest bet when dealing with an erroring worker is to re-run its associated Ansible playbook. This will try and restore the worker back to its desired state, including refreshing and restarting the Jenkins agent configuration.
If none of the above steps work, please post in the
Node.js Build Slack channel, or the
nodejs/build issue tracker, to allow
for escalation and other WG members to troubleshoot.
For problems with machines outside of the Jenkins test cluster, ask one of the infra or release administrators to take a look.