To properly cooperate with the jenkins-dynamatrix project driving regular NUT CI builds, each build environment should be exposed as an individual agent with labels describing its capabilities.
With the jenkins-dynamatrix, agent labels are used to calculate a large "slow build" matrix to cover numerous scenarios for what can be tested with the current population of the CI farm, across operating systems, make, shell and compiler implementations and versions, and C/C++ language revisions, to name a few common "axes" involved.
Emulated-CPU container builds are CPU-intensive, so for them we define as few capabilities as possible: here CI is more interested in checking how binaries behave on those CPUs, not in checking the quality of recipes (distcheck, Make implementations, etc.), shell scripts or documentation, which are more efficient to test on native platforms.
Still, we are interested in results from different compiler suites, so specify at least one version of each.
Currently the NUT Jenkinsfile-dynamatrix only looks at various COMPILER variants for qemu-nut-builder use-cases, disregarding the versions and just using one that the environment defaults to.
The reduced set of labels for QEMU workers looks like:
qemu-nut-builder qemu-nut-builder:alldrv NUT_BUILD_CAPS=drivers:all NUT_BUILD_CAPS=cppunit OS_FAMILY=linux OS_DISTRO=debian11 GCCVER=10 CLANGVER=11 COMPILER=GCC COMPILER=CLANG ARCH64=ppc64le ARCH_BITS=64
For contrast, a "real" build agent’s set of labels, depending on presence or known lack of some capabilities, looks something like this:
doc-builder nut-builder nut-builder:alldrv NUT_BUILD_CAPS=docs:man NUT_BUILD_CAPS=docs:all NUT_BUILD_CAPS=drivers:all NUT_BUILD_CAPS=cppunit=no OS_FAMILY=bsd OS_DISTRO=freebsd12 GCCVER=10 CLANGVER=10 COMPILER=GCC COMPILER=CLANG ARCH64=amd64 ARCH_BITS=64 SHELL_PROGS=sh SHELL_PROGS=dash SHELL_PROGS=zsh SHELL_PROGS=bash SHELL_PROGS=csh SHELL_PROGS=tcsh SHELL_PROGS=busybox MAKE=make MAKE=gmake PYTHON=python2.7 PYTHON=python3.8
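To make the label scheme concrete, here is a minimal shell sketch that extracts all values of one `KEY=VALUE` axis from an agent's label string. The helper name `label_values` is hypothetical and is not part of jenkins-dynamatrix, which does its own label processing in the shared library; the label string is abridged from the QEMU example above.

```shell
#!/bin/sh
# Hypothetical helper (not part of jenkins-dynamatrix): print every value
# that an agent declares for one KEY=VALUE label axis, one per line.
label_values() {
    axis="$1"
    # Intentionally unquoted: split the label string on whitespace
    for label in $2; do
        case "$label" in
            "$axis"=*) printf '%s\n' "${label#*=}" ;;
        esac
    done
}

# Label set abridged from the QEMU worker example
QEMU_LABELS='qemu-nut-builder NUT_BUILD_CAPS=drivers:all NUT_BUILD_CAPS=cppunit OS_FAMILY=linux OS_DISTRO=debian11 GCCVER=10 CLANGVER=11 COMPILER=GCC COMPILER=CLANG ARCH64=ppc64le ARCH_BITS=64'

label_values COMPILER "$QEMU_LABELS"        # prints GCC and CLANG, one per line
label_values NUT_BUILD_CAPS "$QEMU_LABELS"  # prints drivers:all and cppunit
```

Only the text up to the first `=` is treated as the axis name, so a label such as `NUT_BUILD_CAPS=cppunit=no` yields the value `cppunit=no`.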
ci-debian-altroot--jenkins-debian10-arm64 (note the pattern for "Conflicts With" detailed below)
Remote root directory: preferably unique per agent, to avoid surprises; e.g.: /home/abuild/jenkins-nut-altroots/jenkins-debian10-armel
.ccache or .gitcache-dynamatrix are available to all builders with identical contents
/dev/shm on modern Linux distributions); roughly estimate 300 MB per executor for NUT builds.
Node properties / Environment variables:
PATH+LOCAL ⇒ /usr/lib/ccache
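Jenkins treats an environment variable named like `PATH+SOMETHING` as a request to prepend its value to `PATH`. A plain-shell equivalent of the node property above can be sketched as follows; it assumes that `/usr/lib/ccache` holds compiler-named symlinks to `ccache`, as packaged on Debian-like distributions.

```shell
#!/bin/sh
# Plain-shell equivalent of the PATH+LOCAL node property: put the ccache
# wrapper directory first, so compiler names resolve to ccache symlinks.
PATH="/usr/lib/ccache:$PATH"
export PATH

# Show which gcc would be picked up now (if any is installed at all)
command -v gcc || echo "gcc not found in PATH"
```

Running this inside a build container is a quick way to confirm that the compiler seen by the build is the ccache wrapper rather than the real binary.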
Depending on circumstances of the container, there are several options available to the NUT CI farm:
agent.jar JVM would run in the container. Filesystem for the abuild account may or may not be shared with the host.
ssh or chroot (networking not required, but bind-mount of /home/abuild and maybe other paths from host would be needed) called for executing sh steps in the container environment. Either way, the home directory of the abuild account is maintained on the host and shared with the guest environment; user and group IDs should match.
As time moves on and Jenkins core and its plugins get updated, support for some older run-time features of the build agents can get removed (e.g. older Java releases, older Git tooling). While there are projects like Temurin that provide Java builds for older systems, at some point a switch to the "Jenkins agent on new host going into older build container" approach can become unavoidable. One clue to look for in build logs is a failure message like:
Caused by: java.lang.UnsupportedClassVersionError: hudson/slaves/SlaveComputer$SlaveVersion has been compiled by a more recent version of the Java Runtime (class file version 61.0), this version of the Java Runtime only recognizes class file versions up to 55.0
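The class file versions in such a message map to Java releases by a fixed offset (major version minus 44, for Java 5 and newer). The small sketch below does that arithmetic; the function name `class_version_to_java` is made up for illustration.

```shell
#!/bin/sh
# Translate a class file version (as printed in UnsupportedClassVersionError)
# into the Java release number: major version minus 44, valid for Java 5+.
class_version_to_java() {
    major="${1%%.*}"            # "61.0" -> "61"
    echo $(( major - 44 ))
}

class_version_to_java 61.0   # 17: the release the agent classes were compiled for
class_version_to_java 55.0   # 11: the newest release this agent's JVM can load
```

So the message quoted above means that the controller now ships agent code built for Java 17, while the build environment only offers Java 11.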
This is a typical use-case for tightly integrated build farms under common management, where the Jenkins controller can log by SSH into systems which act as its build agents. It injects and launches the agent.jar to execute child processes for the builds, and maintains a tunnel to communicate.
Methods below involving SSH assume that you have configured a password-less key authentication from the host machine to the abuild account in each guest build environment container.
This can be an ssh-keygen result posted into authorized_keys, or a trusted key passed by a chain of ssh agents from a Jenkins Credential for connection to the container-hoster into the container.
The private SSH key involved may be secured by a pass-phrase, as long as
your Jenkins Credential storage knows it too.
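As an illustration of the first variant (an `ssh-keygen` result posted into `authorized_keys`), the following sketch wraps the usual OpenSSH commands into one function. The function name, key file path and key comment are assumptions made for this example, not farm conventions.

```shell
#!/bin/sh
# Sketch: create a dedicated key pair on the host and install its public
# half for the abuild account in one guest container.
# The function name, key path and comment are illustrative assumptions.
setup_agent_key() {
    guest="$1"                                  # e.g. jenkins-debian10-amd64
    keyfile="${2:-$HOME/.ssh/id_ed25519_nutci}"

    # Generate the key pair once; -N '' means no pass-phrase
    [ -f "$keyfile" ] \
        || ssh-keygen -t ed25519 -N '' -C 'nut-ci-agent' -f "$keyfile" \
        || return

    # Append the public key to ~abuild/.ssh/authorized_keys in the guest
    # (asks for the abuild password this one time)
    ssh-copy-id -i "$keyfile.pub" "abuild@$guest" || return

    # Verify: BatchMode forbids password prompts, so this only succeeds
    # if key authentication works
    ssh -o BatchMode=yes -i "$keyfile" "abuild@$guest" true
}

# Usage (run on the container-hoster):
#   setup_agent_key jenkins-debian10-amd64
```

The final `ssh -o BatchMode=yes` call is the useful part to keep around: it fails instead of prompting, which is the same behavior an unattended agent launch needs.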
Note that for the approaches explored below, the containers are not
directly exposed for log-in from any external network.
For passing the agent through an SSH connection from host to container, so that the agent.jar runs inside the container environment, configure:
Host, Credentials, Port: as suitable for accessing the container-hoster
The container-hoster should have accessed the guest container from the account used for intermediate access, e.g. abuild, so that its .ssh/known_hosts file would trust the SSH server on the container.
Prefix Start Agent Command: content depends on the container name, but generally looks like the example below to report some info about the final target platform (and make sure java is usable) in the agent’s log. Note that it ends with an unclosed quote and a space character:
ssh jenkins-debian10-amd64 '( java -version & uname -a ; getconf LONG_BIT; getconf WORD_BIT; wait ) &&
'
The other option is to run the agent.jar on the host, for all the network and filesystem magic the agent does, and only execute shell steps in the container. The solution relies on an overridden sh step implementation in the jenkins-dynamatrix shared library that uses a magic CI_WRAP_SH environment variable to execute a pipe into the container. Such pipes can be ssh or chroot with appropriate host setup described above.
In case of ssh piping, remember that the container’s /etc/ssh/sshd_config should contain AcceptEnv * and the SSH server should be restarted after such a configuration change.
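A quick way to confirm the server-side setting is to grep the configuration file for it. The check below is a simplified sketch: the function name `accepts_all_env` is hypothetical, and it inspects a single file without following `Include` or `Match` directives. It is demonstrated on a scratch file rather than the real `/etc/ssh/sshd_config`.

```shell
#!/bin/sh
# Hypothetical check: does an sshd_config file contain an "AcceptEnv"
# line listing the "*" wildcard? Exit status 0 means yes.
# Note: this only greps one file and ignores Include/Match directives.
accepts_all_env() {
    grep -Eiq '^[[:space:]]*AcceptEnv[[:space:]]+(.*[[:space:]])?\*([[:space:]]|$)' "$1"
}

# Demonstration on a scratch copy rather than the real /etc/ssh/sshd_config
demo_conf="$(mktemp)"
printf '%s\n' '# scratch sshd_config' 'AcceptEnv LANG LC_*' > "$demo_conf"
accepts_all_env "$demo_conf" || echo "wildcard not accepted yet"

printf '%s\n' 'AcceptEnv *' >> "$demo_conf"
accepts_all_env "$demo_conf" && echo "AcceptEnv * is present"
rm -f "$demo_conf"
```

How the SSH server gets restarted depends on the guest's init system, so that step is left out of the sketch.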
Prefix Start Agent Command: content depends on the container name, but generally looks like the example below to report some info about the final target platform (and make sure it is accessible) in the agent’s log. Note that it ends with a space char, and that the command here should not normally print anything into stderr/stdout (this tends to confuse the Jenkins Remoting protocol):
echo PING > /dev/tcp/jenkins-debian11-ppc64el/22 &&
Node properties / Environment variables:
CI_WRAP_SH ⇒ ssh -o SendEnv='*' "jenkins-debian11-ppc64el" /bin/sh -xe
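The piping idea behind `CI_WRAP_SH` can be sketched in plain shell as below. This is only an illustration of the concept: the actual logic lives in the jenkins-dynamatrix shared library and may differ in details, and `run_step` is a hypothetical name. A local `/bin/sh -e` stands in for the ssh or chroot wrapper so that the sketch runs anywhere.

```shell
#!/bin/sh
# Sketch of the piping idea only; the real logic lives in the
# jenkins-dynamatrix shared library and may differ in details.
# run_step is a hypothetical name.
run_step() {
    if [ -n "${CI_WRAP_SH:-}" ]; then
        # Feed the step script to the wrapper command on stdin;
        # "sh -c" lets a shell parse any quotes inside CI_WRAP_SH
        printf '%s\n' "$1" | sh -c "$CI_WRAP_SH"
    else
        # No wrapper configured: run the step locally
        printf '%s\n' "$1" | /bin/sh -e
    fi
}

# Local stand-in for the ssh/chroot wrapper, so the sketch runs anywhere
CI_WRAP_SH='/bin/sh -e'
run_step 'echo "step runs on: $(uname -s)"'
```

With the real value from the node property, the same script text would travel over the SSH connection and be interpreted by `/bin/sh -xe` inside the container.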
This approach allows remote systems to participate in the NUT CI farm by dialing in and so defining an agent. A single contributing system may be running a number of containers or virtual machines set up following the instructions above, and each of those would be a separate build agent.
Such systems should be "dedicated" to contribution in the sense that they should stay up and connected for days at a time, with build tasks landing on them only occasionally.
Configuration files maintained on the Swarm Agent system dictate which labels or how many executors it would expose, etc. Credentials to access the NUT CI farm Jenkins controller to register as an agent should be arranged with the farm maintainers, and currently involve a GitHub account with Jenkins role assignment for such access, and a token for authentication.
The jenkins-swarm-nutci repository contains example code from such a setup with a back-up server experiment for the NUT CI farm, including auto-start method scripts for Linux systemd and upstart, illumos SMF, and OpenBSD rcctl.
Another aspect of farm management is that emulation is a slow and intensive operation, so we can not run all agents and execute builds at the same time.
The current solution relies on https://github.com/jimklimov/conflict-aware-ondemand-retention-strategy-plugin to allow co-located build agents to "conflict" with each other — when one picks up a job from the queue, it blocks neighbors from starting; when it is done, another may start.
Containers can be configured with "Availability ⇒ On demand", with shorter cycle to switch over faster (the core code sleeps a minute between attempts):
0;
0 (Jenkins may change it to 1);
^ci-debian-altroot--.*$ assuming that is the pattern for agent definitions in Jenkins (not necessarily linked to hostnames).
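To preview which agent definitions a "Conflicts With" pattern would cover, the pattern can be tried against candidate names. In the sketch below `grep -E` is a stand-in, on the assumption that the plugin evaluates the pattern as a Java regular expression, which agrees with POSIX extended syntax for a simple pattern like this one. The first agent name comes from the text above; the other two are invented for contrast.

```shell
#!/bin/sh
# Check which agent names a "Conflicts With" pattern would match.
# grep -E is a stand-in here (assumption: the plugin uses Java regular
# expressions, which agree with ERE for a simple pattern like this).
conflicts_with() {
    # $1 = pattern, $2 = agent name; exit status 0 on match
    printf '%s\n' "$2" | grep -Eq -- "$1"
}

PATTERN='^ci-debian-altroot--.*$'
for agent in \
    ci-debian-altroot--jenkins-debian10-arm64 \
    ci-debian-altroot--jenkins-debian11-ppc64el \
    some-standalone-agent
do
    if conflicts_with "$PATTERN" "$agent"; then
        echo "$agent: conflicts"
    else
        echo "$agent: independent"
    fi
done
```

Agents reported as conflicting would block each other while one of them is busy; the stand-alone name is unaffected.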
Also, the "executors" count should be reduced to the number of compilers in that system (usually 2), to avoid the extra stress of scheduling too many emulated-CPU builds at once.
As part of the jenkins-dynamatrix optional optimizations, the NUT CI recipe invoked via Jenkinsfile-dynamatrix maintains persistent git reference repositories that can be used to cache NUT codebase (including the tested commits) and so considerably speed up workspace preparation when running numerous build scenarios on the same agent.
Such .gitcache-dynamatrix cache directories are located in the build workspace location (unique for each agent), but on a system with numerous containers these names can be symlinks pointing to a shared location.
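The symlink arrangement can be tried out safely under a scratch directory before applying it to real agent directories. In this sketch every path is illustrative: the `shared/gitcache-dynamatrix` location is an assumption, and the two agent directory names only mimic the naming used earlier in this chapter.

```shell
#!/bin/sh
# Sketch: two agent workspaces share one reference-repository cache via
# symlinks. All paths live under a scratch directory and are illustrative.
demo_root="$(mktemp -d)"
shared_cache="$demo_root/shared/gitcache-dynamatrix"
mkdir -p "$shared_cache"

for agent in jenkins-debian10-armel jenkins-debian10-arm64; do
    workspace="$demo_root/jenkins-nut-altroots/$agent"
    mkdir -p "$workspace"
    # The per-agent cache name becomes a pointer to the shared location
    ln -s "$shared_cache" "$workspace/.gitcache-dynamatrix"
done

# Content written through one agent's path is visible through the other's
touch "$demo_root/jenkins-nut-altroots/jenkins-debian10-armel/.gitcache-dynamatrix/marker"
ls "$demo_root/jenkins-nut-altroots/jenkins-debian10-arm64/.gitcache-dynamatrix"

# Clean up when done:  rm -rf "$demo_root"
```

Because all agents then write into the same directory, concurrent updates need to be coordinated, which is what the locking labels described next are for.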
To avoid collisions with several executors updating the same cache with new commits, critical access windows are serialized with the use of the Lockable Resources plugin. On the jenkins-dynamatrix side this is facilitated by labels:
DYNAMATRIX_UNSTASH_PREFERENCE=scm-ws:nut-ci-src DYNAMATRIX_REFREPO_WORKSPACE_LOCKNAME=gitcache-dynamatrix:SHARED_HYPERVISOR_NAME
DYNAMATRIX_UNSTASH_PREFERENCE tells the jenkins-dynamatrix library code which checkout/unstash strategy to use on a particular build agent (following values defined in the library; scm-ws means SCM caching under the agent workspace location, nut-ci-src names the cache for this project);
DYNAMATRIX_REFREPO_WORKSPACE_LOCKNAME specifies a semi-unique string: it should be the same for all co-located agents which use the same shared cache location, e.g. guests on the same hypervisor; and it should be different for unrelated cache locations, e.g. different hypervisors and stand-alone machines.
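The naming rule for the lock can be sketched as a tiny generator that takes the name of the machine holding the cache and prints both labels. The helper `refrepo_labels` and the two machine names are hypothetical; only the label keys and the `gitcache-dynamatrix:` prefix come from the example above.

```shell
#!/bin/sh
# Hypothetical helper: build the two labels for an agent, given the name
# of the hypervisor (or stand-alone machine) that holds its cache.
refrepo_labels() {
    printf '%s\n' \
        'DYNAMATRIX_UNSTASH_PREFERENCE=scm-ws:nut-ci-src' \
        "DYNAMATRIX_REFREPO_WORKSPACE_LOCKNAME=gitcache-dynamatrix:$1"
}

# Guests on the same hypervisor get the same lock name ...
refrepo_labels hypervisor-a
# ... while an unrelated machine gets a different one
refrepo_labels standalone-b
```

Every agent defined for guests of `hypervisor-a` would carry the identical lock name and so take turns updating the shared cache, while `standalone-b` never waits for them.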