Our current solution uses Jenkins to start a Nomad job, which starts an (unprivileged) docker container in which a developer's Dockerfile is built (as root) using the docker on the host.
The goal is to replace the docker build in the container with buildah, so that we don't need to make the docker on the host available inside the container.
Unfortunately the path to this wasn't straightforward; a lot of yaks needed shaving.
Start of the journey
We start with a basic container in which we install buildah:
# docker run --rm -ti centos:7 /bin/bash
[root@7387c68139dd /]# yum -y install buildah
And a very simple Dockerfile
FROM centos:7
RUN uptime
Yak 1 - overlay problems
Out of the box running buildah in the container will give an overlay error.
# buildah bud -t test .
ERRO[0000] 'overlay' is not supported over extfs at "/var/lib/containers/storage/overlay"
ERRO[0000] 'overlay' is not supported over extfs at "/var/lib/containers/storage/overlay"
kernel does not support overlay fs: 'overlay' is not supported over extfs at "/var/lib/containers/storage/overlay": backing file system is unsupported for this graph driver
kernel does not support overlay fs: 'overlay' is not supported over extfs at "/var/lib/containers/storage/overlay": backing file system is unsupported for this graph driver
Spoiler: The real reason this doesn't work is because it tries to do a mount call, which can only be done with the SYS_ADMIN capability (or in a privileged container).
Using --storage-driver vfs fixed this problem.
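Whether overlay has a chance of working in a given container comes down to whether the process holds CAP_SYS_ADMIN (capability number 21). As a rough sketch, the check could be scripted as below. The function names `has_cap` and `pick_storage_driver` are made up for this post, the script assumes the Linux-specific `CapEff` line in /proc/self/status, and holding the capability is necessary but may not be sufficient if other restrictions apply.

```shell
#!/bin/sh
# has_cap MASK BIT: succeed when bit number BIT is set in the hex capability
# mask MASK (the format of the CapEff line in /proc/<pid>/status).
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# pick_storage_driver [MASK]: print "overlay" when CAP_SYS_ADMIN (bit 21) is
# present in MASK, otherwise "vfs". Without an argument, the mask of the
# current process is read from /proc/self/status (assumption: Linux procfs).
pick_storage_driver() {
    mask=${1:-$(awk '/^CapEff:/ {print $2}' /proc/self/status)}
    if has_cap "$mask" 21; then
        echo overlay
    else
        echo vfs
    fi
}
```

It could then be used as `buildah --storage-driver "$(pick_storage_driver)" bud -t test .`, so the same job definition works with and without the extra capability.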
On to the next one.
Yak 2 - mount namespace error aka unshare(CLONE_NEWNS) permission aka the wrong yak
Spoiler: this yak is a red herring
# buildah --storage-driver vfs bud -t test .
STEP 1: FROM centos:7
Getting image source signatures
Copying blob sha256:aeb7866da422acc7e93dcf7323f38d7646f6269af33bcdb6647f2094fc4b3bf7
71.24 MiB / 71.24 MiB [====================================================] 4s
Copying config sha256:75835a67d1341bdc7f4cc4ed9fa1631a7d7b6998e9327272afea342d90c4ab6d
2.13 KiB / 2.13 KiB [======================================================] 0s
Writing manifest to image destination
Storing signatures
STEP 2: RUN uptime
error running container: error creating new mount namespace for [/bin/sh -c uptime]: operation not permitted
error building at step {Env:[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin] Command:run Args:[uptime] Flags:[] Attrs:map[] Message:RUN uptime Original:RUN uptime}: exit status 1
strace to the rescue
unshare(CLONE_NEWNS) = -1 EPERM (Operation not permitted)
After some googling I found that CentOS/RHEL kernels have user namespaces disabled by default and need a kernel parameter set to get this working.
We can enable this by running the following on the host:
sudo grubby --args="namespace.unpriv_enable=1 user_namespace.enable=1" --update-kernel="$(grubby --default-kernel)"
We also set the maximum number of user namespaces that any user in the current user namespace may create by running:
echo "user.max_user_namespaces=15000" >> /etc/sysctl.conf
Now we can reboot the server and come to the conclusion that it still doesn't work.
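Even though this yak turned out to be the wrong one, it can be handy to verify after the reboot that the sysctl setting actually took effect. A minimal sketch, assuming the kernel exposes the limit at /proc/sys/user/max_user_namespaces (the function name `userns_limit_ok` is purely illustrative):

```shell
#!/bin/sh
# userns_limit_ok [FILE]: succeed when the user namespace limit stored in
# FILE is a number greater than zero. FILE defaults to the procfs location
# of the user.max_user_namespaces sysctl.
userns_limit_ok() {
    file=${1:-/proc/sys/user/max_user_namespaces}
    [ -r "$file" ] || return 1
    limit=$(cat "$file")
    case $limit in
        ''|*[!0-9]*) return 1 ;;   # empty or not a plain number
    esac
    [ "$limit" -gt 0 ]
}

if userns_limit_ok; then
    echo "user namespaces: enabled"
else
    echo "user namespaces: disabled or not available"
fi
```

A positive result only says the limit is non-zero; as the next yak shows, it does not mean buildah will be happy.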
Yak 3 - outdated buildah version
Thanks to the #buildah channel on freenode, I found out that the problem of yak 2 was actually caused by an outdated buildah version.
CentOS only has a buildah 1.2 RPM, but 1.4 or higher was needed, so I had to build my own.
You can have this pleasure too with the following script, which uses a modified RPM spec.
Run a new centos:7 container
# docker run -ti -v /tmp:/tmp centos:7 /bin/bash
and run the following commands in the container:
yum -y group install development
yum -y install wget
mkdir -p /root/rpmbuild/SOURCES /root/rpmbuild/SPECS
cd /root/rpmbuild/SOURCES
wget "https://github.com/containers/buildah/tarball/608fa843cce45e7ee58ccb71a90297b645a984d3" -O buildah-608fa84.tar.gz
tar zxvf buildah-608fa84.tar.gz
mv containers-buildah-608fa84 buildah-608fa843cce45e7ee58ccb71a90297b645a984d3
tar zcvf buildah-608fa84.tar.gz buildah-608fa843cce45e7ee58ccb71a90297b645a984d3
rm -rf buildah-608fa843cce45e7ee58ccb71a90297b645a984d3
cd ../SPECS
wget https://gist.githubusercontent.com/42wim/848fba2ed2d64d457f56eeebef0e85a2/raw/bb3ad3c524529ed921626fb077b8ff78a56783fc/buildah.spec -O buildah.spec
yum-builddep -y buildah.spec
rpmbuild -ba buildah.spec
This will give you your RPMs
Wrote: /root/rpmbuild/SRPMS/buildah-1.4-1.git608fa84.el7.centos.src.rpm
Wrote: /root/rpmbuild/RPMS/x86_64/buildah-1.4-1.git608fa84.el7.centos.x86_64.rpm
Wrote: /root/rpmbuild/RPMS/x86_64/buildah-debuginfo-1.4-1.git608fa84.el7.centos.x86_64.rpm
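Before retrying the build it may be worth checking that the installed buildah really meets the 1.4 minimum. The helper below is a small sketch of a dotted-version comparison in plain shell. `version_ge` is a name invented here, it only handles purely numeric components, and the usage line assumes that the version number is the third word of the `buildah --version` output.

```shell
#!/bin/sh
# version_ge HAVE WANT: succeed when the dotted numeric version HAVE is
# greater than or equal to WANT. Missing components count as 0.
version_ge() {
    have=$1
    want=$2
    while [ -n "$have" ] || [ -n "$want" ]; do
        h=${have%%.*}
        w=${want%%.*}
        h=${h:-0}
        w=${w:-0}
        if [ "$h" -gt "$w" ]; then return 0; fi
        if [ "$h" -lt "$w" ]; then return 1; fi
        # Components are equal: drop them and compare the next pair.
        case $have in *.*) have=${have#*.} ;; *) have= ;; esac
        case $want in *.*) want=${want#*.} ;; *) want= ;; esac
    done
    return 0
}

# Possible usage (assumption: the third word of the output is the version):
# version_ge "$(buildah --version | awk '{print $3}')" 1.4 || echo "buildah too old"
```

Comparing component by component avoids the trap of a plain string comparison, where 1.10 would sort before 1.4.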
Yak 4 - proc mount error
Progress, a new error when running buildah 1.4!
# buildah --storage-driver vfs bud -t test .
STEP 1: FROM centos:7
Getting image source signatures
Copying blob sha256:205941c9c2d103bcdff0bc72d8836e0ffc4573ec0e6e524ec1a59606062a289f
71.25 MiB / 71.25 MiB [====================================================] 4s
Copying config sha256:e26dc8af6a3b1856b9f4a893d5b51855c02dfe3b9cec58a4e55002036528c669
2.14 KiB / 2.14 KiB [======================================================] 0s
Writing manifest to image destination
Storing signatures
STEP 2: RUN uptime
container_linux.go:336: starting container process caused "process_linux.go:399: container init caused \"rootfs_linux.go:58: mounting \\\"/proc\\\" to rootfs \\\"/tmp/buildah596035765/mnt/rootfs\\\" at \\\"/proc\\\" caused \\\"operation not permitted\\\"\""
error running container: error creating container for [/bin/sh -c uptime]: : exit status 1
error building at step {Env:[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin] Command:run Args:[uptime] Flags:[] Attrs:map[] Message:RUN uptime Original:RUN uptime}: exit status 1
ERRO[0012] exit status 1
Again thanks to the #buildah channel, I found out that running with --isolation chroot would solve it.
Victory!
Finally it works: we have an image built by buildah running in an unprivileged container.
# buildah --storage-driver vfs bud --isolation chroot -t test .
STEP 1: FROM centos:7
STEP 2: RUN uptime
21:30:55 up 32 min, 0 users, load average: 0.39, 0.12, 0.08
STEP 3: COMMIT containers-storage:[vfs@/var/lib/containers/storage+/var/run/containers/storage]localhost/test:latest
Getting image source signatures
Skipping fetch of repeat blob sha256:f972d139738dfcd1519fd2461815651336ee25a8b54c358834c50af094bb262f
Skipping fetch of repeat blob sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1
Copying config sha256:26e3b2177f9e9db1bdc8f49083d09dbb980a99ed4e606f4dc45b79ca865588ce
1.17 KiB / 1.17 KiB [======================================================] 0s
Writing manifest to image destination
Storing signatures
--> 26e3b2177f9e9db1bdc8f49083d09dbb980a99ed4e606f4dc45b79ca865588ce
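Since the build is meant to be started from a Jenkins/Nomad job, the two flags that made it work can be wrapped in a small helper so they are not forgotten. This is only a sketch: `build_image` and the `DRY_RUN` variable are hypothetical names, not part of buildah.

```shell
#!/bin/sh
# build_image TAG [CONTEXT]: build CONTEXT (default: current directory) with
# the flags that worked in the unprivileged container. With DRY_RUN=1 the
# command line is printed instead of executed.
build_image() {
    tag=$1
    context=${2:-.}
    # Assemble the command as positional parameters so quoting is preserved.
    set -- buildah --storage-driver vfs bud --isolation chroot -t "$tag" "$context"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

Calling `build_image test .` runs the same command as shown above; setting `DRY_RUN=1` first makes it easy to check in a job log what would be executed.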
But after some testing, a new yak appears.
Yak 5 - a lot of disk space
This one is related to yak 1. Because we're using the vfs storage driver, disk space is not used very efficiently (according to https://docs.docker.com/storage/storagedriver/vfs-driver/): a more complicated docker build uses gigabytes of disk with the vfs storage driver, far more than with the overlay driver.
To run with the overlay driver we need access to the mount call, which means we have to run our docker container with CAP_SYS_ADMIN, which is unfortunate.
# docker run --rm --cap-add SYS_ADMIN -ti centos:7 /bin/bash
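To see how much space a build actually costs with each driver, the size of the storage directory can be compared before and after a build. A small sketch, assuming the storage root /var/lib/containers/storage that appears in the output earlier (`storage_kb` is an illustrative name):

```shell
#!/bin/sh
# storage_kb [DIR]: print the disk usage of DIR in kilobytes.
# DIR defaults to the storage root buildah used in the builds above.
storage_kb() {
    du -sk "${1:-/var/lib/containers/storage}" | awk '{print $1}'
}

# Example: measure how much a single build adds.
# before=$(storage_kb)
# buildah --storage-driver vfs bud --isolation chroot -t test .
# after=$(storage_kb)
# echo "build used $((after - before)) KB"
```

Running the same measurement once with vfs and once with overlay (in a container with SYS_ADMIN) gives a concrete number for the trade-off.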
Conclusion
It's possible to run buildah in an unprivileged container, but only with the vfs storage driver, so beware of the disk usage when building images!
Interesting links:
- https://github.com/containers/fuse-overlayfs this will probably fix the CAP_SYS_ADMIN issue for overlay
- https://kinvolk.io/blog/2018/04/towards-unprivileged-container-builds/ an overview of what the problem is and what's getting fixed regarding unprivileged builds, a must-read!
- https://buildah.io everything about buildah
Hello, when I follow your process, the first step is the following:
```
[root@b07810c3fc08 opt]# buildah bud -t test .
Error during unshare(CLONE_NEWUSER): Operation not permitted
ERRO[0000] error parsing PID "": strconv.Atoi: parsing "": invalid syntax
ERRO[0000] (unable to determine exit status)
```
Should I add other parameters when docker starts the container?
Zhou,
To resolve this problem you need to run the container with --cap-add SYS_ADMIN or as a privileged container.