Saturday, March 30, 2019

Routable IPv6 containers with podman

Hacking podman to get "rootless" routable IPv6 containers using a small root daemon.



Podman is great, but before it can replace our current docker setup it needs IPv6 support. It has that through slirp4netns, but the resulting address isn't reachable from other containers or from outside the host.

We don't care about incoming legacy IP (IPv4).


What do we want

When a user starts a container, the container should get a routable IPv6 address and register its name in consul. That way multiple containers can talk to each other, no matter which host they were started on. All of this needs to work on CentOS 7.6.


What do we need


From podman

  • The id of the user that started the container
  • The PID of the container so we can use this to enter the same network namespace
  • The name of the container so we can register this in consul
  • A way to talk to v6pod

External to podman (v6pod will handle this)

  • Be compatible with our current docker IPv6 ranges (/80)
  • Create a bridge and add the gateway IPv6 address (::1) to it
  • Create a veth pair
  • Generate a (dynamic) IPv6 address in the /80 range and add it to the veth that goes into the container
  • Add one of the veths to the bridge and move the other into the network namespace of the user
  • Add a default IPv6 route to the bridge
  • Register the name of the container to the generated IPv6 address in consul
  • Deregister the name when the container stops
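
The "generate a (dynamic) IPv6 address in the /80 range" step can be sketched in Go as follows. This is a minimal sketch, not the actual v6pod code: the function names and the documentation prefix 2001:db8:1:2:3::/80 are made up for illustration, and picking the host part at random is an assumption about what "dynamic" means here.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"net"
)

// isReserved reports whether the 48-bit host part is ::0 or ::1.
// We skip those because ::1 is used as the gateway address on the bridge.
func isReserved(host []byte) bool {
	for _, b := range host[:len(host)-1] {
		if b != 0 {
			return false
		}
	}
	return host[len(host)-1] <= 1
}

// randomAddrIn80 (hypothetical helper) returns a random address inside the
// given IPv6 /80 prefix: the first 10 bytes come from the prefix, the last
// 6 bytes are random.
func randomAddrIn80(prefix string) (net.IP, error) {
	_, ipnet, err := net.ParseCIDR(prefix)
	if err != nil {
		return nil, err
	}
	if ones, bits := ipnet.Mask.Size(); ones != 80 || bits != 128 {
		return nil, fmt.Errorf("expected an IPv6 /80 prefix, got %q", prefix)
	}
	addr := make(net.IP, net.IPv6len)
	copy(addr, ipnet.IP)
	for {
		if _, err := rand.Read(addr[10:]); err != nil {
			return nil, err
		}
		if !isReserved(addr[10:]) {
			return addr, nil
		}
	}
}

func main() {
	// 2001:db8::/32 is the documentation range; substitute your own /80.
	addr, err := randomAddrIn80("2001:db8:1:2:3::/80")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s/80\n", addr)
}
```

With 48 random bits a collision is unlikely, but the sketch does not check whether the address is already in use on the bridge; a real daemon would probably want to.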

Modifying libpod/podman

1) Executing user

Some investigation into what happens when running podman run (rootless):
Podman creates a user namespace, joins it, becomes root in it and re-executes itself inside that namespace.
We need to save the ID of the executing user somewhere, and the environment looks like a good place.
So we create a v6pod_user variable which contains the user ID of the user running podman.
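
As a rough illustration of the idea (not the actual libpod patch), the environment for the re-executed process could be prepared like this. The helper name withV6podUser is made up; the important part is that the user ID is read before joining the user namespace, because afterwards os.Getuid() would report 0.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// withV6podUser (hypothetical helper) returns a copy of env in which
// v6pod_user is set to uid, replacing any value that was already there.
func withV6podUser(env []string, uid int) []string {
	const key = "v6pod_user="
	out := make([]string, 0, len(env)+1)
	for _, kv := range env {
		if !strings.HasPrefix(kv, key) {
			out = append(out, kv)
		}
	}
	return append(out, key+strconv.Itoa(uid))
}

func main() {
	// Must run before the process re-executes itself in the user namespace.
	env := withV6podUser(os.Environ(), os.Getuid())
	// The new variable is always the last entry.
	fmt.Println(env[len(env)-1])
}
```

The resulting slice would then be passed as the environment of the re-executed podman process, for example through the Env field of an exec.Cmd.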


2) Pid of the container

This could probably live somewhere better, but I kept it in the same method.
We don't have access to the container PID at that point because the container hasn't started yet, but we do already have the container ID that will be used.
So I save the container ID in the v6pod_id environment variable.
v6pod then reads the file "/run/user/" + userID + "/runc/" + containerID + "/state.json" to get the PID.
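
A sketch of that lookup on the v6pod side, under the assumption that the PID is stored in the init_process_pid field of runc's state.json (check this against the runc version you run). The function names and the sample JSON are illustrative only.

```go
package main

import (
	"encoding/json"
	"fmt"
	"path/filepath"
)

// statePath builds the path of the runc state file for a rootless container.
func statePath(userID, containerID string) string {
	return filepath.Join("/run/user", userID, "runc", containerID, "state.json")
}

// runcState holds the only field we care about.
// Assumption: runc writes the container's init PID as "init_process_pid".
type runcState struct {
	InitProcessPid int `json:"init_process_pid"`
}

// pidFromState extracts the container PID from the contents of state.json.
func pidFromState(data []byte) (int, error) {
	var st runcState
	if err := json.Unmarshal(data, &st); err != nil {
		return 0, err
	}
	if st.InitProcessPid <= 0 {
		return 0, fmt.Errorf("no init_process_pid found in state file")
	}
	return st.InitProcessPid, nil
}

func main() {
	// In the daemon, the data would come from os.ReadFile(statePath(user, id)).
	sample := []byte(`{"id":"abc123","init_process_pid":4242}`)
	pid, err := pidFromState(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(statePath("1000", "abc123"), "->", pid)
}
```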

[Figure: overview of what happens when podman re-executes itself, now inside the user namespace.]

3) Name of the container

We could've set this using another variable, but to be more flexible (maybe we need more information about the container in the future) we chose not to.
Podman saves its create-config in the path "/run/user/" + userID + "/libpod/tmp/socket/" + containerID + "/artifacts/create-config". This file contains a lot of information, including the container name.
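
Reading the name out of that file could look roughly like this. This is a sketch under two assumptions that are not confirmed above: that create-config is a JSON document, and that the container name sits in a top-level "Name" (or "name") key. The function names are made up.

```go
package main

import (
	"encoding/json"
	"fmt"
	"path/filepath"
)

// createConfigPath builds the path where podman stores the create-config
// of a rootless container.
func createConfigPath(userID, containerID string) string {
	return filepath.Join("/run/user", userID, "libpod/tmp/socket",
		containerID, "artifacts/create-config")
}

// createConfig holds the only field we need.
// Assumption: the name is a top-level key; encoding/json matches the key
// case-insensitively, so both "Name" and "name" are accepted.
type createConfig struct {
	Name string `json:"name"`
}

// containerName extracts the container name from the create-config contents.
func containerName(data []byte) (string, error) {
	var cfg createConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return "", err
	}
	if cfg.Name == "" {
		return "", fmt.Errorf("no container name found in create-config")
	}
	return cfg.Name, nil
}

func main() {
	// In the daemon, the data would come from os.ReadFile(createConfigPath(user, id)).
	sample := []byte(`{"Name":"web1","Image":"centos:7"}`)
	name, err := containerName(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(createConfigPath("1000", "abc123"), "->", name)
}
```

Because unknown keys are ignored during decoding, more fields can be added to the struct later without touching podman again, which is the flexibility argument made above.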

4) Talk with v6pod

Here we just hijack the slirp4netns command (which enables userspace networking) and replace it with a v6pod-slirp4netns bash script which contains:

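#!/bin/bash
# Ask v6pod to set up IPv6 networking for this container.
/bin/curl -XPOST -d "user=$v6pod_user&id=$v6pod_id" http://localhost:6781/api/activate
# Run the real slirp4netns for outgoing IPv4; it returns when the container ends.
/bin/slirp4netns "$@"
# The container is gone: ask v6pod to tear down and deregister.
/bin/curl -XPOST -d "user=$v6pod_user&id=$v6pod_id" http://localhost:6781/api/deactivate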


So we use the variables we set above to do all the networking stuff we need, then let slirp4netns do its setup so we still have outgoing IPv4 besides IPv6. When the container ends, slirp4netns exits and we do a deregistration.
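
On the daemon side, the body of those two POST requests has to be decoded again. The sketch below shows one way to do that in Go. The validation rules (numeric user ID, hexadecimal container ID of at most 64 characters) are my assumption; they are there because both values end up in filesystem paths that a root daemon will read.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// request is the decoded form body sent by the wrapper script.
type request struct {
	User string // value of v6pod_user
	ID   string // value of v6pod_id
}

var (
	userRe = regexp.MustCompile(`^[0-9]+$`)        // assumed: numeric uid
	idRe   = regexp.MustCompile(`^[0-9a-f]{1,64}$`) // assumed: hex container ID
)

// parseRequest (hypothetical helper) decodes "user=...&id=..." and rejects
// values that could escape the expected /run/user/... paths.
func parseRequest(body string) (request, error) {
	vals, err := url.ParseQuery(body)
	if err != nil {
		return request{}, err
	}
	r := request{User: vals.Get("user"), ID: vals.Get("id")}
	if !userRe.MatchString(r.User) {
		return request{}, fmt.Errorf("invalid user %q", r.User)
	}
	if !idRe.MatchString(r.ID) {
		return request{}, fmt.Errorf("invalid container id %q", r.ID)
	}
	return r, nil
}

func main() {
	for _, body := range []string{"user=1000&id=0123abcd", "user=1000&id=../../etc"} {
		r, err := parseRequest(body)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Printf("accepted: user=%s id=%s\n", r.User, r.ID)
	}
}
```

In an actual net/http handler the same checks could be applied to the values returned by FormValue.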


Modifying slirp4netns

slirp4netns configures both an IPv4 and an IPv6 address and gateway. v6pod handles the IPv6 part now, so the IPv6 configuration needs to be disabled in slirp4netns.

v6pod

v6pod is a Go daemon with a REST interface that has an /activate and a /deactivate endpoint.
It implements the requirements listed above:
  • Be compatible with our current docker IPv6 ranges (/80)
  • Create a bridge and add the gateway IPv6 address (::1) to it
  • Create a veth pair
  • Generate a (dynamic) IPv6 address in the /80 range and add it to the veth that goes into the container
  • Add one of the veths to the bridge and move the other into the network namespace of the user
  • Add a default IPv6 route to the bridge
  • Register the name of the container to the generated IPv6 address in consul
  • Deregister the name when the container stops
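
To make the network part of that list more concrete, here is a sketch that only builds the ip and nsenter command lines an activation could run; it does not execute anything. The interface names, the addresses and the plan struct are hypothetical, and I read "default IPv6 route to the bridge" as a default route via the gateway address on the bridge. v6pod itself may well use a netlink library instead of shelling out.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// plan describes one activation. All names and values are illustrative.
type plan struct {
	Bridge   string // bridge name on the host
	Gateway  string // gateway address with prefix length, e.g. "<prefix>::1/80"
	HostVeth string // veth end that is attached to the bridge
	ContVeth string // veth end that is moved into the container
	Addr     string // generated container address with prefix length
	PID      int    // container PID, used to find its network namespace
}

// activateCommands returns the commands in the order they would be run.
// The first three only apply when the bridge does not exist yet.
func activateCommands(p plan) [][]string {
	gw := strings.SplitN(p.Gateway, "/", 2)[0]
	pid := strconv.Itoa(p.PID)
	inNetns := []string{"nsenter", "-t", pid, "-n"}
	ns := func(args ...string) []string {
		return append(append([]string{}, inNetns...), args...)
	}
	return [][]string{
		{"ip", "link", "add", p.Bridge, "type", "bridge"},
		{"ip", "-6", "addr", "add", p.Gateway, "dev", p.Bridge},
		{"ip", "link", "set", p.Bridge, "up"},
		{"ip", "link", "add", p.HostVeth, "type", "veth", "peer", "name", p.ContVeth},
		{"ip", "link", "set", p.HostVeth, "master", p.Bridge},
		{"ip", "link", "set", p.HostVeth, "up"},
		{"ip", "link", "set", p.ContVeth, "netns", pid},
		ns("ip", "-6", "addr", "add", p.Addr, "dev", p.ContVeth),
		ns("ip", "link", "set", p.ContVeth, "up"),
		ns("ip", "-6", "route", "add", "default", "via", gw, "dev", p.ContVeth),
	}
}

func main() {
	cmds := activateCommands(plan{
		Bridge: "v6pod0", Gateway: "2001:db8:1:2:3::1/80",
		HostVeth: "veth0a", ContVeth: "veth0b",
		Addr: "2001:db8:1:2:3::42/80", PID: 4242,
	})
	for _, c := range cmds {
		fmt.Println(strings.Join(c, " "))
	}
}
```

Deactivation would be the reverse: delete the host end of the veth pair (which removes its peer as well) and deregister the name in consul.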