docker swarm - preserve client ip on incoming connections

Despite the rise of kubernetes, docker swarm is still an easy-to-use choice for smaller environments with less complicated service infrastructures. In a real-world scenario, a customer runs logstash as a forwarder on docker swarm with a tcp input. When checking the extracted data, we only saw a mapped source ip (in the host field) instead of the real client ip.

The reason is pinned down quickly - it's due to the docker swarm routing mesh, which effectively source-nats (snat) incoming connections. As an example, we'll peek at the iptables rules in the ingress namespace:

# nsenter --net=/var/run/docker/netns/ingress_sbox iptables-save | grep SNAT
-A POSTROUTING -d 10.0.0.0/24 -m ipvs --ipvs -j SNAT --to-source 10.0.0.2
-A DOCKER_POSTROUTING -s 127.0.0.11/32 -p tcp -m tcp --sport 39403 -j SNAT --to-source :53
-A DOCKER_POSTROUTING -s 127.0.0.11/32 -p udp -m udp --sport 33013 -j SNAT --to-source :53
SNAT in ingress namespace

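You can see a SNAT rule that affects all destinations within 10.0.0.0/24 (in this case the network where the service resides). Connections to those destinations get their source rewritten to 10.0.0.2, so the container never sees the client ip.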

So - it all comes down to the way the routing mesh works on docker swarm. How can we solve this?

The easiest choice would be to change the networking mode of the containers associated with the service. This works on standalone installations but not for services running on docker swarm - host networking mode is not supported there.

Bypass the docker swarm routing mesh

Docker documents how to avoid this issue - as usual, you just need to know where to look: https://docs.docker.com/engine/swarm/ingress/#bypass-the-routing-mesh

Our solution is not to change the overall network mode of the container - we're just publishing specific ports in another mode. In docker-compose this looks like the following:

version: '3.7'

services:
  app:
    image: nginx:latest
    deploy:
      replicas: 1
    ports:
      - target: 443
        published: 443
        protocol: tcp
        mode: host
    networks:
      - default
networks:
  default:
example docker-compose.yaml

Take note that we're explicitly assigning mode: host to the published port. As a result, the port is bound directly on the host running the task, and traffic no longer traverses the routing mesh.

As a side effect, we're using a shorter (and quicker) path - this can be worthwhile especially on systems under heavy load.

There is no free lunch

As always, there's a downside - in this case, the service is only reachable on the host(s) running a task of the service; requests arriving at other swarm nodes are not routed to it. There are two solutions for that:

  • Ensure the service runs on every node
  • Use external load balancing (with health checks)
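The first option can be sketched as a variation of the compose file above. This is a minimal sketch, not a tested deployment: it assumes the same nginx example service and switches the deploy mode to global, so swarm schedules exactly one task per node and every node binds port 443 itself. The replicas setting has to go, since it only applies to replicated services.

```yaml
version: '3.7'

services:
  app:
    image: nginx:latest
    deploy:
      # global: one task on every swarm node (replaces "replicas: 1")
      mode: global
    ports:
      - target: 443
        published: 443
        protocol: tcp
        # host: bind the port on the node itself, bypassing the routing mesh
        mode: host
```

With one task per node, a host-mode port can never collide with a second task of the same service on that node.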

In our case we can use external load balancing because the surrounding infrastructure already provides it, so we don't increase the complexity of our docker setup.

All in all, this is not the best solution and leaves a bit of an aftertaste. For now it's an option that works with docker swarm, is plausible and is supported by docker. If you want a more sophisticated approach, kubernetes offers more options here (which might come with other challenges).