Containers are not virtual machines, yet we might want to treat hosts running container workloads like hypervisors and restrict container networking. This guide describes a way to limit container networking on Docker-based container hosts using firewalld.
Consistent adoption of microservices goes hand in hand with running container applications in production. Running (multiple) containers on a single host is, from a logical perspective, similar to running a hypervisor that hosts several virtual machines: a single host runs several distinct application workloads. Running these workloads in separate virtual networks (container networks) effectively isolates the services from each other, and external access is provided through exposed ports. In general this works very well, and we rarely have to deal with much network configuration when running container workloads. Yet there is one thing to consider.
TL;DR - Kubernetes
This guide is a deep dive into implementing limitations on container networking. If you run an orchestrator such as Kubernetes, it offers more flexible ways to restrict networking, for example network policies.
Container access to local networks
A container running on Docker has full network access by default. This includes access to remote services on the internet as well as to the local network. Just like any other application running on an operating system, a container has no limitation on outgoing network traffic.
From a security perspective we might want to limit this access:
Containers may be updated automatically (from external sources)
Containers may get compromised
If one of your containers is compromised, it is useful to ensure that it cannot access local resources.
Isn't this part of the security between hosts?
Yes! Each host in your environment needs to ensure that it cannot be attacked from another host in the same network. Zero-trust!
Why treat containers otherwise?
The issue with containers is that they run on a host and can access not only the network but also every endpoint on the host itself.
For example, if you run a reverse proxy that routes incoming requests to different containers, an attacker might use a hijacked container to access a configuration REST API on that reverse proxy and reconfigure the routing.
In another scenario, the Docker daemon may expose its API on a TCP socket instead of a Unix socket. In this case a container could simply connect to that socket and spawn new containers with elevated privileges.
You might also be running other workloads on your nodes, such as GlusterFS or Ceph, or even just an NFS share on your internal network that may be accessible from any of your container hosts.
The attack vector looks roughly like this:
Limiting network access
Given the situations described above, we want to ensure that services running within our containers cannot communicate with local resources. As most microservices don't access local resources at all, we are going to block communication with the container host as well as with the local network.
Our implementation uses firewall-cmd, which in turn relies on iptables/nftables for the actual filtering.
Install firewalld unless it is already installed. On Debian/Ubuntu-based systems ufw might be installed instead. ufw is also capable of implementing the solution; after all, we are just creating some iptables/nftables rules. Administration through firewall-cmd (provided by firewalld) is simply easier and avoids fiddling with configuration files.
Applying the restrictions is done using a set of commands, shown below.
These commands do the following:
create several chains
redirect outbound container traffic targeting the loopback interface
redirect outbound container traffic targeting eth* interfaces
log dropped packets (with rate limiting)
The script is shown here:
Having created the rules, reload the ruleset using firewall-cmd --reload.
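As a rough sketch of the steps listed above (Python that only prints commands and does not run them; it is not the original script), the following builds firewall-cmd --direct commands for both address families. Assumptions, each also marked in the code: eth+ as the iptables spelling of eth*, the hashlimit settings (6/minute with a burst of 1, i.e. roughly one message per 10 seconds per source IP, destination IP and destination port), the hashlimit name, the priorities 0 and 255 inside DOCKER-USER-DROP, and an extra conntrack rule that lets replies to established connections (for example from published ports to clients on the LAN) pass. It also assumes Docker has created DOCKER-USER for IPv6, and it only covers forwarded traffic; packets addressed to the host's own IP addresses traverse INPUT instead.

```python
import shlex

# Chain names and the 4096 priority come from the text; the rest are assumptions.
FAMILIES = ("ipv4", "ipv6")
DENY_CHAIN = "DOCKER-USER-DENY-INTERNAL"
DROP_CHAIN = "DOCKER-USER-DROP"
JUMP_PRIORITY = 4096
OUT_INTERFACES = ("lo", "eth+")  # assumption: "+" is the iptables interface wildcard


def direct(*args: str) -> str:
    """Return one shell-quoted `firewall-cmd --permanent --direct ...` command."""
    return shlex.join(["firewall-cmd", "--permanent", "--direct", *args])


def base_rules() -> list[str]:
    cmds = []
    for fam in FAMILIES:
        # create the custom chains
        for chain in (DENY_CHAIN, DROP_CHAIN):
            cmds.append(direct("--add-chain", fam, "filter", chain))
        # assumption: replies to established connections are not filtered
        cmds.append(direct("--add-rule", fam, "filter", "DOCKER-USER", "0",
                           "-m", "conntrack", "--ctstate", "RELATED,ESTABLISHED",
                           "-j", "RETURN"))
        # send outbound traffic on lo / eth* to the deny chain
        for iface in OUT_INTERFACES:
            cmds.append(direct("--add-rule", fam, "filter", "DOCKER-USER",
                               str(JUMP_PRIORITY), "-o", iface, "-j", DENY_CHAIN))
        # assumption: log at most ~1 message per 10 s per (srcip, dstip, dstport) ...
        cmds.append(direct("--add-rule", fam, "filter", DROP_CHAIN, "0",
                           "-m", "hashlimit", "--hashlimit-upto", "6/minute",
                           "--hashlimit-burst", "1",
                           "--hashlimit-mode", "srcip,dstip,dstport",
                           "--hashlimit-name", f"docker_drop_{fam[-2:]}",
                           "-j", "LOG", "--log-prefix", "DOCKER_NET_DROP "))
        # ... then drop every packet that reached this chain (priority 255 is hypothetical)
        cmds.append(direct("--add-rule", fam, "filter", DROP_CHAIN, "255",
                           "-j", "DROP"))
    return cmds


if __name__ == "__main__":
    for cmd in base_rules():
        print(cmd)
    print("firewall-cmd --reload")
```

Printing instead of executing keeps the sketch safe to try; review the output before pasting it into a shell on a container host.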
The ruleset above DROPs packets: containers won't receive any error, the packets are silently discarded. You may also use REJECT instead of DROP to notify containers that a packet has been rejected. From a security perspective, withholding the information that a port is blocked may slow down network analysis by an attacker searching for services to exploit for lateral movement. Besides, the distinction is rarely needed here, as most containers don't even handle DROP and REJECT differently. So we just DROP. If an exception is required, you'll have to add one anyway.
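The difference between DROP and REJECT is visible from inside a container. As a small sketch using only the Python standard library (the function name probe is hypothetical), a TCP connection attempt that runs into DROP typically times out, while a closed port or a plain REJECT fails immediately; the Linux kernel reports REJECT's default ICMP port-unreachable reply as ECONNREFUSED.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt: open, refused, timeout or another error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # TCP RST or ICMP port-unreachable: closed port or a plain REJECT rule
        return "refused"
    except socket.timeout:
        # no answer at all: the typical symptom of a DROP rule
        return "timeout"
    except OSError as exc:
        # e.g. EHOSTUNREACH for REJECT --reject-with icmp-host-prohibited
        return f"error: {exc.strerror or exc}"
```

For example, running probe("10.57.17.5", 389) in a container before and after enabling the rules should change from "open" or "refused" to "timeout"; the LDAP port is only an illustrative choice.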
All rules are created for both IPv4 and IPv6. Packets to the following destinations will be dropped:
unique local unicast (private networks)
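The set of networks below is an assumption covering common private and non-routable ranges, including unique local unicast (fc00::/7). As a sketch, the hypothetical Python function deny_rules turns such a list into one DOCKER-USER-DENY-INTERNAL rule per network and derives ipv4 or ipv6 from each address, so both families are covered. The priority 10 is hypothetical.

```python
import ipaddress
import shlex

# Assumption: a typical set of private / non-routable destinations.
BLOCKED = [
    "127.0.0.0/8",                                    # IPv4 loopback
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",  # RFC 1918 private
    "169.254.0.0/16",                                 # IPv4 link-local
    "::1/128",                                        # IPv6 loopback
    "fc00::/7",                                       # unique local unicast
    "fe80::/10",                                      # IPv6 link-local
]


def deny_rules(networks=BLOCKED, priority=10):
    """Yield one command per network that jumps matching packets to DOCKER-USER-DROP."""
    for text in networks:
        net = ipaddress.ip_network(text)
        family = "ipv4" if net.version == 4 else "ipv6"  # iptables vs. ip6tables
        yield shlex.join([
            "firewall-cmd", "--permanent", "--direct", "--add-rule",
            family, "filter", "DOCKER-USER-DENY-INTERNAL", str(priority),
            "-d", str(net), "-j", "DOCKER-USER-DROP",
        ])
```

Docker's default bridge subnet (172.17.0.0/16) lies inside 172.16.0.0/12, but traffic between containers leaves through a bridge interface rather than lo or eth*, so it should not reach this chain.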
The packet flow is as follows:
the packet enters the firewall
depending on the source interface, the DOCKER-USER chain is triggered
depending on the outgoing interface, the DOCKER-USER-DENY-INTERNAL chain is triggered
if the packet targets one of the destination networks, the DOCKER-USER-DROP chain is executed
otherwise, processing returns to the DOCKER-USER chain
packets in the DOCKER-USER-DROP chain are logged (with a rate limit) and dropped
As a result, packets leaving through the selected interfaces (lo and eth*) are dropped when their destination is a private or non-routable address.
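The flow can be summarized as a small decision function. This is a hypothetical Python model (docker_user_verdict is an illustrative name), assuming the interface patterns lo and eth* and a reduced list of internal networks; it ignores exceptions and connection state.

```python
import fnmatch
import ipaddress

# Assumptions: interface patterns and a reduced list of internal networks.
DENY_INTERFACES = ("lo", "eth*")
INTERNAL = [ipaddress.ip_network(n) for n in (
    "127.0.0.0/8", "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",
    "::1/128", "fc00::/7",
)]


def docker_user_verdict(out_iface: str, dst: str) -> str:
    """Return 'DROP' or 'RETURN' for a container packet evaluated in DOCKER-USER."""
    # DOCKER-USER: only the selected outgoing interfaces jump to the deny chain
    if any(fnmatch.fnmatchcase(out_iface, p) for p in DENY_INTERFACES):
        addr = ipaddress.ip_address(dst)
        # DOCKER-USER-DENY-INTERNAL: internal destination -> DOCKER-USER-DROP
        if any(addr in net for net in INTERNAL):
            return "DROP"  # DOCKER-USER-DROP logs (rate limited), then drops
    # no match: processing returns to DOCKER-USER and on to Docker's own chains
    return "RETURN"
```

For instance, a packet leaving via eth0 for 192.168.1.10 yields "DROP", while the same packet to a public address, or one leaving via docker0, yields "RETURN".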
You can also view the rules using firewall-cmd --direct --get-all-rules.
This shows all currently installed direct rules. Adding the --permanent flag to the command shows the persistent ruleset instead.
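To inspect the direct rules programmatically, the output of firewall-cmd --direct --get-all-rules can be parsed. A minimal sketch, assuming the usual one-rule-per-line format `<family> <table> <chain> <priority> <args...>`; DirectRule and parse_direct_rules are hypothetical names.

```python
import shlex
from typing import NamedTuple


class DirectRule(NamedTuple):
    family: str      # ipv4, ipv6 or eb
    table: str       # e.g. filter
    chain: str       # e.g. DOCKER-USER
    priority: int    # lower values are placed earlier in the chain
    args: list[str]  # iptables match/target arguments


def parse_direct_rules(output: str) -> list[DirectRule]:
    """Parse `firewall-cmd --direct --get-all-rules` output (assumed format)."""
    rules = []
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        family, table, chain, priority, *args = shlex.split(line)
        rules.append(DirectRule(family, table, chain, int(priority), args))
    # group by family/table/chain and order by priority within each chain
    return sorted(rules, key=lambda r: (r.family, r.table, r.chain, r.priority))
```

The input could come from subprocess.run(["firewall-cmd", "--direct", "--get-all-rules"], capture_output=True, text=True, check=True).stdout; sorting by priority mirrors how firewalld orders direct rules within a chain.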
Retrieving dropped packets
Adding application workloads or diagnosing communication issues can be hard if you cannot check a log of dropped packets. The ruleset above therefore implements logging (with rate limiting) to help diagnose them.
dmesg prints all kernel messages; filter them on our assigned prefix DOCKER_NET_DROP (for example dmesg | grep DOCKER_NET_DROP) to get the logged dropped packets.
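For further processing, the KEY=VALUE fields written by the LOG target (IN=, OUT=, SRC=, DST=, PROTO=, DPT= and so on) can be extracted. A small Python sketch; parse_drops is a hypothetical helper, and the selected field names are an assumption about what is most useful when adding exceptions.

```python
import re

PREFIX = "DOCKER_NET_DROP"
FIELD = re.compile(r"(\w+)=(\S*)")  # KEY=VALUE pairs; flags like SYN or DF are skipped


def parse_drops(lines):
    """Yield the interesting fields of every kernel log line carrying PREFIX."""
    for line in lines:
        if PREFIX not in line:
            continue
        fields = dict(FIELD.findall(line.split(PREFIX, 1)[1]))
        yield {k: fields.get(k) for k in ("IN", "OUT", "SRC", "DST", "PROTO", "DPT")}
```

Because it accepts any iterable of lines, the same helper works with dmesg -w, for example by iterating over subprocess.Popen(["dmesg", "-w"], stdout=subprocess.PIPE, text=True).stdout.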
Watch messages in real-time
If you need to watch in real time, use dmesg -w, which prints messages as they are received. Just be aware of the rate limiting configured above.
Log rate limiting
By default we set a rate limit of one log message every 10 seconds. The limit is enforced per source IP, destination IP and destination port. The 10-second limit is chosen to keep the number of buckets manageable if a container performs a port scan (which in turn creates a bucket per destination port), while still providing a moderate rate when a service frequently tries to reach the same destination (for example due to a configuration mistake or a service that has not been allowed yet). Additionally, seeing a log message every 10 seconds may help administrators recognize that the rate limiter has been hit.
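The per-bucket behaviour can be illustrated with a simplified model: an approximation of a hashlimit match with a burst of 1, not its actual implementation (LogRateLimiter is a hypothetical name). Each (source IP, destination IP, destination port) combination gets its own bucket, and a bucket emits at most one log message per 10 seconds.

```python
INTERVAL = 10.0  # seconds between log messages per bucket (from the text)


class LogRateLimiter:
    """Simplified model of the per-(srcip, dstip, dstport) log rate limit."""

    def __init__(self, interval: float = INTERVAL):
        self.interval = interval
        self.last_logged = {}  # bucket key -> time of the last logged packet

    def should_log(self, now: float, src: str, dst: str, dport: int) -> bool:
        key = (src, dst, dport)
        last = self.last_logged.get(key)
        if last is None or now - last >= self.interval:
            self.last_logged[key] = now
            return True
        return False  # the packet is still dropped, it is just not logged
```

A scan over 1,000 ports therefore creates 1,000 buckets that may each log once, while repeated attempts against the same port show up only every 10 seconds; all of them are dropped regardless.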
Granting communication for desired services
Obviously, not every container runs without dependencies on the local network. Common cases are databases that are not part of the container layer, or authentication backends such as an LDAP server. This guide therefore takes such dependencies into account and provides a solution.
Communication for the services you need can be allowed by creating a rule that is processed before our deny rules. The default rules implemented above use a priority of 4096 in the DOCKER-USER chain. To allow a service, create another rule in the DOCKER-USER chain with a priority lower than 4096 and the target RETURN; firewalld places direct rules with lower priority values first.
For example, such a rule for the destination 10.57.17.5 (located on another host) grants every container access to it without any further restriction.
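A sketch of how such an exception could be generated in Python. allow_destination and its default priority of 1024 are hypothetical; only the requirement of a priority below 4096 and the RETURN target come from the text. The function refuses priorities that would not be evaluated before the deny rules.

```python
import ipaddress
import shlex

DENY_PRIORITY = 4096  # priority of the deny jumps in DOCKER-USER (from the text)


def allow_destination(dst: str, priority: int = 1024) -> str:
    """Build a RETURN exception in DOCKER-USER evaluated before the deny rules."""
    if priority >= DENY_PRIORITY:
        raise ValueError(f"priority must be below {DENY_PRIORITY}, got {priority}")
    net = ipaddress.ip_network(dst)  # accepts single hosts and CIDR ranges
    family = "ipv4" if net.version == 4 else "ipv6"
    return shlex.join([
        "firewall-cmd", "--permanent", "--direct", "--add-rule", family,
        "filter", "DOCKER-USER", str(priority), "-d", str(net), "-j", "RETURN",
    ])
```

allow_destination("10.57.17.5") returns `firewall-cmd --permanent --direct --add-rule ipv4 filter DOCKER-USER 1024 -d 10.57.17.5/32 -j RETURN`; after running it, apply the change with firewall-cmd --reload.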
How granular should exceptions be?
Our recommendation is to allow communication with other hosts without further limitation. Following zero-trust, we always need to make sure anyway that a compromised host cannot take over the whole network. If you are communicating between network zones, this is even more true, as the separation and limitation of services has to be enforced by the gateway firewall.
Why RETURN instead of ACCEPT?
You could, of course, use ACCEPT instead of RETURN. In that case iptables/nftables stops processing the forward chain and passes the packet on. Technically this is even faster than RETURN. Yet we don't want to bypass any (potentially important) rules created by the Docker daemon itself.
In the forward chain, packets pass through DOCKER-USER first and only afterwards through the chains managed by Docker, such as DOCKER.
If we accept a packet in the DOCKER-USER chain, we also skip the DOCKER chain. Since our goal is to let packets pass as if our limitation did not exist at all, we return to the regular flow and leave Docker's internal processing rules untouched.
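The difference between the two targets can be reproduced with a tiny chain interpreter. This is a hypothetical, heavily simplified model of iptables semantics: ACCEPT and DROP end processing of all chains, RETURN goes back to the calling chain. The single DOCKER rule merely stands in for the rules Docker manages itself.

```python
from typing import Optional

# chain name -> ordered rules of (match tag, target); "*" matches every packet
Rules = dict[str, list[tuple[str, str]]]


def evaluate(chains: Rules, chain: str, tags: set, trace: list) -> Optional[str]:
    """Walk a chain; return ACCEPT/DROP, or None when the chain falls through."""
    trace.append(chain)
    for match, target in chains[chain]:
        if match != "*" and match not in tags:
            continue
        if target in ("ACCEPT", "DROP"):
            return target  # terminal verdict: no further chains are evaluated
        if target == "RETURN":
            return None  # continue in the calling chain
        verdict = evaluate(chains, target, tags, trace)  # jump to a user chain
        if verdict is not None:
            return verdict
    return None


def forward(exception_target: str) -> list:
    """Chains visited by an allowed packet for a given exception target."""
    chains = {
        "FORWARD": [("*", "DOCKER-USER"), ("*", "DOCKER")],
        "DOCKER-USER": [("allowed-dst", exception_target)],
        "DOCKER": [("*", "ACCEPT")],  # stands in for Docker's own rules
    }
    trace = []
    evaluate(chains, "FORWARD", {"allowed-dst"}, trace)
    return trace
```

forward("RETURN") visits FORWARD, DOCKER-USER and DOCKER, whereas forward("ACCEPT") stops after DOCKER-USER, so Docker's own rules never see the packet.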
Graceful implementation of request filtering
If you set up a container host a while ago and would like to add this awesome network restriction now, you might wonder whether it can be done without affecting services that are already running.
Luckily, there's a way to achieve this.
Add an audit-only ruleset (see below)
Watch the audit logs
Add exceptions for services that require internal communication
Once all exceptions are in place, enforce the ruleset
The whole ruleset with audit looks as follows.
In this case, the DOCKER-USER-DROP chain has been modified to not DROP packets: it only logs them and returns to the calling chain. This is done by inserting RETURN rules right before the rule that drops the packets (using priority 127).
With these rules enabled, check the logs as described above. Once no more services need to be allowed, remove the RETURN rules to enforce dropping unintended traffic.
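Switching between audit and enforcement mode comes down to adding or removing the RETURN rule at priority 127 in DOCKER-USER-DROP. A minimal Python sketch (switch_mode and audit_rule are hypothetical names), assuming the LOG rule in that chain has a priority below 127 and the DROP rule one above it, so the RETURN sits between logging and dropping.

```python
import shlex

AUDIT_PRIORITY = 127  # priority of the audit RETURN rule (from the text)


def audit_rule(action: str, family: str) -> str:
    """Build the command that adds or removes the audit-only RETURN rule."""
    if action not in ("add", "remove"):
        raise ValueError("action must be 'add' or 'remove'")
    if family not in ("ipv4", "ipv6"):
        raise ValueError("family must be 'ipv4' or 'ipv6'")
    return shlex.join([
        "firewall-cmd", "--permanent", "--direct", f"--{action}-rule",
        family, "filter", "DOCKER-USER-DROP", str(AUDIT_PRIORITY), "-j", "RETURN",
    ])


def switch_mode(audit: bool) -> list:
    """Commands to enter (audit=True) or leave (audit=False) audit-only mode."""
    action = "add" if audit else "remove"
    return [audit_rule(action, fam) for fam in ("ipv4", "ipv6")] + ["firewall-cmd --reload"]
```

switch_mode(audit=False) returns the matching --remove-rule commands followed by firewall-cmd --reload; the removal only works if the priority and arguments are identical to those used when the rule was added.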