Docker daemon not starting after upgrade

After upgrading docker you may run into a situation where the docker daemon itself no longer starts. I've seen this on a CentOS 8 host in our lab environment after upgrading to the latest docker release (20.10).
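Before digging into the full log, the failure itself is easy to confirm from systemd (standard commands, nothing docker-specific assumed); the same daemon messages also end up in /var/log/messages, which is what the excerpt below shows:

# check the unit state and the most recent daemon output
systemctl status docker
journalctl -u docker --no-pager -n 50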

May  3 18:47:37 lab2-nuv-apphost1 dockerd[7450]: time="2021-05-03T18:47:37.307874000+02:00" level=info msg="Starting up"
May  3 18:47:37 lab2-nuv-apphost1 dockerd[7450]: time="2021-05-03T18:47:37.312047700+02:00" level=info msg="parsed scheme: \"unix\"" module=grpc
May  3 18:47:37 lab2-nuv-apphost1 dockerd[7450]: time="2021-05-03T18:47:37.312076200+02:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
May  3 18:47:37 lab2-nuv-apphost1 dockerd[7450]: time="2021-05-03T18:47:37.312107100+02:00" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock  <nil> 0 <nil>}] <nil> <nil>}" module=grpc
May  3 18:47:37 lab2-nuv-apphost1 dockerd[7450]: time="2021-05-03T18:47:37.312124300+02:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
May  3 18:47:37 lab2-nuv-apphost1 dockerd[7450]: time="2021-05-03T18:47:37.329196000+02:00" level=info msg="parsed scheme: \"unix\"" module=grpc
May  3 18:47:37 lab2-nuv-apphost1 dockerd[7450]: time="2021-05-03T18:47:37.329233100+02:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
May  3 18:47:37 lab2-nuv-apphost1 dockerd[7450]: time="2021-05-03T18:47:37.329283700+02:00" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock  <nil> 0 <nil>}] <nil> <nil>}" module=grpc
May  3 18:47:37 lab2-nuv-apphost1 dockerd[7450]: time="2021-05-03T18:47:37.329299400+02:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
May  3 18:47:37 lab2-nuv-apphost1 dockerd[7450]: time="2021-05-03T18:47:37.354999100+02:00" level=info msg="[graphdriver] using prior storage driver: overlay2"
May  3 18:47:37 lab2-nuv-apphost1 dockerd[7450]: time="2021-05-03T18:47:37.402451200+02:00" level=warning msg="Your kernel does not support cgroup blkio weight"
May  3 18:47:37 lab2-nuv-apphost1 dockerd[7450]: time="2021-05-03T18:47:37.402487000+02:00" level=warning msg="Your kernel does not support cgroup blkio weight_device"
May  3 18:47:37 lab2-nuv-apphost1 dockerd[7450]: time="2021-05-03T18:47:37.402880500+02:00" level=info msg="Loading containers: start."
May  3 18:47:37 lab2-nuv-apphost1 dockerd[7450]: time="2021-05-03T18:47:37.474785600+02:00" level=info msg="Firewalld: docker zone already exists, returning"
May  3 18:47:37 lab2-nuv-apphost1 firewalld[870]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -t nat -D OUTPUT -m addrtype --dst-type LOCAL -j DOCKER' failed: iptables: Bad rule (does a matching rule exist in that chain?).
May  3 18:47:37 lab2-nuv-apphost1 firewalld[870]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -t nat -D PREROUTING' failed: iptables: Bad rule (does a matching rule exist in that chain?).
May  3 18:47:37 lab2-nuv-apphost1 firewalld[870]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -t nat -D OUTPUT' failed: iptables: Bad rule (does a matching rule exist in that chain?).
May  3 18:47:37 lab2-nuv-apphost1 firewalld[870]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -t filter -X DOCKER' failed: iptables v1.8.4 (nf_tables):  CHAIN_USER_DEL failed (Device or resource busy): chain DOCKER
May  3 18:47:37 lab2-nuv-apphost1 firewalld[870]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -t filter -X DOCKER-ISOLATION-STAGE-1' failed: iptables v1.8.4 (nf_tables):  CHAIN_USER_DEL failed (Device or resource busy): chain DOCKER-ISOLATION-STAGE-1
May  3 18:47:37 lab2-nuv-apphost1 firewalld[870]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -t filter -F DOCKER-ISOLATION' failed: iptables: No chain/target/match by that name.
May  3 18:47:37 lab2-nuv-apphost1 firewalld[870]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -t filter -X DOCKER-ISOLATION' failed: iptables: No chain/target/match by that name.
May  3 18:47:37 lab2-nuv-apphost1 firewalld[870]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -D FORWARD -i docker0 -o docker0 -j DROP' failed: iptables: Bad rule (does a matching rule exist in that chain?).
May  3 18:47:37 lab2-nuv-apphost1 firewalld[870]: ERROR: ZONE_CONFLICT: 'docker0' already bound to a zone
/var/log/messages

The log quickly reveals that there is an issue with the firewall configuration. To be precise: there is already a zone assignment for the docker0 interface, and it conflicts with the configuration the docker daemon applies on startup.
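You can confirm the conflict by asking firewalld which zone docker0 is currently bound to. This is just a quick check; in my case the interface was still attached to the trusted zone from an earlier manual assignment, so verify what it shows on your own host:

# show all zones that currently have interfaces or sources attached
firewall-cmd --get-active-zones

# show the zone docker0 is bound to (here: trusted)
firewall-cmd --get-zone-of-interface=docker0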

This is likely related to the firewalld integration shipped with recent releases of the docker daemon: it creates its own docker zone (as the "docker zone already exists" line shows) and now collides with the existing interface assignment. Let's fix it!

Solution - remove the interface assignment

Getting docker up and running again is straightforward: remove the stale docker0 interface assignment from firewalld and start the docker daemon again.

# fix firewalld configuration
firewall-cmd --permanent --zone=trusted --remove-interface=docker0
firewall-cmd --reload

# start docker
systemctl start docker
fix startup

Docker can now bind its interfaces as expected and no longer aborts on startup.
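
If you want to double-check the result, the following is a quick verification sketch. The exact zone name may differ on your system, but recent releases create a dedicated docker zone, as the log line above suggests:

# verify the daemon is running
systemctl is-active docker

# docker0 should now be bound to the zone the daemon manages itself
firewall-cmd --get-zone-of-interface=docker0

# optional smoke test (pulls a small image from Docker Hub)
docker run --rm hello-world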