multus - deadlock state on thick plugin
Using Multus CNI's 'thick plugin' in Kubernetes can cause pod startup deadlocks due to circular dependencies between pods and the Multus daemon. Switching to the daemonless 'thin plugin' mode effectively resolves this initialization issue.
We're currently working on some advanced Kubernetes scenarios that use high-performance networking (OVS, SR-IOV). To attach the additional network interfaces to pods, we're using multus-cni together with Cilium and, for example, ovs-cni.
During testing we observed a deadlock in Multus when using the thick plugin.
Thick vs thin
Multus has two modes of operation. The thin plugin works like many other CNI plugins: the container runtime (on behalf of the kubelet) invokes the CNI binary and passes it the configuration. The thick plugin additionally runs a daemon in the background.
The Multus documentation describes the difference as follows: "With the multus 4.0 release, we introduce a new client/server-style plugin deployment. This new deployment is called 'thick plugin', in contrast to deployments in previous versions, which is now called a 'thin plugin'. The new thick plugin consists of two binaries, multus-daemon and multus-shim CNI plugin. The 'multus-daemon' will be deployed to all nodes as a local agent and supports additional features, such as metrics, which were not available with the 'thin plugin' deployment before. Due to these additional features, the 'thick plugin' comes with the trade-off of consuming more resources than the 'thin plugin'."
Here's the overview taken from the docs:
┌─────────┐             ┌───────┐           ┌────────┐             ┌──────────┐
│         │ cni ADD/DEL │       │ REST POST │        │ cni ADD/DEL │          │
│ runtime ├────────────►│ shim  │===========│ daemon ├────────────►│ delegate │
│         │<------------│       │           │        │<------------│          │
└─────────┘             └───────┘           └────────┘             └──────────┘
thick plugin flow
Deadlock state
Running the system (our test system was a k3s node), we hit the following issue:
- After startup, many pods were stuck in the Pending state, waiting for CNI. In the background, multus-shim processes had been launched and were waiting for the daemon.
- On startup, the daemon pod tries to copy multus-shim into the host's CNI bin path during init, before launching the daemon. Because of the first point, the file is not writable ("'/host/opt/cni/bin/multus-shim': Text file busy"). Linux reports this error when a process tries to write to an executable that is currently running.
So the starting pods keep the binary busy while waiting for the daemon, and the daemon fails to start because of those pods. In short, a circular dependency.
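To make the cycle explicit, the situation can be written down as a small wait-for graph and checked for a loop. This is only an illustrative sketch: the node names are labels chosen here for the components described above, not identifiers from Multus, and the edges encode our reading of the observed behavior.

```python
from typing import Dict, List, Optional

# Wait-for graph: an edge A -> B means "A cannot make progress until B does".
# Node names are illustrative labels, not identifiers taken from Multus.
WAITS_FOR: Dict[str, List[str]] = {
    "pending pods": ["multus-shim process"],               # CNI ADD has not returned yet
    "multus-shim process": ["multus-daemon"],              # shim waits for the daemon
    "multus-daemon": ["init copy of multus-shim"],         # daemon starts only after init
    "init copy of multus-shim": ["multus-shim process"],   # "Text file busy" while shim runs
}


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle (first node repeated at the end), or None if acyclic."""
    path: List[str] = []  # nodes on the current depth-first search path
    done = set()          # nodes fully explored without finding a cycle

    def visit(node: str) -> Optional[List[str]]:
        if node in path:
            # We walked back into the current path: that slice is the cycle.
            return path[path.index(node):] + [node]
        if node in done:
            return None
        path.append(node)
        for nxt in graph.get(node, []):
            cycle = visit(nxt)
            if cycle:
                return cycle
        path.pop()
        done.add(node)
        return None

    for start in graph:
        cycle = visit(start)
        if cycle:
            return cycle
    return None


if __name__ == "__main__":
    cycle = find_cycle(WAITS_FOR)
    print(" -> ".join(cycle) if cycle else "no cycle")
```

Removing the dependency on the daemon breaks the loop: a graph in which the pods wait only on a short-lived CNI binary has no cycle.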
Fix: use the thin plugin
Switching to the thin plugin solved the issue for us. Since there is no longer any dependency on the daemon, all other pods were unblocked, and we no longer have long-running multus-shim processes.
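After switching, it can be useful to confirm which mode a node's CNI configuration actually points at. The sketch below is a hypothetical helper, not part of Multus. It assumes that the `type` field of the CNI config names the plugin binary, that the thick deployment uses `multus-shim` (as named above), and that the thin deployment uses a binary called `multus`. The path to the config file is left to the caller, since it depends on the distribution.

```python
import json
import sys
from typing import Optional


def multus_mode(conf_text: str) -> Optional[str]:
    """Return 'thick', 'thin', or None for a CNI config given as JSON text."""
    conf = json.loads(conf_text)
    # A .conflist nests its plugins under "plugins"; a plain .conf is one plugin.
    plugins = conf.get("plugins", [conf])
    for plugin in plugins:
        plugin_type = plugin.get("type")
        if plugin_type == "multus-shim":  # thick: shim that talks to the daemon
            return "thick"
        if plugin_type == "multus":       # assumption: thin plugin binary name
            return "thin"
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_multus_mode.py <path to a CNI config file>
    with open(sys.argv[1], encoding="utf-8") as fh:
        print(multus_mode(fh.read()))
```

A result of `None` means the file does not reference either Multus binary, for example when it is the config of the primary CNI only.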