On one of our systems, the I/O subsystem stalled once a week, which caused problems for database operations.
Upgrading postgres to a new major version using containers built on different C libraries caused me some headaches: I got the error "database has no actual collation version, but a version was recorded", and I did not manage to fix it. I can, however, explain why it happened and how you can avoid it.
Upgrading postgres with timescaledb caused me some collation-related issues. After several attempts I found a reliable way to do the upgrade. This post describes the required steps.
If you run kubernetes on your own, you need to provide a storage solution along with it. We are using ceph (operated through rook). This article gives a short overview of its benefits as well as its pros and cons.
On a recent project I stumbled upon kerberos tickets being inadvertently shared across containers on a node, which caught my attention, as I'm not keen on sharing such secrets across workloads. This post describes why this happens and how to prevent it.
The new openvpn 2.6.0 has some very nice and shiny features you might want to start using soon. This post highlights some of them.
One of our playgrounds recently had an incident in which the control plane ran out of memory. This article shows how to diagnose the problem and, above all, how to fix or even prevent it.
You built your on-premises Kubernetes cluster and set up your self-hosted private registry. To make it pretty, you used your own CA to sign the registry's certificate. Everything is fine, and now you are ready to deploy your own services to your Kubernetes cluster and develop some awesome