Many, if not most, Kubernetes installations out there run on cloud providers. And that's fine: everything is managed, and you don't need to think too much about how it all magically works.
In my experience, the most challenging part of running Kubernetes (depending on cluster size) is not necessarily keeping the control plane running, but rather integrating a well-suited storage solution.
On our internal systems we use Rook as an operator for Ceph. We dedicate some nodes in the Kubernetes cluster to run the storage cluster as an application workload, which then provides storage to the other workloads.
I don't want to talk too much about why we use Rook (the answer is simple: it's simple, it's reliable, and we can use "hyperconverged" nodes). For more information about Rook, see https://rook.io/
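To give a rough idea of what that setup looks like, here is a stripped-down sketch of a Rook CephCluster resource. The image tag, node names, and device selection are placeholders and assumptions, not recommendations:

```yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v18   # placeholder tag, pin a real release
  dataDirHostPath: /var/lib/rook
  mon:
    count: 3                       # three monitors for quorum
  storage:
    useAllNodes: false
    nodes:
      - name: storage-node-1       # hypothetical node names; these are
      - name: storage-node-2       # the "hyperconverged" storage nodes
      - name: storage-node-3
    useAllDevices: true            # let Rook consume all empty disks
```

The operator watches this resource and deploys the Ceph daemons as regular pods on the listed nodes.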
What I actually want to answer is:
Why do we use Ceph?
From time to time, people ask why we are using Ceph.
There are a few major reasons why.
- Ceph is reliable and field-proven
- Ceph is incredibly robust
- Ceph scales horizontally by simply adding more nodes
But many storage solutions can do this. Can Ceph do more?
- Ceph CSI can provide ReadWriteOnce volumes (based on RBDs)
- Ceph can provide ReadWriteMany volumes (based on CephFS, the Ceph filesystem)
- Ceph can provide object storage with an S3-compatible endpoint
In this regard, Ceph covers both of the main volume access modes for Kubernetes workloads, and in most cases you will need both!
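To illustrate the two access modes, here is a sketch of two PersistentVolumeClaims. The storage class names `rook-ceph-block` and `rook-cephfs` follow the Rook examples; your cluster may name them differently:

```yaml
# RWO volume backed by an RBD image (block storage),
# attachable by a single node at a time.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: rook-ceph-block
  resources:
    requests:
      storage: 10Gi
---
# RWX volume backed by CephFS, mountable by many pods at once.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: rook-cephfs
  resources:
    requests:
      storage: 50Gi
```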
Plus, we get object storage right from the actual storage solution, with no additional layer of abstraction (like MinIO, which is still great!).
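With Rook, an S3 bucket can be requested declaratively via an ObjectBucketClaim; Rook then creates the bucket and exposes the endpoint and credentials in a ConfigMap and Secret of the same name. The storage class name below is an assumption and depends on how the object store was set up:

```yaml
# Request an S3 bucket from the Rook-managed object store.
# "rook-ceph-bucket" is the storage class name from the Rook
# examples; substitute your own.
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: my-bucket
spec:
  generateBucketName: my-bucket
  storageClassName: rook-ceph-bucket
```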
Myths about Ceph
Doesn't Ceph create a lot of overhead?
Sure: on a very small cluster, Ceph will consume quite some CPU and memory. There is metadata handling, caching, and a baseline CPU requirement.
If you have filesystems (RWX volumes) with millions of files, metadata grows even further.
But in the end, this is not a Ceph issue. It's an issue of the storage layer that every application/implementation needs to handle one way or another. Other solutions mostly have similar requirements.
Isn't Ceph hard to manage?
This depends on your own experience with storage solutions and, perhaps unexpectedly, with Ceph itself. Ceph is a quite logical piece of software and behaves as expected in most scenarios. As with any application, there are some quirks and things you just need to know. Generally speaking, it's very intuitive and by default (especially with Rook) holds you back from doing anything too dumb :-)
Isn't Ceph storage slow?
Using Ceph as an abstraction layer on top of your physical disks is slower, that's true. And you should expect a serious reduction in IOPS, especially if you have very fast devices (like NVMe) and very small I/O (for example, 4K operations). This setup is not Ceph's strength; in our experience, Ceph shines much brighter with larger request sizes.
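If you want to see that effect yourself, a simple fio job file comparing small random I/O with large sequential I/O is enough. The test file path on a mounted Ceph-backed volume is of course an assumption:

```ini
; fio job: compare 4K random reads with 1M sequential reads
; /mnt/ceph-volume is an assumed mount point of a Ceph-backed volume
[global]
ioengine=libaio
direct=1
filename=/mnt/ceph-volume/fio-testfile
size=4g
runtime=30
time_based

[randread-4k]
bs=4k
rw=randread
iodepth=16

[seqread-1m]
bs=1m
rw=read
iodepth=16
stonewall
```

The `stonewall` option makes the second job wait for the first, so the two workloads don't run concurrently and skew each other's numbers.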
There might be a future post with detailed performance testing of Ceph and some options you might want to adjust to squeeze more out of it. But the gains are mostly in the range of < 20%, so don't expect magic.
But other abstraction layers have the same issue: Mayastor or Longhorn show overhead similar to Ceph's.
This means that if your Kubernetes cluster runs database workloads that are mostly I/O bound, you might need to consider a storage solution that runs outside of Kubernetes entirely (maybe a Microsoft S2D-based storage mapped via SMB3).
So, when you have some spare time and want to dive deeper into Kubernetes and the infrastructure side of such workloads, check out Rook and play with it: it's worth it!