This time I'm covering some helpers that can assist you in tracking down I/O-related bottlenecks on Linux.
Obvious, but useful: use iotop to watch current I/O utilization and the processes causing it.
Total DISK READ : 0.00 B/s | Total DISK WRITE : 6.82 M/s
Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 7.99 M/s
TID PRIO USER DISK READ DISK WRITE> SWAPIN IO COMMAND
425 be/4 mysql 0.00 B/s 1718.11 K/s 0.00 % 1.56 % mysqld
424 be/4 mysql 0.00 B/s 1710.37 K/s 0.00 % 1.89 % mysqld
422 be/4 mysql 0.00 B/s 1687.15 K/s 0.00 % 0.25 % mysqld
423 be/4 mysql 0.00 B/s 1687.15 K/s 0.00 % 1.28 % mysqld
430 be/4 mysql 0.00 B/s 178.00 K/s 0.00 % 0.00 % mysqld
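Under the hood, iotop aggregates the per-process counters the kernel exposes in /proc/&lt;pid&gt;/io, so a quick manual spot check for a single process works even without the tool (shown here for the current shell):

```shell
# Per-process I/O counters - the same source iotop reads:
cat /proc/self/io
# rchar/wchar count all read/write syscalls (including cached I/O);
# read_bytes/write_bytes are the bytes that actually hit the block layer.
```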
This is quite handy when there's a bottleneck and you want a quick overview of potential sources.
If you want to dig deeper, fire up iostat (on Red Hat/CentOS the tool ships in the sysstat package).
In the output you can see that /dev/sda is writing far more data than it reads. Keep an eye especially on tps (transfers per second) and kB_read/s / kB_wrtn/s (KB read/written per second). When chasing a bottleneck, an issue may show up here if the values differ from what you'd expect of the underlying system.
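A minimal sketch of a typical invocation, plus a fallback reading the same raw counters straight from /proc/diskstats (field numbers assume the standard Linux layout; sectors are 512 bytes, hence the division by 2 to get kB):

```shell
# Device summary every 5 seconds, 3 samples:
#   iostat 5 3
# Cumulative kB read/written per device since boot, from the raw counters:
awk '{ printf "%-10s read %d kB, written %d kB\n", $3, $6 / 2, $10 / 2 }' /proc/diskstats
```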
If you need to go deeper and see which partition has the most transfers, use the -p flag.
This provides insight on a per-partition level. On the system this snapshot was taken from, data is mostly appended to a database, so most I/O is writes to the (currently) last partition. This matches our expectations.
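A sketch of the per-partition call (the device name sda is just an example); which devices and partitions exist on a box can be listed from /proc/partitions first:

```shell
# Per-partition statistics for one device, refreshed every 5 seconds:
#   iostat -p sda 5
# List available devices and partitions:
cat /proc/partitions
```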
If you need to dig even deeper, use the extended statistics flag (-x).
The most useful counters here are the await values - they indicate the latency per request in milliseconds.
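A sketch of the extended call, plus a rough cross-check: /proc/diskstats also carries the cumulative time spent on I/O, so an average await since boot can be derived per device (field numbers again assume the standard layout):

```shell
# Extended statistics including r_await/w_await (milliseconds):
#   iostat -x 5
# Rough average await since boot: total I/O time / completed I/Os
awk '$4 + $8 > 0 { printf "%-10s avg await %.2f ms\n", $3, ($7 + $11) / ($4 + $8) }' /proc/diskstats
```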
A very handy tool for checking current latency is ioping. Measuring read or write latency is straightforward with it. Here are some examples:
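A few sketched invocations (ioping is packaged on most distros; the target path `.` is just an example - point it at the filesystem you actually care about):

```shell
# Read latency, 10 requests against the filesystem under the current dir:
ioping -c 10 .
# The same with direct I/O, bypassing the page cache:
ioping -c 10 -D .
# Write latency (-W operates on a temporary file, safe on a mounted fs):
ioping -c 10 -D -W .
```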
It's useful to pick a reasonable working set here and to use direct I/O (-D flag), which bypasses the kernel's page cache.
If you want to measure latency, use small request sizes (-s, 4k-32k); if you want to measure throughput, use larger values (-s, 4M-8M).
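For instance (sizes are illustrative):

```shell
# Latency-oriented run: small requests, direct I/O
ioping -c 10 -D -s 4k .
# Throughput-oriented run: large requests
ioping -c 10 -D -s 8M .
```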
Obviously there are plenty of other tools out there that can help surface useful information about I/O issues. For a quick, common test of whether storage might be the problem, I mostly use the tools shown above - hopefully they'll help you too.