
When fstrim stalls your I/O subsystem

On one of our systems, the I/O subsystem stalled once a week and disrupted database operations.

Daniel Nachtrub

One of our systems runs a fairly active database that ingests data at a high rate 24/7. For a while the system ran into trouble once a week, and we assumed it was related to PostgreSQL (TimescaleDB) freeing large portions of the database.

The problems occurred in the night from Sunday to Monday at 01:00 am: the whole I/O subsystem stalled for 30-45 minutes and the system became so slow that data ingestion ran into timeouts.

when the system is overloaded

The problem was that we were so focused on the database as the root cause that we didn't look closely at the base system (which we should have done right away).

TL;DR - fstrim

After a while we discovered the following logs:

Aug 21 01:42:37 fstrim[1189508]: /mnt/data0: 360,8 GiB (387452268544 bytes) trimmed on /dev/disk/by-uuid/89461c46-8ce2-440b-8688-fc869fa29dca
Aug 21 01:42:37 fstrim[1189508]: /boot/efi: 504,9 MiB (529440768 bytes) trimmed on /dev/disk/by-uuid/3EA2-5AD0
Aug 21 01:42:37 fstrim[1189508]: /boot: 296,4 MiB (310812672 bytes) trimmed on /dev/disk/by-uuid/05e6c53c-9824-45f4-9a6b-1747e653c035
Aug 21 01:42:37 fstrim[1189508]: /: 11,4 GiB (12227768320 bytes) trimmed on /dev/disk/by-id/dm-uuid-LVM-LoI7bwnertrebIByL235l9giXFBPDHUO6J0svRABn021oAZT40YAWYxlMzeFuPG1
Aug 21 01:42:37 systemd[1]: fstrim.service: Deactivated successfully.
log output
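
To put a number on what a single run discards, the per-mount byte counts can be pulled out of these lines and added up. This is a minimal sketch, not part of our original troubleshooting: it embeds the four lines from the log above as sample input, and the variable names (`log`, `per_mount`, `total`) are just illustrative. On a live system the sample could be replaced with the output of `journalctl -u fstrim.service --no-pager`.

```shell
#!/bin/sh
# Sample input: the fstrim lines from the log above.
# On a live system, replace the here-document with:
#   journalctl -u fstrim.service --no-pager
log=$(cat <<'EOF'
Aug 21 01:42:37 fstrim[1189508]: /mnt/data0: 360,8 GiB (387452268544 bytes) trimmed on /dev/disk/by-uuid/89461c46-8ce2-440b-8688-fc869fa29dca
Aug 21 01:42:37 fstrim[1189508]: /boot/efi: 504,9 MiB (529440768 bytes) trimmed on /dev/disk/by-uuid/3EA2-5AD0
Aug 21 01:42:37 fstrim[1189508]: /boot: 296,4 MiB (310812672 bytes) trimmed on /dev/disk/by-uuid/05e6c53c-9824-45f4-9a6b-1747e653c035
Aug 21 01:42:37 fstrim[1189508]: /: 11,4 GiB (12227768320 bytes) trimmed on /dev/disk/by-id/dm-uuid-LVM-LoI7bwnertrebIByL235l9giXFBPDHUO6J0svRABn021oAZT40YAWYxlMzeFuPG1
Aug 21 01:42:37 systemd[1]: fstrim.service: Deactivated successfully.
EOF
)

# Keep "<mount point> <bytes>" for every line that reports a trim.
# Lines without a byte count (such as the systemd line) are dropped.
per_mount=$(printf '%s\n' "$log" |
  sed -n 's/.*fstrim\[[0-9]*\]: \(.*\): .* (\([0-9]*\) bytes) trimmed on .*/\1 \2/p')

# Sum the byte counts with shell arithmetic.
# Assumes mount points contain no spaces.
total=0
while read -r mount bytes; do
  total=$((total + bytes))
done <<EOF
$per_mount
EOF

printf '%s\n' "$per_mount"
printf 'total bytes trimmed: %s\n' "$total"
```

For the sample above this adds up to 400520290304 bytes, roughly 373 GiB, almost all of it on /mnt/data0.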

This shows that fstrim trimmed around 360 GiB on the data disk. On the one hand this is nice, because it tells the disk which blocks are no longer in use. On the other hand, we're running on a virtual machine that (by default) uses a fixed-size disk on a hybrid storage subsystem, so there's no need to trim at all.

systemctl disable --now fstrim.timer
disable fstrim

With the timer disabled, everything runs fine.

The fstrim timer has been enabled by default on Ubuntu since version 18.04.

Daniel Nachtrub

Kind of likes computers. Linux foundation certified: LFCS / CKA / CKAD / CKS. Microsoft certified: Cybersecurity Architect Expert & Azure Solutions Architect Expert.