When fstrim stalls your I/O subsystem

On one of our systems, the I/O subsystem stalled once a week and disrupted database operations.

Daniel Nachtrub

4 Sep 2023

On one of our systems we run a very active database that ingests data at a high rate, 24/7. For a while the system ran into problems once every week, and we assumed they were related to PostgreSQL (TimescaleDB) freeing large portions of the database.

The problems occurred in the night from Sunday to Monday at around 01:00. The whole I/O subsystem stalled for 30 to 45 minutes, and the system became so slow that data ingestion ran into timeouts.

The problem was that we were so focused on the database as the root cause that we didn't look closely at the base system (which we should have done right away).

TL;DR - fstrim

After a while we discovered the following logs:

Aug 21 01:42:37 fstrim[1189508]: /mnt/data0: 360,8 GiB (387452268544 bytes) trimmed on /dev/disk/by-uuid/89461c46-8ce2-440b-8688-fc869fa29dca
Aug 21 01:42:37 fstrim[1189508]: /boot/efi: 504,9 MiB (529440768 bytes) trimmed on /dev/disk/by-uuid/3EA2-5AD0
Aug 21 01:42:37 fstrim[1189508]: /boot: 296,4 MiB (310812672 bytes) trimmed on /dev/disk/by-uuid/05e6c53c-9824-45f4-9a6b-1747e653c035
Aug 21 01:42:37 fstrim[1189508]: /: 11,4 GiB (12227768320 bytes) trimmed on /dev/disk/by-id/dm-uuid-LVM-LoI7bwnertrebIByL235l9giXFBPDHUO6J0svRABn021oAZT40YAWYxlMzeFuPG1
Aug 21 01:42:37 systemd[1]: fstrim.service: Deactivated successfully.

log output
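To find the same entries on your own hosts and see how much each weekly run discards, the journal of the fstrim.service unit can be filtered and the byte counts summed per mountpoint. This is a minimal sketch, assuming a systemd host with the journal enabled and log lines in the format shown above; the helper name fstrim_bytes is hypothetical and not part of fstrim or systemd.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads fstrim log lines on stdin and prints
# "<mountpoint> <bytes>" for each trimmed filesystem, then a total.
fstrim_bytes() {
  awk '
    /trimmed on/ {
      mnt = ""; bytes = ""
      for (i = 1; i <= NF; i++) {
        # The field after "fstrim[PID]:" is the mountpoint, e.g. "/mnt/data0:"
        if ($i ~ /^fstrim\[/) mnt = $(i + 1)
        # The field before "bytes)" is the exact count, e.g. "(387452268544"
        if ($i == "bytes)") bytes = $(i - 1)
      }
      sub(/:$/, "", mnt)        # drop the trailing colon of the mountpoint
      gsub(/[()]/, "", bytes)   # drop the opening parenthesis
      if (mnt != "" && bytes != "") {
        printf "%s %s\n", mnt, bytes
        total += bytes
      }
    }
    END { printf "total %.0f\n", total }'
}
```

Usage: journalctl -u fstrim.service --no-pager | fstrim_bytes. For the log above this prints one line per mountpoint and a total of 400520290304 bytes (about 373 GiB) for that single weekly run. The parser searches for the fstrim[PID]: field instead of using a fixed column, so it also works when journalctl prints the hostname before it.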

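This shows that fstrim trimmed around 360 GiB on /mnt/data0 alone. On the one hand this is quite nice, because the information about freed blocks is passed down to the disk. On the other hand, we're running on a virtual machine that, by default, uses a fixed-size disk on a hybrid storage subsystem. Trimming mainly pays off on thin-provisioned disks, where discarded blocks can be handed back to the underlying storage; a fixed-size disk does not shrink, so in our case there is no need to trim at all.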

systemctl disable --now fstrim.timer

disable fstrim

Since we disabled the timer, the weekly stalls are gone and everything has been running fine.
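To confirm that systemctl disable --now fstrim.timer really took effect, systemd can be asked for the unit-file state and the runtime state of the timer. This is a minimal, read-only sketch, assuming a systemd host with the stock fstrim.timer unit; the function name check_fstrim_timer is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical read-only check that fstrim.timer will no longer fire.
# Returns 0 if the timer is off, 1 if it is still enabled or active,
# and 2 if systemctl is not available at all.
check_fstrim_timer() {
  local state active
  if ! command -v systemctl >/dev/null 2>&1; then
    echo "systemctl not found, not a systemd host" >&2
    return 2
  fi
  # Unit-file state, e.g. "enabled", "disabled" or "masked"
  state=$(systemctl is-enabled fstrim.timer 2>/dev/null) || true
  # Runtime state; "active" means the timer is currently scheduled
  active=$(systemctl is-active fstrim.timer 2>/dev/null) || true
  echo "fstrim.timer: enabled=${state:-unknown} active=${active:-unknown}"
  # Succeed only if the timer is neither enabled nor active
  [ "$state" != "enabled" ] && [ "$active" != "active" ]
}
```

Usage: check_fstrim_timer && echo "fstrim.timer is off". On a host where the timer is still enabled, systemctl list-timers fstrim.timer shows the next scheduled run, which makes it easy to correlate a recurring weekly stall with the trim schedule.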

Note: the fstrim timer has been enabled by default on Ubuntu since version 18.04.


Daniel Nachtrub

Kind of likes computers. Linux Foundation certified: LFCS / CKA / CKAD / CKS. Microsoft certified: Cybersecurity Architect Expert & Azure Solutions Architect Expert.
