Ceph reports "n hosts fail cephadm check"

One of our Ceph clusters entered the HEALTH_WARN state even though everything seemed to be running. The status output showed:

  cluster:
    id:     2b9ccc20-2b33-11eb-8d8f-00155d51f07c
    health: HEALTH_WARN
            1 hosts fail cephadm check

All daemons were running and everything worked as expected. So how do you find out what is wrong?

You can use ceph cephadm check-host to verify connectivity and the requirements Ceph needs on a host. Let's try it:

ceph cephadm check-host nuv-dc-apphost2
check-host failed:
INFO:cephadm:podman|docker (/usr/bin/docker) is present
INFO:cephadm:systemctl is present
INFO:cephadm:lvcreate is present
WARNING:cephadm:No time sync service is running; checked for ['chrony.service', 'chronyd.service', 'systemd-timesyncd.service', 'ntpd.service', 'ntp.service']
INFO:cephadm:Hostname "nuv-dc-apphost2" matches what is expected.
ERROR: No time synchronization is active

sad check

Checking the NTP daemon on the affected host confirmed that it was indeed down. I started the daemon again, and the cluster returned to a healthy state right afterwards.

ceph cephadm check-host nuv-dc-apphost3
nuv-dc-apphost3 (None) ok

happy check
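
To find out on the affected host which time sync service is (or is not) running, something like the following may help. It is only a sketch: the unit names are copied from the cephadm warning above, the function name first_active_time_sync_unit is made up for this example, and which unit is actually installed depends on your distribution.

```shell
#!/bin/sh
# Unit names copied from the cephadm warning message shown earlier.
TIME_SYNC_UNITS="chrony.service chronyd.service systemd-timesyncd.service ntpd.service ntp.service"

# Print the first time sync unit that systemd reports as active.
# Returns non-zero if none of the listed units is active.
# (Hypothetical helper, not part of cephadm.)
first_active_time_sync_unit() {
    for unit in $TIME_SYNC_UNITS; do
        if systemctl is-active --quiet "$unit"; then
            echo "$unit"
            return 0
        fi
    done
    return 1
}

# Example usage on the affected host:
#   first_active_time_sync_unit || echo "no time sync service active"
#
# Starting the service again; ntp.service is an assumption here,
# use whichever unit is installed on your host:
#   sudo systemctl enable --now ntp.service
```

Once one of the units is active, ceph cephadm check-host should stop complaining about time synchronization.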

So if you have a similar issue, run ceph cephadm check-host against the affected host to see which check failed, then fix the underlying cause.
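
If it is not obvious which host is affected, ceph health detail usually names it. Alternatively, the check can be run against several hosts in a loop. The following is a minimal sketch with two assumptions: that ceph cephadm check-host exits non-zero when a check fails, and that you pass the hostnames yourself. The function name failed_hosts is invented for this example.

```shell
#!/bin/sh
# Run "ceph cephadm check-host" for every hostname given as an argument
# and print the names of the hosts whose check fails.
# Assumption: the command exits non-zero when a check fails.
# (Hypothetical helper, not part of Ceph.)
failed_hosts() {
    for host in "$@"; do
        if ! ceph cephadm check-host "$host" >/dev/null 2>&1; then
            echo "$host"
        fi
    done
}

# Example usage with the hostnames from this post:
#   failed_hosts nuv-dc-apphost2 nuv-dc-apphost3
```

For each host that is printed, run ceph cephadm check-host by hand to see the full output and the failing check.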