ceph reports "n hosts fail cephadm check"
One of our ceph clusters entered HEALTH_WARN state with reason "1 hosts fail cephadm check". This guide shows a quick tip how to find out more about this issue.
One of our ceph clusters entered HEALTH_WARN state while seemingly everything had been running. Checking out the status showed:
cluster:
id: 2b9ccc20-2b33-11eb-8d8f-00155d51f07c
health: HEALTH_WARN
1 hosts fail cephadm check
All daemons had been running and everything worked as expected. How to find out what's wrong?
You can use cephadm check-host to verify connectivity and requirements for ceph to run successfully. So let's try out:
Checking the ntp daemon on the affected host, it's been down indeed. So i just started the daemon again and cluster has gone happy right afterwards again.
So - if you've a similar issue, invoke ceph cephadm check-host to see what check failed and be able to resolve the issue.