One of our ceph clusters entered the HEALTH_WARN state even though everything seemed to be running fine. The status showed:
1 hosts fail cephadm check
All daemons were running and everything worked as expected. So how do you find out what's wrong?
You can use ceph cephadm check-host to verify that a host is reachable and meets the requirements for ceph to run successfully. So let's try it out:
ceph cephadm check-host nuv-dc-apphost2
INFO:cephadm:podman|docker (/usr/bin/docker) is present
INFO:cephadm:systemctl is present
INFO:cephadm:lvcreate is present
WARNING:cephadm:No time sync service is running; checked for ['chrony.service', 'chronyd.service', 'systemd-timesyncd.service', 'ntpd.service', 'ntp.service']
INFO:cephadm:Hostname "nuv-dc-apphost2" matches what is expected.
ERROR: No time synchronization is active
Checking the NTP daemon on the affected host showed that it was indeed down. I started the daemon again and the cluster went back to a healthy state right afterwards.
ceph cephadm check-host nuv-dc-apphost3
nuv-dc-apphost3 (None) ok
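The time sync part of the check can also be reproduced by hand on a host. Here is a minimal sketch, assuming a systemd-based host and the unit names listed in the warning above. The function name active_time_sync_unit is made up for illustration and is not part of cephadm.

```shell
# Print the first active time sync unit from the list cephadm checked.
# Returns non-zero if none of them is active.
# Assumption: the host uses systemd, so `systemctl is-active` is available.
active_time_sync_unit() {
    for unit in chrony.service chronyd.service systemd-timesyncd.service ntpd.service ntp.service; do
        if systemctl is-active --quiet "$unit"; then
            printf '%s\n' "$unit"
            return 0
        fi
    done
    return 1
}
```

If the function prints nothing and returns non-zero, no time sync service is running. Start the one your host actually uses with systemctl start, then run the host check again.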
So if you run into a similar issue, run ceph cephadm check-host to see which check failed and resolve it from there.
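When it is not obvious which host is affected, the same check can be looped over several hosts. This is a rough sketch, not an official tool: the function name failed_host_checks is hypothetical, and it assumes that ceph cephadm check-host exits non-zero when a check fails.

```shell
# Run the cephadm host check for every hostname passed as an argument.
# Prints the hosts whose check failed; returns non-zero if any failed.
# Assumption: `ceph cephadm check-host` exits non-zero on a failed check.
failed_host_checks() {
    failed=0
    for host in "$@"; do
        if ! ceph cephadm check-host "$host" >/dev/null 2>&1; then
            printf '%s\n' "$host"
            failed=1
        fi
    done
    return "$failed"
}
```

For example, failed_host_checks nuv-dc-apphost2 nuv-dc-apphost3 prints only the hosts that need a closer look. Run ceph cephadm check-host on those hosts directly to see the detailed messages.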