
Ceph reports "n hosts fail cephadm check"

One of our Ceph clusters entered the HEALTH_WARN state with the reason "1 hosts fail cephadm check". This post shows a quick way to find out more about the issue.

Daniel Nachtrub

One of our Ceph clusters entered the HEALTH_WARN state even though everything seemed to be running. Checking the status showed:

  cluster:
    id:     2b9ccc20-2b33-11eb-8d8f-00155d51f07c
    health: HEALTH_WARN
            1 hosts fail cephadm check

All daemons were running and everything worked as expected. So how do you find out what's wrong?

You can use ceph cephadm check-host to verify connectivity and the requirements a host has to meet for Ceph to run successfully. So let's try it out:

ceph cephadm check-host nuv-dc-apphost2
check-host failed:
INFO:cephadm:podman|docker (/usr/bin/docker) is present
INFO:cephadm:systemctl is present
INFO:cephadm:lvcreate is present
WARNING:cephadm:No time sync service is running; checked for ['chrony.service', 'chronyd.service', 'systemd-timesyncd.service', 'ntpd.service', 'ntp.service']
INFO:cephadm:Hostname "nuv-dc-apphost2" matches what is expected.
ERROR: No time synchronization is active
sad check
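On a host with many passing checks, the INFO lines can bury the one that matters. A minimal sketch for narrowing the output down, assuming the WARNING and ERROR line prefixes look like the ones in the output above; filter_check_problems is a hypothetical helper name, not part of Ceph:

```shell
# Keep only the WARNING and ERROR lines of a check-host report.
# filter_check_problems is a hypothetical helper, not a ceph command.
filter_check_problems() {
    grep -E '^(WARNING|ERROR)'
}

# Usage on a live cluster (host name taken from this post):
#   ceph cephadm check-host nuv-dc-apphost2 2>&1 | filter_check_problems
```

The 2>&1 in the usage line is there because the report may arrive on stderr rather than stdout.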

Checking the NTP daemon on the affected host showed that it was indeed down. So I just started the daemon again, and the cluster was happy again right afterwards.

ceph cephadm check-host nuv-dc-apphost3
nuv-dc-apphost3 (None) ok
happy check
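If you are not sure which time sync service a host is supposed to run, you can ask systemd about each unit named in the warning. This is only a sketch: the unit list is copied from the warning above, active_time_sync_units is a hypothetical helper name, and cephadm's own check may apply stricter criteria than "active" (for example, it may also require the unit to be enabled).

```shell
# Unit names copied from the cephadm warning shown in this post.
TIME_SYNC_UNITS="chrony.service chronyd.service systemd-timesyncd.service ntpd.service ntp.service"

# Print every time sync unit that systemd reports as active.
# active_time_sync_units is a hypothetical helper, not part of cephadm.
active_time_sync_units() {
    for unit in $TIME_SYNC_UNITS; do
        # is-active --quiet prints nothing and exits 0 only if the unit is active
        if systemctl is-active --quiet "$unit"; then
            echo "$unit"
        fi
    done
}

# Usage on the affected host:
#   active_time_sync_units
# Empty output means none of the units is active. In that case start the one
# your host uses, e.g. (assumption, substitute your own unit name):
#   sudo systemctl start ntp.service
```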

So, if you have a similar issue, run ceph cephadm check-host to see which check failed so you can resolve it.
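If the warning does not make it obvious which host is affected, you can run the check against several hosts in one go. A rough sketch, assuming a failed check prints "check-host failed" as in the output above (the message may differ between Ceph versions); check_hosts is a hypothetical helper name:

```shell
# Run the cephadm host check for every host name given as an argument
# and print the names of the hosts whose check fails.
# check_hosts is a hypothetical helper, not a ceph command.
check_hosts() {
    for host in "$@"; do
        # Capture stdout and stderr; "|| true" keeps going if ceph exits non-zero
        output=$(ceph cephadm check-host "$host" 2>&1) || true
        case "$output" in
            *"check-host failed"*) echo "$host" ;;
        esac
    done
}

# Usage (host names taken from this post):
#   check_hosts nuv-dc-apphost2 nuv-dc-apphost3
```

The host names the orchestrator knows about can be listed with ceph orch host ls.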
