Ceph - OSD restore performance
When Ceph restores an OSD, performance may seem quite slow. This is due to the default settings: Ceph ships with quite conservative values in order to protect your application workload. Especially if you're running workloads with many small objects (files), the defaults may feel too slow.
Adjust OSD daemon configuration
Restore speed is mostly determined by the OSD daemon configuration. If you want to speed up a restore, you can adjust the following runtime settings (example commands follow the list):
- setting the maximum number of concurrent backfills per OSD (counted independently for inbound and outbound backfills)
- setting the maximum number of simultaneously active recovery requests per OSD
- disabling the sleep between recovery operations on HDD, SSD, and hybrid configurations
- increasing the priority of recovery operations in the OSD worker queue
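A minimal sketch of how these knobs can be injected at runtime across all OSDs. The option names are the standard Ceph options for the items above; the values are illustrative examples, not recommendations, and defaults and exact behaviour vary by Ceph release:

  # backfill / recovery concurrency per OSD (example values, not recommendations)
  ceph tell 'osd.*' injectargs '--osd_max_backfills 8'
  ceph tell 'osd.*' injectargs '--osd_recovery_max_active 8'
  # remove the artificial sleep between recovery operations
  ceph tell 'osd.*' injectargs '--osd_recovery_sleep_hdd 0 --osd_recovery_sleep_ssd 0 --osd_recovery_sleep_hybrid 0'
  # raise the priority of recovery ops in the OSD op queue (range 1-63)
  ceph tell 'osd.*' injectargs '--osd_recovery_op_priority 63'

Changes made with injectargs only last until the OSD daemons restart, which is convenient for a temporary recovery boost.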
Depending on your hardware you will see a rather large increase in restored objects per second (in our test case from 20-30 objects/s to 1,000-2,000 objects/s).
If you're adjusting these values on a production system, I recommend increasing them in steps (using ceph tell to set runtime values) and staying at values that provide a sufficient restore speed; going too fast might impact your actual production workload.
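One such step might look like the following sketch (osd.0 is just a sample daemon used to verify the running value; check the effect before taking the next step):

  # raise backfills by one step and verify the running value on a single OSD
  ceph tell 'osd.*' injectargs '--osd_max_backfills 2'
  ceph tell osd.0 config get osd_max_backfills
  # watch recovery throughput and OSD latency before increasing further
  ceph -s
  ceph osd perf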
Selecting proper values
You might want maximum speed when restoring data, and if your workload allows it, you can increase the values further. Use the output of ceph status as input for your decisions. In our test, the first status output showed 8 pgs actively backfilling while 24 pgs were waiting on backfill. If, and only if, your hardware still has enough spare capacity, you might want to increase the backfill concurrency further in such a case.
  cluster:
    id:     bcdbd2fa-7037-11eb-93b2-9380cdd20e72
    health: HEALTH_WARN
            Degraded data redundancy: 129439/1549110 objects degraded (8.356%), 24 pgs degraded, 24 pgs undersized

  services:
    mon: 3 daemons, quorum nuv-dc-apphost1,nuv-dc-apphost2,nuv-dc-apphost3 (age 2h)
    mgr: nuv-dc-apphost1.cpsuzt(active, since 2h), standbys: nuv-dc-apphost2.esvbvr
    mds: cephfs0:1 {0=cephfs0.nuv-dc-apphost2.agwcjj=up:active} 2 up:standby
    osd: 3 osds: 3 up (since 8m), 3 in (since 8m); 24 remapped pgs

  data:
    pools:   3 pools, 65 pgs
    objects: 516.37k objects, 17 GiB
    usage:   113 GiB used, 143 GiB / 256 GiB avail
    pgs:     129439/1549110 objects degraded (8.356%)
             41 active+clean
             24 active+undersized+degraded+remapped+backfilling

  io:
    recovery: 48 MiB/s, 1.14k objects/s
This is the same environment running the same restore with osd_max_backfills increased to 128: no more pgs waiting for backfill, and recovery is now running at full speed. Use this with care if you're running application workloads!
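Once the restore has finished, remember to return to conservative settings. Values set with ceph tell ... injectargs disappear when the OSD daemons restart; if you also persisted them centrally with ceph config set osd ..., you can drop those overrides so the release defaults apply again (a sketch, assuming central overrides were created):

  # remove central overrides; the daemons fall back to the release defaults
  ceph config rm osd osd_max_backfills
  ceph config rm osd osd_recovery_max_active
  ceph config rm osd osd_recovery_sleep_hdd
  ceph config rm osd osd_recovery_sleep_ssd
  ceph config rm osd osd_recovery_sleep_hybrid
  ceph config rm osd osd_recovery_op_priority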