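Goal
Ceph recovery consumes a large amount of bandwidth.
This note investigates how to control that, mainly by reducing the speed and I/O load of Ceph recovery.
Query the current maximum throughput of a single OSD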
[root@cephsvr-128214 ~]# ceph tell osd.12 bench
{
"bytes_written": 1073741824,
"blocksize": 4194304,
"bytes_per_sec": 122277678
}
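The bench command is a write test (it reports bytes_written), and bytes_per_sec is in bytes per second, which is awkward to compare with the MB/s figures used later in this note. A minimal sketch of the conversion, assuming the JSON shape shown above; the helper name summarize_bench is hypothetical and not part of Ceph:

```python
import json

def summarize_bench(raw_json):
    """Convert the JSON printed by `ceph tell osd.N bench` into readable units."""
    result = json.loads(raw_json)
    rate = result["bytes_per_sec"]
    return {
        "mib_per_sec": rate / (1024 * 1024),
        # Time the OSD needed to write the whole test payload.
        "seconds": result["bytes_written"] / rate,
        "block_mib": result["blocksize"] / (1024 * 1024),
    }

# Sample copied from the osd.12 output above.
sample = '{"bytes_written": 1073741824, "blocksize": 4194304, "bytes_per_sec": 122277678}'
summary = summarize_bench(sample)
print("%.1f MiB/s, %.1f s, %d MiB blocks"
      % (summary["mib_per_sec"], summary["seconds"], summary["block_mib"]))
```

For the sample this works out to roughly 116.6 MiB/s, i.e. 1 GiB written in 4 MiB blocks in a little under 9 seconds.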
Common recovery checks
How to adjust parameters
Adjusting a parameter on a single OSD
[root@cephsvr-128214 ~]# ceph daemon osd.12 config set debug_osd 10
{
"success": ""
}
[root@cephsvr-128214 ~]# ceph --admin-daemon /var/run/ceph/ceph-osd.12.asok config show | grep debug_osd
"debug_osd": "10/10",
Adjusting a parameter on all OSDs
[root@cephsvr-128040 dizzy]# ceph tell osd.\* injectargs '--osd_max_backfills=1'
osd.0: osd_max_backfills = '1'
osd.1: osd_max_backfills = '1'
osd.2: osd_max_backfills = '1'
osd.3: osd_max_backfills = '1'
osd.4: osd_max_backfills = '1'
osd.5: osd_max_backfills = '1'
osd.6: osd_max_backfills = '1'
osd.7: osd_max_backfills = '1'
osd.8: osd_max_backfills = '1'
osd.9: osd_max_backfills = '1'
osd.10: osd_max_backfills = '1'
osd.11: osd_max_backfills = '1'
osd.12: osd_max_backfills = '1'
osd.13: osd_max_backfills = '1'
osd.14: osd_max_backfills = '1'
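Several options can be passed in one injectargs string, and values injected this way only last until the OSD restarts. A small sketch of building such a command, assuming the same --name=value form as above; injectargs_command is a hypothetical helper, not a Ceph API:

```python
import shlex

def injectargs_command(settings, target="osd.*"):
    """Build the argv list for `ceph tell <target> injectargs '...'`."""
    # Each option is rendered in the same --name=value form used above.
    args = " ".join("--%s=%s" % (name, value) for name, value in settings.items())
    return ["ceph", "tell", target, "injectargs", args]

cmd = injectargs_command({"osd_max_backfills": 1, "osd_recovery_max_active": 3})
# Print a copy-pasteable shell line; shlex.quote protects the * and the spaces.
print(" ".join(shlex.quote(part) for part in cmd))
```

On a node with an admin keyring the list in cmd could be handed to subprocess.run as is; to keep the values across restarts they would also need to go into the [osd] section of ceph.conf.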
Querying the current parameter values
[root@cephsvr-128040 dizzy]# ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep backfill
"mon_osd_backfillfull_ratio": "0.900000",
"osd_backfill_retry_interval": "30.000000",
"osd_backfill_scan_max": "512",
"osd_backfill_scan_min": "64",
"osd_debug_reject_backfill_probability": "0.000000",
"osd_debug_skip_full_check_in_backfill_reservation": "false",
"osd_kill_backfill_at": "0",
"osd_max_backfills": "1",
I/O tuning (for recovery)
Recovery information
Use the following command to get recovery information
[root@cephsvr-128214 ~]# ceph pg dump|grep recovering
dumped all
3.20 332 0 650 0 0 1391972352 767 767 active+recovering+degraded 2017-12-01 17:32:16.343398 392'767 511:1495 [30,13,3] 30 [30,13,3] 30 392'31 2017-11-29 13:09:00.918493 330'12 2017-11-24 16:50:10.749723
3.365 357 0 721 0 0 1492619264 803 803 active+recovering+degraded 2017-12-01 17:32:16.603756 392'803 511:1425 [4,15,30] 4 [4,15,30] 4 392'24 2017-11-29 15:07:21.700341 392'24 2017-11-29 15:07:21.700341
3.428 335 485 485 0 0 1400328438 1666 1666 active+recovering+degraded 2017-12-01 16:54:11.036479 392'1666 510:49 [16,6,31] 16 [16,6,31] 16 392'407 2017-11-30 06:03:19.380038 0'0 2017-11-23 16:34:44.666269
3.524 383 144 144 0 0 1592238097 919 919 active+recovering+degraded 2017-12-01 16:52:35.565517 392'919 510:817 [15,10,32] 15 [15,10,32] 15 392'397 2017-11-29 23:30:37.556655 391'34 2017-11-26 01:52:22.370744
3.640 315 0 639 0 0 1321177088 685 685 active+recovering+degraded 2017-12-01 17:32:18.653057 392'685 511:1340 [2,19,34] 2 [2,19,34] 2 392'311 2017-11-30 08:50:52.272975 392'34 2017-11-28 22:58:28.172339
The important fields are as follows
[root@cephsvr-128214 ~]# ceph pg dump|grep recovering|awk '{print $1,$2,$4,$10,$15,$16,$17,$18}'
PG_STAT OBJECTS DEGRADED STATE UP UP_PRIMARY ACTING ACTING_PRIMARY
3.7d0 339 429 active+recovering+degraded [3,29,19] 3 [3,29,19] 3
3.713 320 456 active+recovering+degraded [0,30,15] 0 [0,30,15] 0
3.198 313 419 active+recovering+degraded [6,13,33] 6 [6,13,33] 6
3.428 335 485 active+recovering+degraded [16,6,31] 16 [16,6,31] 16
3.524 383 144 active+recovering+degraded [15,10,32] 15 [15,10,32] 15
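In the data rows the state timestamp splits into two whitespace-separated fields, so $15 to $18 are the up set, the up primary, the acting set and the acting primary. The sketch below picks the same fields out in Python and adds up the degraded objects. It assumes the column layout of the pg dump rows shown in this note, which may differ in other Ceph releases; both function names are hypothetical.

```python
def parse_recovering_line(line):
    """Pick the fields printed by the awk command out of one `ceph pg dump` row."""
    f = line.split()
    # awk fields are 1-based, Python lists are 0-based: $1 -> f[0], $4 -> f[3], ...
    return {
        "pgid": f[0],
        "objects": int(f[1]),
        "degraded": int(f[3]),
        "state": f[9],
        "up": f[14],
        "up_primary": int(f[15]),
        "acting": f[16],
        "acting_primary": int(f[17]),
    }

def total_degraded(lines):
    """Sum the degraded object count over all rows that are recovering."""
    return sum(parse_recovering_line(l)["degraded"] for l in lines if "recovering" in l)

# Two rows copied from the pg dump output earlier in this note.
rows = [
    "3.20 332 0 650 0 0 1391972352 767 767 active+recovering+degraded 2017-12-01 17:32:16.343398 392'767 511:1495 [30,13,3] 30 [30,13,3] 30 392'31 2017-11-29 13:09:00.918493 330'12 2017-11-24 16:50:10.749723",
    "3.428 335 485 485 0 0 1400328438 1666 1666 active+recovering+degraded 2017-12-01 16:54:11.036479 392'1666 510:49 [16,6,31] 16 [16,6,31] 16 392'407 2017-11-30 06:03:19.380038 0'0 2017-11-23 16:34:44.666269",
]
print(parse_recovering_line(rows[0])["up"], total_degraded(rows))
```

Watching the degraded total fall over time gives a rough progress indicator alongside the per-PG view.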
watch script
watch -n 1 -d "ceph pg dump|grep recovering|awk '{print \$1,\$2,\$4,\$10,\$15,\$16,\$17,\$18}'"
Current disk read/write speed
Install the tool
yum install -y dstat
Monitor the status
[root@cephsvr-128214 ~]# dstat -td -D /dev/sdb
time | read writ
01-12 17:37:26| 758B 2382k
01-12 17:37:27| 0 8228k
01-12 17:37:28| 0 16M
01-12 17:37:29| 0 24M
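dstat prints sizes with unit suffixes, which makes the samples hard to average by eye. A rough sketch that converts them to bytes, assuming dstat scales by powers of 1024 and prints a bare 0 for an idle interval; dstat_to_bytes is a hypothetical helper:

```python
def dstat_to_bytes(text):
    """Convert a dstat size such as '758B', '2382k' or '16M' to bytes."""
    # Assumption: dstat scales by powers of 1024 and prints a bare 0 when idle.
    units = {"B": 1, "k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    text = text.strip()
    if text and text[-1] in units:
        return int(float(text[:-1]) * units[text[-1]])
    return int(float(text))

# The "writ" column from the sample above.
writes = ["2382k", "8228k", "16M", "24M"]
average = sum(dstat_to_bytes(w) for w in writes) / len(writes)
print("average write: %.1f MiB/s" % (average / 1024 ** 2))
```

Averaging over a longer window than these four samples gives a steadier figure for the per-disk columns in the tables later in this note.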
Recovery control
Default parameters
osd_max_backfills = 1
osd_recovery_max_active = 3
osd_recovery_sleep = 0
osd_disk_thread_ioprio_priority = -1
osd_disk_threads = 1
osd_backfill_scan_max = 512
osd_backfill_scan_min = 64
osd_recovery_op_priority = 3
Default behavior
max recovery | journal SSD | single OSD (SATA) | recovering PGs |
80 ~ 150 MB/s | 50 ~ 180 MB/s | 20 ~ 60 MB/s | 5 ~ 6 |
Tuning parameters
osd_max_backfills
osd_max_backfills = 2
max recovery | journal SSD | single OSD (SATA) | recovering PGs |
120 ~ 350 MB/s | 180 ~ 340 MB/s | 12 ~ 70 MB/s | 20 ~ 22 |
osd_recovery_max_active
osd_recovery_max_active = 10
max recovery | journal SSD | single OSD (SATA) | recovering PGs |
80 ~ 250 MB/s | 50 ~ 260 MB/s | 20 ~ 60 MB/s | 5 ~ 10 |
Adjusting this parameter alone has little effect; it has to be tuned together with osd_max_backfills.
osd_recovery_sleep
osd_recovery_sleep = 0.5
This parameter is mainly used to slow recovery down.
max recovery | journal SSD | single OSD (SATA) | recovering PGs |
30 ~ 50 MB/s | 20 ~ 55 MB/s | 12 ~ 40 MB/s | 5 ~ 10 |
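To see what these ranges mean in practice, the sketch below turns each measured "max recovery" range into an estimated duration for a fixed amount of data. It is a back-of-the-envelope calculation only: it assumes 1 MB/s is 10**6 bytes per second and that the rate stays within the measured range for the whole recovery. The 1 TB volume is a made-up example, and recovery_seconds is a hypothetical helper.

```python
def recovery_seconds(total_bytes, low_mb_s, high_mb_s):
    """Return (fastest, slowest) seconds to move total_bytes at the given rates."""
    # Assumption: 1 MB/s means 10**6 bytes per second.
    fastest = total_bytes / (high_mb_s * 10 ** 6)
    slowest = total_bytes / (low_mb_s * 10 ** 6)
    return fastest, slowest

# "max recovery" ranges from the tables above, in MB/s.
measured = {
    "default": (80, 150),
    "osd_max_backfills = 2": (120, 350),
    "osd_recovery_max_active = 10": (80, 250),
    "osd_recovery_sleep = 0.5": (30, 50),
}
one_tb = 10 ** 12  # hypothetical amount of data to recover
for name, (low, high) in measured.items():
    fastest, slowest = recovery_seconds(one_tb, low, high)
    print("%-30s %5.1f to %5.1f hours" % (name, fastest / 3600, slowest / 3600))
```

The slower settings trade a longer degraded window for more bandwidth left to client I/O, so the choice depends on how long the cluster can tolerate reduced redundancy.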