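I. Hardware tuning
1. NVMe SSD placement
● Purpose
Reduce the overhead of moving data between CPU sockets.
● Method
Install the NVMe SSDs and the NIC on the same riser card.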
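2. Memory population
● Purpose
Populating memory at 1DPC (one DIMM per channel) gives the best performance: with the DIMM0 slot of every channel filled, memory bandwidth is at its maximum.
● Method
Populate the DIMM0 slots first, i.e. slots DIMM000, 010, 020, 030, 040, 050, 100, 110, 120, 130, 140 and 150. In the three-digit number, the first digit is the CPU, the second is the memory channel and the third is the DIMM position within the channel. Fill the slots whose third digit is 0 first, in ascending channel order.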
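The slot naming rule above can be expressed as a small loop that prints the population order. This is a sketch assuming the 2-CPU, 6-channels-per-CPU layout implied by the slot list; other boards differ, so pass their counts explicitly and check the board manual.

```shell
#!/usr/bin/env bash
# Print the 1DPC population order for slots named DIMM<cpu><channel><dimm>.
# Defaults (2 CPUs x 6 channels) are an assumption taken from the slot list.
dimm0_slots() {
    local cpus=${1:-2} channels=${2:-6} cpu ch
    for ((cpu = 0; cpu < cpus; cpu++)); do
        for ((ch = 0; ch < channels; ch++)); do
            # third digit 0 = first DIMM slot of the channel
            printf 'DIMM%d%d0\n' "$cpu" "$ch"
        done
    done
}
# Example: dimm0_slots        -> DIMM000 ... DIMM050, DIMM100 ... DIMM150
```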
3. Balancing the public and cluster NICs
Install the public NIC and the cluster NIC under different CPUs.
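Whether the two NICs really ended up under different CPUs can be read from sysfs. A minimal sketch; the interface names in the example are hypothetical, and SYSFS is overridable only so the check can be tried on a copy of the tree.

```shell
#!/usr/bin/env bash
# Report the NUMA node of the public and cluster NICs and warn if they match.
SYSFS=${SYSFS:-/sys}

nic_numa_node() {
    cat "$SYSFS/class/net/$1/device/numa_node"
}

check_nic_split() {
    local pub=$1 clu=$2 pn cn
    pn=$(nic_numa_node "$pub") || return 2
    cn=$(nic_numa_node "$clu") || return 2
    echo "$pub: node $pn, $clu: node $cn"
    if [ "$pn" = "$cn" ]; then
        echo "WARNING: public and cluster NICs are on the same NUMA node" >&2
        return 1
    fi
}
# Example (hypothetical names): check_nic_split enp130s0f0 enp189s0f0
```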
II. BIOS settings
1. Power Policy: Performance
2. Memory refresh rate: 64 ms
3. SMMU: Disabled
4. CPU prefetching: Enabled
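III. Network configuration
1. Bond mode
public-bond and cluster-bond each aggregate two 10GE ports into one bond, with the following bonding parameters (mode=2 is balance-xor, miimon=1000 checks link state every 1000 ms, and xmit_hash_policy=layer3+4 spreads traffic by IP address and port):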
BONDING_OPTS="mode=2 miimon=1000 xmit_hash_policy=layer3+4"
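The BONDING_OPTS line belongs in the bond's interface configuration file. A sketch assuming the RHEL/CentOS network-scripts (ifcfg) layout; the bond name, port names and address in the example are hypothetical.

```shell
#!/usr/bin/env bash
# Write ifcfg files for one bond and its member ports into directory $1.
write_bond_ifcfg() {
    local dir=$1 bond=$2 ip=$3 prefix=$4; shift 4
    local port
    cat > "$dir/ifcfg-$bond" <<EOF
DEVICE=$bond
TYPE=Bond
BONDING_MASTER=yes
BOOTPROTO=none
ONBOOT=yes
IPADDR=$ip
PREFIX=$prefix
BONDING_OPTS="mode=2 miimon=1000 xmit_hash_policy=layer3+4"
EOF
    for port in "$@"; do          # remaining arguments are the member ports
        cat > "$dir/ifcfg-$port" <<EOF
DEVICE=$port
MASTER=$bond
SLAVE=yes
BOOTPROTO=none
ONBOOT=yes
EOF
    done
}
# Example: write_bond_ifcfg /etc/sysconfig/network-scripts public-bond 192.168.10.11 24 enp130s0f0 enp130s0f1
```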
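2. NIC parameters
Set the MTU and the NIC ring buffer (queue) sizes with the following script:
# Physical NICs only: lo, bonds, VLANs and bridges have no "device" link
for i in $(ls /sys/class/net); do
    [ -e /sys/class/net/$i/device ] || continue
    ip link set dev $i mtu 9000 up        # jumbo frames; the switch must allow them too
    ethtool -G $i rx 4096 tx 4096         # must not exceed the maximum shown by ethtool -g
done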
3. Disable the IRQ balancing service (irqbalance)
systemctl stop irqbalance
systemctl disable irqbalance
4. Enable LRO (large receive offload)
ethtool -K enp130s0f0 lro on
Check whether it is enabled:
ethtool -k enp130s0f0 | grep large-receive-offload
5. Ring buffer size
ethtool -G enp130s0f0 rx 4096 tx 4096
Check:
ethtool -g enp130s0f0
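ethtool -G fails if the requested size exceeds what the NIC supports. The sketch below reads the "Pre-set maximums" block of ethtool -g and clamps the requested 4096 to it; the parsing assumes the usual "RX:" / "TX:" line layout of ethtool -g output.

```shell
#!/usr/bin/env bash
# Print "<rx_max> <tx_max>" from `ethtool -g` output read on stdin.
ring_max() {
    awk '/^Pre-set maximums/ {m = 1; next}
         /^Current hardware/ {m = 0}
         m && $1 == "RX:" {rx = $2}
         m && $1 == "TX:" {tx = $2}
         END {print rx, tx}'
}

min() { echo $(( $1 < $2 ? $1 : $2 )); }

# Request $2 (default 4096) entries, but never more than the hardware maximum.
set_ring() {
    local nic=$1 want=${2:-4096} rx tx
    read -r rx tx < <(ethtool -g "$nic" | ring_max)
    ethtool -G "$nic" rx "$(min "$want" "$rx")" tx "$(min "$want" "$tx")"
}
# Example (as root): set_ring enp130s0f0
```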
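6. Pinning NIC interrupts to cores
①. Disable the irqbalance service (item 3).
②. Find the NUMA node the NIC belongs to:
cat /sys/class/net/enp130s0f0/device/numa_node
③. Find the CPU cores of that NUMA node (the "NUMA nodeX CPU(s)" lines):
lscpu
④. Find the NIC's interrupt numbers:
ls /sys/class/net/enp130s0f0/device/msi_irqs/
cat /proc/interrupts | grep enp130s0f0 | awk -F ':' '{print $1}'
⑤. Bind each interrupt to a core of that NUMA node:
echo <core_id> > /proc/irq/<irq_number>/smp_affinity_list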
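Steps ② to ⑤ can be combined into one function that spreads a NIC's MSI/MSI-X interrupts round-robin over the cores of its NUMA node. A sketch, to be run as root with irqbalance stopped; the round-robin assignment is this sketch's choice, and SYSFS/PROC are overridable only so it can be dry-run on a copy of the tree.

```shell
#!/usr/bin/env bash
SYSFS=${SYSFS:-/sys}
PROC=${PROC:-/proc}

# Expand a kernel cpulist such as "0-3,8,10-11" to "0 1 2 3 8 10 11".
expand_cpulist() {
    local part i
    local -a parts out=()
    IFS=',' read -ra parts <<< "$1"
    for part in "${parts[@]}"; do
        if [[ $part == *-* ]]; then
            for ((i = ${part%-*}; i <= ${part#*-}; i++)); do out+=("$i"); done
        else
            out+=("$part")
        fi
    done
    echo "${out[@]}"
}

pin_nic_irqs() {
    local nic=$1 node f n=0
    local -a cores
    node=$(cat "$SYSFS/class/net/$nic/device/numa_node") || return 1
    # numa_node is -1 when the platform reports no NUMA affinity
    if [ "$node" -lt 0 ]; then node=0; fi
    read -ra cores <<< "$(expand_cpulist "$(cat "$SYSFS/devices/system/node/node$node/cpulist")")"
    for f in "$SYSFS/class/net/$nic/device/msi_irqs"/*; do
        echo "${cores[n % ${#cores[@]}]}" > "$PROC/irq/${f##*/}/smp_affinity_list"
        n=$((n + 1))
    done
}
# Example (as root): pin_nic_irqs enp130s0f0
```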
IV. OS configuration
Add the following lines to /etc/profile, then run source /etc/profile:
ulimit -u 1000000
ulimit -n 1000000
ulimit -d unlimited
ulimit -m unlimited
ulimit -s unlimited
ulimit -t unlimited
ulimit -v unlimited
ulimit -l 1024000
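Limits set in /etc/profile only apply to login shells; daemons started by systemd, such as the ceph-osd@ units, do not read it and take their limits from the unit. A sketch of a systemd drop-in carrying the same open-file and process limits; the drop-in file name ulimits.conf is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Write a drop-in for the ceph-osd@ template unit under directory $1.
write_osd_limits_dropin() {
    local unitdir=${1:-/etc/systemd/system}
    mkdir -p "$unitdir/ceph-osd@.service.d"
    cat > "$unitdir/ceph-osd@.service.d/ulimits.conf" <<'EOF'
[Service]
LimitNOFILE=1000000
LimitNPROC=1000000
EOF
}
# Example (as root): write_osd_limits_dropin && systemctl daemon-reload
# The new limits apply once the OSDs are restarted.
```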
V. Disk I/O scheduler
1. Set the I/O scheduler of the HDDs (sda to sdt) to mq-deadline (on older kernels without blk-mq the equivalent scheduler is called deadline):
for d in sd{a..t}; do
    echo mq-deadline > /sys/block/$d/queue/scheduler
done
2. Set the I/O scheduler of the NVMe SSDs to none:
echo none > /sys/block/nvme0n1/queue/scheduler
echo none > /sys/block/nvme0n2/queue/scheduler
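Instead of listing devices by hand, the scheduler can be chosen from the kernel's rotational flag. A sketch limited to sd* and nvme* devices; it assumes the flag is correct, which may not hold for disks behind some RAID controllers, and SYSFS is overridable only for dry runs.

```shell
#!/usr/bin/env bash
SYSFS=${SYSFS:-/sys}

# mq-deadline for rotational disks, none for SSDs.
set_schedulers() {
    local q
    for q in "$SYSFS"/block/sd*/queue "$SYSFS"/block/nvme*/queue; do
        [ -w "$q/scheduler" ] || continue    # also skips unmatched glob patterns
        if [ "$(cat "$q/rotational")" = 1 ]; then
            echo mq-deadline > "$q/scheduler"
        else
            echo none > "$q/scheduler"
        fi
    done
}
# Example (as root): set_schedulers
```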
VI. Software-level tuning
1. kernel.pid_max: raise the kernel PID limit to its maximum
echo 4194303 > /proc/sys/kernel/pid_max
2. MTU (takes effect only if the switch ports also support jumbo frames)
Append MTU=9000 to the NIC configuration file.
3. read_ahead: read data ahead into memory (the page cache) to speed up sequential disk reads
for d in sd{a..t} nvme0n1 nvme1n1; do
    echo 8192 > /sys/block/$d/queue/read_ahead_kb
done
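4. swappiness: set vm.swappiness to 0 so the kernel avoids swapping (this does not turn swap off; swapoff -a does that)
echo "vm.swappiness = 0" >> /etc/sysctl.conf; sysctl -p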
5. Disable the bcache sequential cutoff (0 = sequential I/O is no longer sent around the cache)
for i in {0..19}; do
    echo 0 > /sys/block/bcache$i/bcache/sequential_cutoff
done
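6. bcache cache-set congestion thresholds (0 = never bypass the cache because of congestion)
# The trailing / matches only cache-set directories, not files such as register
for var in /sys/fs/bcache/*/; do
    echo 0 > ${var}congested_read_threshold_us
    echo 0 > ${var}congested_write_threshold_us
done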
7. Raise the minimum writeback rate (writeback_rate_minimum, in 512-byte sectors per second; default 8, i.e. 4 KiB/s) to 512 (256 KiB/s)
for d in sd{a..t}; do
    echo 512 > /sys/block/$d/bcache/writeback_rate_minimum
done
8. Enable writeback mode on all backing devices
for d in sd{a..t}; do
    echo writeback > /sys/block/$d/bcache/cache_mode
done
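To confirm that the bcache settings took effect, the active mode, the sequential cutoff and the amount of dirty data can be read back per device. A sketch; SYSFS is overridable only for dry runs.

```shell
#!/usr/bin/env bash
SYSFS=${SYSFS:-/sys}

bcache_status() {
    local dev b mode
    for dev in "$SYSFS"/block/bcache*; do
        b=$dev/bcache
        [ -r "$b/cache_mode" ] || continue
        # cache_mode lists all modes and brackets the active one,
        # e.g. "writethrough [writeback] writearound none"
        mode=$(sed -n 's/.*\[\(.*\)\].*/\1/p' "$b/cache_mode")
        printf '%s mode=%s sequential_cutoff=%s dirty=%s\n' \
            "${dev##*/}" "$mode" "$(cat "$b/sequential_cutoff")" "$(cat "$b/dirty_data")"
    done
}
# Example: bcache_status
```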
9. I/O path (congestion) tracking
This is the same cache-set congested_read_threshold_us / congested_write_threshold_us setting already applied in item 6; it does not need to be repeated.
10. Dirty page writeback ratios
echo 5 > /proc/sys/vm/dirty_background_ratio
echo 10 > /proc/sys/vm/dirty_ratio
11. bcache writeback_percent and writeback_delay in seconds (defaults 10 and 30)
for f in /sys/block/bcache*; do
    echo writeback > $f/bcache/cache_mode
    echo 20 > $f/bcache/writeback_percent
    echo 80 > $f/bcache/writeback_delay
done
12. System-wide open-file limit for all processes (fs.file-max)
Set it to the MemTotal value from /proc/meminfo (total memory in kB):
file_max=$(awk '/^MemTotal/ {print $2}' /proc/meminfo)
echo $file_max > /proc/sys/fs/file-max
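Values written to /proc/sys are lost on reboot. A sketch that collects the kernel settings of this section into one sysctl drop-in; the file name 90-ceph-tuning.conf is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Write the sysctl settings to $1, taking fs.file-max from MemTotal in $2.
write_sysctl_conf() {
    local out=${1:-/etc/sysctl.d/90-ceph-tuning.conf}
    local meminfo=${2:-/proc/meminfo}
    local file_max
    # fs.file-max = MemTotal in kB, as described in item 12
    file_max=$(awk '/^MemTotal:/ {print $2}' "$meminfo")
    cat > "$out" <<EOF
kernel.pid_max = 4194303
vm.swappiness = 0
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10
fs.file-max = $file_max
EOF
}
# Example (as root): write_sysctl_conf && sysctl --system
```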
13. nr_requests (default 256)
Check: cat /sys/block/sdb/queue/nr_requests
Set it to 512 on all HDDs:
for d in sd{a..t}; do
    echo 512 > /sys/block/$d/queue/nr_requests
done
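The per-disk sysfs settings (scheduler, read_ahead_kb, nr_requests) are also lost on reboot or when a disk is re-added. A sketch that writes a udev rule reapplying them; the rule file name 60-ceph-disk-tuning.rules is an arbitrary choice, and the KERNEL patterns assume disks named sd[a-z] and nvmeXnY.

```shell
#!/usr/bin/env bash
# Write a udev rule file to $1.
write_disk_udev_rule() {
    local out=${1:-/etc/udev/rules.d/60-ceph-disk-tuning.rules}
    cat > "$out" <<'EOF'
# HDDs: mq-deadline, 8 MiB read-ahead, 512 queued requests
ACTION=="add|change", SUBSYSTEM=="block", ENV{DEVTYPE}=="disk", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="mq-deadline", ATTR{queue/read_ahead_kb}="8192", ATTR{queue/nr_requests}="512"
# NVMe SSDs: no scheduler, 8 MiB read-ahead
ACTION=="add|change", SUBSYSTEM=="block", ENV{DEVTYPE}=="disk", KERNEL=="nvme[0-9]*n[0-9]*", ATTR{queue/scheduler}="none", ATTR{queue/read_ahead_kb}="8192"
EOF
}
# Example (as root): write_disk_udev_rule && udevadm control --reload && udevadm trigger --subsystem-match=block --action=change
```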
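VII. Ceph parameter tuning
Add the following to ceph.conf. The osd journal size and journal * options only apply to FileStore OSDs, while osd memory target applies to BlueStore OSDs.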
[global]
osd pool default size=3
osd memory target=4294967296
osd pool default min size=1
max open files=131072
[mon]
mon clock drift allowed=1
mon osd min down reporters=13
mon osd down out interval=600
[osd]
osd journal size=20000
osd max write size=512
osd client message size cap=2147483648
osd deep scrub stride=131072
osd op threads=16
osd disk threads=4
osd map cache size=1024
osd map cache bl size=128
osd recovery op priority=2
osd recovery max active=10
osd max backfills=4
osd min pg log entries=30000
osd max pg log entries=100000
osd mon heartbeat interval=40
ms dispatch throttle bytes=1048576000
objecter inflight ops=819200
osd op log threshold=50
osd crush chooseleaf type=0
journal max write bytes=1073741824
journal max write entries=10000
journal queue max ops=50000
journal queue max bytes=10485760000
[client]
rbd cache=True
rbd cache size=335544320
rbd cache max dirty=134217728
rbd cache max dirty age=30
rbd cache writethrough until flush=False
rbd cache max dirty object=2
rbd cache target dirty=235544320
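With a long option list it is easy to set the same key twice in one section, in which case only one of the values can take effect. A sketch that reports such repeats; it normalises spaces and underscores in option names before comparing, since Ceph treats "osd memory target" and "osd_memory_target" as the same option.

```shell
#!/usr/bin/env bash
# Print "<section>: <option>" for each option repeated within a section.
find_dup_ceph_options() {
    awk '
        /^[ \t]*[#;]/ || /^[ \t]*$/ { next }          # comments, blank lines
        /^[ \t]*\[/ { gsub(/[ \t]|\[|\]/, ""); sect = $0; next }
        {
            key = $0
            sub(/=.*/, "", key)                       # keep the part before =
            gsub(/^[ \t]+|[ \t]+$/, "", key)          # trim
            gsub(/[ \t_]+/, "_", key)                 # spaces == underscores
            if (seen[sect SUBSEP key]++ == 1) print sect ": " key
        }
    ' "${1:-/etc/ceph/ceph.conf}"
}
# Example: find_dup_ceph_options /etc/ceph/ceph.conf
```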