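Goal
OpenStack users have repeatedly reported that large-capacity VMs hang under high I/O throughput. This upgrade is intended to address that problem.
Upgrade the VM
Try upgrading the VM kernel (CentOS 7)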
rpm -ivh kernel-ml-4.20.7-1.el7.elrepo.x86_64.rpm;
grub2-set-default 0;
grub2-mkconfig -o /boot/grub2/grub.cfg;
init 6
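After the reboot it is worth confirming that the machine actually booted the new kernel, since the GRUB default entry may not point where expected. The following is a minimal sketch, assuming GNU `sort -V` is available; `kernel_at_least` is a hypothetical helper name, not part of any package.

```shell
# kernel_at_least WANT [HAVE]: succeed if HAVE (default: running kernel) is >= WANT.
# Hypothetical helper for illustration only.
kernel_at_least() {
    want="$1"
    have="${2:-$(uname -r)}"
    # sort -V orders version strings; if WANT sorts first (or ties), HAVE is new enough.
    [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

if kernel_at_least 4.20; then
    echo "running kernel $(uname -r) is 4.20 or newer"
else
    echo "still on $(uname -r); check the GRUB default entry"
fi
```

The same check applies to the physical host after its kernel upgrade.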
I/O test run against the VM
yum install -y fio
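# Loop forever; each pass runs a 70/30 random read/write mix for up to 600 seconds.
# direct=1 requests non-buffered I/O; fio accepts only 0 or 1 for this option.
while true
do
    fio -filename=/tmp/test.file -direct=1 -iodepth=2 -thread -rw=randrw -rwmixread=70 -ioengine=psync -bs=16k -size=20G -numjobs=10 -runtime=600 -group_reporting -name=test_r_w
    sync
    sleep 3
done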
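See the monitoring results for reference.
Upgrade the physical host
Includes kernel-ml-4.20 to fix kernel bugs
Includes libvirt-5.0.0-1.el7.x86_64, in the hope of fixing the hangs caused by libvirtd
Includes qemu-kvm-ev-2.12.0-33.1.el7_7.4.x86_64 to update the QEMU version
Includes libguestfs-1.40.2-5.el7_7.2.x86_64 to support python3-based monitoring
Upgrade the kernel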
rpm -ivh kernel-ml-4.20.7-1.el7.elrepo.x86_64.rpm;
grub2-set-default 0;
grub2-mkconfig -o /boot/grub2/grub.cfg;
init 6
Upgrade libvirtd
Using the CentOS 7.7.1908 package repositories is recommended
/etc/yum.repos.d/qemu.repo
[libvirt]
name=libvirt
baseurl=http://mirror.centos.org/centos/7/virt/x86_64/libvirt-latest/
enabled=1
gpgcheck=0
[qemu]
name=qemu
baseurl=http://mirror.centos.org/centos/7/virt/x86_64/kvm-common/
enabled=1
gpgcheck=0
Update the libvirt packages
yum update dbus-libs
yum update libvirt*
Upgrade QEMU
yum update qemu-kvm qemu-img qemu-kvm-common
Update the packages required for monitoring
yum update libguestfs libguestfs-devel libguestfs-tools libguestfs-tools-c libguestfs-winsupport python-libguestfs
bug report
bug 1
After updating QEMU to version 2.10 or later, nova-compute always logs the following error
2020-03-19 17:53:43.401 36513 TRACE nova.openstack.common.periodic_task Traceback (most recent call last):
2020-03-19 17:53:43.401 36513 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/site-packages/nova/openstack/common/periodic_task.py", line 182, in run_periodic_tasks
2020-03-19 17:53:43.401 36513 TRACE nova.openstack.common.periodic_task task(self, context)
2020-03-19 17:53:43.401 36513 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 5476, in update_available_resource
2020-03-19 17:53:43.401 36513 TRACE nova.openstack.common.periodic_task rt.update_available_resource(context)
2020-03-19 17:53:43.401 36513 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/site-packages/nova/openstack/common/lockutils.py", line 249, in inner
2020-03-19 17:53:43.401 36513 TRACE nova.openstack.common.periodic_task return f(*args, **kwargs)
2020-03-19 17:53:43.401 36513 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 293, in update_available_resource
2020-03-19 17:53:43.401 36513 TRACE nova.openstack.common.periodic_task resources = self.driver.get_available_resource(self.nodename)
2020-03-19 17:53:43.401 36513 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 4277, in get_available_resource
2020-03-19 17:53:43.401 36513 TRACE nova.openstack.common.periodic_task stats = self.get_host_stats(refresh=True)
2020-03-19 17:53:43.401 36513 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 4952, in get_host_stats
2020-03-19 17:53:43.401 36513 TRACE nova.openstack.common.periodic_task return self.host_state.get_host_stats(refresh=refresh)
2020-03-19 17:53:43.401 36513 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5352, in get_host_stats
2020-03-19 17:53:43.401 36513 TRACE nova.openstack.common.periodic_task self.update_status()
2020-03-19 17:53:43.401 36513 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5395, in update_status
2020-03-19 17:53:43.401 36513 TRACE nova.openstack.common.periodic_task data['disk_available_least'] = _get_disk_available_least()
2020-03-19 17:53:43.401 36513 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5368, in _get_disk_available_least
2020-03-19 17:53:43.401 36513 TRACE nova.openstack.common.periodic_task disk_over_committed = (self.driver.
2020-03-19 17:53:43.401 36513 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 4922, in get_disk_over_committed_size_total
2020-03-19 17:53:43.401 36513 TRACE nova.openstack.common.periodic_task self.get_instance_disk_info(i_name))
2020-03-19 17:53:43.401 36513 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 4898, in get_instance_disk_info
2020-03-19 17:53:43.401 36513 TRACE nova.openstack.common.periodic_task backing_file = libvirt_utils.get_disk_backing_file(path)
2020-03-19 17:53:43.401 36513 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/utils.py", line 482, in get_disk_backing_file
2020-03-19 17:53:43.401 36513 TRACE nova.openstack.common.periodic_task backing_file = images.qemu_img_info(path).backing_file
2020-03-19 17:53:43.401 36513 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/site-packages/nova/virt/images.py", line 54, in qemu_img_info
2020-03-19 17:53:43.401 36513 TRACE nova.openstack.common.periodic_task 'qemu-img', 'info', path)
2020-03-19 17:53:43.401 36513 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/site-packages/nova/utils.py", line 165, in execute
2020-03-19 17:53:43.401 36513 TRACE nova.openstack.common.periodic_task return processutils.execute(*cmd, **kwargs)
2020-03-19 17:53:43.401 36513 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/site-packages/nova/openstack/common/processutils.py", line 195, in execute
2020-03-19 17:53:43.401 36513 TRACE nova.openstack.common.periodic_task cmd=sanitized_cmd)
2020-03-19 17:53:43.401 36513 TRACE nova.openstack.common.periodic_task ProcessExecutionError: Unexpected error while running command.
2020-03-19 17:53:43.401 36513 TRACE nova.openstack.common.periodic_task Command: env LC_ALL=C LANG=C qemu-img info /var/lib/nova/instances/dbea59cc-6ca7-4986-8c72-e4c895577c7f/disk
2020-03-19 17:53:43.401 36513 TRACE nova.openstack.common.periodic_task Exit code: 1
2020-03-19 17:53:43.401 36513 TRACE nova.openstack.common.periodic_task Stdout: u''
2020-03-19 17:53:43.401 36513 TRACE nova.openstack.common.periodic_task Stderr: u'qemu-img: Could not open \'/var/lib/nova/instances/dbea59cc-6ca7-4986-8c72-e4c895577c7f/disk\': Failed to get shared "write" lock\nIs another process using the image [/var/lib/nova/instances/dbea59cc-6ca7-4986-8c72-e4c895577c7f/disk]?\n'
2020-03-19 17:53:43.401 36513 TRACE nova.openstack.common.periodic_task
2020-03-19 17:54:10.528 36513 WARNING nova.virt.libvirt.driver [-] Connection to libvirt lost: 1
2020-03-19 17:54:10.617 36513 INFO nova.openstack.common.service [-] Caught SIGTERM, exiting
Fix
Edit /usr/lib/python2.7/site-packages/nova/virt/images.py (line 54)
Original:
out, err = utils.execute('env', 'LC_ALL=C', 'LANG=C',
'qemu-img', 'info', path)
Modified:
out, err = utils.execute('env', 'LC_ALL=C', 'LANG=C',
'qemu-img', 'info', '-U', path)
Then restart libvirtd and nova-compute.
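Since version 2.10, QEMU locks an image while it is in use, so `qemu-img info` on the disk of a running instance fails unless `-U` (`--force-share`) is passed. The edit above can also be scripted. This is a minimal sketch, not a supported nova patch; `patch_qemu_img_info` is a hypothetical helper name, and the demonstration runs on a scratch copy of the two relevant lines rather than on the real file.

```shell
# patch_qemu_img_info FILE: add '-U' to the qemu-img info call in FILE.
# Hypothetical helper for illustration only.
patch_qemu_img_info() {
    file="$1"
    # Skip if the flag is already present, so the patch can be re-run safely.
    if grep -q "'qemu-img', 'info', '-U', path" "$file"; then
        return 0
    fi
    # Keep a .bak copy, then insert '-U' before the path argument.
    sed -i.bak "s/'qemu-img', 'info', path)/'qemu-img', 'info', '-U', path)/" "$file"
}

# Demonstration on a scratch file containing the original two lines.
demo=$(mktemp)
cat > "$demo" <<'EOF'
    out, err = utils.execute('env', 'LC_ALL=C', 'LANG=C',
                             'qemu-img', 'info', path)
EOF
patch_qemu_img_info "$demo"
cat "$demo"
```

On a compute node the call would be `patch_qemu_img_info /usr/lib/python2.7/site-packages/nova/virt/images.py`. Because the file belongs to the nova package, a later package update would overwrite the change and it would need to be reapplied.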
After the fix, the log looks normal again:
2020-03-25 08:47:37.713 12662 AUDIT nova.compute.resource_tracker [-] Auditing locally available compute resources
2020-03-25 08:47:38.832 12662 AUDIT nova.compute.resource_tracker [-] Free ram (MB): 158611
2020-03-25 08:47:38.833 12662 AUDIT nova.compute.resource_tracker [-] Free disk (GB): 686
2020-03-25 08:47:38.833 12662 AUDIT nova.compute.resource_tracker [-] Free VCPUS: 0
2020-03-25 08:47:38.847 12662 INFO nova.compute.resource_tracker [-] Compute_service record updated for ns-compute-209161.vclound.com:ns-compute-209161.vclound.com
2020-03-25 08:48:39.090 12662 AUDIT nova.compute.resource_tracker [-] Auditing locally available compute resources
2020-03-25 08:48:40.218 12662 AUDIT nova.compute.resource_tracker [-] Free ram (MB): 158611
2020-03-25 08:48:40.218 12662 AUDIT nova.compute.resource_tracker [-] Free disk (GB): 686
2020-03-25 08:48:40.218 12662 AUDIT nova.compute.resource_tracker [-] Free VCPUS: 0
2020-03-25 08:48:40.233 12662 INFO nova.compute.resource_tracker [-] Compute_service record updated for ns-compute-209161.vclound.com:ns-compute-209161.vclound.com
bug 2
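nova computes the disk capacity every minute and reports it to nova-server (see the log above).
As a result, within a few days the /var/log/libvirt/qemu directory accumulates hundreds of thousands of log files.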
find /var/log/libvirt/qemu/ -type f | wc -l
804179
The files are named as follows
guestfs-4somethzjjbyyobf.log-20200319 guestfs-e6nse4b28g28c99k.log-20200319 guestfs-nm5tn2146lvxrvyv.log-20200319
guestfs-4sor5uzz7ejc22cr.log-20200319 guestfs-e6nu7925mppk5nqz.log-20200319 guestfs-nm5tpf02xcsdtlu6.log-20200319
guestfs-4sosb1q83b86sw4m.log-20200319 guestfs-e6nuwn6s3y6ar9x3.log-20200319 guestfs-nm5ueib3fz7tetza.log-20200319
guestfs-4sosdak74auigjzo.log-20200319 guestfs-e6nxkp19e3m4mtma.log-20200319 guestfs-nm60gls4vzqjm3sy.log-20200319
guestfs-4soumttutrl2w4qq.log-20200319 guestfs-e6o3hud0hckxn8yo.log-20200319 guestfs-nm615lc34wy18gb0.log-20200319
guestfs-4sow6wiwhvexf60i.log-20200319 guestfs-e6o6iobmx2dvi2za.log-20200319 guestfs-nm66sj3binblobcm.log-20200319
guestfs-4sowhzx0ofcrleqg.log-20200319 guestfs-e6oh14ngcggp242j.log-20200319 guestfs-nm6bg7skiiybfmog.log-20200319
guestfs-4soxim7pgs6q2w7k.log-20200319 guestfs-e6oi5pyou0ry62ua.log-20200319 guestfs-nm6fshqa4602dw2j.log-20200319
guestfs-4sp29nf8fnh6p69g.log-20200319 guestfs-e6oiuk5ylrlm43w6.log-20200319 guestfs-nm6hhallad9iz6sg.log-20200319
guestfs-4sp32i69kmnc3yw4.log-20200319 guestfs-e6oizn6bl9p9h1m0.log-20200319 guestfs-nm6i7jgett1y4igk.log-20200319
Deleting the files directly with rm fails with the following error
/usr/bin/rm: arg list too long
This happens because there are too many files for a single command line. The workaround:
find /var/log/libvirt/qemu/ -type f -exec rm -r {} \;
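The command above removes every regular file under the directory, one `rm` process per file. A narrower variant, sketched below, deletes only the `guestfs-*` files and lets `find` do the removal itself. `purge_guestfs_logs` is a hypothetical helper name, and `instance-00000001.log` is an illustrative file name standing in for a log that should be kept.

```shell
# purge_guestfs_logs DIR: delete only the guestfs-* files directly inside DIR.
# Hypothetical helper for illustration only.
purge_guestfs_logs() {
    dir="$1"
    # -delete removes each match inside find, without spawning rm per file
    # and without building a long argument list.
    find "$dir" -maxdepth 1 -type f -name 'guestfs-*' -delete
}

# Demonstration in a scratch directory rather than /var/log/libvirt/qemu.
scratch=$(mktemp -d)
touch "$scratch/guestfs-4somethzjjbyyobf.log-20200319" "$scratch/instance-00000001.log"
purge_guestfs_logs "$scratch"
ls "$scratch"
```

On the host the call would be `purge_guestfs_logs /var/log/libvirt/qemu`.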
How to turn off this logging. The effective settings in libvirtd.conf should look like this:
grep -v "#" /etc/libvirt/libvirtd.conf | grep -v ^$
log_level = 4
log_filters="4:qemu 4:libvirt 4:object 4:json 4:event 4:util"
log_outputs="4:syslog:libvirtd"
audit_logging = 0
Applying the settings with sed
sed -i /log_level/a\ log_level\ =\ 4 /etc/libvirt/libvirtd.conf
sed -i /log_filters=/a\ 'log_filters="4:qemu 4:libvirt 4:object 4:json 4:event 4:util"' /etc/libvirt/libvirtd.conf
sed -i /log_outputs=/a\ 'log_outputs="4:syslog:libvirtd"' /etc/libvirt/libvirtd.conf
sed -i /audit_logging/a\ "audit_logging=0" /etc/libvirt/libvirtd.conf
Restarting libvirtd then resolves the problem.
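Note that sed's `a` command appends a new line after every line that matches, so running those commands twice leaves duplicate entries. A replace-or-append approach avoids that. The sketch below is one possible way to do it; `set_conf` is a hypothetical helper name, and the demonstration works on a scratch file instead of /etc/libvirt/libvirtd.conf.

```shell
# set_conf KEY VALUE FILE: make sure FILE has exactly one active "KEY = VALUE" line.
# Hypothetical helper for illustration only.
set_conf() {
    key="$1"; value="$2"; file="$3"
    if grep -q "^[[:space:]]*${key}[[:space:]]*=" "$file"; then
        # An active (uncommented) line exists: rewrite it in place, keeping a .bak copy.
        sed -i.bak "s|^[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$file"
    else
        # No active line yet: append one and leave commented defaults untouched.
        printf '%s = %s\n' "$key" "$value" >> "$file"
    fi
}

# Demonstration on a scratch file that starts with a commented default.
conf=$(mktemp)
printf '#log_level = 3\n' > "$conf"
set_conf log_level 4 "$conf"
set_conf log_level 4 "$conf"   # second run changes nothing
set_conf log_outputs '"4:syslog:libvirtd"' "$conf"
cat "$conf"
```

The helper uses `|` as the sed delimiter, so values containing `|` or `&` would need escaping first; none of the four settings above do.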
bug 3
If libvirt fails to start and reports the following error
Apr 2 18:01:41 myhostname journal: Failed to probe capabilities for /usr/libexec/qemu-kvm: internal error: Failed to probe QEMU binary with QMP: /usr/libexec/qemu-kvm: relocation error: /lib64/libspice-server.so.1: symbol SSL_CONF_CTX_set_ssl_ctx, version libssl.so.10 not defined in file libssl.so.10 with link time reference
Apr 2 18:01:41 myhostname journal: internal error: Failed to probe QEMU binary with QMP: /usr/libexec/qemu-kvm: relocation error: /lib64/libspice-server.so.1: symbol SSL_CONF_CTX_set_ssl_ctx, version libssl.so.10 not defined in file libssl.so.10 with link time reference
Upgrade openssl, then restart libvirtd.