对于一些刚接触OpenStack的新人,辛苦两天终于把OpenStack部署好了,建立实例时却失败了。这是一件很郁闷的事情。
大家试想一下,如果你准备启动一台物理服务器,如果服务器的CPU,内存,存储发生错误,在开机自检阶段就过不了,即意味着服务器挂掉,无法启动了。
同样,生成VM时常见的报错 " No valid host was found. There are not enough hosts available "也基本是CPU,内存,存储出错导致的。创建VM时错误地使用了external类型的网络,也会产生这个报错。
1. CPU虚拟化参数配置错误
查看nova-compute 日志:
couldn't obtain the vcpu count fromdomain id: 769f95ac-d8da-41be-8e29-f326f03a762f, exception: Requested operationis not valid: cpu affinity is not supported
**分析:**日志出现绑定CPU失败的错误,立刻想到和CPU虚拟化相关。/etc/nova/nova.conf
中的virt_type参数设置得不对
**处理:**修改compute节点的配置文件/etc/nova/nova.conf
如果compute节点是物理机或开启嵌套虚拟化(CPU硬件加速)的虚拟机: virt_type=kvm
如果compute节点是未开启嵌套虚拟化的虚拟机:virt_type=qemu
2. 内存不足导致报错
用一个规格比较高的flavor创建实例:
nova-conductor.log 报错:
qemu-kvm:cannot set up guest memory 'pc.ram': Cannot allocate memory\n"]
nova-scheduler.log :
Filter RetryFilter returned 0 hosts
**分析:**日志已经给出原因:无法分配内存,即内存空间不足。特别是RetryFilter没有筛选出可以提供符合flavor中资源数量的host,这时应该去确认host中的资源是否不够用了。
**处理:**增加计算节点内存
3. 存储相关的原因导致报错
-
用自制的Ubuntu镜像创建实例时,没有配置合适的volume大小
nova日志报错:Image 1c63d80b-dc44-4785-bf20-4cdb47d7b2c6 is unacceptable: Imagevirtual size is 18GB and doesn't fit in a volume of size 1GB.
**分析和处理:**镜像中的文件系统是18GB,上图红圈中的参数必须大于等于18GB
-
Block Device Mapping is Invalid 也是很常见错误
**分析和处理:**通常是由于cinder或ceph等backend配置错误引起的块存储设备报错。这时就需要查看cinder和ceph的服务状态是否正常,这里有几个常用命令:
ceph -s
vgs
cinder service-list
cinder list
lsblk
进一步排查原因的话还需要分析日志。 -
权限不足(导致没有可用的存储)
查看nova-compute 日志:InvalidDiskInfo:Disk info file is invalid: qemu-img failed to execute on/var/lib/libvirt/images/centos7.0.qcow2 : Unexpected error while runningcommand. 2017-01-1911:32:27.736 25979 ERROR nova.compute.manager Command: /usr/bin/python2 -moslo_concurrency.prlimit --as=1073741824 --cpu=2 -- env LC_ALL=C info /var/lib/libvirt/images/centos7.0.qcow2 2017-01-1911:32:27.736 25979 ERROR nova.compute.manager Exit code: 1 2017-01-1911:32:27.736 25979 ERROR nova.compute.manager Stdout: u'' 2017-01-1911:32:27.736 25979 ERROR nova.compute.manager Stderr: u"qemu-img: Couldnot open '/var/lib/libvirt/images/centos7.0.qcow2': Could not open '/var/lib/libvirt/images/centos7.0.qcow2':Permission denied\n" InvalidDiskInfo:Disk info file is invalid: qemu-img failed to execute on/var/lib/libvirt/images/centos7.0.qcow2 : Unexpected error while runningcommand. 2017-01-1911:32:27.736 25979 ERROR nova.compute.manager Command: /usr/bin/python2 -moslo_concurrency.prlimit --as=1073741824 --cpu=2 -- env LC_ALL=C info /var/lib/libvirt/images/centos7.0.qcow2 2017-01-1911:32:27.736 25979 ERROR nova.compute.manager Exit code: 1 2017-01-1911:32:27.736 25979 ERROR nova.compute.manager Stdout: u'' 2017-01-1911:32:27.736 25979 ERROR nova.compute.manager Stderr: u"qemu-img: Couldnot open '/var/lib/libvirt/images/centos7.0.qcow2': Could not open '/var/lib/libvirt/images/centos7.0.qcow2':Permission denied\n"
分析:/var/lib/libvirt/images/centos7.0.qcow2是我之前做实验用virt-manager在host OS创建的虚拟机,也就是说这个虚拟机镜像被virt-manager所管理,导致openstack没有权限使用启动这个虚拟机(即没有可用的存储)。
**处理:**在host os 上,卸载virt-manager
4. 创建虚拟机时的错误操作
外部网络(Provider / external / public network)一般不能直接用来创建VM(除非该外部网络同时是shared network)
如果创建VM时直接使用了外部网络,则报no valid host was found 错误。查看拓扑图会发现这台VM没有链路。
controller节点 nova-conductor日志:
==> nova-conductor.log <==
2017-10-16 11:02:23.153 2090 ERROR nova.scheduler.utils [req-370255d8-f462-4090-9203-8db574e8f589 0817b805f51e4877a383dd401a318bee b0d993d7027e457189f70bae70f870a9 - - -] [instance: 9b2e0b9e-6d16-4576-b289-075761e3d449] Error from last host: compute (node compute): [u'Traceback (most recent call last):\n', u' File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1905, in _do_build_and_run_instance\n filter_properties)\n', u' File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2058, in _build_and_run_instance\n instance_uuid=instance.uuid, reason=six.text_type(e))\n', u'RescheduledException: Build of instance 9b2e0b9e-6d16-4576-b289-075761e3d449 was re-scheduled: Binding failed for port 41b8cdb2-3b1d-4661-bb47-f567c57bda5f, please check neutron logs for more information.\n']
2017-10-16 11:02:23.226 2090 WARNING nova.scheduler.utils [req-370255d8-f462-4090-9203-8db574e8f589 0817b805f51e4877a383dd401a318bee b0d993d7027e457189f70bae70f870a9 - - -] Failed to compute_task_build_instances: No valid host was found. There are not enough hosts available.
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 142, in inner
return func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/nova/scheduler/manager.py", line 84, in select_destinations
filter_properties)
File "/usr/lib/python2.7/site-packages/nova/scheduler/filter_scheduler.py", line 90, in select_destinations
raise exception.NoValidHost(reason=reason)
NoValidHost: No valid host was found. There are not enough hosts available.
2017-10-16 11:02:23.227 2090 WARNING nova.scheduler.utils [req-370255d8-f462-4090-9203-8db574e8f589 0817b805f51e4877a383dd401a318bee b0d993d7027e457189f70bae70f870a9 - - -] [instance: 9b2e0b9e-6d16-4576-b289-075761e3d449] Setting instance to ERROR state.
controller节点 neutron-server日志:
2017-10-16 11:02:22.260 2189 ERROR neutron.plugins.ml2.managers [req-cdcb2a89-98f5-45e7-827f-812a01dc4dd6 0ebb9d3a4d52454ca74d1c45a382795a ed964295d9ca4f878fe4c25478aaeca0 - - -] Failed to bind port 41b8cdb2-3b1d-4661-bb47-f567c57bda5f on host compute
compute节点nova-compute日志:
2017-10-16 11:02:23.708 1408 ERROR nova.compute.manager [-] Instance failed network setup after 1 attempt(s)
2017-10-16 11:02:23.708 1408 ERROR nova.compute.manager Traceback (most recent call last):
2017-10-16 11:02:23.708 1408 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1564, in _allocate_network_async
2017-10-16 11:02:23.708 1408 ERROR nova.compute.manager dhcp_options=dhcp_options)
2017-10-16 11:02:23.708 1408 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 744, in allocate_for_instance
2017-10-16 11:02:23.708 1408 ERROR nova.compute.manager self._delete_ports(neutron, instance, created_port_ids)
2017-10-16 11:02:23.708 1408 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 195, in __exit__
2017-10-16 11:02:24.004 1408 DEBUG nova.compute.utils [req-370255d8-f462-4090-9203-8db574e8f589 0817b805f51e4877a383dd401a318bee b0d993d7027e457189f70bae70f870a9 - - -] [instance: 9b2e0b9e-6d16-4576-b289-075761e3d449] Binding failed for port 41b8cdb2-3b1d-4661-bb47-f567c57bda5f, please check neutron logs for more information.
compute节点/var/log/neutron下的openvswitch-agent.log日志:
2017-10-16 11:02:23.138 1417 INFO neutron.agent.securitygroups_rpc [req-cdcb2a89-98f5-45e7-827f-812a01dc4dd6 0ebb9d3a4d52454ca74d1c45a382795a ed964295d9ca4f878fe4c25478aaeca0 - - -] Security group member updated [u'312e84c2-ff27-460f-802a-10eaabb3bd19']
2017-10-16 11:02:23.652 1417 INFO neutron.agent.securitygroups_rpc [req-2493c53e-7b58-47f5-a2af-5f6c49af0f5e 0ebb9d3a4d52454ca74d1c45a382795a ed964295d9ca4f878fe4c25478aaeca0 - - -] Security group member updated [u'312e84c2-ff27-460f-802a-10eaabb3bd19']
2017-10-16 11:02:24.358 1417 INFO neutron.agent.common.ovs_lib [req-1241fe20-da43-457f-9ec8-365111248723 - - - - -] Port 41b8cdb2-3b1d-4661-bb47-f567c57bda5f not present in bridge br-int
2017-10-16 11:02:24.359 1417 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-1241fe20-da43-457f-9ec8-365111248723 - - - - -] port_unbound(): net_uuid None not in local_vlan_map
2017-10-16 11:02:24.360 1417 INFO neutron.agent.securitygroups_rpc [req-1241fe20-da43-457f-9ec8-365111248723 - - - - -] Remove device filter for [u'41b8cdb2-3b1d-4661-bb47-f567c57bda5f']
**分析:**上述日志中有大量port相关报错。根本原因是上面提到的创建虚拟机时不能直接使用外部网络(即无法给VM分配port)。如果要访问外部网络,必须经过路由器中转。
**处理:**使用内部网络(私有网络/private网络)或者shared类型的外部网络创建虚拟机
网络的思考:##
通常情况下,物理机不会因为自身网络/网卡问题而无法启动,同样虚拟机也不会在启动过程中由于网络问题而出现No valid host was found 错误 。例外的场景是在Openstack中增加第三方SDN控制器后,SDN控制器的某些错误配置会引起创建VM时会报No valid host was found 报错; 以及错误的使用外部网络创建虚拟机时也会报No valid host was found错误。
5. 没有可用的服务
查看日志可以发现报错:
**分析:**日志已经给出原因:ServiceNotFound , 即未找到服务供OpenStack使用。
**处理:**有服务没有安装好,或者安装好了没启动。没启动的话就用手动用命令将其启动。
小结
查看日志的技巧——重点看scheduler 日志中的Filter字段来确定哪种资源不足