A First Look at oVirt: Usage Summary FAQ

Original link: http://blog.51cto.com/nosmoking/1698911
2016/11/15
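【Q01】How to deploy an oVirt environment quickly
A: As follows.
1. Make sure the firewalls of the hosts involved allow traffic between them on the internal network. Cache the oVirt packages in a local yum repository and configure every node to use it;
2. Configure the engine; do not choose automatic firewall configuration;
3. Manually install vdsm and vdsm-cli on the node machines;
4. Add the host on the engine web page; do not choose automatic firewall configuration.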
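
Step 1 above mentions pointing every node at a local yum repository. A minimal sketch of generating such a repo file follows. The function name, the section name ovirt-local and the mirror URL in the usage line are all hypothetical, and gpgcheck=0 assumes the local mirror is trusted.

```shell
#!/bin/sh
# Write a yum repo definition that points at a local mirror.
# $1 = output file (on a node this would live under /etc/yum.repos.d/)
# $2 = base URL of the local mirror (placeholder, not from the original post)
write_local_repo() {
    repo_file=$1
    base_url=$2
    cat >"$repo_file" <<EOF
[ovirt-local]
name=oVirt packages (local mirror)
baseurl=$base_url
enabled=1
gpgcheck=0
EOF
}
```

Example call (hypothetical URL): write_local_repo /etc/yum.repos.d/ovirt-local.repo http://repo.example.internal/ovirt/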


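【Q02】Running virsh prompts for user authentication (Please enter your authentication name). Judging from the error, this seems related to SASL being enabled once the vdsm service is configured. How can this be solved?
A: Use the tool "saslpasswd2 - set a user's sasl password" to create a user.
This is what the problem looks like: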
# virsh list
Please enter your authentication name: 
Please enter your password: 
error: Failed to reconnect to the hypervisor
error: no valid connection
error: authentication failed: Failed to step SASL negotiation: -1 (SASL(-1): generic failure: All-whitespace username.)

Create a user:
# saslpasswd2 -a libvirt mYusernAme    
Password: mYpasswOrd
Again (for verification): mYpasswOrd

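Here the -a option is followed by the appname; in this case it must be libvirt.
The reason: when the host is added to oVirt, vdsm additionally secures libvirt with SASL.

Test again: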
# virsh list
Please enter your authentication name: mYusernAme
Please enter your password: 
 Id    Name                           State
----------------------------------------------------
 1     tvm-test-template              running
 2     tvm-test-clone                 running
 3     tvm-test-clone-from-snapshot   running
 4     testpool001                    running
 5     testpool007                    running
 6     testpool006                    running
 
As expected.


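【Q03】A reboot of a VM triggered from the oVirt web UI shows the status changes in the UI, but the VM console shows that the VM never actually rebooted. Why?
A: The guest agent is not installed in the VM. On Linux it is ovirt-guest-agent.
Install ovirt-guest-agent:
First install the yum repository package ovirt-release35.rpm in the VM.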
# yum -y install http://plain.resources.ovirt.org/pub/yum-repo/ovirt-release35.rpm
# yum -y install ovirt-guest-agent
Start the service:
# service ovirt-guest-agent start
# chkconfig ovirt-guest-agent on

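【Q04】When cloning a VM, the disk stays in a waiting state for a long time and never becomes ready
A: Situation: the VM being cloned has a large (2 TB) disk attached.
Check the processes running on the host, find qemu-img, check whether it is hung, and kill it manually if so.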
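
To make the "find qemu-img" step concrete, here is a small sketch that filters a process listing for qemu-img and keeps the elapsed-time column, which helps judge whether a copy is hung or merely slow. The function name is made up for illustration; it reads ps-style lines on stdin so it can be tried without a running clone.

```shell
#!/bin/sh
# Keep only the lines whose command (third column) is qemu-img.
# Expected input format: the output of "ps -eo pid,etime,args".
list_qemu_img() {
    awk '$3 ~ /(^|\/)qemu-img$/ { print }'
}
```

On a host this would be used as: ps -eo pid,etime,args | list_qemu_img. A copy of a 2 TB disk can legitimately run for a long time, so it is worth checking disk I/O before killing the PID.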
 

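【Q05】Using the glusterfs service fails with "glusterfs: failed to get the 'volume file' from server"
A: First check the gluster versions and make them consistent. The version installed when the gluster service is enabled on a host depends on the oVirt repo and may be the latest upstream release.
By default the oVirt installation uses ovirt-3.5-dependencies.repo, which currently pulls glusterfs 3.7.
Manually install the newer upstream version on the client: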
# wget http://download.gluster.org/pub/gluster/glusterfs/3.7/LATEST/CentOS/epel-6/x86_64/glusterfs-3.7.4-2.el6.x86_64.rpm
# wget http://download.gluster.org/pub/gluster/glusterfs/3.7/LATEST/CentOS/epel-6/x86_64/glusterfs-libs-3.7.4-2.el6.x86_64.rpm
# wget http://download.gluster.org/pub/gluster/glusterfs/3.7/LATEST/CentOS/epel-6/x86_64/glusterfs-client-xlators-3.7.4-2.el6.x86_64.rpm
# wget http://download.gluster.org/pub/gluster/glusterfs/3.7/LATEST/CentOS/epel-6/x86_64/glusterfs-fuse-3.7.4-2.el6.x86_64.rpm
# rpm -ivh *.rpm
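
Since the fix hinges on every machine running the same gluster version, a quick comparison helper can be handy. This is only a sketch: the function name is hypothetical, and collecting the version strings from each machine is left to the reader.

```shell
#!/bin/sh
# Return 0 if every argument is the same version string, 1 otherwise.
versions_match() {
    first=$1
    for v in "$@"; do
        [ "$v" = "$first" ] || return 1
    done
    return 0
}
```

One way to collect the input is to run rpm -q --qf '%{VERSION}\n' glusterfs on each server and client and pass the results to versions_match.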


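【Q06】How do I configure glusterfs myself instead of letting oVirt manage it? How is the data domain mounted, and what tuning is applied?
A:
First, oVirt's optimization does the following:
---
After optimization, the volume options are changed as follows: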
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
storage.owner-gid: 36
storage.owner-uid: 36
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
auth.allow: *
user.cifs: enable
nfs.disable: off
performance.readdir-ahead: on
---


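Second, every host in the cluster must be able to resolve the gluster node names to IP addresses (not only the host selected under "New Domain" needs /etc/hosts entries or DNS A records).
Third, the firewall.

The example below is the firewall configuration after the gluster service is enabled in oVirt: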
[root@n86 network-scripts]# cat /etc/sysconfig/iptables

# oVirt default firewall configuration. Automatically generated by vdsm bootstrap script.
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
# vdsm
-A INPUT -p tcp --dport 54321 -j ACCEPT
# rpc.statd
-A INPUT -p tcp --dport 111 -j ACCEPT
-A INPUT -p udp --dport 111 -j ACCEPT
# SSH
-A INPUT -p tcp --dport 22 -j ACCEPT
# snmp
-A INPUT -p udp --dport 161 -j ACCEPT


# libvirt tls
-A INPUT -p tcp --dport 16514 -j ACCEPT

# guest consoles
-A INPUT -p tcp -m multiport --dports 5900:6923 -j ACCEPT

# migration
-A INPUT -p tcp -m multiport --dports 49152:49216 -j ACCEPT

# glusterd
-A INPUT -p tcp -m tcp --dport 24007 -j ACCEPT

# gluster swift
-A INPUT -p tcp -m tcp --dport 8080  -j ACCEPT

# portmapper
-A INPUT -p tcp -m tcp --dport 38465 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 38466 -j ACCEPT

# nfs
-A INPUT -p tcp -m tcp --dport 38467 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 2049  -j ACCEPT
-A INPUT -p tcp -m tcp --dport 38469 -j ACCEPT

# nrpe
-A INPUT -p tcp --dport 5666 -j ACCEPT

# status
-A INPUT -p tcp -m tcp --dport 39543 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 55863 -j ACCEPT

# nlockmgr
-A INPUT -p tcp -m tcp --dport 38468 -j ACCEPT
-A INPUT -p udp -m udp --dport 963   -j ACCEPT
-A INPUT -p tcp -m tcp --dport 965   -j ACCEPT

# ctdbd
-A INPUT -p tcp -m tcp --dport 4379  -j ACCEPT

# smbd
-A INPUT -p tcp -m tcp --dport 139   -j ACCEPT
-A INPUT -p tcp -m tcp --dport 445   -j ACCEPT

# Ports for gluster volume bricks (default 100 ports)
-A INPUT -p tcp -m tcp --dport 24009:24108 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 49152:49251 -j ACCEPT


# Reject any other input traffic
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -m physdev ! --physdev-is-bridged -j REJECT --reject-with icmp-host-prohibited
COMMIT
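
The name-resolution requirement mentioned before the firewall listing can be checked on each host with a short loop. This is a sketch with a made-up function name; it relies on getent, which consults /etc/hosts and DNS according to nsswitch.conf.

```shell
#!/bin/sh
# Report every name that does not resolve on this host.
# Returns 0 only if all given names resolve.
check_resolves() {
    rc=0
    for name in "$@"; do
        if ! getent hosts "$name" >/dev/null; then
            echo "cannot resolve: $name" >&2
            rc=1
        fi
    done
    return $rc
}
```

Run it on every host in the cluster with the gluster node names as arguments, for example check_resolves n86 n72 n73.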

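【Configuring the gluster service manually】
1) NIC configuration (on n86, n72 and n73)
Note: when a host is added to oVirt, a bridge named ovirtmgmt is created automatically and attached to one of the ports (em1, for example).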
em1 -> 10.50.200.0/24
em2+em3=bond1 -> br1 ->10.60.200.0/24

[root@n86 network-scripts]# cat ifcfg-em1
DEVICE=em1
TYPE=Ethernet
ONBOOT=yes
NM_CONTROLLED=no
BOOTPROTO=none
IPADDR=10.50.200.86
PREFIX=24
GATEWAY=10.50.200.1
[root@n86 network-scripts]# cat ifcfg-em2
DEVICE=em2
MASTER=bond1
SLAVE=yes
ONBOOT=yes
MTU=1500
NM_CONTROLLED=no
[root@n86 network-scripts]# cat ifcfg-em3
DEVICE=em3
MASTER=bond1
SLAVE=yes
ONBOOT=yes
MTU=1500
NM_CONTROLLED=no
[root@n86 network-scripts]# cat ifcfg-bond1 
DEVICE=bond1
BONDING_OPTS='mode=5 miimon=100'
BRIDGE=br1
ONBOOT=yes
MTU=1500
NM_CONTROLLED=no
HOTPLUG=no
[root@n86 network-scripts]# cat ifcfg-br1 
DEVICE=br1
TYPE=Bridge
DELAY=0
STP=off
ONBOOT=yes
IPADDR=10.60.200.86
NETMASK=255.255.255.0
BOOTPROTO=none
MTU=1500
DEFROUTE=yes
NM_CONTROLLED=no
HOTPLUG=no


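2) Storage configuration - glusterfs cluster: n86, n72, n73 (the example provides a 3-replica volume as the data domain)
【Partitioning the data disk】
If the target device is already mounted, unmount it first and remove the existing filesystem.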
yum install lvm2 xfsprogs -y   
pvcreate /dev/sdb
vgcreate vg0 /dev/sdb 
lvcreate -l 100%FREE -n lv01 vg0
mkfs.xfs -f -i size=512 /dev/vg0/lv01 
mkdir /data
cat <<_EOF >>/etc/fstab
UUID=$(blkid /dev/vg0/lv01 |cut -d'"' -f2) /data                   xfs     defaults        0 0
_EOF

mount -a
# df -h |grep data
/dev/mapper/vg0-lv01  16T   33M  16T   1% /data


【Configuring the service】

[root@n86 ~]# yum install glusterfs-server
[root@n86 ~]# service glusterd start
[root@n86 ~]# chkconfig glusterd on

【Configuring the cluster】
[root@n86 ~]# gluster peer probe 10.60.200.72
[root@n86 ~]# gluster peer probe 10.60.200.73
Create the brick directory on every cluster node:
[root@n86 ~]# mkdir /data/gv1/brick1 -p

【Providing the data domain】
Create volume gv1 as the master data domain:
[root@n86 ~]# gluster volume create gv1 replica 3 transport tcp \
10.60.200.86:/data/gv1/brick1 \
10.60.200.72:/data/gv1/brick1 \
10.60.200.73:/data/gv1/brick1 

【Start】
[root@n86 ~]# gluster volume start gv1

【Check the current state】
[root@n86 ~]# gluster volume info
 
Volume Name: gv1
Type: Replicate
Volume ID: 32b1866c-1743-4dd9-9429-6ecfdfa168a2
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.60.200.86:/data/gv1/brick1
Brick2: 10.60.200.72:/data/gv1/brick1
Brick3: 10.60.200.73:/data/gv1/brick1


--- Configure the volume, using gv1 as the example:
gluster volume set gv1 diagnostics.count-fop-hits on
gluster volume set gv1 diagnostics.latency-measurement on
gluster volume set gv1 storage.owner-gid 36
gluster volume set gv1 storage.owner-uid 36 
gluster volume set gv1 cluster.server-quorum-type server
gluster volume set gv1 cluster.quorum-type auto
gluster volume set gv1 network.remote-dio enable
gluster volume set gv1 cluster.eager-lock enable
gluster volume set gv1 performance.stat-prefetch off
gluster volume set gv1 performance.io-cache off
gluster volume set gv1 performance.read-ahead off
gluster volume set gv1 performance.quick-read off
gluster volume set gv1 auth.allow \*
gluster volume set gv1 user.cifs enable
gluster volume set gv1 nfs.disable off
--- End of volume configuration
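
When several volumes need the same settings, the option list above can be kept in one place and expanded per volume. The sketch below only prints the commands (a dry run) so they can be reviewed first; the function name is hypothetical, and the option values are copied from the list above.

```shell
#!/bin/sh
# Print one "gluster volume set" command per option for the given volume.
# Nothing is executed; pipe the output to sh once it has been reviewed.
print_volume_set_cmds() {
    vol=$1
    while read -r key value; do
        [ -n "$key" ] || continue
        printf 'gluster volume set %s %s %s\n' "$vol" "$key" "$value"
    done <<'EOF'
diagnostics.count-fop-hits on
diagnostics.latency-measurement on
storage.owner-gid 36
storage.owner-uid 36
cluster.server-quorum-type server
cluster.quorum-type auto
network.remote-dio enable
cluster.eager-lock enable
performance.stat-prefetch off
performance.io-cache off
performance.read-ahead off
performance.quick-read off
auth.allow '*'
user.cifs enable
nfs.disable off
EOF
}
```

The asterisk for auth.allow is kept in single quotes so that the printed command stays safe when it is later passed to a shell.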

Mount volume gv1 on one node to test:
[root@n93 ~]# mount -t glusterfs 10.60.200.86:/gv1 /mnt
[root@n93 ~]# df -h /mnt
Filesystem          Size  Used Avail Use% Mounted on
10.60.200.86:/gv1   16T   39M   16T   1% /mnt



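3) Configuring storage (Storage)
【Data domain】
On the oVirt configuration page:
"New Domain"
Name: data-gv1
Domain function: DATA/GlusterFS
Host to use: pick any one
Path: 10.50.200.72:/gv1
Mount options: backupvolfile-server=10.50.200.73,backupvolfile-server=10.50.200.86

On the oVirt configuration page:
"New Domain"
Name: data-gv1-bak
Domain function: DATA/NFS
Host to use: pick any one
Path: 10.60.200.93:/data/ovirt/data

【ISO domain】
On the oVirt configuration page:
"New Domain"
Name: iso
Domain function: ISO/NFS
Host to use: pick any one
Path: 10.60.200.93:/data/ovirt/iso

【Export domain】
On the oVirt configuration page:
"New Domain"
Name: export
Domain function: EXPORT/NFS
Host to use: pick any one
Path: 10.60.200.93:/data/ovirt/export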




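【Q07】The error "Error while executing action Add Storage Connection: Problem while trying to mount target" appears
A: When filling in "Path", make sure there is no trailing space; otherwise the operation fails. The cause can be found in /var/log/vdsm.log on the host doing the mount. For example, the log shows:
Storage.StorageServer.MountConnection::(connect) Mount failed: (32, ';mount.nfs: access denied by server while mounting 10.50.200.93:/data/ovirt/iso \n')
【Wrong】
Path: 10.50.200.93:/data/ovirt/iso [a space follows iso]
【Correct】
Path: 10.50.200.93:/data/ovirt/iso[no space after iso]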
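
If storage paths are pasted from elsewhere or produced by a script, stray whitespace can be stripped before they reach the form or the API. A minimal sketch, with a hypothetical function name:

```shell
#!/bin/sh
# Strip leading and trailing whitespace from a storage path.
trim_path() {
    printf '%s' "$1" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}
```

For example, trim_path '10.50.200.93:/data/ovirt/iso ' yields the path without the trailing space that caused the mount failure above.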


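Assuming the ISO domain is already attached and OS images need to be added, here is a small trick:
Look at the path on the NFS server that holds the ISO domain (10.50.200.93):
# pwd
/data/ovirt/iso/62a1b5e0-730f-47db-8057-3ed0fda7b83a/images/11111111-1111-1111-1111-111111111111
We can cd directly into this directory, upload the OS image files here, and fix the ownership:
# chown -R 36:36 . 
Back in the web UI, check the images of the ISO domain.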


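【Q08】ovirt-hosted-engine-setup reports all kinds of errors during configuration. How can they be solved?
A:
1) Firewall related
If services such as DNS run on the same physical machine, the firewall configuration is rewritten when the VM is configured and when the host is added to the cluster, which breaks DNS resolution.
################## When configuring the VM ##################
【(1) Continue setup - VM installation is complete】
The firewall configuration has been updated to allow only ssh and vnc: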

[root@n93 ~]# cat /etc/sysconfig/iptables
# Generated by ovirt-hosted-engine-setup installer
#filtering rules
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -i lo -j ACCEPT
-A INPUT -p icmp -m icmp --icmp-type any -j ACCEPT
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 5900 -j ACCEPT
-A INPUT -p udp -m state --state NEW -m udp --dport 5900 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 5901 -j ACCEPT
-A INPUT -p udp -m state --state NEW -m udp --dport 5901 -j ACCEPT

#drop all rule
-A INPUT -j REJECT --reject-with icmp-host-prohibited
COMMIT

################## When adding the host to the cluster ##################
【Enter the name of the cluster to which you want to add the host (Default) [Default]: 】
The firewall is updated to allow the vdsm-related services:

[root@n93 ~]# cat /etc/sysconfig/iptables

# oVirt default firewall configuration. Automatically generated by vdsm bootstrap script.
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
# vdsm
-A INPUT -p tcp --dport 54321 -j ACCEPT
# rpc.statd
-A INPUT -p tcp --dport 111 -j ACCEPT
-A INPUT -p udp --dport 111 -j ACCEPT
# SSH
-A INPUT -p tcp --dport 22 -j ACCEPT
# snmp
-A INPUT -p udp --dport 161 -j ACCEPT


# libvirt tls
-A INPUT -p tcp --dport 16514 -j ACCEPT

# guest consoles
-A INPUT -p tcp -m multiport --dports 5900:6923 -j ACCEPT

# migration
-A INPUT -p tcp -m multiport --dports 49152:49216 -j ACCEPT


# Reject any other input traffic
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -m physdev ! --physdev-is-bridged -j REJECT --reject-with icmp-host-prohibited
COMMIT

Solution: use a different host to provide the DNS and NFS services.


2) DNS related
################## When adding the host to the cluster ##################
Error:
          To continue make a selection from the options below:
          (1) Continue setup - engine installation is complete
          (2) Power off and restart the VM
          (3) Abort setup
          (4) Destroy VM and abort setup
         
          (1, 2, 3, 4)[1]:
[ INFO  ] Engine replied: DB Up!Welcome to Health Status!
          Enter the name of the cluster to which you want to add the host (Default) [Default]: 
[ ERROR ] Cannot automatically add the host to cluster Default: Host address must be a FQDN or a valid IP address 
[ ERROR ] Failed to execute stage 'Closing up': Cannot add the host to cluster Default

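A possible cause of the exception above: the host name is resolved through a DNS server rather than /etc/hosts, while we only configured hosts entries, so it cannot be resolved.
The log shows a 400 error: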
2015-09-28 15:28:59 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:215 DIALOG:SEND                 Enter the name of the cluster to which you want to add the host (Default) [Default]: 
2015-09-28 15:29:34 DEBUG otopi.plugins.ovirt_hosted_engine_setup.engine.add_host add_host._closeup:626 Adding the host to the cluster
2015-09-28 15:29:36 DEBUG otopi.plugins.ovirt_hosted_engine_setup.engine.add_host add_host._closeup:654 Cannot add the host to cluster Default
Traceback (most recent call last):
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/ovirt-hosted-engine-setup/engine/add_host.py", line 645, in _closeup
    otopicons.NetEnv.IPTABLES_ENABLE
  File "/usr/lib/python2.6/site-packages/ovirtsdk/infrastructure/brokers.py", line 13280, in add
    headers={"Expect":expect, "Correlation-Id":correlation_id}
  File "/usr/lib/python2.6/site-packages/ovirtsdk/infrastructure/proxy.py", line 88, in add
    return self.request('POST', url, body, headers)
  File "/usr/lib/python2.6/site-packages/ovirtsdk/infrastructure/proxy.py", line 118, in request
    persistent_auth=self._persistent_auth)
  File "/usr/lib/python2.6/site-packages/ovirtsdk/infrastructure/proxy.py", line 146, in __doRequest
    persistent_auth=persistent_auth
  File "/usr/lib/python2.6/site-packages/ovirtsdk/web/connection.py", line 134, in doRequest
    raise RequestError, response
RequestError: 
status: 400
reason: Bad Request
detail: Host address must be a FQDN or a valid IP address
2015-09-28 15:29:36 ERROR otopi.plugins.ovirt_hosted_engine_setup.engine.add_host add_host._closeup:662 Cannot automatically add the host to cluster Default:
Host address must be a FQDN or a valid IP address

2015-09-28 15:29:36 DEBUG otopi.context context._executeMethod:152 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/otopi/context.py", line 142, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/ovirt-hosted-engine-setup/engine/add_host.py", line 669, in _closeup
    cluster=cluster_name,
RuntimeError: Cannot add the host to cluster Default
2015-09-28 15:29:36 ERROR otopi.context context._executeMethod:161 Failed to execute stage 'Closing up': Cannot add the host to cluster Default

Guess: when the host is added to the cluster, the host name is used, and that name has to be resolvable through DNS.



################## When adding the host to the cluster ##################
Error:
[ ERROR ] The VDSM host was found in a failed state. Please check engine and bootstrap installation logs.
[ ERROR ] Unable to add hosted_engine_1 to the manager
          Please shutdown the VM allowing the system to launch it as a monitored service.
          The system will wait until the VM is down.
          

          
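Looking at the configuration steps:
Find where this string is configured: vm_hosted_e01
This setting corresponds to the "Name" field used when configuring a host in the web UI.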



【Configuration steps】
(omitted)
          Please specify an alias for the Hosted Engine image [hosted_engine]: 
          Enter the name which will be used to identify this host inside the Administrator Portal [hosted_engine_1]: 
(omitted)
[ INFO  ] Stage: Setup validation
         
          --== CONFIGURATION PREVIEW ==--
         
          Engine FQDN                        : e01.test
          Bridge name                        : ovirtmgmt
          SSH daemon port                    : 22
          Firewall manager                   : iptables
          Gateway address                    : 10.50.200.1
          Host name for web application      : hosted_engine_1
          Host ID                            : 1
          Image alias                        : hosted_engine
          Image size GB                      : 40
          Storage connection                 : 10.50.200.93:/data/ovirt/images
          Console type                       : vnc
          Memory size MB                     : 8192
          MAC address                        : 00:16:3e:7b:18:b9
          Boot type                          : cdrom
          Number of CPUs                     : 4
          ISO image (for cdrom boot)         : /data/ovirt/iso/CentOS-6.5-x86_64-bin-DVD1.iso
          CPU Type                           : model_SandyBridge
         
          Please confirm installation settings (Yes, No)[Yes]: 
(omitted)
          The VM has been started.  Install the OS and shut down or reboot it.  To continue please make a selection:
         
          (1) Continue setup - VM installation is complete
          (2) Reboot the VM and restart installation
          (3) Abort setup
          (4) Destroy VM and abort setup
         
          (1, 2, 3, 4)[1]: 
(omitted)
          The VM has been started.  Install the OS and shut down or reboot it.  To continue please make a selection:

          (1) Continue setup - VM installation is complete
          (2) Reboot the VM and restart installation
          (3) Abort setup
          (4) Destroy VM and abort setup

          (1, 2, 3, 4)[1]: 
          Waiting for VM to shut down...
[ INFO  ] Creating VM
(omitted)
          Please install and setup the engine in the VM.
          You may also be interested in installing ovirt-guest-agent-common package in the VM.
          To continue make a selection from the options below:
          (1) Continue setup - engine installation is complete
          (2) Power off and restart the VM
          (3) Abort setup
          (4) Destroy VM and abort setup
         
          (1, 2, 3, 4)[1]: 
[ INFO  ] Engine replied: DB Up!Welcome to Health Status!
          Enter the name of the cluster to which you want to add the host (Default) [Default]: 
[ INFO  ] Waiting for the host to become operational in the engine. This may take several minutes...
[ INFO  ] Still waiting for VDSM host to become operational...
[ ERROR ] The VDSM host was found in a failed state. Please check engine and bootstrap installation logs.
[ ERROR ] Unable to add hosted_engine_1 to the manager
          Please shutdown the VM allowing the system to launch it as a monitored service.
          The system will wait until the VM is down.


Check the log:
2015-09-29 05:05:47,858 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.TimeBoundPollVDSCommand] (org.ovirt.thread.pool-8-thread-3) [5dad09df] Command TimeBoundPollVDSCommand(HostName = hosted_engine_1, HostId = 54878c22-956f-4102-91c9-f9b15e467814) execution failed. Exception: VDSNetworkException: VDSGenericException: VDSNetworkException: Timeout during xml-rpc call
2015-09-29 05:05:47,860 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.TimeBoundPollVDSCommand] (org.ovirt.thread.pool-8-thread-3) [5dad09df] Timeout waiting for VDSM response. java.util.concurrent.TimeoutException
2015-09-29 05:05:47,867 ERROR [org.ovirt.engine.core.bll.InstallVdsInternalCommand] (org.ovirt.thread.pool-8-thread-3) [5dad09df] Host installation failed for host 54878c22-956f-4102-91c9-f9b15e467814, hosted_engine_1.: org.ovirt.engine.core.bll.VdsCommand$VdsInstallException: Host not reachable


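Note the message: the host is not reachable. The host "Name" we configured has to be resolvable.


Conclusion: it is recommended to set up a separate DNS server, unaffected by the setup, to provide name resolution for the cluster.
For context: the errors occurred when the DNS service ran on host A, the same host on which I then ran ovirt-hosted-engine-setup step by step.
The approach that worked was to run the DNS service on host B with everything else unchanged; the configuration then completed successfully.
I noticed that the firewall on host A is updated twice during the setup, which may have had an effect.
First: around the VM installation
Second: around the engine installation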


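【Q09】After force-removing a cluster, 3 hosts were left behind; deleting them fails with "No up server in cluster". How can this be solved?
A: The correct way is to remove the hosts while they are in a normal state, leaving one host until last. Likewise, for the problem above, first try to activate one of the hosts, then remove the other 2 hosts, and finally remove that host.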


【Q10】If the installation fails, check the log. What if some packages failed to install, with a message like: 2015-11-04 17:13:42 ERROR otopi.plugins.otopi.packagers.yumpackager yumpackager.error:97 Yum [u'4:perl-libs-5.10.1-136.el6.i686 requires perl = 4:5.10.1-136.el6']
A: Try installing vdsm and vdsm-cli manually on the node. If you confirm an error like this:
Error: Package: 4:perl-libs-5.10.1-136.el6.i686 (base)
           Requires: perl = 4:5.10.1-136.el6
           Installed: 4:perl-5.10.1-141.el6.x86_64 (@base)
               perl = 4:5.10.1-141.el6
           Available: 4:perl-5.10.1-136.el6.x86_64 (base)
               perl = 4:5.10.1-136.el6
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest
downgrade the packages:
# yum downgrade perl*


【Q11】How to change the password of the ovirt-engine administrator admin
A: Use the tool ovirt-aaa-jdbc-tool
[root@e01 ~]# ovirt-aaa-jdbc-tool user password-reset admin
Password:
updating user admin...
user updated successfully
Reference:
http://www.ovirt.org/Features/AAA_JDBC#Password_management


【Q12】How to join ovirt-engine to a domain
A: Test joining the existing office-network AD
[root@engine ~]# engine-manage-domains add --provider=ad --domain=test.org --user=ovirtmgr
Enter password:
The domain test.org has been added to the engine as an authentication source but no users from that domain have been granted permissions within the oVirt Manager.
Users from this domain can be granted permissions by editing the domain using action edit and specifying --add-permissions or from the Web administration interface logging in as admin@internal user.
oVirt Engine restart is required in order for the changes to take place (service ovirt-engine restart).
Manage Domains completed successfully

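As the message says, the engine must be restarted after adding the domain. When adding, the option "--add-permissions" can also be used to grant system permissions; this can of course be edited later as well.
[root@engine ~]# service ovirt-engine restart
List the domains: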
[root@engine ~]# engine-manage-domains list  
Domain: test.org
        User name: ovirtmgr@test.org
Manage Domains completed successfully
Edit permissions:
[root@engine ~]# engine-manage-domains edit --provider=ad --domain=test.org --user=ovirtmgr --add-permissions
Enter password:
Successfully added domain test.org. oVirt Engine restart is required in order for the changes to take place (service ovirt-engine restart).
Manage Domains completed successfully
[root@engine ~]# service ovirt-engine restart

Log in to oVirt and check this user: it has the SuperUser role, the same as admin@internal.


【Q13】Logging in to the oVirt page fails with: Cannot Login. User Account is Disabled or Locked, Please contact your system administrator.
A: Clearly the user has been locked, possibly after 3 wrong password attempts.
Unlock it:
[root@e01 ~]# ovirt-aaa-jdbc-tool user unlock admin
updating user admin...
user updated successfully

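【Q14】After host1 goes offline, the VMs on it are in the question-mark (?, unknown) state and cannot be migrated to host2. How can this be solved?
A: Right-click the offline node host1 and choose "Confirm 'Host has been Rebooted'".
Note the warning: performing this operation on a host that was not properly manually rebooted could lead to storage corruption if VMs end up running on multiple hosts.
Confirm the operation.

Result: as expected. The VMs are automatically migrated to host2.
Note: if host1 only had a network failure and was not actually rebooted or powered off, it is advisable to reboot it once before bringing it back online.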


【Q15】How to configure email alerts
A: Use the ovirt-engine-notifier service to send email notifications for selected events.
1) Configure the service
[root@engine ~]# vim /usr/share/ovirt-engine/services/ovirt-engine-notifier/ovirt-engine-notifier.conf
MAIL_SERVER=smtp.xxx.com
MAIL_PORT=465
MAIL_USER=from@xxx.com
MAIL_PASSWORD=xxxx
MAIL_SMTP_ENCRYPTION=ssl
HTML_MESSAGE_FORMAT=true
MAIL_FROM=from@xxx.com

[root@engine ~]# chkconfig ovirt-engine-notifier on
[root@engine ~]# service ovirt-engine-notifier start

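2) Configure the user
In the ovirt-engine UI select "System" - "Users"
Select the user (admin), then in the lower pane choose "Event Notifier" - "Manage Events"
Select the events to alert on and configure the mail recipient.
Restart the service: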
[root@engine ~]# service ovirt-engine-notifier restart

3) Test
Migrate a VM; after a delay of 1-3 minutes the emails arrive.
Check the log:
[root@engine ~]# tail /var/log/ovirt-engine/notifier/notifier.log
2015-12-24 10:40:57,692 INFO    [org.ovirt.engine.core.notifier.EngineMonitorService initServerUrl] Engine health servlet URL is "http://e01.test:80/ovirt-engine/services/health".
2015-12-24 10:43:28,813 INFO    [org.ovirt.engine.core.notifier.transport.smtp.Smtp idle] Send mail subject='alertMessage (e01.test), [Migration started (VM: tttttt, Source: n34.test, Destination: n33.test, User: admin@internal).]' to='admin@xxx.com'
2015-12-24 10:43:31,090 INFO    [org.ovirt.engine.core.notifier.transport.smtp.Smtp idle] Send mail subject='resolveMessage (e01.test), [Migration completed (VM: tttttt, Source: n34.test, Destination: n33.test, Duration: 1 minute 12 seconds, Total: 1 minute 12 seconds, Actual downtime: (N/A))]' to='admin@xxx.com'

Received email 1:
Subject: alertMessage (e01.test), [Migration started (VM: tttttt,Source: n34.test, Destination: n33.test, User: admin@internal).]
From: xxx
Date: 2015-12-24 (Thursday) 10:43 AM
To: xxx

Body:
Time:2015-12-24 10:41:44.999
Message:Migration started (VM: tttttt, Source: n34.test, Destination: n33.test, User: admin@internal).
Severity:NORMAL
User Name: admin@internal
VM Name: tttttt
Host Name: n34.test
Template Name: tpl-m1
Data Center Name: SZ

Received email 2:
From: xxx
Date: 2015-12-24 (Thursday) 10:43 AM
To: xxx
Subject:
resolveMessage (e01.test), [Migration completed (VM: tttttt,Source: n34.test, Destination: n33.test, Duration: 1 minute 12 seconds,Total: 1 minute 12 seconds, Actual downtime: (N/A))]
Body:
Time:2015-12-24 10:42:57.125
Message:Migration completed (VM: tttttt, Source: n34.test, Destination: n33.test, Duration: 1 minute 12 seconds, Total: 1 minute 12 seconds, Actual downtime: (N/A))
Severity:NORMAL
User Name: admin@internal
VM Name: tttttt
Host Name: n34.test
Template Name: tpl-m1
Data Center Name: SZ


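【Q16】How to upgrade
A: Follow the official documentation. The 3.5 -> 3.6 upgrade needs particular attention.
On a 3.6 engine, a newly created data center is 3.6-compatible by default; creating a 3.5 cluster in it fails with a compatibility error.
el6 systems are only supported as vdsm hosts up to 3.5. A 3.6 vdsm host requires el7, because the 3.6 vdsm rpm packages exist only in the el7 directory of the official yum repository.
http://resources.ovirt.org/pub/ovirt-3.6/rpm/el7/noarch/
See the system requirements on the official site: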
http://www.ovirt.org/Download
-----------------------------------------------------
Minimum Hardware/Software
    4 GB memory
    20 GB disk space
    
Optional Hardware
    Network storage
    
Recommended browsers
    Latest Mozilla Firefox
    Latest Google Chrome
    IE10 and above
    
Supported Manager
    Fedora 22 (3.6 only)
    CentOS Linux 6.7, 7.2
    Red Hat Enterprise Linux 6.7, 7.2
    Scientific Linux 6.7, 7.2
    
Supported Hosts
    Fedora 21, 22
    CentOS Linux 6.7 (3.5 only), 7.2
    Red Hat Enterprise Linux 6.7 (3.5 only), 7.2
    Scientific Linux 6.7 (3.5 only), 7.2
-----------------------------------------------------


【Q17】host xxx did no satisfy internal filter Memory because its swap value was illegal.
A: For details, see this thread on the mailing list:
ovirt users mailing list:
http://lists.ovirt.org/pipermail/users/2015-December/036891.html


【Q18】How to turn a system into a template
A:
----------------------- centos6 ------------------------------
1) Configure the epel repo
mv /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.backup
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-6.repo
wget -O /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-6.repo
yum makecache

2) Configure ovirt-guest-agent
yum -y install http://resources.ovirt.org/pub/yum-repo/ovirt-release36.rpm
yum -y install ovirt-guest-agent
service ovirt-guest-agent start
chkconfig ovirt-guest-agent on

After rebooting the VM, check the result: as expected.


3) Configure cloud-init
yum -y install cloud-init
echo 'datasource_list: ["NoCloud", "ConfigDrive"]' >>/etc/cloud/cloud.cfg

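Verify after shutting down the VM: as expected.

4) Manually clean up configuration that could cause conflicts when VMs are created from the template
--- Clean up cloud-init ---
rm /var/lib/cloud -fr

--- Clean up the hostname ---
cat <<'_EOF' >/etc/sysconfig/network
NETWORKING=yes
HOSTNAME=localhost.localdomain
_EOF

--- Clean up NIC settings ---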
sed -i -e '/UUID/d' -e '/HWADDR/d' -e '/ONBOOT/d' -e '/BOOTPROTO/d' \
-e '/IPADDR/d' -e '/NETMASK/d' -e '/GATEWAY/d' \
-e '/TYPE=Ethernet/a\ONBOOT=no\nBOOTPROTO=dhcp' /etc/sysconfig/network-scripts/ifcfg-eth*

--- Clean up ssh ---
rm -f /etc/ssh/ssh_host_*
rm /root/.ssh -fr 

--- Clean up logs ---
find /var/log -type f -delete
find /root -type f ! -name ".*" -delete


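--- Final step ---
(Note: sys-unconfig could be run here directly. Besides cleaning udev, that tool makes the system run several configuration steps on the next boot, such as password, network and time setup; see the man page for details. Since I do not want to reset the root password and the other services, I finish with the steps below instead.)
--- Clean up udev and history ---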
rm /etc/udev/rules.d/*-persistent-*.rules -f
echo >~/.bash_history
history -c

--- Power off ---
# poweroff

----------------------- centos7 ------------------------------
1) Configure the epel repo
mv /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.backup
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
wget -O /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo
yum makecache

2) Install ovirt-guest-agent
yum -y install ovirt-guest-agent
systemctl start ovirt-guest-agent.service 
systemctl enable ovirt-guest-agent.service

3) Install cloud-init
yum -y install cloud-init
echo 'datasource_list: ["NoCloud", "ConfigDrive"]' >>/etc/cloud/cloud.cfg

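4) Manually clean up configuration that could cause conflicts when VMs are created from the template
--- Clean up cloud-init ---
rm /var/lib/cloud -fr

--- Clean up the hostname ---
cat <<'_EOF' >/etc/hostname 
localhost.localdomain
_EOF

--- Clean up NIC settings ---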
sed -i -e '/UUID/d' -e '/ONBOOT/d' -e '/BOOTPROTO/d' -e '/IPADDR/d' -e '/NETMASK/d' -e '/GATEWAY/d' \
-e '/TYPE=Ethernet/a\ONBOOT=no\nBOOTPROTO=dhcp' /etc/sysconfig/network-scripts/ifcfg-eth*

--- Clean up ssh ---
rm -f /etc/ssh/ssh_host_* /root/.ssh/*

--- Clean up logs ---
rm -f /root/anaconda-ks.cfg
find /var/log -type f -delete


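--- Final step ---
(Note: sys-unconfig could be run here directly. Besides cleaning udev, that tool makes the system run several configuration steps on the next boot, such as password, network and time setup; see the man page for details. Since I do not want to reset the root password and the other services, I finish with the steps below instead.)
--- Clean up udev and history ---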
rm /etc/udev/rules.d/*-persistent-*.rules -f
echo >~/.bash_history
history -c

--- Power off ---
# poweroff
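
Before running the NIC cleanup from the centos6 steps on a real template, it can be tried on a scratch copy of an ifcfg file to see what survives. The sketch below wraps the same sed expression in a function (the function name is hypothetical). It assumes GNU sed, as shipped with CentOS, because the one-line a\ form and \n in the appended text are GNU extensions.

```shell
#!/bin/sh
# Apply the Q18 interface cleanup to a single ifcfg-style file in place.
# Identity and addressing lines are deleted, then ONBOOT=no and
# BOOTPROTO=dhcp are appended right after the TYPE=Ethernet line.
clean_ifcfg() {
    sed -i -e '/UUID/d' -e '/HWADDR/d' -e '/ONBOOT/d' -e '/BOOTPROTO/d' \
        -e '/IPADDR/d' -e '/NETMASK/d' -e '/GATEWAY/d' \
        -e '/TYPE=Ethernet/a\ONBOOT=no\nBOOTPROTO=dhcp' "$1"
}
```

One caveat: the two new lines are only added when the file contains the literal text TYPE=Ethernet. A file that writes the value in quotes (TYPE="Ethernet") would lose ONBOOT and BOOTPROTO without getting replacements, so check the result before sealing the template.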

【Q19】Heartbeat timeout alerts: Heartbeat exeeded
A:
Event records on the engine page:
2016-1-6 11:31:33 AM  Host n33.test power management was verified successfully.
2016-1-6 11:31:33 AM  Status of host n33.test was set to Up.
2016-1-6 11:31:33 AM  Executing power management status on Host n33.test using Proxy Host n34.test and Fence Agent ipmilan:10.50.200.43.
2016-1-6 11:31:30 AM  Invalid status on Data Center SZ. Setting Data Center status to Non Responsive (On host n33.test, Error: Network error during communication with the Host.).
2016-1-6 11:31:30 AM  Host n33.test is not responding. It will stay in Connecting state for a grace period of 80 seconds and after that an attempt to fence the host will be issued.
2016-1-6 11:31:30 AM  VDSM n33.test command failed: Heartbeat exeeded

The entries in engine.log consistently show the same thing: the heartbeat timed out when the engine checked the node.
WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] (org.ovirt.thread.pool-8-thread-27) [] Host 'n33.test' is not responding. It will stay in Connecting state for a grace period of 80 seconds and after that an attempt to fence the host will be issued.
ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.ListVDSCommand] (DefaultQuartzScheduler_Worker-25) [] Command 'ListVDSCommand(HostName = n33.test, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='999be037-0298-4506-afb6-665b6f00db2e', vds='Host[n33.test,999be037-0298-4506-afb6-665b6f00db2e]'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exeeded

Try adjusting the heartbeat timeout interval:
[root@e01 ~]# engine-config -s vdsHeartbeatInSeconds=20 
[root@e01 ~]# service ovirt-engine restart 
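That did not solve it; the alerts kept coming.
Then I noticed that the clocks of the engine and the node differed by 3-5 minutes. Checking the ntp server, a manual sync failed, so the ntp server was judged to be faulty.
----
ntpdate xxx
Error: no server suitable for synchronization found
Conclusion: the ntp server could not get valid responses from any of the public ntp servers.
The following definition makes the NTP server synchronize with itself: if none of the servers defined in /etc/ntp.conf is available, the local clock is served to the ntp clients.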

server 127.127.1.0
fudge 127.127.1.0 stratum 8 
----
After adjusting the ntp server configuration and syncing the time, the problem was solved.
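
Because the root cause here was clock drift between the engine and a node, a simple skew check makes a useful first diagnostic next time. A minimal sketch: the function name and the 5-second threshold in the usage line are assumptions, not values taken from oVirt.

```shell
#!/bin/sh
# Return 0 if two epoch timestamps differ by at most max_skew seconds.
clock_skew_ok() {
    local_ts=$1
    remote_ts=$2
    max_skew=$3
    diff=$((local_ts - remote_ts))
    # Take the absolute value of the difference.
    if [ "$diff" -lt 0 ]; then
        diff=$((0 - diff))
    fi
    [ "$diff" -le "$max_skew" ]
}
```

On the engine it could be called as clock_skew_ok "$(date +%s)" "$(ssh n33.test date +%s)" 5, which assumes ssh access from the engine to the node.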



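【Q20】In the UI, on pages such as New Storage, the input boxes cannot be typed into. What can be done?
A: Try the IE browser; all the cases seen so far were browser compatibility issues.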


【Q21】ovirt node behaves abnormally on a ××× network.
A: Analyze how the presence of the configuration file rule-ovirtmgmt affects the network.
Example:
1. Current state
[root@n33 network-scripts]# cat rule-ovirtmgmt 
# Generated by VDSM version 4.16.27-0.el6
from 10.50.200.0/24 table 3232235797
from all to 10.50.200.0/24 dev ovirtmgmt table 3232235797

The routes for table 3232235797 have to be read together with this configuration file:
[root@n33 network-scripts]# cat route-ovirtmgmt 
# Generated by VDSM version 4.16.27-0.el6
0.0.0.0/0 via 10.50.200.1 dev ovirtmgmt table 3232235797
10.50.200.0/24 via 10.50.200.21 dev ovirtmgmt table 3232235797

[root@n33 network-scripts]# ip rule
0:      from all lookup local 
32764:  from all to 10.50.200.0/24 iif ovirtmgmt lookup 3232235797 
32765:  from 10.50.200.0/24 lookup 3232235797 
32766:  from all lookup main 
32767:  from all lookup default 

2. Possible symptom: even with a manually added static route, traffic still goes through the default gateway.
3. Case analysis
---------------------------------------------------------------------------------------------------
Traffic path:
10.50.200.100/24(server) ->10.50.200.1/24(gateway)
                         ->10.50.200.254/24(*** server/172.16.17.0)    <->    172.16.17.6(client)
---------------------------------------------------------------------------------------------------
1)【Add a static route on 10.50.200.100】
[root@n33 network-scripts]# ip route add 172.16.0.0/16 via 10.50.200.254 dev ovirtmgmt
[root@n33 network-scripts]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.50.200.0     0.0.0.0         255.255.255.0   U     0      0        0 ovirtmgmt
172.16.0.0      10.50.200.254   255.255.0.0     UG    0      0        0 ovirtmgmt
169.254.0.0     0.0.0.0         255.255.0.0     U     1064   0        0 ovirtwan
169.254.0.0     0.0.0.0         255.255.0.0     U     1065   0        0 ovirtmgmt
0.0.0.0         10.50.200.1     0.0.0.0         UG    0      0        0 ovirtmgmt

2)【Start an http service on the client to test】
[on 172.16.17.6]
python -m SimpleHTTPServer 11111

3)【Test】
[on 10.50.200.100]
curl -I http://172.16.17.6:11111/`hostname`

4)【Result analysis】
Expected result:
[on 172.16.17.6]
10.50.200.100 - - [date-time] "HEAD /n33.test.com HTTP/1.1" 404 -

Actual result:
[on 172.16.17.6]
10.50.200.1 - - [date-time] "HEAD /n33.test.com HTTP/1.1" 404 -

5)【Testing a fix】
a) Delete a rule
ip rule del from 10.50.200.0/24 lookup 3232235797

b) Add a rule
ip rule add from 10.50.200.0/24 to 172.16.0.0/16 lookup main

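6)【Cause analysis】
Analyze this together with the files route-ovirtmgmt and rule-ovirtmgmt shown earlier.
In the ip rule output the first column is the priority; a smaller value is matched first. The request in this example is therefore handled as follows:
10.50.200.100 -> 172.16.17.6
Static route in the main table: forward to 10.50.200.254
Matching rule: 32765 (from 10.50.200.0/24) is evaluated before 32766 (main), so the lookup goes to table 3232235797
Route found in table 3232235797: the default route, forward to 10.50.200.1
The packet leaves through the default gateway

7)【Conclusion】
After adding a static route, also add a matching rule that forces the intended traffic to use that static route.
a) Command line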
ip route add 172.16.0.0/16 via 10.50.200.254 dev ovirtmgmt
ip rule add from 10.50.200.0/24 to 172.16.0.0/16 lookup main
b) Configuration files
echo '172.16.0.0/16 via 10.50.200.254 dev ovirtmgmt' >>route-ovirtmgmt 
echo 'from 10.50.200.0/24 to 172.16.0.0/16 table main' >>rule-ovirtmgmt

Final configuration files:
[root@n33 network-scripts]# cat route-ovirtmgmt 
# Generated by VDSM version 4.16.27-0.el6
0.0.0.0/0 via 10.50.200.1 dev ovirtmgmt table 3232235797
10.50.200.0/24 via 10.50.200.21 dev ovirtmgmt table 3232235797
172.16.0.0/16 via 10.50.200.254 dev ovirtmgmt

[root@n33 network-scripts]# cat rule-ovirtmgmt 
# Generated by VDSM version 4.16.27-0.el6
from 10.50.200.0/24 table 3232235797
from all to 10.50.200.0/24 dev ovirtmgmt table 3232235797
from 10.50.200.0/24 to 172.16.0.0/16 table main
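
The echo commands in step b) append a new line every time they are run. A small helper that appends only when the exact line is missing keeps the two files clean if the step is repeated. This is a sketch with hypothetical function names; the directory is passed in so it can be tried on a scratch directory first.

```shell
#!/bin/sh
# Append a line to a file only if that exact line is not already there.
append_once() {
    line=$1
    file=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >>"$file"
}

# Persist the static route and its matching rule from the conclusion above.
# $1 = directory holding route-ovirtmgmt and rule-ovirtmgmt
persist_static_route() {
    dir=$1
    append_once '172.16.0.0/16 via 10.50.200.254 dev ovirtmgmt' "$dir/route-ovirtmgmt"
    append_once 'from 10.50.200.0/24 to 172.16.0.0/16 table main' "$dir/rule-ovirtmgmt"
}
```

On the node the call would be persist_static_route /etc/sysconfig/network-scripts. Both files carry a "Generated by VDSM" header, so it seems possible that VDSM rewrites them when the network is reconfigured; that is an assumption worth verifying after any network change made through the engine.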
Reposted from: https://blog.51cto.com/nosmoking/1698911