[WIP] OpenStack Masakari (by quqi99)

Author: Zhang Hua. Published: 2022-06-07
Copyright: feel free to repost, but please cite the original source and author with a hyperlink and keep this copyright notice.

What is Masakari

Masakari is the OpenStack VM high-availability (HA) project. It supports recovery from three kinds of failure:

  • VM process failure: the VM's process dies, and Masakari recovers it by stopping and starting the instance through the Nova API. With process_all_instances=True every VM is handled; with False, only VMs whose metadata sets HA_Enabled. The action also depends on the VM state: e.g. for a stopped instance (masakari/engine/drivers/taskflow/instance_failure.py#StopInstanceTask), StartInstanceTask runs next, followed by a periodic ConfirmInstanceActiveTask (the recovery workflow can be customized via the entry points in masakari's setup.cfg).
  • Provisioning process failure: a virtualization-related process dies. When one of the monitored processes (libvirt, nova-compute, sshd, masakari-instancemonitor, masakari-hostmonitor, etc., defined in process_list.yaml) fails, the nova-compute service is marked down via the Nova API and the compute node is disabled so that new VMs are no longer scheduled to it (masakari/engine/drivers/taskflow/process_failure.py#DisableComputeNodeTask); ConfirmComputeNodeDisabledTask then runs periodically (configured in masakari's setup.cfg; there is also an important option named process_failure_recovery_tasks).
  • Compute host failure: the compute host itself goes down, and the VMs running on it are evacuated to other hosts (masakari/engine/drivers/taskflow/host_failure.py).
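The instance-failure decision above can be sketched roughly as follows (a simplified illustration, not Masakari's actual code; the function and parameter names are made up):

```python
# Simplified sketch of the instance-failure recovery decision (hypothetical
# helper, not Masakari's real code): only VMs that opted in via HA_Enabled
# metadata are recovered unless process_all_instances is set, and recovery
# is a linear flow of stop -> start -> confirm tasks.

def choose_recovery_tasks(metadata, process_all_instances=False):
    """Return the ordered recovery task names for a failed VM ([] = skip)."""
    ha_enabled = metadata.get("HA_Enabled", "False").lower() == "true"
    if not (process_all_instances or ha_enabled):
        return []  # VM did not opt in to HA: leave it alone
    # the taskflow in instance_failure.py stops, starts, then confirms
    return ["StopInstanceTask", "StartInstanceTask", "ConfirmInstanceActiveTask"]
```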

The Masakari architecture is as follows:
(architecture diagram omitted)
Masakari consists of a controller service and monitor services: the controller service runs on the control node, while the monitor services run on the compute nodes.

  • masakari-api: runs on the control node and provides the service API. It hands incoming API requests to masakari-engine over RPC.
  • masakari-engine: runs on the control node and processes the notifications sent by masakari-api, executing the recovery workflows asynchronously.
  • masakari-instancemonitor: runs on the compute node (part of masakari-monitor) and detects whether a VM process has died.
  • masakari-processmonitor: runs on the compute node (part of masakari-monitor) and detects whether nova-compute has died.
  • masakari-hostmonitor: runs on the compute node (part of masakari-monitor) and detects whether the compute host itself has gone down.
  • masakari-introspectiveinstancemonitor: runs on the compute node (part of masakari-monitor); when the guest has qemu-guest-agent installed, it can detect and restart failed processes or services inside the VM.
  • pacemaker-remote: runs on the compute node and works around corosync/pacemaker's 16-node limit. Some nodes may not have corosync installed; pacemaker-remote lets such nodes be managed as if they were corosync members.

Masakari depends on Pacemaker: masakari-hostmonitor periodically checks the node state reported by Pacemaker and, on failure, sends a notification to the masakari API. Pacemaker should run a STONITH resource to power off the failed node, after which Masakari moves the guest VMs from that compute node to another available physical node. Pacemaker is the resource manager officially recommended by OpenStack; the cluster infrastructure uses the membership and messaging functions provided by Corosync to detect and recover from resource-level failures in the cloud, achieving cluster high availability. Corosync provides cluster communication, mainly carrying heartbeat messages between control nodes.
Note: STONITH (Shoot-The-Other-Node-In-The-Head) exists to protect data. For example, when a compute node is detected as dead, it may be only the management network that is unreachable, while the data networks (e.g. the storage network) still carry traffic. If masakari-monitor then has the masakari API move that node's VMs to other compute nodes, the VMs still running on the old node would keep accessing the same shared storage and corrupt data. The solution is to enable STONITH in the pacemaker/corosync cluster: when the cluster detects that a node has been lost, it runs a STONITH plugin to power the compute node off. Masakari recommends powering off compute nodes via IPMI (IPMI-based STONITH). This avoids double writes to the same backend storage.

Quick test environment setup

#./generate-bundle.sh --name masakari-ha --series jammy  --num-compute 3 --ovn --masakari --run
#ERROR cannot deploy bundle: cannot resolve charm or bundle "pacemaker-remote": retry resolving with preferred channel:
latest/stable
#delete pacemaker-remote from common/charm_lists to use latest/stable channel instead of ussuri/edge for ch:pacemaker-remote
./generate-bundle.sh --name masakari --series focal --num-compute 3 --ovn --masakari --use-stable-charms --run
./tools/vault-unseal-and-authorise.sh
sudo snap install openstackclients
#sudo snap refresh openstackclients --channel=edge
openstack complete |sudo tee /etc/bash_completion.d/openstack
source /etc/bash_completion.d/openstack
openstack segment --help
source novarc
./configure
openstack compute service list -c Host -c Status -c State --service nova-compute
openstack server show focal-1 -c OS-EXT-SRV-ATTR:host
openstack network list
openstack network agent list

Note: the '--num-compute 3 --masakari' above creates a masakari HA deployment. This used to work out of the box, but now it requires filling in maas_url and related settings.

1, The hacluster charm's ./files/ocf/maas/maas_stonith_plugin.py (https://specs.openstack.org/openstack/charm-specs/specs/stein/implemented/masakari.html ; it ensures that with shared storage only one compute node keeps running, avoiding split-brain double writes that corrupt data) uses maas_url and maas_credentials to drive MAAS operations (on, off, status, gethosts, getconfignames) on the MAAS machines, i.e. the bare-metal hosts running nova-compute.

juju config masakari-hacluster maas_url="http://10.149.144.51:5240/MAAS"
juju config masakari-hacluster maas_credentials="CGt46wm9SjXkhDdhUn:pUXMEG52TFCBZWnyQQ:pD2hhEdCZk6QALmLa4jUHYnqXtPZcytq"
sudo crm configure edit st-maas
$ sudo crm configure show st-maas
primitive st-maas stonith:external/maas \
        params url="http://10.149.144.51:5240/MAAS" apikey="CGt46wm9SjXkhDdhUn:pUXMEG52TFCBZWnyQQ:pD2hhEdCZk6QALmLa4jUHYnqXtPZcytq" hostnames="juju-bc096c-masakari-ha-19 juju-bc096c-masakari-ha-8 juju-bc096c-masakari-ha-9" \
        op monitor interval=25 start-delay=25 timeout=25
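For reference, an "external" stonith plugin such as external/maas is just a script that Pacemaker invokes with an action name, passing parameters like url/apikey/hostnames through the environment. A rough, stubbed-out sketch of that interface (the real plugin's MAAS API call is omitted here):

```python
# Rough sketch of the stonith "external" plugin command interface (the real
# external/maas plugin talks to the MAAS API at env["url"] using
# env["apikey"]; those calls are stubbed out below).

import os
import sys

def handle_action(action, env):
    hostnames = env.get("hostnames", "").split()
    if action == "getconfignames":
        return "url apikey hostnames"      # parameters this plugin accepts
    if action == "gethosts":
        return "\n".join(hostnames)        # nodes this plugin can fence
    if action in ("on", "off", "reset", "status"):
        return "OK"                        # real plugin: call the MAAS API here
    raise SystemExit(1)                    # unknown action

if __name__ == "__main__":
    print(handle_action(sys.argv[1], os.environ))
```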

sudo crm_resource --cleanup
sudo crm node clearstate juju-bc096c-masakari-ha-16
sudo crm node online juju-bc096c-masakari-ha-16
sudo crm_mon -1Anrt

$ sudo crm status
Cluster Summary:
  * Stack: corosync
  * Current DC: juju-bc096c-masakari-ha-17 (version 2.1.2-ada5c3b36e2) - partition with quorum
  * Last updated: Wed Apr  9 11:06:47 2025
  * Last change:  Wed Apr  9 11:03:02 2025 by root via cibadmin on juju-bc096c-masakari-ha-16
  * 6 nodes configured
  * 9 resource instances configured
Node List:
  * Online: [ juju-bc096c-masakari-ha-16 juju-bc096c-masakari-ha-17 juju-bc096c-masakari-ha-18 ]
  * RemoteOnline: [ juju-bc096c-masakari-ha-8 juju-bc096c-masakari-ha-9 juju-bc096c-masakari-ha-19 ]
Full List of Resources:
  * Resource Group: grp_masakari_vips:
    * res_masakari_d8c7c56_vip  (ocf:heartbeat:IPaddr2):         Started juju-bc096c-masakari-ha-16
  * Clone Set: cl_res_masakari_haproxy [res_masakari_haproxy]:
    * Started: [ juju-bc096c-masakari-ha-16 juju-bc096c-masakari-ha-17 juju-bc096c-masakari-ha-18 ]
  * juju-bc096c-masakari-ha-8   (ocf:pacemaker:remote):  Started juju-bc096c-masakari-ha-17
  * juju-bc096c-masakari-ha-9   (ocf:pacemaker:remote):  Started juju-bc096c-masakari-ha-18
  * st-maas     (stonith:external/maas):         Started juju-bc096c-masakari-ha-17
  * st-null     (stonith:null):  Started juju-bc096c-masakari-ha-16
  * juju-bc096c-masakari-ha-19  (ocf:pacemaker:remote):  Started juju-bc096c-masakari-ha-16
Failed Fencing Actions:
  * reboot of juju-bc096c-masakari-ha-19 failed: delegate=, client=pacemaker-controld.34444, origin=juju-bc096c-masakari-ha-17, last-failed='1970-01-01 00:13:55Z'
  
2, The hacluster charm has a relation with the pacemaker-remote charm (a subordinate of nova-compute), so hacluster can use the pacemaker-remote relations to find the bare-metal host each nova-compute runs on, and then power that machine on or off, or decide whether resources may run on it.
  
3, So running masakari-ha appears to require a real MAAS environment; however, if you don't want to test the stonith part above, maas_url and maas_credentials can simply be set to dummy values.

Masakari CLI

https://docs.openstack.org/charm-guide/latest/admin/instance-ha.html
1, Instance evacuation recovery
openstack segment create segment1 auto COMPUTE

  • reserved_host: the segment has a dedicated reserve host. When a host goes down it is marked on_maintenance, and its VMs are moved to the reserve host (whose reserved flag is then cleared; it can be restored with: openstack segment host update --reserved= )
  • auto: the segment has no dedicated reserve host. When a host goes down it is marked on_maintenance, and its VMs are moved to other hosts.
  • auto_priority: tries the 'auto' recovery method first; if that fails, it falls back to the 'reserved_host' recovery method.
  • rh_priority: the exact reverse of 'auto_priority'.
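The fallback ordering of these four methods can be summarized with a small sketch (illustrative only; the strings mirror the method names above):

```python
# Sketch of how the four segment recovery methods order the two underlying
# strategies: "auto" (evacuate to any available host) and "reserved_host"
# (evacuate to the segment's dedicated reserve host).

def recovery_attempt_order(method):
    order = {
        "auto": ["auto"],
        "reserved_host": ["reserved_host"],
        "auto_priority": ["auto", "reserved_host"],  # try auto, then fall back
        "rh_priority": ["reserved_host", "auto"],    # the exact reverse
    }
    return order[method]
```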

openstack segment host create compute205 COMPUTE SSH S1
openstack segment host delete
juju run --unit nova-compute/2 'sudo ip link set br-ens3 down'

When adding the failed host back:
openstack compute service set --enable nova-compute
openstack segment host update --on_maintenance=False

2, Instance restart
openstack server set --property HA_Enabled=True focal-1
juju run --unit nova-compute/2 'pgrep -f guest=instance-00000001'
juju run --unit nova-compute/2 'sudo pkill -f -9 guest=instance-00000001'
juju run --unit nova-compute/2 'pgrep -f guest=instance-00000001'
openstack server show focal-1 -c OS-EXT-SRV-ATTR:host -c OS-EXT-SRV-ATTR:instance_name

3, Host recovery (e.g. take the host's NIC down)
openstack server show VM-1 -c OS-EXT-SRV-ATTR:host -f value
fence_ipmilan -P -A password -a 10.0.5.18 -p password -l admin -u 15206 -o status

20250409 - HostRecoveryFailureException: Failed to evacuate instances

While testing masakari we saw 'masakari.exception.HostRecoveryFailureException: Failed to evacuate instances', yet the evacuation itself succeeded. Reading the code, only two exception paths in _evacuate_and_confirm (etimeout.Timeout and a bare Exception) append to failed_evacuation_instances, so the failure had to come from this error: 'ERROR oslo.service.loopingcall AttributeError: OS-EXT-SRV-ATTR:hypervisor_hostname'. The customer was using a custom policy (https://docs.openstack.org/charm-guide/latest/admin/ops-show-extended-server-attributes.html).
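The failure accounting described above can be illustrated with a stripped-down sketch (not the actual Masakari code; the nova call is replaced by a plain callable):

```python
# Stripped-down sketch of the host-failure evacuation accounting: an
# instance only lands in the failed list when evacuating/confirming it
# raises (Timeout or any other Exception), and a non-empty failed list makes
# the whole flow fail -- which is how a policy-induced AttributeError turned
# a successful evacuation into HostRecoveryFailureException.

def evacuate_all(instances, evacuate_and_confirm):
    failed = []
    for inst in instances:
        try:
            evacuate_and_confirm(inst)
        except Exception:                  # Timeout is handled the same way
            failed.append(inst)
    if failed:
        raise RuntimeError("Failed to evacuate instances: %s" % ",".join(failed))
```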
Some commands to check and confirm:

$ grep 'os_' report-juju-be1d62-0-lxd-6-00404993-2025-03-06-aipbtqf/etc/masakari/masakari.conf
os_user_domain_name = service_domain
os_project_domain_name = service_domain
os_privileged_user_name = masakari
os_privileged_user_password = *********
os_privileged_user_tenant = services
os_privileged_user_auth_url = https://keystone.infra.cloud.xxx.net:5000/v3

openstack server show 1f6408c7-b966-4bd7-b723-6fa02e597b90 -fvalue -cOS-EXT-SRV-ATTR:hypervisor_hostname
juju config nova-cloud-controller use-policyd-override
juju exec --unit nova-cloud-controller/0 "oslopolicy-policy-generator --namespace nova --output /dev/stdout | grep os-extended-server-attributes"

openstack user list --domain service_domain
openstack project list --domain service_domain
openstack role assignment list --project-domain service_domain --project services --names
#openstack role add --user neutron --user-domain service_domain --domain service_domain Admin
openstack user show --domain service_domain neutron

export OS_PASSWORD=7cdpRh4r84k293s8VBwrjqz9NmFjNL26L8TcqP6wPBGb6rXn9x7kR8zTYPGSYhVL
export OS_IDENTITY_API_VERSION=3
export OS_USER_DOMAIN_NAME=service_domain
export OS_REGION_NAME=RegionOne
export OS_AUTH_URL=https://10.149.144.52:5000/v3
export OS_PROJECT_DOMAIN_NAME=service_domain
export OS_AUTH_PROTOCOL=https
export OS_USERNAME=neutron
export OS_AUTH_TYPE=password
export OS_PROJECT_NAME=services

openstack server show 1f6408c7-b966-4bd7-b723-6fa02e597b90 -fvalue -cOS-EXT-SRV-ATTR:hypervisor_hostname
openstack --debug server show 1f6408c7-b966-4bd7-b723-6fa02e597b90 -fvalue -cOS-EXT-SRV-ATTR:hypervisor_hostname

openstack notification show 6c6ba699-1863-4f33-92d5-ee17d282ee7e --fit-width 
openstack server event list <instance-id> 

The customer then made the following change and everything worked:

os_compute_api:os-extended-server-attributes
OLD rule: project_admin_api
NEW rule: context_is_admin
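With policyd-override in place, that corresponds to an entry along these lines in the nova policy override (a sketch; the exact file layout depends on the override resource):

```yaml
# nova policy override: let any admin context read the extended attributes
"os_compute_api:os-extended-server-attributes": "rule:context_is_admin"
```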

Flow 'instance_evacuate_engine' (27003400-d18b-4aa9-a849-a7b4c57108d4) transitioned into state 'SUCCESS' from state 'RUNNING'  
_flow_receiver /usr/lib/python3/dist-packages/taskflow/listeners/logging.py:141

Reference

[1] Openstack Masakari task流程源码分析 - https://cloud.tencent.com/developer/article/1583296
[2] OpenStack高可用组件Masakari架构、原理及实战 - https://www.pcserver.cn/h-nd-28.html
[3] https://docs.openstack.org/masakari/latest/
[4] https://docs.openstack.org/charm-guide/latest/admin/instance-ha.html
