Analysis of the nova evacuate feature

The command line exposed for evacuating a failed host:

nova evacuate [--password <password>] <server> [<host>]

Parameters:

<server>  The virtual machine on the failed compute node.

<host>  Name or ID of the target compute node. If no compute node is specified, the nova-scheduler picks an available one.

--password <password>  Sets the login password of the VM after evacuation.

Part 2

Use cases for nova evacuate

nova evacuate is used when the nova-compute node hosting a virtual machine goes down: the VM can be migrated from the failed compute node to another available compute node. When the original compute node later comes back up, the left-over local copy of the evacuated VM is deleted.

Part 3

Walking through the nova evacuate code

When nova receives a user-issued nova evacuate request, the main processing flow across the nova services is:

  • 1) The nova-api service receives the request, validates the request parameters, then sends an RPC request to the nova-conductor service, handing processing over to nova-conductor.

  • 2) On receiving the RPC message, nova-conductor branches on the user-supplied parameters; if no compute node was specified, it further calls the nova-scheduler service to pick an available compute node.

  • 3) nova-conductor sends a cast-type RPC message to the selected nova-compute node, delegating the work to that compute node.

  • 4) On receiving the RPC message, the nova-compute node creates the VM.
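The four-step handoff above can be sketched as plain functions. This is a toy illustration only: all function and host names here are invented stand-ins, not Nova's real APIs.

```python
# Hypothetical sketch of the api -> conductor -> scheduler -> compute flow.

def api_receive(request):
    """Step 1: validate the user's parameters, then hand off to the conductor."""
    if request.get("server") is None:
        raise ValueError("server is required")
    return conductor_handle(request)

def conductor_handle(request):
    """Step 2: if no target host was given, ask the scheduler for one."""
    host = request.get("host")
    if host is None:
        host = scheduler_pick(exclude=request["failed_host"])
    # Step 3: in Nova this is a cast-type (fire-and-forget) RPC; here we
    # simply call the compute side directly.
    return compute_rebuild(host, request["server"])

def scheduler_pick(exclude):
    """Pick any available compute node other than the failed one."""
    available = ["node-1", "node-2", "node-3"]
    return next(h for h in available if h != exclude)

def compute_rebuild(host, server):
    """Step 4: the chosen compute node recreates the VM."""
    return {"server": server, "host": host, "status": "ACTIVE"}
```

With no host supplied, the sketch routes through the scheduler and avoids the failed node, mirroring the flow described above.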

Part 4

The nova-api service stage

Processing in the api directory of the nova-api service:

  • 1) When the user issues nova evacuate, an HTTP POST request is sent to the nova-api service; the action in the HTTP body is evacuate.

  • 2) The HTTP body is parsed to extract the host, force, password and on_shared_storage parameters.

  • 3) If the host parameter was supplied, nova-api first checks whether that host exists; if it does not, a host-not-found exception is raised, otherwise processing continues with step 4.

  • 4) If the specified host is the same as the host the VM currently resides on, an exception is raised: the target compute node must not be the VM's current host.
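The checks in steps 3) and 4) can be condensed into a small validation helper. This is a simplified sketch: KNOWN_HOSTS and the exception classes are invented stand-ins for Nova's host lookup and webob exceptions.

```python
# Hypothetical validation mirroring the host checks in nova-api.

KNOWN_HOSTS = {"node-1", "node-2"}

class HostNotFound(Exception):
    pass

class SameHostError(Exception):
    pass

def validate_evacuate(instance_host, target_host):
    """Reject unknown targets and targets equal to the VM's current host."""
    if target_host is not None and target_host not in KNOWN_HOSTS:
        raise HostNotFound("Compute host %s not found." % target_host)
    if target_host == instance_host:
        raise SameHostError("The target host can't be the same one.")
    return target_host
```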

D:\tran_code\nova_v1\nova\api\openstack\compute\evacuate.py

    def _evacuate(self, req, id, body):
        """Permit admins to evacuate a server from a failed host
        to a new one.
        """
        context = req.environ["nova.context"]
        instance = common.get_instance(self.compute_api, context, id)
        context.can(evac_policies.BASE_POLICY_NAME,
                    target={'user_id': instance.user_id,
                            'project_id': instance.project_id})
        evacuate_body = body["evacuate"]
        host = evacuate_body.get("host")
        force = None
        ..........
        if host is not None:
            try:
                self.host_api.service_get_by_compute_host(context, host)
            except (exception.ComputeHostNotFound,
                    exception.HostMappingNotFound):
                msg = _("Compute host %s not found.") % host
                raise exc.HTTPNotFound(explanation=msg)
        if instance.host == host:
            msg = _("The target host can't be the same one.")
            raise exc.HTTPBadRequest(explanation=msg)
        try:
            self.compute_api.evacuate(context, instance, host,
                                      on_shared_storage, password, force)
        except exception.InstanceUnknownCell as e:
            raise exc.HTTPNotFound(explanation=e.format_message())
        .......

Processing in the compute directory of the nova-api service:

  • 1) Get the VM's host information.

  • 2) Check whether the nova-compute service on the VM's host is up. If it is up, raise an exception (evacuation is only performed when the VM's host is down) and stop; if it is down, continue with the following steps.

  • 3) Create a Migration record to track this evacuation, so later functions can retrieve the VM's information.

  • 4) If a specific host was given, store it in the migration record's dest_compute field.

  • 5) Retrieve the VM's request_spec by its uuid. In the Train release, the VM's request_spec is stored in the spec column of the nova_api.request_specs table.

  • 6) If a target host was given but the evacuation is not forced, set the host parameter to None so that nova-scheduler picks an available compute node.

This function only really uses the instance, host, on_shared_storage, admin_password and force parameters (together with recreate=True); the remaining parameters are passed to the nova-conductor service's rebuild_instance method with their default values.

The code logic is as follows:

D:\tran_code\nova_v1\nova\compute\api.py

    def evacuate(self, context, instance, host, on_shared_storage,
                 admin_password=None, force=None):
        LOG.debug('vm evacuation scheduled', instance=instance)
        # Get the host the VM resides on.
        inst_host = instance.host
        # Look up the nova-compute service record for that host.
        service = objects.Service.get_by_compute_host(context, inst_host)
        # Evacuation only makes sense while the VM's nova-compute service is
        # down, so raise an exception if the service is up.
        if self.servicegroup_api.service_is_up(service):
            LOG.error('Instance compute service state on %s '
                      'expected to be down, but it was up.', inst_host)
            raise exception.ComputeServiceInUse(host=inst_host)
        # Set the VM's task state to REBUILDING.
        instance.task_state = task_states.REBUILDING
        instance.save(expected_task_state=[None])
        self._record_action_start(context, instance, instance_actions.EVACUATE)
        # Create a migration record: it marks the source compute node so it
        # can be found and cleaned up later. It is not passed down as an
        # argument; the migration type is 'evacuation'.
        migration = objects.Migration(context,
                                      source_compute=instance.host,
                                      source_node=instance.node,
                                      instance_uuid=instance.uuid,
                                      status='accepted',
                                      migration_type='evacuation')
        # If a target host was specified, record it in the migration record.
        if host:
            migration.dest_compute = host
        migration.create()
        compute_utils.notify_about_instance_usage(
            self.notifier, context, instance, "evacuate")
        try:
            request_spec = objects.RequestSpec.get_by_instance_uuid(
                context, instance.uuid)
        except exception.RequestSpecNotFound:
            # Some old instances can still have no RequestSpec object attached
            # to them, we need to support the old way
            request_spec = None
        # NOTE(sbauza): Force is a boolean by the new related API version
        # This branch is taken only when the evacuation is not forced but a
        # specific target host was supplied; in all other cases it is skipped.
        if force is False and host:
            nodes = objects.ComputeNodeList.get_all_by_host(context, host)
            # NOTE(sbauza): Unset the host to make sure we call the scheduler
            # Although the parameter was set, unset it here so that
            # nova-scheduler performs the placement.
            host = None
            # FIXME(sbauza): Since only Ironic driver uses more than one
            # compute per service but doesn't support evacuations,
            # let's provide the first one.
            target = nodes[0]
            if request_spec:
                destination = objects.Destination(
                    host=target.host,
                    node=target.hypervisor_hostname
                )
                request_spec.requested_destination = destination
        return self.compute_task_api.rebuild_instance(context,
                       instance=instance,
                       new_pass=admin_password,
                       injected_files=None,
                       image_ref=None,
                       orig_image_ref=None,
                       orig_sys_metadata=None,
                       bdms=None,
                       recreate=True,
                       on_shared_storage=on_shared_storage,
                       host=host,
                       request_spec=request_spec,
                       )

Part 5

The nova-conductor service stage

After nova-conductor receives the RPC request sent by nova-api, it is handled in manager.py:

1) Retrieve the VM's migration record by the VM's uuid.

2) Branch on the value of the host parameter.

3) Cases where host is set:

  • Case 1: rebuild the VM on its original host using its original image;

  • Case 2: a specific target host was given and the evacuation is forced.

In both cases the node parameter is empty.

4) Cases where host is unset:

Three cases:

  • Case 1: no target host was specified for the evacuation;

  • Case 2: a target host was specified, but the evacuation is not forced;

  • Case 3: rebuild the VM on its current host using a new image.

During scheduling, the instance's current host is excluded so that the scheduler cannot pick the same host again. Once a target is chosen, host and node are both non-empty: host is used as the routing target of the RPC message, and node is used in subsequent functions.

5) Send the RPC request to the nova-compute service.
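The case analysis in steps 3) and 4) can be summarised as a small helper that computes which value of host reaches the conductor. This is an illustrative sketch only; the function name and parameters are invented, not Nova's.

```python
# Hypothetical summary of which `host` value rebuild_instance receives.

def host_passed_to_conductor(instance_host, target_host, force, is_evacuate):
    """Return the host the conductor routes to, or None to invoke the scheduler."""
    if not is_evacuate:
        # Rebuild: the VM's own host is passed, the scheduler is skipped.
        return instance_host
    if target_host and force:
        # Forced evacuation to a named host: the scheduler is skipped.
        return target_host
    # No target host, or a target without force: host is None and
    # nova-scheduler picks a node, excluding instance_host.
    return None
```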

The code logic is as follows:

D:\tran_code\nova_v1\nova\conductor\manager.py

    def rebuild_instance(self, context, instance, orig_image_ref, image_ref,
                         injected_files, new_pass, orig_sys_metadata,
                         bdms, recreate, on_shared_storage,
                         preserve_ephemeral=False, host=None,
                         request_spec=None):
        with compute_utils.EventReporter(context, 'rebuild_server', instance.uuid):
            node = limits = None
            try:
                # Look up the VM's migration record by its uuid; an exception
                # is raised if none is found.
                migration = objects.Migration.get_by_instance_and_status(
                    context, instance.uuid, 'accepted')
            except exception.MigrationNotFoundByStatus:
                LOG.debug("No migration record for the rebuild/evacuate "
                          "request.", instance=instance)
                migration = None
            # host is set in two cases. First, the VM's own host was passed
            # in, to rebuild the VM on the host it resides on, which skips the
            # scheduler; a rebuild may use either the original image or a new
            # one. Second, a specific target host was given and the evacuation
            # is forced, which also bypasses the scheduler.
            if host:
                # We only create a new allocation on the specified host if
                # we're doing an evacuate since that is a move operation.
                if host != instance.host:
                    self._allocate_for_evacuate_dest_host(
                        context, instance, host, request_spec)
            else:
                # Rebuilding on the same host with a new image, or evacuating
                # to a specified host without force.
                # If no request_spec was given, build the image metadata from
                # the VM's image and construct a request_spec from it.
                if not request_spec:
                    filter_properties = {'ignore_hosts': [instance.host]}
                    # build_request_spec expects a primitive image dict
                    image_meta = nova_object.obj_to_primitive(
                        instance.image_meta)
                    request_spec = scheduler_utils.build_request_spec(
                            context, image_meta, [instance])
                    request_spec = objects.RequestSpec.from_primitives(
                        context, request_spec, filter_properties)
                elif recreate:
                    # Evacuation takes this branch: the source host is added
                    # to the RequestSpec so the scheduler will exclude it.
                    # NOTE(sbauza): Augment the RequestSpec object by excluding
                    # the source host for avoiding the scheduler to pick it
                    request_spec.ignore_hosts = [instance.host]
                    # NOTE(sbauza): Force_hosts/nodes needs to be reset
                    # if we want to make sure that the next destination
                    # is not forced to be the original host
                    request_spec.reset_forced_destinations()
                try:
                    request_spec.ensure_project_id(instance)
                    # The nova-scheduler service picks an available compute
                    # node based on the request_spec.
                    hosts = self._schedule_instances(context, request_spec,
                                                     [instance.uuid])
                    host_dict = hosts.pop(0)
                    host, node, limits = (host_dict['host'],
                                          host_dict['nodename'],
                                          host_dict['limits'])
.......
            compute_utils.notify_about_instance_usage(
                self.notifier, context, instance, "rebuild.scheduled")
            instance.availability_zone = (
                availability_zones.get_host_availability_zone(
                    context, host))
            # Note that the migration object is passed along here.
            self.compute_rpcapi.rebuild_instance(context,
                    instance=instance,
                    new_pass=new_pass,
                    injected_files=injected_files,
                    image_ref=image_ref,
                    orig_image_ref=orig_image_ref,
                    orig_sys_metadata=orig_sys_metadata,
                    bdms=bdms,
                    recreate=recreate,
                    on_shared_storage=on_shared_storage,
                    preserve_ephemeral=preserve_ephemeral,
                    migration=migration,
                    host=host, node=node, limits=limits)
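The ignore_hosts behaviour set up above can be sketched as a trivial filter: the request spec carries the source host, and the scheduler skips any candidate matching it. This is a simplified stand-in, not Nova's scheduler.

```python
# Hypothetical sketch of scheduling with an ignore_hosts list.

def schedule(candidates, ignore_hosts):
    """Return the first (host, node) candidate not in ignore_hosts."""
    for host, node in candidates:
        if host not in ignore_hosts:
            return host, node
    raise RuntimeError("No valid host was found.")
```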

Part 6

The nova-compute service stage on the target node

Processing in manager.py of the nova-compute service:

  • 1) Distinguish a nova evacuate operation from a nova rebuild operation by the value of recreate.

  • 2) If recreate is true, this is an evacuation; if recreate is false, this is a rebuild.

  • 3) Claim resources on the target node based on the node chosen by the scheduler.

  • 4) Get the VM's image information.

  • 5) Read the block_device_mapping table by the VM's uuid to get the VM's block device information.

  • 6) Get the VM's network information.

  • 7) Detach the VM's block devices.

  • 8) Because the libvirt driver does not implement a rebuild driver method, _rebuild_default_impl is called to perform both evacuation and rebuild.

  • 9) For an evacuation, the spawn driver method is called on the target node to create the VM there.

  • 10) For a rebuild, the VM is first destroyed on the node and then recreated via the spawn driver method; for an evacuation, the VM is recreated directly.

The main call chain within the nova-compute service is:

rebuild_instance------->

_do_rebuild_instance_with_claim----->

_do_rebuild_instance----->

_rebuild_default_impl
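The evacuate/rebuild split in steps 9) and 10) can be sketched as follows: a rebuild destroys the old VM first, while an evacuation spawns directly on the new node. The driver class and function here are hypothetical stand-ins for the real virt driver and _rebuild_default_impl.

```python
# Hypothetical sketch of the recreate branch in the default rebuild path.

class FakeDriver:
    """A stand-in for the virt driver; the real one talks to the hypervisor."""
    def destroy(self, instance):
        pass
    def spawn(self, instance):
        pass

def rebuild_default_impl(driver, instance, recreate):
    """Destroy only for a rebuild (recreate=False), then spawn in both cases."""
    steps = []
    if not recreate:
        driver.destroy(instance)     # rebuild: remove the old VM first
        steps.append("destroy")
    driver.spawn(instance)           # both paths end with a fresh spawn
    steps.append("spawn")
    return steps
```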
