OpenStack Source Code Analysis [2021-11-21]

2021SC@SDUSC

nova-conductor

nova-conductor mainly manages communication between services and handles tasks. After receiving a request, it builds a RequestSpec object for nova-scheduler that packs up all of the scheduling-related request information, and then calls the nova-scheduler service's select_destinations interface.

As nova-conductor has matured, it has also taken over some of the TaskAPI work that originally belonged to nova-compute. The TaskAPI mainly covers long-running tasks, such as Create Instance and Migrate Instance. (This is why the caller of create_instance returns immediately after the call instead of waiting for the result.)

The ConductorManager class inherits from manager.Manager, which means it plays the role of the "project manager at company headquarters" (see the 11.15 post for this analogy). However, it is deployed together with the compute component (its "client"), so no RPC communication is needed between the two.

The concept of a cell comes up often here, so a quick explanation first. Historically, Nova used a single logical database and message queue, and every node relied on that one set for communication and for persisting data. This made it hard to scale the system and to provide fault tolerance. Hence cells v1, which divides the nodes into groups, each with its own database and queue. Cells v2 keeps the same basic idea and layout as cells v1. The differences are that cells v1 needed a dedicated nova-cells service to synchronize information between parent and child cells, which cells v2 does away with; and that cells v1 provided bypass paths for certain behaviors, which made it hard to understand how OpenStack actually runs, whereas cells v2 avoids these alternate paths and lets nova go straight to the correct database and compute node for a given request.

nova-conductor exists to bridge compute nodes' access to the database, which is why Conductor's "field team", the conductor API, is stationed inside nova-compute.

The Manager class has a class attribute: target = messaging.Target(version='3.0'). The Target class is defined in oslo_messaging/target.py; it identifies the destination of a message and holds the following instance attributes:

self.exchange = exchange
self.topic = topic
self.namespace = namespace
self.version = version
self.server = server
self.fanout = fanout
self.accepted_namespaces = [namespace] + (legacy_namespaces or [])
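
As a hedged illustration (not nova code: the topic name, server name, and PingEndpoint class are made up, and a transport_url is assumed to be configured), a Target is typically paired with an RPC server and client like this:

import oslo_messaging as messaging
from oslo_config import cfg


class PingEndpoint(object):
    """Illustrative endpoint; its public methods become callable over RPC."""
    def ping(self, ctxt, arg):
        return arg


transport = messaging.get_rpc_transport(cfg.CONF)

# Server side: listen for calls addressed to topic 'conductor' on this server.
server_target = messaging.Target(topic='conductor', server='host-1', version='3.0')
server = messaging.get_rpc_server(transport, server_target,
                                  endpoints=[PingEndpoint()])

# Client side: only the topic (and optionally a version cap) is needed.
client_target = messaging.Target(topic='conductor', version='3.0')
client = messaging.RPCClient(transport, client_target)
# With the server running somewhere on that topic, a call looks like:
# client.call({}, 'ping', arg='hello')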

In the Manager class we can see which instance attributes every Manager is expected to have:

if not host:
    host = CONF.host
self.host = host
self.backdoor_port = None
self.service_name = service_name
self.notifier = rpc.get_notifier(self.service_name, self.host)
self.additional_endpoints = []
super(Manager, self).__init__()

ConductorManager adds a compute_task_mgr attribute, a ComputeTaskManager object (this class is examined in detail below), and appends compute_task_mgr to additional_endpoints.
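
Roughly, this is what the constructor looks like in the nova source (a sketch; the exact code may differ slightly between releases):

class ConductorManager(manager.Manager):
    """Mission: Conduct things."""

    target = messaging.Target(version='3.0')

    def __init__(self, *args, **kwargs):
        super(ConductorManager, self).__init__(service_name='conductor',
                                               *args, **kwargs)
        # The ComputeTaskManager instance handles the long-running "task" RPCs;
        # registering it as an additional endpoint exposes its methods on the
        # same RPC server that serves ConductorManager itself.
        self.compute_task_mgr = ComputeTaskManager()
        self.additional_endpoints.append(self.compute_task_mgr)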

Now let's look at what the main functions do:

provider_fw_rule_get_all: fakes an empty db result (used at ...).
_object_dispatch: dispatches a call to a method of the given object. It wraps the call in exception handling, ensuring that the object method gets invoked and that any exception it raises is handled here (see the sketch after this list).
object_action: performs an action on an object, updates the object, and returns the changes between the versions before and after the update.
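
The exception handling in _object_dispatch looks roughly like this (a sketch of the nova source; details may differ by release):

def _object_dispatch(target, method, args, kwargs):
    """Dispatch a call to an object method.

    This ensures the object method gets called and any exception raised
    is wrapped in an ExpectedException for forwarding back to the caller
    (without spamming the conductor logs).
    """
    try:
        # A missing method is really a client problem, so keep the getattr
        # inside the try block as well.
        return getattr(target, method)(*args, **kwargs)
    except Exception:
        raise messaging.ExpectedException()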

The ComputeTaskManager class is the collection of RPC method implementations that compute nodes can invoke remotely; they correspond one-to-one with the API.

It defines a class attribute target = messaging.Target(namespace='compute_task', version='1.23').

Its constructor registers the APIs of many other components, which it later uses to get its work done.
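
A sketch of that constructor (the exact set of attributes varies between nova releases; in some of them report_client, for example, is a lazy property rather than built here):

class ComputeTaskManager(object):

    target = messaging.Target(namespace='compute_task', version='1.23')

    def __init__(self):
        self.compute_rpcapi = compute_rpcapi.ComputeAPI()
        self.volume_api = cinder.API()
        self.image_api = glance.API()
        self.network_api = neutron.API()
        self.servicegroup_api = servicegroup.API()
        self.query_client = query.SchedulerQueryClient()     # talks to nova-scheduler
        self.report_client = report.SchedulerReportClient()  # talks to Placement
        self.notifier = rpc.get_notifier('compute')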

The most important method is probably build_instances, the actual implementation behind the nova.conductor.rpcapi.build_instances RPC interface mentioned at the beginning.

It first checks whether requested_networks is present and is a NetworkRequestList; if it is not of the expected type, it is converted into a NetworkRequestList:

if (requested_networks and
        not isinstance(requested_networks,
                       objects.NetworkRequestList)):
    requested_networks = objects.NetworkRequestList.from_tuples(
        requested_networks)

It then reads the instance_type entry from filter_properties; if a value is found but it is not a Flavor object, it is converted into a Flavor:

flavor = filter_properties.get('instance_type')
if flavor and not isinstance(flavor, objects.Flavor):
    # Code downstream may expect extra_specs to be populated since it
    # is receiving an object, so lookup the flavor to ensure this.
    flavor = objects.Flavor.get_by_id(context, flavor['id'])
    filter_properties = dict(filter_properties, instance_type=flavor)

If request_spec is None (in some older versions, compute nodes do not pass a request_spec when rescheduling, so in that case a new one has to be built here):

if request_spec is None:
    legacy_request_spec = scheduler_utils.build_request_spec(
        image, instances)

If request_spec is not None, it is converted into the legacy dict form for compatibility:

legacy_request_spec = request_spec.to_legacy_request_spec_dict()

Whether host_lists was passed in (i.e. is not None) then determines whether this call is a reschedule:

is_reschedule = host_lists is not None

Now the real work begins. First, the retry policy is set up:

scheduler_utils.populate_retry(
    filter_properties, instances[0].uuid)
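
What populate_retry maintains in filter_properties looks roughly like this (illustrative values; the key names come from the nova source):

# Each scheduling attempt bumps num_attempts; once it exceeds the
# [scheduler]max_attempts option, MaxRetriesExceeded is raised.
filter_properties['retry'] = {
    'num_attempts': 1,   # 1 on the first schedule, 2+ on reschedules
    'hosts': [],         # (host, node) pairs already tried, filled in later
}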

A new RequestSpec object is built:

instance_uuids = [instance.uuid for instance in instances]
spec_obj = objects.RequestSpec.from_primitives(
        context, legacy_request_spec, filter_properties)
LOG.debug("Rescheduling: %s", is_reschedule)

If this is a reschedule and there are no hosts left to try, the instance build fails:

if is_reschedule:
    # Make sure that we have a host, as we may have exhausted all
    # our alternates
    if not host_lists[0]:
        # We have an empty list of hosts, so this instance has
        # failed to build.
        msg = ("Exhausted all hosts available for retrying build "
               "failures for instance %(instance_uuid)s." %
               {"instance_uuid": instances[0].uuid})
        raise exception.MaxRetriesExceeded(reason=msg)

If it is not a reschedule, _schedule_instances is called to obtain a list of candidate target hosts, host_lists:

else:
    host_lists = self._schedule_instances(context, spec_obj,
            instance_uuids, return_alternates=True)

Internally, _schedule_instances looks like this:

def _schedule_instances(self, context, request_spec,
                        instance_uuids=None, return_alternates=False):
    scheduler_utils.setup_instance_group(context, request_spec)
    with timeutils.StopWatch() as timer:
        host_lists = self.query_client.select_destinations(
            context, request_spec, instance_uuids, return_objects=True,
            return_alternates=return_alternates)
    LOG.debug('Took %0.2f seconds to select destinations for %s '
              'instance(s).', timer.elapsed(), len(instance_uuids))
    return host_lists

The setup_instance_group function records which server group this instance (request_spec) belongs to by adding group_hosts and group_policies entries to the filter_properties dict; the group information is looked up via the context.
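
Illustratively (made-up values, and the exact keys and where they live differ across nova versions), the group information made available to the scheduler looks like:

filter_properties['group_hosts'] = {'compute-1', 'compute-3'}  # hosts already used by the group
filter_properties['group_policies'] = {'anti-affinity'}        # the group's scheduling policy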

The select_destinations function is very important. It lives in scheduler/client/query.py as a method of the SchedulerQueryClient class and returns the hosts that best match request_spec and filter_properties. It was covered in the 11.07 post.
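
For reference, the client method is essentially a thin wrapper that forwards the call to the scheduler over RPC (a sketch of the nova source):

class SchedulerQueryClient(object):
    """Client class for querying the scheduler."""

    def __init__(self):
        self.scheduler_rpcapi = scheduler_rpcapi.SchedulerAPI()

    def select_destinations(self, context, spec_obj, instance_uuids,
                            return_objects=False, return_alternates=False):
        # Delegate host selection to nova-scheduler via its RPC API.
        return self.scheduler_rpcapi.select_destinations(
            context, spec_obj, instance_uuids, return_objects,
            return_alternates)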

Back in the manager: after all of the above we finally have host_lists, and a sizable exception handler deals with any failure along the way:

except Exception as exc:
    # NOTE(mriedem): If we're rescheduling from a failed build on a
    # compute, "retry" will be set and num_attempts will be >1 because
    # populate_retry above will increment it. If the server build was
    # forced onto a host/node or [scheduler]/max_attempts=1, "retry"
    # won't be in filter_properties and we won't get here because
    # nova-compute will just abort the build since reschedules are
    # disabled in those cases.
    num_attempts = filter_properties.get(
        'retry', {}).get('num_attempts', 1)
    for instance in instances:
        # If num_attempts > 1, we're in a reschedule and probably
        # either hit NoValidHost or MaxRetriesExceeded. Either way,
        # the build request should already be gone and we probably
        # can't reach the API DB from the cell conductor.
        if num_attempts <= 1:
            try:
                # If the BuildRequest stays around then instance
                # show/lists will pull from it rather than the errored
                # instance.
                self._destroy_build_request(context, instance)
            except exception.BuildRequestNotFound:
                pass
        self._cleanup_when_reschedule_fails(
            context, instance, exc, legacy_request_spec,
            requested_networks)
    return

The context is then elevated (context.is_admin is set to True):

elevated = context.elevated()
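
Conceptually, elevated() returns an admin copy of the context rather than mutating the original (a simplified sketch of RequestContext.elevated):

def elevated(self, read_deleted=None):
    """Return a version of this context with the admin flag set."""
    context = copy.copy(self)
    # Deep-copy roles so the original context's roles are left untouched.
    context.roles = copy.deepcopy(self.roles)
    context.is_admin = True
    if 'admin' not in context.roles:
        context.roles.append('admin')
    if read_deleted is not None:
        context.read_deleted = read_deleted
    return context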

For each (instance, host_list) pair: if this is a reschedule, then for each host in host_list that is still available, its allocation request is deserialized into alloc_req. If deserialization yields something (some deployments use a different scheduler without Placement; for them there is no such thing as a "claim", so the host is simply assumed to be available), the Placement API is asked to claim resources for the given instance UUID and request, which returns True on success. If the claim succeeds, fill_provider_mapping is called to record the mapping between request groups and resource providers in the request_spec.

If no usable host remains, the build is declared a failure.

for (instance, host_list) in zip(instances, host_lists):
    host = host_list.pop(0)
    if is_reschedule:
        # If this runs in the superconductor, the first instance will
        # already have its resources claimed in placement. If this is a
        # retry, though, this is running in the cell conductor, and we
        # need to claim first to ensure that the alternate host still
        # has its resources available. Note that there are schedulers
        # that don't support Placement, so must assume that the host is
        # still available.
        host_available = False
        while host and not host_available:
            if host.allocation_request:
                alloc_req = jsonutils.loads(host.allocation_request)
            else:
                alloc_req = None
            if alloc_req:
                try:
                    host_available = scheduler_utils.claim_resources(
                        elevated, self.report_client, spec_obj,
                        instance.uuid, alloc_req,
                        host.allocation_request_version)
                    if request_spec and host_available:
                        # NOTE(gibi): redo the request group - resource
                        # provider mapping as the above claim call
                        # moves the allocation of the instance to
                        # another host
                        scheduler_utils.fill_provider_mapping(
                            request_spec, host)
                except Exception as exc:
                    self._cleanup_when_reschedule_fails(
                        context, instance, exc, legacy_request_spec,
                        requested_networks)
                    return
            else:
                host_available = True
            if not host_available:
                host = host_list.pop(0) if host_list else None
        if not host_available:
            msg = ("Exhausted all hosts available for retrying build "
                   "failures for instance %(instance_uuid)s." %
                   {"instance_uuid": instance.uuid})
            exc = exception.MaxRetriesExceeded(reason=msg)
            self._cleanup_when_reschedule_fails(
                context, instance, exc, legacy_request_spec,
                requested_networks)
            return

Next, the instance's availability_zone is taken from the host, or looked up if the host does not carry one:

if 'availability_zone' in host:
    instance.availability_zone = host.availability_zone
else:
    try:
        instance.availability_zone = (
            availability_zones.get_host_availability_zone(context,
                    host.service_host))
    except Exception as exc:
        # Put the instance into ERROR state, set task_state to
        # None, inject a fault, etc.
        self._cleanup_when_reschedule_fails(
            context, instance, exc, legacy_request_spec,
            requested_networks)
        continue

The instance is saved, with exceptions handled:

try:
    # NOTE(danms): This saves the az change above, refreshes our
    # instance, and tells us if it has been deleted underneath us
    instance.save()
except (exception.InstanceNotFound,
        exception.InstanceInfoCacheNotFound):
    LOG.debug('Instance deleted during build', instance=instance)
    continue

filter_properties is deep-copied into local_filter_props, and populate_filter_properties is called to add annotations about the node picked by the scheduling process to local_filter_props:

local_filter_props = copy.deepcopy(filter_properties)
scheduler_utils.populate_filter_properties(local_filter_props,
    host)
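
populate_filter_properties itself roughly records the selected host/node and its limits (a sketch of scheduler_utils; details such as the _add_retry_host helper vary by release):

def populate_filter_properties(filter_properties, selection):
    """Add information about the selected node to the filter properties."""
    host = selection.service_host
    nodename = selection.nodename
    # Convert the SchedulerLimits object to the older dict format.
    if 'limits' in selection and selection.limits is not None:
        limits = selection.limits.to_dict()
    else:
        limits = {}
    # Remember this host in retry['hosts'] so a later reschedule avoids it,
    # and expose the scheduler limits to the compute claim.
    _add_retry_host(filter_properties, host, nodename)
    if limits:
        filter_properties['limits'] = limits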

A new RequestSpec object, local_reqspec, is then built from the augmented local_filter_props; the requested_resources of the original request_spec are copied onto local_reqspec, and the block device mappings are fetched by instance.uuid:

local_reqspec = objects.RequestSpec.from_primitives(
    context, legacy_request_spec, local_filter_props)

if request_spec:
    local_reqspec.requested_resources = (
        request_spec.requested_resources)
bdms = objects.BlockDeviceMappingList.get_by_instance_uuid(
        context, instance.uuid)

The retry count is read next. If there has been at most one attempt, this is not a reschedule, so the instance has to be mapped to its cell and the BuildRequest destroyed. If there has been more than one attempt, this is a reschedule, those steps are skipped, and instead the existing volume attachment IDs on the bdms are validated:

if num_attempts <= 1:
    # If this is a reschedule the instance is already mapped to
    # this cell and the BuildRequest is already deleted so ignore
    # the logic below.
    inst_mapping = self._populate_instance_mapping(context,
                                                   instance,
                                                   host)
    try:
        self._destroy_build_request(context, instance)
    except exception.BuildRequestNotFound:
        if inst_mapping:
            inst_mapping.destroy()
        return
else:
    self._validate_existing_attachment_ids(context, instance, bdms)

Then ARQs (accelerator requests) are created and bound to the instance:

try:
    accel_uuids = self._create_and_bind_arq_for_instance(
        context, instance, host.nodename, local_reqspec)
    # Create ARQs, determine their RPs and initiate ARQ binding
except Exception as exc:
    LOG.exception('Failed to reschedule. Reason: %s', exc)
    self._cleanup_when_reschedule_fails(
            context, instance, exc, legacy_request_spec,
            requested_networks)
    continue

Finally, compute_rpcapi.build_and_run_instance is called to actually build and run the instance. This function was covered in an earlier post.

self.compute_rpcapi.build_and_run_instance(context,
        instance=instance, host=host.service_host, image=image,
        request_spec=local_reqspec,
        filter_properties=local_filter_props,
        admin_password=admin_password,
        injected_files=injected_files,
        requested_networks=requested_networks,
        security_groups=security_groups,
        block_device_mapping=bdms, node=host.nodename,
        limits=host.limits, host_list=host_list,
        accel_uuids=accel_uuids)