2021SC@SDUSC
nova-conductor
nova-conductor mainly manages communication between services and handles task processing. After receiving a request, it creates a RequestSpec object for nova-scheduler that wraps all of the scheduling-related request information, and then calls the nova-scheduler service's select_destinations interface.
As nova-conductor has matured, it has also taken over part of the TaskAPI work that originally belonged to nova-compute. The TaskAPI mainly contains long-running tasks such as Create Instance and Migrate Instance. (This is why create_instance returns to its caller immediately instead of waiting for the result.)
The ConductorManager class inherits from manager.Manager, so it plays the role of the "project manager at company headquarters" (see the 11.15 blog post for that analogy); however, it is deployed together with the compute component (its "customer"), so no RPC communication is needed between them.
The concept of a cell comes up frequently here, so a quick explanation first. Historically, Nova used a single logical database and message queue that all nodes shared for communication and persistence, which made it difficult to scale the system and make it fault tolerant. Hence cells v1, which divides the nodes into groups, each with its own database and queue. Cells v2 keeps the same basic idea and layout, with two differences: cells v1 needed a dedicated nova-cells service to synchronize information between parent and child cells, which cells v2 does away with; and cells v1 provided bypasses for certain behaviors, which made it hard to understand how OpenStack actually runs, whereas cells v2 avoids those alternative paths and has nova go straight to the correct database and compute node for a given request.
Conductor exists to act as a bridge for compute nodes' database access, which is why Conductor's "field team", the conductor API, is stationed inside nova-compute.
The Manager class has a class attribute: target = messaging.Target(version='3.0'). The Target class, defined in oslo_messaging/target.py, identifies the destination of a message and holds the following instance members (a minimal usage sketch follows the list):
self.exchange = exchange
self.topic = topic
self.namespace = namespace
self.version = version
self.server = server
self.fanout = fanout
self.accepted_namespaces = [namespace] + (legacy_namespaces or [])
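As a minimal usage sketch (not Nova code; the class and namespace names here are made up for illustration), this is how an endpoint class typically declares a Target so that oslo.messaging knows which messages it should receive and which RPC version it speaks:
import oslo_messaging as messaging

class ExampleEndpoint(object):
    # topic and server are normally supplied by the service that creates the
    # RPC server around this endpoint; the Target on the class mainly pins
    # down the namespace and the version used for RPC version negotiation.
    target = messaging.Target(namespace='example', version='1.0')

    def ping(self, context):
        return 'pong'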
In the Manager class we can see which member variables a Manager is expected to have:
if not host:
host = CONF.host
self.host = host
self.backdoor_port = None
self.service_name = service_name
self.notifier = rpc.get_notifier(self.service_name, self.host)
self.additional_endpoints = []
super(Manager, self).__init__()
In ConductorManager, a compute_task_mgr member is added, a ComputeTaskManager object (we will look at this class in detail shortly), and compute_task_mgr is appended to additional_endpoints.
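For reference, the constructor does roughly the following (an abridged sketch based on nova/conductor/manager.py; details vary between releases):
class ConductorManager(manager.Manager):

    target = messaging.Target(version='3.0')

    def __init__(self, *args, **kwargs):
        super(ConductorManager, self).__init__(
            service_name='conductor', *args, **kwargs)
        # Expose ComputeTaskManager as an extra RPC endpoint so that its
        # methods are reachable over the same conductor connection.
        self.compute_task_mgr = ComputeTaskManager()
        self.additional_endpoints.append(self.compute_task_mgr)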
Let's look at what the main functions do:
provider_fw_rule_get_all: fakes an empty db result (used in ……).
_object_dispatch: dispatches a call to the specified object method. It wraps the call in exception handling, so the target method is invoked and any exception it raises is handled here (see the sketch after this list).
object_action: hands work to an object and updates it, returning the difference between the object before and after the update.
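A rough sketch of _object_dispatch (simplified from nova/conductor/manager.py; the real code may differ slightly between releases):
def _object_dispatch(self, target, method, args, kwargs):
    # Keep the getattr inside the try block: a missing method is really a
    # client-side problem, so it gets wrapped and sent back to the caller
    # as an expected error instead of polluting the conductor logs.
    try:
        return getattr(target, method)(*args, **kwargs)
    except Exception:
        raise messaging.ExpectedException()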
The ComputeTaskManager class is the collection of RPC method implementations that compute nodes can call remotely; its methods correspond one-to-one with the rpcapi.
It defines a class attribute target = messaging.Target(namespace='compute_task', version='1.23').
Its constructor registers the APIs of many other components, which it relies on later to do the actual work.
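Roughly, it looks like this (abridged sketch; only the clients that show up later in this walkthrough are listed, and the exact set varies by release):
class ComputeTaskManager(object):

    target = messaging.Target(namespace='compute_task', version='1.23')

    def __init__(self):
        # The real constructor also sets up volume/image/network/servicegroup
        # API clients, omitted here.
        self.compute_rpcapi = compute_rpcapi.ComputeAPI()
        self.query_client = query.SchedulerQueryClient()
        self.report_client = report.SchedulerReportClient()
        self.notifier = rpc.get_notifier('compute')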
The most important function is arguably build_instances, which is the actual implementation behind the nova.conductor.rpcapi.build_instances RPC interface mentioned earlier.
It first checks whether requested_networks exists and is a NetworkRequestList; if it is not of the expected type, it is converted into one:
if (requested_networks and
not isinstance(requested_networks,
objects.NetworkRequestList)):
requested_networks = objects.NetworkRequestList.from_tuples(
requested_networks)
Next it reads the instance_type value from filter_properties; if something was found but it is not a Flavor, it is converted into a Flavor:
flavor = filter_properties.get('instance_type')
if flavor and not isinstance(flavor, objects.Flavor):
# Code downstream may expect extra_specs to be populated since it
# is receiving an object, so lookup the flavor to ensure this.
flavor = objects.Flavor.get_by_id(context, flavor['id'])
filter_properties = dict(filter_properties, instance_type=flavor)
If request_spec is None (in some older versions, compute nodes did not pass a request_spec when rescheduling, so in that case a new one has to be built here):
if request_spec is None:
legacy_request_spec = scheduler_utils.build_request_spec(
image, instances)
If request_spec is not None, it is converted to the legacy dict form for compatibility:
legacy_request_spec = request_spec.to_legacy_request_spec_dict()
Whether host_lists is None then tells us whether this call is a reschedule:
is_reschedule = host_lists is not None
Now the real work begins. First, set up the retry policy:
scheduler_utils.populate_retry(
filter_properties, instances[0].uuid)
Build a new RequestSpec object:
instance_uuids = [instance.uuid for instance in instances]
spec_obj = objects.RequestSpec.from_primitives(
context, legacy_request_spec, filter_properties)
LOG.debug("Rescheduling: %s", is_reschedule)
If this is a reschedule and there is no host left to try, building the instance has failed:
if is_reschedule:
# Make sure that we have a host, as we may have exhausted all
# our alternates
if not host_lists[0]:
# We have an empty list of hosts, so this instance has
# failed to build.
msg = ("Exhausted all hosts available for retrying build "
"failures for instance %(instance_uuid)s." %
{"instance_uuid": instances[0].uuid})
raise exception.MaxRetriesExceeded(reason=msg)
If this is not a reschedule, _schedule_instances is called to obtain a list of candidate target hosts, host_lists:
else:
host_lists = self._schedule_instances(context, spec_obj,
instance_uuids, return_alternates=True)
Internally, _schedule_instances looks like this:
def _schedule_instances(self, context, request_spec,
instance_uuids=None, return_alternates=False):
scheduler_utils.setup_instance_group(context, request_spec)
with timeutils.StopWatch() as timer:
host_lists = self.query_client.select_destinations(
context, request_spec, instance_uuids, return_objects=True,
return_alternates=return_alternates)
LOG.debug('Took %0.2f seconds to select destinations for %s '
'instance(s).', timer.elapsed(), len(instance_uuids))
return host_lists
The setup_instance_group function marks which group this instance (request_spec) belongs to by adding group_hosts and group_policies entries to the filter_properties dict; the information comes from the context.
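As an illustration only (host names made up), this is roughly what those two entries look like for an instance that belongs to an anti-affinity server group:
# Illustration: entries added by setup_instance_group for an instance in an
# anti-affinity server group.
filter_properties = {}
filter_properties['group_hosts'] = {'compute-1', 'compute-2'}
filter_properties['group_policies'] = {'anti-affinity'}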
select_destinations is very important. It lives in scheduler/client/query.py as a method of the SchedulerQueryClient class and returns the hosts that best match the request_spec and filter_properties. It was covered in the 11.07 blog post.
Back in the manager: after all of the above we finally have host_lists, and a pile of exceptions is handled next:
except Exception as exc:
# NOTE(mriedem): If we're rescheduling from a failed build on a
# compute, "retry" will be set and num_attempts will be >1 because
# populate_retry above will increment it. If the server build was
# forced onto a host/node or [scheduler]/max_attempts=1, "retry"
# won't be in filter_properties and we won't get here because
# nova-compute will just abort the build since reschedules are
# disabled in those cases.
num_attempts = filter_properties.get(
'retry', {}).get('num_attempts', 1)
for instance in instances:
# If num_attempts > 1, we're in a reschedule and probably
# either hit NoValidHost or MaxRetriesExceeded. Either way,
# the build request should already be gone and we probably
# can't reach the API DB from the cell conductor.
if num_attempts <= 1:
try:
# If the BuildRequest stays around then instance
# show/lists will pull from it rather than the errored
# instance.
self._destroy_build_request(context, instance)
except exception.BuildRequestNotFound:
pass
self._cleanup_when_reschedule_fails(
context, instance, exc, legacy_request_spec,
requested_networks)
return
The context is then elevated (context.is_admin is set to True):
elevated = context.elevated()
For each (instance, host_list) pair: if this is a reschedule, then for every currently available host in host_list the allocation request is deserialized into alloc_req. If deserialization yields something, the Placement API is asked to claim a chunk of resources for the given instance UUID and request, and the claim returns True on success; if it yields nothing (some deployments use a different scheduler without Placement, for which "claiming" has no meaning), the host is simply assumed to be available. After a successful claim, fill_provider_mapping records the mapping between request groups and resource providers in the request_spec.
If no host turns out to be available, the build is declared failed.
for (instance, host_list) in zip(instances, host_lists):
host = host_list.pop(0)
if is_reschedule:
# If this runs in the superconductor, the first instance will
# already have its resources claimed in placement. If this is a
# retry, though, this is running in the cell conductor, and we
# need to claim first to ensure that the alternate host still
# has its resources available. Note that there are schedulers
# that don't support Placement, so must assume that the host is
# still available.
host_available = False
while host and not host_available:
if host.allocation_request:
alloc_req = jsonutils.loads(host.allocation_request)
else:
alloc_req = None
if alloc_req:
try:
host_available = scheduler_utils.claim_resources(
elevated, self.report_client, spec_obj,
instance.uuid, alloc_req,
host.allocation_request_version)
if request_spec and host_available:
# NOTE(gibi): redo the request group - resource
# provider mapping as the above claim call
# moves the allocation of the instance to
# another host
scheduler_utils.fill_provider_mapping(
request_spec, host)
except Exception as exc:
self._cleanup_when_reschedule_fails(
context, instance, exc, legacy_request_spec,
requested_networks)
return
else:
host_available = True
if not host_available:
host = host_list.pop(0) if host_list else None
if not host_available:
msg = ("Exhausted all hosts available for retrying build "
"failures for instance %(instance_uuid)s." %
{"instance_uuid": instance.uuid})
exc = exception.MaxRetriesExceeded(reason=msg)
self._cleanup_when_reschedule_fails(
context, instance, exc, legacy_request_spec,
requested_networks)
return
Then the host's availability_zone is set or resolved:
if 'availability_zone' in host:
instance.availability_zone = host.availability_zone
else:
try:
instance.availability_zone = (
availability_zones.get_host_availability_zone(context,
host.service_host))
except Exception as exc:
# Put the instance into ERROR state, set task_state to
# None, inject a fault, etc.
self._cleanup_when_reschedule_fails(
context, instance, exc, legacy_request_spec,
requested_networks)
continue
Save the instance, handling exceptions:
try:
# NOTE(danms): This saves the az change above, refreshes our
# instance, and tells us if it has been deleted underneath us
instance.save()
except (exception.InstanceNotFound,
exception.InstanceInfoCacheNotFound):
LOG.debug('Instance deleted during build', instance=instance)
continue
filter_properties is deep-copied into local_filter_props, and populate_filter_properties is called to add annotations about the node chosen by the scheduling process to local_filter_props (see the sketch right after this snippet):
local_filter_props = copy.deepcopy(filter_properties)
scheduler_utils.populate_filter_properties(local_filter_props,
host)
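Illustration only (values made up; field names follow scheduler_utils.populate_filter_properties and may differ slightly by release): after the call, local_filter_props records which host/node was picked and the resource limits the scheduler applied to it, roughly:
# Illustration: the annotations populate_filter_properties leaves behind.
local_filter_props = {
    'retry': {'num_attempts': 1,
              'hosts': [['compute-1', 'node-1']]},  # the chosen host/node pair
    'limits': {'memory_mb': 16384, 'vcpu': 8.0},    # limits from the host state
}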
Using the augmented local_filter_props, another RequestSpec object, local_reqspec, is created; the original request_spec's requested_resources is copied onto it, and the block device mapping list is then fetched by instance.uuid:
local_reqspec = objects.RequestSpec.from_primitives(
context, legacy_request_spec, local_filter_props)
if request_spec:
local_reqspec.requested_resources = (
request_spec.requested_resources)
bdms = objects.BlockDeviceMappingList.get_by_instance_uuid(
context, instance.uuid)
Read the retry count. If it is at most 1, this is not a reschedule, so the instance has to be mapped to its cell and the BuildRequest destroyed. If it is greater than 1, this is a reschedule, so neither step is needed; instead the existing volume attachment ids in bdms are validated:
if num_attempts <= 1:
# If this is a reschedule the instance is already mapped to
# this cell and the BuildRequest is already deleted so ignore
# the logic below.
inst_mapping = self._populate_instance_mapping(context,
instance,
host)
try:
self._destroy_build_request(context, instance)
except exception.BuildRequestNotFound:
if inst_mapping:
inst_mapping.destroy()
return
else:
self._validate_existing_attachment_ids(context, instance, bdms)
Then ARQs are created and bound for the instance:
try:
accel_uuids = self._create_and_bind_arq_for_instance(
context, instance, host.nodename, local_reqspec)
# Create ARQs, determine their RPs and initiate ARQ binding
except Exception as exc:
LOG.exception('Failed to reschedule. Reason: %s', exc)
self._cleanup_when_reschedule_fails(
context, instance, exc, legacy_request_spec,
requested_networks)
continue
Finally, compute_rpcapi.build_and_run_instance is called to build and run the instance. This function was covered in an earlier blog post.
self.compute_rpcapi.build_and_run_instance(context,
instance=instance, host=host.service_host, image=image,
request_spec=local_reqspec,
filter_properties=local_filter_props,
admin_password=admin_password,
injected_files=injected_files,
requested_networks=requested_networks,
security_groups=security_groups,
block_device_mapping=bdms, node=host.nodename,
limits=host.limits, host_list=host_list,
accel_uuids=accel_uuids)