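Nova scheduler: source code analysis of the scheduling implementation

(1) Starting the scheduler

The scheduler is started by the script nova/bin/nova-scheduler.

Its contents are as follows: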

if __name__ == '__main__':
    utils.default_flagfile()
    flags.FLAGS(sys.argv)
    logging.setup()
    utils.monkey_patch()
    server = service.Service.create(binary='nova-scheduler')
    service.serve(server)
    service.wait()

Here:

utils.default_flagfile() handles the configuration file. If no configuration file is given on the command line, the one at a default location is used.

def default_flagfile(filename='nova.conf', args=None):
    if args is None:
        args = sys.argv
    for arg in args:
        if arg.find('flagfile') != -1:
            return arg[arg.index('flagfile') + len('flagfile') + 1:]
    else:
        if not os.path.isabs(filename):
            # turn relative filename into an absolute path
            script_dir = os.path.dirname(inspect.stack()[-1][1])
            filename = os.path.abspath(os.path.join(script_dir, filename))
        if not os.path.exists(filename):
            filename = "./nova.conf"
            if not os.path.exists(filename):
                filename = '/etc/nova/nova.conf'
        if os.path.exists(filename):
            flagfile = '--flagfile=%s' % filename
            args.insert(1, flagfile)
            return filename

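The configuration file is looked up in this order: a file given with the --flagfile command-line argument (if present, nothing else is checked) --> the directory of the running script --> the current directory --> /etc/nova/nova.conf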
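The lookup order above can be sketched as a small function. This is a simplified illustration rather than nova's own code: resolve_flagfile and its exists parameter are hypothetical names, introduced so the order can be checked without touching the real filesystem.

```python
import os


def resolve_flagfile(args, script_dir, exists=os.path.exists):
    """Return the config file path that would be chosen, or None.

    Hypothetical helper that mirrors the order in default_flagfile();
    `exists` is injectable so the logic can be exercised without real files.
    """
    # 1. An explicit --flagfile=PATH argument wins outright.
    for arg in args:
        if arg.startswith('--flagfile='):
            return arg[len('--flagfile='):]
    # 2. Otherwise try the well-known locations in order.
    candidates = [
        os.path.join(script_dir, 'nova.conf'),  # next to the running script
        './nova.conf',                          # current working directory
        '/etc/nova/nova.conf',                  # system-wide default
    ]
    for path in candidates:
        if exists(path):
            return path
    return None
```

Passing a fake exists function makes it easy to see which location wins when several files are present.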

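flags.FLAGS(sys.argv) loads the configuration parameters into flags; FLAGS holds all of the configuration settings.

logging.setup() initializes the logging module.

The service is created as follows: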

    @classmethod
    def create(cls, host=None, binary=None, topic=None, manager=None,
               report_interval=None, periodic_interval=None,
               periodic_fuzzy_delay=None):
        """Instantiates class and passes back application object.

        :param host: defaults to FLAGS.host
        :param binary: defaults to basename of executable
        :param topic: defaults to bin_name - 'nova-' part
        :param manager: defaults to FLAGS.<topic>_manager
        :param report_interval: defaults to FLAGS.report_interval
        :param periodic_interval: defaults to FLAGS.periodic_interval
        :param periodic_fuzzy_delay: defaults to FLAGS.periodic_fuzzy_delay

        """
        if not host:
            host = FLAGS.host
        if not binary:
            binary = os.path.basename(inspect.stack()[-1][1])
        if not topic:
            topic = binary.rpartition('nova-')[2]
        if not manager:
            manager = FLAGS.get('%s_manager' % topic, None)
        if report_interval is None:
            report_interval = FLAGS.report_interval
        if periodic_interval is None:
            periodic_interval = FLAGS.periodic_interval
        if periodic_fuzzy_delay is None:
            periodic_fuzzy_delay = FLAGS.periodic_fuzzy_delay
        service_obj = cls(host, binary, topic, manager,
                          report_interval=report_interval,
                          periodic_interval=periodic_interval,
                          periodic_fuzzy_delay=periodic_fuzzy_delay)

        return service_obj

Of these parameters, report_interval is described as:

seconds between nodes reporting state to datastore

manager is the class name given by the "scheduler_manager=..." entry in the configuration file.
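How the topic and the manager flag name fall out of the binary name can be shown with the same string operations that Service.create uses. The wrapper function below is a hypothetical name used only for illustration.

```python
def derive_topic_and_manager_flag(binary):
    """Return (topic, flag_name) using the same steps as Service.create."""
    # rpartition splits on the last 'nova-'; element [2] is what follows it.
    topic = binary.rpartition('nova-')[2]
    # The manager class name is then looked up under '<topic>_manager'.
    return topic, '%s_manager' % topic
```

For the binary 'nova-scheduler' this yields the topic 'scheduler' and the flag name 'scheduler_manager'.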

In the corresponding initializer:

def __init__(self, host, binary, topic, manager, report_interval=None,
                 periodic_interval=None, periodic_fuzzy_delay=None,
                 *args, **kwargs):
        self.host = host
        self.binary = binary
        self.topic = topic
        self.manager_class_name = manager
        manager_class = utils.import_class(self.manager_class_name)
        self.manager = manager_class(host=self.host, *args, **kwargs)
        self.report_interval = report_interval
        self.periodic_interval = periodic_interval
        self.periodic_fuzzy_delay = periodic_fuzzy_delay
        super(Service, self).__init__(*args, **kwargs)
        self.saved_args, self.saved_kwargs = args, kwargs
        self.timers = []

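A service contains one manager and listens on the message queue that corresponds to its topic. It also periodically reports its state to the database.

I looked at the nova/service.py file in more detail.

(a) The Launcher class

It manages service objects, including starting and stopping them. A service object is started with eventlet.spawn, which runs it in a new green thread (a cooperatively scheduled coroutine, not an OS thread).

Internally, the Launcher class keeps a list that records all service objects.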
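A rough sketch of the launcher idea follows. It is not nova's Launcher: the class and method names are hypothetical, and the standard library's threading module is swapped in for eventlet.spawn so the example runs without extra dependencies.

```python
import threading


class SketchLauncher(object):
    """Hypothetical stand-in for Launcher, using OS threads for illustration."""

    def __init__(self):
        self._services = []  # every service handed to launch_service()
        self._workers = []   # one worker thread per started service

    def launch_service(self, service):
        # Record the service, then start it concurrently.
        self._services.append(service)
        worker = threading.Thread(target=service.start)
        worker.start()
        self._workers.append(worker)

    def wait(self):
        # Block until every started service has returned from start().
        for worker in self._workers:
            worker.join()

    def stop(self):
        # Ask every recorded service to stop.
        for service in self._services:
            service.stop()


class DummyService(object):
    """Minimal service used only to exercise the launcher sketch."""

    def __init__(self):
        self.started = False
        self.stopped = False

    def start(self):
        self.started = True

    def stop(self):
        self.stopped = True
```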

(b) The Service class

 

(2) The SchedulerManager class

scheduler_driver_opt = cfg.StrOpt('scheduler_driver',
        default='nova.scheduler.multi.MultiScheduler',
        help='Default driver to use for the scheduler')

FLAGS = flags.FLAGS
FLAGS.register_opt(scheduler_driver_opt)

class SchedulerManager(manager.Manager):
    """Chooses a host to run instances on."""

    def __init__(self, scheduler_driver=None, *args, **kwargs):
        if not scheduler_driver:
            scheduler_driver = FLAGS.scheduler_driver
        self.driver = utils.import_object(scheduler_driver)
        super(SchedulerManager, self).__init__(*args, **kwargs)

By default, nova.scheduler.multi.MultiScheduler is used.

When a request for a virtual machine is received, SchedulerManager.run_instance is called:

def run_instance(self, context, topic, *args, **kwargs):
        """Tries to call schedule_run_instance on the driver.
        Sets instance vm_state to ERROR on exceptions
        """
        args = (context,) + args
        try:
            return self.driver.schedule_run_instance(*args, **kwargs)
        except exception.NoValidHost as ex:
            # don't reraise
            self._set_vm_state_and_notify('run_instance',
                                         {'vm_state': vm_states.ERROR},
                                          context, ex, *args, **kwargs)

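This in turn calls the driver's schedule_run_instance method. If NoValidHost is raised along the way, the virtual machine's state is set to ERROR and the exception is not re-raised. The method called on the driver is: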

def schedule_run_instance(self, *args, **kwargs):
        return self.drivers['compute'].schedule_run_instance(*args, **kwargs)

The 'compute' driver used here is configured by:

 cfg.StrOpt('compute_scheduler_driver',
               default='nova.scheduler.'
                    'filter_scheduler.FilterScheduler',
               help='Driver to use for scheduling compute calls'),

Requests for virtual machines are therefore handed over to the FilterScheduler class.
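The delegation chain described so far (manager, then the multi scheduler, then the compute driver) can be sketched with plain classes. All class names here are hypothetical stand-ins, and the vm_state attribute replaces the database update done in the real code.

```python
class NoValidHost(Exception):
    """Stand-in for nova.exception.NoValidHost."""


class SketchFilterScheduler(object):
    """Picks the first known host, or fails if there is none."""

    def __init__(self, hosts):
        self.hosts = hosts

    def schedule_run_instance(self, request_spec):
        if not self.hosts:
            raise NoValidHost()
        return self.hosts[0]


class SketchMultiScheduler(object):
    """Holds one driver per topic and forwards compute requests."""

    def __init__(self, compute_driver):
        self.drivers = {'compute': compute_driver}

    def schedule_run_instance(self, request_spec):
        return self.drivers['compute'].schedule_run_instance(request_spec)


class SketchManager(object):
    def __init__(self, driver):
        self.driver = driver
        self.vm_state = None  # stand-in for the state kept in the database

    def run_instance(self, request_spec):
        try:
            return self.driver.schedule_run_instance(request_spec)
        except NoValidHost:
            # Record the failure instead of re-raising.
            self.vm_state = 'error'
            return None
```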

The schedule_run_instance method of the FilterScheduler class:

def schedule_run_instance(self, context, request_spec, *args, **kwargs):
        """This method is called from nova.compute.api to provision
        an instance.  We first create a build plan (a list of WeightedHosts)
        and then provision.

        Returns a list of the instances created.
        """
        elevated = context.elevated()
        num_instances = request_spec.get('num_instances', 1)
        LOG.debug(_("Attempting to build %(num_instances)d instance(s)") %
                locals())
        payload = dict(request_spec=request_spec)
        notifier.notify(notifier.publisher_id("scheduler"),
                        'scheduler.run_instance.start', notifier.INFO, payload)

        weighted_hosts = self._schedule(context, "compute", request_spec, *args, **kwargs)

        if not weighted_hosts:
            raise exception.NoValidHost(reason="")

        # NOTE(comstud): Make sure we do not pass this through.  It
        # contains an instance of RpcContext that cannot be serialized.
        kwargs.pop('filter_properties', None)

        instances = []
        for num in xrange(num_instances):
            if not weighted_hosts:
                break
            weighted_host = weighted_hosts.pop(0)

            request_spec['instance_properties']['launch_index'] = num
            instance = self._provision_resource(elevated, weighted_host,
                                                request_spec, kwargs)
            if instance:
                instances.append(instance)

        notifier.notify(notifier.publisher_id("scheduler"),
                        'scheduler.run_instance.end', notifier.INFO, payload)

        return instances
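The provisioning loop in the method above can be isolated into a minimal sketch. The function name and the provision callback are hypothetical; the callback stands in for _provision_resource and returns an instance or None.

```python
def provision_instances(weighted_hosts, num_instances, provision):
    """Consume hosts from the front of weighted_hosts, one per instance.

    `provision(host, launch_index)` is a hypothetical callback standing in
    for _provision_resource; it returns an instance or None.
    """
    remaining = list(weighted_hosts)  # do not mutate the caller's list
    instances = []
    for launch_index in range(num_instances):
        if not remaining:
            break  # fewer hosts than requested instances
        host = remaining.pop(0)  # best-ranked host first
        instance = provision(host, launch_index)
        if instance:
            instances.append(instance)
    return instances
```

Note that a request for more instances than there are hosts simply stops early, as in the original loop.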

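(3) The scheduling algorithm

(a) Obtaining the weights and cost functions: get_cost_functions

Each topic has its own set of weights and cost functions. If they have been cached before, the cached result is returned directly: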

if topic in self.cost_function_cache:
    return self.cost_function_cache[topic]

The cost functions are the ones listed under least_cost_functions in the configuration file:

###### (FloatOpt) How much weight to give the fill-first cost function. A negative value will reverse behavior: e.g. spread-first
# compute_fill_first_cost_fn_weight=-1.0
###### (ListOpt) Which cost functions the LeastCostScheduler should use
# least_cost_functions="nova.scheduler.least_cost.compute_fill_first_cost_fn"
###### (FloatOpt) How much weight to give the noop cost function
# noop_cost_fn_weight=1.0

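The cost functions are separated by commas, and each cost function also has a corresponding weight.

In the end, get_cost_functions collects all the cost functions together with their weights and returns them.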
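A minimal sketch of the caching behavior described above, assuming the cost functions and weights have already been read from the configuration. The registry class, its constructor arguments, and the two sample cost functions are hypothetical and only illustrate the shape of the result: a list of (weight, function) pairs per topic.

```python
def fill_first_cost_fn(host):
    """Hypothetical cost function: less free RAM means lower cost."""
    return host['free_ram_mb']


def noop_cost_fn(host):
    """Hypothetical cost function that treats every host alike."""
    return 1


class CostFunctionRegistry(object):
    def __init__(self, functions_by_topic, weights):
        self._functions_by_topic = functions_by_topic  # topic -> [functions]
        self._weights = weights                        # function name -> weight
        self.cost_function_cache = {}

    def get_cost_functions(self, topic):
        # Return the cached list if this topic was resolved before.
        if topic in self.cost_function_cache:
            return self.cost_function_cache[topic]
        # Pair every cost function with its configured weight.
        pairs = [(self._weights[fn.__name__], fn)
                 for fn in self._functions_by_topic[topic]]
        self.cost_function_cache[topic] = pairs
        return pairs
```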

 

(b) Getting information on all available hosts: HostManager.get_all_host_states

def get_all_host_states(self, context, topic):
        """Returns a dict of all the hosts the HostManager
        knows about. Also, each of the consumable resources in HostState
        are pre-populated and adjusted based on data in the db.

        For example:
        {'192.168.1.100': HostState(), ...}

        Note: this can be very slow with a lot of instances.
        InstanceType table isn't required since a copy is stored
        with the instance (in case the InstanceType changed since the
        instance was created)."""

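i. First, all available, not yet filtered hosts are selected from the database to build a dict, host_state_map.
ii. All compute nodes are stored in the compute_nodes table of the nova database.
iii. The necessary information for every node (such as memory capacity, number of virtual CPUs, and disk capacity) is read from that table and put into host_state_map.
iv. Information on all instances is read from the database table nova.instances.
v. For each instance, the resources it occupies are subtracted from the entry of its host in host_state_map, which gives the remaining resources of each host.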
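The steps above amount to "capacity minus usage per host". A minimal sketch, assuming both tables have already been read into lists of plain dicts; the function name and the field names used here are illustrative assumptions, not the real table columns or HostState attributes.

```python
def build_host_state_map(compute_nodes, instances):
    """Return {host: remaining resources} from capacity and usage rows.

    Both arguments are lists of plain dicts; the field names are
    illustrative assumptions.
    """
    host_state_map = {}
    # Start from each node's total capacity.
    for node in compute_nodes:
        host_state_map[node['host']] = {
            'free_ram_mb': node['memory_mb'],
            'free_disk_gb': node['local_gb'],
        }
    # Subtract what every instance consumes on its host.
    for instance in instances:
        state = host_state_map.get(instance['host'])
        if state is None:
            continue  # instance on a host we know nothing about
        state['free_ram_mb'] -= instance['memory_mb']
        state['free_disk_gb'] -= instance['local_gb']
    return host_state_map
```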

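(c) Filtering out unsuitable hosts with the configured filters

Filter conditions can be set in the configuration file.

scheduler_available_filters defines the available filters and can be specified more than once:

scheduler_driver=nova.scheduler.distributed_scheduler.FilterScheduler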
scheduler_available_filters=nova.scheduler.filters.standard_filters
scheduler_available_filters=myfilter.MyFilter
scheduler_default_filters=RamFilter,ComputeFilter,MyFilter
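What a filter does can be sketched without importing nova. The class and function below are hypothetical stand-ins: a filter exposes a host_passes check, and a host survives only if every configured filter accepts it. Host states are plain dicts here for brevity.

```python
class RamFilterSketch(object):
    """Hypothetical filter: pass hosts with enough free RAM."""

    def host_passes(self, host_state, filter_properties):
        requested = filter_properties['instance_type']['memory_mb']
        return host_state['free_ram_mb'] >= requested


def filter_hosts(host_states, filters, filter_properties):
    """Keep only the hosts that every filter accepts, sorted by name."""
    return [name for name, state in sorted(host_states.items())
            if all(f.host_passes(state, filter_properties) for f in filters)]
```

With an empty filter list every host passes, which matches the idea that filters only ever remove candidates.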

(d) Choosing the most suitable host according to the cost functions and weights
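A minimal sketch of least-cost selection, assuming the cost of a host is the weighted sum of its cost function values and the host with the lowest total wins. Both function names are hypothetical, and any normalization the real scheduler may apply is left out.

```python
def weighted_cost(host, weighted_fns):
    """Sum weight * cost over a list of (weight, cost_function) pairs."""
    return sum(weight * fn(host) for weight, fn in weighted_fns)


def pick_host(hosts, weighted_fns):
    """Return the host with the lowest weighted cost."""
    return min(hosts, key=lambda host: weighted_cost(host, weighted_fns))
```

With a cost function that returns free RAM, a positive weight prefers the fullest host (fill-first), while a negative weight reverses the choice (spread-first), in line with the comment on compute_fill_first_cost_fn_weight in the configuration sample.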

 

