Ansible Source Code Walkthrough: How the forks Concurrency Mechanism Is Implemented

(This article is based on Ansible 2.7.)
The forks option is Ansible's built-in mechanism for parallel execution. Its default can be set in the configuration file, overridden at runtime when invoking ansible, or assigned directly when developing against the Ansible API.
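
For example, the default can come from ansible.cfg (forks = 10 under [defaults]), be overridden on the command line with -f 10, or be supplied through the Python API. Below is a rough sketch of the API route, following the 2.x-era namedtuple options pattern; the exact field list is illustrative rather than an authoritative signature:

    from collections import namedtuple

    # In Ansible 2.x the CLI packs the parsed command-line options into a simple
    # object. When driving the API directly, a namedtuple with a 'forks' field is
    # commonly used in its place and later handed to TaskQueueManager(options=...),
    # which is where the self._options.forks seen below comes from.
    Options = namedtuple('Options', ['connection', 'module_path', 'forks',
                                     'become', 'become_method', 'become_user',
                                     'check', 'diff'])
    options = Options(connection='smart', module_path=None, forks=10,
                      become=None, become_method=None, become_user=None,
                      check=False, diff=False)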

The forks option is received and processed in lib/ansible/cli/__init__.py, lines 442-444:

        if fork_opts:
            parser.add_option('-f', '--forks', dest='forks', default=C.DEFAULT_FORKS, type='int',
                              help="specify number of parallel processes to use (default=%s)" % C.DEFAULT_FORKS)

Lines 389-391 show that the value must not be less than 1:

        if fork_opts:
            if op.forks < 1:
                self.parser.error("The number of processes (--forks) must be >= 1")

The help text describes this option as the "number of parallel processes to use", defaulting to C.DEFAULT_FORKS. We can pass -f when starting ansible to override that default.
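
As a standalone illustration of the same pattern (plain optparse, which is what the Ansible 2.7 CLI is built on), the default is applied at parse time and the lower bound is checked afterwards:

    import optparse

    DEFAULT_FORKS = 5  # C.DEFAULT_FORKS ships as 5 unless overridden in configuration

    parser = optparse.OptionParser()
    parser.add_option('-f', '--forks', dest='forks', default=DEFAULT_FORKS, type='int')
    opts, _ = parser.parse_args(['-f', '10'])

    if opts.forks < 1:
        parser.error("The number of processes (--forks) must be >= 1")
    print(opts.forks)  # -> 10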

So how does Ansible use this value at runtime? We know that Ansible runs tasks through the TaskQueueManager class (see the earlier post "Ansible 源码解析: Ansible的运行过程"), so creating the worker processes should also be TaskQueueManager's job. Let's look at its run method.
lib/ansible/executor/task_queue_manager.py, lines 220-299:

    def run(self, play):
        '''
        Iterates over the roles/tasks in a play, using the given (or default)
        strategy for queueing tasks. The default is the linear strategy, which
        operates like classic Ansible by keeping all hosts in lock-step with
        a given task (meaning no hosts move on to the next task until all hosts
        are done with the current task).
        '''

        if not self._callbacks_loaded:
            self.load_callbacks()

        all_vars = self._variable_manager.get_vars(play=play)
        warn_if_reserved(all_vars)
        templar = Templar(loader=self._loader, variables=all_vars)

        new_play = play.copy()
        new_play.post_validate(templar)
        new_play.handlers = new_play.compile_roles_handlers() + new_play.handlers

        self.hostvars = HostVars(
            inventory=self._inventory,
            variable_manager=self._variable_manager,
            loader=self._loader,
        )

        play_context = PlayContext(new_play, self._options, self.passwords, self._connection_lockfile.fileno())
        for callback_plugin in self._callback_plugins:
            if hasattr(callback_plugin, 'set_play_context'):
                callback_plugin.set_play_context(play_context)

        self.send_callback('v2_playbook_on_play_start', new_play)

        # initialize the shared dictionary containing the notified handlers
        self._initialize_notified_handlers(new_play)

        # build the iterator
        iterator = PlayIterator(
            inventory=self._inventory,
            play=new_play,
            play_context=play_context,
            variable_manager=self._variable_manager,
            all_vars=all_vars,
            start_at_done=self._start_at_done,
        )

        # adjust to # of workers to configured forks or size of batch, whatever is lower
        self._initialize_processes(min(self._options.forks, iterator.batch_size))

        # load the specified strategy (or the default linear one)
        strategy = strategy_loader.get(new_play.strategy, self)
        if strategy is None:
            raise AnsibleError("Invalid play strategy specified: %s" % new_play.strategy, obj=play._ds)

        # Because the TQM may survive multiple play runs, we start by marking
        # any hosts as failed in the iterator here which may have been marked
        # as failed in previous runs. Then we clear the internal list of failed
        # hosts so we know what failed this round.
        for host_name in self._failed_hosts.keys():
            host = self._inventory.get_host(host_name)
            iterator.mark_host_failed(host)

        self.clear_failed_hosts()

        # during initialization, the PlayContext will clear the start_at_task
        # field to signal that a matching task was found, so check that here
        # and remember it so we don't try to skip tasks on future plays
        if getattr(self._options, 'start_at_task', None) is not None and play_context.start_at_task is None:
            self._start_at_done = True

        # and run the play using the strategy and cleanup on way out
        play_return = strategy.run(iterator, play_context)

        # now re-save the hosts that failed from the iterator to our internal list
        for host_name in iterator.get_failed_hosts():
            self._failed_hosts[host_name] = True

        strategy.cleanup()
        self._cleanup_processes()
        return play_return

In particular, lines 266-267:

        # adjust to # of workers to configured forks or size of batch, whatever is lower
        self._initialize_processes(min(self._options.forks, iterator.batch_size))

show that the number of worker processes is the smaller of the forks value from the options and the size of the current host batch.
The _initialize_processes method itself, however, merely builds a list of empty worker slots (lines 113-117):

    def _initialize_processes(self, num):
        self._workers = []

        for i in range(num):
            self._workers.append(None)

No process is created here at all.
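
Putting the two pieces together with some hypothetical numbers: running with -f 20 against a batch of 6 hosts reserves only 6 empty worker slots, and no child process exists yet:

    forks = 20
    batch_size = 6                   # hosts in the current serial batch
    num = min(forks, batch_size)     # 6: never more workers than hosts in the batch

    workers = [None] * num           # equivalent to the append-None loop above
    print(workers)                   # [None, None, None, None, None, None]
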
Notice instead that at lines 290-291 the actual play run is delegated to the strategy:

        # and run the play using the strategy and cleanup on way out
        play_return = strategy.run(iterator, play_context)

and the strategy object is created from self, i.e. the TaskQueueManager instance (lines 269-270):

        # load the specified strategy (or the default linear one)
        strategy = strategy_loader.get(new_play.strategy, self)

So the next place to look is the strategy itself.
The default strategy is linear (lib/ansible/plugins/strategy/linear.py), but it references the workers only once, to process result messages, which is clearly not what we are after.
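
Conceptually, the linear strategy's run loop works in lock-step: for each task it queues the task once per host and then waits for every host's result before starting the next task. A rough schematic (not the real linear.py, which carries far more bookkeeping):

    # Schematic of the lock-step queueing flow; _queue_task here is a stand-in
    # for StrategyBase._queue_task, which is examined next.
    hosts = ['web1', 'web2', 'web3']
    tasks = ['ping', 'setup']

    def _queue_task(host, task):
        print("queue %s on %s" % (task, host))

    for task in tasks:                 # lock-step: one task at a time
        for host in hosts:
            _queue_task(host, task)    # hands the work to a worker slot
        # ... wait for all pending results before moving to the next task
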
Searching the strategy base class StrategyBase instead, we find the _queue_task method, which the linear strategy's run calls:

lib/ansible/plugins/strategy/__init__.py, lines 279-336:

    def _queue_task(self, host, task, task_vars, play_context):
        ''' handles queueing the task up to be sent to a worker '''

        display.debug("entering _queue_task() for %s/%s" % (host.name, task.action))

        # Add a write lock for tasks.
        # Maybe this should be added somewhere further up the call stack but
        # this is the earliest in the code where we have task (1) extracted
        # into its own variable and (2) there's only a single code path
        # leading to the module being run.  This is called by three
        # functions: __init__.py::_do_handler_run(), linear.py::run(), and
        # free.py::run() so we'd have to add to all three to do it there.
        # The next common higher level is __init__.py::run() and that has
        # tasks inside of play_iterator so we'd have to extract them to do it
        # there.

        if task.action not in action_write_locks.action_write_locks:
            display.debug('Creating lock for %s' % task.action)
            action_write_locks.action_write_locks[task.action] = Lock()

        # and then queue the new task
        try:

            # create a dummy object with plugin loaders set as an easier
            # way to share them with the forked processes
            shared_loader_obj = SharedPluginLoaderObj()

            queued = False
            starting_worker = self._cur_worker
            while True:
                worker_prc = self._workers[self._cur_worker]
                if worker_prc is None or not worker_prc.is_alive():
                    self._queued_task_cache[(host.name, task._uuid)] = {
                        'host': host,
                        'task': task,
                        'task_vars': task_vars,
                        'play_context': play_context
                    }

                    worker_prc = WorkerProcess(self._final_q, task_vars, host, task, play_context, self._loader, self._variable_manager, shared_loader_obj)
                    self._workers[self._cur_worker] = worker_prc
                    worker_prc.start()
                    display.debug("worker is %d (out of %d available)" % (self._cur_worker + 1, len(self._workers)))
                    queued = True
                self._cur_worker += 1
                if self._cur_worker >= len(self._workers):
                    self._cur_worker = 0
                if queued:
                    break
                elif self._cur_worker == starting_worker:
                    time.sleep(0.0001)

            self._pending_results += 1
        except (EOFError, IOError, AssertionError) as e:
            # most likely an abort
            display.debug("got an error while queuing: %s" % e)
            return
        display.debug("exiting _queue_task() for %s/%s" % (host.name, task.action))

Here, self._cur_worker is a round-robin index into the _workers list. Each pass through the loop checks one slot and then advances the index by one; when the index reaches the length of _workers it wraps back to zero. A new WorkerProcess is started in the first slot that is still empty (None) or whose process is no longer alive; if the index comes all the way back to where it started without finding a free slot, the loop sleeps briefly and scans again. In effect, the _workers list behaves like a pool of child-process slots of size forks: until every task has been dispatched, whenever a child exits, a fresh child is started in its slot to run the next queued task.
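
To make the slot-scanning loop concrete, here is a standalone simulation (plain multiprocessing, not Ansible code) of the same scheme: a fixed-size pool of slots, round-robin scanning for a slot that is empty or whose process has exited, and a short sleep whenever every slot is still busy:

    import time
    from multiprocessing import Process

    def run_task(name):
        time.sleep(0.1)                 # stand-in for executing a module on a host

    workers = [None] * 3                # forks=3 -> three worker slots
    cur_worker = 0

    def queue_task(name):
        global cur_worker
        queued = False
        starting_worker = cur_worker
        while True:
            prc = workers[cur_worker]
            if prc is None or not prc.is_alive():
                prc = Process(target=run_task, args=(name,))
                workers[cur_worker] = prc
                prc.start()
                queued = True
            cur_worker = (cur_worker + 1) % len(workers)
            if queued:
                break
            elif cur_worker == starting_worker:
                time.sleep(0.0001)      # every slot busy: pause briefly, rescan

    if __name__ == '__main__':
        for i in range(10):
            queue_task('task-%d' % i)
        for prc in workers:
            prc.join()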
