(This article is based on Ansible 2.7.)
In base.yml, the DEFAULT_INTERNAL_POLL_INTERVAL parameter is defined as follows:
DEFAULT_INTERNAL_POLL_INTERVAL:
  name: Internal poll interval
  default: 0.001
  env: []
  ini:
  - {key: internal_poll_interval, section: defaults}
  type: float
  version_added: "2.2"
  description:
    - This sets the interval (in seconds) of Ansible internal processes polling each other.
      Lower values improve performance with large playbooks at the expense of extra CPU load.
      Higher values are more suitable for Ansible usage in automation scenarios,
      when UI responsiveness is not required but CPU usage might be a concern.
    - "The default corresponds to the value hardcoded in Ansible <= 2.1"
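The description is already clear: a lower value buys better execution performance for large playbooks at the cost of extra CPU load. In (operations) automation scenarios, where CPU usage matters more than UI responsiveness (so that compute resources are used efficiently), a higher value is the better fit.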
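As a minimal sketch of what overriding this setting looks like, the Python snippet below builds an ansible.cfg fragment in memory and reads it back as a float. The section name (defaults) and key name (internal_poll_interval) come from the ini entry in the definition above. The value 0.1 is only an illustrative choice, and the variable names are hypothetical.

```python
import configparser
import io

# Build an ansible.cfg fragment in memory. Section and key names are taken
# from the `ini` entry of the definition; 0.1 is an illustrative value only.
config = configparser.ConfigParser()
config["defaults"] = {"internal_poll_interval": "0.1"}

buffer = io.StringIO()
config.write(buffer)
cfg_text = buffer.getvalue()
print(cfg_text)

# Read the fragment back and interpret the option as a float,
# matching the `type: float` declared in the definition.
parsed = configparser.ConfigParser()
parsed.read_string(cfg_text)
interval = parsed.getfloat("defaults", "internal_poll_interval")
print(interval)
```

The printed fragment is what would go into the [defaults] section of ansible.cfg.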
This parameter is used only in the strategy code, for example in StrategyBase:
def _wait_on_handler_results(self, iterator, handler, notified_hosts):
    '''
    Wait for the handler tasks to complete, using a short sleep
    between checks to ensure we don't spin lock
    '''
    ret_results = []
    handler_results = 0
    display.debug("waiting for handler results...")
    while (self._pending_results > 0 and
           handler_results < len(notified_hosts) and
           not self._tqm._terminated):
        if self._tqm.has_dead_workers():
            raise AnsibleError("A worker was found in a dead state")
        results = self._process_pending_results(iterator)
        ret_results.extend(results)
        handler_results += len([
            r._host for r in results if r._host in notified_hosts and
            r.task_name == handler.name])
        if self._pending_results > 0:
            time.sleep(C.DEFAULT_INTERNAL_POLL_INTERVAL)
    display.debug("no more pending handlers, returning what we have")
    return ret_results

def _wait_on_pending_results(self, iterator):
    '''
    Wait for the shared counter to drop to zero, using a short sleep
    between checks to ensure we don't spin lock
    '''
    ret_results = []
    display.debug("waiting for pending results...")
    while self._pending_results > 0 and not self._tqm._terminated:
        if self._tqm.has_dead_workers():
            raise AnsibleError("A worker was found in a dead state")
        results = self._process_pending_results(iterator)
        ret_results.extend(results)
        if self._pending_results > 0:
            time.sleep(C.DEFAULT_INTERNAL_POLL_INTERVAL)
    display.debug("no more pending results, returning what we have")
    return ret_results
Both methods sleep inside a loop. A 1 ms interval is too short; changing it to 0.1 (100 ms) gives noticeably better behavior.
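To make the effect of the interval concrete, here is a small self-contained simulation, not Ansible code. It assumes a background job that finishes after a fixed time and counts how often a polling loop wakes up while waiting for it. The function name count_polls and the 0.3 second job duration are hypothetical, chosen only for illustration.

```python
import threading
import time


def count_polls(poll_interval, work_seconds=0.3):
    """Poll a background job until it finishes; return the number of checks."""
    done = threading.Event()
    # Stand-in for a worker process: it simply finishes after work_seconds.
    worker = threading.Timer(work_seconds, done.set)
    worker.start()

    polls = 0
    while not done.is_set():
        polls += 1                 # one wake-up of the polling loop
        time.sleep(poll_interval)  # plays the role of the internal poll interval
    worker.join()
    return polls


if __name__ == "__main__":
    for interval in (0.001, 0.1):
        print(interval, count_polls(interval))
```

The 1 ms loop wakes up far more often than the 100 ms loop for the same amount of work, which is where the extra CPU load comes from. The price of the longer interval is that a finished job may be noticed up to one interval late.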
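Summary: this article covers the same ground as "Ansible Performance Optimization (1): Reducing the Check Frequency of the Worker Process List". Both lower CPU usage by reducing polling frequency. High-frequency polling does more than reduce the server's capacity and complicate task scheduling; it also shows up as huge differences in task execution data within the same play. Sometimes tasks that run identical content and differ only in their target have execution times that differ by a factor of several hundred, which has also caused considerable trouble for our statistical analysis. As the parameter description quoted above says, lowering CPU usage is very beneficial in automation scenarios.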