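Because a node in Ironic goes through many state changes (see the Ironic state transition diagram for the Rocky release), Ironic manages these states centrally with a finite state machine built on the machines.FiniteMachine library (from the automaton package).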
The machines.FiniteMachine library
machines.FiniteMachine models a state machine object. The object defines which states are valid and how the machine moves between them.
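The state machine object maintains two important data structures:
- self._states: an ordered dict (OrderedDict) holding every valid state. Whenever the machine defines a new state, that state is added to this dict.
- self._transitions: a dict holding, for every state, the mapping from events to the state to move to. When the machine receives an event, it looks up the current state and then the event in this dict to find the next state.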
machines.FiniteMachine is used as follows.
Adding a state
def add_state(self, state, terminal=False, on_enter=None, on_exit=None):
    ......
    self._states[state] = {
        'terminal': bool(terminal),
        'reactions': {},
        'on_enter': on_enter,
        'on_exit': on_exit,
    }
    self._transitions[state] = collections.OrderedDict()
As shown, add_state adds the new state to the self._states dict together with a few attributes of that state, and initializes the state's entry in self._transitions to an empty (ordered) dict.
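Note the on_enter and on_exit attributes: both are callbacks. on_enter is called when the machine enters the state (just before it becomes the current state), and on_exit is called when the machine is about to leave the state.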
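A minimal, self-contained sketch of the add_state behaviour described above follows. TinyMachine is a hypothetical stand-in, not the library class, and its duplicate-name check is an illustrative addition (the excerpt elides whatever checks the real method performs). It shows that adding a state only stores the callbacks without calling them, and gives the state an empty transition map.

```python
import collections


class TinyMachine(object):
    """Hypothetical stand-in for machines.FiniteMachine (add_state only)."""

    def __init__(self):
        self._states = collections.OrderedDict()
        self._transitions = {}

    def add_state(self, state, terminal=False, on_enter=None, on_exit=None):
        if state in self._states:  # illustrative check, not from the excerpt
            raise ValueError("State '%s' already defined" % state)
        # Per-state attributes; callbacks are stored, not called.
        self._states[state] = {
            'terminal': bool(terminal),
            'reactions': {},
            'on_enter': on_enter,
            'on_exit': on_exit,
        }
        # A new state starts with no outgoing transitions.
        self._transitions[state] = collections.OrderedDict()


calls = []
m = TinyMachine()
m.add_state('idle', on_enter=lambda state, event: calls.append(state))
m.add_state('done', terminal=True)
print(list(m._states))               # ['idle', 'done']
print(dict(m._transitions['idle']))  # {}
print(calls)                         # []
```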
Adding a transition
def add_transition(self, start, end, event, replace=False):
    ......
    target = _Jump(end, self._states[end]['on_enter'],
                   self._states[start]['on_exit'])
    self._transitions[start][event] = target
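As shown, add_transition records in self._transitions that, from the start state, the event maps to a _Jump object for the end state. The _Jump object is a small record holding the end state's name, the end state's on_enter callback and the start state's on_exit callback.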
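The following sketch isolates the record that add_transition builds. Modelling _Jump as a collections.namedtuple with the fields name, on_enter and on_exit is an assumption consistent with how the excerpts use it (current.name, current.on_exit, replacement.on_enter); the callbacks and state names are hypothetical. Note which callbacks end up in the record: the end state's on_enter and the start state's on_exit.

```python
import collections

# Assumed shape of the record created by add_transition.
_Jump = collections.namedtuple('_Jump', ['name', 'on_enter', 'on_exit'])


def enter_end(state, event):
    return 'entered %s via %s' % (state, event)


def exit_start(state, event):
    return 'left %s via %s' % (state, event)


states = {
    'start': {'on_enter': None, 'on_exit': exit_start},
    'end': {'on_enter': enter_end, 'on_exit': None},
}
transitions = {'start': {}, 'end': {}}

# Same construction as add_transition(start='start', end='end', event='go').
transitions['start']['go'] = _Jump('end', states['end']['on_enter'],
                                   states['start']['on_exit'])

jump = transitions['start']['go']
print(jump.name)                      # end
print(jump.on_enter is enter_end)     # True
print(jump.on_exit is exit_start)     # True
```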
Initializing the state
Calling initialize sets the machine's initial state.
def initialize(self, start_state=None):
    ...
    if start_state is None:
        start_state = self._default_start_state
    if start_state not in self._states:
        raise excp.NotFound("Can not start from a undefined"
                            " state '%s'" % (start_state))
    if self._states[start_state]['terminal']:
        raise excp.InvalidState("Can not start from a terminal"
                                " state '%s'" % (start_state))
    # No on enter will be called, since we are priming the state machine
    # and have not really transitioned from anything to get here, we will
    # though allow on_exit to be called on the event that causes this
    # to be moved from...
    self._current = _Jump(start_state, None,
                          self._states[start_state]['on_exit'])
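The checks in initialize can be exercised in isolation with the sketch below. It is a hypothetical free function over a plain dict of states; NotFound and InvalidState are local stand-ins for the library's excp exceptions, and the state names are made up. As in the excerpt, the initial record carries no on_enter callback, because the machine did not transition into the start state.

```python
import collections

_Jump = collections.namedtuple('_Jump', ['name', 'on_enter', 'on_exit'])


class NotFound(Exception):
    """Local stand-in for the library's excp.NotFound."""


class InvalidState(Exception):
    """Local stand-in for the library's excp.InvalidState."""


def initialize(states, start_state=None, default_start_state=None):
    """Return the initial record, applying the same checks as the excerpt."""
    if start_state is None:
        start_state = default_start_state
    if start_state not in states:
        raise NotFound("Can not start from an undefined state '%s'"
                       % start_state)
    if states[start_state]['terminal']:
        raise InvalidState("Can not start from a terminal state '%s'"
                           % start_state)
    # Priming, not transitioning: keep on_exit, skip on_enter.
    return _Jump(start_state, None, states[start_state]['on_exit'])


states = {
    'ready': {'terminal': False, 'on_exit': None},
    'finished': {'terminal': True, 'on_exit': None},
}
print(initialize(states, default_start_state='ready'))
# _Jump(name='ready', on_enter=None, on_exit=None)
```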
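Both the Ironic API and the conductor's task_manager use this. Taking task_manager as an example: when a task_manager is created, it loads the node from the database and assigns it to the self.node attribute, which is a property with a setter. Note that the call also passes target_state: Ironic uses its own subclass, fsm.FSM, whose initialize accepts a target state as well.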
@node.setter
def node(self, node):
    self._node = node
    if node is not None:
        self.fsm.initialize(start_state=self.node.provision_state,
                            target_state=self.node.target_provision_state)
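The property-setter pattern can be reproduced without Ironic, as in the sketch below. FakeFSM, FakeNode and Task are hypothetical stand-ins; the point is that merely assigning task.node primes the machine from the node's stored provision_state and target_provision_state. Note that fsm must exist before node is assigned, since the setter uses it.

```python
class FakeFSM(object):
    """Hypothetical stand-in that only records what initialize() received."""

    def __init__(self):
        self.current_state = None
        self.target_state = None

    def initialize(self, start_state=None, target_state=None):
        self.current_state = start_state
        self.target_state = target_state


class FakeNode(object):
    def __init__(self, provision_state, target_provision_state=None):
        self.provision_state = provision_state
        self.target_provision_state = target_provision_state


class Task(object):
    def __init__(self, node):
        self.fsm = FakeFSM()  # must exist before the setter runs
        self.node = node      # goes through the setter below

    @property
    def node(self):
        return self._node

    @node.setter
    def node(self, node):
        self._node = node
        if node is not None:
            # Prime the machine from the state stored on the node.
            self.fsm.initialize(start_state=self.node.provision_state,
                                target_state=self.node.target_provision_state)


task = Task(FakeNode('clean wait', 'available'))
print(task.fsm.current_state, task.fsm.target_state)  # clean wait available
```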
Processing an event (usually a state transition)
def process_event(self, event):
    ...
    current = self._current
    replacement = self._transitions[current.name][event]
    if current.on_exit is not None:
        current.on_exit(current.name, event)
    if replacement.on_enter is not None:
        replacement.on_enter(replacement.name, event)
    self._current = replacement
    result = self._effect_builder(self._states[replacement.name], event)
    ...
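When handling an event, process_event first looks up in self._transitions the next state for the current state and the event. It then calls the current state's on_exit callback, then the next state's on_enter callback, and finally records the next state as the current one (and builds its effect via _effect_builder).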
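The call order described above can be checked with a small runnable model. MiniFSM is a hypothetical reduction of the library class: it has no terminal states and no _effect_builder, and it raises a plain KeyError for an undefined event where the real library raises its own exception type. The shared watcher callbacks mimic the **watchers pattern used in Ironic's states.py.

```python
import collections

_Jump = collections.namedtuple('_Jump', ['name', 'on_enter', 'on_exit'])


class MiniFSM(object):
    """Hypothetical minimal machine reproducing the process_event order."""

    def __init__(self):
        self._states = {}
        self._transitions = {}
        self._current = None

    def add_state(self, state, on_enter=None, on_exit=None):
        self._states[state] = {'on_enter': on_enter, 'on_exit': on_exit}
        self._transitions[state] = {}

    def add_transition(self, start, end, event):
        self._transitions[start][event] = _Jump(
            end, self._states[end]['on_enter'], self._states[start]['on_exit'])

    def initialize(self, start_state):
        self._current = _Jump(start_state, None,
                              self._states[start_state]['on_exit'])

    def process_event(self, event):
        current = self._current
        if event not in self._transitions[current.name]:
            raise KeyError("%r not allowed in %r" % (event, current.name))
        replacement = self._transitions[current.name][event]
        if current.on_exit is not None:
            current.on_exit(current.name, event)           # 1. leave old state
        if replacement.on_enter is not None:
            replacement.on_enter(replacement.name, event)  # 2. enter new state
        self._current = replacement                        # 3. commit
        return replacement.name


log = []
watchers = dict(on_enter=lambda s, e: log.append(('enter', s, e)),
                on_exit=lambda s, e: log.append(('exit', s, e)))
fsm = MiniFSM()
fsm.add_state('deploying', **watchers)
fsm.add_state('active', **watchers)
fsm.add_transition('deploying', 'active', 'done')
fsm.initialize('deploying')
fsm.process_event('done')
print(log)  # [('exit', 'deploying', 'done'), ('enter', 'active', 'done')]
```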
How Ironic uses the state machine
Ironic defines its state machine in /ironic/common/states.py. Part of the code is shown below:
...
machine = fsm.FSM()
# Add stable states
for state in STABLE_STATES:
    machine.add_state(state, stable=True, **watchers)
# Add verifying state
machine.add_state(VERIFYING, target=MANAGEABLE, **watchers)
# Add deploy* states
# NOTE(deva): Juno shows a target_provision_state of DEPLOYDONE
# this is changed in Kilo to ACTIVE
machine.add_state(DEPLOYING, target=ACTIVE, **watchers)
machine.add_state(DEPLOYWAIT, target=ACTIVE, **watchers)
machine.add_state(DEPLOYFAIL, target=ACTIVE, **watchers)
# A deployment may fail
machine.add_transition(DEPLOYING, DEPLOYFAIL, 'fail')
# A failed deployment may be retried
# ironic/conductor/manager.py:do_node_deploy()
machine.add_transition(DEPLOYFAIL, DEPLOYING, 'rebuild')
# NOTE(deva): Juno allows a client to send "active" to initiate a rebuild
machine.add_transition(DEPLOYFAIL, DEPLOYING, 'deploy')
# A deployment may also wait on external callbacks
machine.add_transition(DEPLOYING, DEPLOYWAIT, 'wait')
machine.add_transition(DEPLOYWAIT, DEPLOYING, 'resume')
# A deployment waiting on callback may time out
machine.add_transition(DEPLOYWAIT, DEPLOYFAIL, 'fail')
# A deployment may complete
machine.add_transition(DEPLOYING, ACTIVE, 'done')
# An active instance may be re-deployed
# ironic/conductor/manager.py:do_node_deploy()
machine.add_transition(ACTIVE, DEPLOYING, 'rebuild')
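This code runs when the module is imported: the add_state calls register the set of valid states, and the add_transition calls register, for each state, which event leads to which next state.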
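The deploy-related transitions registered above can be read as a lookup table. The sketch below copies exactly those eight add_transition calls into a dict keyed by (start, event) and replays an event sequence through it. It is a simplification: the constant names are used as state values instead of Ironic's real state strings, and the transitions of all other states are omitted.

```python
# The eight deploy transitions from the excerpt: (start, event) -> end.
DEPLOY_TRANSITIONS = {
    ('DEPLOYING', 'fail'): 'DEPLOYFAIL',
    ('DEPLOYFAIL', 'rebuild'): 'DEPLOYING',
    ('DEPLOYFAIL', 'deploy'): 'DEPLOYING',
    ('DEPLOYING', 'wait'): 'DEPLOYWAIT',
    ('DEPLOYWAIT', 'resume'): 'DEPLOYING',
    ('DEPLOYWAIT', 'fail'): 'DEPLOYFAIL',
    ('DEPLOYING', 'done'): 'ACTIVE',
    ('ACTIVE', 'rebuild'): 'DEPLOYING',
}


def replay(start, events):
    """Apply events in order; raise if an event is undefined for a state."""
    path = [start]
    for event in events:
        key = (path[-1], event)
        if key not in DEPLOY_TRANSITIONS:
            raise ValueError("event %r invalid in state %r"
                             % (event, path[-1]))
        path.append(DEPLOY_TRANSITIONS[key])
    return path


print(replay('DEPLOYING', ['wait', 'resume', 'done']))
# ['DEPLOYING', 'DEPLOYWAIT', 'DEPLOYING', 'ACTIVE']
```

Asking for an event the table does not define, such as 'wait' while ACTIVE, raises instead of silently moving the node.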
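So how are a node's state transitions triggered? In Ironic, every operation on a node, including changing its state, is carried out through a TaskManager object. Some of its attributes and methods are shown below: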
class TaskManager(object):
    def __init__(self, context, node_id, shared=False,
                 purpose='unspecified action',
                 load_driver=True):
        ...
        self.fsm = states.machine.copy()

    # ... (node property getter omitted)

    @node.setter
    def node(self, node):
        self._node = node
        if node is not None:
            self.fsm.initialize(start_state=self.node.provision_state,
                                target_state=self.node.target_provision_state)

    def process_event(self, event, callback=None, call_args=None,
                      call_kwargs=None, err_handler=None, target_state=None):
        self.fsm.process_event(event, target_state=target_state)
        ...
        self.node.provision_state = self.fsm.current_state
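When a TaskManager is created, it sets self.fsm to a copy of the module-level states.machine, i.e. the machine already populated with the states and transitions shown above, so each task advances its own machine. When the TaskManager's node is set, the node's provision_state and target_provision_state are used to initialize the machine's starting state. When the node has to react to an event, process_event advances the machine and writes the resulting state back to the node's provision_state.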
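The copy, initialize, advance, persist cycle can be sketched as follows. Machine, Node and MiniTaskManager are hypothetical simplifications (a dict-driven machine copied with copy.deepcopy rather than the library's own copy()). The point is that each task works on its own copy, so advancing it changes the node but leaves the module-level template untouched.

```python
import copy


class Machine(object):
    """Hypothetical table-driven machine: (state, event) -> next state."""

    def __init__(self, table):
        self.table = table
        self.current_state = None

    def copy(self):
        return copy.deepcopy(self)

    def initialize(self, start_state):
        self.current_state = start_state

    def process_event(self, event):
        self.current_state = self.table[(self.current_state, event)]


# Module-level template, built once at import time (like states.machine).
TEMPLATE = Machine({('CLEANING', 'wait'): 'CLEANWAIT',
                    ('CLEANWAIT', 'resume'): 'CLEANING'})


class Node(object):
    def __init__(self, provision_state):
        self.provision_state = provision_state


class MiniTaskManager(object):
    def __init__(self, node):
        self.fsm = TEMPLATE.copy()  # private copy per task
        self.node = node
        self.fsm.initialize(node.provision_state)

    def process_event(self, event):
        self.fsm.process_event(event)
        # Persist the new state on the node, as in TaskManager.process_event.
        self.node.provision_state = self.fsm.current_state


node = Node('CLEANING')
task = MiniTaskManager(node)
task.process_event('wait')
print(node.provision_state, TEMPLATE.current_state)  # CLEANWAIT None
```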
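For example, during node cleaning, once the cleaning preparation (switching networks, setting PXE options) has been started and turns out to be asynchronous, the node is moved to the "clean wait" state: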
def _do_node_clean(self, task, clean_steps=None):
    ...
    try:
        prepare_result = task.driver.deploy.prepare_cleaning(task)
    except Exception as e:
        msg = (_('Failed to prepare node %(node)s for cleaning: %(e)s')
               % {'node': node.uuid, 'e': e})
        LOG.exception(msg)
        return utils.cleaning_error_handler(task, msg)

    if prepare_result == states.CLEANWAIT:
        # Prepare is asynchronous, the deploy driver will need to
        # set node.driver_internal_info['clean_steps'] and
        # node.clean_step and then make an RPC call to
        # continue_node_cleaning to start cleaning.
        # For manual cleaning, the target provision state is MANAGEABLE,
        # whereas for automated cleaning, it is AVAILABLE (the default).
        target_state = states.MANAGEABLE if manual_clean else None
        task.process_event('wait', target_state=target_state)
        return
Here, during cleaning, the 'wait' event moves the node from the cleaning state into the "clean wait" state.
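The target-state branch in _do_node_clean can be isolated as below. The one-entry transition table and the helper clean_wait are hypothetical; the excerpt elides how manual_clean is computed, so here it is just a boolean argument. The fallback to AVAILABLE reflects the code comment stating that automated cleaning uses the default target.

```python
CLEANING = 'cleaning'
CLEANWAIT = 'clean wait'
MANAGEABLE = 'manageable'
AVAILABLE = 'available'

# Only the transition used in the excerpt: CLEANING --wait--> CLEANWAIT.
TRANSITIONS = {(CLEANING, 'wait'): CLEANWAIT}


def clean_wait(state, manual_clean, default_target=AVAILABLE):
    """Return (new provision state, resulting target) for the 'wait' event."""
    target_state = MANAGEABLE if manual_clean else None
    new_state = TRANSITIONS[(state, 'wait')]
    # None means "no explicit target": fall back to the default one.
    if target_state is None:
        target_state = default_target
    return new_state, target_state


print(clean_wait(CLEANING, manual_clean=True))   # ('clean wait', 'manageable')
print(clean_wait(CLEANING, manual_clean=False))  # ('clean wait', 'available')
```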