Using the finite state machine in ironic

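A node in ironic goes through many state changes (see the ironic state transition diagram for the R release). To keep them manageable, ironic uses a finite state machine (the machines.FiniteMachine library) to manage all of these states in one place.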

The machines.FiniteMachine library

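The machines.FiniteMachine library provides a state machine object that defines which states are valid and how the machine moves between them.
The state machine object maintains two important data structures: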

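  • self._states: an ordered dictionary holding the set of all valid states. Each state defined on the machine is added to this dictionary.
  • self._transitions: a dictionary holding, for every state, the map from event to resulting state. When the machine receives an event, it looks up the current state and the event in this dictionary to find the state it should move to.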
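The two tables can be pictured with a minimal sketch. This is a simplified stand-in written for illustration, not the library's code: the module-level `states` and `transitions` variables and the two helper functions are assumptions, and the state names are placeholders.

```python
import collections

# Simplified stand-ins for self._states and self._transitions.
states = collections.OrderedDict()
transitions = {}


def add_state(name, terminal=False):
    # Register a valid state and give it an empty event map.
    states[name] = {'terminal': bool(terminal)}
    transitions[name] = collections.OrderedDict()


def add_transition(start, end, event):
    # Record that `event`, received while in `start`, leads to `end`.
    transitions[start][event] = end


add_state('deploying')
add_state('active')
add_transition('deploying', 'active', 'done')

# Lookup: current state + event -> next state.
next_state = transitions['deploying']['done']
print(list(states))
print(next_state)
```

Keeping the states in an ordered dictionary preserves the order in which they were defined, while the nested event map keeps each lookup a plain dictionary access.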

The main operations of machines.FiniteMachine are as follows:

Adding a state

    def add_state(self, state, terminal=False, on_enter=None, on_exit=None):
        ......
        self._states[state] = {
            'terminal': bool(terminal),
            'reactions': {},
            'on_enter': on_enter,
            'on_exit': on_exit,
        }
        self._transitions[state] = collections.OrderedDict()

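As shown, add_state adds the new state to the self._states dictionary together with a few attributes of that state, and initializes the state's entry in self._transitions to an empty ordered dictionary.
Note the on_enter and on_exit attributes, both of which are callbacks: on_enter is called when the machine is about to enter the state, and on_exit is called when the machine is about to leave it.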

Adding a transition

    def add_transition(self, start, end, event, replace=False):
        ......
        target = _Jump(end, self._states[end]['on_enter'],
                       self._states[start]['on_exit'])
        self._transitions[start][event] = target

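As shown, add_transition records in self._transitions that the start state, on receiving event, maps to a _Jump object for end. The _Jump object is just a small record: the name of end, the on_enter callback of end, and the on_exit callback of start.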

Initializing the state machine

Calling initialize sets the starting state of a state machine instance.

    def initialize(self, start_state=None):
        ...
        if start_state is None:
            start_state = self._default_start_state
        if start_state not in self._states:
            raise excp.NotFound("Can not start from a undefined"
                                " state '%s'" % (start_state))
        if self._states[start_state]['terminal']:
            raise excp.InvalidState("Can not start from a terminal"
                                    " state '%s'" % (start_state))
        # No on enter will be called, since we are priming the state machine
        # and have not really transitioned from anything to get here, we will
        # though allow on_exit to be called on the event that causes this
        # to be moved from...
        self._current = _Jump(start_state, None,
                              self._states[start_state]['on_exit'])

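initialize is used both in ironic's API and in the conductor's task_manager. Taking task_manager as an example: when it is initialized, it loads the node from the database and assigns it to self.node, which is a property with the setter below. The call also passes target_state, which the base initialize shown above does not take; ironic's own fsm.FSM class, the one instantiated in states.py, accepts it.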

    @node.setter
    def node(self, node):
        self._node = node
        if node is not None:
            self.fsm.initialize(start_state=self.node.provision_state,
                                target_state=self.node.target_provision_state)
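The two checks that initialize performs before priming the machine can be sketched in isolation. This is a hedged illustration only: the exception classes stand in for excp.NotFound and excp.InvalidState, and the function name, the `STATES` table and its state names are assumptions made for the example.

```python
class NotFound(Exception):
    """Stand-in for excp.NotFound."""


class InvalidState(Exception):
    """Stand-in for excp.InvalidState."""


# Hypothetical state table: one ordinary state and one terminal state.
STATES = {
    'deploying': {'terminal': False},
    'finished': {'terminal': True},
}


def check_start_state(states, start_state):
    # A machine can only start from a state that has been defined...
    if start_state not in states:
        raise NotFound("Can not start from an undefined state '%s'"
                       % start_state)
    # ...and that is not terminal, since nothing could ever leave it.
    if states[start_state]['terminal']:
        raise InvalidState("Can not start from a terminal state '%s'"
                           % start_state)
    return start_state


results = {}
for candidate in ('deploying', 'unknown', 'finished'):
    try:
        results[candidate] = check_start_state(STATES, candidate)
    except (NotFound, InvalidState) as exc:
        results[candidate] = type(exc).__name__
print(results)
```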

Processing an event (usually a state transition)

def process_event(self, event):
    ...
    current = self._current
    replacement = self._transitions[current.name][event]
    if current.on_exit is not None:
        current.on_exit(current.name, event)
    if replacement.on_enter is not None:
        replacement.on_enter(replacement.name, event)
    self._current = replacement
    result = self._effect_builder(self._states[replacement.name], event)
    ...

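When handling an event, process_event first uses self._transitions to find the next state for the current state and the event, then calls the current state's on_exit callback, then the next state's on_enter callback, and finally sets the new current state.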
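The callback order described above can be checked with a toy machine that mirrors the excerpts. `MiniMachine` is a hypothetical, cut-down class written for this example; it omits the validation, the replace flag and the effect builder of the real library, and the state names are placeholders.

```python
import collections

_Jump = collections.namedtuple('_Jump', ['name', 'on_enter', 'on_exit'])


class MiniMachine:
    """Toy machine mirroring the excerpts above; not the library itself."""

    def __init__(self):
        self._states = collections.OrderedDict()
        self._transitions = {}
        self._current = None

    def add_state(self, state, on_enter=None, on_exit=None):
        self._states[state] = {'on_enter': on_enter, 'on_exit': on_exit}
        self._transitions[state] = collections.OrderedDict()

    def add_transition(self, start, end, event):
        self._transitions[start][event] = _Jump(
            end, self._states[end]['on_enter'],
            self._states[start]['on_exit'])

    def initialize(self, start_state):
        # No on_enter here: the machine is primed, not transitioned.
        self._current = _Jump(start_state, None,
                              self._states[start_state]['on_exit'])

    def process_event(self, event):
        current = self._current
        replacement = self._transitions[current.name][event]
        if current.on_exit is not None:
            current.on_exit(current.name, event)
        if replacement.on_enter is not None:
            replacement.on_enter(replacement.name, event)
        self._current = replacement


calls = []  # records the order in which callbacks fire
m = MiniMachine()
m.add_state('deploying', on_exit=lambda s, e: calls.append(('exit', s, e)))
m.add_state('active', on_enter=lambda s, e: calls.append(('enter', s, e)))
m.add_transition('deploying', 'active', 'done')
m.initialize('deploying')
m.process_event('done')
print(calls)
```

The recorded list shows the exit callback of the old state firing before the enter callback of the new one, and the current state is only replaced after both have run.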

How ironic uses the state machine

ironic defines its states and transitions in /ironic/common/states.py. Part of the code is listed below:

...

machine = fsm.FSM()

# Add stable states
for state in STABLE_STATES:
    machine.add_state(state, stable=True, **watchers)

# Add verifying state
machine.add_state(VERIFYING, target=MANAGEABLE, **watchers)

# Add deploy* states
# NOTE(deva): Juno shows a target_provision_state of DEPLOYDONE
#             this is changed in Kilo to ACTIVE
machine.add_state(DEPLOYING, target=ACTIVE, **watchers)
machine.add_state(DEPLOYWAIT, target=ACTIVE, **watchers)
machine.add_state(DEPLOYFAIL, target=ACTIVE, **watchers)

# A deployment may fail
machine.add_transition(DEPLOYING, DEPLOYFAIL, 'fail')

# A failed deployment may be retried
# ironic/conductor/manager.py:do_node_deploy()
machine.add_transition(DEPLOYFAIL, DEPLOYING, 'rebuild')
# NOTE(deva): Juno allows a client to send "active" to initiate a rebuild
machine.add_transition(DEPLOYFAIL, DEPLOYING, 'deploy')

# A deployment may also wait on external callbacks
machine.add_transition(DEPLOYING, DEPLOYWAIT, 'wait')
machine.add_transition(DEPLOYWAIT, DEPLOYING, 'resume')

# A deployment waiting on callback may time out
machine.add_transition(DEPLOYWAIT, DEPLOYFAIL, 'fail')

# A deployment may complete
machine.add_transition(DEPLOYING, ACTIVE, 'done')

# An active instance may be re-deployed
# ironic/conductor/manager.py:do_node_deploy()
machine.add_transition(ACTIVE, DEPLOYING, 'rebuild')

When this file is imported, the code calls add_state to build the set of valid states and add_transition to build the event map for each state.
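To see what the deploy-related transitions add up to, they can be copied into a flat lookup table and walked event by event. This is only a sketch: the `TRANSITIONS` dictionary and the `walk` helper are assumptions for illustration, and the strings are symbolic labels for the constants in the excerpt rather than their actual values.

```python
# (start, event) -> end, copied from the add_transition calls in the excerpt.
TRANSITIONS = {
    ('DEPLOYING', 'fail'): 'DEPLOYFAIL',
    ('DEPLOYFAIL', 'rebuild'): 'DEPLOYING',
    ('DEPLOYFAIL', 'deploy'): 'DEPLOYING',
    ('DEPLOYING', 'wait'): 'DEPLOYWAIT',
    ('DEPLOYWAIT', 'resume'): 'DEPLOYING',
    ('DEPLOYWAIT', 'fail'): 'DEPLOYFAIL',
    ('DEPLOYING', 'done'): 'ACTIVE',
    ('ACTIVE', 'rebuild'): 'DEPLOYING',
}


def walk(start, events):
    # Follow a sequence of events and return every state visited.
    state = start
    path = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]  # KeyError if not allowed
        path.append(state)
    return path


# A deployment that waits on a callback, resumes, and completes.
path = walk('DEPLOYING', ['wait', 'resume', 'done'])
print(path)

# An event with no entry for the current state is rejected.
try:
    walk('ACTIVE', ['wait'])
    rejected = False
except KeyError:
    rejected = True
print(rejected)
```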
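So how are ironic's node state transitions triggered? Every operation on a node goes through a TaskManager object, and state changes are no exception. Some of TaskManager's attributes and methods are listed below: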

class TaskManager(object):
    def __init__(self, context, node_id, shared=False,
                 purpose='unspecified action',
                 load_driver=True):
        ...
        self.fsm = states.machine.copy()
    
    @node.setter
    def node(self, node):
        self._node = node
        if node is not None:
            self.fsm.initialize(start_state=self.node.provision_state,
                                target_state=self.node.target_provision_state)
    
    def process_event(self, event, callback=None, call_args=None,
                      call_kwargs=None, err_handler=None, target_state=None):
        self.fsm.process_event(event, target_state=target_state)
        ...

        self.node.provision_state = self.fsm.current_state

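On initialization, TaskManager sets self.fsm to a copy of the machine configured above. When a node is assigned to the TaskManager, the node's provision_state and target_provision_state set the machine's starting state for that node. When an event requires the node to change state, process_event performs the transition and writes the resulting state back to the node's provision_state.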
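The copy, initialize, process, write-back pattern can be sketched without ironic. Everything here is hypothetical and heavily simplified: `ToyFSM`, `ToyTask` and `FakeNode` are illustrative names, target states and locking are left out, and the copy is made with copy.deepcopy as an assumption about what a per-task copy needs.

```python
import copy


class ToyFSM:
    """Hypothetical stand-in for the shared, module-level machine."""

    def __init__(self):
        self.transitions = {}
        self.current_state = None

    def add_transition(self, start, end, event):
        self.transitions.setdefault(start, {})[event] = end

    def copy(self):
        # Each task gets its own machine so tasks never share a cursor.
        return copy.deepcopy(self)

    def initialize(self, start_state):
        self.current_state = start_state

    def process_event(self, event):
        self.current_state = self.transitions[self.current_state][event]


# Configured once, like states.machine at import time.
machine = ToyFSM()
machine.add_transition('deploying', 'active', 'done')


class FakeNode:
    def __init__(self, provision_state):
        self.provision_state = provision_state


class ToyTask:
    def __init__(self, node):
        self.fsm = machine.copy()
        self.node = node
        # Start the private machine from the node's stored state.
        self.fsm.initialize(start_state=node.provision_state)

    def process_event(self, event):
        self.fsm.process_event(event)
        # Write the new state back to the node.
        self.node.provision_state = self.fsm.current_state


node = FakeNode('deploying')
task = ToyTask(node)
task.process_event('done')
print(node.provision_state)
```

Because each task works on its own copy, the shared template never acquires a current state, so any number of tasks can start from different nodes without interfering with one another.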

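For example, when a node is being cleaned, once the cleaning preparation (network switching and PXE options) has been done and prepare_cleaning returns states.CLEANWAIT, the node is moved to the "clean wait" state: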

    def _do_node_clean(self, task, clean_steps=None):
        ...
        try:
            prepare_result = task.driver.deploy.prepare_cleaning(task)
        except Exception as e:
            msg = (_('Failed to prepare node %(node)s for cleaning: %(e)s')
                   % {'node': node.uuid, 'e': e})
            LOG.exception(msg)
            return utils.cleaning_error_handler(task, msg)

        if prepare_result == states.CLEANWAIT:
            # Prepare is asynchronous, the deploy driver will need to
            # set node.driver_internal_info['clean_steps'] and
            # node.clean_step and then make an RPC call to
            # continue_node_cleaning to start cleaning.

            # For manual cleaning, the target provision state is MANAGEABLE,
            # whereas for automated cleaning, it is AVAILABLE (the default).
            target_state = states.MANAGEABLE if manual_clean else None
            task.process_event('wait', target_state=target_state)
            return

Here, during cleaning, the 'wait' event leads to the "clean wait" state.
