1.Preface
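In the previous post we discussed the Index Table pattern from the cloud design patterns series. We looked at how Azure Table Storage stores data and how applying that pattern in a cloud application can improve system performance. In this post we look at how to coordinate the work of different tasks in a distributed cloud application, which is the subject of the Leader Election pattern.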
2.Concept
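When several tasks run concurrently without a coordinator, they can contend for shared resources, which may lead to deadlock, and the tasks may interfere with one another. Cloud applications are usually scaled out horizontally by running many instances of the same task, so as soon as several instances write to a shared resource at the same time, a coordinator is needed to decide the order of those writes. In addition, the results computed by distributed tasks often have to be combined by a coordinator. In short, the coordinator in this pattern has two responsibilities:
1)Manage how task instances access shared resources, so that contention for those resources does not cause deadlock.
2)Aggregate the results of the distributed tasks.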
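To make these two responsibilities concrete, here is a minimal single-process sketch, assuming threads stand in for task instances and a `queue.Queue` stands in for the messaging channel (in a real cloud deployment these would be separate instances and a queue service). The names `Coordinator`, `worker`, and `shared_log` are hypothetical. Because only the coordinator thread writes to `shared_log`, writes are serialized, and the same thread aggregates the partial results.

```python
import queue
import threading


class Coordinator:
    """Hypothetical coordinator: serializes writes and aggregates partial results."""

    def __init__(self):
        self.inbox = queue.Queue()   # stand-in for a message queue service
        self.shared_log = []         # stand-in for a shared resource
        self.total = 0

    def run(self, expected):
        # Only this thread touches shared_log, so writes never race.
        for _ in range(expected):
            worker_id, partial = self.inbox.get()
            self.shared_log.append((worker_id, partial))
            self.total += partial    # aggregate the distributed results


def worker(worker_id, numbers, inbox):
    # Each worker computes a partial result and hands it to the coordinator.
    inbox.put((worker_id, sum(numbers)))


data = list(range(1, 101))
chunks = [data[i::4] for i in range(4)]   # split the work across 4 workers
coord = Coordinator()
coord_thread = threading.Thread(target=coord.run, args=(len(chunks),))
coord_thread.start()
workers = [threading.Thread(target=worker, args=(i, chunk, coord.inbox))
           for i, chunk in enumerate(chunks)]
for w in workers:
    w.start()
for w in workers:
    w.join()
coord_thread.join()
print(coord.total)  # 5050
```

The order of entries in `shared_log` depends on thread scheduling, but each entry is written by exactly one thread, so no write is lost.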
3.Solution
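We have established that a distributed task-processing system needs one task to act as the coordinator (the leader). How should that task be chosen? The official guidance is that a single task instance should be elected to act as the leader and carry out the two responsibilities above. If all task instances run the same code, any of them could take on the role, so there must be a mechanism that prevents several instances from claiming it at the same time.
The system must provide a robust mechanism for selecting the leader. The coordinated tasks typically communicate with the leader through heartbeats or polling. If the leader stops because of an error, or becomes unreachable because of a network failure, a new leader must be elected from the remaining tasks.
To summarize: the leader and the other tasks exchange heartbeats, and if the leader fails, another one is elected. Ideally the leader should focus on coordination alone. If all task instances run the same code, every instance must be able to act as the leader. The system must ensure that a leader is always present among the tasks, electing a new one whenever the current leader is lost.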
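The heartbeat-and-re-election loop described above can be sketched as follows. This is a simplified, hypothetical illustration: a single `HeartbeatMonitor` records the last heartbeat time of each instance, treats an instance as failed once it has been silent longer than `timeout_s`, and picks the lowest live instance ID as leader. A real system would distribute this state and would have to handle clock skew and network partitions; the injectable `clock` exists only to make the demo deterministic.

```python
import time


class HeartbeatMonitor:
    """Hypothetical monitor: tracks heartbeats and elects the lowest live ID."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_seen = {}   # instance ID -> time of last heartbeat

    def heartbeat(self, instance_id):
        self.last_seen[instance_id] = self.clock()

    def live_instances(self):
        # An instance is considered alive if it was heard from within the timeout.
        now = self.clock()
        return sorted(i for i, t in self.last_seen.items()
                      if now - t <= self.timeout_s)

    def leader(self):
        live = self.live_instances()
        return live[0] if live else None


# Demo with a fake clock so the timing is deterministic.
fake_now = [0.0]
monitor = HeartbeatMonitor(timeout_s=5.0, clock=lambda: fake_now[0])
for iid in (3, 1, 2):
    monitor.heartbeat(iid)
print(monitor.leader())  # 1

fake_now[0] = 4.0
monitor.heartbeat(2)
monitor.heartbeat(3)     # instance 1 has stopped sending heartbeats
fake_now[0] = 6.0
print(monitor.leader())  # 2: instance 1 timed out, so a new leader is chosen
```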
For the strategy used to elect the leader, the official guidance suggests considering the following options:
1)Selecting the task instance with the lowest-ranked instance or process ID.
2)Racing to obtain a shared distributed mutex. The first task instance that acquires the mutex is the leader. However, the system must ensure that, if the leader terminates or becomes disconnected from the rest of the system, the mutex is released to allow another task instance to become the leader.
3)Implementing one of the common leader election algorithms such as the Bully Algorithm or the Ring Algorithm. These algorithms are relatively straightforward, but there are also a number of more sophisticated techniques available. These algorithms assume that each candidate participating in the election has a unique ID, and that they can communicate with the other candidates in a reliable manner.
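The second option, a shared distributed mutex that is released when the leader disappears, is commonly built on time-limited leases. The sketch below uses a hypothetical in-memory `LeaseStore` in place of a real external lock service: the leader must keep renewing its lease, and if it crashes and stops renewing, the lease expires and another instance can acquire it.

```python
import time


class LeaseStore:
    """Hypothetical stand-in for an external lock service with time-limited leases."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.holder = None
        self.expires_at = 0.0

    def try_acquire(self, instance_id, duration_s):
        now = self.clock()
        # Free if nobody holds it or the lease expired; the holder may renew.
        if self.holder is None or now >= self.expires_at or self.holder == instance_id:
            self.holder = instance_id
            self.expires_at = now + duration_s
            return True
        return False

    def release(self, instance_id):
        # Only the current holder can release the lease voluntarily.
        if self.holder == instance_id:
            self.holder = None


lease_clock = [0.0]
lease = LeaseStore(clock=lambda: lease_clock[0])
print(lease.try_acquire("A", 10))  # True: A becomes the leader
print(lease.try_acquire("B", 10))  # False: the lease is held by A
lease_clock[0] = 8.0
print(lease.try_acquire("A", 10))  # True: A renews, lease now expires at 18
lease_clock[0] = 20.0              # A crashed and stopped renewing
print(lease.try_acquire("B", 10))  # True: the lease expired, B takes over
print(lease.holder)                # B
```

Note that this only shifts the problem: as point 4 below says, the lock service itself becomes a dependency and a potential single point of failure.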
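Implementing this pattern is fairly complex and involves election and scheduling algorithms. In general, the following considerations need to be weighed: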
1)The process of electing a leader should be resilient to transient and persistent failures.
2)It must be possible to detect when the leader has failed or has become otherwise unavailable (perhaps due to a communications failure). The speed at which such detection is required will be system dependent. Some systems may be able to function for a short while without a leader, during which time a transient fault that caused the leader to become unavailable may have been rectified. In other cases, it may be necessary to detect leader failure immediately and trigger a new election.
3)In a system that implements horizontal autoscaling, the leader could be terminated if the system scales back and shuts down some of the computing resources.
4)Using a shared distributed mutex introduces a dependency on the availability of the external service that provides the mutex. This service may constitute a single point of failure. If this service should become unavailable for any reason, the system will not be able to elect a leader.
5)Using a single dedicated process as the leader is a relatively straightforward approach. However, if the process fails there may be a significant delay while it is restarted, and the resultant latency may affect the performance and response times of other processes if they are waiting for the leader to coordinate an operation.
6)Implementing one of the leader election algorithms manually provides the greatest flexibility for tuning and optimizing the code.
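As an illustration of implementing an election algorithm by hand, here is a simplified, synchronous simulation of the Bully Algorithm mentioned above. Assumptions: all nodes live in one process, the `cluster` dict stands in for the network, and a node's `alive` flag decides whether it answers an ELECTION message. A node that starts an election contacts every higher-ID node; if none answers, it declares itself leader and notifies the others, otherwise a higher node takes over the election. The highest live ID wins.

```python
class BullyNode:
    """Hypothetical node in a synchronous Bully Algorithm simulation."""

    def __init__(self, node_id, cluster):
        self.node_id = node_id
        self.cluster = cluster   # shared dict: node ID -> node (simulated network)
        self.alive = True
        self.leader = None

    def start_election(self):
        higher = [n for i, n in self.cluster.items() if i > self.node_id]
        # Only live nodes answer an ELECTION message with OK.
        responders = [n for n in higher if n.alive]
        if not responders:
            self.announce()
        else:
            # In the real algorithm every responder starts its own election;
            # forwarding to one of them yields the same winner in this sketch.
            min(responders, key=lambda n: n.node_id).start_election()

    def announce(self):
        # COORDINATOR message: every live node records the new leader.
        for n in self.cluster.values():
            if n.alive:
                n.leader = self.node_id


cluster = {}
for i in range(1, 6):
    cluster[i] = BullyNode(i, cluster)
cluster[5].alive = False      # the old leader (highest ID) crashes
cluster[2].start_election()   # node 2 notices and starts an election
print({i: n.leader for i, n in cluster.items() if n.alive})  # {1: 4, 2: 4, 3: 4, 4: 4}
```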
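As these points show, this pattern is demanding to use. The official guidance recommends it when the tasks in a distributed application, such as a cloud-hosted solution, require careful coordination and there is no natural leader, and lists the following cases where it may not be suitable: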
1)There is a natural leader, that is, a task instance that can always act as the leader.
If there is a natural leader or dedicated process that can always act as the leader. For example, it may be possible to implement a singleton process that coordinates the task instances. If this process fails or becomes unhealthy, the system can shut it down and restart it.
2)Coordination between tasks can be achieved with a more lightweight mechanism, such as locking.
If the coordination between tasks can be easily achieved by using a more lightweight mechanism. For example, if several task instances simply require coordinated access to a shared resource, a preferable solution might be to use optimistic or pessimistic locking to control access to that resource.
3)A mature third-party solution is more appropriate. For example, Microsoft Azure HDInsight (based on Hadoop) relies on Apache Zookeeper, which already implements this coordinator role.
If a third-party solution is more appropriate. For example, the Microsoft Azure HDInsight service (based on Apache Hadoop) uses the services provided by Apache Zookeeper to coordinate the map/reduce tasks that aggregate and summarize data. It’s also possible to install and configure Zookeeper on an Azure Virtual Machine and integrate it into your own solutions, or use the Zookeeper prebuilt virtual machine image available from Microsoft Open Technologies. For more information, see Apache Zookeeper on Microsoft Azure on the Microsoft Open Technologies website.
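For the lightweight alternative in point 2, here is a minimal sketch of optimistic locking. It assumes a hypothetical `VersionedStore` in which every write must present the version it read; if another writer got there first, the write is rejected and the caller retries. This is similar in spirit to ETag-based conditional updates in cloud storage services, but the class is illustrative and is not any service's API.

```python
import threading


class VersionedStore:
    """Hypothetical shared resource guarded by optimistic concurrency."""

    def __init__(self, value):
        self._lock = threading.Lock()   # protects only the compare-and-set step
        self.value = value
        self.version = 0

    def read(self):
        with self._lock:
            return self.value, self.version

    def compare_and_set(self, new_value, expected_version):
        with self._lock:
            if self.version != expected_version:
                return False            # someone else wrote first; caller must retry
            self.value = new_value
            self.version += 1
            return True


def increment(store, times):
    for _ in range(times):
        while True:                     # retry until our write wins
            value, version = store.read()
            if store.compare_and_set(value + 1, version):
                break


store = VersionedStore(0)
threads = [threading.Thread(target=increment, args=(store, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(store.value)  # 4000: no update is lost
```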
4.Example
The best example of this pattern in practice is Azure HDInsight. For more details, refer to the Azure documentation.
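Zookeeper documents a standard leader election recipe: each candidate creates an ephemeral sequential znode under a common election path, the candidate owning the lowest sequence number is the leader, and every other candidate watches only its immediate predecessor. The sketch below imitates that recipe in memory; `FakeElectionPath` and its methods are hypothetical and are not the Zookeeper client API.

```python
import itertools


class FakeElectionPath:
    """Hypothetical in-memory imitation of ephemeral sequential znodes."""

    def __init__(self):
        self._seq = itertools.count()
        self.znodes = {}   # znode name -> owning candidate

    def join(self, candidate):
        # Like creating /election/n_ with a sequential, ephemeral flag.
        name = f"n_{next(self._seq):010d}"
        self.znodes[name] = candidate
        return name

    def session_expired(self, name):
        # An ephemeral znode disappears when its owner's session ends.
        self.znodes.pop(name, None)

    def leader(self):
        # The candidate owning the lowest sequence number is the leader.
        return self.znodes[min(self.znodes)] if self.znodes else None

    def predecessor_of(self, name):
        # Each non-leader watches only the next-lower znode (avoids a herd effect).
        lower = [n for n in self.znodes if n < name]
        return max(lower) if lower else None


election = FakeElectionPath()
a = election.join("worker-a")
b = election.join("worker-b")
c = election.join("worker-c")
print(election.leader())          # worker-a
print(election.predecessor_of(c)) # n_0000000001 (worker-b's znode)
election.session_expired(a)       # worker-a crashes
print(election.leader())          # worker-b
```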
Official description of this pattern: https://msdn.microsoft.com/en-us/library/dn568104.aspx