Cloud Design Pattern - Leader Election Pattern

1. Preface

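The previous article covered the Index Table pattern: how Azure Table Storage stores data, and how the pattern can be applied in a cloud application to improve performance. This article looks at how to coordinate the work of different tasks in a distributed cloud application.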

2. Concepts

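When multiple tasks run at the same time without a coordinator, they tend to contend for shared resources, which can lead to deadlock, and they may interfere with one another. In a cloud environment, horizontal scaling is commonly used to relieve contention for shared resources, but as soon as multiple instances write to a shared resource at the same time, a coordinator is needed to order the writes. In addition, the results produced by distributed tasks need a coordinator to aggregate them. In this pattern, then, the task coordinator has two jobs:

1) Manage task instances' access to shared resources, so that contention does not lead to deadlock.

2) Aggregate the results of the distributed tasks.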

3. Solution

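We have established that a distributed task-processing system needs one task to act as the coordinator. On what basis is that coordinator chosen? The official guidance is that a single task instance should take on the coordinator (leader) role and handle the two jobs described above. If all task instances run the same code, a mechanism is needed to prevent more than one instance from claiming the coordinator role.

The system must provide a robust mechanism for selecting a task as the coordinator. The coordinated tasks typically communicate with the coordinator through heartbeats or polling. If the coordinator stops because of a failure, or becomes unreachable because of a network fault, a new coordinator must be chosen from the remaining tasks.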

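To summarize: the coordinator and the other tasks communicate through heartbeats, and if the coordinator fails, another one is chosen. The coordinator should ideally handle only the coordination work itself. If all task instances run the same code, every task must be capable of performing the coordinator's basic functions. The system must provide a mechanism that ensures a coordinator always exists among the tasks.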
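The heartbeat check described above can be sketched as follows. This is a minimal illustration, not the official implementation: the names `HeartbeatMonitor` and `FakeClock` and the 5-second timeout are assumptions made for the example, and a real system would receive heartbeats over the network rather than through a direct method call.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat seen from the coordinator (hypothetical helper)."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self._clock = clock
        self._last_seen = None  # no heartbeat received yet

    def record_heartbeat(self):
        """Call this whenever a heartbeat from the coordinator arrives."""
        self._last_seen = self._clock()

    def leader_is_alive(self):
        """True if a heartbeat was seen within the timeout window."""
        if self._last_seen is None:
            return False
        return (self._clock() - self._last_seen) <= self.timeout_seconds


class FakeClock:
    """Manually advanced clock so the example runs without sleeping."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


if __name__ == "__main__":
    clock = FakeClock()
    monitor = HeartbeatMonitor(timeout_seconds=5.0, clock=clock)
    monitor.record_heartbeat()
    clock.advance(3.0)
    print(monitor.leader_is_alive())  # True: 3s since the last heartbeat
    clock.advance(3.0)
    print(monitor.leader_is_alive())  # False: 6s exceeds the 5s timeout
```

When `leader_is_alive()` returns False, a task would start a new election. The timeout is a trade-off: a short value detects failure quickly but may trigger needless elections during brief network hiccups.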

For the strategy used to choose the coordinator, the official guidance suggests considering the following:

1)Selecting the task instance with the lowest-ranked instance or process ID.

2)Racing to obtain a shared distributed mutex. The first task instance that acquires the mutex is the leader. However, the system must ensure that, if the leader terminates or becomes disconnected from the rest of the system, the mutex is released to allow another task instance to become the leader.

3)Implementing one of the common leader election algorithms such as the Bully Algorithm or the Ring Algorithm. These algorithms are relatively straightforward, but there are also a number of more sophisticated techniques available. These algorithms assume that each candidate participating in the election has a unique ID, and that they can communicate with the other candidates in a reliable manner.
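The second strategy above (racing for a shared mutex that is released if the leader disappears) is often approximated with a lease that expires unless it is renewed. The sketch below is an in-memory stand-in only: `LeaseStore`, its method names, and the 10-second TTL are assumptions for illustration, and a real deployment would rely on an external service to hold the lease.

```python
import threading
import time


class LeaseStore:
    """In-memory stand-in for an external lease service (hypothetical)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._lock = threading.Lock()  # makes each operation atomic
        self._holder = None
        self._expires_at = 0.0

    def try_acquire(self, candidate_id, ttl_seconds):
        """Grant the lease if it is free or expired; renew it if already ours."""
        with self._lock:
            now = self._clock()
            free = self._holder is None or now >= self._expires_at
            if free or self._holder == candidate_id:
                self._holder = candidate_id
                self._expires_at = now + ttl_seconds
                return True
            return False

    def release(self, candidate_id):
        """Give up the lease voluntarily, for example on graceful shutdown."""
        with self._lock:
            if self._holder == candidate_id:
                self._holder = None

    def current_leader(self):
        """Return the holder of an unexpired lease, or None."""
        with self._lock:
            if self._holder is not None and self._clock() < self._expires_at:
                return self._holder
            return None


if __name__ == "__main__":
    now = [0.0]  # manually controlled time, so no sleeping is needed
    store = LeaseStore(clock=lambda: now[0])
    print(store.try_acquire("task-a", 10))  # True: task-a wins the race
    print(store.try_acquire("task-b", 10))  # False: lease is held by task-a
    now[0] = 11.0                           # task-a stopped renewing
    print(store.try_acquire("task-b", 10))  # True: the lease expired
    print(store.current_leader())           # task-b
```

The leader keeps its role by calling `try_acquire` again before the TTL runs out. If it crashes or is cut off, the lease simply expires, which is how the "mutex is released" requirement is met without the failed leader having to do anything.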

Implementing this pattern is fairly complex and involves task-scheduling algorithms. In general, the following points need to be weighed:

1)The process of electing a leader should be resilient to transient and persistent failures.

2)It must be possible to detect when the leader has failed or has become otherwise unavailable (perhaps due to a communications failure). The speed at which such detection is required will be system dependent. Some systems may be able to function for a short while without a leader, during which time a transient fault that caused the leader to become unavailable may have been rectified. In other cases, it may be necessary to detect leader failure immediately and trigger a new election.

3)In a system that implements horizontal autoscaling, the leader could be terminated if the system scales back and shuts down some of the computing resources.

4)Using a shared distributed mutex introduces a dependency on the availability of the external service that provides the mutex. This service may constitute a single point of failure. If this service should become unavailable for any reason, the system will not be able to elect a leader.

5)Using a single dedicated process as the leader is a relatively straightforward approach. However, if the process fails there may be a significant delay while it is restarted, and the resultant latency may affect the performance and response times of other processes if they are waiting for the leader to coordinate an operation.

6)Implementing one of the leader election algorithms manually provides the greatest flexibility for tuning and optimizing the code.
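To give a feel for what implementing an election algorithm by hand involves, here is a simplified, single-process simulation of the Bully Algorithm. It is a sketch under strong assumptions: message passing is replaced by a set of IDs that currently respond (`alive_ids`), the function name `bully_elect` is invented for the example, and timeouts and concurrent elections are ignored. In the classic Bully Algorithm the live candidate with the highest ID wins.

```python
def bully_elect(initiator, alive_ids):
    """Return the leader chosen when `initiator` starts an election.

    alive_ids: set of candidate IDs that currently answer messages.
    """
    if initiator not in alive_ids:
        raise ValueError("the initiator must itself be alive")

    current = initiator
    while True:
        # `current` sends an ELECTION message to every higher ID;
        # only live candidates answer.
        responders = sorted(i for i in alive_ids if i > current)
        if not responders:
            # Nobody higher answered, so `current` announces itself as leader.
            return current
        # A higher candidate answered and takes over the election.
        # (Real implementations let every responder run its own election;
        # following one of them reaches the same winner.)
        current = responders[0]


if __name__ == "__main__":
    # Candidate 4 has failed; candidate 2 notices and starts an election.
    print(bully_elect(2, {1, 2, 3, 5}))  # 5
    # Candidate 5 then fails as well; candidate 1 starts an election.
    print(bully_elect(1, {1, 2, 3}))     # 3
```

Even this toy version shows why the algorithm assumes unique IDs and reliable communication: the outcome depends entirely on which higher-ranked candidates are seen to respond.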

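As these points show, the pattern is demanding to apply. On the question of when it fits, the official guidance notes that the pattern might not be suitable in the following cases: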

1) A task instance exists that can always act as the task leader.

If there is a natural leader or dedicated process that can always act as the leader. For example, it may be possible to implement a singleton process that coordinates the task instances. If this process fails or becomes unhealthy, the system can shut it down and restart it.

2) Coordination between tasks can be achieved with a more lightweight mechanism, which avoids unnecessary performance overhead.

If the coordination between tasks can be easily achieved by using a more lightweight mechanism. For example, if several task instances simply require coordinated access to a shared resource, a preferable solution might be to use optimistic or pessimistic locking to control access to that resource.

3) A stable third-party service is a better fit. Apache Zookeeper already implements the coordinator role, and Microsoft's Azure HDInsight (based on Hadoop) uses it.

If a third-party solution is more appropriate. For example, the Microsoft Azure HDInsight service (based on Apache Hadoop) uses the services provided by Apache Zookeeper to coordinate the map/reduce tasks that aggregate and summarize data. It’s also possible to install and configure Zookeeper on an Azure Virtual Machine and integrate it into your own solutions, or use the Zookeeper prebuilt virtual machine image available from Microsoft Open Technologies. For more information, see Apache Zookeeper on Microsoft Azure on the Microsoft Open Technologies website.
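The second case above mentions optimistic locking as a lighter alternative to electing a leader. A minimal sketch of the idea follows, assuming an in-memory resource guarded by a version number. `VersionedResource`, `write_if_unchanged`, and `increment` are hypothetical names for this example; a real store would typically expose a comparable conditional write, for instance one based on a version tag.

```python
import threading


class VersionedResource:
    """Shared value guarded by a version number (hypothetical in-memory store)."""

    def __init__(self, value):
        self._lock = threading.Lock()  # protects the compare-and-set itself
        self._value = value
        self._version = 0

    def read(self):
        """Return the current value together with its version."""
        with self._lock:
            return self._value, self._version

    def write_if_unchanged(self, new_value, expected_version):
        """Write only if nobody else has written since `expected_version`."""
        with self._lock:
            if self._version != expected_version:
                return False  # stale read: the caller must re-read and retry
            self._value = new_value
            self._version += 1
            return True


def increment(resource, max_retries=10):
    """Read-modify-write loop that retries when another writer got in first."""
    for _ in range(max_retries):
        value, version = resource.read()
        if resource.write_if_unchanged(value + 1, version):
            return True
    return False


if __name__ == "__main__":
    counter = VersionedResource(0)
    value, version = counter.read()
    print(counter.write_if_unchanged(1, version))  # True: first writer wins
    print(counter.write_if_unchanged(2, version))  # False: version is stale
    print(increment(counter))                      # True: re-reads, then writes
    print(counter.read())                          # (2, 2)
```

No task holds a special role here: every writer detects conflicts on its own and retries, which is why this can be preferable when the only requirement is coordinated access to one shared resource.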

4. Example

The best example of this pattern in practice is Azure HDInsight. For details, see the Azure documentation.

Official description of this pattern: https://msdn.microsoft.com/en-us/library/dn568104.aspx







