Cloud Design Pattern - Compute Resource Consolidation

1. Introduction

The previous article covered the Competing Consumers pattern; this one looks at the Compute Resource Consolidation pattern. Among today's popular software architectures, microservices are highly regarded. The microservice architecture is the single responsibility principle put into practice at the architectural level: split the system into independent services, draw clear service boundaries, and minimize coupling between services. This fits the basic principles of object-oriented design, since each service can be replaced independently; the overall architecture becomes more loosely coupled and more extensible, in line with the open/closed principle (open for extension, closed for modification).

From a development-management perspective, the benefit is that services can go live independently, which makes agile development more agile and shortens release cycles; for products that must iterate quickly, Internet products above all, this matters enormously. This article, however, is about consolidating resources, that is, grouping certain microservices together, which plainly runs against the currently popular microservice trend. Why? Design must suit the circumstances: partition microservices at a granularity that matches the actual business requirements, because finer-grained is not always better. Judging the right granularity for real requirements is the true yardstick of architectural skill.

2. Concept

Jiang Jinnan has a saying I find very sound: before you start learning a technology, first work out why it exists. Behind the emergence or popularity of any technology there are usually deep reasons. In the introduction we already sensed that this pattern is different, a fish swimming upstream. To analyze why any technique arose, start from the environment it is used in, and this is no exception: in the cloud, services are deployed in different processes, on different servers, even in data centers in different regions, and every call between services pays a price in latency. The finer the service partitioning, the more pronounced that latency becomes, so we must consider how to choose the right granularity.

In a typical deployment, each service runs in its own process. The cloud's central promise is elasticity and high resource utilization. Imagine every service running on its own server: scarce resources such as CPU, I/O, and network could never reach high utilization, and that is when services need to be consolidated. Resource utilization is the basic criterion for deciding microservice granularity.

With all that said, which aspects should we weigh when applying this pattern to get the greatest benefit?


Consider the following points when implementing this pattern:

  • Scalability and Elasticity. Many cloud solutions implement scalability and elasticity at the level of the computational unit by starting and stopping instances of units. Avoid grouping tasks that have conflicting scalability requirements in the same computational unit.
  • Lifetime. The cloud infrastructure may periodically recycle the virtual environment that hosts a computational unit. When executing many long-running tasks inside a computational unit, it may be necessary to configure the unit to prevent it from being recycled until these tasks have finished. Alternatively, design the tasks by using a check-pointing approach that enables them to stop cleanly, and continue at the point at which they were interrupted when the computational unit is restarted.
  • Release Cadence. If the implementation or configuration of a task changes frequently, it may be necessary to stop the computational unit hosting the updated code, reconfigure and redeploy the unit, and then restart it. This process will also require that all other tasks within the same computational unit are stopped, redeployed, and restarted. Consider moving tasks that change frequently into their own computational unit so that updating them does not disturb the others.
  • Security. Tasks in the same computational unit may share the same security context and be able to access the same resources. There must be a high degree of trust between the tasks, and confidence that one task is not going to corrupt or adversely affect another. Additionally, increasing the number of tasks running in a computational unit may increase the attack surface of the computational unit; each task is only as secure as the one with the most vulnerabilities.
  • Fault Tolerance. If one task in a computational unit fails or behaves abnormally, it can affect the other tasks running within the same unit. For example, if one task fails to start correctly it may cause the entire startup logic for the computational unit to fail, and prevent other tasks in the same unit from running.
  • Contention. Avoid introducing contention between tasks that compete for resources in the same computational unit. Ideally, tasks that share the same computational unit should exhibit different resource utilization characteristics. For example, two compute-intensive tasks should probably not reside in the same computational unit, and neither should two tasks that consume large amounts of memory. However, mixing a compute intensive task with a task that requires a large amount of memory may be a viable combination.
  • Complexity. Combining multiple tasks into a single computational unit adds complexity to the code in the unit, possibly making it more difficult to test, debug, and maintain.
  • Stable Logical Architecture. Design and implement the code in each task so that it should not need to change, even if the physical environment in which the task runs does change.
  • Other Strategies. Consolidating compute resources is only one way to help reduce costs associated with running multiple tasks concurrently. It requires careful planning and monitoring to ensure that it remains an effective approach. Other strategies may be more appropriate, depending on the nature of the work being performed and the location of the users on whose behalf these tasks are running. For example, functional decomposition of the workload (as described by the Compute Partitioning Guidance) may be a better option.
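The check-pointing approach mentioned under Lifetime can be sketched as follows. This is a minimal, hypothetical illustration (all class and method names here are invented for the example): a long-running task records its progress after each unit of work, so that when the computational unit is recycled, a restarted instance resumes where it left off. A local file stands in for durable storage purely for illustration; a real role would persist checkpoints to something like Azure blob or table storage.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch of check-pointing: progress is persisted after each
// item so a restarted instance can resume instead of starting over.
public static class CheckpointedTask
{
    public static int LoadCheckpoint(string path) =>
        File.Exists(path) ? int.Parse(File.ReadAllText(path)) : 0;

    public static void SaveCheckpoint(string path, int progress) =>
        File.WriteAllText(path, progress.ToString());

    public static async Task RunAsync(string path, int totalItems, CancellationToken ct)
    {
        // Resume from the last recorded position (0 on a fresh start).
        int current = LoadCheckpoint(path);
        while (current < totalItems)
        {
            ct.ThrowIfCancellationRequested();

            // PROCESS ITEM 'current' HERE

            current++;
            SaveCheckpoint(path, current); // persist progress after each item
            await Task.Yield();
        }
    }
}
```

Because progress is written after every item, the worst case after a recycle is re-processing a single item, so the work should be idempotent.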

3. Example

When developing a Windows Azure Cloud Service, we can choose which tasks each role contains, and the fabric controller schedules these roles.

Typically we use a web role and a worker role: the web role handles web requests, while the worker role handles background processing. The following code shows the skeleton of a worker role.

public class WorkerRole: RoleEntryPoint
{
  // The cancellation token source used to cooperatively cancel running tasks.
  private readonly CancellationTokenSource cts = new CancellationTokenSource();

  // List of tasks running on the role instance.
  private readonly List<Task> tasks = new List<Task>();

  // List of worker tasks to run on this role.
  private readonly List<Func<CancellationToken, Task>> workerTasks  
                        = new List<Func<CancellationToken, Task>>
    {
      MyWorkerTask1,
      MyWorkerTask2
    };
  
  ...
}
Our tasks are MyWorkerTask1 and MyWorkerTask2. How is MyWorkerTask1 defined?

// A sample worker role task.
private static async Task MyWorkerTask1(CancellationToken ct)
{
  // Fixed interval to wake up and check for work and/or do work.
  var interval = TimeSpan.FromSeconds(30);

  try
  {
    while (!ct.IsCancellationRequested)
    {
      // Wake up and do some background processing if not canceled.
      // TASK PROCESSING CODE HERE
      Trace.TraceInformation("Doing Worker Task 1 Work");

      // Go back to sleep for a period of time unless asked to cancel.
      // Task.Delay will throw an OperationCanceledException when canceled.
      await Task.Delay(interval, ct);
    }
  }
  catch (OperationCanceledException)
  {
    // Expect this exception to be thrown in normal circumstances or check
    // the cancellation token. If the role instances are shutting down, a
    // cancellation request will be signaled.
    Trace.TraceInformation("Stopping service, cancellation requested");

    // Re-throw the exception.
    throw;
  }
}
The role starts its tasks in the overridden Run() method and stops them with a private Stop() helper. First, the Run() method:

...
// RoleEntry Run() is called after OnStart().  
// Returning from Run() will cause a role instance to recycle.
public override void Run()
{
  // Start worker tasks and add them to the task list.
  foreach (var worker in workerTasks)
    tasks.Add(worker(cts.Token));

  Trace.TraceInformation("Worker host tasks started");
  // The assumption is that all tasks should remain running and not return, 
  // similar to role entry Run() behavior.
  try
  {
    Task.WaitAny(tasks.ToArray());
  }
  catch (AggregateException ex)
  {
    Trace.TraceError(ex.Message);

    // If any of the inner exceptions in the aggregate exception 
    // are not cancellation exceptions then re-throw the exception.
    ex.Handle(innerEx => (innerEx is OperationCanceledException));
  }

  // If there was not a cancellation request, stop all tasks and return from Run()
  // An alternative to cancelling and returning when a task exits would be to 
  // restart the task.
  if (!cts.IsCancellationRequested)
  {
    Trace.TraceInformation("Task returned without cancellation request");
    Stop(TimeSpan.FromMinutes(5));
  }
}
...
And the Stop() helper:

// Stop running tasks and wait for tasks to complete before returning 
// unless the timeout expires.
private void Stop(TimeSpan timeout)
{
  Trace.TraceInformation("Stop called. Canceling tasks.");
  // Cancel running tasks.
  cts.Cancel();

  Trace.TraceInformation("Waiting for canceled tasks to finish and return");

  // Wait for all the tasks to complete before returning. Note that the 
  // emulator currently allows 30 seconds and Azure allows five
  // minutes for processing to complete.
  try
  {
    Task.WaitAll(tasks.ToArray(), timeout);
  }
  catch (AggregateException ex)
  {
    Trace.TraceError(ex.Message);

    // If any of the inner exceptions in the aggregate exception 
    // are not cancellation exceptions then re-throw the exception.
    ex.Handle(innerEx => (innerEx is OperationCanceledException));
  }
}
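The cancellation mechanics that Run() and Stop() rely on can be demonstrated in a small, self-contained sketch (the class and method names below are invented for the example): Task.Delay throws an OperationCanceledException once its token is canceled, and AggregateException.Handle re-throws any inner exception that is not a cancellation, mirroring the role's error handling.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Standalone demonstration of the cooperative-cancellation pattern used
// by the worker role's Run() and Stop() methods above.
public static class CancellationDemo
{
    public static bool RunAndStop()
    {
        var cts = new CancellationTokenSource();

        // A looping worker, like MyWorkerTask1, that sleeps between checks.
        var task = Task.Run(async () =>
        {
            while (!cts.Token.IsCancellationRequested)
                await Task.Delay(TimeSpan.FromMilliseconds(50), cts.Token);
        });

        cts.Cancel(); // what Stop() does to signal shutdown

        try
        {
            Task.WaitAll(task);
        }
        catch (AggregateException ex)
        {
            // Swallow cancellation exceptions only; anything else re-throws.
            ex.Handle(inner => inner is OperationCanceledException);
        }

        // The worker ended either by observing the flag or by being canceled.
        return task.IsCanceled || task.Status == TaskStatus.RanToCompletion;
    }
}
```

The same shape scales to many tasks: cancel one shared token, then Task.WaitAll over the whole list, exactly as the role's Stop() does.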
For more on Windows Azure web roles and worker roles, see:

http://www.tuicool.com/articles/FbAJFbN

4. Related Reading

Related Patterns and Guidance

The following patterns and guidance may also be relevant when implementing this pattern:

  • Autoscaling Guidance. Autoscaling can be used to start and stop instances of service hosting computational resources, depending on the anticipated demand for processing.
  • Compute Partitioning Guidance. This guidance describes how to allocate the services and components in a cloud service in a way that helps to minimize running costs while maintaining the scalability, performance, availability, and security of the service.

More Information

This pattern has a sample application associated with it. You can download the "Cloud Design Patterns – Sample Code" from the Microsoft Download Center at http://aka.ms/cloud-design-patterns-sample.


