Sinan: ML-Based and QoS-Aware Resource Management for Cloud Microservices

{2021}, {Yanqi Zhang}, {ASPLOS}

{Citation format}

Summary

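Fill this in last, after the rest of the notes are written: an overview of the paper, to be read first when revisiting these notes. Note: the summary must come from your own thinking and be written in your own words. Do not copy and paste from the original text.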

Research Objective(s)

Microservices complicate resource management, as dependencies between them introduce backpressure effects and cascading QoS violations.

We present Sinan, a data-driven cluster manager for interactive cloud microservices that is online and QoS-aware. Sinan leverages a set of scalable and validated machine learning models to determine the performance impact of dependencies between microservices, and allocates appropriate resources per tier in a way that preserves the end-to-end tail latency target.
Vocabulary: backpressure [bækˈpreʃə], n.: back pressure. Web usage: half-duplex backpressure; reverse pressure.
Vocabulary: cascading: chained, propagating from one stage to the next.

Background / Problem Statement

Research background and problem statement: what problem do the authors need to solve?
Microservices introduce new system challenges, especially in resource management, since the complex topologies of microservice dependencies exacerbate queueing effects and introduce cascading Quality of Service (QoS) violations that are difficult to identify and correct in a timely manner.

Microservices are by design mostly stateless, hence their performance is defined by their CPU allocation. Given this, Sinan primarily focuses on allocating CPU resources to each tier [26].

  1. Dependencies among tiers
  2. System complexity
    Second, the application may include third-party software whose source code cannot be instrumented. Alternatively, expecting the user to express each tier's resource sensitivity is problematic, as users already face difficulties correctly reserving resources for simple, monolithic workloads, leading to well-documented underutilization, and the impact of microservice dependencies is especially hard to assess, even for expert developers.
  3. Delayed queueing effect
  4. Boundaries of resource allocation space

Method(s)

What method or algorithm do the authors use to solve the problem? Does it build on prior methods? If so, which ones?

Sinan first uses an efficient space exploration algorithm to examine the space of possible resource allocations, especially focusing on corner cases that introduce QoS violations. This yields a training dataset used to train two models: a Convolutional Neural Network (CNN) model for detailed short-term performance prediction, and a Boosted Trees model that evaluates the long-term performance evolution. The combination of the two models allows Sinan both to examine the near-future outcome of a resource allocation and to account for the system's inertia in building up queues, with higher accuracy than a single model examining both time windows.
Vocabulary: inertia (US [ɪˈnɜrʃə], UK [ɪˈnɜː(r)ʃə]), n.: the tendency to resist change; sluggishness; lack of vigor; conservatism. Web usage: moment of inertia; inactivity; powerlessness.
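
To make the interplay of the two predictions concrete, here is a minimal sketch of how a scheduler might use them together. Everything in it is an assumption for illustration: the function `choose_allocation`, the dictionary representation of an allocation, the two toy predictors, and the "cheapest safe candidate" rule are hypothetical stand-ins, not Sinan's actual models or policy.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical representation: tier name -> CPU cores assigned to that tier.
Allocation = Dict[str, float]


def choose_allocation(
    candidates: List[Allocation],
    predict_latency_ms: Callable[[Allocation], float],
    predict_violation_prob: Callable[[Allocation], float],
    qos_target_ms: float,
    max_violation_prob: float,
) -> Optional[Allocation]:
    """Return the cheapest candidate that both predictors consider safe.

    predict_latency_ms stands in for the short-term model and
    predict_violation_prob for the long-term model. The selection
    rule (fewest total cores) is an assumption of this sketch.
    """
    safe = [
        a for a in candidates
        if predict_latency_ms(a) <= qos_target_ms
        and predict_violation_prob(a) <= max_violation_prob
    ]
    if not safe:
        return None  # no candidate is predicted to meet QoS
    return min(safe, key=lambda a: sum(a.values()))


# Toy predictors (made up): more cores -> lower latency and lower risk.
def toy_latency(a: Allocation) -> float:
    return 400.0 / sum(a.values())


def toy_violation_prob(a: Allocation) -> float:
    return min(1.0, 2.0 / sum(a.values()))


candidates = [
    {"frontend": 1.0, "db": 1.0},
    {"frontend": 2.0, "db": 2.0},
    {"frontend": 4.0, "db": 4.0},
]
best = choose_allocation(
    candidates, toy_latency, toy_violation_prob,
    qos_target_ms=150.0, max_violation_prob=0.6,
)
print(best)
```

With these toy numbers, the 2-core candidate is rejected by the short-term check (200 ms predicted), and the 4-core candidate is the cheapest one that passes both checks.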

2.3 Management Challenges & the Need for ML

The resource scheduler should have a global view of the microservice graph and be able to anticipate the impact of dependencies on end-to-end performance.

Delayed queueing effect

This delayed queueing effect highlights the need for ML to evaluate the long-term impact of resource allocations.
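
A toy simulation can illustrate why the effect is "delayed". The sketch below is an assumption-laden fluid-queue model, not anything taken from the paper: `simulate_queue`, `first_violation_tick`, the arrival rate, and the capacity values are all made up for illustration. Capacity is cut below the arrival rate at tick 5, but the estimated waiting time only crosses the QoS bound several ticks later, once the queue has built up.

```python
from typing import List, Optional


def simulate_queue(arrivals_per_tick: float,
                   capacity_per_tick: List[float]) -> List[float]:
    """Queue length after each tick of a simple fluid queue."""
    queue = 0.0
    history = []
    for capacity in capacity_per_tick:
        # Work arrives, up to `capacity` requests are served, the rest wait.
        queue = max(0.0, queue + arrivals_per_tick - capacity)
        history.append(queue)
    return history


def first_violation_tick(history: List[float],
                         capacity_per_tick: List[float],
                         qos_ticks: float) -> Optional[int]:
    """First tick whose estimated wait (queue / capacity) exceeds the bound."""
    for t, (queue, capacity) in enumerate(zip(history, capacity_per_tick)):
        if queue / capacity > qos_ticks:
            return t
    return None


# Hypothetical scenario: capacity drops from 120 to 90 requests/tick at tick 5,
# while 100 requests arrive every tick.
CUT_TICK = 5
capacity = [120.0] * CUT_TICK + [90.0] * 15
history = simulate_queue(100.0, capacity)
violation = first_violation_tick(history, capacity, qos_ticks=1.0)
print(f"capacity cut at tick {CUT_TICK}, first violation at tick {violation}")
```

In this toy setup the queue grows by 10 requests per tick after the cut, so the violation shows up at tick 14, nine ticks after the decision that caused it. A controller that only looks at the current latency would see nothing wrong during those intermediate ticks.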

3. Machine learning models

To address this, we designed a two-stage model: first, a CNN that predicts the end-to-end latency of the next timestep with high accuracy, and, second, a Boosted Trees (BT) model that estimates the probability of QoS violations further in the future, using the latent variable extracted by the CNN. BT is generally less prone to overfitting than CNNs, since it has far fewer tunable hyperparameters than NNs, mainly the number of trees and tree depth.
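
The data flow between the two stages can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's models: `short_term_model` replaces the CNN with a per-feature average and a made-up linear read-out, `long_term_model` replaces Boosted Trees with two hand-written depth-1 trees (`STUMPS`), and the input features are hypothetical. The point is only the shape of the pipeline: stage one emits a latency estimate plus a latent vector, and stage two consumes that latent vector to produce a violation probability.

```python
import math
from typing import List, Sequence, Tuple


def short_term_model(window: Sequence[Sequence[float]]) -> Tuple[float, List[float]]:
    """Toy stand-in for the CNN: returns (next-step latency in ms, latent vector)."""
    n = len(window)
    # Latent vector: per-feature mean over the time window.
    # A real CNN would learn this representation instead.
    latent = [sum(column) / n for column in zip(*window)]
    latency_ms = 50.0 + 100.0 * latent[0]  # made-up linear read-out
    return latency_ms, latent


# Each toy tree is a depth-1 stump:
# (feature index, threshold, score if above, score if not above).
STUMPS = [(0, 0.8, 1.5, -1.0), (1, 0.5, 0.5, -0.5)]


def long_term_model(latent: Sequence[float]) -> float:
    """Toy stand-in for Boosted Trees: summed stump scores through a sigmoid."""
    score = sum(hi if latent[f] > thr else lo for f, thr, hi, lo in STUMPS)
    return 1.0 / (1.0 + math.exp(-score))


# Rows are timesteps; columns are two hypothetical features
# (CPU utilisation, queue fill), both normalised to [0, 1].
window = [[0.90, 0.4], [0.95, 0.6], [0.85, 0.8]]
latency_ms, latent = short_term_model(window)
violation_prob = long_term_model(latent)
print(f"latency ~ {latency_ms:.1f} ms, violation probability ~ {violation_prob:.2f}")
```

Note that the toy ensemble has exactly the two knobs the text mentions, the number of trees (length of `STUMPS`) and the tree depth (fixed at 1 here).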

Evaluation

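How do the authors evaluate their method? What is the experimental setup? Which experimental data and results are of interest? Are there any problems, or anything worth borrowing?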

Conclusion

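What conclusions do the authors draw? Which are strong conclusions, and which are weak ones (that is, the authors provide no experimental evidence and only mention them in the discussion, or the experimental data does not give sufficient evidence)?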

Notes

(optional) Notes that do not fit the sections above but need to be recorded.

References

(optional) List highly relevant papers so they can be tracked later.
