Kubernetes in a High-Traffic Environment: 3 Key Takeaways

This post assumes technical expertise from the reader, including a working knowledge of Amazon Web Services (AWS) and Kubernetes.

Introduction

upday started its Kubernetes journey on two smaller product lines with very light traffic last year. Armed with the confidence of running production workloads on Kubernetes, we then worked on migrating our primary product line, the upday platform.

This post summarizes the strategies we employed to handle the specific challenges we face at our scale.

A Quick Recap of our Architecture

A summary of the architecture can be read here. Although the underlying systems have evolved since that post, our core value proposition remains the same at a high level: we deliver personalised news content to our millions of users.

The upday platform currently handles 5k RPS during normal workloads and up to 15k RPS during a news push notification. In some exceptional cases, depending on the importance of the news and the time of day, this RPS count can climb far higher. This is after an 80% cache hit ratio with our CDN on specific routes.

A Quick Recap of our Migration Strategy

We adopted Infrastructure Strangulation to migrate the applications to Kubernetes. The Strangler pattern is an established Cloud Architecture Design pattern documented by Martin Fowler and Microsoft.

[Figure: Strangler Pattern in action. Source: docs.microsoft.com]

The Strangler pattern takes an incremental approach to replacing a legacy system with a new one. It is achieved by using a facade (the Strangler Facade) that consistently abstracts both the existing and the new implementations.

We modified our deployment pipelines to run the applications on both the Legacy and Kubernetes infrastructures. We also modified our Edge infrastructure to be able to route traffic to both. Most cutovers were handled at the DNS level. When we were confident about the outcome of a migration (i.e. the application and its functionality worked as expected, no reports of issues from users, zero alarms from monitoring, statsd metrics looking the same as before, etc.), we simply stopped sending traffic to the Legacy infrastructure and decommissioned it a few days later.

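As a rough illustration of such a DNS-level cutover (not necessarily how ours was implemented), weighted DNS records let a small share of traffic flow to the new infrastructure and be dialed up gradually. The sketch below assumes Route 53 weighted records expressed in CloudFormation; the zone, record names and targets are placeholders.

    # Hypothetical illustration of a weighted DNS cutover (Route 53, CloudFormation).
    # Zone, record names and targets are placeholders.
    Resources:
      LegacyRecord:
        Type: AWS::Route53::RecordSet
        Properties:
          HostedZoneName: example.com.
          Name: api.example.com.
          Type: CNAME
          TTL: "60"
          SetIdentifier: legacy
          Weight: 90                      # most traffic stays on the legacy stack
          ResourceRecords:
            - legacy-lb.example.com
      KubernetesRecord:
        Type: AWS::Route53::RecordSet
        Properties:
          HostedZoneName: example.com.
          Name: api.example.com.
          Type: CNAME
          TTL: "60"
          SetIdentifier: kubernetes
          Weight: 10                      # dial this up as confidence grows
          ResourceRecords:
            - k8s-ingress-lb.example.com

Once the Kubernetes records carry all the traffic and monitoring stays quiet, the legacy record, and eventually the legacy infrastructure behind it, can be removed.
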
The takeaways

1. Load balancing for Latency

TL;DR: The default load balancing algorithm for Kubernetes ingress, round robin, causes the applications behind the ingress to receive imbalanced traffic. The imbalance is very visible when traffic is high and bursty. The Peak EWMA algorithm helps achieve an even distribution of traffic.

In the modern software ecosystem, load balancing plays several roles; it enables Scalability and Resilience for our systems. In a distributed microservices architecture, we also have to consider another aspect: Latency.

Load balancing algorithms like round robin address the Scalability and Resilience aspects easily. For latency, the load balancer also has to account for each downstream application/server's recent performance. One load balancing algorithm that tracks this aspect is Peak EWMA (Exponentially Weighted Moving Average).

[Figure: Peak EWMA in action]

The Peak EWMA algorithm works by maintaining a moving average of each replica's round-trip time, weighted by the number of outstanding requests, and distributing traffic to replicas where that cost function is smallest.

We have observed that the Peak EWMA load balancing algorithm is very effective at distributing traffic across replicas, especially when they auto-scale.

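In ingress-nginx, the load balancing algorithm is selected through the controller's ConfigMap. A minimal sketch, assuming a controller installed with a ConfigMap named ingress-nginx-controller in the ingress-nginx namespace (the name and namespace depend on how the controller was installed):

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: ingress-nginx-controller   # name/namespace vary per installation
      namespace: ingress-nginx
    data:
      load-balance: "ewma"             # default is "round_robin"

Recent controller versions also support a per-Ingress override through the nginx.ingress.kubernetes.io/load-balance annotation, which is handy for trying the algorithm on a single application first.
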
More information on this can be found at:

2. Local DNS Cache

TL;DR: In cloud environments, everything has limits. The KubeDNS component (CoreDNS), by default, resolves its upstream queries against the VPC DNS resolver from a limited number of nodes. It is very easy to hit the limits enforced by the resolver, resulting in DNS lookup failures. Using nodelocaldns distributes the DNS resolution load across all the worker nodes.

DNS is one of the important pillars of any networked infrastructure. Kubernetes utilizes CoreDNS as its authoritative DNS server. When we reached a certain scale, a significant number of DNS errors started to appear in our logs, leading to a lot of retries. Operations that were supposed to take a few minutes were taking orders of magnitude longer.

In our case, we were hitting the limits on DNS resolution imposed by the VPC resolver on a node. There are many ways to reduce the number of DNS resolutions made, like adjusting the default DNS timeouts for OpenJDK or applying ndots fixes. Those fixes helped a bit, but beyond them we still needed a way to handle more DNS traffic. The Kubernetes developers have an add-on, NodeLocal DNSCache, that specifically addresses this kind of problem.

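As an example of the ndots fix mentioned above: with the Kubernetes default of ndots:5, a lookup for an external name first walks all the cluster search domains, multiplying the queries sent to the resolver. A minimal sketch of lowering it through the Pod-level dnsConfig, with placeholder names and an illustrative value:

    apiVersion: v1
    kind: Pod
    metadata:
      name: sample-api                   # placeholder
    spec:
      dnsConfig:
        options:
          - name: ndots
            value: "2"                   # default is 5; fewer search-domain expansions per lookup
      containers:
        - name: app
          image: registry.example.com/sample-api:latest   # placeholder image
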
[Figure: DNS requests before and after implementing NodeLocal DNSCache]

NodeLocal DNSCache improves cluster DNS performance by running a DNS caching agent on cluster nodes as a DaemonSet. Pods can then reach the DNS caching agent running on the same node, thereby avoiding iptables DNAT rules and connection tracking.

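How pods actually reach the node-local cache depends on the cluster setup. A minimal sketch of one common wiring, assuming the upstream add-on's usual link-local listen address 169.254.20.10 and a kubelet that hands this address to pods; in kube-proxy iptables mode the cache can instead bind the existing kube-dns service IP, in which case the kubelet configuration stays unchanged:

    apiVersion: kubelet.config.k8s.io/v1beta1
    kind: KubeletConfiguration
    clusterDNS:
      - 169.254.20.10        # node-local DNS cache instead of the kube-dns service IP
    clusterDomain: cluster.local
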
Implementing this reduced the volume of DNS queries to a large extent. Burst traffic no longer affects DNS resolution, which is in line with the AWS recommendation to cache DNS.

More information on this can be found at:

3. Multiple Ingress

TL;DR: Every product line has different SLA requirements and traffic patterns. In such cases, traffic can be isolated at the ingress level by deploying multiple ingress controllers (with different ingress classes) for different applications or product lines.

In layman’s terms, a Kubernetes ingress is used to expose Kubernetes services to external requests. Ingresses usually interface with a load balancer to help handle the traffic.

We use ingress-nginx, maintained by the Kubernetes team. It is built on top of NGINX, the stable open-source reverse proxy and load balancer. During our migration, ingress-nginx was able to replace our spring-api-gateway with ingress definitions and authenticate API requests with its auth-url annotation.

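A minimal sketch of that auth-url pattern, with placeholder host, backend and authentication endpoint: the controller sends a sub-request to the auth URL for every incoming request and only forwards the request to the backend when the auth service responds with a 2xx status.

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: news-api                      # placeholder
      annotations:
        nginx.ingress.kubernetes.io/auth-url: "http://auth-service.auth.svc.cluster.local/validate"
    spec:
      ingressClassName: nginx
      rules:
        - host: api.example.com           # placeholder
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: news-service    # placeholder
                    port:
                      number: 8080
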
We have multiple product lines, some catering to B2C traffic and some to B2B traffic. Although they are separated by namespaces, node groups and affinities, burst traffic in the B2C product line started causing latency for the B2B products and affecting their SLA commitments, because all of them used the same ingress controller. Therefore, we launched multiple ingress controllers, each with a different ingress-class name, to isolate the traffic, as sketched below.

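A minimal sketch of that class-based separation, assuming a second ingress-nginx deployment configured to claim its own class (for example through its --controller-class / --ingress-class settings, depending on the controller version); all names are placeholders:

    apiVersion: networking.k8s.io/v1
    kind: IngressClass
    metadata:
      name: nginx-b2b
    spec:
      controller: k8s.io/ingress-nginx-b2b   # must match what the second controller is configured to claim
    ---
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: partner-api                      # placeholder B2B ingress
    spec:
      ingressClassName: nginx-b2b            # B2B traffic only; B2C Ingresses keep the default class
      rules:
        - host: partner.example.com          # placeholder
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: partner-service    # placeholder
                    port:
                      number: 8080

Each controller is then fronted by its own load balancer, so a burst on the B2C side does not compete for the NGINX workers and connections that serve B2B clients.
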
[Figure: Multi-Ingress in action]

By separating ingress controllers, we isolate each product line's traffic at the ingress, handle high latency during burst-traffic scenarios, and even contain the blast radius for that product line.

We now have a truly isolated setup for each product line at the ingress layer too. Since this implementation, we have observed that the traffic of one product does not affect the others.

More information on this can be found at:

Conclusion

The current solutions are stable and work consistently for our high-traffic requirements. Further improvements can be made by splitting into multiple clusters, rearchitecting our hot paths, adding caching, etc.

We would love to read your ideas, questions or comments about this topic. Thank you for reading!

Original article: https://medium.com/upday-devs/kubernetes-on-a-high-traffic-environment-3-key-takeaways-39d3852fb515
