Improving JVM Warm-up on Kubernetes


JVM warm-up is a notorious problem. JVM-based applications deliver great performance, but they need some time to "warm up" before reaching top speed. When the application launches, it usually starts with reduced performance. This can be attributed to things like Just-In-Time (JIT) compilation, which optimizes frequently used code by collecting usage profile information. The net negative effect is that requests received during this warm-up period have much higher response times than the average. This problem can be exacerbated in containerized, high-throughput, frequently deployed, and auto-scaled environments.

In this post, I will discuss our experience with JVM warm-up issues with Java services in our Kubernetes cluster, mitigation approaches we tried — what worked/what didn’t, and our overall learnings.

Genesis

OLX started out with a monolithic PHP application. A few years back, we started migrating to a microservice-based architecture on Kubernetes by gradually carving out services from the monolith. Most of the new services were developed in Java. We first encountered this issue when we made one such service live in India, a high-traffic market for OLX. We carried out our usual process of capacity planning by load testing and determined N pods to be sufficient to handle more than the expected peak traffic.

Although the service was handling the peak traffic without breaking a sweat, we started seeing issues during deployments. Each of our pods was handling more than 10k RPM during peak time, and we were using the Kubernetes rolling update mechanism. During a deployment, the response time of the service would spike for a few minutes before settling down to the usual steady state. In our NewRelic dashboard, we would see a graph similar to this:

[Figure: NewRelic response-time graph showing the spike during a deployment]

Meanwhile, we started receiving a lot of complaints from our dependent services about high response times and timeout errors during time periods that aligned with our deployments.

Take 1: Throw Money At The Problem

We quickly realized that the issue had something to do with the JVM warm-up phase, but didn't have a lot of time to investigate because of other important things that were going on. So we tried the easiest solution: increase the number of pods in order to reduce per-pod throughput. We increased the number of pods almost three-fold so that each pod handled ~4k RPM at peak. We also tuned our deployment strategy to ensure a maximum of 25% rollout at a time (using the maxSurge and maxUnavailable parameters). That solved the problem, and we were able to deploy without any issues in our service or any of the dependent services, though we were running at 3x the capacity required for steady state.

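For illustration, the rollout cap looks roughly like this in a Deployment spec; the service name, replica count, and image are placeholders rather than our actual values:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-java-service                # placeholder name
spec:
  replicas: 12                         # ~3x the steady-state estimate
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%                    # at most 25% extra pods created during a rollout
      maxUnavailable: 25%              # at most 25% of pods taken down at a time
  selector:
    matchLabels:
      app: my-java-service
  template:
    metadata:
      labels:
        app: my-java-service
    spec:
      containers:
        - name: my-java-service
          image: registry.example.com/my-java-service:latest   # placeholder image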

As we migrated more services over the next few months, we started noticing the issue frequently in other services as well. We then decided to spend some time investigating the issue and finding a better solution.

Take 2: The Warm-Up Script

After perusing various articles, we decided to give a warm-up script a try. Our version of the idea was to run a warm-up script that sent synthetic requests to the service for a couple of minutes, in the hope that it would warm up the JVM, and only then allow actual traffic to reach it.

To create the warm-up script, we scraped actual URLs from production traffic. We then created a Python script that sent parallel requests using those URLs. We configured the initialDelaySeconds of the readiness probe accordingly, to ensure that the warm-up script finished before the pod became ready and started accepting traffic.

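A minimal sketch of the probe side of that setup; the health endpoint, port, and timings here are assumptions, not our exact values:

containers:
  - name: my-java-service             # placeholder name
    readinessProbe:
      httpGet:
        path: /health                 # assumed health-check endpoint
        port: 8080                    # assumed container port
      initialDelaySeconds: 180        # leave enough time for the warm-up script to finish
      periodSeconds: 5
      failureThreshold: 3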

To our surprise, though we saw some improvement, it wasn't significant. We still observed a worrisome spike in response times and errors. The warm-up script also introduced new problems. Earlier, our pods would become ready in 40–50 seconds, but with the script they took about 3 minutes, which became a concern during deployments and, more importantly, during auto-scaling. We tweaked a few things in the warm-up mechanism, such as allowing a brief overlap between the warm-up script and actual traffic and changing the script itself, but didn't see significant improvements. Finally, we decided that the small benefits provided by the warm-up strategy were not worth it and ditched it completely.

Take 3: Heuristics For Discovery

Since our warm-up script idea tanked, we went back to the drawing board and decided to try some heuristic techniques, experimenting with the following (a configuration sketch follows the list):

  • GC (G1, CMS, and Parallel) and various GC parameters
  • Heap memory
  • CPU allocated
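
A sketch of the kind of configuration we were varying between runs; the JAVA_OPTS wiring and the specific flag values are illustrative assumptions, not our final settings:

containers:
  - name: my-java-service                           # placeholder name
    env:
      - name: JAVA_OPTS                             # assumed to be passed to the JVM by the start script
        value: "-XX:+UseG1GC -Xms1500m -Xmx1500m"   # swapped for CMS/Parallel collectors and other heap sizes per experiment
    resources:
      requests:
        cpu: 1000m                                  # CPU allocation was another variable
        memory: 2000Mi
      limits:
        cpu: 1000m
        memory: 2000Mi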

After a few rounds of experiments, we finally hit a breakthrough. The service we were testing on had Kubernetes resource limits configured:

resources:
  requests:
    cpu: 1000m
    memory: 2000Mi
  limits:
    cpu: 1000m
    memory: 2000Mi

We increased the CPU request and limit to 2000m and deployed the service to see the impact. We saw a huge improvement in response times and errors, much better than with the warm-up script.

[Figure: Response-time graphs. The first deployment (around 1 PM) is with the 2-CPU configuration and the second one (around 1:25 PM) is with the original 1-CPU configuration.]

To test it further, we upscaled the configuration to 3000m CPU and to our pleasant surprise, the issue was completely gone. As you can see below, there are no spikes in response time.

[Figure: A deployment with the 3-CPU configuration; no spikes in response time.]
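
For reference, the 3-CPU variant is simply the original block with the CPU values raised, keeping requests equal to limits:

resources:
  requests:
    cpu: 3000m
    memory: 2000Mi
  limits:
    cpu: 3000m
    memory: 2000Mi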

It quickly became clear that the issue was CPU throttling. Apparently, during the warm-up phase, the JVM needs more CPU time than the average steady state, but the Kubernetes resource handling mechanism (cgroups) was throttling the CPU as per the configured limit.

There was a straightforward way to verify this. Kubernetes exposes a per-pod metric, container_cpu_cfs_throttled_seconds_total, which denotes how many seconds the CPU has been throttled for this pod since its start. If we observed this metric with the 1000m configuration, we should see a lot of throttling at the start, which should then settle down after a few minutes. We did the deployment with this configuration, and here's the graph of container_cpu_cfs_throttled_seconds_total for all the pods in Prometheus:

[Figure: container_cpu_cfs_throttled_seconds_total for all pods with the 1000m CPU configuration]

As expected, there's a lot of throttling in the first 5 to 7 minutes after the container starts, mostly between 500 and 1000 seconds, but then it settles down, confirming our hypothesis.

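The same metric can also back an alert. Here is a minimal Prometheus rule-file sketch; the alert name, pod label matcher, threshold, and duration are assumptions you would adapt to your own cluster:

groups:
  - name: cpu-throttling
    rules:
      - alert: HighCpuThrottling                    # hypothetical alert name
        expr: rate(container_cpu_cfs_throttled_seconds_total{pod=~"my-java-service-.*"}[5m]) > 0.5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Pod {{ $labels.pod }} is spending a lot of time CPU-throttled"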

When we deploy with the 3000m CPU configuration, we observe the following graph:

[Figure: container_cpu_cfs_throttled_seconds_total for all pods with the 3000m CPU configuration]

CPU throttling is almost negligible (less than 4 seconds for almost all the pods), which is why the deployment goes smoothly.

Take 4: Let It Burst

Although we had found the bottleneck causing the issue, the solution wasn't very promising in terms of cost. Most of the services that faced this issue had a similar resource configuration and were running over-provisioned in terms of the number of pods just to avoid the deployment blues. But none of the teams warmed up (pun intended :-)) to the idea of increasing the CPU request/limit three-fold and reducing the number of pods accordingly. In terms of cost, this solution could actually be worse than running more pods, since Kubernetes schedules pods on the basis of requests, and finding nodes with 3 spare CPUs was much harder than finding ones with 1 spare CPU. It could cause the cluster autoscaler to trigger frequently, adding more nodes to the cluster.

We went back to the drawing board, again (phew!), but this time with some important new information. We phrased the problem as follows:

The JVM needs more CPU (~3000m) than the configured limit (1000m) during the initial warm-up phase, which lasts a few minutes. After the warm-up, the JVM can run at its full potential comfortably even with a 1000m CPU limit. Kubernetes schedules pods using "requests", not "limits".

Once we read the problem statement with a clear, peaceful mind, the answer presented itself — Kubernetes Burstable QoS.

Kubernetes assigns QoS classes to pods based on the configured resource requests and limits.

[Figure: Kubernetes QoS classes. Image source: https://frankdenneman.nl/2020/03/20/scheduling-vsphere-pods/]

So far, we had been using the Guaranteed QoS class by specifying both requests and limits with equal values (initially both 1000m and then both 3000m). Although the Guaranteed QoS class has its benefits, we don't need the full power of 3 CPUs for the entirety of the pod lifecycle; we only need it for the first few minutes. The Burstable QoS class does just that. It allows us to specify requests lower than limits, e.g.

resources:
  requests:
    cpu: 1000m
    memory: 2000Mi
  limits:
    cpu: 3000m
    memory: 2000Mi

Since Kubernetes uses the values specified in requests to schedule the pods, it'll find nodes with 1000m of spare CPU capacity to schedule this pod. But since the limit is much higher at 3000m, if the application needs more CPU than 1000m at any time and spare CPU capacity is available on that node, the application will not be throttled on CPU. It can use up to 3000m if available.

This fits nicely with our problem statement. During the warm-up phase, when the JVM needs more CPU, it can get it by bursting. Once the JVM is optimized, it can go on at full speed within the request. This allows us to use the spare capacity in our cluster (which we checked and found to be sufficiently available) to solve the warm-up problem without any additional cost.

Finally, it was time to test the hypothesis. We changed the resource configuration and deployed the application. And it worked! We did a few more deployments to verify that we could repeat the results, and it worked consistently. We also monitored the container_cpu_cfs_throttled_seconds_total metric, and here's the graph from one of the deployments:

[Figure: container_cpu_cfs_throttled_seconds_total for a deployment with the Burstable QoS configuration]

As we can see, this graph is quite similar to the one for the Guaranteed QoS setting with 3000m CPU. Throttling is almost negligible, and it confirms that the solution with Burstable QoS works.

Caveat Emptor: For the Burstable QoS solution to work, there needs to be spare capacity available on the nodes. This can happen in two ways:

  • Nodes are not fully packed in terms of CPU
  • The workloads are not utilizing 100% of the requested CPU

In my experience, this is normally true. In cases where it's not (your nodes are fully packed with pods and the pods are utilizing close to 100% of their requested CPU all the time), you may not see similar benefits from Burstable QoS.

As a side note, I'd also encourage you to understand how CPU quota and CPU time allocation work in Kubernetes if you don't already. There are a lot of good resources around.

Conclusion

Although it took us some time, we were happy to find a cost-effective solution. Kubernetes resource limits are an important, albeit somewhat tricky, concept, which we learned the hard way. We implemented the solution in all our Java-based services, and both deployments and auto-scaling are working fine without any issues.

Key takeaways:

  • Think carefully when setting resource limits for your applications. Invest some time to understand your application's workload and set requests/limits accordingly. Understand the implications of setting resource limits and the various QoS classes.
  • Keep an eye on CPU throttling by monitoring/alerting on container_cpu_cfs_throttled_seconds_total. If you observe excessive throttling, try tuning the resource limits.
  • When using Burstable QoS, make sure you specify the capacity required for steady-state performance in requests and use the burst capacity only for occasional spurts. Do not rely on the burst capacity for baseline performance.

I would like to thank everyone who lent their invaluable support and worked on various aspects of this — Nikhil Sharma, Sahil Thakral, Arjun PP, Gaurav Bang, Aakash Garg, Akshay Sharma, Sumit Garg, and Nikhil Dhiman.

Translated from: https://tech.olx.com/improving-jvm-warm-up-on-kubernetes-1b27dd8ecd58
