Large-Scale Data Ingestion Using gRPC, Envoy, and Istio in Cortex Data Lake

This article is a translation of a Palo Alto Networks post that explores how Cortex Data Lake uses gRPC, Envoy, and Istio for large-scale data ingestion, combining these technologies to achieve efficient, reliable data processing and transport.

Cortex Data Lake collects, transforms and integrates your enterprise’s security data to enable Palo Alto Networks solutions. This product helps organizations consume and understand data at a much higher rate — letting system administrators focus on the alerts that matter most. It combines advanced artificial intelligence and machine learning across all your enterprise’s data, and provides for a more meaningful response to sophisticated attacks.

Cortex Data Lake Ingestion Service has the requirement to ingest millions of requests per second while acknowledging each request to guarantee no data loss at low latency. Cortex Data Lake clients are typically long-lived and send data continuously. With these requirements, gRPC quickly emerged as the technology of choice to build Cortex Data Lake Ingestion Service. When we started on this project Envoy was the only high-performance proxy that supported gRPC and Istio was the preferred control plane for Envoy.

Envoy

Envoy is a high-performance open source edge and service proxy designed for cloud-native applications. It is typically deployed in a distributed fashion as a side-car with application containers in the same application pod. Envoy handles advanced routing, monitoring, tracing, logging, and other cross-cutting concerns. Envoy has a programmatic control plane that allows it to be dynamically configured.

Istio

Istio is an open source Service Mesh and API Gateway that provides capabilities to connect, secure, control, and observe services. Istio relies on Envoy for data-plane and implements Envoy’s control plane APIs. It includes all the application networking and operational best practices like timeouts, retries, circuit breaking, advanced load balancing, fault injection, and mTLS. Cortex Data Lake API Gateway as well as Service Mesh for Data Services are powered by Istio. Following is the high-level skeleton of Cortex Data Lake Infrastructure Setup.

At the API Gateway layer Istio supports both mTLS and JWT authentication policies. We are using both authentication mechanisms depending on the use case. There are some challenges in exposing a service to support both mTLS and JWT depending on its client. Those details will be covered in a separate post. Overall, Istio has worked well for us, but with earlier versions of Istio we ran into bottlenecks with Istio telemetry getting overloaded with a large number of streams. We turned off Istio telemetry and are now using Envoy native telemetry.

gRPC

gRPC was created by Google as an open-source evolution of their internal RPC technology, Stubby. gRPC uses HTTP/2 as its transport protocol. HTTP/2 can multiplex many parallel requests over the same connection and allows full-duplex bidirectional communication.

[Figure: Introduction to HTTP/2]

gRPC uses a channel abstraction to facilitate concurrent use of underlying HTTP/2 connections and to provide flow control capabilities. Within a channel, multiple RPCs may be issued, each of which maps to an underlying HTTP/2 stream.
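As a minimal illustration of the channel abstraction, the sketch below creates a channel with grpc-java; the endpoint, TLS choice, and keep-alive value are placeholder assumptions, not the actual Cortex Data Lake configuration. Many concurrent RPCs can then be issued over the multiplexed HTTP/2 connection the channel manages.

```java
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

import java.util.concurrent.TimeUnit;

public final class ChannelFactory {
    // Hypothetical endpoint; a real deployment would point at the Istio ingress gateway.
    static ManagedChannel create() {
        return ManagedChannelBuilder
                .forAddress("ingest.example.com", 443)
                .useTransportSecurity()           // TLS; mTLS or JWT auth would be layered on top
                .keepAliveTime(30, TimeUnit.SECONDS)
                .build();
    }
}
```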

gRPC uses Protocol Buffers as the Interface Definition Language and also for underlying message interchange format. The source code for gRPC client and server interfaces is generated for different languages using the protoc compiler.

gRPC client and server stubs implement the StreamObserver interface for sending and receiving messages. For outgoing messages, a StreamObserver is provided to the application by the gRPC library. For incoming messages, the application implements the StreamObserver and passes it to the gRPC library for receiving. The StreamObserver interface is rather simple with just three methods:

  1. onNext: Receives a value from the stream
  2. onError: Receives a terminating error from the stream
  3. onCompleted: Receives notification of successful stream completion
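A minimal sketch of implementing this interface on the client side for the inbound ACK stream; Ack is a hypothetical protobuf-generated message type, not the actual Cortex Data Lake schema.

```java
import io.grpc.stub.StreamObserver;

// Ack is a hypothetical generated message carrying a request ID and result.
final class AckObserver implements StreamObserver<Ack> {
    @Override
    public void onNext(Ack ack) {
        // Called once per ACK received on the inbound stream.
        System.out.println("ACK for request " + ack.getRequestId());
    }

    @Override
    public void onError(Throwable t) {
        // The stream terminated with an error; pending requests may need to be retried.
        System.err.println("Stream failed: " + t);
    }

    @Override
    public void onCompleted() {
        // The server closed the stream cleanly.
        System.out.println("Stream completed");
    }
}
```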

Unary vs. Bi-directional

gRPC applications can be written using different types of service methods and we evaluated unary and bi-directional. The pros and cons of each approach are listed below with the preferred characteristics shown in bold.

[Table: Pros and cons of unary vs. bi-directional streaming]

With bi-directional streams, the message throughput is higher and latency is lower, thereby meeting our design requirements. Having long-lived streams and multiple messages per stream transfers some responsibilities from the gRPC protocol to the application. The desired functionality had to be implemented within our client and server applications. The increased complexity was worth the higher throughput provided by bi-directional streams.
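For illustration, a bi-directional call in grpc-java might look like the sketch below. It reuses the AckObserver sketch above; IngestionServiceGrpc, its ingest method, and LogEntry are hypothetical names for the generated stub and messages, not the actual service definition.

```java
// The async stub returns a StreamObserver for the outbound stream
// and takes one for the inbound (ACK) stream.
IngestionServiceGrpc.IngestionServiceStub stub = IngestionServiceGrpc.newStub(channel);

StreamObserver<LogEntry> requestStream = stub.ingest(new AckObserver());

// Many discrete payloads are sent over the same long-lived stream.
for (LogEntry entry : batch) {
    requestStream.onNext(entry);
}
requestStream.onCompleted();   // close this stream once the batch is done
```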

Message Acknowledgement and Error Handling

Cortex Data Lake has the requirement to acknowledge each request. Our gRPC client application sends discrete request payloads on their outbound stream and receives ACKs for those requests on their inbound stream. This allows clients to use timers and retries to compensate for network problems. Each request contains a unique ID. Each ACK contains the ID of a corresponding request and a description of the result of that request. As the client receives ACKs from the server, it inspects the messages, checks for errors, and decides which messages can be retried and which messages must be dropped. The client also implements exponential backoff on retries to allow the server to recover if it is overloaded.
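A rough sketch of the kind of client-side ACK tracking and retry-with-backoff logic described here. The Pending and Ack types, limits, and bookkeeping are simplified assumptions for illustration, not the actual Cortex Data Lake client code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Sketch of client-side ACK tracking with retries and exponential backoff.
final class AckTracker {
    static final class Pending {
        final String id;
        final byte[] payload;
        int attempts;
        Pending(String id, byte[] payload) { this.id = id; this.payload = payload; }
    }

    static final class Ack {
        final String requestId;
        final boolean success;
        final boolean retryable;
        Ack(String requestId, boolean success, boolean retryable) {
            this.requestId = requestId; this.success = success; this.retryable = retryable;
        }
    }

    private static final int MAX_ATTEMPTS = 5;
    private static final long BASE_BACKOFF_MS = 100, MAX_BACKOFF_MS = 30_000;

    private final Map<String, Pending> inFlight = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler;
    private final Consumer<Pending> resend;   // re-enqueues the payload on the outbound stream

    AckTracker(ScheduledExecutorService scheduler, Consumer<Pending> resend) {
        this.scheduler = scheduler;
        this.resend = resend;
    }

    void track(Pending request) { inFlight.put(request.id, request); }

    void onAck(Ack ack) {
        Pending pending = inFlight.remove(ack.requestId);
        if (pending == null || ack.success) {
            return;                                   // unknown/duplicate ACK, or success
        }
        if (ack.retryable && pending.attempts < MAX_ATTEMPTS) {
            pending.attempts++;
            // Exponential backoff gives an overloaded server room to recover.
            long delay = Math.min(MAX_BACKOFF_MS, BASE_BACKOFF_MS << pending.attempts);
            scheduler.schedule(() -> { track(pending); resend.accept(pending); },
                    delay, TimeUnit.MILLISECONDS);
        }
        // Non-retryable errors (or too many attempts) are dropped here.
    }
}
```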

Flow Control

Flow Control is a mechanism to prevent senders from overwhelming the receiver of data. The receiver may be busy under heavy load and may not have resources to handle the additional load. The receiver, in this case, should exert flow control. gRPC relies on underlying HTTP/2 flow control capabilities.

In our ingestion pipeline, the gRPC client communicates with the gRPC server via the Istio API Gateway, as shown in the diagram below.

[Figure: Stream buffers in the pipeline]

There are many stream buffers involved in the pipeline. The larger a buffer, the more memory it can consume when the upstream is congested, and the longer it takes to communicate backpressure.

[Figure: Stream buffers]

To implement a backpressure feedback loop in the gRPC client for each stream, we use CallStreamObserver#setOnReadyHandler. This handler invokes our application client code every time the stream's isReady() state changes from false to true.
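A sketch of that feedback loop on the client, using grpc-java's ClientResponseObserver to obtain the ClientCallStreamObserver. LogEntry and Ack are hypothetical message types and the pending queue is an illustrative detail; a production implementation would also serialize calls to onNext.

```java
import io.grpc.stub.ClientCallStreamObserver;
import io.grpc.stub.ClientResponseObserver;

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

final class BackpressureAwareObserver implements ClientResponseObserver<LogEntry, Ack> {
    private final Queue<LogEntry> pending = new ConcurrentLinkedQueue<>();
    private ClientCallStreamObserver<LogEntry> requestStream;

    @Override
    public void beforeStart(ClientCallStreamObserver<LogEntry> requestStream) {
        this.requestStream = requestStream;
        // Invoked every time isReady() transitions from false to true,
        // i.e. whenever the transport's send buffer has drained enough.
        requestStream.setOnReadyHandler(this::drain);
    }

    private void drain() {
        LogEntry next;
        // Only write while the stream is ready; otherwise buffer and wait for the next callback.
        while (requestStream.isReady() && (next = pending.poll()) != null) {
            requestStream.onNext(next);
        }
    }

    void send(LogEntry entry) {
        pending.add(entry);
        drain();
    }

    @Override public void onNext(Ack ack) { /* handle ACK (see AckTracker sketch) */ }
    @Override public void onError(Throwable t) { /* terminate and retry pending requests */ }
    @Override public void onCompleted() { /* stream closed by server */ }
}
```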

gRPC Server Optimizations

In the initial implementation of our gRPC Server, we had large queues and many threads. At high load, we observed cascading failures and limited throughput at much higher latency.

We added detailed metrics at each step to identify where we were spending time and took thread dumps. We identified that threads were contending and the server was not exerting backpressure quickly. We even ran into a JDK bug where java.security.Provider.getService() synchronization became a scalability bottleneck at high load. This required us to upgrade to JDK 13. We reduced the size of thread pools in gRPC server to two times the number of cores and that eliminated most of the thread contention.

Since the pipeline is asynchronous with several buffers/queues, we were simply enqueuing more work than could be processed. We ran a number of controlled-load tests, keeping the gRPC server CPU busy. We profiled and tuned our code, then tuned the Kafka producer embedded in our server application. We established that the achievable p99 request-processing time is 70–80 ms, with Kafka writes taking 125–200 ms.

By bounding the input queue, the server stops reading from gRPC when the queue is full and thereby exerts backpressure. We used the following formula to calculate the gRPC server request queue length:

maxLatency = (transactionTime / number of threads) * queueLength

or

queueLength = maxLatency / (transactionTime / number of threads)

We kept maxLatency the same as transactionTime to get maximum backpressure, and settled on a queue length equal to the number of threads. With this approach, the workload was mostly CPU bound and auto-scaled well under varying loads.
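As a worked example: with 16 worker threads, a transactionTime of 80 ms, and maxLatency set equal to transactionTime, the formula gives queueLength = 80 / (80 / 16) = 16, i.e. the number of threads. The sketch below sizes a bounded worker pool along those lines; the numbers and class names are illustrative assumptions, not the actual server configuration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class RequestExecutorFactory {
    static ThreadPoolExecutor create() {
        // Roughly two threads per core removed most of the thread contention we saw.
        int threads = 2 * Runtime.getRuntime().availableProcessors();

        double transactionTimeMs = 80;               // measured p99 processing time (illustrative)
        double maxLatencyMs = transactionTimeMs;     // maximum backpressure
        int queueLength = (int) (maxLatencyMs / (transactionTimeMs / threads)); // == threads

        // Bounded queue: when it fills up, the handler stops requesting more messages
        // from the stream, which propagates backpressure to the client via HTTP/2.
        return new ThreadPoolExecutor(
                threads, threads,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueLength));
    }
}
```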

Load Balancing

gRPC keeps the TCP session open as long as possible to maximize throughput and minimize overhead, but the long-lived sessions make load balancing complex. This is more of an issue in auto-scaled Kubernetes environments, where new pods are added as load increases but clients stay connected to the same gRPC server pods, resulting in unequal load distribution.

The designers of gRPC had already thought about this problem and added support for a connection expiration policy on the gRPC Server. This expiration policy will force clients to disconnect and reconnect to another server. Connection expiration can be performed by causing connections to expire after a certain amount of time has elapsed. The Java gRPC library implements this with the maxConnectionAge() and maxConnectionAgeGrace() server builder options. These functions serve to limit and then forcibly terminate a gRPC channel, respectively. When a gRPC channel expires, the server will send an HTTP/2 GOAWAY, indicating that the client may not start new requests but may finish existing ones. At the end of the max connection age grace, the gRPC server will send a second HTTP/2 GOAWAY and close the channel.
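With the Java gRPC Netty transport, configuring this looks roughly like the sketch below; the port, durations, and IngestionServiceImpl are illustrative assumptions.

```java
import io.grpc.Server;
import io.grpc.netty.NettyServerBuilder;

import java.util.concurrent.TimeUnit;

final class IngestionServer {
    static Server build() {
        return NettyServerBuilder.forPort(8443)
                // After ~30 minutes the server sends GOAWAY; the client finishes
                // in-flight RPCs and reconnects, likely landing on a different pod.
                .maxConnectionAge(30, TimeUnit.MINUTES)
                // Hard deadline for draining outstanding RPCs before the channel is closed.
                .maxConnectionAgeGrace(5, TimeUnit.MINUTES)
                .addService(new IngestionServiceImpl())   // hypothetical service implementation
                .build();
    }
}
```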

We used fixed-size streams to send batches of requests and had to consider the following trade-offs:

  • Larger stream sizes permit higher throughput but use more memory.
  • Smaller stream sizes reduce memory usage but cause the client and server to block more frequently while waiting for messages to be acknowledged.

Stream size played a very important role in load balancing. With larger streams, their distribution across Ingestion Server pods was non-uniform, resulting in a wide range of CPU utilization across the pods and thereby affecting Kubernetes Horizontal Pod Autoscaling. The table below shows the summary results of our tests with different stream sizes.

[Table: Summary of load-balancing tests with different stream sizes]
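A sketch of the fixed-size-stream pattern on the client: after a fixed number of messages the current stream is completed and a new one is opened, giving Envoy a chance to place the next stream on a different pod. It reuses the hypothetical IngestionServiceGrpc stub, LogEntry message, and AckObserver from the earlier sketches.

```java
// Send in fixed-size batches; completing the stream lets Envoy/Istio
// route the next stream to a (potentially different) Ingestion Server pod.
static void sendInFixedStreams(IngestionServiceGrpc.IngestionServiceStub stub,
                               Iterable<LogEntry> entries,
                               int streamSize) {
    StreamObserver<LogEntry> requestStream = stub.ingest(new AckObserver());
    int sent = 0;
    for (LogEntry entry : entries) {
        requestStream.onNext(entry);
        if (++sent % streamSize == 0) {
            requestStream.onCompleted();                     // close the current stream
            requestStream = stub.ingest(new AckObserver());  // open a fresh one
        }
    }
    requestStream.onCompleted();
}
```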

GKE

We are using GKE and it required additional tuning for our applications.

Node Kernel Tuning

At high load, nodes were becoming unresponsive because of low conntrack and threadMax limits. We increased CONNTRACK_MAX to 2 million and CONNTRACK_HASHSIZE to 0.5 million, and bumped THREAD_MAX to 4 million.

IO Throttling

We were using regular disks and ran into IO throttling, which caused the Docker daemon and Kubernetes to become unstable. We moved our workloads to node pools with SSDs to avoid throttling.

Node Memory Exhaustion

Some of our workloads were not tuned initially and did not have proper limits set, resulting in node instability, memory exhaustion, and frequent Docker and kubelet restarts. We profiled our workloads and tuned resource requests and limits so as not to exhaust node resources.

Results

With all these changes and tuning in place, here are the results of one test where we ran a load of 800k requests per second and the system auto-scaled quickly to absorb it.

[Figure: Auto-scaling from 0 to 800k rps]
[Figure: Initial pods take in the large load. New pods start, quickly accept load, and join their peers. As short-lived streams terminate, new streams are established and load redistributes.]
[Figure: CPU utilization]

The pipeline is very efficient. The Istio ILB can easily handle 10,000 requests per core at 65% average CPU utilization, and the Ingestion Frontend can handle 1,000 requests per core at 65% average CPU utilization.

This was a marathon effort by the gRPC Client, API Gateway, and Ingestion team members.

Have questions? Reach out on Twitter or email:

Email: animesh@apache.org, Twitter: @Ani_chaturvedi

Translated from: https://medium.com/engineering-at-palo-alto-networks/large-scale-data-ingestion-using-grpc-envoy-and-istio-in-cortex-data-lake-ec82ea87fa3b
