How to Set Up Multi-Cluster Load Balancing with GKE


Understand the components of GCP Load Balancing and learn, step by step, how to set up a globally available GKE multi-cluster load balancer.


One of the features I like the most about GCP is the external HTTP(S) Load Balancing. This is a global load balancer which gives you a single anycast IP address (no DNS load balancing needed, yay!). Requests enter Google’s global network at one of the edge points of presence (POPs) close to the user,¹ and are proxied to the closest region with available capacity. This results in a highly available, globally distributed, scalable, and fully managed load balancing setup. It can be further augmented with Cloud Armor (DDoS and WAF protection), Cloud CDN, or Identity-Aware Proxy (IAP) to secure access to your web applications.



With this, multi-cluster load balancing with GKE immediately comes to mind and is often a topic of interest for our customers. And while there’s no native support in GKE/Kubernetes at the moment,² GCP provides all the necessary building blocks to set this up yourself.


Let’s get familiar with the GCP Load Balancing components in the first part. We will follow the journey of a request as it enters the system and understand what each of the load balancing building blocks represents. In the second part, we will set up load balancing across two GKE clusters step by step.


GCP Load Balancing Overview

Fig. 1: GCP Load Balancing Overview

Let’s start with a high-level Load Balancing flow overview. The HTTP(S) connection from the client is terminated at an edge location by Google Front Ends (GFEs),³ based on the HTTP(S) Target Proxy and Forwarding Rule configuration. The Target Proxy consults the associated URL Map and Backend Service definitions to determine how to route traffic. From the GFEs a new connection is established, and traffic flows over the Google network to the closest healthy Backend with available capacity. Traffic within the region is then distributed across individual Backend Endpoints, according to their capacity.


GCP Load Balancing Components

Fig. 2: GCP Load Balancing Components
  • Forwarding Rule — each rule is associated with a specific IP and port. Given we’re talking about global HTTP(S) load balancing, this will be an anycast global IP address (optionally a reserved static IP). The associated port is the port on which the load balancer is ready to accept traffic from external clients.⁴


  • Target HTTP(S) Proxy — traffic is then terminated based on the Target Proxy configuration. Each Target Proxy is linked to exactly one URL Map (N:1 relationship). In the case of an HTTPS proxy, you’ll also have to attach at least one SSL certificate and configure an SSL Policy.


  • URL Map — is the core traffic management component and allows you to route incoming traffic between different Backend Services (incl. GCS buckets). Basic routing is hostname and path-based, but more advanced traffic management is possible as well — URL redirects, URL rewriting and header- and query parameter-based routing. Each rule directs traffic to one Backend Service.


  • Backend Service — is a logical grouping of backends for the same service and relevant configuration options, such as traffic distribution between individual Backends, protocols, session affinity, or features like Cloud CDN, Cloud Armor or IAP. Each Backend Service is also associated with a Health Check.


  • Health Check — determines how individual backend endpoints are checked for being alive, and this is used to compute the overall health state of each Backend. Protocol and port have to be specified when creating one, along with some optional parameters like check interval, healthy and unhealthy thresholds, or timeout. An important bit to note is that firewall rules allowing health-check traffic from a set of internal IP ranges⁵ must be in place.


  • Backend — represents a group of individual endpoints in a given location. In the case of GKE, our backends will be Network Endpoint Groups (NEGs),⁶ one per zone of our GKE cluster (GKE NEGs are zonal, but some backend types are regional).


  • Backend Endpoint — is a combination of IP address and port; in the case of GKE with container-native load balancing,⁷ these point to individual Pods.


Setup

We will set up multi-cluster load balancing for two services — Foo and Bar — deployed across two clusters (fig. 3). We’ll use simple path-based rules, routing any request for /foo/* to service Foo and any request for /bar/* to service Bar.


Fig. 3: GKE Multi-Cluster Foo Bar

Prerequisites

git clone https://github.com/stepanstipl/gke-multi-cluster-native.git
cd gke-multi-cluster-native
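
You’ll also need two GKE clusters to deploy to. If you don’t have them yet, here’s a minimal sketch for creating two regional, VPC-native clusters (container-native load balancing⁷ requires VPC-native mode); the cluster names and regions below are just placeholders:

gcloud container clusters create [cluster-1] \
--region [region-1] \
--enable-ip-alias \
--num-nodes 1

gcloud container clusters create [cluster-2] \
--region [region-2] \
--enable-ip-alias \
--num-nodes 1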

Deploy Applications and Services to GKE Clusters

Fig. 4: K8s Demo App

Let’s start by deploying a simple demo application to each of the clusters. The application displays details about the serving cluster and region, and the source code is available at stepanstipl/k8s-demo-app.


Repeat the following steps for each of your clusters.


Get Credentials for kubectl

gcloud container clusters get-credentials [cluster] \
--region [cluster-region]

Deploy Both Foo & Bar Applications

kubectl apply -f deploy-foo.yaml
kubectl apply -f deploy-bar.yaml

You can verify that Pods for both services are up and running by kubectl get pods.


Create K8s Services for Both Applications

kubectl apply -f svc-foo.yaml
kubectl apply -f svc-bar.yaml

Note the cloud.google.com/neg: '{"exposed_ports": {"80":{}}}' annotation on the services telling GKE to create a NEG for the Service.

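You can optionally confirm on the GCP side that the NEGs have been created:

gcloud compute network-endpoint-groups list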

You can verify the services are set up correctly by forwarding a local port using kubectl port-forward service/foo 8888:80 and accessing the service at http://localhost:8888/.


Now don’t forget to repeat the above for all your clusters.


Set Up Load Balancing (GCLB) Components

Create a Health Check

gcloud compute health-checks create http health-check-foobar \
--use-serving-port \
--request-path="/healthz"
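
You can optionally double-check the health check’s configuration before wiring it to the backend services:

gcloud compute health-checks describe health-check-foobar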

Create Backend Services

Create a backend service for each of the services, plus one more to serve as the default backend for traffic that doesn’t match the path-based rules.


gcloud compute backend-services create backend-service-default \
--global

gcloud compute backend-services create backend-service-foo \
--global \
--health-checks health-check-foobar

gcloud compute backend-services create backend-service-bar \
--global \
--health-checks health-check-foobar

Create URL Map

gcloud compute url-maps create foobar-url-map \
--global \
--default-service backend-service-default

Add Path Rules to URL Map

gcloud compute url-maps add-path-matcher foobar-url-map \
--global \
--path-matcher-name=foo-bar-matcher \
--default-service=backend-service-default \
--backend-service-path-rules='/foo/*=backend-service-foo,/bar/*=backend-service-bar'

Reserve Static IP Address

gcloud compute addresses create foobar-ipv4 \
--ip-version=IPV4 \
--global

Set Up DNS

Point your DNS to the previously reserved static IP address. Note the IP address you have requested:


gcloud compute addresses list --global

Create an A record foobar.[your_domain_name] pointing to this IP. You can use Cloud DNS to manage the record, or any other service of your choice. This step should be completed before moving forward.¹⁰

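If you manage your zone with Cloud DNS, the record can also be created from the CLI; this is just a sketch assuming a hypothetical managed zone named [dns-zone]:

gcloud dns record-sets create "foobar.[your_domain_name]." \
--zone=[dns-zone] \
--type=A \
--ttl=300 \
--rrdatas="[reserved_ip]"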

Create Managed SSL Certificate

gcloud beta compute ssl-certificates create foobar-cert \
--domains "foobar.[your_domain_name]"

Create Target HTTPS Proxy

gcloud compute target-https-proxies create foobar-https-proxy \
--ssl-certificates=foobar-cert \
--url-map=foobar-url-map

Create Forwarding Rule

gcloud compute forwarding-rules create foobar-fw-rule \
--target-https-proxy=foobar-https-proxy \
--global \
--ports=443 \
--address=foobar-ipv4

Verify TLS Certificate

The whole process of certificate provisioning can take a while. You can verify its status using:


gcloud beta compute ssl-certificates describe foobar-cert

The managed.status should become ACTIVE within the next 60 minutes or so, usually sooner, if everything was set up correctly.

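If you only care about the status field, you can narrow the output:

gcloud beta compute ssl-certificates describe foobar-cert \
--format="get(managed.status)"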

Connect K8s Services to the Load Balancer

GKE has provisioned NEGs for each of the K8s services deployed with the cloud.google.com/neg annotation. Now we need to add these NEGs as backends to corresponding backend services.


Retrieve Names of Provisioned NEGs

kubectl get svc \
-o custom-columns='NAME:.metadata.name,NEG:.metadata.annotations.cloud\.google\.com/neg-status'

Note down the NEG name and zones for each service.


Repeat for all your GKE Clusters.


Add NEGs to Backend Services

Repeat the following for every NEG and zone from both clusters. Make sure to use only NEGs belonging to the Foo service.


gcloud compute backend-services add-backend backend-service-foo \
--global \
--network-endpoint-group [neg_name] \
--network-endpoint-group-zone=[neg_zone] \
--balancing-mode=RATE \
--max-rate-per-endpoint=100

And the same for the Bar service; again, repeat for both clusters, every NEG and zone:


gcloud compute backend-services add-backend backend-service-bar \
--global \
--network-endpoint-group [neg_name] \
--network-endpoint-group-zone=[neg_zone] \
--balancing-mode=RATE \
--max-rate-per-endpoint=100
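
If typing these commands out by hand gets tedious, the repetition can be scripted. Below is just a sketch for the Foo backend service; the zone list and NEG name are placeholders that you still need to fill in per cluster:

for neg_zone in [zone-a] [zone-b] [zone-c]; do
  gcloud compute backend-services add-backend backend-service-foo \
    --global \
    --network-endpoint-group "[neg_name]" \
    --network-endpoint-group-zone "$neg_zone" \
    --balancing-mode=RATE \
    --max-rate-per-endpoint=100
done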

Allow GCLB Traffic

This allows health checks and proxied traffic from the GCLB source ranges⁵ to reach the Pods’ serving port (8080 for the demo application).

gcloud compute firewall-rules create fw-allow-gclb \
--network=[vpc_name] \
--action=allow \
--direction=ingress \
--source-ranges=130.211.0.0/22,35.191.0.0/16 \
--rules=tcp:8080

Verify Backends Are Healthy

gcloud compute backend-services get-health \
--global backend-service-foo

gcloud compute backend-services get-health \
--global backend-service-bar

You should typically see 6 backends (3 per cluster,¹¹ one per zone) for each backend service, with healthState: HEALTHY. It might take a while for all the backends to become healthy after adding the firewall rules.


Test Everything’s Working

Curl your DNS name https://foobar.[your-domain] (or open it in the browser). You should get a 502 for the root path, as we didn’t add any backends to the default service.


curl -v "https://foobar.[your-domain]"

Now curl the paths for the individual services, https://foobar.[your-domain]/foo/ or https://foobar.[your-domain]/bar/, and you should receive a 200 and content from the corresponding service.


curl -v "https://foobar.[your-domain]/foo/"
curl -v "https://foobar.[your-domain]/bar/"

If you retry a few times, you should see traffic served by different Pods and Clusters.¹²


If you simulate some traffic, for example using one of my favorite CLI tools, vegeta, you can nicely observe traffic distribution across backends in the GCP Console. Go to Network services -> Load balancing section -> select your load balancer -> Monitoring tab and select the corresponding backend. You should see a dashboard similar to fig. 5.

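For example, a simple constant-rate test against the Foo path could look like this (rate and duration are arbitrary):

echo "GET https://foobar.[your-domain]/foo/" | vegeta attack -rate=50 -duration=2m | vegeta report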

Fig. 5: GKE Console — Load Balancing (both clusters were in the same region, therefore traffic is load-balanced equally across all backends)

Now it’s a good time to experiment a bit. Let’s see what happens if you have clusters in the same region, and what if they’re in different regions. Increase the load and see the traffic overflow to another region (hint: remember the --max-rate-per-endpoint used before?). See what happens if you take one of the clusters down. And can you add a 3rd cluster in the mix?

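One simple way to simulate taking a cluster down is to scale the demo Deployments to zero replicas in one of the clusters and watch the traffic shift to the other (this assumes the Deployments are named foo and bar, as the repo manifests suggest; scale them back up afterwards):

kubectl scale deployment foo bar --replicas=0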

(Optional) gke-autoneg-controller

Notice the anthos.cft.dev/autoneg annotation on the K8s Services. It is not needed for our setup, but you can optionally deploy gke-autoneg-controller¹³ to your clusters and use it to automatically associate the NEGs created by GKE with the corresponding backend services. This will save you some tedious manual work.

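If you do deploy the controller, the association is driven by that annotation. A hypothetical example for the Foo service, reusing the backend service created earlier, might look like:

kubectl annotate service foo \
anthos.cft.dev/autoneg='{"name":"backend-service-foo", "max_rate_per_endpoint":100}'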

Good Job!

And that is it. We have explained the purpose of individual GCLB components and demonstrated how to set up multi-cluster load balancing between services deployed in 2 or more GKE clusters in different regions. For real-life use, I would recommend automating this setup with a configuration management tool, such as Terraform.


This setup increases your service availability, as several independent GKE clusters serve the traffic, and also lowers your latency. In the case of HTTPS, the time to first byte is shorter, as the initial TLS negotiation happens at a GFE close to the user. And with multiple clusters, the request will be served by the one closest to the user.


Please let me know if you find this useful, or if you have any other questions, either here or at @stepanstipl. 🚀🚀🚀 Serve fast and prosper!


  • [1] Over 90 locations around the world — see Load Balancing — Locations.


  • [2] Leaving Anthos aside for now. Anthos is an application management platform that enables you to run K8s clusters on-prem and in other clouds, and also extends the functionality of GKE clusters, incl. the multi-cluster ingress controller.


  • [3] GFEs are software-defined, scalable distributed systems located at Edge POPs.


  • [4] The port can be 80 or 8080 if the target is HTTP proxy, or 443 in case of HTTPS proxy.


  • [5] For External HTTP(S) Load Balancing these ranges are 35.191.0.0/16 and 130.211.0.0/22.


  • [6] Network endpoint groups overview


  • [7] Container-native load balancing requires a cluster in VPC-native mode and allows load balancers to target individual Pods directly (as opposed to targeting cluster nodes).


  • [8] Using clusters in different regions is more interesting.


  • [9] All the files referenced in the commands in this post are relative to the root of this repository.


  • [10] The DNS record needs to be in place for the Google-managed SSL certificate provisioning to work. If not, the certificate might be marked as permanently failed (as the Certificate Authority will fail to sign it) and will need to be recreated.


  • [11] Assuming you have used regional clusters, each deployed across 3 zones, otherwise adjust accordingly.


  • [12] If you have clusters in different regions, GCLB will prefer to serve the traffic from the one closer to the client, so do not expect traffic to be load-balanced equally between regions.


  • [13] I’ll not go into the details on how to deploy and use here, please follow the readme, but basically add annotation with the name of the NEG to your service, e.g. anthos.cft.dev/autoneg: '{"name":"autoneg_test", "max_rate_per_endpoint":1000}'.


Originally published at: https://blog.doit-intl.com/how-to-setup-multi-cluster-load-balancing-with-gke-4b407e1f3dff
