Whose cluster is it anyway?

weixin_26742939

Tags: docker, distributed big data, kubernetes, hadoop

Original link: https://medium.com/faun/whose-cluster-is-it-anyway-f600f9181ae9

While researching how enterprises adopt Kubernetes, we can outline a common scenario: implementing a Kubernetes cluster in a company often starts as a proof of concept. Either developers decide they want to try something new, or the CTO does some research and decides to give it a try because it sounds promising. Typically, there is no roadmap, no real plan for future steps, and no decision to go to production.

First steps with a Kubernetes cluster in an enterprise

And then it is a huge success: a Kubernetes cluster makes managing deployments easier, it is simple for developers to use, it is cheaper than the previously used platform, and it just works for everyone. The security team creates the firewall rules and approves the configuration of the network overlay and load balancers. Operators create their CI/CD pipelines for cluster deployments, backups, and daily tasks. Developers rewrite configuration parsing and communication to fully utilize ConfigMaps, Secrets, and cluster-internal routing and DNS. In no time you are one click away from scrapping the existing infrastructure and moving everything to Kubernetes.

This might be the point when you start thinking about providing support for your cluster and the applications in it. The cluster may be used by an internal development team, or offered as a PaaS to external teams. In all cases, you need a way to triage support cases and decide which team or person is responsible for which part of cluster management. Let’s first split this into two scenarios.

A Kubernetes cluster per team

If the decision is to give a team one or more full clusters, there is no resource sharing, so there is less to worry about. Still, someone has to draw the line and say where the cluster operators’ responsibility ends and the developers have to take over.

The easiest way would be to give the development team full admin access to the cluster, some volumes for persistent data, and a set of load balancers (or even a single one for ingress), and to delegate management to them. Such a solution is not possible in most cases, as it requires a lot of experience from the development team to manage the cluster properly and keep it stable. Also, creating a cluster for even a small team is not always optimal from a resource perspective.

The other problem is that when each team manages its whole cluster, the way the clusters work can diverge greatly. Some teams decide to use the Nginx ingress controller and some Traefik. At the end of the day, uniform clusters are much easier to monitor and manage.

Shared cluster

The alternative is to use the same cluster for multiple teams. Quite a lot of configuration is required to make sure that teams don’t interfere with each other and can’t affect other teams’ operations, but sharing adds a lot of flexibility in resource management and greatly limits the number of clusters that have to be managed, for example in terms of backing them up. It may also be useful if teams work on the same project, or on a set of projects that use the same resources or communicate closely. It is currently possible to communicate between clusters using a service mesh or just load balancers, but that may not be the most performant solution.
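
One piece of that configuration can be sketched as a namespace per team with a resource quota attached. This is a minimal illustration, not a complete isolation setup: the team name, the quota values, and the helper name team_namespace_manifests are assumptions made for the example, and the manifests are built as plain Python dictionaries that could be serialized and applied by whatever tooling the operators already use.

```python
def team_namespace_manifests(team, cpu="4", memory="8Gi", pods="20"):
    """Build a Namespace and a ResourceQuota manifest for one team.

    The default limits are placeholder values, not recommendations.
    """
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": team, "labels": {"team": team}},
    }
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": f"{team}-quota", "namespace": team},
        "spec": {
            # Caps on the total requests and pod count inside the namespace.
            "hard": {
                "requests.cpu": cpu,
                "requests.memory": memory,
                "pods": pods,
            }
        },
    }
    return [namespace, quota]


# Hypothetical team name, used only for illustration.
manifests = team_namespace_manifests("team-a")
```

A quota only limits how much a team can consume; keeping teams out of each other's namespaces still needs access control on top of it.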

Responsibility levels

If the dev team does not possess the skills required to manage a Kubernetes cluster, then the responsibility has to be split between them and the operators. Let’s look at four examples of this kind of distribution:

Not a developer responsibility

This is probably the hardest version for the operators’ team: the development team is only responsible for building the Docker image and pushing it to the correct container registry. Kubernetes on its own helps a lot in making sure that a new version rollout does not result in a broken application, via the deployment strategy and health checks. Still, if something breaks silently, it may be hard to figure out whether it is a cluster failure or a result of the application update, or even of a database model change.
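
To make the deployment strategy and health checks concrete, here is a rough sketch of a Deployment manifest built as a Python dictionary. The /healthz path, the port, the image name, and the helper name deployment_manifest are assumptions for illustration; the real values depend on the application.

```python
def deployment_manifest(name, image, port=8080, replicas=3):
    """Build a Deployment with a rolling update strategy and probes."""
    probe_target = {"path": "/healthz", "port": port}  # assumed endpoint
    container = {
        "name": name,
        "image": image,
        "ports": [{"containerPort": port}],
        # Traffic is only routed to a pod once this probe succeeds.
        "readinessProbe": {"httpGet": probe_target, "periodSeconds": 5},
        # The container is restarted if this probe keeps failing.
        "livenessProbe": {"httpGet": probe_target, "initialDelaySeconds": 10},
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            # Replace pods one at a time and never drop below the desired count.
            "strategy": {
                "type": "RollingUpdate",
                "rollingUpdate": {"maxUnavailable": 0, "maxSurge": 1},
            },
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {"containers": [container]},
            },
        },
    }


# Hypothetical application and image names.
manifest = deployment_manifest("orders", "registry.example.com/orders:1.2.0")
```

With settings like these, a new version that never becomes ready does not replace the running pods, but the probes only catch what the health endpoint actually checks, which is why silent breakage remains hard to attribute.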

Developers can manage deployments, pods, and configuration resources

This is a better scenario. When developers are responsible for the whole application deployment, creating the manifests and all configuration resources and doing the rollouts, they can and should run a smoke test afterward to make sure everything remains operational. Additionally, they can check the logs to see what went wrong and debug in the cluster.

This is also the point where the security or operations team needs to start thinking about securing the cluster. There are settings at the pod level that can elevate the workload’s privileges, change the group it runs as, or mount system directories. These settings can be restricted, for example, via Open Policy Agent. Obviously, there should be no access to other namespaces, especially kube-system, but this can easily be done with just the built-in RBAC.
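
The kind of check such a policy performs can be sketched in plain Python. This is not Open Policy Agent code (OPA policies are written in Rego) and the function name risky_pod_settings is hypothetical; it only illustrates which pod spec fields correspond to the settings mentioned above.

```python
def risky_pod_settings(pod_spec):
    """Return findings for settings that raise a pod's privileges.

    pod_spec is the `spec` section of a Pod manifest, as a plain dict.
    """
    findings = []

    # The pod-level security context applies to all containers unless overridden.
    pod_context = pod_spec.get("securityContext") or {}
    if pod_context.get("runAsGroup") == 0:
        findings.append("pod runs as group 0 (root group)")

    # hostPath volumes mount directories from the node's filesystem.
    for volume in pod_spec.get("volumes") or []:
        if "hostPath" in volume:
            path = volume["hostPath"].get("path", "?")
            findings.append(f"volume '{volume.get('name')}' mounts host path {path}")

    for container in pod_spec.get("containers") or []:
        context = container.get("securityContext") or {}
        name = container.get("name", "?")
        if context.get("privileged"):
            findings.append(f"container '{name}' is privileged")
        if context.get("allowPrivilegeEscalation"):
            findings.append(f"container '{name}' allows privilege escalation")
        if context.get("runAsGroup") == 0:
            findings.append(f"container '{name}' runs as group 0 (root group)")
    return findings
```

A policy engine or admission controller would reject a pod when a list like this is non-empty; here the function just reports what it found.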

Developers can manage all namespace level resources

If the previous version worked, maybe we can give developers more power? We can, especially when we create quotas on everything we can. Let’s first go through the additional resources that are now available and see whether any of them seem risky (we have stripped the uncommon ones for clarity). Below you can see them gathered in two groups:

Safe ones:

Job
PersistentVolumeClaim
Ingress
PodDisruptionBudget
DaemonSet
HorizontalPodAutoscaler
CronJob
ServiceAccount

The ones we recommend to block:

This is not really a definitive guide, just a hint. NetworkPolicy really depends on the network overlay configuration and the security rules we want to enforce. ServiceAccount is also debatable, depending on the use case. The other ones are commonly used to manage the resources in the shared cluster and access to it, so they should be available mainly to the cluster administrators.
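
Because RBAC rules are additive, "blocking" a resource in practice means not granting it. A minimal sketch of a namespaced Role covering only the resources from the safe group might look like this; the role name, the verb list, and the helper name developer_role are assumptions for the example, and the Role covers only these additional resources, not the deployments, pods, and configuration resources from the previous level.

```python
# Resources from the "safe" group, keyed by their API group
# ("" is the core group).
SAFE_RESOURCES = {
    "": ["persistentvolumeclaims", "serviceaccounts"],
    "batch": ["jobs", "cronjobs"],
    "apps": ["daemonsets"],
    "networking.k8s.io": ["ingresses"],
    "policy": ["poddisruptionbudgets"],
    "autoscaling": ["horizontalpodautoscalers"],
}

# Assumed verb set for full management; trim it for read-only access.
VERBS = ["get", "list", "watch", "create", "update", "patch", "delete"]


def developer_role(namespace, name="developer-extras"):
    """Build a Role that grants the safe resources in one namespace."""
    rules = [
        {"apiGroups": [group], "resources": sorted(resources), "verbs": VERBS}
        for group, resources in SAFE_RESOURCES.items()
    ]
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": rules,
    }


# Hypothetical namespace name.
role = developer_role("team-a")
```

Anything not listed, NetworkPolicy for example, stays unavailable to the developers unless another Role or ClusterRole grants it. If ServiceAccount turns out to be too permissive for a given use case, removing it from SAFE_RESOURCES is enough.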

DevOps multifunctional teams

Last but not least, the famous and probably the hardest approach to come by: multifunctional teams and a DevOps role. Let’s start with the first one. Moving some of the operators to work in the same team, in the same room, with the developers solves a lot of problems. There is no going back and forth trying to keep backlogs, sprints, and tasks in sync across multiple teams; the work is prioritized for the team and treated as a team effort. No more waiting three weeks for a small change because the whole ops team is busy with a mission-critical project. No more fighting for a change that is top priority for the project but gets pushed down the queue.

Unfortunately, this means each team needs its own operators, which may be expensive and is rarely possible. As a solution to that problem comes the mythical DevOps position: a developer with operator skills who can spend part of their time creating and managing the cluster resources, deployments, and CI/CD pipelines, and part of it working on the code. The required skill set is very broad, so it is not easy to find someone for that position, but it is becoming popular and may revolutionize the way teams work. Sad to say, this position is often described as an alias of the SRE position, which is not really the same thing.

Triage, delegate, and fix

The responsibility split is done, so now we only need to decide on the incident response scenarios: how we triage issues and figure out which team is responsible for fixing them (for example, by monitoring cluster health and associating it with the failure), how we alert, and, of course, the on-call schedules. There are a lot of tools available just for that.
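
As a toy illustration of associating cluster health with a failure, the sketch below assigns an incident to a team based on timing alone. Everything in it is hypothetical: the Incident fields, the 15-minute window, and the team labels are assumptions, and a real setup would rely on the monitoring and alerting tools mentioned above rather than on hand-written rules.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Incident:
    namespace: str
    started_at: datetime
    # Time of the most recent application rollout, if known.
    last_rollout_at: Optional[datetime] = None


def triage(incident: Incident,
           cluster_alerts: List[datetime],
           window: timedelta = timedelta(minutes=15)) -> str:
    """Pick the team that should look at the incident first."""
    # A cluster health alert close to the incident points at the platform.
    if any(abs(incident.started_at - alert) <= window for alert in cluster_alerts):
        return "operators"
    # A rollout shortly before the incident points at the application.
    if incident.last_rollout_at is not None:
        since_rollout = incident.started_at - incident.last_rollout_at
        if timedelta(0) <= since_rollout <= window:
            return "developers"
    # No clear signal: both teams look at it together.
    return "joint triage"
```

The value of even a crude rule like this is that the first responder is decided before the incident happens, not argued about during it.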

Eventually, there is always the question “whose cluster is it?” If everyone knows which field or part of the cluster they manage, then there are no misunderstandings and no blaming each other for a failure. And issues get resolved much faster.

Originally published at https://grapeup.com on August 18, 2020.
