Infrastructure as Code, but You Don't Have to Write That Code

郝ren

Original article: https://medium.com/@duplocloud/infrastructure-as-code-but-you-dont-have-to-write-that-code-87ec4fe94863

As we move to cloud-based infrastructure, more and more teams are looking to adopt some kind of infrastructure-as-code solution. This lets them keep all changes under version control and have a senior DevOps engineer review infrastructure changes before they are actually deployed.

While infrastructure-as-code has its advantages, it also comes with its own set of challenges. Based on conversations with VPs of engineering, CTOs, and DevOps leaders, we have classified their problems into a few categories:

Two Difficult Skills (Programming and Operations) Needed in One Person: Your DevOps team needs to be good at both programming and operations, so you are now hiring for two skills. In many cases, people are good at one and not so great at the other. A developer who has moved into a DevOps role is good at programming but may deploy less secure or sub-optimal infrastructure. On the other hand, when existing ops experts or system administrators learn to code, the result is often poorly organized code. That makes code management and future changes harder. A lot of code gets copy-pasted instead of being built from properly designed modules.

Scaling Adds Complexity: As the team scales, if the code is not structured properly, it gets harder for team members to make changes without stepping on each other. One needs to understand the overall code structure, its layout, and how it is split across files to know where to add more resources. If there is any churn in the team, new members may prefer a different layout and structure, which creates extra work. Doing this well requires following best practices. Here is a detailed article from HashiCorp on how to organize your code for scale: https://www.hashicorp.com/resources/terraform-workflow-best-practices-at-scale/ Although it is a very interesting read, the guidelines require proper processes and training to be in place for all team members.

Changes Take Time: Small changes or enhancements take much longer because every minor change has to go through the full code, review, test, check-in, and run process. As infrastructure grows, one needs to look at code organization, restructuring, and adding modules to keep the code DRY, and the role of an expert reviewer becomes even more critical. To test changes, one also needs to write test code and maintain it. With coding, reviews, and testing, managing infrastructure-as-code eventually becomes a complete software development project in its own right. As an example of how to write unit tests for Terraform code, see: https://www.terraform.io/docs/extend/testing/unit-testing.html One read through this article and you will see why infrastructure-as-code is not suitable for all teams and requires a certain amount of expertise to do well.

Security and Compliance Can't Be an Afterthought: Your team needs to know how to deploy infrastructure in a secure manner. Many security and compliance controls need to be met at provisioning time. If mistakes made during provisioning are only detected later in an audit, the fixes can take weeks to months. For example, if the VPCs or subnets are not properly configured, redoing that work requires careful planning and execution so as not to lose IP addresses or connectivity to machines during the transition. Applying security and compliance as an afterthought, after the initial provisioning, is simply not a scalable long-term solution.
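
To make the VPC and subnet example concrete, here is a minimal sketch of planning subnets up front and checking a set of CIDR blocks for overlaps before anything is provisioned. It uses only Python's standard `ipaddress` module; the function names `plan_subnets` and `find_overlaps` and the sample address ranges are assumptions made for illustration, not part of any tool mentioned in this article.

```python
import ipaddress
from itertools import islice


def plan_subnets(vpc_cidr: str, new_prefix: int, count: int) -> list[str]:
    """Carve `count` non-overlapping subnets of size /new_prefix out of a VPC CIDR."""
    vpc = ipaddress.ip_network(vpc_cidr)
    # subnets() yields consecutive, non-overlapping blocks in address order.
    subnets = [str(s) for s in islice(vpc.subnets(new_prefix=new_prefix), count)]
    if len(subnets) < count:
        raise ValueError(f"{vpc_cidr} cannot hold {count} subnets of /{new_prefix}")
    return subnets


def find_overlaps(cidrs: list[str]) -> list[tuple[str, str]]:
    """Return every pair of CIDR blocks that overlap each other."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return [
        (str(a), str(b))
        for i, a in enumerate(nets)
        for b in nets[i + 1:]
        if a.overlaps(b)
    ]


if __name__ == "__main__":
    # Hypothetical VPC range, used only as an example.
    print(plan_subnets("10.0.0.0/16", 24, 3))
    print(find_overlaps(["10.0.0.0/24", "10.0.0.128/25", "10.0.1.0/24"]))
```

Running a check like this before provisioning is far cheaper than discovering an overlapping or undersized subnet in a later audit, when machines already hold addresses in it.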

Code to No-Code

Given the advances in machine-based automation and learning, it shouldn't be very hard to have an intelligent program or bot take care of the code for us.

Ultimately, a lot of expertise about infrastructure deployment can be captured as rules or as a knowledge graph that a program can use. The bot should be able to understand a high-level requirement or specification for the application deployment, provision all the underlying infrastructure in a fully secure and compliant manner, and ultimately generate Terraform or other infrastructure-as-code output for the team to keep.
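
As a rough illustration of expertise captured as rules, the sketch below turns a small high-level specification into a list of resources with secure defaults already applied. Everything in it is hypothetical: the `AppSpec` fields, the resource dictionaries, and the rules themselves are assumptions chosen to show the idea, and this is not a description of how any particular product works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppSpec:
    """Hypothetical high-level description of what the application needs."""
    name: str
    needs_database: bool
    public: bool


def provision_plan(spec: AppSpec) -> list[dict]:
    """Apply a few hard-coded rules to produce resources with secure defaults."""
    resources = [{"type": "service", "name": spec.name}]

    # Rule: only applications declared public get a load balancer, and it
    # listens on HTTPS only.
    if spec.public:
        resources.append(
            {"type": "load_balancer", "name": f"{spec.name}-lb", "listener_port": 443}
        )

    # Rule: databases are never publicly accessible and are always encrypted.
    if spec.needs_database:
        resources.append(
            {
                "type": "database",
                "name": f"{spec.name}-db",
                "publicly_accessible": False,
                "storage_encrypted": True,
            }
        )
    return resources


if __name__ == "__main__":
    for resource in provision_plan(AppSpec("orders", needs_database=True, public=False)):
        print(resource)
```

The point of the design is that the person writing the specification never gets the chance to choose an insecure option; the rules decide those settings.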

Once the bot produces the right output, it should be fairly easy to change a few variables and deploy that infrastructure again to create a dev or staging environment that looks identical to production. Essentially, the code still gets written, but not necessarily by humans.
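
One way to picture "changing a few variables" is a shared set of base values with a small override per environment, rendered as JSON that could be saved as a Terraform variable file (Terraform reads variable files ending in `.tfvars.json`). This is a minimal sketch; the variable names, instance types, and counts are assumptions made up for the example.

```python
import json

# Hypothetical production baseline; every environment starts from these values.
BASE = {
    "instance_type": "m5.large",
    "instance_count": 4,
    "enable_deletion_protection": True,
}

# Only the differences from production are listed per environment.
OVERRIDES = {
    "production": {},
    "staging": {"instance_count": 2},
    "dev": {
        "instance_type": "t3.small",
        "instance_count": 1,
        "enable_deletion_protection": False,
    },
}


def variables_for(env: str) -> dict:
    """Merge the baseline with the overrides for one environment."""
    if env not in OVERRIDES:
        raise KeyError(f"unknown environment: {env}")
    return {**BASE, **OVERRIDES[env], "environment": env}


def render_tfvars_json(env: str) -> str:
    """Render the merged variables as JSON text."""
    return json.dumps(variables_for(env), indent=2, sort_keys=True)


if __name__ == "__main__":
    print(render_tfvars_json("staging"))
```

Because staging and dev are defined only by their differences, they stay structurally identical to production as the baseline evolves.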

Well-Architected Design

Public cloud vendors such as AWS publish well-architected design guidelines. See: https://aws.amazon.com/architecture/well-architected/

These best practices can be learned, measured against a deployment, and implemented where they are not yet being used. AWS and many of its partners offer this as a free service. After all, poorly designed infrastructure on AWS also makes AWS look bad: if customers get hacked on a cloud often enough, the cloud vendor's reputation is tarnished through no fault of its own. In fact, AWS even has many tools available to help with the security audit of a customer's infrastructure. Common things these tools test include network ACLs, security groups, open ports, public access to databases, public access to S3 buckets, and so on.
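
To give a feel for what such a check looks like, here is a minimal sketch that scans a list of ingress rules for sensitive ports open to the whole internet. It works on plain dictionaries rather than calling any cloud API; the rule shape (`group`, `cidr`, `from_port`, `to_port`) and the port list are assumptions for illustration and do not reproduce how any AWS tool is implemented.

```python
import ipaddress

# Hypothetical list of ports that should never be reachable from anywhere.
SENSITIVE_PORTS = {22: "SSH", 3306: "MySQL", 5432: "PostgreSQL"}


def is_world_open(cidr: str) -> bool:
    """True for 0.0.0.0/0 or ::/0, i.e. a block that matches every address."""
    return ipaddress.ip_network(cidr).prefixlen == 0


def audit_ingress(rules: list[dict]) -> list[str]:
    """Return one finding per sensitive port exposed to the whole internet."""
    findings = []
    for rule in rules:
        if not is_world_open(rule["cidr"]):
            continue
        for port, service in SENSITIVE_PORTS.items():
            if rule["from_port"] <= port <= rule["to_port"]:
                findings.append(
                    f"{rule['group']}: {service} port {port} open to {rule['cidr']}"
                )
    return findings


if __name__ == "__main__":
    sample_rules = [
        {"group": "web", "cidr": "0.0.0.0/0", "from_port": 443, "to_port": 443},
        {"group": "db", "cidr": "0.0.0.0/0", "from_port": 5432, "to_port": 5432},
        {"group": "bastion", "cidr": "203.0.113.0/24", "from_port": 22, "to_port": 22},
    ]
    for finding in audit_ingress(sample_rules):
        print(finding)
```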

It is quite challenging for any human to learn all these guidelines and keep up with the new services, and their requirements, that come out every month. As a result, errors during the deployment phase are common. In most cases, SecOps teams use a set of tools to scan the complete infrastructure periodically with a read-only account and check it against a set of guidelines that should be met. These tools then flag a list of violations at different severity levels, and the team has to go through and analyze each one of them.

What a waste of time and resources for everyone.

Instead, why not have a tool that does the deployment while meeting all the well-architected design principles from provisioning time onward? You should never have to get into a bad state and then run continuous analysis tools to detect and fix problems.

Prevention Instead of (Detection + Remediation)

Instead of writing a lot of code to build infrastructure, making sure that the code is correct, analyzing the deployment with a separate set of tools, fixing the findings in the code, re-deploying that code, and repeating this whole process over and over again, we need an automated bot that takes care of infrastructure provisioning, application deployment, adding security controls to the underlying VMs or containers, and alerting when something goes wrong.

We need to go from a detection mindset to a prevention mindset. As a doctor would say, that is the only way to a healthy lifestyle!

Figure A shows the current state of the art for deploying applications in a cloud, and Figure B shows what should happen.

**Figure A: Lifecycle of Application Deployment using Infrastructure-as-code**

DuploCloud has built a solution, as shown in Figure B, that gives you the benefits of infrastructure-as-code without any of its drawbacks. We believe that this is how applications and infrastructure should be deployed in the cloud. If you agree with this new approach and think this is a journey you would like to be on, please reach out to us.

We can't promise how many users your application will have, but we can promise that your developers and operations team will not have to worry about whether security, compliance, and infrastructure follow the right set of principles and guidelines. They will have a lot more free time to focus on what matters to the business instead of worrying about whether the infrastructure or application deployment is being done the right way.

We can promise to give a lot more time back to your developers and operations team, which they can spend on activities that really matter to the business instead of on security, compliance, and infrastructure provisioning.

Conclusion

As tools like Terraform, Ansible, and CloudFormation have made infrastructure-as-code a dominant design pattern, we are also realizing the limitations and challenges that come with using such tools, specifically in areas such as balancing the required skill sets, scaling as the code base grows, the time it takes to make changes, and security being treated as an afterthought. Furthermore, these tools lead teams to use even more tools to check their work for security issues, compliance, and well-architected design principles.

We are trying to create a world where all infrastructure provisioning and application deployment tasks can be done by an intelligent bot, which does them the right way from day 1 and does not require continuous checking. More than two dozen of our customers use DuploCloud every day, managing more than a million dollars of AWS spend every year while doing 5,000+ deployments per week.

Finally, there is an old saying that applies to both personal health and cloud infrastructure health: prevention is better than cure!