How to Automate Your DWH Deployments

Latest recommended article published on 2024-09-21 17:56:48

weixin_26739079

Original article: https://codeburst.io/how-to-automate-your-dwh-deployments-5f1b70d1ebab

Some time ago, I wrote about the Challenges in Automating Multi-Schema Database Deployments. It was about why automated database deployments are crucial to efficient development processes and why they are challenging to establish with multi-schema applications.

After implementing many automated DWH deployment processes, I have summarized the most important steps to get there. I also explain why technical implementation alone is not enough, and why the accompanying process changes are crucial.

A deployment process from hell

Manual deployments include all the processes that require manual steps to install your software into production. Those manual steps can be:

Emailing the install scripts to other people (lack of a centralized repository).
Manual execution of installation scripts (lack of automated pipelines).
Copying files around (lack of a centralized repository).
Sending emails (lack of automated processes).
Having people outside the team execute the installation (lack of ownership).

Figure: Example of a manual deployment process where communication happens over email.

The picture above shows an example of a manual deployment process. A team member emails an installation package to the DBA, who manually installs the scripts on the target database. The DBA then emails back to say whether the deployment succeeded. If it did not, the DBA attaches a log file with the error. The team member then has to track down the problem and send a fix to the DBA again…

Such a deployment process has many problems:

The involvement of other teams is very time-consuming. Development team members wait a long time, which makes the process very expensive if it is repeated often.
Manual executions are prone to errors. They are not repeatable and therefore don't reduce risk throughout the deployment pipeline.
Automated testing is not possible because no mechanism ensures automatic test execution.

Multi-Schema Database Applications

Multi-schema database applications are applications that include more than one database schema. The schemas share data or access data across schema boundaries but are physically on the same database instance. This is typically seen in data warehouses, as shown in the picture below, where each non-grey box represents a database schema. Those schemas are used by different applications that share and exchange data (DWH Core and Marts).

If a data warehouse uses separate schemas for its layers, there is most likely one schema that holds all the metadata and configuration data that the ETL pipelines and data loads need. Standard functionality such as logging is usually installed in such a separate schema as well.

This approach leads to an application that stretches across multiple schemas and is therefore called a multi-schema database application.

The data marts will most likely also have some conformed dimensions that are physically referenced from multiple tables. Conformed dimensions let different tables refer to the same structure, attributes, values, definitions, and concepts, such as time and dates. Because of this, they are used across many fact tables. So even if the individual data marts could be split into separate schemas, every mart would still depend on the conformed dimensions.
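One consequence of these cross-schema dependencies is that an automated installation has to respect them. The following is a minimal sketch, not the author's method: it assumes the dependencies are written down as a mapping and uses Python's standard `graphlib` module (Python 3.9+) to derive an order in which dependencies are installed first. All schema names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical schema names; each schema maps to the schemas it depends on.
SCHEMA_DEPENDENCIES = {
    "metadata": set(),
    "logging": set(),
    "dwh_core": {"metadata", "logging"},
    "conformed_dims": {"dwh_core"},
    "mart_sales": {"conformed_dims"},
    "mart_finance": {"conformed_dims"},
}


def deployment_order(dependencies):
    """Return the schemas in an order where dependencies come first."""
    return list(TopologicalSorter(dependencies).static_order())


def is_valid_order(order, dependencies):
    """Check that every schema appears after all schemas it depends on."""
    position = {schema: index for index, schema in enumerate(order)}
    return all(
        position[dep] < position[schema]
        for schema, deps in dependencies.items()
        for dep in deps
    )


order = deployment_order(SCHEMA_DEPENDENCIES)
print(order)
```

The exact order among independent schemas (for example the two marts) is not fixed, which is why the sketch checks validity rather than comparing against one expected list.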

Advantages of automated deployment

The goal of continuous integration and automated deployments is to achieve stable, high-quality releases. Automation also saves time by eliminating repetitive manual tasks. Improved stability and higher quality should lead to less unplanned work. And because automation saves time, it also saves money: the developers' time can be used to create actual value for the product we are working on.

In their book “Accelerate: The Science of Lean Software and DevOps”, Nicole Forsgren, Jez Humble and Gene Kim published the research findings from their “State of DevOps Reports”. They found that high performance in development teams is possible with all kinds of systems and is independent of the technologies they use.

“High performance is possible with all kinds of systems and is independent of the technologies the teams use!” (Accelerate: The Science of Lean Software and DevOps)

Another interesting finding is that the poorest organizational performance and culture are found where code deployments are most painful. So if your deployment process hurts, the culture around it most likely isn't the best either.

“Where code deployments are most painful, you’ll find the poorest organizational performance and culture” (Accelerate: The Science of Lean Software and DevOps)

The third finding I want to highlight is that software delivery performance positively impacts organizational performance. This finding implies that with automation and better development and delivery processes, not only do the software teams improve their performance but the whole organization profits from those improvements.

“Software delivery performance positively impacts organizational performance!” (Accelerate: The Science of Lean Software and DevOps)

Three stages of optimizing your database deployment

If you want to improve your continuous integration and delivery process, certain preconditions have to be established. The most basic one concerns how changes are handled. If multiple changes are bundled into one change package (see the image below, left stage), this is the first thing to optimize. Every change in the application code needs to be treated as an individual change, for example one change or deployment per implemented feature. This way, failing features don't stop the successful features from getting installed in production. This split results in the second stage in the image below. The third stage is automating the deployment of all those changes.
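The benefit of one deployment per feature can be illustrated with a small sketch. It is only an illustration under assumptions: the `Change` class, the feature names, and the install callables are hypothetical stand-ins for real install scripts. Each change is installed on its own, so a failure is recorded without blocking the others.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Change:
    """One independently deployable change, e.g. a single feature (hypothetical)."""
    name: str
    install: Callable[[], None]  # raises an exception if installation fails


def deploy_changes(changes):
    """Deploy each change on its own and report successes and failures."""
    succeeded, failed = [], []
    for change in changes:
        try:
            change.install()
        except Exception as error:
            # A failing change is recorded, but the loop continues.
            failed.append((change.name, str(error)))
        else:
            succeeded.append(change.name)
    return succeeded, failed


def broken_install():
    raise RuntimeError("column already exists")


changes = [
    Change("feature_a", lambda: None),
    Change("feature_b", broken_install),
    Change("feature_c", lambda: None),
]
succeeded, failed = deploy_changes(changes)
print(succeeded)  # ['feature_a', 'feature_c']
print(failed)     # [('feature_b', 'column already exists')]
```

With a single bundled change package, the failure of `feature_b` would have held back `feature_a` and `feature_c` as well.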

How to get there

But how do you accomplish automated deployments? There are two crucial things to implement:

Establish the Practices of Continuous Integration

Martin Fowler describes very well what is necessary to implement a continuous integration process that's robust and reliable.

Build Deployment Pipelines

Pipelines are needed to automate the installations all the way to production. Most likely three kinds of pipelines are required, as described in the image below.

Continuous Integration Pipelines ensure that changes on feature branches are deployable and don't break anything.

If your application is a database, a build is the same thing as a deployment.

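A continuous integration check for a database can therefore be as simple as installing the scripts on a throwaway database and seeing whether that works. The sketch below is a minimal illustration, assuming SQLite as a stand-in for the real DWH database so that it runs anywhere; the table names and the `build_database` helper are hypothetical.

```python
import sqlite3


def build_database(scripts):
    """Apply install scripts in order to a fresh in-memory database.

    Returns (True, None) if all scripts ran, else (False, error message).
    """
    connection = sqlite3.connect(":memory:")
    try:
        for script in scripts:
            connection.executescript(script)
        return True, None
    except sqlite3.Error as error:
        return False, str(error)
    finally:
        connection.close()


# Hypothetical install scripts for a feature branch.
good_scripts = [
    "CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, day TEXT);",
    "CREATE TABLE fact_sales (date_id INTEGER REFERENCES dim_date(date_id), amount REAL);",
]
# The last script references a table that does not exist, so the build fails.
bad_scripts = good_scripts + ["ALTER TABLE missing_table ADD COLUMN note TEXT;"]

print(build_database(good_scripts)[0])  # True
print(build_database(bad_scripts)[0])   # False
```

In a real pipeline the same idea would run against a disposable instance of the actual target database, since SQLite does not behave like a multi-schema DWH platform.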

Nightly Build Pipelines verify the health of the files in your main branch and ensure that the whole application is installable.

Deployment Pipelines let you deploy automatically through your environments, from test to integration to production, in a safe and repeatable way. Always the same way.
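The "always the same way" idea can be sketched as one routine that is reused for every environment. This is an illustration only: the `promote` function and the `deploy` callables are hypothetical, and the environment order is taken from the text above. The promotion stops at the first failure, so a broken change never reaches a later stage.

```python
ENVIRONMENTS = ["test", "integration", "production"]  # order taken from the text


def promote(change_name, deploy, environments=ENVIRONMENTS):
    """Deploy one change to each environment in order with the same routine.

    Stops at the first failure so a broken change never reaches later stages.
    Returns the list of environments that were deployed successfully.
    """
    deployed = []
    for environment in environments:
        if not deploy(change_name, environment):
            break
        deployed.append(environment)
    return deployed


def always_ok(change_name, environment):
    """Hypothetical deploy routine that always succeeds."""
    return True


def fails_in_integration(change_name, environment):
    """Hypothetical deploy routine that fails in the integration environment."""
    return environment != "integration"


print(promote("feature_a", always_ok))             # ['test', 'integration', 'production']
print(promote("feature_b", fails_in_integration))  # ['test']
```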

More about the DevOps Report Studies can be found on the DORA Research Program website.
