Handling databases from the developer's perspective

When you develop a new feature using databases, the development cycle usually includes (but is not limited to) the following steps:

Writing an SQL migration → writing source code → testing → release → monitoring.

In this article, I want to share some practical tips on shortening the cycle time for each of these stages without any loss in quality — instead, we would like to improve it.

Since we are working with PostgreSQL and our server code is written in Java, the examples will be based on this stack. However, most of these ideas will apply regardless of the database or programming language you are using.

SQL migration

The first development stage after designing the feature is writing an SQL migration. The most basic advice is not to make any manual changes to the data schema, but always to make changes through scripts that are all stored in one place.

Developers write SQL migrations themselves at our company, so migrations are stored in a repository alongside the source code. In some organizations, the schema is managed by database administrators; in that case, the migration registry is stored somewhere on their side. In both cases, this approach brings the following benefits:

  • You can always easily create a new database from scratch or upgrade existing ones to the current version. This allows you to deploy new test environments and local development environments quickly.
  • All databases have the same schema, which means no surprises in maintenance.
  • You can see the full history of all changes (versioning).

There are several available tools to automate this process, both commercial and free: Flyway, Liquibase, Sqitch, and others. We’re not going to compare and choose the best tool in this article because that is a big, separate topic, and many other articles have already been written about it.

We’re using Flyway, so here’s some information on it.

  • There are two types of migration: SQL-based and Java-based.

  • SQL migrations are immutable. After the first execution, an SQL migration cannot be changed. Flyway calculates the checksum for a migration file’s content and checks it for every run. For Java migrations to be immutable, additional manual handling is required.

  • The history of all migrations is stored in the flyway_schema_history (formerly schema_version) table. You can find the date and duration of each migration and its type, file name, and checksum there.
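For example, this history table can be queried with plain SQL; here is a small sketch (the column names follow Flyway's default flyway_schema_history layout):

```sql
-- Find the ten slowest migrations ever applied to this database
SELECT version, description, type, execution_time, installed_on
FROM flyway_schema_history
WHERE success
ORDER BY execution_time DESC
LIMIT 10;
```

Queries like this are handy for spotting migrations that will dominate test-database startup time later on.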

Under our internal arrangements, all changes to the data schema are made only through SQL migrations. Their immutability ensures that the current schema is always entirely identical across all environments.

Java migrations are only used for DML when you can’t write in plain SQL. A typical case is a migration to Postgres from another database (we are moving from Redis to Postgres, but that’s a story for another day). One more example is updating the data for a large table in several transactions to minimize the table lock time. It’s worth mentioning that since Postgres 11, this can be done with SQL procedures written in plpgsql.

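As a sketch of the latter (assuming a hypothetical events table with a boolean processed column), a Postgres 11+ procedure can commit between batches so that no single transaction locks the whole table:

```sql
-- Batched update as an SQL procedure (Postgres 11+, illustrative names)
CREATE PROCEDURE backfill_events()
LANGUAGE plpgsql
AS $$
DECLARE
    updated integer;
BEGIN
    LOOP
        UPDATE events SET processed = true
        WHERE id IN (SELECT id FROM events
                     WHERE NOT processed
                     LIMIT 10000);
        GET DIAGNOSTICS updated = ROW_COUNT;
        EXIT WHEN updated = 0;
        COMMIT;  -- release locks after each batch
    END LOOP;
END;
$$;

CALL backfill_events();
```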
When Java code becomes obsolete, the migration can be removed because we don’t want to produce legacy code (the Java class for the migration remains, but it’s blank). We set this to happen no sooner than a month after performing the migration in production — we believe this is enough time for all test environments and local development environments to update. Since Java migrations are used only for DML, its removal does not affect the creation of new databases from scratch.

An important note for those who use pg_bouncer

During the migration, Flyway acquires a lock to prevent multiple migrations from co-executing. To simplify, it works like this:

  • A lock is acquired.
  • A transactional migration is executed.
  • The lock is released.

For Postgres, Flyway uses advisory locks in session mode, which means that, to work properly, the lock must be released on the same connection that acquired it. If you use pg_bouncer in transaction mode (which is most common) or in statement mode, it can return a new connection for each transaction, and Flyway will not be able to release the lock.

To solve this problem, we use a small dedicated connection pool on pg_bouncer in session mode, which is intended for migrations only. On the application side, there is also a separate pool containing one connection that closes on a timeout after the migration is complete in order not to hold resources.

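A minimal sketch of what this looks like in pgbouncer.ini (all names are illustrative; pg_bouncer accepts per-database pool_mode and pool_size overrides):

```ini
[databases]
; main pool used by the application, transaction mode
app_db        = host=127.0.0.1 port=5432 dbname=app_db
; small dedicated pool for Flyway migrations, session mode
app_db_flyway = host=127.0.0.1 port=5432 dbname=app_db pool_mode=session pool_size=2

[pgbouncer]
pool_mode = transaction
```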
Writing source code

The migration is in place, so now we can write the code.

There are three approaches for working with the database on the application side:

  • using an ORM (for Java, Hibernate is the de facto standard)
  • using plain SQL + JdbcTemplate or the like
  • using DSL libraries

Using ORM reduces your SQL skill requirements — a lot of things are generated automatically:

  • The data schema can be created from an XML description or a Java entity available from the code.
  • Object relations are defined using a declarative description — ORM will make joins for you.
  • With Spring Data JPA, even tricky queries can be generated automatically by the repository method’s signature.
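As a sketch of the last point (assuming a hypothetical Book JPA entity with title and language fields), Spring Data JPA derives the query from the method name alone; note this fragment is not self-contained — it requires Spring Data JPA on the classpath:

```java
import java.util.List;
import org.springframework.data.jpa.repository.JpaRepository;

// Hypothetical repository: no SQL or JPQL is written by hand.
// Spring Data JPA parses the method name and generates roughly:
//   SELECT b FROM Book b WHERE b.language = ?1 ORDER BY b.title
public interface BookRepository extends JpaRepository<Book, Long> {
    List<Book> findByLanguageOrderByTitle(String language);
}
```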

One more bonus is data caching out of the box (Hibernate, for example, offers three levels of caching).

But it is important to note that ORM, like any other powerful tool, requires specific skills to use. Without proper customization, the code is still likely to work, but far from optimally.

The opposite option is to write SQL manually. This allows you to control the queries fully — your queries will be performed exactly as you wrote them, without any surprises. But that increases the amount of manual labor and the requirements for the developers’ qualifications.

DSL libraries

Somewhere between these two approaches is another one: using DSL libraries (jOOQ, Querydsl, etc.). These are usually much more lightweight than ORM, but more convenient than completely manual work with a database. Their usage is not so widespread, so in this article, we will briefly consider this approach.

We’re going to discuss jOOQ, which offers

  • database inspection and automatic generation of classes
  • a fluent API for writing queries

Since jOOQ is not an ORM, there is no auto-generation of queries or caching, but some of the manual approach problems are fully covered:

  • classes for tables, views, functions, and other database objects are generated automatically;
  • queries are written in Java, which ensures type safety — if the query is syntactically incorrect or has a wrong type parameter, it will not compile — your IDE will immediately indicate an error. You won’t have to spend time launching the application to check if the query is correct. This speeds up the development process and reduces the error probability.

Queries in code look like this:

BookRecord book = dslContext.selectFrom(BOOK)
    .where(BOOK.LANGUAGE.eq("DE"))
    .orderBy(BOOK.TITLE)
    .fetchAny();

You can also use plain SQL if you want to:

Result<Record> records = dslContext.fetch("SELECT * FROM BOOK WHERE LANGUAGE = ? ORDER BY TITLE LIMIT 1", "DE");

Obviously, in this case, the query’s correctness and the results parsing are completely on you.

jOOQ Record and POJO

BookRecord, in the example above, is a wrapper for the row in the “book” table that implements the active record pattern. Because this class is a part of the data access layer (moreover, its specific implementation), you might not want to pass it to other layers of the application but rather use your POJO. For an easy record ↔ POJO conversion, jOOQ offers several mechanisms: automatic ones and a manual one. The documentation referenced above contains various examples for reading, but there are no examples for inserting new data or updating it. Let us fill in this gap:

private static final RecordUnmapper<Book, BookRecord> unmapper =
    book -> new BookRecord(book.getTitle(), ...); // some kind of logic

public void create(Book book) {
    context.insertInto(BOOK)
        .set(unmapper.unmap(book))
        .execute();
}

As you can see, it’s simple enough.

This approach allows you to hide implementation details within a data access layer class and avoid leaks to other application layers.

jOOQ can also generate DAO classes with a set of basic methods to simplify data handling for a table and reduce the amount of manually written code (very similar to Spring Data JPA):

public interface DAO<R extends TableRecord<R>, P, T> {
    void insert(P object) throws DataAccessException;
    void update(P object) throws DataAccessException;
    void delete(P... objects) throws DataAccessException;
    void deleteById(T... ids) throws DataAccessException;
    boolean exists(P object) throws DataAccessException;
}

We don’t use auto-generated DAO classes at our company — we only generate wrappers for database objects and write the queries manually. Wrapper generation takes place every time the dedicated Maven module for the migrations is rebuilt. Later in this article, there are more details on how we implemented the wrapper generation.

Testing

Testing is an essential part of the development process. Useful tests guarantee the quality of your code and save time in its further maintenance. The opposite is also true: wrong tests may create an illusion of high-quality code, conceal errors, and slow down the development process. That’s why it is not enough just to decide that you’re going to write tests — you need to do it right. However, the proper test concept is very vague, and everyone has their idea of what it means.

The same is true for the classification of tests. In this article, we suggest the following classification:

  • unit testing
  • integration testing
  • end-to-end testing

Unit testing implies checking the functionality of each module separately from others. Once again, there is no single opinion on the size of the module. Some believe it should be a single method; others that it should be a class. Separation means that all other modules will be replaced with mocks or stubs in testing. Follow this link to read Martin Fowler’s article about the difference between the two. Unit tests are small and fast, but they can only guarantee that an individual unit’s logic is correct.

Integration testing, unlike unit testing, allows checking how several modules interact with each other. Working with a database is an excellent example of a situation where integration testing makes sense because it’s challenging to properly mock a database so that every critical detail is taken into account. In most cases, integration tests for databases make a good compromise between execution speed and quality assurance compared to other types of testing. That’s why we will discuss this type of testing more thoroughly in this article.

End-to-end testing is the most extensive type of testing. To carry it out, you need to set up an entire environment. It guarantees the highest level of confidence in the product quality, although it is the slowest and most expensive type of test.

Integration testing

When it comes to implementing integration testing for code that works with a database, most developers have to address the same questions: how to start the database, initialize its state, and how to do so as quickly as possible.

Some time ago, it was common to use H2 in integration testing. H2 is an in-memory database written in Java that has modes that allow compatibility with most popular databases. It’s versatile and doesn’t require installing a database, making it a very convenient replacement for actual databases, especially in applications that don’t depend on a specific database and use only what is included in the SQL standard (which is not always the case).

The trouble begins when you start to use some tricky database features that are not supported in H2 or completely new ones from a fresh release. In general, since this is a “simulation” of a specific DBMS, it might not perfectly reflect the behavior of that DBMS.

Another option is to use embedded Postgres. This is the real Postgres shipped as an archive; it also does not require installation. You may work with it as you would with a regular Postgres version.

There are several implementations thereof. The most popular are the ones developed by Yandex and OpenTable. We used the Yandex version. Its disadvantages are a rather slow start (the archive is unpacked every time, and launching the database takes 2–5 seconds, depending on the computer’s capacity) and the lag behind official release versions. We also encountered an error that sometimes occurred after an attempt to stop it from code: the Postgres process would keep running in the OS until we killed it manually.

Testcontainers

The third option is to use docker. The Testcontainers library provides an API for working with docker containers from your Java code. This way, any dependency of your application that has a docker image can be replaced in tests with Testcontainers. Also, for many popular technologies there are separate ready-to-use classes that provide a more convenient API, depending on the image used.

By the way, when the Testcontainers project became popular, Yandex developers officially announced that they were stopping the development of their embedded Postgres project and advised everyone to switch to Testcontainers.

What are the pros:

  • Testcontainers is fast (starting an empty Postgres takes less than a second).

  • The Postgres community releases official docker images for each new version.

  • Testcontainers has a dedicated process that kills hanging containers after the JVM shuts down, unless you have killed them yourself.

  • With Testcontainers, you can use a unified approach to test your application’s external dependencies, which makes things easier.

An example of a test with Postgres:

@Test
public void testSimple() throws SQLException {
    try (PostgreSQLContainer<?> postgres = new PostgreSQLContainer<>()) {
        postgres.start();
        ResultSet resultSet = performQuery(postgres, "SELECT 1");
        int resultSetInt = resultSet.getInt(1);
        assertEquals("A basic SELECT query succeeds", 1, resultSetInt);
    }
}

If there is no separate class for the image in Testcontainers, then creating a container looks like this:

public static GenericContainer redis = new GenericContainer("redis:3.0.2").withExposedPorts(6379);

Testcontainers has extra support for JUnit4, JUnit5, and Spock, making writing tests easier if you are using these frameworks.

Speeding up tests with Testcontainers

Switching from embedded Postgres to Testcontainers made our tests faster because the Postgres launch took less time. However, over time, the tests began to slow down again due to the growing number of SQL migrations that Flyway performs at startup. When the number of migrations exceeded one hundred, the execution time was about 7–8 seconds, which significantly slowed down the tests. Here’s how it worked:

  1. Before the next test class, a “clean” container with Postgres was launched.
  2. Flyway performed the migrations.
  3. Tests for this class were executed.
  4. The container was stopped and removed.
  5. All of this was repeated for the next test class.

Obviously, over time, the second step took longer and longer.

While trying to solve this problem, we realized that it is enough to perform migrations only once before all tests, save the container’s state, and then use it for all tests. So we changed the algorithm:

  1. A “clean” container with Postgres launches before all tests.
  2. Flyway performs the migrations.
  3. The container state is saved.
  4. Before the next test class, a previously prepared container is launched.
  5. Tests for this class are executed.
  6. The container stops and is removed.
  7. The algorithm repeats from step 4 for the next test class.

Now the execution time of an individual test doesn’t depend on the number of migrations, and with the current number of migrations (200+), the new algorithm saves several minutes on each run of all tests.

Here are some advanced tips on how to implement it.

Docker has a built-in mechanism for creating a new image from a running container using the commit command. This allows you to customize the images, for example, by changing some settings.

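On the command line, the same mechanism looks like this (container and image names are illustrative):

```shell
# Snapshot a running container as a new image
docker commit \
  --message "Postgres with all migrations applied" \
  running-postgres-container my-registry/postgres-prepared:latest

# Subsequent test runs can start from the prepared image
docker run --rm -d -p 5432:5432 my-registry/postgres-prepared:latest
```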
An important detail is that this command doesn’t save the data of mounted volumes. If you use the official Postgres docker image, the PGDATA directory in which the data is stored is located on such a mounted volume, so that data is not lost after a restart. Consequently, when the commit is executed, the state of the database will not be saved.

The solution is to not use the partition for PGDATA but to keep the data in memory, which is quite normal for tests. There are two ways to achieve this: (1) use your dockerfile (it may look like this) without creating a directory, or (2) override the PGDATA variable when starting the official container (the former directory will still be made, but won’t be used). The second way seems much simpler:

PostgreSQLContainer<?> container = ...
container.addEnv("PGDATA", "/var/lib/postgresql/data-no-mounted");
container.start();

It is recommended to make a checkpoint for Postgres before committing to copy the shared buffers’ changes to the “disk” (which matches the overridden PGDATA variable):

container.execInContainer("psql", "-c", "checkpoint");

This is how the commit is executed:

CommitCmd cmd = container.getDockerClient().commitCmd(container.getContainerId())
    .withMessage("Container for integration tests. ...")
    .withRepository(imageName)
    .withTag(tag);
String imageId = cmd.exec();

It should be noted that the approach of using prepared images can be implemented for many other images, which will also save time during integration tests.

A few more words about optimizing build time

As mentioned earlier, when the separate Maven module with migrations is built, Java wrappers are generated for the database objects, among other things. We use a custom Maven plugin that runs before the main code is compiled and performs three actions:

  1. It launches a “clean” docker container with Postgres.
  2. It launches Flyway to perform SQL migrations for all databases, thereby checking their validity.
  3. It launches jOOQ to inspect the database schema and generate Java classes for tables, views, functions, and other schema objects.

As you can see, the first two actions are identical to those performed when the tests start. To save the time spent on creating the container and running migrations before tests, we moved the saving of the container state into the plugin. This means that the prepared images of all databases used in the code appear in the local Docker image repository immediately after the module is rebuilt.

A more detailed code example

An adapter for PostgreSQLContainer for use in the plugin:

@ThreadSafe
public class PostgresContainerAdapter implements PostgresExecutable {
    private static final String ORIGINAL_IMAGE = "postgres:11.6-alpine";


    @GuardedBy("this")
    @Nullable
    private PostgreSQLContainer<?> container; // not null if it is running


    @Override
    public synchronized String start(int port, String db, String user, String password) 
    {
        Preconditions.checkState(container == null, "postgres is already running");


        PostgreSQLContainer<?> newContainer = new PostgreSQLContainer<>(ORIGINAL_IMAGE)
            .withDatabaseName(db)
            .withUsername(user)
            .withPassword(password);


        newContainer.addEnv("PGDATA", "/var/lib/postgresql/data-no-mounted");


        // workaround for using fixed port instead of random one chosen by docker
        List<String> portBindings = new ArrayList<>(newContainer.getPortBindings());
        portBindings.add(String.format("%d:%d", port, POSTGRESQL_PORT));
        newContainer.setPortBindings(portBindings);
        newContainer.start();


        container = newContainer;
        return container.getJdbcUrl();
    }


    @Override
    public synchronized void saveState(String name) {
        try {
            Preconditions.checkState(container != null, "postgres isn't started yet");


            // flush all changes
            doCheckpoint(container);


            commitContainer(container, name);
        } catch (Exception e) {
            stop();
            throw new RuntimeException("Saving postgres container state failed", e);
        }
    }


    @Override
    public synchronized void stop() {
        Preconditions.checkState(container != null, "postgres isn't started yet");


        container.stop();
        container = null;
    }


    private static void doCheckpoint(PostgreSQLContainer<?> container) {
        try {
            container.execInContainer("psql", "-c", "checkpoint");
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        }
    }


    private static void commitContainer(PostgreSQLContainer<?> container, String image)
    {
        String tag = "latest";
        container.getDockerClient().commitCmd(container.getContainerId())
            .withMessage("Container for integration tests. It uses non default location for PGDATA which is not mounted to a volume")
            .withRepository(image)
            .withTag(tag)
            .exec();
    }
    // ...
}

The plugin’s start goal implementation:

@Mojo(name = "start")
public class PostgresPluginStartMojo extends AbstractMojo {
    private static final Logger logger = LoggerFactory.getLogger(PostgresPluginStartMojo.class);


    @Nullable
    static PostgresExecutable postgres;


    @Parameter(defaultValue = "5432")
    private int port;
    @Parameter(defaultValue = "dbName")
    private String db;
    @Parameter(defaultValue = "userName")
    private String user;
    @Parameter(defaultValue = "password")
    private String password;


    @Override
    public void execute() throws MojoExecutionException {
        if (postgres != null) {
            logger.warn("Postgres already started");
            return;
        }
        logger.info("Starting Postgres");
        if (!isDockerInstalled()) {
            throw new IllegalStateException("Docker is not installed");
        }
        String url = start();
        testConnection(url, user, password);
        logger.info("Postgres started at " + url);
    }


    private String start() {
        postgres = new PostgresContainerAdapter();
        return postgres.start(port, db, user, password);
    }


    private static void testConnection(String url, String user, String password) throws MojoExecutionException {
        try (Connection conn = DriverManager.getConnection(url, user, password)) {
            conn.createStatement().execute("SELECT 1");
        } catch (SQLException e) {
            throw new MojoExecutionException("Exception occurred while testing sql connection", e);
        }
    }


    private static boolean isDockerInstalled() {
        if (CommandLine.executableExists("docker")) {
            return true;
        }
        if (CommandLine.executableExists("docker.exe")) {
            return true;
        }
        if (CommandLine.executableExists("docker-machine")) {
            return true;
        }
        if (CommandLine.executableExists("docker-machine.exe")) {
            return true;
        }
        return false;
    }
}

The save-state and stop goals are implemented in a very similar way and are therefore omitted here.

Usage in pom.xml:

<build>
  <plugins>
    <plugin>
      <groupId>com.miro.maven</groupId>
      <artifactId>PostgresPlugin</artifactId>
      <executions>
        <!-- running a postgres container -->
        <execution>
          <id>start-postgres</id>
          <phase>generate-sources</phase>
          <goals>
            <goal>start</goal>
          </goals>
          
          <configuration>
            <db>${db}</db>
            <user>${dbUser}</user>
            <password>${dbPassword}</password>
            <port>${dbPort}</port>
          </configuration>
        </execution>
        
        <!-- applying migrations and generating Java classes -->
        <execution>
          <id>flyway-and-jooq</id>
          <phase>generate-sources</phase>
          <goals>
            <goal>execute-mojo</goal>
          </goals>
          
          <configuration>
            <plugins>
              <!-- applying migrations -->
              <plugin>
                <groupId>org.flywaydb</groupId>
                <artifactId>flyway-maven-plugin</artifactId>
                <version>${flyway.version}</version>
                <executions>
                  <execution>
                    <id>migration</id>
                    <goals>
                      <goal>migrate</goal>
                    </goals>
                    
                    <configuration>
                      <url>${dbUrl}</url>
                      <user>${dbUser}</user>
                      <password>${dbPassword}</password>
                      <locations>
                        <location>filesystem:src/main/resources/migrations</location>
                      </locations>
                    </configuration>
                  </execution>
                </executions>
              </plugin>
 
              <!-- generating Java classes -->
              <plugin>
                <groupId>org.jooq</groupId>
                <artifactId>jooq-codegen-maven</artifactId>
                <version>${jooq.version}</version>
                <executions>
                  <execution>
                    <id>jooq-generate-sources</id>
                    <goals>
                      <goal>generate</goal>
                    </goals>
                      
                    <configuration>
                      <jdbc>
                        <url>${dbUrl}</url>
                        <user>${dbUser}</user>
                        <password>${dbPassword}</password>
                      </jdbc>
                      
                      <generator>
                        <database>
                          <name>org.jooq.meta.postgres.PostgresDatabase</name>
                          <includes>.*</includes>
                          <excludes>
                            # exclude Flyway tables
                            schema_version | flyway_schema_history
                            # other excludes
                          </excludes>
                          <includePrimaryKeys>true</includePrimaryKeys>
                          <includeUniqueKeys>true</includeUniqueKeys>
                          <includeForeignKeys>true</includeForeignKeys>
                          <includeExcludeColumns>true</includeExcludeColumns>
                        </database>
                        <generate>
                          <interfaces>false</interfaces>
                          <deprecated>false</deprecated>
                          <jpaAnnotations>false</jpaAnnotations>
                          <validationAnnotations>false</validationAnnotations>
                        </generate>
                        <target>
                          <packageName>com.miro.persistence</packageName>
                          <directory>src/main/java</directory>
                        </target>
                      </generator>
                    </configuration>
                  </execution>
                </executions>
              </plugin>
            </plugins>
          </configuration>
        </execution>
 
        <!-- creating an image for integration tests -->
        <execution>
          <id>save-state-postgres</id>
          <phase>generate-sources</phase>
          <goals>
            <goal>save-state</goal>
          </goals>
          
          <configuration>
            <name>postgres-it</name>
          </configuration>
        </execution>
 
        <!-- stopping the container -->
        <execution>
          <id>stop-postgres</id>
          <phase>generate-sources</phase>
          <goals>
            <goal>stop</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>

Release

The code has been written and tested, and now it’s time to release. In general, the complexity of a release depends on the following factors:

  • the number of databases (one or more)
  • the database size
  • the number of application servers (one or more)
  • whether the release is seamless (whether application downtime is allowed)

The first and third items above impose a backward compatibility requirement on the code since, in most cases, it is impossible to update all databases and all application servers simultaneously: there will always be a moment when the databases have different schemas, and the servers have different versions of the code.

The database size affects the migration time: the larger the database, the more likely you will have to perform a long migration.

Seamlessness is partly a derived factor: if a release is performed with downtime, then the first three items are not so important and only affect the time when the app is unavailable.

Here’s what our service looks like:

  • There are about 30 database clusters.
  • The size of a single database is 200–400 GB.
  • Several application servers (their number is auto-scaled during the day depending on the load; it can exceed 100 at peak times), and each server is connected to all databases.
  • Our releases are seamless.

We do canary releases: a new version of the application is first applied to a few servers (we call it a prerelease), and after a while, if no errors are found in the prerelease, this version is released to the rest of the servers. Thus, production servers can have different versions.

Each application server checks the database version against the script versions from the source code (in Flyway terms, this is called validation). If the versions differ, the server will not start, which guarantees compatibility between the code and the database. This also prevents a situation where, for example, the code tries to access a database table that hasn't been created yet because the migration that creates it ships with a different version of the server.

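A minimal sketch of this validation idea is shown below. The names here are hypothetical and the check is reduced to a single version number; real Flyway validation also compares per-migration checksums:

```java
// Simplified sketch of startup validation: compare the latest version recorded
// in the schema history table with the latest migration script shipped in the
// code, and refuse to start on a mismatch. (Hypothetical names; Flyway's real
// validate() also compares checksums of every applied migration.)
import java.util.List;

public class StartupValidation {

    static void validateSchema(String dbVersion, List<String> scriptVersions) {
        String expected = scriptVersions.get(scriptVersions.size() - 1);
        if (!expected.equals(dbVersion)) {
            // Failing here guarantees the code never runs against
            // a schema it was not built for.
            throw new IllegalStateException(
                "DB is at " + dbVersion + " but code expects " + expected);
        }
    }

    public static void main(String[] args) {
        List<String> scripts = List.of("V1", "V2", "V3");
        validateSchema("V3", scripts);          // versions match: server starts
        try {
            validateSchema("V2", scripts);      // schema behind the code
        } catch (IllegalStateException e) {
            System.out.println("refusing to start: " + e.getMessage());
        }
    }
}
```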
But this certainly does not resolve the problem when, for example, a new version of the application contains a migration that removes a column from a table while the old version of the server can still use it. Currently, we only catch such situations at the review stage (which is mandatory), but we still need to add a separate stage with this check to our CI/CD cycle.

Sometimes migrations can take a lot of time (for example, updating data in a large table); in this case, we use a combined migration technique to avoid slowing down releases. This technique consists of two steps. First, we manually run the migration on a running server (via the administration panel, without Flyway and, accordingly, without any record in the schema history table). Second, we run the same migration in the “regular” way in the next version of the server. Migrations of this kind must satisfy the following requirements:

  • It must be written so that it does not block the application when it’s running for a long time (the main point here is not to acquire long-term locks at the database level). To do so, we have internal guidelines for developers on how to write migrations. In the future, we may also share them on Medium.
  • A “regular” run of the migration should detect that it has already been performed manually; in that case, it should do nothing but add a new record to the schema history table. For SQL migrations, such a check is an SQL query that verifies whether the change has already been applied. For Java migrations, we use another approach: stored boolean flags that are set after a manual run.
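
As an illustration, a migration suitable for this technique might look as follows. This is a hypothetical example (the `users` table and column names are invented): it fails fast instead of blocking the application by setting `lock_timeout`, and both statements are idempotent, so the “regular” run is a no-op if the manual run has already applied them:

```sql
-- Hypothetical migration that is safe to run twice: both statements
-- do nothing if the change has already been applied manually.
SET lock_timeout = '3s';  -- fail fast instead of blocking the application

ALTER TABLE users
    ADD COLUMN IF NOT EXISTS last_seen_at timestamptz;

-- CONCURRENTLY avoids holding a lock that blocks writes on a large table
CREATE INDEX CONCURRENTLY IF NOT EXISTS users_last_seen_at_idx
    ON users (last_seen_at);
```

Note that `CREATE INDEX CONCURRENTLY` cannot run inside a transaction block, so a migration like this has to be executed outside a transaction.
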

This approach solves two problems:

  • The release is fast (though with some manual actions).
  • All environments (the local ones for developers and the test ones) are updated automatically without manual actions.

Monitoring

The development cycle doesn’t end after the release. To understand whether the new features work (and how they do), you need to add many metrics. They can be divided into two groups.

The first group consists of domain-specific metrics: it is useful for a mail server to know the number of messages sent, for a news site to see the number of unique users per day, and so on.

The second group’s metrics are pretty much universal: they determine the technical state of the server, such as CPU consumption, memory allocation, network load, database status, and so on.

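As a minimal stdlib-only sketch (in practice you would use a metrics library and export these values to a monitoring system), both groups boil down to values the server either updates on events or reads on demand:

```java
// Minimal sketch of the two metric groups using only the standard library.
// A real system would export these through a metrics library instead.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class Metrics {
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    // Domain-specific metric: a named event counter, e.g. messages sent.
    public void increment(String name) {
        counters.computeIfAbsent(name, k -> new LongAdder()).increment();
    }

    public long value(String name) {
        LongAdder a = counters.get(name);
        return a == null ? 0 : a.sum();
    }

    // Technical metric: current heap usage, sampled on demand.
    public static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        Metrics m = new Metrics();
        m.increment("messages_sent");
        m.increment("messages_sent");
        System.out.println("messages_sent = " + m.value("messages_sent"));
        System.out.println("heap used bytes = " + usedHeapBytes());
    }
}
```
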
What exactly you need to monitor, and how, is the topic of an impressive number of dedicated articles, so it won't be discussed here. We would only like to remind you of the essential things (here comes Captain Obvious):

Define the metrics in advance

You need to define a list of basic metrics, and you should do this in advance, before the release, not after the first incident, when you don't understand what is happening to the system.

Set up automatic alerts

This will speed up your reaction time and save time on manual monitoring. Ideally, you should know about problems before users start to notice them and point them out to you.

Collect the metrics from all nodes

You can never have too many metrics or logs. Having data from each node of your system (application server, database, connection pooler, load balancer, etc.) gives you the complete picture of its status and, if necessary, lets you quickly localize a problem.

A simple example: loading data from a web page is getting slower. There could be many reasons:

  • The web server is overloaded and is taking a long time to respond to requests.
  • The SQL query now takes longer than usual to execute.
  • A queue builds up at the connection pooler, and the application server can't get a connection for a long time.
  • There are network problems.
  • Something else.

Finding the cause of the problem without metrics won’t be easy.

Instead of a conclusion

I would like to end with a fairly obvious point: there is no one-size-fits-all solution, and the choice of one approach over another depends on the requirements of the particular task; what works well for others may not be applicable to you. But the more approaches you are aware of, the more thorough and efficient your decision-making will be. I hope you've learned something from this article that will help you in the future. I would be happy to see comments about the approaches you use to improve your work with databases!

Join our team!

Would you like to be an Engineer, Team Lead or Engineering Manager at Miro? Check out opportunities to join the Engineering team.

Translated from: https://medium.com/miro-engineering/handling-databases-from-a-developers-perspective-fc916ab3fe68
