postgresql快照_PostgreSQL-4中的MVCC。快照

最新推荐文章于 2024-07-09 06:09:24 发布

cullen2012

最新推荐文章于 2024-07-09 06:09:24 发布

阅读量533

点赞数

文章标签：数据库 python mysql java 大数据

原文链接：https://habr.com/en/company/postgrespro/blog/479512/

版权

postgresql快照

isolation problems and having made a digression regarding the 隔离问题并对 low-level data structure, last time we explored 低级数据结构进行了论述之后，上次我们探索 row versions and observed how different operations changed tuple header fields. 行版本并观察了不同的操作如何改变元组头字段。

Now we will look at how consistent data snapshots are obtained from tuples.

现在，我们将研究如何从元组中获取一致的数据快照。

什么是数据快照？ (What is a data snapshot?)

Data pages can physically contain several versions of the same row. But each transaction must see only one (or none) version of each row, so that all of them make up a consistent picture of the data (in the sense of ACID) as of a certain point in time.

数据页实际上可以包含同一行的多个版本。但是每个事务只能看到每一行的一个(或没有)版本，以便它们在特定时间点上构成数据的一致图片(按照ACID的意义)。

Isolation in PosgreSQL is based on snapshots: each transaction works with its own data snapshot, which «contains» data that were committed before the moment the snapshot was created and does not «contain» data that were not committed by that moment yet. We've already seen that although the resulting isolation appears stricter than required by the standard, it still has anomalies.

PosgreSQL中的隔离基于快照：每个事务都使用其自己的数据快照，该快照“包含”在创建快照之前提交的数据，并且不“包含”在该时刻尚未提交的数据。我们已经看到，尽管最终的隔离看起来比标准要求的严格，但仍然存在异常。

At the Read Committed isolation level, a snapshot is created at the beginning of each transaction statement. This snapshot is active while the statement is being performed. In the figure, the moment the snapshot was created (which, as we recall, is determined by the transaction ID) is shown in blue.

在“读取已提交”隔离级别，将在每个事务语句的开头创建一个快照。在执行语句时，此快照处于活动状态。在该图中，快照创建的时刻(我们记得它由事务ID决定)以蓝色显示。

At the Repeatable Read and Serializable levels, the snapshot is created once, at the beginning of the first transaction statement. Such a snapshot remains active up to the end of the transaction.

在“可重复读取”和“可序列化”级别上，快照在第一个事务语句的开头创建一次。这样的快照在事务结束之前一直保持活动状态。

快照中元组的可见性 (Visibility of tuples in a snapshot)

可见性规则 (Visibility rules)

A snapshot is certainly not a physical copy of all the necessary tuples. A snapshot is actually specified by several numbers, and the visibility of tuples in a snapshot is determined by rules.

快照当然不是所有必要元组的物理副本。快照实际上由多个数字指定，并且快照中元组的可见性由规则确定。

Whether a tuple will be visible or not in a snapshot depends on two fields in the header, namely, xmin and xmax, that is, the IDs of the transactions that created and deleted the tuple. Intervals like this do not overlap, and therefore, not more than one version represents a row in each snapshot.

元组在快照中是否可见取决于标头中的两个字段，即xmin和xmax ，即创建和删除该元组的事务的ID。这样的间隔不会重叠，因此，每个快照中的一行表示一个以上的版本。

The exact visibility rules are pretty complicated and take into account a lot of different cases and extremes.

确切的可见性规则非常复杂，并考虑了许多不同的情况和极端情况。

To simplify, we can say that a tuple is visible when in the snapshot, changes made by the xmin transaction are visible, while those made by the xmax transaction are not (in other words, it is already clear that the tuple was created, but it is not yet clear whether it was deleted).

为简化起见，我们可以说一个元组是可见的，而在快照中，由xmin事务进行的更改是可见的，而由xmax事务进行的更改则不可见(换句话说，已经创建了元组，但是尚不清楚是否已删除它)。

Regarding a transaction, its changes are visible in the snapshot either if it is that very transaction that created the snapshot (it does see its own not yet committed changes) or the transaction was committed before the snapshot was created.

关于事务，无论是创建快照的事务(它确实看到自己尚未提交的更改)还是创建快照之前提交的事务，快照中的更改都是可见的。

We can graphically represent transactions by segments (from the start time to the commit time):

我们可以按段(从开始时间到提交时间)以图形方式表示事务：

Here:

这里：

Changes of the transaction 2 will be visible since it was completed before the snapshot was created.
事务2的更改是可见的，因为它是在创建快照之前完成的。
Changes of the transaction 1 will not be visible since it was active at the moment the snapshot was created.
事务1的更改将不可见，因为它在创建快照时处于活动状态。
Changes of the transaction 3 will not be visible since it started after the snapshot was created (regardless of whether it was completed or not).
事务3的更改将不可见，因为它是在创建快照后开始的(无论它是否完成)。

Unfortunately, the system is unaware of the commit time of transactions. Only its start time is known (which is determined by the transaction ID and marked with a dashed line in the figures above), but the event of completion is not written anywhere.

不幸的是，系统没有意识到事务的提交时间。仅知道其开始时间(由事务ID确定，并在上图中用虚线标记)，但是完成事件未写入任何地方。

All we can do is to find out the current status of transactions at the snapshot creation. This information is available in the shared memory of the server, in the ProcArray structure, which contains the list of all active sessions and their transactions.

我们所能做的就是在创建快照时找出事务的当前状态。该信息在服务器的共享内存中的ProcArray结构中可用，该结构包含所有活动会话及其事务的列表。

But we will be unable to figure out post factum whether or not a certain transaction was active at the moment the snapshot was created. Therefore, a snapshot has to store a list of all the current active transactions.

但是我们将无法确定事后快照创建时某个事务是否处于活动状态。因此，快照必须存储所有当前活动事务的列表。

From the above it follows that in PostgreSQL, it is not possible to create a snapshot that shows consistent data as of certain time backward, even if all the necessary tuples are available in table pages. A question often arises why PostgreSQL lacks retrospective (or temporal; or flashback, like Oracle calls them) queries — and this is one of the reasons.

从上面可以看出，在PostgreSQL中，即使表页中所有必要的元组都可用，也无法创建快照来显示在特定时间后向一致的数据。经常会产生一个问题，为什么PostgreSQL缺乏追溯性(或时间性；或闪回，如Oracle称呼为Oracle)这是原因之一。

Kind of funny is that this functionality was first available, but then deleted from the DBMS. You can read about this in the article by Joseph M. Hellerstein.

有点可笑的是，此功能最初可用，但随后已从DBMS中删除。您可以在 Joseph M. Hellerstein的文章中阅读有关此内容的信息。

So, the snapshot is determined by several parameters:

因此，快照由几个参数确定：

The moment the snapshot was created, more exactly, the ID of the next transaction, yet unavailable in the system (snapshot.xmax).
创建快照的那一刻，更确切地说，是下一个事务的ID，但在系统中不可用( snapshot.xmax )。
The list of active (in progress) transactions at the moment the snapshot was created (snapshot.xip).
创建snapshot.xip的活动(进行中)事务列表( snapshot.xip )。

For convenience and optimization, the ID of the earliest active transaction is also stored (snapshot.xmin). This value makes an important sense, which will be discussed below.

为了方便和优化，还将存储最早活动事务的ID( snapshot.xmin )。该值具有重要意义，下面将进行讨论。

The snapshot also stores a few more parameters, which are unimportant to us, however.

快照还存储了一些其他参数，但是这些参数对我们来说并不重要。

例 (Example)

To understand how the snapshot determines the visibility, let's reproduce the above example with three transactions. The table will have three rows, where:

为了了解快照如何确定可见性，让我们通过三个事务重现上面的示例。该表将具有三行，其中：

The first was added by a transaction that started prior to the snapshot creation but completed after it.
第一个是由在快照创建之前开始但在快照创建之后完成的事务添加的。
The second was added by a transaction that started and completed prior to the snapshot creation.
第二个是通过在快照创建之前开始并完成的事务添加的。
The third was added after the snapshot creation.
第三个是在创建快照后添加的。

=> TRUNCATE TABLE accounts;

The first transaction (not completed yet):

第一笔交易(尚未完成)：

=> BEGIN;
=> INSERT INTO accounts VALUES (1, '1001', 'alice', 1000.00);
=> SELECT txid_current();

=> SELECT txid_current();
 txid_current 
--------------
         3695
(1 row)

The second transaction (completed before the snapshot was created):

第二笔交易(在创建快照之前完成)：

|  => BEGIN;
|  => INSERT INTO accounts VALUES (2, '2001', 'bob', 100.00);
|  => SELECT txid_current();

|   txid_current 
|  --------------
|           3696
|  (1 row)

|  => COMMIT;

Creating a snapshot in a transaction in another session.

在另一个会话的事务中创建快照。

||    => BEGIN ISOLATION LEVEL REPEATABLE READ;
||    => SELECT xmin, xmax, * FROM accounts;

||     xmin | xmax | id | number | client | amount 
||    ------+------+----+--------+--------+--------
||     3696 |    0 |  2 | 2001   | bob    | 100.00
||    (1 row)

Committing the first transaction after the snapshot was created:

创建快照后提交第一个事务：

=> COMMIT;

And the third transaction (appeared after the snapshot was created):

第三笔交易(在创建快照后出现)：

|  => BEGIN;
|  => INSERT INTO accounts VALUES (3, '2002', 'bob', 900.00);
|  => SELECT txid_current();

|   txid_current 
|  --------------
|           3697
|  (1 row)

|  => COMMIT;

Evidently, only one row is still visible in our snapshot:

显然，在我们的快照中仍然仅可见一行：

||    => SELECT xmin, xmax, * FROM accounts;

||     xmin | xmax | id | number | client | amount 
||    ------+------+----+--------+--------+--------
||     3696 |    0 |  2 | 2001   | bob    | 100.00
||    (1 row)

The question is how Postgres understands this.

问题是Postgres如何理解这一点。

All is determined by the snapshot. Let's look at it:

全部由快照确定。让我们看一下：

||    => SELECT txid_current_snapshot();

||     txid_current_snapshot 
||    -----------------------
||     3695:3697:3695
||    (1 row)

Here snapshot.xmin, snapshot.xmax and snapshot.xip are listed, delimited by a colon (snapshot.xip is one number in this case, but in general it's a list).

这里列出了snapshot.xmin ， snapshot.xmax和snapshot.xip ，以冒号分隔(在这种情况下， snapshot.xip是一个数字，但通常是一个列表)。

According to the above rules, in the snapshot, those changes must be visible that were made by transactions with IDs xid such that snapshot.xmin <= xid < snapshot.xmax except those that are on the snapshot.xip list. Let's look at all table rows (in the new snapshot):

根据上述规则，在快照中，那些ID为xid的事务所做的更改必须是可见的，使得snapshot.xmin <= xid < snapshot.xmax除外，而那些snapshot.xip列表中的更改除外。让我们看一下所有表行(在新快照中)：

=> SELECT xmin, xmax, * FROM accounts ORDER BY id;

xmin | xmax | id | number | client | amount  
------+------+----+--------+--------+---------
 3695 |    0 |  1 | 1001   | alice  | 1000.00
 3696 |    0 |  2 | 2001   | bob    |  100.00
 3697 |    0 |  3 | 2002   | bob    |  900.00
(3 rows)

The first row is not visible: it was created by a transaction that is on the list of active transactions (xip).

第一行不可见：它是由活动事务( xip )列表上的事务创建的。

The second row is visible: it was created by a transaction that is in the snapshot range.

第二行可见：它是由快照范围内的事务创建的。

The third row is not visible: it was created by a transaction that is out of the snapshot range.

第三行不可见：它是由快照范围之外的事务创建的。

||    => COMMIT;

交易本身的变化 (Transaction's own changes)

Determining the visibility of the transaction's own changes somewhat complicates the situation. In this case, it may be needed to see only part of such changes. For example: at any isolation level, a cursor opened at a certain point in time must not see changes done later.

确定交易本身更改的可见性会使情况复杂化。在这种情况下，可能仅需要查看部分此类更改。例如：在任何隔离级别，在某个时间点打开的游标一定不能看到以后所做的更改。

To this end, a tuple header has a special field (represented in the cmin and cmax pseudo-columns), which shows the order number inside the transaction. cmin is the number for insertion, and cmax — for deletion, but to save space in the tuple header, this is actually one field rather than two different ones. It is assumed that a transaction infrequently inserts and deletes the same row.

为此，元组标头具有一个特殊字段(在cmin和cmax伪列中表示)，该字段显示事务内的订单号。 cmin是插入和数量cmax -删除，而是保存在元组头空间，这其实是一个领域，而不是两个不同的人。假定事务很少插入和删除同一行。

But if this does happen, a special combo command id (combocid) is inserted in the same field, and the backend process remembers the actual cmin and cmax for this combocid. But this is entirely exotic.

但是，如果发生这种情况，一个特殊的组合命令ID( combocid )插在同一领域，以及后端处理记得实际cmin和cmax为这个combocid 。但这是完全异国情调的。

Here is a simple example. Let's start a transaction and add a row to the table:

这是一个简单的例子。让我们开始一个事务并在表中添加一行：

=> BEGIN;
=> SELECT txid_current();

txid_current 
--------------
         3698
(1 row)

INSERT INTO accounts(id, number, client, amount) VALUES (4, 3001, 'charlie', 100.00);

Let's output the contents of the table, along with the cmin field (but only for rows added by the transaction — for others it is meaningless):

让我们输出表的内容以及cmin字段(但仅适用于事务添加的行，对于其他行则没有意义)：

=> SELECT xmin, CASE WHEN xmin = 3698 THEN cmin END cmin, * FROM accounts;

xmin | cmin | id | number | client  | amount  
------+------+----+--------+---------+---------
 3695 |      |  1 | 1001   | alice   | 1000.00
 3696 |      |  2 | 2001   | bob     |  100.00
 3697 |      |  3 | 2002   | bob     |  900.00
 3698 |    0 |  4 | 3001   | charlie |  100.00
(4 rows)

Now we open a cursor for a query that returns the number of rows in the table.

现在，我们为查询打开一个游标，该查询返回表中的行数。

=> DECLARE c CURSOR FOR SELECT count(*) FROM accounts;

And after that we add another row:

然后，我们添加另一行：

=> INSERT INTO accounts(id, number, client, amount) VALUES (5, 3002, 'charlie', 200.00);

The query returns 4 — the row added after opening the cursor does not get into the data snapshot:

查询返回4-打开游标后添加的行不会进入数据快照：

=> FETCH c;

count 
-------
     4
(1 row)

Why? Because the snapshot takes into account only tuples with cmin < 1.

为什么？因为快照仅考虑cmin < 1元组。

=> SELECT xmin, CASE WHEN xmin = 3698 THEN cmin END cmin, * FROM accounts;

xmin | cmin | id | number | client  | amount  
------+------+----+--------+---------+---------
 3695 |      |  1 | 1001   | alice   | 1000.00
 3696 |      |  2 | 2001   | bob     |  100.00
 3697 |      |  3 | 2002   | bob     |  900.00
 3698 |    0 |  4 | 3001   | charlie |  100.00
 3698 |    1 |  5 | 3002   | charlie |  200.00
(5 rows)

=> ROLLBACK;

活动范围 (Event horizon)

The ID of the earliest active transaction (snapshot.xmin) makes an important sense: it determines the «event horizon» of the transaction. That is, beyond its horizon the transaction always sees only up-to-date row versions.

最早的活动事务的ID( snapshot.xmin )具有重要意义：它确定事务的“事件范围”。也就是说，超出其范围，该事务始终只能看到最新的行版本。

Really, an outdated (dead) row version needs to be visible only when the up-to-date one was created by a not-yet-completed transaction and is, therefore, not visible yet. But all transactions «beyond the horizon» are completed for sure.

实际上，仅当尚未完成的事务创建了最新的(死)行版本时，才需要看到该行版本，因此尚不可见。但是，所有“超越地平线”的交易肯定都已完成。

You can see the transaction horizon in the system catalog:

您可以在系统目录中看到事务范围：

=> BEGIN;
=> SELECT backend_xmin FROM pg_stat_activity WHERE pid = pg_backend_pid();

backend_xmin 
--------------
         3699
(1 row)

We can also define the horizon at the database level. To do this, we need to take all active snapshots and find the oldest xmin among them. And it will define the horizon, beyond which dead tuples in the database will never be visible to any transaction. Such tuples can be vacuumed away — and this is exactly why the concept of horizon is so important from a practical standpoint.

我们还可以在数据库级别定义范围。为此，我们需要获取所有活动快照并在其中找到最旧的xmin 。它将定义范围，超过该范围，数据库中的死元组将永远对任何事务都不可见。 这样的元组可以被清除掉 -这就是为什么从实际的角度来看，地平线的概念如此重要的原因。

If a certain transaction is holding a snapshot for a long time, by that it will also be holding the database horizon. Moreover, just the existence of an uncompleted transaction will hold the horizon even if the transaction itself does not hold the snapshot.

如果某个事务长时间保存快照，那么它也将保存数据库范围。此外，即使事务本身不保存快照，仅存在未完成的事务也将保留视野。

And this means that dead tuples in the DB cannot be vacuumed away. In addition, it is possible that a «long-play» transaction does not intersect by data with other transactions at all, but this does not really matter since all share one database horizon.

这意味着无法清除数据库中的死元组。另外，“长期”事务可能根本不会与其他事务相交，但是这并不重要，因为所有事务共享一个数据库范围。

If we now make a segment represent snapshots (from snapshot.xmin to snapshot.xmax) rather than transactions, we can visualize the situation as follows:

如果现在我们使一个段代表快照(从snapshot.xmin到snapshot.xmax )而不是事务，则可以将情况可视化如下：

In this figure, the lowest snapshot pertains to an uncompleted transaction, and in the other snapshots, snapshot.xmin cannot be greater than the transaction ID.

在此图中，最低的快照与未完成的事务有关，在其他快照中， snapshot.xmin不能大于事务ID。

In our example, the transaction was started with the Read Committed isolation level. Even though it does not have any active data snapshot, it continues to hold the horizon:

在我们的示例中，事务是从“读取已提交”隔离级别开始的。即使它没有任何活动的数据快照，它也会继续保持发展：

|  => BEGIN;
|  => UPDATE accounts SET amount = amount + 1.00;
|  => COMMIT;

=> SELECT backend_xmin FROM pg_stat_activity WHERE pid = pg_backend_pid();

backend_xmin 
--------------
         3699
(1 row)

And only after completion of the transaction, the horizon moves forward, which enables vacuuming dead tuples away:

并且仅在事务完成之后，地平线才向前移动，从而可以清除死元组：

=> COMMIT;
=> SELECT backend_xmin FROM pg_stat_activity WHERE pid = pg_backend_pid();

backend_xmin 
--------------
         3700
(1 row)

In the case the described situation really causes issues and there is no way to work it around at the application level, two parameters are available starting with version 9.6:

如果所描述的情况确实导致问题，并且无法在应用程序级别解决，则从9.6版开始可以使用两个参数：

old_snapshot_threshold determines the maximum lifetime of the snapshot. When this time elapses, the server will be eligible to vacuum dead tuples, and if a «long-play» transaction still needs them, it will get a «snapshot too old» error.
old_snapshot_threshold确定快照的最大生存期。这段时间过后，服务器将有资格清理死元组，并且如果“长期播放”事务仍然需要它们，则会出现“快照太旧”错误。
idle_in_transaction_session_timeout determines the maximal lifetime of an idle transaction. When this time elapses, the transaction aborts.
idle_in_transaction_session_timeout确定空闲事务的最大生存期。经过这段时间后，事务中止。

快照导出 (Snapshot export)

Sometimes situations arise where several concurrent transactions must be guaranteed to see the same data. An example is a pg_dump utility, which can work in a parallel mode: all worker processes must see the database in the same state for the backup copy to be consistent.

有时会出现必须保证多个并发事务才能看到相同数据的情况。一个示例是pg_dump实用程序，它可以在并行模式下工作：所有工作进程都必须以相同的状态查看数据库，以使备份副本保持一致。

Of course, we cannot rely on the belief that the transactions will see the same data just because they were started «simultaneously». To this end, export and import of a snapshot are available.

当然，我们不能依靠这样的信念，即仅仅因为交易是“同时”开始的，交易就会看到相同的数据。为此，可以导出和导入快照。

The pg_export_snapshot function returns the snapshot ID, which can be passed to another transaction (using tools outside the DBMS).

pg_export_snapshot函数返回快照ID，可以将其传递给另一个事务(使用DBMS外部的工具)。

=> BEGIN ISOLATION LEVEL REPEATABLE READ;
=> SELECT count(*) FROM accounts; -- any query

count 
-------
     3
(1 row)

=> SELECT pg_export_snapshot();

pg_export_snapshot  
---------------------
 00000004-00000E7B-1
(1 row)

The other transaction can import the snapshot using the SET TRANSACTION SNAPSHOT command before performing its first query. The Repeatable Read or Serializable isolation level should also be specified before since at the Read Committed level, statements will use their own snapshots.

另一个事务可以在执行第一个查询之前使用SET TRANSACTION SNAPSHOT命令导入快照。还应该在指定可重复读或可序列化隔离级别之前，因为在“读已提交”级别，语句将使用其自己的快照。

|  => DELETE FROM accounts;
|  => BEGIN ISOLATION LEVEL REPEATABLE READ;
|  => SET TRANSACTION SNAPSHOT '00000004-00000E7B-1';

The second transaction will now work with the snapshot of the first one and, therefore, see three rows (rather than zero):

现在，第二个事务将与第一个事务的快照配合使用，因此，请参见三行(而不是零行)：

|  => SELECT count(*) FROM accounts;

|   count 
|  -------
|       3
|  (1 row)

The lifetime of an exported snapshot is the same as the lifetime of the exporting transaction.

导出快照的生存期与导出事务的生存期相同。

|    => COMMIT;
=> COMMIT;

Read on. 继续阅读。

翻译自: https://habr.com/en/company/postgrespro/blog/479512/