MySQL-5.7-8.2.1.11 Block Nested-Loop and Batched Key Access Joins(块嵌套循环和批处理密钥访问连接)

In MySQL, a Batched Key Access (BKA) Join algorithm is available that uses both index access to the joined table and a join buffer.

在MySQL中,一个batch Key Access (BKA) Join算法是可用的,它使用对被连接表的索引访问和连接缓冲区。

The BKA algorithm supports inner join, outer join, and semijoin operations, including nested outer joins. Benefits of BKA include improved join performance due to more efficient table scanning.

BKA算法支持内连接、外连接和半连接操作,包括嵌套的外连接。BKA的好处包括由于更有效的表扫描而提高了连接性能。

Also, the Block Nested-Loop (BNL) Join algorithm previously used only for inner joins is extended and can be employed for outer join and semijoin operations, including nested outer joins.

此外,Block nested - loop (BNL) Join算法也得到了扩展,可以用于外部连接和半连接操作,包括嵌套外部连接。

The following sections discuss the join buffer management that underlies the extension of the original BNL algorithm, the extended BNL algorithm, and the BKA algorithm.

下面几节将讨论连接缓冲区管理,它是原始BNL算法、扩展BNL算法和BKA算法扩展的基础。

For information about semijoin strategies, see Section 8.2.2.1, “Optimizing Subqueries, Derived Tables, and View References with Semijoin Transformations”

Join Buffer Management for Block Nested-Loop and Batched Key Access Algorithms

块嵌套循环和批处理密钥访问算法的连接缓冲区管理

MySQL can employ join buffers to execute not only inner joins without index access to the inner table, but also outer joins and semijoins that appear after subquery flattening.

MySQL不仅可以使用连接缓冲区来执行内部连接,而不需要对内部表进行索引访问,还可以执行子查询扁平化后出现的外部连接和半连接。

Moreover, a join buffer can be effectively used when there is an index access to the inner table.

此外,当有对内部表的索引访问时,可以有效地使用连接缓冲区。

The join buffer management code slightly more efficiently utilizes join buffer space when storing the values of the interesting row columns: No additional bytes are allocated in buffers for a row column if its value is NULL, and the minimum number of bytes is allocated for any value of the VARCHAR type.

加入缓冲管理代码更有效地利用加入缓冲区空间当有趣的行存储值列:没有额外字节的缓冲区分配列一行,如果它的值是NULL,和最低分配的字节数为VARCHAR类型的任何值。

The code supports two types of buffers, regular and incremental. Suppose that join buffer B1 is employed to join tables t1 and t2 and the result of this operation is joined with table t3 using join buffer B2:

该代码支持两种类型的缓冲区,常规缓冲区和增量缓冲区。假设连接缓冲区B1用于连接表t1和t2,该操作的结果使用连接缓冲区B2与表t3连接:

  • A regular join buffer contains columns from each join operand. If B2 is a regular join buffer, each row r put into B2 is composed of the columns of a row r1 from B1 and the interesting columns of a matching row r2 from table t3.

  • 普通连接缓冲区包含来自每个连接操作数的列。如果B2是一个常规连接缓冲区,则放入B2中的每一行r由B1中的行r1的列和表t3中的匹配行r2中的感兴趣的列组成。

  • An incremental join buffer contains only columns from rows of the table produced by the second join operand. That is, it is incremental to a row from the first operand buffer. If B2 is an incremental join buffer, it contains the interesting columns of the row r2 together with a link to the row r1 from B1.

  • 增量连接缓冲区只包含由第二个连接操作数生成的表中的行中的列。也就是说,它是从第一个操作数缓冲区递增到一行的。如果B2是一个增量连接缓冲区,那么它包含行r2中有意义的列,以及从B1到行r1的链接。

Incremental join buffers are always incremental relative to a join buffer from an earlier join operation, so the buffer from the first join operation is always a regular buffer. In the example just given, the buffer B1 used to join tables t1 and t2 must be a regular buffer.

相对于来自较早连接操作的连接缓冲区,增量连接缓冲区总是增量的,因此来自第一个连接操作的缓冲区总是一个常规缓冲区。在刚才给出的示例中,用于连接表t1和t2的缓冲区B1必须是一个常规缓冲区。

Each row of the incremental buffer used for a join operation contains only the interesting columns of a row from the table to be joined. These columns are augmented with a reference to the interesting columns of the matched row from the table produced by the first join operand.

用于连接操作的增量缓冲区的每一行只包含要连接的表中某一行中感兴趣的列。通过引用第一个连接操作数生成的表中匹配行中感兴趣的列来扩充这些列。

Several rows in the incremental buffer can refer to the same row r whose columns are stored in the previous join buffers insofar as all these rows match row r.

增量缓冲区中的几行可以引用相同的行r,该行r的列存储在前一个连接缓冲区中,只要这些行都匹配行r。

Incremental buffers enable less frequent copying of columns from buffers used for previous join operations. This provides a savings in buffer space because in the general case a row produced by the first join operand can be matched by several rows produced by the second join operand.

增量缓冲区允许较不频繁地从用于以前连接操作的缓冲区复制列。这节省了缓冲区空间,因为在一般情况下,第一个连接操作数生成的行可以与第二个连接操作数生成的几行相匹配。

It is unnecessary to make several copies of a row from the first operand. Incremental buffers also provide a savings in processing time due to the reduction in copying time.

没有必要从第一个操作数复制一行。由于复制时间的减少,增量缓冲区还可以节省处理时间。

The block_nested_loop and batched_key_access flags of the optimizer_switch system variable control how the optimizer uses the Block Nested-Loop and Batched Key Access join algorithms. By default, block_nested_loop is on and batched_key_access is off.

optimizer_switch系统变量的block_nested_loop和batched_key_access标志控制优化器如何使用Block Nested-Loop和batch Key Access连接算法。默认情况下,block_nested_loop是打开的,batched_key_access是关闭的。

See Section 8.9.2, “Switchable Optimizations”. Optimizer hints may also be applied; see Optimizer Hints for Block Nested-Loop and Batched Key Access Algorithms.

For information about semijoin strategies, see Section 8.2.2.1, “Optimizing Subqueries, Derived Tables, and View References with Semijoin Transformations”

Block Nested-Loop Algorithm for Outer Joins and Semijoins

用于外连接和半连接的块嵌套循环算法

The original implementation of the MySQL BNL algorithm is extended to support outer join and semijoin operations.

对MySQL BNL算法的原始实现进行了扩展,支持外部连接和半连接操作。

When these operations are executed with a join buffer, each row put into the buffer is supplied with a match flag.

当使用连接缓冲区执行这些操作时,缓冲区中的每一行都提供一个匹配标志。

If an outer join operation is executed using a join buffer, each row of the table produced by the second operand is checked for a match against each row in the join buffer.

如果使用连接缓冲区执行外部连接操作,则会检查由第二个操作数生成的表中的每一行是否与连接缓冲区中的每一行相匹配

When a match is found, a new extended row is formed (the original row plus columns from the second operand) and sent for further extensions by the remaining join operations.

当找到匹配时,将形成一个新的扩展行(原始行加上来自第二个操作数的列),并由其余的连接操作发送进一步的扩展。

In addition, the match flag of the matched row in the buffer is enabled. After all rows of the table to be joined have been examined, the join buffer is scanned.

同时,在缓冲区中使能匹配行的match标志。在检查了要连接的表的所有行之后,将扫描连接缓冲区。

Each row from the buffer that does not have its match flag enabled is extended by NULL complements (NULL values for each column in the second operand) and sent for further extensions by the remaining join operations.

缓冲区中没有启用匹配标志的每一行都由NULL补码(第二个操作数中每一列的NULL值)扩展,并由其余的连接操作发送以进行进一步的扩展。

The block_nested_loop flag of the optimizer_switch system variable controls how the optimizer uses the Block Nested-Loop algorithm. By default, block_nested_loop is on.

optimizer_switch系统变量的block_nested_loop标志控制优化器如何使用Block Nested-Loop算法。默认情况下,block_nested_loop是打开的。

See Section 8.9.2, “Switchable Optimizations”. Optimizer hints may also be applied; see Optimizer Hints for Block Nested-Loop and Batched Key Access Algorithms.

In EXPLAIN output, use of BNL for a table is signified when the Extra value contains Using join buffer (Block Nested Loop) and the type value is ALLindex, or range.

在EXPLAIN输出中,当Extra值包含Using join buffer (Block Nested Loop)且类型值为ALL、index或range时,表示对表使用BNL。

Some cases involving the combination of one or more subqueries with one or more left joins, particularly those returning many rows, may use BNL even though it is not ideal in such instances.

有些情况涉及到一个或多个子查询与一个或多个左连接的组合,特别是那些返回许多行的情况,可以使用BNL,尽管在这种情况下它不是理想的。

This is a known issue which is fixed in MySQL 8.0. If upgrading MySQL is not immediately feasible for you, you may wish to disable BNL in the meantime by setting optimizer_switch='block_nested_loop=off' or employing the NO_BNL optimizer hint to let the optimizer choose a better plan, using one or more index hints (see Section 8.9.4, “Index Hints”), or some combination of these, to improve the performance of such queries.

这是一个已知的问题,在MySQL 8.0中已经修复。如果不立即可行的升级MySQL,您可能希望禁用BNL同时通过设置optimizer_switch = ' block_nested_loop =”或者使用NO_BNL优化器提示让优化器选择一个更好的计划,使用一个或多个索引提示(参见8.9.4指数提示),或它们的一些组合,以提高此类查询的性能。

For information about semijoin strategies, see Section 8.2.2.1, “Optimizing Subqueries, Derived Tables, and View References with Semijoin Transformations”

Batched Key Access Joins

批量密钥访问连接

MySQL implements a method of joining tables called the Batched Key Access (BKA) join algorithm.

MySQL实现了一种连接表的方法,称为批处理键访问(BKA)连接算法。

BKA can be applied when there is an index access to the table produced by the second join operand.

BKA可以应用于由第二个连接操作数产生的对表的索引访问。

Like the BNL join algorithm, the BKA join algorithm employs a join buffer to accumulate the interesting columns of the rows produced by the first operand of the join operation.

与BNL连接算法类似,BKA连接算法使用一个连接缓冲区来积累由连接操作的第一个操作数产生的行中感兴趣的列。

Then the BKA algorithm builds keys to access the table to be joined for all rows in the buffer and submits these keys in a batch to the database engine for index lookups.

然后,BKA算法构建键来访问缓冲区中所有行将连接的表,并将这些键批量提交给数据库引擎进行索引查找。

The keys are submitted to the engine through the Multi-Range Read (MRR) interface (see Section 8.2.1.10, “Multi-Range Read Optimization”).

After submission of the keys, the MRR engine functions perform lookups in the index in an optimal way, fetching the rows of the joined table found by these keys, and starts feeding the BKA join algorithm with matching rows.

提交键后,MRR引擎函数以最优方式在索引中执行查找,获取由这些键找到的被连接表的行,并开始向BKA连接算法提供匹配的行。

Each matching row is coupled with a reference to a row in the join buffer.

每个匹配的行都与联接缓冲区中的行引用相耦合。

When BKA is used, the value of join_buffer_size defines how large the batch of keys is in each request to the storage engine. The larger the buffer, the more sequential access is made to the right hand table of a join operation, which can significantly improve performance.

当使用BKA时,join_buffer_size的值定义了对存储引擎的每个请求中键的批量大小。缓冲区越大,对连接操作的右侧表进行的顺序访问就越多,这可以显著提高性能。

For BKA to be used, the batched_key_access flag of the optimizer_switch system variable must be set to on. BKA uses MRR, so the mrr flag must also be on. Currently, the cost estimation for MRR is too pessimistic. Hence, it is also necessary for mrr_cost_based to be off for BKA to be used. The following setting enables BKA:

要使用BKA, optimizer_switch系统变量的batched_key_access标志必须设置为on。BKA使用MRR,因此MRR标志也必须打开。目前对MRR的成本估计过于悲观。因此,为了使用BKA,关闭mrr_cost_based也是必要的。通过以下设置启用BKA:

 

There are two scenarios by which MRR functions execute:

MRR函数有两种执行场景:

The first scenario is used for conventional disk-based storage engines such as InnoDB and MyISAM.

第一个场景用于传统的基于磁盘的存储引擎,如InnoDB和MyISAM。

For these engines, usually the keys for all rows from the join buffer are submitted to the MRR interface at once.

对于这些引擎,联接缓冲区中所有行的键通常会一次提交给MRR接口。

Engine-specific MRR functions perform index lookups for the submitted keys, get row IDs (or primary keys) from them, and then fetch rows for all these selected row IDs one by one by request from BKA algorithm.

特定于引擎的MRR函数对提交的键执行索引查找,从它们获取行id(或主键),然后根据BKA算法的请求逐个获取所有这些选中的行id的行。

Every row is returned with an association reference that enables access to the matched row in the join buffer.

返回的每一行都带有一个关联引用,该关联引用允许访问连接缓冲区中匹配的行。

The rows are fetched by the MRR functions in an optimal way: They are fetched in the row ID (primary key) order. This improves performance because reads are in disk order rather than random order.

MRR函数以最优方式获取行:按行ID(主键)顺序获取行。这将提高性能,因为读取是按照磁盘顺序而不是随机顺序进行的。

The second scenario is used for remote storage engines such as NDB.

第二种场景用于NDB等远程存储引擎。

A package of keys for a portion of rows from the join buffer, together with their associations, is sent by a MySQL Server (SQL node) to NDB Cluster data nodes.

MySQL Server (SQL节点)向NDB集群数据节点发送连接缓冲区中部分行的键包,以及它们的关联。

In return, the SQL node receives a package (or several packages) of matching rows coupled with corresponding associations. The BKA join algorithm takes these rows and builds new joined rows.

作为返回,SQL节点接收一个包(或多个包),该包包含匹配的行和相应的关联。BKA连接算法获取这些行并构建新的连接行。

Then a new set of keys is sent to the data nodes and the rows from the returned packages are used to build new joined rows.

然后,将一组新的键发送到数据节点,并使用返回包中的行构建新的连接行。

The process continues until the last keys from the join buffer are sent to the data nodes, and the SQL node has received and joined all rows matching these keys.

这个过程继续进行,直到连接缓冲区中的最后一个键被发送到数据节点,并且SQL节点已经接收到并连接了所有匹配这些键的行。

This improves performance because fewer key-bearing packages sent by the SQL node to the data nodes means fewer round trips between it and the data nodes to perform the join operation.

这将提高性能,因为SQL节点发送给数据节点的带键的包更少,意味着它与执行连接操作的数据节点之间的往返次数更少。

With the first scenario, a portion of the join buffer is reserved to store row IDs (primary keys) selected by index lookups and passed as a parameter to the MRR functions.

对于第一个场景,连接缓冲区的一部分被保留以存储索引查找所选择的行id(主键),并将其作为参数传递给MRR函数。

There is no special buffer to store keys built for rows from the join buffer. Instead, a function that builds the key for the next row in the buffer is passed as a parameter to the MRR functions.

没有特殊的缓冲区来存储为联接缓冲区中的行构建的键。相反,为缓冲区中下一行构建键的函数将作为参数传递给MRR函数。

In EXPLAIN output, use of BKA for a table is signified when the Extra value contains Using join buffer (Batched Key Access) and the type value is ref or eq_ref.

在EXPLAIN输出中,当Extra值包含Using join buffer (Batched Key Access)且类型值为ref或eq_ref时,表示对表使用BKA。

Optimizer Hints for Block Nested-Loop and Batched Key Access Algorithms

块嵌套循环和批处理密钥访问算法的优化提示

In addition to using the optimizer_switch system variable to control optimizer use of the BNL and BKA algorithms session-wide, MySQL supports optimizer hints to influence the optimizer on a per-statement basis.

除了使用optimizer_switch系统变量来控制优化器对BNL和BKA算法的使用,MySQL还支持优化器提示来影响每条语句的优化器。

See Section 8.9.3, “Optimizer Hints”.

To use a BNL or BKA hint to enable join buffering for any inner table of an outer join, join buffering must be enabled for all inner tables of the outer join.

要使用BNL或BKA提示为外部连接的任何内部表启用连接缓冲,必须为外部连接的所有内部表启用连接缓冲。

Nested Loop Join是Oracle中一种基于嵌套循环连接算法,它的原理是对于两个表中的每一行,都对另一个表进行一次扫描,找到符合条件的行进行连接,直到所有行都被连接完成。 下面通过一个案例来说明Nested Loop Join的原理。 假设有两个表A和B,它们的结构如下: 表A: ``` id name 1 Tom 2 Jerry 3 Alice ``` 表B: ``` id age 1 25 2 30 3 35 ``` 现在需要将这两个表按照id进行连接,并且只选择age小于等于30的记录。可以使用如下SQL语句: ``` SELECT A.name, B.age FROM A, B WHERE A.id = B.id AND B.age <= 30; ``` 这个查询语句中使用了Nested Loop Join算法。具体来说,查询语句的执行步骤如下: 1. 从表A中读取第一行数据,即id=1,name=Tom。 2. 对于表A的每一行数据,都从表B的开头开始扫描。 3. 对于表B的第一行数据,即id=1,age=25,它符合连接条件,因此将它和表A中的当前行连接起来,得到结果:Tom 25。 4. 继续扫描表B的下一行数据,即id=2,age=30,它也符合连接条件,因此将它和表A中的当前行连接起来,得到结果:Tom 25、Jerry 30。 5. 继续扫描表B的下一行数据,即id=3,age=35,它不符合连接条件,因此跳过。 6. 如果表B还有未扫描的行,则返回步骤3,否则返回步骤1,继续读取表A的下一行数据。 7. 从表A中读取下一行数据,即id=2,name=Jerry。 8. 对于表A的每一行数据,都从表B的开头开始扫描,重复步骤3到6,得到结果:Tom 25、Jerry 30。 9. 从表A中读取下一行数据,即id=3,name=Alice。 10. 对于表A的每一行数据,都从表B的开头开始扫描,重复步骤3到6,得到结果:Tom 25、Jerry 30。 11. 如果表A还有未扫描的行,则返回步骤2,否则返回结果。 最终的查询结果为: ``` name age Tom 25 Jerry 30 ``` 可以看到,Nested Loop Join算法在这个案例中的效果非常好,它很快地完成了连接操作,并且得到了正确的结果。但是,在实际应用中,如果数据集非常大,那么Nested Loop Join就会变得非常慢,并且会占用大量的系统资源。因此,在实际应用中,我们需要根据数据集的大小和查询的条件来选择合适的连接算法。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值