PostgreSQL B+树索引---ScanKey

obvious__

已于 2023-08-22 10:47:10 修改

阅读量1.3w

点赞数 6

分类专栏： postgresql 文章标签： postgresql 数据库

于 2021-08-25 11:18:17 首次发布

本文链接：https://blog.csdn.net/obvious__/article/details/119907003

版权

postgresql 专栏收录该内容

25 篇文章 34 订阅

订阅专栏

本文详细介绍了B+树索引的工作原理，包括ScanKey的概念和构建过程，以及如何处理查询条件。在PostgreSQL中，ScanKey是用于索引遍历的关键，它必须满足最左匹配原则。文章还分析了二分法的实现细节，以及如何处理多个ScanKey以确定定位和遍历阶段的规则。最后，讨论了如何处理OR条件和重复列的情况，以确保高效的查询执行。

摘要由CSDN通过智能技术生成

B+树索引—ScanKey

预备知识

《PostgreSQL B+树索引—查询》

ScanKey概念

对于一个给定的查询语句，我们需要做的第一件事就是从where条件中获取可以用于索引遍历的表达式，这些表达式在PostgreSQL中被称为ScanKey。下面我们先来看一个场景，首先我们创建一张带有索引的表：

drop table if exists test;
create table test(a int);
create index idxtest on test(a);

然后向表中写入数据：

insert into test select generate_series(1,10000);

现在我们来做一个简单的查询，看看它的执行计划：

在这里插入图片描述

这个查询使用了索引进行过滤，这没什么好说的。现在我们修改一下查询语句，再来看看其执行计划：

在这里插入图片描述

按道理来讲，这条查询语句与上一条是等价的，但是它却没有使用索引，而是使用的全表遍历。其原因就在于索引字段a存在于四则运算表达式中，在这个情况下a + 1 = 2这个条件是没有办法使用索引的，查询引擎也不会将a + 1 = 2等价为a = 1。这就说明可以作为ScanKey的表达式必须是如下形式：

index_field OP const_expression

index_field

索引字段，不包含其他运算符。
OP

比较运算符。
const_expression

常量表达式，可以是四则运算表达式、子查询等，只要最终可以计算出一个常量的表达式都可以。

ScanKey实现

在PostgreSQL中ScanKey定义如下：

typedef struct ScanKeyData
{
	int			sk_flags;		/* flags, see below */
	AttrNumber	sk_attno;		/* table or index column number */
	StrategyNumber sk_strategy; /* operator strategy number */
	Oid			sk_subtype;		/* strategy subtype */
	Oid			sk_collation;	/* collation to use, if needed */
	FmgrInfo	sk_func;		/* lookup info for function to call */
	Datum		sk_argument;	/* data to compare */
} ScanKeyData;

typedef ScanKeyData *ScanKey;

其中：

sk_attno

字段编号，对应前面的index_field。
sk_func

比较函数，对应前面的OP（在PostgreSQL中所有的运算符都转换成函数）。
sk_argument

函数参数，对应前面的const_expression。

ScanKey Array

在创建索引时，可以指定多个索引列，例如：

drop table if exists test;
create table test(a int,b int,c int);
create index idxtest on test(a,b,c);

这里，我们将a、b、c都作为索引列，在构建索引时这个三个列的关系如下：

首先按照a排序，在a相等的情况下按照b排序，在b相等的情况下按照c排序。

而在查询时，如果查询条件中包含了多个索引列，那么实际就会存在多个ScanKey。例如：

select * from test where a = 1 and b = 1 and c = 1;

上述用例存在三个ScanKey：a = 1、b = 1、c = 1。在PostgreSQL中使用ScanKey的数组来存放这三个ScanKey，数组中的元素个数为3，而这个数组就是ScanKey Array。ScanKey Array表示n个用and连接的比较表达式，在ScanKey定义处的注释中也可以找到相关说明：

/*
 * A ScanKey represents the application of a comparison operator between
 * a table or index column and a constant.  When it's part of an array of
 * ScanKeys, the comparison conditions are implicitly ANDed.
 */

OR如何处理

那么如果比较表达式用OR来连接应该如何处理？例如：
select * from test where a = 1 OR a = 2;
这种情况下相当于执行两次查询，第一次执行：
select * from test where a = 1;
第二次执行：
select * from test where a = 2;
那么索引也就会遍历两次，第一次ScanKey Array的元素个数为1，元素为a = 1；第二次ScanKey Array的元素个数为1，元素为a = 2。

查询的基本流程与基本概念

基本流程

假定上述test表中索引列a的值如下：

1 2 3 4 5 5 5 5 6 7 8 9 10

有如下查询语句：

select * from test where a = 5;

那么数据库是如何利用B+树索引来实现查询的呢？查询流程主要分为两个阶段：

阶段一：定位

在B+树中定位第一个5出现的位置。定位过程可以充分利用数据的有序性，通过二分法进行快速定位。
阶段二：遍历

从第一个5出现的位置起，向后遍历，直到出现第一个不为5的值。

这两个阶段有一个非常重要的点：定位阶段使用的ScanKey Array可能与遍历阶段不同。下面我们就来重点讨论这个问题。

基本概念

最左匹配原则

前面讲过多列索引的构建规则（首先按照a排序，在a相等的情况下按照b排序…），这种规则导致我们在定位阶段时构建的ScanKey Array必须满足最左匹配原则。为了了解最左匹配原则，我们来考虑几个问题（以上述test表为例）：

-- 用例1
select * from test where b = 1;

b = 1可以作为ScanKey在B+树索引中利用二分法进行定位么？不可以！因为索引的构建规则是，先按照a排序，a相等时按照b排序。所以从全局来看，a是有序的，而b是无序的，而上述比较条件只包含字段b不包含字段a，所以b = 1这个条件不能用作定位阶段的ScanKey（上述查询语句也不会使用idxtest索引，而是全表遍历）。

-- 用例2
select * from test where a = 1 and c = 1;

c = 1可以作为ScanKey在B+树索引中利用二分法进行定位么？不可以！因为在b相同的前提下，c才有序，而条件中不包含字段b。所以只有a = 1可以作为定位阶段的ScanKey。

-- 用例3
select * from test where a > 1 and b = 1;

b = 1可以作为ScanKey在B+树索引中利用二分法进行定位么？不可以！这一点，和PostgreSQL二分法的实现有关，我们在后面再来仔细分析。

综上所述，在定位阶段我们所使用的ScanKey Array必须满足最左匹配：

数组元素从左向右与索引列一一对应，顺序一致。

对于上述用例，ScanKey Array[0]必须是列a的ScanKey；ScanKey Array[1]必须是列b的ScanKey；ScanKey Array[2]必须是列c的ScanKey。
数组必须连续，即数组不能出现ScanKey.sk_attno为0的元素。

对于用例1：由于缺乏列a的ScanKey，即ScanKey Array[0].sk_attno为0，所以不存在满足条件的数组。

对于用例2：由于缺乏列b的ScanKey，即ScanKey Array[1].sk_attno为0，所以ScanKey Array只有一个合法元素ScanKey Array[0]。
数组中除了最后一个ScanKey外，不能出现运算符：>（升序遍历）、<（降序遍历）。

对于用例3：由于ScanKey Array[0].sk_func为’>'，所以ScanKey Array只有一个合法元素ScanKey Array[0]。

关于 <、<=运算符

有如下SQL语句：

select * from test where a <= 1000;

现在思考一个问题，a <= 1是否可以作为ScanKey用作定位阶段？显然不可以！想一想上面这个语句我们应该如何定位？是不是应该直接定位到B+树第一个叶子节点的第一个item？这和a <= 1貌似没什么毛关系！所以 <、<=条件也是不能作为定位阶段的ScanKey的。

遍历顺序（方向）

上面的描述，其实隐含了一个条件，即遍历顺序升序遍历（Forward），这是B+树通常的遍历顺序。其实还有另外一种情况，有如下SQL语句：
select * from test where a <= 1000 order by a desc;
对于这个语句，最容易想到的方式是先按照从小到大的遍历方式获取满足a < 1000的所有元组，然后再按照a的大小进行降序排列。但显然这种方式是很低效的！由于B+树索引本身就是有序的，所以上述语句正确的执行流程是：

利用二分法定位最后一个a = 1000的位置。
从这个位置起，降序（Backward）遍历B+树，直到第一个叶子节点的第一个item。

在这个流程中，a <= 1000是作为ScanKey参与定位阶段的。所以<、<=是否可以作为定位阶段的ScanKey，与遍历顺序有关。升序时遍历<、<=不能作为定位阶段的ScanKey，降序时遍历>、>=不能作为定位阶段的ScanKey。

上述内容说明了一个关键问题：不是所有满足最左匹配的ScanKey都可以用在定位阶段。

重复列的处理

思考一个问题：定位阶段的ScanKey Array中会出现重复的索引列么？不会！例如：

select * from test  where a >= 1 and a >= 10;

这里出现了两个关于索引列a的比较表达式a >= 1和a >= 10，但是显然a >=10是a >=1的子集，我们只需要获取满足a >= 10的数据即可。再来看一个用例：

select * from test  where a >= 1 and a <= 10;

通过前面的阐述，我们知道不论是顺序遍历还是逆序遍历，在定位阶段只能用到两个条件的其中一个。最后一个用例：

select * from test  where a = 1 and a = 10;

这显然是一个恒为假的条件，所以根本都不用进行B+树遍历。

遍历阶段

通过前面的描述，我们知道了只有满足一定条件的ScanKey可以用作定位阶段，那么那些不能用于定位阶段的ScanKey有什么用呢？用在遍历阶段！在遍历阶段有两个非常重要的判断：

当前的key是否满足条件
是否需要继续遍历

假定上述test表中索引列a的值如下：

1 2 3 4 5 5 5 5 6 7 8 9 10

有如下查询语句：

select * from test where a > 5 and a <= 8;

上述SQL语句，在执行时会首先定位到第一个>5的位置，也就是6，然后从6开始向后遍历，直到获取到9，结束遍历。我们修改上述SQL：

select * from test where a > 5 and a != 8;

同样该SQL会首先定位到6，然后从6开始向后遍历，当遍历到8时，发现8不满足条件，所以不会将8输出，然后继续遍历，直到遍历完全部数据。

通过这两个用例，我们不难看出一件事：a <= 8这个条件会决定遍历是否终止，而a != 8只是一个过滤条件，不决定遍历是否终止。所以，我们需要有一个规则来判断哪些条件只是过滤条件，哪些条件可以决定是否终止遍历。一个可以决定遍历是否终止的条件，必须具备以下特定：

运算符只能是：>、>=、<、<=、= 5个中的一个

升序遍历时，运算符只能是：**<、<=、=**3个中的一个。

降序遍历时，运算符只能是：**>、>=、=**3个中的一个。
这个条件对应的ScanKey必须满足最左匹配
```
select * from test where a > 5 and b <= 10;
```
在这个用例中b <= 10并不满足最左匹配（因为a的条件是>），所以b <= 10只是一个过滤条件。
当前ScanKey之前的那个ScanKey的比较运算符只能是**=**
```
select * from test where a >= 5 and b <= 10;
```
在这个用例用b <= 10满足最左匹配（因为a的条件是>=），但由于a的条件不是=，所以b <= 10也只是一个过滤条件。这一点需要特别注意，但是也比较好理解，假设(a,b)有如下值：

(5,1) (5,2) (5,3) (5,11) (6,1) (6,2) (6,3)

如果b <= 10可以用于判断遍历是否终止，那么当遍历到(5,11)时遍历就应该终止，但是显然后面的(6,1) (6,2) (6,3)同样满足条件。

所有不能否终止遍历的ScanKey都只能用来做过滤，在PostgreSQL中，判断ScanKey能否终止遍历的代码实现如下：

/*
 *	构建
 */
if (priorNumberOfEqualCols == attno - 1)	//判断当前key是否满足最左匹配
{
    int			addflags;

    //下面内容是由_bt_mark_scankey_required函数展开
	switch (skey->sk_strategy)
	{
		case BTLessStrategyNumber:
		case BTLessEqualStrategyNumber:
			addflags = SK_BT_REQFWD;
			break;
		case BTEqualStrategyNumber:
			addflags = SK_BT_REQFWD | SK_BT_REQBKWD;
			break;
		case BTGreaterEqualStrategyNumber:
		case BTGreaterStrategyNumber:
			addflags = SK_BT_REQBKWD;
			break;
		default:
			elog(ERROR, "unrecognized StrategyNumber: %d",
				 (int) skey->sk_strategy);
			addflags = 0;		/* keep compiler quiet */
			break;
	}

	skey->sk_flags |= addflags;
}

/*
 *	判断
 *  ScanDirectionIsForward用于表示遍历方向
 */
if ((subkey->sk_flags & SK_BT_REQFWD) 		&& ScanDirectionIsForward(dir))
    *continuescan = false;
else if ((subkey->sk_flags & SK_BT_REQBKWD) && ScanDirectionIsBackward(dir))
	*continuescan = false;

代码流程

在《PostgreSQL B+树索引—查询》中，我们讲解的查询的框架和流程，但是跳过了ScanKey的处理细节，ScanKey的处理集中在**_bt_first**函数中，这个函数代码比较长，我们先来分析下它的大概流程，分析过程中我们使用一个用例来阐述每个步骤的输入输出，用例如下：

SELECT * FROM test WHERE b <= 9000 and a > 8500 and  c >= 8000 and a > 10000 ;

在进入_bt_first之前，PostgreSQL会首先获取上述SQL中的所有ScanKey（满足ScanKey的三要素即可）。然后按照索引顺序对ScanKey排序，从而组成一个有序的ScanKey Array，ScanKey Array中的元素如下：

a > 8500、a > 10000、b <= 9000、c >= 8000

这个ScanKey Array被放在IndexScanDescData的keyData成员中，IndexScanDescData的numberOfKeys成员用于记录数组中元素的个数：

在这里插入图片描述

_bt_first函数可以被分为几个部分，现在我们分别来讨论每个部分的功能以及输入输出。

Part1

位置：

line558，_bt_preprocess_keys()函数

功能：

合并ScanKey Array

用例中的a > 8500和a > 10000会被合并为a > 10000。
标记可能可以用于终止遍历的ScanKey

这样的ScanKey会使用sk_flags加以标识。所谓可能是指带有标记的ScanKey还需要在后面真正遍历的时候，结合遍历顺序才能决定是否可以终止遍历。

输入：

scan->numberOfKeys

合并前ScanKey Array中的元素个数。
scan->keyData

合并前的ScanKey Array。

输出：

scan->opaque->numberOfKeys

合并后ScanKey Array中的元素个数。
scan->opaque->keyData

合并后的ScanKey Array。

在这里插入图片描述

对比输入输出，不难看出执行完_bt_preprocess_keys之后，去掉了一个与a相关的ScanKey（去掉的就是a > 8500），然后将a > 10000这个ScanKey进行标记，因为这个条件可能可以决定遍历是否终止。

Part2

位置：

line614~746

功能：

构建可以用于定位阶段的ScanKey Array。

输入：

scan->opaque->numberOfKeys
scan->opaque->keyData

输出：

keysCount

用于定位的ScanKey Array中的元素个数。
startKeys

用于定位的ScanKey Array。

针对上面的用例，不难看出只有a>10000可以用于定位阶段。

在这里插入图片描述

Part3

位置：

line748~887

功能：

处理row comparisons

例如：(x, y) > (c1, c2)这样的表达式就被称为row comparisons
将search-style ScanKey转换为insert-style ScanKey

Part2输出的startKeys可以用于定位，但是startKeys对应的是a>10000，即startKeys[0].sk_func是一个用于计算**>的函数，这个函数返回的是一个bool类型（表示a是否>10000），而含有这样函数的ScanKey被称为search-style ScanKey**。但显然在利用二分法定位时应该是比较a与10000大小，返回0（a = 10000）、1（a > 10000）、-1（a < 10000），所以我们需要将sk_func替换为实现二分法比较功能的函数，而含有这样函数的ScanKey就被称为insert-style ScanKey。

输入：

startKeys

输出：

scankeys：scankeys是startKeys的一个copy，然后替换掉startKeys中的sk_func。

在这里插入图片描述

Part4

位置：

line901~978

功能：

获取nextkey
获取goback

输入：

strat_total

输出：

nextkey
goback

nextkey、goback用来决定定位规则。PostgreSQL对nextkey有如下注释：

 /* 
  * When nextkey is false (the usual case), we are looking for the first
  * item >= ScanKey.  When nextkey is true, we are looking for the first
  * item strictly greater than ScanKey.
  */

这个不太好解释，我们需要结合用例来分析下。假定上述test表中索引列a的值如下：

1 2 3 4 5 5 5 5 6 7 8 9 10

SQL1：

select * from test where a = 5;

nextkey为false表示需要定位到第一个**>=5**的位置：

1 2 3 4 5 5 5 5 6 7 8 9 10

nextkey为true表示需要定位到第一个**>5**的位置：

1 2 3 4 5 5 5 5 6 7 8 9 10

对于SQL1而言，如果是升序遍历，显然是需要定位到第一个等于5的位置，显然将nextkey设为false就能满足条件。而如果是降序遍历，则需要定位到最后一个等于5的位置：
1 2 3 4 5 5 5 5 6 7 8 9 10

这需要首先定位到第一个>5的位置，所以nextkey应该设为true，然后再左移一位，所以goback为true。

SQL2：

select * from test where a > 5;

nextkey：ture

goback：false

SQL3：

select * from test where a >= 5;

nextkey：false

goback：false

SQL4：

select * from test where a < 5;

这里需要注意一下，当**<**可以用于定位时，一定是降序遍历。所以SQL4需要定位到第一个<5的位置：
1 2 3 4 5 5 5 5 6 7 8 9 10

这里首先需要定位第一个>=5的位置，所以nextkey应该设为false，然后再左移一位，所以goback为true。

SQL5：

select * from test where a <= 5;

同样，由于降序遍历，SQL5需要定位到第一个最后一个等于5的位置，同SQL1的降序遍历一样，nextkey为true，goback为true。

前面提到的strat_total这里可以简单认为就是SQL1~SQL5的比较运算符。

二分法

之前在阐述最左匹配时，我们对Scan Array提到了一个要求：数组中除了最后一个ScanKey外，不能出现运算符：>（升序遍历）、<（降序遍历）。即数组中，可以出现多次**=、>=、<=，但只能出现一次>（如果是升序遍历）或一次<（如果是降序遍历）**。当时没有说明这是为什么。因为这和PostgreSQL的二分法实现机制有关系，现在我们来看看PostgreSQL二分法的实现代码：

/*
 *	_bt_binsrch() -- Do a binary search for a key on a particular page.
 *
 * The passed ScanKey must be an insertion-type ScanKey (see nbtree/README),
 * but it can omit the rightmost column(s) of the index.
 *
 * When nextkey is false (the usual case), we are looking for the first
 * item >= ScanKey.  When nextkey is true, we are looking for the first
 * item strictly greater than ScanKey.
 *
 * On a leaf page, _bt_binsrch() returns the OffsetNumber of the first
 * key >= given ScanKey, or > ScanKey if nextkey is true.  (NOTE: in
 * particular, this means it is possible to return a value 1 greater than the
 * number of keys on the page, if the ScanKey is > all keys on the page.)
 *
 * On an internal (non-leaf) page, _bt_binsrch() returns the OffsetNumber
 * of the last key < given ScanKey, or last key <= given ScanKey if nextkey
 * is true.  (Since _bt_compare treats the first data key of such a page as
 * minus infinity, there will be at least one key < ScanKey, so the result
 * always points at one of the keys on the page.)  This key indicates the
 * right place to descend to be sure we find all leaf keys >= given ScanKey
 * (or leaf keys > given ScanKey when nextkey is true).
 *
 * This procedure is not responsible for walking right, it just examines
 * the given page.  _bt_binsrch() has no lock or refcount side effects
 * on the buffer.
 */
OffsetNumber
_bt_binsrch(Relation rel,
			Buffer buf,
			int keysz,
			ScanKey ScanKey,
			bool nextkey)
{
	Page		page;
	BTPageOpaque opaque;
	OffsetNumber low,
				high;
	int32		result,
				cmpval;

	page = BufferGetPage(buf);
	opaque = (BTPageOpaque) PageGetSpecialPointer(page);

	low = P_FIRSTDATAKEY(opaque);
	high = PageGetMaxOffsetNumber(page);

	/*
	 * If there are no keys on the page, return the first available slot. Note
	 * this covers two cases: the page is really empty (no keys), or it
	 * contains only a high key.  The latter case is possible after vacuuming.
	 * This can never happen on an internal page, however, since they are
	 * never empty (an internal page must have children).
	 */
	if (high < low)
		return low;

	/*
	 * Binary search to find the first key on the page >= scan key, or first
	 * key > ScanKey when nextkey is true.
	 *
	 * For nextkey=false (cmpval=1), the loop invariant is: all slots before
	 * 'low' are < scan key, all slots at or after 'high' are >= scan key.
	 *
	 * For nextkey=true (cmpval=0), the loop invariant is: all slots before
	 * 'low' are <= scan key, all slots at or after 'high' are > scan key.
	 *
	 * We can fall out when high == low.
	 */
	high++;						/* establish the loop invariant for high */

	cmpval = nextkey ? 0 : 1;	/* select comparison value */

	while (high > low)
	{
		OffsetNumber mid = low + ((high - low) / 2);

		/* We have low <= mid < high, so mid points at a real slot */

		result = _bt_compare(rel, keysz, ScanKey, page, mid);

		if (result >= cmpval)
			low = mid + 1;
		else
			high = mid;
	}

	/*
	 * At this point we have high == low, but be careful: they could point
	 * past the last slot on the page.
	 *
	 * On a leaf page, we always return the first key >= scan key (resp. >
	 * scan key), which could be the last slot + 1.
	 */
	if (P_ISLEAF(opaque))
		return low;

	/*
	 * On a non-leaf page, return the last key < scan key (resp. <= scan key).
	 * There must be one if _bt_compare() is playing by the rules.
	 */
	Assert(low > P_FIRSTDATAKEY(opaque));

	return OffsetNumberPrev(low);
}

这段代码很短，但是注释很长，并且注释里面的内容都很关键，我们现在来看看这些关键点：

loop invariant

在注释中出现了一个词loop invariant，这是循环不变量的意思，这也是这个二分法的关键点。在这个循环中，循环不变量需要保证的是：
- 当nextkey=false（cmpval=1）时，位于low之前的所有key都 < sacnkey;位于high或high之后的所有key都 >= ScanKey。
  
  那么这就意味着，当low = high跳出循环后，low之前的key < sacnkey，low或low之后的key >= ScanKey，于是我们就找到了第一个 >= ScanKey的item。
- 当nextkey=true（cmpval=0）时，位于low之前的所有key都 <= sacnkey;位于high或high之后的所有key都 > ScanKey。
  
  那么这就意味着，当low = high跳出循环后，low之前的key <= sacnkey，low或low之后的key > ScanKey，于是我们就找到了第一个 > ScanKey的item。
high++;

上述代码的line70，对于high进行了自增。这是为了让high保持循环不变。因为high或high之后的key需要 >= 或 > ScanKey，如果原本high < ScanKey，那么此时high需要指向high++这个位置。这就导致当跳出循环后，得到的位置可能是页面中最后一个solt之后的位置。
```
/*
 * At this point we have high == low, but be careful: they could point
 * past the last slot on the page.
 */
```
对叶子节点和非叶子节点的处理

从上面代码中不难看出，对于叶子节点和非叶子节点的二分法采用了不同的处理方式：
```
/* 
 * On a leaf page, we always return the first key >= scan key
 * On a non-leaf page, return the last key < scan key 
 */
```
在《PostgreSQL B+树索引—查询》中，我们讲过，非叶子中每一个item对应一个叶子节点中的最小值。那么假定非叶子节点中有如下值：

1 2 3 4 5 5 5 5 6 7 8 9 10

那么对于SQL语句：
```
select * from test where a = 5;
```
应该定位到最后一个比5小的item，也就是4：

1 2 3 4 5 5 5 5 6 7 8 9 10

这个4表示，下级节点中最小的key为4，所以下级节点中是可能含有5的。

_bt_compare

int32
_bt_compare(Relation rel,
			int keysz,
			ScanKey ScanKey,
			Page page,
			OffsetNumber offnum)
{
	TupleDesc	itupdesc = RelationGetDescr(rel);
	BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
	IndexTuple	itup;
	int			i;

	if (!P_ISLEAF(opaque) && offnum == P_FIRSTDATAKEY(opaque))
		return 1;

	itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offnum));
	for (i = 1; i <= keysz; i++)
	{
		Datum		datum;
		bool		isNull;
		int32		result;

		datum = index_getattr(itup, ScanKey->sk_attno, itupdesc, &isNull);

		/* see comments about NULLs handling in btbuild */
		if (ScanKey->sk_flags & SK_ISNULL)		/* key is NULL */
		{
            //省略
		}
		else if (isNull)		/* key is NOT_NULL and item is NULL */
		{
             //省略
		}
		else
		{
            //比较
			result = DatumGetInt32(FunctionCall2Coll(&ScanKey->sk_func,
													 ScanKey->sk_collation,
													 datum,
													 ScanKey->sk_argument));

			if (!(ScanKey->sk_flags & SK_BT_DESC))
				result = -result;
		}

		/* if the keys are unequal, return the difference */
		if (result != 0)
			return result;

		ScanKey++;
	}
	/* if we get here, the keys are equal */
	return 0;
}

对于_bt_compare，这里我们对代码进行了简化，这个函数主要的流程就是遍历ScanKey array，取出其中每一个ScanKey然后与索引中的key向比较，如果相等就比较下一个ScanKey。

这也就是为什么ScanKey array中只能出现一次>或<的原因！因为_bt_compare在对于ScanKey进行比较时，是不管ScanKey中的原始符号的，**而比较规则与=、>=、<=是兼容的，但与>、<不兼容。**我们来看一个例子：

(1,1) (1,2) (1,3) (2,1) (2,2) (2,3) (3,1) (3,2) (3,3)

现有SQL：

select * from test where a > 1 AND b = 1;

按照上述规则，只有 a > 1可以作为ScanKey，然后定位到(2,1):

(1,1) (1,2) (1,3) (2,1) (2,2) (2,3) (3,1) (3,2) (3,3)

但是如果将a > 1、b = 1都作为ScanKey，那么就会定位到(1,1)：

(1,1) (1,2) (1,3) (2,1) (2,2) (2,3) (3,1) (3,2) (3,3)

因为将a > 1、b = 1转换为insert-style，实际上等价于a = 1、b = 1。

实现细节

最后，我们来看看_bt_first中part1和part2的代码实现，part3和part4的代码比较简单，并且前面已经提到过，所以就不再赘述。

_bt_preprocess_keys

void
_bt_preprocess_keys(IndexScanDesc scan)
{
	BTScanOpaque so = (BTScanOpaque) scan->opaque;
	int			numberOfKeys = scan->numberOfKeys;
	int16	   *indoption = scan->indexRelation->rd_indoption;
	int			new_numberOfKeys;
	int			numberOfEqualCols;
	ScanKey		inkeys;
	ScanKey		outkeys;
	ScanKey		cur;
	ScanKey		xform[BTMaxStrategyNumber];
	bool		test_result;
	int			i,
				j;
	AttrNumber	attno;

	/* initialize result variables */
	so->qual_ok = true;
	so->numberOfKeys = 0;

	if (numberOfKeys < 1)
		return;					/* done if qual-less scan */

	/*
	 * Read so->arrayKeyData if array keys are present, else scan->keyData
	 * step1:获取函数的输入ScanKey与输出ScanKey
	 */
	if (so->arrayKeyData != NULL)
		inkeys = so->arrayKeyData;
	else
		inkeys = scan->keyData;

	outkeys = so->keyData;
	cur = &inkeys[0];
	/* we check that input keys are correctly ordered */
	if (cur->sk_attno < 1)
		elog(ERROR, "btree index keys must be ordered by attribute");

	/* We can short-circuit most of the work if there's just one key
	 * ScanKey array中如果只有1个元素，那么就不涉及合并等操作，可以简单处理
     */
	if (numberOfKeys == 1)
	{
		/* Apply indoption to ScanKey (might change sk_strategy!) */
		if (!_bt_fix_scankey_strategy(cur, indoption))
			so->qual_ok = false;
		memcpy(outkeys, cur, sizeof(ScanKeyData));
		so->numberOfKeys = 1;
		/* We can mark the qual as required if it's for first index col 
		 * _bt_mark_scankey_required用于标记遍历阶段可以用于终止遍历的条件，
		 * 由于用于终止遍历的条件必须满足最左匹配原则，此时ScanKey array又只有一个元素，
		 * 所以只有cur->sk_attno == 1时才满足最左匹配原则。
		 */
		if (cur->sk_attno == 1)
			_bt_mark_scankey_required(outkeys);
		return;
	}

	/*
	 * Otherwise, do the full set of pushups.
	 */
	new_numberOfKeys = 0;
	numberOfEqualCols = 0;

	/*
	 * Initialize for processing of keys for attr 1.
	 *
	 * xform[i] points to the currently best scan key of strategy type i+1; it
	 * is NULL if we haven't yet found such a key for this attr.
	 */
	attno = 1;
	memset(xform, 0, sizeof(xform));

	/*
	 * Loop iterates from 0 to numberOfKeys inclusive; we use the last pass to
	 * handle after-last-key processing.  Actual exit from the loop is at the
	 * "break" statement below.
	 */
	for (i = 0;; cur++, i++)
	{
		if (i < numberOfKeys)
		{
			/* Apply indoption to ScanKey (might change sk_strategy!) */
			if (!_bt_fix_scankey_strategy(cur, indoption))
			{
				/* NULL can't be matched, so give up */
				so->qual_ok = false;
				return;
			}
		}

		/*
		 * If we are at the end of the keys for a particular attr, finish up
		 * processing and emit the cleaned-up keys.
		 */
		if (i == numberOfKeys || cur->sk_attno != attno)
		{
			//step3: 合并运算符，暂时省略
		}

		/* check strategy this key's operator corresponds to */
		j = cur->sk_strategy - 1;

		/* if row comparison, push it directly to the output array */
		if (cur->sk_flags & SK_ROW_HEADER)
		{
			//处理row comparison
			continue;
		}
		/* step2: 搜集ScanKey */
		/* have we seen one of these before? */
		if (xform[j] == NULL)
		{
			/* nope, so remember this ScanKey */
			xform[j] = cur;
		}
		else
		{
			/* yup, keep only the more restrictive key */
			if (_bt_compare_scankey_args(scan, cur, cur, xform[j],
										 &test_result))
			{
				if (test_result)
					xform[j] = cur;
				else if (j == (BTEqualStrategyNumber - 1))
				{
					/* key == a && key == b, but a != b */
					so->qual_ok = false;
					return;
				}
				/* else old key is more restrictive, keep it */
			}
			else
			{
				/*
				 * We can't determine which key is more restrictive.  Keep the
				 * previous one in xform[j] and push this one directly to the
				 * output array.
				 */
				ScanKey		outkey = &outkeys[new_numberOfKeys++];

				memcpy(outkey, cur, sizeof(ScanKeyData));
				if (numberOfEqualCols == attno - 1)
					_bt_mark_scankey_required(outkey);
			}
		}
	}

	so->numberOfKeys = new_numberOfKeys;
}

我们使用一个用例来解释上述代码：

select * from test where b < 10 and a > 2 and and a > 3 and a >= 1 and a <= 10 ;

这个流程可以分为三个步骤：

step1：获取函数的输入ScanKey与输出ScanKey

这个步骤没什么特别的，输入ScanKey：scan->keyData，输出ScanKey：so->keyData。这里需要注意的是，输入的ScanKey array是一个按照列序升序排列的有序数组：

[a > 2] [a > 3] [a >= 1] [a <= 10] [b < 10]
step2：搜集ScanKey

这里需要首先将一个非常重要的成员xform：
```
ScanKey		xform[BTMaxStrategyNumber];
```
xform是一个ScanKey数组，数组元素个数为BTMaxStrategyNumber，所谓StrategyNumber实际上可以简单理解为比较运算符即：<、<=、=、>=、>，共5个，定义如下：
```
/*
 * Strategy numbers for B-tree indexes.
 */
#define BTLessStrategyNumber			1
#define BTLessEqualStrategyNumber		2
#define BTEqualStrategyNumber			3
#define BTGreaterEqualStrategyNumber	4
#define BTGreaterStrategyNumber			5

#define BTMaxStrategyNumber				5
```
这个数组用来干什么呢？用来搜集与合并ScanKey。对于上面的ScanKey array：[a > 2] [a > 3] [a >= 1] [a <= 10] [b < 10]，搜集的方式是遍历这个数组，比如现在我们获取到的ScanKey为[a > 2]，其比较运算符是>，所以我们令xform[BTGreaterStrategyNumber-1] = [a > 2]（BTLessStrategyNumber从1开始，所以这里需要-1）。然后继续遍历，获取[a > 3]，其比较运算符也是>，此时xform[BTGreaterStrategyNumber-1]已经有[a > 2]了，于是我们需要将这两个范围进行合并（也就是注释中所谓的：keep only the more restrictive key）。显然 a > 3是a > 2的子集，于是我们令xform[BTGreaterStrategyNumber-1] = [a > 3]。

step3：合并运算符

执行这个步骤需要两个条件：i == numberOfKeys或 cur->sk_attno != attno，也就是说当遍历到最后一个ScanKey或者ScanKey中的索引字段发生变化时。对于上述用例，当我们遍历到[b < 10]时就会触发cur->sk_attno != attno。此时xform中的5个元素分别如下：

[NULL] [a <= 10] NULL [a >= 1] [a > 3]

不难发现，[a >= 1]和[a > 3]，还可以进一步合并为[a > 3]。这也就是step3的第一个作用：对xform进行最终的合并，合并后就得到了列a的最终ScanKey：[a <= 10] [a > 3]。合并后，不难发现，a > 3满足最左匹配原则，所以这个条件可以作为终止遍历条件，这也就是step3的第二个作用：标记可以终止遍历的ScanKey。这些做完之后，我们就已经处理完了列a对应的ScanKey，所以需要将[a <= 10]、[a > 3]写入输出ScanKey即so->keyData。然后清空xform数组，继续处理后面的列b的ScanKey。step3的相关代码如下：

if (i == numberOfKeys || cur->sk_attno != attno)
{
    int			priorNumberOfEqualCols = numberOfEqualCols;

    /* check input keys are correctly ordered */
    if (i < numberOfKeys && cur->sk_attno < attno)
        elog(ERROR, "btree index keys must be ordered by attribute");


    /* 合并=与其他运算符，例如：key = 1 AND key > 2 */
    if (xform[BTEqualStrategyNumber - 1])
    {
        ScanKey		eq = xform[BTEqualStrategyNumber - 1];

        for (j = BTMaxStrategyNumber; --j >= 0;)
        {
            ScanKey		chk = xform[j];

            if (!chk || j == (BTEqualStrategyNumber - 1))
                continue;

            if (eq->sk_flags & SK_SEARCHNULL)
            {
                /* IS NULL is contradictory to anything else */
                so->qual_ok = false;
                return;
            }

            if (_bt_compare_scankey_args(scan, chk, eq, chk,
                                         &test_result))
            {
                if (!test_result)
                {
                    /* keys proven mutually contradictory */
                    so->qual_ok = false;
                    return;
                }
                /* else discard the redundant non-equality key */
                xform[j] = NULL;
            }
            /* else, cannot determine redundancy, keep both keys */
        }
        /* track number of attrs for which we have "=" keys 
         * 记录含有=的索引列数
         */
        numberOfEqualCols++;
    }

    /* try to keep only one of <, <= 
     * 合并 < 和 <= 
     */
    if (xform[BTLessStrategyNumber - 1]
        && xform[BTLessEqualStrategyNumber - 1])
    {
        ScanKey		lt = xform[BTLessStrategyNumber - 1];
        ScanKey		le = xform[BTLessEqualStrategyNumber - 1];

        if (_bt_compare_scankey_args(scan, le, lt, le,
                                     &test_result))
        {
            if (test_result)
                xform[BTLessEqualStrategyNumber - 1] = NULL;
            else
                xform[BTLessStrategyNumber - 1] = NULL;
        }
    }

    /* try to keep only one of >, >= 
     * 合并 > 和 >= 
     */
    if (xform[BTGreaterStrategyNumber - 1]
        && xform[BTGreaterEqualStrategyNumber - 1])
    {
        ScanKey		gt = xform[BTGreaterStrategyNumber - 1];
        ScanKey		ge = xform[BTGreaterEqualStrategyNumber - 1];

        if (_bt_compare_scankey_args(scan, ge, gt, ge,
                                     &test_result))
        {
            if (test_result)
                xform[BTGreaterEqualStrategyNumber - 1] = NULL;
            else
                xform[BTGreaterStrategyNumber - 1] = NULL;
        }
    }

    /*
	 * Emit the cleaned-up keys into the outkeys[] array, and then
	 * mark them if they are required.  They are required (possibly
	 * only in one direction) if all attrs before this one had "=".
	 * 输出处理后的ScanKey，标记可以作为遍历终止条件的ScanKey，
	 * 这个ScanKey之前的ScanKey的比较运算符必须是=
	 */
    for (j = BTMaxStrategyNumber; --j >= 0;)
    {
        if (xform[j])
        {
            ScanKey		outkey = &outkeys[new_numberOfKeys++];

            memcpy(outkey, xform[j], sizeof(ScanKeyData));
            if (priorNumberOfEqualCols == attno - 1)
                _bt_mark_scankey_required(outkey);
        }
    }

    /*
	 * Exit loop here if done.
	 * 结束ScanKey的遍历
	 */
    if (i == numberOfKeys)
        break;

    /* Re-initialize for new attno */
    attno = cur->sk_attno;
    memset(xform, 0, sizeof(xform));
}

Part2

if (so->numberOfKeys > 0)
{
    AttrNumber	curattr;
    ScanKey		chosen;
    ScanKey		impliesNN;
    ScanKey		cur;

    /*
	 * chosen is the so-far-chosen key for the current attribute, if any.
	 * We don't cast the decision in stone until we reach keys for the
	 * next attribute.
	 */
    curattr = 1;
    chosen = NULL;
    /* Also remember any scankey that implies a NOT NULL constraint */
    impliesNN = NULL;

    /*
	 * Loop iterates from 0 to numberOfKeys inclusive; we use the last
	 * pass to handle after-last-key processing.  Actual exit from the
	 * loop is at one of the "break" statements below.
	 */
    for (cur = so->keyData, i = 0;; cur++, i++)
    {
        if (i >= so->numberOfKeys || cur->sk_attno != curattr)
        {
            /*
			 * Done looking at keys for curattr.  If we didn't find a
			 * usable boundary key, see if we can deduce a NOT NULL key.
			 *
			 * step2：将选中(chosen)的ScanKey输出，即加入startKeys
			 */
            if (chosen == NULL && impliesNN != NULL &&
                ((impliesNN->sk_flags & SK_BT_NULLS_FIRST) ?
                 ScanDirectionIsForward(dir) :
                 ScanDirectionIsBackward(dir)))
            {
                /* Yes, so build the key in notnullkeys[keysCount] */
                chosen = &notnullkeys[keysCount];
                ScanKeyEntryInitialize(chosen,
                                       (SK_SEARCHNOTNULL | SK_ISNULL |
                                        (impliesNN->sk_flags &
                                         (SK_BT_DESC | SK_BT_NULLS_FIRST))),
                                       curattr,
                                       ((impliesNN->sk_flags & SK_BT_NULLS_FIRST) ?
                                        BTGreaterStrategyNumber :
                                        BTLessStrategyNumber),
                                       InvalidOid,
                                       InvalidOid,
                                       InvalidOid,
                                       (Datum) 0);
            }

            /*
			 * If we still didn't find a usable boundary key, quit; else
			 * save the boundary key pointer in startKeys.
			 * 将chosen输出到startKeys中。
			 */
            if (chosen == NULL)
                break;
            startKeys[keysCount++] = chosen;

            /*
			 * Adjust strat_total, and quit if we have stored a > or <
			 * key.
			 * 如果chosen的运算符是>或<，则结束循环
			 */
            strat = chosen->sk_strategy;
            if (strat != BTEqualStrategyNumber)
            {
                strat_total = strat;
                if (strat == BTGreaterStrategyNumber ||
                    strat == BTLessStrategyNumber)
                    break;
            }

            /*
			 * Done if that was the last attribute, or if next key is not
			 * in sequence (implying no boundary key is available for the
			 * next attribute).
			 * 如果当前索引列不满足最左匹配，则结束循环
			 */
            if (i >= so->numberOfKeys ||
                cur->sk_attno != curattr + 1)
                break;

            /*
			 * Reset for next attr.
			 */
            curattr = cur->sk_attno;
            chosen 	= NULL;
            impliesNN = NULL;
        }

        /*
		 * Can we use this key as a starting boundary for this attr?
		 *
		 * If not, does it imply a NOT NULL constraint?  (Because
		 * SK_SEARCHNULL keys are always assigned BTEqualStrategyNumber,
		 * *any* inequality key works for that; we need not test.)
		 * step1：选择可以用于定位的ScanKey
		 */
        switch (cur->sk_strategy)
        {
            case BTLessStrategyNumber:
            case BTLessEqualStrategyNumber:
                if (chosen == NULL)
                {
                    if (ScanDirectionIsBackward(dir))
                        chosen = cur;
                    else
                        impliesNN = cur;
                }
                break;
            case BTEqualStrategyNumber:
                /* override any non-equality choice */
                chosen = cur;
                break;
            case BTGreaterEqualStrategyNumber:
            case BTGreaterStrategyNumber:
                if (chosen == NULL)
                {
                    if (ScanDirectionIsForward(dir))
                        chosen = cur;
                    else
                        impliesNN = cur;
                }
                break;
        }
    }
}

/*
 * If we found no usable boundary keys, we have to start from one end of
 * the tree.  Walk down that edge to the first or last key, and scan from
 * there.
 */
if (keysCount == 0)
    return _bt_endpoint(scan, dir);

前面我们讨论过，同一个索引列，只可能有一个ScanKey可以用于定位阶段，所以上述代码的功能就是遍历_bt_preprocess_keys()输出的ScanKey Array，选择可以用于定位阶段的ScanKey，并输出到startKeys中。在前面的用例中_bt_preprocess_keys()输出的的ScanKey Array如下：

[a > 3] [a <= 10] [b < 10]

我们首先获取[a > 3]，由于是正向遍历，所以显然[a > 3]可以用于定位，于是chosen指向[a > 3]。然后继续遍历到[a <= 10]，此时由于chosen不为空，并且[a <= 10]也不能用于定位，所以继续遍历到 [b < 10]，此时遍历到了ScanKey Array的最后一个元素，且索引列也有a变为了b，所以满足 if (i >= so->numberOfKeys || cur->sk_attno != curattr)。于是需要将chosen对应的ScanKey进行输出。由于chosen的运算符是>，所以后面的ScanKey都不能用于定位，于是用strat_total记录这个>符号（strat_total将用于Part4），然后结束循环。