MySQL InnoDB B-tree | 学步园

/*
Latching strategy of the InnoDB B-tree
--------------------------------------
A tree latch protects all non-leaf nodes of the tree. Each node of a tree
also has a latch of its own.

A B-tree operation normally first acquires an S-latch on the tree. It
searches down the tree and releases the tree latch when it has the
leaf node latch. To save CPU time we do not acquire any latch on
non-leaf nodes of the tree during a search; those pages are only buffer-fixed.

If an operation needs to restructure the tree, it acquires an X-latch on
the tree before searching to a leaf node. If it needs, for example, to
split a leaf,
(1) InnoDB decides the split point in the leaf,
(2) allocates a new page,
(3) inserts the appropriate node pointer to the first non-leaf level,
(4) releases the tree X-latch,
(5) and then moves records from the leaf to the newly allocated page.

Node pointers
-------------
Leaf pages of a B-tree contain the index records stored in the
tree. On levels n > 0 we store 'node pointers' to pages on level
n - 1. For each page there is exactly one node pointer stored:
thus our tree is an ordinary B-tree, not a B-link tree.

A node pointer contains a prefix P of an index record. The prefix
is long enough so that it determines an index record uniquely.
The file page number of the child page is added as the last
field. In the child page we can store node pointers or index records
which are >= P in the alphabetical order, but < P1 if there is
a next node pointer on the level, and P1 is its prefix.

If a node pointer with a prefix P points to a non-leaf child,
then the leftmost record in the child must have the same
prefix P. If it points to a leaf node, the child is not required
to contain any record with a prefix equal to P. The leaf case
is decided this way to allow arbitrary deletions in a leaf node
without touching upper levels of the tree.

We have predefined a special minimum record which we
define as the smallest record in any alphabetical order.
A minimum record is denoted by setting a bit in the record
header. A minimum record acts as the prefix of a node pointer
which points to a leftmost node on any level of the tree.

File page allocation
--------------------
In the root node of a B-tree there are two file segment headers.
The leaf pages of a tree are allocated from one file segment, to
make them consecutive on disk if possible. From the other file segment
we allocate pages for the non-leaf levels of the tree.
*/
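The node pointer rule in the comment above (a child holds keys that are >= P and < P1, and the leftmost pointer on a level carries the minimum record) can be illustrated with a small lookup. This is only a sketch under simplifying assumptions: keys are plain integers instead of record prefixes, and the names `toy_node_ptr_t`, `toy_cmp` and `toy_find_child` are hypothetical, not InnoDB identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified node pointer: an int stands in for the prefix P. */
typedef struct {
	bool		is_min_rec;	/* models the "minimum record" header bit */
	int		prefix;		/* stands in for the key prefix P */
	unsigned	child_page_no;	/* the child page number, stored last */
} toy_node_ptr_t;

/* Compare a node pointer with a search key: <0, 0 or >0.
A minimum record is smaller than every key. */
static int toy_cmp(const toy_node_ptr_t* np, int key)
{
	if (np->is_min_rec) {
		return(-1);
	}
	return((np->prefix > key) - (np->prefix < key));
}

/* Return the child page that may contain key: the last node pointer
whose prefix is <= key. Assumes ptrs[] is sorted and ptrs[0] <= key,
which holds when ptrs[0] is the minimum record. */
static unsigned toy_find_child(const toy_node_ptr_t* ptrs, size_t n, int key)
{
	size_t	lo = 0;
	size_t	hi = n;

	assert(n > 0);

	/* Binary search with the invariant ptrs[lo] <= key < ptrs[hi]. */
	while (hi - lo > 1) {
		size_t	mid = lo + (hi - lo) / 2;

		if (toy_cmp(&ptrs[mid], key) <= 0) {
			lo = mid;
		} else {
			hi = mid;
		}
	}
	return(ptrs[lo].child_page_no);
}
```

With three pointers (minimum record to page 10, prefix 20 to page 11, prefix 40 to page 12), a key of 19 is routed to page 10 and a key of 20 to page 11. Note that the leaf on page 11 is not required to actually contain a record equal to 20, which is what allows deletions in a leaf without touching the upper levels.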

/* DICT_MAX_INDEX_COL_LEN is measured in bytes and is the maximum
indexed column length (or indexed prefix length). It is set to 3*256,
so that one can create a column prefix index on 256 characters of a
TEXT or VARCHAR column also in the UTF-8 charset. In that charset,
a character may take at most 3 bytes.

This constant MUST NOT BE CHANGED, or the compatibility of InnoDB data
files would be at risk! */

#define DICT_MAX_INDEX_COL_LEN		768

/* Data structure for a field in an index */
struct dict_field_struct{
	dict_col_t*	col;		/* pointer to the table column */
	const char*	name;		/* name of the column */
	unsigned	prefix_len:10;	/* 0 or the length of the column
					prefix in bytes in a MySQL index of
					type, e.g., INDEX (textcol(25));
					must be smaller than
					DICT_MAX_INDEX_COL_LEN; NOTE that
					in the UTF-8 charset, MySQL sets this
					to 3 * the prefix len in UTF-8 chars */
	unsigned	fixed_len:10;	/* 0 or the fixed length of the
					column if smaller than
					DICT_MAX_INDEX_COL_LEN */
};
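The `prefix_len` comment says the stored value is in bytes, and that for UTF-8 it is 3 times the prefix length in characters. The arithmetic can be sketched as follows. The helper names `toy_prefix_len_bytes` and `toy_prefix_len_ok` are hypothetical, and the bound check simply follows the wording of the field comment ("must be smaller than DICT_MAX_INDEX_COL_LEN"); it is not taken from the InnoDB code itself.

```c
#include <assert.h>
#include <stdbool.h>

#define DICT_MAX_INDEX_COL_LEN	768	/* value quoted in the source above */

/* Hypothetical helper: bytes recorded in prefix_len for a prefix of
n_chars characters when one character takes at most mbmaxlen bytes
(3 for the UTF-8 charset described above). */
static unsigned toy_prefix_len_bytes(unsigned n_chars, unsigned mbmaxlen)
{
	return(n_chars * mbmaxlen);
}

/* Hypothetical check, following the wording of the prefix_len comment:
the byte length must be smaller than DICT_MAX_INDEX_COL_LEN. */
static bool toy_prefix_len_ok(unsigned len_bytes)
{
	return(len_bytes < DICT_MAX_INDEX_COL_LEN);
}
```

For `INDEX (textcol(25))` on a UTF-8 column this gives 25 * 3 = 75 bytes, which also fits comfortably in the 10-bit `prefix_len` bit-field (maximum 1023).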

/* Data structure for an index */
struct dict_index_struct{
	dulint		id;	/* id of the index */
	mem_heap_t*	heap;	/* memory heap */
	ulint		type;	/* index type */
	const char*	name;	/* index name */
	const char*	table_name; /* table name */
	dict_table_t*	table;	/* back pointer to table */
	unsigned	space:32;
				/* space where the index tree is placed */
	unsigned	page:32;/* index tree root page number */
	unsigned	trx_id_offset:10;/* position of the trx id column
				in a clustered index record, if the fields
				before it are known to be of a fixed size,
				0 otherwise */
	unsigned	n_user_defined_cols:10;
				/* number of columns the user defined to
				be in the index: in the internal
				representation we add more columns */
	unsigned	n_uniq:10;/* number of fields from the beginning
				which are enough to determine an index
				entry uniquely */
	unsigned	n_def:10;/* number of fields defined so far */
	unsigned	n_fields:10;/* number of fields in the index */
	unsigned	n_nullable:10;/* number of nullable fields */
	unsigned	cached:1;/* TRUE if the index object is in the
				dictionary cache */
	dict_field_t*	fields;	/* array of field descriptions */
	UT_LIST_NODE_T(dict_index_t)
			indexes;/* list of indexes of the table */
	btr_search_t*	search_info; /* info used in optimistic searches */
	/*----------------------*/
	ib_longlong*	stat_n_diff_key_vals;
				/* approximate number of different key values
				for this index, for each n-column prefix
				where n <= dict_get_n_unique(index); we
				periodically calculate new estimates */
	ulint		stat_index_size;
				/* approximate index size in database pages */
	ulint		stat_n_leaf_pages;
				/* approximate number of leaf pages in the
				index tree */
	rw_lock_t	lock;	/* read-write lock protecting the upper levels
				of the index tree */
#ifdef UNIV_DEBUG
	ulint		magic_n;/* magic number */
# define DICT_INDEX_MAGIC_N	76789786
#endif
};

/*****************************************************************
Makes tree one level higher by splitting the root, and inserts
the tuple. It is assumed that mtr contains an x-latch on the tree.
NOTE that the operation of this function must always succeed,
we cannot reverse it: therefore enough free disk space must be
guaranteed to be available before this function is called. */

/* Allocate a new page to the tree. Root splitting is done by first
moving the root records to the new page, emptying the root, putting
a node pointer to the new page, and then splitting the new page. */
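The first three steps of the root raise described above (move the root records to a new page, empty the root, put a node pointer to the new page) can be sketched on a toy page structure. Everything here is an illustrative assumption: `toy_page_t`, `toy_root_raise` and the fixed-size integer arrays are hypothetical, and the subsequent split of the new page is left out.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_RECS	8

/* Hypothetical, simplified page: ints stand in for records. */
typedef struct {
	unsigned	page_no;
	unsigned	level;			/* 0 = leaf */
	unsigned	n_recs;
	int		recs[TOY_MAX_RECS];	/* keys, or node pointer prefixes */
	unsigned	child[TOY_MAX_RECS];	/* child page numbers, level > 0 */
} toy_page_t;

/* Make the tree one level higher. new_page must be freshly allocated
by the caller. The root keeps its page number; only its contents change. */
static void toy_root_raise(toy_page_t* root, toy_page_t* new_page)
{
	assert(root->n_recs > 0);

	/* Move all root records to the new page, which takes over the
	old level of the root. */
	new_page->level = root->level;
	new_page->n_recs = root->n_recs;
	memcpy(new_page->recs, root->recs, sizeof root->recs);
	memcpy(new_page->child, root->child, sizeof root->child);

	/* Empty the root and raise it by one level. */
	root->level++;
	root->n_recs = 1;

	/* Put a single node pointer to the new page. The source comment
	says a leftmost node pointer carries the minimum record; this toy
	simply reuses the smallest key instead. */
	root->recs[0] = new_page->recs[0];
	root->child[0] = new_page->page_no;
}
```

Keeping the root on the same page matches the `page` field of `dict_index_struct`, which records a single root page number for the index.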

/*****************************************************************
Splits an index page to halves and inserts the tuple. It is assumed
that mtr holds an x-latch to the index tree. */

/* 1. Decide the split record; split_rec == NULL means that the
tuple to be inserted should be the first record on the upper
half-page */

/* 2. Allocate a new page to the index */

/* 3. Calculate the first record on the upper half-page, and the
first record (move_limit) on original page which ends up on the
upper half */

/* 4. Do first the modifications in the tree structure */

/* 5. Move then the records to the new page */

/* 6. The split and the tree modification is now completed. Decide the
page where the tuple should be inserted */

/* 7. Reposition the cursor for insert and try insertion */

/* 8. If insert did not fit, try page reorganization */
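The numbered steps above can be followed on a toy leaf page holding sorted integers. This is a minimal sketch, not the InnoDB routine: `toy_leaf_t`, `toy_leaf_insert` and `toy_split_and_insert` are hypothetical names, the split point is always the middle record, the `split_rec == NULL` case is not modelled, and step 8 (page reorganization) never arises because both halves have room.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_CAP	4

/* Hypothetical leaf page: a sorted array of int keys. */
typedef struct {
	size_t	n;
	int	recs[TOY_PAGE_CAP];
} toy_leaf_t;

/* Insert key in sorted order. Returns 0 on success, -1 if the page is full. */
static int toy_leaf_insert(toy_leaf_t* p, int key)
{
	size_t	i;

	if (p->n == TOY_PAGE_CAP) {
		return(-1);
	}
	for (i = p->n; i > 0 && p->recs[i - 1] > key; i--) {
		p->recs[i] = p->recs[i - 1];
	}
	p->recs[i] = key;
	p->n++;
	return(0);
}

/* Split lower into halves, move the upper half to the empty page upper,
then insert key. Returns the first key of the upper half, i.e. the key
the new node pointer would be built from. */
static int toy_split_and_insert(toy_leaf_t* lower, toy_leaf_t* upper, int key)
{
	size_t	split;
	size_t	i;
	int	boundary;
	int	rc;

	assert(lower->n >= 2);
	assert(upper->n == 0);		/* 2. caller allocated an empty page */

	split = lower->n / 2;		/* 1. decide the split record */
	boundary = lower->recs[split];	/* 3. first record of the upper half */
					/* 4. a real tree would insert the node
					pointer for boundary here */
	for (i = split; i < lower->n; i++) {
		upper->recs[upper->n++] = lower->recs[i];	/* 5. move records */
	}
	lower->n = split;

	/* 6.-7. decide the target page and insert the tuple there */
	rc = toy_leaf_insert(key < boundary ? lower : upper, key);
	assert(rc == 0);
	(void) rc;

	return(boundary);
}
```

Splitting a full page {10, 20, 30, 40} while inserting 25 leaves {10, 20, 25} on the original page and {30, 40} on the new one, with 30 as the boundary key for the new node pointer.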

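InnoDB's B-tree is an ordinary B-tree. It does not adopt any of the efficiency techniques proposed in the research literature, so its concurrency is probably not high.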

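When an insert causes a split, the whole tree is locked with an x-latch, so no other operation can descend the tree until that latch is released. Correctness is guaranteed at the cost of concurrency. In addition, once a split has changed the tree structure, the change cannot be rolled back.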
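The latch order from the "Latching strategy" comment at the top can be sketched with a toy, non-blocking latch. This is an illustration only: the `toy_*` names are hypothetical, a failed latch request returns false instead of waiting, a single leaf stands in for the whole leaf level, and a search is modelled as taking the leaf latch in S mode.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy latch: counts S holders and flags one X holder. It never blocks;
a request that cannot be granted returns false. */
typedef struct {
	int	s_holders;
	bool	x_held;
} toy_latch_t;

static bool toy_s_latch(toy_latch_t* l)
{
	if (l->x_held) {
		return(false);
	}
	l->s_holders++;
	return(true);
}

static bool toy_x_latch(toy_latch_t* l)
{
	if (l->x_held || l->s_holders > 0) {
		return(false);
	}
	l->x_held = true;
	return(true);
}

static void toy_s_unlatch(toy_latch_t* l)
{
	assert(l->s_holders > 0);
	l->s_holders--;
}

static void toy_x_unlatch(toy_latch_t* l)
{
	assert(l->x_held);
	l->x_held = false;
}

typedef struct {
	toy_latch_t	tree;	/* protects the non-leaf levels */
	toy_latch_t	leaf;	/* latch of one leaf page */
} toy_index_t;

/* Search: S-latch the tree, reach the leaf, latch it, then release the
tree latch. On success the caller holds only the leaf S-latch. */
static bool toy_search(toy_index_t* ix)
{
	if (!toy_s_latch(&ix->tree)) {
		return(false);
	}
	if (!toy_s_latch(&ix->leaf)) {
		toy_s_unlatch(&ix->tree);
		return(false);
	}
	toy_s_unlatch(&ix->tree);
	return(true);
}

/* Restructuring: X-latch the tree before descending, then X-latch the
leaf. The caller releases the tree latch once the node pointer is in
place, and the leaf latch after the records have been moved. */
static bool toy_split_begin(toy_index_t* ix)
{
	if (!toy_x_latch(&ix->tree)) {
		return(false);
	}
	if (!toy_x_latch(&ix->leaf)) {
		toy_x_unlatch(&ix->tree);
		return(false);
	}
	return(true);
}
```

In this model a search fails while a split holds the tree x-latch. After the tree latch is released (step 4 in the source comment) searches can descend again, and only the leaf that is still being split remains unavailable.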

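One distinctive feature is that InnoDB can build a hash index over some of the leaf pages (the adaptive hash index). It is effectively an index on top of the B-tree index: the hash entries point into B-tree leaf pages, while the leaf pages themselves remain ordered, which suits sequential range scans. All B-trees share the same hash table.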

It stores leaf pages and non-leaf pages in two separate segments, which improves the efficiency of read-ahead.
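The two-segment layout can be pictured with a toy allocator that picks a segment by tree level. This is a deliberately crude sketch: `toy_segment_t`, `toy_root_hdr_t` and `toy_page_alloc` are hypothetical, and a segment is modelled as one contiguous page range, which real file segments are not guaranteed to be.

```c
#include <assert.h>

/* Hypothetical segment: a contiguous run of page numbers. */
typedef struct {
	unsigned	first_page;
	unsigned	n_used;
} toy_segment_t;

/* Models the two file segment headers kept in the root page. */
typedef struct {
	toy_segment_t	leaf_seg;	/* leaf pages (level 0) */
	toy_segment_t	top_seg;	/* pages of the non-leaf levels */
} toy_root_hdr_t;

/* Allocate a page for the given tree level from the matching segment. */
static unsigned toy_page_alloc(toy_root_hdr_t* root, unsigned level)
{
	toy_segment_t*	seg = (level == 0) ? &root->leaf_seg : &root->top_seg;

	return(seg->first_page + seg->n_used++);
}
```

Because non-leaf allocations never consume pages from the leaf segment, successive leaf pages in this model get adjacent page numbers, which is the property that makes sequential read-ahead over the leaf level effective.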
