MySQL Source Code Analysis: Index Data Structures

Introduction

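A few words off topic first. I was stuck in Beijing for three months and could not go home, so this source analysis series was interrupted. I do the database source analysis at home mainly because I have a complete environment there, while my company computer is old and worn out. Unexpected events often break normal habits and routines, but that is normal in itself. I have been back for two weeks now, and starting this week I am resuming the series.
Great virtue endures; what is begun should be finished!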

1. Indexes

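What is an index, and what is it for? Remember how the teacher taught us to use a dictionary in primary school? If you do not recognize a character, or you know how it sounds but not how to write it, you can locate it directly through the pinyin or stroke lookup tables, where the entry is followed by the page number of that character. Turn to that page and you find the character, with its written form, pronunciation and meanings, and often explanations of common words and example sentences as well.
The same principle applies to finding a book in a library or picking a song at a KTV; with the spread of computers these lookups have simply become easier. In its original sense, an index is a tool for locating books. Relational database technology borrowed the tool and defines it as "a separate, physical storage structure that sorts the values of one or more columns of a database table; it is a collection of the values of one or several columns of a table together with a list of logical pointers to the data pages that physically identify those values in the table."
Imagine looking for a book in a library. Is it faster to start at the first shelf and keep going until you find it, or to use the index to go straight to the right shelf and position? The answer is obvious. So in relational databases the purpose of an index is clear: faster lookups. Note, however, that using an index does not guarantee a faster query. Whether the index is created and used sensibly determines how efficient it is.
Database indexes are generally divided into two broad classes, clustered and non-clustered. Names such as unique index and primary key index are more specific designations. Those will be covered later in the index series; this article looks at the underlying code first.
An index speeds up lookups, but at a cost, and in computing there are essentially two kinds of cost: space and time. The index itself needs storage, so the space cost always grows. And whenever rows are inserted, deleted or updated, the index has to be modified along with the data, so there is a time cost too.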
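To make the trade-off concrete, here is a minimal sketch that is not taken from the MySQL source. The names `Row`, `ToyTable`, `find_by_scan` and `find_by_index` are all hypothetical. It models an index as a sorted map from a column value to row positions, so a lookup avoids scanning every row, while every insert has to maintain a second structure.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy table: rows live in a vector, and an index maps a column
// value to the positions of the rows that hold it.
struct Row {
  int id;
  std::string name;
};

struct ToyTable {
  std::vector<Row> rows;
  std::multimap<std::string, std::size_t> name_index;  // value -> row position

  // Time cost: every insert writes the row and the index entry.
  // Space cost: name_index stores a second copy of every name.
  void insert(const Row &row) {
    rows.push_back(row);
    name_index.emplace(row.name, rows.size() - 1);
  }

  // Full scan: examines every row, O(n).
  std::vector<int> find_by_scan(const std::string &name) const {
    std::vector<int> ids;
    for (const Row &row : rows) {
      if (row.name == name) ids.push_back(row.id);
    }
    return ids;
  }

  // Index lookup: O(log n) to reach the first match, then one step per match.
  std::vector<int> find_by_index(const std::string &name) const {
    std::vector<int> ids;
    auto range = name_index.equal_range(name);
    for (auto it = range.first; it != range.second; ++it) {
      ids.push_back(rows[it->second].id);
    }
    return ids;
  }
};
```

Both lookups return the same ids; the difference is how many entries they touch, and that `insert` does twice the bookkeeping.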

2. Index data structures in MySQL

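In MySQL, different storage engines use different index data structures. MyISAM (non-clustered indexes) and InnoDB (clustered index) use B+ trees, while the in-memory MEMORY engine uses HASH by default (the `algorithm` field of HP_KEYDEF below is commented "HASH / BTREE", and the structure also carries a tree member). The rest of this article concentrates on these two data structures.
Let us start with the underlying index data structures in the MySQL source: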

First, the HASH data structures (in storage/heap, the in-memory storage engine):

//heapdef.h
struct HASH_INFO {
  HASH_INFO * next_key;
  uchar * ptr_to_rec;
  ulong hash; /* Cached key hash value. */
};
//heap.h
struct HP_KEYDEF /* Key definition with open */
{
  uint flag{0};       /* HA_NOSAME | HA_NULL_PART_KEY */
  uint keysegs{0};    /* Number of key-segment */
  uint length{0};     /* Length of key (automatic) */
  uint8 algorithm{0}; /* HASH / BTREE */
  HA_KEYSEG *seg{nullptr};
  HP_BLOCK block; /* Where keys are saved */
  /*
    Number of buckets used in hash table. Used only to provide
    #records estimates for heap key scans.
  */
  ha_rows hash_buckets{0};
  TREE rb_tree;
  int (*write_key)(HP_INFO *info, HP_KEYDEF *keyinfo, const uchar *record,
                   uchar *recpos){nullptr};
  int (*delete_key)(HP_INFO *info, HP_KEYDEF *keyinfo, const uchar *record,
                    uchar *recpos, int flag){nullptr};
  uint (*get_key_length)(HP_KEYDEF *keydef, const uchar *key){nullptr};
};
//include/my_compare.h
struct HA_KEYSEG /* Key-portion */
{
  const CHARSET_INFO *charset;
  uint32 start;    /* Start of key in record */
  uint32 null_pos; /* position to NULL indicator */
  uint16 bit_pos;  /* Position to bit part */
  uint16 flag;
  uint16 length; /* Keylength */
  uint16 language;
  uint8 type;               /* Type of key (for sort) */
  uint8 null_bit;           /* bitmask to test for NULL */
  uint8 bit_start, bit_end; /* if bit field */
  uint8 bit_length;         /* Length of bit part */
};

Be careful not to confuse this with the Adaptive Hash Index in InnoDB; the two live in different directories.
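The three fields of HASH_INFO (a chain pointer, a record pointer and a cached hash value) suggest a chained hash table. The following is a simplified sketch of that idea only, not the heap engine's actual implementation: `ToyHashInfo`, `ToyHashIndex` and the use of `std::hash` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Simplified stand-in for HASH_INFO: chain link, record pointer, cached hash.
struct ToyHashInfo {
  ToyHashInfo *next_key;
  const std::string *ptr_to_rec;
  std::size_t hash;
};

// Hypothetical chained hash index over records owned by the caller.
class ToyHashIndex {
 public:
  explicit ToyHashIndex(std::size_t n_buckets) : buckets_(n_buckets, nullptr) {
    assert(n_buckets > 0);
  }

  // Links a new entry at the head of its bucket's chain.
  void insert(const std::string *rec) {
    const std::size_t h = std::hash<std::string>{}(*rec);
    ToyHashInfo *&head = buckets_[h % buckets_.size()];
    entries_.push_back(
        std::unique_ptr<ToyHashInfo>(new ToyHashInfo{head, rec, h}));
    head = entries_.back().get();
  }

  // Walks a single chain. Comparing the cached hash first skips most
  // non-matching entries without touching the record itself.
  const std::string *find(const std::string &key) const {
    const std::size_t h = std::hash<std::string>{}(key);
    for (const ToyHashInfo *e = buckets_[h % buckets_.size()]; e != nullptr;
         e = e->next_key) {
      if (e->hash == h && *e->ptr_to_rec == key) return e->ptr_to_rec;
    }
    return nullptr;
  }

 private:
  std::vector<ToyHashInfo *> buckets_;
  std::vector<std::unique_ptr<ToyHashInfo>> entries_;  // owns the chain nodes
};
```

A structure like this answers equality lookups in roughly constant time, but it keeps no order between keys, so it cannot serve range conditions the way a tree can.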

Next, the tree data structures:

//storage/innobase/include/mem0mem.h
/** The info structure stored at the beginning of a heap block */
struct mem_block_info_t {
  uint64_t magic_n; /* magic number for debugging */
#ifdef UNIV_DEBUG
  char file_name[16]; /* file name where the mem heap was created */
  ulint line;         /*!< line number where the mem heap was created */
#endif                /* UNIV_DEBUG */
  UT_LIST_BASE_NODE_T(mem_block_t)
  base; /* In the first block in the
the list this is the base node of the list of blocks;
in subsequent blocks this is undefined */
  UT_LIST_NODE_T(mem_block_t)
  list;             /* This contains pointers to next
  and prev in the list. The first block allocated
  to the heap is also the first block in this list,
  though it also contains the base node of the list. */
  ulint len;        /*!< physical length of this block in bytes */
  ulint total_size; /*!< physical length in bytes of all blocks
                in the heap. This is defined only in the base
                node and is set to ULINT_UNDEFINED in others. */
  ulint type;       /*!< type of heap: MEM_HEAP_DYNAMIC, or
                    MEM_HEAP_BUF possibly ORed to MEM_HEAP_BTR_SEARCH */
  ulint free;       /*!< offset in bytes of the first free position for
                    user data in the block */
  ulint start;      /*!< the value of the struct field 'free' at the
                    creation of the block */
  void *free_block;
  /* if the MEM_HEAP_BTR_SEARCH bit is set in type,
  and this is the heap root, this can contain an
  allocated buffer frame, which can be appended as a
  free block to the heap, if we need more space;
  otherwise, this is NULL */
  void *buf_block;
  /* if this block has been allocated from the buffer
  pool, this contains the buf_block_t handle;
  otherwise, this is NULL */
};

/** The search info struct in an index */
struct btr_search_t {
  ulint ref_count; /*!< Number of blocks in this index tree
                   that have search index built
                   i.e. block->index points to this index.
                   Protected by search latch except
                   when during initialization in
                   btr_search_info_create(). */

  /** @{ The following fields are not protected by any latch.
  Unfortunately, this means that they must be aligned to
  the machine word, i.e., they cannot be turned into bit-fields. */
  buf_block_t *root_guess; /*!< the root page frame when it was last time
                           fetched, or NULL */
  ulint hash_analysis;     /*!< when this exceeds
                           BTR_SEARCH_HASH_ANALYSIS, the hash
                           analysis starts; this is reset if no
                           success noticed */
  ibool last_hash_succ;    /*!< TRUE if the last search would have
                           succeeded, or did succeed, using the hash
                           index; NOTE that the value here is not exact:
                           it is not calculated for every search, and the
                           calculation itself is not always accurate! */
  ulint n_hash_potential;
  /*!< number of consecutive searches
  which would have succeeded, or did succeed,
  using the hash index;
  the range is 0 .. BTR_SEARCH_BUILD_LIMIT + 5 */
  /** @} */
  /**---------------------- @{ */
  ulint n_fields;  /*!< recommended prefix length for hash search:
                   number of full fields */
  ulint n_bytes;   /*!< recommended prefix: number of bytes in
                   an incomplete field
                   @see BTR_PAGE_MAX_REC_SIZE */
  ibool left_side; /*!< TRUE or FALSE, depending on whether
                   the leftmost record of several records with
                   the same prefix should be indexed in the
                   hash index */
                   /*---------------------- @} */
#ifdef UNIV_SEARCH_PERF_STAT
  ulint n_hash_succ; /*!< number of successful hash searches thus
                     far */
  ulint n_hash_fail; /*!< number of failed hash searches */
  ulint n_patt_succ; /*!< number of successful pattern searches thus
                     far */
  ulint n_searches;  /*!< number of searches */
#endif               /* UNIV_SEARCH_PERF_STAT */
#ifdef UNIV_DEBUG
  ulint magic_n; /*!< magic number @see BTR_SEARCH_MAGIC_N */
/** value of btr_search_t::magic_n, used in assertions */
#define BTR_SEARCH_MAGIC_N 1112765
#endif /* UNIV_DEBUG */
};
//storage/innobase/include/dict0mem.h
/** Data structure for an index.  Most fields will be
initialized to 0, NULL or FALSE in dict_mem_index_create(). */
struct dict_index_t {
  space_index_t id;       /*!< id of the index */
  mem_heap_t *heap;       /*!< memory heap */
  id_name_t name;         /*!< index name */
  const char *table_name; /*!< table name */
  dict_table_t *table;    /*!< back pointer to table */
  unsigned space : 32;
  /*!< space where the index tree is placed */
  unsigned page : 32; /*!< index tree root page number */
  unsigned merge_threshold : 6;
  /*!< In the pessimistic delete, if the page
  data size drops below this limit in percent,
  merging it to a neighbor is tried */
#define DICT_INDEX_MERGE_THRESHOLD_DEFAULT 50
  unsigned type : DICT_IT_BITS;
  /*!< index type (DICT_CLUSTERED, DICT_UNIQUE,
  DICT_IBUF, DICT_CORRUPT) */
#define MAX_KEY_LENGTH_BITS 12
  unsigned trx_id_offset : MAX_KEY_LENGTH_BITS;
  /*!< position of the trx id column
  in a clustered index record, if the fields
  before it are known to be of a fixed size,
  0 otherwise */
#if (1 << MAX_KEY_LENGTH_BITS) < MAX_KEY_LENGTH
#error(1<<MAX_KEY_LENGTH_BITS) < MAX_KEY_LENGTH
#endif
  unsigned n_user_defined_cols : 10;
  /*!< number of columns the user defined to
  be in the index: in the internal
  representation we add more columns */
  unsigned allow_duplicates : 1;
  /*!< if true, allow duplicate values
  even if index is created with unique
  constraint */
  unsigned nulls_equal : 1;
  /*!< if true, SQL NULL == SQL NULL */
  unsigned disable_ahi : 1;
  /*!< if true, then disable AHI. Currently
  limited to intrinsic temporary table and SDI
  table as index id is not unique for such table
  which is one of the validation criterion for
  ahi. */
  unsigned n_uniq : 10;     /*!< number of fields from the beginning
                          which are enough to determine an index
                          entry uniquely */
  unsigned n_def : 10;      /*!< number of fields defined so far */
  unsigned n_fields : 10;   /*!< number of fields in the index */
  unsigned n_nullable : 10; /*!< number of nullable fields */
  unsigned n_instant_nullable : 10;
  /*!< number of nullable fields before first
  instant ADD COLUMN applied to this table.
  This is valid only when has_instant_cols() is true */
  unsigned cached : 1; /*!< TRUE if the index object is in the
                      dictionary cache */
  unsigned to_be_dropped : 1;
  /*!< TRUE if the index is to be dropped;
  protected by dict_operation_lock */
  unsigned online_status : 2;
  /*!< enum online_index_status.
  Transitions from ONLINE_INDEX_COMPLETE (to
  ONLINE_INDEX_CREATION) are protected
  by dict_operation_lock and
  dict_sys->mutex. Other changes are
  protected by index->lock. */
  unsigned uncommitted : 1;
  /*!< a flag that is set for secondary indexes
  that have not been committed to the
  data dictionary yet */
  unsigned instant_cols : 1;
  /*!< TRUE if the index is clustered index and it has some
  instant columns */
  uint32_t srid; /* spatial reference id */
  bool srid_is_valid;
  /* says whether SRID is valid - it cane be
  undefined */
  std::unique_ptr<dd::Spatial_reference_system> rtr_srs;
  /*!< Cached spatial reference system dictionary
  entry used by R-tree indexes. */

#ifdef UNIV_DEBUG
  uint32_t magic_n; /*!< magic number */
/** Value of dict_index_t::magic_n */
#define DICT_INDEX_MAGIC_N 76789786
#endif
  dict_field_t *fields; /*!< array of field descriptions */
#ifndef UNIV_HOTBACKUP
  st_mysql_ftparser *parser; /*!< fulltext parser plugin */
  bool is_ngram;
  /*!< true if it's ngram parser */
  bool has_new_v_col;
  /*!< whether it has a newly added virtual
  column in ALTER */
  bool hidden; /*!< if the index is an hidden index */
#endif         /* !UNIV_HOTBACKUP */
  UT_LIST_NODE_T(dict_index_t)
  indexes; /*!< list of indexes of the table */
  btr_search_t *search_info;
  /*!< info used in optimistic searches */
#ifndef UNIV_HOTBACKUP
  row_log_t *online_log;
  /*!< the log of modifications
  during online index creation;
  valid when online_status is
  ONLINE_INDEX_CREATION */
  /*----------------------*/
  /** Statistics for query optimization */
  /** @{ */
  ib_uint64_t *stat_n_diff_key_vals;
  /*!< approximate number of different
  key values for this index, for each
  n-column prefix where 1 <= n <=
  dict_get_n_unique(index) (the array is
  indexed from 0 to n_uniq-1); we
  periodically calculate new
  estimates */
  ib_uint64_t *stat_n_sample_sizes;
  /*!< number of pages that were sampled
  to calculate each of stat_n_diff_key_vals[],
  e.g. stat_n_sample_sizes[3] pages were sampled
  to get the number stat_n_diff_key_vals[3]. */
  ib_uint64_t *stat_n_non_null_key_vals;
  /* approximate number of non-null key values
  for this index, for each column where
  1 <= n <= dict_get_n_unique(index) (the array
  is indexed from 0 to n_uniq-1); This
  is used when innodb_stats_method is
  "nulls_ignored". */
  ulint stat_index_size;
  /*!< approximate index size in
  database pages */
#endif /* !UNIV_HOTBACKUP */
  ulint stat_n_leaf_pages;
  /*!< approximate number of leaf pages in the
  index tree */
  /** @} */
  last_ops_cur_t *last_ins_cur;
  /*!< cache the last insert position.
  Currently limited to auto-generated
  clustered index on intrinsic table only. */
  last_ops_cur_t *last_sel_cur;
  /*!< cache the last selected position
  Currently limited to intrinsic table only. */
  rec_cache_t rec_cache;
  /*!< cache the field that needs to be
  re-computed on each insert.
  Limited to intrinsic table as this is common
  share and can't be used without protection
  if table is accessible to multiple-threads. */
  rtr_ssn_t rtr_ssn;           /*!< Node sequence number for RTree */
  rtr_info_track_t *rtr_track; /*!< tracking all R-Tree search cursors */
  trx_id_t trx_id;             /*!< id of the transaction that created this
                               index, or 0 if the index existed
                               when InnoDB was started up */
  zip_pad_info_t zip_pad;      /*!< Information about state of
                               compression failures and successes */
  rw_lock_t lock;              /*!< read-write lock protecting the
                               upper levels of the index tree */
  bool fill_dd;                /*!< Flag whether need to fill dd tables
                               when it's a fulltext index. */

  /** Determine if the index has been committed to the
  data dictionary.
  @return whether the index definition has been committed */
  bool is_committed() const {
    ut_ad(!uncommitted || !(type & DICT_CLUSTERED));
    return (UNIV_LIKELY(!uncommitted));
  }

  /** Flag an index committed or uncommitted.
  @param[in]	committed	whether the index is committed */
  void set_committed(bool committed) {
    ut_ad(!to_be_dropped);
    ut_ad(committed || !(type & DICT_CLUSTERED));
    uncommitted = !committed;
  }

  /** Get the next index.
  @return	next index
  @retval	NULL	if this was the last index */
  const dict_index_t *next() const {
    const dict_index_t *next = UT_LIST_GET_NEXT(indexes, this);
    ut_ad(magic_n == DICT_INDEX_MAGIC_N);
    return (next);
  }
  /** Get the next index.
  @return	next index
  @retval	NULL	if this was the last index */
  dict_index_t *next() {
    return (const_cast<dict_index_t *>(
        const_cast<const dict_index_t *>(this)->next()));
  }

  /** Check whether the index is corrupted.
  @return true if index is corrupted, otherwise false */
  bool is_corrupted() const {
    ut_ad(magic_n == DICT_INDEX_MAGIC_N);

    return (type & DICT_CORRUPT);
  }

  /* Check whether the index is the clustered index
  @return nonzero for clustered index, zero for other indexes */

  bool is_clustered() const {
    ut_ad(magic_n == DICT_INDEX_MAGIC_N);

    return (type & DICT_CLUSTERED);
  }

  /** Check whether the index is the multi-value index
  @return nonzero for multi-value index, zero for other indexes */
  bool is_multi_value() const {
    ut_ad(magic_n == DICT_INDEX_MAGIC_N);

    return (type & DICT_MULTI_VALUE);
  }

  /** Returns the minimum data size of an index record.
  @return minimum data size in bytes */
  ulint get_min_size() const {
    ulint size = 0;

    for (unsigned i = 0; i < n_fields; i++) {
      size += get_col(i)->get_min_size();
    }

    return (size);
  }

  /** Check whether index can be used by transaction
  @param[in] trx		transaction*/
  bool is_usable(const trx_t *trx) const;

  /** Check whether index has any instantly added columns
  @return true if this is instant affected, otherwise false */
  bool has_instant_cols() const { return (instant_cols); }

  /** Check if tuple is having instant format.
  @param[in]	n_fields_in_tuple	number of fields in tuple
  @return true if yes, false otherwise. */
  bool is_tuple_instant_format(const uint16_t n_fields_in_tuple) const;

  /** Returns the number of nullable fields before specified
  nth field
  @param[in]	nth	nth field to check */
  uint32_t get_n_nullable_before(uint32_t nth) const {
    uint32_t nullable = n_nullable;

    ut_ad(nth <= n_fields);

    for (uint32_t i = nth; i < n_fields; ++i) {
      if (get_field(i)->col->is_nullable()) {
        --nullable;
      }
    }

    return (nullable);
  }

  /** Returns the number of fields before first instant ADD COLUMN */
  uint32_t get_instant_fields() const;

  /** Adds a field definition to an index. NOTE: does not take a copy
  of the column name if the field is a column. The memory occupied
  by the column name may be released only after publishing the index.
  @param[in] name_arg	column name
  @param[in] prefix_len	0 or the column prefix length in a MySQL index
                          like INDEX (textcol(25))
  @param[in] is_ascending	true=ASC, false=DESC */
  void add_field(const char *name_arg, ulint prefix_len, bool is_ascending) {
    dict_field_t *field;

    ut_ad(magic_n == DICT_INDEX_MAGIC_N);

    n_def++;

    field = get_field(n_def - 1);

    field->name = name_arg;
    field->prefix_len = (unsigned int)prefix_len;
    field->is_ascending = is_ascending;
  }

  /** Gets the nth field of an index.
  @param[in] pos	position of field
  @return pointer to field object */
  dict_field_t *get_field(ulint pos) const {
    ut_ad(pos < n_def);
    ut_ad(magic_n == DICT_INDEX_MAGIC_N);

    return (fields + pos);
  }

  /** Gets pointer to the nth column in an index.
  @param[in] pos	position of the field
  @return column */
  const dict_col_t *get_col(ulint pos) const { return (get_field(pos)->col); }

  /** Gets the column number the nth field in an index.
  @param[in] pos	position of the field
  @return column number */
  ulint get_col_no(ulint pos) const;

  /** Returns the position of a system column in an index.
  @param[in] type		DATA_ROW_ID, ...
  @return position, ULINT_UNDEFINED if not contained */
  ulint get_sys_col_pos(ulint type) const;

  /** Looks for column n in an index.
  @param[in]	n		column number
  @param[in]	inc_prefix	true=consider column prefixes too
  @param[in]	is_virtual	true==virtual column
  @return position in internal representation of the index;
  ULINT_UNDEFINED if not contained */
  ulint get_col_pos(ulint n, bool inc_prefix = false,
                    bool is_virtual = false) const;

  /** Get the default value of nth field and its length if exists.
  If not exists, both the return value is nullptr and length is 0.
  @param[in]	nth	nth field to get
  @param[in,out]	length	length of the default value
  @return	the default value data of nth field */
  const byte *get_nth_default(ulint nth, ulint *length) const {
    ut_ad(nth < n_fields);
    ut_ad(get_instant_fields() <= nth);
    const dict_col_t *col = get_col(nth);
    if (col->instant_default == nullptr) {
      *length = 0;
      return (nullptr);
    }

    *length = col->instant_default->len;
    ut_ad(*length == 0 || *length == UNIV_SQL_NULL ||
          col->instant_default->value != nullptr);
    return (col->instant_default->value);
  }

  /** Sets srid and srid_is_valid values
  @param[in]	srid_value		value of SRID, may be garbage
                                          if srid_is_valid_value = false
  @param[in]	srid_is_valid_value	value of srid_is_valid */
  void fill_srid_value(uint32_t srid_value, bool srid_is_valid_value) {
    srid_is_valid = srid_is_valid_value;
    srid = srid_value;
  }

  /** Check if the underlying table is compressed.
  @return true if compressed, false otherwise. */
  bool is_compressed() const;

  /** Check if a multi-value index is built on specified multi-value
  virtual column. Please note that there could be only one multi-value
  virtual column on the multi-value index, but not necessary the first
  field of the index.
  @param[in]	mv_col	multi-value virtual column
  @return non-zero means the column is on the index and this is the
  nth position of the column, zero means it's not on the index */
  uint32_t has_multi_value_col(const dict_v_col_t *mv_col) const {
    ut_ad(is_multi_value());
    for (uint32_t i = 0; i < n_fields; ++i) {
      const dict_col_t *col = get_col(i);
      if (mv_col->m_col.ind == col->ind) {
        return (i + 1);
      }

      /* Only one multi-value field, if not match then no match. */
      if (col->is_multi_value()) {
        break;
      }
    }

    return (0);
  }

 public:
  /** Get the page size of the tablespace to which this index belongs.
  @return the page size. */
  page_size_t get_page_size() const;

  /** Get the space id of the tablespace to which this index belongs.
  @return the space id. */
  space_id_t space_id() const { return space; }
};

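The comment on the last structure, dict_index_t, states explicitly that it is the data structure for an index. InnoDB uses a clustered index; as mentioned earlier, a clustered index stores the index and the row data in the same physical space.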
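The practical consequence of "index and row data in the same place" can be sketched as follows. This is a toy model under stated assumptions, not InnoDB code: `UserRow` and `ToyClusteredTable` are hypothetical, and `std::map` stands in for the B+ tree. The clustered map holds whole rows keyed by primary key, while the secondary map holds only the primary key, so a lookup by name needs a second traversal.

```cpp
#include <cassert>
#include <map>
#include <string>

struct UserRow {
  int id;  // primary key
  std::string name;
  int age;
};

struct ToyClusteredTable {
  std::map<int, UserRow> clustered;         // primary key -> the row itself
  std::multimap<std::string, int> by_name;  // secondary key -> primary key

  void insert(const UserRow &row) {
    clustered.emplace(row.id, row);
    by_name.emplace(row.name, row.id);
  }

  // One traversal: the row is stored where the key is found.
  const UserRow *find_by_id(int id) const {
    auto it = clustered.find(id);
    return it == clustered.end() ? nullptr : &it->second;
  }

  // Two traversals: the secondary index yields a primary key, which is then
  // looked up in the clustered index to reach the row.
  const UserRow *find_first_by_name(const std::string &name) const {
    auto it = by_name.lower_bound(name);
    if (it == by_name.end() || it->first != name) return nullptr;
    return find_by_id(it->second);
  }
};
```

In this model a query that only needs the primary key could stop after the first traversal of `by_name`, which is the intuition behind covering reads on a secondary index.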

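Besides the B+ tree there are also the B-tree and the B*-tree, and a data structures course also covers AVL trees and red-black trees. All of them are used somewhere in database technology, and it is worth comparing them if you are interested. Once you have mastered any one of these trees, learning the others is easy: you only need to understand how they differ and what their strengths and weaknesses are.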
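For readers who have not met the B+ tree before, the sketch below shows only its read path. It is a hypothetical in-memory node layout (`BPlusNode`, `find_leaf`, `point_lookup` and `range_scan` are made-up names), not InnoDB's page format, and it omits inserts, splits and merges. The two properties it illustrates are that values live only in the leaves, and that leaves are linked so a range scan never has to climb back up the tree.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical B+ tree node. Internal nodes hold separator keys and children;
// leaves hold keys with their values plus a link to the next leaf.
struct BPlusNode {
  bool is_leaf;
  std::vector<int> keys;
  std::vector<BPlusNode *> children;  // internal only: keys.size() + 1 entries
  std::vector<std::string> values;    // leaf only: one value per key
  BPlusNode *next_leaf;               // leaf only: right sibling or nullptr
};

// Descends to the leaf that would contain `key`.
// children[i] covers keys k with keys[i-1] <= k < keys[i].
const BPlusNode *find_leaf(const BPlusNode *node, int key) {
  while (!node->is_leaf) {
    const std::size_t i =
        std::upper_bound(node->keys.begin(), node->keys.end(), key) -
        node->keys.begin();
    node = node->children[i];
  }
  return node;
}

// Exact-match lookup: one root-to-leaf descent plus a search in the leaf.
const std::string *point_lookup(const BPlusNode *root, int key) {
  const BPlusNode *leaf = find_leaf(root, key);
  auto it = std::lower_bound(leaf->keys.begin(), leaf->keys.end(), key);
  if (it == leaf->keys.end() || *it != key) return nullptr;
  return &leaf->values[it - leaf->keys.begin()];
}

// Range scan over [lo, hi]: descend once, then follow the leaf links.
std::vector<int> range_scan(const BPlusNode *root, int lo, int hi) {
  std::vector<int> out;
  for (const BPlusNode *leaf = find_leaf(root, lo); leaf != nullptr;
       leaf = leaf->next_leaf) {
    for (int k : leaf->keys) {
      if (k > hi) return out;
      if (k >= lo) out.push_back(k);
    }
  }
  return out;
}
```

A B-tree, by contrast, may keep values in internal nodes as well, which is one of the differences worth studying when comparing the tree families mentioned above.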

3. Source code

The hash part is relatively simple, so only the B+ tree index part is analyzed here:

/** Creates an index memory object.
 @return own: index object */
dict_index_t *dict_mem_index_create(
    const char *table_name, /*!< in: table name */
    const char *index_name, /*!< in: index name */
    ulint space,            /*!< in: space where the index tree is
                            placed, ignored if the index is of
                            the clustered type */
    ulint type,             /*!< in: DICT_UNIQUE,
                            DICT_CLUSTERED, ... ORed */
    ulint n_fields)         /*!< in: number of fields */
{
  dict_index_t *index;
  mem_heap_t *heap;

  ut_ad(table_name && index_name);

  heap = mem_heap_create(DICT_HEAP_SIZE);

  index = static_cast<dict_index_t *>(mem_heap_zalloc(heap, sizeof(*index)));

  dict_mem_fill_index_struct(index, heap, table_name, index_name, space, type,
                             n_fields);

#ifndef UNIV_HOTBACKUP
#ifndef UNIV_LIBRARY
  dict_index_zip_pad_mutex_create_lazy(index);

  if (type & DICT_SPATIAL) {
    mutex_create(LATCH_ID_RTR_SSN_MUTEX, &index->rtr_ssn.mutex);
    index->rtr_track = static_cast<rtr_info_track_t *>(
        mem_heap_alloc(heap, sizeof(*index->rtr_track)));
    mutex_create(LATCH_ID_RTR_ACTIVE_MUTEX,
                 &index->rtr_track->rtr_active_mutex);
    index->rtr_track->rtr_active = UT_NEW_NOKEY(rtr_info_active());
  }
#endif /* !UNIV_LIBRARY */
#endif /* !UNIV_HOTBACKUP */

  return (index);
}

/** This function poplulates a dict_index_t index memory structure with
 supplied information. */
UNIV_INLINE
void dict_mem_fill_index_struct(
    dict_index_t *index,    /*!< out: index to be filled */
    mem_heap_t *heap,       /*!< in: memory heap */
    const char *table_name, /*!< in: table name */
    const char *index_name, /*!< in: index name */
    ulint space,            /*!< in: space where the index tree is
                            placed, ignored if the index is of
                            the clustered type */
    ulint type,             /*!< in: DICT_UNIQUE,
                            DICT_CLUSTERED, ... ORed */
    ulint n_fields)         /*!< in: number of fields */
{
  if (heap) {
    index->heap = heap;
    index->name = mem_heap_strdup(heap, index_name);
    index->fields = (dict_field_t *)mem_heap_alloc(
        heap, 1 + n_fields * sizeof(dict_field_t));
  } else {
    index->name = index_name;
    index->heap = nullptr;
    index->fields = nullptr;
  }

  /* Assign a ulint to a 4-bit-mapped field.
  Only the low-order 4 bits are assigned. */
  index->type = type;
#ifndef UNIV_HOTBACKUP
  index->space = (unsigned int)space;
  index->page = FIL_NULL;
  index->merge_threshold = DICT_INDEX_MERGE_THRESHOLD_DEFAULT;
#endif /* !UNIV_HOTBACKUP */
  index->table_name = table_name;
  index->n_fields = (unsigned int)n_fields;
  /* The '1 +' above prevents allocation
  of an empty mem block */
  index->allow_duplicates = false;
  index->nulls_equal = false;
  index->disable_ahi = false;
  index->last_ins_cur = nullptr;
  index->last_sel_cur = nullptr;
#ifndef UNIV_HOTBACKUP
  new (&index->rec_cache) rec_cache_t();

#endif /* UNIV_HOTBACKUP */
#ifdef UNIV_DEBUG
  index->magic_n = DICT_INDEX_MAGIC_N;
#endif /* UNIV_DEBUG */
}

/** Returns the number of fields before first instant ADD COLUMN */
inline uint32_t dict_index_t::get_instant_fields() const {
  ut_ad(has_instant_cols());
  return (n_fields - (table->n_cols - table->n_instant_cols));
}
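dict_mem_fill_index_struct stores `type` and `n_fields` into bit-fields of dict_index_t, and accessors such as is_clustered() test the ORed type flags with a bitwise AND. The sketch below isolates that pattern. The flag values and field widths are illustrative assumptions (the real DICT_* constants and DICT_IT_BITS are defined in dict0mem.h), and `ToyIndex` and `make_index` are hypothetical names.

```cpp
#include <cassert>

// Illustrative flag values only; not claimed to match the real DICT_* macros.
enum ToyIndexType : unsigned {
  TOY_CLUSTERED = 1,
  TOY_UNIQUE = 2,
  TOY_CORRUPT = 16,
};

struct ToyIndex {
  unsigned type : 6;       // ORed ToyIndexType flags packed into 6 bits
  unsigned n_fields : 10;  // values 0..1023 fit
  unsigned uncommitted : 1;

  bool is_clustered() const { return (type & TOY_CLUSTERED) != 0; }
  bool is_unique() const { return (type & TOY_UNIQUE) != 0; }
  bool is_corrupted() const { return (type & TOY_CORRUPT) != 0; }
};

ToyIndex make_index(unsigned type, unsigned n_fields) {
  ToyIndex index{};  // zero-initialize, as mem_heap_zalloc does for the real one
  // Assigning to an unsigned bit-field keeps only the low-order bits, so an
  // out-of-range value is silently truncated rather than rejected.
  index.type = type;
  index.n_fields = n_fields;
  return index;
}
```

Packing many small fields this way keeps the in-memory index object compact, at the price that callers must know each field's width: in this sketch, asking for 1025 fields would quietly store 1.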

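The functions above (dict_mem_index_create and dict_mem_fill_index_struct) create an index object and fill in its information. Next, how a column is added to an index: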

/** Adds a column to index.
@param[in,out]	index		index
@param[in]	table		table
@param[in]	col		column
@param[in]	prefix_len	column prefix length
@param[in]	is_ascending	true=ASC, false=DESC */
void dict_index_add_col(dict_index_t *index, const dict_table_t *table,
                        dict_col_t *col, ulint prefix_len, bool is_ascending) {
  dict_field_t *field;
  const char *col_name;

#ifndef UNIV_LIBRARY
  if (col->is_virtual()) {
#ifndef UNIV_HOTBACKUP
    dict_v_col_t *v_col = reinterpret_cast<dict_v_col_t *>(col);

    /* When v_col->v_indexes==NULL,
    ha_innobase::commit_inplace_alter_table(commit=true)
    will evict and reload the table definition, and
    v_col->v_indexes will not be NULL for the new table. */
    if (v_col->v_indexes != nullptr) {
      /* Register the index with the virtual column index
      list */
      struct dict_v_idx_t new_idx = {index, index->n_def};

      v_col->v_indexes->push_back(new_idx);
    }

    col_name = dict_table_get_v_col_name_mysql(table, dict_col_get_no(col));
#else  /* !UNIV_HOTBACKUP */
    /* PRELIMINARY TEMPORARY WORKAROUND: is this ever used? */
    bool not_hotbackup = false;
    ut_a(not_hotbackup);
#endif /* !UNIV_HOTBACKUP */
  } else
#endif /* !UNIV_LIBRARY */
  {
    col_name = table->get_col_name(dict_col_get_no(col));
  }

  index->add_field(col_name, prefix_len, is_ascending);

  field = index->get_field(index->n_def - 1);

  field->col = col;
  /* DATA_POINT is a special type, whose fixed_len should be:
  1) DATA_MBR_LEN, when it's indexed in R-TREE. In this case,
  it must be the first col to be added.
  2) DATA_POINT_LEN(be equal to fixed size of column), when it's
  indexed in B-TREE,
  3) DATA_POINT_LEN, if a POINT col is the PRIMARY KEY, and we are
  adding the PK col to other B-TREE/R-TREE. */
  /* TODO: We suppose the dimension is 2 now. */
  if (dict_index_is_spatial(index) && DATA_POINT_MTYPE(col->mtype) &&
      index->n_def == 1) {
    field->fixed_len = DATA_MBR_LEN;
  } else {
    field->fixed_len = static_cast<unsigned int>(
        col->get_fixed_size(dict_table_is_comp(table)));
  }

  if (prefix_len && field->fixed_len > prefix_len) {
    field->fixed_len = (unsigned int)prefix_len;
  }

  /* Long fixed-length fields that need external storage are treated as
  variable-length fields, so that the extern flag can be embedded in
  the length word. */

  if (field->fixed_len > DICT_MAX_FIXED_COL_LEN) {
    field->fixed_len = 0;
  }
#if DICT_MAX_FIXED_COL_LEN != 768
  /* The comparison limit above must be constant.  If it were
  changed, the disk format of some fixed-length columns would
  change, which would be a disaster. */
#error "DICT_MAX_FIXED_COL_LEN != 768"
#endif

  if (!(col->prtype & DATA_NOT_NULL)) {
    index->n_nullable++;
  }
}
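The part of dict_index_add_col that decides `fixed_len` is easier to follow when pulled out on its own. The function below is a simplified restatement of that logic and not the InnoDB function itself: `index_field_fixed_len` and `kMaxFixedColLen` are hypothetical names, the spatial (DATA_MBR_LEN) branch is left out, and it assumes a variable-length column reports a fixed size of 0.

```cpp
#include <cassert>

// The source pins DICT_MAX_FIXED_COL_LEN to 768 with a compile-time check.
constexpr unsigned kMaxFixedColLen = 768;

// col_fixed_size: fixed size of the column in bytes (assumed 0 if variable).
// prefix_len: 0 for a full-column index, else N from INDEX (col(N)).
unsigned index_field_fixed_len(unsigned col_fixed_size, unsigned prefix_len) {
  unsigned fixed_len = col_fixed_size;

  // A prefix index stores at most prefix_len bytes of the column.
  if (prefix_len != 0 && fixed_len > prefix_len) {
    fixed_len = prefix_len;
  }

  // Long fixed-length fields are treated as variable-length (fixed_len = 0).
  if (fixed_len > kMaxFixedColLen) {
    fixed_len = 0;
  }
  return fixed_len;
}
```

For example, a 4-byte column indexed in full keeps a fixed length of 4, a 20-byte column with a 10-byte prefix is clamped to 10, and a 1000-byte fixed column with no prefix falls back to 0 and is handled as variable-length.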

Loading the set of indexes for a table:

/** Loads definitions for table indexes. Adds them to the data dictionary
 cache.
 @return DB_SUCCESS if ok, DB_CORRUPTION if corruption of dictionary
 table or DB_UNSUPPORTED if table has unknown index type */
static dberr_t dict_load_indexes(
    dict_table_t *table, /*!< in/out: table */
    mem_heap_t *heap,    /*!< in: memory heap for temporary storage */
    dict_err_ignore_t ignore_err)
/*!< in: error to be ignored when
loading the index definition */
{
  dict_table_t *sys_indexes;
  dict_index_t *sys_index;
  btr_pcur_t pcur;
  dtuple_t *tuple;
  dfield_t *dfield;
  const rec_t *rec;
  byte *buf;
  mtr_t mtr;
  dberr_t error = DB_SUCCESS;

  ut_ad(mutex_own(&dict_sys->mutex));

  mtr_start(&mtr);

  sys_indexes = dict_table_get_low("SYS_INDEXES");
  sys_index = UT_LIST_GET_FIRST(sys_indexes->indexes);
  ut_ad(!dict_table_is_comp(sys_indexes));
  ut_ad(name_of_col_is(sys_indexes, sys_index, DICT_FLD__SYS_INDEXES__NAME,
                       "NAME"));
  ut_ad(name_of_col_is(sys_indexes, sys_index, DICT_FLD__SYS_INDEXES__PAGE_NO,
                       "PAGE_NO"));

  tuple = dtuple_create(heap, 1);
  dfield = dtuple_get_nth_field(tuple, 0);

  buf = static_cast<byte *>(mem_heap_alloc(heap, 8));
  mach_write_to_8(buf, table->id);

  dfield_set_data(dfield, buf, 8);
  dict_index_copy_types(tuple, sys_index, 1);

  btr_pcur_open_on_user_rec(sys_index, tuple, PAGE_CUR_GE, BTR_SEARCH_LEAF,
                            &pcur, &mtr);
  for (;;) {
    dict_index_t *index = nullptr;
    const char *err_msg;

    if (!btr_pcur_is_on_user_rec(&pcur)) {
      /* We should allow the table to open even
      without index when DICT_ERR_IGNORE_CORRUPT is set.
      DICT_ERR_IGNORE_CORRUPT is currently only set
      for drop table */
      if (table->first_index() == nullptr &&
          !(ignore_err & DICT_ERR_IGNORE_CORRUPT)) {
        ib::warn(ER_IB_MSG_197) << "Cannot load table " << table->name
                                << " because it has no indexes in"
                                   " InnoDB internal data dictionary.";
        error = DB_CORRUPTION;
        goto func_exit;
      }

      break;
    }

    rec = btr_pcur_get_rec(&pcur);

    if ((ignore_err & DICT_ERR_IGNORE_RECOVER_LOCK) &&
        (rec_get_n_fields_old_raw(rec) == DICT_NUM_FIELDS__SYS_INDEXES
         /* a record for older SYS_INDEXES table
         (missing merge_threshold column) is acceptable. */
         ||
         rec_get_n_fields_old_raw(rec) == DICT_NUM_FIELDS__SYS_INDEXES - 1)) {
      const byte *field;
      ulint len;
      field = rec_get_nth_field_old(rec, DICT_FLD__SYS_INDEXES__NAME, &len);

      if (len != UNIV_SQL_NULL &&
          static_cast<char>(*field) ==
              static_cast<char>(*TEMP_INDEX_PREFIX_STR)) {
        /* Skip indexes whose name starts with
        TEMP_INDEX_PREFIX, because they will
        be dropped during crash recovery. */
        goto next_rec;
      }
    }

    err_msg =
        dict_load_index_low(buf, table->name.m_name, heap, rec, TRUE, &index);
    ut_ad((index == nullptr && err_msg != nullptr) ||
          (index != nullptr && err_msg == nullptr));

    if (err_msg == dict_load_index_id_err) {
      /* TABLE_ID mismatch means that we have
      run out of index definitions for the table. */

      if (table->first_index() == nullptr &&
          !(ignore_err & DICT_ERR_IGNORE_CORRUPT)) {
        ib::warn(ER_IB_MSG_198)
            << "Failed to load the"
               " clustered index for table "
            << table->name << " because of the following error: " << err_msg
            << "."
               " Refusing to load the rest of the"
               " indexes (if any) and the whole table"
               " altogether.";
        error = DB_CORRUPTION;
        goto func_exit;
      }

      break;
    } else if (err_msg == dict_load_index_del) {
      /* Skip delete-marked records. */
      goto next_rec;
    } else if (err_msg) {
      ib::error(ER_IB_MSG_199) << err_msg;
      if (ignore_err & DICT_ERR_IGNORE_CORRUPT) {
        goto next_rec;
      }
      error = DB_CORRUPTION;
      goto func_exit;
    }

    ut_ad(index);

    /* Check whether the index is corrupted */
    if (index->is_corrupted()) {
      ib::error(ER_IB_MSG_200) << "Index " << index->name << " of table "
                               << table->name << " is corrupted";

      if (!srv_load_corrupted && !(ignore_err & DICT_ERR_IGNORE_CORRUPT) &&
          index->is_clustered()) {
        dict_mem_index_free(index);

        error = DB_INDEX_CORRUPT;
        goto func_exit;
      } else {
        /* We will load the index if
        1) srv_load_corrupted is TRUE
        2) ignore_err is set with
        DICT_ERR_IGNORE_CORRUPT
        3) if the index corrupted is a secondary
        index */
        ib::info(ER_IB_MSG_201) << "Load corrupted index " << index->name
                                << " of table " << table->name;
      }
    }

    if (index->type & DICT_FTS && !dict_table_has_fts_index(table)) {
      /* This should have been created by now. */
      ut_a(table->fts != nullptr);
      DICT_TF2_FLAG_SET(table, DICT_TF2_FTS);
    }

    /* We check for unsupported types first, so that the
    subsequent checks are relevant for the supported types. */
    if (index->type & ~(DICT_CLUSTERED | DICT_UNIQUE | DICT_CORRUPT | DICT_FTS |
                        DICT_SPATIAL | DICT_VIRTUAL)) {
      ib::error(ER_IB_MSG_202) << "Unknown type " << index->type << " of index "
                               << index->name << " of table " << table->name;

      error = DB_UNSUPPORTED;
      dict_mem_index_free(index);
      goto func_exit;
    } else if (!index->is_clustered() && nullptr == table->first_index()) {
      ib::error(ER_IB_MSG_203)
          << "Trying to load index " << index->name << " for table "
          << table->name << ", but the first index is not clustered!";

      dict_mem_index_free(index);
      error = DB_CORRUPTION;
      goto func_exit;
    } else if (dict_is_old_sys_table(table->id) &&
               (index->is_clustered() || ((table == dict_sys->sys_tables) &&
                                          !strcmp("ID_IND", index->name)))) {
      /* The index was created in memory already at booting
      of the database server */
      dict_mem_index_free(index);
    } else {
      dict_load_fields(index, heap);

      mutex_exit(&dict_sys->mutex);

      error = dict_index_add_to_cache(table, index, index->page, FALSE);

      mutex_enter(&dict_sys->mutex);

      /* The data dictionary tables should never contain
      invalid index definitions. */
      if (UNIV_UNLIKELY(error != DB_SUCCESS)) {
        goto func_exit;
      }
    }
  next_rec:
    btr_pcur_move_to_next_user_rec(&pcur, &mtr);
  }

  ut_ad(table->fts_doc_id_index == nullptr);

  if (table->fts != nullptr) {
    table->fts_doc_id_index =
        dict_table_get_index_on_name(table, FTS_DOC_ID_INDEX_NAME);
  }

  /* If the table contains FTS indexes, populate table->fts->indexes */
  if (dict_table_has_fts_index(table)) {
    ut_ad(table->fts_doc_id_index != nullptr);
    /* table->fts->indexes should have been created. */
    ut_a(table->fts->indexes != nullptr);
    dict_table_get_all_fts_indexes(table, table->fts->indexes);
  }

func_exit:
  btr_pcur_close(&pcur);
  mtr_commit(&mtr);

  return (error);
}
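The order of the checks at the end of the loading loop is easy to lose among the gotos, so here is a minimal sketch of just that decision logic. It is not InnoDB code: `check_index`, `LoadResult` and the `kType*` constants are hypothetical names, and the flag values are illustrative stand-ins for the real `DICT_*` constants in dict0mem.h.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative flag values for this sketch only; the real DICT_* constants
// are defined in InnoDB's dict0mem.h and should be taken from there.
constexpr std::uint32_t kTypeClustered = 1;
constexpr std::uint32_t kTypeUnique = 2;
constexpr std::uint32_t kTypeCorrupt = 16;
constexpr std::uint32_t kTypeFts = 32;
constexpr std::uint32_t kTypeSpatial = 64;
constexpr std::uint32_t kTypeVirtual = 128;

enum class LoadResult { kOk, kUnsupported, kCorruption };

// Mirrors the order used by the loader: reject unknown type bits first,
// then require that the first index loaded for a table is clustered.
inline LoadResult check_index(std::uint32_t type, bool table_has_first_index) {
  constexpr std::uint32_t known = kTypeClustered | kTypeUnique | kTypeCorrupt |
                                  kTypeFts | kTypeSpatial | kTypeVirtual;
  if (type & ~known) {
    return LoadResult::kUnsupported;  // some bit outside the known set
  }
  if (!(type & kTypeClustered) && !table_has_first_index) {
    return LoadResult::kCorruption;  // secondary index seen before clustered
  }
  return LoadResult::kOk;
}
```

Checking for unknown bits first means the later checks only ever see supported index types, which is the point made by the comment in the original code.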

Next, look at the helper functions used when looking up data through an index:

/** Gets the column number.
 @return col->ind, table column position (starting from 0) */
UNIV_INLINE
ulint dict_col_get_no(const dict_col_t *col) /*!< in: column */
{
  ut_ad(col);

  return (col->ind);
}

/** Gets the column position in the clustered index. */
UNIV_INLINE
ulint dict_col_get_clust_pos(
    const dict_col_t *col,           /*!< in: table column */
    const dict_index_t *clust_index) /*!< in: clustered index */
{
  ulint i;

  ut_ad(col);
  ut_ad(clust_index);
  ut_ad(clust_index->is_clustered());

  for (i = 0; i < clust_index->n_def; i++) {
    const dict_field_t *field = &clust_index->fields[i];

    if (!field->prefix_len && field->col == col) {
      return (i);
    }
  }

  return (ULINT_UNDEFINED);
}

/** Gets the column position in the given index.
@param[in]	col	table column
@param[in]	index	index to be searched for column
@return position of column in the given index. */
UNIV_INLINE
ulint dict_col_get_index_pos(const dict_col_t *col, const dict_index_t *index) {
  ulint i;

  for (i = 0; i < index->n_def; i++) {
    const dict_field_t *field = &index->fields[i];

    if (!field->prefix_len && field->col == col) {
      return (i);
    }
  }

  return (ULINT_UNDEFINED);
}
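Both position lookups above are linear scans that skip prefix fields (fields with a non-zero `prefix_len`). A minimal sketch of the same idea, with hypothetical stand-in types (`Column`, `Field`, `find_col_pos`) instead of `dict_col_t` and `dict_field_t`:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for dict_col_t and dict_field_t.
struct Column {
  const char *name;
};

struct Field {
  const Column *col;    // the table column this index field refers to
  unsigned prefix_len;  // 0 means the whole column is indexed
};

// Plays the role of ULINT_UNDEFINED in this sketch.
constexpr std::size_t kUndefined = static_cast<std::size_t>(-1);

// Return the position of the first field that indexes the whole column
// `col`; fields that index only a prefix of the column are skipped.
inline std::size_t find_col_pos(const std::vector<Field> &fields,
                                const Column *col) {
  for (std::size_t i = 0; i < fields.size(); ++i) {
    if (fields[i].prefix_len == 0 && fields[i].col == col) {
      return i;
    }
  }
  return kUndefined;
}
```

Columns are compared by pointer identity, as in the original, so two fields match only if they refer to the same column object.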

/** Check whether the index consists of descending columns only.
@param[in]	index  index tree
@retval true if index has any descending column
@retval false if index has only ascending columns */
UNIV_INLINE
bool dict_index_has_desc(const dict_index_t *index) {
  ut_ad(index->magic_n == DICT_INDEX_MAGIC_N);

  for (ulint i = 0; i < index->n_def; i++) {
    const dict_field_t *field = &index->fields[i];

    if (!field->is_ascending) {
      return (true);
    }
  }

  return (false);
}

/** Check if index is auto-generated clustered index.
@param[in]	index	index

@return true if index is auto-generated clustered index. */
UNIV_INLINE
bool dict_index_is_auto_gen_clust(const dict_index_t *index) {
  return (index->type == DICT_CLUSTERED);
}

/** Check whether the index is unique.
 @return nonzero for unique index, zero for other indexes */
UNIV_INLINE
ulint dict_index_is_unique(const dict_index_t *index) /*!< in: index */
{
  ut_ad(index);
  ut_ad(index->magic_n == DICT_INDEX_MAGIC_N);

  return (index->type & DICT_UNIQUE);
}

/** Check whether the index is a Spatial Index.
 @return	nonzero for Spatial Index, zero for other indexes */
UNIV_INLINE
ulint dict_index_is_spatial(const dict_index_t *index) /*!< in: index */
{
  ut_ad(index);
  ut_ad(index->magic_n == DICT_INDEX_MAGIC_N);

  return (index->type & DICT_SPATIAL);
}

/** Check whether the index contains a virtual column
@param[in]	index	index
@return	nonzero for the index has virtual column, zero for other indexes */
UNIV_INLINE
ulint dict_index_has_virtual(const dict_index_t *index) {
  ut_ad(index);
  ut_ad(index->magic_n == DICT_INDEX_MAGIC_N);

  return (index->type & DICT_VIRTUAL);
}

/** Check whether the index is the insert buffer tree.
 @return nonzero for insert buffer, zero for other indexes */
UNIV_INLINE
ulint dict_index_is_ibuf(const dict_index_t *index) /*!< in: index */
{
  ut_ad(index);
  ut_ad(index->magic_n == DICT_INDEX_MAGIC_N);

  return (index->type & DICT_IBUF);
}

/** Check whether the index is a secondary index or the insert buffer tree.
 @return nonzero for insert buffer, zero for other indexes */
UNIV_INLINE
ulint dict_index_is_sec_or_ibuf(const dict_index_t *index) /*!< in: index */
{
  ulint type;

  ut_ad(index);
  ut_ad(index->magic_n == DICT_INDEX_MAGIC_N);

  type = index->type;

  return (!(type & DICT_CLUSTERED) || (type & DICT_IBUF));
}
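Note that the predicates above test the type in two different ways: most use a bit mask (`index->type & DICT_UNIQUE`), while `dict_index_is_auto_gen_clust` uses equality (`index->type == DICT_CLUSTERED`). The sketch below shows the difference. `IndexSketch` and the `kFlag*` constants are hypothetical, and the flag values are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative flag values for this sketch; the real ones are in dict0mem.h.
constexpr std::uint32_t kFlagClustered = 1;
constexpr std::uint32_t kFlagUnique = 2;
constexpr std::uint32_t kFlagIbuf = 8;

// Hypothetical stand-in for dict_index_t; only the type bits matter here.
struct IndexSketch {
  std::uint32_t type;
};

// Mask test: true whenever the bit is set, whatever other bits are set.
inline bool is_unique(const IndexSketch &idx) {
  return (idx.type & kFlagUnique) != 0;
}

// Equality test: true only when the clustered bit is the sole flag, so a
// clustered index that also carries the unique bit does not match.
inline bool is_auto_gen_clust(const IndexSketch &idx) {
  return idx.type == kFlagClustered;
}

// Anything that is not clustered, plus the insert buffer tree.
inline bool is_sec_or_ibuf(const IndexSketch &idx) {
  return !(idx.type & kFlagClustered) || (idx.type & kFlagIbuf);
}
```

Under this reading, an index whose type is clustered and unique counts as unique but not as auto-generated, while an index with only the clustered bit counts as auto-generated.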

Next, look at how a page split is handled:


/** Splits an index page to halves and inserts the tuple. It is assumed
 that mtr holds an x-latch to the index tree. NOTE: the tree x-latch is
 released within this function! NOTE that the operation of this
 function must always succeed, we cannot reverse it: therefore enough
 free disk space (2 pages) must be guaranteed to be available before
 this function is called.
 @return inserted record */
rec_t *btr_page_split_and_insert(
    uint32_t flags,        /*!< in: undo logging and locking flags */
    btr_cur_t *cursor,     /*!< in: cursor at which to insert; when the
                           function returns, the cursor is positioned
                           on the predecessor of the inserted record */
    ulint **offsets,       /*!< out: offsets on inserted record */
    mem_heap_t **heap,     /*!< in/out: pointer to memory heap, or NULL */
    const dtuple_t *tuple, /*!< in: tuple to insert */
    mtr_t *mtr)            /*!< in: mtr */
{
  buf_block_t *block;
  page_t *page;
  page_zip_des_t *page_zip;
  page_no_t page_no;
  byte direction;
  page_no_t hint_page_no;
  buf_block_t *new_block;
  page_t *new_page;
  page_zip_des_t *new_page_zip;
  rec_t *split_rec;
  buf_block_t *left_block;
  buf_block_t *right_block;
  buf_block_t *insert_block;
  page_cur_t *page_cursor;
  rec_t *first_rec;
  byte *buf = nullptr; /* remove warning */
  rec_t *move_limit;
  ibool insert_will_fit;
  ibool insert_left;
  ulint n_iterations = 0;
  rec_t *rec;
  ulint n_uniq;
  dict_index_t *index;

  index = btr_cur_get_index(cursor);

  if (dict_index_is_spatial(index)) {
    /* Split rtree page and update parent */
    return (
        rtr_page_split_and_insert(flags, cursor, offsets, heap, tuple, mtr));
  }

  if (!*heap) {
    *heap = mem_heap_create(1024);
  }
  n_uniq = dict_index_get_n_unique_in_tree(cursor->index);
func_start:
  ut_ad(tuple->m_heap != *heap);
  mem_heap_empty(*heap);
  *offsets = nullptr;

  ut_ad(mtr_memo_contains_flagged(mtr, dict_index_get_lock(cursor->index),
                                  MTR_MEMO_X_LOCK | MTR_MEMO_SX_LOCK) ||
        cursor->index->table->is_intrinsic());
  ut_ad(!dict_index_is_online_ddl(cursor->index) || (flags & BTR_CREATE_FLAG) ||
        cursor->index->is_clustered());
  ut_ad(rw_lock_own_flagged(dict_index_get_lock(cursor->index),
                            RW_LOCK_FLAG_X | RW_LOCK_FLAG_SX) ||
        cursor->index->table->is_intrinsic());

  block = btr_cur_get_block(cursor);
  page = buf_block_get_frame(block);
  page_zip = buf_block_get_page_zip(block);

  ut_ad(
      mtr_is_block_fix(mtr, block, MTR_MEMO_PAGE_X_FIX, cursor->index->table));
  ut_ad(!page_is_empty(page));

  /* try to insert to the next page if possible before split */
  rec =
      btr_insert_into_right_sibling(flags, cursor, offsets, *heap, tuple, mtr);

  if (rec != nullptr) {
    return (rec);
  }

  page_no = block->page.id.page_no();

  /* 1. Decide the split record; split_rec == NULL means that the
  tuple to be inserted should be the first record on the upper
  half-page */
  insert_left = FALSE;

  if (n_iterations > 0) {
    direction = FSP_UP;
    hint_page_no = page_no + 1;
    split_rec = btr_page_get_split_rec(cursor, tuple);

    if (split_rec == nullptr) {
      insert_left =
          btr_page_tuple_smaller(cursor, tuple, offsets, n_uniq, heap);
    }
  } else if (btr_page_get_split_rec_to_right(cursor, &split_rec)) {
    direction = FSP_UP;
    hint_page_no = page_no + 1;

  } else if (btr_page_get_split_rec_to_left(cursor, &split_rec)) {
    direction = FSP_DOWN;
    hint_page_no = page_no - 1;
    ut_ad(split_rec);
  } else {
    direction = FSP_UP;
    hint_page_no = page_no + 1;

    /* If there is only one record in the index page, we
    can't split the node in the middle by default. We need
    to determine whether the new record will be inserted
    to the left or right. */

    if (page_get_n_recs(page) > 1) {
      split_rec = page_get_middle_rec(page);
    } else if (btr_page_tuple_smaller(cursor, tuple, offsets, n_uniq, heap)) {
      split_rec = page_rec_get_next(page_get_infimum_rec(page));
    } else {
      split_rec = nullptr;
    }
  }

  /* 2. Allocate a new page to the index */
  new_block = btr_page_alloc(cursor->index, hint_page_no, direction,
                             btr_page_get_level(page, mtr), mtr, mtr);

  /* New page could not be allocated */
  if (!new_block) {
    return nullptr;
  }

  new_page = buf_block_get_frame(new_block);
  new_page_zip = buf_block_get_page_zip(new_block);
  btr_page_create(new_block, new_page_zip, cursor->index,
                  btr_page_get_level(page, mtr), mtr);

  /* 3. Calculate the first record on the upper half-page, and the
  first record (move_limit) on original page which ends up on the
  upper half */

  if (split_rec) {
    first_rec = move_limit = split_rec;

    *offsets =
        rec_get_offsets(split_rec, cursor->index, *offsets, n_uniq, heap);

    insert_left = cmp_dtuple_rec(tuple, split_rec, cursor->index, *offsets) < 0;

    if (!insert_left && new_page_zip && n_iterations > 0) {
      /* If a compressed page has already been split,
      avoid further splits by inserting the record
      to an empty page. */
      split_rec = nullptr;
      goto insert_empty;
    }
  } else if (insert_left) {
    ut_a(n_iterations > 0);
    first_rec = page_rec_get_next(page_get_infimum_rec(page));
    move_limit = page_rec_get_next(btr_cur_get_rec(cursor));
  } else {
  insert_empty:
    ut_ad(!split_rec);
    ut_ad(!insert_left);
    buf =
        UT_NEW_ARRAY_NOKEY(byte, rec_get_converted_size(cursor->index, tuple));

    first_rec = rec_convert_dtuple_to_rec(buf, cursor->index, tuple);
    move_limit = page_rec_get_next(btr_cur_get_rec(cursor));
  }

  /* 4. Do first the modifications in the tree structure */

  btr_attach_half_pages(flags, cursor->index, block, first_rec, new_block,
                        direction, mtr);

  /* If the split is made on the leaf level and the insert will fit
  on the appropriate half-page, we may release the tree x-latch.
  We can then move the records after releasing the tree latch,
  thus reducing the tree latch contention. */

  if (split_rec) {
    insert_will_fit =
        !new_page_zip &&
        btr_page_insert_fits(cursor, split_rec, offsets, tuple, heap);
  } else {
    if (!insert_left) {
      UT_DELETE_ARRAY(buf);
      buf = nullptr;
    }

    insert_will_fit =
        !new_page_zip &&
        btr_page_insert_fits(cursor, nullptr, offsets, tuple, heap);
  }

  if (!srv_read_only_mode && !cursor->index->table->is_intrinsic() &&
      insert_will_fit && page_is_leaf(page) &&
      !dict_index_is_online_ddl(cursor->index)) {
    mtr->memo_release(dict_index_get_lock(cursor->index),
                      MTR_MEMO_X_LOCK | MTR_MEMO_SX_LOCK);

    /* NOTE: We cannot release root block latch here, because it
    has segment header and already modified in most of cases.*/
  }

  /* 5. Move then the records to the new page */
  if (direction == FSP_DOWN) {
    /*		fputs("Split left\n", stderr); */

    if (false
#ifdef UNIV_ZIP_COPY
        || page_zip
#endif /* UNIV_ZIP_COPY */
        || !page_move_rec_list_start(new_block, block, move_limit,
                                     cursor->index, mtr)) {
      /* For some reason, compressing new_page failed,
      even though it should contain fewer records than
      the original page.  Copy the page byte for byte
      and then delete the records from both pages
      as appropriate.  Deleting will always succeed. */
      ut_a(new_page_zip);

      page_zip_copy_recs(new_page_zip, new_page, page_zip, page, cursor->index,
                         mtr);
      page_delete_rec_list_end(move_limit - page + new_page, new_block,
                               cursor->index, ULINT_UNDEFINED, ULINT_UNDEFINED,
                               mtr);

      /* Update the lock table and possible hash index. */

      if (!dict_table_is_locking_disabled(cursor->index->table)) {
        lock_move_rec_list_start(new_block, block, move_limit,
                                 new_page + PAGE_NEW_INFIMUM);
      }

      btr_search_move_or_delete_hash_entries(new_block, block, cursor->index);

      /* Delete the records from the source page. */

      page_delete_rec_list_start(move_limit, block, cursor->index, mtr);
    }

    left_block = new_block;
    right_block = block;

    if (!dict_table_is_locking_disabled(cursor->index->table)) {
      lock_update_split_left(right_block, left_block);
    }
  } else {
    /*		fputs("Split right\n", stderr); */

    if (false
#ifdef UNIV_ZIP_COPY
        || page_zip
#endif /* UNIV_ZIP_COPY */
        || !page_move_rec_list_end(new_block, block, move_limit, cursor->index,
                                   mtr)) {
      /* For some reason, compressing new_page failed,
      even though it should contain fewer records than
      the original page.  Copy the page byte for byte
      and then delete the records from both pages
      as appropriate.  Deleting will always succeed. */
      ut_a(new_page_zip);

      page_zip_copy_recs(new_page_zip, new_page, page_zip, page, cursor->index,
                         mtr);
      page_delete_rec_list_start(move_limit - page + new_page, new_block,
                                 cursor->index, mtr);

      /* Update the lock table and possible hash index. */
      if (!dict_table_is_locking_disabled(cursor->index->table)) {
        lock_move_rec_list_end(new_block, block, move_limit);
      }

      ut_ad(!dict_index_is_spatial(index));

      btr_search_move_or_delete_hash_entries(new_block, block, cursor->index);

      /* Delete the records from the source page. */

      page_delete_rec_list_end(move_limit, block, cursor->index,
                               ULINT_UNDEFINED, ULINT_UNDEFINED, mtr);
    }

    left_block = block;
    right_block = new_block;

    if (!dict_table_is_locking_disabled(cursor->index->table)) {
      lock_update_split_right(right_block, left_block);
    }
  }

#ifdef UNIV_ZIP_DEBUG
  if (page_zip) {
    ut_a(page_zip_validate(page_zip, page, cursor->index));
    ut_a(page_zip_validate(new_page_zip, new_page, cursor->index));
  }
#endif /* UNIV_ZIP_DEBUG */

  /* At this point, split_rec, move_limit and first_rec may point
  to garbage on the old page. */

  /* 6. The split and the tree modification is now completed. Decide the
  page where the tuple should be inserted */

  if (insert_left) {
    insert_block = left_block;
  } else {
    insert_block = right_block;
  }

  /* 7. Reposition the cursor for insert and try insertion */
  page_cursor = btr_cur_get_page_cur(cursor);

  page_cur_search(insert_block, cursor->index, tuple, page_cursor);

  rec = page_cur_tuple_insert(page_cursor, tuple, cursor->index, offsets, heap,
                              mtr);

#ifdef UNIV_ZIP_DEBUG
  {
    page_t *insert_page = buf_block_get_frame(insert_block);

    page_zip_des_t *insert_page_zip = buf_block_get_page_zip(insert_block);

    ut_a(!insert_page_zip ||
         page_zip_validate(insert_page_zip, insert_page, cursor->index));
  }
#endif /* UNIV_ZIP_DEBUG */

  if (rec != nullptr) {
    goto func_exit;
  }

  /* 8. If insert did not fit, try page reorganization.
  For compressed pages, page_cur_tuple_insert() will have
  attempted this already. */

  if (page_cur_get_page_zip(page_cursor) ||
      !btr_page_reorganize(page_cursor, cursor->index, mtr)) {
    goto insert_failed;
  }

  rec = page_cur_tuple_insert(page_cursor, tuple, cursor->index, offsets, heap,
                              mtr);

  if (rec == nullptr) {
    /* The insert did not fit on the page: loop back to the
    start of the function for a new split */
  insert_failed:
    /* We play safe and reset the free bits for new_page */
    if (!cursor->index->is_clustered() &&
        !cursor->index->table->is_temporary()) {
      ibuf_reset_free_bits(new_block);
      ibuf_reset_free_bits(block);
    }

    n_iterations++;
    ut_ad(n_iterations < 2 || buf_block_get_page_zip(insert_block));
    ut_ad(!insert_will_fit);

    goto func_start;
  }

func_exit:
  /* Insert fit on the page: update the free bits for the
  left and right pages in the same mtr */

  if (!cursor->index->is_clustered() && !cursor->index->table->is_temporary() &&
      page_is_leaf(page)) {
    ibuf_update_free_bits_for_two_pages_low(left_block, right_block, mtr);
  }

  MONITOR_INC(MONITOR_INDEX_SPLIT);

  ut_ad(page_validate(buf_block_get_frame(left_block), cursor->index));
  ut_ad(page_validate(buf_block_get_frame(right_block), cursor->index));

  ut_ad(!rec || rec_offs_validate(rec, cursor->index, *offsets));

  return (rec);
}
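Stripped of latching, compressed pages, lock and free-bit bookkeeping and the retry loop, the core of the split is: choose a split record, move the upper half to a new page, then insert into whichever half the new key belongs to. The toy sketch below shows only that core, assuming a page is a sorted vector of integer keys and the split point is always the middle record. `Page` and `split_and_insert` are hypothetical names, and the sequential-insert heuristics (`btr_page_get_split_rec_to_right` and `btr_page_get_split_rec_to_left`) are deliberately left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy stand-in for an index page: a sorted list of keys.
using Page = std::vector<int>;

// Split `page` at its middle record and insert `key` into the half it
// belongs to. Returns {left half, right half}.
inline std::pair<Page, Page> split_and_insert(Page page, int key) {
  assert(page.size() > 1);
  assert(std::is_sorted(page.begin(), page.end()));

  // Step 1: choose the split record (always the middle one in this sketch).
  auto mid = page.begin() + static_cast<std::ptrdiff_t>(page.size() / 2);
  const int split_key = *mid;

  // Steps 2 and 5: the new page receives the upper half of the records,
  // and the original page keeps the lower half.
  Page right(mid, page.end());
  page.erase(mid, page.end());

  // Step 6: decide which half the new key goes to.
  Page &target = (key < split_key) ? page : right;

  // Step 7: insert at the sorted position within that half.
  target.insert(std::upper_bound(target.begin(), target.end(), key), key);

  return {page, right};
}
```

For example, splitting a page holding 10, 20, 30, 40 while inserting 25 leaves 10, 20, 25 on the left page and 30, 40 on the right page.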

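In InnoDB, index and data handling mostly lives in the dictionary and page-handling code; for more detail, see the code under storage/include and the related dict directory. For example, gis0tree.h and gistree.ic under include contain functions for finer-grained data handling. In this engine, though, the description of the tree is spread across more places, because a clustered index has to describe the data as well as the index. Keep this in mind when reading the code.
Since MySQL 5.5 the default storage engine has been InnoDB; before that it was MyISAM. MyISAM keeps its index and its data separate by design, which makes its code easier to follow.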
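The difference between the two layouts can be sketched with ordinary containers. This is a conceptual model only, not either engine's real format: `SeparateTable` and `ClusteredTable` are hypothetical names, and `std::map` stands in for the on-disk B+ tree.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Non-clustered layout (the MyISAM approach): rows sit in one structure,
// and the index only records where each row is.
struct SeparateTable {
  std::vector<std::string> rows;     // plays the role of the data file
  std::map<int, std::size_t> index;  // plays the role of the index file

  void insert(int key, const std::string &row) {
    index[key] = rows.size();
    rows.push_back(row);
  }

  const std::string *find(int key) const {
    auto it = index.find(key);
    return it == index.end() ? nullptr : &rows[it->second];
  }
};

// Clustered layout (the InnoDB approach): the row is stored in the primary
// key tree itself, and a secondary index maps its key to the primary key.
struct ClusteredTable {
  std::map<int, std::string> primary;    // primary key -> full row
  std::map<std::string, int> secondary;  // secondary key -> primary key

  void insert(int pk, const std::string &name, const std::string &row) {
    primary[pk] = row;
    secondary[name] = pk;
  }

  // A secondary lookup takes two steps: find the primary key, then the row.
  const std::string *find_by_name(const std::string &name) const {
    auto s = secondary.find(name);
    if (s == secondary.end()) return nullptr;
    auto p = primary.find(s->second);
    return p == primary.end() ? nullptr : &p->second;
  }
};
```

In the clustered model the index and the data cannot be described independently, which is one way to see why the InnoDB code for the tree is more spread out.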

5. Summary

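To be honest, I have not written serious SQL in years. I have focused more on the low-level details of data storage, and in recent years have spent a lot of effort on NoSQL databases in particular. Looking back, though, a database solves essentially two problems: storing large amounts of data, and making CRUD fast. The concrete implementation then has to deal with safety and concurrency, with transactions and consistency, with support for distribution, and so on. Database technology is quite complex; from top to bottom, theory and practice keep pushing each other forward.
Indexes are just one important part of it. To use indexes well, it helps to know how they are implemented underneath, so that index problems met in practice can be tackled in a more targeted way. Different databases may implement the mechanism slightly differently, but the principles are largely the same. Study hard and make progress every day.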
