mysql全文索引 innodb,mysql5.6中关于innodb存储引擎支持全文索引的原版文档(zt)...

Create an FTS Index on an InnoDB Table

For creating a full-text index, we recommend loading the table

with data first, then creating the FULLTEXT index.

This is much faster than creating the table with an empty

FULLTEXT index, then inserting documents into it.

use test;

drop table if exists quotes;

-- The InnoDB full-text search feature in the 5.6 Labs release

-- lets us define a full-text index on an InnoDB table.

create table quotes

( id int unsigned auto_increment primary key

, author varchar(64)

, quote varchar(400)

, source varchar(64)

) engine=innodb;

-- Insert some words and phrases to search for into the table.

insert into quotes (author, quote, source) values

('Abraham Lincoln', 'Fourscore and seven years ago...',

'Gettysburg Address')

, ('George Harrison', 'All those years ago...', 'Live In Japan')

, ('Arthur C. Clarke', 'Then 10 years ago the monolith was discovered.',

'2010: The Year We Make Contact')

, ('Benjamin Franklin',

'Early to bed and early to rise, makes a man healthy, wealthy, and wise.',

'Poor Richard''s Almanack')

, ('James Thurber',

'Early to rise and early to bed makes a male healthy and wealthy and dead.',

'The New Yorker')

, ('K', '1500 hundred years ago, everybody knew that the Earth was the center of the universe.', 'Men in Black');

-- Create the fulltext index

create fulltext index idx on quotes(quote);

-- You can also create fulltext index at create table time

-- create table quotes

-- ( id int unsigned auto_increment primary key

-- , author varchar(64)

-- , quote varchar(400)

-- , source varchar(64)

-- , fulltext (quote)

-- ) engine=innodb;

InnoDB can create the index in parallel through InnoDB's Fast

Index Creation (FIC) feature, which supports parallel sorting. This

technique actually allows you to tokenize, sort, and create an

InnoDB FULLTEXT index in parallel.

To control the parallel sort degree, you can use a new

configuration variable innodb_ft_sort_pll_degree

(default 2, maximum 32). This variable specifies how many ways to

parallelize the tokenization and sort operations. Experiments show,

on a non-I/O bound system, the create index performance scales well

with the number of CPUs and

innodb_ft_sort_pll_degree.

The following table shows a quick run on 2.7 GB of Wikipedia

data on an 8-core Linux x86 machine:

Server

Time (min)

MyISAM

11 min 47.90

InnoDB (default)

7 min 25.21 sec

InnoDB pll_degree

5 min 34.98

InnoDB pll_degree = 8

4 min 9 sec

InnoDB pll_degree = 16

3 min 39.51 sec

Each InnoDB table with a FULLTEXT index includes a

column FTS_DOC_ID. Specifying this column in the

original table definition saves a rebuild of the entire table when

an InnoDB FULLTEXT index is created. To do so, add the

column FTS_DOC_ID (all uppercase) to the table with

the FTS index. The column must be of BIGINT UNSIGNED NOT

NULL datatype. It does not need to be an auto-increment

column, but auto_increment could make the loading

easier:

CREATE TABLE fts_test (

FTS_DOC_ID BIGINT UNSIGNED AUTO_INCREMENT NOT NULL,

title VARCHAR(200),

body TEXT) ENGINE=InnoDB;

-- The unique "FTS_DOC_ID_INDEX" index on FTS_DOC_ID is also optional,

-- without it, InnoDB will automatically create it.:

CREATE UNIQUE INDEX FTS_DOC_ID_INDEX on fts_test(FTS_DOC_ID);

Then load the data and:

CREATE FULLTEXT INDEX idx on fts_test (title, body);

Please note, currently this special column must be named

FTS_DOC_ID, and the name must be all uppercase. If you

leave out this column, InnoDB adds FTS_DOC_ID as a

hidden column automatically when the FULLTEXT index is

created — and that is an expensive operation for an InnoDB table,

so it is much more efficient to define the column yourself.

Query, Natural Language Search

Once the data is loaded and committed, you can run queries using

the MATCH(columns) AGAINST (search

expression)operator to conduct actual searches. You can combine

this operator with the WHERE clause and similar

clauses in the SELECT statement.

-- Search for a single word.

select author as "Monolith" from quotes

where match(quote) against ('monolith' in natural language mode);

Monolith

Arthur C. Clarke

1 row in set (0.01 sec)

-- The default minimum length is 3 rather than 4, and the search

-- returns words that appear in a high proportion of the table rows.

select author as "Ago" from quotes

where match(quote) against ('ago' in natural language mode);

Ago

Abraham Lincoln

George Harrison

Arthur C. Clarke

K

4 rows in set (0.00 sec)

Query, Boolean Search

For more complicated searches, you can have multiple words and

phrases, and search for different combinations of optional and

required terms, not necessarily in the same order. This technique

typically involves several data values that you query from

elsewhere, or splitting apart a user-entered string and applying

your own rules to the words and phrases inside.

-- Search for a combination of words, not in the same order as the original.

select author as "Ago and Years" from quotes

where match(quote) against ('+ago +years' in boolean mode);

Resulting in:

Ago and Years

Abraham Lincoln

George Harrison

Arthur C. Clarke

K

4 rows in set (0.00 sec)

Proximity Search

Proximity search is a new feature in InnoDB full-text search. It

is a special case of Boolean search using the @

operator within the AGAINST() string. You supply two

or more words, double-quoted, within the single-quoted

AGAINST() string, followed by

@distance to specify how far apart these words

can be. The distance represents the maximum number of bytes (which

might not be equal to the number of characters) between the

starting points of all these words.

-- The starting points for these words are too far apart

-- (not within 20 bytes), so no results.

select quote as "Too Far Apart" from quotes

where match(quote) against ('"early wise" @20' in boolean mode);

Empty set (0.00 sec)

-- But the starting points of all words are within 100 bytes,

-- so this query does give results.

select quote as "Early...Wise" from quotes

where match(quote) against ('"early wise" @100' in boolean mode);

This results in:

Early...Wise

Early to bed and early to rise, makes a man healthy, wealthy,

and wise.

1 row in set (0.00 sec)

Transactions

One of the key ideas behind bringing full-text search to InnoDB

tables is to make this feature compatible with transactions. You

can design tables with both full-text columns and other columns.

Multiple sessions can update the full-text column data (and other

columns in the table) simultaneously. The full-text data doesn't

have to be treated as read-only or read-mostly.

One thing to note is that, just like the search feature in most

transactional databases (such as Oracle Text for the Oracle

Database), the tokenization for inserted strings is performed only

at commit time, so a full-text search does not see the uncommitted

data.

create table quotes_uncommitted

(

author varchar(64)

, quote varchar(4000)

, source varchar(64)

, fulltext(quote)

, primary key (author, quote(128))

);

-- Insert but don't immediately commit.

insert into quotes_uncommitted select author, quote, source from quotes;

-- Within the same transaction, a full-text search does not see the uncommitted data.

select count(author), author as "Uncommitted Results" from quotes_uncommitted

where match(quote) against ('ago' in natural language mode);

Which gives us:

count(author)

Uncommitted Results

0

NULL

1 row in set (0.00 sec)

-- OK, let's start with some committed data in the table, then empty the table,

-- then try some FTS queries, both before and after the commit.

insert into quotes_uncommitted select author, quote, source from quotes;

commit;

delete from quotes_uncommitted;

select count(author), author as "Deleted but still not committed" from quotes_uncommitted

where match(quote) against ('ago' in natural language mode);

Resulting in:

count(author)

Deleted but still not committed

0

NULL

1 row in set (0.00 sec)

rollback;

select count(author), author as "Deleted and rolled back" from quotes_uncommitted

where match(quote) against ('ago' in natural language mode);

Which results in:

count(author)

Deleted and rolled back

4

Abraham Lincoln

1 row in set (0.00 sec)

delete from quotes_uncommitted;

commit;

select count(author), author as "Deleted and committed" from quotes_uncommitted

where match(quote) against ('ago' in natural language mode);

count(author)

Deleted and committed

0

NULL

1 row in set (0.00 sec)

insert into quotes_uncommitted select author, quote, source from quotes;

commit;

truncate table quotes_uncommitted;

select count(author), author as "Truncated" from quotes_uncommitted

where match(quote) against ('ago' in natural language mode);

With the result of:

count(author)

Truncated

0

NULL

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值