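The stopword list is loaded and searched for full-text queries using the server character set and collation (the values of the character_set_server and collation_server system variables). If the stopword file or the columns used for full-text indexing or searches have a character set or collation different from character_set_server or collation_server, stopword lookups may produce false hits or misses.
Case sensitivity of stopword lookups depends on the server collation. For example, lookups are case-insensitive if the collation is utf8mb4_0900_ai_ci, whereas lookups are case-sensitive if the collation is utf8mb4_0900_as_cs or utf8mb4_bin.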
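To see whether a mismatch of this kind is possible on a given server, the server-level settings can be compared with those of the indexed columns. The following is a minimal sketch; the schema name test and table name opening_lines are placeholders and should be replaced with your own.

```sql
-- Server character set and collation used for stopword lookups
SELECT @@character_set_server, @@collation_server;

-- Character set and collation of each column in a table
-- ('test' and 'opening_lines' are placeholder names)
SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'test'
  AND TABLE_NAME = 'opening_lines';
```

If the column values differ from the server values, stopword matching for that column may not behave as expected.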
Stopwords for InnoDB Search Indexes
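InnoDB has a relatively short list of default stopwords, because documents from technical, literary, and other sources often use short words as keywords or in significant phrases. For example, you might search for "to be or not to be" and expect a sensible result, rather than having all of those words ignored.
To see the list of default InnoDB stopwords, query the INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD table.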
mysql> SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD;
+-------+
| value |
+-------+
| a |
| about |
| an |
| are |
| as |
| at |
| be |
| by |
| com |
| de |
| en |
| for |
| from |
| how |
| i |
| in |
| is |
| it |
| la |
| of |
| on |
| or |
| that |
| the |
| this |
| to |
| was |
| what |
| when |
| where |
| who |
| will |
| with |
| und |
| the |
| www |
+-------+
36 rows in set (0.00 sec)
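Rather than scanning the whole list, you can check specific words against it. This is a small sketch; the words in the IN list are only illustrative.

```sql
-- Return whichever of the listed words are default InnoDB stopwords
SELECT value
FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD
WHERE value IN ('to', 'be', 'or', 'not');
```

Any word that is not returned is not a default stopword. It is still subject to the token length limits.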
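To define your own stopword list for all InnoDB tables, define a table with the same structure as the INNODB_FT_DEFAULT_STOPWORD table, populate it with stopwords, and set the value of the innodb_ft_server_stopword_table option to a value in the form db_name/table_name before creating the full-text index. The stopword table must have a single VARCHAR column named value. The following example demonstrates creating and configuring a new global stopword table for InnoDB.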
-- Create a new stopword table
mysql> CREATE TABLE my_stopwords(value VARCHAR(30)) ENGINE= INNODB;
Query OK, 0 rows affected (0.01 sec)
-- Insert stopwords (for simplicity, a single stopword is used in this example)
mysql> INSERT INTO my_stopwords(value) VALUES ('Ishmael');
Query OK, 1 row affected (0.00 sec)
-- Create the table
mysql> CREATE TABLE opening_lines (id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
opening_line TEXT(500),
author VARCHAR(200),
title VARCHAR(200)
) ENGINE=InnoDB;
Query OK, 0 rows affected (0.01 sec)
-- Insert data into the table
mysql> INSERT INTO opening_lines(opening_line,author,title) VALUES
('Call me Ishmael.','Herman Melville','Moby-Dick'),
('A screaming comes across the sky.','Thomas Pynchon','Gravity\'s Rainbow'),
('I am an invisible man.','Ralph Ellison','Invisible Man'),
('Where now? Who now? When now?','Samuel Beckett','The Unnamable'),
('It was love at first sight.','Joseph Heller','Catch-22'),
('All this happened, more or less.','Kurt Vonnegut','Slaughterhouse-Five'),
('Mrs. Dalloway said she would buy the flowers herself.','Virginia Woolf','Mrs. Dalloway'),
('It was a pleasure to burn.','Ray Bradbury','Fahrenheit 451');
Query OK, 8 rows affected (0.00 sec)
Records: 8 Duplicates: 0 Warnings: 0
-- Set the innodb_ft_server_stopword_table option to the new stopword table
mysql> SET GLOBAL innodb_ft_server_stopword_table= 'test/my_stopwords';
Query OK, 0 rows affected (0.00 sec)
-- Create the full-text index (which rebuilds the table if no FTS_DOC_ID column is defined)
mysql> CREATE FULLTEXT INDEX idx ON opening_lines(opening_line);
Query OK, 0 rows affected, 1 warning (1.17 sec)
Records: 0 Duplicates: 0 Warnings: 1
Verify that the specified stopword ('Ishmael') does not appear by querying the words in the INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE table.
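One way to run this check is sketched below. It assumes the example table lives in the test database, as the innodb_ft_server_stopword_table setting above implies, and that your account has the privileges needed to set global variables and read the InnoDB INFORMATION_SCHEMA tables. INNODB_FT_INDEX_TABLE is empty until innodb_ft_aux_table names a table that has a full-text index.

```sql
-- Point the full-text diagnostic tables at the example table
SET GLOBAL innodb_ft_aux_table = 'test/opening_lines';

-- Look for the stopword; no rows should come back if it was excluded
SELECT word
FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE
WHERE word = 'ishmael';

-- List the distinct words that were actually indexed
SELECT DISTINCT word
FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE
ORDER BY word;
```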
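Note: By default, words less than 3 characters or greater than 84 characters in length do not appear in an InnoDB full-text search index. The maximum and minimum word length values are configurable using the innodb_ft_max_token_size and innodb_ft_min_token_size variables. This default behavior does not apply to the ngram parser plugin. The ngram token size is defined by the ngram_token_size option.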
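The values currently in effect can be inspected as follows. This is a minimal sketch using standard SHOW VARIABLES syntax.

```sql
-- Minimum and maximum token sizes for InnoDB full-text indexes
SHOW GLOBAL VARIABLES LIKE 'innodb_ft_%_token_size';

-- Token size used by the ngram parser plugin
SHOW GLOBAL VARIABLES LIKE 'ngram_token_size';
```

As far as we are aware, these settings are read at server startup and cannot be changed at runtime. Changing them requires editing the server configuration, restarting the server, and rebuilding any existing full-text indexes.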