Note: the built-in analyzers below are largely unsuitable for Chinese; it is enough to know they exist. The IK analyzer, covered in the next post, is the one actually used in real development.
1. What is analysis (tokenization)
Analysis is the process of converting a text into a series of tokens, also called text analysis; in Elasticsearch this is called Analysis.
Example: 我是中国人 ("I am Chinese") --> 我/是/中国人
2. The analyze API
Analyze text with a specified analyzer:
POST http://192.168.142.128:9200/_analyze
{
  "analyzer": "standard",
  "text": "hello world"
}
Analyze text against a field of an index:
POST http://192.168.142.128:9200/itcast/_analyze
{
  "analyzer": "standard",
  "field": "hobby",
  "text": "听音乐"
}
The standard analyzer is not very friendly to Chinese: it splits Chinese text into individual characters.
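This single-character behavior can be illustrated in Python. This is only a rough analogy of what the standard analyzer produces for Chinese input, not the analyzer itself:

```python
# For Chinese text, the standard analyzer's output looks roughly like
# iterating over the string character by character.
text = "听音乐"  # "listen to music"
tokens = list(text)
print(tokens)  # ['听', '音', '乐']
```

Each character becomes its own token, which is why search quality for Chinese suffers and a dedicated analyzer such as IK is needed.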
3. Built-in analyzers
1) Standard analyzer
The standard analyzer splits text on word boundaries and lowercases the tokens.
POST http://192.168.142.128:9200/_analyze
{
  "analyzer": "standard",
  "text": "A man becomes learned by asking questions."
}
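For plain English text, the standard analyzer's output can be approximated with a simple sketch. The real analyzer uses Unicode text segmentation and is more nuanced; this is only an approximation for illustration:

```python
import re

def standard_like(text):
    # Rough approximation of the standard analyzer for plain English:
    # split on runs of letters/digits and lowercase each token.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

print(standard_like("A man becomes learned by asking questions."))
# ['a', 'man', 'becomes', 'learned', 'by', 'asking', 'questions']
```

Note that the trailing period is dropped and every token is lowercased, matching what the _analyze request above returns for this sentence.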
2) Simple analyzer
The simple analyzer splits text on any non-letter character and lowercases the tokens.
POST http://192.168.142.128:9200/_analyze
{
  "analyzer": "simple",
  "text": "If the document doesn't already exist"
}
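The split-on-non-letters behavior can be sketched as follows (an approximation, not the analyzer itself). Note how the apostrophe in "doesn't" causes the word to break into two tokens:

```python
import re

def simple_like(text):
    # Approximation of the simple analyzer: keep only runs of letters,
    # lowercase them. Digits and punctuation act as separators.
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]

print(simple_like("If the document doesn't already exist"))
# ['if', 'the', 'document', 'doesn', 't', 'already', 'exist']
```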
3) Whitespace analyzer
The whitespace analyzer splits text on whitespace only; case and punctuation are preserved.
POST http://192.168.142.128:9200/_analyze
{
  "analyzer": "whitespace",
  "text": "If the document doesn't already exist"
}
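This analyzer's behavior is essentially a whitespace split, which can be sketched directly:

```python
def whitespace_like(text):
    # The whitespace analyzer only breaks on whitespace:
    # no lowercasing, and "doesn't" stays a single token.
    return text.split()

print(whitespace_like("If the document doesn't already exist"))
# ['If', 'the', 'document', "doesn't", 'already', 'exist']
```

Compare with the simple analyzer above: here "If" keeps its capital letter and "doesn't" is not split.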
4) Stop analyzer
The stop analyzer works like the simple analyzer but additionally removes stop words such as "the" and "an".
POST http://192.168.142.128:9200/_analyze
{
  "analyzer": "stop",
  "text": "If the document doesn't already exist"
}
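A sketch of the two-step behavior: tokenize like the simple analyzer, then filter out stop words. The stop-word set below is a subset of Elasticsearch's default English list, included only for illustration:

```python
import re

# A sample of the English stop words removed by default
# (the real default list in Elasticsearch is this _english_ set).
STOP_WORDS = {"a", "an", "and", "are", "as", "at", "be", "but", "by",
              "for", "if", "in", "into", "is", "it", "no", "not", "of",
              "on", "or", "such", "that", "the", "their", "then",
              "there", "these", "they", "this", "to", "was", "will",
              "with"}

def stop_like(text):
    # Tokenize like the simple analyzer, then drop stop words.
    tokens = [t.lower() for t in re.findall(r"[A-Za-z]+", text)]
    return [t for t in tokens if t not in STOP_WORDS]

print(stop_like("If the document doesn't already exist"))
# ['document', 'doesn', 't', 'already', 'exist']
```

"if" and "the" are filtered out; "t" survives because it is a fragment of "doesn't", not a stop word.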
5) Keyword analyzer
The keyword analyzer treats the entire input as a single keyword; no tokenization is performed.
POST http://192.168.142.128:9200/_analyze
{
  "analyzer": "keyword",
  "text": "If the document doesn't already exist"
}
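The keyword analyzer's behavior is the simplest of all, and can be sketched in one line:

```python
def keyword_like(text):
    # The keyword analyzer emits the entire input as a single token,
    # unchanged.
    return [text]

print(keyword_like("If the document doesn't already exist"))
# ["If the document doesn't already exist"]
```

This is useful for fields such as IDs or tags that must be matched exactly rather than searched word by word.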