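Installing the plugin
I installed the plugin from the command line, following the steps in the official README. The plugin version has to match the Elasticsearch version; the command below is the README's v6.3.0 example, while the responses later in this post come from a 7.x cluster, so substitute the version that matches your own installation: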
https://github.com/medcl/elasticsearch-analysis-ik/tree/v7.3.1
./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.3.0/elasticsearch-analysis-ik-6.3.0.zip
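Because the version number appears twice in the download URL, it is easy to change one occurrence and miss the other. A minimal sketch of a helper that builds the URL from a single version string, assuming the release naming pattern in the command above also holds for other versions (the name `ik_plugin_url` is hypothetical):

```python
# Base of the release download URL, copied from the install command above.
IK_RELEASES = "https://github.com/medcl/elasticsearch-analysis-ik/releases/download"


def ik_plugin_url(es_version: str) -> str:
    """Build the IK release zip URL for one Elasticsearch version string.

    Assumption: every release follows the same naming pattern as v6.3.0.
    """
    return f"{IK_RELEASES}/v{es_version}/elasticsearch-analysis-ik-{es_version}.zip"


print(ik_plugin_url("6.3.0"))
```

The printed URL can be passed to ./bin/elasticsearch-plugin install in place of the hard-coded one.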
Creating an index and its mapping
Following the official guide:
curl -XPUT http://localhost:9200/index
curl -XPOST http://localhost:9200/index/_mapping -H 'Content-Type:application/json' -d'
{
"properties": {
"content": {
"type": "text",
"analyzer": "ik_max_word",
"search_analyzer": "ik_smart"
}
}
}'
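Inserting a document
When I tried to insert a document, curl refused to accept the data no matter what I tried. I was using the curl that ships with Cygwin, and this was the error output: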
$ curl -XPOST -H 'Content-Type:application/json' http://localhost:9200/index/_create/4 -d '{"content":"中文行不行"}'
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 567 100 543 100 24 543 24 0:00:01 --:--:-- 0:00:01 7269
{"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"failed to parse field [content] of type [text] in document with id '4'. Preview of field's value: ''"}],"type":"mapper_parsing_exception","reason":"failed to parse field [content] of type [text] in document with id '4'. Preview of field's value: ''","caused_by":{"type":"json_parse_exception","reason":"Invalid UTF-8 middle byte 0xd0\n at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@56ebd434; line: 1, column: 15]"}},"status":400}
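In other words, the curl under Cygwin did not send the Chinese text as UTF-8. The progress meter shows a 24-byte upload, which is the size of this body in a two-byte-per-character encoding such as GBK (it would be 29 bytes in UTF-8), so Elasticsearch rejected it as invalid UTF-8.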
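The byte counts can be checked offline. The following is a small sketch, assuming the terminal used a GBK-family code page (the post does not say which encoding was active); it encodes the same request body both ways and tests whether the result is valid UTF-8:

```python
# The exact JSON body passed to curl with -d above.
body = '{"content":"中文行不行"}'

utf8_bytes = body.encode("utf-8")  # what Elasticsearch expects
gbk_bytes = body.encode("gbk")     # assumption: what the terminal actually sent


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes form a valid UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(len(utf8_bytes), len(gbk_bytes))  # 29 24
print(hex(gbk_bytes[13]))               # 0xd0, second byte of the first Chinese character
print(decodes_as_utf8(utf8_bytes), decodes_as_utf8(gbk_bytes))  # True False
```

The GBK length matches the 24 bytes curl reported, and the byte 0xd0 is the one named in the json_parse_exception.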
What finally worked was sending the request from Dev Tools in Kibana:
POST index/_create/6
{
"content": "中国是我的国家,上海是中国的城市"
}
This succeeded.
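If Kibana is not available, the same request can be sent from a script that controls the encoding explicitly. This is a minimal sketch using only the Python standard library, not something the original post tried; the helper names `build_create_request` and `send` are hypothetical, and the host and port are taken from the curl commands above:

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # same host and port as the curl commands


def build_create_request(index: str, doc_id: str, doc: dict) -> urllib.request.Request:
    """Build a POST <index>/_create/<id> request whose body is UTF-8 JSON."""
    # ensure_ascii=False keeps the Chinese characters literal;
    # encode("utf-8") guarantees the bytes on the wire are UTF-8.
    data = json.dumps(doc, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        f"{ES_URL}/{index}/_create/{doc_id}",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and parse the JSON response (needs a running cluster)."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# The document that failed through Cygwin's curl.
req = build_create_request("index", "4", {"content": "中文行不行"})
print(req.full_url)
print(req.data.decode("utf-8"))
```

Calling send(req) performs the insert. Leaving ensure_ascii at its default would instead escape the Chinese characters as \uXXXX sequences, producing a pure-ASCII body that does not depend on any terminal encoding.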
Trying a highlighted search:
POST index/_search
{
"query" : { "match" : { "content" : "中国" }},
"highlight" : {
"pre_tags" : ["<tag1>", "<tag2>"],
"post_tags" : ["</tag1>", "</tag2>"],
"fields" : {
"content" : {}
}
}
}
Result:
{
"took" : 120,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 2.080443,
"hits" : [
{
"_index" : "index",
"_type" : "_doc",
"_id" : "5",
"_score" : 2.080443,
"_source" : {
"content" : "中国"
},
"highlight" : {
"content" : [
"<tag1>中国</tag1>"
]
}
},
{
"_index" : "index",
"_type" : "_doc",
"_id" : "6",
"_score" : 1.980741,
"_source" : {
"content" : "中国是我的国家,上海是中国的城市"
},
"highlight" : {
"content" : [
"<tag1>中国</tag1>是我的国家,上海是<tag1>中国</tag1>的城市"
]
}
}
]
}
}
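The highlighted fragments sit under each hit's highlight key, grouped by field name. A short sketch of pulling them out of a parsed response follows; the function name `highlighted_fragments` is hypothetical, and `sample` is a trimmed copy of the response above:

```python
def highlighted_fragments(response: dict, field: str) -> dict:
    """Map each hit's _id to its list of highlighted fragments for one field."""
    return {
        hit["_id"]: hit.get("highlight", {}).get(field, [])
        for hit in response["hits"]["hits"]
    }


# Trimmed copy of the search response shown above.
sample = {
    "hits": {
        "hits": [
            {"_id": "5", "highlight": {"content": ["<tag1>中国</tag1>"]}},
            {
                "_id": "6",
                "highlight": {
                    "content": ["<tag1>中国</tag1>是我的国家,上海是<tag1>中国</tag1>的城市"]
                },
            },
        ]
    }
}

fragments = highlighted_fragments(sample, "content")
for doc_id, parts in fragments.items():
    # Count how many times the query term was wrapped in the first pre_tag.
    print(doc_id, sum(p.count("<tag1>") for p in parts))
```

Only <tag1> appears in the output because the single default fragment uses the first entry of pre_tags and post_tags.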
Trying tokenization:
GET /index/_analyze
{
"text": " 对于你,我始终只能以陌生人的身份去怀念。",
"analyzer": "ik_smart"
}
Result:
{
"tokens" : [
{
"token" : "对于",
"start_offset" : 1,
"end_offset" : 3,
"type" : "CN_WORD",
"position" : 0
},
{
"token" : "你",
"start_offset" : 3,
"end_offset" : 4,
"type" : "CN_CHAR",
"position" : 1
},
{
"token" : "我",
"start_offset" : 5,
"end_offset" : 6,
"type" : "CN_CHAR",
"position" : 2
},
{
"token" : "始终",
"start_offset" : 6,
"end_offset" : 8,
"type" : "CN_WORD",
"position" : 3
},
{
"token" : "只",
"start_offset" : 8,
"end_offset" : 9,
"type" : "CN_CHAR",
"position" : 4
},
{
"token" : "能以",
"start_offset" : 9,
"end_offset" : 11,
"type" : "CN_WORD",
"position" : 5
},
{
"token" : "陌生人",
"start_offset" : 11,
"end_offset" : 14,
"type" : "CN_WORD",
"position" : 6
},
{
"token" : "的",
"start_offset" : 14,
"end_offset" : 15,
"type" : "CN_CHAR",
"position" : 7
},
{
"token" : "身份",
"start_offset" : 15,
"end_offset" : 17,
"type" : "CN_WORD",
"position" : 8
},
{
"token" : "去",
"start_offset" : 17,
"end_offset" : 18,
"type" : "CN_CHAR",
"position" : 9
},
{
"token" : "怀念",
"start_offset" : 18,
"end_offset" : 20,
"type" : "CN_WORD",
"position" : 10
}
]
}
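The start_offset and end_offset values are character positions in the original text, with the end exclusive, so slicing the text with them should give back each token. The sketch below checks this against the result above and lists the characters no token covers. It assumes every character is in the Basic Multilingual Plane, where Python string indices coincide with the offsets Elasticsearch reports:

```python
# The text sent to _analyze, including its leading space.
text = " 对于你,我始终只能以陌生人的身份去怀念。"

# (token, start_offset, end_offset) copied from the response above.
tokens = [
    ("对于", 1, 3), ("你", 3, 4), ("我", 5, 6), ("始终", 6, 8),
    ("只", 8, 9), ("能以", 9, 11), ("陌生人", 11, 14), ("的", 14, 15),
    ("身份", 15, 17), ("去", 17, 18), ("怀念", 18, 20),
]


def offset_mismatches(text: str, tokens: list) -> list:
    """Return the tokens whose offsets do not slice back to the token itself."""
    return [t for t in tokens if text[t[1]:t[2]] != t[0]]


def uncovered_chars(text: str, tokens: list) -> list:
    """Return the characters that fall outside every token's offset range."""
    covered = set()
    for _, start, end in tokens:
        covered.update(range(start, end))
    return [ch for i, ch in enumerate(text) if i not in covered]


print(offset_mismatches(text, tokens))  # []
print(uncovered_chars(text, tokens))    # [' ', ',', '。']
```

The uncovered characters are the leading space and the punctuation, which ik_smart drops rather than emitting as tokens.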