Using Kuromoji in Java: Elasticsearch cannot search with the Kuromoji reading form filter

I am using Elasticsearch 0.90.1 with the Kuromoji plugin 1.4.0.

$ curl localhost:9200
{
  "ok" : true,
  "status" : 200,
  "name" : "Agent Zero",
  "version" : {
    "number" : "0.90.1",
    "snapshot_build" : false,
    "lucene_version" : "4.3"
  },
  "tagline" : "You Know, for Search"
}

I created a new index with Kuromoji as my default analyzer:

$ curl -X PUT localhost:9200/test -d '{
  "index": {
    "analysis": {
      "filter": {
        "kuromoji_rf": {
          "type": "kuromoji_readingform",
          "use_romaji": "false"
        }
      },
      "tokenizer": {
        "kuromoji": {
          "type": "kuromoji_tokenizer"
        }
      },
      "analyzer": {
        "default": {
          "type": "custom",
          "tokenizer": "kuromoji",
          "filter": [
            "kuromoji_rf"
          ]
        }
      }
    }
  }
}'

Result:

{
  "ok": true,
  "acknowledged": true
}

The reading form token filter seems to work correctly (kanji are normalized to katakana):

$ curl localhost:9200/test/_analyze -d '東京'

Result:

{
  "tokens": [
    {
      "token": "トウキョウ",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 1
    }
  ]
}
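Since the same analyzer is expected to run at index time and at query time, it may help to compare the tokens produced for the document body with the tokens produced for the search term. The following is a minimal sketch, assuming the `test` index above is reachable on localhost:9200. `ES_URL` and the `analyze` helper are names introduced here for illustration and are not part of Elasticsearch.

```shell
#!/bin/sh
# Hypothetical variable: base address of the node, defaulting to the one used above.
ES_URL="${ES_URL:-localhost:9200}"

# Hypothetical helper: run a piece of text through the default analyzer
# of the "test" index and print the resulting tokens.
analyze() {
  curl -s --max-time 5 "$ES_URL/test/_analyze" -d "$1" \
    || echo "could not reach $ES_URL"
}

analyze 'これは関西国際空港です'   # tokens stored for the indexed document
echo
analyze '空港'                     # tokens produced for the search term
echo
```

If the token for the search term does not appear among the tokens for the document body, a term-level mismatch would be one possible explanation for an empty result.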

Indexing a document:

$ curl -X PUT localhost:9200/test/docs/1 -d '{
  "body": "これは関西国際空港です"
}'

Result:

{
  "ok": true,
  "_index": "test",
  "_type": "docs",
  "_id": "1",
  "_version": 1
}

The indexed document matches a wildcard query:

$ curl 'localhost:9200/test/docs/_search?q=body:*'

Result:

{
  "took": 109,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.0,
    "hits": [
      {
        "_index": "test",
        "_type": "docs",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "body": "これは関西国際空港です"
        }
      }
    ]
  }
}

However, when I search with Japanese terms, nothing matches:

$ curl 'localhost:9200/test/docs/_search?q=body:空港'

Result:

{
  "took": 21,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

$ curl 'localhost:9200/test/docs/_search?q=body:クウコウ'

Result:

{
  "took": 95,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

$ curl 'localhost:9200/test/docs/_search?q=body:空'

Result:

{
  "took": 22,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}
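The three searches above pass the Japanese terms raw in the URL, so one thing that could be ruled out is how the shell and curl encode them. The following sketch repeats the same searches using curl's `-G` and `--data-urlencode` options, which percent-encode the parameter value and append it to the URL as a query string. `ES_URL` is a name introduced here for illustration; the index and type names are the ones used above.

```shell
#!/bin/sh
# Hypothetical variable: base address of the node, defaulting to the one used above.
ES_URL="${ES_URL:-localhost:9200}"

# Repeat the three searches with the q parameter percent-encoded by curl.
for term in '空港' 'クウコウ' '空'; do
  echo "== body:$term"
  curl -s --max-time 5 -G "$ES_URL/test/docs/_search" \
       --data-urlencode "q=body:$term" \
    || echo "could not reach $ES_URL"
  echo
done
```

If these return the same empty hit lists, URL encoding is unlikely to be the cause.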

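I wondered whether the analyzer was perhaps not being applied to the search query, but specifying the analyzer explicitly did not help: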

$ curl 'localhost:9200/test/docs/_search?analyzer=default&q=body:空港'

Result:

{
  "took": 17,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}
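Another way to narrow this down would be to send the query in the request body instead of the `q` URI parameter, which takes the query string parser out of the picture. The following is a minimal sketch using a `match` query against the `body` field, assuming the same index on localhost:9200. `ES_URL` and `QUERY` are names introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical variable: base address of the node, defaulting to the one used above.
ES_URL="${ES_URL:-localhost:9200}"

# A match query analyzes its text with the field's search analyzer
# before looking up the resulting terms.
QUERY='{"query": {"match": {"body": "空港"}}}'

curl -s --max-time 5 "$ES_URL/test/docs/_search?pretty=true" -d "$QUERY" \
  || echo "could not reach $ES_URL"
echo
```

Comparing this result with the URI searches would show whether the behavior depends on how the query is submitted.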

By the way, if I disable the token filter, everything works fine.

What am I doing wrong?
