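Before running any searches, we need some data. The commands below use curl to download a sample dataset, bulk-load it into the bank index, and then list the indices: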
curl https://gitee.com/timczm/el-learning/raw/master/account.json -o account.json
curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_bulk?pretty&refresh" --data-binary "@account.json"
curl "localhost:9200/_cat/indices?v"
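The _bulk endpoint expects newline-delimited JSON (NDJSON). Each document is written as an action line, such as {"index": {"_id": "1"}}, followed by the document source on the next line, and the whole body must end with a newline. This is also why the command uses --data-binary: plain -d strips newlines when reading from a file. The downloaded file is assumed to already be in this format. If you ever need to build such a body yourself, here is a minimal Python sketch. The helper name to_bulk_body and the use of account_number as the _id are illustrative assumptions.

```python
import json

def to_bulk_body(docs, id_field="account_number"):
    """Build an NDJSON body for the _bulk API: one action line plus one source line per doc."""
    lines = []
    for doc in docs:
        # Action/metadata line: index the document under an explicit _id (hypothetical choice of field).
        lines.append(json.dumps({"index": {"_id": str(doc[id_field])}}))
        # Source line: the document itself, serialized on a single line.
        lines.append(json.dumps(doc))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"

docs = [
    {"account_number": 0, "balance": 16623, "firstname": "Bradshaw"},
    {"account_number": 1, "balance": 39225, "firstname": "Amber"},
]
body = to_bulk_body(docs)
print(body, end="")
```

Writing the result to a file and passing it with --data-binary "@file" would load it the same way as the command above.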
If everything went well, the output looks like this:
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open bank Gr6RiYrNS9yst0QX6mR2Nw 1 1 1000 0 381.9kb 381.9kb
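The v parameter adds the header row. A docs.count of 1000 confirms that every account was loaded. A yellow health status means all primary shards are allocated but at least one replica is not, which is expected on a single node because the replica (rep = 1) cannot be placed on the same node as its primary. To check this from a script, the sketch below parses the text table. It assumes no column value contains spaces, which holds for this output. Alternatively, the cat APIs also accept format=json.

```python
def parse_cat_table(text):
    """Turn the whitespace-aligned output of _cat/indices?v into a list of dicts."""
    rows = [line.split() for line in text.strip().splitlines()]
    header, data = rows[0], rows[1:]
    # Pair each column name from the header with the value in the same position.
    return [dict(zip(header, row)) for row in data]

sample = """health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open bank Gr6RiYrNS9yst0QX6mR2Nw 1 1 1000 0 381.9kb 381.9kb"""

for idx in parse_cat_table(sample):
    print(idx["index"], idx["health"], idx["docs.count"])
```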
Next, run a simple search that retrieves all documents in bank, sorted by account number.
curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": { "match_all": {} },
"sort": [
{ "account_number": "asc" }
]
}
'
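The same request can be prepared from Python with only the standard library. The sketch below builds, but does not send, a request equivalent to the curl command: a JSON body with a Content-Type: application/json header. The helper name build_search_request and the default host http://localhost:9200 are assumptions for illustration.

```python
import json
import urllib.request

def build_search_request(index, body, host="http://localhost:9200"):
    """Prepare (but do not send) a _search request equivalent to the curl command."""
    url = f"{host}/{index}/_search?pretty"
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="GET",  # mirrors curl -X GET with a body; _search also accepts POST
    )

query = {
    "query": {"match_all": {}},
    "sort": [{"account_number": "asc"}],
}
req = build_search_request("bank", query)
print(req.get_method(), req.full_url)
```

With a node running, json.load(urllib.request.urlopen(req)) would return the parsed response.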
The result is shown below. By default, hits contains the first 10 documents that match your query.
{
"took" : 6,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1000,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "0",
"_score" : null,
"_source" : { "account_number" : 0, "balance" : 16623, "firstname" : "Bradshaw", "lastname" : "Mckenzie", "age" : 29, "gender" : "F", "address" : "244 Columbus Place", "employer" : "Euron", "email" : "bradshawmckenzie@euron.com", "city" : "Hobucken", "state" : "CO" },
"sort" : [ 0 ]
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "1",
"_score" : null,
"_source" : { "account_number" : 1, "balance" : 39225, "firstname" : "Amber", "lastname" : "Duke", "age" : 32, "gender" : "M", "address" : "880 Holmes Lane", "employer" : "Pyrami", "email" : "amberduke@pyrami.com", "city" : "Brogan", "state" : "IL" },
"sort" : [ 1 ]
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "2",
"_score" : null,
"_source" : {
"account_number" : 2,
"balance" : 28838,
"firstname" : "Roberta",
"lastname" : "Bender",
"age" : 22,
"gender" : "F",
"address" : "560 Kingsway Place",
"employer" : "Chillium",
"email" : "robertabender@chillium.com",
"city" : "Bennett",
"state" : "LA"
},
"sort" : [
2
]
}, {...}
]
}
}
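The response provides the following information:
- took: how long the search took, in milliseconds
- timed_out: whether the search timed out
- _shards: how many shards were searched, and how many of them succeeded, failed, or were skipped
- max_score: the relevance score of the most relevant document found
- hits.total.value: how many matching documents were found
- hits.sort: the document's sort position (when not sorting by relevance score)
- hits._score: the document's relevance score (not applicable when using match_all)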
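The sketch below reads these fields from a trimmed copy of the response shown above, keeping only two hits and a few _source fields. Note that the per-document sort and _score values live inside each element of the inner hits.hits array. A hits.total.relation of eq means the count is exact; gte would mean it is only a lower bound.

```python
import json

# A trimmed copy of the response shown above (two hits, shortened _source).
raw = """{
  "took": 6,
  "timed_out": false,
  "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
  "hits": {
    "total": {"value": 1000, "relation": "eq"},
    "max_score": null,
    "hits": [
      {"_index": "bank", "_id": "0", "_score": null, "sort": [0],
       "_source": {"account_number": 0, "firstname": "Bradshaw", "balance": 16623}},
      {"_index": "bank", "_id": "1", "_score": null, "sort": [1],
       "_source": {"account_number": 1, "firstname": "Amber", "balance": 39225}}
    ]
  }
}"""

resp = json.loads(raw)
print("took (ms):", resp["took"], "timed out:", resp["timed_out"])
print("total matches:", resp["hits"]["total"]["value"], resp["hits"]["total"]["relation"])
for hit in resp["hits"]["hits"]:
    # With an explicit sort, _score is null and "sort" holds the sort key.
    print(hit["_id"], hit["sort"], hit["_source"]["firstname"])
```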
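The same query also supports paging through results. In the request below, from is the zero-based offset of the first result and size is the number of documents to return. If you have used a relational database such as MySQL, this works much like select * from table limit 10, 10.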
curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": { "match_all": {} },
"sort": [
{ "account_number": "asc" }
],
"from": 10,
"size": 10
}
'
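Converting a page number into from/size is simple arithmetic: from = (page - 1) * size. The sketch below shows this and the matching MySQL LIMIT clause, then simulates the window by slicing a sorted list. The helper names are hypothetical. By default, from + size cannot exceed 10,000 (the index.max_result_window setting), so deep paging usually switches to search_after.

```python
def page_params(page, size=10):
    """Translate a 1-based page number into Elasticsearch from/size values."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be >= 1")
    return {"from": (page - 1) * size, "size": size}

def mysql_limit(page, size=10):
    """The equivalent MySQL clause: LIMIT offset, count."""
    p = page_params(page, size)
    return f"LIMIT {p['from']}, {p['size']}"

print(page_params(2))   # {'from': 10, 'size': 10}
print(mysql_limit(2))   # LIMIT 10, 10

# Local simulation: slicing a sorted list gives the same window as from/size.
account_numbers = list(range(1000))
p = page_params(2)
print(account_numbers[p["from"]:p["from"] + p["size"]])  # accounts 10 through 19
```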
This returns the requested page of documents. Next, let's try something more interesting: the match query.
curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": { "match": { "address": "mill lane" } }
}
'
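This matches documents whose address contains mill or lane (or both). If you instead want to match the exact phrase "mill lane", with both terms adjacent and in that order, change match to match_phrase: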
curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": { "match_phrase": { "address": "mill lane" } }
}
'
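To make the difference between match and match_phrase concrete, here is a simplified local simulation. It is not Elasticsearch itself: real matching uses the field's analyzer and an inverted index, and match_phrase also supports a slop parameter. The tokenizer below only lowercases and splits on non-alphanumeric characters. Apart from "880 Holmes Lane" and "244 Columbus Place", which appear in the response above, the addresses are illustrative.

```python
import re

def tokenize(text):
    """Very rough stand-in for the standard analyzer: lowercase and split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())

def match_any(field_value, query):
    """match (default operator OR): true if any query term appears in the field."""
    terms = set(tokenize(field_value))
    return any(t in terms for t in tokenize(query))

def match_phrase(field_value, query):
    """match_phrase: all query terms must appear adjacent and in order."""
    tokens, phrase = tokenize(field_value), tokenize(query)
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))

addresses = ["198 Mill Lane", "880 Holmes Lane", "990 Mill Road", "244 Columbus Place"]
print([a for a in addresses if match_any(a, "mill lane")])
print([a for a in addresses if match_phrase(a, "mill lane")])
```

The first list keeps every address containing either term; the second keeps only the address containing the phrase.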
For more complex matching, use a bool query. The query below finds customers who are 40 years old and do not live in Idaho (ID):
curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"must": [
{ "match": { "age": "40" } }
],
"must_not": [
{ "match": { "state": "ID" } }
]
}
}
}
'
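The inclusion logic of must and must_not can be sketched as plain predicates: a document is kept only if it satisfies every must clause and none of the must_not clauses. This simplified simulation ignores relevance scoring and analysis (the clauses are modeled as exact equality), and the customer records are hypothetical.

```python
def bool_query(docs, must=(), must_not=()):
    """Keep docs that satisfy every must predicate and none of the must_not predicates."""
    return [
        d for d in docs
        if all(p(d) for p in must) and not any(p(d) for p in must_not)
    ]

# Hypothetical customer records for illustration.
customers = [
    {"account_number": 1, "age": 40, "state": "ID"},
    {"account_number": 2, "age": 40, "state": "CO"},
    {"account_number": 3, "age": 32, "state": "IL"},
    {"account_number": 4, "age": 40, "state": "LA"},
]

result = bool_query(
    customers,
    must=[lambda d: d["age"] == 40],          # { "match": { "age": "40" } }
    must_not=[lambda d: d["state"] == "ID"],  # { "match": { "state": "ID" } }
)
print([d["account_number"] for d in result])  # [2, 4]
```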
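Each must, must_not, and should element in a bool query is a query clause. Every must or should clause a document satisfies raises its relevance score, and the higher the score, the better the document matches your criteria. By default, Elasticsearch sorts results by relevance score. A must_not clause, by contrast, acts as a filter: it only decides whether a document appears in the results and does not affect its relevance score. You can also specify a filter explicitly. The following request returns customers whose balance (balance) is between 20000 and 30000, inclusive.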
curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"must": { "match_all": {} },
"filter": {
"range": {
"balance": {
"gte": 20000,
"lte": 30000
}
}
}
}
}
}
'
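The sketch below mirrors that request locally. gte and lte are inclusive bounds (gt and lt would be exclusive), match_all gives every document the same score of 1.0, and the filter only decides inclusion without changing that score. The balances of accounts 0, 1 and 2 come from the response shown earlier; accounts 5 and 6 are hypothetical records placed exactly on the boundaries.

```python
def in_range(value, gte=None, lte=None, gt=None, lt=None):
    """Mimic the bounds of a range query: gte/lte are inclusive, gt/lt exclusive."""
    if gte is not None and value < gte:
        return False
    if gt is not None and value <= gt:
        return False
    if lte is not None and value > lte:
        return False
    if lt is not None and value >= lt:
        return False
    return True

accounts = [
    {"account_number": 0, "balance": 16623},
    {"account_number": 1, "balance": 39225},
    {"account_number": 2, "balance": 28838},
    {"account_number": 5, "balance": 20000},  # hypothetical: lower boundary
    {"account_number": 6, "balance": 30000},  # hypothetical: upper boundary
]

hits = []
for acc in accounts:
    score = 1.0  # match_all gives every document the same score
    # The filter clause only decides inclusion; it never changes the score.
    if in_range(acc["balance"], gte=20000, lte=30000):
        hits.append((acc["account_number"], score))
print(hits)  # [(2, 1.0), (5, 1.0), (6, 1.0)]
```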