[ElasticSearch] 1. ElasticSearch: Search

Anabel Chen

Category: [ELK-Elasticsearch]  Tags: elasticsearch, searching data

Article link: https://blog.csdn.net/benben513624/article/details/50493644


Sample data. Unless stated otherwise, the following mapping is used throughout the rest of this section:

{
  "book":{
    "_index":{
      "enabled":true
    },
    "_id":{
      "index":"not_analyzed",
      "store":"yes"
    },
    "properties":{
      "author":{
        "type":"string"
      },
      "characters":{
        "type":"string"
      },
      "copies":{
        "type":"long",
        "ignore_malformed":false
      },
      "otitle":{
        "type":"string"
      },
      "tags":{
        "type":"string"
      },
      "title":{
        "type":"string"
      },
      "year":{
        "type":"long",
        "ignore_malformed":false,
        "index":"analyzed"
      },
      "available":{
        "type":"boolean"
      }
    }
  }
}

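The mapping above (saved as the file mapping.json) is used for the library index. Run the following commands to create the index and apply the mapping: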

curl -XPOST 'localhost:9200/library?pretty'
curl -XPUT 'localhost:9200/library/book/_mapping?pretty' -d @mapping.json

Next, bulk-load some data so that there is something to search:

{"index":{"_index":"library","_type":"book","_id":"1"}}
{"title":"All Quiet on the Western Front","otitle":"Im Westen nichts Neues","author":"Erich Maria Remarque","year":1929,"characters":["Paul Baumer","Albert Kropp","Haie Westhus","Fredrich Muller","Stanislaus Katczinsky","Tjaden"],"tags":["novel"],"copies":1,"available":true,"section":3}
{"index":{"_index":"library","_type":"book","_id":"2"}}
{"title":"Catch-22","author":"Joseph Heller","year":1961,"characters":["John Yossarian","Captain Aardvark","Chaplain Tappman","Colonel Cathcart","Doctor Daneeka"],"tags":["novel"],"copies":6,"available":false,"section":1}
{"index":{"_index":"library","_type":"book","_id":"3"}}
{"title":"The Complete Sherlock Holmes","author":"Arthur Conan Doyle","year":1936,"characters":["Sherlock Holmes","Dr.Watson","G.Lestrade"],"tags":[],"copies":0,"available":false,"section":12}
{"index":{"_index":"library","_type":"book","_id":"4"}}
{"title":"Crime and Punishment","otitle":"Pecty","author":"Fyodor Dostoevsky","year":1886,"characters":["Raskolnikov","Sofia Semyonovna Marmeladova"],"tags":[],"copies":0,"available":true}

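Save the data above in the file documents.json. Note that the bulk API expects newline-delimited JSON: every action object and every document must sit on a single line, and the file must end with a newline. Then index the data with the following command: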

curl -s -XPOST 'localhost:9200/_bulk?pretty' --data-binary @documents.json

The response shows that the data has been added.
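
Because the bulk API expects newline-delimited JSON, it can be convenient to generate the file rather than write it by hand. The following Python sketch shows one way to do that. The helper name build_bulk_payload and the shortened sample documents are illustrative assumptions, not part of the original setup, and the code only builds the text; it does not contact ElasticSearch.

```python
import json


def build_bulk_payload(index, doc_type, docs):
    """Return a newline-delimited JSON string for the _bulk endpoint.

    docs maps a document id to the document body (a dict).
    """
    lines = []
    for doc_id, body in docs.items():
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        # json.dumps without indent keeps every object on a single line.
        lines.append(json.dumps(action))
        lines.append(json.dumps(body))
    # The bulk body has to end with a newline character.
    return "\n".join(lines) + "\n"


# Shortened versions of two of the sample books (illustrative only).
books = {
    "2": {"title": "Catch-22", "author": "Joseph Heller", "year": 1961},
    "4": {"title": "Crime and Punishment", "author": "Fyodor Dostoevsky", "year": 1886},
}

payload = build_bulk_payload("library", "book", books)
print(payload, end="")
```

Writing payload to documents.json should give a file that can be passed to the curl command above with --data-binary.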

(I) Querying ElasticSearch

1. Simple queries

The simplest way to query ElasticSearch is a URI request query. For example, to search for the word crime in the title field, use the following command:

curl -XGET 'localhost:9200/library/book/_search?pretty=true&q=title:crime'

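This way of querying is simple but quite limited. From the point of view of the ElasticSearch query DSL, the query above is a query_string query: it matches documents whose title field contains the term crime. It can be written like this: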

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{
   "query":{
    "query_string":{"query":"title:crime"}
   }
 }'

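Sending a query with the query DSL looks a little different, but there is nothing sophisticated about it. As before, we send an HTTP GET request to the _search REST endpoint, this time with the query attached in the request body. The -d parameter sends the whole JSON query to ElasticSearch as the request body, and the pretty=true parameter asks ElasticSearch to return the response in a more readable form (with line breaks and indentation). Both forms of the query produce the following output: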

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.15342641,
    "hits" : [ {
      "_index" : "library",
      "_type" : "book",
      "_id" : "4",
      "_score" : 0.15342641,
      "_source":{
        "title":"Crime and Punishment",
        "otitle":"Pecty",
        "author":"Fyodor Dostoevsky",
        "year":1886,
        "characters":["Raskolnikov","Sofia Semyonovna Marmeladova"],
        "tags":[],
        "copies":0,
        "available":true
      }
    } ]
  }
}
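
To make the relationship between the two request forms concrete, the sketch below builds both of them in Python for the same field and term. The helper names uri_search and dsl_search are hypothetical, and the code only constructs the URL and the request body; it does not send anything to ElasticSearch.

```python
import json
from urllib.parse import urlencode

# Search endpoint used in the curl examples of this article.
BASE = "http://localhost:9200/library/book/_search"


def uri_search(field, term):
    """Build the URI-request form of the query (the q parameter)."""
    params = {"pretty": "true", "q": f"{field}:{term}"}
    # urlencode percent-encodes the colon as %3A, which is still a valid query.
    return BASE + "?" + urlencode(params)


def dsl_search(field, term):
    """Build the equivalent query DSL request body (a query_string query)."""
    return json.dumps({"query": {"query_string": {"query": f"{field}:{term}"}}})


url = uri_search("title", "crime")
body = dsl_search("title", "crime")
print(url)
print(body)
```

Both values describe the same query_string query; the DSL form simply leaves room for the additional top-level properties introduced in the following sections.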

2. Paging and result set size

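As you would expect, ElasticSearch lets us control the maximum number of results we want and which result to start from. Two additional parameters can be added to the request body: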

(1) from: the offset of the first document to return. The default is 0, meaning the results start from the first document.

(2) size: the maximum number of documents returned by a single query. The default is 10.

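To have the query return at most 2 documents starting from the first one (from is omitted, so its default of 0 applies), send the following query: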

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{
   "size":2,
   "query":{
     "query_string":{"query":"title:crime"}
   }
 }'

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.15342641,
    "hits" : [ {
      "_index" : "library",
      "_type" : "book",
      "_id" : "4",
      "_score" : 0.15342641,
      "_source":{
        "title":"Crime and Punishment",
        "otitle":"Pecty",
        "author":"Fyodor Dostoevsky",
        "year":1886,
        "characters":["Raskolnikov","Sofia Semyonovna Marmeladova"],
        "tags":[],
        "copies":0,
        "available":true
      }
    } ]
  }
}
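
When results are shown page by page, from and size are usually derived from a page number. The sketch below illustrates that arithmetic in Python; the helper name page_body and the 1-based page numbering are assumptions made for this example, not something ElasticSearch itself defines.

```python
import json


def page_body(query, page, per_page=10):
    """Build a search body for a 1-based page number.

    from is the offset of the first hit, so page 1 starts at offset 0.
    """
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    return {"from": (page - 1) * per_page, "size": per_page, "query": query}


query = {"query_string": {"query": "title:crime"}}

first_page = page_body(query, page=1, per_page=2)
third_page = page_body(query, page=3, per_page=20)
print(json.dumps(first_page))
print(json.dumps(third_page))
```

The first body is equivalent to the request above (size 2, starting at the first document); the second one skips the first 40 hits and asks for the next 20.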

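3. Returning the version

In addition to all the information already returned, ElasticSearch can also return the version of each document. To get it, add a version property with the value true at the top level of the JSON object. A query that requests version information therefore looks like this: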

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '
{
  "version":true,
  "query":{
    "query_string":{"query":"title:crime"}
  }
}'

Running the query above gives the following result:

{
  "took" : 315,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.15342641,
    "hits" : [ {
      "_index" : "library",
      "_type" : "book",
      "_id" : "4",
      "_version" : 1,
      "_score" : 0.15342641,
      "_source":{
        "title":"Crime and Punishment",
        "otitle":"Pecty",
        "author":"Fyodor Dostoevsky",
        "year":1886,
        "characters":["Raskolnikov","Sofia Semyonovna Marmeladova"],
        "tags":[],
        "copies":0,
        "available":true
      }
    } ]
  }
}

As you can see, the _version property is present in the only hit returned.

4. Limiting the score

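For non-standard use cases, ElasticSearch provides a feature that filters results by the minimum score a document must reach. To use it, provide a min_score property with the minimum score value at the top level of the JSON object. For example, to have our query return only documents with a score of at least 0.75, send the following query: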

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '
{
  "min_score":0.75,
  "query":{
    "query_string":{"query":"title:crime"}
  }
}'

The response is as follows:

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

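In the earlier examples the document's score was 0.15342641, which is lower than 0.75, so this time no documents are returned. Limiting the score usually does not make much sense, because it is hard to know before running a query what scores to expect, and scores are not comparable between different queries. Still, there may be cases where you need this feature.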
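
The effect of min_score can be pictured as a simple threshold applied to the scored hits. The Python sketch below mimics that on the client side with the score from the earlier responses. The helper name apply_min_score is hypothetical, and treating the threshold as inclusive is an assumption of this sketch; the real filtering happens inside ElasticSearch.

```python
def apply_min_score(hits, min_score):
    """Keep only the hits whose _score reaches the threshold."""
    return [hit for hit in hits if hit["_score"] >= min_score]


# The single hit returned by the earlier queries, reduced to id and score.
hits = [{"_id": "4", "_score": 0.15342641}]

strict = apply_min_score(hits, 0.75)
relaxed = apply_min_score(hits, 0.1)
print(len(strict), len(relaxed))
```

With the threshold at 0.75 nothing is left, which matches the empty response above; a threshold of 0.1 would keep the document.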

5. Choosing the fields to return

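With a fields array in the request body, you can define which fields are included in the response. Keep in mind that you can only return fields that are marked as stored in the mapping used to create the index, unless the _source field is in use (ElasticSearch then uses _source to provide the field values). So, to return only the title and year fields for each document in the results, send the following query to ElasticSearch: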

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '
{
  "fields":["title","year"],
  "query":{
    "query_string":{"query":"title:crime"}
  }
}'

The response contains the following output:

{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.15342641,
    "hits" : [ {
      "_index" : "library",
      "_type" : "book",
      "_id" : "4",
      "_score" : 0.15342641,
      "fields" : {
        "year" : [ 1886 ],
        "title" : [ "Crime and Punishment" ]
      }
    } ]
  }
}

As you can see, everything works as expected. Three things are worth noting:

(1) If the fields array is not defined, the default is used, which is to return the _source field if it is available;

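(2) If the _source field is in use and you request a field that is not stored, that field is extracted from _source (this, however, requires additional processing);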

(3) To return all stored fields, simply pass an asterisk (*) as the field name.

From a performance point of view, returning the _source field is better than returning multiple stored fields.
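
Note that in the response above every requested field comes back as an array, even when it holds a single value. The Python sketch below shows one possible way to flatten such hits into plain dictionaries on the client side. The helper name flatten_hits is hypothetical, and the response is a trimmed copy of the output shown above.

```python
def flatten_hits(response):
    """Turn the hits of a fields-style response into plain dicts."""
    rows = []
    for hit in response["hits"]["hits"]:
        row = {"_id": hit["_id"]}
        # Each requested field arrives as a list; unwrap single values.
        for name, values in hit.get("fields", {}).items():
            row[name] = values[0] if len(values) == 1 else values
        rows.append(row)
    return rows


# Trimmed copy of the response shown above.
response = {
    "hits": {
        "total": 1,
        "hits": [
            {
                "_index": "library",
                "_type": "book",
                "_id": "4",
                "_score": 0.15342641,
                "fields": {"year": [1886], "title": ["Crime and Punishment"]},
            }
        ],
    }
}

rows = flatten_hits(response)
print(rows)
```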

6. Partial fields

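Besides choosing which fields to return, ElasticSearch lets us use so-called partial fields. A partial field object exposes include and exclude properties, so fields can be included or excluded based on name patterns. For example, to include fields whose names start with titl and exclude those starting with chara, send the following query: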

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '
{
  "partial_fields":{
    "partiall":{
      "include":["titl*"],
      "exclude":["chara*"]
    }
  },
  "query":{
    "query_string":{"query":"title:crime"}
  }
}'

The result of the query is:

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.15342641,
    "hits" : [ {
      "_index" : "library",
      "_type" : "book",
      "_id" : "4",
      "_score" : 0.15342641,
      "fields" : {
        "partiall" : [ {
          "title" : "Crime and Punishment"
        } ]
      }
    } ]
  }
}
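
The include and exclude patterns can be thought of as wildcard filters over the field names of the document. The Python sketch below approximates that behaviour locally with fnmatch for top-level field names only. The helper name partial_source is hypothetical, and this is only a rough model of the idea, not how ElasticSearch implements it.

```python
from fnmatch import fnmatchcase


def partial_source(source, include, exclude):
    """Keep fields matching an include pattern and no exclude pattern."""
    kept = {}
    for name, value in source.items():
        included = any(fnmatchcase(name, pattern) for pattern in include)
        excluded = any(fnmatchcase(name, pattern) for pattern in exclude)
        if included and not excluded:
            kept[name] = value
    return kept


# Document 4 from the sample data.
book = {
    "title": "Crime and Punishment",
    "otitle": "Pecty",
    "author": "Fyodor Dostoevsky",
    "year": 1886,
    "characters": ["Raskolnikov", "Sofia Semyonovna Marmeladova"],
    "tags": [],
    "copies": 0,
    "available": True,
}

result = partial_source(book, include=["titl*"], exclude=["chara*"])
print(result)
```

Only title survives: otitle does not start with titl, and the exclude pattern has nothing left to remove once the include pattern has been applied.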

7. Using script fields

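ElasticSearch can return script-evaluated fields: add a script_fields section to the JSON query object, with an entry named after each scripted value you want returned. To return a value called correctYear, computed by subtracting 1800 from the year field, run the following query: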

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '
{
   "script_fields":{
    "correctYear":{
      "script":"doc[\"year\"].value - 1800"
    }
  },
  "query":{
    "query_string":{"query":"title:crime"}
  }
}'

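The example above uses the doc notation, which reads field values that are loaded (cached) in memory. This makes script execution faster, but it also leads to higher memory consumption and restricts us to single-valued, single-term fields. If memory usage is a concern, or if you work with more complex field values, you can use the _source field instead. A query using this field looks like this: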

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '
{
   "script_fields":{
    "correctYear":{
      "script":"_source.year-1800"
    }
  },
  "query":{
    "query_string":{"query":"title:crime"}
  }
}'
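
Scripts that contain quotes are awkward to embed in a shell command, because a single quote inside the single-quoted -d argument ends the shell string early. Building the request body programmatically avoids the problem. The Python sketch below is one way to do it; the helper names script_fields_body and correct_year are hypothetical, and the doc["year"].value accessor with a double-quoted field name is an assumption about the scripting syntax of the ElasticSearch version in use.

```python
import json


def script_fields_body(name, script, query):
    """Build a search body that returns one script-evaluated field."""
    return json.dumps({
        "script_fields": {name: {"script": script}},
        "query": {"query_string": {"query": query}},
    })


def correct_year(year):
    """Local equivalent of the script: the year minus 1800."""
    return year - 1800


# json.dumps escapes the inner double quotes, so the body stays valid JSON.
body = script_fields_body("correctYear", 'doc["year"].value - 1800', "title:crime")
print(body)
print(correct_year(1886))
```

For Crime and Punishment (year 1886) the computed correctYear is 86, which is the value the script field is expected to carry for that hit.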
