Creating an index
Index creation syntax
PUT /index
{
"settings": {
"index":{
"number_of_shards":"3",
"number_of_replicas":"2"
}
},
"mappings": {
"dynamic":false,
"properties" : {
"field1" : { "type" : "text" },
"field2" : {"type" : "integer" }
}
},
"aliases": {
"otherName": {}
}
}
Creating a news index
Create a news index named article with 3 shards, 2 replicas, and the alias news.
PUT /article
{
"settings":{
"number_of_shards":3,
"number_of_replicas":2
},
"mappings":{
"dynamic":false,
"properties":{
"title":{
"type":"text",
"analyzer":"ik_smart",
"search_analyzer":"ik_max_word"
},
"content":{
"type":"text",
"analyzer":"ik_smart",
"search_analyzer":"ik_max_word"
},
"categoryName":{
"type":"keyword"
},
"view_count":{
"type": "integer"
},
"publishTime":{
"type":"date",
"format":"yyyy-MM-dd HH:mm:ss||yyyy-MM-dd"
}
}
},
"aliases":{
"news":{
}
}
}
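As a quick sanity check on the settings above, the number of shard copies the cluster has to place is primaries × (1 + replicas). The helper below is a hypothetical name used only for this arithmetic; it is not part of any Elasticsearch client.

```python
def total_shard_copies(number_of_shards: int, number_of_replicas: int) -> int:
    """Return the primaries plus all of their replica copies for one index."""
    return number_of_shards * (1 + number_of_replicas)


# The article index: 3 primaries, each with 2 replicas.
print(total_shard_copies(3, 2))  # 9
```

Because copies of the same shard are not allocated to the same node, a single-node test cluster will leave the replicas unassigned; that is expected with these settings.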
Insert four documents using Kibana
PUT /article/_doc/1
{
"title":"芯片人才争夺",
"content":"芯片人才争夺“生猛”,需要大量人才",
"categoryName":"科技",
"view_count" : 60,
"publishTime":"2022-04-19 12:00:00"
}
PUT /article/_doc/2
{
"title":"2021年我国数字阅读用户规模破5亿 人均电子阅读11.58本",
"content":"2021年,我国数字阅读用户规模为5.06亿,相比2020年增长了2.49%;人均阅读量电子阅读11.58本,有声阅读7.08本。在首届全民阅读大会数字阅读分论坛暨第八届数字阅读年会上,中国音像与数字出版协会发布发布了《2021年度中国数字阅读报告》,展现了过去一年中国数字阅读行业发展情况与特点",
"categoryName":"科技",
"view_count" : 80,
"publishTime":"2022-04-26 12:00:00"
}
PUT /article/_doc/3
{
"title":"徙的鸟死于城市灯光,气象雷达如何拯救它们",
"content":"城市灯光会吸引迁徙的鸟类,并产生致死的后果。发表在4月21日《科学》杂志上的一篇文章,讲述了美国的鸟类学家如何利用气象雷达和建模,降低人造灯光影响下的鸟类死亡率",
"categoryName":"科技",
"view_count":70,
"publishTime":"2022-04-26 12:00:00"
}
PUT /article/_doc/4
{
"title":"季度动力电池装机量排行榜",
"content":"在汽车电动化时代,中国领跑全球;而在汽车动力电池领域,中国的宁德时代继续领跑全球",
"categoryName":"汽车",
"view_count" : 50,
"publishTime":"2022-04-26 12:00:00"
}
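The four PUT requests above could also be sent in one round trip through the _bulk API, which takes newline-delimited JSON: one action line followed by one source line per document. The sketch below only builds that request body; the function name is hypothetical and the documents are abbreviated for space.

```python
import json


def bulk_index_body(index: str, docs: dict) -> str:
    """Build an NDJSON body for POST /_bulk from {id: source} pairs."""
    lines = []
    for doc_id, source in docs.items():
        # Action line: index this document under the given id.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the document itself, kept as readable UTF-8.
        lines.append(json.dumps(source, ensure_ascii=False))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_index_body("article", {
    "1": {"title": "芯片人才争夺", "view_count": 60},
    "4": {"title": "季度动力电池装机量排行榜", "view_count": 50},
})
print(body, end="")
```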
Basic search syntax
- GET /<target>/_search
- GET /_search
- POST /<target>/_search
- POST /_search
Query by ID
GET /article/_doc/1
-- Output
{
"_index" : "article",
"_type" : "_doc",
"_id" : "1",
"_version" : 1,
"_seq_no" : 0,
"_primary_term" : 1,
"found" : true,
"_source" : {
"title" : "芯片人才争夺",
"content" : "芯片人才争夺“生猛”,需要大量人才",
"categoryName" : "科技",
"view_count" : 60,
"publishTime" : "2022-04-19 12:00:00"
}
}
Query using the alias
GET /news/_doc/1
-- Output
{
"_index" : "article",
"_type" : "_doc",
"_id" : "1",
"_version" : 1,
"_seq_no" : 0,
"_primary_term" : 1,
"found" : true,
"_source" : {
"title" : "芯片人才争夺",
"content" : "芯片人才争夺“生猛”,需要大量人才",
"categoryName" : "科技",
"view_count" : 60,
"publishTime" : "2022-04-19 12:00:00"
}
}
Search all documents with no conditions
GET /news/_search
{
"took" : 60,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "article",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"title" : "2021年我国数字阅读用户规模破5亿 人均电子阅读11.58本",
"content" : "2021年,我国数字阅读用户规模为5.06亿,相比2020年增长了2.49%;人均阅读量电子阅读11.58本,有声阅读7.08本。在首届全民阅读大会数字阅读分论坛暨第八届数字阅读年会上,中国音像与数字出版协会发布发布了《2021年度中国数字阅读报告》,展现了过去一年中国数字阅读行业发展情况与特点",
"categoryName" : "科技",
"view_count" : 80,
"publishTime" : "2022-04-26 12:00:00"
}
},
{
"_index" : "article",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"title" : "徙的鸟死于城市灯光,气象雷达如何拯救它们",
"content" : "城市灯光会吸引迁徙的鸟类,并产生致死的后果。发表在4月21日《科学》杂志上的一篇文章,讲述了美国的鸟类学家如何利用气象雷达和建模,降低人造灯光影响下的鸟类死亡率",
"categoryName" : "科技",
"view_count" : 70,
"publishTime" : "2022-04-26 12:00:00"
}
},
{
"_index" : "article",
"_type" : "_doc",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"title" : "季度动力电池装机量排行榜",
"content" : "在汽车电动化时代,中国领跑全球;而在汽车动力电池领域,中国的宁德时代继续领跑全球",
"categoryName" : "汽车",
"view_count" : 50,
"publishTime" : "2022-04-26 12:00:00"
}
},
{
"_index" : "article",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"title" : "芯片人才争夺",
"content" : "芯片人才争夺“生猛”,需要大量人才",
"categoryName" : "科技",
"view_count" : 60,
"publishTime" : "2022-04-19 12:00:00"
}
}
]
}
}
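Field | Description |
took | Time the search took, in milliseconds |
timed_out | Whether the search timed out; here it did not |
_shards | The index is split into 3 shards. A search request is sent to one copy of every shard (the primary or one of its replicas), so total and successful are both 3 |
hits | All results of the query |
hits.total | Number of matching documents |
hits.max_score | The highest _score among the matching documents. The score measures how relevant a document is to the search: the more relevant the document, the higher the score |
hits.hits (hits nested inside hits) | The matching documents in detail. The inner hits holds the data for each document, while the outer hits holds summary data such as total; the two are normally nested like this |
_index | The index the document belongs to |
_type | The type the document belongs to |
_id | The document ID |
_source | The document content, that is, the stored JSON |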
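To make the nesting of the two hits levels concrete, here is a minimal sketch that reads the fields described above from a response that has already been parsed into a Python dict. The response is a trimmed copy of the output shown earlier, and summarize is a hypothetical helper name.

```python
# Trimmed copy of the search response above (only two hits, short _source).
response = {
    "took": 60,
    "timed_out": False,
    "_shards": {"total": 3, "successful": 3, "skipped": 0, "failed": 0},
    "hits": {
        "total": {"value": 4, "relation": "eq"},
        "max_score": 1.0,
        "hits": [
            {"_index": "article", "_id": "2", "_score": 1.0,
             "_source": {"categoryName": "科技", "view_count": 80}},
            {"_index": "article", "_id": "3", "_score": 1.0,
             "_source": {"categoryName": "科技", "view_count": 70}},
        ],
    },
}


def summarize(resp: dict) -> dict:
    """Pull the summary fields (outer hits) and document ids (inner hits)."""
    outer = resp["hits"]
    shards = resp["_shards"]
    return {
        "total": outer["total"]["value"],
        "ids": [hit["_id"] for hit in outer["hits"]],
        "all_shards_ok": shards["successful"] == shards["total"],
    }


print(summarize(response))
```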
Passing parameters
Parameters are passed in the query string, as in an ordinary HTTP request.
GET /news/_search?q=title:人才&sort=publishTime:desc
{
"took" : 48,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "article",
"_type" : "_doc",
"_id" : "1",
"_score" : null,
"_source" : {
"title" : "芯片人才争夺",
"content" : "芯片人才争夺“生猛”,需要大量人才",
"categoryName" : "科技",
"view_count" : 60,
"publishTime" : "2022-04-19 12:00:00"
},
"sort" : [
1650369600000
]
}
]
}
}
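When such a request is sent from code rather than from Kibana, the non-ASCII query text has to be percent-encoded. A minimal sketch using only the Python standard library follows; uri_search_path is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlencode


def uri_search_path(target: str, **params: str) -> str:
    """Build a URI search path with a percent-encoded query string."""
    return f"/{target}/_search?{urlencode(params)}"


path = uri_search_path("news", q="title:人才", sort="publishTime:desc")
print(path)

# Decoding the query string gives back the original parameters.
print(parse_qs(path.split("?", 1)[1]))
```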
Paginated query
GET /news/_search?from=1&size=2
{
"took" : 45,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "article",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"title" : "徙的鸟死于城市灯光,气象雷达如何拯救它们",
"content" : "城市灯光会吸引迁徙的鸟类,并产生致死的后果。发表在4月21日《科学》杂志上的一篇文章,讲述了美国的鸟类学家如何利用气象雷达和建模,降低人造灯光影响下的鸟类死亡率",
"categoryName" : "科技",
"view_count" : 70,
"publishTime" : "2022-04-26 12:00:00"
}
},
{
"_index" : "article",
"_type" : "_doc",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"title" : "季度动力电池装机量排行榜",
"content" : "在汽车电动化时代,中国领跑全球;而在汽车动力电池领域,中国的宁德时代继续领跑全球",
"categoryName" : "汽车",
"view_count" : 50,
"publishTime" : "2022-04-26 12:00:00"
}
}
]
}
}
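Note that from is an offset into the result list, not a page number: from=1&size=2 above skips only the first hit. A small sketch of the usual page-to-offset conversion, assuming 1-based page numbers (page_params is a hypothetical helper name):

```python
def page_params(page: int, size: int) -> dict:
    """Convert a 1-based page number into from/size search parameters."""
    if page < 1 or size < 1:
        raise ValueError("page and size must both be at least 1")
    return {"from": (page - 1) * size, "size": size}


print(page_params(1, 2))  # {'from': 0, 'size': 2}
print(page_params(2, 2))  # {'from': 2, 'size': 2}
```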
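Getting started with the query DSL
Basic query keywords
Keyword | Description |
match_all | Matches all documents. It is the default when no query is specified |
match | Used for full-text search or exact matching. On an exact-value field, such as a number, date, boolean, or keyword field (a not_analyzed string in older versions), it matches the given value exactly |
range | Finds numbers or dates that fall within the given range: gt greater than; gte greater than or equal to; lt less than; lte less than or equal to |
term | Matches an exact value |
terms | Same as term, but lets you specify several values to match |
exists | Finds documents in which the given field has a value |
missing | Finds documents in which the given field has no value. The missing query was removed in Elasticsearch 5.0; use exists inside a bool must_not clause instead |
must | Compound query clause: documents must match these conditions to be included |
must_not | Compound query clause: documents must not match these conditions to be included |
should | Compound query clause: matching any of these clauses increases _score and otherwise has no effect. Mainly used to adjust the relevance score of each document |
filter | Compound query clause: these clauses do not contribute to the score; they only include or exclude documents according to the filter criteria |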
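Since the missing query is not available in current Elasticsearch versions, the same effect is usually obtained by negating exists. The sketch below only builds the request body as a Python dict; missing_field_query is a hypothetical helper name, and categoryName is used purely as an example field.

```python
import json


def missing_field_query(field: str) -> dict:
    """Build a search body matching documents that have no value for field."""
    return {"query": {"bool": {"must_not": {"exists": {"field": field}}}}}


print(json.dumps(missing_field_query("categoryName"), indent=2))
```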
Query all documents (equivalent to GET /news/_search with no request body)
POST localhost:9200/news/_search
{
"query":{
"match_all":{
}
}
}
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "article",
"_type": "_doc",
"_id": "1",
"_score": 1,
"_source": {
"title": "芯片人才争夺",
"content": "芯片人才争夺“生猛”,需要大量人才",
"publishTime": "2022-04-19 12:00:00"
}
}
]
}
}
Query with a condition
GET /news/_search
{
"query": {
"match": {
"title": "人才"
}
}
}
-- Output
{
"took" : 30,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.2876821,
"hits" : [
{
"_index" : "article",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.2876821,
"_source" : {
"title" : "芯片人才争夺",
"content" : "芯片人才争夺“生猛”,需要大量人才",
"categoryName" : "科技",
"view_count" : 60,
"publishTime" : "2022-04-19 12:00:00"
}
}
]
}
}
Sorted query
GET /news/_search
{
"query": {
"match": {
"title": "人才"
}
},
"sort": [
{
"view_count": {
"order": "desc"
}
}
],
"from": 0,
"size": 2
}
-- Output
{
"took" : 45,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "article",
"_type" : "_doc",
"_id" : "1",
"_score" : null,
"_source" : {
"title" : "芯片人才争夺",
"content" : "芯片人才争夺“生猛”,需要大量人才",
"categoryName" : "科技",
"view_count" : 60,
"publishTime" : "2022-04-19 12:00:00"
},
"sort" : [
60
]
}
]
}
}
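term query (no analysis)
The value is looked up as a single term and is not analyzed. This is the difference from match, whose value is analyzed into terms before matching. In the example below the title was split into terms at index time, so no indexed term equals the whole string 芯片人才 and the query returns no hits.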
GET /news/_search
{
"query": {
"term": {
"title": {
"value": "芯片人才"
}
}
}
}
-- Output
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
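The empty result can be reproduced with a toy inverted index. This is a sketch under stated assumptions: the token list for the title and the greedy toy_analyze function are hypothetical stand-ins for the ik analyzer, whose real output may differ.

```python
# Hypothetical tokens for the title "芯片人才争夺"; real ik output may differ.
INDEXED_TITLES = {"1": ["芯片", "人才", "争夺"]}
VOCABULARY = ["芯片", "人才", "争夺"]


def toy_analyze(text: str) -> list:
    """Greedy longest-match split over VOCABULARY (stand-in for an analyzer)."""
    tokens, i = [], 0
    while i < len(text):
        for word in sorted(VOCABULARY, key=len, reverse=True):
            if text.startswith(word, i):
                tokens.append(word)
                i += len(word)
                break
        else:
            i += 1  # skip characters that are not in the vocabulary
    return tokens


def term_query(value: str) -> list:
    """term: the value must equal one indexed token exactly."""
    return [d for d, toks in INDEXED_TITLES.items() if value in toks]


def match_query(text: str) -> list:
    """match: analyze the text, then match any of the resulting tokens."""
    wanted = toy_analyze(text)
    return [d for d, toks in INDEXED_TITLES.items() if any(t in toks for t in wanted)]


print(term_query("芯片人才"))   # []
print(match_query("芯片人才"))  # ['1']
print(term_query("芯片"))       # ['1']
```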
match_phrase
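// The query text is analyzed into the terms "芯片" and "人才"; only documents whose title contains both terms, adjacent and in that order, are returned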
GET /news/_search
{
"query": {
"match_phrase": {
"title": "芯片人才"
}
}
}
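The order and adjacency requirement is what separates match_phrase from match. A minimal sketch of that check over token lists follows; the tokens are assumed for illustration and are not the actual analyzer output.

```python
def phrase_match(doc_tokens: list, query_tokens: list) -> bool:
    """True if query_tokens occur in doc_tokens as one consecutive run."""
    n = len(query_tokens)
    if n == 0:
        return False
    return any(
        doc_tokens[i:i + n] == query_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


title = ["芯片", "人才", "争夺"]  # assumed tokens for "芯片人才争夺"
print(phrase_match(title, ["芯片", "人才"]))  # True: adjacent and in order
print(phrase_match(title, ["人才", "芯片"]))  # False: wrong order
print(phrase_match(title, ["芯片", "争夺"]))  # False: not adjacent
```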
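bool compound queries
Using the bool query:
The bool query corresponds to BooleanQuery in Lucene. It consists of one or more clauses, and each clause has a specific type.
must
Returned documents must satisfy every clause under must, and those clauses contribute to the score. The example below combines two match clauses with a range clause.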
GET /news/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"title": "人才"
}
},
{
"match": {
"content": "人才"
}
},{
"range": {
"view_count": {
"gte": 50,
"lte": 60
}
}
}
]
}
}
}
-- Output
{
"took" : 17,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.683245,
"hits" : [
{
"_index" : "article",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.683245,
"_source" : {
"title" : "芯片人才争夺",
"content" : "芯片人才争夺“生猛”,需要大量人才",
"categoryName" : "科技",
"view_count" : 60,
"publishTime" : "2022-04-19 12:00:00"
}
}
]
}
}
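filter
Returned documents must satisfy the filter clauses. Unlike must, these clauses do not contribute to the score.
should
Returned documents may satisfy the should clauses. If a bool query has no must or filter clause and has one or more should clauses, a document is returned as long as it satisfies at least one of them. The minimum_should_match parameter sets how many should clauses must match; it defaults to 1 in that case and to 0 when a must or filter clause is present.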
GET /news/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"title": "人才"
}
},
{
"match": {
"title": "城市"
}
}
]
}
}
}
-- Output
{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.9140557,
"hits" : [
{
"_index" : "article",
"_type" : "_doc",
"_id" : "3",
"_score" : 0.9140557,
"_source" : {
"title" : "徙的鸟死于城市灯光,气象雷达如何拯救它们",
"content" : "城市灯光会吸引迁徙的鸟类,并产生致死的后果。发表在4月21日《科学》杂志上的一篇文章,讲述了美国的鸟类学家如何利用气象雷达和建模,降低人造灯光影响下的鸟类死亡率",
"categoryName" : "科技",
"view_count" : 70,
"publishTime" : "2022-04-26 12:00:00"
}
},
{
"_index" : "article",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.2876821,
"_source" : {
"title" : "芯片人才争夺",
"content" : "芯片人才争夺“生猛”,需要大量人才",
"categoryName" : "科技",
"view_count" : 60,
"publishTime" : "2022-04-19 12:00:00"
}
}
]
}
}
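must_not
Returned documents must not satisfy the conditions defined under must_not.
If a bool query has both filter and should clauses, minimum_should_match defaults to 0 in Elasticsearch 7.0 and later, so set it to 1 to require at least one should clause to match.
Older versions of the bool query also accepted a disable_coord option for turning off the coordination factor; it was removed in Elasticsearch 6.0. In general, the score depends on all of the query clauses.
The bool query takes a more-matches-is-better approach, so the scores of the matching must and should clauses are added together to produce the final score.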
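The way the four clause types combine can be summarized in a few lines. The sketch below is a simplified model, not Elasticsearch code: each argument is a list of booleans saying whether that clause matched a given document, only integer values of minimum_should_match are handled, and scoring is ignored.

```python
def bool_matches(must=(), filter_=(), should=(), must_not=(),
                 minimum_should_match=None) -> bool:
    """Decide whether one document satisfies a bool query."""
    if minimum_should_match is None:
        # Default: 1 if there are should clauses and no must/filter, else 0.
        minimum_should_match = 1 if should and not (must or filter_) else 0
    return (
        all(must)
        and all(filter_)
        and not any(must_not)
        and sum(should) >= minimum_should_match
    )


# Only should clauses: at least one has to match.
print(bool_matches(should=[False, False]))                 # False
print(bool_matches(should=[True, False]))                  # True
# With a filter present, should becomes optional by default.
print(bool_matches(filter_=[True], should=[False]))        # True
print(bool_matches(filter_=[True], should=[False],
                   minimum_should_match=1))                # False
```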
Query DSL syntax
POST localhost:9200/news/_search
{
"query":{
"bool":{
"must":{
"match":{
"title":"人才"
}
}
}
}
}
{
"took": 56,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.5753642,
"hits": [
{
"_index": "article",
"_type": "_doc",
"_id": "1",
"_score": 0.5753642,
"_source": {
"title": "芯片人才争夺",
"content": "芯片人才争夺“生猛”,需要大量人才",
"publishTime": "2022-04-19 12:00:00"
}
}
]
}
}
Shorthand form:
POST localhost:9200/news/_search
{
"query":{
"match":{
"title":"人才"
}
}
}
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.5753642,
"hits": [
{
"_index": "article",
"_type": "_doc",
"_id": "1",
"_score": 0.5753642,
"_source": {
"title": "芯片人才争夺",
"content": "芯片人才争夺“生猛”,需要大量人才",
"publishTime": "2022-04-19 12:00:00"
}
}
]
}
}
POST localhost:9200/news/_search
{
"query":{
"bool":{
"must":{
"multi_match":{
"query":"人才",
"fields":[
"title",
"content"
]
}
}
}
}
}
{
"took": 62,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.7911257,
"hits": [
{
"_index": "article",
"_type": "_doc",
"_id": "1",
"_score": 0.7911257,
"_source": {
"title": "芯片人才争夺",
"content": "芯片人才争夺“生猛”,需要大量人才",
"publishTime": "2022-04-19 12:00:00"
}
}
]
}
}
fuzzy query
Returns documents that contain terms similar to the search term, where similarity is measured by Levenshtein edit distance.
POST localhost:9200/news/_search
{
"query":{
"bool":{
"must":{
"fuzzy":{
"content":{"value":"心片"}
}
}
}
}
}
Output:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 0,
"relation": "eq"
},
"max_score": null,
"hits": []
}
}
Query plan (validating and explaining a query)
POST localhost:9200/news/_validate/query?explain
{
"query":{
"bool":{
"must":{
"fuzzy":{
"title":{"value":"芯片"}
}
}
}
}
}
Output:
{
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"valid": true,
"explanations": [
{
"index": "article",
"valid": true,
"explanation": "+title:芯片~0"
}
]
}
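The ~0 in the explanation above suggests why the earlier fuzzy search for 心片 found nothing: with the default AUTO fuzziness, terms of one or two characters must match exactly, terms of three to five characters may differ by one edit, and longer terms by two. The sketch below reproduces that rule with a plain Levenshtein distance (Elasticsearch also counts a transposition as one edit by default, which this simplified version does not).

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def auto_fuzziness(term: str) -> int:
    """Edits allowed by fuzziness AUTO for a term of this length."""
    n = len(term)
    if n <= 2:
        return 0
    if n <= 5:
        return 1
    return 2


query, indexed = "心片", "芯片"
print(levenshtein(query, indexed))                           # 1
print(auto_fuzziness(query))                                 # 0
print(levenshtein(query, indexed) <= auto_fuzziness(query))  # False
```

Setting "fuzziness": 1 explicitly in the fuzzy clause would allow one edit, so the query may then match, provided 芯片 is actually one of the indexed terms.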
Combining search with aggregation: counting documents per category
POST localhost:9200/news/_search
{
"size":0,
"query":{
"match_all":{
}
},
"aggs":{
"popular_colors":{
"terms":{
"field":"categoryName"
}
}
}
}
Output:
{
"took": 65,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 4,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"popular_colors": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "科技",
"doc_count": 3
},
{
"key": "汽车",
"doc_count": 1
}
]
}
}
}
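Conceptually, the terms aggregation groups documents by the exact value of a field and counts each group. A minimal in-memory sketch over the four sample documents follows; terms_aggregation is a hypothetical helper, not a client API.

```python
from collections import Counter

# categoryName of the four sample documents inserted earlier.
docs = [
    {"_id": "1", "categoryName": "科技"},
    {"_id": "2", "categoryName": "科技"},
    {"_id": "3", "categoryName": "科技"},
    {"_id": "4", "categoryName": "汽车"},
]


def terms_aggregation(documents: list, field: str) -> list:
    """Count documents per distinct field value, largest bucket first."""
    counts = Counter(d[field] for d in documents if field in d)
    return [{"key": k, "doc_count": c} for k, c in counts.most_common()]


print(terms_aggregation(docs, "categoryName"))
# [{'key': '科技', 'doc_count': 3}, {'key': '汽车', 'doc_count': 1}]
```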
Defining a custom dynamic mapping template
PUT localhost:9200/myindex
{
"mappings": {
"dynamic_templates": [
{
"en": {
"match": "*_en",
"match_mapping_type": "string",
"mapping": {
"type": "text",
"analyzer": "english"
}
}
}
]
}
}
Insert documents
PUT localhost:9200/myindex/_doc/1
{
"title":"this is my first template"
}
PUT localhost:9200/myindex/_doc/2
{
"title_en":"this is my first template"
}
Search for the stop word is
Only document 1 is returned: title uses the default standard analyzer, which keeps is, while title_en uses the english analyzer, which drops it as a stop word.
GET localhost:9200/myindex/_search?q=is
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.2876821,
"hits": [
{
"_index": "myindex",
"_type": "_doc",
"_id": "1",
"_score": 0.2876821,
"_source": {
"title": "this is my first template"
}
}
]
}
}
Search for the keyword template
GET localhost:9200/myindex/_search?q=template
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 0.2876821,
"hits": [
{
"_index": "myindex",
"_type": "_doc",
"_id": "1",
"_score": 0.2876821,
"_source": {
"title": "this is my first template"
}
},
{
"_index": "myindex",
"_type": "_doc",
"_id": "2",
"_score": 0.2876821,
"_source": {
"title_en": "this is my first template"
}
}
]
}
}
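How a field name selects a dynamic template can be sketched with simple wildcard matching. This is a simplified model of the en template defined for myindex: it ignores match_mapping_type, assumes the standard analyzer for fields that match no template, and analyzer_for is a hypothetical helper name.

```python
from fnmatch import fnmatchcase

# Simplified view of the "en" template defined for myindex.
DYNAMIC_TEMPLATES = [{"name": "en", "match": "*_en", "analyzer": "english"}]


def analyzer_for(field_name: str, default: str = "standard") -> str:
    """Return the analyzer of the first template whose pattern matches."""
    for template in DYNAMIC_TEMPLATES:
        if fnmatchcase(field_name, template["match"]):
            return template["analyzer"]
    return default


print(analyzer_for("title"))     # standard
print(analyzer_for("title_en"))  # english
```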
Reference: jianshu.com/p/50dbd7252d0a