#### ES Notes ####

wangfy_

Last modified: 2024-06-13 12:27:17

Category: es. Tags: elasticsearch, big data, search engine

First published: 2020-07-23 09:48:51

Article link: https://blog.csdn.net/chushoufengli/article/details/107529487

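Part 1

Chinese analyzers

Let's start with Chinese analyzers. The IK plugin provides two: ik_max_word and ik_smart. Here is how each one splits Chinese text.

The ik_max_word analyzer

With ik_max_word, the sentence "我是中国人" ("I am Chinese") is split into: 我, 是, 中国人, 中国, 国人

The ik_smart analyzer

Running ik_smart on the same sentence "我是中国人" yields: 我, 是, 中国人

So ik_max_word performs a fine-grained split that emits every word it can find, including overlapping ones, while ik_smart performs a coarser split without overlaps. Choose between them according to the use case.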
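To compare the two analyzers yourself, the same sentence can be sent to the _analyze API once per analyzer. The sketch below only builds the request bodies; the helper name build_analyze_request is an assumption for illustration, and sending the body (for example with POST /_analyze in the Kibana console) is left out.

```python
import json


def build_analyze_request(analyzer: str, text: str) -> dict:
    """Build the JSON body for the _analyze API."""
    return {"analyzer": analyzer, "text": text}


# Same sentence, two IK analyzers; send each body to POST /_analyze to compare.
for name in ("ik_max_word", "ik_smart"):
    print(json.dumps(build_analyze_request(name, "我是中国人"), ensure_ascii=False))
```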

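REST style

Elasticsearch exposes a REST API. The requests above were sent from the Kibana console, but a tool such as Postman can send the same REST requests and get the same results, so there is a lot of flexibility in how it is used.

Kibana does offer strong syntax hints, which makes it more convenient. Underneath it still sends REST requests; it just wraps them for us.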

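Some Elasticsearch concepts

Index (indices): comparable to a database.

Type: comparable to a table. (Mapping types are deprecated in Elasticsearch 7.x and removed in 8.x; the typed endpoints used in this part follow the older style.)

Document: comparable to a row. Each row of data is a document, like a row returned by a MySQL query.

Field: comparable to a column, like a column in MySQL.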

Index

Creating an index is similar to creating a database. To create an index named test:

PUT /test
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}

Get the index

GET /test

Delete the index

DELETE /test

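Creating a mapping

Under the test index, create a goods type (the "product table") with three fields: title, images, and price.

Syntax:

PUT /test/_mapping/goods
{
  "properties": {
    "title": {
      "type": "text",
      "analyzer": "ik_max_word"
    },
    "images": {
      "type": "keyword",
      "index": false
    },
    "price": {
      "type": "float"
    }
  }
}

A few keywords stand out in this request. First, type: string data can be mapped as text or keyword. A text field is analyzed (split into tokens) and by default cannot be used in aggregations. A keyword field is not analyzed; its value is matched as a whole and can be used in aggregations.

index controls whether the field is indexed and defaults to true. Image URLs usually do not need to be searchable, so index is set to false for images.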
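The same mapping can be assembled programmatically, which makes the text/keyword/index choices explicit. This is a minimal sketch; the helper names text_field and keyword_field are assumptions for illustration and are not part of any Elasticsearch client.

```python
def text_field(analyzer: str) -> dict:
    """An analyzed full-text field."""
    return {"type": "text", "analyzer": analyzer}


def keyword_field(searchable: bool = True) -> dict:
    """A non-analyzed field; index defaults to true, so it is only written when disabled."""
    field = {"type": "keyword"}
    if not searchable:
        field["index"] = False
    return field


# Body for the mapping request shown above.
goods_mapping = {
    "properties": {
        "title": text_field("ik_max_word"),
        "images": keyword_field(searchable=False),
        "price": {"type": "float"},
    }
}
```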

View the mapping

GET /test/_mapping/goods

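Overview of Elasticsearch field types

| Category | Subcategory | Types |
| --- | --- | --- |
| Core types | String | text, keyword |
| | Integer | integer, long, short, byte |
| | Floating point | double, float, half_float, scaled_float |
| | Boolean | boolean |
| | Date | date |
| | Range | range |
| | Binary | binary |
| Complex types | Array | array |
| | Object | object |
| | Nested | nested |
| Geo types | Geo point | geo_point |
| | Geo shape | geo_shape |
| Specialized types | IP | ip |
| | Completion | completion |
| | Token count | token_count |
| | Attachment | attachment |
| | Percolator | percolator |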

Adding data

Add a document:

POST /test/goods
{
  "title": "小米手机",
  "images": "http://img.xiaomi.com",
  "price": 2999.5
}

Query all data:

GET /test/_search
{
  "query": {
    "match_all": {}
  }
}

Get a document by id:

GET /test/goods/HHzUuWkB96yUdC__j7Mf

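Updating data

Update a document by id (PUT replaces the whole document):

PUT /test/goods/HHzUuWkB96yUdC__j7Mf
{
  "title": "中米手机",
  "images": "http://img.xiaomi.com",
  "price": 3999
}

PUT is powerful: if the id does not exist, the document is created instead.

PUT /test/goods/123565
{
  "title": "锤子手机",
  "images": "http://img.chuizi.com",
  "price": 1888
}

In the response, result is updated when an existing document was replaced and created when a new one was added. This is flexible and can stand in for POST when adding data, as long as you supply the id yourself.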
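The created/updated behaviour can be pictured with a small in-memory model. This is a toy stand-in written for illustration, not the Elasticsearch client; the class name ToyIndex and the changed price 1999 are assumptions.

```python
class ToyIndex:
    """In-memory stand-in for an index; it only mimics PUT-by-id semantics."""

    def __init__(self):
        self._docs = {}  # id -> (version, source)

    def put(self, doc_id: str, source: dict) -> dict:
        if doc_id in self._docs:
            # Existing id: the whole document is replaced and the version grows.
            version = self._docs[doc_id][0] + 1
            result = "updated"
        else:
            version = 1
            result = "created"
        self._docs[doc_id] = (version, dict(source))
        return {"_id": doc_id, "_version": version, "result": result}


index = ToyIndex()
first = index.put("123565", {"title": "锤子手机", "price": 1888})
second = index.put("123565", {"title": "锤子手机", "price": 1999})
print(first["result"], second["result"])
```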

Deleting data

Delete a document by id:

DELETE /test/goods/HHzUuWkB96yUdC__j7Mf

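match query

Search for 小米手机 (Xiaomi phone):

GET /test/_search
{
  "query": {
    "match": {
      "title": "小米手机"
    }
  }
}

Look closely at the results: why were 华为手机 (Huawei phone) and 锤子手机 (Smartisan phone) returned as well? Text is analyzed both at index time and at search time. Although we searched for 小米手机, Elasticsearch actually searched for the two terms 小米 and 手机, and by default match combines terms with OR, so every document containing either term was returned.

What if we only want 小米手机? Set operator to and so that all terms must match:

GET /test/_search
{
  "query": {
    "match": {
      "title": {"query": "小米手机", "operator": "and"}
    }
  }
}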
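The difference between the default OR behaviour and operator and can be sketched with plain token sets. The token lists below are hand-written assumptions about how IK would split each title, and the 小米电视 document is hypothetical; the real scoring and analysis are done by Elasticsearch.

```python
def match(doc_tokens, query_tokens, operator="or"):
    """Return True if the document matches the analyzed query terms."""
    doc = set(doc_tokens)
    hits = [token in doc for token in query_tokens]
    return all(hits) if operator == "and" else any(hits)


# title -> tokens stored at index time (assumed, not produced by IK here)
docs = {
    "小米手机": ["小米", "手机"],
    "华为手机": ["华为", "手机"],
    "小米电视": ["小米", "电视"],
}
query = ["小米", "手机"]  # the analyzed form of the search text 小米手机

or_hits = [title for title, tokens in docs.items() if match(tokens, query)]
and_hits = [title for title, tokens in docs.items() if match(tokens, query, "and")]
print(or_hits, and_hits)
```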

match_all query

Returns all documents. This was shown earlier, so only the syntax is repeated here:

GET /test/_search
{
  "query": {
    "match_all": {}
  }
}

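Term query

Use term to search for 小米手机:

GET /test/_search
{
  "query": {
    "term": {
      "title": {
        "value": "小米手机"
      }
    }
  }
}

The term query finds nothing. Why? A term query does not analyze its input; it looks up 小米手机 as one whole term, whereas the title field was analyzed at index time and stored as separate tokens, so the whole string matches none of them. Term queries are therefore normally used on fields that are not analyzed, such as keyword fields or numeric fields like price. (A field mapped with index: false, like images above, cannot be searched at all.)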
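A toy inverted index makes the reason visible: the lookup key has to equal a stored term exactly. This is an illustrative sketch, not how Lucene stores data; the token list for the text field and the keyword variant of title are assumptions.

```python
from collections import defaultdict


def build_inverted_index(docs: dict) -> dict:
    """Map each stored term to the set of document ids that contain it."""
    inverted = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            inverted[term].add(doc_id)
    return inverted


def term_query(inverted: dict, value: str) -> set:
    """Exact lookup: the value is not analyzed before the lookup."""
    return set(inverted.get(value, set()))


# title as an analyzed text field: stored as separate tokens
text_index = build_inverted_index({1: ["小米", "手机"]})
# the same title as a hypothetical keyword field: stored as one term
keyword_index = build_inverted_index({1: ["小米手机"]})

print(term_query(text_index, "小米手机"), term_query(keyword_index, "小米手机"))
```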

Selecting the returned fields

Sometimes not every field needs to be returned. Use _source to name the fields to return:

GET /test/_search
{
  "_source": ["title", "price"],
  "query": {
    "match": {
      "title": "小米手机"
    }
  }
}

There are two variant forms:

Form 1 (includes lists the fields to return):

GET /test/_search
{
  "_source": {
    "includes": ["title", "price"]
  },
  "query": {
    "match": {
      "title": "小米手机"
    }
  }
}

Form 2 (excludes lists the fields to leave out):

GET /test/_search
{
  "_source": {
    "excludes": ["title", "price"]
  },
  "query": {
    "match": {
      "title": "小米手机"
    }
  }
}
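The effect of includes and excludes on one document can be sketched as a dictionary filter. This toy handles exact top-level field names only, which is an assumption for brevity; the function name filter_source is made up for illustration.

```python
def filter_source(source: dict, includes=None, excludes=None) -> dict:
    """Keep fields named in includes (all if None), then drop those in excludes."""
    kept = {k: v for k, v in source.items() if includes is None or k in includes}
    return {k: v for k, v in kept.items() if not excludes or k not in excludes}


doc = {"title": "小米手机", "images": "http://img.xiaomi.com", "price": 2999.5}
print(filter_source(doc, includes=["title", "price"]))
print(filter_source(doc, excludes=["title", "price"]))
```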

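Fuzzy query

Suppose a user wants to search for apple phones but types applo by mistake. The fuzzy query handles this by matching terms within a small edit distance of the input:

GET /test/_search
{
  "query": {
    "fuzzy": {
      "title": "applo"
    }
  }
}

Even with the typo, the matching documents are still returned.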
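Fuzzy matching is based on edit distance between the typed term and the indexed terms. The sketch below computes plain Levenshtein distance to show why applo is close to apple; it ignores transpositions and any fuzziness settings, which is a simplification assumed here for brevity.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb
            ))
        previous = current
    return previous[-1]


print(edit_distance("applo", "apple"))
```

One substitution turns applo into apple, so the two terms are one edit apart.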

Range query

Find phones priced between 3000 and 5000:

GET /test/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 3000,
        "lte": 5000
      }
    }
  }
}

Bool query

Find products named 小米手机 that are also priced between 1000 and 5000:

GET /test/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {
          "title": {"query": "小米手机", "operator": "and"}
        }}
      ],
      "filter": {
        "range": {
          "price": {
            "gte": 1000,
            "lte": 5000
          }
        }
      }
    }
  }
}
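When the title and price bounds come from user input, the same bool body can be built by a small function. This is a sketch under the assumption that the fields are named title and price as above; the function name title_in_price_range is hypothetical.

```python
import json


def title_in_price_range(title: str, low: float, high: float) -> dict:
    """Build a bool query: scored match on title, non-scoring range filter on price."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"title": {"query": title, "operator": "and"}}}
                ],
                "filter": {
                    "range": {"price": {"gte": low, "lte": high}}
                },
            }
        }
    }


body = title_in_price_range("小米手机", 1000, 5000)
print(json.dumps(body, ensure_ascii=False, indent=2))
```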

Sorting

Search for products whose title matches 手机 (phone) and sort by price in ascending order:

GET /test/_search
{
  "query": {
    "match": {
      "title": {"query": "手机"}
    }
  },
  "sort": [
    {
      "price": {
        "order": "asc"
      }
    }
  ]
}

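Part 2

ES concepts:

An index is like a database. An index can hold several tables, and a type corresponds to a table.

The field structure is the mapping. Fields can be added, but existing fields cannot be deleted or modified.

Each document has an id in its metadata, which can be specified at insert time or generated automatically.

A document's data is stored in _source.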

Installing ES, Kibana, and the IK analyzer with Docker

docker search elasticsearch

docker pull nshou/elasticsearch-kibana

docker images

docker run -d --name eskb -p 9200:9200 -p 5601:5601 nshou/elasticsearch-kibana

Installing IK online

docker exec -it eskb /bin/bash

./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.7.0/elasticsearch-analysis-ik-7.7.0.zip

Make sure the IK version matches the ES version. After installation, the plugin files appear under the plugins directory.

ES commands

List all indices

http://172.16.1.23:9200/_cat/indices?v

Show all data

http://172.16.1.23:9200/svc_search_mp_group/_search

Search

http://172.16.1.23:9200/svc_search_mp_group/_doc/_search/?q=group_name:分组

http://172.16.1.23:9200/svc_search_mp_content/_search/?q=pt_type:1

http://172.16.1.23:9200/svc_search_mp_content/_search/?q=id:515

View the index mapping

http://172.16.1.23:9200/svc_search_oplog/_mapping/

Add a field to the index

PUT http://172.16.1.23:9200/svc_search_oplog/_mapping
{
  "properties": {
    "op_detail": {
      "type": "text",
      "analyzer": "my_ik_max_word",
      "search_analyzer": "my_ik_smart"
    }
  }
}

Updating a field of a document

https://www.jianshu.com/p/b1c7be021d9e

Documents in Elasticsearch are immutable and cannot be changed in place. An update marks the old document as deleted and indexes a new document to replace it.

The _version field becomes 2 because it is a version number: this is the second time data has been stored under this index, type, and id.

Run:

POST svc_search_mp_apply/_doc/15192491366042977927/_update
{
  "doc": {
    "mp_name": "test"
  }
}

#! Deprecation: [types removal] Specifying types in document update requests is deprecated, use the endpoint /{index}/_update/{id} instead.
{
  "_index" : "svc_search_mp_apply",
  "_type" : "_doc",
  "_id" : "15192491366042977927",
  "_version" : 2,
  "result" : "noop",
  "_shards" : {
    "total" : 0,
    "successful" : 0,
    "failed" : 0
  },
  "_seq_no" : 11,
  "_primary_term" : 4
}

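Installing Kibana standalone (the combined ES + Kibana image is recommended)

Install: brew install kibana

Start: kibana

Open in a browser: http://localhost:5601

Connecting to ES:

Location of the Kibana configuration file:

Applications installed with brew install are first placed under /usr/local/Cellar/

cd into the kibana directory; its config entry is a link: config -> /usr/local/etc/kibana

Edit it (just uncomment the following default entries):

server.port: 5601
server.host: "0.0.0.0"
elasticsearch.url: "http://192.168.202.128:9200"
kibana.index: ".kibana"

(elasticsearch.url is the Kibana 6.x setting name; from 7.0 the setting is elasticsearch.hosts.)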

Kibana console commands

All indices:

GET _cat/indices

Information about svc_search_oplog:

GET svc_search_oplog

Settings of svc_search_oplog:

GET svc_search_oplog/_settings

Mapping of svc_search_oplog:

GET svc_search_oplog/_mapping

Create the index svc_search_oplog with mappings and settings:

PUT svc_search_oplog
{
  "mappings": {
    "properties": {
      "account_id": {
        "type": "keyword"
      },
      "account_sign": {
        "analyzer": "my_ik_max_word",
        "search_analyzer": "my_ik_smart",
        "type": "text"
      },
      "created_at": {
        "type": "long"
      },
      "op_detail": {
        "analyzer": "my_ik_max_word",
        "search_analyzer": "my_ik_smart",
        "type": "text"
      },
      "op_log_id": {
        "type": "keyword"
      },
      "op_object": {
        "type": "keyword"
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "my_ik_max_word": {
          "char_filter": [
            "html_strip"
          ],
          "tokenizer": "ik_max_word",
          "type": "custom"
        },
        "my_ik_smart": {
          "char_filter": [
            "html_strip"
          ],
          "tokenizer": "ik_smart",
          "type": "custom"
        }
      }
    }
  }
}

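Insert a document:

#Index lagou, type job, id 1
PUT lagou/job/1
{
  "title": "后端研发",
  "salary_min": 20000,
  "Company": {
    "name": "百度",
    "address": "北京"
  }
}

Get a document:

#Get the document with id 1 in type job of index lagou
GET lagou/job/1

Update a document:

#Overwrite the document with id 1
PUT lagou/job/1
{
  "title": "后端研发",
  "salary_min": 10000,
  "Company": {
    "name": "百度",
    "address": "北京"
  }
}

#Partial update: doc is the fixed wrapper key and holds the fields to change
POST lagou/job/1/_update
{
  "doc": {
    "salary_min": 20000
  }
}

Delete the document with id 1 in type job of index lagou:

DELETE lagou/job/1

Delete the index (deleting a type is not supported):

DELETE lagou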

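Get all data in svc_search_oplog:

GET /svc_search_oplog/_search
{
  "query": {
    "match_all": {}
  }
}

Rough explanation of the response fields:

took: time spent executing the search, in milliseconds

timed_out: whether the search timed out

_shards: how many shards were searched, and how many succeeded and failed

hits: the search results

hits.total: total number of documents matching the query

hits.hits: the returned results; ten by default

hits.max_score: the highest match score

hits.hits._score: the match score of a returned document (the higher the score, the better the match and the earlier it is listed)

_index, _type, _id: together they locate a specific document, level by level

_source: the document source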
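The fields listed above can be pulled out of a search response with a few lines of code. The sketch below works on a response already parsed into a dict; the sample response is hand-written for illustration, and the function name summarize_hits is an assumption. It accepts hits.total both as a plain integer and as an object with a value key, since the shape differs between versions.

```python
def summarize_hits(response: dict) -> dict:
    """Extract timing, total count and (id, score, source) triples from a search response."""
    hits = response["hits"]
    total = hits["total"]
    # 7.x returns {"value": n, "relation": ...}; earlier versions return an integer.
    if isinstance(total, dict):
        total = total["value"]
    return {
        "took_ms": response["took"],
        "timed_out": response["timed_out"],
        "total": total,
        "documents": [(h["_id"], h["_score"], h["_source"]) for h in hits["hits"]],
    }


# Hand-written sample, not real output.
sample = {
    "took": 3,
    "timed_out": False,
    "_shards": {"total": 5, "successful": 5, "failed": 0},
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "max_score": 1.0,
        "hits": [
            {"_index": "svc_search_oplog", "_type": "_doc", "_id": "1",
             "_score": 1.0, "_source": {"op_object": "demo"}}
        ],
    },
}
summary = summarize_hits(sample)
print(summary)
```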

———————————————————————————————————————————

https://www.cnblogs.com/yjf512/p/4897294.html

https://blog.csdn.net/qq_24365213/article/details/79224630

———————————————————————————————————————————

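Elasticsearch queries (match and term)

ES accepts query requests in two forms: a simplified query-string form, and a full JSON request body, known as the query DSL.

The DSL is more explicit and easier to work with, so it is the form mostly used.

A DSL query POSTs a JSON body. Because the request is JSON, it is very flexible and can take many shapes.

One thing to note: the JSON in the official documentation examples is often only a fragment and cannot be pasted in and used directly. It usually needs to be wrapped in an outer object under the key query.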
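Wrapping a documentation fragment under the query key is mechanical, as this small sketch shows. The function name wrap_query is hypothetical, and the fragment is the match clause used in the next example.

```python
import json


def wrap_query(clause: dict) -> str:
    """Wrap a query clause under the top-level "query" key and serialize it."""
    return json.dumps({"query": clause}, ensure_ascii=False)


fragment = {"match": {"content": {"query": "我的宝马多少马力"}}}
print(wrap_query(fragment))
```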

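match

The simplest match example:

Find documents that match the query string "我的宝马多少马力" ("how much horsepower does my BMW have").

{
  "query": {
    "match": {
      "content": {
        "query": "我的宝马多少马力"
      }
    }
  }
}

This query analyzes the input. For example, "宝马多少马力" is split into "宝马 多少 马力", and every document containing one or more of these three terms is returned.

The results are scored with Lucene's scoring mechanism (TF/IDF).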

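match_phrase

In the example above, a document such as "我的保时捷马力不错" ("my Porsche has good horsepower") is also returned. To match only documents that contain all of "宝马 多少 马力" together as a phrase, use match_phrase:

{
  "query": {
    "match_phrase": {
      "content": {
        "query": "我的宝马多少马力"
      }
    }
  }
}

Exact phrase matching can be too strict, so an adjustable tolerance is useful. That is what slop is for: it sets how many positions the terms may be moved and still count as a phrase match.

{
  "query": {
    "match_phrase": {
      "content": {
        "query": "我的宝马多少马力",
        "slop": 1
      }
    }
  }
}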

multi_match

To match against two fields, where a match in either field is enough, use multi_match:

{
  "query": {
    "multi_match": {
      "query": "我的宝马多少马力",
      "fields": ["title", "content"]
    }
  }
}

multi_match, however, raises the question of how matches are scored.

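best_fields

To give the highest score to documents where a single field matches best, use best_fields:

{
  "query": {
    "multi_match": {
      "query": "我的宝马发动机多少",
      "type": "best_fields",
      "fields": [
        "tag",
        "content"
      ],
      "tie_breaker": 0.3
    }
  }
}

This means documents that fully match "宝马 发动机" in one field rank first: the score is that of the best-matching field, plus the scores of the other matching fields multiplied by the factor 0.3.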
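The arithmetic behind tie_breaker can be written out directly: take the best field score and add the remaining field scores scaled by the factor. This is a sketch of that combination rule only; the per-field scores used below are made-up numbers, not real relevance scores.

```python
def best_fields_score(field_scores, tie_breaker=0.0):
    """Best field score plus tie_breaker times the scores of the other matching fields."""
    matching = [s for s in field_scores if s > 0]
    if not matching:
        return 0.0
    best = max(matching)
    return best + tie_breaker * (sum(matching) - best)


# Hypothetical per-field scores for one document: tag scored 2.0, content scored 1.0.
print(best_fields_score([2.0, 1.0], tie_breaker=0.3))
print(best_fields_score([2.0, 1.0]))
```

With tie_breaker left at 0 only the best field counts; raising it toward 1 moves the result toward a plain sum over fields.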

most_fields

To score documents higher the more fields they match, use most_fields:

{
  "query": {
    "multi_match": {
      "query": "我的宝马发动机多少",
      "type": "most_fields",
      "fields": [
        "tag",
        "content"
      ]
    }
  }
}

cross_fields

When the terms of the query are expected to be spread across different fields, use cross_fields:

{
  "query": {
    "multi_match": {
      "query": "我的宝马发动机多少",
      "type": "cross_fields",
      "fields": [
        "tag",
        "content"
      ]
    }
  }
}

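term

term means exact matching: the search input is not run through an analyzer, and the document must contain the whole search term.

{
  "query": {
    "term": {
      "content": "汽车保养"
    }
  }
}

Every document returned contains "汽车保养" (car maintenance) as a single term.

Before using term, check whether the field is analyzed. In the older versions this example is based on, string fields are analyzed by default. (The string type and not_analyzed are pre-5.x syntax; later versions use text and keyword instead.)

Take the example from the official site.

The mapping looks like this:

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "full_text": {
          "type": "string"
        },
        "exact_value": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "full_text": "Quick Foxes!",
  "exact_value": "Quick Foxes!"
}

full_text is analyzed, so its index holds [quick, foxes], while exact_value holds [Quick Foxes!].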

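Now consider the following requests:

GET my_index/my_type/_search
{
  "query": {
    "term": {
      "exact_value": "Quick Foxes!"
    }
  }
}

This returns the document, because the value matches exactly.

GET my_index/my_type/_search
{
  "query": {
    "term": {
      "full_text": "Quick Foxes!"
    }
  }
}

This returns nothing, because the tokens produced from full_text do not include [Quick Foxes!].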

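bool compound queries: must, should, must_not

For a requirement like "content contains 宝马 (BMW) but tag does not contain 宝马", use a bool compound query.

A compound query uses the keywords must, should, and must_not.

They can be understood as follows:

must: the document must match the condition

filter: the document must match, but the clause does not contribute to the score

should: holds one or more conditions; a document satisfies should if it meets at least one of them

must_not: the document must not match the condition

For the requirement above:

{
  "query": {
    "bool": {
      "must": {
        "term": {
          "content": "宝马"
        }
      },
      "must_not": {
        "term": {
          "tags": "宝马"
        }
      }
    }
  }
}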

———————————————————————————————————————————

Using multiple analyzers

https://www.jianshu.com/p/c47cd5313653

To analyze the same text with several analyzers, use the multi-fields feature and declare multiple sub-fields:

PUT /my-index/_mapping/my-type
{
  "my-type": {
    "properties": {
      "name": {
        "type": "string",
        "analyzer": "standard",
        "fields": {
          "custom1": {
            "type": "string",
            "analyzer": "custom1"
          },
          "custom2": {
            "type": "string",
            "analyzer": "custom2"
          }
        }
      }
    }
  }
}

You can then use name, name.custom1, and name.custom2 to get the tokens produced by each analyzer.

An analyzer can also be specified at query time.

For example:

POST /my-index/my-type/_search
{
  "query": {
    "match": {
      "name": {
        "query": "it's brown",
        "analyzer": "standard"
      }
    }
  }
}

———————————————————————————————————————————

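##### Differences: term, match, match phrase, match phrase prefix

term

Matches a single value; the input is not analyzed.

match

Full-text matching: the input is analyzed first and the resulting terms are searched. A document is returned as long as it contains any part of the match query.

match phrase

For example, when searching for quick brown, the two terms must be adjacent and in that order, so neither brown a quick nor brown quick is found.

match phrase prefix

Compared with match phrase, it additionally treats the last term as a prefix: for quick brown f, the f is matched as a prefix.

Use term on keyword fields and the others on text fields.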
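The phrase rules above can be sketched over token lists: the query tokens must appear next to each other and in order, and the prefix variant relaxes only the last token. This toy ignores slop and analysis and splits on whitespace, which are assumptions for illustration; the function name match_phrase is reused here as a plain Python helper.

```python
def match_phrase(doc_tokens, query_tokens, prefix_last=False):
    """True if query_tokens occur consecutively and in order in doc_tokens.

    With prefix_last=True the final query token only needs to be a prefix of
    the document token at that position (the match phrase prefix behaviour).
    """
    n = len(query_tokens)
    if n == 0:
        return False
    for start in range(len(doc_tokens) - n + 1):
        window = doc_tokens[start:start + n]
        if window[:-1] != query_tokens[:-1]:
            continue
        last_doc, last_query = window[-1], query_tokens[-1]
        if last_doc == last_query or (prefix_last and last_doc.startswith(last_query)):
            return True
    return False


doc = "the quick brown fox".split()
print(match_phrase(doc, ["quick", "brown"]))
print(match_phrase("brown quick".split(), ["quick", "brown"]))
print(match_phrase(doc, ["quick", "brown", "f"], prefix_last=True))
```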

———————————————————————————————————————————

Setting a keyword sub-field under fields in Elasticsearch so that a text field can be matched exactly

https://blog.csdn.net/u012976879/article/details/86598032

{
  "svc_search_spider_item" : {
    "mappings" : {
      "properties" : {
        "created_at" : {
          "type" : "long"
        },
        "item_id" : {
          "type" : "keyword"
        },
        "last_crawled_at" : {
          "type" : "long"
        },
        "status" : {
          "type" : "long"
        },
        "task_id" : {
          "type" : "keyword"
        },
        "title" : {
          "type" : "text",
          "fields" : {
            "standard" : {
              "type" : "text",
              "analyzer" : "standard"
            },
            "key" : {
              "type" : "keyword"
            }
          },
          "analyzer" : "my_ik_max_word",
          "search_analyzer" : "my_ik_smart"
        }
      }
    }
  }
}

// Query the sub-field analyzed with the standard analyzer.
// M is presumably a map type such as map[string]interface{}; its definition is not shown in the source.
boolMap["must"] = M{
	"multi_match": M{
		"query":  strings.TrimSpace(req.GetKeyword()),
		"type":   "phrase_prefix",
		"fields": []string{"name.standard"},
	},
}

// Query the keyword sub-field for exact matching.
boolMap["must"] = M{
	"multi_match": M{
		"query":  strings.TrimSpace(req.GetKeyword()),
		"type":   "phrase_prefix",
		"fields": []string{"name.key"},
	},
}

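Refreshing data

After sub-fields are added to the mapping, run an update by query over all documents so that existing documents are reindexed and pick up the new sub-fields:

POST /svc_search_spider_item/_update_by_query
{
  "query": {
    "match_all": {}
  }
}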

Changing an index mapping in ES to add fields

dev environment: ssh bolome@47.98.144.219

stag environment: kubectl exec -it centos7-59b89689cd-nv88k -- /bin/bash

curl --location --request PUT 'http://172.16.1.23:9200/svc_search_topic/_mapping' \
--header 'Content-Type: application/json' \
--data-raw '{
  "properties": {
    "catalogue": {
      "type": "text",
      "fields": {
        "standard": {
          "type": "text",
          "analyzer": "standard"
        }
      },
      "analyzer": "my_ik_max_word",
      "search_analyzer": "my_ik_smart"
    },
    "contentNum": {
      "type": "long"
    },
    "contentViews": {
      "type": "long"
    },
    "created_at": {
      "type": "long"
    },
    "desc": {
      "type": "text",
      "fields": {
        "standard": {
          "type": "text",
          "analyzer": "standard"
        }
      },
      "analyzer": "my_ik_max_word",
      "search_analyzer": "my_ik_smart"
    },
    "end_at": {
      "type": "long"
    },
    "id": {
      "type": "keyword"
    },
    "mpNum": {
      "type": "long"
    },
    "sort": {
      "type": "long"
    },
    "start_at": {
      "type": "long"
    },
    "title": {
      "type": "text",
      "fields": {
        "standard": {
          "type": "text",
          "analyzer": "standard"
        }
      },
      "analyzer": "my_ik_max_word",
      "search_analyzer": "my_ik_smart"
    },
    "status": {
      "type": "long"
    }
  }
}'

It is also possible to send only the newly added field:

PUT cimissgcdb/_mapping/agmedays
{
  "properties": {
    "TimeFormat": {
      "type": "date",
      "format": "yyyy-MM-dd HH:mm:ss"
    }
  }
}

———————————————————————————————————————————

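Spatial search, function queries, and so on

Find residential communities at a suitable distance and price that also satisfy the category restrictions, and sort them. The %v placeholders are filled in by the calling code.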

{
	"query": {
	  "function_score": {
		"functions": [
		  {
			"linear": {
			  "sort_1_flt": {
				"origin": 2000,
				"offset": 2000,
				"scale": 5000000,
				"decay": 0.01
			  }
			}
		  },
		  {
			"linear": {
			  "loc": {
				"origin": {
				  "lon": %v,
				  "lat": %v
				},
				"offset": "1km",
				"scale": "100km",
				"decay": 0.01
			  }
			}
		  }
		],
		"query": {
		  "bool": {
			"must": [
			  {
				"term": {
				  "city_id": "%v"
				}
			  },
			  {
				"term": {
				  "biz_sub_type": "%v"
				}
			  }
			],
			"must_not": [
			  {
				"term": {
				  "poi_bids": "%v"
				}
			  },
			  {
				"term": {
				  "sort_1_flt": 0
				}
			  }
			]
		  }
		}
	  }
	},
	"sort": [
	  {
		"weight": {
		  "order": "desc"
		}
	  },
	  {
		"_score": {
		  "order": "desc"
		}
	  }
	],
	"from": %v,
	"size": %v
  }
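The two linear functions in the query above reduce the score as a field value moves away from origin. The sketch below follows the linear decay formula as described in the Elasticsearch function_score documentation, to the best of my reading; it works on plain numbers only, so the geo distance case (units such as km) is left out, and the function name linear_decay is an assumption.

```python
def linear_decay(value, origin, scale, offset=0.0, decay=0.5):
    """Linear decay: 1.0 within offset of origin, `decay` at offset + scale, then down to 0."""
    # Slope chosen so that the score equals `decay` at distance offset + scale.
    s = scale / (1.0 - decay)
    distance = max(0.0, abs(value - origin) - offset)
    return max((s - distance) / s, 0.0)


# Parameters of the sort_1_flt function in the query above.
params = dict(origin=2000, scale=5000000, offset=2000, decay=0.01)
print(linear_decay(2000, **params))
print(linear_decay(2000 + 2000 + 5000000, **params))
```

Values within offset of origin keep the full score of 1.0, and at a distance of offset plus scale the score has dropped to the configured decay.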