ElasticSearch Study Notes, Part 1

Latest recommended article published on 2023-03-03 18:00:37

黑色幽默595

Category: Databases. Tags: elasticsearch, learning, search engine

Article link: https://blog.csdn.net/MXqihang/article/details/125302688

ElasticSearch Study Notes (1)

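These notes follow the official Elasticsearch documentation, version 6.8.6.

Request tool: Apifox

Installation tutorial

Elasticsearch exposes a RESTful API over HTTP, with JSON as the data exchange format.

curl -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'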

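Parameter	Description
verb	HTTP method: `GET`, `POST`, `PUT`, `HEAD`, `DELETE`
protocol	`http` or `https` (`https` works only when an HTTPS proxy sits in front of Elasticsearch)
host	Address of any node in the Elasticsearch cluster
port	Port the HTTP service listens on (9200 by default)
path	API endpoint path. For example, `_count` returns the number of documents in the cluster. A path may have several components, such as `_cluster/stats` or `_nodes/stats/jvm`
query_string	Optional query-string parameters; for example, `pretty` returns pretty-printed JSON
body	A JSON-encoded request body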

curl -XGET 'http://localhost:9200/_count?pretty' -H 'Content-Type: application/json' -d '
{
    "query": {
        "match_all": {}
    }
}
'
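
The request template above can also be assembled programmatically. The following is a minimal sketch using only the Python standard library; `build_request` and its parameter names are illustrative and not part of any Elasticsearch client. It only builds the request, so it runs without a cluster.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_request(verb, path, query=None, body=None,
                  protocol="http", host="localhost", port=9200):
    """Assemble VERB PROTOCOL://HOST:PORT/PATH?QUERY_STRING plus a JSON body."""
    url = f"{protocol}://{host}:{port}/{path.lstrip('/')}"
    if query:
        url += "?" + urlencode(query)
    # A body is optional; when present it is sent as JSON.
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(url, data=data, headers=headers, method=verb.upper())


req = build_request("get", "_count", {"pretty": "true"},
                    {"query": {"match_all": {}}})
print(req.get_method(), req.full_url)
# Sending it needs a running node: urllib.request.urlopen(req)
```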

Elasticsearch -> Indices -> Types -> Documents -> Fields

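An Elasticsearch cluster can contain multiple indices (comparable to databases). Each index can contain multiple types (tables), each type multiple documents (rows), and each document multiple fields (columns). Note that an index created on 6.x can hold only one type, so the index/type hierarchy is best read as a conceptual analogy.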

Creating a document (one row of data)

Path: /megacorp/employee/1

megacorp: the index name (created automatically if it does not exist yet)

employee: the type name

1: the ID of this employee document

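Retrieving data

Send a GET request to the same path; the original document comes back in the `_source` field.
The same path also accepts the following methods:

DELETE removes the document

PUT replaces the document

HEAD checks whether the document exists
curl -I 'http://localhost:9200/{index}/{type}/{id}'
# Status 200: the document exists
# Status 404: the document does not exist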
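
As a small illustration of the HEAD check, the sketch below turns the two status codes into a boolean. `document_path` and `exists_from_status` are hypothetical helper names; the status code would come from whatever HTTP client is in use.

```python
def document_path(index, doc_type, doc_id):
    """Build the /{index}/{type}/{id} path used by GET, PUT, DELETE and HEAD."""
    return f"/{index}/{doc_type}/{doc_id}"


def exists_from_status(status):
    """Interpret the status code of a HEAD request on a document path."""
    if status == 200:
        return True
    if status == 404:
        return False
    # Anything else (for example a server error) is not an answer about existence.
    raise ValueError(f"unexpected status for HEAD: {status}")


print(document_path("megacorp", "employee", 1))
print(exists_from_status(404))
```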
To list all employees, send a GET request to `_search`: http://127.0.0.1:9200/megacorp/employee/_search

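Retrieving part of a document

To return only some fields, pass the `_source` parameter with a comma-separated list of field names:

GET /{index}/{type}/{id}?_source=age,about

To skip the metadata and return only the `_source` content:

GET /{index}/{type}/{id}/_source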
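
The two variants above differ only in how the path is built. A minimal sketch, assuming a hypothetical helper named `get_path`:

```python
def get_path(index, doc_type, doc_id, fields=None, source_only=False):
    """Build a GET path, optionally limited to some fields or to _source only."""
    path = f"/{index}/{doc_type}/{doc_id}"
    if source_only:
        # Drops the metadata (_index, _type, _id, ...) from the response.
        path += "/_source"
    if fields:
        # Comma-separated field names select part of the document.
        path += "?_source=" + ",".join(fields)
    return path


print(get_path("megacorp", "employee", 1, fields=["age", "about"]))
print(get_path("megacorp", "employee", 1, source_only=True))
```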

Multi-get: `_mget` takes a `docs` array, and each entry can set `_source` to select the fields to return.

POST /_mget
{
	"docs": [
		{
			"_index": "index1",
			"_type": "type1",
			"_id": 1
		},
		{
			"_index": "index1",
			"_type": "type1",
			"_id": 2,
			"_source": "age"
		}
	]
}

POST /{index}/{type}/_mget
{
	"ids": ["1", "2"]
}
# Fetch several IDs within one type
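
Both `_mget` body shapes are plain JSON and easy to generate. The sketch below builds them as Python dictionaries; `mget_docs_body` and `mget_ids_body` are illustrative names, not client API.

```python
import json


def mget_docs_body(docs):
    """docs: iterable of (index, type, id, source) tuples; source may be None."""
    entries = []
    for index, doc_type, doc_id, source in docs:
        entry = {"_index": index, "_type": doc_type, "_id": doc_id}
        if source is not None:
            # Per-document field selection, as in the second entry above.
            entry["_source"] = source
        entries.append(entry)
    return {"docs": entries}


def mget_ids_body(ids):
    """Short form for POST /{index}/{type}/_mget: only the IDs are needed."""
    return {"ids": [str(i) for i in ids]}


body = mget_docs_body([("index1", "type1", 1, None),
                       ("index1", "type1", 2, "age")])
print(json.dumps(body, indent=2))
print(json.dumps(mget_ids_body([1, 2])))
```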

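Conditional search: query string

Add some sample data of your own first.

Now find the people whose last name contains Smith. (English text is tokenized on word boundaries such as spaces; Chinese text needs an analyzer plugin that you install yourself.)

In a query-string search, `_search` on its own returns all documents, and the `q=` parameter supplies the condition to match.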

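GET /_search?q=Smith

This matches against all fields, not only `last_name`. To restrict the search to one field, prefix the term with the field name: `q=last_name:Smith`.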
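
When the query string is built in code, the `q` value should be URL-encoded. A minimal sketch with the standard library; `query_string_search` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def query_string_search(q, index=None, doc_type=None):
    """Build a _search path with a URL-encoded q= parameter."""
    parts = [p for p in (index, doc_type) if p]
    path = "/" + "/".join(parts + ["_search"])
    return path + "?" + urlencode({"q": q})


# All fields, all indices.
print(query_string_search("Smith"))
# One field, one index and type; the colon is percent-encoded.
print(query_string_search("last_name:Smith", "megacorp", "employee"))
```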

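Complex queries: DSL

The DSL (domain-specific language) expresses a query as a JSON request body.

GET /megacorp/employee/_search
{
  "query" : {
    "match" : {
      "last_name" : "Smith"
    }
  }
}

This works like the `q=` search above, except that the condition lives in the JSON body and the `match` query targets the `last_name` field only.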

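GET /megacorp/employee/_search
{
  "query" : {
    "bool" : {
      "must" : {
        "match" : {
          "last_name" : "smith"
        }
      },
      "filter" : {
        "range" : {
          "age" : { "gt" : 30 } <1>
        }
      }
    }
  }
}

This returns the employees who are older than 30 and whose last name is Smith. The `filtered` query that older tutorials use for this was removed in Elasticsearch 5.0; on 6.8 the equivalent is a `bool` query with a `must` clause and a `filter` clause.

<1> A `range` filter; `gt` is short for "greater than".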
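
On Elasticsearch 5.0 and later, a full-text condition combined with a range filter is written as a `bool` query. The sketch below generates such a body; `match_with_range` is an illustrative name, and the output is only the JSON request body, not a client call.

```python
import json


def match_with_range(field, text, range_field, **bounds):
    """Combine a scored match clause with a non-scoring range filter."""
    return {
        "query": {
            "bool": {
                # must: contributes to the relevance score
                "must": {"match": {field: text}},
                # filter: yes/no condition, e.g. gt (greater than), lte, ...
                "filter": {"range": {range_field: bounds}},
            }
        }
    }


body = match_with_range("last_name", "smith", "age", gt=30)
print(json.dumps(body, indent=2))
```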

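Full-text search

Full-text search is something traditional databases struggle to provide.

Every hit in the result carries a `_score`, the relevance score, which measures how well the document matches the query.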

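Phrase search

Besides searching for single words, you sometimes need to match words that appear next to each other, such as "rock climbing". For that, replace `match` with `match_phrase`.

GET /megacorp/employee/_search
{
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    }
}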

Highlighting

Add a `highlight` section to mark the parts of each result that match the query.

GET /megacorp/employee/_search
{
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    },
    "highlight" : {
        "fields" : {
            "about" : {}
        }
    }
}
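
The phrase and highlight requests share most of their body. A minimal sketch that builds either one; `phrase_search` is a hypothetical helper name.

```python
import json


def phrase_search(field, phrase, highlight=False):
    """Build a match_phrase body, optionally asking for highlighted fragments."""
    body = {"query": {"match_phrase": {field: phrase}}}
    if highlight:
        # An empty object requests the default highlighting for that field.
        body["highlight"] = {"fields": {field: {}}}
    return body


print(json.dumps(phrase_search("about", "rock climbing", highlight=True), indent=2))
```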

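Updating data: partial updates

POST /{index}/{type}/{id}/_update
{
	"doc": {
		"views": 0
	}
}

This adds a `views` field to the existing document.

POST /{index}/{type}/{id}/_update
{
	"script": "ctx._source.views += 1"
}

The update script increments `views`; `ctx._source` refers to the document's content.

Before: "tags": ["value1"]

On 6.x, script parameters go inside the `script` object and are read through `params`:

POST /{index}/{type}/{id}/_update
{
	"script": {
		"source": "ctx._source.tags.add(params.new_tag)",
		"params": {
			"new_tag": "value2"
		}
	}
}

Result: "tags": ["value1","value2"]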

POST /{index}/{type}/{id}/_update
{
	"script": {
		"source": "ctx.op = ctx._source.views == params.count ? 'delete' : 'none'",
		"params": {
			"count": 1
		}
	}
}

Setting `ctx.op` to `delete` removes the document; `none` leaves it untouched.

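POST /{index}/{type}/{id}/_update
{
	"script": "ctx._source.views += 1",
	"upsert": {
		"views": 1
	}
}

Updating a document that does not exist fails with 404. With `upsert`, the given content is indexed as a new document when none exists yet; otherwise the script runs.

POST /{index}/{type}/{id}/_update?retry_on_conflict=5 {}

`retry_on_conflict` sets how many times the update is retried after a version conflict before an error is returned; the request above retries up to 5 times.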
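
The update variants above (partial `doc`, script with parameters, `upsert`, `retry_on_conflict`) can be generated by one small function. This is a sketch under the assumption that scripts use the 6.x object form with `source` and `params`; `update_request` is an illustrative name.

```python
def update_request(index, doc_type, doc_id, *, doc=None, script=None,
                   params=None, upsert=None, retries=0):
    """Return (path, body) for an _update call; pass either doc or script."""
    if (doc is None) == (script is None):
        raise ValueError("pass exactly one of doc or script")
    path = f"/{index}/{doc_type}/{doc_id}/_update"
    if retries:
        path += f"?retry_on_conflict={retries}"
    body = {}
    if doc is not None:
        body["doc"] = doc  # merged into the existing document
    else:
        body["script"] = {"source": script, "params": params or {}}
    if upsert is not None:
        body["upsert"] = upsert  # indexed as-is when the document is missing
    return path, body


path, body = update_request("website", "blog", 1,
                            script="ctx._source.views += params.step",
                            params={"step": 1}, upsert={"views": 1}, retries=5)
print(path)
print(body)
```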

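Data

Document metadata

Field	Description
_index	Where the document is stored (think of it as a database)
_type	The class of object the document represents (think of it as a table)
_id	The unique identifier of the document
_version	Version number
found	`true` if the document was found, `false` if it does not exist
_source	The data that was stored

# Create a document
PUT /{index}/{type}/{id} { json... }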

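Auto-generated IDs

Note that this is a POST request and that the path carries no ID.
Auto-generated IDs are URL-safe, Base64-encoded strings: 22-character UUIDs in early releases, 20 characters on recent versions.

POST /{index}/{type}/ { json... }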

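Updating a document

Documents in Elasticsearch are immutable and cannot be modified in place. To update an existing document, replace it with a PUT to the same index, type and ID. A response with `"_version": 2` shows that the update succeeded. Internally, Elasticsearch marks the old version as deleted; it is not removed right away, but it can no longer be accessed.

Elasticsearch identifies a document uniquely by the combination of `_index`, `_type` and `_id`. To create a document only when it does not exist yet, use either of the following:

# Option 1
PUT /{index}/{type}/{id}?op_type=create

# Option 2
PUT /{index}/{type}/{id}/_create

On success the response status is 201 Created; if the document already exists, it is 409 Conflict.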
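
The interplay of replace, create-only, status codes and `_version` can be modelled in a few lines. The class below is a toy in-memory model for illustration only; it is not how Elasticsearch stores data, and `ToyIndex` is a made-up name.

```python
class ToyIndex:
    """In-memory stand-in that mimics status codes and version bumps."""

    def __init__(self):
        self._docs = {}

    def put(self, doc_id, source, create_only=False):
        """Return (status, version) like PUT, or PUT .../_create when create_only."""
        current = self._docs.get(doc_id)
        if current is None:
            self._docs[doc_id] = {"_version": 1, "_source": source}
            return 201, 1  # Created
        if create_only:
            return 409, current["_version"]  # Conflict: already exists
        # A plain PUT replaces the whole document and bumps the version.
        version = current["_version"] + 1
        self._docs[doc_id] = {"_version": version, "_source": source}
        return 200, version


idx = ToyIndex()
print(idx.put("1", {"title": "first"}))
print(idx.put("1", {"title": "second"}))
print(idx.put("1", {"title": "third"}, create_only=True))
```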

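Bulk operations

Action	Description
create	Create the document only if it does not exist yet
index	Create the document, or replace an existing one
update	Partial update
delete	Delete the document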

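POST /_bulk
{ "delete": { "_index": "xx", "_type": "xx", "_id": "xx" } }
{ "create": { "_index": "xx", "_type": "xx", "_id": "xx" } }
{ "field": "value" }
{ "update": { "_index": "xx", "_type": "xx", "_id": "xx", "retry_on_conflict": 5 } }
{ "doc": { "field": "value" } }

POST /website/_bulk
{}{}

`_bulk` carries many operations in one request. Each `{}` action is processed independently, and the response lists one result per action in the same order. Every action except `delete` is followed by a line with its body; the `field`/`value` lines above are placeholders.

With `/website/_bulk`, the index is given in the URL. When all actions target the same index, repeating that metadata on every action line is redundant, so it can be left out of the action lines.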
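
A `_bulk` body is newline-delimited JSON: one action line, then a body line for every action except `delete`, and a final newline at the end. The sketch below serializes such a body; `bulk_body` and the sample metadata are illustrative.

```python
import json


def bulk_body(actions):
    """actions: list of (action, metadata, payload); payload is None for delete."""
    lines = []
    for action, meta, payload in actions:
        lines.append(json.dumps({action: meta}))
        if action == "delete":
            continue  # delete has no body line
        if payload is None:
            raise ValueError(f"{action} needs a payload")
        # update wraps the partial document in "doc"; create/index send it as-is.
        lines.append(json.dumps({"doc": payload} if action == "update" else payload))
    # The bulk API expects every line, including the last, to end with a newline.
    return "\n".join(lines) + "\n"


print(bulk_body([
    ("delete", {"_type": "blog", "_id": "1"}, None),
    ("create", {"_type": "blog", "_id": "2"}, {"title": "hello"}),
    ("update", {"_type": "blog", "_id": "3"}, {"views": 1}),
]), end="")
```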

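Empty search

GET /_search

Response field	Description
hits	Contains the `total` field, the total number of matching documents
took	Time the search request took, in milliseconds
_shards	Number of shards queried (`total`), and how many succeeded (`successful`) and failed (`failed`)
timed_out	Whether the query timed out; a limit can be set with the `timeout` request parameter, e.g. `10ms` or `1s`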

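Indices

Path	Description
/_search	Search all indices
/name/_search	Search the name index
/name1,name2/_search	Search the name1 and name2 indices
/n*,s*/_search	Wildcard match: search every index whose name starts with n or s
/name/type/_search	Search the given type in the given index
/name1,name2/type1,type2/_search	Search types type1 and type2 in indices name1 and name2
/_all/type1,type2/_search	Search types type1 and type2 in all indices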

Pagination

Fetch pages 1 to 3 of the results, 5 per page:

GET /_search?size=5
GET /_search?size=5&from=5
GET /_search?size=5&from=10

GET /_search
{
	"from": 5,
	"size": 10
}
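
The `from` value for a given page is simply the number of results to skip. A minimal sketch, with pages counted from 1; `page_params` is a hypothetical helper name.

```python
def page_params(page, size=5):
    """Translate a 1-based page number into from/size search parameters."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    # Skip all results that belong to the earlier pages.
    return {"from": (page - 1) * size, "size": size}


for page in (1, 2, 3):
    print(page, page_params(page))
```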

Mappings

GET /{index}/_mapping/{type}

This shows the field mappings of the given type.

"data": { "type": "date", "format": "dateOptionalTime" }

The `data` field is mapped as type `date`.

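Category	Field types
string	`text`, `keyword` (a single `string` type before 5.0)
whole number	byte, short, integer, long
floating point	float, double
date	date

Dynamic mapping: JSON type	Field type
Boolean: true or false	boolean
whole number: 123	long
floating point: 123.123	double (`float` on 5.0 and later)
string, valid date: "2022-01-01"	date
string: "xx"	string (`text` with a `keyword` sub-field on 5.0 and later)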
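
The dynamic-mapping table can be read as a small decision procedure. The sketch below mirrors the table as written, using its generic `string` and `double` labels; it is a simplification, since the real date detection in Elasticsearch is configurable and the function name `guess_field_type` is made up for this example.

```python
from datetime import date


def guess_field_type(value):
    """Mimic the dynamic-mapping table for a single JSON value."""
    # bool must be tested before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        try:
            date.fromisoformat(value)  # simplified stand-in for date detection
            return "date"
        except ValueError:
            return "string"
    raise TypeError(f"unsupported JSON value: {value!r}")


for sample in (True, 123, 123.123, "2022-01-01", "xx"):
    print(repr(sample), "->", guess_field_type(sample))
```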

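{
  "name": {
    "type": "keyword"
  }
}

Before 5.0, a `string` field was `analyzed` by default, and the `index` setting chose among the values below. On 6.8 the `string` type no longer exists: `text` and `keyword` take over the first two cases, and `index` is a boolean.

Pre-5.0 value	Description	6.x equivalent
analyzed	Analyze the string, then index it; the field is indexed as full text	`"type": "text"`
not_analyzed	Index the field without analyzing it; it is searchable, but only by its exact value	`"type": "keyword"`
no	Do not index the field; it cannot be searched	`"index": false`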

Adding a mapping

PUT /{index}/_mapping/{type}
{
	"properties": {
		"name": {
			"type": "keyword"
		}
	}
}
