1. Simple Demo
Index
(1) Create an index named demo
PUT http://localhost:9200/demo
ES response
{
"acknowledged": true,
"shards_acknowledged": true,
"index": "demo"
}
(2) Specify the number of primary shards and replica shards when creating the index
PUT http://localhost:9200/demo
{
"settings":{
"number_of_shards":1,
"number_of_replicas":1
}
}
ES response
{
"acknowledged": true,
"shards_acknowledged": true,
"index": "demo"
}
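The same create-index call can be prepared from Python. The following is a minimal sketch using only the standard library; the helper name build_create_index_request and the default host are assumptions made for illustration, and the request is built but not sent so that it runs without a live node.

```python
import json
import urllib.request


def build_create_index_request(index, shards=1, replicas=1,
                               host="http://localhost:9200"):
    """Build (but do not send) the PUT request that creates an index."""
    body = {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        }
    }
    return urllib.request.Request(
        url=f"{host}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_create_index_request("demo", shards=1, replicas=1)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To send it against a running node:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

Keeping the body construction separate from the network call makes it easy to inspect the settings before anything is created.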
(3) View the index
GET http://localhost:9200/demo
ES response
{
"demo": {
"aliases": {},
"mappings": {},
"settings": {
"index": {
"creation_date": "1561110747038",
"number_of_shards": "1",
"number_of_replicas": "1",
"uuid": "kjPqDUt6TMyywg1P7qgccw",
"version": {
"created": "5060499"
},
"provided_name": "demo"
}
}
}
}
(4) View index status
GET http://localhost:9200/_cat/indices?v
ES response
health | status | index | uuid | pri | rep | docs.count |
yellow | open | demo | wqkto5CCTpWNdP3HGpLfxA | 5 | 1 | 0 |
yellow | open | .kibana | pwKW9hJyRkO7_pE0MNE05g | 1 | 1 | 1 |
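ES currently holds two indices: demo, which was just created, and the index created by Kibana. The columns returned by _cat/indices are described below (the table above shows only the first seven):
- health: health status. red means at least one primary shard is unavailable. yellow means all primary shards are available but at least one replica shard is not, which is the usual state of a single-node ES. green means all primary and replica shards are available.
- status: index status. open means the documents in the index can be read and written. close means the index is closed: the memory it used is released, but the index cannot be read or written.
- index: index name.
- uuid: index identifier.
- pri: number of primary shards of the index.
- rep: number of replicas per primary shard. 1 means each primary shard has one replica, so there are as many replica shards as primary shards.
- docs.count: number of documents.
- docs.deleted: number of deleted documents.
- store.size: total size of the index.
- pri.store.size: size occupied by the primary shards.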
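To act on these columns from a script, the text output can be turned into dictionaries. The sketch below is only an illustration: parse_cat_indices is a hypothetical helper, the sample text is modeled on the table above, and it assumes the raw output separates columns with whitespace and that every column is populated (a closed index may leave some columns empty, which this simple parser does not handle).

```python
def parse_cat_indices(text):
    """Parse whitespace-separated `_cat/indices?v` output into a list of dicts."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


# Sample text modeled on the table above, with columns separated by spaces.
sample = """
health status index   uuid                   pri rep docs.count
yellow open   demo    wqkto5CCTpWNdP3HGpLfxA 5   1   0
yellow open   .kibana pwKW9hJyRkO7_pE0MNE05g 1   1   1
"""

rows = parse_cat_indices(sample)
# Collect the indices whose health is not green.
not_green = [row["index"] for row in rows if row["health"] != "green"]
print(not_green)
```

All values are kept as strings; convert pri, rep, and docs.count with int() if numeric comparison is needed.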
(5) Delete the index
DELETE http://localhost:9200/demo
ES response
{
"acknowledged": true
}
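Type (defining the mapping fields and their types at the same time)
Create the type example_type for the index demo, with two fields: created of type date and message of type keyword.
(1) Method 1 (the index demo must already exist; a type created this way cannot be deleted afterwards, only modified, for example by adding new fields, so use it with caution)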
PUT http://localhost:9200/demo/_mapping/example_type
{
"properties":{
"created":{
"type":"date"
},
"message":{
"type":"keyword"
}
}
}
(2) Method 2 (define the type and its mapping while creating the index; this is the commonly used way)
PUT http://localhost:9200/demo
{
"mappings":{
"example_type":{
"properties":{
"created":{
"type":"date"
},
"message":{
"type":"keyword"
}
}
}
}
}
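When several indices share this shape, the body of Method 2 can be generated instead of written by hand. This is a minimal sketch; build_mapping is a hypothetical helper, not part of any ES client, and it only covers fields that need nothing beyond a type.

```python
import json


def build_mapping(doc_type, field_types):
    """Return a create-index body that declares one type and its fields."""
    properties = {name: {"type": es_type} for name, es_type in field_types.items()}
    return {"mappings": {doc_type: {"properties": properties}}}


body = build_mapping("example_type", {"created": "date", "message": "keyword"})
print(json.dumps(body, indent=2))
```

The resulting dictionary has the same structure as the request body shown in Method 2.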
Document
(1) Insert a document
POST http://localhost:9200/demo/example_type
{
"created":1561135459000,
"message":"test1"
}
ES response
{
"_index": "demo",
"_type": "example_type",
"_id": "AWt67Ql_Tf0FgxupYlBX",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"created": true
}
(2) Query documents
POST http://localhost:9200/demo/example_type/_search?pretty
ES response
{
"took": 183,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "demo",
"_type": "example_type",
"_id": "AWt67Ql_Tf0FgxupYlBX",
"_score": 1,
"_source": {
"created": 1561135459000,
"message": "test1"
}
}
]
}
}
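The documents themselves sit under hits.hits in the response. As a small sketch of how a client might unpack them, the hypothetical helper extract_sources below works on a dictionary shaped like the response above (trimmed to the keys it uses).

```python
def extract_sources(response):
    """Return (_id, _source) pairs from a search response dict."""
    return [(hit["_id"], hit["_source"]) for hit in response["hits"]["hits"]]


# Trimmed copy of the search response shown above.
response = {
    "hits": {
        "total": 1,
        "hits": [
            {
                "_id": "AWt67Ql_Tf0FgxupYlBX",
                "_source": {"created": 1561135459000, "message": "test1"},
            }
        ],
    }
}

for doc_id, source in extract_sources(response):
    print(doc_id, source["message"])
```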
(3) Update a document
Update the document by its _id
POST http://localhost:9200/demo/example_type/AWt67Ql_Tf0FgxupYlBX/_update
{
"doc":{
"message":"updated"
}
}
ES response
{
"_index": "demo",
"_type": "example_type",
"_id": "AWt67Ql_Tf0FgxupYlBX",
"_version": 2,
"result": "updated",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
}
}
(4) Delete a document
Delete the document by its _id
DELETE http://localhost:9200/demo/example_type/AWt67Ql_Tf0FgxupYlBX
ES response
{
"found": true,
"_index": "demo",
"_type": "example_type",
"_id": "AWt67Ql_Tf0FgxupYlBX",
"_version": 2,
"result": "deleted",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
}
}
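2. Tokenization
(1) ES tokenizes only string fields. In ES 2.x the only string type was string; from ES 5.x on, strings are split into two types, text and keyword. Only text fields are tokenized; keyword fields are not, so keyword is typically used to index a value as a single whole term.
(2) The default ES analyzer is standard. It works well for English but poorly for Chinese, because it splits Chinese text into single characters: "中国" becomes "中" and "国". For Chinese text, the IK analyzer is therefore commonly used.
(3) Installing the IK plugin (download the prebuilt zip file; its version must match the ES version): https://github.com/medcl/elasticsearch-analysis-ik/releases/tag/v6.3.2. Earlier IK releases: https://github.com/medcl/elasticsearch-analysis-ik/releases. After downloading, unzip it, put the elasticsearch-analysis-ik-6.3.2 folder directly into the plugins directory under the ES installation directory, and restart ES.
(4) The IK analyzer has two modes, ik_smart and ik_max_word.
- ik_smart: coarse-grained segmentation. For example, "北京大学" is kept as a single token.
- ik_max_word: fine-grained segmentation. For example, "北京大学" is split into "北京大学", "北京大", "北京", "大学".
- ik_max_word produces many noisy, useless tokens, so ik_smart is the mode we usually use.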
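The two modes can be compared on a running node through the _analyze API, which takes an analyzer name and a text. The sketch below only builds the requests with the standard library so that it runs without ES; the helper names and the default host are assumptions, and the sample response is hand-written in the shape of an _analyze response rather than captured from a server.

```python
import json
import urllib.request


def build_analyze_request(text, analyzer, host="http://localhost:9200"):
    """Build (but do not send) a POST /_analyze request."""
    body = {"analyzer": analyzer, "text": text}
    return urllib.request.Request(
        url=f"{host}/_analyze",
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def tokens_from_response(response):
    """Pull the token strings out of an _analyze response dict."""
    return [item["token"] for item in response["tokens"]]


requests_to_send = [
    build_analyze_request("北京大学", name) for name in ("ik_smart", "ik_max_word")
]
for req in requests_to_send:
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))

# Hand-written sample in the shape of an _analyze response (not real server output).
sample_response = {
    "tokens": [
        {"token": "北京大学", "start_offset": 0, "end_offset": 4, "position": 0}
    ]
}
print(tokens_from_response(sample_response))
```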
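(5) Custom dictionary
Go to the config directory under the IK plugin directory and create a custom dictionary file named custom.dic. Add "小米手机" to it and save; this file is now the user dictionary. Still in the config directory, edit IKAnalyzer.cfg.xml to register it.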
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
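<comment>IK Analyzer extension configuration</comment>
<!-- Users can configure their own extension dictionary here -->
<entry key="ext_dict">custom.dic</entry>
<!-- Users can configure their own extension stopword dictionary here -->
<entry key="ext_stopwords"></entry>
<!-- Users can configure a remote extension dictionary here -->
<!-- <entry key="remote_ext_dict">words_location</entry> -->
<!-- Users can configure a remote extension stopword dictionary here -->
<!-- <entry key="remote_ext_stopwords">words_location</entry> -->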
</properties>
After restarting ES, analyze "小米手机" with ik_smart again: it is no longer split.
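Why a dictionary entry changes the result can be illustrated with a toy segmenter. The sketch below uses forward maximum matching, which is a simplified stand-in chosen for illustration and not IK's actual algorithm; the contents of base_dictionary are also an assumption.

```python
def forward_max_match(text, dictionary):
    """Toy segmenter: at each position take the longest dictionary word, else one character."""
    max_len = max((len(word) for word in dictionary), default=1)
    tokens, pos = [], 0
    while pos < len(text):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                pos += size
                break
    return tokens


# Assumed base dictionary: it knows "小米" and "手机" but not the compound word.
base_dictionary = {"小米", "手机"}
print(forward_max_match("小米手机", base_dictionary))
# After adding the custom word, the whole string matches as one token.
print(forward_max_match("小米手机", base_dictionary | {"小米手机"}))
```

In this toy model the custom word wins simply because it is the longest match at position 0.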
3. Operating ES from Python
from elasticsearch import Elasticsearch
es = Elasticsearch(['127.0.0.1:9200'])
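# Delete the index
# es.delete(index='word', id=1) # es.delete removes a single document and requires an id
result = es.indices.delete(index='word', ignore=[404]) # es.indices.delete only needs index; ignore=[404] avoids an error when the index does not exist yet
print("delete = ", result)
# Create the index
# es.create(index="word", id=1, body={"name":"python","addr":"深圳"}) # es.create creates a document and requires body and id
# result = es.indices.create(index="word") # es.indices.create only needs index
# print("create = ", result)
# Create an index with a type and a mapping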
mapping = {
"mappings": {
"analyser_demo": { # doc_type
"properties": {
"name": {
"type": "text",
"analyzer":"ik_smart"
}
}
}
}
}
result = es.indices.create(index="word", body=mapping)
print("create type mapping = ", result)
# Insert data
result = es.index(index="word", doc_type="analyser_demo", body={"name":"张三"})
result = es.index(index="word", doc_type="analyser_demo", body={"name":"李四"})
result = es.index(index="word", doc_type="analyser_demo", body={"name":"软件"})
result = es.index(index="word", doc_type="analyser_demo", body={"name":"我们是软件工程师"})
print("insert = ", result)
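# http://localhost:9200/word/_search?pretty is an unconditional search that returns the data just inserted into the index word
# Newly inserted documents are not searchable immediately; they become visible after the next index refresh, so refresh explicitly (or wait) before searching
es.indices.refresh(index="word")
# Simple queries; only the last assignment to query takes effect, so comment out the others to try each one
query = {'query': {'match_all': {}}} # match all documents
query = {'query': {'term': {"name": "软件"}}} # term: exact lookup, the query string is not analyzed
query = {'query': {'match': {"name": "软件"}}} # match: full-text lookup, the query string is analyzed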
result = es.search(index="word", doc_type="analyser_demo", body=query)
hits = []
for data in result["hits"]["hits"]:
    hits.append(data["_source"]["name"])
print(hits)
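The difference between term and match in the script above can be sketched with a toy inverted index. Everything here is illustrative: the token lists in ASSUMED_TOKENS stand in for the analyzer's output and real ik_smart results may differ, and the two query functions only mimic the lookup behavior, not scoring.

```python
from collections import defaultdict

# Assumed token lists standing in for the analyzer's output; real ik_smart output may differ.
ASSUMED_TOKENS = {
    "张三": ["张三"],
    "李四": ["李四"],
    "软件": ["软件"],
    "我们是软件工程师": ["我们", "是", "软件", "工程师"],
    "软件工程师": ["软件", "工程师"],
}


def analyze(text):
    """Return the assumed tokens for a text, or the text itself as one token."""
    return ASSUMED_TOKENS.get(text, [text])


def build_inverted_index(docs):
    """Map each token to the set of document positions that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in analyze(text):
            index[token].add(doc_id)
    return index


def term_query(index, value):
    """term: look the value up as a single token, with no analysis."""
    return sorted(index.get(value, set()))


def match_query(index, text):
    """match: analyze the query text, then take the union of each token's documents."""
    found = set()
    for token in analyze(text):
        found |= index.get(token, set())
    return sorted(found)


# The four documents indexed in the script above.
docs = ["张三", "李四", "软件", "我们是软件工程师"]
index = build_inverted_index(docs)
print([docs[i] for i in term_query(index, "软件工程师")])
print([docs[i] for i in match_query(index, "软件工程师")])
```

Under these assumed tokens, the term lookup finds nothing for "软件工程师" because no single indexed token equals the whole string, while the match lookup finds every document containing "软件" or "工程师".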
4. Complex Queries
import json
import requests
from elasticsearch import Elasticsearch
es = Elasticsearch(['127.0.0.1:9200'])
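# Delete the index
result = es.indices.delete(index='company', ignore=[404]) # es.indices.delete only needs index; ignore=[404] avoids an error when the index does not exist yet
print("delete = ", result)
print()
# Create an index with a type and a mapping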
mapping = {
"mappings":{
"employee":{
"properties":{
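# employee id: string, keyword type, not analyzed
"id":{
"type":"keyword"
},
# employee name: string, text type, analyzed
# employee name.keyword: string, keyword type, not analyzed
"name":{
"type":"text",
"analyzer":"ik_smart",
"fields":{
"keyword":{
"type":"keyword",
"ignore_above":256 # strings longer than 256 characters are not indexed in this sub-field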
}
}
},
# employee sex: string, keyword type, not analyzed
"sex":{
"type":"keyword"
},
# employee age: integer type
"age":{
"type":"integer"
},
# employee birthday: date type
"birthday":{
"type":"date"
},
# employee position: string, text type, analyzed
# employee position.keyword: string, keyword type, not analyzed
"position":{
"type":"text",
"analyzer":"ik_smart",
"fields":{
"keyword":{
"type":"keyword",
"ignore_above":256
}
}
},
# employee level: a join field declaring the parent/child relations superior -> staff -> junior
"level":{
"type":"join",
"relations":{
"superior":"staff",
"staff":"junior"
}
},
# employee departments: string, text type, analyzed
# employee departments.keyword: string, keyword type, not analyzed
"departments":{
"type":"text",
"analyzer":"ik_smart",
"fields":{
"keyword":{
"type":"keyword",
"ignore_above":256
}
}
},
# time the employee joined the company
"joinTime":{
"type":"date"
},
# time of the last modification
"modified":{
"type":"date"
},
# time this record was created
"created":{
"type":"date"
}
}
}
}
}
result = es.indices.create(index="company", body=mapping)
print("create type mapping = ", result)
print()
# Insert data
body1 = {
"id": "1",
"name": "张三",
"sex": "男",
"age": 49,
"birthday": "1970-01-01",
"position": "董事长",
"level": {
"name": "superior"
},
"joinTime": "1990-01-01",
"modified": "1562167817000",
"created": "1562167817000"
}
body2 = {
"id": "2",
"name": "李四",
"sex": "男",
"age": 39,
"birthday": "1980-04-03",
"position": "总经理",
"level": {
"name": "staff",
"parent": "1"
},
"departments": ["市场部", "研发部"],
"joinTime": "2001-02-02",
"modified": "1562167817000",
"created": "1562167817000"
}
body3 = {
"id": "3",
"name": "王五",
"sex": "女",
"age": 27,
"birthday": "1992-09-01",
"position": "销售",
"level": {
"name": "junior",
"parent": "2"
},
"departments": ["市场部"],
"joinTime": "2010-07-01",
"modified": "1562167817000",
"created": "1562167817000"
}
body4 = {
"id": "4",
"name": "赵六",
"sex": "男",
"age": 29,
"birthday": "1990-09-01",
"position": "销售",
"level": {
"name": "junior",
"parent": "2"
},
"departments": ["市场部"],
"joinTime": "2010-08-08",
"modified": "1562167817000",
"created": "1562167817000"
}
body5 = {
"id": "5",
"name": "孙七",
"sex": "男",
"age": 26,
"birthday": "1993-12-10",
"position": "前端工程师",
"level": {
"name": "junior",
"parent": "2"
},
"departments": ["研发部"],
"joinTime": "2016-07-01",
"modified": "1562167817000",
"created": "1562167817000"
}
body6 = {
"id": "6",
"name": "周八",
"sex": "男",
"age": 28,
"birthday": "1994-05-11",
"position": "Java工程师",
"level": {
"name": "junior",
"parent": "2"
},
"departments": ["研发部"],
"joinTime": "2018-03-10",
"modified": "1562167817000",
"created": "1562167817000"
}
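# In the join field, parent=1 refers to the _id of the parent document. With es.index, if id is not given, ES generates a random string as the _id, so the id must be set explicitly
# result = es.index(index="company", doc_type="employee", id=1, body=body1)
# With a join mapping, the documents are added here with PUT requests; routing=1 keeps parent and child documents on the same shard, which the join field requires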
url = "http://localhost:9200/company/employee/1?routing=1"
res = requests.put(url, data=json.dumps(body1), headers={'Content-Type':'application/json'})
print("insert-1 = ", res.text)
url = "http://localhost:9200/company/employee/2?routing=1"
res = requests.put(url, data=json.dumps(body2), headers={'Content-Type':'application/json'})
print("insert-2 = ", res.text)
url = "http://localhost:9200/company/employee/3?routing=1"
res = requests.put(url, data=json.dumps(body3), headers={'Content-Type':'application/json'})
print("insert-3 = ", res.text)
url = "http://localhost:9200/company/employee/4?routing=1"
res = requests.put(url, data=json.dumps(body4), headers={'Content-Type':'application/json'})
print("insert-4 = ", res.text)
url = "http://localhost:9200/company/employee/5?routing=1"
res = requests.put(url, data=json.dumps(body5), headers={'Content-Type':'application/json'})
print("insert-5 = ", res.text)
url = "http://localhost:9200/company/employee/6?routing=1"
res = requests.put(url, data=json.dumps(body6), headers={'Content-Type':'application/json'})
print("insert-6 = ", res.text)
print()
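# http://localhost:9200/company/_search?pretty is an unconditional search that returns the data just inserted into the index company
# Remember that newly inserted documents become searchable only after an index refresh, so refresh explicitly (or wait) before searching
es.indices.refresh(index="company")
# Find the employees in 研发部 (the R&D department)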
query = {
"query": {
"match": {
"departments": "研发部"
}
}
}
result = es.search(index="company", doc_type="employee", body=query)
hits = []
for data in result["hits"]["hits"]:
    hits.append(data["_source"]["name"])
print("R&D department = ", hits)
# Find the employees who belong to both 研发部 (R&D) and 市场部 (marketing)
query = {
"query": {
"bool": {
"must": [
{
"match": {
"departments": "市场部"
}
},
{
"match": {
"departments": "研发部"
}
}
]
}
}
}
result = es.search(index="company", doc_type="employee", body=query)
hits = []
for data in result["hits"]["hits"]:
    hits.append(data["_source"]["name"])
print("R&D & marketing = ", hits)
# Find the direct subordinates of 张三
query = {
"query": {
"has_parent": {
"parent_type": "superior",
"query": {
"match":{
"name":"张三"
}
}
}
}
}
result = es.search(index="company", doc_type="employee", body=query)
hits = []
for data in result["hits"]["hits"]:
    hits.append(data["_source"]["name"])
print("subordinates of 张三 = ", hits)
# Find the superior of 王五
query = {
"query": {
"has_child": {
"type": "junior",
"query": {
"match":{
"name":"王五"
}
}
}
}
}
result = es.search(index="company", doc_type="employee", body=query)
hits = []
for data in result["hits"]["hits"]:
    hits.append(data["_source"]["name"])
print("superior of 王五 = ", hits)
# Compute the average age of the employees
query = {
"size": 0,
"aggs": {
"avg_age": {
"avg": {
"field": "age"
}
}
}
}
result = es.search(index="company", doc_type="employee", body=query)
print("average age = ", result["aggregations"]["avg_age"]["value"])
# Look up the birthday of 张三
query = {
"_source": ["name","birthday"],
"query": {
"match": {
"name": "张三"
}
}
}
result = es.search(index="company", doc_type="employee", body=query)
print("birthday of 张三 = ", result)
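The six near-identical PUT calls in the script above can be collapsed into a loop. The following is a minimal sketch under stated assumptions: build_join_put is a hypothetical helper, the employee dictionaries are trimmed-down versions of body1 to body3, and the requests are built with the standard library but not sent, so the block runs without a live node.

```python
import json
import urllib.request


def build_join_put(doc, routing, index="company", doc_type="employee",
                   host="http://localhost:9200"):
    """Build (but do not send) a PUT that stores doc under _id = doc["id"]."""
    url = f"{host}/{index}/{doc_type}/{doc['id']}?routing={routing}"
    return urllib.request.Request(
        url=url,
        data=json.dumps(doc, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Trimmed-down versions of body1..body3 from the script above.
employees = [
    {"id": "1", "name": "张三", "level": {"name": "superior"}},
    {"id": "2", "name": "李四", "level": {"name": "staff", "parent": "1"}},
    {"id": "3", "name": "王五", "level": {"name": "junior", "parent": "2"}},
]

# Every document in the same family tree uses the root's _id as the routing value.
reqs = [build_join_put(doc, routing="1") for doc in employees]
for req in reqs:
    print(req.get_method(), req.full_url)
```

Using the root document's _id as the routing value for the whole tree mirrors the routing=1 parameter in the original PUT URLs.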
More complex query examples are available at: https://pan.baidu.com/s/1IvNomQVxkMgqYKjZs2uXSg (extraction code: j63w).