Elasticsearch Overview (Part 2)

知然xu

Article link: https://blog.csdn.net/m0_37565948/article/details/105549498

1. A Simple Demo

Indices

(1) Create an index named demo

PUT http://localhost:9200/demo

ES response

{
    "acknowledged": true,
    "shards_acknowledged": true,
    "index": "demo"
}

(2) Specify the number of primary shards and replicas when creating the index

PUT http://localhost:9200/demo

{
    "settings":{
        "number_of_shards":1,
        "number_of_replicas":1
    }
}

ES response

{
    "acknowledged": true,
    "shards_acknowledged": true,
    "index": "demo"
}

(3) Inspect the index

GET http://localhost:9200/demo

ES response

{
    "demo": {
        "aliases": {},
        "mappings": {},
        "settings": {
            "index": {
                "creation_date": "1561110747038",
                "number_of_shards": "1",
                "number_of_replicas": "1",
                "uuid": "kjPqDUt6TMyywg1P7qgccw",
                "version": {
                    "created": "5060499"
                },
                "provided_name": "demo"
            }
        }
    }
}

(4) Check the status of all indices

GET http://localhost:9200/_cat/indices?v

ES response

health	status	index	uuid	pri	rep	docs.count
yellow	open	demo	wqkto5CCTpWNdP3HGpLfxA	5	1	0
yellow	open	.kibana	pwKW9hJyRkO7_pE0MNE05g	1	1	1

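ES currently holds two indices: the demo index just created and an index created by Kibana. The columns of the table mean the following:

health: health status. red means at least one primary shard is unavailable; yellow means all primary shards are available but at least one replica shard is not (the usual state of a single-node ES, where replicas cannot be allocated); green means all primary and replica shards are available.
status: index state. open means the documents in the index can be read and written; close means the index is closed: the memory it occupied is released, but the index can be neither read nor written.
index: index name.
uuid: unique identifier of the index.
pri: number of primary shards of the index.
rep: number of replicas per primary shard; 1 means each primary shard has one replica copy (so there are as many replica shards as primary shards).
docs.count: number of documents.
docs.deleted: number of deleted documents.
store.size: size of the index.
pri.store.size: size taken by the primary shards.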
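
To make the column meanings concrete, here is a minimal sketch that parses the text table returned by `_cat/indices?v` and derives the total number of shard copies per index as pri * (1 + rep). The function names `parse_cat_indices` and `total_shard_copies` are hypothetical helpers, not part of any ES client, and the sample text simply repeats the table above. Splitting on whitespace assumes that no column is empty, which may not hold for closed indices.

```python
def parse_cat_indices(text):
    """Parse the whitespace-separated table returned by GET /_cat/indices?v."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def total_shard_copies(row):
    """Primary shards plus all of their replica copies: pri * (1 + rep)."""
    return int(row["pri"]) * (1 + int(row["rep"]))


# The table shown above, copied as plain text.
SAMPLE = """\
health status index   uuid                   pri rep docs.count
yellow open   demo    wqkto5CCTpWNdP3HGpLfxA   5   1          0
yellow open   .kibana pwKW9hJyRkO7_pE0MNE05g   1   1          1
"""

rows = parse_cat_indices(SAMPLE)
for row in rows:
    print(row["index"], row["health"], total_shard_copies(row))
```

On a single node only the primary copies can be allocated, which is why both indices report yellow.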

(5) Delete the index

DELETE http://localhost:9200/demo

ES response

{
    "acknowledged": true
}

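Types (defining the mapping fields and their data types)

Create a type named example_type in the demo index with two fields: created of type date and message of type keyword.

(1) Method 1: add the type to an existing index (a type created this way cannot be deleted afterwards; its mapping can only be updated, for example by adding fields, so use it with care)

PUT http://localhost:9200/demo/_mapping/example_type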

{
    "properties":{
        "created":{
            "type":"date"
        },
        "message":{
            "type":"keyword"
        }
    }
}

(2) Method 2: define the type together with its mapping when creating the index (this is the commonly used approach)

PUT http://localhost:9200/demo

{
    "mappings":{
        "example_type":{
            "properties":{
                "created":{
                    "type":"date"
                },
                "message":{
                    "type":"keyword"
                }
            }
        }
    }
}
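
The requests above rely on mapping types, which fits the ES 5.x/6.x versions used in this article. From ES 7.0 on, index creation is typeless by default and the properties sit directly under mappings. The sketch below converts a typed index-creation body into that typeless form; `to_typeless` is a hypothetical helper written for illustration, and it assumes the body defines exactly one type.

```python
import copy


def to_typeless(body):
    """Strip the single mapping type from an index-creation body (sketch)."""
    mappings = body.get("mappings", {})
    if not mappings or "properties" in mappings:
        return copy.deepcopy(body)  # nothing to strip: no mappings, or already typeless
    if len(mappings) != 1:
        raise ValueError("expected exactly one mapping type")
    (type_name, type_mapping), = mappings.items()
    result = copy.deepcopy(body)
    result["mappings"] = copy.deepcopy(type_mapping)
    return result


typed = {
    "mappings": {
        "example_type": {
            "properties": {
                "created": {"type": "date"},
                "message": {"type": "keyword"},
            }
        }
    }
}

typeless = to_typeless(typed)
print(typeless)
```

The input body is left untouched, so the same definition can serve both older and newer clusters.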

Documents

(1) Insert a document

POST http://localhost:9200/demo/example_type

{
    "created":1561135459000,
    "message":"test1"
}

ES response

{
    "_index": "demo",
    "_type": "example_type",
    "_id": "AWt67Ql_Tf0FgxupYlBX",
    "_version": 1,
    "result": "created",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "created": true
}

(2) Search documents

POST http://localhost:9200/demo/example_type/_search?pretty

ES response

{
    "took": 183,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 1,
        "hits": [
            {
                "_index": "demo",
                "_type": "example_type",
                "_id": "AWt67Ql_Tf0FgxupYlBX",
                "_score": 1,
                "_source": {
                    "created": 1561135459000,
                    "message": "test1"
                }
            }
        ]
    }
}

(3) Update a document

Update a document by its _id

POST http://localhost:9200/demo/example_type/AWt67Ql_Tf0FgxupYlBX/_update

{
    "doc":{
        "message":"updated"
    }
}

ES response

{
    "_index": "demo",
    "_type": "example_type",
    "_id": "AWt67Ql_Tf0FgxupYlBX",
    "_version": 2,
    "result": "updated",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    }
}

(4) Delete a document

Delete a document by its _id

DELETE http://localhost:9200/demo/example_type/AWt67Ql_Tf0FgxupYlBX

ES response

{
    "found": true,
    "_index": "demo",
    "_type": "example_type",
    "_id": "AWt67Ql_Tf0FgxupYlBX",
    "_version": 2,
    "result": "deleted",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    }
}
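
The four document operations above can be summarized as plain HTTP requests. The following sketch only builds the requests with the standard library and prints them; nothing is sent. `build_request` is a hypothetical helper, and the base URL assumes the local single-node ES used throughout this article.

```python
import json
from urllib.request import Request

BASE = "http://localhost:9200"  # assumption: local ES, as in the examples above


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request against the ES REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    return Request(BASE + path, data=data, headers=headers, method=method)


doc_id = "AWt67Ql_Tf0FgxupYlBX"  # the auto-generated _id from the insert response
steps = [
    build_request("POST", "/demo/example_type",
                  {"created": 1561135459000, "message": "test1"}),     # insert
    build_request("POST", "/demo/example_type/_search"),               # search
    build_request("POST", "/demo/example_type/%s/_update" % doc_id,
                  {"doc": {"message": "updated"}}),                    # partial update
    build_request("DELETE", "/demo/example_type/%s" % doc_id),         # delete
]

for req in steps:
    print(req.get_method(), req.full_url)
```

Passing one of these objects to `urllib.request.urlopen` would execute it against a running cluster.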

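2. Analysis (Tokenization)

(1) ES only analyzes (tokenizes) string fields. In ES 2.x the only string type was string; since ES 5.x it has been split into text and keyword. Only text fields are analyzed. A keyword field is not analyzed and is indexed as a single term, so keyword is typically used for exact, whole-value matching.

(2) The default analyzer in ES is standard. It works well for English, but it is not suited to Chinese: it splits Chinese text into single characters. For example, "中国" becomes the two tokens "中" and "国". For Chinese text the IK analyzer is commonly used instead.

(3) Installing the IK plugin: download a prebuilt zip whose version matches your ES version, for example https://github.com/medcl/elasticsearch-analysis-ik/releases/tag/v6.3.2. Older releases are listed at https://github.com/medcl/elasticsearch-analysis-ik/releases. After unzipping, place the elasticsearch-analysis-ik-6.3.2 folder directly in the plugins directory of the ES installation and restart ES.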

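(4) The IK analyzer offers two modes, ik_smart and ik_max_word.

ik_smart: coarse-grained segmentation. For example, "北京大学" is kept as a single token.
ik_max_word: fine-grained segmentation. For example, "北京大学" is split into "北京大学", "北京大", "北京" and "大学".
ik_max_word produces many noisy tokens of little use, so ik_smart is the mode we usually choose.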
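
The difference between the modes can be illustrated with a toy dictionary-based segmenter. This is only a sketch of the idea and not IK's actual algorithm: `standard_like`, `max_word`, `smart` and the four-word `LEXICON` are assumptions made for this example.

```python
def standard_like(text):
    """Mimic how the standard analyzer treats pure Chinese text: one token per character."""
    return list(text)


def max_word(text, lexicon):
    """Emit every lexicon word found in the text (fine-grained, in the spirit of ik_max_word)."""
    tokens = []
    for start in range(len(text)):
        for end in range(len(text), start, -1):  # longest candidates first
            if text[start:end] in lexicon:
                tokens.append(text[start:end])
    return tokens


def smart(text, lexicon):
    """Greedy longest match (coarse-grained, in the spirit of ik_smart)."""
    tokens, i = [], 0
    while i < len(text):
        for end in range(len(text), i, -1):
            # Fall back to a single character when no lexicon word starts here.
            if text[i:end] in lexicon or end == i + 1:
                tokens.append(text[i:end])
                i = end
                break
    return tokens


LEXICON = {"北京大学", "北京大", "北京", "大学"}  # toy dictionary

print(standard_like("中国"))           # ['中', '国']
print(max_word("北京大学", LEXICON))   # ['北京大学', '北京大', '北京', '大学']
print(smart("北京大学", LEXICON))      # ['北京大学']
```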

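(5) Custom dictionary

In the config directory of the IK plugin, create a custom dictionary file named custom.dic, add the word "小米手机" to it, and save it; this file now serves as the user dictionary. Then, still in the config directory, edit IKAnalyzer.cfg.xml to register the dictionary.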

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- Configure your own extension dictionary here -->
    <entry key="ext_dict">custom.dic</entry>
    <!-- Configure your own extension stopword dictionary here -->
    <entry key="ext_stopwords"></entry>
    <!-- Configure a remote extension dictionary here -->
    <!-- <entry key="remote_ext_dict">words_location</entry> -->
    <!-- Configure a remote extension stopword dictionary here -->
    <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>

After restarting ES, analyze "小米手机" with ik_smart again: it is now kept as a single token.
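
One way to check this is the `_analyze` API, which returns the tokens an analyzer produces for a given text. The sketch below builds the request body and extracts the token strings from a response. `analyze_body` and `tokens_of` are hypothetical helpers, and the sample response is hand-written and trimmed to mirror the statement above; it was not captured from a live cluster.

```python
def analyze_body(text, analyzer="ik_smart"):
    """Request body for POST /_analyze."""
    return {"analyzer": analyzer, "text": text}


def tokens_of(response):
    """Extract just the token strings from an _analyze response."""
    return [entry["token"] for entry in response["tokens"]]


# Hand-written, trimmed sample of the response shape (illustrative only).
sample_response = {
    "tokens": [
        {"token": "小米手机", "start_offset": 0, "end_offset": 4, "position": 0}
    ]
}

print(analyze_body("小米手机"))
print(tokens_of(sample_response))

# Against a running cluster, assuming an elasticsearch-py client named `es`:
# response = es.indices.analyze(body=analyze_body("小米手机"))
# print(tokens_of(response))
```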

3. Working with ES from Python

from elasticsearch import Elasticsearch
es = Elasticsearch(['127.0.0.1:9200'])

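# Delete the index
# es.delete(index='word', id=1)  # es.delete removes a single document and requires an id
# es.indices.delete only needs the index name; ignore=[400, 404] avoids an exception if the index does not exist yet
result = es.indices.delete(index='word', ignore=[400, 404])
print("delete = ", result)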

# Create an index
# es.create(index="word", id=1, body={"name":"python","addr":"深圳"})  # es.create indexes a document and requires both body and id
# result = es.indices.create(index="word")  # es.indices.create only needs the index name
# print("create = ", result)

# Create an index with a type and a mapping
mapping = {
    "mappings": {
        "analyser_demo": {       # doc_type
            "properties": {
                "name": {
                    "type": "text",
                    "analyzer":"ik_smart"
                }
            }
        }
    }
}
result = es.indices.create(index="word", body=mapping)
print("create type mapping = ", result)

# Insert documents
for name in ["张三", "李四", "软件", "我们是软件工程师"]:
    result = es.index(index="word", doc_type="analyser_demo", body={"name": name})
print("insert = ", result)

# Unconditional search over the word index, showing the documents just inserted:
# http://localhost:9200/word/_search?pretty

# Newly indexed documents only become searchable after the next refresh
# (by default about once per second), so refresh explicitly instead of waiting.
es.indices.refresh(index="word")

# Simple queries; only the last assignment to `query` is actually executed
query = {'query': {'match_all': {}}}    # match every document
query = {'query': {'term': {"name": "软件"}}}  # term: exact lookup, the query string is not analyzed
query = {'query': {'match': {"name": "软件"}}}  # match: full-text lookup, the query string is analyzed first
result = es.search(index="word", doc_type="analyser_demo", body=query)
hits = [data["_source"]["name"] for data in result["hits"]["hits"]]
print(hits)
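
The difference between term and match is easier to see on a toy inverted index. The sketch below is an illustration only: the `LEXICON`, the greedy `analyze` function and the in-memory `index` are assumptions standing in for IK and for ES itself. A term query looks up the query string as one token, while a match query analyzes the query string first and returns documents containing any of the resulting tokens.

```python
from collections import defaultdict

# Toy dictionary standing in for the analyzer's lexicon (assumption for illustration).
LEXICON = {"我们", "是", "软件", "工程师", "张三", "李四"}


def analyze(text):
    """Greedy longest-match segmentation; unknown characters become single tokens."""
    tokens, i = [], 0
    while i < len(text):
        for end in range(len(text), i, -1):
            if text[i:end] in LEXICON or end == i + 1:
                tokens.append(text[i:end])
                i = end
                break
    return tokens


docs = ["张三", "李四", "软件", "我们是软件工程师"]

# Inverted index: token -> set of document positions.
index = defaultdict(set)
for doc_id, text in enumerate(docs):
    for token in analyze(text):
        index[token].add(doc_id)


def term(value):
    """Look up the query string as a single, unanalyzed token."""
    return sorted(index.get(value, set()))


def match(value):
    """Analyze the query string, then return documents containing any of its tokens."""
    hits = set()
    for token in analyze(value):
        hits |= index.get(token, set())
    return sorted(hits)


print(term("软件"))         # [2, 3]
print(term("软件工程师"))   # []  (no indexed token equals the whole string)
print(match("软件工程师"))  # [2, 3]
```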

For more tutorials on working with ES from Python, see: link 1, link 2

4. Complex Queries

import json
import requests
from elasticsearch import Elasticsearch
es = Elasticsearch(['127.0.0.1:9200'])

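# Delete the index
# es.indices.delete only needs the index name; ignore=[400, 404] avoids an exception if the index does not exist yet
result = es.indices.delete(index='company', ignore=[400, 404])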
print("delete = ", result)
print()

# Create an index with a type and a mapping
mapping = {
    "mappings":{
        "employee":{
            "properties":{
                # Employee id: string, keyword (not analyzed)
                "id":{
                    "type":"keyword"
                },
                # Employee name: string, text (analyzed)
                # name.keyword: string, keyword sub-field (not analyzed)
                "name":{
                    "type":"text",
                    "analyzer":"ik_smart",
                    "fields":{
                        "keyword":{
                            "type":"keyword",
                            "ignore_above":256  # values longer than 256 characters are not indexed in this sub-field
                        }
                    }
                },
                # Employee sex: string, keyword (not analyzed)
                "sex":{
                    "type":"keyword"
                },
                # Employee age: integer
                "age":{
                    "type":"integer"
                },
                # Employee birthday: date
                "birthday":{
                    "type":"date"
                },
                # Employee position: string, text (analyzed)
                # position.keyword: string, keyword sub-field (not analyzed)
                "position":{
                    "type":"text",
                    "analyzer":"ik_smart",
                    "fields":{
                        "keyword":{
                            "type":"keyword",
                            "ignore_above":256
                        }
                    }
                },
                # Employee level: join field modelling the hierarchy superior -> staff -> junior
                "level":{
                    "type":"join",
                    "relations":{
                        "superior":"staff",
                        "staff":"junior"
                    }
                },
                # Employee departments: string, text (analyzed)
                # departments.keyword: string, keyword sub-field (not analyzed)
                "departments":{
                    "type":"text",
                    "analyzer":"ik_smart",
                    "fields":{
                        "keyword":{
                            "type":"keyword",
                            "ignore_above":256
                        }
                    }
                },
                # Date the employee joined the company
                "joinTime":{
                    "type":"date"
                },
                # Time of the last modification
                "modified":{
                    "type":"date"
                },
                # Time this record was created
                "created":{
                    "type":"date"
                }
            }
        }
    }
}
result = es.indices.create(index="company", body=mapping)
print("create type mapping = ", result)
print()

# Insert documents
body1 = {
    "id": "1",
    "name": "张三",
    "sex": "男",
    "age": 49,
    "birthday": "1970-01-01",
    "position": "董事长",
    "level": {
        "name": "superior"
    },
    "joinTime": "1990-01-01",
    "modified": "1562167817000",
    "created": "1562167817000"
}

body2 = {
    "id": "2",
    "name": "李四",
    "sex": "男",
    "age": 39,
    "birthday": "1980-04-03",
    "position": "总经理",
    "level": {
        "name": "staff",
        "parent": "1"
    },
    "departments": ["市场部", "研发部"],
    "joinTime": "2001-02-02",
    "modified": "1562167817000",
    "created": "1562167817000"
}

body3 = {
    "id": "3",
    "name": "王五",
    "sex": "女",
    "age": 27,
    "birthday": "1992-09-01",
    "position": "销售",
    "level": {
        "name": "junior",
        "parent": "2"
    },
    "departments": ["市场部"],
    "joinTime": "2010-07-01",
    "modified": "1562167817000",
    "created": "1562167817000"
}

body4 = {
    "id": "4",
    "name": "赵六",
    "sex": "男",
    "age": 29,
    "birthday": "1990-09-01",
    "position": "销售",
    "level": {
        "name": "junior",
        "parent": "2"
    },
    "departments": ["市场部"],
    "joinTime": "2010-08-08",
    "modified": "1562167817000",
    "created": "1562167817000"
}

body5 = {
    "id": "5",
    "name": "孙七",
    "sex": "男",
    "age": 26,
    "birthday": "1993-12-10",
    "position": "前端工程师",
    "level": {
        "name": "junior",
        "parent": "2"
    },
    "departments": ["研发部"],
    "joinTime": "2016-07-01",
    "modified": "1562167817000",
    "created": "1562167817000"
}

body6 = {
    "id": "6",
    "name": "周八",
    "sex": "男",
    "age": 28,
    "birthday": "1994-05-11",
    "position": "Java工程师",
    "level": {
        "name": "junior",
        "parent": "2"
    },
    "departments": ["研发部"],
    "joinTime": "2018-03-10",
    "modified": "1562167817000",
    "created": "1562167817000"
}

# "parent": "1" in the join field refers to the _id of the parent document. With es.index,
# the _id is an auto-generated string unless id=1 is passed explicitly.
# result = es.index(index="company", doc_type="employee", id=1, body=body1)

# With a join field, a parent and all of its descendants must live on the same shard,
# so every document is written with an explicit _id and the same routing value (routing=1).
bodies = [body1, body2, body3, body4, body5, body6]
for doc_id, body in enumerate(bodies, start=1):
    url = "http://localhost:9200/company/employee/{}?routing=1".format(doc_id)
    res = requests.put(url, data=json.dumps(body), headers={'Content-Type':'application/json'})
    print("insert-{} = ".format(doc_id), res.text)
print()

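# Unconditional search over the company index, showing the documents just inserted:
# http://localhost:9200/company/_search?pretty

# Newly indexed documents only become searchable after the next refresh
# (by default about once per second), so refresh explicitly instead of waiting.
es.indices.refresh(index="company")

# Find the employees of the R&D department (研发部)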
query = {
    "query": {
        "match": {
            "departments": "研发部"
        }
    }
}
result = es.search(index="company", doc_type="employee", body=query)
hits = []
for data in result["hits"]["hits"]:
    hits.append(data["_source"]["name"])
print("R&D department = ", hits)

# Find the employees who belong to both the R&D department (研发部) and the marketing department (市场部)
query = {
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "departments": "市场部"
                    }
                },
                {
                    "match": {
                        "departments": "研发部"
                    }
                }
            ]
        }
    }
}
result = es.search(index="company", doc_type="employee", body=query)
hits = []
for data in result["hits"]["hits"]:
    hits.append(data["_source"]["name"])
print("R&D and marketing = ", hits)

# Find the direct subordinates of 张三
query = {
    "query": {
        "has_parent": {
            "parent_type": "superior",
            "query": {
                "match":{
                    "name":"张三"
                }
            }
        }
    }
}
result = es.search(index="company", doc_type="employee", body=query)
hits = []
for data in result["hits"]["hits"]:
    hits.append(data["_source"]["name"])
print("Subordinates of 张三 = ", hits)

# Find the superior of 王五
query = {
    "query": {
        "has_child": {
            "type": "junior",
            "query": {
                "match":{
                    "name":"王五"
                }
            }
        }
    }
}
result = es.search(index="company", doc_type="employee", body=query)
hits = []
for data in result["hits"]["hits"]:
    hits.append(data["_source"]["name"])
print("Superior of 王五 = ", hits)

# Compute the average age of all employees
query = {
    "size": 0,
    "aggs": {
        "avg_age": {
            "avg": {
                "field": "age"
            }
        }
    }
}
result = es.search(index="company", doc_type="employee", body=query)
print("Average age = ", result["aggregations"]["avg_age"]["value"])

# Look up the birthday of 张三
query = {
    "_source": ["name","birthday"],
    "query": {
        "match": {
            "name": "张三"
        }
    }
}
result = es.search(index="company", doc_type="employee", body=query)
print("Birthday of 张三 = ", [hit["_source"] for hit in result["hits"]["hits"]])
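
The join-field documents above all follow the same pattern: the level field names the relation, children additionally carry the _id of their parent, and every member of the hierarchy is written with the same routing value so that it lands on the same shard as the root. The sketch below captures that pattern without contacting ES. `join_doc` and `index_path` are hypothetical helpers, and the documents are reduced to the fields needed here.

```python
def join_doc(fields, relation, parent_id=None, join_field="level"):
    """Return a copy of `fields` with the join field filled in."""
    doc = dict(fields)
    join_value = {"name": relation}
    if parent_id is not None:
        join_value["parent"] = str(parent_id)  # _id of the direct parent document
    doc[join_field] = join_value
    return doc


def index_path(index, doc_type, doc_id, routing):
    """REST path for indexing one document with explicit routing."""
    return "/{}/{}/{}?routing={}".format(index, doc_type, doc_id, routing)


ROOT_ID = "1"  # the whole hierarchy is routed by the _id of its root (张三)

boss = join_doc({"id": "1", "name": "张三"}, "superior")
manager = join_doc({"id": "2", "name": "李四"}, "staff", parent_id="1")
sales = join_doc({"id": "3", "name": "王五"}, "junior", parent_id="2")

for doc in (boss, manager, sales):
    print(index_path("company", "employee", doc["id"], ROOT_ID), doc["level"])
```

Note that 王五 names 李四 as parent but is still routed by the root's id, which is what keeps grandchildren on the same shard as their grandparent.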

More complex query examples can be found at: https://pan.baidu.com/s/1IvNomQVxkMgqYKjZs2uXSg (extraction code: j63w).