Essential Libraries for Advanced Python: A Detailed Guide to elasticsearch-py (Part 2)

懒编程-二两

Article link: https://blog.csdn.net/weixin_30230009/article/details/105100102

HackPython is dedicated to fun and valuable programming education

Introduction

The previous article introduced Elasticsearch and its key concepts, and installed Elasticsearch together with the matching elasticsearch-py client. This article puts its basic features to use.

Usage

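Elasticsearch itself exposes a set of RESTful APIs for storing, retrieving, and querying data, and these APIs can be called from any language. elasticsearch-py already wraps them, so we can simply call the corresponding methods.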

Before anything else, start Elasticsearch:

cd /usr/local/Cellar/elasticsearch/6.8.0/bin && ./elasticsearch

Creating an Index

Import the Elasticsearch class from the elasticsearch library and use the create() method to create an Index named names:

from elasticsearch import Elasticsearch
es = Elasticsearch()
result = es.indices.create(index='names')
print(result)

If the Index is created successfully, the following result is returned:

{'acknowledged': True, 'shards_acknowledged': True, 'index': 'names'}

Elasticsearch does not allow two Indexes with the same name. Running the code above a second time produces the following error:

elasticsearch.exceptions.RequestError: RequestError(400, 'resource_already_exists_exception', 'index [names/x-AtvCZ-Q5uL-NB0hNeoFQ] already exists')

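The error says that the Index names already exists and cannot be created again. The HTTP status code is 400; adding ignore=400 to the create call makes the program ignore this error.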

from elasticsearch import Elasticsearch
es = Elasticsearch()
result = es.indices.create(index='names', ignore=400)
print(result)

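Known, handleable errors can be ignored to keep the program running. But we can never guarantee that a program is free of errors. When an error outside the expected range occurs, the best approach is to let the program crash rather than hide the error and let it limp along in an abnormal state.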
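As an alternative to ignore=400, the "only tolerate the error you expect" idea can be sketched by checking for the Index before creating it. This is a minimal sketch, not the article's method: ensure_index is a hypothetical helper name, and es is assumed to be an elasticsearch-py client exposing indices.exists() and indices.create().

```python
def ensure_index(es, name):
    """Create index `name` only if it does not exist yet.

    `es` is assumed to be an elasticsearch-py client (or any object with the
    same `indices.exists` / `indices.create` methods).
    Returns True if the index was created, False if it was already present.
    """
    if es.indices.exists(index=name):
        return False
    # No `ignore` argument here: any unexpected error still raises,
    # so the program stops instead of continuing in a bad state.
    es.indices.create(index=name)
    return True
```

Note that check-then-create is not atomic: if another process creates the same Index between the two calls, create() will still raise the 400 error shown earlier.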

Deleting an Index

Deleting works much like creating an Index. The code is as follows:

from elasticsearch import Elasticsearch
es = Elasticsearch()
result = es.indices.delete(index='names', ignore=[400, 404])
print(result)

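The ignore parameter is used here as well, so that the program is not interrupted when the deletion fails because the Index does not exist. If the deletion succeeds, the output is: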

{'acknowledged': True}

If the Index has already been deleted, running the deletion again outputs:

{'error': {'root_cause': [{'type': 'index_not_found_exception', 'reason': 'no such index', 'resource.type': 'index_or_alias', 'resource.id': 'names', 'index_uuid': '_na_', 'index': 'names'}], 'type': 'index_not_found_exception', 'reason': 'no such index', 'resource.type': 'index_or_alias', 'resource.id': 'names', 'index_uuid': '_na_', 'index': 'names'}, 'status': 404}

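The result shows that the Index does not exist and the deletion failed. The response is again JSON, this time with status code 404. Because the ignore parameter includes 404, the program runs normally and prints the JSON result instead of raising an exception.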
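Because ignore turns the error into an ordinary return value, the caller has to inspect the response to know what happened. The sketch below shows one way to do that for the two response shapes shown above; index_was_deleted is a hypothetical helper name, and it only relies on the acknowledged and status keys visible in those responses.

```python
def index_was_deleted(result):
    """Interpret the response of es.indices.delete(..., ignore=[400, 404]).

    Returns True if the index was deleted, False if it did not exist (404).
    Any other response is treated as unexpected and raises RuntimeError.
    """
    if result.get('acknowledged') is True:
        return True
    if result.get('status') == 404:
        return False
    raise RuntimeError('unexpected delete response: %r' % (result,))
```

With this helper, an ignored 400 response is still surfaced as an exception rather than silently passing through.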

Inserting Data

Elasticsearch can insert structured dictionary data directly. The code is as follows:

from elasticsearch import Elasticsearch
es = Elasticsearch()
es.indices.create(index='people', ignore=400)
data = {'name': '二两', 'age': '28'}
result = es.create(index='people', doc_type='politics', id=1, body=data)
print(result)

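This creates one record containing a name and an age and inserts it by calling create(). Four parameters are passed to create(): index is the name of the index, doc_type is the document type, body is the content of the document, and id is the unique ID of the record.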

Result:

{'_index': 'people', '_type': 'politics', '_id': '1', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 0, '_primary_term': 1}

The result field is created, which means the record was inserted successfully.

Besides create(), the index() method can also insert data. create() requires an id to uniquely identify the record, whereas index() does not: if no id is given, one is generated automatically. index() is called like this:

es.index(index='people', doc_type='politics', body=data)
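The choice between create() and index() can be wrapped in one small function. This is only an illustrative sketch: insert_document is a hypothetical helper, and es is assumed to be an elasticsearch-py 6.x client whose create() and index() accept the keyword arguments used in this article.

```python
def insert_document(es, index, doc_type, body, doc_id=None):
    """Insert `body`, letting Elasticsearch pick the id when none is given."""
    if doc_id is None:
        # index() without an id: Elasticsearch generates one automatically.
        return es.index(index=index, doc_type=doc_type, body=body)
    # create() with an explicit id: fails if a document with that id exists.
    return es.create(index=index, doc_type=doc_type, id=doc_id, body=body)
```

For example, insert_document(es, 'people', 'politics', data) stores the record under a generated id, while passing doc_id=1 behaves like the create() call shown earlier.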

Updating Data

Specify the id and the new content, then call update(). Note that the Elasticsearch update API expects the data in a particular format. The code for an update looks like this:

es = Elasticsearch()
data = {
   'doc' : {
       'name': '二两',
       'age': '30',
       'desc': 'Java工程师'
   }
}
result = es.update(index='people', doc_type='politics', body=data, id=1)
print(result)

Here a desc field is added to the record and age is changed, then update() is called. The body passed to update() has the following format:

{
     "doc": {},
     "script": {}
}

After a successful update, you will see the following result:

{'_index': 'people', '_type': 'politics', '_id': '1', '_version': 2, 'result': 'updated', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 1, '_primary_term': 1}

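The result field is updated, which means the update succeeded. The _version field is the version number after the update: 2 means this is the second version, since the initial insert was version 1.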
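The two keys of the update body, doc and script, can be built with small helpers. This is a sketch under assumptions: both function names are hypothetical, and the script form assumes the Painless scripting language with the source/params layout used by Elasticsearch 6.x.

```python
def partial_update_body(fields):
    """Body for a partial update: the given fields are merged into the document."""
    return {'doc': dict(fields)}


def script_update_body(source, params=None):
    """Body for a scripted update (assumes a Painless script, as in ES 6.x)."""
    return {
        'script': {
            'lang': 'painless',
            'source': source,
            'params': dict(params or {}),
        }
    }


# Example bodies; each would be passed as body=... to es.update().
doc_body = partial_update_body({'age': '30', 'desc': 'Java工程师'})
script_body = script_update_body('ctx._source.age = params.age', {'age': '31'})
```

A call such as es.update(index='people', doc_type='politics', id=1, body=script_body) would then set age from the script parameter instead of sending a replacement value in doc.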

The index() method can also be used directly to perform an update:

es = Elasticsearch()
data = {
   'name': '二两',
   'age': '29',
   'desc': 'Python工程师'
}
result = es.index(index='people', doc_type='politics', body=data, id=1)
print(result)

On success it returns a similar result, except that _version becomes 3. index() places no requirement on the body format.

Deleting Data

To delete a record, call delete() with the id of the record to delete:

es = Elasticsearch()
result = es.delete(index='people', doc_type='politics', id=1)
print(result)

After a successful deletion, the output is:

{'_index': 'people', '_type': 'politics', '_id': '1', '_version': 4, 'result': 'deleted', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 3, '_primary_term': 1}
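Deleting a record that no longer exists fails with a 404, so the ignore parameter from the Index examples can be reused here. The helper below is a hypothetical sketch that assumes an elasticsearch-py 6.x client and relies only on the result field shown in the output above.

```python
def delete_document(es, index, doc_type, doc_id):
    """Delete one document; return True only if it was actually deleted."""
    # ignore=404 keeps a missing document from raising an exception.
    result = es.delete(index=index, doc_type=doc_type, id=doc_id, ignore=404)
    return result.get('result') == 'deleted'
```

Calling delete_document(es, 'people', 'politics', 1) twice in a row would be expected to return True the first time and False the second.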

Querying Data

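The operations above are all very simple; the most powerful feature of Elasticsearch is querying. English words are naturally separated by spaces, so English text can be searched with Elasticsearch directly. Chinese text is different: words run together with no separator, so the text must first be segmented, that is, split into individual words. The elasticsearch-analysis-ik plugin implements Chinese word segmentation for Elasticsearch and can be installed with elasticsearch-plugin. Note that the plugin version must correspond to the Elasticsearch version. Version 6.8.0 is used here, so the matching 6.x release of elasticsearch-analysis-ik is installed.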

./elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.8.0/elasticsearch-analysis-ik-6.8.0.zip

The elasticsearch-analysis-ik GitHub repository: https://github.com/medcl/elasticsearch-analysis-ik

When installing, replace 6.8.0 with your own Elasticsearch version and let elasticsearch-plugin handle the download and installation.
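The version substitution can be scripted so the plugin URL always follows the running server. This is a minimal sketch: the URL pattern is copied from the install command above, the function names are hypothetical, and running_version assumes an elasticsearch-py client whose info() response contains version.number.

```python
IK_RELEASE_URL = ('https://github.com/medcl/elasticsearch-analysis-ik/releases/'
                  'download/v{v}/elasticsearch-analysis-ik-{v}.zip')


def ik_plugin_url(es_version):
    """Build the IK plugin download URL for a given Elasticsearch version."""
    return IK_RELEASE_URL.format(v=es_version)


def running_version(es):
    """Read the version string (e.g. '6.8.0') from a running cluster."""
    return es.info()['version']['number']
```

ik_plugin_url(running_version(es)) then gives the argument to pass to ./elasticsearch-plugin install, assuming a release exists for that exact version.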

After installation, Elasticsearch must be restarted; the installed plugin is loaded automatically during startup.

A simple way to restart:

1. Find the Elasticsearch process ID

ps aux | grep elastic | grep -v grep

2. Kill the process

kill -9 <Elasticsearch process ID>

3. Start Elasticsearch again

Once it is running, create a new index and specify the field that needs word segmentation:

from elasticsearch import Elasticsearch
es = Elasticsearch()
mapping = {
    'properties': {
        'title': {
            'type': 'text',
            'analyzer': 'ik_max_word',
            'search_analyzer': 'ik_max_word'
        }
    }
}
es.indices.delete(index='people', ignore=[400, 404])
es.indices.create(index='news', ignore=400)
result = es.indices.put_mapping(index='news', doc_type='politics', body=mapping)
print(result)

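The code above first deletes the earlier people index, creates a new index named news, and then updates its mapping, which specifies the field to be segmented. The type of the title field is set to text, and both the analyzer and the search_analyzer are set to ik_max_word, provided by the Chinese segmentation plugin just installed. If no analyzer is specified, the default standard analyzer is used, which is not suited to segmenting Chinese text into words.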
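When several fields need Chinese segmentation, the mapping dictionary can be generated instead of written by hand. The sketch below is an illustration, not the article's code: ik_text_mapping is a hypothetical helper, and its defaults reproduce the ik_max_word setting used above. The IK plugin also ships a coarser ik_smart analyzer, which can be passed as search_analyzer if desired.

```python
def ik_text_mapping(fields, analyzer='ik_max_word', search_analyzer='ik_max_word'):
    """Build a mapping body that applies IK analyzers to each field in `fields`."""
    return {
        'properties': {
            field: {
                'type': 'text',
                'analyzer': analyzer,
                'search_analyzer': search_analyzer,
            }
            for field in fields
        }
    }


# Reproduces the mapping from the article for the single field 'title'.
mapping = ik_text_mapping(['title'])
```

The resulting mapping can be passed to es.indices.put_mapping(index='news', doc_type='politics', body=mapping) exactly as in the code above.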

Next, insert some data:

def test2():
    datas = [
        {
            'title': '设计灵魂离职，走下神坛的苹果设计将去向何方？',
            'url': 'https://www.tmtpost.com/4033397.html',
            'date': '2019-06-29 11:30'
        },
        {
            'title': '医生中的建筑设计师，凭什么挽救了上万人的生命？',
            'url': 'https://www.tmtpost.com/4034052.html',
            'date': '2019-06-29 11:10'
        },
        {
            'title': '中国网红二十年：从痞子蔡、芙蓉姐姐到李佳琦，流量与变现的博弈',
            'url': 'https://www.tmtpost.com/4034045.html',
            'date': '2019-06-29 11:03'
        },
        {
            'title': '网易云音乐、喜马拉雅等音频类应用被下架，或因违反相关规定',
            'url': 'https://www.tmtpost.com/nictation/4034040.html',
            'date': '2019-06-29 10:07'
        }
    ]
    for data in datas:
        es.index(index='news', doc_type='politics', body=data)
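The loop above sends one request per document. For larger batches, elasticsearch-py provides helpers.bulk(), which takes an iterable of action dictionaries. The sketch below only builds those actions; bulk_actions is a hypothetical helper name, and the _index/_type/_source layout assumes the 6.x client used in this article.

```python
def bulk_actions(docs, index, doc_type):
    """Yield one bulk 'index' action per document in `docs`."""
    for doc in docs:
        yield {
            '_index': index,
            '_type': doc_type,
            '_source': doc,
        }


# Tiny illustrative input; the real `datas` list from the article works the same way.
sample_docs = [{'title': 'a'}, {'title': 'b'}]
actions = list(bulk_actions(sample_docs, 'news', 'politics'))
```

With a live cluster, the actions would be sent in a single call: from elasticsearch import helpers, then helpers.bulk(es, bulk_actions(datas, 'news', 'politics')).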

We can query the inserted content and print it out:

import json

result = es.search(index='news', doc_type='politics')
print(json.dumps(result, indent=4, ensure_ascii=False))

The output is as follows:

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 4,
        "max_score": 1.0,
        "hits": [
            {
                "_index": "news",
                "_type": "politics",
                "_id": "MclZoWsB7N68fyc5DUS7",
                "_score": 1.0,
                "_source": {
                    "title": "网易云音乐、喜马拉雅等音频类应用被下架，或因违反相关规定",
                    "url": "https://www.tmtpost.com/nictation/4034040.html",
                    "date": "2019-06-29 10:07"
                }
            },
...

The results appear in the hits field: total gives the number of matching documents and max_score is the highest match score.
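Since the interesting data sits two levels deep in the response, a couple of small accessors make it easier to work with. This is a sketch with hypothetical function names; it relies only on the hits structure shown above, plus the fact that Elasticsearch 7 and later return hits.total as an object with a value key rather than a plain integer.

```python
def hit_count(result):
    """Return the number of matching documents from a search response."""
    total = result['hits']['total']
    # ES 6.x returns an int; 7.x and later return {'value': ..., 'relation': ...}.
    return total['value'] if isinstance(total, dict) else total


def hit_sources(result):
    """Return the original documents (_source) in the order they were ranked."""
    return [hit['_source'] for hit in result['hits']['hits']]


# A trimmed response in the same shape as the output above.
sample = {
    'hits': {
        'total': 4,
        'max_score': 1.0,
        'hits': [
            {'_id': 'x', '_score': 1.0, '_source': {'date': '2019-06-29 10:07'}},
        ],
    }
}
```

For the real response, hit_sources(result) would return the four inserted news dictionaries.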

We can also run a full-text search, which is where Elasticsearch shows its character as a search engine:

dsl = {
        'query': {
            'match': {
                'title': '网红 设计师'
            }
        }
    }
es = Elasticsearch()
result = es.search(index='news', doc_type='politics', body=dsl)
print(json.dumps(result, indent=2, ensure_ascii=False))

The query is written in the DSL that Elasticsearch supports. match specifies a full-text search on the title field for the text “网红 设计师”. The search results are:

{
  "took": 18,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 2.2950945,
    "hits": [
      {
        "_index": "news",
        "_type": "politics",
        "_id": "L8lZoWsB7N68fyc5DUSx",
        "_score": 2.2950945,
        "_source": {
          "title": "医生中的建筑设计师，凭什么挽救了上万人的生命？",
          "url": "https://www.tmtpost.com/4034052.html",
          "date": "2019-06-29 11:10"
        }
      },
      {
        "_index": "news",
        "_type": "politics",
        "_id": "MMlZoWsB7N68fyc5DUS2",
        "_score": 1.8132976,
        "_source": {
          "title": "中国网红二十年：从痞子蔡、芙蓉姐姐到李佳琦，流量与变现的博弈",
          "url": "https://www.tmtpost.com/4034045.html",
          "date": "2019-06-29 11:03"
        }
      },
      {
        "_index": "news",
        "_type": "politics",
        "_id": "LslZoWsB7N68fyc5DEQC",
        "_score": 0.71580166,
        "_source": {
          "title": "设计灵魂离职，走下神坛的苹果设计将去向何方？",
          "url": "https://www.tmtpost.com/4033397.html",
          "date": "2019-06-29 11:30"
        }
      }
    ]
  }
}

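Three results match. The first scores 2.29, the second 1.81, and the third 0.72, so for the query “网红 设计师” Elasticsearch ranks the first result highest. The results show that the search runs over the full text of the field and that hits are sorted by relevance to the search terms. This is already the prototype of a search engine.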
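By default a match query returns documents that contain any of the analyzed terms, which is why three titles match above. The match query also accepts an operator option that can require all terms instead. The sketch below builds such a DSL body; match_query is a hypothetical helper name, and how many documents an 'and' query returns depends on how the analyzer splits the text.

```python
def match_query(field, text, operator='or'):
    """Build a match query body; operator is 'or' (default) or 'and'."""
    if operator not in ('or', 'and'):
        raise ValueError("operator must be 'or' or 'and'")
    return {
        'query': {
            'match': {
                field: {
                    'query': text,
                    'operator': operator,
                }
            }
        }
    }


# Equivalent in effect to the article's query, written in the expanded form.
dsl_any = match_query('title', '网红 设计师')
# Stricter variant: every analyzed term must appear in the title.
dsl_all = match_query('title', '网红 设计师', operator='and')
```

Either body can be passed as body=... to es.search(index='news', doc_type='politics', ...).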

Elasticsearch supports many more kinds of queries: https://www.elastic.co/guide/en/elasticsearch/reference/6.3/query-dsl.html

Further Learning

1. Elasticsearch: The Definitive Guide: https://www.elastic.co/guide/cn/elasticsearch/guide/current/index.html
2. Elastic Chinese community: https://www.elasticsearch.cn/

Conclusion

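That wraps up this brief introduction to using Elasticsearch. Elasticsearch has a fair amount of complexity, and a few short articles can only give a basic understanding. More Elasticsearch content will be shared later as [Extracurricular Knowledge] posts. Finally, you are welcome to study the HackPython courses, and thank you for reading and for your support.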
