The Perfect Combination of Docker, Python, and ES, Part 01: Installing ES 7 with Docker and Connecting from Python

Article link: https://blog.csdn.net/feiyu361/article/details/120162359

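This article shows how to install Elasticsearch and Kibana with Docker for use from Python, and how to perform basic data operations: creating an index, importing documents, searching, filtering, and running aggregations. Example code shows how to connect to an ES cluster with the Python Elasticsearch library, how to create, read, update, and delete data, and how to use full-text search, exact matching, and range filters.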


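Contents

Create a Python project (requires Python and Docker on your machine)
- Installing Docker
- Create the directory
Install Elasticsearch and Kibana
Connect to Elasticsearch from Python
Create an index and import documents
- Insert more documents
Get a document
Delete a document
Search documents (results are returned in the hits field)
match operator: find documents in the megacorp index where first_name is nitin
bool operator
filter operator: to filter by greater-than or less-than, use a range clause
Full-text search: querying the about field
Phrase search: match the words in exact order
Aggregations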

Create a Python project (requires Python and Docker on your machine)

Installing Docker

Create the directory

mkdir python-elasticsearch
cd python-elasticsearch

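Install the elasticsearch package with pip. The code in this article uses the 7.x client API (for example the doc_type argument and a host dictionary without a scheme), so pin the client to 7.x to match the 7.10.0 server:

pip3 install "elasticsearch>=7,<8"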

Install Elasticsearch and Kibana

In this folder, create a file named docker-compose.yml:

---
version: "3"
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.10.0
    container_name: es01
    environment:
      - node.name=es01
      - cluster.name=docker-cluster
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - discovery.type=single-node
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - esdata:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
 
  kibana:
    image: docker.elastic.co/kibana/kibana:7.10.0
    ports:
      - 5601:5601
    depends_on:
      - elasticsearch
 
volumes:
  esdata:
    driver: local

Run the following on the command line:

docker-compose up

To verify that the installation succeeded, open these two addresses in a browser:

http://localhost:9200 

http://localhost:5601
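
The same check can be scripted instead of done in a browser. The following is a minimal sketch using only the standard library; the helper name is_reachable is made up for this example, and it only tests that the port answers an HTTP request, not that the cluster is healthy.

```python
import urllib.error
import urllib.request


def is_reachable(url, timeout=3.0):
    """Return True if an HTTP GET to `url` gets a response with status below 400."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status (4xx/5xx).
        return False


if __name__ == "__main__":
    # The two addresses from the docker-compose.yml port mappings above.
    for name, url in [("Elasticsearch", "http://localhost:9200"),
                      ("Kibana", "http://localhost:5601")]:
        state = "is up" if is_reachable(url) else "is not reachable yet"
        print(name, state)
```

Kibana usually takes longer to start than Elasticsearch, so a "not reachable yet" right after docker-compose up is not necessarily an error.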


Connect to Elasticsearch from Python

# Import Elasticsearch package
from elasticsearch import Elasticsearch
 
# Connect to the elastic cluster
es = Elasticsearch([{'host':'localhost','port':9200}])
print(es)
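
Printing the client object only shows its configuration; it does not prove that the server answers. One way to check, sketched below, is to poll the client's ping() method, which returns True or False in the 7.x client. The function name wait_until_ready and its attempts and delay parameters are assumptions made for this example.

```python
import time


def wait_until_ready(client, attempts=10, delay=2.0):
    """Poll client.ping() until it returns True or the attempts run out."""
    for _ in range(attempts):
        if client.ping():
            return True
        time.sleep(delay)  # the container may still be starting up
    return False


# Usage with the client created above (needs a running cluster):
#   from elasticsearch import Elasticsearch
#   es = Elasticsearch([{'host': 'localhost', 'port': 9200}])
#   print(wait_until_ready(es))
```

This is handy right after docker-compose up, when the container exists but Elasticsearch has not finished booting.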

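Create an index and import documents

Elasticsearch is document-oriented: it stores entire objects, or documents. It not only stores them but also indexes the content of each document to make it searchable. In Elasticsearch you can index, search, sort, and filter documents.
Elasticsearch uses JSON as the serialization format for documents. Now let's start indexing employee documents. The act of storing data in Elasticsearch is called indexing. An Elasticsearch cluster can contain multiple indices, and each index contains a type. A type holds multiple documents, and each document has multiple fields. In Elasticsearch 7.x an index has a single type, _doc, which is why the code below passes doc_type='_doc'. If you want to learn more about these concepts, read my earlier article "Some important concepts in Elasticsearch: cluster, node, index, document, shards, and replica".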

# Import Elasticsearch package
from elasticsearch import Elasticsearch
 
# Connect to the elastic cluster
es = Elasticsearch([{'host':'localhost','port':9200}])
print(es)
 
e1 = {
    "first_name":"nitin",
    "last_name":"panwar",
    "age": 27,
    "about": "Love to play cricket",
    "interests": ['sports','music'],
}
 
res = es.index(index = 'megacorp', doc_type ='_doc',id=1,body = e1)
print(res)

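Run the script. The value created in the result field of the response shows that a new document was created. To inspect it in Kibana, open the Dev Tools console (you have to find it in the menu yourself) and enter the following in the left-hand panel: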

GET megacorp/_search

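In practice, specifying an id slows down imports, because Elasticsearch first has to check whether a document with that id already exists. Unless you need to control the id, leave it out and let Elasticsearch generate one: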

# Import Elasticsearch package
from elasticsearch import Elasticsearch
 
# Connect to the elastic cluster
es = Elasticsearch([{'host':'localhost','port':9200}])
print(es)
 
e1 = {
    "first_name":"nitin",
    "last_name":"panwar",
    "age": 27,
    "about": "Love to play cricket",
    "interests": ['sports','music'],
}
 
res = es.index(index = 'megacorp', doc_type ='_doc', body = e1)
print(res)

Run the code; result is created again. To see the new document and its auto-generated id, run this in Kibana:
GET megacorp/_search
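
When many documents are imported without ids, sending them one es.index() call at a time is slow. A possible alternative is the bulk helper that ships with the Python client. The sketch below assumes the 7.x elasticsearch package; make_actions and bulk_import are names invented for this example.

```python
def make_actions(index_name, docs):
    """Yield one bulk action per document, leaving out "_id" on purpose
    so that Elasticsearch generates the ids itself."""
    for doc in docs:
        yield {"_index": index_name, "_source": doc}


def bulk_import(client, index_name, docs):
    """Send the documents through elasticsearch.helpers.bulk.

    Returns the helper's result: (number of successes, list of errors).
    """
    # Imported here so make_actions stays usable without the package installed.
    from elasticsearch import helpers
    return helpers.bulk(client, make_actions(index_name, docs))


employees = [
    {"first_name": "Jane", "last_name": "Smith", "age": 32},
    {"first_name": "Douglas", "last_name": "Fir", "age": 35},
]
for action in make_actions("megacorp", employees):
    print(action)
```

With a running cluster, bulk_import(es, 'megacorp', employees) would index both documents through the bulk API.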

Insert more documents

# Import Elasticsearch package
from elasticsearch import Elasticsearch
 
# Connect to the elastic cluster
es = Elasticsearch([{'host':'localhost','port':9200}])
print(es)
 
e1 = {
    "first_name":"nitin",
    "last_name":"panwar",
    "age": 27,
    "about": "Love to play cricket",
    "interests": ['sports','music'],
}
 
res = es.index(index = 'megacorp', doc_type ='_doc', id = 1, body = e1)
print(res['result'])
 
e2 = {
    "first_name" :  "Jane",
    "last_name" :   "Smith",
    "age" :         32,
    "about" :       "I like to collect rock albums",
    "interests":  [ "music" ]
}
e3 = {
    "first_name" :  "Douglas",
    "last_name" :   "Fir",
    "age" :         35,
    "about":        "I like to build cabinets",
    "interests":  [ "forestry" ]
}
 
res = es.index(index = 'megacorp', doc_type = '_doc', id = 2,body = e2)
print(res['result'])
 
res = es.index(index = 'megacorp', doc_type = '_doc', id = 3, body = e3)
print(res['result'])

The first call returns updated, because a document with id 1 already exists and is overwritten. The other two documents are being created for the first time, so their result is created.
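
The difference between the two outcomes is also visible in the _version field of the response, which goes up each time a document with the same id is written again. The sketch below works on hand-written sample responses trimmed to the fields it reads; the function name summarize_index_response is an assumption for this example.

```python
def summarize_index_response(res):
    """Describe whether es.index() created a new document or replaced one."""
    doc_id, version = res["_id"], res["_version"]
    if res["result"] == "created":
        return "document %s created (version %d)" % (doc_id, version)
    if res["result"] == "updated":
        return "document %s overwritten, now at version %d" % (doc_id, version)
    return "document %s: unexpected result %r" % (doc_id, res["result"])


# Hand-written sample responses, not captured from a real cluster.
rerun_of_id_1 = {"_index": "megacorp", "_id": "1", "_version": 2, "result": "updated"}
first_time_id_2 = {"_index": "megacorp", "_id": "2", "_version": 1, "result": "created"}

print(summarize_index_response(rerun_of_id_1))
print(summarize_index_response(first_time_id_2))
```

Note that indexing with an existing id replaces the whole stored document, not just the fields that changed.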

Get a document

# Import Elasticsearch package
from elasticsearch import Elasticsearch
 
# Connect to the elastic cluster
es = Elasticsearch([{'host':'localhost','port':9200}])
print(es)
 
e1 = {
    "first_name":"nitin",
    "last_name":"panwar",
    "age": 27,
    "about": "Love to play cricket",
    "interests": ['sports','music'],
}
 
res = es.index(index = 'megacorp', doc_type ='_doc', id = 1, body = e1)
print(res['result'])
 
e2 = {
    "first_name" :  "Jane",
    "last_name" :   "Smith",
    "age" :         32,
    "about" :       "I like to collect rock albums",
    "interests":  [ "music" ]
}
e3 = {
    "first_name" :  "Douglas",
    "last_name" :   "Fir",
    "age" :         35,
    "about":        "I like to build cabinets",
    "interests":  [ "forestry" ]
}
 
res = es.index(index = 'megacorp', doc_type = '_doc', id = 2,body = e2)
print(res['result'])
 
res = es.index(index = 'megacorp', doc_type = '_doc', id = 3, body = e3)
print(res['result'])
 
res = es.get(index='megacorp', doc_type = '_doc', id = 3)
print(res)

Above, we added a call that fetches the document with id 3. Its _source field contains the document content we indexed earlier.
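
A get response wraps the stored document in metadata. The sketch below pulls out just the document and returns None when the response says it was not found. It runs on hand-written sample responses; extract_source is a name made up for this example. With the 7.x client, es.get() raises NotFoundError for a missing id by default; as far as I know, passing ignore=404 makes it return the response body with found set to False instead.

```python
def extract_source(res):
    """Return the stored document from an es.get() response, or None if absent."""
    if not res.get("found", False):
        return None
    return res["_source"]


# Hand-written sample responses, trimmed to the relevant fields.
found = {
    "_index": "megacorp", "_id": "3", "_version": 1, "found": True,
    "_source": {"first_name": "Douglas", "last_name": "Fir", "age": 35},
}
missing = {"_index": "megacorp", "_id": "99", "found": False}

print(extract_source(found))
print(extract_source(missing))
```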

Delete a document

res = es.delete(index = 'megacorp',doc_type='_doc', id = 3)
print(res['result'])

# Import Elasticsearch package
from elasticsearch import Elasticsearch
 
# Connect to the elastic cluster
es = Elasticsearch([{'host':'localhost','port':9200}])
print(es)
 
e1 = {
    "first_name":"nitin",
    "last_name":"panwar",
    "age": 27,
    "about": "Love to play cricket",
    "interests": ['sports','music'],
}
 
res = es.index(index = 'megacorp', doc_type ='_doc', id = 1, body = e1)
print(res['result'])
 
e2 = {
    "first_name" :  "Jane",
    "last_name" :   "Smith",
    "age" :         32,
    "about" :       "I like to collect rock albums",
    "interests":  [ "music" ]
}
e3 = {
    "first_name" :  "Douglas",
    "last_name" :   "Fir",
    "age" :         35,
    "about":        "I like to build cabinets",
    "interests":  [ "forestry" ]
}
 
res = es.index(index = 'megacorp', doc_type = '_doc', id = 2,body = e2)
print(res['result'])
 
res = es.index(index = 'megacorp', doc_type = '_doc', id = 3, body = e3)
print(res['result'])
 
res = es.get(index='megacorp', doc_type = '_doc', id = 3)
print(res)
 
# Delete a doc with id = 3
res = es.delete(index = 'megacorp',doc_type='_doc', id = 3)
print(res['result'])


Search documents (results are returned in the hits field)

# Import Elasticsearch package
from elasticsearch import Elasticsearch
 
# Connect to the elastic cluster
es = Elasticsearch([{'host':'localhost','port':9200}])
print(es)
 
e1 = {
    "first_name":"nitin",
    "last_name":"panwar",
    "age": 27,
    "about": "Love to play cricket",
    "interests": ['sports','music'],
}
 
res = es.index(index = 'megacorp', doc_type ='_doc', id = 1, body = e1)
print(res['result'])
 
e2 = {
    "first_name" :  "Jane",
    "last_name" :   "Smith",
    "age" :         32,
    "about" :       "I like to collect rock albums",
    "interests":  [ "music" ]
}
e3 = {
    "first_name" :  "Douglas",
    "last_name" :   "Fir",
    "age" :         35,
    "about":        "I like to build cabinets",
    "interests":  [ "forestry" ]
}
 
res = es.index(index = 'megacorp', doc_type = '_doc', id = 2,body = e2)
print(res['result'])
 
res = es.index(index = 'megacorp', doc_type = '_doc', id = 3, body = e3)
print(res['result'])
 
res = es.get(index='megacorp', doc_type = '_doc', id = 3)
print(res)
 
# Delete a doc with id = 3
res = es.delete(index = 'megacorp',doc_type='_doc', id = 3)
print(res['result'])
 
# Search all of the available documents
res = es.search(index = 'megacorp', body = {'query': {"match_all": {}} } )
print(res['hits'])
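
The hits field is nested: in Elasticsearch 7.x the total is an object with a value, and the matching documents sit in an inner hits list. The sketch below flattens that structure. The sample response is hand-written and trimmed, and summarize_hits is a name invented for this example.

```python
def summarize_hits(res):
    """Return (total, rows), where rows is a list of (id, score, document)."""
    total = res["hits"]["total"]["value"]
    rows = [(h["_id"], h["_score"], h["_source"]) for h in res["hits"]["hits"]]
    return total, rows


# Hand-written sample of a match_all response, trimmed to the fields used.
sample = {
    "hits": {
        "total": {"value": 2, "relation": "eq"},
        "max_score": 1.0,
        "hits": [
            {"_index": "megacorp", "_id": "1", "_score": 1.0,
             "_source": {"first_name": "nitin", "age": 27}},
            {"_index": "megacorp", "_id": "2", "_score": 1.0,
             "_source": {"first_name": "Jane", "age": 32}},
        ],
    }
}

total, rows = summarize_hits(sample)
print("total:", total)
for doc_id, score, doc in rows:
    print(doc_id, score, doc["first_name"])
```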

match operator: find documents in the megacorp index where first_name is nitin

# Search for a document with first_name = nitin
res= es.search(index = 'megacorp', body = {'query':{'match':{'first_name':'nitin'}}})
print(res['hits']['hits'])

bool operator

A bool query takes a dictionary containing at least one of must, should, and must_not. Each of these holds a list of match clauses or other search operators.

res= es.search(index = 'megacorp', body = {
        'query':{
            'bool':{
                'must':[{
                        'match':{
                            'first_name':'nitin'
                        }
                    }]
            }
        }
    })
print(res['hits']['hits'])
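
Because the request body is an ordinary Python dictionary, bool queries can be assembled by a small helper instead of being written out by hand. The sketch below only builds the dictionary; bool_query is a hypothetical helper, not part of the elasticsearch package.

```python
def bool_query(must=None, should=None, must_not=None):
    """Build a search body with a bool query from lists of clauses."""
    clauses = {}
    if must:
        clauses["must"] = list(must)          # every clause has to match
    if should:
        clauses["should"] = list(should)      # optional, raises the score
    if must_not:
        clauses["must_not"] = list(must_not)  # excludes matching documents
    return {"query": {"bool": clauses}}


# Same query as above: first_name must match "nitin".
body = bool_query(must=[{"match": {"first_name": "nitin"}}])
print(body)

# People interested in music, but not anyone named Jane.
body2 = bool_query(
    must=[{"match": {"interests": "music"}}],
    must_not=[{"match": {"first_name": "Jane"}}],
)
print(body2)
```

Either dictionary can be passed as the body argument: es.search(index='megacorp', body=body).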

The full script:

# Import Elasticsearch package
from elasticsearch import Elasticsearch
 
# Connect to the elastic cluster
es = Elasticsearch([{'host':'localhost','port':9200}])
#print(es)
 
e1 = {
    "first_name":"nitin",
    "last_name":"panwar",
    "age": 27,
    "about": "Love to play cricket",
    "interests": ['sports','music'],
}
 
res = es.index(index = 'megacorp', doc_type ='_doc', id = 1, body = e1)
#print(res['result'])
 
e2 = {
    "first_name" :  "Jane",
    "last_name" :   "Smith",
    "age" :         32,
    "about" :       "I like to collect rock albums",
    "interests":  [ "music" ]
}
e3 = {
    "first_name" :  "Douglas",
    "last_name" :   "Fir",
    "age" :         35,
    "about":        "I like to build cabinets",
    "interests":  [ "forestry" ]
}
 
res = es.index(index = 'megacorp', doc_type = '_doc', id = 2,body = e2)
#print(res['result'])
 
res = es.index(index = 'megacorp', doc_type = '_doc', id = 3, body = e3)
#print(res['result'])
 
res = es.get(index='megacorp', doc_type = '_doc', id = 3)
#print(res)
 
# Delete a doc with id = 3
res = es.delete(index = 'megacorp',doc_type='_doc', id = 3)
#print(res['result'])
 
# Search all of the available documents
res = es.search(index = 'megacorp', body = {'query': {"match_all": {}} } )
#print(res['hits'])
 
# Search for a document with first_name = nitin
res= es.search(index = 'megacorp', body = {'query':{'match':{'first_name':'nitin'}}})
#print(res['hits']['hits'])
 
res= es.search(index = 'megacorp', body = {
        'query':{
            'bool':{
                'must':[{
                        'match':{
                            'first_name':'nitin'
                        }
                    }]
            }
        }
    })
 
print(res['hits']['hits'])

filter operator: to filter by greater-than or less-than, use a range clause

res= es.search(index = 'megacorp', body = {
        'query':{
            'bool':{
                'must':{
                    'match':{
                        'first_name':'nitin'
                    }
                },
                "filter":{
                    "range":{
                        "age":{
                            "gt":25
                        }
                    }
                }
            }
        }
    })
 
print(res['hits']['hits'])
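
A range clause accepts gt, gte, lt, and lte, and several of them can be combined to express an interval. Clauses under filter restrict the results without contributing to the relevance score. The sketch below builds such bodies; range_filter and name_with_age_range are hypothetical helper names.

```python
def range_filter(field, gt=None, gte=None, lt=None, lte=None):
    """Build a range clause, keeping only the bounds that were given."""
    bounds = {"gt": gt, "gte": gte, "lt": lt, "lte": lte}
    return {"range": {field: {k: v for k, v in bounds.items() if v is not None}}}


def name_with_age_range(first_name, **bounds):
    """Match on first_name and filter on age, as in the query above."""
    return {
        "query": {
            "bool": {
                "must": {"match": {"first_name": first_name}},
                "filter": range_filter("age", **bounds),
            }
        }
    }


print(name_with_age_range("nitin", gt=25))   # same body as the query above
print(range_filter("age", gte=25, lt=35))    # 25 <= age < 35
```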

Full-text search: querying the about field

# Import Elasticsearch package
from elasticsearch import Elasticsearch
 
# Connect to the elastic cluster
es = Elasticsearch([{'host':'localhost','port':9200}])
#print(es)
 
e1 = {
    "first_name":"nitin",
    "last_name":"panwar",
    "age": 27,
    "about": "Love to play cricket",
    "interests": ['sports','music'],
}
 
res = es.index(index = 'megacorp', doc_type ='_doc', id = 1, body = e1)
#print(res['result'])
 
e2 = {
    "first_name" :  "Jane",
    "last_name" :   "Smith",
    "age" :         32,
    "about" :       "I like to collect rock albums",
    "interests":  [ "music" ]
}
e3 = {
    "first_name" :  "Douglas",
    "last_name" :   "Fir",
    "age" :         35,
    "about":        "I like to build cabinets",
    "interests":  [ "forestry" ]
}
 
res = es.index(index = 'megacorp', doc_type = '_doc', id = 2,body = e2)
#print(res['result'])
 
res = es.index(index = 'megacorp', doc_type = '_doc', id = 3, body = e3)
#print(res['result'])
 
res = es.get(index='megacorp', doc_type = '_doc', id = 3)
#print(res)
 
# Delete a doc with id = 3
res = es.delete(index = 'megacorp',doc_type='_doc', id = 3)
#print(res['result'])
 
# Search all of the available documents
res = es.search(index = 'megacorp', body = {'query': {"match_all": {}} } )
#print(res['hits'])
 
# Search for a document with first_name = nitin
res= es.search(index = 'megacorp', body = {'query':{'match':{'first_name':'nitin'}}})
#print(res['hits']['hits'])
 
res= es.search(index = 'megacorp', body = {
        'query':{
            'bool':{
                'must':[{
                        'match':{
                            'first_name':'nitin'
                        }
                    }]
            }
        }
    })
 
# print(res['hits']['hits'])
 
res= es.search(index = 'megacorp', body = {
        'query':{
            'bool':{
                'must':{
                    'match':{
                        'first_name':'nitin'
                    }
                },
                "filter":{
                    "range":{
                        "age":{
                            "gt":25
                        }
                    }
                }
            }
        }
    })
 
# print(res['hits']['hits'])
 
e4 = {
    "first_name":"asd",
    "last_name":"pafdfd",
    "age": 27,
    "about": "Love to play football",
    "interests": ['sports','music'],
}
 
res = es.index(index = 'megacorp', doc_type = '_doc', id = 4, body = e4)
print(res['result'])
 
res = es.search( index = 'megacorp', body = {
        'query':{
            'match':{
                "about":"play cricket"
            }
        }
    })
 
for hit in res['hits']['hits']:
    print(hit['_source']['about'])
    print(hit['_score'])
    print('**********************')

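The search returns every document whose about field contains play or cricket, so both the cricket and the football document appear. The document that contains both words gets the higher score.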

Phrase search: match the words in exact order

res= es.search(index = 'megacorp', body = {
        'query':{
            'match_phrase':{
                "about":"play cricket"
            }
        }
    })
 
for hit in res['hits']['hits']:
    print(hit['_source']['about'])
    print(hit['_score'])
    print('**********************')
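
To make the difference between match and match_phrase concrete, here is a deliberately simplified model in plain Python. It is not how Elasticsearch works internally: it splits on whitespace instead of using an analyzer and ignores scoring entirely. It only illustrates "any word matches" versus "all words, adjacent and in order". All function names are made up for this example.

```python
def tokens(text):
    """Very rough stand-in for an analyzer: lowercase and split on spaces."""
    return text.lower().split()


def match_any(text, query):
    """Like match with its default OR behavior: any query word may occur."""
    words = set(tokens(text))
    return any(t in words for t in tokens(query))


def match_phrase(text, query):
    """Like match_phrase: the query words must appear adjacent and in order."""
    doc, q = tokens(text), tokens(query)
    return any(doc[i:i + len(q)] == q for i in range(len(doc) - len(q) + 1))


abouts = [
    "Love to play cricket",
    "Love to play football",
    "I like to collect rock albums",
]
for about in abouts:
    print(about, "|", match_any(about, "play cricket"),
          "|", match_phrase(about, "play cricket"))
```

In this model the football sentence matches the plain query through the word play, but fails the phrase query because cricket does not follow it.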

Aggregations

Elasticsearch has a feature called aggregations, which lets you run complex analyses over your data. It is similar to GROUP BY in SQL, but more powerful.

res= es.search(index = 'megacorp', body = {
        "aggs": {
            "all_interests": {
            "terms": { "field": "interests.keyword" }
            }
        }
    })
 
print(res)
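
Printing the whole response buries the aggregation under the search hits. The counts live in the buckets list under aggregations, keyed by the aggregation name chosen in the request. The sketch below reads them from a hand-written sample response; the counts in it are illustrative, not captured output, and bucket_counts is a name invented for this example. Adding "size": 0 to the request body would suppress the hits if only the aggregation is wanted.

```python
def bucket_counts(res, agg_name="all_interests"):
    """Map each bucket key of a terms aggregation to its document count."""
    buckets = res["aggregations"][agg_name]["buckets"]
    return {b["key"]: b["doc_count"] for b in buckets}


# Hand-written sample response, trimmed to the aggregation part.
sample_agg = {
    "aggregations": {
        "all_interests": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "music", "doc_count": 3},
                {"key": "sports", "doc_count": 2},
            ],
        }
    }
}

for interest, count in bucket_counts(sample_agg).items():
    print(interest, count)
```

The request uses interests.keyword because a terms aggregation needs exact values; with default dynamic mapping, string fields get a keyword sub-field for that purpose.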

End of part one!