Create a Python project (Python and Docker must be installed on your machine)
Installing Docker
Create the directory
mkdir python-elasticsearch
cd python-elasticsearch
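Install the elasticsearch package with pip. The examples in this article use the 7.x client API (the doc_type and body arguments), so pin the client below version 8:
pip3 install "elasticsearch>=7,<8"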
Install Elasticsearch and Kibana
In the project folder, create a file named docker-compose.yml
---
version: "3"
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.10.0
    container_name: es01
    environment:
      - node.name=es01
      - cluster.name=docker-cluster
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - discovery.type=single-node
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - esdata:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
  kibana:
    image: docker.elastic.co/kibana/kibana:7.10.0
    ports:
      - 5601:5601
    depends_on:
      - elasticsearch
volumes:
  esdata:
    driver: local
Run the following on the command line:
docker-compose up
Verify that the installation succeeded by opening:
http://localhost:9200
http://localhost:5601
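The same check can be scripted instead of done in a browser. The following is a minimal sketch using only the standard library; the helper name is_reachable is hypothetical, and the two URLs are the ones listed above. The containers can take a while to start, so "not reachable yet" right after docker-compose up is expected.

```python
import urllib.error
import urllib.request


def is_reachable(url, timeout=3.0):
    """Return True if an HTTP server answers at url, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status.
        return True
    except OSError:
        # Connection refused, timeout, DNS failure, and so on.
        return False


if __name__ == "__main__":
    for url in ("http://localhost:9200", "http://localhost:5601"):
        state = "reachable" if is_reachable(url) else "not reachable yet"
        print(url, state)
```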
Connect to Elasticsearch from Python
# Import Elasticsearch package
from elasticsearch import Elasticsearch
# Connect to the elastic cluster
es = Elasticsearch([{'host':'localhost','port':9200}])
print(es)
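Create an index and import documents
Elasticsearch is document-oriented, which means it stores entire objects, or documents. It does not just store them; it also indexes the content of every document to make it searchable. In Elasticsearch you can index, search, sort, and filter documents.
Elasticsearch uses JSON as the serialization format for documents. Now let's start indexing employee documents. The act of storing data in Elasticsearch is called indexing. An Elasticsearch cluster can contain multiple indices, and each index in turn contains a type. A type holds multiple documents, and each document has multiple fields. If you want to learn more about these concepts, please read my earlier article "Some important concepts in Elasticsearch: cluster, node, index, document, shards and replica".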
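To make the "document is a JSON object with fields" idea concrete, here is a small sketch that needs no running cluster. It only shows how the employee dict used throughout this article looks once serialized; the variable names are illustrative.

```python
import json

# A document is a JSON object; in Python it starts life as a plain dict.
employee = {
    "first_name": "nitin",
    "last_name": "panwar",
    "age": 27,
    "about": "Love to play cricket",
    "interests": ["sports", "music"],
}

# Roughly what is sent to Elasticsearch when the document is indexed.
payload = json.dumps(employee)
print(payload)

# Each top-level key is one field of the document.
print(sorted(employee))
```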
# Import Elasticsearch package
from elasticsearch import Elasticsearch
# Connect to the elastic cluster
es = Elasticsearch([{'host':'localhost','port':9200}])
print(es)
e1 = {
"first_name":"nitin",
"last_name":"panwar",
"age": 27,
"about": "Love to play cricket",
"interests": ['sports','music'],
}
res = es.index(index = 'megacorp', doc_type ='_doc',id=1,body = e1)
print(res)
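Run the script. The value created in the result field shows that a new document has been created. You can inspect it in Kibana:
You need to run the search yourself:
In the left pane of the Kibana console, enter: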
GET megacorp/_search
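In practice, specifying an id slows down indexing, because Elasticsearch first has to check whether a document with that id already exists. So we leave the id out and let Elasticsearch generate one: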
# Import Elasticsearch package
from elasticsearch import Elasticsearch
# Connect to the elastic cluster
es = Elasticsearch([{'host':'localhost','port':9200}])
print(es)
e1 = {
"first_name":"nitin",
"last_name":"panwar",
"age": 27,
"about": "Love to play cricket",
"interests": ['sports','music'],
}
res = es.index(index = 'megacorp', doc_type ='_doc', body = e1)
print(res)
Run the code; result is again created. Check it in Kibana:
GET megacorp/_search
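When many documents are imported without ids, one option is to prepare them as bulk actions. The sketch below only builds the action dicts and runs without a cluster; make_actions is a hypothetical helper, and the commented lines at the end assume the 7.x client's elasticsearch.helpers.bulk function and a cluster on localhost.

```python
def make_actions(index_name, docs):
    """Yield one bulk action per document, leaving out _id on purpose."""
    for doc in docs:
        yield {"_index": index_name, "_source": doc}


docs = [
    {"first_name": "Jane", "last_name": "Smith", "age": 32},
    {"first_name": "Douglas", "last_name": "Fir", "age": 35},
]
actions = list(make_actions("megacorp", docs))
print(actions)

# With a running cluster and the 7.x client this could be sent as:
#   from elasticsearch import Elasticsearch, helpers
#   es = Elasticsearch([{'host': 'localhost', 'port': 9200}])
#   helpers.bulk(es, actions)
```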
Insert more documents
# Import Elasticsearch package
from elasticsearch import Elasticsearch
# Connect to the elastic cluster
es = Elasticsearch([{'host':'localhost','port':9200}])
print(es)
e1 = {
"first_name":"nitin",
"last_name":"panwar",
"age": 27,
"about": "Love to play cricket",
"interests": ['sports','music'],
}
res = es.index(index = 'megacorp', doc_type ='_doc', id = 1, body = e1)
print(res['result'])
e2 = {
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests": [ "music" ]
}
e3 = {
"first_name" : "Douglas",
"last_name" : "Fir",
"age" : 35,
"about": "I like to build cabinets",
"interests": [ "forestry" ]
}
res = es.index(index = 'megacorp', doc_type = '_doc', id = 2,body = e2)
print(res['result'])
res = es.index(index = 'megacorp', doc_type = '_doc', id = 3, body = e3)
print(res['result'])
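A document with id 1 already exists when the first document is indexed, so indexing it again returns updated. The other two documents are created for the first time, so they return created.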
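The difference between created and updated can also be handled in code. The sketch below works on hand-written, trimmed response dicts rather than real cluster output, so the _version values are illustrative assumptions; describe is a hypothetical helper.

```python
def describe(res):
    """Summarize an index response by its result and _version fields."""
    if res["result"] == "created":
        return "new document {}".format(res["_id"])
    return "document {} overwritten, now at version {}".format(
        res["_id"], res["_version"]
    )


# Trimmed, hand-written examples of what es.index() returns.
first_time = {"_index": "megacorp", "_id": "2", "_version": 1, "result": "created"}
second_time = {"_index": "megacorp", "_id": "1", "_version": 2, "result": "updated"}

print(describe(first_time))
print(describe(second_time))
```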
Get a document
# Import Elasticsearch package
from elasticsearch import Elasticsearch
# Connect to the elastic cluster
es = Elasticsearch([{'host':'localhost','port':9200}])
print(es)
e1 = {
"first_name":"nitin",
"last_name":"panwar",
"age": 27,
"about": "Love to play cricket",
"interests": ['sports','music'],
}
res = es.index(index = 'megacorp', doc_type ='_doc', id = 1, body = e1)
print(res['result'])
e2 = {
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests": [ "music" ]
}
e3 = {
"first_name" : "Douglas",
"last_name" : "Fir",
"age" : 35,
"about": "I like to build cabinets",
"interests": [ "forestry" ]
}
res = es.index(index = 'megacorp', doc_type = '_doc', id = 2,body = e2)
print(res['result'])
res = es.index(index = 'megacorp', doc_type = '_doc', id = 3, body = e3)
print(res['result'])
res = es.get(index='megacorp', doc_type = '_doc', id = 3)
print(res)
Above, we added a call that fetches the document with id 3. The _source field of the response contains the document we indexed earlier:
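Since the original document sits under _source, individual fields are read from there. A minimal sketch follows, using a hand-written response shaped like the one es.get() returns for id 3; full_name is a hypothetical helper.

```python
def full_name(res):
    """Build a display name from the _source of a get response."""
    source = res["_source"]
    return "{} {}".format(source["first_name"], source["last_name"])


# Hand-written example shaped like the response for id 3.
sample_get = {
    "_index": "megacorp",
    "_type": "_doc",
    "_id": "3",
    "found": True,
    "_source": {
        "first_name": "Douglas",
        "last_name": "Fir",
        "age": 35,
        "about": "I like to build cabinets",
        "interests": ["forestry"],
    },
}

print(full_name(sample_get))
```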
Delete a document
res = es.delete(index = 'megacorp',doc_type='_doc', id = 3)
print(res['result'])
# Import Elasticsearch package
from elasticsearch import Elasticsearch
# Connect to the elastic cluster
es = Elasticsearch([{'host':'localhost','port':9200}])
print(es)
e1 = {
"first_name":"nitin",
"last_name":"panwar",
"age": 27,
"about": "Love to play cricket",
"interests": ['sports','music'],
}
res = es.index(index = 'megacorp', doc_type ='_doc', id = 1, body = e1)
print(res['result'])
e2 = {
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests": [ "music" ]
}
e3 = {
"first_name" : "Douglas",
"last_name" : "Fir",
"age" : 35,
"about": "I like to build cabinets",
"interests": [ "forestry" ]
}
res = es.index(index = 'megacorp', doc_type = '_doc', id = 2,body = e2)
print(res['result'])
res = es.index(index = 'megacorp', doc_type = '_doc', id = 3, body = e3)
print(res['result'])
res = es.get(index='megacorp', doc_type = '_doc', id = 3)
print(res)
# Delete a doc with id = 3
res = es.delete(index = 'megacorp',doc_type='_doc', id = 3)
print(res['result'])
Search for documents; the matches are returned in the hits field.
# Import Elasticsearch package
from elasticsearch import Elasticsearch
# Connect to the elastic cluster
es = Elasticsearch([{'host':'localhost','port':9200}])
print(es)
e1 = {
"first_name":"nitin",
"last_name":"panwar",
"age": 27,
"about": "Love to play cricket",
"interests": ['sports','music'],
}
res = es.index(index = 'megacorp', doc_type ='_doc', id = 1, body = e1)
print(res['result'])
e2 = {
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests": [ "music" ]
}
e3 = {
"first_name" : "Douglas",
"last_name" : "Fir",
"age" : 35,
"about": "I like to build cabinets",
"interests": [ "forestry" ]
}
res = es.index(index = 'megacorp', doc_type = '_doc', id = 2,body = e2)
print(res['result'])
res = es.index(index = 'megacorp', doc_type = '_doc', id = 3, body = e3)
print(res['result'])
res = es.get(index='megacorp', doc_type = '_doc', id = 3)
print(res)
# Delete a doc with id = 3
res = es.delete(index = 'megacorp',doc_type='_doc', id = 3)
print(res['result'])
# Search all of the available documents
res = es.search(index = 'megacorp', body = {'query': {"match_all": {}} } )
print(res['hits'])
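The hits field is nested: hits.total holds the match count and hits.hits holds the matching documents. The sketch below walks a hand-written response in the 7.x shape, so the values are illustrative only; first_names is a hypothetical helper.

```python
def first_names(res):
    """Collect first_name from every hit in a search response."""
    return [hit["_source"]["first_name"] for hit in res["hits"]["hits"]]


# Hand-written example in the shape of a 7.x search response.
sample_search = {
    "hits": {
        "total": {"value": 2, "relation": "eq"},
        "max_score": 1.0,
        "hits": [
            {"_id": "1", "_score": 1.0, "_source": {"first_name": "nitin"}},
            {"_id": "2", "_score": 1.0, "_source": {"first_name": "Jane"}},
        ],
    }
}

print(sample_search["hits"]["total"]["value"])
print(first_names(sample_search))
```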
The match operator: search the megacorp index for documents whose first_name is nitin
# Search for a document with first_name = nitin
res= es.search(index = 'megacorp', body = {'query':{'match':{'first_name':'nitin'}}})
print(res['hits']['hits'])
The bool operator
bool takes a dictionary containing at least one of must, should, and must_not. Each of these holds a list of match clauses or other search operators.
res= es.search(index = 'megacorp', body = {
'query':{
'bool':{
'must':[{
'match':{
'first_name':'nitin'
}
}]
}
}
})
print(res['hits']['hits'])
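The query above uses only must. As a sketch of how the other clauses fit in, the hypothetical helper bool_query below assembles a body from any combination of must, should, and must_not. It only builds the dict; passing the result as body to es.search() is left to the reader.

```python
def bool_query(must=None, should=None, must_not=None):
    """Build a bool query body from the clause lists that were given."""
    clauses = {"must": must, "should": should, "must_not": must_not}
    body = {name: items for name, items in clauses.items() if items}
    if not body:
        raise ValueError("at least one of must, should, must_not is required")
    return {"query": {"bool": body}}


query = bool_query(
    must=[{"match": {"first_name": "nitin"}}],
    must_not=[{"match": {"last_name": "smith"}}],
)
print(query)
```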
The full code is as follows:
# Import Elasticsearch package
from elasticsearch import Elasticsearch
# Connect to the elastic cluster
es = Elasticsearch([{'host':'localhost','port':9200}])
#print(es)
e1 = {
"first_name":"nitin",
"last_name":"panwar",
"age": 27,
"about": "Love to play cricket",
"interests": ['sports','music'],
}
res = es.index(index = 'megacorp', doc_type ='_doc', id = 1, body = e1)
#print(res['result'])
e2 = {
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests": [ "music" ]
}
e3 = {
"first_name" : "Douglas",
"last_name" : "Fir",
"age" : 35,
"about": "I like to build cabinets",
"interests": [ "forestry" ]
}
res = es.index(index = 'megacorp', doc_type = '_doc', id = 2,body = e2)
#print(res['result'])
res = es.index(index = 'megacorp', doc_type = '_doc', id = 3, body = e3)
#print(res['result'])
res = es.get(index='megacorp', doc_type = '_doc', id = 3)
#print(res)
# Delete a doc with id = 3
res = es.delete(index = 'megacorp',doc_type='_doc', id = 3)
#print(res['result'])
# Search all of the available documents
res = es.search(index = 'megacorp', body = {'query': {"match_all": {}} } )
#print(res['hits'])
# Search for a document with first_name = nitin
res= es.search(index = 'megacorp', body = {'query':{'match':{'first_name':'nitin'}}})
#print(res['hits']['hits'])
res= es.search(index = 'megacorp', body = {
'query':{
'bool':{
'must':[{
'match':{
'first_name':'nitin'
}
}]
}
}
})
print(res['hits']['hits'])
The filter operator: to filter by greater-than or less-than, use a range clause on the field
res= es.search(index = 'megacorp', body = {
'query':{
'bool':{
'must':{
'match':{
'first_name':'nitin'
}
},
"filter":{
"range":{
"age":{
"gt":25
}
}
}
}
}
})
print(res['hits']['hits'])
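A range clause accepts gt, gte, lt, and lte bounds, and several can be combined. The sketch below builds such a clause with a hypothetical helper, range_clause, and places it in the filter slot of a bool query like the one above. The bounds 25 and 35 are arbitrary example values.

```python
ALLOWED_BOUNDS = {"gt", "gte", "lt", "lte"}


def range_clause(field, **bounds):
    """Build a range clause such as {"range": {"age": {"gt": 25}}}."""
    unknown = set(bounds) - ALLOWED_BOUNDS
    if unknown or not bounds:
        raise ValueError("use one or more of gt, gte, lt, lte")
    return {"range": {field: bounds}}


body = {
    "query": {
        "bool": {
            "must": {"match": {"first_name": "nitin"}},
            # Older than 25 and at most 35.
            "filter": range_clause("age", gt=25, lte=35),
        }
    }
}
print(body)
```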
Full-text search on the about field
# Import Elasticsearch package
from elasticsearch import Elasticsearch
# Connect to the elastic cluster
es = Elasticsearch([{'host':'localhost','port':9200}])
#print(es)
e1 = {
"first_name":"nitin",
"last_name":"panwar",
"age": 27,
"about": "Love to play cricket",
"interests": ['sports','music'],
}
res = es.index(index = 'megacorp', doc_type ='_doc', id = 1, body = e1)
#print(res['result'])
e2 = {
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests": [ "music" ]
}
e3 = {
"first_name" : "Douglas",
"last_name" : "Fir",
"age" : 35,
"about": "I like to build cabinets",
"interests": [ "forestry" ]
}
res = es.index(index = 'megacorp', doc_type = '_doc', id = 2,body = e2)
#print(res['result'])
res = es.index(index = 'megacorp', doc_type = '_doc', id = 3, body = e3)
#print(res['result'])
res = es.get(index='megacorp', doc_type = '_doc', id = 3)
#print(res)
# Delete a doc with id = 3
res = es.delete(index = 'megacorp',doc_type='_doc', id = 3)
#print(res['result'])
# Search all of the available documents
res = es.search(index = 'megacorp', body = {'query': {"match_all": {}} } )
#print(res['hits'])
# Search for a document with first_name = nitin
res= es.search(index = 'megacorp', body = {'query':{'match':{'first_name':'nitin'}}})
#print(res['hits']['hits'])
res= es.search(index = 'megacorp', body = {
'query':{
'bool':{
'must':[{
'match':{
'first_name':'nitin'
}
}]
}
}
})
# print(res['hits']['hits'])
res= es.search(index = 'megacorp', body = {
'query':{
'bool':{
'must':{
'match':{
'first_name':'nitin'
}
},
"filter":{
"range":{
"age":{
"gt":27
}
}
}
}
}
})
# print(res['hits']['hits'])
e4 = {
"first_name":"asd",
"last_name":"pafdfd",
"age": 27,
"about": "Love to play football",
"interests": ['sports','music'],
}
res = es.index(index = 'megacorp', doc_type = '_doc', id = 4, body = e4)
print(res['result'])
res = es.search( index = 'megacorp', body = {
'query':{
'match':{
"about":"play cricket"
}
}
})
for hit in res['hits']['hits']:
print(hit['_source']['about'])
print(hit['_score'])
print('**********************')
The search results are as follows:
Phrase search: matches the terms in their exact order; only 3 results came back here
res= es.search(index = 'megacorp', body = {
'query':{
'match_phrase':{
"about":"play cricket"
}
}
})
for hit in res['hits']['hits']:
print(hit['_source']['about'])
print(hit['_score'])
print('**********************')
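To see why match returns more documents than match_phrase, here is a deliberately simplified local model. It is not how Elasticsearch analyzes or scores text; it only assumes lowercase whitespace tokens, "any term present" for match, and "terms adjacent and in order" for match_phrase. All function names are hypothetical.

```python
def tokens(text):
    """Very rough stand-in for an analyzer: lowercase and split on spaces."""
    return text.lower().split()


def matches_any_term(text, query):
    """Simplified match: at least one query term occurs in the text."""
    words = set(tokens(text))
    return any(term in words for term in tokens(query))


def matches_phrase(text, query):
    """Simplified match_phrase: query terms occur adjacent and in order."""
    words, phrase = tokens(text), tokens(query)
    size = len(phrase)
    return any(words[i:i + size] == phrase for i in range(len(words) - size + 1))


abouts = ["Love to play cricket", "Love to play football"]
for about in abouts:
    print(about, matches_any_term(about, "play cricket"),
          matches_phrase(about, "play cricket"))
```

In this model both texts satisfy match because both contain "play", but only the first satisfies the phrase.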
Aggregations
Elasticsearch has a feature called aggregations that lets you run complex analyses over your data. It is similar to GROUP BY in SQL, but more powerful.
res= es.search(index = 'megacorp', body = {
"aggs": {
"all_interests": {
"terms": { "field": "interests.keyword" }
}
}
})
print(res)
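What the terms aggregation computes can be reproduced locally for the three employee documents from this article. This is a plain-Python sketch, not the cluster's implementation; it mirrors the key and doc_count fields that terms buckets carry, and it ignores documents indexed without an id or deleted earlier.

```python
from collections import Counter

employees = [
    {"first_name": "nitin", "interests": ["sports", "music"]},
    {"first_name": "Jane", "interests": ["music"]},
    {"first_name": "Douglas", "interests": ["forestry"]},
]

# Count each interest once per document, like doc_count does.
counts = Counter(
    interest for person in employees for interest in set(person["interests"])
)

# Shape the result like terms buckets, most frequent first.
buckets = [{"key": key, "doc_count": n} for key, n in counts.most_common()]
print(buckets)
```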
End of part one!