I. Two ways to integrate Elasticsearch (ES) in Python
1 Native client integration
from elasticsearch import Elasticsearch
obj = Elasticsearch()
'''
Without wrapping the body in 'doc' (this applies to update requests), an error is raised:
ActionRequestValidationException[Validation Failed: 1: script or doc is missing
'''
query = {'query': {'match': {'title': '十个'}}}
allDoc = obj.search(index='books', doc_type='_doc', body=query)
print(allDoc['hits']['hits'][0]['_source'])
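The lookup `allDoc['hits']['hits'][0]` above raises an IndexError when the query matches nothing. A minimal sketch of a more defensive way to read the results, assuming the response has the usual `hits.hits[]._source` shape. The helper name `extract_sources` and the sample response are hypothetical and exist only for illustration, so the snippet runs without a live cluster:

```python
def extract_sources(response):
    """Return the _source of every hit in a search response (possibly empty)."""
    hits = response.get("hits", {}).get("hits", [])
    return [hit["_source"] for hit in hits if "_source" in hit]


# Hypothetical response, shaped like the dict returned by obj.search(...)
sample_response = {
    "hits": {
        "total": 1,
        "hits": [
            {"_index": "books", "_id": "1", "_source": {"title": "sample title"}},
        ],
    }
}

sources = extract_sources(sample_response)
print(sources[0] if sources else "no match")
```

In the example above, `extract_sources(allDoc)` could stand in for the direct index access.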
2 Integration via elasticsearch-dsl
from datetime import datetime
from elasticsearch_dsl import Document, Date, Nested, Boolean, analyzer, InnerDoc, Completion, Keyword, Text, Integer
from elasticsearch_dsl.connections import connections

# Register the default connection used by Document and Search
connections.create_connection(hosts=["localhost"])


class Article(Document):
    title = Text(analyzer='ik_max_word')  # requires the IK analysis plugin
    author = Text()

    class Index:
        name = 'myindex'

    def save(self, **kwargs):
        return super(Article, self).save(**kwargs)


if __name__ == '__main__':
    s = Article.search()
    # Delete every document whose title matches "李清照"
    s = s.filter('match', title="李清照").delete()
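To make the last line less opaque: `s.filter('match', title=...)` wraps the match clause in a bool filter, and `.delete()` sends the result as a delete-by-query request. A minimal sketch of the roughly equivalent raw request body, built from plain dicts so it runs without a cluster. `filter_match_body` is a hypothetical helper, not part of elasticsearch_dsl:

```python
import json


def filter_match_body(field, text):
    """Build the body that a single .filter('match', field=text) call produces."""
    return {"query": {"bool": {"filter": [{"match": {field: text}}]}}}


body = filter_match_body("title", "李清照")
print(json.dumps(body, ensure_ascii=False, indent=2))
```

With the native client, a body like this could be passed to `delete_by_query(index='myindex', body=body)`, assuming a client version that still accepts `body`.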
II. Cluster setup (split brain)
- As long as the ES nodes can reach (ping) each other, they join the cluster automatically
# node1
cluster.name: my_es1
node.name: node1
network.host: 127.0.0.1
http.port: 9200
transport.tcp.port: 9300
discovery.zen.ping.unicast.hosts: ["127.0.0.1:9300", "127.0.0.1:9302", "127.0.0.1:9303", "127.0.0.1:9304"]

# node2
cluster.name: my_es1
node.name: node2
network.host: 127.0.0.1
http.port: 9202
transport.tcp.port: 9302
node.master: true
node.data: true
discovery.zen.ping.unicast.hosts: ["127.0.0.1:9300", "127.0.0.1:9302", "127.0.0.1:9303", "127.0.0.1:9304"]

# node3
cluster.name: my_es1
node.name: node3
network.host: 127.0.0.1
http.port: 9203
transport.tcp.port: 9303
discovery.zen.ping.unicast.hosts: ["127.0.0.1:9300", "127.0.0.1:9302", "127.0.0.1:9303", "127.0.0.1:9304"]

# node4
cluster.name: my_es1
node.name: node4
network.host: 127.0.0.1
http.port: 9204
transport.tcp.port: 9304
node.master: false
node.data: true
discovery.zen.ping.unicast.hosts: ["127.0.0.1:9300", "127.0.0.1:9302", "127.0.0.1:9303", "127.0.0.1:9304"]
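As the configuration above shows, all nodes share the same cluster name, my_es1. Because they all run on one local machine, each node needs its own node name (and its own HTTP and transport ports). We start them one by one; they introduce each other through the unicast host list, discover one another, and form the my_es1 cluster. Which node becomes the master depends on which master-eligible node starts first; node4 sets node.master: false, so it can never be elected.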
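Since the four configurations differ only in node name, ports, and the optional role flags, they can be generated from one template. The following is a minimal sketch under that assumption. `render_node_config` and `SEEDS` are hypothetical names introduced here; the setting keys and values are the ones listed above:

```python
SEEDS = ["127.0.0.1:9300", "127.0.0.1:9302", "127.0.0.1:9303", "127.0.0.1:9304"]


def render_node_config(name, http_port, transport_port,
                       master=None, data=None, cluster="my_es1", seeds=SEEDS):
    """Render the settings for one local node as elasticsearch.yml text."""
    lines = [
        f"cluster.name: {cluster}",
        f"node.name: {name}",
        "network.host: 127.0.0.1",
        f"http.port: {http_port}",
        f"transport.tcp.port: {transport_port}",
    ]
    # Role flags are written only when given explicitly, as in the notes above
    for key, value in (("node.master", master), ("node.data", data)):
        if value is not None:
            lines.append(f"{key}: {'true' if value else 'false'}")
    hosts = ", ".join(f'"{host}"' for host in seeds)
    lines.append(f"discovery.zen.ping.unicast.hosts: [{hosts}]")
    return "\n".join(lines)


print(render_node_config("node4", 9204, 9304, master=False, data=True))
```

Every node receives the same seed list, which is what lets them find each other regardless of start order.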
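- Split brain: because of network problems or jitter, the nodes may fail to discover each other and split into groups (for example, one group of 3 nodes and one group of 4), so two separate clusters form
- Preventing split brain
To prevent split brain, set the minimum number of master nodes the cluster requires to (total number of nodes / 2 + 1), using integer division. Strictly speaking, Elasticsearch's rule counts only master-eligible nodes. For the four nodes above: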
discovery.zen.minimum_master_nodes: 3
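The arithmetic behind this setting can be checked with a short sketch. It assumes every node counted is master-eligible; the function names are hypothetical and used only for illustration:

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum size: more than half of the master-eligible nodes."""
    return master_eligible_nodes // 2 + 1


def partitions_able_to_elect(partition_sizes, quorum):
    """Return the partitions that still see enough nodes to elect a master."""
    return [size for size in partition_sizes if size >= quorum]


quorum = minimum_master_nodes(4)
print(quorum)                                    # 3
print(partitions_able_to_elect([3, 1], quorum))  # [3]
print(partitions_able_to_elect([2, 2], quorum))  # []
```

Because the quorum is more than half of the nodes, at most one side of a network partition can reach it, so two masters cannot be elected at the same time.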
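III. Scoring mechanism
1 The process of determining how relevant a document is to a query is called scoring
2 `TF` is term frequency: the number of times a term appears in a document; the more often it appears, the higher the relevance
3 `IDF` is inverse document frequency: the more distinct documents in the index a term appears in, the less important that term is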
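The two factors can be illustrated with a small sketch. It is simplified from Lucene's classic TF-IDF similarity (tf as the square root of the term count, idf as 1 + ln(numDocs / (docFreq + 1))) and leaves out field-length normalization and boosts. Recent Elasticsearch versions use BM25 by default, so real scores will differ. The toy documents and function names are assumptions made for illustration:

```python
import math


def tf(term, doc_tokens):
    """Term frequency factor: grows with the count of the term in one document."""
    return math.sqrt(doc_tokens.count(term))


def idf(term, docs):
    """Inverse document frequency: shrinks as more documents contain the term."""
    doc_freq = sum(1 for doc in docs if term in doc)
    return 1.0 + math.log(len(docs) / (doc_freq + 1))


def score(term, doc_tokens, docs):
    """Simplified relevance of one document for a single-term query."""
    return tf(term, doc_tokens) * idf(term, docs)


# Toy index: three already-tokenized documents (hypothetical data)
docs = [
    ["elasticsearch", "python", "python"],
    ["python", "search"],
    ["cluster", "python"],
]

for term in ("python", "cluster"):
    print(term, [round(score(term, doc, docs), 3) for doc in docs])
```

Here "python" occurs in every document, so its idf is lower than that of "cluster", which occurs in only one; within the first document, the repeated "python" raises tf.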