Trying Out Elasticsearch

wendyw1999

Category: Databases. Tags: python, elasticsearch, big data

Article link: https://blog.csdn.net/wendyw1999/article/details/105971624

Downloading Elasticsearch and Running It for the First Time

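Acknowledgements

This article draws on the following sites and bloggers:
an English-language blogger
a CSDN blogger
a Baidu Netdisk installer package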

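Installation and Testing

If you are outside China, you can download the installer directly from the official website.
Downloads from within China are slow, so I recommend the Baidu Netdisk package shared by this blogger.
Once the download finishes, unzip the file (unzip / extract all).
After unzipping, if you are on Windows, double-click elasticsearch.bat in the bin folder.
It starts running; once "security is disabled" shows up in the output, the installation has succeeded.
You have to start elasticsearch.bat every time you want to use Elasticsearch, otherwise nothing can connect to it.
To check whether the installation worked, open http://localhost:9200/ in a browser (note the trailing /). If the page shows something like the following, the installation succeeded.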

{
  "name" : "DESKTOP-8VIS1PG",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "XiR7TDQLTrK78pzIUq49YQ",
  "version" : {
    "number" : "7.6.0",
    "build_flavor" : "default",
    "build_type" : "zip",
    "build_hash" : "7f634e9f44834fbc12724506cc1da681b0c3b1e3",
    "build_date" : "2020-02-06T00:09:00.449973Z",
    "build_snapshot" : false,
    "lucene_version" : "8.4.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
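The same check can be scripted instead of done in a browser. Below is a minimal sketch using only the standard library; the function names fetch_root and summarize_root are my own, and SAMPLE is a trimmed copy of the response above so that the parsing step runs even when no node is up.

```python
import json
import urllib.request


def fetch_root(url="http://localhost:9200/"):
    """Return the raw JSON text served at the node's root endpoint."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


def summarize_root(text):
    """Pull the cluster name and version number out of the root response."""
    info = json.loads(text)
    return info["cluster_name"], info["version"]["number"]


# Trimmed copy of the response shown above (not a live call).
SAMPLE = (
    '{"cluster_name": "elasticsearch", '
    '"version": {"number": "7.6.0"}, '
    '"tagline": "You Know, for Search"}'
)

print(summarize_root(SAMPLE))
```

With elasticsearch.bat running, summarize_root(fetch_root()) should report the values of your own node.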

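Python Implementation

Creating and Deleting an Index

The first step is to create an index. In Elasticsearch, an index is roughly what a database is in an RDBMS, so don't confuse "index" in this sense with indexing.

First make sure the elasticsearch package is installed through pip. The calls used in this article (doc_type, body, ignore) match the 7.x Python client, so a 7.x version of the package is the safe choice. If it is not installed, run this in a terminal: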

pip install elasticsearch

Import it in your Python file:

from elasticsearch import Elasticsearch

Then create an index:

es = Elasticsearch()
result = es.indices.create(index='test', ignore=[400])
print(result)

ignore=[400] tells the client not to raise an exception when the server answers with HTTP 400. Normally the print shows the following:

{'acknowledged': True, 'shards_acknowledged': True, 'index': 'test'}

We can open http://localhost:9200/test in a browser; the page looks like this:

{"test":{"aliases":{},"mappings":{},"settings":{"index":{"creation_date":"1588834593270","number_of_shards":"1","number_of_replicas":"1","uuid":"czE1_n-EREC9AOA2FahZtQ","version":{"created":"7060099"},"provided_name":"test"}}}}

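If we create an index named 'test' again, the request fails, because test already exists and cannot be created twice. The error is ignored rather than raised, but the error message is still visible through print.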

result = es.indices.create(index='test', ignore=[400])
print(result)

{'error': {'root_cause': [{'type': 'resource_already_exists_exception', 'reason': 'index [test/czE1_n-EREC9AOA2FahZtQ] already exists', 'index_uuid': 'czE1_n-EREC9AOA2FahZtQ', 'index': 'test'}], 'type': 'resource_already_exists_exception', 'reason': 'index [test/czE1_n-EREC9AOA2FahZtQ] already exists', 'index_uuid': 'czE1_n-EREC9AOA2FahZtQ', 'index': 'test'}, 'status': 400}
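Since ignore=[400] hands the error back as an ordinary dictionary, the calling code has to inspect it itself. Here is a small sketch of one way to do that; index_already_exists is a helper name I made up, and the two dictionaries are shortened copies of the outputs shown above.

```python
def index_already_exists(result):
    """True if a create-index response says the index already exists."""
    error = result.get("error")
    if not isinstance(error, dict):
        return False
    return error.get("type") == "resource_already_exists_exception"


# Shortened copies of the two responses printed above.
created = {'acknowledged': True, 'shards_acknowledged': True, 'index': 'test'}
duplicate = {
    'error': {
        'type': 'resource_already_exists_exception',
        'reason': 'index [test/czE1_n-EREC9AOA2FahZtQ] already exists',
        'index': 'test',
    },
    'status': 400,
}

print(index_already_exists(created), index_already_exists(duplicate))
```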

Deleting the index

result = es.indices.delete(index='test', ignore=[400])
print(result)

{'acknowledged': True}

Inserting Data

Method 1: create

data={'title':'百度','url':'http://www.baidu.com'}
result=es.create(index='test',doc_type='search',id=1,body=data)
print(result)

{'_index': 'test', '_type': 'search', '_id': '1', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 0, '_primary_term': 1}

Method 2: index

es.index(index='test', doc_type='search', body=data)

{'_index': 'test',
 '_type': 'search',
 '_id': '6uUL7nEBRTagxeCBtu4b',
 '_version': 1,
 'result': 'created',
 '_shards': {'total': 2, 'successful': 1, 'failed': 0},
 '_seq_no': 1,
 '_primary_term': 1}

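According to this blogger, the difference between the two is:

A result field of created means the document was inserted successfully.
We can also insert data with index(). Unlike create(), which requires an id to uniquely identify the document, index() does not; if no id is given, one is generated automatically.

Here doc_type plays roughly the role of a table name, and _version is the version number of the document.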

Updating & Deleting

Updating a document

data={'title':'百度','url':'https://www.baidu.com','date':'2011-12-16'}
es.index(index='test', doc_type='search', body=data, id=1)

{'_index': 'test',
 '_type': 'search',
 '_id': '1',
 '_version': 2,
 'result': 'updated',
 '_shards': {'total': 2, 'successful': 1, 'failed': 0},
 '_seq_no': 2,
 '_primary_term': 1}

Deleting by id

result = es.delete(index='test', doc_type='search', id=1)
print(result)

{'_index': 'test', '_type': 'search', '_id': '1', '_version': 3, 'result': 'deleted', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 3, '_primary_term': 1}
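To make the create / index / delete behaviour and the _version counter easier to see, here is a toy in-memory model. It is not the Elasticsearch client and leaves out almost everything a real cluster does; ToyIndex and its auto-N ids are invented for illustration only. The one rule I am adding from outside the text is that create() on an existing id is rejected (a real cluster answers with a 409 conflict).

```python
class ToyIndex:
    """In-memory stand-in for one index; NOT the Elasticsearch client."""

    def __init__(self):
        self._docs = {}       # id -> (version, document)
        self._next_auto = 0   # counter for made-up automatic ids

    def create(self, doc_id, body):
        # create() insists on a new id
        if doc_id in self._docs:
            raise KeyError(f"document {doc_id!r} already exists")
        self._docs[doc_id] = (1, body)
        return {"_id": doc_id, "_version": 1, "result": "created"}

    def index(self, body, doc_id=None):
        # index() generates an id when none is given, and overwrites otherwise
        if doc_id is None:
            self._next_auto += 1
            doc_id = f"auto-{self._next_auto}"
        old_version = self._docs.get(doc_id, (0, None))[0]
        self._docs[doc_id] = (old_version + 1, body)
        result = "updated" if old_version else "created"
        return {"_id": doc_id, "_version": old_version + 1, "result": result}

    def delete(self, doc_id):
        # deleting also counts as a new version of the document
        version, _ = self._docs.pop(doc_id)
        return {"_id": doc_id, "_version": version + 1, "result": "deleted"}


store = ToyIndex()
first = store.create("1", {"title": "百度", "url": "http://www.baidu.com"})
second = store.index({"title": "百度", "url": "https://www.baidu.com"}, doc_id="1")
third = store.delete("1")
print(first["_version"], second["result"], third["_version"])
```

The three calls mirror the sequence in this article: version 1 on create, version 2 with result updated on index, version 3 on delete.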

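Searching

To work with Chinese text you first need to install a plugin (the mapping below uses the ik_max_word analyzer, which comes from the IK analysis plugin); see this blogger for download and installation steps.
Then we recreate the index, this time setting up the mapping in advance.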

mapping={'properties':{'title':{'type':'text','analyzer':'ik_max_word','search_analyzer':'ik_max_word'}}}
es.indices.delete(index='test',ignore=400)
es.indices.create(index='test',ignore=400)
result=es.indices.put_mapping(index='test',doc_type='search',body=mapping,include_type_name=True)

{'acknowledged': True}

Next we insert some data and search for it.

datas=[{'title':'baidu search engine','url':'https://www.baidu.com','date':'2011-12-16'},
       {'title':'google search engine','url':'https://www.google.com','date':'2011-12-16'},
       {'title':'bing search engine','url':'https://www.bing.com','date':'2011-12-16'}]

for data in datas:
    es.index(index = "test",doc_type = "search",body = data)
result = es.search(index = "test",doc_type = "search") # this step runs the search
print(result)

The search result looks like this:

{'took': 4,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 6, 'relation': 'eq'},
  'max_score': 1.0,
  'hits': [{'_index': 'test',
    '_type': 'search',
    '_id': '-Bwk7nEBgrTKNfIQYcuv',
    '_score': 1.0,
    '_source': {'title': 'baidu',
     'url': 'https://www.baidu.com',
     'date': '2011-12-16'}},
   {'_index': 'test',
    '_type': 'search',
    '_id': '-Rwk7nEBgrTKNfIQY8s5',
    '_score': 1.0,
    '_source': {'title': 'google',
     'url': 'https://www.google.com',
     'date': '2011-12-16'}},
   {'_index': 'test',
    '_type': 'search',
    '_id': '-hwk7nEBgrTKNfIQY8uw',
    '_score': 1.0,
    '_source': {'title': 'bing',
     'url': 'https://www.bing.com',
     'date': '2011-12-16'}},
   {'_index': 'test',
    '_type': 'search',
    '_id': '-xwt7nEBgrTKNfIQ0cv1',
    '_score': 1.0,
    '_source': {'title': 'baidu search engine',
     'url': 'https://www.baidu.com',
     'date': '2011-12-16'}},
   {'_index': 'test',
    '_type': 'search',
    '_id': '_Bwt7nEBgrTKNfIQ0svV',
    '_score': 1.0,
    '_source': {'title': 'google search engine',
     'url': 'https://www.google.com',
     'date': '2011-12-16'}},
   {'_index': 'test',
    '_type': 'search',
    '_id': '_Rwt7nEBgrTKNfIQ08tQ',
    '_score': 1.0,
    '_source': {'title': 'bing search engine',
     'url': 'https://www.bing.com',
     'date': '2011-12-16'}}]}}

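The three documents we just inserted were all found! (The other three hits, titled baidu, google and bing, were already in the index.)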
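The response is deeply nested, so in practice you usually pull out only the parts you care about. A minimal sketch, assuming the response layout printed above; titles_from and total_hits are helper names of my own, and sample is a copy of the response cut down to two hits.

```python
def titles_from(result):
    """Collect the title of every hit in a search response."""
    return [hit["_source"]["title"] for hit in result["hits"]["hits"]]


def total_hits(result):
    """Read the total hit count (a dict with 'value' in 7.x responses)."""
    return result["hits"]["total"]["value"]


# The response above, cut down to two hits for brevity.
sample = {
    "hits": {
        "total": {"value": 2, "relation": "eq"},
        "hits": [
            {"_id": "-xwt7nEBgrTKNfIQ0cv1",
             "_source": {"title": "baidu search engine"}},
            {"_id": "_Bwt7nEBgrTKNfIQ0svV",
             "_source": {"title": "google search engine"}},
        ],
    }
}

print(total_hits(sample), titles_from(sample))
```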

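Full-Text Search

What does full-text search mean? Our title is bing search engine. An exact-match lookup only finds this document if you search for the full title, but with Elasticsearch, searching for bing engine / bing / engine / search is enough to find it!
Also, a search for bing engine returns every document that contains bing as well as every document that contains engine.

dsl = {'query':{'match':{
    'title':"baidu google"}}}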
es = Elasticsearch()
result = es.search(index="test",doc_type="search",body=dsl)
print(result)

Two documents were found: google search engine and baidu search engine.

{'took': 2,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 2, 'relation': 'eq'},
  'max_score': 0.9808291,
  'hits': [{'_index': 'test',
    '_type': 'search',
    '_id': '_hwz7nEBgrTKNfIQUctL',
    '_score': 0.9808291,
    '_source': {'title': 'baidu search engine',
     'url': 'https://www.baidu.com',
     'date': '2011-12-16'}},
   {'_index': 'test',
    '_type': 'search',
    '_id': '_xwz7nEBgrTKNfIQUsu3',
    '_score': 0.9808291,
    '_source': {'title': 'google search engine',
     'url': 'https://www.google.com',
     'date': '2011-12-16'}}]}}
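To see why the match query above returned both documents, it helps to picture an inverted index: a map from each word to the documents containing it. The sketch below is a simplified stand-in, assuming plain lowercase whitespace splitting instead of a real analyzer; build_inverted_index and match_any are names I made up. It combines the query words with OR, which is how a match query behaves by default.

```python
from collections import defaultdict


def build_inverted_index(titles):
    """Map each lowercase word to the set of document numbers containing it."""
    index = defaultdict(set)
    for doc_id, title in enumerate(titles):
        for term in title.lower().split():
            index[term].add(doc_id)
    return index


def match_any(index, query):
    """Return documents containing at least one query word (OR semantics)."""
    hits = set()
    for term in query.lower().split():
        hits |= index.get(term, set())
    return sorted(hits)


TITLES = ["baidu search engine", "google search engine", "bing search engine"]
INDEX = build_inverted_index(TITLES)

print([TITLES[i] for i in match_any(INDEX, "baidu google")])
```

Searching for bing engine with this sketch returns all three titles, since every one of them contains engine.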

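Relevance score
In the printed result above, both documents have the same _score, because each matches the query equally well.
Let's search for baidu search engine and see how the other two documents score.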

dsl = {'query':{'match':{
    'title':"baidu search engine"}}}
es = Elasticsearch()
result = es.search(index="test",doc_type="search",body=dsl)
result

{'took': 4,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 3, 'relation': 'eq'},
  'max_score': 1.2478919,
  'hits': [{'_index': 'test',
    '_type': 'search',
    '_id': '_hwz7nEBgrTKNfIQUctL',
    '_score': 1.2478919,
    '_source': {'title': 'baidu search engine',
     'url': 'https://www.baidu.com',
     'date': '2011-12-16'}},
   {'_index': 'test',
    '_type': 'search',
    '_id': '_xwz7nEBgrTKNfIQUsu3',
    '_score': 0.26706278,
    '_source': {'title': 'google search engine',
     'url': 'https://www.google.com',
     'date': '2011-12-16'}},
   {'_index': 'test',
    '_type': 'search',
    '_id': 'ABwz7nEBgrTKNfIQU8wi',
    '_score': 0.26706278,
    '_source': {'title': 'bing search engine',
     'url': 'https://www.bing.com',
     'date': '2011-12-16'}}]}}

google and bing each score only about 0.267, while baidu reaches 1.248. We can sort search results by this score.
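Where do these numbers come from? Elasticsearch 7 scores text with BM25 by default. The sketch below is my own re-implementation of the textbook BM25 formula with the default parameters k1 = 1.2 and b = 0.75, under the assumption that the index holds exactly the three titles on a single shard and that words are split on whitespace. Under those assumptions it lands very close to the scores printed above.

```python
import math

K1, B = 1.2, 0.75  # default BM25 parameters


def bm25_score(query, doc, corpus):
    """Textbook BM25 score of one document for a whitespace-split query."""
    doc_terms = doc.lower().split()
    tokenized = [d.lower().split() for d in corpus]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    score = 0.0
    for term in query.lower().split():
        freq = doc_terms.count(term)
        if freq == 0:
            continue
        # number of documents containing the term
        df = sum(1 for t in tokenized if term in t)
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        # length normalisation: longer-than-average documents score lower
        norm = K1 * (1 - B + B * len(doc_terms) / avg_len)
        score += idf * freq * (K1 + 1) / (freq + norm)
    return score


TITLES = ["baidu search engine", "google search engine", "bing search engine"]

for title in TITLES:
    print(title, round(bm25_score("baidu search engine", title, TITLES), 4))
```

The rare word baidu contributes most of the score, while search and engine appear in every title and therefore add very little, which is why google and bing end up so much lower.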

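Using the API (Postman & curl)

Postman: GET and POST

Download and install Postman. If you can't open the Google store, you can download the desktop version (download link).
The documents we added earlier had no explicit id, so let's create one with an id.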

data = {'title':'sogou search engine','url':'https://www.sogou.com','date':'2011-12-16'}
result=es.create(index='test',doc_type='search',id=1,body=data)
print(result)

{'_index': 'test', '_type': 'search', '_id': '1', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 3, '_primary_term': 1}

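Open Postman and send a GET request for the document with id = 1 (localhost:9200/test/search/1).
To insert data, use POST:
Switch GET to POST, click Body, and enter the JSON for the document you want to add.
Note: change the body type from plain text to JSON.
Press Send and the document is inserted.

To check that the insert worked, try another GET on localhost:9200/test/search/2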
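What Postman does here can also be written out in Python, which makes the two important settings visible: the POST method and the JSON content type. This is only a sketch with the standard library; build_post is a name of my own, the document body is a made-up example, and the request is built but deliberately not sent so that it runs without a server.

```python
import json
import urllib.request


def build_post(url, document):
    """Build (but do not send) a POST request carrying a JSON document."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},  # the "JSON" setting in Postman
        method="POST",
    )


# Hypothetical document, used only to illustrate the request shape.
request = build_post(
    "http://localhost:9200/test/search/2",
    {"title": "example search engine", "url": "https://www.example.com"},
)

print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would actually send it to a running node.
```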

Trying GET with curl

A query that includes metadata
Note that curl can be run from any directory.
General format: localhost:9200/index_name/doc_type_name/id

C:\Users\Wendy W>curl -X GET "localhost:9200/test/search/1"
{"_index":"test","_type":"search","_id":"1","_version":1,"_seq_no":3,"_primary_term":1,"found":true,"_source":{"title":"sogou search engine","url":"https://www.sogou.com","date":"2011-12-16"}}

If you only want the document itself, without the metadata, append _source after the id.

C:\Users\Wendy W>curl -X GET "localhost:9200/test/search/1/_source"
{"title":"sogou search engine","url":"https://www.sogou.com","date":"2011-12-16"}
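The URL pattern used in the two curl calls above is easy to wrap in a helper. A minimal sketch, assuming the node runs at http://localhost:9200; doc_url and its source_only flag are names I made up.

```python
from urllib.parse import quote


def doc_url(index, doc_type, doc_id, source_only=False,
            host="http://localhost:9200"):
    """Build host/index_name/doc_type_name/id, optionally ending in /_source."""
    # quote() escapes characters that are not safe inside a URL path segment
    parts = [quote(str(part), safe="") for part in (index, doc_type, doc_id)]
    url = host + "/" + "/".join(parts)
    return url + "/_source" if source_only else url


print(doc_url("test", "search", 1))
print(doc_url("test", "search", 1, source_only=True))
```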

Check how many documents we currently have

curl -X GET "localhost:9200/_count"
{"count":5,"_shards":{"total":2,"successful":2,"skipped":0,"failed":0}}

Next time I will try connecting to PostgreSQL.
