Elasticsearch queries: MySQL-style exact-match (untokenized) queries with Elasticsearch + Python

Latest recommended article published on 2023-05-26 17:03:03

weixin_39702479


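Elasticsearch 6.x + Python 3.x

I recently worked on a project that needs to store tens of millions of triples (for example, the values of the three fields [name, age, occupation]) and then query them frequently. I first tried MySQL, but a run of a few hundred consecutive queries was extremely slow. A senior labmate suggested trying ES and its inverted index, so I have spent the last couple of days learning ES. Although it is mainly used for managing logs and documents, I wanted to see whether Elasticsearch could stand in for MySQL here: provide the equivalent query functionality, hold tens of millions of records, and still return results quickly.

By default, ES tokenizes text both when indexing and when querying. For example, a search for “南京长江大桥” (Nanjing Yangtze River Bridge) is split into “南京”, “长江”, and “大桥”, and the stored content is tokenized the same way before matching, so an exact match is not possible. The solution is to build the index without tokenization and to query without tokenization.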
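To make the difference concrete, here is a minimal pure-Python sketch of an inverted index built two ways. It is not how Elasticsearch is implemented: the `analyzed` function is a whitespace tokenizer standing in for a real analyzer, `keyword` stands in for the keyword field type, and the English sample strings are made up for illustration.

```python
from collections import defaultdict

def build_index(docs, tokenize):
    """Map each term produced by `tokenize` to the set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def analyzed(text):
    # Stand-in for a tokenizing analyzer: lowercase and split on whitespace.
    return text.lower().split()

def keyword(text):
    # Stand-in for the keyword type: the whole value is stored as one term.
    return [text]

# Hypothetical sample documents.
docs = {1: "Nanjing Yangtze River Bridge", 2: "Nanjing Road"}

text_index = build_index(docs, analyzed)
keyword_index = build_index(docs, keyword)

# A tokenized index matches on fragments, so "nanjing" hits both documents.
print(sorted(text_index["nanjing"]))
# A keyword index matches only the complete original value.
print(sorted(keyword_index.get("Nanjing Yangtze River Bridge", set())))
print(sorted(keyword_index.get("Nanjing", set())))
```

With the tokenized index, a fragment of the value is enough to match; with the keyword index, only the full stored value matches, which is the behavior a MySQL `=` comparison has.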

If you think of Elasticsearch as a relational database, the concepts correspond as follows:

Relational DB: ->Databases->Tables->Rows->Columns

ElasticSearch: ->Indices->Types->Documents->Fields

Prerequisites:

1. Understand what Elasticsearch is and why you would use it: Getting Started with Elasticsearch (www.elastic.co)

2. Download, install, and start Elasticsearch and Kibana (the visual interface)

step 1: (Elasticsearch) create the index == Relational DB (design the table)

Open the Kibana interface, go to Dev Tools, and run the following code in the left-hand pane.

Create the index without tokenization by declaring every field as keyword. Here index_test plays the role of the MySQL database name and doc_type the role of the MySQL table name:

PUT /index_test
{
  "mappings":{
    "doc_type":{
      "properties": {
          "field1": {"type": "keyword"},
          "field2": {"type": "keyword"},
          "field3": {"type": "keyword"}
      }
    }
  }
}

For example:

    PUT /fb2m
    {
      "mappings":{
        "kb_fact":{
          "properties": {
              "subject": {"type": "keyword"},
              "predicate": {"type": "keyword"},
              "object": {"type": "keyword"}
          }
        }
      }
    }
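The same request body can also be assembled in Python, which is convenient when the list of fields varies. The sketch below only builds the dictionary; `keyword_mapping` is a hypothetical helper name, and the nested layout assumes the 6.x mapping format shown above (with a document type level).

```python
def keyword_mapping(doc_type, fields):
    """Build a 6.x-style create-index body with every field typed as keyword."""
    return {
        "mappings": {
            doc_type: {
                "properties": {name: {"type": "keyword"} for name in fields}
            }
        }
    }

# Reproduces the body of the fb2m example above.
body = keyword_mapping("kb_fact", ["subject", "predicate", "object"])
print(body)
```

Assuming an `elasticsearch` client object named `es`, a call such as `es.indices.create(index="fb2m", body=body)` should be equivalent to the PUT request in Kibana.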

step 2: bulk-insert the data

import elasticsearch
from elasticsearch import helpers

# Host and port of your Elasticsearch node (not the address of the Kibana UI).
# Replace the xx placeholders with your own values.
es = elasticsearch.Elasticsearch([{'host':'xx.xx.x.xxx','port':xxxx}])

print("============== index ================")
doc_id = 0        # running document id
inserted = 0      # number of documents sent so far
actions = []      # pending bulk actions
max_count = 2000  # bulk batch size
with open('dataset/index_data', 'r', encoding='utf-8') as f:
    for line_no, line in enumerate(f, start=1):
        triple = line.strip().split(" ||| ")
        if len(triple) < 3:
            # Malformed row: report it and move on.
            print(" !!! " + str(line_no) + " th row insert failed: " + line)
            continue
        triple_dict = {'field1': triple[0], 'field2': triple[1], 'field3': triple[2]}
        # For small data sets you can insert documents one at a time with index().
        # index and doc_type must match the names used when creating the index in step 1.
        # es.index(index='index_test', doc_type='doc_type', body=triple_dict)
        actions.append({
            "_index": "index_test",
            "_type": "doc_type",
            "_id": doc_id,
            "_source": triple_dict,
        })
        doc_id += 1
        if len(actions) >= max_count:
            helpers.bulk(es, actions)
            inserted += len(actions)
            actions = []
            print("Insert " + str(inserted) + " records.")
# Send whatever is left in the last partial batch.
if actions:
    helpers.bulk(es, actions)
print('finish~~~')
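The parsing step can also be written as a generator, which keeps it separate from the network calls and makes it easy to check on a few lines first. This is a sketch, not a replacement for the script above: `triple_actions` is a hypothetical helper, and the sample lines are made up.

```python
def triple_actions(lines, index="index_test", doc_type="doc_type"):
    """Yield one bulk action per well-formed 'a ||| b ||| c' line."""
    doc_id = 0
    for line_no, line in enumerate(lines, start=1):
        parts = line.strip().split(" ||| ")
        if len(parts) < 3:
            print("skipping malformed row", line_no)
            continue
        yield {
            "_index": index,
            "_type": doc_type,
            "_id": doc_id,
            # zip stops after three keys, so extra columns are ignored.
            "_source": dict(zip(("field1", "field2", "field3"), parts)),
        }
        doc_id += 1

# Hypothetical sample input, including one malformed row.
sample = ["Alice ||| 30 ||| engineer\n", "broken line\n", "Bob ||| 25 ||| teacher\n"]
actions = list(triple_actions(sample))
print(len(actions))
```

Because `helpers.bulk` accepts any iterable of actions and batches them itself, something like `helpers.bulk(es, triple_actions(f), chunk_size=2000)` should achieve the same batching without keeping a manual counter.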

step 3: exact (untokenized) matching

1) Searching from the UI: in Kibana, go to Dev Tools and run the following DSL in the left-hand pane.

1. Return every document:
select * from doc_type

GET /index_test/doc_type/_search
{
  "query": {
    "match_all": {
    }
  }
}

2. Exact match on one field:
select * from doc_type where field1=value1

GET /index_test/doc_type/_search
{
  "query": {
    "term": { "field1": "value1" }
  }
}

3. Exact match on two fields at once:

mysql: select * from doc_type where field1=value1 and field2=value2

ES can combine conditions with a bool filter, which is made up of three parts:

{
   "bool" : {
      "must" :     [],  // equivalent to AND
      "should" :   [],  // equivalent to OR
      "must_not" : []   // equivalent to NOT
   }
}
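As a rough illustration of how the three parts combine, the sketch below builds a bool query of term clauses from (field, value) pairs. `bool_query` is a hypothetical helper. One detail it makes explicit: when a bool query also contains must clauses, should clauses are optional by default, so the helper sets `minimum_should_match` to 1 to keep the OR behavior.

```python
def bool_query(must=(), should=(), must_not=()):
    """Build a bool query of term clauses from (field, value) pairs."""
    def terms(pairs):
        return [{"term": {field: value}} for field, value in pairs]

    bool_body = {}
    if must:
        bool_body["must"] = terms(must)          # all must match (AND)
    if should:
        bool_body["should"] = terms(should)      # alternatives (OR)
        # Require at least one should clause even if must clauses exist.
        bool_body["minimum_should_match"] = 1
    if must_not:
        bool_body["must_not"] = terms(must_not)  # none may match (NOT)
    return {"query": {"bool": bool_body}}

# where field1 = 'value1' and (field2 = 'a' or field2 = 'b') and not field3 = 'x'
dsl_example = bool_query(
    must=[("field1", "value1")],
    should=[("field2", "a"), ("field2", "b")],
    must_not=[("field3", "x")],
)
print(dsl_example)
```

The resulting dictionary has the same shape as the request bodies typed into Kibana, so it can be used as a search body from Python.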

The corresponding query is:

GET /index_test/doc_type/_search
{
  "query": {
    "bool": {
      "must": [
        { "term": { "field1": "value1" }},
        { "term": { "field2": "value2" }}
      ]
    }
  }
}

2) Searching from Python:

# Copy the query part of the DSL above
dsl = {
  "query": {
    "bool": {
      "must": [
        { "term": { "field1": "value1" }},
        { "term": { "field2": "value2" }}
      ]
    }
  }
}
es_result = es.search(index='index_test', doc_type='doc_type', body=dsl)
# es.search returns a dict; the matching documents are under hits.hits
result = es_result['hits']['hits']
print(result)
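Each entry in that list wraps the stored document together with metadata such as `_index`, `_id`, and `_score`. To get back plain rows, similar to what a MySQL cursor would return, the `_source` of each hit can be pulled out. The sketch below runs on a hand-written stand-in for a response, since a real one requires a running cluster; `extract_sources` and `sample_result` are illustrative names.

```python
def extract_sources(es_result):
    """Return the stored documents (_source) from a search response dict."""
    return [hit["_source"] for hit in es_result["hits"]["hits"]]

# Hand-written stand-in for a search response; real responses carry more metadata.
sample_result = {
    "hits": {
        "total": 1,
        "hits": [
            {
                "_index": "index_test",
                "_type": "doc_type",
                "_id": "0",
                "_score": 1.0,
                "_source": {"field1": "value1", "field2": "value2", "field3": "value3"},
            }
        ],
    }
}

rows = extract_sources(sample_result)
print(rows)
```

Note that a search returns at most 10 hits by default; passing a larger `size` to `es.search` raises that limit when more matching rows are expected.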
