Elasticsearch Usage Summary

如锡如璧

Category: elasticsearch. Tags: elasticsearch, index, IK analyzer, fielddata

Article link: https://blog.csdn.net/u014449866/article/details/70568838

Elasticsearch: From Indexing to Querying

Creating an index
Tokenization
Querying
Highlighting
Pagination
Sorting

Creating an index

Step one: create the index.

`# Recreate the index from scratch: drop it first if it already exists
if self.es.indices.exists(index='test-index'):
    self.es.indices.delete(index='test-index')
self.es.indices.create(index='test-index', body=self._index_mappings)`
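
The snippet above always deletes an existing index, which also discards every document in it. A minimal sketch of a variant that makes this explicit is shown below. The function name ensure_index and its recreate flag are hypothetical, and es is assumed to be an elasticsearch.Elasticsearch client (or any object exposing the same indices.exists, indices.delete and indices.create calls).

```python
def ensure_index(es, index, mappings, recreate=False):
    """Create `index` with `mappings`; return True if a create call was made.

    With recreate=False an existing index is left untouched.
    With recreate=True it is deleted first, which removes all its documents.
    """
    if es.indices.exists(index=index):
        if not recreate:
            return False  # keep the existing index and its documents
        es.indices.delete(index=index)
    es.indices.create(index=index, body=mappings)
    return True
```

Calling ensure_index(es, 'test-index', mappings, recreate=True) reproduces the behavior of the original snippet; leaving recreate at its default is the safer choice once the index holds real data.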

The body parameter passed to indices.create is a dict, defined as follows:

_index_mappings = {
    "mappings": {
        # mapping type; ES 5.x needs this level, and it matches doc_type='post' used when indexing
        "post": {
            "properties": {
                "title": {"type": "text"},
                "content": {"type": "text"},
                "st_time": {"type": "date"},
                "pt_time": {"type": "date"},
                "board": {"type": "text"},
                "topic_id": {"type": "integer"},
                "t_id": {"type": "text"},
                "url": {"type": "text"},
                "site_id": {"type": "integer"},
                #"site_name": {"type": "text"},
                "data_type": {"type": "integer"},
                "read_num": {"type": "integer"},
                "comm_num": {"type": "integer"},
                #"repost_num": {"type": "integer"},
                "img_url": {"type": "text"},
                "lan_type": {"type": "integer"},
                "is_read": {"type": "integer"},
                "name": {"type": "text"}
            }
        }
    }
}

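In ES 5.3 the string type should no longer be written in mappings; it was replaced by text and keyword in 5.0. When building the index from database records (MongoDB in this example), map String fields to text, int to integer, and datetime to date.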
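
The type correspondence described above can be sketched as a small lookup table. This is only an illustration: the names ES_TYPE_BY_PYTHON_TYPE and build_properties are hypothetical, and the sketch assumes the records have already been loaded as Python dicts whose values are str, int or datetime.

```python
from datetime import datetime

# Assumed correspondence between Python value types and ES 5.x field types,
# following the rule stated in the text: String -> text, int -> integer, datetime -> date.
ES_TYPE_BY_PYTHON_TYPE = {
    str: "text",
    int: "integer",
    datetime: "date",
}


def build_properties(sample_doc):
    """Derive a "properties" dict from one representative document."""
    properties = {}
    for field, value in sample_doc.items():
        es_type = ES_TYPE_BY_PYTHON_TYPE.get(type(value))
        if es_type is None:
            # Fail loudly rather than guess a type for unsupported values.
            raise TypeError("no ES type configured for field %r" % field)
        properties[field] = {"type": es_type}
    return properties
```

Deriving the mapping from a sample document is a convenience only; fields that need an analyzer or other options still have to be edited by hand afterwards.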

Writing to the index

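import traceback

for _doc in post_docs:
    try:
        # refresh=True makes each document searchable right after it is indexed
        self.es.index(index='test-index', doc_type='post', refresh=True, body=_doc)
    except Exception:
        # print the error and continue with the next document
        traceback.print_exc()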

Note: the structure of each _doc must match the fields defined in the mapping body.
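
One way to catch a mismatch between a document and the mapping before indexing is sketched below. The helper unmapped_fields is hypothetical, and it assumes the ES 5.x mapping layout of mappings, then a type name, then properties.

```python
def unmapped_fields(doc, index_mappings, doc_type="post"):
    """Return the sorted field names in `doc` that the mapping does not define."""
    properties = index_mappings["mappings"][doc_type]["properties"]
    return sorted(set(doc) - set(properties))


# Small stand-in mapping, for illustration only.
sample_mappings = {
    "mappings": {
        "post": {
            "properties": {
                "title": {"type": "text"},
                "topic_id": {"type": "integer"},
            }
        }
    }
}

print(unmapped_fields({"title": "t", "author": "a"}, sample_mappings))  # prints ['author']
```

An empty result means every field of the document has an explicit mapping; a non-empty one points at fields that Elasticsearch would otherwise map dynamically.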

Tokenization

Tokenization uses the IK analyzer. For download and installation, see http://www.cnblogs.com/xing901022/p/5910139.html and http://www.yihaomen.com/article/java/629.htm

After building with Maven, the plugin files are placed in the following directory:

...\elasticsearch-5.3.0\elasticsearch-5.3.0\plugins\ik

Once the analyzer is installed, use it in the mapping:

self._index_mappings = {
    "mappings": {
        # "fulltext" is the mapping type; the _all block holds its global settings
        "fulltext": {
            "_all": {
                "analyzer": "ik_max_word",
                "search_analyzer": "ik_max_word",
                "term_vector": "no",
                "store": "false"
            },
            # per-field settings for the fields that need tokenization
            "properties": {
                "title": {"type": "text",
                          "fielddata": "true",
                          "analyzer": "ik_max_word",
                          "search_analyzer": "ik_max_word",
                          "include_in_all": "true"},
                "content": {"type": "text",
                            "fielddata": "true",
                            "analyzer": "ik_max_word",
                            "search_analyzer": "ik_max_word",
                            "include_in_all": "true"},
                "st_time": {"type": "date"},
                "pt_time": {"type": "date"},
                "board": {"type": "text",
                          "fielddata": "true",
                          "analyzer": "ik_max_word",
                          "search_analyzer": "ik_max_word",
                          "include_in_all": "true"},
                "topic_id": {"type": "integer"},
                "t_id": {"type": "text"},
                "url": {"type": "text"},
                "site_id": {"type": "integer"},
                #"site_name": {"type": "text"},
                "data_type": {"type": "integer"},
                "read_num": {"type": "integer"},
                "comm_num": {"type": "integer"},
                #"repost_num": {"type": "integer"},
                "img_url": {"type": "text"},
                "lan_type": {"type": "integer"},
                "is_read": {"type": "integer"},
                "name": {"type": "text",
                         "fielddata": "true",
                         "analyzer": "ik_max_word",
                         "search_analyzer": "ik_max_word",
                         "include_in_all": "true"}
            }
        }
    }
}
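
The four analyzed fields in the mapping above share identical settings, so the repetition can be factored out. The following is a minimal sketch; the helper name ik_text_field is hypothetical, and the option values are copied from the mapping above.

```python
def ik_text_field(analyzer="ik_max_word"):
    """Return the mapping for one text field analyzed with the IK analyzer."""
    return {
        "type": "text",
        "fielddata": "true",
        "analyzer": analyzer,
        "search_analyzer": analyzer,
        "include_in_all": "true",
    }


# Analyzed fields share one definition; the other fields are declared as before.
properties = {name: ik_text_field() for name in ("title", "content", "board", "name")}
properties["topic_id"] = {"type": "integer"}
properties["st_time"] = {"type": "date"}
```

Each call returns a fresh dict, so a single field can later be adjusted without affecting the others.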

Querying

Simple query

  "query": {
    "match_all": {}
  }

Conditional query

"query": {
    "bool": {
        "should": [
            {"match": {"title": "医务"}},
            {"match": {"topic_id": "1"}},
            {"match": {"board": "微信"}},
            {"range": {
                "@timestamp": {
                    "from": "2016-12-28T20:38:09.815000",
                    "to": "2017-04-11T00:00:00"
                }
            }}
        ]
    }
}
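
When the conditions come from user input, the same kind of query body can be assembled in Python. This is a sketch under assumptions: the function name should_query is hypothetical, and it writes the range bounds as gte and lte, the documented range parameters, in place of from and to.

```python
def should_query(matches, time_field, start, end):
    """Build a bool/should query from {field: text} matches plus a time range."""
    clauses = [{"match": {field: text}} for field, text in matches.items()]
    clauses.append({"range": {time_field: {"gte": start, "lte": end}}})
    return {"query": {"bool": {"should": clauses}}}


body = should_query(
    {"title": "医务", "board": "微信"},
    "@timestamp",
    "2016-12-28T20:38:09",
    "2017-04-11T00:00:00",
)
```

The resulting dict can be passed as the body argument of the client's search call. Because the bool query contains only should clauses, a document is returned if it matches at least one of them, and documents matching more clauses score higher.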

Highlighting

"query": {
    "bool": {
        "should": [
            {"match": {"title": "医务"}},
            {"match": {"topic_id": "1"}},
            {"match": {"board": "微信"}},
            {"range": {
                "@timestamp": {
                    "from": "2016-12-28T20:38:09.815000",
                    "to": "2017-04-11T00:00:00"
                }
            }}
        ]
    }
},
"highlight": {
    "pre_tags": ["<em class=\"hlt1\">"],
    "post_tags": ["</em>"],
    "fields": {
        "title": {},
        "board": {}
    }
}
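
In the search response, each hit that matched on a highlighted field carries a highlight entry mapping the field name to a list of fragments. A minimal sketch of reading it follows; the helper name highlighted_titles and the sample response are made up for illustration.

```python
def highlighted_titles(response):
    """Return one title per hit, preferring the highlighted fragments when present."""
    results = []
    for hit in response["hits"]["hits"]:
        fragments = hit.get("highlight", {}).get("title")
        if fragments:
            results.append(" ".join(fragments))
        else:
            # No match in the title: fall back to the stored value.
            results.append(hit["_source"].get("title", ""))
    return results


# Hand-written stand-in for a search response, trimmed to the keys used here.
sample_response = {
    "hits": {"hits": [
        {"_source": {"title": "医务 notice"},
         "highlight": {"title": ['<em class="hlt1">医务</em> notice']}},
        {"_source": {"title": "other post"}},
    ]}
}

print(highlighted_titles(sample_response))
```

Hits that matched only on another clause, such as the time range, have no highlight for title, which is why the fallback to _source is needed.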

Pagination

"query": {
    "bool": {
        "should": [
            {"match": {"title": "医务"}},
            {"match": {"topic_id": "1"}},
            {"match": {"board": "微信"}},
            {"range": {
                "@timestamp": {
                    "from": "2016-12-28T20:38:09.815000",
                    "to": "2017-04-11T00:00:00"
                }
            }}
        ]
    }
},
"highlight": {
    "pre_tags": ["<em class=\"hlt1\">"],
    "post_tags": ["</em>"],
    "fields": {
        "title": {},
        "board": {}
    }
},
"from": 0,
"size": 10

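from is the zero-based offset of the first hit to return (not a page number), and size is the number of hits per page.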
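
Since from counts hits rather than pages, a page number has to be converted to an offset. A minimal sketch, with the hypothetical helper page_params and pages numbered from 1:

```python
def page_params(page, size=10):
    """Translate a 1-based page number into the from/size pair of a search body."""
    if page < 1 or size < 1:
        raise ValueError("page and size must both be at least 1")
    return {"from": (page - 1) * size, "size": size}


body = {"query": {"match_all": {}}}
body.update(page_params(3, size=10))  # third page: from=20, size=10
```

By default Elasticsearch rejects requests where from plus size exceeds the index.max_result_window setting (10000), so this style of paging suits the first pages of a result set rather than deep scrolling.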

Sorting

{
  "query": {
    "match_all": {}
  },
  "sort": [
    {"name": {"order": "asc"}}
  ]
}

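order takes the value desc or asc, for descending and ascending order respectively.
Before sorting on a text field, fielddata must be set to true for that field when the index is created. The official documentation explains: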
Search needs to answer the question “Which documents contain this term?”, while sorting and aggregations need to answer a different question: “What is the value of this field for this document?”.
Instead, text fields use a query-time in-memory data structure called fielddata. This data structure is built on demand the first time that a field is used for aggregations, sorting, or in a script.
Fielddata is disabled on text fields by default. Set fielddata=true on [your_field_name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory.
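fielddata is a data structure built by reading the inverted index of every segment in term-to-document order. It is held in JVM heap memory to support in-memory lookups, which makes it memory-intensive.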
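
The sort clause can also be generated from request parameters. The sketch below uses a hypothetical helper, sort_clause, and assumes the sorted field either is a non-text type or has fielddata enabled as described above.

```python
def sort_clause(field, order="asc"):
    """Return the "sort" list for a single field, validating the order value."""
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc', got %r" % order)
    return [{field: {"order": order}}]


body = {
    "query": {"match_all": {}},
    "sort": sort_clause("name", "desc"),
}
```

Validating order up front turns a typo into an immediate Python error instead of a failed search request.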
