Elasticsearch Usage Summary

Elasticsearch: From Indexing to Querying

  • Creating an index
  • Tokenization
  • Querying
  • Highlighting
  • Pagination
  • Sorting

Creating an Index

  • Step 1: create the index
`# Start from a clean index: delete it if it already exists, then create it.
if self.es.indices.exists(index='test-index'):
    self.es.indices.delete(index='test-index')
self.es.indices.create(index='test-index', body=self._index_mappings)`

The body parameter is a dict, defined as follows:

_index_mappings = {
            "mappings":{
                    "properties":{
                        "title":{"type":"text"},
                        "content":{"type":"text"},
                        "st_time":{"type":"date"},
                        "pt_time":{"type":"date"},
                        "board":{"type":"text"},
                        "topic_id":{"type":"integer"},
                        "t_id":{"type":"text"},
                        "url":{"type":"text"},
                        "site_id":{"type":"integer"},
                        #"site_name":{"type":"text"},
                        "data_type":{"type":"integer"},
                        "read_num":{"type":"integer"},
                        "comm_num":{"type":"integer"},
                        #"repost_num":{"type":"integer"},
                        "img_url":{"type":"text"},
                        "lan_type":{"type":"integer"},
                        "is_read":{"type":"integer"},
                        "name":{"type":"text"}
                    }
            }
}

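ES 5.3 no longer uses the string type in mappings (it is deprecated in 5.x in favor of text and keyword). When building the index, map a String field from the database to text, an int to integer, and a datetime to date (using MongoDB as the example source).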
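
The type correspondence above can be sketched as a small lookup table. This is only an illustration, not code from the original project: the helper name `mapping_from_sample` and the sample document are hypothetical, and only the three types named in the text are covered.

```python
from datetime import datetime

# Python/MongoDB value type -> Elasticsearch field type, per the rule above.
# Only the three cases named in the text are covered.
ES_TYPE_BY_PYTHON_TYPE = {
    str: "text",
    int: "integer",
    datetime: "date",
}


def mapping_from_sample(sample_doc):
    """Build a "properties" dict from one sample document (hypothetical helper)."""
    properties = {}
    for field, value in sample_doc.items():
        es_type = ES_TYPE_BY_PYTHON_TYPE.get(type(value))
        if es_type is None:
            raise TypeError("no Elasticsearch type chosen for field %r" % field)
        properties[field] = {"type": es_type}
    return properties


sample = {"title": "hello", "topic_id": 1, "st_time": datetime(2017, 4, 11)}
print(mapping_from_sample(sample))
```

Raising on an unknown type, rather than guessing, keeps a surprising field from silently getting the wrong mapping.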

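  • Write documents to the index
# requires "import traceback" at the top of the module
for _doc in post_docs:
    #print("***",_doc)
    try:
        # refresh=True makes each document searchable right away
        self.es.index(index='test-index',doc_type='post',refresh=True,body=_doc)
    except Exception:
        # log the failure and carry on with the next document
        traceback.print_exc()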

Note: the fields of _doc must match the fields defined in the mapping passed as body.
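
One way to check this before indexing is to compare a document's keys with the fields declared in the mapping. The sketch below is an assumption about how such a check might look; `unexpected_fields` is a hypothetical helper, and the mapping is a trimmed-down copy of the one shown earlier.

```python
def unexpected_fields(doc, index_mappings):
    """Return the keys of doc that are not declared in the mapping's properties."""
    declared = index_mappings["mappings"]["properties"]
    return sorted(set(doc) - set(declared))


# Trimmed-down version of the mapping shown earlier (two fields only).
index_mappings = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "topic_id": {"type": "integer"},
        }
    }
}

doc = {"title": "hello", "topic_id": 1, "author": "someone"}
print(unexpected_fields(doc, index_mappings))  # ['author']
```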

Tokenization

Tokenization uses the IK analyzer. For download and installation, see these posts: http://www.cnblogs.com/xing901022/p/5910139.html and http://www.yihaomen.com/article/java/629.htm

After building with Maven, the plugin files go in the following directory:

...\elasticsearch-5.3.0\elasticsearch-5.3.0\plugins\ik

Once the analyzer is installed, reference it in the mapping:

self._index_mappings = {
            "mappings":{  # "fulltext" type: global (type-level) settings
                    "fulltext":{
                        "_all": {
                        "analyzer": "ik_max_word",
                        "search_analyzer": "ik_max_word",
                        "term_vector": "no",
                        "store": "false"
                    },
                    "properties":{  # per-field settings for the fields that need tokenization
                        "title":{"type":"text",
                                    "fielddata":"true",
                                    "analyzer": "ik_max_word",
                                    "search_analyzer": "ik_max_word",
                                    "include_in_all": "true"},
                        "content":{"type":"text",
                                    "fielddata":"true",
                                    "analyzer": "ik_max_word",
                                    "search_analyzer": "ik_max_word",
                                    "include_in_all": "true"},
                        "st_time":{"type":"date"},
                        "pt_time":{"type":"date"},
                        "board":{"type":"text",
                                    "fielddata":"true",
                                    "analyzer": "ik_max_word",
                                    "search_analyzer": "ik_max_word",
                                    "include_in_all": "true"},
                        "topic_id":{"type":"integer"},
                        "t_id":{"type":"text"},
                        "url":{"type":"text"},
                        "site_id":{"type":"integer"},
                        #"site_name":{"type":"text"},
                        "data_type":{"type":"integer"},
                        "read_num":{"type":"integer"},
                        "comm_num":{"type":"integer"},
                        #"repost_num":{"type":"integer"},
                        "img_url":{"type":"text"},
                        "lan_type":{"type":"integer"},
                        "is_read":{"type":"integer"},
                        "name":{"type":"text",
                                    "fielddata":"true",
                                    "analyzer": "ik_max_word",
                                    "search_analyzer": "ik_max_word",
                                    "include_in_all": "true"}
                    }
                }
            }
        }
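
The mapping above repeats the same five settings for every tokenized field. As a sketch of one way to avoid the repetition, the settings can be generated from lists of field names. The helper names `ik_text_field` and `build_properties` are hypothetical; the field names and settings are the ones from the mapping above.

```python
def ik_text_field():
    """Settings repeated for every tokenized text field in the mapping above."""
    return {
        "type": "text",
        "fielddata": "true",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_max_word",
        "include_in_all": "true",
    }


def build_properties(analyzed, plain_text, integers, dates):
    """Assemble the "properties" dict from lists of field names."""
    properties = {}
    for name in analyzed:
        properties[name] = ik_text_field()
    for name in plain_text:
        properties[name] = {"type": "text"}
    for name in integers:
        properties[name] = {"type": "integer"}
    for name in dates:
        properties[name] = {"type": "date"}
    return properties


properties = build_properties(
    analyzed=["title", "content", "board", "name"],
    plain_text=["t_id", "url", "img_url"],
    integers=["topic_id", "site_id", "data_type", "read_num",
              "comm_num", "lan_type", "is_read"],
    dates=["st_time", "pt_time"],
)
print(sorted(properties))
```

The resulting dict can be placed under "properties" in the mapping in place of the hand-written entries.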

Querying

Simple query

  "query": {
    "match_all": {}
  }

Conditional query

"query":{
                "bool":{
                        "should":[
                                {"match":{"title":"医务"}},
                                {"match":{"topic_id":"1"}},
                                {"match":{"board":"微信"}},
                                {"range":{
                                        "@timestamp":{
                                            "from":"2016-12-28T20:38:09.815000",
                                            "to":"2017-04-11T00:00:00"
                                                    }
                                        }
                                }
                                ]
                        }   
                }
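
With only should clauses in the bool query, a document is returned if it matches at least one clause, and documents matching more clauses score higher. The same body can be assembled in Python; this is a minimal sketch, and `should_query` is a hypothetical helper rather than part of any library.

```python
def should_query(matches, time_field, start, end):
    """Build a bool/should query: one match clause per field plus a time range."""
    clauses = [{"match": {field: value}} for field, value in matches]
    clauses.append({"range": {time_field: {"from": start, "to": end}}})
    return {"query": {"bool": {"should": clauses}}}


body = should_query(
    matches=[("title", "医务"), ("topic_id", "1"), ("board", "微信")],
    time_field="@timestamp",
    start="2016-12-28T20:38:09.815000",
    end="2017-04-11T00:00:00",
)
print(body)
# With an elasticsearch-py client `es` (assumed, not created here) the body
# would be sent as: es.search(index="test-index", body=body)
```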

Highlighting

"query":{
                "bool":{
                        "should":[
                                {"match":{"title":"医务"}},
                                {"match":{"topic_id":"1"}},
                                {"match":{"board":"微信"}},
                                {"range":{
                                        "@timestamp":{
                                            "from":"2016-12-28T20:38:09.815000",
                                            "to":"2017-04-11T00:00:00"
                                                    }
                                        }
                                }
                                ]
                        }   
                },
        "highlight": {
            "pre_tags": ["<em class=\"hlt1\">"],
            "post_tags": ["</em>"],
            "fields": {
            "title": {},
            "board":{}
            }
        }
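
In the search response, each hit that matched a highlighted field carries a "highlight" entry holding the tagged fragments. The sketch below shows one way to collect them; `extract_highlights` is a hypothetical helper, and the response is a hand-written fragment with made-up ids and text, not real output.

```python
def extract_highlights(response):
    """Map each hit's _id to its highlighted fragments, keyed by field name."""
    result = {}
    for hit in response["hits"]["hits"]:
        # "highlight" is absent when no highlighted field matched in this hit.
        result[hit["_id"]] = hit.get("highlight", {})
    return result


# Hand-written response fragment in the shape a search response takes;
# the ids and text are made up for illustration.
response = {
    "hits": {
        "hits": [
            {"_id": "1", "_source": {"title": "医务新闻"},
             "highlight": {"title": ['<em class="hlt1">医务</em>新闻']}},
            {"_id": "2", "_source": {"title": "other"}},
        ]
    }
}
print(extract_highlights(response))
```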

Pagination

"query":{
                "bool":{
                        "should":[
                                {"match":{"title":"医务"}},
                                {"match":{"topic_id":"1"}},
                                {"match":{"board":"微信"}},
                                {"range":{
                                        "@timestamp":{
                                            "from":"2016-12-28T20:38:09.815000",
                                            "to":"2017-04-11T00:00:00"
                                                    }
                                        }
                                }
                                ]
                        }   
                },
        "highlight": {
            "pre_tags": ["<em class=\"hlt1\">"],
            "post_tags": ["</em>"],
            "fields": {
            "title": {},
            "board":{}
            }
        },
        "from":0,
        "size":10

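from is the offset of the first hit to return (zero-based; it is not a page number), and size is the number of hits per page. Page n, counting from 1, therefore starts at from = (n - 1) * size.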
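
Because from counts hits to skip rather than pages, a page number has to be converted to an offset before it goes into the request body. A minimal sketch, with `page_params` as a hypothetical helper name:

```python
def page_params(page, size=10):
    """Translate a 1-based page number into the from/size pair."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    return {"from": (page - 1) * size, "size": size}


print(page_params(1))  # {'from': 0, 'size': 10}
print(page_params(3))  # {'from': 20, 'size': 10}
```

The returned dict can be merged into the request body alongside "query" and "highlight".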

Sorting

{
  "query": {
    "match_all": {}
  },
  "sort":[
            {
              "name":{"order":"asc"}}
            ]
}

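order accepts desc and asc, for descending and ascending order respectively.
To sort on a text field, fielddata must be set to true for that field when the index is created. The official documentation explains: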
Search needs to answer the question “Which documents contain this term?”, while sorting and aggregations need to answer a different question: “What is the value of this field for this document?”.
Instead, text fields use a query-time in-memory data structure called fielddata. This data structure is built on demand the first time that a field is used for aggregations, sorting, or in a script.
Fielddata is disabled on text fields by default. Set fielddata=true on [your_field_name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory.
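Fielddata is a data structure built by reading the inverted index of every segment and inverting the term-to-document relationship. It is kept in JVM memory to support in-memory lookups, and it consumes a lot of memory.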
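
The fielddata requirement can be checked on the client side before a sort clause is sent. The sketch below assumes the "properties" dict of the mapping is available locally; `build_sort` is a hypothetical helper, and the three-field mapping is a cut-down example.

```python
def build_sort(properties, field, order="asc"):
    """Return a sort clause, refusing text fields that lack fielddata."""
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    spec = properties[field]
    # The mappings in this article write the flag as the string "true".
    fielddata_on = spec.get("fielddata") in (True, "true")
    if spec["type"] == "text" and not fielddata_on:
        raise ValueError("enable fielddata on %r before sorting on it" % field)
    return [{field: {"order": order}}]


properties = {
    "name": {"type": "text", "fielddata": "true"},
    "url": {"type": "text"},
    "read_num": {"type": "integer"},
}
print(build_sort(properties, "name"))              # [{'name': {'order': 'asc'}}]
print(build_sort(properties, "read_num", "desc"))  # [{'read_num': {'order': 'desc'}}]
```

Calling build_sort(properties, "url") raises ValueError, because url is a text field without fielddata.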
