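Elasticsearch: From Indexing to Querying
- Creating the index
- Tokenization
- Querying
- Highlighting
- Pagination
- Sorting
Creating the index
- Step 1: create the index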
`if self.es.indices.exists(index='test-index'):
    # Drop the existing index so it is recreated with the current mappings
    self.es.indices.delete(index='test-index')
self.es.indices.create(index='test-index', body=self._index_mappings)`
The body parameter is a dict, defined as follows:
_index_mappings = {
    "mappings": {
        # On ES 5.x, "properties" is nested under a mapping type name
        # (such as "fulltext" in the tokenization section).
        "properties": {
            "title": {"type": "text"},
            "content": {"type": "text"},
            "st_time": {"type": "date"},
            "pt_time": {"type": "date"},
            "board": {"type": "text"},
            "topic_id": {"type": "integer"},
            "t_id": {"type": "text"},
            "url": {"type": "text"},
            "site_id": {"type": "integer"},
            #"site_name": {"type": "text"},
            "data_type": {"type": "integer"},
            "read_num": {"type": "integer"},
            "comm_num": {"type": "integer"},
            #"repost_num": {"type": "integer"},
            "img_url": {"type": "text"},
            "lan_type": {"type": "integer"},
            "is_read": {"type": "integer"},
            "name": {"type": "text"}
        }
    }
}
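ES 5.3 mappings do not use the string type (it was replaced by text and keyword in 5.0). When indexing data from a database (MongoDB in this example), map String to text, int to integer, and datetime to date.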
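The type correspondence above can be written down as a small lookup. This is only a minimal sketch, not part of the original code: `PY_TO_ES` and `build_properties` are hypothetical names, and it assumes the field types are known as Python types (roughly what a MongoDB driver hands back).

```python
from datetime import datetime

# Hypothetical lookup: Python type of a stored value -> ES 5.x field type
PY_TO_ES = {str: "text", int: "integer", datetime: "date"}


def build_properties(field_types):
    """Build the "properties" block of a mapping from {field name: Python type}."""
    properties = {}
    for name, py_type in field_types.items():
        if py_type not in PY_TO_ES:
            raise ValueError("no ES type known for field %r" % name)
        properties[name] = {"type": PY_TO_ES[py_type]}
    return {"properties": properties}


print(build_properties({"title": str, "topic_id": int, "st_time": datetime}))
```

Types outside the three listed here raise an error on purpose, so an unmapped field is noticed before the index is created.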
- Write documents to the index
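import traceback

for _doc in post_docs:
    #print("***",_doc)
    try:
        # refresh=True makes each document searchable right away, at the cost of speed
        self.es.index(index='test-index', doc_type='post', refresh=True, body=_doc)
    except Exception:
        # Log the failure and continue with the next document
        traceback.print_exc()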
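Note: each _doc must have the same form as the mapping defined in body, that is, the same field names with values of the mapped types.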
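One way to catch a mismatch before indexing is to compare a document's keys with the field names in the mapping. The helper below is a sketch under that assumption; `check_doc` is a hypothetical name and the sample data is made up for illustration. It only compares field names, not value types.

```python
def check_doc(doc, properties):
    """Compare a document's keys with the field names defined in a mapping.

    Returns (missing, unexpected): fields defined in the mapping but absent
    from the document, and fields present in the document but not defined.
    """
    defined = set(properties)
    present = set(doc)
    return sorted(defined - present), sorted(present - defined)


# Made-up sample data for illustration
properties = {"title": {"type": "text"}, "topic_id": {"type": "integer"}}
doc = {"title": "hello", "author": "someone"}

missing, unexpected = check_doc(doc, properties)
print("missing:", missing)
print("unexpected:", unexpected)
```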
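Tokenization
Tokenization uses the IK analyzer. For download and installation, see these blog posts: http://www.cnblogs.com/xing901022/p/5910139.html and http://www.yihaomen.com/article/java/629.htm
After building with Maven, the plugin files are placed in the following directory: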
...\elasticsearch-5.3.0\elasticsearch-5.3.0\plugins\ik
Once the analyzer is installed, reference it in the mappings:
self._index_mappings = {
    "mappings": {
        # "fulltext" mapping type: global settings
        "fulltext": {
            "_all": {
                "analyzer": "ik_max_word",
                "search_analyzer": "ik_max_word",
                "term_vector": "no",
                "store": "false"
            },
            # Per-field settings for the fields that need tokenization
            "properties": {
                "title": {"type": "text",
                          "fielddata": "true",
                          "analyzer": "ik_max_word",
                          "search_analyzer": "ik_max_word",
                          "include_in_all": "true"},
                "content": {"type": "text",
                            "fielddata": "true",
                            "analyzer": "ik_max_word",
                            "search_analyzer": "ik_max_word",
                            "include_in_all": "true"},
                "st_time": {"type": "date"},
                "pt_time": {"type": "date"},
                "board": {"type": "text",
                          "fielddata": "true",
                          "analyzer": "ik_max_word",
                          "search_analyzer": "ik_max_word",
                          "include_in_all": "true"},
                "topic_id": {"type": "integer"},
                "t_id": {"type": "text"},
                "url": {"type": "text"},
                "site_id": {"type": "integer"},
                #"site_name": {"type": "text"},
                "data_type": {"type": "integer"},
                "read_num": {"type": "integer"},
                "comm_num": {"type": "integer"},
                #"repost_num": {"type": "integer"},
                "img_url": {"type": "text"},
                "lan_type": {"type": "integer"},
                "is_read": {"type": "integer"},
                "name": {"type": "text",
                         "fielddata": "true",
                         "analyzer": "ik_max_word",
                         "search_analyzer": "ik_max_word",
                         "include_in_all": "true"}
            }
        }
    }
}
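The four analyzed fields in the mapping above (title, content, board, name) repeat the same settings, so they can be generated rather than typed out. The following is a sketch only; `ik_text_field` is a hypothetical helper name, and the values are copied from the mapping above.

```python
def ik_text_field(analyzer="ik_max_word"):
    """Return the settings shared by every analyzed text field."""
    return {
        "type": "text",
        "fielddata": "true",
        "analyzer": analyzer,
        "search_analyzer": analyzer,
        "include_in_all": "true",
    }


# Each field gets its own dict, so later per-field tweaks do not leak across fields
properties = {name: ik_text_field() for name in ("title", "content", "board", "name")}
properties["topic_id"] = {"type": "integer"}
properties["st_time"] = {"type": "date"}

print(sorted(properties))
```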
Querying
Simple query
"query": {
    "match_all": {}
}
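To run a query body like this from Python, it is passed to the client's search method. The sketch below assumes an elasticsearch-py client of the 5.x era, whose `search()` accepts `index=` and `body=` keyword arguments; `fetch_sources` is a hypothetical helper name, and the cluster in the usage comment is assumed to be reachable.

```python
def fetch_sources(client, index, body):
    """Run a search and return the _source of each hit.

    `client` is expected to behave like elasticsearch.Elasticsearch.
    """
    response = client.search(index=index, body=body)
    return [hit["_source"] for hit in response["hits"]["hits"]]


match_all_body = {"query": {"match_all": {}}}

# With a real cluster, usage would look like:
#   from elasticsearch import Elasticsearch
#   docs = fetch_sources(Elasticsearch(), "test-index", match_all_body)
```

Passing the client in as an argument keeps the helper easy to exercise with a stand-in object when no cluster is available.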
Conditional query
"query": {
    "bool": {
        "should": [
            {"match": {"title": "医务"}},
            {"match": {"topic_id": "1"}},
            {"match": {"board": "微信"}},
            {"range": {
                "@timestamp": {
                    "from": "2016-12-28T20:38:09.815000",
                    "to": "2017-04-11T00:00:00"
                }
            }}
        ]
    }
}
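In a bool query that has only should clauses, a document matches when at least one clause matches, so the conditions above behave like an OR. The sketch below assembles the same structure from parameters; `build_should_query` is a hypothetical name, and the field names and values are the ones used above.

```python
def build_should_query(matches, range_field=None, start=None, end=None):
    """Build a bool/should query from (field, text) pairs and an optional range."""
    should = [{"match": {field: text}} for field, text in matches]
    if range_field is not None:
        should.append({"range": {range_field: {"from": start, "to": end}}})
    return {"query": {"bool": {"should": should}}}


body = build_should_query(
    [("title", "医务"), ("topic_id", "1"), ("board", "微信")],
    range_field="@timestamp",
    start="2016-12-28T20:38:09.815000",
    end="2017-04-11T00:00:00",
)
print(len(body["query"]["bool"]["should"]))
```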
Highlighting
"query": {
    "bool": {
        "should": [
            {"match": {"title": "医务"}},
            {"match": {"topic_id": "1"}},
            {"match": {"board": "微信"}},
            {"range": {
                "@timestamp": {
                    "from": "2016-12-28T20:38:09.815000",
                    "to": "2017-04-11T00:00:00"
                }
            }}
        ]
    }
},
"highlight": {
    "pre_tags": ["<em class=\"hlt1\">"],
    "post_tags": ["</em>"],
    "fields": {
        "title": {},
        "board": {}
    }
}
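In the response, each hit that matched on a highlighted field carries a "highlight" object that maps the field name to a list of fragments wrapped in the configured tags. The sketch below pulls those fragments out; `collect_highlights` is a hypothetical name, and the sample response is made up and trimmed to the keys used here.

```python
def collect_highlights(response, field):
    """Return the highlighted fragments of `field` from every hit that has them."""
    fragments = []
    for hit in response["hits"]["hits"]:
        # Hits that did not match on this field have no entry for it
        fragments.extend(hit.get("highlight", {}).get(field, []))
    return fragments


# Made-up response for illustration only
sample = {"hits": {"hits": [
    {"_source": {"title": "医务工作"},
     "highlight": {"title": ['<em class="hlt1">医务</em>工作']}},
    {"_source": {"title": "other"}},  # matched on something else, no highlight
]}}

print(collect_highlights(sample, "title"))
```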
Pagination
"query": {
    "bool": {
        "should": [
            {"match": {"title": "医务"}},
            {"match": {"topic_id": "1"}},
            {"match": {"board": "微信"}},
            {"range": {
                "@timestamp": {
                    "from": "2016-12-28T20:38:09.815000",
                    "to": "2017-04-11T00:00:00"
                }
            }}
        ]
    }
},
"highlight": {
    "pre_tags": ["<em class=\"hlt1\">"],
    "post_tags": ["</em>"],
    "fields": {
        "title": {},
        "board": {}
    }
},
"from": 0,
"size": 10
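from is the offset of the first hit to return (a hit index, not a page number), and size is the number of hits per page.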
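Since from counts hits rather than pages, a page number has to be converted before it goes into the request. A minimal sketch, assuming 1-based page numbers; `page_to_from` and `paginate` are hypothetical helper names.

```python
def page_to_from(page, size):
    """Convert a 1-based page number to the "from" offset."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be >= 1")
    return (page - 1) * size


def paginate(body, page, size=10):
    """Return a copy of a search body with from/size set for the given page."""
    paged = dict(body)  # shallow copy, the caller's body is left untouched
    paged["from"] = page_to_from(page, size)
    paged["size"] = size
    return paged


print(paginate({"query": {"match_all": {}}}, page=3))
```

Deep pages are limited by the index setting index.max_result_window (10000 by default), which caps from + size.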
Sorting
{
    "query": {
        "match_all": {}
    },
    "sort": [
        {"name": {"order": "asc"}}
    ]
}
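order takes desc (descending) or asc (ascending).
To sort on a text field, fielddata must be set to true for that field when the index is created. The official documentation explains: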
Search needs to answer the question “Which documents contain this term?”, while sorting and aggregations need to answer a different question: “What is the value of this field for this document?”.
Instead, text fields use a query-time in-memory data structure called fielddata. This data structure is built on demand the first time that a field is used for aggregations, sorting, or in a script.
Fielddata is disabled on text fields by default. Set fielddata=true on [your_field_name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory.
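fielddata is a data structure built by reading the inverted index of every segment and inverting the term-to-document relationship. It is kept in the JVM heap to support in-memory lookups, so it uses a lot of memory.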
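Because of that memory cost, a commonly used alternative is to give the text field a keyword sub-field and sort on the sub-field, which does not need fielddata. This is not what the mappings in this post do; the sketch below is an assumption for comparison, with `raw` as a hypothetical sub-field name and `sortable_text_field` and `sort_clause` as hypothetical helpers. Note that a keyword sub-field sorts by the whole original string rather than by analyzed tokens.

```python
def sortable_text_field(analyzer="ik_max_word", sub_field="raw"):
    """A text field with a keyword sub-field that can be sorted without fielddata."""
    return {
        "type": "text",
        "analyzer": analyzer,
        "search_analyzer": analyzer,
        "fields": {sub_field: {"type": "keyword"}},
    }


def sort_clause(field, order="asc"):
    """Build the "sort" list for a single field."""
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    return [{field: {"order": order}}]


name_mapping = sortable_text_field()
body = {"query": {"match_all": {}}, "sort": sort_clause("name.raw")}
print(body)
```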