Technical Notes, Side Story 2: Building Your Own Search Framework with elasticsearch (Part 3)

Article link: https://blog.csdn.net/BetrayArmy/article/details/88604522

4 Advanced search in esengine

In this post we look at esengine's advanced search, shown in the figure below:

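The top right corner of the page holds a multi-field search form and a link to the advanced search feature. Clicking the link opens the advanced search page: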

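On the advanced search page, esengine offers a larger form for setting the search conditions: the keywords to include, the keywords to exclude, and a range to search within. Once the results are displayed, the form in the top right corner of the page can be used to continue searching: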

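To express the conditions that advanced search needs, we use the bool query and the range query from the Query DSL. A bool query combines the results of several queries with logical operations and returns the combined result. Three of its clause types express the logical conditions; their names and effects are as follows: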

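bool clauses in Elasticsearch
must	All queries under must have to match, i.e. the queries are combined with AND
should	At least one of the queries under should has to match, i.e. the queries are combined with OR
must_not	None of the queries under must_not may match, i.e. each query is negated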
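As a small illustration of these clauses, the following sketch writes a bool query as a plain Python dict. The field names and keywords are placeholders chosen for this example, not values taken from esengine's index.

```python
import json

# Placeholder field names and keywords, used only for illustration.
bool_query = {
    "query": {
        "bool": {
            # every query under must has to match (logical AND)
            "must": [
                {"match": {"title": "elasticsearch"}},
                {"match": {"content": "search"}},
            ],
            # no query under must_not may match (logical NOT)
            "must_not": [
                {"match": {"content": "deprecated"}},
            ],
        }
    }
}

print(json.dumps(bool_query, indent=2))
```

Renaming the `must` key to `should` turns the AND into an OR over the same list of queries.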

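The range query, as its name suggests, matches values of a given field that fall within a given range. The bounds are expressed with keywords such as gte and lte rather than with mathematical operators. The common keywords are: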

range keywords in Elasticsearch
lte	less than or equal to
lt	less than
gte	greater than or equal to
gt	greater than
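To make the keywords concrete, here is a minimal sketch of a helper that produces a range clause for a date field. The helper name `build_range_clause` is hypothetical and not part of esengine; the dates are formatted as ISO strings on the assumption that the field uses a default date mapping.

```python
from datetime import date

def build_range_clause(field, start, end):
    """Return a range clause that matches start <= field <= end (hypothetical helper)."""
    return {"range": {field: {"gte": start.isoformat(), "lte": end.isoformat()}}}

clause = build_range_clause("createdate", date(1900, 1, 1), date(2020, 1, 1))
print(clause)
```

Swapping `gte`/`lte` for `gt`/`lt` would make either bound exclusive.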

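To use bool and range together, we put the range query under the filter keyword of the bool query, which filters the query results by the given condition.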
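The sketch below shows one way such a combination can look, with a range query placed under filter next to a should clause. The keywords and dates are made up for the example. It also sets `minimum_should_match`, because Elasticsearch treats should clauses as optional by default once the same bool query contains a must or filter clause.

```python
import json

# Made-up keywords and dates; "title" and "createdate" are example field names.
query = {
    "query": {
        "bool": {
            "should": [
                {"prefix": {"title": "test"}},
                {"prefix": {"title": "demo"}},
            ],
            # Without this setting the should clauses only influence scoring
            # when a filter clause is present.
            "minimum_should_match": 1,
            # filter clauses have to match but do not contribute to the score
            "filter": [
                {"range": {"createdate": {"gte": "2019-01-01", "lte": "2019-12-31"}}},
            ],
        }
    }
}

print(json.dumps(query, indent=2))
```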

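With these two queries in hand, we can work out the Query DSL for advanced search. It is clearly one large bool query. For the keywords to include, we pick must or should depending on the include method the user chose (AND or OR). For the keywords to exclude, we use must_not. Finally, filter restricts the results to the selected date range. One caveat: when a bool query contains a filter clause, its should clauses are optional by default, so minimum_should_match has to be set to 1 for the OR case to actually restrict the results.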

We therefore use __buildAdvanceQueryBody to build the JSON that satisfies these conditions. The code is as follows:

# esenginecore.py
import json

class esengine:
    # ...
    def __buildAdvanceQueryBody(self,includefields,includekeywords,
                                excludefields,excludekeywords,datefield,
                                startdate,enddate,includemethod):
        querystr = '{"query":{' \
                   '"bool":{'
        if includemethod == '1':
            must_clause_header = '"must":[ %s ]'
        else:
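            # should is optional by default once a filter clause is present,
            # so require at least one should clause to match
            must_clause_header = '"should":[ %s ],"minimum_should_match":1'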
        must_not_clause_header = '"must_not":[ %s ]'
        match_clause = '{ "prefix": { "%s":"%s" } }'
        range_clause = '"filter":[ { "range": { "%s" :{ "gte":"%s","lte":"%s" } } } ]'
        includekeywordlist = []
        excludekeywordlist = []
        if includekeywords != '':
            includekeywordlist = includekeywords.split(',')
        if excludekeywords != '':
            excludekeywordlist = excludekeywords.split(',')
        total_include_match = ''
        total_exclude_match = ''
        must_clause_body = ''
        must_not_clause_body = ''
        if len(includefields) > 0:
            for includefield in includefields:
                for includekeyword in includekeywordlist:
                    if total_include_match != '':
                        total_include_match += ','
                    tmp_match_clause = match_clause % (includefield,includekeyword)
                    total_include_match += tmp_match_clause
        if len(excludefields) > 0:
            for excludefield in excludefields:
                for excludekeyword in excludekeywordlist:
                    if total_exclude_match != '':
                        total_exclude_match += ','
                    tmp_match_clause = match_clause % (excludefield,excludekeyword)
                    total_exclude_match += tmp_match_clause

        range_clause_body = range_clause % (datefield, startdate, enddate)
        if total_include_match != '':
            must_clause_body = must_clause_header % total_include_match
        if total_exclude_match != '':
            must_not_clause_body = must_not_clause_header % total_exclude_match

        # Join only the non-empty clauses so that an empty must/should or
        # must_not part never leaves a stray comma in the JSON.
        clauses = [c for c in (must_clause_body, must_not_clause_body,
                               range_clause_body) if c != '']
        querystr = querystr + ','.join(clauses) + '}}}'
        body = json.loads(querystr)
        return body

Compared with the other query-body builders, this function takes many more parameters, so a table is again used to describe each one:

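Parameters of __buildAdvanceQueryBody
Parameter	Type	Meaning
includefields	list	Fields that should contain the keywords
includekeywords	str	Keywords to search for, separated by ASCII commas
excludefields	list	Fields that should not contain the excluded keywords
excludekeywords	str	Keywords to exclude, separated by ASCII commas
datefield	str	Field on which the date range is applied
startdate	date	Start date
enddate	date	End date
includemethod	str	Whether the keywords are combined with AND ('1') or OR (any other value)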

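To build this fairly complex JSON, the function splits it into a must/should part, a must_not part, and a filter part, and expresses every sub-query with a single string template, match_clause. Despite its name, the template uses prefix rather than match. The reason is that prefix matching returns more results: match compares whole analyzed terms, so a keyword that is only the beginning of an indexed term finds nothing, whereas prefix matches every term that starts with the keyword. Keep in mind that prefix is a term-level query and does not analyze the keyword, so on an analyzed text field the keyword should be given in the form in which the terms are indexed (typically lowercase).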
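The difference between the two matching styles can be pictured with a few lines of plain Python. This is a simplified illustration only: Elasticsearch matches against an inverted index, and the term list here is invented for the example.

```python
# Invented list of indexed terms; real matching runs against an inverted index.
indexed_terms = ["test", "testing", "tester", "contest"]
keyword = "test"

# Whole-term comparison, roughly what match does for a single analyzed term.
whole_term_hits = [t for t in indexed_terms if t == keyword]
# Prefix comparison, roughly what the prefix query does.
prefix_hits = [t for t in indexed_terms if t.startswith(keyword)]

print(whole_term_hits)  # ['test']
print(prefix_hits)      # ['test', 'testing', 'tester']
```

Note that "contest" is found by neither: a prefix query only looks at the beginning of a term.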

With __buildAdvanceQueryBody, the JSON we end up with looks like this:

{"query":{
	"bool":{
			"must":[ 
				{ "prefix": { "title":"test" } },
				{ "prefix": { "content":"test" } } 
			],
			"must_not":[ 
				{ "prefix": { "content":"0" } } 
			],
			"filter":[ 
				{ "range": 
					{ "createdate" :
						{ "gte":"1900-01-01",
						"lte":"2020-01-01" 
						} 
					} 
				} 
			]
		}
	}
}
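String templates are easy to follow, but a keyword that contains a double quote or a backslash would produce invalid JSON. As an alternative sketch, and not the esengine implementation, the same body can be assembled from Python dicts and left to the json module for escaping. The function name `build_advanced_query` and its parameters are hypothetical; unlike the original, it takes the keywords as lists and a boolean instead of the '1' flag.

```python
import json

def build_advanced_query(include_fields, include_keywords,
                         exclude_fields, exclude_keywords,
                         date_field, start_date, end_date, match_all=True):
    """Build the advanced-search body as a dict (hypothetical helper)."""
    # One prefix clause per (field, keyword) pair, as in the string version.
    include = [{"prefix": {f: k}} for f in include_fields for k in include_keywords]
    exclude = [{"prefix": {f: k}} for f in exclude_fields for k in exclude_keywords]

    bool_body = {
        "filter": [{"range": {date_field: {"gte": start_date, "lte": end_date}}}]
    }
    if include:
        if match_all:
            bool_body["must"] = include
        else:
            bool_body["should"] = include
            # should is optional next to filter unless this is set
            bool_body["minimum_should_match"] = 1
    if exclude:
        bool_body["must_not"] = exclude
    return {"query": {"bool": bool_body}}

body = build_advanced_query(["title", "content"], ["test"],
                            ["content"], ["0"],
                            "createdate", "1900-01-01", "2020-01-01")
print(json.dumps(body, indent=2))
```

Called with these arguments, the sketch yields the same must, must_not, and filter clauses as the JSON shown above; only the key order differs.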

Next, we implement the corresponding advancesearch function:

# esenginecore.py
class esengine:
    # ...
    def advancesearch(self,indexname,doctype,includefields,includekeywords,excludefields,excludekeywords,datefield,startdate,enddate,includemethod):
        querybody = self.__buildAdvanceQueryBody(includefields,
                                                 includekeywords,
                                                 excludefields,
                                                 excludekeywords,
                                                 datefield,
                                                 startdate,
                                                 enddate,
                                                 includemethod)
        res = self.es.search(indexname, doctype, body=querybody)
        totalcount, result = self.__parseresult(res, includefields)
        return totalcount,result
    # ...

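There is not much to say about this function: it builds the JSON from the arguments passed in, sends it to elasticsearch to get the search results, and will be called from the View function later.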
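The private __parseresult helper is not shown in this post. As a rough idea of what reading a search response involves, the sketch below extracts the total count and the stored documents from a response dict. The helper name `summarize_hits` is hypothetical, and the sample response is hand-written and trimmed to the fields the helper reads; it is not real output from esengine.

```python
def summarize_hits(res):
    """Return (total, sources) from a search response dict (hypothetical helper)."""
    total = res["hits"]["total"]
    # Elasticsearch 7 and later report total as {"value": n, "relation": ...};
    # older versions report a plain integer.
    if isinstance(total, dict):
        total = total["value"]
    sources = [hit["_source"] for hit in res["hits"]["hits"]]
    return total, sources

# Hand-written sample response, trimmed to the fields used above.
sample = {
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "hits": [
            {"_id": "1", "_source": {"title": "test post", "createdate": "2019-03-01"}},
        ],
    }
}

print(summarize_hits(sample))
```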

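In this post we covered esengine's advanced search. With it we can specify more detailed search conditions and get more results, or more precise ones. The next post will cover esengine's Views and forms, so stay tuned.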