Elasticsearch教程---基本查询(八)

10.1 term & terms查询

10.1.1 term

term是代表完全匹配,也就是精确查询,搜索前不会再对搜索词进行分词,所以我们的搜索词必须是文档分词集合中的一个。

比如说我们要查找省份(province)中为“湖北省”的所有文档,JSON如下:

{
	"from": 0,
	"size": 5,
	"query": {
		"term": {
			"province": {
				"value": "湖北省",
				"boost": 1.0
			}
		}
	}
}

JAVA代码示例:com.javablog.elasticsearch.query.impl.BaseQueryImpl

/**
 * 查询某个字段里含有某个关键词的文档
 * @param indexName   索引名
 * @param typeName    TYPE
 * @param fieldName   字段名称
 * @param fieldValue  字段值
 * @return 返回结果列表
 * @throws IOException
 */
public List<Map<String,Object>> termQuery(String indexName, String typeName, String fieldName, String fieldValue) throws IOException {
    List<Map<String,Object>> response =new ArrayList<>();
    SearchRequest searchRequest = new SearchRequest(indexName);
    searchRequest.types(typeName);
    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
    sourceBuilder.query(QueryBuilders.termQuery(fieldName, fieldValue));
    sourceBuilder.from(0);
    sourceBuilder.size(10);
    searchRequest.source(sourceBuilder);
    SearchResponse searchResponse =  restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
    SearchHits hits = searchResponse.getHits();
    System.out.println("count:"+hits.totalHits);
    SearchHit[] h =  hits.getHits();
    for (SearchHit hit : h) {
        System.out.println("结果"+hit.getSourceAsMap());
        response.add(hit.getSourceAsMap());
    }
    return response;
}

10.1.2 terms

在查询的字段只有一个值的时候,应该使用term而不是terms,在查询字段包含多个的时候才使用terms(类似于sql中的in、or),使用terms语法.

比如说我们要查找省份(province)为“湖北省”或“北京”的所有文档,JSON如下:

{
	"from": 0,
	"size": 5,
	"query": {
		"terms": {
			"province": ["湖北省", "北京"],
			"boost": 1.0
		}
	}
}

JAVA代码示例:com.javablog.elasticsearch.query.impl.BaseQueryImpl

/**
 * 查询某个字段里含有多个关键词的文档
 * @param indexName   索引名
 * @param typeName    TYPE
 * @param fieldName   字段名称
 * @param fieldValues  字段值
 * @return 返回结果列表
 * @throws IOException
 */
public List<Map<String,Object>> termsQuery(String indexName, String typeName, String fieldName, String... fieldValues) throws IOException {
    List<Map<String,Object>> response =new ArrayList<>();
    SearchRequest searchRequest = new SearchRequest(indexName);
    searchRequest.types(typeName);
    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
    sourceBuilder.query(QueryBuilders.termsQuery(fieldName,fieldValues));
    sourceBuilder.from(0);
    sourceBuilder.size(10);
    searchRequest.source(sourceBuilder);
    log.info("source:" + searchRequest.source());
    SearchResponse searchResponse =  restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
    SearchHits hits = searchResponse.getHits();
    System.out.println("count:"+hits.totalHits);
    SearchHit[] h =  hits.getHits();
    for (SearchHit hit : h) {
        System.out.println("结果"+hit.getSourceAsMap());
        response.add(hit.getSourceAsMap());
    }
    return response;
}

10.1.3 测试演示用例

//查询省份为湖北省的数据
@Test
public void testTerm() throws IOException {
    List<Map<String, Object>> list = baseQuery.termQuery(indexName, type, "province", "湖北省");
    System.out.println("term查询数量:" + list.size());
    System.out.println(list);
}

查询省份为湖北省或北京的数据
@Test
public void testTerms() throws IOException {
    List<Map<String, Object>> list1 = baseQuery.termsQuery(indexName, type, "province", "湖北省","北京");
    System.out.println("terms查询数量:" + list1.size());
    System.out.println(list1);
}

10.1.4 学生实习题

参考上面的代码,实现如下功能:

1、查出手机号为“13800000000”的文档
2、查出公司名称为“途虎养车”或“盒马鲜生”的文档

10.2 match查询

像 match 或 query_string 这样的查询是高层查询,它们了解字段映射的信息:
1.如果查询 日期(date) 或 整数(integer) 字段,它们会将查询字符串分别作为日期或整数对待。
2.如果查询一个( not_analyzed )未分析的精确值字符串字段, 它们会将整个查询字符串作为单个词项对待。
3.但如果要查询一个( analyzed )已分析的全文字段, 它们会先将查询字符串传递到一个合适的分析器,然后生成一个供查询的词项列表。
一旦组成了词项列表,这个查询会对每个词项逐一执行底层的查询,再将结果合并,然后为每个文档生成一个最终的相关度评分。
match查询其实底层是多个term查询,最后将term的结果合并。

10.2.1 match_all查询

查询所有的文档。

JSON如下:

{
	"query": {
		"match_all": {
			"boost": 1.0
		}
	}
}

JAVA代码示例:com.javablog.elasticsearch.query.impl.BaseQueryImpl

/**
 * 查询所有文档
 * @param indexName   索引名称
 * @param typeName    TYPE
 * @throws IOException
 */
@Override
public void queryAll(String indexName, String typeName) throws IOException {
    SearchRequest searchRequest = new SearchRequest(indexName);
    searchRequest.types(typeName);
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(QueryBuilders.matchAllQuery());
    searchSourceBuilder.from(0);
    searchSourceBuilder.size(20);
    searchRequest.source(searchSourceBuilder);
    log.info("source:" + searchRequest.source());
    SearchResponse searchResponse =  restHighLevelClient.search(searchRequest,RequestOptions.DEFAULT);
    SearchHits hits = searchResponse.getHits();
    System.out.println(hits.totalHits);
    SearchHit[] h =  hits.getHits();
    for (SearchHit hit : h) {
        System.out.println(hit.getSourceAsMap());
    }
}

演示用例:com.javablog.elasticsearch.test.document.BaseQueryTest

   //查询全部
    @Test
    public void testMatchsAll() throws IOException {
        baseQuery.queryAll(indexName, type);
    }

10.2.2 match查询

{
	"query": {
		"match": {
			"smsContent": {
				"query": "收货安装",
				"operator": "OR",
				"prefix_length": 0,
				"max_expansions": 50,
				"fuzzy_transpositions": true,
				"lenient": false,
				"zero_terms_query": "NONE",
				"auto_generate_synonyms_phrase_query": true,
				"boost": 1.0
			}
		}
	}
}

对“收货安装”进行分词:

收货,,安装

JAVA代码示例:com.javablog.elasticsearch.query.impl.BaseQueryImpl

/**
 * match 搜索
 * @param indexName 索引名称
 * @param typeName  TYPE名称
 * @param field     字段
 * @param keyWord   搜索关键词
 * @throws IOException
 */
@Override
public void queryMatch(String indexName, String typeName, String field,String keyWord) throws IOException {
    SearchRequest searchRequest = new SearchRequest(indexName);
    searchRequest.types(typeName);
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(QueryBuilders.matchQuery(field,keyWord));
    searchRequest.source(searchSourceBuilder);
    log.info("source:" + searchRequest.source());
    SearchResponse searchResponse =  restHighLevelClient.search(searchRequest,RequestOptions.DEFAULT);
    SearchHits hits = searchResponse.getHits();
    System.out.println("count:"+hits.totalHits);
    SearchHit[] h =  hits.getHits();
    for (SearchHit hit : h) {
        System.out.println("结果"+hit.getSourceAsMap());
    }
}

演示用例:com.javablog.elasticsearch.test.document.BaseQueryTest

@Test
public void testQueryMatch() throws IOException {
    baseQuery.queryMatch(indexName,type,"smsContent","收货安装");
}

查询结果

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-oTf61FRJ-1584624053058)(mdpic/es_match.png)]

10.2.3 布尔match查询

查询短信内容中 即有“中国”同时也有“健康”的文档:

{
	"query": {
		"match": {
			"smsContent": {
				"query": "中国 健康",
				"operator": "AND",
				"prefix_length": 0,
				"max_expansions": 50,
				"fuzzy_transpositions": true,
				"lenient": false,
				"zero_terms_query": "NONE",
				"auto_generate_synonyms_phrase_query": true,
				"boost": 1.0
			}
		}
	}
}

查询短信内容中有“中国”或“健康”的文档:

{
	"query": {
		"match": {
			"smsContent": {
				"query": "中国 健康",
				"operator": "OR",
				"prefix_length": 0,
				"max_expansions": 50,
				"fuzzy_transpositions": true,
				"lenient": false,
				"zero_terms_query": "NONE",
				"auto_generate_synonyms_phrase_query": true,
				"boost": 1.0
			}
		}
	}
}

JAVA代码示例:com.javablog.elasticsearch.query.impl.BaseQueryImpl

/**
 * 布尔match查询
 * @param indexName    索引名称
 * @param typeName     TYPE名称
 * @param field        字段名称
 * @param keyWord      关键词
 * @param op           该参数取值为or 或 and
 * @throws IOException
 */
@Override
public void queryMatchWithOperate(String indexName, String typeName, String field,String keyWord,Operator op) throws IOException {
    SearchRequest searchRequest = new SearchRequest(indexName);
    searchRequest.types(typeName);
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(QueryBuilders.matchQuery(field,keyWord).operator(op));
    searchRequest.source(searchSourceBuilder);
    log.info("source:" + searchRequest.source());
    SearchResponse searchResponse =  restHighLevelClient.search(searchRequest,RequestOptions.DEFAULT);
    SearchHits hits = searchResponse.getHits();
    System.out.println("count:"+hits.totalHits);
    SearchHit[] h =  hits.getHits();
    for (SearchHit hit : h) {
        System.out.println("结果"+hit.getSourceAsMap());
    }
}

演示用例:com.javablog.elasticsearch.test.document.BaseQueryTest

@Test
public void testMatchWithOperate() throws IOException {
    baseQuery.queryMatchWithOperate(indexName, type, "smsContent", "中国 健康", Operator.AND);
    baseQuery.queryMatchWithOperate(indexName, type, "smsContent", "中国 健康", Operator.OR);
}

查询结果

在这里插入图片描述

查询结果2:

在这里插入图片描述

10.2.4 multi_match查询

multi_match查询与match查询类似,不同的是他不在作用于一个字段上,该查询通过字段fields参数作用在多个字段上。

例如:查询省份(province)与短信内容(smsContent)中含有“北京”关键字的文档:

{
	"query": {
		"multi_match": {
			"query": "北京",
			"fields": ["province^1.0", "smsContent^1.0"],
			"type": "best_fields",
			"operator": "OR",
			"slop": 0,
			"prefix_length": 0,
			"max_expansions": 50,
			"zero_terms_query": "NONE",
			"auto_generate_synonyms_phrase_query": true,
			"fuzzy_transpositions": true,
			"boost": 1.0
		}
	}
}

JAVA代码示例:com.javablog.elasticsearch.query.impl.BaseQueryImpl

/**
 *该查询通过字段fields参数作用在多个字段上。
 * @param indexName  索引名称
 * @param typeName   TYPE名称
 * @param keyWord    关键字
 * @param fieldNames  字段
 * @throws IOException
 */
@Override
public void queryMulitMatch(String indexName, String typeName,String keyWord,String ...fieldNames) throws IOException {
    SearchRequest searchRequest = new SearchRequest(indexName);
    searchRequest.types(typeName);
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(QueryBuilders.multiMatchQuery(keyWord,fieldNames));
    searchRequest.source(searchSourceBuilder);
    SearchResponse searchResponse =  restHighLevelClient.search(searchRequest,RequestOptions.DEFAULT);
    SearchHits hits = searchResponse.getHits();
    System.out.println("count:"+hits.totalHits);
    SearchHit[] h =  hits.getHits();
    for (SearchHit hit : h) {
        System.out.println("结果"+hit.getSourceAsMap());
    }
}

演示用例:com.javablog.elasticsearch.test.document.BaseQueryTest

    @Test
    public void testQueryMulitMatch() throws IOException {
        baseQuery.queryMulitMatch(indexName,type,"北京","smsContent","province");
    }

查询结果:
在这里插入图片描述

10.2.5 match_phrase 查询

对查询词语分析后构建一个短语查询,而不是一个布尔表达式。

一、什么是近似匹配

match_phrase的使用场景

现假设有两个句子

1、java is my favourite programming language, and I also think spark is a very good big data system.

2、java spark are very related, because scala is spark’s programming language and scala is also based on jvm like java.

进行match query,query语法如下:

{
    "query":{
            "match": {
            "content": "java spark"
            }}

match query进行搜索,只能搜索到包含java或spark的document,包含java和spark的doc都会被返回回来。现在假如说我们要实现以下三个需求:

1、java spark,就靠在一起,中间不能插入任何其他字符,就要搜索出来这种doc

2、java spark,但是要求,java和spark两个单词靠的越近,doc的分数越高,排名越靠前

3、我们搜索时,文档中必须包含java spark这两个文档,且他们之间的距离不能超过5,

要实现上述三个需求,用match做全文检索,是搞不定的,必须得用proximity match(近似匹配),proximity match分两种,短语匹配(phrase match)和近似匹配(proximity match)。这一讲,要学习的是phrase match,就是仅仅搜索出java和spark靠在一起的那些doc,比如有个doc,是java use’d spark,这就不是结果。

二、match_phrase的用法

phrase match,就是要去将多个term作为一个短语,一起去搜索,只有包含这个短语的doc才会作为结果返回。match是只在包含其中任何一个分词就返回。

1、match语法:

GET /forum/article/_search

{
    "query": {
        "match": {
        "content": "java spark"
        }
	}
}

单单包含java的doc也返回了,不是我们想要的结果

2、改一个数据,将一个doc的content设置为恰巧包含java spark这个短语,以方便搜索

POST /forum/article/5/_update
{
    "doc": {

    "content": "spark is best big data solution based on scala ,an programming language similar to java spark"

    }
}

3、match_phrase语法

GET /forum/article/_search
{
    "query": {
            "match_phrase": {
            "content": "java spark"
            }
    }
}

结果只返回了最后我们修改的那个doc,只包含java或spark的doc不会返回

JAVA代码示例:com.javablog.elasticsearch.query.impl.BaseQueryImpl

/**
 * 对查询词语分析后构建一个短语查询
 * @param indexName    索引名称
 * @param typeName     TYPE名称
 * @param fieldName    字段名称
 * @param keyWord      关键字
 * @throws IOException
 */
@Override
public void queryMatchPhrase(String indexName, String typeName,String fieldName,String keyWord) throws IOException {
    SearchRequest searchRequest = new SearchRequest(indexName);
    searchRequest.types(typeName);
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(QueryBuilders.matchPhraseQuery(fieldName,keyWord));
    searchRequest.source(searchSourceBuilder);
    SearchResponse searchResponse =  restHighLevelClient.search(searchRequest,RequestOptions.DEFAULT);
    SearchHits hits = searchResponse.getHits();
    System.out.println("count:"+hits.totalHits);
    SearchHit[] h =  hits.getHits();
    for (SearchHit hit : h) {
        System.out.println("结果"+hit.getSourceAsMap());
    }
}

演示用例:com.javablog.elasticsearch.test.document.BaseQueryTest

	@Test
    public void testMathPhrase() throws IOException, InterruptedException {
        SmsSendLog smsSendLog5 = new SmsSendLog();
        smsSendLog5.setMobile("13600000088");
        smsSendLog5.setCorpName("中国移动");
        smsSendLog5.setCreateDate(new Date());
        smsSendLog5.setSendDate(new Date());
        smsSendLog5.setIpAddr("10.126.2.8");
        smsSendLog5.setLongCode("10690000998");
        smsSendLog5.setReplyTotal(60);
        smsSendLog5.setProvince("湖北省");
        smsSendLog5.setOperatorId(1);
        smsSendLog5.setSmsContent("java is my favourite programming language, and I also think spark is a very good big data system..");
        docService.add(indexName,type, JSON.toJSONString(smsSendLog5),"200");
        smsSendLog5.setSmsContent("java spark very related, because scala is spark's programming language and scala is also based on jvm like java.");
        docService.add(indexName,type, JSON.toJSONString(smsSendLog5),"300");
        //不是实时刷新,不休眠一下可能查不出来
        Thread.sleep(3000l);
        baseQuery.queryMatchPhrase(indexName,type,"smsContent","java spark");
    }

1.1 ids查询

ids查询是一类简单的查询,它过滤返回的文档只包含其中指定标识符的文档,该查询默认指定作用在“_id”上面。

例如查询_id为1,2,3的文档:

{
	"query": {
		"ids": {
			"type": [],
			"values": ["1", "2", "3"],
			"boost": 1.0
		}
	}
}

JAVA代码示例:com.javablog.elasticsearch.query.impl.BaseQueryImpl

/**
 * 查出指定_id的文档
 * @param indexName   索引名称
 * @param typeName    TYPE名称
 * @param ids         _id值
 * @throws IOException
 */
@Override
public void idsQuery(String indexName, String typeName,String ... ids) throws IOException {
    SearchRequest searchRequest = new SearchRequest(indexName);
    searchRequest.types(typeName);
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(QueryBuilders.idsQuery().addIds(ids));
    searchRequest.source(searchSourceBuilder);
    log.info("string:" + searchRequest.source());
    SearchResponse searchResponse =  restHighLevelClient.search(searchRequest,RequestOptions.DEFAULT);
    SearchHits hits = searchResponse.getHits();
    System.out.println("count:"+hits.totalHits);
    SearchHit[] h =  hits.getHits();
    for (SearchHit hit : h) {
        System.out.println("结果"+hit.getSourceAsMap());
    }
}

演示用例:com.javablog.elasticsearch.test.document.testIdsQuery

@Test
public void testIdsQuery() throws IOException, InterruptedException {
    baseQuery.idsQuery(indexName,type,"1","2","3");
}

1.2 prefix查询

前缀查询,可以使我们找到某个字段以给定前缀开头的文档。

最典型的使用场景,一般是在文本框录入的时候的联想功能,如下所示:
在这里插入图片描述
例如想查到公司名以“中国”开头的文档:

{
	"query": {
		"prefix": {
			"corpName": {
				"value": "中国",
				"boost": 1.0
			}
		}
	}
}

JAVA代码示例:com.javablog.elasticsearch.query.impl.BaseQueryImpl

/**
 * 查找某字段以某个前缀开头的文档
 * @param indexName 索引名称
 * @param typeName  TYPE名称
 * @param field     字段
 * @param prefix    前缀
 * @throws IOException
 */
@Override
public void prefixQuery(String indexName, String typeName, String field, String prefix) throws IOException {
    SearchRequest searchRequest = new SearchRequest(indexName);
    searchRequest.types(typeName);
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(QueryBuilders.prefixQuery(field,prefix));
    searchRequest.source(searchSourceBuilder);
    log.info("string:" + searchRequest.source());
    SearchResponse searchResponse =  restHighLevelClient.search(searchRequest,RequestOptions.DEFAULT);
    SearchHits hits = searchResponse.getHits();
    System.out.println("count:"+hits.totalHits);
    SearchHit[] h =  hits.getHits();
    for (SearchHit hit : h) {
        System.out.println("结果"+hit.getSourceAsMap());
    }
}

演示用例:com.javablog.elasticsearch.test.document.testIdsQuery

@Test
public void testPrefixQuery() throws IOException, InterruptedException {
    baseQuery.prefixQuery(indexName,type,"corpName","中国");
}

1.3 fuzzy查询

fuzzy才是实现真正的模糊查询,我们输入的字符可以是个大概,他可以根据我们输入的文字大概进行匹配查询。

参数说明:

prefix_length

不能被 “模糊化” 的初始字符数。 大部分的拼写错误发生在词的结尾,而不是词的开始。 例如通过将 prefix_length 设置为 3 ,你可能够显著降低匹配的词项数量。

例如用户在查询过程中把“中国移动”输为了“中国联动”:

{
	"query": {
		"fuzzy": {
			"corpName": {
				"value": "中国联动",
				"fuzziness": "AUTO",
				"prefix_length": 0,
				"max_expansions": 50,
				"transpositions": false,
				"boost": 1.0
			}
		}
	}
}

JAVA代码示例:com.javablog.elasticsearch.query.impl.BaseQueryImpl

/**
 * 查找某字段以某个前缀开头的文档
 * @param indexName 索引名称
 * @param typeName  TYPE名称
 * @param field     字段
 * @param value     查询关键字
 * @throws IOException
 */
@Override
public void fuzzyQuery(String indexName, String typeName,String field,String value) throws IOException {
    SearchRequest searchRequest = new SearchRequest(indexName);
    searchRequest.types(typeName);
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(QueryBuilders.fuzzyQuery(field,value).prefixLength(2));
    searchRequest.source(searchSourceBuilder);
    log.info("string:" + searchRequest.source());
    SearchResponse searchResponse =  restHighLevelClient.search(searchRequest,RequestOptions.DEFAULT);
    SearchHits hits = searchResponse.getHits();
    System.out.println("count:"+hits.totalHits);
    SearchHit[] h =  hits.getHits();
    for (SearchHit hit : h) {
        System.out.println("结果"+hit.getSourceAsMap());
    }
}

把prefixLength(2)参数改为prefixLength(3)就查不到结果了,不能被 “模糊化” 的初始字符变成了中国联

演示用例:com.javablog.elasticsearch.test.document.testIdsQuery

@Test
public void testfuzzyQuery() throws IOException, InterruptedException {
    baseQuery.fuzzyQuery(indexName,type,"corpName","中国联动");
}

学生实习题

用户输入错误地把“中国平安”输为了“中国保安”,通过fuzzy查询也可以查出公司名为中国平安的文档。

1.4 wildcard查询

wildcard查询允许我们在要查询的内容中使用通配符*和?,和SQL语句。

注意:wildcard查询不注意查询性能,应尽可能避免使用。

例如用户在查询公司名以“中国”开头的文档:

{
	"query": {
		"wildcard": {
			"corpName": {
				"wildcard": "中国*",
				"boost": 1.0
			}
		}
	}
}

JAVA代码示例:com.javablog.elasticsearch.query.impl.BaseQueryImpl

/**
 * 以通配符来查询
 * @param indexName     索引名称
 * @param typeName      TYPE名称
 * @param fieldName     字段名称
 * @param wildcard      通配符
 * @throws IOException
 */
@Override
public void wildCardQuery(String indexName, String typeName, String fieldName, String wildcard) throws IOException {
    SearchRequest searchRequest = new SearchRequest(indexName);
    searchRequest.types(typeName);
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(QueryBuilders.wildcardQuery(fieldName, wildcard));
    searchRequest.source(searchSourceBuilder);
    log.info("string:" + searchRequest.source());
    SearchResponse searchResponse =  restHighLevelClient.search(searchRequest,RequestOptions.DEFAULT);
    SearchHits hits = searchResponse.getHits();
    System.out.println("count:"+hits.totalHits);
    SearchHit[] h =  hits.getHits();
    for (SearchHit hit : h) {
        System.out.println("结果"+hit.getSourceAsMap());
    }
}

演示用例:com.javablog.elasticsearch.test.document.testIdsQuery

@Test
public void testWildCardQuery() throws IOException, InterruptedException {
    baseQuery.wildCardQuery(indexName,type,"corpName","中国*");
    baseQuery.wildCardQuery(indexName,type,"corpName","中国?安保险有限公司");
}

1.5 range查询

本章到目前为止,对于数字,只介绍如何处理精确值查询。 实际上,对数字范围进行过滤有时会更有用。例如,我们可能想要查找所有价格大于 $20 且小于 $40 美元的产品。

在 SQL 中,范围查询可以表示为:

SELECT document
FROM   products
WHERE  price BETWEEN 20 AND 40

Elasticsearch 有 range 查询, 不出所料地,可以用它来查找处于某个范围内的文档:

"range" : {
    "price" : {
        "gte" : 20,
        "lte" : 40
    }
}

range 查询可同时提供包含(inclusive)和不包含(exclusive)这两种范围表达式,可供组合的选项如下:

  • gt: > 大于(greater than)

  • lt: < 小于(less than)

  • gte: >= 大于或等于(greater than or equal to)

  • lte: <= 小于或等于(less than or equal to)

例如要查询出短信状态报告响应时间在1-20秒之内的文档:

string: {
	"query": {
		"range": {
			"replyTotal": {
				"from": 1,
				"to": 20,
				"include_lower": true,
				"include_upper": true,
				"boost": 1.0
			}
		}
	}
}

JAVA代码示例:com.javablog.elasticsearch.query.impl.BaseQueryImpl

/**
 * 范围查询
 * @param indexName     索引名称
 * @param typeName      TYPE名称
 * @param fieldName     字段名称
 * @param from
 * @param to
 * @throws IOException
 */
@Override
public void rangeQuery(String indexName, String typeName, String fieldName, int from,int to) throws IOException {
    SearchRequest searchRequest = new SearchRequest(indexName);
    searchRequest.types(typeName);
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(QueryBuilders.rangeQuery(fieldName).from(from).to(to));
    searchRequest.source(searchSourceBuilder);
    log.info("string:" + searchRequest.source());
    SearchResponse searchResponse =  restHighLevelClient.search(searchRequest,RequestOptions.DEFAULT);
    SearchHits hits = searchResponse.getHits();
    System.out.println("count:"+hits.totalHits);
    SearchHit[] h =  hits.getHits();
    for (SearchHit hit : h) {
        System.out.println("结果"+hit.getSourceAsMap());
    }
}

演示用例:com.javablog.elasticsearch.test.document.testIdsQuery

@Test
public void testRangeCardQuery() throws IOException {
    baseQuery.rangeQuery(indexName,type,"replyTotal",1,20);
}

学生实习题

查询价格(fee)在5-8分之间的短信文档。

1.6 regexp查询

正则表达式查询,wildcard和regexp查询的工作方式和prefix查询完全一样。它们也需要遍历倒排索引中的词条列表来找到所有的匹配词条,然后逐个词条地收集对应的文档ID。它们和prefix查询的唯一区别在于它们能够支持更加复杂的模式。

这也意味着使用它们存在相同的风险。对一个含有很多不同词条的字段运行这类查询是非常消耗资源的。避免使用一个以通配符开头的模式(比如,*foo)。

尽管对于前缀匹配,可以在索引期间准备你的数据让它更加高效,通配符和正则表达式匹配只能在查询期间被完成。虽然使用场景有限,但是这些查询也有它们的用武之地。

注意:prefix,wildcard以及regexp查询基于词条进行操作。如果你在一个analyzed字段上使用了它们,它们会检查字段中的每个词条,而不是整个字段。

例如查找长号码(longCode)以1069开头后面是数字的文档:

{
	"query": {
		"regexp": {
			"longCode": {
				"value": "1069[0-9].+",
				"flags_value": 65535,
				"max_determinized_states": 10000,
				"boost": 1.0
			}
		}
	}
}

JAVA代码示例:com.javablog.elasticsearch.query.impl.BaseQueryImpl

/**
 *正则表达示查询
 * @param indexName     索引名称
 * @param typeName      TYPE名称
 * @param fieldName     字段名称
 * @param regexp        正则表达示
 * @throws IOException
 */
@Override
public void regexpQuery(String indexName, String typeName, String fieldName, String regexp) throws IOException {
    SearchRequest searchRequest = new SearchRequest(indexName);
    searchRequest.types(typeName);
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(QueryBuilders.regexpQuery(fieldName,regexp));
    searchRequest.source(searchSourceBuilder);
    log.info("string:" + searchRequest.source());
    SearchResponse searchResponse =  restHighLevelClient.search(searchRequest,RequestOptions.DEFAULT);
    SearchHits hits = searchResponse.getHits();
    System.out.println("count:"+hits.totalHits);
    SearchHit[] h =  hits.getHits();
    for (SearchHit hit : h) {
        System.out.println("结果"+hit.getSourceAsMap());
    }
}

演示用例:com.javablog.elasticsearch.test.document.testIdsQuery

@Test
public void testRegexpQuery() throws IOException {
    String regex = "1069[0-9].+";
    baseQuery.regexpQuery(indexName,type,"longCode",regex);
}

1.7 scroll查询

ES对于from+size的个数是有限制的,二者之和不能超过1w。当所请求的数据总量大于1w时,可用scroll来代替from+size。

1.7.1 原理

ES的搜索是分2个阶段进行的,即Query阶段和Fetch阶段。 Query阶段比较轻量级,通过查询倒排索引,获取满足查询结果的文档ID列表。 而Fetch阶段比较重,需要将每个shard的结果取回,在协调结点进行全局排序。 通过From+size这种方式分批获取数据的时候,随着from加大,需要全局排序并丢弃的结果数量随之上升,性能越来越差。

而Scroll查询,先做轻量级的Query阶段以后,免去了繁重的全局排序过程。 它只是将查询结果集,也就是doc id列表保留在一个上下文里, 之后每次分批取回的时候,只需根据设置的size,在每个shard内部按照一定顺序(默认doc_id续), 取回这个size数量的文档即可。

1.7.2 使用场景

由此也可以看出scroll不适合支持那种实时的和用户交互的前端分页工作,其主要用途用于从ES集群分批拉取大量结果集的情况,一般都是offline的应用场景。 比如需要将非常大的结果集拉取出来,存放到其他系统处理,或者需要做大索引的reindex等等。
不要把 scroll 用于实时请求,它主要用于大数据量的场景。例如:将一个索引的内容索引到另一个不同配置的新索引中。

1.7.3 JAVA代码示例

例如滚动查询所有文档:

POST 127.0.0.1:9200/my_index/_search?scroll=1m
{
    "query": {
        "match_all" : {}
    },
    "sort": [
        "_doc"
        ]
    }
}

清除scroll

虽然我们在设置开启scroll时,设置了一个scroll的存活时间,但是如果能够在使用完顺手关闭,可以提早释放资源,降低ES的负担.

DELETE 127.0.0.1:9200/_search/scroll
{
    "scroll_id": "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAdsMqFmVkZTBJalJWUmp5UmI3V0FYc2lQbVEAAAAAAHbDKRZlZGUwSWpSVlJqeVJiN1dBWHNpUG1RAAAAAABpX2sWclBEekhiRVpSRktHWXFudnVaQ3dIQQAAAAAAaV9qFnJQRHpIYkVaUkZLR1lxbnZ1WkN3SEEAAAAAAGlfaRZyUER6SGJFWlJGS0dZcW52dVpDd0hB"
}

JAVA代码示例:com.javablog.elasticsearch.query.impl.BaseQueryImpl

@Override
public void scrollQuery(String indexName, String typeName) throws IOException {
    SearchRequest searchRequest = new SearchRequest(indexName);
    searchRequest.types(typeName);
    //初始化scroll
    //值不需要足够长来处理所有数据—它只需要足够长来处理前一批结果。每个滚动请求(带有滚动参数)设置一个新的过期时间。
    final Scroll scroll = new Scroll(TimeValue.timeValueMinutes(1L)); //设定滚动时间间隔
    searchRequest.scroll(scroll);
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(matchAllQuery());
    searchSourceBuilder.size(5); //设定每次返回多少条数据
    searchRequest.source(searchSourceBuilder);
    log.info("string:" + searchRequest.source());
    SearchResponse searchResponse = null;
    try {
        searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
    } catch (IOException e) {
        e.printStackTrace();
    }
    String scrollId = searchResponse.getScrollId();
    SearchHit[] searchHits = searchResponse.getHits().getHits();
    System.out.println("-----首页-----");
    for (SearchHit searchHit : searchHits) {
        System.out.println(searchHit.getSourceAsString());
    }
    //遍历搜索命中的数据,直到没有数据
    while (searchHits != null && searchHits.length > 0) {
        SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
        scrollRequest.scroll(scroll);
        log.info("string:" + scrollRequest.toString());
        try {
            searchResponse = restHighLevelClient.scroll(scrollRequest, RequestOptions.DEFAULT);
        } catch (IOException e) {
            e.printStackTrace();
        }
        scrollId = searchResponse.getScrollId();
        searchHits = searchResponse.getHits().getHits();
        if (searchHits != null && searchHits.length > 0) {
            System.out.println("-----下一页-----");
            for (SearchHit searchHit : searchHits) {
                System.out.println(searchHit.getSourceAsString());
            }
        }

    }
    
    //清除滚屏
    ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
    clearScrollRequest.addScrollId(scrollId);//也可以选择setScrollIds()将多个scrollId一起使用
    ClearScrollResponse clearScrollResponse = null;
    try {
        clearScrollResponse = restHighLevelClient.clearScroll(clearScrollRequest,RequestOptions.DEFAULT);
    } catch (IOException e) {
        e.printStackTrace();
    }
    boolean succeeded = clearScrollResponse.isSucceeded();
    System.out.println("succeeded:" + succeeded);
}

演示用例:com.javablog.elasticsearch.test.document.testIdsQuery

@Test
public void testScrollQuery() throws IOException {
    baseQuery.scrollQuery(indexName,type);
}

演示效果:
在这里插入图片描述完整代码:https://github.com/chutianmen/elasticsearch-examples

  • 0
    点赞
  • 3
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值