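7 Search Management
7.1 Preparing the Environment
7.1.1 Creating the Mapping
Create the xc_course index.
Then create the following mapping:
POST http://localhost:9200/xc_course/doc/_mapping
See "资料" -> 搜索测试-初始化数据.txt in the course materials.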
{
    "properties": {
        "description": {
            "type": "text",
            "analyzer": "ik_max_word",
            "search_analyzer": "ik_smart"
        },
        "name": {
            "type": "text",
            "analyzer": "ik_max_word",
            "search_analyzer": "ik_smart"
        },
        "pic": {
            "type": "text",
            "index": false
        },
        "price": {
            "type": "float"
        },
        "studymodel": {
            "type": "keyword"
        },
        "timestamp": {
            "type": "date",
            "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
        }
    }
}
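The timestamp format in the mapping lists three alternatives separated by ||. As a rough illustration of what "try each alternative in turn" means, here is a minimal plain-Java sketch. The class TimestampFormats and its parse method are hypothetical names used only for this example; this is not how Elasticsearch parses dates internally, and reading epoch_millis as UTC is an assumption of the sketch.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper: mimics the three alternatives of the mapping's date format.
class TimestampFormats {
    private static final DateTimeFormatter DATE_TIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
    private static final DateTimeFormatter DATE_ONLY =
            DateTimeFormatter.ofPattern("yyyy-MM-dd");

    // Tries the alternatives in the order they appear in the mapping.
    static LocalDateTime parse(String value) {
        try {
            return LocalDateTime.parse(value, DATE_TIME);
        } catch (DateTimeParseException ignored) {
            // fall through to the next alternative
        }
        try {
            return LocalDate.parse(value, DATE_ONLY).atStartOfDay();
        } catch (DateTimeParseException ignored) {
            // fall through to the next alternative
        }
        // epoch_millis: milliseconds since 1970-01-01T00:00:00Z (read as UTC here)
        return LocalDateTime.ofInstant(
                Instant.ofEpochMilli(Long.parseLong(value)), ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        System.out.println(parse("2018-04-25 19:11:35"));
        System.out.println(parse("2018-04-25"));
    }
}
```

All three forms are accepted only when the separators are plain ASCII hyphens, which is why the sample documents should use them as well.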
7.1.2 Inserting the Initial Data
Insert the following documents into xc_course/doc:
See "资料" -> 搜索测试-初始化数据.txt in the course materials.
http://localhost:9200/xc_course/doc/1
{
    "name": "Bootstrap开发",
    "description": "Bootstrap是由Twitter推出的一个前台页面开发框架,是一个非常流行的开发框架,此框架集成了多种页面效果。此开发框架包含了大量的CSS、JS程序代码,可以帮助开发者(尤其是不擅长页面开发的程序人员)轻松的实现一个不受浏览器限制的精美界面效果。",
    "studymodel": "201002",
    "price": 38.6,
    "timestamp": "2018-04-25 19:11:35",
    "pic": "group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg"
}
http://localhost:9200/xc_course/doc/2
{
    "name": "java编程基础",
    "description": "java语言是世界第一编程语言,在软件开发领域使用人数最多。",
    "studymodel": "201001",
    "price": 68.6,
    "timestamp": "2018-03-25 19:11:35",
    "pic": "group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg"
}
http://localhost:9200/xc_course/doc/3
{
    "name": "spring开发基础",
    "description": "spring 在java领域非常流行,java程序员都在用。",
    "studymodel": "201001",
    "price": 88.6,
    "timestamp": "2018-02-24 19:11:35",
    "pic": "group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg"
}
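7.1.3 Simple Search
A simple search queries ES through the URL, using a GET request.
Format: GET …/_search?q=…
q: the search string.
Example:
?q=name:spring searches for documents whose name contains spring.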
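When the q parameter is assembled in code, it has to be URL-encoded. The following is a minimal sketch under assumptions: UriSearch and searchUrl are hypothetical names, and the host, index and type are the ones used throughout this chapter.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that builds the URL of a simple (URI) search.
class UriSearch {
    static String searchUrl(String baseUrl, String index, String type, String q) {
        // URLEncoder percent-encodes reserved characters such as ':' in the query string.
        return baseUrl + "/" + index + "/" + type + "/_search?q="
                + URLEncoder.encode(q, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(searchUrl("http://localhost:9200", "xc_course", "doc", "name:spring"));
    }
}
```

The resulting URL can be requested with any HTTP client using GET.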
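7.3 DSL Search
DSL (Domain Specific Language) is the JSON-based search interface provided by ES: the search request carries a JSON body in a specific format, and different bodies express different search requirements.
DSL is more powerful than URI search, so it is the recommended way to search in a project.
7.3.1 Querying All Documents
Query the documents of all indices:
POST http://localhost:9200/_search
Query the documents of a given type in a given index (this is the form normally used):
POST http://localhost:9200/xc_course/doc/_search
{ "query": { "match_all": {} }, "_source": ["name", "studymodel"] }
_source: source filtering; it specifies which fields are included in the results.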
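Explanation of the result:
took: time spent on this operation, in milliseconds.
timed_out: whether the request timed out.
_shards: the shards that were searched in this operation.
hits: the records matched by the search.
hits.total: total number of documents that satisfy the query.
hits.hits: the top N documents with the highest match scores.
hits.max_score: the highest match score among the matched documents.
_score: the match score of each document; results are ordered by it, descending.
_source: the original content of the document.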
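The match_all request can also be built without the Elasticsearch client libraries. The sketch below uses the HTTP client shipped with JDK 11+ and assumes the local node address used in this chapter; MatchAllRequest and build are hypothetical names for this example only.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper: builds (but does not send) the match_all search request.
class MatchAllRequest {
    static final String BODY =
            "{ \"query\": { \"match_all\": {} }, \"_source\": [\"name\", \"studymodel\"] }";

    static HttpRequest build(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/xc_course/doc/_search"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(BODY))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("http://localhost:9200");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To execute it against a running node, pass the request to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()); the response body is the JSON whose fields are described above.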
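Java client:
import java.io.IOException;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Map;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;

@SpringBootTest
@RunWith(SpringRunner.class)
public class TestSearch {
    @Autowired
    RestHighLevelClient client;
    @Autowired
    RestClient restClient;

    // Search all records of the type
    @Test
    public void testSearchAll() throws IOException, ParseException {
        // Search request object
        SearchRequest searchRequest = new SearchRequest("xc_course");
        // Set the type
        searchRequest.types("doc");
        // Search source builder
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        // Search method: matchAllQuery matches every document
        searchSourceBuilder.query(QueryBuilders.matchAllQuery());
        // Source filtering: the first argument lists the fields to include, the second the fields to exclude
        searchSourceBuilder.fetchSource(new String[]{"name", "studymodel", "price", "timestamp"}, new String[]{});
        // Attach the search source to the request
        searchRequest.source(searchSourceBuilder);
        // Execute the search (sends an HTTP request to ES)
        SearchResponse searchResponse = client.search(searchRequest);
        // Search result
        SearchHits hits = searchResponse.getHits();
        // Total number of matching records
        long totalHits = hits.getTotalHits();
        // The documents with the highest match scores
        SearchHit[] searchHits = hits.getHits();
        // Date formatter
        SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        for (SearchHit hit : searchHits) {
            // Document id
            String id = hit.getId();
            // Source document content
            Map<String, Object> sourceAsMap = hit.getSourceAsMap();
            String name = (String) sourceAsMap.get("name");
            // description is excluded by the source filter above, so this is null
            String description = (String) sourceAsMap.get("description");
            // Study mode
            String studymodel = (String) sourceAsMap.get("studymodel");
            // Price
            Double price = (Double) sourceAsMap.get("price");
            // Date
            Date timestamp = dateFormat.parse((String) sourceAsMap.get("timestamp"));
            System.out.println(name);
            System.out.println(studymodel);
            System.out.println(description);
        }
    }
}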
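7.3.2 Paged Queries
ES supports paged queries through two parameters, from and size.
from: the index of the first document to return, starting at 0.
size: the number of documents to return.
POST http://localhost:9200/xc_course/doc/_search
{"from" : 0, "size" : 1, "query": { "match_all": {} }, "_source" : ["name","studymodel"] }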
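User interfaces usually work with a 1-based page number rather than a document offset. A minimal sketch of the conversion follows, assuming a 1-based page parameter; Paging and from are hypothetical names, not part of any ES API.

```java
// Hypothetical helper: converts a 1-based page number into the "from" offset.
class Paging {
    static int from(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be >= 1");
        }
        // Page 1 starts at offset 0, page 2 at offset size, and so on.
        return (page - 1) * size;
    }

    public static void main(String[] args) {
        System.out.println(from(3, 10));
    }
}
```

The returned value is what would be passed as from, together with size, in the request.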
Java client:
SearchRequest searchRequest = new SearchRequest("xc_course");
// The type is "doc" (the index is "xc_course")
searchRequest.types("doc");
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.matchAllQuery());
// Paging: index of the first document, starting at 0
searchSourceBuilder.from(0);
// Number of documents per page
searchSourceBuilder.size(10);
// Source filtering
searchSourceBuilder.fetchSource(new String[]{"name", "studymodel"}, new String[]{});
searchRequest.source(searchSourceBuilder);
SearchResponse searchResponse = client.search(searchRequest);
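7.3.3 Term Query
A term query is an exact query: the search keyword is matched as a whole and is not split into tokens first.
POST http://localhost:9200/xc_course/doc/_search
{ "query": { "term" : { "name": "spring" } },"_source" : ["name","studymodel"] }
The search above returns the documents whose name contains the term "spring".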
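To make "matched as a whole" concrete, here is a toy inverted index in plain Java. It is only a sketch: TermLookup is a hypothetical class, and the tokens are English stand-ins for what an analyzer might produce for the three course names, not actual ik_max_word output.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: token -> ids of the documents containing that token.
class TermLookup {
    static Map<String, Set<Integer>> buildIndex(Map<Integer, List<String>> tokensByDoc) {
        Map<String, Set<Integer>> index = new HashMap<>();
        tokensByDoc.forEach((docId, tokens) ->
                tokens.forEach(t -> index.computeIfAbsent(t, k -> new TreeSet<>()).add(docId)));
        return index;
    }

    // A term lookup uses the search value as-is; it is never split into tokens.
    static Set<Integer> term(Map<String, Set<Integer>> index, String value) {
        return index.getOrDefault(value, Collections.emptySet());
    }

    // Assumed tokens for the name field of documents 1 to 3.
    static Map<String, Set<Integer>> sampleNameIndex() {
        Map<Integer, List<String>> tokens = new HashMap<>();
        tokens.put(1, Arrays.asList("bootstrap", "development"));
        tokens.put(2, Arrays.asList("java", "programming", "basics"));
        tokens.put(3, Arrays.asList("spring", "development", "basics"));
        return buildIndex(tokens);
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> index = sampleNameIndex();
        System.out.println(term(index, "spring"));
        System.out.println(term(index, "spring development"));
    }
}
```

Because the value "spring development" is looked up as a single term, it finds nothing even though both words are indexed separately.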
Java client:
SearchRequest searchRequest = new SearchRequest("xc_course");
// The type is "doc" (the index is "xc_course")
searchRequest.types("doc");
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.termQuery("name", "spring"));
// Source filtering
searchSourceBuilder.fetchSource(new String[]{"name", "studymodel"}, new String[]{});
searchRequest.source(searchSourceBuilder);
SearchResponse searchResponse = client.search(searchRequest);
7.3.4 Exact Match by id
ES can match documents against a list of id values:
Test:
POST http://127.0.0.1:9200/xc_course/doc/_search
{ "query": { "ids" : { "type" : "doc", "values" : ["3", "4", "100"] } } }
Java client:
String[] split = new String[]{"1", "2"};
List<String> idList = Arrays.asList(split);
// A terms query on the _id field matches any of the listed ids
searchSourceBuilder.query(QueryBuilders.termsQuery("_id", idList));
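7.3.5 match Query
1. Basic usage
A match query is a full-text search: the search string is first split into tokens, and each token is then looked up in the index.
The difference from a term query is that a match query analyzes the search keyword before searching, whereas a term query does not.
POST http://localhost:9200/xc_course/doc/_search
{ "query": { "match" : { "description" : { "query" : "spring开发", "operator" : "or" } } } }
query: the search keywords. Multiple English words must be separated, for example by a space or a half-width comma; for Chinese keywords the separator is optional.
operator: or means a document matches as soon as one of the tokens appears in it; and means every token must appear in the document.
The search above is executed as follows:
1. "spring开发" is split into two tokens, spring and 开发.
2. Both tokens are looked up in the index.
3. Because operator is or, a document is returned as soon as one of the tokens matches.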
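The three steps above can be sketched in plain Java once the tokens are known. This is a toy model, not the ES implementation: MatchOperator is a hypothetical class, tokenization is assumed to have happened already, and the English tokens are stand-ins for analyzer output.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of the match query's operator parameter.
class MatchOperator {
    enum Operator { OR, AND }

    // docTokens: tokens indexed for the field; queryTokens: tokens of the analyzed search string.
    static boolean matches(Set<String> docTokens, List<String> queryTokens, Operator op) {
        if (op == Operator.AND) {
            // and: every query token must be present
            return docTokens.containsAll(queryTokens);
        }
        // or: one matching token is enough
        for (String token : queryTokens) {
            if (docTokens.contains(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> doc = new HashSet<>(Arrays.asList("spring", "java", "popular"));
        List<String> query = Arrays.asList("spring", "development");
        System.out.println(matches(doc, query, Operator.OR));
        System.out.println(matches(doc, query, Operator.AND));
    }
}
```

With or, the document matches because it contains spring; with and, it does not, because development is missing.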
Java client:
// MatchQuery (also requires import org.elasticsearch.index.query.Operator)
@Test
public void testMatchQuery() throws IOException, ParseException {
    // Search request object
    SearchRequest searchRequest = new SearchRequest("xc_course");
    // Set the type
    searchRequest.types("doc");
    // Search source builder
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    // Search method: MatchQuery
    searchSourceBuilder.query(QueryBuilders.matchQuery("description", "spring开发框架").operator(Operator.OR));
    // Source filtering: the first argument lists the fields to include, the second the fields to exclude
    searchSourceBuilder.fetchSource(new String[]{"name", "studymodel", "price", "timestamp"}, new String[]{});
    // Attach the search source to the request
    searchRequest.source(searchSourceBuilder);
    // Execute the search (sends an HTTP request to ES)
    SearchResponse searchResponse = client.search(searchRequest);
    // Search result
    SearchHits hits = searchResponse.getHits();
    // Total number of matching records
    long totalHits = hits.getTotalHits();
    // The documents with the highest match scores
    SearchHit[] searchHits = hits.getHits();
    // Date formatter
    SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    for (SearchHit hit : searchHits) {
        // Document id
        String id = hit.getId();
        // Source document content
        Map<String, Object> sourceAsMap = hit.getSourceAsMap();
        String name = (String) sourceAsMap.get("name");
        // description is excluded by the source filter above, so this is null
        String description = (String) sourceAsMap.get("description");
        // Study mode
        String studymodel = (String) sourceAsMap.get("studymodel");
        // Price
        Double price = (Double) sourceAsMap.get("price");
        // Date
        Date timestamp = dateFormat.parse((String) sourceAsMap.get("timestamp"));
        System.out.println(name);
        System.out.println(studymodel);
        System.out.println(description);
    }
}
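2. minimum_should_match
With operator = or as used above, a document matches as soon as a single token matches. How can we require that at least two out of three tokens match?
minimum_should_match specifies the share of the query tokens that a document must match.
For example:
{ "query": { "match" : { "description" : { "query" : "spring开发框架", "minimum_should_match": "80%" } } } }
"spring开发框架" is split into three tokens: spring, 开发 and 框架.
"minimum_should_match": "80%" requires 80% of the three tokens to match: 3 * 0.8 = 2.4, which is rounded down to 2, so at least two of the tokens must match in the document.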
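The percentage arithmetic can be checked with a few lines of Java. This sketch covers only non-negative percentages, for which the computed number is rounded down; MinimumShouldMatch and requiredTerms are hypothetical names.

```java
// Hypothetical helper: number of query tokens that must match for a given percentage.
class MinimumShouldMatch {
    static int requiredTerms(int termCount, int percent) {
        if (termCount < 0 || percent < 0 || percent > 100) {
            throw new IllegalArgumentException("termCount >= 0 and 0 <= percent <= 100 expected");
        }
        // Integer division rounds the result down, e.g. 3 * 80 / 100 = 2.
        return termCount * percent / 100;
    }

    public static void main(String[] args) {
        System.out.println(requiredTerms(3, 80));
    }
}
```

For three tokens, 80% therefore means two required tokens, and 50% means one.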
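The corresponding Java client code is:
// Match the keywords
searchSourceBuilder.query(QueryBuilders.matchQuery("description", "spring开发框架")
        .minimumShouldMatch("80%"));
7.3.6 multiQuery
The termQuery and matchQuery covered above match a single field at a time. This section covers multiQuery (multi_match), which matches several fields at once.
1. Basic usage
A single-field match looks for the keywords in one field; a multi-field match looks for them in several fields.
Example:
POST http://localhost:9200/xc_course/doc/_search
Match the keywords "spring css" against the name and description fields.
{ "query": { "multi_match" : { "query" : "spring css", "minimum_should_match": "50%", "fields": [ "name", "description" ] }} }
2. Raising the boost
When matching several fields, the boost (weight) of a field can be raised so that matches in that field score higher.
Example:
Before raising the boost, run the following query:
{ "query": { "multi_match" : { "query" : "spring框架", "minimum_should_match": "50%", "fields": [ "name", "description" ] }} }
The result shows that the Bootstrap document is ranked first.
A keyword match in name should usually weigh more than a match in description, so the boost of name is raised here:
{ "query": { "multi_match" : { "query" : "spring框架", "minimum_should_match": "50%", "fields": [ "name^10", "description" ] }} }
"name^10" multiplies the weight of name by 10. After running this query, the documents whose name contains the keyword spring are ranked first.
Java client:
MultiMatchQueryBuilder multiMatchQueryBuilder = QueryBuilders.multiMatchQuery("spring框架", "name", "description")
        .minimumShouldMatch("50%");
multiMatchQueryBuilder.field("name", 10); // raise the boost of name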
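7.3.7 Bool Query
The bool query corresponds to Lucene's BooleanQuery and combines several queries into one.
Three parameters:
must: the document must match the queries listed under must; equivalent to "AND".
should: the document should match one or more of the queries listed under should; equivalent to "OR".
must_not: the document must not match the queries listed under must_not; equivalent to "NOT".
Try the query below with must, should and must_not in turn:
POST http://localhost:9200/xc_course/doc/_search
{ "_source" : [ "name", "studymodel", "description"], "from" : 0, "size" : 1, "query": { "bool" : { "must":[{ "multi_match" : { "query" : "spring框架", "minimum_should_match": "50%", "fields": [ "name^10", "description" ] }},{ "term":{"studymodel" : "201001" } } ] } } }
must: all of the query conditions must be satisfied (this is the one normally used).
should: it is enough for one of the query conditions to be satisfied.
must_not: negation.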
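The way the three clause types combine can be sketched with plain Java predicates. This is a toy model that ignores scoring; BoolMatcher is a hypothetical class. It assumes the default behaviour that should clauses become mandatory (at least one must match) only when no must clause is present.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy model of how must, should and must_not decide whether a document matches.
class BoolMatcher<T> {
    private final List<Predicate<T>> must = new ArrayList<>();
    private final List<Predicate<T>> should = new ArrayList<>();
    private final List<Predicate<T>> mustNot = new ArrayList<>();

    BoolMatcher<T> must(Predicate<T> p) { must.add(p); return this; }
    BoolMatcher<T> should(Predicate<T> p) { should.add(p); return this; }
    BoolMatcher<T> mustNot(Predicate<T> p) { mustNot.add(p); return this; }

    boolean matches(T doc) {
        for (Predicate<T> p : must) {
            if (!p.test(doc)) return false;   // AND: every must clause has to match
        }
        for (Predicate<T> p : mustNot) {
            if (p.test(doc)) return false;    // NOT: no must_not clause may match
        }
        if (must.isEmpty() && !should.isEmpty()) {
            for (Predicate<T> p : should) {
                if (p.test(doc)) return true; // OR: one should clause is enough
            }
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        BoolMatcher<String> matcher = new BoolMatcher<String>()
                .must(s -> s.startsWith("2010"))
                .mustNot(s -> s.equals("201002"));
        System.out.println(matcher.matches("201001"));
    }
}
```

In the example the documents are reduced to their studymodel value, so each predicate plays the role of one term query.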
Java client:
// BoolQuery: combines a multi_match query with a term query
@Test
public void testBoolQuery() throws IOException {
    // Search request object
    SearchRequest searchRequest = new SearchRequest("xc_course");
    searchRequest.types("doc");
    // Search source builder
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.fetchSource(new String[]{"name", "pic", "studymodel"}, new String[]{});
    // multiQuery
    String keyword = "spring框架";
    MultiMatchQueryBuilder multiMatchQueryBuilder = QueryBuilders.multiMatchQuery(keyword,
            "name", "description")
            .minimumShouldMatch("50%");
    multiMatchQueryBuilder.field("name", 10);
    // TermQuery
    TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("studymodel", "201001");
    // Bool query
    BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
    boolQueryBuilder.must(multiMatchQueryBuilder);
    boolQueryBuilder.must(termQueryBuilder);
    // Use the bool query as the query of the search source
    searchSourceBuilder.query(boolQueryBuilder);
    // Attach the search source to the request
    searchRequest.source(searchSourceBuilder);
    SearchResponse searchResponse = client.search(searchRequest);
    SearchHits hits = searchResponse.getHits();
    SearchHit[] searchHits = hits.getHits();
    for (SearchHit hit : searchHits) {
        Map<String, Object> sourceAsMap = hit.getSourceAsMap();
        System.out.println(sourceAsMap);
    }
}
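7.3.8 Filters
A filter narrows down the search results. It only decides whether a document matches and does not compute a match score, so filters perform better than queries and are easy to cache. Prefer filters where possible, either on their own or combined with queries.
Filters are used inside a bool query. The request below filters on top of the search results:
{ "_source" : [ "name", "studymodel", "description","price"], "query": { "bool" : { "must":[{ "multi_match" : { "query" : "spring框架", "minimum_should_match": "50%", "fields": [ "name^10", "description" ] }} ],"filter": [ { "term": { "studymodel": "201001" }}, { "range": { "price": { "gte": 60 ,"lte" : 100}}} ] } } }
range: range filter; keeps the records with a price greater than or equal to 60 and less than or equal to 100.
term: term filter; keeps the records whose studymodel equals "201001".
Note: each range or term clause can filter on only one field.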
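The effect of the two filter clauses on the three sample documents can be reproduced in memory. The sketch below applies only the term and range filters (not the multi_match query); FilterSketch and Course are hypothetical names, and the values are copied from the data inserted in 7.1.2.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// In-memory imitation of the term + range filters on the sample data.
class FilterSketch {
    static final class Course {
        final int id;
        final String studymodel;
        final double price;

        Course(int id, String studymodel, double price) {
            this.id = id;
            this.studymodel = studymodel;
            this.price = price;
        }
    }

    static List<Course> sample() {
        return Arrays.asList(
                new Course(1, "201002", 38.6),
                new Course(2, "201001", 68.6),
                new Course(3, "201001", 88.6));
    }

    // Keeps the ids of courses with the given studymodel and gte <= price <= lte.
    static List<Integer> filter(List<Course> courses, String studymodel, double gte, double lte) {
        return courses.stream()
                .filter(c -> c.studymodel.equals(studymodel))   // term filter
                .filter(c -> c.price >= gte && c.price <= lte)  // range filter
                .map(c -> c.id)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(filter(sample(), "201001", 60, 100));
    }
}
```

Each filter is a yes/no decision per document; no score is involved, which is what makes filter results cheap to cache.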
Java client:
// Bool query with filters
@Test
public void testFilter() throws IOException {
    SearchRequest searchRequest = new SearchRequest("xc_course");
    searchRequest.types("doc");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    // Source filtering
    searchSourceBuilder.fetchSource(new String[]{"name", "studymodel", "price", "description"},
            new String[]{});
    // Match the keywords
    MultiMatchQueryBuilder multiMatchQueryBuilder = QueryBuilders.multiMatchQuery("spring框架", "name", "description");
    // Share of tokens that must match
    multiMatchQueryBuilder.minimumShouldMatch("50%");
    // Raise the boost of the name field
    multiMatchQueryBuilder.field("name", 10);
    // Bool query
    BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
    boolQueryBuilder.must(multiMatchQueryBuilder);
    // Filters
    boolQueryBuilder.filter(QueryBuilders.termQuery("studymodel", "201001"));
    boolQueryBuilder.filter(QueryBuilders.rangeQuery("price").gte(60).lte(100));
    // Use the bool query (not the bare multi_match query) so that the filters take effect
    searchSourceBuilder.query(boolQueryBuilder);
    searchRequest.source(searchSourceBuilder);
    SearchResponse searchResponse = client.search(searchRequest);
    SearchHits hits = searchResponse.getHits();
    SearchHit[] searchHits = hits.getHits();
    for (SearchHit hit : searchHits) {
        String index = hit.getIndex();
        String type = hit.getType();
        String id = hit.getId();
        float score = hit.getScore();
        String sourceAsString = hit.getSourceAsString();
        Map<String, Object> sourceAsMap = hit.getSourceAsMap();
        String name = (String) sourceAsMap.get("name");
        String studymodel = (String) sourceAsMap.get("studymodel");
        String description = (String) sourceAsMap.get("description");
        System.out.println(name);
        System.out.println(studymodel);
        System.out.println(description);
    }
}
7.3.9 Sorting
One or more sorts can be added on fields. Sorting is supported on keyword, date, float and similar types; it is not allowed on fields of type text.
POST http://localhost:9200/xc_course/doc/_search
Filter the documents priced between 0 and 100 yuan and sort the result, first by studymodel descending, then by price ascending:
{ "_source" : [ "name", "studymodel", "description","price"], "query": { "bool" : { "filter": [ { "range": { "price": { "gte": 0 ,"lte" : 100}}} ] } }, "sort" : [ {"studymodel" : "desc" }, { "price" : "asc" } ] }
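The two-level sort order can be imitated in memory with a chained Comparator. The following is a sketch on the three sample documents; SortSketch and Course are hypothetical names, and the values come from the data inserted in 7.1.2.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// In-memory imitation of sorting by studymodel descending, then price ascending.
class SortSketch {
    static final class Course {
        final int id;
        final String studymodel;
        final double price;

        Course(int id, String studymodel, double price) {
            this.id = id;
            this.studymodel = studymodel;
            this.price = price;
        }
    }

    static List<Integer> sortedIds(List<Course> courses) {
        List<Course> copy = new ArrayList<>(courses);
        // The first key decides; the second key only breaks ties on the first.
        copy.sort(Comparator.comparing((Course c) -> c.studymodel).reversed()
                .thenComparingDouble(c -> c.price));
        return copy.stream().map(c -> c.id).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Course> courses = Arrays.asList(
                new Course(3, "201001", 88.6),
                new Course(1, "201002", 38.6),
                new Course(2, "201001", 68.6));
        System.out.println(sortedIds(courses));
    }
}
```

Document 1 comes first because its studymodel 201002 is the largest; documents 2 and 3 share 201001 and are ordered by price.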
Java client:
@Test
public void testSort() throws IOException {
    SearchRequest searchRequest = new SearchRequest("xc_course");
    searchRequest.types("doc");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    // Source filtering
    searchSourceBuilder.fetchSource(new String[]{"name", "studymodel", "price", "description"},
            new String[]{});
    // Bool query
    BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
    // Filter
    boolQueryBuilder.filter(QueryBuilders.rangeQuery("price").gte(0).lte(100));
    // Use the bool query as the query of the search source so that the filter takes effect
    searchSourceBuilder.query(boolQueryBuilder);
    // Sorting
    searchSourceBuilder.sort(new FieldSortBuilder("studymodel").order(SortOrder.DESC));
    searchSourceBuilder.sort(new FieldSortBuilder("price").order(SortOrder.ASC));
    searchRequest.source(searchSourceBuilder);
    SearchResponse searchResponse = client.search(searchRequest);
    SearchHits hits = searchResponse.getHits();
    SearchHit[] searchHits = hits.getHits();
    for (SearchHit hit : searchHits) {
        String index = hit.getIndex();
        String type = hit.getType();
        String id = hit.getId();
        float score = hit.getScore();
        String sourceAsString = hit.getSourceAsString();
        Map<String, Object> sourceAsMap = hit.getSourceAsMap();
        String name = (String) sourceAsMap.get("name");
        String studymodel = (String) sourceAsMap.get("studymodel");
        String description = (String) sourceAsMap.get("description");
        System.out.println(name);
        System.out.println(studymodel);
        System.out.println(description);
    }
}
7.3.10 Highlighting
Highlighting marks one or more words in the search results so that users can see where the keywords matched.
It is enabled by adding highlight to the search request, as follows:
POST http://127.0.0.1:9200/xc_course/doc/_search
{ "_source" : [ "name", "studymodel", "description","price"], "query": { "bool" : { "must":[{ "multi_match" : { "query" : "开发框架", "minimum_should_match": "50%", "fields": [ "name^10", "description" ], "type":"best_fields" }} ],"filter": [ { "range": { "price": { "gte": 0 ,"lte" : 100}}} ] } }, "sort" : [{ "price" : "asc" } ],"highlight": { "pre_tags": ["<tag>"], "post_tags": ["</tag>"], "fields": { "name": {}, "description":{} } } }
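What pre_tags and post_tags do to a matched keyword can be imitated with a plain string replacement. This is only an illustration: HighlightSketch is a hypothetical class, the English text is a stand-in for a course name, and the real ES highlighter works on analyzed tokens rather than on substrings.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Imitation of highlighting: wraps every occurrence of a term in the given tags.
class HighlightSketch {
    static String highlight(String text, String term, String preTag, String postTag) {
        // Pattern.quote treats the term literally; matching ignores case.
        Pattern pattern = Pattern.compile(Pattern.quote(term), Pattern.CASE_INSENSITIVE);
        return pattern.matcher(text).replaceAll(
                m -> Matcher.quoteReplacement(preTag + m.group() + postTag));
    }

    public static void main(String[] args) {
        System.out.println(highlight("spring development basics", "development", "<tag>", "</tag>"));
    }
}
```

In a real response the tagged fragments are returned per field in the highlight section of each hit, next to the untouched _source.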
The Java client code is as follows:
@Test
public void testHighlight() throws IOException {
    SearchRequest searchRequest = new SearchRequest("xc_course");
    searchRequest.types("doc");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    // Source filtering
    searchSourceBuilder.fetchSource(new String[]{"name", "studymodel", "price", "description"},
            new String[]{});
    // Match the keywords
    MultiMatchQueryBuilder multiMatchQueryBuilder = QueryBuilders.multiMatchQuery("开发",
            "name", "description");
    // Bool query
    BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
    boolQueryBuilder.must(multiMatchQueryBuilder);
    // Filter
    boolQueryBuilder.filter(QueryBuilders.rangeQuery("price").gte(0).lte(100));
    // Use the bool query as the query of the search source so that the filter takes effect
    searchSourceBuilder.query(boolQueryBuilder);
    // Sorting
    searchSourceBuilder.sort(new FieldSortBuilder("studymodel").order(SortOrder.DESC));
    searchSourceBuilder.sort(new FieldSortBuilder("price").order(SortOrder.ASC));
    // Highlight settings
    HighlightBuilder highlightBuilder = new HighlightBuilder();
    highlightBuilder.preTags("<tag>");   // opening tag
    highlightBuilder.postTags("</tag>"); // closing tag
    // Fields to highlight
    highlightBuilder.fields().add(new HighlightBuilder.Field("name"));
    // highlightBuilder.fields().add(new HighlightBuilder.Field("description"));
    searchSourceBuilder.highlighter(highlightBuilder);
    searchRequest.source(searchSourceBuilder);
    SearchResponse searchResponse = client.search(searchRequest);
    SearchHits hits = searchResponse.getHits();
    SearchHit[] searchHits = hits.getHits();
    for (SearchHit hit : searchHits) {
        Map<String, Object> sourceAsMap = hit.getSourceAsMap();
        // Name
        String name = (String) sourceAsMap.get("name");
        // Read the highlighted content; it replaces the plain name when present
        Map<String, HighlightField> highlightFields = hit.getHighlightFields();
        if (highlightFields != null) {
            HighlightField nameField = highlightFields.get("name");
            if (nameField != null) {
                Text[] fragments = nameField.getFragments();
                StringBuilder builder = new StringBuilder();
                for (Text fragment : fragments) {
                    builder.append(fragment.string());
                }
                name = builder.toString();
            }
        }
        String index = hit.getIndex();
        String type = hit.getType();
        String id = hit.getId();
        float score = hit.getScore();
        String sourceAsString = hit.getSourceAsString();
        String studymodel = (String) sourceAsMap.get("studymodel");
        String description = (String) sourceAsMap.get("description");
        System.out.println(name);
        System.out.println(studymodel);
        System.out.println(description);
    }
}