Elasticsearch Primer: Full-Text Search (Part 7)

ES is an open-source search engine built on Lucene, and its query keywords are largely the same as Lucene's:

        Pagination: from/size; fields: fields; sorting: sort; querying: query

        Filtering: filter; highlighting: highlight; statistics: facet (superseded by aggregations in later versions)

ES has four search types (the notes below are based on Elasticsearch 2.3):

query and fetch (fastest) (returns N times the requested data, one batch per shard)     restricted; usable before 5.3

query then fetch (the default search type)

DFS query and fetch    removed in current versions

DFS query then fetch (allows more precise control over scoring and ranking)

DFS: the D likely stands for Distributed and the F for Frequency; the S may stand for Scatter, so the whole term roughly means the distributed scattering of term and document frequencies.

Pre-query scatter phase: as the official ES site explains, before the real query runs, DFS first collects the term and document frequencies from every shard; during the actual term search, each shard then scores and ranks using these global frequencies. Clearly DFS_QUERY_THEN_FETCH is the least efficient search type, since a single search may issue three round trips to the shards, but the DFS approach should give the highest scoring precision.

In short, in terms of performance:

        QUERY_AND_FETCH is the fastest and DFS_QUERY_THEN_FETCH is the slowest.

In terms of search accuracy:

        The DFS variants are more accurate than the non-DFS ones.
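As a quick illustration, the search type can be chosen per request with the search_type URL parameter (the index name bank here is hypothetical):

```
GET /bank/_search?search_type=dfs_query_then_fetch
{
  "query": { "match": { "address": "Avenue" } }
}
```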

 

Elasticsearch queries:

        

        For each query clause, we can combine QueryBuilders with the must, should, and mustNot methods to form multi-condition queries. (must => AND, should => OR, mustNot => NOT)
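For reference, the same must/should/mustNot combination corresponds to a bool query in the REST Query DSL; the field values below are illustrative, borrowed from the examples later in this article:

```json
{
  "query": {
    "bool": {
      "must":     [ { "match": { "address": "Avenue" } } ],
      "should":   [ { "match": { "state": "NM" } } ],
      "must_not": [ { "term":  { "age": 40 } } ]
    }
  }
}
```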

        Lucene supports the term-based TermQuery, along with RangeQuery, PrefixQuery, BooleanQuery, PhraseQuery, WildcardQuery, and FuzzyQuery.

   

  • TermQuery and QueryParser

        When the query expression is a single word, it is treated as a single term: if the expression consists of one word, QueryParser's parse() method returns a TermQuery object.

        For example, for the expression content:hello, QueryParser returns a TermQuery whose field is content and whose value is hello.

Query query = new TermQuery(new Term("content", "hello"));

  • RangeQuery and QueryParser

 

        QueryParser builds a RangeQuery from a [start TO end] (inclusive) or {start TO end} (exclusive) expression.

        For example, for the expression time:[20181010 TO 20181210], QueryParser returns a RangeQuery on the time field with lower bound 20181010 and upper bound 20181210.

        Term t1 = new Term("time", "20181010");

        Term t2 = new Term("time", "20181210");

        Query query = new RangeQuery(t1, t2, true);

  • PrefixQuery and QueryParser
    • When a term in the query expression ends with an asterisk (*), QueryParser creates a PrefixQuery object.
    • For example, for the expression content:luc*, QueryParser returns a PrefixQuery whose field is content and whose prefix is luc.

      Query query = new PrefixQuery(new Term("content", "luc"));

  • BooleanQuery and QueryParser
    • When the query expression contains multiple clauses, QueryParser conveniently builds a BooleanQuery. QueryParser uses parentheses for grouping and -, +, AND, OR, and NOT to control the generated BooleanQuery, e.g. content:(+hello -world).
  • PhraseQuery and QueryParser

        Terms wrapped in double quotes in a QueryParser expression are converted into a PhraseQuery. The slop factor defaults to 0 and can be set in the expression with ~n.

        For example, for the expression content:"hello world"~3, QueryParser returns a phrase query on the content field for "hello world" with a slop of 3.

        PhraseQuery query = new PhraseQuery();

        query.setSlop(3);

        query.add(new Term("content", "hello"));

        query.add(new Term("content", "world"));

  • WildcardQuery and QueryParser

Lucene uses two standard wildcards: * matches zero or more characters and ? matches exactly one character. When the query expression contains * or ?, QueryParser returns a WildcardQuery object. Note, however, that when * appears only at the end of the expression it is optimized into a PrefixQuery, and that the first character of the expression must not be a wildcard: a search expression with a leading * would force Lucene to enumerate every term in the index at enormous cost, so it is disallowed.

  • FuzzyQuery and QueryParser

        QueryParser supports FuzzyQuery fuzzy matching by appending "~" to a term, e.g. content:hadop~.

Code examples:

  • Simple query, printing everything about each hit:

  •     @Test
        public void testQuery1(){
            SearchResponse response = client
                    .prepareSearch(indices) // specify the indices to search
                    /**
                     * Set the search type:
                     *  QUERY_AND_FETCH:  available before 5.3, restricted afterwards
                     *  QUERY_THEN_FETCH:  the default
                     *  DFS_QUERY_AND_FETCH:  removed in newer versions
                     *  DFS_QUERY_THEN_FETCH:
                     */
                    .setSearchType(SearchType.DEFAULT)
                    /**
                     * Set what to search for.
                     * Whether a given search type can surface the data you want is
                     * what gave rise to the SEO (search engine optimization) profession.
                     */
    //                .setQuery(QueryBuilders.matchPhrasePrefixQuery("firstname", "V*")) // search the firstname field for values starting with V
    //                .setQuery(QueryBuilders.matchQuery("state", "NM"))
                    .setQuery(QueryBuilders.termQuery("age", 40))
                    // pagination: to show page N with M hits per page, use setFrom((N - 1) * M).setSize(M)
                    .setFrom(1)// offset of the first hit to return
                    .setSize(5)// number of hits per page
                    .get();
            // the results are wrapped in a SearchHits object
            SearchHits searchHits = response.getHits();
            long totalHits = searchHits.totalHits;
            System.out.println("Found " + totalHits + " results");
     
            /**
             * "hits": [
             * {
             * "_index": "product",
             * "_type": "bigdata",
             * "_id": "5",
             * "_score": 1,
             * "_source": {
             * "name": "redis",
             * "author": "redis",
             * "version": "5.0.0"
             * }
             * }
             */
            SearchHit[] hits = searchHits.getHits();
            for (SearchHit hit : hits){
                System.out.println("--------------------------------------------");
                String index = hit.getIndex();
                String type = hit.getType();
                String id = hit.getId();
                float score = hit.getScore();
                System.out.println("index: " + index);
                System.out.println("type: " + type);
                System.out.println("id: " + id);
                System.out.println("score: " + score);
                Map<String, Object> source = hit.getSourceAsMap();
                source.forEach((field, value) ->{
                    System.out.println(field + "--->" + value);
                });
     
            }
        }

  • Highlighting matched fields:

  •  
     @Test
        public void testHightLight(){
            SearchResponse response = client
                    .prepareSearch(indices) // specify the indices to search
                    .setSearchType(SearchType.DEFAULT)
                    .setQuery(QueryBuilders.matchQuery("address", "Avenue"))
                    .highlighter(// configure highlighting
                            SearchSourceBuilder.highlight()
                                    .field("address")
                                    .preTags("<font color='red' size='16px'>")
                                    .postTags("</font>")
                    )
                    .setFrom(0)// offset of the first hit
                    .setSize(5)// hits per page
                    .get();
            // the results are wrapped in a SearchHits object
            SearchHits searchHits = response.getHits();
            long totalHits = searchHits.totalHits;
            System.out.println("Found " + totalHits + " results");
            SearchHit[] hits = searchHits.getHits();
            for (SearchHit hit : hits){// iterate hits and print the highlighted content
                System.out.println("-------------------------------------------");
                // highlighted field content
                Map<String, HighlightField> highlightFields = hit.getHighlightFields();
                highlightFields.forEach((key,highlightField) -> {
                    System.out.println("key: " + key);
                    String address = "";
                    Text[] fragments = highlightField.fragments();
                    for (Text fragment : fragments){
                        address += fragment.toString();
                    }
                    System.out.println("address: " + address);
     
                });
     
     
            }
     
        }
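The highlight settings built above with SearchSourceBuilder.highlight() correspond to a request body like this in the REST DSL (a sketch, not taken from the original article):

```json
{
  "query": { "match": { "address": "Avenue" } },
  "highlight": {
    "pre_tags":  ["<font color='red' size='16px'>"],
    "post_tags": ["</font>"],
    "fields": { "address": {} }
  }
}
```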

  • Sorting results by a field:

  •  
      @Test
        public void testSort(){
            SearchResponse response = client
                    .prepareSearch(indices) // specify the indices to search
                    .setSearchType(SearchType.DEFAULT)
                    .setQuery(QueryBuilders.matchQuery("address", "Avenue"))
                    .highlighter(// configure highlighting
                            SearchSourceBuilder.highlight()
                                    .field("address")
                                    .preTags("<font color='red' size='16px'>")
                                    .postTags("</font>")
                    )
                    .addSort("age", SortOrder.ASC)
    //                .addSort("age", SortOrder.DESC)
                    .setFrom(0)// offset of the first hit
                    .setSize(5)// hits per page
                    .get();
            // the results are wrapped in a SearchHits object
            SearchHits searchHits = response.getHits();
            long totalHits = searchHits.totalHits;
            System.out.println("Found " + totalHits + " results");
            SearchHit[] hits = searchHits.getHits();
            for (SearchHit hit : hits){// iterate hits and print the highlighted content
                System.out.println("-------------------------------------------");
                Map<String, Object> source = hit.getSourceAsMap();
                Object firstname = source.get("firstname");
                Object age = source.get("age");
                System.out.println("firstname: " + firstname);
                System.out.println("age: " + age);
                // highlighted field content
                Map<String, HighlightField> highlightFields = hit.getHighlightFields();
                highlightFields.forEach((key,highlightField) -> {
                    System.out.println("key: " + key);
                    String address = "";
                    Text[] fragments = highlightField.fragments();
                    for (Text fragment : fragments){
                        address += fragment.toString();
                    }
                    System.out.println("address: " + address);
                });
            }
     
        }
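The addSort("age", SortOrder.ASC) call above maps to the sort clause of the REST DSL (a sketch for comparison):

```json
{
  "query": { "match": { "address": "Avenue" } },
  "sort": [ { "age": { "order": "asc" } } ],
  "from": 0,
  "size": 5
}
```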

  • Aggregation test:

  •  
     @Test
        public void testAggr(){
            SearchResponse response = client
                    .prepareSearch(indices) // specify the indices to search
                    .setSearchType(SearchType.DEFAULT)
                    .setQuery(QueryBuilders.matchQuery("address", "Avenue"))
                    .addAggregation(
                            AggregationBuilders
                                    .avg("avg_age")// as in: select avg(age) avg_age -- the name is the alias the result is reported under
                                    .field("age")// the field being aggregated, i.e. a field in the index
                    )
                    .get();
            Aggregations aggrs = response.getAggregations();// a collection of aggregation results
    //        System.out.println(aggrs);
            for (Aggregation aggr : aggrs){
    //            System.out.println(aggr);
    //            System.out.println(aggr.getName());
    //            System.out.println(aggr.getType());
                InternalAvg avg = (InternalAvg) aggr;
                double value = avg.getValue();
                System.out.println(avg.getName() + "-->" + value);
            }
        }
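The avg aggregation built above is equivalent to this REST request body (a sketch for comparison):

```json
{
  "query": { "match": { "address": "Avenue" } },
  "aggs": {
    "avg_age": { "avg": { "field": "age" } }
  }
}
```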

  • Filtering by a field range:

  •  
    @Test
        public void testFilter(){
            SearchResponse response = client
                    .prepareSearch(indices) // specify the indices to search
                    .setSearchType(SearchType.DEFAULT)
                    .setQuery(QueryBuilders.matchQuery("address", "Avenue"))
                    .highlighter(// configure highlighting
                            SearchSourceBuilder.highlight()
                                    .field("address")
                                    .preTags("<font color='red' size='16px'>")
                                    .postTags("</font>")
                    )
                    // keep only documents whose age is between 30 and 35
                    .setPostFilter(
                            QueryBuilders.rangeQuery("age").gte(30).lte(35)
                    )
                    .addSort("age", SortOrder.ASC)
                    .setFrom(0)// offset of the first hit
                    .setSize(5)// hits per page
                    .get();
            // the results are wrapped in a SearchHits object
            SearchHits searchHits = response.getHits();
            long totalHits = searchHits.totalHits;
            System.out.println("Found " + totalHits + " results");
            SearchHit[] hits = searchHits.getHits();
            for (SearchHit hit : hits){// iterate hits and print the highlighted content
                System.out.println("-------------------------------------------");
                Map<String, Object> source = hit.getSourceAsMap();
                Object firstname = source.get("firstname");
                Object age = source.get("age");
                System.out.println("firstname: " + firstname);
                System.out.println("age: " + age);
                // highlighted field content
                Map<String, HighlightField> highlightFields = hit.getHighlightFields();
                highlightFields.forEach((key,highlightField) -> {
                    System.out.println("key: " + key);
                    String address = "";
                    Text[] fragments = highlightField.fragments();
                    for (Text fragment : fragments){
                        address += fragment.toString();
                    }
                    System.out.println("address: " + address);
                });
            }
        }
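The setPostFilter call above maps to the post_filter clause of the REST DSL, which filters hits after the query has been scored (a sketch for comparison):

```json
{
  "query": { "match": { "address": "Avenue" } },
  "post_filter": { "range": { "age": { "gte": 30, "lte": 35 } } },
  "sort": [ { "age": { "order": "asc" } } ]
}
```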

  • Full code:

    elasticsearch.conf:

    cluster.name=rk-ES
    cluster.host.port=hadoop01:9300,hadoop02:9300,hadoop03:9300

    Constants.java:

    package rk.constants;
     
    /**
     * @Author rk
     * @Date 2018/12/10 15:14
     * @Description:
     **/
    public interface Constants {
        String CLUSTER_NAME = "cluster.name";
        String CLUSTER_HOST_PORT = "cluster.host.port";
     
    }

  • package rk.elastic;
     
    import org.elasticsearch.action.search.SearchResponse;
    import org.elasticsearch.action.search.SearchType;
    import org.elasticsearch.client.transport.TransportClient;
    import org.elasticsearch.common.settings.Settings;
    import org.elasticsearch.common.text.Text;
    import org.elasticsearch.common.transport.TransportAddress;
    import org.elasticsearch.index.query.QueryBuilders;
    import org.elasticsearch.search.SearchHit;
    import org.elasticsearch.search.SearchHits;
    import org.elasticsearch.search.aggregations.Aggregation;
    import org.elasticsearch.search.aggregations.AggregationBuilders;
    import org.elasticsearch.search.aggregations.Aggregations;
    import org.elasticsearch.search.aggregations.metrics.avg.InternalAvg;
    import org.elasticsearch.search.builder.SearchSourceBuilder;
    import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;
    import org.elasticsearch.search.sort.SortOrder;
    import org.elasticsearch.transport.client.PreBuiltTransportClient;
    import org.junit.After;
    import org.junit.Before;
    import org.junit.Test;
    import rk.constants.Constants;
     
    import java.io.IOException;
    import java.io.InputStream;
    import java.net.InetSocketAddress;
    import java.util.Map;
    import java.util.Properties;
     
    /**
     * @Author rk
     * @Date 2018/12/10 15:06
     * @Description:
     **/
    public class ElasticSearchTest2 {
     
        private TransportClient client;
        @Before
        public void setUp() throws IOException {
            Properties properties = new Properties();
            InputStream in = ElasticSearchTest2.class.getClassLoader().getResourceAsStream("elasticsearch.conf");
            properties.load(in);
            Settings setting = Settings.builder()
                    .put(Constants.CLUSTER_NAME,properties.getProperty(Constants.CLUSTER_NAME))
                    .build();
            client = new PreBuiltTransportClient(setting);
            String hostAndPorts = properties.getProperty(Constants.CLUSTER_HOST_PORT);
            for (String hostAndPort : hostAndPorts.split(",")){
                String[] fields = hostAndPort.split(":");
                String host = fields[0];
                int port = Integer.valueOf(fields[1]);
                TransportAddress ts = new TransportAddress(new InetSocketAddress(host, port));
                client.addTransportAddresses(ts);
            }
            System.out.println("cluster.name = " + client.settings().get("cluster.name"));
        }
     
        String[] indices = {"product","test"};
     
        @Test
        public void testQuery1(){
            SearchResponse response = client
                    .prepareSearch(indices) // specify the indices to search
                    /**
                     * Set the search type:
                     *  QUERY_AND_FETCH:  available before 5.3, restricted afterwards
                     *  QUERY_THEN_FETCH:  the default
                     *  DFS_QUERY_AND_FETCH:  removed in newer versions
                     *  DFS_QUERY_THEN_FETCH:
                     */
                    .setSearchType(SearchType.DEFAULT)
                    /**
                     * Set what to search for.
                     * Whether a given search type can surface the data you want is
                     * what gave rise to the SEO (search engine optimization) profession.
                     */
    //                .setQuery(QueryBuilders.matchPhrasePrefixQuery("firstname", "V*")) // search the firstname field for values starting with V
    //                .setQuery(QueryBuilders.matchQuery("state", "NM"))
                    .setQuery(QueryBuilders.termQuery("age", 40))
                    // pagination: to show page N with M hits per page, use setFrom((N - 1) * M).setSize(M)
                    .setFrom(1)// offset of the first hit to return
                    .setSize(5)// number of hits per page
                    .get();
            // the results are wrapped in a SearchHits object
            SearchHits searchHits = response.getHits();
            long totalHits = searchHits.totalHits;
            System.out.println("Found " + totalHits + " results");
     
            /**
             * "hits": [
             * {
             * "_index": "product",
             * "_type": "bigdata",
             * "_id": "5",
             * "_score": 1,
             * "_source": {
             * "name": "redis",
             * "author": "redis",
             * "version": "5.0.0"
             * }
             * }
             */
            SearchHit[] hits = searchHits.getHits();
            for (SearchHit hit : hits){
                System.out.println("--------------------------------------------");
                String index = hit.getIndex();
                String type = hit.getType();
                String id = hit.getId();
                float score = hit.getScore();
                System.out.println("index: " + index);
                System.out.println("type: " + type);
                System.out.println("id: " + id);
                System.out.println("score: " + score);
                Map<String, Object> source = hit.getSourceAsMap();
                source.forEach((field, value) ->{
                    System.out.println(field + "--->" + value);
                });
     
            }
        }
     
        @Test
        public void testHightLight(){
            SearchResponse response = client
                    .prepareSearch(indices) // specify the indices to search
                    .setSearchType(SearchType.DEFAULT)
                    .setQuery(QueryBuilders.matchQuery("address", "Avenue"))
                    .highlighter(// configure highlighting
                            SearchSourceBuilder.highlight()
                                    .field("address")
                                    .preTags("<font color='red' size='16px'>")
                                    .postTags("</font>")
                    )
                    .setFrom(0)// offset of the first hit
                    .setSize(5)// hits per page
                    .get();
            // the results are wrapped in a SearchHits object
            SearchHits searchHits = response.getHits();
            long totalHits = searchHits.totalHits;
            System.out.println("Found " + totalHits + " results");
            SearchHit[] hits = searchHits.getHits();
            for (SearchHit hit : hits){// iterate hits and print the highlighted content
                System.out.println("-------------------------------------------");
                // highlighted field content
                Map<String, HighlightField> highlightFields = hit.getHighlightFields();
                highlightFields.forEach((key,highlightField) -> {
                    System.out.println("key: " + key);
                    String address = "";
                    Text[] fragments = highlightField.fragments();
                    for (Text fragment : fragments){
                        address += fragment.toString();
                    }
                    System.out.println("address: " + address);
     
                });
     
     
            }
     
        }
     
        @Test
        public void testSort(){
            SearchResponse response = client
                    .prepareSearch(indices) // specify the indices to search
                    .setSearchType(SearchType.DEFAULT)
                    .setQuery(QueryBuilders.matchQuery("address", "Avenue"))
                    .highlighter(// configure highlighting
                            SearchSourceBuilder.highlight()
                                    .field("address")
                                    .preTags("<font color='red' size='16px'>")
                                    .postTags("</font>")
                    )
                    .addSort("age", SortOrder.ASC)
    //                .addSort("age", SortOrder.DESC)
                    .setFrom(0)// offset of the first hit
                    .setSize(5)// hits per page
                    .get();
            // the results are wrapped in a SearchHits object
            SearchHits searchHits = response.getHits();
            long totalHits = searchHits.totalHits;
            System.out.println("Found " + totalHits + " results");
            SearchHit[] hits = searchHits.getHits();
            for (SearchHit hit : hits){// iterate hits and print the highlighted content
                System.out.println("-------------------------------------------");
                Map<String, Object> source = hit.getSourceAsMap();
                Object firstname = source.get("firstname");
                Object age = source.get("age");
                System.out.println("firstname: " + firstname);
                System.out.println("age: " + age);
                // highlighted field content
                Map<String, HighlightField> highlightFields = hit.getHighlightFields();
                highlightFields.forEach((key,highlightField) -> {
                    System.out.println("key: " + key);
                    String address = "";
                    Text[] fragments = highlightField.fragments();
                    for (Text fragment : fragments){
                        address += fragment.toString();
                    }
                    System.out.println("address: " + address);
                });
            }
     
        }
     
     
        @Test
        public void testAggr(){
            SearchResponse response = client
                    .prepareSearch(indices) // specify the indices to search
                    .setSearchType(SearchType.DEFAULT)
                    .setQuery(QueryBuilders.matchQuery("address", "Avenue"))
                    .addAggregation(
                            AggregationBuilders
                                    .avg("avg_age")// as in: select avg(age) avg_age -- the name is the alias the result is reported under
                                    .field("age")// the field being aggregated, i.e. a field in the index
                    )
                    .get();
            Aggregations aggrs = response.getAggregations();// a collection of aggregation results
    //        System.out.println(aggrs);
            for (Aggregation aggr : aggrs){
    //            System.out.println(aggr);
    //            System.out.println(aggr.getName());
    //            System.out.println(aggr.getType());
                InternalAvg avg = (InternalAvg) aggr;
                double value = avg.getValue();
                System.out.println(avg.getName() + "-->" + value);
            }
        }
     
        @Test
        public void testFilter(){
            SearchResponse response = client
                    .prepareSearch(indices) // specify the indices to search
                    .setSearchType(SearchType.DEFAULT)
                    .setQuery(QueryBuilders.matchQuery("address", "Avenue"))
                    .highlighter(// configure highlighting
                            SearchSourceBuilder.highlight()
                                    .field("address")
                                    .preTags("<font color='red' size='16px'>")
                                    .postTags("</font>")
                    )
                    // keep only documents whose age is between 30 and 35
                    .setPostFilter(
                            QueryBuilders.rangeQuery("age").gte(30).lte(35)
                    )
                    .addSort("age", SortOrder.ASC)
                    .setFrom(0)// offset of the first hit
                    .setSize(5)// hits per page
                    .get();
            // the results are wrapped in a SearchHits object
            SearchHits searchHits = response.getHits();
            long totalHits = searchHits.totalHits;
            System.out.println("Found " + totalHits + " results");
            SearchHit[] hits = searchHits.getHits();
            for (SearchHit hit : hits){// iterate hits and print the highlighted content
                System.out.println("-------------------------------------------");
                Map<String, Object> source = hit.getSourceAsMap();
                Object firstname = source.get("firstname");
                Object age = source.get("age");
                System.out.println("firstname: " + firstname);
                System.out.println("age: " + age);
                // highlighted field content
                Map<String, HighlightField> highlightFields = hit.getHighlightFields();
                highlightFields.forEach((key,highlightField) -> {
                    System.out.println("key: " + key);
                    String address = "";
                    Text[] fragments = highlightField.fragments();
                    for (Text fragment : fragments){
                        address += fragment.toString();
                    }
                    System.out.println("address: " + address);
                });
            } 
        }
       
        
        @After
        public void cleanUp(){
            client.close();
        }
     
    }

    Implementation from the Elasticsearch 2.3 era:

    package rk.elastic;
     
     
    import com.fasterxml.jackson.databind.ObjectMapper;
    import org.dom4j.Document;
    import org.dom4j.Element;
    import org.dom4j.io.SAXReader;
    import org.elasticsearch.action.bulk.BulkRequestBuilder;
    import org.elasticsearch.action.bulk.BulkResponse;
    import org.elasticsearch.action.index.IndexRequest;
    import org.elasticsearch.action.search.SearchRequestBuilder;
    import org.elasticsearch.action.search.SearchResponse;
    import org.elasticsearch.action.search.SearchType;
    import org.elasticsearch.client.transport.TransportClient;
    import org.elasticsearch.common.settings.Settings;
    import org.elasticsearch.common.text.Text;
    import org.elasticsearch.common.transport.TransportAddress;
    import org.elasticsearch.index.query.QueryBuilders;
    import org.elasticsearch.search.SearchHit;
    import org.elasticsearch.search.SearchHits;
    import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;
    import org.elasticsearch.transport.client.PreBuiltTransportClient;
    import org.junit.Before;
    import org.junit.Test;
    import rk.constants.Constants;
     
    import java.io.File;
    import java.io.IOException;
    import java.io.InputStream;
    import java.net.InetSocketAddress;
    import java.util.*;
     
    /**
     * @Author rk
     * @Date 2018/12/10 15:06
     * @Description:
     *
     *      Sample data (from the Sogou news corpus; the Chinese title is left as-is):
     *      <doc>
     *         <url>http://gongyi.sohu.com/20120730/n349358066.shtml</url>
     *         <docno>fdaa73d52fd2f0ea-34913306c0bb3300</docno>
     *          <contenttitle>失独父母中年遇独子夭折 称不怕死亡怕养老生病</contenttitle>
     *          <content></content>
     *      </doc>
     *
     **/
     
    class Article{
        private String url;
        private String docno;
        private String content;
        private String contenttitle;
     
        public Article() {
        }
     
        public Article(String url, String docno, String content, String contenttitle) {
            this.url = url;
            this.docno = docno;
            this.content = content;
            this.contenttitle = contenttitle;
        }
     
        public String getUrl() {
            return url;
        }
     
        public void setUrl(String url) {
            this.url = url;
        }
     
        public String getDocno() {
            return docno;
        }
     
        public void setDocno(String docno) {
            this.docno = docno;
        }
     
        public String getContent() {
            return content;
        }
     
        public void setContent(String content) {
            this.content = content;
        }
     
        public String getContenttitle() {
            return contenttitle;
        }
     
        public void setContenttitle(String contenttitle) {
            this.contenttitle = contenttitle;
        }
     
        @Override
        public String toString() {
            return "Article{" +
                    "url='" + url + '\'' +
                    ", docno='" + docno + '\'' +
                    ", content='" + content + '\'' +
                    ", contenttitle='" + contenttitle + '\'' +
                    '}';
        }
    }
     
    //  Parse the XML and take the first 20 entries
    class XmlParser {
        public static List<Article> getArticle() {
            List<Article> list = new ArrayList<Article>();
            SAXReader reader = new SAXReader();
            Document document;
            try {
                document = reader.read(new File("news_sohusite_xml"));
                Element root = document.getRootElement();
                Iterator<Element> iterator = root.elementIterator("doc");
                Article article = null;
                int count = 0;
                while(iterator.hasNext()) {
                    Element doc = iterator.next();
                    String url = doc.elementTextTrim("url");
                    String docno = doc.elementTextTrim("docno");
                    String content = doc.elementTextTrim("content");
                    String contenttitle = doc.elementTextTrim("contenttitle");
                    article = new Article();
                    article.setContent(content);
                    article.setDocno(docno);
                    article.setContenttitle(contenttitle);
                    article.setUrl(url);
                    if(++count > 20) {
                        break;
                    }
                    list.add(article);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
            return list;
        }
     
    }
     
     
     
     
     
    public class ElasticSearchTest2_3 {
     
        private TransportClient client;
        @Before
        public void setUp() throws IOException {
            Properties properties = new Properties();
        InputStream in = ElasticSearchTest2_3.class.getClassLoader().getResourceAsStream("elasticsearch.conf");
            properties.load(in);
            Settings setting = Settings.builder()
                    .put(Constants.CLUSTER_NAME,properties.getProperty(Constants.CLUSTER_NAME))
                    .build();
            client = new PreBuiltTransportClient(setting);
            String hostAndPorts = properties.getProperty(Constants.CLUSTER_HOST_PORT);
            System.out.println(hostAndPorts);
            for (String hostAndPort : hostAndPorts.split(",")){
                String[] fields = hostAndPort.split(":");
                String host = fields[0];
                int port = Integer.valueOf(fields[1]);
                TransportAddress ts = new TransportAddress(new InetSocketAddress(host, port));
                client.addTransportAddresses(ts);
            }
     
        System.out.println("cluster.name = " + client.settings().get("cluster.name"));
     
        }
     
        String index = "search";
        //  bulk-load the articles into the ES index
        @Test
        public void bulkInsert() throws Exception {
            List<Article> list = XmlParser.getArticle();
            ObjectMapper oMapper = new ObjectMapper();
            BulkRequestBuilder bulkRequestBuilder = client.prepareBulk();
            for (int i = 0; i < list.size(); i++) {
                Article article = list.get(i);
                String val = oMapper.writeValueAsString(article);
                bulkRequestBuilder.add(new IndexRequest(index, "news",
                        article.getDocno()).source(val));
            }
            BulkResponse response = bulkRequestBuilder.get();
        }
     
        // query
        @Test
        public void testSearch() {
            String indices = "bigdata";// which index to search
            SearchRequestBuilder builder = client.prepareSearch(indices)
                    .setSearchType(SearchType.DEFAULT)
                    .setFrom(0)
                    .setSize(5)// pagination
                    /**
                     * The newer equivalent:
                     * .highlighter(// configure highlighting
                     *          SearchSourceBuilder.highlight()
                     *                      .field("address")
                     *                      .preTags("<font color='red' size='16px'>")
                     *                      .postTags("</font>")
                     *      )
                     */
                    .addHighlightedField("name")// field to highlight
                    .setHighlighterPreTags("<font color='blue'>")
                    .setHighlighterPostTags("</font>");// highlight style
            builder.setQuery( QueryBuilders.fuzzyQuery("name", "hadoop"));
            SearchResponse searchResponse = builder.get();
            SearchHits searchHits = searchResponse.getHits();
            SearchHit[] hits = searchHits.getHits();
            long total = searchHits.getTotalHits();
            System.out.println("Total hits: " + total);// how many documents matched
            for (SearchHit searchHit : hits) {
     
                Map<String, Object> source = searchHit.getSource();// newer API: searchHit.getSourceAsMap()
                Map<String, HighlightField> highlightFields = searchHit.getHighlightFields();
                System.out.println("---------------------------");
                String name = source.get("name").toString();
                String author = source.get("author").toString();
                System.out.println("name=" + name);
                System.out.println("author=" + author);
                HighlightField highlightField = highlightFields.get("name");
                if(highlightField != null) {
                    Text[] fragments = highlightField.fragments();
                    name = "";
                    for (Text text : fragments) {
                        name += text.toString();
                    }
                }
                System.out.println("name: " + name);
                System.out.println("author: " + author);
            }
        }
     
    }

    Similar to SQL's LIMIT for controlling the size of a single "page", Elasticsearch uses two parameters, from and size:

         from: which result to start from; defaults to 0

         size: how many results to return per request; defaults to 10

    Assuming 5 results per page, the requests for pages 1 through 3 are:

        GET /_search?size=5

        GET /_search?size=5&from=5

        GET /_search?size=5&from=10
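The from offsets above follow the formula from = (page - 1) * size. A standalone sketch (the class and method names here are my own, not from the article):

```java
public class PagingDemo {
    /** Offset of the first hit on a 1-indexed page with `size` hits per page. */
    static int fromForPage(int page, int size) {
        return (page - 1) * size;
    }

    public static void main(String[] args) {
        // prints the request for pages 1-3 with 5 hits per page
        for (int page = 1; page <= 3; page++) {
            System.out.println("GET /_search?size=5&from=" + fromForPage(page, 5));
        }
    }
}
```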

    Note: do not request too many results at once or pages that are too deep, as this puts heavy pressure on the servers. The results must be sorted before they are returned: a request fans out to multiple shards, each shard produces its own sorted result set, and these are then merged centrally to guarantee the final ordering is correct.
    ————————————————
    Copyright notice: this is an original article by the CSDN blogger "R_记忆犹新", released under the CC 4.0 BY-SA license; please include the original source link and this notice when reposting.
    Original link: https://blog.csdn.net/qq_28844767/article/details/84946433
