Getting Started with Elasticsearch (Part 1)

ES version: 7.3.1

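Some concepts:

Term frequency (TF): the more often the search term appears in a document, the higher that document scores.

Inverse document frequency (IDF): the rarer a term is across all documents, the higher its weight, and so the higher the score.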
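To make the two ideas concrete, here is a minimal sketch of a textbook TF-IDF score. The class and method names are hypothetical, and the formula (tf * ln(N / df)) is the classic variant used only for illustration; it is not the BM25 similarity that Elasticsearch 7.x applies by default.

```java
import java.util.Arrays;
import java.util.List;

final class TfIdfSketch {
    // Term frequency: how many times the term occurs in one document.
    static long termFrequency(String term, List<String> docTokens) {
        return docTokens.stream().filter(term::equals).count();
    }

    // Inverse document frequency: ln(N / df); rarer terms get a larger weight.
    static double inverseDocumentFrequency(String term, List<List<String>> corpus) {
        long df = corpus.stream().filter(doc -> doc.contains(term)).count();
        return df == 0 ? 0.0 : Math.log((double) corpus.size() / df);
    }

    static double score(String term, List<String> doc, List<List<String>> corpus) {
        return termFrequency(term, doc) * inverseDocumentFrequency(term, corpus);
    }

    public static void main(String[] args) {
        // Made-up documents, already split into tokens.
        List<List<String>> corpus = Arrays.asList(
                Arrays.asList("elasticsearch", "is", "a", "search", "engine"),
                Arrays.asList("lucene", "is", "a", "search", "library"),
                Arrays.asList("elasticsearch", "uses", "lucene"));
        // "is" occurs in 2 of 3 documents, "engine" in only 1, so "engine" scores higher.
        System.out.println(score("is", corpus.get(0), corpus));
        System.out.println(score("engine", corpus.get(0), corpus));
    }
}
```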

How the concepts map to a relational database:

ES      index      type    document
RDBMS   database   table   row

 

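Logical design: what a search application needs to be aware of. The basic unit for indexing and searching is the document, which you can think of as a row in a relational database. Documents are grouped by type: a type contains documents much as a table contains rows. One or more types live in the same index, which is the larger container, comparable to a database in the SQL world. (In 7.x, mapping types are deprecated and every document uses the single type _doc.)

Physical design: how Elasticsearch handles the data behind the scenes. Elasticsearch divides each index into shards, and each shard can migrate between servers in the cluster. Keep this in mind when managing a cluster, because the physical configuration determines the cluster's performance, scalability, and availability.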

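           By default in 7.x each index has 1 primary shard (the default was 5 before 7.0), and each primary shard has one replica.

Shard: a shard is the smallest unit Elasticsearch works with. Each shard is a Lucene index: a directory of files containing an inverted index.

           Replica shards can be added and removed at runtime, whereas the number of primary shards is fixed when the index is created.

           An index consists of one or more primary shards and zero or more replica shards.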

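Inverted index: the structure of an inverted index lets ES tell you which documents contain a particular term (word) without scanning every document.

       Example: what the first primary shard of the get-together index might contain. The shard, named get-together0, is a Lucene index, that is, an inverted index. By default it stores the original document content plus extra information such as the term dictionary and term frequencies, all of which help searching.

Term dictionary: the term dictionary maps each term to the documents that contain it. At search time ES does not need to scan all documents for a term; it uses this dictionary to identify matching documents quickly.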
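The term dictionary described above can be sketched as a map from term to document IDs. This is a deliberately simplified illustration: the class name, the naive tokenizer, and the sample texts are assumptions for the example, and Lucene's real on-disk structures are far more elaborate.

```java
import java.util.Collections;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

final class InvertedIndexSketch {
    // term -> sorted IDs of the documents that contain it
    private final Map<String, SortedSet<Integer>> postings = new TreeMap<>();

    // Naive analysis: lowercase the text and split on non-word characters.
    void add(int docId, String text) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // A lookup reads only the entry for the term; no document is scanned.
    SortedSet<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptySortedSet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Elasticsearch Denver");
        index.add(2, "Denver Clojure meetup");
        System.out.println(index.search("denver"));  // documents 1 and 2
        System.out.println(index.search("clojure")); // document 2 only
    }
}
```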

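What happens when a document is indexed: by default, the system first selects a primary shard from a hash of the document ID and sends the document to that shard. The document is then sent to all replicas of that primary shard to be indexed there as well. This keeps the replicas in sync with the primary, so replicas can serve search requests and can be promoted to primary if the original primary becomes unavailable. Searches can also be load-balanced across primary and replica shards.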
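The shard selection step can be sketched as "hash of the ID modulo the number of primary shards". The names below are hypothetical and String.hashCode() is only a stand-in for illustration; it is not the hash function Elasticsearch actually uses.

```java
final class ShardRoutingSketch {
    // Picks a primary shard from the document ID.
    // floorMod keeps the result non-negative even for negative hash codes.
    static int selectPrimaryShard(String docId, int numberOfPrimaryShards) {
        return Math.floorMod(docId.hashCode(), numberOfPrimaryShards);
    }

    public static void main(String[] args) {
        int shards = 5; // assumed shard count for the demo
        for (String id : new String[] {"1", "2", "42", "customer-7"}) {
            System.out.println(id + " -> shard " + selectPrimaryShard(id, shards));
        }
    }
}
```

Because the result depends on the number of primary shards, changing that number would send existing IDs to different shards, which is one way to see why the primary shard count is fixed at index creation.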

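Cluster: a node is an instance of Elasticsearch. Once you start ES on a server you have a node; start another instance on another server and you have another node. You can even run several ES processes on one server to get several nodes. Multiple nodes can join the same cluster. Clustering benefits performance and stability, but it has a drawback: you must make sure the nodes can communicate quickly enough and that no split brain occurs (two parts of the cluster cannot reach each other and each believes the other is down).

        Horizontal scaling: add more nodes to the same cluster; the existing shards are rebalanced across all nodes.

        Vertical scaling: give the ES nodes more hardware resources.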

Download: https://www.elastic.co/cn/downloads/elasticsearch

Unpack the archive.

Run bin/elasticsearch (or bin\elasticsearch.bat on Windows).

Then open: http://localhost:9200/

1. Create a document

curl -X PUT "localhost:9200/customer/_doc/1?pretty" -H 'Content-Type: application/json' -d'
{
  "name": "John Doe"
}
'

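 Notes:

        The pretty parameter: write pretty=true or just pretty; it works whether or not the request goes through curl, and we use the short form. By default the JSON response comes back on a single line, and pretty makes it more readable.

        By default ES creates the customer index automatically if it does not exist, and creates a new mapping for the _doc type.

        -d supplies the request body (here, the JSON document that follows).

        -H sets a request header.

Returns: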

{
  "_index" : "customer",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "John Doe"
  }
}

2. Retrieve the document

curl -X GET "localhost:9200/customer/_doc/1?pretty"

Returns:

{
  "_index" : "customer",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "John Doe"
  }
}

3. Bulk document operations with _bulk

curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_bulk?pretty&refresh" --data-binary "@accounts.json"

 Note: batches of 1,000 to 5,000 documents with a total payload of 5MB to 15MB tend to work best.

           Test data: https://raw.githubusercontent.com/elastic/elasticsearch/master/docs/src/test/resources/accounts.json

Returns:

{
  "took" : 533,
  "errors" : false,
  "items" : [
    {
      "index" : {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "1",
        "_version" : 1,
        "result" : "created",
        "forced_refresh" : true,
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 201
      }
    },
...// omitted
}

Check the result:

curl "localhost:9200/_cat/indices?v"
health status index        uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   commodity    9srG6uQ4SCaM8A6U7q4epQ   5   1          0            0      1.3kb          1.3kb
yellow open   bank         M9VIEPM2TaSkPrRrhS-R9Q   1   1       1000            0    414.3kb        414.3kb
yellow open   book-index   g7zhCgucSoeJknd08ygxww   3   2          4            0      7.5kb          7.5kb
yellow open   get-together Xb9iBol9R1WHJGQfs0GIeA   1   1          1            0      4.3kb          4.3kb
yellow open   book         xJ3-ThlRSHChfrRIxOOOyw   5   1          0            0      1.3kb          1.3kb
yellow open   customer     StLOulvsSlSM4aU_o6LL1Q   1   1          1            0      3.5kb          3.5kb
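The body that _bulk expects is newline-delimited JSON: one action line followed by one source line per document, each ending in a newline. The sketch below builds such a body and splits a document list into batches in the size range mentioned in the note. The class and method names are hypothetical, and it assumes IDs that need no JSON escaping.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class BulkBodySketch {
    // Builds a _bulk body from (id -> JSON source) pairs.
    // Every line, including the last one, is terminated by '\n'.
    static String toBulkBody(Map<String, String> jsonById) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> e : jsonById.entrySet()) {
            body.append("{\"index\":{\"_id\":\"").append(e.getKey()).append("\"}}\n");
            body.append(e.getValue()).append('\n');
        }
        return body.toString();
    }

    // Splits a list into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            int end = Math.min(i + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(i, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("1", "{\"name\":\"John Doe\"}"); // sample document for illustration
        System.out.print(toBulkBody(docs));
    }
}
```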

4. Searching with _search

curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ]
}
'

Note: by default, hits contains the first 10 matching documents.

Returns:

{
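  "took" : 14,// time the query took, in milliseconds
  "timed_out" : false,// whether the search request timed out
  "_shards" : {// shard information
    "total" : 1,// total number of shards searched
    "successful" : 1,// number of shards on which the search succeeded
    "skipped" : 0,// number of shards skipped
    "failed" : 0// number of shards on which the search failed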
  },
  "hits" : {
    "total" : {
      "value" : 1000,// total number of matching documents
      "relation" : "eq"
    },
    "max_score" : null,// score of the most relevant document (null here because the results are sorted by a field)
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "0",
        "_score" : null,// the document's relevance score (null here: results are sorted by account_number, so no score is computed)
        "_source" : {
          "account_number" : 0,
          "balance" : 16623,
          "firstname" : "Bradshaw",
          "lastname" : "Mckenzie",
          "age" : 29,
          "gender" : "F",
          "address" : "244 Columbus Place",
          "employer" : "Euron",
          "email" : "bradshawmckenzie@euron.com",
          "city" : "Hobucken",
          "state" : "CO"
        },
        "sort" : [// sort values of this document (present when not sorting by relevance score); here, its account_number
          0
        ]
      },  
...// omitted
}

5. Paging search results with from and size

curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ],
  "from": 10,
  "size": 10
}
'

  Note: from is the zero-based offset of the first hit to return; size is the number of hits to return.

Returns:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "10",
        "_score" : null,
        "_source" : {
          "account_number" : 10,
          "balance" : 46170,
          "firstname" : "Dominique",
          "lastname" : "Park",
          "age" : 37,
          "gender" : "F",
          "address" : "100 Gatling Place",
          "employer" : "Conjurica",
          "email" : "dominiquepark@conjurica.com",
          "city" : "Omar",
          "state" : "NJ"
        },
        "sort" : [
          10
        ]
      },
...// omitted
}
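For page-based navigation, the from value in the paging request above can be derived from a page number. A minimal sketch, assuming 1-based page numbers; the class and method names are hypothetical, and the request body simply mirrors the curl example.

```java
final class PaginationSketch {
    // from is the zero-based offset of the first hit on the requested page.
    static int fromForPage(int pageNumber, int pageSize) {
        if (pageNumber < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNumber and pageSize must be >= 1");
        }
        return (pageNumber - 1) * pageSize;
    }

    // Builds the same search body as the curl example, for a given page.
    static String searchBody(int pageNumber, int pageSize) {
        return "{\"query\":{\"match_all\":{}},"
                + "\"sort\":[{\"account_number\":\"asc\"}],"
                + "\"from\":" + fromForPage(pageNumber, pageSize) + ","
                + "\"size\":" + pageSize + "}";
    }

    public static void main(String[] args) {
        // Page 2 with 10 hits per page corresponds to from=10, size=10.
        System.out.println(searchBody(2, 10));
    }
}
```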

6. Querying with match

curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": { "match": { "address": "mill lane" } }
}
'

Note: returns the customer documents in the bank index whose address field contains mill or lane.

Returns:

{
  "took" : 15,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 19,
      "relation" : "eq"
    },
    "max_score" : 9.507477,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "136",
        "_score" : 9.507477,
        "_source" : {
          "account_number" : 136,
          "balance" : 45801,
          "firstname" : "Winnie",
          "lastname" : "Holland",
          "age" : 38,
          "gender" : "M",
          "address" : "198 Mill Lane",
          "employer" : "Neteria",
          "email" : "winnieholland@neteria.com",
          "city" : "Urie",
          "state" : "IL"
        }
      },
...// omitted
}

7. match_phrase: search for the whole phrase rather than individual terms

curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": { "match_phrase": { "address": "mill lane" } }
}
'

Note: returns the customer documents whose address field contains the phrase "mill lane".

Returns:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 9.507477,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "136",
        "_score" : 9.507477,
        "_source" : {
          "account_number" : 136,
          "balance" : 45801,
          "firstname" : "Winnie",
          "lastname" : "Holland",
          "age" : 38,
          "gender" : "M",
          "address" : "198 Mill Lane",
          "employer" : "Neteria",
          "email" : "winnieholland@neteria.com",
          "city" : "Urie",
          "state" : "IL"
        }
      }
    ]
  }
}
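The difference between the two queries (19 hits for match, 1 hit for match_phrase) comes down to "any term" versus "all terms, adjacent and in order". The sketch below imitates that on whitespace-separated tokens; it is an assumption-laden simplification with hypothetical names and made-up addresses, and it ignores analysis details and scoring.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class MatchVsPhraseSketch {
    // Naive analysis: lowercase and split on whitespace.
    static List<String> tokens(String text) {
        return Arrays.asList(text.toLowerCase().trim().split("\\s+"));
    }

    // match with the default OR operator: at least one query term occurs in the field.
    static boolean match(String field, String query) {
        List<String> fieldTokens = tokens(field);
        return tokens(query).stream().anyMatch(fieldTokens::contains);
    }

    // match_phrase: all query terms occur consecutively and in the same order.
    static boolean matchPhrase(String field, String query) {
        return Collections.indexOfSubList(tokens(field), tokens(query)) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(match("12 Mill Road", "mill lane"));        // true
        System.out.println(matchPhrase("12 Mill Road", "mill lane"));  // false
        System.out.println(matchPhrase("198 Mill Lane", "mill lane")); // true
    }
}
```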

8. Complex queries with bool

curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": "40" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}
'

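Note: must clauses are required to match, should clauses are desirable, and must_not clauses are undesirable (documents that match them are excluded). The example returns the customers who are 40 years old and do not live in Idaho (ID).

Returns: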

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 43,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "474",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 474,
          "balance" : 35896,
          "firstname" : "Obrien",
          "lastname" : "Walton",
          "age" : 40,
          "gender" : "F",
          "address" : "192 Ide Court",
          "employer" : "Suremax",
          "email" : "obrienwalton@suremax.com",
          "city" : "Crucible",
          "state" : "UT"
        }
      },
...// omitted
}
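The filtering logic of the bool query above can be pictured as a combination of predicates: every must clause has to hold and no must_not clause may hold. The sketch uses hypothetical names and a pared-down Account type, and it leaves out should clauses and scoring entirely.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

final class BoolQuerySketch {
    // Minimal stand-in for an account document (only the fields the query uses).
    static final class Account {
        final int age;
        final String state;

        Account(int age, String state) {
            this.age = age;
            this.state = state;
        }
    }

    // A document matches when all must clauses hold and no must_not clause holds.
    static Predicate<Account> bool(List<Predicate<Account>> must,
                                   List<Predicate<Account>> mustNot) {
        return a -> must.stream().allMatch(p -> p.test(a))
                && mustNot.stream().noneMatch(p -> p.test(a));
    }

    public static void main(String[] args) {
        Predicate<Account> ageIs40 = a -> a.age == 40;
        Predicate<Account> stateIsId = a -> "ID".equals(a.state);
        Predicate<Account> query = bool(
                Collections.singletonList(ageIs40),
                Collections.singletonList(stateIsId));
        System.out.println(query.test(new Account(40, "UT"))); // true
        System.out.println(query.test(new Account(40, "ID"))); // false
    }
}
```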

9. Querying with range

curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 20000,
            "lte": 30000
          }
        }
      }
    }
  }
}
'

Note: returns the customers whose balance is between 20000 and 30000, inclusive.

Returns:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 217,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "49",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 49,
          "balance" : 29104,
          "firstname" : "Fulton",
          "lastname" : "Holt",
          "age" : 23,
          "gender" : "F",
          "address" : "451 Humboldt Street",
          "employer" : "Anocha",
          "email" : "fultonholt@anocha.com",
          "city" : "Sunriver",
          "state" : "RI"
        }
      },
...// omitted
}

10. Using aggregations

curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      }
    }
  }
}
'

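Note: groups all accounts in the bank index by state and returns the ten states with the most accounts, in descending order of account count.

Returns: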

{
  "took" : 22,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_state" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 743,
      "buckets" : [
        {
          "key" : "TX",
          "doc_count" : 30
        },
        {
          "key" : "MD",
          "doc_count" : 28
        },
        {
          "key" : "ID",
          "doc_count" : 27
        },
        {
          "key" : "AL",
          "doc_count" : 25
        },
        {
          "key" : "ME",
          "doc_count" : 25
        },
        {
          "key" : "TN",
          "doc_count" : 25
        },
        {
          "key" : "WY",
          "doc_count" : 25
        },
        {
          "key" : "DC",
          "doc_count" : 24
        },
        {
          "key" : "MA",
          "doc_count" : 24
        },
        {
          "key" : "ND",
          "doc_count" : 24
        }
      ]
    }
  }
}
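Conceptually, the terms aggregation above counts documents per distinct value and keeps the largest buckets. An in-memory sketch of that idea with Java streams follows; the names are hypothetical, the input is made up, and ties are broken by key only so that the output is deterministic.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class TermsAggSketch {
    // Counts values and returns the `size` largest buckets, by count descending.
    static Map<String, Long> topBuckets(List<String> values, int size) {
        Map<String, Long> counts = values.stream()
                .collect(Collectors.groupingBy(v -> v, Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(size)
                .collect(Collectors.toMap(
                        Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new)); // keeps the sorted order
    }

    public static void main(String[] args) {
        List<String> states = Arrays.asList("TX", "MD", "TX", "ID", "MD", "TX");
        System.out.println(topBuckets(states, 2)); // {TX=3, MD=2}
    }
}
```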

11. Using nested aggregations

curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}
'

Note: nests an avg aggregation inside the previous group_by_state aggregation to calculate the average account balance for each state.

Returns:

{
  "took" : 9,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_state" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 743,
      "buckets" : [
        {
          "key" : "TX",
          "doc_count" : 30,
          "average_balance" : {
            "value" : 26073.3
          }
        },
        {
          "key" : "MD",
          "doc_count" : 28,
          "average_balance" : {
            "value" : 26161.535714285714
          }
        },
        {
          "key" : "ID",
          "doc_count" : 27,
          "average_balance" : {
            "value" : 24368.777777777777
          }
        },
        {
          "key" : "AL",
          "doc_count" : 25,
          "average_balance" : {
            "value" : 25739.56
          }
        },
        {
          "key" : "ME",
          "doc_count" : 25,
          "average_balance" : {
            "value" : 21663.0
          }
        },
        {
          "key" : "TN",
          "doc_count" : 25,
          "average_balance" : {
            "value" : 28365.4
          }
        },
        {
          "key" : "WY",
          "doc_count" : 25,
          "average_balance" : {
            "value" : 21731.52
          }
        },
        {
          "key" : "DC",
          "doc_count" : 24,
          "average_balance" : {
            "value" : 23180.583333333332
          }
        },
        {
          "key" : "MA",
          "doc_count" : 24,
          "average_balance" : {
            "value" : 29600.333333333332
          }
        },
        {
          "key" : "ND",
          "doc_count" : 24,
          "average_balance" : {
            "value" : 26577.333333333332
          }
        }
      ]
    }
  }
}
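The nesting above (a terms bucket per state, with an avg metric inside each bucket) has a close in-memory analogue in grouping collectors. A minimal sketch with hypothetical names and made-up balances:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class NestedAvgSketch {
    // Minimal stand-in for an account document.
    static final class Account {
        final String state;
        final double balance;

        Account(String state, double balance) {
            this.state = state;
            this.balance = balance;
        }
    }

    // Outer grouping plays the role of the terms aggregation on state;
    // the inner collector plays the role of the avg aggregation on balance.
    static Map<String, Double> averageBalanceByState(List<Account> accounts) {
        return accounts.stream().collect(
                Collectors.groupingBy(a -> a.state, TreeMap::new,
                        Collectors.averagingDouble(a -> a.balance)));
    }

    public static void main(String[] args) {
        List<Account> accounts = Arrays.asList(
                new Account("TX", 100), new Account("TX", 300), new Account("MD", 50));
        System.out.println(averageBalanceByState(accounts)); // {MD=50.0, TX=200.0}
    }
}
```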

12. Sorting aggregation results

curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword",
        "order": {
          "average_balance": "desc"
        }
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}
'

Returns:

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_state" : {
      "doc_count_error_upper_bound" : -1,
      "sum_other_doc_count" : 827,
      "buckets" : [
        {
          "key" : "CO",
          "doc_count" : 14,
          "average_balance" : {
            "value" : 32460.35714285714
          }
        },
        {
          "key" : "NE",
          "doc_count" : 16,
          "average_balance" : {
            "value" : 32041.5625
          }
        },
        {
          "key" : "AZ",
          "doc_count" : 14,
          "average_balance" : {
            "value" : 31634.785714285714
          }
        },
        {
          "key" : "MT",
          "doc_count" : 17,
          "average_balance" : {
            "value" : 31147.41176470588
          }
        },
        {
          "key" : "VA",
          "doc_count" : 16,
          "average_balance" : {
            "value" : 30600.0625
          }
        },
        {
          "key" : "GA",
          "doc_count" : 19,
          "average_balance" : {
            "value" : 30089.0
          }
        },
        {
          "key" : "MA",
          "doc_count" : 24,
          "average_balance" : {
            "value" : 29600.333333333332
          }
        },
        {
          "key" : "IL",
          "doc_count" : 22,
          "average_balance" : {
            "value" : 29489.727272727272
          }
        },
        {
          "key" : "NM",
          "doc_count" : 14,
          "average_balance" : {
            "value" : 28792.64285714286
          }
        },
        {
          "key" : "LA",
          "doc_count" : 17,
          "average_balance" : {
            "value" : 28791.823529411766
          }
        }
      ]
    }
  }
}

Using the Java API

1. Create a document

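import org.apache.http.HttpHost;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;

RestHighLevelClient client = new RestHighLevelClient(
        RestClient.builder(
                new HttpHost("localhost", 9200, "http")
        ));
// Index request for document "2" in the "posts" index
IndexRequest request = new IndexRequest("posts");
request.id("2");
String jsonString = "{" +
        "\"user\":\"张三\"," +
        "\"postDate\":\"2013-01-30\"," +
        "\"message\":\"测试创建文档 Elasticsearch\"" +
        "}";
request.source(jsonString, XContentType.JSON);

// Synchronous execution
IndexResponse indexResponse = client.index(request, RequestOptions.DEFAULT);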

Open in a browser: http://localhost:9200/posts/_doc/2

Returns:

{
    "_index":"posts",
    "_type":"_doc",
    "_id":"2",
    "_version":1,
    "_seq_no":1,
    "_primary_term":1,
    "found":true,
    "_source":{
        "user":"张三",
        "postDate":"2013-01-30",
        "message":"测试创建文档 Elasticsearch"
    }
}

2. Get a document

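import java.util.Map;
import org.apache.http.HttpHost;
import org.elasticsearch.ElasticsearchException;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.Strings;
import org.elasticsearch.rest.RestStatus;
import org.elasticsearch.search.fetch.subphase.FetchSourceContext;

RestHighLevelClient client = new RestHighLevelClient(
        RestClient.builder(
                new HttpHost("localhost", 9200, "http")
        ));
GetRequest request = new GetRequest("posts", "2");

// Configure which fields of _source are returned
String[] includes = new String[]{"message", "*Date"};
String[] excludes = Strings.EMPTY_ARRAY;
FetchSourceContext fetchSourceContext =
        new FetchSourceContext(true, includes, excludes);
request.fetchSourceContext(fetchSourceContext);

// Synchronous execution
GetResponse getResponse = null;
try {
    getResponse = client.get(request, RequestOptions.DEFAULT);
} catch (ElasticsearchException e) {
    if (e.status() == RestStatus.NOT_FOUND) {
        // 404, for example because the index does not exist
    }
}
// getResponse stays null if the request above threw, so check it first
if (getResponse != null && getResponse.isExists()) {
    // Document source as a String
    String sourceAsString = getResponse.getSourceAsString();
    System.out.println(sourceAsString);
    // Document source as a Map
    Map<String, Object> sourceAsMap = getResponse.getSourceAsMap();
    // Document source as a byte[]
    byte[] sourceAsBytes = getResponse.getSourceAsBytes();
} else {
    // Handle the case where the document was not found
}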

3. Check whether a document exists

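import org.apache.http.HttpHost;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.search.fetch.subphase.FetchSourceContext;

RestHighLevelClient client = new RestHighLevelClient(
        RestClient.builder(
                new HttpHost("localhost", 9200, "http")/*,
                new HttpHost("localhost", 9201, "http")*/
        ));
GetRequest getRequest = new GetRequest("posts", "2");
// Disable fetching _source
getRequest.fetchSourceContext(new FetchSourceContext(false));
// Disable fetching stored fields
getRequest.storedFields("_none_");

// true if the document exists, false otherwise
boolean exists = client.exists(getRequest, RequestOptions.DEFAULT);
System.out.println(exists);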

4. Delete a document

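import org.apache.http.HttpHost;
import org.elasticsearch.action.DocWriteResponse;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.delete.DeleteResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;

RestHighLevelClient client = new RestHighLevelClient(
        RestClient.builder(
                new HttpHost("localhost", 9200, "http")/*,
                new HttpHost("localhost", 9201, "http")*/
        ));
DeleteRequest request = new DeleteRequest("posts", "2");
// Synchronous execution
DeleteResponse deleteResponse = client.delete(
        request, RequestOptions.DEFAULT);
if (deleteResponse.getResult() == DocWriteResponse.Result.NOT_FOUND) {
    System.out.println("Document does not exist");
} else {
    System.out.println("Deleted successfully");
}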

5. Update a document

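import org.apache.http.HttpHost;
import org.elasticsearch.ElasticsearchException;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.action.update.UpdateResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.rest.RestStatus;

RestHighLevelClient client = new RestHighLevelClient(
        RestClient.builder(
                new HttpHost("localhost", 9200, "http")/*,
                new HttpHost("localhost", 9201, "http")*/
        ));
UpdateRequest request = new UpdateRequest("posts", "2");

// Partial document: these fields are merged into the existing document
String jsonString = "{" +
        "\"updated\":\"2017-01-01\"," +
        "\"reason\":\"daily update\"" +
        "}";
request.doc(jsonString, XContentType.JSON);

UpdateResponse updateResponse = null;
try {
    // Synchronous execution
    updateResponse = client.update(
            request, RequestOptions.DEFAULT);
} catch (ElasticsearchException e) {
    if (e.status() == RestStatus.NOT_FOUND) {
        System.out.println("Document does not exist");
    }
}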

Open in a browser: http://localhost:9200/posts/_doc/2

Returns:

{"_index":"posts","_type":"_doc","_id":"2","_version":2,"_seq_no":6,"_primary_term":1,"found":true,"_source":{"user":"张三","postDate":"2013-01-30","message":"测试创建文档 Elasticsearch","reason":"daily update","updated":"2017-01-01"}}

6. Term vectors API

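import java.util.List;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.core.TermVectorsRequest;
import org.elasticsearch.client.core.TermVectorsResponse;

RestHighLevelClient client = new RestHighLevelClient(
        RestClient.builder(
                new HttpHost("localhost", 9200, "http")/*,
                new HttpHost("localhost", 9201, "http")*/
        ));
TermVectorsRequest request = new TermVectorsRequest("posts", "2");
request.setFields("reason");

// Synchronous execution
TermVectorsResponse response =
        client.termvectors(request, RequestOptions.DEFAULT);

// Inspect the term vectors field by field
for (TermVectorsResponse.TermVector tv : response.getTermVectorsList()) {
    // Name of the current field
    String fieldname = tv.getFieldName();
    // Field statistics: number of documents that contain this field
    int docCount = tv.getFieldStatistics().getDocCount();
    // Sum of the total term frequencies of all terms in this field
    long sumTotalTermFreq =
            tv.getFieldStatistics().getSumTotalTermFreq();
    // Sum of the document frequencies of all terms in this field
    long sumDocFreq = tv.getFieldStatistics().getSumDocFreq();
    if (tv.getTerms() != null) {
        // Terms of the current field
        List<TermVectorsResponse.TermVector.Term> terms =
                tv.getTerms();
        for (TermVectorsResponse.TermVector.Term term : terms) {
            // The term itself
            String termStr = term.getTerm();
            // Term frequency: occurrences of the term in this document's field
            int termFreq = term.getTermFreq();
            // The values below are held in boxed types because they can be absent
            // from the response (for example when term statistics are not requested)
            // Document frequency: number of documents containing the term
            Integer docFreq = term.getDocFreq();
            // Total term frequency across all documents
            Long totalTermFreq = term.getTotalTermFreq();
            // Score of the term
            Float score = term.getScore();
            if (term.getTokens() != null) {
                // Tokens (occurrences) of the term
                List<TermVectorsResponse.TermVector.Token> tokens =
                        term.getTokens();
                for (TermVectorsResponse.TermVector.Token token : tokens) {
                    // Position of the token
                    Integer position = token.getPosition();
                    // Start offset of the token
                    Integer startOffset = token.getStartOffset();
                    // End offset of the token
                    Integer endOffset = token.getEndOffset();
                    // Payload of the token
                    String payload = token.getPayload();
                }
            }
        }
    }
}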

Returns:

{
    "_index":"posts",
    "_type":"_doc",
    "_id":"2",
    "_version":2,
    "found":true,
    "took":2,
    "term_vectors":{
        "reason":{
            "field_statistics":{
                "sum_doc_freq":2,
                "doc_count":1,
                "sum_ttf":2
            },
            "terms":{
                "daily":{
                    "term_freq":1,
                    "tokens":[
                        {
                            "position":0,
                            "start_offset":0,
                            "end_offset":5
                        }
                    ]
                },
                "update":{
                    "term_freq":1,
                    "tokens":[
                        {
                            "position":1,
                            "start_offset":6,
                            "end_offset":12
                        }
                    ]
                }
            }
        }
    }
}

References:

https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started.html

https://www.elastic.co/guide/en/elasticsearch/client/java-rest/7.3/index.html

Book: Elasticsearch in Action (《Elasticsearch实战》)

 
