Getting Started with Elasticsearch (Part 1)

Author: 江上飞鱼

Article link: https://blog.csdn.net/jianghuiyun/article/details/100057909

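ES version: 7.3.1

Some concepts:

Term frequency (TF): the more often the search term appears in a document, the higher that document scores.

Inverse document frequency (IDF): the rarer a term is across all documents, the higher its weight, and the higher the resulting score.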
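
To make the two definitions above concrete, here is a minimal sketch of the classic textbook TF-IDF weighting. It is only an illustration: the class name, the sample corpus, and the ln(N / df) form of IDF are assumptions of this sketch, and Elasticsearch 7.x scores with BM25 by default rather than with this exact formula.

```java
import java.util.Arrays;
import java.util.List;

public class TfIdfSketch {
    /** Term frequency: how many times the term occurs in one tokenized document. */
    public static long termFrequency(String term, List<String> doc) {
        return doc.stream().filter(term::equals).count();
    }

    /** Inverse document frequency as ln(N / df); 0 when no document contains the term. */
    public static double inverseDocumentFrequency(String term, List<List<String>> corpus) {
        long df = corpus.stream().filter(doc -> doc.contains(term)).count();
        return df == 0 ? 0.0 : Math.log((double) corpus.size() / df);
    }

    /** Textbook weight: tf * idf (not the BM25 formula Elasticsearch uses). */
    public static double tfIdf(String term, List<String> doc, List<List<String>> corpus) {
        return termFrequency(term, doc) * inverseDocumentFrequency(term, corpus);
    }

    public static void main(String[] args) {
        // Hypothetical, already tokenized documents.
        List<List<String>> corpus = Arrays.asList(
                Arrays.asList("elasticsearch", "search", "search"),
                Arrays.asList("search", "engine"),
                Arrays.asList("lucene", "engine"));
        System.out.println("search:        " + tfIdf("search", corpus.get(0), corpus));
        System.out.println("elasticsearch: " + tfIdf("elasticsearch", corpus.get(0), corpus));
    }
}
```

In this toy corpus "elasticsearch" occurs in only one document, so its IDF is higher than that of "search", which occurs in two.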

How these map to relational database concepts:

ES	index	type	document
RDBMS	database	table	row

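Logical design: what a search application needs to be aware of. The basic unit for indexing and searching is the document, which you can think of as a row in a relational database. Documents are grouped into types, and a type contains documents much as a table contains rows. One or more types live in an index, the larger container, comparable to a database in the SQL world. (In the 7.x series used here, mapping types are deprecated and an index effectively holds a single type, _doc.)

Physical design: how Elasticsearch handles data behind the scenes. Elasticsearch divides each index into shards, and each shard can migrate between servers in the cluster. This matters when administering a cluster, because the physical configuration determines the cluster's performance, scalability, and availability.

Before 7.0, each index was created with 5 primary shards by default, each with one replica. Since 7.0, which includes the 7.3.1 used here, the default is 1 primary shard with one replica; the pri and rep columns for the bank and customer indices in the _cat/indices output further down show this.

Shard: a shard is the smallest unit Elasticsearch works with. Each shard is a Lucene index: a directory of files containing an inverted index.

Replica shards can be added or removed at runtime; the number of primary shards cannot, because it is fixed when the index is created.

An index consists of one or more primary shards and zero or more replica shards.

Inverted index: the structure of an inverted index lets Elasticsearch tell you which documents contain a particular term (word) without scanning all documents.

Example: what the first primary shard of the get-together index might contain. The shard is named get-together0; it is a Lucene index, that is, an inverted index. By default it stores the original document content plus additional information, such as the term dictionary and term frequencies, that helps with searching.

Term dictionary: the term dictionary maps each term to the documents that contain it. At search time Elasticsearch does not need to scan every document for a term; it uses this dictionary to identify matching documents quickly.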
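
The term dictionary idea can be sketched in a few lines of plain Java. This is a toy model, not how Lucene stores its index on disk: the class name, the lowercase-and-split tokenizer, and the sample addresses (other than "198 Mill Lane", which appears in the bank data later) are assumptions made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

public class InvertedIndexSketch {
    // Term dictionary: each term maps to the sorted IDs of the documents containing it.
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    /** Tokenizes the text (lowercase, split on non-word characters) and records the doc ID per term. */
    public void add(int docId, String text) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Looks up a single term without scanning any document. */
    public SortedSet<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptySortedSet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "198 Mill Lane");
        index.add(2, "171 Putnam Avenue"); // hypothetical address
        index.add(3, "990 Mill Road");     // hypothetical address
        System.out.println(index.search("mill"));
        System.out.println(index.search("lane"));
    }
}
```

A lookup is a single map access per term, which is why the cost of a search does not grow with the size of each document.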

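What happens when a document is indexed: by default, the system first selects a primary shard from a hash of the document ID and sends the document to it. The document is then sent to all replicas of that primary shard to be indexed. This keeps the replicas in sync with the primary, so replicas can serve search requests and be promoted to primary if the original primary becomes unavailable; searches can also be load-balanced across primary and replica shards.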
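
The shard selection step can be sketched as a hash of the routing value modulo the number of primary shards. This is a simplification under stated assumptions: Elasticsearch hashes the routing value (the document ID by default) with Murmur3, while the sketch uses String.hashCode as a stand-in, and the class and method names are hypothetical.

```java
public class ShardRoutingSketch {
    /**
     * Picks a primary shard for a routing value.
     * String.hashCode is a stand-in for the Murmur3 hash Elasticsearch uses.
     */
    public static int selectShard(String routing, int numPrimaryShards) {
        if (numPrimaryShards <= 0) {
            throw new IllegalArgumentException("numPrimaryShards must be positive");
        }
        // floorMod keeps the result in [0, numPrimaryShards) even for negative hashes.
        return Math.floorMod(routing.hashCode(), numPrimaryShards);
    }

    public static void main(String[] args) {
        for (String id : new String[] {"1", "2", "3"}) {
            System.out.println("id " + id
                    + " -> shard " + selectShard(id, 5) + " of 5"
                    + ", shard " + selectShard(id, 3) + " of 3");
        }
    }
}
```

Because the shard number depends on the primary shard count, changing that count would send existing IDs to different shards, which is one way to see why the number of primaries is fixed at index creation.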

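Cluster: a node is an instance of Elasticsearch. When you start Elasticsearch on a server you have one node; start another instance on a second server and you have another node. You can even run several Elasticsearch processes on one server to get several nodes there. Multiple nodes can join the same cluster. A cluster benefits both performance and stability, but it has a drawback: you must make sure the nodes can communicate with each other quickly enough and that no split brain occurs (two parts of the cluster cannot reach each other, and each assumes the other is down).

Horizontal scaling: add more nodes to the same cluster; the existing shards are rebalanced across all nodes.

Vertical scaling: give the Elasticsearch nodes more hardware resources.

Download: https://www.elastic.co/cn/downloads/elasticsearch

Extract the archive.

Run bin/elasticsearch (or bin\elasticsearch.bat on Windows).

Then open: http://localhost:9200/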

1. Create a document

curl -X PUT "localhost:9200/customer/_doc/1?pretty" -H 'Content-Type: application/json' -d'
{
  "name": "John Doe"
}
'

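Notes:

Parameter: pretty=true, or just pretty; this article uses the latter, whether or not the request goes through curl. By default the JSON response comes back on a single line; pretty makes it easier to read.

If the customer index does not exist yet, Elasticsearch creates it automatically by default and generates a new mapping for the _doc type.

-d is an optional curl flag; it supplies the request body that follows.

-H sets a request header.

Response (the found and _source fields below are what a GET of the same document returns; the response to the PUT itself carries result and _shards instead, like the _bulk items in step 3):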

{
  "_index" : "customer",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "John Doe"
  }
}

2. Retrieve a document

curl -X GET "localhost:9200/customer/_doc/1?pretty"

{
  "_index" : "customer",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "John Doe"
  }
}

3. Bulk operations with _bulk

curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_bulk?pretty&refresh" --data-binary "@accounts.json"

Note: batches of 1,000 to 5,000 documents with a total payload of 5 MB to 15 MB are a good starting point.

Test data: https://raw.githubusercontent.com/elastic/elasticsearch/master/docs/src/test/resources/accounts.json

{
  "took" : 533,
  "errors" : false,
  "items" : [
    {
      "index" : {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "1",
        "_version" : 1,
        "result" : "created",
        "forced_refresh" : true,
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 201
      }
    },
...// omitted
}
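
The body sent to _bulk is newline-delimited JSON: one action line followed by one source line per document, with a trailing newline at the end. A minimal sketch of assembling such a body in plain Java follows. The class and method names are hypothetical, and the sketch assumes IDs need no JSON escaping and that each source document is already a single-line JSON string.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BulkBodySketch {
    /** Builds an NDJSON bulk body: an "index" action line, then the source line, per document. */
    public static String build(Map<String, String> sourcesById) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> entry : sourcesById.entrySet()) {
            // Assumption: the ID contains no characters that need JSON escaping.
            body.append("{\"index\":{\"_id\":\"").append(entry.getKey()).append("\"}}\n");
            // Assumption: the source is valid JSON on a single line.
            body.append(entry.getValue()).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("1", "{\"name\":\"John Doe\"}");
        docs.put("2", "{\"name\":\"Jane Doe\"}"); // hypothetical second document
        System.out.print(build(docs));
    }
}
```

The resulting string could be saved to a file and posted with the same --data-binary option used in the curl command above.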

Check the result:

curl "localhost:9200/_cat/indices?v"

health status index        uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   commodity    9srG6uQ4SCaM8A6U7q4epQ   5   1          0            0      1.3kb          1.3kb
yellow open   bank         M9VIEPM2TaSkPrRrhS-R9Q   1   1       1000            0    414.3kb        414.3kb
yellow open   book-index   g7zhCgucSoeJknd08ygxww   3   2          4            0      7.5kb          7.5kb
yellow open   get-together Xb9iBol9R1WHJGQfs0GIeA   1   1          1            0      4.3kb          4.3kb
yellow open   book         xJ3-ThlRSHChfrRIxOOOyw   5   1          0            0      1.3kb          1.3kb
yellow open   customer     StLOulvsSlSM4aU_o6LL1Q   1   1          1            0      3.5kb          3.5kb

4. Search with _search

curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ]
}
'

Note: by default, hits contains the first 10 matching documents.

{
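  "took" : 14,// time the query took, in milliseconds
  "timed_out" : false,// whether the search request timed out
  "_shards" : {// shard information
    "total" : 1,// total number of shards searched
    "successful" : 1,// number of shards on which the search succeeded
    "skipped" : 0,// number of shards skipped
    "failed" : 0// number of shards that failed
  },
  "hits" : {
    "total" : {
      "value" : 1000,// total number of matching documents
      "relation" : "eq"
    },
    "max_score" : null,// highest relevance score among the hits (null here because results are sorted by a field)
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "0",
        "_score" : null,// relevance score of this document (null when sorting by a field instead of by score)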
        "_source" : {
          "account_number" : 0,
          "balance" : 16623,
          "firstname" : "Bradshaw",
          "lastname" : "Mckenzie",
          "age" : 29,
          "gender" : "F",
          "address" : "244 Columbus Place",
          "employer" : "Euron",
          "email" : "bradshawmckenzie@euron.com",
          "city" : "Hobucken",
          "state" : "CO"
        },
        "sort" : [// sort values for this document (present when not sorting by relevance score); here the account_number
          0
        ]
      },
...// omitted
}

5. Paginate search results with from and size

curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ],
  "from": 10,
  "size": 10
}
'

Note: from is the zero-based offset of the first hit to return, and size is the number of hits to return.

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "10",
        "_score" : null,
        "_source" : {
          "account_number" : 10,
          "balance" : 46170,
          "firstname" : "Dominique",
          "lastname" : "Park",
          "age" : 37,
          "gender" : "F",
          "address" : "100 Gatling Place",
          "employer" : "Conjurica",
          "email" : "dominiquepark@conjurica.com",
          "city" : "Omar",
          "state" : "NJ"
        },
        "sort" : [
          10
        ]
      },
...// omitted
}
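
The from and size values in the request above correspond to the second page of ten results. A small sketch of that arithmetic, with hypothetical helper names, looks like this; it only builds the request body as a string and does not talk to Elasticsearch. Keep in mind that from + size is limited by the index.max_result_window setting, which defaults to 10000, so this style of paging is not meant for very deep result sets.

```java
public class PaginationSketch {
    /** Converts a 1-based page number into the zero-based "from" offset. */
    public static int fromOffset(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be >= 1");
        }
        return (page - 1) * size;
    }

    /** Builds the same match_all request body as above for an arbitrary page. */
    public static String searchBody(int page, int size) {
        return String.format(
                "{\"query\":{\"match_all\":{}},\"sort\":[{\"account_number\":\"asc\"}],"
                        + "\"from\":%d,\"size\":%d}",
                fromOffset(page, size), size);
    }

    public static void main(String[] args) {
        System.out.println(searchBody(2, 10));
    }
}
```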

6. Query with match

curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": { "match": { "address": "mill lane" } }
}
'

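Note: returns the customer documents in the bank index whose address field contains mill or lane. A match query combines its terms with OR by default, so a document needs only one of the two terms to match.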

{
  "took" : 15,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 19,
      "relation" : "eq"
    },
    "max_score" : 9.507477,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "136",
        "_score" : 9.507477,
        "_source" : {
          "account_number" : 136,
          "balance" : 45801,
          "firstname" : "Winnie",
          "lastname" : "Holland",
          "age" : 38,
          "gender" : "M",
          "address" : "198 Mill Lane",
          "employer" : "Neteria",
          "email" : "winnieholland@neteria.com",
          "city" : "Urie",
          "state" : "IL"
        }
      },
...// omitted
}

7. match_phrase: search for the whole phrase rather than individual terms

curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": { "match_phrase": { "address": "mill lane" } }
}
'

Note: returns the customer documents whose address field contains the phrase "mill lane".

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 9.507477,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "136",
        "_score" : 9.507477,
        "_source" : {
          "account_number" : 136,
          "balance" : 45801,
          "firstname" : "Winnie",
          "lastname" : "Holland",
          "age" : 38,
          "gender" : "M",
          "address" : "198 Mill Lane",
          "employer" : "Neteria",
          "email" : "winnieholland@neteria.com",
          "city" : "Urie",
          "state" : "IL"
        }
      }
    ]
  }
}
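
The difference between the 19 hits of match and the single hit of match_phrase can be modelled on token lists: match needs any one query term, match_phrase needs all terms in order and adjacent. The sketch below is a rough model under stated assumptions: it tokenizes by lowercasing and splitting on whitespace, whereas Elasticsearch uses analyzers and term positions in the inverted index, and all names plus the second and third addresses are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class MatchVsPhraseSketch {
    /** Naive analyzer stand-in: lowercase, then split on whitespace. */
    static List<String> tokenize(String text) {
        return Arrays.asList(text.toLowerCase().trim().split("\\s+"));
    }

    /** match with the default OR operator: at least one query term occurs in the field. */
    public static boolean matchAny(String field, String query) {
        List<String> fieldTokens = tokenize(field);
        return tokenize(query).stream().anyMatch(fieldTokens::contains);
    }

    /** match_phrase: the query terms occur consecutively and in the same order. */
    public static boolean matchPhrase(String field, String query) {
        return Collections.indexOfSubList(tokenize(field), tokenize(query)) >= 0;
    }

    public static void main(String[] args) {
        String query = "mill lane";
        for (String address : new String[] {"198 Mill Lane", "990 Mill Road", "171 Putnam Avenue"}) {
            System.out.println(address + " -> match=" + matchAny(address, query)
                    + ", match_phrase=" + matchPhrase(address, query));
        }
    }
}
```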

8. Complex queries with bool

curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": "40" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}
'

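Note: must clauses are required, should clauses are desirable, and must_not clauses rule documents out. This example returns the customers who are 40 years old and do not live in Idaho (ID).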

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 43,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "474",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 474,
          "balance" : 35896,
          "firstname" : "Obrien",
          "lastname" : "Walton",
          "age" : 40,
          "gender" : "F",
          "address" : "192 Ide Court",
          "employer" : "Suremax",
          "email" : "obrienwalton@suremax.com",
          "city" : "Crucible",
          "state" : "UT"
        }
      },
...// omitted
}
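
The must / must_not combination above behaves like a logical "A and not B" over the documents. A minimal in-memory sketch with java.util.function.Predicate shows the same logic; the Account class and the sample data are hypothetical, and the sketch ignores scoring entirely.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class BoolQuerySketch {
    /** Hypothetical, trimmed-down account with only the two fields the query uses. */
    public static final class Account {
        public final int age;
        public final String state;

        public Account(int age, String state) {
            this.age = age;
            this.state = state;
        }
    }

    /** Keeps accounts that satisfy the must clause and do not satisfy the must_not clause. */
    public static List<Account> filter(List<Account> accounts) {
        Predicate<Account> must = a -> a.age == 40;            // { "match": { "age": "40" } }
        Predicate<Account> mustNot = a -> "ID".equals(a.state); // { "match": { "state": "ID" } }
        return accounts.stream()
                .filter(must.and(mustNot.negate()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Account> accounts = Arrays.asList(
                new Account(40, "UT"), new Account(40, "ID"), new Account(29, "CO"));
        for (Account a : filter(accounts)) {
            System.out.println(a.age + " " + a.state);
        }
    }
}
```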

9. Range queries

curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 20000,
            "lte": 30000
          }
        }
      }
    }
  }
}
'

Note: returns the customer documents whose balance is between 20000 and 30000, inclusive.

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 217,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "49",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 49,
          "balance" : 29104,
          "firstname" : "Fulton",
          "lastname" : "Holt",
          "age" : 23,
          "gender" : "F",
          "address" : "451 Humboldt Street",
          "employer" : "Anocha",
          "email" : "fultonholt@anocha.com",
          "city" : "Sunriver",
          "state" : "RI"
        }
      },
...// omitted
}

10. Aggregations

curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      }
    }
  }
}
'

Note: groups all accounts in the bank index by state and returns the ten states with the most accounts, in descending order.

{
  "took" : 22,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_state" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 743,
      "buckets" : [
        {
          "key" : "TX",
          "doc_count" : 30
        },
        {
          "key" : "MD",
          "doc_count" : 28
        },
        {
          "key" : "ID",
          "doc_count" : 27
        },
        {
          "key" : "AL",
          "doc_count" : 25
        },
        {
          "key" : "ME",
          "doc_count" : 25
        },
        {
          "key" : "TN",
          "doc_count" : 25
        },
        {
          "key" : "WY",
          "doc_count" : 25
        },
        {
          "key" : "DC",
          "doc_count" : 24
        },
        {
          "key" : "MA",
          "doc_count" : 24
        },
        {
          "key" : "ND",
          "doc_count" : 24
        }
      ]
    }
  }
}
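
Conceptually, the terms aggregation above counts documents per key and keeps the largest buckets. The sketch below reproduces that on an in-memory list with Java streams. It is an illustration only: the names are hypothetical, the sample data is made up, and ties are broken by key in ascending order, which matches what the output above shows for the states with 25 and 24 accounts.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class TermsAggSketch {
    /** Counts occurrences per state and returns the top buckets, largest count first. */
    public static Map<String, Long> topBuckets(List<String> states, int size) {
        Map<String, Long> counts = states.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.comparingByKey()))
                .limit(size)
                // LinkedHashMap preserves the sorted bucket order.
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        // Hypothetical state values, one per account.
        List<String> states = Arrays.asList("TX", "TX", "TX", "MD", "MD", "ID");
        System.out.println(topBuckets(states, 2));
    }
}
```

In the response above, sum_other_doc_count is 743 because the ten returned buckets hold 257 of the 1000 documents.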

11. Nested aggregations

curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}
'

Note: nests an avg aggregation inside the previous group_by_state aggregation to compute the average account balance for each state.

{
  "took" : 9,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_state" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 743,
      "buckets" : [
        {
          "key" : "TX",
          "doc_count" : 30,
          "average_balance" : {
            "value" : 26073.3
          }
        },
        {
          "key" : "MD",
          "doc_count" : 28,
          "average_balance" : {
            "value" : 26161.535714285714
          }
        },
        {
          "key" : "ID",
          "doc_count" : 27,
          "average_balance" : {
            "value" : 24368.777777777777
          }
        },
        {
          "key" : "AL",
          "doc_count" : 25,
          "average_balance" : {
            "value" : 25739.56
          }
        },
        {
          "key" : "ME",
          "doc_count" : 25,
          "average_balance" : {
            "value" : 21663.0
          }
        },
        {
          "key" : "TN",
          "doc_count" : 25,
          "average_balance" : {
            "value" : 28365.4
          }
        },
        {
          "key" : "WY",
          "doc_count" : 25,
          "average_balance" : {
            "value" : 21731.52
          }
        },
        {
          "key" : "DC",
          "doc_count" : 24,
          "average_balance" : {
            "value" : 23180.583333333332
          }
        },
        {
          "key" : "MA",
          "doc_count" : 24,
          "average_balance" : {
            "value" : 29600.333333333332
          }
        },
        {
          "key" : "ND",
          "doc_count" : 24,
          "average_balance" : {
            "value" : 26577.333333333332
          }
        }
      ]
    }
  }
}

12. Sort aggregation results

curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword",
        "order": {
          "average_balance": "desc"
        }
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}
'

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_state" : {
      "doc_count_error_upper_bound" : -1,
      "sum_other_doc_count" : 827,
      "buckets" : [
        {
          "key" : "CO",
          "doc_count" : 14,
          "average_balance" : {
            "value" : 32460.35714285714
          }
        },
        {
          "key" : "NE",
          "doc_count" : 16,
          "average_balance" : {
            "value" : 32041.5625
          }
        },
        {
          "key" : "AZ",
          "doc_count" : 14,
          "average_balance" : {
            "value" : 31634.785714285714
          }
        },
        {
          "key" : "MT",
          "doc_count" : 17,
          "average_balance" : {
            "value" : 31147.41176470588
          }
        },
        {
          "key" : "VA",
          "doc_count" : 16,
          "average_balance" : {
            "value" : 30600.0625
          }
        },
        {
          "key" : "GA",
          "doc_count" : 19,
          "average_balance" : {
            "value" : 30089.0
          }
        },
        {
          "key" : "MA",
          "doc_count" : 24,
          "average_balance" : {
            "value" : 29600.333333333332
          }
        },
        {
          "key" : "IL",
          "doc_count" : 22,
          "average_balance" : {
            "value" : 29489.727272727272
          }
        },
        {
          "key" : "NM",
          "doc_count" : 14,
          "average_balance" : {
            "value" : 28792.64285714286
          }
        },
        {
          "key" : "LA",
          "doc_count" : 17,
          "average_balance" : {
            "value" : 28791.823529411766
          }
        }
      ]
    }
  }
}
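
The nested avg aggregation and the order clause can be modelled as group, average, then sort by the average in descending order. The sketch below does this with Java streams on hypothetical in-memory data; the class names and balances are made up for illustration. As far as I know, the doc_count_error_upper_bound of -1 in the response above means the error cannot be determined when buckets are ordered by a sub-aggregation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class AvgBucketSortSketch {
    /** Hypothetical, trimmed-down account with only the fields the aggregation uses. */
    public static final class Account {
        public final String state;
        public final double balance;

        public Account(String state, double balance) {
            this.state = state;
            this.balance = balance;
        }
    }

    /** Groups by state, averages the balance per group, and returns the top groups by average. */
    public static List<Map.Entry<String, Double>> averageByStateDesc(List<Account> accounts, int size) {
        Map<String, Double> averages = accounts.stream()
                .collect(Collectors.groupingBy(a -> a.state,
                        Collectors.averagingDouble(a -> a.balance)));
        return averages.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(size)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Account> accounts = Arrays.asList(
                new Account("CO", 30000), new Account("CO", 34000),
                new Account("TX", 20000), new Account("NE", 31000));
        averageByStateDesc(accounts, 2)
                .forEach(e -> System.out.println(e.getKey() + " " + e.getValue()));
    }
}
```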

Using the Java API

1. Create a document

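import org.apache.http.HttpHost;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;

RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(
                        new HttpHost("localhost", 9200, "http")
                ));
IndexRequest request = new IndexRequest("posts");
request.id("2");
// Document source as a JSON string
String jsonString = "{" +
                "\"user\":\"张三\"," +
                "\"postDate\":\"2013-01-30\"," +
                "\"message\":\"测试创建文档 Elasticsearch\"" +
                "}";
request.source(jsonString, XContentType.JSON);

// Execute synchronously
IndexResponse indexResponse = client.index(request, RequestOptions.DEFAULT);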

Open in a browser: http://localhost:9200/posts/_doc/2

{
    "_index":"posts",
    "_type":"_doc",
    "_id":"2",
    "_version":1,
    "_seq_no":1,
    "_primary_term":1,
    "found":true,
    "_source":{
        "user":"张三",
        "postDate":"2013-01-30",
        "message":"测试创建文档 Elasticsearch"
    }
}

2. Get a document

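import java.util.Map;

import org.apache.http.HttpHost;
import org.elasticsearch.ElasticsearchException;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.Strings;
import org.elasticsearch.rest.RestStatus;
import org.elasticsearch.search.fetch.subphase.FetchSourceContext;

RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(
                        new HttpHost("localhost", 9200, "http")
                ));
GetRequest request = new GetRequest(
                "posts",
                "2");

        /** Configure which fields of _source are returned */
        String[] includes = new String[]{"message", "*Date"};
        String[] excludes = Strings.EMPTY_ARRAY;
        FetchSourceContext fetchSourceContext =
                new FetchSourceContext(true, includes, excludes);
        request.fetchSourceContext(fetchSourceContext);

        // Execute synchronously
        GetResponse getResponse = null;
        try {
            getResponse = client.get(request, RequestOptions.DEFAULT);
        } catch (ElasticsearchException e) {
            if (e.status() == RestStatus.NOT_FOUND) {
                // Not found (for example, the index does not exist)
            }
        }
        // getResponse stays null if the call above threw, so check it first
        if (getResponse != null && getResponse.isExists()) {
            // Document source as a String
            String sourceAsString = getResponse.getSourceAsString();
            System.out.println(sourceAsString);
            // Document source as a Map
            Map<String, Object> sourceAsMap = getResponse.getSourceAsMap();
            // Document source as a byte[]
            byte[] sourceAsBytes = getResponse.getSourceAsBytes();
        } else {
            // Handle the case where the document was not found

        }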

3. Check whether a document exists

 RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(
                        new HttpHost("localhost", 9200, "http")/*,
                        new HttpHost("localhost", 9201, "http")*/
                ));
GetRequest getRequest = new GetRequest(
                "posts",
                "2");
        //Disable fetching _source
        getRequest.fetchSourceContext(new FetchSourceContext(false));
        //Disable fetching stored fields.
        getRequest.storedFields("_none_");

        boolean exists = client.exists(getRequest, RequestOptions.DEFAULT);
        System.out.println(exists);

4. Delete a document

RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(
                        new HttpHost("localhost", 9200, "http")/*,
                        new HttpHost("localhost", 9201, "http")*/
                ));
DeleteRequest request = new DeleteRequest(
                "posts",
                "2");
        DeleteResponse deleteResponse = client.delete(
                request, RequestOptions.DEFAULT);
        if (deleteResponse.getResult() == DocWriteResponse.Result.NOT_FOUND) {
            System.out.println("Document not found");
        } else {
            System.out.println("Document deleted");
        }

5. Update a document

RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(
                        new HttpHost("localhost", 9200, "http")/*,
                        new HttpHost("localhost", 9201, "http")*/
                ));
UpdateRequest request = new UpdateRequest(
                "posts",
                "2");

        String jsonString = "{" +
                "\"updated\":\"2017-01-01\"," +
                "\"reason\":\"daily update\"" +
                "}";
        request.doc(jsonString, XContentType.JSON);

        UpdateResponse updateResponse = null;
        try {
            updateResponse = client.update(
                    request, RequestOptions.DEFAULT);
        } catch (ElasticsearchException e) {
            if (e.status() == RestStatus.NOT_FOUND) {
                System.out.println("Document does not exist");
            }
        }

Open in a browser: http://localhost:9200/posts/_doc/2

{"_index":"posts","_type":"_doc","_id":"2","_version":2,"_seq_no":6,"_primary_term":1,"found":true,"_source":{"user":"张三","postDate":"2013-01-30","message":"测试创建文档 Elasticsearch","reason":"daily update","updated":"2017-01-01"}}

6. Term vectors API

RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(
                        new HttpHost("localhost", 9200, "http")/*,
                        new HttpHost("localhost", 9201, "http")*/
                ));
TermVectorsRequest request = new TermVectorsRequest("posts", "2");
        request.setFields("reason");

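        /** Execute synchronously */
        TermVectorsResponse response =
                client.termvectors(request, RequestOptions.DEFAULT);

        /** Read the details of each term vector */
        for (TermVectorsResponse.TermVector tv : response.getTermVectorsList()) {
            /** Name of the current field */
            String fieldname = tv.getFieldName();
            /** Field statistics: number of documents that have this field */
            int docCount = tv.getFieldStatistics().getDocCount();
            /** Sum of the total term frequencies of all terms in this field */
            long sumTotalTermFreq =
                    tv.getFieldStatistics().getSumTotalTermFreq();
            /** Sum of the document frequencies of all terms in this field (not an inverse document frequency) */
            long sumDocFreq = tv.getFieldStatistics().getSumDocFreq();
            if (tv.getTerms() != null) {
                /** Terms of the current field */
                List<TermVectorsResponse.TermVector.Term> terms =
                        tv.getTerms();
                for (TermVectorsResponse.TermVector.Term term : terms) {
                    /** Text of the term */
                    String termStr = term.getTerm();
                    /** Term frequency: occurrences of the term in this document's field */
                    int termFreq = term.getTermFreq();
                    /** Document frequency: number of documents containing the term.
                        Boxed on the assumption that it is null when term statistics were not
                        requested; the output below has no doc_freq for the terms. */
                    Integer docFreq = term.getDocFreq();
                    /** Total term frequency across all documents (same assumption) */
                    Long totalTermFreq = term.getTotalTermFreq();
                    /** Score of the term (same assumption) */
                    Float score = term.getScore();
                    if (term.getTokens() != null) {
                        /** Tokens (individual occurrences) of the term */
                        List<TermVectorsResponse.TermVector.Token> tokens =
                                term.getTokens();
                        for (TermVectorsResponse.TermVector.Token token : tokens) {
                            /** Position of the token */
                            int position = token.getPosition();
                            /** Start offset of the token */
                            int startOffset = token.getStartOffset();
                            /** End offset of the token */
                            int endOffset = token.getEndOffset();
                            /** Payload of the token */
                            String payload = token.getPayload();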
                        }
                    }
                }
            }
        }

Console output:

{
    "_index":"posts",
    "_type":"_doc",
    "_id":"2",
    "_version":2,
    "found":true,
    "took":2,
    "term_vectors":{
        "reason":{
            "field_statistics":{
                "sum_doc_freq":2,
                "doc_count":1,
                "sum_ttf":2
            },
            "terms":{
                "daily":{
                    "term_freq":1,
                    "tokens":[
                        {
                            "position":0,
                            "start_offset":0,
                            "end_offset":5
                        }
                    ]
                },
                "update":{
                    "term_freq":1,
                    "tokens":[
                        {
                            "position":1,
                            "start_offset":6,
                            "end_offset":12
                        }
                    ]
                }
            }
        }
    }
}
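
The position, start_offset, and end_offset values in this output can be reproduced by hand. The sketch below tokenizes the reason field's value, "daily update", by splitting on whitespace and lowercasing, which is only an approximation of the analyzer Elasticsearch applies; the class and field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TermVectorSketch {
    /** One occurrence of a term, with its ordinal position and character offsets. */
    public static final class Token {
        public final String term;
        public final int position;
        public final int startOffset;
        public final int endOffset;

        public Token(String term, int position, int startOffset, int endOffset) {
            this.term = term;
            this.position = position;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** Whitespace tokenizer stand-in: records position and [start, end) offsets per token. */
    public static List<Token> analyze(String text) {
        List<Token> tokens = new ArrayList<>();
        Matcher matcher = Pattern.compile("\\S+").matcher(text);
        int position = 0;
        while (matcher.find()) {
            tokens.add(new Token(matcher.group().toLowerCase(), position++,
                    matcher.start(), matcher.end()));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : analyze("daily update")) {
            System.out.println(t.term + " position=" + t.position
                    + " start_offset=" + t.startOffset + " end_offset=" + t.endOffset);
        }
    }
}
```

For "daily update" this yields the same positions and offsets as the term_vectors response above: daily at position 0 with offsets 0 to 5, update at position 1 with offsets 6 to 12.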

References:

https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started.html

https://www.elastic.co/guide/en/elasticsearch/client/java-rest/7.3/index.html

Book: Elasticsearch in Action (《Elasticsearch实战》)
