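Elasticsearch 6
- Introduction to Elasticsearch
- Overview and use cases
Elasticsearch (ES) is an open-source, highly scalable, distributed full-text search engine built on Lucene. It can store, search, and analyze large volumes of data very quickly.
It scales horizontally with little effort: capacity can be added or removed as needed, and a cluster can grow to hundreds of servers that store and process petabytes of data.
It is written in Java and hides the complexity of Lucene behind a simple RESTful API.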
Official site: Elasticsearch: the official distributed search and analytics engine | Elastic
https://www.elastic.co/cn/elasticsearch/features
- Storage structure
Elasticsearch is a document-oriented store: each record is a document, and documents are serialized as JSON. For example:
    {
      "name": "张三",
      "age": 28,
      "phone": "13111111111",
      "address": "安徽省合肥市"
    }
Elasticsearch compared with a relational database:
    Elasticsearch    Relational database
    Index            Database
    Type             Table
    Document         Row
    Field            Column
- Installing Elasticsearch
Guide: https://blog.csdn.net/xiaobo5264063/article/details/114555170
- Forward index and inverted index
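- Forward index
A forward index is keyed by document ID and records, for each document, the terms it contains and their positions. A lookup scans the terms of every document until all documents containing the query keyword have been found.
A forward index is simple to build and easy to maintain, but every query has to scan all documents. When documents contain many terms this wastes resources, lengthens search time, and lowers retrieval efficiency.
- Inverted index
An inverted index is keyed by character or term. Each key maps to all documents in which the term appears, and each entry records the document ID and the positions of the term within that document.
Because the set of documents for each term changes as data changes, an inverted index is more complex to build and maintain. In return, a single lookup yields every document for the query keyword, so queries are far more efficient than with a forward index.
When ES stores a document it keeps two things by default: the original data in _source, and an inverted index built through analysis (tokenization, sorting, and so on), which holds the mapping between terms and documents.
At search time ES looks the query terms up in the inverted index, uses the postings lists to find the matching set of documents, then scores, sorts, and highlights them before returning the results.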
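The lookup described above can be sketched in a few lines of Java. This is a toy model only, not how Lucene stores its index: the class name `InvertedIndexSketch`, the document IDs, and the pre-tokenized terms are all assumptions made for illustration, and term positions are left out for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy inverted index: maps each term to the sorted IDs of the documents containing it. */
final class InvertedIndexSketch {
    private final Map<String, TreeSet<Integer>> postings = new TreeMap<>();

    /** Indexes one document; the terms are assumed to come from an analyzer already. */
    void add(int docId, List<String> terms) {
        for (String term : terms) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    /** Reads the postings of one term directly instead of scanning every document. */
    List<Integer> search(String term) {
        Set<Integer> ids = postings.get(term);
        return ids == null ? new ArrayList<>() : new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        // Hypothetical documents, already split into terms
        index.add(2, Arrays.asList("桃花", "春风"));
        index.add(3, Arrays.asList("桃花", "山寺"));
        index.add(5, Arrays.asList("春风", "剪刀"));
        System.out.println(index.search("桃花")); // [2, 3]
        System.out.println(index.search("春风")); // [2, 5]
        System.out.println(index.search("明月")); // []
    }
}
```

A forward index would have to visit documents 2, 3, and 5 in turn for each query; here the cost of a lookup depends on the term, not on the number of documents.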
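- Document mapping
A mapping assigns properties such as the data type and analyzer to each field of a document.
- Core data types
Documentation: Field datatypes | Elasticsearch Guide [6.4] | Elastic
    Category       Data types
    Core           text, keyword, long, integer, short, byte, double, date, boolean, and others
    Complex        object (a single JSON object), nested (an array of JSON objects)
    Geo            geo_point (latitude/longitude), geo_shape (polygons and other complex shapes)
    Specialized    ip, completion, token_count (counts tokens), join (parent/child documents), mapper_murmur3, and others
Strings come in two types:
text: analyzed (tokenized), but by default cannot be used for sorting or aggregations. Suited to news articles, product descriptions, and similar free text.
keyword: not analyzed, but can be used for filtering, sorting, and aggregations. Suited to structured fields such as tags, email addresses, and phone numbers.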
- Chinese text analysis in Elasticsearch
- The default standard analyzer handles Chinese poorly: it splits Chinese words into individual characters. The es-ik plugin provides analyzers designed for Chinese.
    # Default analyzer
    GET /source/_analyze
    {
      "analyzer": "standard",
      "text": "床前明月光"
    }
    # IK analyzer, coarse-grained
    GET /source/_analyze
    {
      "analyzer": "ik_smart",
      "text": "床前明月光"
    }
    # IK analyzer, fine-grained
    GET /source/_analyze
    {
      "analyzer": "ik_max_word",
      "text": "床前明月光"
    }
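To make the problem with the default analyzer concrete, the sketch below imitates what the standard analyzer does to Han text: every character becomes its own token. The class and method names are hypothetical, and the code is only meaningful for Chinese input; it does not reproduce the real analyzer's handling of Latin words, and it makes no claim about what the IK analyzers return.

```java
import java.util.ArrayList;
import java.util.List;

/** Imitates the standard analyzer on Han text: one token per character. */
final class UnigramSketch {
    static List<String> unigrams(String text) {
        List<String> tokens = new ArrayList<>();
        text.codePoints()
            .filter(Character::isLetter) // drop punctuation and whitespace
            .forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(unigrams("床前明月光")); // [床, 前, 明, 月, 光]
    }
}
```

With single-character tokens, a search for 明月 degrades into a search for 明 and 月 separately, which is why a dictionary-based analyzer is preferred for Chinese.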
- Working with data in Elasticsearch
- Basic operations
    # 1. Create an index
    PUT /basic

    # 2. View an index
    GET /basic

    # 3. Add or update a document. Path: /index/type/id; the same request updates an existing document
    PUT /basic/user/1
    {
      "name": "张三",
      "age": 11,
      "phone": "13111111111",
      "address": "安徽省合肥市"
    }

    # 4. Get a document. Path: /index/type/id
    GET /basic/user/1

    # 5. View the mapping of a type
    GET /basic/user/_mapping

    # 6. View the shards that hold the index
    GET _cat/shards/basic?v

    # 7. Delete an index
    DELETE /basic

    # 8. List all indices
    GET _cat/indices

    # 9. Add a document and let Elasticsearch generate the ID
    POST /basic/user/
    {
      "name": "李四",
      "age": 22,
      "phone": "15222222222",
      "address": "上海市"
    }
- Sample data
    PUT /source

    # Define the mapping for the book type
    POST /source/_mapping/book
    {
      "book": {
        "properties": {
          "author": { "type": "keyword" },
          "title": { "type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_max_word" },
          "content": { "type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_max_word" },
          "time": { "type": "date" },
          "price": { "type": "long" }
        }
      }
    }

    PUT /source/book/1
    {
      "author": "李白",
      "title": "早发白帝城",
      "content": "朝辞白帝彩云间,千里江陵一日还,两岸猿声啼不住,轻舟已过万重山",
      "time": "2015-11-11",
      "price": 11
    }

    PUT /source/book/2
    {
      "author": "崔护",
      "title": "题都城南庄",
      "content": "去年今日此门中,人面桃花相映红,人面不知何处去,桃花依旧笑春风",
      "time": "2016-12-12",
      "price": 22
    }

    PUT /source/book/3
    {
      "author": "白居易",
      "title": "大林寺桃花",
      "content": "人间四月芳菲尽,山寺桃花始盛开,长恨春归无觅处,不知转入此中来",
      "time": "2016-12-12",
      "price": 33
    }

    PUT /source/book/4
    {
      "author": "李白",
      "title": "静夜思",
      "content": "床前明月光,疑是地上霜,举头望明月,低头思故乡",
      "time": "2017-07-07",
      "price": 44
    }

    PUT /source/book/5
    {
      "author": "贺知章",
      "title": "咏柳",
      "content": "碧玉妆成一树高,万条垂下绿丝绦。不知细叶谁裁出,二月春风似剪刀",
      "time": "2018-10-10",
      "price": 55
    }
- Queries and filters
ES accepts search requests in two forms: a lightweight URL (query-string) search, and a structured JSON query (the query DSL).
URL queries
    # 1. Return all documents
    GET /source/book/_search

    # 2. Fetch several documents by ID
    GET /source/book/_mget
    {
      "ids": [1, 2]
    }

    # 3. Documents whose price is 22
    GET /source/book/_search?q=price:22

    # 4. Documents whose price is between 20 and 50, sorted by price
    GET /source/book/_search?q=price:[20 TO 50]&sort=price:desc

    # 5. Sort and paginate
    GET /source/book/_search?sort=price:desc&from=0&size=2

    # 6. Return only the title and author fields
    GET /source/book/_search?_source=title,author
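Kibana's console sends the query string as typed, but an HTTP client written in code usually has to percent-encode characters such as `:`, `[`, `]`, and spaces in the `q` parameter. The following is a minimal sketch of building the range search path with the JDK only; the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Builds a URL search path with a percent-encoded q parameter. */
final class UriSearchSketch {
    /** Percent-encodes a query-string expression for use as the q parameter. */
    static String encodeQuery(String q) {
        try {
            // URLEncoder writes spaces as '+'; '%20' is used here as the unambiguous form
            return URLEncoder.encode(q, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    static String searchPath(String index, String type, String q, String sort) {
        return "/" + index + "/" + type + "/_search?q=" + encodeQuery(q) + "&sort=" + sort;
    }

    public static void main(String[] args) {
        System.out.println(searchPath("source", "book", "price:[20 TO 50]", "price:desc"));
        // /source/book/_search?q=price%3A%5B20%20TO%2050%5D&sort=price:desc
    }
}
```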
DSL queries
    # A match query works like a fuzzy match: the query text is analyzed first and then searched

    # Return all documents, sorted by price
    GET /source/book/_search
    {
      "query": { "match_all": {} },
      "sort": { "price": "desc" }
    }

    # Search by content; the analyzed terms are combined with OR by default
    GET /source/book/_search
    {
      "query": {
        "match": { "content": "春风桃花" }
      }
    }

    # Search by content, combining the analyzed terms with AND
    GET /source/book/_search
    {
      "query": {
        "match": {
          "content": { "query": "春风桃花", "operator": "and" }
        }
      }
    }

    # Search several fields at once
    GET /source/book/_search
    {
      "query": {
        "multi_match": {
          "query": "静夜思桃花",
          "fields": ["title", "content"]
        }
      }
    }

    # Date range query
    GET /source/book/_search
    {
      "query": {
        "range": {
          "time": { "gte": "2016-11-11", "lte": "2017-11-11" }
        }
      }
    }

    # Query by IDs with pagination
    GET /source/book/_search
    {
      "_source": ["author", "title"],
      "query": {
        "ids": { "values": [1, 2, 3] }
      },
      "from": 0,
      "size": 2
    }

    # must: every clause has to match; must_not: no clause may match;
    # should: optional clauses that raise the score when they match (optional here because must is present)
    # Compound query: the author is 李白 and the price is not between 43 and 45
    GET /source/book/_search
    {
      "query": {
        "bool": {
          "must": [
            { "term": { "author": { "value": "李白" } } }
          ],
          "must_not": [
            { "range": { "price": { "gte": 43, "lte": 45 } } }
          ],
          "should": [
            { "match": { "content": "春眠不觉晓" } }
          ]
        }
      }
    }

    # Highlight matches in the results
    POST /source/book/_search
    {
      "query": {
        "match": { "content": "桃花春风" }
      },
      "highlight": {
        "pre_tags": ["<span>"],
        "post_tags": ["</span>"],
        "fields": { "content": {} }
      }
    }

    # Group by author and count documents, returning 2 buckets
    POST /source/book/_search
    {
      "aggs": {
        "group_by_author_count": {
          "terms": { "field": "author", "size": 2 }
        }
      }
    }

    # Maximum price
    GET /source/book/_search
    {
      "aggs": {
        "max_price": {
          "max": { "field": "price" }
        }
      }
    }

    # Price statistics (count, min, max, avg, sum)
    POST /source/book/_search
    {
      "aggs": {
        "book_count_info": {
          "stats": { "field": "price" }
        }
      }
    }

    # Average price of the documents with 20 < price <= 50
    POST /source/book/_search
    {
      "aggs": {
        "NAME": {
          "filter": {
            "range": { "price": { "gt": 20, "lte": 50 } }
          },
          "aggs": {
            "avg_price": {
              "avg": { "field": "price" }
            }
          }
        }
      }
    }

    # Filter query
    GET /source/book/_search
    {
      "query": {
        "bool": {
          "must": [{ "match_all": {} }],
          "filter": {
            "range": { "price": { "gt": 21, "lte": 51 } }
          }
        }
      },
      "from": 0,
      "size": 10,
      "_source": ["author", "title", "price"]
    }
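As a way to check what the compound query and the filtered average should return, the sketch below applies the same conditions to the five sample documents held in memory. It is plain Java rather than an Elasticsearch client call; the class `BoolQuerySketch` and its methods are hypothetical, and scoring (the effect of `should`) is not modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Applies bool-query style conditions to the sample data in memory. */
final class BoolQuerySketch {
    static final class Book {
        final int id;
        final String author;
        final long price;

        Book(int id, String author, long price) {
            this.id = id;
            this.author = author;
            this.price = price;
        }
    }

    // The author and price of the five sample documents
    static final List<Book> BOOKS = Arrays.asList(
            new Book(1, "李白", 11), new Book(2, "崔护", 22), new Book(3, "白居易", 33),
            new Book(4, "李白", 44), new Book(5, "贺知章", 55));

    /** must: author equals the value; must_not: price within [gte, lte]. */
    static List<Integer> authorButNotPriceRange(String author, long gte, long lte) {
        List<Integer> ids = new ArrayList<>();
        for (Book b : BOOKS) {
            boolean must = b.author.equals(author);
            boolean mustNot = b.price >= gte && b.price <= lte;
            if (must && !mustNot) {
                ids.add(b.id);
            }
        }
        return ids;
    }

    /** Filter aggregation followed by avg: mean price where gt < price <= lte. */
    static double averagePrice(long gt, long lte) {
        return BOOKS.stream()
                .filter(b -> b.price > gt && b.price <= lte)
                .mapToLong(b -> b.price)
                .average()
                .orElse(Double.NaN);
    }

    public static void main(String[] args) {
        System.out.println(authorButNotPriceRange("李白", 43, 45)); // [1]
        System.out.println(averagePrice(20, 50));                    // 33.0
    }
}
```

Document 4 is also by 李白, but its price of 44 falls inside the excluded range, so only document 1 remains. The average covers prices 22, 33, and 44.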
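A term query is an exact match: the search text is not analyzed, so the field must contain the whole search text as a single indexed term.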
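The difference between term and match comes down to whether the query text passes through an analyzer before the lookup. A minimal sketch follows, assuming a hypothetical analyzer that lower-cases and splits on whitespace (English text is used so the split is obvious); none of these names come from the Elasticsearch API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Contrasts an unanalyzed term lookup with an analyzed match lookup. */
final class TermVsMatchSketch {
    /** Hypothetical analyzer: lower-case, then split on whitespace. */
    static List<String> analyze(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    /** term: the value is compared as-is with the indexed terms. */
    static boolean term(List<String> indexedTerms, String value) {
        return indexedTerms.contains(value);
    }

    /** match: the query is analyzed first; any resulting term may match (OR). */
    static boolean match(List<String> indexedTerms, String query) {
        for (String t : analyze(query)) {
            if (indexedTerms.contains(t)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> indexed = analyze("Quick Brown Fox"); // what a text field would store
        System.out.println(term(indexed, "Quick Brown Fox")); // false
        System.out.println(term(indexed, "quick"));           // true
        System.out.println(match(indexed, "Quick Dog"));      // true
    }
}
```

This is why term queries fit keyword fields such as author, whose value is stored as one unanalyzed term, while match queries fit text fields such as content.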
References:
https://www.elastic.co/guide/en/elasticsearch/reference/6.4/query-dsl.html
Search in Depth | Elasticsearch: The Definitive Guide | Elastic
- Integrating Elasticsearch with Spring Boot
- Create a Spring Boot project and add the dependencies to pom.xml
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.1.5.RELEASE</version>
        <relativePath /> <!-- lookup parent from repository -->
    </parent>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
        </dependency>
        <!-- The io.swagger annotations and the /doc.html page used later come from a Swagger UI
             dependency that this listing omits; Guava (com.google.common) must also be on the
             classpath for the controller below. -->
    </dependencies>
- Configure application.yml as follows
    server:
      port: 8001
      servlet:
        context-path: /es
    spring:
      data:
        elasticsearch:
          # Cluster name
          cluster-name: elasticsearch
          # Cluster nodes; separate several nodes with commas, e.g.
          # 192.168.2.117:9300,192.168.2.118:9300,192.168.2.119:9300
          cluster-nodes: 121.4.227.23:9300
- Create the entity class BookEntity.java
In @Document, indexName is the index name and type is the document type name.
    package com.basic.entity;

    import io.swagger.annotations.ApiModel;
    import io.swagger.annotations.ApiModelProperty;
    import lombok.Data;
    import org.springframework.data.annotation.Id;
    import org.springframework.data.elasticsearch.annotations.Document;

    import java.util.Date;

    @Data
    @ApiModel
    @Document(indexName = "source", type = "book")
    public class BookEntity {

        @Id
        @ApiModelProperty(value = "id", name = "id", example = "101")
        private String id;

        @ApiModelProperty(value = "Author", name = "author", example = "柳宗元")
        private String author;

        @ApiModelProperty(value = "Poem title", name = "title", example = "江雪")
        private String title;

        @ApiModelProperty(value = "Content", name = "content", example = "千山鸟飞绝,万径人踪灭,孤舟蓑笠翁,独钓寒江雪")
        private String content;

        @ApiModelProperty(value = "Publication date", name = "time", example = "2018-08-08")
        private Date time;

        @ApiModelProperty(value = "Price", name = "price", example = "100")
        private Integer price;
    }
- Create BookRepository.java
    package com.basic.repository;

    import com.basic.entity.BookEntity;
    import org.springframework.data.domain.Page;
    import org.springframework.data.domain.Pageable;
    import org.springframework.data.elasticsearch.annotations.Query;
    import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

    import java.util.List;

    public interface BookRepository extends ElasticsearchRepository<BookEntity, String> {

        // author IN (...)
        List<BookEntity> findByAuthorIn(List<String> authors);

        // author = ? AND price = ?
        List<BookEntity> findByAuthorAndPrice(String author, Integer price);

        // author != ?, paginated
        Page<BookEntity> findByAuthorNot(String author, Pageable page);

        // Contains match, e.g. *白*
        List<BookEntity> findByAuthorContains(String author);

        // Count documents by author
        Integer countByAuthor(String author);

        // Range query on price
        List<BookEntity> findByPriceBetween(Integer start, Integer end);

        // Delete documents by author
        int deleteByAuthor(String author);

        // Custom DSL: author = ?0 OR price = ?1
        @Query("{\"bool\":{\"should\":[{\"term\":{\"author\": \"?0\"}},{\"term\":{\"price\": \"?1\"}}]}}")
        Page<BookEntity> shouldByAuthorOrPrice(String author, Integer price, Pageable pageable);
    }
- Create IndexController.java
    import com.basic.repository.BookRepository;
    import com.basic.entity.BookEntity;
    import com.google.common.collect.Lists;
    import io.swagger.annotations.*;
    import lombok.extern.slf4j.Slf4j;
    import org.elasticsearch.index.query.*;
    import org.elasticsearch.search.aggregations.*;
    import org.elasticsearch.search.aggregations.bucket.filter.FilterAggregationBuilder;
    import org.elasticsearch.search.sort.SortBuilders;
    import org.elasticsearch.search.sort.SortOrder;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.data.domain.Page;
    import org.springframework.data.domain.PageRequest;
    import org.springframework.data.domain.Pageable;
    import org.springframework.data.domain.Sort;
    import org.springframework.data.elasticsearch.core.aggregation.AggregatedPage;
    import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;
    import org.springframework.data.elasticsearch.core.query.SearchQuery;
    import org.springframework.util.StringUtils;
    import org.springframework.web.bind.annotation.*;

    import java.util.List;

    @Slf4j
    @RestController
    @Api(tags = "Elasticsearch API")
    public class IndexController {

        @Autowired
        private BookRepository bookRepository;

        @GetMapping("/selectById")
        @ApiOperation(value = "Query by ID", notes = "Look up a document by ID", httpMethod = "GET")
        @ApiImplicitParam(name = "id", value = "Document ID", dataType = "String", example = "1")
        public BookEntity findBook(@RequestParam("id") String id) {
            // Return null instead of throwing when no document has this ID
            return bookRepository.findById(id).orElse(null);
        }

        @GetMapping("/selectFun")
        @ApiOperation(value = "Derived-method query", notes = "Query through a repository method", httpMethod = "GET")
        @ApiImplicitParams({
                @ApiImplicitParam(name = "author", value = "Author", dataType = "String", example = "李白"),
                @ApiImplicitParam(name = "price", value = "Price", dataType = "int", example = "44")
        })
        public List<BookEntity> findByAuthorAndPrice(@RequestParam("author") String author,
                                                     @RequestParam("price") int price) {
            return bookRepository.findByAuthorAndPrice(author, price);
        }

        @GetMapping("/selectSort")
        @ApiOperation(value = "Paging and sorting", notes = "Sort by price and paginate", httpMethod = "GET")
        @ApiImplicitParams({
                @ApiImplicitParam(name = "page", value = "Page number", dataType = "int", example = "0"),
                @ApiImplicitParam(name = "size", value = "Page size", dataType = "int", example = "2")
        })
        public Page<BookEntity> selectList(@RequestParam("page") int page, @RequestParam("size") int size) {
            Sort sort = new Sort(Sort.Direction.DESC, "price");
            Pageable pageable = PageRequest.of(page, size, sort);
            return bookRepository.findAll(pageable);
        }

        @PostMapping("/selectLogic")
        @ApiOperation(value = "Boolean query", notes = "Query with boolean logic", httpMethod = "POST")
        @ApiImplicitParams({
                @ApiImplicitParam(name = "word", value = "Search text", dataType = "String", example = "大林寺/寺院"),
                @ApiImplicitParam(name = "price", value = "Price", dataType = "int", example = "44")
        })
        public List<BookEntity> selectLogic(@RequestParam("word") String word, @RequestParam("price") Integer price) {
            // must     behaves like AND
            // must not behaves like NOT
            // should   behaves like OR
            // filter   behaves like must but does not contribute to the score,
            //          so it is cheaper when scoring is not needed
            BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery();
            // Exact (term) query
            // queryBuilder.must(QueryBuilders.termQuery("author","李白"));
            // Analyzed (match) query
            // queryBuilder.must(QueryBuilders.matchQuery("content", "春风十里不如你"));
            // Match against several fields
            // queryBuilder.must(QueryBuilders.multiMatchQuery("春风十里不如你", "title", "content"));
            // Prefix query
            // queryBuilder.must(QueryBuilders.prefixQuery("author", "李"));
            // Fuzzy query
            // queryBuilder.must(QueryBuilders.fuzzyQuery("title", "白帝"));
            // Range query
            // queryBuilder.must(QueryBuilders.rangeQuery("price").gt(20).lt(40));
            // Wildcard query; mind how Chinese text has been tokenized
            // queryBuilder.must(QueryBuilders.wildcardQuery("title", "*白帝*"));
            // IN query on IDs
            // queryBuilder.must(QueryBuilders.idsQuery().addIds("1", "2"));
            // price is not null
            // queryBuilder.must(QueryBuilders.existsQuery("price"));
            // Date range query
            // queryBuilder.filter(QueryBuilders.rangeQuery("time").from("2017-07-01").to("2017-07-11"));
            if (StringUtils.hasText(word)) {
                queryBuilder.should(QueryBuilders.matchQuery("title", word));
            }
            if (price != null) {
                queryBuilder.should(QueryBuilders.matchQuery("price", price));
            }
            Iterable<BookEntity> results = bookRepository.search(queryBuilder);
            return Lists.newArrayList(results);
        }

        @PostMapping("/selectMulti")
        @ApiOperation(value = "Multi-field query", notes = "Query several fields", httpMethod = "POST")
        public Page<BookEntity> selectMulti() {
            SearchQuery searchQuery = new NativeSearchQueryBuilder()
                    .withFields("title", "content", "price") // fields to return
                    .withQuery(QueryBuilders.multiMatchQuery("春风桃花", "title", "content")) // search condition
                    .withSort(SortBuilders.fieldSort("price").order(SortOrder.DESC)) // sorting
                    .withPageable(PageRequest.of(0, 2)) // pagination
                    .build();
            return bookRepository.search(searchQuery);
        }

        @GetMapping("/dsl")
        @ApiOperation(value = "DSL query", notes = "Query through a custom DSL string", httpMethod = "GET")
        public Page<BookEntity> dsl() {
            Sort sort = new Sort(Sort.Direction.DESC, "price");
            Pageable pageable = PageRequest.of(0, 2, sort);
            return bookRepository.shouldByAuthorOrPrice("李白", 33, pageable);
        }

        @PostMapping("/add")
        @ApiOperation(value = "Add a document", notes = "Add a document", httpMethod = "POST")
        public BookEntity add(@RequestBody BookEntity book) {
            return bookRepository.save(book);
        }

        @GetMapping("/deleteById")
        @ApiOperation(value = "Delete a document", notes = "Delete a document", httpMethod = "GET")
        @ApiImplicitParam(name = "id", value = "Document ID", dataType = "String", example = "3")
        public void deleteById(@RequestParam("id") String id) {
            bookRepository.deleteById(id);
        }

        @GetMapping("/selectAgg")
        @ApiOperation(value = "Aggregation query", notes = "Aggregation query", httpMethod = "GET")
        public Object selectAgg() {
            NativeSearchQueryBuilder queryBuilder = new NativeSearchQueryBuilder();
            // Average price of all books (avg)
            // queryBuilder.addAggregation(AggregationBuilders.avg("avg_price").field("price"));
            // Sum of all prices (sum)
            // queryBuilder.addAggregation(AggregationBuilders.sum("sum_price").field("price"));
            // Lowest price (min)
            // queryBuilder.addAggregation(AggregationBuilders.min("min_price").field("price"));
            // Group by author and sum the prices (group by + sum)
            // TermsAggregationBuilder groupBuilder = AggregationBuilders.terms("group_author").field("author");
            // AggregationBuilder sumBuilder = AggregationBuilders.sum("sum_price").field("price");
            // queryBuilder.addAggregation(groupBuilder.subAggregation(sumBuilder));

            // Highest price among the books by 李白 (max)
            FilterAggregationBuilder filterBuilder = AggregationBuilders
                    .filter("param_author", QueryBuilders.termQuery("author", "李白"))
                    .subAggregation(AggregationBuilders.max("max_price").field("price"));
            queryBuilder.addAggregation(filterBuilder);

            AggregatedPage<BookEntity> aggPage =
                    (AggregatedPage<BookEntity>) bookRepository.search(queryBuilder.build());
            Aggregations aggregations = aggPage.getAggregations();
            // Terms terms = aggregations.get("group_author");
            // for (Terms.Bucket bucket : terms.getBuckets()) {
            // }
            return aggregations;
        }
    }
- Application class
@EnableElasticsearchRepositories sets the packages that are scanned for repositories.
    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.data.elasticsearch.repository.config.EnableElasticsearchRepositories;

    @SpringBootApplication
    @EnableElasticsearchRepositories(basePackages = "com.basic.repository")
    public class AppServer {

        public static void main(String[] args) {
            SpringApplication.run(AppServer.class, args);
        }
    }
- Testing
http://127.0.0.1:8001/es/doc.html
Documentation:
Spring Data Elasticsearch
Bucket aggregations | Java Transport Client (deprecated) [7.17] | Elastic
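- ES clusters and high availability
- Elasticsearch node types
A running Elasticsearch instance is a node, and a group of nodes forms a cluster. A node can be configured with different roles; the three most common are listed here (there are others):
Master node: manages the cluster state and publishes it to the other nodes, which acknowledge it. Setting node.master: true in elasticsearch.yml makes a node master-eligible; the cluster then elects one master-eligible node as the active master.
Data node: stores index data and executes document operations. Enabled with node.data: true in elasticsearch.yml.
Client (coordinating) node: acts as a load balancer, routing incoming requests to the relevant shards in the cluster.
With the default ES configuration a node takes on all of these roles once it starts. Giving each node a single role is recommended.
Node | Elasticsearch Guide [6.4] | Elastic
- Elasticsearch sharding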
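Primary shards: Elasticsearch splits the data of an index across several shards (5 by default in 6.x) and distributes these shards over the nodes of the cluster. These are the primary shards.
Replica shards: for high availability and fault tolerance, so that the failure of a node or shard does not lose data, each primary shard can have one or more copies, called replica shards.
    # Create an index with an explicit number of shards
    PUT back
    {
      "settings": {
        "number_of_shards": 3,
        "number_of_replicas": 1
      }
    }

    # View the index settings
    GET /back/_settings
"number_of_shards": 3 means the index has 3 primary shards.
"number_of_replicas": 1 means each primary shard has 1 replica shard.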
View the shard settings of an index: http://121.4.227.23:9200/source/_settings
- Elasticsearch routing
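When a document is written, Elasticsearch uses routing to decide which shard receives it. Roughly:
    shard = hash(routing) % number_of_primary_shards
routing defaults to the document's _id and can be customized. The routing value is hashed to a number, and the remainder of that number divided by number_of_primary_shards (the number of primary shards) is the shard that holds the document. Because the result depends on the number of primary shards, that number cannot be changed after the index is created without reindexing.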
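The routing formula can be tried out with a small sketch. Note the assumption: `String.hashCode()` stands in for the hash function here, whereas Elasticsearch uses a Murmur3 hash internally, so the shard numbers below will not match a real cluster. The class and method names are hypothetical.

```java
/** Illustrates shard = hash(routing) % number_of_primary_shards. */
final class RoutingSketch {
    static int shardFor(String routing, int numberOfPrimaryShards) {
        // Stand-in hash (assumption); floorMod keeps the result non-negative
        return Math.floorMod(routing.hashCode(), numberOfPrimaryShards);
    }

    public static void main(String[] args) {
        System.out.println(shardFor("1", 3)); // 1
        System.out.println(shardFor("2", 3)); // 2
        // The same document lands elsewhere once the shard count changes
        System.out.println(shardFor("1", 5)); // 4
    }
}
```

The last line shows why the primary shard count is fixed: with a different divisor, existing documents would be looked up on the wrong shard.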
- Elasticsearch clusters
Cluster Health | Elasticsearch: The Definitive Guide | Elastic
https://www.elastic.co/guide/en/elasticsearch/reference/6.4/modules-discovery-zen.html
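- New in Elasticsearch 7.x
- Ships with a bundled JDK
- The default number of primary shards is 1 instead of 5
- Mapping types are being removed
- Kibana supports dark mode, and cluster coordination has been reworked
- Faster queries: term query performance is reported to improve by 3700%
- More robust memory management, reducing the likelihood of OOM (out of memory) errors
- Timestamps support nanosecond precision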
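- Elasticsearch best practices
- The official recommendation is to keep each shard within roughly 30 GB to 50 GB
- Make the number of shards of an index a multiple of the number of data nodes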
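The two rules of thumb can be combined into a rough sizing calculation. The sketch below is a heuristic only, not an official formula: the class, the method names, and the input figures are assumptions chosen for illustration.

```java
/** Rough shard sizing from the two rules of thumb above. */
final class ShardPlanSketch {
    /** Enough primaries to stay under maxShardSizeGb, rounded up to a multiple of dataNodes. */
    static int suggestPrimaryShards(double indexSizeGb, double maxShardSizeGb, int dataNodes) {
        if (indexSizeGb < 0 || maxShardSizeGb <= 0 || dataNodes <= 0) {
            throw new IllegalArgumentException("sizes and node count must be positive");
        }
        int bySize = Math.max(1, (int) Math.ceil(indexSizeGb / maxShardSizeGb));
        int groups = (bySize + dataNodes - 1) / dataNodes; // integer ceiling
        return groups * dataNodes;
    }

    /** Total shard copies in the cluster: each primary plus its replicas. */
    static int totalShardCopies(int primaries, int replicasPerPrimary) {
        return primaries * (1 + replicasPerPrimary);
    }

    public static void main(String[] args) {
        // Hypothetical index: 200 GB expected, 50 GB per shard at most, 3 data nodes
        int primaries = suggestPrimaryShards(200, 50, 3);
        System.out.println(primaries);                      // 6
        System.out.println(totalShardCopies(primaries, 1)); // 12
    }
}
```

Size alone would call for 4 primaries; rounding up to 6 lets the 3 data nodes hold 2 primaries each, and with number_of_replicas set to 1 the cluster carries 12 shard copies in total.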