1. Elasticsearch
1.1 Uses
Elasticsearch is a distributed, RESTful search and analytics engine built on Apache Lucene.
According to the official documentation, ES is commonly used for:
- Full-text search. This is the most common and most basic use: searching text data inside a system, e.g. keyword search.
- Collecting and analyzing logs or transaction data, typically done with the ELK stack (Elasticsearch + Logstash + Kibana).
- Monitoring systems and visualizing their data.
Official download page: https://www.elastic.co/cn/downloads/past-releases#elasticsearch
1.2 Comparison with Apache Solr
Solr is also a full-text search server built on Apache Lucene. Compared with ES, their respective strengths and weaknesses are:

| | Elasticsearch | Solr |
|---|---|---|
| Installation and deployment | Works out of the box, very simple | More involved; must be deployed into a web container |
| Distributed management | Ships with its own cluster coordination | Relies on ZooKeeper |
| Data formats | JSON only | More formats, e.g. JSON, XML, CSV |
| Real-time indexing | Real-time queries stay very fast | Queries slow down while indexing in real time |
| Growing data volume | Query speed barely changes | Queries become noticeably slower |
1.3 Basic Concepts
- Index: a collection of documents that share the same structure. Index names must be lowercase.
- Type: a logical partition of an index, deprecated starting with 6.0. The reasoning, paraphrased from the ES developers:
  It is tempting to think of an ES "index" as a relational database and a "type" as a table, but the ES developers consider that a poor analogy. In a relational database two tables are independent: columns with the same name in different tables do not affect each other. Not so in ES. Elasticsearch is built on Lucene, and fields with the same name under different types of one index are handled as a single Lucene field. For example, two user_name fields under two different types of the same index are effectively one field, so the same field mapping must be defined in both types; otherwise same-named fields in different types conflict, which degrades Lucene's processing efficiency.
  Removing types lets each kind of data live in its own index, so identical field names no longer collide. As Elasticsearch's opening tagline goes, "You know, for search" — dropping types exists to make ES handle data more efficiently.
  Besides that, storing entities with different sets of fields under different types of one index produces sparse data, which hurts Lucene's document compression and therefore ES query performance.
  Note: in 6.x an index may have only one type; in 7.x types are removed entirely and only _doc remains.
- Document: a JSON document stored in an index. The raw data lives under the _source field, which is returned by default when searching.
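For instance, fetching a single document returns the stored JSON wrapped in metadata, with the original data under _source. A hedged example (the field values are illustrative, taken from the Employee entity used later in this article):

```json
{
  "_index": "employee",
  "_type": "_doc",
  "_id": "1",
  "_version": 1,
  "found": true,
  "_source": {
    "name": "宋江",
    "age": 34
  }
}
```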
- Mapping: defines how a document and the fields it contains are stored and indexed.
- Shards: divided into primary shards and replica shards; documents are stored in shards. When you index a document, ES writes it to a primary shard first, then copies it to the replicas. By default an index has 5 primary shards and 1 set of replicas (one replica per primary); both can be specified at index creation. A replica is never allocated to the same node as its primary.
  When writing to a multi-shard index, routing determines which shard a document is written to.
  Replica shards serve two purposes:
  - backups, in case a primary shard is lost
  - sharing query load: requests are spread evenly across primaries and replicas
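The routing step above can be sketched in a few lines. This is a simplified illustration, not ES's actual implementation: the real formula is shard = hash(_routing) % number_of_primary_shards, where ES uses a Murmur3 hash and _routing defaults to the document id; String.hashCode() below is only a stand-in.

```java
public class ShardRouter {
    // Pick a primary shard for a document, mimicking
    // shard = hash(routing) % number_of_primary_shards.
    // String.hashCode() stands in for ES's Murmur3 hash.
    static int shardFor(String routing, int numberOfPrimaryShards) {
        // floorMod keeps the result in [0, numberOfPrimaryShards)
        // even when hashCode() is negative
        return Math.floorMod(routing.hashCode(), numberOfPrimaryShards);
    }

    public static void main(String[] args) {
        // routing defaults to the document id
        for (String id : new String[]{"1", "2", "3"}) {
            System.out.println("doc " + id + " -> shard " + shardFor(id, 5));
        }
    }
}
```

This formula is also why number_of_shards cannot be changed after an index is created: changing the divisor would change the result, and existing documents could no longer be located.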
- Term: an exact value that can be indexed and matched precisely.
- Text: ordinary text, which is usually analyzed into individual terms before being stored in ES.
- Analyzer: the component that turns Text into Terms. For Chinese, the best-known analyzer is IK.
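To make the Text-to-Term step concrete, here is a toy analyzer in plain Java. It is only an illustration, not ES code: it tokenizes on non-letter characters and lowercases, roughly what ES's built-in "simple" analyzer does. Real analyzers such as IK are far more sophisticated (dictionary-based Chinese word segmentation).

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ToyAnalyzer {
    // Turn raw text into index terms: split on non-letters, lowercase.
    static List<String> analyze(String text) {
        return Arrays.stream(text.split("[^\\p{L}]+"))
                .filter(token -> !token.isEmpty())
                .map(String::toLowerCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [quick, brown, foxes]
        System.out.println(analyze("Quick Brown-Foxes!"));
    }
}
```

In ES you can inspect the real thing with the _analyze API, e.g. GET /_analyze with an "analyzer" and "text" in the body.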
1.4 Basic Usage
- GET
  - Show the status of all indices
    GET /_cat/indices?v
  - Show details of one index / of all indices
    GET /{index_name}
    GET /_all
  - Get a single document from an index
    GET /{index_name}/{type_name}/{id}
  - Get only the source data of a single document
    GET /{index_name}/{type_name}/{id}/_source
  - Query all documents in an index
    GET {index_name}/_search
    {
      "query": { "match_all": {} }
    }
  - Show an index's settings and mappings
    GET /{index_name}/_settings
    GET /{index_name}/_mapping
  - Show the mapping of specific fields; separate several with commas, the * wildcard is supported
    GET /{index_name}/_mapping/{type_name}/field/{field_name}
  - Check cluster health
    GET /_cat/health?v
  - Count all documents / documents in one index
    GET /_count
    GET /{index_name}/_count
- PUT
  - Create an index (see the official docs for the full mapping parameters)
    PUT {index_name}
    {
      "settings" : {
        "number_of_shards" : 3,
        "number_of_replicas" : 2
      },
      "mappings" : {
        "{type_name}" : {
          "properties" : {
            "{field_name}" : { "type" : "{field_type}" }
          }
        }
      }
    }
  - Add fields to an existing mapping
    PUT {index_name}/_mapping/{type_name}
    {
      "properties": {
        "name": {
          "properties": {
            "last": { "type": "text" }
          }
        },
        "user_id": {
          "type": "keyword",
          "ignore_above": 100
        }
      }
    }
  - Create or replace a document
    PUT {index_name}/{type_name}/{id}
    { "{field_name}" : {value}, ... }
- POST
  - Create a document (with an auto-generated id)
    POST {index_name}/{type_name}
    { "{field_name}" : {value}, ... }
  - Delete matching documents / delete all documents
    POST {index_name}/_delete_by_query
    { "query": { "match": { "{field_name}": {value} } } }
    POST {index_name}/_delete_by_query
    { "query": { "match_all": {} } }
  - Partially update a document
    POST {index_name}/{type_name}/{id}/_update
    { "doc" : { "{field_name}" : {value} } }
- DELETE
  - Delete an index
    DELETE /{index_name}
  - Delete a document
    DELETE /{index_name}/{type_name}/{id}
- Search
  - URL search
    GET /elasticsearch/_search?q=user:kimchy
    GET /kimchy,elasticsearch/_search?q=tag:wow
    GET /_all/_search?q=tag:wow
  - DSL search, e.g. combining full-text queries with filters
    GET /_search
    {
      "query": {
        "bool": {
          "must": [
            { "match": { "title": "Search" }},
            { "match": { "content": "Elasticsearch" }}
          ],
          "filter": [
            { "term": { "status": "published" }},
            { "range": { "publish_date": { "gte": "2015-01-01" }}}
          ]
        }
      }
    }
    See the official documentation for more.
1.5 Java API
Java can access ES through two kinds of client:
- TransportClient, on port 9300 (note: deprecated in 7.0, removed in 8.0)
- REST clients, on port 9200
2. Spring Boot
Spring Boot provides access to ES through Spring Data Elasticsearch, which in the version used here is built on the TransportClient.
2.1 Versions

| Spring Data Elasticsearch | Elasticsearch | Spring Framework | Spring Boot |
|---|---|---|---|
| 4.2.1 | 7.12.1 | 5.3.7 | 2.5.x |
| 4.1.x | 7.9.3 | 5.3.2 | 2.4.x |
| 4.0.x | 7.6.2 | 5.2.12 | 2.3.x |
| 3.2.x | 6.8.12 | 5.2.12 | 2.2.x |
| 3.1.x | 6.2.2 | 5.1.19 | 2.1.x |
| 3.0.x | 5.5.0 | 5.0.13 | 2.0.x |
| 2.1.x | 2.4.0 | 4.3.25 | 1.5.x |
Since Elasticsearch 6.4.2 was downloaded earlier, the matching pom dependencies are:
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>2.1.6.RELEASE</version>
<relativePath/> <!-- lookup parent from repository -->
</parent>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>elasticsearch-rest-high-level-client</artifactId>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.datatype</groupId>
<artifactId>jackson-datatype-jsr310</artifactId>
</dependency>
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>30.0-jre</version>
</dependency>
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<optional>true</optional>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>
</dependencies>
Employee.java
@Data
@Document(indexName = "employee")
@Builder
@AllArgsConstructor
@NoArgsConstructor
@Setting(settingPath = "es/es_search_setting.json")
@Mapping(mappingPath = "es/es_mapping_search.json")
public class Employee {
@Id
private Integer id;
private String name;
private Integer age;
private String about;
@JsonFormat(pattern = "yyyy-MM-dd HH:mm:ss")
@JsonDeserialize(using = LocalDateTimeDeserializer.class)
@JsonSerialize(using = LocalDateTimeSerializer.class)
private LocalDateTime lastUpdateTime;
}
es_mapping_search.json
{
"employee": {
"properties": {
"name": {
"type": "completion"
},
"age": {
"type": "keyword",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
} ,
"about": {
"type": "text",
"analyzer": "ik_pinyin_analyzer",
"search_analyzer": "ik_pinyin_analyzer",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"lastUpdateTime": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
}
}
}
}
es_search_setting.json
{
"index": {
"number_of_replicas": "0",
"number_of_shards": "1",
"analysis": {
"filter": {
"pinyin_first_letter_and_full_pinyin_filter": {
"keep_joined_full_pinyin": "true",
"lowercase": "true",
"none_chinese_pinyin_tokenize": "false",
"keep_original": "true",
"remove_duplicated_term": "true",
"keep_separate_first_letter": "false",
"trim_whitespace": "true",
"type": "pinyin",
"limit_first_letter_length": "16",
"keep_none_chinese_in_joined_full_pinyin": "true",
"keep_first_letter": "true",
"keep_none_chinese": "true",
"keep_full_pinyin": "true"
}
},
"analyzer": {
"ik_pinyin_analyzer": {
"filter": "pinyin_first_letter_and_full_pinyin_filter",
"tokenizer": "my_ik_pinyin"
},
"pinyin_analyzer": {
"tokenizer": "my_pinyin"
}
},
"tokenizer": {
"my_ik_pinyin": {
"type": "ik_max_word"
},
"my_pinyin": {
"keep_joined_full_pinyin": "true",
"lowercase": "true",
"none_chinese_pinyin_tokenize": "false",
"keep_original": "true",
"remove_duplicated_term": "true",
"keep_separate_first_letter": "false",
"trim_whitespace": "true",
"type": "pinyin",
"limit_first_letter_length": "16",
"keep_none_chinese_in_joined_full_pinyin": "true",
"keep_first_letter": "true",
"keep_none_chinese": "true",
"keep_full_pinyin": "true"
}
}
}
}
}
application.yml
spring:
data:
elasticsearch:
cluster-nodes: localhost:9300
EmployeeDao.java
public interface EmployeeDao extends ElasticsearchRepository<Employee, Integer> {
}
2.2 CRUD
Setup
@Autowired
private EmployeeDao employeeDao;
@Autowired
private ElasticsearchTemplate elasticsearchTemplate;
@Autowired
private TransportClient client;
@Autowired
private RestHighLevelClient restHighLevelClient;
private static final Logger logger = LoggerFactory.getLogger(ElasticsearchApplicationTests.class);
@Before
public void setUp() {
Employee songjiang = Employee.builder().id(1).name("宋江").age(34).about("宋江,字公明,绰号呼保义、及时雨、孝义黑三郎,是施耐庵所作古典小说《水浒传》中的角色,梁山一百零八将之一,排第一位")
.lastUpdateTime(LocalDateTime.now()).build();
Employee likui = Employee.builder().id(22).name("李逵 ").age(41).about("李逵,绰号“黑旋风”,沂州沂水县(今属山东省临沂市沂水县)吕丈村人氏。为救宋江,李逵大劫法场,是中国古典小说《水浒传》中的重要人物")
.lastUpdateTime(LocalDateTime.now()).build();
Employee wuyong = Employee.builder().id(3).name("吴用 ").age(41).about("吴用,表字学究,绰号智多星,道号“加亮先生”,是施耐庵所作古典小说《水浒传》中的角色")
.lastUpdateTime(LocalDateTime.now()).build();
Employee lijun = Employee.builder().id(26).name("李俊 ").age(41).about("李俊,庐州人氏,因在扬子江中做过撑船梢公,精通水性,人称混江龙。")
.lastUpdateTime(LocalDateTime.now()).build();
employeeDao.saveAll(Lists.newArrayList(songjiang, likui, wuyong, lijun));
}
@After
public void cleanUp() {
employeeDao.deleteAll();
}
@Test
public void testCrudByElasticsearchRepository() {
Iterable<Employee> all = employeeDao.findAll();
assertThat(Lists.newArrayList(all).size()).isEqualTo(4);
employeeDao.deleteById(1);
all = employeeDao.findAll();
assertThat(Lists.newArrayList(all).size()).isEqualTo(3);
Optional<Employee> optional = employeeDao.findById(3);
Integer age = optional.map(Employee::getAge).orElseGet(() -> {fail("error get employee");return null;});
optional.ifPresent(o -> {
Integer changedAge = age + 1;
o.setAge(changedAge);
employeeDao.save(o);
Optional<Employee> findOne = employeeDao.findById(3);
findOne.ifPresent(employee -> assertThat(employee.getAge()).isEqualTo(changedAge));
});
}
2.3 Search
- Keyword search

@Test
public void testTermQuery() {
    String kw = "施耐庵";
    SearchQuery query = new NativeSearchQueryBuilder()
            .withQuery(QueryBuilders.boolQuery()
                    .should(QueryBuilders.termQuery("about", kw)))
            .build();
    logger.info("DSL:{}", query.getQuery().toString());
    List<Employee> employees = elasticsearchTemplate.queryForList(query, Employee.class);
    logger.info("{}", employees);
    assertThat(employees.size()).isEqualTo(2);
}

@Test
public void testStringQuery() {
    SearchQuery query = new NativeSearchQueryBuilder()
            .withQuery(QueryBuilders.queryStringQuery("+宋江 -李逵"))
            .build();
    logger.info("DSL:{}", query.getQuery().toString());
    List<Employee> employees = elasticsearchTemplate.queryForList(query, Employee.class);
    logger.info("{}", employees);
    assertThat(employees.size()).isEqualTo(1);
}

@Test
public void testPhraseQuery() {
    SearchQuery phraseQuery = new NativeSearchQueryBuilder()
            .withQuery(QueryBuilders.matchPhraseQuery("about", "水浒"))
            .build();
    List<Employee> phraseEmployee = elasticsearchTemplate.queryForList(phraseQuery, Employee.class);
    assertThat(phraseEmployee.size()).isEqualTo(3);

    SearchQuery commonQuery = new NativeSearchQueryBuilder()
            .withQuery(QueryBuilders.boolQuery()
                    .should(QueryBuilders.matchQuery("about", "水浒")))
            .build();
    List<Employee> commonEmployee = elasticsearchTemplate.queryForList(commonQuery, Employee.class);
    assertThat(commonEmployee.size()).isEqualTo(4);
}
- Highlighted search

@Test
public void testHighlightQuery() {
    String keyword = "宋江";
    SearchQuery searchQuery = new NativeSearchQueryBuilder()
            .withQuery(QueryBuilders.queryStringQuery(keyword).boost(2f))
            .withHighlightFields(new HighlightBuilder.Field("about")
                    .preTags("<span style='color:red'>")
                    .postTags("</span>"))
            .build();
    AggregatedPage<Employee> employees = elasticsearchTemplate.queryForPage(searchQuery, Employee.class, new SearchResultMapper() {
        @Override
        public <T> AggregatedPage<T> mapResults(SearchResponse response, Class<T> clazz, Pageable pageable) {
            List<Employee> chunk = Lists.newArrayList();
            SearchHits hits = response.getHits();
            for (SearchHit searchHit : hits) {
                logger.info("{}", searchHit);
                Employee employee = Employee.builder().id(Integer.parseInt(searchHit.getId()))
                        .name((String) searchHit.getSourceAsMap().get("name"))
                        .age(Integer.parseInt(searchHit.getSourceAsMap().get("age").toString()))
                        .about(searchHit.getHighlightFields().get("about").fragments()[0].toString())
                        .build();
                chunk.add(employee);
            }
            return new AggregatedPageImpl<T>((List<T>) chunk, pageable, hits.getTotalHits(), response.getAggregations());
        }
    });
    logger.info("{}", employees.getContent());
    List<Employee> content = employees.getContent();
    assertThat(content.size()).isEqualTo(2);
    assertThat(content.get(0).getAbout()).contains("<span style='color:red'>");
}
- Pinyin search

@Test
public void testPinyinSearch() {
    String keyword = "songjiang";
    SearchQuery searchQuery = new NativeSearchQueryBuilder()
            .withQuery(QueryBuilders.boolQuery()
                    .should(QueryBuilders.matchQuery("about", keyword))
                    .should(QueryBuilders.matchQuery("about.pinyin", keyword)))
            .build();
    logger.info("{}", searchQuery.getQuery().toString());
    List<Employee> employees = elasticsearchTemplate.queryForList(searchQuery, Employee.class);
    logger.info("{}", employees);
    assertThat(employees.size()).isEqualTo(2);
}
- Wildcard and fuzzy search

@Test
public void testWildcardQuery() {
    SearchQuery wildcardQuery = new NativeSearchQueryBuilder()
            .withQuery(QueryBuilders.wildcardQuery("name", "李*"))
            .build();
    List<Employee> wildEmployee = elasticsearchTemplate.queryForList(wildcardQuery, Employee.class);
    assertThat(wildEmployee.size()).isEqualTo(2);
}

@Test
public void testFuzzyQuery() {
    SearchQuery fuzzyQuery = new NativeSearchQueryBuilder()
            .withQuery(QueryBuilders.fuzzyQuery("about", "水浒"))
            .build();
    List<Employee> fuzzyEmployee = elasticsearchTemplate.queryForList(fuzzyQuery, Employee.class);
    assertThat(fuzzyEmployee.size()).isEqualTo(3);
}
- Autocomplete (completion suggester)

@Test
public void testSuggest() {
    String prefix = "李";
    String suggestName = "nameSuggest";
    SuggestBuilder suggestBuilder = new SuggestBuilder()
            .addSuggestion(suggestName, SuggestBuilders.completionSuggestion("name")
                    .prefix(prefix).skipDuplicates(true).size(10));
    SearchResponse response = elasticsearchTemplate.suggest(suggestBuilder, Employee.class);
    Suggest suggest = response.getSuggest();
    CompletionSuggestion termSuggestion = suggest.getSuggestion(suggestName);
    List<Text> lists = termSuggestion.getEntries().stream()
            .flatMap(o -> o.getOptions().stream().map(Suggest.Suggestion.Entry.Option::getText))
            .collect(Collectors.toList());
    logger.info("{}", lists);
    assertThat(lists.size()).isEqualTo(2);
}
3. Errors and Solutions
- Searching with the pinyin of 银行 ("yinhang") returned no results.
  - Running the analyzer by hand showed that the generated pinyin was wrong.
  - Fix: upgrade the nlp-lang dependency to the latest version, after which the pinyin for 银行 is produced correctly.
References
- https://docs.spring.io/spring-data/elasticsearch/docs/3.1.10.RELEASE/reference/html/
- https://www.elastic.co/guide/en/elasticsearch/reference/6.4/index.html
- https://github.com/medcl/elasticsearch-analysis-pinyin
- ElasticSearch series (6): complex queries in Spring Boot with QueryBuilders and NativeSearchQuery
- "Getting started with Elasticsearch: this one article is enough"
- "Integrating Elasticsearch with Spring Boot, advanced: Chinese and pinyin analysis, simplified/traditional conversion"