Basic concepts
ElasticSearch (ES for short) is a distributed search engine built on Lucene. The core idea is to run multiple ES process instances on multiple machines, which together form an ES cluster.
The basic unit for storing data in ES is the index, roughly equivalent to a database in MySQL.
| ES | MySQL | Notes |
|---|---|---|
| index | database | An index is a collection of documents with a similar structure |
| document (JSON format) | row | |
| field | column | |
| mapping | schema | Constrains the data: field types, default values, analyzers, whether a field is indexed |
Difference between keyword and text
| keyword | text |
|---|---|
| Indexed as-is, without analysis | Analyzed (tokenized) and then indexed |
| Supports fuzzy and exact search | Supports fuzzy and exact search |
| Supports aggregations | Does not support aggregations |
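To make the difference concrete, here is a minimal sketch (the field names author and summary are made up for illustration) that builds a mapping with one keyword field and one text field, using the XContentBuilder API that also appears later in this post:

import org.elasticsearch.common.Strings;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;

XContentBuilder mapping = XContentFactory.jsonBuilder()
        .startObject()
            .startObject("properties")
                // keyword: stored as a single term, good for exact matches and aggregations
                .startObject("author").field("type", "keyword").endObject()
                // text: run through an analyzer, good for full-text search
                .startObject("summary").field("type", "text").endObject()
            .endObject()
        .endObject();
// prints the JSON you would PUT as the index mapping
System.out.println(Strings.toString(mapping));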
Shards and replicas
Sharding: splits an index horizontally into multiple shards, which can be placed on any node in the cluster, similar to horizontal table partitioning in MySQL. The benefits of sharding:
- Index capacity can be scaled out horizontally
- Shards are distributed across multiple nodes, which improves performance/throughput
Replication: creates one or more copies of each shard
- Improves availability when a shard fails
- Searches can run on all replicas in parallel, which increases search throughput
The number of shards and replicas can be specified when an index is created (see the Java sketch below). After the index is created, the number of replicas can be changed at any time, but the number of shards cannot. By default, each Elasticsearch index has 5 shards and 1 replica (since Elasticsearch 7.0, the default is 1 primary shard and 1 replica).
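A minimal sketch of both operations with the RestHighLevelClient that is configured in the Spring Boot section below (this assumes the 6.x-style CreateIndexRequest used elsewhere in this post; the index name blog01 is just an example):

import org.elasticsearch.action.admin.indices.create.CreateIndexRequest;
import org.elasticsearch.action.admin.indices.settings.put.UpdateSettingsRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.common.settings.Settings;

// specify shards and replicas when the index is created
CreateIndexRequest create = new CreateIndexRequest("blog01");
create.settings(Settings.builder()
        .put("index.number_of_shards", 5)
        .put("index.number_of_replicas", 1));
restHighLevelClient.indices().create(create, RequestOptions.DEFAULT);

// the replica count can be changed at any time afterwards...
UpdateSettingsRequest update = new UpdateSettingsRequest("blog01");
update.settings(Settings.builder().put("index.number_of_replicas", 2));
restHighLevelClient.indices().putSettings(update, RequestOptions.DEFAULT);
// ...but the shard count cannot; changing it requires reindexing into a new index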
Basic operations
Integrating ES with Spring Boot
Add the dependency
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>
Create the client connection
@Configuration
public class EsConfig {

    @Value("${spring.elasticsearch.host}")
    private String host;
    @Value("${spring.elasticsearch.port}")
    private Integer port;
    @Value("${spring.elasticsearch.username}")
    private String username;
    @Value("${spring.elasticsearch.password}")
    private String password;

    @Bean
    public RestHighLevelClient restHighLevelClient() {
        // basic-auth credentials for the cluster (see the security section below)
        final CredentialsProvider credentialsProvider = new BasicCredentialsProvider();
        credentialsProvider.setCredentials(AuthScope.ANY,
                new UsernamePasswordCredentials(username, password));

        RestClientBuilder builder = RestClient.builder(new HttpHost(host, port, "http"))
                .setHttpClientConfigCallback(new RestClientBuilder.HttpClientConfigCallback() {
                    @Override
                    public HttpAsyncClientBuilder customizeHttpClient(HttpAsyncClientBuilder httpAsyncClientBuilder) {
                        httpAsyncClientBuilder.disableAuthCaching();
                        return httpAsyncClientBuilder.setDefaultCredentialsProvider(credentialsProvider);
                    }
                });
        return new RestHighLevelClient(builder);
    }
}
Create an index and its mapping
PUT blog
{
  "mappings": {
    "article": {
      "properties": {
        "id": {
          "type": "long",
          "store": true,
          "index": false
        },
        "title": {
          "type": "text",
          "store": true,
          "index": true,
          "analyzer": "standard"
        },
        "content": {
          "type": "text",
          "store": true,
          "index": true,
          "analyzer": "standard"
        }
      }
    }
  }
}
@Autowired
private RestHighLevelClient restHighLevelClient;

@Test
public void testCreateIndex() throws IOException {
    CreateIndexRequest request = new CreateIndexRequest("blog");
    CreateIndexResponse createIndexResponse = restHighLevelClient.indices().create(request, RequestOptions.DEFAULT);
    System.out.println(createIndexResponse.isAcknowledged());
}
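The test above only creates an empty index. To also attach the article mapping from the PUT request, the 6.x-style CreateIndexRequest accepts a mapping source per type — a rough sketch (the test name is made up, and the mapping JSON is a condensed version of the one shown above):

@Test
public void testCreateIndexWithMapping() throws IOException {
    CreateIndexRequest request = new CreateIndexRequest("blog");
    String mappingJson = "{\"properties\":{"
            + "\"id\":{\"type\":\"long\"},"
            + "\"title\":{\"type\":\"text\",\"analyzer\":\"standard\"},"
            + "\"content\":{\"type\":\"text\",\"analyzer\":\"standard\"}}}";
    request.mapping("article", mappingJson, XContentType.JSON);
    CreateIndexResponse response = restHighLevelClient.indices().create(request, RequestOptions.DEFAULT);
    System.out.println(response.isAcknowledged());
}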
Set the mapping after the index is created
# create the index
PUT blog01

# set the mapping
POST blog01/hello/_mapping
{
  "hello": {
    "properties": {
      "id": {
        "type": "long",
        "store": true,
        "index": false
      },
      "title": {
        "type": "text",
        "store": true,
        "index": true,
        "analyzer": "standard"
      },
      "content": {
        "type": "text",
        "store": true,
        "index": true,
        "analyzer": "standard"
      }
    }
  }
}
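The same can be done from the Java client with a put-mapping request — a rough sketch, assuming the 6.x-style org.elasticsearch.action.admin.indices.mapping.put.PutMappingRequest:

PutMappingRequest request = new PutMappingRequest("blog01");
request.type("hello");
request.source("{\"properties\":{"
        + "\"id\":{\"type\":\"long\"},"
        + "\"title\":{\"type\":\"text\",\"analyzer\":\"standard\"},"
        + "\"content\":{\"type\":\"text\",\"analyzer\":\"standard\"}}}", XContentType.JSON);
System.out.println(restHighLevelClient.indices().putMapping(request, RequestOptions.DEFAULT).isAcknowledged());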
Delete an index
DELETE blog01
@Test
public void testDeleteIndex() throws IOException {
    DeleteIndexRequest request = new DeleteIndexRequest("blog01");
    AcknowledgedResponse delete = restHighLevelClient.indices().delete(request, RequestOptions.DEFAULT);
    System.out.println(delete.isAcknowledged());
}
Create a document
POST blog/article/1
{
  "id": 1,
  "title": "es详解",
  "content": "es是一个分布式的搜索引擎"
}
@Test
public void testInsertDoc() throws IOException {
    IndexRequest indexRequest = new IndexRequest("blog");
    indexRequest.type("article");
    indexRequest.id("2");

    Article article = new Article();
    article.setId(2L);
    article.setTitle("java");
    article.setContent("java是一门编程语言");

    indexRequest.source(JSON.toJSONString(article), XContentType.JSON);
    IndexResponse index = restHighLevelClient.index(indexRequest, RequestOptions.DEFAULT);
    System.out.println(index.getResult());
}
Update a document
POST blog/article/1
{
  "id": 1,
  "title": "【修改】es详解",
  "content": "【修改】es是一个分布式的搜索引擎"
}
@Test
public void testUpdateDoc() throws IOException {
    UpdateRequest request = new UpdateRequest("blog", "article", "2");

    // Partial update from a JSON document; null fields are omitted, so they cannot be set to null this way
    // Article article = new Article();
    // article.setId(2L);
    // article.setTitle("[updated]java");
    // article.setContent("[updated]java是一门编程语言");
    //
    // request.doc(JSON.toJSONString(article), XContentType.JSON);
    // UpdateResponse update = restHighLevelClient.update(request, RequestOptions.DEFAULT);
    //
    // System.out.println(update.getResult());

    // With XContentBuilder, null values can be written explicitly
    Article article = new Article();
    article.setId(null);
    XContentBuilder builder = XContentFactory.jsonBuilder()
            .startObject()
            .field("id", article.getId())
            .endObject();
    request.doc(builder);
    UpdateResponse update = restHighLevelClient.update(request, RequestOptions.DEFAULT);
    System.out.println(update.getResult());
}
Delete a document
DELETE blog/article/1
@Test
public void testDeleteDoc() throws IOException {
    DeleteRequest request = new DeleteRequest("blog");
    request.type("article");
    request.id("2");
    DeleteResponse delete = restHighLevelClient.delete(request, RequestOptions.DEFAULT);
    System.out.println(delete.getResult());
}
Bulk insert documents
@Test
public void testBulk() throws IOException {
    // Bulk-index documents; if a document with the same id already exists, it is overwritten
    BulkRequest bulkRequest = new BulkRequest("blog01", "article");

    List<Article> docs = new ArrayList<Article>();
    for (int i = 100; i < 500; i++) {
        Article doc = new Article();
        doc.setId((long) (i + 1));
        doc.setTitle("title" + (i + 1) + ": java详解");
        doc.setContent("content" + (i + 1) + ": java是一门编程语言");
        docs.add(doc);
    }
    for (Article doc : docs) {
        IndexRequest indexRequest = new IndexRequest();
        indexRequest.id(doc.getId().toString());
        indexRequest.source(JSON.toJSONString(doc), XContentType.JSON);
        bulkRequest.add(indexRequest);
    }
    BulkResponse bulk = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
    System.out.println(bulk.hasFailures());
}
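hasFailures() only says whether anything in the batch failed; individual failures can be inspected item by item, for example:

for (BulkItemResponse item : bulk) {
    if (item.isFailed()) {
        System.out.println(item.getId() + " failed: " + item.getFailureMessage());
    }
}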
Get a document by ID
GET blog/article/2
@Test
public void testGetDoc() throws IOException {
    GetRequest request = new GetRequest("blog");
    request.id("2");
    GetResponse getResponse = restHighLevelClient.get(request, RequestOptions.DEFAULT);

    String sourceAsString = getResponse.getSourceAsString();
    Article article = JSONObject.parseObject(sourceAsString, Article.class);
    System.out.println(article);

    Map<String, Object> sourceAsMap = getResponse.getSourceAsMap();
    System.out.println(sourceAsMap.get("content"));
}
Query by field - term query
GET blog/article/_search
{
  "query": {
    "term": {
      "content": "编程"
    }
  }
}
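Note that a term query looks up the query string exactly as given, without analyzing it. If you want the query text to be analyzed the same way as the field, use a match query instead — for example, from the Java client:

MatchQueryBuilder matchQuery = QueryBuilders.matchQuery("content", "编程语言");
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder().query(matchQuery);
SearchRequest searchRequest = new SearchRequest("blog").source(sourceBuilder);
SearchResponse response = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
System.out.println(response.getHits().getTotalHits());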
Pagination and sorting
GET blog/_search
{
  "from": 0,
  "size": 5,
  "query": {
    "match_all": {}
  },
  "sort": [
    { "id": "desc" }
  ]
}
The response starts with some metadata (truncated here):
{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  }
}
- took: how long the query took to run, in milliseconds
- timed_out: whether the request timed out
- _shards: how many shards were searched, with a breakdown of how many succeeded, failed, or were skipped
@Test
public void testSearch() throws IOException {
    SearchRequest request = new SearchRequest("blog01");
    TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("content", "分布式");
    SearchSourceBuilder builder = new SearchSourceBuilder();

    // highlighting
    HighlightBuilder highlightBuilder = new HighlightBuilder();
    highlightBuilder.field("content");
    highlightBuilder.preTags("<font style='color:red'>");
    highlightBuilder.postTags("</font>");
    builder.highlighter(highlightBuilder);

    builder.timeout(new TimeValue(60, TimeUnit.SECONDS));
    // pagination
    builder.from(0);
    builder.size(5);

    builder.query(termQueryBuilder);
    request.source(builder);
    SearchResponse search = restHighLevelClient.search(request, RequestOptions.DEFAULT);

    System.out.println("total hits: " + search.getHits().getTotalHits());
    SearchHits hits = search.getHits();
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString());
        Text[] contents = hit.getHighlightFields().get("content").getFragments();
        for (Text content : contents) {
            System.out.println(content);
        }
    }
}
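The REST example above also sorts by id; with the Java client the same sort (and paging) can be added to the SearchSourceBuilder, e.g.:

// org.elasticsearch.search.sort.SortOrder
builder.sort("id", SortOrder.DESC);
builder.from(0);
builder.size(5);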
Inverted index
ES uses an inverted index to speed up queries and searches. Instead of going from a record to its attribute values, an inverted index goes from an attribute value to the records that contain it.
| Forward index | Inverted index |
|---|---|
| Maps a document ID to the document's content and terms | Maps a term to the IDs of the documents that contain it |
| Like a book's table of contents | Like the index pages at the back of a book |
Core components of an inverted index (a toy sketch follows this list):
- Term dictionary: records every term that appears in the documents (usually quite large), together with a pointer from each term to its posting list
- Posting list: records the set of documents that contain a term; it is made up of posting entries:
  - Document ID, used to fetch the original document
  - Term frequency (TF), how many times the term appears in that document, used later for relevance scoring
  - Position, the position of the term among the document's tokens, used for phrase queries
  - Offset, the start and end offsets of the term in the document, used for highlighting
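A toy sketch of the idea in plain Java (not ES code): for each term we keep a posting list of (document ID, positions), which is enough to answer "which documents contain this term, and where":

import java.util.*;

public class TinyInvertedIndex {
    // term -> docId -> positions of the term in that document
    private final Map<String, Map<Integer, List<Integer>>> postings = new HashMap<>();

    public void addDocument(int docId, String[] tokens) {
        for (int pos = 0; pos < tokens.length; pos++) {
            postings.computeIfAbsent(tokens[pos], t -> new HashMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(pos);
        }
    }

    // which documents contain the term (term frequency = positions.size())
    public Map<Integer, List<Integer>> search(String term) {
        return postings.getOrDefault(term, Collections.emptyMap());
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.addDocument(1, new String[]{"java", "is", "a", "programming", "language"});
        index.addDocument(2, new String[]{"es", "is", "a", "search", "engine"});
        System.out.println(index.search("is")); // {1=[1], 2=[1]}
    }
}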
Notes
Setting a password for ES
- Edit the ES config file elasticsearch.yml and restart ES:
  xpack.security.enabled: true
  xpack.license.self_generated.type: basic
  xpack.security.transport.ssl.enabled: true
- Go into the bin directory and run elasticsearch-setup-passwords interactive to initialize the passwords. This sets passwords for the built-in users elastic, apm_system, kibana, logstash_system, beats_system, and remote_monitoring_user.
- Configure the username and password for Kibana in kibana.yml:
  elasticsearch.username: "kibana"
  elasticsearch.password: "12345678"
The IK analyzer
The default analyzer in ES is the standard analyzer; its effect on Chinese text looks like this:
GET /_analyze?pretty=true
{
  "analyzer": "standard",
  "text": "爪洼编程语言"
}
// 爪 洼 编 程 语 言
The standard analyzer handles Chinese poorly: it splits the text into single characters instead of the words we expect (爪洼, 编程, 语言). For Chinese you need a different analyzer, the IK analyzer, which provides two algorithms: ik_smart (coarse-grained splitting) and ik_max_word (finest-grained splitting, producing overlapping terms).
GET /_analyze?pretty=true
{
  "analyzer": "ik_smart",
  "text": "爪洼是一门编程语言"
}
// 爪 洼 是 一门 编程 语言

GET /_analyze?pretty=true
{
  "analyzer": "ik_max_word",
  "text": "爪洼是一门编程语言"
}
// 爪 洼 是 一门 一 门 编程 语言
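For completeness, the same analysis can be run from the Java client. A rough sketch, assuming the 6.x-style org.elasticsearch.action.admin.indices.analyze.AnalyzeRequest and that the IK plugin is installed on the cluster:

AnalyzeRequest analyzeRequest = new AnalyzeRequest()
        .analyzer("ik_smart")
        .text("爪洼是一门编程语言");
AnalyzeResponse analyzeResponse = restHighLevelClient.indices().analyze(analyzeRequest, RequestOptions.DEFAULT);
for (AnalyzeResponse.AnalyzeToken token : analyzeResponse.getTokens()) {
    System.out.println(token.getTerm());
}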