Basic concepts
ElasticSearch (ES for short) is a distributed search engine built on Lucene. The core idea is to run multiple ES process instances on multiple machines, which together form an ES cluster.
The basic unit for storing data in ES is the index, roughly equivalent to a database in MySQL.
| ES | MySQL | Notes |
|---|---|---|
| index | database | An index is a collection of documents with a similar structure |
| document (JSON format) | row | |
| field | column | |
| mapping | schema | Constrains the data: field types, default values, analyzers, whether a field is indexed |
Difference between keyword and text
| keyword | text |
|---|---|
| Indexed as-is, without analysis | Analyzed (tokenized) and then indexed |
| Supports fuzzy and exact search | Supports fuzzy and exact search |
| Supports aggregations | Does not support aggregations |
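To make the difference concrete, here is a minimal sketch (the field names author and summary are made up for illustration) that builds a mapping with one keyword field and one text field, using the XContentBuilder API that also appears later in this post:

import org.elasticsearch.common.Strings;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;

XContentBuilder mapping = XContentFactory.jsonBuilder()
        .startObject()
            .startObject("properties")
                // keyword: stored as a single term, good for exact matches and aggregations
                .startObject("author").field("type", "keyword").endObject()
                // text: run through an analyzer, good for full-text search
                .startObject("summary").field("type", "text").endObject()
            .endObject()
        .endObject();
// prints the JSON you would PUT as the index mapping
System.out.println(Strings.toString(mapping));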
Shards and replicas
Sharding: splits an index horizontally into multiple shards, which can be placed on any node in the cluster, similar to horizontal table partitioning in MySQL. The benefits of sharding:
- Index capacity can be scaled out horizontally
- Shards are distributed across multiple nodes, which improves performance/throughput
Replication: creates one or more copies of each shard
- Improves availability when a shard fails
- Searches can run on all replicas in parallel, which increases search throughput
The number of shards and replicas can be specified when an index is created (see the Java sketch below). After the index is created, the number of replicas can be changed at any time, but the number of shards cannot. By default, each Elasticsearch index has 5 shards and 1 replica (since Elasticsearch 7.0, the default is 1 primary shard and 1 replica).
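A minimal sketch of both operations with the RestHighLevelClient that is configured in the Spring Boot section below (this assumes the 6.x-style CreateIndexRequest used elsewhere in this post; the index name blog01 is just an example):

import org.elasticsearch.action.admin.indices.create.CreateIndexRequest;
import org.elasticsearch.action.admin.indices.settings.put.UpdateSettingsRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.common.settings.Settings;

// specify shards and replicas when the index is created
CreateIndexRequest create = new CreateIndexRequest("blog01");
create.settings(Settings.builder()
        .put("index.number_of_shards", 5)
        .put("index.number_of_replicas", 1));
restHighLevelClient.indices().create(create, RequestOptions.DEFAULT);

// the replica count can be changed at any time afterwards...
UpdateSettingsRequest update = new UpdateSettingsRequest("blog01");
update.settings(Settings.builder().put("index.number_of_replicas", 2));
restHighLevelClient.indices().putSettings(update, RequestOptions.DEFAULT);
// ...but the shard count cannot; changing it requires reindexing into a new index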
Basic operations
Integrating ES with Spring Boot
Add the dependency
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>
Create the client connection
@Configuration
public class EsConfig {

    @Value("${spring.elasticsearch.host}")
    private String host;
    @Value("${spring.elasticsearch.port}")
    private Integer port;
    @Value("${spring.elasticsearch.username}")
    private String username;
    @Value("${spring.elasticsearch.password}")
    private String password;

    @Bean
    public RestHighLevelClient restHighLevelClient() {
        // basic-auth credentials for the cluster (see the security section below)
        final CredentialsProvider credentialsProvider = new BasicCredentialsProvider();
        credentialsProvider.setCredentials(AuthScope.ANY,
                new UsernamePasswordCredentials(username, password));

        RestClientBuilder builder = RestClient.builder(new HttpHost(host, port, "http"))
                .setHttpClientConfigCallback(new RestClientBuilder.HttpClientConfigCallback() {
                    @Override
                    public HttpAsyncClientBuilder customizeHttpClient(HttpAsyncClientBuilder httpAsyncClientBuilder) {
                        httpAsyncClientBuilder.disableAuthCaching();
                        return httpAsyncClientBuilder.setDefaultCredentialsProvider(credentialsProvider);
                    }
                });
        return new RestHighLevelClient(builder);
    }
}
Create an index and its mapping
PUT blog
{
  "mappings": {
    "article": {
      "properties": {
        "id": {
          "type": "long",
          "store": true,
          "index": false
        },
        "title": {
          "type": "text",
          "store": true,
          "index": true,
          "analyzer": "standard"
        },
        "content": {
          "type": "text",
          "store": true,
          "index": true,
          "analyzer": "standard"
        }
      }
    }
  }
}
@Autowired
private RestHighLevelClient restHighLevelClient;

@Test
public void testCreateIndex() throws IOException {
    CreateIndexRequest request = new CreateIndexRequest("blog");
    CreateIndexResponse createIndexResponse = restHighLevelClient.indices().create(request, RequestOptions.DEFAULT);
    System.out.println(createIndexResponse.isAcknowledged());
}
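The test above only creates an empty index. To also attach the article mapping from the PUT request, the 6.x-style CreateIndexRequest accepts a mapping source per type — a rough sketch (the test name is made up, and the mapping JSON is a condensed version of the one shown above):

@Test
public void testCreateIndexWithMapping() throws IOException {
    CreateIndexRequest request = new CreateIndexRequest("blog");
    String mappingJson = "{\"properties\":{"
            + "\"id\":{\"type\":\"long\"},"
            + "\"title\":{\"type\":\"text\",\"analyzer\":\"standard\"},"
            + "\"content\":{\"type\":\"text\",\"analyzer\":\"standard\"}}}";
    request.mapping("article", mappingJson, XContentType.JSON);
    CreateIndexResponse response = restHighLevelClient.indices().create(request, RequestOptions.DEFAULT);
    System.out.println(response.isAcknowledged());
}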
Set the mapping after the index is created
# create the index
PUT blog01

# set the mapping
POST blog01/hello/_mapping
{
  "hello": {
    "properties": {
      "id": {
        "type": "long",
        "store": true,
        "index": false
      },
      "title": {
        "type": "text",
        "store": true,
        "index": true,
        "analyzer": "standard"
      },
      "content": {
        "type": "text",
        "store": true,
        "index": true,
        "analyzer": "standard"
      }
    }
  }
}
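The same can be done from the Java client with a put-mapping request — a rough sketch, assuming the 6.x-style org.elasticsearch.action.admin.indices.mapping.put.PutMappingRequest:

PutMappingRequest request = new PutMappingRequest("blog01");
request.type("hello");
request.source("{\"properties\":{"
        + "\"id\":{\"type\":\"long\"},"
        + "\"title\":{\"type\":\"text\",\"analyzer\":\"standard\"},"
        + "\"content\":{\"type\":\"text\",\"analyzer\":\"standard\"}}}", XContentType.JSON);
System.out.println(restHighLevelClient.indices().putMapping(request, RequestOptions.DEFAULT).isAcknowledged());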
Delete an index
DELETE blog01
@Test
public void testDeleteIndex() throws IOException {
    DeleteIndexRequest request = new DeleteIndexRequest("blog01");
    AcknowledgedResponse delete = restHighLevelClient.indices().delete(request, RequestOptions.DEFAULT);
    System.out.println(delete.isAcknowledged());
}
Create a document
POST blog/article/1
{
  "id": 1,
  "title": "es详解",
  "content": "es是一个分布式的搜索引擎"
}
@Test
public void testInsertDoc() throws IOException {
    IndexRequest indexRequest = new IndexRequest("blog");
    indexRequest.type("article");
    indexRequest.id("2");

    Article article = new Article();
    article.setId(2L);
    article.setTitle("java");
    article.setContent("java是一门编程语言");

    indexRequest.source(JSON.toJSONString(article), XContentType.JSON);
    IndexResponse index = restHighLevelClient.index(indexRequest, RequestOptions.DEFAULT);
    System.out.println(index.getResult());
}
Update a document
POST blog/article/1
{
  "id": 1,
  "title": "【修改】es详解",
  "content": "【修改】es是一个分布式的搜索引擎"
}
@Test
public void testUpdateDoc() throws IOException {
    UpdateRequest request = new UpdateRequest("blog", "article", "2");

    // Partial update from a JSON document; null fields are omitted, so they cannot be set to null this way
    // Article article = new Article();
    // article.setId(2L);
    // article.setTitle("[updated]java");
    // article.setContent("[updated]java是一门编程语言");
    //
    // request.doc(JSON.toJSONString(article), XContentType.JSON);
    // UpdateResponse update = restHighLevelClient.update(request, RequestOptions.DEFAULT);
    //
    // System.out.println(update.getResult());

    // With XContentBuilder, null values can be written explicitly
    Article article = new Article();
    article.setId(null);
    XContentBuilder builder = XContentFactory.jsonBuilder()
            .startObject()
            .field("id", article.getId())
            .endObject();
    request.doc(builder);
    UpdateResponse update = restHighLevelClient.update(request, RequestOptions.DEFAULT);
    System.out.println(update.getResult());
}
Delete a document
DELETE blog/article/1
@Test
public void testDeleteDoc() throws IOException {
    DeleteRequest request = new DeleteRequest("blog");
    request.type("article");
    request.id("2");
    DeleteResponse delete = restHighLevelClient.delete(request, RequestOptions.DEFAULT);
    System.out.println(delete.getResult());
}
Bulk insert documents
@Test
public void testBulk() throws IOException {
    // Bulk-index documents; if a document with the same id already exists, it is overwritten
    BulkRequest bulkRequest = new BulkRequest("blog01", "article");

    List<Article> docs = new ArrayList<Article>();
    for (int i = 100; i < 500; i++) {
        Article doc = new Article();
        doc.setId((long) (i + 1));
        doc.setTitle("title" + (i + 1) + ": java详解");
        doc.setContent("content" + (i + 1) + ": java是一门编程语言");
        docs.add(doc);
    }
    for (Article doc : docs) {
        IndexRequest indexRequest = new IndexRequest();
        indexRequest.id(doc.getId().toString());
        indexRequest.source(JSON.toJSONString(doc), XContentType.JSON);
        bulkRequest.add(indexRequest);
    }
    BulkResponse bulk = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
    System.out.println(bulk.hasFailures());
}
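hasFailures() only says whether anything in the batch failed; individual failures can be inspected item by item, for example:

for (BulkItemResponse item : bulk) {
    if (item.isFailed()) {
        System.out.println(item.getId() + " failed: " + item.getFailureMessage());
    }
}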
Get a document by ID
GET blog/article/2
@Test
public void testGetDoc() throws IOException {
    GetRequest request = new GetRequest("blog");
    request.id("2");
    GetResponse getResponse = restHighLevelClient.get(request, RequestOptions.DEFAULT);

    String sourceAsString = getResponse.getSourceAsString();
    Article article = JSONObject.parseObject(sourceAsString, Article.class);
    System.out.println(article);

    Map<String, Object> sourceAsMap = getResponse.getSourceAsMap();
    System.out.println(sourceAsMap.get("content"));
}
Query by field - term query
GET blog/article/_search
{
  "query": {
    "term": {
      "content": "编程"
    }
  }
}
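Note that a term query looks up the query string exactly as given, without analyzing it. If you want the query text to be analyzed the same way as the field, use a match query instead — for example, from the Java client:

MatchQueryBuilder matchQuery = QueryBuilders.matchQuery("content", "编程语言");
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder().query(matchQuery);
SearchRequest searchRequest = new SearchRequest("blog").source(sourceBuilder);
SearchResponse response = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
System.out.println(response.getHits().getTotalHits());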
Pagination and sorting
GET blog/_search
{
  "from": 0,
  "size": 5,
  "query": {
    "match_all": {}
  },
  "sort": [
    { "id": "desc" }
  ]
}
The response starts with some metadata (truncated here):
{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  }
}
- took: how long the query took to run, in milliseconds
- timed_out: whether the request timed out
- _shards: how many shards were searched, with a breakdown of how many succeeded, failed, or were skipped
@Test
public void testSearch() throws IOException {
    SearchRequest request = new SearchRequest("blog01");
    TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("content", "分布式");
    SearchSourceBuilder builder = new SearchSourceBuilder();

    // highlighting
    HighlightBuilder highlightBuilder = new HighlightBuilder();
    highlightBuilder.field("content");
    highlightBuilder.preTags("<font style='color:red'>");
    highlightBuilder.postTags("</font>");
    builder.highlighter(highlightBuilder);

    builder.timeout(new TimeValue(60, TimeUnit.SECONDS));
    // pagination
    builder.from(0);
    builder.size(5);

    builder.query(termQueryBuilder);
    request.source(builder);
    SearchResponse search = restHighLevelClient.search(request, RequestOptions.DEFAULT);

    System.out.println("total hits: " + search.getHits().getTotalHits());
    SearchHits hits = search.getHits();
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString());
        Text[] contents = hit.getHighlightFields().get("content").getFragments();
        for (Text content : contents) {
            System.out.println(content);
        }
    }
}
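The REST example above also sorts by id; with the Java client the same sort (and paging) can be added to the SearchSourceBuilder, e.g.:

// org.elasticsearch.search.sort.SortOrder
builder.sort("id", SortOrder.DESC);
builder.from(0);
builder.size(5);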
Inverted index
ES uses an inverted index to speed up queries and searches. Instead of going from a record to its attribute values, an inverted index goes from an attribute value to the records that contain it.
| Forward index | Inverted index |
|---|---|
| Maps a document ID to the document's content and terms | Maps a term to the IDs of the documents that contain it |
| Like a book's table of contents | Like the index pages at the back of a book |
Core components of an inverted index (a toy sketch follows this list):
- Term dictionary: records every term that appears in the documents (usually quite large), together with a pointer from each term to its posting list
- Posting list: records the set of documents that contain a term; it is made up of posting entries:
  - Document ID, used to fetch the original document
  - Term frequency (TF), how many times the term appears in that document, used later for relevance scoring
  - Position, the position of the term among the document's tokens, used for phrase queries
  - Offset, the start and end offsets of the term in the document, used for highlighting
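A toy sketch of the idea in plain Java (not ES code): for each term we keep a posting list of (document ID, positions), which is enough to answer "which documents contain this term, and where":

import java.util.*;

public class TinyInvertedIndex {
    // term -> docId -> positions of the term in that document
    private final Map<String, Map<Integer, List<Integer>>> postings = new HashMap<>();

    public void addDocument(int docId, String[] tokens) {
        for (int pos = 0; pos < tokens.length; pos++) {
            postings.computeIfAbsent(tokens[pos], t -> new HashMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(pos);
        }
    }

    // which documents contain the term (term frequency = positions.size())
    public Map<Integer, List<Integer>> search(String term) {
        return postings.getOrDefault(term, Collections.emptyMap());
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.addDocument(1, new String[]{"java", "is", "a", "programming", "language"});
        index.addDocument(2, new String[]{"es", "is", "a", "search", "engine"});
        System.out.println(index.search("is")); // {1=[1], 2=[1]}
    }
}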
Notes
Setting a password for ES
- Edit the ES config file elasticsearch.yml and restart ES:
  xpack.security.enabled: true
  xpack.license.self_generated.type: basic
  xpack.security.transport.ssl.enabled: true
- Go into the bin directory and run elasticsearch-setup-passwords interactive to initialize the passwords. This sets passwords for the built-in users elastic, apm_system, kibana, logstash_system, beats_system, and remote_monitoring_user.
- Configure the username and password for Kibana in kibana.yml:
  elasticsearch.username: "kibana"
  elasticsearch.password: "12345678"
The IK analyzer
The default analyzer in ES is the standard analyzer; its effect on Chinese text looks like this:
GET /_analyze?pretty=true
{
  "analyzer": "standard",
  "text": "爪洼编程语言"
}
// 爪 洼 编 程 语 言
The standard analyzer handles Chinese poorly: it splits the text into single characters instead of the words we expect (爪洼, 编程, 语言). For Chinese you need a different analyzer, the IK analyzer, which provides two algorithms: ik_smart (coarse-grained splitting) and ik_max_word (finest-grained splitting, producing overlapping terms).
GET /_analyze?pretty=true
{
  "analyzer": "ik_smart",
  "text": "爪洼是一门编程语言"
}
// 爪 洼 是 一门 编程 语言

GET /_analyze?pretty=true
{
  "analyzer": "ik_max_word",
  "text": "爪洼是一门编程语言"
}
// 爪 洼 是 一门 一 门 编程 语言
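For completeness, the same analysis can be run from the Java client. A rough sketch, assuming the 6.x-style org.elasticsearch.action.admin.indices.analyze.AnalyzeRequest and that the IK plugin is installed on the cluster:

AnalyzeRequest analyzeRequest = new AnalyzeRequest()
        .analyzer("ik_smart")
        .text("爪洼是一门编程语言");
AnalyzeResponse analyzeResponse = restHighLevelClient.indices().analyze(analyzeRequest, RequestOptions.DEFAULT);
for (AnalyzeResponse.AnalyzeToken token : analyzeResponse.getTokens()) {
    System.out.println(token.getTerm());
}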