Elasticsearch Explained (Part 1)

Basic Concepts

Elasticsearch (ES for short) is a distributed search engine built on Lucene. The core idea is to run multiple ES process instances on multiple machines, which together form an ES cluster.

The basic unit of data storage in ES is the index, roughly the counterpart of a database in MySQL.

| ES | MySQL | Notes |
| --- | --- | --- |
| index | database | An index is a collection of documents with a similar structure |
| document (JSON format) | row | |
| field | column | |
| mapping | schema | Constrains the data: field types, default values, analyzers, whether a field is indexed |
The difference between keyword and text

| keyword | text |
| --- | --- |
| Indexed directly, without analysis | Analyzed into terms, then indexed |
| Supports fuzzy and exact search | Supports fuzzy and exact search |
| Supports aggregations | Does not support aggregations |
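The contrast shows up in the mapping. A minimal console sketch (the index name `user`, its type, and its fields are hypothetical): `status` is stored as a single unanalyzed term, suitable for exact matching and aggregations, while `remark` is broken into terms by the standard analyzer before indexing:

```
PUT user
{
  "mappings": {
    "info": {
      "properties": {
        "status": { "type": "keyword" },
        "remark": { "type": "text", "analyzer": "standard" }
      }
    }
  }
}
```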
Shards and Replicas

Sharding: splits an index horizontally into multiple pieces that can be placed on any node in the cluster, similar to horizontal table splitting in MySQL. The benefits of sharding:

  • Capacity can be scaled out horizontally
  • Shards are distributed across multiple nodes, which improves performance/throughput

Replication: creates one or more copies of each shard

  • Improves availability when a shard fails
  • Searches can run in parallel across all replicas, which increases search throughput

The number of shards and replicas can be specified when an index is created. After creation you can dynamically change the number of replicas at any time, but you cannot change the number of shards afterwards. By default, each index in Elasticsearch gets 5 shards and 1 replica (since Elasticsearch 7.0 the default is 1 shard and 1 replica).
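In console syntax this looks as follows (a sketch; the index name blog02 is hypothetical). The counts go under settings at creation time, and only the replica count can be changed later via the _settings endpoint:

```
PUT blog02
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}

# The replica count can be changed at any time; the shard count cannot
PUT blog02/_settings
{
  "number_of_replicas": 2
}
```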

Basic Operations
Integrating ES with Spring Boot

Add the dependency

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>

Create a client connection

import org.apache.http.HttpHost;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.CredentialsProvider;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class EsConfig {

    @Value("${spring.elasticsearch.host}")
    private String host;

    @Value("${spring.elasticsearch.port}")
    private Integer port;

    @Value("${spring.elasticsearch.username}")
    private String username;

    @Value("${spring.elasticsearch.password}")
    private String password;

    @Bean
    public RestHighLevelClient restHighLevelClient() {
        // Basic authentication for a secured cluster
        final CredentialsProvider credentialsProvider = new BasicCredentialsProvider();
        credentialsProvider.setCredentials(AuthScope.ANY,
                new UsernamePasswordCredentials(username, password));

        RestClientBuilder builder = RestClient.builder(new HttpHost(host, port, "http"))
                .setHttpClientConfigCallback(httpAsyncClientBuilder -> {
                    httpAsyncClientBuilder.disableAuthCaching();
                    return httpAsyncClientBuilder.setDefaultCredentialsProvider(credentialsProvider);
                });
        return new RestHighLevelClient(builder);
    }
}
Create an index with a mapping
PUT blog
{
  "mappings": {
    "article": {
      "properties": {
        "id": {
          "type": "long",
          "store": true,
          "index": false
        },
        "title": {
          "type": "text",
          "store": true,
          "index": true,
          "analyzer": "standard"
        },
        "content": {
          "type": "text",
          "store": true,
          "index": true,
          "analyzer": "standard"
        }
      }
    }
  }
}
@Autowired
private RestHighLevelClient restHighLevelClient;

@Test
public void testCreateIndex() throws IOException {
    // Creates an empty index named "blog"; the mapping can be set afterwards
    CreateIndexRequest request = new CreateIndexRequest("blog");
    CreateIndexResponse createIndexResponse = restHighLevelClient.indices().create(request, RequestOptions.DEFAULT);
    System.out.println(createIndexResponse.isAcknowledged());
}
Set the mapping after creating the index
# Create the index
PUT blog01

# Set the mapping
POST blog01/hello/_mapping
{
  "hello": {
    "properties": {
      "id": {
        "type": "long",
        "store": true,
        "index": false
      },
      "title": {
        "type": "text",
        "store": true,
        "index": true,
        "analyzer": "standard"
      },
      "content": {
        "type": "text",
        "store": true,
        "index": true,
        "analyzer": "standard"
      }
    }
  }
}
Delete an index
DELETE blog01
@Test
public void testDeleteIndex() throws IOException {
    DeleteIndexRequest request = new DeleteIndexRequest("blog01");
    AcknowledgedResponse delete = restHighLevelClient.indices().delete(request, RequestOptions.DEFAULT);
    System.out.println(delete.isAcknowledged());
}
Create a document
POST blog/article/1
{
  "id": 1,
  "title": "es详解",
  "content": "es是一个分布式的搜索引擎"
}
@Test
public void testInsertDoc() throws IOException {
    IndexRequest indexRequest = new IndexRequest("blog");
    indexRequest.type("article");
    indexRequest.id("2");

    Article article = new Article();
    article.setId(2L);
    article.setTitle("java");
    article.setContent("java是一门编程语言");

    indexRequest.source(JSON.toJSONString(article), XContentType.JSON);

    IndexResponse index = restHighLevelClient.index(indexRequest, RequestOptions.DEFAULT);
    System.out.println(index.getResult());
}
Update a document
POST blog/article/1
{
  "id": 1,
  "title": "【修改】es详解",
  "content": "【修改】es是一个分布式的搜索引擎"
}
@Test
public void testUpdateDoc() throws IOException {
    UpdateRequest request = new UpdateRequest("blog", "article", "2");

    // Update from a JSON document; fields that are null are skipped, so nulls cannot be written this way
//        Article article = new Article();
//        article.setId(2L);
//        article.setTitle("[修改]java");
//        article.setContent("[修改]java是一门编程语言");
//
//        request.doc(JSON.toJSONString(article), XContentType.JSON);
//        UpdateResponse update = restHighLevelClient.update(request, RequestOptions.DEFAULT);
//
//        System.out.println(update.getResult());

    // XContentBuilder can write null values
    Article article = new Article();
    article.setId(null);
    XContentBuilder builder = XContentFactory.jsonBuilder()
            .startObject()
            .field("id", article.getId())
            .endObject();

    request.doc(builder);

    UpdateResponse update = restHighLevelClient.update(request, RequestOptions.DEFAULT);
    System.out.println(update.getResult());
}
Delete a document
DELETE blog/article/1
@Test
public void testDeleteDoc() throws IOException {
     DeleteRequest request = new DeleteRequest("blog");
     request.type("article");
     request.id("2");

     DeleteResponse delete = restHighLevelClient.delete(request, RequestOptions.DEFAULT);
     System.out.println(delete.getResult());
 }
Bulk-save data
@Test
public void testBulk() throws IOException {
    // Bulk-index documents; an existing document with the same ID is overwritten
    BulkRequest bulkRequest = new BulkRequest("blog01", "article");
    List<Article> docs = new ArrayList<Article>();
    for (int i = 100; i < 500; i++) {
        Article doc = new Article();
        doc.setId((long) (i + 1));
        doc.setTitle("title" + (i + 1) + ": java详解");
        doc.setContent("content" + (i + 1) + ": java是一门编程语言");
        docs.add(doc);
    }

    for (Article doc : docs) {
        IndexRequest indexRequest = new IndexRequest();
        indexRequest.id(doc.getId().toString());
        indexRequest.source(JSON.toJSONString(doc), XContentType.JSON);
        bulkRequest.add(indexRequest);
    }
    BulkResponse bulk = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
    System.out.println(bulk.hasFailures());
}
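The Java bulk call above corresponds to the REST _bulk API, which takes newline-delimited JSON with alternating action and source lines. A sketch against the same blog01 index (IDs and content are illustrative):

```
POST _bulk
{ "index": { "_index": "blog01", "_type": "article", "_id": "101" } }
{ "id": 101, "title": "title101: java详解", "content": "content101: java是一门编程语言" }
{ "index": { "_index": "blog01", "_type": "article", "_id": "102" } }
{ "id": 102, "title": "title102: java详解", "content": "content102: java是一门编程语言" }
```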
Query data by ID
GET blog/article/2
@Test
public void testGetDoc() throws IOException {
    GetRequest request = new GetRequest("blog");
    request.id("2");

    GetResponse getResponse = restHighLevelClient.get(request, RequestOptions.DEFAULT);

    String sourceAsString = getResponse.getSourceAsString();
    Article article = JSONObject.parseObject(sourceAsString, Article.class);
    System.out.println(article);

    Map<String, Object> sourceAsMap = getResponse.getSourceAsMap();
    System.out.println(sourceAsMap.get("content"));
}
Query by field: term
GET blog/article/_search
{
  "query": {
    "term": {
       "content": "编程"
    }
  }
}
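Note that term does not analyze the query string: it looks up 编程 as one exact term, which only matches if the field's analyzer produced that term at index time (with the standard analyzer, Chinese text is split into single characters, so the term 编程 is never indexed). By contrast, match runs the query string through the field's analyzer first:

```
GET blog/article/_search
{
  "query": {
    "match": {
      "content": "编程"
    }
  }
}
```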
Pagination and sorting
GET blog/_search
{
  "query": {
    "match_all": {}
  },
  "from": 0,
  "size": 10,
  "sort": [
    {  "id": "desc" }
  ]
}
{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  }
}
  • took: how long the query took to run, in milliseconds
  • timed_out: whether the request timed out
  • _shards: how many shards were searched, with a breakdown of how many succeeded, failed, or were skipped
@Test
public void testSearch() throws IOException {
    SearchRequest request = new SearchRequest("blog01");
    TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("content", "分布式");
    SearchSourceBuilder builder = new SearchSourceBuilder();
    // Configure highlighting
    HighlightBuilder highlightBuilder = new HighlightBuilder();
    highlightBuilder.field("content");
    highlightBuilder.preTags("<font style='color:red'>");
    highlightBuilder.postTags("</font>");
    builder.highlighter(highlightBuilder);
    builder.timeout(new TimeValue(60, TimeUnit.SECONDS));
    // Pagination
    builder.from(0);
    builder.size(5);
    builder.query(termQueryBuilder);
    request.source(builder);
    SearchResponse search = restHighLevelClient.search(request, RequestOptions.DEFAULT);

    System.out.println("Total hits: " + search.getHits().getTotalHits());
    SearchHits hits = search.getHits();
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString());
        Text[] contents = hit.getHighlightFields().get("content").getFragments();
        for (Text content : contents) {
            System.out.println(content);
        }
    }
}
Inverted Index

ES uses an inverted index to speed up queries and searches. Instead of going from a record to its attribute values, an inverted index goes from an attribute value to the records that contain it.

| Forward index | Inverted index |
| --- | --- |
| From document ID to document content and terms | From term to document IDs |
| Like a book's table of contents | Like a book's index page |

Core components of an inverted index

  1. Term dictionary: records all terms across the documents and is usually fairly large; it also records, for each term, a pointer to its postings list
  2. Postings list: records the set of documents containing a term; it is made up of postings, each holding:
    • Document ID, used to retrieve the original document
    • Term frequency (TF), the number of times the term occurs in the document, used later for relevance scoring
    • Position, the term's token position within the document, used for phrase queries
    • Offset, the term's start and end character positions in the document, used for highlighting
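The posting structure above can be sketched as a toy inverted index in Java. This is illustrative only: whitespace splitting stands in for a real analyzer, and offsets are omitted for brevity.

```java
import java.util.*;

// A minimal sketch of an inverted index: term -> postings list of
// (docId, term frequency, positions).
public class InvertedIndexDemo {

    static class Posting {
        final int docId;
        int tf;                                             // occurrences of the term in this doc
        final List<Integer> positions = new ArrayList<>();  // token positions, for phrase queries
        Posting(int docId) { this.docId = docId; }
    }

    final Map<String, List<Posting>> index = new HashMap<>();

    // Tokenize on whitespace (a stand-in for a real analyzer) and index each term.
    // Assumes documents are added in increasing docId order.
    void addDocument(int docId, String text) {
        String[] tokens = text.toLowerCase().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            List<Posting> postings = index.computeIfAbsent(tokens[pos], t -> new ArrayList<>());
            Posting last = postings.isEmpty() ? null : postings.get(postings.size() - 1);
            Posting p = (last != null && last.docId == docId) ? last : new Posting(docId);
            if (p != last) postings.add(p);
            p.tf++;
            p.positions.add(pos);
        }
    }

    // Look up the IDs of the documents containing a term.
    List<Integer> search(String term) {
        List<Integer> ids = new ArrayList<>();
        for (Posting p : index.getOrDefault(term.toLowerCase(), Collections.emptyList())) {
            ids.add(p.docId);
        }
        return ids;
    }

    public static void main(String[] args) {
        InvertedIndexDemo idx = new InvertedIndexDemo();
        idx.addDocument(1, "java is a programming language");
        idx.addDocument(2, "elasticsearch is built on lucene");
        System.out.println(idx.search("is"));      // -> [1, 2]
        System.out.println(idx.search("lucene"));  // -> [2]
    }
}
```

A real Lucene segment stores the term dictionary as an FST and compresses the postings, but the lookup direction (term to documents) is the same.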
Notes
Setting a password for ES
  1. Edit the ES config file elasticsearch.yml and restart ES:
xpack.security.enabled: true
xpack.license.self_generated.type: basic
xpack.security.transport.ssl.enabled: true
  2. In the bin directory, run elasticsearch-setup-passwords interactive to initialize the passwords for the built-in users: elastic, apm_system, kibana, logstash_system, beats_system, remote_monitoring_user

  3. Configure the username and password for Kibana in kibana.yml:

elasticsearch.username: "kibana"
elasticsearch.password: "12345678"
IK Analyzer

The default analyzer in ES is the standard analyzer; its output looks like this:

GET /_analyze?pretty=true
{
  "analyzer": "standard",
  "text": "爪洼编程语言"
}
// 爪 洼 编 程 语 言

The standard analyzer has poor support for Chinese: it does not tokenize 爪洼编程语言 the way we want. For that we need a different analyzer, the IK analyzer, which provides two tokenization algorithms: ik_smart and ik_max_word.

GET /_analyze?pretty=true
{
  "analyzer": "ik_smart",
  "text": "爪洼是一门编程语言"
}
// 爪 洼 是 一门 编程 语言
GET /_analyze?pretty=true
{
  "analyzer": "ik_max_word",
  "text": "爪洼是一门编程语言"
}
// 爪 洼 是 一门 一 门 编程 语言
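Once the IK plugin (elasticsearch-analysis-ik) is installed, the analyzer is chosen per field in the mapping. A common sketch (the index name blog02 is hypothetical) uses ik_max_word at index time, to generate the most terms, and ik_smart at search time, for coarser query terms:

```
PUT blog02
{
  "mappings": {
    "article": {
      "properties": {
        "content": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_smart"
        }
      }
    }
  }
}
```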