Distributed Search Engine ElasticSearch

Article link: https://blog.csdn.net/qq_36092584/article/details/101212162

Search falls into two broad categories:
1) Web search engines: Baidu, Google
2) Site search: Taobao, Tmall, JD.com

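1. Introduction to ElasticSearch

ElasticSearch is a real-time distributed search and analytics engine. It is a search server built on Lucene that provides a distributed, multi-tenant full-text search engine behind a RESTful web interface.

Features:
1) Scales out as a large distributed cluster and can handle petabyte-scale data
2) Combines full-text search, data analytics, and distributed technology in a single product
3) Works out of the box and is simple to deploy
4) Supports full-text retrieval, synonym handling, relevance ranking, complex data analysis, and near-real-time processing of massive data sets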

2. ElasticSearch Architecture

The table below compares the logical concepts of ElasticSearch with those of a MySQL database.

ElasticSearch	MySQL
index	database
type	table
document	row
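
To make the mapping concrete, here is a minimal Java sketch of how the three concepts combine to address one document over REST. DocumentAddress is a hypothetical helper written for illustration, not part of any Elasticsearch client, and it assumes an older release in which the type is still part of the document path.

```java
// Illustrative sketch only: DocumentAddress is a hypothetical helper, not an Elasticsearch API.
final class DocumentAddress {
    private final String index; // comparable to a MySQL database
    private final String type;  // comparable to a MySQL table
    private final String id;    // identifies one document, comparable to a row's primary key

    DocumentAddress(String index, String type, String id) {
        this.index = index;
        this.type = type;
        this.id = id;
    }

    /** Builds the REST path of the document: /{index}/{type}/{id}. */
    String toPath() {
        return "/" + index + "/" + type + "/" + id;
    }

    public static void main(String[] args) {
        // Index, type, and ID values are the ones used in the REST examples of this post.
        System.out.println(new DocumentAddress("articleindex", "article", "1").toPath());
    }
}
```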

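3. Installing ElasticSearch

Download the archive, unzip it, and run bin/elasticsearch.bat. Two ports are used: 9300 is the transport port (used by the Java client), and 9200 is the HTTP port for REST access (used by non-Java clients).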
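
A quick way to confirm that both ports are up after starting the node is to try opening a TCP connection to each. The sketch below uses only the JDK. PortProbe and isListening are names made up for illustration, and the host and one-second timeout are assumptions.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Illustrative sketch only: PortProbe is a hypothetical helper, not shipped with Elasticsearch.
final class PortProbe {

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host unreachable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumes a node running locally with the default ports.
        System.out.println("9200 (HTTP/REST): " + isListening("127.0.0.1", 9200, 1000));
        System.out.println("9300 (transport): " + isListening("127.0.0.1", 9300, 1000));
    }
}
```

An open port only shows that something is listening there; it does not prove the listener is ElasticSearch.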

4. Using ElasticSearch's RESTful API

The requests below can be sent with Postman.

// Create the articleindex index; submit with PUT
http://127.0.0.1:9200/articleindex

// Create a document; submit with POST
http://127.0.0.1:9200/articleindex/article
body:{"title": "xxx", "content": "xxx"}

// Query all documents; submit with GET
http://127.0.0.1:9200/articleindex/_search

// Query a document by ID; submit with GET
http://127.0.0.1:9200/articleindex/article/1

// Query by the value of a field; submit with GET
http://127.0.0.1:9200/articleindex/article/_search?q=title:完美
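
If you would rather script these calls than click through Postman, the same five requests can be built with the JDK's java.net.http API (Java 11 or later). This is only a sketch: the class and method names are illustrative, the Content-Type header is an addition that newer ElasticSearch versions require for request bodies, and the keyword is URL-encoded so that non-ASCII text is safe in the query string.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only: builds the requests without sending them.
final class ArticleIndexRequests {
    static final String BASE = "http://127.0.0.1:9200";

    /** PUT /articleindex creates the index. */
    static HttpRequest createIndex() {
        return HttpRequest.newBuilder(URI.create(BASE + "/articleindex"))
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    /** POST /articleindex/article creates a document from a JSON body. */
    static HttpRequest createDocument(String json) {
        return HttpRequest.newBuilder(URI.create(BASE + "/articleindex/article"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json, StandardCharsets.UTF_8))
                .build();
    }

    /** GET /articleindex/_search returns all documents. */
    static HttpRequest searchAll() {
        return HttpRequest.newBuilder(URI.create(BASE + "/articleindex/_search")).GET().build();
    }

    /** GET /articleindex/article/{id} fetches one document by ID. */
    static HttpRequest getById(String id) {
        return HttpRequest.newBuilder(URI.create(BASE + "/articleindex/article/" + id)).GET().build();
    }

    /** GET /articleindex/article/_search?q=title:{keyword} searches one field. */
    static HttpRequest searchByTitle(String keyword) {
        String encoded = URLEncoder.encode(keyword, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(
                URI.create(BASE + "/articleindex/article/_search?q=title:" + encoded)).GET().build();
    }

    public static void main(String[] args) {
        HttpRequest[] requests = {
            createIndex(),
            createDocument("{\"title\": \"xxx\", \"content\": \"xxx\"}"),
            searchAll(),
            getById("1"),
            searchByTitle("\u5b8c\u7f8e") // the keyword from the last example, as Unicode escapes
        };
        for (HttpRequest request : requests) {
            System.out.println(request.method() + " " + request.uri());
        }
    }
}
```

To actually send one of them against a running node, pass it to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).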

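5. Using the Head Plugin

Working with Elasticsearch through raw REST requests is tedious, so day-to-day administration is usually done through a graphical interface. The Head plugin is the most commonly used one, and it is started with npm.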

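6. IK Analyzer

Download it and place it in ElasticSearch's plugins directory. It is ready to use once ElasticSearch has been restarted.

IK provides two segmentation modes: ik_smart (coarsest segmentation, fewest tokens) and ik_max_word (finest-grained segmentation). You can try either one with a request such as:
localhost:9200/_analyze?analyzer=ik_smart&pretty=true&text=搜索引擎的使用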
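
The query-string form above is the one used by older releases. As far as I know, newer ElasticSearch versions expect the analyzer and text in a JSON request body instead, so here is a hedged sketch of that variant. AnalyzeRequest and its methods are illustrative names, and the hand-rolled escaping only covers quotes and backslashes.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only: builds an _analyze request with a JSON body, without sending it.
final class AnalyzeRequest {

    /** Minimal JSON string escaping: backslashes and double quotes only. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Builds the JSON body {"analyzer": ..., "text": ...}. */
    static String body(String analyzer, String text) {
        return "{\"analyzer\":\"" + escape(analyzer) + "\",\"text\":\"" + escape(text) + "\"}";
    }

    /** Builds a POST {baseUrl}/_analyze request carrying the JSON body. */
    static HttpRequest build(String baseUrl, String analyzer, String text) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_analyze?pretty=true"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(analyzer, text), StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        // The sample sentence from the request above, written with Unicode escapes.
        String text = "\u641c\u7d22\u5f15\u64ce\u7684\u4f7f\u7528";
        for (String analyzer : new String[] {"ik_smart", "ik_max_word"}) {
            HttpRequest request = build("http://localhost:9200", analyzer, text);
            System.out.println(request.method() + " " + request.uri() + " " + body(analyzer, text));
        }
    }
}
```

Sending both requests and comparing the returned tokens is an easy way to see how much finer ik_max_word splits the same sentence than ik_smart.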

7. Java Application

Dependency

<dependency>
    <groupId>org.springframework.data</groupId>
    <artifactId>spring-data-elasticsearch</artifactId>
</dependency>

Configuration

spring.data.elasticsearch.cluster-nodes = 127.0.0.1:9300

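Entity

import java.io.Serializable;

import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;

// indexName and type are left blank here; fill in your own index and type names
@Document(indexName = "", type = "")
public class Article implements Serializable {

    @Id
    private String id;

    // Indexed: determines whether the field can be searched
    // Analyzed: determines whether a search matches the whole value or individual terms
    // Stored: determines whether the field can be displayed on the page
    @Field(index = true, analyzer = "ik_max_word", searchAnalyzer = "ik_max_word")
    private String title;

    @Field(index = true, analyzer = "ik_max_word", searchAnalyzer = "ik_max_word")
    private String content;

    private String state;

    ...
}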

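Dao

// ArticleDao is an illustrative name; the type parameters are the entity class and its ID type
public interface ArticleDao extends ElasticsearchRepository<Article, String> { }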

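8. Synchronizing Data Between Elasticsearch and MySQL

8.1 Logstash

Logstash is a lightweight framework for collecting and processing logs. It gathers scattered, heterogeneous logs, applies custom processing, and ships the result to a chosen destination such as a server or a file.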

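Unzip it and go to the bin directory.

// -e: run a pipeline defined inline; this one echoes console input back to the console
logstash -e 'input { stdin { } } output { stdout { } }'
// -f: run the pipeline defined in a configuration file
logstash -f ../mysqletc/mysql.conf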