Installation
- Download the archive: https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.5.1.zip
- Unzip it
- On Windows, run elasticsearch.bat in the extracted bin directory
- On Linux, run elasticsearch in the extracted bin directory
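To check that the node came up, you can query its root endpoint (a sketch assuming the default HTTP port 9200):

```shell
# A healthy node replies with JSON containing its name, cluster_name, and version.
curl 'http://localhost:9200/?pretty'
```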
Concepts (by analogy with a relational database)
- index — database
- type — table
- document — record (row)
- property — field (column)
- mapping — the definition of the table structure
Analyzers
An analyzer splits a piece of text into terms according to its rules. After tokenization, fuzzy (full-text) search matches against those terms; in effect, the terms form an inverted index.
- Chinese analyzer (IK): splits sentences into words according to Chinese word-formation rules.
- Download: https://github.com/medcl/elasticsearch-analysis-ik
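The difference between analyzers is easy to see with the `_analyze` API (a sketch; it assumes a local node on port 9200 with the IK plugin installed):

```shell
# The standard analyzer splits Chinese text into single characters;
# ik_smart groups the characters into real words.
curl -XPOST 'localhost:9200/_analyze?pretty' -d '{"analyzer": "standard", "text": "中华人民共和国"}'
curl -XPOST 'localhost:9200/_analyze?pretty' -d '{"analyzer": "ik_smart", "text": "中华人民共和国"}'
```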
RESTful API calls
-
Create an index and define a type mapping (note: an analyzer only applies to text fields, so the long field id takes none):
PUT request: localhost:9200/blog
{
  "mappings": {
    "hello": {
      "properties": {
        "id":      { "type": "long", "store": true },
        "title":   { "type": "text", "store": true, "analyzer": "ik_smart" },
        "content": { "type": "text", "store": true, "analyzer": "ik_smart" }
      }
    }
  }
}
-
Add a document:
POST request (the document id is the id at the end of the URL; if it is omitted, a default id is generated — it is not taken from the id property in the body): localhost:9200/blog/hello/1
{ "id": 1, "title": "Character filters", "content": "First the string passes through the character filters, whose job is to preprocess it before tokenization. A character filter can strip HTML markup, or convert “&” into “and”." }
-
Update a document:
Same request as adding a document; re-submitting with the same id overwrites the existing document.
-
Delete a document:
DELETE request: localhost:9200/blog/hello/1
-
Query documents: query vs. filter — a query does scored, fuzzy matching and ranks results by relevance, which costs more; a filter is a plain yes/no match with no scoring, so it performs better.
-
match_all: matches all documents
{ "match_all": {}}
-
match: full-text search on analyzed fields; exact matching on non-analyzed fields
{ "match": { "tweet": "About Search" }}
-
multi_match: runs a match query against multiple fields
{ "multi_match": { "query": "full text search", "fields": [ "title", "body" ] } }
-
range: matches numbers or dates that fall within a given interval
{ "range": { "age": { "gte": 20, "lt": 30 } } }
-
term: exact-value matching; the input is not analyzed
{ "term": { "age": 26 }}
{ "term": { "date": "2014-09-01" }}
{ "term": { "public": true }}
{ "term": { "tag": "full_text" }}
-
terms: like term, but matches any of several exact values
{ "terms": { "tag": [ "search", "full_text", "nosql" ] }}
-
exists and missing: match documents where a field has, or lacks, a value — like SQL's IS NOT NULL and IS NULL
{ "exists": { "field": "title" } }
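A query and a filter can be combined in a single bool query; a sketch against the blog index created above (the field choices are illustrative):

```shell
# The match clause is scored for relevance; the range clause only filters, with no scoring.
curl -XPOST 'localhost:9200/blog/hello/_search?pretty' -d '
{
  "query": {
    "bool": {
      "must":   { "match": { "title": "过滤器" } },
      "filter": { "range": { "id": { "gte": 1 } } }
    }
  }
}'
```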
-
Debugging plugins
- ElasticSearch Head: a UI for inspecting indices, types, and documents, and for running common queries
Cluster setup
-
Edit the elasticsearch.yml configuration file in the config directory on each node:
cluster.name: my-es-cluster  # cluster name
node.name: node-2  # node name
http.port: 9202  # HTTP port for data operations
transport.tcp.port: 9302  # port the nodes use to talk to each other
discovery.zen.ping.unicast.hosts: ["127.0.0.1:9301", "127.0.0.1:9302", "127.0.0.1:9303"]  # unicast discovery hosts
-
Start elasticsearch on each node
-
Connect to the cluster and operate on the data
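After starting the nodes you can confirm they formed one cluster (a sketch using the HTTP port 9202 configured above):

```shell
# _cat/nodes lists every node that joined; _cluster/health reports green/yellow/red.
curl 'localhost:9202/_cat/nodes?v'
curl 'localhost:9202/_cluster/health?pretty'
```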
Java API calls
- Dependencies
<dependencies>
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch</artifactId>
<version>5.5.1</version>
</dependency>
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>transport</artifactId>
<version>5.5.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/io.netty/netty-transport -->
<!-- pulled in because PreBuiltTransportClient requires it -->
<dependency>
<groupId>io.netty</groupId>
<artifactId>netty-transport</artifactId>
<version>4.1.13.Final</version>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-core</artifactId>
<version>2.9.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.github.richardwilly98.elasticsearch/elasticsearch-river-mongodb -->
<!-- jars for connecting to a MongoDB database -->
<dependency>
<groupId>com.github.richardwilly98.elasticsearch</groupId>
<artifactId>elasticsearch-river-mongodb</artifactId>
<version>2.0.9</version>
</dependency>
<!-- https://mvnrepository.com/artifact/junit/junit -->
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.12</version>
</dependency>
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter-api</artifactId>
<version>5.0.0-M4</version>
<scope>test</scope>
</dependency>
<!-- Lombok annotations to reduce boilerplate -->
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<version>1.18.4</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-core</artifactId>
<version>${jackson.version}</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
<version>${jackson.version}</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-annotations</artifactId>
<version>${jackson.version}</version>
</dependency>
</dependencies>
- Entity for the type
@Data
@NoArgsConstructor
@AllArgsConstructor
public class Article {
private long id;
private String title;
private String content;
}
- Test class
public class ElasticSearchClientTest {
private TransportClient client;
// Connect to the cluster
@Before
public void init() throws Exception {
Settings settings = Settings.builder().put("cluster.name", "my-es-cluster").build();
client = new PreBuiltTransportClient(settings);
client.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("127.0.0.1"), 9301))
.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("127.0.0.1"), 9302));
}
// Create an index
@Test
public void createIndex() {
// Reuse the client built in init() rather than creating another one
client.admin().indices().prepareCreate("hello-index").get();
client.close();
}
// Define the mapping for the type
@Test
public void setMappings() throws Exception {
// Reuse the client built in init()
XContentBuilder builder = XContentFactory.jsonBuilder()
.startObject()
.startObject("article")
.startObject("properties")
.startObject("id")
.field("store", true)
.field("type", "long")
.endObject()
.startObject("title")
.field("store", true)
.field("type", "text")
.field("analyzer", "ik_smart")
.endObject()
.startObject("content")
.field("store", true)
.field("type", "text")
.field("analyzer", "ik_smart")
.endObject()
.endObject()
.endObject()
.endObject();
// Submit the mapping
client.admin().indices()
// index name
.preparePutMapping("hello-index")
// type name
.setType("article")
// mapping source: may be an XContentBuilder or a JSON string
.setSource(builder)
.get();
client.close();
}
// Add a document
@Test
public void testAddDocument() throws Exception {
// Build the document object (the client itself comes from init())
XContentBuilder builder = XContentFactory.jsonBuilder()
.startObject()
.field("id", 2)
.field("title", "分词器")
.field("content", "下一步,分词器(tokenizer)被表征化(断词)为独立的词。一个简单的分词器(tokenizer)可以根据空格或逗号将单词分开(注:这个在中文中不适用)。")
.endObject();
// Submit
client.prepareIndex()
// index
.setIndex("hello-index")
// type
.setType("article")
// document id; if omitted, Elasticsearch generates one
.setId("2")
// document source
.setSource(builder)
// execute
.get();
// Close the client
client.close();
}
// Load a batch of test data
@Test
public void testDocument2() throws Exception {
for (int i=0;i<96;i++) {
Article article = new Article();
article.setId(i);
article.setTitle("表征过滤"+i);
article.setContent(i+"最后,每个词都通过所有表征过滤(token filters),他可以修改词(例如将“Quick”转为小写),去掉词(例如停用词像“a”、“and”、“the”等等),或者增加词(例如同义词像“jump”和“leap”)");
String articleJson = new ObjectMapper().writeValueAsString(article);
// Submit
client.prepareIndex()
// index
.setIndex("hello-index")
// type
.setType("article")
// document id; if omitted, Elasticsearch generates one
.setId(""+i)
// document source (a JSON string this time)
.setSource(articleJson, XContentType.JSON)
// execute
.get();
}
// Close the client
client.close();
}
// Query by id
@Test
public void searchById() throws Exception {
QueryBuilder builder = QueryBuilders.idsQuery().addIds("1","2");
SearchResponse article = client.prepareSearch("hello-index").setTypes("article").setQuery(builder).get();
SearchHits hits = article.getHits();
System.out.println("total: "+hits.getTotalHits());
Iterator<SearchHit> iterator = hits.iterator();
while (iterator.hasNext()){
SearchHit next = iterator.next();
Map<String, Object> sourceAsMap = next.getSourceAsMap();
System.out.println(next.getSourceAsString());
System.out.println("id:"+sourceAsMap.get("id"));
System.out.println("title:"+sourceAsMap.get("title"));
System.out.println("content:"+sourceAsMap.get("content"));
}
client.close();
}
// Query with a term query
@Test
public void searchByTerm() throws Exception {
QueryBuilder builder = QueryBuilders.termQuery("title","过滤");
SearchResponse article = client.prepareSearch("hello-index").setTypes("article").setQuery(builder).get();
SearchHits hits = article.getHits();
System.out.println("total: "+hits.getTotalHits());
Iterator<SearchHit> iterator = hits.iterator();
while (iterator.hasNext()){
SearchHit next = iterator.next();
Map<String, Object> sourceAsMap = next.getSourceAsMap();
System.out.println(next.getSourceAsString());
System.out.println("id:"+sourceAsMap.get("id"));
System.out.println("title:"+sourceAsMap.get("title"));
System.out.println("content:"+sourceAsMap.get("content"));
}
client.close();
}
// Query with a query_string query
@Test
public void searchByQueryString() throws Exception {
QueryBuilder builder = QueryBuilders.queryStringQuery("过滤器").defaultField("title");
SearchResponse article = client.prepareSearch("hello-index").setTypes("article").setQuery(builder).get();
SearchHits hits = article.getHits();
System.out.println("total: "+hits.getTotalHits());
Iterator<SearchHit> iterator = hits.iterator();
while (iterator.hasNext()){
SearchHit next = iterator.next();
Map<String, Object> sourceAsMap = next.getSourceAsMap();
System.out.println(next.getSourceAsString());
System.out.println("id:"+sourceAsMap.get("id"));
System.out.println("title:"+sourceAsMap.get("title"));
System.out.println("content:"+sourceAsMap.get("content"));
}
client.close();
}
// Query with a query_string query, with paging
@Test
public void searchByQueryStringPage() throws Exception {
QueryBuilder builder = QueryBuilders.queryStringQuery("表征").defaultField("title");
SearchResponse article = client.prepareSearch("hello-index").setTypes("article").setQuery(builder)
// from: the starting row; size: rows per page
.setFrom(0).setSize(5)
.get();
SearchHits hits = article.getHits();
System.out.println("total: "+hits.getTotalHits());
Iterator<SearchHit> iterator = hits.iterator();
while (iterator.hasNext()){
SearchHit next = iterator.next();
Map<String, Object> sourceAsMap = next.getSourceAsMap();
System.out.println(next.getSourceAsString());
System.out.println("id:"+sourceAsMap.get("id"));
System.out.println("title:"+sourceAsMap.get("title"));
System.out.println("content:"+sourceAsMap.get("content"));
}
client.close();
}
// Query with a query_string query, paged and highlighted
@Test
public void searchByQueryStringPageHilight() throws Exception {
QueryBuilder builder = QueryBuilders.queryStringQuery("表征").defaultField("title");
HighlightBuilder highlightBuilder = new HighlightBuilder();
highlightBuilder.field("title")
.preTags("<em>")
.postTags("</em>");
SearchResponse article = client.prepareSearch("hello-index").setTypes("article").setQuery(builder)
// from: the starting row; size: rows per page
.setFrom(0).setSize(5)
// Enable highlighting
.highlighter(highlightBuilder)
.get();
SearchHits hits = article.getHits();
System.out.println("total: "+hits.getTotalHits());
Iterator<SearchHit> iterator = hits.iterator();
while (iterator.hasNext()){
SearchHit next = iterator.next();
Map<String, Object> sourceAsMap = next.getSourceAsMap();
System.out.println(next.getSourceAsString());
System.out.println("id:"+sourceAsMap.get("id"));
System.out.println("title:"+sourceAsMap.get("title"));
System.out.println("content:"+sourceAsMap.get("content"));
System.out.println("highlight result: "+next.getHighlightFields());
System.out.println("highlight result 2: "+next.getHighlightFields().get("title").getFragments()[0].string());
}
client.close();
}
// Query by id (note: no highlighter is actually applied here)
@Test
public void searchByIdHilight() throws Exception {
QueryBuilder builder = QueryBuilders.idsQuery().addIds("1","2");
SearchResponse article = client.prepareSearch("hello-index").setTypes("article").setQuery(builder).get();
SearchHits hits = article.getHits();
System.out.println("total: "+hits.getTotalHits());
Iterator<SearchHit> iterator = hits.iterator();
while (iterator.hasNext()){
SearchHit next = iterator.next();
Map<String, Object> sourceAsMap = next.getSourceAsMap();
System.out.println(next.getSourceAsString());
System.out.println("id:"+sourceAsMap.get("id"));
System.out.println("title:"+sourceAsMap.get("title"));
System.out.println("content:"+sourceAsMap.get("content"));
}
client.close();
}
}
Spring-data-elasticsearch
- Dependencies
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>2.0.4.RELEASE</version>
<relativePath/> <!-- lookup parent from repository -->
</parent>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
</dependency>
<!-- Lombok annotations to reduce boilerplate -->
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<version>1.18.4</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.12</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
<version>3.9</version>
</dependency>
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-collections4</artifactId>
<version>4.1</version>
</dependency>
</dependencies>
- Entity for the type
@Data
@NoArgsConstructor
@AllArgsConstructor
// Annotation declaring the index and the type
@Document(indexName = "zoo",type = "animal")
public class Animal {
// Annotations declaring each field's type and analyzer
@Field(type = FieldType.Long)
private long id;
@Field(type = FieldType.Text,analyzer = "ik_smart")
private String name;
@Field(type = FieldType.Text,analyzer = "ik_smart")
private String hobby;
}
- Repository interface
// By extending ElasticsearchRepository you query just by declaring interface methods; the generics are the entity type and the id type
@Component
public interface AnimalRepository extends ElasticsearchRepository<Animal, Long> {
List<Animal> findAnimalByHobbyOrName(String hobby, String name);
List<Animal> findAnimalByHobbyOrName(String hobby, String name, Pageable pageable);
}
- Test class
@RunWith(SpringRunner.class)
@SpringBootTest(classes = ESApp.class)
public class ElasticsearchApplicationTest {
@Autowired
private AnimalRepository animalRepository;
@Autowired
private ElasticsearchTemplate template;
// Add documents
@Test
public void testSaveAnimal() {
for (int i = 0; i < 100; i++) {
Animal animal = new Animal();
animal.setId(2L+i);
animal.setName("12我是一个大号人啊啊啊啊"+i);
animal.setHobby("456我是尼采,我是一个大坏人啊啊啊"+i);
animalRepository.save(animal);
}
}
// Delete a document
@Test
public void testDeleteAnimal() {
Animal animal = new Animal();
animal.setId(2L);
animal.setName("123");
animal.setHobby("456");
animalRepository.delete(animal);
}
// Find a document by id
@Test
public void testQueryAnimal() {
Optional<Animal> optional = animalRepository.findById(5L);
Animal animal = optional.get();
System.out.println(animal);
}
// Find documents via a derived query method on the interface
@Test
public void testFindAnimal() {
List<Animal> animal = animalRepository.findAnimalByHobbyOrName("尼采","一个");
System.out.println(animal);
System.out.println(animal.size());
}
// Derived query method with paging support
@Test
public void testFindAnimalPage() {
Pageable pageable = PageRequest.of(0,15);
List<Animal> animal = animalRepository.findAnimalByHobbyOrName("尼采","一个",pageable);
System.out.println(animal);
System.out.println(animal.size());
}
// Query with the native Elasticsearch query builders
@Test
public void testNativeQuery() {
// Note: calling withQuery twice keeps only the last query, so combine the two in a bool query
NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
.withQuery(QueryBuilders.boolQuery()
.must(QueryBuilders.queryStringQuery("坏人"))
.must(QueryBuilders.termQuery("name","我是")))
.withPageable(PageRequest.of(0,20)).build();
List<Animal> animal = template.queryForList(searchQuery, Animal.class);
System.out.println(animal);
System.out.println(animal.size());
}
}
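For the repository and template beans to reach the cluster, Spring Boot also needs connection settings; a sketch of application.properties matching the cluster configured earlier (adjust the name and ports to your setup):

```properties
spring.data.elasticsearch.cluster-name=my-es-cluster
spring.data.elasticsearch.cluster-nodes=127.0.0.1:9301,127.0.0.1:9302
```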
How it works
Elasticsearch is fast mainly because it uses an inverted index.
Inverted index:
- The analyzer splits each document's content into terms
- An index entry is created per term, and its value is the list of IDs of the documents containing that term
- At search time, a term therefore leads directly to every document that contains it
Forward index: the document ID is the key; in an inverted index, the document's terms are the keys.
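The steps above can be sketched in a few lines of plain Java (a toy illustration only; real Lucene postings lists are compressed, sorted on-disk structures):

```java
import java.util.*;

// Minimal inverted-index sketch: tokenize each document, then map
// every term to the set of IDs of the documents that contain it.
public class InvertedIndexDemo {
    // Build term -> doc-id postings from whitespace-tokenized documents.
    static Map<String, Set<Integer>> build(Map<Integer, String> docs) {
        Map<String, Set<Integer>> index = new TreeMap<>();
        for (Map.Entry<Integer, String> doc : docs.entrySet()) {
            for (String term : doc.getValue().toLowerCase().split("\\s+")) {
                index.computeIfAbsent(term, t -> new TreeSet<>()).add(doc.getKey());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<Integer, String> docs = new HashMap<>();
        docs.put(1, "quick brown fox");
        docs.put(2, "quick red fox");
        Map<String, Set<Integer>> index = build(docs);
        // A search for a term reads its postings list directly
        // instead of scanning every document.
        System.out.println(index.get("fox"));   // [1, 2]
        System.out.println(index.get("brown")); // [1]
    }
}
```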
Database vs. Elasticsearch performance
- Elasticsearch strengths: fuzzy (full-text) queries are orders of magnitude faster; aggregations are convenient; multi-condition queries are easy
- Elasticsearch weaknesses: no transactions; extra learning cost compared with SQL; deep pagination gets slow
- Typical use cases: search services, log search, and the like