Learning ElasticSearch

大数据与数据分析

Tags: big data

Article link: https://blog.csdn.net/Ryxiong728/article/details/114286205

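Introduction to ElasticSearch

Motivation

Searching massive data sets in MySQL is too slow to be practical.

A search engine can also find the data you want even when the keyword is not exact.

About ES

ES is a search engine framework written in Java on top of Lucene. It provides distributed full-text search, stores and retrieves data in near real time, and exposes a uniform RESTful web interface. Official clients provide APIs for many languages.

Lucene: Lucene is the underlying search engine library. It is essentially a jar that packages the code for building inverted indexes and running searches, including the algorithms involved.

Full-text search means that an indexing program scans every word in a text and builds an index entry for each word, recording how often and where it appears. When a user queries, the keywords are looked up in the term dictionary to find the matching content.

Structured search: for example, "which products belong to the household-chemicals category?", i.e. select * from products where category_id = 'household-chemicals'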

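ES vs. Solr

Solr can be faster than ES when querying static data, but its query speed drops noticeably when the data changes in real time, whereas ES query performance stays roughly the same.
A Solr cluster depends on ZooKeeper for coordination. ES supports clustering natively, with no third-party component.
Little Chinese-language documentation exists for Solr. ES has become rapidly more popular since its release, and its documentation is thorough.
ES has particularly good support for cloud computing and big data.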

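Inverted index

Stored data is split into tokens according to some rule, and the tokens are kept in a separate term dictionary.

When a user runs a query, the query keywords are tokenized as well.

The query tokens are matched against the term dictionary, which yields the ids of the matching data.

The data itself is then fetched by id from where it is stored.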
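
The four steps above can be sketched in a few lines of Java. This is only an illustration under simplifying assumptions: the class and method names are made up for this example, and the tokenizer just lower-cases the text and splits on whitespace, whereas ES relies on Lucene analyzers (such as ik for Chinese).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index: maps each token to the ids of the documents containing it. */
final class InvertedIndexSketch {
    private final Map<Integer, String> documents = new HashMap<>();      // id -> original text
    private final Map<String, Set<Integer>> postings = new HashMap<>();  // token -> document ids

    /** Simplified tokenizer (assumption): lower-case, then split on whitespace. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase().trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Step 1: tokenize the stored data and record each token in the term dictionary. */
    void add(int id, String text) {
        documents.put(id, text);
        for (String token : tokenize(text)) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
        }
    }

    /** Steps 2 and 3: tokenize the query and collect ids of documents matching any token. */
    Set<Integer> searchIds(String query) {
        Set<Integer> ids = new TreeSet<>();
        for (String token : tokenize(query)) {
            ids.addAll(postings.getOrDefault(token, Collections.<Integer>emptySet()));
        }
        return ids;
    }

    /** Step 4: fetch the stored data by id. */
    List<String> search(String query) {
        List<String> hits = new ArrayList<>();
        for (int id : searchIds(query)) {
            hits.add(documents.get(id));
        }
        return hits;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Elasticsearch is a search engine");
        index.add(2, "Lucene builds inverted indexes");
        index.add(3, "Kibana visualizes search results");
        System.out.println(index.search("lucene"));
    }
}
```

The lookup never scans the documents themselves; it only consults the token-to-id map, which is why this structure scales better than a row-by-row scan.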

Installing ElasticSearch

Installing ES & Kibana

Installing ES

version: "3.1"
services:
  elasticsearch:
    image: daocloud.io/library/elasticsearch:6.5.4
    restart: always
    container_name: elasticsearch
    environment:  # The JVM heap size must be set explicitly; the default heap is too large for a small host and causes out-of-memory failures
      - "ES_JAVA_OPTS=-Xms128m -Xmx256m"
      - "discovery.type=single-node"
      - "COMPOSE_PROJECT_NAME=elasticsearch-server"
    ports:
      - 9200:9200
  kibana:
    image: daocloud.io/library/kibana:6.5.4
    restart: always
    container_name: kibana
    ports:
      - 5601:5601
    environment:
      - ELASTICSEARCH_URL=http://115.159.222.145:9200
    depends_on:
      - elasticsearch

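ES directory layout

bin: startup scripts
config: configuration files
	- log4j2.properties: logging configuration
	- jvm.options: JVM settings, including the heap size; lower the values if memory is tight
	- elasticsearch.yml: Elasticsearch configuration; the default HTTP port is 9200
lib: dependency jars
logs: log files
modules: built-in modules
plugins: plugins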

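elasticsearch fails to start

elasticsearch exited with code 78

Fix:

Switch to the root user.

Run:

sysctl -w vm.max_map_count=262144

Check the result:

sysctl -a|grep vm.max_map_count

Output:

vm.max_map_count = 262144

This change is lost when the machine reboots. To make it permanent:

Append the following line to /etc/sysctl.conf:

vm.max_map_count=262144

If the problem persists, check the server's memory. ES may not have enough, in which case free some up.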

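Testing after startup

Open ES in a browser:

http://host:9200

Installing Kibana

Kibana is an open-source analytics and visualization platform for ElasticSearch. It is used to search, view, and interact with the data stored in ES indexes, and supports advanced data analysis and presentation through charts.

It is simple to operate and presents data intuitively.

Open Kibana:

http://host:5601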

Installing the head visualization plugin

Download: http://mobz.github.io/elasticsearch-head
Start it:
```
npm install
npm run start
```

Fixing the cross-origin (CORS) problem

# Edit the ES configuration file elasticsearch.yml
http.cors.enabled: true
http.cors.allow-origin: "*"

Restart ES, then connect again.

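Installing the ik analyzer

Download URL for the ik analyzer:

https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.5.4/elasticsearch-analysis-ik-6.5.4.zip

Find the ES container:

docker ps | grep elastic

Enter the ES container and install the ik analyzer with elasticsearch-plugin from the bin/ directory:

./elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.5.4/elasticsearch-analysis-ik-6.5.4.zip

If GitHub is hard to reach, download the plugin from a domestic mirror instead. The plugin version must match the ES version.

Testing the analyzer through the API

Note:

ES must be restarted to load the newly installed analyzer:

docker restart <es-container-name-or-id>

After the restart, test the analyzer:

POST _analyze
{
  "analyzer": "ik_max_word",
  "text": "尚硅谷教育"
}

The analyzer to use is given in the analyzer field.

Response: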

{
  "tokens" : [
    {
      "token" : "尚",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "硅谷",
      "start_offset" : 1,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "教育",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 2
    }
  ]
}

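ElasticSearch core concepts

ES components

Near real time

This has two meanings:

There is a short delay (about 1 second) between writing data and that data becoming searchable.
Searches and analytics run on ES return within seconds.

Cluster

A cluster contains one or more nodes. Which cluster a node belongs to is determined by a setting (the cluster name, "elasticsearch" by default). For small and medium applications it is perfectly normal to start with a single-node cluster.

Node

A node is one member of a cluster. Each node has a name (assigned randomly by default), which matters when carrying out operations and maintenance. By default a node joins the cluster named "elasticsearch", so if you simply start several nodes they automatically form one elasticsearch cluster. A single node can also form a cluster on its own.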
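
The first meaning can be pictured with a small model. ES makes newly written documents visible to searches through a periodic refresh (every second by default), which is where the roughly one-second delay comes from. The Java sketch below only imitates that behaviour; the class and its methods are hypothetical and are not part of any ES API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of near-real-time search: writes become visible only after a refresh. */
final class NearRealTimeSketch {
    private final List<String> buffer = new ArrayList<>();   // written, not yet searchable
    private final Set<String> searchable = new HashSet<>();  // visible to searches

    /** Accepts a document id; it is stored but not yet searchable. */
    void index(String docId) {
        buffer.add(docId);
    }

    /** Moves everything in the buffer into the searchable set. */
    void refresh() {
        searchable.addAll(buffer);
        buffer.clear();
    }

    boolean isSearchable(String docId) {
        return searchable.contains(docId);
    }

    public static void main(String[] args) {
        NearRealTimeSketch es = new NearRealTimeSketch();
        es.index("doc-1");
        System.out.println(es.isSearchable("doc-1")); // false: no refresh has happened yet
        es.refresh();
        System.out.println(es.isSearchable("doc-1")); // true: the refresh made it visible
    }
}
```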

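ElasticSearch storage structure

Index (comparable to a database)

An index holds a collection of documents with a similar structure. One ES service can host many indexes.

For example, there can be a customer index, a product-category index, and an order index. Every index has a name.

Each index is split into 5 primary shards by default (in 6.x; 7.0 and later default to 1).
By default each primary shard has one replica.
Replica shards also serve search requests, which spreads the query load across nodes.
A replica is never allocated on the same node as its primary, so replicas need a cluster with more than one server.

Type (comparable to a table)

Each index can contain one or more types. A type is a logical category of data within an index, and documents under the same type share the same fields.

Note:

ES 5.x: an index can have multiple types.
ES 6.x: an index can have only one type.
ES 7.x: types are deprecated, and an index is used without a custom type.

Document (comparable to a row)

A document is the basic unit of data in ES. A type can contain many documents, and a document typically represents one record, such as a single customer.

Field (comparable to a column)

A field is the smallest unit within a document. A document contains multiple fields, and each field holds one data attribute.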

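RESTful syntax for working with ES

GET requests:

http://ip:port/index: get information about an index
http://ip:port/index/type/doc_id: get the specified document

POST requests:

http://ip:port/index/type/_search: search documents; the query conditions can be passed as JSON in the request body
http://ip:port/index/type/doc_id/_update: update a document; the changes are passed as JSON in the request body

PUT requests:

http://ip:port/index: create an index; the index settings go in the request body
http://ip:port/index/type/_mapping: define the mapping, i.e. the field properties of the documents stored in the index

DELETE requests:

http://ip:port/index: delete the index
http://ip:port/index/type/doc_id: delete the specified document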
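
The URL patterns listed above follow one scheme: index, then type, then document id, then an optional action. A minimal Java sketch of that scheme is given below. EsEndpoints is a hypothetical helper written for this article, not part of any ES client, and it only builds the ES 6.x-style URLs without sending any request.

```java
/** Builds ES 6.x-style REST URLs (hypothetical helper; it does not send requests). */
final class EsEndpoints {
    private final String base;

    EsEndpoints(String host, int port) {
        this.base = "http://" + host + ":" + port;
    }

    /** GET: index information. PUT: create the index. DELETE: delete the index. */
    String index(String index) {
        return base + "/" + index;
    }

    /** GET: fetch the document. DELETE: delete the document. */
    String document(String index, String type, String id) {
        return index(index) + "/" + type + "/" + id;
    }

    /** POST: search; the query conditions go in the JSON request body. */
    String search(String index, String type) {
        return index(index) + "/" + type + "/_search";
    }

    /** POST: partial update; the changes go in the JSON request body. */
    String update(String index, String type, String id) {
        return document(index, type, id) + "/_update";
    }

    /** PUT: define the field properties of the type. */
    String mapping(String index, String type) {
        return index(index) + "/" + type + "/_mapping";
    }

    public static void main(String[] args) {
        EsEndpoints es = new EsEndpoints("localhost", 9200);
        System.out.println("GET    " + es.document("book", "novel", "1"));
        System.out.println("POST   " + es.search("book", "novel"));
        System.out.println("POST   " + es.update("book", "novel", "1"));
        System.out.println("DELETE " + es.index("book"));
    }
}
```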

Index operations

Creating an index

PUT /person
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}

Viewing index information

Through the Kibana UI

Through the API

# View index information
GET /person

{
  "person" : {
    "aliases" : { },
    "mappings" : { },
    "settings" : {
      "index" : {
        "creation_date" : "1614596113957",
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "uuid" : "bC8PJsegQ16t5EAtNWh_vg",
        "version" : {
          "created" : "6050499"
        },
        "provided_name" : "person"
      }
    }
  }
}

Deleting an index

Through the management UI

Through the API

# Delete the index
DELETE /person

{
  "acknowledged" : true
}

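Field types in ES

String types:

text: used for full-text search; the field value is analyzed into tokens
keyword: the field value is not analyzed

Numeric types:

long
integer
byte
double
float

Date type:

date: a specific format can be declared for the date values

Boolean type:

boolean: represents true and false

Binary type:

binary: currently accepts a Base64-encoded string

Range types:

long_range: stores a range rather than a single value, bounded with gt, lt, gte, lte
float_range
integer_range
date_range
ip_range

Geo type:

geo_point: stores a latitude/longitude pair

IP type:

ip: stores an IPv4 or IPv6 address

Others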

Creating an index with a mapping

# Create the index and specify the field types
PUT /book
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  },
  "mappings": {
    "novel": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "ik_max_word",
          "index": true,
          "store": false 
        },
        "author": {
          "type": "keyword"
        },
        "count": {
          "type": "long"
        },
        "onSale": {
          "type": "date",
          "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
        },
        "desc": {
          "type": "text",
          "analyzer": "ik_max_word"
        }
      }
    }
  }
}

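Explanation:

number_of_shards: number of primary shards
number_of_replicas: number of replicas per primary shard
mappings: defines the data structure
novel: the type name
properties: definitions of the document's fields
name: declares a field called name
type: the data type of the field
analyzer: the analyzer used to tokenize the field
index: true means the field can be used in query conditions
store: whether the field value is additionally stored on its own, separately from the original document
format: the accepted date formats, separated by ||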
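
To make the format setting more concrete: when several formats are separated by ||, each one is tried in turn until one matches. The Java sketch below imitates this for the three formats used in the mapping above. It is not the parser ES uses; the class name is hypothetical, and the sketch assumes UTC and returns epoch milliseconds.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

/** Imitates "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis" by trying each format in order. */
final class MultiFormatDateSketch {
    private static final DateTimeFormatter DATE_TIME = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
    private static final DateTimeFormatter DATE = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    /** Returns the value as milliseconds since the epoch, interpreting dates as UTC. */
    static long toEpochMillis(String value) {
        try {
            return LocalDateTime.parse(value, DATE_TIME).toInstant(ZoneOffset.UTC).toEpochMilli();
        } catch (DateTimeParseException ignored) {
            // Not "yyyy-MM-dd HH:mm:ss"; try the next format.
        }
        try {
            return LocalDate.parse(value, DATE).atStartOfDay().toInstant(ZoneOffset.UTC).toEpochMilli();
        } catch (DateTimeParseException ignored) {
            // Not "yyyy-MM-dd"; try the last format.
        }
        try {
            return Long.parseLong(value); // epoch_millis
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("no format matches: " + value, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toEpochMillis("2000-01-01 00:00:00")); // 946684800000
        System.out.println(toEpochMillis("2000-01-01"));          // 946684800000
        System.out.println(toEpochMillis("946684800000"));        // 946684800000
    }
}
```

All three inputs describe the same instant, which is why a document may supply onSale in any of the declared formats.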

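Document operations

A document is uniquely identified on the ES server by the combination of _index, _type, and _id. This triple determines whether an operation adds a new document or modifies an existing one.

Creating a document

With an auto-generated id

# Add a document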
POST /book/novel
{
  "name": "斗罗",
  "author": "西红柿",
  "count": 10000,
  "onSale": "2000-01-01",
  "desc": "斗罗大陆修仙小说"
}

With a manually specified id

# Specify the id manually
PUT /book/novel/1
{
  "name": "红楼梦",
  "author": "曹雪芹",
  "count": 10000,
  "onSale": "1758-01-01",
  "desc": "红楼梦小说"
}

Updating a document

Full overwrite

# Overwrite the document with id 1
PUT /book/novel/1
{
  "name": "红楼梦",
  "author": "曹雪芹",
  "count": 20000,
  "onSale": "1758-01-01",
  "desc": "红楼梦小说"
}

Partial update with doc

# Update the document through doc: list only the fields to change and their new values
POST /book/novel/1/_update
{
  "doc": {
    "count": 123455
  }
}
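
The two update styles differ in what happens to the fields that are not mentioned in the request. The in-memory Java sketch below illustrates that difference. It is a simplified model, not ES code: the class is hypothetical, the documents live in a plain map keyed by index/type/id, and the merge is shallow.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory model of full overwrite versus partial (doc) update. */
final class UpdateSemanticsSketch {
    private final Map<String, Map<String, Object>> store = new HashMap<>();

    // A document is identified by the combination of index, type, and id.
    private static String key(String index, String type, String id) {
        return index + "/" + type + "/" + id;
    }

    /** Like PUT /index/type/id: replaces the whole document. Returns "created" or "updated". */
    String put(String index, String type, String id, Map<String, Object> source) {
        Map<String, Object> previous = store.put(key(index, type, id), new HashMap<>(source));
        return previous == null ? "created" : "updated";
    }

    /** Like POST /index/type/id/_update with "doc": merges the given fields into the document. */
    void update(String index, String type, String id, Map<String, Object> partialDoc) {
        Map<String, Object> existing = store.get(key(index, type, id));
        if (existing == null) {
            throw new IllegalStateException("document missing: " + key(index, type, id));
        }
        existing.putAll(partialDoc);
    }

    Map<String, Object> get(String index, String type, String id) {
        return store.get(key(index, type, id));
    }

    public static void main(String[] args) {
        UpdateSemanticsSketch es = new UpdateSemanticsSketch();
        Map<String, Object> book = new HashMap<>();
        book.put("name", "红楼梦");
        book.put("count", 10000);
        System.out.println(es.put("book", "novel", "1", book)); // created

        Map<String, Object> change = new HashMap<>();
        change.put("count", 123455);
        es.update("book", "novel", "1", change);                 // name is kept
        System.out.println(es.get("book", "novel", "1").containsKey("name")); // true

        System.out.println(es.put("book", "novel", "1", change)); // updated
        System.out.println(es.get("book", "novel", "1").containsKey("name")); // false: overwritten
    }
}
```

A full overwrite therefore has to resend every field, while the doc style only needs the fields that change.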

Deleting a document

# Delete a document by id
DELETE /book/novel/Ile37XcBdlEqQ4RmWKpJ

Working with ElasticSearch from Java

Connecting to ES from Java

Create a Maven project

Add the dependencies

<dependencies>
    <!-- 1.elasticsearch -->
    <dependency>
        <groupId>org.elasticsearch</groupId>
        <artifactId>elasticsearch</artifactId>
        <version>6.5.4</version>
    </dependency>
    <!-- 2.elasticsearch API -->
    <dependency>
        <groupId>org.elasticsearch.client</groupId>
        <artifactId>elasticsearch-rest-high-level-client</artifactId>
        <version>6.5.4</version>
    </dependency>
    <!-- 3. junit-->
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
    </dependency>
    <!-- 4. lombok-->
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <version>1.16.22</version>
    </dependency>
    <!-- 5. jackson -->
    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-databind</artifactId>
        <version>2.10.2</version>
    </dependency>
</dependencies>

Creating the ES connection

package com.example.utils;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;

/**
 * @author : ryxiong728
 * @email : ryxiong728@126.com
 * @date : 3/1/21
 * @Description:
 */
public class ESClient {

    public static RestHighLevelClient getClient() {
        // 1. Create the HttpHost object
        HttpHost httpHost = new HttpHost("115.159.222.145", 9200);

        // 2. Create the RestClientBuilder
        RestClientBuilder clientBuilder = RestClient.builder(httpHost);

        // 3. Create the RestHighLevelClient
        RestHighLevelClient client = new RestHighLevelClient(clientBuilder);
        // Return the client
        return client;
    }
}

Index operations from Java

Creating an index

package com.example.test;

import com.example.utils.ESClient;
import org.elasticsearch.action.admin.indices.create.CreateIndexRequest;
import org.elasticsearch.action.admin.indices.create.CreateIndexResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.json.JsonXContent;
import org.junit.Test;

import java.io.IOException;

/**
 * @author : ryxiong728
 * @email : ryxiong728@126.com
 * @date : 3/1/21
 * @Description:
 */
public class Demo02 {

    RestHighLevelClient client = ESClient.getClient();
    String index = "person";
    String type = "info";
    /*
    "mappings": {
        "novel": {
          "properties": {
            "name": {
              "type": "text",
              "analyzer": "ik_max_word",
              "index": true,
              "store": false
            },
            "author": {
              "type": "keyword"
            },
            "count": {
              "type": "long"
            },
            "onSale": {
              "type": "date",
              "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
            },
            "desc": {
              "type": "text",
              "analyzer": "ik_max_word"
            }
          }
        }
      }
     */
    @Test
    public void createIndex() throws IOException {
        // 1. Prepare the index settings
        Settings.Builder settings = Settings.builder();
        settings.put("number_of_shards", 3);
        settings.put("number_of_replicas", 1);

        // 2. Prepare the index mappings
        XContentBuilder mappings = JsonXContent.contentBuilder()
                .startObject()
                    .startObject("properties")
                        .startObject("name")
                            .field("type", "text")
                        .endObject()
                        .startObject("age")
                            .field("type", "integer")
                        .endObject()
                        .startObject("birthday")
                            .field("type", "date")
                            .field("format", "yyyy-MM-dd")
                        .endObject()
                    .endObject()
                .endObject();

        // 3. Wrap the settings and mappings in the request object
        CreateIndexRequest request = new CreateIndexRequest(index);
        request.settings(settings);
        request.mapping(type, mappings);

        // 4. Connect to ES through the client and create the index
        CreateIndexResponse response = client.indices().create(request, RequestOptions.DEFAULT);

        // 5. Print the response
        System.out.println(response.toString());
    }
}

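Checking whether an index exists

// Requires: import org.elasticsearch.action.admin.indices.get.GetIndexRequest;
@Test
public void isExists() throws IOException {
    // 1. Prepare the request object
    GetIndexRequest request = new GetIndexRequest();
    request.indices(index);

    // 2. Run it through the client
    boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);

    // 3. Print the result
    System.out.println(exists);
}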

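Deleting an index

// Requires: import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
//           import org.elasticsearch.action.support.master.AcknowledgedResponse;
@Test
public void deleteIndex() throws IOException {
    // 1. Prepare the request object
    DeleteIndexRequest delete = new DeleteIndexRequest();
    delete.indices(index);

    // 2. Run it through the client
    AcknowledgedResponse resp = client.indices().delete(delete, RequestOptions.DEFAULT);

    // 3. Print whether the deletion was acknowledged
    System.out.println(resp.isAcknowledged());
}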

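Document operations from Java

Adding a document

The Person entity

package com.example.entity;

import com.fasterxml.jackson.annotation.JsonFormat;
import com.fasterxml.jackson.annotation.JsonIgnore;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

import java.util.Date;

@Data
@NoArgsConstructor
@AllArgsConstructor
public class Person {

    @JsonIgnore  // Exclude the id field from the serialized JSON
    private Integer id;
    private String name;
    private Integer age;
    @JsonFormat(pattern = "yyyy-MM-dd")  // Serialize the date as yyyy-MM-dd
    private Date birthday;

}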

Example: creating a document

package com.example.test;

import com.example.entity.Person;
import com.example.utils.ESClient;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.junit.Test;

import java.io.IOException;
import java.util.Date;

/**
 * @author : ryxiong728
 * @email : ryxiong728@126.com
 * @date : 3/1/21
 * @Description:
 */
public class Demo03 {

    ObjectMapper mapper = new ObjectMapper();
    RestHighLevelClient client = ESClient.getClient();
    String index = "person";
    String type = "info";

    @Test
    public void createDocument() throws IOException {
        // 1. Prepare the JSON data
        Person person = new Person(1, "张三", 23, new Date());
        String json = mapper.writeValueAsString(person);
        System.out.println(json);

        // 2. Prepare the request object
        IndexRequest indexRequest = new IndexRequest(index, type, person.getId().toString());
        indexRequest.source(json, XContentType.JSON);

        // 3. Add the document through the client
        IndexResponse resp = client.index(indexRequest, RequestOptions.DEFAULT);

        // 4. Print the result
        System.out.println(resp.getResult().toString());
    }
}

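Updating a document

// Requires: import org.elasticsearch.action.update.UpdateRequest;
//           import org.elasticsearch.action.update.UpdateResponse;
//           import java.util.HashMap; import java.util.Map;
@Test
public void updateDocument() throws IOException {
    // 1. Create a map holding the fields to change
    Map<String, Object> doc = new HashMap<String, Object>();
    doc.put("name", "李四");
    String docId = "1";

    // 2. Create the request object and attach the data
    UpdateRequest updateRequest = new UpdateRequest(index, type, docId);
    updateRequest.doc(doc);

    // 3. Run it through the client
    UpdateResponse response = client.update(updateRequest, RequestOptions.DEFAULT);

    // 4. Print the result
    System.out.println(response.getResult().toString());
}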

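Deleting a document

// Requires: import org.elasticsearch.action.delete.DeleteRequest;
//           import org.elasticsearch.action.delete.DeleteResponse;
@Test
public void deleteDocument() throws IOException {
    // 1. Build the request object
    DeleteRequest deleteRequest = new DeleteRequest(index, type, "1");

    // 2. Run it through the client
    DeleteResponse response = client.delete(deleteRequest, RequestOptions.DEFAULT);

    // 3. Print the result
    System.out.println(response.getResult().toString());
}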

Bulk document operations from Java

Bulk-adding documents

// Requires: import org.elasticsearch.action.bulk.BulkRequest;
@Test
public void bulkCreateDocument() throws IOException {
    // 1. Prepare several JSON documents
    Person p1 = new Person(1, "张三", 23, new Date());
    Person p2 = new Person(2, "李四", 24, new Date());
    Person p3 = new Person(3, "王五", 25, new Date());

    String json1 = mapper.writeValueAsString(p1);
    String json2 = mapper.writeValueAsString(p2);
    String json3 = mapper.writeValueAsString(p3);

    // 2. Create the bulk request and add the prepared documents to it
    BulkRequest bulkRequest = new BulkRequest();
    bulkRequest.add(new IndexRequest(index, type, p1.getId().toString()).source(json1, XContentType.JSON));
    bulkRequest.add(new IndexRequest(index, type, p2.getId().toString()).source(json2, XContentType.JSON));
    bulkRequest.add(new IndexRequest(index, type, p3.getId().toString()).source(json3
