Elasticsearch: A Brief Introduction and Deploying Elasticsearch and Kibana with Docker

codingXT

Last modified: 2022-02-15 16:46:54

Tags: elasticsearch

First published: 2022-02-12 15:19:50

Article link: https://blog.csdn.net/qq_37774171/article/details/122895226


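Introduction to Elasticsearch

What is Elasticsearch?

Elasticsearch is a distributed, free and open search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured data. It is built on Apache Lucene and was first released in 2010 by Elasticsearch N.V. (now known as Elastic). Known for its simple REST APIs, distributed nature, speed, and scalability, Elasticsearch is the central component of the Elastic Stack, a set of free and open tools for data ingestion, enrichment, storage, analysis, and visualization. The Elastic Stack is commonly called the ELK Stack (after Elasticsearch, Logstash, and Kibana), and it now also includes a rich collection of lightweight shipping agents, known as Beats, for sending data to Elasticsearch.

In short, Elasticsearch is a distributed, high-performance, highly available, scalable, RESTful search and data analytics engine, and it usually serves as the core of the Elastic Stack.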

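What is Elasticsearch used for?

Elasticsearch excels at both speed and scalability and can index many types of content, so it fits a wide range of use cases:

Application search
Website search
Enterprise search
Log processing and analytics
Infrastructure metrics and container monitoring
Application performance monitoring
Geospatial data analysis and visualization
Security analytics
Business analytics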

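How does Elasticsearch work?

Raw data flows into Elasticsearch from a variety of sources, including logs, system metrics, and web applications. Data ingestion is the process of parsing, normalizing, and enriching this raw data before it is indexed in Elasticsearch. Once the data is indexed, users can run complex queries against it and use aggregations to retrieve complex summaries. In Kibana, users can build powerful visualizations of their data, share dashboards, and manage the Elastic Stack.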

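What is an Elasticsearch index?

An Elasticsearch index is a collection of documents that are related to each other. Elasticsearch stores data as JSON documents. Each document correlates a set of keys (names of fields or properties) with their corresponding values (strings, numbers, Booleans, dates, arrays of values, geolocations, or other types of data).

Elasticsearch uses a data structure called an inverted index, which is designed to allow very fast full-text search. An inverted index lists every unique term that appears in any document and identifies all of the documents that contain each term.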
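To make the idea of an inverted index concrete, here is a minimal Python sketch. It is only an illustration under simplifying assumptions: the "analyzer" just lowercases and splits on whitespace, and the function names and sample documents are made up for this example. Lucene's real data structures are far more elaborate.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each distinct term to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Toy analyzer (an assumption): lowercase and split on whitespace only.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return the IDs of documents that contain every term in the query."""
    postings = [set(index.get(term, ())) for term in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []


# Hypothetical sample documents, keyed by document ID.
docs = {
    1: "Elasticsearch is a search engine",
    2: "Kibana visualizes Elasticsearch data",
    3: "Logstash ships data",
}
index = build_inverted_index(docs)
print(index["elasticsearch"])               # [1, 2]
print(search(index, "elasticsearch data"))  # [2]
```

A lookup touches only the posting lists of the query terms instead of scanning every document, which is why this structure suits full-text search.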

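During indexing, Elasticsearch stores the documents and builds an inverted index so that the document data becomes searchable in near real time. Indexing is initiated through the index API, which lets you add a JSON document to a specific index or update one that is already there.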
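As a rough sketch of what an index API call looks like, the Python snippet below builds (but does not send) a `PUT /<index>/_doc/<id>` request using only the standard library. The base URL, the index name `blog`, and the document fields are assumptions chosen for illustration.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_id, document):
    """Build, without sending, a PUT <index>/_doc/<id> request for the index API."""
    url = f"{base_url}/{index}/_doc/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical document and index name, used only for illustration.
doc = {"title": "Hello Elasticsearch", "views": 1, "published": "2022-02-12"}
req = build_index_request("http://localhost:9200", "blog", 1, doc)
print(req.get_method(), req.full_url)

# To send it to a running node without security enabled, something like:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Sending the same request again with a changed body replaces the stored document, which matches the "add or update" behaviour described above.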

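What is Logstash used for?

Logstash, one of the core products of the Elastic Stack, is used to aggregate and process data and send it to Elasticsearch. It is an open source, server-side data processing pipeline that lets you ingest data from multiple sources simultaneously and enrich and transform it before it is indexed into Elasticsearch.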

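What is Kibana used for?

Kibana is a data visualization and management tool for Elasticsearch that provides real-time histograms, line graphs, pie charts, and maps. Kibana also includes advanced applications such as Canvas, which lets users create custom dynamic infographics based on their data, and Elastic Maps, which visualizes geospatial data.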

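Why use Elasticsearch?

Elasticsearch is fast. Because it is built on top of Lucene, it excels at full-text search. It is also a near real-time search platform, meaning the latency from the time a document is indexed until it becomes searchable is very short, typically about one second. As a result, Elasticsearch is well suited to time-sensitive use cases such as security analytics and infrastructure monitoring.

Elasticsearch is distributed by nature. The documents stored in Elasticsearch are spread across containers called shards, which are replicated to provide redundant copies of the data in case of hardware failure. This distributed design lets Elasticsearch scale out to hundreds (or even thousands) of servers and handle petabytes of data.

Elasticsearch comes with a wide set of features. Beyond speed, scalability, and resiliency, it has a number of powerful built-in features (such as data rollups and index lifecycle management) that make storing and searching data even more efficient.

The Elastic Stack simplifies data ingestion, visualization, and reporting. Integration with Beats and Logstash makes it easy to process data before indexing it into Elasticsearch. Kibana provides real-time visualization of Elasticsearch data, as well as UIs for quickly accessing application performance monitoring (APM), logs, and infrastructure metrics data.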

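The DB-Engines Ranking shows that Elasticsearch is very popular.
[Figure: DB-Engines Ranking screenshot]
The Elasticsearch technology ecosystem is also broad and rich.

[Figure: Elastic Stack ecosystem diagram]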

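Basic Elasticsearch concepts

Near Realtime (NRT): a document becomes searchable shortly after it is indexed, typically within about one second, rather than instantly.
Cluster: a cluster is identified by a unique name, which defaults to "elasticsearch". The cluster name matters because only nodes with the same cluster name form a cluster together. It can be set in the configuration file.
Node: a node stores the cluster's data and takes part in its indexing and search work. Like a cluster, a node has a name. In 7.x the name defaults to the machine's hostname (older versions used the first seven characters of a random UUID), and you can set any name you like. Nodes discover peers on the network and form a cluster with those that share the same cluster name. A single node can also be a cluster on its own. A node is one Elasticsearch instance, which is essentially a Java process.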

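Elasticsearch nodes fall mainly into the following types:

Master-eligible node: every node is master-eligible by default after startup. This can be disabled with node.master: false (from 7.9 on, this setting is deprecated in favor of node.roles). A master-eligible node can take part in the master election and become the master node (when the first node starts, it elects itself as master).
Note: every node keeps a copy of the cluster state, but only the master node can modify it.

Data node: a node that can hold data. It is mainly responsible for storing shard data, which is what lets data capacity scale out.

Coordinating node: receives client requests, forwards them to the appropriate nodes, and finally gathers the results together.

Note: every node acts as a coordinating node by default. In a development environment one node usually takes on several roles, but in production it is better to give each node a single role, because this helps performance.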
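The gather step performed by a coordinating node can be sketched as a merge of per-shard result lists. The Python below is a simplified illustration, not Elasticsearch's actual implementation: hits are assumed to be `(score, doc_id)` tuples that each shard has already sorted by descending score, and all names and values are hypothetical.

```python
import heapq
from itertools import islice


def merge_shard_hits(per_shard_hits, size):
    """Merge per-shard hit lists (each sorted by descending score) into a global top-N."""
    merged = heapq.merge(*per_shard_hits, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, size))


# Hypothetical results returned by two shards for the same query.
shard0 = [(0.9, "a"), (0.4, "b")]
shard1 = [(0.7, "c"), (0.1, "d")]
print(merge_shard_hits([shard0, shard1], size=3))
# [(0.9, 'a'), (0.7, 'c'), (0.4, 'b')]
```

Because each shard only returns its own best hits, the coordinating node never has to see the full data set, only the candidates it needs to merge.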

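Index: an index is a collection of documents (equivalent to a collection in Solr). Each index has a unique name, which is used to operate on it, and a cluster can hold any number of indices. An index is roughly comparable to a database in MySQL: it is a collection of documents that share similar characteristics, so documents with different characteristics usually go into different indices. Elasticsearch relies on inverted indexes, with Lucene's inverted index as the underlying implementation.
Type: originally, one index could hold several types of documents, such as user data and blog data. A type was comparable to a table in MySQL: just as a MySQL database can contain many tables, an early Elasticsearch index could contain many types. Types have since been phased out. Indices created in 6.x can contain only one type, 7.x deprecates types in the APIs (with _doc as the default), and 8.0 removes them.
Document: a single piece of indexed data, the basic unit of information in an index, expressed as JSON. Just as Java is object-oriented, Elasticsearch is document-oriented: the document is the smallest unit of searchable data. A document is like a row in MySQL, except that it is serialized as JSON and stored in Elasticsearch. The JSON object is made up of fields, which are comparable to MySQL columns, and each field has its own type.
Shard: an index may store a large amount of data, possibly more than the hardware of a single node can handle. To solve this, Elasticsearch can split an index into several pieces called shards. The number of shards can be specified when the index is created. Each shard is itself a fully functional, independent "index" that can be placed on any node in the cluster.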

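Benefits of sharding:
It lets us split and scale capacity horizontally.
Operations can be distributed and run in parallel across multiple shards, which improves performance and throughput.
Elasticsearch manages shard allocation and the aggregation of documents for search requests fully automatically, and this is transparent to the user.

Note: the number of primary shards is specified when the index is created. Changing it later means reindexing (or using the split and shrink index APIs), which is cumbersome, so it is usually left alone. The number of replicas can be changed at any time.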
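The reason the primary shard count is hard to change becomes clearer from how a document is routed to a shard: conceptually, `shard = hash(routing) % number_of_primary_shards`. The Python sketch below illustrates that idea only. It is a simplification, and `zlib.crc32` is a stand-in assumption for the hash function Elasticsearch really uses (Murmur3); the document IDs are hypothetical.

```python
import zlib


def route_to_shard(routing, num_primary_shards):
    """Pick a shard for a routing value (by default the document ID)."""
    # zlib.crc32 is a stand-in; Elasticsearch uses its own Murmur3-based hash.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


doc_ids = [f"doc-{i}" for i in range(1000)]  # hypothetical document IDs

# With a fixed shard count, the same ID always lands on the same shard.
assert route_to_shard("doc-42", 5) == route_to_shard("doc-42", 5)

# Changing the shard count changes the target shard for many documents.
moved = sum(route_to_shard(d, 5) != route_to_shard(d, 6) for d in doc_ids)
print(f"{moved} of {len(doc_ids)} documents would map to a different shard")
```

Since lookups by ID rely on the same formula, changing the divisor would make existing documents unfindable unless the data is rewritten, which is what a reindex does.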

Replication: a shard can have multiple replicas (copies).

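Benefits of replication:
1. High availability. If a primary shard fails, a replica shard takes over. Note that a replica shard is never allocated to the same node as the primary shard it copies, because if that node went down, both the primary and the replica would be lost. The eggs are never put in one basket.
2. Higher search volume and throughput, because Elasticsearch can run searches on all replicas in parallel.
3. By default, since 7.0 each index gets 1 primary shard and 1 replica of that primary (versions before 7.0 defaulted to 5 primary shards). The number of primary shards is fixed when the index is created, while the number of replicas can be changed afterwards.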
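As a small illustration of where these numbers are set, the Python sketch below builds the JSON bodies for creating an index with explicit shard settings (`PUT /<index>`) and for changing the replica count later (`PUT /<index>/_settings`). The helper names and the values 3 and 1 are illustrative assumptions; `number_of_shards` and `number_of_replicas` are the actual setting names.

```python
import json


def create_index_body(primary_shards, replicas):
    """Body for PUT /<index>: the primary shard count is fixed from here on."""
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas,
        }
    }


def update_replicas_body(replicas):
    """Body for PUT /<index>/_settings: replicas can be changed at any time."""
    return {"index": {"number_of_replicas": replicas}}


def total_shard_copies(primary_shards, replicas):
    """Each primary shard has `replicas` copies in addition to itself."""
    return primary_shards * (1 + replicas)


print(json.dumps(create_index_body(3, 1)))
print(json.dumps(update_replicas_body(2)))
print(total_shard_copies(3, 1))  # 6
```

With 3 primaries and 1 replica each, the cluster has to place 6 shard copies, and each replica needs a node other than the one holding its primary.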

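The concepts compare as follows (the original comparison figure is not preserved; the mapping below restates the analogies given above):
Elasticsearch index: MySQL database
Elasticsearch type: MySQL table
Elasticsearch document: MySQL row
Elasticsearch field: MySQL column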

Deploying Elasticsearch and Kibana with Docker

Pull the image

Search the registry for available images

docker search elasticsearch

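The official image has the most STARS, but the nshou/elasticsearch-kibana image listed below it bundles Elasticsearch and Kibana together.
Its version is 7.16. Elasticsearch has since moved on to 8.x, but 7.16 is more than enough here.
Pull the image (no tag is given, so the latest version is pulled by default)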

docker pull nshou/elasticsearch-kibana

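Create directories

These are the kinds of directories usually needed to mount files from inside the service container. Create them now, since they may come in handy later.
In other words, the configuration files, log files, and persisted data are mounted onto the host.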

mkdir -p /usr/local/elasticsearch/config
mkdir -p /usr/local/elasticsearch/logs
mkdir -p /usr/local/elasticsearch/data

mkdir -p /usr/local/kibana/config

Grant permissions on these directories.
-R applies the permissions recursively.

chmod -R 777 /usr/local/elasticsearch
chmod -R 777 /usr/local/kibana/config

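Elasticsearch configuration file: elasticsearch.yml

This is the Elasticsearch configuration file. Change the cluster name, the node name, and the storage paths for logs and data.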

# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: xt-application
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: node-1
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
## Storage path for index data
path.data: /home/elasticsearch/elasticsearch-7.16.2/data 
#
# Path to log files:
## Storage path for log files
path.logs: /home/elasticsearch/elasticsearch-7.16.2/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# By default Elasticsearch is only accessible on localhost. Set a different
# address here to expose this node on the network:
#
#network.host: 192.168.0.1
#
# By default Elasticsearch listens for HTTP traffic on the first free port it
# finds starting at 9200. Set a specific HTTP port here:
#
#http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.seed_hosts: ["host1", "host2"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
#cluster.initial_master_nodes: ["node-1", "node-2"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true
#
# ---------------------------------- Security ----------------------------------
#
#                                 *** WARNING ***
#
# Elasticsearch security features are not enabled by default.
# These features are free, but require configuration changes to enable them.
# This means that users don’t have to provide credentials and can get full access
# to the cluster. Network connections are also not encrypted.
#
# To protect your data, we strongly encourage you to enable the Elasticsearch security features. 
# Refer to the following documentation for instructions.
#
# https://www.elastic.co/guide/en/elasticsearch/reference/7.16/configuring-stack-security.html

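Elasticsearch configuration file: jvm.options

Java virtual machine settings

You can change the heap size here. If your server is short on resources, it is best to lower it. As shown below, I set the minimum to 512 MB and the maximum to 1 GB. Note that the comments in the file itself recommend setting the minimum and maximum to the same value and placing overrides in a file under jvm.options.d.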

-Xms512m
-Xmx1g

################################################################
##
## JVM configuration
##
################################################################
##
## WARNING: DO NOT EDIT THIS FILE. If you want to override the
## JVM options in this file, or set any additional options, you
## should create one or more files in the jvm.options.d
## directory containing your adjustments.
##
## See https://www.elastic.co/guide/en/elasticsearch/reference/7.16/jvm-options.html
## for more information.
##
################################################################



################################################################
## IMPORTANT: JVM heap size
################################################################
##
## The heap size is automatically configured by Elasticsearch
## based on the available memory in your system and the roles
## each node is configured to fulfill. If specifying heap is
## required, it should be done through a file in jvm.options.d,
## and the min and max should be set to the same value. For
## example, to set the heap to 4 GB, create a new file in the
## jvm.options.d directory containing these lines:
##
## -Xms4g
## -Xmx4g
##
## See https://www.elastic.co/guide/en/elasticsearch/reference/7.16/heap-size.html
## for more information
##
################################################################


################################################################
## Expert settings
################################################################
##
## All settings below here are considered expert settings. Do
## not adjust them unless you understand what you are doing. Do
## not edit them in this file; instead, create a new file in the
## jvm.options.d directory containing your adjustments.
##
################################################################

## GC configuration
8-13:-XX:+UseConcMarkSweepGC
8-13:-XX:CMSInitiatingOccupancyFraction=75
8-13:-XX:+UseCMSInitiatingOccupancyOnly

## G1GC Configuration
# NOTE: G1 GC is only supported on JDK version 10 or later
# to use G1GC, uncomment the next two lines and update the version on the
# following three lines to your version of the JDK
# 10-13:-XX:-UseConcMarkSweepGC
# 10-13:-XX:-UseCMSInitiatingOccupancyOnly
14-:-XX:+UseG1GC

## JVM temporary directory
-Djava.io.tmpdir=${ES_TMPDIR}

## heap dumps

# generate a heap dump when an allocation from the Java heap fails; heap dumps
# are created in the working directory of the JVM unless an alternative path is
# specified
-XX:+HeapDumpOnOutOfMemoryError

# exit right after heap dump on out of memory error. Recommended to also use
# on java 8 for supported versions (8u92+).
9-:-XX:+ExitOnOutOfMemoryError

# specify an alternative path for heap dumps; ensure the directory exists and
# has sufficient space
-XX:HeapDumpPath=data

# specify an alternative path for JVM fatal error logs
-XX:ErrorFile=logs/hs_err_pid%p.log

## JDK 8 GC logging
8:-XX:+PrintGCDetails
8:-XX:+PrintGCDateStamps
8:-XX:+PrintTenuringDistribution
8:-XX:+PrintGCApplicationStoppedTime
8:-Xloggc:logs/gc.log
8:-XX:+UseGCLogFileRotation
8:-XX:NumberOfGCLogFiles=32
8:-XX:GCLogFileSize=64m

# JDK 9+ GC logging
9-:-Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m

Kibana configuration file: kibana.yml

Add one line to switch the Kibana UI to Chinese, and put the file in the local Kibana config directory

i18n.locale: zh-CN

Create and start the container

Keep the port mappings unchanged

docker run -itd -p 9200:9200 -p 9300:9300 -p 5601:5601 \
-v /usr/local/elasticsearch/config/jvm.options:/home/elasticsearch/elasticsearch-7.16.2/config/jvm.options \
-v /usr/local/elasticsearch/config/elasticsearch.yml:/home/elasticsearch/elasticsearch-7.16.2/config/elasticsearch.yml \
-v /usr/local/elasticsearch/data:/home/elasticsearch/elasticsearch-7.16.2/data \
-v /usr/local/elasticsearch/logs:/home/elasticsearch/elasticsearch-7.16.2/logs \
-v /usr/local/kibana/config/kibana.yml:/home/elasticsearch/kibana-7.16.2-linux-x86_64/config/kibana.yml \
--name EsKibana  nshou/elasticsearch-kibana

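9300: Elasticsearch transport port, used for communication between nodes
9200: Elasticsearch HTTP access port
5601: Kibana access port

Parameters:

-p 9200:9200: maps port 9200 of the container to port 9200 of the host (the host port comes first)
--name: the name of the container once it is started
-v: mounts the configuration files from the host into the container (so first move the configuration files above to the specified host paths)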
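To show how the `-p host:container` and `-v host:container` pairs fit together, here is a small Python sketch that assembles an equivalent `docker run` command line from lists of pairs. The helper name `docker_run_command` is made up for this illustration; the ports, paths, container name, and image are the ones used in this article, with only some of the volumes shown. It prints the command and does not run it.

```python
import shlex


def docker_run_command(image, name, ports, volumes):
    """Assemble a `docker run` argument list; host values come first in each pair."""
    argv = ["docker", "run", "-itd"]
    for host_port, container_port in ports:
        argv += ["-p", f"{host_port}:{container_port}"]
    for host_path, container_path in volumes:
        argv += ["-v", f"{host_path}:{container_path}"]
    argv += ["--name", name, image]
    return argv


es_home = "/home/elasticsearch/elasticsearch-7.16.2"  # path inside the container
argv = docker_run_command(
    image="nshou/elasticsearch-kibana",
    name="EsKibana",
    ports=[(9200, 9200), (9300, 9300), (5601, 5601)],
    volumes=[
        ("/usr/local/elasticsearch/data", f"{es_home}/data"),
        ("/usr/local/elasticsearch/logs", f"{es_home}/logs"),
    ],
)
print(shlex.join(argv))
```

Keeping the mappings as data makes it harder to swap the host and container sides by accident, which is an easy mistake with the colon-separated syntax.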

Open the ports to external access

firewall-cmd --permanent --add-port=9200/tcp
firewall-cmd --permanent --add-port=9300/tcp
firewall-cmd --permanent --add-port=5601/tcp

Reload the firewall (required after changing the configuration)

firewall-cmd --reload

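You may also need to open these ports in the firewall settings of your server's management console.
On Alibaba Cloud, you additionally have to add the ports to the security group by hand and configure which IPs may access them.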
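One way to confirm that the ports are actually reachable is a plain TCP connection test. The Python sketch below uses only the standard library; the function name `port_is_open` is made up here, and the host `127.0.0.1` is a placeholder assumption that you would replace with your server's IP when testing from another machine.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Placeholder host; replace with the server's address to test from outside.
for port in (9200, 9300, 5601):
    state = "open" if port_is_open("127.0.0.1", port) else "closed or filtered"
    print(f"port {port}: {state}")
```

If a port is open locally on the server but not from outside, the firewall or the cloud security group is the likely cause.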

Install Elasticsearch Head

It serves as a graphical interface for Elasticsearch. It is a browser extension, so you can install it directly from the Chrome extension store:
Chrome Web Store link

Access Elasticsearch

http://ip:9200/
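The root endpoint answers with a small JSON document describing the node. The Python sketch below shows one way to read it. The two function names are made up for this illustration, it assumes security is not enabled (so no credentials are needed), and the sample dictionary is a hypothetical response that reuses the names from the configuration above rather than real output.

```python
import json
import urllib.request


def fetch_root_info(base_url, timeout=5.0):
    """GET the root endpoint of a running node and decode its JSON body."""
    with urllib.request.urlopen(base_url, timeout=timeout) as resp:
        return json.load(resp)


def summarize(info):
    """Pick out the node name, cluster name, and version number."""
    return (
        f"{info['name']} in cluster {info['cluster_name']} "
        f"runs Elasticsearch {info['version']['number']}"
    )


# Hypothetical response, shaped like the root endpoint's JSON.
sample = {
    "name": "node-1",
    "cluster_name": "xt-application",
    "version": {"number": "7.16.2"},
}
print(summarize(sample))

# Against a live node, something like:
# print(summarize(fetch_root_info("http://ip:9200/")))
```

Seeing your own cluster and node names in the response is a quick check that the mounted elasticsearch.yml was picked up.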


Access Kibana

http://ip:5601/


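Using Elasticsearch Head

Elasticsearch Head is a management plugin that connects to Elasticsearch and provides a visual interface for configuring it and retrieving data. For example, by writing RESTful requests on the Head page you can create, read, update, and delete data in Elasticsearch, or create and delete indices. It is similar to using Navicat to connect to and operate on a relational database such as MySQL.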

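Cluster health. Elasticsearch has a dedicated indicator of cluster and index health, with three levels:

green: all primary and replica shards are allocated. Your cluster is 100% operational.
yellow: all primary shards are allocated, but at least one replica is missing. No data is lost, so search results are still complete, but high availability is compromised to some degree. If more shards disappear, you may lose data. Think of yellow as a warning that should prompt investigation.
red: at least one primary shard, along with all of its replicas, is missing. This means you are missing data: searches return only partial results, and write requests routed to that shard return an exception.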
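The three levels can be expressed as a short decision rule. The Python sketch below is a simplified model for illustration, not Elasticsearch's code: each shard copy is a hypothetical dictionary with a `primary` flag and an `assigned` flag, and an unassigned primary is treated as red because a primary only stays unassigned when no replica was available to be promoted.

```python
def cluster_health(shards):
    """Classify health from shard copies described by 'primary' and 'assigned' flags."""
    if any(s["primary"] and not s["assigned"] for s in shards):
        return "red"      # some data is unavailable
    if any(not s["assigned"] for s in shards):
        return "yellow"   # all data is available, but redundancy is reduced
    return "green"


# Hypothetical single-node setup: the replica cannot share a node with its primary.
single_node = [
    {"primary": True, "assigned": True},
    {"primary": False, "assigned": False},
]
print(cluster_health(single_node))  # yellow
```

This is why a single-node deployment like the Docker container above typically reports yellow for any index that has a replica configured: there is no second node to hold the replica.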

References:

https://www.elastic.co/cn/what-is/elasticsearch
https://ropledata.blog.csdn.net/article/details/106423578?spm=1001.2101.3001.6650.3&utm_medium=distribute.pc_relevant.none-task-blog-2%7Edefault%7EBlogCommendFromBaidu%7ERate-3.pc_relevant_default&depth_1-utm_source=distribute.pc_relevant.none-task-blog-2%7Edefault%7EBlogCommendFromBaidu%7ERate-3.pc_relevant_default&utm_relevant_index=6
https://www.cnblogs.com/tdp0108/p/11105848.html
https://www.cnblogs.com/leeSmall/p/9189078.html

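(I write this blog mainly to summarize and organize what I learn. Most of the material comes from books, online resources, official documentation, and my own practice. If you find gaps or mistakes, please point them out in the comments. Thanks also to the many bloggers and authors who took the time to compile and share their resources and knowledge.)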
