Building a Production-Grade ELK Logging Platform

Latest recommended article published on 2024-06-27 18:19:02

weixin_33994429

Tags: operating systems, ldap, git

Original link: http://blog.51cto.com/zgui2000/2406177

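1. Building a production-grade ELK logging platform:

This post covers the following topics and hands-on experience, for your reference:

1. Introduction to the ELK logging platform:

2. Setting up the Kafka cluster:

3. Setting up the Logstash cluster:

4. Setting up the Elasticsearch cluster:

5. Installing and configuring Kibana:

6. Introduction to the X-Pack plugin:

7. Introduction to Filebeat, the log collection client: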

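1. Introduction to the ELK logging platform:

The ELK logging platform described in this article consists of the following components: Filebeat, Kafka, Logstash, Elasticsearch, and Kibana. The role of each component is described later in this section.

First, why have a unified logging platform at all? Logs are generated by servers and written to different files, typically system logs, application logs, and security logs, and these are scattered across different machines.
When a failure occurs, engineers have to log in to each server and dig through the logs with Linux tools such as grep, sed, and awk to find the cause. Without a logging system, you first have to locate the server that handled the request; if that server runs several application instances, you then have to look for the log file in each instance's log directory. Each instance also has its own log rotation policy (for example, one file per day) as well as compression and archiving policies. All of this makes troubleshooting and finding the root cause in time considerably harder. If the logs are managed centrally and can be searched centrally, diagnosis becomes faster and you also get an overall picture of the system, instead of firefighting after the fact.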

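In my view, a unified logging platform serves several important purposes:

Data lookup: search the logs to locate a bug and work out a fix
Service diagnosis: aggregate and analyze log data to understand server load and service health
Data analysis: support further analysis, such as deriving PV and UV from nginx access logs
So what is each component used in this article for? Mainly the following: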

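Logstash: a data collection and processing engine. It can dynamically collect data from a wide range of sources, filter, parse, enrich, and normalize it, and then store it for later use.

Kibana: a visualization platform. It searches and displays the indexed data stored in Elasticsearch, and makes it easy to present and analyze data with charts, tables, and maps.

Elasticsearch: a distributed search engine that is highly scalable, highly reliable, and easy to manage. It supports full-text search, structured search, and analytics, and can combine all three. Elasticsearch is built on Lucene and is one of the most widely used open-source search engines today; Wikipedia, StackOverflow, GitHub, and others build their search on it.

Filebeat: a lightweight data collection engine, derived from the source code of the earlier Logstash-Forwarder. In other words, Filebeat is the new Logstash-Forwarder and the first choice for the shipper side of the ELK Stack.

Kafka: in the scenario demonstrated here, Filebeat collects the data and its output plugin delivers it to the Kafka cluster. The Logstash cluster then consumes the Kafka messages as its input and writes them to the ES cluster as its output. When Logstash receives data faster than the Elasticsearch cluster can process it, the queue absorbs the peaks and smooths out the load, so data is buffered in Kafka instead of being lost on the way to Elasticsearch.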

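On multi-datacenter deployments: our company, for example, has several datacenters. With a single ELK cluster, logs from every datacenter would be shipped to that one cluster, which places very high demands on network bandwidth. Unless your company can afford enough dedicated-line bandwidth, I recommend deploying a separate ELK cluster in each datacenter.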

The servers used in this setup are listed below:

Server IP	Hardware (CPU/RAM/disk)	Role	Installed software
192.168.1.12	4C/8G/100G	Kafka cluster node 1	kafka_2.11-1.0.0.tgz, kafka manager
192.168.1.14	4C/8G/100G	Kafka cluster node 2	kafka_2.11-1.0.0.tgz
192.168.1.15	4C/8G/100G	Kafka cluster node 3	kafka_2.11-1.0.0.tgz
192.168.1.16	4C/8G/100G	Logstash (log forwarding and format definition)	logstash-6.2.4.tar.gz
192.168.1.17	4C/8G/100G	Elasticsearch master node 1	elasticsearch-6.2.4.tar.gz, x-pack-6.2.4.zip
192.168.1.18	4C/8G/100G	Elasticsearch master node 2	elasticsearch-6.2.4.tar.gz, x-pack-6.2.4.zip
192.168.1.25	4C/8G/100G	Elasticsearch master node 3 and Kibana (log display)	elasticsearch-6.2.4.tar.gz, x-pack-6.2.4.zip, kibana-6.2.4-linux-x86_64.tar.gz
192.168.1.19	4C/8G/100G	Elasticsearch client node 1	elasticsearch-6.2.4.tar.gz, x-pack-6.2.4.zip, head plugin, Cerebro v0.8.3
192.168.1.24	4C/8G/100G	Elasticsearch client node 2	elasticsearch-6.2.4.tar.gz, x-pack-6.2.4.zip
192.168.1.26	8C/16G/500G	Elasticsearch data node 1	elasticsearch-6.2.4.tar.gz, x-pack-6.2.4.zip
192.168.1.28	8C/16G/500G	Elasticsearch data node 2	elasticsearch-6.2.4.tar.gz, x-pack-6.2.4.zip
192.168.1.29	8C/16G/500G	Elasticsearch data node 3	elasticsearch-6.2.4.tar.gz, x-pack-6.2.4.zip

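2. Setting up the Kafka cluster:

Start with the basic host preparation, which earlier posts have covered many times, so I will not repeat it here: disable SELinux, disable the firewall, configure NTP, configure the hosts file, configure the yum repositories, and so on.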

# Same on every machine: download Kafka, extract it under /usr/local, and create a symlink:
tar xzvf  kafka_2.11-1.0.0.tgz -C /usr/local
cd /usr/local
ln -sv kafka_2.11-1.0.0 kafka

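# Kafka ships with its own ZooKeeper, so both the ZooKeeper and the Kafka configuration files must be set up on every machine
# Edit the ZooKeeper configuration file on the first node: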

[root@SZ1PRDELK00AP002 config]# cat zookeeper.properties
tickTime=2000
dataDir=/opt/zookeeper
clientPort=2181
maxClientCnxns=50
initLimit=10
syncLimit=5
server.1=192.168.1.12:2888:3888
server.2=192.168.1.14:2888:3888
server.3=192.168.1.15:2888:3888

# Create /opt/zookeeper in advance and create the myid file in it; its content is the N from this node's server.N entry, here 1
[root@SZ1PRDELK00AP002 ~]# cat /opt/zookeeper/myid
1
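Since the myid value has to agree with the server.N line for the same machine, it can be derived from the configuration file instead of being typed by hand. The following is only a minimal sketch under that assumption; the helper name myid_for_ip is hypothetical and not part of ZooKeeper or Kafka.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the N of the "server.N=IP:..." line for the given IP
# in a zookeeper.properties file. Prints nothing if the IP is not listed.
myid_for_ip() {
    local conf="$1" ip="$2" ip_re
    # Escape the dots so that they match literally in the sed pattern.
    ip_re=$(printf '%s' "$ip" | sed 's/\./\\./g')
    sed -n "s/^server\.\([0-9][0-9]*\)=${ip_re}:.*/\1/p" "$conf"
}

# Possible usage on node 192.168.1.14 (paths as used in this article):
# myid_for_ip /usr/local/kafka/config/zookeeper.properties 192.168.1.14 > /opt/zookeeper/myid
```

Deriving the value this way keeps myid and the server list from drifting apart when the same configuration file is copied to all three nodes.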

# Next, edit the Kafka configuration file on the first node:

[root@SZ1PRDELK00AP002 config]# cat server.properties
broker.id=1
listeners=PLAINTEXT://192.168.1.12:9092
port=9092
host.name=192.168.1.12
num.replica.fetchers=1
log.dirs=/opt/kafka_logs
num.partitions=3
zookeeper.connect=192.168.1.12:2181,192.168.1.14:2181,192.168.1.15:2181
zookeeper.connection.timeout.ms=6000
zookeeper.sync.time.ms=2000
num.io.threads=8
num.network.threads=8
queued.max.requests=16
fetch.purgatory.purge.interval.requests=100
producer.purgatory.purge.interval.requests=100
delete.topic.enable=true
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
num.recovery.threads.per.data.dir=1
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
[root@SZ1PRDELK00AP002 config]#

[root@SZ1PRDELK00AP002 config]# cat producer.properties
bootstrap.servers=192.168.1.12:9092,192.168.1.14:9092,192.168.1.15:9092
compression.type=none
[root@SZ1PRDELK00AP002 config]#

[root@SZ1PRDELK00AP002 config]# cat consumer.properties
zookeeper.connect=192.168.1.12:2181,192.168.1.14:2181,192.168.1.15:2181
zookeeper.connection.timeout.ms=6000
group.id=test-consumer-group
[root@SZ1PRDELK00AP002 config]#

# Next, edit the ZooKeeper configuration file on the second Kafka node:

[root@SZ1PRDELK00AP003 config]# cat zookeeper.properties
tickTime=2000
dataDir=/opt/zookeeper
clientPort=2181
maxClientCnxns=50
initLimit=10
syncLimit=5
server.1=192.168.1.12:2888:3888
server.2=192.168.1.14:2888:3888
server.3=192.168.1.15:2888:3888

[root@SZ1PRDELK00AP003 config]# cat /opt/zookeeper/myid
2

# Edit the Kafka configuration file on the second Kafka node:

[root@SZ1PRDELK00AP003 config]# cat server.properties
broker.id=2
listeners=PLAINTEXT://192.168.1.14:9092
port=9092
host.name=192.168.1.14
num.replica.fetchers=1
log.dirs=/opt/kafka_logs
num.partitions=3
zookeeper.connect=192.168.1.12:2181,192.168.1.14:2181,192.168.1.15:2181
zookeeper.connection.timeout.ms=6000
zookeeper.sync.time.ms=2000
num.io.threads=8
num.network.threads=8
queued.max.requests=16
fetch.purgatory.purge.interval.requests=100
producer.purgatory.purge.interval.requests=100
delete.topic.enable=true
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
num.recovery.threads.per.data.dir=1
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
[root@SZ1PRDELK00AP003 config]#

[root@SZ1PRDELK00AP003 config]# cat producer.properties
bootstrap.servers=192.168.1.12:9092,192.168.1.14:9092,192.168.1.15:9092
compression.type=none
[root@SZ1PRDELK00AP003 config]#

[root@SZ1PRDELK00AP003 config]# cat consumer.properties
zookeeper.connect=192.168.1.12:2181,192.168.1.14:2181,192.168.1.15:2181
zookeeper.connection.timeout.ms=6000
group.id=test-consumer-group
[root@SZ1PRDELK00AP003 config]#

# Now edit the ZooKeeper configuration file on the third Kafka node:

[root@SZ1PRDELK00AP004 config]# cat zookeeper.properties
tickTime=2000
dataDir=/opt/zookeeper
clientPort=2181
maxClientCnxns=50
initLimit=10
syncLimit=5
server.1=192.168.1.12:2888:3888
server.2=192.168.1.14:2888:3888
server.3=192.168.1.15:2888:3888
[root@SZ1PRDELK00AP004 config]#
[root@SZ1PRDELK00AP004 config]# cat /opt/zookeeper/myid
3

# Edit the Kafka configuration file on the third Kafka node:

[root@SZ1PRDELK00AP004 config]# cat server.properties
broker.id=3
listeners=PLAINTEXT://192.168.1.15:9092
port=9092
host.name=192.168.1.15
num.replica.fetchers=1
log.dirs=/opt/kafka_logs
num.partitions=3
zookeeper.connect=192.168.1.12:2181,192.168.1.14:2181,192.168.1.15:2181
zookeeper.connection.timeout.ms=6000
zookeeper.sync.time.ms=2000
num.io.threads=8
num.network.threads=8
queued.max.requests=16
fetch.purgatory.purge.interval.requests=100
producer.purgatory.purge.interval.requests=100
delete.topic.enable=true
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
num.recovery.threads.per.data.dir=1
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
[root@SZ1PRDELK00AP004 config]#

[root@SZ1PRDELK00AP004 config]# cat producer.properties
bootstrap.servers=192.168.1.12:9092,192.168.1.14:9092,192.168.1.15:9092
compression.type=none

[root@SZ1PRDELK00AP004 config]# cat consumer.properties
zookeeper.connect=192.168.1.12:2181,192.168.1.14:2181,192.168.1.15:2181
zookeeper.connection.timeout.ms=6000
group.id=test-consumer-group
[root@SZ1PRDELK00AP004 config]#
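Comparing the three server.properties files above, only broker.id, listeners, and host.name differ between brokers. As a rough sketch (the function name broker_overrides is hypothetical, and port 9092 is taken from the configs above), those per-broker lines could be generated like this and combined with a shared file holding the common settings:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the settings that differ per broker.
# $1 = broker id, $2 = broker IP address.
broker_overrides() {
    local id="$1" ip="$2"
    printf 'broker.id=%s\n' "$id"
    printf 'listeners=PLAINTEXT://%s:9092\n' "$ip"
    printf 'host.name=%s\n' "$ip"
}

# Emit the overrides for the three brokers used in this article.
i=1
for ip in 192.168.1.12 192.168.1.14 192.168.1.15; do
    echo "# broker $i"
    broker_overrides "$i" "$ip"
    i=$((i + 1))
done
```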

Once the Kafka cluster is configured, start it on every node, ZooKeeper first and then the broker:

cd /usr/local/kafka/bin
./zookeeper-server-start.sh -daemon ../config/zookeeper.properties
./kafka-server-start.sh -daemon ../config/server.properties

Check that the Kafka cluster is healthy as follows:

# List the topics:
cd /usr/local/kafka/bin
./kafka-topics.sh --list --zookeeper 192.168.1.12:2181
__consumer_offsets
test-infotest
test-infotest2
test
[root@SZ1PRDELK00AP002 bin]#

# Test sending messages between a producer and a consumer:
# First create a topic:
./bin/kafka-topics.sh --zookeeper 192.168.1.12:2181,192.168.1.14:2181,192.168.1.15:2181 --topic test --replication-factor 2 --partitions 3 --create

# Then start a console producer on one of the three Kafka nodes:

./bin/kafka-console-producer.sh --broker-list 192.168.1.12:9092,192.168.1.14:9092,192.168.1.15:9092 --topic test

# Next start a console consumer on another of the three Kafka nodes:

./bin/kafka-console-consumer.sh --zookeeper 192.168.1.12:2181,192.168.1.14:2181,192.168.1.15:2181 --from-beginning --topic test

[root@SZ1PRDELK00AP002 bin]# cd ..
[root@SZ1PRDELK00AP002 kafka]# ./bin/kafka-console-producer.sh --broker-list 192.168.1.12:9092,192.168.1.14:9092,192.168.1.15:9092 --topic test
>123
>
[root@SZ1PRDELK00AP003 kafka]# ./bin/kafka-console-consumer.sh --zookeeper 192.168.1.12:2181,192.168.1.14:2181,192.168.1.15:2181 --from-beginning --topic test
Using the ConsoleConsumer with old consumer is deprecated and will be removed in a future major release. Consider using the new consumer by passing [bootstrap-server] instead of [zookeeper].
123

# As you can see, the message sent by the producer was received by the consumer.
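The test topic was created with a replication factor of 2 on a three-broker cluster. Kafka refuses to create a topic whose replication factor is larger than the number of available brokers, so a small pre-flight check can catch that before the command is run. This is only a sketch; the function name replication_ok is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: succeed only if the replication factor
# is at least 1 and not larger than the number of brokers.
# $1 = replication factor, $2 = number of brokers.
replication_ok() {
    local rf="$1" brokers="$2"
    [ "$rf" -ge 1 ] && [ "$rf" -le "$brokers" ]
}

if replication_ok 2 3; then
    echo "replication factor 2 fits a 3-broker cluster"
fi
if ! replication_ok 4 3; then
    echo "replication factor 4 does not fit a 3-broker cluster"
fi
```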

There is a web-based Kafka management tool on GitHub called kafka manager that you can download and deploy. The steps are as follows:

cd /usr/local
git clone https://github.com/yahoo/kafka-manager.git
cd /usr/local/kafka-manager/
mkdir -p /root/.sbt/launchers/1.2.8
vim conf/application.conf

[root@SZ1PRDELK00AP002 conf]# cat application.conf

# Copyright 2015 Yahoo Inc. Licensed under the Apache License, Version 2.0
# See accompanying LICENSE file.

# This is the main configuration file for the application.
# ~~~~~

# Secret key
# ~~~~~
# The secret key is used to secure cryptographics functions.
# If you deploy your application to several instances be sure to use the same key!
play.crypto.secret="^<csmm5Fx4d=r2HEX8pelM3iBkFVv?k[mc;IZE<_Qoq8EkX_/7@Zt6dP05Pzea3U"
play.crypto.secret=${?APPLICATION_SECRET}
play.http.session.maxAge="1h"

# The application languages
# ~~~~~
play.i18n.langs=["en"]

play.http.requestHandler = "play.http.DefaultHttpRequestHandler"
play.http.context = "/"
play.application.loader=loader.KafkaManagerLoader

kafka-manager.zkhosts="192.168.1.12:2181,192.168.1.14:2181,192.168.1.15:2181" # This is the main line to change: set it to the ZooKeeper cluster addresses of your own environment
kafka-manager.zkhosts=${?ZK_HOSTS}
pinned-dispatcher.type="PinnedDispatcher"
pinned-dispatcher.executor="thread-pool-executor"
application.features=["KMClusterManagerFeature","KMTopicManagerFeature","KMPreferredReplicaElectionFeature","KMReassignPartitionsFeature"]
kafka-manager.broker-view-thread-pool-size=30
kafka-manager.broker-view-max-queue-size=3000
kafka-manager.broker-view-update-seconds=30
kafka-manager.offset-cache-thread-pool-size=4
kafka-manager.offset-cache-max-queue-size=1000
kafka-manager.kafka-admin-client-thread-pool-size=4
kafka-manager.kafka-admin-client-max-queue-size=1000

akka {
  loggers = ["akka.event.slf4j.Slf4jLogger"]
  loglevel = "INFO"
}

akka.logger-startup-timeout = 60s

basicAuthentication.enabled=false
basicAuthentication.enabled=${?KAFKA_MANAGER_AUTH_ENABLED}

basicAuthentication.ldap.enabled=false
basicAuthentication.ldap.enabled=${?KAFKA_MANAGER_LDAP_ENABLED}
basicAuthentication.ldap.server=""
basicAuthentication.ldap.server=${?KAFKA_MANAGER_LDAP_SERVER}
basicAuthentication.ldap.port=389
basicAuthentication.ldap.port=${?KAFKA_MANAGER_LDAP_PORT}
basicAuthentication.ldap.username=""
basicAuthentication.ldap.username=${?KAFKA_MANAGER_LDAP_USERNAME}
basicAuthentication.ldap.password=""
basicAuthentication.ldap.password=${?KAFKA_MANAGER_LDAP_PASSWORD}
basicAuthentication.ldap.search-base-dn=""
basicAuthentication.ldap.search-base-dn=${?KAFKA_MANAGER_LDAP_SEARCH_BASE_DN}
basicAuthentication.ldap.search-filter="(uid=$capturedLogin$)"
basicAuthentication.ldap.search-filter=${?KAFKA_MANAGER_LDAP_SEARCH_FILTER}
basicAuthentication.ldap.connection-pool-size=10
basicAuthentication.ldap.connection-pool-size=${?KAFKA_MANAGER_LDAP_CONNECTION_POOL_SIZE}
basicAuthentication.ldap.ssl=false
basicAuthentication.ldap.ssl=${?KAFKA_MANAGER_LDAP_SSL}

basicAuthentication.username="admin"
basicAuthentication.username=${?KAFKA_MANAGER_USERNAME}
basicAuthentication.password="password"
basicAuthentication.password=${?KAFKA_MANAGER_PASSWORD}

basicAuthentication.realm="Kafka-Manager"
basicAuthentication.excluded=["/api/health"] # ping the health of your instance without authentification

kafka-manager.consumer.properties.file=${?CONSUMER_PROPERTIES_FILE}
[root@SZ1PRDELK00AP002 conf]#

./sbt clean dist  # Build command; it produces the package kafka-manager-2.0.0.2.zip
ls /usr/local/kafka-manager/target/universal/kafka-manager-2.0.0.2.zip
cd /usr/local/kafka-manager/target/universal/
unzip kafka-manager-2.0.0.2.zip
cd /usr/local/kafka-manager/target/universal/kafka-manager-2.0.0.2/bin
nohup ./kafka-manager -Dconfig.file=/usr/local/kafka-manager/target/universal/kafka-manager-2.0.0.2/conf/application.conf -Dhttp.port=8080 &  # Start kafka manager with an explicit configuration file and port

A few basic Kafka concepts are worth introducing briefly, because topics and related terms come up again later when we discuss logs:

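producer: a message producer; a client or service that publishes messages to the Kafka cluster.
broker: a server that is part of the Kafka cluster.
topic: the category to which every message published to the Kafka cluster belongs; Kafka is topic-oriented.
partition: a physical concept; each topic contains one or more partitions. The partition is Kafka's unit of allocation.
consumer: a client or service that consumes messages from the Kafka cluster.
consumer group: in the high-level consumer API, every consumer belongs to a consumer group. Each message is consumed by only one consumer within a consumer group, but it can be consumed by multiple consumer groups.
replica: a copy of a partition, which keeps the partition highly available.
leader: one of the replica roles; producers and consumers interact only with the leader.
follower: one of the replica roles; it replicates data from the leader.
controller: one of the servers in the Kafka cluster, responsible for leader election and the various failover tasks.
zookeeper: Kafka stores the cluster's metadata in ZooKeeper.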
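To make the relationship between partitions and a consumer group more concrete: within one group, each partition is read by exactly one consumer, which is why a topic with 3 partitions keeps at most 3 consumers of a group busy. The sketch below is a simplified illustration of a range-style assignment for a single topic, not Kafka's actual implementation; the consumer names consumer-0, consumer-1, and so on are made up.

```bash
#!/usr/bin/env bash
# Simplified sketch: spread PARTITIONS partitions (numbered from 0) over
# CONSUMERS consumers in contiguous ranges.
# $1 = number of partitions, $2 = number of consumers (must be at least 1).
range_assign() {
    local partitions="$1" consumers="$2"
    local per=$((partitions / consumers)) extra=$((partitions % consumers))
    local c p count line start=0
    for ((c = 0; c < consumers; c++)); do
        count=$per
        # The first "extra" consumers each take one additional partition.
        if [ "$c" -lt "$extra" ]; then
            count=$((per + 1))
        fi
        line="consumer-$c:"
        for ((p = start; p < start + count; p++)); do
            line="$line $p"
        done
        echo "$line"
        start=$((start + count))
    done
}

# 3 partitions shared by 2 consumers, then by 4 consumers (one stays idle).
range_assign 3 2
range_assign 3 4
```

With more consumers than partitions, the surplus consumers receive no partitions, so adding consumers beyond the partition count does not increase throughput.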

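3. Setting up the Logstash cluster:

With Filebeat and the Kafka cluster in place, the next step is to configure Logstash. Filebeat collects logs and ships them to the Kafka cluster, which acts as a message buffer, and Logstash then consumes them from Kafka. Logstash can also filter logs and parse their fields into structured JSON; the grok filter in particular is complex and has a steep learning curve. This article covers only how Logstash connects to Kafka and ES.

# Install JDK 1.8 first, then download logstash-6.2.4.tar.gz and extract it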
tar xzvf logstash-6.2.4.tar.gz -C /usr/local
cd /usr/local
ln -sv logstash-6.2.4 logstash
cd logstash/config

# Next, configure Logstash

input {
  kafka {
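    bootstrap_servers => "192.168.1.12:9092,192.168.1.14:9092,192.168.1.15:9092" # Addresses of the Kafka cluster; separate multiple addresses with commas
    topics_pattern => "elk-.*"  # Filebeat is covered later; every topic that our Filebeat instances write to in Kafka starts with "elk"
    group_id => "test-consumer-group"  # In a Logstash cluster every Logstash server must use the same group ID, so that each message is consumed once by the group instead of once per server
    codec => json  # Be sure to use the json codec here; otherwise the message field shown in Kibana will contain a lot of default metadata added by Filebeat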
    consumer_threads => 3
    decorate_events => true
    auto_offset_reset => "latest"
  }
}

output {
  elasticsearch {
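     hosts => ["192.168.1.19:9200","192.168.1.24:9200"] # IPs of the ES client nodes; separate multiple entries with commas
     user => "elastic"  # X-Pack authentication is enabled on our ES cluster, so a username and password are required to connect
     password => "654321"
     index => "%{[@metadata][kafka][topic]}-%{+YYYY-MM-dd}"  # The index name is built from the Kafka topic of the event and the event date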
     workers => 1
  }
}
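The topics_pattern setting decides which Kafka topics this pipeline consumes. As a quick sanity check, a topic name can be tested against the pattern before Filebeat is pointed at it. This is a sketch only: Logstash evaluates the pattern as a Java regular expression, while the helper below uses grep -E, which behaves the same for a simple pattern like "elk-.*". The function name topic_matches and the topic names elk-nginx-access and elk-app are made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the whole topic name matches the pattern.
# $1 = pattern (extended regular expression), $2 = topic name.
topic_matches() {
    printf '%s\n' "$2" | grep -Eqx -- "$1"
}

for topic in elk-nginx-access elk-app test test-infotest; do
    if topic_matches 'elk-.*' "$topic"; then
        echo "$topic: consumed by this pipeline"
    else
        echo "$topic: ignored by this pipeline"
    fi
done
```

Note that the test topics created earlier in this article (test, test-infotest) would not match, which is expected: only topics following the elk- naming convention are picked up.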

# Start Logstash
/usr/local/logstash/bin/logstash -f test-infotest.conf

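Three things need attention here:

Index names must not contain uppercase letters, and it is best to include a date in the name;
Because our Elasticsearch cluster has the X-Pack plugin installed and requires authentication, Logstash must be configured with credentials for connecting to ES; the elastic superuser account will do;
Do not output to Elasticsearch right away. Output to stdout first to check whether data is arriving from Kafka. The configuration is as follows:
```
output {
  stdout {
    codec => rubydebug
  }
}
```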
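Because Elasticsearch rejects index names that contain uppercase letters, and the index name here is derived from the Kafka topic, the topic naming matters. The sketch below shows what a name of the form topic-YYYY-MM-dd looks like and how an uppercase topic name could be normalized. It is an illustration in shell, not something Logstash runs; the function name index_name and the topic ELK-Nginx-Access are made up, and Logstash itself takes the date from the event timestamp rather than from the system clock.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build "<topic>-<date>" and force it to lower case.
# $1 = topic name, $2 = date in YYYY-MM-dd form.
index_name() {
    printf '%s-%s\n' "$1" "$2" | tr '[:upper:]' '[:lower:]'
}

# Example with today's date from the system clock.
index_name "ELK-Nginx-Access" "$(date +%Y-%m-%d)"
```

The simpler option is to keep topic names lowercase in the Filebeat configuration in the first place, so no normalization is needed.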
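4. Setting up the Elasticsearch cluster:
Before building the ES cluster, let me introduce the roles within an ES cluster and how the work is divided between them.

1. Master nodes:
Their main job is to maintain the cluster metadata and manage the state of every node. Neither indexing nor queries go through the master nodes, so the load on them is relatively light and they can be given less memory. They are nevertheless the most important nodes: if the master goes down or a split brain occurs, the metadata becomes inconsistent and all the data in the cluster may be lost, so the stability of the master nodes must be guaranteed.
2. Data nodes:
They handle queries and indexing, so their load is heavier and they need more memory. Choose higher-spec servers for them where possible (lots of memory, dual-socket CPUs, SSDs, if the budget allows). If a data node fails, a small portion of the data may be lost.
3. Client nodes:
They distribute requests. They also hold the metadata but never modify it. The benefit of client nodes is that they take some load off the data nodes. Why? Because an ES query result is aggregated in two stages: the first stage aggregates results on the data nodes, which send them to the client node; the client node then performs the second aggregation on what it received and returns the final result to the user. The client node takes over the second stage, which relieves the data nodes.
As an example, suppose a very large query (say, over hundreds of millions of records) is thrown at the ES cluster. Without client nodes the whole load lands on the data nodes; if their hardware cannot cope, they may well crash. Once a data node crashes it has to recover and the cluster has to rebalance, and during this recovery the ES cluster may stop serving requests. With client nodes, the request goes to a client node first; if the client node cannot cope, at worst the client node stops, the data nodes are unaffected, and the cluster does not go through recovery.
Our ES cluster consists of 8 machines: 3 master nodes, 2 client nodes, and 3 data nodes. The client nodes are the ones that expose the API to the outside.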
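The two-stage aggregation described for client nodes can be pictured with a toy example: each data node returns its own best hits, and the client node merges those partial lists into the final top N. The sketch below imitates only that merge step with plain text files; it is not how Elasticsearch is implemented, and the function name merge_top_n, the file names, and the scores are invented for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: each file holds one data node's partial result as
# "score doc" lines; print the N highest-scoring lines across all files.
# $1 = N, remaining arguments = partial result files.
merge_top_n() {
    local n="$1"
    shift
    sort -rn "$@" | head -n "$n"
}

# Two data nodes each report their local top hits.
workdir=$(mktemp -d)
printf '9 doc-a\n4 doc-b\n' > "$workdir/data-1"
printf '7 doc-c\n6 doc-d\n' > "$workdir/data-2"

# The client node merges them into the global top 3.
merge_top_n 3 "$workdir/data-1" "$workdir/data-2"
rm -rf "$workdir"
```

The merge itself costs memory and CPU on whichever node performs it, which is the load that dedicated client nodes keep away from the data nodes.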

# Initialization steps for every node; JDK 1.8 must be installed on each of them

cd /software
wget ftp://bqjrftp:Pass123$%^@192.168.20.27:9020/software/ELK_201906/elasticsearch-6.2.4.tar.gz
tar xzvf elasticsearch-6.2.4.tar.gz -C /usr/local
cd /usr/local
ln -sv elasticsearch-6.2.4  elasticsearch

mkdir -p /var/log/elasticsearch
mkdir -p /var/lib/elasticsearch
mkdir -p /var/run/elasticsearch
mkdir -p /data/es-data
useradd  elasticsearch
chown -R elasticsearch:elasticsearch /var/lib/elasticsearch/
chown -R elasticsearch:elasticsearch /var/log/elasticsearch/
chown -R elasticsearch:elasticsearch /var/run/elasticsearch/
chown -R elasticsearch:elasticsearch /data/es-data/
chown -R elasticsearch:elasticsearch /usr/local/elasticsearch

Configuration file for client1:

[root@SZ1PRDELK00AP008 config]# egrep -v "^$|#"  elasticsearch.yml
cluster.name: my-elk
node.name: client-1
node.data: false
node.ingest: false
node.master: false
path.data: /data/es-data
path.logs: /var/log/elasticsearch
network.host: 192.168.1.19
http.port: 9200
discovery.zen.ping.unicast.hosts: ["192.168.1.17", "192.168.1.18", "192.168.1.25"]
discovery.zen.minimum_master_nodes: 2
[root@SZ1PRDELK00AP008 config]#

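# Note that node.data, node.ingest and node.master are all false here, which makes this a client node.
# discovery.zen.minimum_master_nodes: 2 is calculated as (number of master-eligible nodes / 2) + 1, using integer division.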
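The quorum formula can be checked with a one-line calculation. A minimal sketch, assuming only dedicated master-eligible nodes are counted; the function name quorum is made up.

```bash
#!/usr/bin/env bash
# Hypothetical helper: minimum_master_nodes for a given number of
# master-eligible nodes, using integer division.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

quorum 3   # prints 2, the value used in this cluster
quorum 5   # prints 3
```

With 3 master-eligible nodes and a quorum of 2, the cluster can lose one master node and still elect a master, while two isolated halves can never both reach the quorum.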

# Set up the Elasticsearch service script; every node needs it. Note that Elasticsearch can only be started by a non-root user.

#!/bin/bash
#
# elasticsearch <summary>
#
# chkconfig:   2345 80 20
# description: Starts and stops a single elasticsearch instance on this system
#

### BEGIN INIT INFO
# Provides: Elasticsearch
# Required-Start: $network $named
# Required-Stop: $network $named
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: This service manages the elasticsearch daemon
# Description: Elasticsearch is a very scalable, schema-free and high-performance search solution supporting multi-tenancy and near realtime search.
### END INIT INFO

# Print the PIDs of processes owned by the elasticsearch user whose command
# line mentions elasticsearch (the JVM started by start() below).
es_pids() {
    pgrep -u elasticsearch -f elasticsearch
}
start() {
    su - elasticsearch -c "nohup /usr/local/elasticsearch/bin/elasticsearch >/dev/null 2>&1 &"
}
stop() {
    # Look up the PIDs at stop time and send SIGTERM so the node can shut down cleanly.
    pids=$(es_pids)
    if [ -n "$pids" ]; then
        kill $pids
    fi
}
status() {
    if [ -n "$(es_pids)" ]; then
        echo "elasticsearch service is running"
    else
        echo "elasticsearch service is stopped"
    fi
}
case $1 in
start)
  start
;;

stop)
   stop
;;

status)
   status
;;

*)
echo "service accept arguments start|stop|status"
esac

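The first start of the Elasticsearch service fails. Elasticsearch enforces its bootstrap checks once network.host is bound to a non-loopback address (note that #bootstrap.memory_lock: true is still commented out in the configuration file). The errors look roughly like this: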
ERROR: [2] bootstrap checks failed
[1]: max file descriptors [65535] for elasticsearch process is too low, increase to at least [65536]
[2]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

The fix is to apply the following tuning on every ES node:
echo -e "fs.file-max = 65536\nvm.max_map_count = 655360" >> /etc/sysctl.conf
sysctl -p
echo -e "elasticsearch soft nofile 65539\nelasticsearch hard nofile 65539" >> /etc/security/limits.conf
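After changing the limits it is worth confirming that the values seen by the elasticsearch user actually meet the minimums quoted in the error messages (65536 open files and 262144 for vm.max_map_count). The sketch below only compares two numbers; the function name limits_ok is made up.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if both values meet the bootstrap-check minimums.
# $1 = max open files, $2 = vm.max_map_count.
limits_ok() {
    [ "$1" -ge 65536 ] && [ "$2" -ge 262144 ]
}

# Values configured in this article.
if limits_ok 65539 655360; then
    echo "configured limits satisfy the bootstrap checks"
fi

# Possible usage on an ES node in a new login session of the elasticsearch user:
# limits_ok "$(ulimit -n)" "$(sysctl -n vm.max_map_count)" && echo "limits look fine"
```

The limits.conf change only applies to new login sessions, so the check should be run after logging in again as the elasticsearch user.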

Configuration file for client2:

[root@SZ1PRDELK00AP009 config]# egrep -v "^$|#" elasticsearch.yml
cluster.name: my-elk
node.name: client-2
node.data: false
node.ingest: false
node.master: false
path.data: /data/es-data
path.logs: /var/log/elasticsearch
network.host: 192.168.1.24
http.port: 9200
discovery.zen.ping.unicast.hosts: ["192.168.1.17", "192.168.1.18", "192.168.1.25"]
discovery.zen.minimum_master_nodes: 2
[root@SZ1PRDELK00AP009 config]#

Configuration file for master1:

[root@SZ1PRDELK00AP006 config]# egrep -v "^$|#" elasticsearch.yml
cluster.name: my-elk
node.name: master-1
node.master: true
node.data: false
node.ingest: false
path.data: /data/es-data
path.logs: /var/log/elasticsearch
network.host: 192.168.1.17
http.port: 9200
discovery.zen.ping.unicast.hosts: ["192.168.1.17", "192.168.1.18", "192.168.1.25"]
discovery.zen.minimum_master_nodes: 2
[root@SZ1PRDELK00AP006 config]#

# Note the node.master: true setting here

Configuration file for master2:

[root@SZ1PRDELK00AP007 config]# egrep -v "^$|#" elasticsearch.yml
cluster.name: my-elk
node.name: master-2
node.master: true
node.data: false
node.ingest: false
path.data: /data/es-data
path.logs: /var/log/elasticsearch
network.host: 192.168.1.18
http.port: 9200
discovery.zen.ping.unicast.hosts: ["192.168.1.17", "192.168.1.18", "192.168.1.25"]
discovery.zen.minimum_master_nodes: 2
[root@SZ1PRDELK00AP007 config]#

Configuration file for master3:

[root@SZ1PRDELK00AP0010 config]# egrep -v "^$|#" elasticsearch.yml
cluster.name: my-elk
node.name: master-3
node.master: true
node.data: false
node.ingest: false
path.data: /data/es-data
path.logs: /var/log/elasticsearch
network.host: 192.168.1.25
http.port: 9200
discovery.zen.ping.unicast.hosts: ["192.168.1.17", "192.168.1.18", "192.168.1.25"]
discovery.zen.minimum_master_nodes: 2
[root@SZ1PRDELK00AP0010 config]#

Configuration file for data1:

cluster.name: my-elk
node.name: data-1
node.master: false
node.data: true
node.ingest: true
path.data: /data/es-data
path.logs: /var/log/elasticsearch
network.host: 192.168.1.26
http.port: 9200
discovery.zen.ping.unicast.hosts: ["192.168.1.17", "192.168.1.18", "192.168.1.25"]
discovery.zen.minimum_master_nodes: 2

Configuration file for data2:

cluster.name: my-elk
node.name: data-2
node.master: false
node.data: true
node.ingest: true
path.data: /data/es-data
path.logs: /var/log/elasticsearch
network.host: 192.168.1.28
http.port: 9200
discovery.zen.ping.unicast.hosts: ["192.168.1.17", "192.168.1.18", "192.168.1.25"]
discovery.zen.minimum_master_nodes: 2

Configuration file for data3:

cluster.name: my-elk
node.name: data-3
node.master: false
node.data: true
node.ingest: true
path.data: /data/es-data
path.logs: /var/log/elasticsearch
network.host: 192.168.1.29
http.port: 9200
discovery.zen.ping.unicast.hosts: ["192.168.1.17", "192.168.1.18", "192.168.1.25"]
discovery.zen.minimum_master_nodes: 2

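Once the ES cluster is configured, some plugins still need to be installed. Three commonly used ones are covered here: Cerebro for cluster management, head for browsing indices, and X-Pack for access control. Install them as follows:

# Install Cerebro, the Elasticsearch management console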

cd /usr/local
wget ftp://bqjrftp:Pass123$%^@192.168.20.27:9020/software/ELK_201906/cerebro-0.8.3.tgz
tar xzvf cerebro-0.8.3.tgz -C /usr/local
cd /usr/local/cerebro-0.8.3/bin
nohup ./cerebro -Dhttp.port=8888 -Dhttp.address=192.168.1.24 &

# Install the X-Pack plugin:
cd /home/bqadm
wget ftp://bqjrftp:Pass123$%^@192.168.20.27:9020/software/ELK_201906/x-pack-6.2.4.zip
cp -r /home/bqadm/x-pack-6.2.4.zip /home/elasticsearch/
chown -R elasticsearch:elasticsearch /home/elasticsearch/x-pack-6.2.4.zip
/usr/local/elasticsearch/bin/elasticsearch-plugin install file:///home/elasticsearch/x-pack-6.2.4.zip

Installing head for ES 6.2.4 (reference: https://blog.51cto.com/ityunwei2017/2071014):

git clone https://github.com/mobz/elasticsearch-head.git
wget https://nodejs.org/dist/v6.10.2/node-v6.10.2-linux-x64.tar.xz

[root@SZ1PRDELK00AP008 ~]# cat /etc/profile.d/node.sh
NODE_HOME=/home/elasticsearch/node-v6.10.2-linux-x64/bin
PATH=$NODE_HOME:$PATH
export NODE_HOME PATH

source /etc/profile.d/node.sh
npm install
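grunt server  # Starts the head plugin. By default the head instance started this way only works against localhost (port 9100); to access it over the LAN, two files need to be changed

# Add a hostname field in Gruntfile.js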
connect: {
            server: {
                options: {
                    hostname: '192.168.1.19',
                    port: 9100,
                    base: '.',
                    keepalive: true
                }
            }
        }

# In _site/app.js, change the following line, replacing localhost with the IP address
this.base_uri = this.config.base_uri || this.prefs.get("app-base_uri") || "http://192.168.1.19:9200";

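nohup ./grunt server & # Start the head plugin; it can now be accessed

# With Elasticsearch 5 and later, head still cannot reach the cluster over HTTP after installation; add the following settings to elasticsearch.yml: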

http.cors.enabled: true
http.cors.allow-origin: "*"
http.cors.allow-headers: Authorization,X-Requested-With,Content-Length,Content-Type

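That completes the production-grade ELK logging platform, and that is all for today. Work has been extremely busy lately, with all kinds of technology stacks being integrated and rolled out (Kubernetes containers, the ELK logging platform, Jenkins integration, cloud migration, and so on), so I have had no time to write. I can only work late at night once the baby is asleep and nobody interrupts, writing up the experience accumulated at work in the hope that it helps. This is the first post in the ELK logging platform series, to be continued. The second post will focus on X-Pack, including how to keep using X-Pack features beyond the 30-day trial period, SSL-encrypted communication in the ES cluster once X-Pack is enabled, log format standards, and the Filebeat configuration for collecting logs. Please follow my personal WeChat official account “云时代IT运维”.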