ELK (Elasticsearch, Logstash, Kibana) Log Collection and Analysis Setup

The stack is made up of three parts: Elasticsearch, Logstash, and Kibana. Logstash acts as the log aggregation point: through its input, filter, and output stages it collects logs, filters them, and ships them to Elasticsearch (it can also write to files or other destinations).

192.168.32.221: zookeeper, kafka, kafka-eagle, elasticsearch, logstash, kibana, filebeat
192.168.32.222: zookeeper, kafka, elasticsearch
192.168.32.223: zookeeper, kafka, elasticsearch

ZooKeeper installation

First set the hostnames on the three machines:

Run on each of the three machines respectively:
hostnamectl set-hostname zookeeper01
hostnamectl set-hostname zookeeper02
hostnamectl set-hostname zookeeper03
On any one of them, edit the hosts file:
vi /etc/hosts

192.168.32.221 zookeeper01
192.168.32.222 zookeeper02
192.168.32.223 zookeeper03
Then distribute it to the other two machines:
scp /etc/hosts root@192.168.32.222:/etc/
scp /etc/hosts root@192.168.32.223:/etc/
Install the JDK
tar zxvf /usr/local/package/jdk-8u121-linux-x64.tar.gz -C /usr/local/

echo '
JAVA_HOME=/usr/local/jdk1.8.0_121
PATH=$JAVA_HOME/bin:$PATH
export JAVA_HOME PATH
' >>/etc/profile

source /etc/profile
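To confirm the JDK is picked up from the path configured above, a quick check (a minimal sketch, assuming the install path /usr/local/jdk1.8.0_121 used above):

java -version
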
Install ZooKeeper
mkdir -p /data
cd /data
Upload the zookeeper-3.4.14.tar.gz package, extract it into the current directory, and rename the extracted directory to zookeeper:
tar zxvf zookeeper-3.4.14.tar.gz
mv zookeeper-3.4.14 zookeeper
cd zookeeper/conf
cp zoo_sample.cfg zoo.cfg
vi zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/data/zookeeper/data
dataLogDir=/data/zookeeper/logs
clientPort=2181
server.1=zookeeper01:2888:3888
server.2=zookeeper02:2888:3888
server.3=zookeeper03:2888:3888
Distribute the configuration file to the other two machines:
scp /data/zookeeper/conf/zoo.cfg root@192.168.32.222:/data/zookeeper/conf
scp /data/zookeeper/conf/zoo.cfg root@192.168.32.223:/data/zookeeper/conf
Create the directories on all three machines:
cd /data/zookeeper
mkdir data
mkdir logs

Create a myid file in the data directory just created:

echo 1 > /data/zookeeper/data/myid ## first machine
echo 2 > /data/zookeeper/data/myid ## second machine
echo 3 > /data/zookeeper/data/myid ## third machine

Commands:

cd /data/zookeeper/bin
./zkServer.sh start
./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /data/zookeeper/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: follower
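
To verify that the ensemble elected a leader, run ./zkServer.sh status on all three machines: one node should report Mode: leader and the other two Mode: follower. A quick check with the stat four-letter command is sketched below (an assumption: nc is installed; port 2181 comes from zoo.cfg above):

echo stat | nc 127.0.0.1 2181 | grep Mode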

Kafka installation

cd /data
Upload the kafka_2.12-2.3.0.tgz package, extract it into the current directory, and rename the extracted directory to kafka:
tar zxvf kafka_2.12-2.3.0.tgz
mv kafka_2.12-2.3.0 kafka
Modify the configuration file:
vi /data/kafka/config/server.properties

Full configuration file:

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# see kafka.server.KafkaConfig for additional details and defaults
############################# Server Basics #############################
# The id of the broker. This must be set to a unique integer for each broker.

###### broker.id is 1, 2, 3 respectively on the three machines
broker.id=1

############################# Socket Server Settings #############################
# The address the socket server listens on. It will get the value returned from 
# java.net.InetAddress.getCanonicalHostName() if not configured.
#   FORMAT:
#     listeners = listener_name://host_name:port
#   EXAMPLE:
#     listeners = PLAINTEXT://your.host.name:9092

###### set each machine's own IP and port here
listeners=PLAINTEXT://192.168.32.221:9092

# Hostname and port the broker will advertise to producers and consumers. If not set, 
# it uses the value for "listeners" if configured.  Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
#advertised.listeners=PLAINTEXT://your.host.name:9092
# Maps listener names to security protocols, the default is for them to be the same. See the config documentation for more details
#listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
# The number of threads that the server uses for receiving requests from the network and sending responses to the network

num.network.threads=3

# The number of threads that the server uses for processing requests, which may include disk I/O
num.io.threads=8
# The send buffer (SO_SNDBUF) used by the socket server

socket.send.buffer.bytes=102400

# The receive buffer (SO_RCVBUF) used by the socket server

socket.receive.buffer.bytes=102400

# The maximum size of a request that the socket server will accept (protection against OOM)

socket.request.max.bytes=104857600

############################# Log Basics #############################
# A comma separated list of directories under which to store log files

log.dirs=/tmp/kafka-logs

# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.

num.partitions=3

# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.

num.recovery.threads.per.data.dir=1

############################# Internal Topic Settings  #############################
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
# For anything other than development testing, a value greater than 1 is recommended for to ensure availability such as 3.

offsets.topic.replication.factor=1

transaction.state.log.replication.factor=1

transaction.state.log.min.isr=1

############################# Log Flush Policy #############################
# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
#    1. Durability: Unflushed data may be lost if you are not using replication.
#    2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
#    3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.
# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000
# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000
############################# Log Retention Policy #############################
# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.
# The minimum age of a log file to be eligible for deletion due to age

log.retention.hours=168

# A size-based retention policy for logs. Segments are pruned from the log unless the remaining
# segments drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824
# The maximum size of a log segment file. When this size is reached a new log segment will be created.

log.segment.bytes=1073741824

# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies

log.retention.check.interval.ms=300000

############################# Zookeeper #############################
# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.

zookeeper.connect=192.168.32.221:2181,192.168.32.222:2181,192.168.32.223:2181

# Timeout in ms for connecting to zookeeper

zookeeper.connection.timeout.ms=6000

############################# Group Coordinator Settings #############################
# The following configuration specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance.
# The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms.
# The default value for this is 3 seconds.
# We override this to 0 here as it makes for a better out-of-the-box experience for development and testing.
# However, in production environments the default value of 3 seconds is more suitable as this will help to avoid unnecessary, and potentially expensive, rebalances during application startup.

group.initial.rebalance.delay.ms=0

Distribute the configuration file to the other two machines:

cd /data/kafka/config
scp server.properties root@192.168.32.222:/data/kafka/config
scp server.properties root@192.168.32.223:/data/kafka/config

On each of the other two machines, change these settings to that machine's own broker id and IP:
broker.id=2
listeners=PLAINTEXT://192.168.32.222:9092
broker.id=3
listeners=PLAINTEXT://192.168.32.223:9092

Common commands:

cd /data/kafka/config/
Start:
../bin/kafka-server-start.sh -daemon server.properties
View the startup log:
tail -f -n 400 ../logs/server.log
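
Optionally, to confirm the three brokers registered correctly, a test topic can be created and listed (a sketch; the topic name test is arbitrary, and the --zookeeper flag is still supported in Kafka 2.3):

cd /data/kafka
bin/kafka-topics.sh --create --zookeeper 192.168.32.221:2181,192.168.32.222:2181,192.168.32.223:2181 --replication-factor 3 --partitions 3 --topic test
bin/kafka-topics.sh --list --zookeeper 192.168.32.221:2181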

kafka-eagle

cd /data
Extract the kafka-eagle-web-1.3.2-bin.tar.gz package contained in kafka-eagle-bin-1.3.2.tar.gz
mv kafka-eagle-web-1.3.2 kafka-eagle
cd /data/kafka-eagle/conf
vim system-config.properties

Modify the following settings:

kafka.eagle.zk.cluster.alias=cluster1
cluster1.zk.list=192.168.32.221:2181,192.168.32.222:2181,192.168.32.223:2181

kafka.eagle.driver=org.sqlite.JDBC
kafka.eagle.url=jdbc:sqlite:/data/kafka-eagle/db/ke.db
kafka.eagle.username=root
kafka.eagle.password=123456

Configure the environment variables:

vim /etc/profile

export KE_HOME=/data/kafka-eagle
export PATH=$KE_HOME/bin:$PATH

source /etc/profile

Start kafka-eagle:

cd /data/kafka-eagle/bin
chmod u+x ke.sh
./ke.sh start

Web UI address, username, and password:

http://192.168.32.221:8048/ke
http://192.168.32.221:8048/ke/account/signin?/ke/

Username: admin
Password: 123456

Filebeat installation

cd /data
Upload the filebeat-6.8.0-x86_64.rpm package and install it:
rpm -ivh filebeat-6.8.0-x86_64.rpm
Configuration file path:
/etc/filebeat/filebeat.yml
Modify the configuration file:
filebeat.prospectors:
- input_type: log        # input type
  paths:
    - /var/log/nginx/*.log      # path to the log files
  json.keys_under_root: true
  json.add_error_key: true
  json.message_key: log

output.kafka:        # ship the logs to Kafka
  enabled: true       # enable this output
  hosts: ["192.168.32.221:9092","192.168.32.222:9092","192.168.32.223:9092"]
  topic: 'nginx'
systemctl start filebeat
systemctl status filebeat
systemctl stop filebeat
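
To check that Filebeat is actually publishing nginx log lines to the topic, the messages can be tailed with the console consumer (a quick sketch, assuming the Kafka install path used above):

/data/kafka/bin/kafka-console-consumer.sh --bootstrap-server 192.168.32.221:9092 --topic nginx --from-beginning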

Logstash

Upload logstash-6.8.0.tar.gz to /data and extract it:
tar zxvf logstash-6.8.0.tar.gz
mv logstash-6.8.0 logstash
cd logstash
touch logstash.conf
vi logstash.conf

The configuration file contents:

input {
  kafka {                      # read from the Kafka cluster
    type => "nginx_log"
    codec => "json"            # codec used to decode the incoming events
    topics => "nginx"          # the topic defined above
    decorate_events => true    # adds topic, group, partition, etc. metadata to the message
    bootstrap_servers => "192.168.32.221:9092,192.168.32.222:9092,192.168.32.223:9092"  # Kafka cluster IPs and port 9092
  }
}
output {                       # output plugin, sends events to the target
  elasticsearch {              # write to Elasticsearch
    hosts => ["192.168.32.221:9200","192.168.32.222:9200","192.168.32.223:9200"]  # Elasticsearch nodes (ip:port)
    index => "%{type}-%{+YYYY.MM.dd}"  # index name built from the type set in the input
  }
}

Commands:

cd /data/logstash
bin/logstash -f logstash.conf    # append & at the end to run it in the background
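
Once Elasticsearch (set up below) is running and events are flowing, a daily index named nginx_log-YYYY.MM.dd should appear; a quick way to confirm (a sketch, assuming the node on 192.168.32.221):

curl 'http://192.168.32.221:9200/_cat/indices?v'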

Elasticsearch installation

cd /data
Extract elasticsearch-6.8.0.tar.gz into the /data directory:
tar zxvf elasticsearch-6.8.0.tar.gz
mv elasticsearch-6.8.0 elasticsearch

Create a user (Elasticsearch refuses to run as root):

sudo useradd ela_user
sudo passwd ela_user
Grant ownership of the elasticsearch directory to ela_user:
sudo chown -R ela_user:ela_user /data/elasticsearch
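
Elasticsearch 6.x runs bootstrap checks when bound to a non-loopback address, so the kernel and file-descriptor limits below are usually needed as well, and the data/log directories referenced in the yml further down must exist and belong to ela_user (a minimal sketch using the commonly required values):

sysctl -w vm.max_map_count=262144
echo 'vm.max_map_count=262144' >> /etc/sysctl.conf
echo 'ela_user soft nofile 65536' >> /etc/security/limits.conf
echo 'ela_user hard nofile 65536' >> /etc/security/limits.conf
mkdir -p /data/elasticsearch/elkdata /data/elasticsearch/logs
chown -R ela_user:ela_user /data/elasticsearch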

Modify the configuration file:

cd /data/elasticsearch/config
vi elasticsearch.yml
# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#

cluster.name: ELK-server

#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#

node.name: es-1

#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#

path.data: /data/elasticsearch/elkdata

#
## Path to log files:
#

path.logs: /data/elasticsearch/logs

#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#

network.host: 192.168.32.221

#
# Set a custom port for HTTP:
#

http.port: 9200

#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#

discovery.zen.ping.unicast.hosts: ["192.168.32.221", "192.168.32.222","192.168.32.223"]

#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
#
#discovery.zen.minimum_master_nodes: 
#
# For more information, consult the zen discovery module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true

Send the configuration file to the other two machines:
scp /data/elasticsearch/config/elasticsearch.yml root@192.168.32.222:/data/elasticsearch/config/
scp /data/elasticsearch/config/elasticsearch.yml root@192.168.32.223:/data/elasticsearch/config/
On each machine, change these settings to its own values:
node.name: es-2
node.name: es-3
network.host: 192.168.32.222
network.host: 192.168.32.223

Commands:

su ela_user
password: ela_user
cd /data/elasticsearch/bin
./elasticsearch    # add -d to run it as a daemon
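
After all three nodes are started, the cluster state can be checked from any machine; expect "status" : "green" and "number_of_nodes" : 3 (a quick sketch):

curl 'http://192.168.32.221:9200/_cluster/health?pretty'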

kibana

Install the kibana-6.8.0-x86_64.rpm package:
rpm -ivh kibana-6.8.0-x86_64.rpm
vi /etc/kibana/kibana.yml

Modify the configuration file:

server.port: 5601
server.host: "192.168.32.221"
elasticsearch.hosts: ["http://192.168.32.221:9200"]

Commands:

systemctl start kibana
systemctl status kibana
systemctl stop kibana
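
Once Kibana is up, its status endpoint and web UI can be checked; from the UI at http://192.168.32.221:5601, create an index pattern such as nginx_log-* under Management > Index Patterns to browse the collected logs (a sketch):

curl 'http://192.168.32.221:5601/api/status'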
