I. Elastic Stack Project Introduction
⚠️ Warning
1 This is not a beginner tutorial; it assumes working knowledge of Ansible and of each ELK component
2 It only provides an automated deployment solution for an ELK stack
3 It is intended as an exercise, though it can be adapted for production with your own modifications
4 For discussion, send me a private message or leave a comment
1 Servers used by the project
The project uses 5 spot (preemptible) instances purchased on Alibaba Cloud
Hostname | Role |
---|---|
es01 | elasticsearch node ela1 |
es02 | elasticsearch node ela2 |
es03 | elasticsearch node ela3 |
logstash01 | logstash node |
kibana01 | kibana node |
US (Silicon Valley) region
2 vCPU, 4 GB RAM, 2 Mbit/s bandwidth
Roughly 1 CNY per hour
Remember to set an automatic release time, otherwise a forgotten instance keeps accruing charges!
The screenshot below shows 7 instances; only 5 are used here
2 Architecture diagram
3 Official download links for the binary packages used by the Elastic Stack
https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.10.0-linux-x86_64.tar.gz
https://artifacts.elastic.co/downloads/kibana/kibana-7.10.0-linux-x86_64.tar.gz
https://artifacts.elastic.co/downloads/logstash/logstash-7.10.0-linux-x86_64.tar.gz
https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.10.0-linux-x86_64.tar.gz
4 Directory layout of the Elastic Stack project
elk-play/elk/
├── deploy-elasticsearch.yml
├── deploy-kibana.yml
├── deploy-logstash.yml
├── ela
│ ├── elasticsearch.j2
│ ├── jvm.options
│ ├── limits.conf
│ └── sysctl.conf
├── elk-pkg
│ ├── elasticsearch-7.10.0-linux-x86_64.tar.gz
│ ├── kibana-7.10.0-linux-x86_64.tar.gz
│ └── logstash-7.10.0-linux-x86_64.tar.gz
├── elk-vars.yml
├── kibana
│ ├── kibana.service.j2
│ └── kibana.yml
└── logstash
├── logstash.conf
├── logstash.service.j2
└── logstash.yml
4 directories, 16 files
5 Shared variable file for the Elastic Stack project
elk-play/elk/elk-vars.yml
[root@ansible ~]# cat elk-play/elk/elk-vars.yml
# the elasticsearch user
ela_user: ela
# names of the binary packages
ela: elk-pkg/elasticsearch-7.10.0-linux-x86_64.tar.gz
logstash: elk-pkg/logstash-7.10.0-linux-x86_64.tar.gz
kibana: elk-pkg/kibana-7.10.0-linux-x86_64.tar.gz
6 Hostname resolution file used by the project
/etc/hosts
Adjust the IPs to match your own environment
192.168.122.6 es01
192.168.122.106 es02
192.168.122.218 es03
192.168.122.166 logstash01
192.168.122.131 kibana01
7 Inventory file used by the project
elk-play/hosts.ini
[es]
es01 node_name=ela1
es02 node_name=ela2
es03 node_name=ela3
[es:vars]
es_nodes=["ela1", "ela2", "ela3"]
[kibana]
kibana01
[logstash]
logstash01
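Before running any playbooks, it is worth confirming that every hostname in the inventory resolves via /etc/hosts. A quick sketch (the file contents from this section are inlined as strings so the demo is self-contained; point the variables at the real files in practice):

```shell
# Cross-check inventory hostnames against /etc/hosts entries.
inventory='es01
es02
es03
kibana01
logstash01'

hosts_file='192.168.122.6 es01
192.168.122.106 es02
192.168.122.218 es03
192.168.122.166 logstash01
192.168.122.131 kibana01'

missing=0
for h in $inventory; do
  # -w matches whole words, so es0 would not match es01
  if ! printf '%s\n' "$hosts_file" | grep -qw "$h"; then
    echo "missing from /etc/hosts: $h"
    missing=$((missing + 1))
  fi
done
echo "unresolved hosts: $missing"
```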
8 Playbook for distributing the hostname resolution file
elk-play/send-etc-hosts.yml
---
- name: Distribute the local hosts resolution file
  hosts: es,logstash,kibana
  gather_facts: no
  tasks:
    - name: Copy the /etc/hosts file
      copy:
        src: /etc/hosts
        dest: /etc/hosts
...
9 Playbook for distributing the SSH public key
elk-play/send-pubkey.yml
---
- hosts: es,logstash,kibana
  gather_facts: no
  remote_user: root
  vars:
    ansible_ssh_pass: upsa
  tasks:
    - name: Set authorized key taken from file
      authorized_key:
        user: root
        state: present
        key: "{{ lookup('file', '/root/.ssh/id_rsa.pub') }}"
...
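The authorized_key module above is idempotent: it appends the public key to ~/.ssh/authorized_keys only if it is not already there. A minimal sketch of that behavior in shell, using a dummy key string and a temp directory so it is safe to run anywhere:

```shell
# Simulate what authorized_key does: guarded, idempotent append.
# The key below is a placeholder, not a real public key.
pubkey='ssh-rsa AAAAB3NzaC1yc2EDEMO root@ansible'
auth_dir=$(mktemp -d)/.ssh
auth_file="$auth_dir/authorized_keys"

mkdir -p "$auth_dir" && chmod 700 "$auth_dir"
touch "$auth_file" && chmod 600 "$auth_file"

# Run the append twice; the grep guard keeps the file to a single copy.
for i in 1 2; do
  grep -qxF "$pubkey" "$auth_file" || printf '%s\n' "$pubkey" >> "$auth_file"
done
echo "entries: $(grep -cxF "$pubkey" "$auth_file")"
```

Running the loop any number of times leaves exactly one copy of the key, which is why re-running the playbook is harmless.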
II. Deploying the Elasticsearch Cluster
1 Directory layout
[root@vm1 playbook]# tree elk
elk
├── deploy-elasticsearch.yml
├── ela
│ ├── elasticsearch.j2
│ ├── jvm.options
│ ├── limits.conf
│ └── sysctl.conf
├── elk-pkg
│ ├── elasticsearch-7.10.0-linux-x86_64.tar.gz
│ ├── kibana-7.10.0-linux-x86_64.tar.gz
│ └── logstash-7.10.0-linux-x86_64.tar.gz
└── elk-vars.yml
2 Notes:
1 elasticsearch refuses to start as the root user, so a regular user is created: ela
2 The elasticsearch process is started in the background with nohup and stopped via its PID; no systemd unit is used for elasticsearch here
3 elk/deploy-elasticsearch.yml
---
- name: Deploy the elasticsearch cluster
  hosts: es
  gather_facts: no
  remote_user: root
  vars_files:
    - elk-vars.yml
  tasks:
    - name: Create the user
      user:
        name: "{{ ela_user }}"
      tags: deploy
    - name: Copy the local tarball to the remote host and unpack it into the target directory
      ansible.builtin.unarchive:
        src: "{{ ela }}"
        dest: /usr/local/
        owner: "{{ ela_user }}"
        group: "{{ ela_user }}"
        list_files: yes
      register: ret
      tags: deploy
    - name: Create a symlink
      ansible.builtin.file:
        src: /usr/local/{{ ret.files.0 | regex_replace('/.*') }}
        dest: /usr/local/elasticsearch
        state: link
      tags: deploy
    - name: Deploy the configuration file
      template:
        src: ela/elasticsearch.j2
        dest: /usr/local/elasticsearch/config/elasticsearch.yml
        owner: "{{ ela_user }}"
        group: "{{ ela_user }}"
      tags:
        - deploy
        - update-conf
    - name: Deploy the JVM configuration file
      copy:
        src: ela/jvm.options
        dest: /usr/local/elasticsearch/config/jvm.options
        owner: "{{ ela_user }}"
        group: "{{ ela_user }}"
      tags:
        - deploy
        - update-conf
    - name: Deploy the system limits file
      copy:
        src: ela/limits.conf
        dest: /etc/security/limits.conf
      tags: deploy
    - name: Deploy the kernel parameters file
      copy:
        src: ela/sysctl.conf
        dest: /etc/sysctl.conf
      tags: deploy
    - name: Load /etc/sysctl.conf so the kernel parameters take effect
      shell: sysctl -p
      tags: deploy
    - name: Check the process PID
      shell:
        cmd: ./jps | grep '[E]lasticsearch' | cut -d ' ' -f 1
        chdir: /usr/local/elasticsearch/jdk/bin/
      register: ela_pid
      tags:
        - restart
        - update-conf
    #- name: Debug
    #  debug: var=ela_pid
    - name: Stop elasticsearch
      when: ela_pid.stdout
      shell: kill -9 "{{ ela_pid.stdout }}"
      tags:
        - restart
        - update-conf
    - name: Start the service
      # run this command as the ela user
      become: yes
      become_user: "{{ ela_user }}"
      command:
        # argv is a list holding the command to run and its arguments,
        # one per line
        argv:
          - nohup
          - /usr/local/elasticsearch/bin/elasticsearch
          - -p
          - /tmp/elasticsearch.pid
          - -d
      tags:
        - deploy
        - restart
        - update-conf
...
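The symlink task works because `unarchive` with `list_files: yes` registers the archive's file list, whose first entry starts with the top-level directory name (something like `elasticsearch-7.10.0/...`). The `regex_replace('/.*')` filter strips everything from the first slash onward. The same transformation in shell, with the entry value assumed for illustration:

```shell
# First entry of unarchive's list_files (example value)
first_entry='elasticsearch-7.10.0/bin/elasticsearch'
top_dir=${first_entry%%/*}   # drop everything from the first "/" onward
echo "$top_dir"              # elasticsearch-7.10.0
```

This is what lets /usr/local/elasticsearch point at whichever versioned directory the tarball unpacked to, so a version bump only changes the `ela` variable.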
4 elk/ela/elasticsearch.j2
ES cluster configuration template
cluster.name: elk
node.name: {{ node_name }}
node.data: true
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts:
  - es01
  - es02
  - es03
cluster.initial_master_nodes: {{ es_nodes }}
The template above uses variables, which must be defined in the inventory file hosts.ini as follows
[es]
# cluster node names
es01 node_name=ela1
es02 node_name=ela2
es03 node_name=ela3
[es:vars]
# node names eligible for master election
es_nodes=["ela1", "ela2", "ela3"]
5 elk/ela/jvm.options
JVM configuration file for the ES cluster (optional)
[root@vm1 playbook]# cat elk/ela/jvm.options
-Xms1g
-Xmx1g
8-13:-XX:+UseConcMarkSweepGC
8-13:-XX:CMSInitiatingOccupancyFraction=75
8-13:-XX:+UseCMSInitiatingOccupancyOnly
14-:-XX:+UseG1GC
14-:-XX:G1ReservePercent=25
14-:-XX:InitiatingHeapOccupancyPercent=30
-Djava.io.tmpdir=${ES_TMPDIR}
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=data
-XX:ErrorFile=logs/hs_err_pid%p.log
8:-XX:+PrintGCDetails
8:-XX:+PrintGCDateStamps
8:-XX:+PrintTenuringDistribution
8:-XX:+PrintGCApplicationStoppedTime
8:-Xloggc:logs/gc.log
8:-XX:+UseGCLogFileRotation
8:-XX:NumberOfGCLogFiles=32
8:-XX:GCLogFileSize=64m
9-:-Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m
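The numeric prefixes in jvm.options (`8:`, `8-13:`, `14-:`) restrict a flag to certain JVM major versions, which is how one file serves both the CMS flags for JDK 8-13 and the G1 flags for JDK 14+. A sketch of that selection logic in shell, with a shortened options list inlined for the demo:

```shell
jvm_major=14   # pretend the bundled JDK is version 14

opts='-Xms1g
8-13:-XX:+UseConcMarkSweepGC
14-:-XX:+UseG1GC
8:-XX:+PrintGCDetails'

# Succeed when $1 (a range such as "8", "8-13" or "14-") includes
# $jvm_major; an empty range applies to every version.
applies() {
  case $1 in
    '')  return 0 ;;
    *-*) lo=${1%%-*}; hi=${1#*-}; [ -z "$hi" ] && hi=999 ;;
    *)   lo=$1; hi=$1 ;;
  esac
  [ "$jvm_major" -ge "$lo" ] && [ "$jvm_major" -le "$hi" ]
}

selected=''
for line in $opts; do
  case $line in
    [0-9]*:*) range=${line%%:*}; flag=${line#*:} ;;   # versioned option
    *)        range='';          flag=$line ;;         # unconditional option
  esac
  applies "$range" && selected="$selected $flag"
done
selected=${selected# }
echo "$selected"   # -Xms1g -XX:+UseG1GC
```

With `jvm_major=14`, the CMS flags are skipped and the G1 flags are kept, matching what elasticsearch's launcher does with the bundled JDK.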
6 elk/ela/limits.conf
File descriptor limits configuration
root soft nofile 65535
root hard nofile 65535
* soft nofile 65535
* hard nofile 65535
* soft nproc 4096
* hard nproc 4096
7 elk/ela/sysctl.conf
Kernel parameters (memory map count and so on)
Only the settings under the "# elasticsearch" comment are required; the rest are optional
# elasticsearch
# memory map count
vm.max_map_count=262144
# number of TCP retransmission retries while waiting for a response
net.ipv4.tcp_retries2=5
#####################################
vm.swappiness = 0
kernel.sysrq = 1
net.ipv4.neigh.default.gc_stale_time = 120
# see details in https://help.aliyun.com/knowledge_detail/39428.html
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.default.arp_announce = 2
net.ipv4.conf.lo.arp_announce = 2
net.ipv4.conf.all.arp_announce = 2
# see details in https://help.aliyun.com/knowledge_detail/41334.html
net.ipv4.tcp_max_tw_buckets = 5000
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 1024
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_slow_start_after_idle = 0
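Since elasticsearch refuses to bootstrap when vm.max_map_count is too low, it can be useful to verify the setting is present in the file before running `sysctl -p`. A sketch, with the relevant lines inlined as a string (point the variable at the real /etc/sysctl.conf in practice):

```shell
# Extract the vm.max_map_count value from a sysctl.conf-style file.
conf='vm.max_map_count=262144
net.ipv4.tcp_retries2=5
vm.swappiness = 0'

value=$(printf '%s\n' "$conf" | sed -n 's/^vm\.max_map_count *= *//p')
echo "vm.max_map_count=$value"   # vm.max_map_count=262144
```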
8 elk/elk-vars.yml
See Part I, section 5 above
9 Cluster verification commands
9.1 Query a single node in the cluster
curl 'http://es01:9200'
curl 'http://es02:9200'
curl 'http://es03:9200'
[root@ansible ~]# curl 'http://es01:9200'
{
"name" : "ela1",
"cluster_name" : "elk",
"cluster_uuid" : "_1EhSGJfSG-FmmdljfwKcA",
"version" : {
"number" : "7.10.0",
"build_flavor" : "default",
"build_type" : "tar",
"build_hash" : "51e9d6f22758d0374a0f3f5c6e8f3a7997850f96",
"build_date" : "2020-11-09T21:30:33.964949Z",
"build_snapshot" : false,
"lucene_version" : "8.7.0",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "You Know, for Search"
}
9.2 Check the cluster health
curl -XGET 'http://es01:9200/_cluster/health?pretty'
9.3 List node information
curl -X GET 'http://es01:9200/_cat/nodes'
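For scripting these checks, the "status" field of the health response is the part worth extracting. A sketch using a sample response inlined as a string; in practice, pipe `curl -s 'http://es01:9200/_cluster/health'` into the same sed:

```shell
# Sample _cluster/health response (abridged) for a self-contained demo.
health='{"cluster_name":"elk","status":"green","number_of_nodes":3}'

status=$(printf '%s' "$health" | sed 's/.*"status":"\([a-z]*\)".*/\1/')
echo "cluster status: $status"   # cluster status: green
```

A status of green means all primary and replica shards are allocated; yellow means replicas are missing; red means some primaries are unassigned.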
III. Deploying Kibana
1 Directory layout
elk
├── deploy-kibana.yml
├── elk-pkg
│ ├── elasticsearch-7.10.0-linux-x86_64.tar.gz
│ ├── kibana-7.10.0-linux-x86_64.tar.gz
│ └── logstash-7.10.0-linux-x86_64.tar.gz
├── elk-vars.yml
└── kibana
├── kibana.service.j2
└── kibana.yml
2 Notes:
1 Kibana does not allow starting as the root user by default, so the regular user ela is used
2 The Kibana process is managed with the system's own systemd
3 Custom paths are configured for the program's log file and PID file
3 elk/deploy-kibana.yml
---
- name: Deploy Kibana
  hosts: kibana
  gather_facts: no
  remote_user: root
  vars_files:
    - elk-vars.yml
  vars:
    dirs:
      dir_log: /var/log/kibana
      dir_pid: /run/kibana
  tasks:
    - name: create user
      user:
        name: "{{ ela_user }}"
        state: present
      tags: deploy
    - name: create directories
      loop: "{{ dirs | dict2items }}"
      file:
        path: "{{ item.value }}"
        state: directory
        owner: "{{ ela_user }}"
        group: "{{ ela_user }}"
      tags: deploy
    - name: Copy the local tarball to the remote host and unpack it into the target directory
      ansible.builtin.unarchive:
        src: "{{ kibana }}"
        dest: /usr/local/
        owner: "{{ ela_user }}"
        group: "{{ ela_user }}"
        list_files: yes
      register: ret
      tags:
        - deploy
    - name: Create a symlink
      ansible.builtin.file:
        src: /usr/local/{{ ret.files.0 | regex_replace('/.*') }}
        dest: /usr/local/kibana
        state: link
      tags: deploy
    - name: Deploy the configuration file
      template:
        src: kibana/kibana.yml
        dest: /usr/local/kibana/config/kibana.yml
      tags:
        - deploy
        - restart
      notify: restart kibana
    - name: Deploy the systemd unit file
      template:
        src: kibana/kibana.service.j2
        dest: /etc/systemd/system/kibana.service
      tags: deploy
    - name: Stop the service
      systemd:
        name: kibana
        state: stopped
      tags:
        - stop
    - name: Start the service
      systemd:
        name: kibana
        state: started
        daemon_reload: yes
      tags:
        - deploy
  handlers:
    - name: restart kibana
      systemd:
        name: kibana
        state: restarted
      tags: restart
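The `dict2items` filter turns the `dirs` mapping into a list of `{key, value}` pairs, so the create-directories task is equivalent to a loop over the two paths. A sketch of that expansion in shell, against a temp root so it is safe to run anywhere:

```shell
# What the dict2items loop expands to: one mkdir per value in the mapping.
root=$(mktemp -d)
for d in "$root/var/log/kibana" "$root/run/kibana"; do
  mkdir -p "$d"   # file module with state: directory behaves like mkdir -p
done
ls -d "$root/var/log/kibana" "$root/run/kibana"
```

Keeping the paths in one `dirs` mapping means the systemd template below can reference the same values (e.g. `{{ dirs.dir_pid }}`) without repeating them.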
4 elk/kibana/kibana.yml
Kibana configuration file
server.port: 5601
server.host: "0.0.0.0"
elasticsearch.hosts: ["http://es01:9200"]
pid.file: /run/kibana/kibana.pid
logging.dest: /var/log/kibana/kibana.log
i18n.locale: "zh-CN"
5 elk/kibana/kibana.service.j2
Kibana systemd unit template
[Unit]
Description=Kibana
Documentation=https://www.elastic.co
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
User={{ ela_user }}
Group={{ ela_user }}
ExecStart=/usr/local/kibana/bin/kibana
ExecStop=/bin/pkill -F "{{ dirs.dir_pid }}/kibana.pid"
Restart=on-failure
RestartSec=3
StartLimitBurst=3
StartLimitInterval=60
WorkingDirectory=/usr/local/kibana
StandardOutput=journal
StandardError=inherit
[Install]
WantedBy=multi-user.target
6 elk/elk-vars.yml
Variable file
See Part I, section 5 above
IV. Deploying Logstash
1 Directory layout
elk
├── deploy-logstash.yml
├── elk-pkg
│ ├── elasticsearch-7.10.0-linux-x86_64.tar.gz
│ ├── kibana-7.10.0-linux-x86_64.tar.gz
│ └── logstash-7.10.0-linux-x86_64.tar.gz
├── elk-vars.yml
└── logstash
├── logstash.conf
├── logstash.service.j2
└── logstash.yml
2 Notes
1 The program runs as the root user
2 The log directory is set to /var/log/logstash/
3 elk/deploy-logstash.yml
---
- name: Deploy Logstash
  hosts: logstash
  gather_facts: no
  remote_user: root
  vars_files:
    - elk-vars.yml
  tasks:
    - name: Create the log directory
      file:
        path: /var/log/logstash
        state: directory
    - name: Copy the local tarball to the remote host and unpack it into the target directory
      ansible.builtin.unarchive:
        src: "{{ logstash }}"
        dest: /usr/local/
        list_files: yes
      register: ret
      tags: deploy
    - name: Create a symlink
      ansible.builtin.file:
        src: /usr/local/{{ ret.files.0 | regex_replace('/.*') }}
        dest: /usr/local/logstash
        state: link
      tags: deploy
    - name: Deploy the configuration file
      template:
        src: logstash/logstash.yml
        dest: /usr/local/logstash/config/logstash.yml
      tags: deploy
    - name: Deploy the pipeline configuration file
      copy:
        src: logstash/logstash.conf
        dest: /usr/local/logstash/config/logstash-sample.conf
      tags: deploy
    - name: Deploy the systemd unit file
      template:
        src: logstash/logstash.service.j2
        dest: /etc/systemd/system/logstash.service
      tags: deploy
    - name: Start logstash
      systemd:
        name: logstash
        state: started
        daemon_reload: yes
      tags:
        - deploy
    - name: Restart logstash
      systemd:
        name: logstash
        state: restarted
        daemon_reload: yes
      tags:
        - restart
...
4 elk/logstash/logstash.yml
Logstash main configuration file
http.host: "0.0.0.0"
path.logs: /var/log/logstash/
5 elk/logstash/logstash.conf
Logstash pipeline configuration file
This file will be revised later to match the needs of the actual project
# Sample Logstash configuration for creating a simple
# Beats -> Logstash -> Elasticsearch pipeline.
input {
  beats {
    port => 5044
  }
}

output {
  elasticsearch {
    hosts => ["http://es01:9200"]
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
    #user => "elastic"
    #password => "changeme"
  }
}
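The index pattern `%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}` yields one index per beat, version, and day, e.g. `filebeat-7.10.0-2021.03.01`. The `%{+YYYY.MM.dd}` token uses Joda-style date formatting; the strftime equivalent is `%Y.%m.%d`:

```shell
# Build today's index name the way the logstash output would
# (beat name and version are example values).
index="filebeat-7.10.0-$(date +%Y.%m.%d)"
echo "$index"
```

Daily indices like this are what make retention simple: old data is dropped by deleting whole indices rather than by deleting documents.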
6 elk/logstash/logstash.service.j2
Logstash systemd unit template
[Unit]
Description=logstash
[Service]
Type=simple
ExecStart=/usr/local/logstash/bin/logstash "-f" "/usr/local/logstash/config/*.conf"
Restart=always
WorkingDirectory=/
#Nice=19
LimitNOFILE=65535
# When stopping, how long to wait before giving up and sending SIGKILL?
# Keep in mind that SIGKILL on a process can cause data loss.
TimeoutStopSec=infinity
[Install]
WantedBy=multi-user.target
7 elk/elk-vars.yml
Variable file
See Part I, section 5 above