Deploying ELK with Ansible: the Elasticsearch Cluster
Binary packages required by ELK
https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.10.0-linux-x86_64.tar.gz
https://artifacts.elastic.co/downloads/kibana/kibana-7.10.0-linux-x86_64.tar.gz
https://artifacts.elastic.co/downloads/logstash/logstash-7.10.0-linux-x86_64.tar.gz
https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.10.0-linux-x86_64.tar.gz
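Since every component in this project is pinned to 7.10.0 (the version of the tarballs under elk-pkg/), the artifact URLs can be generated with a short loop instead of being maintained by hand. This is only a sketch; the version string and the elk-pkg/ target directory are assumptions taken from the project layout below.

```shell
# Print the artifact URLs for the ELK 7.10.0 tarballs used in this project.
base="https://artifacts.elastic.co/downloads"
for p in elasticsearch/elasticsearch kibana/kibana logstash/logstash beats/filebeat/filebeat; do
  echo "$base/$p-7.10.0-linux-x86_64.tar.gz"
done
```

Piping the output into `wget -P elk-pkg/ -i -` downloads everything into the package directory in one go.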
Directory layout of the whole ELK project
elk-play/elk/
├── deploy-elasticsearch.yml
├── deploy-kibana.yml
├── deploy-logstash.yml
├── ela
│   ├── elasticsearch.j2
│   ├── jvm.options
│   ├── limits.conf
│   └── sysctl.conf
├── elk-pkg
│   ├── elasticsearch-7.10.0-linux-x86_64.tar.gz
│   ├── kibana-7.10.0-linux-x86_64.tar.gz
│   └── logstash-7.10.0-linux-x86_64.tar.gz
├── elk-vars.yml
├── kibana
│   ├── kibana.service.j2
│   └── kibana.yml
└── logstash
    ├── logstash.conf
    ├── logstash.service.j2
    └── logstash.yml
4 directories, 17 files
Variables file shared by the whole ELK project
elk-play/elk/elk-vars.yml
[root@ansible ~]# cat elk-play/elk/elk-vars.yml
# user that runs elasticsearch
ela_user: ela
# paths of the binary tarballs
ela: elk-pkg/elasticsearch-7.10.0-linux-x86_64.tar.gz
logstash: elk-pkg/logstash-7.10.0-linux-x86_64.tar.gz
kibana: elk-pkg/kibana-7.10.0-linux-x86_64.tar.gz
Hostname resolution file used by the whole ELK project
/etc/hosts
Adjust the IPs below to match your own environment.
192.66.66.6 es01
192.66.66.106 es02
192.66.66.218 es03
192.66.66.166 logstash01
192.66.66.131 kibana01
Inventory file (hosts.ini) used by the whole ELK project
[es]
es01 node_name=ela1
es02 node_name=ela2
es03 node_name=ela3
[es:vars]
es_nodes=["ela1", "ela2", "ela3"]
[kibana]
kibana01
[logstash]
logstash01
Playbook that distributes the SSH public key for the whole ELK project
elk-play/send-pubkey.yml
---
- hosts: es,logstash,kibana
  gather_facts: no
  remote_user: root
  vars:
    ansible_ssh_pass: upsa
  tasks:
    - name: Set authorized key taken from file
      authorized_key:
        user: root
        state: present
        key: "{{ lookup('file', '/root/.ssh/id_rsa.pub') }}"
...
Playbook that distributes the hosts file for the whole ELK project
elk-play/send-etc-hosts.yml
---
- name: Push the local hosts name-resolution file
  hosts: es,logstash,kibana
  gather_facts: no
  tasks:
    - name: Copy /etc/hosts
      copy:
        src: /etc/hosts
        dest: /etc/hosts
...
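With the inventory saved as hosts.ini (the filename assumed throughout these notes), the two preparation playbooks are typically run once, in this order: key distribution first (it still authenticates with ansible_ssh_pass), then the hosts file. This is only a sketch of the invocation; the guard makes it a no-op on a machine without ansible or the inventory.

```shell
# Run the preparation playbooks from the elk-play/ directory.
# hosts.ini is an assumed inventory filename; adjust if yours differs.
if command -v ansible-playbook >/dev/null 2>&1 && [ -f hosts.ini ]; then
  ansible-playbook -i hosts.ini send-pubkey.yml
  ansible-playbook -i hosts.ini send-etc-hosts.yml
else
  echo "ansible-playbook or hosts.ini not found; skipping"
fi
```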
Deploying the Elasticsearch cluster
Directory layout
[root@vm1 playbook]# tree elk
elk
├── deploy-elasticsearch.yml
├── ela
│   ├── elasticsearch.j2
│   ├── jvm.options
│   ├── limits.conf
│   └── sysctl.conf
├── elk-pkg
│   ├── elasticsearch-7.10.0-linux-x86_64.tar.gz
│   ├── kibana-7.10.0-linux-x86_64.tar.gz
│   └── logstash-7.10.0-linux-x86_64.tar.gz
└── elk-vars.yml
- Notes
- elasticsearch refuses to start as root by default, so a dedicated regular user is created for it: ela
- elasticsearch itself is started in daemon mode (-d, with a pid file); the systemd units in the tree (kibana.service.j2, logstash.service.j2) manage kibana and logstash
elk/deploy-elasticsearch.yml
---
- name: Deploy the elasticsearch cluster
  hosts: es
  gather_facts: no
  remote_user: root
  vars_files:
    - elk-vars.yml
  tasks:
    - name: Create the service user
      user:
        name: "{{ ela_user }}"
      tags: deploy

    - name: Copy the local tarball to the remote hosts and unpack it
      ansible.builtin.unarchive:
        src: "{{ ela }}"
        dest: /usr/local/
        owner: "{{ ela_user }}"
        group: "{{ ela_user }}"
        list_files: yes
      register: ret
      tags: deploy

    - name: Create a symlink to the unpacked directory
      ansible.builtin.file:
        src: /usr/local/{{ ret.files.0 | regex_replace('/.*') }}
        dest: /usr/local/elasticsearch
        state: link
      tags: deploy

    - name: Render the elasticsearch configuration
      template:
        src: ela/elasticsearch.j2
        dest: /usr/local/elasticsearch/config/elasticsearch.yml
        owner: "{{ ela_user }}"
        group: "{{ ela_user }}"
      tags:
        - deploy
        - update-conf

    - name: Copy the JVM configuration
      copy:
        src: ela/jvm.options
        dest: /usr/local/elasticsearch/config/jvm.options
        owner: "{{ ela_user }}"
        group: "{{ ela_user }}"
      tags:
        - deploy
        - update-conf

    - name: Copy the open-files limits configuration
      copy:
        src: ela/limits.conf
        dest: /etc/security/limits.conf
      tags: deploy

    - name: Copy the kernel parameters configuration
      copy:
        src: ela/sysctl.conf
        dest: /etc/sysctl.conf
      tags: deploy

    - name: Reload /etc/sysctl.conf so the kernel parameters take effect
      shell: sysctl -p
      tags: deploy

    - name: Look up the elasticsearch PID
      shell:
        cmd: ./jps | grep '[E]lasticsearch' | cut -d ' ' -f 1
        chdir: /usr/local/elasticsearch/jdk/bin/
      register: ela_pid
      tags:
        - restart
        - update-conf

    #- name: Debug
    #  debug: var=ela_pid

    - name: Stop elasticsearch
      when: ela_pid.stdout
      shell: kill -9 "{{ ela_pid.stdout }}"
      tags:
        - restart
        - update-conf

    - name: Start the service
      # run this command as the ela user
      become: yes
      become_user: "{{ ela_user }}"
      command:
        # argv is a list holding the command and its arguments,
        # one item per line
        argv:
          - nohup
          - /usr/local/elasticsearch/bin/elasticsearch
          - -p
          - /tmp/elasticsearch.pid
          - -d
      tags:
        - deploy
        - restart
        - update-conf
...
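The tags in the playbook give three entry points: a full first-time deploy, a config push plus restart, and a plain restart. Assuming the inventory file is named hosts.ini and you run from elk-play/, the invocations look like this (guarded so the sketch is inert without ansible):

```shell
# Three entry points defined by the tags in deploy-elasticsearch.yml.
# hosts.ini and the working directory are assumptions from these notes.
if command -v ansible-playbook >/dev/null 2>&1 && [ -f hosts.ini ]; then
  # first-time installation: user, unpack, configs, kernel params, start
  ansible-playbook -i hosts.ini elk/deploy-elasticsearch.yml -t deploy
  # push new configs, then kill and restart the running nodes
  ansible-playbook -i hosts.ini elk/deploy-elasticsearch.yml -t update-conf
  # restart without touching configuration
  ansible-playbook -i hosts.ini elk/deploy-elasticsearch.yml -t restart
else
  echo "ansible-playbook or hosts.ini not found; skipping"
fi
```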
elk/ela/elasticsearch.j2
- ES cluster configuration template
cluster.name: elk
node.name: {{ node_name }}
node.data: true
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts:
- es01
- es02:9300
- es03
cluster.initial_master_nodes: {{ es_nodes }}
The template above uses variables, which must be defined in the
inventory file hosts.ini as follows:
[es]
# per-node cluster node names
es01 node_name=ela1
es02 node_name=ela2
es03 node_name=ela3
[es:vars]
# node names eligible for master election
es_nodes=["ela1", "ela2", "ela3"]
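As a sanity check, for host es01 (node_name=ela1) the template should render to roughly the following elasticsearch.yml; the flow-style list on the last line is how Jinja2 prints the es_nodes list from the inventory.

```yaml
cluster.name: elk
node.name: ela1
node.data: true
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts:
- es01
- es02:9300
- es03
cluster.initial_master_nodes: ['ela1', 'ela2', 'ela3']
```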
elk/ela/jvm.options
JVM options used by the ES cluster (this file is optional)
[root@vm1 playbook]# cat elk/ela/jvm.options
-Xms1g
-Xmx1g
8-13:-XX:+UseConcMarkSweepGC
8-13:-XX:CMSInitiatingOccupancyFraction=75
8-13:-XX:+UseCMSInitiatingOccupancyOnly
14-:-XX:+UseG1GC
14-:-XX:G1ReservePercent=25
14-:-XX:InitiatingHeapOccupancyPercent=30
-Djava.io.tmpdir=${ES_TMPDIR}
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=data
-XX:ErrorFile=logs/hs_err_pid%p.log
8:-XX:+PrintGCDetails
8:-XX:+PrintGCDateStamps
8:-XX:+PrintTenuringDistribution
8:-XX:+PrintGCApplicationStoppedTime
8:-Xloggc:logs/gc.log
8:-XX:+UseGCLogFileRotation
8:-XX:NumberOfGCLogFiles=32
8:-XX:GCLogFileSize=64m
9-:-Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m
elk/ela/limits.conf
System limits for open file descriptors and processes
root soft nofile 65535
root hard nofile 65535
* soft nofile 65535
* hard nofile 65535
* soft nproc 4096
* hard nproc 4096
elk/ela/sysctl.conf
Kernel parameters: memory-map count and related settings.
Only the settings under the
# elasticsearch
comment are required; everything else is optional.
# elasticsearch
# memory-map areas per process
vm.max_map_count=262144
# number of TCP retransmission retries before dropping a connection
net.ipv4.tcp_retries2=5
#####################################
vm.swappiness = 0
kernel.sysrq = 1
net.ipv4.neigh.default.gc_stale_time = 120
# see details in https://help.aliyun.com/knowledge_detail/39428.html
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.default.arp_announce = 2
net.ipv4.conf.lo.arp_announce = 2
net.ipv4.conf.all.arp_announce = 2
# see details in https://help.aliyun.com/knowledge_detail/41334.html
net.ipv4.tcp_max_tw_buckets = 5000
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 1024
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_slow_start_after_idle = 0
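After the playbook runs `sysctl -p` on the nodes, the two required values can be spot-checked with a read-only query. This sketch falls back to a message on systems without sysctl (e.g. minimal containers), and the specific keys are the ones marked required above.

```shell
# Read back the two settings that elasticsearch actually requires.
if command -v sysctl >/dev/null 2>&1; then
  sysctl -n vm.max_map_count 2>/dev/null || echo "vm.max_map_count not set"
  sysctl -n net.ipv4.tcp_retries2 2>/dev/null || echo "net.ipv4.tcp_retries2 not set"
else
  echo "sysctl not available"
fi
```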
Cluster verification commands
Query an individual node in the cluster:
curl 'http://es01:9200'
curl 'http://es02:9200'
curl 'http://es03:9200'
[root@ansible ~]# curl 'http://es01:9200'
{
"name" : "ela1",
"cluster_name" : "elk",
"cluster_uuid" : "_1EhSGJfSG-FmmdljfwKcA",
"version" : {
"number" : "7.10.0",
"build_flavor" : "default",
"build_type" : "tar",
"build_hash" : "51e9d6f22758d0374a0f3f5c6e8f3a7997850f96",
"build_date" : "2020-11-09T21:30:33.964949Z",
"build_snapshot" : false,
"lucene_version" : "8.7.0",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "You Know, for Search"
}
Check the cluster health status:
curl -XGET 'http://es01:9200/_cluster/health?pretty'
List node information:
curl -X GET 'http://es01:9200/_cat/nodes'
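The per-node checks can be looped over all three hosts; grepping for the `"status"` line pulls out the one field that matters (green/yellow/red). The node names come from the /etc/hosts entries earlier, and the fallback message keeps the loop going while nodes are still coming up.

```shell
# Query cluster health on every node; print the status line or a warning.
for n in es01 es02 es03; do
  curl -s --max-time 5 "http://$n:9200/_cluster/health?pretty" \
    | grep '"status"' || echo "$n: unreachable or no status"
done
```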