A MySQL High-Availability Cluster with Dual-VIP, GTID-Based Semi-Synchronous Master-Slave Replication
I. Project Preparation
- Draw the network topology diagram.
- Prepare 9 machines with CentOS installed and assign their IP addresses:

| Machine | IP address |
| --- | --- |
| master | 192.168.10.130 |
| slave | 192.168.10.136 |
| slave2 | 192.168.10.137 |
| delay_backup | 192.168.10.138 |
| ansible control node | 192.168.10.140 |
| mysqlrouter1 | 192.168.10.141 |
| mysqlrouter2 | 192.168.10.142 |
| test | 192.168.10.180 |
| prometheus | 192.168.10.188 |

- The first 4 machines are MySQL servers forming the replication cluster: one master, two slaves, and one delayed-backup server. The delayed-backup server can also double as an off-site backup server: data is exported from the master or a slave and rsync'd over to it.
- 2 MySQL Router servers, with keepalived installed, provide a highly available read/write-splitting service;
- 1 ansible control node for batch management of all the servers in the MySQL cluster.
- 1 test machine for load testing.
- 1 prometheus monitoring server.
II. Project Steps
1. Deploy the MySQL cluster with Ansible
(1) Install ansible
[root@ansible ~]# yum install epel-release -y
[root@ansible ~]# yum install ansible -y
(2) Set up passwordless SSH to every MySQL node
[root@ansible ~]# vi /etc/ansible/hosts
[db]
192.168.10.130
192.168.10.136
192.168.10.137
192.168.10.138
[dbslaves]
192.168.10.136
192.168.10.137
192.168.10.138
[root@ansible ~]# ssh-keygen -t rsa
[root@ansible ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.10.130
[root@ansible ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.10.136
[root@ansible ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.10.137
[root@ansible ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.10.138
- Test the passwordless login:
[root@ansible ~]# ssh 'root@192.168.10.130'
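Typing each ssh-copy-id by hand scales badly once the cluster grows. A minimal sketch of looping the key distribution over the [db] group parsed straight out of the INI inventory; the `inventory_group` helper is hypothetical, and the sample file here merely mirrors the /etc/ansible/hosts shown above (the real ssh-copy-id call is left commented out, since it needs the live hosts):

```shell
# Extract the member IPs of one group from an Ansible INI inventory.
# Usage: inventory_group <group> <file>
inventory_group() {
    awk -v g="[$1]" '
        $0 == g        { in_group = 1; next }   # entered the wanted group
        /^\[/          { in_group = 0 }         # any other header ends it
        in_group && NF { print $1 }             # non-empty lines are hosts
    ' "$2"
}

# Sample inventory mirroring /etc/ansible/hosts above
cat > /tmp/hosts.ini <<'EOF'
[db]
192.168.10.130
192.168.10.136
192.168.10.137
192.168.10.138
[dbslaves]
192.168.10.136
192.168.10.137
192.168.10.138
EOF

# Loop the key copy over every db host
for ip in $(inventory_group db /tmp/hosts.ini); do
    echo "would run: ssh-copy-id -i /root/.ssh/id_rsa.pub root@$ip"
    # ssh-copy-id -i /root/.ssh/id_rsa.pub "root@$ip"
done
```

The same helper can later feed any per-host loop (scp, health checks) without re-listing the IPs.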
(3) Deploy the MySQL cluster with ansible
# Write a one-shot MySQL install script:
[root@ansible ansible]# cat onekey_install_mysql_binary.sh
#!/bin/bash
# Resolve software dependencies
yum install cmake ncurses-devel gcc gcc-c++ vim lsof bzip2 openssl-devel ncurses-compat-libs -y
# /mysql already exists: ansible copied the tarball into it
cd /mysql
# Unpack the MySQL binary tarball
tar xf mysql-5.7.35-linux-glibc2.12-x86_64.tar.gz
# Move the unpacked tree to /usr/local and rename it to mysql
mv mysql-5.7.35-linux-glibc2.12-x86_64 /usr/local/mysql
# Create the mysql group and user
groupadd mysql
# The mysql user gets /bin/false as its shell and belongs to the mysql group
useradd -r -g mysql -s /bin/false mysql
# Stop the firewalld service and keep it from starting at boot
service firewalld stop
systemctl disable firewalld
# Disable SELinux for the current boot
setenforce 0
# Disable SELinux permanently
sed -i '/^SELINUX=/ s/enforcing/disabled/' /etc/selinux/config
# Create the data directory
mkdir /data/mysql -p
# Hand /data/mysql to the mysql user and group so mysqld can read and write it
chown mysql:mysql /data/mysql/
# Only the mysql user and group may access it; everyone else is locked out
chmod 750 /data/mysql/
# Enter /usr/local/mysql/bin
cd /usr/local/mysql/bin/
# Initialize mysql, capturing the output (it contains the temporary root password)
./mysqld --initialize --user=mysql --basedir=/usr/local/mysql/ --datadir=/data/mysql &>passwd.txt
# Generate the files MySQL needs for SSL logins
./mysql_ssl_rsa_setup --datadir=/data/mysql/
# Grab the temporary password
tem_passwd=$(cat passwd.txt |grep "temporary"|awk '{print $NF}')
# $NF is the last field of the line
# abc=$(command) runs the command first, then assigns its output to abc
# Add the mysql bin directory to PATH
# For the current session:
export PATH=/usr/local/mysql/bin/:$PATH
# And permanently, so it survives a reboot:
echo 'PATH=/usr/local/mysql/bin:$PATH' >>/root/.bashrc
# Install support-files/mysql.server as /etc/init.d/mysqld
cp ../support-files/mysql.server /etc/init.d/mysqld
# Point the datadir variable in /etc/init.d/mysqld at our data directory
sed -i '70c datadir=/data/mysql' /etc/init.d/mysqld
# Generate /etc/my.cnf
cat >/etc/my.cnf <<EOF
[mysqld_safe]
[client]
socket=/data/mysql/mysql.sock
[mysqld]
socket=/data/mysql/mysql.sock
port = 3306
open_files_limit = 8192
innodb_buffer_pool_size = 512M
character-set-server=utf8
[mysql]
auto-rehash
prompt=\\u@\\d \\R:\\m mysql>
EOF
# Raise the kernel open-files limit for this session
ulimit -n 1000000
# And make it apply on every boot
echo "ulimit -n 1000000" >>/etc/rc.local
chmod +x /etc/rc.d/rc.local
# Start the mysqld service
service mysqld start
# Register mysqld with the system service manager
/sbin/chkconfig --add mysqld
# Start mysqld automatically at boot
/sbin/chkconfig mysqld on
# The first password change needs the --connect-expired-password option
# -e runs the given statement inside mysql (execute)
# set password='Sanchuang123#'; changes the root password to Sanchuang123#
mysql -uroot -p$tem_passwd --connect-expired-password -e "set password='Sanchuang123#';"
# Verify the password change: if this lists the databases, it worked
mysql -uroot -p'Sanchuang123#' -e "show databases;"
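The trickiest line in the script is pulling the temporary password out of mysqld's --initialize output. The grep/awk pipeline can be checked in isolation against a canned log line (the password below is invented for illustration):

```shell
# A line of the form mysqld prints into passwd.txt during --initialize
sample='2023-04-08T10:00:00.000000Z 1 [Note] A temporary password is generated for root@localhost: Xy#3kLm9qPd'

# The same extraction the install script uses:
# grep the relevant line, then take its last whitespace-separated field ($NF)
tem_passwd=$(echo "$sample" | grep "temporary" | awk '{print $NF}')
echo "$tem_passwd"
```

Because the password is always the final token on that line, `awk '{print $NF}'` is robust to the timestamp and note prefix changing width.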
# Copy the MySQL tarball to every node with ansible.
[root@ansible ansible]# ansible db -m copy -a 'src=/ansible/mysql-5.7.35-linux-glibc2.12-x86_64.tar.gz dest=/mysql/'
# Write a playbook that runs the install script everywhere.
[root@ansible ansible]# cat onekey_install_mysql.yaml
---
- name: install mysql
  hosts: db
  gather_facts: yes
  tasks:
    - name: run the install script
      script: /ansible/onekey_install_mysql_binary.sh
[root@ansible ansible]# ansible-playbook onekey_install_mysql.yaml
# MySQL is now installed on every node of the cluster.
2. Configure semi-synchronous replication and enable GTID
(1) Bring the data into a consistent state
- Export the data on the master:
[root@ln-master ~]# mysqldump -uroot -p'Sanchuang123#' --all-databases >/backup/all_db.sql
mysqldump: [Warning] Using a password on the command line interface can be insecure.
- Copy the dump file to the other servers (slave, slave2, delay_backup):
scp all_db.sql 192.168.10.136:/root
scp all_db.sql 192.168.10.137:/root
scp all_db.sql 192.168.10.138:/root
- Import the data on the slaves and the backup server:
[root@ln-slave ~]# mysql -uroot -p'Sanchuang123#' <all_db.sql
mysql: [Warning] Using a password on the command line interface can be insecure.
(2) Configure replication
- Install the semi-sync master plugin on the master:
mysql>install plugin rpl_semi_sync_master SONAME 'semisync_master.so';
- On the master, make the settings permanent in the config file, then restart mysqld:
[root@master ~]# vi /etc/my.cnf
[mysqld]
# enable the binary log
log_bin
server_id=1
# semi-synchronous replication
rpl_semi_sync_master_enabled=1
rpl_semi_sync_master_timeout=1000   # 1 second
# GTID
gtid-mode=ON
enforce-gtid-consistency=ON

service mysqld restart
- Optionally wipe the master's binary logs and replication state:
mysql>reset master;
- Install the semi-sync plugin on the 3 slaves (note: the slave plugin this time):
mysql>install plugin rpl_semi_sync_slave SONAME 'semisync_slave.so';
- Edit the config file on the 3 slaves:
[mysqld]
# binary log
log_bin
server_id=2   # note: every slave needs a different server_id
expire_logs_days=15
# enable GTID
gtid-mode=ON
enforce-gtid-consistency=ON
log_slave_updates=ON
# enable semi-sync; the plugin must already be installed
rpl_semi_sync_slave_enabled=1
- Restart the service:
service mysqld restart
- Clear any previous replication state on the slaves:
mysql> reset slave all;
- Verify on the master and slaves that semi-sync is active:
mysql>select plugin_name,plugin_status from information_schema.plugins where plugin_name like '%semi%';
+----------------------+---------------+
| plugin_name          | plugin_status |
+----------------------+---------------+
| rpl_semi_sync_master | ACTIVE        |
| rpl_semi_sync_slave  | ACTIVE        |
+----------------------+---------------+
2 rows in set (0.01 sec)
- Create the replication user on the master:
mysql>grant replication slave on *.* to 'liuna'@'192.168.10.%' identified by '123456';
mysql> flush privileges;
- Point the slaves at the master (skip the delayed-backup server for now):
mysql> CHANGE MASTER TO MASTER_HOST='192.168.10.130',
    MASTER_USER='liuna',
    MASTER_PASSWORD='123456',
    MASTER_PORT=3306,
    master_auto_position=1;
mysql> start slave;
- Check the status (make sure both the IO thread and the SQL thread are running):
mysql> show master status;
mysql> show slave status\G
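Eyeballing the \G output for the two Yes values is easy to get wrong in a hurry. A small parser sketch, fed here with a canned excerpt since the real command needs a live slave; `replication_ok` is a hypothetical helper, and in production its input would come from `mysql -e 'show slave status\G'`:

```shell
# Canned excerpt of `show slave status\G` on a healthy slave
slave_status='
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
        Seconds_Behind_Master: 0
'

replication_ok() {
    # Both Slave_IO_Running and Slave_SQL_Running must report Yes;
    # exactly 2 matches means replication is healthy
    up=$(echo "$1" | grep -cE 'Slave_(IO|SQL)_Running: Yes')
    [ "$up" -eq 2 ]
}

if replication_ok "$slave_status"; then
    echo "replication healthy"
else
    echo "replication broken"
fi
```

Wrapped in a cron job or an exporter, the same check turns a manual step into an alert.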
3. Configure delayed replication on delay_backup
- Replicate from slave2 (192.168.10.137) rather than the master, using the same replication user, and add the delay:
mysql> CHANGE MASTER TO MASTER_HOST='192.168.10.137',
    MASTER_USER='liuna',
    MASTER_PASSWORD='123456',
    MASTER_PORT=3306,
    master_auto_position=1;
mysql>change master to master_delay=10;
mysql> start slave;
4. Scheduled automatic backups on the master
- Set up passwordless SSH from the master to the ansible host:
[root@ln-master ~]# ssh-keygen -t rsa
[root@ln-master ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.10.140
- Write the backup script on the master and schedule it, shipping the dump to the ansible host:
[root@ln-master backup]# vi backup_db.sh
#!/bin/bash
# dump all databases
mkdir -p /backup
mysqldump -uroot -p'Sanchuang123#' --all-databases >/backup/$(date +%F)_all_db.sql
# scp the dump to the remote backup host
scp /backup/$(date +%F)_all_db.sql root@192.168.10.140:/backup

[root@ln-master backup]# crontab -e
30 2 * * * bash /backup/backup_db.sh
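Dated dumps accumulate forever unless something prunes them. A retention sketch that keeps the /backup directory bounded; the 15-day window is an assumption chosen to mirror expire_logs_days above, and everything runs against a scratch directory so it is safe to try:

```shell
# Keep only the newest 15 days of dated dumps.
backup_dir=$(mktemp -d)

# Simulate one dump per day for the last 20 days
for i in $(seq 0 19); do
    day=$(date -d "-$i day" +%F)
    touch -d "-$i day" "$backup_dir/${day}_all_db.sql"
done

# Delete dumps whose mtime is more than 15 full days old
find "$backup_dir" -name '*_all_db.sql' -mtime +15 -delete

remaining=$(ls "$backup_dir" | wc -l)
echo "$remaining dumps kept"
```

Appending the find line to backup_db.sh (against the real /backup) makes the cron job self-cleaning.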
5. Read/write splitting with MySQL Router
- Upload the mysql-router package, or download it from the official site.
- Prepare two fresh Linux servers and install mysqlrouter on both:
mysqlrouter1: 192.168.10.141
mysqlrouter2: 192.168.10.142
rpm -ivh mysql-router-community-8.0.21-1.el7.x86_64.rpm
- Edit the config file:
[root@mysql_router1 ~]# cd /etc/mysqlrouter
[root@mysql_router1 mysqlrouter]# ls
mysqlrouter.conf
[root@mysql_router1 mysqlrouter]# vi mysqlrouter.conf

# mysqlrouter1
[DEFAULT]
config_folder = /etc/mysqlrouter
logging_folder = /usr/local/mysqlrouter/log
runtime_folder = /var/run/mysqlrouter
[logger]
level = INFO
# read-only route
[routing:slaves]
bind_address = 0.0.0.0:7001
# the slaves
destinations = 192.168.10.136:3306,192.168.10.137:3306
mode = read-only
connect_timeout = 1
# read-write route
[routing:masters]
bind_address = 0.0.0.0:7002
# the master
destinations = 192.168.10.130:3306
mode = read-write
connect_timeout = 2

# mysqlrouter2 uses the same routing sections
# read-only route
[routing:slaves]
bind_address = 0.0.0.0:7001
# the slaves
destinations = 192.168.10.136:3306,192.168.10.137:3306
mode = read-only
connect_timeout = 1
# read-write route
[routing:masters]
bind_address = 0.0.0.0:7002
# the master
destinations = 192.168.10.130:3306
mode = read-write
connect_timeout = 2
Why is the router's bind_address 0.0.0.0?
- Because keepalived brings up a virtual IP. When one mysqlrouter server becomes master, the VIP floats onto it, so that machine then holds two IP addresses.
- If the config file bound only the router's own address, connections arriving on the keepalived VIP would never reach the router, and the VIP would be useless.
- Start the MySQL Router service (before starting, make sure master-slave replication is working on the backend MySQL servers).
service mysqlrouter start
- On the master, create two accounts: one for reads and one for writes.
mysql>grant all on *.* to 'write'@'%' identified by '123456';
mysql>grant select on *.* to 'read'@'%' identified by '123456';
- Verify the read/write split from a client using the two test accounts.
6. Dual VIPs with keepalived
(1) Install keepalived
- Install keepalived on both MySQL Router servers:
yum install keepalived -y
- Edit the config file:
[root@mysql_router1 mysqlrouter]# cd /etc/keepalived
[root@mysql_router1 keepalived]# ls
keepalived.conf
[root@mysql_router1 keepalived]# vi keepalived.conf

! Configuration File for keepalived
global_defs {
   notification_email {
     acassen@firewall.loc
     failover@firewall.loc
     sysadmin@firewall.loc
   }
   notification_email_from Alexandre.Cassen@firewall.loc
   smtp_server 192.168.200.1
   smtp_connect_timeout 30
   router_id LVS_DEVEL
   vrrp_skip_check_adv_addr
   #vrrp_strict                 # comment this out
   vrrp_garp_interval 0
   vrrp_gna_interval 0
}
vrrp_instance VI_1 {            # define a VRRP instance
    state MASTER                # router1 is MASTER, router2 is BACKUP
    interface ens33             # interface the VIP is bound to
    virtual_router_id 27        # 0~255; must be the same on both routers
    priority 200                # 0~255; router2 gets 100
    advert_int 1                # advertisement interval: 1 second
    authentication {
        auth_type PASS          # password authentication
        auth_pass 1111          # the password itself
    }
    virtual_ipaddress {
        192.168.2.188           # the VIP
    }
}
# delete everything after this point
- Start the service (firewalld must stay disabled):
systemctl disable firewalld
service keepalived start
- Check `ip add` for the VIP. Normally only the master holds the VIP; if it also shows up on the backup machine, you have split-brain.
Split-brain
(1) Causes:
1. Mismatched VRIDs (virtual router IDs)
2. Network problems: a firewall in between blocks the VRRP advertisements used in the election
3. Mismatched authentication passwords
(2) Is it harmful?
By itself, usually not: the service stays reachable, and you even get some incidental load balancing.
Recovery from split-brain does have an impact, though: there is a brief interruption that affects the service.
(3) Verify VIP failover
- Stop keepalived on the master:
service keepalived stop
- Check on the backup server that the VIP has appeared:
ip add
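The same check can be scripted, so a cron job or monitor can alert both on failover and on split-brain (the VIP reported by both routers at once). The `has_vip` helper is hypothetical, and the `ip add` excerpt below is canned; on a real router the input would come from `ip add` itself:

```shell
vip=192.168.2.188

# Excerpt of what `ip add` prints on the router currently holding the VIP
ip_output='
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
    inet 192.168.10.141/24 brd 192.168.10.255 scope global ens33
    inet 192.168.2.188/32 scope global ens33
'

has_vip() {
    # -w requires a word boundary, so 192.168.2.1881 would not match
    echo "$2" | grep -qw "inet $1"
}

if has_vip "$vip" "$ip_output"; then
    echo "this node holds the VIP"
else
    echo "VIP not present here"
fi
```

Running it on both routers and raising an alarm when both (or neither) report the VIP covers split-brain and total failure in one test.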
(2) Dual-VIP setup
Idea: run two VRRP instances with two VIPs; each router is MASTER for one instance and BACKUP for the other.
On mysqlrouter1:
vrrp_instance VI_1 {
    state MASTER
    interface ens33
    virtual_router_id 27
    priority 200
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.2.188
    }
}
vrrp_instance VI_2 {
    state BACKUP
    interface ens33
    virtual_router_id 26
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.2.186
    }
}
On mysqlrouter2 (note: for each VIP, virtual_router_id must match the instance on router1 that carries the same VIP):
vrrp_instance VI_1 {
    state BACKUP
    interface ens33
    virtual_router_id 27
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.2.188
    }
}
vrrp_instance VI_2 {
    state MASTER
    interface ens33
    virtual_router_id 26
    priority 200
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.2.186
    }
}
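A classic pitfall in the two files above is pairing the same VIP with different virtual_router_id values on the two routers, which produces permanent split-brain on that VIP. A validation sketch that extracts (vrid, vip) pairs and compares them across peers; the `pairs` helper is hypothetical, and the two fragments below are samples showing the correct pairing:

```shell
# Print "vrid vip" pairs from a keepalived config
pairs() {
    awk '/virtual_router_id/ { vrid = $2 }
         /^ *192\./          { print vrid, $1 }' "$1"
}

cat > /tmp/r1.conf <<'EOF'
vrrp_instance VI_1 {
    virtual_router_id 27
    virtual_ipaddress {
        192.168.2.188
    }
}
vrrp_instance VI_2 {
    virtual_router_id 26
    virtual_ipaddress {
        192.168.2.186
    }
}
EOF
cat > /tmp/r2.conf <<'EOF'
vrrp_instance VI_1 {
    virtual_router_id 27
    virtual_ipaddress {
        192.168.2.188
    }
}
vrrp_instance VI_2 {
    virtual_router_id 26
    virtual_ipaddress {
        192.168.2.186
    }
}
EOF

# Identical pair sets on both routers => no mismatched VRID
if [ "$(pairs /tmp/r1.conf | sort)" = "$(pairs /tmp/r2.conf | sort)" ]; then
    echo "vrid/vip pairs consistent"
else
    echo "MISMATCH: same VIP with different virtual_router_id"
fi
```

Pointing the two file arguments at the real configs (e.g. fetched over SSH) makes this a cheap pre-deploy check.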
7. Load testing
Tool: sysbench
https://www.cnblogs.com/f-ck-need-u/p/9279703.html
(1) Installation
Download MySQL's yum repo file from https://dev.mysql.com/downloads/file/?id=513590 (it resolves the yum dependency problems):
[root@liuna ~]# rpm -ivh mysql80-community-release-el7-7.noarch.rpm
warning: mysql80-community-release-el7-7.noarch.rpm: Header V4 RSA/SHA256 Signature, key ID 3a79bd29: NOKEY
Preparing...                          ################################# [100%]
Updating / installing...
   1:mysql80-community-release-el7-7  ################################# [100%]
[root@liuna ~]# cd /etc/yum.repos.d
[root@liuna yum.repos.d]# ls
CentOS-Base.repo CentOS-Sources.repo mysql-community-debuginfo.repo
CentOS-CR.repo CentOS-Vault.repo mysql-community.repo
CentOS-Debuginfo.repo CentOS-x86_64-kernel.repo mysql-community-source.repo
CentOS-fasttrack.repo epel.repo
CentOS-Media.repo epel-testing.repo
[root@liuna ~]# wget https://github.com/akopytov/sysbench/archive/1.0.15.tar.gz
- Installing sysbench from the epel-release repo is the recommended route:
yum install epel-release -y
yum install sysbench -y
(2) Preparing test data
- First connect through the router's write port and create the database sysbench expects: sbtest (sysbench's default schema name; the test database must exist).
[root@ln-slave ~]# mysql -uwrite -p'123456' -h 192.168.10.200 -P 7002
write@(none) 16:27 mysql>create database sbtest;
Query OK, 1 row affected (0.00 sec)
- If sysbench was compiled from source (note: prepare writes data, so it must go through the read-write port 7002):
sysbench --mysql-host=192.168.10.200 --mysql-port=7002 --mysql-user=write --mysql-password='123456' /root/sysbench/sysbench-1.0.15/src/lua/oltp_common.lua --tables=10 --table_size=10000 prepare
- If installed via yum:
sysbench --mysql-host=192.168.10.200 --mysql-port=7002 --mysql-user=write --mysql-password='123456' /usr/share/sysbench/oltp_common.lua --tables=10 --table_size=10000 prepare
Here --tables=10 creates 10 test tables, --table_size=10000 inserts 10,000 rows into each table, and prepare marks this as the data-preparation phase.
(3) Running the test and reading the results
sysbench --threads=4 --time=20 --report-interval=5 --mysql-host=192.168.10.200 --mysql-port=7002 --mysql-user=write --mysql-password=123456 /usr/share/sysbench/oltp_read_write.lua --tables=10 --table_size=10000 run
The test returns the following:
sysbench 1.0.17 (using system LuaJIT 2.0.4)
Running the test with following options:
Number of threads: 4
Report intermediate results every 5 second(s)
Initializing random number generator from current time
Initializing worker threads...
Threads started!
#### Below, one report line every 5 seconds; the columns are:
#### thread count, tps (transactions/s), qps (queries/s),
#### reads/writes/other per second, latency, errors/s, reconnects/s
[ 5s ] thds: 4 tps: 158.12 qps: 3169.72 (r/w/o: 2220.02/632.66/317.03) lat (ms,95%): 31.94 err/s: 0.00 reconn/s: 0.00
[ 10s ] thds: 4 tps: 149.04 qps: 2986.42 (r/w/o: 2089.98/598.36/298.08) lat (ms,95%): 34.33 err/s: 0.00 reconn/s: 0.00
[ 15s ] thds: 4 tps: 135.78 qps: 2714.06 (r/w/o: 1900.56/541.93/271.57) lat (ms,95%): 48.34 err/s: 0.00 reconn/s: 0.00
[ 20s ] thds: 4 tps: 152.03 qps: 3037.53 (r/w/o: 2126.57/606.90/304.05) lat (ms,95%): 34.95 err/s: 0.00 reconn/s: 0.00
SQL statistics:
    queries performed:
        read:                            41706    # number of read operations
        write:                           11916    # number of write operations
        other:                           5958     # number of other operations
        total:                           59580
    transactions:                        2979   (148.40 per sec.)    # average transaction rate
    queries:                             59580  (2967.93 per sec.)   # average queries per second
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)
General statistics:
    total time:                          20.0738s    # total elapsed time
    total number of events:              2979        # total requests (reads, writes, other)
Latency (ms):
         min:                                   15.90
         avg:                                   26.89
         max:                                   75.17
         95th percentile:                       38.25    # 95th-percentile latency from sampling
         sum:                                80103.21
Threads fairness:
    events (avg/stddev):           744.7500/0.43
    execution time (avg/stddev):   20.0258/0.02
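The summary numbers are internally consistent and can be re-derived from the raw counts, which is a handy sanity check when comparing runs:

```shell
# Re-derive sysbench's per-second rates from the raw totals in the report
transactions=2979
queries=59580
elapsed=20.0738

tps=$(awk -v t="$transactions" -v s="$elapsed" 'BEGIN { printf "%.2f", t/s }')
qps=$(awk -v q="$queries" -v s="$elapsed" 'BEGIN { printf "%.2f", q/s }')
# tps matches the reported 148.40 exactly; qps lands within 0.1% of the
# reported 2967.93 (sysbench times the run slightly differently internally)
echo "tps=$tps qps=$qps"

# Each oltp_read_write transaction is 20 queries: 14 reads + 4 writes + 2 others
echo "queries per transaction: $((queries / transactions))"
```

The 14/4/2 split (41706/11916/5958 divided by 2979 transactions) is the default oltp_read_write mix: point and range selects, index/non-index updates plus delete/insert, and BEGIN/COMMIT.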
(4) CPU / I/O / memory tests
sysbench ships with several built-in tests:
Compiled-in tests:
fileio - File I/O test
cpu - CPU performance test
memory - Memory functions speed test
threads - Threads subsystem performance test
mutex - Mutex performance test
- Create 5 files totalling 1 GB, roughly 200 MB each:
[root@localhost sysbench]# sysbench fileio --file-num=5 --file-total-size=1G prepare
sysbench 1.0.17 (using system LuaJIT 2.0.4)
5 files, 209715Kb each, 1023Mb total
Creating files for the test...
Extra file open flags: (none)
Creating file test_file.0
Creating file test_file.1
Creating file test_file.2
Creating file test_file.3
Creating file test_file.4
1073807360 bytes written in 4.39 seconds (233.32 MiB/sec).
[root@localhost sysbench]# ls -lh test_file.*
-rw-------. 1 root root 205M Apr  6 17:06 test_file.0
-rw-------. 1 root root 205M Apr  6 17:06 test_file.1
-rw-------. 1 root root 205M Apr  6 17:06 test_file.2
-rw-------. 1 root root 205M Apr  6 17:06 test_file.3
-rw-------. 1 root root 205M Apr  6 17:06 test_file.4
- Run the test:
sysbench --events=5000 --threads=16 fileio --file-num=5 --file-total-size=1G --file-test-mode=rndrw --file-fsync-freq=0 --file-block-size=16384 run

sysbench 1.0.17 (using system LuaJIT 2.0.4)
Running the test with following options:
Number of threads: 16
Initializing random number generator from current time
Extra file open flags: (none)
5 files, 204.8MiB each
1024MiB total file size
Block size 16KiB
Number of IO requests: 5000
Read/Write ratio for combined random IO test: 1.50
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Initializing worker threads...
Threads started!
File operations:
    reads/s:                      9015.19
    writes/s:                     6035.21
    fsyncs/s:                     240.81
Throughput:                       # throughput
    read, MiB/s:                  140.82    # read bandwidth
    written, MiB/s:               94.30     # write bandwidth
General statistics:
    total time:                          0.3315s
    total number of events:              5000
Latency (ms):
         min:                                    0.00
         avg:                                    0.84
         max:                                  113.72
         95th percentile:                        6.79
         sum:                                 4180.69
Threads fairness:
    events (avg/stddev):           312.5000/28.42
    execution time (avg/stddev):   0.2613/0.04
- CPU performance test:
[root@localhost sysbench]# sysbench cpu --threads=40 --events=10000 --cpu-max-prime=20000 run
sysbench 1.0.17 (using system LuaJIT 2.0.4)
Running the test with following options:
Number of threads: 40
Initializing random number generator from current time
Prime numbers limit: 20000
Initializing worker threads...
Threads started!
CPU speed:
    events per second:   656.65
General statistics:
    total time:                          10.0336s
    total number of events:              6589
Latency (ms):
         min:                                    1.33
         avg:                                   59.56
         max:                                  768.15
         95th percentile:                      434.83
         sum:                               392421.74
Threads fairness:
    events (avg/stddev):           164.7250/3.46
    execution time (avg/stddev):   9.8105/0.12
8. Build the monitoring system
(1) Install Prometheus from the release tarball
- Upload the downloaded tarball to the Linux server:
[root@prometheus ~]# mkdir /prome
[root@prometheus ~]# cd /prome
[root@prometheus prome]# ls
prometheus-2.34.0.linux-amd64.tar.gz
- Unpack it:
[root@prometheus prome]# tar xf prometheus-2.34.0.linux-amd64.tar.gz
[root@prometheus prome]# ls
prometheus-2.34.0.linux-amd64  prometheus-2.34.0.linux-amd64.tar.gz
- Rename the unpacked directory:
[root@prometheus prome]# mv prometheus-2.34.0.linux-amd64 prometheus
[root@prometheus prome]# ls
prometheus  prometheus-2.34.0.linux-amd64.tar.gz
[root@prometheus prome]# cd prometheus
[root@prometheus prometheus]# ls
console_libraries  consoles  LICENSE  NOTICE  prometheus  prometheus.yml  promtool
- Add Prometheus to PATH, both for the current session and permanently:
[root@prometheus prometheus]# PATH=/prome/prometheus:$PATH
[root@prometheus prometheus]# vi /root/.bashrc
[root@prometheus prometheus]# cat /root/.bashrc
# .bashrc
# User specific aliases and functions
alias rm='rm -i'
alias cp='cp -i'
alias mv='mv -i'
# Source global definitions
if [ -f /etc/bashrc ]; then
        . /etc/bashrc
fi
PATH=/prome/prometheus:$PATH    # added
- Start Prometheus:
[root@prometheus prometheus]# nohup prometheus --config.file=/prome/prometheus/prometheus.yml &
[1] 1957
[root@prometheus prometheus]# nohup: ignoring input and appending output to 'nohup.out'
# check the prometheus process
[root@prometheus prometheus]# ps aux |grep prome
root       1957  0.0  5.2 782340 52668 pts/0   Sl   21:02   0:00 prometheus --config.file=/prome/prometheus/prometheus.yml
root       1966  0.0  0.0 112824   972 pts/0   S+   21:13   0:00 grep --color=auto prome
# check which port prometheus listens on
[root@prometheus prometheus]# netstat -anplut|grep prome
tcp6       0      0 :::9090                 :::*                    LISTEN      1957/prometheus
tcp6       0      0 ::1:9090                ::1:50454               ESTABLISHED 1957/prometheus
tcp6       0      0 ::1:50454               ::1:9090                ESTABLISHED 1957/prometheus
- Disable the firewall so it stays off after reboot:
[root@prometheus prometheus]# systemctl disable firewalld
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
(2) Access test
(3) Manage Prometheus as a systemd service
[root@prometheus ~]# cat /usr/lib/systemd/system/prometheus.service
[Unit]
Description=prometheus
[Service]
ExecStart=/prome/prometheus/prometheus --config.file=/prome/prometheus/prometheus.yml
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
[Install]
WantedBy=multi-user.target
[root@prometheus system]# systemctl daemon-reload    # reload systemd units
- The first service start conflicts with the instance launched earlier via nohup, which still has to be killed by hand; after that Prometheus can be managed through service:
[root@prometheus system]# service prometheus start
Redirecting to /bin/systemctl start prometheus.service
[root@prometheus system]# ps aux|grep prometheus
root       1957  0.0  5.5 782340 55296 pts/0   Sl   21:02   0:01 prometheus --config.file=/prome/prometheus/prometheus.yml
root      10032  0.0  0.0 112824   976 pts/0   R+   21:40   0:00 grep --color=auto prometheu
[root@prometheus system]# service prometheus stop
Redirecting to /bin/systemctl stop prometheus.service
[root@prometheus system]# ps aux|grep prometheus
root       1957  0.0  5.5 782340 55296 pts/0   Sl   21:02   0:01 prometheus --config.file=/prome/prometheus/prometheus.yml
root      10050  0.0  0.0 112824   972 pts/0   R+   21:41   0:00 grep --color=auto prometheu
[root@prometheus system]# kill -9 1957
[root@prometheus system]# service prometheus start
Redirecting to /bin/systemctl start prometheus.service
[root@prometheus system]# ps aux|grep prometheus
root      10067  0.2  5.0 782084 49876 ?        Ssl  21:41   0:00 /prome/prometheus/prometheus --config.file=/prome/prometheus/prometheus.yml
root      10090  0.0  0.0 112824   972 pts/0   R+   21:42   0:00 grep --color=auto prometheu
[root@prometheus system]# service prometheus stop
Redirecting to /bin/systemctl stop prometheus.service
[root@prometheus system]# ps aux|grep prometheus
root      10108  0.0  0.0 112824   976 pts/0   R+   21:42   0:00 grep --color=auto prometheu
[root@prometheus system]# service prometheus start
Redirecting to /bin/systemctl start prometheus.service
(4) Install exporters on the nodes
1. Download node_exporter-1.4.0-rc.0.linux-amd64.tar.gz and upload it to each node server
2. Unpack it
[root@keep-01 ~]# ls
anaconda-ks.cfg node_exporter-1.4.0-rc.0.linux-amd64.tar.gz
[root@keep-01 ~]# tar xf node_exporter-1.4.0-rc.0.linux-amd64.tar.gz
[root@keep-01 ~]# ls
node_exporter-1.4.0-rc.0.linux-amd64
node_exporter-1.4.0-rc.0.linux-amd64.tar.gz
Move it into its own /node_exporter directory:
[root@keep-01 ~]# mv node_exporter-1.4.0-rc.0.linux-amd64 /node_exporter
[root@keep-01 ~]#
[root@keep-01 ~]# cd /node_exporter/
[root@keep-01 node_exporter]# ls
LICENSE node_exporter NOTICE
[root@keep-01 node_exporter]#
# Add it to PATH
[root@keep-01 node_exporter]# PATH=/node_exporter/:$PATH
[root@keep-01 node_exporter]# vim /root/.bashrc
[root@keep-01 node_exporter]# tail -1 /root/.bashrc
PATH=/node_exporter/:$PATH
# Start the node_exporter agent
[root@keep-01 node_exporter]# nohup node_exporter --web.listen-address 0.0.0.0:8090 &
[root@keep-01 node_exporter]# ps aux | grep node_exporter
root 64281 0.0 2.1 717952 21868 pts/0 Sl 19:03 0:04 node_exporter --web.listen-address 0.0.0.0:8090
root 82787 0.0 0.0 112824 984 pts/0 S+ 20:46 0:00 grep --color=auto node_exporter
[root@keep-01 node_exporter]# netstat -anplut | grep 8090
tcp6 0 0 :::8090 :::* LISTEN 64281/node_exporter
tcp6 0 0 192.168.17.152:8090 192.168.17.156:43576 ESTABLISHED 64281/node_exporter
[root@lb-1 node_exporter]#
# The other nodes get the same setup; this can be rolled out with ansible.
(5) Access test
(6) Add the nodes on the Prometheus server
This tells Prometheus where to pull data from.
Add scrape targets for the node servers in the Prometheus config; the scraped data is stored in its time-series database.
[root@prometheus prometheus]# pwd
/prome/prometheus
[root@prometheus prometheus]# vi prometheus.yml
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "master"
    static_configs:
      - targets: ["192.168.10.130:8090"]
  - job_name: "slave"
    static_configs:
      - targets: ["192.168.10.136:8090"]
  - job_name: "delay_backup"
    static_configs:
      - targets: ["192.168.10.138:8090"]
  - job_name: "slave2"
    static_configs:
      - targets: ["192.168.10.137:8090"]
  - job_name: "mysqlrouter1"
    static_configs:
      - targets: ["192.168.10.141:8090"]
  - job_name: "mysqlrouter2"
    static_configs:
      - targets: ["192.168.10.142:8090"]
[root@prometheus prometheus]# service prometheus restart
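Adding node jobs by hand is repetitive, since every stanza has the same shape. A sketch that emits one job entry per "name ip" pair; the `gen_jobs` helper is hypothetical, it assumes every node runs node_exporter on port 8090 as configured above, and it writes to a scratch file rather than the live prometheus.yml:

```shell
# Emit a prometheus scrape job for each "name ip" pair read from stdin
gen_jobs() {
    while read -r name ip; do
        [ -z "$name" ] && continue
        cat <<EOF
  - job_name: "$name"
    static_configs:
      - targets: ["$ip:8090"]
EOF
    done
}

gen_jobs > /tmp/jobs.yml <<'EOF'
master 192.168.10.130
slave 192.168.10.136
slave2 192.168.10.137
delay_backup 192.168.10.138
EOF

grep -c job_name /tmp/jobs.yml
```

Appending the generated stanzas under scrape_configs (and restarting the service as above) keeps the node list in one flat file instead of scattered YAML.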
9. Grafana
(1) Overview:
Grafana is a beautiful, powerful tool for visualizing monitoring metrics.
It is an open-source application written in Go, aimed at visualizing metrics at scale; it is the most popular time-series dashboarding tool in infrastructure and application analytics, and it supports nearly all of the common time-series databases. The best reference is the official documentation (http://docs.grafana.org/).
(2) Dashboard
Dashboards present the metrics graphically.
(3) Installation
- Create grafana.repo under /etc/yum.repos.d:
[root@prometheus grafana]# cd /etc/yum.repos.d/
[root@prometheus yum.repos.d]# vi grafana.repo
[grafana]
name=grafana
baseurl=https://packages.grafana.com/enterprise/rpm
repo_gpgcheck=1
enabled=1
gpgcheck=1
gpgkey=https://packages.grafana.com/gpg.key
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
- Install:
[root@prometheus yum.repos.d]# yum install grafana -y
- Start grafana:
[root@prometheus grafana]# service grafana-server start
Starting grafana-server (via systemctl):                   [  OK  ]
- Check the listening port:
[root@prometheus grafana]# netstat -anplut |grep grafana
tcp        0      0 192.168.10.188:34582    185.199.109.133:443     ESTABLISHED 1583/grafana-server
tcp        0      0 192.168.10.188:58104    34.120.177.193:443      ESTABLISHED 1583/grafana-server
tcp6       0      0 :::3000                 :::*                    LISTEN      1583/grafana-server
(4) Open the site and log in to grafana
192.168.10.188:3000
Default username: admin
Default password: admin
The password was then changed to admin123.
(5) Enable grafana at boot
[root@prometheus grafana]# systemctl enable grafana-server
Created symlink from /etc/systemd/system/multi-user.target.wants/grafana-server.service to /usr/lib/systemd/system/grafana-server.service.
(6) Displaying the data
- First add Prometheus as a data source.
- Import dashboards from Grafana's dashboard library:
http://grafana.com/grafana/dashboards
- Create a folder to hold the imported dashboards:
- 1860
- 8919 (Chinese-language dashboard)
III. Project Takeaways
- Plan the whole cluster architecture in advance; configure carefully, prepare the scripts beforehand, and refine them as you go.
- Watch out for firewalld and SELinux problems; it is simplest to disable the firewall on every machine during setup.
- Gained a deeper understanding of MySQL clustering and high availability.
- Gained more practice with automated batch deployment and monitoring.
- keepalived configuration demands extra care, and the project gave a new appreciation for IP address planning.
- With dual VIPs, adding two DNS A records (one per VIP) enables DNS round-robin, spreading client traffic across the two load balancers.
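The last point, DNS round-robin across the two VIPs, can be illustrated with a small simulation of how successive clients alternate between the records a DNS server hands out; this is only a sketch of the distribution, not a DNS implementation, and the VIPs are the ones from the keepalived configs above:

```shell
# The two VIPs that would be published as two DNS A records
vip1=192.168.2.188
vip2=192.168.2.186
counter=0

# Return the next VIP, alternating like a round-robin resolver
next_vip() {
    if [ $((counter % 2)) -eq 0 ]; then
        echo "$vip1"
    else
        echo "$vip2"
    fi
    counter=$((counter + 1))
}

# Ten "clients": connections split evenly across the two load balancers
i=0
while [ $i -lt 10 ]; do
    next_vip
    i=$((i + 1))
done | sort | uniq -c
```

Each VIP receives half of the connections, which is exactly the 50/50 split that two A records give in the long run.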