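Environment:
Zabbix server version: 3.4.1
Current MySQL version: 5.1.7
New MySQL version: 5.7.26
Zabbix server hardware: 8 cores, 32 GB RAM, 100 GB disk
Current MySQL server hardware: 4 cores, 16 GB RAM, 100 GB disk
New MySQL server hardware: 4 cores, 32 GB RAM, 100 GB disk
Zabbix monitors more than 400 hosts and over 30,000 items, and has been running for nearly two years. As more varied items and alerting requirements were added, the frontend became slow or unresponsive, graphs frequently showed gaps, some items had no data, and SMS alert latency kept growing. Zabbix performance was steadily degrading, so its overall performance needed a thorough tuning pass.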
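Performance tuning step 1: tuning Zabbix itself
Monitor Zabbix's own internal components; this is the first step of Zabbix monitoring.
Go to Configuration > Hosts, find the Zabbix server host, link the Template App Zabbix Server template, and save.
The internal performance graphs to watch: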
Zabbix internal process busy %: each item in this graph maps to the configuration file parameter and default value listed below. When a process type is too busy, adjust the corresponding value:
Zabbix busy timer processes, in %                   StartTimers=1
Zabbix busy escalator processes, in %               StartEscalators=1
Zabbix busy housekeeper processes, in %             HousekeepingFrequency=1, MaxHousekeeperDelete=5000
Zabbix busy alerter processes, in %                 StartAlerters=3
Zabbix busy configuration syncer processes, in %
Zabbix busy history syncer processes, in %
Zabbix busy self-monitoring processes, in %
Zabbix busy task manager processes, in %
Zabbix busy ipmi manager processes, in %
Zabbix busy alert manager processes, in %
Zabbix busy preprocessing manager processes, in %
Zabbix busy preprocessing worker processes, in %
Zabbix data gathering process busy %: each item in this graph maps to the configuration file parameter and default value listed below. When a process type is too busy, adjust the corresponding value:
Zabbix busy trapper processes, in %                 StartTrappers=5
Zabbix busy poller processes, in %                  StartPollers=5
Zabbix busy ipmi poller processes, in %             StartIPMIPollers=0
Zabbix busy discoverer processes, in %              StartDiscoverers=1
Zabbix busy icmp pinger processes, in %             StartPingers=1
Zabbix busy http poller processes, in %             StartHTTPPollers=1
Zabbix busy proxy poller processes, in %            StartProxyPollers=1
Zabbix busy unreachable poller processes, in %      StartPollersUnreachable=1
Zabbix busy java poller processes, in %             StartJavaPollers=0
Zabbix busy snmp trapper processes, in %            StartSNMPTrapper=0
Zabbix busy vmware collector processes, in %        StartVMwareCollectors=0
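As a rough illustration of how the mapping above might be used, the Python sketch below takes busy percentages read off the graph and lists the zabbix_server.conf parameters worth reviewing. The 75% threshold, the function name and the sample readings are assumptions made for illustration; they are not values measured in this environment.

```python
# Map each data gathering process type to the zabbix_server.conf
# parameter that controls how many processes of that type are started.
PARAM_FOR_PROCESS = {
    "trapper": "StartTrappers",
    "poller": "StartPollers",
    "ipmi poller": "StartIPMIPollers",
    "discoverer": "StartDiscoverers",
    "icmp pinger": "StartPingers",
    "http poller": "StartHTTPPollers",
    "proxy poller": "StartProxyPollers",
    "unreachable poller": "StartPollersUnreachable",
    "java poller": "StartJavaPollers",
    "snmp trapper": "StartSNMPTrapper",
    "vmware collector": "StartVMwareCollectors",
}

BUSY_THRESHOLD = 75.0  # assumed threshold, in percent


def parameters_to_review(busy_percent):
    """Return (parameter, busy %) pairs above the threshold, busiest first."""
    hot = [
        (PARAM_FOR_PROCESS[name], pct)
        for name, pct in busy_percent.items()
        if name in PARAM_FOR_PROCESS and pct > BUSY_THRESHOLD
    ]
    return sorted(hot, key=lambda pair: pair[1], reverse=True)


# Hypothetical readings taken from the graph
sample = {"poller": 92.5, "trapper": 40.0, "icmp pinger": 80.1}
for param, pct in parameters_to_review(sample):
    print(f"{param}: {pct:.1f}% busy, consider raising it")
```

Raising a Start* value adds processes and database connections, so it is usually safer to increase it in steps and re-check the graph after each change.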
Zabbix cache usage, % free: each item in this graph maps to the configuration file parameter and default value listed below. When a cache is running low on free space, raise the corresponding value:
Zabbix trend write cache, % free          TrendCacheSize=4M
Zabbix configuration cache, % free        CacheSize=8M
Zabbix text write cache, % free           HistoryTextCacheSize=16M
Zabbix history write cache, % free        HistoryCacheSize=8M
Zabbix value cache, % free                ValueCacheSize=8M
Zabbix vmware cache, % free               VMwareCacheSize=8M
Server configuration parameters used for the current number of devices and items
ListenPort=21004
LogFile=/var/log/zabbix/zabbix_server.log
LogFileSize=50
SocketDir=/tmp/zabbix
DBName=zabbix
DBUser=zabbix
StartPollers=200
StartPollersUnreachable=20
StartTrappers=120
StartDiscoverers=10
StartHTTPPollers=20
StartAlerters=50
HousekeepingFrequency=12
MaxHousekeeperDelete=1000000
CacheSize=1024M
StartDBSyncers=8
HistoryCacheSize=2048M
HistoryIndexCacheSize=2048M
TrendCacheSize=2048M
ValueCacheSize=16G
Timeout=10
AlertScriptsPath=/usr/local/zabbix-3.4.1/alertscripts
LogSlowQueries=3000
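Because the cache parameters are reserved as shared memory on the Zabbix server, it can help to add them up and compare the total with the 32 GB of RAM on that host. The Python sketch below does this for the cache values listed above; the helper names are hypothetical and the parser only handles the plain K/M/G suffixes used here.

```python
import re

UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(value):
    """Convert a size such as '2048M' or '16G' to bytes."""
    match = re.fullmatch(r"(\d+)([KMG]?)", value.strip())
    if match is None:
        raise ValueError(f"unrecognized size: {value!r}")
    number, unit = match.groups()
    return int(number) * UNITS.get(unit, 1)


def total_cache_bytes(conf_text):
    """Sum every parameter whose name ends in 'CacheSize'."""
    total = 0
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key.endswith("CacheSize"):
            total += parse_size(value)
    return total


# Cache settings copied from the server configuration above
conf = """
CacheSize=1024M
HistoryCacheSize=2048M
HistoryIndexCacheSize=2048M
TrendCacheSize=2048M
ValueCacheSize=16G
"""

total_gib = total_cache_bytes(conf) / 1024**3
print(f"configured caches: {total_gib:.1f} GiB of 32 GiB RAM")
```

The remaining memory still has to cover the poller and trapper processes and the operating system, so the "% free" graphs are the better guide to whether each cache is oversized.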
Check item update intervals
select delay, count(*), concat(round(count(*) / (select count(*) from items where status=0) * 100,2),'%') as percent from items where status=0 group by delay order by 2 desc;
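The result of this query can be turned into a rough estimate of new values per second (NVPS), which shows which update intervals generate most of the load. The Python sketch below assumes the rows have been copied out as (delay, count) pairs; the sample rows are hypothetical, and delays containing user macros or a value of 0 are skipped rather than guessed.

```python
SUFFIX_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def delay_to_seconds(delay):
    """Parse a simple delay such as '30', '30s' or '5m'; return None otherwise."""
    base = delay.split(";", 1)[0].strip()  # ignore custom intervals after ';'
    if not base:
        return None
    if base[-1] in SUFFIX_SECONDS:
        number, factor = base[:-1], SUFFIX_SECONDS[base[-1]]
    else:
        number, factor = base, 1
    if not number.isdigit():
        return None  # for example a user macro such as {$INTERVAL}
    return int(number) * factor


def estimate_nvps(rows):
    """Estimate new values per second from (delay, item_count) rows."""
    total = 0.0
    for delay, count in rows:
        seconds = delay_to_seconds(delay)
        if seconds:  # skip unparsable delays and delay 0
            total += count / seconds
    return total


# Hypothetical query output: (delay, count)
rows = [("60", 18000), ("30s", 6000), ("5m", 6000), ("0", 500)]
print(f"estimated NVPS: {estimate_nvps(rows):.0f}")
```

If a short interval accounts for most of the estimate, lengthening it for items that change slowly is often cheaper than adding pollers.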
MySQL database tuning
The tuning covers two areas:
1. MySQL parameter tuning
[client]
default-character-set = utf8
socket=/var/lib/mysql/mysql.sock
[mysqld]
max_connections = 2000
open_files_limit = 30000
table_open_cache = 20000
explicit_defaults_for_timestamp = true
transaction-isolation = READ-COMMITTED
max_allowed_packet = 128M
innodb_buffer_pool_size = 20G
innodb_log_file_size = 512M
innodb_file_per_table = 1
innodb_log_buffer_size=4M
innodb_thread_concurrency=64
innodb_flush_log_at_trx_commit=0
innodb_flush_method=O_DIRECT
lower_case_table_names = 1
# collation-server = gbk_bin
#
# character-set-server = gbk
default-storage-engine = INNODB
join_buffer_size = 512M
sort_buffer_size = 20M
read_rnd_buffer_size = 20M
log_timestamps = system
collation-server = utf8_bin
character-set-server = utf8
validate_password_policy = 0
validate_password_number_count = 0
validate_password_length = 4
validate_password_special_char_count = 0
default_password_lifetime = 0
innodb_read_io_threads = 16
innodb_write_io_threads = 16
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
skip-host-cache
skip-name-resolve
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
# Recommended in standard MySQL setup
#sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES
#sql_mode='STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION'
sql_mode='STRICT_TRANS_TABLES,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION'
#[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
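As a sanity check on these values, the Python sketch below computes a rough upper bound on MySQL memory use: the global buffers plus the per-session buffers multiplied by the number of connections. This is a commonly used approximation, not an exact model. The per-session buffers are only allocated when a query needs them, so the result is a ceiling rather than a prediction, and the connection counts in the loop are illustrative.

```python
MIB = 1024**2
GIB = 1024**3

# Values taken from the [mysqld] section above
global_buffers = {
    "innodb_buffer_pool_size": 20 * GIB,
    "innodb_log_buffer_size": 4 * MIB,
}
per_connection_buffers = {
    "join_buffer_size": 512 * MIB,
    "sort_buffer_size": 20 * MIB,
    "read_rnd_buffer_size": 20 * MIB,
}


def worst_case_bytes(connections):
    """Upper bound if every connection allocated each per-session buffer once."""
    per_connection = sum(per_connection_buffers.values())
    return sum(global_buffers.values()) + connections * per_connection


# Illustrative connection counts, up to max_connections = 2000
for connections in (10, 100, 2000):
    gib = worst_case_bytes(connections) / GIB
    print(f"{connections:>5} connections: up to {gib:.1f} GiB")
```

On a 32 GB server the comparison suggests watching the actual number of concurrent connections and the size of join_buffer_size in particular, since it dominates the per-session total.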
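2. Disable the Zabbix housekeeper's cleanup of history and trend data, and partition Zabbix's large tables by time (history, history_log, history_str, history_text, history_uint, trends, trends_uint).
With the housekeeper no longer purging history and trend data on a schedule, database partitioning combined with stored procedures is used to purge that data periodically instead.
Stop the zabbix server process and empty the history and trend tables. In my case I switched directly to a new server running a newer MySQL version, used Navicat to sync the data of every Zabbix table except these seven, then synced only the table structure of these seven, and finally partitioned the seven tables on the new database.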
[zabbix]> truncate table history;
[zabbix]> truncate table history_log;
[zabbix]> truncate table history_str;
[zabbix]> truncate table history_text;
[zabbix]> truncate table history_uint;
[zabbix]> truncate table trends;
[zabbix]> truncate table trends_uint;
[zabbix]> optimize table history;
[zabbix]> optimize table history_log;
[zabbix]> optimize table history_str;
[zabbix]> optimize table history_text;
[zabbix]> optimize table history_uint;
[zabbix]> optimize table trends;
[zabbix]> optimize table trends_uint;
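Copy the four separate code blocks from the official wiki page into a single file, save it as a .sql file, and import it into the database: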
https://www.zabbix.org/wiki/Docs/howto/mysql_partition#All_done
cat /root/zabbix-partition.sql
The four stored procedures are as follows:
DELIMITER $$
CREATE PROCEDURE `partition_create`(SCHEMANAME varchar(64), TABLENAME varchar(64), PARTITIONNAME varchar(64), CLOCK int)
BEGIN
/*
SCHEMANAME = The DB schema in which to make changes
TABLENAME = The table with partitions to potentially delete
PARTITIONNAME = The name of the partition to create
*/
/*
Verify that the partition does not already exist
*/
DECLARE RETROWS INT;
SELECT COUNT(1) INTO RETROWS
FROM information_schema.partitions
WHERE table_schema = SCHEMANAME AND table_name = TABLENAME AND partition_description >= CLOCK;
IF RETROWS = 0 THEN
/*
1. Print a message indicating that a partition was created.
2. Create the SQL to create the partition.
3. Execute the SQL from #2.
*/
SELECT CONCAT( "partition_create(", SCHEMANAME, ",", TABLENAME, ",", PARTITIONNAME, ",", CLOCK, ")" ) AS msg;
SET @sql = CONCAT( 'ALTER TABLE ', SCHEMANAME, '.', TABLENAME, ' ADD PARTITION (PARTITION ', PARTITIONNAME, ' VALUES LESS THAN (', CLOCK, '));' );
PREPARE STMT FROM @sql;
EXECUTE STMT;
DEALLOCATE PREPARE STMT;
END IF;
END$$
DELIMITER ;
DELIMITER $$
CREATE PROCEDURE `partition_drop`(SCHEMANAME VARCHAR(64), TABLENAME VARCHAR(64), DELETE_BELOW_PARTITION_DATE BIGINT)
BEGIN
/*
SCHEMANAME = The DB schema in which to make changes
TABLENAME = The table with partitions to potentially delete
DELETE_BELOW_PARTITION_DATE = Delete any partitions with names that are dates older than this one (yyyy-mm-dd)
*/
DECLARE done INT DEFAULT FALSE;
DECLARE drop_part_name VARCHAR(16);
/*
Get a list of all the partitions that are older than the date
in DELETE_BELOW_PARTITION_DATE. All partitions are prefixed with
a "p", so use SUBSTRING TO get rid of that character.
*/
DECLARE myCursor CURSOR FOR
SELECT partition_name
FROM information_schema.partitions
WHERE table_schema = SCHEMANAME AND table_name = TABLENAME AND CAST(SUBSTRING(partition_name FROM 2) AS UNSIGNED) < DELETE_BELOW_PARTITION_DATE;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
/*
Create the basics for when we need to drop the partition. Also, create
@drop_partitions to hold a comma-delimited list of all partitions that
should be deleted.
*/
SET @alter_header = CONCAT("ALTER TABLE ", SCHEMANAME, ".", TABLENAME, " DROP PARTITION ");
SET @drop_partitions = "";
/*
Start looping through all the partitions that are too old.
*/
OPEN myCursor;
read_loop: LOOP
FETCH myCursor INTO drop_part_name;
IF done THEN
LEAVE read_loop;
END IF;
SET @drop_partitions = IF(@drop_partitions = "", drop_part_name, CONCAT(@drop_partitions, ",", drop_part_name));
END LOOP;
IF @drop_partitions != "" THEN
/*
1. Build the SQL to drop all the necessary partitions.
2. Run the SQL to drop the partitions.
3. Print out the table partitions that were deleted.
*/
SET @full_sql = CONCAT(@alter_header, @drop_partitions, ";");
PREPARE STMT FROM @full_sql;
EXECUTE STMT;
DEALLOCATE PREPARE STMT;
SELECT CONCAT(SCHEMANAME, ".", TABLENAME) AS `table`, @drop_partitions AS `partitions_deleted`;
ELSE
/*
No partitions are being deleted, so print out "N/A" (Not applicable) to indicate
that no changes were made.
*/
SELECT CONCAT(SCHEMANAME, ".", TABLENAME) AS `table`, "N/A" AS `partitions_deleted`;
END IF;
END$$
DELIMITER ;
DELIMITER $$
CREATE PROCEDURE `partition_maintenance`(SCHEMA_NAME VARCHAR(32), TABLE_NAME VARCHAR(32), KEEP_DATA_DAYS INT, HOURLY_INTERVAL INT, CREATE_NEXT_INTERVALS INT)
BEGIN
DECLARE OLDER_THAN_PARTITION_DATE VARCHAR(16);
DECLARE PARTITION_NAME VARCHAR(16);
DECLARE OLD_PARTITION_NAME VARCHAR(16);
DECLARE LESS_THAN_TIMESTAMP INT;
DECLARE CUR_TIME INT;
CALL partition_verify(SCHEMA_NAME, TABLE_NAME, HOURLY_INTERVAL);
SET CUR_TIME = UNIX_TIMESTAMP(DATE_FORMAT(NOW(), '%Y-%m-%d 00:00:00'));
SET @__interval = 1;
create_loop: LOOP
IF @__interval > CREATE_NEXT_INTERVALS THEN
LEAVE create_loop;
END IF;
SET LESS_THAN_TIMESTAMP = CUR_TIME + (HOURLY_INTERVAL * @__interval * 3600);
SET PARTITION_NAME = FROM_UNIXTIME(CUR_TIME + HOURLY_INTERVAL * (@__interval - 1) * 3600, 'p%Y%m%d%H00');
IF(PARTITION_NAME != OLD_PARTITION_NAME) THEN
CALL partition_create(SCHEMA_NAME, TABLE_NAME, PARTITION_NAME, LESS_THAN_TIMESTAMP);
END IF;
SET @__interval=@__interval+1;
SET OLD_PARTITION_NAME = PARTITION_NAME;
END LOOP;
SET OLDER_THAN_PARTITION_DATE=DATE_FORMAT(DATE_SUB(NOW(), INTERVAL KEEP_DATA_DAYS DAY), '%Y%m%d0000');
CALL partition_drop(SCHEMA_NAME, TABLE_NAME, OLDER_THAN_PARTITION_DATE);
END$$
DELIMITER ;
DELIMITER $$
CREATE PROCEDURE `partition_verify`(SCHEMANAME VARCHAR(64), TABLENAME VARCHAR(64), HOURLYINTERVAL INT(11))
BEGIN
DECLARE PARTITION_NAME VARCHAR(16);
DECLARE RETROWS INT(11);
DECLARE FUTURE_TIMESTAMP TIMESTAMP;
/*
* Check if any partitions exist for the given SCHEMANAME.TABLENAME.
*/
SELECT COUNT(1) INTO RETROWS
FROM information_schema.partitions
WHERE table_schema = SCHEMANAME AND table_name = TABLENAME AND partition_name IS NULL;
/*
* If partitions do not exist, go ahead and partition the table
*/
IF RETROWS = 1 THEN
/*
* Take the current date at 00:00:00 and add HOURLYINTERVAL to it. This is the timestamp below which we will store values.
* We begin partitioning based on the beginning of a day. This is because we don't want to generate a random partition
* that won't necessarily fall in line with the desired partition naming (ie: if the hour interval is 24 hours, we could
* end up creating a partition now named "p201403270600" when all other partitions will be like "p201403280000").
*/
SET FUTURE_TIMESTAMP = TIMESTAMPADD(HOUR, HOURLYINTERVAL, CONCAT(CURDATE(), " ", '00:00:00'));
SET PARTITION_NAME = DATE_FORMAT(CURDATE(), 'p%Y%m%d%H00');
-- Create the partitioning query
SET @__PARTITION_SQL = CONCAT("ALTER TABLE ", SCHEMANAME, ".", TABLENAME, " PARTITION BY RANGE(`clock`)");
SET @__PARTITION_SQL = CONCAT(@__PARTITION_SQL, "(PARTITION ", PARTITION_NAME, " VALUES LESS THAN (", UNIX_TIMESTAMP(FUTURE_TIMESTAMP), "));");
-- Run the partitioning query
PREPARE STMT FROM @__PARTITION_SQL;
EXECUTE STMT;
DEALLOCATE PREPARE STMT;
END IF;
END$$
DELIMITER ;
[root@Zabbix-Server ~]# mysql -u zabbix -p zabbix
Enter password:
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 48790
Server version: 5.5.52-MariaDB MariaDB Server
Copyright (c) 2000, 2016, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [zabbix]> use zabbix;
Database changed
MariaDB [zabbix]> source /root/zabbix-partition.sql;
Query OK, 0 rows affected (0.04 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Partition maintenance stored procedure SQL
DELIMITER $$
CREATE PROCEDURE `partition_maintenance_all`(SCHEMA_NAME VARCHAR(32))
BEGIN
CALL partition_maintenance(SCHEMA_NAME, 'history', 28, 24, 14);
CALL partition_maintenance(SCHEMA_NAME, 'history_log', 28, 24, 14);
CALL partition_maintenance(SCHEMA_NAME, 'history_str', 28, 24, 14);
CALL partition_maintenance(SCHEMA_NAME, 'history_text', 28, 24, 14);
CALL partition_maintenance(SCHEMA_NAME, 'history_uint', 28, 24, 14);
CALL partition_maintenance(SCHEMA_NAME, 'trends', 730, 24, 14);
CALL partition_maintenance(SCHEMA_NAME, 'trends_uint', 730, 24, 14);
END$$
DELIMITER ;
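The arguments in the calls above are: (schema name, table name, number of days of data to keep, interval in hours between partitions, number of partitions to create on each run).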
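To make the interval arguments concrete, the Python sketch below reproduces the naming arithmetic of the creation loop in partition_maintenance: starting from today's midnight, each partition is named after its start time and holds rows with clock below its upper bound. It uses naive local datetimes where the procedure uses UNIX_TIMESTAMP in the server time zone, and the function name and sample date are hypothetical.

```python
from datetime import datetime, timedelta


def planned_partitions(now, hourly_interval, create_next_intervals):
    """Return (partition_name, upper_bound) pairs spanned by one maintenance run."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    plan = []
    for i in range(1, create_next_intervals + 1):
        start = midnight + timedelta(hours=hourly_interval * (i - 1))
        upper_bound = midnight + timedelta(hours=hourly_interval * i)
        # Same pattern as FROM_UNIXTIME(..., 'p%Y%m%d%H00') in the procedure
        plan.append((start.strftime("p%Y%m%d%H00"), upper_bound))
    return plan


# Hypothetical run time; 24-hour partitions, 3 intervals for brevity
for name, upper_bound in planned_partitions(datetime(2019, 6, 15, 10, 30), 24, 3):
    print(f"{name}  VALUES LESS THAN {upper_bound:%Y-%m-%d %H:%M}")
```

With the values used above (24, 14), one run spans daily partitions for 14 days counted from today's midnight.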
Call the partition maintenance stored procedure to turn the history and trend tables into partitioned tables:
mysql> source /root/partition_maintenance_all.sql;
Query OK, 0 rows affected (0.00 sec)
mysql> CALL partition_maintenance_all('zabbix');
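The same call also removes old partitions. The Python sketch below mirrors the comparison made by partition_maintenance and partition_drop: the cutoff is the retention date formatted as %Y%m%d0000, and any partition whose name, with the leading "p" removed, is numerically below that cutoff is dropped. The function name, partition names and run date are hypothetical.

```python
from datetime import datetime, timedelta


def partitions_to_drop(partition_names, now, keep_data_days):
    """Return partitions whose numeric name is below the retention cutoff."""
    cutoff_date = now - timedelta(days=keep_data_days)
    cutoff = int(cutoff_date.strftime("%Y%m%d0000"))
    # Strip the leading 'p' and compare as a number, as partition_drop does
    return [name for name in partition_names if int(name[1:]) < cutoff]


existing = ["p201905300000", "p201905310000", "p201906010000", "p201906020000"]
run_time = datetime(2019, 6, 29, 3, 0)  # hypothetical run time
print(partitions_to_drop(existing, run_time, 28))
```

Dropping a partition discards its rows at once, which is why this approach is much cheaper than the row-by-row deletes performed by the housekeeper.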
To keep new time partitions being created, use a crontab job that calls the partition maintenance stored procedure:
[root@HXQ-WLOMC-APP03 ~]# crontab -l
## zabbix partition stored procedures
0 3 * * 7 /usr/bin/mysql -uzabbix -p'SHipnet!23$' -e "use zabbix;" -e "CALL partition_maintenance_all('zabbix');"
[root@HXQ-WLOMC-APP03 ~]#
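The cron schedule and the look-ahead arguments have to agree: a RANGE-partitioned table without a catch-all partition rejects rows whose clock falls beyond the last partition, so each run must create enough partitions to last until the next one. The Python sketch below checks this for the values used here (24-hour partitions, 14 created per run, a weekly job); the function names are hypothetical.

```python
def lookahead_days(hourly_interval, create_next_intervals):
    """Days of partitions one run covers, counted from today's midnight."""
    return hourly_interval * create_next_intervals / 24


def schedule_is_safe(hourly_interval, create_next_intervals, days_between_runs):
    """True if the partitions from one run outlast the wait until the next run."""
    return lookahead_days(hourly_interval, create_next_intervals) > days_between_runs


# 24-hour partitions, 14 per run, cron job once a week
print(lookahead_days(24, 14), schedule_is_safe(24, 14, 7))
```

With these values a single missed weekly run is still covered, but two consecutive misses would not be, so it is worth alerting on failures of the cron job.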
Reference links