报错信息如下:
启动报如下告警
Warning: Failed to allocate 130023424 bytes from HugeTLB memory. errno 12
Warning: Using conventional memory pool
注:此篇文章也属于mysql/mariadb优化点之一
什是Hugepage
从内存管理的角度来看,整个物理内存被划分为“帧”,虚拟内存被划分为“页”。
内存管理单元执行虚拟内存地址到物理内存地址的转换。关于哪个虚拟内存页面映射到哪个物理帧的信息保存在称为“页表”的数据结构中。页表查找成本很高。为了避免这种查找造成的性能损失,大多数体系结构都维护了一个名为 Translation Lookaside Buffer (TLB) 的快速查找缓存。此查找缓存包含虚拟内存地址到物理内存地址的映射。因此,任何需要转换为物理内存地址的虚拟内存地址首先与转换后备缓冲区进行比较,以获得有效的映射。
当 TLB 中不存在有效的地址转换时,称为“TLB 未命中”。如果发生 TLB 未命中,内存管理单元将不得不参考页表来获取转换。这会带来额外的性能开销,假如程序需要映射一个2M内存页面,以4kb进行映射需要:2048 kb / 4 kb = 512页,而以2m进行映射:2048 / 2048 = 1页,好处显而易见,所以在日常管理中我们要进尽量减少 TLB 未命中带来不必要的损失。
Linux内核2.6之前不支持大页面(Huge page),2.6以后开始支持Hugepage功能,支持2M和1G,默认为2m大小,TB以上内存推荐为1G。
使用Hugepage的优点
- 通过使用大页面,减少了TLB未命中;
- 页面被锁定在内存中,减少内存交换;
- 提高内存性能并减少 CPU 负载;
查看Hugepage
使用cat /proc/meminfo | grep Huge
查看大内存页分配情况
cat /proc/meminfo | grep Huge
AnonHugePages: 2371584 kB
ShmemHugePages: 0 kB
HugePages_Total: 4000 # huge总数量
HugePages_Free: 2817 # 剩余数量
HugePages_Rsvd: 2247 # 保留数量
HugePages_Surp: 0 # 剩余数量
Hugepagesize: 2048 kB # Huge page默认页面大小为2m
Hugetlb: 8192000 kB # Hugetlb缓冲区大小,为HugePages_Total*Hugepagesize,此处为4000*2048=8192000 KB
计算大小
Total - Free + Rsvd
# 计算使用的数量
4000-2817+2247 = 3430
# 在乘以单个页面大小,单位kb
3430 * 2048 = 7024640 kbyte
# 换算成字节
7024640 * 1024 = 7,193,231,360
查看Hugepage使用情况
使用ipcs -m
查看huge使用情况
ipcs -m
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x00000000 0 mysql 600 111149056 1 dest
0x00000000 32769 mysql 600 130023424 1 dest
0x00000000 65538 mysql 600 2097152 1 dest
0x00000000 98307 mysql 600 138412032 1 dest
0x00000000 131076 mysql 600 138412032 1 dest
0x00000000 163845 mysql 600 138412032 1 dest
0x00000000 196614 mysql 600 138412032 1 dest
0x00000000 229383 mysql 600 138412032 1 dest
0x00000000 262152 mysql 600 138412032 1 dest
0x00000000 294921 mysql 600 138412032 1 dest
0x00000000 327690 mysql 600 138412032 1 dest
0x00000000 360459 mysql 600 138412032 1 dest
0x00000000 393228 mysql 600 138412032 1 dest
0x00000000 425997 mysql 600 138412032 1 dest
0x00000000 458766 mysql 600 138412032 1 dest
0x00000000 491535 mysql 600 138412032 1 dest
0x00000000 524304 mysql 600 138412032 1 dest
# 计算使用大小,计算后应和上面的使用大小匹配
ipcs -m|awk '{ print $5}'|awk '{a+=$0}END{print a}'
6887047168
配置mariadb支持Hugepage
mariadb开启大Huge page支持,主要由2个参数组成:分别是large_page_size
、large_pages
,以下是官方文档中解释:
large_page_size
Description: Indicates the size of memory page if large page support (Linux only) is enabled. The page size is determined from the Hugepagesize setting in /proc/meminfo. See large_pages. Deprecated and unused in MariaDB 10.5.3 since multiple page size support was added.
Scope: Global
Dynamic: No
Data Type: numeric
Default Value: Autosized (see description)
Deprecated: MariaDB 10.5.3
large_pages
Description: Indicates whether large page support (Linux only - called huge pages) is used. This is set with --large-pages or disabled with --skip-large-pages. Large pages are used for the innodb buffer pool and for online DDL (of size 3* innodb_sort_buffer_size (or 6 when encryption is used)). To use large pages, the Linux sysctl variable kernel.shmmax must be large than the llocation. Also the sysctl variable vm.nr_hugepages multipled by large-page) must be larger than the usage. The ulimit for locked memory must be sufficient to cover the amount used (ulimit -l and equalivent in /etc/security/limits.conf / or in systemd LimitMEMLOCK). If these operating system controls or insufficient free huge pages are available, the allocation of large pages will fall back to conventional memory allocation and a warning will appear in the logs. Only allocations of the default Hugepagesize currently occur (see /proc/meminfo).
Commandline: --large-pages, --skip-large-pages
Scope: Global
Dynamic: No
Data Type: boolean
Default Value: OFF
large_page_size
:通过官方文档知晓,large_page_size的大小由Linux内核参数里的/proc/meminfo
来决定,并且mariadb在10.5.3里弃用了这个参数,因为从10.5.3起,large_page_size起改用动态分配,原因是这样分配更加合理。而手动分配则存在一个问题,分配的过大使用不完,浪费,分配的过小不够用,报错。large_pages
:是否开启Huge支持,默认NO
。无法通过SET GLOABL
动态开启,需要在my.cnf
里配置并且重启maridab服务才能生效。
设置hupage大小要注意如下几点:
- huge page大小要大于
innodb buffer pool
,如果要在线联机使用DDL语句(CREATE TABLE、ALTER TABLE)那么还需要加上innodb_sort_buffer_size(索引缓冲区数值)数值乘以3,如果是加密的则是乘以6。最终大小为:不加密innodb buffer pool
+innodb_sort_buffer_size*3
,加密innodb buffer pool
+innodb_sort_buffer_size*6
- 需要修改
/etc/sysctl.conf
里的vm.nr_hugepages
、vm.hugetlb_shm_group
参数 - 需要修改
/etc/security/limits.conf
memlock或者/etc/systemd/system.conf
DefaultLimitMEMLOCK参数
mariadb则在启动时会检测以上参数符不符合Huge page要求,如果符合要求,则正常启动;不符合则退回到常规内存(4KB)分配,并且在日志中输出告开头警通知!
计算mariadb使用内存总数
查看mariadb 全局使用内存总数:SUM=(query_cache_size + table_open_cache + innodb_buffer_pool_size + innodb_log_file_size + innodb_buffer_pool_chunk_size + performance_schema.memory)+10%
注:这个公式不太准确,因为缺少了performance_schema.memory监控内存使用数据
使用如下语句查询:
show variables like 'query_cache_size';
show variables like 'table_open_cache';
show variables like 'innodb_buffer_pool_size';
show variables like 'innodb_log_file_size';
show variables like 'innodb_buffer_pool_chunk_size';
# 以下是实验机器各项值
MariaDB [(none)]> show variables like 'query_cache_size';
+------------------+---------+
| Variable_name | Value |
+------------------+---------+
| query_cache_size | 1048576 |
+------------------+---------+
1 row in set (0.01 sec)
MariaDB [(none)]> show variables like 'table_open_cache';
+------------------+--------+
| Variable_name | Value |
+------------------+--------+
| table_open_cache | 100000 |
+------------------+--------+
1 row in set (0.00 sec)
MariaDB [(none)]> show variables like 'innodb_buffer_pool_size';
+-------------------------+-----------+
| Variable_name | Value |
+-------------------------+-----------+
| innodb_buffer_pool_size | 268435456 |
+-------------------------+-----------+
1 row in set (0.00 sec)
MariaDB [(none)]> show variables like 'innodb_log_file_size';
+----------------------+----------+
| Variable_name | Value |
+----------------------+----------+
| innodb_log_file_size | 50331648 |
+----------------------+----------+
1 row in set (0.00 sec)
MariaDB [(none)]> SHOW VARIABLES LIKE 'innodb_buffer_pool_chunk_size';
+-------------------------------+-----------+
| Variable_name | Value |
+-------------------------------+-----------+
| innodb_buffer_pool_chunk_size | 134217728 |
+-------------------------------+-----------+
1 row in set (0.00 sec)
实验机器mariadb实际SUM
值为:454133408 ,换算成mb大概为 433mb
MariaDB [(none)]> SELECT 1048576+100000+268435456+50331648+134217728;
+---------------------------------------------+
| 1048576+100000+268435456+50331648+134217728 |
+---------------------------------------------+
| 454133408 |
+---------------------------------------------+
1 row in set (0.00 sec)
MariaDB [(none)]> SELECT 454133408/1024/1024;
+---------------------+
| 454133408/1024/1024 |
+---------------------+
| 433.09536743 |
+---------------------+
1 row in set (0.00 sec)
查看mysql用户组信息
id mysql
uid=997(mysql) gid=995(mysql) groups=995(mysql)
配置sysctl.conf参数
编辑/etc/sysctl.conf
下,添加如下内容
vim /etc/sysctl.d/99-sysctl.conf
# 添加使用Hugepage用户组,配置为mysql用户的group id
vm.hugetlb_shm_group = 995
# 设置Huge page数量
# 计算公式为:SUM / 1024 / 2048 = hugepage
# SUM值为433mb,增加10%冗余值,为450m左右,换算byte为471859200 byte
# 471859200/1024/2048=225个,vm.nr_hugepages设置225个足够
# vm.nr_hugepages请勿超过物理内存的85%~90%,否则有可能造成性能下降
# 实验环境改为400,实际生产环境中请勿改非常大,因为是独占,其他程序无法使用,因此修改为比实际页面高出10个以内即可
vm.nr_hugepages = 400
经过多次实验,配置以下参数mariadb启动会报130023424 bytes from HugeTLB memory. errno 28错误,因此在此注释掉
# 配置kernel.shmall
# Linux下内存页默认为4KB,计算公式为SUM / 4096 bype/page = kernel.shmall
# 419430400 byte / 4096 bype/page = 102400
# kernel.shmall = 102400
# 配置kernel.shmmax
# 在mariadb里使用show variables like 'innodb_buffer_pool_instances';获取缓冲区实例数量,默认为1
# 计算公式为:innodb_buffer_pool_size / innodb_buffer_pool_instances = kernel.shmmax
# 假如mariadb里innodb_buffer_pool_size缓冲区为8G,innodb_buffer_pool_instances缓冲区数量为4,则8 / 4 = 2G,每个缓冲区大小为2G
# 2G换算成byte为:2*1024*1024*1024=2147483648
# 实验机器innodb_buffer_pool_siz为256m,innodb_buffer_pool_instances为1,256 / 1 = 256,256*1024*1024=268435456 byte
# kernel.shmmax = 268435456
修改memlock参数
假如你是使用/bin/mysqld start 或者service mysqld start这样的启动方式,请这样修改
查看默认参数,Centos里默认为64kb,对于myraidb来说这是远远不够的。
ulimit -l
64
# 或者这样查询
ps -ef |grep mysqld
root 1126 3402 0 14:37 pts/2 00:00:00 grep --color=auto mysqld
mysql 26525 1 0 13:26 ? 00:00:06 /usr/sbin/mysqld --basedir=/usr
# 查看限制,mysql Max locked memory最大为65536byte,65536/1024=64kb
cat /proc/26525/limits | grep "Max locked memory"
Max locked memory 65536 65536 bytes
需要修改/etc/security/limits.conf
里文件,增加mysql用户最大内存锁的限制,保存退出即可
vim /etc/security/limits.conf
# 增加如下选项,假如你的mariadb是用mysql用户执行的
@mysql soft memlock unlimited
@mysql hard memlock unlimited
如果你是使用systemctl start mariadb启动的,请这样修改:
编辑/etc/security/limits.conf
,请注意limits.conf
开头注释文件:
#This file sets the resource limits for the users logged in via PAM.
#It does not affect resource limits of the system services.
#
#Also note that configuration files in /etc/security/limits.d directory,
#which are read in alphabetical order, override the settings in this
#file in case the domain is the same or more specific.
#That means for example that setting a limit for wildcard domain here
#can be overriden with a wildcard setting in a config file in the
#subdirectory, but a user specific setting here can be overriden only
#with a user specific setting in the subdirectory.
可以发现第二行字说明,此设置对system services不生效,只对通过PAM登录的用户生效,也就是说我们使用systemd管理的服务进程是不受这里影响的,如果你的mariadb是用systemctl start mariadb
管理的,那么/etc/security/limits.conf
里配置是不会生效的;需要在/etc/systemd/system/mariadb.service.d/
新增一个配置文件,增加如下选项:
vim /etc/systemd/system/mariadb.service.d/limits.conf
# 增加打开文件限制
[Service]
LimitNOFILE=409600
# 公式为vm.nr_hugepages*2m*1024*1024
# 上面设置为400*2*1024*1024
LimitMEMLOCK=838860800
# 或者直接改为没有限制,2选一
LimitMEMLOCK=infinity
# 重置服务
systemctl daemon-reload
# 重启服务
systemctl restart mariadb
修改完重启Centos,一定要重启!
mariadb开启large_pages支持
mariadb默认是关闭large_pages
选项,无法通过SET GLOBAL
动态开启,需要在my.cnf
里配置,并且要重启服务才能生效
vim /etc/my.cnf.d/my.cnf
# 增加如下选项开启大内存页支持
large-pages = ON
# 重启服务
systemctl restart mariadb
查看HugePages
发现Hugepages已经生效
cat /proc/meminfo |grep 'Huge'
AnonHugePages: 1458176 kB
HugePages_Total: 400
HugePages_Free: 355
HugePages_Rsvd: 267
HugePages_Surp: 0
Hugepagesize: 2048 kB
查看mariadb是否使用了大内存页
可以看到mariadb已经使用了大内存页
ipcs -m
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x00000000 0 mysql 600 111149056 1 dest
0x00000000 1 mysql 600 138412032 1 dest
0x00000000 2 mysql 600 138412032 1 dest
0x00000000 3 mysql 600 130023424 1 dest
0x00000000 4 mysql 600 2097152 1 dest
查看err.log也无报错
2022-03-24 14:37:30 140512708901056 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2022-03-24 14:37:30 140512708901056 [Note] InnoDB: Uses event mutexes
2022-03-24 14:37:30 140512708901056 [Note] InnoDB: Compressed tables use zlib 1.2.7
2022-03-24 14:37:30 140512708901056 [Note] InnoDB: Using Linux native AIO
2022-03-24 14:37:30 140512708901056 [Note] InnoDB: Number of pools: 1
2022-03-24 14:37:30 140512708901056 [Note] InnoDB: Using generic crc32 instructions
2022-03-24 14:37:30 140512708901056 [Note] InnoDB: Initializing buffer pool, total size = 256M, instances = 1, chunk size = 128M
2022-03-24 14:37:30 140512708901056 [Note] InnoDB: Completed initialization of buffer pool
2022-03-24 14:37:30 140511038211840 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2022-03-24 14:37:31 140512708901056 [Note] InnoDB: Highest supported file format is Barracuda.
2022-03-24 14:37:31 140512708901056 [Note] InnoDB: 128 out of 128 rollback segments are active.
2022-03-24 14:37:31 140512708901056 [Note] InnoDB: Creating shared tablespace for temporary tables
2022-03-24 14:37:31 140512708901056 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2022-03-24 14:37:31 140512708901056 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2022-03-24 14:37:31 140512708901056 [Note] InnoDB: Waiting for purge to start
2022-03-24 14:37:31 140512708901056 [Note] InnoDB: 5.7.36 started; log sequence number 153310958801
2022-03-24 14:37:31 140510391957248 [Note] InnoDB: Loading buffer pool(s) from /home/mysql/ib_buffer_pool
2022-03-24 14:37:31 140512708901056 [Note] Plugin 'FEEDBACK' is disabled.
220324 14:37:31 server_audit: server_audit_incl_users set to 'test,user,root,vipshop' # 审计对象,不设置则针对所有用户'.
220324 14:37:31 server_audit: MariaDB Audit Plugin version 1.4.12 STARTED.
220324 14:37:31 server_audit: Query cache is enabled with the TABLE events. Some table reads can be veiled.220324 14:37:31 server_audit: logging started to the file /home/mysql/server_audit.log.
2022-03-24 14:37:31 140512708901056 [Note] Server socket created on IP: '::'.
2022-03-24 14:37:31 140511002081024 [Note] Event Scheduler: scheduler thread started with id 7
2022-03-24 14:37:31 140512708901056 [Note] Reading of all Master_info entries succeeded
2022-03-24 14:37:31 140512708901056 [Note] Added new Master_info '' to hash table
2022-03-24 14:37:31 140512708901056 [Note] /usr/sbin/mysqld: ready for connections.
Version: '10.2.41-MariaDB-log' socket: '/dev/shm/mysql.sock' port: 3306 MariaDB Server
2022-03-24 14:37:32 140510391957248 [Note] InnoDB: Buffer pool(s) load completed at 220324 14:37:32
总结
写这篇文档大概用了4天的时间,4天里一直在做实验,解决errno12报错;mariadb报错从最初的Warning: Failed to allocate 130023424 bytes from HugeTLB memory. errno 12,变成errno22,在变成errno28,直到最后注释掉kernel.shmall
、kernel.shmmax
才解决;其中也总结了一些经验:
- 不要过多的相信文档,还要多实践;
- mariadb开启Hugepage,对小数据量提升并不大,如果你的数据库库经常调度大数据,还是建议开启,包括oracle和PGSQL;
- 为何注释掉
kernel.shmall
、kernel.shmmax
这两个参数才正常,个人猜测1是我设置的不对,2是mariadb里不需要这些参数,其中在oracle官方MOS里也提示了需要修改这两个参数,有账号的可以点击这个连接进行查看:https://support.oracle.com/knowledge/Oracle%20Database%20Products/2118592_1.html
vm.nr_hugepages
这个参数直接和innodb使用的总使用量挂钩,所以需要精确计算,具体的还要查找mariadb的官方文档,查找出和innodb引擎相关的内存使用项,然后进行计算,这也是下一步需要优化此篇文档的工作;
5. 为什么使用cat /proc/meminfo | grep Huge
,在不同的机器里显示的项不一样
上面2个例子中,例子2比例子1多了一项Hugetlb: 1228800 kB
,个人猜测是内核版本不同造成的,这也是需要下一步理解的点;
此问题已解决,确实内核版本造成的,在升级内核至5.4.187
后解决了疑问,如何升级内核查,请查看本人的这篇文章:https://blog.csdn.net/hzcm1984/article/details/123728953\
# 例子1
cat /proc/meminfo | grep Huge
AnonHugePages: 1613824 kB
HugePages_Total: 400
HugePages_Free: 352
HugePages_Rsvd: 264
HugePages_Surp: 0
Hugepagesize: 2048 kB
# 例子2
cat /proc/meminfo | grep Huge
AnonHugePages: 1261568 kB
ShmemHugePages: 0 kB
HugePages_Total: 600
HugePages_Free: 428
HugePages_Rsvd: 306
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 1228800 kB
# 升级内核5.4.187后,又多出了FileHugePages选项
[root@pev ~]# cat /proc/meminfo |grep 'Huge'
AnonHugePages: 819200 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 0 kB
写文章不易,如果觉得此篇文章对你有用,请收藏、关注!本人定期发布一些技术干货
参考文章
https://access.redhat.com/solutions/320303
https://mariadb.com/kb/en/innodb-buffer-pool/
http://mysql.rjweb.org/doc.php/memory
https://www.codetd.com/en/article/9718933