While load-testing NAS performance with fstest (sustained writes over a long period), throughput degraded over time. Investigating the cause, I found that memory usage on the server host was very high.
1. First, check memory usage
# top -M
top - 14:43:12 up 14 days, 6 min, 1 user, load average: 8.36, 8.38, 8.41
Tasks: 419 total, 1 running, 418 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.2%sy, 0.0%ni, 99.0%id, 0.7%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 63.050G total, 62.639G used, 420.973M free, 33.973M buffers
Swap: 4095.996M total, 0.000k used, 4095.996M free, 48.889G cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
111 root 20 0 0 0 0 S 2.0 0.0 0:25.52 ksoftirqd/11
5968 root 20 0 15352 1372 828 R 2.0 0.0 0:00.01 top
13273 root 20 0 0 0 0 D 2.0 0.0 25:54.02 nfsd
17765 root 0 -20 0 0 0 S 2.0 0.0 0:11.89 kworker/5:1H
1 root 20 0 19416 1436 1136 S 0.0 0.0 0:01.88 init
.....
Memory is almost exhausted, but which process is using it? In top, even the process with the highest %MEM is only at a few tenths of a percent.
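When no user-space process accounts for the memory, the usual suspect is the kernel itself. /proc/meminfo gives a breakdown: the Slab, SReclaimable and SUnreclaim lines show how much is held in kernel slab caches, and Cached shows the page cache (command only, output omitted here):
# grep -E '^(Cached|Slab|SReclaimable|SUnreclaim)' /proc/meminfo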
2. Use vmstat -m to look at kernel-space memory usage (the slab caches).
# vmstat -m
Cache Num Total Size Pages
xfs_dqtrx 0 0 384 10
xfs_dquot 0 0 504 7
xfs_buf 91425 213300 384 10
fstrm_item 0 0 24 144
xfs_mru_cache_elem 0 0 32 112
xfs_ili 7564110 8351947 224 17
xfs_inode 7564205 8484180 1024 4
xfs_efi_item 257 390 400 10
xfs_efd_item 237 380 400 10
xfs_buf_item 1795 2414 232 17
xfs_log_item_desc 830 1456 32 112
xfs_trans 377 490 280 14
xfs_ifork 0 0 64 59
xfs_da_state 0 0 488 8
xfs_btree_cur 342 437 208 19
xfs_bmap_free_item 89 288 24 144
xfs_log_ticket 717 966 184 21
xfs_ioend 726 896 120 32
rbd_segment_name 109 148 104 37
rbd_obj_request 1054 1452 176 22
rbd_img_request 1037 1472 120 32
ceph_osd_request 548 693 872 9
ceph_msg_data 1041 1540 48 77
ceph_msg 1197 1632 232 17
nfsd_drc 19323 33456 112 34
nfsd4_delegations 0 0 368 10
nfsd4_stateids 855 1024 120 32
nfsd4_files 802 1050 128 30
nfsd4_lockowners 0 0 384 10
nfsd4_openowners 15 50 392 10
rpc_inode_cache 27 30 640 6
rpc_buffers 8 8 2048 2
rpc_tasks 8 15 256 15
fib6_nodes 22 59 64 59
pte_list_desc 0 0 32 112
ext4_groupinfo_4k 722 756 136 28
ext4_inode_cache 3362 3728 968 4
ext4_xattr 0 0 88 44
ext4_free_data 0 0 64 59
ext4_allocation_context 0 0 136 28
ext4_prealloc_space 42 74 104 37
ext4_system_zone 0 0 40 92
Cache Num Total Size Pages
ext4_io_end 0 0 64 59
ext4_extent_status 1615 5704 40 92
jbd2_transaction_s 30 30 256 15
jbd2_inode 254 539 48 77
........
Two entries stand out: xfs_ili and xfs_inode are occupying a large amount of memory.
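In vmstat -m, Num is the number of objects currently in use, Total is the number of objects allocated, Size is the object size in bytes, and Pages is how many objects fit in one slab (judging from the figures above). A cache's footprint is therefore roughly Total × Size: xfs_inode ≈ 8484180 × 1024 B ≈ 8.3 GB, and xfs_ili ≈ 8351947 × 224 B ≈ 1.8 GB. A quick sketch for ranking every cache this way, assuming the standard /proc/slabinfo column order (name, active_objs, num_objs, objsize):
# awk 'NR>2 {printf "%-28s %10.1f MB\n", $1, $3*$4/1048576}' /proc/slabinfo | sort -rn -k2 | head
The slabtop command in the next step presents the same data, already sorted.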
3. Use slabtop to view the kernel slab caches, sorted by cache size
#slabtop -s c | head
Active / Total Objects (% used) : 31807723 / 35664583 (89.2%)
Active / Total Slabs (% used) : 3259180 / 3259251 (100.0%)
Active / Total Caches (% used) : 139 / 227 (61.2%)
Active / Total Size (% used) : 11242773.43K / 12756788.05K (88.1%)
Minimum / Average / Maximum Object : 0.02K / 0.36K / 4096.00K
OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
8480676 7565420  89%    1.00K 2120169        4   8480676K xfs_inode
8351794 7565375  90%    0.22K  491282       17   1965128K xfs_ili
xfs_ili occupies 1965128K and xfs_inode occupies 8480676K, but what exactly are they? My guess was that they hold filesystem cache data for the NAS/rbd volumes. Going by the name, xfs_inode is the in-memory inode cache of the XFS filesystems.
Searching for xfs_ili only turned up kernel code fragments:
        xfs_inode_zone =
                kmem_zone_init_flags(sizeof(xfs_inode_t), "xfs_inode",
                        KM_ZONE_HWALIGN | KM_ZONE_RECLAIM | KM_ZONE_SPREAD,
                        xfs_fs_inode_init_once);
        if (!xfs_inode_zone)
                goto out_destroy_efi_zone;

        xfs_ili_zone =
                kmem_zone_init_flags(sizeof(xfs_inode_log_item_t), "xfs_ili",
                        KM_ZONE_SPREAD, NULL);
typedef struct xfs_inode_log_item {
        xfs_log_item_t          ili_item;          /* common portion */
        struct xfs_inode        *ili_inode;        /* inode ptr */
        xfs_lsn_t               ili_flush_lsn;     /* lsn at last flush */
        xfs_lsn_t               ili_last_lsn;      /* lsn at last transaction */
        unsigned short          ili_lock_flags;    /* lock flags */
        unsigned short          ili_logged;        /* flushed logged data */
        unsigned int            ili_last_fields;   /* fields when flushed */
        unsigned int            ili_fields;        /* fields to be logged */
        struct xfs_bmbt_rec     *ili_extents_buf;  /* array of logged data exts */
        struct xfs_bmbt_rec     *ili_aextents_buf; /* array of logged attr exts */
        xfs_inode_log_format_t  ili_format;        /* logged structure */
} xfs_inode_log_item_t;
From the code, my analysis-plus-guess is that xfs_ili is a cache of filesystem log (journal) items: one xfs_inode_log_item per inode with pending log state. Is that right? The nfs-server currently has 14 volumes, and each volume's XFS was formatted with a 128 MB log (the -l size=128m parameter to mkfs.xfs): 14 × 128 × 1024 = 1835008 KB, which is roughly the 1965128 KB used by xfs_ili.
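As a sanity check on the 128 MB figure, xfs_info on a mounted volume reports the log section; the mount point below is just a placeholder for one of the 14 rbd volumes. The log size is bsize × blocks from the printed line (e.g. 4096 × 32768 = 128 MB):
# xfs_info /mnt/rbd0 | grep '^log'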
4. But xfs_ili and xfs_inode together only come to about 10 GB; where did the other ~50 GB go? The answer is the page cache: Linux keeps the contents of recently used files cached in memory, which is the 48.889G shown as "cached" in the top output above.
Running the following commands released the memory:
# sync        # flush dirty data to disk first
# echo 3 > /proc/sys/vm/drop_caches
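For reference, the value written to drop_caches selects what gets dropped; only clean, reclaimable objects are freed, which is why sync is run first:
# echo 1 > /proc/sys/vm/drop_caches   # page cache only
# echo 2 > /proc/sys/vm/drop_caches   # reclaimable slab objects (dentries and inodes)
# echo 3 > /proc/sys/vm/drop_caches   # both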
5. Summary
Whether the performance degradation is really caused by running short of memory is still being tested. Even so, this gives some guidance for tuning the nfs-server side later: the more volumes there are, the more memory these caches will inevitably consume, so the NAS head node should be configured with plenty of memory.
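One knob that may be worth experimenting with (my own assumption, not something verified in this test) is vm.vfs_cache_pressure, which controls how aggressively the kernel reclaims dentry/inode slab objects relative to the page cache; the default is 100, and larger values make them reclaimed more readily:
# sysctl -w vm.vfs_cache_pressure=200
# echo 'vm.vfs_cache_pressure = 200' >> /etc/sysctl.conf    # make it persistent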