Does Solaris have a memory leak???

 

Help! I've lost my memory!

Unix Insider 10/1/95

Adrian Cockcroft, Unix Insider

Dear Adrian,
After a reboot I saw that most of my computer's memory was free, but when I launched my application it used up almost all the memory. When I stopped the application the memory didn't come back! Take a look at my vmstat output:


% vmstat 5
procs memory page disk faults cpu
r b w swap free re mf pi po fr de sr s0 s1 s2 s3 in sy cs us sy id

This is before the program starts:

0 0 0 330252 80708 0 2 0 0 0 0 0 0 0 0 1 18 107 113 0 1 99
0 0 0 330252 80708 0 0 0 0 0 0 0 0 0 0 0 14 87 78 0 0 99
I start the program and it runs like this for a while:

0 0 0 314204 8824 0 0 0 0 0 0 0 0 0 0 0 414 132 79 24 1 74
0 0 0 314204 8824 0 0 0 0 0 0 0 0 0 0 0 411 99 66 25 1 74

I stop it, then almost all the swap space comes back, but the free memory does not:

0 0 0 326776 21260 0 3 0 0 0 0 0 0 1 0 0 420 116 82 4 2 95
0 0 0 329924 24396 0 0 0 0 0 0 0 0 0 0 0 414 82 77 0 0 100
0 0 0 329924 24396 0 0 0 0 0 0 0 0 2 0 1 430 90 84 0 1 99


I checked that there were no application processes running. It looks like a huge memory leak in the operating system. How can I get my memory back?
--RAMless in Ripon


The short answer

Launch your application again. Notice that it starts up more quickly than it did the first time, and with less disk activity. The application code and its data files are still in memory, even though they are not active. The memory they occupy is not "free." If you restart the same application it finds the pages that are already in memory. The pages are attached to the inode cache entries for the files. If you start a different application, and there is insufficient free memory, the kernel will scan for pages that have not been touched for a long time, and "free" them. Once you quit the first application, the memory it occupies is not being touched, so it will be freed quickly for use by other applications.
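The behavior described above can be sketched as a toy model (a simulation with invented names and page counts, not Solaris code): exiting an application leaves its file pages attached to the inode cache, so a restart needs no disk reads.

```python
# Toy model of file pages staying cached after an application exits.
# Illustrative simulation only; names and numbers are invented.

class PageCache:
    def __init__(self, total_pages):
        self.free = total_pages          # pages on the free list
        self.cached = {}                 # file name -> pages attached to its inode

    def run_app(self, name, pages):
        """Start an app: reuse any pages already cached for its files."""
        already = self.cached.get(name, 0)
        needed = pages - already         # only the missing pages hit the disk
        self.free -= needed
        self.cached[name] = pages
        return needed                    # disk reads required

    def exit_app(self, name):
        """Exiting does NOT return file pages to the free list;
        they stay attached to the inode cache entries."""
        pass

cache = PageCache(total_pages=100)
first = cache.run_app("app", 60)     # cold start: 60 pages read from disk
cache.exit_app("app")                # free list unchanged
second = cache.run_app("app", 60)    # warm start: pages found in memory
```

Here `first` is 60 disk reads and `second` is 0, which is why the second launch is faster and quieter.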

In 1988, Sun introduced this feature in SunOS 4.0. It still applies to all versions of Solaris 1 and 2. The kernel is trying to avoid disk reads by caching as many files as possible in memory. Attaching to a page in memory is around 1,000 times faster than reading it in from disk. The kernel figures that you paid good money for all of that RAM, so it will try to make good use of it by retaining files you might need.

By contrast, memory leaks appear as a shortage of swap space after the misbehaving program runs for a while. You will probably find a process that has a larger-than-expected size. You should restart the program to free up the swap space, and check it with a debugger that offers a leak-finding feature (SunSoft's DevPro debugger, for example).
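One crude way to tell a leak from normal cache growth is that a leaking process keeps growing across samples instead of plateauing. The sketch below (sample values invented for illustration) flags a process whose sampled size grows in every interval:

```python
# Heuristic sketch: a leaking process shows monotonic growth in its
# size across samples, while a well-behaved one plateaus.
# The sample values below are invented for illustration.

def looks_leaky(sizes_kb, min_samples=4):
    """Flag a process whose size grows in every sampled interval."""
    if len(sizes_kb) < min_samples:
        return False
    return all(b > a for a, b in zip(sizes_kb, sizes_kb[1:]))

steady = [10432, 10432, 10440, 10432, 10440]   # normal fluctuation
leaky  = [10432, 11008, 11584, 12160, 12736]   # grows every interval
```

A real diagnosis still needs a leak-finding debugger, but sampling process sizes like this is a cheap first screen.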

The long (and technical) answer

To understand how Sun's operating systems handle memory, I will explain how the inode cache works, how the buffer cache fits into the picture, and how the life cycle of a typical page evolves as the system uses it for several different purposes.

The inode cache and file data caching

Whenever you access a file, the kernel needs to know the size, the access permissions, the date stamps and the locations of the data blocks on disk. Traditionally, this information is known as the inode for the file. There are many filesystem types. For simplicity I will assume we are only interested in the Unix filesystem (UFS) on a local disk. Each filesystem type has its own inode cache.
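The metadata an inode carries can be sketched as a simple structure (fields simplified; a real UFS on-disk inode has more):

```python
# Sketch of the metadata an inode holds for a file.
# Simplified for illustration; a real UFS inode has more fields.
from dataclasses import dataclass, field

@dataclass
class Inode:
    size: int                 # file size in bytes
    mode: int                 # access permissions
    atime: int                # access time stamp
    mtime: int                # modification time stamp
    ctime: int                # change time stamp
    blocks: list = field(default_factory=list)  # data block locations on disk

ino = Inode(size=8192, mode=0o644, atime=0, mtime=0, ctime=0,
            blocks=[1040, 1048])
```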

The filesystem stores inodes on the disk; the inode must be read into memory whenever an operation is performed on an entity in the filesystem. The number of inodes read per second is reported as iget/s by the sar -a command. The inode read from disk is cached in case it is needed again, and the number of inodes that the system will cache is influenced by a kernel parameter called ufs_ninode. The kernel keeps inodes on a linked list, rather than in a fixed-size table.
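The caching behavior can be sketched as a toy cache where a miss counts as an iget (the figure sar -a reports per second). This is a simulation with invented names, not the kernel's actual list handling:

```python
# Toy inode cache: a hit avoids a disk read; a miss is counted as an
# iget. Illustrative simulation only, with a simple LRU reuse policy.
from collections import OrderedDict

class InodeCache:
    def __init__(self, limit):
        self.limit = limit
        self.cache = OrderedDict()   # inode number -> inode data, LRU order
        self.igets = 0               # inodes read from disk

    def lookup(self, ino):
        if ino in self.cache:
            self.cache.move_to_end(ino)      # recently used: move to the back
            return self.cache[ino]
        self.igets += 1                      # miss: read the inode from disk
        if len(self.cache) >= self.limit:
            self.cache.popitem(last=False)   # reuse the least recent entry
        self.cache[ino] = {"ino": ino}
        return self.cache[ino]

ic = InodeCache(limit=2)
ic.lookup(1); ic.lookup(2); ic.lookup(1)   # inode 1 is now most recent
ic.lookup(3)                               # evicts inode 2, not inode 1
```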

As I mention each command I will show you what the output looks like. In my case I'm collecting sar data automatically using cron; sar defaults to reading the stored data for today. If you have no stored data, specify a time interval and sar will show you current activity.


% sar -a

SunOS hostname 5.4 Generic_101945-32 sun4c 09/18/95

00:00:01 iget/s namei/s dirbk/s
01:00:01 4 6 0

All reads or writes to UFS files occur by paging from the filesystem. All pages that are part of the file and are in memory will be attached to the inode cache entry for that file. When a file is not in use, its data is cached in memory by an inactive inode cache entry. When the kernel reuses an inactive inode cache entry that has pages attached, it puts the pages on the free list; this case is shown by sar -g as %ufs_ipf. This number is the percentage of UFS inodes that were overwritten in the inode cache by iget and that had reusable pages associated with them. The kernel flushes the pages, writing any modified pages back to disk. Thus, %ufs_ipf is the percentage of igets with page flushes. Any non-zero value of %ufs_ipf reported by sar -g indicates that the inode cache is too small for the current workload.

% sar -g

SunOS hostname 5.4 Generic_101945-32 sun4c 09/18/95

00:00:01 pgout/s ppgout/s pgfree/s pgscan/s %ufs_ipf
01:00:01 0.02 0.02 0.08 0.12 0.00
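To make the %ufs_ipf definition concrete, here is the arithmetic it implies (the counts below are invented; this is only a sketch of the ratio sar -g reports):

```python
# Sketch of what %ufs_ipf measures: the percentage of igets that had
# to overwrite an inode cache entry with reusable pages still attached,
# forcing a page flush. All counts are invented for illustration.

def ufs_ipf_percent(igets, igets_with_page_flush):
    """%ufs_ipf as defined in the text: page-flushing igets / all igets."""
    if igets == 0:
        return 0.0
    return 100.0 * igets_with_page_flush / igets

healthy = ufs_ipf_percent(400, 0)    # no cached pages were displaced
cramped = ufs_ipf_percent(200, 10)   # inode cache too small: non-zero
```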

For SunOS 4 and releases up to Solaris 2.3, the number of inodes that the kernel will keep in the inode cache is set by the kernel variable ufs_ninode. To simplify: When a file is opened, an inactive inode will be reused from the cache if the cache is full; when an inode becomes inactive, it is discarded if the cache is over-full. If the cache limit has not been reached, then an inactive inode is placed at the back of the reuse list and invalid inodes (inodes for files that no longer exist) are placed at the front for immediate reuse. It is entirely possible for the number of open files in the system to cause the number of active inodes to exceed ufs_ninode; raising ufs_ninode allows more inactive inodes to be cached in case they are needed again.
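The reuse-list policy just described can be sketched with a double-ended queue (a simulation with invented names, not kernel code): invalid inodes go on the front for immediate reuse, because their cached pages are worthless; ordinary inactive inodes go on the back, so their pages survive as long as possible.

```python
# Sketch of the pre-Solaris-2.4 reuse-list policy described above.
# Illustrative simulation only.
from collections import deque

reuse_list = deque()

def retire(ino, invalid):
    if invalid:
        reuse_list.appendleft(ino)   # file is gone: reuse this inode first
    else:
        reuse_list.append(ino)       # pages may still be wanted: reuse last

def reclaim():
    return reuse_list.popleft()      # take the next inode from the front

retire(10, invalid=False)
retire(11, invalid=True)             # jumps the queue
retire(12, invalid=False)
```

Reclaiming now yields inode 11 first, then 10, then 12.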

Solaris 2.4 uses a more clever inode cache algorithm. The kernel maintains a reuse list of blank inodes for instant use. The number of active inodes is no longer constrained, and the number of idle inodes (inactive but cached in case they are needed again) is kept between ufs_ninode and 75 percent of ufs_ninode by a new kernel thread that scavenges the inodes to free them and maintains entries on the reuse list. If you use sar -v to look at the inode cache, you may see a larger number of existing inodes than the reported "size."


% sar -v

SunOS hostname 5.4 Generic_101945-32 sun4c 09/18/95

00:00:01 proc-sz ov inod-sz ov file-sz ov lock-sz
01:00:01 66/506 0 2108/2108 0 353/353 0 0/0
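The Solaris 2.4 scavenger behavior described above can be sketched as follows (a simulation with invented names; thresholds taken from the text, the exact kernel policy is more involved):

```python
# Sketch of the Solaris 2.4 idle-inode scavenger described above: when
# idle (inactive but cached) inodes exceed ufs_ninode, a kernel thread
# frees them back down toward 75 percent of ufs_ninode.
# Illustrative simulation only.

def scavenge(idle_count, ufs_ninode):
    """Return (new_idle_count, freed) after one scavenger pass."""
    low_water = int(ufs_ninode * 0.75)
    if idle_count <= ufs_ninode:
        return idle_count, 0          # within bounds: nothing to do
    freed = idle_count - low_water    # free idle inodes to the low-water mark
    return low_water, freed
```

Note that only idle inodes are scavenged; the number of active inodes is unconstrained, which is why sar -v can show more existing inodes than the reported "size."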

Buffer cache

The buffer cache is used to cache filesystem data in SVR3 and BSD Unix. In SunOS 4, generic SVR4, and Solaris 2, it is used to cache inode, indirect block, and cylinder group blocks only. Although this change was introduced in 1988, many people still incorrectly think the buffer cache is used to hold file data. Inodes are read from disk to the buffer cache in 8-kilobyte blocks, then the individual inodes are read from the buffer cache into the inode cache.
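The block-to-inode relationship can be sketched with simple arithmetic (sizes are illustrative: assuming 128-byte on-disk inodes, one 8192-byte buffer-cache block holds 64 of them):

```python
# Sketch of locating one inode inside an 8-KB buffer cache block.
# Sizes are assumptions for illustration, not authoritative UFS values.

INODE_SIZE = 128
BLOCK_SIZE = 8192
INODES_PER_BLOCK = BLOCK_SIZE // INODE_SIZE   # 64 inodes per block

def inode_location(inumber):
    """Which buffer-cache block holds this inode, and at what byte offset."""
    block = inumber // INODES_PER_BLOCK
    offset = (inumber % INODES_PER_BLOCK) * INODE_SIZE
    return block, offset
```

Reading one inode thus brings 63 of its neighbors into the buffer cache with it, which is the point of caching the block.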

Life cycle of a typical physical memory page

This section provides additional insight into the way memory is used. The sequence described is an example of some common uses of pages; many other possibilities exist.

1. Initialization -- A page is born
When the system boots, it forms all free memory into pages, and allocates a kernel data structure to hold the state of every page in the system.
2. Free -- An untouched virgin page
All the memory is put onto the free list to start with. At this stage the content of the page is undefined.
3. ZFOD -- Joining an uninitialized data segment
When a program accesses data that is preset to zero for the very first time, a minor page fault occurs and a Zero Fill On Demand (ZFOD) operation takes place. The page is taken from the free list, block-cleared to contain all zeroes, and added to the list of anonymous pages for the uninitialized data segment. The program then reads and writes data to the page.
4. Scanned -- The pagedaemon awakes
When the free list gets below a certain size, the pagedaemon starts to look for memory pages to steal from processes. It looks at all pages in physical memory order; when it gets to our example page, the page is synchronized with the memory management unit (MMU) and its reference bit is cleared.
5. Waiting -- Is the program really using this page right now?
There is a delay that varies depending upon how quickly the pagedaemon scans through memory. If the program references the page during this period, the MMU reference bit is set.
6. Pageout Time -- Saving the contents
The pageout daemon returns and checks the MMU reference bit, finding that the program has not used the page, so it can be stolen for reuse. The pagedaemon checks whether anything had been written to the page; if it contains data, a page-out occurs. The page is moved to the pageout queue and marked as I/O pending. The swapfs code clusters the page together with other pages on the queue and writes the cluster to the swap space. The page is then freed and put on the free list again, although it is remembered that the page still contains the program's data.
7. Reclaim -- Give me back my page!
Belatedly, the program tries to read the page and takes a page fault. If the page had been reused by someone else in the meantime, a major fault would occur and the data would be read from the swap space into a new page taken from the free list. In this case, the page is still waiting to be reused, so a minor fault occurs, and the page is moved back from the free list to the program's data segment.
8. Program Exit -- Free again
The program finishes running and exits. The data segments are private to that particular instance of the program (unlike the shared-code segments), so all the pages in the data segment are marked as undefined and put onto the free list. This is the same state as Step 2.
9. Page-in -- A shared code segment
A page fault occurs in the code segment of a window system shared library. The page is taken off the free list, and a read from the filesystem is scheduled to get the code. The process that caused the page fault sleeps until the data arrives. The page is attached to the inode of the file, and the segments reference the inode.
10. Attach -- A popular page
Another process using the same shared-library page faults in the same place. It discovers that the page is already in memory and attaches to the page, increasing its inode reference count by one.
11. COW -- Making a private copy
If one of the processes sharing the page tries to write to it, a copy-on-write (COW) page fault occurs. Another page is grabbed from the free list, and a copy of the original is made. This new page becomes part of a privately mapped segment backed by anonymous storage (swap space) so it can be changed, but the original page is unchanged and can still be shared. Shared libraries contain jump tables in the code that are patched, using COW as part of the dynamic linking process.
12. File Cache -- Not free
The entire window system exits, and both processes go away. This time the page stays in use, attached to the inode of the shared library file. The inode is now inactive but will stay in the inode cache until it is reused, and the pages act as a file cache in case the user is about to restart the window system again.
13. fsflush -- Flushed by the sync
Every 30 seconds all the pages in the system are examined in physical page order to see which ones contain modified data and are attached to a vnode. The details differ between SunOS 4 and Solaris 2, but essentially any modified pages will be written back to the filesystem, and the pages will be marked as clean.

This example sequence can continue from Step 4 or Step 9 with minor variations. The fsflush process occurs every 30 seconds by default for all pages, and whenever the free list size drops below a certain value, the pagedaemon scanner wakes up and reclaims some pages.
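The life cycle above can be sketched as a small state machine (a deliberate simplification; the state and event names are invented labels for the numbered steps, and real transitions are richer):

```python
# Toy state machine for the page life cycle sketched above.
# States and events are invented labels for the steps in the text.

TRANSITIONS = {
    ("free", "zfod"):       "anonymous",   # step 3: zero fill on demand
    ("anonymous", "scan"):  "scanned",     # step 4: pagedaemon clears ref bit
    ("scanned", "touch"):   "anonymous",   # step 5: program used it in time
    ("scanned", "pageout"): "free",        # step 6: written to swap, freed
    ("free", "reclaim"):    "anonymous",   # step 7: minor fault, taken back
    ("free", "pagein"):     "filecache",   # step 9: code read from a file
    ("filecache", "cow"):   "anonymous",   # step 11: private copy for a write
}

def step(state, event):
    """Apply one event; unknown events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)

s = "free"
for event in ["zfod", "scan", "pageout", "reclaim"]:
    s = step(s, event)   # free -> anonymous -> scanned -> free -> anonymous
```

Following the events zfod, scan, pageout, reclaim reproduces the round trip in steps 3 through 7: the page ends up back in the program's data segment.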

Now you know

I have seen this missing-memory question asked about once a month since 1988! Perhaps the manual page for vmstat should include a better explanation of what the values are measuring. This answer is based on some passages from my book Sun Performance and Tuning. The book explains in detail how the memory algorithms work and how to tune them.

Adrian Cockcroft joined Sun Microsystems in 1988, and currently works as a performance specialist for the Computer Systems Division of Sun. He wrote Sun Performance and Tuning: SPARC and Solaris and Sun Performance and Tuning: Java and the Internet, both published by Sun Microsystems Press Books.
 