NUMA Performance Optimization (Doc ID 1488175.1)


In this Document

Symptoms

 

Cause

 

Solution


 

APPLIES TO:

Linux OS - Version 2.6.18 and later
Information in this document applies to any platform.
On architectures that support NUMA (non-uniform memory access), an application running on one cluster node (that is, one NUMA node) can access memory that is physically attached to another node. Although this is done over a high-speed transfer bus, such accesses are significantly slower than accesses to memory local to the node.

The Linux kernel makes extensive use of dynamic memory allocation during its normal operations. Depending on several heuristics, the memory may be allocated locally, or it may be allocated on a separate cluster node.

The kernel's memory allocation policy can be tuned using simple command-line settings.

Each cluster node can have a separate allocation policy. As a corollary, the settings should probably be changed on all cluster nodes. Pinning a process to a particular CPU has no impact on this allocation policy.

SYMPTOMS

System performance can vary with system load, as memory allocations may be satisfied by using local memory, or by using remote memory from a different cluster node.

Performance anomalies include unexpected swap usage, unexpectedly poor performance, and out-of-memory process terminations while sufficient memory and swap space appear to be available.
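One way to check whether remote allocations are actually happening is to read the per-node allocation counters that the kernel exposes in sysfs. The following is a minimal sketch, assuming the counters are available under /sys/devices/system/node/node*/numastat on the system in question. The function name numa_counters and its optional directory argument are illustrative, not part of any standard tool.

```shell
# Print numa_hit, numa_miss and other_node for each NUMA node.
# The optional argument (default: /sys/devices/system/node) is a
# hypothetical convenience so the function can read a copy of the tree.
numa_counters() {
    base="${1:-/sys/devices/system/node}"
    for f in "$base"/node*/numastat; do
        [ -r "$f" ] || continue   # no per-node counters exposed here
        node=$(basename "$(dirname "$f")")
        awk -v node="$node" '
            BEGIN { hit = 0; miss = 0; other = 0 }
            $1 == "numa_hit"   { hit = $2 }
            $1 == "numa_miss"  { miss = $2 }
            $1 == "other_node" { other = $2 }
            END {
                printf "%s numa_hit=%s numa_miss=%s other_node=%s\n",
                       node, hit, miss, other
            }
        ' "$f"
    done
}
```

Running numa_counters twice, some time apart under load, and comparing the values may help: numa_miss or other_node counters that keep growing suggest that allocations are being satisfied from a different node.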

CAUSE

The Linux kernel may satisfy its dynamic memory allocations using either local or remote memory in a NUMA system; this is its default operation.  The kernel tries to keep memory allocation local, but may choose to use remote memory.  While this does allow the kernel or process to keep running, there can be a significant performance penalty.

SOLUTION

Observe the current kernel memory allocation policy:

# /sbin/sysctl vm.zone_reclaim_mode

vm.zone_reclaim_mode = 0

 

The default value of vm.zone_reclaim_mode is 0.
Neither vm.zone_reclaim_mode nor vm.zone_reclaim_interval is present in kernel-xen-2.6.18-xxx.
vm.zone_reclaim_interval is not present in UEK 2.6.32 and later.

Add an entry in the /etc/sysctl.conf file to select a different policy:

vm.zone_reclaim_mode = 6

This makes the kernel reclaim pages on the local node before it falls back to allocating VM pages on a different cluster node, which largely prevents off-node allocations.
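The value of vm.zone_reclaim_mode is a bit mask. According to the kernel's vm.txt documentation, 1 turns zone reclaim on, 2 lets zone reclaim write out dirty pages, and 4 lets zone reclaim swap pages, so 6 combines the last two. The sketch below decodes a value into those flags. The function name and the flag labels it prints are illustrative only.

```shell
# Decode a vm.zone_reclaim_mode value into its documented bits.
# Labels such as "write-dirty-pages" are made up for readability.
decode_zone_reclaim_mode() {
    mode="$1"
    if [ "$mode" -eq 0 ]; then
        echo "zone reclaim off"
        return 0
    fi
    out=""
    if [ $(( mode & 1 )) -ne 0 ]; then out="$out reclaim-on"; fi
    if [ $(( mode & 2 )) -ne 0 ]; then out="$out write-dirty-pages"; fi
    if [ $(( mode & 4 )) -ne 0 ]; then out="$out swap-pages"; fi
    echo "${out# }"   # drop the leading space
}
```

On a live system the current value could be decoded with, for example, decode_zone_reclaim_mode "$(cat /proc/sys/vm/zone_reclaim_mode)".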

After modifying /etc/sysctl.conf, make the change take effect by running:

# /sbin/sysctl -p

There is a related setting which determines how frequently local memory is scavenged:

# cat /proc/sys/vm/zone_reclaim_interval

30

This value can be decreased if unwanted off-node allocations still take place:

vm.zone_reclaim_interval = 10

This causes the reclamation scan to run every 10 seconds instead of the default 30 seconds.
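Because vm.zone_reclaim_interval is absent on some kernels (see the notes above on kernel-xen and UEK), a script that applies this tuning may want to check for the setting first. The following is a minimal sketch, assuming the tunables appear as files under /proc/sys/vm. The function name set_vm_tunable_if_present and its optional third argument are hypothetical. Writing to /proc/sys changes the running kernel only and does not replace the /etc/sysctl.conf entry.

```shell
# Set a vm.* tunable only if the running kernel exposes it.
# $1 = tunable name, $2 = value, $3 = directory (default /proc/sys/vm).
# Writing to the real /proc/sys/vm requires root privileges.
set_vm_tunable_if_present() {
    name="$1"
    value="$2"
    root="${3:-/proc/sys/vm}"
    if [ -e "$root/$name" ]; then
        echo "$value" > "$root/$name"
        echo "set $name=$value"
    else
        echo "skipped $name: not present on this kernel"
    fi
}

# Example (as root): set_vm_tunable_if_present zone_reclaim_interval 10
```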

For a more complete description of the possible values for these settings, ensure that the kernel-doc RPM package is installed, and consult the /usr/share/doc/kernel-doc-2.6.18/Documentation/sysctl/vm.txt file.

 

The zone_reclaim_mode setting is a double-edged sword where performance is concerned.

Some file operations, such as copy, move, or backup, rely heavily on the system cache memory. With zone reclamation disabled (vm.zone_reclaim_mode=0), memory pressure can result. If /var/log/messages contains entries similar to:

swapper: page allocation failure. order:1, mode:0x20

these are symptomatic of memory pressure. In such a situation, enable zone reclamation by setting vm.zone_reclaim_mode=1 to allow the off-node allocations to succeed.
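To see whether such messages are present, the log can be searched for the "page allocation failure" text. The following is a minimal sketch, assuming the messages are written to /var/log/messages as described above. The function name count_page_alloc_failures is illustrative.

```shell
# Count log lines that report a page allocation failure.
# $1 = log file to scan (default /var/log/messages).
count_page_alloc_failures() {
    log="${1:-/var/log/messages}"
    if [ ! -r "$log" ]; then
        echo 0            # unreadable or missing log: report no matches
        return 0
    fi
    # grep -c prints the match count; it exits non-zero when the count is 0,
    # so "|| true" keeps the function's exit status at 0.
    grep -c 'page allocation failure' "$log" || true
}
```

A non-zero count is only a hint of memory pressure. The order and mode fields of each message still need to be read in context.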

 
