Linux Performance Optimization
Performance optimization in Linux doesn't always mean what you might think. It's not just a matter of outright speed; sometimes it's about tuning the system to fit into a small memory footprint.

Always remember that the purpose of the computer is to save you time! There's not much point in spending hours tweaking the system if the end result is that a program you run once a month runs five seconds faster. You'll never recoup the time invested in your entire lifetime, let alone in the few years before you completely replace that program! By all means, tweak the system to a certain point, but beyond that, it's just button polishing and time-wasting.

Optimizing for speed

I remember doing a double-take at a Microsoft exec's statement that it doesn't matter whether an application is fast, as long as it looks fast. He had a point, though: what's important is that the application respond quickly to input and come back for more, while perhaps catching up in the background.

A similar principle applies in the Linux world. For a desktop system, an amazing proportion of CPU cycles are actually spent in manipulating pixels on the screen, and the more efficiently this gets done, the faster the system will seem (and the more CPU cycles are freed up for the real work being done under the covers). I was reminded of this recently when tinkering with Red Hat 8.0 on my "lab rat" system, which currently has an Intel 845GBV mainboard.

Red Hat 8.0 had installed fairly smoothly, but screen redrawing was nothing special - probably not much different from the PII-366 I used to use. But much worse, opening a konsole window in KDE gave unreadable text with the kerning (character spacing) completely wrong. I was forced to work in GNOME, a comparatively impoverished experience (I bet that gets a few irate letters, Ed!).

That is, until the day I went searching for the correct graphics driver at the Intel web site. Having found the driver for the Intel 82845G Extreme Graphics chip, I downloaded and installed it and Wow! What A Difference! Screen repainting was an order of magnitude faster, to the extent that some screen-savers lost their hypnotic charm and became frenzied pyrotechnics. And the konsole font problem was fixed up, too!

So, the first step in getting a system tweaked for both speed and reliability is to chase down the latest versions of required device drivers. Your best allies here are http://www.google.com/linux and http://groups.google.com.

Another useful skill to develop is building a "feel" for what the system is doing, and what the bottlenecks are. You can get this by running various system monitoring utilities, such as the top command, or KDE's ktop program which is roughly analogous to the Windows Task Manager Processes and Performance tabs.
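For a quick, scriptable version of that "feel", something like the following might do. It is only a sketch: the top_cpu function name is invented here, and it assumes the procps version of ps that ships with most Linux distributions.

```shell
# top_cpu N: print a header line plus the N processes using the most CPU
# (default 5). The --sort option is specific to the procps ps on Linux.
top_cpu() {
    ps -eo pid,pcpu,pmem,comm --sort=-pcpu | head -n "$(( ${1:-5} + 1 ))"
}

top_cpu 5
```

Run it a few times while the system feels slow; if the same process keeps heading the list, that is where to look first.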

Recompile your kernel

Typical distributions comprise a collection of binary packages compiled for a specific target processor. However, if the distribution vendor compiled all their packages for the Pentium IV processor, they'd soon collect a lot of abuse from people who couldn't install on a Pentium II or III system. And they'd collect a surprising number of complaints from people with 386 and 486 machines!

So, the vendors pick a lowest-level "minimum hardware requirement" which, not surprisingly, creeps upwards over the years. The Red Hat 8 box is surprisingly coy about hardware requirements, but I know from experience that the kernels supplied require a Pentium processor or better. So those of you with 386 or 486 machines are partly out of luck. . .

But what if you have a Pentium IV or better? Or a particularly large RAM configuration? If your distribution installed a kernel that was compiled for a Pentium, you are missing out on a number of performance optimizations that would apply to your processor. For example, Red Hat 8 comes with the following kernels:


Kernel RPM                          Comments
kernel-2.4.18-14.athlon.rpm         Optimized for AMD Athlon processor
kernel-2.4.18-14.i586.rpm           Optimized for Intel Pentium processor
kernel-2.4.18-14.i686.rpm
kernel-bigmem-2.4.18-14.i686.rpm    Optimized for Pentium III machines with 4 GB or more RAM
kernel-smp-2.4.18-14.athlon.rpm     Optimized for computers with two or more AMD Athlon processors
kernel-smp-2.4.18-14.i686.rpm       Optimized for computers with two or more Pentium II processors
kernel-uml-2.4.18-14.i686.rpm       Linux kernel compiled as a conventional program (user-mode Linux)

Any other configuration will be sub-optimal to some degree. By compiling your own kernel, you can specify the processor precisely (Pentium, Pentium-MMX, Pentium III/Celeron (Coppermine), Pentium-4, Athlon/Duron/K-7, Elan, etc.) as well as selectively enable or disable processor-specific options such as Intel processor microcode updating and MTRR (Memory Type Range Registers) support. That last option, by the way, can make a big difference to display update speeds on systems that have the Pentium Pro, Pentium II or later processors, but adds about 9 KB to the size of the kernel.
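Before picking a processor type in the kernel configuration, it helps to confirm what the machine actually has. The following is a rough sketch rather than a recipe: has_cpu_flag is a helper name made up for this example, and it assumes an x86 system whose /proc/cpuinfo carries a "flags" line.

```shell
# has_cpu_flag FLAG [FILE]: succeed if the CPU "flags" line in FILE lists FLAG.
# FILE defaults to /proc/cpuinfo; the function name is hypothetical.
has_cpu_flag() {
    grep -E '^flags[[:space:]]*:' "${2:-/proc/cpuinfo}" | grep -qw -- "$1"
}

uname -m                                       # architecture reported by the running kernel
grep -m 1 'model name' /proc/cpuinfo || true   # processor model, if the file provides one
if has_cpu_flag mtrr; then
    echo "CPU reports MTRR support"
fi
```

If uname -m says i586 on a Pentium 4, the installed kernel is leaving the processor-specific optimizations described above on the table.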

Recompile the critical applications

However, if you spend all day building and recalculating spreadsheets, then your processor will actually spend relatively little time in the operating system kernel. Most of its time will be spent in the application itself; and Linux, along with similar open-source operating systems, offers a tremendous advantage here: you can actually download the complete source code for application suites like Open Office, and recompile them optimally for your system. However, I'd have to question how much time the average financial analyst would save with a highly optimized spreadsheet, versus the time said financial analyst would likely waste trying to coax the C compiler into recompiling the spreadsheet program the way he wants.

Recompile the whole darn box'n'dice

If you really want to wring the last drop of work from every CPU cycle, then perhaps you should consider a distribution that compiles everything from scratch for your particular processor. A typical example is Gentoo Linux (http://www.gentoo.org). Gentoo is installed in several stages, in a somewhat technically involved bootstrap process which involves booting from an initial CD-ROM image - which can be as small as 40 MB in size, for a quick download - and then downloading all packages in source form and compiling them with your choice of options and optimizations.

Gentoo does this by using a system called "Portage". By using the "emerge" command, you can automatically fetch the source code for supported packages from the Gentoo CVS servers and configure, compile and install them automatically. The file /etc/make.conf contains various compiler flags which can be set to build packages as tightly optimized as you want - always bearing in mind that tight optimization tricks tend to be the natural enemy of stability, not to mention debugging. However, the Portage system does make system maintenance easy: just two commands (emerge --update system and emerge --update world) are all that is required to bring a system up to date.
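As an illustration of the kind of settings involved, a make.conf excerpt for a Pentium 4 machine might look something like this. The values are assumptions chosen for the example, not recommendations; check the gcc documentation for the -march value that matches your own processor.

```shell
# /etc/make.conf (excerpt) - illustrative values for a Pentium 4 system
CHOST="i686-pc-linux-gnu"            # target triplet used when building packages
CFLAGS="-march=pentium4 -O2 -pipe"   # moderate optimization for one specific CPU
CXXFLAGS="${CFLAGS}"                 # reuse the same flags for C++ code
```

Sticking with -O2 rather than anything more aggressive is a deliberately cautious choice, in line with the warning above about optimization and stability.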

More RAM! And More CPUs

You can never have too much memory. What the Linux kernel doesn't allocate to processes, it will use as disk cache. And if you really are CPU-bound, especially when running multiple processes on a server, then consider using a dual-processor or even quad-processor setup.
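To see how much memory the kernel is currently lending to the disk cache, you can read /proc/meminfo (the free command reports the same figures). This is a minimal sketch; cache_kb is a name invented for the example.

```shell
# cache_kb [FILE]: print the sum of the Buffers and Cached lines, in KB.
# FILE defaults to /proc/meminfo; the function name is hypothetical.
cache_kb() {
    awk '$1 == "Buffers:" || $1 == "Cached:" { sum += $2 }
         END { print sum + 0 }' "${1:-/proc/meminfo}"
}

echo "Buffers + page cache: $(cache_kb) kB"
```

A large figure here is a good sign, not a leak: the kernel hands that memory back as soon as processes ask for it.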

Optimizing Disk Access

It's always worth paying attention to disk access, as it's roughly 100,000 times slower than memory access - which is, of course, why being forced to use virtual memory by swapping to disk is always a bad idea. There are some simple techniques which can produce worthwhile improvements in disk performance.

First, read up on the hdparm command, which sets various flags and modes on the IDE disk driver subsystem. Two that are particularly worth investigating are the -c option, which can set 32-bit I/O support, and -d which can enable or disable the "using_dma" flag for the drive. Now, most modern distributions will set the using_dma flag to 1 - but if yours hasn't, you're suffering a major performance hit. You should try changing it, by putting a command like

hdparm -d 1 /dev/hda

at the end of the /etc/rc.d/rc.local file. By contrast, most systems are set up for 16-bit I/O - switching to 32-bit might squeeze a little more bandwidth from the disk subsystem, and you can set this up with the command

hdparm -c 1 /dev/hda

again, at the end of /etc/rc.d/rc.local.
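If you would rather not have rc.local complain on a machine without that drive, the two commands can be wrapped in a small guard. This is only a sketch: the tune_disk function and its "skipping" message are additions for the example, while the hdparm options are the ones described above.

```shell
# tune_disk DEVICE: enable DMA and 32-bit I/O on an IDE drive, if it exists.
# The function name and the guard are hypothetical additions.
tune_disk() {
    if [ ! -b "$1" ]; then
        echo "skipping $1: not a block device"
        return 0
    fi
    hdparm -c -d "$1"        # with no values, -c and -d just report the current settings
    hdparm -d 1 -c 1 "$1"    # then switch on DMA and 32-bit I/O
}

tune_disk /dev/hda
```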

You can use the hdparm -t command to test read performance from drives as you make changes. However, this should only be used when the system is lightly loaded, ideally with a single user, and you should average the results from several runs. Alternatively, for a detailed examination of disk performance, you might want to investigate the bonnie++ hard drive benchmark, written by expatriate Aussie Russell Coker: http://www.coker.com.au/bonnie++/
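Averaging several runs by hand gets tedious, so a small filter can do the arithmetic. This sketch assumes hdparm -t prints its result in the "= NN.NN MB/sec" form shown in Listing 1; average_mbps is a made-up name.

```shell
# average_mbps: read "hdparm -t" output on stdin and print the mean MB/sec.
# The function name is hypothetical.
average_mbps() {
    awk '/MB\/sec/ { sum += $(NF - 1); n++ }
         END { if (n) printf "%.2f\n", sum / n }'
}

# Typical use, as root on a quiet system (three runs, then the average):
#   for run in 1 2 3; do hdparm -t /dev/hda; done | average_mbps
```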

Be very cautious about creating multiple filesystems on a single physical drive. A worst-case scenario might put /usr at the outside of the drive, and /home on the inside - forcing massive head movements as the system runs and slowing it down. A much better approach is to use multiple separate disk drives - in particular, those older, smaller drives you have on the shelf can often be put to good use for virtual memory swap partitions - though not all that fast, they can help to minimise head movement on the filesystem drives. But bear in mind that the more drives you have, the more likely it is that you will experience a drive failure, so make sure that you have a strategy to deal with this, whether it be regular backups or use of hot-swappable RAID drives to avoid data loss completely.

RAID

RAID (Redundant Array of Inexpensive Disks) is worth considering - if not for redundancy and reliability then, in some circumstances, for performance. RAID 1 mirrors data on two or more drives, and gives better read performance than a single drive, since data can be read from any drive in the array - but writes are no faster than on a single drive, and can be slower, since data must be written to all drives in the array. RAID 5, which stripes data and parity across multiple drives, also offers good read performance. However, bear in mind that the Linux kernel's software RAID 1 support is nowhere near as effective as even a low-cost hardware RAID controller.

Optimizing for memory usage

Sometimes, you want to install Linux on an older machine (it's amazing how those old machines just keep chugging on - I have a vintage 1990 486/33 which is still doing useful work as a backup DNS/mail server). Here, the problem is typically small memory and a slow processor.

Compile a monolithic kernel

The kernels supplied with modern Linux distributions are highly modular, and configure themselves at boot time (or even later, at run time) by loading a few of the hundreds of supplied device driver modules. You can trim your kernel considerably by building the required device driver code into the kernel at compile-time, avoiding the overhead of run-time linking the modules and the space occupied by stub code and the kernel module loader. The downside is that it's inconvenient - change the network card, for example, and you'll have to rebuild the kernel.
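A reasonable starting point for deciding what to build in is the list of modules the stock kernel has already loaded on that machine. Here is a minimal sketch; loaded_modules is an invented name, and the same information is available from the lsmod command.

```shell
# loaded_modules [FILE]: list the names of loaded modules, sorted.
# FILE defaults to /proc/modules; the function name is hypothetical.
loaded_modules() {
    awk '{ print $1 }' "${1:-/proc/modules}" | sort
}

if [ -r /proc/modules ]; then
    loaded_modules
fi
```

Each name on the list corresponds to a driver or feature to mark as built-in, rather than modular, when configuring the new kernel.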

Ditch X

The X Window System takes up a lot of memory. Desktop environments like GNOME and KDE take up a lot - and I mean, a lot - more. Since small machines are most likely to be pressed into service as network infrastructure servers, such as DHCP or DNS servers, they do not need a graphical interface. So lose it, by editing /etc/inittab and setting the id (initial default runlevel) value to 3, rather than 5.
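If you have several machines to convert, the edit can be scripted. The sketch below assumes the usual "id:5:initdefault:" line format; set_default_runlevel is a hypothetical helper that only prints the modified file, so nothing is changed until you install the result yourself.

```shell
# set_default_runlevel LEVEL [FILE]: print FILE with its initdefault line
# set to LEVEL. FILE defaults to /etc/inittab; the original is not modified.
set_default_runlevel() {
    sed "s/^id:[0-9]:initdefault:/id:$1:initdefault:/" "${2:-/etc/inittab}"
}

# Review the result first, then copy it into place by hand:
#   set_default_runlevel 3 > /tmp/inittab.new
```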

If you do need a GUI, then choose a lightweight one. Almost anything is better - oops, I meant smaller - than GNOME or KDE, and while I personally favour xfce (http://www.xfce.org) there are many more to choose from. See http://www.plig.org/xwinman/ for a large selection.

Remove virtual consoles

On a typical Linux box, Alt+F1 through Alt+F6 will select six different text-mode virtual consoles, while Alt+F7 switches to the graphical desktop. We've already disposed of X, but each virtual console also consumes a lot of memory. So prune them: edit /etc/inittab and scroll down to the bottom, where you can comment out four or so of the six lines that spawn mingetty or mgetty or whatever.
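The same pruning can be sketched with sed. This assumes inittab lines of the form "3:2345:respawn:/sbin/mingetty tty3", which may differ on your distribution; prune_consoles is an invented name, and like any inittab edit the result should be reviewed before it replaces the real file.

```shell
# prune_consoles [FILE]: print FILE with the getty lines for tty3 to tty6
# commented out. FILE defaults to /etc/inittab; the original is not modified.
prune_consoles() {
    sed -E 's/^([3-6]:[0-9]+:respawn:.*getty .*tty[3-6])$/#\1/' "${1:-/etc/inittab}"
}

# Review the result first, then copy it into place by hand:
#   prune_consoles > /tmp/inittab.new
```

Once the edited file is installed, telinit q asks init to re-read it without a reboot.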

Remove unnecessary daemons

You probably don't need sendmail running on your DHCP server, so kill it. Use ntsysv, ksysv, or the chkconfig command to disable as many unnecessary daemons as possible. Good candidates for right-sizing on a minimal system include apmd, autofs, gpm, identd, ip6tables, iptables, isdn, kudzu, lpd, nfs, portmap and talkd.
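With chkconfig, switching off a batch of services is a simple loop. The wrapper below is a sketch: disable_daemons and the DRY_RUN variable are made up for the example, so that you can preview the commands before running them as root.

```shell
# disable_daemons NAME...: run "chkconfig NAME off" for each service.
# With DRY_RUN set, the commands are printed instead of executed.
disable_daemons() {
    for svc in "$@"; do
        ${DRY_RUN:+echo} chkconfig "$svc" off
    done
}

# Preview first, then run for real as root:
#   DRY_RUN=1 disable_daemons apmd gpm kudzu
#   disable_daemons apmd gpm kudzu
```

Note that chkconfig only changes what starts at the next boot; a daemon that is already running stays up until it is stopped or the machine restarts.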

Optimizing for disk usage

Probably the best advice here is to take great care when initially installing - don't just accept the distribution install program's package groups, but select each individual package. This is a tedious procedure, but the good news is that most distributions have automated or script-driven install utilities, like Red Hat's KickStart installer, which will allow you to repeat the install with much less effort. That way, you can fine-tune the package selection without too much trouble.

Another point to bear in mind is that using multiple separate filesystems, while a useful technique for security and other reasons, will tend to give rise to more wasted disk space, as there will be some spare space in each of the filesystems, rather than a single pool of free space.
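To put a number on that stranded space, you can total the free space that df reports for each filesystem. This is a rough sketch, with free_per_fs as an invented name; it assumes the POSIX output format of df -P and mount points without spaces.

```shell
# free_per_fs: read "df -P -k" output on stdin, then print each mount point
# with its free KB, followed by a total. The function name is hypothetical.
free_per_fs() {
    awk 'NR > 1 { printf "%s %s\n", $6, $4; total += $4 }
         END { printf "total %.0f\n", total }'
}

df -P -k | free_per_fs
```

The total is what a single large filesystem would have offered as one pool; split up, no one filesystem can use more than its own share.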

 

Tech Terms:

DMA: Direct Memory Access - A technique used by some device drivers whereby data is transferred directly from the device's buffer memory into the computer's main memory, while the processor is uninvolved and can even be doing something else. Cf. Programmed I/O.

Filesystem: Loosely, a partition; strictly, the structure that organizes files within one. While Windows generally refers to partitions by drive letters (C:, D:, etc.), UNIX systems can mount such partitions at almost arbitrary points in the directory tree, so that a large server system might have many separate partitions, invisible to the user.

Programmed I/O: A technique used by some device drivers whereby the processor executes a tight loop, reading each byte in turn from the device's buffer memory and then writing it out to main memory. Cf. DMA.

sendmail: The universal (and almost universally-reviled) Mail Transfer Agent of the UNIX world. Sendmail is huge and horribly complex because it acts as a gateway between Internet SMTP mail and other protocols like Bitnet, Decnet and UUCP which no-one actually uses.

Virtual Console: A console is a terminal directly attached to a computer, for use by an operator or administrator. PCs don't have terminals, but Linux typically simulates six of them, which the user can access by pressing Alt+F1 through Alt+F6.

X Window System: The graphics services subsystem on most UNIX/Linux computers, which provides low-level graphics functionality. Layered on top of that will be a window manager, which provides a common set of widgets such as title-bars, scroll-bars, etc. Finally, layered on top of that may be a desktop environment such as KDE or GNOME, which provides a file manager, taskbar, program launch panel, clipboard, various editors, music players, web browsers and other executive toys.

 

Listing 1: The effects of various hdparm commands.

[root@dvalin root]# hdparm -t /dev/hda

/dev/hda:
Timing buffered disk reads: 64 MB in 2.97 seconds = 21.54 MB/sec
[root@dvalin root]# # Now let's switch to 32-bit I/O
[root@dvalin root]# hdparm -c 1 /dev/hda

/dev/hda:
setting 32-bit I/O support flag to 1
I/O support = 1 (32-bit)
[root@dvalin root]# hdparm -t /dev/hda

/dev/hda:
Timing buffered disk reads: 64 MB in 2.97 seconds = 21.56 MB/sec
[root@dvalin root]# # See? A slight improvement
[root@dvalin root]# # Now let's turn off DMA and see how bad things get. . .
[root@dvalin root]# hdparm -d 0 /dev/hda

/dev/hda:
setting using_dma to 0 (off)
using_dma = 0 (off)
[root@dvalin root]# hdparm -t /dev/hda

/dev/hda:
Timing buffered disk reads: 64 MB in 10.94 seconds = 5.85 MB/sec
 

 