Last revised 28 Feb 2005. If you want to see what has changed,
search for this date in this article.
If you like this article, visit my blog, PHP Everywhere, for related
articles.
A HOWTO on Optimizing PHP
PHP is a very fast
programming language, but there is more to optimizing PHP than just
speed of code execution.
In this chapter, we explain why optimizing PHP involves many
factors which are not code related, and why tuning PHP requires an
understanding of how PHP performs in relation to all the other
subsystems on your server, and then identifying bottlenecks caused
by these subsystems and fixing them. We also cover how to tune and
optimize your PHP scripts so they run even faster.
Achieving High Performance
When we talk about good performance, we are not talking about
how fast your PHP scripts will run. Performance is a set of
tradeoffs between scalability and speed. Scripts tuned to use fewer
resources might be slower than scripts that perform caching, but
more copies of the same script can be run at one time on a web
server.
In the example below, A.php is a sprinter that can run fast, and
B.php is a marathon runner that can jog forever at nearly the
same speed. For light loads, A.php is substantially faster, but as
the web traffic increases, the performance of B.php only drops a
little bit while A.php just runs out of steam.
Let us take a more
realistic example to clarify matters further. Suppose we need to
write a PHP script that reads a 250K file and generates a HTML
summary of the file. We write 2 scripts that do the same thing:
hare.php that reads the whole file
into memory at once and processes it in one pass, and
tortoise.php that reads the file one
line at a time, never keeping more than the longest line in
memory. Tortoise.php will be slower as multiple reads are
issued, requiring more system calls.
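The two strategies can be sketched as follows (a simplified illustration: the function names are invented, and counting lines stands in for whatever summarizing work the real scripts would do):

```php
<?php
// hare.php approach: slurp the whole file into memory in one pass.
function count_lines_at_once($path)
{
    $text = file_get_contents($path);  // the entire 250K file held in RAM
    return substr_count($text, "\n");  // stands in for the real summarizing
}

// tortoise.php approach: read one line at a time, constant memory use.
function count_lines_one_by_one($path)
{
    $lines = 0;
    $fp = fopen($path, "r");
    while (fgets($fp) !== false) {     // one buffered read per line
        $lines++;
    }
    fclose($fp);
    return $lines;
}
```

Both functions produce the same answer; the first trades memory for fewer reads, the second trades extra system calls for a small, constant memory footprint.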
Hare.php requires 0.04 seconds of CPU and 10 Mb RAM and
tortoise.php requires 0.06 seconds of CPU and 5 Mb RAM. The server
has 100 Mb free actual RAM and its CPU is 99% idle. Assume no
memory fragmentation occurs to simplify things.
At 10 concurrent scripts running, hare.php will run out of
memory (10 x 10 = 100). At that point, tortoise.php will still have
50 Mb of free memory. The 11th concurrent script to run will bring
hare.php to its knees as it starts using virtual memory, slowing it
down to maybe half its original speed; each invocation of hare.php
now takes 0.08 seconds of CPU time. Meanwhile, tortoise.php will
still be running at its normal 0.06 seconds of CPU time.
In the table below, the faster PHP script for each load is marked
with an asterisk (*):

                  CPU seconds required to satisfy
                1 HTTP request   10 HTTP requests   11 HTTP requests
  hare.php         *0.04            *0.40            0.88 (runs out of RAM)
  tortoise.php      0.06             0.60           *0.66
As the above example shows, obtaining good performance is not
merely writing fast PHP scripts. High performance PHP requires a
good understanding of the underlying hardware, the operating system
and supporting software such as the web server and database.
Bottlenecks
The hare and tortoise example has shown us that bottlenecks
cause slowdowns. With infinite RAM, hare.php will always be faster
than tortoise.php. Unfortunately, the above model is a bit
simplistic and there are many other bottlenecks to performance
apart from RAM:
(a) Networking
Your network is probably the biggest bottleneck. Let us say you
have a 10 Mbit link to the Internet, over which you can pump 1
megabyte of data per second. If each web page is 30k, a mere 33 web
pages per second will saturate the line.
More subtle networking bottlenecks include frequent access to
slow network services such as DNS, or allocating insufficient
memory for networking software.
(b) CPU
If you monitor your CPU load, you will find that sending plain HTML
pages over a network barely taxes your CPU because, as we mentioned
earlier, the bottleneck will be the network. However, for the
complex dynamic web pages that PHP generates, your CPU speed will
normally become the limiting factor. Having a server with multiple
processors or having a server farm can alleviate this.
(c) Shared Memory
Shared memory is used for inter-process communication, and to
store resources that are shared between multiple processes such as
cached data and code. If insufficient shared memory is allocated,
any attempt to access resources that use shared memory, such as
database connections or executable code, will perform poorly.
(d) File System
Accessing a hard disk can be 50 to 100 times slower than reading
data from RAM. File caches using RAM can alleviate this. However
low memory conditions will reduce the amount of memory available
for the file-system cache, slowing things down. File systems can
also become heavily fragmented, slowing down disk accesses. Heavy
use of symbolic links on Unix systems can slow down disk accesses
too.
Default Linux installs are also notorious for hard disk settings
which are tuned for compatibility and not for speed. Use the
hdparm command to tune your Linux hard disk settings.
(e) Process Management
On some operating systems, such as Windows, creating new processes
is a slow operation. This means CGI applications that fork a new
process on every invocation will run substantially slower on these
operating systems. Running PHP in multi-threaded mode should
improve response times (note: older versions of PHP are not stable
in multi-threaded mode).
Avoid overcrowding your web server with too many unneeded
processes. For example, if your server is purely for web serving,
avoid running (or even installing) X-Windows on the machine. On
Windows, avoid running Microsoft Find Fast (part of Office) and
3-dimensional screen savers that result in 100% CPU
utilization.
Some of the programs that you can consider removing include
unused networking protocols, mail servers, antivirus scanners,
hardware drivers for mice, infrared ports and the like. On Unix, I
assume you are accessing your server using SSH. Then you can
consider removing:
daemons such as telnetd,
inetd, atd, ftpd, lpd, smbd (Samba)
sendmail for incoming mail
portmap for NFS
xfs, fvwm, xinit, X
You can also disable various programs at
startup by modifying the startup files, which are
usually stored in the /etc/init* or /etc/rc*/init*
directories.
Also review your cron jobs to see if you can remove them or
reschedule them for off-peak periods.
(f) Connecting to Other Servers
If your web server requires services running on other servers,
it is possible that those servers become the bottleneck. The most
common example of this is a slow database server that is servicing
too many complicated SQL requests from multiple web servers.
When to Start
Optimizing?
Some people say that it is better to defer tuning until after
the coding is complete. This advice only makes sense if your
programming team's coding is of a high quality to begin with, and
you already have a good feel of the performance parameters of your
application. Otherwise you are exposing yourselves to the risk of
having to rewrite substantial portions of your code after
testing.
My advice is that before you design a software application, you
should do some basic benchmarks on the hardware and software to get
a feel for the maximum performance you might be able to achieve.
Then as you design and code the application, keep the desired
performance parameters in mind, because at every step of the way
there will be tradeoffs between performance, availability, security
and flexibility.
Also choose good test data. If your database is expected to hold
100,000 records, avoid testing with only a 100 record database –
you will regret it. This once happened to one of the programmers in
my company; we did not detect the slow code until much later,
causing a lot of wasted time as we had to rewrite a lot of code
that worked but did not scale.
Tuning Your Web Server for
PHP
We will cover how to get the best PHP performance for the two
most common web servers in use today, Apache 1.3 and IIS. A lot of
the advice here is relevant for serving HTML also.
The authors of PHP have stated that there is no performance or
scalability advantage in using Apache 2.0 over Apache 1.3 with PHP,
especially in multi-threaded mode. When running Apache 2.0 in
pre-forking mode, the following discussion is still relevant (21
Oct 2003).
(a) Apache 1.3/2.0
Apache is available on both Unix and Windows. It is the most
popular web server in the world. Apache 1.3 uses a
pre-forking model for web serving. When Apache starts up, it
creates multiple child processes that handle HTTP requests. The
initial parent process acts like a guardian angel, making sure that
all the child processes are working properly and coordinating
everything. As more HTTP requests come in, more child processes are
spawned to process them. As the HTTP requests slow down, the parent
will kill the idle child processes, freeing up resources for other
processes. The beauty of this scheme is that it makes Apache
extremely robust. Even if a child process crashes, the parent and
the other child processes are insulated from the crashing
child.
The pre-forking model is not as fast as some other possible
designs, but to me it is "much ado about nothing" on a server
serving PHP scripts because other bottlenecks will kick in long
before Apache performance issues become significant. The robustness
and reliability of Apache is more important.
Apache 2.0 offers operation in multi-threaded mode. My
benchmarks indicate there is little performance advantage in this
mode. Also be warned that many PHP extensions are not compatible
(e.g. GD and IMAP). Tested with Apache 2.0.47 (21 Oct 2003).
Apache is configured using the httpd.conf file. The following
parameters are particularly important in configuring child
processes:
MaxClients (default: 256)
The maximum number of child processes to create. The default means
that up to 256 HTTP requests can be handled concurrently. Any
further connection requests are queued.

StartServers (default: 5)
The number of child processes to create on startup.

MinSpareServers (default: 5)
The minimum number of idle child processes to maintain. If the
number of idle child processes falls below this number, 1 child is
created initially, then 2 after another second, then 4 after
another second, and so forth until 32 children are created per
second.

MaxSpareServers (default: 10)
If more than this number of child processes are idle, the extra
processes will be terminated.

MaxRequestsPerChild (default: 0)
Sets the number of HTTP requests a child can handle before
terminating. Setting it to 0 means never terminate. Set this to a
value between 100 and 10000 if you suspect memory leaks are
occurring, or to free under-utilized resources.
For large sites, values
close to the following might be better:
MinSpareServers 32
MaxSpareServers 64
Apache on Windows
behaves differently. Instead of using child processes, Apache uses
threads. The above parameters are not used. Instead we have one
parameter: ThreadsPerChild which defaults to 50. This
parameter sets the number of threads that can be spawned by Apache.
As there is only one child process in the Windows version, the
default setting of 50 means only 50 concurrent HTTP requests can be
handled. For web servers experiencing higher traffic, increase this
value to between 256 and 1024.
Other useful performance parameters you can change include:
SendBufferSize (default: set by the OS)
Determines the size of the output buffer (in bytes) used in TCP/IP
connections. This is primarily useful for congested or slow
networks when packets need to be buffered; you then set this
parameter close to the size of the largest file normally
downloaded. One TCP/IP buffer will be created per client
connection.

KeepAlive [on|off] (default: On)
In the original HTTP specification, every HTTP request had to
establish a separate connection to the server. To reduce the
overhead of frequent connects, the keep-alive header was developed.
Keep-alive tells the server to reuse the same socket connection for
multiple HTTP requests.

If a separate dedicated web server serves all images, you can
disable this option. This technique can substantially improve
resource utilization.

KeepAliveTimeout (default: 15)
The number of seconds to keep the socket connection alive. This
time includes the generation of content by the server and
acknowledgements by the client. If the client does not respond in
time, it must make a new connection.

This value should be kept low, as the socket will otherwise be idle
for extended periods.

MaxKeepAliveRequests (default: 100)
Socket connections will be terminated when the number of requests
set by MaxKeepAliveRequests is reached. Keep this at a high value
below MaxClients or ThreadsPerChild.

TimeOut (default: 300)
Disconnect when idle time exceeds this value. You can set this
value lower if your clients have low latencies.

LimitRequestBody (default: 0)
Maximum size of a PUT or POST request in bytes. 0 means there is no
limit.
If you do not require DNS lookups and you are not using the
htaccess file to configure Apache settings for individual
directories, you can set:

# disable DNS lookups: PHP scripts only get the IP address
HostnameLookups off

# disable htaccess checks
<Directory />
    AllowOverride None
</Directory>
If you are not worried
about directory security when accessing symbolic links, turn on
FollowSymLinks and turn off SymLinksIfOwnerMatch to prevent
additional lstat() system calls from being made:
Options FollowSymLinks
#Options SymLinksIfOwnerMatch
(b) IIS Tuning
IIS is a multi-threaded web server available on Windows NT and
2000. From the Internet Services Manager, it is possible to tune
the following parameters:
Performance Tuning
Based on the number of hits per day; determines how much memory to
preallocate for IIS. (Performance tab)

Bandwidth throttling
Controls the bandwidth per second allocated per web site.
(Performance tab)

Process throttling
Controls the CPU % available per web site. (Performance tab)

Timeout
Default is 900 seconds. Set to a lower value on a Local Area
Network. (Web Site tab)

HTTP Compression
In IIS 5, you can compress dynamic pages, HTML and images, and it
can be configured to cache compressed static HTML and images. By
default, compression is off.
HTTP compression has to
be enabled for the entire physical server. To turn it on open the
IIS console, right-click on the server (not any of the subsites,
but the server in the left-hand pane), and get Properties. Click on
the Service tab, and select "Compress application files" to
compress dynamic content, and "Compress static files" to compress
static content.
You can also configure
the default isolation level of your web site. In the Home Directory
tab under Application Protection, you can define the level of
isolation. A highly isolated web site will run slower because it
runs as a separate process from IIS, while running the web site
inside the IIS process is fastest but will bring down the server if
there are serious bugs in the web site code. Currently I recommend
running PHP web sites using CGI, or using ISAPI with Application
Protection set to high.
You can also use regedit.exe to modify the following IIS 5 registry
settings stored at this location:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Inetinfo\Parameters\
MemCacheSize
Sets the amount of
memory that IIS will use for its file cache. By default IIS will
use 50% of available memory. Increase if IIS is the only
application on the server. Value is in megabytes.
MaxCachedFileSize
Determines the maximum
size of a file cached in the file cache in bytes. Default is
262,144 (256K).
ObjectCacheTTL
Sets the length of time
(in milliseconds) that objects in the cache are held in memory.
Default is 30,000 milliseconds (30 seconds).
MaxPoolThreads
Sets the number of pool
threads to create per processor. Determines how many CGI
applications can run concurrently. Default is 4. Increase this
value if you are using PHP in CGI mode.
ListenBackLog
Specifies the maximum
number of active Keep Alive connections that IIS maintains in the
connection queue. Default is 15, and should be increased to the
number of concurrent connections you want to support. Maximum is
250.
If the settings are
missing from this registry location, the defaults are being
used.
High Performance on Windows: IIS and FastCGI
After much testing, I find that the best PHP performance on
Windows is offered by using IIS with FastCGI. CGI is a protocol for
calling external programs from a web server. It is not very fast
because CGI programs are terminated after every page request.
FastCGI modifies this protocol for high performance, by making the
CGI program persist after a page request, and reusing the same CGI
program when a new page request comes in.
As the installation of FastCGI with IIS is complicated, you
should use the EasyWindows PHP
Installer. This will install PHP, FastCGI and Turck MMCache for
the best performance possible. This installer can also install PHP
for Apache 1.3/2.0.
This section on FastCGI added 21 Oct 2003.
PHP4's Zend Engine
The Zend Engine is the internal compiler and runtime engine used
by PHP4. Developed by Zeev Suraski and Andi Gutmans, the name Zend
is a contraction of their first names. In the early days of
PHP4, it worked in the following fashion:
The PHP script was
loaded by the Zend Engine and compiled into Zend opcode. Opcodes,
short for operation codes, are low level binary instructions. Then
the opcode was executed and the HTML generated sent to the client.
The opcode was flushed from memory after execution.
Today, there are a multitude of products and techniques to help
you speed up this process. In the following diagram, we show
how modern PHP scripts work; all the shaded boxes are optional.
PHP Scripts are loaded
into memory and compiled into Zend opcodes. These opcodes can now
be optimized using an optional peephole optimizer called Zend
Optimizer. Depending on the script, it can increase the speed of
your PHP code by 0-50%.
Formerly after execution, the opcodes were discarded. Now the
opcodes can be optionally cached in memory using several
alternative open source products and the Zend Accelerator (formerly
Zend Cache), which is a commercial closed source product. The only
opcode cache that is compatible with the Zend Optimizer is the Zend
Accelerator. An opcode cache speeds execution by removing the
script loading and compilation steps. Execution times can improve
between 10-200% using an opcode cache.
Where to find Opcode Caches
Zend Accelerator: A commercial opcode
cache developed by the Zend Engine team. Very reliable and robust.
Visit http://zend.com for more
information.
You will need to test the following open
source opcode caches before using them on production servers, as
their performance and reliability very much depend on the PHP
scripts you run.
Turck MMCache: http://turck-mmcache.sourceforge.net/
is no longer maintained. See eAccelerator, which is a
branch of MMCache that is actively maintained (Added 28 Feb
2005).
One of the secrets of high performance is not to write faster
PHP code, but to avoid executing PHP code by caching generated HTML
in a file or in shared memory. The PHP script is only run once and
the HTML is captured, and future invocations of the script will
load the cached HTML. If the data needs to be updated regularly, an
expiry value is set for the cached HTML. HTML caching is not part
of the PHP language nor Zend Engine, but implemented using PHP
code. There are many class libraries that do this. One of them is
the PEAR Cache, which we will cover in the next section. Another is
the Smarty
template library.
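The idea can be sketched in a few lines (a simplified illustration, not the PEAR Cache or Smarty implementation; the cache file name, the 600-second expiry and the generator function are all arbitrary, and a real library would also handle locking and key hashing):

```php
<?php
// Serve cached HTML if a fresh copy exists; otherwise regenerate and save it.
function cached_html($cache_file, $expiry_secs, $generate_fn)
{
    if (file_exists($cache_file) &&
        time() - filemtime($cache_file) < $expiry_secs) {
        return file_get_contents($cache_file);  // cache hit: no PHP work done
    }
    $html = call_user_func($generate_fn);       // cache miss: run the script
    $fp = fopen($cache_file, "w");
    fwrite($fp, $html);
    fclose($fp);
    return $html;
}

function build_report()
{
    return "<p>expensively generated HTML</p>"; // stands in for real work
}

print cached_html("page_cache.html", 600, "build_report");
```

Until the cache file is older than the expiry, every request after the first is served straight from disk without executing the generating code.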
Finally, the HTML sent to a web client can be compressed. This
is enabled by placing the following code at the beginning of your
PHP script:
<?php
ob_start("ob_gzhandler");
:
:
?>
If your HTML is highly
compressible, it is possible to reduce the size of your HTML file
by 50-80%, reducing network bandwidth requirements and latencies.
The downside is that you need to have some CPU power to spare for
compression.
HTML Caching with PEAR
Cache
The PEAR Cache is a set of caching classes that allows you to
cache multiple types of data, including HTML and images.
The most common use of the PEAR Cache is to cache HTML text. To
do this, we use the Output buffering class which caches all text
printed or echoed between the start() and end() functions:
<?php
require_once("Cache/Output.php");

$cache = new Cache_Output("file", array("cache_dir" => "cache/") );

if ($contents = $cache->start(md5("this is a unique key!"))) {

    # aha, cached data returned

    print $contents;
    print "<p>Cache Hit</p>";

} else {

    # no cached data, or cache expired

    print "<p>Don't leave home without it…</p>"; # place in cache
    print "<p>Stand and deliver</p>"; # place in cache
    print $cache->end(10);
}
Since I wrote
these lines, a superior PEAR cache system has been developed:
Cache Lite;
and for more sophisticated distributed caching, see memcached (Added 28 Feb
2005).
The Cache constructor
takes the storage driver to use as the first parameter. File,
database and shared memory storage drivers are available; see the
pear/Cache/Container directory. Benchmarks by Ulf Wendel suggest
that the "file" storage driver offers the best performance. The
second parameter is the storage driver options. The options are
"cache_dir", the location of the caching directory, and
"filename_prefix", which is the prefix to use for all cached files.
Strangely enough, cache expiry times are not set in the options
parameter.
To cache some data, you
generate a unique id for the cached data using a key. In the above
example, we used md5("this is a unique key!").
The start() function uses the key to find a cached copy of the
contents. If the contents are not cached, an empty string is
returned by start(), and all future echo() and print() statements
will be buffered in the output cache, until end() is called.
The end() function returns the contents of the buffer, and ends
output buffering. The end() function takes as its first parameter
the expiry time of the cache. This parameter can be the seconds to
cache the data, or a Unix integer timestamp giving the date and
time to expire the data, or zero to default to 24 hours.
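The capture mechanism behind start() and end() is PHP's native output buffering; here is a rough sketch of the idea (this is not PEAR Cache's actual code, only the underlying technique):

```php
<?php
// Everything printed after ob_start() goes into a buffer, not to the client.
ob_start();                      // what start() does on a cache miss
print "<p>Stand and deliver</p>";

$contents = ob_get_contents();   // end() reads the buffered HTML...
ob_end_clean();                  // ...stops buffering (the library would now
                                 // save $contents under the cache key)
print $contents;                 // and finally sends the HTML to the client
```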
Another way to use the PEAR cache is to store variables or other
data. To do so, you can use the base Cache class:
<?php
require_once("Cache.php");

$cache = new Cache("file", array("cache_dir" => "cache/") );
$id = $cache->generateID("this is a unique key");

if ($data = $cache->get($id)) {
    print "Cache hit.<br>Data: $data";
} else {
    $data = "The quality of mercy is not strained...";
    $cache->save($id, $data, $expires = 60);
    print "Cache miss.<br>";
}
?>
To save the data we use
save(). If your unique key is already a legal file name, you can
bypass the generateID() step. Objects and arrays can be saved
because save() will serialize the data for you. The last parameter
controls when the data expires; this can be the seconds to cache
the data, or a Unix integer timestamp giving the date and time to
expire the data, or zero to use the default of 24 hours. To
retrieve the cached data we use get().
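Behind save() and get(), storing arrays and objects comes down to serialization; a rough sketch of that mechanism (not PEAR Cache's actual code, and the file name is arbitrary):

```php
<?php
// serialize() flattens any PHP value to a string; unserialize() restores it.
function cache_put($file, $data)
{
    $fp = fopen($file, "w");
    fwrite($fp, serialize($data));
    fclose($fp);
}

function cache_get($file)
{
    return unserialize(file_get_contents($file));
}

cache_put("data.cache", array("hits" => 42, "tags" => array("php", "tuning")));
$data = cache_get("data.cache");
print $data["hits"];   // the array survives the file round-trip intact
```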
You can delete a cached data item using
$cache->delete($id) and remove all cached items
using $cache->flush().
New: A faster Caching class is Cache-Lite. Highly
recommended.
Using Benchmarks
In earlier sections we covered many performance issues. Now
we come to the meat and bones: how to go about measuring and
benchmarking your code so you can obtain decent information on what
to tune.
If you want to perform realistic benchmarks on a web server, you
will need a tool to send HTTP requests to the server. On Unix,
common tools to perform benchmarks include ab (short for
apachebench) which is part of the Apache release, and the newer
flood (httpd.apache.org/test/flood). On Windows NT/2000 you can use
Microsoft's free Web Application Stress Tool
(webtool.rte.microsoft.com).
These programs can make multiple concurrent HTTP requests,
simulating multiple web clients, and present you with detailed
statistics on completion of the tests.
You can monitor how your server behaves as the benchmarks are
conducted on Unix using "vmstat 1". This prints out a status report
every second on the performance of your disk i/o, virtual memory
and CPU load. Alternatively, you can use "top d 1" which gives you
a full screen update on all processes running sorted by CPU load
every 1 second.
On Windows 2000, you can use the Performance Monitor or the Task
Manager to view your system statistics.
If you want to test a particular aspect of your code without
having to worry about the HTTP overhead, you can benchmark it using
the microtime() function, which returns the current time, accurate
to the microsecond, as a string. The following function will
convert it into a number suitable for calculations.
function getmicrotime()
{
    list($usec, $sec) = explode(" ", microtime());
    return ((float)$usec + (float)$sec);
}

$time = getmicrotime();

# benchmark code here

echo "Time elapsed: ", getmicrotime() - $time, " seconds";
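As a usage sketch, the helper can time two alternative implementations side by side (the candidates compared here, repeated string concatenation versus implode() on an array, are arbitrary examples chosen only to show the pattern):

```php
<?php
function getmicrotime()
{
    list($usec, $sec) = explode(" ", microtime());
    return ((float)$usec + (float)$sec);
}

// Candidate 1: build a 10000-character string by repeated concatenation.
$time = getmicrotime();
$s1 = "";
for ($i = 0; $i < 10000; $i++) {
    $s1 .= "x";
}
$concat_secs = getmicrotime() - $time;

// Candidate 2: collect the pieces in an array and join them once at the end.
$time = getmicrotime();
$pieces = array();
for ($i = 0; $i < 10000; $i++) {
    $pieces[] = "x";
}
$s2 = implode("", $pieces);
$implode_secs = getmicrotime() - $time;

echo "concat: $concat_secs secs, implode: $implode_secs secs";
```

Run each candidate several times and average the results; a single pass is easily skewed by whatever else the machine is doing.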
Alternatively, you can
use a profiling tool such as APD or
XDebug. Also
see my article squeezing code with
xdebug.