Grizzly MINA NIO Framework xSocket
http://mina.apache.org/quick-start-guide.html
2008-10-06 12:16
Most NIO frameworks can saturate 1 gigabit Ethernet at some point. However, some frameworks can saturate the bandwidth with a smaller number of connections than others. The performance numbers of five well-known open source NIO frameworks are presented here to help you see for yourself how Netty excels in performance.
Where’s the Graph?
If you are in a hurry, please scroll down to see the graphs first. You can also download the PDF document which contains detailed numbers and graphs.
What’s the Bottom Line?
Contrary to common expectations, NIO frameworks have different performance characteristics even though they use the same NIO selector provider.
What we observed is that the difference comes from fundamental factors such as data structures and thread contention management, and those factors should never be overlooked.
Netty has succeeded in introducing a breakthrough in NIO framework performance through careful engineering, while retaining its flexible architecture.
Test Scenario
A simple echo server and client exchange fixed-length messages one by one (i.e. synchronous ping-pong). The handler code, which sends the received data back verbatim, is executed in a separate thread pool provided by each NIO framework.
The tests were run with different message lengths (64 ~ 16384 bytes) and different network configurations (loopback and 1 gigabit Ethernet) to see how well each framework performs under various conditions.
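To make the scenario concrete, here is a minimal sketch of such an echo handler written against the Netty 3 callback API (SimpleChannelHandler). It is illustrative only, not the actual benchmark code, and the class name is made up for this example:

    import org.jboss.netty.channel.ChannelHandlerContext;
    import org.jboss.netty.channel.MessageEvent;
    import org.jboss.netty.channel.SimpleChannelHandler;

    // Sends every received message back to the peer unchanged.
    // In the benchmark, handlers like this one run in a separate
    // handler thread pool rather than in the I/O threads.
    public class EchoHandler extends SimpleChannelHandler {
        @Override
        public void messageReceived(ChannelHandlerContext ctx, MessageEvent e) {
            e.getChannel().write(e.getMessage());  // echo verbatim
        }
    }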
Test Environment
- Software
- The test client was written with Netty 3.0.0.CR5.
- Echo server implementations
- Netty 3.0.0.CR5
- 4 other open source NIO frameworks
- Grizzly, MINA, NIO Framework, and xSocket
- Used the latest milestone releases as of October 3rd, 2008
- Excluded inactive projects (no release in 2008)
- Framework names were anonymized in no particular order.
- Thread pool
- The number of I/O threads – the number of CPU cores
- The number of handler threads – 16
- The default thread pool that each framework provides was used.
- If a framework did not provide a thread pool implementation that limits the maximum number of threads, Executors.newFixedThreadPool() was used instead. (See the configuration sketch at the end of this section.)
- Use of direct buffers was suppressed to avoid excessive memory consumption.
- JRE – Sun JDK 1.6.0_07
- JRE options – -server -Xms2048m -Xmx2048m -XX:+UseParallelGC -XX:+AggressiveOpts -XX:+UseFastAccessorMethods
- Hardware
- Server (Hostname: Eden)
- CPU: 2 x quad-core Xeon 2.83GHz, ‘performance’ governor
- O/S: Linux 2.6.25.11-97.fc9 (Fedora 9)
- RAM: 6 GiB
- NIC: Broadcom NetXtreme Gigabit Ethernet PCI express
- Client (Hostname: Serpent)
- CPU: 2 x dual-core Xeon 3.00GHz, ‘performance’ governor
- O/S: Linux 2.6.25.11-97.fc9 (Fedora 9)
- RAM: 3 GiB
- NIC: Broadcom NetXtreme Gigabit Ethernet PCI express
- No switching hub was used, in order to minimize possible network latency.
- Common TCP/IP parameters
- TCP_NODELAY was turned on (i.e. Nagle’s algorithm was disabled); see the sketch at the end of this section.
- net.ipv4.tcp_tw_recycle was set to 1.
- The default MTU of 1500 was used (no jumbo frames).
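For illustration, the following sketch shows one way a Netty 3 echo server matching the configuration above could be bootstrapped: I/O threads equal to the number of CPU cores, a fixed pool of 16 handler threads via Executors.newFixedThreadPool(), and TCP_NODELAY enabled. Running the handler in a separate pool is done here with Netty's ExecutionHandler; the port number and class names are assumptions for this example, and the actual benchmark code may differ.

    import java.net.InetSocketAddress;
    import java.util.concurrent.Executors;

    import org.jboss.netty.bootstrap.ServerBootstrap;
    import org.jboss.netty.channel.ChannelPipeline;
    import org.jboss.netty.channel.ChannelPipelineFactory;
    import org.jboss.netty.channel.Channels;
    import org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory;
    import org.jboss.netty.handler.execution.ExecutionHandler;

    public class EchoServer {
        public static void main(String[] args) {
            // I/O threads – one per CPU core, as in the test setup.
            int ioThreads = Runtime.getRuntime().availableProcessors();

            ServerBootstrap bootstrap = new ServerBootstrap(
                    new NioServerSocketChannelFactory(
                            Executors.newCachedThreadPool(),  // boss (accept) threads
                            Executors.newCachedThreadPool(),  // worker (I/O) threads
                            ioThreads));

            // Handler threads – a fixed pool of 16, bounded as described above.
            final ExecutionHandler executionHandler =
                    new ExecutionHandler(Executors.newFixedThreadPool(16));

            bootstrap.setPipelineFactory(new ChannelPipelineFactory() {
                public ChannelPipeline getPipeline() {
                    // EchoHandler (see the Test Scenario sketch) runs in the
                    // 16-thread handler pool, not in the I/O threads.
                    return Channels.pipeline(executionHandler, new EchoHandler());
                }
            });

            // TCP_NODELAY on accepted connections (Nagle's algorithm disabled).
            bootstrap.setOption("child.tcpNoDelay", true);
            bootstrap.bind(new InetSocketAddress(9999));  // port is arbitrary here
        }
    }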
Test Result
Client and Server on the Same Machine (Loopback Device)
The test client and servers ran on the same machine, Eden. (There are three graphs here.)
Client and Server on Different Machines (1 Gigabit Ethernet)
The test client ran on Serpent, and the servers ran on Eden. (There are three graphs here.)