测量应用软件和系统软件瓶颈的工具集合

Jean Dagenais published a great response on a mechanical-sympathy thread to Gil Tene's article, The Black Magic Of Systematically Reducing Linux OS Jitter. It's full of helpful tools for tracking down jitter problems. I apologize for the incomplete attribution. I did not find a web presence for Jean. 

To complement the great information I got on the “Systematic Way to Find Linux Jitter”, I have created a toolkit that I now used to evaluate current and future trading platforms.

In case this can be useful, I have listed these tools, as well as the URLs to get the source code and a description of their usage. I am learning a lot by reading the source code, and the blog entry associated.

This is far from an exhaustive list, as every week I find either a new problem area or a new tool that improve my understanding of this beautiful problem domain ;)

These tools are grouped into these categories: 

  1. CPU, Memory, Disk, Network
  2. X86, Linux, and Java time resolution
  3. Context Switches & Inter Thread Latency
  4. System Jitter
  5. Application Building Blocks: distruptor, openHft, Aeron & Workload Generator
  6. Application Performance Testing

Happy Benchmarking and Jitter Chasing!

1. CPU, Memory, Disk, Network

1.1 lmbench - http://www.bitmover.com/lmbench/ 

lmbench provides basic performance data for multiple aspects of the server, from instructions, fork, context switches, disk, network, and memory, and more

1.2.1 Memory - lmbench – lat_mem_rd 

The article describes how to measure memory access, and I used this approach to measure the performance of the memory subsystem under local and remote access (e.g. NUMA effect)

1.2.2 Java Memory Access Patterns – Martin Thompson

Great article and a useful tool that highlight impact of local/remote memory access and NUMA effects.

1.3 Disk Performance - fio - FIO Source Code 

1.4 Network

There are many tools available, and these are the ones I find the most useful. They are useful to tune network parameters (e.g. kernel bypass drivers) when measuring throughput, latency, and jitter.

Java Ping - Nitsan Wakart network-measuring tools

Note: The c/c++ network tools will behave differently from Java (e.g. jit, garbage generation, behavior) as such they are very complementary and offer many options that can be tested easily.

2. X86, Linux, and Java time resolution

2.1 Measuring granularity and latency of currentTimeMillis() & nanoTime(); - Aleksey Shipilev

Great article and use the jmh to test granularity and latency. 

2.1 Measuring Latency in Linux

Provides information about how Linux measure time, and a test program that displays the different timers available and their resolution.

3. Context Switches & Inter Thread Latency

3.1 This is a good write up on Stack Overflow 

3.2 Java and C++ Inter Thread Latency – Martin Thompson

4. System Jitter

4.1 Sysjitter - SysJitter from Solarflare - OpenOnload

sysjitter measures the extent to which the system impacts on user-level code by causing jitter.  It runs a thread on each processor core, and when the thread is "knocked off" the core it measures how long for.  At the end of the run it outputs some summary statistics for each core, and optionally the full raw data.

4.2 jHicckup – Gil Tene – Azul - jHiccup

jHiccup is an open source tool designed to measure the pauses and stalls (or “hiccups”) associated with an application’s underlying Java runtime platform. The new tool captures the aggregate effects of the Java Virtual Machine (JVM), operating system, hypervisor (if used) and hardware on application stalls and response time. 

4.3 MicroJitterSampler - Peter Lawrey - Micro Jitter Busy Waiting and Binding

The micro jitter sampler looks at interrupts to a running thread.  It is similar to jHiccup but instead of measuring how delayed a thread is in waking up, it measures how delays a thread gets once it has started running.  Surprisingly how you run your threads impacts the sort of delays it will see once it wakes up.

5. Application Building Blocks: distruptor, openHft, Aeron, etc

The Disruptor, openHft, and Aeron contain programs that are useful to measure the “raw” performance of each building block/library/framework before we incorporate them with our applications. 

It’s a lot more difficult to figure out the contribution/limitations of each component when doing a macro benchmark of 20 jvms, 5 physical servers, too many cores to count them, and 10,000’s of operations per second. 

I am quite impressed by Aeron, which is both an eye opener (e.g. It can process 11,000,000 messages per second @ 500MB/second on our server), and a major source of learning (from design to implementation) for me. It shows what’s possible to do in Java, when you understand and apply “current best practices”. 

Great work by Martin Thompson and team.

5.2 tress-ng

This tool is used to generate “synthetic workload” on a server. It’s useful for measuring “interferences” with a running app. E.g. what happen if we add this kind of workload on a server.

stress-ng will stress test a computer system in various selectable ways. It was designed to exercise various physical subsystems of a computer as well as the various operating system kernel interfaces.

6. Application Performance Testing

It takes me about 1 day to run most of the tools mentioned before and collect performance data and analyze it. The baseline data is used to assess if the platform/server performance and stability (e.g. jitter) is good enough to go the next level of testing. In general the answer is no, and this will require the use of different tricks (e.g. stopping services, bios setup, numactl, taskset, os settings, and everything mentioned in the other threads) to get an acceptable level.

The application benchmark turns out to be the most valuable but very challenging, as any performance limitations/jitter of the underlying platform and services will now be amplified by the complexity of the application components and their interactions.

For example, these are a few things that are challenging to observe and understand

  • 100’s of app & gc threads running on different cores, sockets, and servers
  • Linux scheduler moving threads around cores/socket/NUMA nodes
  • Different application threading models for reader, worker, and sender threads (BUSY_SPIN, WAIT, BLOCKED, ASYNC)
  • off-heap memory access and the bdflush process cleaning dirty pages and causing disk io bottleneck
  • Messaging infrastructure and appliances …
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值