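Reasons for performance testing
- Some concurrent programs are adapted from serial programs, so the performance of the two algorithms needs to be compared.
- When multithreading is introduced for business reasons, concurrency control among the threads costs performance, and you need to evaluate whether that share of overhead is acceptable.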
4.1 JMH
JMH (Java Microbenchmark Harness) is a framework released under the OpenJDK project specifically for
performance testing; its precision can reach the nanosecond level.
4.2 Basic JMH usage
Importing the JMH packages
Import them with Maven; the pom.xml content is as follows:
<dependency>
<groupId>org.openjdk.jmh</groupId>
<artifactId>jmh-core</artifactId>
<version>1.20</version>
</dependency>
<dependency>
<groupId>org.openjdk.jmh</groupId>
<artifactId>jmh-generator-annprocess</artifactId>
<version>1.20</version>
<scope>provided</scope>
</dependency>
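JMH example program
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@BenchmarkMode(Mode.AverageTime) // measurement mode
@OutputTimeUnit(TimeUnit.MICROSECONDS) // unit of the reported score
public class JMHSample_01_HelloWorld {
    @Benchmark
    public void wellHelloThere() {
        // this method was intentionally left blank.
        //System.out.println("ok");
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_01_HelloWorld.class.getSimpleName())
                .forks(1)
                .build();
        new Runner(opt).run();
    }
}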
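Enabling APT
- APT (Annotation Processing Tool) processes the annotations in the code and is used to generate code.
- Before a test starts, the JMH framework generates the actual test code from the user's test cases through the Java APT mechanism.
Setup steps
- Install the Maven plugin m2e-apt.
- Preference => Maven => Annotation Processing Tool => select "automatically".
Test result analysis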
# JMH version: 1.20
# VM version: JDK 1.8.0_45, VM 25.45-b02
# VM invoker: D:\Desktop\study\java\jdk\jre\bin\java.exe
# VM options: -Dfile.encoding=UTF-8
# Warmup: 20 iterations, 1 s each
# Measurement: 20 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: geym.conc.ch3.jmh.JMHSample_01_HelloWorld.wellHelloThere
# Run progress: 0.00% complete, ETA 00:00:40
# Fork: 1 of 1
# Warmup Iteration 1: ≈ 10⁻⁴ us/op
...
# Warmup Iteration 20: ≈ 10⁻⁴ us/op
Iteration 1: ≈ 10⁻⁴ us/op
...
Iteration 20: ≈ 10⁻⁴ us/op
Result "geym.conc.ch3.jmh.JMHSample_01_HelloWorld.wellHelloThere": ≈ 10⁻⁴ us/op
# Run complete. Total time: 00:00:40
Benchmark Mode Cnt Score Error Units
JMHSample_01_HelloWorld.wellHelloThere avgt 20 ≈ 10⁻⁴ us/op
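Analysis
- The first 10 lines are basic information about the test, including the Java path, the number of warmup and measurement iterations, and the number of threads.
- The Warmup Iteration lines show performance during warmup; warming up lets the JVM fully optimize the code under test.
- The Iteration lines show performance during the actual measurement.
- The last line lists the benchmarked method, the test mode, the number of measurements, the score, and so on.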
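4.3 Basic JMH concepts and configuration
Mode
- Throughput: overall throughput, i.e. how many calls can be executed per unit of time (for example, per second).
- AverageTime: average call time, i.e. the time each call takes on average.
- SampleTime: random sampling; the output is the distribution of the sampled times, for example "99% of calls complete within xxx milliseconds".
- SingleShotTime: runs only once. It is usually combined with zero warmup iterations to measure cold-start performance (no warmup).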
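The relationship between the first three modes can be illustrated with a little arithmetic. The following is a plain-Java sketch, not JMH code: the class `ModeMetrics` and its methods are hypothetical, and the nearest-rank percentile is only one common definition, assumed here for illustration rather than taken from JMH.

```java
import java.util.Arrays;

// Plain-Java illustration of what each mode reports; this is not JMH code.
class ModeMetrics {
    // Throughput: operations completed per second.
    static double throughputPerSecond(long ops, double elapsedSeconds) {
        return ops / elapsedSeconds;
    }

    // AverageTime: elapsed time divided by the number of operations.
    static double averageTimeMillis(long ops, double elapsedSeconds) {
        return elapsedSeconds * 1000.0 / ops;
    }

    // SampleTime: nearest-rank percentile over individual call latencies (assumed definition).
    static double percentile(double[] samplesMillis, double p) {
        double[] sorted = samplesMillis.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Hypothetical run: 10 calls completed in 1 second.
        System.out.println(throughputPerSecond(10, 1.0)); // 10.0 ops/s
        System.out.println(averageTimeMillis(10, 1.0));   // 100.0 ms/op
        double[] samples = {99.2, 100.1, 100.3, 100.8, 101.9};
        System.out.println(percentile(samples, 0.50));    // 100.3
        System.out.println(percentile(samples, 1.00));    // 101.9
    }
}
```

Throughput and average time are reciprocal views of the same measurement, while the sampled percentiles describe how individual calls are spread around that average.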
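Iteration
An iteration is JMH's unit of measurement. In this setup one iteration lasts 1 s, during which the benchmarked method is called repeatedly and metrics such as throughput and average time are sampled and calculated.
Warmup
- Because of the JIT compiler in the JVM, the same method takes different amounts of time before and after JIT compilation.
- Warmup runs the code so that it is fully JIT-compiled; usually only a method's performance after JIT compilation is of interest.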
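The idea of discarding warmup rounds can be sketched in plain Java. This is only an illustration under simplifying assumptions: the class `WarmupSketch`, the `work` method and the round sizes are hypothetical, and JMH handles timing, forking and dead-code elimination far more carefully than a hand-written loop.

```java
// Plain-Java sketch of warmup; not a replacement for JMH.
class WarmupSketch {
    // Hypothetical workload used only for illustration.
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (long) i * i % 7;
        }
        return sum;
    }

    // Runs `rounds` timed rounds and returns the nanoseconds each round took.
    static long[] timeRounds(int rounds, int callsPerRound) {
        long[] nanos = new long[rounds];
        long sink = 0; // accumulate results so the work is not trivially discarded
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            for (int c = 0; c < callsPerRound; c++) {
                sink += work(1_000);
            }
            nanos[r] = System.nanoTime() - start;
        }
        if (sink == 42) {
            System.out.println(sink); // keeps `sink` live
        }
        return nanos;
    }

    // Average of the rounds that remain after skipping the first `warmupRounds`.
    static double averageAfterWarmup(long[] nanos, int warmupRounds) {
        long total = 0;
        for (int r = warmupRounds; r < nanos.length; r++) {
            total += nanos[r];
        }
        return (double) total / (nanos.length - warmupRounds);
    }

    public static void main(String[] args) {
        long[] nanos = timeRounds(10, 10_000);
        for (int r = 0; r < nanos.length; r++) {
            System.out.println("round " + r + ": " + nanos[r] + " ns");
        }
        System.out.println("average after 5 warmup rounds: " + averageAfterWarmup(nanos, 5) + " ns");
    }
}
```

On a typical JVM the early rounds tend to be slower than the later ones, which is why only the rounds after warmup are averaged; the exact numbers vary from run to run.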
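State
A State specifies the scope of an object.
- Thread scope (Scope.Thread): one object is created for each thread.
- Benchmark scope (Scope.Benchmark): all threads share a single instance.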
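The difference between the two scopes comes down to how many instances the worker threads touch. A minimal plain-Java sketch of that idea follows; the class `StateScopeSketch` and its helper are hypothetical and only mimic the scoping rule, they do not use JMH.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Plain-Java illustration of per-thread versus shared state; this is not JMH code.
class StateScopeSketch {
    static class Counter {
        volatile double x = Math.PI;
    }

    // Returns how many distinct Counter instances the worker threads touched.
    static int distinctInstances(int threads, boolean shared) {
        Counter sharedCounter = new Counter();
        Set<Counter> seen = ConcurrentHashMap.newKeySet(); // identity-based, Counter has no equals()
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                // Shared scope: every thread gets the same object.
                // Thread scope: every thread gets its own object.
                Counter c = shared ? sharedCounter : new Counter();
                c.x++;
                seen.add(c);
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctInstances(4, true));  // 1
        System.out.println(distinctInstances(4, false)); // 4
    }
}
```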
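Configuration classes (Options/OptionsBuilder)
Configure the test parameters before the test runs:
- the test class to run (include)
- the number of processes to use (forks)
- the number of warmup iterations (warmupIterations)
Options opt = new OptionsBuilder()
        .include(JMHSample_01_HelloWorld.class.getSimpleName())
        .forks(1)
        .build();
new Runner(opt).run();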
4.4 Modes in JMH
Test code
@Benchmark
@BenchmarkMode(Mode.XXX) // substitute the mode under test
@OutputTimeUnit(TimeUnit.SECONDS)
public void measureThroughput() throws InterruptedException {
    TimeUnit.MILLISECONDS.sleep(100);
}
Test results
- Mode.Throughput: about 10 operations per second.
JMHSample_02_BenchmarkModes.measureThroughput thrpt 20 9.960 ± 0.007 ops/s
- Mode.AverageTime: about 100 ms per operation.
JMHSample_02_BenchmarkModes.measureAvgTime avgt 20 100449.572 ± 77.384 us/op
- Mode.SampleTime: the percentile rows show the fraction of calls that complete within a given time.
JMHSample_02_BenchmarkModes.measureSamples sample 200 100323.820 ± 83.746 us/op
JMHSample_02_BenchmarkModes.measureSamples:measureSamples p0.00 sample 99221.504 us/op
JMHSample_02_BenchmarkModes.measureSamples:measureSamples p0.50 sample 100270.380 us/op
JMHSample_02_BenchmarkModes.measureSamples:measureSamples p0.90 sample 100794.368 us/op
JMHSample_02_BenchmarkModes.measureSamples:measureSamples p0.99 sample 101055.201 us/op
JMHSample_02_BenchmarkModes.measureSamples:measureSamples p1.00 sample 101974.016 us/op
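4.5 State in JMH
Code example
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class JMHSample_03_States {
    @State(Scope.Benchmark) // one instance shared by all threads
    public static class BenchmarkState {
        volatile double x = Math.PI;
    }

    @State(Scope.Thread) // each thread gets its own copy
    public static class ThreadState {
        volatile double x = Math.PI;
    }

    @Benchmark
    public void measureUnshared(ThreadState state) {
        state.x++;
    }

    @Benchmark
    public void measureShared(BenchmarkState state) {
        state.x++;
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_03_States.class.getSimpleName())
                .threads(4)
                .forks(1)
                .build();
        new Runner(opt).run();
    }
}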
Result analysis
Benchmark Mode Cnt Score Error Units
JMHSample_03_States.measureShared thrpt 20 77596034.965 ± 560383.574 ops/s
JMHSample_03_States.measureUnshared thrpt 20 699479891.399 ± 3711396.990 ops/s
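- When the threads share one copy of the data, writes are less efficient: the shared benchmark reaches about 77.6 million ops/s versus about 699.5 million ops/s for the unshared one, roughly a 9x difference, most likely because all four threads contend on the same volatile field.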
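4.6 Thoughts on performance
Performance comparison
- The performance of a module may differ from one usage environment to another.
- For a strict performance comparison, the two modules should offer the same functionality and be tested in the same environment.
- Two measures of performance:
  - time complexity
  - space complexity
Performance comparison example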
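import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@State(Scope.Benchmark)
public class MapTest {
    static Map<String, String> hashMap = new HashMap<>();
    static Map<String, String> syncHashMap = Collections.synchronizedMap(new HashMap<>());
    static Map<String, String> concurrentHashMap = new ConcurrentHashMap<>();

    @Setup
    public void setup() {
        for (int i = 0; i < 10000; i++) {
            hashMap.put(Integer.toString(i), Integer.toString(i));
            syncHashMap.put(Integer.toString(i), Integer.toString(i));
            concurrentHashMap.put(Integer.toString(i), Integer.toString(i));
        }
    }

    @Benchmark
    public void hashMapGet() {
        hashMap.get("4");
    }

    @Benchmark
    public void syncHashMapGet() {
        syncHashMap.get("4");
    }

    @Benchmark
    public void concurrentHashMapGet() {
        concurrentHashMap.get("4");
    }

    @Benchmark
    public void hashMapSize() {
        hashMap.size();
    }

    @Benchmark
    public void syncHashMapSize() {
        syncHashMap.size();
    }

    @Benchmark
    public void concurrentHashMapSize() {
        concurrentHashMap.size();
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(MapTest.class.getSimpleName())
                .forks(1)
                .warmupIterations(5)
                .measurementIterations(5)
                .threads(2)
                .build();
        new Runner(opt).run();
    }
}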
@Setup marks an initialization method; the annotated method runs before the benchmark.
Result analysis
- Single-threaded test
Benchmark Mode Cnt Score Error Units
MapTest.concurrentHashMapGet thrpt 5 138.200 ± 5.530 ops/us
MapTest.concurrentHashMapSize thrpt 5 915.124 ± 146.810 ops/us
MapTest.hashMapGet thrpt 5 157.456 ± 31.701 ops/us
MapTest.hashMapSize thrpt 5 1705.856 ± 175.743 ops/us
MapTest.syncHashMapGet thrpt 5 67.337 ± 7.518 ops/us
MapTest.syncHashMapSize thrpt 5 76.763 ± 0.898 ops/us
- Multi-threaded test
Benchmark Mode Cnt Score Error Units
MapTest.concurrentHashMapGet thrpt 5 254.638 ± 31.406 ops/us
MapTest.concurrentHashMapSize thrpt 5 1639.774 ± 189.014 ops/us
MapTest.hashMapGet thrpt 5 290.629 ± 63.919 ops/us
MapTest.hashMapSize thrpt 5 3213.160 ± 220.701 ops/us
MapTest.syncHashMapGet thrpt 5 18.772 ± 0.366 ops/us
MapTest.syncHashMapSize thrpt 5 22.952 ± 1.291 ops/us
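Analysis
- HashMap: no locking.
- ConcurrentHashMap: segmented (fine-grained) locking. Segment locks were used before JDK 8; since JDK 8 the implementation uses CAS plus per-bucket synchronization.
- Collections.synchronizedMap(new HashMap()): a single global lock.
- Single-threaded: no lock > segmented lock > global lock, because locking costs performance.
- Multi-threaded: the global lock makes threads block each other, so throughput drops, while the unlocked and segmented versions roughly double.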
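To put a number on "roughly double", the scores of the get benchmarks from the two result tables can be divided. The class `ScalingFactor` below is a hypothetical helper written for this illustration; the scores are copied from the tables above.

```java
// Computes how throughput changed from the single-threaded to the multi-threaded run.
class ScalingFactor {
    // Ratio of multi-threaded to single-threaded throughput; above 1 means it scaled up.
    static double scaling(double singleThreadOpsPerUs, double multiThreadOpsPerUs) {
        return multiThreadOpsPerUs / singleThreadOpsPerUs;
    }

    public static void main(String[] args) {
        // Scores (ops/us) taken from the single-threaded and multi-threaded tables.
        System.out.printf("hashMapGet:           %.2fx%n", scaling(157.456, 290.629));
        System.out.printf("concurrentHashMapGet: %.2fx%n", scaling(138.200, 254.638));
        System.out.printf("syncHashMapGet:       %.2fx%n", scaling(67.337, 18.772));
    }
}
```

The first two ratios come out a little above 1.8, while the synchronized map falls below 0.3, meaning that its total throughput with more threads is lower than with a single thread.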
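4.7 CopyOnWriteArrayList and ConcurrentLinkedQueue
- CopyOnWriteArrayList improves concurrent performance by copying the backing array on every write.
- ConcurrentLinkedQueue improves performance through lock-free CAS operations.
Performance test example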
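The copy-on-write behaviour can be observed directly: an iterator created before a write keeps reading the old array. A minimal sketch follows, with a hypothetical class name `CopyOnWriteSnapshot`.

```java
import java.util.Iterator;
import java.util.concurrent.CopyOnWriteArrayList;

// Shows that a write replaces the backing array instead of modifying it in place.
class CopyOnWriteSnapshot {
    // Counts the elements seen by an iterator created before a later write.
    static int elementsSeenByOldIterator() {
        CopyOnWriteArrayList<String> list = new CopyOnWriteArrayList<>();
        list.add("a");
        list.add("b");
        Iterator<String> it = list.iterator(); // snapshot of the current backing array
        list.add("c");                         // copies the array; the snapshot is untouched
        int seen = 0;
        while (it.hasNext()) {
            it.next();
            seen++;
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(elementsSeenByOldIterator()); // 2
    }
}
```

Readers never need a lock because they only ever see an immutable snapshot; the price is that every write copies the whole array, so write cost grows with the list size.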
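import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@State(Scope.Benchmark)
public class ListTest {
    CopyOnWriteArrayList<Object> smallCopyOnWriteList = new CopyOnWriteArrayList<>();
    ConcurrentLinkedQueue<Object> smallConcurrentList = new ConcurrentLinkedQueue<>();
    CopyOnWriteArrayList<Object> bigCopyOnWriteList = new CopyOnWriteArrayList<>();
    ConcurrentLinkedQueue<Object> bigConcurrentList = new ConcurrentLinkedQueue<>();

    @Setup
    public void setup() {
        for (int i = 0; i < 10; i++) {
            smallCopyOnWriteList.add(new Object());
            smallConcurrentList.add(new Object());
        }
        for (int i = 0; i < 1000; i++) {
            bigCopyOnWriteList.add(new Object());
            // As written, this second add also targets bigCopyOnWriteList,
            // so bigConcurrentList starts empty.
            bigCopyOnWriteList.add(new Object());
        }
    }

    @Benchmark
    public void copyOnWriteGet() {
        smallCopyOnWriteList.get(0);
    }

    @Benchmark
    public void copyOnWriteSize() {
        smallCopyOnWriteList.size();
    }

    @Benchmark
    public void concurrentListGet() {
        smallConcurrentList.peek();
    }

    @Benchmark
    public void concurrentListSize() {
        smallConcurrentList.size();
    }

    @Benchmark
    public void smallCopyOnWriteWrite() {
        smallCopyOnWriteList.add(new Object());
        smallCopyOnWriteList.remove(0); // remove(int index): removes the head
    }

    @Benchmark
    public void smallConcurrentListWrite() {
        smallConcurrentList.add(new Object());
        // On a queue, remove(0) resolves to remove(Object) with a boxed Integer,
        // so no element is removed and the queue keeps growing.
        smallConcurrentList.remove(0);
    }

    @Benchmark
    public void bigCopyOnWriteWrite() {
        bigCopyOnWriteList.add(new Object());
        bigCopyOnWriteList.remove(0); // remove(int index): removes the head
    }

    @Benchmark
    public void bigConcurrentListWrite() {
        bigConcurrentList.offer(new Object());
        bigConcurrentList.remove(0); // same remove(Object) behaviour as above
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(ListTest.class.getSimpleName())
                .forks(1)
                .warmupIterations(5)
                .measurementIterations(5)
                .threads(4)
                .build();
        new Runner(opt).run();
    }
}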
Test result analysis
Benchmark Mode Cnt Score Error Units
ListTest.bigConcurrentListWrite thrpt 5 0.012 ± 0.007 ops/us
ListTest.bigCopyOnWriteWrite thrpt 5 0.264 ± 0.026 ops/us
ListTest.concurrentListGet thrpt 5 4206.582 ± 598.722 ops/us
ListTest.concurrentListSize thrpt 5 310.722 ± 53.405 ops/us
ListTest.copyOnWriteGet thrpt 5 4243.784 ± 326.868 ops/us
ListTest.copyOnWriteSize thrpt 5 5403.908 ± 671.604 ops/us
ListTest.smallConcurrentListWrite thrpt 5 0.012 ± 0.007 ops/us
ListTest.smallCopyOnWriteWrite thrpt 5 10.162 ± 1.582 ops/us
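Analysis
- For CopyOnWriteArrayList writes, a large array performs far worse than a small one (0.264 ops/us versus 10.162 ops/us).
- With few writes and few elements (1000 still counts as small), CopyOnWriteArrayList outperforms ConcurrentLinkedQueue in this test.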
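One detail of the write benchmarks deserves a check before reading too much into the queue numbers: they call remove(0) on both collection types, and the two calls are not equivalent. The plain-Java sketch below, with a hypothetical class `RemoveOverloadCheck`, compares the resulting sizes; `poll()` is shown as the head-removal call a queue offers, which is an alternative suggested here and not part of the original benchmark.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CopyOnWriteArrayList;

// Compares what add + remove(0) does on a list and on a queue.
class RemoveOverloadCheck {
    // List: remove(0) is remove(int index), so the size stays constant.
    static int listSizeAfterAddRemove(int rounds) {
        CopyOnWriteArrayList<Object> list = new CopyOnWriteArrayList<>();
        list.add(new Object());
        for (int i = 0; i < rounds; i++) {
            list.add(new Object());
            list.remove(0);
        }
        return list.size();
    }

    // Queue: 0 is boxed to an Integer and passed to remove(Object), which finds
    // no equal element after scanning the queue, so the queue grows every round.
    static int queueSizeAfterAddRemove(int rounds) {
        ConcurrentLinkedQueue<Object> queue = new ConcurrentLinkedQueue<>();
        queue.add(new Object());
        for (int i = 0; i < rounds; i++) {
            queue.add(new Object());
            queue.remove(0);
        }
        return queue.size();
    }

    // Queue with poll(): the head is removed, so the size stays constant.
    static int queueSizeAfterAddPoll(int rounds) {
        ConcurrentLinkedQueue<Object> queue = new ConcurrentLinkedQueue<>();
        queue.add(new Object());
        for (int i = 0; i < rounds; i++) {
            queue.add(new Object());
            queue.poll();
        }
        return queue.size();
    }

    public static void main(String[] args) {
        System.out.println(listSizeAfterAddRemove(100));  // 1
        System.out.println(queueSizeAfterAddRemove(100)); // 101
        System.out.println(queueSizeAfterAddPoll(100));   // 1
    }
}
```

Because the queue in the benchmark grows without bound and each remove(0) scans it in full, the identical 0.012 ops/us scores for the two queue write benchmarks probably reflect that scan rather than the cost of a queue write, so the write comparison should be read with that caveat.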