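I have been processing report data lately and use streams all the time. I had long heard that streams perform poorly and got curious about where the difference actually lies, so I wrote a few examples to test it.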
import java.util.List;
import io.swagger.annotations.ApiModelProperty;
import lombok.Data;

@Data
public class StudentInfo {
    @ApiModelProperty(value = "class id")
    private int classId;
    @ApiModelProperty(value = "name")
    private String name;
    @ApiModelProperty(value = "subject scores")
    private List<SubjectScore> scoreList;

    @Data
    public static class SubjectScore {
        @ApiModelProperty(value = "subject id")
        private int subId;
        @ApiModelProperty(value = "subject name")
        private String subName;
        @ApiModelProperty(value = "score")
        private double score;
    }
}
I. Serial stream test
1. Test code
import java.util.*;
import java.util.stream.Collectors;
import cn.hutool.core.date.StopWatch;
import cn.hutool.core.util.RandomUtil;

public class TestMain {
    public static void main(String[] args) throws Exception {
        StopWatch stopWatch = new StopWatch();
        // Initialize the data; change the list size to test different volumes
        List<StudentInfo> studentList = getList(20000);

        // Stream version: flatten all score lists, then sum the scores per subject name
        stopWatch.start("stream");
        Map<String, Double> streamMap = studentList.stream()
            .map(StudentInfo::getScoreList)
            .flatMap(Collection::stream)
            .collect(Collectors.groupingBy(StudentInfo.SubjectScore::getSubName,
                Collectors.summingDouble(StudentInfo.SubjectScore::getScore)));
        stopWatch.stop();

        // For-loop version: the same aggregation written by hand
        stopWatch.start("for loop");
        Map<String, Double> forMap = new HashMap<>();
        for (StudentInfo studentInfo : studentList) {
            List<StudentInfo.SubjectScore> scoreList = studentInfo.getScoreList();
            for (StudentInfo.SubjectScore score : scoreList) {
                if (forMap.containsKey(score.getSubName())) {
                    Double aDouble = forMap.get(score.getSubName());
                    aDouble += score.getScore();
                    forMap.put(score.getSubName(), aDouble);
                } else {
                    forMap.put(score.getSubName(), score.getScore());
                }
            }
        }
        stopWatch.stop();

        System.out.println(stopWatch.prettyPrint());
        // JacksonUtils is my own utility, used here only to print both maps and
        // confirm that the results match; it can be ignored
        System.out.println(JacksonUtils.getJsonConvert().toJsonString(streamMap));
        System.out.println(JacksonUtils.getJsonConvert().toJsonString(forMap));
    }

    private static List<StudentInfo> getList(int size) {
        String[] subjectArray = {"chinese", "math", "english"};
        List<StudentInfo> studentInfoList = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            StudentInfo studentInfo = new StudentInfo();
            studentInfo.setClassId(i % 7);
            studentInfo.setName("xiaoming" + i % 7);
            List<StudentInfo.SubjectScore> scoreList = new ArrayList<>(3);
            for (int j = 0; j < 3; j++) {
                StudentInfo.SubjectScore score = new StudentInfo.SubjectScore();
                score.setSubName(subjectArray[j]);
                score.setSubId(j);
                score.setScore(RandomUtil.randomDouble(30.00, 99.00));
                scoreList.add(score);
            }
            studentInfo.setScoreList(scoreList);
            studentInfoList.add(studentInfo);
        }
        return studentInfoList;
    }
}
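As a side note, the containsKey/get/put sequence in the for loop above can be written more compactly with Map.merge. The following is only a minimal sketch of that alternative, not the code that was measured; the Score class is a hypothetical stand-in for StudentInfo.SubjectScore so that the example compiles without Lombok.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MergeSumSketch {
    // Hypothetical stand-in for StudentInfo.SubjectScore (illustration only)
    static final class Score {
        final String subName;
        final double score;

        Score(String subName, double score) {
            this.subName = subName;
            this.score = score;
        }
    }

    // Sums scores per subject name. merge() stores the value when the key is
    // absent and otherwise combines it with the existing total via Double::sum.
    static Map<String, Double> sumBySubject(List<List<Score>> scoreLists) {
        Map<String, Double> totals = new HashMap<>();
        for (List<Score> scores : scoreLists) {
            for (Score s : scores) {
                totals.merge(s.subName, s.score, Double::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        List<List<Score>> data = Arrays.asList(
            Arrays.asList(new Score("chinese", 80.0), new Score("math", 90.0)),
            Arrays.asList(new Score("chinese", 70.0), new Score("math", 60.5)));
        System.out.println(sumBySubject(data));
    }
}
```

This does one map lookup per element instead of up to three, so it is not timing-equivalent to the loop used in the measurements.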
2. Run results
3. Comparison across data sizes (values are each approach's share of the total elapsed time)
Size | 5 | 50 | 500 | 1000 | 5000 | 10000 | 20000 | 50000 | 100000 | 1000000
for | 1% | 3% | 11% | 13% | 37% | 51% | 59% | 54% | 51% | 42%
stream | 99% | 97% | 89% | 87% | 63% | 49% | 41% | 46% | 49% | 58%
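Each size above was timed in a single run with the stream going first, so the numbers for small sizes may include one-time costs such as class loading and JIT compilation rather than steady-state speed. The sketch below shows one possible way to reduce that effect: run each approach a few times untimed, then average several timed runs. All names here (WarmupBenchSketch, Score, averageNanos, the warm-up and run counts) are assumptions made for illustration, and a dedicated harness such as JMH would be more reliable than this hand-rolled timing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;
import java.util.stream.Collectors;

class WarmupBenchSketch {
    // Hypothetical stand-in for StudentInfo.SubjectScore (illustration only)
    static final class Score {
        final String subName;
        final double score;

        Score(String subName, double score) {
            this.subName = subName;
            this.score = score;
        }
    }

    static final String[] SUBJECTS = {"chinese", "math", "english"};
    // Accumulates result sizes so the timed work is not discarded as unused
    static int sink;

    // Builds size * 3 scores with deterministic values (30, 31, ... cycling every 70)
    static List<Score> data(int size) {
        List<Score> list = new ArrayList<>(size * 3);
        for (int i = 0; i < size; i++) {
            for (String subject : SUBJECTS) {
                list.add(new Score(subject, 30.0 + (i % 70)));
            }
        }
        return list;
    }

    static Map<String, Double> withStream(List<Score> scores) {
        return scores.stream().collect(Collectors.groupingBy(
            (Score s) -> s.subName, Collectors.summingDouble((Score s) -> s.score)));
    }

    static Map<String, Double> withFor(List<Score> scores) {
        Map<String, Double> totals = new HashMap<>();
        for (Score s : scores) {
            totals.merge(s.subName, s.score, Double::sum);
        }
        return totals;
    }

    // Runs the task `warmup` times untimed, then returns the mean of `runs` timed runs in nanoseconds
    static long averageNanos(Supplier<Map<String, Double>> task, int warmup, int runs) {
        for (int i = 0; i < warmup; i++) {
            sink += task.get().size();
        }
        long total = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            sink += task.get().size();
            total += System.nanoTime() - start;
        }
        return total / runs;
    }

    public static void main(String[] args) {
        List<Score> scores = data(20000);
        long forNs = averageNanos(() -> withFor(scores), 20, 50);
        long streamNs = averageNanos(() -> withStream(scores), 20, 50);
        System.out.println("for:    " + forNs + " ns");
        System.out.println("stream: " + streamNs + " ns");
    }
}
```

Changing the argument of data(...) repeats the comparison at other sizes; no result is quoted here because the timings depend on the machine and JVM.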
II. Parallel stream test
1. parallelStream uses the fork-join framework by default. Set the ForkJoinPool thread count and verify that the setting takes effect
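public static void main(String[] args) throws Exception {
    ForkJoinPool forkJoinPool1 = new ForkJoinPool(8);
    List<StudentInfo> studentList = getList(20);
    // The parallel stream only runs on forkJoinPool1 when it is started from a task
    // submitted to that pool; called directly from main it uses the common pool
    forkJoinPool1.submit(() -> studentList.parallelStream()
        .forEach(studentInfo -> System.out.println(Thread.currentThread().getName()))).get();
    forkJoinPool1.shutdown();
}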
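To make the check above concrete, here is a minimal sketch that collects the names of the threads that actually process a parallel stream started inside a custom pool. It relies on how current OpenJDK versions behave (a parallel stream started from a ForkJoinPool worker keeps its subtasks in that pool), which, as far as I know, is not a guarantee documented in the Stream API. The class and method names are made up for this example.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

class PoolCheckSketch {
    // Runs a small parallel stream inside the given pool and records which threads did the work
    static Set<String> workerNames(ForkJoinPool pool) {
        Set<String> names = ConcurrentHashMap.newKeySet();
        pool.submit(() ->
            IntStream.range(0, 1000).parallel()
                .forEach(i -> names.add(Thread.currentThread().getName()))
        ).join(); // wait until the whole stream has been processed
        return names;
    }

    public static void main(String[] args) {
        // Parallelism of the common pool, which parallelStream() uses when called outside any pool
        System.out.println("common pool parallelism: "
            + ForkJoinPool.commonPool().getParallelism());

        ForkJoinPool pool = new ForkJoinPool(8);
        try {
            // Worker threads of a custom pool are named like "ForkJoinPool-<n>-worker-<m>"
            System.out.println(workerNames(pool));
        } finally {
            pool.shutdown();
        }
    }
}
```

If the printed names contain "commonPool" or "main", the stream did not run in the custom pool and the thread count set on it had no effect.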
2. Test code (only the parallel stream part is pasted; the rest is the same as above)
ForkJoinPool forkJoinPool1 = new ForkJoinPool(8);
StopWatch stopWatch = new StopWatch();
List<StudentInfo> studentList = getList(100000);
stopWatch.start("stream");
ForkJoinTask<Map<String, Double>> forkJoinTask =
    forkJoinPool1.submit(() -> studentList.parallelStream()
        .map(StudentInfo::getScoreList)
        .flatMap(Collection::stream)
        .collect(Collectors.groupingBy(StudentInfo.SubjectScore::getSubName,
            Collectors.summingDouble(StudentInfo.SubjectScore::getScore))));
// Wait for the result so that the timing covers the whole parallel computation
Map<String, Double> streamMap = forkJoinTask.get();
stopWatch.stop();
3. Comparison results (values are each approach's share of the total elapsed time)
Size | 5 | 50 | 500 | 1000 | 5000 | 10000 | 20000 | 50000 | 100000 | 1000000
for | 1% | 2% | 8% | 12% | 27% | 38% | 55% | 43% | 36% | 53%
stream | 99% | 98% | 92% | 88% | 73% | 62% | 45% | 57% | 64% | 47%
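III. Conclusions
(My own skills are limited, so the test procedure may contain mistakes that make these results unreliable. Corrections are welcome.)
1. With small data sets, the for loop takes far less time than the stream (but both finish so quickly that it hardly affects performance);
2. With large data sets, the two take about the same time (this also held at the million-element level);
3. Whether the stream pipeline is simple or more complex (for example extract, then group, then aggregate), the two came out about the same in my tests;
4. Parallel streams: I am not sure whether my test is flawed, but in my measurements they brought little improvement and took about as long as serial streams.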