大数据excel导出解决方案
场景和问题
管理后台需要提供一个导出脑电记录生成excel文件功能,数据量大约在50万.针对此需求功能,面对以下几个问题:
1.查询数据库把全部相关脑电记录导出,如果一次性导出,内存会溢出,需要批量导出。
2.批量导出,如果采用分页查询,需要考察是否出现深分页带来的性能问题。
3.采用多线程处理
4.采用阿里巴巴开源的EasyExcel组件生成excel文件
源码地址: https://gitee.com/zhihui-ke/excel-load-demo.git(完整mock数据和测试例子)
测试结果
模拟50万条数据为例子
-
串行化执行大概37秒
-
双线程执行大概30秒
-
数据库查询采用覆盖索引大概15秒
-
数据库查询使用id分页大概5秒(前提id有序)
过程
串行执行
public void generateExcel() {
long startTime = System.currentTimeMillis();
String filePath = "F://data//excel//" + System.currentTimeMillis() + ".xlsx";
int batchNum = 1000;
int total = (int) this.count();
int current = 1;
int index = 1;
try (ExcelWriter excelWriter = EasyExcel.write(filePath, EegRecordMockExcel.class).build()) {
// 这里注意 如果同一个sheet只要创建一次
WriteSheet writeSheet = EasyExcel.writerSheet("模板").build();
while (total > 0) {
int offset = (current - 1) * batchNum;
batchNum = total > batchNum ? batchNum : total;
// select t.* from t_eeg_record_mock t limit #{offset}, #{size}
List<EegRecordMock> data = eegRecordMockMapper.listForPage(offset, batchNum);
excelWriter.write(data, writeSheet);
total -= batchNum;
current++;
}
}
}
分析:由于easyExcel不支持并发同步写入,因此不能开启多线程同时写入。而且经过分析,主要耗时是数据库查询。第一步,开双线程,采用观察生产者模式,一个线程主要获取数据,另一个线程监听,数据准备时写入excel文件。主要采用SynchronousQueue队列实现线程之间的通讯。
双线程同步执行
public void generateExcelAsync() throws InterruptedException {
SynchronousQueue<List<EegRecordMock>> queue = new SynchronousQueue<>();
String filePath = "F://data//excel//" + System.currentTimeMillis() + ".xlsx";
ExecutorService executorService = Executors.newFixedThreadPool(2);
int batchNum0 = 1000;
int total0 = (int) this.count();
int num = total0 / batchNum0;
num = total0 % batchNum0 == 0 ? num : num + 1;
CountDownLatch countDownLatch = new CountDownLatch(num);
long startTime = System.currentTimeMillis();
// 生产者线程
executorService.submit(() -> {
int batchNum = batchNum0;
int total = total0;
int current = 1;
while (total > 0) {
int offset = (current - 1) * batchNum;
batchNum = total > batchNum ? batchNum : total;
int size = batchNum;
// select t.* from t_eeg_record_mock t limit #{offset}, #{size}
List<EegRecordMock> data = eegRecordMockMapper.listForPage(offset, size);
try {
queue.put(data);
} catch (InterruptedException e) {
e.printStackTrace();
}
total -= batchNum;
current++;
}
});
// 消费者线程
executorService.submit(() -> {
try (ExcelWriter excelWriter = EasyExcel.write(filePath, EegRecordMockExcel.class).build()) {
WriteSheet writeSheet = EasyExcel.writerSheet("模板").build();
try {
while (countDownLatch.getCount() > 0) {
List<EegRecordMock> data = queue.take();
excelWriter.write(data, writeSheet);
countDownLatch.countDown();
}
} catch (InterruptedException e) {
e.printStackTrace();
} catch (Exception e) {
log.error("写入出错: {}", e.getMessage());
}
}
});
countDownLatch.await();
}
优化效果不是特别明显,大约时间减少7-8秒左右。性能瓶颈是深分页查询比较慢。因此获取数据优化
覆盖索引深分页优化
# 原本
select t.* from t_eeg_record_mock t limit #{offset}, #{size}
# 优化后
select t.* from t_eeg_record_mock t, (select id from t_eeg_record_mock limit #{offset}, #{size}) t2 where t.id = t2.id order by t.id asc
结果: 大概时间是15秒左右
在id有序情况下,采用id分页
# 生产者优化代码
executorService.submit(() -> {
int batchNum = batchNum0;
int total = total0;
int current = 1;
AtomicReference<Integer> idIndex = new AtomicReference<>(0);
while (total > 0) {
int offset = (current - 1) * batchNum;
batchNum = total > batchNum ? batchNum : total;
int size = batchNum;
int maxId = idIndex.get();
// select t.* from t_eeg_record_mock t where t.id > #{id} order by id asc limit #{size}
List<EegRecordMock> data = eegRecordMockMapper.listBetweenId(maxId, size);
// 获取上一次分页的最大id,用于下次过滤查询
idIndex.set(data.get(data.size() - 1).getId());
try {
queue.put(data);
} catch (InterruptedException e) {
e.printStackTrace();
}
total -= batchNum;
current++;
}
});
结果:时间大概是在5秒左右。