Notes on optimizing a million-row EasyExcel export and its paginated query SQL
Preface
The business requirement was to export more than 400,000 rows of data into a single Excel file. For the technology choice I went with Alibaba's EasyExcel, which the official site describes as a simple, memory-efficient, Java-based open-source project for reading and writing Excel files.
If you are interested, have a look at the official site.
The business logic itself is not complex: a simple join across a few tables produces the data we need. The problem is the data volume, and since the data keeps growing it may well reach a million rows. So while building this feature I had to think hard about export time, and went hunting online for solutions.
Implementation
Among countless posts I borrowed a solution from another developer (see the original post). The code below is adapted from that post; the core idea is to query the data page by page and write each page into its own sheet.
1. Controller layer
public void excelTableBloodRelationship(@RequestBody TableBloodRelationship tableBloodRelationship, HttpServletResponse response) {
    long start = System.currentTimeMillis();
    OutputStream outputStream;
    try {
        outputStream = response.getOutputStream();
        // Use HH for the 24-hour clock (hh is the 12-hour clock)
        String time = new SimpleDateFormat("yyyy-MM-dd-HH-mm-ss").format(new Date());
        // Response headers for the download
        response.setHeader("Content-disposition", "attachment; filename=" + "contract" + time + ".xlsx");
        // Content type
        response.setContentType("application/vnd.ms-excel;charset=UTF-8");
        // Disable caching
        response.setHeader("Pragma", "No-cache");
        response.setHeader("Cache-Control", "no-cache");
        response.setDateHeader("Expires", 0);
        // Total row count
        Integer count = appNameService.getCount(tableBloodRelationship);
        // If the total exceeds 60,000 rows, export page by page into separate sheets
        if (count > 60000) {
            // Rows per page (and per sheet)
            int max = 60000;
            // Must be created outside the loop, otherwise each iteration would flush the stream
            ExcelWriter excelWriter = EasyExcel.write(outputStream).build();
            for (int i = 0; i < (count / max) + 1; i++) {
                tableBloodRelationship.setStartNum(i * max);
                tableBloodRelationship.setPageSize(max);
                // Query one page from the database
                List<TableBloodRelationship> tableBloodRelationships = appNameService.selectTableBloodRelationshipExport(tableBloodRelationship);
                WriteSheet writeSheet = EasyExcel.writerSheet(i, "台账" + (i + 1)).head(TableBloodRelationship.class)
                        .registerWriteHandler(new LongestMatchColumnWidthStyleStrategy()).build();
                excelWriter.write(tableBloodRelationships, writeSheet);
            }
            // Flush and close the writer
            excelWriter.finish();
        } else {
            tableBloodRelationship.setStartNum(0);
            tableBloodRelationship.setPageSize(60000);
            // Query the database once
            List<TableBloodRelationship> tableBloodRelationships = appNameService.selectTableBloodRelationshipExport(tableBloodRelationship);
            EasyExcel.write(outputStream, TableBloodRelationship.class).registerWriteHandler(new LongestMatchColumnWidthStyleStrategy()).sheet("台账").doWrite(tableBloodRelationships);
        }
        outputStream.flush();
        outputStream.close();
        long end = System.currentTimeMillis();
        logger.info("Export took " + (end - start) + " ms");
    } catch (Exception e) {
        logger.error("Export failed: {}", e.getMessage(), e);
    }
}
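One subtlety in the loop above: `(count / max) + 1` produces one extra, empty page whenever `count` is an exact multiple of `max`. A ceiling-division sketch (class and method names here are illustrative, not from the original code) avoids that edge case:

```java
public class PageCount {
    // Ceiling division: number of pages needed to cover `count` rows
    // at `pageSize` rows per page.
    static int pages(int count, int pageSize) {
        return (count + pageSize - 1) / pageSize;
    }

    public static void main(String[] args) {
        System.out.println(pages(400000, 60000)); // 7: six full pages plus one partial page
        System.out.println(pages(120000, 60000)); // 2: (count / max) + 1 would query a third, empty page
    }
}
```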
2. The query SQL
-- fetch 60,000 rows per query
SELECT
d.dict_label,
a.id,
a.app_id,
a.scenario_id,
a.source_system,
a.professional,
a.deploy_type,
a.source_table_en,
a.source_table_cn,
a.near_source_table_en,
a.near_source_table_cn,
a.shared_table_en,
a.shared_table_cn,
a.shared_table_type,
a.analysis_table_en,
a.analysis_table_cn,
a.province_id,
a.group_id,
a.create_by,
a.create_time,
a.update_by,
a.update_time,
a.is_delete,
a.business_areas,
a.department,
b.app_name,
c.scenario_name,
c.scenario_special
FROM
table_blood_relationship a
INNER JOIN app_name b ON a.app_id = b.id
AND a.province_id = b.provincial_id
INNER JOIN scenario_defined c ON a.scenario_id = c.id
AND a.province_id = c.unit_code
INNER JOIN sys_dict_data d ON a.province_id = d.dict_value
WHERE
a.is_delete = 0
AND d.dict_type = 'unit'
AND b.is_delete = 0
AND c.is_delete = 0
AND a.province_id = '99999999'
LIMIT 0,
60000
I fired up the program and tested: exporting 400,000 rows took 3 minutes, far short of the target, so the next round of optimization began.
- First I timed the SQL on its own. Fetching 60,000 rows took 1 second against my local database and 1.1 seconds against the test database; over repeated runs it never dropped below 1 second. On this project any SQL statement taking longer than 1 second lands on the slow-query list, so the first step was SQL tuning.
- I restructured the SQL and added indexes. The revised SQL:
SELECT
d.dict_label,
temp.id,
temp.source_system,
temp.professional,
temp.deploy_type,
temp.source_table_en,
temp.near_source_table_en,
temp.shared_table_en,
temp.shared_table_cn,
temp.shared_table_type,
temp.analysis_table_en,
temp.province_id,
temp.is_delete,
temp.department ,
b.app_name,
c.scenario_name,
c.scenario_special
FROM
(
SELECT
a.id,
a.app_id,
a.scenario_id,
a.source_system,
a.professional,
a.deploy_type,
a.source_table_en,
a.near_source_table_en,
a.shared_table_en,
a.shared_table_cn,
a.shared_table_type,
a.analysis_table_en,
a.province_id,
a.is_delete,
a.department
FROM
dcmop.table_blood_relationship a
WHERE
a.is_delete = 0 AND
a.province_id = '99999999'
LIMIT 0,
60000
) temp
INNER JOIN dcmop.app_name b ON temp.app_id = b.id
AND temp.province_id = b.provincial_id
AND b.is_delete = 0
INNER JOIN dcmop.scenario_defined c ON temp.scenario_id = c.id
AND temp.province_id = c.unit_code
AND c.is_delete = 0
INNER JOIN dcmop.sys_dict_data d ON temp.province_id = d.dict_value
AND d.dict_type = 'unit'
With this change, exporting 400,000 rows took 1 minute. That is a two-minute improvement, but still too slow, so I went one step further and optimized the export code itself: run the page queries on multiple threads, then merge the results and write them to Excel. Here I borrowed from another developer's code (original).
The revised controller code:
/**
 * Export to Excel
 */
public void writeExcel(AppNameService exportExcelservice, TableBloodRelationship queryCondition, int exifInfoCount, OutputStream outputStream) {
    // Rows written per page
    int num = 5000;
    int corePoolSize = 10;
    int maximumPoolSize = 20;
    // Manage the query threads with a thread pool. Note: Executors.newFixedThreadPool uses an
    // unbounded queue, so a larger maximumPoolSize would never take effect there; construct the pool directly.
    ThreadPoolExecutor exector = new ThreadPoolExecutor(corePoolSize, maximumPoolSize,
            60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    List<ReadExifInfoThread> tasks = Lists.newCopyOnWriteArrayList();
    ExcelWriter excelWriter = EasyExcel.write(outputStream, TableBloodRelationship.class).build();
    // exifInfoCount: total number of rows to write
    // pageCount: number of pages, i.e. the number of query tasks
    int pageCount = exifInfoCount % num == 0 ? (exifInfoCount / num) : (exifInfoCount / num + 1);
    queryCondition.setProvinceId(SecurityUtils.getCurrDeptId());
    for (int i = 0; i < pageCount; i++) {
        ReadExifInfoThread readExifInfoThread = new ReadExifInfoThread(queryCondition, exportExcelservice, i, num);
        tasks.add(readExifInfoThread);
    }
    try {
        // invokeAll returns futures in the same order as the submitted tasks. Even if the thread
        // querying rows 10000-19999 finishes before the one querying rows 0-9999, the first
        // future still holds rows 0-9999, so the rows are written in the correct order.
        List<Future<List<TableBloodRelationship>>> futures = exector.invokeAll(tasks);
        WriteSheet writeSheet = EasyExcel.writerSheet(0, "台账信息" + (1)).build();
        for (int i = 0; i < pageCount; i++) {
            List<TableBloodRelationship> exifInfoList = futures.get(i).get();
            excelWriter.write(exifInfoList, writeSheet);
        }
    } catch (Exception e) {
        logger.error("Ledger export failed", e);
    } finally {
        exector.shutdown();
        excelWriter.finish();
    }
}
/**
 * Inner class: one query task per page, run on the thread pool
 */
public class ReadExifInfoThread implements Callable<List<TableBloodRelationship>> {
    // Set via the constructor; @Autowired would have no effect on a manually instantiated class
    private AppNameService exportExcelservice;
    private int startNum;
    private int pageSize;
    /**
     * Source system
     */
    private final String dataSourceSystem;
    /**
     * Shared-layer table name
     */
    private final String sharedTableEn;
    /**
     * Application name
     */
    private final String appName;
    /**
     * Scenario name
     */
    private final String scenarioName;
    /**
     * Province ID
     */
    private final String provinceId;

    public ReadExifInfoThread(TableBloodRelationship queryCondition, AppNameService exportExcelservice, int startNum, int pageSize) {
        this.dataSourceSystem = queryCondition.getDataSourceSystem();
        this.sharedTableEn = queryCondition.getSharedTableEn();
        this.appName = queryCondition.getAppName();
        this.scenarioName = queryCondition.getScenarioName();
        this.exportExcelservice = exportExcelservice;
        this.startNum = startNum;
        this.pageSize = pageSize;
        this.provinceId = queryCondition.getProvinceId();
    }

    @Override
    public List<TableBloodRelationship> call() {
        // Each task builds its own query object to avoid sharing mutable state across threads
        TableBloodRelationship queryCondition = new TableBloodRelationship();
        queryCondition.setStartNum(startNum * pageSize);
        queryCondition.setPageSize(pageSize);
        queryCondition.setDataSourceSystem(dataSourceSystem);
        queryCondition.setSharedTableEn(sharedTableEn);
        queryCondition.setScenarioName(scenarioName);
        queryCondition.setAppName(appName);
        queryCondition.setProvinceId(provinceId);
        long startTime = System.currentTimeMillis();
        List<TableBloodRelationship> exifInfoList = null;
        try {
            // Query this page's rows from the database
            exifInfoList = exportExcelservice.selectTableBloodRelationshipExport(queryCondition);
            long spendTime = System.currentTimeMillis() - startTime;
            logger.info(Thread.currentThread().getName() + " query took " + spendTime + " ms; page starts at offset " + queryCondition.getStartNum());
        } catch (Exception e) {
            logger.error("Multithreaded export query failed", e);
        }
        return exifInfoList;
    }
}
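The ordering guarantee of `invokeAll` is what makes this approach safe. A minimal, self-contained sketch (illustrative names only, not from the project) shows that the returned futures follow task submission order even when later tasks finish first:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class InvokeAllOrder {
    static String run() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            final int page = i;
            // Earlier pages sleep longer, so later pages finish first on purpose.
            tasks.add(() -> {
                Thread.sleep((4 - page) * 50L);
                return page;
            });
        }
        // invokeAll blocks until every task completes; the futures come back in task order.
        List<Future<Integer>> futures = pool.invokeAll(tasks);
        StringBuilder sb = new StringBuilder();
        for (Future<Integer> f : futures) {
            sb.append(f.get());
        }
        pool.shutdown();
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run()); // prints "0123", not the completion order
    }
}
```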
After this change the export took about 40 seconds, which is already quite fast. But one problem remained: with multithreaded paging over a large table, MySQL's LIMIT gets slower the deeper the offset goes. So I optimized the paging SQL once more: first fetch only the ids of the requested page, then join them back to retrieve the full rows.
The revised SQL:
SELECT
*
FROM
(
SELECT
d.dict_label,
temp.id,
temp.source_system,
temp.professional,
temp.deploy_type,
temp.source_table_en,
temp.near_source_table_en,
temp.shared_table_en,
temp.shared_table_cn,
temp.shared_table_type,
temp.analysis_table_en,
temp.province_id,
temp.is_delete,
temp.department,
b.app_name,
c.scenario_name,
c.scenario_special
FROM
(
SELECT
a.id,
a.app_id,
a.scenario_id,
a.source_system,
a.professional,
a.deploy_type,
a.source_table_en,
a.near_source_table_en,
a.shared_table_en,
a.shared_table_cn,
a.shared_table_type,
a.analysis_table_en,
a.province_id,
a.is_delete,
a.department
FROM
dcmop.table_blood_relationship a
) temp
INNER JOIN dcmop.app_name b ON temp.app_id = b.id
AND temp.province_id = b.provincial_id
AND b.is_delete = 0
INNER JOIN dcmop.scenario_defined c ON temp.scenario_id = c.id
AND temp.province_id = c.unit_code
AND c.is_delete = 0
INNER JOIN dcmop.sys_dict_data d ON temp.province_id = d.dict_value
AND d.dict_type = 'unit'
) temp2
JOIN ( SELECT id FROM table_blood_relationship WHERE is_delete = 0 AND province_id = '99999999' LIMIT 0, 5000 ) temp1 ON temp2.id = temp1.id;
Testing
In the final test, exporting all 400,000 rows took a little over twenty seconds, and even the deepest pages now query in under a second.