百万数据查询导出(分页异步阻塞查询,SQL优化)

目标

实现查询超出百万的数据,并且对数据进行处理,导出excel表

问题所在:
查询数据速度;内存溢出;

步骤

1.分页查询(解决内存溢出)
2.异步阻塞查询组合数据(解决查询效率慢)
3.SQL优化(通过limit,索引优化效率(explain),具体字段优化)
4.主动gc,对使用完的list数据主动回收。

内容概述

本文中,每查询10w条数据,其中每1w条数据通过异步并发分页查询数据,查询的limit在0到10w中,用异步阻塞的方法获取10w条数据后,取10w条数据的最大orderId的值,做为下次查询的条件,使第10w到20w数据的查询的limit值重新从0到10w取值

PS:
1.需要注意处理异步阻塞查询数据时,并发的脏读问题
2.已知在mysql的limit中 limit 0,5000 和limit 10000,5000中offset值越大,查询效率越慢,则可以通过其他索引字段限制offset值,使offset从头开始
3.注意mybatis中的分页插件查询数据会先进行count查询再进行分页查询,查2次对大数据查询不友好

代码分析

1.创建分布式锁,限制次数(防止用户多次点击)
注意:redisLoader只是对redisTemplate的封装,指定了调用的redis配置

		Long start = System.currentTimeMillis() / 1000;
        //分布式锁,60s后过期
        if (!redisLoader.setIfAbsent("saas:exportData:exportCountOrderInfo", start+"", 60 * 1000)) {
            logger.info("接口在执行!!,不能重复执行");
            return;
        }

2.完成业务后主动删除锁

			if (String.valueOf(start).equals(redisLoader.get("saas:exportData:exportCountOrderInfo"))) {
                //删锁
                redisLoader.delete("saas:exportData:exportCountOrderInfo");
            }

3.异步阻塞+分页查询数据

//每百万条数据切换一个sheet
                WriteSheet writeSheet = sheets.get(i / 10);
                long start1 = System.currentTimeMillis() / 1000;
                //主线程等待子线程执行完毕
                List<CompletableFuture<List<OrderInfoCountPrintVo>>> futureList = new ArrayList<>();
                List<OrderInfoStatisticsSearch> listSearch = new ArrayList<>();
                logger.info(System.currentTimeMillis() + ":阻塞异步开始");
                //pageSizeWrite/pageSize 写入数据分页大小和读取数据分页大小的倍率
                for (int j = 1; j <= pageSizeWrite / pageSize; j++) {
                    //增加异步查询的次数,来减少每次查询的时间,削峰填谷
                    int k = i * pageSizeWrite / pageSize + j;
                    if (k <= pagesFact) {
                        //并发线程内需要变更的对象需要在线程内创建创建新对象处理
                        //logger.warn("读取页数:{},页大小:{}",k ,pageSize)
                        CompletableFuture<List<OrderInfoCountPrintVo>> job1 = asyncService.getExportDataList(orderInfoStatisticsSearch, j, pageSize);
                        futureList.add(job1);
                    }
                }
                List<OrderInfoCountPrintVo> list = new ArrayList<>();
                for (CompletableFuture<List<OrderInfoCountPrintVo>> listCompletableFuture : futureList) {
                    List<OrderInfoCountPrintVo> join = listCompletableFuture.get();
                    if (join.size() > 0) {
                        list.addAll(join);
                        join.clear();
                    }
                }

                logger.info(System.currentTimeMillis() + ":异步阻塞结束");
                logger.warn("读取数据耗时:{}", System.currentTimeMillis() / 1000 - start1);
                if (list.size() > 0) {
                    list.stream().sorted(Comparator.comparing(OrderInfoCountPrintVo::getOrderId)).collect(Collectors.toList());
                    //logger.warn("listSize:{},排序耗时:{}", list.size(), System.currentTimeMillis() / 1000 - start2);
                    long start3 = System.currentTimeMillis() / 1000;
                    excelWriter.write(list, writeSheet);
                    logger.warn("listSize:{},write耗时:{}", list.size(), System.currentTimeMillis() / 1000 - start3);
                    //限制异步查询一下次查询的最小id值,为了offset重新开始
                    orderInfoStatisticsSearch.setOrderId(list.get(list.size() - 1).getOrderId());

                }
                //gc,清空内存
                list.clear();
                logger.warn("pageNo:,{},pageSizeWrite:{},pagesWrite{},耗时:{}", i, pageSizeWrite, pagesWrite, System.currentTimeMillis() / 1000 - start1);

4.并发任务的

 @Async("ExportDataBatch")
    public CompletableFuture<List<OrderInfoCountPrintVo>> getExportDataList(OrderInfoStatisticsSearch orderInfoStatisticsSearch, int j, int pageSize) {
        List<OrderInfoCountPrintVo> list =new ArrayList<>();
        //orderInfoStatisticsSearch 对象非安全,
        //data-》线程内创建的对象线性安全,解决并发线程幻读脏读问题,
        OrderInfoStatisticsSearch data=new OrderInfoStatisticsSearch();
        BeanUtils.copyProperties(orderInfoStatisticsSearch,data);
        data.setOffset((j-1)*pageSize);
        data.setLimit(pageSize);
        try {
            List<OrderInfoCountPrintVo> orderInfoExportDataList = orderInfoService.listExportOrderInfo(data);
            if (orderInfoExportDataList.size() > 0) {
                list.addAll(orderInfoExportDataList);
                orderInfoExportDataList.clear();
            }

            //logger.warn("异步读取 pageNo:{},pageSize:{},耗时:{}", j, pageSize, System.currentTimeMillis() / 1000 - start);
        } catch (Exception e) {
            e.printStackTrace();
        }finally {
            return CompletableFuture.completedFuture(list);
        }
    }

5.查询sql

		select a.orderId from (
        <include refid="exportData"/>
        )a
        order by a.orderId
        <if test="limit != null and limit != ''">
            limit #{offset},#{limit}
        </if>

主要代码汇总

	/**
     * 导出 订单报表统计的相关订单
     */
    @ResponseBody
    @RequestMapping(value = "/exportCountOrderInfo", method = RequestMethod.GET)
    public void exportCountOrderInfo(OrderInfoStatisticsSearch orderInfoStatisticsSearch, HttpServletResponse response) throws IOException {
        OutputStream outputStream = response.getOutputStream();
//        ExcelWriter excelWriter = new ExcelWriter(outputStream, ExcelTypeEnum.XLSX);
        ExcelWriter excelWriter = EasyExcel.write(outputStream, OrderPrintVo.class).build();
        response.setHeader("Set-Cookie", "fileDownload=true; path=/");
        int pageNo = 1;
        int pageSizeWrite = 100000;
        int pageSize = 10000;
        //其他非分页sql数据查询
        orderInfoStatisticsSearch = orderInfoService.orderInfoStatisticsSearchChange(orderInfoStatisticsSearch);
        //long totalCount=orderInfoService.listCountOrderInfo(orderInfoStatisticsSearch).getTotal();
        Long start = System.currentTimeMillis() / 1000;
        //分布式锁,60s后过期
        if (!redisLoader.setIfAbsent("saas:exportData:exportCountOrderInfo", start+"", 60 * 1000)) {
            logger.info("接口在执行!!,不能重复执行");
            return;
        }
        try {
            logger.info(orderInfoStatisticsSearch.getTenantCode() + ",orderInfo/exportCountOrderInfo,startTime:{},endTime:{},count:{}",
                    orderInfoStatisticsSearch.getStartTime(), orderInfoStatisticsSearch.getEndTime(), orderInfoStatisticsSearch.getCount());
            response.setContentType("application/vnd.ms-excel");
            response.setCharacterEncoding("utf-8");
            // 这里URLEncoder.encode可以防止中文乱码 当然和easyexcel没有关系
            String fileName = URLEncoder.encode("订单列表", "UTF-8");
            response.setHeader("Content-disposition", "attachment;filename=" + fileName + ".xlsx");

            orderInfoStatisticsSearch.setOffset(pageNo - 1);
            orderInfoStatisticsSearch.setLimit(pageSize);
            //禁止使用mybatis的分页查询,mybatis的分页查询会先进行count查询,导出查2次数据
            //总数
            Long nums = orderInfoMapper.listExportOrderInfoCount(orderInfoStatisticsSearch);
            //可以取到小数点后的正确数值,如果两个都是整形,那小数点后面就会清零
            float numberA = nums;
            //总页数
            float numberB = pageSize;
            //计算实际读取数据时分页大小
            Integer pagesFact = (int) Math.ceil(numberA / numberB);
            //计算写入数据时分页大小
            float numberC = pageSizeWrite;
            int pagesWrite = (int) Math.ceil(numberA / numberC);

            logger.info("计算写入页码用时:" + (System.currentTimeMillis() / 1000 - start) + "秒,{},总数:{}", pagesWrite, nums);
            logger.info("条件:{}", JSONObject.toJSONString(orderInfoStatisticsSearch));


            List<WriteSheet> sheets = new ArrayList<>();
            for (int i = 0; i < pagesWrite; i++) {
                if (i % 10 == 0) {
                    //每百万条数据切换一个sheet
                    WriteSheet writeSheet = EasyExcel.writerSheet("订单列表_" + (i + 1)).build();
                    sheets.add(writeSheet);
                }
            }
            for (int i = 0; i < pagesWrite; i++) {
                //每百万条数据切换一个sheet
                WriteSheet writeSheet = sheets.get(i / 10);
                long start1 = System.currentTimeMillis() / 1000;


                //主线程等待子线程执行完毕
                List<CompletableFuture<List<OrderInfoCountPrintVo>>> futureList = new ArrayList<>();
                List<OrderInfoStatisticsSearch> listSearch = new ArrayList<>();


                logger.info(System.currentTimeMillis() + ":阻塞异步开始");

                //pageSizeWrite/pageSize 写入数据分页大小和读取数据分页大小的倍率
                for (int j = 1; j <= pageSizeWrite / pageSize; j++) {
                    //增加异步查询的次数,来减少每次查询的时间,削峰填谷
                    int k = i * pageSizeWrite / pageSize + j;
                    if (k <= pagesFact) {
                        //并发线程内需要变更的对象需要在线程内创建创建新对象处理
                        //logger.warn("读取页数:{},页大小:{}",k ,pageSize);
                        //logger.info("读取页数条件:{},id", orderInfoStatisticsSearch.getOrderId()!=null?orderInfoStatisticsSearch.getOrderId():null);
                        CompletableFuture<List<OrderInfoCountPrintVo>> job1 = asyncService.getExportDataList(orderInfoStatisticsSearch, j, pageSize);
                        futureList.add(job1);
                    }
                }
                List<OrderInfoCountPrintVo> list = new ArrayList<>();
                for (CompletableFuture<List<OrderInfoCountPrintVo>> listCompletableFuture : futureList) {
                    List<OrderInfoCountPrintVo> join = listCompletableFuture.get();
                    if (join.size() > 0) {
                        list.addAll(join);
                        join.clear();
                    }
                }

                logger.info(System.currentTimeMillis() + ":异步阻塞结束");
                logger.warn("读取数据耗时:{}", System.currentTimeMillis() / 1000 - start1);
                if (list.size() > 0) {
                    list.stream().sorted(Comparator.comparing(OrderInfoCountPrintVo::getOrderId)).collect(Collectors.toList());
                    //logger.warn("listSize:{},排序耗时:{}", list.size(), System.currentTimeMillis() / 1000 - start2);
                    long start3 = System.currentTimeMillis() / 1000;
                    excelWriter.write(list, writeSheet);
                    logger.warn("listSize:{},write耗时:{}", list.size(), System.currentTimeMillis() / 1000 - start3);
                    //限制异步查询一下次查询的最小id值,为了offset重新开始
                    orderInfoStatisticsSearch.setOrderId(list.get(list.size() - 1).getOrderId());

                }
                //gc,清空内存
                list.clear();
                logger.warn("pageNo:,{},pageSizeWrite:{},pagesWrite{},耗时:{}", i, pageSizeWrite, pagesWrite, System.currentTimeMillis() / 1000 - start1);
            }

            long end = System.currentTimeMillis() / 1000;
            logger.info("订单报表导出耗时:" + (end - start) + "秒");
            if (String.valueOf(start).equals(redisLoader.get("saas:exportData:exportCountOrderInfo"))) {
                //删锁
                redisLoader.delete("saas:exportData:exportCountOrderInfo");
            }

        } catch (Exception e) {
            logger.error("导出订单失败", e);
        } finally {
            outputStream.flush();
            excelWriter.finish();
            outputStream.close();
        }

    }

扩展内容讨论

1.导出的数据内容上亿后有几十g如何处理?
答:集群服务器分文件生成部分内容,输出到统一的文件服务器上,校验完整输出后打包多个文件为一个zip包,然后保存下载地址,提供用户下载。(当然得注意对数据库读取数据的压力)

  • 0
    点赞
  • 8
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值