导入导出？CV完了就摸鱼

最新推荐文章于 2024-05-31 10:46:10 发布

Z先生日常

最新推荐文章于 2024-05-31 10:46:10 发布

阅读量331

点赞数

分类专栏：工作ING文档-我不会停也不能停文章标签： java 开发语言后端

本文链接：https://blog.csdn.net/weixin_43206491/article/details/123069731

版权

工作ING文档-我不会停也不能停专栏收录该内容

1 篇文章 0 订阅

订阅专栏

今天的主角是EasyExcel,github地址:https://github.com/alibaba/easyexcel
号称64M内存1分钟内读取75M(46W行25列)的Excel，还有急速模式能更快，但是内存占用会在100M多一点

Maven依赖

<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>easyexcel</artifactId>
    <version>3.0.5</version>
</dependency>

<!-- 对于文件操作可能用到了 hutool的工具包 -->
<dependency>
   <groupId>cn.hutool</groupId>
   <artifactId>hutool-all</artifactId>
   <version>5.7.16</version>
</dependency>

读Excel

excel示例
对象

/**
 * 用户表
 *
 * @author DuanYu <a>z-houjie@qq.com</a>
 * @date 2022-02-09 11:09
 */
@Data
public class UserExcelDTO {

    private Integer id;
    
    /** 用户类型： 0-未知，1-员工，2-客户 */
    private Integer type;
    
    /** 用户头像 */
    private String avatar;
    
    /** 姓名 */
    private String name;

    /** 手机号 */
    private String mobile;

    /** 删除标记 */
    private Integer deleted;

    /** 创建时间 */
    private Date createdTime;

    /** 更新时间 */
    private Date updatedTime;
}

一个用于处理文件上传的controller

/**
 * 上传
 *
 * @param file 文件
 */
@PostMapping("/upload")
public void upload(MultipartFile file) {
    Assert.isTrue(Objects.requireNonNull(file.getOriginalFilename()).endsWith(".xlsx"), "请以.xlsx格式导入文件");
    exportService.upload(file);
}

处理读取数据

// 这里 需要指定读用哪个class去读,然后读取第一个sheet,从第二行开始;文件流会自动关闭
EasyExcelFactory.read(file.getInputStream(), User.class, new AnalysisEventListener<User>() {
    @Resource
    private UserMapper userMapper;

    /**
     * 每隔500条存储数据库，然后清理list ，方便内存回收
     */
    private static final int BATCH_COUNT = 500;

    private List<User> cachedDataList = new ArrayList<>(BATCH_COUNT);

    /**
     * 每一条数据解析都调用一次这个方法
     *
     * @param user            用户数据
     * @param analysisContext 读取excel信息
     */
    @Override
    public void invoke(User user, AnalysisContext analysisContext) {
        log.info("解析到一条数据:{}", JSON.toJSONString(user));
        cachedDataList.add(user);
        // 达到BATCH_COUNT了，需要去存储一次数据库，防止数据几万条数据在内存，容易OOM
        if (cachedDataList.size() >= BATCH_COUNT) {
            saveData();
            // 存储完成清理 list
            cachedDataList = new ArrayList<>(BATCH_COUNT);
        }
    }

    /**
     * 所有数据解析完成了调用这个方法
     *
     * @param analysisContext 读取excel信息
     */
    @Override
    public void doAfterAllAnalysed(AnalysisContext analysisContext) {
        // 这里也要保存数据，将最后一批数据也存储到数据库
        saveData();
        log.info("所有数据解析完成");
    }

    private void saveData() {
        log.info("{}条数据，开始存储数据库", cachedDataList.size());
        userMapper.batchInsert(cachedDataList);
        log.info("存储数据库成功");
    }
}).sheet(0).headRowNumber(1).doRead();

在开发中，这个Listener监听肯定是要拿出来单独写的
有个很重要的点 XXXXAnalysisEventListener不能被spring管理，要每次读取excel都要new,然后里面用到spring可以构造方法传进去

转换器Converter

我们可以发现excel中的用户类型、是否离职是String，但在Java类中是Integer类型。所以这里介绍一下转换器的使用

/**
* 抽象方法介绍
*/
public interface Converter<T> {
    // 在java中的类型是什么
    Class supportJavaTypeKey();
    
    // 在excel中的类型是什么
    CellDataTypeEnum supportExcelTypeKey();
    
    // 将excel的数据类型转为java数据类型
    T convertToJavaData(CellData var1, ExcelContentProperty var2, GlobalConfiguration var3) throws Exception;
    
    // 将java的数据类型转为excel数据类型
    CellData convertToExcelData(T var1, ExcelContentProperty var2, GlobalConfiguration var3) throws Exception;
}

/**
 * 枚举
 */
@Getter
@RequiredArgsConstructor
public enum UserType {

    /**
     * 员工
     */
    STAFF(1, "员工"),

    /**
     * 客户
     */
    CUSTOMER(2, "客户");

    private final Integer value;

    private final String description;

    private static final Map<Integer, String> DESCRIPTION_DICT = new HashMap<>();

    private static final Map<String, Integer> VALUE_DICT = new HashMap<>();

    static {
        for (UserType value : UserType.values()) {
            DESCRIPTION_DICT.put(value.getValue(), value.getDescription());
            VALUE_DICT.put(value.getDescription(), value.getValue());
        }
    }

    public static String getTypeName(Integer key) {
        return DESCRIPTION_DICT.getOrDefault(key, null);
    }

    public static Integer getTypeValue(String key) {
        return VALUE_DICT.getOrDefault(key, null);
    }
}

/**
 * 具体实现
 * @author DuanYu Email:<a>z-houjie@qq.com</a>
 * @date 2022-02-21 17:20
 */
public class UserTypeExcelConverter implements Converter<Integer> {

    @Override
    public Class supportJavaTypeKey() {
        return Integer.class;
    }

    @Override
    public CellDataTypeEnum supportExcelTypeKey() {
        return CellDataTypeEnum.STRING;
    }

    @Override
    public Integer convertToJavaData(CellData cellData, ExcelContentProperty excelContentProperty, GlobalConfiguration globalConfiguration) throws Exception {
        return UserType.getTypeValue(cellData.getStringValue());

    }

    @Override
    public CellData convertToExcelData(Integer value, ExcelContentProperty excelContentProperty, GlobalConfiguration globalConfiguration) throws Exception {
        return new CellData(Optional.ofNullable(UserType.getTypeName(value)).orElse(""));
    }
}

使用

@ExcelProperty(value = "用户类型", converter = UserTypeExcelConverter.class)
private Integer type;

@ExcelProperty(value = "是否删除", converter = DeleteExcelConverter.class)
private Integer deleted;

// 介绍一下其他转换器

/** 日期 */
@DateTimeFormat("yyyy年MM月dd日HH时mm分ss秒")
private String date;

/** 百分比 */
@NumberFormat("#.##%")
private String doubleData;

踩坑知识点

使用异常线程读Excel文件可能存在 java.io.FileNotFoundExceprion
1.spring boot的服务在启动时，会在操作系统的/tmp目录下生成一个Tomcat.*的文件目录，用于"java.io.tmpdir"文件流操作
2.当程序对文件进行操作（上传）时，会在此目录下生成临时文件，暂存在临时文件中
当我们使用异步线程时主线程结束了，异步线程还未拿到文件，主线程会存在将临时文件删掉的操作导致上述异常
复现方式：异步线程进来等待几秒在用文件，直接java.io.FileNotFoundExceprion
解决方法：1.规避多线程操作文件、2.将文件转到本地使用完后手动删除

File localFile = null;
try{
    localFile = FileUtils.multipartFileToFile(file);
} catch(Exception e){
    
} finally {
    FileUtils.del(localFile);
}

FileUtils

@Slf4j
public class FileUtils extends FileUtil {

    private static final String TMP_PATH = File.separator + "tmp" + File.separator;

    /**
     * MultipartFile 转 File
     *
     * @param file MultipartFile
     * @throws IOException io异常
     */
    public static File multipartFileToFile(MultipartFile file) throws IOException {
        InputStream ins = file.getInputStream();
        File toFile = new File(TMP_PATH + URLEncoder.encode(Objects.requireNonNull(file.getOriginalFilename()), "UTF-8"));
        toFile.getParentFile().mkdirs();
        FileUtil.writeFromStream(ins, toFile);
        return toFile;
    }

    /**
     * base64转文件
     *
     * @param base64   base64
     * @param fileName 文件名称
     * @return 文件
     * @throws UnsupportedEncodingException 编码异常
     */
    public static File base64ToFile(String base64, String fileName) throws UnsupportedEncodingException {
        return Base64.decodeToFile(base64, new File(TMP_PATH + URLEncoder.encode(fileName, "UTF-8")));
    }

}

导出Excel

往往导出会涉及大数据量的问题，那么无论对用户的体验还是接口的可行性都是不友好的（超时异常等）。这样我们会将导出的功能做成异步的，用户先通过筛选条件查询，服务器就会在后台异步的进行数据的制作，等制作完成再让用户去下载。

表设计
有文件服务器的情况下，只需要把文件上传到服务器，将下载地址存到表里面即可。
如果没有单独的文件服务器，那么只能将数据存储到数据库中，但是一条数据量非常大，导致仅仅一个修改状态的操作都要把这条记录的数据全部下载下来。
为解决这一问题采用的是每个主题的数据分一个表，将数据分片存储一对多的建立关系，相当于数据桶的概念，将不同的数据放在不同的桶中，当然桶是需要自己创建的。

-- 文件导出表
CREATE TABLE `export` (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `status` tinyint(1) DEFAULT NULL COMMENT '导出状态：0: 导出中 1：成功，2：失败',
    `export_topic` int(4) DEFAULT NULL COMMENT '导出对象，1-某个主题、2-某个主题...',
    `export_parameters` text COMMENT '导出参数(JSON格式)',
    `export_type` tinyint(1) DEFAULT '0' COMMENT '导出类型，1：csv  2：excel',
    `info_slicing_path` varchar(255) DEFAULT NULL COMMENT '内容分片路径',
    `operator` int(11) DEFAULT NULL COMMENT '操作人ID',
    `created_time` datetime DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
    PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8mb4;

-- 主题数据表,每一个主题都需要新建一张数据分片表
CREATE TABLE `exprot_user_slicing`(
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `export_info` longtext COMMENT '导出信息',
    `created_time` datetime DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
    PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8mb4;

Controller

/**
 * 查询数据入库
 *
 * @param param 请求参数
 */
@PostMapping("/data")
public void export(@RequestBody ExportParam param) {
    exportService.exportExcel(param);
    log.info("数据制作中...");
}

// ExportParam 为查询参数（哪一个主题，查询数据筛选条件以json字符串传过来）
@Data
@Accessors(chain = true)
public class ExportParam implements Serializable {

    private static final long serialVersionUID = 0L;

    /**
     * 导出的主题
     */
    private Integer topic;

    /**
     * 查询参数
     */
    private String jsonParam;
}

初始化主题枚举
枚举类中会定义某个主题，这个主题的查询参数、导出类型、文件名称、查询方法等，为后面的通用Service层做准备，这样每个导出只需要初始化主题，并提供一个查询的方法。其他的交给天吧！！！！


@Getter
@RequiredArgsConstructor
public enum ExportTopic {

    /**
     * 用户列表页
     */
    USER_LIST(1, UserParam.class, UserExportDTO.class, "用户数据", "export_user_slicing", p -> SpringContextUtil.getBean(UserService.class).findExportData((UserParam) p));


    private final Integer value;

    /**
     * 查询参数类型
     */
    private final Class<?> param;

    /**
     * 导出类型
     */
    private final Class<?> target;

    /**
     * 文件名称
     */
    private final String fileName;

    /**
     * 分片表地址
     */
    private final String location;

    /**
     * 数据查询方法
     */
    private final Function<Object, List<?>> function;


    public static final Map<Integer, ExportTopic> INSTANCE;

    static {
        Map<Integer, ExportTopic> dict = new HashMap<>(ExportTopic.values().length);
        for (ExportTopic export : ExportTopic.values()) {
            dict.put(export.value, export);
        }
        INSTANCE = dict;
    }

}

Service层
从给入场中找到这个到导出是什么主题、这里对数据的类型并不敏感，只需要将数据按照大小分片存储即可
其中，因为是异步的，我们有充足的时间对数据进行优化操作，下面的分片是将数据先进行压缩，得到的byte数组，然后根据字节大小进行分片。

/**
 * 查询数据导出数据
 *
 * @param exportParam 查询参数
 */
@Override
public void exportExcel(ExportParam exportParam) {
    ExportTopic exportTopic = getTopic(exportParam.getTopic());
    // 生成导出记录
    Export export = new Export();
    export.setStatus(ExportStatus.EXPORTING.getValue());
    export.setExportTopic(exportTopic.getValue());
    export.setExportParameters(exportParam.getJsonParam());
    export.setExportType(ExportType.EXCEL.getValue());
    export.setCreatedTime(new Date());
    exportMapper.insertSelective(export);

    // 异步制作导出内容
    asyncOutput.execute(() -> {
        try {
            Object param = JSON.parseObject(exportParam.getJsonParam(), exportTopic.getParam());

            // 执行数据查询方法拿到数据集合
            List<?> data = exportTopic.getFunction().apply(param);
            // 分片存储
            String locations = dataInfoSlicing(exportTopic.getLocation(), data);
            // 更新导出情况
            exportMapper.updateByPrimaryKeySelective(new Export().setId(export.getId()).setInfoSlicingPath(locations).setStatus(ExportStatus.SUCCESS.getValue()));

            log.info("制作成功！地址库:{},分片ID：{}", exportTopic.getLocation(), locations);
        } catch (Exception e) {
            exportMapper.updateByPrimaryKeySelective(new Export().setId(export.getId()).setStatus(ExportStatus.FAIL.getValue()));
            log.error("制作异常：", e);
        }
    });
}

/**
 * 数据分片存储
 *
 * @param location 地址库
 * @param dataInfo 数据信息
 */
private String dataInfoSlicing(String location, List<?> dataInfo) {
    // 内容压缩
    byte[] serialize = ExportUtil.serialize(dataInfo);
    // 内容分片
    List<String> slicingList = ExportUtil.slicing(serialize);
    // 存储 
    if (!CollectionUtils.isEmpty(slicingList)) {
        List<Map<String, Object>> exportInfo = slicingList.stream().map(v -> {
            Map<String, Object> infoMap = new HashMap<>(slicingList.size());
            infoMap.put("exportInfo", v);
            return infoMap;
        }).collect(Collectors.toList());

        // 在没有IO限制的条件下可以优化成并发写入
        exportMapper.insertSlice(exportInfo, location);
        return exportInfo.stream().map(v -> String.valueOf(v.get("id"))).collect(Collectors.joining("-"));
    }
    throw new IllegalStateException("数据分片存储异常");
}

/**
 * 获取导出主题
 *
 * @param topic 主题value
 * @return 主题
 */
private ExportTopic getTopic(Integer topic) {
    return Objects.requireNonNull(ExportTopic.INSTANCE.get(topic), "不支持的导出主题");
}

下载
数据制作完成后，通过主题ID可以查询到这个主题所有导出列表

/**
 * 获取导出列表
 *
 * @param param 请求参数
 * @return
 */
@PostMapping("/get")
public ResultData getExportList(@RequestBody @Validated ExportGetListParam param) {
    return ResultData.successful(exportService.findExportList(param));
}


/**
 * 获取下载列表
 *
 * @param param 查询条件
 * @return 分页列表
 */
@Override
public PageInfo<Export> findExportList(ExportGetListParam param) {
    PageMethod.startPage(param.getPageIndex(), param.getPageSize());
    return new PageInfo<>(exportMapper.selectByTopic(param.getTopic()));
    // 可以完善 操作人ID字段,再查询映射出操作人名称(举一反三)
}

下载
在上面的列表中选着某一条制作的数据进行导出，后台做的操作为通过ID查询这条记录，获取这条记录的储存分片的ID路径（1-2-3-4），通过这些ID去数据桶里把数据捞出来按主键顺序拼接起来即可（数据桶主键自增）

@GetMapping("/download/{id}")
public void download(@PathVariable Integer id, HttpServletResponse response) throws IOException {
    exportService.download(id, response);
}


/**
 * 下载导出文件
 *
 * @param id       导出记录ID
 * @param response 响应体
 */
@Override
public void download(Integer id, HttpServletResponse response) throws IOException {
    Export export = exportMapper.selectByPrimaryKey(id);
    if (Objects.nonNull(export) && Objects.equals(ExportStatus.SUCCESS.getValue(), export.getStatus())) {
        // 获取片段内容
        List<Integer> sliceIds = Arrays.stream(export.getInfoSlicingPath().split("-")).map(Integer::valueOf).collect(Collectors.toList());
        ExportTopic topic = getTopic(export.getExportTopic());
        List<String> slices = exportMapper.selectSliceInfoByLocation(topic.getLocation(), sliceIds);
        String join = String.join("", slices);
        List targetList;
        if (!StringUtils.hasText(join)) {
            log.info("数据为空");
            targetList = new ArrayList();
        } else {
            targetList = ExportUtil.deserialize(join, topic.getTarget());
        }
        ExportUtil.exportSingleSheetToHttpResponse(topic.getTarget(), () -> targetList, ResultData.failed("导出失败"), topic.getFileName(), "1", response);
    }
    throw new IllegalStateException("无可下载数据");
}

ExportUtil

/**
 * @author DuanYu <a>Email:z-houjie@qq.com</a>
 * @date 2022-02-09 14:47
 */
public class ExportUtil {

    private static final Logger log = LoggerFactory.getLogger(ExportUtil.class);

    private ExportUtil() {
        throw new IllegalStateException("Utility class");
    }

    /**
     * 导出单Sheet的excel数据到HttpResponse
     *
     * @param type      类型
     * @param query     获取数据的方法
     * @param errorData 执行失败的返回数据
     * @param fileName  文件名
     * @param sheetName sheet
     * @param response  response
     * @param <T>       类型
     */
    public static <T> void exportSingleSheetToHttpResponse(Class<T> type, Supplier<List<T>> query, Object errorData, String fileName, String sheetName, HttpServletResponse response) throws IOException {
        try {
            long start = System.currentTimeMillis();
            List<T> list = query.get();
            long end = System.currentTimeMillis();
            log.info("Excel 获取数据耗时：{}ms", end - start);

            setExcelHttpResponse(fileName, response);

            ServletOutputStream outputStream = response.getOutputStream();
            EasyExcelFactory.write(outputStream, type).excelType(ExcelTypeEnum.XLSX).sheet(sheetName).doWrite(list);
        } catch (Exception e) {
            log.error("Excel 导出失败：", e);
            response.reset();
            // 如果生成失败，则转为Json返回
            setJsonHttpResponse(response);

            PrintWriter writer = response.getWriter();
            writer.println(JSON.toJSONString(errorData));
            writer.close();
        }
    }

    /**
     * 设置响应类型为excel
     *
     * @param fileName 文件名称
     * @param response response
     * @throws UnsupportedEncodingException 编码异常
     */
    public static void setExcelHttpResponse(String fileName, HttpServletResponse response) throws UnsupportedEncodingException {
        response.setContentType("application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
        response.setCharacterEncoding("utf-8");
        fileName = URLEncoder.encode(fileName, "UTF-8");
        response.setHeader("Content-disposition", "attachment;filename=" + fileName + ".xlsx");
    }

    /**
     * 设置响应类型为excel
     *
     * @param response response
     */
    public static void setJsonHttpResponse(HttpServletResponse response) {
        response.setContentType("application/json");
        response.setCharacterEncoding("utf-8");
    }

    /**
     * 分片
     *
     * @param serialize 压缩后的内容
     * @return 内容片段
     */
    public static List<String> slicing(byte[] serialize) {
        List<String> list = new ArrayList<>();
        // 每片大小（100KB）可以看情况定义
        final int slicingSize = 1024 * 100;
        // 初始化偏移量和长度
        int offset = 0, length = slicingSize;
        do {
            if ((serialize.length - offset) <= slicingSize) {
                length = serialize.length - offset;
            }
            list.add(new String(serialize, offset, length, StandardCharsets.ISO_8859_1));
            offset += length;
        } while (offset < serialize.length);
        return list;
    }


    /**
     * 序列化 压缩json
     *
     * @param elements 内容
     * @param <E>      内容类型
     * @return 压缩后内容
     */
    public static <E> byte[] serialize(List<E> elements) {
        String json = JSON.toJSONString(elements);
        byte[] bytes = json.getBytes(StandardCharsets.UTF_8);
        return ZipUtil.gzip(bytes);
    }


    /**
     * 解压
     *
     * @param serialized 压缩内容
     * @param type       转换的类型
     * @param <E>        类型
     * @return 解压的内容
     */
    public static <E> List<E> deserialize(String serialized, Class<E> type) {
        byte[] bytes = serialized.getBytes(StandardCharsets.ISO_8859_1);
        byte[] unzip = ZipUtil.unGzip(bytes);
        return JSON.parseArray(new String(unzip), type);
    }
}

结束语

书山有路勤为径,学海无涯苦作舟！！！

这篇文章部分来源于EasyExcel官方文档，更多更细的API使用方法请移步官方文档EasyExcel · 语雀 (yuque.com),我只是一名引路者。

Z先生日常

关注

0
点赞
踩
1

收藏

觉得还不错? 一键收藏
0
评论
导入导出？CV完了就摸鱼

今天的主角是EasyExcel,github地址:https://github.com/alibaba/easyexcel号称64M内存1分钟内读取75M(46W行25列)的Excel，还有急速模式能更快，但是内存占用会在100M多一点Maven依赖<dependency> <groupId>com.alibaba</groupId> <artifactId>easyexcel</artifactId> <ver
复制链接

扫一扫