Parsing CSV files with Spring Batch

watsonQiu

Article link: https://blog.csdn.net/watson1360884839/article/details/84306292

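Background:

Recently I was using POI to parse Excel files in production. Above 50,000 rows it became very slow, and sometimes the process froze as memory filled up. So I decided to read the data in batches with spring-batch. However, spring-batch cannot read Excel files directly, so the Excel file is first converted to CSV (measured conversion speed: 80,000 rows in 40 s). spring-batch then reads the CSV in chunks of 5,000 rows, and each 5,000-row chunk is processed with multiple threads (ForkJoin).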
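
As a rough illustration of the last step, here is a minimal sketch of handing one 5,000-row chunk to ForkJoin. It is not the project's actual processing code: the class `ChunkSumTask`, the `THRESHOLD` value, and the choice of summing a numeric column are all assumptions made for the example. Only the chunk size of 5,000 comes from the setup described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical task: sums one numeric column of a chunk of CSV rows in parallel.
class ChunkSumTask extends RecursiveTask<Long> {
    // Assumed cut-off: slices this small are processed sequentially.
    static final int THRESHOLD = 500;

    private final List<String[]> rows;
    private final int column;
    private final int from; // inclusive
    private final int to;   // exclusive

    ChunkSumTask(List<String[]> rows, int column, int from, int to) {
        this.rows = rows;
        this.column = column;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            // Leaf: handle the slice directly on the current worker thread.
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += Long.parseLong(rows.get(i)[column].trim());
            }
            return sum;
        }
        // Split the slice in half; fork the left half, compute the right half here.
        int mid = from + (to - from) / 2;
        ChunkSumTask left = new ChunkSumTask(rows, column, from, mid);
        ChunkSumTask right = new ChunkSumTask(rows, column, mid, to);
        left.fork();
        long rightSum = right.compute();
        return left.join() + rightSum;
    }

    // Entry point for one chunk, run on the common ForkJoin pool.
    static long sumColumn(List<String[]> rows, int column) {
        return ForkJoinPool.commonPool()
                .invoke(new ChunkSumTask(rows, column, 0, rows.size()));
    }

    public static void main(String[] args) {
        // Build a fake 5,000-row chunk: column 0 is a label, column 1 is a number.
        List<String[]> chunk = new ArrayList<>();
        for (int i = 1; i <= 5000; i++) {
            chunk.add(new String[] {"row-" + i, String.valueOf(i)});
        }
        System.out.println(sumColumn(chunk, 1)); // prints 12502500
    }
}
```

In a real writer the leaf branch would run the per-row business logic instead of a sum; the split, fork, and join structure stays the same.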

============ The project demo is recorded below, for my own reference only ===========

1: spring-batch configuration ----- pom:

<dependency>
      <groupId>org.springframework.batch</groupId>
      <artifactId>spring-batch-core</artifactId>
      <version>3.0.8.RELEASE</version>
</dependency>

2: spring-batch configuration ----- batch-content.xml:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:p="http://www.springframework.org/schema/p"
       xmlns:tx="http://www.springframework.org/schema/tx"
       xmlns:aop="http://www.springframework.org/schema/aop"
       xmlns:context="http://www.springframework.org/schema/context"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
	http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
	http://www.springframework.org/schema/tx
	http://www.springframework.org/schema/tx/spring-tx-3.0.xsd
	http://www.springframework.org/schema/aop
	http://www.springframework.org/schema/aop/spring-aop-3.0.xsd
	http://www.springframework.org/schema/context
	http://www.springframework.org/schema/context/spring-context-2.5.xsd"
       default-autowire="byName">

    <bean id="jobRepository"
          class="org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean">
        <property name="transactionManager" ref="transactionManagerBatch"/>
    </bean>

    <bean id="jobLauncher"
          class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
        <property name="jobRepository" ref="jobRepository"/>
    </bean>

    <!-- This bean must not share the name of Spring's transactionManager; otherwise Spring transactions stop taking effect -->
    <bean id="transactionManagerBatch"
          class="org.springframework.batch.support.transaction.ResourcelessTransactionManager"/>
</beans>

3: spring-batch configuration ----- batch-job.xml:

<?xml version="1.0" encoding="UTF-8"?>
<bean:beans xmlns="http://www.springframework.org/schema/batch"
            xmlns:bean="http://www.springframework.org/schema/beans"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xmlns:p="http://www.springframework.org/schema/p"
            xmlns:tx="http://www.springframework.org/schema/tx"
            xmlns:aop="http://www.springframework.org/schema/aop"
            xmlns:context="http://www.springframework.org/schema/context"
            xsi:schemaLocation="http://www.springframework.org/schema/beans
    http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
    http://www.springframework.org/schema/tx
    http://www.springframework.org/schema/tx/spring-tx-3.0.xsd
    http://www.springframework.org/schema/aop
    http://www.springframework.org/schema/aop/spring-aop-3.0.xsd
    http://www.springframework.org/schema/context
    http://www.springframework.org/schema/context/spring-context-2.5.xsd
    http://www.springframework.org/schema/batch
    http://www.springframework.org/schema/batch/spring-batch-2.2.xsd">
    <bean:import resource="classpath:META-INF/batch/fee-batch-context.xml"/>

    <job id="analysisExcelJob">
        <step id="listStep">
            <tasklet transaction-manager="transactionManager">
                <chunk reader="redeemDataReader" writer="redeemDataWriter" processor="redeemDataProcessor"
                       commit-interval="5000"/>
            </tasklet>
        </step>
        <listeners>
            <listener ref="analysisExcelInterceptor"/>
        </listeners>
    </job>

    <!-- Read the report file (CSV format) -->
    <bean:bean id="redeemDataReader"
               class="org.springframework.batch.item.file.FlatFileItemReader"
               scope="step">
        <bean:property name="resource"
                       value="file:#{jobParameters['file.data']}"/>
        <bean:property name="linesToSkip" value="4"/>
        <bean:property name="lineMapper">
            <bean:bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
                <bean:property name="lineTokenizer">
                    <!-- Fields are mapped by the names property below; it must cover every header column, separated by commas -->
                    <bean:bean
                            class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
                        <bean:property name="names" value="计划回款时间,商户名称,父产品名称,标的名称,合同编号,进件编码,
                        标的募集金额,投资人利率,当前期数,总期数,应还金额,应还利息,罚息金额,还款总额,代扣实际到账,未到账金额,
                        商户分润金额,产品起息日,虚户时间,滞销天数,首次回款日,运营滞销贴息,商户线下应还,
                        当期是否提前回款,回款模式,是否是转非标,是否提现成功"/>
                    </bean:bean>
                </bean:property>

                <!-- If