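Background:
Recently I used POI to parse Excel files in production. Above 50,000 rows it became very slow, and the application could even hang from memory exhaustion. So I decided to read the data in batches with spring-batch. However, spring-batch has no built-in reader for Excel files, so I first convert the Excel file to a CSV file (measured conversion speed: 80,000 rows in 40 s), then read the CSV with spring-batch in chunks of 5,000 rows. Each chunk of 5,000 rows is then processed with multiple threads (ForkJoin).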
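The multi-threaded step for one chunk can be sketched roughly as follows. This is only a minimal sketch under my own assumptions, not the project's actual code: the class name `ChunkForkJoinDemo`, the `THRESHOLD` of 500 rows, and the `handleRow` placeholder are all hypothetical, rows are plain strings here, and Java 8+ is assumed for `ForkJoinPool.commonPool()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

/** Hypothetical demo: processes one chunk of rows in parallel with fork/join. */
class ChunkForkJoinDemo {

    /** Slices at or below this size are processed sequentially (assumed value). */
    static final int THRESHOLD = 500;

    /** Recursively splits the [from, to) slice until it is small enough. */
    static class RowTask extends RecursiveTask<List<String>> {
        private final List<String> rows;
        private final int from;
        private final int to;

        RowTask(List<String> rows, int from, int to) {
            this.rows = rows;
            this.from = from;
            this.to = to;
        }

        @Override
        protected List<String> compute() {
            if (to - from <= THRESHOLD) {
                List<String> out = new ArrayList<>(to - from);
                for (int i = from; i < to; i++) {
                    out.add(handleRow(rows.get(i)));
                }
                return out;
            }
            int mid = (from + to) >>> 1;
            RowTask left = new RowTask(rows, from, mid);
            RowTask right = new RowTask(rows, mid, to);
            left.fork();                                 // left half runs asynchronously
            List<String> rightResult = right.compute();  // current thread takes the right half
            List<String> result = new ArrayList<>(left.join());
            result.addAll(rightResult);                  // left before right keeps input order
            return result;
        }
    }

    /** Placeholder for the real per-row business logic (hypothetical). */
    static String handleRow(String row) {
        return row.trim().toUpperCase(Locale.ROOT);
    }

    /** Processes one chunk (for example the 5,000 rows of a single commit interval). */
    static List<String> processChunk(List<String> chunk) {
        return ForkJoinPool.commonPool().invoke(new RowTask(chunk, 0, chunk.size()));
    }

    public static void main(String[] args) {
        List<String> chunk = new ArrayList<>();
        for (int i = 0; i < 5000; i++) {
            chunk.add(" row-" + i + " ");
        }
        List<String> processed = processChunk(chunk);
        System.out.println(processed.size() + " rows, first = " + processed.get(0));
    }
}
```

In the real job something like `processChunk` would presumably be called from the writer with the items of one chunk. Forking the left half and computing the right half on the current thread avoids leaving a worker idle while it waits on `join()`.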
============ The demo project is recorded below, for my own reference only ===========
1: spring-batch configuration ----- pom:
<dependency>
    <groupId>org.springframework.batch</groupId>
    <artifactId>spring-batch-core</artifactId>
    <version>3.0.8.RELEASE</version>
</dependency>
2: spring-batch configuration ----- batch-content.xml:
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:p="http://www.springframework.org/schema/p"
xmlns:tx="http://www.springframework.org/schema/tx"
xmlns:aop="http://www.springframework.org/schema/aop"
xmlns:context="http://www.springframework.org/schema/context"
xsi:schemaLocation="http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
http://www.springframework.org/schema/tx
http://www.springframework.org/schema/tx/spring-tx-3.0.xsd
http://www.springframework.org/schema/aop
http://www.springframework.org/schema/aop/spring-aop-3.0.xsd
http://www.springframework.org/schema/context
http://www.springframework.org/schema/context/spring-context-2.5.xsd"
default-autowire="byName">
    <bean id="jobRepository"
          class="org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean">
        <property name="transactionManager" ref="transactionManagerBatch"/>
    </bean>
    <bean id="jobLauncher"
          class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
        <property name="jobRepository" ref="jobRepository"/>
    </bean>
    <!-- This bean must not share the name of Spring's own transactionManager; otherwise Spring transactions stop taking effect -->
    <bean id="transactionManagerBatch"
          class="org.springframework.batch.support.transaction.ResourcelessTransactionManager"/>
</beans>
3: spring-batch configuration ----- batch-job.xml:
<?xml version="1.0" encoding="UTF-8"?>
<bean:beans xmlns="http://www.springframework.org/schema/batch"
xmlns:bean="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:p="http://www.springframework.org/schema/p"
xmlns:tx="http://www.springframework.org/schema/tx"
xmlns:aop="http://www.springframework.org/schema/aop"
xmlns:context="http://www.springframework.org/schema/context"
xsi:schemaLocation="http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
http://www.springframework.org/schema/tx
http://www.springframework.org/schema/tx/spring-tx-3.0.xsd
http://www.springframework.org/schema/aop
http://www.springframework.org/schema/aop/spring-aop-3.0.xsd
http://www.springframework.org/schema/context
http://www.springframework.org/schema/context/spring-context-2.5.xsd
http://www.springframework.org/schema/batch
http://www.springframework.org/schema/batch/spring-batch-2.2.xsd">
<bean:import resource="classpath:META-INF/batch/fee-batch-context.xml"/>
    <job id="analysisExcelJob">
        <step id="listStep">
            <tasklet transaction-manager="transactionManager">
                <chunk reader="redeemDataReader" writer="redeemDataWriter" processor="redeemDataProcessor"
                       commit-interval="5000"/>
            </tasklet>
        </step>
        <listeners>
            <listener ref="analysisExcelInterceptor"/>
        </listeners>
    </job>
    <!-- Read the report file (CSV format) -->
    <bean:bean id="redeemDataReader"
               class="org.springframework.batch.item.file.FlatFileItemReader"
               scope="step">
        <bean:property name="resource"
                       value="file:#{jobParameters['file.data']}"/>
        <bean:property name="linesToSkip" value="4"/>
        <bean:property name="lineMapper">
            <bean:bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
                <bean:property name="lineTokenizer">
                    <!-- Fields are mapped by the names property below; the names must cover every header column and are separated by commas -->
                    <bean:bean
                            class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
                        <bean:property name="names" value="计划回款时间,商户名称,父产品名称,标的名称,合同编号,进件编码,
标的募集金额,投资人利率,当前期数,总期数,应还金额,应还利息,罚息金额,还款总额,代扣实际到账,未到账金额,
商户分润金额,产品起息日,虚户时间,滞销天数,首次回款日,运营滞销贴息,商户线下应还,
当期是否提前回款,回款模式,是否是转非标,是否提现成功"/>
                    </bean:bean>
                </bean:property>
<!-- 如果