Using Spring Batch

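Recently at work I needed to compute batch statistics over login logs (daily active users, visit counts, and so on). Each log file is about 40 MB, there are several files per day, and that adds up to a few million records a day.
My first, naive idea was to load everything into a table and run GROUP BY queries, but the performance was too poor to be usable. So I decided to do it in a program instead: read, parse, aggregate, then write to the table. Later I heard about tools such as Spring Batch, so these are my notes from learning it.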
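
The "read, parse, aggregate" idea can be sketched in plain Java before any framework is involved. The log format below (one `date,userId,url` record per line) and the class name `LoginLogStats` are assumptions made for illustration; the real login logs are not shown in this post.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Minimal sketch: one pass over log lines, counting visits and distinct users per day.
// Assumed (hypothetical) line format: "yyyy-MM-dd,userId,url"
class LoginLogStats {
    final Map<String, Long> visitsPerDay = new TreeMap<>();
    final Map<String, Set<String>> usersPerDay = new TreeMap<>();

    // Parses one line and updates both counters; malformed lines are skipped.
    void accept(String line) {
        String[] fields = line.split(",", -1);
        if (fields.length < 2) {
            return;
        }
        String day = fields[0];
        String userId = fields[1];
        visitsPerDay.merge(day, 1L, Long::sum);
        usersPerDay.computeIfAbsent(day, d -> new HashSet<>()).add(userId);
    }

    // Daily active users = number of distinct user ids seen on that day.
    long dailyActiveUsers(String day) {
        Set<String> users = usersPerDay.get(day);
        return users == null ? 0 : users.size();
    }

    long visits(String day) {
        return visitsPerDay.getOrDefault(day, 0L);
    }

    public static void main(String[] args) {
        LoginLogStats stats = new LoginLogStats();
        List<String> sample = Arrays.asList(
                "2017-03-01,u1,/login",
                "2017-03-01,u2,/login",
                "2017-03-01,u1,/home");
        sample.forEach(stats::accept);
        System.out.println("DAU: " + stats.dailyActiveUsers("2017-03-01"));
        System.out.println("Visits: " + stats.visits("2017-03-01"));
    }
}
```

For a real 40 MB file, the lines would be streamed (for example with `java.nio.file.Files.lines`) rather than held in a list, so memory use depends on the number of distinct users, not on the file size.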

Some Spring Batch documentation
Chinese: https://kimmking.gitbooks.io/springbatchreference/
English: http://docs.spring.io/spring-batch/trunk/reference/html/index.html

The official example

The official example is very simple and uses Java config in place of XML configuration files.

  • File list:
src
└─main
    ├─java
    │  └─hello
    │          Application.java
    │          BatchConfiguration.java
    │          JobCompletionNotificationListener.java
    │          Person.java
    │          PersonItemProcessor.java
    │
    └─resources
            sample-data.csv
            schema-all.sql
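  • Brief description:
    Application is the launcher class and contains no business logic.
    BatchConfiguration holds the Spring Batch configuration. It is annotated with
    @Configuration
    @EnableBatchProcessing
    so the batch configuration is enabled as soon as Spring starts.
    JobCompletionNotificationListener works like a check: once the batch job has finished, it queries the database to see whether the data is there.
    Person is a POJO.
    PersonItemProcessor is the processing class; here it upper-cases the names and returns a new Person.
    sample-data.csv is the data source: a few records, with fields separated by commas.
    schema-all.sql contains the CREATE TABLE statement.
    The code uses SQL directly; the embedded database is HSQLDB.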
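
Person.java is not listed in this post. A minimal sketch consistent with how the other classes use it (a two-argument constructor, getFirstName/getLastName, plus a no-argument constructor and setters for bean-style property mapping) might look like the following. The toString format is an assumption, chosen to match the log output shown at the end of this post.

```java
// Sketch of the Person POJO. The property names match the "firstName"/"lastName"
// columns configured on the reader and the :firstName/:lastName SQL parameters.
// In the project this would be declared public in its own Person.java.
class Person {
    private String firstName;
    private String lastName;

    // No-argument constructor, used when the object is populated through setters.
    public Person() {
    }

    public Person(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    public String getFirstName() {
        return firstName;
    }

    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }

    public String getLastName() {
        return lastName;
    }

    public void setLastName(String lastName) {
        this.lastName = lastName;
    }

    // Assumed format, not confirmed by the post.
    @Override
    public String toString() {
        return "firstName: " + firstName + ", lastName: " + lastName;
    }
}
```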

  • Core code walkthrough
    BatchConfiguration.java

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;

import javax.sql.DataSource;

@Configuration
@EnableBatchProcessing
public class BatchConfiguration {

    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Autowired
    public StepBuilderFactory stepBuilderFactory;

    @Autowired
    public DataSource dataSource;

    // tag::readerwriterprocessor[]
    @Bean
    public FlatFileItemReader<Person> reader() {
        FlatFileItemReader<Person> reader = new FlatFileItemReader<Person>();
        reader.setResource(new ClassPathResource("sample-data.csv"));
        reader.setLineMapper(new DefaultLineMapper<Person>() {{
            setLineTokenizer(new DelimitedLineTokenizer() {{
                setNames(new String[] { "firstName", "lastName" });
            }});
            setFieldSetMapper(new BeanWrapperFieldSetMapper<Person>() {{
                setTargetType(Person.class);
            }});
        }});
        return reader;
    }

    @Bean
    public PersonItemProcessor processor() {
        return new PersonItemProcessor();
    }

    @Bean
    public JdbcBatchItemWriter<Person> writer() {
        JdbcBatchItemWriter<Person> writer = new JdbcBatchItemWriter<Person>();
        writer.setItemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<Person>());
        writer.setSql("INSERT INTO people (first_name, last_name) VALUES (:firstName, :lastName)");
        writer.setDataSource(dataSource);
        return writer;
    }
    // end::readerwriterprocessor[]

    // tag::jobstep[]
    @Bean
    public Job importUserJob(JobCompletionNotificationListener listener) {
        return jobBuilderFactory.get("importUserJob")
                .incrementer(new RunIdIncrementer())
                .listener(listener)
                .flow(step1())
                .end()
                .build();
    }

    @Bean
    public Step step1() {
        return stepBuilderFactory.get("step1")
                .<Person, Person> chunk(10)
                .reader(reader())
                .processor(processor())
                .writer(writer())
                .build();
    }
    // end::jobstep[]
}

The code above defines a reader, a writer, a processor, one job, and step1.
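
To make the `chunk(10)` call in step1 less opaque: a chunk-oriented step reads up to the chunk size in items, runs each one through the processor, and hands the results to the writer as one group. The plain-Java loop below is only a simplified model of that behavior, not Spring Batch's actual implementation (it ignores transactions, restarts, and skip/retry handling). `ChunkLoopSketch` and its method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

class ChunkLoopSketch {

    // Runs read -> process -> write in chunks and returns how many chunks were handled.
    static <I, O> int run(Iterator<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int chunkSize) {
        int chunks = 0;
        while (reader.hasNext()) {
            // 1. Read up to chunkSize items.
            List<I> inputs = new ArrayList<>();
            while (inputs.size() < chunkSize && reader.hasNext()) {
                inputs.add(reader.next());
            }
            // 2. Process each item; a null result filters the item out.
            List<O> outputs = new ArrayList<>();
            for (I input : inputs) {
                O output = processor.apply(input);
                if (output != null) {
                    outputs.add(output);
                }
            }
            // 3. Write the whole chunk at once.
            if (!outputs.isEmpty()) {
                writer.accept(outputs);
            }
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Jill", "Joe", "Justin", "Jane", "John");
        int chunks = run(names.iterator(), String::toUpperCase,
                chunk -> System.out.println("write " + chunk), 2);
        System.out.println(chunks + " chunks");
    }
}
```

With the five names from sample-data.csv and a chunk size of 2, the writer is called three times (two items, two items, one item). With the chunk size of 10 used in step1, all five records would reach the writer in a single call.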

  • The processor simply delegates to PersonItemProcessor, which does nothing more than convert the names to upper case
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import org.springframework.batch.item.ItemProcessor;

public class PersonItemProcessor implements ItemProcessor<Person, Person> {

    private static final Logger log = LoggerFactory.getLogger(PersonItemProcessor.class);

    @Override
    public Person process(final Person person) throws Exception {
        final String firstName = person.getFirstName().toUpperCase();
        final String lastName = person.getLastName().toUpperCase();

        final Person transformedPerson = new Person(firstName, lastName);

        log.info("Converting (" + person + ") into (" + transformedPerson + ")");

        return transformedPerson;
    }

}
  • Flow
    [Figure: flow diagram of the job; image not included]

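  • Notes
    With Java config, much of the configuration disappears from view, so some parts are hard to follow. Most of the configuration in the official documentation is XML-based.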

A more conventional example

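The official example is a bit fancy: it uses Spring Boot and some other features, with no XML in sight. Yet the examples in the Spring Batch reference documentation are mostly XML configuration. So here is an example configured with XML; the source code is here,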

  • Code listing
main
├─java
│  └─com
│      └─yp
│          └─batch
│              │  App.java
│              │  PersonFieldSetMapper.java
│              │  PersonItemProcessor.java
│              │  PersonItemWriter.java
│              │
│              └─entity
│                      Person.java
│
└─resources
        applicationContext.xml
        log4j.xml
        sample-data.csv
  • App.java is the launcher class
import org.apache.log4j.Logger;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.context.support.ClassPathXmlApplicationContext;

public class App {
   static Logger log = Logger.getLogger(App.class);

    public static void main(String[] args) throws Exception {
        String[] springConfig = { "applicationContext.xml" };
        ClassPathXmlApplicationContext context = new ClassPathXmlApplicationContext(springConfig);
        JobLauncher jobLauncher = (JobLauncher) context.getBean("jobLauncher");
        Job job = (Job) context.getBean("helloWorldJob");
        JobExecution execution = jobLauncher.run(job, new JobParameters());
        log.info("Exit Status : " + execution.getStatus());
        context.close();
    }
}

It loads the configuration file, looks up the jobLauncher and helloWorldJob beans, and then calls JobLauncher.run.

  • applicationContext.xml is the configuration file

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:context="http://www.springframework.org/schema/context"
       xmlns:batch="http://www.springframework.org/schema/batch"
       xmlns:jdbc="http://www.springframework.org/schema/jdbc"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
                http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd
                http://www.springframework.org/schema/batch http://www.springframework.org/schema/batch/spring-batch.xsd
                http://www.springframework.org/schema/jdbc http://www.springframework.org/schema/jdbc/spring-jdbc.xsd">

    <bean id="cvsFileItemReader" class="org.springframework.batch.item.file.FlatFileItemReader">
        <property name="resource" value="classpath:sample-data.csv"/>
        <property name="lineMapper">
            <bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
                <property name="lineTokenizer">
                    <bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
                    </bean>
                </property>
                <property name="fieldSetMapper">
                    <bean class="com.yp.batch.PersonFieldSetMapper"/>
                </property>
            </bean>
        </property>
    </bean>

    <bean id="itemProcessor" class="com.yp.batch.PersonItemProcessor"/>

    <bean id="personWriter" class="com.yp.batch.PersonItemWriter"></bean>

    <bean id="jobLauncher" class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
        <property name="jobRepository" ref="jobRepository"/>
    </bean>

    <bean id="jobRepository" class="org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean">
        <property name="transactionManager" ref="transactionManager"/>
    </bean>

    <bean id="transactionManager" class="org.springframework.batch.support.transaction.ResourcelessTransactionManager"/>

    <job id="helloWorldJob" xmlns="http://www.springframework.org/schema/batch">
        <step id="step1">
            <tasklet>
                <chunk reader="cvsFileItemReader" writer="personWriter" processor="itemProcessor"
                             commit-interval="10">
                </chunk>
            </tasklet>
        </step>
    </job>
</beans>

The configuration above is self-explanatory, so I won't go through it in detail; for everything else, see the source code.
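
One detail worth noting in the XML: the DelimitedLineTokenizer has no names configured, so the tokenized fields carry no column names, and a custom mapper such as PersonFieldSetMapper presumably reads them by position. PersonFieldSetMapper itself is not shown in this post. The plain-Java sketch below only models that positional mapping, with `String.split` standing in for the tokenizer (which, unlike this sketch, also handles quoted fields). `PositionalMappingSketch` and `SimplePerson` are hypothetical names.

```java
class PositionalMappingSketch {

    // Stand-in for the Person entity, kept local so the sketch is self-contained.
    static final class SimplePerson {
        final String firstName;
        final String lastName;

        SimplePerson(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    // Maps one CSV line to a person by field position:
    // index 0 is the first name, index 1 is the last name.
    static SimplePerson mapLine(String line) {
        String[] fields = line.split(",", -1);
        if (fields.length != 2) {
            throw new IllegalArgumentException(
                    "Expected 2 fields but got " + fields.length + ": " + line);
        }
        // Trim surrounding whitespace from each field.
        return new SimplePerson(fields[0].trim(), fields[1].trim());
    }

    public static void main(String[] args) {
        SimplePerson p = mapLine("Jill,Doe");
        System.out.println(p.firstName + " " + p.lastName);
    }
}
```

Mapping by position keeps the XML short, but it ties the mapper to the column order of the file; configuring names on the tokenizer, as the Java-config example does, avoids that coupling.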

  • The data file, sample-data.csv, is the one from the official example
Jill,Doe
Joe,Doe
Justin,Doe
Jane,Doe
John,Doe
  • Output
20:00:55,214 INFO  [PersonItemWriter] write : firstName: JILL, lastName: DOE
20:00:55,214 INFO  [PersonItemWriter] write : firstName: JOE, lastName: DOE
20:00:55,214 INFO  [PersonItemWriter] write : firstName: JUSTIN, lastName: DOE
20:00:55,214 INFO  [PersonItemWriter] write : firstName: JANE, lastName: DOE
20:00:55,214 INFO  [PersonItemWriter] write : firstName: JOHN, lastName: DOE