1. Spring Batch Overview
1.1 Introduction to Spring Batch
Spring Batch provides reusable functions that are essential in processing large volumes of records, including logging/tracing, transaction management, job processing statistics, job restart, skip, and resource management. It also provides more advanced technical services and features that will enable extremely high-volume and high performance batch jobs through optimization and partitioning techniques.
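As a Spring subproject, Spring Batch is an enterprise batch-processing framework built on Spring. It offers unified read/write interfaces, a rich set of task-processing styles, flexible transaction management, and concurrent processing, along with logging, monitoring, job restart, and skip support. This greatly simplifies batch application development and frees developers from complex job configuration and management, so they can focus on the core business logic.
1.2 Spring Batch Core Concepts
1) JobLauncher: the job launcher. Jobs are started through it, so it can be seen as the program's entry point.
2) Job: represents one concrete batch job.
3) Step: represents one concrete step; a Job can contain multiple Steps. In real business scenarios a job can be complex, in which case it can be split into several Steps that are managed separately, turning one complex job into simpler pieces. Steps run serially by default but can also run in parallel. A typical chunk-oriented Step has an ItemReader (reads data), an optional ItemProcessor (processes data), and an ItemWriter (writes data).
Note that Reader, Processor, and Writer each have many implementations: for example, a Reader can read from files or databases, and a Writer can write to files, Kafka, and so on. Combined with a Processor, they can cover many business requirements.
4) JobRepository: holds the execution context (job metadata) while the batch framework runs. There are two ways to back it: one keeps the data in memory (the demo below uses this approach because it is simple and convenient), and the other persists it to a database.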
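The read/process/write flow of a Step described above can be sketched in plain Java. This is only an illustration of the idea, not Spring Batch's implementation: `ChunkStepSketch`, `runStep`, and the `chunkSize` parameter are hypothetical names, and the standard `Iterator`, `Function`, and `Consumer` types stand in for ItemReader, ItemProcessor, and ItemWriter. It assumes items are written in fixed-size groups (chunks) and that a `null` result from the processor filters an item out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Plain-Java illustration only; these are NOT the Spring Batch interfaces.
class ChunkStepSketch {

    // Reads items one at a time, processes each, and writes them in chunks.
    // Returns the number of chunks handed to the writer.
    static <I, O> int runStep(Iterator<I> reader,
                              Function<I, O> processor,
                              Consumer<List<O>> writer,
                              int chunkSize) {
        int chunksWritten = 0;
        List<O> chunk = new ArrayList<>();
        while (reader.hasNext()) {
            O processed = processor.apply(reader.next());
            if (processed != null) {                   // assumption: null means "filter out"
                chunk.add(processed);
            }
            if (chunk.size() == chunkSize) {
                writer.accept(new ArrayList<>(chunk)); // hand a copy to the writer
                chunk.clear();
                chunksWritten++;
            }
        }
        if (!chunk.isEmpty()) {                        // flush the final partial chunk
            writer.accept(new ArrayList<>(chunk));
            chunksWritten++;
        }
        return chunksWritten;
    }

    public static void main(String[] args) {
        List<List<String>> written = new ArrayList<>();
        Function<String, String> toUpper = String::toUpperCase;
        Consumer<List<String>> sink = written::add;
        int chunks = runStep(List.of("a", "b", "c").iterator(), toUpper, sink, 2);
        System.out.println(chunks + " chunks: " + written); // prints "2 chunks: [[A, B], [C]]"
    }
}
```

Grouping writes into chunks is what lets a framework commit one transaction per chunk instead of one per item; the sketch leaves transactions, restart, and skip handling out entirely.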
2. Example: Reading a File and Writing to a File with Spring Batch
2.1 Batch Job Configuration
2.1.1 Dependencies and Configuration
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-batch</artifactId>
</dependency>
# Disable automatic launching of batch jobs at application startup
spring.batch.job.enabled = false
2.1.2 Job Configuration
1) Reader configuration.
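@Bean
public FlatFileItemReader<Person> reader() {
    return new FlatFileItemReaderBuilder<Person>()
            .name("personItemReader")
            .resource(new ClassPathResource("sample-data.csv"))
            .delimited()
            .names(new String[] { "firstName", "lastName" })
            // Skips blank lines. Spring Batch's own RecordSeparatorPolicy is an interface,
            // so the class instantiated here must be a concrete, project-defined one.
            .recordSeparatorPolicy(new RecordSeparatorPolicy())
            // Maps each parsed line onto a Person bean by property name.
            .fieldSetMapper(new BeanWrapperFieldSetMapper<Person>() {
                {
                    setTargetType(Person.class);
                }
            })
            .build();
}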
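To make the reader's job concrete, here is a minimal plain-Java sketch of what the configuration above asks for: split each line on a delimiter, pair the tokens with the configured field names, and skip blank lines. It does not use Spring Batch classes. `DelimitedReadSketch` and `read` are hypothetical names, a `Map` stands in for the `Person` bean, and quoted fields are deliberately not handled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; it does not use Spring Batch classes.
class DelimitedReadSketch {

    // Splits each non-blank line on commas and pairs the tokens with field names.
    static List<Map<String, String>> read(List<String> lines, String... names) {
        List<Map<String, String>> items = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;                              // skip blank lines
            }
            String[] tokens = line.split(",", -1);     // -1 keeps trailing empty fields
            if (tokens.length != names.length) {
                throw new IllegalArgumentException("Unexpected token count: " + line);
            }
            Map<String, String> item = new LinkedHashMap<>();
            for (int i = 0; i < names.length; i++) {
                item.put(names[i], tokens[i].trim());
            }
            items.add(item);
        }
        return items;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("Jill,Doe", "", "Joe,Doe");
        // prints "[{firstName=Jill, lastName=Doe}, {firstName=Joe, lastName=Doe}]"
        System.out.println(read(lines, "firstName", "lastName"));
    }
}
```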
2) Writer configuration.
@Bean
public FlatFileItemWriter<Person> writer2() {
    FlatFileItemWriter<Person> writer = new FlatFileItemWriter<>();
    writer.setResource(new FileSystemResource("output/outputData.csv"));
    // Append to the output file if it already exists.
    writer.setAppendAllowed(true);
    // Build each output line as "firstName,lastName".
    writer.setLineAggregator(new DelimitedLineAggregator<Person>() {
        {
            setDelimiter(",");
            setFieldExtractor(new BeanWrapperFieldExtractor<Person>() {
                {
                    setNames(new String[]{"firstName", "lastName"});
                }
            });
        }
    });
    return writer;
}
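The writer configuration above boils down to two steps per item: extract the named fields, then join them with the delimiter. A minimal plain-Java sketch of that idea follows. It does not use Spring Batch classes: `LineAggregatorSketch` and `aggregate` are hypothetical names, and a `String[]` stands in for a `Person`.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Plain-Java illustration only; it does not use Spring Batch classes.
class LineAggregatorSketch {

    // Builds one output line by applying each field extractor and joining with the delimiter.
    static <T> String aggregate(T item, String delimiter, List<Function<T, String>> extractors) {
        return extractors.stream()
                .map(extractor -> extractor.apply(item))
                .collect(Collectors.joining(delimiter));
    }

    public static void main(String[] args) {
        String[] person = {"JILL", "DOE"};             // stand-in for a Person
        List<Function<String[], String>> extractors = List.of(p -> p[0], p -> p[1]);
        System.out.println(aggregate(person, ",", extractors)); // prints "JILL,DOE"
    }
}
```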
3) Processor configuration.
@Override
public Person process(Person sourcePerson) throws Exception {
    // Upper-case both names and return a new Person; the input item is left unchanged.
    final String firstName = sourcePerson.getFirstName().toUpperCase();
    final String lastName = sourcePerson.getLastName().toUpperCase();
    final Person transformedPerson = new Person(firstName, lastName);
    log.info("Converting (" + sourcePerson + ") into (" + transformedPerson + ")");
    return transformedPerson;
}
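Note that when you need to pass parameters into the Processor, a simple approach is to capture the job parameters in a @BeforeStep callback: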
JobParameters jobParameters;

// Called before the step starts; keeps the job parameters for later use in process().
@BeforeStep
public void beforeStep(final StepExecution stepExecution) {
    jobParameters = stepExecution.getJobParameters();
    log.info("jobParameters: {}", jobParameters);
}
4) Listener configuration. This is optional; its main use is to run callback methods before the Job starts and after it finishes.
@Bean
public Job importUserJob(JobCompletionNotificationListener listener, Step step1) {
    JobBuilder builder = jobBuilderFactory.get("importUserJob");
    Job job = builder.incrementer(new RunIdIncrementer())
            .listener(listener)   // register the before/after callbacks
            .start(step1)
            .build();
    return job;
}
@Component
public class JobCompletionNotificationListener extends JobExecutionListenerSupport {

    private static final Logger log = LoggerFactory.getLogger(JobCompletionNotificationListener.class);

    private final JdbcTemplate jdbcTemplate;

    @Autowired
    public JobCompletionNotificationListener(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @Override
    public void beforeJob(JobExecution jobExecution) {
        log.info("job before start...");
    }

    @Override
    public void afterJob(JobExecution jobExecution) {
        if (jobExecution.getStatus() == BatchStatus.COMPLETED) {
            log.info("!!! JOB FINISHED! Time to verify the results");
            // jdbcTemplate.query("SELECT first_name, last_name FROM people",
            //         (resultSet, row) -> new Person(resultSet.getString(1), resultSet.getString(2)))
            //         .forEach(person -> log.info("Found <" + person + "> in the database."));
        }
    }
}
2.2 Launching the Job
JobParameters parameters = new JobParametersBuilder()
        .addString("msg", msg)
        .toJobParameters();
try {
    jobLauncher.run(jobLauncherDemoJob, parameters);
} catch (JobExecutionAlreadyRunningException
        | JobRestartException
        | JobInstanceAlreadyCompleteException
        | JobParametersInvalidException e) {
    // All four launch failures are handled the same way here.
    e.printStackTrace();
}
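2.3 Testing
Appendix:
1. Official reference documentation: Spring Batch - Reference Documentation.
2. If, while the Spring container is initializing, an error reports that the Batch persistence database does not exist,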