介绍Spring Batch 中Tasklet 和 Chunks

最新推荐文章于 2025-03-10 20:00:00 发布

梦想画家

最新推荐文章于 2025-03-10 20:00:00 发布

阅读量2.2w

点赞数 16

分类专栏： # spring 文章标签： Spring Batch Tasklet Chunks

本文链接：https://blog.csdn.net/neweastsun/article/details/88866837

版权

spring 专栏收录该内容

92 篇文章

订阅专栏

介绍Spring Batch 中Tasklet 和 Chunks

Spring Batch 提供了两种不同方式实现job: tasklet 和 chunk。本文通过实例实践两种方法。

示例需求说明

给定输入csv文件内容如下：

Mae Hodges,10/22/1972
Gary Potter,02/22/1953
Betty Wise,02/17/1968
Wayne Rose,04/06/1977
Adam Caldwell,09/27/1995
Lucille Phillips,05/14/1992

每行的第一列表示名称、第二列表示出生日期。我们的示例是需要生成结果文档包括名称和年龄：

Mae Hodges,45
Gary Potter,64
Betty Wise,49
Wayne Rose,40
Adam Caldwell,22
Lucille Phillips,25

现在目标清楚了，让我们开始使用两种方法实现。首先我们使用tasklet。

Tasklet 方式实现

分析与设计

takslet意味着在step中执行单个任务，job有多个step按一定顺序组成，每个步骤应该执行一个具体任务。
我们的job有三个步骤：

从输入csv文件读
对每个输入行数据计算年龄
写姓名和年龄至输出csv文件

框架搭好了，我们开始实现具体每个步骤。

LinesReader负责从输入文件读数据：

public class LinesReader implements Tasklet {
    // ...
}

LinesProcessor负责计算每个人年龄：

public class LinesProcessor implements Tasklet {
    // ...
}

最后，LinesWriter负责名称和年龄至输出文件：

public class LinesWriter implements Tasklet {
    // ...
}

同时，所有步骤都实现Tasklet接口，所以必须实现其方法：

@Override
public RepeatStatus execute(StepContribution stepContribution, 
  ChunkContext chunkContext) throws Exception {
    // ...
}

该方法需实现每个步骤的业务逻辑。开始写详细代码之前，我们先配置job。

###　批处理配置

我们需要在spring上下文中增加一些配置。把前面创建的类增bean声明配置，请看job定义：

@Configuration
@EnableBatchProcessing
public class TaskletsConfig {

    @Autowired private JobBuilderFactory jobs;
    @Autowired private StepBuilderFactory steps;

    @Bean
    public JobLauncherTestUtils jobLauncherTestUtils() {
        return new JobLauncherTestUtils();
    }

    @Bean
    public JobRepository jobRepository() throws Exception {
        MapJobRepositoryFactoryBean factory = new MapJobRepositoryFactoryBean();
        factory.setTransactionManager(transactionManager());
        return (JobRepository) factory.getObject();
    }

    @Bean
    public PlatformTransactionManager transactionManager() {
        return new ResourcelessTransactionManager();
    }

    @Bean
    public JobLauncher jobLauncher() throws Exception {
        SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
        jobLauncher.setJobRepository(jobRepository());
        return jobLauncher;
    }

    @Bean
    public LinesReader linesReader() {
        return new LinesReader();
    }

    @Bean
    public LinesProcessor linesProcessor() {
        return new LinesProcessor();
    }

    @Bean
    public LinesWriter linesWriter() {
        return new LinesWriter();
    }

    @Bean
    protected Step readLines() {
        return steps
          .get("readLines")
          .tasklet(linesReader())
          .build();
    }

    @Bean
    protected Step processLines() {
        return steps
          .get("processLines")
          .tasklet(linesProcessor())
          .build();
    }

    @Bean
    protected Step writeLines() {
        return steps
          .get("writeLines")
          .tasklet(linesWriter())
          .build();
    }

    @Bean
    public Job job() {
        return jobs
          .get("taskletsJob")
          .start(readLines())
          .next(processLines())
          .next(writeLines())
          .build();
    }
}

我们的taskletsJob由三个步骤组成。第一个（readLines）执行lineReader bean的任务，然后到下一个步骤processLines。ProcessLines执行lineProcessor bean的任务，最后执行writeLines步骤。
到目前为止，我们的job流程定义好了，下面准备增加逻辑。

模型对象和工具类

因为需要处理csv文件中的每一行，我们定义Line类：

@Setter
@Getter
@ToString
@AllArgsConstructor
@NoArgsConstructor
public class Line implements Serializable {
    private String name;
    private LocalDate dob;
    private Long age;
}

需要注意的是Line类实现Serializable接口，那是因为Line作为DTO在不同步骤之间传输数据。根据Spring Batch规范，在步骤之间传输的对象必须是可序列化的。另外我们需要考虑如何读写csv文件，这里我们使用OpenCSV工具库：

    implementation 'com.opencsv:openscs:4.5'

加入OpenCSV依赖后，我们实现FileUtils类，提供CSV文件读写方法：

ublic class FileUtils {

    private final Logger logger = LoggerFactory.getLogger(FileUtils.class);

    private String fileName;
    private CSVReader CSVReader;
    private CSVWriter CSVWriter;
    private FileReader fileReader;
    private FileWriter fileWriter;
    private File file;

    public FileUtils(String fileName) {
        this.fileName = fileName;
    }

    public Line readLine() {
        try {
            if (CSVReader == null) initReader();
            String[] line = CSVReader.readNext();
            if (line == null) return null;
            return new Line(line[0], LocalDate.parse(line[1], DateTimeFormatter.ofPattern("MM/dd/yyyy")));
        } catch (Exception e) {
            logger.error("Error while reading line in file: " + this.fileName);
            return null;
        }
    }

    public void writeLine(Line line) {
        try {
            if (CSVWriter == null) initWriter();
            String[] lineStr = new String[2];
            lineStr[0] = line.getName();
            lineStr[1] = line
              .getAge()
              .toString();
            CSVWriter.writeNext(lineStr);
        } catch (Exception e) {
            logger.error("Error while writing line in file: " + this.fileName);
        }
    }

    private void initReader() throws Exception {
        ClassLoader classLoader = this
          .getClass()
          .getClassLoader();
        if (file == null) file = new File(classLoader
          .getResource(fileName)
          .getFile());
        if (fileReader == null) fileReader = new FileReader(file);
        if (CSVReader == null) CSVReader = new CSVReader(fileReader);
    }

    private void initWriter() throws Exception {
        if (file == null) {
            file = new File(fileName);
            file.createNewFile();
        }
        if (fileWriter == null) fileWriter = new FileWriter(file, true);
        if (CSVWriter == null) CSVWriter = new CSVWriter(fileWriter);
    }

    public void closeWriter() {
        try {
            CSVWriter.close();
            fileWriter.close();
        } catch (IOException e) {
            logger.error("Error while closing writer.");
        }
    }

    public void closeReader() {
        try {
            CSVReader.close();
            fileReader.close();
        } catch (IOException e) {
            logger.error("Error while closing reader.");
        }
    }

}

readLine实际上是OpenCSV的readNext方法的包装器负责返回Line对象。同理，writeLine封装了OpenCSV的writeNext方法，接收Line并写入文件。下面实现每个步骤的业务逻辑。

LinesReader

实现LinesReader 类的完整逻辑：

public class LinesReader implements Tasklet, StepExecutionListener {
 
    private final Logger logger = LoggerFactory
      .getLogger(LinesReader.class);
 
    private List<Line> lines;
    private FileUtils fu;
 
    @Override
    public void beforeStep(StepExecution stepExecution) {
        lines = new ArrayList<>();
        fu = new FileUtils(
          "taskletsvschunks/input/tasklets-vs-chunks.csv");
        logger.debug("Lines Reader initialized.");
    }
 
    @Override
    public RepeatStatus execute(StepContribution stepContribution, 
      ChunkContext chunkContext) throws Exception {
        Line line = fu.readLine();
        while (line != null) {
            lines.add(line);
            logger.debug("Read line: " + line.toString());
            line = fu.readLine();
        }
        return RepeatStatus.FINISHED;
    }
 
    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        fu.closeReader();
        stepExecution
          .getJobExecution()
          .getExecutionContext()
          .put("lines", this.lines);
        logger.debug("Lines Reader ended.");
        return ExitStatus.COMPLETED;
    }
}

LinesReader的execute方法针对输入文件path创建CSVFileUtils实例，然后增加Line对象至list，直至没有数据可读。同时也实现StepExecutionListener接口，并实现其两个方法：beforeStep 和 afterStep，主要实现初始化或关闭资源。另外，afterStep方法中list（存储读取的数据：lines）被放入job上下文中，使其在下一个步骤中可用：

stepExecution
  .getJobExecution()
  .getExecutionContext()
  .put("lines", this.lines);

至此，第一步已经完成其职责：加载csv文件至内存（list中），下面继续第二步处理数据。

###　LinesProcessor

LinesProcessor　当然也实现StepExecutionListener和Tasklet接口,因此需要实现 beforeStep、 execute 、afterStep三个方法：

public class LinesProcessor implements Tasklet, StepExecutionListener {
 
    private Logger logger = LoggerFactory.getLogger(
      LinesProcessor.class);
 
    private List<Line> lines;
 
    @Override
    public void beforeStep(StepExecution stepExecution) {
        ExecutionContext executionContext = stepExecution
          .getJobExecution()
          .getExecutionContext();
        this.lines = (List<Line>) executionContext.get("lines");
        logger.debug("Lines Processor initialized.");
    }
 
    @Override
    public RepeatStatus execute(StepContribution stepContribution, 
      ChunkContext chunkContext) throws Exception {
        for (Line line : lines) {
            long age = ChronoUnit.YEARS.between(
              line.getDob(), 
              LocalDate.now());
            logger.debug("Calculated age " + age + " for line " + line.toString());
            line.setAge(age);
        }
        return RepeatStatus.FINISHED;
    }
 
    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        logger.debug("Lines Processor ended.");
        return ExitStatus.COMPLETED;
    }
}

很容易理解，其实现从job上下文中加载数据并计算每人的年龄。不需要再次把结果存入job上下文，因为其已经存在，我们仅修改了数据（增加了年龄）。下面开始最后一步。

LinesWriter

LinesWriter任务就是遍历list中的每项数据并写入姓名和年龄至文件：

public class LinesWriter implements Tasklet, StepExecutionListener {
 
    private final Logger logger = LoggerFactory
      .getLogger(LinesWriter.class);
 
    private List<Line> lines;
    private FileUtils fu;
 
    @Override
    public void beforeStep(StepExecution stepExecution) {
        ExecutionContext executionContext = stepExecution
          .getJobExecution()
          .getExecutionContext();
        this.lines = (List<Line>) executionContext.get("lines");
        fu = new FileUtils("output.csv");
        logger.debug("Lines Writer initialized.");
    }
 
    @Override
    public RepeatStatus execute(StepContribution stepContribution, 
      ChunkContext chunkContext) throws Exception {
        for (Line line : lines) {
            fu.writeLine(line);
            logger.debug("Wrote line " + line.toString());
        }
        return RepeatStatus.FINISHED;
    }
 
    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        fu.closeWriter();
        logger.debug("Lines Writer ended.");
        return ExitStatus.COMPLETED;
    }
}

至此已经实现了所有功能，下面跑测试验证结果。

运行job

定义测试配置类：

@Configuration
public class TaskletsTestConfig {
    @Bean
    public JobLauncherTestUtils jobLauncherTestUtils() {
        return new JobLauncherTestUtils();
    }
}

通过测试类运行：

@RunWith(SpringRunner.class)
@ContextConfiguration(classes = {TaskletsConfig.class, TaskletsTestConfig.class})
public class TaskletApplicationTests {
    @Autowired
    private JobLauncherTestUtils jobLauncherTestUtils;

    @Test
    public void givenTaskletsJob_whenJobEnds_thenStatusCompleted() throws Exception {
        JobExecution jobExecution = jobLauncherTestUtils.launchJob();
        assertEquals(ExitStatus.COMPLETED, jobExecution.getExitStatus());
    }
}

ContextConfiguration注解指定spring上下文配置及JobLauncherTestUtils测试类。

所有都好了，开始运行测试。运行完成之后，output.csv有了期望的内容，日志显示如下：

[main] DEBUG o.b.t.tasklets.LinesReader - Lines Reader initialized.
[main] DEBUG o.b.t.tasklets.LinesReader - Read line: [Mae Hodges,10/22/1972]
[main] DEBUG o.b.t.tasklets.LinesReader - Read line: [Gary Potter,02/22/1953]
[main] DEBUG o.b.t.tasklets.LinesReader - Read line: [Betty Wise,02/17/1968]
[main] DEBUG o.b.t.tasklets.LinesReader - Read line: [Wayne Rose,04/06/1977]
[main] DEBUG o.b.t.tasklets.LinesReader - Read line: [Adam Caldwell,09/27/1995]
[main] DEBUG o.b.t.tasklets.LinesReader - Read line: [Lucille Phillips,05/14/1992]
[main] DEBUG o.b.t.tasklets.LinesReader - Lines Reader ended.
[main] DEBUG o.b.t.tasklets.LinesProcessor - Lines Processor initialized.
[main] DEBUG o.b.t.tasklets.LinesProcessor - Calculated age 45 for line [Mae Hodges,10/22/1972]
[main] DEBUG o.b.t.tasklets.LinesProcessor - Calculated age 64 for line [Gary Potter,02/22/1953]
[main] DEBUG o.b.t.tasklets.LinesProcessor - Calculated age 49 for line [Betty Wise,02/17/1968]
[main] DEBUG o.b.t.tasklets.LinesProcessor - Calculated age 40 for line [Wayne Rose,04/06/1977]
[main] DEBUG o.b.t.tasklets.LinesProcessor - Calculated age 22 for line [Adam Caldwell,09/27/1995]
[main] DEBUG o.b.t.tasklets.LinesProcessor - Calculated age 25 for line [Lucille Phillips,05/14/1992]
[main] DEBUG o.b.t.tasklets.LinesProcessor - Lines Processor ended.
[main] DEBUG o.b.t.tasklets.LinesWriter - Lines Writer initialized.
[main] DEBUG o.b.t.tasklets.LinesWriter - Wrote line [Mae Hodges,10/22/1972,45]
[main] DEBUG o.b.t.tasklets.LinesWriter - Wrote line [Gary Potter,02/22/1953,64]
[main] DEBUG o.b.t.tasklets.LinesWriter - Wrote line [Betty Wise,02/17/1968,49]
[main] DEBUG o.b.t.tasklets.LinesWriter - Wrote line [Wayne Rose,04/06/1977,40]
[main] DEBUG o.b.t.tasklets.LinesWriter - Wrote line [Adam Caldwell,09/27/1995,22]
[main] DEBUG o.b.t.tasklets.LinesWriter - Wrote line [Lucille Phillips,05/14/1992,25]
[main] DEBUG o.b.t.tasklets.LinesWriter - Lines Writer ended.

Chunk方法

分析与设计

见名思议，该方法基于数据块（一部分数据）执行。也就是说，其不是一次读、处理和写所有行，而是一次仅读、处理、写固定数量记录。然后重复循环执行直到读不到数据为止。
因此，此流程与上面有些差异：
while 有数据：
do X 行数据
读一行
处理一行
写 X 行数据
end while

所以，我们需要针对该方法创建三个bean：

public class LineReader {
     // ...
}

public class LineProcessor {
    // ...
}

public class LinesWriter {
    // ...
}

在具体实现业务之前，我们首先配置job：

@Configuration
@EnableBatchProcessing
public class ChunksConfig {
    private final JobBuilderFactory jobs;
    private final StepBuilderFactory steps;

    public ChunksConfig(JobBuilderFactory jobs, StepBuilderFactory steps) {
        this.jobs = jobs;
        this.steps = steps;
    }

    @Bean
    public ItemReader<Line> itemReader() {
        return new LineReader();
    }

    @Bean
    public ItemProcessor<Line, Line> itemProcessor() {
        return new LineProcessor();
    }

    @Bean
    public ItemWriter<Line> itemWriter() {
        return new LinesWriter();
    }

    @Bean
    protected Step processLines(ItemReader<Line> reader,
                                ItemProcessor<Line, Line> processor, ItemWriter<Line> writer) {
        return steps.get("processLines").<Line, Line> chunk(2)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .build();
    }

    @Bean
    public Job job() {
        return jobs
            .get("chunksJob")
            .start(processLines(itemReader(), itemProcessor(), itemWriter()))
            .build();
    }
}

这里我们仅一个步骤执行一个任务。但是任务中定义了基于数据库进行读、处理以及写环节。
需要提醒的是，提交量表示在一个数据块中处理数据的数量。因此我们会一次读、处理、写两行。
好了，下面该实现每个环节的业务逻辑。

LineReader

LineReader负责读一条记录，然后返回Line实例。为了实现读环节，需实现ItemReader接口：

public class LineReader implements
  ItemReader<Line>, StepExecutionListener {
 
    private final Logger logger = LoggerFactory
      .getLogger(LineReader.class);
  
    private FileUtils fu;
 
    @Override
    public void beforeStep(StepExecution stepExecution) {
        fu = new FileUtils("taskletsvschunks/input/tasklets-vs-chunks.csv");
        logger.debug("Line Reader initialized.");
    }
 
    @Override
    public Line read() throws Exception {
        Line line = fu.readLine();
        if (line != null) logger.debug("Read line: " + line.toString());
        return line;
    }
 
    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        fu.closeReader();
        logger.debug("Line Reader ended.");
        return ExitStatus.COMPLETED;
    }
}

read()方法很直接，读一行数据并返回Line实例。同时我们也实现了StepExecutionListener接口，其中beforeStep 和 afterStep方法在整个步骤之前和之后运行，负责初始化资源和关闭资源。

LineProcessor

LineProcessor遵循的逻辑与LineReader几乎相同，需要实现process方法，其接收一个输入line，处理并返回输出line对象。

public class LineProcessor implements
  ItemProcessor<Line, Line>, StepExecutionListener {
 
    private Logger logger = LoggerFactory.getLogger(LineProcessor.class);
 
    @Override
    public void beforeStep(StepExecution stepExecution) {
        logger.debug("Line Processor initialized.");
    }
     
    @Override
    public Line process(Line line) throws Exception {
        long age = ChronoUnit.YEARS
          .between(line.getDob(), LocalDate.now());
        logger.debug(
          "Calculated age " + age + " for line " + line.toString());
        line.setAge(age);
        return line;
    }
 
    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        logger.debug("Line Processor ended.");
        return ExitStatus.COMPLETED;
    }
}

同时也实现StepExecutionListener接口，这里仅用于输出日志，方便理解执行机制。

###　LinesWriter

与reader 和 processor 不同，LinesWriter写数据块中所有行，因此write方法接收List参数。

public class LinesWriter implements
  ItemWriter<Line>, StepExecutionListener {
 
    private final Logger logger = LoggerFactory
      .getLogger(LinesWriter.class);
  
    private FileUtils fu;
 
    @Override
    public void beforeStep(StepExecution stepExecution) {
        fu = new FileUtils("output.csv");
        logger.debug("Line Writer initialized.");
    }
 
    @Override
    public void write(List<? extends Line> lines) throws Exception {
        for (Line line : lines) {
            fu.writeLine(line);
            logger.debug("Wrote line " + line.toString());
        }
    }
 
    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        fu.closeWriter();
        logger.debug("Line Writer ended.");
        return ExitStatus.COMPLETED;
    }
}

其他代码几乎是自描述的，下面开始测试job。

运行并测试job

我们创建一个新的测试，与之前测试tasklet方法一致：

@RunWith(SpringRunner.class)
@ContextConfiguration(classes = {ChunksConfig.class, TaskletsTestConfig.class})
public class ChunksTest {

    @Autowired
    private JobLauncherTestUtils jobLauncherTestUtils;

    @Test
    public void givenChunksJob_whenJobEnds_thenStatusCompleted()
    throws Exception {

        JobExecution jobExecution = jobLauncherTestUtils.launchJob();

        assertEquals(ExitStatus.COMPLETED, jobExecution.getExitStatus());
    }
}

配置了ChunksConfig之后，开始运行测试。job正常运行后，生成了output.csv文件，包含期望结果，且日志输出如下：

[main] DEBUG o.b.t.chunks.LineReader - Line Reader initialized.
[main] DEBUG o.b.t.chunks.LinesWriter - Line Writer initialized.
[main] DEBUG o.b.t.chunks.LineProcessor - Line Processor initialized.
[main] DEBUG o.b.t.chunks.LineReader - Read line: [Mae Hodges,10/22/1972]
[main] DEBUG o.b.t.chunks.LineReader - Read line: [Gary Potter,02/22/1953]
[main] DEBUG o.b.t.chunks.LineProcessor - Calculated age 45 for line [Mae Hodges,10/22/1972]
[main] DEBUG o.b.t.chunks.LineProcessor - Calculated age 64 for line [Gary Potter,02/22/1953]
[main] DEBUG o.b.t.chunks.LinesWriter - Wrote line [Mae Hodges,10/22/1972,45]
[main] DEBUG o.b.t.chunks.LinesWriter - Wrote line [Gary Potter,02/22/1953,64]
[main] DEBUG o.b.t.chunks.LineReader - Read line: [Betty Wise,02/17/1968]
[main] DEBUG o.b.t.chunks.LineReader - Read line: [Wayne Rose,04/06/1977]
[main] DEBUG o.b.t.chunks.LineProcessor - Calculated age 49 for line [Betty Wise,02/17/1968]
[main] DEBUG o.b.t.chunks.LineProcessor - Calculated age 40 for line [Wayne Rose,04/06/1977]
[main] DEBUG o.b.t.chunks.LinesWriter - Wrote line [Betty Wise,02/17/1968,49]
[main] DEBUG o.b.t.chunks.LinesWriter - Wrote line [Wayne Rose,04/06/1977,40]
[main] DEBUG o.b.t.chunks.LineReader - Read line: [Adam Caldwell,09/27/1995]
[main] DEBUG o.b.t.chunks.LineReader - Read line: [Lucille Phillips,05/14/1992]
[main] DEBUG o.b.t.chunks.LineProcessor - Calculated age 22 for line [Adam Caldwell,09/27/1995]
[main] DEBUG o.b.t.chunks.LineProcessor - Calculated age 25 for line [Lucille Phillips,05/14/1992]
[main] DEBUG o.b.t.chunks.LinesWriter - Wrote line [Adam Caldwell,09/27/1995,22]
[main] DEBUG o.b.t.chunks.LinesWriter - Wrote line [Lucille Phillips,05/14/1992,25]
[main] DEBUG o.b.t.chunks.LineProcessor - Line Processor ended.
[main] DEBUG o.b.t.chunks.LinesWriter - Line Writer ended.
[main] DEBUG o.b.t.chunks.LineReader - Line Reader ended.

两个方法结果相同，但流程不同，日志结果显示两个方法的执行流程的差异。