Image credit: Spring Source
In Spring Batch, "partitioning" means "multiple threads each process a range of data". For example, assume you have 100 records in a table, with a primary-key `id` column running from 1 to 100, and you want to process all 100 records.

Normally, a single thread processes the records from 1 to 100 sequentially. Suppose this takes 10 minutes to complete.
Single Thread - Process from 1 to 100
With partitioning, we can start 10 threads, each processing 10 records (based on a range of `id`). Now the same job may take only 1 minute to complete.
Thread 1 - Process from 1 to 10
Thread 2 - Process from 11 to 20
Thread 3 - Process from 21 to 30
......
Thread 9 - Process from 81 to 90
Thread 10 - Process from 91 to 100
To implement the partitioning technique, you must understand the structure of the input data you want to process, so that you can plan the data ranges appropriately.
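The range planning above can be sketched in plain Java. This is an illustrative helper (`RangePlanner` and its `plan` method are not part of the tutorial's source); unlike the tutorial's partitioner, which hardcodes a range of 10, it derives the per-thread range from the id bounds and the thread count:

```java
// Illustrative sketch: compute per-thread [fromId, toId] ranges for a
// contiguous id column, given the id bounds and the number of threads.
public class RangePlanner {

    public static int[][] plan(int minId, int maxId, int gridSize) {
        int total = maxId - minId + 1;
        // records per partition, rounded up so the last range may be shorter
        int range = (int) Math.ceil((double) total / gridSize);

        int[][] ranges = new int[gridSize][2];
        int fromId = minId;
        for (int i = 0; i < gridSize; i++) {
            int toId = Math.min(fromId + range - 1, maxId);
            ranges[i][0] = fromId;
            ranges[i][1] = toId;
            fromId = toId + 1;
        }
        return ranges;
    }

    public static void main(String[] args) {
        int[][] r = plan(1, 100, 10);
        System.out.println(r[0][0] + " - " + r[0][1]); // 1 - 10
        System.out.println(r[9][0] + " - " + r[9][1]); // 91 - 100
    }
}
```

With 100 ids and 10 threads this reproduces the ranges listed above; changing either bound or the thread count re-plans the ranges without touching the partitioner logic.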
1. Tutorial
In this tutorial, we will show you how to create a partitioner job that starts 10 threads, each reading records from the database based on the `id` range it is given.
Tools and libraries used:
- Maven 3
- Eclipse 4.2
- JDK 1.6
- Spring Core 3.2.2.RELEASE
- Spring Batch 2.2.0.RELEASE
- MySQL Java Driver 5.1.25
P.S. Assume the `users` table has 100 records.
id, user_login, user_pass, age
1,user_1,pass_1,20
2,user_2,pass_2,40
3,user_3,pass_3,70
4,user_4,pass_4,5
5,user_5,pass_5,52
......
99,user_99,pass_99,89
100,user_100,pass_100,76
2. Project Directory Structure
Review the final project structure; it is a standard Maven project.
3. Partitioner
First, create a `Partitioner` implementation that puts the partitioning range into the `ExecutionContext`. Later, you will declare the same `fromId` and `toId` in the batch job XML file.
In this case, the partitioning ranges look like this:
Thread 1 = 1 - 10
Thread 2 = 11 - 20
Thread 3 = 21 - 30
......
Thread 10 = 91 - 100
package com.mkyong.partition;

import java.util.HashMap;
import java.util.Map;

import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;

public class RangePartitioner implements Partitioner {

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {

        Map<String, ExecutionContext> result
            = new HashMap<String, ExecutionContext>();

        int range = 10;
        int fromId = 1;
        int toId = range;

        for (int i = 1; i <= gridSize; i++) {
            ExecutionContext value = new ExecutionContext();

            System.out.println("\nStarting : Thread" + i);
            System.out.println("fromId : " + fromId);
            System.out.println("toId : " + toId);

            value.putInt("fromId", fromId);
            value.putInt("toId", toId);

            // give each thread a name : Thread1, Thread2, Thread3 ...
            value.putString("name", "Thread" + i);

            result.put("partition" + i, value);

            fromId = toId + 1;
            toId += range;
        }

        return result;
    }
}
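Note that the partitioner above hardcodes `range = 10`, so `gridSize × range` must cover the table's id span exactly; otherwise some rows are silently skipped or read twice. A small Spring-free sketch (`RangeCheck` is illustrative, not part of the tutorial's source) that validates the generated ranges using the same loop arithmetic as the partitioner:

```java
import java.util.ArrayList;
import java.util.List;

// Validates that fixed-size partition ranges cover [min, max] with no gaps
// or overlaps. The range arithmetic mirrors RangePartitioner.partition();
// the check itself is plain Java, independent of Spring Batch.
public class RangeCheck {

    public static boolean covers(int gridSize, int range, int min, int max) {
        List<int[]> ranges = new ArrayList<>();
        int fromId = min;
        int toId = min + range - 1;
        for (int i = 1; i <= gridSize; i++) {   // same loop as the partitioner
            ranges.add(new int[] { fromId, toId });
            fromId = toId + 1;
            toId += range;
        }

        int expected = min;
        for (int[] r : ranges) {
            if (r[0] != expected) {
                return false;                    // gap or overlap at this boundary
            }
            expected = r[1] + 1;
        }
        return expected == max + 1;              // last range ends exactly at max
    }

    public static void main(String[] args) {
        System.out.println(covers(10, 10, 1, 100)); // true: 10 ranges cover 1..100
        System.out.println(covers(9, 10, 1, 100));  // false: ids 91..100 uncovered
    }
}
```

Running a check like this once against the real id bounds is cheaper than debugging a job that quietly dropped a slice of the table.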
4. Batch Job
Review the batch job XML file; it should be self-explanatory. A few points to highlight:
- For the partitioner, grid-size = the number of threads.
- For the pagingItemReader bean (a JDBC reader example), the #{stepExecutionContext[fromId]} and #{stepExecutionContext[toId]} values will be injected via the ExecutionContext populated in rangePartitioner.
- For the itemProcessor bean, the #{stepExecutionContext[name]} value will be injected via the ExecutionContext populated in RangePartitioner.
- For the writer, each thread will output its records to a different CSV file, named in the format users.processed[fromId]-[toId].csv.
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:batch="http://www.springframework.org/schema/batch"
    xmlns:util="http://www.springframework.org/schema/util"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.springframework.org/schema/batch
        http://www.springframework.org/schema/batch/spring-batch-2.2.xsd
        http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans-3.2.xsd
        http://www.springframework.org/schema/util
        http://www.springframework.org/schema/util/spring-util-3.2.xsd
        ">

    <!-- spring batch core settings -->
    <import resource="../config/context.xml" />

    <!-- database settings -->
    <import resource="../config/database.xml" />

    <!-- partitioner job -->
    <job id="partitionJob" xmlns="http://www.springframework.org/schema/batch">

        <!-- master step, 10 threads (grid-size) -->
        <step id="masterStep">
            <partition step="slave" partitioner="rangePartitioner">
                <handler grid-size="10" task-executor="taskExecutor" />
            </partition>
        </step>

    </job>

    <!-- each thread will run this step, with different stepExecutionContext values -->
    <step id="slave" xmlns="http://www.springframework.org/schema/batch">
        <tasklet>
            <chunk reader="pagingItemReader" writer="flatFileItemWriter"
                processor="itemProcessor" commit-interval="1" />
        </tasklet>
    </step>

    <bean id="rangePartitioner" class="com.mkyong.partition.RangePartitioner" />

    <bean id="taskExecutor" class="org.springframework.core.task.SimpleAsyncTaskExecutor" />

    <!-- inject stepExecutionContext -->
    <bean id="itemProcessor" class="com.mkyong.processor.UserProcessor" scope="step">
        <property name="threadName" value="#{stepExecutionContext[name]}" />
    </bean>

    <bean id="pagingItemReader"
        class="org.springframework.batch.item.database.JdbcPagingItemReader"
        scope="step">
        <property name="dataSource" ref="dataSource" />
        <property name="queryProvider">
            <bean
                class="org.springframework.batch.item.database.support.SqlPagingQueryProviderFactoryBean">
                <property name="dataSource" ref="dataSource" />
                <property name="selectClause" value="select id, user_login, user_pass, age" />
                <property name="fromClause" value="from users" />
                <property name="whereClause" value="where id >= :fromId and id &lt;= :toId" />
                <property name="sortKey" value="id" />
            </bean>
        </property>
        <!-- Inject via the ExecutionContext in rangePartitioner -->
        <property name="parameterValues">
            <map>
                <entry key="fromId" value="#{stepExecutionContext[fromId]}" />
                <entry key="toId" value="#{stepExecutionContext[toId]}" />
            </map>
        </property>
        <property name="pageSize" value="10" />
        <property name="rowMapper">
            <bean class="com.mkyong.UserRowMapper" />
        </property>
    </bean>

    <!-- csv file writer -->
    <bean id="flatFileItemWriter" class="org.springframework.batch.item.file.FlatFileItemWriter"
        scope="step">
        <property name="resource"
            value="file:csv/outputs/users.processed#{stepExecutionContext[fromId]}-#{stepExecutionContext[toId]}.csv" />
        <property name="appendAllowed" value="false" />
        <property name="lineAggregator">
            <bean
                class="org.springframework.batch.item.file.transform.DelimitedLineAggregator">
                <property name="delimiter" value="," />
                <property name="fieldExtractor">
                    <bean
                        class="org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor">
                        <property name="names" value="id, username, password, age" />
                    </bean>
                </property>
            </bean>
        </property>
    </bean>

</beans>
The item processor class simply prints the item being processed together with the name of the thread that is currently running.
package com.mkyong.processor;

import org.springframework.batch.item.ItemProcessor;

import com.mkyong.User;

public class UserProcessor implements ItemProcessor<User, User> {

    private String threadName;

    @Override
    public User process(User item) throws Exception {

        System.out.println(threadName + " processing : "
            + item.getId() + " : " + item.getUsername());

        return item;
    }

    public String getThreadName() {
        return threadName;
    }

    public void setThreadName(String threadName) {
        this.threadName = threadName;
    }
}
5. Run It
Load everything and run it. 10 threads will be started to process the supplied ranges of data.
package com.mkyong;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;

public class PartitionApp {

    public static void main(String[] args) {
        PartitionApp obj = new PartitionApp();
        obj.runTest();
    }

    private void runTest() {

        String[] springConfig = { "spring/batch/jobs/job-partitioner.xml" };

        ApplicationContext context = new ClassPathXmlApplicationContext(springConfig);

        JobLauncher jobLauncher = (JobLauncher) context.getBean("jobLauncher");
        Job job = (Job) context.getBean("partitionJob");

        try {
            JobExecution execution = jobLauncher.run(job, new JobParameters());
            System.out.println("Exit Status : " + execution.getStatus());
            System.out.println("Exit Status : " + execution.getAllFailureExceptions());
        } catch (Exception e) {
            e.printStackTrace();
        }

        System.out.println("Done");
    }
}
Console output:
Starting : Thread1
fromId : 1
toId : 10
Starting : Thread2
fromId : 11
toId : 20
Starting : Thread3
fromId : 21
toId : 30
Starting : Thread4
fromId : 31
toId : 40
Starting : Thread5
fromId : 41
toId : 50
Starting : Thread6
fromId : 51
toId : 60
Starting : Thread7
fromId : 61
toId : 70
Starting : Thread8
fromId : 71
toId : 80
Starting : Thread9
fromId : 81
toId : 90
Starting : Thread10
fromId : 91
toId : 100
Thread8 processing : 71 : user_71
Thread2 processing : 11 : user_11
Thread3 processing : 21 : user_21
Thread10 processing : 91 : user_91
Thread4 processing : 31 : user_31
Thread6 processing : 51 : user_51
Thread5 processing : 41 : user_41
Thread1 processing : 1 : user_1
Thread9 processing : 81 : user_81
Thread7 processing : 61 : user_61
Thread2 processing : 12 : user_12
Thread7 processing : 62 : user_62
Thread6 processing : 52 : user_52
Thread1 processing : 2 : user_2
Thread9 processing : 82 : user_82
......
After the process completes, 10 CSV files will be created. For example, the file for the first range contains:
1,user_1,pass_1,20
2,user_2,pass_2,40
3,user_3,pass_3,70
4,user_4,pass_4,5
5,user_5,pass_5,52
6,user_6,pass_6,69
7,user_7,pass_7,48
8,user_8,pass_8,34
9,user_9,pass_9,62
10,user_10,pass_10,21
6. Misc
6.1 Alternatively, you can inject #{stepExecutionContext[name]} via annotations.
package com.mkyong.processor;

import org.springframework.batch.item.ItemProcessor;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Scope;
import org.springframework.stereotype.Component;

import com.mkyong.User;

@Component("itemProcessor")
@Scope(value = "step")
public class UserProcessor implements ItemProcessor<User, User> {

    @Value("#{stepExecutionContext[name]}")
    private String threadName;

    @Override
    public User process(User item) throws Exception {

        System.out.println(threadName + " processing : "
            + item.getId() + " : " + item.getUsername());

        return item;
    }
}
Remember to enable Spring component auto-scanning:
<context:component-scan base-package="com.mkyong" />
6.2 Database partition reader – a MongoDB example:
<bean id="mongoItemReader" class="org.springframework.batch.item.data.MongoItemReader"
    scope="step">
    <property name="template" ref="mongoTemplate" />
    <property name="targetType" value="com.mkyong.User" />
    <property name="query"
        value="{
            'id' : { $gt : #{stepExecutionContext[fromId]}, $lte : #{stepExecutionContext[toId]} }
        }" />
    <property name="sort">
        <util:map id="sort">
            <entry key="id" value="" />
        </util:map>
    </property>
</bean>
Done.
Download Source Code
Download it – SpringBatch-Partitioner-Example.zip (31 KB)
Reference
Original article: https://mkyong.com/spring-batch/spring-batch-partitioning-example/