Spring Batch Framework: Introduction Chapter (Part 1)

Batch processes are hard to write, especially when using a general-purpose language like Java. Yet batch jobs run every night, making it easy for millions of people to do things like banking, online shopping, and querying billing information.

Spring Batch is a Java framework that makes it easy to write batch applications. Batch applications involve reliably and efficiently processing large volumes of data to and from various data sources (files, databases, and so on). Spring Batch is great at doing this and provides the necessary foundation to meet the stringent requirements of batch applications. Sir Isaac Newton said, “If I have seen further it is only by standing on the shoulders of giants.” Spring Batch builds on the shoulders of one giant in particular: the Spring Framework. Spring is the framework of choice for a significant segment of the enterprise Java development market. Spring Batch makes the Spring programming model, based on simplicity and efficiency, easier to apply to batch applications.

What are batch applications? Batch applications process large amounts of data without human intervention. You’d opt to use batch applications to compute data for generating monthly financial statements, calculating statistics, and indexing files.

The most common scenario for a batch application is exporting data to files from one system and processing them in another. A batch application processes data automatically, so it must be robust and reliable, because there is no human interaction to recover from an error. The greater the volume of data a batch application must process, the longer it takes to complete. This means you must also consider performance in your batch application, because it’s often restricted to execute within a specific time window. Every day, large and complex calculations take place to index billions of documents, using cutting-edge algorithms like MapReduce. For data exchange, message-based solutions are also popular, having the advantage over batch applications of being (close to) real time.




The goal of the Spring Batch project is to provide an open source batch-oriented framework that effectively addresses the most common needs of batch applications.

Spring Batch isn’t a scheduler!

Spring Batch drives batch jobs (we use the terms job, batch, and process interchangeably) but doesn’t provide advanced support to launch them according to a schedule. Spring Batch leaves this task to dedicated schedulers like Quartz and cron. A scheduler triggers the launching of Spring Batch jobs by accessing the Spring Batch runtime (as Quartz can, because it’s a Java solution) or by launching a dedicated JVM process (in the case of cron). Sometimes a scheduler launches batch jobs in sequence: first job A, and then job B if A succeeded, or job C if A failed. The scheduler can use the files generated by the jobs or their exit codes to orchestrate the sequence. Spring Batch can also orchestrate such sequences itself: Spring Batch jobs are made of steps, and you can easily configure the sequence by using Spring Batch XML.
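The "A, then B or C" sequence described above can be sketched in a few lines of plain Java. This is only an illustration of the decision logic: `JobSequenceSketch`, `ExitStatus`, and `run` are hypothetical names invented for this example, not part of Spring Batch, Quartz, or cron.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical names throughout; plain Java, not the Spring Batch API.
class JobSequenceSketch {

    // Exit status a job reports to whoever orchestrates the sequence.
    enum ExitStatus { COMPLETED, FAILED }

    // Runs job A, then job B if A completed, or job C if A failed.
    // Returns the names of the jobs that were launched, in order.
    static List<String> run(Supplier<ExitStatus> jobA,
                            Supplier<ExitStatus> jobB,
                            Supplier<ExitStatus> jobC) {
        List<String> launched = new ArrayList<>();
        launched.add("A");
        if (jobA.get() == ExitStatus.COMPLETED) {
            launched.add("B");
            jobB.get();
        } else {
            launched.add("C");
            jobC.get();
        }
        return launched;
    }

    public static void main(String[] args) {
        // Job A fails here, so job C runs instead of job B.
        System.out.println(run(() -> ExitStatus.FAILED,
                               () -> ExitStatus.COMPLETED,
                               () -> ExitStatus.COMPLETED)); // prints [A, C]
    }
}
```

Whether this logic lives in an external scheduler or in the job's own step configuration, the idea is the same: the outcome of one unit of work selects the next one.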

Should a whole batch fail because of one badly formatted line? Not always. The decision to skip an incorrect line or an incorrect item is declarative in Spring Batch. It’s all about configuration. Components can track everything they do, and the framework provides them with the execution data on restart. The components then know where they left off and can restart processing at the right place.
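To make skipping and restarting more concrete, here is a minimal plain-Java sketch. It assumes the input is a list of lines that should each parse as an integer; `SkipAndRestartSketch`, `ExecutionData`, and `skipLimit` are hypothetical names for this example. In Spring Batch the equivalent behavior is configured rather than hand-written, and the framework stores the execution data for you.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical names; plain Java illustrating the idea, not the Spring Batch API.
class SkipAndRestartSketch {

    // Execution data that survives between runs.
    static class ExecutionData {
        int nextLine = 0;   // index of the first line not yet processed
        int skipCount = 0;  // badly formatted lines skipped so far
    }

    // Parses each line as an integer, starting where the previous run stopped.
    // Bad lines are skipped until skipLimit is reached; after that the run fails
    // and leaves data.nextLine pointing at the offending line.
    static void process(List<String> lines, ExecutionData data,
                        int skipLimit, List<Integer> out) {
        while (data.nextLine < lines.size()) {
            String line = lines.get(data.nextLine);
            try {
                out.add(Integer.parseInt(line.trim()));
            } catch (NumberFormatException e) {
                if (data.skipCount >= skipLimit) {
                    throw new IllegalStateException(
                            "Skip limit exceeded at line " + data.nextLine, e);
                }
                data.skipCount++;
            }
            data.nextLine++;
        }
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("1", "2", "oops", "4");
        ExecutionData data = new ExecutionData();
        List<Integer> out = new ArrayList<>();
        try {
            process(lines, data, 0, out); // no skips allowed: fails on "oops"
        } catch (IllegalStateException e) {
            System.out.println("failed at line " + data.nextLine);
        }
        process(lines, data, 1, out);     // restart, now tolerating one bad line
        System.out.println(out);
    }
}
```

The second call does not reprocess the first two lines: it resumes at the position recorded in `ExecutionData`, which is the role the execution data plays on restart.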


Spring Batch processes items in chunks. A job reads and writes items in small chunks. Chunk processing allows streaming data instead of loading all the data in memory. By default, chunk processing is single-threaded and usually performs well. But some batch jobs need to execute faster, so Spring Batch provides ways to make chunk processing multi-threaded and to distribute processing across multiple physical nodes.
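The chunk-oriented loop can be pictured roughly as follows. This is a single-threaded plain-Java sketch, not Spring Batch code: `ChunkSketch` and `processInChunks` are hypothetical names, an `Iterator` stands in for the reader, and a `Consumer` stands in for the writer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical names; plain Java illustrating the idea, not the Spring Batch API.
class ChunkSketch {

    // Reads items one at a time and hands them to the writer in chunks of at
    // most chunkSize items, so only one chunk is held in memory at a time.
    // Returns the number of chunks written.
    static <T> int processInChunks(Iterator<T> reader, int chunkSize,
                                   Consumer<List<T>> writer) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int chunks = 0;
        List<T> chunk = new ArrayList<>(chunkSize);
        while (reader.hasNext()) {
            chunk.add(reader.next());
            if (chunk.size() == chunkSize) {
                writer.accept(chunk);
                chunks++;
                chunk = new ArrayList<>(chunkSize);
            }
        }
        if (!chunk.isEmpty()) { // write the last, possibly smaller, chunk
            writer.accept(chunk);
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Prints [1, 2], then [3, 4], then [5], one chunk per line.
        processInChunks(Arrays.asList(1, 2, 3, 4, 5).iterator(), 2,
                chunk -> System.out.println(chunk));
    }
}
```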

Partitioning splits a step into substeps, each of which handles a specific portion of the data. This implies that you know the structure of the input data and that you know in advance how to distribute data between substeps. Distribution can take place by ranges of primary key values for database records or by directories for files. The substeps can execute locally or remotely, and Spring Batch provides support for multi-threaded substeps.
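As an illustration of distributing data by ranges of primary key values, the sketch below splits an inclusive ID range into contiguous sub-ranges, one per substep. It assumes numeric, reasonably dense keys; `RangePartitionSketch`, `Range`, and `partition` are hypothetical names, and this is plain Java rather than the Spring Batch partitioning API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names; plain Java illustrating the idea, not the Spring Batch API.
class RangePartitionSketch {

    // An inclusive range of primary key values assigned to one substep.
    static final class Range {
        final long min;
        final long max;

        Range(long min, long max) {
            this.min = min;
            this.max = max;
        }

        @Override
        public String toString() {
            return min + ".." + max;
        }
    }

    // Splits [minId, maxId] into at most `partitions` contiguous ranges of
    // near-equal size. Fewer ranges are returned when there are too few IDs.
    static List<Range> partition(long minId, long maxId, int partitions) {
        if (partitions <= 0 || maxId < minId) {
            throw new IllegalArgumentException("invalid range or partition count");
        }
        long total = maxId - minId + 1;
        long size = (total + partitions - 1) / partitions; // ceiling division
        List<Range> ranges = new ArrayList<>();
        for (long start = minId; start <= maxId; start += size) {
            ranges.add(new Range(start, Math.min(start + size - 1, maxId)));
        }
        return ranges;
    }

    public static void main(String[] args) {
        System.out.println(partition(1, 10, 3)); // prints [1..4, 5..8, 9..10]
    }
}
```

Each range could then be handed to its own substep, for example as the bounds of a `WHERE id BETWEEN ? AND ?` query. If the keys are sparse or skewed, equal-width ranges will not balance the work evenly.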

Spring Batch and grid computing

When dealing with large amounts of data (petabytes), a popular solution to scaling is to divide the enormous amount of computation into smaller chunks, compute them in parallel (usually on different nodes), and then gather the results. Some open source frameworks (Hadoop, GridGain, and Hazelcast, for example) have appeared in the last few years to deal with the burden of distributing units of work so that developers can focus on developing the computations themselves. How does Spring Batch compare to these grid-computing frameworks? Spring Batch is a lightweight solution: all it needs is the Java runtime installed, whereas grid-computing frameworks need a more advanced infrastructure. As an example, Hadoop usually works on top of its own distributed file system, HDFS. In terms of features, Spring Batch provides a lot of support for working with flat files, XML files, and relational databases.

