Spring Batch 写Excel数据
本文通过示例介绍如何通过Spring Batch写数据至Excel。
1. Apache POI介绍
Apache poi 是解析微软office文档的Java工具库,支持解析Excel,Word,PowerPoint甚至Visio.
本文聚焦Excel及当前最新版本xlsx,POI提供低内存占用API写xlsx文件。通过设置滑动窗口值指定在内存中缓存多少行,如果超过窗口大小,内容被写到磁盘临时文件,最后数据被移动至特定目标文件。
Spring Batch 支持CSV文件,但默认不支持Xlsx文件,github提供excel读写组件版本比较老,不支持POI sxssf api。
2. Spring Batch SXSSF 示例
为了简化过程,主要说明写Excel文件。我们使用Spring Boot搭建项目,主要依赖如下:
dependencies {
compile group: 'org.apache.poi', name: 'poi', version: '4.0.1'
compile group: 'org.apache.poi', name: 'poi-ooxml', version: '4.0.1'
compile group: 'org.apache.poi', name: 'poi-ooxml-schemas', version: '4.0.1'
compile "com.google.guava:guava:23.0"
implementation 'org.springframework.boot:spring-boot-starter-batch'
implementation 'org.springframework.boot:spring-boot-starter-data-jpa'
compileOnly 'org.projectlombok:lombok'
runtimeOnly 'com.h2database:h2'
annotationProcessor 'org.projectlombok:lombok'
testImplementation('org.springframework.boot:spring-boot-starter-test') {
exclude group: 'org.junit.vintage', module: 'junit-vintage-engine'
}
testImplementation 'org.springframework.batch:spring-batch-test'
}
使用h2数据库和jpa说明从数据库读取信息。下面看实现过程。
假如书店有售书籍系统,需要获得书籍订单信息,需要把书记信息列表导出excel。下面JPA标识书店应用逻辑,定义Book实体和Repository从数据库中读取信息。
@Data
@Builder
@Entity
@AllArgsConstructor
@NoArgsConstructor
public class Book {
@Id
@GeneratedValue(strategy = GenerationType.AUTO)
private Integer id;
private String title;
private String author;
private String isbn;
}
lombok的@Data注解针对所有属性生成toString, equals, hashCode, getters 和 setters方法。
Repository定义:
public interface BookRepository extends PagingAndSortingRepository<Book, Long> {
}
下面创建 RepositoryItemReader 从 JPA 读取数据,并启用JPA repository。在配置类中定义读步骤:
@Bean
public ItemReader<Book> bookReader(BookRepository repository) {
RepositoryItemReader<Book> reader = new RepositoryItemReader<>();
reader.setRepository(repository);
reader.setMethodName("findAll");
reader.setPageSize(CHUNK);
reader.setSort(singletonMap("id", ASC));
return reader;
}
读取书籍信息是为了写Excel,所有信息都保存在Workbook,我们定义一个Bean:
@Bean
public SXSSFWorkbook workbook() {
return new SXSSFWorkbook(CHUNK);
}
SXSSFWorkbook 在内存中的数量有CHUNK静态常量定义。使用workbook定义sheet,我们先声明BookWriter,后面详细解释:
@Bean
public ItemWriter<Book> bookWriter(SXSSFWorkbook workbook) {
SXSSFSheet sheet = workbook.createSheet("Books");
return new BookWriter(sheet);
}
读写组件都有,接着可以定义step和job,完整的配置类代码:
@Configuration
@EnableBatchProcessing
public class BatchConfiguration {
private static final Integer CHUNK = 100;
private final JobBuilderFactory jobBuilderFactory;
private final StepBuilderFactory stepBuilderFactory;
@Autowired
public BatchConfiguration(JobBuilderFactory jobBuilderFactory, StepBuilderFactory stepBuilderFactory) {
this.jobBuilderFactory = jobBuilderFactory;
this.stepBuilderFactory = stepBuilderFactory;
}
@Bean
public SXSSFWorkbook workbook() {
return new SXSSFWorkbook(CHUNK);
}
@Bean
public Job job(Step step, JobExecutionListener listener) {
return jobBuilderFactory.get("exportBooksToXlsx")
.start(step)
.listener(listener)
.build();
}
@Bean
public Step step(ItemReader<Book> reader, ItemWriter<Book> writer) {
return stepBuilderFactory.get("export")
.<Book, Book>chunk(CHUNK)
.reader(reader)
.writer(writer)
.build();
}
@Bean
public ItemReader<Book> bookReader(BookRepository repository) {
RepositoryItemReader<Book> reader = new RepositoryItemReader<>();
reader.setRepository(repository);
reader.setMethodName("findAll");
reader.setPageSize(CHUNK);
reader.setSort(singletonMap("id", ASC));
return reader;
}
@Bean
public ItemWriter<Book> bookWriter(SXSSFWorkbook workbook) {
SXSSFSheet sheet = workbook.createSheet("Books");
return new BookWriter(sheet);
}
@Bean
JobListener jobListener(SXSSFWorkbook workbook, BookRepository bookRepository) {
return new JobListener(workbook, bookRepository);
}
}
下面详细解释BookWriter类,其针对每条书籍信息创建一行,属性作为列进行存储。
public class BookWriter implements ItemWriter<Book> {
private final Sheet sheet;
public BookWriter(Sheet sheet) {
this.sheet = sheet;
}
@Override
public void write(List<? extends Book> list) {
for (int i = 0; i < list.size(); i++) {
writeRow(i+1, list.get(i));
}
}
private void writeRow(int currentRowNumber, Book book) {
List<String> columns = prepareColumns(book);
Row row = this.sheet.createRow(currentRowNumber);
for (int i = 0; i < columns.size(); i++) {
writeCell(row, i, columns.get(i));
}
}
private List<String> prepareColumns(Book book) {
return asList(
book.getId().toString(),
book.getAuthor(),
book.getTitle(),
book.getIsbn()
);
}
private void writeCell(Row row, int currentColumnNumber, String value) {
Cell cell = row.createCell(currentColumnNumber);
cell.setCellValue(value);
}
}
当Job状态改变时,会自动调用JobExecutionListener。在Job完成时,使用FileOutputStream保存数据至xlsx文件:
@Slf4j
public class JobListener implements JobExecutionListener {
private final SXSSFWorkbook workbook;
private final BookRepository bookRepository;
public JobListener(SXSSFWorkbook workbook, BookRepository bookRepository) {
this.workbook = workbook;
this.bookRepository = bookRepository;
}
@Override
public void afterJob(JobExecution jobExecution) {
BatchStatus batchStatus = jobExecution.getStatus().getBatchStatus();
String resName = (String)jobExecution.getExecutionContext().get("resourceName");
assert resName != null;
if (batchStatus == COMPLETED) {
try {
FileOutputStream fileOutputStream = new FileOutputStream(resName+".xlsx");
workbook.write(fileOutputStream);
fileOutputStream.close();
} catch (IOException e) {
log.error(e.getMessage(), e);
}
}
}
@Override
public void beforeJob(JobExecution jobExecution) {
initializeBooks(bookRepository);
ExecutionContext context = jobExecution.getExecutionContext();
context.put("titles", Arrays.asList("id","title","author","isbn"));
context.put("resourceName", "书籍信息信息");
headRow((List<String>) Objects.requireNonNull(context.get("titles")));
}
private void headRow(List<String> titles){
Row row = workbook.getSheetAt(0).createRow(0);
for (int i = 0; i < titles.size(); i++) {
writeCell(row, i, titles.get(i));
}
}
private void writeCell(Row row, int currentColumnNumber, String value) {
Cell cell = row.createCell(currentColumnNumber);
cell.setCellValue(value);
}
private void initializeBooks(BookRepository bookRepository) {
Set<Book> books = new HashSet<>();
Book.BookBuilder builder = Book.builder();
books.add(builder.author("John Doe").title("Forbid tails").isbn("1111-111-111-111").build());
books.add(builder.author("Mary Doe").title("Not found title").isbn("2222-222-222-222").build());
bookRepository.saveAll(books);
}
}
initializeBooks方法增加模拟数据。beforeJob方法在Job启动之前调用,动态设置excel标题信息和导出文件名称。
3. 总结
本文使用Spring Batch从数据库读取信息写入Excel。使用POI的sxssf api占用较小内存,避免大数据量造成系统罢工。