Mybatis物理分页
看了之前的几篇处理文章,还是很久以前写的了,最近再补一补更加高级的使用场景。
真实的使用场景下会遇到以下场景:
1、单表数据量太大,1000万数据左右?
2、能不能多线程处理?
处理思路分析
1、单表数据量大,肯定不能一次加载到数据库内存。首先想到是分页,一页一页的读写。经过分页,mybatis有springbatch批量处理的实现。挺高兴的。。。
2、多线程处理,springbatch官网读完发现也有。也挺好的。
然而在实际操作的时候发现,mybatis的分页批处理的实现里,还是个内存分页,先把数据全弄处理,然后分页写?!!!额。。。。1000万,直接内存溢出了。怎么办???拷贝Mybatis源码进行重写吧。
下面就是我的实现:
Mybatis SpringBatch物理分页重写
MyBatisPhysicsPagingItemReader
import java.util.List;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.springframework.batch.item.NonTransientResourceException;
import org.springframework.batch.item.ParseException;
import org.springframework.batch.item.UnexpectedInputException;
import org.springframework.batch.item.database.AbstractPagingItemReader;
import org.springframework.beans.factory.InitializingBean;
import org.springframework.util.ClassUtils;
public abstract class AbstractPhysicsPagingItemReader<T> extends AbstractPagingItemReader<T> implements InitializingBean {
protected Log logger = LogFactory.getLog(getClass());
private int pageSize = 10;
private volatile int page = 0;
private volatile int current = 0;
protected volatile List<T> results;
private Object lock = new Object();
public AbstractPhysicsPagingItemReader() {
setName(ClassUtils.getShortName(AbstractPhysicsPagingItemReader.class));
}
//重写Read接口的方法
public T read() throws Exception, UnexpectedInputException, ParseException, NonTransientResourceException {
synchronized (lock) {
//优化一下,如果results里的数据取完了就得再取一次
//在Mybatis中如果使用了一对多映射Collection
//数据库读取16条数据,可能转换之后只有8条数据,必pageSize要小
//如果按照current >= pageSize的逻辑,不会再继续读取
//导致分页只能取到第一页
//if (results == null || current >= pageSize) {
if (results == null || (current>=results.size()) ) {
if (logger.isDebugEnabled()) {
logger.debug("Reading page " + page);
}
doReadPage();
page++;
if (current >= results.size()) {
current = 0;
}
}
int next = current++;
if (next < results.size()) {
return results.get(next);
} else {
return null;
}
}
}
abstract protected void doReadPage();
public int getPageSize() {
return pageSize;
}
public void setPageSize(int pageSize) {
this.pageSize = pageSize;
}
public int getPage() {
return page;
}
public int getCurrent() {
return current;
}
}
MyBatisPhysicsPagingItemReader
import static org.springframework.util.ClassUtils.getShortName;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;
import com.ycxy.batchstudy.comm.cons.CommonCons;
import org.apache.ibatis.session.SqlSessionFactory;
import org.mybatis.spring.SqlSessionTemplate;
public class MyBatisPhysicsPagingItemReader<T> extends AbstractPhysicsPagingItemReader<T> {
private String queryId;
private SqlSessionFactory sqlSessionFactory;
private SqlSessionTemplate sqlSessionTemplate;
private Map<String, Object> parameterValues;
public MyBatisPhysicsPagingItemReader() {
setName(getShortName(MyBatisPhysicsPagingItemReader.class));
}
@Override
protected void doReadPage() {
Map<String, Object> parameters = new HashMap<String, Object>();
if (parameterValues != null) {
parameters.putAll(parameterValues);
}
parameters.put(CommonCons.KEY_PAGING_PAGE, getPage());
parameters.put(CommonCons.KEY_PAGING_PAGESIZE, getPageSize());
parameters.put(CommonCons.KEY_START_PAGE, getPage() * getPageSize());
parameters.put(CommonCons.KEY_END_PAGE, (getPage()+1) * getPageSize());
parameters.put(CommonCons.KEY_PAGING_SKIPROWS, getPage() * getPageSize());
if (results == null) {
results = new CopyOnWriteArrayList<T>();
} else {
results.clear();
}
System.out.println(parameters.toString());
long startTime = System.currentTimeMillis();
results.addAll(sqlSessionTemplate.<T>selectList(queryId, parameters));
long endTime = System.currentTimeMillis();
logger.debug("----reader[" + queryId + "]耗时: " + (endTime - startTime) + " ms");
}
public void setSqlSessionFactory(SqlSessionFactory sqlSessionFactory) {
this.sqlSessionFactory = sqlSessionFactory;
}
public void setQueryId(String queryId) {
this.queryId = queryId;
}
public void setParameterValues(Map<String, Object> parameterValues) {
this.parameterValues = parameterValues;
}
@Override
protected void doJumpToPage(int itemIndex) {
}
}
MyBatisPhysicsPagingItemReaderBuilder
import org.apache.ibatis.session.SqlSessionFactory;
import java.util.Map;
import java.util.Optional;
/**
* A builder for the {@link MyBatisPhysicsPagingItemReader}.
*
* @author Kazuki Shimizu
* @since 2.0.0
* @see MyBatisPhysicsPagingItemReader
*/
public class MyBatisPhysicsPagingItemReaderBuilder<T> {
private SqlSessionFactory sqlSessionFactory;
private String queryId;
private Map<String, Object> parameterValues;
private Integer pageSize;
private Boolean saveState;
private Integer maxItemCount;
/**
* Set the {@link SqlSessionFactory} to be used by writer for database access.
*
* @param sqlSessionFactory
* the {@link SqlSessionFactory} to be used by writer for database access
* @return this instance for method chaining
* @see MyBatisPhysicsPagingItemReader#setSqlSessionFactory(SqlSessionFactory)
*/
public MyBatisPhysicsPagingItemReaderBuilder<T> sqlSessionFactory(SqlSessionFactory sqlSessionFactory) {
this.sqlSessionFactory = sqlSessionFactory;
return this;
}
/**
* Set the query id identifying the statement in the SqlMap configuration file.
*
* @param queryId
* the id for the query
* @return this instance for method chaining
* @see MyBatisPhysicsPagingItemReader#setQueryId(String)
*/
public MyBatisPhysicsPagingItemReaderBuilder<T> queryId(String queryId) {
this.queryId = queryId;
return this;
}
/**
* Set the parameter values to be used for the query execution.
*
* @param parameterValues
* the parameter values to be used for the query execution
* @return this instance for method chaining
* @see MyBatisPhysicsPagingItemReader#setParameterValues(Map)
*/
public MyBatisPhysicsPagingItemReaderBuilder<T> parameterValues(Map<String, Object> parameterValues) {
this.parameterValues = parameterValues;
return this;
}
/**
* The number of records to request per page/query. Defaults to 10. Must be greater than zero.
*
* @param pageSize
* number of items
* @return this instance for method chaining
* @see org.springframework.batch.item.database.AbstractPagingItemReader#setPageSize(int)
*/
public MyBatisPhysicsPagingItemReaderBuilder<T> pageSize(int pageSize) {
this.pageSize = pageSize;
return this;
}
/**
* Configure if the state of the {@link org.springframework.batch.item.ItemStreamSupport} should be persisted within
* the {@link org.springframework.batch.item.ExecutionContext} for restart purposes.
*
* @param saveState
* defaults to true
* @return The current instance of the builder.
* @see org.springframework.batch.item.support.AbstractItemCountingItemStreamItemReader#setSaveState(boolean)
*/
public MyBatisPhysicsPagingItemReaderBuilder<T> saveState(boolean saveState) {
this.saveState = saveState;
return this;
}
/**
* Configure the max number of items to be read.
*
* @param maxItemCount
* the max items to be read
* @return The current instance of the builder.
* @see org.springframework.batch.item.support.AbstractItemCountingItemStreamItemReader#setMaxItemCount(int)
*/
public MyBatisPhysicsPagingItemReaderBuilder<T> maxItemCount(int maxItemCount) {
this.maxItemCount = maxItemCount;
return this;
}
/**
* Returns a fully built {@link MyBatisPhysicsPagingItemReader}.
*
* @return the reader
*/
public MyBatisPhysicsPagingItemReader<T> build() {
MyBatisPhysicsPagingItemReader<T> reader = new MyBatisPhysicsPagingItemReader<>();
reader.setSqlSessionFactory(this.sqlSessionFactory);
reader.setQueryId(this.queryId);
reader.setParameterValues(this.parameterValues);
Optional.ofNullable(this.pageSize).ifPresent(reader::setPageSize);
Optional.ofNullable(this.saveState).ifPresent(reader::setSaveState);
Optional.ofNullable(this.maxItemCount).ifPresent(reader::setMaxItemCount);
return reader;
}
}
分页的关键就是写分页代码。
SQL参考:
Oracle版本
<select id="selectPeopleWithSubAll" resultMap="WithSubMap" >
select
p.PERSON_ID AS PERSON_ID,
sub.PERSON_ID AS sublist_PERSON_ID,
sub.SEQ_NO AS sublist_SEQ_NO ,
sub.FIRST_NAME AS sublist_FIRST_NAME,
sub.LAST_NAME AS sublist_LAST_NAME
from (select PERSON_ID,FIRST_NAME,LAST_NAME from PEOPLE
where rownum <![CDATA[>]]> #{_startpage,jdbcType=DECIMAL}
and rownum <![CDATA[<=]]> #{_endpage,jdbcType=DECIMAL}
order by PERSON_ID ) p,PEOPLE_SUB sub
where p.PERSON_ID=sub.PERSON_ID
</select>
先排序,根据排序进行分页。
Mysql版本
limit方式。。。会存在性能问题。
优化进阶方案:
选中一个唯一索引UK列,进行分页分段,然后根据UK查询表中数据分页处理。
使用场景
如果表中数据量不大,百万级,直接排序分页即可。未来提高性能,可以适当修改一下提交量和读取量。
缺点
这里的读为了避免并发时出现问题,在读的时候加了锁。也就是说,使用该reader时在读阶段是不会正在的并发的,其实还是属于串行读取。
如果在不同的机器或者一个机器多个实例部署,每个jvm下执行不同的区间段,还是能够实现真正的并发的。