Multithreaded batch inserts into MySQL
I recently found that bulk inserts into MySQL can be sped up significantly by using multiple threads.
First, a quick note on
threadPoolExecutor.execute vs. executorService.submit: both hand a task to a thread pool, but execute returns nothing while submit returns a Future.
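The difference is easiest to see in a minimal, self-contained sketch (the class name here is illustrative, not part of the project below):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SubmitVsExecute {
    public static int runDemo() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        // execute: fire-and-forget, takes a Runnable, returns nothing
        pool.execute(() -> System.out.println("run via execute"));
        // submit: takes a Callable (or Runnable) and returns a Future
        Future<Integer> future = pool.submit(() -> 21 + 21);
        int result = future.get();   // blocks until the Callable completes
        pool.shutdown();
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("submit returned: " + runDemo());
    }
}
```

Because submit gives you a Future, the caller can wait for the result and see any exception the task threw; with execute the task's outcome is invisible to the caller.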
Here is a demo I wrote, walked through piece by piece:
① The entity class:
@Data
public class GeneralTable implements Serializable {
    private String colValue;
    private String colAttr;
    private String colType;
    private String colFrom;
    private Long rowKey;
}
② The controller layer (a simple endpoint that builds test data):
@RequestMapping("/insert")
public Boolean insert() {
    Random rand = new Random();
    List<GeneralTable> list = new ArrayList<>();
    for (int i = 0; i < 1000000; i++) {
        // create a fresh entity on every iteration; reusing one instance
        // would fill the list with a million references to the same object
        GeneralTable gt = new GeneralTable();
        gt.setColAttr("列属性" + rand.nextInt(9) * 1000);
        gt.setColFrom("表属性" + rand.nextInt(9) * 1000);
        gt.setColValue("列值" + rand.nextInt(9) * 1000);
        gt.setColType("列类型" + rand.nextInt(9) * 1000);
        gt.setRowKey((long) rand.nextInt(1000));
        list.add(gt);
    }
    return batchOperateMysqlInf.insert(list);
}
③ The service interface:
boolean insert(List<GeneralTable> list);
④ The implementation class:
@Service
public class BatchOperateMysqlImpl implements BatchOperateMysqlInf {

    @Autowired
    private BatchOperateMysqlDao batchOperateMysqlDao;

    // a pool sized to the machine's available processors
    private final Integer process = Runtime.getRuntime().availableProcessors();
    private final ExecutorService executorService = new ThreadPoolExecutor(
            2,
            process,
            2L,
            TimeUnit.SECONDS,
            new LinkedBlockingQueue<>(3),
            Executors.defaultThreadFactory(),
            new ThreadPoolExecutor.CallerRunsPolicy()
    );

    @Override
    public boolean insert(List<GeneralTable> list) {
        try {
            /*
             * submit and execute both hand a task to the pool.
             * submit runs the call() method of a Callable implementation;
             * execute runs the run() method of a Runnable.
             * A Runnable has no return value, a Callable does, and a
             * Callable can only be run via ExecutorService.submit(Callable<T> task).
             * Concurrent callers are throttled by the pool itself.
             */
            Future<Boolean> result = executorService.submit(
                    new BatchInsert(list, batchOperateMysqlDao));
            return result.get();
        } catch (Exception e) {
            // if submit or get failed there is no result to recover, so just report failure
            e.printStackTrace();
            return false;
        }
    }
}
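One thing worth knowing about the pool above: with a queue capacity of only 3 and CallerRunsPolicy as the rejection handler, tasks that overflow the queue are executed on the submitting thread itself, which naturally slows the caller down instead of throwing RejectedExecutionException. A small sketch demonstrating the behavior (class and method names are mine, not part of the project):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class CallerRunsDemo {
    // counts how many tasks ended up running on the submitting thread
    public static int countCallerRuns(int tasks) throws Exception {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 1L, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>(1),          // tiny queue to force rejections
                Executors.defaultThreadFactory(),
                new ThreadPoolExecutor.CallerRunsPolicy());
        String caller = Thread.currentThread().getName();
        AtomicInteger onCaller = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                if (Thread.currentThread().getName().equals(caller)) {
                    onCaller.incrementAndGet();        // a rejected task ran in the caller
                }
                try { Thread.sleep(20); } catch (InterruptedException ignored) { }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return onCaller.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("tasks run on caller thread: " + countCallerRuns(8));
    }
}
```

With one worker and a one-slot queue, most of the eight tasks are rejected and run synchronously on the caller, so submission itself becomes the back-pressure mechanism.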
⑤ The batch task class:
public class BatchInsert implements Callable<Boolean> {

    /** insert in batches of 100 rows */
    private int batch100 = 100;

    /** the MySQL rows to insert */
    private List<GeneralTable> list;
    private BatchOperateMysqlDao batchOperateMysqlDao;

    /** inner pool for the sub-batches */
    private ThreadPoolExecutor threadPoolExecutor = new ThreadPoolExecutor(
            2,
            Runtime.getRuntime().availableProcessors(),
            2L,
            TimeUnit.SECONDS,
            new LinkedBlockingQueue<>(100),
            Executors.defaultThreadFactory(),
            new ThreadPoolExecutor.CallerRunsPolicy()
    );

    /**
     * batchOperateMysqlDao is passed in from the service implementation,
     * because this class is not managed by the Spring container and
     * therefore cannot @Autowired the DAO itself.
     */
    public BatchInsert(List<GeneralTable> list, BatchOperateMysqlDao batchOperateMysqlDao) {
        this.list = list;
        this.batchOperateMysqlDao = batchOperateMysqlDao;
    }

    public BatchInsert(List<GeneralTable> list) {
        this.list = list;
    }

    @Override
    public Boolean call() {
        try {
            batchOp(list);
            return true;
        } catch (Exception e) {
            e.printStackTrace();
        }
        return false;
    }

    private void batchOp(List<GeneralTable> list) {
        System.out.println("current thread: " + Thread.currentThread().getName());
        if (!list.isEmpty()) {
            int size = list.size();
            if (size <= batch100) {
                batchOperateMysqlDao.batchInsert(list);
            } else {
                batchOpSpilit(list, batch100);
            }
        }
    }

    // split the big list and insert the sub-lists concurrently
    private void batchOpSpilit(List<GeneralTable> list, int batch100) {
        System.out.println("splitting the list...");
        long t1 = System.currentTimeMillis();
        List<List<GeneralTable>> pages = BathUtil.pagingList(list, batch100);
        try {
            for (List<GeneralTable> page : pages) {
                // each small list is inserted on its own pool thread;
                // batchOp is called again, now with a list under the threshold
                threadPoolExecutor.execute(() -> {
                    System.out.println("current thread: " + Thread.currentThread().getName());
                    batchOp(page);
                });
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            threadPoolExecutor.shutdown();
            try {
                // without this wait, the method (and the timing below) would
                // return before the queued inserts have actually finished
                threadPoolExecutor.awaitTermination(1, TimeUnit.HOURS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            long t2 = System.currentTimeMillis();
            System.out.println("done, elapsed ms: " + (t2 - t1));
        }
    }
}
⑥ The list-splitting utility: it cuts one big list into many small lists of a given size, which are then inserted batch by batch.
public static <T> List<List<T>> pagingList(List<T> list, int pageSize) {
    int length = list.size();
    int num = (length + pageSize - 1) / pageSize;
    List<List<T>> newList = new ArrayList<>();
    for (int i = 0; i < num; i++) {
        int fromIndex = i * pageSize;
        int toIndex = (i + 1) * pageSize < length ? (i + 1) * pageSize : length;
        newList.add(list.subList(fromIndex, toIndex));
    }
    return newList;
}
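A quick usage sketch of the splitter. Note that subList returns a view backed by the source list, which is fine here because the source outlives the pages; the variant below additionally copies each page into an independent ArrayList (a defensive variation of mine, not the original BathUtil):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PagingDemo {
    // same splitting logic as BathUtil.pagingList, plus a copy per page
    public static <T> List<List<T>> pagingList(List<T> list, int pageSize) {
        int length = list.size();
        int num = (length + pageSize - 1) / pageSize;   // ceiling division
        List<List<T>> pages = new ArrayList<>();
        for (int i = 0; i < num; i++) {
            int from = i * pageSize;
            int to = Math.min((i + 1) * pageSize, length);
            // copy the subList view so each page is independent of the source
            pages.add(new ArrayList<>(list.subList(from, to)));
        }
        return pages;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        System.out.println(pagingList(data, 3));   // [[1, 2, 3], [4, 5, 6], [7]]
    }
}
```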
⑦ The DAO layer:
/**
 * Batch insert into the database
 * @param list the rows to insert
 */
void batchInsert(@Param("list") List<GeneralTable> list);
⑧ The mapper XML for the insert:
<insert id="batchInsert" parameterType="com.sinux.liaochao.myallprogramtest.sinux.batchoperation.entity.GeneralTable">
    insert into generateTable(
        row_key,
        col_value,
        col_attr,
        col_type,
        col_from
    ) values
    <foreach collection="list" item="generaltable" separator=",">
        (
            #{generaltable.rowKey,jdbcType=BIGINT},
            #{generaltable.colValue,jdbcType=VARCHAR},
            #{generaltable.colAttr,jdbcType=VARCHAR},
            #{generaltable.colType,jdbcType=VARCHAR},
            #{generaltable.colFrom,jdbcType=VARCHAR}
        )
    </foreach>
</insert>
With this in place it runs; in my own test, three million rows were inserted in only a few dozen seconds.
To sum up:
1. Concurrent requests are handled by submitting tasks to a thread pool: executorService.submit
2. The large list is split into small lists of a fixed size: BathUtil.pagingList(list, batch100)
3. A thread pool inserts the small lists into the database concurrently: threadPoolExecutor.execute(() -> {})
4. The SQL inserts rows with a single multi-row statement: <foreach collection="list" item="generaltable" separator=","></foreach>
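The sub-batches above are submitted with execute, which is fire-and-forget. An alternative way to guarantee that every sub-batch has finished before returning is ExecutorService.invokeAll, which blocks until all Callables complete and surfaces any exception through their Futures. A sketch under those assumptions (the counter stands in for the real batchInsert call; names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class InvokeAllDemo {
    // inserts every page and only returns once all pages are done
    public static int insertAll(List<List<Integer>> pages) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        AtomicInteger inserted = new AtomicInteger();
        List<Callable<Boolean>> tasks = new ArrayList<>();
        for (List<Integer> page : pages) {
            tasks.add(() -> {
                inserted.addAndGet(page.size());   // stand-in for batchInsert(page)
                return true;
            });
        }
        // invokeAll blocks until every Callable has completed
        for (Future<Boolean> f : pool.invokeAll(tasks)) {
            f.get();   // rethrows any exception from a failed sub-batch
        }
        pool.shutdown();
        return inserted.get();
    }
}
```

This makes "all rows are written" an explicit postcondition of the method, rather than something that depends on the inner pool draining its queue in time.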