【MyBatis Batch Processing vs. List Splitting + Multithreading: Performance Comparison for Inserting 1 Million Rows】

为java添砖加瓦

Last modified 2024-09-20 19:14:56

First published 2024-09-20 19:00:30

Article link: https://blog.csdn.net/weixin_44585177/article/details/142390607

I. What Is MyBatis Batch Processing

1. Concept

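In MyBatis, batch processing is an efficient way to execute many statements, especially when you need to insert, update or delete many records within one transaction. Batching can significantly reduce the number of round trips to the database and therefore improve performance.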

Basic steps for executing a batch:

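Enable batch mode: when obtaining the SqlSession, specify the executor type ExecutorType.BATCH, i.e. sqlSessionFactory.openSession(ExecutorType.BATCH).
Execute the SQL statements: call the mapper methods as usual. The statements are not executed immediately; they are added to a JDBC batch (PreparedStatement.addBatch()).
Flush and commit: SqlSession.flushStatements() sends the queued statements to the database via executeBatch(); SqlSession.commit() also flushes any statements still queued before committing the transaction.
Process the batch results: flushStatements() returns a List<BatchResult>, and BatchResult.getUpdateCounts() gives the update counts of the queued statements.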
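To see how the flush condition used in the batch code groups statements, here is a minimal plain-Java model of the steps above. It does not call MyBatis or touch a database: `FlushSchedule` and `flushSizes` are hypothetical names, each queued element stands for one mapper call, and each recorded flush stands for one `flushStatements()` call.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the flush schedule only: no MyBatis or database is involved.
public class FlushSchedule {

    /** Size of each flush, in order, when `total` statements are queued with the given batch size. */
    static List<Integer> flushSizes(int total, int batchSize) {
        List<Integer> flushes = new ArrayList<>();
        int pending = 0; // statements queued but not yet sent
        for (int i = 1; i <= total; i++) {
            pending++; // stands for one mapper call queued by the BATCH executor
            // Same condition as batchInsertOrUpdateData: every batchSize rows, plus the remainder
            if (i % batchSize == 0 || i == total) {
                flushes.add(pending); // stands for one flushStatements() call
                pending = 0;
            }
        }
        return flushes;
    }

    public static void main(String[] args) {
        for (int batchSize : new int[] {1_000, 10_000, 50_000}) {
            System.out.println("batchSize=" + batchSize + ", flushes=" + flushSizes(1_000_000, batchSize).size());
        }
        System.out.println(flushSizes(25, 10)); // [10, 10, 5]
    }
}
```

For 1,000,000 rows, batch sizes of 1,000, 10,000 and 50,000 give 1,000, 100 and 20 flushes respectively, which is the number of batch round trips the BATCH executor makes.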

2. MyBatis Batch Processing Code

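import lombok.extern.slf4j.Slf4j;
import org.apache.ibatis.session.ExecutorType;
import org.apache.ibatis.session.SqlSession;
import org.apache.ibatis.session.SqlSessionFactory;
import org.springframework.stereotype.Component;
import org.springframework.transaction.support.TransactionSynchronizationManager;
import org.springframework.util.CollectionUtils;

import javax.annotation.Resource; // jakarta.annotation.Resource on Spring Boot 3
import java.util.List;
import java.util.function.BiFunction;

@Slf4j
@Component
public class BatchInsertUtils {
    /**
     * Number of records per flush: 10,000
     */
    private static final int BATCH_SIZE = 10000;

    @Resource
    private SqlSessionFactory sqlSessionFactory;

    /**
     * Batch insert/update with the MyBatis BATCH executor.
     *
     * @param data        records to write
     * @param mapperClass mapper interface obtained from the batch session
     * @param function    calls one mapper method per record, e.g. (user, mapper) -> mapper.insert(user)
     * @param <T>         record type
     * @param <U>         mapper type
     * @param <R>         return type of the mapper method (ignored)
     * @return number of records processed, or 0 if an exception occurred
     */
    public <T, U, R> int batchInsertOrUpdateData(List<T> data, Class<U> mapperClass, BiFunction<T, U, R> function) {
        if (CollectionUtils.isEmpty(data)) {
            log.info("No data to write");
            return 0;
        }
        int size = data.size();
        int count = 0;
        // Open a session whose executor queues statements instead of executing them one by one
        try (SqlSession batchSqlSession = sqlSessionFactory.openSession(ExecutorType.BATCH)) {
            try {
                U mapper = batchSqlSession.getMapper(mapperClass);
                for (T element : data) {
                    function.apply(element, mapper); // queued via JDBC addBatch(), not sent yet
                    count++;
                    // Send the queued statements every BATCH_SIZE records and once more for the remainder
                    if (count % BATCH_SIZE == 0 || count == size) {
                        batchSqlSession.flushStatements();
                    }
                }
                // Force a commit only when no Spring transaction synchronization is active
                batchSqlSession.commit(!TransactionSynchronizationManager.isSynchronizationActive());
            } catch (Exception e) {
                batchSqlSession.rollback();
                log.error("Batch write failed", e);
                return 0;
            }
        }
        return count;
    }
}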

Set the URL parameters to enable batching (MySQL Connector/J):

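allowMultiQueries=true: lets a single statement string contain several SQL statements separated by semicolons. It is not required for JDBC batching (addBatch()/executeBatch()); it matters for approaches that send several statements in one call, such as a <foreach> that generates multiple UPDATE statements.
rewriteBatchedStatements=true: without it, Connector/J sends the statements of executeBatch() to the server one by one; with it, the driver rewrites a batch of INSERTs into multi-row INSERT statements, which is typically where most of the speedup comes from.
JDBC URL: jdbc:mysql://127.0.0.1:3306/test?allowMultiQueries=true&rewriteBatchedStatements=true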
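Conceptually, `rewriteBatchedStatements=true` lets Connector/J turn a batch of identical single-row INSERTs into one multi-row INSERT. The sketch below only builds the *shape* of that statement as a string; it is not the driver's code, the `?` placeholders stand for the bound values, and the table `user(name, age)` is an assumption based on the `User` fields mentioned in the summary. `RewriteIllustration` and `rewrite` are hypothetical names.

```java
import java.util.Collections;

// Conceptual illustration only, not Connector/J source code.
public class RewriteIllustration {

    // Assumed table and columns (from the User fields name and age)
    static final String SINGLE_ROW = "INSERT INTO user (name, age) VALUES (?, ?)";

    /** Multi-row form of `rows` batched executions of SINGLE_ROW. */
    static String rewrite(int rows) {
        String values = String.join(", ", Collections.nCopies(rows, "(?, ?)"));
        return "INSERT INTO user (name, age) VALUES " + values;
    }

    public static void main(String[] args) {
        // Without the flag, executeBatch() would send SINGLE_ROW three times;
        // with it, one statement carries all three rows.
        System.out.println(rewrite(3));
    }
}
```

Running `main` prints `INSERT INTO user (name, age) VALUES (?, ?), (?, ?), (?, ?)`: one statement and one round trip instead of three.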

3. List Splitting + Multithreaded Batch Insert Code

// tulingThreadPoolExecutor (an ExecutorService) and userMapper are injected fields, not shown here
public void parallelSubList() {
    List<User> list = new ArrayList<>();
    for (int i = 0; i < 1000000; i++) {
        list.add(new User("test" + i, 25));
    }
    int totalCount = list.size();
    int pageSize = 10000;
    // Ceiling division: one async task per sublist of at most pageSize rows
    int threadCount = (totalCount + pageSize - 1) / pageSize;
    log.info("Number of split tasks: {}", threadCount);
    CountDownLatch countDownLatch = new CountDownLatch(threadCount);

    StopWatch stopWatch = new StopWatch();
    stopWatch.start();
    for (int index = 0; index < threadCount; index++) {
        // subList is a view of list; safe here because list is no longer modified
        List<User> subList = list.subList(index * pageSize, Math.min((index + 1) * pageSize, totalCount));
        tulingThreadPoolExecutor.submit(() -> {
            try {
                // Insert this sublist (up to pageSize users) with one batchInsert call
                log.info("Current thread: {}", Thread.currentThread().getName());
                userMapper.batchInsert(subList);
            } catch (Exception e) {
                log.error("Task failed", e);
            } finally {
                countDownLatch.countDown(); // count down even on failure so await() returns
            }
        });
    }

    try {
        countDownLatch.await();
        stopWatch.stop();
        // Divide in floating point so the milliseconds are kept
        log.info("Split List into sublists, tasks: {}, total insert time: {}min",
                threadCount, stopWatch.getTotalTimeMillis() / 60000.0);
    } catch (InterruptedException e) {
        log.error("Interrupted while waiting for insert tasks", e);
        Thread.currentThread().interrupt(); // restore the interrupt status
    }
}
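In the code above, a failed chunk is only logged and the latch still counts down, so the final log line reports a total time even if some rows were never written. One alternative, sketched here with plain JDK classes, is to create one `CompletableFuture` per chunk and `join()` on `CompletableFuture.allOf(...)`, which rethrows a failure as a `CompletionException`. `ParallelSplitSketch`, `split` and `insertInParallel` are hypothetical names, and the `Consumer` passed in is a stand-in for `userMapper.batchInsert(subList)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelSplitSketch {

    /** Splits `list` into consecutive sublist views of at most `pageSize` elements. */
    static <T> List<List<T>> split(List<T> list, int pageSize) {
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < list.size(); from += pageSize) {
            chunks.add(list.subList(from, Math.min(from + pageSize, list.size())));
        }
        return chunks;
    }

    /** Runs insertBatch on every chunk in `pool`; join() throws CompletionException if any chunk failed. */
    static <T> void insertInParallel(List<T> list, int pageSize,
                                     Consumer<List<T>> insertBatch, ExecutorService pool) {
        CompletableFuture<?>[] futures = split(list, pageSize).stream()
                .map(chunk -> CompletableFuture.runAsync(() -> insertBatch.accept(chunk), pool))
                .toArray(CompletableFuture[]::new);
        CompletableFuture.allOf(futures).join();
    }

    public static void main(String[] args) {
        List<Integer> rows = IntStream.range(0, 1_000_000).boxed().collect(Collectors.toList());
        AtomicInteger inserted = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(10);
        try {
            // Hypothetical stand-in for userMapper.batchInsert(subList): it only counts rows
            insertInParallel(rows, 20_000, chunk -> inserted.addAndGet(chunk.size()), pool);
        } finally {
            pool.shutdown();
        }
        System.out.println("inserted=" + inserted.get()); // inserted=1000000
    }
}
```

With a real mapper, the caller can catch the `CompletionException` and decide whether to retry or clean up, instead of seeing only a timing log.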

II. Tests

1. Inserting 1 Million Rows with MyBatis Batch Processing

1.1 Batch size 1,000

Batch size: 1000, rows written with MyBatis batch processing: 1000000, total time: 0.46666667min (28 s)

1.2 Batch size 10,000

Batch size: 10000, rows written with MyBatis batch processing: 1000000, total time: 0.4min (24 s)

1.3 Batch size 50,000

Batch size: 50000, rows written with MyBatis batch processing: 1000000, total time: 0.38333333min (23 s)
[Screenshot: https://i-blog.csdnimg.cn/direct/b0f351850e7342f3a07ff07d82b14fba.png]

2. Inserting 1 Million Rows with List Splitting + Multithreading

Thread pool definition:

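@Bean("commonPool")
public ExecutorService commonThreadPoolExecutor() {
    // TulingMallThreadPoolExecutor is the author's own wrapper class (not shown): pool name, core size 10, max size 100
    return new TulingMallThreadPoolExecutor("Common thread pool for test cases", 10, 100).getLhrmsThreadPoolExecutor();
}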

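My thread pool uses a custom rejection policy: when the pool and its queue are full, the submitting thread blocks until the task can be put at the tail of the queue, so no task is discarded.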

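@Override
public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
    if (executor.isShutdown()) {
        // A task queued after shutdown might never be run
        throw new RejectedExecutionException("Executor has been shut down");
    }
    try {
        // Block the submitting thread until the queue has space, then append the task
        executor.getQueue().put(r);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // restore the interrupt status
        throw new RejectedExecutionException("Interrupted while waiting for queue space", e);
    }
}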
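The article does not show `TulingMallThreadPoolExecutor`, so the following is only a sketch of an equivalent pool built directly on `ThreadPoolExecutor`, using the parameters given in the summary (10 core threads, 100 maximum threads, queue capacity 8). The `ArrayBlockingQueue`, the 60-second keep-alive and the class names `BlockingPoolSketch` and `BlockingRejectedPolicy` are assumptions. Its handler refuses tasks once the pool is shut down (otherwise `put()` could leave a task that no worker will ever run) and restores the interrupt flag instead of only printing the stack trace.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BlockingPoolSketch {

    /** "Block instead of reject": wait for queue space rather than dropping the task. */
    static final class BlockingRejectedPolicy implements RejectedExecutionHandler {
        @Override
        public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
            if (executor.isShutdown()) {
                // A task queued after shutdown might never be picked up by a worker
                throw new RejectedExecutionException("Executor has been shut down");
            }
            try {
                executor.getQueue().put(r); // blocks the submitting thread until there is space
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt status for the caller
                throw new RejectedExecutionException("Interrupted while waiting for queue space", e);
            }
        }
    }

    /** Core 10, max 100, queue capacity 8 (from the article); queue type and 60 s keep-alive are assumptions. */
    static ThreadPoolExecutor newPool() {
        return new ThreadPoolExecutor(10, 100, 60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(8), new BlockingRejectedPolicy());
    }

    /** Submits `tasks` short tasks and returns how many of them actually ran. */
    static int runAll(int tasks) {
        ThreadPoolExecutor pool = newPool();
        AtomicInteger done = new AtomicInteger();
        CountDownLatch latch = new CountDownLatch(tasks);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        Thread.sleep(5); // stands in for a short insert
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        done.incrementAndGet();
                        latch.countDown();
                    }
                });
            }
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println("completed=" + runAll(500)); // completed=500
    }
}
```

`runAll(500)` returns 500: once all 100 threads are busy and the 8 queue slots are full, `execute()` blocks inside the handler instead of throwing `RejectedExecutionException`, so the submitting loop simply slows down.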

2.1 10 tasks, 100,000 rows per task

Split List into sublists, tasks: 10, total insert time in threads: 0.23333333min (14 s)

2.2 50 tasks, 20,000 rows per task

Split List into sublists, tasks: 50, total insert time in threads: 0.18333334min (11 s)

2.3 100 tasks, 10,000 rows per task

Split List into sublists, tasks: 100, total insert time in threads: 0.2min (12 s)
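The multithreaded test logs its time with `stopWatch.getTotalTimeMillis()/1000/60.0f`, and every reported value in both tests is a multiple of 1/60 min. Because `getTotalTimeMillis()` returns a `long`, `/1000` is integer division and drops the milliseconds, so 0.4 min means 24 s and 0.18333334 min means 11 s. The helper below, with the hypothetical name `ElapsedTime`, reproduces that truncation, shows a conversion that keeps the milliseconds, and turns a reported time into rows per second.

```java
public class ElapsedTime {

    /** The logging expression used in the article: `/ 1000` on a long drops the milliseconds. */
    static float truncatedMinutes(long millis) {
        return millis / 1000 / 60.0f;
    }

    /** Keeps the milliseconds by dividing in floating point. */
    static double preciseMinutes(long millis) {
        return millis / 60_000.0;
    }

    /** Approximate throughput for a run of `rows` rows that took `minutes` minutes. */
    static long rowsPerSecond(long rows, double minutes) {
        return Math.round(rows / (minutes * 60));
    }

    public static void main(String[] args) {
        long millis = 23_999; // a run just short of 24 s
        System.out.println(truncatedMinutes(millis)); // reported as 23 s worth of minutes
        System.out.println(preciseMinutes(millis));   // about 0.39998
        System.out.println(rowsPerSecond(1_000_000, 0.4)); // 41667
    }
}
```

By this measure the fastest MyBatis run (23 s) is about 43,478 rows/s and the 50-task run (11 s) about 90,909 rows/s. Both are slight overestimates, since each true time can be up to 1 s longer than reported.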

III. Summary

Both approaches above were tested by inserting 1 million rows.
The results show:

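MyBatis batch processing is fairly stable: with batch sizes of 1,000, 10,000 and 50,000, every run took 23 to 28 s (roughly 0.4 min).
(My User object is small, with only name and age fields. In your own project, measure with your real data and set the batch size to whatever performs best.)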
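List splitting + multithreading: the pool has 10 core threads, at most 100 threads and a blocking queue capacity of 8, and when the queue is full the rejection policy blocks until the task can be appended to the queue tail, so no task is lost.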
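Result: more threads (async tasks) is not always better. Splitting into 100 tasks (12 s) was slightly slower than 50 tasks (11 s). With a timer that only resolves whole seconds and a single run per setting, this gap may be noise; plausible causes include thread context switching and contention for database connections. (Virtual threads, previewed in JDK 19 and finalized in JDK 21, make blocking threads much cheaper, but they do not raise the ceiling set by the connection pool and the database.) It is worth adjusting the core pool size, maximum pool size and queue capacity and re-running the test to find better thread pool parameters.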
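The pool parameters above also determine how many threads each test actually used. A `ThreadPoolExecutor` gives a new task to a core thread first, then to the queue, and only grows toward the maximum when the queue is full. The sketch below measures this, assuming the pool matches the summary (10 / 100 / queue 8, with an `ArrayBlockingQueue`) and that all tasks are submitted before any of them finishes; it holds every task on a latch and reads `getPoolSize()`. `PoolGrowthSketch` and `threadsFor` are hypothetical names.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolGrowthSketch {

    /**
     * Submits `tasks` tasks that all wait on a latch to a pool shaped like the article's
     * (core 10, max 100, ArrayBlockingQueue of 8, an assumed queue type) and returns the thread count.
     * Only valid for tasks <= 108; beyond that the default AbortPolicy would reject.
     */
    static int threadsFor(int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(10, 100, 60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(8));
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // keep every task running so none finishes early
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return pool.getPoolSize();
        } finally {
            release.countDown(); // let all tasks finish
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        for (int tasks : new int[] {10, 50, 100}) {
            System.out.println(tasks + " tasks -> " + threadsFor(tasks) + " threads");
        }
    }
}
```

It prints 10, 42 and 92 threads for 10, 50 and 100 tasks. Each of those threads needs its own database connection for `userMapper.batchInsert`, so once the connection pool's size (not stated in the article) is exceeded, extra threads mostly wait for a connection rather than insert in parallel.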
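In short, both approaches insert 1 million rows quickly (11 to 14 s with multithreading versus 23 to 28 s with MyBatis batching in these tests); choose whichever fits your scenario. Keep in mind that the multithreaded version writes through several connections at once, so the rows cannot be inserted in one transaction: if one task fails, the rows written by the other tasks remain.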