mybatis批量插入10w条数据的优化

最新推荐文章于 2024-06-19 14:04:57 发布

qq_45772772

最新推荐文章于 2024-06-19 14:04:57 发布

阅读量852

点赞数 5

文章标签： mybatis windows

本文链接：https://blog.csdn.net/qq_45772772/article/details/137397729

版权

批量插入数据的业务场景我们经常会遇到，众所周知，批量操作是很耗时的，因此本次就来讨论如何优化

创建基础工程和表
1.自行搭建springboot项目并引入mybatis-plus
2.创建表，这里我暂时只给两个字段
3.此处还引入hutool，用于生成id

<dependency>
  <groupId>cn.hutool</groupId>
    <artifactId>hutool-all</artifactId>
    <version>5.7.18</version>
</dependency>
<dependency>
    <groupId>org.projectlombok</groupId>
    <artifactId>lombok</artifactId>
</dependency>

表结构

CREATE TABLE `student` (
  `id` varchar(36) COLLATE utf8mb4_general_ci NOT NULL,
  `name` varchar(255) COLLATE utf8mb4_general_ci DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;

实体类

@Data
@TableName
public class Student {
    
    private String id;
    
    private String name;
}

创建好项目和数据库后，添加测试代码
请求controller

@RestController
public class TestController {
    @Autowired
    private IStudentService service;

    @GetMapping("insert/{number}")
    public void insert(@PathVariable Integer number) {
        service.insert(number);
        System.out.println(number + " 条插入完成");
    }

}

service类

public interface IStudentService  extends IService<Student> {
    void insert(Integer number);
}

1.使用mybatis-plus自带的批量插入功能

@Service
@RequiredArgsConstructor
public class IStudentServiceImpl extends ServiceImpl<StudentMapper, Student> implements IStudentService {

    @Override
    @Transactional
    public void insert(Integer number) {
        List<Student> list = new ArrayList<>();
        for (int i = 0; i < number; i++) {
            Student course = new Student();
            course.setName("name" + i);
            course.setId(String.valueOf(IdUtil.getSnowflakeNextId()));
            list.add(course);
        }
        //使用mybatis-plus
        mp(list);
    }

    private void mp(List<Student> list) {
        System.out.println("mybatis-plus操作--start");
        StopWatch stopWatch = new StopWatch();
        stopWatch.start();
        super.saveBatch(list);
        stopWatch.stop();
        long total = stopWatch.getTotal(TimeUnit.SECONDS);
        System.out.println("mybatis-plus操作，耗时" + total);
    }
}

在controller处执行请求，插入100000条
在这里插入图片描述
结果

10秒钟，有点久
翻看源码后发现它是一条条for循环插入的。。。好家伙

这是把10w条数据分批插入，每批是1000条循环？

看来大数据的批插入不能用这个方法，否则10w次的io，客户气得半夜跑过来给你一砖头

现在使用另一种方法
批插入肯定是要访问数据库的，这是避免不了的，因此我们可以把访问的次数减少，那不就好了？也就是减少io，减少sqlSession会话次数

如下，在mapper重新生成了sql语句

    @Insert("<script>" +
            "INSERT INTO `student`(`id`, `name`) " +
            "VALUES " +
            "<foreach collection= 'list' item= 'item' separator= ','> " +
            "(#{item.id},#{item.name})" +
            "</foreach></script>")
    void batch(@Param("list") List<Student> list);

也就是10w条语句改成一条，一次性执行，减少循环过程
代码如下

@Service
@RequiredArgsConstructor
public class IStudentServiceImpl extends ServiceImpl<StudentMapper, Student> implements IStudentService {

    @Override
    @Transactional
    public void insert(Integer number) {
        List<Student> list = new ArrayList<>();
        for (int i = 0; i < number; i++) {
            Student course = new Student();
            course.setName("name" + i);
            course.setId(String.valueOf(IdUtil.getSnowflakeNextId()));
            list.add(course);
        }
        //使用mybatis-plus
//        mp(list);
        //使用自定义的sql批处理
        consumer(list);
    }

    private void mp(List<Student> list) {
        System.out.println("mybatis-plus操作--start");
        StopWatch stopWatch = new StopWatch();
        stopWatch.start();
        super.saveBatch(list);
        stopWatch.stop();
        long total = stopWatch.getTotal(TimeUnit.SECONDS);
        System.out.println("mybatis-plus操作，耗时" + total);
    }

    /**
     * 使用自定义的sql批操作
     * @param list
     */
    private void consumer(List<Student> list) {
        System.out.println("使用自定义的sql批操作--start");
        StopWatch stopWatch = new StopWatch();
        stopWatch.start();
        baseMapper.batch(list);
        stopWatch.stop();
        long total = stopWatch.getTotal(TimeUnit.SECONDS);
        System.out.println("使用自定义的sql批操作，耗时" + total);
    }
}

执行结果
在这里插入图片描述
快很多了
后续的测试发现，插入的内容太多，就会出现问题

我把上面的name值换成和id一样的取值，增加了name值的长度，因此sql语句长度也增加了

### Error updating database.  Cause: com.mysql.cj.jdbc.exceptions.PacketTooBigException: Packet for query is too large (5,300,043 > 4,194,304). You can change this value on the server by setting the 'max_allowed_packet' variable.
### The error may exist in org/example/student/mapper/StudentMapper.java (best guess)
### The error may involve org.example.student.mapper.StudentMapper.batch-Inline
### The error occurred while setting parameters

我肯定不能随便的改数据库配置的

行吧，来终极大杀器，多线程
将sql批处理拆分成多个线程，一个线程处理3w条(记得加事务)


    private void thread(List<Student> list1) {
        System.out.println("多线程操作--start");
        StopWatch stopWatch = new StopWatch();
        stopWatch.start();
        int max = 3;
        final CountDownLatch countDownLatch = new CountDownLatch(max);
        ExecutorService fixedThreadPool = Executors.newFixedThreadPool(max);
        task(fixedThreadPool, list1.subList(0, 30000), countDownLatch);
        task(fixedThreadPool, list1.subList(30000, 60000), countDownLatch);
        task(fixedThreadPool, list1.subList(60000, 100000), countDownLatch);
        try {
            countDownLatch.await();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        fixedThreadPool.shutdown();
        stopWatch.stop();
        long total = stopWatch.getTotal(TimeUnit.SECONDS);
        System.out.println("多线程操作，耗时" + total);
    }

    private void task(ExecutorService fixedThreadPool, List<Student> list, CountDownLatch countDownLatch) {
        System.out.println("本次插入条数为 = " + list.size());
        fixedThreadPool.execute(() -> {
            baseMapper.batch(list);
            countDownLatch.countDown();
        });
    }

在这里插入图片描述
完整代码如下

@Service
@RequiredArgsConstructor
public class IStudentServiceImpl extends ServiceImpl<StudentMapper, Student> implements IStudentService {

    @Override
    @Transactional
    public void insert(Integer number) {
        List<Student> list = new ArrayList<>();
        for (int i = 0; i < number; i++) {
            Student course = new Student();
//            course.setName("name" + i);
//            course.setId(String.valueOf(IdUtil.getSnowflakeNextId()));
//            list.add(course);
            //下面的代码会增加sql语句长度
            long snowflakeNextId = IdUtil.getSnowflakeNextId();
            course.setName("name" + snowflakeNextId);
            course.setId(String.valueOf(snowflakeNextId));
            list.add(course);
        }
        //使用mybatis-plus
//        mp(list);
        //使用自定义的sql批处理
//        consumer(list);
        //使用多线程同时操作
        thread(list);
    }

    private void mp(List<Student> list) {
        System.out.println("mybatis-plus操作--start");
        StopWatch stopWatch = new StopWatch();
        stopWatch.start();
        super.saveBatch(list);
        stopWatch.stop();
        long total = stopWatch.getTotal(TimeUnit.SECONDS);
        System.out.println("mybatis-plus操作，耗时" + total);
    }

    /**
     * 使用自定义的sql批操作
     * @param list
     */
    private void consumer(List<Student> list) {
        System.out.println("使用自定义的sql批操作--start");
        StopWatch stopWatch = new StopWatch();
        stopWatch.start();
        baseMapper.batch(list);
        stopWatch.stop();
        long total = stopWatch.getTotal(TimeUnit.SECONDS);
        System.out.println("使用自定义的sql批操作，耗时" + total);
    }

    private void thread(List<Student> list1) {
        System.out.println("多线程操作--start");
        StopWatch stopWatch = new StopWatch();
        stopWatch.start();
        int max = 3;
        final CountDownLatch countDownLatch = new CountDownLatch(max);
        ExecutorService fixedThreadPool = Executors.newFixedThreadPool(max);
        task(fixedThreadPool, list1.subList(0, 30000), countDownLatch);
        task(fixedThreadPool, list1.subList(30000, 60000), countDownLatch);
        task(fixedThreadPool, list1.subList(60000, 100000), countDownLatch);
        try {
            countDownLatch.await();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        fixedThreadPool.shutdown();
        stopWatch.stop();
        long total = stopWatch.getTotal(TimeUnit.SECONDS);
        System.out.println("多线程操作，耗时" + total);
    }

    private void task(ExecutorService fixedThreadPool, List<Student> list, CountDownLatch countDownLatch) {
        System.out.println("本次插入条数为 = " + list.size());
        fixedThreadPool.execute(() -> {
            baseMapper.batch(list);
            countDownLatch.countDown();
        });
    }

}

当然，还有其他方法，例如生成sql文件然后由mysql执行，还有就是使用存储过程，这里暂不试了

qq_45772772

关注

5
点赞
踩
10

收藏

觉得还不错? 一键收藏
0
评论
mybatis批量插入10w条数据的优化

批插入肯定是要访问数据库的，这是避免不了的，因此我们可以把访问的次数减少，那不就好了？当然，还有其他方法，例如生成sql文件然后由mysql执行，还有就是使用存储过程，这里暂不试了。批量插入数据的业务场景我们经常会遇到，众所周知，批量操作是很耗时的，因此本次就来讨论如何优化。我把上面的name值换成和id一样的取值，增加了name值的长度，因此sql语句长度也增加了。看来大数据的批插入不能用这个方法，否则10w次的io，客户气得半夜跑过来给你一砖头。后续的测试发现，插入的内容太多，就会出现问题。
复制链接

扫一扫