Multithreaded batch inserts: 200,000 rows in 3 s, a must for large-data workloads

Java斌

Published 2024-05-23 10:54:59


Article link: https://blog.csdn.net/qq_41737694/article/details/139141163


1. Database connection settings

jdbc:mysql://<db-address>/<db-name>?useUnicode=true&characterEncoding=UTF8&allowMultiQueries=true&rewriteBatchedStatements=true

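The tests were run with rewriteBatchedStatements=false and MyBatis-Plus (MP) version 3.5.3.1. When the flag is set to true, as in the URL above, MySQL Connector/J rewrites a batch of INSERT statements into multi-row INSERTs, which usually makes batch inserts considerably faster.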
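If the URL is assembled in code, each connection property stays a separate key/value pair, which makes it harder to slip in a typo such as a full-width `？` instead of `?`. This is a minimal sketch; `JdbcUrlBuilder`, the host `localhost:3306` and the database name `demo` are hypothetical placeholders, and values are not URL-encoded (the flags shown don't need it).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class JdbcUrlBuilder {

    /** Builds a MySQL JDBC URL from host, database name and ordered connection properties. */
    public static String build(String hostAndPort, String database, Map<String, String> props) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : props.entrySet()) {
            query.add(e.getKey() + "=" + e.getValue());
        }
        return "jdbc:mysql://" + hostAndPort + "/" + database
                + (props.isEmpty() ? "" : "?" + query);
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>(); // keeps insertion order
        props.put("useUnicode", "true");
        props.put("characterEncoding", "UTF8");
        props.put("allowMultiQueries", "true");
        props.put("rewriteBatchedStatements", "true"); // lets the driver merge batched INSERTs
        System.out.println(build("localhost:3306", "demo", props));
    }
}
```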

2. Approach

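We insert the data with multiple threads. As a first test, 5 threads insert 200,000 rows using MyBatis-Plus's built-in saveBatch() method: prepare the 200k rows, create a thread pool, and let its five threads run the batches.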
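With 200,000 rows and a batch size of 5,000, the work splits into 200000 / 5000 = 40 batches, so each of the 5 threads handles about 8 of them. The helper below computes the half-open `[start, end)` range of every batch, using the same arithmetic as the loop in the test; `BatchRanges` is a hypothetical name used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchRanges {

    /** Splits [0, total) into half-open ranges of at most batchSize elements. */
    public static List<int[]> split(int total, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<int[]> ranges = new ArrayList<>();
        for (int start = 0; start < total; start += batchSize) {
            int end = Math.min(start + batchSize, total); // the last batch may be shorter
            ranges.add(new int[] {start, end});
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<int[]> ranges = split(200_000, 5_000);
        int[] last = ranges.get(ranges.size() - 1);
        // prints "40 batches, last = [195000, 200000)"
        System.out.println(ranges.size() + " batches, last = [" + last[0] + ", " + last[1] + ")");
    }
}
```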

Here is the code:

import org.jeecg.JeecgSystemApplication;
import org.jeecg.modules.demo.test.entity.TestUser;
import org.jeecg.modules.demo.test.mapper.TestUserMapper;
import org.jeecg.modules.demo.test.service.ITestUserService;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;
import java.util.*;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

@RunWith(SpringRunner.class)
@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT,classes = JeecgSystemApplication.class)
public class UserTest {

    @Autowired
    private ITestUserService userService;

    @Autowired
    private TestUserMapper testUserMapper;

    @Test
    public void testInsertBatchMulThreadSaveBatch() throws Exception {
        int totalRecords = 200000;
        int batchSize = 5000;
        int threadCount = 5; // adjust the number of threads to your environment

        ExecutorService executor = Executors.newFixedThreadPool(threadCount);
        List<Future<Void>> futures = new ArrayList<>();

        long s = System.currentTimeMillis();
        try {
            for (int i = 0; i < totalRecords; i += batchSize) {
                int startIndex = i;
                int endIndex = Math.min(i + batchSize, totalRecords);

                // Each batch is built on the calling thread and stays referenced until its task has run
                List<TestUser> batchList = new ArrayList<>(endIndex - startIndex);
                for (int j = startIndex; j < endIndex; j++) {
                    TestUser user = new TestUser();
                    user.setName("张三");
                    user.setAge("20");
                    user.setProvince("重庆市");
                    user.setSalary("200000");
                    user.setRemark("diitch");
                    batchList.add(user);
                }

                // Each task inserts one batch via MyBatis-Plus IService#saveBatch
                Future<Void> future = executor.submit(() -> {
                    userService.saveBatch(batchList);
                    return null;
                });
                futures.add(future);
            }

            // Wait for all tasks; get() rethrows any exception thrown inside a task
            for (Future<Void> future : futures) {
                future.get();
            }
        } finally {
            executor.shutdown();
        }
        System.out.println("Saving " + totalRecords + " rows took " + (System.currentTimeMillis() - s) + " ms");
    }
}
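The same fan-out-and-wait pattern can be tried without Spring or a database. In this sketch, `FakeUserStore.saveBatch` is a hypothetical stand-in for MyBatis-Plus's `saveBatch` that only counts rows; `invokeAll` submits every batch and blocks until all of them have finished, and `Future.get()` rethrows any task failure wrapped in an `ExecutionException`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

public class ParallelInsertDemo {

    /** Hypothetical stand-in for userService.saveBatch: it only counts the rows it receives. */
    static class FakeUserStore {
        final AtomicLong saved = new AtomicLong();

        void saveBatch(List<String> rows) {
            saved.addAndGet(rows.size());
        }
    }

    /** "Inserts" total generated rows in batches of batchSize on a pool of the given size. */
    public static long run(int total, int batchSize, int threads)
            throws InterruptedException, ExecutionException {
        FakeUserStore store = new FakeUserStore();
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (int i = 0; i < total; i += batchSize) {
                int start = i;
                int end = Math.min(i + batchSize, total);
                tasks.add(() -> {
                    List<String> batch = new ArrayList<>(end - start);
                    for (int j = start; j < end; j++) {
                        batch.add("user-" + j);
                    }
                    store.saveBatch(batch);
                    return null;
                });
            }
            for (Future<Void> f : executor.invokeAll(tasks)) {
                f.get(); // rethrows if a batch failed
            }
        } finally {
            executor.shutdown();
        }
        return store.saved.get();
    }

    public static void main(String[] args) throws Exception {
        long t0 = System.currentTimeMillis();
        long saved = run(200_000, 5_000, 5);
        System.out.println("saved " + saved + " rows in " + (System.currentTimeMillis() - t0) + " ms");
    }
}
```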

3. Optimization

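Holding all 200,000 objects in memory at once can indeed lead to an out-of-memory error (OOM). To optimize the code, consider the following:

Reduce memory usage: instead of creating every object up front, create and process the data batch by batch.
Optimize the insert logic: use database transactions and batch inserts to cut the number of round trips to the database.
Configure the thread pool sensibly: size it according to the available system resources.
Process asynchronously: use an asynchronous programming model to improve throughput.
Clean up resources: release resources as soon as they are no longer needed.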
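For the point about configuring the pool sensibly, one JDK-only option is a `ThreadPoolExecutor` with a bounded queue and `CallerRunsPolicy`: once the queue is full, the submitting thread runs the next task itself, so it stops building new batches until capacity frees up. Even when batches are built before submission (as in the first test), only about `threads + queueCapacity + 1` of them exist at a time. The pool size, queue capacity and the row-counting stand-in for `saveBatch` are illustrative assumptions, not tuned values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class BoundedPoolInsert {

    /** Builds each batch on the caller and hands it to a pool whose queue is bounded. */
    public static long insertAll(int total, int batchSize, int threads, int queueCapacity)
            throws InterruptedException {
        AtomicLong saved = new AtomicLong(); // hypothetical sink: counts "inserted" rows
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy()); // back-pressure when the queue is full
        for (int i = 0; i < total; i += batchSize) {
            int end = Math.min(i + batchSize, total);
            List<Integer> batch = new ArrayList<>(end - i);
            for (int j = i; j < end; j++) {
                batch.add(j); // stand-in for building a TestUser
            }
            // If the queue is full, this task runs on the current thread before the next batch is built
            pool.execute(() -> saved.addAndGet(batch.size())); // stand-in for saveBatch(batch)
        }
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("batches did not finish in time");
        }
        return saved.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(insertAll(200_000, 5_000, 5, 10));
    }
}
```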

Here is the optimized code:

import org.jeecg.JeecgSystemApplication;
import org.jeecg.modules.demo.test.entity.TestUser;
import org.jeecg.modules.demo.test.mapper.TestUserMapper;
import org.jeecg.modules.demo.test.service.ITestUserService;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;
import java.util.concurrent.*;
import java.util.*;

@RunWith(SpringRunner.class)
@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT, classes = JeecgSystemApplication.class)
public class UserTest {

    @Autowired
    private ITestUserService userService;

    @Autowired
    private TestUserMapper testUserMapper;

    @Test
    public void testInsertBatchMulThreadSaveBatch() throws Exception {
        int totalRecords = 200000;
        int batchSize = 5000;
        int threadCount = 5; // adjust the number of threads to your environment
        int batchCount = (totalRecords + batchSize - 1) / batchSize; // number of tasks (40 here)

        ExecutorService executor = Executors.newFixedThreadPool(threadCount);
        CountDownLatch latch = new CountDownLatch(batchCount); // one count per submitted task, not per thread

        long startTime = System.currentTimeMillis();
        for (int i = 0; i < totalRecords; i += batchSize) {
            int startIndex = i;
            int endIndex = Math.min(i + batchSize, totalRecords);

            executor.submit(() -> {
                try {
                    // The batch is built inside the task, so queued tasks hold only two indices
                    List<TestUser> batchList = new ArrayList<>(endIndex - startIndex);
                    for (int j = startIndex; j < endIndex; j++) {
                        TestUser user = new TestUser();
                        user.setName("张三");
                        user.setAge("20");
                        user.setProvince("重庆市");
                        user.setSalary("200000");
                        user.setRemark("diitch");
                        batchList.add(user);
                    }
                    userService.saveBatch(batchList); // MyBatis-Plus IService batch insert
                } catch (Exception e) {
                    e.printStackTrace(); // a failed batch is only logged; it does not fail the test
                } finally {
                    latch.countDown(); // this task is done, whether it succeeded or not
                }
            });
        }

        latch.await(); // wait until every batch has finished
        executor.shutdown();
        long endTime = System.currentTimeMillis();
        System.out.println("Saving " + totalRecords + " rows took " + (endTime - startTime) + " ms");
    }
}

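Note that this example relies on ITestUserService providing MyBatis-Plus's saveBatch method (as it does when the service extends IService<TestUser>), which inserts the given List<TestUser> in batches. If your service does not offer such a method, you may need to add it to the service layer or implement the batch insert logic in TestUserMapper.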
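A mapper-level batch insert usually means one multi-row `INSERT ... VALUES (...), (...)` statement per batch. The sketch below only builds such a parameterized SQL string to show its shape; the class name, the table `test_user` and its column names are hypothetical, and in a real MyBatis mapper this SQL would normally be generated with `<foreach>` rather than by string building. Very large batches can exceed MySQL's `max_allowed_packet`, so batch sizes should stay moderate.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.StringJoiner;

public class MultiRowInsertSql {

    /**
     * Builds "INSERT INTO table (c1, c2) VALUES (?, ?), (?, ?), ..." for rowCount rows.
     * Table and column names must be trusted constants, never user input.
     */
    public static String build(String table, List<String> columns, int rowCount) {
        if (columns.isEmpty() || rowCount <= 0) {
            throw new IllegalArgumentException("need at least one column and one row");
        }
        String placeholders = "(" + String.join(", ", Collections.nCopies(columns.size(), "?")) + ")";
        StringJoiner values = new StringJoiner(", ");
        for (int i = 0; i < rowCount; i++) {
            values.add(placeholders);
        }
        return "INSERT INTO " + table + " (" + String.join(", ", columns) + ") VALUES " + values;
    }

    public static void main(String[] args) {
        // prints "INSERT INTO test_user (name, age, province) VALUES (?, ?, ?), (?, ?, ?)"
        System.out.println(build("test_user", Arrays.asList("name", "age", "province"), 2));
    }
}
```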

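Replacing the list of Futures with a CountDownLatch also simplifies synchronization: the main thread just waits until the count reaches zero and then shuts the pool down. Two caveats apply. The latch must be initialized with the number of submitted tasks (one per batch), not the number of threads, otherwise await() returns before all batches are done. And an exception inside a task no longer reaches the caller, so a failed batch is only printed unless it is recorded explicitly.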
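The sketch below sizes the latch from the batch count and records failures in an `AtomicInteger` instead of only printing them. `LatchBatchInsert`, its `Result` type and the `failOnBatch` hook (which makes chosen batches throw, to simulate a failed insert) are hypothetical; the row counter stands in for `saveBatch`.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.IntPredicate;

public class LatchBatchInsert {

    /** Outcome of a run: rows "saved" and batches that failed. */
    public static final class Result {
        public final long savedRows;
        public final int failedBatches;

        Result(long savedRows, int failedBatches) {
            this.savedRows = savedRows;
            this.failedBatches = failedBatches;
        }
    }

    public static Result run(int total, int batchSize, int threads, IntPredicate failOnBatch)
            throws InterruptedException {
        int batchCount = (total + batchSize - 1) / batchSize; // ceil(total / batchSize)
        CountDownLatch latch = new CountDownLatch(batchCount); // one count per task, not per thread
        AtomicLong saved = new AtomicLong();
        AtomicInteger failed = new AtomicInteger();
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            for (int b = 0; b < batchCount; b++) {
                int batchIndex = b;
                int size = Math.min(batchSize, total - b * batchSize);
                executor.submit(() -> {
                    try {
                        if (failOnBatch.test(batchIndex)) {
                            throw new IllegalStateException("simulated failure in batch " + batchIndex);
                        }
                        saved.addAndGet(size); // stand-in for saveBatch(...)
                    } catch (RuntimeException e) {
                        failed.incrementAndGet(); // record the failure so the caller can see it
                    } finally {
                        latch.countDown();
                    }
                });
            }
            latch.await(); // returns only after every batch has counted down
        } finally {
            executor.shutdown();
        }
        return new Result(saved.get(), failed.get());
    }

    public static void main(String[] args) throws InterruptedException {
        Result r = run(200_000, 5_000, 5, batch -> batch == 3);
        // prints "saved=195000, failed batches=1"
        System.out.println("saved=" + r.savedRows + ", failed batches=" + r.failedBatches);
    }
}
```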
