Optimizing Million-Row Inserts with MyBatis-Plus (Multithreaded)

捡星星同学

Published 2024-08-16 14:19:20

Tags: mybatis, java, programming languages

Article link: https://blog.csdn.net/qq_52110315/article/details/141255985

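If you work with MyBatis-Plus, you have probably had to implement a data import at some point. Most articles on the topic, however, still leave room for improvement in the import time once the data set exceeds one million rows, and that is what this post sets out to optimize.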

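First we need to read the data from Excel. Because of my business requirements, I wrote a custom listener that validates the data (use it only if you need it):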

The validation listener:

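import com.alibaba.excel.context.AnalysisContext;
import com.alibaba.excel.event.AnalysisEventListener;
import lombok.Data;
import java.util.ArrayList;
import java.util.List;

/**
 * Import listener that validates the rows read from Excel.
 */
@Data
public class InsuranceRecordListener extends AnalysisEventListener<InsuranceRecordVo> {
    // Validation error message; empty when every row is valid
    public String errors = "";
    // Every row read from the sheet
    public List<InsuranceRecordVo> allRecords = new ArrayList<>();
    // Rows that passed validation (all-or-nothing: a single failing row leaves this empty)
    public List<InsuranceRecordVo> validRecords = new ArrayList<>();
    // Known reinsurance contract numbers, supplied by the caller
    public List<String> reinsuranceContNos = new ArrayList<>();
    // Known risk codes, supplied by the caller
    public List<String> riskCodes = new ArrayList<>();
    // Known reinsurance companies, supplied by the caller
    List<LdCodePOJO> reinsureComs = new ArrayList<>();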

    @Override
    public void invoke(InsuranceRecordVo data, AnalysisContext context) {
        allRecords.add(data); // collect every row first
    }

    @Override
    public void doAfterAllAnalysed(AnalysisContext context) {
        // Validate once all rows have been read; stop at the first failing row
        for (int i = 0; i < allRecords.size() && errors.isEmpty(); i++) {
            int rowIndex = i + 1; // 1-based row index used in error messages
            validate(allRecords.get(i), rowIndex);
        }
        if (errors.isEmpty()) {
            validRecords.addAll(allRecords); // every row passed, so accept them all
        }
    }

    private void validate(InsuranceRecordVo data, int rowIndex) {
        // The checks are not short-circuited: if several fail, the last message is kept
        if (data.getAccountingYear() == null || !data.getAccountingYear().matches("^\\d{4}$")) {
            errors = String.format("Row %d: xxx has an invalid format, expected YYYY!", rowIndex);
        }
        if (data.getReinsuranceContNo() == null || data.getReinsuranceContNo().isEmpty()) {
            errors = String.format("Row %d: xxx must not be empty!", rowIndex);
        }
        if (data.getManageCom() == null || data.getManageCom().isEmpty()) {
            errors = String.format("Row %d: xxx must not be empty!", rowIndex);
        }
        if (data.getSaleChnl() == null || data.getSaleChnl().isEmpty()) {
            errors = String.format("Row %d: xx must not be empty!", rowIndex);
        }
        if (data.getRiskCode() == null || data.getRiskCode().isEmpty()) {
            errors = String.format("Row %d: xxx must not be empty!", rowIndex);
        }
        if (data.getReinsureCom() == null || data.getReinsureCom().isEmpty()) {
            errors = String.format("Row %d: xxx must not be empty!", rowIndex);
        }
        if (data.getCessComm() == null && data.getProfitCommission() == null) {
            errors = String.format("Row %d: fill in at least one of xxx and xxx!", rowIndex);
        }
        if (!StringUtils.isEmpty(data.getRiskCode())) {
            if (!riskCodes.contains(data.getRiskCode())) {
                errors = String.format("Row %d: xxx does not exist, please check!", rowIndex);
            }
        }
        if (!StringUtils.isEmpty(data.getReinsuranceContNo())) {
            if (!reinsuranceContNos.contains(data.getReinsuranceContNo())) {
                errors = String.format("Row %d: xxx does not exist, please check!", rowIndex);
            }
        }
        if (!StringUtils.isEmpty(data.getReinsureCom())) {
            // reinsureComs holds LdCodePOJO objects, so List.contains(String) can never match.
            // Assumption: LdCodePOJO exposes its code through getCode(); use the real getter here.
            boolean known = reinsureComs.stream()
                    .anyMatch(com -> data.getReinsureCom().equals(com.getCode()));
            if (!known) {
                errors = String.format("Row %d: xxx does not exist, please check!", rowIndex);
            }
        }
    }

}
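
The existence checks in the listener call List.contains once per row, and each call scans the whole list. For a file with a million rows, one possible refinement (not part of the original listener) is to copy the reference values into a HashSet before validating. A minimal sketch, in which the class name CodeLookup and its methods are hypothetical:

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: constant-time membership checks for reference codes. */
final class CodeLookup {
    private final Set<String> codes;

    CodeLookup(Collection<String> knownCodes) {
        // Copy once; each later lookup is a hash probe instead of a list scan.
        this.codes = new HashSet<>(knownCodes);
    }

    /** Returns true when the value is non-empty and present in the reference data. */
    boolean isKnown(String value) {
        return value != null && !value.isEmpty() && codes.contains(value);
    }
}
```

With something like this, the listener could hold one CodeLookup per reference list and call isKnown instead of List.contains.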

Calling the listener and fetching the reference data:

// All reinsurance contract numbers
List<String> reinsuranceContNos = lrProductMapper.selectList(new LambdaQueryWrapper<>())
        .stream().map(LrProductPOJO::getReInsuranceContNo)
        .distinct().collect(Collectors.toList());
// All products (risk codes)
List<String> riskCodes = lrRiskMapper.selectList(new LambdaQueryWrapper<>())
        .stream().map(LrRiskPOJO::getRiskCode)
        .distinct().collect(Collectors.toList());
// All companies
List<LdCodePOJO> reinsureComs = ldCodeMapper.selectList(new LambdaQueryWrapper<LdCodePOJO>()
        .eq(LdCodePOJO::getCodeType, "lrcommapper"));
// Hand the reference data to the listener, then read the first sheet
InputStream inputStream = actualBillingDTO.getFile().getInputStream();
InsuranceRecordListener listener = new InsuranceRecordListener();
listener.setReinsuranceContNos(reinsuranceContNos);
listener.setRiskCodes(riskCodes);
listener.setReinsureComs(reinsureComs);
EasyExcel.read(inputStream, InsuranceRecordVo.class, listener).sheet().doRead();
// errors is empty, and validRecords holds every row, only when the whole file passed validation
String errors = listener.getErrors();
List<InsuranceRecordVo> validRecords = listener.getValidRecords();

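1. Some of you may think of MyBatis-Plus's built-in batch insert. With very large inputs it can run out of memory, because the batch is assembled in memory, so a different approach is needed. In addition, saveBatch ultimately goes through MyBatis's batch execution and still inserts the rows one at a time in a loop, so it is also fairly slow.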
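
To make the cost of row-by-row inserts concrete, the rough sketch below only counts SQL statements and ignores network and database time. It compares one INSERT per row with one multi-row INSERT per batch, using the 1.1 million rows and batch size of 1000 that appear later in this article. The class and method names are made up for illustration.

```java
/** Back-of-the-envelope count of SQL statements for the two insert styles. */
final class StatementCount {
    /** One INSERT per row: the statement count equals the row count. */
    static long rowByRow(long rows) {
        return rows;
    }

    /** One multi-row INSERT per batch: ceil(rows / batchSize) statements. */
    static long multiRow(long rows, int batchSize) {
        return (rows + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        long rows = 1_100_000L;
        System.out.println(rowByRow(rows));       // prints 1100000
        System.out.println(multiRow(rows, 1000)); // prints 1100
    }
}
```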

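2. Others may have seen the rewriteBatchedStatements=true approach. It also works in memory, so it can run out of memory as well, and the generated SQL may become too long for the insert to succeed.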
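
For reference, rewriteBatchedStatements is a connection property of the MySQL JDBC driver (Connector/J) and is normally set as a query parameter on the JDBC URL. The sketch below only shows where the flag goes. The host, port and database name are placeholders, and the helper class is hypothetical.

```java
/** Shows where the rewriteBatchedStatements flag goes in a MySQL JDBC URL. */
final class JdbcUrlExample {
    // Assumption: host, port and database name are placeholders.
    static final String BASE_URL = "jdbc:mysql://localhost:3306/demo";

    /** Appends the flag as a query parameter, respecting existing parameters. */
    static String withRewriteBatchedStatements(String baseUrl) {
        String separator = baseUrl.contains("?") ? "&" : "?";
        return baseUrl + separator + "rewriteBatchedStatements=true";
    }
}
```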

3. If your MyBatis-Plus version is 3.4 or later, you can also inject the InsertBatchSomeColumn method yourself to get a batch insert. The steps are as follows:

3.1. Write a custom SQL injector that extends DefaultSqlInjector and adds the InsertBatchSomeColumn method

public class MySqlInjector extends DefaultSqlInjector {
    @Override
    public List<AbstractMethod> getMethodList(Class<?> mapperClass, TableInfo tableInfo) {
        List<AbstractMethod> methodList = super.getMethodList(mapperClass, tableInfo);
        // Register the batch insert, skipping fields that are only filled on UPDATE
        methodList.add(new InsertBatchSomeColumn(i -> i.getFieldFill() != FieldFill.UPDATE));
        return methodList;
    }
}

3.2. Write a configuration class that registers the custom injector in the Spring container

@Configuration
public class MybatisPlusConfig {
   @Bean
   public MySqlInjector sqlInjector() {
      return new MySqlInjector();
   }
}

3.3. Write a custom BaseMapper that declares the insertBatchSomeColumn method

public interface MyBaseMapper<T> extends BaseMapper<T> {
    /**
     * The method name must match the injected method exactly.
     */
    int insertBatchSomeColumn(List<T> entityList);
}

3.4. Have every mapper that needs batch inserts extend the custom BaseMapper

@Mapper
public interface StudentMapper extends MyBaseMapper<Student> {
}

3.5. Call the method from the service

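@Service
public class StudentServiceImpl extends ServiceImpl<StudentMapper, Student> {
    // No @Override here: ServiceImpl does not declare insertList
    public boolean insertList(Collection<Student> entityList) {
        // baseMapper is the StudentMapper instance provided by ServiceImpl;
        // the mapper method takes a List, so copy the collection into one
        int rows = baseMapper.insertBatchSomeColumn(new ArrayList<>(entityList));
        return rows == entityList.size();
    }
}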
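
insertBatchSomeColumn generates one multi-row INSERT for the whole list it receives, so a very large list is usually split into chunks before the call. A minimal sketch of that chunking: ChunkedInsert and the insertFn parameter are hypothetical stand-ins (insertFn would be something like studentMapper::insertBatchSomeColumn), and the chunk size is a choice to tune, not a MyBatis-Plus recommendation.

```java
import java.util.List;
import java.util.function.ToIntFunction;

/** Hypothetical helper that feeds a list to a batch-insert function chunk by chunk. */
final class ChunkedInsert {
    /** Calls insertFn once per chunk and returns the total affected-row count. */
    static <T> int insertInChunks(List<T> rows, int chunkSize, ToIntFunction<List<T>> insertFn) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int affected = 0;
        for (int from = 0; from < rows.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, rows.size());
            // subList is a view on the original list, so no rows are copied here.
            affected += insertFn.applyAsInt(rows.subList(from, to));
        }
        return affected;
    }
}
```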

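Note: the class-level comment on InsertBatchSomeColumn carries an official warning: **Support varies from database to database!!! Only tested on MySQL!!! Only tested on MySQL!!! Only tested on MySQL!!! Apart from database auto-increment primary keys, which are untested, it should work everywhere in theory!!! If you use auto-increment and get errors, or the primary key is not written back to the entity, do not come asking why, because I do not know either!!!** My guess is that this batch-insert implementation is still incomplete, so the maintainers do not officially support it and instead make us inject it ourselves, so that we understand the trade-offs.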

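4. Some projects, however, are stuck on a MyBatis-Plus version below 3.4.1, such as 3.1, where InsertBatchSomeColumn is not available. In that case you can insert the data with multithreading + batching + SQL concatenation. The code looks like this: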

List<InsuranceRecordVo> validRecords = listener.getValidRecords();
ExecutorService executorService = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
int batchSize = 1000; // rows per batch
int totalSize = validRecords.size();
int batchCount = (totalSize + batchSize - 1) / batchSize; // ceiling division
for (int i = 0; i < batchCount; i++) {
    int fromIndex = i * batchSize;
    int toIndex = Math.min(fromIndex + batchSize, totalSize);
    List<InsuranceRecordVo> batchList = validRecords.subList(fromIndex, toIndex);
    // Hand the batch to the thread pool
    executorService.submit(() -> {
        realityBillMapper.insertRealityBillBatch(batchList);
    });
}
// Shut the pool down and wait for every task to finish
executorService.shutdown();
try {
    if (!executorService.awaitTermination(30, TimeUnit.MINUTES)) {
        executorService.shutdownNow();
    }
} catch (InterruptedException e) {
    executorService.shutdownNow();
    Thread.currentThread().interrupt();
}
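
One thing to keep in mind with executorService.submit: an exception thrown inside a task is stored in the returned Future and goes unnoticed unless get() is called. The sketch below keeps the same batching idea but surfaces failures to the caller. ParallelBatchInsert and the insertFn parameter are hypothetical stand-ins for the call to realityBillMapper.insertRealityBillBatch, so treat it as an illustration rather than the article's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

/** Hypothetical helper: parallel batch inserts that report task failures. */
final class ParallelBatchInsert {
    /** Inserts rows in parallel batches and returns the number of completed batches. */
    static <T> int insertAll(List<T> rows, int batchSize, int threads, Consumer<List<T>> insertFn) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int from = 0; from < rows.size(); from += batchSize) {
                List<T> batch = rows.subList(from, Math.min(from + batchSize, rows.size()));
                futures.add(pool.submit(() -> insertFn.accept(batch)));
            }
            int completed = 0;
            for (Future<?> future : futures) {
                try {
                    future.get(); // blocks until the batch is done; rethrows its failure
                    completed++;
                } catch (ExecutionException e) {
                    throw new IllegalStateException("batch insert failed", e.getCause());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for inserts", e);
                }
            }
            return completed;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the batches run on different threads, they generally do not share one database transaction, so a failure in one batch does not roll back batches that already succeeded. Whether that is acceptable depends on the import.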

An example of the corresponding insert method in the mapper:

@Mapper
public interface RealityBillMapper extends BaseMapper<InsuranceRecordVo> {
    // The column list and each VALUES tuple must name the same fields in the same order
    @Insert("<script>" +
            "INSERT INTO InsuranceRecord (" +
            "    accountingyear, accountingmonth, reinsurancecontno, periodbill" +
            ") VALUES " +
            "<foreach collection='list' item='item' index='index' separator=','>" +
            "    (#{item.accountingYear}, #{item.accountingMonth}, #{item.reinsuranceContNo}, #{item.periodBill})" +
            "</foreach>" +
            "</script>")
    void insertRealityBillBatch(@Param("list") List<InsuranceRecordVo> insuranceRecordVos);
}
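
To see what the foreach element produces, it helps to build the same shape of statement by hand. MyBatis emits one parenthesized tuple per list element, joins the tuples with the separator, and turns each #{...} into a JDBC placeholder. The sketch below reproduces roughly that shape in plain Java. MultiRowInsertSql is a hypothetical helper for illustration, and the exact whitespace MyBatis generates will differ.

```java
import java.util.Collections;

/** Hypothetical helper that builds the shape of a multi-row INSERT statement. */
final class MultiRowInsertSql {
    /** Builds "INSERT INTO t (a, b) VALUES (?, ?), (?, ?)" for the given row count. */
    static String build(String table, String[] columns, int rowCount) {
        if (columns.length == 0 || rowCount < 1) {
            throw new IllegalArgumentException("need at least one column and one row");
        }
        // One placeholder per column, e.g. "(?, ?)"
        String tuple = "(" + String.join(", ", Collections.nCopies(columns.length, "?")) + ")";
        // One tuple per row, comma-separated like the foreach separator
        String values = String.join(", ", Collections.nCopies(rowCount, tuple));
        return "INSERT INTO " + table + " (" + String.join(", ", columns) + ") VALUES " + values;
    }
}
```

With 1000 rows per batch, each statement carries 1000 tuples, which is why the batch size has to stay small enough for the database to accept the statement.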

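That completes the optimization of a million-row import with MyBatis. In my test, importing 1.1 million rows took about 40 seconds, dozens of times faster than inserting with a plain for loop. I hope this article helps!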