批量sql处理优化

批量sql处理优化

大批量数据的增删改是较为常见的场景,不当的处理带来一些问题,最直接的问题就是批量持久化数据导致长事务,从而引起数据库连接池资源耗尽,接口延时等一系列问题

本文以mysql数据库为例,批量插入场景为例子叙述个人在该种场景下的优化见解,只针对于优化方向:事务统一提交、内存暂存表、多线程持久化、数据库批处理语句、框架提供的批处理模式。

此处对数据库批处理语句大致说明:

插入语句可使用:insert into table_name values (...),(...),...,(...)

更新语句可使用:

update table_name,(
(select  1 as id,'1' as name) union
 (select  2 as id,'2' as name) union
 (select  3 as id,'3' as name) union
 (select  4 as id,'4' as name)
  /* ....more data ... */
    ) as table_temp
   set
table_name.name=table_temp.name
 where
table_name.id=table_temp.id

插入或更新可使用:

insert into table_name values 
(id=1,name=1),(...),...,(...)
on duplicate key update
name = values(name),

删除可使用:delete from table_name where id in (...)

存储过程

正常存储过程插入1w条 时间:14s484ms

create table batch_demo
(
    id          int         not null comment 'id'
        primary key,
    batch_name  varchar(32) null,
    batch_value varchar(32) null
)
    comment '批量处理测试表';

drop procedure if exists gen_data;

delimiter $$

create procedure gen_data(in size int)
begin
    #分页提交大小
    declare page_size int default 1000;
    #下标
    declare i int default 1;
    out_while:while i < size do
        set i = i + 1;
        insert into batch_demo(id, batch_name, batch_value) value (i,concat('name',i),concat('value',i));
    end while ;
end $$
delimiter ;

call gen_data(10000);

存储过程使用内存表优化添加1w条数据 时间: 98ms

create table batch_demo
(
    id          int         not null comment 'id'
        primary key,
    batch_name  varchar(32) null,
    batch_value varchar(32) null
)
    comment '批量处理测试表';

create temporary table batch_demo_temp
(
    id          int         not null comment 'id'
        primary key,
    batch_name  varchar(32) null,
    batch_value varchar(32) null
) engine = MEMORY
    comment '批量处理测试表暂存表';

drop procedure if exists gen_data;

delimiter $$

create procedure gen_data(in size int)
begin
    #分页提交大小
    declare page_size int default 1000;
    #下标
    declare i int default 1;
    out_while:while i < size do
        set i = i + 1;
        insert into batch_demo_temp(id, batch_name, batch_value) value (i,concat('name',i),concat('value',i));
    end while ;
    insert into batch_demo select * from batch_demo_temp;
end $$
delimiter ;

call gen_data(10000);

注意:暂存表使用会有大小限制,默认为16M,可以调节my.ini文件或者transaction_alloc_block_size全局常量改变限制。但是个人不建议这样操作,在批量存储前可以将数据分片再存。

事务控制改为手动,并分片控制 时间:346ms

create table batch_demo
(
    id          int         not null comment 'id'
        primary key,
    batch_name  varchar(32) null,
    batch_value varchar(32) null
)
    comment '批量处理测试表';
    
drop procedure if exists gen_data;
delimiter $$

create procedure gen_data(in size int)
begin
    #分页提交大小
    declare page_size int default 1000;
    #下标
    declare i int default 1;
    #开启事务
    start transaction ;
    out_while:while i < size do
        set i = i + 1;
        if 0 <> i % page_size then
            insert into batch_demo(id, batch_name, batch_value) value (i,concat('name',i),concat('value',i));
        else
            commit ;
            start transaction ;
        end if ;
    end while ;
    commit ;
end $$
delimiter ;

Jdbc

代码位置:https://gitee.com/tonyffd/batch_demo/blob/master/src/test/java/top/wecoding/jdbc/TestJdbcDemo.java

jdbc有Statement和prepareStatement两种处理sql的方式,个人推荐使用prepareStatement

正常循环执行 耗时:20507

@Test
@SneakyThrows
void testSimpleInsertData5(){
    PreparedStatement preparedStatement = connection.prepareStatement("insert into batch_demo(id, batch_name, batch_value) value (?,?,?)");
    long starTime = System.currentTimeMillis();
    for (int i = 0; i < 10000; i++) {
        preparedStatement.setInt(1,i);
        preparedStatement.setString(2,"name"+i);
        preparedStatement.setString(3,"value"+i);
        preparedStatement.executeUpdate();
    }
    System.out.println("耗时:"+String.valueOf(System.currentTimeMillis()-starTime));
}

事务统一提交 耗时:5621

@Test
@SneakyThrows
void testSimpleInsertData6(){
    connection.setAutoCommit(false);
    PreparedStatement preparedStatement = connection.prepareStatement("insert into batch_demo(id, batch_name, batch_value) value (?,?,?)");
    long starTime = System.currentTimeMillis();
    for (int i = 0; i < 10000; i++) {
        preparedStatement.setInt(1,i);
        preparedStatement.setString(2,"name"+i);
        preparedStatement.setString(3,"value"+i);
        preparedStatement.executeUpdate();
    }
    connection.commit();
    System.out.println("耗时:"+String.valueOf(System.currentTimeMillis()-starTime));
}

多线程持久化 耗时:4353

注意线程池和数据库连接池的使用限制约束,并需要考虑线程安全问题和事务控制问题

@Test
@SneakyThrows
void testSimpleInsertData8() {
    //手动创建线程池
    ThreadPoolExecutor poolExecutor = new ThreadPoolExecutor(Runtime.getRuntime().availableProcessors() / 2, Runtime.getRuntime().availableProcessors(), 1, TimeUnit.MINUTES, new ArrayBlockingQueue<>(10000), (r, exec) -> {
        throw new RejectedExecutionException("线程池已满");
    });
    long starTime = System.currentTimeMillis();
    for (int i = 0; i < 10000; i++) {
        final int index = i;
        poolExecutor.submit(() -> {
            try (Connection connection = dataSource.getConnection();
                 final PreparedStatement preparedStatement = connection.prepareStatement("insert into batch_demo(id, batch_name, batch_value) value (?,?,?)");) {
                preparedStatement.setInt(1, index);
                preparedStatement.setString(2, "name" + index);
                preparedStatement.setString(3, "value" + index);
                preparedStatement.executeUpdate();
            } catch (SQLException e) {
                e.printStackTrace();
            }

        });
    }
    poolExecutor.shutdown();
    poolExecutor.awaitTermination(30, TimeUnit.MINUTES);
    System.out.println("耗时:" + String.valueOf(System.currentTimeMillis() - starTime));
}

内存暂存表优化 耗时:5316

@Test
@SneakyThrows
void testSimpleInsertData2() {
    Statement statement = connection.createStatement();
    long starTime = System.currentTimeMillis();
    statement.executeUpdate("create temporary table batch_demo_temp\n" +
                            "(\n" +
                            "    id          int         not null comment 'id'\n" +
                            "        primary key,\n" +
                            "    batch_name  varchar(32) null,\n" +
                            "    batch_value varchar(32) null\n" +
                            ") engine = MEMORY\n" +
                            "    comment '批量处理测试表暂存表';");
    for (int i = 0; i < 10000; i++) {
        statement.executeUpdate("insert into batch_demo_temp(id, batch_name, batch_value) value (" + i + ",concat('name'," + i + "),concat('value'," + i + "));");
    }
    statement.executeUpdate("insert into batch_demo select * from batch_demo_temp;");
    System.out.println("耗时:" + String.valueOf(System.currentTimeMillis() - starTime));
}

使用批处理模式 耗时:281

注意在jdbc连接配置url中添加rewriteBatchedStatements=true参数,否则可能不生效

@Test
@SneakyThrows
void testSimpleInsertData7() {
    PreparedStatement preparedStatement = connection.prepareStatement("insert into batch_demo(id, batch_name, batch_value) value (?,?,?)");
    long starTime = System.currentTimeMillis();
    for (int i = 0; i < 10000; i++) {
        preparedStatement.setInt(1, i);
        preparedStatement.setString(2, "name" + i);
        preparedStatement.setString(3, "value" + i);
        preparedStatement.addBatch();
    }
    preparedStatement.executeBatch();
    System.out.println("耗时:" + String.valueOf(System.currentTimeMillis() - starTime));
}

数据库批处理语句 耗时:269

@Test
@SneakyThrows
void testSimpleInsertData9() {
    Statement statement = connection.createStatement();
    StringBuilder sb = new StringBuilder("insert into batch_demo(id, batch_name, batch_value) values ");
    long starTime = System.currentTimeMillis();
    for (int i = 0; i < 10000; i++) {
        sb.append(" ("+i+",'name"+i+"','value"+i+"') ,");
    }
    String sql = sb.substring(0, sb.length() - 1);
    statement.executeUpdate(sql);
    System.out.println("耗时:" + String.valueOf(System.currentTimeMillis() - starTime));
}

结论

  • 可以看出使用框架和数据库提供的批处理方式为最优解
  • 个人不建议这种场景直接使用多线程持久化,理由如下:很多情况下我们都需要获取当前登录用户信息作为填充字段,然而很多框架在多线程下无法获取当前登录用户信息,并且多线程是Spring事务失效的经典场景之一
  • 即使没有其它选择只能使用多线程进行进行优化,也不要对线程池的线程数量定义太多,定义为当前服务器cpu核数的2倍较好

Spring Jdbc

代码位置:https://gitee.com/tonyffd/batch_demo/blob/master/src/test/java/top/wecoding/jdbcTemplate/TestJdbcTemplateDemo.java

正常循环执行 耗时:20822

@Test
void testInsert0() {
    long starTime = System.currentTimeMillis();
    for (int i = 0; i < 10000; i++) {
        jdbcTemplate.update("insert into batch_demo(id, batch_name, batch_value) value (?,?,?)",
                            new Object[]{i, "name" + i, "value" + i});
    }
    System.out.println("耗时:" + String.valueOf(System.currentTimeMillis() - starTime));
}

事务统一提交 耗时:5774

  • @Transactional事务注解
@Test
@Transactional
void testInsert1() {
    long starTime = System.currentTimeMillis();
    for (int i = 0; i < 10000; i++) {
        jdbcTemplate.update("insert into batch_demo(id, batch_name, batch_value) value (?,?,?)",
                            new Object[]{i, "name" + i, "value" + i});
    }
    System.out.println("耗时:" + String.valueOf(System.currentTimeMillis() - starTime));
}
  • 声明式事务
@Test
void testInsert2() {
    TransactionTemplate template = SpringUtil.getBean(TransactionTemplate.class);
    long starTime = System.currentTimeMillis();
    template.execute(status -> {
        for (int i = 0; i < 10000; i++) {
            jdbcTemplate.update("insert into batch_demo(id, batch_name, batch_value) value (?,?,?)",
                                new Object[]{i, "name" + i, "value" + i});
        }
        return null;
    });
    System.out.println("耗时:" + String.valueOf(System.currentTimeMillis() - starTime));
}

多线程持久化 耗时:4370

@Test
@SneakyThrows
void testInsert8() {
    long starTime = System.currentTimeMillis();
    //手动创建线程池
    ThreadPoolExecutor poolExecutor = new ThreadPoolExecutor(Runtime.getRuntime().availableProcessors()/2, Runtime.getRuntime().availableProcessors(), 1, TimeUnit.MINUTES, new ArrayBlockingQueue<>(10000), (r, exec) -> {
        throw new RejectedExecutionException("线程池已满");
    });
    for (int i = 0; i < 10000; i++) {
        final int index = i;
        poolExecutor.submit(()->{
            jdbcTemplate.update("insert into batch_demo(id, batch_name, batch_value) value (?,?,?)",new Object[]{index,"name"+index,"value"+index});
        });
    }
    poolExecutor.shutdown();
    poolExecutor.awaitTermination(30,TimeUnit.MINUTES);
    System.out.println("耗时:" + String.valueOf(System.currentTimeMillis() - starTime));
}

内存暂存表优化 耗时:5359

@Test
void testInsert3() {
    long starTime = System.currentTimeMillis();
    jdbcTemplate.execute("create temporary table batch_demo_temp\n" +
                         "(\n" +
                         "    id          int         not null comment 'id'\n" +
                         "        primary key,\n" +
                         "    batch_name  varchar(32) null,\n" +
                         "    batch_value varchar(32) null\n" +
                         ") engine = MEMORY\n" +
                         "    comment '批量处理测试表暂存表';");
    for (int i = 0; i < 10000; i++) {
        jdbcTemplate.update("insert into batch_demo_temp(id, batch_name, batch_value) value (?,?,?)",
                            new Object[]{i, "name" + i, "value" + i});
    }
    jdbcTemplate.execute("insert into batch_demo select * from batch_demo_temp;");
    System.out.println("耗时:" + String.valueOf(System.currentTimeMillis() - starTime));
}

使用批处理模式 耗时:343

  • jdbcTemplate.batchUpdate 通用 耗时:343
@Test
void testInsert5() {
    long starTime = System.currentTimeMillis();
    jdbcTemplate.batchUpdate("insert into batch_demo(id, batch_name, batch_value) value (?,?,?)", new BatchPreparedStatementSetter() {
        @Override
        public void setValues(PreparedStatement ps, int i) throws SQLException {
            ps.setInt(1,i);
            ps.setString(2,"name"+i);
            ps.setString(3,"value"+i);
        }

        @Override
        public int getBatchSize() {
            return 10000;
        }
    });
    System.out.println("耗时:" + String.valueOf(System.currentTimeMillis() - starTime));
}
  • BatchSqlUpdate适合大批量数据处理,分批次处理 耗时:400
@Test
void testInsert6() {
    long starTime = System.currentTimeMillis();
    BatchSqlUpdate batchSqlUpdate = new BatchSqlUpdate(jdbcTemplate.getDataSource(),"insert into batch_demo(id, batch_name, batch_value) value (?,?,?)");
    int[] types = {Types.INTEGER,Types.VARCHAR,Types.VARCHAR};
    batchSqlUpdate.setTypes(types);
    // 分批处理大小
    batchSqlUpdate.setBatchSize(1000);
    for (int i = 0; i < 10000; i++) {
        batchSqlUpdate.update(new Object[]{i,"name"+i,"value"+i});
    }
    batchSqlUpdate.flush();
    System.out.println("耗时:" + String.valueOf(System.currentTimeMillis() - starTime));
}

数据库批处理语句 耗时: 281

@Test
void testInsert7() {
    long starTime = System.currentTimeMillis();
    StringBuilder builder = new StringBuilder("insert into batch_demo(id, batch_name, batch_value) values ");
    for (int i = 0; i < 10000; i++) {
        builder.append(" ("+i+",'name"+i+"','value"+i+"'),");
    }
    String sql = builder.substring(0, builder.length() - 1);
    jdbcTemplate.update(sql);
    System.out.println("耗时:" + String.valueOf(System.currentTimeMillis() - starTime));
}

Mybatis

代码位置: https://gitee.com/tonyffd/batch_demo/blob/master/src/test/java/top/wecoding/mybatis/TestMybatisDemo.java

为简化文档,此处只演示调用代码,具体mapper中的sql操作下载演示demo查看

正常循环执行 耗时:27097

@Test
void testInsert0(){
    BatchDemo batchDemo = new BatchDemo();
    long starTime = System.currentTimeMillis();
    for (int i = 0; i < 10000; i++) {
        batchDemo.setId(i);
        batchDemo.setBatchName("name"+i);
        batchDemo.setBatchValue("value"+i);
        batchDemoMapper.insertForMybatis(batchDemo);
    }
    System.out.println("耗时:" + String.valueOf(System.currentTimeMillis() - starTime));
}

事务统一提交 耗时:11328

@Test
@Transactional
void testInsert1(){
    BatchDemo batchDemo = new BatchDemo();
    long starTime = System.currentTimeMillis();
    for (int i = 0; i < 10000; i++) {
        batchDemo.setId(i);
        batchDemo.setBatchName("name"+i);
        batchDemo.setBatchValue("value"+i);
        batchDemoMapper.insertForMybatis(batchDemo);
    }
    System.out.println("耗时:" + String.valueOf(System.currentTimeMillis() - starTime));
}

多线程持久化 耗时:3503

此处的线程池是使用SpringBoot项目中配置好的线程池,可自己定义

@Test
@SneakyThrows
void testInsert5(){
    ExecutorService executorService = SpringUtil.getBean(AsyncConfig.ExecutorType.WORK_EXECUTOR);
    long starTime = System.currentTimeMillis();
    for (int i = 0; i < 10000; i++) {
        BatchDemo batchDemo = new BatchDemo();
        batchDemo.setId(i);
        batchDemo.setBatchName("name"+i);
        batchDemo.setBatchValue("value"+i);
        executorService.submit(()->{
            batchDemoMapper.insertForMybatis(batchDemo);
        });
    }
    executorService.shutdown();
    executorService.awaitTermination(30, TimeUnit.MINUTES);
    System.out.println("耗时:" + String.valueOf(System.currentTimeMillis() - starTime));
}

内存暂存表优化 耗时:10661

@Test
void testInsert2(){
    BatchDemo batchDemo = new BatchDemo();
    long starTime = System.currentTimeMillis();
    //创建暂存表
    batchDemoMapper.createTempTable();
    for (int i = 0; i < 10000; i++) {
        batchDemo.setId(i);
        batchDemo.setBatchName("name"+i);
        batchDemo.setBatchValue("value"+i);
        //给暂存表添加数据
        batchDemoMapper.insertTempTableForMybatis(batchDemo);
    }
    //将暂存表数据导入正式表
    batchDemoMapper.insertByTempTableForMybatis();
    System.out.println("耗时:" + String.valueOf(System.currentTimeMillis() - starTime));
}

使用批处理模式 耗时:770

使用sqlSessionFactory.openSession(ExecutorType.BATCH);切换sqlSession模式,三种模式分别为

SIMPLE, REUSE, BATCH
@Test
void testInsert4(){
    //使用批处理模式
    SqlSession sqlSession = sqlSessionFactory.openSession(ExecutorType.BATCH);
    BatchDemoMapper mapper = sqlSession.getMapper(BatchDemoMapper.class);
    long starTime = System.currentTimeMillis();
    for (int i = 0; i < 10000; i++) {
        BatchDemo batchDemo = new BatchDemo();
        batchDemo.setId(i);
        batchDemo.setBatchName("name"+i);
        batchDemo.setBatchValue("value"+i);
        mapper.insertForMybatis(batchDemo);
    }
    sqlSession.commit();
    sqlSession.close();
    System.out.println("耗时:" + String.valueOf(System.currentTimeMillis() - starTime));
}

数据库批处理语句 耗时:903

对应的*Mapper.xml中使用foreach遍历拼接sql语句

<insert id="insertListForMybatis" keyColumn="id" keyProperty="id" parameterType="list" useGeneratedKeys="true">
   insert into batch_demo values
   <foreach collection="batchDemoList" index="index" item="batchDemo" separator=",">
       ( #{batchDemo.id,jdbcType=INTEGER},
       #{batchDemo.batchName,jdbcType=VARCHAR},
       #{batchDemo.batchValue,jdbcType=VARCHAR}
       )
   </foreach>
</insert>
@Test
void testInsert3(){
    List<BatchDemo> batchDemos = new ArrayList<>(10000);
    long starTime = System.currentTimeMillis();
    for (int i = 0; i < 10000; i++) {
        BatchDemo batchDemo = new BatchDemo();
        batchDemo.setId(i);
        batchDemo.setBatchName("name"+i);
        batchDemo.setBatchValue("value"+i);
        batchDemos.add(batchDemo);
    }
    batchDemoMapper.insertListForMybatis(batchDemos);
    System.out.println("耗时:" + String.valueOf(System.currentTimeMillis() - starTime));
}

Mybatis Plus

代码位置: https://gitee.com/tonyffd/batch_demo/blob/master/src/test/java/top/wecoding/mybatisPlus/TestMybatisPlusDemo.java

作为mybatis的拓展,拥有mybatis的所有功能,直接使用Mybatis的用法完全OK,当然也一些自己的api可以使用,该测试只使用了ServiceImpl中的方法,且有些方法于Mybatis一致,省去不写。

使用批处理模式 耗时:903

@Test
void testInsert3(){
    List<BatchDemo> batchDemos = new ArrayList<>(10000);
    long starTime = System.currentTimeMillis();
    for (int i = 0; i < 10000; i++) {
        BatchDemo batchDemo = new BatchDemo();
        batchDemo.setId(i);
        batchDemo.setBatchName("name"+i);
        batchDemo.setBatchValue("value"+i);
        batchDemos.add(batchDemo);
    }
    batchDemoService.saveBatch(batchDemos);
    System.out.println("耗时:" + String.valueOf(System.currentTimeMillis() - starTime));
}

可以直接使用SqlHelper中的批处理api,实现ServiceImpl后调用的api较为简单,并且也是使用SqlHelper中的api实现的

Spring Data Jpa

代码位置: https://gitee.com/tonyffd/batch_demo/blob/master/src/test/java/top/wecoding/jpa/TestJpaDemo.java

Spring Data Jpa中包含hibernatehibernate也是jpa的一种实现,故直接测试Jpa,不再测试hibernate

正常循环执行 耗时:44241

  • 使用JpaRepository中的save方法,较为耗时,因为它会先从数据库查询一遍,判断持久化数据是否为新数据 耗时:44241
@Test
@Transactional
void testInsert0(){
    BatchDemo batchDemo = new BatchDemo();
    long starTime = System.currentTimeMillis();
    for (int i = 0; i < 10000; i++) {
        batchDemo.setId(i);
        batchDemo.setBatchName("name"+i);
        batchDemo.setBatchValue("value"+i);
        batchDemoRepository.save(batchDemo);
    }
    System.out.println("耗时:" + String.valueOf(System.currentTimeMillis() - starTime));
}
  • 强制执行insert语句,在Repository中定义一下方法 耗时:耗时:36168
@Modifying
@Transactional
@Query(nativeQuery = true,value = "insert into batch_demo(id, batch_name, batch_value) VALUE (:id,:batchName,:batchValue)")
void saveBatchDemo(@Param("id") Integer id, @Param("batchName") String batchName, @Param("batchValue") String batchValue);
@Test
@Transactional(propagation = Propagation.SUPPORTS)
void testInsert2() {
    long starTime = System.currentTimeMillis();
    for (int i = 0; i < 10000; i++) {
        BatchDemo batchDemo = new BatchDemo();
        batchDemo.setId(i);
        batchDemo.setBatchName("name"+i);
        batchDemo.setBatchValue("value"+i);
        batchDemoRepository.saveBatchDemo(batchDemo.getId(),batchDemo.getBatchName(),batchDemo.getBatchValue());
    }
    batchDemoRepository.flush();
    System.out.println("耗时:" + String.valueOf(System.currentTimeMillis() - starTime));
}

事务统一提交 耗时:12363

  • 使用JpaRepository中的saveAll方法 耗时:12363
@Test
void testInsert1(){
    long starTime = System.currentTimeMillis();
    List<BatchDemo> batchDemos = new ArrayList<>(10000);
    for (int i = 0; i < 10000; i++) {
        BatchDemo batchDemo = new BatchDemo();
        batchDemo.setId(i);
        batchDemo.setBatchName("name"+i);
        batchDemo.setBatchValue("value"+i);
        batchDemos.add(batchDemo);
    }
    batchDemoRepository.saveAll(batchDemos);
    System.out.println("耗时:" + String.valueOf(System.currentTimeMillis() - starTime));
}
  • 强制性执行insert语句 耗时:6549
@Transactional(rollbackFor = Exception.class)
public void insert3() {
    long starTime = System.currentTimeMillis();
    for (int i = 0; i < 10000; i++) {
        BatchDemo batchDemo = new BatchDemo();
        batchDemo.setId(i);
        batchDemo.setBatchName("name" + i);
        batchDemo.setBatchValue("value" + i);
        batchDemoRepository.saveBatchDemo(batchDemo.getId(), batchDemo.getBatchName(), batchDemo.getBatchValue());
    }
    batchDemoRepository.flush();
    System.out.println("耗时:" + String.valueOf(System.currentTimeMillis() - starTime));
}
  • 使用entityManager.persist直接插入 耗时: 6251
@Transactional(rollbackFor = Exception.class)
public void insert4() {
    long starTime = System.currentTimeMillis();
    for (int i = 0; i < 10000; i++) {
        BatchDemo batchDemo = new BatchDemo();
        batchDemo.setId(i);
        batchDemo.setBatchName("name" + i);
        batchDemo.setBatchValue("value" + i);
        entityManager.persist(batchDemo);
    }
    entityManager.flush();
    System.out.println("耗时:" + String.valueOf(System.currentTimeMillis() - starTime));
}

多线程持久化 耗时:4412

@Test
@SneakyThrows
void testInsert5() {
    ExecutorService executorService = SpringUtil.getBean(AsyncConfig.ExecutorType.WORK_EXECUTOR);
    long starTime = System.currentTimeMillis();
    for (int i = 0; i < 10000; i++) {
        final Integer index = i;
        executorService.submit(()->{
            batchDemoRepository.saveBatchDemo(index,"name"+index,"value"+index);
        });
    }
    executorService.shutdown();
    executorService.awaitTermination(30, TimeUnit.MINUTES);
    System.out.println("耗时:" + String.valueOf(System.currentTimeMillis() - starTime));
}

内存暂存表优化 耗时:6173

@Transactional(rollbackFor = Exception.class)
public void insert6() {
    long starTime = System.currentTimeMillis();
    Query createTable = entityManager.createNativeQuery("create temporary table batch_demo_temp\n" +
                                                        "(\n" +
                                                        "    id          int         not null comment 'id'\n" +
                                                        "        primary key,\n" +
                                                        "    batch_name  varchar(32) null,\n" +
                                                        "    batch_value varchar(32) null\n" +
                                                        ") engine = MEMORY\n" +
                                                        "    comment '批量处理测试表暂存表';");
    createTable.executeUpdate();
    for (int i = 0; i < 10000; i++) {
        entityManager.createNativeQuery("insert into batch_demo_temp(id, batch_name, batch_value) value ("+i+",'name"+i+"','value"+i+"')").executeUpdate();
    }
    entityManager.createNativeQuery("insert into batch_demo select * from batch_demo_temp;").executeUpdate();
    System.out.println("耗时:" + String.valueOf(System.currentTimeMillis() - starTime));
}

使用批处理模式 耗时:523

  • 配置hibernate的批处理大小
spring:
	jpa:
        properties:
              hibernate:
                jdbc:
                  batch_size: 500
                order_inserts: true
                order_updates: true
  • 分批次提交
@Transactional(rollbackFor = Exception.class)
public void insert8() {
    long starTime = System.currentTimeMillis();
    for (int i = 0; i < 10000; i++) {
        BatchDemo batchDemo = new BatchDemo();
        batchDemo.setId(i);
        batchDemo.setBatchName("name" + i);
        batchDemo.setBatchValue("value" + i);
        entityManager.persist(batchDemo);
        if (i%500==0){
            entityManager.flush();
            entityManager.clear();
        }
    }
    entityManager.flush();
    System.out.println("耗时:" + String.valueOf(System.currentTimeMillis() - starTime));
}

数据库批处理语句 耗时:386

@Transactional(rollbackFor = Exception.class)
public void insert7() {
    long starTime = System.currentTimeMillis();
    StringBuilder sb = new StringBuilder("insert into batch_demo(id, batch_name, batch_value) values ");
    for (int i = 0; i < 10000; i++) {
        sb.append(" ("+i+",'name"+i+"','value"+i+"') ,");
    }
    String sql = sb.substring(0, sb.length() - 1);
    entityManager.createNativeQuery(sql).executeUpdate();
    System.out.println("耗时:" + String.valueOf(System.currentTimeMillis() - starTime));
}

总结

上述的各种持久化框架中,均只使用了单个方向进行优化,在实际业务场景结合使用效果可能会更好。

  • 从这几个优化方向来说使用数据库批处理语句和持久化框架的批处理模式明显是效果最好的

  • 内存暂存表一般只适用于插入场景,并且这种内存暂存表是一次数据库连接状态下存在的,随着数据库连接的中断,暂存表也会没了,所以我们可以使用内存暂存表和多线程结合使用,效果应该会更好

  • 事务统一提交多线程持久化是两个较为需要考虑使用场景的点,对于事务统一提交来说,一般情况下我们都会使用Transactional注解进行管理,或者有的系统中把业务层全部使用切面声明事务,所以一般情况下是统一提交的,但是这也正是长事务的来源之一,多线程持久化必然会遇到多线程事务控制,异常处理等问题,下面将展开详细说明

@Transactional的引发的问题

这个注解正常情况下会使用在业务层,业务层的逻辑如果在逻辑较为复杂或者处理数据量大的情况下,就会造成单条业务的事务时间很长,可以在这张表中查看information_schema.INNODB_TRX当前当前事务有哪些并筛选时间较长的事务。

解决方法: 1、使用声明式事务进行控制,将事务粒度缩小 (推荐)

2、将业务逻辑处理和持久化操作处理进行拆分

上面的解决方法2中又可能会引发一种问题,如果把业务处理逻辑和持久化处理拆分为一个业务类中的两个方法,这种时候直接用this.saveXXX()待用的话就会引发事务失效,因为添加事务注解后会在生成bean的时候使用动态代理进行代理,this调用的不是动态代理类,所以事务生效不了。

解决方法:

1、添加manage层,将持久化操作下沉

2、在本类中注入自己的代理类,有三级缓存,不会造成循环依赖问题

@service
public class AServiceImpl implements AService{
    @Resource
    private AService aService;
    
    public void save(){
        aService.doSave();
    }
    ...
    public void doSave(){}
}

3、使用AopContext.currentProxy()获取当前类的代理类进行调用,注意需要配置@EnableAspectJAutoProxy(exposeProxy = true)

多线程处理

import cn.hutool.core.util.ObjectUtil;
import cn.hutool.extra.spring.SpringUtil;
import lombok.RequiredArgsConstructor;
import lombok.SneakyThrows;
import org.springframework.jdbc.datasource.DataSourceTransactionManager;
import org.springframework.stereotype.Service;
import org.springframework.transaction.PlatformTransactionManager;
import org.springframework.transaction.TransactionDefinition;
import org.springframework.transaction.TransactionManager;
import org.springframework.transaction.TransactionStatus;
import org.springframework.transaction.annotation.Propagation;
import org.springframework.transaction.annotation.Transactional;
import org.springframework.transaction.support.DefaultTransactionDefinition;
import top.wecoding.config.AsyncConfig;
import top.wecoding.mybatis.domain.BatchDemo;
import top.wecoding.mybatis.service.BatchDemoService;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * 使用多线程持久化的时候对事务进行控制
 * @author ffd
 */
@Service
@RequiredArgsConstructor
public class BatchInsertByThreadService {
    private final BatchDemoService batchDemoService;
    private final PlatformTransactionManager transactionManager;
    private List<TransactionStatus> transactionStatuses = Collections.synchronizedList(new ArrayList<>());

    /**
     * 事务失效,插入199条数据
     */
    @Transactional(rollbackFor = Exception.class)
    @SneakyThrows
    public void insert0(){
        ExecutorService executorService = SpringUtil.getBean(AsyncConfig.ExecutorType.WORK_EXECUTOR);
        long starTime = System.currentTimeMillis();
        for (int i = 0; i < 200; i++) {
            BatchDemo batchDemo = new BatchDemo();
            batchDemo.setId(i);
            batchDemo.setBatchName("name"+i);
            batchDemo.setBatchValue("value"+i);
            executorService.execute(()->{
                if (batchDemo.getId()==100){
                    throw new RuntimeException("dashdkjashdk");
                }
                batchDemoService.save(batchDemo);
            });
        }
        executorService.shutdown();
        executorService.awaitTermination(30, TimeUnit.MINUTES);
        System.out.println("耗时:" + String.valueOf(System.currentTimeMillis() - starTime));
    }

    /**
     * 使用事务集合进行控制事务,注意如果事务数组大于数据库连接池的最大连接数就会报错,因为这条事务是基于数据库连接的
     * 可以提前对保存数据分批处理在用多线程持久化
     * todo 暂未成功
     * @link {https://blog.csdn.net/qq273766764/article/details/119972911}
     */
    @Transactional(rollbackFor = Exception.class)
    @SneakyThrows
    public void insert1(){
        //线程数量
        Integer threadCount = Runtime.getRuntime().availableProcessors();
        ExecutorService executorService = Executors.newFixedThreadPool(threadCount);
        AtomicBoolean isError = new AtomicBoolean(false);
        boolean awaitTermination = false;
        try {
            for (int i = 0; i < 5; i++) {
                BatchDemo batchDemo = new BatchDemo();
                batchDemo.setId(i);
                batchDemo.setBatchName("name"+i);
                batchDemo.setBatchValue("value"+i);
                executorService.execute(()->{
                    try {
                        //创建一个新的事务状态
                        DefaultTransactionDefinition transactionDefinition = new DefaultTransactionDefinition();
                        transactionDefinition.setPropagationBehavior(Propagation.REQUIRED.value());
                        TransactionStatus status = transactionManager.getTransaction(transactionDefinition);
                        transactionStatuses.add(status);
//                        if (batchDemo.getId()==2){
//                            throw new RuntimeException("dashdkjashdk");
//                        }
                        batchDemoService.save(batchDemo);
                    } catch (Exception e) {
                        e.printStackTrace();
                        isError.set(true);
                    }
                });
            }
            executorService.shutdown();
            awaitTermination = executorService.awaitTermination(30, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            e.printStackTrace();
            isError.set(true);
        }
        if (!awaitTermination){
            isError.set(true);
        }
        if (ObjectUtil.isNotEmpty(transactionStatuses)){
            if (isError.get()){
                transactionStatuses.forEach(status->transactionManager.rollback(status));
            }else {
                transactionStatuses.forEach(status->transactionManager.commit(status));
            }
        }
    }
}

多线程的事务控制等因素较为繁琐,暂时只发现TransactionStatus集合控制的方式,但任然没有成功,并且这种实现感觉也很粗糙。

如有不一样的见解或好的实现方式,欢迎issue、pull request或评论。

gitee地址:https://gitee.com/tonyffd/batch_demo.git

github地址:https://github.com/Tony-ffd/batch_demo.git

2022/5/24 the end!🧸

  • 1
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值