Slow Batch Inserts in Spring Data JPA and How to Optimize Them: Custom Repository


Lazyafei


Category: JPA

Article link: https://blog.csdn.net/tfstone/article/details/113741890


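Not long ago, while testing the feature that adds an application to an organization (similar to a mini program: every user under the selected organization gets permission to use the added application), I accidentally selected the root node. The insert stayed pending, and tracing on the backend showed that the insert ran for about 270 s for 26,510 rows (roughly 26,000). A look at the code showed that it saved the entities one at a time inside a single for loop.

That is unreasonably slow, so I set out to optimize it.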

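Optimization 1: Consider JPA batch inserts

I am not very familiar with JPA and suspected that batch inserts would need extra configuration. After a good deal of searching, it turned out that the following settings are needed:

1. Add to application.properties (convert to YAML yourself if needed)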

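# Batch size
spring.jpa.properties.hibernate.jdbc.batch_size=500
# Tells Hibernate that the JDBC driver returns correct affected-row counts for batched updates (needed for version checks)
spring.jpa.properties.hibernate.jdbc.batch_versioned_data=true
# Order inserts by entity type so that they can be grouped into batches
spring.jpa.properties.hibernate.order_inserts=true
# Order updates by entity type so that they can be grouped into batches
spring.jpa.properties.hibernate.order_updates=true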

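2. Add to the database JDBC URL (this is the Chinese-made ShenTong database; for other databases, adapt accordingly. rewriteBatchedStatements originates from MySQL Connector/J, so check that your driver honors it)

#rewriteBatchedStatements=TRUE

oa-server.datasource.url=jdbc:oscar://x.x.x.x:2003/OSRDB?useSSL=false&rewriteBatchedStatements=TRUE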

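As a first step, the code was changed to call the repository's batch save.

Looking at the source of the batch save:

Stepping into it shows that it simply calls the single-entity save method for each element:

The single save is slow mainly because of the isNew check. Several classes implement this method, and every insert first determines whether the entity is new or an update (by checking whether the id is null, or by checking the version attribute), which wastes a lot of time. When isNew returns false, for example because the id was assigned before saving, save falls back to merge, which typically loads the row before writing it.

JPA's support for batch inserts leaves a lot to be desired. To get around this check, the save method has to be overridden.

AbstractEntityInformation.isNew (checks whether the id is null)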

public boolean isNew(T entity) {
    ID id = this.getId(entity);
    Class<ID> idType = this.getIdType();
    if (!idType.isPrimitive()) {
        return id == null;
    } else if (id instanceof Number) {
        return ((Number) id).longValue() == 0L;
    } else {
        throw new IllegalArgumentException(String.format("Unsupported primitive id type %s!", idType));
    }
}

JpaMetamodelEntityInformation.isNew (checks whether the version attribute is null)

/*
 * (non-Javadoc)
 * @see org.springframework.data.repository.core.support.AbstractEntityInformation#isNew(java.lang.Object)
 */
@Override
public boolean isNew(T entity) {

    if (versionAttribute == null || versionAttribute.getJavaType().isPrimitive()) {
        return super.isNew(entity);
    }

    BeanWrapper wrapper = new DirectFieldAccessFallbackBeanWrapper(entity);
    Object versionValue = wrapper.getPropertyValue(versionAttribute.getName());

    return versionValue == null;
}
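To make the effect of the check concrete: SimpleJpaRepository.save uses isNew to choose between EntityManager.persist and EntityManager.merge. The sketch below mimics only that branch in plain Java. `SaveDecisionDemo`, `User` and `FakeEntityManager` are hypothetical stand-ins written for this illustration, not Spring or JPA classes.

```java
import java.util.ArrayList;
import java.util.List;

public class SaveDecisionDemo {

    /** Hypothetical entity with a nullable wrapper id, like a typical JPA entity. */
    static class User {
        Integer id;
        User(Integer id) { this.id = id; }
    }

    /** Hypothetical stand-in for EntityManager that only records which call was made. */
    static class FakeEntityManager {
        final List<String> calls = new ArrayList<>();
        void persist(User u) { calls.add("persist"); }
        User merge(User u) { calls.add("merge"); return u; }
    }

    /** Same rule as AbstractEntityInformation.isNew for a non-primitive id type. */
    static boolean isNew(User u) {
        return u.id == null;
    }

    /** Mirrors the persist-or-merge branch that save() takes for each entity. */
    static User save(FakeEntityManager em, User u) {
        if (isNew(u)) {
            em.persist(u);
            return u;
        }
        return em.merge(u);
    }

    public static void main(String[] args) {
        FakeEntityManager em = new FakeEntityManager();
        save(em, new User(null)); // id not set: treated as new
        save(em, new User(42));   // id pre-assigned: treated as existing
        System.out.println(em.calls); // prints [persist, merge]
    }
}
```

With a real EntityManager, the merge branch is the expensive one, because the provider usually has to look the row up before deciding between INSERT and UPDATE. Calling persist directly, as the custom batchSave in this article does, avoids that branch altogether.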

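Optimization 2: Custom repository with an overridden save method to bypass the isNew check. Inserting the roughly 26,000 rows now takes 68 s

Official documentation on custom repositories: Custom Implementations for Spring Data Repositories

Customizing the base repository: Customize the Base Repository

Example 39. Custom repository base class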

class MyRepositoryImpl<T, ID>
  extends SimpleJpaRepository<T, ID> {

  private final EntityManager entityManager;

  MyRepositoryImpl(JpaEntityInformation<T, ?> entityInformation,
                   EntityManager entityManager) {
    super(entityInformation, entityManager);

    // Keep the EntityManager around to be used from the newly introduced methods.
    this.entityManager = entityManager;
  }

  @Override
  @Transactional
  public <S extends T> S save(S entity) {
    // implementation goes here; delegating to the default keeps the example compilable
    return super.save(entity);
  }
}

Here I add a new batchSave method to replace the earlier batch save:

1. Add the interface ApplicationCompanyOrganizationUserBatchSaveRepository, which extends the JpaRepository and JpaSpecificationExecutor interfaces

@Repository
public interface ApplicationCompanyOrganizationUserBatchSaveRepository
        extends JpaRepository<ApplicationCompanyOrganizationUser, Integer>,
                JpaSpecificationExecutor<ApplicationCompanyOrganizationUser> {

    @Transactional
    List<ApplicationCompanyOrganizationUser> batchSave(Iterable<ApplicationCompanyOrganizationUser> entities);

}

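2. Add an implementation class that implements the batchSave method (note: this implementation class must be registered in the startup class)

The new implementation class: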
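The note about the startup class most likely refers to the `repositoryBaseClass` attribute of `@EnableJpaRepositories`, which is how the Spring Data reference registers a custom base class. The original startup class is not shown, so the following is only a sketch: the class name `OaApplication` and the `basePackages` value are assumptions, and only the `repositoryBaseClass` attribute is the point.

```java
import com.easemob.oa.persistence.jpa.impl.ApplicationCompanyOrganizationRepositoryImpl;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.data.jpa.repository.config.EnableJpaRepositories;

// Hypothetical startup class; the real one is not shown in this article.
@SpringBootApplication
@EnableJpaRepositories(
        basePackages = "com.easemob.oa.persistence.jpa", // assumed repository package
        repositoryBaseClass = ApplicationCompanyOrganizationRepositoryImpl.class)
public class OaApplication {

    public static void main(String[] args) {
        SpringApplication.run(OaApplication.class, args);
    }
}
```

Keep in mind that `repositoryBaseClass` applies to every repository found under the scanned packages, not just to the one that declares batchSave.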

package com.easemob.oa.persistence.jpa.impl;

import com.easemob.oa.models.entity.ApplicationCompanyOrganizationUser;
import com.easemob.oa.persistence.jpa.ApplicationCompanyOrganizationUserBatchSaveRepository;
import org.springframework.data.jpa.repository.support.JpaEntityInformation;
import org.springframework.data.jpa.repository.support.SimpleJpaRepository;
import org.springframework.data.repository.NoRepositoryBean;

import javax.persistence.EntityManager;
import javax.transaction.Transactional;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

/**
 * @Author turnflys
 * @Date 1/12/21 10:51 PM
 */
@NoRepositoryBean
public class ApplicationCompanyOrganizationRepositoryImpl<T, ID extends Serializable>
        extends SimpleJpaRepository<ApplicationCompanyOrganizationUser, Integer>
        implements ApplicationCompanyOrganizationUserBatchSaveRepository {

    // Flush and clear the persistence context after this many persisted entities
    private static final int FLUSH_SIZE = 1000;

    // Persistence context
    private final EntityManager em;
    private final JpaEntityInformation<T, ?> entityInformation;

    @SuppressWarnings("unchecked")
    public ApplicationCompanyOrganizationRepositoryImpl(JpaEntityInformation<T, ?> entityInformation, EntityManager entityManager) {
        super((JpaEntityInformation<ApplicationCompanyOrganizationUser, ?>) entityInformation, entityManager);
        this.entityInformation = entityInformation;
        this.em = entityManager;
    }

    @Override
    @Transactional
    public List<ApplicationCompanyOrganizationUser> batchSave(Iterable<ApplicationCompanyOrganizationUser> entities) {
        // Collect while iterating so that a one-shot Iterable is only traversed once
        List<ApplicationCompanyOrganizationUser> saved = new ArrayList<>();
        int index = 0;
        for (ApplicationCompanyOrganizationUser entity : entities) {
            // persist() bypasses the isNew check and the merge path used by save()
            em.persist(entity);
            saved.add(entity);
            index++;
            if (index % FLUSH_SIZE == 0) {
                em.flush();
                em.clear();
            }
        }
        // Flush the final partial batch, if any
        if (index % FLUSH_SIZE != 0) {
            em.flush();
            em.clear();
        }
        // After clear() the returned entities are detached from the persistence context
        return saved;
    }

}
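To see how often this loop flushes, the flush-and-clear pattern can be exercised without a database. The sketch below is a stand-in under stated assumptions: `FlushPatternDemo` and `CountingContext` are hypothetical names, `CountingContext` replaces the EntityManager and only counts calls, and the flush interval is passed in as the parameter `batchSize` instead of being fixed at 1000.

```java
import java.util.ArrayList;
import java.util.List;

public class FlushPatternDemo {

    /** Hypothetical stand-in for EntityManager: counts calls instead of touching a database. */
    static class CountingContext {
        int pending;    // entities persisted since the last clear
        int flushes;    // number of flush() calls
        int maxPending; // largest number of entities held at once

        void persist(Object entity) {
            pending++;
            maxPending = Math.max(maxPending, pending);
        }
        void flush() { flushes++; }
        void clear() { pending = 0; }
    }

    /** Persists every entity, flushing and clearing after each full batch and once for the remainder. */
    static <T> List<T> batchSave(CountingContext ctx, Iterable<T> entities, int batchSize) {
        List<T> saved = new ArrayList<>();
        int index = 0;
        for (T entity : entities) {
            ctx.persist(entity);
            saved.add(entity);
            if (++index % batchSize == 0) {
                ctx.flush();
                ctx.clear();
            }
        }
        if (index % batchSize != 0) {
            ctx.flush();
            ctx.clear();
        }
        return saved;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 26510; i++) {
            rows.add(i);
        }
        CountingContext ctx = new CountingContext();
        batchSave(ctx, rows, 1000);
        // prints "27 flushes, at most 1000 entities held"
        System.out.println(ctx.flushes + " flushes, at most " + ctx.maxPending + " entities held");
    }
}
```

For the 26,510 rows from this article, that is 26 full flushes plus one for the remaining 510 rows, and the persistence context never holds more than 1000 entities. How many JDBC batches each flush turns into depends on hibernate.jdbc.batch_size and on whether the provider can batch the inserts at all; Hibernate, for example, does not batch inserts for IDENTITY-generated ids.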

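3. In the main service implementation class, switch to the new save method of the new repository

Optimization 3: Introduce multi-threading

See my other article: Slow Batch Inserts in JPA and How to Optimize Them: Extracting a Generic Shared batchSave Method and Introducing Multi-threading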
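The linked article is not reproduced here. As a rough illustration of the general idea only, and not the author's implementation, the sketch below splits the rows into chunks and hands each chunk to a thread pool. `ParallelBatchDemo`, `partition`, `saveInParallel` and the `saver` callback are hypothetical names; in a real application the callback would call something like batchSave, and each chunk would run in its own transaction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

public class ParallelBatchDemo {

    /** Splits the list into consecutive chunks of at most chunkSize elements. */
    static <T> List<List<T>> partition(List<T> items, int chunkSize) {
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < items.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, items.size());
            chunks.add(new ArrayList<>(items.subList(from, to)));
        }
        return chunks;
    }

    /** Hands each chunk to the saver on a fixed-size pool and waits for all of them. */
    static <T> void saveInParallel(List<T> items, int chunkSize, int threads,
                                   Consumer<List<T>> saver) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (List<T> chunk : partition(items, chunkSize)) {
                futures.add(pool.submit(() -> saver.accept(chunk)));
            }
            for (Future<?> future : futures) {
                future.get(); // rethrows a worker failure as ExecutionException
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 26510; i++) {
            rows.add(i);
        }
        AtomicInteger saved = new AtomicInteger();
        // Stand-in saver that only counts rows; a real one would write to the database.
        saveInParallel(rows, 5000, 4, chunk -> saved.addAndGet(chunk.size()));
        System.out.println(saved.get()); // prints 26510
    }
}
```

One consequence of this design is worth noting: Spring transactions are bound to the calling thread, so chunks saved on different threads commit independently, and a failure in one chunk does not roll back the others.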
