I. Overview
Under the REPEATABLE READ (RR) isolation level, when the transaction is opened outside the distributed lock, a thread that starts its transaction earlier may acquire the lock later under high concurrency. In that case the thread that acquires the lock first holds a larger transaction version number and the later lock holder a smaller one, so the later holder cannot see the data committed by the earlier holder and may overwrite it with its own update.
Figure 1
II. Test Case Verification
As the sequence diagram in Figure 2 shows, a test case that persists a loginName is used to reproduce the problem. The login_name column carries a unique constraint; the write path first checks that no duplicate loginName exists and only then inserts. If the distributed lock fails to serialize the writers, the database returns a unique-constraint violation.
Figure 2
1. Business flow:
1) Run findAll to actually open the transaction;
2) Sleep a random 1–5000 ms to simulate varying execution times under different data volumes;
3) Acquire the Redis distributed lock;
4) Check whether the same loginName already exists; if it does, throw an exception;
5) Otherwise insert the record.
2. Core sample code:
@Service
public class DistributedLockTransactionTestService implements Interface {

    private final Logger log = LoggerFactory.getLogger(this.getClass());

    static final String KEY_PRE = "demo_";
    static Random r = new Random();

    @Autowired
    RedisUtil redisUtil;
    @Autowired
    UserInfoRepository userInfoRepository;

    @Transactional(rollbackFor = Exception.class, isolation = Isolation.REPEATABLE_READ)
    public void saveDeviceWithDistributedLock(String loginName) {
        log.info("Transaction active: {}", TransactionSynchronizationManager.isActualTransactionActive());
        // Open the transaction by issuing a query
        userInfoRepository.findAll();
        // Random sleep to simulate varying execution times under different data volumes
        int i = r.nextInt(5000);
        log.info("Random sleep (ms): {}", i);
        try {
            Thread.sleep(i);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        try {
            System.out.println("Current thread: " + Thread.currentThread().getName());
            redisUtil.lock(KEY_PRE + loginName, 30000L, () -> saveNormal(loginName));
        } catch (DescribeException de) {
            log.info("Failed to create user: {}", de.getMessage());
        } catch (Exception e) {
            log.error("Operation failed:", e);
            throw new DescribeException(SYSTEM_ERROR);
        }
    }

    public UserInfo saveNormal(String loginName) {
        check(loginName);
        UserInfo userInfo = new UserInfo(loginName);
        return userInfoRepository.save(userInfo);
    }

    public void check(String loginName) {
        log.info("Transaction active in check method: {}", TransactionSynchronizationManager.isActualTransactionActive());
        Optional<UserInfo> optional = userInfoRepository.findByLoginName(loginName);
        if (optional.isPresent()) {
            throw new DescribeException(PARAM_ERROR, "User account already exists");
        }
    }
}
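The RedisUtil.lock(key, timeoutMillis, supplier) helper used above is project-specific and its implementation is not shown. As a rough, hypothetical sketch of the acquire-run-release pattern it presumably wraps — simulated here with an in-process map instead of Redis, so this is not a real distributed lock:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical stand-in for RedisUtil: same acquire/run/release shape,
// but backed by an in-process map instead of Redis SET NX PX.
public class FakeLockUtil {
    private final ConcurrentHashMap<String, Long> locks = new ConcurrentHashMap<>();

    public <T> T lock(String key, long timeoutMillis, Supplier<T> body) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        // Spin until the key is free (a real implementation would use
        // an atomic Redis SET with NX and PX options instead).
        while (locks.putIfAbsent(key, deadline) != null) {
            if (System.currentTimeMillis() > deadline) {
                throw new IllegalStateException("lock timeout: " + key);
            }
            Thread.onSpinWait();
        }
        try {
            return body.get();   // run the critical section while holding the lock
        } finally {
            locks.remove(key);   // always release, even on exception
        }
    }
}
```

The important property for this article is only the shape: the supplier runs strictly between acquire and release, so whatever transaction semantics the supplier has are what decide visibility.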
3. Results: after the service starts, two concurrent requests are enough to trigger the unique-key violation.
Hibernate: select userinfo0_.id as id1_0_, userinfo0_.create_time as create_t2_0_, userinfo0_.login_name as login_na3_0_, userinfo0_.update_time as update_t4_0_ from user_info userinfo0_
2022-09-02 15:07:05.057 INFO 10024 --- [nio-8006-exec-1] .d.DistributedLockTransactionTestService : Random sleep (ms): 3739
2022-09-02 15:07:05.413 INFO 10024 --- [nio-8006-exec-2] .d.DistributedLockTransactionTestService : Transaction active: true
Hibernate: select userinfo0_.id as id1_0_, userinfo0_.create_time as create_t2_0_, userinfo0_.login_name as login_na3_0_, userinfo0_.update_time as update_t4_0_ from user_info userinfo0_
2022-09-02 15:07:05.416 INFO 10024 --- [nio-8006-exec-2] .d.DistributedLockTransactionTestService : Random sleep (ms): 2067
Current thread: http-nio-8006-exec-2
2022-09-02 15:07:07.505 INFO 10024 --- [nio-8006-exec-2] .d.DistributedLockTransactionTestService : Transaction active in check method: true
Hibernate: select userinfo0_.id as id1_0_, userinfo0_.create_time as create_t2_0_, userinfo0_.login_name as login_na3_0_, userinfo0_.update_time as update_t4_0_ from user_info userinfo0_ where userinfo0_.login_name=?
Hibernate: select nextval ('hibernate_sequence')
Hibernate: insert into user_info (create_time, login_name, update_time, id) values (?, ?, ?, ?)
Current thread: http-nio-8006-exec-1
2022-09-02 15:07:08.797 INFO 10024 --- [nio-8006-exec-1] .d.DistributedLockTransactionTestService : Transaction active in check method: true
Hibernate: select userinfo0_.id as id1_0_, userinfo0_.create_time as create_t2_0_, userinfo0_.login_name as login_na3_0_, userinfo0_.update_time as update_t4_0_ from user_info userinfo0_ where userinfo0_.login_name=?
Hibernate: select nextval ('hibernate_sequence')
Hibernate: insert into user_info (create_time, login_name, update_time, id) values (?, ?, ?, ?)
2022-09-02 15:07:08.810 INFO 10024 --- [nio-8006-exec-1] o.h.e.j.b.internal.AbstractBatchImpl : HHH000010: On release of batch it still contained JDBC statements
2022-09-02 15:07:08.810 ERROR 10024 --- [nio-8006-exec-1] o.h.e.jdbc.batch.internal.BatchingBatch : HHH000315: Exception executing batch [java.sql.BatchUpdateException: Batch entry 0 insert into user_info (create_time, login_name, update_time, id) values ('2022-09-02 15:07:08.80088+08', '6666666666', '2022-09-02 15:07:08.80088+08', 620) was aborted: ERROR: duplicate key value violates unique constraint "user_info_login_name_idx"
Detail: Key (login_name)=(6666666666) already exists. Call getNextException to see other errors in the batch.], SQL: insert into user_info (create_time, login_name, update_time, id) values (?, ?, ?, ?)
2022-09-02 15:07:08.810 WARN 10024 --- [nio-8006-exec-1] o.h.engine.jdbc.spi.SqlExceptionHelper : SQL Error: 0, SQLState: 23505
2022-09-02 15:07:08.810 ERROR 10024 --- [nio-8006-exec-1] o.h.engine.jdbc.spi.SqlExceptionHelper : Batch entry 0 insert into user_info (create_time, login_name, update_time, id) values ('2022-09-02 15:07:08.80088+08', '6666666666', '2022-09-02 15:07:08.80088+08', 620) was aborted: ERROR: duplicate key value violates unique constraint "user_info_login_name_idx"
Detail: Key (login_name)=(6666666666) already exists. Call getNextException to see other errors in the batch.
2022-09-02 15:07:08.810 ERROR 10024 --- [nio-8006-exec-1] o.h.engine.jdbc.spi.SqlExceptionHelper : ERROR: duplicate key value violates unique constraint "user_info_login_name_idx"
Detail: Key (login_name)=(6666666666) already exists.
2022-09-02 15:07:08.813 ERROR 10024 --- [nio-8006-exec-1] o.h.i.ExceptionMapperStandardImpl : HHH000346: Error during managed flush [org.hibernate.exception.ConstraintViolationException: could not execute batch]
2022-09-02 15:07:08.821 ERROR 10024 --- [nio-8006-exec-1] o.a.c.c.C.[.[.[.[dispatcherServlet] : Servlet.service() for servlet [dispatcherServlet] in context with path [/demo] threw exception [Request processing failed; nested exception is org.springframework.dao.DataIntegrityViolationException: could not execute batch; SQL [insert into user_info (create_time, login_name, update_time, id) values (?, ?, ?, ?)]; constraint [user_info_login_name_idx]; nested exception is org.hibernate.exception.ConstraintViolationException: could not execute batch] with root cause
org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "user_info_login_name_idx"
Detail: Key (login_name)=(6666666666) already exists.
From the logs, the concurrent execution went as follows:
1) Thread 1 opened its transaction first, with a random sleep of 3739 ms;
2) Thread 2 opened its transaction later, with a random sleep of 2067 ms, so thread 2 acquired the lock first, performed the insert, and released the lock;
3) Thread 1 then acquired the lock, could not see the row already committed by thread 2, performed the insert as well, and the database returned a unique-constraint violation.
III. Solution 1
In general, the recommended fix is to declare the transaction inside the lock, as shown in Figure 3 below.
Figure 3
This is not always the best option, however: if the method inside the lock has many callers, moving the lock upward may be too costly to refactor; or widening the lock scope may hurt service performance. In either case this approach is a poor fit.
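Concretely, declaring the transaction inside the lock means opening it only within the lock callback, for example with a programmatic TransactionTemplate boundary. The sketch below reuses RedisUtil, UserInfoRepository, and DescribeException from the test case above; the TransactionTemplate wiring and class name are assumptions for illustration:

```java
// Sketch: acquire the distributed lock first, then open the transaction
// inside it, so the MVCC snapshot is taken after any earlier lock holder
// has already committed.
@Service
public class LockOutsideTxService {

    @Autowired
    RedisUtil redisUtil;                      // same helper as in the test case
    @Autowired
    TransactionTemplate transactionTemplate;  // programmatic transaction boundary
    @Autowired
    UserInfoRepository userInfoRepository;

    public UserInfo saveWithLockOutsideTx(String loginName) {
        // No transaction is active yet when the lock is taken.
        return redisUtil.lock("demo_" + loginName, 30000L, () ->
            transactionTemplate.execute(status -> {
                // The transaction (and its snapshot) starts here, inside the
                // lock, so the duplicate check sees rows committed by any
                // earlier lock holder.
                userInfoRepository.findByLoginName(loginName).ifPresent(u -> {
                    throw new DescribeException(PARAM_ERROR, "User account already exists");
                });
                return userInfoRepository.save(new UserInfo(loginName));
            }));
    }
}
```

Any runtime exception thrown inside the callback rolls the inner transaction back before the lock is released, so the check-then-insert pair stays atomic with respect to other lock holders.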
IV. Solution 2
When moving the lock up does not suit the business scenario, the problem can also be solved with the REQUIRES_NEW transaction propagation mode.
Figure 4
1. Business flow:
1) Thread 1 executes method A and gets transaction version N;
2) Thread 2 executes method A and gets transaction version N+1 (the +1 merely indicates a larger version number);
3) Thread 2 acquires the lock, executes method B in a new transaction with version M, commits that transaction when B completes, and releases the lock;
4) Thread 2 continues executing method A;
5) Thread 1 acquires the lock, executes method B in a new transaction with version M+1, commits that transaction when B completes, and releases the lock;
6) Thread 1 continues executing method A.
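In Spring terms, the flow above maps to method A keeping its outer transaction while method B is annotated with REQUIRES_NEW, so B's snapshot is taken inside the lock and B commits before the lock is released. A sketch reusing the names from the test case; the self-injection trick (so the call to methodB goes through the transactional proxy rather than plain this) and the class name are assumptions:

```java
@Service
public class RequiresNewTxService {

    @Autowired
    RedisUtil redisUtil;
    @Autowired
    UserInfoRepository userInfoRepository;
    @Autowired
    RequiresNewTxService self; // self-injection so REQUIRES_NEW is applied via the proxy

    // Method A: outer transaction, opened before the lock is acquired.
    @Transactional(rollbackFor = Exception.class, isolation = Isolation.REPEATABLE_READ)
    public void methodA(String loginName) {
        userInfoRepository.findAll(); // outer snapshot is taken here
        redisUtil.lock("demo_" + loginName, 30000L, () -> self.methodB(loginName));
        // ... any remaining business of method A runs in the outer transaction
    }

    // Method B: suspends the outer transaction and runs in a brand-new one,
    // whose snapshot is taken inside the lock and which commits on return.
    @Transactional(rollbackFor = Exception.class, propagation = Propagation.REQUIRES_NEW)
    public UserInfo methodB(String loginName) {
        userInfoRepository.findByLoginName(loginName).ifPresent(u -> {
            throw new DescribeException(PARAM_ERROR, "User account already exists");
        });
        return userInfoRepository.save(new UserInfo(loginName));
    }
}
```

Note that a direct this.methodB() call from within methodA would bypass the Spring proxy and silently ignore REQUIRES_NEW; going through an injected self-reference (or a separate bean) is what makes the new transaction actually start.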
2. Drawbacks and workarounds:
1) This approach requires the business inside the lock to be self-contained, so that its data consistency can be guaranteed on its own. In addition, when the outer transaction still has business code to run after the lock, you must plan how to roll back the inner business (already committed) if the outer transaction later rolls back.
2) The spring-tx module provides the transaction synchronization manager TransactionSynchronizationManager. When invoking the locked method you can register a synchronization callback on the outer transaction; once the outer transaction completes, the callback runs, and a manual compensating rollback can be performed there.
// Register a callback on the transaction active on the current thread (the outer one)
TransactionSynchronizationManager.registerSynchronization(new TransactionSynchronization() {
    @Override
    public void afterCommit() {
        // Runs only if the outer transaction committed successfully
    }

    @Override
    public void afterCompletion(int status) {
        if (status == TransactionSynchronization.STATUS_ROLLED_BACK) {
            // Outer transaction rolled back: compensate here for the inner
            // REQUIRES_NEW transaction that has already committed
        }
    }
});