坑 1:查询没走索引导致的 MySql lock wait timeout exceeded
需求:修复表数据,修改会员订单表 p_member_order 的 pay_time,同步修改会员子订单表 p_member_order_item 的 pay_time,这两个操作需要加事务。
p_member_order 的数量量为 1883 1645,p_member_order_item 的数据量为 3252 7087
实现类:
@Component
@Scope("prototype")
public class FixOrderPayTimeToolsAction extends AdminAction {
private static Logger logger = LoggerFactory.getLogger(FixOrderPayTimeToolsAction.class);
/**
* 0 不更改数据,1更改数据
*/
private int update = 0;
private Long minId;
private Long maxId;
private Integer maxPoolSize = 3;
@Autowired
private IMemberOrderService memberOrderService;
@Autowired
private ILrOnlineOrderDao lrOnlineOrderDao;
@Autowired
private ILrOnlineColdOrderDao lrOnlineColdOrderDao;
private ThreadPoolExecutor executorService = new ThreadPoolExecutor(
1, maxPoolSize, 0, TimeUnit.MILLISECONDS,
new ArrayBlockingQueue<>(2048), new ThreadPoolExecutor.CallerRunsPolicy());
@Override
public String execute() {
logger.info("修复 - 开始 {}", update == 1 ? "修复数据":"不修复");
if (minId == null) {
minId = 0L;
}
if (maxId == null) {
maxId = 19205834L;
}
long startId = minId;
while (true){
logger.info("startId:{}", startId);
// 串行获取
List<MemberOrderBean> list = Collections.synchronizedList(memberOrderService.getListByIdFlow(startId, maxId, 5000));
if (CollectionUtils.isEmpty(list)) {
logger.info("没有数据了,修复完成");
break;
}
startId = list.get(list.size() - 1).getId();
logger.info("startId=" + startId + ", endId=" + maxId + "is running [ pool size = " + executorService.getPoolSize() +
", active threads = " + executorService.getActiveCount() +
", queued tasks = " + executorService.getQueue().size() +
", completed tasks = " + executorService.getCompletedTaskCount() +
"]");
// 并行处理
for (int i = 0; i < list.size(); i++) {
int finalI = i;
MemberOrderBean orderBean = list.get(finalI);
executorService.execute(()-> {
try{
fixData(orderBean);
}catch (Exception e){
logger.info("fix data fail" + e.getMessage());
}
});
}
}
logger.info("修复 - 结束");
return SUCCESS;
}
private void fixData(MemberOrderBean orderBean) {
LrOrderBean successMemberOrder;
// 从冷库获取
successMemberOrder = LrOnlineColdOrderDao.getSuccessMemberOrder(orderBean.getUserId(), orderBean.getOrderId());
if (successMemberOrder == null) {
// 从热库获取
successMemberOrder = LrOnlineOrderDao.getSuccessMemberOrder(orderBean.getUserId(), orderBean.getOrderId());
}
if (successMemberOrder != null) {
if (!orderBean.getPayTime().equals(successMemberOrder.getNotifyTime())) {
logger.info("memberOrder id {} orderId {} userId{} 当前 payTime:{} 修改为:notifyTime:{}"
, orderBean.getId(), orderBean.getOrderId(), orderBean.getUserId(), DateUtils.getDayStr(orderBean.getPayTime(), DateUtils.DATETIME)
, DateUtils.getDayStr(successMemberOrder.getNotifyTime(), DateUtils.DATETIME));
if (update == 1) {
MemberOrderBean updateMemberOrderBean = new MemberOrderBean();
updateMemberOrderBean.setId(orderBean.getId());
updateMemberOrderBean.setUserId(orderBean.getUserId());
updateMemberOrderBean.setPayTime(successMemberOrder.getNotifyTime());
memberOrderService.updateOrderAndItem(updateMemberOrderBean);
}
}
}
}
}
@Service
public class MemberOrderServiceImpl implements IMemberOrderService {
@Override
@Transactional(value = DataSourceConsts.AUDIOBOOK_TX_MANAGER_NAME, rollbackFor = Throwable.class)
public int updateOrderAndItem(MemberOrderBean updateBean) {
// 修改会员订单表
memberOrderManager.updatePayTimeById(updateBean);
// 修改会员子订单表
MemberOrderItemBean updateItemBean = new MemberOrderItemBean();
updateItemBean.setOrderId(updateBean.getId());
updateItemBean.setPayTime(updateBean.getPayTime());
updateItemBean.setUserId(updateBean.getUserId());
memberOrderItemManager.updatePayTimeByOrderId(updateItemBean);
return 0;
}
}
修改 sql:
update p_member_order set pay_time=#{payTime} where id=#{id}
update p_member_order_item set update_time=now(), pay_time=#{payTime} where order_id = #{orderId}
开跑,然后报错
Lock wait timeout exceeded
:并发环境下,后提交的事务等待前面处理的事务释放锁,但是在等待的时候超过了mysql的锁等待时间,就会引发这个异常
通过 arthas 工具定位,发现慢在了更改方法 updateOrderAndItem () ,等待了 55 s
检查 sql,发现 sql 没有走索引字段 user_id,导致 update 订单子表的时间过长,阻塞了后面的事务
// 原来的 sql
update p_member_order_item set update_time=now(), pay_time=#{payTime} where order_id = #{orderId}
// 优化后的 sql
update p_member_order_item set update_time=now(), pay_time=#{payTime} where
user_id = #{userId}
order_id = #{orderId}
如果在事务中执行了一条没有索引条件的查询,引发全表扫描,把行级锁上升为全表记录锁定(等价于表级锁),多个这样的事务执行后,就很容易产生死锁和阻塞,最终应用系统会越来越慢,发
反思:操作数据量大的表时,先看该表哪些字段设置了索引,每条 sql 尽量保证索引字段值不为空,当然,纯靠自觉是很虚的,还有什么手段可以保证?
坑 2:查询没命中索引导致 CPU 使用率飙升,被老大拉去教育
一张新上线的表,起始数据为 0,往后每日递增 60 万左右,定时任务每日定时查询一次该表,里面涉及按用户段并发查询,表数据量为 400万左右的时候,DBA 在群里发出警告
吓得我赶紧排查,发现查询 sql 没有命中索引字段 int_date,瞬间更迷惑了,因为 DAO 层每个方法都传了 int_date 参数值,再去查看 xml ,发现
这么明显的 bug 测试环境居然没被发现!!!因为是新表,而测试环境造的数据比较少,导致 bug 躲过一劫。