Analysis Record of a Data Visibility Problem in a Distributed Cluster

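This article records in detail the analysis of a data visibility problem caused by global transaction handling in an AntDB 3.1 distributed cluster. In a specific scenario, a query returned a record in the "local committed but global uncommitted" state, and the foreign key check then triggered a core dump. The article examines the cause, namely incorrect transaction state resulting from global snapshot merging, and proposes a fix: adjust the snapshot merge strategy and update the query logic so that records not yet globally committed are never returned. It also covers how to construct the abnormal environment and how different operations (query, update, delete) are affected.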

1. Background

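While running benchmarksql stress tests against AntDB 3.1, we kept hitting a core dump. At first we assumed it was a trigger problem, and because we were not very familiar with the trigger logic, we prioritized fixing other issues. When we recently came back to this problem, the stack trace in the core file showed that the foreign key check was at fault.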

The problem can be reproduced with the steps below.

1-1. Cluster setup

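To simplify the test environment and rule out other interference, we set up an AntDB cluster with one coordinator and one datanode (1C1D).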

postgres=# table pgxc_node;
 node_name | node_type | node_port |   node_host   | nodeis_primary | nodeis_preferred |  node_id   
-----------+-----------+-----------+---------------+----------------+------------------+-----------
 cd2       | C         |     38000 | 169.254.3.127 | f              | f                |  455674804
 dn1       | D         |     39001 | 169.254.3.127 | f              | f                | -560021589
(2 rows)
1-2. Table creation SQL

Create the replicated tables test and testa, where test(sid) is a foreign key referencing the id column of testa.

drop table if exists testa, test;
create table testa(id serial primary key, sex varchar(2)) DISTRIBUTE BY REPLICATION;
insert into testa(sex) values('m');
create table test(id serial primary key, name varchar(20), sid integer references testa on delete cascade) DISTRIBUTE BY REPLICATION;
1-2. Test script
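#!/bin/bash
# Repeat one transaction many times against the coordinator (port 38000).
# Each transaction inserts into testa, updates the testa row with id = 1,
# then inserts a test row referencing it, which fires the foreign key check.

for((i=0;i<=1000;i++));
do
        psql -X -p 38000 -U postgres <<EOF
begin;
insert into testa(sex) values('m');
update testa set sex = 'f' where id = 1 returning *;
insert into test(name, sid) values('zl', 1) returning *;
end;
EOF
done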
1-3. Test method

Run two or more copies of the script concurrently in the background. After a short while (about 1 minute), a core file is produced on dn1.

[dev@CentOS-7 ~/workspace_adb31]$ sh test.sh >test.result 2>&1 &
[1] 27726
[dev@CentOS-7 ~/workspace_adb31]$ sh test.sh >test.result 2>&1 &
[2] 27748
[dev@CentOS-7 ~/workspace_adb31]$ sh test.sh >test.result 2>&1 &
[3] 29055
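
The manual launches above can also be scripted. The following is a minimal sketch, not part of the original test setup: run_concurrent and LOG_DIR are hypothetical names introduced here for illustration. It starts N background copies of a command, waits for all of them, and prints how many exited with a non-zero status. Unlike the session above, each copy writes to its own log file so that the outputs do not overwrite each other.

```shell
#!/bin/bash
# Sketch: start N background copies of a command and wait for all of them.
# run_concurrent and LOG_DIR are hypothetical names, not part of AntDB.
run_concurrent() {
    local n=$1; shift
    local log_dir=${LOG_DIR:-.}
    local pids=() failed=0 i pid
    for ((i = 1; i <= n; i++)); do
        # One log file per copy, so outputs do not interleave.
        "$@" >"$log_dir/test.result.$i" 2>&1 &
        pids+=("$!")
    done
    for pid in "${pids[@]}"; do
        # wait returns the exit status of the given background job.
        wait "$pid" || failed=$((failed + 1))
    done
    echo "$failed"   # number of copies that exited with a non-zero status
}
```

For example, `run_concurrent 3 bash test.sh` mirrors the three launches shown above. Afterwards, `ls /data/dev/adb31/dn/dn1/core.*` checks for a core file in the dn1 directory of this test environment.
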
1-4. Core file stack trace
[dev@CentOS-7 ~/workspace_adb31]$ gdb postgres /data/dev/adb31/dn/dn1/core.22968
(gdb) bt
#0  0x00007fe831fd71f7 in raise () from /lib64/libc.so.6
#1  0x00007fe831fd88e8 in abort () from /lib64/libc.so.6
#2  0x0000000000aa730d in ExceptionalCondition (conditionName=0xd41db0 "!(CritSectionCount > 0 || TransactionIdIsCurrentTransactionId(( (!((tup)->t_infomask & 0x0800) && ((tup)->t_infomask & 0x1000) && !((tup)->t_infomask & 0x0080)) ? HeapTupleGetUpdateXid(tup) : ( (tup)-"..., errorType=0xd41cc4 "FailedAssertion", fileName=0xd41c80 "/home/dev/workspace_adb31/adb_sql/src/backend/utils/time/combocid.c", lineNumber=132) at /home/dev/workspace_adb31/adb_sql/src/backend/utils/error/assert.c:54
#3  0x0000000000af1cad in HeapTupleHeaderGetCmax (tup=0x7fe819dfd360) at /home/dev/workspace_adb31/adb_sql/src/backend/utils/time/combocid.c:131
#4  0x00000000004de247 in heap_lock_tuple (relation=0x7fe833bf3aa0, tuple=0x7fff69b0dc10, cid=3, mode=LockTupleKeyShare, wait_policy=LockWaitBlock, follow_updates=1 '\001', buffer=0x7fff69b0dc58,  hufd=0x7fff69b0dc40) at /home/dev/workspace_adb31/adb_sql/src/backend/access/heap/heapam.c:5100
#5  0x00000000006ebe94 in ExecLockRows (node=0x1fe9950) at /home/dev/workspace_adb31/adb_sql/src/backend/executor/nodeLockRows.c:179
#6  0x00000000006c8577 in ExecProcNode (node=0x1fe9950) at /home/dev/workspace_adb31/adb_sql/src/backend/executor/execProcnode.c:585
#7  0x00000000006c4436 in ExecutePlan (estate=0x1fe9748, planstate=0x1fe9950, use_parallel_mode=0 '\000', operation=CMD_SELECT, sendTuples=1 '\001', numberTuples=1, direction=ForwardScanDirection,  dest=0x101de40 <spi_printtupDR>) at /home/dev/workspace_adb31/adb_sql/src/backend/executor/execMain.c:1624
#8  0x00000000006c2379 in standard_ExecutorRun (queryDesc=0x1fe7798, direction=ForwardScanDirection, count=1) at /home/dev/workspace_adb31/adb_sql/src/backend/executor/execMain.c:352
#9  0x00000000006c21e8 in ExecutorRun (queryDesc=0x1fe7798, direction=ForwardScanDirection, count=1) at /home/dev/workspace_adb31/adb_sql/src/backend/executor/execMain.c:297
#10 0x000000000070a5a5 in _SPI_pquery (queryDesc=0x1fe7798, fire_triggers=0 '\000', tcount=1) at /home/dev/workspace_adb31/adb_sql/src/backend/executor/sp
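
The failed assertion in frame #2 tests three t_infomask bits (0x0800, 0x1000, 0x0080) to decide how the updating transaction id is read from the tuple. As a reading aid, the sketch below decodes those bits for a given t_infomask value. The flag names come from upstream PostgreSQL's htup_details.h and are assumed to be unchanged in AntDB 3.1. The function name decode_xmax_bits is hypothetical. The gdb output truncates the second branch of the condition, so the "raw xmax" label follows upstream PostgreSQL rather than this core file.

```shell
#!/bin/bash
# Sketch: decode the t_infomask bits referenced by the failed assertion.
# Flag values follow upstream PostgreSQL's htup_details.h (assumed unchanged in AntDB).
HEAP_XMAX_LOCK_ONLY=$((0x0080))
HEAP_XMAX_INVALID=$((0x0800))
HEAP_XMAX_IS_MULTI=$((0x1000))

# decode_xmax_bits is a hypothetical helper name used only for illustration.
# Usage: decode_xmax_bits 0x1000
decode_xmax_bits() {
    local mask=$(( $1 ))
    local invalid=$(( (mask & HEAP_XMAX_INVALID) != 0 ))
    local multi=$(( (mask & HEAP_XMAX_IS_MULTI) != 0 ))
    local lock_only=$(( (mask & HEAP_XMAX_LOCK_ONLY) != 0 ))
    # Same condition as in the assertion text: xmax is valid, is a multixact,
    # and is not lock-only, so the updater xid is looked up in the multixact.
    if (( invalid == 0 && multi == 1 && lock_only == 0 )); then
        echo "HeapTupleGetUpdateXid"
    else
        # Per upstream PostgreSQL; this branch is cut off in the gdb output.
        echo "raw xmax"
    fi
}
```

Given the t_infomask of the tuple at tup=0x7fe819dfd360 (obtainable in gdb from the tuple header), the function reports which branch of the condition applies.
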