一、搭建NFS实验环境:
以centos8系统为例:
Linux 4.19.90-23.8.v2101.ky10.x86_64 #1 SMP Mon May 17 17:08:34 CST 2021 x86_64 x86_64 x86_64 GNU/Linux
[test@localhost share]$ nfsstat -s 10.10.10.123
Server rpc stats:
calls badcalls badfmt badauth badclnt
22440720 0 0 0 0
Server nfs v4:
null compound
3 0% 22440719 99%
Server nfs v4 operations:
[root@localhost wal]# yum list | grep sqlite
sqlite.x86_64 3.24.0-9.ky10 @anaconda
sqlite.x86_64 3.32.3-7.ky10 @System
sqlite-devel.x86_64 3.24.0-9.ky10 @anaconda
freeradius-sqlite.x86_64 3.0.15-28.p01.ky10 ks10-adv-updates
rubygem-sqlite3.x86_64 1.4.2-2.ky10 ks10-adv-updates
rubygem-sqlite3-doc.noarch 1.4.2-2.ky10 ks10-adv-updates
sqlite-devel.x86_64 3.32.3-7.ky10 ks10-adv-updates
sqlite-help.noarch 3.32.3-7.ky10 ks10-adv-updates
[root@localhost wal]# python3 --version
Python 3.7.9
nfs服务端:
yum install -y nfs-utils rpcbind
mkdir -p /usr/share/nfs/share
echo "/usr/share/nfs/share 10.10.10.123/24(rw,sync,no_root_squash)" >> /etc/exports
exportfs -a
systemctl enable rpcbind nfs-server
systemctl start rpcbind nfs-server
exportfs -v
firewall-cmd --permanent --zone=public --add-service=nfs --add-service=mountd --add-service=rpc-bind --add-service=nscd --add-service=nfs-lock --add-service=nfs-idmap
firewall-cmd --reload
nfs客户端:
yum install -y nfs-utils
mkdir -p /mnt/nfs_share
mount -t nfs 10.10.10.123:/usr/share/nfs/share /mnt/nfs_share
二、基于NFS压测SQLite3的WAL模式:
压测脚本 wal_performance_test.py:
import time
import sqlite3
import subprocess
import re
from datetime import datetime
def wal_performance_test():
# WAL模式
dev, ip = get_default_interface_ip()
start = time.perf_counter()
for i in range(100):
try:
conn = sqlite3.connect('/mnt/nfs_share/wal_test.db')
# 设置WAL模式必须将journal_mode设为WAL
conn.execute("PRAGMA journal_mode=WAL;")
# 调整同步策略为NORMAL提升性能
conn.execute("PRAGMA synchronous=NORMAL;")
# 设置WAL文件自动清理的检查点阈值
conn.execute("PRAGMA wal_autocheckpoint=1000;")
conn.execute('CREATE TABLE IF NOT EXISTS data (id INT, host VARCHAR(15))')
conn.execute("BEGIN TRANSACTION")
for ii in range(100):
conn.execute("INSERT INTO data VALUES (?,?)", (i*100+ii,ip,))
conn.commit()
conn.close()
time.sleep(1)
except Exception as e:
now = datetime.now()
print(now, e)
time.sleep(1)
wal_time = time.perf_counter() - start
print(f"WAL模式: {wal_time:.6f}s")
def sqlite_performance_test():
# 常规模式
dev, ip = get_default_interface_ip()
start = time.perf_counter()
for i in range(100):
try:
conn = sqlite3.connect('/mnt/nfs_share/wal_test.db')
conn.execute('CREATE TABLE IF NOT EXISTS data (id INT, host VARCHAR(15))')
conn.execute("BEGIN TRANSACTION")
for ii in range(100):
conn.execute("INSERT INTO data VALUES (?,?)", (i*100+ii,ip,))
conn.commit()
conn.close()
time.sleep(1)
except Exception as e:
now = datetime.now()
print(now, e)
time.sleep(1)
normal_time = time.perf_counter() - start
print(f"常规模式: {normal_time:.6f}s")
def get_default_interface_ip():
"""通过路由表获取默认网卡和IP"""
try:
# 获取默认路由接口 - 使用 stdout=subprocess.PIPE
result = subprocess.run(
['ip', 'route', 'show', 'default'],
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
universal_newlines=True,
check=True
)
# 解析默认路由行
match = re.search(r'dev\s+(\w+)', result.stdout)
if match:
interface = match.group(1)
# 获取该接口的IP地址
ip_result = subprocess.run(
['ip', '-4', 'addr', 'show', interface],
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
universal_newlines=True,
check=True
)
# 解析IP地址
ip_match = re.search(r'inet\s+(\d+\.\d+\.\d+\.\d+)', ip_result.stdout)
if ip_match:
return interface, ip_match.group(1)
except subprocess.CalledProcessError as e:
print(f"命令执行失败: {e}")
print(f"错误输出: {e.stderr}")
except Exception as e:
print(f"错误: {e}")
return None, None
wal_performance_test()
分别在客户机124和125执行:
python3 wal_performance_test.py
两个客户端交替出现错误 disk I/O error,总计出现6条:
[test@124 wal]$ python3 wal_performance_test.py
2025-12-09 14:33:21.776389 disk I/O error
2025-12-09 14:33:47.146377 disk I/O error
2025-12-09 14:34:39.921037 disk I/O error
WAL模式: 105.810731s
[test@125 wal]$ python3 wal_performance_test.py
2025-12-09 14:33:20.745065 disk I/O error
2025-12-09 14:34:13.513216 disk I/O error
2025-12-09 14:34:38.881051 disk I/O error
WAL模式: 101.554718s
三、验证数据:
在123机器执行:
yum install -y sqlite
sqlite3 wal_test.db
sqlite> select count(*) from data;
19400
sqlite> select count(*) from data where host='10.10.10.124';
9700
sqlite> select count(*) from data where host='10.10.10.125';
9700
总的执行数据量2*100*100=20000,减去6次失败 20000-600=19400,数据量正好。
也就是不会导致数据写乱,但是加锁失败会报错,需要客户端自己重试,如果不重试数据会丢失。
四、基于NFS压测SQLite的三种事务类型:
压测脚本 nfs_transaction_test.py:
import time
import sqlite3
import sys
from datetime import datetime
def nfs_deferred_transaction_test():
start = time.perf_counter()
ok_counter, fail_counter = 0, 0
for i in range(10000):
try:
conn = sqlite3.connect('/mnt/nfs_share/nfs_test.db')
conn.execute("BEGIN TRANSACTION")
row = conn.execute('SELECT * FROM data LIMIT 1').fetchone()
counter = row[0] + 1
conn.execute("UPDATE data SET counter=?", (counter,))
time.sleep(0.01)
conn.commit()
conn.close()
ok_counter+=1
except Exception as e:
now = datetime.now()
print(now, e)
time.sleep(0.01)
fail_counter+=1
cost_time = time.perf_counter() - start
print(f"deferred: cost_time:{cost_time:.6f}s ok_counter:{ok_counter} fail_counter:{fail_counter}")
def nfs_immediate_transaction_test():
start = time.perf_counter()
ok_counter, fail_counter = 0, 0
for i in range(10000):
try:
conn = sqlite3.connect('/mnt/nfs_share/nfs_test.db')
conn.execute("BEGIN IMMEDIATE TRANSACTION")
row = conn.execute('SELECT * FROM data LIMIT 1').fetchone()
counter = row[0] + 1
conn.execute("UPDATE data SET counter=?", (counter,))
time.sleep(0.01)
conn.commit()
conn.close()
ok_counter+=1
except Exception as e:
now = datetime.now()
print(now, e)
time.sleep(0.01)
fail_counter+=1
cost_time = time.perf_counter() - start
print(f"immediate: cost_time:{cost_time:.6f}s ok_counter:{ok_counter} fail_counter:{fail_counter}")
def nfs_exclusive_transaction_test():
start = time.perf_counter()
ok_counter, fail_counter = 0, 0
for i in range(10000):
try:
conn = sqlite3.connect('/mnt/nfs_share/nfs_test.db')
conn.execute("BEGIN EXCLUSIVE TRANSACTION")
row = conn.execute('SELECT * FROM data LIMIT 1').fetchone()
counter = row[0] + 1
conn.execute("UPDATE data SET counter=?", (counter,))
time.sleep(0.01)
conn.commit()
conn.close()
ok_counter+=1
except Exception as e:
now = datetime.now()
print(now, e)
time.sleep(0.01)
fail_counter+=1
cost_time = time.perf_counter() - start
print(f"exclusive: cost_time:{cost_time:.6f}s ok_counter:{ok_counter} fail_counter:{fail_counter}")
def main():
conn = sqlite3.connect('/mnt/nfs_share/nfs_test.db')
conn.execute('CREATE TABLE IF NOT EXISTS data (counter INT)')
row = conn.execute('SELECT * FROM data LIMIT 1').fetchone()
if row is None:
conn.execute("INSERT INTO data (counter) VALUES (?)", (0,))
conn.commit()
conn.close()
command = sys.argv[1] if len(sys.argv)>1 else ""
if command == "" or command == "1":
nfs_deferred_transaction_test()
elif command == "2":
nfs_immediate_transaction_test()
elif command == "3":
nfs_exclusive_transaction_test()
else:
print("错误:参数不足或命令无效")
if __name__ == "__main__":
main()
1、deferred事务:
两台客户机都会出现大量的报错:
[test@124 wal]$ python3 nfs_transaction_test.py 1
2025-12-11 18:17:21.147881 database is locked
deferred: cost_time:247.461341s ok_counter:9884 fail_counter:116
[test@125 wal]$ python3 nfs_transaction_test.py 1
2025-12-11 18:17:22.120418 database is locked
deferred: cost_time:248.263145s ok_counter:237 fail_counter:9763
从123使用linux工具查询(deferred事务)也会出现报错,fail_counter加上结果数量正好是测试数量。
sqlite> select * from data;
Error: database is locked
sqlite> select * from data;
4742
sqlite> select * from data;
Error: database is locked
sqlite> select * from data;
10121
2、immediate事务:
仅有一台客户机有报错产生:
[test@124 wal]$ python3 nfs_transaction_test.py 2
immediate: cost_time:190.654982s ok_counter:10000 fail_counter:0
[test@125 wal]$ python3 nfs_transaction_test.py 2
2025-12-11 18:29:14.539222 database is locked
2025-12-11 18:29:23.738750 database is locked
2025-12-11 18:29:26.140422 disk I/O error
immediate: cost_time:435.384960s ok_counter:9969 fail_counter:31
从123使用linux工具查询(deferred事务)也会出现报错,fail_counter加上结果数量比测试数量多2条。
125出现过两条disk I/O error错误,怀疑是虽然报了disk I/O error错误,但数据实际写入成功。
[root@localhost share]# sqlite3 nfs_test.db
SQLite version 3.32.3 2020-06-18 14:00:33
Enter ".help" for usage hints.
sqlite> select * from data;
Error: database is locked
sqlite> select * from data;
19971
sqlite>
3、exclusive事务:
仅有一台客户机有报错:
[test@124 wal]$ python3 nfs_transaction_test.py 3
exclusive: cost_time:189.517209s ok_counter:10000 fail_counter:0
[test@125 wal]$ python3 nfs_transaction_test.py 3
2025-12-11 18:41:21.333215 database is locked
2025-12-11 18:41:26.411349 database is locked
exclusive: cost_time:432.467810s ok_counter:9965 fail_counter:35
从123使用linux工具查询(deferred事务)也会出现报错,fail_counter加上结果数量正好是测试数量。
[root@localhost share]# sqlite3 nfs_test.db
SQLite version 3.32.3 2020-06-18 14:00:33
Enter ".help" for usage hints.
sqlite> select * from data;
10547
sqlite> select * from data;
Error: database is locked
sqlite> select * from data;
11174
sqlite> select * from data;
19965
sqlite>
五、基于NFS压测SQLite的三种事务类型(WAL模式):
压测脚本 nfs_wal_transaction_test.py:
import time
import sqlite3
import sys
from datetime import datetime
def nfs_deferred_transaction_test():
start = time.perf_counter()
ok_counter, fail_counter = 0, 0
for i in range(10000):
try:
conn = sqlite3.connect('/mnt/nfs_share/nfs_test.db')
conn.execute("PRAGMA journal_mode=WAL;")
conn.execute("BEGIN TRANSACTION")
row = conn.execute('SELECT * FROM data LIMIT 1').fetchone()
counter = row[0] + 1
conn.execute("UPDATE data SET counter=?", (counter,))
time.sleep(0.01)
conn.commit()
conn.close()
ok_counter+=1
except Exception as e:
now = datetime.now()
print(now, e)
time.sleep(0.01)
fail_counter+=1
cost_time = time.perf_counter() - start
print(f"deferred: cost_time:{cost_time:.6f}s ok_counter:{ok_counter} fail_counter:{fail_counter}")
def nfs_immediate_transaction_test():
start = time.perf_counter()
ok_counter, fail_counter = 0, 0
for i in range(10000):
try:
conn = sqlite3.connect('/mnt/nfs_share/nfs_test.db')
conn.execute("PRAGMA journal_mode=WAL;")
conn.execute("BEGIN IMMEDIATE TRANSACTION")
row = conn.execute('SELECT * FROM data LIMIT 1').fetchone()
counter = row[0] + 1
conn.execute("UPDATE data SET counter=?", (counter,))
time.sleep(0.01)
conn.commit()
conn.close()
ok_counter+=1
except Exception as e:
now = datetime.now()
print(now, e)
time.sleep(0.01)
fail_counter+=1
cost_time = time.perf_counter() - start
print(f"immediate: cost_time:{cost_time:.6f}s ok_counter:{ok_counter} fail_counter:{fail_counter}")
def nfs_exclusive_transaction_test():
start = time.perf_counter()
ok_counter, fail_counter = 0, 0
for i in range(10000):
try:
conn = sqlite3.connect('/mnt/nfs_share/nfs_test.db')
conn.execute("PRAGMA journal_mode=WAL;")
conn.execute("BEGIN EXCLUSIVE TRANSACTION")
row = conn.execute('SELECT * FROM data LIMIT 1').fetchone()
counter = row[0] + 1
conn.execute("UPDATE data SET counter=?", (counter,))
time.sleep(0.01)
conn.commit()
conn.close()
ok_counter+=1
except Exception as e:
now = datetime.now()
print(now, e)
time.sleep(0.01)
fail_counter+=1
cost_time = time.perf_counter() - start
print(f"exclusive: cost_time:{cost_time:.6f}s ok_counter:{ok_counter} fail_counter:{fail_counter}")
def main():
conn = sqlite3.connect('/mnt/nfs_share/nfs_test.db')
conn.execute('CREATE TABLE IF NOT EXISTS data (counter INT)')
row = conn.execute('SELECT * FROM data LIMIT 1').fetchone()
if row is None:
conn.execute("INSERT INTO data (counter) VALUES (?)", (0,))
conn.commit()
conn.close()
command = sys.argv[1] if len(sys.argv)>1 else ""
if command == "" or command == "1":
nfs_deferred_transaction_test()
elif command == "2":
nfs_immediate_transaction_test()
elif command == "3":
nfs_exclusive_transaction_test()
else:
print("错误:参数不足或命令无效")
if __name__ == "__main__":
main()
1、deferred事务:
两台机器都出现大量的两种报错,接近于不可用。
2025-12-14 23:27:23.548719 database is locked
2025-12-14 23:27:23.578794 disk I/O error
2、immediate事务:
两台机器都出现大量的两种报错,接近于不可用。
2025-12-14 23:21:53.997637 database disk image is malformed
2025-12-14 23:23:27.288040 disk I/O error
3、exclusive事务:
两台机器都出现大量的三种报错,接近于不可用。
2025-12-14 23:26:58.143788 database is locked
2025-12-14 23:26:58.193508 disk I/O error
2025-12-14 23:29:44.264101 database disk image is malformed
六、测试 SQLite PRAGMA 命令的影响:
压测脚本 nfs_sqlite_pragma_test:
import time
import sqlite3
import sys
import subprocess
import re
import argparse
from datetime import datetime
def nfs_sqlite_test():
start = time.perf_counter()
ok_counter, fail_counter = 0, 0
for i in range(10000):
try:
conn = sqlite3.connect('/mnt/nfs_share/nfs_test.db')
conn.execute(f"PRAGMA journal_mode={args.journal_mode};")
conn.execute(f"PRAGMA synchronous={args.synchronous};")
conn.execute(f"PRAGMA busy_timeout={args.busy_timeout};")
conn.execute(f"PRAGMA temp_store={args.temp_store}")
conn.execute(f"PRAGMA locking_mode={args.locking_mode}")
conn.execute(f"BEGIN {args.trans_mode} TRANSACTION")
row = conn.execute('SELECT * FROM data WHERE host=? LIMIT 1', (myip,)).fetchone()
counter = row[0] + 1
conn.execute('UPDATE data SET counter=? WHERE host=?', (counter,myip,))
time.sleep(0.01)
conn.commit()
conn.close()
ok_counter+=1
except Exception as e:
now = datetime.now()
print(now, e)
time.sleep(0.01)
fail_counter+=1
cost_time = time.perf_counter() - start
print(f"cost_time:{cost_time:.6f}s ok_counter:{ok_counter} fail_counter:{fail_counter}")
def get_default_interface_ip():
try:
result = subprocess.run(
['ip', 'route', 'show', 'default'],
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
universal_newlines=True,
check=True
)
match = re.search(r'dev\s+(\w+)', result.stdout)
if match:
interface = match.group(1)
ip_result = subprocess.run(
['ip', '-4', 'addr', 'show', interface],
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
universal_newlines=True,
check=True
)
ip_match = re.search(r'inet\s+(\d+\.\d+\.\d+\.\d+)', ip_result.stdout)
if ip_match:
return interface, ip_match.group(1)
except subprocess.CalledProcessError as e:
print(f"命令执行失败: {e}")
print(f"错误输出: {e.stderr}")
except Exception as e:
print(f"错误: {e}")
return None, None
def main():
global myip, args
conn = sqlite3.connect('/mnt/nfs_share/nfs_test.db')
conn.execute('CREATE TABLE IF NOT EXISTS data (counter INT, host VARCHAR(15))')
_, myip = get_default_interface_ip()
row = conn.execute('SELECT * FROM data WHERE host=? LIMIT 1', (myip,)).fetchone()
if row is None:
conn.execute("INSERT INTO data VALUES (?,?)", (0,myip,))
conn.commit()
conn.close()
parser = argparse.ArgumentParser()
parser.add_argument('--trans_mode', type=str, default="DEFERRED", help="DEFERRED/IMMEDIATE/EXCLUSIVE")
parser.add_argument('--journal_mode', type=str, default="DELETE", help="DELETE/TRUNCATE/PERSIST/MEMORY/WAL/OFF")
parser.add_argument('--synchronous', type=str, default="NORMAL", help="NORMAL/OFF/FULL/EXTRA")
parser.add_argument('--busy_timeout', type=int, default=0, help="ms")
parser.add_argument('--temp_store', type=str, default="DEFAULT", help="DEFAULT/FILE/MEMORY")
parser.add_argument('--locking_mode', type=str, default="NORMAL", help="NORMAL/EXCLUSIVE")
args = parser.parse_args()
print(datetime.now())
print("ip:", myip)
print("args:", args)
nfs_sqlite_test()
if __name__ == "__main__":
main()
测试完全同步、独占事务、繁忙超时1分钟、默认日志模式下的情况:
python3 nfs_sqlite_pragma_test.py --trans_mode EXCLUSIVE --busy_timeout 60000 --synchronous FULL
两台机器共出现一条错误,损失一个计数:
2025-12-15 16:18:33.476198 database is locked
cost_time:945.651139s ok_counter:9999 fail_counter:1
测试完全同步、独占事务、繁忙超时1分钟、临时存储采用内存、默认日志模式下的情况:
python3 nfs_sqlite_pragma_test.py --trans_mode EXCLUSIVE --busy_timeout 60000 --synchronous FULL --temp_store MEMORY
两台机器共出现一条错误,损失一个计数,总耗时比不加--temp_store MEMORY少了一点:
2025-12-15 16:35:47.510544 database is locked
cost_time:903.890313s ok_counter:9999 fail_counter:1
测试不指定事务类型,但指定locking_mode为EXCLUSIVE:
python3 nfs_sqlite_pragma_test.py --trans_mode ' ' --busy_timeout 60000 --locking_mode EXCLUSIVE
两台机器都有大量报错,计数损失严重,说明locking_mode=EXCLUSIVE并不能影响全局的默认事务类型,且都是立刻出现报错,并没有忙等的过程。
2025-12-15 17:08:09.288668 database is locked
2025-12-15 17:08:09.332024 database is locked
测试设置busy_timeout是否对默认事务类型(DEFERRED)有影响:
python3 nfs_sqlite_pragma_test.py --busy_timeout 60000
两台机器都有大量报错,计数损失严重,都是立刻出现报错,并没有忙等的过程,说明busy_timeout对deferred类型事务无效,busy_timeout只对immediate和exclusive事务有效。
2025-12-15 17:18:05.893199 database is locked
2025-12-15 17:18:06.006748 database is locked
----
NFS使用SQLite3总结:
1、NFS网络文件环境不要使用WAL(预写日志)模式,报错太多,而且多了个 database disk image is malformed 错误。可以使用默认的 PRAGMA journal_mode = DELETE。
2、事务类型使用 EXCLUSIVE(排他)或 IMMEDIATE(立即),因为 DEFERRED(延迟)事务写入的时候加锁失败太多(database is locked)。
3、设置 PRAGMA busy_timeout = 60000;(1分钟,自定义) 对 "database is locked" 错误进行延后重试,时间足够长就成了串行化,真繁忙就出现堆积,任务超时失败。注意只对immediate和exclusive事务有效。
4、设置 PRAGMA synchronous = FULL; 避免数据丢失。
5、设置 PRAGMA temp_store = MEMORY; 避免在NFS上创建临时文件导致多节点间冲突(我没测试出区别)。
--end--
375

被折叠的 条评论
为什么被折叠?



