SQLite3在NFS下会不会导致文件写乱?

一、搭建NFS实验环境:

以centos8系统为例:

Linux 4.19.90-23.8.v2101.ky10.x86_64 #1 SMP Mon May 17 17:08:34 CST 2021 x86_64 x86_64 x86_64 GNU/Linux

[test@localhost share]$ nfsstat -s 10.10.10.123
Server rpc stats:
calls      badcalls   badfmt     badauth    badclnt
22440720   0          0          0          0       

Server nfs v4:
null             compound         
3         0%     22440719 99%     

Server nfs v4 operations:

[root@localhost wal]# yum list | grep sqlite
sqlite.x86_64                                         3.24.0-9.ky10                          @anaconda        
sqlite.x86_64                                         3.32.3-7.ky10                          @System          
sqlite-devel.x86_64                                   3.24.0-9.ky10                          @anaconda        
freeradius-sqlite.x86_64                              3.0.15-28.p01.ky10                     ks10-adv-updates 
rubygem-sqlite3.x86_64                                1.4.2-2.ky10                           ks10-adv-updates 
rubygem-sqlite3-doc.noarch                            1.4.2-2.ky10                           ks10-adv-updates 
sqlite-devel.x86_64                                   3.32.3-7.ky10                          ks10-adv-updates 
sqlite-help.noarch                                    3.32.3-7.ky10                          ks10-adv-updates 

[root@localhost wal]# python3 --version
Python 3.7.9

nfs服务端:

yum install -y nfs-utils rpcbind
mkdir -p /usr/share/nfs/share
echo "/usr/share/nfs/share 10.10.10.123/24(rw,sync,no_root_squash)" >> /etc/exports
exportfs -a
systemctl enable rpcbind nfs-server
systemctl start rpcbind nfs-server
exportfs -v
firewall-cmd --permanent --zone=public --add-service=nfs --add-service=mountd --add-service=rpc-bind --add-service=nscd --add-service=nfs-lock --add-service=nfs-idmap
firewall-cmd --reload

nfs客户端:

yum install -y nfs-utils
mkdir -p /mnt/nfs_share
mount -t nfs 10.10.10.123:/usr/share/nfs/share /mnt/nfs_share

二、基于NFS压测SQLite3的WAL模式:

压测脚本 wal_performance_test.py:

import time
import sqlite3
import subprocess
import re
from datetime import datetime

def wal_performance_test():
    # WAL模式
    dev, ip = get_default_interface_ip()
    start = time.perf_counter()
    for i in range(100):
        try:
            conn = sqlite3.connect('/mnt/nfs_share/wal_test.db')
            # 设置WAL模式必须将journal_mode设为WAL
            conn.execute("PRAGMA journal_mode=WAL;")  
            # 调整同步策略为NORMAL提升性能
            conn.execute("PRAGMA synchronous=NORMAL;") 
            # 设置WAL文件自动清理的检查点阈值
            conn.execute("PRAGMA wal_autocheckpoint=1000;") 
            conn.execute('CREATE TABLE IF NOT EXISTS data (id INT, host VARCHAR(15))')
            conn.execute("BEGIN TRANSACTION")
            for ii in range(100):
                conn.execute("INSERT INTO data VALUES (?,?)", (i*100+ii,ip,))
            conn.commit()
            conn.close()
            time.sleep(1)
        except Exception as e:
            now = datetime.now()
            print(now, e)
            time.sleep(1)
    wal_time = time.perf_counter() - start
    print(f"WAL模式: {wal_time:.6f}s")

def sqlite_performance_test():
    # 常规模式
    dev, ip = get_default_interface_ip()
    start = time.perf_counter()
    for i in range(100):
        try:
            conn = sqlite3.connect('/mnt/nfs_share/wal_test.db')
            conn.execute('CREATE TABLE IF NOT EXISTS data (id INT, host VARCHAR(15))')
            conn.execute("BEGIN TRANSACTION")
            for ii in range(100):
                conn.execute("INSERT INTO data VALUES (?,?)", (i*100+ii,ip,))
            conn.commit()
            conn.close()
            time.sleep(1)
        except Exception as e:
            now = datetime.now()
            print(now, e)
            time.sleep(1)
    normal_time = time.perf_counter() - start
    print(f"常规模式: {normal_time:.6f}s")

def get_default_interface_ip():
    """通过路由表获取默认网卡和IP"""
    try:
        # 获取默认路由接口 - 使用 stdout=subprocess.PIPE
        result = subprocess.run(
            ['ip', 'route', 'show', 'default'],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            check=True
        )
        # 解析默认路由行
        match = re.search(r'dev\s+(\w+)', result.stdout)
        if match:
            interface = match.group(1)
            # 获取该接口的IP地址
            ip_result = subprocess.run(
                ['ip', '-4', 'addr', 'show', interface],
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,
                universal_newlines=True,
                check=True
            )
            # 解析IP地址
            ip_match = re.search(r'inet\s+(\d+\.\d+\.\d+\.\d+)', ip_result.stdout)
            if ip_match:
                return interface, ip_match.group(1)
    except subprocess.CalledProcessError as e:
        print(f"命令执行失败: {e}")
        print(f"错误输出: {e.stderr}")
    except Exception as e:
        print(f"错误: {e}")
    return None, None


wal_performance_test()

分别在客户机124和125执行:

python3 wal_performance_test.py

两个客户端交替出现错误 disk I/O error,总计出现6条:

[test@124 wal]$ python3 wal_performance_test.py
2025-12-09 14:33:21.776389 disk I/O error
2025-12-09 14:33:47.146377 disk I/O error
2025-12-09 14:34:39.921037 disk I/O error
WAL模式: 105.810731s

[test@125 wal]$ python3 wal_performance_test.py 
2025-12-09 14:33:20.745065 disk I/O error
2025-12-09 14:34:13.513216 disk I/O error
2025-12-09 14:34:38.881051 disk I/O error
WAL模式: 101.554718s

三、验证数据:

在123机器执行:

yum install -y sqlite
sqlite3 wal_test.db
sqlite> select count(*) from data;
19400
sqlite> select count(*) from data where host='10.10.10.124';
9700
sqlite> select count(*) from data where host='10.10.10.125';
9700

总的执行数据量2*100*100=20000,减去6次失败 20000-600=19400,数据量正好。

也就是不会导致数据写乱,但是加锁失败会报错,需要客户端自己重试,如果不重试数据会丢失。

四、基于NFS压测SQLite的三种事务类型:

压测脚本 nfs_transaction_test.py:

import time
import sqlite3
import sys
from datetime import datetime

def nfs_deferred_transaction_test():
    start = time.perf_counter()
    ok_counter, fail_counter = 0, 0
    for i in range(10000):
        try:
            conn = sqlite3.connect('/mnt/nfs_share/nfs_test.db')
            conn.execute("BEGIN TRANSACTION")
            row = conn.execute('SELECT * FROM data LIMIT 1').fetchone()
            counter = row[0] + 1
            conn.execute("UPDATE data SET counter=?", (counter,))
            time.sleep(0.01)
            conn.commit()
            conn.close()
            ok_counter+=1
        except Exception as e:
            now = datetime.now()
            print(now, e)
            time.sleep(0.01)
            fail_counter+=1
    cost_time = time.perf_counter() - start
    print(f"deferred: cost_time:{cost_time:.6f}s ok_counter:{ok_counter} fail_counter:{fail_counter}")

def nfs_immediate_transaction_test():
    start = time.perf_counter()
    ok_counter, fail_counter = 0, 0
    for i in range(10000):
        try:
            conn = sqlite3.connect('/mnt/nfs_share/nfs_test.db')
            conn.execute("BEGIN IMMEDIATE TRANSACTION")
            row = conn.execute('SELECT * FROM data LIMIT 1').fetchone()
            counter = row[0] + 1
            conn.execute("UPDATE data SET counter=?", (counter,))
            time.sleep(0.01)
            conn.commit()
            conn.close()
            ok_counter+=1
        except Exception as e:
            now = datetime.now()
            print(now, e)
            time.sleep(0.01)
            fail_counter+=1
    cost_time = time.perf_counter() - start
    print(f"immediate: cost_time:{cost_time:.6f}s ok_counter:{ok_counter} fail_counter:{fail_counter}")

def nfs_exclusive_transaction_test():
    start = time.perf_counter()
    ok_counter, fail_counter = 0, 0
    for i in range(10000):
        try:
            conn = sqlite3.connect('/mnt/nfs_share/nfs_test.db')
            conn.execute("BEGIN EXCLUSIVE TRANSACTION")
            row = conn.execute('SELECT * FROM data LIMIT 1').fetchone()
            counter = row[0] + 1
            conn.execute("UPDATE data SET counter=?", (counter,))
            time.sleep(0.01)
            conn.commit()
            conn.close()
            ok_counter+=1
        except Exception as e:
            now = datetime.now()
            print(now, e)
            time.sleep(0.01)
            fail_counter+=1
    cost_time = time.perf_counter() - start
    print(f"exclusive: cost_time:{cost_time:.6f}s ok_counter:{ok_counter} fail_counter:{fail_counter}")

def main():
    conn = sqlite3.connect('/mnt/nfs_share/nfs_test.db')
    conn.execute('CREATE TABLE IF NOT EXISTS data (counter INT)')
    row = conn.execute('SELECT * FROM data LIMIT 1').fetchone()
    if row is None:
        conn.execute("INSERT INTO data (counter) VALUES (?)", (0,))
    conn.commit()
    conn.close()
    command = sys.argv[1] if len(sys.argv)>1 else ""
    if command == "" or command == "1":
        nfs_deferred_transaction_test()
    elif command == "2":
        nfs_immediate_transaction_test()
    elif command == "3":
        nfs_exclusive_transaction_test()
    else:
        print("错误:参数不足或命令无效")

if __name__ == "__main__":
    main()

1、deferred事务:

两台客户机都会出现大量的报错:

[test@124 wal]$ python3 nfs_transaction_test.py 1
2025-12-11 18:17:21.147881 database is locked
deferred: cost_time:247.461341s ok_counter:9884 fail_counter:116

[test@125 wal]$ python3 nfs_transaction_test.py 1
2025-12-11 18:17:22.120418 database is locked
deferred: cost_time:248.263145s ok_counter:237 fail_counter:9763

从123使用linux工具查询(deferred事务)也会出现报错,fail_counter加上结果数量正好是测试数量。

sqlite> select * from data;
Error: database is locked
sqlite> select * from data;
4742
sqlite> select * from data;
Error: database is locked
sqlite> select * from data;
10121

2、immediate事务:

仅有一台客户机有报错产生:

[test@124 wal]$ python3 nfs_transaction_test.py 2
immediate: cost_time:190.654982s ok_counter:10000 fail_counter:0

[test@125 wal]$ python3 nfs_transaction_test.py 2
2025-12-11 18:29:14.539222 database is locked
2025-12-11 18:29:23.738750 database is locked
2025-12-11 18:29:26.140422 disk I/O error
immediate: cost_time:435.384960s ok_counter:9969 fail_counter:31

从123使用linux工具查询(deferred事务)也会出现报错,fail_counter加上结果数量比测试数量多2条。

125出现过两条disk I/O error错误,怀疑是虽然报了disk I/O error错误,但数据实际写入成功。

[root@localhost share]# sqlite3 nfs_test.db
SQLite version 3.32.3 2020-06-18 14:00:33
Enter ".help" for usage hints.
sqlite> select * from data;
Error: database is locked
sqlite> select * from data;
19971
sqlite> 

3、exclusive事务:

仅有一台客户机有报错:

[test@124 wal]$ python3 nfs_transaction_test.py 3
exclusive: cost_time:189.517209s ok_counter:10000 fail_counter:0

[test@125 wal]$ python3 nfs_transaction_test.py 3
2025-12-11 18:41:21.333215 database is locked
2025-12-11 18:41:26.411349 database is locked
exclusive: cost_time:432.467810s ok_counter:9965 fail_counter:35

从123使用linux工具查询(deferred事务)也会出现报错,fail_counter加上结果数量正好是测试数量。

[root@localhost share]# sqlite3 nfs_test.db 
SQLite version 3.32.3 2020-06-18 14:00:33
Enter ".help" for usage hints.
sqlite> select * from data;
10547
sqlite> select * from data;
Error: database is locked
sqlite> select * from data;
11174
sqlite> select * from data;
19965
sqlite>

五、基于NFS压测SQLite的三种事务类型(WAL模式):

压测脚本 nfs_wal_transaction_test.py:

import time
import sqlite3
import sys
from datetime import datetime

def nfs_deferred_transaction_test():
    start = time.perf_counter()
    ok_counter, fail_counter = 0, 0
    for i in range(10000):
        try:
            conn = sqlite3.connect('/mnt/nfs_share/nfs_test.db')
            conn.execute("PRAGMA journal_mode=WAL;")
            conn.execute("BEGIN TRANSACTION")
            row = conn.execute('SELECT * FROM data LIMIT 1').fetchone()
            counter = row[0] + 1
            conn.execute("UPDATE data SET counter=?", (counter,))
            time.sleep(0.01)
            conn.commit()
            conn.close()
            ok_counter+=1
        except Exception as e:
            now = datetime.now()
            print(now, e)
            time.sleep(0.01)
            fail_counter+=1
    cost_time = time.perf_counter() - start
    print(f"deferred: cost_time:{cost_time:.6f}s ok_counter:{ok_counter} fail_counter:{fail_counter}")

def nfs_immediate_transaction_test():
    start = time.perf_counter()
    ok_counter, fail_counter = 0, 0
    for i in range(10000):
        try:
            conn = sqlite3.connect('/mnt/nfs_share/nfs_test.db')
            conn.execute("PRAGMA journal_mode=WAL;")
            conn.execute("BEGIN IMMEDIATE TRANSACTION")
            row = conn.execute('SELECT * FROM data LIMIT 1').fetchone()
            counter = row[0] + 1
            conn.execute("UPDATE data SET counter=?", (counter,))
            time.sleep(0.01)
            conn.commit()
            conn.close()
            ok_counter+=1
        except Exception as e:
            now = datetime.now()
            print(now, e)
            time.sleep(0.01)
            fail_counter+=1
    cost_time = time.perf_counter() - start
    print(f"immediate: cost_time:{cost_time:.6f}s ok_counter:{ok_counter} fail_counter:{fail_counter}")

def nfs_exclusive_transaction_test():
    start = time.perf_counter()
    ok_counter, fail_counter = 0, 0
    for i in range(10000):
        try:
            conn = sqlite3.connect('/mnt/nfs_share/nfs_test.db')
            conn.execute("PRAGMA journal_mode=WAL;")
            conn.execute("BEGIN EXCLUSIVE TRANSACTION")
            row = conn.execute('SELECT * FROM data LIMIT 1').fetchone()
            counter = row[0] + 1
            conn.execute("UPDATE data SET counter=?", (counter,))
            time.sleep(0.01)
            conn.commit()
            conn.close()
            ok_counter+=1
        except Exception as e:
            now = datetime.now()
            print(now, e)
            time.sleep(0.01)
            fail_counter+=1
    cost_time = time.perf_counter() - start
    print(f"exclusive: cost_time:{cost_time:.6f}s ok_counter:{ok_counter} fail_counter:{fail_counter}")

def main():
    conn = sqlite3.connect('/mnt/nfs_share/nfs_test.db')
    conn.execute('CREATE TABLE IF NOT EXISTS data (counter INT)')
    row = conn.execute('SELECT * FROM data LIMIT 1').fetchone()
    if row is None:
        conn.execute("INSERT INTO data (counter) VALUES (?)", (0,))
    conn.commit()
    conn.close()
    command = sys.argv[1] if len(sys.argv)>1 else ""
    if command == "" or command == "1":
        nfs_deferred_transaction_test()
    elif command == "2":
        nfs_immediate_transaction_test()
    elif command == "3":
        nfs_exclusive_transaction_test()
    else:
        print("错误:参数不足或命令无效")

if __name__ == "__main__":
    main()

1、deferred事务:

两台机器都出现大量的两种报错,接近于不可用。

2025-12-14 23:27:23.548719 database is locked
2025-12-14 23:27:23.578794 disk I/O error

2、immediate事务:

两台机器都出现大量的两种报错,接近于不可用。

2025-12-14 23:21:53.997637 database disk image is malformed

2025-12-14 23:23:27.288040 disk I/O error

3、exclusive事务:

两台机器都出现大量的三种报错,接近于不可用。

2025-12-14 23:26:58.143788 database is locked
2025-12-14 23:26:58.193508 disk I/O error
2025-12-14 23:29:44.264101 database disk image is malformed

六、测试 SQLite PRAGMA 命令的影响:

压测脚本 nfs_sqlite_pragma_test:

import time
import sqlite3
import sys
import subprocess
import re
import argparse
from datetime import datetime

def nfs_sqlite_test():
    start = time.perf_counter()
    ok_counter, fail_counter = 0, 0
    for i in range(10000):
        try:
            conn = sqlite3.connect('/mnt/nfs_share/nfs_test.db')
            conn.execute(f"PRAGMA journal_mode={args.journal_mode};")
            conn.execute(f"PRAGMA synchronous={args.synchronous};")
            conn.execute(f"PRAGMA busy_timeout={args.busy_timeout};")
            conn.execute(f"PRAGMA temp_store={args.temp_store}")
            conn.execute(f"PRAGMA locking_mode={args.locking_mode}")
            conn.execute(f"BEGIN {args.trans_mode} TRANSACTION")
            row = conn.execute('SELECT * FROM data WHERE host=? LIMIT 1', (myip,)).fetchone()
            counter = row[0] + 1
            conn.execute('UPDATE data SET counter=? WHERE host=?', (counter,myip,))
            time.sleep(0.01)
            conn.commit()
            conn.close()
            ok_counter+=1
        except Exception as e:
            now = datetime.now()
            print(now, e)
            time.sleep(0.01)
            fail_counter+=1
    cost_time = time.perf_counter() - start
    print(f"cost_time:{cost_time:.6f}s ok_counter:{ok_counter} fail_counter:{fail_counter}")

def get_default_interface_ip():
    try:
        result = subprocess.run(
            ['ip', 'route', 'show', 'default'],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            check=True
        )
        match = re.search(r'dev\s+(\w+)', result.stdout)
        if match:
            interface = match.group(1)
            ip_result = subprocess.run(
                ['ip', '-4', 'addr', 'show', interface],
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,
                universal_newlines=True,
                check=True
            )
            ip_match = re.search(r'inet\s+(\d+\.\d+\.\d+\.\d+)', ip_result.stdout)
            if ip_match:
                return interface, ip_match.group(1)
    except subprocess.CalledProcessError as e:
        print(f"命令执行失败: {e}")
        print(f"错误输出: {e.stderr}")
    except Exception as e:
        print(f"错误: {e}")
    return None, None

def main():
    global myip, args
    conn = sqlite3.connect('/mnt/nfs_share/nfs_test.db')
    conn.execute('CREATE TABLE IF NOT EXISTS data (counter INT, host VARCHAR(15))')
    _, myip = get_default_interface_ip()
    row = conn.execute('SELECT * FROM data WHERE host=? LIMIT 1', (myip,)).fetchone()
    if row is None:
        conn.execute("INSERT INTO data VALUES (?,?)", (0,myip,))
    conn.commit()
    conn.close()
    parser = argparse.ArgumentParser()
    parser.add_argument('--trans_mode', type=str, default="DEFERRED", help="DEFERRED/IMMEDIATE/EXCLUSIVE")
    parser.add_argument('--journal_mode', type=str, default="DELETE", help="DELETE/TRUNCATE/PERSIST/MEMORY/WAL/OFF")
    parser.add_argument('--synchronous', type=str, default="NORMAL", help="NORMAL/OFF/FULL/EXTRA")
    parser.add_argument('--busy_timeout', type=int, default=0, help="ms")
    parser.add_argument('--temp_store', type=str, default="DEFAULT", help="DEFAULT/FILE/MEMORY")
    parser.add_argument('--locking_mode', type=str, default="NORMAL", help="NORMAL/EXCLUSIVE")
    args = parser.parse_args()
    print(datetime.now())
    print("ip:", myip)
    print("args:", args)
    nfs_sqlite_test()

if __name__ == "__main__":
    main()

测试完全同步、独占事务、繁忙超时1分钟、默认日志模式下的情况:

python3 nfs_sqlite_pragma_test.py --trans_mode EXCLUSIVE --busy_timeout 60000 --synchronous FULL

两台机器共出现一条错误,损失一个计数:

2025-12-15 16:18:33.476198 database is locked
cost_time:945.651139s ok_counter:9999 fail_counter:1

测试完全同步、独占事务、繁忙超时1分钟、临时存储采用内存、默认日志模式下的情况:

python3 nfs_sqlite_pragma_test.py --trans_mode EXCLUSIVE --busy_timeout 60000 --synchronous FULL --temp_store MEMORY

两台机器共出现一条错误,损失一个计数,总耗时比不加--temp_store MEMORY少了一点:

2025-12-15 16:35:47.510544 database is locked
cost_time:903.890313s ok_counter:9999 fail_counter:1

测试不指定事务类型,但指定locking_mode为EXCLUSIVE:

python3 nfs_sqlite_pragma_test.py --trans_mode ' ' --busy_timeout 60000 --locking_mode EXCLUSIVE

两台机器都有大量报错,计数损失严重,说明locking_mode=EXCLUSIVE并不能影响全局的默认事务类型,且都是立刻出现报错,并没有忙等的过程。

2025-12-15 17:08:09.288668 database is locked
2025-12-15 17:08:09.332024 database is locked

测试设置busy_timeout是否对默认事务类型(DEFERRED)有影响:

python3 nfs_sqlite_pragma_test.py --busy_timeout 60000

两台机器都有大量报错,计数损失严重,都是立刻出现报错,并没有忙等的过程,说明busy_timeout对deferred类型事务无效,busy_timeout只对immediate和exclusive事务有效。

2025-12-15 17:18:05.893199 database is locked
2025-12-15 17:18:06.006748 database is locked

----

NFS使用SQLite3总结:

1、NFS网络文件环境不要使用WAL(预写日志)模式,报错太多,而且多了个 database disk image is malformed 错误。可以使用默认的 PRAGMA journal_mode = DELETE。

2、事务类型使用 EXCLUSIVE(排他)或 IMMEDIATE(立即),因为 DEFERRED(延迟)事务写入的时候加锁失败太多(database is locked)。

3、设置 PRAGMA busy_timeout = 60000;(1分钟,自定义) 对 "database is locked" 错误进行延后重试,时间足够长就成了串行化,真繁忙就出现堆积,任务超时失败。注意只对immediate和exclusive事务有效。

4、设置 PRAGMA synchronous = FULL; 避免数据丢失。

5、设置 PRAGMA temp_store = MEMORY; 避免在NFS上创建临时文件导致多节点间冲突(我没测试出区别)。

--end--

评论
成就一亿技术人!
拼手气红包6.0元
还能输入1000个字符
 
红包 添加红包
表情包 插入表情
 条评论被折叠 查看
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值