A Complete Guide to Avoiding Data Contamination with pytest-xdist --dist=loadscope
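When tests run in parallel under --dist=loadscope, preventing data contamination is the key challenge. loadscope keeps all tests of a module (or all methods of a test class) on the same worker, but different workers still share databases, files, and external services. The approaches below address each of these systematically: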
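I. Types of Data Contamination and Countermeasures
Contamination type | Typical scenario | Solution |
---|---|---|
Database records | Parallel operations on the same table | Separate schemas / transaction isolation |
File system conflicts | Concurrent reads and writes of the same file | Per-worker directories / file locks |
Shared in-memory state | Global variables or caches modified by tests | Process isolation / per-test cleanup |
External service state | API tests that change service state | Mocked services / dedicated test accounts |
Environment variables | Parallel modification of environment variables | Per-worker environment |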
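Environment variables are the one row in the table without a dedicated section, so here is a minimal sketch of how a test could change them without leaking the change. It uses only the standard library; the helper name `scoped_env` and the variable `APP_DATA_DIR` are illustrative assumptions, not part of any library.

```python
import os
from contextlib import contextmanager


@contextmanager
def scoped_env(**overrides):
    """Temporarily set environment variables, restoring the old values on exit."""
    missing = object()  # sentinel meaning "the variable was not set before"
    saved = {name: os.environ.get(name, missing) for name in overrides}
    os.environ.update(overrides)
    try:
        yield
    finally:
        for name, old in saved.items():
            if old is missing:
                os.environ.pop(name, None)
            else:
                os.environ[name] = old


# Usage: the override is visible only inside the with-block
with scoped_env(APP_DATA_DIR="/tmp/worker_gw0"):
    inside = os.environ["APP_DATA_DIR"]
```

Inside a pytest test, the built-in `monkeypatch` fixture (`monkeypatch.setenv`) gives the same restore-on-teardown behavior and is usually the simpler choice.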
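II. Database Isolation
1. Dynamic schema strategy
# conftest.py
import re
import pytest
from sqlalchemy import create_engine, text
@pytest.fixture(scope="module")
def db_engine(request):
    """Give each worker + module combination its own schema."""
    worker_id = getattr(request.config, "workerinput", {}).get("workerid", "master")
    # Module names may contain dots (e.g. tests.test_users); keep only identifier-safe characters
    module_name = re.sub(r"\W", "_", request.module.__name__)
    schema_name = f"test_{worker_id}_{module_name}".lower()
    # The options parameter sets search_path on every new connection, so no separate SET is needed
    engine = create_engine(
        f"postgresql://user:pass@localhost/mydb?options=-csearch_path%3D{schema_name}"
    )
    # Create the schema; engine.begin() commits when the block exits
    with engine.begin() as conn:
        conn.execute(text(f"CREATE SCHEMA IF NOT EXISTS {schema_name}"))
    yield engine
    # Clean up after the module unless --no-cleanup was passed
    # (--no-cleanup is a project-specific option that must be registered via pytest_addoption)
    if not request.config.getoption("--no-cleanup", default=False):
        with engine.begin() as conn:
            conn.execute(text(f"DROP SCHEMA IF EXISTS {schema_name} CASCADE"))
    engine.dispose()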
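Schema names built from worker and module names can contain characters PostgreSQL does not accept in unquoted identifiers, and identifiers longer than 63 characters are truncated, which could make two long names collide. The following is a sketch of a naming helper under those constraints; `safe_schema_name` and its `prefix` parameter are hypothetical names introduced for illustration.

```python
import hashlib
import re

MAX_IDENTIFIER_LENGTH = 63  # PostgreSQL truncates longer identifiers


def safe_schema_name(worker_id, module_name, prefix="test"):
    """Build a lowercase, identifier-safe schema name of at most 63 characters."""
    raw = f"{prefix}_{worker_id}_{module_name}".lower()
    name = re.sub(r"[^a-z0-9_]", "_", raw)
    if len(name) <= MAX_IDENTIFIER_LENGTH:
        return name
    # Keep long names distinct by replacing the tail with a short hash of the full name
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()[:8]
    return f"{name[:MAX_IDENTIFIER_LENGTH - 9]}_{digest}"


short = safe_schema_name("gw0", "tests.api.test_users")
long_a = safe_schema_name("gw0", "tests." + "very_long_package." * 5 + "test_a")
long_b = safe_schema_name("gw0", "tests." + "very_long_package." * 5 + "test_b")
```

Hashing the full name rather than simply cutting it off means two modules that differ only near the end of a long dotted path still get different schemas.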
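2. Transaction rollback strategy
# conftest.py
import pytest
@pytest.fixture(autouse=True)
def transactional_db(db_connection):
    """Run every test inside a transaction and roll it back automatically."""
    # db_connection is assumed to be a fixture, defined elsewhere, that yields an open SQLAlchemy connection
    transaction = db_connection.begin()
    yield
    transaction.rollback()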
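To see the effect of this pattern without a PostgreSQL server, it can be sketched with the standard library's `sqlite3` module. The helper `rolled_back` and the `users` table are illustrative names; the point is only that rows written inside the block are visible during the test and gone afterwards.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def rolled_back(conn):
    """Yield the connection and discard every change made inside the block."""
    try:
        yield conn
    finally:
        conn.rollback()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT)")
conn.commit()


def count_users():
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


with rolled_back(conn):
    # The INSERT opens a transaction implicitly; the row is visible on this connection
    conn.execute("INSERT INTO users VALUES ('a@test.com')")
    rows_during = count_users()

# After the rollback the table is empty again
rows_after = count_users()
```

Note that rollback only helps when the code under test uses the same connection and never commits on its own; code that opens its own connections needs schema-level isolation instead.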
III. File System Isolation
1. Per-worker directories
# conftest.py
import pytest
@pytest.fixture(scope="module")
def work_dir(tmp_path_factory, request):
    """Create a separate working directory for each module on each worker."""
    worker_id = getattr(request.config, "workerinput", {}).get("workerid", "master")
    dir_name = f"worker_{worker_id}_{request.module.__name__}"
    return tmp_path_factory.mktemp(dir_name)
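Helper modules and subprocesses have no `request` object to read the worker id from. pytest-xdist also exports the worker name in the `PYTEST_XDIST_WORKER` environment variable (and offers a `worker_id` fixture), so a small lookup function can cover those cases. The function names below are illustrative, and the `"master"` fallback simply mirrors the fixtures in this guide.

```python
import os


def current_worker_id(environ=None):
    """Return the xdist worker name (e.g. "gw0"), or "master" outside xdist."""
    env = os.environ if environ is None else environ
    return env.get("PYTEST_XDIST_WORKER", "master")


def worker_scratch_name(module_name, environ=None):
    """Directory name that is unique per worker and module."""
    return f"worker_{current_worker_id(environ)}_{module_name}"


# Usage with an explicit mapping, so the example does not depend on the real environment
name = worker_scratch_name("test_files", {"PYTEST_XDIST_WORKER": "gw1"})
```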
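2. Inter-process file locks
import pytest
from filelock import FileLock
@pytest.fixture
def shared_file_access(tmp_path_factory):
    # Under xdist, getbasetemp() is unique to each worker while its parent directory is shared by all workers.
    # A lock file under tmp_path would be private to a single test and protect nothing.
    shared_dir = tmp_path_factory.getbasetemp().parent
    resource = shared_dir / "resource.txt"  # illustrative shared file
    with FileLock(str(shared_dir / "resource.lock")):
        # Critical section: the test body runs while the lock is held
        yield resource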
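If adding the filelock dependency is not an option, the underlying idea can be sketched with the standard library: creating a file with `O_CREAT | O_EXCL` is atomic, so only one process succeeds at a time. This is a simplified stand-in, not how filelock itself is implemented; `exclusive_lock` is a hypothetical name, and the sketch does not recover from stale lock files left by a crashed worker.

```python
import os
import tempfile
import time
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path, timeout=10.0, poll_interval=0.05):
    """Hold an exclusive lock file; O_CREAT | O_EXCL makes creation atomic."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            # Another process holds the lock; retry until the deadline passes
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll_interval)
    try:
        yield
    finally:
        os.close(fd)
        os.unlink(lock_path)


# Usage
lock_path = os.path.join(tempfile.mkdtemp(), "resource.lock")
with exclusive_lock(lock_path):
    held = os.path.exists(lock_path)
released = not os.path.exists(lock_path)
```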
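IV. State Management
1. Test data factory
# test_utils.py
class DataFactory:
    def __init__(self, namespace):
        self.counter = 0
        self.namespace = namespace
    def unique_email(self):
        self.counter += 1
        return f"user_{self.namespace}_{self.counter}@test.com"
# conftest.py
import pytest
from test_utils import DataFactory
@pytest.fixture(scope="module")
def data_factory(request):
    worker_id = getattr(request.config, "workerinput", {}).get("workerid", "master")
    # The fixture is module-scoped, so the counter restarts for every module.
    # Including the module name keeps two modules on the same worker from generating the same address.
    return DataFactory(f"{worker_id}_{request.module.__name__}")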
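2. Redis namespaces
import pytest
import redis
@pytest.fixture(scope="module")
def redis_client(request):
    worker_id = getattr(request.config, "workerinput", {}).get("workerid", "master")
    # SELECT needs an integer index, but xdist worker ids look like "gw0", "gw1", ...
    # Map gwN to database N + 1 and keep database 0 for non-parallel runs;
    # a default Redis server has 16 databases (0-15).
    db_index = int(worker_id[2:]) + 1 if worker_id.startswith("gw") else 0
    client = redis.Redis(db=db_index)
    yield client
    client.flushdb()
    client.close()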
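Logical databases are a limited resource (16 on a default server), so runs with many workers may prefer key prefixes instead. The two helpers below sketch both options as plain functions so they can be checked without a Redis server; `worker_db_index`, `namespaced_key`, and the `test:` prefix are assumptions made for illustration.

```python
def worker_db_index(worker_id, databases=16):
    """Map "gwN" to Redis database N + 1; database 0 is kept for non-parallel runs."""
    if not worker_id.startswith("gw"):
        return 0
    index = int(worker_id[2:]) + 1
    if index >= databases:
        raise ValueError(
            f"{worker_id} needs database {index}, but only {databases} exist"
        )
    return index


def namespaced_key(worker_id, key):
    """Prefix a key so that workers sharing one database cannot collide."""
    return f"test:{worker_id}:{key}"


index = worker_db_index("gw0")
key = namespaced_key("gw3", "session:42")
```

With key prefixes, cleanup has to delete only the keys under the worker's prefix rather than calling `flushdb`, which would wipe the other workers' data too.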
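V. External Service Isolation
1. Dedicated test accounts
import pytest
@pytest.fixture(scope="module")
def api_client(request):
    worker_id = getattr(request.config, "workerinput", {}).get("workerid", "master")
    # APIClient is the project's own client class (not shown here).
    # Teardown deletes the account, so it is assumed to be provisioned again before the next module logs in.
    client = APIClient()
    client.login(
        username=f"worker_{worker_id}",
        password=f"password_{worker_id}",
    )
    yield client
    client.delete_account()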
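2. Service mocking
import pytest
from unittest import mock
@pytest.fixture(scope="module")
def mock_service(request):
    worker_id = getattr(request.config, "workerinput", {}).get("workerid", "master")
    # Replace service.endpoint with a worker-specific URL; with new=..., the patcher yields that value itself
    with mock.patch("service.endpoint", new=f"http://mock/{worker_id}") as m:
        yield m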
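Patching the endpoint only helps if something answers at the patched URL. One possible approach, sketched here with the standard library only, is to start a throwaway HTTP stub per worker on a port chosen by the operating system, so that workers never compete for a port. `StubHandler`, `stub_service`, and the JSON payload are hypothetical names and shapes for illustration.

```python
import json
import threading
import urllib.request
from contextlib import contextmanager
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class StubHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps({"path": self.path, "status": "ok"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep test output quiet


@contextmanager
def stub_service():
    """Start a stub HTTP server on a free port and yield its base URL."""
    # Port 0 asks the OS for any free port, so parallel workers cannot clash
    server = ThreadingHTTPServer(("127.0.0.1", 0), StubHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        yield f"http://127.0.0.1:{server.server_port}"
    finally:
        server.shutdown()
        server.server_close()
        thread.join()


# Bypass any proxy configured in the environment for this local request
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with stub_service() as base_url:
    with opener.open(f"{base_url}/health") as response:
        payload = json.load(response)
```

In a fixture, the yielded base URL would be the value passed to `mock.patch(..., new=...)` in place of the fixed `http://mock/{worker_id}` string.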
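VI. Verification
1. Contamination detection script
# conftest.py
def pytest_addoption(parser):
    parser.addoption("--check-contamination", action="store_true", default=False)
def pytest_sessionfinish(session, exitstatus):
    # Under xdist this hook also fires in every worker; run the checks once, on the controller
    if hasattr(session.config, "workerinput"):
        return
    if session.config.getoption("--check-contamination"):
        # Project-specific checks, to be implemented for your own resources
        check_database_contamination()
        check_file_system_conflicts()
2. Usage
pytest -n 4 --dist=loadscope --check-contamination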
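The two check functions above are left to the project. As one possible shape for the file system check, the sketch below compares a snapshot of a directory taken before the run with one taken afterwards and reports files that were left behind. The function names and the example file layout are assumptions, not part of pytest.

```python
import tempfile
from pathlib import Path


def snapshot(root):
    """Return the set of file paths under root, relative to root."""
    root = Path(root)
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def leftover_files(before, after):
    """Files that exist after the run but did not exist before it."""
    return sorted(after - before)


# Usage: simulate a run that leaves one file behind
root = Path(tempfile.mkdtemp())
(root / "fixtures.json").write_text("{}")
before = snapshot(root)

(root / "cache").mkdir()
(root / "cache" / "stale.tmp").write_text("left behind")
after = snapshot(root)

report = leftover_files(before, after)
```

The "before" snapshot could be taken in `pytest_sessionstart` and compared in `pytest_sessionfinish`, failing the run or printing a warning when the report is not empty.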
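VII. CI/CD Integration Example
# GitHub Actions configuration
# --postgres-host and --no-cleanup are project-specific pytest options (registered in conftest.py);
# --no-cleanup is assumed to be a boolean flag, so it is passed only on pull requests.
jobs:
  test:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:13
        env:
          POSTGRES_PASSWORD: postgres
        ports:
          - 5432:5432
    strategy:
      matrix:
        worker: [1, 2, 3]  # each matrix job runs pytest with this many xdist workers
    steps:
      # The job runs directly on the runner, so the service container is reached through the mapped port on localhost
      - run: |
          pytest -n ${{ matrix.worker }} \
            --dist=loadscope \
            --postgres-host=localhost \
            ${{ github.event_name == 'pull_request' && '--no-cleanup' || '' }}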
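Best Practices Summary
- Naming convention: every resource name includes worker_id and module_name
- Cleanup strategy: skip cleanup on PR builds to ease debugging; enforce cleanup on mainline builds
- Monitoring: put a contamination detection mechanism in place
- Documentation: state the isolation scheme explicitly in the project docs
- Incremental rollout: apply isolation to the most sensitive tests first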