Python: Multiprocessing with sqlite3

I have a SQLite3 database and I need to parse 10,000 files. From each file I read some data, then query the database with that data to get a result. My code works fine in a single-process setup, but I get an error when I try to use a multiprocessing pool.

My approach without multiprocessing (works OK):

1. Open DB connection object

2. for f in files:
       foo(f, x1=x1, x2=x2, ..., db=DB)

3. Close DB

My approach with multiprocessing (does NOT work):

1. Open DB

2. pool = multiprocessing.Pool(processes=4)

3. pool.map(functools.partial(foo, x1=x1, x2=x2, ..., db=DB), files)

4. pool.close()

5. Close DB

I get the following error: sqlite3.ProgrammingError: Base Cursor.__init__ not called.
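The root cause is that sqlite3 connection and cursor objects cannot be pickled, and pool.map has to pickle every argument it sends to a worker; the exact traceback varies by Python version, but the object never survives the trip. A minimal sketch of the underlying problem, independent of the code in the question:

```python
import pickle
import sqlite3

conn = sqlite3.connect(":memory:")
try:
    # Pool workers receive their arguments via pickling; a Connection
    # (or a Cursor) cannot be serialized this way.
    pickle.dumps(conn)
except Exception as e:
    print("cannot be pickled:", e)
conn.close()
```

This is why passing `db=DB` through `functools.partial` into `pool.map` fails, while the same call works in a single process.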

My DB class is implemented as follows:

def open_db(sqlite_file):
    """Open SQLite database connection.

    Args:
        sqlite_file -- File path

    Return:
        Connection
    """
    log.info('Open SQLite database %s', sqlite_file)
    try:
        conn = sqlite3.connect(sqlite_file)
    except sqlite3.Error as e:
        log.error('Unable to open SQLite database %s', e.args[0])
        sys.exit(1)
    return conn

def close_db(conn, sqlite_file):
    """Close SQLite database connection.

    Args:
        conn -- Connection
    """
    if conn:
        log.info('Close SQLite database %s', sqlite_file)
        conn.close()

class MapDB:

    def __init__(self, sqlite_file):
        """Initialize.

        Args:
            sqlite_file -- File path
        """
        # 1. Open database.
        # 2. Set up to receive rows as dict-like objects.
        # 3. Get cursor to execute queries.
        self._sqlite_file = sqlite_file
        self._conn = open_db(sqlite_file)
        self._conn.row_factory = sqlite3.Row
        self._cursor = self._conn.cursor()

    def close(self):
        """Close DB connection."""
        if self._cursor:
            self._cursor.close()
        close_db(self._conn, self._sqlite_file)

    def check(self):
        ...

    def get_driver_net(self, net):
        ...

    def get_cell_id(self, net):
        ...

The function foo() looks like this:

def foo(f, x1, x2, db):
    # extract some data from file f
    r1 = db.get_driver_net(...)
    r2 = db.get_cell_id(...)

The overall implementation that does not work is:

mapdb = MapDB(sqlite_file)
log.info('Create NetInfo objects')
pool = multiprocessing.Pool(processes=4)
files = [...]  # get list of files to process
pool.map(functools.partial(foo, x1=x1, x2=x2, db=mapdb), files)
pool.close()
mapdb.close()

To fix this, I think I need to create the MapDB() object inside each pool worker (so there are 4 parallel, independent connections), but I don't know how to do that. Can someone show me how to achieve this with Pool?

Best answer: How about defining foo like this:

def foo(f, x1, x2, db_path):
    mapdb = MapDB(db_path)  # open mapdb
    # ... process data ...
    mapdb.close()           # close mapdb

Then change the pool.map call to:

pool.map(functools.partial(foo, x1=x1, x2=x2, db_path="path-to-sqlite3-db"), files)

Update

Another option is to manage the worker threads yourself and distribute the work through a queue.

from queue import Queue
from threading import Thread

q = Queue()

def worker():
    mapdb = ...  # open the sqlite database
    while True:
        item = q.get()
        if item[0] == "file":
            f = item[1]
            # ... process file ...
            q.task_done()
        else:
            q.task_done()
            break
    # ... close sqlite connection ...

# Start up the workers
nworkers = 4
for i in range(nworkers):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

# Place work on the Queue
for x in ...:  # list of files
    q.put(("file", x))

# Place termination tokens onto the Queue
for i in range(nworkers):
    q.put(("end",))

# Wait for all work to be done.
q.join()

The termination tokens are there to make sure each worker closes its sqlite connection before exiting, if that matters.
