Python: Multiprocessing with sqlite3

I have a SQLite3 database and I need to parse 10,000 files. From each file I read some data, then query the database with that data to get a result. My code works fine in a single-process setup, but I get an error when I try to use a multiprocessing pool.

My approach without multiprocessing (works OK):

1. Open DB connection object

2. for f in files:
       foo(f, x1=x1, x2=x2, ..., db=DB)

3. Close DB

My approach with multiprocessing (does NOT work):

1. Open DB

2. pool = multiprocessing.Pool(processes=4)

3. pool.map(functools.partial(foo, x1=x1, x2=x2, ..., db=DB), files)

4. pool.close()

5. Close DB

I get the following error: sqlite3.ProgrammingError: Base Cursor.__init__ not called.
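The root cause is that sqlite3 connection and cursor objects cannot be pickled, and pool.map has to pickle every argument it sends to a worker; the exact traceback varies by Python version, but the object never survives the trip. A minimal sketch of the underlying problem, independent of the code in the question:

```python
import pickle
import sqlite3

conn = sqlite3.connect(":memory:")
try:
    # Pool workers receive their arguments via pickling; a Connection
    # (or a Cursor) cannot be serialized this way.
    pickle.dumps(conn)
except Exception as e:
    print("cannot be pickled:", e)
conn.close()
```

This is why passing `db=DB` through `functools.partial` into `pool.map` fails, while the same call works in a single process.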

My DB class is implemented as follows:

def open_db(sqlite_file):
    """Open SQLite database connection.

    Args:
        sqlite_file -- File path

    Return:
        Connection
    """
    log.info('Open SQLite database %s', sqlite_file)
    try:
        conn = sqlite3.connect(sqlite_file)
    except sqlite3.Error as e:
        log.error('Unable to open SQLite database %s', e.args[0])
        sys.exit(1)
    return conn

def close_db(conn, sqlite_file):
    """Close SQLite database connection.

    Args:
        conn -- Connection
    """
    if conn:
        log.info('Close SQLite database %s', sqlite_file)
        conn.close()

class MapDB:

    def __init__(self, sqlite_file):
        """Initialize.

        Args:
            sqlite_file -- File path
        """
        # 1. Open database.
        # 2. Set up to receive rows as dict-like objects.
        # 3. Get cursor to execute queries.
        self._sqlite_file = sqlite_file
        self._conn = open_db(sqlite_file)
        self._conn.row_factory = sqlite3.Row
        self._cursor = self._conn.cursor()

    def close(self):
        """Close DB connection."""
        if self._cursor:
            self._cursor.close()
        close_db(self._conn, self._sqlite_file)

    def check(self):
        ...

    def get_driver_net(self, net):
        ...

    def get_cell_id(self, net):
        ...

The function foo() looks like this:

def foo(f, x1, x2, db):
    # extract some data from file f
    r1 = db.get_driver_net(...)
    r2 = db.get_cell_id(...)

The overall implementation that does not work is:

mapdb = MapDB(sqlite_file)
log.info('Create NetInfo objects')
pool = multiprocessing.Pool(processes=4)
files = [...]  # get list of files to process
pool.map(functools.partial(foo, x1=x1, x2=x2, db=mapdb), files)
pool.close()
mapdb.close()

To fix this, I think I need to create the MapDB() object inside each pool worker (so there are 4 parallel, independent connections), but I don't know how to do that. Can someone show me how to achieve this with Pool?

Best answer: How about defining foo like this:

def foo(f, x1, x2, db_path):
    mapdb = MapDB(db_path)  # open mapdb
    # ... process data ...
    mapdb.close()           # close mapdb

Then change the pool.map call to:

pool.map(functools.partial(foo, x1=x1, x2=x2, db_path="path-to-sqlite3-db"), files)

Update

Another option is to manage the worker threads yourself and distribute the work through a queue.

from queue import Queue
from threading import Thread

q = Queue()

def worker():
    mapdb = ...  # open the sqlite database
    while True:
        item = q.get()
        if item[0] == "file":
            f = item[1]
            # ... process file ...
            q.task_done()
        else:
            q.task_done()
            break
    # ... close sqlite connection ...

# Start up the workers
nworkers = 4
for i in range(nworkers):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

# Place work on the Queue
for x in ...:  # list of files
    q.put(("file", x))

# Place termination tokens onto the Queue
for i in range(nworkers):
    q.put(("end",))

# Wait for all work to be done.
q.join()

The termination tokens are there to make sure each worker closes its sqlite connection before exiting, if that matters.
