I'm writing an API that lets users run a query against several hundred databases at once — against a single database or several of them. When multiple databases are queried, they are accessed in parallel to speed up the response. My terminology: a "q-job" is one query executed against one or more databases.
Since it's an internal tool, I don't expect many users (currently around 5000 queries per day), but some limits still seem sensible: only X queries may run concurrently against a single database,
and only Y q-jobs may run at the same time.
So far I can parallelize the per-database queries with multiprocessing.Pool, but I'm stuck on queueing the q-jobs and throttling the per-database queries, using multiprocessing.Queue together with the pool (I've also considered multiprocessing.Semaphore).
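Roughly what I have in mind for the two limits, sketched with BoundedSemaphore (the names and the limit values X/Y are placeholders, not real code):

```python
import multiprocessing

MAX_JOBS = 4            # Y: q-jobs allowed at once (placeholder value)
MAX_QUERIES_PER_DB = 2  # X: concurrent queries per database (placeholder value)

job_slots = multiprocessing.BoundedSemaphore(MAX_JOBS)
db_slots = {}  # one semaphore per database connection string

def get_db_semaphore(conn_string):
    # Create the per-database semaphore on first use, in the parent
    # process (before workers are forked, so children inherit it)
    if conn_string not in db_slots:
        db_slots[conn_string] = multiprocessing.BoundedSemaphore(MAX_QUERIES_PER_DB)
    return db_slots[conn_string]
```

A q-job would acquire job_slots before starting, and each per-database query would acquire its database's semaphore before opening a connection.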
Some pseudo-code to illustrate my idea:

# Prepare pool and queues for multiprocessed tasks
process_pool = multiprocessing.Pool(processes=max_processes)
job_queue = multiprocessing.Queue()      # Queue of single- and multi-database q-jobs
results_queue = multiprocessing.Queue()  # Queue that holds results

# Query a database
def __execute_query_in_db(args):
    # [Open DB connection]
    # [Execute query]
    # [Close DB connection]
    # [Return response]
    pass

# Main worker
def __main_worker(w_queue):
    '''Runs in the background and checks for new entries in the job queue
    '''
    while True:                           # Endlessly check for new items in the queue
        conn_strings = w_queue.get(True)  # Block until a job arrives
        __query_worker(conn_strings)      # Run the query

# Query worker
def __query_worker(conn_strings):
    # Run individual DB queries in the pool
    queries = process_pool.map_async(__execute_query_in_db, conn_strings)
    try:
        retArr = queries.get(timeout=1200)  # Wait for the pooled queries; time out after 20 minutes
        # NOTE: We never reach this point!!!
    except Exception:
        logger.error("TS-API Unexpected error: %s", sys.exc_info()[0])
        process_pool.terminate()
        results = "Error: %s" % (sys.exc_info()[0],)
    else:                                   # Wait until all queries have finished
        process_pool.close()
        process_pool.join()
        logger.debug("Got %s results" % len(retArr))
        ## Ignore empty responses
        results = []
        for result in retArr:
            if result is not None:
                results.extend(result)
    logger.debug("Putting results in results_queue")
    results_queue.put(results)

# Instantiate single-process job pool that monitors new query requests
job_pool = multiprocessing.Pool(processes=1, initializer=__main_worker, initargs=(job_queue,))

# Invoked when an API call is made...
def ExecuteSuperQuery(params=None):
    # [Procure connection string(s)]
    job_queue.put(conn_strings)  # Send connection information to the job queue
    while True:                  # Wait for a response on the results queue
        if not results_queue.empty():
            return results_queue.get()
        time.sleep(2)
    # [Somehow get the results and show them as a (decorated) JSON string]
When I run code like this and execute a query against a database, the process never seems to get past the line marked "NOTE:". There is no error message; queries.get simply never appears to finish.
I also tried running a separate process that instantiates a pool on demand for each query against one or more databases, but then I got an error saying that a child process cannot run child processes. Go figure.
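For what it's worth, my current suspicion about the hang: process_pool is created in the parent but map_async is called on it from inside the job_pool worker process, and a Pool can't be driven from a process other than the one that created it — and pool workers are daemonic, which would also explain the "child process cannot run child processes" error. One restructuring that should sidestep both problems would be to keep the pool in the main process and drain the job queue from a thread instead of a second Pool. A minimal sketch of what I mean (execute_query_in_db is just a stand-in, not my real query code):

```python
import multiprocessing
import threading
import queue

def execute_query_in_db(conn_string):
    # Stand-in for: open connection, run query, close connection
    return ["result from %s" % conn_string]

job_queue = queue.Queue()      # plain queues suffice: the thread shares memory
results_queue = queue.Queue()

def main_worker():
    # The pool lives in the main process, so map_async works normally
    pool = multiprocessing.Pool(processes=2)
    while True:
        conn_strings = job_queue.get()
        if conn_strings is None:   # sentinel to shut down
            break
        retArr = pool.map_async(execute_query_in_db, conn_strings).get(timeout=1200)
        # Flatten per-database results, ignoring empty responses
        results = [r for part in retArr if part is not None for r in part]
        results_queue.put(results)
    pool.close()
    pool.join()

worker = threading.Thread(target=main_worker, daemon=True)
worker.start()
```

An API call would then just put its connection strings on job_queue and block on results_queue.get(), with no polling loop needed.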
I'd appreciate the community's help with the following: Is there a better way to queue q-jobs and keep track of their results?
Is there a better way to throttle queries and q-jobs?