I'm writing an API that lets users run a query against several hundred databases at once — against a single database or several of them. When multiple databases are queried, they are accessed in parallel to speed up the response. My terminology: a "q-job" is one query executed against one or more databases.
Since it's an internal tool, I don't expect many users (currently around 5000 queries per day), but some limits still seem sensible: only X queries may run concurrently against a single database,
and only Y q-jobs may run at the same time.
So far I can parallelize the per-database queries with multiprocessing.Pool, but I'm stuck on queueing the q-jobs and throttling the per-database queries, using multiprocessing.Queue together with the pool (I've also considered multiprocessing.Semaphore).
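Roughly what I have in mind for the two limits, sketched with BoundedSemaphore (the names and the limit values X/Y are placeholders, not real code):

```python
import multiprocessing

MAX_JOBS = 4            # Y: q-jobs allowed at once (placeholder value)
MAX_QUERIES_PER_DB = 2  # X: concurrent queries per database (placeholder value)

job_slots = multiprocessing.BoundedSemaphore(MAX_JOBS)
db_slots = {}  # one semaphore per database connection string

def get_db_semaphore(conn_string):
    # Create the per-database semaphore on first use, in the parent
    # process (before workers are forked, so children inherit it)
    if conn_string not in db_slots:
        db_slots[conn_string] = multiprocessing.BoundedSemaphore(MAX_QUERIES_PER_DB)
    return db_slots[conn_string]
```

A q-job would acquire job_slots before starting, and each per-database query would acquire its database's semaphore before opening a connection.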
Some pseudo-code to illustrate my idea:

# Prepare pool and queues for multiprocessed tasks
process_pool = multiprocessing.Pool(processes=max_processes)
job_queue = multiprocessing.Queue()      # Queue of single- and multi-database q-jobs
results_queue = multiprocessing.Queue()  # Queue that holds results

# Query a database
def __execute_query_in_db(args):
    # [Open DB connection]
    # [Execute query]
    # [Close DB connection]
    # [Return response]
    pass

# Main worker
def __main_worker(w_queue):
    '''Runs in the background and checks for new entries in the job queue
    '''
    while True:                           # Endlessly check for new items in the queue
        conn_strings = w_queue.get(True)  # Block until a job arrives
        __query_worker(conn_strings)      # Run the query

# Query worker
def __query_worker(conn_strings):
    # Run individual DB queries in the pool
    queries = process_pool.map_async(__execute_query_in_db, conn_strings)
    try:
        retArr = queries.get(timeout=1200)  # Wait for the pooled queries; time out after 20 minutes
        # NOTE: We never reach this point!!!
    except Exception:
        logger.error("TS-API Unexpected error: %s", sys.exc_info()[0])
        process_pool.terminate()
        results = "Error: %s" % (sys.exc_info()[0],)
    else:                                   # Wait until all queries have finished
        process_pool.close()
        process_pool.join()
        logger.debug("Got %s results" % len(retArr))
        ## Ignore empty responses
        results = []
        for result in retArr:
            if result is not None:
                results.extend(result)
    logger.debug("Putting results in results_queue")
    results_queue.put(results)

# Instantiate single-process job pool that monitors new query requests
job_pool = multiprocessing.Pool(processes=1, initializer=__main_worker, initargs=(job_queue,))

# Invoked when an API call is made...
def ExecuteSuperQuery(params=None):
    # [Procure connection string(s)]
    job_queue.put(conn_strings)  # Send connection information to the job queue
    while True:                  # Wait for a response on the results queue
        if not results_queue.empty():
            return results_queue.get()
        time.sleep(2)
    # [Somehow get the results and show them as a (decorated) JSON string]
When I run code like this and execute a query against a database, the process never seems to get past the line marked "NOTE:". There is no error message; queries.get simply never appears to finish.
I also tried running a separate process that instantiates a pool on demand for each query against one or more databases, but then I got an error saying that a child process cannot run child processes. Go figure.
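For what it's worth, my current suspicion about the hang: process_pool is created in the parent but map_async is called on it from inside the job_pool worker process, and a Pool can't be driven from a process other than the one that created it — and pool workers are daemonic, which would also explain the "child process cannot run child processes" error. One restructuring that should sidestep both problems would be to keep the pool in the main process and drain the job queue from a thread instead of a second Pool. A minimal sketch of what I mean (execute_query_in_db is just a stand-in, not my real query code):

```python
import multiprocessing
import threading
import queue

def execute_query_in_db(conn_string):
    # Stand-in for: open connection, run query, close connection
    return ["result from %s" % conn_string]

job_queue = queue.Queue()      # plain queues suffice: the thread shares memory
results_queue = queue.Queue()

def main_worker():
    # The pool lives in the main process, so map_async works normally
    pool = multiprocessing.Pool(processes=2)
    while True:
        conn_strings = job_queue.get()
        if conn_strings is None:   # sentinel to shut down
            break
        retArr = pool.map_async(execute_query_in_db, conn_strings).get(timeout=1200)
        # Flatten per-database results, ignoring empty responses
        results = [r for part in retArr if part is not None for r in part]
        results_queue.put(results)
    pool.close()
    pool.join()

worker = threading.Thread(target=main_worker, daemon=True)
worker.start()
```

An API call would then just put its connection strings on job_queue and block on results_queue.get(), with no polling loop needed.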
I'd appreciate the community's help with the following: Is there a better way to queue q-jobs and keep track of their results?
Is there a better way to throttle queries and q-jobs?