Reading database data with multiple threads in Python: how to run multithreaded SQL queries in Python so as to get the results of all the queries...

Is there a way to use threads to perform the SQL queries simultaneously so I can cut down on the processing time of my code below? Is there a better method to achieve the same result as below without using the pandas module? Given the size of the data sets I am working with, I cannot store the entire dataset in memory, and I have found that looping over the rows of a SELECT * FROM statement and comparing them against the list I am querying with adds to the processing time.

# DATABASE layout
# _____________________________________________________
# | id          | name               | description     |
# |_____________|____________________|_________________|
# | 1           | John               | Credit Analyst  |
# | 2           | Jane               | Doctor          |
# | ...         | ...                | ...             |
# | 5000000     | Mohammed           | Dentist         |
# |_____________|____________________|_________________|

import sqlite3

SEARCH_IDS = [x for x in range(15000)]
DATABASE_NAME = 'db.db'

def chunks(wholeList, chunkSize=999):
    """Yield successive n-sized chunks from wholeList."""
    for i in range(0, len(wholeList), chunkSize):
        yield wholeList[i:i + chunkSize]

def search_database_for_matches(listOfIdsToMatch):
    '''Takes a list of ids and returns the rows'''
    conn = sqlite3.connect(DATABASE_NAME)
    cursor = conn.cursor()
    sql = "SELECT id, name, description FROM datatable WHERE id IN ({})".format(', '.join(["?" for x in listOfIdsToMatch]))
    cursor.execute(sql, tuple(listOfIdsToMatch))
    rows = cursor.fetchall()
    return rows

def arrange(orderOnList, listToBeOrdered, defaultReturnValue='N/A'):
    '''Takes a list of ids in the desired order and a list of tuples which have ids as the first items.
    The list of tuples is arranged into a new list corresponding to the order of the source list.'''
    from collections import OrderedDict
    resultList = [defaultReturnValue for x in orderOnList]
    indexLookUp = OrderedDict([(value, key) for key, value in enumerate(orderOnList)])
    for item in listToBeOrdered:
        resultList[indexLookUp[item[0]]] = item
    return resultList

def main():
    results = []
    for chunk in chunks(SEARCH_IDS, 999):
        results += search_database_for_matches(chunk)
    results = arrange(SEARCH_IDS, results)
    print(results)

if __name__ == '__main__':
    main()

Solution

Some advice:

Instead of reading the records in chunks using an iterator, you ought to use pagination.

See this question:
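As a rough illustration, here is a minimal sketch of keyset pagination against the same table, assuming the db.db / datatable schema from the question; instead of re-scanning the table, each query resumes after the last id seen on the previous page.

import sqlite3

def paginated_rows(pageSize=1000):
    '''Yield rows from datatable in id order, one page at a time.'''
    conn = sqlite3.connect('db.db')
    cursor = conn.cursor()
    lastId = 0
    while True:
        cursor.execute(
            "SELECT id, name, description FROM datatable "
            "WHERE id > ? ORDER BY id LIMIT ?",
            (lastId, pageSize))
        rows = cursor.fetchall()
        if not rows:
            break
        yield rows
        lastId = rows[-1][0]  # resume after the last id of this page
    conn.close()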

If you're using multithreading/multiprocessing, make sure your database can support it.

See: SQLite And Multiple Threads
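For what it's worth, since search_database_for_matches() above opens a fresh connection on every call, no sqlite3 connection is ever shared between threads, so it can be handed to a thread pool as-is. One possible threaded variant with concurrent.futures (the worker count of 4 is an arbitrary choice):

from concurrent.futures import ThreadPoolExecutor

def main_threaded():
    results = []
    # Each worker runs search_database_for_matches, which opens
    # its own sqlite3 connection, so nothing is shared across threads.
    with ThreadPoolExecutor(max_workers=4) as executor:
        for rows in executor.map(search_database_for_matches,
                                 chunks(SEARCH_IDS, 999)):
            results += rows
    return arrange(SEARCH_IDS, results)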

To implement what you want, you can use a pool of workers which work on each chunk. See Using a pool of workers in the Python documentation.

Example:

import multiprocessing

with multiprocessing.Pool(processes=4) as pool:
    results = pool.map(search_database_for_matches,
                       [chunk for chunk in chunks(SEARCH_IDS, 999)])
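Note that pool.map returns one list of rows per chunk; assuming you want the same final shape as the question's main(), you still need to flatten the nested lists and re-order them with arrange():

flat = [row for rows in results for row in rows]
ordered = arrange(SEARCH_IDS, flat)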
