I am working on a Linux web server that runs Python code to grab realtime data over HTTP from a 3rd party API. The data is put into a MySQL database.
I need to make a lot of queries to a lot of URL's, and I need to do it fast (faster = better). Currently I'm using urllib3 as my HTTP library.
What is the best way to go about this? Should I spawn multiple threads (if so, how many?) and have each query for a different URL?
I would love to hear your thoughts about this - thanks!
Solution
If "a lot" really is a lot, then you probably want to use asynchronous I/O rather than threads.
GRequests allows you to use Requests with Gevent to make asynchronous HTTP requests easily.
import grequests

urls = [
    'http://www.heroku.com',
    'http://tablib.org',
    'http://httpbin.org',
    'http://python-requests.org',
    'http://kennethreitz.com'
]

# Build a set of unsent requests, then send them all concurrently.
rs = (grequests.get(u) for u in urls)
responses = grequests.map(rs)
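If you would rather stay with urllib3 and plain threads, a thread pool is the simpler route. A minimal sketch (the `fetch_all` helper and the worker count of 20 are my own assumptions, not anything from the question):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_all(urls, fetch, max_workers=20):
    """Apply `fetch` to every URL concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))

# With urllib3 (the library already in use), `fetch` might look like:
#   http = urllib3.PoolManager()
#   def fetch(url):
#       return http.request("GET", url).data
```

Because this work is I/O-bound (threads spend most of their time waiting on the network, so the GIL is not a bottleneck), dozens of threads are usually fine; benchmark against your actual URL list to pick the worker count.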