I remember that back when I wrote crawlers, I used to edit the /etc/hosts file directly to avoid repeated DNS queries. I recently came across a more elegant solution; my adapted version is recorded below (the code is Python 2):
import socket

_dnscache = {}

def _setDNSCache():
    """
    Install a cached version of socket.getaddrinfo to avoid repeated DNS requests.
    """
    def _getaddrinfo(*args, **kwargs):
        # Use the positional arguments as the cache key; each distinct
        # (host, port, ...) tuple is resolved only once.
        if args in _dnscache:
            print str(args) + " in cache"
            return _dnscache[args]
        else:
            print str(args) + " not in cache"
            _dnscache[args] = socket._getaddrinfo(*args, **kwargs)
            return _dnscache[args]

    # Save the original resolver on first use so the patch is only applied once.
    if not hasattr(socket, '_getaddrinfo'):
        socket._getaddrinfo = socket.getaddrinfo
        socket.getaddrinfo = _getaddrinfo

def test():
    _setDNSCache()
    import urllib
    # Two requests to the same host: only the first should trigger a DNS lookup.
    urllib.urlopen('http://www.baidu.com')
    urllib.urlopen('http://www.baidu.com')

test()
The output is as follows:
('www.baidu.com', 80, 0, 1) not in cache
('www.baidu.com', 80, 0, 1) in cache
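For what it's worth, on Python 3 the same monkey-patch can be written more compactly with functools.lru_cache. The sketch below is my own adaptation rather than part of the original snippet; the attribute name _orig_getaddrinfo and the maxsize of 256 are arbitrary choices.

import functools
import socket
import urllib.request

# Save the original resolver once so the patch is only applied once
# (_orig_getaddrinfo is a made-up attribute name, not part of the socket module).
if not hasattr(socket, '_orig_getaddrinfo'):
    socket._orig_getaddrinfo = socket.getaddrinfo

    @functools.lru_cache(maxsize=256)
    def _cached_getaddrinfo(*args, **kwargs):
        return socket._orig_getaddrinfo(*args, **kwargs)

    socket.getaddrinfo = _cached_getaddrinfo

urllib.request.urlopen('http://www.baidu.com')
urllib.request.urlopen('http://www.baidu.com')
print(socket.getaddrinfo.cache_info())  # the second request should show up as a cache hit

Like the original snippet, this caches resolved addresses for the lifetime of the process and ignores DNS TTLs entirely, so it only makes sense for short-lived crawler runs.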
