Accessing web pages through a proxy
Some websites reject direct requests from certain IPs, so here we access them through a simple proxy.
The proxy used here is on the local network; configure your own as needed.
First, check that the proxy string is in a valid format:
def checkProxy(self, proxy):
    # The proxy must look like protocol://a.b.c.d:port
    proxyMatch = re.compile(r'http[s]?://\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:\d{1,5}$')
    if not re.search(proxyMatch, proxy):
        print("Invalid proxy format")
        exit()
    flag = 1
    proxy = proxy.replace('//', '')
    try:
        protocol = proxy.split(':')[0]
        ip = proxy.split(':')[1]
        port = proxy.split(':')[2]
    except IndexError:
        print("Index out of range")
        exit()
    flag = flag and len(proxy.split(':')) == 3 and len(proxy.split('.')) == 4
    # Each octet must be in its valid range; the first and last octets must be non-zero
    flag = ip.split('.')[0] in map(str, range(1, 256)) and flag
    flag = ip.split('.')[1] in map(str, range(256)) and flag
    flag = ip.split('.')[2] in map(str, range(256)) and flag
    flag = ip.split('.')[3] in map(str, range(1, 255)) and flag
    flag = port in map(str, range(1, 65536)) and flag
    if flag:
        print("Format is valid")
    else:
        exit()
Here proxyMatch is a regular expression that checks the overall proxy format; the string is then split on the separators so that the protocol, IP address, and port number can each be validated individually.
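The per-octet checks above can also be delegated to the standard library. A minimal sketch, assuming the same `protocol://a.b.c.d:port` format (the function name `check_proxy_format` is mine, not from the original code):

```python
import ipaddress
from urllib.parse import urlparse

def check_proxy_format(proxy):
    """Return True if proxy looks like http(s)://a.b.c.d:port."""
    parts = urlparse(proxy)
    if parts.scheme not in ('http', 'https'):
        return False
    try:
        ipaddress.IPv4Address(parts.hostname)  # each octet must be 0-255
        port = parts.port                      # raises ValueError if out of range
    except (ipaddress.AddressValueError, TypeError, ValueError):
        return False
    return port is not None

print(check_proxy_format('http://192.168.1.10:8080'))  # True
print(check_proxy_format('http://999.1.1.1:8080'))     # False
```

`ipaddress.IPv4Address` rejects any octet outside 0-255, and `urlparse(...).port` rejects ports outside the valid range, so no manual range tables are needed.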
def useProxy(self, proxy):
    # Install an opener that routes requests through the proxy
    protocol = proxy.split('://')[0]
    proxy_Handler = urllib.request.ProxyHandler({protocol: proxy})
    opener = urllib.request.build_opener(proxy_Handler)
    urllib.request.install_opener(opener)
    try:
        with urllib.request.urlopen(self.url, timeout=self.timeout) as response:
            result = response.read().decode('utf-8')
    except Exception as e:
        print("Connection error")
        exit()
    print("%s" % result)
    if re.search(self.flagWord, result):
        print('Proxy is usable')
    else:
        print('Proxy is not usable')
The proxy is used to fetch the Baidu homepage; if the returned page contains the marker flagWord, the proxy is considered usable.
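The same idea can be written as two standalone helpers without installing the opener globally (`make_proxy_opener`, `check_proxy` and the placeholder proxy address below are mine, not from the original code):

```python
import urllib.request

def make_proxy_opener(proxy):
    """Build (but don't install globally) an opener that routes traffic via proxy.

    proxy is e.g. 'http://127.0.0.1:8080' (a placeholder address).
    """
    protocol = proxy.split('://')[0]
    handler = urllib.request.ProxyHandler({protocol: proxy})
    return urllib.request.build_opener(handler)

def check_proxy(opener, url, flag_word, timeout=5):
    """Fetch url through the opener; report whether flag_word appears in the page."""
    try:
        with opener.open(url, timeout=timeout) as response:
            page = response.read().decode('utf-8')
    except Exception:
        return False
    return flag_word in page

opener = make_proxy_opener('http://127.0.0.1:8080')
```

Passing the opener around instead of calling `urllib.request.install_opener` avoids changing the proxy for every other `urlopen` call in the process, which matters when testing many proxies in turn.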