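Design approach:
A certain "大爷" proxy site has anti-scraping measures in place,
so I could not crawl its free proxy IP list directly.
Instead, I copied the list into a txt file and cleaned the data into the format I wanted.
I then tested each proxy IP for anonymity and availability against a certain Taiwanese site (某.tw/), and wrote the IPs that passed to the output file.
The steps are as follows:
1. Save the IP addresses copied from the web page to a txt file, then clean the data. The code is as follows: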
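# Normalize the pasted table: turn spaces into commas, then join rows that were split across lines.
with open('file/IP-DL/ip-01.txt', 'r', encoding='utf-8') as fr, \
        open('file/IP-DL/IP-02.txt', 'w', encoding='utf-8') as fd:
    b = fr.read()
    b = b.replace(' ', ',')
    # Longest pattern first, so a run of trailing commas plus newline becomes one separator.
    b = b.replace(',,,\n', ',')
    b = b.replace(',,\n', ',')
    b = b.replace(',\n', ',')
    fd.write(b)
# No explicit close() calls are needed: the with statement closes both files.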
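To make the effect of the replacement chain easier to see, here is a minimal sketch that applies the same replacements to an in-memory string. The helper name `normalize` and the sample text are assumptions for illustration only; the post does not show what the pasted text actually looks like.

```python
def normalize(raw: str) -> str:
    """Apply the same replacements as the cleaning step to a string."""
    text = raw.replace(' ', ',')
    # Longest pattern first: a trailing run of commas followed by a newline
    # is collapsed into a single comma, which joins the split row.
    for pattern in (',,,\n', ',,\n', ',\n'):
        text = text.replace(pattern, ',')
    return text


# Hypothetical sample: one table row pasted as three lines with trailing spaces.
sample = "1.2.3.4 \n8080 \n高匿\n"
print(repr(normalize(sample)))  # prints '1.2.3.4,8080,高匿\n'
```

Only a newline that directly follows a comma is removed, so a line that ends without trailing spaces still ends its row.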
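# Keep only non-transparent proxies ('透明' means "transparent") whose sixth
# column (index 5) is below 500, and build an ip:port line for each.
# The input name matches the case used when the file was written ('IP-02.txt').
with open('file/IP-DL/IP-02.txt', 'r', encoding='utf-8') as fd1, \
        open('file/IP-DL/IP-03.txt', 'w+', encoding='utf-8') as fr1:
    for text in fd1:
        fields = text.split(',')
        if len(text) > 10 and '透明' not in text and int(fields[5]) < 500:
            a1 = fields[0].strip()  # IP address
            b1 = fields[1].strip()  # port
            c1 = fields[5].strip()  # sixth column, already checked against 500
            d1 = a1 + ':' + b1 + '\n'
            print('D1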