Python Project: Scraping Proxy IP Addresses

Latest recommended article published 2023-06-02 18:43:41

小六工作室

Article link: https://blog.csdn.net/lyffly2011/article/details/50551146

Scrape the proxy IP addresses listed on a website, parse them, and save them to a text file.

Practice source code

# coding=utf-8

####################################################
# Written by 刘云飞
####################################################

import requests
import re

URL_S = "http://www.xicidaili.com/"
# Request headers copied from a browser session so the request resembles a normal page visit.
headers = {
    'Host': 'www.xicidaili.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3',
    'Accept-Encoding': 'gzip, deflate',
    'Cookie':'_free_proxy_session=BAh7B0kiD3Nlc3Npb25faWQGOgZFVEkiJTYxMDdmMjBlZGVjMTMyN2QxZjVmMTM1OGI1ZWRiNTVmBjsAVEkiEF9jc3JmX3Rva2VuBjsARkkiMVQzaWNQazE2ZHovZ0NReWFKeFpMakp3dURJOVpyMkZXNUp6WUVqNjJJZ2c9BjsARg%3D%3D--fcb2c5aed90070f18b85d2262278f9e5811f6b56; CNZZDATA1256960793=1456382766-1453291871-http%253A%252F%252Fwww.baidu.com%252F%7C1453291871',
    'Connection':'keep-alive',
    # 'If-None-Match' is omitted: a matching ETag would return 304 Not Modified with an empty body, leaving nothing to parse.
    'Cache-Control': 'max-age=0'
}

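# Fetch the page through a session, then pull out every IP/port pair.
sess = requests.Session()
resp = sess.get(URL_S, headers=headers, timeout=10)
resp.raise_for_status()  # stop on an HTTP error instead of parsing an error page
text = resp.text
# Each match is a 5-tuple: four IP octets, then the port from the next table cell.
comp = re.compile(r'(?isu)<td>(\d+)\.(\d+)\.(\d+)\.(\d+)</td>\s*<td>(\d+)</td>')
all_ip = comp.findall(text)
str_all = ""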

for ip in all_ip:
    # Join the four octets with dots and separate the port with a colon (ip:port).
    str_all += '.'.join(ip[:4]) + ':' + ip[4] + "\n"
    print(ip)

with open('ip.txt','w') as f:
    f.write(str_all)
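
The parsing step can be tried without any network access. The following is a minimal sketch, not part of the original script: the function name `extract_proxies`, the `sample` markup, and the extra validity checks (octets up to 255, port between 1 and 65535) are assumptions added for illustration. It reuses the same table-cell pattern as the script above.

```python
import ipaddress
import re

# Same idea as the pattern in the script above: an IP cell followed by a port cell.
PROXY_ROW = re.compile(r'(?is)<td>(\d+)\.(\d+)\.(\d+)\.(\d+)</td>\s*<td>(\d+)</td>')


def extract_proxies(html):
    """Return 'ip:port' strings found in html, skipping invalid addresses."""
    proxies = []
    for *octets, port in PROXY_ROW.findall(html):
        ip = '.'.join(octets)
        try:
            ipaddress.IPv4Address(ip)  # raises ValueError for octets above 255
        except ValueError:
            continue
        if 0 < int(port) < 65536:
            proxies.append(ip + ':' + port)
    return proxies


# Hypothetical sample markup, written for this example only.
sample = """
<tr><td>10.0.0.1</td>
    <td>8080</td></tr>
<tr><td>999.1.1.1</td><td>80</td></tr>
"""
print(extract_proxies(sample))
```

Keeping the parsing in a function that takes a string makes it possible to check the regular expression against saved HTML before pointing the script at a live site.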