Web Scraping Practice: An IP Address Downloader

Latest recommended article published on 2023-08-10 14:45:27

shawncheer


Categories: Python learning, web scraping learning

Article link: https://blog.csdn.net/shawncheer/article/details/50776178


import urllib.request
import re

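def open_url(url):
    # Send a browser-like User-Agent; some sites reject urllib's default one.
    req = urllib.request.Request(url)
    req.add_header('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6')
    # The with-block closes the connection once the body has been read.
    with urllib.request.urlopen(req) as page:
        html = page.read().decode('utf-8')

    return html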

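def get_ips(html):
    # (?:...) is a non-capturing group, so findall returns whole matches.
    # Longer alternatives come first and \b bounds the match, so a final
    # octet such as 255 is not cut short to 25.
    octet = r'(?:25[0-5]|2[0-4]\d|[01]?\d?\d)'
    p = r'\b(?:' + octet + r'\.){3}' + octet + r'\b'
    iplist = re.findall(p, html)

    for each in iplist:
        print(each)


if __name__ == '__main__':
    url = "http://www.xicidaili.com/"  # proxy list site
    get_ips(open_url(url))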
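
The IP-matching step can be tried offline, without fetching any page. The sketch below is only an illustration: `extract_ips`, `OCTET`, `IP_PATTERN` and the sample string are assumed names and data, not part of the original script. It uses a variant of the pattern that lists the longest alternatives first and adds `\b` word boundaries. When the short alternative `[01]?\d?\d` is tried first and nothing follows the last octet, a trailing `255` matches only as `25`.

```python
import re

# Assumed variant of the post's pattern: most specific alternatives first,
# wrapped in \b so a match cannot start or stop inside a run of digits.
OCTET = r'(?:25[0-5]|2[0-4]\d|[01]?\d?\d)'
IP_PATTERN = re.compile(r'\b(?:' + OCTET + r'\.){3}' + OCTET + r'\b')


def extract_ips(text):
    """Return every IPv4-looking string in text, in order of appearance."""
    # No capturing groups are used, so findall returns the full matches.
    return IP_PATTERN.findall(text)


# Hypothetical sample markup; not taken from any real proxy list.
sample = ('<td>192.168.1.255</td><td>8080</td>'
          '<td>10.0.0.1</td><td>999.1.1.1</td>')
print(extract_ips(sample))  # ['192.168.1.255', '10.0.0.1']
```

Returning the list instead of printing inside the function keeps the matching logic easy to test and reuse, for example to write the addresses to a file.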