Web Scraping Basics 2: Scraping Proxy IP Addresses

Latest recommended article published 2024-08-06 11:55:39

GAN_player

Category: My Python Learning. Tags: web scraping

Article link: https://blog.csdn.net/GAN_player/article/details/78072357


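import re
import urllib.request


def url_open(url):
    """Fetch a page and return its body decoded as UTF-8."""
    req = urllib.request.Request(url)
    # Send a browser-like User-Agent; some sites reject urllib's default one.
    req.add_header('User-Agent', 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36')
    with urllib.request.urlopen(req) as page:
        html = page.read().decode('utf-8')
    return html


def get_ip(html):
    """Return every IPv4-looking string found in the page source."""
    # Four groups of 1-3 digits separated by dots.
    # The pattern does not check that each group is at most 255.
    p = r'(?:\d{1,3}\.){3}\d{1,3}'
    iplist = re.findall(p, html)
    return iplist


if __name__ == '__main__':
    url = 'http://www.xicidaili.com/'
    html = url_open(url)
    # Print one address per line.
    for each in get_ip(html):
        print(each)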
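
The pattern in the script above accepts any four dot-separated groups of one to three digits, so strings such as 999.1.1.1 also match. As a minimal sketch of one way to tighten this, the candidates can be passed through the standard-library `ipaddress` module. The helper name `extract_valid_ips` and the sample string are hypothetical and not part of the original script; the lookarounds are an added assumption, meant to skip longer dotted sequences such as version numbers.

```python
import ipaddress
import re

# Four dot-separated groups of 1-3 digits, not touching another digit or dot
# on either side (this skips things like "1.2.3.4.5").
CANDIDATE = re.compile(r'(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?![\d.])')


def extract_valid_ips(text):
    """Return the unique valid IPv4 addresses in text, in first-seen order.

    Hypothetical helper for illustration; not from the original post.
    """
    seen = []
    for candidate in CANDIDATE.findall(text):
        try:
            # Raises ValueError when a group is out of range (e.g. 999).
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue
        if candidate not in seen:
            seen.append(candidate)
    return seen


if __name__ == '__main__':
    # Made-up sample input, standing in for downloaded page source.
    sample = '<td>110.73.10.78</td><td>8123</td> 999.1.1.1 110.73.10.78'
    for ip in extract_valid_ips(sample):
        print(ip)
```

Letting `ipaddress` do the range check keeps the regular expression short; the alternative is a longer pattern that spells out the 0-255 range for each group.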