Scraping Web Resources with Python 3 (Part 1)

Latest recommended article published on 2024-06-19 17:27:45

爱此清夜雨

Article tags: python, web scraping

Article link: https://blog.csdn.net/dh858115/article/details/60598225

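I recently started learning web scraping with Python. When I looked for demo crawlers, I found that most of them were written in Python 2 syntax, so I checked the API documentation, adapted the code myself, and eventually got an image scraper working.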

The code is as follows:

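# coding=utf-8
import re
import urllib.request


def getHtml(url):
    # Fetch the page and decode the response body as UTF-8 text.
    with urllib.request.urlopen(url) as page:
        return page.read().decode('utf-8')


def getDiv(html):
    # Capture the src value of every https .jpg image in the page.
    # [^"] keeps each match inside a single quoted attribute value.
    imgRe = re.compile(r'src="(https://[^"]+?\.jpg)"')
    imglist = imgRe.findall(html)
    # Save each image to the current directory as 0.jpg, 1.jpg, ...
    for x, imgurl in enumerate(imglist):
        urllib.request.urlretrieve(imgurl, '%s.jpg' % x)
    return imglist


if __name__ == '__main__':
    html = getHtml("https://image.baidu.com/")
    print(getDiv(html))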
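
The regular-expression step can be tried offline on a small string. The sketch below uses a made-up HTML fragment (the example.com URLs and the names LOOSE_RE, STRICT_RE and sample_html are assumptions for illustration only) and compares the pattern src="(https://.+?\.jpg)" with a stricter variant that uses [^"] in place of the dot. Because the dot also matches quote characters, the loose form can run past the end of one src attribute until it reaches the next .jpg" in the page.

```python
import re

# Loose form: "." matches any character, including a double quote.
LOOSE_RE = re.compile(r'src="(https://.+?\.jpg)"')
# Stricter form: [^"] stops the match at the end of the attribute value.
STRICT_RE = re.compile(r'src="(https://[^"]+?\.jpg)"')

# Hypothetical HTML fragment, invented for this demonstration.
sample_html = (
    '<img src="https://example.com/a.jpg">'
    '<img src="http://example.com/b.jpg">'
    '<img src="https://example.com/c.png">'
    '<img class="x" src="https://example.com/pics/d.jpg" alt="d">'
)

loose = LOOSE_RE.findall(sample_html)
strict = STRICT_RE.findall(sample_html)

# The loose pattern starts at c.png and keeps going into the next tag.
print(loose)
# The strict pattern returns only the two https .jpg URLs.
print(strict)
```

On this fragment the strict pattern yields a.jpg and d.jpg, while the second item returned by the loose pattern begins at c.png and swallows part of the following tag, which would not be a usable download URL.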

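The whole program works by specifying a target URL, fetching the page with Python's urllib.request library, and then matching image addresses with a regular expression. Finally, the urlretrieve function from urllib.request saves the images from the web to the local disk.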
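
The download step described above can also be written more defensively. The following is only a minimal sketch under stated assumptions: some sites reject urllib's default User-Agent, so a placeholder header is attached, and since urlretrieve takes a plain URL string, the sketch swaps in urlopen with a Request object and writes the bytes to disk itself. The names HEADERS, build_request, download and download_all, the header value, and the 10-second timeout are all hypothetical and not part of the original program.

```python
import urllib.request

# Assumption: a browser-like User-Agent; the value is a placeholder.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; demo-scraper)"}


def build_request(url):
    """Return a Request object that carries the custom headers."""
    return urllib.request.Request(url, headers=HEADERS)


def download(url, filename, timeout=10):
    """Save url to filename; return True on success, False on failure."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
            data = resp.read()
    except OSError as exc:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        print("skipped %s: %s" % (url, exc))
        return False
    with open(filename, "wb") as f:
        f.write(data)
    return True


def download_all(imglist):
    """Save each URL as 0.jpg, 1.jpg, ... and return the names written."""
    saved = []
    for x, imgurl in enumerate(imglist):
        name = "%s.jpg" % x
        if download(imgurl, name):
            saved.append(name)
    return saved
```

With this version, a single broken image link is reported and skipped rather than stopping the whole run; download_all could be called with the list of URLs found by the regular expression.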