Python Web Scraping for Absolute Beginners (Part 1): Downloading an Image in Three Lines of Code - CSDN Blog

Article link: https://blog.csdn.net/qq_42680814/article/details/104096956

Here is a small problem you can run into when scraping a website. The code:

import urllib.request

# Plain request with urllib's default headers; this site rejects it
response = urllib.request.urlopen("https://www.pexels.com/search/book")
cat_img = response.read()

Running it raises an error.

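Some websites guard against this kind of non-browser access by checking the User-Agent in the request headers (it describes the hardware platform, operating system, application software, and user preferences). If the User-Agent looks abnormal or is missing, the request is rejected, which is what caused the error above. urllib does send a User-Agent by default, but it identifies itself as Python-urllib/3.x, which is exactly the kind of value such sites refuse.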
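
The default header is easy to inspect. Below is a minimal sketch, assuming CPython's urllib.request: default_user_agent reads the header list of a default opener, and fetch_status returns the HTTP status code instead of letting an HTTPError (such as 403 Forbidden) end the script. Both function names are hypothetical helpers, not part of the original post.

```python
import urllib.error
import urllib.request


def default_user_agent() -> str:
    """Return the User-Agent that urllib's default opener sends."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) pairs; urllib stores the name as "User-agent"
    return dict(opener.addheaders)["User-agent"]


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Request a URL and return its HTTP status code, including error codes."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # A site that rejects the request still answers with a status code (e.g. 403);
        # network failures such as DNS errors (URLError) are not caught here
        return err.code
```

Calling fetch_status("https://www.pexels.com/search/book") would then report the rejection as a number rather than a traceback; which code a given site actually returns is up to that site.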

So what can we do? Simple: at our current level, we go for the low-hanging fruit. The site below has no such protection:

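import urllib.request

# Download the raw bytes of a 200x300 placeholder kitten image
response = urllib.request.urlopen("http://placekitten.com/g/200/300")
cat_img = response.read()

# Write in binary mode ('wb') because the data is image bytes, not text
with open('cat_200_300.jpg', 'wb') as f:
    f.write(cat_img)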

Run it and you will find the downloaded image, cat_200_300.jpg, in your current working directory.
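
The three steps (open the URL, read the bytes, write them in binary mode) can be wrapped in a small reusable function. This is a sketch: download_file and its timeout default are illustrative choices, not part of the original post. Using with on the response also closes the connection once the bytes have been read.

```python
import urllib.request
from pathlib import Path


def download_file(url: str, dest: str, timeout: float = 10.0) -> int:
    """Download url, save the raw bytes to dest, and return the number of bytes written."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = response.read()
    # write_bytes opens the file in binary mode, so image data is stored unchanged
    Path(dest).write_bytes(data)
    return len(data)
```

For example, download_file("http://placekitten.com/g/200/300", "cat_200_300.jpg") does the same job as the snippet above, provided the site is still serving images at that address.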

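Sites with this kind of protection need a different approach. The code below can access such a site; compare it carefully with the first snippet to see what changed.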

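import urllib.request

# Build a Request object so that headers can be attached to it
req = urllib.request.Request("https://www.pexels.com/search/book")
# Present a Chrome-on-Windows User-Agent instead of the default Python-urllib one
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36')
response = urllib.request.urlopen(req)
wy = response.read()  # raw bytes of the HTML page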
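
The only difference from the first snippet is the User-Agent header copied from a real browser. The same request can also be built by passing a headers dict to urllib.request.Request, which calls add_header for each entry. In this sketch, build_request, fetch_text, and BROWSER_UA are hypothetical names, and the UA string is the one from the post. Whether a particular site accepts it depends on that site's checks, which may go beyond the User-Agent.

```python
import urllib.request

# Browser User-Agent string taken from the snippet above
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36"
)


def build_request(url: str, user_agent: str = BROWSER_UA) -> urllib.request.Request:
    """Build a Request that presents itself as a browser instead of Python-urllib."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Fetch a page with the browser User-Agent and decode it to text."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        # Use the charset declared in Content-Type, falling back to UTF-8
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Decoding matters because response.read() returns bytes; wy in the snippet above is still bytes, not a string.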

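Now a small exercise: download pictures of the actress Liu Yifei (刘亦菲). The URL below is a Sogou image search (pic.sogou.com) for 刘亦菲. Note that it points to a search-results page, not directly to an image file.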
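
The query parameter in the exercise URL, %C1%F5%D2%E0%B7%C6, is the name 刘亦菲 percent-encoded as GBK bytes rather than UTF-8. Here is a sketch of building such a URL with urllib.parse. build_search_url is a hypothetical helper, and the assumption that this site expects GBK is drawn only from the URL in the post.

```python
from urllib.parse import urlencode


def build_search_url(keyword: str) -> str:
    """Build a Sogou image-search URL with the keyword percent-encoded as GBK."""
    # urlencode passes encoding= on to quote_plus for every non-bytes value
    params = urlencode({"query": keyword, "mode": 1, "did": 2}, encoding="gbk")
    return "https://pic.sogou.com/d?" + params
```

The #did1 fragment in the original URL is omitted here; fragments are handled by the browser and are not sent to the server anyway.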

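import urllib.request

# This is a search-results page, so the response body is HTML, not image bytes
response = urllib.request.urlopen("https://pic.sogou.com/d?query=%C1%F5%D2%E0%B7%C6&mode=1&did=2#did1")
page_html = response.read()
# Save it under its own .html name; the individual image URLs still have to be extracted from it
with open('liuyifei_search.html', 'wb') as f:
    f.write(page_html)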
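
Because the exercise downloads a search-results page (HTML) rather than a picture, the actual image addresses still have to be pulled out of that HTML. The sketch below uses the standard-library html.parser. ImgSrcCollector and extract_image_urls are hypothetical names. The parser only finds img tags present in the raw HTML, so if the page fills in its images with JavaScript (an assumption worth checking for this site), it will find few or none.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in an HTML document."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative paths such as /a.jpg against the page URL
            self.urls.append(urljoin(self.base_url, src))


def extract_image_urls(html: str, base_url: str) -> list[str]:
    """Return absolute URLs for all <img src=...> tags found in html."""
    parser = ImgSrcCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls
```

Each URL it returns could then be saved with the same open/read/write steps used for the kitten picture. The downloaded page bytes have to be decoded to a string first, using whatever charset the page declares.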

Reference: https://blog.csdn.net/The_Time_Runner/article/details/86522700

《零基础入门学习python》 (the book "Learning Python from Scratch")