Scraping the images from a WeChat Official Account article with Python
1. Import the modules (if anything in this part is unclear, see my article 《Python第三方库安装详细教程(图文结合)》)
import requests
from bs4 import BeautifulSoup
import time
2. Request the page and print the response as text
url = 'https://mp.weixin.qq.com/s/J7y6TLECYyl2FmVe6XKpww'
headers = {
    # 'referer': 'https://mp.weixin.qq.com',
    'cookie': 'pgv_pvid=6670082751; RK=WMxp6iw+3H; ptcz=831e2d5114bbf9b46ee7956fedb62717ee910417ecd992f3e0027f034213caf1; o_cookie=2925851543; pac_uid=1_2925851543; iip=0; tvfe_boss_uuid=94828b35f56c4131; LW_uid=01d6E8a1d0T8Y6S87134I123O2; eas_sid=J116c8t1G078b6f8N1u4m24059; LW_sid=6166y891k1d2s4h7v9M5A8K6e8; rewardsn=; wxtokenkey=777; wwapp.vid=; wwapp.cst=; wwapp.deviceid=',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36 Edg/112.0.1722.48'
}
response = requests.get(url, headers=headers)
# print(response.status_code)  # print the HTTP status code
# get the page source
html = response.text
# print(html)
3. Parse the returned HTML and collect all the img tags into a list
soup = BeautifulSoup(html, 'html.parser')
img_list = soup.find_all('img')
# print(img_list)
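As a variation on the `find_all` call above, BeautifulSoup can also filter on attributes directly, so the list only ever contains tags that carry a `data-src` attribute (WeChat lazy-loads its images through `data-src`). A small sketch using made-up sample markup in place of the real page:

```python
from bs4 import BeautifulSoup

# hypothetical sample standing in for the real WeChat page HTML
sample_html = """
<img data-src="https://mmbiz.qpic.cn/a.jpeg">
<img src="placeholder.gif">
<img data-src="https://mmbiz.qpic.cn/b.png">
"""
soup = BeautifulSoup(sample_html, 'html.parser')
# keep only <img> tags that actually have a data-src attribute
img_list = soup.find_all('img', attrs={'data-src': True})
links = [img['data-src'] for img in img_list]
print(links)
```

This skips the placeholder images up front, so the later `if img_link is not None` check becomes unnecessary.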
4. Loop over img_list and pull out each image's link
for i, img_tag in enumerate(img_list):
    # enumerate gives us a running index directly; calling img_list.index()
    # on every iteration is slow and breaks if two tags compare equal
    name = str(i)
    # print(img_tag)
    # WeChat lazy-loads images, so the real URL is in data-src rather than src
    img_link = img_tag.get('data-src')
    # print(img_link)
    if img_link is not None:
        # print(img_link)
        response2 = requests.get(img_link)
        # images are binary data, so read .content; .text is for text files
        img_content = response2.content
        # sleep between requests so the server does not block us for going too fast
        time.sleep(5)
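A fixed 5-second sleep works, but adding a little random jitter makes the request pattern look less mechanical. A minimal sketch (the `polite_delay` helper is my own naming, not part of any library):

```python
import random

def polite_delay(base=5.0, jitter=2.0):
    """Return base seconds plus up to `jitter` extra random seconds."""
    return base + random.uniform(0, jitter)

# in the download loop, replace time.sleep(5) with:
# time.sleep(polite_delay())
delay = polite_delay()
print(round(delay, 2))
```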
5. Save the image (this code sits inside the if statement within the for loop)
        with open('D:\\图片\\' + name + '.jpeg', 'wb') as f:
            f.write(img_content)
            # no f.close() needed: the with statement closes the file for us
        print(f'Image {name} downloaded successfully')
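The save step above hard-codes a `.jpeg` extension, but WeChat articles also serve PNG, GIF, and WebP images. One way to pick the right extension is to look at the response's `Content-Type` header; a sketch, where the mapping dict and `pick_extension` helper are my own:

```python
# map common image Content-Type values to file extensions (assumed mapping)
CONTENT_TYPE_EXT = {
    'image/jpeg': '.jpeg',
    'image/png': '.png',
    'image/gif': '.gif',
    'image/webp': '.webp',
}

def pick_extension(content_type, default='.jpeg'):
    # strip any "; charset=..." suffix before looking up the type
    main_type = content_type.split(';')[0].strip().lower()
    return CONTENT_TYPE_EXT.get(main_type, default)

print(pick_extension('image/png'))   # .png
print(pick_extension('text/html'))   # falls back to .jpeg
```

With requests, the header is available as `response2.headers.get('Content-Type', '')`, so the filename becomes `name + pick_extension(...)`.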
6. Results
If this article helped you, please give it a like!