How to Scrape All the Meme Images from a Meme Website with Python
Setting up the environment
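Install a Python 3.x development environment.
Press Win + R to open the Run dialog, type cmd, then type python in the command prompt to check that Python is installed.
Press Win + R to open the Run dialog, type cmd, then run pip install requests to install the requests library.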
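If you prefer to check both steps from Python itself, a small script like the one below can help. It is a sketch that is not part of the original tutorial; the helper names is_installed and check_environment are made up for illustration.

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def check_environment():
    """Report the Python version and whether the requests library is available."""
    return {
        "python_version": "%d.%d.%d" % tuple(sys.version_info[:3]),
        "is_python3": sys.version_info[0] == 3,
        "requests_installed": is_installed("requests"),
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(key, "=", value)
```

Save it under any name, for example check_env.py (a hypothetical file name), and run python check_env.py. If requests_installed is False, run the pip install command above.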
Scraping target: https://qq.yh31.com/zjbq/0551964.html
The Python scraper code is as follows:
'''
Author: 血饮 (Xueyin)
Purpose: scrape the meme images from a specified web page
Date: 2020.02.20
'''
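import os
import re
from urllib.parse import urljoin

import requests

# The page whose meme images will be downloaded
target_url = "https://qq.yh31.com/zjbq/0551964.html"
# Send a browser User-Agent so the site treats the request like a normal visit
# (you can replace it with your own browser's UA, see "How to use" below)
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36"
}

# Download the page HTML and decode it as UTF-8
source_code = requests.get(target_url, headers=headers).content.decode("utf-8")

# Match tags like <img src="/tp/zjbq/....gif" /> and capture the .gif path
regex_1 = r'img[\s]+src="(.*?\.gif)"[\s]+/'
xueyin = re.compile(regex_1)
get_img_url = xueyin.findall(source_code)

# Save the images into the current working directory
path = os.getcwd()
for x in get_img_url:
    # The page uses relative paths such as /tp/zjbq/201903271348331856.gif;
    # urljoin turns them into full URLs without producing a double slash
    x = urljoin(target_url, x)
    file_name = x.split("/")[-1]
    # os.path.join builds a valid path on Windows, macOS and Linux
    file_path = os.path.join(path, file_name)
    response = requests.get(x, headers=headers)
    with open(file_path, "wb") as f:
        f.write(response.content)
print("Done")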
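The regular expression is the part most likely to break when the page's HTML changes, so it is worth testing offline before downloading anything. The sketch below applies the same pattern as regex_1 to a made-up HTML snippet (the snippet and the helper name extract_gif_urls are assumptions for illustration, not taken from the site) and resolves the relative paths into full URLs with urllib.parse.urljoin, which, unlike plain string concatenation, does not produce a double slash when the path already starts with /.

```python
import re
from urllib.parse import urljoin

# Same pattern as regex_1 in the scraper: an img tag whose src ends in .gif
GIF_PATTERN = re.compile(r'img[\s]+src="(.*?\.gif)"[\s]+/')


def extract_gif_urls(html, page_url):
    """Return absolute URLs for every <img src="....gif" /> matched in html."""
    relative_paths = GIF_PATTERN.findall(html)
    # urljoin resolves "/tp/..." against the site root of page_url
    return [urljoin(page_url, p) for p in relative_paths]


# Hypothetical HTML fragment shaped like the tag seen in the developer tools
sample_html = '''
<div class="content">
  <img src="/tp/zjbq/201903271348331856.gif" />
  <img alt="x" src="/tp/zjbq/not_matched.gif" />
  <img src="/tp/zjbq/logo.png" />
</div>
'''

urls = extract_gif_urls(sample_html, "https://qq.yh31.com/zjbq/0551964.html")
for url in urls:
    # The file name is the last path segment, as in the scraper
    print(url, "->", url.split("/")[-1])
```

Only the first tag matches. The second one is skipped because the pattern expects src to come right after img, and the third because it is not a .gif. If the scraper downloads nothing, the site's HTML probably no longer has the shape this pattern expects.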
How to use:
Open the target website, then press F12 to open the developer tools.
From there you can get:
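The image address (incomplete, i.e. a path relative to the site root): /tp/zjbq/201903271348331856.gif
Your own browser's UA, which can be used as the User-Agent value in headers: User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.106 Safari/537.36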
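The User-Agent line copied from the developer tools has the form "Name: value", while requests expects a dictionary for headers. A minimal sketch of converting the copied line; header_line_to_dict is a hypothetical helper, not part of requests.

```python
def header_line_to_dict(line):
    """Convert a copied 'Name: value' header line into a dict usable as headers=."""
    # Split only on the first colon: values such as URLs may contain colons too
    name, sep, value = line.partition(":")
    if not sep:
        raise ValueError("expected a line of the form 'Name: value'")
    return {name.strip(): value.strip()}


copied = ("User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
          "(KHTML, like Gecko) Chrome/80.0.3987.106 Safari/537.36")
headers = header_line_to_dict(copied)
print(headers["User-Agent"])
```

The resulting dictionary can be passed as headers=headers to requests.get, just as the scraper does with its own headers.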