Environment
Windows 10, Python
We start with a simple request to fetch the page's HTML, from which the data we need can be extracted.
import requests
url = 'https://www.douyu.com/g_yz'
response = requests.get(url=url)
html = response.text
print(html)
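Scroll down through the output until you reach the jpg-related entries, as shown in the figure below.
Next, extract them with a simple regular expression.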
import re
title_url = re.findall(r'"rn":"(.*?)","rpos":0,"rs1":"(.*?)"',html)
for title, one_url in title_url:
    print(title + "==================" + one_url)
The figure below shows the result.
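To make the pattern easier to follow, here is a minimal sketch that runs the same regular expression against a small hand-written sample string. The sample text, the `extract_rooms` helper name, and the example.com URLs are assumptions made up for illustration; the real page content will differ.

```python
import re

# Same pattern as in the article: group 1 is the text after "rn",
# group 2 is the text after "rs1"
ROOM_PATTERN = re.compile(r'"rn":"(.*?)","rpos":0,"rs1":"(.*?)"')


def extract_rooms(html):
    """Return a list of (title, image_url) tuples found in html."""
    return ROOM_PATTERN.findall(html)


# Hypothetical sample, not real page data
sample = (
    '{"rid":1,"rn":"Room A","rpos":0,"rs1":"https://example.com/a.jpg","x":1},'
    '{"rid":2,"rn":"Room B","rpos":0,"rs1":"https://example.com/b.jpg","x":2}'
)

for title, image_url in extract_rooms(sample):
    print(title, image_url)
```

Because both groups are non-greedy (`.*?`), each one stops at the first closing quote that lets the rest of the pattern match, so every room yields its own pair.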
Downloading a single image looks like this:
with open('一贫如洗的直播间 5695362.jpg', 'wb') as f:
    resp = requests.get(url='https://rpic.douyucdn.cn/live-cover/appCovers/2020/06/21/5695362_20200621173529_big.jpg/dy2').content
    f.write(resp)
Below is the successfully saved image.
The same approach works inside a loop:
for title, one_url in title_url:
    with open(title + '.jpg', 'wb') as f:
        resp = requests.get(url=one_url).content
        f.write(resp)
    print(title + '======================保存成功')  # "保存成功" means "saved successfully"
Output:
Here is the finished result.
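Because the room title is used directly as a file name, a title containing characters that Windows does not allow in file names (such as `?` or `:`) would make `open` fail. A minimal sketch of one way to guard against that follows; the `safe_filename` helper and the choice of `_` as the replacement are assumptions, not part of the original script, and the character list is not meant to cover every Windows naming rule.

```python
import re

# A handful of characters that Windows does not allow in file names
INVALID_CHARS = re.compile(r'[\\/:*?"<>|]')


def safe_filename(title, replacement='_'):
    """Replace characters that are invalid in Windows file names."""
    cleaned = INVALID_CHARS.sub(replacement, title).strip()
    # Fall back to a placeholder if nothing usable is left
    return cleaned or 'untitled'


print(safe_filename('a/b:c?'))
```

In the loop, `open(safe_filename(title) + '.jpg', 'wb')` would then take the place of `open(title + '.jpg', 'wb')`.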
The optimized source code is as follows:
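import requests
import re
import os
import time
url = 'https://www.douyu.com/g_yz'
response = requests.get(url=url)
html = response.text
# Each match is a (room title, cover image URL) pair
title_url = re.findall(r'"rn":"(.*?)","rpos":0,"rs1":"(.*?)"', html)
# Create the output folder first; os.chdir alone fails if it does not exist
os.makedirs('小姐姐', exist_ok=True)
os.chdir('小姐姐')
for title, one_url in title_url:
    with open(title + '.jpg', 'wb') as f:
        resp = requests.get(url=one_url).content
        f.write(resp)
    print(title + '======================保存成功')  # "保存成功" means "saved successfully"
    time.sleep(0.5)  # brief pause between downloads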
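If any request raises an exception, the script above stops at that image. As a sketch of one possible refinement, the loop can be wrapped in a function that skips failed downloads and builds paths with `os.path.join` instead of changing the working directory. The `save_images` name, its parameters, and the injected `fetch` callable are hypothetical choices for illustration.

```python
import os
import time


def save_images(pairs, fetch, out_dir, delay=0.5):
    """Save each (title, url) pair as out_dir/title.jpg; return the saved paths.

    fetch is any callable that takes a URL and returns the image bytes.
    """
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for title, image_url in pairs:
        try:
            data = fetch(image_url)
        except Exception as exc:
            # Skip this image and carry on with the rest
            print(f'{title}: download failed ({exc})')
            continue
        path = os.path.join(out_dir, title + '.jpg')
        with open(path, 'wb') as f:
            f.write(data)
        saved.append(path)
        time.sleep(delay)  # pause between downloads
    return saved
```

With requests, the call might look like `save_images(title_url, lambda u: requests.get(u, timeout=10).content, '小姐姐')`. Passing the downloader in as an argument also makes the function easy to try out with a stand-in that returns fixed bytes.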