Preface
This chapter mainly uses requests to fetch the page and Beautiful Soup to parse out the image URLs.
Steps
1. Press F12 to open the developer tools, switch to the Network tab, and click the Load more… button; the request URLs it triggers show up in the Network panel.
Now we can fetch the page with requests:
import requests
# The cookies and headers values are omitted here
cookies = {}
headers = {}
params = {'page': '2'}
# This is a GET request; when calling requests.get with query parameters, pass them as params=<parameter dict>
response = requests.get('https://github.com/topics', headers=headers, params=params, cookies=cookies)
print(response.text)
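Before parsing, it is worth confirming that the request actually succeeded. A minimal check, assuming the same response object as above:
# Optional sanity check: raise an exception on 4xx/5xx responses before parsing
response.raise_for_status()
print(response.status_code)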
2. Click the small element-picker arrow shown in the figure below, then click one of the images on the page to get its address.
Parse the response data with the BeautifulSoup module to extract the image addresses:
from bs4 import BeautifulSoup
soup = BeautifulSoup(response.content, "lxml")
pngs = soup.find("ul", {"class": "list-style-none"}).find_all("li", {"class": "py-4 border-bottom"})
print(len(pngs))
for each in pngs:
    png_tag = each.find("img", {"class": "rounded-1 mr-3"})
    if not png_tag:
        png_url = ""
    else:
        png_url = png_tag.get("src")
    print(png_url)
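An equivalent way to locate the same elements is a CSS selector via soup.select; this sketch only repeats the lookup above in selector form (same tags and classes), it is not a different approach:
# Same lookup expressed as a CSS selector
img_tags = soup.select("ul.list-style-none li.py-4.border-bottom img.rounded-1.mr-3")
for img in img_tags:
    print(img.get("src"))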
3. Once we have the image address, we can save the image locally.
Here I save each image under its original file name.
import urllib.request
filename = png_url.split('/')[-1]
print(filename)
urllib.request.urlretrieve(png_url, 'E://images/'+filename)
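Note that urlretrieve fails if the target directory does not exist, so it can help to create it first. A small sketch, assuming the same E://images/ folder as above:
import os
# Create the target folder if it is missing; exist_ok avoids an error when it already exists
os.makedirs('E://images/', exist_ok=True)
urllib.request.urlretrieve(png_url, 'E://images/' + filename)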
4. The complete code is as follows:
import requests
from bs4 import BeautifulSoup
import urllib.request
def main():
    cookies = {}
    headers = {}
    params = {'page': '2'}
    response = requests.get('https://github.com/topics', headers=headers, params=params, cookies=cookies)
    soup = BeautifulSoup(response.content, "lxml")
    pngs = soup.find("ul", {"class": "list-style-none"}).find_all("li", {"class": "py-4 border-bottom"})
    print(len(pngs))
    for each in pngs:
        png_tag = each.find("img", {"class": "rounded-1 mr-3"})
        if not png_tag:
            png_url = ""
        else:
            png_url = png_tag.get("src")
        print(png_url)
        if not png_url:
            continue  # skip list items that have no image
        filename = png_url.split('/')[-1]
        print(filename)
        urllib.request.urlretrieve(png_url, 'E://images/' + filename)
        # response = requests.get(png_url, stream=True)
        # with open('E://images/'+filename, 'wb') as fd:
        #     fd.write(response.content)
        #     print(filename + "download success")

if __name__ == '__main__':
    main()
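The commented-out lines show an alternative download path that uses requests instead of urllib. As a standalone sketch of that alternative (assuming png_url and filename are set as above and the E://images/ folder already exists):
# Fetch the image bytes with requests and write them to disk
img_response = requests.get(png_url, stream=True)
with open('E://images/' + filename, 'wb') as fd:
    fd.write(img_response.content)
print(filename + " download success")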