Result
Target website: https://www.logosc.cn/so/
Target API endpoint:
```
# GET request
https://www.logosc.cn/api/so/get?page=0&pageSize=20&keywords=&category=local&isNeedTranslate=undefined
```
Inspecting the request shows that the `page` and `pageSize` parameters should control which images are returned.
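Here is a minimal sketch of building the request URL for a given page with the standard library. The helper names `build_url` and `page_urls` are hypothetical, and the other query values are copied unchanged from the endpoint above. Two things are assumptions the site does not confirm: that `page` is zero-based, and how large `pageSize` is allowed to be.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.logosc.cn/api/so/get"


def build_url(page, page_size):
    """Build the search URL for one page (hypothetical helper)."""
    params = {
        "page": page,
        "pageSize": page_size,
        # Remaining values copied verbatim from the observed request
        "keywords": "",
        "category": "local",
        "isNeedTranslate": "undefined",
    }
    return f"{BASE_URL}?{urlencode(params)}"


def page_urls(num_pages, page_size):
    """URLs for pages 0 .. num_pages - 1, assuming zero-based paging."""
    return [build_url(p, page_size) for p in range(num_pages)]


if __name__ == "__main__":
    for u in page_urls(3, 20):
        print(u)
```

`urlencode` keeps the dict's insertion order, so `build_url(0, 20)` reproduces the original endpoint string exactly.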
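Import the required modules (`requests` is third-party; `os.path` is in the standard library)
```python
import requests
import os.path
```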
Analyze the data to scrape
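The scraper in this post reads each image URL from `data[i]["large_img_path"]["url"]` in the JSON response. Below is a minimal offline sketch of that lookup. It assumes that field path and uses a made-up sample response: the example.com URLs are placeholders, not real data from the site, and `extract_urls` is a hypothetical helper name.

```python
def extract_urls(content):
    """Return every non-empty data[i]["large_img_path"]["url"] in a decoded response."""
    urls = []
    # Treat a missing or null "data" field as an empty result
    for item in content.get("data") or []:
        # Entries may lack large_img_path or have it set to null
        large = item.get("large_img_path") or {}
        url = large.get("url")
        if url:
            urls.append(url)
    return urls


# Hypothetical sample response, trimmed to the fields read above
sample = {
    "data": [
        {"large_img_path": {"url": "https://example.com/a.jpg"}},
        {"large_img_path": {"url": ""}},
        {"large_img_path": None},
        {"large_img_path": {"url": "https://example.com/b.jpg"}},
    ]
}

print(extract_urls(sample))
```

Using `.get` with fallbacks skips incomplete entries instead of raising `KeyError`, so one malformed item does not end the whole scan.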
Code
The `getPicture(page, pageSize)` function collects the URLs of the image assets, and `download(urls)` then downloads them.
```python
# author: LiuShihao
# date: 2020/12/3 5:23 PM
# youknow: Folks, someone once offered me three hundred million for this code and I didn't sell. Now I'm sharing it with everyone; all I ask is that you tap the little red heart. Thanks!
# desc: Scrape image assets from the logosc.cn image search site (搜图神器)
"""
https://www.logosc.cn/so/
Target API: https://www.logosc.cn/api/so/get?page=0&pageSize=20&keywords=&category=local&isNeedTranslate=undefined
"""
import requests
import os.path

# page = 0
# pageSize = 50
headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36",
}


# Fetch the image URLs for one page of results
def getPicture(page, pageSize):
    urls = []
    url = f"https://www.logosc.cn/api/so/get?page={page}&pageSize={pageSize}&keywords=&category=local&isNeedTranslate=undefined"
    print(url)
    response = requests.get(url=url, headers=headers, timeout=10)
    response.raise_for_status()  # stop early on HTTP errors
    content = response.json()
    if "data" not in content:
        print("No data returned!")
        return urls
    for i, item in enumerate(content["data"]):
        # Each entry stores its full-size image URL under large_img_path.url;
        # skip entries where it is missing or empty
        picture_url = (item.get("large_img_path") or {}).get("url")
        if picture_url:
            print(f"picture_url{i}:", picture_url)
            urls.append(picture_url)
    return urls


# Download the images into ./images
def download(urls):
    # Create the folder if needed instead of skipping every file
    os.makedirs("images", exist_ok=True)
    for i, image_url in enumerate(urls, start=1):
        image = requests.get(image_url, headers=headers, timeout=10).content
        # "wb": open for writing in binary mode; every file gets a .jpg name
        with open(os.path.join("images", f"{i}.jpg"), "wb") as f:
            print(f"{i}.jpg saving...")
            f.write(image)


if __name__ == "__main__":
    urls = getPicture(0, 40)
    print(f"Got {len(urls)} URLs")
    download(urls)
```
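`download` names every file `N.jpg` regardless of what the server actually returns. If some assets turn out to be PNG or GIF, one option is to take the extension from the response's `Content-Type` header and fall back to the URL's own suffix. Below is a minimal sketch. `extension_for` and the `EXTENSIONS` table are hypothetical, and which formats the site actually serves is not known.

```python
import os

# Hypothetical MIME-type-to-extension table; extend as needed
EXTENSIONS = {
    "image/jpeg": ".jpg",
    "image/png": ".png",
    "image/gif": ".gif",
    "image/webp": ".webp",
}


def extension_for(content_type, url):
    """Choose a file extension from a Content-Type header, else from the URL."""
    # Drop parameters such as "; charset=binary" and normalize case
    mime = (content_type or "").split(";")[0].strip().lower()
    if mime in EXTENSIONS:
        return EXTENSIONS[mime]
    # Fall back to the URL path's suffix, ignoring any query string
    ext = os.path.splitext(url.split("?")[0])[1]
    return ext or ".jpg"


print(extension_for("image/png", "https://example.com/a"))
print(extension_for(None, "https://example.com/b.gif?x=1"))
```

To use it inside `download`, keep the response object (`resp = requests.get(image_url, ...)`) and build the filename with `extension_for(resp.headers.get("Content-Type"), image_url)`.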