Toutiao (今日头条) image crawler in Python: 今日头条.py · 李强/Python-Crawler - Gitee.com

Latest recommended article published on 2024-08-18 15:31:51

weixin_39646831


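Article tag: 今日头条python

Copyright notice: This is an original article by the author, released under the CC 4.0 BY-SA license. When reposting, include a link to the original source and this notice.

Article link: https://blog.csdn.net/weixin_39646831/article/details/114011570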

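# coding=utf-8
import os
from hashlib import md5
from urllib.parse import urlencode

import requests


def get_page(offset, keyword):
    """Fetch one page of search results; return the parsed JSON, or None on failure."""
    params = {
        'offset': offset,
        'format': 'json',
        'keyword': keyword,
        'autoload': 'true',
        'count': '20',
        'cur_tab': '1',
        'from': 'search_tab'
    }
    # Example request:
    # https://www.toutiao.com/search_content/?offset=60&format=json&keyword=%E8%BD%A6%E6%A8%A1&autoload=true&count=20&cur_tab=1&from=search_tab
    url = 'https://www.toutiao.com/search_content/?' + urlencode(params)
    response = requests.get(url)
    # 500: internal server error; 400: bad request (the server cannot parse
    # the request syntax); 404: not found
    if response.status_code == 200:
        return response.json()
    return None


def get_images(page_json):
    """Yield one {'image': url, 'title': title} dict per image in the results."""
    # get_page() returns None on a non-200 response, so guard against it here
    data = page_json.get('data') if page_json else None
    if data:
        for item in data:
            image_list = item.get('image_list')
            title = item.get('title')
            if image_list:
                for image in image_list:
                    # Generator: return each image URL together with its title
                    yield {
                        'image': image.get('url'),
                        'title': title
                    }


# item is one of the dicts yielded by get_images();
# its title is used as the name of the folder the image is saved in
def save_image(item):
    title = item.get('title')
    image_url = item.get('image')
    if not title or not image_url:
        return
    os.makedirs(title, exist_ok=True)
    # The original code assumes protocol-relative URLs (//host/path);
    # prepend a scheme only in that case
    if image_url.startswith('//'):
        image_url = 'http:' + image_url
    response = requests.get(image_url)
    if response.status_code == 200:
        # Name the file after the MD5 of its content so duplicates collapse
        file_path = '{0}/{1}.{2}'.format(title, md5(response.content).hexdigest(), 'jpg')
        # Write the file only if it does not already exist
        if not os.path.exists(file_path):
            with open(file_path, 'wb') as f:
                f.write(response.content)


# Intended flow: define an array of offsets, iterate over it, extract the
# images, and download them (this function handles a single offset)
def main(offset, keyword):
    page_json = get_page(offset, keyword)
    for item in get_images(page_json):
        print(item)
        save_image(item)


if __name__ == '__main__':
    keyword = input("Enter the keyword to crawl images for: ")
    # The value is sent as the 'offset' query parameter, not as an image count
    offset = int(input("Enter the result offset to start from (e.g. 0, 20, 40): "))
    main(offset, keyword)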
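
The comment above `main` mentions an array of offsets to iterate over, while the script itself processes a single offset per run. A minimal sketch of that idea follows. The helper names `build_offsets` and `crawl_pages` are hypothetical and not part of the original script, and the page size of 20 is an assumption taken from the `count` parameter in `get_page`.

```python
from typing import Callable, List


def build_offsets(first_page: int, last_page: int, page_size: int = 20) -> List[int]:
    """Return the offset of every page from first_page to last_page, inclusive."""
    if first_page < 0 or last_page < first_page:
        raise ValueError("expected 0 <= first_page <= last_page")
    return [page * page_size for page in range(first_page, last_page + 1)]


def crawl_pages(keyword: str, offsets: List[int],
                handle_page: Callable[[int, str], None]) -> int:
    """Call handle_page(offset, keyword) per offset; return how many calls succeeded."""
    succeeded = 0
    for offset in offsets:
        try:
            handle_page(offset, keyword)
            succeeded += 1
        except Exception as exc:
            # One failing page should not abort the remaining pages
            print('page at offset {0} failed: {1}'.format(offset, exc))
    return succeeded
```

With the script's own `main` as the handler, a call such as `crawl_pages(keyword, build_offsets(0, 4), main)` would walk the first five result pages in order. Passing the handler as an argument keeps the loop independent of the network code, so it can be exercised with a stub.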
