Python Crawler Example: Downloading Large Street-Snap (街拍) Images from Toutiao (今日头条)

Latest recommended article published on 2024-08-04 18:30:00

万物皆乱

Category: Crawlers

Article link: https://blog.csdn.net/weixin_48697341/article/details/108359174

Today's goal: search for 街拍 (street snaps) from the Toutiao home page and download the large image listed under each result title.

I'm feeling a bit lazy today, so I'll just share the code; study and digest it on your own:

import os
from hashlib import md5

import requests

# Search API endpoint. The query parameters are passed separately so that
# requests URL-encodes the Chinese keyword.
URL = 'https://www.toutiao.com/api/search/content/'
HEADERS = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36'}

# Fetch the first three result pages, 20 results per page.
for i in range(3):
    params = {
        'aid': 24,
        'app_name': 'web_search',
        'offset': i * 20,
        'format': 'json',
        'keyword': '街拍',
        'autoload': 'true',
        'count': 20,
        'en_qc': 1,
        'cur_tab': 1,
    }
    response = requests.get(URL, params=params, headers=HEADERS)
    result = response.json()

    # 'data' can be missing or null, so fall back to an empty list.
    for item in result.get('data') or []:
        try:
            title = item['title']
            image_url = item['large_image_url']
        except KeyError:
            # Not every entry carries a title and a large image.
            print('KeyError')
            continue
        print(title)
        print(image_url)
        try:
            # One directory per title; reuse it if it already exists.
            os.makedirs(title, exist_ok=True)
            image = requests.get(image_url, headers=HEADERS)
            if image.status_code == 200:
                # Name the file by the MD5 of its content to avoid duplicates.
                file_path = '{0}/{1}.{2}'.format(title, md5(image.content).hexdigest(), 'jpg')
                if not os.path.exists(file_path):
                    with open(file_path, 'wb') as f:
                        f.write(image.content)
            else:
                print('Download failed with status', image.status_code)
        except OSError:
            # Covers invalid directory names as well as network errors.
            print('OSError')
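
Titles returned by the search API can contain characters that are not valid in directory names (for example `/` or `?`), and naming each file by the MD5 of its content is what keeps identical images from being saved twice. Below is a minimal sketch of these two steps as pure functions that can be tried without any network access. The names `safe_dirname` and `image_path` are hypothetical helpers introduced here for illustration, not part of the script above, and the set of replaced characters is an assumption based on common Windows and POSIX file-name restrictions.

```python
import re
from hashlib import md5
from pathlib import Path


def safe_dirname(title, replacement='_'):
    """Replace characters that are commonly invalid in file names (assumed set)."""
    cleaned = re.sub(r'[\\/:*?"<>|\r\n\t]', replacement, title).strip(' .')
    # Fall back to a fixed name if nothing usable is left.
    return cleaned or 'untitled'


def image_path(title, content, ext='jpg'):
    """Build '<safe title>/<md5 of content>.<ext>' for the downloaded bytes."""
    file_name = '{0}.{1}'.format(md5(content).hexdigest(), ext)
    return Path(safe_dirname(title)) / file_name
```

If this fits your setup, `image_path(title, image.content)` could stand in for the `'{0}/{1}.{2}'.format(...)` call, with `os.makedirs(path.parent, exist_ok=True)` run before writing the file.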