Batch-Downloading Images from a GitHub Project | Web Scraping

Latest recommended article published on 2024-05-24 08:35:06

季泠


Categories: Spider, Python Growth Notes (Python成长笔记), Toolkit Source Code (工具集源码). Tags: python, web scraping (爬虫), github

Article link: https://blog.csdn.net/Heart_for_Ling/article/details/104007773


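A detailed walkthrough using one example. The complete script comes first, followed by an explanation of each step.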

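#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Author:LingInHeart

import os
import re

import requests

# Fetch the directory listing page of the image folder.
r = requests.get('https://github.com/MiracleYoung/You-are-Pythonista/tree/master/PythonExercise/App/plan_game/material_images')
# Extract every repository-relative path that points to a .png file.
urls = re.findall(r'MiracleYoung/You-are-Pythonista/blob/master/PythonExercise/App/plan_game/material_images/\w+\.png', r.text)
# Make sure the target directory exists before writing into it.
os.makedirs('D:/pict', exist_ok=True)
for v, url in enumerate(urls):
    # Turn the blob path into a raw.githubusercontent.com download URL.
    url = 'https://raw.githubusercontent.com/' + url.replace('/blob', '')
    path = 'D:/pict/' + url.split('/')[-1]
    r = requests.get(url)
    # The with statement closes the file automatically; no explicit close() is needed.
    with open(path, 'wb') as f:
        f.write(r.content)
    print('Image %d saved successfully!' % (v + 1))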

Step 1: find the link to the image folder in the browser.
Open one of the images and look at its link: https://github.com/MiracleYoung/You-are-Pythonista/blob/master/PythonExercise/App/plan_game/material_images/again.png

r = requests.get('https://github.com/MiracleYoung/You-are-Pythonista/tree/master/PythonExercise/App/plan_game/material_images')
urls = re.findall(r'MiracleYoung/You-are-Pythonista/blob/master/PythonExercise/App/plan_game/material_images/\w+\.png', r.text)

This fetches the page source and extracts all image links with a regular expression. To inspect the matches, print them with: for i in urls: print(i)
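To make the extraction step easier to try without a network connection, here is a minimal sketch that runs the same regular expression against a small hand-written string. The `sample_html` fragment and the file name `hero1.png` are assumptions made up for illustration; the markup GitHub actually serves may look different. The sketch also removes repeated matches, which the original script does not do.

```python
import re

# Hypothetical fragment of page source; the real GitHub markup may differ.
sample_html = '''
<a href="/MiracleYoung/You-are-Pythonista/blob/master/PythonExercise/App/plan_game/material_images/again.png">again.png</a>
<a href="/MiracleYoung/You-are-Pythonista/blob/master/PythonExercise/App/plan_game/material_images/again.png">again.png</a>
<a href="/MiracleYoung/You-are-Pythonista/blob/master/PythonExercise/App/plan_game/material_images/hero1.png">hero1.png</a>
<a href="/MiracleYoung/You-are-Pythonista/blob/master/PythonExercise/App/plan_game/material_images/README.md">README.md</a>
'''

# Same pattern as in the article: a fixed folder path followed by <name>.png.
PATTERN = r'MiracleYoung/You-are-Pythonista/blob/master/PythonExercise/App/plan_game/material_images/\w+\.png'

matches = re.findall(PATTERN, sample_html)
# dict.fromkeys keeps first-seen order while dropping repeated paths.
unique_paths = list(dict.fromkeys(matches))
for p in unique_paths:
    print(p)
```

Note that `\w+` only matches letters, digits, and underscores, so a file name containing a hyphen or a space would not be picked up by this pattern.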
Step 2: open the raw view of an image and look at its link:
https://raw.githubusercontent.com/MiracleYoung/You-are-Pythonista/master/PythonExercise/App/plan_game/material_images/again.png

Comparing the two links shows that every entry in urls lacks the https://raw.githubusercontent.com/ prefix and contains an extra /blob segment:

url = 'https://raw.githubusercontent.com/' + url.replace('/blob', '')

This processes each url by adding the prefix and removing the extra segment.
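The conversion can also be wrapped in a small function, which makes it easy to check against the raw link shown in step 2. This is only a sketch: `blob_to_raw` is a hypothetical helper name, and unlike the one-liner above it replaces just the first `/blob/` segment, a slightly stricter variant chosen here so that the rest of the path is left alone.

```python
def blob_to_raw(path):
    """Convert an 'owner/repo/blob/branch/...' path into a raw download URL."""
    # Replace only the first '/blob/' segment; later path components stay unchanged.
    return 'https://raw.githubusercontent.com/' + path.replace('/blob/', '/', 1)


raw_url = blob_to_raw(
    'MiracleYoung/You-are-Pythonista/blob/master/PythonExercise/App/plan_game/material_images/again.png'
)
print(raw_url)
```

For the example image this yields the same address as the raw link observed in the browser.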

Step 3: save the image

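    # Name the local file after the last component of the URL.
    path = 'D:/pict/' + url.split('/')[-1]
    r = requests.get(url)
    # Write the raw bytes; the with statement closes the file automatically.
    with open(path, 'wb') as f:
        f.write(r.content)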
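The saving step can be made a little more defensive. The sketch below is one possible variant, not the article's own code: `save_image` and `fetch_with_requests` are hypothetical helper names, and the timeout, the `raise_for_status()` check, and the automatic creation of the target directory are additions that the original script does not have. The downloader is passed in as a parameter so the file-writing logic can be tried offline with a stub.

```python
import os


def fetch_with_requests(url):
    """Download url with requests and return the body as bytes."""
    # Imported here so that save_image can be exercised offline with a stub fetcher.
    import requests
    r = requests.get(url, timeout=30)
    # Raise an exception on 4xx/5xx instead of silently saving an error page.
    r.raise_for_status()
    return r.content


def save_image(url, dest_dir, fetch=fetch_with_requests):
    """Download one image into dest_dir and return the path that was written."""
    # Create the target directory if it does not exist yet.
    os.makedirs(dest_dir, exist_ok=True)
    # Name the local file after the last component of the URL.
    path = os.path.join(dest_dir, url.split('/')[-1])
    data = fetch(url)
    with open(path, 'wb') as f:
        f.write(data)
    return path
```

Inside the loop from the full script, the body could then be reduced to a call such as `save_image(url, 'D:/pict')`, with `url` being the already converted raw link.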

The screenshot below shows the result of running the complete code.
[Screenshot: output of the complete script]

That's all.
