Batch downloading files with Python

Sun_Weiss

Article link: https://blog.csdn.net/Sun_Weiss/article/details/113933288

Given that the URL of every file is already known, this script downloads all of the files in one batch.

The URLs and the corresponding file names/IDs are stored in an Excel workbook.
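
The script expects one row per file, with an ID column named `编号` ("ID") and a `url` column. As a minimal sketch, assuming that layout, the rows can be turned into (fileId, url) pairs with pandas before any downloading starts. The helper name `iter_download_tasks` and the sample URLs are hypothetical, and an in-memory DataFrame stands in for the Excel file.

```python
import pandas as pd

def iter_download_tasks(data):
    """Yield (fileId, url) pairs, skipping rows whose 'url' cell is empty."""
    # dropna treats both NaN (blank Excel cells) and None as missing
    rows = data.dropna(subset=['url'])
    for file_id, url in zip(rows['编号'], rows['url']):
        yield str(file_id), str(url)

# Hypothetical sample standing in for pd.read_excel(...)
data = pd.DataFrame({
    '编号': [101, 102, 103],
    'url': ['https://example.com/a.pdf', None, 'https://example.com/c.png'],
})
tasks = list(iter_download_tasks(data))
print(tasks)
```

If the ID column contains blank cells, pandas reads it as floats and an ID such as 101 turns into `'101.0'`; passing `dtype={'编号': str}` to `pd.read_excel` keeps the IDs as text.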

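requests fetches the content behind each URL, filetype identifies the file type from that content, and each file is then saved under its ID.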
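
filetype recognizes a file by the "magic number" at the start of its bytes rather than by its URL. The sketch below is a hand-rolled stand-in, not the filetype library itself, that illustrates the idea with a handful of well-known signatures; the signature table and the function name `guess_extension` are illustrative assumptions. Like `filetype.guess`, it returns None for unrecognized content, which is the case the download code has to handle.

```python
# Illustrative subset of file signatures (leading "magic" bytes)
SIGNATURES = [
    (b'%PDF-', 'pdf'),
    (b'\x89PNG\r\n\x1a\n', 'png'),
    (b'\xff\xd8\xff', 'jpg'),
    (b'GIF87a', 'gif'),
    (b'GIF89a', 'gif'),
    (b'PK\x03\x04', 'zip'),
]

def guess_extension(content: bytes):
    """Return the extension for the first matching signature, or None."""
    for magic, ext in SIGNATURES:
        if content.startswith(magic):
            return ext
    # Unknown content, e.g. plain text or an HTML error page
    return None

print(guess_extension(b'%PDF-1.7 ...'))
```

Office formats such as .xlsx and .docx are ZIP containers, so this naive table labels them `zip`; a fuller detector has to look inside the archive to tell them apart.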

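The filetype package must be installed first: pip install filetype (pandas also needs openpyxl to read .xlsx files: pip install openpyxl)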

# -*- coding: utf-8 -*-
"""
Created on Mon Feb 22 10:24:35 2021

@author: weisssun
"""
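import os

import filetype
import pandas as pd
import requests

myHeaders = {'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36"}

# Define the file download function downloadFile
def downloadFile(url, savePath, fileId):
    # Arguments: the file URL, the destination folder savePath, and the file name/ID fileId
    try:
        webPage = requests.get(url, headers=myHeaders, timeout=5)
        # Treat HTTP error codes (404, 500, ...) as failures instead of saving the error page
        webPage.raise_for_status()
    except requests.exceptions.Timeout:
        print(fileId + ' timed out')
        return
    except requests.exceptions.RequestException as e:
        print(fileId + ' request failed: ' + str(e))
        return

    # Raw bytes of the response body
    webContent = webPage.content
    # Identify the file type from its content; guess() returns None if it is not recognized
    kind = filetype.guess(webContent)
    if kind is None:
        print(fileId + ' unknown file type, skipped')
        return
    # Build the save path from the folder, the file ID, and the detected extension
    file_path = os.path.join(savePath, fileId + '.' + kind.extension)
    # Write the content to disk; "with" closes the file even if writing fails
    with open(file_path, 'wb') as f:
        f.write(webContent)

# Read the Excel workbook holding the URLs and file IDs (adjust the path to your own file)
data = pd.read_excel(r'D:\保存url和文件编号的文档.xlsx')
# To read a specific worksheet instead of the first one:
#data = pd.read_excel(r'D:\保存url和文件编号的文档.xlsx', sheet_name='abc')

# Folder where the downloaded files are saved (created if it does not exist)
savePath = 'D:/文件下载/'
os.makedirs(savePath, exist_ok=True)

for i in data.index:
    # '编号' (ID) column: used as the file name; 'url' column: the download link
    fileId = str(data.loc[i, '编号'])
    url = data.loc[i, 'url']
    # Skip rows whose URL cell is empty
    if pd.isna(url):
        continue
    downloadFile(str(url), savePath, fileId)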
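
When `filetype.guess` returns None (plain-text files, HTML pages, formats it does not know), the script has no extension to use. One possible fallback, sketched here with the standard library only, is to take the extension from the response's Content-Type header and then from the URL path, and to replace characters that Windows forbids in file names before joining the path. The helper names `fallback_extension` and `build_save_path` are hypothetical.

```python
import mimetypes
import os
import re
from urllib.parse import urlparse

# Characters that Windows does not allow in file names
_ILLEGAL = re.compile(r'[<>:"/\\|?*]')

def fallback_extension(url, content_type=None):
    """Guess an extension (without the dot) from Content-Type, then from the URL path."""
    if content_type:
        # Drop parameters such as "; charset=utf-8" before the lookup
        ext = mimetypes.guess_extension(content_type.split(';')[0].strip())
        if ext:
            return ext.lstrip('.')
    # e.g. "https://host/files/report.pdf?x=1" -> path "/files/report.pdf" -> ".pdf"
    suffix = os.path.splitext(urlparse(url).path)[1]
    return suffix.lstrip('.') or None

def build_save_path(save_dir, file_id, extension):
    """Join folder, sanitized file ID, and optional extension into one path."""
    safe_id = _ILLEGAL.sub('_', file_id).strip() or 'unnamed'
    name = safe_id + '.' + extension if extension else safe_id
    return os.path.join(save_dir, name)

print(build_save_path('downloads', 'a/b:c', fallback_extension('https://example.com/r.pdf')))
```

Inside downloadFile, the extension could then fall back to `fallback_extension(url, webPage.headers.get('Content-Type'))` whenever `filetype.guess(webContent)` returns None, instead of skipping the file.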
