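Using the MovieLens dataset as an example, this post shows how to download a file to a specified path while displaying the download progress. I find this code fairly general-purpose, so I am recording it here.
The code is as follows: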
import os
from urllib.request import urlretrieve
from tqdm import tqdm
import zipfile
def download_data():
    """
    download movie data
    """
    data_name = 'ml-1m'
    # The target folder must be created beforehand: "../../dataset/download"
    save_path = '../../dataset/download/ml-1m.zip'
    url = 'http://files.grouplens.org/datasets/movielens/ml-1m.zip'
    # hash_code = 'c4d9eecfca2ab87c1945afe126590906'
    if os.path.exists(save_path):
        print('{} already exists.'.format(data_name))
    else:
        # For an explanation of DLProgress and urlretrieve, see:
        # https://blog.csdn.net/weixin_30713953/article/details/95244537
        #
        # urlretrieve(url, filename=None, reporthook=None, data=None) downloads remote data directly to a local file.
        # filename is the local save path; if omitted, urllib writes the data to a temporary file.
        # reporthook is a callback that fires once when the connection is established and again after each
        # data block is transferred, so it can be used to display the current download progress.
        # data is the payload to POST to the server.
        # The call returns a (filename, headers) tuple: filename is the local path, headers are the server's response headers.
        with DLProgress(unit='B', unit_scale=True, miniters=1, desc='Downloading {}'.format(data_name)) as pbar:
            urlretrieve(url, save_path, pbar.hook)
# Subclass tqdm
class DLProgress(tqdm):
    """
    Handle progress bar while downloading
    """
    last_block = 0

    def hook(self, block_num=1, block_size=1, total_size=None):
        """
        a hook function
        """
        self.total = total_size
        # Advance the bar by the bytes transferred since the previous call
        self.update((block_num - self.last_block) * block_size)
        self.last_block = block_num
if __name__ == '__main__':
    download_data()
    # extract_data()
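To make the byte accounting in `DLProgress.hook` easier to follow, here is a minimal sketch that replays the callback sequence without any network access or tqdm. `ByteCounter` is a hypothetical stand-in for the progress bar, not part of the original code. It also adds two things the original hook does not do: it ignores a non-positive `total_size` (urlretrieve passes -1 when the server reports no size), and it clamps the final, usually partial, block so the count does not overshoot the total.

```python
class ByteCounter:
    """Hypothetical stand-in for a progress bar: it only counts bytes."""

    def __init__(self):
        self.total = None      # total size reported by the server, if known
        self.downloaded = 0    # bytes accounted for so far
        self.last_block = 0    # block_num seen on the previous call

    def hook(self, block_num=1, block_size=1, total_size=None):
        # Only trust a positive size; -1 means the size is unknown.
        if total_size is not None and total_size > 0:
            self.total = total_size
        # Same delta computation as DLProgress.hook
        delta = (block_num - self.last_block) * block_size
        self.last_block = block_num
        # The last block is usually partial, so clamp to the known total.
        if self.total is not None:
            delta = min(delta, self.total - self.downloaded)
        self.downloaded += delta
        return delta


counter = ByteCounter()
# Simulate the callbacks for a 2500-byte file read in 1024-byte blocks:
# one call on connect (block_num=0), then one call per block read.
for block_num in range(4):
    counter.hook(block_num, 1024, 2500)
print(counter.downloaded, counter.total)
```

The same clamping idea could be applied inside a tqdm subclass, but whether that is worthwhile depends on how much a slightly overshooting bar matters in practice.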
The following code for extracting the MovieLens archive may also be a useful reference.
def extract_data():
    """
    extract data from ml-1m.zip
    """
    data_name = 'ml-1m'
    data_path = '../../dataset/download/ml-1m.zip'
    extract_path = '../'
    if not os.path.exists(os.path.join(extract_path, data_name)):
        os.makedirs(os.path.join(extract_path, data_name))
        unzip(data_name, data_path, extract_path)
        print('extracting done')
    else:
        print('Directory already exists! Please delete it first!')


def unzip(data_name, from_path, to_path):
    print('extracting {} ....'.format(data_name))
    with zipfile.ZipFile(from_path) as zf:
        zf.extractall(to_path)
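The download code carries a commented-out `hash_code`, which suggests checking the archive before extracting it. The sketch below shows one way that could look using only the standard library. `md5_of_file` and `verify_and_extract` are hypothetical helpers introduced for illustration, and the demo runs on a small zip it creates itself; whether the hash in the comment matches the current ml-1m.zip is not verified here.

```python
import hashlib
import os
import tempfile
import zipfile


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_extract(zip_path, to_path, expected_md5=None):
    """Hypothetical helper: optionally check the MD5, then extract."""
    if expected_md5 is not None and md5_of_file(zip_path) != expected_md5:
        raise ValueError('checksum mismatch for {}'.format(zip_path))
    # Create the target directory if needed instead of requiring it up front
    os.makedirs(to_path, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(to_path)
        return zf.namelist()


# Self-contained demo on a throwaway archive (no download involved)
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = os.path.join(tmp, 'demo.zip')
    with zipfile.ZipFile(demo_zip, 'w') as zf:
        zf.writestr('demo/readme.txt', 'hello')
    names = verify_and_extract(demo_zip, os.path.join(tmp, 'out'),
                               expected_md5=md5_of_file(demo_zip))
    extracted_ok = os.path.exists(os.path.join(tmp, 'out', 'demo', 'readme.txt'))
print(names, extracted_ok)
```

For the real dataset, the call would presumably pass the path from `download_data` and the expected hash, for example `verify_and_extract('../../dataset/download/ml-1m.zip', '../', expected_md5=hash_code)`.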
The final file locations are as follows:
References
- https://github.com/HeartFu/DSSM
- https://blog.csdn.net/weixin_30713953/article/details/95244537