1. Background
The Case Western Reserve University (CWRU) bearing dataset is famous, and having recently started a new project I planned to use the CWRU data for validation. A quick web search turned up no ready-made copy, so I went to the official site and started downloading and renaming files by hand. A dozen files in, it occurred to me: why not download them with Python?
This post provides a Python script for downloading the CWRU data, along with the collected dataset.
(As it turned out, after finishing the downloads I found a few cloud-drive mirrors; these are collected in section 4, "Dataset Downloads".)
2. Tools
Python + re + requests
3. Code
import requests
import os
import re

def downloadFile(url, path, file_name, ext='mat'):
    # Fetch the file and write it to disk in binary mode
    r = requests.get(url)
    with open("%s/%s.%s" % (path, file_name, ext), "wb") as f:
        f.write(r.content)

def page_crawler(url):
    # Note: the header key must be 'User-Agent', not 'user_agent'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36 Edg/84.0.522.58'
    }
    # Save each page's files into a folder named after the last URL segment
    dir_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), os.path.split(url)[-1])
    if not os.path.exists(dir_path):
        os.makedirs(dir_path)
    page = requests.get(url, headers=headers)
    page.encoding = 'utf-8'
    # Collect every <a> tag on the page, then keep only the links to .mat files
    pattern = re.compile(r'<a href=.*</a>')
    for link in [i for i in pattern.findall(page.text) if '.mat' in i]:
        file_url = re.findall(r'(?<=href=").*?(?=">)', link)[0]
        file_name = re.findall(r'(?<=>).*(?=</a>)', link)[0]
        downloadFile(file_url, dir_path, file_name)

if __name__ == '__main__':
    urls = [
        'https://engineering.case.edu/bearingdatacenter/normal-baseline-data',
        'https://engineering.case.edu/bearingdatacenter/12k-fan-end-bearing-fault-data',
        'https://engineering.case.edu/bearingdatacenter/12k-drive-end-bearing-fault-data',
        'https://engineering.case.edu/bearingdatacenter/48k-drive-end-bearing-fault-data'
    ]
    for url in urls:
        page_crawler(url)
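Once downloaded, the files can be inspected with scipy. Below is a minimal sketch, assuming scipy is installed; `mat_summary` is a hypothetical helper name, and the `X097_DE_time`-style variable naming follows CWRU's convention of drive-end (DE) and fan-end (FE) accelerometer channels plus motor RPM:

```python
from scipy.io import loadmat

def mat_summary(path):
    """Return {variable name: array shape} for a CWRU-style .mat file."""
    data = loadmat(path)
    # Skip loadmat's internal keys such as '__header__' and '__version__'
    return {k: v.shape for k, v in data.items() if not k.startswith('__')}
```

Printing the summary of one file shows which channels it contains, which is handy before slicing the signals into training samples.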
4. Dataset Downloads
Aliyun:
https://www.alipan.com/s/xkPSvKfFzer
Extraction code: b1u8
(The drive would not let me share .mat files or archives, so every .mat was renamed to .txt. After downloading, run the included .bat file to batch-rename them back; if you are wary, open the .bat in a text editor first and check its contents — it only does some relative-path moves and file renames.)
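For readers not on Windows (where the .bat will not run), the same batch rename can be sketched in Python. `restore_mat_extensions` is a hypothetical helper, not part of the shared archive, and it only handles the extension change, not any path moves the .bat may do:

```python
import os

def restore_mat_extensions(folder):
    """Rename every *.txt file under `folder` back to *.mat, recursively."""
    renamed = []
    for root, _, files in os.walk(folder):
        for name in files:
            if name.endswith('.txt'):
                src = os.path.join(root, name)
                dst = src[:-len('.txt')] + '.mat'
                os.rename(src, dst)
                renamed.append(dst)
    return renamed
```

Point it at the extracted download folder and it returns the list of renamed files, so you can verify the count matches what you downloaded.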
Lanzou Cloud:
https://wwk.lanzoue.com/b0ny54e8j
Password: 9hz7
Baidu Cloud: