Dataset information
Brief introduction to the dataset
Original description
The dataset presented in this paper is sourced from the website: Mixing Secrets for The Small Studio. The website hosts around 300 multi-tracks free of charge for academic purposes, making it perfect for use in the MIR community. The tracks are professionally recorded. These multi-tracks are also listed on The Open MultiTrack Testbed [4]. The multi-tracks span several genres, and while a detailed quantitative analysis has not been carried out yet, it is definitely a promising source of data for tasks such as source separation and instrument recognition.
Other notes
- The paper uses the instruments 'Bass', 'Drums', 'Vocals', 'Electrical Guitar', 'Acoustic Guitar', and 'Piano' for separation, but the tracks are not limited to these instruments.
- The "several genres" cover: Acoustic / Jazz / Country / Orchestral; Electronica / Dance / Experimental; Pop / Singer-Songwriter; Alt Rock / Blues / Country Rock / Indie / Funk / Reggae; Rock / Punk / Metal; Hip-Hop / R&B.
- The official website offers preview excerpts for listening, along with other information such as the database.
Downloading the dataset
"""
Author: kokole
Download Mixing Secrets dataset for musical instrument recognition
Mixing Secrets
- Homepage: https://www.cambridge-mt.com/ms/mtk/
- Paper: https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2017/10/Gururani_Lerch_2017_Mixing-Secrets.pdf
"""
import requests
from bs4 import BeautifulSoup
import re
import os
from tqdm import tqdm
import zipfile
import tempfile
# Request headers sent when fetching the page (browser-like User-Agent plus Referer)
Hostreferer = {
    'User-Agent': "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50",
    'Referer': 'https://www.cambridge-mt.com/'
}
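# Fetch the HTML of a page; return an empty string if the request fails
def getHTMLText(url):
    try:
        r = requests.get(url, timeout=30, headers=Hostreferer)
        print(r)  # show the response status for debugging
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        return ""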
# Parse the HTML and return the list of download URLs, one per multi-track
def parse(html):
    url_list = []
    soup = BeautifulSoup(html, 'html.parser')
    span_all = soup.find_all('span', {'class': 'm-mtk-download__links'})
    for span in span_all:
        # Skip spans without a link; keep only the "*_Full.zip" archives
        if span.a is None:
            continue
        href = span.a.get('href', '')
        if re.match(r'^https://mtkdata\.cambridgemusictechnology\.co\.uk/(.*)_Full\.zip$', href):
            url_list.append(href)
    return url_list
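# Download each archive and extract it into its own directory under path
def save_file(url_list, path):
    for url in tqdm(url_list):
        # The directory name is the archive file name up to its first dot
        save_path = os.path.join(path, url.split('/')[-1].split('.')[0])
        if os.path.exists(save_path):
            continue  # already downloaded and extracted
        response = requests.get(url, allow_redirects=True)
        response.raise_for_status()
        # Buffer the archive in a temporary file, then extract every member
        with tempfile.TemporaryFile() as _tmp_file:
            _tmp_file.write(response.content)
            with zipfile.ZipFile(_tmp_file, mode='r') as zf:
                zf.extractall(save_path)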
if __name__ == '__main__':
    html = getHTMLText('https://www.cambridge-mt.com/ms/mtk/')
    url_list = parse(html)
    save_file(url_list, '/Users/xxx/research/dataset/Mixing_Secrets/Unzipped')
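
The script above keeps only the `*_Full.zip` links and skips any archive whose target directory already exists. The sketch below pulls that selection logic out into small functions that can be checked without touching the network. It is an illustration, not part of the original script: the helper names `select_full_zip_urls`, `target_dir`, and `pending_urls` are hypothetical, and the example URL path is made up. Note also that `target_dir` uses `os.path.splitext`, so it keeps any earlier dots in the file name, whereas the script truncates the name at its first dot.

```python
import os
import re
from urllib.parse import urlparse

# Same URL shape the script filters for, with the dots escaped.
FULL_ZIP_PATTERN = re.compile(
    r"^https://mtkdata\.cambridgemusictechnology\.co\.uk/(.*)_Full\.zip$"
)


def select_full_zip_urls(hrefs):
    """Keep only links that point at a '*_Full.zip' archive (hypothetical helper)."""
    return [h for h in hrefs if h and FULL_ZIP_PATTERN.match(h)]


def target_dir(url, root):
    """Map an archive URL to its extraction folder under root (hypothetical helper)."""
    filename = os.path.basename(urlparse(url).path)
    stem, _ext = os.path.splitext(filename)  # strips only the final '.zip'
    return os.path.join(root, stem)


def pending_urls(urls, root):
    """Return the URLs whose extraction folder does not exist yet (hypothetical helper)."""
    return [u for u in urls if not os.path.exists(target_dir(u, root))]


if __name__ == "__main__":
    # Made-up links, used only to exercise the filter.
    example_hrefs = [
        "https://mtkdata.cambridgemusictechnology.co.uk/example/SomeSong_Full.zip",
        "https://mtkdata.cambridgemusictechnology.co.uk/example/SomeSong_Excerpt.zip",
    ]
    for url in pending_urls(select_full_zip_urls(example_hrefs), "Unzipped"):
        print(url, "->", target_dir(url, "Unzipped"))
```

Separating the selection step from the download makes it possible to preview what a run would fetch before any of the large archives are transferred.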