Downloading the Mixing Secrets Dataset

Dataset information

Brief introduction

Description from the paper

The dataset presented in this paper is sourced from the website: Mixing Secrets for The Small Studio. The website hosts around 300 multi-tracks free of charge for academic purposes, making it perfect for use in the MIR community. The tracks are professionally recorded. These multi-tracks are also listed on The Open MultiTrack Testbed [4]. The multi-tracks span several genres, and while a detailed quantitative analysis has not been carried out yet, it is definitely a promising source of data for tasks such as source separation and instrument recognition.

Other notes

  1. The instruments covered by the tracks include 'Bass', 'Drums', 'Vocals', 'Electrical Guitar', 'Acoustic Guitar', and 'Piano'. These are the ones the paper uses for separation, but the dataset is not limited to them.
  2. The genres include Acoustic / Jazz / Country / Orchestral; Electronica / Dance / Experimental; Pop / Singer-Songwriter; Alt Rock / Blues / Country Rock / Indie / Funk / Reggae; Rock / Punk / Metal; and Hip-Hop / R&B.
  3. The official site offers preview excerpts, along with other information about the database.

Downloading the dataset

"""
Author: kokole

Download Mixing Secrets dataset for musical instrument recognition

Mixing Secrets
    - Homepage: https://www.cambridge-mt.com/ms/mtk/
    - Paper: https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2017/10/Gururani_Lerch_2017_Mixing-Secrets.pdf
"""

import requests
from bs4 import BeautifulSoup
import re
import os
from tqdm import tqdm
import zipfile
import tempfile

Hostreferer = {
    'User-Agent': "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50",
    'Referer': 'https://www.cambridge-mt.com/'
}

Picreferer = {
    'User-Agent': "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50",
    'Referer': 'https://www.cambridge-mt.com/'
}

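# Fetch the HTML of a page; return an empty string if the request fails
def getHTMLText(url):
    try:
        r = requests.get(url, timeout=30, headers=Hostreferer)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException as e:
        # Report the failure instead of silently swallowing every exception
        print(f"Request to {url} failed: {e}")
        return ""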

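# Parse the HTML and return the download URLs of the full multi-track archives
def parse(html):
    url_list = []
    soup = BeautifulSoup(html, 'html.parser')
    span_all = soup.find_all('span', {'class': 'm-mtk-download__links'})
    # Keep only "_Full.zip" links; dots are escaped so they match literally
    pattern = re.compile(r'^https://mtkdata\.cambridgemusictechnology\.co\.uk/(.*)_Full\.zip$')
    for span in span_all:
        # A span may not contain an <a href=...>, so guard before indexing
        link = span.find('a', href=True)
        if link and pattern.match(link['href']):
            url_list.append(link['href'])

    return url_list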


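# Download each archive and extract it into its own directory under `path`
def save_file(url_list, path):
    for url in tqdm(url_list):
        # Name the target directory after the archive file, without its extension
        save_path = os.path.join(path, url.split('/')[-1].split('.')[0])
        # Skip archives that were already extracted on a previous run
        if os.path.exists(save_path):
            continue

        try:
            # Stream the response so a large archive is never held fully in memory
            with requests.get(url, allow_redirects=True, stream=True, timeout=60) as resp:
                resp.raise_for_status()
                with tempfile.TemporaryFile() as tmp_file:
                    for chunk in resp.iter_content(chunk_size=1 << 20):
                        tmp_file.write(chunk)
                    tmp_file.seek(0)
                    with zipfile.ZipFile(tmp_file, mode='r') as zf:
                        zf.extractall(save_path)
        except (requests.RequestException, zipfile.BadZipFile) as e:
            # Report the failed archive and move on to the next one
            print(f"Skipping {url}: {e}")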
    
if __name__ == '__main__':
    html = getHTMLText('https://www.cambridge-mt.com/ms/mtk/')
    url_list = parse(html)
    save_file(url_list, '/Users/xxx/research/dataset/Mixing_Secrets/Unzipped')
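
The link filter used in parse can be checked offline before any request is sent to the site. The sketch below applies the same pattern, with the literal dots escaped, to a few made-up links. The helper name keep_full_archives and the sample file names are hypothetical and only illustrate that links ending in `_Full.zip` on the download host are kept while everything else is dropped.

```python
import re

# Same pattern as in parse(), with literal dots escaped
FULL_ZIP = re.compile(
    r'^https://mtkdata\.cambridgemusictechnology\.co\.uk/(.*)_Full\.zip$'
)


def keep_full_archives(hrefs):
    """Return only the links that point at a full multi-track archive."""
    return [h for h in hrefs if FULL_ZIP.match(h)]


# Hypothetical links, for illustration only (not real file names)
sample = [
    'https://mtkdata.cambridgemusictechnology.co.uk/ExampleArtist_ExampleSong_Full.zip',
    'https://mtkdata.cambridgemusictechnology.co.uk/ExampleArtist_ExampleSong_Excerpt.mp3',
    'https://www.cambridge-mt.com/ms/mtk/',
]

print(keep_full_archives(sample))
```

Running this on a handful of links copied from the page is a quick way to confirm the filter still matches if the site layout changes.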