Downloading the Mixing Secrets dataset

xianxianlele

Original article: https://blog.csdn.net/S10xuexi/article/details/118055271

Dataset information

Brief introduction

Description from the original paper

The dataset presented in this paper is sourced from the website: Mixing Secrets for The Small Studio. The website hosts around 300 multi-tracks free of charge for academic purposes, making it perfect for use in the MIR community. The tracks are professionally recorded. These multi-tracks are also listed on The Open MultiTrack Testbed [4]. The multi-tracks span several genres, and while a detailed quantitative analysis has not been carried out yet, it is definitely a promising source of data for tasks such as source separation and instrument recognition.

Other notes

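The tracks include instruments such as 'Bass,' 'Drums,' 'Vocals,' 'Electrical Guitar,' 'Acoustic Guitar,' and 'Piano.' These six are the classes the paper uses for source separation, but the dataset itself is not limited to them.
The "several genres" mentioned in the description correspond to the website's genre categories, among them: Acoustic / Jazz / Country / Orchestral; Electronica / Dance / Experimental; Pop / Singer-Songwriter; Alt Rock / Blues / Country Rock / Indie / Funk / Reggae; Rock / Punk / Metal; Hip-Hop / R&B.
The official website also offers short audio previews of the tracks, as well as a database and other information.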
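Since the paper works with only six instrument classes, a natural first step after downloading is to sort each track's stems into those classes. The sketch below is a hypothetical keyword-based mapping: it assumes the instrument can be guessed from the stem file name (e.g. `01_Kick.wav`, `12_LeadVox.wav`), and the `KEYWORDS` table, `classify_stem` and `group_stems` are illustrative names, not taken from the paper or from any annotation shipped with the dataset.

```python
import os
from collections import defaultdict

# Hypothetical keyword table (assumption, not from the paper): lower-case
# substrings that suggest each of the six classes listed above. Dict order
# matters: Drums is checked before Bass so that e.g. "BassDrum" is labeled Drums.
KEYWORDS = {
    'Drums': ['kick', 'snare', 'hat', 'tom', 'overhead', 'drum', 'cymbal'],
    'Bass': ['bass'],
    'Vocals': ['vox', 'vocal'],
    'Electrical Guitar': ['elecgtr', 'electric', 'egtr'],
    'Acoustic Guitar': ['acgtr', 'acoustic', 'agtr'],
    'Piano': ['piano'],
}


def classify_stem(filename):
    """Return the first class whose keyword occurs in the file name, else 'Other'."""
    name = os.path.splitext(filename)[0].lower()
    for label, words in KEYWORDS.items():
        if any(w in name for w in words):
            return label
    return 'Other'


def group_stems(track_dir):
    """Group every .wav file found under track_dir by its guessed class."""
    groups = defaultdict(list)
    for root, _dirs, files in os.walk(track_dir):
        for f in sorted(files):
            if f.lower().endswith('.wav'):
                groups[classify_stem(f)].append(os.path.join(root, f))
    return dict(groups)
```

For example, `group_stems(...)` can be called on each directory created by the download script below. Real stem names vary a lot, so files that land in `'Other'` are worth inspecting by hand before trusting the labels.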

Downloading the dataset

"""
Author: kokole

Download Mixing Secrets dataset for musical instrument recognition

Mixing Secrets
    - Homepage: https://www.cambridge-mt.com/ms/mtk/
    - Paper: https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2017/10/Gururani_Lerch_2017_Mixing-Secrets.pdf
"""

import requests
from bs4 import BeautifulSoup
import re
import os
from tqdm import tqdm
import zipfile
import tempfile

# Request headers: a browser-like User-Agent plus a Referer pointing at the site
Hostreferer = {
    'User-Agent': "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50",
    'Referer': 'https://www.cambridge-mt.com/'
}

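# Fetch the HTML of a page; return an empty string if the request fails
def getHTMLText(url):
    try:
        r = requests.get(url, timeout=30, headers=Hostreferer)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException as e:
        print(f'Failed to fetch {url}: {e}')
        return ""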

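# Parse the HTML and return the download URLs of the full multi-track archives
def parse(html):
    url_list = []
    soup = BeautifulSoup(html, 'html.parser')
    # Download links sit inside <span class="m-mtk-download__links">
    span_all = soup.find_all('span', {'class': 'm-mtk-download__links'})
    # Dots escaped so they match literally; keep only "..._Full.zip" archives
    pattern = re.compile(r'^https://mtkdata\.cambridgemusictechnology\.co\.uk/(.*)_Full\.zip$')
    for span in span_all:
        # Check every link in the span, not just the first one
        for a in span.find_all('a', href=True):
            if pattern.match(a['href']):
                url_list.append(a['href'])

    return url_list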


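# Download each archive and extract it into its own directory under `path`
def save_file(url_list, path):

    for url in tqdm(url_list):
        # e.g. ".../Artist_Song_Full.zip" -> "<path>/Artist_Song_Full"
        save_path = os.path.join(path, os.path.splitext(url.split('/')[-1])[0])
        if os.path.exists(save_path):
            continue  # already extracted by a previous run

        try:
            # Stream the archive to a temporary file instead of holding it all in memory
            with requests.get(url, headers=Hostreferer, stream=True, timeout=60) as resp:
                resp.raise_for_status()
                with tempfile.TemporaryFile() as tmp_file:
                    for chunk in resp.iter_content(chunk_size=1 << 20):
                        tmp_file.write(chunk)
                    tmp_file.seek(0)
                    # Extract all members of the archive into save_path
                    with zipfile.ZipFile(tmp_file, mode='r') as zf:
                        zf.extractall(save_path)
        except (requests.RequestException, zipfile.BadZipFile) as e:
            # Report the failure and continue with the remaining archives
            print(f'Skipping {url}: {e}')
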
if __name__ == '__main__':
    html = getHTMLText('https://www.cambridge-mt.com/ms/mtk/')
    url_list = parse(html)
    save_file(url_list, '/Users/xxx/research/dataset/Mixing_Secrets/Unzipped')
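One weakness of `save_file` is that it treats an existing directory as "already downloaded": if a run is interrupted during extraction, the half-filled directory is silently skipped on the next run. A minimal sketch of one fix, assuming the archive has already been written to a file on disk, is to extract into a temporary sibling directory and rename it to the final name only once extraction succeeds. `extract_atomically` is a hypothetical helper, not part of the original script.

```python
import os
import shutil
import tempfile
import zipfile


def extract_atomically(zip_path, save_path):
    """Extract zip_path so that save_path only appears once extraction is complete.

    Hypothetical helper: members are first extracted into a temporary directory
    created next to save_path (same filesystem, so os.replace is a plain rename).
    save_path is expected not to exist yet, as checked by the caller.
    """
    parent = os.path.dirname(os.path.abspath(save_path))
    os.makedirs(parent, exist_ok=True)
    tmp_dir = tempfile.mkdtemp(prefix='.partial_', dir=parent)
    try:
        with zipfile.ZipFile(zip_path, mode='r') as zf:
            zf.extractall(tmp_dir)
        # Move the finished directory to its final name in one step
        os.replace(tmp_dir, save_path)
    except BaseException:
        # Remove the half-extracted directory so a later run starts clean
        shutil.rmtree(tmp_dir, ignore_errors=True)
        raise
```

In `save_file`, the downloaded archive would be written to a file and passed to this helper instead of being extracted directly into `save_path`, so the `os.path.exists(save_path)` check only ever sees complete extractions.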