Paper notes: Pop Music Transformer: Beat-based Modeling and Generation of Expressive Pop Piano Compositions

Wichitas

Last modified 2024-07-31 15:09:17

Tags: transformer, paper reading, deep learning

First published 2024-07-06 14:16:38

Article link: https://blog.csdn.net/Wichitas/article/details/140208231


Overview: the data representation and preprocessing that turns MIDI into REMI.

As the figure shows, the representation is organized bar by bar.

Paper analysis:

1. Intro

Key phrases from the paper: "the perception of rhythm"; "Repetition and long-term structure are also important factors that make a musical piece coherent and understandable."

Research direction chosen: "two critical elements in the aforementioned approach: the way music is converted into discrete tokens for language modeling, ..."

Problem with the MIDI-like representation: "Hence, high-level (semantic) information of music, such as downbeat, tempo and chord ..."

Because the MIDI-like representation suffers from error accumulation, the paper says: "To address this issue, we propose to use the combination of Bar and Position events instead."

"we introduce the Bar event to indicate the beginning of a bar, and the Position events to point to certain locations within a bar. For example, Position (9/16) indicates that we are pointing to the middle of a bar, which is quantized to 16 regions in this implementation (see Section 3 for details). The combination of Position and Bar therefore provides an explicit metrical grid to model music, in a beat-based manner."
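To make the error-accumulation point concrete, here is a small sketch that applies the same one-step mistake to a relative (time-shift) encoding and to an absolute (position-in-bar) encoding. The onset values and variable names are made up for illustration and are not taken from the paper or its code.

```python
from itertools import accumulate

# Intended onsets (in 16th-note steps) of five notes inside one 16-step bar.
true_onsets = [0, 4, 8, 12, 15]

# MIDI-like view: each note is reached by a relative time shift from the previous one.
time_shifts = [b - a for a, b in zip([0] + true_onsets[:-1], true_onsets)]

# Suppose the model gets the second shift wrong by one step.
bad_shifts = list(time_shifts)
bad_shifts[1] += 1
midi_like_onsets = list(accumulate(bad_shifts))

# REMI-like view: each note carries its absolute position within the bar.
remi_onsets = list(true_onsets)
remi_onsets[1] += 1  # the same single mistake

midi_like_errors = [a - b for a, b in zip(midi_like_onsets, true_onsets)]
remi_errors = [a - b for a, b in zip(remi_onsets, true_onsets)]

print(midi_like_errors)  # [0, 1, 1, 1, 1]: every later note is shifted
print(remi_errors)       # [0, 1, 0, 0, 0]: only the wrong note is affected
```

With relative shifts, one wrong value moves every note after it; with absolute positions, the mistake stays local to that note.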

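In a music generation model, the "Bar" and "Position" events help the model understand and generate structured music. Specifically:

- **"Bar"** marks a musical bar (measure), the basic structural unit of the music.
- **"Position"** marks a specific point in time within a bar, subdividing the bar.

### What Bar and Position do

While the model learns to generate music, "Bar" and "Position" give it explicit timing and structural cues, so it can pick up the temporal and rhythmic structure of the music.

Here Q denotes the time resolution adopted to represent a bar. For example, a 16th-note time grid corresponds to Q = 16.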
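As a rough illustration of the Q = 16 grid, the sketch below quantizes absolute note onsets into (bar, position) pairs and emits Bar/Position events. The function names, the 480 ticks-per-beat resolution, and the 4/4 meter are assumptions made for this example; this is not the repository's preprocessing code.

```python
def to_bar_position(onset_tick, ticks_per_beat=480, beats_per_bar=4, q=16):
    """Map an absolute onset in ticks to (bar index from 0, position from 1 to q)."""
    ticks_per_bar = ticks_per_beat * beats_per_bar
    ticks_per_step = ticks_per_bar / q
    bar = onset_tick // ticks_per_bar
    step = int(round((onset_tick % ticks_per_bar) / ticks_per_step))
    if step == q:  # rounded past the end of this bar: move to the next bar
        bar, step = bar + 1, 0
    return bar, step + 1


def to_events(onsets, q=16):
    """Turn onsets into a flat list of 'Bar' and 'Position (p/q)' event strings."""
    events = []
    current_bar = -1
    for onset in sorted(onsets):
        bar, pos = to_bar_position(onset, q=q)
        while current_bar < bar:  # emit a Bar event for every bar we enter
            events.append("Bar")
            current_bar += 1
        events.append(f"Position ({pos}/{q})")
    return events


print(to_events([0, 960, 1920]))
# ['Bar', 'Position (1/16)', 'Position (9/16)', 'Bar', 'Position (1/16)']
```

An onset halfway through the bar (tick 960 here) lands on Position (9/16), matching the paper's remark that Position (9/16) points to the middle of a bar.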

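#### Fast learning during training
The model picks up the meaning of "Bar" and "Position" early in training. For example:

- Within a bar, the order of "Position" events is fixed: `Position (9/Q)` never appears before `Position (3/Q)` unless a new "Bar" comes in between.
- With this structure the model can quickly correct its own mistakes, without a post-processing step.

### Example: the learning curve for Bar and Position

Suppose we have a Transformer-based music generation model. Early in training, the model may make errors such as:

- Note positions within a bar are out of order.
- Notes are repeated at the same position.

After a few epochs of training, however, the model learns the following rules:

1. **Position order**:
- Within a bar, notes are ordered from `Position 1/Q` to `Position Q/Q`.

2. **Bar boundaries**:
- When a bar ends (for example after reaching `Position Q/Q`), a new bar begins and positions restart from `Position 1/Q`.

### Intuition

The simple example below shows how the model comes to understand "Bar" and "Position" over the course of training.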

```python
# Note positions generated by the model (early in training)
generated_positions = [
    (1, 4), (1, 3), (1, 1), (1, 7), (1, 2),  # bar 1
    (2, 5), (2, 8), (2, 6), (2, 3), (2, 1)   # bar 2
]

# Format: (Bar, Position)
# Positions within the same bar are out of order
```

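#### After a few epochs of training (correct generation)

```python
# Note positions generated by the model (after a few epochs)
generated_positions = [
    (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (1, 7), (1, 8),  # bar 1
    (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6), (2, 7), (2, 8)   # bar 2
]

# Format: (Bar, Position)
# Positions within the same bar are now in order
```

By introducing "Bar" and "Position", the model can quickly learn the structure of the music and generate pieces that respect it. This makes learning more efficient and removes the need for post-processing, which helps the quality and coherence of the generated music.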
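The ordering rule can be checked mechanically. The following is a minimal sketch of such a check on `(Bar, Position)` pairs; the function name and the `allow_repeats` flag are my own, and the check is only an illustration of the rule described in these notes, not something from the paper's code.

```python
def positions_in_order(generated_positions, allow_repeats=False):
    """Check that (bar, position) pairs never go backwards.

    With allow_repeats=False, two consecutive notes at the same
    (bar, position) also count as a violation.
    """
    pairs = zip(generated_positions, generated_positions[1:])
    if allow_repeats:
        return all(prev <= cur for prev, cur in pairs)
    return all(prev < cur for prev, cur in pairs)


early = [
    (1, 4), (1, 3), (1, 1), (1, 7), (1, 2),
    (2, 5), (2, 8), (2, 6), (2, 3), (2, 1),
]
later = [(bar, pos) for bar in (1, 2) for pos in range(1, 9)]

print(positions_in_order(early))  # False
print(positions_in_order(later))  # True
```

Tuples compare lexicographically in Python, so a larger bar number always counts as "later" regardless of the position inside it.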

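On Tempo: the tempo events are derived with tempo estimation functions from the audio domain.

In music generation, "**generate from scratch**" and "**generate continuation**" are two different generation modes. They differ in whether a precondition or initial input guides the generation.

### Generate from Scratch

In this mode the model generates a piece from nothing, with no initial condition or prompt. The music comes entirely from what the model learned from the distribution of the training data. A parameter such as `n_target_bar=16` means the generated music is 16 bars long.

Characteristics:
- **Randomness**: with no input prompt, the result depends entirely on random sampling and the sampling parameters (such as temperature and top-k).
- **Exploration**: suited to exploring what the model can do, or to generating entirely new, unconstrained music.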

Code example:

```python
model.generate(
    n_target_bar=16,
    temperature=1.2,
    topk=5,
    output_path='./result/from_scratch.midi',
    prompt=None)  # no prompt
```
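The `temperature` and `topk` arguments usually control how the next token is drawn from the model's output. The sketch below shows generic temperature-scaled top-k sampling using only the standard library. The function `sample_top_k` is hypothetical and written for this note; the repository's own sampling routine may differ in its details.

```python
import math
import random


def sample_top_k(logits, temperature=1.2, topk=5, rng=random):
    """Sample one index from logits with temperature scaling and top-k filtering."""
    # Higher temperature flattens the distribution, lower sharpens it.
    scaled = [x / temperature for x in logits]
    # Keep only the indices of the k largest scaled logits.
    top = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:topk]
    # Softmax weights over the kept candidates (shifted by the max for stability).
    m = max(scaled[i] for i in top)
    weights = [math.exp(scaled[i] - m) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


rng = random.Random(0)
logits = [3.0, 2.5, -1.0, -2.0]
draws = [sample_top_k(logits, temperature=1.2, topk=2, rng=rng) for _ in range(10)]
print(draws)  # only indices 0 and 1 can appear, since topk=2
```

With `topk=1` this reduces to always picking the most likely token; larger `topk` and higher `temperature` give more varied output.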

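### Generate Continuation

In this mode the model takes an existing piece of music as input (the prompt) and generates what follows. This is similar to continuing a text: the model tries to produce music that matches or fits the style of the initial fragment.

Characteristics:
- **Continuity**: the generated music picks up from the input fragment, which keeps the piece coherent.
- **Dependence**: the result is shaped by the style and content of the input fragment, so it can extend or vary an existing work.

Code example:
```python
model.generate(
    n_target_bar=16,
    temperature=1.2,
    topk=5,
    output_path='./result/continuation.midi',
    prompt='./data/evaluation/000.midi')  # with a prompt
```

In this example, `prompt` points to an existing MIDI file (`./data/evaluation/000.midi`), and the model generates the continuation of the music in that file.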
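One way to picture the difference between the two modes is a generation loop that either starts from a single Bar token or from the prompt's tokens, and stops once `n_target_bar` new bars have been produced. The sketch below assumes exactly that; `generate_bars`, `toy_model`, and the `max_tokens` guard are hypothetical stand-ins, not the repository's `model.generate` implementation.

```python
def generate_bars(next_token, n_target_bar=16, prompt=None, max_tokens=10000):
    """Extend a token list until n_target_bar new Bar events have been generated."""
    # From scratch we start with one Bar token, which counts as the first new bar.
    tokens = list(prompt) if prompt else ["Bar"]
    new_bars = 0 if prompt else 1
    while len(tokens) < max_tokens:
        token = next_token(tokens)
        if token == "Bar":
            if new_bars == n_target_bar:
                break  # the target number of bars is complete
            new_bars += 1
        tokens.append(token)
    return tokens


def toy_model(tokens):
    """Hypothetical stand-in for a trained model: two positions per bar."""
    last = tokens[-1]
    if last == "Bar":
        return "Position (1/16)"
    if last == "Position (1/16)":
        return "Position (9/16)"
    return "Bar"


from_scratch = generate_bars(toy_model, n_target_bar=2)
continuation = generate_bars(toy_model, n_target_bar=2,
                             prompt=["Bar", "Position (1/16)"])
print(from_scratch.count("Bar"))   # 2
print(continuation.count("Bar"))   # 3: one bar from the prompt plus two new bars
```

In the continuation case the prompt tokens are kept at the front of the output, so the generated bars follow on from the existing material.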

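### Summary

- **Generate from Scratch**: no prompt; music is generated entirely from nothing. Suited to exploratory and experimental work.
- **Generate Continuation**: with a prompt; the model generates what follows an existing fragment. Suited to continuing and extending existing pieces.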

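Below are my notes on running the paper's code on AutoDL with VS Code.

Environment setup:

1. Connect VS Code to AutoDL. When the terminal asks for the password, paste it with a right-click; otherwise you get "permission denied".

2. Open a terminal in JupyterLab.

Download Anaconda:

```
wget https://repo.anaconda.com/archive/Anaconda3-2021.05-Linux-x86_64.sh
```

If the installation fails, try another installer, for example Anaconda3-2022.10-Linux-x86_64.sh.

The installer archive is listed at "Index of /anaconda/archive/" on the Tsinghua Open Source Mirror.

Once the download succeeds, run the installer:

```
bash Anaconda3-2021.05-Linux-x86_64.sh
```

Then create a virtual environment:

```
conda create --name tf_env python=3.6
conda activate tf_env
```

In later sessions, activate Anaconda from the terminal with:

```
source ~/anaconda3/bin/activate
```

To configure the virtual environment I followed this post: https://blog.csdn.net/weixin_44151034/article/details/132716212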

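Add mirror channels:

```
# Remove any added conda channels and restore the defaults
conda config --remove-key channels
# Add some Tsinghua mirror channels (if the Tsinghua mirror is down, switch to the Aliyun mirror; if that is down, switch to the USTC mirror. Many environment problems in China, such as packages not being found, are really network problems)
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/
```

For the versions that match each TensorFlow release, see the official table: https://tensorflow.google.cn/install/source?hl=zh-cn#tested_build_configurations

The paper's code requires tensorflow-gpu 1.14.0, for which the table lists CUDA 10.0 and cuDNN 7.4.

Then use

```
conda search cudatoolkit
conda search cudnn
```

to find the matching versions.

I chose:

```
conda install cudatoolkit==10.0.130
conda install cudnn==7.6.0
```

Be sure to pick versions that actually appear in the search results.

At first I picked a version that was not in the results and got an error. I then asked GPT, which told me to download it manually; that crashed the terminal and filled up the system disk.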

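Then install tensorflow-gpu 1.14.0:
```
pip install tensorflow-gpu==1.14.0
```
At this point I got a message recommending `pip install --upgrade pip`. When I skipped the upgrade, running the code failed with an error about tensor not being found.
So upgrade pip here:
```
pip install --upgrade pip
```
After that, installing TensorFlow 1.14 worked.
Install miditoolkit:
```
pip install miditoolkit
```
Running the code then complained that scipy was missing, so:
```
conda install scipy
```
Running still failed with: `failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED`

From the article referenced below I learned that RTX 30-series GPUs may not support CUDA 10.0 and earlier versions such as 9.0.

I had been using an RTX 3090. After switching to an RTX 2080 everything worked, and I could generate the paper's two MIDI pieces.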
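The GPU mismatch can be expressed as a small lookup. The sketch below is only an illustration: the table values reflect my understanding of NVIDIA's compute-capability support (Turing, compute capability 7.5, from CUDA 10.0; Ampere consumer cards, compute capability 8.6, from CUDA 11.1) and do not come from the original post or the linked article. The names `MIN_CUDA` and `cuda_supports` are made up for this example.

```python
# Minimum CUDA toolkit version (major, minor) that targets each compute capability.
# Assumed values, see the note above; extend the table for other GPUs as needed.
MIN_CUDA = {
    (7, 5): (10, 0),  # Turing, e.g. RTX 2080
    (8, 6): (11, 1),  # Ampere, e.g. RTX 3090
}


def cuda_supports(compute_capability, cuda_version):
    """Return True if the given CUDA toolkit version can target this GPU."""
    return cuda_version >= MIN_CUDA[compute_capability]


print(cuda_supports((8, 6), (10, 0)))  # False: RTX 3090 with CUDA 10.0
print(cuda_supports((7, 5), (10, 0)))  # True: RTX 2080 with CUDA 10.0
```

This matches what happened here: tensorflow-gpu 1.14.0 is tied to CUDA 10.0, so the practical fix was an older GPU rather than a newer CUDA toolkit.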

Reference: "Tensorflow-GPU cannot run: failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED", Zhihu (zhihu.com)

For extracting files uploaded to AutoDL, see the AutoDL help documentation.

Don't forget to also extract the checkpoint and data into remi-master:

```
unzip REMI-tempo-checkpoint.zip -d remi-master
unzip data.zip -d remi-master
```
