A New Force in Open-Source Music Generation: A Hands-On Guide to Creating AI Music with YuE on EC2

Article link: https://blog.csdn.net/rralucard123/article/details/145536684

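Introduction

When it comes to AI music generation tools, Suno is arguably the best known. Last month, however, the field gained an open-source newcomer: YuE (乐·悦), developed jointly by the Multimodal Art Projection team and the Hong Kong University of Science and Technology (HKUST), was officially released as open source. Licensed under Apache 2.0, this AI music generation tool is drawing considerable attention in the developer community.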

[Audio: sample tracks]

[Figure: architecture diagram]

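Why the name YuE?

The name carries a double meaning: in Chinese, YuE (乐) evokes both "music" and "happiness", and its pronunciation echoes the English exclamation "yeah". As the developers put it: "We hope everyone who uses YuE finds twice the joy in creating."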

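Three key highlights

Separate vocal and accompaniment generation: given lyrics, YuE generates an accompaniment track and a vocal track separately, along with a mixed version.
Multilingual: supports lyrics in major languages such as Chinese, Japanese, English, and Korean.
Cloud-friendly: runs well on GPU cloud servers such as Amazon EC2.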

Environment

MacBook Pro (14-inch, M3, 2023)
OS: macOS 14.5
aws-cli: 2.15.56

The machine above is only the local client. The music generation environment and all related operations run on AWS EC2.

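Create the EC2 instance

First, launch an EC2 instance with a GPU and enough memory to run YuE. This walkthrough uses the g5.2xlarge instance type (one NVIDIA A10G GPU with 24 GB of GPU memory) and an Ubuntu 22.04 LTS AMI as the operating system.

# Command to launch the EC2 instance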
% aws ec2 run-instances \
  --region <your region> \
  --image-id ami-xxxxxxxx \
  --instance-type g5.2xlarge \
  --key-name <your pem key> \
  --security-group-ids <your security group> \
  --block-device-mappings "[
    {
      \"DeviceName\": \"/dev/sda1\",
      \"Ebs\": {
        \"VolumeSize\": 150,
        \"VolumeType\": \"gp3\",
        \"DeleteOnTermination\": true
      }
    }
  ]" \
  --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=YuE}]"
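The escaped JSON passed to --block-device-mappings is easy to mistype. As a minimal sketch, the same mapping can be generated with Python's json module and pasted into the command inside single quotes. The helper name build_block_device_mappings is hypothetical; the default values simply mirror the command above.

```python
import json


def build_block_device_mappings(size_gib=150, volume_type="gp3", device_name="/dev/sda1"):
    """Return the JSON string for --block-device-mappings (hypothetical helper)."""
    mappings = [
        {
            "DeviceName": device_name,
            "Ebs": {
                "VolumeSize": size_gib,
                "VolumeType": volume_type,
                "DeleteOnTermination": True,
            },
        }
    ]
    # json.dumps emits lowercase true/false, which is what the AWS CLI expects.
    return json.dumps(mappings)


if __name__ == "__main__":
    # Use the printed text as: --block-device-mappings '<output>'
    print(build_block_device_mappings())
```

Generating the string this way avoids hand-escaping every double quote in the shell.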

Once the instance is running, log in over SSH and configure it.

# When using an Ubuntu AMI
% ssh -i <pem key> ubuntu@<public ip>

Set up YuE

Check the environment in which YuE will be set up.

% cat /etc/os-release

PRETTY_NAME="Ubuntu 22.04.5 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.5 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

% uname -a
Linux ip-xxxxxxxxx 6.8.0-1021-aws #23~22.04.1-Ubuntu SMP Tue Dec 10 16:50:46 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Check the NVIDIA driver, GPU, and CUDA.
The GPU is an A10G.

% nvidia-smi

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.144.03             Driver Version: 550.144.03     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
|=========================================+========================+======================|
|   0  NVIDIA A10G                    On  |   00000000:00:1E.0 Off |                    0 |
...

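Check the CUDA toolkit version. The CUDA version shown by nvidia-smi is the highest version the driver supports, so it can differ from the toolkit version reported by nvcc.

% nvcc --version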

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
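Because nvidia-smi and nvcc report different CUDA numbers, it can help to pull both values out and look at them side by side. The following is a small sketch that parses the two outputs shown above with regular expressions; the function names are hypothetical, and the sample strings are abbreviated from the output in this article.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Extract the toolkit release (e.g. '11.8') from `nvcc --version` text."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def parse_driver_cuda(nvidia_smi_output):
    """Extract the 'CUDA Version' field from the `nvidia-smi` header."""
    match = re.search(r"CUDA Version:\s*(\d+\.\d+)", nvidia_smi_output)
    return match.group(1) if match else None


if __name__ == "__main__":
    nvcc_text = "Cuda compilation tools, release 11.8, V11.8.89"
    smi_text = "| NVIDIA-SMI 550.144.03  Driver Version: 550.144.03  CUDA Version: 12.4 |"
    print("toolkit release:", parse_nvcc_release(nvcc_text))
    print("driver supports up to:", parse_driver_cuda(smi_text))
```

In practice the two input strings would come from running the commands, for example with subprocess.run(..., capture_output=True, text=True).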

Create a Conda environment and install the required packages

Download the Miniconda installer and run it.

% wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
% bash Miniconda3-latest-Linux-x86_64.sh

After installation, run the following commands to activate conda.

% conda init bash
% source ~/.bashrc

Create a Python environment for YuE (Python 3.8).

% conda create -n yue python=3.8 -y
% conda activate yue

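Check the Conda environment. Note that the "python version" line printed by conda info refers to conda's own base interpreter, not to the Python 3.8 inside the yue environment.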

% conda info

active environment : yue
active env location : /opt/conda/envs/yue
conda version : 24.11.2
python version : 3.12.8.final.0
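The conda info output above shows 3.12.8 even though the environment was created with Python 3.8, which can be confusing. A quick way to confirm which interpreter the yue environment actually runs is to ask Python itself. This is a minimal sketch; the helper name is_expected_python and the script name in the comment are hypothetical.

```python
import sys


def is_expected_python(version_info, expected=(3, 8)):
    """Return True when the interpreter's major.minor matches `expected`."""
    return tuple(version_info[:2]) == tuple(expected)


if __name__ == "__main__":
    # Run with the yue environment activated, e.g.: python check_python.py
    print("Running Python", ".".join(map(str, sys.version_info[:3])))
    print("Matches 3.8:", is_expected_python(sys.version_info))
```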

Check the channel configuration.

% conda config --show channels

channels:
  - conda-forge
  - nvidia
  - pytorch

% conda config --show channel_priority

channel_priority: flexible

With the conda environment created, install PyTorch and the other required packages.

% conda install pytorch torchvision torchaudio cudatoolkit=11.8 -c pytorch -c nvidia

Clone and set up the YuE repository

Clone the YuE repository from GitHub and install its packages.

% git clone https://github.com/multimodal-art-projection/YuE.git

% cd YuE

% pip install -r <(curl -sSL https://raw.githubusercontent.com/multimodal-art-projection/YuE/main/requirements.txt)

Install FlashAttention

% pip install flash-attn

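It has been noted that flash-attn should be installed with the --no-build-isolation option, which disables pip's build isolation. In my environment the build sometimes took more than an hour to finish, so I chose not to use that option.

Next, check the installed package versions to confirm that everything is configured correctly. Note that torch.version is a module, so the first output line below prints the module object; use torch.__version__ to print the version string instead.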

% python -c "import torch; print('Torch version:', torch.version); print('CUDA version (PyTorch):', torch.version.cuda); print('cuDNN version:', torch.backends.cudnn.version()); print('CUDA available:', torch.cuda.is_available())"

Torch version: <module 'torch.version' from '/opt/conda/envs/yue/lib/python3.8/site-packages/torch/version.py'>
CUDA version (PyTorch): 12.4
cuDNN version: 90100
CUDA available: True
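The one-liner above is hard to read and prints the torch.version module rather than a version string. The same check can be written as a short script. This is a sketch rather than part of YuE: the function name describe_torch is hypothetical, and it falls back gracefully when PyTorch is not installed.

```python
def describe_torch():
    """Collect PyTorch/CUDA details, or report that torch is not installed."""
    try:
        import torch
    except ImportError:
        return {"installed": False}
    return {
        "installed": True,
        # torch.__version__ is the version string; torch.version is a module.
        "torch": torch.__version__,
        # CUDA version PyTorch was built against (None for CPU-only builds).
        "cuda": torch.version.cuda,
        "cudnn": torch.backends.cudnn.version(),
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in describe_torch().items():
        print(f"{key}: {value}")
```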

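Run YuE inference

Run the inference script with the lyrics and genre files provided as examples. The command below first downloads the model files and then generates the audio: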

% cd path/your/YuE/inference

% python infer.py \
    --cuda_idx 0 \
    --stage1_model m-a-p/YuE-s1-7B-anneal-en-cot \
    --stage2_model m-a-p/YuE-s2-1B-general \
    --genre_txt ../prompt_egs/genre.txt \
    --lyrics_txt ../prompt_egs/lyrics.txt \
    --run_n_segments 2 \
    --stage2_batch_size 4 \
    --output_dir ../output \
    --max_new_tokens 3000
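When experimenting with different values (for example a different number of segments), assembling the command programmatically avoids retyping the long flag list. The sketch below only builds the argument list from the flags shown above and prints it; the function name build_infer_command is hypothetical, and no flags beyond those in this article are assumed.

```python
import shlex


def build_infer_command(
    genre_txt="../prompt_egs/genre.txt",
    lyrics_txt="../prompt_egs/lyrics.txt",
    output_dir="../output",
    run_n_segments=2,
    stage2_batch_size=4,
    max_new_tokens=3000,
    cuda_idx=0,
):
    """Build the argv list for infer.py using only the flags shown above."""
    return [
        "python", "infer.py",
        "--cuda_idx", str(cuda_idx),
        "--stage1_model", "m-a-p/YuE-s1-7B-anneal-en-cot",
        "--stage2_model", "m-a-p/YuE-s2-1B-general",
        "--genre_txt", genre_txt,
        "--lyrics_txt", lyrics_txt,
        "--run_n_segments", str(run_n_segments),
        "--stage2_batch_size", str(stage2_batch_size),
        "--output_dir", output_dir,
        "--max_new_tokens", str(max_new_tokens),
    ]


if __name__ == "__main__":
    # Prints a copy-pasteable command; pass the list to subprocess.run to execute it.
    print(shlex.join(build_infer_command(run_n_segments=1)))
```

Keeping the command as a list also makes it safe to pass paths containing spaces.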

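Prompt details are documented in the YuE repository. Use genre_txt to specify the musical genre. As shown below, you can list several space-separated tags describing the vocal style, the genre, and so on.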

inspiring female uplifting pop airy vocal electronic bright vocal vocal

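The YuE repository provides a list of the top 200 tags. Other tags also appear to work.

Specify the lyrics with lyrics_txt.
Below are the lyrics provided for the example.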

[verse]
Staring at the sunset, colors paint the sky
Thoughts of you keep swirling, can't deny
I know I let you down, I made mistakes
But I'm here to mend the heart I didn't break

[chorus]
Every road you take, I'll be one step behind
Every dream you chase, I'm reaching for the light
You can't fight this feeling now
I won't back down
You know you can't deny it now
I won't back down

...

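Multiple languages are available, including English, Chinese, Japanese, and Korean.
Begin each section of the lyrics prompt with a structure label (for example [verse], [chorus], [bridge], [outro]).
The lyrics should reportedly be split into separate sections (which YuE calls sessions), separated by blank lines as in the example above.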
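The formatting rules for the lyrics file can be checked before starting a long inference run. The sketch below assumes sections are separated by blank lines, as in the sample lyrics, and that each one opens with a bracketed label. The function name split_sections is hypothetical, and the label set contains only the examples mentioned in this article, so other labels may also be valid.

```python
import re

# Labels taken from the examples in the text; other labels may also be accepted.
KNOWN_LABELS = {"verse", "chorus", "bridge", "outro"}


def split_sections(lyrics):
    """Split lyrics on blank lines and return (label, lines) pairs."""
    sections = []
    for block in re.split(r"\n\s*\n", lyrics.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if not lines:
            continue  # skip empty input or stray whitespace
        match = re.fullmatch(r"\[(\w+)\]", lines[0])
        if not match:
            raise ValueError(f"section does not start with a label: {lines[0]!r}")
        sections.append((match.group(1), lines[1:]))
    return sections


if __name__ == "__main__":
    sample = "[verse]\nStaring at the sunset\n\n[chorus]\nI won't back down\n"
    for label, lines in split_sections(sample):
        note = "" if label in KNOWN_LABELS else " (unrecognized label)"
        print(f"{label}: {len(lines)} line(s){note}")
```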

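Download the generated music files

Generation takes roughly 5 to 10 minutes. The result consists of three kinds of MP3 files, stored under the directory given by output_dir:

Vocals-only MP3: contains all the sung lyrics, without accompaniment.
Instrumental-only MP3: contains only the backing music, without vocals.
Mixed MP3: combines the vocals and the instrumental into the complete track.

Once generation is complete, you can use scp or another file transfer tool to download the files from the server and check them locally. For example: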
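The exact file names depend on the run, so it helps to list what was actually written before copying anything. The following sketch searches the output directory recursively for MP3 files; the function name find_mp3_files is hypothetical, and "../output" is simply the --output_dir value used in the inference command.

```python
from pathlib import Path


def find_mp3_files(output_dir):
    """Return all .mp3 files under output_dir (searched recursively), sorted by path."""
    return sorted(Path(output_dir).rglob("*.mp3"))


if __name__ == "__main__":
    # "../output" matches the --output_dir used in the inference command.
    for path in find_mp3_files("../output"):
        size_kib = path.stat().st_size / 1024
        print(f"{size_kib:8.1f} KiB  {path}")
```

Run it on the EC2 instance from the inference directory, then use the printed paths when downloading.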

% scp -i <your pem> ubuntu@<EC2_IP>:"/path/your/YuE/output/your-generated.mp3" .

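With the inference parameters used earlier (the en-cot stage 1 model and English lyrics), the result is a song sung in English.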

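Cleanup

When you are done, remember to clean up the resources to avoid unexpected charges. Stopping the instance ends the compute charges, but the 150 GB EBS volume is still billed until the instance is terminated (for example with aws ec2 terminate-instances).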

% aws ec2 stop-instances --region <your region> --instance-ids <your instance id>

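Summary

🎵 Want to try the open-source music generation tool YuE on Amazon EC2? This article walked through the whole process step by step: creating the instance, configuring the environment, installing dependencies, generating music from the sample lyrics, and downloading the result. 💡

If you are interested in AI music creation, or have already tried YuE, feel free to share your experience in the comments! 🎼 Follow along to explore more of what AI music can do. 🚀🎶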