Article link: https://blog.csdn.net/weixin_38090501/article/details/90524647

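Magenta Modding Notes, Part 1: Raw Data Conversion

Preface

This post describes how the Magenta project consolidates its raw data and introduces the functions that read MIDI and XML files. As we will see, during this consolidation Magenta converts data in different formats into a single unified format, close to MusicXML, and stores everything in that format.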

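Magenta contains many automatic composition models, and each of them expects its input in a different format. In Magenta, raw data (MIDI, MusicXML, etc.) is first converted into NoteSequence, a representation based on protocol buffers. The NoteSequence is then converted into whatever input the particular model requires.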

Magenta accepts raw data files in MIDI (.mid/.midi), MusicXML (.xml/.mxl), and ABC (http://abcnotation.com; I have not tested this one) formats as training data.
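The list of supported formats suggests a simple dispatch on the file extension. The sketch below only illustrates that idea: `pick_converter`, `EXTENSION_TO_FORMAT`, and the returned labels are hypothetical names rather than functions from Magenta, and the `.abc` extension is an assumption based on the usual suffix of ABC files.

```python
import os

# Hypothetical mapping from file extension to a format label.
# The MIDI and MusicXML extensions come from the text above;
# '.abc' is an assumption.
EXTENSION_TO_FORMAT = {
    '.mid': 'midi',
    '.midi': 'midi',
    '.xml': 'musicxml',
    '.mxl': 'musicxml',
    '.abc': 'abc',
}


def pick_converter(path):
    """Return a format label for `path`, or None if the extension is unknown."""
    _, ext = os.path.splitext(path)
    # Lower-case the extension so that 'SONG.MID' is treated like 'song.mid'.
    return EXTENSION_TO_FORMAT.get(ext.lower())


if __name__ == '__main__':
    for name in ['song.MID', 'score.mxl', 'tune.abc', 'notes.txt']:
        print(name, '->', pick_converter(name))
```

Files with an unknown extension map to `None`, so a caller can skip them instead of failing.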

The script convert_dir_to_note_sequences.py converts these raw files into NoteSequences and stores them in TFRecord format.

Next, we look at how convert_dir_to_note_sequences.py converts MIDI/MusicXML files into NoteSequences.

Magenta version: 1.1.1

Mod 1.0: Passing parameters from the command line

Magenta's GitHub repository explains how to convert raw data into NoteSequence protocol buffers from the command line:
https://github.com/tensorflow/magenta/tree/master/magenta/scripts#building-your-dataset

The Linux command line given at that link is:

INPUT_DIRECTORY=<folder containing MIDI and/or MusicXML files. can have child folders.>

# TFRecord file that will contain NoteSequence protocol buffers.
SEQUENCES_TFRECORD=/tmp/notesequences.tfrecord

convert_dir_to_note_sequences \
  --input_dir=$INPUT_DIRECTORY \
  --output_file=$SEQUENCES_TFRECORD \
  --recursive

The equivalent Python command line for this step (taken from the comments in the convert_dir_to_note_sequences.py source code) is:

Example usage:
  $ python magenta/scripts/convert_dir_to_note_sequences.py \
    --input_dir=/path/to/input/dir \
    --output_file=/path/to/tfrecord/file \
    --log=INFO
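The `--recursive` flag in the shell command above decides whether child folders of the input directory are scanned too. As a rough, standard-library-only illustration of that behaviour (not the code Magenta itself uses), the hypothetical helper `list_files` below collects file paths with or without descending into subdirectories.

```python
import os


def list_files(input_dir, recursive=False):
    """Return the sorted paths of files under `input_dir`.

    With recursive=False only the top-level directory is scanned;
    with recursive=True child folders are visited as well.
    """
    found = []
    for root, dirs, files in os.walk(input_dir):
        found.extend(os.path.join(root, name) for name in files)
        if not recursive:
            # Clearing `dirs` in place stops os.walk from descending further.
            dirs[:] = []
    return sorted(found)
```

Calling `list_files(path)` mirrors running without `--recursive`, and `list_files(path, recursive=True)` mirrors running with it.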

Next, we look at how to change the parameters of this preprocessing step directly in the code.

The file that runs in this step is:

convert_dir_to_note_sequences.py

Opening the source code, we can see that the program starts by defining a series of flags with tf.app.flags:

import tensorflow as tf

FLAGS = tf.app.flags.FLAGS

# Input directory.
tf.app.flags.DEFINE_string('input_dir', None,
                           'Directory containing files to convert.')
# Output file path.
tf.app.flags.DEFINE_string('output_file', None,
                           'Path to output TFRecord file. Will be overwritten '
                           'if it already exists.')
# Whether to search subdirectories recursively for files.
tf.app.flags.DEFINE_bool('recursive', False,
                         'Whether or not to recurse into subdirectories.')
# Minimum severity of the messages that get logged.
tf.app.flags.DEFINE_string('log', 'INFO',
                           'The threshold for what messages will be logged '
                           'DEBUG, INFO, WARN, ERROR, or FATAL.')

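tf.app.flags is the TensorFlow module for passing parameters from the command line. Early TensorFlow 1.x releases implemented it as a thin wrapper around argparse; since TensorFlow 1.5 it wraps absl.flags instead. Any flag that is not supplied at run time takes the default value given in its definition.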
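To make the "defaults unless overridden" behaviour concrete, here is a small stand-in that reproduces the same four flags with the standard-library argparse module. It is only an analogy for experimenting without TensorFlow installed: `build_parser` and `str2bool` are hypothetical helpers, and the real script uses tf.app.flags, not this parser.

```python
import argparse


def str2bool(value):
    """Parse 'True'/'False'-style strings into a bool."""
    lowered = value.lower()
    if lowered in ('true', 't', '1', 'yes'):
        return True
    if lowered in ('false', 'f', '0', 'no'):
        return False
    raise argparse.ArgumentTypeError('expected a boolean, got %r' % value)


def build_parser():
    """Build an argparse stand-in for the flags defined in the script."""
    parser = argparse.ArgumentParser(description='Stand-in for the script flags.')
    parser.add_argument('--input_dir', default=None,
                        help='Directory containing files to convert.')
    parser.add_argument('--output_file', default=None,
                        help='Path to output TFRecord file.')
    # nargs='?' with const=True lets both `--recursive` and
    # `--recursive=True` switch the option on.
    parser.add_argument('--recursive', type=str2bool, nargs='?',
                        const=True, default=False,
                        help='Whether or not to recurse into subdirectories.')
    parser.add_argument('--log', default='INFO',
                        help='Logging threshold.')
    return parser


if __name__ == '__main__':
    parser = build_parser()
    # No arguments: every flag keeps its default value.
    print(parser.parse_args([]))
    # Supplied arguments override the defaults.
    print(parser.parse_args(['--input_dir=XXX', '--recursive=True']))
```

Passing an empty list yields the defaults (`None`, `None`, `False`, `'INFO'`), while any flag given in the list replaces its default.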

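Running python convert_dir_to_note_sequences.py -h prints the help text, listing each flag together with its description.
So, to customize the parameters, we can pass them on the command line at run time: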

python convert_dir_to_note_sequences.py --input_dir=XXX --output_file=YYY --recursive=True

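Alternatively, we can treat the flag definitions above as hyperparameter declarations, edit their default values directly in convert_dir_to_note_sequences.py, and then run the file.

Besides the command line, the following sections show how to change the parameters directly in a Python file, and how to change them and debug in a Jupyter environment.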

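Mod 2.0: Debugging in Jupyter Notebook

Next, we show how to debug in Jupyter Notebook, walking through how the program works in detail and what data types it stores in the output file.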
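One practical obstacle when moving a flag-driven script into a notebook is that the kernel process is started with its own command-line arguments (typically `-f <connection file>`), which a strict flag parser may reject. A common way around this is to hand the parser an explicit argument list instead of letting it read `sys.argv`. The sketch below illustrates the idea with argparse as a stand-in; `parse_notebook_args` and the paths are hypothetical, and the real script uses tf.app.flags.

```python
import argparse


def parse_notebook_args(argv):
    """Parse only the flags we define and collect everything else.

    `argv` is an explicit list, so the result does not depend on how
    the surrounding process (for example a notebook kernel) was started.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument('--input_dir', default=None)
    parser.add_argument('--output_file', default=None)
    # parse_known_args returns (namespace, leftover arguments) instead of
    # exiting with an error when it meets an argument it does not know.
    args, unknown = parser.parse_known_args(argv)
    return args, unknown


if __name__ == '__main__':
    # Hypothetical paths; '-f kernel.json' mimics a kernel-supplied argument.
    args, unknown = parse_notebook_args(
        ['--input_dir=/tmp/midi', '--output_file=/tmp/out.tfrecord',
         '-f', 'kernel.json'])
    print(args.input_dir, args.output_file, unknown)
```

Because the argument list is built in the notebook cell itself, the parameters can be changed and the cell re-run without restarting anything.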

Source code of the program:
https://github.com/tensorflow/magenta/blob/master/magenta/scripts/convert_dir_to_note_sequences.py

The program runs roughly as follows:

First, check the input path