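An Intelligent System That Generates Music from Image Emotion, Based on Random Forest + RNN + TensorFlow Magenta: A Deep Learning Application (with Python and ipynb Project Source Code) + All Datasets (Part 2)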

小胡说人工智能

Last modified 2023-11-02 11:22:32


First published 2023-10-24 21:30:00

Article link: https://blog.csdn.net/qq_31136513/article/details/134016289


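This article presents a project built on Google's Magenta platform that combines a random forest and an RNN to generate music matching the emotion of an image. It covers data preprocessing, model construction (an image emotion analysis model and a music generation model), and a GUI for presenting the results. The code shows the data conversion and model training process.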


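Preface

This project is built on Google's Magenta platform. A random forest classifier identifies the emotional tone of an image. A recurrent neural network (RNN) then generates music that matches that emotion, and a graphical user interface (GUI) displays the results.

First, the project processes the image and uses the random forest classifier to determine its emotional tone, for example cheerful or calm. The classifier analyzes the image's visual features to infer the emotion it conveys.

Next, based on the detected emotion, the project uses the RNN to generate matching music. This involves choosing notes, rhythm, and harmony so that the resulting piece is consistent with the emotion of the image.

Finally, the GUI presents the image, its emotional tone, and the generated music to the user.

In short, the project combines computer vision, music generation, and GUI design to merge an image's emotional tone with music creation. The goal is a distinctive artistic experience for the user, and the project is an interesting example of art and technology working together.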

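Overall Design

This part covers the overall system architecture diagram and the system flowchart.

System Architecture Diagram

The overall system structure is shown in the figure below.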

[Figure: overall system structure]

System Flowchart

The system flow is shown in the figure below.

[Figure: system flowchart]

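Runtime Environment

This part covers the Python environment and the Magenta environment.

See the blog for details.

Module Implementation

The project consists of three modules: data preprocessing, model construction, and model training and saving. The function of each module and the related code are given below.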

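1. Data Preprocessing

The MIDI files were downloaded from http://midi.midicn.com/. The images were collected from Huaban (花瓣网) at https://huaban.com/boards/60930738/. The music data contains 100 MIDI files for each of the two classes, cheerful and calm. The image data contains 250 .jpg images for each of the same two classes. The datasets can also be downloaded together with this post's project source code.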

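(1) Images

For each image, extract the ten colors that occupy the largest share of the image, convert them to HSV, and store them in a .csv file for later use.

See the blog for details.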
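The post leaves the color-extraction code to the full project. Below is a minimal standard-library sketch of the idea. It assumes the pixels are already available as (r, g, b) tuples, for example via Pillow's `Image.getdata()` (not shown). It also assumes that "top ten colors" means the ten most frequent colors after coarse quantization, a choice the post does not specify. `top_colors_hsv` and `write_feature_csv` are hypothetical names. Ten colors times three HSV components gives a 30-dimensional vector, which matches the 30-dimensional feature used by the classifier later.

```python
import colorsys
import csv
from collections import Counter


def top_colors_hsv(pixels, k=10, step=32):
    """Return a flat list of k*3 HSV values for the k most frequent colors.

    pixels: iterable of (r, g, b) tuples with channels in 0..255.
    step: quantization bucket size so near-identical shades are merged
          (an assumption; the post does not say how colors are grouped).
    """
    # Map each channel to the centre of its bucket, e.g. 255 -> 240, 0 -> 16
    quantized = [tuple((c // step) * step + step // 2 for c in px) for px in pixels]
    features = []
    for (r, g, b), _count in Counter(quantized).most_common(k):
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)  # all in 0..1
        features.extend([h, s, v])
    # Pad with zeros when the image has fewer than k distinct colors
    features.extend([0.0] * (3 * k - len(features)))
    return features


def write_feature_csv(path, rows, k=10):
    """Write (label, features) pairs to a CSV file with columns h0, s0, v0, ..."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["label"] + [f"{ch}{i}" for i in range(k) for ch in "hsv"])
        for label, feats in rows:
            writer.writerow([label] + feats)
```

Quantizing before counting matters: a photograph rarely contains many pixels of exactly the same RGB value, so counting raw colors would spread the pixels over thousands of near-duplicates.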

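(2) Music

First, label the downloaded music as either calm or cheerful. Then preprocess each class separately. In Magenta, raw data (MIDI, MusicXML) is converted into NoteSequence protocol buffers. Depending on the model, each NoteSequence is then converted into the input that model requires. Magenta accepts raw MIDI (.mid/.midi) and MusicXML (.xml/.mxl) files, among other formats, as training data. The script convert_dir_to_note_sequences.py converts them into NoteSequences stored in TFRecord format. This project uses MIDI files.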

```python
import os

import tensorflow as tf  # TensorFlow 1.x API (tf.app, tf.logging, tf.gfile)
from magenta.music import midi_io
from magenta.music import note_sequence_io

FLAGS = tf.app.flags.FLAGS
tf.app.flags.DEFINE_string('input_dir', None,
                           'Directory containing files to convert.')  # input MIDI directory
tf.app.flags.DEFINE_string('output_file', None,
                           'Path to output TFRecord file. Will be overwritten '
                           'if it already exists.')  # output TFRecord path
tf.app.flags.DEFINE_bool('recursive', False,
                         'Whether or not to recurse into subdirectories.')  # search subdirectories too?
tf.app.flags.DEFINE_string('log', 'INFO',
                           'The threshold for what messages will be logged: '
                           'DEBUG, INFO, WARN, ERROR, or FATAL.')  # logging threshold


def convert_files(root_dir, sub_dir, writer, recursive=False):
  """Converts the files in root_dir/sub_dir and writes them to writer.

  Args:
    root_dir: string, the root directory.
    sub_dir: string, the path under root_dir to convert.
    writer: a NoteSequenceRecordWriter that receives the converted sequences.
    recursive: bool, whether to also convert files in subdirectories.
  """
  dir_to_convert = os.path.join(root_dir, sub_dir)
  tf.logging.info("Converting files in '%s'.", dir_to_convert)
  files_in_dir = tf.gfile.ListDirectory(dir_to_convert)
  recurse_sub_dirs = []
  written_count = 0
  for file_in_dir in files_in_dir:
    tf.logging.log_every_n(tf.logging.INFO, '%d files converted.',
                           1000, written_count)
    full_file_path = os.path.join(dir_to_convert, file_in_dir)
    if (full_file_path.lower().endswith('.mid') or
        full_file_path.lower().endswith('.midi')):
      try:
        sequence = convert_midi(root_dir, sub_dir, full_file_path)
      except Exception as exc:  # pylint: disable=broad-except
        tf.logging.fatal('%r generated an exception: %s', full_file_path, exc)
        continue
      if sequence:
        writer.write(sequence)
        written_count += 1
    elif (full_file_path.lower().endswith('.xml') or
          full_file_path.lower().endswith('.mxl')):
      # convert_musicxml() is defined in the full Magenta script (not shown here)
      try:
        sequence = convert_musicxml(root_dir, sub_dir, full_file_path)
      except Exception as exc:  # pylint: disable=broad-except
        tf.logging.fatal('%r generated an exception: %s', full_file_path, exc)
        continue
      if sequence:
        writer.write(sequence)
        written_count += 1
    elif full_file_path.lower().endswith('.abc'):
      # convert_abc() is defined in the full Magenta script (not shown here)
      try:
        sequences = convert_abc(root_dir, sub_dir, full_file_path)
      except Exception as exc:  # pylint: disable=broad-except
        tf.logging.fatal('%r generated an exception: %s', full_file_path, exc)
        continue
      if sequences:
        for sequence in sequences:
          writer.write(sequence)
          written_count += 1
    else:
      if recursive and tf.gfile.IsDirectory(full_file_path):
        recurse_sub_dirs.append(os.path.join(sub_dir, file_in_dir))
      else:
        tf.logging.warning(
            'Unable to find a converter for file %s', full_file_path)
  for recurse_sub_dir in recurse_sub_dirs:
    convert_files(root_dir, recurse_sub_dir, writer, recursive)


def convert_midi(root_dir, sub_dir, full_file_path):
  """Converts a MIDI file to a NoteSequence proto.

  Args:
    root_dir: string, the root directory of the files being converted.
    sub_dir: string, the directory currently being converted.
    full_file_path: string, the full path of the file to convert.

  Returns:
    A NoteSequence proto, or None if the file could not be converted.
  """
  try:
    sequence = midi_io.midi_to_sequence_proto(
        tf.gfile.GFile(full_file_path, 'rb').read())
  except midi_io.MIDIConversionError as e:
    # Skip files that cannot be parsed
    tf.logging.warning(
        'Could not parse MIDI file %s. It will be skipped. Error was: %s',
        full_file_path, e)
    return None
  sequence.collection_name = os.path.basename(root_dir)
  sequence.filename = os.path.join(sub_dir, os.path.basename(full_file_path))
  sequence.id = note_sequence_io.generate_note_sequence_id(
      sequence.filename, sequence.collection_name, 'midi')
  tf.logging.info('Converted MIDI file %s.', full_file_path)
  return sequence


def convert_directory(root_dir, output_file, recursive=False):
  """Converts files to NoteSequences and writes them to output_file.

  Input files found in root_dir are converted to NoteSequences. The basename
  of root_dir becomes the collection name, and each file's path relative to
  root_dir becomes its filename. If recursive is True, the subdirectories of
  root_dir are converted as well.

  Args:
    root_dir: string, the root directory.
    output_file: path of the TFRecord file to write the results to.
    recursive: bool, whether to also convert files in subdirectories.
  """
  with note_sequence_io.NoteSequenceRecordWriter(output_file) as writer:
    convert_files(root_dir, '', writer, recursive)


def main(unused_argv):
  tf.logging.set_verbosity(FLAGS.log)
  # Validate the required flags
  if not FLAGS.input_dir:
    tf.logging.fatal('--input_dir required')
    return
  if not FLAGS.output_file:
    tf.logging.fatal('--output_file required')
    return
  input_dir = os.path.expanduser(FLAGS.input_dir)      # input directory
  output_file = os.path.expanduser(FLAGS.output_file)  # output file
  output_dir = os.path.dirname(output_file)            # output directory
  if output_dir:
    tf.gfile.MakeDirs(output_dir)
  convert_directory(input_dir, output_file, FLAGS.recursive)


def console_entry_point():
  tf.app.run(main)


if __name__ == '__main__':
  console_entry_point()
```

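After all the MIDI files have been stored in a TFRecord file, polyphony_rnn_create_dataset.py builds the dataset. Using the Polyphony RNN model configuration, it converts the NoteSequences into SequenceExample training and evaluation sets for training the polyphony model.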

```python
import os

import tensorflow as tf  # TensorFlow 1.x API
from magenta.models.polyphony_rnn import polyphony_model
from magenta.models.polyphony_rnn import polyphony_rnn_pipeline
from magenta.pipelines import pipeline

flags = tf.app.flags
FLAGS = tf.app.flags.FLAGS
flags.DEFINE_string(
    'input', 'E:/college/synaes/midi/midi/tf/pst.tfrecord',
    'TFRecord to read NoteSequence protos from.')  # TFRecord file of NoteSequences
flags.DEFINE_string(
    'output_dir', 'E:/college/synaes/poly_rnn/datasets/pst',
    'Directory to write training and eval TFRecord files. The TFRecord files '
    'are populated with SequenceExample protos.')  # where the SequenceExamples are written
flags.DEFINE_float(
    'eval_ratio', 0.1,
    'Fraction of input to set aside for eval set. Partition is randomly '
    'selected.')  # fraction held out for evaluation; the split is random
flags.DEFINE_string(
    'log', 'INFO',
    'The threshold for what messages will be logged DEBUG, INFO, WARN, ERROR, '
    'or FATAL.')  # logging threshold


def main(unused_argv):
  tf.logging.set_verbosity(FLAGS.log)
  # Build the dataset pipeline with the 'polyphony' model configuration
  pipeline_instance = polyphony_rnn_pipeline.get_pipeline(
      min_steps=80,
      max_steps=512,
      eval_ratio=FLAGS.eval_ratio,
      config=polyphony_model.default_configs['polyphony'])
  input_file = os.path.expanduser(FLAGS.input)       # input TFRecord path
  output_dir = os.path.expanduser(FLAGS.output_dir)  # output directory
  # Run every NoteSequence through the pipeline and write the datasets
  pipeline.run_pipeline_serial(
      pipeline_instance,
      pipeline.tf_record_iterator(input_file, pipeline_instance.input_type),
      output_dir)


def console_entry_point():
  tf.app.run(main)


if __name__ == '__main__':
  console_entry_point()
```
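Setting `eval_ratio=0.1` holds out roughly 10% of the input for the evaluation set, chosen at random. The standard-library sketch below illustrates such a per-item random split. It is an illustration only: `random_partition` is a hypothetical helper, not Magenta's own partitioning code.

```python
import random


def random_partition(items, eval_ratio=0.1, seed=None):
    """Randomly assign each item to the 'training' or 'eval' split.

    Each item independently goes to 'eval' with probability eval_ratio, so
    for small inputs the eval fraction is only approximately eval_ratio.
    """
    rng = random.Random(seed)
    splits = {'training': [], 'eval': []}
    for item in items:
        key = 'eval' if rng.random() < eval_ratio else 'training'
        splits[key].append(item)
    return splits
```

With only 100 MIDI files per class, a 10% hold-out leaves very little evaluation data, so the evaluation metrics will fluctuate noticeably between runs.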

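2. Model Construction

After the data is loaded, define the model structure and optimize the loss function.

(1) Defining the Model Structure

This part covers the image emotion analysis model and the polyphonic music model.

1) Image emotion analysis

The 30-dimensional feature vector is fed into a random forest classifier. The main model parameters are the number of decision trees, the tree depth, and the minimum number of samples required to split a node.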
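The post does not show the classifier code. The sketch below is a minimal pure-Python random forest that exposes the same three hyperparameters. It uses bootstrap samples, a random feature subset at each split, Gini impurity, and majority voting. It is an illustration, not the project's code, and the class name and defaults are hypothetical. The 30 dimensions presumably correspond to ten dominant colors times three HSV components. If the project uses scikit-learn's `RandomForestClassifier` (the post does not say), the three parameters correspond to `n_estimators`, `max_depth` and `min_samples_split`.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, feature_ids):
    """Return (weighted impurity, feature, threshold) of the best split, or None."""
    best = None
    for f in feature_ids:
        values = sorted(set(row[f] for row in X))
        # Candidate thresholds: midpoints between neighbouring distinct values
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lbl for row, lbl in zip(X, y) if row[f] <= t]
            right = [lbl for row, lbl in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


class RandomForest:
    """Tiny random forest: bootstrap samples + random feature subsets + voting."""

    def __init__(self, n_trees=100, max_depth=10, min_samples_split=2, seed=0):
        self.n_trees = n_trees                      # number of decision trees
        self.max_depth = max_depth                  # maximum depth of each tree
        self.min_samples_split = min_samples_split  # min samples needed to split a node
        self.rng = random.Random(seed)
        self.trees = []

    def _grow(self, X, y, depth):
        majority = Counter(y).most_common(1)[0][0]
        if (depth >= self.max_depth or len(y) < self.min_samples_split
                or len(set(y)) == 1):
            return majority  # leaf: predict the majority class
        n_sub = max(1, int(len(X[0]) ** 0.5))  # sqrt(n_features) per split
        split = best_split(X, y, self.rng.sample(range(len(X[0])), n_sub))
        if split is None:
            return majority
        _, f, t = split
        left = [i for i, row in enumerate(X) if row[f] <= t]
        right = [i for i, row in enumerate(X) if row[f] > t]
        return (f, t,
                self._grow([X[i] for i in left], [y[i] for i in left], depth + 1),
                self._grow([X[i] for i in right], [y[i] for i in right], depth + 1))

    def fit(self, X, y):
        self.trees = []
        for _ in range(self.n_trees):
            idx = [self.rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
            self.trees.append(self._grow([X[i] for i in idx], [y[i] for i in idx], 0))
        return self

    @staticmethod
    def _predict_tree(node, row):
        while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
            f, t, left, right = node
            node = left if row[f] <= t else right
        return node

    def predict(self, X):
        # Majority vote over all trees for each row
        return [Counter(self._predict_tree(tree, row) for tree in self.trees)
                .most_common(1)[0][0] for row in X]
```

A library implementation is much faster in practice. The from-scratch version only makes the role of each parameter visible. More trees reduce variance, while `max_depth` and `min_samples_split` limit how finely each tree can fit its bootstrap sample.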

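2) Polyphonic music model

The Polyphony model generates a polyphonic track from a primer track. The PolyphonyRnnModel class implements polyphonic sequence generation and also evaluates the log-likelihood of a polyphonic sequence. When the model is loaded, its hyperparameters are configured through the contrib_training.HParams class. An HParams object stores, as name-value pairs, the hyperparameters used to build and train the model.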

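```python
# Import paths follow Magenta 1.1.x (TensorFlow 1.x); they differ between Magenta releases.
import magenta
from magenta.models.polyphony_rnn import polyphony_encoder_decoder
from magenta.models.shared import events_rnn_model
from magenta.music.protobuf import generator_pb2
from tensorflow.contrib import training as contrib_training


class PolyphonyRnnModel(events_rnn_model.EventSequenceRnnModel):
  """RNN model class for generating polyphonic sequences."""

  def generate_polyphonic_sequence(
      self, num_steps, primer_sequence, temperature=1.0, beam_size=1,
      branch_factor=1, steps_per_iteration=1, modify_events_callback=None):
    """Generates a polyphonic track from a primer polyphonic track.

    Args:
      num_steps: integer length of the final track, in steps, including the
          primer sequence.
      primer_sequence: the primer sequence, a PolyphonicSequence object.
      temperature: float; the logits are divided by this value before the
          softmax. Values above 1.0 make the track more random, values below
          1.0 make it less random.
      beam_size: integer beam size to use when generating tracks via beam
          search.
      branch_factor: integer beam-search branch factor.
      steps_per_iteration: integer number of steps per beam-search iteration.
      modify_events_callback: optional callback for modifying the event list.

    Returns:
      The generated PolyphonicSequence object.
    """
    return self._generate_events(num_steps, primer_sequence, temperature,
                                 beam_size, branch_factor, steps_per_iteration,
                                 modify_events_callback=modify_events_callback)

  def polyphonic_sequence_log_likelihood(self, sequence):
    """Evaluates the log likelihood of a polyphonic sequence.

    Args:
      sequence: the PolyphonicSequence object to evaluate.

    Returns:
      The log likelihood of the sequence under this model.
    """
    return self._evaluate_log_likelihood([sequence])[0]


# Model configuration
default_configs = {
    'polyphony': events_rnn_model.EventSequenceRnnConfig(
        generator_pb2.GeneratorDetails(
            id='polyphony',
            description='Polyphonic RNN'),  # configure the model as 'polyphony'
        # Converts polyphonic events to and from one-hot model inputs/outputs
        magenta.music.OneHotEventSequenceEncoderDecoder(
            polyphony_encoder_decoder.PolyphonyOneHotEncoding()),
        # HParams holds the hyperparameters as name-value pairs
        contrib_training.HParams(
            batch_size=64,
            rnn_layer_sizes=[256, 256, 256],
            dropout_keep_prob=0.5,
            clip_norm=5,
            learning_rate=0.001)),
}
```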
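The `temperature` argument of `generate_polyphonic_sequence` divides the logits before the softmax. The standard-library sketch below shows the effect described in the docstring: a higher temperature flattens the distribution and a lower one sharpens it. `softmax_with_temperature` and `sample_event` are illustrative helpers, not Magenta functions.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax of logits / temperature (numerically stabilized)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max before exp to avoid overflow
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_event(logits, temperature=1.0, rng=None):
    """Sample one event index from the temperature-scaled distribution."""
    rng = rng or random.Random()
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]
```

For logits `[2.0, 1.0, 0.0]`, the probability of the top event is highest at temperature 0.5 and lowest at temperature 2.0. This is why values above 1.0 produce more varied (and more random) music.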

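(2) Optimizing the Loss Function

This part covers the image emotion analysis model and the polyphonic music model.

1) Image emotion analysis

Because all labels carry similar weight (the two classes are the same size), accuracy is used as the performance metric. Feature processing and feature selection are an important part of the random forest classifier. Without feature selection, accuracy was below 80%. After several attempts, 15 features were finally selected, and accuracy reached 95%.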
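The post does not say how the 15 features were chosen. One common filter approach, shown here only as an assumption, ranks each feature by its Fisher score and keeps the top 15. The Fisher score is the squared difference of the two class means divided by the sum of the two class variances. The `accuracy` helper computes the metric used above. `fisher_scores`, `select_top_k`, `project` and `accuracy` are hypothetical helpers.

```python
from statistics import mean, pvariance


def fisher_scores(X, y):
    """Fisher score of each feature for a two-class problem (higher = more separable)."""
    classes = sorted(set(y))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    a = [row for row, lbl in zip(X, y) if lbl == classes[0]]
    b = [row for row, lbl in zip(X, y) if lbl == classes[1]]
    scores = []
    for f in range(len(X[0])):
        fa = [r[f] for r in a]
        fb = [r[f] for r in b]
        spread = pvariance(fa) + pvariance(fb)
        scores.append((mean(fa) - mean(fb)) ** 2 / (spread + 1e-12))
    return scores


def select_top_k(scores, k=15):
    """Indices of the k highest-scoring features, kept in their original order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def project(X, keep):
    """Keep only the selected feature columns."""
    return [[row[i] for i in keep] for row in X]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Accuracy is a reasonable metric here only because the two image classes are balanced (250 images each). With imbalanced classes, a classifier that always predicts the majority class would still score well.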

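2) Polyphonic music model

After training, accuracy and loss were used as the performance metrics. Accuracy reached only about 50% with a loss of about 1.8, so overall the model is not ideal. However, accuracy rose and loss fell as training continued. The main reasons are too few training steps and too little content in the dataset. Reaching higher accuracy and lower loss would require more training and a larger dataset.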
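To put the loss of 1.8 in perspective, assume it is the mean per-step softmax cross-entropy in natural-log units (nats). Under that assumption, it corresponds to a perplexity of about 6. In other words, at each step the model is roughly as uncertain as a uniform choice among about six events:

$$
\text{perplexity} = e^{\mathcal{L}} = e^{1.8} \approx 6.05
$$

A lower loss shrinks this effective number of choices. For example, a loss of 1.0 would correspond to a perplexity of about 2.72.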

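Project Source Code Download

See the resource download page on my blog for details.

Other Resources

If you would like to learn more about AI learning paths and the overall knowledge system, see my other post, 《重磅 | 完备的人工智能AI 学习——基础知识学习路线，所有资料免关注免套路直接网盘下载》 ("Must-Read | A Complete AI Learning Path: Fundamentals, with All Materials Available for Direct Cloud-Drive Download, No Follow Required").
That post draws on well-known GitHub open-source projects, AI technology platforms, and experts in the field, including Datawhale, ApacheCN, AI有道, and Dr. 黄海广 (Huang Haiguang). It collects nearly 100 GB of related material, and I hope it helps everyone.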