MindSpore (昇思) 25-Day Learning Camp, Day 25 | MusicGen

MusicGen is a music generation model based on a language model (LM).

Music can be generated from a text prompt or an audio prompt.

MusicGen is based on the Transformer architecture.

text -> encoder -> hidden representation -> decoder -> music

from mindnlp.transformers import MusicgenForConditionalGeneration
model = MusicgenForConditionalGeneration.from_pretrained('facebook/musicgen-small')
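
The pipeline above corresponds to three sub-models wrapped in one class. A minimal sketch for inspecting them, assuming mindnlp mirrors the Hugging Face attribute names (text_encoder, decoder, audio_encoder):

# Illustrative inspection of the components behind text -> hidden representation -> music.
print(type(model.text_encoder))   # text encoder (T5): prompt -> hidden representation
print(type(model.decoder))        # MusicGen LM decoder: hidden representation -> audio codes
print(type(model.audio_encoder))  # EnCodec codec: audio codes <-> waveform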

There are two generation modes: greedy decoding and sampling. Sampling is said to give noticeably better results than greedy decoding.
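
A minimal sketch contrasting the two modes; only the do_sample flag changes (it reuses the unconditional inputs introduced in the next section):

unconditional_inputs = model.get_unconditional_inputs(num_samples=1)

# Greedy decoding: deterministic, tends to sound flat and repetitive.
greedy_audio = model.generate(**unconditional_inputs, do_sample=False, max_new_tokens=256)

# Sampling: stochastic decoding, generally gives more varied, better-sounding clips.
sampled_audio = model.generate(**unconditional_inputs, do_sample=True, max_new_tokens=256)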

With no prompt (unconditional generation)

%%time
import scipy.io.wavfile
from IPython.display import Audio

# Unconditional generation: no text or audio prompt, just sample 256 new tokens.
unconditional_inputs = model.get_unconditional_inputs(num_samples=1)
audio_values = model.generate(**unconditional_inputs, do_sample=True, max_new_tokens=256)

# Save the clip and play it directly in Jupyter.
sampling_rate = model.config.audio_encoder.sampling_rate
scipy.io.wavfile.write('musicgen_out.wav', rate=sampling_rate, data=audio_values[0, 0].asnumpy())
Audio(audio_values[0].asnumpy(), rate=sampling_rate)

# 256 generated tokens at the audio codec's frame rate (~50 Hz) is roughly 5 seconds of audio.
audio_length_in_s = 256 / model.config.audio_encoder.frame_rate
audio_length_in_s

With a text prompt

from mindnlp.transformers import AutoProcessor

processor = AutoProcessor.from_pretrained('facebook/musicgen-small')
inputs = processor(
    text=['90s pop track with bassy drums and synth', '80s rock song with loud guitars and heavy drums'],
    padding=True,
    return_tensors='ms',
)
# guidance_scale=3 applies classifier-free guidance, steering the output towards the text prompt.
audio_values = model.generate(**inputs, do_sample=True, guidance_scale=3, max_new_tokens=256)
scipy.io.wavfile.write('musicgen_out_text.wav', rate=sampling_rate, data=audio_values[0, 0].asnumpy())
Audio(audio_values[0].asnumpy(), rate=sampling_rate)
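
The batch above contains two prompts but only the first clip is saved. A short sketch for writing both (the output filenames are illustrative):

for i in range(audio_values.shape[0]):
    scipy.io.wavfile.write(f'musicgen_out_text_{i}.wav', rate=sampling_rate,
                           data=audio_values[i, 0].asnumpy())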

With an audio prompt

from datasets import load_dataset

processor = AutoProcessor.from_pretrained('facebook/musicgen-small')

# Stream one GTZAN sample and keep only its first half as the audio prompt.
dataset = load_dataset('sanchit-gandhi/gtzan', split='train', streaming=True)
sample = next(iter(dataset))['audio']
sample['array'] = sample['array'][:len(sample['array']) // 2]

inputs = processor(
    audio=sample['array'],
    sampling_rate=sample['sampling_rate'],
    text=['80s blues track with groovy saxophone'],
    padding=True,
    return_tensors='ms',
)
audio_values = model.generate(**inputs, do_sample=True, guidance_scale=3, max_new_tokens=256)
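
The audio-prompted clip can be saved and auditioned the same way as before (the filename is illustrative):

scipy.io.wavfile.write('musicgen_out_audio.wav', rate=sampling_rate, data=audio_values[0, 0].asnumpy())
Audio(audio_values[0].asnumpy(), rate=sampling_rate)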

Batched generation

# Two audio prompts of different lengths (a quarter and a half of the same clip),
# paired with two text prompts; padding=True pads the shorter prompt to match.
sample = next(iter(dataset))['audio']
sample_1 = sample['array'][:len(sample['array']) // 4]
sample_2 = sample['array'][:len(sample['array']) // 2]
inputs = processor(
    audio=[sample_1, sample_2],
    sampling_rate=sample['sampling_rate'],
    text=['80s blues track with groovy saxophone', '90s rock song with loud guitars and heavy drums'],
    padding=True,
    return_tensors='ms',
)
audio_values = model.generate(**inputs, do_sample=True, guidance_scale=3, max_new_tokens=256)

# batch_decode trims away the padding that was added to the shorter audio prompt.
audio_values = processor.batch_decode(audio_values, padding_mask=inputs.padding_mask)
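
A minimal sketch for saving both decoded clips, assuming batch_decode returns one waveform per prompt (the filenames and the array conversion are illustrative):

import numpy as np

for i, audio in enumerate(audio_values):
    wav = audio if isinstance(audio, np.ndarray) else audio.asnumpy()
    scipy.io.wavfile.write(f'musicgen_out_batch_{i}.wav', rate=sampling_rate, data=wav.squeeze())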

    
