python为窗口的关闭按钮增加自定义函数_python-keras:如何编写自定义的损失函数以将......

流派分类通常不需要定制的损失函数.

可以将歌曲分为多个预测窗口的组合模型设置为Multiple Instance Learning(MIL).

MIL是一种有监督的学习方法,其中标签不是在每个独立样本(实例)上,而是在实例的“袋”(无序集合)上.

在您的情况下,实例是MFCC功能的每个5秒窗口,而包就是整首歌.

在Keras中,我们使用TimeDistributed层对所有窗口执行模型.

然后,我们使用GlobalAveragePooling1D有效地合并结果

跨窗口实施平均投票.这比多数投票更容易区分.

下面是一个可运行的示例:

import math

import keras

import librosa

import pandas

import numpy

import sklearn

def window_model(n_bands, n_frames, n_classes, hidden=32):

from keras.layers import Input, Dense, Flatten, Conv2D, MaxPooling2D

out_units = 1 if n_classes == 2 else n_classes

out_activation = 'sigmoid' if n_classes == 2 else 'softmax'

shape = (n_bands, n_frames, 1)

# Basic CNN model

# An MLP could also be used, but may need to reshape on input and output

model = keras.Sequential([

Conv2D(16, (3,3), input_shape=shape),

MaxPooling2D((2,3)),

Conv2D(16, (3,3)),

MaxPooling2D((2,2)),

Flatten(),

Dense(hidden, activation='relu'),

Dense(hidden, activation='relu'),

Dense(out_units, activation=out_activation),

])

return model

def song_model(n_bands, n_frames, n_windows, n_classes=3):

from keras.layers import Input, TimeDistributed, GlobalAveragePooling1D

# Create the frame-wise model, will be reused across all frames

base = window_model(n_bands, n_frames, n_classes)

# GlobalAveragePooling1D expects a 'channel' dimension at end

shape = (n_windows, n_bands, n_frames, 1)

print('Frame model')

base.summary()

model = keras.Sequential([

TimeDistributed(base, input_shape=shape),

GlobalAveragePooling1D(),

])

print('Song model')

model.summary()

model.compile(loss='categorical_crossentropy', optimizer='SGD', metrics=['acc'])

return model

def extract_features(path, sample_rate, n_bands, hop_length, n_frames, window_length, song_length):

# melspectrogram might perform better with CNNs

from librosa.feature import mfcc

# Load a fixed length section of sound

# Might need to pad if some songs are too short

y, sr = librosa.load(path, sr=sample_rate, offset=0, duration=song_length)

assert sr == sample_rate, sr

_song_length = len(y)/sample_rate

assert _song_length == song_length, _song_length

# Split into windows

window_samples = int(sample_rate * window_length)

window_hop = window_samples//2 # use 50% overlap

windows = librosa.util.frame(y, frame_length=window_samples, hop_length=window_hop)

# Calculate features for each window

features = []

for w in range(windows.shape[1]):

win = windows[:, w]

f = mfcc(y=win, sr=sample_rate, n_mfcc=n_bands,

hop_length=hop_length, n_fft=2*hop_length)

f = numpy.expand_dims(f, -1) # add channels dimension

features.append(f)

features = numpy.stack(features)

return features

def main():

# Settings for our model

n_bands = 13 # MFCCs

sample_rate = 22050

hop_length = 512

window_length = 5.0

song_length_max = 1.0*60

n_frames = math.ceil(window_length / (hop_length/sample_rate))

n_windows = math.floor(song_length_max / (window_length/2))-1

model = song_model(n_bands, n_frames, n_windows)

# Generate some example data

ex = librosa.util.example_audio_file()

examples = 8

numpy.random.seed(2)

songs = pandas.DataFrame({

'path': [ex] * examples,

'genre': numpy.random.choice([ 'rock', 'metal', 'blues' ], size=examples),

})

assert len(songs.genre.unique() == 3)

print('Song data')

print(songs)

def get_features(path):

f = extract_features(path, sample_rate, n_bands,

hop_length, n_frames, window_length, song_length_max)

return f

from sklearn.preprocessing import LabelBinarizer

binarizer = LabelBinarizer()

y = binarizer.fit_transform(songs.genre.values)

print('y', y.shape, y)

features = numpy.stack([ get_features(p) for p in songs.path ])

print('features', features.shape)

model.fit(features, y)

if __name__ == '__main__':

main()

该示例输出内部和组合模型摘要:

Frame model

_________________________________________________________________

Layer (type) Output Shape Param #

=================================================================

conv2d_1 (Conv2D) (None, 11, 214, 16) 160

_________________________________________________________________

max_pooling2d_1 (MaxPooling2 (None, 5, 71, 16) 0

_________________________________________________________________

conv2d_2 (Conv2D) (None, 3, 69, 16) 2320

_________________________________________________________________

max_pooling2d_2 (MaxPooling2 (None, 1, 34, 16) 0

_________________________________________________________________

flatten_1 (Flatten) (None, 544) 0

_________________________________________________________________

dense_1 (Dense) (None, 32) 17440

_________________________________________________________________

dense_2 (Dense) (None, 32) 1056

_________________________________________________________________

dense_3 (Dense) (None, 3) 99

=================================================================

Total params: 21,075

Trainable params: 21,075

Non-trainable params: 0

_________________________________________________________________

Song model

_________________________________________________________________

Layer (type) Output Shape Param #

=================================================================

time_distributed_1 (TimeDist (None, 23, 3) 21075

_________________________________________________________________

global_average_pooling1d_1 ( (None, 3) 0

=================================================================

Total params: 21,075

Trainable params: 21,075

Non-trainable params: 0

_________________________________________________________________

并将特征向量的形状输入模型:

features (8, 23, 13, 216, 1)

8首歌曲,每个23个窗口,带有13个MFCC频段,每个窗口216帧.

第五维尺寸为1,使Keras开心…

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值