2. Softmax Regression
Softmax regression handles multi-class classification and, like logistic regression, is a linear classifier.
2.1 Deriving the Formula
Softmax regression models the multinomial distribution, the multi-class generalization of the Bernoulli distribution, so it too fits the generalized linear model (GLM) framework over exponential family distributions. First, recall that framework:
$$p(y;\eta) = b(y)\exp\left(\eta^T T(y) - a(\eta)\right)$$

where $\eta$ is the natural parameter (the quantity our model estimates), $T(y)$ is the sufficient statistic (usually just $y$), and $a(\eta)$ is the log-partition function (it ensures the distribution integrates, or sums, to 1).
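For intuition, the Bernoulli distribution (the k = 2 case) already fits this form:

$$p(y;\varphi) = \varphi^y(1-\varphi)^{1-y} = \exp\left(y\log\frac{\varphi}{1-\varphi} + \log(1-\varphi)\right)$$

so $b(y) = 1$, $T(y) = y$, $\eta = \log\frac{\varphi}{1-\varphi}$, and $a(\eta) = -\log(1-\varphi) = \log(1+e^{\eta})$; this logit/sigmoid pair is exactly what underlies logistic regression.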
Now let's see how the multinomial distribution plugs into this model:
- The target value y ∈ {1, 2, 3, …, k}, where k is the number of classes;
- The probability distribution is:

$$P(y=i) = \varphi_i, \qquad \sum_{i=1}^{k}{\varphi_i} = 1$$

- From the joint probability mass function, we derive the exponential family form:

$$\begin{aligned} P(y;\varphi) &= \varphi_1^{I[y=1]}\varphi_2^{I[y=2]}\cdots\varphi_k^{I[y=k]} \\ &= \varphi_1^{I[y=1]}\varphi_2^{I[y=2]}\cdots\varphi_k^{1-\sum_{i=1}^{k-1}{I[y=i]}} \\ &= \exp\left(\log\left(\varphi_1^{I[y=1]}\varphi_2^{I[y=2]}\cdots\varphi_k^{1-\sum_{i=1}^{k-1}{I[y=i]}}\right)\right) \\ &= \exp\left(\sum_{i=1}^{k-1}{I(y=i)\log\varphi_i} + \left(1-\sum_{i=1}^{k-1}{I(y=i)}\right)\log\varphi_k\right) \\ &= \exp\left(\sum_{i=1}^{k-1}{I(y=i)\log\frac{\varphi_i}{\varphi_k}} + \log\varphi_k\right) \end{aligned}$$
From this we can read off the natural parameter componentwise:

$$\eta_i = \log\frac{\varphi_i}{\varphi_k}, \qquad i = 1, \ldots, k-1$$

and by the same definition $\eta_k = \log\frac{\varphi_k}{\varphi_k} = 0$.
Solving back for the class probabilities:

$$\varphi_i = \varphi_k e^{\eta_i}$$

$$\sum_{i=1}^{k}{\varphi_i} = \sum_{i=1}^{k}{\varphi_k e^{\eta_i}} = 1 \;\Rightarrow\; \varphi_k = \frac{1}{\sum_{i=1}^{k}{e^{\eta_i}}}$$

$$\varphi_i = \frac{e^{\eta_i}}{\sum_{i=1}^{k}{e^{\eta_i}}}$$
Since the GLM further assumes the natural parameter is linear in the input, $\eta_j = \theta_j^T x$, we arrive at the softmax formula:
$$h_\theta(x^{(i)})_j = \frac{e^{Z_j}}{\sum_{l=1}^{k}{e^{Z_l}}}, \qquad Z_j = \theta_j^T x^{(i)}$$
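To make the formula concrete, here is a minimal NumPy sketch (the max-subtraction line is a standard numerical-stability trick, not part of the derivation above):

```python
import numpy as np

def softmax(z):
    """Compute e^{z_j} / sum_l e^{z_l} for a vector of scores z."""
    z = z - np.max(z)           # stability shift; leaves the result unchanged
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

z = np.array([2.0, 1.0, 0.1])   # hypothetical scores Z_j = theta_j^T x
print(softmax(z))               # class probabilities, summing to 1
```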
2.2 Loss Function
As with logistic regression, we apply maximum likelihood estimation (MLE): maximize the conditional probability of the observed labels. The likelihood is:
$$L(\theta) = \prod_{i=1}^m{P(y^i \mid x^i;\theta)} = \prod_{i=1}^m{\prod_{j=1}^k{\varphi_j^{I\{y^i = j\}}}}$$
Taking the log, and writing $\hat{y}_j^i$ for the model's predicted probability of class $j$ on sample $i$:
$$l(\theta) = \sum_{i=1}^m{\log P(y^i \mid x^i;\theta)} = \sum_{i=1}^m{\log \prod_{j=1}^k{\left(\hat{y}_j^i\right)^{I\{y^i = j\}}}}$$
Finally, negating and averaging (so that minimizing the loss maximizes the likelihood) yields the loss function:
$$J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^m{\sum_{j=1}^k{y_j^i \log \hat{y}_j^i}}\right]$$

where $y_j^i = I\{y^i = j\}$ is the one-hot label.
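A minimal NumPy sketch of this cross-entropy loss, assuming hypothetical one-hot labels Y and predicted probabilities Y_hat, each of shape m × k:

```python
import numpy as np

def cross_entropy(Y, Y_hat, eps=1e-12):
    """J = -(1/m) * sum_i sum_j y_j^i * log(yhat_j^i)."""
    m = Y.shape[0]
    return -np.sum(Y * np.log(Y_hat + eps)) / m   # eps guards against log(0)

Y     = np.array([[1, 0, 0], [0, 1, 0]])              # one-hot labels
Y_hat = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])  # softmax outputs
print(cross_entropy(Y, Y_hat))
```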
Note also that when the number of classes k = 2, this recovers the logistic regression loss:
$$J(\theta) = -\frac{1}{m}\sum_{i=1}^m{\left[y_1 \log \frac{e^{Z_1}}{e^{Z_1} + e^{Z_2}} + y_2 \log \frac{e^{Z_2}}{e^{Z_1} + e^{Z_2}}\right]}$$
Substituting $y_2 = 1 - y_1$ and dividing the numerator and denominator of each fraction by its numerator:
$$J(\theta) = -\frac{1}{m}\sum_{i=1}^m{\left[y_1 \log \frac{1}{1 + e^{Z_2 - Z_1}} + \left(1 - y_1\right) \log \frac{1}{1 + e^{Z_1 - Z_2}}\right]}$$
Setting $\theta = \theta_1 - \theta_2$, the first fraction becomes the sigmoid $\frac{1}{1 + e^{-\theta^T x}}$, so this is exactly the logistic regression loss with a single effective parameter vector.
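A quick numerical check of this reduction, with hypothetical parameter vectors theta1 and theta2:

```python
import numpy as np

theta1 = np.array([0.5, -1.0, 2.0])   # hypothetical class-1 weights
theta2 = np.array([1.0,  0.3, -0.7])  # hypothetical class-2 weights
x      = np.array([1.0,  2.0,  0.5])

z1, z2 = theta1 @ x, theta2 @ x
softmax_p1 = np.exp(z1) / (np.exp(z1) + np.exp(z2))        # two-class softmax
sigmoid_p1 = 1.0 / (1.0 + np.exp(-(theta1 - theta2) @ x))  # sigmoid with theta = theta1 - theta2
print(np.isclose(softmax_p1, sigmoid_p1))                  # True
```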
2.3 Differences Between Logistic Regression and Softmax Regression
The difference comes down to the shape of the parameters (two diagrams would make this obvious; a shape sketch follows below):
- Logistic regression has one weight per feature dimension: with 3 features, W = {w1, w2, w3}, plus a bias.
- Softmax regression has one weight vector per class: with 5 features and 3 classes, W has 3 × 5 = 15 weights, plus per-class biases.
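A minimal sketch of those parameter shapes (the arrays are placeholders; only the shapes matter):

```python
import numpy as np

n_features, n_classes = 5, 3

# Logistic regression (binary): one weight per feature, plus a single bias
W_logistic = np.zeros(n_features)   # shape (5,)
b_logistic = 0.0

# Softmax regression: one weight vector per class, plus one bias per class
W_softmax = np.zeros((n_classes, n_features))   # shape (3, 5) -> 15 weights
b_softmax = np.zeros(n_classes)                 # shape (3,)
print(W_logistic.shape, W_softmax.shape)
```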
2.4 Music Classifier Code Implementation
2.4.1 Data Preprocessing (Fourier Transform)
Apply an FFT to the first 100 tracks of each genre and save the resulting features:
```python
import numpy as np
from scipy import fft
from scipy.io import wavfile
import matplotlib.pyplot as plt


def load_music_data(genre, num):
    # Load the audio file
    sample_rate, X = wavfile.read('../../data/practice/genres/' + genre + '/converted/' + genre + '.' + str(num).zfill(5) + '.au.wav')
    print(sample_rate, X.shape)  # sample rate (samples/s) and waveform shape
    fft_features = FFT(sample_rate, X)
    np.save('../../data/practice/genres/fft/' + genre + '.' + str(num).zfill(5) + '.fft', fft_features)


def FFT(rate, X):
    # Plot the frequency-domain representation
    plt.plot(abs(fft.fft(X, rate)))
    plt.xlabel('frequency')
    plt.ylabel('amplitude')
    plt.title('FFT of music')
    # Very high frequencies are not perceived as music, so keep only the first 1000 bins
    return abs(fft.fft(X, rate)[:1000])


if __name__ == '__main__':
    # Fourier-transform the first 100 tracks of each genre
    genre_list = ['classical', 'jazz', 'country', 'pop', 'rock', 'metal']
    for g in genre_list:
        for n in range(100):
            load_music_data(g, n)
```
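Assuming the script above has been run, each saved file holds a 1000-dimensional feature vector, which can be spot-checked like this:

```python
import numpy as np

# Inspect one saved feature vector; the path mirrors the np.save call above
feats = np.load('../../data/practice/genres/fft/classical.00000.fft.npy')
print(feats.shape)  # expected: (1000,)
```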
2.4.2 Training and Prediction
Train and save the model:
```python
import numpy as np
from sklearn.linear_model import LogisticRegression
import pickle


def load_music_data():
    """
    Read the Fourier-transformed data.
    :return: data X and labels y
    """
    X, y = [], []
    genre_list = ['classical', 'jazz', 'country', 'pop', 'rock', 'metal']
    for g in genre_list:
        for n in range(100):
            fft_features = np.load('../../data/practice/genres/fft/' + g + '.' + str(n).zfill(5) + '.fft.npy')
            X.append(fft_features)
            y.append(genre_list.index(g))  # label = index of the genre in genre_list
    return np.array(X), np.array(y)


if __name__ == '__main__':
    # Load the FFT features
    X, y = load_music_data()
    # multi_class='multinomial' makes LogisticRegression a softmax regression
    model = LogisticRegression(multi_class='multinomial', solver='sag', max_iter=10000)
    model.fit(X, y)
    # Save the trained model
    with open('model.pkl', 'wb') as output:
        pickle.dump(model, output)
```
Load the saved model and predict:
```python
from pprint import pprint
from scipy.fft import fft
from scipy.io import wavfile
import pickle

if __name__ == '__main__':
    # Load the trained model
    with open('model.pkl', 'rb') as pkl_file:
        model = pickle.load(pkl_file)
    pprint(model)

    print('Starting read wavfile...')
    music_name = 'Sound Of Silence(From The Graduate)_Various Artists_128K.wav'
    sample_rate, X = wavfile.read('D:/Desktop/Videos/' + music_name)  # assumes a mono file, like the training data
    # Same preprocessing as training: FFT of length sample_rate, first 1000 magnitudes
    test_fft_features = abs(fft(X, sample_rate)[:1000])
    # sklearn expects a 2D array of shape (n_samples, n_features)
    temp = model.predict(test_fft_features.reshape(1, -1))
    print(temp)
```
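The prediction is a class index; mapping it back to a genre name just reuses the genre_list ordering from training:

```python
# Map the predicted class index back to a genre name.
# The list order must match genre_list as used during training.
genre_list = ['classical', 'jazz', 'country', 'pop', 'rock', 'metal']
print(genre_list[temp[0]])  # e.g. index 3 -> 'pop'
```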