(Hmm, somehow LaTeX isn't working here, so I'll upload it as an image):
This is how I implemented it:

import numpy as np

def discriminant_function(x_vec, cov_mat, mu_vec):
    """
    Calculates the value of the discriminant function for a dx1 dimensional
    sample given covariance matrix and mean vector.

    Keyword arguments:
        x_vec: A dx1 dimensional numpy array representing the sample.
        cov_mat: numpy array of the covariance matrix.
        mu_vec: dx1 dimensional numpy array of the sample mean.

    Returns a float value as result of the discriminant function.
    """
    cov_inv = np.linalg.inv(cov_mat)  # invert once and reuse

    # quadratic term
    W_i = -0.5 * cov_inv
    assert(W_i.shape[0] > 1 and W_i.shape[1] > 1), 'W_i must be a matrix'

    # linear term
    w_i = cov_inv.dot(mu_vec)
    assert(w_i.shape[0] > 1 and w_i.shape[1] == 1), 'w_i must be a column vector'

    # constant term: -1/2 mu^T cov^-1 mu - 1/2 ln|cov|
    # (the prior term ln P(w_i) is left out, i.e. equal priors are assumed)
    omega_i_p1 = ((-0.5 * mu_vec.T).dot(cov_inv)).dot(mu_vec)
    omega_i_p2 = -0.5 * np.log(np.linalg.det(cov_mat))
    # both parts already carry their minus sign, so they are added
    omega_i = omega_i_p1 + omega_i_p2
    assert(omega_i.shape == (1, 1)), 'omega_i must be a scalar'

    g = ((x_vec.T).dot(W_i)).dot(x_vec) + (w_i.T).dot(x_vec) + omega_i
    return float(g.item())
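
One way to sanity-check an expansion like this is to compare it with the log of the Gaussian density it is derived from, ignoring the class-independent term -(d/2) ln(2π). The sketch below assumes equal priors and the standard form with a -1/2 ln|Σ| term. The function names and the sample values are made up for illustration and are not part of the code above.

```python
import numpy as np

def discriminant(x_vec, cov_mat, mu_vec):
    """Expanded quadratic discriminant for one class (equal priors assumed)."""
    cov_inv = np.linalg.inv(cov_mat)
    W = -0.5 * cov_inv                        # quadratic term
    w = cov_inv.dot(mu_vec)                   # linear term
    omega = (-0.5 * mu_vec.T.dot(cov_inv).dot(mu_vec)
             - 0.5 * np.log(np.linalg.det(cov_mat)))  # constant term
    return (x_vec.T.dot(W).dot(x_vec) + w.T.dot(x_vec) + omega).item()

def log_gaussian_without_constant(x_vec, cov_mat, mu_vec):
    """ln N(x | mu, cov) with the term -(d/2) ln(2 pi) dropped."""
    diff = x_vec - mu_vec
    maha = diff.T.dot(np.linalg.solve(cov_mat, diff)).item()
    _, logdet = np.linalg.slogdet(cov_mat)
    return -0.5 * maha - 0.5 * logdet

# Hypothetical example values (column vectors, as in the docstring above)
x = np.array([[1.0], [2.0]])
mu = np.array([[0.5], [-1.0]])
cov = np.array([[2.0, 0.3], [0.3, 1.0]])

g_expanded = discriminant(x, cov, mu)
g_direct = log_gaussian_without_constant(x, cov, mu)
print(g_expanded, g_direct)
```

If the two printed values agree up to floating-point error, the signs of the three terms are consistent with the density.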
To classify the data, I wrote:

import operator
[the rest of this code block is missing from the source]
So far, the code works for classification, e.g.:

import prettytable

classification_dict, error = empirical_error(
    all_samples, [1, 2], classify_data,
    [discriminant_function,
     [mu_est_1, mu_est_2],
     [cov_est_1, cov_est_2]])

labels_predicted = ['w{} (predicted)'.format(i) for i in [1, 2]]
labels_predicted.insert(0, 'training dataset')

train_conf_mat = prettytable.PrettyTable(labels_predicted)
for i in [1, 2]:
    a, b = [classification_dict[i][j] for j in [1, 2]]
    # workaround to unpack (Python 3.3 does not support '*a' inside a list literal)
    train_conf_mat.add_row(['w{} (actual)'.format(i), a, b])
print(train_conf_mat)
print('Empirical Error: {:.2f} ({:.2f}%)'.format(error, error * 100))
+------------------+----------------+----------------+
| training dataset | w1 (predicted) | w2 (predicted) |
+------------------+----------------+----------------+
|   w1 (actual)    |       49       |       1        |
|   w2 (actual)    |       1        |       49       |
+------------------+----------------+----------------+
Empirical Error: 0.02 (2.00%)
This is for a simple dataset like the following:
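
For readers who want to try a setup of this kind, here is a minimal sketch that draws 50 samples per class, estimates the parameters, classifies each sample by the larger discriminant value, and counts the confusion matrix. The class means, the covariance, the seed, and every name in the block are assumptions made for illustration. This is not the dataset from the post and not the `classify_data` / `empirical_error` code used above.

```python
import numpy as np

def log_discriminant(X, mu, cov):
    """Row-wise quadratic discriminant for samples X (equal priors assumed)."""
    diff = X - mu
    maha = np.einsum('ij,ij->i', diff.dot(np.linalg.inv(cov)), diff)
    return -0.5 * maha - 0.5 * np.log(np.linalg.det(cov))

rng = np.random.default_rng(0)

# Hypothetical class parameters, not taken from the post
mu_1, mu_2 = np.array([0.0, 0.0]), np.array([3.0, 3.0])
cov = np.eye(2)
samples = {1: rng.multivariate_normal(mu_1, cov, 50),
           2: rng.multivariate_normal(mu_2, cov, 50)}

# Maximum likelihood estimates of mean and covariance per class
params = {label: (X.mean(axis=0), np.cov(X, rowvar=False, bias=True))
          for label, X in samples.items()}

# confusion[actual][predicted] = number of samples
confusion = {actual: {pred: 0 for pred in (1, 2)} for actual in (1, 2)}
for actual, X in samples.items():
    scores = np.column_stack([log_discriminant(X, *params[k]) for k in (1, 2)])
    predicted = scores.argmax(axis=1) + 1  # column 0 -> class 1, column 1 -> class 2
    for pred in (1, 2):
        confusion[actual][pred] = int((predicted == pred).sum())

n_total = sum(len(X) for X in samples.values())
error = (confusion[1][2] + confusion[2][1]) / n_total
print(confusion)
print('Empirical Error: {:.2f} ({:.2f}%)'.format(error, error * 100))
```

The nested dictionary mirrors the `classification_dict[i][j]` access pattern seen earlier, so it could be fed into the same PrettyTable code.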
EDIT:
For the simple case of equal covariances (a linear decision boundary), I was able to use the fsolve function:

from scipy.optimize import fsolve

x = list(np.arange(-2, 6, 0.1))
y = [fsolve(lambda y: discr_func(i, y, cov_mat=cov_est_1, mu_vec=mu_est_1) -
                      discr_func(i, y, cov_mat=cov_est_2, mu_vec=mu_est_2), 0)
     for i in x]
However, it does not work for the quadratic case; I get:

/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/scipy/optimize/minpack.py:236: RuntimeWarning: The iteration is not making good progress, as measured by the
  improvement from the last five Jacobian evaluations.
  warnings.warn(msg, RuntimeWarning)

Any suggestions or alternatives?
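
One possible alternative, sketched here under the assumption of Gaussian classes with equal priors: for a fixed x1, the difference of two quadratic discriminants is a polynomial of degree at most 2 in x2, so its roots can be computed in closed form instead of iterating. A root finder started from a single guess can struggle when a given x1 has two boundary points or none. The helper names `g` and `boundary_x2`, the tolerance, and all parameter values are hypothetical.

```python
import numpy as np

def g(x1, x2, mu, cov):
    """Quadratic discriminant at the point (x1, x2), equal priors assumed."""
    diff = np.array([x1, x2]) - mu
    return -0.5 * diff.dot(np.linalg.solve(cov, diff)) - 0.5 * np.log(np.linalg.det(cov))

def boundary_x2(x1, mu_a, cov_a, mu_b, cov_b, tol=1e-9):
    """Real x2 values where g_a - g_b = 0 for a fixed x1 (sorted list)."""
    # d(x2) = a*x2**2 + b*x2 + c; three evaluations fix the coefficients.
    d_m1, d_0, d_p1 = [g(x1, p, mu_a, cov_a) - g(x1, p, mu_b, cov_b)
                       for p in (-1.0, 0.0, 1.0)]
    c = d_0
    b = (d_p1 - d_m1) / 2.0
    a = (d_p1 + d_m1) / 2.0 - c
    if abs(a) < tol:                       # equal covariances: linear boundary
        return [] if abs(b) < tol else [-c / b]
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return []                          # no boundary point at this x1
    root = np.sqrt(disc)
    return sorted([(-b - root) / (2.0 * a), (-b + root) / (2.0 * a)])

# Hypothetical parameters: equal covariances give the line x2 = 3 - x1
linear = boundary_x2(1.0, np.array([0.0, 0.0]), np.eye(2),
                     np.array([3.0, 3.0]), np.eye(2))

# Hypothetical parameters: unequal covariances give a circle around the origin
quad_inside = boundary_x2(0.0, np.zeros(2), np.eye(2), np.zeros(2), 4.0 * np.eye(2))
quad_outside = boundary_x2(5.0, np.zeros(2), np.eye(2), np.zeros(2), 4.0 * np.eye(2))
print(linear, quad_inside, quad_outside)
```

An empty list means the boundary does not cross that x1, so the point can simply be skipped when plotting; two values mean the curve has an upper and a lower branch there.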
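EDIT 2:
I was able to solve it with scipy.optimize.bisect (similar to fsolve). The results look "right": I solved the equation for a simpler case where the decision boundary is a linear function (x2=3-x1), and when I used bisect on it, it computed the exact results, e.g., x1=3 and x2=3.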
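
A check of this kind can be sketched as follows, using a residual that is zero exactly on the line x2 = 3 - x1. The function name `boundary_residual`, the chosen x1 values, and the bracket [-10, 10] are assumptions for illustration, not the original code.

```python
from scipy.optimize import bisect

def boundary_residual(x1, x2):
    """Zero exactly on the linear decision boundary x2 = 3 - x1."""
    return x1 + x2 - 3.0

# Hypothetical x1 values; for each one, search x2 in the bracket [-10, 10]
x1_values = [-2.0, 0.0, 1.0, 2.0]
x2_values = [bisect(lambda x2: boundary_residual(x1, x2), -10, 10)
             for x1 in x1_values]

for x1, x2 in zip(x1_values, x2_values):
    print('x1 = {:5.2f}  x2 = {:5.2f}  residual = {:.1e}'.format(
        x1, x2, boundary_residual(x1, x2)))
```

Note that `bisect` needs the function to change sign between the two bracket ends and raises a `ValueError` otherwise, so for a curved boundary the bracket may have to be chosen per x1.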
Here, the linear boundary and the maximum likelihood estimate (MLE) boundary give the same result!
Thank you very much for your time and help!

from matplotlib import pyplot as plt
import numpy as np
import scipy.optimize

x = np.arange(-6, 6, 0.1)
true_y = [true_dec_bound(x1) for x1 in x]

for i in [50, 1000, 10000]:

    # compute boundary for MLE estimate
    y_est = []
    for j in x:
        y_est.append(scipy.optimize.bisect(
            lambda y: discr_func(j, y, cov_mat=cov1_ests[i], mu_vec=mu1_ests[i]) -
                      discr_func(j, y, cov_mat=cov2_ests[i], mu_vec=mu2_ests[i]),
            -10, 10))
    y_est = [float(y) for y in y_est]

    # plot data
    f, ax = plt.subplots(figsize=(7, 7))
    plt.ylabel('$x_2$', size=20)
    plt.xlabel('$x_1$', size=20)
    # raw strings keep the backslash in the LaTeX labels intact
    ax.scatter(samples_c1[i][:, 0], samples_c1[i][:, 1],
               marker='o', color='green', s=40, alpha=0.5, label=r'$\omega_1$')
    ax.scatter(samples_c2[i][:, 0], samples_c2[i][:, 1],
               marker='^', color='red', s=40, alpha=0.5, label=r'$\omega_2$')
    plt.title('%s bivariate random training samples per class' % i)
    plt.legend()

    # plot boundaries (using the values computed above for this sample size)
    plt.plot(x, true_y, 'b--', lw=3, label='true param. boundary')
    plt.plot(x, y_est, 'k--', lw=3, label='MLE boundary')
    plt.legend(loc='lower left')
    plt.show()