One Vs All

Purpose:

Train a one-vs-all classifier. In this exercise we build a model that recognizes the ten handwritten digits 0–9.

Loading the data:
import numpy as np
from scipy import io
import scipy.optimize as op
import matplotlib.pyplot as plt

# Initialize parameters
input_layer_size = 400  # 20x20 input images of digits
num_labels = 10         # digits 0-9 (digit 0 is stored as label 10)


# Load the data
print('Loading and Visualizing data')
data = io.loadmat('ex3data1.mat')
X = data["X"]
y = data["y"]
m = np.size(X, 0)
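The label encoding is worth pinning down before training: y holds values 1–10, with 10 standing in for the digit 0 (a MATLAB-era convention). A minimal sketch with synthetic stand-ins for the loaded arrays (the shapes and ranges here are assumptions matching the ex3 data, not read from the .mat file):

```python
import numpy as np

# Synthetic stand-in with the same layout as ex3data1.mat:
# each row of X is a flattened 20x20 grayscale image, and
# y contains labels 1..10, where 10 encodes the digit 0.
m_demo = 12
X_demo = np.random.rand(m_demo, 400)
y_demo = (np.arange(m_demo) % 10 + 1).reshape(-1, 1)  # cycles through 1..10

# Recover the actual digit: label 10 -> digit 0, others unchanged.
digits = np.where(y_demo == 10, 0, y_demo)
print(X_demo.shape, digits.min(), digits.max())  # (12, 400) 0 9
```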
Visualizing the data:
def displayData(x):
    example_width = int(np.round(np.sqrt(np.size(x, 1))))
    m, n = x.shape
    example_height = int(n / example_width)
    display_rows = int(np.floor(np.sqrt(m)))
    display_cols = int(np.ceil(m / display_rows))  # must be cast to int, or the indexing below fails
    pad = 1
    display_array = -np.ones((pad + display_rows * (example_height + pad),
                              pad + display_cols * (example_width + pad)))  # blank canvas
    curr_ex = 0
    for j in range(display_rows):
        for i in range(display_cols):
            if curr_ex >= m:  # stop once every example has been placed
                break
            max_val = np.max(np.abs(x[curr_ex, :]))  # normalize each patch by its largest magnitude
            display_array[pad+j*(example_height+pad):pad+j*(example_height+pad)+example_height,
                            pad+i*(example_width+pad):pad+i*(example_width+pad)+example_width]\
                = x[curr_ex, :].reshape((example_height, example_width)) / max_val
            curr_ex += 1
        if curr_ex >= m:
            break
    plt.figure()
    plt.imshow(display_array.T, cmap='gray', extent=[-1, 1, -1, 1])  # extent maps the image onto the [-1, 1] axis range; transpose because the patches are stored column-major
    plt.axis('off')
    plt.show()
rand_indices = np.random.permutation(m)
sel = X[rand_indices[0:100], :]
displayData(sel)
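The slicing arithmetic inside displayData is easy to get off by one, so here is the same tiling logic on a toy case (the sizes are my own, not from the exercise): four 2x2 patches placed on a padded canvas.

```python
import numpy as np

pad, rows, cols, h, w = 1, 2, 2, 2, 2
canvas = -np.ones((pad + rows * (h + pad), pad + cols * (w + pad)))
patches = [np.full((h, w), k, dtype=float) for k in range(rows * cols)]

k = 0
for j in range(rows):
    for i in range(cols):
        r0 = pad + j * (h + pad)  # top-left corner of tile (j, i)
        c0 = pad + i * (w + pad)
        canvas[r0:r0 + h, c0:c0 + w] = patches[k]
        k += 1

print(canvas.shape)  # (7, 7): a 1-pixel border plus two 2x2 tiles per axis
```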

One run produced the following:

[Figure: ova-data — a 10x10 grid of randomly selected digit images]


Training the model:
Cost function:
def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def lrCostFunction(theta, x, y, lambda_t):
    m = np.size(y, 0)
    grad = np.zeros(theta.shape)
    z = x.dot(theta)
    J = 1 / m * (-y.dot(np.log(sigmoid(z))) - ((1 - y).dot(np.log(1 - sigmoid(z))))) + lambda_t * (
        theta[1:].dot(theta[1:])) / (2 * m)  # the bias term theta[0] is excluded from regularization
    grad[1:] = 1 / m * ((x[:, 1:].T.dot(sigmoid(z) - y)) + lambda_t * theta[1:])  # likewise, no penalty on grad[0]
    grad[0] = 1 / m * (x[:, 0].T.dot(sigmoid(z) - y))
    return J, grad
Testing:
print('Testing lrCostFunction with regularization')
theta_t = np.array([-2, -1, 1, 2])
x_t = np.c_[np.ones(5), np.arange(1, 16).reshape((3, 5)).T/10]
y_t = np.array([1, 0, 1, 0, 1])
lambda_t = 3
J, grad = lrCostFunction(theta_t, x_t, y_t, lambda_t)
np.set_printoptions(formatter={'float': '{: 0.6f}'.format})
print('Cost: {:0.7f}'.format(J))
print('Expected cost: 2.534819')
print('Gradients:\n{}'.format(grad))
print('Expected gradients:\n[ 0.146561 -0.548558 0.724722 1.398003]')
_ = input('Press [enter] to continue')

The output:

Testing lrCostFunction with regularization
Cost: 2.5348194
Expected cost: 2.534819
Gradients:
[ 0.146561 -0.548558  0.724722  1.398003]
Expected gradients:
[ 0.146561 -0.548558 0.724722 1.398003]
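Beyond comparing against the published numbers, a finite-difference check is a handy way to validate the analytic gradient. The sketch below restates the cost/gradient on the same tiny fixture (bias excluded from regularization) and compares against central differences:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost_grad(theta, x, y, lam):
    m = y.size
    z = x.dot(theta)
    J = (-y.dot(np.log(sigmoid(z))) - (1 - y).dot(np.log(1 - sigmoid(z)))) / m \
        + lam * theta[1:].dot(theta[1:]) / (2 * m)
    grad = x.T.dot(sigmoid(z) - y) / m
    grad[1:] += lam * theta[1:] / m  # bias term theta[0] carries no penalty
    return J, grad

theta_t = np.array([-2., -1., 1., 2.])
x_t = np.c_[np.ones(5), np.arange(1, 16).reshape((3, 5)).T / 10]
y_t = np.array([1., 0., 1., 0., 1.])
J, grad = cost_grad(theta_t, x_t, y_t, 3.0)

# Central differences give an independent estimate of each partial derivative.
eps = 1e-6
num_grad = np.zeros_like(theta_t)
for i in range(theta_t.size):
    e = np.zeros_like(theta_t)
    e[i] = eps
    num_grad[i] = (cost_grad(theta_t + e, x_t, y_t, 3.0)[0]
                   - cost_grad(theta_t - e, x_t, y_t, 3.0)[0]) / (2 * eps)

print(J, np.max(np.abs(grad - num_grad)))  # the two gradients should agree closely
```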

Gradient descent:
def costFun(theta, x, y, lamd):
    return lrCostFunction(theta, x, y, lamd)[0]


def gradient(theta, x, y, lamd):
    return lrCostFunction(theta, x, y, lamd)[1]


def oneVsAll(x, y, num_labels, lamd):
    m, n = x.shape
    all_theta = np.zeros((num_labels, n+1))
    x = np.c_[np.ones(m), x]
    for i in range(num_labels):
        initial_theta = np.zeros((n + 1,))
        num = 10 if i == 0 else i  # row 0 holds the classifier for label 10 (digit 0)
        # method must be BFGS here (TNC raised an error), and y has to be
        # flattened to 1-D or the cost function misbehaves
        res = op.minimize(fun=costFun, x0=initial_theta, method='BFGS', jac=gradient,
                          args=(x, 1*(y == num).flatten(), lamd), options={'maxiter': 50})
        all_theta[i, :] = res.x
    return all_theta
print('Training One-VS-All Logistic Regression')
lamd = 0.1
all_theta = oneVsAll(X, y, num_labels, lamd)
_ = input('Press [enter] to continue')
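To watch the loop do its job without the full digit set, the same pattern can be exercised on toy 2-D data. This sketch (the blob centers, seed, and lambda are my own choices, not from the exercise) trains one regularized classifier per class and predicts by argmax; note that scipy's minimize can also take a single function returning (cost, grad) via jac=True, and that log(1+e^z) - y*z is a numerically safer form of the same cross-entropy:

```python
import numpy as np
import scipy.optimize as op

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost_grad(theta, x, y, lam):
    m = y.size
    z = x.dot(theta)
    # np.logaddexp(0, z) - y*z equals -y*log(sigmoid) - (1-y)*log(1-sigmoid),
    # but never produces log(0).
    J = np.mean(np.logaddexp(0, z) - y * z) + lam * theta[1:].dot(theta[1:]) / (2 * m)
    grad = x.T.dot(sigmoid(z) - y) / m
    grad[1:] += lam * theta[1:] / m
    return J, grad

# Three Gaussian blobs in 2-D, 30 points each.
rng = np.random.default_rng(0)
centers = np.array([[0., 0.], [4., 0.], [0., 4.]])
X_toy = np.vstack([c + rng.standard_normal((30, 2)) for c in centers])
y_toy = np.repeat(np.arange(3), 30)

Xb = np.c_[np.ones(X_toy.shape[0]), X_toy]  # prepend the bias column
all_theta_toy = np.zeros((3, 3))
for k in range(3):
    # One binary classifier per class: label 1 for class k, 0 otherwise.
    res = op.minimize(cost_grad, np.zeros(3), jac=True, method='BFGS',
                      args=(Xb, (y_toy == k).astype(float), 0.1))
    all_theta_toy[k] = res.x

pred = np.argmax(sigmoid(Xb.dot(all_theta_toy.T)), axis=1)
print('toy accuracy:', np.mean(pred == y_toy))
```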

Prediction:
def predictOneVsAll(theta, x):
    m = x.shape[0]
    x = np.c_[np.ones(m), x]
    p = np.argmax(sigmoid(x.dot(theta.T)), axis=1)
    p[p == 0] = 10  # row 0 is the classifier for label 10 (digit 0), so map index 0 back to 10
    return p


p = predictOneVsAll(all_theta, X)
print('Training Set Accuracy: ', np.mean(np.double(p == y.flatten())) * 100)
Result:
Training Set Accuracy:  93.28  

As you can see, the accuracy is fairly high.
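A single overall number can hide weak classes, so a per-label breakdown is a quick diagnostic. A sketch on synthetic prediction/label vectors (p_demo and y_demo are stand-ins for p and y.flatten() above, with ~7% of predictions corrupted on purpose):

```python
import numpy as np

# Stand-in labels: 20 examples per label 1..10, predictions mostly correct.
rng = np.random.default_rng(1)
y_demo = np.repeat(np.arange(1, 11), 20)
p_demo = y_demo.copy()
flip = rng.random(y_demo.size) < 0.07
p_demo[flip] = rng.integers(1, 11, size=flip.sum())

for label in range(1, 11):
    mask = y_demo == label
    acc = np.mean(p_demo[mask] == label)
    print(f'label {label:2d}: {acc:.3f} over {mask.sum()} examples')
print('overall:', np.mean(p_demo == y_demo))
```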
