- Gradient descent
Since the loss function of LR (logistic regression) is

$$J(\omega) = -\frac{1}{m}\sum_{i=1}^{m}\Big[y^{(i)}\log h_\omega(x^{(i)}) + (1-y^{(i)})\log\big(1-h_\omega(x^{(i)})\big)\Big], \qquad h_\omega(x)=\frac{1}{1+e^{-\omega^{T}x}},$$

the problem becomes solving $\min_\omega J(\omega)$. Gradient descent repeatedly applies the update

$$\omega_j := \omega_j - \alpha\,\frac{\partial J(\omega)}{\partial \omega_j},$$

where α is the step size, and stops once J(ω) can no longer be decreased.
The biggest problem with gradient descent is that it can get stuck in a local optimum; in addition, every cost evaluation requires a pass over all samples, so the computation is slow (although the update of the whole ω vector can be written as a matrix multiplication).
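A rough sketch of this batch update written in matrix form (the function name `batch_gd` and the parameters `alpha`, `num_iters` are illustrative assumptions, not part of the exercise code):

```matlab
% Minimal batch gradient descent sketch for logistic regression (illustrative only).
% X: m-by-n design matrix, y: m-by-1 label vector, alpha: step size, num_iters: iteration budget.
function theta = batch_gd(X, y, alpha, num_iters)
  [m, n] = size(X);
  theta = zeros(n, 1);
  for iter = 1:num_iters
    h = 1 ./ (1 + exp(-X * theta));   % sigmoid over all m samples at once
    grad = X' * (h - y) / m;          % full-batch gradient in matrix form
    theta = theta - alpha * grad;     % one gradient descent step of size alpha
  end
end
```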
- Stochastic gradient descent (SGD)
Many frameworks today (e.g. Mahout) use stochastic gradient descent. When computing the cost it only evaluates the cost of the current sample, and the overall cost is obtained by summing these values over one full pass of the data. When updating the parameter w it does not walk through the samples in order, but picks one sample at random from the whole set for each update. The method converges quickly (a maximum number of iterations is usually used as the stopping criterion), can help escape local optima, and is easy to parallelize (for example with a parameter server).
One place where SGD can be improved is to use a dynamic (decaying) step size instead of a fixed one.
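A minimal SGD sketch with a decaying step size (the names `sgd_lr`, `alpha0`, `num_epochs` and the particular decay schedule are assumptions for illustration, not necessarily what Mahout or other frameworks use):

```matlab
% Minimal SGD sketch for logistic regression with a dynamic (decaying) step size.
% X: m-by-n design matrix, y: m-by-1 labels, alpha0: initial step size, num_epochs: passes over data.
function theta = sgd_lr(X, y, alpha0, num_epochs)
  [m, n] = size(X);
  theta = zeros(n, 1);
  t = 0;
  for epoch = 1:num_epochs
    for k = 1:m
      i = randi(m);                         % pick one sample at random, not in order
      t = t + 1;
      alpha = alpha0 / (1 + t / m);         % dynamic step size: shrinks as training proceeds
      h = 1 / (1 + exp(-X(i,:) * theta));   % prediction for the sampled row only
      theta = theta - alpha * (h - y(i)) * X(i,:)';  % single-sample gradient step
    end
  end
end
```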
- Other optimization methods
- Quasi-Newton methods (using the Hessian matrix and a Cholesky decomposition)
- BFGS
- L-BFGS
Pros and cons: no need to choose the learning rate α and convergence is usually faster, but the methods are more complex (see the sketch below).
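For example, Matlab's built-in `fminunc` applies this kind of quasi-Newton optimization and chooses the step size itself; a minimal sketch, assuming `X`, `y` and `initial_theta` are already in the workspace and using the `costFunction` from the Matlab implementation below:

```matlab
% Let fminunc (quasi-Newton style) minimize the logistic regression cost;
% 'GradObj','on' tells it that costFunction also returns the gradient.
options = optimset('GradObj', 'on', 'MaxIter', 400);
[theta, cost] = fminunc(@(t) costFunction(t, X, y), initial_theta, options);
```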
Matlab implementation:
https://github.com/fairlyxu/ml/tree/master/mlclass-ex2-007/mlclass-ex2-007/mlclass-ex2
```matlab
function [J, grad] = costFunction(theta, X, y)
%COSTFUNCTION Compute cost and gradient for logistic regression
%   J = COSTFUNCTION(theta, X, y) computes the cost of using theta as the
%   parameter for logistic regression and the gradient of the cost
%   w.r.t. to the parameters.

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta
%
% Note: grad should have the same dimensions as theta
%
for i = 1:m
    J = J - y(i,:)*log(sigmoid(X(i,:)*theta)) - (1-y(i,:))*log(1-sigmoid(X(i,:)*theta));
    grad(1) = grad(1) + (sigmoid(X(i,:) * theta) - y(i)) * X(i,1);
    grad(2) = grad(2) + (sigmoid(X(i,:) * theta) - y(i)) * X(i,2);
    grad(3) = grad(3) + (sigmoid(X(i,:) * theta) - y(i)) * X(i,3);
end
grad = grad / m;
J = J / m;
% Vectorized alternative:
% grad = X' * (sigmoid(X * theta) - y) / m;

% =============================================================
end
```
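A quick toy call of `costFunction` (the data values below are made up purely for illustration; the `sigmoid` helper from the same exercise is assumed to be on the path):

```matlab
% Toy check of costFunction; at theta = 0 every prediction is 0.5, so J = -log(0.5) ≈ 0.693.
X = [ones(3,1), [34 78; 30 43; 60 86]];   % 3 samples plus an intercept column (made-up values)
y = [0; 0; 1];
initial_theta = zeros(3, 1);              % the loop in costFunction assumes 3 parameters
[J, grad] = costFunction(initial_theta, X, y);
fprintf('Cost: %f\n', J);
disp(grad);
```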