[Machine Learning] Study Notes: Logistic Regression

weixin_30813225

Original link: http://www.cnblogs.com/messier/p/7798213.html

Model: Binary Classification

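Logistic regression, also called logit regression (对数几率回归, "log-odds regression"), is a classification method despite its name. Unlike the linear regression model covered earlier, its output \(y\) takes values in a discrete set, for example \(y \in \{0,1\}\) in a binary classification task.
For binary classification, the linear model produces a continuous real value \(z=\theta^T x\), so a function \(g(z)\) is needed to map \(z\) into the interval \((0,1)\), where it can be read as a probability and thresholded to obtain a 0/1 label.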

One suitable choice is the logistic function (also called the sigmoid function):
\[g(z)=\frac{1}{1+e^{-z}}\]
This fixes the form of the hypothesis \(h_\theta(x)\):
\[ \begin{align*}& h_\theta (x) = g ( \theta^T x ) \newline \newline& z = \theta^T x \newline& g(z) = \dfrac{1}{1 + e^{-z}}\end{align*} \]
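The hypothesis above can be sketched in a few lines of plain Python. This is only an illustration: the function names and the 0.5 decision threshold are assumptions, not part of the original notes, and the two-branch form of the sigmoid is an added safeguard against overflow for large negative \(z\).

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Logistic function g(z) = 1 / (1 + e^(-z)), evaluated without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z use the equivalent form e^z / (1 + e^z), so that
    # math.exp never receives a large positive argument.
    ez = math.exp(z)
    return ez / (1.0 + ez)


def hypothesis(theta: Sequence[float], x: Sequence[float]) -> float:
    """h_theta(x) = g(theta^T x)."""
    z = sum(t * v for t, v in zip(theta, x))
    return sigmoid(z)


def predict_label(theta: Sequence[float], x: Sequence[float],
                  threshold: float = 0.5) -> int:
    """Return 1 if h_theta(x) reaches the (assumed) threshold, else 0."""
    return 1 if hypothesis(theta, x) >= threshold else 0
```

Since \(g(0)=0.5\), thresholding at 0.5 is the same as checking the sign of \(\theta^T x\).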
Setting \(y=g(z)\) and solving for \(z\) gives:
\[ \ln \frac{y}{1-y}=\theta^T x \]
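The intermediate steps, using only the definition of \(g\), can be written out as follows:
\[ \begin{align*} y &= \frac{1}{1+e^{-z}} \\ 1-y &= \frac{e^{-z}}{1+e^{-z}} \\ \frac{y}{1-y} &= \frac{1}{e^{-z}} = e^{z} \\ \ln \frac{y}{1-y} &= z = \theta^T x \end{align*} \]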
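If \(y\) is interpreted as the probability that the sample is a positive example, then \(1-y\) is the probability that it is a negative example, and their ratio \(y/(1-y)\) is the odds.
The equation above can therefore be rewritten as: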
\[\ln \frac{p(y=1 | x ; \theta)}{p(y=0 | x ; \theta)}=\theta^T x\]
It follows that:
\[ p(y=1 | x ; \theta)=\frac{e^{\theta^T x}}{1+e^{\theta^T x}}=h_\theta (x) \\p(y=0 | x ; \theta)=\frac{1}{1+e^{\theta^T x}}=1-h_\theta (x) \]

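\(\theta\) can be estimated with the maximum likelihood method:
maximize the likelihood \(L(\theta)\), that is, make the probability that each sample takes its true label as large as possible (treating the \(m\) samples as independent):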
\[ \begin{equation*} \begin{split} L(\boldsymbol{\theta}) & =p(\mathbf{y}|\mathbf{X}; \boldsymbol{\theta}) \\ & =\prod_{i=1}^{m}p(y_{i}|\mathbf{x}_{i}; \boldsymbol{\theta}) \\ & =\prod_{i=1}^{m} (h_{\boldsymbol{\theta}}(\mathbf{x}_{i}))^{y_{i}} (1-h_{\boldsymbol{\theta}}(\mathbf{x}_{i}))^{1-y_{i}} \end{split} \end{equation*} \]

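To make differentiation easier, take the logarithm of both sides. The product becomes a sum, and the resulting log-likelihood \(l(\theta)\) is a concave function of \(\theta\) (so its negative is convex):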
\[ \begin{equation*} \begin{split} l(\boldsymbol{\theta}) & =\log L(\boldsymbol{\theta}) \\ & = \sum_{i=1}^{m} \left[ y_{i} \log h_{\boldsymbol{\theta}}(\mathbf{x}_{i})+(1-y_{i})\log(1-h_{\boldsymbol{\theta}}(\mathbf{x}_{i})) \right] \end{split} \end{equation*} \]
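Maximizing \(l(\theta)\) is equivalent to minimizing the cost function \(J(\theta) = -\frac{1}{m} l(\theta)\), written here with the sample index as a superscript: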
\[J(\theta) = - \frac{1}{m} \displaystyle \sum_{i=1}^m [y^{(i)}\log (h_\theta (x^{(i)})) + (1 - y^{(i)})\log (1 - h_\theta(x^{(i)}))]\]
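As a rough illustration, the cost function can be evaluated with a plain Python loop. The names `cost` and `eps` are assumptions made for this sketch, and clamping \(h\) away from 0 and 1 is an added numerical safeguard that is not part of the formula itself.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def cost(theta: Sequence[float], X: Sequence[Sequence[float]],
         y: Sequence[int], eps: float = 1e-12) -> float:
    """Average negative log-likelihood J(theta) over the m samples."""
    m = len(y)
    total = 0.0
    for x_i, y_i in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, x_i)))
        # Keep h strictly inside (0, 1) so that log() stays finite
        # (a safeguard added for this sketch).
        h = min(max(h, eps), 1.0 - eps)
        total += y_i * math.log(h) + (1 - y_i) * math.log(1.0 - h)
    return -total / m
```

At \(\theta = 0\) every prediction is 0.5, so the cost equals \(\ln 2 \approx 0.693\) for any data set, which makes a convenient sanity check.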

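Gradient descent can then be used to find the minimum of \(J(\theta)\), updating all \(\theta_j\) simultaneously with learning rate \(\alpha\):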
\[\begin{align*}& Repeat \; \lbrace \newline & \; \theta_j := \theta_j - \alpha \dfrac{\partial}{\partial \theta_j}J(\theta) \newline & \rbrace\end{align*}\]
Taking the partial derivative of \(J(\theta)\) with respect to \(\theta_j\), and using \(g'(z)=g(z)(1-g(z))\):
\[\begin{equation*} \begin{split} \frac{\partial }{\partial \theta_{j}}J(\boldsymbol{\theta}) & = -\frac{1}{m}\sum_{i=1}^{m}\left( \frac{y^{(i)}}{g(\boldsymbol{\theta}^{T}\mathbf{x}^{(i)})}-\frac{1-y^{(i)}}{1-g(\boldsymbol{\theta}^{T}\mathbf{x}^{(i)})} \right) \frac{\partial }{\partial \theta_{j}} g(\boldsymbol{\theta}^{T}\mathbf{x}^{(i)}) \\ & =-\frac{1}{m}\sum_{i=1}^{m}\left( \frac{y^{(i)}}{g(\boldsymbol{\theta}^{T}\mathbf{x}^{(i)})}-\frac{1-y^{(i)}}{1-g(\boldsymbol{\theta}^{T}\mathbf{x}^{(i)})} \right) g(\boldsymbol{\theta}^{T}\mathbf{x}^{(i)}) (1-g(\boldsymbol{\theta}^{T}\mathbf{x}^{(i)})) \frac{\partial }{\partial \theta_{j}} \boldsymbol{\theta}^{T}\mathbf{x}^{(i)} \\ & =-\frac{1}{m}\sum_{i=1}^{m}\left( y^{(i)}(1-g(\boldsymbol{\theta}^{T}\mathbf{x}^{(i)})) -(1-y^{(i)}) g(\boldsymbol{\theta}^{T}\mathbf{x}^{(i)}) \right)x_{j}^{(i)} \\ & =-\frac{1}{m}\sum_{i=1}^{m}(y^{(i)}-g(\boldsymbol{\theta}^{T}\mathbf{x}^{(i)}))x_{j}^{(i)} \\ & =\frac{1}{m}\sum_{i=1}^{m}(h_{\boldsymbol{\theta}}(\mathbf{x}^{(i)})-y^{(i)})x_{j}^{(i)} \end{split} \end{equation*}\]
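The second line of the derivation relies on the derivative of the sigmoid function, which follows directly from its definition:
\[ \begin{align*} g'(z) &= \frac{d}{dz}\left(1+e^{-z}\right)^{-1} \\ &= \frac{e^{-z}}{\left(1+e^{-z}\right)^{2}} \\ &= \frac{1}{1+e^{-z}} \cdot \frac{e^{-z}}{1+e^{-z}} \\ &= g(z)\,(1-g(z)) \end{align*} \]
By the chain rule, \(\frac{\partial}{\partial \theta_j} g(\theta^T x) = g(\theta^T x)(1-g(\theta^T x))\,x_j\).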
Substituting this into the update rule gives:
\[\begin{align*} & Repeat \; \lbrace \newline & \; \theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)} \newline & \rbrace \end{align*}\]
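The update rule can be sketched as batch gradient descent in plain Python. The learning rate, the iteration count, the zero initialization, and the toy data below are all illustrative assumptions rather than values from the notes.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(u * v for u, v in zip(a, b))


def gradient(theta: Sequence[float], X: Sequence[Sequence[float]],
             y: Sequence[int]) -> List[float]:
    """grad_j = (1/m) * sum_i (h_theta(x_i) - y_i) * x_ij."""
    m = len(y)
    grad = [0.0] * len(theta)
    for x_i, y_i in zip(X, y):
        err = sigmoid(dot(theta, x_i)) - y_i
        for j, x_ij in enumerate(x_i):
            grad[j] += err * x_ij / m
    return grad


def gradient_descent(X: Sequence[Sequence[float]], y: Sequence[int],
                     alpha: float = 0.1, iterations: int = 1000) -> List[float]:
    """Run a fixed number of simultaneous updates starting from theta = 0."""
    theta = [0.0] * len(X[0])
    for _ in range(iterations):
        grad = gradient(theta, X, y)
        # Build the new theta from the old one so all theta_j update together.
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta


# Hypothetical toy data: the first column is the intercept term x_0 = 1.
X = [[1.0, -1.5], [1.0, -0.5], [1.0, 0.5], [1.0, 1.5]]
y = [0, 0, 1, 1]
theta = gradient_descent(X, y)
print(theta)
```

Computing the whole gradient before touching `theta` is what makes the update simultaneous; updating `theta[j]` in place inside the loop would mix old and new parameter values.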

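Week 3 of the course introduces MATLAB's unconstrained optimization function fminunc. Unlike the hand-written gradient descent above, it selects its own search direction and step size, so no learning rate \(\alpha\) has to be chosen.
It only requires a cost function of the following form, returning both the cost and its gradient: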

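function [J, grad] = costFunction(theta, X, y)
% Cost J and gradient grad of logistic regression.
% X: m-by-n design matrix, y: m-by-1 labels in {0,1}, theta: n-by-1.
% sigmoid is assumed to be defined elsewhere, e.g. g = 1 ./ (1 + exp(-z)).
m = size(X, 1);                 % number of training examples
hx = sigmoid(X * theta);        % m-by-1 vector of h_theta(x^(i))
J = -1/m * (y' * log(hx) + (1 - y)' * log(1 - hx));
grad = 1/m * (X' * (hx - y));   % n-by-1 gradient, same size as theta
end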

fminunc can then be called with this function to compute \(\theta\) and J:

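% Assumed starting point: theta = 0, one entry per column of X
initial_theta = zeros(size(X, 2), 1);

% 'GradObj' 'on': costFunction also returns the gradient;
% 'MaxIter' 400: stop after at most 400 iterations
options = optimset('GradObj', 'on', 'MaxIter', 400);

%  Run fminunc to obtain the optimal theta
%  This function will return theta and the cost
[theta, cost] = ...
    fminunc(@(t)(costFunction(t, X, y)), initial_theta, options);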
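With 'GradObj' set to 'on', fminunc uses the gradient that the cost function returns, so a mistake in the gradient code can quietly derail the optimization. One common safeguard is to compare the analytic gradient against a central finite-difference estimate. The sketch below does this in plain Python rather than MATLAB; the function names, the toy data, and the step size are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))  # adequate for the moderate z used here


def cost_and_grad(theta: Sequence[float], X: Sequence[Sequence[float]],
                  y: Sequence[int]) -> Tuple[float, List[float]]:
    """Return (J, grad), mirroring the two outputs of the MATLAB costFunction."""
    m = len(y)
    J = 0.0
    grad = [0.0] * len(theta)
    for x_i, y_i in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, x_i)))
        J -= (y_i * math.log(h) + (1 - y_i) * math.log(1.0 - h)) / m
        for j, v in enumerate(x_i):
            grad[j] += (h - y_i) * v / m
    return J, grad


def numerical_grad(theta: Sequence[float], X: Sequence[Sequence[float]],
                   y: Sequence[int], step: float = 1e-5) -> List[float]:
    """Central-difference estimate of dJ/dtheta_j for every j."""
    estimate = []
    for j in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[j] += step
        minus[j] -= step
        j_plus, _ = cost_and_grad(plus, X, y)
        j_minus, _ = cost_and_grad(minus, X, y)
        estimate.append((j_plus - j_minus) / (2 * step))
    return estimate


# Hypothetical toy data: the first column is the intercept term.
X = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0]]
y = [1, 0, 1]
theta = [0.3, -0.2]

_, analytic = cost_and_grad(theta, X, y)
numeric = numerical_grad(theta, X, y)
max_diff = max(abs(a - n) for a, n in zip(analytic, numeric))
print(max_diff)  # a value near zero indicates the two gradients agree
```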

This blog post describes the usage in detail; bookmarking it for now.

Multiclass Classification

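The basic approach is to decompose the multiclass task into several binary classification tasks.
The three classic decomposition strategies are one-vs-one (OvO), one-vs-rest (OvR), and many-vs-many (MvM).
OvR is described here: for N classes, train N classifiers, each treating exactly one class as positive and all the others as negative. The prediction is the class whose classifier reports the highest confidence, as follows: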
\[\begin{align*}& y \in \lbrace 0, 1, \dots, n \rbrace \newline& h_\theta^{(0)}(x) = P(y = 0 | x ; \theta) \newline& h_\theta^{(1)}(x) = P(y = 1 | x ; \theta) \newline& \cdots \newline& h_\theta^{(n)}(x) = P(y = n | x ; \theta) \newline& \mathrm{prediction} = \arg\max_i\, h_\theta ^{(i)}(x) \newline\end{align*}\]
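A minimal sketch of the OvR idea in plain Python, assuming one parameter vector per class has already been trained. The helper names `binary_labels` and `predict_one_vs_rest` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def binary_labels(y: Sequence[int], positive_class: int) -> List[int]:
    """Relabel a multiclass target for one OvR classifier:
    1 for the chosen class, 0 for every other class."""
    return [1 if label == positive_class else 0 for label in y]


def predict_one_vs_rest(thetas: Sequence[Sequence[float]],
                        x: Sequence[float]) -> Tuple[int, float]:
    """Score x with every per-class classifier and return
    (index of the most confident classifier, its score)."""
    scores = [sigmoid(sum(t * v for t, v in zip(theta, x))) for theta in thetas]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, scores[best]
```

Each classifier would be trained on `binary_labels(y, k)` for its own class `k`. The per-class scores need not sum to 1, since the classifiers are fitted independently; only their ranking is used for the prediction.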

Reposted from: https://www.cnblogs.com/messier/p/7798213.html
