Basic Concepts
Logistic regression is a probabilistic, nonlinear regression model. Although "regression" appears in its name, it is actually a classification method, typically used to study whether a certain outcome occurs under given conditions. For example: given measurements of a patient's tumor, predict whether the tumor is benign or malignant.
Logistic Regression vs. Linear Regression
Like linear regression, logistic regression needs a hypothesis function $h_\theta(x)$ and a cost function $J(\theta)$; the two models are largely the same in overall structure.
In linear regression:

$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \dots + \theta_n x_n$$
In logistic regression we additionally introduce a Sigmoid function, which maps the result into the interval $(0, 1)$.
Sigmoid function:

$$\pi(x) = \frac{1}{1 + e^{-x}}$$
The hypothesis function of logistic regression is therefore

$$h_\theta(x) = \pi(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$$
$\pi(x)$ has domain $(-\infty, +\infty)$ and range $(0, 1)$, so the final result is mapped into $(0, 1)$.
$$P(y = 1 \mid x; \theta) = \pi(\theta^T x), \qquad P(y = 0 \mid x; \theta) = 1 - \pi(\theta^T x)$$
When $h_\theta(x) \ge 0.5$, predict $y = 1$;
when $h_\theta(x) < 0.5$, predict $y = 0$.
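As a quick numerical illustration of this decision rule (the theta and x values below are made up for the example, not taken from any dataset):

% Hypothetical numbers, for illustration only.
theta = [-3; 1; 1];                 % assumed parameters
x = [1; 2; 2];                      % one example, with intercept term x0 = 1
hx = 1 / (1 + exp(-theta' * x));    % theta'*x = 1, so hx = sigmoid(1) ~ 0.73
if hx >= 0.5
    y_pred = 1;                     % 0.73 >= 0.5, so predict y = 1
else
    y_pred = 0;
end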
Cost Function J(θ)
We use maximum likelihood estimation to solve for the parameter $\theta$. Let $p_i = P(y_i = 1 \mid x_i; \theta)$ denote the probability that $y_i = 1$ under the given conditions; then $P(y_i = 0 \mid x_i; \theta) = 1 - p_i$. The probability of a single observation can therefore be written as

$$P(y_i) = p_i^{y_i} (1 - p_i)^{1 - y_i}$$

(for example, if $p_i = 0.8$, this gives $0.8$ when $y_i = 1$ and $0.2$ when $y_i = 0$). Since the samples are mutually independent, the likelihood function is:

$$L(\theta) = \prod_{i=1}^{m} [h_\theta(x_i)]^{y_i} [1 - h_\theta(x_i)]^{1 - y_i}$$
The goal is to find the parameter $\theta$ that maximizes this function. The derivation is as follows:
$$\ln L(\theta) = \sum_{i=1}^{m} \left[ y^{(i)} \ln h_\theta(x^{(i)}) + (1 - y^{(i)}) \ln(1 - h_\theta(x^{(i)})) \right]$$
$$= \sum_{i=1}^{m} \left[ y^{(i)} \ln\frac{1}{1 + e^{-\theta^T x^{(i)}}} + (1 - y^{(i)}) \ln\left(1 - \frac{1}{1 + e^{-\theta^T x^{(i)}}}\right) \right]$$
$$= \sum_{i=1}^{m} \left[ y^{(i)} \ln\frac{e^{\theta^T x^{(i)}}}{e^{\theta^T x^{(i)}} + 1} + (1 - y^{(i)}) \ln\frac{1}{e^{\theta^T x^{(i)}} + 1} \right]$$
$$= \sum_{i=1}^{m} \left[ y^{(i)} \theta^T x^{(i)} - y^{(i)} \ln(1 + e^{\theta^T x^{(i)}}) - (1 - y^{(i)}) \ln(1 + e^{\theta^T x^{(i)}}) \right]$$
$$= \sum_{i=1}^{m} \left[ y^{(i)} \theta^T x^{(i)} - \ln(1 + e^{\theta^T x^{(i)}}) \right]$$
We want the $\theta$ that maximizes $L(\theta)$, so the cost function can be written as:

$$J(\theta) = -\frac{1}{m} \ln L(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \theta^T x^{(i)} - \ln(1 + e^{\theta^T x^{(i)}}) \right]$$
To find $\theta$ we can use gradient descent.
Taking the partial derivative of $J(\theta)$:

$$\frac{\partial J(\theta)}{\partial \theta_j} = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} - h_\theta(x^{(i)}) \right] x_j^{(i)} = \frac{1}{m} \sum_{i=1}^{m} \left[ h_\theta(x^{(i)}) - y^{(i)} \right] x_j^{(i)}$$
The full gradient descent update is therefore:

$$\theta_j := \theta_j - \alpha \frac{\partial J(\theta)}{\partial \theta_j} = \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left[ h_\theta(x^{(i)}) - y^{(i)} \right] x_j^{(i)}$$
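Spelled out in Matlab, one pass of batch gradient descent might look like the sketch below; alpha and num_iters are assumed hyperparameters, not values from the text, and X is assumed to carry a leading column of ones:

% Minimal batch gradient descent sketch (hyperparameters are assumptions).
alpha = 0.01;                            % assumed learning rate
num_iters = 1000;                        % assumed iteration count
m = length(y);
theta = zeros(size(X, 2), 1);
for iter = 1:num_iters
    hx = 1 ./ (1 + exp(-X * theta));     % h_theta(x) for all m examples at once
    theta = theta - (alpha / m) * (X' * (hx - y));   % simultaneous update of every theta_j
end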
Matlab code
costFunction
function [J, grad] = costFunction(theta, X, y)
%COSTFUNCTION Compute cost and gradient for logistic regression
% J = COSTFUNCTION(theta, X, y) computes the cost of using theta as the
% parameter for logistic regression and the gradient of the cost
% w.r.t. to the parameters.
% Initialize some useful values
m = length(y); % number of training examples
% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));
% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
% You should set J to the cost.
% Compute the partial derivatives and set grad to the partial
% derivatives of the cost w.r.t. each parameter in theta
%
% Note: grad should have the same dimensions as theta
%
z = X * theta;                 % linear scores, m x 1
hx = 1 ./ (1 + exp(-z));       % h_theta(x) = sigmoid(z), m x 1
% Vectorized cost: each y'*log(hx) product is already a scalar sum over examples.
J = -(1/m) * (y' * log(hx) + (1 - y)' * log(1 - hx));
% Vectorized gradient: grad(j) = (1/m) * sum((hx - y) .* X(:, j)) for every j.
grad = (1/m) * (X' * (hx - y));
% =============================================================
end
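In the original exercise this cost function is normally handed to an optimizer rather than looped by hand; a hedged usage sketch with fminunc (the MaxIter value is an assumption):

% Assumed usage: minimize costFunction with fminunc.
% 'GradObj','on' tells fminunc that costFunction also returns the gradient.
initial_theta = zeros(size(X, 2), 1);
options = optimset('GradObj', 'on', 'MaxIter', 400);   % 400 is an assumed value
[theta, cost] = fminunc(@(t) costFunction(t, X, y), initial_theta, options);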
Sigmoid
function g = sigmoid(z)
%SIGMOID Compute sigmoid function
%   g = SIGMOID(z) computes the sigmoid of z.
% You need to return the following variables correctly
g = zeros(size(z));
% ====================== YOUR CODE HERE ======================
% Instructions: Compute the sigmoid of each value of z (z can be a matrix,
%               vector or scalar).
% The element-wise operators ./ and exp already handle scalars, vectors and
% matrices alike, so no explicit loops are needed.
g = 1 ./ (1 + exp(-z));
% =============================================================
end
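A quick sanity check of the implementation:

sigmoid(0)              % ans = 0.5
sigmoid([-5 0 5])       % ans ~ [0.0067  0.5000  0.9933]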
predict
function p = predict(theta, X)
%PREDICT Predict whether the label is 0 or 1 using learned logistic
%regression parameters theta
% p = PREDICT(theta, X) computes the predictions for X using a
% threshold at 0.5 (i.e., if sigmoid(theta'*x) >= 0.5, predict 1)
m = size(X, 1); % Number of training examples
% You need to return the following variables correctly
p = zeros(m, 1);
% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
% your learned logistic regression parameters.
% You should set p to a vector of 0's and 1's
%
pp = sigmoid(X * theta);      % predicted probabilities, m x 1
p(pp >= 0.5) = 1;             % apply the 0.5 threshold; entries below it stay 0
% =========================================================================
end
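Once theta has been learned, a common follow-up (not part of the function above) is to report training-set accuracy, assuming y holds the true labels:

p = predict(theta, X);
fprintf('Train Accuracy: %f\n', mean(double(p == y)) * 100);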