Andrew Ng · Machine Learning || Chap 5 & 6: Octave Tutorial & Logistic Regression (Brief Notes)

These notes cover basic operations, moving data around, computing on data, and plotting in Octave, with particular attention to gradient descent and logistic regression. For logistic regression they discuss the classification problem, the sigmoid function, decision boundaries, and the cost function, give an example of a non-linear decision boundary, work through the simplified cost function and the gradient descent algorithm, and end with the one-vs-all strategy for multi-class classification.

5 Octave Tutorial

(Octave can also be studied using MATLAB; I have personally decided to use Python for my own implementations. 5-6 is worth watching no matter which language you use.)

5-1 Basic operations

5-2 Moving data around

5-3 Computing on data

5-4 Plotting data

5-5 for, while, if statements, and functions

5-6 Vectorization

Vectorization example
$$h_\theta(x) = \sum_{j=0}^{n} \theta_j x_j = \theta^T x$$
Matlab

%% Unvectorized implementation
prediction = 0.0;
for j = 1:n+1
	prediction = prediction + theta(j) * x(j);
end
%% Vectorized implementation
prediction = theta' * x;

C++

// Unvectorized implementation
double prediction = 0.0;
for (int j = 0; j <= n; j++)
    prediction += theta[j] * x[j];
// Vectorized implementation
double prediction = theta.transpose() * x;

Gradient descent
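The point of this part is that the simultaneous update of all $\theta_j$, $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} ( h_\theta(x^{(i)}) - y^{(i)} ) x_j^{(i)}$, can be written as a single vector operation $\theta := \theta - \alpha\,\delta$. A minimal Octave sketch, assuming linear regression with a design matrix X of size m x (n+1), a label vector y of size m x 1, and a learning rate alpha (variable names are my own):

Octave

% one vectorized gradient-descent step (linear regression)
delta = (1/m) * X' * (X*theta - y);   % all partial derivatives at once, (n+1) x 1
theta = theta - alpha * delta;        % simultaneous update of every theta_j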

6 Logistic Regression

6-1 Classification

Classification

$$y \in \{0, 1\} \qquad \begin{matrix} 0: \text{"Negative Class"} \\ 1: \text{"Positive Class"} \end{matrix}$$

$h_\theta(x)$ can be $> 1$ or $< 0$ (if linear regression is applied to a classification problem)

Logistic Regression: $0 \le h_\theta(x) \le 1$

6-2 Hypothesis Representation

Logistic Regression Model

Want $0 \le h_\theta(x) \le 1$

Sigmoid function/Logistic function

$$h_\theta(x) = g(\theta^T x), \qquad g(z) = \frac{1}{1 + e^{-z}}$$
$$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$$
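As a quick illustration, the sigmoid and the hypothesis are one-liners in Octave. A hedged sketch; X (an m x (n+1) matrix of examples) and theta are assumed variable names:

Octave

g = @(z) 1 ./ (1 + exp(-z));   % sigmoid, applied elementwise
h = g(X * theta);              % m x 1 vector of h_theta(x^(i)) for every example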
Interpretation of Hypothesis Output

$h_\theta(x)$ = estimated probability that $y = 1$ on input $x$

i.e., the probability that $y = 1$, given $x$, parameterized by $\theta$

6-3 Decision boundary

$$h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}} = P(y = 1 \mid x; \theta)$$

Predict "$y = 1$" if $\theta^T x \ge 0$ (i.e., $h_\theta(x) \ge 0.5$)

Predict "$y = 0$" if $\theta^T x < 0$ (i.e., $h_\theta(x) < 0.5$)
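In code, this prediction rule is just a comparison against 0 (or, equivalently, against 0.5 after the sigmoid). A sketch reusing the X and theta assumed above:

Octave

pred = (X * theta >= 0);        % logical vector: 1 where "y = 1" is predicted
% equivalently: pred = (1 ./ (1 + exp(-X * theta)) >= 0.5);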

Decision Boundary

Non-linear decision boundaries

$$h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_2^2)$$

For example, with $\theta = [-1, 0, 0, 1, 1]^T$ the model predicts "$y = 1$" whenever $x_1^2 + x_2^2 \ge 1$, so the decision boundary is the circle $x_1^2 + x_2^2 = 1$.

6-4 Cost function

Training set:

$$\{ (x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \cdots, (x^{(m)}, y^{(m)}) \}$$

$m$ examples

$$x \in \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix}, \quad x_0 = 1, \quad y \in \{0, 1\}$$

$$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$$

Cost function

Linear regression

$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{2} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 = \frac{1}{m} \sum_{i=1}^{m} \operatorname{Cost}\left( h_\theta(x^{(i)}), y^{(i)} \right)$$

With the sigmoid inside $h_\theta$, this squared-error cost is non-convex in $\theta$; we want a convex cost (non-convex $\longrightarrow$ convex) so that gradient descent is guaranteed to converge to the global minimum.

Logistic regression cost function

$$\operatorname{Cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}$$

Cost $= 0$ if $y = 1$ and $h_\theta(x) = 1$, but as $h_\theta(x) \to 0$, Cost $\to \infty$.

This captures the intuition that if $h_\theta(x) = 0$ (predicting $P(y = 1 \mid x; \theta) = 0$) but in fact $y = 1$, we penalize the learning algorithm with a very large cost.
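A tiny numeric check of that intuition (hedged sketch; h is a made-up vector of predicted probabilities for examples whose true label is $y = 1$):

Octave

h = [0.99; 0.5; 0.01];   % predicted P(y=1|x;theta) for three y = 1 examples
cost = -log(h)           % approx [0.01; 0.69; 4.61]: cost blows up as h_theta(x) -> 0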

6-5 Simplified cost function and gradient descent

Logistic regression cost function

$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \operatorname{Cost}\left( h_\theta(x^{(i)}), y^{(i)} \right)$$
$$\operatorname{Cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}$$

Note: $y = 0$ or $1$ always, so the two cases can be combined into a single expression:

$$J(\theta) = -\frac{1}{m} \left[ \sum_{i=1}^{m} y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$
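This combined form translates directly into vectorized Octave. A minimal sketch, assuming the design matrix X (m x (n+1)) and label vector y (m x 1 of 0s and 1s) used earlier:

Octave

h = 1 ./ (1 + exp(-X * theta));                        % m x 1 vector of h_theta(x^(i))
J = -(1/m) * (y' * log(h) + (1 - y)' * log(1 - h));    % scalar cost J(theta)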

To fit parameters $\theta$:

$$\min_\theta J(\theta)$$

To make a prediction given new $x$:

Output $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$

Gradient Descent

Repeat {

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta) = \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$

} (simultaneously update all $\theta_j$)

The algorithm looks identical to linear regression!

But $h_\theta(x)$ is different: now $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$ rather than $\theta^T x$.
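In vectorized Octave form the loop is identical in shape to the linear-regression one; only h changes. A hedged sketch (alpha and num_iters are assumed hyperparameters):

Octave

for iter = 1:num_iters
    h = 1 ./ (1 + exp(-X * theta));     % logistic hypothesis -- the only change
    grad = (1/m) * X' * (h - y);        % all partial derivatives of J(theta) at once
    theta = theta - alpha * grad;       % simultaneous update
end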

6-6 Advanced optimization

Optimization algorithm

Cost function $J(\theta)$. Want $\min_\theta J(\theta)$.

Given $\theta$, we have code that can compute $J(\theta)$ and $\frac{\partial}{\partial \theta_j} J(\theta)$ (for $j = 0, 1, \dots, n$).

Optimization algorithms

- Gradient descent
- Conjugate gradient
- BFGS
- L-BFGS

Advantages:

  • No need to manually pick $\alpha$
  • Often faster than gradient descent

Disadvantages:

  • More complex
Example: minimize $J(\theta) = (\theta_1 - 5)^2 + (\theta_2 - 5)^2$.

% costFunction.m
function [jVal, gradient] = costFunction(theta)
	jVal = (theta(1)-5)^2 + (theta(2)-5)^2;
	gradient = zeros(2,1);                % 2 x 1 gradient vector
	gradient(1) = 2*(theta(1)-5);
	gradient(2) = 2*(theta(2)-5);
end

% then, at the Octave command line:
options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(2,1);
[optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);

Note: $\theta_0$ is written as theta(1) in Octave, because Octave indexing starts at 1 (so $\theta_j$ becomes theta(j+1)).
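For this particular example, $J(\theta)$ is minimized at $\theta_1 = \theta_2 = 5$, so optTheta should come back as approximately [5; 5]; an exitFlag of 1 indicates that fminunc converged.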

6-7 Multi-class classification : One-vs-all

Multiclass classification

Example: Email foldering/tagging: Work, Friends, Family, Hobby ($y = 1, 2, 3, 4$)

$$h_\theta^{(i)}(x) = P(y = i \mid x; \theta) \qquad (i = 1, 2, 3, \dots)$$

One-vs-all
Train a logistic regression classifier $h_\theta^{(i)}(x)$ for each class $i$ to predict the probability that $y = i$.

On a new input $x$, to make a prediction, pick the class $i$ that maximizes

$$\max_i h_\theta^{(i)}(x)$$
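A schematic Octave sketch of one-vs-all; trainLogReg is a hypothetical helper that fits one binary logistic regression (e.g. via fminunc as in 6-6), K is the number of classes, and X, y are as before:

Octave

all_theta = zeros(K, n+1);                   % one parameter row per class
for i = 1:K
    yi = (y == i);                           % relabel: 1 for class i, 0 for all other classes
    all_theta(i, :) = trainLogReg(X, yi)';   % hypothetical helper returning theta for class i
end

probs = 1 ./ (1 + exp(-X * all_theta'));     % m x K matrix of h_theta^(i)(x)
[~, pred] = max(probs, [], 2);               % predicted class = index of the largest probability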
