Neural Networks and Deep Learning, Week 2: Logistic Regression

1. Binary Classification

1.1 Logistic Regression

Suppose you are given a picture and want to know whether it is a cat picture. If yes, the label is y = 1; otherwise y = 0.

So if your input image is 64 pixels by 64 pixels, then you would have three 64 by 64 matrices corresponding to the red, green and blue pixel intensity values of the image. To turn these pixel intensity values into a feature vector, unroll all of them into a single input feature vector x. Its length is 64 * 64 * 3 = 12288, so we use nx = 12288 to denote the dimension of the input features x. In general, the dimension of x equals the number of features.
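
The unrolling step can be sketched in NumPy as below. This is only an illustration: the random array `image` is a hypothetical stand-in for a real 64 by 64 RGB picture, and the exact order in which pixels are unrolled is an assumption (any order works as long as it is the same for every image).

```python
import numpy as np

# Hypothetical stand-in for a 64x64 RGB image: random integers in [0, 255].
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64, 3))

# Unroll every pixel intensity into one column vector of shape (nx, 1).
x = image.reshape(-1, 1)
nx = x.shape[0]

print(nx)       # 64 * 64 * 3
print(x.shape)
```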

A single training example is a pair (x, y), where x is in R^(nx) and y is in {0, 1}.

An entire training set consists of:

1: a set of m examples {(x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(m), y^(m))}.

m: the number of samples. m_train: the number of training samples; m_test: the number of test samples.

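Y = [y^(1), y^(2), ..., y^(m)] belongs to a 1*m space. (This differs from what I learned before: isn't the data usually arranged as an m*n matrix? In this course each example is a column, so X has shape nx*m.)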
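
A minimal sketch of this column-wise layout, assuming tiny hypothetical sizes (nx = 4, m = 3) and made-up labels purely for illustration:

```python
import numpy as np

nx, m = 4, 3  # hypothetical sizes, chosen small so the shapes are easy to read
rng = np.random.default_rng(0)

# Each training example is a column vector of shape (nx, 1).
examples = [rng.standard_normal((nx, 1)) for _ in range(m)]
labels = [1, 0, 1]  # made-up labels

# Stack the examples side by side: one column per example.
X = np.hstack(examples)             # shape (nx, m)
Y = np.array(labels).reshape(1, m)  # shape (1, m)

print(X.shape, Y.shape)
```

With this layout, `X[:, i]` is the i-th example and `Y[0, i]` is its label.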

So, let's briefly go through logistic regression:

Given an input x^(i) (a vector in R^(nx)), we want a label y^(i) for it, which gives a pair (x^(i), y^(i)).

But suppose you want the output y-hat to be a probability: the probability that the result is 1. For example, you want to know the probability that tomorrow will be sunny (not rainy). How do you do that?


We know that w.T.dot(x) ranges over (-∞, +∞). The problem becomes how to squeeze the result into the (0, 1) range. Notice the sigmoid function from the picture above: sig(z) = 1/(1 + e^(-z)) does exactly this. Take z = w.T.dot(x) + b, where b is the bias term. When z goes to +∞ (a large positive number), e^(-z) -> 0 and sig -> 1; when z goes to -∞ (a large negative number), e^(-z) -> +∞ and sig -> 0; when z is 0, sig = 0.5. So we can predict y = 1 when sig > 0.5 and y = 0 when sig < 0.5.
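
The sigmoid and the 0.5 threshold can be sketched as follows. The helper names `sigmoid` and `predict` are illustrative, and treating an output of exactly 0.5 as class 0 is an assumption, since the text above leaves that case open.

```python
import numpy as np

def sigmoid(z):
    """Map any real number (or array) into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(w, b, x):
    """Return 1 where sigmoid(w.T x + b) > 0.5, else 0 (ties go to 0 by assumption)."""
    y_hat = sigmoid(np.dot(w.T, x) + b)
    return (y_hat > 0.5).astype(int)

print(sigmoid(0.0))                  # 0.5
print(sigmoid(20.0), sigmoid(-20.0))

w = np.array([[1.0], [-1.0]])  # hypothetical weights
x = np.array([[2.0], [0.5]])   # hypothetical input, so z = 1.5
print(predict(w, 0.0, x))
```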

Cost function
Loss function definition:

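 One thing you could do is define a loss L(y-hat, y) that measures the error when your algorithm outputs y-hat and the true label is y. (In other words, look at the gap between the predicted value and the true value, which should be as small as possible.) For logistic regression the loss is:

   L(y-hat, y) = -(y log(y-hat) + (1 - y) log(1 - y-hat))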

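Key points from the above:

  1. Given a training set {(x^(1), y^(1)), ..., (x^(m), y^(m))}, you want each prediction y-hat^(i) to be close to y^(i). (Only then does the prediction count as accurate.)
  2. Find a loss function that measures the difference between y-hat and the true y.
  3. When y = 1, y-hat should be predicted as 1: the cross-entropy loss reduces to -log(y-hat), which should be as small as possible, so log(y-hat) should be as large as possible, so y-hat should be as large as possible, i.e. y-hat should be close to 1.
  4. The loss function applies to a single training example; the cost function J(w, b) is the average of the loss over all m samples.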
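
A small sketch of the loss and cost, assuming the cross-entropy loss L(y-hat, y) = -(y log(y-hat) + (1 - y) log(1 - y-hat)) and a cost equal to the mean loss over the samples. The function names and the sample predictions are illustrative.

```python
import numpy as np

def loss(y_hat, y):
    """Cross-entropy loss for one example (also works elementwise on arrays)."""
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def cost(Y_hat, Y):
    """Cost J: the average loss over all m samples."""
    return np.mean(loss(Y_hat, Y))

# With y = 1, a confident correct prediction is cheap and a wrong one is expensive.
print(loss(0.9, 1))
print(loss(0.1, 1))

Y_hat = np.array([[0.9, 0.1]])  # hypothetical predictions for two samples
Y = np.array([[1, 0]])
print(cost(Y_hat, Y))
```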

Gradient Descent

  1. When w and b change, the cost function changes. So the question becomes finding w and b that make the cost function as small as possible.
  2. α is the learning rate; it controls how big a step we take on each iteration of gradient descent.
  3. In Python code, the variable dw denotes the derivative dJ(w)/dw.

So, the gradient descent updates for w and b are:

  w := w - α * dJ(w,b)/dw
  b := b - α * dJ(w,b)/db
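
To see the update rule in action, here is a minimal sketch on a toy one-parameter cost J(w) = (w - 3)^2. That cost is a hypothetical example chosen because its minimum (w = 3) is known; it is not the logistic regression cost. The function name and default values are also illustrative.

```python
def gradient_descent(grad, w0, alpha=0.1, num_iters=100):
    """Repeat w := w - alpha * dw, where dw = grad(w) is dJ/dw at the current w."""
    w = w0
    for _ in range(num_iters):
        dw = grad(w)
        w = w - alpha * dw
    return w

# Toy cost J(w) = (w - 3)^2 has derivative dJ/dw = 2 * (w - 3).
w_min = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
print(w_min)
```

A larger α takes bigger steps and can overshoot; a smaller α converges more slowly.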


Computation Graph

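When you want to compute a function J(a,b,c) = 3(a+bc), you can first define the intermediate variables u = bc and v = a + u, so that J = 3v. From these you can draw a computation graph: the blue lines show forward propagation and the red lines show backward propagation. Backward propagation then gives the derivative of J with respect to every input and intermediate node. Why compute these derivatives? A derivative tells you how much the function J changes when one of its factors changes. The calculation is illustrated with a worked example below.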
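
The forward and backward passes for J(a,b,c) = 3(a+bc) can be traced by hand in a few lines. The input values a = 5, b = 3, c = 2 and the intermediate names u and v are chosen here for illustration.

```python
# Forward propagation: compute J step by step, left to right.
a, b, c = 5.0, 3.0, 2.0
u = b * c   # u = bc
v = a + u   # v = a + u
J = 3 * v   # J = 3v

# Backward propagation: apply the chain rule, right to left.
dJ_dv = 3.0          # J = 3v
dJ_da = dJ_dv * 1.0  # v = a + u, so dv/da = 1
dJ_du = dJ_dv * 1.0  # v = a + u, so dv/du = 1
dJ_db = dJ_du * c    # u = bc, so du/db = c
dJ_dc = dJ_du * b    # u = bc, so du/dc = b

print(J, dJ_da, dJ_db, dJ_dc)  # 33.0 3.0 6.0 9.0
```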



Let's look at logistic regression as an example.

So, first, we list all the formulas needed to calculate the loss function of logistic regression:

  z = w1*x1 + w2*x2 + b
  a = y-hat = sigmoid(z)
  L(a, y) = -(y log(a) + (1 - y) log(1 - a))

Then we can draw the computation graph. w1, w2 and b are the parameters we need to tune; the quality of the model depends on them.

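So, in order to calculate the derivatives of L(a, y) with respect to w1, w2 and b, namely dw1 = dL/dw1, dw2 = dL/dw2 and db = dL/db, we can use backward propagation.

  1. calculate da = dL/da = -y/a + (1 - y)/(1 - a).
  2. calculate da/dz = a(1 - a), which is the derivative of the sigmoid function. So, dz = dL/dz = da * a(1 - a) = a - y.
  3. calculate dw1 = x1 * dz, dw2 = x2 * dz and db = dz.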

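So, this is how the computation graph lets us calculate the derivatives used to optimize the parameters. (The chain rule for composite derivatives is explained very well here; the formulas are understandable.)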
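
The backward pass for a single example can be sketched and sanity-checked against a finite-difference estimate. This assumes z = w1*x1 + w2*x2 + b, a = sigmoid(z) and the cross-entropy loss; the parameter and input values are made up for illustration.

```python
import numpy as np

def forward(w1, w2, b, x1, x2, y):
    """Forward pass: z, then a = sigmoid(z), then the loss L(a, y)."""
    z = w1 * x1 + w2 * x2 + b
    a = 1.0 / (1.0 + np.exp(-z))
    L = -(y * np.log(a) + (1 - y) * np.log(1 - a))
    return a, L

def backward(a, x1, x2, y):
    """Backward pass via the chain rule; returns dL/dw1, dL/dw2, dL/db."""
    da = -y / a + (1 - y) / (1 - a)
    dz = da * a * (1 - a)  # simplifies algebraically to a - y
    return x1 * dz, x2 * dz, dz

w1, w2, b = 0.5, -0.3, 0.1  # hypothetical parameters
x1, x2, y = 1.0, 2.0, 1.0   # hypothetical training example

a, L = forward(w1, w2, b, x1, x2, y)
dw1, dw2, db = backward(a, x1, x2, y)

# Central finite difference as an independent check of dw1.
eps = 1e-6
_, L_plus = forward(w1 + eps, w2, b, x1, x2, y)
_, L_minus = forward(w1 - eps, w2, b, x1, x2, y)
dw1_numeric = (L_plus - L_minus) / (2 * eps)

print(dw1, dw1_numeric)
```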

Gradient Descent on m Samples

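With m samples indexed i = 1, 2, 3, ..., m, we use (x^(i), y^(i)) to denote sample i and dw1^(i), dw2^(i), db^(i) to denote the derivatives computed on sample i. For the whole training set, the gradients are the averages over all samples: dw1 = (1/m) * sum_i dw1^(i), and likewise for dw2 and db. (That is, average each parameter's derivative over the samples.)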



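A direct implementation needs 2 for loops (one over the m samples, one over the features); for loops hurt performance. What can we do? Vectorization.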
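
As a rough sketch of the difference, the two functions below compute the same averaged gradients, one with explicit loops and one with matrix operations. It assumes X has shape (nx, m), Y has shape (1, m), and dz = a - y for each sample; the function names and the random toy data are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradients_loop(w, b, X, Y):
    """Averaged gradients using one loop over samples and one over features."""
    nx, m = X.shape
    dw = np.zeros((nx, 1))
    db = 0.0
    for i in range(m):
        z = b
        for j in range(nx):
            z += w[j, 0] * X[j, i]
        dz = sigmoid(z) - Y[0, i]
        for j in range(nx):
            dw[j, 0] += X[j, i] * dz
        db += dz
    return dw / m, db / m

def gradients_vectorized(w, b, X, Y):
    """The same gradients with no explicit Python loops."""
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)  # shape (1, m)
    dZ = A - Y                       # shape (1, m)
    dw = np.dot(X, dZ.T) / m         # shape (nx, 1)
    db = np.sum(dZ) / m
    return dw, db

# Hypothetical toy data: nx = 3 features, m = 5 samples.
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5))
Y = rng.integers(0, 2, size=(1, 5))
w = rng.standard_normal((3, 1))
b = 0.1

dw_loop, db_loop = gradients_loop(w, b, X, Y)
dw_vec, db_vec = gradients_vectorized(w, b, X, Y)
print(np.allclose(dw_loop, dw_vec), np.isclose(db_loop, db_vec))
```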

Broadcasting in Python
import numpy as np
A = np.array([[56.0, 0.0, 4.4, 68.0],
              [1.2, 104.0, 52.0, 8.0],
              [1.8, 135.0, 99.0, 0.9]])
print(A)
[[ 56.    0.    4.4  68. ]
 [  1.2 104.   52.    8. ]
 [  1.8 135.   99.    0.9]]

cal = A.sum(axis=0)  # sum vertically, i.e. down each column

# Printed horizontally because it is a 1-D array; its shape is (4,), not (1, 4)
array([ 59. , 239. , 155.4,  76.9])

# This one is a horizontal 2-D array: each inner array is one row
cal = cal.reshape(1,4)
array([[ 59. , 239. , 155.4,  76.9]])

cal_verti = cal.reshape(4,1)
array([[ 59. ],
       [239. ],
       [155.4],
       [ 76.9]])

percentage = 100 * A/cal

array([[94.91525424,  0.        ,  2.83140283, 88.42652796],
       [ 2.03389831, 43.51464435, 33.46203346, 10.40312094],
       [ 3.05084746, 56.48535565, 63.70656371,  1.17035111]])

B = np.array([1,2,3,4])
B + 100

array([101, 102, 103, 104])

c = np.array([[1,2,3],[4,5,6]])
c1 = np.array([100,200,300])
c2 = np.array([[100],[200]])
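print(c + c1)  # c1 has shape (3,): it is broadcast across both rows
print(c + c2)  # c2 has shape (2, 1): it is broadcast across all three columns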

[[101 202 303]
 [104 205 306]]
[[101 102 103]
 [204 205 206]]


a = np.random.randn(5)
# This is a vector of shape (5,). A rank-1 array has no second axis: it's neither a row vector nor a column vector.

print(a.T)  # prints the same as a
[ 0.61129989 -0.48008827  1.39754925 -0.90183129  0.13849732]

b = np.random.randn(5,1)
print(b)
[[ 1.46909732]
 [-0.70696235]
 [ 0.50947828]
 [-0.48711335]
 [-1.61225188]]

print(b.T)
[[ 1.46909732 -0.70696235  0.50947828 -0.48711335 -1.61225188]]

# So avoid rank-1 vectors.
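
One way to see why rank-1 arrays can be confusing is to compare what `np.dot` returns for each kind of vector. This is a small sketch; the variable names `inner`, `outer` and `a_col` are illustrative.

```python
import numpy as np

a = np.random.randn(5)     # rank-1 array, shape (5,)
b = np.random.randn(5, 1)  # column vector, shape (5, 1)

# Transposing a rank-1 array changes nothing, so this is an inner product (a scalar).
inner = np.dot(a, a.T)

# With an explicit column vector, b times b.T is an outer product, shape (5, 5).
outer = np.dot(b, b.T)

# A rank-1 array can be turned into a proper column vector with reshape.
a_col = a.reshape(5, 1)

print(a.T.shape, np.ndim(inner), outer.shape, a_col.shape)
```

Adding an assertion such as `assert a_col.shape == (5, 1)` is a cheap way to catch shape mistakes early.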





