Building a Logistic Regression (LR) Model with a Neural Network Mindset (Deep Learning Course, Week 2 Solutions)



LR Review


LR Computation Graph and Gradients


Algorithm Structure


Design a simple algorithm to determine whether an image is a cat.


We build an LR model with a neural network mindset; the figure below explains why LR is in fact a very simple neural network.


(Figure: logistic regression viewed as a single-neuron neural network)


Mathematical expression of the algorithm:

For one example $x^{(i)}$:
$$z^{(i)} = w^T x^{(i)} + b \tag{1}$$
$$\hat{y}^{(i)} = a^{(i)} = sigmoid(z^{(i)})\tag{2}$$
$$ \mathcal{L}(a^{(i)}, y^{(i)}) =  - y^{(i)}  \log(a^{(i)}) - (1-y^{(i)} )  \log(1-a^{(i)})\tag{3}$$

The cost is then computed by summing over all training examples:
$$ J = \frac{1}{m} \sum_{i=1}^m \mathcal{L}(a^{(i)}, y^{(i)})\tag{6}$$
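
As a quick numeric sanity check of equations (1)-(3), here is a minimal sketch with made-up numbers (the values of w, b, x_i and y_i below are illustrative only, not from the original assignment):

# Minimal numeric sketch of equations (1)-(3) for one made-up example
import numpy as np

w = np.array([[0.5], [-0.25]])     # weights, shape (2, 1)
b = 0.1                            # bias
x_i = np.array([[1.0], [2.0]])     # one example with 2 features, shape (2, 1)
y_i = 1                            # true label

z_i = np.dot(w.T, x_i) + b                                   # eq. (1): z = w^T x + b  ->  0.1
a_i = 1 / (1 + np.exp(-z_i))                                 # eq. (2): sigmoid(z)     ->  ~0.525
loss = -(y_i * np.log(a_i) + (1 - y_i) * np.log(1 - a_i))    # eq. (3): cross-entropy  ->  ~0.645
print(z_i, a_i, loss)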


Building the Individual Parts of the Algorithm


The main steps for building a neural network are:


  1. Define the model structure (such as the number of input features)

  2. Initialize the model's parameters

  3. Loop:

               Compute the current loss (forward propagation)

               Compute the current gradients (backward propagation)

               Update the parameters (gradient descent)


You usually build steps 1-3 separately and then integrate them into a single function we call model().


01

Helper Functions


# GRADED FUNCTION: sigmoid

import numpy as np

def sigmoid(z):
    """
    Compute the sigmoid of z

    Arguments:
    z -- A scalar or numpy array of any size.

    Return:
    s -- sigmoid(z)
    """
    s = 1 / (1 + np.exp(-z))
    return s
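
A quick sanity check of the function above (the expected values follow directly from the definition of the sigmoid):

print("sigmoid([0, 2]) = " + str(sigmoid(np.array([0, 2]))))
# expected: sigmoid([0, 2]) = [0.5        0.88079708]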


02

Initializing Parameters


# GRADED FUNCTION: initialize_with_zeros

def initialize_with_zeros(dim):
    """
    This function creates a vector of zeros of shape (dim, 1) for w and initializes b to 0.

    Argument:
    dim -- size of the w vector we want (or number of parameters in this case)

    Returns:
    w -- initialized vector of shape (dim, 1)
    b -- initialized scalar (corresponds to the bias)
    """
    w = np.zeros((dim, 1))
    b = 0

    assert(w.shape == (dim, 1))
    assert(isinstance(b, float) or isinstance(b, int))

    return w, b
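
A quick check, assuming the cells above (including the numpy import) have run; dim = 2 is an arbitrary illustrative size:

dim = 2
w, b = initialize_with_zeros(dim)
print("w = " + str(w))   # [[0.] [0.]]
print("b = " + str(b))   # 0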


03

Forward and Backward Propagation


Now that the parameters are initialized, you can perform the forward and backward propagation steps to learn the parameters.


Exercise: implement the propagate() function, which computes the cost function and its gradient.
Hints:

Forward Propagation:


  • You get X

  • You compute $A = \sigma(w^T X + b) = (a^{(1)}, a^{(2)}, ..., a^{(m)})$

  • You calculate the cost function: $J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(a^{(i)})+(1-y^{(i)})\log(1-a^{(i)})\right]$


Here are the two formulas you will be using:

$$ \frac{\partial J}{\partial w} = \frac{1}{m}X(A-Y)^T\tag{7}$$
$$ \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^m (a^{(i)}-y^{(i)})\tag{8}$$


# GRADED FUNCTION: propagate

def propagate(w, b, X, Y):
    """
    Implement the cost function and its gradient for the propagation explained above

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat) of size (1, number of examples)

    Return:
    cost -- negative log-likelihood cost for logistic regression
    dw -- gradient of the loss with respect to w, thus same shape as w
    db -- gradient of the loss with respect to b, thus same shape as b

    Tips:
    - Write your code step by step for the propagation. np.log(), np.dot()
    """
    m = X.shape[1]

    # FORWARD PROPAGATION (FROM X TO COST)
    A = sigmoid(np.dot(w.T, X) + b)                                 # compute activation
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m    # compute cost

    # BACKWARD PROPAGATION (TO FIND GRAD)
    dw = np.dot(X, (A - Y).T) / m
    db = np.sum(A - Y) / m

    assert(dw.shape == w.shape)
    assert(db.dtype == float)
    cost = np.squeeze(cost)
    assert(cost.shape == ())

    grads = {"dw": dw,
             "db": db}

    return grads, cost
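
A toy check of propagate() with 2 features and 2 examples; the values of w, b, X and Y below are made up for illustration only:

w, b = np.array([[1.], [2.]]), 2.
X = np.array([[1., 2.], [3., 4.]])
Y = np.array([[1, 0]])
grads, cost = propagate(w, b, X, Y)
print("dw = " + str(grads["dw"]))
print("db = " + str(grads["db"]))
print("cost = " + str(cost))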


04

Optimization


  • You have initialized your parameters.

  • You can also compute the cost function and its gradient.

  • Now you need to update the parameters using gradient descent.


The goal is to learn $w$ and $b$ by minimizing the cost function $J$. For a parameter $\theta$, the update rule is $\theta = \theta - \alpha \text{ } d\theta$, where $\alpha$ is the learning rate.
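
Concretely, for logistic regression this update rule is applied to both parameters (as the loop in the code below does):

$$ w := w - \alpha \, dw, \qquad b := b - \alpha \, db $$

where $dw$ and $db$ are the gradients returned by propagate().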


# GRADED FUNCTION: optimize

def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost = False):
    """
    This function optimizes w and b by running a gradient descent algorithm

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat), of shape (1, number of examples)
    num_iterations -- number of iterations of the optimization loop
    learning_rate -- learning rate of the gradient descent update rule
    print_cost -- True to print the loss every 100 steps

    Returns:
    params -- dictionary containing the weights w and bias b
    grads -- dictionary containing the gradients of the weights and bias with respect to the cost function
    costs -- list of all the costs computed during the optimization, this will be used to plot the learning curve.

    Tips:
    You basically need to write down two steps and iterate through them:
    1) Calculate the cost and the gradient for the current parameters. Use propagate().
    2) Update the parameters using gradient descent rule for w and b.
    """
    costs = []

    for i in range(num_iterations):

        # Cost and gradient calculation (≈ 1-4 lines of code)
        grads, cost = propagate(w, b, X, Y)

        # Retrieve derivatives from grads
        dw = grads["dw"]
        db = grads["db"]

        # update rule (≈ 2 lines of code)
        w = w - learning_rate * dw
        b = b - learning_rate * db

        # Record the costs
        if i % 100 == 0:
            costs.append(cost)

        # Print the cost every 100 training iterations
        if print_cost and i % 100 == 0:
            print("Cost after iteration %i: %f" % (i, cost))

    params = {"w": w,
              "b": b}

    grads = {"dw": dw,
             "db": db}

    return params, grads, costs
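
Continuing the toy example from the propagate() check above (num_iterations and learning_rate below are arbitrary small values chosen only for illustration):

params, grads, costs = optimize(w, b, X, Y, num_iterations=100, learning_rate=0.009, print_cost=False)
print("w = " + str(params["w"]))
print("b = " + str(params["b"]))
print("costs = " + str(costs))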


05

Prediction


The previous function outputs the learned w and b. We can use them to predict the labels of a dataset X by implementing the predict() function. Computing predictions takes two steps:


1. Calculate $\hat{Y} = A = \sigma(w^T X + b)$


2. Convert the entries of A into 0 (if activation <= 0.5) or 1 (if activation > 0.5), and store the predictions in a vector Y_prediction. If you wish, you can use an if/else statement in a for loop (though there is also a way to vectorize this).


# GRADED FUNCTION: predict

def predict(w, b, X):
    '''
    Predict whether the label is 0 or 1 using learned logistic regression parameters (w, b)

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)

    Returns:
    Y_prediction -- a numpy array (vector) containing all predictions (0/1) for the examples in X
    '''
    m = X.shape[1]
    Y_prediction = np.zeros((1, m))
    w = w.reshape(X.shape[0], 1)

    # Compute vector "A" predicting the probabilities of a cat being present in the picture
    A = sigmoid(np.dot(w.T, X) + b)

    for i in range(A.shape[1]):
        # Convert probabilities A[0,i] to actual predictions Y_prediction[0,i]
        if A[0][i] > 0.5:
            Y_prediction[0][i] = 1

    assert(Y_prediction.shape == (1, m))

    return Y_prediction
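
A toy check of predict(); the values of w, b and X below are made up for illustration only:

w = np.array([[0.1124579], [0.23106775]])
b = -0.3
X = np.array([[1., -1.1, -3.2], [1.2, 2., 0.1]])
print("predictions = " + str(predict(w, b, X)))   # expected: [[1. 1. 0.]]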


06

Merging All the Parts into a Model


Now we build the whole model by putting together all the building blocks (the functions implemented in the previous parts) in the right order.


# GRADED FUNCTION: model

def model(X_train, Y_train, X_test, Y_test, num_iterations = 2000, learning_rate = 0.5, print_cost = False):
    """
    Builds the logistic regression model by calling the functions you've implemented previously

    Arguments:
    X_train -- training set represented by a numpy array of shape (num_px * num_px * 3, m_train)
    Y_train -- training labels represented by a numpy array (vector) of shape (1, m_train)
    X_test -- test set represented by a numpy array of shape (num_px * num_px * 3, m_test)
    Y_test -- test labels represented by a numpy array (vector) of shape (1, m_test)
    num_iterations -- hyperparameter representing the number of iterations to optimize the parameters
    learning_rate -- hyperparameter representing the learning rate used in the update rule of optimize()
    print_cost -- Set to true to print the cost every 100 iterations

    Returns:
    d -- dictionary containing information about the model.
    """
    # initialize parameters with zeros (≈ 1 line of code)
    w, b = initialize_with_zeros(X_train.shape[0])

    # Gradient descent (≈ 1 line of code)
    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)

    # Retrieve parameters w and b from dictionary "parameters"
    w = parameters["w"]
    b = parameters["b"]

    # Predict test/train set examples (≈ 2 lines of code)
    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)

    # Print train/test Errors
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))

    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test,
         "Y_prediction_train": Y_prediction_train,
         "w": w,
         "b": b,
         "learning_rate": learning_rate,
         "num_iterations": num_iterations}

    return d
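
An illustrative call of model(). The variables train_set_x, train_set_y, test_set_x and test_set_y are assumptions here: they would come from your own data loading and preprocessing code (e.g. flattening and normalizing the cat/non-cat images), which is not shown in this post.

# Assumes train_set_x, train_set_y, test_set_x, test_set_y have already been loaded,
# flattened to shape (num_px * num_px * 3, m) and scaled to [0, 1] -- hypothetical names.
d = model(train_set_x, train_set_y, test_set_x, test_set_y,
          num_iterations=2000, learning_rate=0.005, print_cost=True)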


Original article: https://www.jianshu.com/p/8e269451795d
Author's blog: http://mcgrady.cn
GitHub: https://github.com/TracyMcgrady6

