Coursera Andrew Ng Deep Learning study notes (1)


Author: 半张紙


Article link: https://blog.csdn.net/qq_43714612/article/details/103547062


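Machine learning: a computer program is said to learn from experience E with respect to task T and performance measure P if its performance at T, as measured by P, improves with experience E.
1. Supervised learning and unsupervised learning
Supervised learning: we are given a data set in which the right answer (label) is known for every example; after training, when we give the model new data, it returns a predicted "right answer".
Examples:
1. House size -> house price (regression problem)
2. Tumor size and patient age -> benign or malignant (classification problem)
3. Handling problems with an infinite number of attributes/features: SVM (support vector machine)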
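A minimal sketch of example 1 (regression): fit a straight line price ≈ w * size + b with numpy's least-squares solver. The sizes and prices are made-up illustration data (generated exactly as 3 * size + 50 so the result is easy to check), not data from the course.

```python
import numpy as np

# Hypothetical training data: house size (m^2) and price (thousands).
sizes = np.array([50.0, 80.0, 100.0, 120.0, 150.0])
prices = 3.0 * sizes + 50.0

# Design matrix with a column of ones so the solver also learns the intercept b.
A = np.column_stack([sizes, np.ones_like(sizes)])
(w, b), *_ = np.linalg.lstsq(A, prices, rcond=None)

print(round(w, 3), round(b, 3))
# Predict the price of a new, unseen house of 90 m^2.
print(round(w * 90.0 + b, 3))
```

Every training example comes with its right answer (the price), which is what makes this supervised learning.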

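Unsupervised learning (finding structure in data):
Examples:

Clustering algorithms: grouping news articles with the same topic together; dividing people into different groups based on their genes (the data set only contains each person's genes and never says whether that person is type A or type B)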
Unsupervised learning allows us to approach problems with little or no idea what our results should look like. We can derive structure from data where we don’t necessarily know the effect of the variables.
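The notes do not name a specific clustering algorithm; the sketch below uses k-means only as one common example. It groups unlabeled 2-D points (hypothetical data: two random blobs) without ever being told which group a point belongs to.

```python
import numpy as np

def kmeans(points, k, n_iters=20, seed=0):
    """Plain k-means: alternate between assigning points and moving centroids."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Distance from every point to every centroid: shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # Move each centroid to the mean of its points (keep it if the cluster is empty).
        centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    return labels, centroids

# Hypothetical unlabeled data: two well-separated blobs, no labels given.
rng = np.random.default_rng(1)
blob_a = rng.normal(loc=0.0, scale=0.5, size=(20, 2))
blob_b = rng.normal(loc=5.0, scale=0.5, size=(20, 2))
points = np.vstack([blob_a, blob_b])

labels, centroids = kmeans(points, k=2)
print(labels)
```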

2. Neural networks
SNN (standard neural network), CNN (convolutional neural network), RNN (recurrent neural network, for sequence data)

Structured data: SNN
Unstructured data: audio -> RNN, images -> CNN

3. Basics of neural network programming

binary classification

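Example: recognizing whether an image contains a cat
x is the image vector: n = height * width * 3 (RGB)
y is 0 or 1
Training-set matrix X = [x(1) x(2) … x(m)], an n*m matrix, where m is the number of training examples
Y = [y(1) y(2) … y(m)], a 1*m matrix
Given x, we want yhat = P(y=1|x), the probability that y = 1 given the current x
a. In a classification task, yhat = wx + b cannot guarantee that yhat lies in [0, 1]
b. So use yhat = sigmoid(wx + b), where sigmoid(z) = 1/(1+e^-z)
The output range changes from R to (0, 1), and sigmoid(0) = 0.5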
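A minimal numpy sketch of this setup, assuming tiny random 4×4 RGB arrays in place of real cat photos and zero-initialized parameters chosen only for illustration: each image is flattened into one column of X, and yhat = sigmoid(w^T x + b) is computed for all m examples at once.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical data: m = 5 random RGB images of height 4 and width 4.
m, height, width = 5, 4, 4
rng = np.random.default_rng(0)
images = rng.random((m, height, width, 3))

# Flatten each image into a column vector and stack the columns: X has shape (n_x, m).
n_x = height * width * 3
X = images.reshape(m, n_x).T
print(X.shape)            # (48, 5)

# Parameters: w is (n_x, 1), b is a scalar (zeros here just for illustration).
w = np.zeros((n_x, 1))
b = 0.0
y_hat = sigmoid(w.T @ X + b)   # shape (1, m), every entry in (0, 1)
print(y_hat)                   # all 0.5, because z = 0 when w = 0 and b = 0
print(sigmoid(np.array([-10.0, 0.0, 10.0])))
```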

To learn the parameters w and b, we need a cost function:

cost function
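Loss (error) function: measures the gap between yhat and y for a single training example, i.e., how well the model does on that one example
We do not use L = (yhat - y)^2: because yhat passes through the sigmoid, this loss makes the optimization problem non-convex with multiple local minima, so gradient descent may not reach the global minimum
Instead we use L(y,yhat) = -(y log yhat + (1-y) log(1-yhat))
Note that y can only take the values 0 and 1
When y=0, L = -log(1-yhat), so to make L small, yhat must be as small as possible, i.e., close to 0
When y=1, L = -log yhat, so yhat must be as close to 1 as possible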
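A quick numeric check of the two cases above, as a sketch (the yhat values 0.9 and 0.1 are arbitrary):

```python
import numpy as np

def loss(y, y_hat):
    """Cross-entropy (log) loss for a single example, y in {0, 1}."""
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# When y = 1 the loss shrinks as y_hat approaches 1 ...
print(loss(1, 0.9), loss(1, 0.1))
# ... and when y = 0 it shrinks as y_hat approaches 0.
print(loss(0, 0.1), loss(0, 0.9))
```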

Cost function: measures the cost of the parameters (w, b) over the whole training set
J(w,b) = sum(L) / m = -sum(y log yhat + (1-y) log(1-yhat) ) / m
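The cost J is just the average of the per-example losses. A vectorized sketch, assuming Y and A (the predictions yhat) are stored as (1, m) arrays; the numbers are hypothetical:

```python
import numpy as np

def cost(Y, A):
    """J = -(1/m) * sum(y log a + (1 - y) log(1 - a)) over all m examples."""
    m = Y.shape[1]
    return -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m

# Hypothetical labels and predictions for m = 4 examples, both shaped (1, m).
Y = np.array([[1, 0, 1, 0]])
A = np.array([[0.9, 0.2, 0.6, 0.4]])
J = cost(Y, A)
print(J)
```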

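gradient descent:

Gradient descent is guaranteed to reach the global minimum of a convex function (which has only one local minimum); this is why a convex J is preferred
One parameter: for J = J(w), write dw for dJ/dw; the update rule is w_new = w - alpha*dw (learning rate alpha > 0)
Two parameters J(w, b): w_new = w - alpha (partial J / partial w) = w - alpha*dw
b_new = b - alpha*db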
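A one-parameter sketch of the update rule on a hypothetical convex cost J(w) = (w - 3)^2, whose derivative is dw = 2(w - 3); the starting point, learning rate and iteration count are arbitrary choices:

```python
def gradient_descent(dJ, w0, alpha=0.1, n_iters=100):
    """Repeat the update w := w - alpha * dJ/dw a fixed number of times."""
    w = w0
    for _ in range(n_iters):
        w = w - alpha * dJ(w)
    return w

def dJ(w):
    """Derivative of the hypothetical convex cost J(w) = (w - 3)^2."""
    return 2 * (w - 3)

w_min = gradient_descent(dJ, w0=0.0)
print(w_min)  # very close to 3, the only minimum of J
```

Each step here multiplies the distance to the minimum by |1 - 2*alpha|, so alpha = 0.1 converges, while a too-large value such as alpha = 1.1 overshoots further on every step and diverges.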

Gradient descent for logistic regression:

z = wx + b = w1x1 + w2x2 + b
yhat = a = sigmoid(z)
L = L(a)
J = sum(L)/m
da = dL/da = …
dz = dL/dz = a - y
dw1 = dJ / dw1 = sum(dL/dz * dz/dw1)/m =sum ( (a(i) - y(i))*x1(i) )/m
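The result dz = a - y can be sanity-checked numerically. This sketch compares a central finite difference of L with respect to z against a - y at a few arbitrary points; eps = 1e-6 is an assumed step size.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_of_z(z, y):
    """L as a function of z, going through a = sigmoid(z)."""
    a = sigmoid(z)
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

# Compare a central finite difference of dL/dz with the closed form a - y.
eps = 1e-6
for z in (-2.0, 0.0, 1.5):
    for y in (0, 1):
        numeric = (loss_of_z(z + eps, y) - loss_of_z(z - eps, y)) / (2 * eps)
        analytic = sigmoid(z) - y
        print(z, y, numeric, analytic)
```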

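Implementation:
1. Three nested for loops

import math
import numpy as np

def sigmoid(z):
	return 1 / (1 + math.exp(-z))

# Assumes X (shape (n_x, m)), Y (shape (m,)), learning rate alpha and num_iterations are defined
n_x, m = X.shape
w = np.zeros(n_x)
b = 0.0
for iteration in range(num_iterations):  # gradient descent iterations
	J, dw, db = 0.0, np.zeros(n_x), 0.0
	for i in range(m):  # loop over the m training examples
		z_i = np.dot(w, X[:, i]) + b
		a_i = sigmoid(z_i)
		J += -(Y[i] * math.log(a_i) + (1 - Y[i]) * math.log(1 - a_i))
		dz_i = a_i - Y[i]
		for j in range(n_x):  # loop over the dimensions of x
			dw[j] += X[j, i] * dz_i  # accumulate dJ/dw_j (not w itself)
		db += dz_i
	J /= m
	dw /= m
	db /= m
	w = w - alpha * dw  # alpha is the learning rate, not the activation a
	b = b - alpha * db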

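2. Vectorized

import numpy as np

# Assumes X (n_x, m), Y (1, m), w (n_x, 1), scalar b, learning rate alpha and num_iterations are defined
m = X.shape[1]
for iteration in range(num_iterations):  # gradient descent iterations
	Z = np.dot(w.T, X) + b  # Z.shape = (1, m)
	A = 1 / (1 + np.exp(-Z))  # sigmoid, applied element-wise
	dZ = A - Y  # dZ.shape = (1, m)
	dw = np.dot(X, dZ.T) / m  # dw.shape = (n_x, 1)
	db = np.sum(dZ) / m
	w = w - alpha * dw
	b = b - alpha * db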
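To confirm that the loop version 1 and the vectorized version 2 compute the same gradients, here is a self-contained sketch on hypothetical toy data (label 1 when x1 + x2 > 0). The function names, data, learning rate and iteration count are all illustrative assumptions.

```python
import math
import numpy as np

def grads_loop(w, b, X, Y):
    """dw, db computed with explicit for loops over examples and features."""
    n_x, m = X.shape
    dw = np.zeros((n_x, 1))
    db = 0.0
    for i in range(m):
        z = sum(w[j, 0] * X[j, i] for j in range(n_x)) + b
        a = 1 / (1 + math.exp(-z))
        dz = a - Y[0, i]
        for j in range(n_x):
            dw[j, 0] += X[j, i] * dz
        db += dz
    return dw / m, db / m

def grads_vectorized(w, b, X, Y):
    """The same dw, db using matrix operations only."""
    m = X.shape[1]
    A = 1 / (1 + np.exp(-(w.T @ X + b)))   # shape (1, m)
    dZ = A - Y                              # shape (1, m)
    return X @ dZ.T / m, np.sum(dZ) / m     # shapes (n_x, 1) and scalar

# Hypothetical toy data: X is (2, 200), Y is (1, 200).
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 200))
Y = (X[0] + X[1] > 0).astype(float).reshape(1, -1)

# Both versions should agree for any w, b.
w = rng.normal(size=(2, 1))
b = 0.5
dw_l, db_l = grads_loop(w, b, X, Y)
dw_v, db_v = grads_vectorized(w, b, X, Y)
print(np.allclose(dw_l, dw_v), np.isclose(db_l, db_v))

# Train with the vectorized gradients; alpha is the learning rate.
w, b, alpha = np.zeros((2, 1)), 0.0, 0.5
for _ in range(1000):
    dw, db = grads_vectorized(w, b, X, Y)
    w = w - alpha * dw
    b = b - alpha * db

predictions = (1 / (1 + np.exp(-(w.T @ X + b))) > 0.5).astype(float)
accuracy = np.mean(predictions == Y)
print(accuracy)
```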

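Note: in Python, for the dot product of two 10^6-dimensional vectors, the vectorized version was nearly 300 times faster than the for loop. So in deep learning, explicit for loops should be avoided whenever possible.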
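A sketch of that comparison with random vectors (the data and timing method are assumptions; the exact speedup depends on the machine and will not necessarily match the ~300x above):

```python
import time
import numpy as np

n = 1_000_000
rng = np.random.default_rng(0)
a = rng.random(n)
b = rng.random(n)

start = time.perf_counter()
c_vec = np.dot(a, b)                 # vectorized dot product
t_vec = time.perf_counter() - start

start = time.perf_counter()
c_loop = 0.0
for i in range(n):                   # explicit Python loop
    c_loop += a[i] * b[i]
t_loop = time.perf_counter() - start

print(f"vectorized: {t_vec * 1000:.2f} ms, for loop: {t_loop * 1000:.2f} ms")
```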

numpy syntax
1. Creating arrays
np.array([[1, 2], [3, 4]])
np.zeros((2, 2))
np.full((2, 2), np.inf)

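2. Operations
u, v, w are vectors or matrices in the mathematical sense,
u.shape = (n,1), v.shape = (1,n), w.shape = (n,1)
Exponential of u:
np.exp(u)
e**u  (with from math import e)

Logarithm of u:
np.log(u)

Absolute value of u:
np.abs(u)
abs(u)
Note: abs is a Python built-in function, not a function of the math library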

As the figures below show, np.exp is the fastest, e** comes second, and the for loop is the slowest
[Figures: timing comparison of np.exp, e**, and a for loop (images not preserved)]
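Note: math-library functions such as math.exp and math.log only work on single numbers and cannot be applied to numpy arrays; numpy functions and Python built-ins (such as abs and **) do work on numpy arrays.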
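A small sketch of the difference, using an arbitrary 3-element array: np.exp, e** and abs work element-wise, while math.exp raises a TypeError.

```python
import math
import numpy as np

u = np.array([1.0, 2.0, 3.0])

print(np.exp(u))        # element-wise, returns an array
print(math.e ** u)      # ** also works element-wise on numpy arrays
print(np.log(u), abs(-u))

math_exp_failed = False
try:
    math.exp(u)         # math functions expect a single number
except TypeError as err:
    math_exp_failed = True
    print("math.exp failed:", err)
```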

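Element-wise multiplication: u*w
Element-wise power: u**w
Matrix multiplication (vector dot product): np.dot(v,u)
Matrix transpose: u.T
Add one to every element: u+1
Square every element: u**2
Reciprocal of every element: 1/u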
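The operations above, with the shapes from the notes (n = 3 here). As an illustrative assumption, v is taken to be u.T so the dot product is easy to verify by hand.

```python
import numpy as np

n = 3
u = np.arange(1.0, n + 1).reshape(n, 1)     # shape (n, 1): [[1], [2], [3]]
w = np.full((n, 1), 2.0)                    # shape (n, 1): all twos
v = u.T                                     # shape (1, n)

print(u * w)          # element-wise product, shape (n, 1)
print(u ** w)         # element-wise power
print(np.dot(v, u))   # (1, n) times (n, 1) gives (1, 1): 1 + 4 + 9 = 14
print(u.T.shape)      # (1, 3)
print(u + 1, u ** 2, 1 / u)
```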
