Summary of Andrew Ng's Deep Learning Course (Part 1): Related Functions

1. Cost function

Measures the performance over the whole training set:
$$J = \frac{1}{m} \sum_{i=1}^m \mathcal{L}(a^{(i)}, y^{(i)}) \tag{6}$$

$$J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(a^{(i)}) + (1-y^{(i)})\log(1-a^{(i)})\right]$$

 cost = -1 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) 

2. Loss function

Measures how the algorithm is doing on a single training example:
$$\mathcal{L}(a^{(i)}, y^{(i)}) = -y^{(i)}\log(a^{(i)}) - (1-y^{(i)})\log(1-a^{(i)}) \tag{3}$$

def L(A, Y):
    # Cross-entropy loss; note the leading minus sign from the formula above
    loss = -(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    return loss

$$L_1(\hat{y}, y) = \sum_{i=1}^{m} \left|y^{(i)} - \hat{y}^{(i)}\right|$$

def L1(yhat, y):
    loss = np.sum(np.abs(y - yhat))
    return loss

$$L_2(\hat{y}, y) = \sum_{i=1}^{m} \left(y^{(i)} - \hat{y}^{(i)}\right)^2$$

def L2(yhat, y):
    loss = np.dot((y - yhat),(y - yhat).T)
    return loss
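
A quick usage sketch for the two losses above (the example values are my own, not from the course); it assumes the L1 and L2 functions defined above are in scope:

import numpy as np

yhat = np.array([0.9, 0.2, 0.1, 0.4, 0.9])   # hypothetical predictions
y    = np.array([1.0, 0.0, 0.0, 1.0, 1.0])   # hypothetical labels
print(L1(yhat, y))   # 1.1  (sum of absolute errors)
print(L2(yhat, y))   # 0.43 (sum of squared errors)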

3. ŷ (the prediction)

The probability that the example satisfies y = 1:
$$\hat{y}^{(i)} = a^{(i)} = \mathrm{sigmoid}(z^{(i)}) \tag{2}$$

$$z^{(i)} = w^T x^{(i)} + b \tag{1}$$

A = sigmoid(np.dot(w.T, X) + b)

4. Parameter update rule

$$\theta = \theta - \alpha \, d\theta$$


Here alpha is the learning rate.

w = w - learning_rate * dw
b = b - learning_rate * db

5. Derivatives with respect to w and b

$$\frac{\partial J}{\partial w} = \frac{1}{m} X (A - Y)^T \tag{7}$$

$$\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^m (a^{(i)} - y^{(i)}) \tag{8}$$

dw = 1 / m * np.dot(X, (A - Y).T)
db = 1 / m * np.sum(A - Y)

6. Vectorized logistic regression

$$A = \sigma(w^T X + b) = (a^{(1)}, a^{(2)}, \dots, a^{(m-1)}, a^{(m)})$$

$$J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(a^{(i)}) + (1-y^{(i)})\log(1-a^{(i)})\right]$$

A = sigmoid(np.dot(w.T, X) + b)
cost = -1 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))
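
A minimal shape sanity check for these two vectorized lines, using random data with hypothetical dimensions (n_x features, m examples) and the sigmoid defined in the next section:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

n_x, m = 4, 10                                   # hypothetical sizes
X = np.random.randn(n_x, m)
Y = np.random.randint(0, 2, size=(1, m))
w = np.zeros((n_x, 1))
b = 0.0

A = sigmoid(np.dot(w.T, X) + b)                  # shape (1, m)
cost = -1 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))
print(A.shape, cost)                             # (1, 10) and log(2) for all-zero weights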

7. Activation functions

1. Sigmoid function

$$\mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}}$$

def sigmoid(x):
    # x -- a scalar or numpy array of any size
    s = 1 / (1 + np.exp(-x))
    return s
1.1 Sigmoid derivative

$$\sigma'(x) = s(1-s), \quad \text{where } s = \sigma(x)$$

def sigmoid_derivative(x):
    # ds -- the gradient of sigmoid at x
    s = sigmoid(x)
    ds = s * (1 - s)
    return ds
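
A small numerical check (not part of the course code): compare sigmoid_derivative against a centered finite difference; it assumes sigmoid and sigmoid_derivative above are in scope:

import numpy as np

x = np.array([-2.0, 0.0, 3.0])
eps = 1e-5
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)   # finite-difference estimate
analytic = sigmoid_derivative(x)                              # s * (1 - s)
print(np.max(np.abs(numeric - analytic)))                     # should be close to 0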

2. tanh function

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

def tanh(x):
    t = (np.exp(x)-np.exp(-x))/(np.exp(x)+np.exp(-x))
    return t

3. ReLU function: max(0, x)

def ReLU(x):
    # Elementwise max(0, x); works for scalars and numpy arrays
    return np.maximum(0, x)

4. Leaky ReLU function: max(0.01x, x)

def leakyReLU(x):
    # Elementwise max(0.01*x, x); works for scalars and numpy arrays
    return np.maximum(0.01 * x, x)

8. Reshaping image arrays

Reshape an image read in as a 3-D array into a column vector.

def image2vector(image):
    v = image.reshape(image.shape[0] * image.shape[1] * image.shape[2], 1)
    return v
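
A usage sketch with a random array standing in for an image (the shape (3, 3, 2) is arbitrary):

import numpy as np

image = np.random.rand(3, 3, 2)        # stand-in for (height, width, channels)
v = image2vector(image)
print(image.shape, "->", v.shape)      # (3, 3, 2) -> (18, 1)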

9. Normalizing data (1)

Divide each row vector by its (L2) norm, so that every row has unit length and every element lies in [-1, 1].

def normalizeRows(x):
    x_norm = np.linalg.norm(x, axis = 1, keepdims = True)
    x = x / x_norm
    return x
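
A quick check that each row really ends up with unit norm (the example matrix is my own):

import numpy as np

x = np.array([[0.0, 3.0, 4.0],
              [1.0, 6.0, 4.0]])
x_normalized = normalizeRows(x)
print(x_normalized)
print(np.linalg.norm(x_normalized, axis=1))   # [1. 1.]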

10. Normalizing data (2): softmax

When the algorithm needs to classify two or more classes, softmax can be viewed as a normalizing function.

def softmax(x):
    x_exp = np.exp(x)
    x_sum = np.sum(x_exp, axis = 1, keepdims = True)
    s = x_exp / x_sum
    return s
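
One caveat, as a side note: np.exp can overflow for large inputs. A common numerically stable variant (my sketch, not from the course) subtracts the row-wise maximum first, which does not change the softmax value:

import numpy as np

def softmax_stable(x):
    # Shifting by the row max leaves softmax unchanged but keeps np.exp from overflowing
    x_shift = x - np.max(x, axis=1, keepdims=True)
    x_exp = np.exp(x_shift)
    return x_exp / np.sum(x_exp, axis=1, keepdims=True)

x = np.array([[9.0, 2.0, 5.0, 0.0, 0.0],
              [7.0, 5.0, 0.0, 0.0, 0.0]])
print(softmax_stable(x))   # each row sums to 1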

11. Initializing w and b

def initialize(dim):
    """
    This function creates a vector of zeros of shape (dim, 1) for w and initializes b to 0.
    dim -- size of the w vector we want (the number of parameters in this case)
    w -- initialized vector of shape (dim, 1)
    b -- initialized scalar (the bias)
    """
    w = np.zeros((dim, 1))
    b = 0

    assert(w.shape == (dim, 1))
    assert(isinstance(b, float) or isinstance(b, int))

    return w, b
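
A trivial usage example, just to show what is returned:

w, b = initialize(3)
print(w.shape, b)    # (3, 1) 0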

12. Learning the parameters (forward and backward propagation)

def propagate(w, b, X, Y):
    """
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat) of size (1, number of examples)
    cost -- negative log-likelihood cost for logistic regression
    dw -- gradient of the loss with respect to w, hence the same shape as w
    db -- gradient of the loss with respect to b, hence the same shape as b
    """
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    cost = -1 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) 
    dw = 1 / m * np.dot(X, (A - Y).T)
    db = 1 / m * np.sum(A - Y)
    assert(dw.shape == w.shape)
    assert(db.dtype == float)
    cost = np.squeeze(cost)
    assert(cost.shape == ())
    
    grads = {"dw": dw,
             "db": db}
    
    return grads, cost
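
A quick sanity check of propagate with tiny made-up inputs (the numbers are arbitrary; sigmoid and propagate above must be in scope):

import numpy as np

w = np.array([[1.0], [2.0]])
b = 2.0
X = np.array([[1.0, 2.0, -1.0],
              [3.0, 4.0, -3.2]])
Y = np.array([[1, 0, 1]])

grads, cost = propagate(w, b, X, Y)
print(grads["dw"].shape)   # (2, 1), same shape as w
print(grads["db"], cost)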

13. Optimization (updating the parameters)

def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost = False):
    """
    This function optimizes w and b by running gradient descent.
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat) of shape (1, number of examples)
    num_iterations -- number of iterations of the optimization loop
    learning_rate -- learning rate of the gradient descent update rule
    print_cost -- if True, print the cost every 100 steps

    params -- dictionary containing the weights w and bias b
    grads -- dictionary containing the gradients of the weights and bias with respect to the cost function
    costs -- list of all the costs computed during the optimization, used to plot the learning curve
    """
    costs = []
    
    for i in range(num_iterations):
        grads, cost = propagate(w, b, X, Y)
        dw = grads["dw"]
        db = grads["db"]
        w = w - learning_rate * dw
        b = b - learning_rate * db
        if i % 100 == 0:
            costs.append(cost)
        if print_cost and i % 100 == 0:
            print ("Cost after iteration %i: %f" %(i, cost))
    params = {"w": w,
              "b": b}
    
    grads = {"dw": dw,
             "db": db}
    
    return params, grads, costs
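
Continuing the same toy data, a sketch of how optimize can be called (the learning rate and iteration count here are arbitrary choices):

import numpy as np

w, b = np.array([[1.0], [2.0]]), 2.0
X = np.array([[1.0, 2.0, -1.0],
              [3.0, 4.0, -3.2]])
Y = np.array([[1, 0, 1]])

params, grads, costs = optimize(w, b, X, Y, num_iterations=100,
                                learning_rate=0.009, print_cost=False)
print(params["w"], params["b"])
print(costs)   # one cost value recorded every 100 iterations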

14. Predicting labels for a dataset

def predict(w, b, X):
    """
    Predict whether the label is 0 or 1 using the learned logistic regression parameters (w, b).
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    Y_prediction -- a numpy array (vector) containing all predictions (0/1) for the examples in X
    """
    m = X.shape[1]
    Y_prediction = np.zeros((1,m))
    w = w.reshape(X.shape[0], 1)
    
    A = sigmoid(np.dot(w.T, X) + b)
    
    for i in range(A.shape[1]):
        if A[0, i] <= 0.5:
            Y_prediction[0, i] = 0
        else:
            Y_prediction[0, i] = 1
    assert(Y_prediction.shape == (1, m))
    
    return Y_prediction
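
A small usage sketch with arbitrary example weights:

import numpy as np

w = np.array([[0.1], [0.2]])
b = -0.3
X = np.array([[1.0, -1.1, -3.2],
              [1.2, 2.0, 0.1]])
print(predict(w, b, X))   # a (1, 3) array of 0/1 predictions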

15. Building the model function

def model(X_train, Y_train, X_test, Y_test, num_iterations = 2000, learning_rate = 0.5, print_cost = False):
    """
    X_train -- training set represented by a numpy array of shape (num_px * num_px * 3, m_train)
    Y_train -- training labels represented by a numpy array (vector) of shape (1, m_train)
    X_test -- test set represented by a numpy array of shape (num_px * num_px * 3, m_test)
    Y_test -- test labels represented by a numpy array (vector) of shape (1, m_test)
    num_iterations -- hyperparameter representing the number of iterations to optimize the parameters
    learning_rate -- hyperparameter representing the learning rate used in the update rule of optimize()
    print_cost -- set to True to print the cost every 100 iterations
    d -- dictionary containing information about the model
    """
    
    w, b = initialize(X_train.shape[0])
    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)
    w = parameters["w"]
    b = parameters["b"]
    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)
    
    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test, 
         "Y_prediction_train" : Y_prediction_train, 
         "w" : w, 
         "b" : b,
         "learning_rate" : learning_rate,
         "num_iterations": num_iterations}
    
    return d
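
Finally, a sketch of how the full model might be called. The names train_set_x, train_set_y, test_set_x, test_set_y are placeholders for already flattened and standardized data of shapes (num_px * num_px * 3, m) and (1, m); they are not defined in this post:

import numpy as np

# Placeholder dataset variables, assumed to be loaded and preprocessed elsewhere
d = model(train_set_x, train_set_y, test_set_x, test_set_y,
          num_iterations=2000, learning_rate=0.005, print_cost=True)

# Train/test accuracy computed from the returned predictions
print("train accuracy: {} %".format(100 - np.mean(np.abs(d["Y_prediction_train"] - train_set_y)) * 100))
print("test accuracy: {} %".format(100 - np.mean(np.abs(d["Y_prediction_test"] - test_set_y)) * 100))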

For the related NumPy functions, see www.numpy.org.
