Notes on Andrew Ng's Deep Learning, Part 1: Key Functions
1. Cost function
Measures the model's performance over the entire training set.
J = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}(a^{(i)}, y^{(i)}) \tag{6}
Substituting the cross-entropy loss:
J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(a^{(i)}) + (1-y^{(i)})\log(1-a^{(i)})\right]
cost = -1 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))
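As a quick sanity check, here is a minimal sketch with made-up probabilities and labels (the values are assumptions for illustration, not from the course):

import numpy as np

A = np.array([[0.9, 0.2, 0.7]])   # predicted probabilities for 3 examples
Y = np.array([[1, 0, 1]])         # true labels
m = Y.shape[1]
cost = -1 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))
print(cost)   # average cross-entropy over the 3 examples (about 0.228)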
2. Loss function
Measures how well the algorithm is doing on a single training example.
\mathcal{L}(a^{(i)}, y^{(i)}) = -y^{(i)} \log(a^{(i)}) - (1-y^{(i)}) \log(1-a^{(i)}) \tag{3}
def L(A, Y):
    # per-example cross-entropy loss; note the leading minus sign from (3)
    loss = -(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    return loss
L_1(\hat{y}, y) = \sum_{i=0}^{m} \left|y^{(i)} - \hat{y}^{(i)}\right|
def L1(yhat, y):
    loss = np.sum(np.abs(y - yhat))
    return loss
L_2(\hat{y}, y) = \sum_{i=0}^{m} \left(y^{(i)} - \hat{y}^{(i)}\right)^2
def L2(yhat, y):
    loss = np.dot((y - yhat), (y - yhat).T)
    return loss
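A quick way to exercise both losses, using small toy vectors (values assumed for illustration):

import numpy as np

yhat = np.array([0.9, 0.2, 0.1, 0.4, 0.9])
y = np.array([1, 0, 0, 1, 1])
print("L1 =", L1(yhat, y))   # 1.1, the sum of absolute errors
print("L2 =", L2(yhat, y))   # 0.43, the sum of squared errors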
3. y hat
The probability that the recognized object satisfies y = 1.
\hat{y}^{(i)} = a^{(i)} = \mathrm{sigmoid}(z^{(i)}) \tag{2}
z^{(i)} = w^T x^{(i)} + b \tag{1}
A = sigmoid(np.dot(w.T, X) + b)
4. Parameter update rule
\theta = \theta - \alpha \, d\theta
where α (alpha) is the learning rate.
w = w - learning_rate * dw
b = b - learning_rate * db
5. Derivatives of w and b
\frac{\partial J}{\partial w} = \frac{1}{m} X (A - Y)^T \tag{7}
\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} (a^{(i)} - y^{(i)}) \tag{8}
dw = 1 / m * np.dot(X, (A - Y).T)
db = 1 / m * np.sum(A - Y)
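A minimal sketch (random toy data assumed) that checks gradient (8) against a centered finite difference; sigmoid is defined inline so the snippet is self-contained:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

np.random.seed(0)
m = 4
X = np.random.randn(2, m)
Y = np.array([[1, 0, 1, 0]])
w = np.random.randn(2, 1)
b = 0.5

def cost_fn(w, b):
    A = sigmoid(np.dot(w.T, X) + b)
    return -1 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))

A = sigmoid(np.dot(w.T, X) + b)
db = 1 / m * np.sum(A - Y)                                         # analytic, formula (8)
eps = 1e-7
db_num = (cost_fn(w, b + eps) - cost_fn(w, b - eps)) / (2 * eps)   # numerical estimate
print(db, db_num)   # the two values should agree to several decimal places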
6. Vectorized logistic regression
A = \sigma(w^T X + b) = (a^{(1)}, a^{(2)}, \ldots, a^{(m-1)}, a^{(m)})
J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(a^{(i)}) + (1-y^{(i)})\log(1-a^{(i)})\right]
A = sigmoid(np.dot(w.T, X) + b)
cost = -1 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))
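The point of vectorization is that one matrix product replaces the per-example loop. A minimal sketch on random data (shapes assumed) confirming both give the same activations:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

np.random.seed(1)
X = np.random.randn(3, 5)   # 3 features, 5 examples
w = np.random.randn(3, 1)
b = 0.2

A_vec = sigmoid(np.dot(w.T, X) + b)                # all m examples at once

A_loop = np.zeros((1, X.shape[1]))
for i in range(X.shape[1]):                        # one example at a time
    A_loop[0, i] = sigmoid(w[:, 0] @ X[:, i] + b)
print(np.allclose(A_vec, A_loop))                  # True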
7. Activation functions
1. sigmoid function
\mathrm{sigmoid}(x) = \frac{1}{1+e^{-x}}
def sigmoid(x):
    # x -- a scalar or a numpy array of any size
    s = 1 / (1 + np.exp(-x))
    return s
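Because np.exp broadcasts, the same function handles a scalar and an array (toy inputs assumed, using the sigmoid defined above):

import numpy as np

print(sigmoid(0))                      # 0.5
print(sigmoid(np.array([1, 2, 3])))    # [0.73105858 0.88079708 0.95257413]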
1.1 sigmoid derivative
\sigma'(x) = s(1-s), \quad s = \sigma(x)
def sigmoid_derivative(x):
    # ds -- the gradient of sigmoid evaluated at x
    s = sigmoid(x)
    ds = s * (1 - s)
    return ds
2. tanh function
\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}
def tanh(x):
    t = (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))
    return t
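numpy already ships a built-in tanh; a quick check (toy input assumed) that the hand-rolled version matches it:

import numpy as np

x = np.array([-2.0, 0.0, 2.0])
t = (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))
print(np.allclose(t, np.tanh(x)))   # True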
3. ReLU function (max(0, x))
def ReLU(x):
    # element-wise max(0, x); unlike a scalar if/else, this also works on arrays
    return np.maximum(0, x)
4. Leaky ReLU function (max(0.01x, x))
def leakyReLU(x):
    # element-wise max(0.01*x, x); a small negative slope instead of zero
    return np.maximum(0.01 * x, x)
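A quick check of both rectifiers on a toy array (values assumed):

import numpy as np

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(np.maximum(0, x))           # ReLU:       [0.  0.  0.  1.5]
print(np.maximum(0.01 * x, x))    # leaky ReLU: [-0.02  -0.005  0.  1.5]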
8. Reshaping image arrays
Reshape a loaded image (a 3-D array) into a column vector.
def image2vector(image):
    v = image.reshape(image.shape[0] * image.shape[1] * image.shape[2], 1)
    return v
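For example, a fake 2x2 "image" with 3 channels (shape assumed for illustration) flattens to a (12, 1) column vector:

import numpy as np

image = np.random.rand(2, 2, 3)   # height x width x channels
v = image.reshape(image.shape[0] * image.shape[1] * image.shape[2], 1)
print(v.shape)   # (12, 1)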
9. Normalizing data (1)
Divide each row vector by its L2 norm, so that every row has unit length (each entry then lies in [-1, 1]).
def normalizeRows(x):
    x_norm = np.linalg.norm(x, axis=1, keepdims=True)
    x = x / x_norm
    return x
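On a small toy matrix (values assumed), every row ends up with L2 norm 1:

import numpy as np

x = np.array([[0.0, 3.0, 4.0],
              [1.0, 6.0, 4.0]])
x_norm = np.linalg.norm(x, axis=1, keepdims=True)   # per-row norms: [[5.], [7.28...]]
print(x / x_norm)                                   # rows scaled to unit length
print(np.linalg.norm(x / x_norm, axis=1))           # [1. 1.]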
10. Normalizing data (2): softmax
When the algorithm needs to classify two or more classes, softmax can be viewed as a normalizing function.
def softmax(x):
    x_exp = np.exp(x)
    x_sum = np.sum(x_exp, axis=1, keepdims=True)
    s = x_exp / x_sum
    return s
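One caveat not covered in the notes: np.exp overflows for large inputs. A common, numerically safer variant (an addition of mine, not from the original) subtracts the row-wise max first; softmax is unchanged by adding a constant to every entry of a row:

import numpy as np

def softmax_stable(x):
    # shifting by the row max avoids overflow in np.exp;
    # the shift cancels out in the final ratio
    x_shift = x - np.max(x, axis=1, keepdims=True)
    x_exp = np.exp(x_shift)
    return x_exp / np.sum(x_exp, axis=1, keepdims=True)

x = np.array([[1000.0, 2.0, 5.0],
              [7.0, 5.0, 0.0]])
print(softmax_stable(x).sum(axis=1))   # [1. 1.], with no overflow warnings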
11. Initializing w and b
def initialize(dim):
    """
    Creates a vector of zeros of shape (dim, 1) for w and initializes b to 0.
    dim -- size of the w vector we want (the number of parameters in this case)
    w -- initialized vector of shape (dim, 1)
    b -- initialized scalar (the bias)
    """
    w = np.zeros((dim, 1))
    b = 0
    assert(w.shape == (dim, 1))
    assert(isinstance(b, float) or isinstance(b, int))
    return w, b
12. Learning the parameters
def propagate(w, b, X, Y):
    """
    w -- weights, a numpy array of shape (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (0 if non-cat, 1 if cat) of shape (1, number of examples)
    cost -- negative log-likelihood cost for logistic regression
    dw -- gradient of the loss with respect to w, hence the same shape as w
    db -- gradient of the loss with respect to b, hence the same shape as b
    """
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)                                  # forward pass, formulas (1)-(2)
    cost = -1 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))  # cost, formula (6)
    dw = 1 / m * np.dot(X, (A - Y).T)                                # gradient, formula (7)
    db = 1 / m * np.sum(A - Y)                                       # gradient, formula (8)
    assert(dw.shape == w.shape)
    assert(db.dtype == float)
    cost = np.squeeze(cost)
    assert(cost.shape == ())
    grads = {"dw": dw,
             "db": db}
    return grads, cost
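A minimal smoke test with toy values (assumed for illustration, relying on the sigmoid and propagate defined above):

import numpy as np

w = np.array([[1.0], [2.0]])
b = 2.0
X = np.array([[1.0, 2.0, -1.0],
              [3.0, 4.0, -3.2]])
Y = np.array([[1, 0, 1]])
grads, cost = propagate(w, b, X, Y)
print(grads["dw"].shape)   # (2, 1), same shape as w
print(cost)                # a single scalar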
13. Optimization (updating the parameters)
def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost=False):
    """
    Optimizes w and b by running gradient descent.
    w -- weights, a numpy array of shape (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (0 if non-cat, 1 if cat) of shape (1, number of examples)
    num_iterations -- number of iterations of the optimization loop
    learning_rate -- learning rate of the gradient descent update rule
    print_cost -- if True, print the cost every 100 iterations
    params -- dictionary containing the weights w and bias b
    grads -- dictionary containing the gradients of the weights and bias with respect to the cost function
    costs -- list of all the costs computed during the optimization; used to plot the learning curve
    """
    costs = []
    for i in range(num_iterations):
        grads, cost = propagate(w, b, X, Y)
        dw = grads["dw"]
        db = grads["db"]
        w = w - learning_rate * dw   # update rule from section 4
        b = b - learning_rate * db
        if i % 100 == 0:
            costs.append(cost)
            if print_cost:
                print("Cost after iteration %i: %f" % (i, cost))
    params = {"w": w,
              "b": b}
    grads = {"dw": dw,
             "db": db}
    return params, grads, costs
14. Predicting labels for a dataset
def predict(w, b, X):
    """
    Predicts whether the label is 0 or 1 using the learned logistic regression parameters (w, b).
    w -- weights, a numpy array of shape (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of shape (num_px * num_px * 3, number of examples)
    Y_prediction -- a numpy array (vector) containing all predictions (0/1) for the examples in X
    """
    m = X.shape[1]
    Y_prediction = np.zeros((1, m))
    w = w.reshape(X.shape[0], 1)
    A = sigmoid(np.dot(w.T, X) + b)   # predicted probabilities
    for i in range(A.shape[1]):
        if A[0, i] <= 0.5:            # threshold at 0.5
            Y_prediction[0, i] = 0
        else:
            Y_prediction[0, i] = 1
    assert(Y_prediction.shape == (1, m))
    return Y_prediction
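A toy check (parameter values assumed): probabilities above 0.5 become label 1, the rest 0:

import numpy as np

w = np.array([[0.1124579], [0.23106775]])
b = -0.3
X = np.array([[1.0, -1.1, -3.2],
              [1.2, 2.0, 0.1]])
print(predict(w, b, X))   # [[1. 1. 0.]]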
15. Building the model function
def model(X_train, Y_train, X_test, Y_test, num_iterations=2000, learning_rate=0.5, print_cost=False):
    """
    X_train -- training set represented by a numpy array of shape (num_px * num_px * 3, m_train)
    Y_train -- training labels represented by a numpy array (vector) of shape (1, m_train)
    X_test -- test set represented by a numpy array of shape (num_px * num_px * 3, m_test)
    Y_test -- test labels represented by a numpy array (vector) of shape (1, m_test)
    num_iterations -- hyperparameter: number of iterations to optimize the parameters
    learning_rate -- hyperparameter: learning rate used in the update rule of optimize()
    print_cost -- set to True to print the cost every 100 iterations
    d -- dictionary containing information about the model
    """
    w, b = initialize(X_train.shape[0])
    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)
    w = parameters["w"]
    b = parameters["b"]
    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)
    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test,
         "Y_prediction_train": Y_prediction_train,
         "w": w,
         "b": b,
         "learning_rate": learning_rate,
         "num_iterations": num_iterations}
    return d
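An end-to-end smoke test on synthetic data (everything here is assumed: random linearly separable blobs stand in for the course's cat/non-cat images):

import numpy as np

np.random.seed(2)
n, m_train, m_test = 4, 200, 50
X_train = np.random.randn(n, m_train)
Y_train = (X_train.sum(axis=0, keepdims=True) > 0).astype(float)   # label = sign of feature sum
X_test = np.random.randn(n, m_test)
Y_test = (X_test.sum(axis=0, keepdims=True) > 0).astype(float)

d = model(X_train, Y_train, X_test, Y_test, num_iterations=1000, learning_rate=0.05)
train_acc = 100 - np.mean(np.abs(d["Y_prediction_train"] - Y_train)) * 100
print("train accuracy: %.1f%%" % train_acc)   # should be well above chance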
For the numpy functions used above, see www.numpy.org.