MIT6.S191-L1_Introduction_to_Deep_Learning

The Perceptron

  • Forward Propagation

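    The figure here (not reproduced) illustrates forward propagation through a single perceptron: the output is a non-linear activation $g$ applied to a weighted sum of the inputs plus a bias,

    $\hat{y} = g\left(w_0 + \sum_{i=1}^{m} x_i\, w_i\right)$

    A minimal TensorFlow sketch of this forward pass (the input and weight values are just illustrative):

    import tensorflow as tf

    x = tf.constant([[1.0, 2.0, 3.0]])      # one example with m = 3 inputs
    W = tf.constant([[0.1], [0.2], [0.3]])  # weights, shape (m, 1)
    b = tf.constant([0.5])                  # bias w0

    z = tf.matmul(x, W) + b                 # weighted sum plus bias
    y_hat = tf.math.sigmoid(z)              # non-linear activation g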

Common Activation Functions

NOTE: All activation functions are non-linear

  1. Sigmoid

  2. Hyperbolic Tangent

  3. Rectified Linear Unit (ReLU)
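
A minimal TensorFlow sketch of the three activations above (input values are just illustrative):

    import tensorflow as tf

    z = tf.constant([-1.0, 0.0, 1.0])

    tf.math.sigmoid(z)  # squashes each value into (0, 1)
    tf.math.tanh(z)     # squashes each value into (-1, 1)
    tf.nn.relu(z)       # max(0, z), keeps only the positive part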

Multi Output Perceptron

Because all inputs are densely connected to all outputs, these layers are called Dense (fully connected) layers.

  • TensorFlow implementation of a Dense layer

    class MyDenseLayer(tf.keras.layers.Layer):
        def __init__(self, input_dim, output_dim):
            super(MyDenseLayer, self).__init__()

            # Initialize weights and bias
            self.W = self.add_weight(shape=[input_dim, output_dim])
            self.b = self.add_weight(shape=[1, output_dim])

        def call(self, inputs):
            # Forward propagate the inputs
            z = tf.matmul(inputs, self.W) + self.b

            # Feed through a non-linear activation
            output = tf.math.sigmoid(z)

            return output
    

    The corresponding Keras implementation:

    import tensorflow as tf
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense

    layer = tf.keras.layers.Dense(units=2)

    # Example
    # as the first layer in a Sequential model:
    model = Sequential()
    model.add(Dense(32, input_shape=(16,)))
    # now the model will take as input arrays of shape (*, 16)
    # and output arrays of shape (*, 32)

    # after the first layer, you don't need to specify
    # the size of the input anymore:
    model.add(Dense(32))
    

Applying Neural Networks

  • Quantifying Loss

    The loss of our neural network measures the cost incurred from incorrect predictions

  • Empirical Loss

    Also known as:

    • Objective function
    • Cost function
    • Empirical Risk

    The empirical loss measures the total loss over our entire dataset
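
    (The slide's formula, not reproduced here, is the standard empirical loss: the average of the per-example losses over the n training examples.)

    $J(W) = \frac{1}{n}\sum_{i=1}^{n} \mathcal{L}\left(f\left(x^{(i)}; W\right),\; y^{(i)}\right)$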

  • Binary Cross Entropy Loss

    Cross Entropy Loss can be used with models that output a probability between 0 and 1

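    The formula on the slide is the standard binary cross-entropy, written here from that definition:

    $J(W) = -\frac{1}{n}\sum_{i=1}^{n}\left[\, y^{(i)} \log \hat{y}^{(i)} + \left(1 - y^{(i)}\right)\log\left(1 - \hat{y}^{(i)}\right) \right]$

    where $\hat{y}^{(i)} = f(x^{(i)}; W)$ is the predicted probability for example $i$.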

loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=predicted))
  • Mean Square Error Loss

    MSE loss can be used with regression models that output continuous real numbers

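    The slide's formula is the standard mean squared error over the dataset:

    $J(W) = \frac{1}{n}\sum_{i=1}^{n}\left(y^{(i)} - \hat{y}^{(i)}\right)^2$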

loss = tf.reduce_mean(tf.square(tf.subtract(y, predicted)))

Training Neural Networks

  • Loss Optimization

    We want to find the network weights that achieve the lowest loss

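    Written out, the objective is to minimize the empirical loss defined above:

    $W^{*} = \underset{W}{\operatorname{argmin}}\; J(W)$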

  • Gradient Descent
    1. Initialize weights randomly, drawn from a normal distribution $\mathcal{N}(0, \sigma^2)$
    2. Loop until convergence:
    3.     Compute gradient, $\frac{\partial J(W)}{\partial W}$
    4.     Update weights, $W \leftarrow W - \eta \frac{\partial J(W)}{\partial W}$
    5. Return weights
import tensorflow as tf

# initialize weights randomly (the shape here is just illustrative)
weights = tf.Variable(tf.random.normal(shape=[1]))

while True:  # loop until convergence
    with tf.GradientTape() as g:
        loss = compute_loss(weights)

    # gradient of the loss with respect to the weights
    gradient = g.gradient(loss, weights)

    # update: W <- W - lr * dJ/dW (lr is the learning rate)
    weights.assign_sub(lr * gradient)
  • Computing Gradients: Backpropagation

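    The figure (not reproduced) walks the chain rule backwards through a small network with input $x$, hidden activation $z_1$, and output $\hat{y}$. For a weight $w_1$ feeding $z_1$, the gradient decomposes as

    $\frac{\partial J(W)}{\partial w_1} = \frac{\partial J(W)}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z_1} \cdot \frac{\partial z_1}{\partial w_1}$

    Repeating this decomposition for every weight, from the output layer back to the input layer, is backpropagation.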

  • Loss Functions Can Be Difficult to Optimize

    Setting the Learning Rate
    • A small learning rate converges slowly and gets stuck in spurious local minima
    • A large learning rate overshoots, becomes unstable, and diverges
    • A stable learning rate converges smoothly and avoids local minima
  • Adaptive Learning Rates

    Optimization methods for gradient descent: https://ruder.io/optimizing-gradient-descent/

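    A few of the optimizer implementations available in tf.keras.optimizers (the learning rates shown are just illustrative):

    import tensorflow as tf

    tf.keras.optimizers.SGD(learning_rate=1e-2)      # plain stochastic gradient descent
    tf.keras.optimizers.Adam(learning_rate=1e-3)     # adaptive moment estimation
    tf.keras.optimizers.Adadelta()                   # per-dimension adaptive learning rates
    tf.keras.optimizers.Adagrad(learning_rate=1e-2)
    tf.keras.optimizers.RMSprop(learning_rate=1e-3)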

import tensorflow as tf

model = tf.keras.Sequential([...])

# pick your favourite optimizer
optimizer = tf.keras.optimizers.SGD()

while True:  # loop forever

    with tf.GradientTape() as tape:
        # forward pass through the network
        prediction = model(x)

        # compute the loss
        loss = compute_loss(y, prediction)

    # update the weights using the gradient
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
  • Overfitting

    How to deal with overfitting: use regularization.

  • Regularization

    Technique that constrains our optimization problem to discourage complex models.

    Common regularization techniques:

    1. Dropout

      During training, randomly set some activations to 0

      tf.keras.layers.Dropout(rate=0.5)
      
    2. Early Stopping

      Stop training before we have a chance to overfit


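      In Keras, early stopping is available as a callback; a minimal sketch (the monitored metric and patience value are just illustrative):

      import tensorflow as tf

      # stop training once the validation loss has not improved for 5 epochs
      early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)

      # hypothetical usage: pass the callback to model.fit
      # model.fit(x_train, y_train, validation_split=0.2,
      #           epochs=100, callbacks=[early_stop])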
