The Perceptron
- Forward Propagation
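As a quick sketch of the forward pass (notation assumed here: inputs $x_i$, weights $w_i$, bias $w_0$, and non-linear activation $g$), a single perceptron computes:

$$\hat{y} = g\left(w_0 + \sum_{i=1}^{m} x_i w_i\right)$$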
Common Activation Functions
NOTE: All activation functions are non-linear
-
Sigmoid
-
Hyperbolic Tangent
-
Rectified Linear Unit (ReLU)
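A minimal sketch of how these three activations are exposed in TensorFlow (`z` is an arbitrary example tensor of pre-activations):

```python
import tensorflow as tf

z = tf.constant([-1.0, 0.0, 1.0])  # example pre-activations

sigmoid_out = tf.math.sigmoid(z)   # squashes values into (0, 1)
tanh_out    = tf.math.tanh(z)      # squashes values into (-1, 1)
relu_out    = tf.nn.relu(z)        # max(0, z)
```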
Multi-Output Perceptron
Because all inputs are densely connected to all outputs, these layers are called Dense (fully connected) layers.
-
TensorFlow implementation of a Dense layer:
```python
import tensorflow as tf

class MyDenseLayer(tf.keras.layers.Layer):
    def __init__(self, input_dim, output_dim):
        super(MyDenseLayer, self).__init__()
        # Initialize weights and bias
        self.W = self.add_weight(shape=[input_dim, output_dim])
        self.b = self.add_weight(shape=[1, output_dim])

    def call(self, inputs):
        # Forward propagate the inputs
        z = tf.matmul(inputs, self.W) + self.b
        # Feed through a non-linear activation
        output = tf.math.sigmoid(z)
        return output
```
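A quick usage sketch of the custom layer above (batch size and dimensions are arbitrary example values):

```python
import tensorflow as tf

layer = MyDenseLayer(input_dim=3, output_dim=2)
x = tf.random.normal((4, 3))  # batch of 4 examples, 3 features each
y = layer(x)                  # forward pass; output has shape (4, 2)
```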
The corresponding Keras implementation:
```python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

layer = tf.keras.layers.Dense(units=2)  # Example

# As the first layer in a sequential model:
model = Sequential()
model.add(Dense(32, input_shape=(16,)))
# Now the model will take as input arrays of shape (*, 16)
# and output arrays of shape (*, 32).
# After the first layer, you don't need to specify
# the size of the input anymore:
model.add(Dense(32))
```
Applying Neural Networks
-
Quantifying Loss
The loss of our neural network measures the cost incurred from incorrect predictions
-
Empirical Loss
Also known as:
- **Objective function**
- Cost function
- Empirical Risk
The empirical loss measures the total loss over our entire dataset
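As a sketch (with $n$ training examples, per-example loss $\mathcal{L}$, and prediction $f(x^{(i)}; W)$, notation assumed here):

$$J(W) = \frac{1}{n} \sum_{i=1}^{n} \mathcal{L}\left(f\left(x^{(i)}; W\right), y^{(i)}\right)$$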
-
Binary Cross Entropy Loss
Cross Entropy Loss can be used with models that output a probability between 0 and 1
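A sketch of the binary cross entropy loss in the same notation (true label $y^{(i)}$, predicted probability $f(x^{(i)}; W)$):

$$J(W) = -\frac{1}{n} \sum_{i=1}^{n} \left[ y^{(i)} \log f\left(x^{(i)}; W\right) + \left(1 - y^{(i)}\right) \log\left(1 - f\left(x^{(i)}; W\right)\right) \right]$$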
```python
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=predicted)
)
```
-
Mean Squared Error Loss
MSE loss can be used with regression models that output continuous real numbers
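In the same notation, a sketch of the MSE loss:

$$J(W) = \frac{1}{n} \sum_{i=1}^{n} \left( y^{(i)} - f\left(x^{(i)}; W\right) \right)^2$$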
```python
loss = tf.reduce_mean(
    tf.square(tf.subtract(y, predicted))
)
```
Training Neural Networks
-
Loss Optimization
We want to find the network weights that achieve the lowest loss
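In symbols (a sketch, with $J(W)$ the empirical loss defined above):

$$W^{*} = \underset{W}{\arg\min}\ J(W)$$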
- Gradient Descent
  - Initialize weights randomly from a normal distribution $N(0, \sigma^2)$
  - Loop until convergence:
    - Compute gradient, $\frac{\partial J(W)}{\partial W}$
    - Update weights, $W \Leftarrow W - \eta \frac{\partial J(W)}{\partial W}$
  - Return weights
```python
import tensorflow as tf

lr = 0.01  # learning rate (example value)
weights = tf.Variable(tf.random.normal((1,)))  # initialize weights randomly

while True:  # loop until convergence
    with tf.GradientTape() as g:
        loss = compute_loss(weights)      # compute_loss defined elsewhere
    gradient = g.gradient(loss, weights)  # gradient of the loss w.r.t. the weights
    weights.assign_sub(lr * gradient)     # update weights: W <- W - lr * dJ/dW
```
- Computing Gradients: Backpropagation
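As a sketch of how backpropagation applies the chain rule (for a toy network $x \to z_1 \to \hat{y}$ with weights $w_1$ and $w_2$, notation assumed here):

$$\frac{\partial J(W)}{\partial w_1} = \frac{\partial J(W)}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z_1} \cdot \frac{\partial z_1}{\partial w_1}$$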
-
Loss Functions Can Be Difficult to Optimize
Setting the Learning Rate
- A small learning rate converges slowly and gets stuck in spurious local minima
- A large learning rate overshoots, becoming unstable and diverging
- A well-chosen, stable learning rate converges smoothly and avoids poor local minima
-
Adaptive Learning Rates
Gradient descent optimization methods: https://ruder.io/optimizing-gradient-descent/
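A minimal sketch of the adaptive optimizers exposed by `tf.keras.optimizers` (hyperparameters are left at their defaults; see the survey linked above for how each algorithm adapts the learning rate):

```python
import tensorflow as tf

# Common gradient descent variants in tf.keras.optimizers
optimizers = {
    "SGD":      tf.keras.optimizers.SGD(),       # plain SGD (optionally with momentum)
    "Adam":     tf.keras.optimizers.Adam(),
    "Adadelta": tf.keras.optimizers.Adadelta(),
    "Adagrad":  tf.keras.optimizers.Adagrad(),
    "RMSprop":  tf.keras.optimizers.RMSprop(),
}
```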
```python
import tensorflow as tf

model = tf.keras.Sequential([...])

# pick your favourite optimizer
optimizer = tf.keras.optimizers.SGD()

while True:  # loop forever
    with tf.GradientTape() as tape:
        # forward pass through the network
        prediction = model(x)
        # compute the loss
        loss = compute_loss(y, prediction)

    # update the weights using the gradient
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
```
-
Overfitting
How to address overfitting: use regularization
-
Regularization
Technique that constrains our optimization problem to discourage complex models.
Common regularization methods:
-
Dropout
During training, randomly set some activations to 0
```python
tf.keras.layers.Dropout(rate=0.5)
```
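A minimal usage sketch (layer sizes are arbitrary example values), placing Dropout between Dense layers so roughly half of the activations are zeroed during training:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(rate=0.5),  # randomly drop 50% of activations during training
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(rate=0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
```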
-
Early Stopping
Stop training before we have a chance to overfit
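One way to do this in Keras is the `EarlyStopping` callback; a minimal sketch (the patience value is an arbitrary example), monitoring the validation loss and stopping once it stops improving:

```python
import tensorflow as tf

# Stop training once the validation loss has not improved for 5 epochs,
# and restore the weights from the best epoch seen so far
early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=5,
    restore_best_weights=True,
)

# Example usage (model, data, and epochs assumed to be defined elsewhere):
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=100, callbacks=[early_stopping])
```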
-