2. Hyperparameter tuning, Batch Normalization and Programming Frameworks - Note 3
1. Hyperparameter tuning
- Parameters to tune. The instructor's ranking of their importance (more * = more important):
  - $\alpha$ (learning rate) ***
  - $\beta$ (momentum, if not using Adam) **
  - $\beta_1, \beta_2, \epsilon$ (Adam): usually left at their defaults
  - #layers *
  - #hidden units **
  - learning rate decay *
  - mini-batch size **
- Sample random values in the hyperparameter space rather than using grid search; you can also search coarse to fine, progressively narrowing the range.
- Understand what each parameter means. For example, the exponentially weighted average parameter $\beta$ effectively averages over $\approx \frac{1}{1-\beta}$ past values, so sampling $\beta$ uniformly at random makes little sense; it should be sampled on a log scale (e.g., sample $1-\beta$ log-uniformly).
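A minimal sketch of log-scale random sampling (the search ranges $[10^{-4}, 10^{0}]$ for $\alpha$ and $[0.9, 0.999]$ for $\beta$ are illustrative assumptions, not from the course):

```python
import numpy as np

rng = np.random.default_rng(0)

# Learning rate: sample the exponent uniformly, so alpha is
# log-uniform in [1e-4, 1] rather than uniform on a linear scale.
r = rng.uniform(-4, 0)
alpha = 10 ** r

# Momentum: since beta averages over ~1/(1-beta) values, sample
# 1 - beta log-uniformly in [1e-3, 1e-1], giving beta in [0.9, 0.999].
r = rng.uniform(-3, -1)
beta = 1 - 10 ** r

print(f"alpha={alpha:.5f}, beta={beta:.4f}")
```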
2. Batch Normalization
Benefits:
- makes the network more robust;
- reduces sensitivity to the choice of hyperparameters;
- makes it easier to train deeper networks;
- provides a slight regularization effect.
Batch Norm normalizes $z$ (recall: $z = Wx + b$, $a = g(z)$):
$$
\begin{aligned}
\mu &= \frac{1}{m} \sum_i z_i \\
\sigma^2 &= \frac{1}{m} \sum_i (z_i - \mu)^2 \\
z_\text{norm}^{(i)} &= \frac{z^{(i)} - \mu}{\sqrt{\sigma^2 + \epsilon}} \\
\tilde{z}^{(i)} &= \gamma\, z_\text{norm}^{(i)} + \beta
\end{aligned}
$$
Here $\gamma$ and $\beta$ are not hyperparameters: they are determined during learning. Because the normalization subtracts the mean, the constant bias $b$ cancels out, so in the end only $W^{[i]}$, $\gamma^{[i]}$ and $\beta^{[i]}$ need to be learned, and they are learned in exactly the same way as $W$.
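A minimal NumPy sketch of the forward computation above (the function name, shapes, and the `eps` default are my own assumptions):

```python
import numpy as np

def batchnorm_forward(z, gamma, beta, eps=1e-8):
    """Normalize z over the mini-batch, then apply a learned scale and shift.

    z:     (n_units, m) pre-activations of one layer for a mini-batch of m
    gamma: (n_units, 1) learned scale
    beta:  (n_units, 1) learned shift
    """
    mu = z.mean(axis=1, keepdims=True)        # mu = (1/m) sum_i z_i
    var = z.var(axis=1, keepdims=True)        # sigma^2
    z_norm = (z - mu) / np.sqrt(var + eps)    # zero mean, unit variance
    return gamma * z_norm + beta              # z_tilde
```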
- Test time: $\mu$ and $\sigma^2$ are not available when predicting on a single example. Two options (sketched below):
  - estimate them with an exponentially weighted average over the training mini-batches, or
  - compute them once over the entire training set.
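A sketch of the first option under assumed names; the momentum value 0.9 placed on the old estimate is also an assumption:

```python
import numpy as np

def update_running_stats(z_batch, running_mu, running_var, momentum=0.9):
    """Update exponentially weighted estimates once per training mini-batch."""
    mu = z_batch.mean(axis=1, keepdims=True)
    var = z_batch.var(axis=1, keepdims=True)
    running_mu = momentum * running_mu + (1 - momentum) * mu
    running_var = momentum * running_var + (1 - momentum) * var
    return running_mu, running_var

def batchnorm_test(z, gamma, beta, running_mu, running_var, eps=1e-8):
    """At test time, normalize with the running estimates instead of batch stats."""
    z_norm = (z - running_mu) / np.sqrt(running_var + eps)
    return gamma * z_norm + beta
```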
3. Multi-class classification
Extend Logistic Regression to Softmax Regression:
$$
\mathcal{L}(\hat{y}, y) = - \sum_{j=1}^{n_\text{class}} y_j \log \hat{y}_j
$$
Just replace the output layer with an $n \times 1$ softmax output; otherwise the differences from logistic regression are small.
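A minimal NumPy sketch of the softmax output and the loss above (the max-subtraction and the small clamp inside the log are numerical-stability assumptions, not part of the notes):

```python
import numpy as np

def softmax(z):
    """z: (n_class, m) logits -> (n_class, m) class probabilities."""
    z = z - z.max(axis=0, keepdims=True)    # stability: exp of large z overflows
    e = np.exp(z)
    return e / e.sum(axis=0, keepdims=True)

def cross_entropy(y_hat, y):
    """L(y_hat, y) = -sum_j y_j log(y_hat_j), averaged over the m examples."""
    m = y.shape[1]
    return -np.sum(y * np.log(y_hat + 1e-12)) / m
```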
4. TensorFlow
You only need to implement the forward pass; the backward pass is computed for you automatically.
The basic TensorFlow workflow:
- Create Tensors (variables/placeholders) that are not yet executed/evaluated.
- Write operations between those Tensors to build the computation graph (tf.matmul, tf.add, ...).
- Initialize your Tensors.
- Create a Session.
- Run the Session on the "optimizer" object (using a feed dictionary to bind values to placeholder variables).
Example code:
```python
import tensorflow as tf

y_hat = tf.constant(36, name='y_hat')            # Define y_hat constant. Set to 36.
y = tf.constant(39, name='y')                    # Define y. Set to 39.
loss = tf.Variable((y - y_hat)**2, name='loss')  # Create a variable for the loss

init = tf.global_variables_initializer()         # When init is run later (session.run(init)),
                                                 # the loss variable will be initialized and ready to be computed

with tf.Session() as session:                    # Create a session and print the output
    session.run(init)                            # Initializes the variables
    print(session.run(loss))                     # (39 - 36)**2 = 9
```
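The example above uses no placeholders; as a minimal sketch of the feed-dictionary step from the list, assuming the same TensorFlow 1.x API (the names are mine):

```python
import tensorflow as tf

# A placeholder is a Tensor whose value is supplied only at run time.
x = tf.placeholder(tf.int64, name='x')

with tf.Session() as session:
    # feed_dict binds a concrete value to the placeholder for this run
    print(session.run(2 * x, feed_dict={x: 3}))  # prints 6
```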