Background:
Normalizing the data speeds up the convergence of gradient descent.
Normalization:
Method: compute $\frac{x}{\|x\|}$, i.e., divide each row of $x$ by the norm of that row vector.
For example:
$$x = \begin{bmatrix} 0 & 3 & 4 \\ 2 & 6 & 4 \end{bmatrix} \tag{1}$$
Computing the norm:
$$\|x\| = \texttt{np.linalg.norm(x, axis=1, keepdims=True)} = \begin{bmatrix} 5 \\ \sqrt{56} \end{bmatrix} \tag{2}$$
The normalized result:
$$\text{x\_normalized} = \frac{x}{\|x\|} = \begin{bmatrix} 0 & \frac{3}{5} & \frac{4}{5} \\ \frac{2}{\sqrt{56}} & \frac{6}{\sqrt{56}} & \frac{4}{\sqrt{56}} \end{bmatrix} \tag{3}$$
The reason we can divide two matrices of different sizes like this is NumPy's broadcasting mechanism.
Python implementation:
import numpy as np

# GRADED FUNCTION: normalizeRows
def normalizeRows(x):
    """
    Implement a function that normalizes each row of the matrix x (to have unit length).

    Argument:
    x -- A numpy matrix of shape (n, m)

    Returns:
    x -- The normalized (by row) numpy matrix. You are allowed to modify x.
    """
    ### START CODE HERE ### (≈ 2 lines of code)
    # Compute x_norm as the L2 norm of each row. Use np.linalg.norm(..., ord = 2, axis = ..., keepdims = True)
    x_norm = np.linalg.norm(x, axis=1, keepdims=True)
    print("size of x=", np.shape(x))
    print("size of x_norm=", np.shape(x_norm))
    # Divide x by its norm.
    x = x / x_norm
    ### END CODE HERE ###
    return x

x = np.array([
    [0, 3, 4],
    [1, 6, 4]])
print("normalizeRows(x) = " + str(normalizeRows(x)))
Output:
size of x= (2, 3)
size of x_norm= (2, 1)
normalizeRows(x) = [[ 0.          0.6         0.8       ]
 [ 0.13736056  0.82416338  0.54944226]]
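As a quick sanity check (not part of the original assignment), every row of the result should now have unit length; a minimal verification, reusing normalizeRows and x from above:

print(np.linalg.norm(normalizeRows(x), axis=1))   # -> [ 1.  1.]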
Broadcasting:
From the printed shapes of x and x_norm above, we can see that the two matrices have different sizes. How, then, is the operation between them carried out?
When operating on two arrays, NumPy compares their shapes element-wise, starting from the trailing dimension and working forward. Two dimensions are compatible when:
1. they are equal, or
2. one of them is 1.
If neither condition holds, a ValueError (operands could not be broadcast together) is raised, indicating that the two arrays have incompatible shapes. The resulting array has, along each dimension, the maximum of the input arrays' sizes along that dimension.
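For instance, here is a minimal sketch of a compatible case (the shapes are chosen purely for illustration):

import numpy as np

a = np.arange(6).reshape(2, 3)   # shape (2, 3)
b = np.array([[10], [20]])       # shape (2, 1)

# Trailing dimensions: 3 vs 1 -> compatible (one is 1).
# Leading dimensions:  2 vs 2 -> compatible (equal).
# b is stretched to shape (2, 3), so the result has shape (2, 3).
print(a + b)
# [[10 11 12]
#  [23 24 25]]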
The following examples cannot be broadcast:

A      (1d array):  3
B      (1d array):  4            # trailing dimensions do not match (3 vs 4)

A      (2d array):      2 x 1
B      (3d array):  8 x 4 x 3    # second-from-last dimensions do not match (2 vs 4)
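Trying the first incompatible pair reproduces the error (a minimal demonstration):

import numpy as np

try:
    np.ones(3) + np.ones(4)   # shapes (3,) and (4,): trailing dims 3 vs 4, incompatible
except ValueError as e:
    print(e)   # operands could not be broadcast together with shapes (3,) (4,)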
Example:
The softmax function below provides a further illustration:
$$\text{for } x \in \mathbb{R}^{1 \times n},\quad softmax(x) = softmax([x_1 \;\; x_2 \;\; \dots \;\; x_n]) = \left[\frac{e^{x_1}}{\sum_j e^{x_j}} \;\; \frac{e^{x_2}}{\sum_j e^{x_j}} \;\; \dots \;\; \frac{e^{x_n}}{\sum_j e^{x_j}}\right]$$
for a matrix $x \in \mathbb{R}^{m \times n}$, $x_{ij}$ maps to the element in the $i$-th row and $j$-th column of $x$, thus we have:
$$softmax(x) = softmax\begin{bmatrix} x_{11} & x_{12} & x_{13} & \dots & x_{1n} \\ x_{21} & x_{22} & x_{23} & \dots & x_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & x_{m3} & \dots & x_{mn} \end{bmatrix} = \begin{bmatrix} \frac{e^{x_{11}}}{\sum_j e^{x_{1j}}} & \frac{e^{x_{12}}}{\sum_j e^{x_{1j}}} & \frac{e^{x_{13}}}{\sum_j e^{x_{1j}}} & \dots & \frac{e^{x_{1n}}}{\sum_j e^{x_{1j}}} \\ \frac{e^{x_{21}}}{\sum_j e^{x_{2j}}} & \frac{e^{x_{22}}}{\sum_j e^{x_{2j}}} & \frac{e^{x_{23}}}{\sum_j e^{x_{2j}}} & \dots & \frac{e^{x_{2n}}}{\sum_j e^{x_{2j}}} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \frac{e^{x_{m1}}}{\sum_j e^{x_{mj}}} & \frac{e^{x_{m2}}}{\sum_j e^{x_{mj}}} & \frac{e^{x_{m3}}}{\sum_j e^{x_{mj}}} & \dots & \frac{e^{x_{mn}}}{\sum_j e^{x_{mj}}} \end{bmatrix} = \begin{pmatrix} softmax(\text{first row of } x) \\ softmax(\text{second row of } x) \\ \vdots \\ softmax(\text{last row of } x) \end{pmatrix}$$
Python implementation:
# GRADED FUNCTION: softmax
def softmax(x):
    """Calculates the softmax for each row of the input x.

    Your code should work for a row vector and also for matrices of shape (n, m).

    Argument:
    x -- A numpy matrix of shape (n, m)

    Returns:
    s -- A numpy matrix equal to the softmax of x, of shape (n, m)
    """
    ### START CODE HERE ### (≈ 3 lines of code)
    # Apply exp() element-wise to x. Use np.exp(...).
    x_exp = np.exp(x)
    # Create a vector x_sum that sums each row of x_exp. Use np.sum(..., axis = 1, keepdims = True).
    x_sum = np.sum(x_exp, axis=1, keepdims=True)
    # Compute softmax(x) by dividing x_exp by x_sum. It should automatically use numpy broadcasting.
    s = x_exp / x_sum
    ### END CODE HERE ###
    return s

x = np.array([
    [9, 2, 5, 0, 0],
    [7, 5, 0, 0, 0]])
print("softmax(x) = " + str(softmax(x)))
Output:
softmax(x) = [[ 9.80897665e-01 8.94462891e-04 1.79657674e-02 1.21052389e-04
1.21052389e-04]
[ 8.78679856e-01 1.18916387e-01 8.01252314e-04 8.01252314e-04
8.01252314e-04]]
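As a sanity check (not part of the original assignment), each row of the output is a probability distribution and should therefore sum to 1; reusing softmax and x from above:

print(np.sum(softmax(x), axis=1))   # -> [ 1.  1.]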
Note: broadcasting applies to element-wise operations between arrays with the +, -, *, and / operators; it does not apply to matrix multiplication with np.dot, which follows its own rule (the inner dimensions must match).
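A minimal sketch of the difference (the arrays here are chosen purely for illustration): the element-wise * broadcasts, while np.dot requires the inner dimensions to match.

import numpy as np

a = np.array([[1., 2., 3.],
              [4., 5., 6.]])       # shape (2, 3)
b = np.array([[10., 20., 30.]])    # shape (1, 3)

print(a * b)            # element-wise product; b is broadcast to (2, 3)
print(np.dot(a, b.T))   # matrix product: (2, 3) x (3, 1) -> (2, 1)
# np.dot(a, b) would raise a ValueError, because the inner dimensions (3 and 1) do not match.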
Example 2:
# a.shape = (3,4)
# b.shape = (4,1)
for i in range(3):
    for j in range(4):
        c[i][j] = a[i][j] + b[j]
Which of the following expressions corresponds to this loop?
- c = a + b
- c = a.T + b.T
- c = a + b.T
- c = a.T + b
Applying the broadcasting rules, the answer is the third one, c = a + b.T: b.T has shape (1, 4), which broadcasts against a's shape (3, 4), so c[i][j] = a[i][j] + b[j][0], exactly what the loop computes. (By contrast, c = a + b fails because the second-from-last dimensions, 3 and 4, are incompatible.)
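A quick numerical check of this answer (a and b here are random arrays, chosen only for illustration; b[j] in the pseudocode above denotes the j-th entry of b, i.e. b[j, 0]):

import numpy as np

a = np.random.rand(3, 4)
b = np.random.rand(4, 1)

# The explicit double loop:
c_loop = np.zeros((3, 4))
for i in range(3):
    for j in range(4):
        c_loop[i, j] = a[i, j] + b[j, 0]

# The broadcast version: b.T has shape (1, 4), broadcast against (3, 4).
c_broadcast = a + b.T

print(np.allclose(c_loop, c_broadcast))   # True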