Background:
Normalizing the data speeds up the convergence of gradient descent.
Normalization:
Method: compute $\frac{x}{\|x\|}$, i.e., divide each row of $x$ by the norm of that row vector.
For example:
$$x = \begin{bmatrix} 0 & 3 & 4 \\ 2 & 6 & 4 \end{bmatrix} \tag{1}$$
Computing the norm:
$$\|x\| = \texttt{np.linalg.norm(x, axis=1, keepdims=True)} = \begin{bmatrix} 5 \\ \sqrt{56} \end{bmatrix} \tag{2}$$
The normalized result:
$$\text{x\_normalized} = \frac{x}{\|x\|} = \begin{bmatrix} 0 & \frac{3}{5} & \frac{4}{5} \\ \frac{2}{\sqrt{56}} & \frac{6}{\sqrt{56}} & \frac{4}{\sqrt{56}} \end{bmatrix} \tag{3}$$
The reason we can divide two matrices of different sizes like this is NumPy's broadcasting mechanism.
Python implementation:
import numpy as np

# GRADED FUNCTION: normalizeRows
def normalizeRows(x):
    """
    Implement a function that normalizes each row of the matrix x (to have unit length).

    Argument:
    x -- A numpy matrix of shape (n, m)

    Returns:
    x -- The normalized (by row) numpy matrix. You are allowed to modify x.
    """
    ### START CODE HERE ### (≈ 2 lines of code)
    # Compute x_norm as the L2 norm of each row. Use np.linalg.norm(..., ord = 2, axis = ..., keepdims = True)
    x_norm = np.linalg.norm(x, axis=1, keepdims=True)
    print("size of x=", np.shape(x))
    print("size of x_norm=", np.shape(x_norm))
    # Divide x by its norm.
    x = x / x_norm
    ### END CODE HERE ###
    return x

x = np.array([
    [0, 3, 4],
    [1, 6, 4]])
print("normalizeRows(x) = " + str(normalizeRows(x)))
Output:
size of x= (2, 3)
size of x_norm= (2, 1)
normalizeRows(x) = [[ 0.          0.6         0.8       ]
 [ 0.13736056  0.82416338  0.54944226]]
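As a quick sanity check (not part of the original assignment), every row of the result should now have unit length; a minimal verification, reusing normalizeRows and x from above:

print(np.linalg.norm(normalizeRows(x), axis=1))   # -> [ 1.  1.]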
Broadcasting:
From the printed shapes of x and x_norm above, we can see that the two matrices have different sizes. How, then, is the operation between them carried out?
When operating on two arrays, NumPy compares their shapes element-wise, starting from the trailing dimension and working forward. Two dimensions are compatible when:
1. they are equal, or
2. one of them is 1.
If neither condition holds, a ValueError (operands could not be broadcast together) is raised, indicating that the two arrays have incompatible shapes. The resulting array has, along each dimension, the maximum of the input arrays' sizes along that dimension.
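For instance, here is a minimal sketch of a compatible case (the shapes are chosen purely for illustration):

import numpy as np

a = np.arange(6).reshape(2, 3)   # shape (2, 3)
b = np.array([[10], [20]])       # shape (2, 1)

# Trailing dimensions: 3 vs 1 -> compatible (one is 1).
# Leading dimensions:  2 vs 2 -> compatible (equal).
# b is stretched to shape (2, 3), so the result has shape (2, 3).
print(a + b)
# [[10 11 12]
#  [23 24 25]]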
The following examples cannot be broadcast:

A      (1d array):  3
B      (1d array):  4            # trailing dimensions do not match (3 vs 4)

A      (2d array):      2 x 1
B      (3d array):  8 x 4 x 3    # second-from-last dimensions do not match (2 vs 4)
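Trying the first incompatible pair reproduces the error (a minimal demonstration):

import numpy as np

try:
    np.ones(3) + np.ones(4)   # shapes (3,) and (4,): trailing dims 3 vs 4, incompatible
except ValueError as e:
    print(e)   # operands could not be broadcast together with shapes (3,) (4,)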
Example:
The softmax function below provides a further illustration:
$$\text{for } x \in \mathbb{R}^{1 \times n},\quad softmax(x) = softmax([x_1 \;\; x_2 \;\; \dots \;\; x_n]) = \left[\frac{e^{x_1}}{\sum_j e^{x_j}} \;\; \frac{e^{x_2}}{\sum_j e^{x_j}} \;\; \dots \;\; \frac{e^{x_n}}{\sum_j e^{x_j}}\right]$$
for a matrix $x \in \mathbb{R}^{m \times n}$, $x_{ij}$ maps to the element in the $i$-th row and $j$-th column of $x$, thus we have:
$$softmax(x) = softmax\begin{bmatrix} x_{11} & x_{12} & x_{13} & \dots & x_{1n} \\ x_{21} & x_{22} & x_{23} & \dots & x_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & x_{m3} & \dots & x_{mn} \end{bmatrix} = \begin{bmatrix} \frac{e^{x_{11}}}{\sum_j e^{x_{1j}}} & \frac{e^{x_{12}}}{\sum_j e^{x_{1j}}} & \frac{e^{x_{13}}}{\sum_j e^{x_{1j}}} & \dots & \frac{e^{x_{1n}}}{\sum_j e^{x_{1j}}} \\ \frac{e^{x_{21}}}{\sum_j e^{x_{2j}}} & \frac{e^{x_{22}}}{\sum_j e^{x_{2j}}} & \frac{e^{x_{23}}}{\sum_j e^{x_{2j}}} & \dots & \frac{e^{x_{2n}}}{\sum_j e^{x_{2j}}} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \frac{e^{x_{m1}}}{\sum_j e^{x_{mj}}} & \frac{e^{x_{m2}}}{\sum_j e^{x_{mj}}} & \frac{e^{x_{m3}}}{\sum_j e^{x_{mj}}} & \dots & \frac{e^{x_{mn}}}{\sum_j e^{x_{mj}}} \end{bmatrix} = \begin{pmatrix} softmax(\text{first row of } x) \\ softmax(\text{second row of } x) \\ \vdots \\ softmax(\text{last row of } x) \end{pmatrix}$$
Python implementation:
# GRADED FUNCTION: softmax
def softmax(x):
    """Calculates the softmax for each row of the input x.

    Your code should work for a row vector and also for matrices of shape (n, m).

    Argument:
    x -- A numpy matrix of shape (n, m)

    Returns:
    s -- A numpy matrix equal to the softmax of x, of shape (n, m)
    """
    ### START CODE HERE ### (≈ 3 lines of code)
    # Apply exp() element-wise to x. Use np.exp(...).
    x_exp = np.exp(x)
    # Create a vector x_sum that sums each row of x_exp. Use np.sum(..., axis = 1, keepdims = True).
    x_sum = np.sum(x_exp, axis=1, keepdims=True)
    # Compute softmax(x) by dividing x_exp by x_sum. It should automatically use numpy broadcasting.
    s = x_exp / x_sum
    ### END CODE HERE ###
    return s

x = np.array([
    [9, 2, 5, 0, 0],
    [7, 5, 0, 0, 0]])
print("softmax(x) = " + str(softmax(x)))
Output:
softmax(x) = [[ 9.80897665e-01 8.94462891e-04 1.79657674e-02 1.21052389e-04
1.21052389e-04]
[ 8.78679856e-01 1.18916387e-01 8.01252314e-04 8.01252314e-04
8.01252314e-04]]
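As a sanity check (not part of the original assignment), each row of the output is a probability distribution and should therefore sum to 1; reusing softmax and x from above:

print(np.sum(softmax(x), axis=1))   # -> [ 1.  1.]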
Note: broadcasting applies to element-wise operations between arrays with the +, -, *, and / operators; it does not apply to matrix multiplication with np.dot, which follows its own rule (the inner dimensions must match).
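A minimal sketch of the difference (the arrays here are chosen purely for illustration): the element-wise * broadcasts, while np.dot requires the inner dimensions to match.

import numpy as np

a = np.array([[1., 2., 3.],
              [4., 5., 6.]])       # shape (2, 3)
b = np.array([[10., 20., 30.]])    # shape (1, 3)

print(a * b)            # element-wise product; b is broadcast to (2, 3)
print(np.dot(a, b.T))   # matrix product: (2, 3) x (3, 1) -> (2, 1)
# np.dot(a, b) would raise a ValueError, because the inner dimensions (3 and 1) do not match.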
Example 2:
# a.shape = (3,4)
# b.shape = (4,1)
for i in range(3):
    for j in range(4):
        c[i][j] = a[i][j] + b[j]
Which of the following expressions corresponds to this loop?
- c = a + b
- c = a.T + b.T
- c = a + b.T
- c = a.T + b
Applying the broadcasting rules, the answer is the third one, c = a + b.T: b.T has shape (1, 4), which broadcasts against a's shape (3, 4), so c[i][j] = a[i][j] + b[j][0], exactly what the loop computes. (By contrast, c = a + b fails because the second-from-last dimensions, 3 and 4, are incompatible.)
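A quick numerical check of this answer (a and b here are random arrays, chosen only for illustration; b[j] in the pseudocode above denotes the j-th entry of b, i.e. b[j, 0]):

import numpy as np

a = np.random.rand(3, 4)
b = np.random.rand(4, 1)

# The explicit double loop:
c_loop = np.zeros((3, 4))
for i in range(3):
    for j in range(4):
        c_loop[i, j] = a[i, j] + b[j, 0]

# The broadcast version: b.T has shape (1, 4), broadcast against (3, 4).
c_broadcast = a + b.T

print(np.allclose(c_loop, c_broadcast))   # True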