Gradients for vectorized operations
The above sections were concerned with single variables, but all concepts extend in a straightforward manner to matrix and vector operations. However, one must pay closer attention to dimensions and transpose operations.
Matrix-Matrix multiply gradient. Possibly the most tricky operation is the matrix-matrix multiplication (which generalizes all matrix-vector and vector-vector multiply operations):
import numpy as np

# forward pass
W = np.random.randn(5, 10)
X = np.random.randn(10, 3)
D = W.dot(X)

# now suppose we had the gradient on D from above in the circuit
dD = np.random.randn(*D.shape) # same shape as D
dW = dD.dot(X.T) # .T gives the transpose of the matrix
dX = W.T.dot(dD)
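As a sanity check, these analytic gradients can be verified numerically. A minimal sketch, assuming the upstream gradient dD comes from the scalar loss np.sum(D * dD), whose gradient with respect to D is exactly dD:

```python
import numpy as np

np.random.seed(0)
W = np.random.randn(5, 10)
X = np.random.randn(10, 3)
dD = np.random.randn(5, 3)  # pretend upstream gradient

# scalar loss whose gradient w.r.t. D = W.dot(X) is exactly dD
loss = lambda W, X: np.sum(W.dot(X) * dD)

# analytic gradients from the backward pass above
dW = dD.dot(X.T)
dX = W.T.dot(dD)

# numerical gradient of the loss w.r.t. W via centered differences
h = 1e-5
dW_num = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp = W.copy(); Wp[i, j] += h
        Wm = W.copy(); Wm[i, j] -= h
        dW_num[i, j] = (loss(Wp, X) - loss(Wm, X)) / (2 * h)

print(np.allclose(dW, dW_num))  # True
```

The same loop over the entries of X confirms dX; this kind of numerical gradient check is slow but invaluable when debugging a new backward pass.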
Tip: use dimension analysis! Note that you do not need to remember the expressions for dW and dX because they are easy to re-derive based on dimensions. For instance, we know that the gradient on the weights dW must be of the same size as W after it is computed, and that it must depend on a matrix multiplication of X and dD (as is the case when both X, W are single numbers and not matrices). There is always exactly one way of achieving this so that the dimensions work out. For example, X is of size [10 x 3] and dD is of size [5 x 3], so if we want dW, which must have the same shape as W, [5 x 10], then the only way of achieving this is with dD.dot(X.T), as shown above.
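This dimension bookkeeping can be checked directly in code. A small sketch, using the same shapes as in the example above:

```python
import numpy as np

W = np.random.randn(5, 10)
X = np.random.randn(10, 3)
dD = np.random.randn(5, 3)   # same shape as D = W.dot(X)

dW = dD.dot(X.T)  # [5 x 3] . [3 x 10] -> [5 x 10], matches W
dX = W.T.dot(dD)  # [10 x 5] . [5 x 3] -> [10 x 3], matches X

assert dW.shape == W.shape
assert dX.shape == X.shape

# the alternative orderings do not even type-check:
# X.dot(dD) raises a ValueError, since [10 x 3] . [5 x 3] is invalid
```

Asserting the gradient shapes right after computing them catches most transpose mistakes immediately.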
Work with small, explicit examples. Some people may find it difficult at first to derive the gradient updates for some vectorized expressions. Our recommendation is to explicitly write out a minimal vectorized example, derive the gradient on paper and then generalize the pattern to its efficient, vectorized form.
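For instance, one might start from the smallest case, a single dot product y = w.dot(x), where the per-element gradients dy/dw[i] = x[i] and dy/dx[i] = w[i] can be written out by hand. A minimal sketch of this workflow (the specific numbers are illustrative):

```python
import numpy as np

# minimal case: y = sum_i w[i] * x[i], a single dot product
w = np.array([1.0, 2.0, 3.0])
x = np.array([4.0, 5.0, 6.0])
y = w.dot(x)

# derived on paper: dy/dw[i] = x[i] and dy/dx[i] = w[i]
dw = x.copy()
dx = w.copy()

# numerical check of one component, e.g. dy/dw[0]
h = 1e-5
w2 = w.copy(); w2[0] += h
assert abs((w2.dot(x) - y) / h - dw[0]) < 1e-4

# the full W.dot(X) case repeats this pattern for every
# (row of W, column of X) pair, which is what the expressions
# dD.dot(X.T) and W.T.dot(dD) express compactly
```

Once the small case is verified, the generalization to the matrix form is usually a matter of arranging the same pattern so that the dimensions work out.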
Erik Learned-Miller has also written up a longer related document on taking matrix/vector derivatives which you might find helpful. Find it here.