I have split the answers to Assignment 1 into four parts, covering questions 1, 2, 3, and 4 respectively. This part contains the answer to question 2.
2. Neural Network Basics (30 points)
(a). (3 points) Derive the gradients of the sigmoid function and show that it can be rewritten as a function of the function value (i.e. in some expression where only σ(x), but not x, is present). Assume that the input x is a scalar for this question. Recall that the sigmoid function is σ(x) = 1 / (1 + e^(−x)).
Solution: Starting from σ(x) = 1 / (1 + e^(−x)) and differentiating,

σ′(x) = e^(−x) / (1 + e^(−x))² = [1 / (1 + e^(−x))] · [e^(−x) / (1 + e^(−x))] = σ(x) · (1 − σ(x)),

so the gradient can be written entirely in terms of the function value σ(x), with no explicit dependence on x.
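As a quick sanity check of the identity σ′(x) = σ(x)(1 − σ(x)), the sketch below (not part of the original assignment code) compares the analytic gradient against a central finite difference:

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(s):
    # Gradient expressed only through the function value s = sigmoid(x)
    return s * (1.0 - s)

x = 0.5
analytic = sigmoid_grad(sigmoid(x))

# Central finite difference as an independent check
h = 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
assert abs(analytic - numeric) < 1e-8
```

Note that `sigmoid_grad` takes the already-computed function value as input; this is the form typically used in backpropagation, where σ(x) is cached from the forward pass.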
(b). (3 points) Derive the gradient with regard to the inputs of a softmax function when cross entropy loss is used for evaluation, i.e. find the gradients with respect to the softmax input vector θ, when the prediction is made by ŷ = softmax(θ). Remember the cross entropy function is

CE(y, ŷ) = −Σᵢ yᵢ log(ŷᵢ),

where y is the one-hot label vector, and ŷ is the predicted probability vector for all classes.
Solution: Following the hint, assume the k-th entry of y is 1 and all other entries are 0, i.e. y_k = 1 and y_i = 0 for i ≠ k. The loss then reduces to CE(y, ŷ) = −log ŷ_k = −log softmax(θ)_k.
For the i-th element of θ, write ŷ_k = e^(θ_k) / Σ_j e^(θ_j), so that

CE = −θ_k + log Σ_j e^(θ_j).

Differentiating with respect to θ_i gives

∂CE/∂θ_i = −1{i = k} + e^(θ_i) / Σ_j e^(θ_j) = ŷ_i − y_i,

i.e. ∂CE/∂θ_i = ŷ_k − 1 when i = k and ŷ_i otherwise. In vector form, the gradient with respect to the whole softmax input is ∂CE/∂θ = ŷ − y.
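The gradient ŷ − y derived above can likewise be verified numerically. This is a minimal sketch (the vector θ and label index k are arbitrary example values, not from the assignment):

```python
import numpy as np

def softmax(theta):
    # Subtract the max for numerical stability before exponentiating
    e = np.exp(theta - np.max(theta))
    return e / e.sum()

def ce_loss(theta, k):
    # Cross entropy with a one-hot label at index k: -log softmax(theta)_k
    return -np.log(softmax(theta)[k])

theta = np.array([1.0, -0.5, 2.0])  # example softmax input
k = 2                               # index of the correct class
y = np.zeros_like(theta)
y[k] = 1.0

analytic = softmax(theta) - y  # the derived gradient: y_hat - y

# Central finite differences as an independent check, one coordinate at a time
h = 1e-6
numeric = np.array([
    (ce_loss(theta + h * np.eye(3)[i], k) - ce_loss(theta - h * np.eye(3)[i], k)) / (2 * h)
    for i in range(3)
])
assert np.allclose(analytic, numeric, atol=1e-7)
```

One useful consequence visible in the code: since softmax(θ) sums to 1 and y is one-hot, the gradient ŷ − y always sums to zero.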