(Notes from Andrew Ng's course)
Gradient checking eliminates almost all of these problems. It is useful when implementing complex optimization models, and it helps you verify, with reasonable confidence, that your implementation of the algorithm is correct.
It also helps catch subtle bugs that are hard to spot by inspection: for example, the loss function may appear to be decreasing, yet you still end up with a neural network whose error is higher than a bug-free implementation would achieve, and you might never realize that a subtle bug was the cause of the worse performance.
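As an illustration of the idea, here is a minimal sketch of two-sided (centered) numerical gradient checking in Python/NumPy. The function names `cost_fn`, `grad_fn`, the tolerance, and the quadratic example are placeholders chosen for this sketch, not something given in the notes themselves.

```python
import numpy as np

def gradient_check(cost_fn, grad_fn, theta, epsilon=1e-7):
    """Compare analytic gradients with a centered finite-difference estimate.

    cost_fn : callable returning the scalar cost J(theta)
    grad_fn : callable returning the analytic gradient dJ/dtheta
    theta   : 1-D parameter vector (np.ndarray)
    """
    grad_analytic = grad_fn(theta)
    grad_approx = np.zeros_like(theta)

    # Perturb each parameter in turn and use the two-sided difference,
    # whose error is O(epsilon^2) rather than O(epsilon).
    for i in range(theta.size):
        theta_plus = theta.copy()
        theta_minus = theta.copy()
        theta_plus[i] += epsilon
        theta_minus[i] -= epsilon
        grad_approx[i] = (cost_fn(theta_plus) - cost_fn(theta_minus)) / (2 * epsilon)

    # Relative difference: values around 1e-7 suggest the gradient is correct,
    # while values near 1e-3 or larger usually indicate a bug.
    numerator = np.linalg.norm(grad_approx - grad_analytic)
    denominator = np.linalg.norm(grad_approx) + np.linalg.norm(grad_analytic)
    return numerator / denominator


# Example usage on a simple quadratic cost J(theta) = 0.5 * ||theta||^2,
# whose analytic gradient is theta itself (a hypothetical test case).
theta = np.random.randn(5)
diff = gradient_check(lambda t: 0.5 * np.sum(t ** 2), lambda t: t, theta)
print(f"relative difference: {diff:.2e}")  # expected to be around 1e-7 or smaller
```

If the relative difference comes out large, the analytic gradient code (typically backpropagation) is the first place to look for the bug.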