by Yangqing on 14 May 2014
For a sanity check, try running with a learning rate of 0 and see whether any NaN errors pop up (they shouldn't, since no learning takes place). If the data is not initialized well, even a learning rate of 0.0001 might be too high.
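To make the sanity check concrete, here is a minimal pure-Python sketch on a hypothetical one-weight least-squares problem. The toy data, `train`, and `has_nan_or_inf` are illustrative assumptions, not any framework's API. With a learning rate of 0 the weight never moves, so a NaN can only come from the forward pass (for example, bad input data). A learning rate that is too high instead makes the loss overflow to inf and then NaN.

```python
import math

# Hypothetical toy data: fit y = 3x with a single weight w.
GOOD_DATA = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]


def train(lr, data=GOOD_DATA, steps=300, w0=0.5):
    """Plain gradient descent on mean squared error; returns the loss history."""
    w = w0
    losses = []
    for _ in range(steps):
        residuals = [w * x - y for x, y in data]
        # Use r * r instead of r ** 2: float ** raises OverflowError on
        # overflow, while multiplication quietly yields inf (and later NaN),
        # which is what a diverging training run looks like.
        loss = sum(r * r for r in residuals) / len(data)
        grad = sum(2.0 * r * x for r, (x, _) in zip(residuals, data)) / len(data)
        losses.append(loss)
        w -= lr * grad
    return losses


def has_nan_or_inf(losses):
    """True if any recorded loss is NaN or infinite."""
    return any(not math.isfinite(v) for v in losses)


print(has_nan_or_inf(train(lr=0.0)))   # False: lr = 0 means no updates
print(has_nan_or_inf(train(lr=0.01)))  # False: small enough to converge
print(has_nan_or_inf(train(lr=0.5)))   # True: too high, the loss blows up
bad = GOOD_DATA + [(float("nan"), 1.0)]
print(has_nan_or_inf(train(lr=0.0, data=bad)))  # True even with lr = 0
```

For this toy data each step scales the weight error by 1 - 15·lr, so any learning rate above 2/15 ≈ 0.13 diverges. In a real network the safe range depends on how the data and weights are scaled, which is why a badly initialized setup can blow up even at 0.0001.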
by sguada on 13 May 2014
Try different initializations, for instance setting the bias to 0.1.
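The thread does not say why 0.1 helps; one common rationale (an assumption here) is that a small positive bias keeps more ReLU units active at the start of training, so they receive gradient. A minimal pure-Python sketch, where `init_layer`, `count_active_relus`, the Gaussian weight std of 0.01, and the layer sizes are all hypothetical choices for illustration:

```python
import random


def init_layer(n_in, n_out, bias_value, weight_std=0.01, seed=0):
    """Gaussian weights plus a constant bias (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so both bias settings get the same weights
    weights = [[rng.gauss(0.0, weight_std) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [bias_value] * n_out
    return weights, biases


def count_active_relus(x, weights, biases):
    """Number of ReLU units whose pre-activation is positive for input x."""
    return sum(
        1 for row, b in zip(weights, biases)
        if sum(w * xi for w, xi in zip(row, x)) + b > 0.0
    )


x = [1.0] * 100
for bias in (0.0, 0.1):
    w, b = init_layer(n_in=100, n_out=256, bias_value=bias)
    print(bias, count_active_relus(x, w, b), "of 256 units active")
```

Because both runs share the same weights, every unit that is active with bias 0 is also active with bias 0.1, so the positive bias can only increase the count.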