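Feature scaling speeds up gradient descent. When one or more features take on much larger values than the rest, the cost surface is stretched along those features, the step size has to stay small to remain stable, and gradient descent needs many extra iterations to converge. Putting all features on a comparable scale avoids those extra iterations.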
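The effect can be seen in a small experiment. The sketch below is only an illustration under stated assumptions: the housing-style data, the helper names `standardize` and `iterations_to_converge`, the tolerance, the iteration cap, and the rule of setting the step size to one over the trace of the Hessian are all made up for this example. It runs plain batch gradient descent on the same least-squares problem twice, once on raw features and once on standardized features, and counts the iterations each run needs.

```python
# Toy illustration only: the data, names, and step-size rule are assumptions.

def standardize(columns):
    """Rescale each feature column to zero mean and unit variance."""
    scaled = []
    for col in columns:
        mean = sum(col) / len(col)
        std = (sum((v - mean) ** 2 for v in col) / len(col)) ** 0.5
        scaled.append([(v - mean) / std for v in col])
    return scaled


def iterations_to_converge(columns, y, tol=1e-6, max_iters=10_000):
    """Batch gradient descent on least squares; returns the iterations used.

    Returns max_iters if the mean squared error never drops below tol.
    """
    m = len(y)
    # Prepend a constant column so the model can learn an intercept.
    cols = [[1.0] * m] + [[float(v) for v in col] for col in columns]
    # Conservative step size: 1 / trace(X^T X / m). The trace is an upper
    # bound on the largest curvature, so this step cannot diverge.
    step = 1.0 / sum(sum(v * v for v in col) / m for col in cols)
    w = [0.0] * len(cols)
    for iteration in range(max_iters):
        residuals = [
            sum(w[j] * cols[j][i] for j in range(len(cols))) - y[i]
            for i in range(m)
        ]
        mse = sum(r * r for r in residuals) / m
        if mse < tol:
            return iteration
        # Gradient of (1 / 2m) * sum(residual^2) with respect to each weight.
        grad = [sum(r * x for r, x in zip(residuals, col)) / m for col in cols]
        w = [wj - step * gj for wj, gj in zip(w, grad)]
    return max_iters


# Hypothetical housing-style data: one large-valued and one small-valued feature.
size = [1000, 1500, 2000, 2500, 3000, 3500]
bedrooms = [3, 1, 4, 2, 5, 3]
# Targets are exactly linear in the features, so the best achievable error is 0.
price = [50 + 0.1 * s + 20 * b for s, b in zip(size, bedrooms)]

unscaled_iters = iterations_to_converge([size, bedrooms], price)
scaled_iters = iterations_to_converge(standardize([size, bedrooms]), price)
print("iterations without scaling:", unscaled_iters)
print("iterations with scaling:   ", scaled_iters)
```

On this toy data the standardized run should stop far below the iteration cap, while the raw run is expected to exhaust it: the large `size` values force a tiny step size, so the weights for the intercept and `bedrooms` barely move. A different step-size rule or dataset would change the exact counts, but not the direction of the effect.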
References: http://stackoverflow.com/questions/26225344/why-feature-scaling
http://xgli0910.blog.163.com/blog/static/46962168201310683159839/