Table of Contents
- 1b has an issue
- Differences and concepts of Supervised, Reinforcement, and Unsupervised Learning
- What is meant by Overfitting in neural networks
- How Dropout is used for neural networks (training & testing)
- Squared Error, Cross Entropy, Softmax, Weight Decay
- Difference between Maximum Likelihood estimation and Bayesian Inference (supervised)
- Concept of Momentum (as an enhancement for Gradient Descent)
1b has an issue
Differences and concepts of Supervised, Reinforcement, and Unsupervised Learning
Supervised Learning: The system is presented with training items consisting of an input and a target output. The aim is to predict the output, given the input (for the training set as well as an unseen test set).
Reinforcement Learning: The system chooses actions in a simulated environment, observing its state and receiving rewards along the way. The aim is to maximize the cumulative reward.
Unsupervised Learning: The system is presented with training items consisting of only an input (no target value). The aim is to extract hidden features or other structure from these data.
What is meant by Overfitting in neural networks
Overfitting occurs when the training set error continues to decrease, but the test set error stalls or increases.
How to avoid Overfitting
- limiting the number of neurons or connections in the network
- early stopping, with a validation set (see the sketch after this list)
- dropout
- weight decay (this can avoid overfitting by limiting the size of the weights)
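A minimal sketch of early stopping with a validation set, on a synthetic regression task; the data, the degree-9 polynomial model, the learning rate and the patience rule are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression task (toy data): degree-9 polynomial fit to noisy points.
x = rng.uniform(-1, 1, size=30)
y = np.sin(3 * x) + 0.1 * rng.normal(size=30)
X = np.vander(x, 10)                                 # degree-9 polynomial features
X_tr, y_tr, X_va, y_va = X[:20], y[:20], X[20:], y[20:]

w = np.zeros(10)
best_w, best_val, bad, patience = w.copy(), np.inf, 0, 20

for epoch in range(5000):
    grad = X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)    # gradient of squared error
    w -= 0.1 * grad                                  # plain gradient descent step
    val = np.mean((X_va @ w - y_va) ** 2)            # validation set error
    if val < best_val:
        best_val, best_w, bad = val, w.copy(), 0     # remember the best weights so far
    else:
        bad += 1
        if bad >= patience:                          # validation error stopped improving
            break                                    # stop before overfitting worsens
w = best_w
print("stopped at epoch", epoch, "validation error", best_val)
```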
How Dropout is used for neural networks (training & testing)
During each minibatch of training, a fixed percentage (usually one half) of nodes is chosen to be inactive. In the testing phase, all nodes are active, but the activation of each node is multiplied by the same percentage that was used in training.
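A minimal NumPy sketch of this train/test scheme; the keep fraction of 0.5 and the toy activations are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_train(activations, keep=0.5):
    """Training: a random fraction (1 - keep) of nodes is made inactive."""
    mask = rng.random(activations.shape) < keep   # 1 with probability `keep`
    return activations * mask                     # dropped nodes output zero

def dropout_test(activations, keep=0.5):
    """Testing: all nodes are active, scaled by the same keep fraction."""
    return activations * keep

h = np.array([0.2, 1.5, -0.7, 0.9])   # toy hidden activations (assumed)
print(dropout_train(h))               # some entries zeroed for this minibatch
print(dropout_test(h))                # every entry multiplied by 0.5
```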
Squared Error, Cross Entropy, Softmax, Weight Decay
Assume $z_i$ is the actual output, $t_i$ is the target output, and $w_j$ are the weights.
Squared Error: $E = \frac{1}{2}\sum_i (z_i - t_i)^2$
Cross Entropy: $E = \sum_i \left(-t_i\log z_i - (1-t_i)\log(1-z_i)\right)$
Softmax: $E = -\left(z_i - \log\sum_j \exp(z_j)\right)$, where $i$ is the correct class.
Weight Decay: $E = \frac{1}{2}\sum_j w_j^2$
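These error functions can be written directly in NumPy as a sanity check; the toy outputs, targets, weights and class index below are made-up values, and the softmax line treats the $z_j$ as pre-softmax outputs:

```python
import numpy as np

z = np.array([0.8, 0.1, 0.6])    # actual outputs z_i (toy values)
t = np.array([1.0, 0.0, 1.0])    # target outputs t_i
w = np.array([0.5, -1.2, 0.3])   # weights w_j

sq_error  = 0.5 * np.sum((z - t) ** 2)                       # squared error
cross_ent = np.sum(-t * np.log(z) - (1 - t) * np.log(1 - z)) # cross entropy

logits = np.array([2.0, 0.5, -1.0])   # pre-softmax outputs z_j (toy values)
i = 0                                 # index of the correct class
softmax_e = -(logits[i] - np.log(np.sum(np.exp(logits))))    # softmax (log-loss)

weight_decay = 0.5 * np.sum(w ** 2)                          # weight decay term
print(sq_error, cross_ent, softmax_e, weight_decay)
```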
Difference between Maximum Likelihood estimation and Bayesian Inference (supervised)
In Maximum Likelihood estimation, the hypothesis $h\in H$ is chosen which maximizes the conditional probability $P(D\mid h)$ of the observed data $D$, conditioned on $h$.
In Bayesian Inference, the hypothesis $h\in H$ is chosen which maximizes $P(D\mid h)\,P(h)$, where $P(h)$ is the prior probability of $h$.
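A small NumPy sketch of the difference, using a toy coin-flip setting; the hypothesis grid, the observed counts and the prior are all illustrative assumptions:

```python
import numpy as np

# Hypotheses: candidate head-probabilities for a coin (toy hypothesis space H).
theta = np.linspace(0.05, 0.95, 19)

# Observed data D: 7 heads out of 10 tosses (assumed counts).
heads, tails = 7, 3
likelihood = theta**heads * (1 - theta)**tails        # P(D | h)

# A prior P(h) favouring a fair coin (illustrative choice).
prior = np.exp(-20 * (theta - 0.5) ** 2)
prior /= prior.sum()

h_ml  = theta[np.argmax(likelihood)]           # maximizes P(D | h)
h_map = theta[np.argmax(likelihood * prior)]   # maximizes P(D | h) P(h)
print(h_ml, h_map)   # the prior pulls the second estimate towards 0.5
```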
Concept of Momentum (as an enhancement for Gradient Descent)
A running average of the differentials for each weight is maintained and used to update the weights as follows:
$\delta w = \alpha\,\delta w - \eta\frac{dE}{dw}$
$w = w + \delta w$
The constant $\alpha$ with $0\leq\alpha < 1$ is called the momentum.
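A minimal sketch of this update rule on a toy error function $E(w) = \frac{1}{2}w^2$; the error function, learning rate and momentum value are illustrative assumptions:

```python
# Toy error function E(w) = 0.5 * w^2, so dE/dw = w (an illustrative choice).
def grad_E(w):
    return w

w, delta_w = 5.0, 0.0
alpha, eta = 0.9, 0.1   # momentum and learning rate (assumed values)

for step in range(200):
    delta_w = alpha * delta_w - eta * grad_E(w)   # delta_w = alpha*delta_w - eta*dE/dw
    w = w + delta_w                               # w = w + delta_w
print(w)   # w oscillates but approaches the minimum at w = 0
```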