Series Preface
References:
- RNNLM - Recurrent Neural Network Language Modeling Toolkit
- Recurrent Neural Network Based Language Model
- Extensions of Recurrent Neural Network Language Model
- Strategies for Training Large Scale Neural Network Language Models
- Statistical Language Models Based on Neural Networks
- A Guide to Recurrent Neural Networks and Backpropagation
- A Neural Probabilistic Language Model
- Learning Long-Term Dependencies with Gradient Descent Is Difficult
- Can Artificial Neural Networks Learn Language Models?
The previous post covered the network's forward pass; this one covers the learning algorithm, whose math I already walked through in my earlier post on RNNLM principles and the BPTT derivation. Learning means updating the network's weights, and in this final version of the network the weights fall roughly into three groups: first, the standard RNN weights, i.e. input-to-hidden and hidden-to-output; second, the ME (maximum entropy) part, i.e. the direct input-to-output connections; third, the weights updated through BPTT. I'll first put up the diagram of the whole ME + RNN network, then the annotated source. The structure diagram is as follows:

[Figure: structure of the combined ME + RNN network]
The code below again reads best in two parts: one updates the non-BPTT weights, the other performs the BPTT updates.
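Before diving in, one note on where the error terms come from. With a softmax output and a cross-entropy loss, the derivative of the loss with respect to each pre-activation reduces to target minus activation, which is exactly the er = t - ac pattern used throughout learnNet. A minimal sketch of the derivation (notation mine, not from the toolkit):

$$E = -\log y_w, \qquad y_j = \frac{e^{a_j}}{\sum_k e^{a_k}} \quad\Longrightarrow\quad -\frac{\partial E}{\partial a_j} = t_j - y_j,$$

where $t_j$ is 1 for the target unit and 0 otherwise. The same formula is applied separately to the word-within-class softmax and to the class softmax. With that in mind, the annotated source: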
// Backpropagate the error and update the network weights
void CRnnLM::learnNet(int last_word, int word)
{
    // word is the word to be predicted; last_word is the word currently at the input layer
    int a, b, c, t, step;
    real beta2, beta3;
    // alpha is the learning rate, initial value 0.1; beta has initial value 0.0000001
    beta2=beta*alpha;
    // I don't quite follow the original comment below; explanations from readers who do are welcome~
    beta3=beta2*1; //beta3 can be possibly larger than beta2, as that is useful on small datasets (if the final model is to be interpolated with backoff model) - todo in the future
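    // word==-1 marks an out-of-vocabulary target: nothing to predict, nothing to learn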
    if (word==-1) return;
    // compute error vectors: the output-layer error of the word part
    // (only the words belonging to word's class are touched)
    for (c=0; c<class_cn[vocab[word].class_index]; c++) {
        a=class_words[vocab[word].class_index][c];
        neu2[a].er=(0-neu2[a].ac);
    }
    neu2[word].er=(1-neu2[word].ac); //word part
    //flush error
    for (a=0; a<layer1_size; a++) neu1[a].er=0;
    for (a=0; a<layerc_size; a++) neuc[a].er=0;
    // compute the error vector of the class part of the output layer
    for (a=vocab_size; a<layer2_size; a++) {
        neu2[a].er=(0-neu2[a].ac);
    }
    neu2[vocab[word].class_index+vocab_size].er=(1-neu2[vocab[word].class_index+vocab_size].ac); //class part
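    // At this point neu2[].er holds (target - activation) for exactly two groups:
    // the words inside the target word's class and the class units themselves.
    // All other output units keep zero error, which is what makes the
    // class-factorized softmax update cheap.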
    // compute the indices into syn_d of the hashed features, in the same way
    // as in the forward pass; this targets the word part of the ME model
    if (direct_size>0) { //learn direct connections between words
        if (word!=-1) {
            unsigned long long hash[MAX_NGRAM_ORDER];
            for (a=0; a<direct_order; a++) hash[a]=0;
            for (a=0; a<direct_order; a++) {
                b=0;
                if (a>0) if (history[a-1]==-1) break;
                hash[a]=PRIMES[0]*PRIMES[1]*(unsigned long long)(vocab[word].class_index+1);
                for (b=1; b<=a; b++) hash[a]+=PRIMES[(a*PRIMES[b]+b)%PRIMES_SIZE]*(unsigned long long)(history[b-1]+1);
                hash[a]=(hash[a]%(direct_size/2))+(direct_size)/2;
            }
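            // Note on the last line above (my reading of the code): the
            // word-conditional n-gram features are mapped into the upper half of
            // syn_d, while the class-part features computed in the forward pass
            // use the lower half, so the two feature sets never collide.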
            // update the ME weights; this part targets the word portion
            for (c=0; c<class_cn[vocab[word].class_index]; c++) {
                a=class_words[vocab[word].class_index][c];
                // the update formula is easy to derive with gradient ascent and is the
                // same as for the RNN part; see my earlier article for the derivation.
                // Another difference here is that the weights are stored
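The listing is cut off here; for orientation, the step that follows in the toolkit applies plain SGD with weight decay to the hashed connections. The sketch below is my reconstruction of that inner loop from rnnlm-0.4b, not the author's annotated version:

                for (b=0; b<direct_order; b++) if (hash[b]) {
                    // gradient ascent on the log-likelihood plus L2 decay:
                    // w += alpha * error - beta3 * w
                    syn_d[hash[b]]+=alpha*neu2[a].er - syn_d[hash[b]]*beta3;
                    hash[b]++; // consecutive slots hold the weights of successive class members
                } else break;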