RNN (Part 2): Forward Pass and BPTT

Tags: RNN BPTT


basic definition

To simplify notation, the RNN here contains only one input layer, one hidden layer, and one output layer. The notation is listed below:

| neural layer            | node   | index | number |
|-------------------------|--------|-------|--------|
| input layer             | x(t)   | i     | N      |
| previous hidden layer   | s(t-1) | h     | M      |
| hidden layer            | s(t)   | j     | M      |
| output layer            | y(t)   | k     | O      |
| input->hidden           | V(t)   | j,i   | N->M   |
| previous hidden->hidden | U(t)   | j,h   | M->M   |
| hidden->output          | W(t)   | k,j   | M->O   |

In addition, P is the total number of available training samples, which are indexed by l.
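To make the notation concrete, here is a minimal NumPy sketch of how these quantities could be laid out; the sizes and variable names (N, M, O, V, U, W, theta_h, theta_o) are illustrative assumptions, not part of the original derivation.

```python
import numpy as np

# Illustrative sizes (assumed): N input nodes, M hidden nodes, O output nodes
N, M, O = 4, 8, 3

rng = np.random.default_rng(0)
V = rng.normal(scale=0.1, size=(M, N))   # input -> hidden weights, v_{ji}
U = rng.normal(scale=0.1, size=(M, M))   # previous hidden -> hidden weights, u_{jh}
W = rng.normal(scale=0.1, size=(O, M))   # hidden -> output weights, w_{kj}
theta_h = np.zeros(M)                    # hidden-layer bias, theta_j
theta_o = np.zeros(O)                    # output-layer bias, theta_k

x_t = rng.normal(size=N)                 # input vector x(t)
s_prev = np.zeros(M)                     # previous hidden state s(t-1)
```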

forward

(Figure: RNN forward pass)
1. input->hidden

$net_j(t) = \sum_i^N x_i(t)\, v_{ji} + \sum_h^M s_h(t-1)\, u_{jh} + \theta_j$

$s_j(t) = f(net_j(t))$

2. hidden->output

$net_k(t) = \sum_j^M s_j(t)\, w_{kj} + \theta_k$

$y_k(t) = g(net_k(t))$

f and g are the activation functions of the hidden layer and the output layer, respectively.
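As a rough sketch, the two forward steps could be written in NumPy as follows, taking sigmoid for f and softmax for g (the choices discussed in the activation function subsection below); the function and variable names are mine, not from the original.

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def softmax(net):
    e = np.exp(net - net.max())              # shift by the max for numerical stability
    return e / e.sum()

def rnn_forward_step(x_t, s_prev, V, U, W, theta_h, theta_o):
    """One forward step of the single-hidden-layer RNN described above."""
    net_h = V @ x_t + U @ s_prev + theta_h   # net_j(t)
    s_t = sigmoid(net_h)                     # s_j(t) = f(net_j(t))
    net_o = W @ s_t + theta_o                # net_k(t)
    y_t = softmax(net_o)                     # y_k(t) = g(net_k(t))
    return net_h, s_t, net_o, y_t
```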

backpropagation

prerequisite

Any network structure can be trained with backpropagation, provided that desired output patterns exist and every function used to compute the actual output patterns is differentiable.

cost function

1. summed squared error (SSE)
The cost function can be any differentiable function that measures the loss of the predicted values against the gold answers. SSE is frequently used and works well for training conventional feed-forward neural networks.

$C = \frac{1}{2} \sum_l^P \sum_k^O (d_{lk} - y_{lk})^2$

2. cross entropy (CE)
The cross-entropy loss is used in Recurrent Neural Network Language Models (RNNLM) and performs well.

$C = -\sum_l^P \sum_k^O \left[ d_{lk} \ln y_{lk} + (1 - d_{lk}) \ln(1 - y_{lk}) \right]$

Discussion below is based on SSE.
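Both cost functions are straightforward to express in NumPy. A small sketch, with d and y as P x O arrays of targets and predictions; the eps clamp in the cross-entropy is my own addition to avoid log(0):

```python
import numpy as np

def sse(d, y):
    """Summed squared error over all samples l and output nodes k."""
    return 0.5 * np.sum((d - y) ** 2)

def cross_entropy(d, y, eps=1e-12):
    """Cross-entropy loss summed over all samples and output nodes."""
    y = np.clip(y, eps, 1.0 - eps)           # guard against log(0)
    return -np.sum(d * np.log(y) + (1.0 - d) * np.log(1.0 - y))
```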

error component

  1. error for output nodes
    $\delta_{lk} = -\frac{\partial C}{\partial net_{lk}} = -\frac{\partial C}{\partial y_{lk}} \frac{\partial y_{lk}}{\partial net_{lk}} = (d_{lk} - y_{lk})\, g'(net_{lk})$
  2. error for hidden nodes
    $\delta_{lj} = \left( \sum_k^O -\frac{\partial C}{\partial y_{lk}} \frac{\partial y_{lk}}{\partial net_{lk}} \frac{\partial net_{lk}}{\partial s_{lj}} \right) \frac{\partial s_{lj}}{\partial net_{lj}} = \sum_k^O \delta_{lk}\, w_{kj}\, f'(net_{lj})$
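Using the sigmoid and softmax derivatives given in the next subsection (f'(net) = s(1 - s) and the diagonal softmax term y(1 - y)), the two error components for a single sample might be sketched in NumPy like this; the helper names are assumptions:

```python
import numpy as np

def output_delta(d, y):
    """delta_lk = (d_lk - y_lk) * g'(net_lk); for softmax, the diagonal derivative is y*(1-y)."""
    return (d - y) * y * (1.0 - y)

def hidden_delta(delta_o, W, s_t):
    """delta_lj = sum_k delta_lk * w_kj * f'(net_lj); for sigmoid, f'(net_lj) = s_lj*(1-s_lj)."""
    return (W.T @ delta_o) * s_t * (1.0 - s_t)
```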

activation function

  1. sigmoid
    $f(net) = \frac{1}{1 + e^{-net}}$

    $f'(net) = f(net)\{1 - f(net)\}$
  2. softmax
    $g(net_k) = \frac{e^{net_k}}{\sum_{k'}^{O} e^{net_{k'}}}$

    $g'(net_k) = \frac{e^{net_k} \left( \sum_j^O e^{net_j} - e^{net_k} \right)}{\left( \sum_j^O e^{net_j} \right)^2}$
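A quick numerical check of the two derivative formulas, as a sketch; the diagonal softmax derivative above simplifies to $y_k(1 - y_k)$, which the assert below verifies:

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def sigmoid_prime(net):
    f = sigmoid(net)
    return f * (1.0 - f)                     # f'(net) = f(net){1 - f(net)}

def softmax(net):
    e = np.exp(net - net.max())
    return e / e.sum()

def softmax_prime_diag(net):
    """Diagonal softmax derivative, as written above: e^net_k (sum - e^net_k) / sum^2."""
    e = np.exp(net - net.max())
    s = e.sum()
    return e * (s - e) / s ** 2

net = np.array([0.2, -1.0, 0.5])
y = softmax(net)
assert np.allclose(softmax_prime_diag(net), y * (1.0 - y))   # equals y_k (1 - y_k)
```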

gradient descent

According to gradient descent, each weight change in the network should be proportional to the negative gradient of the cost function with respect to the specific weight:

$\Delta w = -\eta \frac{\partial C}{\partial w}$

where η is the learning rate.
1. hidden->output

$\Delta w_{kj} = -\eta \frac{\partial C}{\partial w_{kj}} = -\eta \sum_l^P \frac{\partial C}{\partial net_{lk}} \frac{\partial net_{lk}}{\partial w_{kj}} = \eta \sum_l^P \delta_{lk} \frac{\partial net_{lk}}{\partial w_{kj}} = \eta \sum_l^P \delta_{lk}\, s_{lj}$

2. input->hidden

$\Delta v_{ji} = -\eta \frac{\partial C}{\partial v_{ji}} = \eta \sum_l^P \delta_{lj}\, x_{li}$

3. previous hidden->hidden

$\Delta u_{jh} = -\eta \frac{\partial C}{\partial u_{jh}} = \eta \sum_l^P \delta_{lj}\, s_{(l-1)h}$
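For a single training sample, the three weight changes reduce to outer products of the error components with the corresponding inputs. A sketch under that assumption (summing the outer products over l recovers the batch formulas above; variable names follow the earlier sketches):

```python
import numpy as np

def weight_updates(delta_o, delta_h, s_t, s_prev, x_t, eta=0.1):
    """Per-sample weight changes, before any unfolding through time."""
    dW = eta * np.outer(delta_o, s_t)      # Delta w_kj = eta * delta_k * s_j
    dV = eta * np.outer(delta_h, x_t)      # Delta v_ji = eta * delta_j * x_i
    dU = eta * np.outer(delta_h, s_prev)   # Delta u_jh = eta * delta_j * s_h(t-1)
    return dW, dV, dU
```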

unfolding

In a recurrent neural network, errors can be propagated back further, i.e. through more than two layers, in order to capture longer history information. This process is usually called unfolding.
In an unfolded RNN, the recurrent weight is duplicated spatially for an arbitrary number of time steps, referred to here as T.

$net_{lj}(t) = \sum_i^N x_{li}(t)\, v_{ji} + \sum_h^M s_{(l-1)h}\, u_{jh} + \theta_j$

$s_{(l-1)h} = f(net_{(l-1)h})$

The error for the hidden nodes is propagated back through time as:

$\delta_{lj}(t-1) = -\frac{\partial C}{\partial net_{(l-1)j}} = -\sum_h^M \frac{\partial C}{\partial net_{lh}} \frac{\partial net_{lh}}{\partial net_{(l-1)j}}$

$= \sum_h^M \left( -\frac{\partial C}{\partial net_{lh}} \right) \frac{\partial net_{lh}}{\partial s_{(l-1)j}} \frac{\partial s_{(l-1)j}}{\partial net_{(l-1)j}}$

$= \sum_h^M \delta_{lh}(t)\, u_{hj}\, f'(net_{(l-1)j})$

where h is the index for the hidden node at time step t, and j for the hidden node at time step t-1.
Note: the original paper writes $s_{lj}(t-1)$ here, but I think it should be $net_{lj}(t-1)$. That notation is hard to explain, though, because the subscript for time step t is l and the subscript for time step t-1 would then also be l. I therefore changed it to $net_{(l-1)j}$, taking l as the subscript for time step t and l-1 as the subscript for time step t-1.

After all error deltas have been obtained, the unfolded copies are folded back, summing up to one overall change for each weight.
1. input->hidden

$\Delta v_{ji}(t) = \eta \sum_z^T \sum_l^P \delta_{lj}(t-z)\, x_{(l-z)i}$

2. previous hidden->hidden

$\Delta u_{jh}(t) = \eta \sum_z^T \sum_l^P \delta_{lj}(t-z)\, s_{(l-1-z)h}$
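One way to sketch the unfolding in NumPy: propagate the hidden delta back through T stored time steps, then fold the per-step outer products back into single changes for V and U. The history layout (lists indexed by z, with index 0 meaning time step t) and the sigmoid assumption for f are my own choices, not prescribed by the original.

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def bptt_deltas(delta_h_t, U, net_h_history, T):
    """delta_j(t-z-1) = sum_h delta_h(t-z) * u_hj * f'(net_j(t-z-1)).
    net_h_history[z] holds net_j(t-z); entry 0 is the current step."""
    deltas = [delta_h_t]
    for z in range(1, T + 1):
        s = sigmoid(net_h_history[z])                    # s_j(t-z) = f(net_j(t-z))
        deltas.append((U.T @ deltas[-1]) * s * (1.0 - s))
    return deltas                                        # [delta(t), delta(t-1), ..., delta(t-T)]

def bptt_updates(deltas, x_history, s_prev_history, eta=0.1):
    """Fold the unfolded copies back into one change per weight:
    Delta v_ji(t) = eta * sum_z delta_j(t-z) x_i(t-z),
    Delta u_jh(t) = eta * sum_z delta_j(t-z) s_h(t-z-1)."""
    dV = eta * sum(np.outer(d, x) for d, x in zip(deltas, x_history))
    dU = eta * sum(np.outer(d, s) for d, s in zip(deltas, s_prev_history))
    return dV, dU
```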

summary

  1. input->hidden
    $v_{ji}(t+1) = v_{ji}(t) + \eta \sum_z^T \sum_l^P \delta_{lj}(t-z)\, x_{(l-z)i}$
  2. previous hidden->hidden
    $u_{jh}(t+1) = u_{jh}(t) + \eta \sum_z^T \sum_l^P \delta_{lj}(t-z)\, s_{(l-1-z)h}$
  3. hidden->output
    $w_{kj}(t+1) = w_{kj}(t) + \eta \sum_l^P \delta_{lk}\, s_{lj}$
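Putting everything together, here is a self-contained toy step (unfolding depth 0, single sample) under the same assumptions as the earlier sketches; it is meant only to show how the pieces connect, not as a reference implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, O, eta = 4, 8, 3, 0.1

V = rng.normal(scale=0.1, size=(M, N))     # input -> hidden
U = rng.normal(scale=0.1, size=(M, M))     # previous hidden -> hidden
W = rng.normal(scale=0.1, size=(O, M))     # hidden -> output
theta_h, theta_o = np.zeros(M), np.zeros(O)

x_t, s_prev = rng.normal(size=N), np.zeros(M)
d = np.array([0.0, 1.0, 0.0])              # desired output pattern

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# forward
net_h = V @ x_t + U @ s_prev + theta_h
s_t = sigmoid(net_h)
net_o = W @ s_t + theta_o
y = np.exp(net_o - net_o.max()); y /= y.sum()        # softmax output

# error components (diagonal softmax derivative, as in the text)
delta_o = (d - y) * y * (1.0 - y)
delta_h = (W.T @ delta_o) * s_t * (1.0 - s_t)

# weight updates
W += eta * np.outer(delta_o, s_t)
V += eta * np.outer(delta_h, x_t)
U += eta * np.outer(delta_h, s_prev)
```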

references

BackPropagation Through Time
A guide to recurrent neural networks and backpropagation
