jrae Source Code Analysis (Part 2)

This post takes a detailed look at the RAECost and SoftmaxCost classes introduced in the previous post.

SoftmaxCost

As we already know, given the features and labels (with hyperparameters fixed), the SoftmaxCost class measures the error value cost of a given weight matrix (hidden × catSize) and reports the gradient with respect to those weights. Let's look at the code.

@Override
public double valueAt(double[] x)
{
    if (!requiresEvaluation(x))
        return value;
    int numDataItems = Features.columns;

    int[] requiredRows = ArraysHelper.makeArray(0, CatSize - 2);
    ClassifierTheta Theta = new ClassifierTheta(x, FeatureLength, CatSize);
    DoubleMatrix Prediction = getPredictions(Theta, Features);

    double MeanTerm = 1.0 / (double) numDataItems;
    double Cost = getLoss(Prediction, Labels).sum() * MeanTerm;
    double RegularisationTerm = 0.5 * Lambda * DoubleMatrixFunctions.SquaredNorm(Theta.W);

    DoubleMatrix Diff = Prediction.sub(Labels).muli(MeanTerm);
    DoubleMatrix Delta = Features.mmul(Diff.transpose());

    DoubleMatrix gradW = Delta.getColumns(requiredRows);
    DoubleMatrix gradb = ((Diff.rowSums()).getRows(requiredRows));

    // Regularizing. Bias does not have one.
    gradW = gradW.addi(Theta.W.mul(Lambda));

    Gradient = new ClassifierTheta(gradW, gradb);
    value = Cost + RegularisationTerm;
    gradient = Gradient.Theta;
    return value;
}

public DoubleMatrix getPredictions(ClassifierTheta Theta, DoubleMatrix Features)
{
    int numDataItems = Features.columns;
    DoubleMatrix Input = ((Theta.W.transpose()).mmul(Features)).addColumnVector(Theta.b);
    Input = DoubleMatrix.concatVertically(Input, DoubleMatrix.zeros(1, numDataItems));
    return Activation.valueAt(Input);
}
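Note the requiresEvaluation(x) guard at the top of valueAt: optimizers typically query the value and the gradient at the same point back to back, so the cost object caches both and recomputes only when x actually changes. A minimal sketch of that caching pattern (the field names here are mine, not jrae's actual code):

import java.util.Arrays;

// Sketch of the cache-by-argument pattern behind requiresEvaluation
// (illustrative names, not jrae's actual fields):
abstract class CachedCost
{
    private double[] lastX;       // point of the previous evaluation
    protected double value;       // cached cost at lastX
    protected double[] gradient;  // cached gradient at lastX

    // True iff x differs from the last point we evaluated at.
    protected boolean requiresEvaluation(double[] x)
    {
        if (lastX != null && Arrays.equals(lastX, x))
            return false;
        lastX = x.clone();
        return true;
    }
}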

This is a typical two-layer neural network with no hidden layer: it first predicts the labels from the features, normalizes the predictions with softmax, and then backpropagates the error to compute the weight gradients.


In this network, each label is a column vector with the target label's entry set to 1 and all others set to 0 (one-hot); the transfer function is softmax, so the output is a probability for each label.
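Note also that getPredictions above appends a row of zeros before applying the activation: the last class's logit is pinned to 0, removing the redundant degree of freedom in softmax, which is apparently why the parameters only cover CatSize − 1 classes and requiredRows runs from 0 to CatSize − 2. A minimal column-wise softmax sketch in jblas, standing in for what I assume Activation.valueAt computes (the actual class may differ):

import org.jblas.DoubleMatrix;
import org.jblas.MatrixFunctions;

// Column-wise softmax: one column per data item, one row per class.
// (A production version would subtract each column's max before
// exponentiating, for numerical stability.)
public class SoftmaxSketch
{
    public static DoubleMatrix softmax(DoubleMatrix input)
    {
        DoubleMatrix e = MatrixFunctions.exp(input);
        return e.diviRowVector(e.columnSums()); // normalize each column to sum to 1
    }

    public static void main(String[] args)
    {
        // Two classes, one sample: a free logit of 1.0 and the pinned 0.0.
        DoubleMatrix logits = new DoubleMatrix(new double[][] {{1.0}, {0.0}});
        System.out.println(softmax(logits)); // roughly [0.7311; 0.2689]
    }
}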

The cost is computed by getLoss. If the predicted output for the target label is p, the per-sample cost, i.e. the error function, is:

$\mathrm{cost} = E(p) = -\log(p)$
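With one-hot labels this is just cross-entropy: multiplying the labels element-wise by the log-probabilities and summing each column picks out $-\log(p)$ of the target class for each sample. A hedged sketch of what getLoss might look like (the actual jrae implementation may differ):

import org.jblas.DoubleMatrix;
import org.jblas.MatrixFunctions;

// Per-sample cross-entropy, one value per column: with one-hot labels
// this evaluates to -log(p_target) for each sample.
public static DoubleMatrix crossEntropyLoss(DoubleMatrix prediction, DoubleMatrix labels)
{
    return labels.mul(MatrixFunctions.log(prediction)).columnSums().neg();
}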

Applying the backpropagation algorithm described earlier, we get, for the weight $w_{ij}$ feeding logit $j$ (shown for $j$ equal to the target label; $\mathrm{label}_j$ is 1 for the target label and 0 otherwise):

$\dfrac{\partial E}{\partial w_{ij}} = \dfrac{\partial E}{\partial p_j}\cdot\dfrac{\partial p_j}{\partial net_j}\cdot x_i = -\dfrac{1}{p_j}\cdot p_j(1-p_j)\cdot x_i = -(1-p_j)\,x_i = (p_j - \mathrm{label}_j)\cdot\mathrm{feature}_i$

This explains the meaning of the following line, since Diff holds exactly $(p_j - \mathrm{label}_j)$ for every sample:

DoubleMatrix Delta = Features.mmul(Diff.transpose());
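Because Diff has one column per sample, this single matrix product sums $\mathrm{feature}_i \cdot (p_j - \mathrm{label}_j)$ over all samples at once. A toy shape check (the sizes are made up for illustration):

import org.jblas.DoubleMatrix;

// Toy shape check for Delta = Features.mmul(Diff.transpose());
// featureLength=4, catSize=3, numDataItems=5 are arbitrary.
public class DeltaShapeSketch
{
    public static void main(String[] args)
    {
        DoubleMatrix features = DoubleMatrix.rand(4, 5); // featureLength x numDataItems
        DoubleMatrix diff     = DoubleMatrix.rand(3, 5); // catSize x numDataItems
        DoubleMatrix delta    = features.mmul(diff.transpose());
        // One gradient entry per weight: featureLength x catSize.
        System.out.println(delta.rows + " x " + delta.columns); // prints "4 x 3"
    }
}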

 

RAECost

First, the implementation:

@Override
public double valueAt(double[] x)
{
    if (!requiresEvaluation(x))
        return value;

    Theta Theta1 = new Theta(x, hiddenSize, visibleSize, dictionaryLength);
    FineTunableTheta Theta2 = new FineTunableTheta(x, hiddenSize, visibleSize, catSize, dictionaryLength);
    Theta2.setWe(Theta2.We.add(WeOrig));

    final RAEClassificationCost classificationCost = new RAEClassificationCost(
            catSize, AlphaCat, Beta, dictionaryLength, hiddenSize, Lambda, f, Theta2);
    final RAEFeatureCost featureCost = new RAEFeatureCost(
            AlphaCat, Beta, dictionaryLength, hiddenSize, Lambda, f, WeOrig, Theta1);

    Parallel.For(DataCell,
        new Parallel.Operation<LabeledDatum<Integer,Integer>>() {
            public void perform(int index, LabeledDatum<Integer,Integer> Data)
            {
                try {
                    LabeledRAETree Tree = featureCost.Compute(Data);
                    classificationCost.Compute(Data, Tree);
                } catch (Exception e) {
                    System.err.println(e.getMessage());
                }
            }
    });

    double costRAE = featureCost.getCost();
    double[] gradRAE = featureCost.getGradient().clone();

    double costSUP = classificationCost.getCost();
    gradient = classificationCost.getGradient();

    value = costRAE + costSUP;
    for (int i = 0; i < gradRAE.length; i++)
        gradient[i] += gradRAE[i];

    System.gc();    System.gc();
    System.gc();    System.gc();
    System.gc();    System.gc();
    System.gc();    System.gc();

    return value;
}

The cost has two parts: featureCost and classificationCost. The program iterates over every sample, calling featureCost.Compute(Data) to build a recursive tree while accumulating cost and gradient, and then classificationCost.Compute(Data, Tree) to compute and accumulate the classification cost and gradient on that tree. The key classes are therefore RAEFeatureCost and RAEClassificationCost.
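Since each sample's tree is built independently, the loop parallelizes cleanly; only the shared cost/gradient accumulators inside featureCost and classificationCost need synchronization. A self-contained toy version of this accumulate-over-samples pattern using standard Java streams (the data and cost function here are made up, not jrae's):

import java.util.stream.IntStream;

public class AccumulateSketch
{
    public static void main(String[] args)
    {
        double[][] samples = {{1, 2}, {3, 4}, {5, 6}};

        // Per-sample costs computed in parallel, then reduced to a sum.
        double cost = IntStream.range(0, samples.length).parallel()
                .mapToDouble(i -> sampleCost(samples[i])).sum();

        // Gradient of the toy cost below: d/ds_j of s_j^2 is 2*s_j,
        // accumulated sequentially to avoid racy writes to a shared array.
        double[] gradient = new double[2];
        for (double[] s : samples)
            for (int j = 0; j < gradient.length; j++)
                gradient[j] += 2 * s[j];

        System.out.println("cost = " + cost);
    }

    // Toy per-sample cost: squared norm of the sample.
    static double sampleCost(double[] s)
    {
        double c = 0;
        for (double v : s) c += v * v;
        return c;
    }
}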

Inside its Compute method, RAEFeatureCost calls RAEPropagation's ForwardPropagate to build a tree, then calls BackPropagate to compute and accumulate the gradients. The details of that algorithm are covered in the next post.
