jrae Source Code Analysis (Part 2)

This post takes a detailed look at the RAECost and SoftmaxCost classes introduced in the previous post.

SoftmaxCost

As we already know, given the features and labels (with hyperparameters fixed), the SoftmaxCost class measures the error value cost of a given weight matrix (hidden × catSize) and reports the gradient with respect to those weights. Let's look at the code.

@Override
public double valueAt(double[] x)
{
    if (!requiresEvaluation(x))
        return value;
    int numDataItems = Features.columns;

    int[] requiredRows = ArraysHelper.makeArray(0, CatSize - 2);
    ClassifierTheta Theta = new ClassifierTheta(x, FeatureLength, CatSize);
    DoubleMatrix Prediction = getPredictions(Theta, Features);

    double MeanTerm = 1.0 / (double) numDataItems;
    double Cost = getLoss(Prediction, Labels).sum() * MeanTerm;
    double RegularisationTerm = 0.5 * Lambda * DoubleMatrixFunctions.SquaredNorm(Theta.W);

    DoubleMatrix Diff = Prediction.sub(Labels).muli(MeanTerm);
    DoubleMatrix Delta = Features.mmul(Diff.transpose());

    DoubleMatrix gradW = Delta.getColumns(requiredRows);
    DoubleMatrix gradb = ((Diff.rowSums()).getRows(requiredRows));

    // Regularizing. Bias does not have one.
    gradW = gradW.addi(Theta.W.mul(Lambda));

    Gradient = new ClassifierTheta(gradW, gradb);
    value = Cost + RegularisationTerm;
    gradient = Gradient.Theta;
    return value;
}

public DoubleMatrix getPredictions(ClassifierTheta Theta, DoubleMatrix Features)
{
    int numDataItems = Features.columns;
    DoubleMatrix Input = ((Theta.W.transpose()).mmul(Features)).addColumnVector(Theta.b);
    Input = DoubleMatrix.concatVertically(Input, DoubleMatrix.zeros(1, numDataItems));
    return Activation.valueAt(Input);
}
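Note the requiresEvaluation(x) guard at the top of valueAt: optimizers typically query the value and the gradient at the same point back to back, so the cost object caches both and recomputes only when x actually changes. A minimal sketch of that caching pattern (the field names here are mine, not jrae's actual code):

import java.util.Arrays;

// Sketch of the cache-by-argument pattern behind requiresEvaluation
// (illustrative names, not jrae's actual fields):
abstract class CachedCost
{
    private double[] lastX;       // point of the previous evaluation
    protected double value;       // cached cost at lastX
    protected double[] gradient;  // cached gradient at lastX

    // True iff x differs from the last point we evaluated at.
    protected boolean requiresEvaluation(double[] x)
    {
        if (lastX != null && Arrays.equals(lastX, x))
            return false;
        lastX = x.clone();
        return true;
    }
}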

This is a typical two-layer neural network with no hidden layer: it first predicts the labels from the features, normalizes the predictions with softmax, and then backpropagates the error to compute the weight gradients.


In this network, each label is a column vector with the target label's entry set to 1 and all others set to 0 (one-hot); the transfer function is softmax, so the output is a probability for each label.
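Note also that getPredictions above appends a row of zeros before applying the activation: the last class's logit is pinned to 0, removing the redundant degree of freedom in softmax, which is apparently why the parameters only cover CatSize − 1 classes and requiredRows runs from 0 to CatSize − 2. A minimal column-wise softmax sketch in jblas, standing in for what I assume Activation.valueAt computes (the actual class may differ):

import org.jblas.DoubleMatrix;
import org.jblas.MatrixFunctions;

// Column-wise softmax: one column per data item, one row per class.
// (A production version would subtract each column's max before
// exponentiating, for numerical stability.)
public class SoftmaxSketch
{
    public static DoubleMatrix softmax(DoubleMatrix input)
    {
        DoubleMatrix e = MatrixFunctions.exp(input);
        return e.diviRowVector(e.columnSums()); // normalize each column to sum to 1
    }

    public static void main(String[] args)
    {
        // Two classes, one sample: a free logit of 1.0 and the pinned 0.0.
        DoubleMatrix logits = new DoubleMatrix(new double[][] {{1.0}, {0.0}});
        System.out.println(softmax(logits)); // roughly [0.7311; 0.2689]
    }
}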

The cost is computed by getLoss. If the predicted output for the target label is p, the per-sample cost, i.e. the error function, is:

$\mathrm{cost} = E(p) = -\log(p)$
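With one-hot labels this is just cross-entropy: multiplying the labels element-wise by the log-probabilities and summing each column picks out $-\log(p)$ of the target class for each sample. A hedged sketch of what getLoss might look like (the actual jrae implementation may differ):

import org.jblas.DoubleMatrix;
import org.jblas.MatrixFunctions;

// Per-sample cross-entropy, one value per column: with one-hot labels
// this evaluates to -log(p_target) for each sample.
public static DoubleMatrix crossEntropyLoss(DoubleMatrix prediction, DoubleMatrix labels)
{
    return labels.mul(MatrixFunctions.log(prediction)).columnSums().neg();
}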

Applying the backpropagation algorithm described earlier, we get, for the weight $w_{ij}$ feeding logit $j$ (shown for $j$ equal to the target label; $\mathrm{label}_j$ is 1 for the target label and 0 otherwise):

$\dfrac{\partial E}{\partial w_{ij}} = \dfrac{\partial E}{\partial p_j}\cdot\dfrac{\partial p_j}{\partial net_j}\cdot x_i = -\dfrac{1}{p_j}\cdot p_j(1-p_j)\cdot x_i = -(1-p_j)\,x_i = (p_j - \mathrm{label}_j)\cdot\mathrm{feature}_i$

This explains the meaning of the following line, since Diff holds exactly $(p_j - \mathrm{label}_j)$ for every sample:

DoubleMatrix Delta = Features.mmul(Diff.transpose());
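Because Diff has one column per sample, this single matrix product sums $\mathrm{feature}_i \cdot (p_j - \mathrm{label}_j)$ over all samples at once. A toy shape check (the sizes are made up for illustration):

import org.jblas.DoubleMatrix;

// Toy shape check for Delta = Features.mmul(Diff.transpose());
// featureLength=4, catSize=3, numDataItems=5 are arbitrary.
public class DeltaShapeSketch
{
    public static void main(String[] args)
    {
        DoubleMatrix features = DoubleMatrix.rand(4, 5); // featureLength x numDataItems
        DoubleMatrix diff     = DoubleMatrix.rand(3, 5); // catSize x numDataItems
        DoubleMatrix delta    = features.mmul(diff.transpose());
        // One gradient entry per weight: featureLength x catSize.
        System.out.println(delta.rows + " x " + delta.columns); // prints "4 x 3"
    }
}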

 

RAECost

First, the implementation:

@Override
public double valueAt(double[] x)
{
    if (!requiresEvaluation(x))
        return value;

    Theta Theta1 = new Theta(x, hiddenSize, visibleSize, dictionaryLength);
    FineTunableTheta Theta2 = new FineTunableTheta(x, hiddenSize, visibleSize, catSize, dictionaryLength);
    Theta2.setWe(Theta2.We.add(WeOrig));

    final RAEClassificationCost classificationCost = new RAEClassificationCost(
            catSize, AlphaCat, Beta, dictionaryLength, hiddenSize, Lambda, f, Theta2);
    final RAEFeatureCost featureCost = new RAEFeatureCost(
            AlphaCat, Beta, dictionaryLength, hiddenSize, Lambda, f, WeOrig, Theta1);

    Parallel.For(DataCell,
        new Parallel.Operation<LabeledDatum<Integer,Integer>>() {
            public void perform(int index, LabeledDatum<Integer,Integer> Data)
            {
                try {
                    LabeledRAETree Tree = featureCost.Compute(Data);
                    classificationCost.Compute(Data, Tree);
                } catch (Exception e) {
                    System.err.println(e.getMessage());
                }
            }
    });

    double costRAE = featureCost.getCost();
    double[] gradRAE = featureCost.getGradient().clone();

    double costSUP = classificationCost.getCost();
    gradient = classificationCost.getGradient();

    value = costRAE + costSUP;
    for (int i = 0; i < gradRAE.length; i++)
        gradient[i] += gradRAE[i];

    System.gc();    System.gc();
    System.gc();    System.gc();
    System.gc();    System.gc();
    System.gc();    System.gc();

    return value;
}

The cost has two parts: featureCost and classificationCost. The program iterates over every sample, calling featureCost.Compute(Data) to build a recursive tree while accumulating cost and gradient, and then classificationCost.Compute(Data, Tree) to compute and accumulate the classification cost and gradient on that tree. The key classes are therefore RAEFeatureCost and RAEClassificationCost.
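Since each sample's tree is built independently, the loop parallelizes cleanly; only the shared cost/gradient accumulators inside featureCost and classificationCost need synchronization. A self-contained toy version of this accumulate-over-samples pattern using standard Java streams (the data and cost function here are made up, not jrae's):

import java.util.stream.IntStream;

public class AccumulateSketch
{
    public static void main(String[] args)
    {
        double[][] samples = {{1, 2}, {3, 4}, {5, 6}};

        // Per-sample costs computed in parallel, then reduced to a sum.
        double cost = IntStream.range(0, samples.length).parallel()
                .mapToDouble(i -> sampleCost(samples[i])).sum();

        // Gradient of the toy cost below: d/ds_j of s_j^2 is 2*s_j,
        // accumulated sequentially to avoid racy writes to a shared array.
        double[] gradient = new double[2];
        for (double[] s : samples)
            for (int j = 0; j < gradient.length; j++)
                gradient[j] += 2 * s[j];

        System.out.println("cost = " + cost);
    }

    // Toy per-sample cost: squared norm of the sample.
    static double sampleCost(double[] s)
    {
        double c = 0;
        for (double v : s) c += v * v;
        return c;
    }
}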

Inside its Compute method, RAEFeatureCost calls RAEPropagation's ForwardPropagate to build a tree, then calls BackPropagate to compute and accumulate the gradients. The details of that algorithm are covered in the next post.
