Deep Learning Study Notes (1): Softmax Regression

  Lost and not knowing what to do, I could see no hope ahead.

  Then I happened to come across Andrew Ng's machine learning course on Coursera and his deep learning tutorial on UFLDL, so I settled down and worked through them: watching the videos one by one, doing the assignments one by one, writing the programs one by one. There was a lot of math I didn't understand and I wasn't familiar with Matlab, so at first progress was as slow as a snail, but after sticking with it for a few months I finally finished. To keep myself from forgetting, I am writing down some notes here. My skills are limited, my Python is not great, and neither is my English, so if anything is wrong or inappropriate, please don't hesitate to point it out.

 

  I am still not entirely clear about the theory behind softmax, whether it comes from information theory or from probability. For now I will settle for a rough understanding and start using it; the underlying theory can be filled in later.

  The basics of softmax:

  For a given input x, a K-class classifier assigns each class k the probability P(y = k | x; θ), namely

  $$P(y=k \mid x;\theta) = \frac{\exp\big(\theta^{(k)\top}x\big)}{\sum_{j=1}^{K}\exp\big(\theta^{(j)\top}x\big)}$$

  The model parameters are $\theta^{(1)}, \theta^{(2)}, \dots, \theta^{(K)} \in \mathbb{R}^{n}$; it is convenient to store θ as a K*n matrix (where n is the dimensionality of the input x, i.e. the number of features).
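  As a concrete illustration, here is a minimal numpy sketch (my own, not from the tutorial) of computing these probabilities, assuming θ is stored as a K×n matrix and X holds one sample per column (n×m); subtracting the column maximum before exponentiating is just a common trick to avoid overflow:

import numpy as np

def softmax_prob(theta, X):
    # theta: K by n, X: n by m  ->  returns a K by m matrix of P(y=k|x)
    Z = np.dot(theta, X)                  # class scores, K by m
    Z = Z - np.max(Z, axis=0)             # stability: shift each column
    expZ = np.exp(Z)
    return expZ / np.sum(expZ, axis=0)    # normalize each column to sum to 1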

  The cost function of softmax regression:

  $$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K} 1\{y^{(i)}=k\}\,\log\frac{\exp\big(\theta^{(k)\top}x^{(i)}\big)}{\sum_{j=1}^{K}\exp\big(\theta^{(j)\top}x^{(i)}\big)}$$

  Here $1\{y^{(i)}=k\}$ is the indicator function: it equals 1 when $y^{(i)}=k$ and 0 otherwise; in other words, it is 1 when the expression inside the braces is true and 0 when it is false.

  The gradient formula:

  $$\nabla_{\theta^{(k)}} J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Big[x^{(i)}\big(1\{y^{(i)}=k\} - P(y^{(i)}=k \mid x^{(i)};\theta)\big)\Big]$$

  While implementing this model I ran into two problems that held me up for a while:

    1. How to implement the indicator function? My approach: convert y into a vector yv with K elements; if y = i, then yv[i] = 1 and all other entries are 0. Multiplying this vector element-wise with the probabilities P in the cost function, and subtracting it from P in the gradient, implements the indicator function (see the sketch after this list).

    2. I was not very fluent with matrix operations, so vectorization took quite some time.
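  To make point 1 concrete, here is a minimal sketch (my own names, not the class code below) of the one-hot trick together with the vectorized cost and gradient; y_mat plays the role of the indicator function, and softmax_prob is the helper defined in the sketch above:

import numpy as np

def onehot(y, K):
    # y: vector of m integer labels in [0, K)  ->  K by m indicator matrix
    y = y.astype(np.int64)
    y_mat = np.zeros((K, y.shape[0]))
    y_mat[y, np.arange(y.shape[0])] = 1
    return y_mat

def cost_and_grad(theta, X, y, K):
    # theta: K by n, X: n by m, y: m labels
    m = X.shape[1]
    P = softmax_prob(theta, X)                 # K by m probabilities (sketch above)
    y_mat = onehot(y, K)
    cost = -np.sum(y_mat * np.log(P)) / m      # indicator picks out log P of the true class
    grad = -np.dot(y_mat - P, X.T) / m         # K by n, matches the gradient formula
    return cost, grad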

  

  Shifting the parameters θ of the probability P by a constant vector leaves the probabilities unchanged, so the parameters θ are redundant. There are two ways to deal with this. The first is to add an L2 (weight decay) penalty term to the cost function and gradient, which introduces an extra free parameter: the penalty coefficient. The second is to fix the parameters of one class to zero, which does not affect the final classification result. My implementation uses the second approach.
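  A quick numpy check of this redundancy (a sketch with made-up sizes, not part of the implementation below): subtracting the same vector psi from every row of theta leaves the probabilities unchanged, which is why one row can be fixed to zero:

import numpy as np

K, n, m = 3, 4, 5
theta = np.random.randn(K, n)
X = np.random.randn(n, m)
psi = np.random.randn(1, n)                 # the same shift applied to every theta^(k)

P1 = softmax_prob(theta, X)                 # softmax_prob from the sketch above
P2 = softmax_prob(theta - psi, X)
print(np.allclose(P1, P2))                  # True: the shift cancels out in the ratio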

  The gradient checking method mentioned in the tutorial is very useful: it verifies whether the cost function and gradient are implemented correctly. Once the gradient check passes, you usually get correct results.
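  The check is the standard central-difference approximation; the checkGradient function at the end of the third code block compares the analytic gradient against

  $$\frac{\partial J(\theta)}{\partial \theta_i} \approx \frac{J(\theta + \epsilon e_i) - J(\theta - \epsilon e_i)}{2\epsilon}, \qquad \epsilon = 10^{-6},$$

  where $e_i$ is the i-th unit vector, and reports the mean absolute difference.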

  The exercises in the UFLDL tutorial are in Matlab. Since I am not familiar enough with Matlab, I implemented them with Python + numpy + scipy. The meaning of the code is explained in the comments.

  The first piece of code is an abstract supervised-learning model class, which can be reused by neural networks and other supervised models.

import numpy as np
from dp.common.optimize import minFuncSGD
import scipy.optimize as spopt

class SupervisedLearningModel(object):

    def flatTheta(self):
        '''
        convert weight and intercept to 1-dim vector
        '''
        pass

    def rebuildTheta(self,theta):
        '''
        overwrite this method in subclasses:
        convert 1-dim theta to weight and intercept
        Parameters:
            theta    - The vector holding the weights and intercept, needed by scipy.optimize functions
        '''

    def cost(self, theta,X,y):
        '''
        This method is used by optimization functions such as fmin_cg and fmin_l_bfgs_b in scipy.optimize
        Parameters:
            theta        - 1-dim vector of weights
            X            - samples, numFeatures by numSamples
            y            - labels,  numSamples elements vector
        return:
            the model cost
        '''
        pass

    def gradient(self, theta,X,y):
        '''
        This method is used by optimization functions such as fmin_cg and fmin_l_bfgs_b in scipy.optimize
        Parameters:
            theta        - 1-dim vector of weights
            X            - samples, numFeatures by numSamples
            y            - labels,  numSamples elements vector
        return:
            the model gradient
        '''
        pass

    def costFunc(self,theta,X,y):
        '''
        This method is used by optimization functions such as minFuncSGD in this package
        Parameters:
            theta        - 1-dim vector of weights
            X            - samples, numFeatures by numSamples
            y            - labels,  numSamples elements vector
        return:
            the model cost and gradient
        '''
        pass

    def predict(self, Xtest):
        '''
        predict the test samples
        Parameters:
            Xtest        - test samples, numFeatures by numSamples
        return:
            the prediction result, a vector with numSamples elements
        '''
        pass

    def performance(self,Xtest,ytest):
        '''
        Before calling this method, the model should be trained
        Parameters:
            Xtest    - The data to be predicted, numFeatures by numData
            ytest    - The true labels of the test data
        return:
            the accuracy in percent
        '''
        pred = self.predict(Xtest)
        return np.mean(pred == ytest) * 100

    def train(self,X,y):
        '''
        use this method to train the model.
        Parameters:
            X            - samples, numFeatures by numSamples
            y            - labels,  numSamples elements vector
        '''
        theta =self.flatTheta()

        ret = spopt.fmin_l_bfgs_b(self.cost, theta, fprime=self.gradient,args=(X,y),m=200,disp=1, maxiter=100)
        opttheta=  ret[0]

        '''
        opttheta = spopt.fmin_cg(self.cost, theta, fprime=self.gradient,args=(X,y),full_output=False,disp=True, maxiter=100)
        '''
        '''
        options=dict()
        options['epochs']=10
        options['alpha'] = 2
        options['minibatch']=256
        opttheta = minFuncSGD(self.costFunc,theta,X,y,options)

        '''
        self.rebuildTheta(opttheta)

  The second piece of code defines a single neural-network layer, NNLayer, which inherits from the SupervisedLearningModel class above. It is used by both softmax regression and multi-layer neural networks.

class NNLayer(SupervisedLearningModel):
    '''
    This class is a single layer of a neural network
    '''
    def __init__(self, inputSize,outputSize,Lambda,actFunc='sigmoid'):
        '''
        Constructor: initialize one layer w.r.t params
        parameters :
            inputSize         - the number of input elements
            outputSize        - the number of outputs
            Lambda            - weight decay parameter
            actFunc           - the activation function: sigmoid, tanh or rectified linear
        '''
        super().__init__()
        self.inputSize = inputSize
        self.outputSize = outputSize
        self.Lambda = Lambda
        self.actFunc=sigmoid
        self.actFuncGradient=sigmodGradient

        self.input=0            #input of this layer
        self.activation=0       #output of this layer
        self.delta=0            #the error term of this layer
        self.W=0                #the weights
        self.b=0                #the intercept (bias)

        if actFunc=='sigmoid':
            self.actFunc =  sigmoid
            self.actFuncGradient = sigmodGradient
        if actFunc=='tanh':
            self.actFunc =  tanh
            self.actFuncGradient =tanhGradient
        if actFunc=='rectfiedLinear':
            self.actFunc =  rectfiedLinear
            self.actFuncGradient =  rectfiedLinearGradient

        #the value of epsilon comes from an empirical formula
        #initialize weights and intercept (bias)
        epsilon_init = 2.4495/np.sqrt(self.inputSize+self.outputSize)*0.001
        theta = np.random.rand(self.outputSize, self.inputSize + 1) * 2 * epsilon_init - epsilon_init
        self.rebuildTheta(theta)

    def flatTheta(self):
        '''
        convert weight and intercept to 1-dim vector
        '''
        W = np.hstack((self.W, self.b))
        return W.ravel()

    def rebuildTheta(self,theta):
        '''
        overwrite the method in SupervisedLearningModel
        convert 1-dim theta to weight and intercept
        Parameters:
            theta    - The vector holding the weights and intercept, needed by scipy.optimize functions
                       size: outputSize*(inputSize+1)
        '''
        W=theta.reshape(self.outputSize,-1)
        self.b=W[:,-1].reshape(self.outputSize,1)   #bias b is a vector with outputSize elements
        self.W = W[:,:-1]

    def forward(self):
        '''
        compute the activations of this layer from self.input,
        whose dimensionality is inputSize by numSamples
        '''
        Z = np.dot(self.W,self.input)+self.b     #Z
        self.activation= self.actFunc(Z)             #activations
        return self.activation

    def backpropagate(self):
        '''
        compute the error term to be passed back to the previous layer.

        assume the current layer number is l;
        self.delta is the error term of layer l+1:
        delta(l) = (W(l).T*delta(l+1)).f'(z)
        If this layer is the first hidden layer, this method should not
        be called.
        f' is rewritten in terms of the activation to avoid a second call
        to the activation function
        '''
        return np.dot(self.W.T,self.delta)*self.actFuncGradient(self.input)

    def layerGradient(self):
        '''
        grad_W(l) = delta(l+1)*input.T
        grad_b(l) = SIGMA(delta(l+1))
        self.input - input of this layer, inputSize by numSamples
        self.delta - the error term of the next layer
        '''
        m=self.input.shape[1]
        gw = np.dot(self.delta,self.input.T)/m
        gb = np.sum(self.delta,1)/m
        #combine gradients of weights and intercepts
        #and flatten it
        grad = np.hstack((gw, gb.reshape(-1,1)))

        return grad


def sigmoid(Z):
    return 1.0 /(1.0 + np.exp(-Z))

def sigmodGradient (a):
    #a = sigmoid(Z)
    return a*(1-a)

def tanh(Z):
    e1=np.exp(Z)
    e2=np.exp(-Z)
    return (e1-e2)/(e1+e2)

def tanhGradient(a):
    return 1-a**2

def rectfiedLinear(Z):
    a = np.zeros(Z.shape)+Z
    a[a<0]=0
    return a

def rectfiedLinearGradient(a):
    b = np.zeros(a.shape)+a
    b[b>0]=1
    return b
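  A hypothetical usage sketch (not from the original post) of how two NNLayer objects could be chained in a forward and a backward pass; the names layer1, layer2, X and delta_out are made up for illustration:

import numpy as np

layer1 = NNLayer(inputSize=784, outputSize=100, Lambda=0, actFunc='sigmoid')
layer2 = NNLayer(inputSize=100, outputSize=10, Lambda=0, actFunc='sigmoid')

X = np.random.rand(784, 5)             # 5 fake samples, 784 features each

# forward pass: the activation of layer 1 becomes the input of layer 2
layer1.input = X
layer2.input = layer1.forward()
out = layer2.forward()                 # 10 by 5 output activations

# backward pass: given some error term for the output layer ...
delta_out = out - np.eye(10)[:, :5]    # fake targets, only for shape
layer2.delta = delta_out
layer1.delta = layer2.backpropagate()  # delta(l) = (W(l).T*delta(l+1)) .* f'(z(l))

# per-layer gradients of weights and bias, already averaged over samples
grad2 = layer2.layerGradient()
grad1 = layer1.layerGradient()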

  The third piece of code is the implementation of softmax regression, which inherits from NNLayer.

import numpy as np
#import scipy.optimize as spopt
from dp.supervised import NNBase
from time import time
#from dp.common.optimize import minFuncSGD
class SoftmaxRegression(NNBase.NNLayer):
    '''
    We assume the weights of the last class to be zeros in this implementation.
    The weight decay is not used here.

    '''
    def __init__(self, numFeatures, numClasses,Lambda=0):
        '''
        Initialization of weights, intercepts and other members
        Parameters:
            numFeatures   - The number of input features
            numClasses    - The number of classes to be classified
            Lambda        - weight decay parameter (not used here)
        '''

        # call the super constructor to initialize the weights and intercepts
        # We do not need the weights and intercept of the last class
        super().__init__(numFeatures, numClasses - 1, Lambda, None)

        #self.X=0
        self.y_mat=0

    def predict(self, Xtest):
        '''
        Prediction.
        Before calling this method, the model should be trained
        Parameter:
            Xtest    - The data to be predicted, numFeatures by numData
        '''
        Z = np.dot(self.W, Xtest) + self.b
        #add the scores of the last class, they are all zeros
        lastClass = np.zeros((1, Xtest.shape[1]))
        Z = np.vstack((Z, lastClass))
        #get the index of the max value in each column, it is the prediction
        return np.argmax(Z, 0)

    def forward(self):
        '''
        get the matrix of the softmax hypothesis;
        this method will be called by the cost and gradient methods
        '''
        h = np.dot(self.W, self.input) + self.b
        h = np.exp(h)
        #add the unnormalized probabilities of the last class, they are all ones
        h = np.vstack((h, np.ones((1, self.input.shape[1]))))
        #the normalizer: the sum over all classes
        hsum = np.sum(h, axis=0)
        #get the probability of each class
        self.activation = h / hsum
        #delta = -(self.y_mat-h)
        self.delta = self.activation - self.y_mat
        self.delta=self.delta[:-1, :]

        return self.activation

    def setTrainingLabels(self,y):
        # convert the vector y to a matrix y_mat.
        # For sample i, if it belongs to the k-th class,
        # y_mat[k,i]=1 and y_mat[j,i]=0 for j!=k
        y = y.astype(np.int64)
        m=y.shape[0]
        yy = np.arange(m)
        self.y_mat = np.zeros((self.outputSize+1, m))
        self.y_mat[y, yy] = 1

    def softmaxforward(self,theta,X,y):
        self.input = X
        self.setTrainingLabels(y)
        self.rebuildTheta(theta)
        return self.forward()

    def cost(self, theta,X,y):
        '''
        The cost function.
        Parameters:
            theta    - The vector holding the weights and intercept, needed by scipy.optimize functions
                       size: (numClasses - 1)*(numFeatures + 1)
        '''
        h = np.log(self.softmaxforward(theta,X,y))
        #h * self.y_mat applies the indicator function
        cost = -np.sum(h *self.y_mat, axis=(0, 1))

        return cost / X.shape[1]

    def gradient(self, theta,X,y):
        '''
        The gradient function.
        Parameters:
            theta    - The vector holding the weights and intercept, needed by scipy.optimize functions
                       size: (numClasses - 1)*(numFeatures + 1)
        '''
        self.softmaxforward(theta,X,y)

        #get the gradient
        grad = super().layerGradient()

        return grad.ravel()

    def costFunc(self,theta,X,y):

        grad=self.gradient(theta, X, y)
        h=np.log(self.activation)
        cost = -np.sum(h * self.y_mat, axis=(0, 1))/X.shape[1]
        return cost,grad


def checkGradient(X,y):

    sm = SoftmaxRegression(X.shape[0], 10)
    #W = np.hstack((sm.W, sm.b))
    #sm.setTrainData(X, y)
    theta = sm.flatTheta()
    #grad = sm.gradient(theta,X, y)
    cost,grad=sm.costFunc(theta, X, y)
    numgrad = np.zeros(grad.shape)

    e = 1e-6

    for i in range(np.size(grad)):
        theta[i]=theta[i]-e
        loss1,g1 =sm.costFunc(theta,X, y)
        theta[i]=theta[i]+2*e
        loss2,g2 = sm.costFunc(theta,X, y)
        theta[i]=theta[i]-e

        numgrad[i] = (-loss1 + loss2) / (2 * e)

    print(np.sum(np.abs(grad-numgrad))/np.size(grad))

 

The tests use the MNIST dataset. The resulting accuracy is around 92.5%.

Test code:

 

X = np.load('../../common/trainImages.npy') / 255
y = np.load('../../common/trainLabels.npy')
'''
X1=X[:,:10]
y1=y[:10]
checkGradient(X1,y1)
'''
Xtest = np.load('../../common/testImages.npy') / 255
ytest = np.load('../../common/testLabels.npy')
sm = SoftmaxRegression(X.shape[0], 10)
t0=time()
sm.train(X,y)
print('training Time %.5f s' %(time()-t0))

print('test acc :%.3f%%' % (sm.performance(Xtest,ytest)))

 

 

 

References:

 

Softmax Regression http://ufldl.stanford.edu/tutorial/supervised/SoftmaxRegression/
