Multi-class classification with logistic regression: a walkthrough of the Theano code

NLPanda

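Model
Logistic regression is a probabilistic linear classifier. For an input $x$, the probability of class $i$ is a softmax over the linear scores: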

$P(Y=i|x,W,b) = softmax_i(Wx+b) = \frac{e^{W_i x + b_i}}{\sum_j e^{W_j x + b_j}}$

$W$ is the weight matrix and $b$ is the bias vector.
The model predicts the class with the highest probability:

$y_{pred} = \mathrm{argmax}_i P(Y=i|x,W,b)$
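Code:

# Initialise W with zeros; W is a matrix of shape (n_in, n_out)
self.W = theano.shared(
    value=numpy.zeros(
        (n_in, n_out),
        dtype=theano.config.floatX
    ),
    name='W',
    borrow=True
)
# Initialise b with zeros; b is a vector of length n_out
self.b = theano.shared(
    value=numpy.zeros(
        (n_out,),
        dtype=theano.config.floatX
    ),
    name='b',
    borrow=True
)
# Compute P(Y|x,W,b). Each row of input is one sample, so
# T.dot(input, self.W) yields one row of class scores per sample.
self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)
# Compute y_pred: the most probable class of each sample (row)
self.y_pred = T.argmax(self.p_y_given_x, axis=1)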
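
The symbolic expressions above can be checked by hand on a tiny example. The following is a minimal sketch in plain Python, not Theano; the helper names `softmax`, `predict_proba` and `predict` are hypothetical and only mirror what `T.nnet.softmax`, `T.dot` and `T.argmax` compute for a single sample. It assumes `W` is stored with shape (n_in, n_out), as in the article's code.

```python
import math


def softmax(scores):
    """Return softmax probabilities for one list of class scores."""
    # Subtracting the maximum leaves the result unchanged but avoids overflow.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(x, W, b):
    """Compute P(Y=j | x, W, b) for every class j, for one sample x.

    W has shape (n_in, n_out), so the score of class j is
    sum_k x[k] * W[k][j] + b[j].
    """
    scores = [sum(x[k] * W[k][j] for k in range(len(x))) + b[j]
              for j in range(len(b))]
    return softmax(scores)


def predict(x, W, b):
    """Return the index of the most probable class (the argmax)."""
    p = predict_proba(x, W, b)
    return max(range(len(p)), key=lambda i: p[i])


# With W and b initialised to zero, all class scores are equal,
# so every class receives the same probability.
W0 = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
b0 = [0.0, 0.0, 0.0]
p0 = predict_proba([1.0, 2.0], W0, b0)
```

Because the zero initialisation gives a uniform distribution, the predictions only become informative once training has moved `W` and `b` away from zero.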

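Defining the loss function
Multi-class logistic regression is usually trained with the negative log-likelihood as the loss; minimizing it is equivalent to maximizing the likelihood of the data.
Log-likelihood: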

$L(\theta=\{W,b\},D) = \sum_{i=0}^{|D|-1} \log(P(Y=y^{(i)}|x^{(i)},W,b))$

The loss is its negative:

$l(\theta=\{W,b\},D) = -L(\theta=\{W,b\},D)$
Code:

# y.shape[0] is (symbolically) the number of rows in y, i.e. the number
# of samples n in the minibatch
# T.arange(y.shape[0]) is the vector [0, 1, 2, ..., n-1]
# T.log(self.p_y_given_x) is a matrix of log-probabilities, called LP below,
# with one row per sample and one column per class
# LP[T.arange(y.shape[0]), y] is the vector v = [LP[0, y[0]], LP[1, y[1]],
# LP[2, y[2]], ..., LP[n-1, y[n-1]]]
# T.mean(LP[T.arange(y.shape[0]), y]) is the mean of the elements of v
return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])
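
The indexing trick in this expression is easier to follow on concrete numbers. Below is a small sketch in plain Python (not Theano) of the same computation; the function name is reused for readability, while `probs` and `labels` are made-up illustrative values. Note that the code takes the mean rather than the sum used in the formula, so the value is the formula's loss divided by the number of samples.

```python
import math


def negative_log_likelihood(p_y_given_x, y):
    """Mean negative log-probability of the correct classes.

    p_y_given_x: list of rows, one row of class probabilities per sample.
    y: list of integer class labels, one per sample.
    """
    n = len(y)
    # v[i] = LP[i][y[i]]: the log-probability of the correct class of sample i
    v = [math.log(p_y_given_x[i][y[i]]) for i in range(n)]
    return -sum(v) / n


# Two samples, three classes (illustrative values only)
probs = [[0.7, 0.2, 0.1],
         [0.1, 0.8, 0.1]]
labels = [0, 1]
loss = negative_log_likelihood(probs, labels)  # -(log 0.7 + log 0.8) / 2
```

Only the entry in each row that belongs to the correct class contributes to the loss; the other probabilities matter only through the softmax normalisation.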

Creating the logistic regression class

import numpy
import theano
import theano.tensor as T


class LogisticRegression(object):
    """
    Multi-class logistic regression
    """
    def __init__(self, input, n_in, n_out):
        # Initialise W with zeros; W is a matrix of shape (n_in, n_out)
        self.W = theano.shared(
            value=numpy.zeros(
                (n_in, n_out),
                dtype=theano.config.floatX
            ),
            name='W',
            borrow=True
        )
        # Initialise b with zeros; b is a vector of length n_out
        self.b = theano.shared(
            value=numpy.zeros(
                (n_out,),
                dtype=theano.config.floatX
            ),
            name='b',
            borrow=True
        )
        # Compute P(Y|x,W,b)
        self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)
        # Compute y_pred
        self.y_pred = T.argmax(self.p_y_given_x, axis=1)

        # Model parameters
        self.params = [self.W, self.b]

    # T.log(self.p_y_given_x) is a matrix of log-probabilities with one row
    # per sample and one column per class, so T.log(self.p_y_given_x)[0, 1]
    # is the log-probability that sample 0 belongs to class 1.
    # y.shape[0] is the number of samples, and
    # T.log(self.p_y_given_x)[T.arange(y.shape[0]), y] holds the
    # log-probability of the correct class of each sample.
    def negative_log_likelihood(self, y):
        return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])

    def errors(self, y):
        # Check that y has the same number of dimensions as y_pred
        if y.ndim != self.y_pred.ndim:
            raise TypeError(
                'y should have the same shape as self.y_pred',
                ('y', y.type, 'y_pred', self.y_pred.type)
            )
        # Check that y has the correct data type
        if y.dtype.startswith('int'):
            # T.neq returns a vector of 0s and 1s, where 1 marks a
            # wrong prediction
            return T.mean(T.neq(self.y_pred, y))
        else:
            raise NotImplementedError()
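
The quantity returned by `errors` is the fraction of misclassified samples in the minibatch. As a rough illustration in plain Python (not Theano), with a hypothetical helper `error_rate` and made-up labels:

```python
def error_rate(y_pred, y):
    """Fraction of positions where the prediction differs from the label."""
    if len(y_pred) != len(y):
        raise TypeError('y should have the same shape as y_pred')
    # 1 marks a wrong prediction, 0 a correct one (the role of T.neq)
    mismatches = [1 if p != t else 0 for p, t in zip(y_pred, y)]
    return sum(mismatches) / len(y)


# One of the four predictions is wrong
rate = error_rate([0, 2, 1, 1], [0, 1, 1, 1])
```

Unlike the negative log-likelihood, this zero-one error is not differentiable, which is why it is used for evaluation while the log-likelihood is used for training.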

The class is instantiated as follows:

 x = T.matrix('x')  
 y = T.ivector('y')
 classifier = LogisticRegression(input=x, n_in=28 * 28, n_out=10)

The cost is obtained from classifier.negative_log_likelihood:

 cost = classifier.negative_log_likelihood(y)

Training the model
In Theano, the gradients $\partial l/\partial W$ and $\partial l/\partial b$ can be computed as follows:

g_W = T.grad(cost=cost, wrt=classifier.W)
g_b = T.grad(cost=cost, wrt=classifier.b)
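
`T.grad` derives these gradients symbolically. For this particular model they also have a well-known closed form: the gradient with respect to the score of class j for a sample is the predicted probability minus 1 if j is the correct class (minus 0 otherwise). The sketch below, in plain Python with made-up data and hypothetical helper names, computes that closed form for the mean negative log-likelihood and compares one entry against a finite-difference estimate. It illustrates what the gradients are, not how Theano computes them internally.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(X, W, b):
    """Class probabilities for every row of X; W has shape (n_in, n_out)."""
    return [softmax([sum(x[k] * W[k][j] for k in range(len(x))) + b[j]
                     for j in range(len(b))]) for x in X]


def cost(X, y, W, b):
    """Mean negative log-likelihood over the samples."""
    P = forward(X, W, b)
    return -sum(math.log(P[i][y[i]]) for i in range(len(y))) / len(y)


def gradients(X, y, W, b):
    """Closed-form gradients of the mean NLL with respect to W and b."""
    n, n_in, n_out = len(X), len(W), len(b)
    P = forward(X, W, b)
    g_W = [[0.0] * n_out for _ in range(n_in)]
    g_b = [0.0] * n_out
    for i in range(n):
        for j in range(n_out):
            # Derivative of the cost w.r.t. the score of class j for sample i
            delta = (P[i][j] - (1.0 if y[i] == j else 0.0)) / n
            g_b[j] += delta
            for k in range(n_in):
                g_W[k][j] += X[i][k] * delta
    return g_W, g_b


# Made-up data: 3 samples, 2 features, 3 classes
X = [[1.0, 2.0], [0.5, -1.0], [-1.5, 0.3]]
y = [0, 2, 1]
W = [[0.1, -0.2, 0.3], [0.0, 0.4, -0.1]]
b = [0.05, -0.05, 0.0]
g_W, g_b = gradients(X, y, W, b)

# Central finite-difference estimate of the entry g_W[1][2]
eps = 1e-6
W_plus = [row[:] for row in W]
W_plus[1][2] += eps
W_minus = [row[:] for row in W]
W_minus[1][2] -= eps
numeric = (cost(X, y, W_plus, b) - cost(X, y, W_minus, b)) / (2 * eps)
```

The two values, `numeric` and `g_W[1][2]`, should agree up to the accuracy of the finite-difference approximation.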

One step of gradient descent is then carried out by the function train_model, defined as follows:

    updates = [(classifier.W, classifier.W - learning_rate * g_W),
               (classifier.b, classifier.b - learning_rate * g_b)]
    train_model = theano.function(
        inputs=[index],
        outputs=cost,
        updates=updates,
        givens={
            # x is replaced by train_set_x[index * batch_size: (index + 1) * batch_size]
            # y is replaced by train_set_y[index * batch_size: (index + 1) * batch_size]
            x: train_set_x[index * batch_size: (index + 1) * batch_size],
            y: train_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )
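
Two things happen on each call to `train_model`: `givens` selects the minibatch numbered `index`, and `updates` moves each parameter one step against its gradient. The following plain-Python sketch (not Theano) imitates both on toy values; `minibatch`, `sgd_step` and the numbers used are illustrative assumptions, not part of the original code.

```python
def minibatch(data, index, batch_size):
    """Return the slice that `givens` would substitute for batch `index`."""
    return data[index * batch_size: (index + 1) * batch_size]


def sgd_step(params, grads, learning_rate):
    """Return the updated parameters: p - learning_rate * g for each pair."""
    return [p - learning_rate * g for p, g in zip(params, grads)]


train_set_x = list(range(10))   # stand-in for 10 training samples
batch_size = 4
# Only full minibatches are used; the last 2 samples are left over here
n_train_batches = len(train_set_x) // batch_size
batches = [minibatch(train_set_x, i, batch_size)
           for i in range(n_train_batches)]

# One update of two scalar "parameters" with gradient 0.5 each
new_params = sgd_step([1.0, -2.0], [0.5, 0.5], learning_rate=0.1)
```

In the Theano version the same update is applied in place to the shared variables `classifier.W` and `classifier.b` every time `train_model(index)` is called.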

Testing the model

    test_model = theano.function(
        inputs=[index],
        outputs=classifier.errors(y),
        givens={
            x: test_set_x[index * batch_size: (index + 1) * batch_size],
            y: test_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )

    validate_model = theano.function(
        inputs=[index],
        outputs=classifier.errors(y),
        givens={
            x: valid_set_x[index * batch_size: (index + 1) * batch_size],
            y: valid_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )

For the complete code, see
