Linear Models (Linear Regression && Logistic Regression)

Overview

Given an example described by d attributes, x = (x_{1}, x_{2}, ..., x_{d}), where x_{i} is the value of x on the i-th attribute, a linear model tries to learn a prediction function that is a linear combination of the attributes. Many nonlinear models can be obtained from a linear model by introducing hierarchical structures or high-dimensional mappings.

y = f(x) = \omega^{T}x + b = \omega_{1}x_{1} + \omega_{2}x_{2} + ... + \omega_{d}x_{d} + b \qquad (1)

1. Linear Regression

Linear regression tries to make f(x_{i})\simeq y_{i}. The mean squared error (MSE) is the most commonly used performance measure for regression tasks.

MSE = \frac{1}{m}\sum_{i=1}^{m}(f(x_{i})-y_{i})^{2}

The method of solving for the model by minimizing the mean squared error is called the least squares method. In linear regression, least squares tries to find a straight line that minimizes the total Euclidean distance from all samples to the line.

(\omega^{*},b^{*})=arg\min_{(\omega,b)}\sum_{i=1}^{m}(f(x_{i})-y_{i})^{2}

The process of solving for \omega and b that minimize the MSE is called the least squares parameter estimation of the linear regression model. Writing E_{\omega,b} = \sum_{i=1}^{m}(y_{i}-\omega x_{i}-b)^{2} and taking the partial derivatives with respect to \omega and b:

\frac{\partial E_{\omega,b}}{\partial \omega} = 2\left ( \omega\sum_{i=1}^{m}x_{i}^{2} - \sum_{i=1}^{m}(y_{i}-b)x_{i} \right )=0

\frac{\partial E_{\omega,b}}{\partial b} = 2\left (mb - \sum_{i=1}^{m}(y_{i}-\omega x_{i} )\right )=0

Setting these derivatives to zero yields the closed-form solutions for the optimal \omega and b:

\omega = \frac{\sum_{i=1}^{m}y_{i}(x_{i}-\overline{x})}{\sum_{i=1}^{m}x_{i}^{2} - \frac{1}{m}\left ( \sum_{i=1}^{m}x_{i} \right )^{2}}, b=\frac{1}{m}\sum_{i=1}^{m}(y_{i }- \omega x_{i}),\overline{x} = \frac{1}{m}\sum_{i=1}^{m}x_{i}
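A minimal NumPy sketch of these closed-form formulas (the function name and toy data are illustrative, not part of the implementation later in this post):

import numpy as np

def least_squares_1d(x, y):
    # closed-form least squares for a single feature: y ≈ w*x + b
    x_mean = np.mean(x)
    w = np.sum(y * (x - x_mean)) / (np.sum(x ** 2) - np.sum(x) ** 2 / len(x))
    b = np.mean(y - w * x)
    return w, b

# points lying exactly on y = 2x + 1 recover w = 2, b = 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1
print(least_squares_1d(x, y))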

Normal Equation

Absorb \omega and b into the vector form \widehat{\omega} = (\omega;b). Correspondingly, the dataset D is represented as an m\times(d+1) matrix X, where each row corresponds to one example: the first d elements of a row are the example's d attribute values, and the last element is always set to 1.

Minimizing the mean squared error then gives

\widehat{\omega}^{*} = arg\min_{\widehat{\omega}}(y-X\widehat{\omega})^{T}(y-X\widehat{\omega})

Taking the derivative of the above with respect to \widehat{\omega}:

\frac{\partial E_{\widehat{\omega}}}{\partial \widehat{\omega}} = 2X^{T}(X\widehat{\omega}-y)

Setting the above to zero yields the closed-form optimal solution, provided X^{T}X is a full-rank (or positive definite) matrix:

\widehat{\omega}^{*} = (X^{T}X)^{-1}X^{T}y

The equation above is known as the normal equation.
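A minimal NumPy sketch of the normal equation (an illustrative example; np.linalg.pinv is used so the rank-deficient case is also covered):

import numpy as np

def normal_equation(X, y):
    # append the constant-1 column, then solve w_hat = (X^T X)^{-1} X^T y
    X_b = np.hstack((X, np.ones((X.shape[0], 1))))
    return np.linalg.pinv(X_b.T @ X_b) @ X_b.T @ y

# targets generated from y = 1*x1 + 2*x2 + 3; the solver recovers [1, 2, 3]
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = X @ np.array([1.0, 2.0]) + 3.0
print(normal_equation(X, y))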

Generalized Linear Models

Suppose the output y associated with x varies on an exponential scale. To make the prediction better approximate the true label y, we can let

y = exp(\omega^{T}x + b)

The above is still linear regression in form, but in essence it learns a nonlinear mapping from input to output. More generally, consider a monotonic, differentiable function g(\cdot):

y = g^{-1}(\omega^{T}x + b)                                                  

Such a model is called a "generalized linear model", and the function g(\cdot) is called the "link function".
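As a small illustrative sketch (an assumed example, not taken from the text above): with the log link, the linear part can be fitted by ordinary least squares on ln y, and predictions are mapped back through g^{-1} = exp:

import numpy as np

# targets that vary on an exponential scale: y = exp(0.5*x + 1)
x = np.linspace(0.0, 4.0, 50)
y = np.exp(0.5 * x + 1.0)

# fit ln(y) = w*x + b by least squares, then invert the link with exp
w, b = np.polyfit(x, np.log(y), deg=1)
y_pred = np.exp(w * x + b)
print(w, b)  # approximately 0.5 and 1.0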

Gradient Descent

Hypothesis: h_{\theta}(x) = \theta_{0}x_{0}+\theta_{1}x_{1}+...+\theta_{n}x_{n}, where x_{0} = 1

Cost function: J(\theta_{0},...,\theta_{n}) = \frac{1}{2m}\sum_{i=1}^{m}(h_{\theta}(x_{i})-y_{i})^{2}

Gradient descent: repeat the following update until convergence

\theta _{j} := \theta _{j}-\alpha\frac{\partial }{\partial \theta _{j}}J(\theta _{0},...,\theta_{n})

When j = 0, \theta_{0} plays the role of the bias b: \theta_{0} := \theta_{0}-\alpha\frac{1}{m}\sum_{i=1}^{m}(h_{\theta}(x_{i})-y_{i}); for j > 0: \theta_{j} := \theta_{j}-\alpha\frac{1}{m}\sum_{i=1}^{m}(h_{\theta}(x_{i})-y_{i})\cdot x_{i,j}, where x_{i,j} is the j-th attribute of the i-th example.
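Before the full from-scratch implementation below, here is a minimal vectorized form of the same update (an illustrative sketch; the code below performs the identical update feature by feature):

import numpy as np

def batch_gradient_descent(X, y, alpha=0.1, num_iter=400):
    # theta[0] is the bias; X_b prepends the constant column x_0 = 1
    X_b = np.hstack((np.ones((X.shape[0], 1)), X))
    theta = np.zeros(X_b.shape[1])
    m = X_b.shape[0]
    for _ in range(num_iter):
        grad = X_b.T @ (X_b @ theta - y) / m   # (1/m) * X^T (X*theta - y)
        theta -= alpha * grad
    return theta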


import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_squared_error, r2_score


def load_DataSet():

    diabetes = load_diabetes()
    diabetes_X = diabetes.data[:, np.newaxis, 2]  # use only one feature column
    # Split the data into training/testing sets
    diabetes_X_train = diabetes_X[:-20]
    diabetes_X_test = diabetes_X[-20:]

    # Split the targets into training/testing sets
    diabetes_y_train = diabetes.target[:-20]
    diabetes_y_test = diabetes.target[-20:]
    return diabetes_X_train,diabetes_X_test,diabetes_y_train,diabetes_y_test

def feature_scaling(dataSet):
    # standardize each feature to zero mean and unit variance (optional helper, not used in main)
    data_arr = np.array(dataSet, dtype=float)
    mu = np.mean(data_arr, 0)
    sigma = np.std(data_arr, 0)
    return (data_arr - mu) / sigma

class self_LinearRegression():
    def __init__(self,alpha=1, num_iter = 400):
        self.alpha = alpha
        self.num_iter = num_iter

    def _gradient_descent(self,dataSet, classlist):
        example_num, feature_num = dataSet.shape
        # prepend a column of ones so that weights[0] plays the role of the bias b
        dataSet_b = np.hstack((np.ones((example_num,1)),dataSet))
        weights = np.zeros((feature_num+1,1))
        weightsTemp = np.zeros((feature_num+1,1))

        for i in range(self.num_iter):
            matrixError = np.dot(dataSet_b,weights)-classlist[:,np.newaxis]
            for j in range(feature_num+1):
                matrixSumTerm = np.reshape(dataSet_b[:, j],(1,example_num))
                weightsTemp[j] = weights[j] - self.alpha / example_num * np.dot(matrixSumTerm,matrixError)
            weights = weightsTemp.copy()   # copy so the next iteration does not alias weights and weightsTemp
        return weights

    def fit(self,X_train, Y):
        weights = self._gradient_descent(X_train,Y)
        self.coef_ = weights

    def predict(self,X_test):
        pred = np.dot(X_test,self.coef_[1:]) + self.coef_[0]
        return pred

if __name__ =="__main__":

    diabetes_X_train, diabetes_X_test, diabetes_y_train, diabetes_y_test = load_DataSet()

    self_model = self_LinearRegression()
    self_model.fit(diabetes_X_train,diabetes_y_train)
    diabetes_y_pred =self_model.predict(diabetes_X_test)
    print('Coefficients: \n', self_model.coef_)
    print("Mean squared error: %.2f" % mean_squared_error(diabetes_y_test, diabetes_y_pred))
    print('Variance score: %.2f' % r2_score(diabetes_y_test, diabetes_y_pred))

    plt.scatter(diabetes_X_test, diabetes_y_test, color='red')
    plt.plot(diabetes_X_test, diabetes_y_pred, color='blue', linewidth=3)


    plt.show()

2. Logistic Regression

For classification tasks, following the generalized linear model, we only need a monotonic, differentiable function g(\cdot) that links the true label y of the classification task to the prediction of the linear regression model.

Logit Regression (Log-Odds Regression)

For binary classification we need to convert f(x)=\omega^{T}x+b into a 0/1 value; the ideal choice is the unit step function

y = \left\{\begin{matrix} 0, & z<0 \\ 0.5, & z=0 \\ 1, & z>0 \end{matrix}\right. ,\quad z = \omega^{T}x+b

However, the unit step function is discontinuous and non-differentiable, so it cannot serve as the link function of a generalized linear model. The logistic (log-odds) function is the commonly used surrogate:

y = \frac{1}{1+e^{-z}}

The logistic function is a kind of sigmoid function. Substituting it for g^{-1}(\cdot) in the generalized linear model and rearranging gives

ln\frac{y}{1-y} = \omega^{T}x+b

Here ln\frac{y}{1-y} is the "log odds" (logit). Binary classification is thus using the prediction of a linear regression model to approximate the log odds of the true label, hence the name "logit regression" (logistic regression).

Estimating the Optimal Parameters by Maximum Likelihood

If we regard y in the log odds above as the posterior probability estimate p(y=1|x), the log odds can be rewritten as

ln\frac{p(y=1|x)}{p(y=0|x)} = \omega^{T}x+b,p(y=1|x) =\frac{e^{\omega^{T}x+b}}{1+e^{\omega^{T}x+b}},p(y=0|x) = \frac{1}{1+e^{\omega^{T}x+b}}

As in the Bayesian chapter, we can estimate (\omega;b) by the maximum likelihood method: the logit regression model maximizes the "log-likelihood"

l(\omega,b) = \sum_{i=1}^{m}lnP(y_{i}|x_{i};\omega,b)

That is, the probability of each sample belonging to its true label should be as large as possible. For convenience, let \beta = (\omega;b) and \widehat{x} = (x;1), so that \omega^{T}x+b= \beta^{T}\widehat{x}, and write p_{1}(\widehat{x};\beta) = p(y=1|\widehat{x};\beta) and p_{0}(\widehat{x};\beta) = p(y=0|\widehat{x};\beta) = 1-p_{1}(\widehat{x};\beta). The likelihood term can then be rewritten as

p(y_{i}|x_{i};\omega,b) = y_{i}p_{1}(\widehat{x_{i}};\beta)+(1-y_{i})p_{0}(\widehat{x_{i}};\beta)
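Since y_{i}\in\{0,1\}, substituting p_{1} and p_{0} and taking the logarithm gives (a brief intermediate step, following the standard derivation):

ln\, p(y_{i}|x_{i};\omega,b) = y_{i}\beta^{T}\widehat{x_{i}} - ln\left(1+e^{\beta^{T}\widehat{x_{i}}}\right)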

Maximizing the log-likelihood is therefore equivalent to minimizing

l(\beta)=\sum_{i=1}^{m}\left ( -y_{i}\beta^{T}\widehat{x_{i}} + ln\left ( 1+e^{\beta^{T}\widehat{x_{i}}} \right ) \right )

This is a continuous convex function of \beta with derivatives of all orders. By convex optimization theory, classical numerical optimization algorithms such as gradient descent and Newton's method can find its optimum:

\beta^{*} = arg\min_{\beta}l(\beta)
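For the gradient-based solvers below, the gradient of l(\beta) is

\frac{\partial l(\beta)}{\partial \beta} = \sum_{i=1}^{m}\widehat{x_{i}}\left(p_{1}(\widehat{x_{i}};\beta)-y_{i}\right)

which is exactly the "prediction minus label, times feature" term used in the implementations.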

Gradient Descent

The update rule has the same form as in linear regression, except that h_{\theta}(x) is now the logistic (sigmoid) function h_{\theta}(x) = \frac{1}{1+e^{-\theta^{T}x}}.


import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.metrics import mean_squared_error, r2_score

def load_dataset():
    iris = load_iris()
    dataMat ,labelList= iris.data[:100,:2],iris.target[:100,np.newaxis]
    X_train = np.vstack((dataMat[:40],dataMat[60:]))
    X_test = dataMat[40:60]
    y_train = np.vstack((labelList[:40],labelList[60:]))
    y_test = labelList[40:60]
    return X_train,X_test,y_train,y_test


class self_LogisticRegression():
    def __init__(self):
        self.num_iter = 500

    def _sigmoid(self,x):
        return 1.0 / (1 + np.exp(-x))


    def _gradient_ascent(self,dataMat, classList):
        """
        Gradient ascent on the log-likelihood, written here as descent on the
        negative log-likelihood (error = prediction - label), which is equivalent.
        :param dataMat: feature matrix, one example per row
        :param classList: 0/1 labels, shape (m, 1)
        :return: learned weights; weights[0] corresponds to the bias b
        """
        example_num,feature_num = np.shape(dataMat)

        dataMarix_b = np.hstack((np.ones((example_num,1)),dataMat))
        weights = np.zeros((feature_num+1,1))
        weightsTemp = np.zeros((feature_num+1,1))
        for i in range(self.num_iter):
            alpha = 4 / (1.0 + i) + 0.01                                 # decaying step size
            predict = self._sigmoid(np.dot(dataMarix_b,weights))         # matrix product over all samples, no explicit summation needed
            Matrierror = predict - classList
            for j in range(feature_num+1):
                matrixSumTerm = np.reshape(dataMarix_b[:, j],(1,example_num))
                weightsTemp[j] = weights[j] - alpha / example_num * np.dot(matrixSumTerm,Matrierror)
            weights = weightsTemp.copy()   # copy so later in-place updates do not alias weights
        return weights

    def _stochastic_gradient_ascent(self,dataMat, classList):
        """
        Stochastic gradient version: update the weights one randomly chosen
        sample at a time, with a decaying step size.
        :param dataMat: feature matrix, one example per row
        :param classList: 0/1 labels, shape (m, 1)
        :return: learned weights; weights[0] corresponds to the bias b
        """
        example_num, feature_num = np.shape(dataMat)

        dataMarix_b = np.hstack((np.ones((example_num, 1)), dataMat))
        weights = np.zeros((feature_num + 1, 1))
        for j in range(self.num_iter):
            dataIndex = list(range(example_num))
            for i in range(example_num):
                alpha = 4/(1.0+j+i)+0.01                                 # shrink the step size as training proceeds
                randIndex = int(np.random.uniform(0,len(dataIndex)))     # position in the list of remaining samples
                sampleIndex = dataIndex[randIndex]                       # row index of the randomly chosen sample
                predict = self._sigmoid(np.dot(dataMarix_b[sampleIndex],weights))
                error = predict - classList[sampleIndex,:]
                for k in range(feature_num + 1):                         # k indexes features; j is the epoch counter
                    weights[k] = weights[k] - alpha * error*dataMarix_b[sampleIndex][k]
                del (dataIndex[randIndex])                               # sample without replacement within an epoch
        return weights

    def fit(self,X_train, Y):
        weights = self._stochastic_gradient_ascent(X_train,Y)
        self.coef_ = weights

    def predict(self,X_test):
        # return the estimated probability p(y=1|x) for each test example
        pred = self._sigmoid(np.dot(X_test,self.coef_[1:]) + self.coef_[0])
        return pred

def plotBestFit(weights):
    X_train, X_test, y_train, y_test = load_dataset()
    dataArr = np.vstack((X_train,X_test))
    labelMat = np.vstack((y_train,y_test))
    n = np.shape(dataArr)[0]
    xcord1,ycord1,xcord2,ycord2 = [],[],[],[]
    for i in range(n):
        if labelMat[i] == 1:
            xcord1.append(dataArr[i,0])
            ycord1.append(dataArr[i,1])
        else:
            xcord2.append(dataArr[i,0])
            ycord2.append(dataArr[i,1])
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(xcord1,ycord1,s=30,c='red',marker='s')
    ax.scatter(xcord2,ycord2,s=30,c='green')
    x = np.arange(4,8,0.1)
    # decision boundary: w0 + w1*x1 + w2*x2 = 0  =>  x2 = (-w0 - w1*x1) / w2
    y = (-weights[0]-weights[1]*x)/weights[2]
    ax.plot(x,y.transpose())
    plt.xlabel('x1')
    plt.ylabel('x2')
    plt.show()

if __name__ =="__main__":
    X_train, X_test, y_train, y_test = load_dataset()
    self_model = self_LogisticRegression()
    self_model.fit(X_train,y_train)
    y_pred = self_model.predict(X_test)
    print('Coefficients: \n', self_model.coef_)
    print("Mean squared error: %.2f" % mean_squared_error(y_test, y_pred))
    print('Variance score: %.2f' % r2_score(y_test, y_pred))
    plotBestFit(self_model.coef_)

LR Summary

Advantages:

  1. It directly provides probability scores for each sample's prediction.
  2. Low computational cost; easy to understand and implement.

Disadvantages:

  1. It does not handle a large number of multi-class features well, is prone to underfitting, and its classification accuracy may not be high.
  2. Nonlinear features need to be transformed before use.