Linear Regression

XuLu2013

Category: Machine Learning

Article link: https://blog.csdn.net/u012564684/article/details/78138364

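【Introduction】

Linear regression is a statistical analysis method that uses regression analysis from mathematical statistics to determine the quantitative dependence between two or more variables. It takes the form

$$y = w^T x + \varepsilon$$

where $\varepsilon$ is the error term, which follows a normal distribution with mean 0.

When there is a single dependent variable and it depends linearly on the independent variables, the analysis is classified by the number of independent variables: simple (univariate) linear regression for one independent variable, and multiple linear regression for several.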

【Model】

$$h_w(x) = w^T x = w_0 + w_1 x_1 + \dots + w_n x_n, \quad x_0 = 1$$
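As a small illustration of a linear model with an intercept, the sketch below prepends a column of ones to the inputs (the same trick `handleDataMat` uses in the code further down) and computes predictions as a matrix product. The names `add_intercept` and `hypothesis` and the sample numbers are made up for this example.

```python
import numpy as np

def add_intercept(x):
    """Prepend a column of ones so that w[0] acts as the intercept."""
    return np.insert(x, 0, 1, axis=1)

def hypothesis(x, w):
    """Linear model: one prediction per row of x, computed as [1, x] . w"""
    return add_intercept(x).dot(w)

x = np.array([[1.0], [2.0], [3.0]])   # three samples, one feature (illustrative)
w = np.array([[0.5], [2.0]])          # intercept 0.5, slope 2.0 (illustrative)
print(hypothesis(x, w).ravel())       # 0.5 + 2.0 * x for each sample
```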

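【Loss function】

One derivation:

Assume $y^{(i)} = w^T x^{(i)} + \varepsilon^{(i)}$, where the errors $\varepsilon^{(i)}$ are independent and identically distributed and follow a normal distribution with mean 0 and variance $\sigma^2$. Then

$$p(y^{(i)} \mid x^{(i)}; w) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y^{(i)} - w^T x^{(i)})^2}{2\sigma^2}\right)$$

For m samples, the likelihood function is:

$$L(w) = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y^{(i)} - w^T x^{(i)})^2}{2\sigma^2}\right)$$

The log-likelihood function is:

$$\ell(w) = \log L(w) = m \log\frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{2\sigma^2} \sum_{i=1}^{m} (y^{(i)} - w^T x^{(i)})^2$$

Maximizing $L(w)$ is the same as maximizing $\ell(w)$. Since the first term and $\sigma^2$ do not depend on $w$, this is the same as minimizing $\frac{1}{2} \sum_{i=1}^{m} (y^{(i)} - w^T x^{(i)})^2$.

The loss function is therefore:

$$J(w) = \frac{1}{2} \sum_{i=1}^{m} \left(h_w(x^{(i)}) - y^{(i)}\right)^2, \quad h_w(x) = w^T x$$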
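The link between the Gaussian log-likelihood and the squared-error loss can be checked numerically. The sketch below assumes the loss is half the sum of squared residuals, $J(w) = \frac{1}{2}\sum_i (w^T x^{(i)} - y^{(i)})^2$, and verifies that the difference in log-likelihood between two weight vectors equals the negated difference in loss divided by $\sigma^2$. The function names, the synthetic data, and `sigma = 0.5` are illustrative choices, not part of the original post.

```python
import numpy as np

def squared_error_loss(X, y, w):
    """J(w) = 1/2 * sum of squared residuals."""
    residual = X.dot(w) - y
    return 0.5 * float(np.sum(residual ** 2))

def gaussian_log_likelihood(X, y, w, sigma):
    """Log-likelihood of y under y = Xw + eps with eps ~ N(0, sigma^2)."""
    m = X.shape[0]
    residual = y - X.dot(w)
    return float(-m * np.log(np.sqrt(2 * np.pi) * sigma)
                 - np.sum(residual ** 2) / (2 * sigma ** 2))

# Synthetic data (an assumption for this sketch): y = 1 + 2x plus Gaussian noise
rng = np.random.default_rng(0)
sigma = 0.5
X = np.insert(rng.normal(size=(50, 1)), 0, 1, axis=1)
y = X.dot(np.array([[1.0], [2.0]])) + rng.normal(scale=sigma, size=(50, 1))

w_a = np.array([[1.0], [2.0]])
w_b = np.array([[0.0], [0.0]])

# The constant term cancels, leaving only the scaled loss difference
ll_diff = gaussian_log_likelihood(X, y, w_a, sigma) - gaussian_log_likelihood(X, y, w_b, sigma)
loss_diff = squared_error_loss(X, y, w_a) - squared_error_loss(X, y, w_b)
print(np.isclose(ll_diff, -loss_diff / sigma ** 2))
```

Because the two quantities differ only by a constant and a positive scale factor, any weight vector that lowers the loss raises the likelihood by a matching amount.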

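【Solution】

Least squares method:

$$w = (X^T X)^{-1} X^T y$$

where $X$ is the matrix whose rows are the samples (each with a leading 1 for the intercept) and $y$ is the column vector of targets.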
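A minimal sketch of the closed-form least-squares solution $w = (X^T X)^{-1} X^T y$ follows. It solves the linear system $(X^T X) w = X^T y$ with `np.linalg.solve` rather than forming the inverse explicitly, which is a common numerical preference and not something the original post prescribes. The helper name `normal_equation` and the toy data are assumptions for illustration.

```python
import numpy as np

def normal_equation(X, y):
    """Solve (X^T X) w = X^T y for w without computing an explicit inverse."""
    return np.linalg.solve(X.T.dot(X), X.T.dot(y))

# Noise-free toy data on the line y = 1 + 2x, so the exact answer is known
x = np.array([[0.0], [1.0], [2.0], [3.0]])
X = np.insert(x, 0, 1, axis=1)   # add the intercept column
y = 1.0 + 2.0 * x

w = normal_equation(X, y)
print(w.ravel())                 # approximately [1. 2.]
```

This requires $X^T X$ to be invertible; with collinear features, `np.linalg.lstsq` is one alternative.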

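Gradient descent method:

$$w_j := w_j - \alpha \frac{\partial J(w)}{\partial w_j}$$

where $\alpha$ is the learning rate and $J(w)$ is the loss function.

Gradient (single sample), with $h_w(x) = w^T x$:

$$\frac{\partial}{\partial w_j} \frac{1}{2}\left(h_w(x) - y\right)^2 = \left(h_w(x) - y\right) x_j$$

Updating the parameters with batch gradient descent:

$$w_j := w_j - \alpha \sum_{i=1}^{m} \left(h_w(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$$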
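One way to gain confidence in a hand-derived gradient is to compare it with a finite-difference estimate. The sketch below assumes the loss $J(w) = \frac{1}{2}\sum_i (w^T x^{(i)} - y^{(i)})^2$, whose gradient in matrix form is $X^T(Xw - y)$ (the same expression the `gradient` function in the code below computes). The helper `numerical_gradient`, the random data, and the step size `eps` are illustrative assumptions.

```python
import numpy as np

def loss(X, y, w):
    """J(w) = 1/2 * sum of squared residuals."""
    return 0.5 * float(np.sum((X.dot(w) - y) ** 2))

def gradient(X, y, w):
    """Analytic gradient of J summed over all samples: X^T (Xw - y)."""
    return X.T.dot(X.dot(w) - y)

def numerical_gradient(X, y, w, eps=1e-6):
    """Central-difference estimate of dJ/dw_j, one coordinate at a time."""
    grad = np.zeros_like(w)
    for j in range(w.shape[0]):
        step = np.zeros_like(w)
        step[j, 0] = eps
        grad[j, 0] = (loss(X, y, w + step) - loss(X, y, w - step)) / (2 * eps)
    return grad

# Random data, used only to exercise the two gradient computations
rng = np.random.default_rng(1)
X = np.insert(rng.normal(size=(20, 1)), 0, 1, axis=1)
y = rng.normal(size=(20, 1))
w = np.ones((2, 1))

analytic = gradient(X, y, w)
numeric = numerical_gradient(X, y, w)
print(np.max(np.abs(analytic - numeric)))  # should be very small
```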

【Simulation】

Code:

# -*- coding: utf-8 -*-
"""
--linear regression
Date:2017/9/3
@author: xulu
"""
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def loadDataSet():
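    # Tab-separated file: column 0 is the feature x, column 1 is the target y
    fil=pd.read_csv("testSet.txt",encoding="utf-8",header=None,delimiter='\t').to_numpy()
    # reshape to column vectors; -1 infers the number of samples from the data
    x=np.reshape(fil[:,0],(-1,1))
    y=np.reshape(fil[:,1],(-1,1))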
    return x,y

def handleDataMat(dataMat):
    return np.insert(dataMat,0,1,axis=1)

def plotDataSet(x,y):
    plt.figure()
    plt.scatter(x,y,c='g',marker='o')  # green circles
    plt.show()

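def gradient(x,y,weights):
    # Gradient of 1/2*sum((h-y)^2) with respect to weights, summed over all samples
    h=x.dot(weights)
    error = (h - y)
    return x.transpose().dot(error)

def gradDescent(x, y,weights,iters,alpha):
    # Batch gradient descent: each step uses the gradient over the whole data set
    for _ in range(iters):
        grad=gradient(x,y,weights)
        weights = weights - alpha * grad
    return weights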

def params_init(param_nums):
    alpha = 0.001
    iters = 500
    weights = np.ones((param_nums,1))
    return alpha,iters,weights

def train(x,y):
    x=handleDataMat(x)
    m,n = np.shape(x)
    alpha,iters,weights=params_init(n)
    weights=gradDescent(x, y,weights,iters,alpha)
    return weights
    
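def plotBestFit(x,y,weights):
    plt.figure()
    plt.scatter(x,y,c='g',marker='o')  # green circles
    # Use separate names for the fitted line so the data arguments are not overwritten
    line_x = np.arange(-4.0, 4.0, 0.1)
    line_y = weights[0]+weights[1]*line_x
    plt.plot(line_x, line_y)
    plt.xlabel('x')
    plt.ylabel('y')
    plt.show()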

def predict(x,weights):
    x=handleDataMat(x)
    return x.dot(weights)

if __name__=='__main__':
    x,y=loadDataSet()
    plotDataSet(x,y)
    
    weights=train(x,y)
    plotBestFit(x,y,weights)
    print("weights: ",weights)
    
    testdata=np.array([[1],[2],[3]])
    print("test result:",predict(testdata,weights))

Result: