Linear Regression
【Introduction】
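Linear regression is a statistical method that uses regression analysis to determine the quantitative dependence between two or more variables. It takes the form $y = \theta^T x + \varepsilon$, where $\varepsilon$ is the error term, which follows a normal distribution with mean 0.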
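In regression analysis of this kind there is a single dependent variable, and its relationship to the independent variables is linear. Depending on the number of independent variables, it is divided into simple (univariate) linear regression and multiple linear regression.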
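【Model】
$$h_\theta(x) = \theta^T x = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n, \qquad x_0 = 1$$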
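As a minimal sketch of the model, assume the hypothesis $h_\theta(x) = \theta^T x$ with a constant feature $x_0 = 1$ for the intercept. The prediction for a batch of samples is then a single matrix product. The names `add_bias` and `hypothesis` and the numbers below are illustrative only.

```python
import numpy as np

def add_bias(x):
    """Prepend a column of ones so that theta[0] acts as the intercept."""
    return np.insert(x, 0, 1.0, axis=1)

def hypothesis(x, theta):
    """Evaluate h_theta(x) = theta^T x for every row of x."""
    return add_bias(x).dot(theta)

x = np.array([[1.0], [2.0], [3.0]])   # three samples, one feature each
theta = np.array([[0.5], [2.0]])      # illustrative values: intercept 0.5, slope 2.0
preds = hypothesis(x, theta)
print(preds.ravel())                  # [2.5 4.5 6.5]
```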
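【Loss function】:
One derivation:
Write $y^{(i)} = \theta^T x^{(i)} + \varepsilon^{(i)}$, where the errors $\varepsilon^{(i)}$ are independent and identically distributed, following a normal distribution with mean 0 and variance $\sigma^2$. Then
$$p(y^{(i)} \mid x^{(i)}; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2}\right)$$
For m samples, the likelihood function is:
$$L(\theta) = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2}\right)$$
The log-likelihood function is:
$$\ell(\theta) = \log L(\theta) = m \log\frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{\sigma^2} \cdot \frac{1}{2} \sum_{i=1}^{m} (y^{(i)} - \theta^T x^{(i)})^2$$
Maximizing $L(\theta)$ is the same as maximizing $\ell(\theta)$, which in turn is the same as minimizing $\frac{1}{2} \sum_{i=1}^{m} (\theta^T x^{(i)} - y^{(i)})^2$, since the first term and $\sigma^2$ do not depend on $\theta$.
Therefore the loss function, with $h_\theta(x) = \theta^T x$, is:
$$J(\theta) = \frac{1}{2} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$$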
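The link between the Gaussian likelihood and the squared error can be checked numerically. The sketch below assumes the loss $J(\theta) = \frac{1}{2}\sum_i (\theta^T x^{(i)} - y^{(i)})^2$ and a known noise standard deviation $\sigma$. Under those assumptions, the difference in log-likelihood between two parameter vectors should equal the negative difference in loss divided by $\sigma^2$. The synthetic data, the seed, and the function names are made up for illustration.

```python
import numpy as np

def squared_loss(X, y, theta):
    """J(theta) = 1/2 * sum of squared residuals."""
    r = X.dot(theta) - y
    return 0.5 * float(np.sum(r ** 2))

def log_likelihood(X, y, theta, sigma):
    """Gaussian log-likelihood of y given X, theta and noise std sigma."""
    m = X.shape[0]
    r = y - X.dot(theta)
    return float(-m * np.log(np.sqrt(2 * np.pi) * sigma)
                 - np.sum(r ** 2) / (2 * sigma ** 2))

# Synthetic data (assumption): y = 1 + 2x + Gaussian noise
rng = np.random.default_rng(0)
m, sigma = 50, 0.3
X = np.column_stack([np.ones(m), rng.uniform(-3, 3, m)])
y = X.dot(np.array([1.0, 2.0])) + rng.normal(0.0, sigma, m)

theta_a = np.array([1.0, 2.0])
theta_b = np.array([0.0, 1.0])
diff_ll = log_likelihood(X, y, theta_a, sigma) - log_likelihood(X, y, theta_b, sigma)
diff_loss = squared_loss(X, y, theta_a) - squared_loss(X, y, theta_b)

# The constant term cancels, leaving only the scaled loss difference.
consistent = bool(np.isclose(diff_ll, -diff_loss / sigma ** 2))
print(consistent)
```

Because the two quantities differ only by a constant and a positive scale factor, whichever parameters give the larger likelihood also give the smaller loss.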
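【Solution】
Least squares method: setting the gradient of the squared-error loss to zero gives the normal equations, where $X$ is the matrix whose rows are the samples (including the constant feature) and $y$ is the vector of targets:
$$\theta = (X^T X)^{-1} X^T y$$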
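A minimal sketch of the closed-form least-squares solution, assuming the normal equations $X^T X \theta = X^T y$ with a column of ones in $X$ for the intercept. The data is synthetic and noise-free so that the recovered parameters are easy to check; `normal_equation` is an illustrative name.

```python
import numpy as np

def normal_equation(X, y):
    """Solve (X^T X) theta = X^T y without forming the explicit inverse."""
    return np.linalg.solve(X.T.dot(X), X.T.dot(y))

x = np.linspace(-3.0, 3.0, 20).reshape(-1, 1)
X = np.insert(x, 0, 1.0, axis=1)   # prepend the bias column
y = 3.0 + 1.5 * x                  # noise-free line: intercept 3.0, slope 1.5
theta = normal_equation(X, y)
print(theta.ravel())               # approximately [3.0, 1.5]
```

Solving the linear system is usually preferable to computing the inverse directly. When $X^T X$ is close to singular, `np.linalg.lstsq(X, y, rcond=None)` is a more robust alternative.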
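Gradient descent: starting from an initial $\theta$, repeatedly step against the gradient of the loss $J(\theta)$ with learning rate $\alpha$:
$$\theta_j := \theta_j - \alpha \frac{\partial J(\theta)}{\partial \theta_j}$$
Gradient (single sample), with $h_\theta(x) = \theta^T x$:
$$\frac{\partial}{\partial \theta_j} \frac{1}{2} (h_\theta(x) - y)^2 = (h_\theta(x) - y)\, x_j$$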
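One way to sanity-check the analytic gradient is to compare it against a central finite difference. The sketch assumes the loss $J(\theta) = \frac{1}{2}\sum_i (\theta^T x^{(i)} - y^{(i)})^2$, whose full gradient $X^T (X\theta - y)$ is the sum of the per-sample gradients $(\theta^T x - y)\,x$. The random data and function names are illustrative.

```python
import numpy as np

def loss(X, y, theta):
    r = X.dot(theta) - y
    return 0.5 * float(np.sum(r ** 2))

def analytic_gradient(X, y, theta):
    """Sum over samples of (h - y) * x, written as X^T (X theta - y)."""
    return X.T.dot(X.dot(theta) - y)

def numeric_gradient(X, y, theta, eps=1e-6):
    """Central finite difference, one coordinate of theta at a time."""
    grad = np.zeros_like(theta)
    for j in range(theta.shape[0]):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (loss(X, y, theta + step) - loss(X, y, theta - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(10), rng.normal(size=10)])
y = rng.normal(size=(10, 1))
theta = np.ones((2, 1))

max_diff = float(np.max(np.abs(analytic_gradient(X, y, theta)
                               - numeric_gradient(X, y, theta))))
print(max_diff)   # expected to be tiny, limited only by floating-point rounding
```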
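Update the parameters with batch gradient descent, summing the per-sample gradients over all m samples at every step (with $h_\theta(x) = \theta^T x$ and learning rate $\alpha$):
$$\theta_j := \theta_j - \alpha \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})\, x_j^{(i)}$$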
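The batch update can be tried on synthetic data without any input file. This is a sketch under stated assumptions: the data is a noise-free line invented for the example, the update is $\theta := \theta - \alpha X^T (X\theta - y)$, and the learning rate and iteration count (0.001 and 500) are chosen to suit this particular data set. The result is compared with NumPy's least-squares solver.

```python
import numpy as np

def batch_gradient_descent(X, y, alpha, iters):
    """Repeat theta := theta - alpha * X^T (X theta - y), using all samples per step."""
    theta = np.ones((X.shape[1], 1))
    for _ in range(iters):
        theta = theta - alpha * X.T.dot(X.dot(theta) - y)
    return theta

# Synthetic data (assumption): 100 points on the line y = 3 + 1.5x
x = np.linspace(-3.0, 3.0, 100).reshape(-1, 1)
X = np.insert(x, 0, 1.0, axis=1)
y = 3.0 + 1.5 * x

theta_gd = batch_gradient_descent(X, y, alpha=0.001, iters=500)
theta_ls = np.linalg.lstsq(X, y, rcond=None)[0]   # reference least-squares solution
print(theta_gd.ravel(), theta_ls.ravel())
```

Because the gradient is a sum over samples rather than a mean, the largest stable learning rate shrinks as the data set grows. Dividing the gradient by m is a common way to make $\alpha$ less sensitive to the sample count.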
【Simulation】
Code:
# -*- coding: utf-8 -*-
"""
--linear regression
Date:2017/9/3
@author: xulu
"""
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def loadDataSet():
    # testSet.txt: tab-separated, no header; column 0 is x, column 1 is y.
    # DataFrame.as_matrix() was removed in pandas 1.0; to_numpy() replaces it.
    fil = pd.read_csv("testSet.txt", encoding="utf-8", header=None,
                      delimiter='\t').to_numpy()
    # reshape(-1, 1) yields column vectors for any number of rows
    x = np.reshape(fil[:, 0], (-1, 1))
    y = np.reshape(fil[:, 1], (-1, 1))
    return x, y


def handleDataMat(dataMat):
    # Prepend a column of ones so that weights[0] is the intercept.
    return np.insert(dataMat, 0, 1, axis=1)


def plotDataSet(x, y):
    plt.figure()
    plt.scatter(x, y, c='g', marker='o')  # green circles
    plt.show()


def gradient(x, y, weights):
    # Gradient of 1/2 * sum((h - y)^2) over all samples: X^T (X w - y)
    h = x.dot(weights)
    error = (h - y)
    return x.transpose().dot(error)


def gradDescent(x, y, weights, iters, alpha):
    # Batch gradient descent: every step uses the full data set.
    for _ in range(iters):
        grad = gradient(x, y, weights)
        weights = weights - alpha * grad
    return weights


def params_init(param_nums):
    alpha = 0.001  # learning rate
    iters = 500    # number of iterations
    weights = np.ones((param_nums, 1))
    return alpha, iters, weights


def train(x, y):
    x = handleDataMat(x)
    m, n = np.shape(x)
    alpha, iters, weights = params_init(n)
    weights = gradDescent(x, y, weights, iters, alpha)
    return weights


def plotBestFit(x, y, weights):
    plt.figure()
    plt.scatter(x, y, c='g', marker='o')  # green circles
    # Draw the fitted line over a fixed range of x values.
    line_x = np.arange(-4.0, 4.0, 0.1)
    line_y = weights[0] + weights[1] * line_x
    plt.plot(line_x, line_y)
    plt.xlabel('x')
    plt.ylabel('y')
    plt.show()


def predict(x, weights):
    x = handleDataMat(x)
    return x.dot(weights)


if __name__ == '__main__':
    x, y = loadDataSet()
    plotDataSet(x, y)
    weights = train(x, y)
    plotBestFit(x, y, weights)
    print("weights: ", weights)
    testdata = np.array([[1], [2], [3]])
    print("test result:", predict(testdata, weights))
Result: