Linear Regression
Learning Example
Principle
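Linear regression describes the relationship between the independent variables and the dependent variable with a linear function. The task fixes the model family in advance (a linear function is fitted to the data) and then solves for the parameters of that function that fit the data best.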
Model
As the name suggests, a linear regression model is a linear function. The one-variable form is familiar from secondary school:
y = ax + b
When the input has n dimensions, the linear model is written in matrix form:
y = XW
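where m is the number of samples, n is the input dimension, X stacks the samples as rows (an m × n matrix), and W is the vector of n weights. The intercept b can be absorbed into W by appending a constant 1 to every x_{i}.
X = [x_{1},...,x_{i},...,x_{m}]^T
x_{i} = [x_{i1},...,x_{ij},...,x_{in}]
i = 1,...,m , j = 1,...,n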
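The matrix form y = XW shows no separate intercept, while the one-variable form y = ax + b does. A minimal NumPy sketch of how the two agree, assuming the common convention of appending a constant-1 column to X; the toy values and the names `X_aug` and `W_aug` are illustrative and not from the original:

```python
import numpy as np

# Hypothetical toy data: m = 4 samples, n = 2 input dimensions.
X = np.array([[1.0, 2.0],
              [2.0, 0.0],
              [3.0, 1.0],
              [0.0, 4.0]])   # shape (m, n), one sample per row
W = np.array([0.5, -1.0])    # shape (n,), one weight per input dimension
b = 2.0                      # intercept

# Prediction with an explicit intercept term.
y_explicit = X @ W + b

# Same prediction with the intercept absorbed into the weights:
# append a column of ones to X and append b to W, so that y = X_aug @ W_aug.
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])   # shape (m, n + 1)
W_aug = np.append(W, b)                            # shape (n + 1,)
y_matrix = X_aug @ W_aug

print(y_explicit)
print(y_matrix)
```

Both print statements show the same vector, so the intercept costs nothing extra in the matrix notation.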
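The model is built to fit the data and to predict well, so good parameters are those that make the predicted values as close as possible to the actual values. The difference between the actual value and the predicted value is called the error, and the mean squared error is commonly used to measure the model's predictive ability: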
cost = \frac{1}{2m}\sum_{i=1}^{m}(y_{pred,i} - y_{i})^{2}
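The cost can be sketched in a few lines of NumPy. This assumes the sum runs over the m samples and is divided by 2m, which is how the TensorFlow code later in this section computes it; the function name `mse_cost` and the sample values are illustrative:

```python
import numpy as np

def mse_cost(y_pred, y):
    """Halved mean squared error: sum((y_pred - y)^2) / (2 * m)."""
    y_pred = np.asarray(y_pred, dtype=float)
    y = np.asarray(y, dtype=float)
    m = y.shape[0]   # number of samples
    return np.sum((y_pred - y) ** 2) / (2 * m)

y_true = np.array([1.0, 2.0, 3.0])
y_hat = np.array([1.0, 2.0, 5.0])   # only the last prediction is off, by 2

print(mse_cost(y_hat, y_true))      # 2^2 / (2 * 3)
print(mse_cost(y_true, y_true))     # perfect predictions give zero cost
```

The factor 1/2 does not change where the minimum lies; it only cancels the 2 that appears when the square is differentiated.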
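Algorithm
Two methods are commonly used to find the parameters that minimize the loss function: gradient descent and least squares. For details on both, see: Solving Algorithms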
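The two approaches can be compared on a small problem. The sketch below is not the referenced material; it assumes noise-free data generated from y = 2x + 1, uses `np.linalg.lstsq` for the least squares solution, and runs plain batch gradient descent with a hand-picked learning rate and iteration count:

```python
import numpy as np

# Hypothetical noise-free data generated from y = 2x + 1.
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 1.0
X = np.column_stack([x, np.ones_like(x)])   # columns: input, constant 1
m = X.shape[0]

# Least squares: solve min ||X w - y||^2 directly.
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# Batch gradient descent on cost = sum((X w - y)^2) / (2m).
w_gd = np.zeros(2)
learning_rate = 0.5
for _ in range(5000):
    grad = X.T @ (X @ w_gd - y) / m   # gradient of the cost with respect to w
    w_gd -= learning_rate * grad

print("least squares:   ", w_ls)
print("gradient descent:", w_gd)
```

Both methods recover a slope close to 2 and an intercept close to 1 here. Least squares gets there in one solve, while gradient descent needs a suitable learning rate and enough iterations.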
Evaluating the Model with R-squared
R-squared is computed as follows:
R^2 = 1 - (\sum(y_{pred}-y)^2)/(\sum(y_{mean}-y)^2)
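That is, 1 minus the ratio of the residual sum of squares to the total sum of squares of y. The better the model predicts, the closer the value is to 1. R-squared is suitable for evaluating regression models.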
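A minimal NumPy sketch of the R-squared formula, with two boundary cases that help in reading the score; the function name `r_squared` and the sample values are illustrative:

```python
import numpy as np

def r_squared(y_pred, y):
    """R^2 = 1 - sum((y_pred - y)^2) / sum((y_mean - y)^2)."""
    y_pred = np.asarray(y_pred, dtype=float)
    y = np.asarray(y, dtype=float)
    ss_res = np.sum((y_pred - y) ** 2)     # residual sum of squares
    ss_tot = np.sum((y.mean() - y) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])

perfect = r_squared(y, y)                            # predictions equal the targets
baseline = r_squared(np.full_like(y, y.mean()), y)   # always predict the mean of y

print(perfect, baseline)
```

Perfect predictions give 1, and a model that always predicts the mean of y gives 0, so a score near 0 means the model does no better than that constant baseline.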
Code
# A linear regression learning algorithm example using the TensorFlow library.
# Author: jiaqian
# Reference: https://github.com/aymericdamien/TensorFlow-Examples/blob/0.11/notebooks/2_BasicModels/linear_regression.ipynb
# Note: this script uses the TensorFlow 1.x graph API (tf.placeholder, tf.Session).
# On TensorFlow 2.x, import tensorflow.compat.v1 as tf and call tf.disable_v2_behavior().
import tensorflow as tf
import numpy as np

rng = np.random

# Hyperparameters
learning_rate = 0.01
training_epochs = 1000
display_step = 50

# Training data
train_X = np.asarray([3.3, 4.4, 5.5, 6.71, 6.93, 4.168, 9.779, 6.182, 7.59, 2.167,
                      7.042, 10.791, 5.313, 7.997, 5.654, 9.27, 3.1])
train_Y = np.asarray([1.7, 2.76, 2.09, 3.19, 1.694, 1.573, 3.366, 2.596, 2.53, 1.221,
                      2.827, 3.465, 1.65, 2.904, 2.42, 2.94, 1.3])
n_samples = train_X.shape[0]
print(train_X)
print(train_Y)
print(n_samples)
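# tf graph input: placeholders that accept one sample or the whole training set
X = tf.placeholder("float")
Y = tf.placeholder("float")

# Model parameters, randomly initialized
W = tf.Variable(rng.randn(), name="weight")
b = tf.Variable(rng.randn(), name="bias")

# Construct a linear model: pred = X * W + b
pred = tf.add(tf.multiply(X, W), b)

# Halved mean squared error: sum((pred - Y)^2) / (2 * n_samples)
cost = tf.reduce_sum(tf.pow(pred - Y, 2) / (2 * n_samples))

# Gradient descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

# Op that initializes the variables W and b
init = tf.global_variables_initializer()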
with tf.Session() as sess:
    sess.run(init)

    # Fit the training data, one sample per optimizer step
    for epoch in range(training_epochs):
        for (x, y) in zip(train_X, train_Y):
            sess.run(optimizer, feed_dict={X: x, Y: y})

        # Log the cost on the full training set every display_step epochs
        if (epoch + 1) % display_step == 0:
            c = sess.run(cost, feed_dict={X: train_X, Y: train_Y})
            print("Epoch: %04d cost= %.9f W= %s b= %s"
                  % (epoch + 1, c, sess.run(W), sess.run(b)))

    print("Optimization Finished!")
    training_cost = sess.run(cost, feed_dict={X: train_X, Y: train_Y})
    print("Training cost= %s W= %s b= %s" % (training_cost, sess.run(W), sess.run(b)))
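As a sanity check on the training run, the least squares line for the same data can be computed without TensorFlow. The sketch below is an addition, not part of the original script; it uses `np.polyfit` with degree 1, and the names `w_ref`, `b_ref`, and `cost_ref` are illustrative:

```python
import numpy as np

# Same training data as in the script above.
train_X = np.asarray([3.3, 4.4, 5.5, 6.71, 6.93, 4.168, 9.779, 6.182, 7.59, 2.167,
                      7.042, 10.791, 5.313, 7.997, 5.654, 9.27, 3.1])
train_Y = np.asarray([1.7, 2.76, 2.09, 3.19, 1.694, 1.573, 3.366, 2.596, 2.53, 1.221,
                      2.827, 3.465, 1.65, 2.904, 2.42, 2.94, 1.3])
n_samples = train_X.shape[0]

# A degree-1 polynomial fit returns [slope, intercept] of the least squares line.
w_ref, b_ref = np.polyfit(train_X, train_Y, 1)

# Same cost definition as the TensorFlow graph: sum((pred - Y)^2) / (2 * n_samples).
pred_ref = w_ref * train_X + b_ref
cost_ref = np.sum((pred_ref - train_Y) ** 2) / (2 * n_samples)

print("W=", w_ref, "b=", b_ref, "cost=", cost_ref)
```

The W and b printed by the training loop should move toward these reference values as the epochs progress. Because the loop updates on one sample at a time with a fixed learning rate, it may settle near the least squares solution rather than exactly on it.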