This post implements linear regression with multiple variables to predict the prices of houses. Suppose you are selling your house and you want to know what a good market price would be. One way to do this is to first collect information on recently sold houses and build a model of housing prices.
The file ex1data2.txt contains a training set of housing prices in Portland, Oregon. The first column is the size of the house (in square feet), the second column is the number of bedrooms, and the third column is the price of the house.
What follows is a TensorFlow implementation of the gradient-descent solution to the multi-feature linear regression problem from assignment 1 of Andrew Ng's Coursera machine learning course. Note that a closed-form (normal equation) solution exists, but its cost is O(n^3) in the number of features because of the matrix inversion; when the matrices are large, gradient descent, at roughly O(n^2), is the faster approach.
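For comparison, here is a minimal NumPy sketch of the closed-form normal-equation solution on the same data (this is not part of the TensorFlow script below, and the variable names are illustrative):

import numpy as np

data = np.loadtxt("ex1data2.txt", delimiter=',')
X = data[:, 0:2]
y = data[:, 2].reshape(-1, 1)

# prepend a column of ones for the intercept term
X_b = np.hstack([np.ones((X.shape[0], 1)), X])

# normal equation: theta = (X^T X)^{-1} X^T y, solved without forming the explicit inverse
theta = np.linalg.solve(X_b.T @ X_b, X_b.T @ y)
print(theta)  # [intercept, weight for size, weight for bedrooms]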
Below are the results from running it with TensorFlow.
Cost-function curve and the fitted result:
Code:
# by Mikeyao
# linear regression with multiple variables
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from mpl_toolkits.mplot3d import Axes3D  # needed for the 3D projection
print("Loading data...\n")
data=np.loadtxt("ex1data2.txt",delimiter=',')
X=data[:,0:2]
y=data[:,2]
X_test=np.array([1650.0,3.0])
X_test.shape=(1,2)
m=y.shape[0]
y.shape=(m,1)
print('First 10 examples from the dataset:\n')
#print("x="+str(X[:10,:]))
#print("y="+str(y[:10]))
# feature normalization: zero mean, unit standard deviation
mu = np.mean(X, axis=0)
sigma = np.std(X, axis=0)
X_norm = (X - mu) / (sigma + 1e-9)  # 1e-9 guards against division by zero
print("mu=" + str(mu) + " sigma=" + str(sigma))
#print(X_norm)
# no explicit intercept column is added; the bias variable b below plays that role
X = X_norm
#print(X)
print("Running gradient decent...\n")
alpha=0.01#learning rate
num_iters=400#the iteration times
Xs = tf.placeholder("float64", shape=(None, 2))
Ys = tf.placeholder("float64", shape=(m, 1))
W = tf.Variable(np.random.randn(2, 1), name="weight")
b = tf.Variable(np.random.randn(1), name="bias")
pred = tf.add(tf.matmul(Xs, W), b)                                # hypothesis h(x) = xW + b
cost_function = tf.reduce_sum(tf.pow(pred - Ys, 2), 0) / (2 * m)  # squared-error cost
optimizer = tf.train.GradientDescentOptimizer(alpha).minimize(cost_function)
# initialize variables
init = tf.global_variables_initializer()
cost_history = []
with tf.Session() as sess:
    sess.run(init)
    display_step = 20
    for iteration in range(num_iters):
        sess.run(optimizer, feed_dict={Xs: X, Ys: y})
        if iteration % display_step == 0:
            #print(sess.run(cost_function, feed_dict={Xs: X, Ys: y}))
            costnow = sess.run(cost_function, feed_dict={Xs: X, Ys: y})
            cost_history.append(costnow)
            print("Iteration:", '%04d' % (iteration + 1), "cost=", "{:}".format(costnow), "W=", sess.run(W), "b=", sess.run(b))
    # normalize the test example with the same mu and sigma as the training data
    X_test -= mu
    X_test /= sigma + 1e-9
    print("prediction=", sess.run(pred, feed_dict={Xs: X_test}))
    finalW = sess.run(W)
    finalb = sess.run(b)
plt.figure("cost vs iterations")
plt.plot(np.linspace(0,len(cost_history),len(cost_history)),np.array(cost_history),label='cost vs iterations')
plt.figure("normalized fit result")
ax=plt.subplot(111,projection='3d')
ax.scatter(X[:,0],X[:,1],y,c='r',label="$"+"price"+"$")
ax.set_zlabel('price')
ax.set_ylabel('number of bed rooms')
ax.set_xlabel('area')
plt.hold(True)
x_t=np.linspace(-1,1,10000)
y_t=np.linspace(-1,1,10000)
z_t=np.column_stack((x_t.T,y_t.T))
finalW.shape=(2,1)
z_t=np.dot(z_t,finalW)+finalb[0]
print(z_t.shape)
ax.scatter(x_t,y_t,z_t,c='b',label="$"+"price"+"$")
plt.show()
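The script uses the TensorFlow 1.x API (placeholders, sessions, tf.train.GradientDescentOptimizer). If only TensorFlow 2.x is installed, one option, assuming the v1 compatibility layer is present in your installation, is to swap the TensorFlow import for:

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()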
Finally, the prediction the assignment asks for, the price of a 1650 ft², 3-bedroom house:
First normalize [1650, 3] with the training-set mu and sigma.
Regression result (on the normalized inputs): y = 340412 + 109447*x1 - 6578*x2
prediction = 293081$
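As a sanity check, the quoted prediction can be reproduced by hand from the reported coefficients (a minimal sketch; mu and sigma are recomputed from ex1data2.txt, and because the coefficients above are rounded the result is only approximately 293081):

import numpy as np

data = np.loadtxt("ex1data2.txt", delimiter=',')
X = data[:, 0:2]
mu = np.mean(X, axis=0)
sigma = np.std(X, axis=0)

# normalize the query point with the training-set statistics
x1, x2 = (np.array([1650.0, 3.0]) - mu) / sigma

# plug into the reported fit: y = 340412 + 109447*x1 - 6578*x2
price = 340412 + 109447 * x1 - 6578 * x2
print("predicted price: $%.0f" % price)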