Goals
In this lab you will learn to implement the model $f_{w,b}$ for linear regression with one variable.
Tools
In this lab you will make use of:
- Numpy, a popular library for scientific computing
- Matplotlib, a popular library for plotting data
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('./deeplearning.mplstyle')
plt.style.use() applies one of Matplotlib's built-in or custom style sheets, which makes it easy to restyle the figures you generate.
# List all available style sheets
print(plt.style.available)
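As a small illustration of style sheets, the sketch below applies a style temporarily with `plt.style.context`, so the global settings are restored afterwards. It assumes the built-in `ggplot` style rather than the lab's `deeplearning.mplstyle` file, which may not be present on your machine.

```python
import matplotlib.pyplot as plt

# 'ggplot' is a built-in Matplotlib style, assumed here for illustration;
# the lab itself uses its own ./deeplearning.mplstyle file.
style_name = "ggplot"
print(style_name in plt.style.available)

facecolor_before = plt.rcParams["axes.facecolor"]

# The style only applies inside the with-block.
with plt.style.context(style_name):
    fig, ax = plt.subplots()
    ax.plot([1.0, 2.0], [300.0, 500.0])
    plt.close(fig)

# Outside the block the previous settings are back in place.
facecolor_after = plt.rcParams["axes.facecolor"]
print(facecolor_before == facecolor_after)
```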
Problem Statement
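This lab predicts house prices using a dataset of just two points, (1.0, 300) and (2.0, 500).
x is the size (in 1000 square feet, sqft) and y is the price (in 1000s of dollars).
Using these two points, we want to find a suitable linear regression model and predict the price of a 1200 sqft house.
Create the x and y variables as one-dimensional NumPy arrays and print them using f-strings.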
# x_train is the input variable (size in 1000 square feet)
# y_train is the target (price in 1000s of dollars)
x_train = np.array([1.0, 2.0])
y_train = np.array([300.0, 500.0])
print(f"x_train = {x_train}")
print(f"y_train = {y_train}")
Output:
x_train = [1. 2.]
y_train = [300. 500.]
Number of training examples m
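Use m to denote the number of training examples.
NumPy arrays have a .shape attribute. x_train.shape returns a Python tuple, and x_train.shape[0] is the length of the array, which here is the number of examples.
Alternatively, use the len() function to get the length.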
# m is the number of training examples
print(f"x_train.shape: {x_train.shape}")
m = x_train.shape[0]
print(f"Number of training examples is: {m}")
# m is the number of training examples
m = len(x_train)
print(f"Number of training examples is: {m}")
Output:
x_train.shape: (2,)
Number of training examples is: 2
Number of training examples is: 2
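The difference between `.shape` and `len()` only shows up for arrays with more than one dimension. A minimal sketch, using a made-up 2-D array `X_multi` that is not part of this lab:

```python
import numpy as np

x_train = np.array([1.0, 2.0])
# Hypothetical 2-D array: 3 examples with 2 features each (not part of this lab)
X_multi = np.zeros((3, 2))

print(x_train.shape, len(x_train))   # 1-D array: shape is a 1-tuple
print(X_multi.shape, len(X_multi))   # len() gives the first dimension only
print(X_multi.size)                  # total number of elements
```

In both cases `shape[0]` and `len()` agree, which is why either works for counting training examples when examples are stored along the first axis.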
Training example x_i, y_i
Use $(x^{(i)}, y^{(i)})$ to denote the $i^{th}$ training example. Since Python is zero indexed, $(x^{(0)}, y^{(0)})$ is (1.0, 300.0) and $(x^{(1)}, y^{(1)})$ is (2.0, 500.0).
Use an index to access a single element of an array. For example, x_train[0] returns the x value at index 0.
i = 0 # Change this to 1 to see (x^1, y^1)
x_i = x_train[i]
y_i = y_train[i]
print(f"(x^({i}), y^({i})) = ({x_i}, {y_i})")
Output:
(x^(0), y^(0)) = (1.0, 300.0)
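To look at every training example instead of one index at a time, one option is to loop over both arrays together. This sketch uses Python's built-in `enumerate` and `zip`; it is one way to do it, not something the lab prescribes, and the `examples` list is only there for illustration.

```python
import numpy as np

x_train = np.array([1.0, 2.0])
y_train = np.array([300.0, 500.0])

examples = []
# enumerate supplies the index i, zip pairs each x with its y
for i, (x_i, y_i) in enumerate(zip(x_train, y_train)):
    examples.append((i, float(x_i), float(y_i)))
    print(f"(x^({i}), y^({i})) = ({x_i}, {y_i})")
```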
Plotting the data
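You can use the scatter() function from the Matplotlib library to draw a scatter plot of the two points.
s: marker size. In current Matplotlib versions the default is rcParams['lines.markersize'] ** 2 (36 with default settings). It can also be an array giving the size of each point.
c: marker color, e.g. b (blue), g (green), r (red), c (cyan), m (magenta), y (yellow), k (black), w (white)
marker: marker shape; common values are:
Marker | Shape | Marker | Shape |
---|---|---|---|
. | point | * | star |
, | pixel | h | hexagon 1 |
o | circle | H | hexagon 2 |
v | triangle down | + | plus |
^ | triangle up | x | x |
< | triangle left | D | diamond |
> | triangle right | d | thin diamond |
s | square | \| | vertical line |
p | pentagon | _ | horizontal line |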
# Plot the data points
plt.scatter(x_train, y_train, marker='x', c='r')
# Set the title
plt.title("Housing Prices")
# Set the y-axis label
plt.ylabel('Price (in 1000s of dollars)')
# Set the x-axis label
plt.xlabel('Size (1000 sqft)')
plt.show()
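When you hover the mouse over the figure in an interactive view, the x and y coordinates of the cursor are shown live in the top-right corner.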
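The `s` and `c` parameters of `scatter()` also accept one value per point. The sketch below is an illustration only: the particular sizes and colors are arbitrary choices, not values from the lab.

```python
import numpy as np
import matplotlib.pyplot as plt

x_train = np.array([1.0, 2.0])
y_train = np.array([300.0, 500.0])

fig, ax = plt.subplots()
# One size and one color per point: first point small and red, second larger and blue
points = ax.scatter(x_train, y_train, s=[40, 160], c=["r", "b"], marker="o")
ax.set_xlabel("Size (1000 sqft)")
ax.set_ylabel("Price (in 1000s of dollars)")

print(points.get_sizes())  # the per-point sizes stored on the returned collection
# plt.show()  # uncomment to display the figure
```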
Model function
As described in lecture, the model function for linear regression (which is a function that maps from x to y) is represented as
$$f_{w,b}(x^{(i)}) = wx^{(i)} + b$$
The formula above is how you can represent straight lines: different values of $w$ and $b$ give you different straight lines on the plot.
Let's start with $w = 100$ and $b = 100$.
w = 100
b = 100
print(f"w: {w}")
print(f"b: {b}")
Output:
w: 100
b: 100
Now, let's compute the value of $f_{w,b}(x^{(i)})$ for your two data points. You can explicitly write this out for each data point as:
for $x^{(0)}$, f_wb = w * x[0] + b
for $x^{(1)}$, f_wb = w * x[1] + b
For a large number of data points, writing these out one by one is repetitive and redundant, so you can use a for loop to compute the function outputs.
Note: The argument description (ndarray (m,)) describes a NumPy n-dimensional array of shape (m,). (scalar) describes an argument without dimensions, just a magnitude.
np.zeros(n) will return a one-dimensional NumPy array with $n$ entries.
def compute_model_output(x, w, b):
    """
    Computes the prediction of a linear model
    Args:
      x (ndarray (m,)): Data, m examples
      w,b (scalar)    : model parameters
    Returns
      f_wb (ndarray (m,)): model predictions
    """
    m = x.shape[0]
    f_wb = np.zeros(m)
    for i in range(m):
        f_wb[i] = w * x[i] + b
    return f_wb
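The explicit loop can also be written as a single NumPy expression, since arithmetic on an array is applied element by element. The sketch below compares the two approaches; the function names `compute_model_output_loop` and `compute_model_output_vec` are made up for this comparison.

```python
import numpy as np

def compute_model_output_loop(x, w, b):
    """Loop version, mirroring the function above."""
    m = x.shape[0]
    f_wb = np.zeros(m)
    for i in range(m):
        f_wb[i] = w * x[i] + b
    return f_wb

def compute_model_output_vec(x, w, b):
    """Vectorized version: NumPy applies w * x + b to every element of x."""
    return w * x + b

x_train = np.array([1.0, 2.0])
loop_out = compute_model_output_loop(x_train, 100, 100)
vec_out = compute_model_output_vec(x_train, 100, 100)
print(loop_out, vec_out)
```

Both versions give the same predictions; the vectorized form is usually shorter and faster for large arrays.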
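Now let's call the compute_model_output function and plot the output.
You can use the plot() function from the Matplotlib library to draw a line through the two points.
plt.plot(x, y, "format string", keyword=value): the format string can contain up to three parts, the color, the marker, and the line style.
For example, in plt.plot(x, y, "ob:"), "b" means blue, "o" means circle markers, and ":" means a dotted line.
You can also control these properties with keyword arguments, such as color="blue", linewidth=20, marker="o", markersize=50, markerfacecolor="red", markeredgewidth=6, markeredgecolor="grey", linestyle="solid" or linestyle="-", and label="our".
Value | Line style | Value | Line style |
---|---|---|---|
: dotted | dotted line | -- dashed | dashed line |
-. dashdot | dash-dot line | - solid | solid line |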
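To check that a format string and keyword arguments describe the same line, the sketch below draws the two-point data both ways and inspects the resulting line objects. It is an illustration only, using the `Line2D` getter methods that `plot()` results expose.

```python
import matplotlib.pyplot as plt

x = [1.0, 2.0]
y = [300.0, 500.0]

fig, ax = plt.subplots()
# Format string: "o" = circle marker, "b" = blue, ":" = dotted line
(line_fmt,) = ax.plot(x, y, "ob:")
# The same appearance spelled out with keyword arguments
(line_kw,) = ax.plot(x, y, color="blue", marker="o", linestyle="dotted")

print(line_fmt.get_marker(), line_fmt.get_linestyle())
print(line_kw.get_marker(), line_kw.get_linestyle())
# plt.show()  # uncomment to display the figure
```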
tmp_f_wb = compute_model_output(x_train, w, b)
# Plot our model prediction
plt.plot(x_train, tmp_f_wb, c='b',label='Our Prediction')
# Plot the data points
plt.scatter(x_train, y_train, marker='x', c='r',label='Actual Values')
# Set the title
plt.title("Housing Prices")
# Set the y-axis label
plt.ylabel('Price (in 1000s of dollars)')
# Set the x-axis label
plt.xlabel('Size (1000 sqft)')
plt.legend() # Add a legend; the best location is chosen automatically
plt.show()
As you can see, setting $w = 100$ and $b = 100$ does not result in a line that fits our data.
Finding suitable values of $w$ and $b$ requires a cost function.
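To put a number on how well a line fits, the sketch below scores a choice of $w$ and $b$ with a squared-error cost. The text only says that a cost function is needed, so the exact form used here, $J(w,b) = \frac{1}{2m}\sum_{i=0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})^2$, and the helper name `compute_cost` are assumptions for illustration.

```python
import numpy as np

def compute_cost(x, y, w, b):
    """Squared-error cost (assumed form): sum of squared errors divided by 2m."""
    m = x.shape[0]
    errors = (w * x + b) - y          # prediction minus target for each example
    return np.sum(errors ** 2) / (2 * m)

x_train = np.array([1.0, 2.0])
y_train = np.array([300.0, 500.0])

cost_initial = compute_cost(x_train, y_train, w=100, b=100)
cost_better = compute_cost(x_train, y_train, w=200, b=100)
print(cost_initial, cost_better)  # 12500.0 0.0
```

A cost of zero for $w = 200$, $b = 100$ reflects that a straight line can pass exactly through two points.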
Prediction
Now that we have a model, we can use it to make our original prediction. Let's predict the price of a house with 1200 sqft. Since the units of $x$ are in 1000s of sqft, $x$ is 1.2.
w = 200
b = 100
x_i = 1.2
cost_1200sqft = w * x_i + b
print(f"${cost_1200sqft:.0f} thousand dollars")
Output:
$340 thousand dollars
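The same model can price several houses at once by passing an array of sizes. In this sketch only the 1.2 entry comes from the lab; the other sizes are made-up inputs for illustration.

```python
import numpy as np

w = 200
b = 100

# House sizes in 1000s of sqft; 1.5 and 2.0 are hypothetical extra inputs
sizes = np.array([1.2, 1.5, 2.0])
prices = w * sizes + b  # one prediction per size

for size, price in zip(sizes, prices):
    print(f"{size * 1000:.0f} sqft -> ${price:.0f} thousand dollars")
```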
Congratulations!
In this lab you have learned:
Linear regression builds a model which establishes a relationship between features and targets
- In the example above, the feature was house size and the target was house price
- For simple linear regression, the model has two parameters $w$ and $b$ whose values are 'fit' using training data.
- Once a model's parameters have been determined, the model can be used to make predictions on novel data.