1. Theory
- Dataset: $\left(x^{(i)}, y^{(i)}\right),\ i=1,2,\ldots,m$ are the $m$ training examples, where $x^{(i)}=\left(1, x_{1}^{(i)}, x_{2}^{(i)}, \cdots, x_{n}^{(i)}\right)$. The leading 1 is a constant feature that pairs with the intercept $\theta_{0}$.
- Hypothesis (fitting formula)
$$h_{\theta}(x)=\theta_{0}+\theta_{1} x_{1}+\theta_{2} x_{2}+\ldots+\theta_{n} x_{n}=\theta^{T} x$$
where:
$$x=\left(\begin{array}{c} 1 \\ x_{1} \\ x_{2} \\ \vdots \\ x_{n} \end{array}\right), \quad \theta=\left(\begin{array}{c} \theta_{0} \\ \theta_{1} \\ \vdots \\ \theta_{n} \end{array}\right)$$
- Cost function
$$J(\theta)=\frac{1}{2 m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)^{2}$$
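As a minimal sketch of the two formulas above, the hypothesis and the cost can be written directly with NumPy. The function names `hypothesis` and `cost` and the toy data are assumptions made for illustration only.

```python
import numpy as np

def hypothesis(theta, X):
    """h_theta(x) = theta^T x for every row of X (first column of X is all ones)."""
    return X @ theta

def cost(theta, X, y):
    """J(theta) = 1/(2m) * sum over i of (h_theta(x_i) - y_i)^2."""
    m = len(y)
    residual = hypothesis(theta, X) - y
    return float(residual @ residual) / (2 * m)

# Hypothetical toy data generated by y = 1 + 2*x1, so theta = (1, 2) fits exactly.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])

J_exact = cost(np.array([1.0, 2.0]), X, y)  # cost at the exact parameters
J_zero = cost(np.array([0.0, 0.0]), X, y)   # cost at all-zero parameters
```

The cost vanishes at the parameters that reproduce the data and grows as the predictions move away from the targets, which is what training tries to minimize.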
2. Practice
2.1. Data preprocessing
from sklearn.datasets import load_diabetes
from sklearn.preprocessing import StandardScaler

# Load the diabetes regression dataset (442 samples, 10 features)
diabetes = load_diabetes()
x = diabetes.data
y = diabetes.target
names = diabetes.feature_names

# Standardize every feature to zero mean and unit variance
x = StandardScaler().fit_transform(x)
# StandardScaler expects a 2-D array, so turn y into a single column first
y = y.reshape(-1, 1)
y = StandardScaler().fit_transform(y)
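To make explicit what the standardization step computes: each column has its mean subtracted and is then divided by its standard deviation (`StandardScaler` uses the population standard deviation, i.e. `ddof=0`). The following is a minimal NumPy sketch; the helper `standardize` and the toy array are assumptions for illustration, and unlike `StandardScaler` the sketch does not guard against zero-variance columns.

```python
import numpy as np

def standardize(a):
    """Column-wise z-score: subtract the mean, divide by the population std (ddof=0)."""
    mean = a.mean(axis=0)
    std = a.std(axis=0)
    return (a - mean) / std, mean, std

# Hypothetical toy data: 3 samples, 2 features on very different scales.
raw = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])
scaled, mean, std = standardize(raw)

# Keeping mean and std allows mapping scaled values back to the original units.
restored = scaled * std + mean
```

Keeping the statistics around matters in practice: because `y` is standardized too, predictions come out in standardized units and need the inverse mapping to be read as original target values.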
2.2. Training
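import pandas as pd  # used later to build the plotting DataFrame
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# The features are already standardized, so no extra normalization is needed here
lr = LinearRegression()
# Hold out 20% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
lr.fit(X_train, y_train)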
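`LinearRegression` fits by ordinary least squares, which minimizes the sum of squared residuals and therefore has the same minimizer as $J(\theta)$. As a rough sketch of that idea outside scikit-learn, the parameters can be recovered with `np.linalg.lstsq`; the synthetic data and all variable names below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 200, 3

# Hypothetical synthetic data with known parameters (theta_0, ..., theta_3).
features = rng.normal(size=(m, n))
true_theta = np.array([0.5, 1.0, -2.0, 3.0])

# Prepend the constant 1 so that theta_0 acts as the intercept.
X = np.hstack([np.ones((m, 1)), features])
y = X @ true_theta + 0.01 * rng.normal(size=m)

# Least-squares solution of X @ theta ≈ y.
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```

With little noise, `theta_hat` lands close to `true_theta`; in the scikit-learn model the same quantities are exposed as `lr.intercept_` and `lr.coef_`.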
2.3. Prediction, evaluation, and visualization
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error

y_pred = lr.predict(X_test)
x_axis = np.arange(len(y_pred))

# Build long-format records (sample index, value, series label) for seaborn
plot_data = []
for x_, y_test_ in zip(x_axis, y_test):
    plot_data.append((x_, y_test_[0], 'true'))
for x_, y_pred_ in zip(x_axis, y_pred):
    plot_data.append((x_, y_pred_[0], 'pred'))
plot_data = pd.DataFrame(plot_data, columns=['x', 'y', 'label'])

# One line per label: ground truth vs. prediction over the test samples
sns.lineplot(data=plot_data, x='x', y='y', hue='label')

# mean_squared_error takes (y_true, y_pred)
loss = mean_squared_error(y_test, y_pred)
plt.title(f"loss:{loss:.2f}")
plt.show()
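The loss shown in the title is the mean squared error, which is exactly twice the cost $J(\theta)$ from section 1 evaluated on the same samples. A minimal sketch of the computation follows; the helper `mse` and the demo values are assumptions for illustration.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the average of (y_true - y_pred)^2 over all samples."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.mean((y_true - y_pred) ** 2))

# Hypothetical column vectors shaped like y_test and y_pred in the post.
y_true_demo = [[1.0], [2.0], [3.0]]
y_pred_demo = [[1.0], [2.5], [2.0]]

loss_demo = mse(y_true_demo, y_pred_demo)  # (0 + 0.25 + 1) / 3
j_demo = loss_demo / 2                     # the 1/(2m) cost from section 1
```

Because `y` was standardized to unit variance over the full dataset, a loss well below 1 suggests the model does better than always predicting the mean, though the exact baseline on a random test split will differ slightly.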