sklearn Code 20, Part 1: Linear Regression for Boston House Price Prediction

import numpy as np

from sklearn.linear_model import LinearRegression

import matplotlib.pyplot as plt
%matplotlib inline

from sklearn import datasets
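# Boston house prices
# Note: load_boston was removed in scikit-learn 1.2, so this call needs an older version
boston = datasets.load_boston()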
X = np.linspace(0,10,50).reshape(-1,1)
X
array([[  0.        ],
       [  0.20408163],
       [  0.40816327],
       [  0.6122449 ],
       [  0.81632653],
       [  1.02040816],
       [  1.2244898 ],
       [  1.42857143],
       [  1.63265306],
       [  1.83673469],
       [  2.04081633],
       [  2.24489796],
       [  2.44897959],
       [  2.65306122],
       [  2.85714286],
       [  3.06122449],
       [  3.26530612],
       [  3.46938776],
       [  3.67346939],
       [  3.87755102],
       [  4.08163265],
       [  4.28571429],
       [  4.48979592],
       [  4.69387755],
       [  4.89795918],
       [  5.10204082],
       [  5.30612245],
       [  5.51020408],
       [  5.71428571],
       [  5.91836735],
       [  6.12244898],
       [  6.32653061],
       [  6.53061224],
       [  6.73469388],
       [  6.93877551],
       [  7.14285714],
       [  7.34693878],
       [  7.55102041],
       [  7.75510204],
       [  7.95918367],
       [  8.16326531],
       [  8.36734694],
       [  8.57142857],
       [  8.7755102 ],
       [  8.97959184],
       [  9.18367347],
       [  9.3877551 ],
       [  9.59183673],
       [  9.79591837],
       [ 10.        ]])
# y is X scaled by one random integer slope drawn from [2, 8)
y = np.random.randint(2,8,size = 1)*X
y
array([[  0.        ],
       [  0.40816327],
       [  0.81632653],
       [  1.2244898 ],
       [  1.63265306],
       [  2.04081633],
       [  2.44897959],
       [  2.85714286],
       [  3.26530612],
       [  3.67346939],
       [  4.08163265],
       [  4.48979592],
       [  4.89795918],
       [  5.30612245],
       [  5.71428571],
       [  6.12244898],
       [  6.53061224],
       [  6.93877551],
       [  7.34693878],
       [  7.75510204],
       [  8.16326531],
       [  8.57142857],
       [  8.97959184],
       [  9.3877551 ],
       [  9.79591837],
       [ 10.20408163],
       [ 10.6122449 ],
       [ 11.02040816],
       [ 11.42857143],
       [ 11.83673469],
       [ 12.24489796],
       [ 12.65306122],
       [ 13.06122449],
       [ 13.46938776],
       [ 13.87755102],
       [ 14.28571429],
       [ 14.69387755],
       [ 15.10204082],
       [ 15.51020408],
       [ 15.91836735],
       [ 16.32653061],
       [ 16.73469388],
       [ 17.14285714],
       [ 17.55102041],
       [ 17.95918367],
       [ 18.36734694],
       [ 18.7755102 ],
       [ 19.18367347],
       [ 19.59183673],
       [ 20.        ]])
# Recover the slope element-wise; the first row is 0/0, which yields nan
y/X
C:\Users\LXQ\Anaconda3\lib\site-packages\ipykernel_launcher.py:1: RuntimeWarning: invalid value encountered in true_divide
  """Entry point for launching an IPython kernel.
array([[ nan],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.],
       [  2.]])
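
The `nan` in the first row and the `RuntimeWarning` both come from the 0/0 division at `X[0]`. A minimal sketch of one way to avoid them, assuming the slope is fixed at 2 (the value the random draw happened to produce above) so the result is reproducible; the name `ratio` is introduced here for illustration only:

```python
import numpy as np

X = np.linspace(0, 10, 50).reshape(-1, 1)
y = 2 * X  # assumption: fixed slope of 2 instead of the random draw

# Pre-fill the result with nan, then divide only where X is non-zero.
# Positions where X == 0 are skipped, so no warning is raised.
ratio = np.full_like(y, np.nan)
np.divide(y, X, out=ratio, where=(X != 0))

# Every defined ratio equals the slope
slope = ratio[1:].mean()
print(slope)
```

The first entry stays `nan` on purpose: the slope cannot be read off a point at the origin.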
lr = LinearRegression()

lr.fit(X,y)
# coef_: the coefficient, i.e. the slope
# w stands for weight
lr.coef_
array([[ 2.]])
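
Besides `coef_`, a fitted `LinearRegression` also exposes `intercept_` and can predict for unseen inputs. A short sketch, again assuming a fixed slope of 2; `X_new` is a hypothetical pair of inputs outside the training range:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.linspace(0, 10, 50).reshape(-1, 1)
y = 2 * X  # assumption: fixed slope of 2 instead of the random draw

lr = LinearRegression()
lr.fit(X, y)

# Slope and intercept learned from the data
print(lr.coef_, lr.intercept_)

# Predict for two hypothetical new inputs
X_new = np.array([[12.0], [15.0]])
y_new = lr.predict(X_new)
print(y_new)
```

Because the data lie exactly on a line through the origin, the fitted intercept should be zero up to floating-point error.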

$$\hat{w} = (X^TX)^{-1}X^Ty$$

# Matrix operations from linear algebra
np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)
array([[ 2.]])
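
The closed-form result matches `lr.coef_` here only because the data pass through the origin, so no intercept term is needed. A sketch of the same formula with an intercept, assuming a hypothetical line `y = 2x + 3`: a column of ones is prepended to `X` so that the first weight plays the role of the intercept. The names `X_b`, `w_hat` and `w_lstsq` are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.linspace(0, 10, 50).reshape(-1, 1)
y = 2 * X + 3  # assumption: slope 2 and intercept 3, chosen for illustration

# Prepend a column of ones so the first weight acts as the intercept
X_b = np.hstack([np.ones((X.shape[0], 1)), X])

# Normal equation: w = (X^T X)^{-1} X^T y
w_hat = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)
print(w_hat)

# Least-squares solver that avoids forming the inverse explicitly
w_lstsq, *_ = np.linalg.lstsq(X_b, y, rcond=None)

# scikit-learn fits the intercept by default
lr = LinearRegression().fit(X, y)
print(lr.intercept_, lr.coef_)
```

In practice `np.linalg.lstsq` is usually preferred over an explicit inverse, since it stays stable when `X^T X` is ill-conditioned.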

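Eudcode is described as a Python data analysis toolset that helps users quickly carry out tasks such as data exploration and model building. In this scenario, a linear regression exercise with the scikit-learn library, such as Boston house price prediction, usually involves the following steps:

1. **Load the data**: first import the Boston Housing Dataset from sklearn.datasets. It is a classic data analysis example with 506 observations, each with 13 features; the target variable is the house price.

   ```python
   # load_boston requires scikit-learn older than 1.2
   from sklearn.datasets import load_boston
   boston = load_boston()
   ```

2. **Preprocess the data**: inspect the dataset description, handle missing values and outliers, and convert categorical features to numeric ones, if there are any. Then split the data into a training set and a test set.

   ```python
   X = boston.data
   y = boston.target
   # Split the data
   from sklearn.model_selection import train_test_split
   X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
   ```

3. **Train the model**: create a linear regression model instance and fit it to the training data.

   ```python
   from sklearn.linear_model import LinearRegression
   model = LinearRegression()
   model.fit(X_train, y_train)
   ```

4. **Evaluate the model**: predict on the test set and compute metrics such as mean squared error (MSE) and the R² score to assess model performance.

   ```python
   from sklearn.metrics import mean_squared_error, r2_score
   y_pred = model.predict(X_test)
   mse = mean_squared_error(y_test, y_pred)
   r2 = r2_score(y_test, y_pred)
   print("Mean Squared Error:", mse)
   print("R^2 Score:", r2)
   ```

5. **Apply the model**: finally, the model can be used to predict new Boston-area house prices, given the corresponding feature vector.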
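
Since `load_boston` is unavailable in scikit-learn 1.2 and later, the workflow can still be tried end to end on stand-in data. The sketch below is an assumption-laden substitute, not the Boston dataset: it uses `make_regression` to generate synthetic data with the same shape (506 samples, 13 features), and the noise level is an arbitrary choice. The scores it prints say nothing about real house prices.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the Boston dataset's shape (NOT the real data)
X, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=42)

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit on the training split only
model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate on the held-out split
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Mean Squared Error:", mse)
print("R^2 Score:", r2)
```

Swapping in a real dataset only changes the two lines that build `X` and `y`; the split, fit and evaluation calls stay the same.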