The Chinese edition of 100-Days-Of-ML-Code is linked below:
100-Days-Of-ML-Code (Chinese edition)
Step 1
Data preprocessing
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
dataset = pd.read_csv('studentscores.csv')
X = dataset.iloc[ : , : 1 ].values
Y = dataset.iloc[ : , 1 ].values
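# sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split now lives in sklearn.model_selection
from sklearn.model_selection import train_test_split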
X_train, X_test, Y_train, Y_test = train_test_split( X, Y, test_size = 1/4, random_state = 0)
This is the same as on Day 1.
The CSV file looks like this:
Hours,Scores
2.5,21
5.1,47
3.2,27
8.5,75
3.5,30
1.5,20
9.2,88
5.5,60
8.3,81
2.7,25
7.7,85
5.9,62
4.5,41
3.3,42
1.1,17
8.9,95
2.5,30
1.9,24
6.1,67
7.4,69
2.7,30
4.8,54
3.8,35
6.9,76
7.8,86
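To check the preprocessing step without a studentscores.csv file on disk, the same 25 rows can be read from an in-memory string. This is only a sketch: the name CSV_TEXT is illustrative, and it assumes a scikit-learn version that provides sklearn.model_selection (0.18 or later) and writes test_size as the literal 0.25.

```python
import io

import pandas as pd
from sklearn.model_selection import train_test_split

# The same 25 rows as the CSV listing above, kept in memory.
# CSV_TEXT is an illustrative name, not part of the original tutorial.
CSV_TEXT = """Hours,Scores
2.5,21
5.1,47
3.2,27
8.5,75
3.5,30
1.5,20
9.2,88
5.5,60
8.3,81
2.7,25
7.7,85
5.9,62
4.5,41
3.3,42
1.1,17
8.9,95
2.5,30
1.9,24
6.1,67
7.4,69
2.7,30
4.8,54
3.8,35
6.9,76
7.8,86
"""

dataset = pd.read_csv(io.StringIO(CSV_TEXT))

# Slicing with ": 1" keeps X two-dimensional, which scikit-learn estimators expect.
X = dataset.iloc[:, :1].values
# Selecting a single column by position gives a one-dimensional target array.
Y = dataset.iloc[:, 1].values

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0
)

print(X.shape, Y.shape)
print(X_train.shape, X_test.shape)
```

Printing the shapes is a cheap way to confirm that neither split is empty before training.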
Step 2
Train a simple linear regression model on the training set
Create a regressor object of the LinearRegression class, then call its fit() method to train it on the training data.
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor = regressor.fit(X_train, Y_train)
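With a single feature, the ordinary least-squares line has a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept is mean(y) minus slope times mean(x). The NumPy sketch below computes that line by hand to show what a fit like the one above estimates. The function name fit_line is illustrative, and it uses only the first five CSV rows rather than the training split, so its numbers will differ from the regressor's.

```python
import numpy as np


def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line through the points (x, y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: sum of co-deviations divided by sum of squared x deviations.
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # The least-squares line always passes through the point of means.
    intercept = y_mean - slope * x_mean
    return slope, intercept


# First five rows of studentscores.csv.
hours = [2.5, 5.1, 3.2, 8.5, 3.5]
scores = [21, 47, 27, 75, 30]

slope, intercept = fit_line(hours, scores)
print(slope, intercept)
```

On a fitted LinearRegression object, the corresponding values are exposed as regressor.coef_ and regressor.intercept_.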
Step 3
Predict the results
Y_pred = regressor.predict(X_test)
Use the predict() method to predict the results for the test set, and store them in the vector Y_pred.
When I ran this, I got the following error:
ValueError: Found array with 0 sample(s) (shape=(0, 1)) while a minimum of 1 is required.
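I printed X_test and found that it was empty. The likely cause is that 1/4 is integer division under Python 2 and evaluates to 0, so no samples were assigned to the test set. Changing test_size in Step 1 to 0.25 fixed it.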
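The difference between the two spellings of test_size can be checked directly. The sketch below assumes the error came from Python 2 integer division, which the post does not confirm; it uses the // operator to reproduce in Python 3 what Python 2 computes for 1/4.

```python
# Python 2 evaluates 1/4 between two integers with floor division;
# Python 3 spells that operation 1 // 4.
py2_style = 1 // 4

# A float literal has the same value on both Python versions.
literal = 0.25

n_samples = 25  # number of data rows in studentscores.csv

# Fraction requested for the test set, and how many rows that fraction covers.
print(py2_style, py2_style * n_samples)
print(literal, literal * n_samples)
```

Writing the fraction as 0.25 (or 1.0/4) avoids depending on which division rule the interpreter applies.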
Step 4
Visualization
Visualizing the training set results
plt.scatter(X_train , Y_train, color = 'red')
plt.plot(X_train , regressor.predict(X_train), color ='blue')
plt.show()
Visualizing the test set results
plt.scatter(X_test , Y_test, color = 'red')
plt.plot(X_test , regressor.predict(X_test), color ='blue')
plt.show()
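The two plots can also be drawn side by side and written to a file, which is convenient when running as a script without a display. This is a sketch under a few assumptions: it inlines only the first ten CSV rows to stay self-contained, it selects the non-interactive Agg backend, and the output filename day2_regression.png is hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed

import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# First ten rows of studentscores.csv, inlined to keep the sketch self-contained.
X = np.array([[2.5], [5.1], [3.2], [8.5], [3.5], [1.5], [9.2], [5.5], [8.3], [2.7]])
Y = np.array([21, 47, 27, 75, 30, 20, 88, 60, 81, 25])

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0
)
regressor = LinearRegression().fit(X_train, Y_train)

# One panel per split; sharing the y-axis makes the two panels comparable.
fig, (ax_train, ax_test) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
panels = [
    (ax_train, X_train, Y_train, "Training set"),
    (ax_test, X_test, Y_test, "Test set"),
]
for ax, X_part, Y_part, title in panels:
    ax.scatter(X_part, Y_part, color="red")  # observed scores
    ax.plot(X_part, regressor.predict(X_part), color="blue")  # fitted line
    ax.set_title(title)
    ax.set_xlabel("Hours")
ax_train.set_ylabel("Scores")

fig.savefig("day2_regression.png")  # hypothetical output filename
```

Both panels draw the line from the same regressor, so the test panel shows how the line learned from the training rows matches rows it has not seen.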
References
1 https://github.com/Avik-Jain/100-Days-Of-ML-Code/blob/master/Code/Day2_Simple_Linear_Regression.md
2 https://github.com/zhyongquan/100-Days-Of-ML-Code-1