100-Days-Of-ML-Code
Chinese edition of "100 Days of ML Code"
GitHub: https://github.com/MLEveryday/100-Days-Of-ML-Code
Step 1: Data preprocessing
(1) Import the libraries:
# Importing the libraries
import pandas as pd
import numpy as np
(2) Import the dataset
# Importing the dataset
dataset = pd.read_csv('D:/PycharmProjects/DataSet/50_Startups.csv')
X = dataset.iloc[:, :-1].values
Y = dataset.iloc[:, 4].values
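Part of the data is shown in the figure below. The first four columns are the features, and the fifth column is the output (that is, the variable to be predicted).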
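The feature/output split described above can be tried without the CSV file. The following is a minimal sketch on a small hand-made table: the column names and all values are illustrative assumptions, not rows taken from 50_Startups.csv, and the names `toy`, `X_toy`, and `Y_toy` are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the dataset: column names and values are
# illustrative assumptions, not data from the real 50_Startups.csv.
toy = pd.DataFrame({
    "R&D Spend": [1000.0, 2000.0, 3000.0],
    "Administration": [500.0, 600.0, 700.0],
    "Marketing Spend": [300.0, 400.0, 500.0],
    "State": ["New York", "California", "Florida"],
    "Profit": [1500.0, 2500.0, 3500.0],
})

# Every column except the last one is a feature.
X_toy = toy.iloc[:, :-1].values
# The last column is the output to be predicted.
Y_toy = toy.iloc[:, -1].values

print(X_toy.shape)
print(Y_toy.shape)
```

With five columns, `iloc[:, -1]` selects the same column as `iloc[:, 4]`. The negative index simply keeps working if the number of feature columns changes. Because the fourth feature holds text, `X_toy` is an object array, which is why the categorical column has to be converted to numbers before a model can be trained on it.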
(3) Convert the categorical data to numbers
# Encoding Categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder = LabelEncoder()
X[: , 3] = labelencoder