One-hot encoding
one_hot_data = pd.get_dummies(data, columns=['rank'])
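Applies dummy-variable (one-hot) encoding to a single column: `rank` is replaced by one indicator column per distinct value, and the other columns are left unchanged.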
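To see what the call produces, here is a minimal sketch on a made-up four-row table. The values are hypothetical, and only the column names `admit` and `rank` come from these notes; `gre` is an assumed extra column.

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration only.
data = pd.DataFrame({
    "admit": [0, 1, 1, 0],
    "gre": [380, 660, 800, 640],
    "rank": [3, 3, 1, 4],
})

# Only the listed column is expanded; the other columns pass through unchanged.
one_hot_data = pd.get_dummies(data, columns=["rank"])

# The original 'rank' column is gone, replaced by rank_<value> indicator columns.
print(one_hot_data.columns.tolist())
```

Each row has exactly one of the `rank_*` columns set, which is what makes the encoding "one-hot".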
Splitting the data into Training and Testing
sample = np.random.choice(processed_data.index, size=int(len(processed_data)*0.9), replace=False)
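# sample holds index labels, so select by label with .loc (positional .iloc only agrees when the index is 0..n-1)
train_data, test_data = processed_data.loc[sample], processed_data.drop(sample)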
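Splits the dataset into a training set (90% of the rows, drawn at random without replacement) and a held-out test set (the remaining 10% or so). I find this code rather elegant: dropping the sampled rows leaves exactly the rows that were not sampled.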
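A small self-contained sketch of the same split, to check that the two pieces are disjoint and together cover every row. The `processed_data` frame here is a hypothetical stand-in with a deliberately non-default index, and the fixed seed is an assumption added for repeatability. The sketch selects the training rows with label-based `.loc` so that the selection matches the labels passed to `drop`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for processed_data; the index starts at 100 on purpose.
processed_data = pd.DataFrame(
    {"admit": [0, 1] * 10, "gre": np.linspace(0.0, 1.0, 20)},
    index=np.arange(100, 120),
)

rng = np.random.default_rng(42)  # fixed seed (an assumption) for repeatable runs
n_train = int(len(processed_data) * 0.9)

# Draw 90% of the index labels, without replacement.
sample = rng.choice(processed_data.index, size=n_train, replace=False)

# .loc selects by label; drop() removes those same labels, leaving the rest.
train_data = processed_data.loc[sample]
test_data = processed_data.drop(sample)

print(len(train_data), len(test_data))
```

Because `replace=False` guarantees no label is drawn twice, the training set has exactly `n_train` rows and the test set gets everything else.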
Splitting the data into features and targets (labels)
features = train_data.drop('admit', axis=1)
targets = train_data['admit']
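The test set presumably needs the same treatment before it can be scored. One way to sketch this, assuming a hypothetical helper `split_features_targets` (not part of the original notes) and made-up toy frames:

```python
import pandas as pd

def split_features_targets(frame, target="admit"):
    """Hypothetical helper: return (features, targets) for one data frame."""
    features = frame.drop(target, axis=1)  # every column except the label
    targets = frame[target]                # the label column as a Series
    return features, targets

# Toy stand-ins for train_data and test_data; the values are invented.
train_data = pd.DataFrame({"admit": [0, 1, 1], "gre": [0.4, 0.8, 0.9], "gpa": [0.7, 0.9, 1.0]})
test_data = pd.DataFrame({"admit": [1], "gre": [0.6], "gpa": [0.8]})

features, targets = split_features_targets(train_data)
features_test, targets_test = split_features_targets(test_data)

print(features.columns.tolist(), targets.name)
```

Using one helper for both sets keeps the feature columns in the same order for training and testing.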