Machine Learning | K-Nearest Neighbors Regression in Practice | A Python Application
Thanks for reading this far, everyone. The code may make your head spin at first glance, haha!
This post is short and mostly code; it does not explain the underlying theory or walk through the code in detail. It simply offers one approach and workflow, and I hope you get something out of it!
If you would like more on the theory behind KNN or on the code itself, please like or bookmark this post, and I will respond to the demand!
If you use my code, please credit the source. The same goes for reposts. Thank you!
1. Load the CSV data
from pandas import read_csv

def load_orindata():
    # load dataset
    dataset = read_csv('paperuse.csv', sep=',')
    return dataset
2. Detect outliers with Isolation Forest
from sklearn.ensemble import IsolationForest

def Cheak_VF(data):
    clf = IsolationForest()
    pres = clf.fit_predict(data)  # 1 = inlier, -1 = outlier
    return pres
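To make the step above concrete, here is a minimal usage sketch on a small, hypothetical frame: the data, the `contamination=0.2` setting, and the `random_state` are my own assumptions for illustration, not from the original post.

```python
import pandas as pd
from sklearn.ensemble import IsolationForest

# Hypothetical numeric frame with one obvious outlier in the last row.
df = pd.DataFrame({'a': [1.0, 1.1, 0.9, 1.0, 100.0],
                   'b': [2.0, 2.1, 1.9, 2.0, -50.0]})

# contamination=0.2 asks the model to flag ~20% of rows (here, one row).
clf = IsolationForest(contamination=0.2, random_state=42)
pres = clf.fit_predict(df)   # 1 = inlier, -1 = outlier

clean = df[pres == 1]        # keep only the rows flagged as inliers
print(len(clean))
```

A common next step is to continue the pipeline with `clean` instead of the raw frame, so the extreme rows do not distort the later encoders and the regression model.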
3. Inspect the discrete (categorical) variables
def obser_nominal_vars(nominal_vars, testdata):
    for each in nominal_vars:
        print(each, ':')
        print(testdata[each].agg(['value_counts']).T)
        print('=' * 35)
4. Label-encode the categorical features
A general-purpose way to handle discrete values
from sklearn.preprocessing import LabelEncoder

def lable_trans(testdata, lable_nominal_vars):
    label_encoder = LabelEncoder()
    for col in lable_nominal_vars:
        testdata[col] = label_encoder.fit_transform(testdata[col])
    return testdata
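A quick illustration of what this encoding produces: the toy `color` column below is my own hypothetical example. Note that `LabelEncoder` assigns integer codes in sorted order of the category names.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical categorical column.
df = pd.DataFrame({'color': ['red', 'blue', 'red', 'green']})

le = LabelEncoder()
df['color'] = le.fit_transform(df['color'])

# Codes follow sorted category order: blue=0, green=1, red=2.
print(df['color'].tolist())  # → [2, 0, 2, 1]
```

Because the codes are ordered integers, label encoding implicitly imposes an ordering on the categories; for unordered categories with few levels, the one-hot encoding in the next step is often the safer choice.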
5. One-hot encoding
One-hot encoding performs well on discrete values, but I suggest using it only when a feature has no more than about 10 categories.
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

def one_hot_trans(object_cols_onehot, testdata):
    # sparse_output requires sklearn >= 1.2; on older versions use sparse=False
    OH_encoder = OneHotEncoder(handle_unknown='ignore', sparse_output=False)
    OH_cols_train = pd.DataFrame(OH_encoder.fit_transform(testdata[object_cols_onehot]),
                                 index=testdata.index)  # align indices before concat
    OH_cols_train.columns = OH_encoder.get_feature_names_out(input_features=object_cols_onehot)
    num_X_train = testdata.drop(object_cols_onehot, axis=1)
    OH_X_train = pd.concat([num_X_train, OH_cols_train], axis=1)
    return OH_X_train
6. Split the dataset
from sklearn.model_selection import train_test_split

def split_test_train(data, train_size):
    features = data.drop('OR', axis=1)  # 'OR' is the target column
    training_features, testing_features, training_target, testing_target = train_test_split(
        features.values, data['OR'].values, random_state=42, train_size=train_size)
    return training_features, testing_features, training_target, testing_target, features
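A quick sanity check of how `train_test_split` divides the rows, on a hypothetical 10-row frame with the same target column name `'OR'` as above:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical frame: two features plus the 'OR' target.
df = pd.DataFrame({'x1': range(10), 'x2': range(10, 20), 'OR': range(20, 30)})

features = df.drop('OR', axis=1)
X_tr, X_te, y_tr, y_te = train_test_split(
    features.values, df['OR'].values, random_state=42, train_size=0.8)

print(X_tr.shape, X_te.shape)  # (8, 2) (2, 2)
```

Fixing `random_state=42` makes the split reproducible across runs, which keeps later model comparisons fair.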
7. Build the K-nearest neighbors regression model
name='k
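The snippet above is cut off in the original post. As a stand-in, here is a minimal hypothetical sketch of fitting scikit-learn's `KNeighborsRegressor` on toy data; the synthetic data and the choice of `n_neighbors=3` are my own assumptions, not the author's code.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Synthetic 1-D regression task: y = 2x + 1 (hypothetical data).
X = np.arange(20, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0

# k (n_neighbors) is the main hyperparameter to tune, e.g. via GridSearchCV.
model = KNeighborsRegressor(n_neighbors=3)
model.fit(X, y)

# Prediction is the mean target of the 3 nearest neighbors (x = 4, 5, 6 here).
pred = model.predict([[5.0]])
print(round(float(pred[0]), 2))  # → 11.0
```

In a full pipeline this would be fit on `training_features`/`training_target` from step 6 and scored on the held-out test split, for example with `model.score(testing_features, testing_target)`.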