KNN: Predicting Rental Listing Prices

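Predict a listing's rental price from the features and prices of earlier listings, using a k-nearest-neighbors model based on Euclidean distance.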

import pandas as pd
from sklearn.preprocessing import StandardScaler
features = ['accommodates','bedrooms','bathrooms','beds','price','minimum_nights','maximum_nights','number_of_reviews']

dc_listings = pd.read_csv('listings.csv')
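dc_listings = dc_listings[features].copy()   # keep only the columns of interest; copy so later column assignments act on an independent frame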

dc_listings['price'] = dc_listings['price'].str.replace(r'[\$,]', '', regex=True).astype(float)   # strip "$" and "," from the price strings, then convert to float

dc_listings = dc_listings.dropna()  # drop every row that contains a NaN (dropna() with all default arguments)
# print(dc_listings.head())
dc_listings[features] = StandardScaler().fit_transform(dc_listings[features])     # standardize every column, price included, to zero mean and unit variance

normalized_listings = dc_listings

# print(dc_listings.head())
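
The standardization step above can be checked by hand. The sketch below uses a small invented table (the values are not taken from listings.csv) and shows that StandardScaler applies z = (x - mean) / std to each column, where std is the population standard deviation (ddof=0).

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy data invented for illustration only (not from listings.csv)
toy = pd.DataFrame({
    'accommodates': [2.0, 4.0, 6.0, 8.0],
    'bathrooms': [1.0, 1.0, 2.0, 3.0],
})

# StandardScaler computes z = (x - mean) / std per column, with the population std (ddof=0)
scaled = StandardScaler().fit_transform(toy)

# The same transformation written out by hand
manual = ((toy - toy.mean()) / toy.std(ddof=0)).to_numpy()

same_result = np.allclose(scaled, manual)
print(same_result)
```

After scaling, each column has mean 0 and standard deviation 1, so no single feature dominates the Euclidean distance just because it is measured on a larger scale.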

# Split into training data and test data (slice first, then copy, so each frame is independent)
norm_train_df = normalized_listings.iloc[0:2792].copy()
norm_test_df = normalized_listings.iloc[2792:].copy()

# Multivariate distance computed with Euclidean distance
from scipy.spatial import distance
first_listing = normalized_listings.iloc[0][['accommodates', 'bathrooms']]

# print(first_listing)

fifth_listing = normalized_listings.iloc[4][['accommodates', 'bathrooms']]   # index 4 is the fifth row
# print(fifth_listing)

first_fifth_distance = distance.euclidean(first_listing, fifth_listing)    # compute the Euclidean distance
# print(first_fifth_distance)
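
As a sanity check on what distance.euclidean computes, here is a minimal sketch that compares it with the formula written out by hand. The two vectors are hypothetical standardized listings, not rows from the dataset.

```python
import numpy as np
from scipy.spatial import distance

# Two hypothetical standardized listings: [accommodates, bathrooms]
a = np.array([0.5, -1.0])
b = np.array([-1.0, 1.0])

# Euclidean distance: square root of the sum of squared differences
manual_distance = np.sqrt(np.sum((a - b) ** 2))   # sqrt(1.5**2 + 2.0**2) = 2.5
scipy_distance = distance.euclidean(a, b)

print(manual_distance, scipy_distance)
```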



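### Multivariate KNN model
def predict_price_multivariate(new_listing_value, feature_columns):
    # Work on a copy so the shared training frame is not mutated on every call
    temp_df = norm_train_df.copy()
    # cdist returns an (n, 1) array; flatten it into a 1-D column of distances
    temp_df['distance'] = distance.cdist(temp_df[feature_columns], [new_listing_value[feature_columns]]).ravel()
    temp_df = temp_df.sort_values('distance')
    knn_5 = temp_df.price.iloc[:5]   # prices of the 5 nearest neighbors
    predicted_price = knn_5.mean()
    return predicted_price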

cols = ['accommodates', 'bathrooms']
norm_test_df['predicted_price'] = norm_test_df[cols].apply(predict_price_multivariate,feature_columns=cols,axis=1)    
print(norm_test_df)
norm_test_df['squared_error'] = (norm_test_df['predicted_price'] - norm_test_df['price'])**(2)    
mse = norm_test_df['squared_error'].mean()
rmse = mse ** (1/2)  # compute the root-mean-square error
# print(rmse)     
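
The RMSE calculation above can be illustrated with a tiny worked example. The numbers are invented, and the names toy_mse and toy_rmse are used only for this sketch.

```python
import numpy as np

# Invented actual and predicted values, for illustration only
actual = np.array([100.0, 150.0, 200.0])
predicted = np.array([110.0, 140.0, 230.0])

squared_errors = (predicted - actual) ** 2   # [100, 100, 900]
toy_mse = squared_errors.mean()              # 1100 / 3
toy_rmse = np.sqrt(toy_mse)                  # square root of the mean squared error

print(toy_mse, toy_rmse)
```

Because the code in this post standardizes price together with the features, its RMSE is expressed in standard deviations of price rather than in currency units.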



# KNN with sklearn
from sklearn.neighbors import KNeighborsRegressor
cols = ['accommodates','bedrooms']
knn = KNeighborsRegressor()      # k can be changed through n_neighbors (default: n_neighbors=5)
knn.fit(norm_train_df[cols], norm_train_df['price'])   # fit(): X holds the features, y holds the price
two_features_predictions = knn.predict(norm_test_df[cols])   # predictions for the test set

print(two_features_predictions)

from sklearn.metrics import mean_squared_error
# Compute the root-mean-square error
two_features_mse = mean_squared_error(norm_test_df['price'], two_features_predictions)
two_features_rmse = two_features_mse ** (1/2)
# print(two_features_rmse)


# Extend the model from two features to all seven features
knn = KNeighborsRegressor()

cols = ['accommodates','bedrooms','bathrooms','beds','minimum_nights','maximum_nights','number_of_reviews']

knn.fit(norm_train_df[cols], norm_train_df['price'])     
seven_features_predictions = knn.predict(norm_test_df[cols])
seven_features_mse = mean_squared_error(norm_test_df['price'], seven_features_predictions)
seven_features_rmse = seven_features_mse ** (1/2)
print(seven_features_rmse)
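
Since k can be changed through n_neighbors, one natural follow-up is to compare the RMSE for several values of k. The sketch below does this on synthetic data generated on the spot. The data, the candidate k values, and the 150/50 split are all assumptions made for illustration, not results from listings.csv.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

# Synthetic data: two features and a noisy linear target (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=200)

X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

# Fit one model per candidate k and record its test RMSE
rmse_by_k = {}
for k in (1, 3, 5, 10):
    model = KNeighborsRegressor(n_neighbors=k)
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    rmse_by_k[k] = mean_squared_error(y_test, predictions) ** 0.5

for k, value in rmse_by_k.items():
    print(k, round(value, 3))
```

The same loop could be run on norm_train_df and norm_test_df with the cols list above to pick a k for the rental data.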
