Kaggle House Price Prediction: A Random Forest Approach

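This post describes my experience in Kaggle's 'House Prices: Advanced Regression Techniques' competition and shares a method for predicting house prices with a random forest. It draws on several tutorials and provides a Python implementation; the final submission scored 0.15105 on Kaggle.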
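For context on the 0.15105 figure: the competition scores submissions by the root-mean-squared error between the logarithm of the predicted sale price and the logarithm of the observed sale price, so lower is better. A minimal sketch of that metric follows. It assumes the natural logarithm, and the function name `rmse_log` is illustrative, not part of any library.

```python
import numpy as np

def rmse_log(y_true, y_pred):
    """RMSE between log(observed) and log(predicted) prices (natural log assumed)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Work in log space so relative errors on cheap and expensive houses count equally
    return float(np.sqrt(np.mean((np.log(y_pred) - np.log(y_true)) ** 2)))

# Hypothetical prices, for illustration only
print(rmse_log([200000, 150000], [210000, 140000]))
```

Because the error is measured on log prices, a 5% miss on a 100,000 house costs about as much as a 5% miss on a 500,000 house.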

House Prices: Advanced Regression Techniques

Competition link:

https://www.kaggle.com/c/house-prices-advanced-regression-techniques


Related reference tutorials

House Prices: sharing competition experience
https://www.kaggle.com/xirudieyi/house-prices-advanced-regression-techniques/house-prices
Regression using Keras
https://www.kaggle.com/vishnus/house-prices-advanced-regression-techniques/regression-using-keras/code
Advanced Regression Modeling on House Prices
http://blog.nycdatascience.com/student-works/advanced-regression-modeling-house-prices/

Python code

# -*- coding: utf-8 -*-
import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory
from subprocess import check_output
# print(check_output(["dir", "./"]).decode("utf8"))

# Load the data
train_data = pd.read_csv("./data/train.csv")
test_data = pd.read_csv("./data/test.csv")

# Helper functions for cleaning the data; they operate on pandas DataFrames

# List the columns that contain missing values
def show_missing(houseprice):
    missing = houseprice.columns[houseprice.isnull().any()].tolist()
    return missing

# Print the value counts of a categorical feature
def cat_exploration(houseprice, column):
    print(houseprice[column].value_counts())

# Fill the missing values of one column with a constant
def cat_imputation(houseprice, column, value):
    houseprice.loc[houseprice[column].isnull(), column] = value

# LotFrontage
# check correlation with LotArea
print(test_data['LotFrontage'].corr(test_data['LotArea']))  # 0.64
print(train_data['LotFrontage'].corr(train_data['LotArea']))  # 0.42
test_data['SqrtLotArea'] = np.sqrt(test_data['LotArea'])
train_data['SqrtLotArea'] = np.sqrt(train_data['LotArea'])
# print(test_data['LotFrontage'].corr(test_data['SqrtLotArea']))
# print(train_data['LotFrontage'].corr(train_data['SqrtLotArea']))

# Fill missing LotFrontage with the side length of a square lot of the same area
# (.loc avoids chained assignment, which may silently fail to write back)
cond = test_data['LotFrontage'].isnull()
test_data.loc[cond, 'LotFrontage'] = test_data.loc[cond, 'SqrtLotArea']
cond = train_data['LotFrontage'].isnull()
train_data.loc[cond, 'LotFrontage'] = train_data.loc[cond, 'SqrtLotArea']
del test_data['SqrtLotArea']
del train_data['SqrtLotArea']

# MSZoning
# Missing in the test set only, not in train
cat_exploration(test_data, 'MSZoning')
print(test_data[test_data['MSZoning'].isnull()])
# MSSubClass vs. MSZoning
print(pd.crosstab(test_data.MSSubClass, test_data.MSZoning))
# Fill only the missing zoning values in test_data by MSSubClass: 20 -> RL, 30 -> RM, 70 -> RM
missing_zoning = test_data['MSZoning'].isnull()
test_data.loc[missing_zoning & (test_data['MSSubClass'] == 20), 'MSZoning'] = 'RL'
test_data.loc[missing_zoning & (test_data['MSSubClass'] == 30), 'MSZoning'] = 'RM'
test_data.loc[missing_zoning & (test_data['MSSubClass'] == 70), 'MSZoning'] = 'RM'

# Alley
cat_exploration(test_data, 'Alley')
cat_exploration(train_data, 'Alley')
# Alley has too many NaNs, so fill it with 'None' here. It could also be dropped outright,
# or discarded later when selecting features by importance.
cat_imputation(test_data, 'Alley', 'None')
cat_imputation(train_data, 'Alley', 'None')
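The summary says the final predictions come from a random forest. The snippet below is a minimal sketch of that step, not the author's code. It assumes scikit-learn's `RandomForestRegressor`, one-hot encoding with `pd.get_dummies`, median filling of any numeric gaps left after the manual imputation, and training on `log(SalePrice)` to match the log-scale metric. The helper `fit_random_forest` and its default parameter values are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def fit_random_forest(train, test, n_estimators=200, seed=0):
    """Hypothetical helper: encode features, fit on log(SalePrice), return test predictions."""
    y = np.log(train["SalePrice"])  # assumption: model the log price
    # Stack train and test so both get the same dummy columns
    features = pd.concat(
        [train.drop(columns=["SalePrice"]), test], keys=["train", "test"]
    ).drop(columns=["Id"], errors="ignore")
    # Fill remaining numeric NaNs with the column median
    num_cols = features.select_dtypes(include="number").columns
    features[num_cols] = features[num_cols].fillna(features[num_cols].median())
    # One-hot encode categoricals; NaN becomes its own indicator column
    features = pd.get_dummies(features, dummy_na=True)
    X_train = features.loc["train"]
    X_test = features.loc["test"]
    model = RandomForestRegressor(
        n_estimators=n_estimators, random_state=seed, n_jobs=-1
    )
    model.fit(X_train, y)
    # Back-transform from log space to prices
    return np.exp(model.predict(X_test))
```

With the cleaned `train_data` and `test_data` from above, `preds = fit_random_forest(train_data, test_data)` would return one price per test row, which can be paired with `test_data['Id']` to build a submission file. Fitting on log prices means each tree averages log values, so the forest effectively optimizes the same log-scale error the leaderboard reports.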