Kaggle Project: House Price Prediction (1)

Kaggle Project: House Price Prediction

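1. Problem Description

Using the historical sales data provided by the project for houses in Ames, Iowa, predict the sale prices of new houses.

This is a regression problem.

Submissions are scored by root mean squared error (RMSE), computed between the logarithm of the predicted price and the logarithm of the actual sale price.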
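As a sketch of the metric, RMSE = sqrt(mean((log(predicted) - log(actual))^2)). The helper below assumes all prices are strictly positive and uses the natural log; the name `log_rmse` is hypothetical and not part of the competition kit.

```python
import numpy as np


def log_rmse(y_true, y_pred):
    """RMSE between the natural logs of actual and predicted prices.

    Hypothetical helper for illustration; assumes every price is > 0.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Compare in log space, so relative rather than absolute errors are penalized
    diff = np.log(y_pred) - np.log(y_true)
    return float(np.sqrt(np.mean(diff ** 2)))


# A 10% overestimate scores the same on a cheap house and on an expensive one
cheap = log_rmse([100_000], [110_000])
expensive = log_rmse([500_000], [550_000])
print(cheap, expensive)
```

Because of the logarithm, a fixed percentage error costs the same regardless of the price level, so expensive houses do not dominate the score.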

# Import libraries
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import RobustScaler
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score, GridSearchCV, KFold
from sklearn.base import BaseEstimator, TransformerMixin, RegressorMixin
from sklearn.base import clone
from sklearn.linear_model import Lasso
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor, ExtraTreesRegressor
from sklearn.svm import SVR, LinearSVR
from sklearn.linear_model import ElasticNet, SGDRegressor, BayesianRidge
from sklearn.kernel_ridge import KernelRidge
from xgboost import XGBRegressor

# Render Chinese characters in plots (SimHei font) and draw minus signs as ASCII hyphens
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False

2. Data Understanding

2.1 Data Overview

# Load the data
train_df = pd.read_csv('./data/train.csv')
test_df = pd.read_csv('./data/test.csv')

# Inspect the first few rows
train_df.head()

print('Training set shape: %s, test set shape: %s' % (train_df.shape, test_df.shape))

Training set shape: (1460, 81), test set shape: (1459, 80)
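The test set has one column fewer than the training set. A quick way to confirm which column is missing is to compare the two column sets; the sketch below uses tiny stand-in DataFrames with illustrative values (with the real data you would pass `train_df` and `test_df`), and `column_diff` is a hypothetical helper name.

```python
import pandas as pd


def column_diff(train, test):
    """Return (columns only in train, columns only in test), each sorted."""
    only_train = sorted(set(train.columns) - set(test.columns))
    only_test = sorted(set(test.columns) - set(train.columns))
    return only_train, only_test


# Toy stand-ins for train.csv / test.csv (illustrative values only)
train = pd.DataFrame({"Id": [1, 2], "LotArea": [8450, 9600], "SalePrice": [208500, 181500]})
test = pd.DataFrame({"Id": [1461], "LotArea": [11622]})

print(column_diff(train, test))  # → (['SalePrice'], [])
```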

# Inspect basic information about the data
train_df.info()

RangeIndex: 1460 entries, 0 to 1459
Data columns (total 81 columns):
Id            1460 non-null int64
MSSubClass    1460 non-null int64
MSZoning      1460 non-null object
LotFrontage   1201 non-null float64
LotArea       1460 non-null int64
Street        1460 non-null object
Alley           91 non-null object
LotShape      1460 non-null object
LandContour   1460 non-null object
Utilities     1460 non-null object
LotConfig     1460 non-null object
LandSlope     1460 non-null object
Neighborhood  1460 non-null object
Condition1    1460 non-null object
Condition2    1460 non-null object
BldgType      1460 non-null object
HouseStyle    1460 non-null object
OverallQual   1460 non-null int64
OverallCond   1460 non-null int64
YearBuilt     1460 non-null int64
YearRemodAdd  1460 non-null int64
RoofStyle     1460 non-null object
RoofMatl      1460 non-null object
Exterior1st   1460 non-null object
Exterior2nd   1460 non-null object
MasVnrType    1452 non-null object
MasVnrArea    1452 non-null float64
ExterQual     1460 non-null object
ExterCond     1460 non-null object
Foundation    1460 non-null object
BsmtQual      1423 non-null object
BsmtCond      1423 non-null object
BsmtExposure  1422 non-null object
BsmtFinType1  1423 non-null object
BsmtFinSF1    1460 non-null int64
BsmtFinType2  1422 non-null object
BsmtFinSF2    1460 non-null int64
BsmtUnfSF     1460 non-null int64
TotalBsmtSF   1460 non-null int64
Heating       1460 non-null object
HeatingQC     1460 non-null object
CentralAir    1460 non-null object
Electrical    1459 non-null object
1stFlrSF      1460 non-null int64
2ndFlrSF      1460 non-null int64
LowQualFinSF  1460 non-null int64
GrLivArea     1460 non-null int64
BsmtFullBath  1460 non-null int64
BsmtHalfBath  1460 non-null int64
FullBath      1460 non-null int64
HalfBath      1460 non-null int64
BedroomAbvGr  1460 non-null int64
KitchenAbvGr  1460 non-null int64
KitchenQual   1460 non-null object
TotRmsAbvGrd  1460 non-null int64
Functional    1460 non-null object
Fireplaces    1460 non-null int64
FireplaceQu    770 non-null object
GarageType    1379 non-null object
GarageYrBlt   1379 non-null float64
GarageFinish  1379 non-null object
GarageCars    1460 non-null int64
GarageArea    1460 non-null int64
GarageQual    1379 non-null object
GarageCond    1379 non-null object
PavedDrive    1460 non-null object
WoodDeckSF    1460 non-null int64
OpenPorchSF   1460 non-null int64
EnclosedPorch 1460 non-null int64
3SsnPorch     1460 non-null int64
ScreenPorch   1460 non-null int64
PoolArea      1460 non-null int64
PoolQC           7 non-null object
Fence          281 non-null object
MiscFeature     54 non-null object
MiscVal       1460 non-null int64
MoSold        1460 non-null int64
YrSold        1460 non-null int64
SaleType      1460 non-null object
SaleCondition 1460 non-null object
SalePrice     1460 non-null int64
dtypes: float64(3), int64(35), object(43)
memory usage: 924.0+ KB

# Inspect summary statistics
train_df.describe()
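The `info()` output shows several columns with far fewer than 1460 non-null entries (for example PoolQC with 7, MiscFeature with 54, Alley with 91). A small helper that ranks columns by missing ratio makes this easier to scan; `missing_summary` is a hypothetical name, and the toy DataFrame stands in for `train_df`.

```python
import numpy as np
import pandas as pd


def missing_summary(df):
    """Missing-value count and ratio per column, only for columns with gaps."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({"missing": counts, "ratio": counts / len(df)})
    # Keep only columns that actually have missing values, worst first
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


toy = pd.DataFrame({
    "PoolQC": [np.nan, np.nan, np.nan, "Gd"],
    "LotFrontage": [65.0, np.nan, 68.0, 60.0],
    "LotArea": [8450, 9600, 11250, 9550],
})
print(missing_summary(toy))
```

Note that in this dataset a missing value does not always mean "unknown": according to the competition's data description, NA in columns such as PoolQC or Alley means the house has no pool or no alley access, which matters when choosing how to fill the gaps.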

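Basic information about the data

Training set shape: (1460, 81), test set shape: (1459, 80); the test set has no 'SalePrice' column.

There are 79 feature variables (excluding 'Id'); the target variable is 'SalePrice'.

Feature variable types: float64(3), int64(33), object(43)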
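The type breakdown above covers the 79 features, i.e. all columns except 'Id' and the target 'SalePrice' (which is why int64 drops from 35 to 33). One way to reproduce such a count is to drop those two columns and tally the dtypes; `feature_dtype_counts` is a hypothetical helper, shown on a toy frame in place of `train_df`.

```python
import pandas as pd


def feature_dtype_counts(df, drop=("Id", "SalePrice")):
    """Count feature dtypes after dropping the identifier and target columns."""
    features = df.drop(columns=[c for c in drop if c in df.columns])
    # Convert dtype objects to their names so the counts are easy to read
    return features.dtypes.astype(str).value_counts().to_dict()


toy = pd.DataFrame({
    "Id": [1, 2],
    "LotArea": [8450, 9600],
    "LotFrontage": [65.0, 80.0],
    "MSZoning": ["RL", "RL"],
    "SalePrice": [208500, 181500],
})
print(feature_dtype_counts(toy))
```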

Variable descriptions

- SalePrice: the property's sale price in US dollars; this is the target variable to predict.
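Because the metric compares logarithms of prices, a common approach (not necessarily the one used later in this series) is to check how skewed SalePrice is and to model log1p(SalePrice) instead, mapping predictions back with expm1. A minimal sketch using `scipy.stats.skew`, with illustrative prices standing in for `train_df['SalePrice']`; `skew_before_after_log` is a hypothetical helper name.

```python
import numpy as np
import scipy.stats as stats


def skew_before_after_log(prices):
    """Return the sample skewness of prices and of log1p(prices)."""
    prices = np.asarray(prices, dtype=float)
    return float(stats.skew(prices)), float(stats.skew(np.log1p(prices)))


# Illustrative right-skewed prices: a few expensive houses stretch the tail
prices = [120_000, 135_000, 140_000, 150_000, 160_000, 175_000, 210_000, 450_000, 750_000]
raw_skew, log_skew = skew_before_after_log(prices)
print(f"skew raw={raw_skew:.2f}, skew log1p={log_skew:.2f}")

# Train on the log scale, then map predictions back with expm1
log_target = np.log1p(prices)
restored = np.expm1(log_target)
```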
