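Kaggle House Price Prediction Project: Part 1
Other parts: Kaggle House Price Prediction Project, Part 2 (bagging); Part 3 (stacking); Part 4 (other data preprocessing methods)
Project introduction
Project page
Goal: your job is to predict the sale price of each house. For every Id in the test set, you must predict the value of the SalePrice variable.
Evaluation metric
Submissions are evaluated on the root mean squared error (RMSE) between the logarithm of the predicted value and the logarithm of the observed sale price. (Taking logs means that errors in predicting expensive houses and cheap houses affect the result equally.)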
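To make the metric concrete, here is a minimal sketch of the calculation. The function name `rmse_of_logs` and the toy prices are illustrative assumptions, not part of the competition code; the function simply takes the RMSE of the difference between the log of the predictions and the log of the observed prices.

```python
import numpy as np


def rmse_of_logs(y_true, y_pred):
    """RMSE between log(y_pred) and log(y_true); both must be positive."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((np.log(y_pred) - np.log(y_true)) ** 2)))


# Toy prices (not taken from the dataset): both predictions are 10% too high.
cheap = rmse_of_logs([100_000], [110_000])
expensive = rmse_of_logs([500_000], [550_000])
print(cheap, expensive)
```

Both toy predictions overshoot by 10%, so they receive the same score even though the absolute errors differ by a factor of five. This is the sense in which errors on cheap and expensive houses count equally.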
Loading the dataset
Import the toolkits and read the data
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import cross_val_score
import warnings
# Silence library warnings so the notebook output stays readable
warnings.filterwarnings('ignore')
# Show every column and row when printing DataFrames
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_colwidth', 100)
# Read the three competition files
data_sample_submission = pd.read_csv('./data/sample_submission.csv')
data_train = pd.read_csv('./data/train.csv')
data_test = pd.read_csv('./data/test.csv')
Basic information
data_sample_submission.head()
     Id      SalePrice
0  1461  169277.052498
1  1462  187758.393989
2  1463  183583.683570
3  1464  179317.477511
4  1465  150730.079977
data_sample_submission.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1459 entries, 0 to 1458
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Id 1459 non-null int64
1 SalePrice 1459 non-null float64
dtypes: float64(1), int64(1)
memory usage: 22.9 KB
data_train.head()
Output, shown transposed (one line per column, one field per row index):
Column         0         1         2         3         4
Id             1         2         3         4         5
MSSubClass     60        20        60        70        60
MSZoning       RL        RL        RL        RL        RL
LotFrontage    65.0      80.0      68.0      60.0      84.0
LotArea        8450      9600      11250     9550      14260
Street         Pave      Pave      Pave      Pave      Pave
Alley          NaN       NaN       NaN       NaN       NaN
LotShape       Reg       Reg       IR1       IR1       IR1
LandContour    Lvl       Lvl       Lvl       Lvl       Lvl
Utilities      AllPub    AllPub    AllPub    AllPub    AllPub
LotConfig      Inside    FR2       Inside    Corner    FR2
LandSlope      Gtl       Gtl       Gtl       Gtl       Gtl
Neighborhood   CollgCr   Veenker   CollgCr   Crawfor   NoRidge
Condition1     Norm      Feedr     Norm      Norm      Norm
Condition2     Norm      Norm      Norm      Norm      Norm
BldgType       1Fam      1Fam      1Fam      1Fam      1Fam
HouseStyle     2Story    1Story    2Story    2Story    2Story
OverallQual    7         6         7         7         8
OverallCond    5         8         5         5         5
YearBuilt      2003      1976      2001      1915      2000
YearRemodAdd   2003      1976      2002      1970      2000
RoofStyle      Gable     Gable     Gable     Gable     Gable
RoofMatl       CompShg   CompShg   CompShg   CompShg   CompShg
Exterior1st    VinylSd   MetalSd   VinylSd   Wd Sdng   VinylSd
Exterior2nd    VinylSd   MetalSd   VinylSd   Wd Shng   VinylSd
MasVnrType     BrkFace   None      BrkFace   None      BrkFace
MasVnrArea     196.0     0.0       162.0     0.0       350.0
ExterQual      Gd        TA        Gd        TA        Gd
ExterCond      TA        TA        TA        TA        TA
Foundation     PConc     CBlock    PConc     BrkTil    PConc
BsmtQual       Gd        Gd        Gd        TA        Gd
BsmtCond       TA        TA        TA        Gd        TA
BsmtExposure   No        Gd        Mn        No        Av
BsmtFinType1   GLQ       ALQ       GLQ       ALQ       GLQ
BsmtFinSF1     706       978       486       216       655
BsmtFinType2   Unf       Unf       Unf       Unf       Unf
BsmtFinSF2     0         0         0         0         0
BsmtUnfSF      150       284       434       540       490
TotalBsmtSF    856       1262      920       756       1145
Heating        GasA      GasA      GasA      GasA      GasA
HeatingQC      Ex        Ex        Ex        Gd        Ex
CentralAir     Y         Y         Y         Y         Y
Electrical     SBrkr     SBrkr     SBrkr     SBrkr     SBrkr
1stFlrSF       856       1262      920       961       1145
2ndFlrSF       854       0         866       756       1053
LowQualFinSF   0         0         0         0         0
GrLivArea      1710      1262      1786      1717      2198
BsmtFullBath   1         0         1         1         1
BsmtHalfBath   0         1         0         0         0
FullBath       2         2         2         1         2
HalfBath       1         0         1         0         1
BedroomAbvGr   3         3         3         3         4
KitchenAbvGr   1         1         1         1         1
KitchenQual    Gd        TA        Gd        Gd        Gd
TotRmsAbvGrd   8         6         6         7         9
Functional     Typ       Typ       Typ       Typ       Typ
Fireplaces     0         1         1         1         1
FireplaceQu    NaN       TA        TA        Gd        TA
GarageType     Attchd    Attchd    Attchd    Detchd    Attchd
GarageYrBlt    2003.0    1976.0    2001.0    1998.0    2000.0
GarageFinish   RFn       RFn       RFn       Unf       RFn
GarageCars     2         2         2         3         3
GarageArea     548       460       608       642       836
GarageQual     TA        TA        TA        TA        TA
GarageCond     TA        TA        TA        TA        TA
PavedDrive     Y         Y         Y         Y         Y
WoodDeckSF     0         298       0         0         192
OpenPorchSF    61        0         42        35        84
EnclosedPorch  0         0         0         272       0
3SsnPorch      0         0         0         0         0
ScreenPorch    0         0         0         0         0
PoolArea       0         0         0         0         0
PoolQC         NaN       NaN       NaN       NaN       NaN
Fence          NaN       NaN       NaN       NaN       NaN
MiscFeature    NaN       NaN       NaN       NaN       NaN
MiscVal        0         0         0         0         0
MoSold         2         5         9         2         12
YrSold         2008      2007      2008      2006      2008
SaleType       WD        WD        WD        WD        WD
SaleCondition  Normal    Normal    Normal    Abnorml   Normal
SalePrice      208500    181500    223500    140000    250000
data_train.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 81 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Id 1460 non-null int64
1 MSSubClass 1460 non-null int64
2 MSZoning 1460 non-null object
3 LotFrontage 1201 non-null float64
4 LotArea 1460 non-null int64
5 Street 1460 non-null object
6 Alley 91 non-null object
7 LotShape 1460 non-null object
8 LandContour 1460 non-null object
9 Utilities 1460 non-null object
10 LotConfig 1460 non-null object
11 LandSlope 1460 non-null object
12 Neighborhood 1460 non-null object
13 Condition1 1460 non-null object
14 Condition2 1460 non-null object
15 BldgType 1460 non-null object
16 HouseStyle 1460 non-null object
17 OverallQual 1460 non-null int64
18 OverallCond 1460 non-null int64
19 YearBuilt 1460 non-null int64
20 YearRemodAdd 1460 non-null int64
21 RoofStyle 1460 non-null object
22 RoofMatl 1460 non-null object
23 Exterior1st 1460 non-null object
24 Exterior2nd 1460 non-null object
25 MasVnrType 1452 non-null object
26 MasVnrArea 1452 non-null float64
27 ExterQual 1460 non-null object
28 ExterCond 1460 non-null object
29 Foundation 1460 non-null object
30 BsmtQual 1423 non-null object
31 BsmtCond 1423 non-null object
32 BsmtExposure 1422 non-null object
33 BsmtFinType1 1423 non-null object
34 BsmtFinSF1 1460 non-null int64
35 BsmtFinType2 1422 non-null object
36 BsmtFinSF2 1460 non-null int64
37 BsmtUnfSF 1460 non-null int64
38 TotalBsmtSF 1460 non-null int64
39 Heating 1460 non-null object
40 HeatingQC 1460 non-null object
41 CentralAir 1460 non-null object
42 Electrical 1459 non-null object
43 1stFlrSF 1460 non-null int64
44 2ndFlrSF 1460 non-null int64
45 LowQualFinSF 1460 non-null int64
46 GrLivArea 1460 non-null int64
47 BsmtFullBath 1460 non-null int64
48 BsmtHalfBath 1460 non-null int64
49 FullBath 1460 non-null int64
50 HalfBath 1460 non-null int64
51 BedroomAbvGr 1460 non-null int64
52 KitchenAbvGr 1460 non-null int64
53 KitchenQual 1460 non-null object
54 TotRmsAbvGrd 1460 non-null int64
55 Functional 1460 non-null object
56 Fireplaces 1460 non-null int64
57 FireplaceQu 770 non-null object
58 GarageType 1379 non-null object
59 GarageYrBlt 1379 non-null float64
60 GarageFinish 1379 non-null object
61 GarageCars 1460 non-null int64
62 GarageArea 1460 non-null int64
63 GarageQual 1379 non-null object
64 GarageCond 1379 non-null object
65 PavedDrive 1460 non-null object
66 WoodDeckSF 1460 non-null int64
67 OpenPorchSF 1460 non-null int64
68 EnclosedPorch 1460 non-null int64
69 3SsnPorch 1460 non-null int64
70 ScreenPorch 1460 non-null int64
71 PoolArea 1460 non-null int64
72 PoolQC 7 non-null object
73 Fence 281 non-null object
74 MiscFeature 54 non-null object
75 MiscVal 1460 non-null int64
76 MoSold 1460 non-null int64
77 YrSold 1460 non-null int64
78 SaleType 1460 non-null object
79 SaleCondition 1460 non-null object
80 SalePrice 1460 non-null int64
dtypes: float64(3), int64(35), object(43)
memory usage: 924.0+ KB
data_test.head()
Output, shown transposed (one line per column, one field per row index):
Column         0         1         2         3         4
Id             1461      1462      1463      1464      1465
MSSubClass     20        20        60        60        120
MSZoning       RH        RL        RL        RL        RL
LotFrontage    80.0      81.0      74.0      78.0      43.0
LotArea        11622     14267     13830     9978      5005
Street         Pave      Pave      Pave      Pave      Pave
Alley          NaN       NaN       NaN       NaN       NaN
LotShape       Reg       IR1       IR1       IR1       IR1
LandContour    Lvl       Lvl       Lvl       Lvl       HLS
Utilities      AllPub    AllPub    AllPub    AllPub    AllPub
LotConfig      Inside    Corner    Inside    Inside    Inside
LandSlope      Gtl       Gtl       Gtl       Gtl       Gtl
Neighborhood   NAmes     NAmes     Gilbert   Gilbert   StoneBr
Condition1     Feedr     Norm      Norm      Norm      Norm
Condition2     Norm      Norm      Norm      Norm      Norm
BldgType       1Fam      1Fam      1Fam      1Fam      TwnhsE
HouseStyle     1Story    1Story    2Story    2Story    1Story
OverallQual    5         6         5         6         8
OverallCond    6         6         5         6         5
YearBuilt      1961      1958      1997      1998      1992
YearRemodAdd   1961      1958      1998      1998      1992
RoofStyle      Gable     Hip       Gable     Gable     Gable
RoofMatl       CompShg   CompShg   CompShg   CompShg   CompShg
Exterior1st    VinylSd   Wd Sdng   VinylSd   VinylSd   HdBoard
Exterior2nd    VinylSd   Wd Sdng   VinylSd   VinylSd   HdBoard
MasVnrType     None      BrkFace   None      BrkFace   None
MasVnrArea     0.0       108.0     0.0       20.0      0.0
ExterQual      TA        TA        TA        TA        Gd
ExterCond      TA        TA        TA        TA        TA
Foundation     CBlock    CBlock    PConc     PConc     PConc
BsmtQual       TA        TA        Gd        TA        Gd
BsmtCond       TA        TA        TA        TA        TA
BsmtExposure   No        No        No        No        No
BsmtFinType1   Rec       ALQ       GLQ       GLQ       ALQ
BsmtFinSF1     468.0     923.0     791.0     602.0     263.0
BsmtFinType2   LwQ       Unf       Unf       Unf       Unf
BsmtFinSF2     144.0     0.0       0.0       0.0       0.0
BsmtUnfSF      270.0     406.0     137.0     324.0     1017.0
TotalBsmtSF    882.0     1329.0    928.0     926.0     1280.0
Heating        GasA      GasA      GasA      GasA      GasA
HeatingQC      TA        TA        Gd        Ex        Ex
CentralAir     Y         Y         Y         Y         Y
Electrical     SBrkr     SBrkr     SBrkr     SBrkr     SBrkr
1stFlrSF       896       1329      928       926       1280
2ndFlrSF       0         0         701       678       0
LowQualFinSF   0         0         0         0         0
GrLivArea      896       1329      1629      1604      1280
BsmtFullBath   0.0       0.0       0.0       0.0       0.0
BsmtHalfBath   0.0       0.0       0.0       0.0       0.0
FullBath       1         1         2         2         2
HalfBath       0         1         1         1         0
BedroomAbvGr   2         3         3         3         2
KitchenAbvGr   1         1         1         1         1
KitchenQual    TA        Gd        TA        Gd        Gd
TotRmsAbvGrd   5         6         6         7         5
Functional     Typ       Typ       Typ       Typ       Typ
Fireplaces     0         0         1         1         0
FireplaceQu    NaN       NaN       TA        Gd        NaN
GarageType     Attchd    Attchd    Attchd    Attchd    Attchd
GarageYrBlt    1961.0    1958.0    1997.0    1998.0    1992.0
GarageFinish   Unf       Unf       Fin       Fin       RFn
GarageCars     1.0       1.0       2.0       2.0       2.0
GarageArea     730.0     312.0     482.0     470.0     506.0
GarageQual     TA        TA        TA        TA        TA
GarageCond     TA        TA        TA        TA        TA
PavedDrive     Y         Y         Y         Y         Y
WoodDeckSF     140       393       212       360       0
OpenPorchSF    0         36        34        36        82
EnclosedPorch  0         0         0         0         0
3SsnPorch      0         0         0         0         0
ScreenPorch    120       0         0         0         144
PoolArea       0         0         0         0         0
PoolQC         NaN       NaN       NaN       NaN       NaN
Fence          MnPrv     NaN       MnPrv     NaN       NaN
MiscFeature    NaN       Gar2      NaN       NaN       NaN
MiscVal        0         12500     0         0         0
MoSold         6         6         3         6         1
YrSold         2010      2010      2010      2010      2010
SaleType       WD        WD        WD        WD        WD
SaleCondition  Normal    Normal    Normal    Normal    Normal
data_test.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1459 entries, 0 to 1458
Data columns (total 80 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Id 1459 non-null int64
1 MSSubClass 1459 non-null int64
2 MSZoning 1455 non-null object
3 LotFrontage 1232 non-null float64
4 LotArea 1459 non-null int64
5 Street 1459 non-null object
6 Alley 107 non-null object
7 LotShape 1459 non-null object
8 LandContour 1459 non-null object
9 Utilities 1457 non-null object
10 LotConfig 1459 non-null object
11 LandSlope 1459 non-null object
12 Neighborhood 1459 non-null object
13 Condition1 1459 non-null object
14 Condition2 1459 non-null object
15 BldgType 1459 non-null object
16 HouseStyle 1459 non-null object
17 OverallQual 1459 non-null int64
18 OverallCond 1459 non-null int64
19 YearBuilt 1459 non-null int64
20 YearRemodAdd 1459 non-null int64
21 RoofStyle 1459 non-null object
22 RoofMatl 1459 non-null object
23 Exterior1st 1458 non-null object
24 Exterior2nd 1458 non-null object
25 MasVnrType 1443 non-null object
26 MasVnrArea 1444 non-null float64
27 ExterQual 1459 non-null object
28 ExterCond 1459 non-null object
29 Foundation 1459 non-null object
30 BsmtQual 1415 non-null object
31 BsmtCond 1414 non-null object
32 BsmtExposure 1415 non-null object
33 BsmtFinType1 1417 non-null object
34 BsmtFinSF1 1458 non-null float64
35 BsmtFinType2 1417 non-null object
36 BsmtFinSF2 1458 non-null float64
37 BsmtUnfSF 1458 non-null float64
38 TotalBsmtSF 1458 non-null float64
39 Heating 1459 non-null object
40 HeatingQC 1459 non-null object
41 CentralAir 1459 non-null object
42 Electrical 1459 non-null object
43 1stFlrSF 1459 non-null int64
44 2ndFlrSF 1459 non-null int64
45 LowQualFinSF 1459 non-null int64
46 GrLivArea 1459 non-null int64
47 BsmtFullBath 1457 non-null float64
48 BsmtHalfBath 1457 non-null float64
49 FullBath 1459 non-null int64
50 HalfBath 1459 non-null int64
51 BedroomAbvGr 1459 non-null int64
52 KitchenAbvGr 1459 non-null int64
53 KitchenQual 1458 non-null object
54 TotRmsAbvGrd 1459 non-null int64
55 Functional 1457 non-null object
56 Fireplaces 1459 non-null int64
57 FireplaceQu 729 non-null object
58 GarageType 1383 non-null object
59 GarageYrBlt 1381 non-null float64
60 GarageFinish 1381 non-null object
61 GarageCars 1458 non-null float64
62 GarageArea 1458 non-null float64
63 GarageQual 1381 non-null object
64 GarageCond 1381 non-null object
65 PavedDrive 1459 non-null object
66 WoodDeckSF 1459 non-null int64
67 OpenPorchSF 1459 non-null int64
68 EnclosedPorch 1459 non-null int64
69 3SsnPorch 1459 non-null int64
70 ScreenPorch 1459 non-null int64
71 PoolArea 1459 non-null int64
72 PoolQC 3 non-null object
73 Fence 290 non-null object
74 MiscFeature 51 non-null object
75 MiscVal 1459 non-null int64
76 MoSold 1459 non-null int64
77 YrSold 1459 non-null int64
78 SaleType 1458 non-null object
79 SaleCondition 1459 non-null object
dtypes: float64(11), int64(26), object(43)
memory usage: 912.0+ KB
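Comparing the two info() listings, several columns (for example BsmtFinSF1 and GarageCars) are int64 in the training set but float64 in the test set. pandas reads an integer column as float64 when it contains missing values, so a dtype difference is a quick hint about where the test set has gaps. The sketch below lists such columns; the helper name `dtype_mismatches` and the toy frames are assumptions for illustration, and in the notebook the function would be called on data_train and data_test.

```python
import pandas as pd


def dtype_mismatches(train, test):
    """Return the shared columns whose dtypes differ between the two frames."""
    shared = [c for c in train.columns if c in test.columns]
    diff = pd.DataFrame({
        "train": train[shared].dtypes.astype(str),
        "test": test[shared].dtypes.astype(str),
    })
    return diff[diff["train"] != diff["test"]]


# Toy frames standing in for data_train / data_test (values are made up).
toy_train = pd.DataFrame({"Id": [1, 2], "GarageCars": [2, 3], "SalePrice": [1, 2]})
toy_test = pd.DataFrame({"Id": [3, 4], "GarageCars": [1.0, None]})

mismatch = dtype_mismatches(toy_train, toy_test)
print(mismatch)
```

In the toy example only GarageCars is reported, because the missing value forces it to float64 in the second frame.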
data_train.describe()
Output, shown transposed (one line per column, one field per statistic):
Column         count          mean           std            min            25%            50%            75%            max
Id             1460.000000    730.500000     421.610009     1.000000       365.750000     730.500000     1095.250000    1460.000000
MSSubClass     1460.000000    56.897260      42.300571      20.000000      20.000000      50.000000      70.000000      190.000000
LotFrontage    1201.000000    70.049958      24.284752      21.000000      59.000000      69.000000      80.000000      313.000000
LotArea        1460.000000    10516.828082   9981.264932    1300.000000    7553.500000    9478.500000    11601.500000   215245.000000
OverallQual    1460.000000    6.099315       1.382997       1.000000       5.000000       6.000000       7.000000       10.000000
OverallCond    1460.000000    5.575342       1.112799       1.000000       5.000000       5.000000       6.000000       9.000000
YearBuilt      1460.000000    1971.267808    30.202904      1872.000000    1954.000000    1973.000000    2000.000000    2010.000000
YearRemodAdd   1460.000000    1984.865753    20.645407      1950.000000    1967.000000    1994.000000    2004.000000    2010.000000
MasVnrArea     1452.000000    103.685262     181.066207     0.000000       0.000000       0.000000       166.000000     1600.000000
BsmtFinSF1     1460.000000    443.639726     456.098091     0.000000       0.000000       383.500000     712.250000     5644.000000
BsmtFinSF2     1460.000000    46.549315      161.319273     0.000000       0.000000       0.000000       0.000000       1474.000000
BsmtUnfSF      1460.000000    567.240411     441.866955     0.000000       223.000000     477.500000     808.000000     2336.000000
TotalBsmtSF    1460.000000    1057.429452    438.705324     0.000000       795.750000     991.500000     1298.250000    6110.000000
1stFlrSF       1460.000000    1162.626712    386.587738     334.000000     882.000000     1087.000000    1391.250000    4692.000000
2ndFlrSF       1460.000000    346.992466     436.528436     0.000000       0.000000       0.000000       728.000000     2065.000000
LowQualFinSF   1460.000000    5.844521       48.623081      0.000000       0.000000       0.000000       0.000000       572.000000
GrLivArea      1460.000000    1515.463699    525.480383     334.000000     1129.500000    1464.000000    1776.750000    5642.000000
BsmtFullBath   1460.000000    0.425342       0.518911       0.000000       0.000000       0.000000       1.000000       3.000000
BsmtHalfBath   1460.000000    0.057534       0.238753       0.000000       0.000000       0.000000       0.000000       2.000000
FullBath       1460.000000    1.565068       0.550916       0.000000       1.000000       2.000000       2.000000       3.000000
HalfBath       1460.000000    0.382877       0.502885       0.000000       0.000000       0.000000       1.000000       2.000000
BedroomAbvGr   1460.000000    2.866438       0.815778       0.000000       2.000000       3.000000       3.000000       8.000000
KitchenAbvGr   1460.000000    1.046575       0.220338       0.000000       1.000000       1.000000       1.000000       3.000000
TotRmsAbvGrd   1460.000000    6.517808       1.625393       2.000000       5.000000       6.000000       7.000000       14.000000
Fireplaces     1460.000000    0.613014       0.644666       0.000000       0.000000       1.000000       1.000000       3.000000
GarageYrBlt    1379.000000    1978.506164    24.689725      1900.000000    1961.000000    1980.000000    2002.000000    2010.000000
GarageCars     1460.000000    1.767123       0.747315       0.000000       1.000000       2.000000       2.000000       4.000000
GarageArea     1460.000000    472.980137     213.804841     0.000000       334.500000     480.000000     576.000000     1418.000000
WoodDeckSF     1460.000000    94.244521      125.338794     0.000000       0.000000       0.000000       168.000000     857.000000
OpenPorchSF    1460.000000    46.660274      66.256028      0.000000       0.000000       25.000000      68.000000      547.000000
EnclosedPorch  1460.000000    21.954110      61.119149      0.000000       0.000000       0.000000       0.000000       552.000000
3SsnPorch      1460.000000    3.409589       29.317331      0.000000       0.000000       0.000000       0.000000       508.000000
ScreenPorch    1460.000000    15.060959      55.757415      0.000000       0.000000       0.000000       0.000000       480.000000
PoolArea       1460.000000    2.758904       40.177307      0.000000       0.000000       0.000000       0.000000       738.000000
MiscVal        1460.000000    43.489041      496.123024     0.000000       0.000000       0.000000       0.000000       15500.000000
MoSold         1460.000000    6.321918       2.703626       1.000000       5.000000       6.000000       8.000000       12.000000
YrSold         1460.000000    2007.815753    1.328095       2006.000000    2007.000000    2008.000000    2009.000000    2010.000000
SalePrice      1460.000000    180921.195890  79442.502883   34900.000000   129975.000000  163000.000000  214000.000000  755000.000000
data_train.shape
(1460, 81)
data_test.shape
(1459, 80)
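The training set has one more column than the test set; from the info() listings above, that extra column is the target, SalePrice. A small check like the following can confirm which columns appear in one frame but not the other. The helper name `columns_only_in` and the toy frames are illustrative assumptions; in the notebook it would be called as columns_only_in(data_train, data_test).

```python
import pandas as pd


def columns_only_in(left, right):
    """List the columns of `left` that are absent from `right`, in order."""
    return [c for c in left.columns if c not in right.columns]


# Toy frames standing in for data_train / data_test (values are made up).
toy_train = pd.DataFrame({"Id": [1], "LotArea": [8450], "SalePrice": [208500]})
toy_test = pd.DataFrame({"Id": [1461], "LotArea": [11622]})

extra_in_train = columns_only_in(toy_train, toy_test)
extra_in_test = columns_only_in(toy_test, toy_train)
print(extra_in_train, extra_in_test)
```

If the second list is non-empty on the real data, the two files do not share a schema and should be inspected before any joint preprocessing.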
Exploratory data analysis (EDA)
Missing data
def missing_data(data):
    total = data.isnull().sum().sort_values(ascending=False)
    percent = (data.isnull