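The dataset contains housing-price records for different districts of 40 cities in China, covering 2012 to 2018. The raw training set has 553,098 rows and the raw test set has 72,870 rows; after removing garbled entries and missing values, the training set has 537,242 rows and the test set has 72,245 rows.
Data cleaning
* Rename variables to lowercase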
rename Price price
rename Lat lat
rename Lon lon
rename Time time
rename Floors floors
rename Floor floor
rename Rooms rooms
rename Halls halls
rename Area area
rename OrientS orients
rename City city
rename District district
rename Street street
rename Community community
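The explicit list above can be collapsed into a single command. This is a minimal sketch, assuming Stata 12 or later and assuming that every variable in memory should be lowercased, not only the fourteen listed. It is meant to replace the individual rename lines, not to run in addition to them.

```stata
* Lowercase every variable name in memory in one step
rename *, lower
* List the resulting names to confirm the change
describe, simple
```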
* Take logs of price, latitude and longitude
generate lprice = ln(price)
generate llat = ln(lat)
generate llon = ln(lon)
* Total number of rooms and halls; if the total is 0, set it to 1
generate nrooms = rooms + halls
replace nrooms = 1 if nrooms == 0
* Average area per room (area divided by the number of rooms plus halls)
generate avgareaperroom = area / nrooms
* Log of average area per room
generate lavgareaperroom = ln(avgareaperroom)
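* Approximate the floor on which the unit sits:
* encode turns the string variable floor into the numeric variable floor1,
* and replace maps each code to a relative position (floorlocation) within the building:
*   floor1 == 3 (low)       -> floorlocation = 1/6
*   floor1 == 4 (middle)    -> floorlocation = 1/2
*   floor1 == 2 (high)      -> floorlocation = 5/6
*   floor1 == 5 (top floor) -> floorlocation = 1
* floorlevel is then floors * floorlocation, e.g. 1/6 of the total number of floors for a low unit
* If the resulting floorlevel is 0, set it to 1
generate floorlocation = 0
encode floor, generate(floor1)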
replace floorlocation = 1 / 6 if floor1 == 3
replace floorlocation = 1 / 2 if floor1 == 4
replace floorlocation = 5 / 6 if floor1 == 2
replace floorlocation = 1 if floor1 == 5
generate floorlevel = floors * floorlocation
replace floorlevel = 1 if floorlevel == 0
* Take logs of the estimated floor level and of the building's total number of floors
generate lfloorlevel = ln(floorlevel)
generate lfloors = ln(floors)
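The codes 2, 3, 4 and 5 used above depend on how encode numbered the string values of floor (it assigns codes in sorted order of the strings), so the mapping is worth checking against the data. A minimal sketch of such a check, assuming the variables created above are in memory:

```stata
* Show which string value of floor received which integer code
label list floor1
* Cross-check the codes against the assigned relative positions
tabulate floor1 floorlocation, missing
* Units whose floor category matched none of the four codes keep floorlocation == 0
count if floorlocation == 0
* Logs are missing wherever floors is 0 or missing
count if missing(lfloorlevel, lfloors)
```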
Main regression table
regress lprice lavgareaperroom
est sto a1
regress lprice lavgareaperroom llat llon orients
est sto a2
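* city is a string variable, so encode it as city1 before using it as a factor variable
encode city, generate(city1)
regress lprice lavgareaperroom llat llon orients i.time i.city1
est sto a3
esttab a1 a2 a3 using C:\Users\lys\Desktop\竞赛Stata文件\主回归.rtf, b(%12.6f) se(%12.6f) nogap compress s(N r2 r2_a) star(* 0.1 ** 0.05 *** 0.01)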
Multicollinearity
regress lprice lavgareaperroom llat llon orients i.time i.city1
estat vif
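Heteroskedasticity and autocorrelation tests
* Heteroskedasticity test: regress the squared residuals on the regressors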
regress lprice lavgareaperroom llat llon orients i.time i.city1
est sto a4
predict e,residual
generate esq = e^2
regress esq lavgareaperroom llat llon orients i.time i.city1
est sto a5
esttab a4 a5 using C:\Users\lys\Desktop\竞赛Stata文件\异方差.rtf, b(%12.6f) se(%12.6f) nogap compress s(N r2 r2_a) star(* 0.1 ** 0.05 *** 0.01)
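* LM statistic = N * R-squared, with N = 537242 and 0.0752 taken as the R-squared of the auxiliary regression a5
di 537242 * 0.0752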
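The LM statistic can also be computed from stored results instead of typed-in numbers, together with its p-value. A minimal sketch, assuming esq and the regressors above are in memory; the scalar names lm and lm_p are illustrative and not used elsewhere in this file. Under the null of homoskedasticity, N * R-squared is asymptotically chi-squared with degrees of freedom equal to the number of regressors in the auxiliary regression.

```stata
* Re-run the auxiliary regression so that e(N), e(r2) and e(df_m) refer to it
quietly regress esq lavgareaperroom llat llon orients i.time i.city1
* LM statistic and its chi-squared p-value
scalar lm = e(N) * e(r2)
scalar lm_p = chi2tail(e(df_m), lm)
display "LM = " lm "   df = " e(df_m) "   p-value = " lm_p
```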
* EGLS (feasible GLS) correction for heteroskedasticity
generate logesq = ln(esq)
regress logesq lavgareaperroom llat llon orients i.time i.city1
est sto a6
predict logesqhat
generate h = exp(logesqhat)
regress lprice lavgareaperroom llat llon orients i.time i.city1 [aweight = 1/h]
est sto a7
esttab a6 a7 using C:\Users\lys\Desktop\竞赛Stata文件\异方差EGLS2.rtf, b(%12.6f) se(%12.6f) nogap compress s(N r2 r2_a) star(* 0.1 ** 0.05 *** 0.01)
* EGLS correction for autocorrelation (on top of the heteroskedasticity correction)
predict ee,residual
generate eelag1 = ee[_n-1]
regress ee eelag1
est sto a8
generate lpricelag1 = lprice[_n-1]
generate lavgareaperroomlag1 = lavgareaperroom[_n-1]
generate llatlag1 = llat[_n-1]
generate llonlag1 = llon[_n-1]
generate lpricelag1rho = lpricelag1 * 0.4
generate lavgareaperroomlag1rho = lavgareaperroomlag1 * 0.4
generate llatlag1rho = llatlag1 * 0.4
generate llonlag1rho = llonlag1 * 0.4
generate lpriceelgs = lprice - lpricelag1rho
generate lavgareaperroomelgs = lavgareaperroom - lavgareaperroomlag1rho
generate llatelgs = llat - llatlag1rho
generate llonelgs = llon - llonlag1rho
regress lpriceelgs lavgareaperroomelgs llatelgs llonelgs orients i.time i.city1 [aweight = 1/h]
est sto a9
esttab a8 a9 using C:\Users\lys\Desktop\竞赛Stata文件\AU.rtf, b(%12.6f) se(%12.6f) nogap compress s(N r2 r2_a) star(* 0.1 ** 0.05 *** 0.01)
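The factor 0.4 is hard-coded in the quasi-differencing above. A minimal sketch that reads the coefficient from the residual regression instead, assuming ee and eelag1 exist and that the observations are already sorted in the order in which a lag is meaningful (the [_n-1] subscript simply takes the previous row). The scalar rho and the _qd suffix are illustrative names not used elsewhere in this file.

```stata
* Estimate rho from the regression of the residual on its own lag
quietly regress ee eelag1
scalar rho = _b[eelag1]
display "estimated rho = " scalar(rho)
* Quasi-difference each continuous variable: x - rho * (previous row of x)
foreach v of varlist lprice lavgareaperroom llat llon {
    generate `v'_qd = `v' - scalar(rho) * `v'[_n-1]
}
```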
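Model validity check. MAD = Σ|ln(y) - ln(y_hat)| / N. The model is estimated with ln(price) as the dependent variable, so the fitted value y_hat is already an estimate of ln(price) and is plugged in directly, without taking logs again.
* Encode city as city1 only if city1 does not exist yet
capture confirm variable city1
if _rc {
    encode city, generate(city1)
}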
regress lprice lavgareaperroom llat llon orients i.time i.city1
est sto a1
predict y_hat
rvfplot
gen price1_hat=y_hat
gen uuuuu = abs(lprice-price1_hat)
* The mean reported by summarize is the MAD
summarize uuuuu
drop y_hat
summarize city1
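To report the MAD as a single number rather than reading it off the summarize table, the stored result r(mean) can be displayed directly. A minimal sketch, assuming the absolute-error variable uuuuu created above is still in memory:

```stata
* MAD is the mean of the absolute log errors
quietly summarize uuuuu
display "MAD = " %9.6f r(mean) "   N = " r(N)
```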
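Endogeneity test: instrumental-variable approach
* Baseline specification: lprice lavgareaperroom llat llon orients i.time i.city1
* Instrument for lavgareaperroom: log of total area
gen AREA = ln(area)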
regress lavgareaperroom AREA llat llon orients i.time i.city1
predict v
est sto i1
regress lprice v llat llon orients i.time i.city1
est sto i2
esttab i1 i2 using C:\Users\lys\Desktop\竞赛Stata文件\IV.rtf, b(%12.6f) se(%12.6f) nogap compress s(N r2 r2_a) star(* 0.1 ** 0.05 *** 0.01)
ivregress 2sls lprice llat llon orients i.time i.city1 (lavgareaperroom=AREA),r first
estat firststage,all forcenonrobust
ivregress liml lprice llat llon orients i.time i.city1 (lavgareaperroom=AREA),r first
est sto i3
esttab i1 i2 i3 using C:\Users\lys\Desktop\竞赛Stata文件\IVliml.rtf, b(%12.6f) se(%12.6f) nogap compress s(N r2 r2_a) star(* 0.1 ** 0.05 *** 0.01)
**# Hausman test (H0: all regressors are exogenous)
quietly reg lprice lavgareaperroom llat llon orients i.time i.city1
estimates store ols
quietly ivregress 2sls lprice llat llon orients i.time i.city1 (lavgareaperroom=AREA)
estimates store iv
hausman iv ols,constant sigmamore
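* Heteroskedasticity-robust DWH test: estat endogenous uses the active results, so refit 2SLS with a robust VCE first
quietly ivregress 2sls lprice llat llon orients i.time i.city1 (lavgareaperroom=AREA), vce(robust)
estat endogenous
* Summary of endogeneity results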
quietly reg lprice llat llon orients i.time i.city1,r
estimates store ols_no_AAR
quietly reg lprice lavgareaperroom llat llon orients i.time i.city1,r
estimates store ols_with_AAR
quietly ivregress 2sls lprice llat llon orients i.time i.city1 (lavgareaperroom=AREA),r
estimates store tsls
quietly ivregress liml lprice llat llon orients i.time i.city1 (lavgareaperroom=AREA),r
estimates store liml
estimates table ols_no_AAR ols_with_AAR tsls liml,b se
estimates table ols_no_AAR ols_with_AAR tsls liml,star(0.1 0.05 0.01)
esttab ols_no_AAR ols_with_AAR tsls liml using C:\Users\lys\Desktop\竞赛Stata文件\sixiang.rtf,se r2 mtitle star(* 0.1 ** 0.05 *** 0.01)
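hausman iv ols,constant sigmamore
* estat endogenous is not available after ivregress liml, so make the robust 2SLS results active first
quietly estimates restore tsls
estat endogenous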