Preface
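I once heard it said that China's economy is its real estate market, while America's economy is its stock market. China's real estate market is worth more than 400 trillion yuan, as much as the total property value of the United States, the European Union, and Japan combined, yet its stock market is worth only 50 trillion yuan, less than one tenth of that of the US, EU, and Japan. Real estate clearly holds an especially prominent place in China. For young first-time buyers like us, who find it hard to buy a home in a first-tier city, this is a real headache. That is what prompted the idea of analyzing and forecasting house prices (I tried this once before in R; this time I will use Python).
This analysis uses Beijing house prices as test data; later, a web crawler will collect data for cities including Beijing, Shanghai, Guangzhou, and Shenzhen for further analysis.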
Data
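Thanks to Qichen Qiu for providing the Lianjia (链家网) Beijing house price data for 2011-2017, and to Jonathan Bouchet for the ideas behind the approach.
This analysis is based on Python 3; the code will be tidied up and published on GitHub later.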
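As a rough idea of the first step, here is a minimal sketch of reading the transactions in Python 3 using only the standard library's csv module. The function name `read_transactions`, the `NUMERIC_COLUMNS` tuple, and the in-memory `SAMPLE` rows are hypothetical and exist only for illustration; the column names come from the feature list in this section, and the YYYY-MM-DD format for tradeTime is an assumption, not something confirmed here.

```python
import csv
import io
from datetime import datetime

# A few of the columns that the data description marks as numerical.
NUMERIC_COLUMNS = ("totalPrice", "price", "square")


def read_transactions(handle):
    """Yield one dict per transaction from an open CSV file handle."""
    for row in csv.DictReader(handle):
        for name in NUMERIC_COLUMNS:
            row[name] = float(row[name])
        # Assumption: tradeTime is stored as YYYY-MM-DD text.
        row["tradeTime"] = datetime.strptime(row["tradeTime"], "%Y-%m-%d").date()
        yield row


# Made-up rows for illustration only; they are NOT taken from the dataset.
SAMPLE = io.StringIO(
    "id,tradeTime,totalPrice,price,square\n"
    "a1,2016-08-09,415.0,31680,131.0\n"
    "a2,2017-03-01,300.0,50000,60.0\n"
)

rows = list(read_transactions(SAMPLE))
print(rows[0]["tradeTime"].year, rows[0]["totalPrice"])
```

For the real file, the same function could be called as `read_transactions(open(path, newline=""))`, where `path` is wherever the download was saved; an explicit `encoding=` argument may be needed depending on how the file was exported.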
The dataset includes the following features. Kaggle has a detailed description, so I will not go into detail here:
url: the URL from which the data was fetched (character)
id: the transaction ID (character)
Lng, Lat: longitude and latitude coordinates, using the BD09 protocol (numerical)
Cid: community ID (numerical)
tradeTime: the time of the transaction (character)
DOM: active days on market (numerical)
followers: the number of people following the transaction (numerical)
totalPrice: the total price (numerical)
price: the average price per square meter (numerical)
square: the area of the house (numerical)
livingRoom: the number of living rooms (character)
drawingRoom: the number of drawing rooms (character)
kitchen: the number of kitchens (numerical)
bathroom: the number of bathrooms (character)
floor: the height (floor level) of the house. I will convert the Chinese characters to English in the next version. (character)
buildingType: tower (1), bungalow (2), combination of plate and tower (3), plate (4) (numerical)
constructionTime: the time of construction (numerical)
renovationCondition: other (1), rough (2), simplicity (3), hardcover (4) (numerical)
buildingStructure: unknown (1), mixed (2), brick and wood (3), brick and concrete (4), steel (5), and steel-concrete composite (6) (numerical)
ladderRati