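This is the second practice article in my data visualization series, following the previous one, which visualized marriage and divorce rates across China's provinces.
This article mainly uses Python data visualization to analyze how housing prices in Beijing changed from 2002 to 2018.
To keep the article easy to follow, the full code is collected in the appendix at the end rather than in the body.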
Part 1: Loading the data and the required libraries
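The complete code lives in the appendix; as a quick orientation, the loading step might look roughly like the sketch below. The helper name `load_housing_data`, the default encoding, and the in-memory sample rows are assumptions made for illustration, not the author's actual code or data.

```python
import io

import pandas as pd


def load_housing_data(source, encoding="utf-8"):
    """Read the housing CSV into a DataFrame.

    `source` may be a file path or an open file-like object.
    The default encoding is an assumption; adjust it to match the real file.
    """
    return pd.read_csv(source, encoding=encoding)


# Made-up rows standing in for the real file; column names follow the article.
sample_csv = io.StringIO(
    "id,tradeTime,price,square,district\n"
    "a1,2016-08-09,31680,131.0,7\n"
    "a2,2016-07-28,43436,132.38,7\n"
    "a3,2016-12-11,52021,198.0,8\n"
)

df = load_housing_data(sample_csv)
print(df.shape)  # (number of rows, number of columns)
```

With the real dataset, the same helper would be called with the path to the CSV file instead of the in-memory buffer.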
Part 2: Verifying the data
Once the data has loaded successfully, we need to verify that it is correct.
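A minimal sketch of such a check, assuming the data is held in a pandas DataFrame named `df`: look at the first rows, the overall shape, and the inferred column types. The rows below are made up for illustration, and converting `tradeTime` to a datetime is an assumed extra step that makes grouping by year easier later.

```python
import pandas as pd

# Made-up rows standing in for the loaded data.
df = pd.DataFrame({
    "id": ["a1", "a2", "a3"],
    "tradeTime": ["2016-08-09", "2016-07-28", "2016-12-11"],
    "price": [31680, 43436, 52021],
    "square": [131.0, 132.38, 198.0],
})

print(df.head())   # first rows, to eyeball the values
print(df.shape)    # (rows, columns)
print(df.dtypes)   # inferred type of each column

# tradeTime is read as text; convert it so it can be grouped by year later.
df["tradeTime"] = pd.to_datetime(df["tradeTime"])
years = df["tradeTime"].dt.year
```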
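Part 3: Data processing
When processing the data, we usually start by checking it for missing values.
The six columns above contain missing values.
Next, we count the number of missing values in each of those columns.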
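Both steps, finding the columns that have missing values and counting them, can be sketched with `DataFrame.isnull()`. The small frame below is made up for illustration; its columns and counts do not reflect the real dataset.

```python
import numpy as np
import pandas as pd

# Made-up rows with a few gaps; not the real data.
df = pd.DataFrame({
    "price": [31680, 43436, 52021, 40000],
    "livingRoom": [2, np.nan, 3, 2],
    "elevator": [1.0, np.nan, np.nan, 0.0],
})

# Number of missing values per column.
missing_counts = df.isnull().sum()

# Keep only the columns that actually have missing values.
columns_with_missing = missing_counts[missing_counts > 0]
print(columns_with_missing)
```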
In addition, here is what each column in the data means:
url: the URL from which the data was fetched
id: the transaction ID
lng: longitude
lat: latitude
cid: community ID
tradeTime: the time of the transaction
followers: the number of people following the transaction
price: the average price per square meter
square: the area of the house, in square meters
livingRoom: the number of living rooms
drawingRoom: the number of drawing rooms
kitchen: the number of kitchens
bathroom: the number of bathrooms
constructionTime: the year of construction
floor: the height of the house. I will convert the Chinese characters to English in the next version.
buildingType: tower (1), bungalow (2), combination of plate and tower (3), plate (4)
renovationCondition: other (1), rough (2), simplicity (3), hardcover (4)
buildingStructure: unknown (1), mixed (2), brick and wood (3), brick and concrete (4), steel (5), steel-concrete composite (6)
ladderRatio: the ratio between the number of residents on the same floor and the number of elevators (ladders). It describes how many elevators a resident has on average.
elevator: has an elevator (1) or not (0)
fiveYearsProperty: whether the owner has held the property for less than 5 years
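Meaning of each code in the district column:
1: Dongcheng District
2: Fengtai District
3: Yizhuang
4: Daxing District
5: Fangshan
6: Changping District
7: Chaoyang District
8: Haidian District
9: Shijingshan
10: Xicheng District
11: Tongzhou District
12: Mentougou
13: Shunyi District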
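The code list above can be turned into readable labels for charts with a plain dictionary and `Series.map`. This is only a sketch: the names `DISTRICT_NAMES` and `districtName` are hypothetical, and the sample rows are made up.

```python
import pandas as pd

# District codes as listed in the article, with romanized names.
DISTRICT_NAMES = {
    1: "Dongcheng", 2: "Fengtai", 3: "Yizhuang", 4: "Daxing",
    5: "Fangshan", 6: "Changping", 7: "Chaoyang", 8: "Haidian",
    9: "Shijingshan", 10: "Xicheng", 11: "Tongzhou", 12: "Mentougou",
    13: "Shunyi",
}

# Made-up rows; only the district codes matter here.
df = pd.DataFrame({"district": [7, 8, 1, 7], "price": [31680, 52021, 90000, 43436]})

# Add a readable label column; codes missing from the dict would become NaN.
df["districtName"] = df["district"].map(DISTRICT_NAMES)
print(df)
```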
Check how many times each value occurs:
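One way to get such counts is `Series.value_counts()`. The sketch below assumes the column of interest is `district`, which the article does not state explicitly, and uses made-up rows.

```python
import pandas as pd

# Made-up district codes; not the real data.
df = pd.DataFrame({"district": [7, 8, 1, 7, 7, 8]})

# Count the rows per district code, most frequent first.
district_counts = df["district"].value_counts()
print(district_counts)
```

The same call works on any other categorical column, such as `buildingType` or `elevator`.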