Analysis and improvement of the chart “comparison of average total price of housing resources in Hangzhou city”
1. Background and why we chose the topic.
1.1 Background.
House prices are currently very high across China, especially in first-tier cities, and many young people cannot afford a home on their own. As a result, they need to plan early which city to settle in, so that they can choose one that matches their means. In practice, they often do not know where to start, because they lack house price information for the whole country. Surveys of housing prices across the country are therefore a valuable reference for young people.
1.2 Why we chose the topic.
Our team found several house price research reports on the Internet. One of them surveys house prices in Hangzhou, which has recently joined China’s first-tier cities. Hangzhou is a popular city with a highly developed information industry, and it is also known as “the city of the Internet”. It attracts the attention of many young people, so a survey of housing prices in Hangzhou matters to them.
The report contains a chart titled “the average total price comparison of housing resources in Hangzhou city”. It shows what young people are eager to know: the average house price in each district of Hangzhou. The chart conveys a lot of useful information, but it also carries visual elements that add nothing, which makes the key information harder to pick out. It therefore has considerable room for improvement. Because the chart matters to young people, we analyze it and try to improve it.
The original chart:
(The horizontal axis represents the urban district and the vertical axis represents the average total price of housing, in units of 10,000 yuan. The title of the chart is “comparison of average total prices of housing resources in different urban areas of Hangzhou”.)
2. Analyze the original chart.
2.1 Introduce the contents of chart 1.
The chart “comparison of average total price of housing resources in different urban areas of Hangzhou” is a bar chart. It contains the title; the coordinate axes; the axis labels “urban area” and “total price / 10000 yuan”; a price scale in steps of 100 (that is, 1 million yuan per step); and one bar per urban area, filled in shades of blue. The districts are arranged from left to right in order of decreasing average price.
From this chart we can rank the districts by average house price and read off an approximate average for each. For example, the average in West Lake District is the highest, estimated at about 4.8 million yuan per home. Moving from left to right, Binjiang District, Shangcheng District and Gongshu District come next, and so on. Fuyang District and Qiantang New District are among the lowest in Hangzhou, with an average of about 2.5 million yuan per home.
2.2 Visual variables in the chart.
The visual variables in the chart are: the style, size, direction, sorting, spacing and fill color of the bars; the size, font, direction and spacing of the text; and the hue, brightness and chroma of the colors. In general, the visual variables are not complex. We can modify them and add new ones where appropriate.
2.3 Analyze the expression effect of the original chart.
The chart is relatively neat, with few variables and no crowded legend, so readers can get the important information with little effort. For example, a reader can take in the ranking of the districts at a glance from the heights of the bars. The vertical axis, with its simple scale in steps of 100, gives the approximate price range for each district, and the reader then links each district to its price. Because the amount of information is small and easy to obtain, it can pass easily from short-term memory to long-term memory. Overall the chart communicates well: readers easily obtain the main information and are likely to remember it for a long time.
This chart can meet young people’s demand for housing price data, but that does not mean it is perfect. It has problems and room for improvement.
3. Try to reproduce the original chart
3.1 Get data.
We obtained 30772 records from the research report. Here is part of the data:
(The data table has 14 columns: property rights (产权); attention (关注); district (区域); unit price (单价); community (小区); year built and building type (年限); total price / 10000 yuan (总价/万元); house type (户型); house code (房屋编码); listing time (挂牌时间); orientation (朝向); floor (楼层); decoration (装修情况); floor area (面积).)
A part of the original data table:
产权 | 关注 | 区域 | 单价 | 小区 | 年限 | 总价/万元 | 户型 | 房屋编码 | 挂牌时间 | 朝向 | 楼层 | 装修情况 | 面积 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
70年 | 0 | 余杭临平 | 21015元/平米 | 众安理想湾 | 2015年建/板楼 | 210 | 3室2厅 | 103105013026 | 2019-06-12 | 南 北 | 低楼层/共33层 | 平层/精装 | 99.93平米 |
70年 | 4 | 余杭临平 | 28416元/平米 | 众安理想湾 | 2016年建/板塔结合 | 780 | 6室2厅 | 103104324906 | 2019-04-04 | 南 | 联排/共3层 | 毛坯 | 274.5平米 |
70年 | 2 | 余杭临平 | 17323元/平米 | 众安理想湾 | 2015年建/板楼 | 220 | 3室2厅 | 103102855120 | 2018-09-07 | 南 | 高楼层/共33层 | 精装 | 127平米 |
… | … | … | … | … | … | … | … | … | … | … | … | … | … |
70年 | 0 | 余杭瓶窑 | 19613元/平米 | 北湖绿洲花园 | 2013年建/板楼 | 560 | 5室2厅 | 103104419897 | 2019-04-14 | 南 | 联排/共3层 | 毛坯 | 285.53平米 |
70年 | 0 | 余杭瓶窑 | 22314元/平米 | 北湖绿洲花园 | 2013年建/板楼 | 600 | 5室3厅 | 103104663598 | 2019-05-08 | 南 | 共3层 | 毛坯 | 268.9平米 |
70年 | 2 | 余杭瓶窑 | 14946元/平米 | 北湖绿洲花园 | 未知年建/板楼 | 275 | 4室2厅 | 103103212337 | 2018-10-25 | 南 北 | 高楼层/共11层 | 毛坯 | 184平米 |
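The unit price and floor area columns are stored as text with units attached, while the total price column is a plain number in units of 10,000 yuan. As a quick sanity check on how the columns relate, the sketch below parses three rows copied from the sample table and compares the listed total price with unit price × floor area. The helper names `leading_number` and `implied_total_wan` are our own and are not part of the report's code.

```python
import re

# Three listings copied from the sample table above:
# (unit price, total price in 10,000 yuan, floor area).
rows = [
    ("21015元/平米", 210, "99.93平米"),
    ("28416元/平米", 780, "274.5平米"),
    ("17323元/平米", 220, "127平米"),
]


def leading_number(text: str) -> float:
    """Return the number at the start of a string such as '99.93平米'."""
    match = re.match(r"\d+(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no leading number in {text!r}")
    return float(match.group())


def implied_total_wan(unit_price: str, area: str) -> float:
    """Total price (in 10,000 yuan) implied by unit price x floor area."""
    return leading_number(unit_price) * leading_number(area) / 10_000


for unit_price, total_wan, area in rows:
    # Listed total next to the total implied by the other two columns.
    print(total_wan, round(implied_total_wan(unit_price, area), 1))
```

For these three rows the two figures agree to within rounding, which confirms that the chart's vertical axis (total price) is a different quantity from the unit price column.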
3.2 Write code.
Python code for drawing the original chart:
import warnings

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

#Use a font that contains Chinese glyphs and keep the minus sign readable
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False
warnings.filterwarnings('ignore')

#Categorizing data by region
def location(x):
    if "临安" in x: return "临安市"
    elif "上城" in x: return "上城区"
    elif "下城" in x: return "下城区"
    elif "江干" in x: return "江干区"
    elif "拱墅" in x: return "拱墅区"
    elif "西湖" in x: return "西湖区"
    elif "滨江" in x: return "滨江区"
    elif "萧山" in x: return "萧山区"
    elif "余杭" in x: return "余杭区"
    elif "富阳" in x: return "富阳区"
    elif "钱塘" in x: return "钱塘新区"
    else: return "其他"

#Read data
data = pd.read_csv("C:\\Users\\14110\\Desktop\\house.csv")
#Drop every row that has a missing value in any column
data.dropna(how="any", inplace=True)
#Call the region classification function
data["地理位置"] = data["区域"].apply(location)
#Calculate the average total price per region, sorted from high to low
sum_area = data.groupby("地理位置")["总价/万元"].mean().sort_values(ascending=False).reset_index()
#draw (x and y are passed as keywords; recent seaborn versions reject positional data arguments)
plt.figure(figsize=(8, 6))
ax = sns.barplot(x=sum_area["地理位置"], y=sum_area["总价/万元"], palette=sns.color_palette('Blues_r'))
ax.set_title("杭州市各城区房源平均总价对比")
ax.set_xlabel("城区")
ax.set_ylabel("总价/万元")
plt.show()
#English translation of the plotting code above
#(for reading only; the data itself uses the Chinese column names):
'''
plt.figure(figsize=(8, 6))
ax = sns.barplot(x=sum_area["geographical position"], y=sum_area["Total price/10000 yuan"], palette=sns.color_palette('Blues_r'))
ax.set_title("Comparison of average total price of housing in Hangzhou")
ax.set_xlabel("region")
ax.set_ylabel("Total price/10000 yuan")
plt.show()
'''
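The classification and averaging steps can be tried without the full CSV file. The following is a minimal sketch, not the report's code: it rewrites the region lookup as a dictionary (the names `DISTRICT_KEYWORDS` and `classify` are ours) and runs it on the six Yuhang rows from the sample table plus two West Lake rows whose values are invented purely for illustration.

```python
import pandas as pd

# Keyword found in the raw region string -> district label on the chart.
# The order matches the if/elif chain in the report's location() function.
DISTRICT_KEYWORDS = {
    "临安": "临安市",
    "上城": "上城区",
    "下城": "下城区",
    "江干": "江干区",
    "拱墅": "拱墅区",
    "西湖": "西湖区",
    "滨江": "滨江区",
    "萧山": "萧山区",
    "余杭": "余杭区",
    "富阳": "富阳区",
    "钱塘": "钱塘新区",
}


def classify(region: str) -> str:
    """Return the district whose keyword appears in the region string."""
    for keyword, district in DISTRICT_KEYWORDS.items():
        if keyword in region:
            return district
    return "其他"  # catch-all for regions that match no keyword


sample = pd.DataFrame({
    # The first six rows come from the sample table; the last two rows
    # are hypothetical and exist only to give the sketch a second district.
    "区域": ["余杭临平"] * 3 + ["余杭瓶窑"] * 3 + ["西湖文新"] * 2,
    "总价/万元": [210, 780, 220, 560, 600, 275, 450, 510],
})
sample["地理位置"] = sample["区域"].apply(classify)

# Mean total price per district, highest first (same chain as in the report).
avg = (
    sample.groupby("地理位置")["总价/万元"]
    .mean()
    .sort_values(ascending=False)
    .reset_index()
)
print(avg)
```

With so few rows the averages say nothing about Hangzhou; the point is only to show that every "余杭…" sub-region collapses into one district before the mean is taken.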
3.3 Code effect.
(The horizontal axis represents the urban district and the vertical axis represents the average total price of housing, in units of 10,000 yuan. The title of the chart is “comparison of average total prices of housing resources in different urban areas of Hangzhou”.)
4. Aspects of the original chart that need to be improved.
- Color. The original chart fills the bars with a blue gradient. We do not think this is appropriate, because a gradient makes users ask “what does the gradient mean?” and they can easily miss the key information in the chart. Moreover, the gradient runs through two rounds, so each shade appears on two bars. This is unnecessary and distracting: users will wonder, “each color corresponds to two bars, so what is the relationship between them?” In fact, the paired bars have no relationship apart from the visual contrast. As an improvement, we use a single fill color for all bars. For a more comfortable reading experience, we chose a soft light red for the bars and a light yellow background for the chart.
- Data labels. The chart has no grid lines, so it is difficult for users to read the value represented by each bar. Data labels should therefore be added to each bar. They show each value directly and save users effort, leaving them free to focus on other key information in the chart.
- Remove the top and right axis borders of the chart. This breaks away from the traditional black-box look and makes the chart more attractive. A cleaner chart costs users less effort and helps them concentrate on the information in it.
- Add a hover box. We add a hover box (tooltip) to each bar. When the user places the mouse cursor on a bar, a floating box pops up next to it, showing the district the bar represents and its average house price. This presentation is relatively novel and can arouse users’ interest in the chart. It adds no extra content to the static display, so it does not interfere with readers’ access to key information.
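The first three improvements (single fill color, data labels, no top or right border) do not depend on any particular plotting library. As an illustration only, here is a minimal matplotlib sketch of them; the district names and prices are placeholders rather than results from the report, the colors `lightcoral` and `lightyellow` are our own pick for "light red" and "light yellow", and `Axes.bar_label` assumes matplotlib 3.4 or newer.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; remove this line for interactive use
import matplotlib.pyplot as plt

# Placeholder data, NOT the report's results.
districts = ["District A", "District B", "District C", "District D"]
avg_total_wan = [480.0, 430.5, 390.2, 250.0]

fig, ax = plt.subplots(figsize=(8, 6))

# Light yellow background for both the figure and the plotting area.
fig.patch.set_facecolor("lightyellow")
ax.set_facecolor("lightyellow")

# Improvement 1: one fill color for every bar instead of a gradient.
bars = ax.bar(districts, avg_total_wan, color="lightcoral")

# Improvement 2: a data label above each bar, one decimal place.
labels = ax.bar_label(bars, fmt="%.1f")

# Improvement 3: hide the top and right borders of the axes.
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)

ax.set_xlabel("District")
ax.set_ylabel("Total price / 10000 yuan")
ax.set_title("Average total price by district (placeholder data)")

fig.savefig("improved_chart_sketch.png")
```

The hover box is not covered here because a saved matplotlib image is static; that improvement needs an interactive renderer.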
5. Try to implement the improvements
5.1 Building on the code above, write the following code.
from pyecharts import options as opts
from pyecharts.charts import Bar
#Keep one decimal place for the house price data,
#which is convenient for the data labels.
sum_area['总价/万元'] = sum_area['总价/万元'].round(1)
#draw
bar2 = Bar(init_opts=opts.InitOpts(theme='vintage', width='650px', height='400px'))
bar2.add_xaxis(sum_area["地理位置"].to_list())
bar2.add_yaxis("总价/万元", sum_area["总价/万元"].to_list())
#Show the data label on every bar
bar2.set_series_opts(label_opts=opts.LabelOpts(is_show=True))
bar2.set_global_opts(
    title_opts=opts.TitleOpts(title="杭州市各城区房源平均总价对比"),
    yaxis_opts=opts.AxisOpts(name='总价/万元'),
    #interval 0 shows every district name; rotate 45 keeps the names from overlapping
    xaxis_opts=opts.AxisOpts(name='地理位置', axislabel_opts={"interval": "0", "rotate": 45}),
)
bar2.render_notebook()
#English translation of the plotting code above
#(for reading only; the data itself uses the Chinese column names):
'''
bar2 = Bar(init_opts=opts.InitOpts(theme='vintage', width='650px', height='400px'))
bar2.add_xaxis(sum_area["geographical position"].to_list())
bar2.add_yaxis("Total price/10000 yuan", sum_area["Total price/10000 yuan"].to_list())
bar2.set_series_opts(label_opts=opts.LabelOpts(is_show=True))
bar2.set_global_opts(
    title_opts=opts.TitleOpts(title="Comparison of average total price of housing in Hangzhou"),
    yaxis_opts=opts.AxisOpts(name='Total price/10000 yuan'),
    xaxis_opts=opts.AxisOpts(name='geographical position', axislabel_opts={"interval": "0", "rotate": 45}),
)
bar2.render_notebook()
'''
5.2 Code effect.
(The horizontal axis represents the urban district and the vertical axis represents the average total price of housing, in units of 10,000 yuan. The title of the chart is “comparison of average total prices of housing resources in different urban areas of Hangzhou”.)
The chart has a special feature: a hover box pops up and shows information when the mouse cursor points to one of the bars. Websites and other apps can use this feature, but what is shown here is only a static picture, so the feature cannot be used here.
The hover box looks like this:
6. Conclusion.
Drawing on visualization theory and combining it with practice, we improved the expressive effect of the original chart and added some small features. Readers can now get more accurate information from the chart more easily. We think that if the author of the original report used our improved chart, it would be more helpful to young people who are eager to learn the results of housing price research.
Special thanks to the research report below for providing the original chart, data and part of the code: