E4523 Data Visualization

Contents

1. Grouped Data

1.1 Grouping once

1.2 Grouping twice

1.3 Plotting the top 5

1.4 Processing-time statistics

2. Distribution

2.1 Plotting the statistical function

2.2 Plotting a fitted distribution

3. Mapping

3.1 Creating the GeoJSON content

3.2 Folium heat map


1. Grouped Data

%matplotlib inline
import matplotlib.pyplot as plt
# Note: Matplotlib 3.6 renamed this style to 'seaborn-v0_8'; use that name on newer versions
plt.style.use('seaborn')

1.1 Grouping once

  • The y-axis shows the size of each group
borough_group = df.groupby('Borough')
borough_group.size().plot(kind='bar')
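
To see what `groupby(...).size()` returns before it is plotted, here is a minimal sketch on a made-up DataFrame. The rows are hypothetical stand-ins for the incident data, not the real dataset.

```python
import pandas as pd

# Hypothetical sample rows; the real df is loaded elsewhere.
df = pd.DataFrame({
    "Borough": ["BRONX", "QUEENS", "BRONX", "MANHATTAN", "QUEENS", "BRONX"],
    "Agency": ["NYPD", "DOT", "DOT", "NYPD", "NYPD", "HPD"],
})

# size() counts the rows in each group and returns a Series indexed by Borough.
borough_sizes = df.groupby("Borough").size()
print(borough_sizes)
```

Calling `.plot(kind='bar')` on this Series draws one bar per index entry, which is why the y-axis ends up being the group size.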

1.2 Grouping twice

  • Unstack to get, for each agency, the group size per borough
agency_borough = df.groupby(['Agency','Borough'])
# figsize and title set the chart size and the chart title
agency_borough.size().unstack().plot(kind='bar',title="Incidents in each Agency by Borough",figsize=(12,12))
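
The effect of `unstack()` is easier to see on a tiny table. The sketch below uses hypothetical rows; only the column names `Agency` and `Borough` come from the text above.

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({
    "Agency": ["NYPD", "NYPD", "DOT", "DOT", "DOT"],
    "Borough": ["BRONX", "QUEENS", "BRONX", "BRONX", "QUEENS"],
})

# A Series with a two-level index (Agency, Borough).
counts = df.groupby(["Agency", "Borough"]).size()

# unstack() pivots the innermost index level (Borough) into columns,
# leaving one row per Agency.
table = counts.unstack()
print(table)
```

Each row of `table` becomes one cluster of bars and each column one bar color. Combinations that never occur are filled with NaN, which is why real counts often show up as floats.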

  • Time series + grouping twice

For the strftime syntax, see: Python time strftime() method | 菜鸟教程 (Runoob)

# Handle the time series
import datetime
# Keep only the year and the month
df['yyyymm'] = df['Created Date'].apply(lambda x:datetime.datetime.strftime(x,'%Y%m'))
# groupby + unstack
date_agency = df.groupby(['yyyymm','Agency'])
date_agency.size().unstack().plot(kind='bar',figsize=(12,12))
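
Common strftime format codes:

  • %y two-digit year (00-99)
  • %Y four-digit year (0001-9999)
  • %m month (01-12)
  • %d day of the month (01-31)
  • %H hour on the 24-hour clock (00-23)
  • %I hour on the 12-hour clock (01-12)
  • %M minute (00-59)
  • %S second (00-59)
  • %a locale's abbreviated weekday name
  • %A locale's full weekday name
  • %b locale's abbreviated month name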
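
A quick way to check these codes is to format one fixed timestamp. The sketch below uses only the standard library; the timestamp is arbitrary, and the locale-dependent codes (%a, %A, %b) are left out because their output varies by system.

```python
from datetime import datetime

ts = datetime(2020, 3, 5, 14, 7, 9)  # arbitrary example timestamp

year_month = ts.strftime("%Y%m")      # same pattern as the yyyymm column above
two_digit_year = ts.strftime("%y")
clock_24h = ts.strftime("%H:%M:%S")
hour_12h = ts.strftime("%I")

print(year_month, two_digit_year, clock_24h, hour_12h)
# 202003 20 14:07:09 02
```

For a datetime column, `df['Created Date'].dt.strftime('%Y%m')` should give the same strings as the `apply` call without a Python-level lambda.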

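1.3 Plotting the top 5

When grouping twice, the order of the keys matters: the variable whose values are all shown (one subplot each) goes second, and the variable from which the top 5 are picked goes first.

agency_borough = df.groupby(['Agency', 'Borough']).size().unstack()

The resulting agency_borough looks like this; the column keys are BRONX, ...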

Agency  BRONX   BROOKLYN  MANHATTAN  QUEENS  STATEN ISLAND
3-1-1   17.0    28.0      23.0       28.0    6.0
DCA     958.0   1532.0    1529.0     1547.0  194.

%matplotlib inline
import matplotlib.pyplot as plt
# Grid layout: number of columns and rows
COL_NUM = 2
ROW_NUM = 3
fig, axes = plt.subplots(ROW_NUM, COL_NUM, figsize=(12,12))

colors=['r','g','b','y','c']
# Each column of agency_borough is one borough
for i, (borough, agency_count) in enumerate(agency_borough.items()):
    # Subplot position in the 3x2 grid: i=0 -> (0,0), i=1 -> (0,1), ...
    ax = axes[i // COL_NUM, i % COL_NUM]
    agency_count = agency_count.sort_values(ascending=False)[:5] # keep the top 5
    agency_count.plot(kind='barh', ax=ax, color=colors) # horizontal bars on this subplot
    ax.set_title(borough)

plt.tight_layout()
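
The two moving parts of the loop above, picking the top 5 per column and mapping a running index to a grid cell, can be checked without drawing anything. This is a sketch on hypothetical counts; `nlargest(5)` is used as an equivalent of `sort_values(ascending=False)[:5]`.

```python
import pandas as pd

# Hypothetical Agency x Borough counts (the real table comes from groupby + unstack).
agency_borough = pd.DataFrame(
    {
        "BRONX": [17, 958, 300, 45, 80, 610, 12],
        "QUEENS": [28, 1547, 20, 700, 90, 33, 400],
    },
    index=["3-1-1", "DCA", "DEP", "DOT", "HPD", "NYPD", "TLC"],
)

# Top 5 agencies for each borough column.
top5 = {borough: counts.nlargest(5) for borough, counts in agency_borough.items()}

# Grid position for the i-th subplot in a layout with COL_NUM columns.
COL_NUM = 2
positions = [divmod(i, COL_NUM) for i in range(5)]

print(list(top5["BRONX"].index))
print(positions)
```

With five boroughs on a 3x2 grid, the sixth cell stays empty.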

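1.4 Processing-time statistics

import numpy as np
# Convert the timedelta to a float number of days
df['float_time'] =df['processing_time'].apply(lambda x:x/np.timedelta64(1, 'D'))
grouped = df[['float_time','Agency']].groupby('Agency')
# Mean processing time per agency, longest first
grouped.mean().sort_values('float_time',ascending=False)
# Histogram of all processing times
df['float_time'].hist(bins=50)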
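
The notes do not show where `processing_time` comes from. The sketch below assumes it is the difference between a closing and a creation timestamp. The `Closed Date` column and all sample values are hypothetical.

```python
import pandas as pd

# Hypothetical rows; 'Closed Date' is an assumed column name.
df = pd.DataFrame({
    "Agency": ["NYPD", "NYPD", "DOT"],
    "Created Date": pd.to_datetime(
        ["2016-09-01 00:00", "2016-09-01 12:00", "2016-09-02 00:00"]),
    "Closed Date": pd.to_datetime(
        ["2016-09-01 12:00", "2016-09-03 12:00", "2016-09-05 00:00"]),
})

# Assumption: processing time = closing time minus creation time.
df["processing_time"] = df["Closed Date"] - df["Created Date"]

# Dividing a timedelta column by one day yields float days (vectorized, no apply).
df["float_time"] = df["processing_time"] / pd.Timedelta(days=1)

# Mean processing time per agency, longest first.
mean_days = df.groupby("Agency")["float_time"].mean().sort_values(ascending=False)
print(mean_days)
```

Dividing by `pd.Timedelta(days=1)` plays the same role as dividing by `np.timedelta64(1, 'D')` in the code above.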

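2. Distribution

2.1 Plotting the statistical function

# Extract the hour
df['hour of day'] = df['Created Date'].apply(lambda x:x.hour)
# Plot the distribution of the raw values
import seaborn as sns
# Note: distplot is deprecated since seaborn 0.11; histplot/displot are its replacements
sns.distplot(df['hour of day'])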

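2.2 Plotting a fitted distribution

from scipy import stats
# Shift hours 0-3 by 24 so that the early-morning hours follow hour 23 on the axis;
# fit takes a scipy.stats distribution, e.g. stats.gamma or stats.norm
sns.distplot(df['hour of day'].apply(lambda x: x if x>3 else x+24),kde=True,fit=stats.gamma)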
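
The shift is presumably there because hours wrap around at midnight: activity that continues past 23:00 would otherwise show up at the far left of the axis, which makes a single-peaked fit look worse. A minimal sketch of the same transformation on hypothetical hour values:

```python
import pandas as pd

hours = pd.Series([0, 1, 3, 4, 12, 23])  # hypothetical hour-of-day values

# Keep hours greater than 3 as they are; move hours 0-3 to 24-27.
shifted = hours.where(hours > 3, hours + 24)
print(shifted.tolist())
# [24, 25, 27, 4, 12, 23]
```

After the shift the values run from 4 to 27, so the x-axis labels above 23 have to be read modulo 24.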

3. Mapping

3.1 Creating the GeoJSON content

  • Build the GeoJSON content
map_dict = dict()
map_dict["type"] ="FeatureCollection"
features = list()
lats = df['Latitude']
longs = df['Longitude']
agencies = df['Agency']
for index in range(100):
    lat,lon,agency = lats.iloc[index],longs.iloc[index],agencies.iloc[index]
    data_point = { "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"Agency": agency}
        }
    features.append(data_point)
map_dict['features'] = features

The result should look like this:

example = { "type" : "FeatureCollection",
           "features": [
               {"type": "Feature",
               "geometry": {"type":"Point", "coordinates": [-73.9626, 40.8075]},
                "properties": {"name":"Columbia University"}
               }]}
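
The construction above can be wrapped in a small helper. This is a sketch using only the standard library; `make_feature_collection` is a hypothetical name, not part of any library.

```python
import json

def make_feature_collection(points):
    """Build a GeoJSON FeatureCollection from (lat, lon, properties) tuples."""
    features = [
        {
            "type": "Feature",
            # GeoJSON stores coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": props,
        }
        for lat, lon, props in points
    ]
    return {"type": "FeatureCollection", "features": features}

collection = make_feature_collection(
    [(40.8075, -73.9626, {"name": "Columbia University"})]
)
geojson_text = json.dumps(collection)
print(geojson_text)
```

The coordinate order is the easiest thing to get wrong: swapping latitude and longitude places the points somewhere else on the globe.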

  • Plot
import json
import geojsonio
geojsonio.display(json.dumps(map_dict))

3.2 Folium heat map

  • Prepare the input data sizes, which looks like the table below

*** The zip code must be in str format!

import pandas as pd
zip_groups = df.groupby("Incident Zip")
sizes = pd.DataFrame(zip_groups.size())
sizes.rename(columns={0:"size"},inplace=True)
sizes.reset_index(level=0, inplace=True)
sizes['Zip'] = sizes['Incident Zip']

   Incident Zip  size    Zip
0         10000    38  10000
1         10001  5223  10001

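
A shorter way to build the same three-column table, with the string conversion made explicit, might look like this. The sample zips are hypothetical and assumed to be stored as integers.

```python
import pandas as pd

# Hypothetical rows; the zips are assumed to be integers here.
df = pd.DataFrame({"Incident Zip": [10001, 10000, 10001, 10001, 10000]})

# reset_index(name=...) turns the size Series into a two-column DataFrame.
sizes = df.groupby("Incident Zip").size().reset_index(name="size")

# The join key must be a string to match the string zip codes in the GeoJSON file.
sizes["Zip"] = sizes["Incident Zip"].astype(str)
print(sizes)
```

If the column is stored as floats, `astype(str)` would give values like "10001.0", so it would need converting to integers first.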
  • Draw the map
import folium
# Center the map at Times Square
m = folium.Map(location = [40.7589,-73.9851],zoom_start=12)
# key_on is the path inside geo_data to the key that is joined with data
# Note: newer folium versions replace Map.choropleth with folium.Choropleth(...).add_to(m)
m.choropleth(geo_data='zipcode.geojson', data=sizes,
             columns=[ 'Zip','size'],
             key_on='feature.properties.postalCode',
             fill_color='RdYlGn', fill_opacity=0.7, line_opacity=0.8,
             legend_name='Distribution of Incidents')
folium.LayerControl().add_to(m)
m # display the map
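
To illustrate what `key_on` refers to and why the zip has to be a string, the sketch below imitates the lookup in plain Python. It is not folium's actual implementation: `resolve_key` is a hypothetical helper, and the feature is a made-up stand-in for one entry of zipcode.geojson.

```python
def resolve_key(feature, key_on):
    """Follow a dotted path such as 'feature.properties.postalCode' into a feature."""
    parts = key_on.split(".")
    if parts[0] != "feature":
        raise ValueError("key_on is expected to start with 'feature'")
    value = feature
    for part in parts[1:]:
        value = value[part]
    return value

# Made-up GeoJSON feature; the zip code is stored as a string.
feature = {"type": "Feature", "properties": {"postalCode": "10001"}, "geometry": None}

# Counts keyed by the 'Zip' column, using the two sample rows shown earlier.
sizes_by_zip = {"10000": 38, "10001": 5223}

key = resolve_key(feature, "feature.properties.postalCode")
matched = sizes_by_zip.get(key)       # string key finds the count
unmatched = sizes_by_zip.get(10001)   # integer key finds nothing
print(key, matched, unmatched)
```

A type mismatch between the data column and the GeoJSON property leaves the region without a value, which is the likely reason for the str warning above.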

4. Other

  • scatter

%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn')

plt.scatter(df['duration'],df['trip_distance'])
