看完了《地图时空大数据爬取》第五章,想着实践一下,于是我用义乌市的shp文件做了一下case实践了一下,没想到状况连出,我主要遇到了下述几个问题:
- arcpy无法导入,我使用的是arcmap10.7,按照网上的各种教程,比如在路径文件里加入arcpy的路径之类的,都不行,最后我在spyder里的搜寻路径中添加了arcpy文件的路径,倒确实能够成功把这个包import了,这说明spyder里添加搜索路径确实还是有用的,无法import的同志们可以试试,但是import过程中显示导入的代码又出错了,最后查来查去发现问题应该是由于安装的arcpy支持的是python2,但是我使用的是python3了,而且网上的方法都说是要删除重新装之类的,过于麻烦,所以索性不管那个arcpy了,直接用geopandas重写一遍代码实现功能好了。
- 第二个问题就有点致命了,这个api已经不再免费提供了,这就十分难受了,于是想着用百度地图api吧,但是百度api的返回信息极少,也不会返回道路的polyline信息,如果道路不堵的话也不会返回速度信息,但是至少能用,这里就用百度api做一个例子实践一下好了。
- 然后还遇到了一个小问题,也就是geopandas的CRS的问题,找到了一篇很好的博客,于是迎刃而解啦,这篇文章里有这个博客的链接。
首先是用arcgis做了栅格化,其实所有的都可以用geopandas实现,根本不需要用arcgis,只是当做练习一下而已了。然后后续都使用jupyter notebook实现了,代码如下:
'''
已经用arcmap得到栅格化后的义乌市范围,现在要得到每个栅格的左下角坐标,右上角坐标,以及质心的坐标。
当然其实全部使用python也可以,包括栅格化,以及坐标获取,而且用python来做还更好一点,但是反正是练习,顺便练习下arcgis了。
'''
import geopandas as gpd
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# 读取shp文件
gdf = gpd.GeoDataFrame.from_file('.\\fishnetPro\\fishnet_select.shp')
gdf.plot()
gdf.head(5)
Id | BLX | BLY | URX | URY | geometry | |
---|---|---|---|---|---|---|
0 | 0 | None | None | None | None | POLYGON ((119.95605 29.03706, 119.95605 29.047... |
1 | 0 | None | None | None | None | POLYGON ((119.96388 29.03706, 119.96388 29.047... |
2 | 0 | None | None | None | None | POLYGON ((119.97172 29.03706, 119.97172 29.047... |
3 | 0 | None | None | None | None | POLYGON ((119.97955 29.03706, 119.97955 29.047... |
4 | 0 | None | None | None | None | POLYGON ((119.88554 29.04735, 119.88554 29.057... |
# 将每个栅格的左下角顶点坐标,右上角顶点坐标赋值给fields
for i in range(len(gdf)):
gdf.iloc[i,1],gdf.iloc[i,2],gdf.iloc[i,3],gdf.iloc[i,4] = gdf.iloc[i,5].bounds
gdf.head(5)
Id | BLX | BLY | URX | URY | geometry | |
---|---|---|---|---|---|---|
0 | 0 | 119.956 | 29.0371 | 119.964 | 29.0474 | POLYGON ((119.95605 29.03706, 119.95605 29.047... |
1 | 0 | 119.964 | 29.0371 | 119.972 | 29.0474 | POLYGON ((119.96388 29.03706, 119.96388 29.047... |
2 | 0 | 119.972 | 29.0371 | 119.98 | 29.0474 | POLYGON ((119.97172 29.03706, 119.97172 29.047... |
3 | 0 | 119.98 | 29.0371 | 119.987 | 29.0474 | POLYGON ((119.97955 29.03706, 119.97955 29.047... |
4 | 0 | 119.886 | 29.0474 | 119.893 | 29.0576 | POLYGON ((119.88554 29.04735, 119.88554 29.057... |
# 计算得到每个栅格质心点的坐标,并保存在新建的字段里面
gdf['centerX'] = gdf.centroid.bounds.iloc[:,[0]]
gdf['centerY'] = gdf.centroid.bounds.iloc[:,[1]]
gdf.head(5)
E:\Anaconda\lib\site-packages\ipykernel_launcher.py:2: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
E:\Anaconda\lib\site-packages\ipykernel_launcher.py:3: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
This is separate from the ipykernel package so we can avoid doing imports until
Id | BLX | BLY | URX | URY | geometry | centerX | centerY | |
---|---|---|---|---|---|---|---|---|
0 | 0 | 119.956 | 29.0371 | 119.964 | 29.0474 | POLYGON ((119.95605 29.03706, 119.95605 29.047... | 119.959968 | 29.042208 |
1 | 0 | 119.964 | 29.0371 | 119.972 | 29.0474 | POLYGON ((119.96388 29.03706, 119.96388 29.047... | 119.967802 | 29.042208 |
2 | 0 | 119.972 | 29.0371 | 119.98 | 29.0474 | POLYGON ((119.97172 29.03706, 119.97172 29.047... | 119.975636 | 29.042208 |
3 | 0 | 119.98 | 29.0371 | 119.987 | 29.0474 | POLYGON ((119.97955 29.03706, 119.97955 29.047... | 119.983470 | 29.042208 |
4 | 0 | 119.886 | 29.0474 | 119.893 | 29.0576 | POLYGON ((119.88554 29.04735, 119.88554 29.057... | 119.889460 | 29.052496 |
import basics
import json
import os
from urllib import request
import imp
imp.reload(basics)
'''
定义采集所有区域道路交通态势数据的函数,这也是实现本notebook功能的核心函数
'''
def getRoad(ak,bottemLeft,upRight,centerPoint):
url="http://api.map.baidu.com/traffic/v1/bound?ak="+ak+"&bounds="+bottemLeft+";"+upRight+\
"&coord_type_input=wgs84&coord_type_output=gcj02"
print(url)
json_obj=request.urlopen(url)
mydata=json.load(json_obj)
if mydata['status']== 0:
try:
centerPoint.status = mydata['evaluation']['status']
centerPoint.description = mydata['description']
except Exception as e:
centerPoint.status = 1
centerPoint.description = "畅通"
else:
print(mydata['message'])
centerPoint.status=0
centerPoint.description = "Unknown"
return centerPoint
import imp
imp.reload(basics)
# 测试一下上面cell写的函数是否正确,能否抓取出每个栅格的整体路况
ak=自己去申请一个ak
bottemLeft=str(gdf.iloc[600,2])+','+str(gdf.iloc[600,1]) ;upRight=str(gdf.iloc[600,4])+','+str(gdf.iloc[600,3])
centerPoint=basics.PointWithAttr(0, gdf.iloc[600,6], gdf.iloc[600,7],None,None)
res = getRoad(ak,bottemLeft,upRight,centerPoint)
res.description
# 抓取整个义乌各个栅格的路况
ak=自己去申请一个ak
gdf['status'] = None
for i in range(len(gdf)):
bottemLeft=str(gdf.iloc[i,2])+','+str(gdf.iloc[i,1]) ;upRight=str(gdf.iloc[i,4])+','+str(gdf.iloc[i,3])
centerPoint=basics.PointWithAttr(0, gdf.iloc[i,6], gdf.iloc[i,7],None,None)
tmp = getRoad(ak,bottemLeft,upRight,centerPoint)
gdf.iloc[i,-1] = tmp.status
gdf.head(5)
0和1我个人认为都是畅通,未知道路的话一般肯定是车很少的路,上图因为是晚上9点爬的数据,所以基本没有拥堵和严重拥堵。
参考文献
百度地图api
高德地图api
地图时空大数据爬取