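Building an Interactive Map of Edinburgh's Beer Gardens with Python and Open Data!

With summer finally here, I wanted to know where in Edinburgh I could enjoy a nice cold drink outside. So I combined an open data set on chair and table permits with some geocoding and created an interactive map of outdoor seating in Edinburgh.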

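Background and Project Description

Over the past few years, the UK government has been working on opening up its data, and the City of Edinburgh Council is no exception. At https://edinburghopendata.info you can find a list of data sets containing information on many aspects of public life (even though some of the files could certainly do with an update). The latest version of the chair and table permit file can be found at http://www.edinburgh.gov.uk/download/downloads/id/11854/tables_and_chairs_permits.csv. Note that although the older and newer versions of the file are structurally identical, their headers differ, so if you want to look at historical data you need to adapt the code below accordingly. The file contains the names and addresses of premises that hold a permit to put out chairs, plus some additional information. It forms the basis of this project, which is divided into four parts: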

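Get and load the permit file
Use the OpenStreetMap API to get the latitude and longitude of each establishment as well as its premise type
Clean and classify the premise types
Plot the premises on a map using folium

The complete notebook can be found on GitHub.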

https://github.com/walkenho/tales-of-1001-data/blob/master/beergarden_happiness_with_python/beergarden_happiness_with_python.ipynb

Step 0: Setup

First, import the libraries.

import pandas as pd
import requests
import wget

import folium
from folium.plugins import MarkerCluster

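Step 1: Get the Data

Use wget to download the file and read it into a pandas DataFrame. Make sure to set the encoding, since the file contains special characters.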

filename = wget.download("http://www.edinburgh.gov.uk/download/downloads/id/11854/tables_and_chairs_permits.csv")
df0 = pd.read_csv(filename, encoding="ISO-8859-1")
df0.head()
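To see why the encoding matters, here is a minimal sketch using only the standard library. The premise name is made up for illustration and is not taken from the file; the sketch only assumes, as stated above, that the file is encoded as ISO-8859-1 and contains accented characters. pandas assumes UTF-8 unless told otherwise, so the same kind of failure would be expected without the encoding argument.

```python
# Hypothetical premise name containing a special character (not taken from the file).
name = "Café Example"

# Bytes as they would appear in an ISO-8859-1 encoded file.
raw = name.encode("iso-8859-1")

# Decoding with the matching encoding recovers the original text.
decoded = raw.decode("iso-8859-1")

# Decoding the same bytes as UTF-8 fails: 0xE9 starts a multi-byte
# sequence in UTF-8 and must not be followed by a plain ASCII character.
try:
    raw.decode("utf-8")
    utf8_works = True
except UnicodeDecodeError:
    utf8_works = False

print(decoded, utf8_works)
```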

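A quick look at the data reveals some duplicates. These are mainly due to multiple permits with different start and end dates. A clean way to handle this would be to filter on the dates, but frankly I do not care that much right now, so I just keep the premises name and address and drop the duplicates. (Note: the file also contains information on the table area, which I might revisit in the future.) After dropping the duplicates, 389 rows remain, each with a premises name and address.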

# dropping duplicate entries
df1 = df0.loc[:, ['Premises Name', 'Premises Address']]
df1 = df1.drop_duplicates()

# in 2012: 280
print(df1.shape[0])

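As an aside: in summer 2014, only 280 premises had a chairs and tables permit. Open-air culture has really taken off, and here is the data to prove it :)

Step 2: Get the Latitude and Longitude of Each Premise

To visualize the premises on a map, addresses are not enough; GPS coordinates are needed. There are different APIs that accept an address and return its latitude and longitude (a process called geocoding). One option would be the Google Maps API, but it comes with caveats. The OpenStreetMap API provides the same functionality and is free to use.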

https://developers.google.com/maps/documentation/

https://www.programmableweb.com/api/openstreetmap

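Use the pandas map function to get the API response for each row. After querying the API, I dropped all rows for which no response came back. I did not worry much about the few premises lost this way (about 20); plenty remain.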

def query_address(address):
    """Return response from open streetmap.

    Parameter:
    address - address of establishment

    Returns:
    result - json, response from open street map
    """
    url = "https://nominatim.openstreetmap.org/search"
    parameters = {'q': '{}, Edinburgh'.format(address), 'format': 'json'}

    response = requests.get(url, params=parameters)
    # don't want to raise an error to not stop the processing
    # print address instead for future inspection
    if response.status_code != 200:
        print("Error querying {}".format(address))
        result = {}
    else:
        result = response.json()
    return result


df1['json'] = df1['Premises Address'].map(lambda x: query_address(x))

# drop empty responses
df2 = df1[df1['json'].map(lambda d: len(d)) > 0].copy()
print(df2.shape[0])
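The public Nominatim service asks clients to identify themselves with a User-Agent header and to keep to roughly one request per second. The following is a sketch of how the querying step could respect that; it is not part of the original notebook. The names HEADERS, fetch_from_nominatim and query_unique, as well as the contact string, are assumptions made for illustration.

```python
import time

NOMINATIM_URL = "https://nominatim.openstreetmap.org/search"
# Assumption: replace with details that identify your own project.
HEADERS = {"User-Agent": "beergarden-map-example (contact: you@example.com)"}


def fetch_from_nominatim(address):
    """Query Nominatim for one address; returns a list of hits (possibly empty)."""
    import requests  # imported here so the rest of the sketch runs without it

    params = {"q": "{}, Edinburgh".format(address), "format": "json"}
    response = requests.get(NOMINATIM_URL, params=params,
                            headers=HEADERS, timeout=10)
    return response.json() if response.status_code == 200 else []


def query_unique(addresses, fetch=fetch_from_nominatim, pause=1.0):
    """Call fetch once per distinct address, pausing between calls."""
    results = {}
    for address in addresses:
        if address in results:
            continue  # already looked up, reuse the stored response
        if results:
            time.sleep(pause)  # wait before every call except the first
        results[address] = fetch(address)
    return results
```

The function returns a dictionary keyed by address, which could then be passed to the pandas map function in place of the lambda. Passing the fetch function as a parameter keeps the throttling logic testable without network access.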

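Looking at the json field of the response, I found that in addition to the coordinates, the API also returns a field called "type", which contains the premise type for the address. I add this information to the data frame together with the coordinates.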

# extract relevant fields from API response (json format)
df2['lat'] = df2['json'].map(lambda x: x[0]['lat'])
df2['lon'] = df2['json'].map(lambda x: x[0]['lon'])
df2['type'] = df2['json'].map(lambda x: x[0]['type'])
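Nominatim's JSON output delivers lat and lon as strings, which is why the plotting step later converts them with float. As a small illustration of the extraction, here is a sketch on a made-up response; the helper name first_hit and the sample values are assumptions, not data from the project.

```python
def first_hit(response):
    """Return lat, lon (as floats) and type of the top hit, or None if empty."""
    if not response:
        return None
    top = response[0]
    return {
        "lat": float(top["lat"]),
        "lon": float(top["lon"]),
        "type": top.get("type"),
    }


# Made-up response in the shape of a Nominatim JSON result (illustrative values).
sample = [{"lat": "55.9500", "lon": "-3.1900", "type": "cafe"}]

hit = first_hit(sample)
empty = first_hit([])
print(hit, empty)
```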

The most common premise types are cafe, pub, restaurant, tertiary and house:

df2.type.value_counts()[:5]
cafe          84
pub           69
restaurant    66
tertiary      33
house         27
Name: type, dtype: int64

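Step 3: Assign Premise Categories

I am most interested in distinguishing two types of premises: those that sell coffee and are more likely to be open during the day (like coffee shops and bakeries), and those that sell beer and are more likely to be open in the evening (like pubs and restaurants). I therefore want to sort the premises into three categories:

Category 1: daytime places (coffee shops, bakeries, delis, ice cream shops)
Category 2: pubs, restaurants, fast food places and bars
Category 3: everything else

To do so, I have two sources of information: the premise name and the type returned by OpenStreetMap. Looking at the data, the type turns out to be a good first indicator, but many places are labelled incorrectly or not at all. I therefore take a two-step approach: i) assign the category according to the OpenStreetMap type, and ii) clean the data using the premise name, where step ii) overrides step i). To clean the data, I decided to override the OpenStreetMap classification whenever the premise name contains certain key elements (e.g. 'cafe', 'coffee' or similar for coffee shops, and 'restaurant', 'inn' or similar for restaurants and pubs). This misclassifies, for example, Cafe Andaluz as a coffee shop, but works reasonably well in most cases. In particular, it seems to mostly follow the pattern that places classified as coffee shops are likely to be open during the day, so it serves the purpose. Of course, with fewer than 400 entries, one could go through the list by hand and assign the correct category to each entry. However, I am interested in creating a process that can easily be transferred to other places, so manual intervention tailored to the Edinburgh scene would not be suitable.

Step 3a: Assign premise categories according to OpenStreetMap type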

def define_category(mytype):
    if mytype in ['cafe', 'bakery', 'deli', 'ice_cream']:
        category = 1
    elif mytype in ['restaurant', 'pub', 'bar', 'fast_food']:
        category = 2
    else:
        category = 3
    return category

# assign category according to OpenStreetMap type
df2['category'] = df2['type'].map(lambda mytype: define_category(mytype))

Step 3b: Override the category according to premise name

def flag_premise(premisename, category):
    """Flag premise according to its name.

    Parameter:
    premisename - str

    Returns:
    ans - boolean
    """
    prem = str(premisename).lower()
    if ((category == 'coffeeshop' and ('caf' in prem
                                        or 'coffee' in prem
                                        or 'Tea' in str(premisename)
                                        or 'bake' in prem
                                        or 'bagel' in prem
                                        or 'roast' in prem))
            or
        (category == 'restaurant' and ('restaurant' in prem
                                       or 'bar ' in prem
                                       or 'tavern' in prem
                                       or 'cask' in prem
                                       or 'pizza' in prem
                                       or 'whisky' in prem
                                       or 'kitchen' in prem
                                       or 'Arms' in str(premisename)
                                       or 'Inn' in str(premisename)
                                       or 'Bar' in str(premisename)))):
        ans = True
    else:
        ans = False
    return ans


# flag coffee shops and restaurants according to their names
df2['is_coffeeshop'] = df2['Premises Name'].map(lambda x: flag_premise(x, category='coffeeshop'))
df2['is_restaurant'] = df2['Premises Name'].map(lambda x: flag_premise(x, category='restaurant'))

A quick check shows that the reassignment looks reasonable:

# show some differences between classification by name and by type returned by the API
df2.loc[(df2.is_coffeeshop) & (df2.type != 'cafe'), ['Premises Name', 'type']].head(10)

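Reassign the category for premises flagged as restaurant or coffee shop. If a premise is flagged as both, the coffee shop category takes precedence, because it is assigned last: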

# re-set category if flagged as restaurant or coffeeshop through name
df2.loc[df2.is_restaurant, 'category'] = 2
df2.loc[df2.is_coffeeshop, 'category'] = 1
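To make the order of precedence concrete, here is a simplified, pandas-free sketch of the two-step assignment. It uses only a subset of the keywords from the function above, and apart from Cafe Andaluz the premise names and types are invented for illustration. The names category_from_type, final_category, COFFEE_WORDS and RESTAURANT_WORDS are assumptions introduced for this sketch.

```python
def category_from_type(osm_type):
    """Step i): category based on the OpenStreetMap type alone."""
    if osm_type in ("cafe", "bakery", "deli", "ice_cream"):
        return 1
    if osm_type in ("restaurant", "pub", "bar", "fast_food"):
        return 2
    return 3


# Reduced keyword lists, for illustration only.
COFFEE_WORDS = ("caf", "coffee", "bake")
RESTAURANT_WORDS = ("restaurant", "tavern", "pizza")


def final_category(name, osm_type):
    """Step ii): name-based flags override the type-based category."""
    category = category_from_type(osm_type)
    lowered = name.lower()
    if any(word in lowered for word in RESTAURANT_WORDS):
        category = 2
    # Applied last, so the coffee shop flag wins if both match.
    if any(word in lowered for word in COFFEE_WORDS):
        category = 1
    return category


examples = [
    ("Cafe Andaluz", "restaurant"),   # name overrides type: ends up as 1
    ("Pizza Bakehouse", "house"),     # flagged as both: coffee shop wins
    ("Old Tavern", "tertiary"),       # name rescues an unhelpful type
    ("Corner Shop", "house"),         # nothing matches: stays "other"
]
for name, osm_type in examples:
    print(name, final_category(name, osm_type))
```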

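Step 4: Visualization

Finally, use Python's folium package to visualize the results as markers on a map. Adding the individual points to MarkerClusters allows the symbols to be summarized into groups when there are too many of them in the same area. Creating a separate cluster for each category makes it possible to toggle each category individually using the LayerControl option. The 'fa' prefix selects font-awesome symbols (instead of the standard glyphicon ones).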

# central coordinates of Edinburgh
EDI_COORDINATES = (55.953251, -3.188267)

# create empty map zoomed in on Edinburgh
map = folium.Map(location=EDI_COORDINATES, zoom_start=12)

# add one markercluster per type to allow for individual toggling
coffeeshops = MarkerCluster(name='coffee shops').add_to(map)
restaurants = MarkerCluster(name='pubs and restaurants').add_to(map)
other = MarkerCluster(name='other').add_to(map)

# add coffeeshops to the map
for chairs in df2[df2.category == 1].iterrows():
    folium.Marker(location=[float(chairs[1]['lat']), float(chairs[1]['lon'])],
                  popup=chairs[1]['Premises Name'],
                  icon=folium.Icon(color='green', icon_color='white', icon='coffee', angle=0, prefix='fa'))\
        .add_to(coffeeshops)

# add pubs and restaurants to the map
for chairs in df2[df2.category == 2].iterrows():
    folium.Marker(location=[float(chairs[1]['lat']), float(chairs[1]['lon'])],
                  popup=chairs[1]['Premises Name'],
                  icon=folium.Icon(color='blue', icon='glass', prefix='fa'))\
        .add_to(restaurants)

# add other to the map
for chairs in df2[df2.category == 3].iterrows():
    folium.Marker(location=[float(chairs[1]['lat']), float(chairs[1]['lon'])],
                  popup=chairs[1]['Premises Name'],
                  icon=folium.Icon(color='gray', icon='question', prefix='fa'))\
        .add_to(other)

# enable toggling of data points
folium.LayerControl().add_to(map)

display(map)

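Bonus Step 5: Save the Map to png

I wanted a screenshot of the map so that I could embed a static version in the Medium post (which does not accept the dynamic version). The best way I found to get a static version (other than simply taking a screenshot by hand) is to save the map as HTML and then use Selenium to save a screenshot of the HTML. Here is how to do it (credit for the Selenium part goes to a stackoverflow post).

Note: for the following to work, geckodriver needs to be installed. Download the file from the link below and put it into /usr/bin/local (on Linux machines), or into any other directory on your PATH.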

https://github.com/mozilla/geckodriver/releases

import os
import time
from selenium import webdriver

# save map
fn = 'beergarden_happiness_map.html'
tmpurl = 'file:///{path}/{mapfile}'.format(path=os.getcwd(), mapfile=fn)
map.save(fn)

# download screenshot of map
delay = 5
browser = webdriver.Firefox()
browser.get(tmpurl)
# give the map tiles some time to load
time.sleep(delay)
browser.save_screenshot('{mapname}.png'.format(mapname=fn.split('.')[0]))
browser.quit()
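Building the file URL by string formatting depends on how paths look on the operating system in use. As an alternative sketch, not taken from the original notebook, pathlib from the standard library can produce both the file URL and the png file name. The variable png_name is an assumption introduced here.

```python
from pathlib import Path

fn = 'beergarden_happiness_map.html'

# resolve() turns the name into an absolute path based on the current
# working directory; as_uri() needs an absolute path and returns a file URL.
tmpurl = Path(fn).resolve().as_uri()

# Swap the .html suffix for .png to name the screenshot.
png_name = Path(fn).with_suffix('.png').name

print(tmpurl)
print(png_name)
```

The resulting tmpurl could be passed to browser.get and png_name to browser.save_screenshot in the same way as above.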

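Conclusion

In this post, I downloaded an open data set from the City of Edinburgh Council containing chairs and tables permits. I then used the OpenStreetMap API to get the type and GPS location of each premise based on its address. After some additional data cleaning based on the premise names, I sorted the premises into the three categories "coffee shop", "pub/restaurant" and "other", and plotted them on an interactive map, which I saved as HTML and subsequently converted to png.

I now have a working map of Edinburgh's beer gardens and open-air coffee shops, and can enjoy the summer sitting outside with a delicious iced coffee or an ice-cold beer!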