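Background:
I am a BI analyst at a company that runs food-delivery operations for restaurants. Shortly after I joined, the operations team asked me to work out, for each delivery store, whether the area within a 3 km radius (the normal delivery range) contains more office buildings or more residential housing. Knowing this lets us set different strategies for weekdays and weekends: residential-type stores can shift their discount budget toward weekends, while office-type stores can concentrate promotions on Monday through Thursday. The question to study therefore became how to tell whether a store is residential-type or office-type.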
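1. Searching POIs with AMap (高德)
To call the AMap API services, first register an AMap developer account, create an application in the console, and obtain a KEY. After that you can start using the AMap API.
POI stands for "Point of Interest". In a geographic information system, a POI can be a shop, a bus stop, and so on.
As shown in the figure above, POI search comes in several forms: keyword search, around search, and polygon search. In our study we pass in the longitude and latitude of a store address and get back the number of office-building POIs and residential POIs around it. The store's neighborhood type is then judged from the counts of the different POI types within a 3 km radius of the store. The types are identified by AMap POI category codes, which appear in the code in section 1.1.
For the exact request and response parameters, see the official AMap documentation: https://lbs.amap.com/api/webservice/guide/api/search#around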
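Before looping over thousands of stores, it may help to validate each response instead of reading `count` directly. The following is a minimal sketch, assuming the v3 Web Service response carries a `status` field ('1' on success), an `info` field with the error text, and `count` as a numeric string. `parse_poi_count` is a hypothetical helper name, not part of the AMap API, and the sample dictionaries are hand-written rather than real API output.

```python
def parse_poi_count(answer):
    """Return the POI count from an around-search response dict.

    Assumes the AMap v3 response shape: 'status' is '1' on success,
    'info' carries the error text, and 'count' is a numeric string.
    """
    if str(answer.get('status')) != '1':
        raise ValueError('AMap request failed: %s' % answer.get('info'))
    return int(answer.get('count', 0))


# Hand-written example responses (not real API output)
ok_answer = {'status': '1', 'info': 'OK', 'count': '37', 'pois': []}
bad_answer = {'status': '0', 'info': 'INVALID_USER_KEY'}

print(parse_poi_count(ok_answer))
try:
    parse_poi_count(bad_answer)
except ValueError as err:
    print(err)
```

Failing loudly on a bad key or an exhausted quota avoids silently writing missing counts into the export file.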
1.1. Code
import pandas as pd
import requests


def get_gaode_data(location):
    '''Count the POIs of each type group within 3 km of a store.

    120100|120201|120202|120203 are the office-building type codes; for the
    full list see the "POI分类编码" (POI category codes) document at
    https://lbs.amap.com/api/webservice/download
    '''
    # Type groups: office buildings, residential areas, colleges and universities
    types = ['120100|120201|120202|120203',
             '120301|120302|120303|120304',
             '141201|141205|141206|141207']
    base = 'https://restapi.amap.com/v3/place/around'
    result = []
    for i in types:
        # Replace APIKEY with the key you applied for
        parameters = {'location': location, 'key': 'APIKEY', 'types': i,
                      'offset': 25, 'radius': 3000}
        response = requests.get(base, params=parameters)
        answer = response.json()
        dic = {}
        dic['store_location'] = location
        dic['type_desc'] = i
        dic['poi_count'] = answer['count']
        result.append(dic)
    return result


def poi_type(x):
    '''Map a type-code group to a readable label.'''
    if x == '120100|120201|120202|120203':
        y = '商务写字楼'  # office building
    elif x == '141201|141205|141206|141207':
        y = '高等学校'  # college or university
    else:
        y = '住宅区'  # residential area
    return y


# Read the data
df3 = pd.read_csv(r'D:\szl\data\locations.csv', sep='\t', encoding='utf-8')
# Take the longitude/latitude column, drop duplicates, and convert it to a list
location_list = df3['jingweidu'].drop_duplicates().values.tolist()
frames = []
for li, store_location in enumerate(location_list):
    result_store = get_gaode_data(store_location)
    frames.append(pd.DataFrame(result_store))
    # Track progress: print once every 1000 locations
    if li % 1000 == 0:
        print('%s done' % store_location)
# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
export_data = pd.concat(frames) if frames else pd.DataFrame()
export_data.to_csv(r'D:\szl\data\locations_export.csv', sep='\t')
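The export above has one row per store and type group, so a final step is still needed to turn the counts into a store label. The sketch below shows one possible rule, assuming the columns `store_location`, `type_desc`, and `poi_count` produced above: compare the office count with the residential count and take the larger. The function name `label_by_poi_count`, the 'undecided' label for ties, and the decision to ignore the university group are all assumptions made for illustration, and the sample numbers are invented.

```python
import pandas as pd

OFFICE = '120100|120201|120202|120203'
RESIDENTIAL = '120301|120302|120303|120304'


def label_by_poi_count(export_data):
    """Pivot to one row per store and label it by the larger POI count."""
    df = export_data.copy()
    # Counts may arrive as strings, so convert them before comparing
    df['poi_count'] = pd.to_numeric(df['poi_count'])
    wide = df.pivot_table(index='store_location', columns='type_desc',
                          values='poi_count', aggfunc='sum', fill_value=0)
    # Keep only the two groups being compared; a missing group counts as 0
    wide = wide.reindex(columns=[OFFICE, RESIDENTIAL], fill_value=0)

    def pick(row):
        if row[OFFICE] > row[RESIDENTIAL]:
            return 'office'
        if row[OFFICE] < row[RESIDENTIAL]:
            return 'residential'
        return 'undecided'  # tie rule is an assumption

    wide['label'] = wide.apply(pick, axis=1)
    return wide


# Invented sample rows in the same shape as export_data
sample = pd.DataFrame([
    {'store_location': '116.40,39.90', 'type_desc': OFFICE, 'poi_count': '120'},
    {'store_location': '116.40,39.90', 'type_desc': RESIDENTIAL, 'poi_count': '45'},
    {'store_location': '116.30,39.95', 'type_desc': OFFICE, 'poi_count': '10'},
    {'store_location': '116.30,39.95', 'type_desc': RESIDENTIAL, 'poi_count': '80'},
])
print(label_by_poi_count(sample)['label'])
```

A plain comparison is the simplest choice; a ratio threshold could be substituted if many stores turn out to have nearly equal counts.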
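2. Verifying with order-address types
The method above is based on POI search. To make the conclusion more reliable, we also verified it against the longitude and latitude of each store's order addresses by checking what type of address each order belongs to. This uses AMap's reverse geocoding feature; for details see: https://lbs.amap.com/api/webservice/guide/api/georegeo#regeo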
2.1. Code
# Return address / community / building / business-area / POI information for a coordinate
# Use: match a brand's order coordinates to a residential or office address type
import pandas as pd  # reading and writing .xlsx also requires openpyxl to be installed
import requests


def regeocode(location):
    # Replace APIKEY with the key you applied for
    parameters1 = {'location': location, 'key': 'APIKEY', 'radius': '100',
                   'extensions': 'all', 'batch': 'true', 'roadlevel': '0'}
    base1 = 'http://restapi.amap.com/v3/geocode/regeo'
    response1 = requests.get(base1, params=parameters1)
    answer1 = response1.json()
    # With batch=true the results come back as a list under 'regeocodes'
    regeo = answer1['regeocodes'][0]
    component = regeo['addressComponent']
    if regeo['pois'] == []:
        pois = ''
        pois_type = ''
    else:
        pois = regeo['pois'][0]['name']  # first point of interest
        pois_type = regeo['pois'][0]['type']
    dic = {}
    dic['location'] = location
    dic['formatted_address'] = regeo['formatted_address']
    dic['province'] = component['province']
    dic['district'] = component['district']
    dic['building'] = component['building']['name']  # building name
    dic['buildingtype'] = component['building']['type']
    dic['pois'] = pois
    dic['pois_type'] = pois_type
    return dic


if __name__ == '__main__':
    inputfile = 'C:\\Users\\77202\\Desktop\\poi_beijing_v2(1).xlsx'
    df1 = pd.read_excel(inputfile, sheet_name='Sheet2')
    location_df = list(df1['location'])
    result = []
    for i in range(len(location_df)):
        location1 = location_df[i]
        print(location1)
        dic1 = regeocode(location1)
        result.append(dic1)
        print('Record %s done' % i)
    result2 = pd.DataFrame(result)
    print('Geographic data generated, writing to Excel')
    # df2=pd.merge(df1,location_df1,on='location',how='left')
    # ExcelWriter.save() was removed in pandas 2.0; write the file directly
    result2.to_excel('C:/Users/77202/Desktop/address_geo_type_info.xlsx', index=False)
    print('success')
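The reverse-geocoding output still has to be reduced to a residential or office class per order address. The sketch below shows one way to do that from the `pois_type` column. It assumes the type strings are semicolon-separated category names such as '商务住宅;住宅区;住宅小区' or '商务住宅;楼宇;商务写字楼'; that format, the keyword rules, and the names `classify_pois_type` and `address_type_share` are assumptions for illustration and should be checked against your own output. The sample rows are invented.

```python
import pandas as pd


def classify_pois_type(pois_type):
    """Map a POI type string to 'residential', 'office', 'other', or 'unknown'."""
    # Empty results are stored as '' by regeocode; also guard against non-strings
    if not isinstance(pois_type, str) or not pois_type:
        return 'unknown'
    # Match '住宅区', not '住宅': the top-level name '商务住宅' covers offices too
    if '住宅区' in pois_type:
        return 'residential'
    if '写字楼' in pois_type:
        return 'office'
    return 'other'


def address_type_share(result_df):
    """Return the share of each address class among the geocoded orders."""
    classes = result_df['pois_type'].map(classify_pois_type)
    return classes.value_counts(normalize=True)


# Invented rows in the same shape as the regeocode output
sample = pd.DataFrame({'pois_type': [
    '商务住宅;住宅区;住宅小区',
    '商务住宅;住宅区;住宅小区',
    '商务住宅;楼宇;商务写字楼',
    '',
]})
print(address_type_share(sample))
```

Comparing the residential share with the office share for each store's orders then gives a second, order-based label for that store.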
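Finally, from the business side, we also looked at how each store's order volume during the lunch period is distributed between weekends and weekdays, and used that to judge the store type as well. Combining the three approaches, the store type was consistent across all three methods for 70% of stores, which suggests the results are fairly reliable.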
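The lunch-period comparison and the agreement check can be sketched as follows. This is an illustration under stated assumptions, not the analysis actually run: the column names `store_id` and `order_time`, the 11:00 to 14:00 lunch window, the rule that a higher weekday average means 'office', and the function names are all hypothetical, and the sample data is invented.

```python
import pandas as pd


def label_by_lunch_orders(orders, lunch_start=11, lunch_end=14):
    """Label each store by comparing average daily lunch orders on weekdays and weekends."""
    df = orders.copy()
    df['order_time'] = pd.to_datetime(df['order_time'])
    hours = df['order_time'].dt.hour
    df = df[(hours >= lunch_start) & (hours < lunch_end)].copy()
    df['date'] = df['order_time'].dt.date
    # dayofweek: Monday is 0, so 5 and 6 are Saturday and Sunday
    is_weekend = df['order_time'].dt.dayofweek >= 5
    df['day_type'] = is_weekend.map({True: 'weekend', False: 'weekday'})
    daily = df.groupby(['store_id', 'day_type', 'date']).size()
    avg = daily.groupby(level=['store_id', 'day_type']).mean().unstack('day_type')
    avg = avg.reindex(columns=['weekday', 'weekend']).fillna(0)
    office = avg['weekday'] > avg['weekend']
    return office.map({True: 'office', False: 'residential'})


def agreement_rate(labels):
    """Share of stores whose label is identical in every column."""
    return float((labels.nunique(axis=1) == 1).mean())


# Invented orders: 2024-01-08 is a Monday, 2024-01-13 is a Saturday
orders = pd.DataFrame({
    'store_id': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'order_time': ['2024-01-08 12:00', '2024-01-08 12:10', '2024-01-08 12:20',
                   '2024-01-13 12:00', '2024-01-08 12:00', '2024-01-13 12:00',
                   '2024-01-13 12:10', '2024-01-13 12:20'],
})
lunch_label = label_by_lunch_orders(orders)

# Invented labels from the three methods for three stores
labels = pd.DataFrame({
    'poi': ['office', 'residential', 'office'],
    'order_address': ['office', 'residential', 'residential'],
    'lunch': ['office', 'residential', 'office'],
}, index=['A', 'B', 'C'])
print(lunch_label)
print(agreement_rate(labels))
```

Averaging per day rather than summing keeps the comparison fair, since a week has five weekdays but only two weekend days.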