Algorithm Analysis: Learning Notes on Crowd-Intelligence Optimization of Shared-Bike Tidal Points During the Morning Rush Hour

Submission screenshot:

Importing the Dataset

The shared-bike trajectory data consists of positioning records generated while a bike is in use; it contains each bike's latitude and longitude at different times (recorded every 15 seconds by default).

# Import common libraries
import os, codecs
import pandas as pd
import numpy as np
%matplotlib inline
from IPython.display import set_matplotlib_formats
set_matplotlib_formats('svg')

from matplotlib import font_manager as fm, rcParams
import matplotlib.pyplot as plt
# Read the shared-bike trajectory data for Dec 21-25
bike_track = pd.concat([
    pd.read_csv('./dataset/gxdc_gj20201221.csv'),
    pd.read_csv('./dataset/gxdc_gj20201222.csv'),
    pd.read_csv('./dataset/gxdc_gj20201223.csv'),
    pd.read_csv('./dataset/gxdc_gj20201224.csv'),
    pd.read_csv('./dataset/gxdc_gj20201225.csv')
])
# Sort by bike ID (BICYCLE_ID) and locating time (LOCATING_TIME)
bike_track = bike_track.sort_values(['BICYCLE_ID', 'LOCATING_TIME'])
bike_track

# Summary statistics
bike_track.describe()

Notes:

Introduction to folium: https://blog.csdn.net/qq_45019698/article/details/108683737

folium.Map() is the basic function for creating a map layer. Its main parameters are:

  • location: a tuple or list giving the coordinates of the initial map center, in the form (latitude, longitude) or [latitude, longitude]; defaults to None
  • zoom_start: the initial zoom level; the larger the value, the more the map is zoomed in.

Sometimes we want to display irregular geometric shapes on the map; folium.PolyLine() can do this. Its main parameters are:

  • locations: a doubly nested list specifying the coordinate points to connect in order; to draw a closed shape, pass the same coordinate at both the start and the end of the list.
  • weight: float, controls the line width; defaults to 5.

folium.Marker() creates a simple marker widget, and add_to() attaches the widget to the previously created Map object m.
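Putting these three pieces together, here is a minimal, hypothetical sketch (the center point and corner coordinates are made up for illustration) that places a marker and draws a closed rectangle by repeating the first coordinate at the end of the list:

import folium

# A map centered on made-up coordinates near Xiamen; location is (latitude, longitude)
demo_map = folium.Map(location=[24.48, 118.16], zoom_start=14)

# A simple marker attached to the map with add_to()
folium.Marker([24.48, 118.16], popup='demo point').add_to(demo_map)

# A closed rectangle: the first coordinate is repeated at the end so the shape closes
corners = [
    [24.481, 118.159],
    [24.481, 118.161],
    [24.479, 118.161],
    [24.479, 118.159],
    [24.481, 118.159],  # same as the first point
]
folium.PolyLine(locations=corners, weight=3).add_to(demo_map)
demo_map  # display inline in a notebook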

# Route visualization: plot one bike's trajectory (a domestic map tile source would be more accurate)
import folium
m = folium.Map(location=[24.482426, 118.157606], zoom_start=12)
my_PolyLine = folium.PolyLine(
    locations=bike_track[bike_track['BICYCLE_ID'] == '000152773681a23a7f2d9af8e8902703'][['LATITUDE', 'LONGITUDE']].values,
    weight=5)
m.add_children(my_PolyLine)

Shared-Bike Parking Spot (Electronic Fence) Data

Uniformly designated shared-bike parking areas.

# Read the electronic-fence data
def bike_fence_format(s):
    s = s.replace('[', '').replace(']', '').split(',')
    # Reshape the float array into 5 rows of [longitude, latitude]; the last vertex repeats
    # the first so the polygon closes, which is convenient for plotting later
    s = np.array(s).astype(float).reshape(5, -1)
    return s

bike_fence = pd.read_csv('./dataset/gxdc_tcd.csv')
# print(bike_fence)
bike_fence['FENCE_LOC'] = bike_fence['FENCE_LOC'].apply(bike_fence_format)

# Fence visualization
import folium
m = folium.Map(location=[24.482426, 118.157606], zoom_start=12)
for data in bike_fence['FENCE_LOC'].values[:100]:
    # data[0, ::-1] reverses [longitude, latitude] to [latitude, longitude] for folium
    folium.Marker(list(data[0, ::-1])).add_to(m)
m

Note: this originally used folium.Marker((data[0,::-1])).add_to(m), but that type cannot be plotted; a list is required.
 

Shared-Bike Order Data

Lock and unlock records generated when a shared bike is used.

# Import the shared-bike order data
bike_order = pd.read_csv('./dataset/gxdc_dd.csv')
bike_order = bike_order.sort_values(['BICYCLE_ID', 'UPDATE_TIME'])
# print(bike_order)

# Visualize one bike's order locations
import folium
m = folium.Map(location=[24.482426, 118.157606], zoom_start=12)
my_PolyLine = folium.PolyLine(
    locations=bike_order[bike_order['BICYCLE_ID'] == '0000ff105fd5f9099b866bccd157dc50'][['LATITUDE', 'LONGITUDE']].values,
    weight=5)
m.add_children(my_PolyLine)

 

Task 2: Shared-Bike Tidal Point Analysis

  • Match shared-bike orders to parking spots
  • Compute and visualize tidal patterns at parking spots
  • Identify the 40 areas with the most pronounced tidal pattern during the weekday morning rush hour (07:00-09:00)

Latitude/Longitude Matching

To compute the tidal pattern in this competition, orders or trajectories must be matched to specific parking spots. For each parking spot at different times we need to count:

how many bikes were parked there (orders whose end point is the parking spot);

how many bikes were ridden away (orders whose start point is the parking spot).

Therefore we need to map each order to a specific parking spot.

Parking Spot Processing

First we process the parking spots and compute each spot's area and center latitude/longitude:

# Latitude range of each parking spot; FENCE_LOC rows are [longitude, latitude], so latitude is column 1
bike_fence['MIN_LATITUDE'] = bike_fence['FENCE_LOC'].apply(lambda x: np.min(x[:, 1]))
bike_fence['MAX_LATITUDE'] = bike_fence['FENCE_LOC'].apply(lambda x: np.max(x[:, 1]))

# Longitude range of each parking spot (column 0)
bike_fence['MIN_LONGITUDE'] = bike_fence['FENCE_LOC'].apply(lambda x: np.min(x[:, 0]))
bike_fence['MAX_LONGITUDE'] = bike_fence['FENCE_LOC'].apply(lambda x: np.max(x[:, 0]))

from geopy.distance import geodesic
# Approximate each fence's "area" by the geodesic length (in meters) of its bounding-box diagonal
bike_fence['FENCE_AREA'] = bike_fence.apply(
    lambda x: geodesic(
        (x['MIN_LATITUDE'], x['MIN_LONGITUDE']),
        (x['MAX_LATITUDE'], x['MAX_LONGITUDE'])
    ).meters,
    axis=1)
# Fence center: mean of the polygon vertices (the repeated closing point is dropped),
# with columns reversed to [latitude, longitude]
bike_fence['FENCE_CENTER'] = bike_fence['FENCE_LOC'].apply(lambda x: np.mean(x[:-1, ::-1], 0))

Geohash Latitude/Longitude Matching

Introduction: https://www.cnblogs.com/feiquan/p/11380461.html

Geohash encodes a latitude/longitude pair into a string. Note that precision controls the encoding precision: the larger the value, the smaller the cell it covers; the smaller the value, the larger the cell. The appropriate precision is strongly tied to the spatial extent of the locations being matched.
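As a quick illustration (a minimal sketch assuming the same geohash package installed as described below, and using the earlier map center coordinates purely as sample input), lower precision values give shorter codes that are prefixes of the higher-precision ones and cover larger cells:

import geohash

# Sample point (the map center used earlier), purely for illustration
lat, lon = 24.482426, 118.157606
for precision in (4, 5, 6, 7):
    print(precision, geohash.encode(lat, lon, precision=precision))
# Shorter codes (lower precision) are prefixes of the longer ones and cover larger cells;
# at precision=6 a cell is roughly 1.2 km x 0.6 km.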

Notes:

If the library cannot be found after pip install geohash, the workaround is:

pip install geohash installs the package directly;

then go to the Lib/site-packages/ directory and rename the Geohash folder to geohash,

and edit the __init__.py file in that directory, changing from geohash to from .geohash.

import geohash
# Geohash-encode each order's coordinates at precision 6
bike_order['geohash'] = bike_order.apply(
    lambda x: geohash.encode(x['LATITUDE'], x['LONGITUDE'], precision=6),
    axis=1)

# Geohash-encode each fence center (FENCE_CENTER is already [latitude, longitude])
bike_fence['geohash'] = bike_fence['FENCE_CENTER'].apply(
    lambda x: geohash.encode(x[0], x[1], precision=6)
)

Indexing the order data with the geohash ws7gx9 returns the corresponding orders:

bike_order[bike_order['geohash']=='ws7gx9']

First, extract the time fields from the order data:

# Parse the update time and extract the day and the zero-padded hour as strings
bike_order['UPDATE_TIME'] = pd.to_datetime(bike_order['UPDATE_TIME'])
bike_order['DAY'] = bike_order['UPDATE_TIME'].dt.day.astype(object)
bike_order['DAY'] = bike_order['DAY'].apply(str)

bike_order['HOUR'] = bike_order['UPDATE_TIME'].dt.hour.astype(object)
bike_order['HOUR'] = bike_order['HOUR'].apply(str)
bike_order['HOUR'] = bike_order['HOUR'].str.pad(width=2, side='left', fillchar='0')

# Concatenate day and hour, e.g. day 21 hour 07 -> '2107'
bike_order['DAY_HOUR'] = bike_order['DAY'] + bike_order['HOUR']

Use pivot tables to count each cell's inflow and outflow at different times:

# Inflow: orders where the bike is locked (LOCK_STATUS == 1), counted per geohash cell and DAY_HOUR
bike_inflow = pd.pivot_table(bike_order[bike_order['LOCK_STATUS'] == 1],
                             values='LOCK_STATUS', index=['geohash'],
                             columns=['DAY_HOUR'], aggfunc='count', fill_value=0)

# Outflow: orders where the bike is unlocked (LOCK_STATUS == 0)
bike_outflow = pd.pivot_table(bike_order[bike_order['LOCK_STATUS'] == 0],
                              values='LOCK_STATUS', index=['geohash'],
                              columns=['DAY_HOUR'], aggfunc='count', fill_value=0)

# Plot inflow vs. outflow for two sample geohash cells
bike_inflow.loc['wsk593'].plot()
bike_outflow.loc['wsk593'].plot()
plt.xticks(list(range(bike_inflow.shape[1])), bike_inflow.columns, rotation=40)
plt.legend(['Inflow', 'Outflow'])

bike_inflow.loc['wsk52r'].plot()
bike_outflow.loc['wsk52r'].plot()
plt.xticks(list(range(bike_inflow.shape[1])), bike_inflow.columns, rotation=40)
plt.legend(['Inflow', 'Outflow'])

Method 1: Computing the Tide via Geohash Matching

We want to capture the tidal pattern during the weekday morning rush hour, so we count the bike flow by day (a sketch of restricting the counts to the 07:00-09:00 window follows the code below):

# Inflow and outflow per geohash cell, aggregated by day
bike_inflow = pd.pivot_table(bike_order[bike_order['LOCK_STATUS'] == 1],
                             values='LOCK_STATUS', index=['geohash'],
                             columns=['DAY'], aggfunc='count', fill_value=0)

bike_outflow = pd.pivot_table(bike_order[bike_order['LOCK_STATUS'] == 0],
                              values='LOCK_STATUS', index=['geohash'],
                              columns=['DAY'], aggfunc='count', fill_value=0)
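The task targets the 07:00-09:00 weekday morning peak specifically, so one way to restrict the counts to that window is to filter on the HOUR column built earlier. This is a minimal sketch, not part of the original notebook; the choice of hours '07' and '08' (covering 07:00-08:59) is an assumption:

# Keep only orders whose hour falls in the morning rush window
rush = bike_order[bike_order['HOUR'].isin(['07', '08'])]

rush_inflow = pd.pivot_table(rush[rush['LOCK_STATUS'] == 1],
                             values='LOCK_STATUS', index=['geohash'],
                             columns=['DAY'], aggfunc='count', fill_value=0)
rush_outflow = pd.pivot_table(rush[rush['LOCK_STATUS'] == 0],
                              values='LOCK_STATUS', index=['geohash'],
                              columns=['DAY'], aggfunc='count', fill_value=0)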

From the inflow and outflow we can compute the retained flow at each location:

bike_remain = (bike_inflow - bike_outflow).fillna(0)

# Some cells have more bikes ridden away than bikes arriving; clip those to zero
bike_remain[bike_remain < 0] = 0

# Sum the retained flow over the days
bike_remain = bike_remain.sum(1)

# The prefix of FENCE_ID identifies the street; there are 993 streets in total
bike_fence['STREET'] = bike_fence['FENCE_ID'].apply(lambda x: x.split('_')[0])

# Density = retained bikes on the street / total fence area of the street
bike_density = bike_fence.groupby(['STREET'])['geohash'].unique().apply(
    lambda hs: np.sum([bike_remain[x] for x in hs])
) / bike_fence.groupby(['STREET'])['FENCE_AREA'].sum()

# Sort by density in descending order
bike_density = bike_density.sort_values(ascending=False).reset_index()
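Since the task asks for the 40 areas with the most pronounced tidal pattern and bike_density is already sorted in descending order, the candidates can be read off the top of the table. A minimal sketch (treating the densest streets as the most tidal areas is an assumption; after reset_index() the density values sit in the column labeled 0):

# The 40 streets with the highest retained-bike density
top40_streets = bike_density.head(40)
print(top40_streets)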

Method 2: Matching by Nearest-Neighbor Latitude/Longitude

from sklearn.neighbors import NearestNeighbors

# https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.DistanceMetric.html
# Note: sklearn's haversine metric expects [latitude, longitude] in radians
knn = NearestNeighbors(metric='haversine', n_jobs=-1, algorithm='brute')
knn.fit(np.radians(np.stack(bike_fence['FENCE_CENTER'].values)))
dist, index = knn.kneighbors(np.radians(bike_order[['LATITUDE', 'LONGITUDE']].values[:20000]), n_neighbors=1)
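The returned index holds row positions in bike_fence, so the sampled orders can be mapped back to fence IDs in the same way the hnswlib result is used further down. A minimal sketch (the names sample and fence_knn are hypothetical; the slice of 20000 rows matches the query above):

# Map each of the sampled orders to the FENCE_ID of its nearest fence center
sample = bike_order.iloc[:20000].copy()
sample['fence_knn'] = bike_fence.iloc[index.flatten()]['FENCE_ID'].values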

hnswlib can be used for approximate nearest-neighbor search, which is faster but slightly less accurate.

Notes:

pip install hnswlib may require a VPN before the download succeeds.

I also hit this error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/

Installing the Microsoft Visual C++ Build Tools fixes it.

 

import hnswlib
import numpy as np

# Build an approximate nearest-neighbor index over the fence centers.
# Note: 'l2' is plain Euclidean distance on raw [latitude, longitude] values, a rough approximation
p = hnswlib.Index(space='l2', dim=2)
p.init_index(max_elements=300000, ef_construction=1000, M=32)
p.set_ef(1024)
p.set_num_threads(14)

p.add_items(np.stack(bike_fence['FENCE_CENTER'].values))

Find the nearest parking spot for every order:

index, dist = p.knn_query(bike_order[['LATITUDE','LONGITUDE']].values[:], k=1)

Compute the tidal flow at every parking spot:

# Attach the nearest fence ID to each order
bike_order['fence'] = bike_fence.iloc[index.flatten()]['FENCE_ID'].values

bike_inflow = pd.pivot_table(bike_order[bike_order['LOCK_STATUS'] == 1],
                             values='LOCK_STATUS', index=['fence'],
                             columns=['DAY'], aggfunc='count', fill_value=0)

bike_outflow = pd.pivot_table(bike_order[bike_order['LOCK_STATUS'] == 0],
                              values='LOCK_STATUS', index=['fence'],
                              columns=['DAY'], aggfunc='count', fill_value=0)

bike_remain = (bike_inflow - bike_outflow).fillna(0)
bike_remain[bike_remain < 0] = 0
bike_remain = bike_remain.sum(1)

Compute the density at each parking spot:

# Fence-level density: retained bikes / fence area
bike_density = bike_remain / bike_fence.set_index('FENCE_ID')['FENCE_AREA']
bike_density = bike_density.sort_values(ascending=False).reset_index()
bike_density = bike_density.fillna(0)

# Label the 100 densest fences as tidal points ('1'), the rest as '0'
bike_density['label'] = '0'
bike_density.iloc[:100, -1] = '1'

bike_density['BELONG_AREA'] = '厦门'
# Drop the raw density column, rename to the submission format, and write the result file
bike_density = bike_density.drop(0, axis=1)
bike_density.columns = ['FENCE_ID', 'FENCE_TYPE', 'BELONG_AREA']
bike_density.to_csv('result.txt', index=None, sep='|')

 

 
