(Blue) Taxi Destination and Trip Time Prediction from Partial Trajectories (reading notes) 20171206

jasonyanxx

Category: Paper_Reading. Tags: kaggle, data-mine, taxi, prediction

Article link: https://blog.csdn.net/m0_38058163/article/details/78734575

Paper: "(Blue) Taxi Destination and Trip Time Prediction from Partial Trajectories" (2015)

20171206: first update

I. Introduction

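The authors of the paper took part in a 2015 Kaggle data competition (Chinese link 1, English link 2). The goal of the competition was to predict a taxi's drop-off point (the passenger's destination) and the trip time (the duration of one complete taxi ride) from the part of the trip observed so far, beginning at the pickup point where the passenger gets in (a partial trajectory, as the paper's title says). Kaggle provided one year of taxi trajectory data from one city. The authors took third place; first place went to a deep learning model proposed by MILA lab. Although the authors did not achieve the best result, their approach to analyzing taxi data is still of research interest.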

II. The authors' method

1. Overview

The authors' basic idea rests on one intuition:

two trips with similar route likely end at the same or nearby destinations.

Therefore, using the destinations of similar trips in the past we can predict the destination of a test trip.

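My reading of this: similar trajectories are very likely to share the same destination, so we can start from trajectory similarity, look for patterns that different trajectories have in common, and use them to predict the destination.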

2. Feature extraction

(1) Feature Extraction for Destination Prediction

- Final destination coordinates and Haversine distance to 10 nearest neighbors

Used to measure the similarity between a pair of trajectories A and B.
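The Haversine distance mentioned above is the great-circle distance between two latitude/longitude points. A minimal sketch follows, assuming coordinates in degrees and a mean Earth radius of 6371 km; the function name `haversine_km` is chosen here for illustration and does not come from the paper.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against rounding error for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))
```

One degree of latitude comes out at roughly 111 km, which is a quick sanity check on the units.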

- Kernel regression (KR) as a smooth version of k-NN regression method

besides computing KR predictions for the full trip, we also compute them using only the last d meters of the ride during the trip matching step, where d ∈ {100, 200, 300, 400, 500, 700, 1000, 1200, 1500}

KR replaces k-NN. KR is computed not only on the full trajectory but also on partial trajectories; the KR values computed on the last 500 to 700 m of the ride turned out to be the most important.
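To make "a smooth version of k-NN" concrete: instead of averaging the destinations of the k closest trips with equal weight, kernel regression averages over all past trips, weighting each one by how close it is. The sketch below is a Nadaraya-Watson estimator with a Gaussian kernel. The kernel choice, the `bandwidth` parameter and the function name are assumptions made for illustration; these notes do not record which kernel or trip distance the authors used, so the trip-to-trip distances are taken as given inputs.

```python
import math


def kernel_regression(distances, destinations, bandwidth):
    """Nadaraya-Watson estimate of a destination (lat, lon).

    distances[i] is the dissimilarity between the test trip and past trip i,
    destinations[i] is the (lat, lon) where past trip i ended.
    """
    # Gaussian kernel (an assumption): closer trips receive larger weights.
    weights = [math.exp(-0.5 * (d / bandwidth) ** 2) for d in distances]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase bandwidth")
    lat = sum(w * p[0] for w, p in zip(weights, destinations)) / total
    lon = sum(w * p[1] for w, p in zip(weights, destinations)) / total
    return lat, lon
```

A small bandwidth makes the estimate behave like nearest-neighbor lookup, while a large one approaches the plain average of all destinations. Restricting the distance computation to the last d meters of each ride would give the per-d variants described in the quote.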

- Contextual KR

In our work, we exploit this information by using KR to match a test trip with only trips with the same call id, taxi id, day of the week, hour of the day, or taxi stand id.

This is the core of the feature extraction. Building on the earlier KR features, it brings in "contextual" information to construct further features: call id, taxi id, day of the week, hour of the day, and taxi stand id.
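The contextual matching can be pictured as a filtering step that runs before KR: for each context field, keep only the past trips that share the test trip's value, then compute one KR prediction per filtered set. The sketch below shows only the filtering. Trips are represented as dictionaries, and the field names in `CONTEXT_KEYS` are hypothetical stand-ins for the fields listed in the quote.

```python
# Hypothetical field names for the context fields listed in the quote.
CONTEXT_KEYS = ["call_id", "taxi_id", "weekday", "hour", "stand_id"]


def trips_in_same_context(history, test_trip, key):
    """Return the past trips whose `key` field equals the test trip's value."""
    value = test_trip.get(key)
    if value is None:  # e.g. a trip that did not start at a taxi stand
        return []
    return [trip for trip in history if trip.get(key) == value]


def contextual_candidates(history, test_trip):
    """One candidate set per context key; each set would feed its own KR feature."""
    return {key: trips_in_same_context(history, test_trip, key) for key in CONTEXT_KEYS}
```

Returning an empty list when a field is missing is a design choice of this sketch; how the authors handled missing context values is not covered in these notes.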

- RDP algorithm

Used to simplify the data and deal with some of the noise in it; see the paper for details.
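For reference, RDP (Ramer-Douglas-Peucker) simplifies a polyline by keeping its two endpoints, finding the interior point farthest from the line between them, and recursing on both halves only if that point lies farther away than a tolerance. The sketch below is a generic textbook version, not the authors' code. It treats points as planar (x, y) pairs, which is only an approximation for GPS coordinates, and the tolerance `epsilon` is an assumed parameter.

```python
import math


def _perp_dist(p, a, b):
    """Distance from point p to the line through a and b (or to a if a == b)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs(dx * (ay - py) - dy * (ax - px)) / norm


def rdp(points, epsilon):
    """Simplify a polyline, dropping points that deviate by at most epsilon."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    idx, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _perp_dist(points[i], first, last)
        if d > dmax:
            idx, dmax = i, d
    if dmax > epsilon:
        # Keep the farthest point and simplify each side separately.
        left = rdp(points[:idx + 1], epsilon)
        right = rdp(points[idx:], epsilon)
        return left[:-1] + right  # drop the duplicated split point
    return [first, last]
```

Small GPS jitter around a straight stretch of road falls below the tolerance and is discarded, while real turns are kept.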

- Added features extracted directly from the partially observed trips

Finally, some features that can be read directly from the data were added:
Direction, Time gap, Number of GPS updates, Day of the week, The first and the last GPS location
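A rough sketch of how such directly observable features might be computed from a partial trip is given below. Several definitions are assumptions made for illustration, since these notes do not spell them out: direction is taken as the compass bearing from the first to the last observed point, the day of the week is derived from a Unix start timestamp in UTC, and the time gap feature is left out because its definition is not given here.

```python
import math
from datetime import datetime, timezone


def partial_trip_features(points, start_ts):
    """points: observed (lat, lon) pairs in degrees; start_ts: Unix start time in seconds."""
    (lat1, lon1), (lat2, lon2) = points[0], points[-1]
    # Initial bearing from the first to the last point, degrees clockwise from north.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    bearing = math.degrees(math.atan2(y, x)) % 360.0
    # Monday is 0 and Sunday is 6; using UTC is an assumption of this sketch.
    weekday = datetime.fromtimestamp(start_ts, tz=timezone.utc).weekday()
    return {
        "direction_deg": bearing,
        "n_gps_updates": len(points),
        "weekday": weekday,
        "first_point": points[0],
        "last_point": points[-1],
    }
```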

(2) Feature Extraction for Trip Time Prediction

This part is not very relevant to my own research, so it is omitted here.

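3. Building the prediction model

After some experimentation, the authors settled firmly on the RF (Random Forest) model, for the following reason:

Moreover, with a RF model we can easily assess the contribution of each feature on the final prediction. With this insight, we know whether a new set of features is relevant or not every time it is added to the model.

In other words:
an RF model makes it easy to see which features contribute most. Features with a large contribution have a large effect on the result, so they deserve careful handling and study.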
