Map-Matching for Low-Sampling-Rate GPS Trajectories: Reading Notes


Map-matching is the process of aligning a sequence of observed user positions with the road network on a digital map. (Definition of map-matching.)

ST-Matching considers (1) the spatial geometric and topological structures of the road network and (2) the temporal/speed constraints of the trajectories. (The basis of the paper's algorithm.)

Typically a GPS trajectory consists of a sequence of points with latitude, longitude, and timestamp information. (In engineering practice, more factors are taken into account, as long as they improve the algorithm's accuracy.)

In practice there exist a large number of low-sampling-rate (e.g., one point every 2 minutes) GPS trajectories. They are either application-logged data collected from ad-hoc location-based queries, or generated in scenarios where saving energy and communication cost is desired. (The paper focuses on the low-frequency scenario, which is also the precondition for bringing in shortest paths. In a high-frequency scenario, adjacent points lie either on the same link or on adjacent links.)

In this paper we refer to low-sampling-rate as a sampling interval of one point every 2 minutes or longer. With such a sampling rate, the distance between two points may exceed 1300 m even if a vehicle's speed is only 40 km/h!
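As a quick check of that figure, the distance covered during one 2-minute sampling interval at a constant 40 km/h (constant speed is a simplifying assumption made here) works out as:

```latex
d = v \cdot \Delta t = \frac{40\,000\ \text{m}}{3600\ \text{s}} \times 120\ \text{s} \approx 1333\ \text{m}
```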


The basic algorithm in shortest path computation is Dijkstra's algorithm. In practice, the A* algorithm [10] is often used as a more efficient alternative. The A* algorithm uses a heuristic function to guide the search toward the destination. Other strategies such as bidirectional search [12], search decomposition [13], and hierarchical search [14] are often used in real applications as well. (Overview of related algorithms.)
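To make the baseline concrete, here is a minimal sketch of Dijkstra's algorithm with a binary heap. The adjacency-list format (`graph` maps a vertex to `(neighbor, edge_length)` pairs) and the function name `dijkstra` are assumptions chosen for illustration, not something the paper prescribes.

```python
import heapq


def dijkstra(graph, source, target):
    """Return (distance, path) from source to target, or (inf, []) if unreachable.

    graph: dict mapping vertex -> list of (neighbor, edge_length) pairs
    (hypothetical format; edge lengths must be non-negative).
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue  # stale heap entry
        visited.add(u)
        if u == target:
            break  # the target's distance is final once it is popped
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return float("inf"), []
    # Walk the predecessor links back from the target.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[target], path
```

A* would follow the same loop but order the heap by `d + h(v)`, where `h` is a lower bound on the remaining distance to the target.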

Some pre-processing steps can be added to speed up the shortest path search. For example, the ALT algorithms in [6] combine landmarks with the triangle inequality to reach a tighter lower bound than the Euclidean distance used in the A* algorithm. Reach-based pruning [9] is another method to compute a lower bound for pruning purposes. (Preprocessing algorithms.)
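The landmark idea can be sketched as follows. Given precomputed distances to and from a few landmarks, the triangle inequality yields a lower bound on d(v, t) that can serve as an A* heuristic. The table layout and the name `alt_lower_bound` are hypothetical; this illustrates the bound only, not the implementation in [6].

```python
def alt_lower_bound(v, t, dist_from_landmark, dist_to_landmark):
    """Lower bound on the shortest-path distance d(v, t) from landmark tables.

    dist_from_landmark[L][x] holds d(L, x); dist_to_landmark[L][x] holds d(x, L).
    Both tables are assumed to be precomputed, e.g. by one shortest-path
    search per landmark on the graph and on the reversed graph.
    """
    bound = 0.0
    for lm in dist_from_landmark:
        # d(L, t) <= d(L, v) + d(v, t)  =>  d(v, t) >= d(L, t) - d(L, v)
        forward = dist_from_landmark[lm][t] - dist_from_landmark[lm][v]
        # d(v, L) <= d(v, t) + d(t, L)  =>  d(v, t) >= d(v, L) - d(t, L)
        backward = dist_to_landmark[lm][v] - dist_to_landmark[lm][t]
        bound = max(bound, forward, backward)
    return bound
```

Taking the maximum over all landmarks keeps the tightest bound, and the floor of 0 keeps the heuristic non-negative.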



Definition 1 (GPS Log): A GPS log is a collection of GPS points $L = \{p_1, p_2, \ldots, p_n\}$. Each GPS point $p_i \in L$ contains latitude $p_i.lat$, longitude $p_i.lng$ and timestamp $p_i.t$, as illustrated in the left part of Figure 3. (Defines the set of points.)

Definition 2 (GPS Trajectory): A GPS trajectory $T$ is a sequence of GPS points with the time interval between any consecutive GPS points not exceeding a certain threshold $\Delta T$, i.e. $T: p_1 \to p_2 \to \cdots \to p_n$, where $p_i \in L$ and $0 < p_{i+1}.t - p_i.t < \Delta T$ ($1 \le i < n$). Figure 3 shows an example of a GPS trajectory. $\Delta T$ is the sampling interval. In this paper, we focus on low-sampling-rate GPS trajectories with $\Delta T \ge 2$ min. (Defines a trajectory.)
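Definitions 1 and 2 can be read as a small data model plus a splitting rule. The sketch below is one possible reading: the class name `GPSPoint`, the function `split_into_trajectories`, timestamps in seconds, and the choice to drop points that repeat a timestamp are all assumptions made here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GPSPoint:
    lat: float  # latitude in degrees
    lng: float  # longitude in degrees
    t: float    # timestamp in seconds (assumed unit)


def split_into_trajectories(log, delta_t):
    """Split a GPS log into trajectories whose consecutive gaps satisfy 0 < gap < delta_t."""
    trajectories = []
    current = []
    for p in sorted(log, key=lambda q: q.t):
        if current:
            gap = p.t - current[-1].t
            if gap == 0:
                continue  # assumption: drop points that repeat a timestamp
            if gap >= delta_t:
                # Gap too large: close the current trajectory and start a new one.
                trajectories.append(current)
                current = []
        current.append(p)
    if current:
        trajectories.append(current)
    return trajectories
```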

Definition 3 (Road Segment): A road segment $e$ is a directed edge that is associated with an id $e.eid$, a typical travel speed $e.v$, a length value $e.l$, a starting point $e.start$, an ending point $e.end$ and a list of intermediate points that describes the road using a polyline. Figure 4 shows several real road segments in Bing Map Search [2]. Note that a road may contain several road segments. (Defines a road segment.)

Definition 4 (Road Network): A road network is a directed graph $G(V, E)$, where $V$ is a set of vertices representing the intersections and terminal points of the road segments, and $E$ is a set of edges representing road segments. (Defines the road-network abstraction.)

Definition 5 (Path): Given two vertices $V_i$, $V_j$ in a road network $G$, a path $P$ is a set of connected road segments that start at $V_i$ and end at $V_j$, i.e. $P: e_1 \to e_2 \to \cdots \to e_n$, where $e_1.start = V_i$, $e_n.end = V_j$, $e_k.end = e_{k+1}.start$, $1 \le k < n$. (Defines a path.)
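Definitions 3 to 5 translate directly into a record type and a connectivity check. In the sketch below the field names mirror the paper's attributes, while the class name `RoadSegment`, the use of string vertex ids, and the helper `is_path` are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RoadSegment:
    eid: str      # segment id (e.eid)
    v: float      # typical travel speed (e.v)
    l: float      # length (e.l)
    start: str    # id of the start vertex (e.start)
    end: str      # id of the end vertex (e.end)
    polyline: list = field(default_factory=list)  # intermediate shape points


def is_path(segments, v_i, v_j):
    """Check Definition 5: segments chain head-to-tail from v_i to v_j."""
    if not segments:
        return False
    if segments[0].start != v_i or segments[-1].end != v_j:
        return False
    # Every segment must end where the next one starts.
    return all(a.end == b.start for a, b in zip(segments, segments[1:]))
```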

Given a raw GPS trajectory $T$ and a road network $G(V, E)$, find the path $P$ from $G$ that matches $T$ with its real path. (Problem statement.)


The ST-Matching algorithm is composed of three major components: Candidate Preparation, Spatial and Temporal Analysis, and Result Matching.

Candidate Preparation

Given trajectory $T = p_1 \to p_2 \to \cdots \to p_n$, we first retrieve a set of candidate road segments within radius $r$ of each point $p_i$, $1 \le i \le n$. (Candidate selection.)

Definition 6 (Line Segment Projection): The line segment projection of a point $p$ to a road segment $e$ is the point $c$ on $e$ such that $c = \arg\min_{\forall c_i \in e} dist(c_i, p)$, where $dist(c_i, p)$ returns the distance between $p$ and any point $c_i$ on $e$. (Definition of the projection point.)
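For a single straight piece of a road segment, the arg-min in Definition 6 has a closed form: project onto the supporting line and clamp to the endpoints. The sketch below assumes planar coordinates in meters (latitude and longitude would first need a map projection) and a hypothetical function name. For a polyline, one would apply it to each piece and keep the closest result.

```python
def project_point_to_segment(p, a, b):
    """Return the point on segment a-b closest to p (all are (x, y) tuples)."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return (ax, ay)  # degenerate segment: both endpoints coincide
    # Parameter of the orthogonal projection onto the infinite line through a and b.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    # Clamp so the result stays on the segment itself.
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)
```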

Spatial Analysis

Definition 7 (Observation Probability): The observation probability is defined as the likelihood that a GPS sampling point $p_i$ matches a candidate point $c_i^j$, computed based on the distance between the two points, $dist(c_i^j, p_i)$. (Definition of the observation probability; it follows a normal distribution.)
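Since the observation probability is modeled as a normal distribution over the distance $dist(c_i^j, p_i)$, it can be sketched as a Gaussian density. The defaults below (zero mean, standard deviation of 20 meters) are assumptions borrowed from the noise model these notes quote for the synthetic data, and the function name is hypothetical.

```python
import math


def observation_probability(distance, sigma=20.0, mu=0.0):
    """Gaussian density of the GPS-to-candidate distance (in meters).

    sigma and mu are assumed defaults, not values confirmed for this step.
    """
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return coeff * math.exp(-((distance - mu) ** 2) / (2.0 * sigma ** 2))
```

Candidates closer to the observed point get a higher value, so a candidate 10 m away scores above one 30 m away.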



Temporal Analysis







Result Matching


Synthetic Trajectory Data

The simulator first randomly selects two vertices in the road network and computes the top $K$ shortest paths between them. Then it randomly selects one of the $K$ paths as the ground truth, denoted as $G: e_1, e_2, \ldots, e_n$. The motivation behind this is that moving objects generally follow the direction from source to destination, but do not necessarily follow the shortest path strictly. Note that the time interval between any two neighboring points is not uniform. To retrieve a trajectory with the desired sampling interval, the simulator selects one road segment from every $k'$ segments on $G$. The sampling rate is therefore adjusted by changing the value of $k'$. The simulator generates one GPS point with estimated timestamp information for each selected road segment. The points are produced with position error that follows a zero-mean normal distribution with a standard deviation of 20 meters. (Machine-simulated testing.)
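The sub-sampling and noise steps of the simulator can be sketched as below. This is a simplified illustration under stated assumptions: each ground-truth segment is represented by one planar (x, y) point in meters, the noise is applied independently per axis, and timestamps and the top-K path search are left out. The function name `simulate_gps_points` is hypothetical.

```python
import random


def simulate_gps_points(segment_points, k_prime, sigma=20.0, seed=None):
    """Pick one point from every k_prime segments and add zero-mean Gaussian noise.

    segment_points: one (x, y) point in meters per ground-truth segment
    (an assumed representation). Noise is applied per axis, also an assumption.
    """
    if k_prime < 1:
        raise ValueError("k_prime must be at least 1")
    rng = random.Random(seed)
    selected = segment_points[::k_prime]  # a larger k_prime gives a lower sampling rate
    return [
        (x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))
        for x, y in selected
    ]
```

Passing a fixed `seed` makes a generated trajectory reproducible, which helps when comparing matchers on the same synthetic input.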

Evaluation Criteria:







