ABSTRACT
With the increasing availability of moving-object tracking data, use of this data for route search and recommendation is increasingly important. To this end, we propose a novel parallel split-and-combine approach to enable route search by locations (RSL-Psc). Given a set of routes, a set of places to visit O, and a threshold θ, we retrieve the route composed of sub-routes that (i) has similarity to O no less than θ and (ii) contains the minimum number of sub-route combinations. The resulting functionality targets a broad range of applications, including route planning and recommendation, ridesharing, and location-based services in general.
To enable efficient and effective RSL-Psc computation on massive route data, we develop novel search space pruning techniques and enable use of the parallel processing capabilities of modern processors. Specifically, we develop two parallel algorithms, Fully-Split Parallel Search (FSPS) and Group-Split Parallel Search (GSPS). We divide the route split-and-combine task into $\sum_{k=0}^{M} S(|O|, k+1)$ sub-tasks, where M is the maximum number of combinations and S(·) is the Stirling number of the second kind. In each sub-task, we use network expansion and exploit spatial similarity bounds for pruning. The algorithms split candidate routes into sub-routes and combine them to construct new routes. The sub-tasks are independent and are performed in parallel. Extensive experiments with real data offer insight into the performance of the algorithms, indicating that the RSL-Psc query can generate high-quality results and that the two algorithms are capable of achieving high efficiency and scalability.

KEYWORDS
Route recommendation; Trajectory search

ACM Reference Format:
Lisi Chen, Shuo Shang, Christian S. Jensen, Bin Yao, Zhiwei Zhang, Ling Shao. 2019. Effective and Efficient Reuse of Past Travel Behavior for Route Recommendation. In The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’19), August 4–8, 2019, Anchorage, AK, USA. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3292500.3330835

∗ Corresponding author.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
KDD ’19, August 4–8, 2019, Anchorage, AK, USA
© 2019 Association for Computing Machinery.
ACM ISBN 978-1-4503-6201-6/19/08...$15.00
https://doi.org/10.1145/3292500.3330835

Figure 1: An example of the RSL-Psc problem (routes τ1, τ2, τ3 with their start and end points; query locations o1 to o4; sub-routes τ1-1, τ1-2, τ3-1, τ3-2)

1 INTRODUCTION
The continued proliferation of GPS-equipped mobile devices (e.g., vehicle navigation systems and smart phones) and the proliferation of online map-based services (e.g., Bing Maps, Google Maps, and MapQuest) enable the collection and sharing of travel routes. Specialized sites, including Bikely, GPS-Way-Points, Share-My-Routes, and Microsoft Geolife [20], as well as general social network sites, including Twitter, Facebook, and Foursquare, are starting to support route sharing and search. The availability of massive route data enables novel mobile functionality, including route search by locations (RSL query [1, 12, 13]), which retrieves routes that are similar in some specific sense to a set of user-specified places (e.g., sightseeing places).
The RSL query is useful in a broad range of applications, including route planning and recommendation, ridesharing, and location-based services in general [1, 12, 13]. For example, tourists can exploit the travel histories of other tourists to improve their own travel. Others with similar interests may have visited nearby landmarks that the tourist may not know, but may be interested in; or others may have avoided a specific road because it is unpleasant, although it may seem like a good choice in terms of distance. Such experiences are captured in the routes shared by previous tourists. In addition, tourists may post their routes to attract potential ridesharing partners. The RSL query can identify such tourists with similar interests (i.e., the user-specified places are similar to the posted route) and can recommend them as ridesharing partners.
In most existing studies (e.g., [1, 12, 13]), the RSL query is defined as a top-k query. However, sometimes the quality of query results cannot be guaranteed due to insufficient data (e.g., the top-1 route is relatively far away from the user-specified places). Consider the example in Figure 1, where o1, o2, o3, and o4 are query locations (user-specified places) and τ1, τ2, and τ3 are routes. Compared to τ1
Research Track Paper KDD ’19, August 4–8, 2019, Anchorage, AK, USA
and τ3, route τ2 is spatially close to the query locations, so it is returned as the top-1 result. However, this result is of low quality (i.e., relatively far away from the query locations), and it is not a useful result in real applications like travel planning and recommendation and ridesharing.
This motivates us to study a novel split-and-combine approach to solving the RSL problem. Given a set of routes, a set of user-specified places O, and a threshold θ, we retrieve the route τ that consists of several sub-routes and satisfies two conditions: (1) τ has similarity to O no less than θ, and (2) τ contains the minimum number of sub-routes (the minimum number of transfers in ridesharing). Consider the example in Figure 1, where routes τ1 and τ3 are split at o2 into sub-routes τ1-1 (from τ1.start to o2), τ1-2 (from o2 to τ1.end), τ3-1 (from τ3.start to o2), and τ3-2 (from o2 to τ3.end). Here, we combine sub-routes τ3-1 and τ1-2 to make up a new route τ = ⟨τ3-1, τ1-2⟩. Compared to the original routes τ1, τ2, and τ3, τ matches query locations o1, o2, o3, and o4 well while combining only two sub-routes.
The RSL-Psc problem is applied in spatial networks because in many practical scenarios, objects (e.g., commuters and vehicles) move in spatial networks [9, 11, 12] rather than in Euclidean space. In spatial networks, the most relevant distance notion when quantifying the distance between two objects is network distance; Euclidean distance may lead to errors. We adopt aggregate-distance matching (i.e., the sum of distances between query locations and routes) [1, 9, 10, 12] to match routes and query locations.
The RSL-Psc problem is challenging due to its high computation complexity. There exist $\sum_{k=0}^{M} S(|O|, k+1)$ possibilities when partitioning the set O of query locations, where M is the maximum number of combinations (e.g., the number of transfers a tourist will tolerate) and S(·) is the Stirling number of the second kind. The computations in different partitionings are independent of each other and can therefore be carried out in parallel. We propose two parallel solutions to the RSL-Psc problem.
Fully-Split Parallel Search: In Fully-Split Parallel Search (FSPS), we first use network expansion [3] to explore the spatial network from each query location o ∈ O and retrieve the route candidates that are spatially close to o. We define a distance lower bound and a similarity upper bound to prune the search space. Then we partition set O into (k + 1) subsets, where k ∈ [0, M], and we refine the value of k from the minimum to the maximum (since we retrieve the routes with the minimum number of combinations, once we find a qualified route, it is unnecessary to consider larger values of k). For each possible subset Oi ⊆ O, we select the intersection of the route candidate sets of the corresponding query locations and generate the route candidate set at the subset level (i.e., the route candidate set of Oi). Next, we combine and evaluate candidate sets associated with every query location subset in each partitioning to obtain the query result. The computations in each subset and in each partitioning occur in parallel.
The advantage of the FSPS algorithm is that it only needs to conduct the route search once, after which it can reuse the search results for route combination. Its limitation lies in the tightness of its upper and lower bounds (it uses a single distance to prune an aggregate distance). As a result, each query location must maintain a large candidate set C. When combining route candidates (two steps: locations to subsets, and subsets to partitionings), a larger |C| causes a very high (exponential) number of combination possibilities.
Group-Split Parallel Search: To further improve the efficiency of RSL-Psc processing, we propose the Group-Split Parallel Search (GSPS) algorithm that adopts a divide-and-conquer strategy. In the case of GSPS, we partition the set of query locations into (k + 1) subsets (O = O1 ∪ ... ∪ Ok+1), where k ∈ [0, M], and we refine the value of k from the minimum to the maximum. For each value of k, we have S(|O|, k + 1) possible partitionings. For each subset Oi (i ∈ [1, k + 1]), we search the route candidates that are spatially close to Oi. Upper and lower bounds on the aggregate distance are defined in order to prune the search space. The route search in each subset is performed in parallel. Then we combine and evaluate the route candidates of the location subsets of the same partitioning, again performing the computation for each partitioning in parallel. Compared to FSPS, GSPS achieves tighter candidate sets and avoids the combination from locations to subsets.
Our contributions can be summarized as follows. First, we propose a novel parallel split-and-combine approach to tackling the problem of route search by locations (RSL-Psc) efficiently and effectively, thus targeting applications such as route planning and recommendation, ridesharing, and location-based services in general. Second, we develop two efficient algorithms, Fully-Split Parallel Search (FSPS) (Section 3) and Group-Split Parallel Search (GSPS) (Section 4), to process the RSL-Psc query efficiently. Third, we conduct extensive experiments on large real route data sets to study the performance of the algorithms (Section 5). Our experimental results show that the RSL-Psc query is much more likely to return a valid result than the RSL query without route combination.

2 PRELIMINARIES
2.1 Spatial Networks and Routes
A spatial network is modeled as a connected, undirected graph G = (V, E, F, W), where V is a vertex set and E ⊆ {{vi, vj} | vi, vj ∈ V ∧ vi ≠ vj} is an edge set. A vertex vi ∈ V represents a road intersection or an end of a road, and an edge ek = {vi, vj} ∈ E represents a road segment that enables travel between vertices vi and vj. Function F : V ∪ E → Geometries maps a vertex to the point location of the corresponding road intersection and maps an edge to a polyline representing the corresponding road segment. Function W : E → R assigns a real-valued weight W(e) to an edge e that represents the corresponding road segment’s length.
The shortest path between two vertices vi and vj is a sequence of edges linking vi and vj such that the sum of their edge weights is minimal. Such a path is denoted by SP(vi, vj), and its length is denoted by sd(vi, vj). Euclidean-space based spatial indices (e.g., the R-tree [6]) and accompanying techniques are relatively ineffective in network environments due to loose bounds. For simplicity, we assume that the data points considered (e.g., route sample points and query locations) are located on vertices.
Definition 1: (Route) A route τ is a finite sequence ⟨p1, p2, ..., pn⟩ that consists of at least 2 vertices, where pi and pi+1 (i ∈ [1, n − 1]) are adjacent vertices in V.
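The model of Section 2.1 can be made concrete with a small sketch. The following Python fragment is only an illustration under stated assumptions: the adjacency-dictionary representation, the helper names `build_network`, `network_distance`, and `is_route`, and the four-vertex toy network are hypothetical and not taken from the paper. It stores an undirected weighted graph, computes the network distance sd(vi, vj) with Dijkstra's algorithm, and checks the two conditions of Definition 1.

```python
import heapq


def build_network(edges):
    """Build an undirected weighted graph as {vertex: {neighbor: weight}}."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def network_distance(adj, source, target):
    """Return sd(source, target) via Dijkstra; inf if target is unreachable."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if v == target:
            return d
        if d > dist.get(v, float("inf")):
            continue  # stale heap entry
        for u, w in adj.get(v, {}).items():
            nd = d + w
            if nd < dist.get(u, float("inf")):
                dist[u] = nd
                heapq.heappush(heap, (nd, u))
    return float("inf")


def is_route(adj, points):
    """Definition 1: at least 2 vertices, and consecutive vertices are adjacent."""
    if len(points) < 2:
        return False
    return all(b in adj.get(a, {}) for a, b in zip(points, points[1:]))


# Hypothetical toy network; the weights stand for road segment lengths.
ADJ = build_network([
    ("v1", "v2", 1.0),
    ("v2", "v3", 2.0),
    ("v1", "v3", 4.0),
    ("v3", "v4", 1.5),
])
```

In this toy network the direct edge from v1 to v3 has weight 4.0, but the path through v2 is shorter, which is why network distance rather than edge weight (or Euclidean distance) is the quantity used for matching.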
Figure 3: An example of network expansion

are scanned by the expansion from o2) in a candidate set C(oi) for expansion center oi, and we also maintain the network distance d(oi, τ) for each candidate (it is derived during network expansion, see Equation 3).
Now we are ready to present the FSPS algorithm. Specifically, FSPS consists of two steps: (1) Generating route candidate sets (Section 3.3); (2) Combining route candidates from different query location subsets (Section 3.4).

3.3 Generating Route Candidate Sets
Before presenting the algorithm, we first define the route tuple, which will be used to record the route candidates and corresponding labels for each query location. Route tuples will be retrieved and updated when combining route candidates from different query location subsets.
Definition 5: (Route Tuple) A route tuple of route τ associated with query location oi and point p is denoted by oi(τ, p) = ⟨e, p, d⟩, which consists of three elements: an entry e (identifier) of route τ, an expansion point p in τ scanned by network expansion, and the shortest network distance d between p and oi.
Algorithm 1 presents the pseudo code for generating a route candidate set. The inputs are a set of query locations O, the route dataset T, and the similarity threshold θ. The output is a route candidate tuple set for each query location (i.e., C = {C1, C2, ..., C|O|}). Note that each Ci is maintained as a priority queue and contains route tuples associated with query location oi. Specifically, the route tuples in Ci are sorted in ascending order of oi(τ).d.
We first initialize the route (candidate) tuple set associated with each query location (lines 1–2). Next, we find the route candidate tuple set of each query location oi ∈ O. Specifically, we perform a network expansion from query location oi. If an unvisited vertex exists (line 4), we retrieve the next unvisited vertex p (line 5). Then we update the upper bound of the route candidate tuple set associated with oi (i.e., ubi) to be $|O| - 1 + e^{-sd(o_i, p)}$ (Equation 4) (line 6). If ubi is no less than the similarity threshold θ, we regard all routes whose vertices contain p as candidates, so we add their route candidate tuples to Ci (lines 8–9). If the value of ubi is lower than θ, the expansion from oi terminates (lines 10–11). Having searched all query locations, we combine their results and get the result C of route candidates.

Algorithm 1: FSPSCandidates
  Data: Query location set O, trajectory set T, similarity threshold θ
  Result: C = {C1, C2, ..., C|O|}
   1  for each Ci in C do
   2      Ci ← ∅;
   3  for each query location oi in O do
   4      while DijkstraExpansion(oi).hasNext() do
   5          p ← DijkstraExpansion(oi).next();
   6          ubi ← |O| − 1 + e^(−sd(oi, p));
   7          if ubi ≥ θ then
   8              for each τ s.t. p ∈ τ do
   9                  Ci.add(oi(τ, p));
  10          else
  11              break;
  12  return C;

3.4 Combining Route Candidates
Now we have a route candidate tuple set for each query location. The next step is to combine route candidates and acquire the final route. In particular, we need to: (1) Generate partitionings for query locations; (2) For each partitioning, retrieve route candidates associated with each query location group; (3) Combine route candidates in each group and generate the final route. Before presenting the algorithm, we introduce the concepts of partitioning, query location group, and relevant data structures for maintaining partitionings and route candidates associated with each query location group.
3.4.1 Partitioning and query location group. We present the definition of a partitioning of query locations in Definition 6.
Definition 6: (Partitioning of Query Locations) A partitioning of query locations is denoted by P. It consists of a set of disjoint query location groups {G1, ..., Gn}. Each location group contains a subset of the query locations in O. We use P(k) to denote the set of query location partitionings that consist of k groups. In particular, we have:

$P(k) = \{P_i \mid |P_i| = k\}$    (5)

Recall that once we find a qualified final route, which is generated from a set of qualified route candidates from each group (i.e., group-wise route candidates) in a particular partitioning, the algorithm terminates immediately. To enhance the search efficiency, we need to find a qualified final route as early as possible. To achieve this, we use a priority queue to maintain group-wise route candidates in each query location group Gi ∈ P. In particular, it stores group-wise route tuples that are sorted in descending order of the similarity upper bound. Using this data structure, route candidates with high similarity scores, which are more likely to produce a qualified final route, are evaluated first. Group-wise route candidates are stored as group-wise route tuples (Definition 7).
Definition 7: (Group-wise route tuple) A group-wise route tuple of route τ associated with query location group G is denoted by G(τ, P) = ⟨e, P, ub⟩, which consists of three elements: an entry (identifier) e of route τ, a set of key-value pairs where the key is a query location (expansion center) in G and the value is an expansion
point of τ, and the similarity upper bound ub of τ. Specifically, each pair ⟨oi, pj⟩ ∈ P satisfies the following conditions: (1) oi ∈ G; (2) oi(τ, pj) ∈ Ci; (3) $\forall (\langle o, p \rangle, \langle o', p' \rangle \in P)\ (o \neq o')$. The value of ub is computed as follows:

$G(\tau, P).ub = |O| - |G| + \sum_{\langle o_i, p_j \rangle \in P} e^{-d(o_i, p_j)}$

We say that G(τ, P) is a valid group-wise route tuple if G(τ, P) satisfies Definition 7 and G(τ, P).ub ≥ θ, where θ is the similarity threshold.
3.4.2 Route combination. We proceed to present how to combine two route candidates from two different query location groups into a route candidate in the next level (i.e., a route candidate associated with the union of the two query location groups). Specifically, given group-wise route tuples G(τ, P) and G′(τ′, P′), we need to be able to determine whether we can generate a qualified group-wise route tuple of group G ∪ G′ based on G(τ, P) and G′(τ′, P′). We use τ−(p) to denote the sub-route of τ starting from the beginning of τ and terminating at point p (p is a point in τ). Likewise, τ+(p) denotes the sub-route of τ starting from point p and terminating at the end of τ.
Intuitively, two route candidates τ and τ′ can be combined into a route candidate in the next level if they have an intersection point (i.e., τ ∩ τ′ ≠ ∅) and the intersection is a “transfer point”, which lies between the expansion points of query locations in the one group and the expansion points of query locations in the other group. A next-level route candidate tuple is defined as follows.
Definition 8: (Next-level route tuple) Let G(τ, P) and G′(τ′, P′) be two group-wise route tuples of G and G′, respectively. Routes τ and τ′ intersect at point p_in, and O denotes a set of query locations (G ⊆ O, G′ ⊆ O). Route τs = τ−(p_in) + τ′+(p_in) is a next-level route candidate for S = G ∪ G′ and S(τs, P ∪ P′) is a next-level route tuple for G and G′ if:

$\forall (\langle o_i, p_j \rangle \in P,\ \langle o'_i, p'_j \rangle \in P')\ \left(p_j \in \tau^{-}(p_{in}) \wedge p'_j \in \tau'^{+}(p_{in})\right)$

Having derived a next-level route tuple S(τs, P ∪ P′), we regard its corresponding route candidate τs as a new group-wise route candidate associated with G ∪ G′. The route combination processing is performed iteratively in a bottom-up fashion until we find a qualified final route associated with group O.
Detailed algorithms and complexity analyses for generating group-wise route tuples and for deriving a qualified final route by combining route candidates are presented in Appendix Section A.

4 GROUP-SPLIT PARALLEL SEARCH
4.1 Basic Idea
The FSPS algorithm maintains a set of route candidates for each query location. Because the similarity upper bound of each route candidate in FSPS only takes one query location into consideration (cf. Equation 4), which has low pruning power, the number of route candidates associated with each query location can be large. When combining route candidates in Algorithm 2, a large |Ci| results in a very large (exponential) number of combination possibilities, which makes the combination process computationally expensive.
To improve the efficiency of RSL-Psc search, we need to decrease the number of route candidates and route candidate tuples. Hence, a more effective pruning strategy is required for the filtering of unqualified candidates. To achieve this, we propose the Group-Split Parallel Search (GSPS) algorithm. Unlike the FSPS algorithm, the GSPS algorithm does not need to maintain route candidates and tuples for each query location. Instead, we generate route candidates associated with each group directly. Thus, a similarity upper bound between a route and a query location set can be derived by computing the aggregate distances to query locations in a group rather than to a single query location. Consequently, the pruning power provided by GSPS is much larger than that provided by FSPS.
The high-level idea of the GSPS algorithm is as follows. First, we partition the set of query locations into k + 1 groups, where k ∈ [0, M], and we refine the value of k from the minimum to the maximum. For each group Gi (i ∈ [1, k + 1]), we directly generate the route candidates and route tuples that are spatially close to Gi. This involves two steps: (1) group-based network expansion (cf. Section 4.2) and (2) route candidate filtering (cf. Section B.2). The route candidate search in each group is performed in parallel. After that, we combine and evaluate the route candidates associated with location groups of the same partitioning. Note that the computation for each partitioning is also performed in parallel. Compared to FSPS, GSPS produces far fewer candidate sets, and it avoids the route candidate combination from query-location level to group level.

4.2 Group-based Network Expansion
Recall that in FSPS, network expansion is performed individually for each query location. When we parallelize the network expansion, we may only consider the minimum distance between a query location o and its nearest vertex p (i.e., sd(o, p) in Equation 3) when calculating the similarity upper bound, which is a static value and has limited pruning effect. In GSPS, we introduce group-based network expansion that performs expansion for all query locations in a group simultaneously. In addition, given a group G, instead of storing a comparatively loose and static similarity upper bound for each query location, we maintain a dynamic upper bound for each query location o ∈ G that takes an aggregated distance between all query locations in G and their corresponding nearest vertices into consideration. With the group-based dynamic similarity upper bounds, we generate route candidates for G directly.
Next, we explain how to compute the group-based similarity upper bounds and how to perform route candidate pruning based on the upper bounds. Consider the example in Figure 4, where G = {o1, o2, o3} is a query location group in a partitioning P, and τ1, τ2, τ3, and τ4 are routes. In each network expansion iteration, we select one of the query locations in G as an expansion center. For each query location o, we maintain its network distance to its current expansion point, which is denoted by o.sd. The location that has the minimum o.sd value is selected as the expansion center in the current iteration.
Table 1 presents the values of o.sd in each iteration. At the beginning (Iter 0), the values of o.sd for all o ∈ G are 0. Assume that we select o1 as the expansion center in the first iteration and p1 is the first vertex scanned by the expansion from o1. We update o1.sd
Figure 4: An example of group-based network expansion (group G = {o1, o2, o3}, routes τ1 to τ4, expansion points p1 to p7). Network distances shown in the figure, in expansion order: sd(o1, p1) = 1.0, sd(o2, p3) = 1.2, sd(o3, p4) = 1.3, sd(o2, p2) = 1.8, sd(o1, p5) = 2.2, sd(o3, p6) = 2.7, sd(o2, p7) = 3.5. Route distances shown: d(o1, τ2) = sd(o1, p1), d(o1, τ3) = sd(o1, p5), d(o2, τ1) = sd(o2, p3), d(o2, τ4) = sd(o2, p7), d(o3, τ1) = sd(o3, p4).

Table 1: Update of o.sd

         Iter 1  Iter 2  Iter 3  Iter 4  Iter 5  Iter 6  Iter 7
  o1.sd  1.0     1.0     1.0     2.2     2.2     2.2     2.2
  o2.sd  0       1.2     1.2     1.2     1.8     1.8     3.5
  o3.sd  0       0       1.3     1.3     1.3     2.7     2.7

Table 2: Route Label Hash Map

  Key  Value (route label set; each label is ⟨o, p, o.sd⟩)
  τ1   ⟨o2, p3, 1.2⟩, ⟨o3, p4, 1.3⟩
  τ2   ⟨o1, p1, 1.0⟩, ⟨o2, p2, 1.8⟩, ⟨o3, p6, 2.7⟩
  τ3   ⟨o1, p5, 2.2⟩

to be 1.0. Because p1 ∈ τ2, we generate the label ⟨o1, p1, 1.0⟩ for τ2 and add it to the route label hash map maintained by group G (cf. Table 2). The route label hash map is used during route candidate filtering (cf. Section B.2). In the second iteration, we select o2 as the expansion center (after the 1st iteration, o2.sd = o3.sd, so either o2 or o3 can be selected as the expansion center), and p3 is the first vertex scanned by the expansion from o2. Thus, we set o2.sd = 1.2. Likewise, since p3 ∈ τ1, we generate a label ⟨o2, p3, 1.2⟩ for τ1 and add it to the route label hash map. We continue the iterative process until the similarity upper bound of G falls below the threshold θ. Theorem 1 explains that the resulting pruning is safe.
Theorem 1: Given a query location set O, a similarity threshold θ, and a group of query locations G where G ⊂ O, group-based network expansion can be stopped and all unexplored routes can be safely pruned when:

$|O| - |G| + \sum_{o \in G} e^{-o.sd} < \theta$    (6)

Proof. Let τu be a route that is unexplored during group-based network expansion. The Dijkstra expansion has the property that $\forall (o_i \in G)\ (d(o_i, \tau_u) \geq o_i.sd)$. As a result, $\forall (o_i \in G)\ (e^{-d(o_i, \tau_u)} \leq e^{-o_i.sd})$. Because d(o, τu) is non-negative, $e^{-d(o, \tau_u)}$ cannot exceed 1. Then we have: $\forall (o_j \in O \setminus G)\ (e^{-d(o_j, \tau_u)} \leq 1)$. Consequently, if $|O| - |G| + \sum_{o \in G} e^{-o.sd}$ is smaller than θ, $\sum_{o \in O} e^{-d(o, \tau_u)}$ must be smaller than θ. This completes the proof.
Detailed algorithms for group-based network expansion and route candidate filtering are presented in Sections B.1 and B.2, respectively.

5 EXPERIMENTAL STUDY
We report on experiments with real road network, route, and Points-of-Interest (POI) data sets that offer insight into the efficiency and scalability of the proposed algorithms.

5.1 Experiment Settings
5.1.1 Data sets. We use two road networks, namely the Beijing Road Network (BRN) and the New York Road Network (NRN)¹, which contain 28,342 vertices and 27,690 edges, and 95,581 vertices and 260,855 edges, respectively. The corresponding network graphs are stored and indexed by adjacency lists. In BRN, we use a real taxi trajectory data set collected by the T-drive project [19], while in NRN, we use a real taxi data set from New York 10. Each item in the data set contains pick-up and drop-off locations of a taxi. We derive the shortest path from the pick-up location to the drop-off location and regard it as a route. The T-drive taxi trajectory data set contains 800K trajectories and 300K POIs (each POI has a spatial coordinate with latitude and longitude), while the New York taxi data set contains 700M routes. In NRN, we use a real POI data set that contains 19,918 POIs in New York City². For NRN, the POIs may not match the trajectory points, so we map each POI in NRN to its nearest road network vertex.
5.1.2 Query location sets. A query location set O is generated as follows: First, we plot n circular query selector regions with radius r and place each selector region at a random position in the underlying space. Next, we randomly select |O|/n POIs from each selector region. The selected POIs constitute the query location set. In the experiments, we evaluate the effect of the parameters n and r.
5.1.3 Implementations. In the experiments, the road network graphs, routes, and POIs are memory resident. All algorithms are implemented in Java and run on a cluster with 10 data nodes. Each node is equipped with two Intel Xeon Processors E5-2620 v3 (2.4GHz) and 128GB RAM. Unless stated otherwise, experimental results are averaged over 200 and 50 independent trials using different query location sets for effectiveness (Section 5.2) and efficiency evaluations, respectively. The performance metrics are runtime and the number of route visits. The number of route visits is used as a metric because it reflects the number of data accesses. In multi-threaded executions, the total runtime is the maximum runtime among all individual threads.
Trajectories in T are selected randomly from the real data sets. We evaluate the following three methods:
• FSPS: Fully-Split Parallel Search (Section 3);
• GSE+CTF: Group-Split Parallel Search (GSPSExpansion + CTFilter) (Section 4);
• GSE: Group-Split Parallel Search without CTFilter (GSPSExpansion only) (Section 4.2).
When evaluating the number of route visits, we do not report the performance of GSE+CTF because GSE and GSE+CTF incur the same numbers of route visits. The parameter settings are listed in Table 3.

¹ https://publish.illinois.edu/dbwork/open-data/
² https://data.cityofnewyork.us/City-Government/Points-Of-Interest/rxuy-2muj
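As a worked illustration of the stopping rule in Theorem 1 (Equation 6), the following Python sketch evaluates the group-based upper bound on the o.sd values of Table 1. It is a sketch under stated assumptions rather than the paper's implementation: the function names are hypothetical, and the values |O| = 4 and θ = 1.5 are assumed purely for illustration, since the example in Section 4.2 does not state them. Only the o.sd values come from Table 1, with the all-zero Iter 0 row prepended as described in the text.

```python
import math


def group_upper_bound(num_query_locations, sd_values):
    """Left-hand side of Equation 6: |O| - |G| + sum over o in G of exp(-o.sd)."""
    group_size = len(sd_values)
    return num_query_locations - group_size + sum(math.exp(-sd) for sd in sd_values)


def first_stop_iteration(num_query_locations, sd_per_iteration, theta):
    """Return the first iteration whose bound drops below theta, or None."""
    for iteration, sd_values in enumerate(sd_per_iteration):
        if group_upper_bound(num_query_locations, sd_values) < theta:
            return iteration
    return None


# (o1.sd, o2.sd, o3.sd) per iteration, from Table 1; index 0 is Iter 0.
TABLE_1 = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (1.0, 1.2, 0.0),
    (1.0, 1.2, 1.3),
    (2.2, 1.2, 1.3),
    (2.2, 1.8, 1.3),
    (2.2, 1.8, 2.7),
    (2.2, 3.5, 2.7),
]

NUM_QUERY_LOCATIONS = 4  # assumed |O|; not given in the example
THETA = 1.5              # assumed similarity threshold; not given in the example

BOUNDS = [group_upper_bound(NUM_QUERY_LOCATIONS, sds) for sds in TABLE_1]
STOP_AT = first_stop_iteration(NUM_QUERY_LOCATIONS, TABLE_1, THETA)
```

Because every o.sd value only grows during expansion, the bound never increases from one iteration to the next, so the first iteration at which it falls below θ is a safe point to stop. Under the assumed values, the bound starts at 4.0 in Iter 0 and first drops below 1.5 in Iter 6.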
[Figure residue: two plot panels with x-axis "Similarity Threshold, θ (× |O|)"]

candidate filtering algorithm (CTFilter) is capable of improving the pruning effectiveness by a factor of 1.1.
Figure 6: Effect of the number of routes. Panels: (a) BRN-time, (b) NRN-time, (c) NRN-count. Time panels plot Runtime (ms).
Figure 7: Effect of |O|. Panels: (a) BRN-time, (b) NRN-time, (c) NRN-count. Time panels plot Runtime (ms).

from 8 to 12, the value of $\sum_{k=0}^{M} S(|O|, k+1)$ (M = 3) is increased by more than 300 times. However, according to Figures 7(a) and 7(b), the CPU time only increases by 30–100 times for all algorithms. This is because some qualified routes require only 0, 1, or 2 transfers, and once we find a qualified route, the search process terminates.
Effect of similarity threshold θ: This set of experiments investigates the effect of similarity threshold θ. Figure 8 shows the results when we vary the similarity threshold θ. Increasing the value of θ has the following two effects on the performance: (1) Based on Equation 4 and Theorem 1, a larger value of θ leads to higher pruning effectiveness, which may improve the efficiency. (2) A larger value of θ may postpone the termination of the algorithms. Recall that while combining routes, we first generate the routes with the minimum number of combinations (transfers), and that once we find a qualified route, it is unnecessary to consider larger numbers of combinations. Hence, when we increase θ, it is less likely that we will be able to generate a qualified route with few combinations. This effect may degrade the efficiency. Compared to Effect (2), Effect (1) is negligible. As a result, all algorithms exhibit increasing CPU time and route visits as we increase the value of θ.
Effect of maximum number of transfers M: We proceed to evaluate the effect of varying the maximum number of route transfers (M). From Figures 9(a) and 9(b), we find that when we increase M, the CPU time increases for all algorithms. Specifically, when the value of M reaches 4 in NRN, the subsequent increase in CPU time is modest. The reason is that when M is set to 5 in NRN, most of the trials return qualified final routes with no more than 4 transfers. Additionally, Figure 9(c) suggests that the number of route visits is relatively consistent as we increase M.
Effect of the radius of query location selector region: Figure 10 shows the effect of varying the radius of the query location selector region. We find that both CPU time and route visits exhibit slight or moderate increasing trends for all algorithms when we

500K for BRN and |T| = 10M for NRN). The results in Figure 12 show that GSE+CTF outperforms FSPS by a factor of 6–8 in terms of runtime and outperforms GSE by 30%–60% in terms of runtime. In BRN, when we set the thread count to 120, GSE+CTF is able to solve the RSL-Psc problem over a collection of 500K routes in 15 milliseconds, while in NRN, GSE+CTF is able to solve the RSL-Psc problem with 10M routes in 100 milliseconds. When we increase the thread count from 24 to 120 (5 times), the runtimes of GSE+CTF and GSE improve by a factor of around 3, while the runtime of FSPS improves by factors of 2.8 and 2.1 in BRN and NRN, respectively.

6 RELATED WORK
Existing studies related to the RSL-Psc problem can be classified into two categories: Location-to-trajectory search and location-based route recommendations.
Location-to-route search: Location-to-route search aims at retrieving the trajectories that have the highest relevance to the query arguments [4, 12, 15, 21]. In particular, the relevance functions may contain spatial [1], temporal [8], textual [12][21], and density elements. The resulting queries are useful in many popular applications including travel planning, carpooling, friend recommendation in social networks, and location-based services in general.
According to the types of trajectory query arguments, we further classify existing studies regarding location-to-trajectory search into two sub-categories: (1) Trajectory search based on a single location; (2) trajectory search based on multiple locations. Zheng et al. [22] extend the single-point trajectory query to cover spatial and textual domains and propose the TkSK query, which retrieves the trajectories that are spatially close to the query point and also meet semantic requirements defined by the query. For trajectory search based on multiple locations, the query takes a set of locations as argument and returns a trajectory that connects or is close to
increase the radius of the selector region from 1 km to 9 km. The the query locations according to specific metrics. The concept of
reason is that when we apply a large query location selector region, trajectory search by locations (TSL) was first proposed by Chen et
the query locations in O are distributed increasingly widely in the al. [1]. The main difference between RSL-Psc and the problem of
underlying space, which increases the number of route visits during location-based trajectory search studied by existing work is that
network expansion. RSL-Psc returns a route by combining a set of connected trajectories.
Effect of the number of query location selector regions: Fig- In contrast, existing location-based trajectory queries return a single
ure 11 covers the effect of varying the number of query location pre-existing trajectory or a list of pre-existing trajectories.
selector regions. More query regions implies that more expansion Location-based route recommendations: Given a set of loca-
centers must be processed, which increases the search space and tions (e.g., POIs, taxi locations), the location-based route recom-
the number of route visits. Thus, the CPU time and the count of mendation problem aims to derive a new route based on the loca-
visited routes for all three algorithms increase with the number of tions and user preferences. Ge et al. [5] and Ye et al. [16, 17] study
query location selector regions. the mobile sequential recommendation problem that outputs an
Effect of thread counts: We study the effect of thread count on optimal routes with minimum potential travel distance to a taxi
the efficiency of the algorithms using large route data sets (|T | = driver’s next potential passenger. In particular, Ye et al. [18] first
Research Track Paper KDD ’19, August 4–8, 2019, Anchorage, AK, USA
[Figure 8: Effect of θ. Panels: (a) BRN-time, (b) NRN-time, (c) NRN-count. x-axis: similarity threshold θ (× |O|); y-axis of the time panels: runtime (ms).]
[Figure 9: Effect of M. Panels: (a) BRN-time, (b) NRN-time, (c) NRN-count. x-axis: maximum number of transfers M; y-axis of the time panels: runtime (ms).]
[Figure 10: Effect of r. Panels: (a) BRN-time, (b) NRN-time. x-axis: r (km); y-axis: runtime (ms).]
[Figure 11: Effect of n. Panels: (a) BRN-time, (b) NRN-time. x-axis: number of selector regions n; y-axis: runtime (ms).]
[Figure 12: Effect of thread counts. Panels: (a) BRN-time, (b) NRN-time. x-axis: thread counts; y-axis: runtime (ms).]
study the multiple mobile sequential recommendation problem that generates optimal routes for a group of taxis with different locations. Another area of related studies is travel itinerary recommendation (e.g., [7, 14]). Specifically, it takes a set of user-specified POIs and constraints as input to generate an itinerary through a subset of POIs with a specific starting and ending POI that can be completed within a certain time. Additionally, Dai et al. [2] recommend the shortest route to users based on existing trajectories by considering multiple costs. However, the results generated by these proposals are new individual routes, while the results of RSL-Psc are combinations of existing trajectories.

7 CONCLUSIONS
We propose and study the RSL-Psc problem, namely a parallel split-and-combine approach to enable route search by locations. To answer the RSL-Psc query, we develop two parallel search algorithms: Fully-Split Parallel Search (FSPS) and Group-Split Parallel Search (GSPS). Specifically, we divide the route split-and-combine task into $\sum_{k=0}^{M} S(|O|, k+1)$ sub-tasks, where M is the maximum number of combinations and S(·) is the Stirling number of the second kind. In each sub-task, we use network expansion to explore the spatial network and exploit spatial similarity bounds for pruning. The algorithms split candidate routes into sub-routes and combine them to construct new routes. The sub-tasks are independent of each other and are performed in parallel. Extensive experiments with real data demonstrate that our proposed RSL-Psc query is much more likely to return a valid result compared with the PSL query without route combination. In addition, the FSPS and GSPS algorithms are capable of achieving high efficiency and scalability on massive route data.

Acknowledgements: This work (Bin Yao) was supported by the NSFC (61872235, 61729202, 61832017, U1636210) and the National Key Research and Development Program of China (2018YFC1504504). Additionally, Zhiwei Zhang is supported by GRF (12201518, 12232716, 12258116) and NSFC (61602395).

REFERENCES
[1] Zaiben Chen, Heng Tao Shen, Xiaofang Zhou, Yu Zheng, and Xing Xie. 2010. Searching trajectories by locations: an efficiency study. In SIGMOD. 255–266.
[2] Jian Dai, Bin Yang, Chenjuan Guo, and Zhiming Ding. 2015. Personalized route recommendation using big trajectory data. In ICDE. 543–554.
[3] E. W. Dijkstra. 1959. A note on two problems in connection with graphs. Numerische Math 1 (1959), 269–271.
[4] E. Frentzos, K. Gratsias, and Y. Theodoridis. 2007. Index-based most similar trajectory search. In ICDE. 816–825.
[5] Yong Ge, Hui Xiong, Alexander Tuzhilin, Keli Xiao, Marco Gruteser, and Michael J. Pazzani. 2010. An energy-efficient mobile recommender system. In KDD. 899–908.
[6] Antonin Guttman. 1984. R-Trees: A Dynamic Index Structure for Spatial Searching. In SIGMOD. 47–57.
[7] Kwan Hui Lim, Jeffrey Chan, Shanika Karunasekera, and Christopher Leckie. 2017. Personalized Itinerary Recommendation with Queuing Time Awareness. In SIGIR. 325–334.
[8] Shuo Shang, Lisi Chen, Zhewei Wei, Christian S. Jensen, Ji-Rong Wen, and Panos Kalnis. 2016. Collective Travel Planning in Spatial Networks. IEEE Trans. Knowl. Data Eng. 28, 5 (2016), 1132–1146.
[9] Shuo Shang, Lisi Chen, Zhewei Wei, Christian S. Jensen, Kai Zheng, and Panos Kalnis. 2017. Trajectory Similarity Join in Spatial Networks. PVLDB 10, 11 (2017), 1178–1189.
[10] Shuo Shang, Lisi Chen, Zhewei Wei, Christian S. Jensen, Kai Zheng, and Panos Kalnis. 2018. Parallel trajectory similarity joins in spatial networks. VLDB J. 27, 3 (2018), 395–420.
[11] Shuo Shang, Lisi Chen, Kai Zheng, Christian S. Jensen, Zhewei Wei, and Panos Kalnis. 2019. Parallel Trajectory-to-Location Join. IEEE Trans. Knowl. Data Eng. 31, 6 (2019), 1194–1207.
[12] Shuo Shang, Ruogu Ding, Bo Yuan, Kexin Xie, Kai Zheng, and Panos Kalnis. 2012. User oriented trajectory search for trip recommendation. In EDBT. 156–167.
[13] Shuo Shang, Ruogu Ding, Kai Zheng, Christian S. Jensen, Panos Kalnis, and Xiaofang Zhou. 2014. Personalized trajectory matching in spatial networks. VLDB J. 23, 3 (2014), 449–468.
[14] Kendall Taylor, Kwan Hui Lim, and Jeffrey Chan. 2018. Travel Itinerary Recommendations with Must-see Points-of-Interest. In WWW. 1198–1205.
[15] Kexin Xie, Ke Deng, and Xiaofang Zhou. 2009. From trajectories to activities: a spatio-temporal join approach. In LBSN. 25–32.
[16] Zeyang Ye, Keli Xiao, and Yuefan Deng. 2018. A Unified Theory of the Mobile Sequential Recommendation Problem. In ICDM. 1380–1385.
[17] Zeyang Ye, Keli Xiao, Yong Ge, and Yuefan Deng. 2019. Applying Simulated Annealing and Parallel Computing to the Mobile Sequential Recommendation. IEEE Trans. Knowl. Data Eng. 31, 2 (2019), 243–256.
[18] Zeyang Ye, Lihao Zhang, Keli Xiao, Wenjun Zhou, Yong Ge, and Yuefan Deng. 2018. Multi-User Mobile Sequential Recommendation: An Efficient Parallel Computing Paradigm. In KDD. 2624–2633.
[19] Jing Yuan, Yu Zheng, Xing Xie, and Guangzhong Sun. 2013. T-Drive: Enhancing Driving Directions with Taxi Drivers' Intelligence. IEEE Trans. Knowl. Data Eng. 25, 1 (2013), 220–232.
[20] Jing Yuan, Yu Zheng, Chengyang Zhang, Wenlei Xie, Xing Xie, Guangzhong Sun, and Yan Huang. 2010. T-drive: driving directions based on taxi trajectories. In ACM SIGSPATIAL. 99–108.
[21] Kai Zheng, Shuo Shang, Nicholas Jing Yuan, and Yi Yang. 2013. Towards efficient search for activity trajectories. In ICDE. 230–241.
[22] Kai Zheng, Bolong Zheng, Jiajie Xu, Guanfeng Liu, An Liu, and Zhixu Li. 2016. Popularity-aware spatial keyword search on activity trajectories. World Wide Web 19, 6 (2016), 1–25, online first.
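As a small illustration of the sub-task count $\sum_{k=0}^{M} S(|O|, k+1)$ stated in the conclusions, the following Python sketch computes Stirling numbers of the second kind with the standard recurrence S(n, k) = k·S(n−1, k) + S(n−1, k−1). The function names `stirling2` and `num_subtasks` are illustrative and do not come from the paper; the sketch only counts partitionings and does not model any further enumeration the algorithms may perform within a sub-task.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n: int, k: int) -> int:
    """Number of ways to partition n labeled items into k non-empty groups."""
    if n == k:
        return 1  # includes the empty case S(0, 0) = 1
    if k == 0 or k > n:
        return 0
    # Either the n-th item joins one of the k existing groups,
    # or it forms a new group on its own.
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


def num_subtasks(num_locations: int, max_transfers: int) -> int:
    """Sum of S(|O|, k + 1) for k = 0..M (hypothetical helper name)."""
    return sum(stirling2(num_locations, k + 1) for k in range(max_transfers + 1))


if __name__ == "__main__":
    # |O| = 4 query locations, at most M = 2 transfers.
    print(num_subtasks(4, 2))
```

For example, with |O| = 4 and M = 2 the sum is S(4, 1) + S(4, 2) + S(4, 3) = 1 + 7 + 6 = 14.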
A ALGORITHM FOR COMBINING ROUTE CANDIDATES IN FSPS
Algorithm 2 presents the pseudo code for generating group-wise route tuples and deriving a qualified final route by combining route candidates. We partition the query location set O into k + 1 subsets, where k ∈ [0, M], and we evaluate each possible partitioning, starting from the partitioning with one group and proceeding to partitionings with k + 1 groups. For each possible partitioning P, we generate the group-wise route tuples associated with each group G (lines 4–8). First, we initialize G.candidates, which stores all group-wise route tuples of G (line 5). Next, for each τ that was scanned by every query location in G during Dijkstra expansion, we generate its group-wise route tuples based on Definition 7 (lines 6–8). After obtaining the group-wise route tuples of all groups in partitioning P, we evaluate each possible sequence of groups X, and for each sequence we combine route candidates in each group in a bottom-up fashion until all groups are combined (lines 9–26). Here, i in $G_i^{(j)}$ is the group index and j in $G_i^{(j)}$ denotes the level of the group. In particular, for each sequence of groups we initialize the group-wise route tuples of each group $G_i^{(0)}$ from level 0 (lines 10–11). If |X| > 1, which means that the groups in sequence X can be combined, we proceed to combine route candidates in groups from the lowest level. Starting from level j = 0, we combine route candidates associated with two adjacent groups (i.e., $G_l^{(j)}$ and $G_{l+1}^{(j)}$) and generate the corresponding next-level route candidate tuples (lines 19–20). Specifically, we derive the union of groups $G_l^{(j)}$ and $G_{l+1}^{(j)}$ (i.e., $G_s^{(j+1)}$) (line 19) and generate route candidates associated with $G_s^{(j+1)}$ by calling the NextLevelRoute function (Algorithm 3) (line 20). If the cardinality of $G_s^{(j+1)}$ equals the cardinality of the query location set O, this means that we have combined all groups and the route candidates are associated with all locations in the query location set. Thus, we can return any route candidate as a result (lines 21–22). Otherwise, we need to update X by merging $G_l^{(j)}$ and $G_{l+1}^{(j)}$ (line 23).

Algorithm 2: FSPSCombination
Data: Route candidate tuple set C, similarity threshold θ, combination threshold M
Result: Final result route
 1 k ← 0;
 2 while k ≤ M do
 3   for each P in P(k+1) do
 4     for each G in P do
 5       G.candidates ← ∅;
 6       for each τ s.t. $\forall (o_i \in G)\ (o_i(\tau, \cdot) \in C_i)$ do
 7         for each valid G(τ, P) do
 8           G.candidates.add(G(τ, P)); // Definition 7
 9     for each group sequence X of P do
10       for each $G_i^{(0)}$ in X do
11         Initialize $G_i^{(0)}$;
12       j ← 0;
13       while |X| > 1 do
14         l, s ← 1;
15         while l ≤ k do
16           if l = k then
17             $G_s^{(j+1)} \leftarrow G_l^{(j)}$;
18           else
19             $G_s^{(j+1)} \leftarrow G_l^{(j)} \cup G_{l+1}^{(j)}$;
20             $G_s^{(j+1)}$.candidates ← NextLevelRoute($G_l^{(j)}$, $G_{l+1}^{(j)}$);
21             if $|G_s^{(j+1)}| = |O|$ and $G_s^{(j+1)}$.candidates ≠ ∅ then
22               return $G_s^{(j+1)}$.candidates.top();
23             Replace $G_l^{(j)}$ and $G_{l+1}^{(j)}$ by $G_s^{(j+1)}$ in X;
24           l ← l + 2;
25           s ← s + 1;
26         j ← j + 1;
27   k ← k + 1;

Algorithm 3 presents the pseudo code for combining the route candidate tuples associated with two adjacent groups and generating the corresponding next-level route candidate tuples. After initialization (lines 1–2), we evaluate each pair of routes from G and G′, respectively, and check whether we can generate a next-level route candidate from the route pair τ and τ′. Based on Definition 8, if τ and τ′ can generate a next-level route candidate, they must have at least one intersection point (line 4). For each intersection point $p_i$, we consider two potential combinations of sub-routes, $\tau^{-}(p_i) + \tau'^{+}(p_i)$ and $\tau'^{-}(p_i) + \tau^{+}(p_i)$, and we check whether they are qualified for combination based on Definition 8 (lines 8–17). Qualified next-level route candidate tuples are added to the route candidate tuple set associated with the union of G and G′. In particular, if the union of G and G′ equals the query location set O, each route candidate associated with S = G ∪ G′ is considered to be a final route candidate. Here, we check whether the similarity between the route and the query location set satisfies the similarity threshold θ. If so, we regard the route as a qualified final route and return S.candidates.

Algorithm 3: NextLevelRoute(G, G′)
Data: Groups G and G′, similarity threshold θ
Result: Next-level route tuples of G ∪ G′
 1 S ← G ∪ G′;
 2 S.candidates ← ∅;
 3 for each G(τ, P) ∈ G.candidates, G′(τ′, P′) ∈ G′.candidates do
 4   if τ ∩ τ′ ≠ ∅ then
 5     for each $p_i$ ∈ τ ∩ τ′ do
 6       $\tau_l \leftarrow \tau^{-}(p_i) + \tau'^{+}(p_i)$;
 7       $\tau_r \leftarrow \tau'^{-}(p_i) + \tau^{+}(p_i)$;
 8       if $\tau_l$ is a next-level route and |G ∪ G′| < |O| then
 9         S.candidates.add(S($\tau_l$, P ∪ P′));
10       else if $\tau_l$ is a next-level route and Sim(O, $\tau_l$) ≥ θ then
11         S.candidates.add(S($\tau_l$, P ∪ P′));
12         return S.candidates;
13       if $\tau_r$ is a next-level route and |G ∪ G′| < |O| then
14         S.candidates.add(S($\tau_r$, P ∪ P′));
15       else if $\tau_r$ is a next-level route and Sim(O, $\tau_r$) ≥ θ then
16         S.candidates.add(S($\tau_r$, P ∪ P′));
17         return S.candidates;
18 return S.candidates;

Time Complexity: The time complexity of generating route tuples of each query location in O (i.e., FSPSCandidates) is $O((|V| \log |V| + |E|) \cdot |O| \cdot |T_v|)$. Specifically, $(|V| \log |V| + |E|)$ is the complexity of the Dijkstra expansion for each query location, |V| is the number of vertices, and |E| is the number of edges. Further, $|T_v|$ denotes the average number of routes passing each vertex. The time complexity
of running FSPSCombination on each partitioning can be approximated as $O(k \cdot |\tau_o|^{|O|/k} + |T_G|^2 \cdot k!)$, where k denotes the number of groups in the partitioning, $|\tau_o|$ is the average number of route tuples of each route associated with each query location, and $|T_G|$ is the average number of group-wise route tuples of each group.³ Here, k and |O| are much smaller than $|T_v|$ and $|T_G|$.

³ We omit the detailed reduction and approximation of the time complexity due to space limitations.

B ALGORITHM OF GSPS

B.1 Algorithm of Network Expansion
Algorithm 4 presents the pseudo code of group-based network expansion. First, we initialize the route label hash map H and the value of $o_i.sd$ for each $o_i \in G$ (lines 1–3). Next, we check whether the expansion can be terminated based on Theorem 1 (line 4). If not, we start the current iteration of network expansion (lines 5–13). Specifically, we select the query location that has the minimum value of o.sd in G for performing expansion (i.e., $o_{\min}$) (line 5). Here, p is the next vertex scanned by the expansion from $o_{\min}$ (line 6). Then we update $o_{\min}.sd$ to be $sd(o_{\min}, p)$, which is the shortest network distance between $o_{\min}$ and p (line 7). After that, we scan and evaluate the routes that pass through p (lines 8–13). In particular, for each route τ passing through p, we retrieve its route label set (i.e., L) in H (line 9). If L is null, τ has never been scanned, so we add a new key-value pair for τ into H. The key is the entry of τ and the value is a new route label set containing label {$o_{\min}$, p, $o_{\min}.sd$} (lines 10–11). If L is not null, τ was scanned before, and we just insert a new label ({$o_{\min}$, p, $o_{\min}.sd$}) into L (lines 12–13). When the expansion terminates, we return the hash map H as the result (line 14). Elements in H are considered to be route candidates associated with G.

Algorithm 4: GSPSExpansion
Data: Query location group G, cardinality of query location set |O|, route set T, similarity threshold θ
Result: Route label hash map H
 1 H ← ∅;
 2 for each $o_i$ in G do
 3   $o_i.sd$ ← 0;
 4 while $|O| - |G| + \sum_{o_i \in G} e^{-o_i.sd} \ge \theta$ do
 5   $o_{\min}$ ← the o with the minimum o.sd in G;
 6   p ← DijkstraExpansion($o_{\min}$).next();
 7   $o_{\min}.sd$ ← $sd(o_{\min}, p)$;
 8   for each τ s.t. p ∈ τ do
 9     L ← H.get(τ);
10     if L is null then
11       H.put(τ, {$o_{\min}$, p, $o_{\min}.sd$});
12     else
13       L.add({$o_{\min}$, p, $o_{\min}.sd$});
14 return H;

B.2 Route Candidate Filtering
Recall that the GSPSExpansion algorithm generates route candidates (H) for group G. However, some routes in H can be eliminated based on the route label set associated with each route in H. In particular, while evaluating whether a new route τ is a qualified route candidate for G during iteration i, the corresponding similarity upper bound (i.e., $|O| - |G| + \sum_{o_i \in G} e^{-o_i.sd}$) is computed based on the value of $o_i.sd$ in iteration i. As we continue the iterative process, the value of $o_i.sd$ increases, which makes the value of $|O| - |G| + \sum_{o_i \in G} e^{-o_i.sd}$ decrease. Therefore, it is possible that a route τ added to H during iteration i can be pruned based on Theorem 1 during iteration i + n.

To address the problem, we develop a route candidate filtering algorithm, CTFilter, that takes H as input and evaluates each τ ∈ H based on the final value of $o_i.sd$ produced by the GSPSExpansion algorithm.

Algorithm 5: CTFilter
Data: Route label hash map H, the final values of $o_i.sd$ for each $o_i \in G$, cardinality of query location set |O|, route set T, similarity threshold θ
Result: G.candidates
 1 G.candidates ← ∅;
 2 for each τ in H do
 3   L ← H.get(τ);
 4   S ← ∅; P ← ∅;
 5   a ← 0;
 6   for each label l ∈ L do
 7     o ← query location in l;
 8     sd ← o.sd in l;
 9     p ← p in l;
10     S.add(o);
11     P.add(⟨o, p⟩);
12     a ← a + $e^{-sd}$;
13   if $|O| - |G| + a + \sum_{o_i \in G \setminus S} e^{-o_i.sd} \ge \theta$ then
14     G.candidates.add(G(τ, P));
15 return G.candidates;

After initializing G.candidates, we evaluate each τ in H. First, we retrieve the route label set L associated with τ (line 3). Next, sets S and P and variable a are initialized (lines 4–5). Specifically, S stores the query locations in L, P stores the ⟨o, p⟩ pairs (cf. P in Definition 7), and a records the aggregated value of $e^{-o_i.sd}$ ($o_i \in S$). Then we visit each label l in L and acquire the query location o and its corresponding p and o.sd in l (lines 7–9). We update S and P by inserting o and ⟨o, p⟩, respectively, and update a by adding $e^{-o.sd}$ (lines 10–12). Next, we calculate the up-to-date similarity upper bound of τ. In the expression $|O| - |G| + a + \sum_{o_i \in G \setminus S} e^{-o_i.sd}$, a denotes the aggregated similarity score contributed by the route labels of τ (i.e., L), and $\sum_{o_i \in G \setminus S} e^{-o_i.sd}$ computes the similarity score contributed by the query locations in G that are not stored in L. If the upper bound is no less than θ, we add the group-wise route tuples of τ into G.candidates. After completing the scan of H, we return G.candidates as the result.

The algorithm for combining route candidates in GSPS is similar to Algorithm 2. The only difference is that we do not need to generate the group-wise route tuples associated with each group G (cf. Algorithm 2, lines 4–8) because we have done it in Algorithm 4.

Time Complexity: The time complexity of generating route tuples of each query location in O (i.e., GSPSExpansion) is $O(2^{|O|} \cdot (|V| \log |V| + |E|) \cdot |T_v|)$, where $2^{|O|}$ denotes the number of unique groups that can be generated from query location set O. The time complexity of route combination on each partitioning can be approximated as $O(|T_G|^2 \cdot k!)$. The notation was explained at the end of Section A.
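The filtering step of Algorithm 5 (CTFilter) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the hash map H is modelled as a dictionary from a route identifier to a list of (o, p, sd) labels, the final $o_i.sd$ values are passed as a dictionary, and each query location is assumed to contribute at most one label per route. The names `upper_bound` and `ct_filter` are hypothetical.

```python
import math
from typing import Dict, Hashable, List, Tuple

# A label (o, p, sd): query location, scanned vertex, and o.sd at scan time.
Label = Tuple[Hashable, Hashable, float]


def upper_bound(num_query_locations: int,
                final_sd: Dict[Hashable, float],
                labels: List[Label]) -> float:
    """Similarity upper bound |O| - |G| + a + sum over G minus S of e^{-o.sd}."""
    seen = set()  # S: query locations that appear in the labels
    a = 0.0       # aggregated e^{-sd} over the labels (Algorithm 5, line 12)
    for o, _p, sd in labels:
        seen.add(o)
        a += math.exp(-sd)
    # Contribution of query locations in G that have no label on this route,
    # using their final o.sd values from the expansion.
    rest = sum(math.exp(-sd) for o, sd in final_sd.items() if o not in seen)
    return num_query_locations - len(final_sd) + a + rest


def ct_filter(H: Dict[Hashable, List[Label]],
              final_sd: Dict[Hashable, float],
              num_query_locations: int,
              theta: float) -> List[Tuple[Hashable, List[Tuple[Hashable, Hashable]]]]:
    """Keep the routes in H whose upper bound is still at least theta."""
    candidates = []
    for route, labels in H.items():
        if upper_bound(num_query_locations, final_sd, labels) >= theta:
            pairs = [(o, p) for o, p, _sd in labels]  # the set P of <o, p> pairs
            candidates.append((route, pairs))
    return candidates


if __name__ == "__main__":
    H = {"t1": [("o1", "p1", 0.0)], "t2": [("o1", "p2", 3.0)]}
    final_sd = {"o1": 3.0, "o2": 3.0}  # group G = {o1, o2}, with |O| = 3
    print(ct_filter(H, final_sd, num_query_locations=3, theta=2.0))
```

In this toy input, route t1 was scanned by o1 at distance 0, so its bound is 1 + 1 + e^{-3} and it survives a threshold of 2.0, whereas t2 was only reached at distance 3 and is pruned.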