
206 IEEE INTERNET OF THINGS JOURNAL, VOL. 3, NO. 2, APRIL 2016

EPLQ: Efficient Privacy-Preserving Location-Based Query Over Outsourced Encrypted Data
Lichun Li, Rongxing Lu, Senior Member, IEEE, and Cheng Huang

Abstract—With the pervasiveness of smart phones, location-based services (LBS) have received considerable attention and have recently become more popular and vital. However, the use of LBS also poses a potential threat to users' location privacy. In this paper, aiming at spatial range query, a popular LBS providing information about points of interest (POIs) within a given distance, we present an efficient and privacy-preserving location-based query solution, called EPLQ. Specifically, to achieve privacy-preserving spatial range query, we propose the first predicate-only encryption scheme for inner product range (IPRE), which can be used to detect whether a position is within a given circular area in a privacy-preserving way. To reduce query latency, we further design a privacy-preserving tree index structure in EPLQ. Detailed security analysis confirms the security properties of EPLQ. In addition, extensive experiments are conducted, and the results demonstrate that EPLQ is very efficient in privacy-preserving spatial range query over outsourced encrypted data. In particular, a mobile LBS user with an Android phone needs around 0.9 s to generate a query, and a commodity workstation, which plays the role of the cloud in our experiments, needs only a few seconds to search POIs.

Fig. 1. Example of spatial range query.
Index Terms—Location-based services (LBS), outsourced encrypted data, privacy-enhancing technology, spatial range query.

Manuscript received June 16, 2015; revised July 23, 2015; accepted August 10, 2015. Date of publication August 17, 2015; date of current version March 23, 2016.
The authors are with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798 (e-mail: lcli@ntu.edu.sg; rxlu@ntu.edu.sg; huangcheng@ntu.edu.sg).
Digital Object Identifier 10.1109/JIOT.2015.2469605
2327-4662 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

A FEW decades ago, location-based services (LBS) were used only in the military. Today, thanks to advances in information and communication technologies, more kinds of LBS have appeared, and they are very useful not only for organizations but also for individuals. Let us take the spatial range query, the kind of LBS that we focus on in this paper, as an example. Spatial range query is a widely used LBS, which allows a user to find points of interest (POIs) within a given distance of his/her location, i.e., the query point. As illustrated in Fig. 1, with this kind of LBS, a user could obtain the records of all restaurants within walking distance (say 500 m). Then, the user can go through these records to find a desirable restaurant considering price and reviews.

While LBS are popular and vital, most of these services today, including spatial range query, require users to submit their locations, which raises serious concerns about the leaking and misuse of user location data. For example, criminals may utilize the data to track potential victims and predict their locations. As another example, some sensitive location data of organization users may involve trade secrets or national security. Protecting the privacy of user location in LBS has attracted considerable interest. However, significant challenges still remain in the design of privacy-preserving LBS, and new challenges arise particularly due to data outsourcing. In recent years, there has been a growing trend of outsourcing data, including LBS data, because of its financial and operational benefits. Lying at the intersection of mobile computing and cloud computing, the design of privacy-preserving outsourced spatial range query faces the challenges below.

1) Challenge on querying encrypted LBS data. The LBS provider is not willing to disclose its valuable LBS data to the cloud. As illustrated in Fig. 2, the LBS provider encrypts and outsources private LBS data to the cloud, and LBS users query the encrypted data in the cloud. As a result, querying encrypted LBS data without privacy breach is a big challenge, and we need to protect not only the user locations from the LBS provider and cloud but also LBS data from the cloud.
2) Challenge on the resource consumption in mobile devices. Many LBS users are mobile users, and their terminals are smart phones with very limited resources. However, the cryptographic or privacy-enhancing techniques used to realize privacy-preserving query usually result in high computational cost and/or storage cost at the user side.
3) Challenge on the efficiency of POI searching. Spatial range query is an online service, and LBS users are sensitive to query latency. To provide good user experiences, the POI search performed at the cloud side must be
done in a short time (e.g., a few seconds at most). Again, the techniques used to realize privacy-preserving query usually increase the search latency.
4) Challenge on security. LBS data are about POIs in the real world. It is reasonable to assume that the attacker may have some knowledge about the original LBS data. With such knowledge, known-sample attacks are possible (elaborated later in Section II).

Fig. 2. System model of outsourced LBS under consideration.

Several solutions for privacy-preserving spatial range query have been proposed recently [1]–[6]. However, as elaborated in Section VIII, existing solutions cannot address all the above challenges. To address them, in this paper we propose an efficient solution for privacy-preserving spatial range query named EPLQ, which allows queries over encrypted LBS data without disclosing user locations to the cloud or LBS provider. To protect the privacy of user location in EPLQ, we design a novel predicate-only encryption scheme for inner product range (IPRE scheme for short), which, to the best of our knowledge, is the first predicate/predicate-only scheme of this kind. To improve the performance, we also design a privacy-preserving index structure named ŝs-tree. Specifically, the main contributions of this paper are threefold.

1) We propose IPRE, which allows testing whether the inner product of two vectors is within a given range without disclosing the vectors. In predicate encryption, the key corresponding to a predicate $f$ can decrypt a ciphertext if and only if the attribute $x$ of the ciphertext satisfies the predicate, i.e., $f(x) = 1$. Predicate-only encryption is a special type of predicate encryption not designed for encrypting/decrypting messages. Instead, it reveals only whether $f(x) = 1$ or not. Predicate-only encryption schemes supporting different types of predicates [7], [8] have been proposed for privacy-preserving query on outsourced data. To the best of our knowledge, no existing predicate/predicate-only scheme supports inner product range. Though our scheme is used for privacy-preserving spatial range query in this paper, it may be applied in other applications as well.
2) We propose EPLQ, an efficient solution for privacy-preserving spatial range query. In particular, we show that whether a POI matches a spatial range query or not can be tested by examining whether the inner product of two vectors is in a given range. The two vectors contain the location information of the POI and the query, respectively. Based on this discovery and our IPRE scheme, spatial range query without leaking location information can be achieved. To avoid scanning all POIs to find matched POIs, we further exploit a novel index structure named ŝs-tree, which conceals sensitive location information with our IPRE scheme. Experiments on our implementation demonstrate that our solution is very efficient. Moreover, security analysis shows that EPLQ is secure under known-sample attacks and ciphertext-only attacks.
3) Our techniques can be used for more kinds of privacy-preserving queries over outsourced data. In the spatial range query discussed in this work, we consider Euclidean distance, which is widely used in spatial databases. Our IPRE scheme and ŝs-tree may be used for searching records within a given weighted Euclidean distance or great-circle distance as well. Weighted Euclidean distance is used to measure the dissimilarity in many kinds of data, while great-circle distance is the distance between two points on the surface of a sphere. Using great-circle distance instead of Euclidean distance for long distances on the surface of the earth is more accurate. By supporting these two kinds of distances, privacy-preserving similarity query and long spatial range query can also be realized.

This paper is organized as follows. In Section II, we formalize the system model and attack models considered in our work, and identify the design goal. In Section III, we recall bilinear pairing and related complexity assumptions as preliminaries, which will be used in subsequent sections. In Section IV, we propose IPRE. Then, based on IPRE, we design the EPLQ solution for privacy-preserving spatial range query in Section V, followed by its security analysis and performance evaluation in Sections VI and VII, respectively. We give related work in Section VIII, and finally conclude this work in Section IX.

II. MODELS AND DESIGN GOAL

In this section, we formalize the system model and attack models considered in this paper, and identify the design goal.

A. System Model

Privacy-preserving POI query has been studied in two settings of LBS: 1) public LBS and 2) outsourced LBS. In this paper, we focus on the latter setting. In the former setting, there is an LBS provider holding a spatial database of POI records in plaintext, and LBS users query POIs at the provider's site. In outsourced LBS, as shown in Fig. 2, the system consists of three kinds of entities, LBS provider, LBS users, and cloud, as follows.

1) The LBS provider has abundant LBS data, which are POI records. The LBS provider allows authorized users (i.e., LBS users) to utilize its data through location-based queries. Because of the financial and operational benefits of data outsourcing, the LBS provider offers the query
services via the cloud. However, the LBS provider is not willing to disclose the valuable LBS data to the cloud. Therefore, the LBS provider encrypts the LBS data, and outsources the encrypted data to the cloud.
2) The cloud has rich storage and computing resources. It stores the encrypted LBS data from the LBS provider, and provides query services for LBS users. So, the cloud has to search the encrypted POI records in local storage to find the ones matching the queries from LBS users.
3) LBS users have the information of their own locations, and query the encrypted records of nearby POIs in the cloud. Cryptographic or privacy-enhancing techniques are usually utilized to hide the location information in the queries sent to the cloud. To decrypt the encrypted records received from the cloud, LBS users need to obtain the decryption key from the LBS provider in advance.

B. Attack Models

Similar to most previous works on outsourced data query, the cloud is assumed to be honest but curious and is considered the potential attacker in this work. That is, the cloud would honestly store and search data as requested; however, the cloud would also have financial incentives to learn the stored LBS data and the user location data in queries. Because both LBS data and user location data are valuable, they should be protected and hidden from the cloud. In general, in the outsourced LBS setting, the cloud can observe both queries from LBS users and encrypted LBS data from the LBS provider, which could be an advantage in learning user locations. Therefore, assuming different abilities of the attacker, there are mainly four attack models in the outsourced LBS setting.

1) Ciphertext-only attack. In this model, the attacker is able to observe the ciphertexts of POIs' locations and queries but does not know the plaintexts. Obviously, every cloud has this ability. This is a weak attack model.
2) Known-sample attack. In this model, the attacker knows the plaintexts of some POIs' locations and/or queries. The attacker also knows that their corresponding ciphertexts must exist among the ciphertexts observed by the attacker. However, the attacker does not know which ciphertext corresponds to a known plaintext. Utilizing such information, the attacker may be able to reveal the plaintext corresponding to any given ciphertext. Such information is not hard to obtain if the attacker has the background knowledge that the LBS database must contain the POIs of a certain type in a certain area.
3) Known-plaintext attack. In this model, the attacker knows the plaintexts of some POIs' locations and/or queries as well as their corresponding ciphertexts. Utilizing this information, the attacker may be able to reveal the plaintexts corresponding to other ciphertexts.
4) Access-pattern attack. In this model, the attacker has some background knowledge about the pattern of POI accessing. For example, the attacker knows that a known POI would be the most popular POI. If an encrypted POI appears most frequently in query results, it must be the encrypted version of the known POI. Then, the attacker knows that the corresponding query points must be close to the known POI.

In addition to the above attacks, other attacks such as insider attacks may be possible. In this paper, we consider ciphertext-only and known-sample attacks, which do not require attackers with very strong abilities. We leave the attacks requiring very strong abilities for future study.

C. Design Goal

Under the outsourced LBS system model, our design goal is to develop an efficient, accurate, and secure solution for privacy-preserving spatial range query. Specifically, the following three objectives should be achieved.

1) Efficiency. As discussed in Section I, spatial range query has stringent performance requirements. A good solution should not consume many resources of mobile LBS users, and the POI search latency should be acceptable for online query.
2) Accuracy. It is desirable that a query result contains exactly the records matching the query. False negatives would hurt user experience, while false positives would increase communication cost. Additional computational cost is also required at the user side to filter out false positives.
3) Security. The proposed solution should be resilient to ciphertext-only attacks and known-sample attacks. An accurate and efficient solution for spatial range query [1] already exists, which is resilient to ciphertext-only attacks but not to known-sample attacks and more powerful attacks. The proposed solution should be more secure than the solution in [1].

Though subject to more powerful attacks such as known-plaintext attacks, the solution proposed in this paper can still be used in many situations where the attackers do not have the required abilities or knowledge. Our solution also has advantages over the solutions resilient to such attacks. As we will see in the related works in Section VIII, such solutions are either very computationally costly or not applicable to outsourced LBS.

III. PRELIMINARIES: BILINEAR PAIRING AND COMPLEXITY ASSUMPTIONS

In this section, we outline the cryptographic technique of bilinear pairing and related complexity assumptions, which will serve as the basis of our IPRE scheme.

Let $G_1$ and $G_2$ be two cyclic groups of the same big prime order $p$, and $g$ be a generator of $G_1$. Let $e : G_1 \times G_1 \to G_2$ be a pairing, i.e., a map that satisfies the following properties:
1) bilinearity: $e(P^a, Q^b) = e(P, Q)^{ab}$ for any $a, b \in \mathbb{Z}_p^*$ and any $P, Q \in G_1$;
2) nondegeneracy: $e(g, g) \neq 1$;
3) computability: $e$ can be computed efficiently.

The definitions of pairing parameter generator and pairing related complexity assumptions are given below. For more comprehensive descriptions, refer to [9].
TABLE I. NOTATIONS FREQUENTLY USED IN IPRE AND EPLQ

Definition 1: Gen. The pairing parameter generator Gen is a probabilistic algorithm that takes a security parameter $\lambda$ as input and outputs a 5-tuple $(G_1, G_2, g, p, e)$.
Definition 2: Computational Diffie–Hellman (CDH) problem. The CDH problem is: Given $(P, P^a, P^b) \in G_1$ for unknown $a, b \in \mathbb{Z}_p^*$, compute $P^{ab} \in G_1$.
Definition 3: Decisional bilinear Diffie–Hellman (DBDH) problem. The DBDH problem is: Given $(P, P^a, P^b, P^c) \in G_1$ and $W \in G_2$ for unknown $a, b, c \in \mathbb{Z}_p^*$, determine whether $W = e(P, P)^{abc}$ or $W$ is a random element from $G_2$.
Definition 4: Discrete logarithm (DL) problem. The DL problem is: Given $Q \in G_1$, compute $a \in \mathbb{Z}_p^*$ such that $g^a = Q$.
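To make the pairing properties and the 5-tuple output by Gen more concrete, the following Python sketch builds a deliberately insecure toy pairing. It is only an illustration under stated assumptions: the constants `P_ORDER` and `MODULUS`, the helper names, and the trick of storing each $G_1$ element as its own exponent are all hypothetical choices made here, not part of the paper. Because $G_1$ elements are stored as exponents, the DL and CDH problems are trivial in this toy group, which is exactly why real schemes use elliptic-curve pairings instead.

```python
# Toy symmetric pairing, for illustration only (NOT secure, not the paper's code).
# G1: elements g^x are stored directly as the exponent x mod P_ORDER.
# G2: the order-P_ORDER subgroup of the multiplicative group mod MODULUS.

P_ORDER = 101   # assumed small prime group order p
MODULUS = 607   # assumed prime with P_ORDER dividing MODULUS - 1 (606 = 6 * 101)

# Element of order P_ORDER in Z_MODULUS^*; it plays the role of e(g, g).
GT_GEN = pow(2, (MODULUS - 1) // P_ORDER, MODULUS)

G = 1  # generator g of G1, stored as its exponent


def g1_pow(point, a):
    """Return point^a in G1 (multiplying the stored exponents)."""
    return (point * a) % P_ORDER


def pairing(point1, point2):
    """Toy map e(g^x, g^y) = e(g, g)^(x * y)."""
    return pow(GT_GEN, (point1 * point2) % P_ORDER, MODULUS)


def check_bilinearity(P, Q, a, b):
    """Check e(P^a, Q^b) == e(P, Q)^(a*b) for concrete values."""
    lhs = pairing(g1_pow(P, a), g1_pow(Q, b))
    rhs = pow(pairing(P, Q), a * b, MODULUS)
    return lhs == rhs


if __name__ == "__main__":
    print("bilinear:", check_bilinearity(5, 7, 13, 29))
    print("nondegenerate:", pairing(G, G) != 1)
```

The tuple `(G1, G2, G, P_ORDER, pairing)` mirrors the shape of Gen's output $(G_1, G_2, g, p, e)$, with the group descriptions left implicit.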

IV. IPRE: A NOVEL PREDICATE-ONLY ENCRYPTION SCHEME FOR INNER PRODUCT RANGE

In this section, we present IPRE, which will serve as the basis of our EPLQ solution for privacy-preserving spatial range query.

A. Overview

The proposed IPRE scheme allows computing inner products and comparing their values with a predefined range in a privacy-preserving way. As far as we know, our scheme is the first predicate/predicate-only encryption scheme for inner product range. In IPRE, both attributes and predicates are vectors. So, we use attribute vectors and predicate vectors to refer to the attributes and predicates in IPRE. Let $\Lambda \subseteq \mathbb{Z}_p^t$ be the attribute set, and let the class of predicates in IPRE likewise be a subset of $\mathbb{Z}_p^t$, where $p$ is a big prime. IPRE allows testing whether the inner product of an attribute vector from $\Lambda$ and a predicate vector from this class is in a predefined range without disclosing the vectors.

The IPRE scheme is a symmetric predicate-only encryption scheme, and it consists of four algorithms: 1) Setup algorithm for generating a public parameter PP, an attribute encryption key AK, and a predicate encryption key PK; 2) Enc algorithm for encrypting attribute vectors to ciphertexts; 3) GenToken algorithm for encrypting predicate vectors to tokens; and 4) Check algorithm for checking if a ciphertext's attribute satisfies a token's predicate.

For the reader's convenience, we summarize the important notations to be used in Table I.

B. Encoding Attribute Vectors and Predicate Vectors

Before describing IPRE's algorithms, we define the encodings of attribute vectors and predicate vectors, which serve as a building block of IPRE. Let EncodeU() and EncodeV() be the functions of encoding predicate vectors and attribute vectors, respectively. EncodeU() takes a predicate vector $\vec{U}_i = (u_{i,1}, u_{i,2}, \ldots, u_{i,t})$ and a random integer $h_i$ as input, and outputs a vector $\overrightarrow{EU}_i$. Similarly, EncodeV() takes an attribute vector $\vec{V}_j = (v_{j,1}, v_{j,2}, \ldots, v_{j,t})$ and a random integer $s_j$ as input, and outputs a vector $\overrightarrow{EV}_j$.

We want the encoding functions to meet the requirement that $\langle \overrightarrow{EU}_i, \overrightarrow{EV}_j \rangle = (\alpha \times \langle \vec{U}_i, \vec{V}_j \rangle + \beta)^d + h_i + s_j$ for any pair of $\overrightarrow{EU}_i$ and $\overrightarrow{EV}_j$. $\alpha$ and $\beta$ are two predefined secret integers. Next, we show how to find such encoding functions.

Following the well-known multinomial theorem, we have the following equation:

$$(\alpha \times \langle \vec{U}_i, \vec{V}_j \rangle + \beta)^d = \Big(\beta + \alpha \times \sum_{k=1}^{t} u_{i,k} \times v_{j,k}\Big)^d = \sum_{\substack{l_1 + l_2 + \cdots + l_{t+1} = d \\ l_1, l_2, \ldots, l_{t+1} \in [0, d]}} \binom{d}{l_1, l_2, \ldots, l_{t+1}} \times \beta^{l_{t+1}} \prod_{k=1}^{t} (\alpha \times v_{j,k})^{l_k} \prod_{k=1}^{t} u_{i,k}^{l_k}$$

where $\binom{d}{l_1, l_2, \ldots, l_{t+1}} = \frac{d!}{l_1!\, l_2! \cdots l_{t+1}!}$.

The last sum of the above equation has $\binom{d+t}{t}$ terms. Let $a_{i,m}$ and $b_{j,m}$ be $\prod_{k=1}^{t} u_{i,k}^{l_k}$ and $\binom{d}{l_1, l_2, \ldots, l_{t+1}} \beta^{l_{t+1}} \prod_{k=1}^{t} (\alpha \times v_{j,k})^{l_k}$, respectively, in the $m$th term. Then, the last sum can be written as $\sum_{m=1}^{\binom{d+t}{t}} a_{i,m} \times b_{j,m}$. Without loss of generality, let $l_{t+1} = d$ and $l_1, l_2, \ldots, l_t = 0$ when $m = 1$. So, $a_{i,1} = 1$ and $b_{j,1} = \beta^d$. Then, we have the following equation:

$$(\alpha \times \langle \vec{U}_i, \vec{V}_j \rangle + \beta)^d + h_i + s_j = h_i + s_j + \sum_{m=1}^{\binom{d+t}{t}} a_{i,m} \times b_{j,m}$$
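As a numerical sanity check of the multinomial expansion above, the following Python sketch enumerates the exponent tuples $(l_1, \ldots, l_{t+1})$, forms the terms $a_{i,m}$ and $b_{j,m}$, and confirms that their products sum to $(\alpha \langle \vec{U}_i, \vec{V}_j \rangle + \beta)^d$. The function names and the small sample values are illustrative assumptions, and the arithmetic is done over the integers rather than modulo $p$ to keep the sketch readable.

```python
from itertools import product
from math import comb, factorial, prod


def exponent_tuples(t, d):
    """All (l_1, ..., l_{t+1}) of nonnegative integers that sum to d.

    The first tuple produced is (0, ..., 0, d), i.e. the m = 1 term.
    """
    return [ls for ls in product(range(d + 1), repeat=t + 1) if sum(ls) == d]


def multinomial(d, ls):
    """d! / (l_1! * l_2! * ... * l_{t+1}!)."""
    out = factorial(d)
    for l in ls:
        out //= factorial(l)
    return out


def a_terms(u, d):
    """a_{i,m}: product of u_{i,k}^{l_k}; depends only on the predicate vector."""
    return [prod(uk ** lk for uk, lk in zip(u, ls[:-1]))
            for ls in exponent_tuples(len(u), d)]


def b_terms(v, d, alpha, beta):
    """b_{j,m}: coefficient * beta^{l_{t+1}} * product of (alpha * v_{j,k})^{l_k}."""
    return [multinomial(d, ls) * beta ** ls[-1]
            * prod((alpha * vk) ** lk for vk, lk in zip(v, ls[:-1]))
            for ls in exponent_tuples(len(v), d)]


if __name__ == "__main__":
    u, v = (3, 1), (2, 5)          # sample predicate and attribute vectors
    alpha, beta, d = 4, 7, 3       # sample secret parameters
    a, b = a_terms(u, d), b_terms(v, d, alpha, beta)
    inner = sum(x * y for x, y in zip(u, v))
    print(len(a) == comb(d + len(u), len(u)))
    print(sum(x * y for x, y in zip(a, b)) == (alpha * inner + beta) ** d)
```

Note that the $a$ terms use only the predicate vector while the $b$ terms use only the attribute vector and the secrets $\alpha$, $\beta$, which is what lets the two sides be encoded separately.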

Separating the $m = 1$ term from the sum, the right-hand side becomes

$$h_i + s_j + \sum_{m=1}^{\binom{d+t}{t}} a_{i,m} \times b_{j,m} = h_i + s_j + a_{i,1} \times b_{j,1} + \sum_{m=2}^{\binom{d+t}{t}} a_{i,m} \times b_{j,m} = h_i \times 1 + 1 \times (\beta^d + s_j) + \sum_{m=2}^{\binom{d+t}{t}} a_{i,m} \times b_{j,m} = \langle (h_i, 1, a_{i,2}, a_{i,3}, \ldots, a_{i,n-1}), (1, \beta^d + s_j, b_{j,2}, b_{j,3}, \ldots, b_{j,n-1}) \rangle.$$

Thus, we can define EncodeU() and EncodeV() as

$$\mathrm{EncodeU}(\vec{U}_i, h_i) = (h_i, 1, a_{i,2}, a_{i,3}, \ldots, a_{i,n-1}) \quad (1)$$
$$\mathrm{EncodeV}(\vec{V}_j, s_j) = (1, \beta^d + s_j, b_{j,2}, b_{j,3}, \ldots, b_{j,n-1}) \quad (2)$$

where $n = \binom{d+t}{t} + 1$.

C. Setup Algorithm

The setup algorithm is a probabilistic algorithm, which takes a security parameter $\lambda$, the attribute/predicate vector length $t$, and an inner product range $[\tau_1, \tau_2]$ as input. The algorithm outputs an attribute encryption key $AK = (\alpha, \beta, d, M)$, a predicate encryption key $PK = (d, M)$, and a public parameter $PP = ((G_1, G_2, g, p, e), (\Omega_k)_{k=\tau_1}^{\tau_2})$.

$\alpha, \beta \in \mathbb{F}_p$ are two random numbers for the encoding functions. $d$ is a positive integer, and its value depends on the security parameter. The scheme is more secure if $d$ is bigger. $d$ is either 2 or an integer satisfying $\mathrm{GCD}(d, p - 1) = 1$. If $d = 2$, $\alpha$ and $\beta$ must be chosen to make sure that the intersection of the set $\{z : z = -z_1 - 2\beta/\alpha \bmod p, \tau_1 \le z_1 \le \tau_2\}$ and the set $S - [\tau_1, \tau_2]$ is empty. Here, $S$ is the set of all possible values of inner products. $M$ is an $n \times n$ random invertible matrix over the field $\mathbb{F}_p$. $n$ is the length of encoded attribute/predicate vectors. $(G_1, G_2, g, p, e)$ is the pairing parameter generated by running $\mathrm{Gen}(\lambda)$, and $\Omega_k = \mathrm{Hash}(e(g, g)^{(\alpha \times \mathrm{perm}(k) + \beta)^d})$. Here, Hash() is a hash function and perm() is a random bijection mapping from $[\tau_1, \tau_2]$ to $[\tau_1, \tau_2]$, i.e., a random permutation function. Let $|\mathrm{Hash}()|$ be the range size of Hash(). Hash() is chosen to make $p \gg |\mathrm{Hash}()| \gg \tau_2 - \tau_1$.

D. Enc Algorithm

The algorithm of encrypting attribute vectors is a probabilistic algorithm, which takes an attribute vector $\vec{V}_j = (v_{j,1}, v_{j,2}, \ldots, v_{j,t})$ and a random number $s_j \in \mathbb{F}_p$ as input, and outputs $C_j = (c_{j,1}, c_{j,2}, \ldots, c_{j,n+1}) = ((g^{v'_{j,k}})_{k=1}^{n}, e(g, g)^{s_j})$. Here, $(v'_{j,1}, v'_{j,2}, \ldots, v'_{j,n})^T = M^{-1} (\mathrm{EncodeV}(\vec{V}_j, s_j))^T \bmod p$. $T$ is the matrix transpose operator.

E. GenToken Algorithm

The token generation algorithm GenToken is a probabilistic algorithm, which takes a predicate vector $\vec{U}_i = (u_{i,1}, u_{i,2}, \ldots, u_{i,t})$ and a random number $h_i \in \mathbb{F}_p$ as input, and outputs $K_i = (q_{i,1}, q_{i,2}, \ldots, q_{i,n+1}) = ((g^{u'_{i,k}})_{k=1}^{n}, e(g, g)^{h_i})$. Here, $(u'_{i,1}, u'_{i,2}, \ldots, u'_{i,n}) = \mathrm{EncodeU}(\vec{U}_i, h_i)\, M \bmod p$.

F. Check Algorithm

The check algorithm takes a ciphertext $C_j = (c_{j,1}, c_{j,2}, \ldots, c_{j,n+1})$ of an attribute vector $\vec{V}_j$ and a token $K_i = (q_{i,1}, q_{i,2}, \ldots, q_{i,n+1})$ associated with a predicate vector $\vec{U}_i$ as input. The algorithm computes $\Psi = \frac{\prod_{k=1}^{n} e(c_{j,k}, q_{i,k})}{c_{j,n+1} \times q_{i,n+1}}$. If $\mathrm{Hash}(\Psi)$ is in the set $\{\Omega_k : \tau_1 \le k \le \tau_2\}$, the algorithm outputs 1. Otherwise, it outputs 0.

Remark 1: If two inner products are equal, their $\Psi$ are the same. This is not desirable for some applications. This problem can be circumvented by adding randomness in the generation of predicate/attribute vectors. For a pair of fixed predicate vector and attribute vector, the value $\Psi$ is still fixed. However, given a pair of predicate and attribute, their vectors and their vectors' inner product all have multiple possible values. We will demonstrate it in our EPLQ solution in Section V.

Correctness proof. First, we prove $\mathrm{Hash}(\Psi) \in \{\Omega_k : \tau_1 \le k \le \tau_2\}$ if $\tau_1 \le \langle \vec{U}_i, \vec{V}_j \rangle \le \tau_2$. Recall that $\Omega_k = \mathrm{Hash}(e(g, g)^{(\alpha \times \mathrm{perm}(k) + \beta)^d})$. From the following equation, it is easy to see that $\mathrm{Hash}(\Psi) \in \{\Omega_k : \tau_1 \le k \le \tau_2\}$ if $\tau_2 \ge \langle \vec{U}_i, \vec{V}_j \rangle \ge \tau_1$:

$$\Psi = \frac{\prod_{k=1}^{n} e(c_{j,k}, q_{i,k})}{c_{j,n+1} \times q_{i,n+1}} = \frac{\prod_{k=1}^{n} e(g^{u'_{i,k}}, g^{v'_{j,k}})}{e(g, g)^{h_i} \times e(g, g)^{s_j}} = \frac{e(g, g)^{\sum_{k=1}^{n} u'_{i,k} \times v'_{j,k}}}{e(g, g)^{h_i + s_j}} = \frac{e(g, g)^{\mathrm{EncodeU}(\vec{U}_i, h_i)\, M M^{-1} (\mathrm{EncodeV}(\vec{V}_j, s_j))^T}}{e(g, g)^{h_i + s_j}} = \frac{e(g, g)^{\langle \mathrm{EncodeU}(\vec{U}_i, h_i), \mathrm{EncodeV}(\vec{V}_j, s_j) \rangle}}{e(g, g)^{h_i + s_j}} = \frac{e(g, g)^{(\alpha \times \langle \vec{U}_i, \vec{V}_j \rangle + \beta)^d + h_i + s_j}}{e(g, g)^{h_i + s_j}} = e(g, g)^{(\alpha \times \langle \vec{U}_i, \vec{V}_j \rangle + \beta)^d}.$$

Second, we prove $\mathrm{Hash}(\Psi) \notin \{\Omega_k : \tau_1 \le k \le \tau_2\}$ with overwhelming probability if $\langle \vec{U}_i, \vec{V}_j \rangle \notin [\tau_1, \tau_2]$.

Lemma 1: $\forall a \in \mathbb{F}_p$, $a$ has at most one $d$th root if $\mathrm{GCD}(d, p - 1) = 1$ and $p$ is a prime.

Proof: Clearly, if $a = 0$, it has only one $d$th root 0. If $a \neq 0$, we prove the lemma by contradiction. If $a$ had two or more distinct $d$th roots, let $\chi_1, \chi_2 \in \mathbb{F}_p$ be two of them. Since $\chi_1^d = \chi_2^d = a$, we have $(\chi_1/\chi_2)^d = 1$. Noticing that $\mathbb{Z}_p^*$ is a cyclic group of order $p - 1$, we have $\mathrm{order}(\chi_1/\chi_2) \mid d$ and $\mathrm{order}(\chi_1/\chi_2) \mid (p - 1)$. So, $\mathrm{order}(\chi_1/\chi_2)$ is a common divisor of $d$ and $p - 1$. Since $\chi_1$ and $\chi_2$ are distinct, we have $\mathrm{order}(\chi_1/\chi_2) \neq 1$. This contradicts $\mathrm{GCD}(d, p - 1) = 1$. Therefore, the assumption of $a$ having two or more distinct $d$th roots must be false.

Lemma 2: $\forall a \in \mathbb{F}_p$, $a$ has at most two square roots if $p$ is a prime.

Proof: Clearly, if $a = 0$, it has only one square root 0. If $a \neq 0$, we prove the lemma by contradiction. Assume that
$a$ has three or more distinct square roots. Then, we can find two square roots $\chi_1, \chi_2 \in \mathbb{F}_p$ satisfying $\chi_1/\chi_2 \neq -1 \bmod p$. Because $\chi_1^2 = \chi_2^2 = a$, we have $(\chi_1/\chi_2)^2 = 1$ and the order of $\chi_1/\chi_2$ is 2. This contradicts the fact that $\mathbb{Z}_p^*$ has only one subgroup of order 2 and the subgroup's elements are 1 and $-1 \bmod p$.

Let $\Upsilon_k$ be $e(g, g)^{(\alpha \times \mathrm{perm}(k) + \beta)^d}$ for any $k \in [\tau_1, \tau_2]$. Then, $\mathrm{Hash}(\Upsilon_k) = \Omega_k$. Let us consider two cases of $d$ as follows.

Case 1: $\mathrm{GCD}(d, p - 1) = 1$. Consider an equation $f_k(z)$ over the field $\mathbb{F}_p$: $(\alpha \times z + \beta)^d \bmod p = \log \Upsilon_k \bmod p$. From Lemma 1, we know that $\log \Upsilon_k$ has only one $d$th root over the field $\mathbb{F}_p$. Then, the equation has only one root $z = (\chi - \beta)/\alpha \bmod p$, where $\chi \in \mathbb{F}_p$ is the $d$th root of $\log \Upsilon_k$. From the definition of $\Upsilon_k$, we know that $\tau_1 \le z \le \tau_2$ for any $k \in [\tau_1, \tau_2]$. Therefore, if $\langle \vec{U}_i, \vec{V}_j \rangle \notin [\tau_1, \tau_2]$, the resulting $\Psi = e(g, g)^{(\alpha \times \langle \vec{U}_i, \vec{V}_j \rangle + \beta)^d}$ is not equal to any $\Upsilon_k$. Because the range size of Hash() is much bigger than the size of $[\tau_1, \tau_2]$, the probability of $\mathrm{Hash}(\Psi) \in \{\Omega_k : \tau_1 \le k \le \tau_2\}$ is very low.

Case 2: $d = 2$. From Lemma 2, we know that the equation $f_k(z)$: $(\alpha \times z + \beta)^2 \bmod p = \log \Upsilon_k \bmod p$ has at most two roots. Suppose there are two roots $z_1$ and $z_2$. We have $(\alpha \times z_2 + \beta) \bmod p = -(\alpha \times z_1 + \beta) \bmod p$ and $z_2 = (-z_1 - 2\beta/\alpha) \bmod p$. From the definition of $\Upsilon_k$, we know that at least one of the roots is in $[\tau_1, \tau_2]$. Without loss of generality, suppose $z_1 \in [\tau_1, \tau_2]$. Then, $z_2$ is in the set $\{z : z = -z_1 - 2\beta/\alpha \bmod p, \tau_1 \le z_1 \le \tau_2\}$. Recall that the values of $\beta$ and $\alpha$ are chosen to avoid the overlap between this set and the set $S - [\tau_1, \tau_2]$. Then, if $\langle \vec{U}_i, \vec{V}_j \rangle \notin [\tau_1, \tau_2]$, the inner product is not in the set of roots. Therefore, the resulting $\Psi$ is not equal to any $\Upsilon_k$. Again, because $|\mathrm{Hash}()| \gg \tau_2 - \tau_1$, the probability of $\mathrm{Hash}(\Psi) \in \{\Omega_k : \tau_1 \le k \le \tau_2\}$ is very low.

Remark 2: With a suitable Hash() function, using $\{\Omega_k : \tau_1 \le k \le \tau_2\}$ instead of $\{\Upsilon_k : \tau_1 \le k \le \tau_2\}$ to check inner product range can reduce the size of the public parameter, and the resulting false positives are rare. We will show in Section VII that the false positives are negligible in our EPLQ solution.

V. EPLQ: PROPOSED SOLUTION FOR PRIVACY-PRESERVING SPATIAL RANGE QUERY

In this section, we propose a novel tree data structure named ŝs-tree and the EPLQ solution based on the above IPRE scheme and ŝs-tree.

A. Preliminary: ss-tree

The ŝs-tree introduced in this work is a variant of ss-tree [10]. For indexing spatial data, there actually exist quite a few data structures such as r-tree [11] and ss-tree, and some of them (e.g., ss-tree and sh-tree [12]) can be used for spatial range query. When such data structures are used for privacy-preserving query, location data, e.g., the location data of points, circular areas, rectangular areas, and single-dimension ranges, must be concealed. Since the above-proposed IPRE scheme can conceal location data of points and circular areas, and at the same time ss-tree and some of its variants only need these location data, it is natural to apply IPRE to these data structures for privacy-preserving query. Hence, we choose ss-tree for its simplicity, and propose ŝs-tree based on ss-tree and IPRE.

As shown in Fig. 3, ss-tree and ŝs-tree share the same structure, and the data structures of their nodes are compared in Fig. 4. Before describing ŝs-tree, we give an introduction to ss-tree. An ss-tree has the following properties: 1) each nonroot parent node has $m_{\min}$ to $m_{\max}$ children; 2) the root node has 0 to $m_{\max}$ children; 3) each leaf node represents a record; and 4) each node has a centroid field and a radius field.

Fig. 3. Index POIs with ss-tree and ŝs-tree. (a) POI distribution. (b) ss-tree and ŝs-tree structure.

Fig. 4. Data structures of ss-tree node and ŝs-tree node.

In the context of a spatial database using a Cartesian coordinate system, the centroid is a pair of coordinates $(x, y)$. A leaf node's centroid is the corresponding POI's coordinates, and its radius is 0. A nonleaf node's centroid and radius depend on its children. Its centroid is the mean of all its children's centroids. Its radius is not smaller than the distance between its centroid and any descendant node's centroid. In other words, a nonleaf node is associated with a circular area defined by the node's centroid and radius, and all its descendant nodes are in this area.

Similar to other tree index structures, a nonleaf node has a field (child_pointer_array) for pointers to its children, and a leaf node has a field (leaf_data) storing some data of the corresponding record (e.g., the whole record, the pointer to the record, or the record's ID). A node of ss-tree also has some other fields to support tree building, approximation search, and sampling operations. We omit these fields in this paper as they are not relevant to our solution. Refer to [10] for the details and the building of ss-tree.
212 IEEE INTERNET OF THINGS JOURNAL, VOL. 3, NO. 2, APRIL 2016

With the ss-tree, searching for POI records that match a spatial range query is very efficient. Because all descendant nodes of a nonleaf node lie within the nonleaf node's associated circular area, matching POI records can be found by scanning the ss-tree from the root to the leaves. If a node's circular area intersects with the query area, all children of the node are scanned. Otherwise, its descendant nodes are skipped. Thus, $O(\log N + R)$ tree nodes are scanned to find the matched records, where $N$ is the number of POI records in the database and $R$ is the number of matched records.

B. Proposed $\widehat{ss}$-tree

The $\widehat{ss}$-tree is the core of our EPLQ solution. It is a variant of the ss-tree. They share the same tree structure, which is shown in Fig. 3. The difference between the ss-tree and the $\widehat{ss}$-tree lies in the tree nodes' data, which are shown in Fig. 4. The $\widehat{ss}$-tree hides each tree node's location information using our predicate-only encryption scheme and removes unnecessary information. Because of the encryption, detecting circular area intersection and matched records also works differently when searching the tree. More specifically, we use two kinds of inner products for detecting circular area intersection and matched records, and our IPRE scheme performs the detection via inner product range in a privacy-preserving way.

1) Building $\widehat{ss}$-tree: The $\widehat{ss}$-tree can be built from an ss-tree. After building an ss-tree for the spatial database, an $\widehat{ss}$-tree can be built by the following steps.

Step 1) Configure parameters $\tau_1$, $\tau_2$, and $\phi$.
As mentioned in the IPRE scheme, $\tau_1$ and $\tau_2$ are the lower limit and upper limit of the given inner product range, respectively. The lower limit $\tau_1$ is fixed to 0 in the $\widehat{ss}$-tree. $\tau_2$ is set to a value not smaller than $r_{max1}^2$, where $r_{max1}$ is the maximum query range allowed in the system. $\phi$ is a positive integer used to scale down the inner products for detecting area intersection, and $\tau_2 \ge (\lceil r_{max1}/\phi \rceil + \lceil r_{max2}/\phi \rceil + 2)^2$, where $r_{max2}$ is the biggest radius of the ss-tree's circular areas. (Such inner products are scaled down to make them fit in a smaller range, which reduces the size of IPRE's public parameter. As explained later at the end of this section, the scaling down will not decrease accuracy.)

Step 2) Generate an attribute vector for each tree node from the node's centroid $(\grave{x}_j, \grave{y}_j)$ and radius $\grave{r}_j$.

Case 1: The node is a leaf node. Generate the attribute vector $\vec{V}_j$ as follows:

$$\vec{V}_j = \big((-1)^{\grave{\xi}_j},\ (-1)^{\grave{\xi}_j}(\grave{r}_j^2 - \grave{x}_j^2 - \grave{y}_j^2),\ (-1)^{\grave{\xi}_j} \times 2\grave{x}_j,\ (-1)^{\grave{\xi}_j} \times 2\grave{y}_j,\ (-1)^{\grave{\xi}_j} \times 2\grave{r}_j,\ \grave{\xi}_j \tau_2,\ (-1)^{\grave{\xi}_j}\big).$$

Here, $\grave{\xi}_j$ is a random number, and its value is 1 or 0.

Case 2: The node is a nonleaf node. Do the following two steps.

1) Modify the node's original circular area to a minimal normalized area covering the original one. In this paper, we say a circular area is a normalized area if its centroid's coordinates and its radius are all integral multiples of $\phi$. The centroid's coordinates are modified to the closest coordinates that are integral multiples of $\phi$, and the radius is modified to the smallest integral multiple of $\phi$ that makes the normalized area cover the original one.

2) After modifying $(\grave{x}_j, \grave{y}_j)$ and $\grave{r}_j$, generate the attribute vector $\vec{V}_j$ as follows:

$$\vec{V}_j = \big((-1)^{\grave{\xi}_j}\grave{\mu}_j,\ (-1)^{\grave{\xi}_j}(\grave{\mu}_j(\grave{r}_j^2 - \grave{x}_j^2 - \grave{y}_j^2)/\phi^2 + \grave{\theta}_j),\ (-1)^{\grave{\xi}_j}\grave{\mu}_j\grave{x}_j/\phi,\ (-1)^{\grave{\xi}_j}\grave{\mu}_j\grave{y}_j/\phi,\ (-1)^{\grave{\xi}_j}\grave{\mu}_j\grave{r}_j/\phi,\ \grave{\xi}_j \tau_2,\ (-1)^{\grave{\xi}_j}\big).$$

Here, $\grave{\mu}_j = \lfloor \tau_2 / (\lceil r_{max1}/\phi \rceil + 1 + \grave{r}_j/\phi)^2 \rfloor$, and $\grave{\theta}_j$ is a random nonnegative integer not more than $\grave{\mu}_j$ and $\tau_2 - \grave{\mu}_j(\lceil r_{max1}/\phi \rceil + 1 + \grave{r}_j/\phi)^2$. Again, $\grave{\xi}_j$ is a random number, and its value is 1 or 0.

Step 3) Encrypt each attribute vector with the IPRE scheme.

Step 4) Remove all unnecessary fields of each tree node.
In the end, a node's data include its encrypted attribute vector. If the node is a nonleaf node, the data also include pointers to its children. If the node is a leaf node, its leaf_data field also stores the pointer to the corresponding record.

2) Searching $\widehat{ss}$-tree: Searching the $\widehat{ss}$-tree is the same as searching the ss-tree except that detecting circular area intersection and matched records is based on our IPRE scheme.

Suppose a spatial range query wants to find all POIs within a circular area centered at coordinates $(x_i, y_i)$ with radius $r_i$. To search the $\widehat{ss}$-tree, the tokens of two predicate vectors associated with the query should be provided. The two vectors $\vec{U}_i$ and $\vec{U}'_i$ are shown below, and tokens are generated with IPRE's GenToken algorithm

$$\vec{U}_i = \big((-1)^{\xi_i}(\mu_i(r_i^2 - x_i^2 - y_i^2) + \theta_i),\ (-1)^{\xi_i}\mu_i,\ (-1)^{\xi_i}\mu_i x_i,\ (-1)^{\xi_i}\mu_i y_i,\ (-1)^{\xi_i}\mu_i r_i,\ 1,\ \xi_i \tau_2\big)$$

$$\vec{U}'_i = \big((-1)^{\xi'_i}(r_i'^2 - x_i'^2 - y_i'^2)/\phi^2,\ (-1)^{\xi'_i},\ (-1)^{\xi'_i} 2x'_i/\phi,\ (-1)^{\xi'_i} 2y'_i/\phi,\ (-1)^{\xi'_i} 2r'_i/\phi,\ 1,\ \xi'_i \tau_2\big).$$

Here, $\xi_i$ and $\xi'_i$ are random integers in $\{0, 1\}$. $\mu_i = \lfloor \tau_2 / r_i^2 \rfloor$, and $\theta_i$ is a random nonnegative integer not more than $\mu_i$ and $\tau_2 - \mu_i r_i^2$. $(x'_i, y'_i)$ and $r'_i$ are the centroid coordinates and radius of the minimal normalized area covering the query area, respectively. The way to find this normalized area is the same as that in the generation of attribute vectors.

The tokens of $\vec{U}_i$ and $\vec{U}'_i$ are used for detecting matched records and circular area intersection, respectively. We call the former vector the POI-matching predicate vector and the latter the area-intersecting predicate vector.

Given the above tokens associated with the query, POI records matching the query can be found by searching the $\widehat{ss}$-tree. The pseudocode of the search algorithm is shown in Algorithm 1. The search starts from the root node. If a nonleaf node's area intersects with the query area, all children of the

node will be scanned. Otherwise, all descendant nodes of this nonleaf node are skipped. Detecting circular area intersection and matched records is based on our IPRE scheme for inner product range. We give the correctness proofs of the detection as follows.

Algorithm 1. Search_$\widehat{ss}$-tree(node nd, query_tokens Ks, node_list ndl)
    \\ nd: the node to be searched
    \\ Ks: the array of two tokens associated with the query's predicate vectors.
    \\     Ks[0] is the token for POI matching detection, while Ks[1] is the one
    \\     for detecting intersection of circular areas.
    \\ ndl: the list to store matched leaf nodes

    C ← nd.encrypted_attribute_vector
    if nd is a leaf node then
        if Check(Ks[0], C) == 1 then
            \\ nd's record matches the query area
            Add nd to node_list ndl.
        end if
    else
        if Check(Ks[1], C) == 1 then
            \\ nd's area intersects with the query area
            for each child node cld_i of nd do
                Search_$\widehat{ss}$-tree(cld_i, Ks, ndl)
            end for
        end if
    end if

3) Correctness of Detecting Matched Records via Inner Product Range: For any POI-matching predicate vector $\vec{U}_i$ and any leaf node's attribute vector $\vec{V}_j$, we have the following equation:

$$\begin{aligned}
\langle \vec{U}_i, \vec{V}_j \rangle
&= \big\langle \big((-1)^{\xi_i}(\mu_i(r_i^2 - x_i^2 - y_i^2) + \theta_i),\ (-1)^{\xi_i}\mu_i,\ (-1)^{\xi_i}\mu_i x_i,\ (-1)^{\xi_i}\mu_i y_i,\ (-1)^{\xi_i}\mu_i r_i,\ 1,\ \xi_i\tau_2\big), \\
&\qquad \big((-1)^{\grave{\xi}_j},\ (-1)^{\grave{\xi}_j}(\grave{r}_j^2 - \grave{x}_j^2 - \grave{y}_j^2),\ (-1)^{\grave{\xi}_j} \times 2\grave{x}_j,\ (-1)^{\grave{\xi}_j} \times 2\grave{y}_j,\ (-1)^{\grave{\xi}_j} \times 2\grave{r}_j,\ \grave{\xi}_j\tau_2,\ (-1)^{\grave{\xi}_j}\big) \big\rangle \\
&= (-1)^{\xi_i + \grave{\xi}_j}(\mu_i(r_i^2 - x_i^2 - y_i^2) + \theta_i) + (-1)^{\xi_i + \grave{\xi}_j}\mu_i(\grave{r}_j^2 - \grave{x}_j^2 - \grave{y}_j^2) \\
&\quad + (-1)^{\xi_i + \grave{\xi}_j} 2\mu_i x_i \grave{x}_j + (-1)^{\xi_i + \grave{\xi}_j} 2\mu_i y_i \grave{y}_j + (-1)^{\xi_i + \grave{\xi}_j} 2\mu_i r_i \grave{r}_j \\
&\quad + \grave{\xi}_j\tau_2 + (-1)^{\grave{\xi}_j}\xi_i\tau_2 \\
&= (-1)^{\xi_i + \grave{\xi}_j}\big(\mu_i((r_i + \grave{r}_j)^2 - (x_i - \grave{x}_j)^2 - (y_i - \grave{y}_j)^2) + \theta_i\big) + \grave{\xi}_j\tau_2 + (-1)^{\grave{\xi}_j}\xi_i\tau_2 \\
&= \begin{cases}
\mu_i((r_i + \grave{r}_j)^2 - (x_i - \grave{x}_j)^2 - (y_i - \grave{y}_j)^2) + \theta_i, & \text{if } \xi_i = \grave{\xi}_j \\
\tau_2 - \mu_i((r_i + \grave{r}_j)^2 - (x_i - \grave{x}_j)^2 - (y_i - \grave{y}_j)^2) - \theta_i, & \text{otherwise.}
\end{cases}
\end{aligned}$$

As the tree node is a leaf node, $\grave{r}_j = 0$. Then, we have

$$\langle \vec{U}_i, \vec{V}_j \rangle = \begin{cases}
\mu_i(r_i^2 - (x_i - \grave{x}_j)^2 - (y_i - \grave{y}_j)^2) + \theta_i, & \text{if } \xi_i = \grave{\xi}_j \\
\tau_2 - \mu_i(r_i^2 - (x_i - \grave{x}_j)^2 - (y_i - \grave{y}_j)^2) - \theta_i, & \text{otherwise.}
\end{cases}$$

The distance between the leaf node's POI and the query point is $\sqrt{(x_i - \grave{x}_j)^2 + (y_i - \grave{y}_j)^2}$. Then, the record matches the query if and only if $r_i^2 \ge r_i^2 - (x_i - \grave{x}_j)^2 - (y_i - \grave{y}_j)^2 \ge 0$. As shown below, record matching can be detected by examining whether $\langle \vec{U}_i, \vec{V}_j \rangle$ is in the range $[0, \tau_2]$ as well

$$\begin{aligned}
& r_i^2 \ge r_i^2 - (x_i - \grave{x}_j)^2 - (y_i - \grave{y}_j)^2 \ge 0 \\
\Leftrightarrow\ & \tau_2 \ge \mu_i(r_i^2 - (x_i - \grave{x}_j)^2 - (y_i - \grave{y}_j)^2) + \theta_i \ge 0 \\
& \text{and } \tau_2 \ge \tau_2 - \mu_i(r_i^2 - (x_i - \grave{x}_j)^2 - (y_i - \grave{y}_j)^2) - \theta_i \ge 0 \\
\Leftrightarrow\ & \tau_2 \ge \langle \vec{U}_i, \vec{V}_j \rangle \ge 0.
\end{aligned}$$

For security reasons, the field order $p$ in the IPRE scheme is at least 160 bits, and $p \gg \langle \vec{U}_i, \vec{V}_j \rangle \gg -p$ holds. Then, we have

$$\tau_2 \ge \langle \vec{U}_i, \vec{V}_j \rangle \ge 0 \ \Leftrightarrow\ \tau_2 \ge \langle \vec{U}_i, \vec{V}_j \rangle \bmod p \ge 0.$$

Therefore, record matching can be detected by examining whether $\langle \vec{U}_i, \vec{V}_j \rangle \bmod p$ is in the range $[0, \tau_2]$.

4) Correctness of Detecting the Intersection of Circular Areas via Inner Product Range: Similarly, for any area-intersecting predicate vector $\vec{U}'_i$ and any nonleaf node's attribute vector $\vec{V}_j$, we have the following equation:

$$\langle \vec{U}'_i, \vec{V}_j \rangle = \begin{cases}
\grave{\mu}_j((r'_i + \grave{r}_j)^2 - (x'_i - \grave{x}_j)^2 - (y'_i - \grave{y}_j)^2)/\phi^2 + \grave{\theta}_j, & \text{if } \xi'_i = \grave{\xi}_j \\
\tau_2 - \grave{\mu}_j((r'_i + \grave{r}_j)^2 - (x'_i - \grave{x}_j)^2 - (y'_i - \grave{y}_j)^2)/\phi^2 - \grave{\theta}_j, & \text{otherwise.}
\end{cases}$$

The nonleaf node's circular area (after normalization) intersects with the query's normalized area if and only if $(r'_i + \grave{r}_j)^2 \ge (r'_i + \grave{r}_j)^2 - (x'_i - \grave{x}_j)^2 - (y'_i - \grave{y}_j)^2 \ge 0$. As shown below, the intersection of normalized areas can be detected by examining whether $\langle \vec{U}'_i, \vec{V}_j \rangle \bmod p$ is in the range $[0, \tau_2]$ as well. Note that the purpose of detecting intersection is to rule out subtrees that do not contain matched POIs. Expanding original areas to normalized areas only results in scanning more tree nodes. All matched POI records can still be found, and unmatched records will not be included in the result

$$\begin{aligned}
& (r'_i + \grave{r}_j)^2 \ge (r'_i + \grave{r}_j)^2 - (x'_i - \grave{x}_j)^2 - (y'_i - \grave{y}_j)^2 \ge 0 \\
\Leftrightarrow\ & \tau_2 \ge \grave{\mu}_j((r'_i + \grave{r}_j)^2 - (x'_i - \grave{x}_j)^2 - (y'_i - \grave{y}_j)^2)/\phi^2 + \grave{\theta}_j \ge 0 \\
& \text{and } \tau_2 \ge \tau_2 - \grave{\mu}_j((r'_i + \grave{r}_j)^2 - (x'_i - \grave{x}_j)^2 - (y'_i - \grave{y}_j)^2)/\phi^2 - \grave{\theta}_j \ge 0 \\
\Leftrightarrow\ & \tau_2 \ge \langle \vec{U}'_i, \vec{V}_j \rangle \ge 0 \\
\Leftrightarrow\ & \tau_2 \ge \langle \vec{U}'_i, \vec{V}_j \rangle \bmod p \ge 0.
\end{aligned}$$
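The record-matching argument above can be checked numerically on plain integers. The following is a minimal sketch, not the authors' implementation: it builds the POI-matching predicate vector and a leaf attribute vector as defined in Section V-B, computes their inner product in the clear (no IPRE encryption is involved), and tests whether it falls in $[0, \tau_2]$. The function names are hypothetical, $\mu_i$ is assumed to be the integer quotient of $\tau_2$ and $r_i^2$, and coordinates are assumed to be integers.

```python
import random

def poi_predicate(x, y, r, tau2):
    """POI-matching predicate vector U_i for a query circle (x, y, r)."""
    xi = random.randint(0, 1)                  # random sign bit
    mu = tau2 // (r * r)                       # assumed: integer quotient of tau2 / r^2
    theta = random.randint(0, min(mu, tau2 - mu * r * r))
    s = (-1) ** xi
    return [s * (mu * (r * r - x * x - y * y) + theta),
            s * mu, s * mu * x, s * mu * y, s * mu * r, 1, xi * tau2]

def leaf_attribute(x, y, tau2):
    """Attribute vector V_j of a leaf node: a POI at (x, y) with radius 0."""
    xi = random.randint(0, 1)                  # random sign bit
    s = (-1) ** xi
    r = 0
    return [s, s * (r * r - x * x - y * y), s * 2 * x, s * 2 * y,
            s * 2 * r, xi * tau2, s]

def inner(u, v):
    """Plain integer inner product (IPRE would evaluate this on ciphertexts)."""
    return sum(a * b for a, b in zip(u, v))

def matches(u, v, tau2):
    """Range test: the inner product must lie in [0, tau2]."""
    return 0 <= inner(u, v) <= tau2

if __name__ == "__main__":
    tau2 = 10**8
    u = poi_predicate(1000, 1000, 500, tau2)
    print(matches(u, leaf_attribute(1200, 1300, tau2), tau2))  # POI inside the circle
    print(matches(u, leaf_attribute(1600, 1000, tau2), tau2))  # POI outside the circle
```

Because the sign bits are drawn independently for the query and for the node, repeated runs exercise both branches of the case analysis; the range test gives the same answer in either branch for the sample points used here.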

C. EPLQ Design

Our EPLQ solution consists of two algorithms: 1) system setup and 2) spatial range search.

1) System Setup: The LBS provider initializes the system by the following steps.

Step 1) The LBS provider initializes the parameters and keys for the solution.
The LBS provider initializes the public parameter and keys of the proposed IPRE scheme as well as the key of a standard encryption scheme (e.g., AES). Let $AK = (\alpha, \beta, d, M)$, $PK = (d, M)$, and $PP = ((G_1, G_2, g, p, e), (\Omega_k)_{k=\tau_1}^{\tau_2})$ be the attribute encryption key, predicate encryption key, and public parameter of the IPRE scheme. $PP$ is shared with the cloud. $PK$, $(G_1, G_2, g, p, e)$, and the key of the standard encryption scheme are shared with LBS users.
Remark. The standard scheme is used to encrypt POI records. The IPRE scheme is for searching encrypted records. $(\Omega_k)_{k=\tau_1}^{\tau_2}$ in the public parameter is used for IPRE's Check algorithm only, and LBS users do not need it to generate tokens.

Step 2) The LBS provider builds an $\widehat{ss}$-tree for the LBS database.

Step 3) The LBS provider encrypts each POI record with the standard encryption scheme.

Step 4) The LBS provider outsources all encrypted POI records and the $\widehat{ss}$-tree to the cloud.

2) Spatial Range Search: Suppose an LBS user wants to find all POIs within a circular area centered at coordinates $(x_i, y_i)$ with radius $r_i$. The privacy-preserving query is performed by the following steps.

Step 1) The LBS user generates two tokens for searching POI records with the proposed IPRE scheme.
As elaborated earlier in Section V-B, to search the $\widehat{ss}$-tree, two tokens associated with the query area should be generated. The LBS user generates them following the way in Section V-B. Let Ks[0] and Ks[1] be the two generated tokens.

Step 2) The user sends (Ks[0], Ks[1]) as a query to the cloud.

Step 3) The cloud searches the $\widehat{ss}$-tree to find all leaf nodes matching the query from the user.
The search algorithm has been given in Section V-B, and its pseudocode is shown in Algorithm 1.

Step 4) The cloud returns the corresponding POI records of matched leaf nodes to the user.

Step 5) The LBS user decrypts received POI records with the shared key of the standard encryption scheme.

VI. SECURITY ANALYSIS

In this section, we analyze the security properties of the proposed EPLQ solution. Specifically, following the security requirements discussed earlier, our analysis focuses on how the proposed EPLQ solution achieves LBS data confidentiality and user location privacy.

The confidentiality of LBS data includes not only the confidentiality of POI records but also the confidentiality of location information in the $\widehat{ss}$-tree. On the other hand, user location privacy involves protecting sensitive location information in user queries and the $\widehat{ss}$-tree. The security of the EPLQ solution depends on the underlying standard encryption scheme and the IPRE scheme. The standard encryption scheme is responsible for preventing the cloud from learning POI records, while our IPRE scheme is responsible for protecting user location and POI location from the cloud. The current AES standard can be used as the standard scheme, and it is secure under ciphertext-only, known-sample, and known-plaintext attacks. Thus, we focus on the analysis of user/POI location protection with the IPRE scheme.

A. Security of Query and POI Index Encryption

In EPLQ, user queries and the sensitive location information in the $\widehat{ss}$-tree are encrypted with the IPRE scheme. A query consists of two tokens associated with two predicate vectors, which contain the LBS user's location information. For a predicate vector $\vec{U}_i = (u_{i,1}, u_{i,2}, \ldots, u_{i,t})$, the corresponding token is $((g^{u_{i,k}})_{k=1}^{n}, e(g, g)^{h_i})$ where $(u_{i,1}, u_{i,2}, \ldots, u_{i,n}) = EncodeU(\vec{U}_i, h_i)M \bmod p$. Because of the hardness of the CDH problem, the attacker cannot reveal any exponent in the token even if knowing the predicate vector. Without the secret keys of IPRE (i.e., TK and AK), no one can reveal the predicate vector, the secret matrix $M$, or the random number $h_i$. Because of the randomness and secrecy of $h_i$, encrypting a predicate vector to a token is semantically secure. Therefore, it is secure under ciphertext-only and known-sample attacks. The sensitive location information in the $\widehat{ss}$-tree is concealed with attribute vector encryption. The encryption is very similar to predicate vector encryption, and their security properties are the same. Therefore, we omit its security analysis.

B. Security Under the Attack on Inner Products

As discussed above, it is hard to reveal user queries and POI locations directly from the ciphertexts of attribute vectors and predicate vectors. Alternatively, the attacker may attempt to recover the inner products of predicate vectors and attribute vectors first, and then reveal the vectors containing information about user locations and POI locations. Next, we show how this attack works and its countermeasure.

The attacker may attempt to recover inner products through exhaustive attacks on $(\Upsilon_k = e(g, g)^{(\alpha \times perm(k) + \beta)^d})_{k=\tau_1}^{\tau_2}$. The exponents $((\alpha \times perm(k) + \beta)^d)_{k=\tau_1}^{\tau_2}$ could be viewed as $d$-degree polynomials of $d + 1$ terms where the variables are $\alpha$ and $\beta$. The coefficients of the polynomials are in the range $[\tau_1^d, \tau_2^d]$. These polynomials are in the same vector space of dimension $d + 1$. Any $d + 1$ of these polynomials are linearly independent, and any $d + 2$ of them are not. Thus, for any $\Upsilon_{k_1}, \Upsilon_{k_2}, \ldots, \Upsilon_{k_{d+2}}$, there exist nonzero $\lambda_1, \lambda_2, \ldots, \lambda_{d+2}$ satisfying the following equation:

$$\prod_{l=1}^{d+2} \Upsilon_{k_l}^{\lambda_l} = \prod_{l=1}^{d+2} e(g, g)^{\lambda_l \times (\alpha \times perm(k_l) + \beta)^d} = e(g, g)^0.$$

The attacker can find such $\lambda_1, \lambda_2, \ldots, \lambda_{d+2}$ through exhaustive search, and then recover $d + 2$ inner products $perm(k_1), \ldots, perm(k_{d+2})$. Recall that $\tau_1$ is set to 0 when using IPRE in EPLQ. There are $(\tau_2 + 1) \times \tau_2 \times \cdots \times (\tau_2 - d)$ possible values for the combination $(perm(k_1), perm(k_2), \ldots, perm(k_{d+2}))$. Therefore, the complexity of recovering inner products is $O(\tau_2^{d+2})$.

With enough known inner products, the attacker can further reveal sensitive location data by generating and solving an overdetermined nonlinear polynomial equation system. Given the inner product $\rho_{i,j}$ of an unknown predicate vector $\vec{U}_i$ and attribute vector $\vec{V}_j$, the attacker can generate the equation $\langle \vec{U}_i, \vec{V}_j \rangle = \rho_{i,j}$. The unknowns are the location data and random numbers used to generate the vectors. From the inner products of $\varepsilon$ encoded predicate vectors and $w$ encoded attribute vectors, a system of $\varepsilon \times w$ equations and $O(\varepsilon + w)$ unknowns can be generated. If $\varepsilon$ and $w$ are big enough, the system is overdetermined. There are polynomial time algorithms [13], [14] for solving overdetermined nonlinear polynomial systems. Though the generated system has many solutions, an attacker knowing enough plaintext samples can reveal user locations from the solutions.

Fig. 5. User interface of EPLQ.

TABLE II
EXPERIMENTAL SETTING

Noticing that the complexity of recovering inner products is $O(\tau_2^{d+2})$, we prevent the attack above by configuring sufficiently large $\tau_2$ and $d$. $d$ is a configurable parameter in our IPRE scheme, while, as demonstrated in our EPLQ solution, $\tau_2$ can be configured by controlling the generation of predicate/attribute vectors. In our prototype, $\tau_2 = 10^8$ and $d = 2$. Then, the complexity of the attack is over $O(2^{106})$.
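The quoted bound can be reproduced with a few lines of arithmetic. The sketch below assumes only the prototype values stated above ($\tau_2 = 10^8$, $d = 2$); the function names are hypothetical and serve the illustration only.

```python
import math

def attack_bits(tau2: int, d: int) -> float:
    """log2 of tau2**(d+2), the asymptotic search-space size."""
    return (d + 2) * math.log2(tau2)

def exact_attack_bits(tau2: int, d: int) -> float:
    """log2 of (tau2+1) * tau2 * ... * (tau2-d), the exact number of combinations."""
    return math.log2(math.prod(range(tau2 - d, tau2 + 2)))

if __name__ == "__main__":
    print(round(attack_bits(10**8, 2), 1))        # 106.3
    print(round(exact_attack_bits(10**8, 2), 1))
```

Both figures sit just above 106 bits, which is where the "over $O(2^{106})$" statement comes from.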

VII. PERFORMANCE EVALUATION

In this section, we evaluate the performance of the proposed EPLQ solution in terms of communication cost, computational cost, storage cost, and accuracy.

Fig. 6. POI distributions of experimental datasets including restaurants in (a) New York, (b) California, and (c) France.

A. Implementation and Experimental Settings

We have implemented EPLQ in Java, and the user interface is shown in Fig. 5. We tested EPLQ's performance in a testbed of two workstations and one Android phone. These machines play the roles of LBS provider, cloud, and mobile LBS user, respectively. The hardware and software of these machines are shown in Table II(a), while the parameter settings are shown in Table II(b).

In addition, three datasets are extracted from the OpenStreetMap project (www.openstreetmap.org) for our experiments. The POIs in the datasets are the restaurants from New York, California, and France. Their distributions are shown in Fig. 6, and the POI counts of the three datasets are 1676, 6994, and 28 340, respectively.

B. Computational Cost at User Side

To generate a query, an LBS user needs to encrypt two predicate vectors, which requires $2n$ modular exponentiations, about $2n^2$ multiplications, and about $2n^2$ additions, where $n$ is the length of encoded vectors. To see whether the computational cost is acceptable for mobile LBS users, we measured the query generation latency. We let the Android phone in our testbed generate 1000 queries, and the average latency per query generation is about 0.9 s.

C. LBS Provider's Computational Cost

During system setup, the LBS provider needs to encrypt POI records, set up IPRE, and build the $\widehat{ss}$-tree. Because the cost of record encryption is the same as that in most solutions, we evaluate only the computational cost related to IPRE and the $\widehat{ss}$-tree here. The cost is evaluated by system setup latency, which is defined as the time used to set up IPRE and build the $\widehat{ss}$-tree. As shown in Table III, the latencies for the three datasets are between 1 and 3 h. Considering that system setup is conducted only once, the computational cost is acceptable.

TABLE III
SYSTEM SETUP LATENCY

Fig. 7. POI query latency at cloud side. Note that the latency should be much lower once deployed at a real cloud.

D. Cloud's Computational Cost

Recall that searching the $\widehat{ss}$-tree to find matched records requires scanning $O(\log N + R)$ tree nodes for a database with $N$ records and a query having $R$ matched records. Determining whether a record or tree node matches a query requires computing $Check(K_i, C_j)$. Computing the function requires $n = 37$ pairings and multiplications.

To see whether the computational cost of searching the database is acceptable, we conducted experiments on three datasets. For each dataset, 1000 random query points are chosen. If a query point's location is not near any POI, the query is not realistic and the query latency is lower than that in normal situations. To avoid that, in the experiments, each query point's location is the same as one random POI's. For each query point, we generate three queries with radii of 500 m, 1 km, and 2 km, respectively. Therefore, 3000 queries are generated for each dataset. We measured the average search latency of these queries for each dataset, and the results are shown in Fig. 7. As expected, the latency increases very slowly when increasing POI count and query radius. In the experiments, a workstation plays the role of the cloud, and only four CPU cores can be utilized to do the computing. A real cloud has much more computing resources, and the query latency at a real cloud should be much lower.

E. Communication Cost and Storage Cost

To make a query, an LBS user sends two tokens to the cloud. The communication cost is $O(n \times \log p)$. Under the settings in Table II(b), the traffic is 4.75 KB, which is acceptable. The user has to store the attribute encryption key $AK$ and pairing parameter locally. The storage usage is dominated by $M$, which is about 27 KB. This is negligible even for a mobile LBS user.

Let $N$ be the number of POI records in the database. In addition to LBS data, the cloud also needs to store the public parameter of IPRE and an $\widehat{ss}$-tree of less than $4N/3$ nodes (for the case of $m_{min} = 4$). The size of the public parameter is dominated by $(\Omega_k)_{k=\tau_1}^{\tau_2}$, which is $(\tau_2 - \tau_1 + 1) \times \log |Hash()|$ bits. It is about 763 MB for the settings in Table II(b). A tree node's size is around 1.3 KB. Assume that there are 1 million records in the database. The cloud needs at most 2.42 GB storage space in total. The public parameter and the $\widehat{ss}$-tree can fit in the memory of even one single server. Therefore, the storage cost is acceptable. The LBS provider sends the public parameter and the tree to the cloud only once. The communication cost is also acceptable.

F. Accuracy

As discussed in Section IV, using a hash function in the IPRE scheme reduces the size of the public parameter but introduces some false positives. This will not hurt the accuracy of the EPLQ solution. The false positive rate is $(\tau_2 - \tau_1 + 1)/|Hash()|$, which is about $5.42 \times 10^{-12}$. $O(\log N + R)$ tree nodes are scanned during a query. This number is at most a few hundred. Then, the probability that a query result contains false positive(s) is at most a few hundred times $5.42 \times 10^{-12}$, which is negligible.

VIII. RELATED WORKS

Our work is related to not only privacy-preserving LBS but also privacy-preserving query over outsourced encrypted data. In this section, we introduce some related works that can be used to realize privacy-preserving POI query, though some of them are not designed for POI query or LBS. In the literature, there are four kinds of privacy-preserving queries over POIs: spatial range query [1], nearest neighbor (NN) query [15], K nearest neighbors (KNN) query [2], [16]–[20], and multidimensional range query [20]–[26]. Spatial range query cannot be replaced by NN and KNN queries, which return the nearest neighbor(s) to a given location. They have different usages. Sometimes spatial range query may be replaced by multidimensional range query, which returns POIs in a rectangular area instead of a circular area. However, the inaccurate result is not desirable. Next, we review the works applicable to privacy-preserving spatial range query.

A. Solutions Applicable to Outsourced LBS

1) Privacy-Preserving Spatial Range Query Based on Coordinate Transformation: In the solution based on coordinate transformation [1], the coordinates of queries and POIs in the original coordinate system are transformed to new coordinates in a new coordinate system. After the transformation, the distance information of any two points is still preserved. Coordinate transformation is very efficient, and the returned results are accurate. However, solutions designed based on coordinate transformation would be vulnerable to known-sample attacks [2].

2) Privacy-Preserving POI Query Based on PIR: As far as we know, only PIR-based solutions [3], [4] can protect the privacy in both public LBS and outsourced LBS. Private information retrieval (PIR) [5] is a privacy primitive hiding the retrieved data item's ID from the database server(s). Because the data items being retrieved are hidden from the database

server(s), whether two queries' results are the same or not is undetectable. Therefore, PIR-based solutions are resilient to access-pattern attacks. PIR can be used to realize all four kinds of POI queries. However, PIR is costly in both communication and computation [6] for the following reasons. PIR requires linearly scanning all POI records including their location data (coordinates and radii) and nonlocation data. Moreover, to use PIR in LBS, an LBS user must additionally access the LBS database's index data in a privacy-preserving manner. PIR can retrieve records if given their IDs. To support spatial range query, an LBS user should obtain nearby POIs' record IDs from index data in a privacy-preserving manner. PIR or other techniques may be used to obtain such IDs.

B. Solutions for Public LBS Only

1) Privacy-Preserving LBS Based on Anonymous Communication: In this kind of solution [27], [28], one or more third parties relay messages between users and the LBS provider. This approach hides the linkage between user identities and messages from the LBS provider. The query area would be exposed to the LBS provider, but the user sending the query is hidden among a set of users.

2) Privacy-Preserving LBS Based on Location Obfuscation: In this kind of solution [29], [30], to prevent the LBS provider from knowing users' precise locations, users submit low-precision locations or fake locations along with real locations. These solutions offer a weak level of privacy.

3) Privacy-Preserving LBS Based on Spatial Cloaking: This kind of solution [31], [32] combines anonymous communication and location obfuscation techniques. To the LBS provider, a user cannot be identified from a set of users in a cloaking area, and the cloaking area instead of users' precise locations is sent to the LBS provider.

All the above solutions can be applied to a wide range of LBS including POI query. However, their techniques do not allow the cloud to search encrypted data. Therefore, they cannot be used for outsourced LBS where LBS data in the cloud are encrypted.

IX. CONCLUSION

In this paper, we have proposed EPLQ, an efficient privacy-preserving spatial range query solution for smart phones, which preserves the privacy of user location and achieves confidentiality of LBS data. To realize EPLQ, we have designed an IPRE scheme and a novel privacy-preserving index tree named $\widehat{ss}$-tree. EPLQ's efficacy has been evaluated with theoretical analysis and experiments, and detailed analysis shows its security against known-sample attacks and ciphertext-only attacks. Our techniques have potential usages in other kinds of privacy-preserving queries. If the query can be performed through comparing inner products to a given range, the proposed IPRE and $\widehat{ss}$-tree may be applied to realize privacy-preserving query. Two potential usages are privacy-preserving similarity query and long spatial range query. In the future, we will design solutions for these scenarios and identify more usages.

REFERENCES

[1] A. Gutscher, "Coordinate transformation - A solution for the privacy problem of location based services?" in Proc. 20th Int. Parallel Distrib. Process. Symp. (IPDPS'06), Rhodes Island, Greece, Apr. 25–29, 2006, p. 424.
[2] W. K. Wong, D. W.-l. Cheung, B. Kao, and N. Mamoulis, "Secure kNN computation on encrypted databases," in Proc. SIGMOD, 2009, pp. 139–152.
[3] G. Ghinita, P. Kalnis, A. Khoshgozaran, C. Shahabi, and K.-L. Tan, "Private queries in location based services: Anonymizers are not necessary," in Proc. SIGMOD, 2008, pp. 121–132.
[4] X. Yi, R. Paulet, E. Bertino, and V. Varadharajan, "Practical k nearest neighbor queries with location privacy," in Proc. 30th Int. Conf. Data Eng. (ICDE), 2014, pp. 640–651.
[5] B. Chor, E. Kushilevitz, O. Goldreich, and M. Sudan, "Private information retrieval," J. ACM, vol. 45, no. 6, pp. 965–981, 1998.
[6] F. Olumofin and I. Goldberg, "Revisiting the computational practicality of private information retrieval," in Financial Cryptography and Data Security. New York, NY: Springer, 2012, pp. 158–172.
[7] J. Katz, A. Sahai, and B. Waters, "Predicate encryption supporting disjunctions, polynomial equations, and inner products," in Proc. 27th Ann. Int. Conf. Theory Appl. Cryptograph. Tech. Adv. Cryptol. (EUROCRYPT'08), Istanbul, Turkey, Apr. 13–17, 2008, pp. 146–162.
[8] D. Boneh and B. Waters, "Conjunctive, subset, and range queries on encrypted data," in Proc. 4th Theory Cryptograph. Conf. (TCC'07), Amsterdam, The Netherlands, Feb. 21–24, 2007, pp. 535–554.
[9] D. Boneh and M. K. Franklin, "Identity-based encryption from the Weil pairing," SIAM J. Comput., vol. 32, no. 3, pp. 586–615, 2003.
[10] D. A. White and R. Jain, "Similarity indexing with the ss-tree," in Proc. 12th Int. Conf. Data Eng. (ICDE), 1996, pp. 516–523.
[11] A. Guttman, "R-trees: A dynamic index structure for spatial searching," in Proc. Annu. Meeting (SIGMOD'84), Boston, MA, USA, Jun. 18–21, 1984, pp. 47–57.
[12] T. K. Dang, J. Küng, and R. Wagner, "The sh-tree: A super hybrid index structure for multidimensional data," in Proc. 12th Int. Conf. Database Expert Syst. Appl. (DEXA'01), Munich, Germany, Sep. 3–5, 2001, pp. 340–349.
[13] B.-Y. Yang and J.-M. Chen, "All in the XL family: Theory and practice," in Proc. Int. Conf. Inf. Secur. Cryptol., 2004, pp. 67–86.
[14] G. Ars, J.-C. Faugere, H. Imai, M. Kawazoe, and M. Sugita, "Comparison between XL and Gröbner basis algorithms," in Proc. ASIACRYPT, 2004, pp. 338–353.
[15] B. Yao, F. Li, and X. Xiao, "Secure nearest neighbor revisited," in Proc. IEEE 29th Int. Conf. Data Eng. (ICDE'13), 2013, pp. 733–744.
[16] Y. Elmehdwi, B. K. Samanthula, and W. Jiang, "Secure k-nearest neighbor query over encrypted data in outsourced environments," in Proc. IEEE 30th Int. Conf. Data Eng. (ICDE), 2014, pp. 664–675.
[17] A. Khoshgozaran and C. Shahabi, "Blind evaluation of nearest neighbor queries using space transformation to preserve location privacy," in Advances in Spatial and Temporal Databases. New York, NY, USA: Springer, 2007, pp. 239–257.
[18] B. Hore, S. Mehrotra, M. Canim, and M. Kantarcioglu, "Secure multidimensional range queries over outsourced data," VLDB J., vol. 21, no. 3, pp. 333–358, 2012.
[19] I.-T. Lien, Y.-H. Lin, J.-R. Shieh, and J.-L. Wu, "A novel privacy preserving location-based service protocol with secret circular shift for k-NN search," IEEE Trans. Inf. Forensics Secur., vol. 8, no. 6, pp. 863–873, Jun. 2013.
[20] M. L. Yiu, G. Ghinita, C. S. Jensen, and P. Kalnis, "Enabling search services on outsourced private spatial data," VLDB J., vol. 19, no. 3, pp. 363–384, 2010.
[21] E. Shi, J. Bethencourt, T.-H. Chan, D. Song, and A. Perrig, "Multi-dimensional range query over encrypted data," in Proc. IEEE Symp. Secur. & Privacy, 2007, pp. 350–364.
[22] B. Wang, Y. Hou, M. Li, H. Wang, and H. Li, "Maple: Scalable multi-dimensional range search over encrypted cloud data with tree-based index," in Proc. 9th ACM Symp. Inf. Comput. Commun. Secur., 2014, pp. 111–122.
[23] R. Agrawal, J. Kiernan, R. Srikant, and Y. Xu, "Order preserving encryption for numeric data," in Proc. SIGMOD, 2004, pp. 563–574.
[24] J. Shao, R. Lu, and X. Lin, "Fine: A fine-grained privacy-preserving location-based service framework for mobile devices," in Proc. IEEE INFOCOM, 2014, pp. 244–252.
[25] A. Boldyreva, N. Chenette, Y. Lee, and A. O'Neill, "Order-preserving symmetric encryption," in Proc. EUROCRYPT, 2009, pp. 224–241.

[26] P. Wang and C. Ravishankar, "Secure and efficient range queries on outsourced databases using Rp-trees," in Proc. Int. Conf. Data Eng. (ICDE), 2013, pp. 314–325.
[27] A. R. Beresford and F. Stajano, "Location privacy in pervasive computing," Pervasive Comput., vol. 2, no. 1, pp. 46–55, Jan./Mar. 2003.
[28] Y. Zhu, D. Ma, D. Huang, and C. Hu, "Enabling secure location-based services in mobile cloud computing," in Proc. 2nd ACM SIGCOMM Workshop Mobile Cloud Comput., 2013, pp. 27–32.
[29] H. Kido, Y. Yanagisawa, and T. Satoh, "An anonymous communication technique using dummies for location-based services," in Proc. Int. Conf. Perv. Serv. (ICPS), 2005, pp. 88–97.
[30] C. A. Ardagna, M. Cremonini, E. Damiani, S. D. C. Di Vimercati, and P. Samarati, "Location privacy protection through obfuscation-based techniques," in Proc. Data Appl. Secur. XXI, 2007, pp. 47–60.
[31] M. Gruteser and D. Grunwald, "Anonymous usage of location-based services through spatial and temporal cloaking," in Proc. 1st Int. Conf. Mobile Syst. Appl. Serv., 2003, pp. 31–42.
[32] M. F. Mokbel, C.-Y. Chow, and W. G. Aref, "The new Casper: Query processing for location services without compromising privacy," in Proc. 32nd Int. Conf. Very Large Data Bases (VLDB'06), 2006, pp. 763–774.

Lichun Li received the Bachelor's degree in information engineering from the Beijing University of Posts and Telecommunications, Beijing, China, in 2002, the Master's degree in communication and information systems from the China Academy of Telecommunication Technology, Beijing, China, in 2006, and the Ph.D. degree in computer science from the Beijing University of Posts and Telecommunications, Beijing, China, in 2009.
He is currently a Postdoctoral Research Fellow with the INFINITUS Laboratory, School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore. His research interests include privacy and security in cloud and big data.

Rongxing Lu (S'09–M'11–SM'15) received the Ph.D. degree in computer science from Shanghai Jiao Tong University, Shanghai, China, in 2006, and the Ph.D. degree in electrical and computer engineering from the University of Waterloo, Waterloo, ON, Canada, in 2012.
From May 2012 to April 2013, he was a Postdoctoral Fellow with the University of Waterloo. Since May 2013, he has been an Assistant Professor with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore. His research interests include computer network security, mobile and wireless communication security, and applied cryptography.
Dr. Lu was the recipient of the Canada Governor General Gold Medal.

Cheng Huang received the B.Eng. degree in information security from Xidian University, Xi'an, China, in 2013.
He is currently a Project Officer with the INFINITUS Laboratory, School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore. His research interests include applied cryptography, cyber security, and privacy.
