
Xplore market| Entidade Promotora: Parceiro:

31/07/2018

Indexing of Geographic Models Applicable to Mobile Devices
State of the Art

Projeto em curso com o apoio de:



Contents
1 Introduction
2 Indexing of Geographic Models
2.1 Developments applicable to mobile devices
2.2 Developments applicable to mobile devices with cloud technology
3 Geographic Indexing Applied to Tourism
3.1 Open source solutions
4 Final Considerations and Further Developments
5 References


1 Introduction

The state of the art for research line 2, Indexing of Geographic Models, aims to identify, through a literature review, the main theoretical and empirical developments for mobile devices that are applicable to tourism.
A database holding geographic information is referred to as a GIS (Geographic Information System) and is characterized by its large volume of information. Indexing mechanisms are needed to organize and manage this content so that it can deliver the information the user ultimately requests. Such systems therefore raise important issues on mobile platforms, because they require large amounts of memory and high processing capacity and, as a result, make heavy use of the device's battery.
This state of the art investigates requirements, mathematical formalizations, and the analysis and proposal of algorithms for handling georeferenced content. It also covers the theoretical and mathematical study of deploying the reviewed solutions on mobile devices; building on the state of the art, these solutions will have to be adapted, or new approaches proposed, to meet the requirements and needs of Xplore.
The state of the art on a given subject refers to the technical and theoretical output accumulated on a topic up to a given point in time.

Defined as bibliographic in nature, they seem to share the challenge of mapping and discussing a given body of academic work in different fields of knowledge, trying to answer which aspects and dimensions have been highlighted and favored at different times and places, and in what ways and under what conditions certain master's dissertations, doctoral theses, journal publications, and papers in conference and seminar proceedings have been produced. (Norma Sandra Ferreira de Almeida, 2002, p. 1).

Herrera (2013) notes that the state of the art shows how the main concepts and methods have been treated in existing research, which helps to frame the motivations and limitations of the intended work. By examining how authors have handled concepts, algorithms, and methods, it updates and inspires that work. This form of investigation includes an intentional and systematic bibliographic survey of a subject.

They are also recognized for adopting a methodology that inventories and describes the academic and scientific output on the topic under investigation, in light of categories and facets that are characterized as such in each work and in the set of works as a whole, under which the phenomenon comes to be analyzed. (Norma Sandra Ferreira de Almeida, 2002, p. 1).

A literature review requires a strategy for searching, systematizing, and analyzing the works (Inoue, 2015). The survey of the material used here was guided by the identification of strategies for indexing geographic models for mobile applications. The search was carried out in the Scopus database and on Google Scholar using open keywords, given how recent the subject is. At this stage, 47 scientific works were found. A content analysis was then performed on the abstracts of the texts found. To systematize this content, the coding strategy was used, that is, the application of labels that describe the compiled information.

Coding is analysis. To review a set of field notes, transcribed or synthesized, and
to dissect them meaningfully, while keeping the relations between the parts intact, is the
stuff of analysis. This part of analysis involves how you differentiate and combine the
data you have retrieved and the reflections you make about this information. (Miles &
Huberman, 1994, p. 56).

The first coding pass was descriptive and made it possible to name the issues addressed regarding geographic indexing models. This pass filtered the set down to 18 texts. From this broader universe, the categories needed to discuss the development proposed here were identified. Initially, 15 texts were selected from a reading of their abstracts, and in a second step 9 texts were read in full. Finally, 6 texts were selected to make up this literature review. They are listed in Table 1, which shows how the proposed categories and the keywords found in the texts complement each other.


Table 1. Analysis categories, topics, and keywords of the selected bibliography

Coding | Title | Keywords
Indexing, GIS, mobile | High-performance spatial indexing for location-based services | mobile computing, location-based services, spatial indexing
Indexing, GIS, mobile, cloud | Geopot: A cloud-based geolocation data service for mobile applications | spatial data service; spatial clustering; mobile application; cloud computing
Indexing, GIS, mobile | Geo-Planar Indexing (GPI) - An efficient indexing scheme for fast retrieval of raster-based geospatial data in mobile GIS applications | mobile GIS; dynamic mobile geospatial database; Geo-Planar indexing; GPI
Tourism: indexing, GIS, web | Spatial indexing of static maps for navigation in online GIS: Application for tourism web GIS | web GIS; online GIS; spatial index; Q-tree; tourist sites; static map; tourism application
Indexing, GIS, mobile | A Mobile storage system for massive spatial data | spatial data, mobile device, data storage, layered and graded
Indexing, GIS, mobile | Intelligent selection technique for database indexing to augment the speed performance of query processing on mobile device | mRTree, QuadTree, R-Tree, GIS

The state of the art intentionally included different scientific sources in order to cover as many developments as possible, following Souza et al. (2017).

Because this is a recent topic in the literature and little explored in the field of tourism, we chose not to restrict the results, using the option: all document types “Articles” or “Articles in press”, “Journals”, “Book or Book chapter”, “Article or conference paper”, “Conference Review”, “Editorial”, “Business Article”, “Short Survey” and “Erratum”. Souza V.S., Varum C.M.D.A. & Eusébio C. (2017).

Table 2 illustrates the balance of sources between material presented at conferences and work published in scientific journals. This lends relevance and depth to this state of the art, since conference presentations tend to cover more current topics, while journal articles tend to be more developed and robust.


Table 2. Sources of the selected bibliography

Conference | Journal
International Congress on Image and Signal Processing | Advanced Materials Research
International conference on World Wide Web | International Journal of Computer Applications in Technology
- | International Journal of Geographical Information Science
- | Life Science Journal

The topic has gained importance in recent years, mainly because it is tied to the evolution of mobile devices; the 2003 work stands out for already addressing this issue. At the same time, because this is a technological field, new developments appear constantly. Care was taken to ensure that the selected works span a timeline consistent with this development and evolution. It is also worth noting that no relevant advance was identified in the works from 2017 that were analyzed. Table 3 also identifies the origin of the works, two of which come from China.

Table 3. Analysis categories in time and space

Coding | Year | Affiliation
Indexing, GIS, mobile | 2003 | United States
Indexing, GIS, mobile, cloud | 2011 | Canada
Indexing, GIS, mobile | 2012 | China
Tourism: indexing, GIS, web | 2013 | Africa
Indexing, GIS, mobile | 2014 | China
Indexing, GIS, mobile | 2014 | Saudi Arabia

With this in mind, the literature survey is presented below, organized into the categories defined in the bibliographic analysis:

- Indexing of geographic models, including mobile developments and developments with cloud technology
- Applications in tourism, with open source solutions.

Each item was broken down into subcategories that detail the information collected. The main contributions and recommendations relevant to strategies for indexing geographic models are analyzed in the section Final Considerations and Further Developments.
Because the information is primarily technical and includes many mathematical demonstrations of algorithms, the explanations are kept verbatim from the original works, so that the analysis is not undermined by the imprecisions common in translation. To preserve the integrity of the equations, screenshots were taken from the original texts whenever necessary.


2 Indexing of Geographic Models

This section discusses the main methods for indexing geographic information, with emphasis on the R-tree and Q-tree algorithms.

2.1 Developments applicable to mobile devices


GIS information is, broadly speaking, information related to places, and it demands high processing and storage capacity. Current developments for mobile use therefore face challenges related to data transmission capacity (bandwidth), smaller screen sizes, and the limitations of the application itself (Boucetta et al., 2014). Spatial (or geographic) indexing is the process that makes it possible to manage these data, described as a “process to retrieve spatial data from data-store and to optimize spatial queries on spatial databases” (Boucetta et al., 2014, p. 239). The authors focus on the main spatial indexing methods, R-Tree and Quadtree, and build on them to propose a way to reduce information retrieval time on a mobile device. The idea is to have a system that can be used over 3G and 4G connections on a Windows platform.
According to the authors, existing mobile GIS approaches were designed to handle the interaction between mobile devices and servers. They point out that mobile spatial networks deal with the monitoring of moving objects (Boucetta et al., 2014). The related issues are processing time and response time. The authors present the main developments, and their limitations, for handling geographic information in a mobile environment. This information is given in Box 1.

The limitation of memory and a low computational capacity in the mobile devices are
some of the problems in the spatial index and hashing methods. The volume of spatial data and the
computational cost of spatial operations are very large; however the mobile devices still stuck on
limited memory and a low computational capacity compared to the Personal Computer (PC).
Therefore, a spatial indexing for the mobile devices should be able to achieve good filtering efficiency
as well. Some of the problem has been solved using a spatial indexing which is called MHF (Multilevel
Hashing File) method for the mobile map service. The storage utilization of MHF is using the simple
hashing technique to improve the searching speed process. Therefore, designing a density scheme of
MBR (Minimum Bounding Rectangle) called HMBR (Hybrid MBR). Future research is expected to
be useful for mobile map service, ITS (Intelligent Transportation System), LBS (Location Based
Service) that have been increasingly studied recently is still needed in this area [9]. R-tree and
Quadtree indexes that use wide framework are the best spatial data indexing methods among any other
existing spatial indexing methods for low-dimensional spatial data [13]. In queries processing, R-tree
method may be more efficient due to better maintenance of spatial immediacy, but it might slow down
in updating or index creating and implementation of own concurrency protocols on top of table-level
concurrency mechanisms, since R-tree is built logically as a tree and physically using tables inside
the database and search involves recursive SQL for traversing tree from root to relevant leaves. Linear
Quad-tree results in simpler index creation, faster update and inherit configuration in B-tree
concurrency control protocols, because those indexes calculate tile estimation for geometries and use
existing Btree indexes for performing spatial search and other DML operations [13].

Chen et al, 2003 and Francis et al., 2008 that delivers parallel methodology and contributes
on development to this research is called mRTree spatial data indexing method. QR-tree presenting a
quick speed spatial indexing structure based on Quadtree and R-tree[14-15]. It carries out data space
with the space level partition strategy of Quadtree multistage partition and uses different R-tree index
space object for each partition subspace. The research indicates that although mRTree always required
more storage space than R-tree, it increased better performance in insertion, deletion, and especially
searching. The result has showed that the more amounts of spatial data, the less cost and the better
performance of mRTree. In the other word, for a large spatial database, mRTree obtained more
superiority compared to R-tree [16]. Another similar methodology is a scalable constraint-based Q-
hash indexing for moving objects [15]. All previous researches have mixed the algorithm of R-tree
and Quadtree to create a new structure of spatial data indexing method. However, all previous
researches have faced problems on storage consumption, while it is only better in some ranges of
data as well as moving object environment.

Box 1. Main developments, and their limitations, in handling geographic information in a mobile environment.
Source: Boucetta et al., 2014, pp. 240-241.
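
The quotation above notes that a linear Quadtree computes tile codes for geometries and then reuses an ordinary B-tree to search them. A minimal Python sketch of that idea follows. It assumes a Z-order (Morton) tile key, which is a common choice for linear quadtrees but is not specified by the quoted authors; a sorted list searched with `bisect` stands in for the B-tree, and all function names and sample points are hypothetical.

```python
import bisect

def interleave_bits(x: int, y: int, bits: int) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)       # x bits fill even positions
        key |= ((y >> i) & 1) << (2 * i + 1)   # y bits fill odd positions
    return key

def tile_key(lon: float, lat: float, level: int) -> int:
    """Key of the quadtree tile containing (lon, lat) at the given level."""
    n = 1 << level                              # tiles per axis at this level
    col = min(int((lon + 180.0) / 360.0 * n), n - 1)
    row = min(int((lat + 90.0) / 180.0 * n), n - 1)
    return interleave_bits(col, row, level)

def build_index(points: dict, level: int) -> list:
    """Sorted (key, name) pairs; the sorted list plays the role of a B-tree."""
    return sorted((tile_key(lon, lat, level), name)
                  for name, (lon, lat) in points.items())

def query_tile(index: list, parent_key: int, parent_level: int, level: int) -> list:
    """All names inside one coarser tile, found with a single key-range scan."""
    shift = 2 * (level - parent_level)
    lo = parent_key << shift                    # smallest descendant key
    hi = (parent_key + 1) << shift              # one past the largest descendant key
    keys = [k for k, _ in index]
    start = bisect.bisect_left(keys, lo)
    end = bisect.bisect_left(keys, hi)
    return [name for _, name in index[start:end]]

# Hypothetical sample data: (longitude, latitude) in degrees.
POINTS = {
    "museum": (-8.61, 41.15),
    "beach": (-8.68, 41.17),
    "airport": (151.18, -33.94),
}
INDEX = build_index(POINTS, level=8)
nearby = query_tile(INDEX, tile_key(-8.61, 41.15, 4), parent_level=4, level=8)
```

Because every descendant of a tile shares its key prefix, a spatial lookup becomes a one-dimensional range scan, which is why an existing B-tree can serve as the spatial index.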

The R-tree index handles the multidimensional data characteristic of geographic information. It is chosen for the mobile environment because it handles overlapping objects, including objects stored without further detail. Figures 1 and 2 show its structure, which is explained in a verbatim quotation from the authors in Box 2.


R-tree is tree data structures which is originally come from B-tree. R-Tree is used for
demonstrating multidimensional points of data, such as indexing for spatial processing methods on
multi-dimensional information. Each node in the tree interacts until the smallest d-dimensional
rectangle that surrounds its child nodes. The leaf nodes comprise cursor to the real geometric entity
in the database, as an alternative of child. The entity are symbolized by the tiny elements associated
with rectangle in which they are enclosed [16]. Commonly, the nodes will communicate with leaf,
thus the tree is selected so that a small amount of nodes is selected during a spatial query. Spatial
query might require a number of nodes to be called before recognize the existence or absence of
specific rectangle.

If we talk about Mobile GIS technology, we suggest choosing R-tree, due to R-Tree able to
handle data containing and several overlap like “whole Earth objects” (object of full earth without
requirement or detail), bounding-box based methods will not work properly. Overlapping data band
is answered by turning the splits that decrease the exposure, using the splits that decrease border of
bouncing boxes when creating nodes. As well, Mobile GIS will store, retrieve and process spatial data;
spatial data in Geographical Information System is a compulsory element that needs to be solved for
all of the problems that might occur in the future. In order to bring better understanding of R-Tree
algorithm we have put detail explanation as follow. R1 and R2 are the root nodes. R1 is an instance,
include child nodes R3 and R4, and include them with minimum bounding rectangle. …:

• All of the leaf node include with x and X index records excluding root. The root could have
less access than x.
• Every index record within a leaf node, indicate tuple represent the minimum rectangle. It
holds the m-dimension spatial data object.
• All of the un-leaf node excluding the root has around x and X child
• Every access in un-leaf node has smallest rectangle which grasp rectangles in every child
node.
• If the root is not a leaf, it has at least two child nodes.
• If the tree is stable and all leaves will be on the same level.

A lot of Scholars have explored R-tree spatial indexing since it is one of the finest spatial
indexing. It is also recommended in Oracle spatial database.

Figure 1. Tree Structure of R-Tree [16]
Source: Boucetta et al., 2014, p. 241.


Figure 2. Tree Structure of R-Tree - 2 [16]
Source: Boucetta et al., 2014, p. 241.

Box 2. R-tree index.
Source: Boucetta et al., 2014, p. 241.
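
The nesting of minimum bounding rectangles (MBRs) described in Box 2 can be illustrated with a small Python sketch. It only builds a fixed two-level tree by hand and runs a window query; node splitting, the x/X fill limits, and insertion are left out. The node labels R1, R3, and R4 follow the quotation, while the rectangle coordinates, object names, and helper functions are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)

def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle that encloses both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def intersects(a: Rect, b: Rect) -> bool:
    """True when the two rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

@dataclass
class Node:
    mbr: Rect
    children: List["Node"] = field(default_factory=list)
    obj_id: Optional[str] = None  # set only on leaf entries

def leaf_entry(obj_id: str, rect: Rect) -> Node:
    """Leaf entry: a pointer to a stored object plus its bounding rectangle."""
    return Node(mbr=rect, obj_id=obj_id)

def parent_of(children: List[Node]) -> Node:
    """Inner node whose MBR is the union of its children's MBRs."""
    mbr = children[0].mbr
    for child in children[1:]:
        mbr = union(mbr, child.mbr)
    return Node(mbr=mbr, children=list(children))

def search(node: Node, window: Rect) -> List[str]:
    """Descend only into subtrees whose MBR intersects the query window."""
    if not intersects(node.mbr, window):
        return []
    if node.obj_id is not None:
        return [node.obj_id]
    hits: List[str] = []
    for child in node.children:
        hits.extend(search(child, window))
    return hits

# Hand-built example: R1 encloses R3 and R4, as in the quoted description.
R3 = parent_of([leaf_entry("a", (0, 0, 1, 1)), leaf_entry("b", (2, 2, 3, 3))])
R4 = parent_of([leaf_entry("c", (10, 10, 11, 11))])
R1 = parent_of([R3, R4])
```

A query such as `search(R1, (0.5, 0.5, 2.5, 2.5))` never visits the entries under R4, because R4's rectangle does not intersect the window; this pruning is what keeps the number of visited nodes small.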

The authors address the R-tree-related algorithms used to speed up the processing of database queries.

… a lot of innovative spatial cluster grouping algorithm and R-tree insertion algorithm
has been planned to speed up query processing. Those algorithm such as: k-means
clustering method and employs the 3D overlap volume, 3D coverage volume and the
minimum bounding box shape value of nodes as an integrated grouping criteria [17]. A
scalable technique which is called Seeded Clustering will allows us to maintain R-tree
indices by bulk insertion while keeping speed with high data arrival rates has also been
proposed by Lee et.al. (2006). This bottom-up update policy on R-trees will generalizes
existing update techniques and aims to augment update performance [18]. An original
bulk insertion technique for R-Trees using Oracle 10g that is fast and does not decreased
the quality of the result is also already presented [19]. A generalization for the relatives
of R-trees which is called the Multi-scale R-tree, that allows efficient retrieval of
geometric objects at different levels of detail has also been proposed [20]. An efficient
Content-Based Image Retrieval (CBIR) system using R-tree spatial indexing which
utilized shape information of images in order to assist the retrieval process is also
created by database researcher [21]. (Boucetta et al., 2014, pp. 241-242).


For the specific environment of mobile devices, the authors highlight work on the HBR-tree, on the combination of R-tree and QuadTree, and on a protocol for kNN search.

On the other hand, another scholar has describe the problem on real-time mobile GIS
based on the HBR-tree to control massive of location data efficiently have been used by
other researcher [22]. A Technique combining existing Q + R-tree and QuadTree in
terms of range query completing time by a high order of magnitude has also been
proposed[23]. An efficient protocol for the kNN search on a broadcast R-tree, which is
a popular on multi-dimensional index tree, in a wireless broadcast environment in terms
of latency and tuning time as well as memory utilization also become concern on R-Tree
Research [24-25]. (Boucetta et al., 2014, p. 242).

The considerations on the Quadtree index are kept in the authors' own words, and Figure 3 illustrates the process.

Quadtree is a tree data composition used to develop a set of hierarchical data
compositions. The general property is based on the principle of recursive decomposition
where internal node has up to four children. Basically, Quadtree is dividing two
dimension spaces then divides it into four sub-parts or regions. The part could be in
rectangular, arbitrary or square shape. Finkel and Bentley gave a name for Quadtree
on data structure in 1974. This spatial data indexing has similar partitioning method
with Q-tree. Quadtree has general decomposition methods where it crumbles the space
into flexible cell which has maximum capacity. The region will be divided, then directory
tree pursue Quadtree spatial decomposition when meet the optimum capacity [28].
(Boucetta et al., 2014, p. 242).

Figure 3. Tree Structure of QUAD-Tree [28]
Source: Boucetta et al., 2014, p. 242.
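
The recursive decomposition described in the quotation, in which a cell is split into four sub-regions once it reaches its capacity, can be sketched in Python as follows. This is a minimal point quadtree over square cells, not the authors' implementation; the class name, the default capacity of 4, and the `max_depth` guard (added so that many identical points cannot cause endless splitting) are assumptions made for the example.

```python
class QuadTree:
    """Point quadtree over a square cell with a per-cell capacity."""

    def __init__(self, x, y, size, capacity=4, depth=0, max_depth=16):
        self.x, self.y, self.size = x, y, size   # lower-left corner and side length
        self.capacity = capacity
        self.depth = depth
        self.max_depth = max_depth
        self.points = []                          # points stored in this cell
        self.children = None                      # four sub-cells after a split

    def contains(self, px, py):
        return (self.x <= px < self.x + self.size
                and self.y <= py < self.y + self.size)

    def insert(self, px, py):
        """Insert a point; returns False when it lies outside this cell."""
        if not self.contains(px, py):
            return False
        if self.children is None:
            if len(self.points) < self.capacity or self.depth >= self.max_depth:
                self.points.append((px, py))
                return True
            self._split()
        return any(child.insert(px, py) for child in self.children)

    def _split(self):
        """Divide the cell into four equal quadrants and push points down."""
        half = self.size / 2
        self.children = [
            QuadTree(self.x + dx * half, self.y + dy * half, half,
                     self.capacity, self.depth + 1, self.max_depth)
            for dy in (0, 1) for dx in (0, 1)
        ]
        for px, py in self.points:
            any(child.insert(px, py) for child in self.children)
        self.points = []

    def count(self):
        """Total number of points stored in this cell and below."""
        if self.children is None:
            return len(self.points)
        return sum(child.count() for child in self.children)

# Hypothetical usage: the fifth point exceeds the capacity and triggers a split.
tree = QuadTree(0, 0, 100, capacity=4)
for point in [(10, 10), (20, 20), (60, 60), (70, 20), (30, 80)]:
    tree.insert(*point)
```

Unlike the R-tree, the cell boundaries here depend only on the space being divided, not on the data, which is one reason index creation and updates are simpler.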


In presenting the Quadtree algorithm, the authors deal with image processing.

At this time, Quadtree is used for point data, curves, surfaces, areas and volumes. It will
be divided to the same parts on each level, or may be managed by the input. This process,
in image processing, is often expressed in terms of image space hierarchy against object
space tree. The decomposition resolution can be repaired, or may be arranged by some
materials of input data. In some applications the origin of data formation whether they
state the restrictions of sections can also be distinguished. Although it is not
recommended for particular spatial indexing, there are many advantages of using
Quadtree spatial indexing on special circumstances. An algorithm based on applying
eigenspace methods has been presented to a Quadtree of related set images to solve
estimation problem in the occurrence of occlusion or background clutter. The inability
to locate desired object and apply the appropriate normalizations effortlessly, are
efficiently overcome by the recursive Quadtree procedure [26]. Furthermore, the new
structure of Multi version Linear Quadtree (MVLQ) has been introduced based on
spatio-temporal access method. This indexing structure can be used as an index
mechanism for storing and accessing evolving raster images [27]. (Boucetta et al., 2014,
p. 242).

Having explained the main indexing methods, R-tree and Quadtree, the authors go on to present their combination as the most effective way to handle mobile GIS applications. In doing so they aim to exploit the best of each conventional method. The combination is called mRTree. Although it requires more storage space, its main advantage is flexibility: it adapts to the characteristics of the data. In practice, the benefit of this combination is faster transfer of spatial data, which is what matters in a mobile environment (Boucetta et al., 2014). The considerations on the method are kept in the authors' own words. Figure 4 outlines this combination.

There are advantages and disadvantages of combining the indexes and build a selection
engine. The intelligent selection technique named as mRTree has disadvantages on
required more storage compared to regular index, but the difference storage is not too
significant and combination of the indexes are not flexible (only better in some ranges
of data). In contrast, mRTree method has advantages on flexibility, which depends on
the condition of the data; it also can choose which index is most suitable with the current
condition. This benefit gives significant improvement if we use the suitable spatial data
indexing. This section we describe how Hybrid Quadtree and R-tree spatial data
indexing method are used to create mRTree. The proposed fusion of Quadtree and R-tree
spatial data indexing method is based on the weakness and strength of each those
spatial data indexing methods. mRTree is used to combine those spatial data indexing
techniques. In implementation part, some of the tables used Quadtree spatial data
indexing, while the others use R-tree spatial data indexing method. One or two spatial
data indexing method will be implemented in a single spatial database. The use of R-
tree and Quadtree spatial data indexing method in different tables is based on the
condition of data and requirement of applications which will augment the speed of
spatial data query processing. (Boucetta et al., 2014, p. 242).

Figure 4. Architecture of Intelligent selection technique
Source: Boucetta et al., 2014, p. 243.

By combining the methods, the system selects the most suitable one based on the following conditions: the number of spatial records, the memory and processor performance of the mobile device, and the available bandwidth (Boucetta et al., 2014). Table 1 of the original paper, reproduced in Figure 5, lists the specifications of the system used.


Web Server and Database Server
  Processor: Intel Pentium i5
  RAM: 3GHZ
  Web Server: IIS
  GIS Server: Oracle Application Server, Map Viewer and MapGuide Open Source
  Database: Oracle 10g Spatial
Mobile Devices
  Smartphone: HTC Touch Pro2
  OS: Windows Mobile 6.5
  Web Browser: Personal Internet Explorer

Figure 5. Table 1. mRtree System specification
Source: Boucetta et al., 2014, p. 243.
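
The selection step described above takes the number of spatial records, the device's memory and processor performance, and the available bandwidth as inputs. The Python sketch below only shows the shape of such a selector. The decision rule and every threshold in it are invented for illustration; they are not the authors' formulation and would need to be calibrated against measurements.

```python
from dataclasses import dataclass

@dataclass
class Conditions:
    record_count: int        # number of spatial records to be queried
    free_memory_mb: float    # memory available on the mobile device
    cpu_mhz: float           # processor clock of the mobile device
    bandwidth_kbps: float    # measured bandwidth of the wireless link

def choose_index(c: Conditions,
                 *,
                 max_records_for_quadtree: int = 10_000,
                 min_memory_mb: float = 64.0,
                 min_cpu_mhz: float = 500.0,
                 min_bandwidth_kbps: float = 1_000.0) -> str:
    """Return "quadtree" or "rtree" for the given conditions.

    Assumed rule (hypothetical): prefer the Quadtree, which is simpler to
    build, for small data sets or constrained devices and links, and the
    R-tree, which the quoted text rates higher for query processing,
    otherwise.
    """
    constrained = (c.free_memory_mb < min_memory_mb
                   or c.cpu_mhz < min_cpu_mhz
                   or c.bandwidth_kbps < min_bandwidth_kbps)
    if c.record_count <= max_records_for_quadtree or constrained:
        return "quadtree"
    return "rtree"

# Hypothetical inputs for a small and a large data set on the same device.
small = choose_index(Conditions(2_000, 128.0, 528.0, 3_000.0))
large = choose_index(Conditions(256_000, 128.0, 528.0, 3_000.0))
```

Keeping the thresholds as keyword arguments makes it straightforward to tune the rule per device class without changing the selector itself.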

Equation 1 shows the formulation of the indexing technique developed.

Figure 6. Screenshot
Source: Boucetta et al., 2014, p. 243.

The authors test the proposed technique, and their results are presented here. The technique consists, essentially, of an intelligent selection of the most suitable method. Figure 7 shows the search process for a specific location. The map used was that of Pasir Gudang, a city in Malaysia, with data provided by the authors' university. They started from a small set of records and scaled up to peaks of 256,000 records.

Figure 7. Result of mTRee run on HTC smartphone- 1
Source: Boucetta et al., 2014, p. 243.

Figures 8 and 9 show part of the test results. The tests were run over 3G and 4G wireless networks, as well as on the local server in a web environment. The test shows that, when the complete map of spatial information for the city studied is loaded, the data are clear and easy to interact with. The data resolution, however, must be reduced for data-loading performance to improve. On the performance of mRTree, in the authors' words, “this process required 3.5 second for 2000 records until 35 second for 256000 records” (Boucetta et al., 2014, p. 244).

Figure. 6 and 7 has shown promising result on mRTree with small response times even
on large quantity of data processing. … There are 6.13% faster if we use 4G connection
to load Pasir Gudang map as spatial data. If the wireless connection speed is below 3G
speed, the response time of query will increase rapidly (speed decrease). (Boucetta et
al., 2014, p. 244).
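
Taking the two response times reported by the authors (3.5 seconds for 2000 records and 35 seconds for 256000 records) at face value, a short calculation makes the scaling explicit. The per-record averages below are derived here for illustration and are not figures given in the paper.

```latex
\[
\frac{3.5\ \text{s}}{2000\ \text{records}} = 1.75\ \text{ms per record},
\qquad
\frac{35\ \text{s}}{256000\ \text{records}} \approx 0.137\ \text{ms per record}
\]
\[
\frac{256000}{2000} = 128 \quad \text{(growth in records)},
\qquad
\frac{35}{3.5} = 10 \quad \text{(growth in response time)}
\]
```

A 128-fold increase in records thus corresponds to only a 10-fold increase in response time, which is consistent with the authors' remark that response times stay small for large quantities of data.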


Figure 8. Response time of various indexing technique compared to mRTree- Using HTC mobile 3G
connection
Source: Boucetta et al., 2014, p. 244.

Figure 9. Response time of various indexing technique compared to mRTree- Using HTC mobile 4G
connection
Source: Boucetta et al., 2014, p. 244.


Figure 10, in turn, refers to the tests run on the server on a web platform, which highlights the role of network architecture in the process. For the authors, given a strong data network, the technique will be able to process large amounts of data.

Even though processing on server, the speed of mRTree is much faster compared to
other algorithm like QuadTree or R-tree or even without any indexing. This
performance is believed to bring great change on mobile GIS technology, because with
strong support of network architecture and mRTree technique, it will be possible to
process large data and displayed on mobile devices like HTC or other Smartphone.
(Boucetta et al., 2014, p. 244).

Figure 10. Response time of various indexing technique compared to mRTree- on Server using web
Browser (LOCAL ACCESS)
Source: Boucetta et al., 2014, p. 244

By developing a technique that identifies the best method for handling spatial data, the authors' work sheds light on some aspects of the main algorithms involved in the discussion of GIS data indexing in mobile environments.

The results on this study have shown the capability of mRTree to increase the
performance of query processing on mobile device through wireless broadband
connection. The intelligent selection technique holds important roles to make the best
selection when to use R-tree or Quadtree indexing. This choice can save the time
consumption for communication between server and mobile client. (Boucetta et al.,
2014, p. 244).

They also note that the technique can be improved, for example by reducing the storage and memory consumption of mRTree processing.
Myllymaki & Kaufman (2003) also discuss methods for indexing geographic information, introducing the concept of location-based services (LBS): services that can identify the geographic position of the mobile device (Wikipedia) and can therefore offer more accurate GIS information. LBS challenge the indexing process because they take into account the movement of the object being georeferenced.

Second-generation LBS applications work on moving queries and highly
dynamic data. Consider queries such as “where are my friends?” (buddy tracking [2])
and “is someone following me?” These queries relate the position of the user asking the
question to the position of other users. Effective evaluation of these types of queries
requires a dynamic spatial index, an index that is continuously kept up-to-date with the
current (potentially also past and future) position of all users. The maintenance of the
index induces a very high update and query load, an issue that is receiving increasing
interest in literature [12, 13].
Proposals have been made to reduce the number of required updates by
filtering the location updates of the objects being tracked, thereby trading off some
amount of accuracy for higher performance, or storing in the index trajectories of the
objects and updating the information only when those trajectories change [8]. For
instance, if a user is traveling on a relatively straight road, there is no reason to
constantly update the index. Instead, it is sufficient to mark the position and velocity
vector of the user at the start of a linear segment of the road and update the index only
when the road or the user makes a turn.
Regardless of whether one stores simple geographic coordinates or trajectories
in an index, the issue remains that the index needs to be updated and queried as quickly
as possible. A trade-off that LBS middleware should be ready to make is not to try to
store all location data persistently – current external storage technologies simply
cannot sustain the update rates required. If a future storage technology provides the
speed of main-memory chips and happens to be persistent, that’s all the better, but for
now transient data structures should be accepted for high-performance location
tracking. We note that, in the event of main-memory loss, the index can be recreated
within a short period of time with the continuously arriving location data from the
objects being tracked. We also note that persistent storage should be used for
accumulating historical location data, making it amenable to non-realtime analysis, e.g.
pattern discovery [5]. (Myllymaki & Kaufman, 2003, p. 112).
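
The trajectory idea in the passage above can be illustrated with a small dead-reckoning filter. This is a minimal sketch, not the authors' implementation: the class name `TrajectoryFilter`, the `tolerance` threshold, and the use of planar coordinates are all assumptions made for illustration. The index keeps a position and velocity vector per object and is touched again only when a reported position drifts away from the predicted one.

```python
import math


class TrajectoryFilter:
    """Suppresses index updates while an object follows its stored trajectory.

    Hypothetical helper: the names and the tolerance value are illustrative.
    """

    def __init__(self, tolerance=5.0):
        self.tolerance = tolerance  # maximum allowed drift, in coordinate units
        self.index = {}             # object id -> (t0, x0, y0, vx, vy)
        self.last_seen = {}         # object id -> (t, x, y) of the latest report

    def report(self, obj_id, t, x, y):
        """Process one location report; return True if the index was updated."""
        entry = self.index.get(obj_id)
        if entry is None:
            # First sighting: store the position with a zero velocity vector.
            self.index[obj_id] = (t, x, y, 0.0, 0.0)
            self.last_seen[obj_id] = (t, x, y)
            return True
        t0, x0, y0, vx, vy = entry
        # Predict where the object would be if it kept its stored velocity.
        px = x0 + vx * (t - t0)
        py = y0 + vy * (t - t0)
        drift = math.hypot(x - px, y - py)
        lt, lx, ly = self.last_seen[obj_id]
        self.last_seen[obj_id] = (t, x, y)
        if drift <= self.tolerance:
            return False  # still on the stored linear segment, no index update
        # The object turned or changed speed: start a new segment here,
        # estimating the velocity from the two most recent reports.
        dt = t - lt
        if dt > 0:
            vx, vy = (x - lx) / dt, (y - ly) / dt
        else:
            vx, vy = 0.0, 0.0
        self.index[obj_id] = (t, x, y, vx, vy)
        return True
```

An object moving in a straight line triggers an update only when it is first seen, when its velocity is first learned, and when it turns; the reports in between are filtered out, which is the accuracy-for-performance trade described in the quotation.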

The authors' work analyzes the performance of three indexing methods with this dynamic
aspect of LBS in mind. The methods are the R-tree (some aspects of which have already been
presented in this state of the art), the ZB-tree, and the hashing method
(custom array/hashtable method).
According to the authors, LBS integrate spatial data management with wireless
communication (Myllymaki & Kaufman, 2003). This means having a relatively precise position
for the user of the mobile device, one that must be constantly updated.

New applications for real-time location information are emerging. … Today,
mobile phone operators have a choice of half a dozen technologies for location
determination, each with distinct power consumption, handset compatibility, time-to-
first-fix, and in-building coverage characteristics. The focus in spatial data management
has traditionally been on Geographic Information Systems (GIS), VLSI design, and
mechanical CAD applications [3]. Dozens of multidimensional access methods have
been proposed for managing spatial data, including grid indices, quadtrees, R-trees, k-
d-trees, and space-filling curves such as Z-order [10]. Most of this research has been
done in the context of static data (e.g. cartographic data) and few albeit relatively
expensive queries (e.g. spatial joins). Moreover, the data usually consists of complex
spatial objects, e.g. polylines and polygons with hundreds of vertices or more [11].
In contrast, a spatial index for location-based services contains a very large
number of simple spatial objects (e.g. points) that are frequently updated. These “moving
object databases” pose new challenges to spatial data management [13]. The workload
is characterized by high index update loads and relatively simple but frequent queries
(e.g. range queries). A location update may merely contain the position of a user or
include a user’s trajectory (direction and speed over time) [12, 8]. Supporting
trajectories adds additional requirements to the index and query scheme [1]. A location
update may also expire at a certain point in time [9]. Finally, location data is inherently
imprecise because location determination yields only an estimate of a user’s location.
(Myllymaki & Kaufman, 2003, p. 113).

The work then analyzes the performance and scalability of the three indexing methods.
For the tests, the authors use the LOCUS Dynamic Spatial Indexing Testbed with data
generated by City Simulator (Myllymaki & Kaufman, 2003). Further considerations about the
experiment are kept in the authors' own words.

… In [7], we defined the DynaMark benchmark with associated metrics for
measuring the cost (elapsed time) of inserting a user’s location, the cost of updating a
user’s location, the size of a spatial index, and the cost of executing various classes of
spatial queries. The performance experiments are carried out in the LOCUS Dynamic
Spatial Indexing Testbed [6], an extensible system for conducting performance
experiments on dynamic spatial indexing methods. The testbed is easily extended to
include new spatial indexing methods, new location data sources (both simulated and
real), new spatial query types, and new data visualization or analysis methods.

Our analysis focuses on range queries, the most common query type in LBS
applications. The experiments store in an index every single location update received
(i.e. generated by the generator), but the performance difference would be very small
even if the data contained trajectory information instead of simple geographical
coordinates because few additional attributes (velocity vector) need to be stored. Hence,
we argue that the maximum update and query rates shown in this paper would apply to
trajectory-based schemes reasonably well. (Myllymaki & Kaufman, 2003, pp. 112-113).

Because the authors' work is based on experiments carried out with the respective
indexing methods, Box 3 summarizes the conditions of these tests. The explanations are
kept in the authors' own words.

DynaMark Benchmark
The DynaMark benchmark measures the performance and scalability of a spatial data
management system [7]. The benchmark executes a set of standard, dynamic spatial queries against a
set of location trace files produced with City Simulator [4]. Performance metrics consist of the cost of
updating a user’s location in the spatial index and the cost of spatial queries against the index.

The size of an individual benchmark run is determined by the number of mobile users contained
in the location trace file and ranges from 10,000 to one million or above. Each record in the location
trace file represents a location update that contains the following information: object ID (identifies
mobile user), timestamp (indicates the time when the location was determined or reported by the user),
and X, Y, and Z coordinates of the location. The X and Y coordinates represent the longitude and latitude
values of the location, and Z indicates elevation in meters.

The benchmark defines three types of queries: proximity queries, k-nearest neighbor (kNN)
queries, and sorted-distance queries. The queries are typically centered around the location of a user
issuing the request. A proximity query finds all objects (mobile or stationary) that are within a certain
range, while a kNN query finds the k nearest objects. A sorted-distance query lists objects in increasing
distance order relative to some reference point.

LOCUS Spatial Indexing Testbed


The LOCUS testbed provides a convenient way to run performance experiments on spatial data
management systems and produce performance data that conform to the DynaMark benchmark
specification. The testbed is written in portable C and has been tested on several platforms, including
Windows, AIX, and Solaris. We are currently making the testbed code available on the IBM alphaWorks
developer Web site at http://alphaworks.ibm.com.

The testbed is extensible with new spatial indexing methods, query generators, and index
visualization methods (Figure 1). The testbed defines a C API for each of the extension types, using
function pointer arrays to achieve something analogous to the Java “interface” concept. An indexing
method extension provides a spatial indexing capability and supports API calls for creating, managing,
and searching an index. A query generator creates one or more spatial queries, typically using the
location of an existing user as a reference. For example, a proximity query is a range query around a
user’s location. An index visualization method provides a simple way for the testbed to plot an index,
e.g. minimum bounding rectangles in an R-tree index.

Figure 11. Architecture of LOCUS Spatial Indexing Testbed
Source: Myllymaki & Kaufman, 2003, p. 113.

Location trace files for the testbed are produced with City Simulator [4], a scalable, three-dimensional
model city that simulates an arbitrary number of mobile users moving about in a city, driving on streets or walking
on sidewalks and entering and moving inside buildings. We typically simulate population sizes ranging up to 1
million users.

Extending LOCUS
The testbed has been adapted to use most commercial database systems and their spatial extensions, as
well as several main memory indexing methods. The public version of the testbed comes with two main memory
extensions: a naive array extension and a ZB-tree extension. The array extension implements a combination of a
simple array and hashtable which work together to record the position of moving objects. Updates into this data
structure are extremely fast but queries obviously perform poorly. The array method is intended to be used as a
baseline and sanity check for the experiments. Any real index should perform better than this baseline method.
The ZB-tree method calculates a Z-order value for a moving object in the extension code and stores it in
a binary tree. We used a readily available implementation of binary trees, namely the libavl library that implements
several varieties of them. Z-order values are stored in a right-threaded AVL-tree which performs very well for
proximity queries, requiring only a left-to-right traversal of the leaf nodes of the tree. Sentinel entries representing
the lower-left and upper-right corners of the proximity query window are first inserted into the ZB-tree. Leaf nodes
between the sentinel entries are traversed and returned, followed by removal of the sentinel entries from the tree.
The testbed has also been adapted to use a custom implementation of a memory-resident R-tree and its
variants. Updates to the tree are done in two steps. Realizing that objects make only slight movements between
location updates, the update first checks to see if the current leaf node that contains the object can continue to hold
it, albeit with new coordinates. Failing that, a delete followed by a normal top-down insertion is done. This scheme
dramatically improves the update performance of the tree index. Lookups are currently performed top-down,
however, an improved scheme would be to search bottom-up until the desired window for a range query, or the
number of items found for a kNN query, has been satisfied.

SETUP FOR PERFORMANCE EXPERIMENTS


A location tracking system that scales up to a hundred million moving objects and beyond is achievable
by partitioning the problem and data space into smaller, local problems that each can be solved using a single
machine. Our task therefore was to measure the scalability of the three main-memory spatial indexing methods on
one standalone machine. We ran the experiments on an IBM xSeries 330 server, a commodity machine equipped
with one 1 GHz Pentium III processor and 1 GB RAM, half of which was reserved for the spatial index (however,
none of the methods required more than 50 megabytes to index even the largest population size). We varied the
size of the mobile user population in the experiments from 10,000 users to 1,000,000 users. The server was
dedicated to the performance tests and there were no other processes on it during the tests.

The City Simulator toolkit was used to produce location trace files for each of the population sizes tested
(10K, 50K, 100K, 200K, 500K, and 1M users). The trace files were stored on a 7200 RPM SCSI disk whose data
transfer rate far exceeded the update performance of the indexing methods, guaranteeing that the testbed was never
waiting for a disk read.

We ran the UPDATE test and query type Q1 of the QUERY tests defined in the DynaMark benchmark.
The UPDATE test updates the position of each mobile object once in each 30-second period (in simulated time)
and was executed until either the trace file was completely processed or an upper time limit of 6 hours had elapsed.
Update cost, index size, and other metrics measured by the LOCUS testbed were collected at an interval of 2500
updates.

Query type Q1 of the QUERY test represents a sequence of random proximity queries (details below).
The total execution time of the test was limited to 12 hours, which allowed the slowest indexing method to process
several iterations of even the largest population size, that is, even for the one million population size the position
of every object was updated and queried numerous times. Query cost and other statistics were measured after every
2500 location trace records.

Proximity (range) queries were created by picking an object from the index at random (uniform
distribution) and then constructing a fixed-size window around that object. The size of the window was
approximately 1% of the total area of the coordinate space in these experiments. Consequently, the number of
results returned by these queries was, on average, 1% of the corresponding population size. Only after the objects
in the result set had been extracted from the index and stored in a vector (ready to be returned to a potential
application program) was the execution of the query considered complete.

The correct coding of the R-tree index, the most complicated of the three methods investigated, was
validated by running several scaled-down experiments where the tree was visualized and inspected. We
experimented with two variations of R-trees: the basic R-tree, which splits nodes by minimizing the total area of
the split nodes but permits an overlap between nodes, and the R*-tree, which eliminates overlap but leads to
potential deadlocks where non-overlapping splits do not exist. In our experiments deadlocks were not possible,
however, since we indexed point objects only and a clean split was always found.

Box 3. Experimental conditions
Source: Myllymaki & Kaufman, 2003, pp. 113-115.
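
The Z-order approach behind the ZB-tree, as described in the quoted material above, can be sketched in a few lines. This is an illustration under stated assumptions and not the LOCUS code: a sorted Python list searched with `bisect` stands in for the right-threaded AVL tree, coordinates are assumed to be non-negative integers that fit in 16 bits, object IDs are assumed to be integers, and the names `z_value` and `ZOrderIndex` are hypothetical. Because the Z-value grows with both x and y, every point inside a query window has a Z-value between those of the lower-left and upper-right corners; the scan over that interval still has to discard points that fall outside the window.

```python
import bisect


def z_value(x, y, bits=16):
    """Interleave the bits of x (even positions) and y (odd positions)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


class ZOrderIndex:
    """Toy moving-object index ordered by Z-value (sorted list, not an AVL tree)."""

    def __init__(self):
        self.entries = []  # sorted list of (z, obj_id, x, y)
        self.by_id = {}    # obj_id -> current entry

    def update(self, obj_id, x, y):
        """Insert the object, or move it if it is already indexed."""
        old = self.by_id.get(obj_id)
        if old is not None:
            self.entries.pop(bisect.bisect_left(self.entries, old))
        entry = (z_value(x, y), obj_id, x, y)
        bisect.insort(self.entries, entry)
        self.by_id[obj_id] = entry

    def range_query(self, x_min, y_min, x_max, y_max):
        """Return the IDs of all objects inside the window, in ascending order."""
        lo = bisect.bisect_left(self.entries, (z_value(x_min, y_min),))
        hi = bisect.bisect_left(self.entries, (z_value(x_max, y_max) + 1,))
        # The Z-interval is a superset of the window, so filter each candidate.
        return sorted(
            obj_id
            for _, obj_id, x, y in self.entries[lo:hi]
            if x_min <= x <= x_max and y_min <= y <= y_max
        )
```

The two `bisect` probes play the role of the sentinel entries mentioned in the quotation: they delimit the run of entries to traverse without being stored in the structure.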

The results of the experiments shed light on different aspects of the methods in
question. The plots in Figure 12 show the performance of the ZB-tree index, which is
explained in a verbatim quotation.

Figures 2 and 3 show the location update and proximity (range) query cost of
the ZB-tree method as a function of population size and number of updates performed.
The X axis corresponds to the time dimension during the experiments, with the left edge
of each curve corresponding to the start of the experiment and the right edge
corresponding to the conclusion. We see that the update and query cost experience
initial randomness but eventually stabilize after the position of each object has been
updated several times and the index has grown to its full size.
We have plotted the graphs with the number of updates performed (number of
trace file records processed) on the X axis to make it easier to compare the population
size with the corresponding record numbers. For instance, with the 1M population size
the first 1 million records merely populate (insert) the index, with subsequent records
causing actual updates to the index. In Figure 2, we can clearly see a change in the
behavior of the update cost from the initial steady increase to the eventual settling and
minor oscillation (most of which is due to operating system and clock measurement
artifacts, given that the measurements are on microsecond scale). Similarly, Figure 3
shows that the query cost plateaus at the insert/update knee point.
The values of the curves shown in Figures 2 and 3 at the end of the experiments
correspond to the final, steady-state performance of the ZB-tree. We also extracted the
corresponding steady-state update and query cost values for the R-tree and
array/hashtable methods. In Figure 4 we show the steady-state update cost of each
method as a function of population size. We observe that the array/hashtable method
dominates the tree methods, as is expected, given its simple processing requirements.
The average update cost of the array method increases from 2 to 8 microseconds as the
population size increases to one million. In other words, with an index of one million
moving objects, the array method can process 125,000 updates per second. (Myllymaki
& Kaufman, 2003, p. 115).

Figure 12. Screenshot.
Source: Myllymaki & Kaufman, 2003, p. 114.
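
The link between per-update cost and throughput in the quotation above is a simple reciprocal, shown here as a small worked example. The function name is hypothetical; the 2 and 8 microsecond figures are the ones quoted for the array/hashtable method.

```python
def updates_per_second(cost_us):
    """Sustained update rate for a given average cost per update (microseconds)."""
    return 1_000_000 / cost_us


# Array/hashtable method at one million objects: 8 microseconds per update.
rate_at_1m = updates_per_second(8)   # 125,000 updates per second
# The same method at the smallest population size: 2 microseconds per update.
rate_small = updates_per_second(2)   # 500,000 updates per second
```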

The authors make some remarks about the throughput of the methods when indexing moving
objects. The ZB-tree method can process 40,000 updates per second on a base of one million
moving objects. The R-tree and R*-tree (a variant of the original) methods, on the same
data, process 2,000 updates per second (Myllymaki & Kaufman, 2003). Figure 13 presents the
comparative steady-state query cost of the methods. The authors comment on the reasons for
the weaker performance of those methods.

What was a performance disadvantage for the R-tree and R*-tree during index
updates, has turned into a performance advantage during lookups. Their performance
dominates that of the other methods, if by only a factor of 2 margin or less. It is
interesting to note that while the naive array/hashtable method performs poorly with
small-to-medium population sizes, its simplicity contributes to the narrowing of the
performance gap at larger population sizes. Whereas at population size 100K the gap
is roughly a factor of 15, at population size 500K it is less than 10, and at 1M it is down
to 4 or less.
A likely explanation for the deterioration of query performance of R-tree and
R*-tree is that with very large population sizes the indices become too fragmented and
their selectivity decreases. Too many moving objects compete for space in the same node
of the tree, resulting in frequent node splits and small bounding boxes. The precise
behavior of the index may depend on the relationship between the number of objects
that fit into a node (in our case, 204 objects in each 8 kB node) and the areal density of
moving objects. Also note that the size of query boxes created by our query generator is
independent of the population size. Hence, the same query at population size 1M will
return 10 times as many matching objects as with population size 100K, on average.
However, all indexing methods tested in our experiments were subject to the same
increase in result set size.
In summary, with one million objects in the index, the R-tree method can process
7 requests per second, while the R*-tree and ZB-tree methods can process 5 requests
per second, and the array method 2 requests per second. (Myllymaki & Kaufman, 2003,
pp. 115-116).

Figure 13. Screenshot
Source: Myllymaki & Kaufman, 2003, p. 115.

The authors present the mathematical development used to calculate the maximum
population size that each method can accommodate. The information is in Box 4.

To estimate the maximum population size supported by an indexing method, we fitted a
function f(P) = a · P + b to the measured update and query costs of the indexing methods.
Parameter P is population size and f(P) is the estimated cost of the database update or query for
that population size. We assume that the workload consists of a sequence of database requests, issued
at an average interval of i seconds per moving object, where a given request is an update with
probability p and query with probability 1 - p. Therefore, in a system where moving objects report
their location every 5 minutes and every update is followed by a query, the values are i = 300 and
p = 0.5.

A distinct cost function and parameters a and b are needed to represent updates and queries.
By solving the simultaneous linear cost equations for queries and updates one can define new
parameters a′ and b′ based on the weighted average:

Figure 14. Screenshot
Source: Myllymaki & Kaufman, 2003, p. 116.

Box 4. Mathematical development for calculating the maximum population size accommodated by
each method
Source: Myllymaki & Kaufman, 2003.
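
The closed-form expressions of the paper are available here only as a screenshot, so the following is one plausible reading of the model, not a transcription of the authors' equations. It assumes that the blended cost parameters are the probability-weighted averages of the update and query parameters, and that a single server saturates when the P / i requests arriving each second, each costing a · P + b seconds, consume exactly one second of processing time. Under that assumption the maximum population is the positive root of a · P² + b · P - i = 0. The function names and the numeric values in the usage note are hypothetical.

```python
import math


def blended_cost(a_update, b_update, a_query, b_query, p):
    """Weighted-average cost parameters for a mix of updates (p) and queries (1 - p)."""
    a = p * a_update + (1 - p) * a_query
    b = p * b_update + (1 - p) * b_query
    return a, b


def max_population(a, b, i):
    """Largest P satisfying (P / i) * (a * P + b) <= 1.

    a, b: parameters of the linear cost model f(P) = a * P + b, in seconds.
    i:    average interval between requests per moving object, in seconds.
    """
    if a == 0:
        return i / b  # constant cost per request
    # Positive root of a * P**2 + b * P - i = 0.
    return (-b + math.sqrt(b * b + 4 * a * i)) / (2 * a)
```

For instance, with made-up parameters a = 1e-11 and b = 2e-5 and a 5-minute reporting interval (i = 300), `max_population(1e-11, 2e-5, 300)` gives roughly 4.6 million objects; real values of a and b would have to come from fitting measured costs.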

Figure 6 of the paper covers the maximum populations supported by each method, as well
as the processing time. Figure 7 of the paper presents the results of the ZB-tree method
under different workload conditions. Both figures appear in Figure 15. The detailed
explanations are kept in the authors' own words.

Using the estimates provided by Equation 3, in Figure 6 we show the maximum
population sizes supported by the indexing methods at different reporting intervals
(values of i) in an update-only tracking application. We observe that the array/hashtable
method supports up to ten million moving objects when the reporting interval ranges
between 1 and 15 minutes. A reporting interval of 30 seconds reduces the maximum
population size to just under one million. The ZB-tree method is roughly a factor of 2
less capable. Still, with a realistic reporting interval of a few minutes it can support a
population of several million objects. In contrast, the R-tree and R*-tree methods were
limited to population sizes of less than one million with a comparable reporting interval.
Figure 7 shows the maximum population size supported by the ZB-tree method
under various update/query workloads. When the workload consisted of updates only
and each moving object reported its location every 5 minutes, the indexing method could
handle roughly 5 million moving objects. The high capacity was made possible by the
low update cost of the ZB-tree shown in our experiments, approximately 20
microseconds per update. The maximum population size dropped quickly as the fraction
of queries increased. With a 99% update ratio, maximum population size was 0.5
million, and with 95% updates it dropped to 0.3 million. With higher query loads the
maximum population size settled at the 100,000 to 250,000 range. (Myllymaki &
Kaufman, 2003, pp. 116-117).

Figure 15. Screenshot
Source: Myllymaki & Kaufman, 2003, p. 116.

Handling moving objects, which is the purpose of applications that deal with geospatial
information in a mobile environment, demands strong update performance and query capacity.
By analyzing the three usual spatial indexing methods, the authors' work makes their
behavior easier to understand. They highlight three main findings from the tests they ran:
it is feasible to handle large volumes of moving data without large infrastructure; the
workload of the LBS application must be considered when choosing the ideal indexing method;
and each method involves a trade-off between update performance and query performance. As
follow-up work, the authors cite the need to investigate techniques that reduce the number
of updates the index requires.

First, even with a low-cost, commodity server one can comfortably maintain and
query an index of up to several million moving objects in real time. In the worst case
where every index update is accompanied by a query (“I am here, where is everybody
else?”) at least 100,000 moving objects can be tracked in realtime on the server. By
using filtering mechanisms or storing trajectory information instead of individual
location updates, one can boost the performance even further.
The second observation is that the index workload produced by an LBS
application directly influences the choice of the indexing method [emphasis added].
Specifically, the relative frequency of updates vs. queries favors either update-optimized
or query-optimized indexing methods. We point out that there may very well be location
tracking applications where updates dominate the workload and sheer update power is
called for, at the expense of query performance. For these applications, even the naive
array/hashtable solution provides the scalability to populations of several million
objects on a single processor.
In contrast, applications where at least a moderate fraction of requests are
queries will want to trade off update performance for query performance. Our
experiments showed that, for all the three indexing methods, query cost is typically 1 to
3 orders of magnitude higher than update cost on the same population size [emphasis
added]. (Myllymaki & Kaufman, 2003, p. 117).

Shea & Cao (2012) advance the understanding of GIS applications in mobile environments
by proposing an indexing method that provides a simpler and faster way to fetch
georeferenced map images (a core operation in GIS applications). The proposed index,
Geo-Planar Indexing (GPI), is considered superior to the usual R-tree method.
To develop the algorithm for the proposed index, the authors detail what having
georeferenced images on a mobile device entails.

Basically, the transmission of map contents to a mobile client comprises two
kinds of information – geographic data and attributes. There is a common set of
geospatial data that needs to be provided to mobile clients irrespective of the type of
application. This common set of geospatial data contains: (1) road features; (2) building
features; (3) hydrological features such as coastlines, rivers and reservoirs; and (4)
major point features such as bus stops and addresses. This set of data is usually known
as the “map base”. The map base data can be regarded as a set of collected objects
containing geo-referenced images and thus it possesses the following characteristics:
(1) delivered to the client as a single image; and (2) each single image is geo-referenced
(i.e. position information is embedded in the image).
The map base data serves as the backdrop for displaying other geographically
related data. Therefore, a second set of data associated with the map base, is sent at the
same time to the client. This set of data is known as the “application data”. Usually, the
application data contains more descriptive information than position information. The
number of data layers to be included can be one or more. References 1 through 10
provide some underlying technologies and principles incorporated in the development
of mobile GIS.
The main purpose for devising the geospatial index numbering system, called
the Geo-Planar Index (GPI) method, is to provide a faster and simpler way, to fetch the
regularly geo-referenced map base images, than that offered by the conventional R-
Tree spatial access algorithm. The usual case is to determine the map base tile that
contains the input point, using a spatial search algorithm called the point-in-polygon
test. That is, an algorithm is devised to find the spatial index number of the map base
tile that contains the input coordinates of a given point. (Shea & Cao, 2012, p. 1047).

Box 5 reproduces the information on the processing of the Hong Kong map used to build
the proposed index.

To start with the numbering algorithm for our GPI method, a seamless base map image covering
the whole of the Hong Kong area is first created from a series of 1:5000 maps. The ground coverage
for each of these base images is 3750m in the x direction and 3000m in the y direction. The
corresponding image size is 4800 x 3840 pixels. Each base image is partitioned into 15 rows by 15
columns evenly to give a total of 225 small tiles. Following this partitioning concept, the whole of
Hong Kong is covered by 57600 small tiles (16 x 4 x 4 x 225) in a mesh of 240 rows by 240
columns.

The approach to obtaining the GPI number of a small tile in the seamless base map image is
based on the coordinates of the lower left corner of the small tile. The GPI number for each is an
8-digit number defining the row number and column number of the small tile position in the mesh
of 240 rows by 240 columns. The first four digits designate the row number, and the last four digits
designate the column number. The lower left corner of the small tile has a zero value for both row
number and column number. The row number is increased in the “northing” direction and, the
column number in the “easting” direction. In order to avoid leading zeros, both row number and
column number are shifted by a value of 8000 (refer to Fig. 1). The spatial index number for the
lower left corner of the small tile is, therefore, 80008000, whereas, the upper right corner tile will
have an index number 82398239. Some examples are shown in Fig. 2.

Figure 16. GPI number based on coordinates of LL corner
Source: Shea & Cao, 2012, p. 1047.

Figure 17. Examples of GPI number.
Source: Shea & Cao, 2012, p. 1048.

The GPI number for this approach depends on the coordinates of the lower left corner of the
small image tile which, in turn, depends on which row and column band that the small tile falls
into. Expressed mathematically, we have

RowNr = f(y, multiples of 200), (1)

ColumnNr = f(x, multiples of 250). (2)

Using the same example, (804550, 802500), the point is contained within the small image tile
with geo-planar index id of 80128018.

Box 5. The index on the Hong Kong map
Source: Shea & Cao, 2012, pp. 1047-1048.
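
The tile dimensions that the row and column bands rely on follow directly from the figures quoted above. The short check below derives them; the function and key names are hypothetical, and only the input numbers come from the quoted text.

```python
def tile_geometry():
    """Derive tile size and counts from the base-image figures quoted above."""
    base_w_m, base_h_m = 3750, 3000    # ground coverage of one base image (m)
    base_w_px, base_h_px = 4800, 3840  # size of one base image (pixels)
    rows = cols = 15                   # partitioning of each base image
    return {
        "tile_width_m": base_w_m / cols,          # column band, x direction
        "tile_height_m": base_h_m / rows,         # row band, y direction
        "tile_width_px": base_w_px // cols,
        "tile_height_px": base_h_px // rows,
        "metres_per_pixel": base_w_m / base_w_px,
        "total_tiles": 16 * 4 * 4 * rows * cols,  # same as the 240 x 240 mesh
    }
```

Each small tile therefore covers 250 m by 200 m, which is why the column number advances in multiples of 250 and the row number in multiples of 200.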

With the image handling defined, the authors present the algorithm for the proposed GPI
in Box 6.

Based on the equations (1) and (2), an algorithm for finding the GPI number is formulated as
follows:

Algorithm 1
FIND-GPI(P, FO) /* FO is the false origin = (800000, 800000) */
1 RowNr = (P.y - FO.y) \ 200 /* integer division is used */
2 ColNr = (P.x - FO.x) \ 250
3 S = (8000 + RowNr) × 10000 + (8000 + ColNr)
4 Return S

There are two ways of expressing a minimum bounding rectangle (MBR): (1) lower left
corner coordinates plus upper right corner coordinates; and (2) lower left corner coordinates, plus
width and height of the bounding rectangle. The second method is used to store the small image tile
minimum bounding rectangle information.

The equations for finding the lower left corner coordinates of the MBR are expressed as
follows:

xtile = 800000 + ColumnNr × 250, (3)

ytile = 800000 + RowNr × 200. (4)

For example, the lower left corner of the MBR for the tile with geo-planar index id of
80128018 is: (804500, 802400).

Box 6. GPI algorithm
Source: Shea & Cao, 2012, p. 1048.
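
The FIND-GPI pseudocode and equations (3) and (4) translate almost line for line into Python. The sketch below is an illustration rather than the authors' code: the function and constant names are hypothetical, and coordinates are assumed to be integers in the same planar grid as the false origin.

```python
FALSE_ORIGIN = (800000, 800000)  # (x, y) of the false origin, from the quoted text
TILE_W, TILE_H = 250, 200        # tile width (x) and height (y) in metres
SHIFT = 8000                     # offset that avoids leading zeros


def find_gpi(x, y, origin=FALSE_ORIGIN):
    """Return the 8-digit GPI number of the tile containing point (x, y)."""
    row = (y - origin[1]) // TILE_H  # integer division, as in the pseudocode
    col = (x - origin[0]) // TILE_W
    return (SHIFT + row) * 10000 + (SHIFT + col)


def tile_mbr(gpi, origin=FALSE_ORIGIN):
    """Return the tile MBR as (x_lower_left, y_lower_left, width, height)."""
    row = gpi // 10000 - SHIFT  # first four digits
    col = gpi % 10000 - SHIFT   # last four digits
    return (origin[0] + col * TILE_W, origin[1] + row * TILE_H, TILE_W, TILE_H)
```

With the example point from the text, `find_gpi(804550, 802500)` gives 80128018, and `tile_mbr(80128018)` gives a lower left corner of (804500, 802400), matching the values quoted above.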

Because they go on to compare the proposed method with the usual one, the authors make
some brief remarks about the R-tree that broaden the understanding of the technique already
discussed in this state of the art.

The R-Tree [11] was proposed by Guttman in 1984 with the aim to handle
geometrical data in high-dimensional space.
Faloutsos et al. [12] was the first attempt to estimate the performance of R-Tree
for queries in a more formal and analytical approach. The analysis in this paper
provided the framework for almost all related work in estimating the performance of R-
Tree that followed it. Later, Kamel and Faloutsos [13] introduced the following formula
to evaluate the expected number of disk access P(qx, qy) for a query of size qx × qy as a
function of the geometric characteristics of the generic R-Trees [see Figure 18].
The [below] formula allows us to estimate the number of disk accesses for a
query window q with the assumptions that “the point queries are uniformly distributed
in the address space and the address space is the unit square.” Obviously, the retrieval
of a record using R-Tree will take more disk accesses than using conventional index
method. (Shea & Cao, 2012, p. 1048).

Figure 18. Screenshot

Source: Shea & Cao, 2012, p. 1048.
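The formula itself survives in this report only as an image. As an assumption drawn from the general R-tree literature, and not read from the figure, the estimate usually attributed to Kamel and Faloutsos sums, over all R-tree nodes, the node extent enlarged by the query size: P(qx, qy) = Σ (x_i + qx)(y_i + qy), with all lengths measured in the unit square. A minimal sketch under that assumption (the function name is hypothetical):

```python
from typing import Iterable, Tuple


def expected_disk_accesses(node_sizes: Iterable[Tuple[float, float]],
                           qx: float, qy: float) -> float:
    """Sum over nodes of (x_i + qx) * (y_i + qy); sizes are (width, height) in the unit square."""
    return sum((xi + qx) * (yi + qy) for xi, yi in node_sizes)
```

For a point query (qx = qy = 0) the estimate reduces to the total area covered by the node rectangles, which is why overlapping or oversized nodes make R-tree lookups more expensive.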

On retrieving images via GPI from a spatial database in an SQL environment:

Retrieving a GPI-tagged image BLOB from the geospatial database is a simple
technique that works similar to select record(s) from a database indexed on the GPI
field. To represent the dynamic record set, we use an indexed table, denoted by T[0..m],
in which each position, or record, corresponds to a key in the universe U = {0, 1, … ,
m}. We shall assume that no two records have the same GPI number. Image record g
points to an element in the table with key g. The search operation is trivial to implement
in SQL environment:
DIRECT-GPI-SEARCH(T, g)
    return T[g]
This search operation obviously takes only O(1) time for single record as is the
case in a simple retrieval of record from a conventional indexed table search. (Shea &
Cao, 2012, p. 1048).

The authors claim that the GPI method outperforms the R-tree: “due to the
simple indexing mechanism GPI (indexed objects are actually 1-dimensional objects) does not
suffer from the dimensionality curse, and the retrieval of record via conventional index key
definitely outperforms R-Tree method” (Shea & Cao, 2012, p. 1048). They therefore maintain that
the technique can be used to organize and query geographic databases efficiently.
The authors then run tests comparing the performance of the two methods.
Tables I and II, shown in Figures 19 and 20, present the data considered.

A test was carried out to measure the retrieval times for different test locations and
retrieval strategies. In order to evaluate the real performance of geospatial information
retrieval from a local large database using GPI method on a mobile device, 11 test
locations were selected throughout the whole Hong Kong region. Each contains three
different geospatial data densities: low, medium, and high, as shown in tables I and II
below. (Shea & Cao, 2012, p. 1049).

Retrieve Location (x, y)

Location  Description    Easting (m)  Northing (m)
L01       Airport        811068       817042
L02       Discovery Bay  819323       817803
L03       Lamma Island   829723       809672
L04       Central        834070       816140
L05       Stanley        839354       809482
L06       Tseung Kwan O  842946       817592
L07       Tai Kwok Tsui  835148       820517
L08       Sai Kung       846127       827234
L09       Tai Po         834600       834600
L10       Fairview Park  823461       836961
L11       Yuen Long      824600       838500

Figure 19. Table I. Retrieve location

Source: Shea & Cao, 2012, p. 1049.

Retrieve Size (bytes)

Loc.  5x5          6x6          7x7          8x8          9x9          10x10        11x11
      (25 tiles)   (36 tiles)   (49 tiles)   (64 tiles)   (81 tiles)   (100 tiles)  (121 tiles)
L01   310136       369654       553705       681416       904568       1137401      1369400
L02   498422       738676       918999       1184619      1408919      1713146      1910878
L03   430053       553854       707431       837590       985122       1098576      1257571
L04   1029315      1550127      1741560      2352840      2488674      3217932      3439232
L05   578506       854456       1154879      1379378      1647709      1799863      2083826
L06   726122       910055       1213032      1365692      1691434      1836339      2224201
L07   1116481      1474549      2005433      2491434      3243342      3649831      4542058
L08   582857       792343       945431       1222991      1382636      1796070      1986600
L09   896247       1172807      1509272      1832356      2202250      2632340      3163352
L10   576293       786448       1073419      1283618      1624298      1846066      2300549
L11   482562       775670       975361       1279774      1564442      1979858      2311217

Figure 20. Table II. Retrieve size

Source: Shea & Cao, 2012, p. 1049.

Box 7 presents the SQL statements used by each method.

The SQL select statement used in the R-Tree spatial access method is:

    SELECT PKUID_s2, TileImage
    FROM basemap_image
    WHERE PKUID_s2 IN (SELECT PKUID_s2
                       FROM basemap_meta
                       WHERE MBRContains(
                           BuildMBR(lower_left_x, lower_left_y, upper_right_x, upper_right_y), TileMBR))
    ORDER BY PKUID_s2 ASC;

The SQL select statement used in the GPI method is:

    SELECT PKUID_s2, TileImage
    FROM basemap_image
    WHERE PKUID_s2 IN (list_of_Ids)
    ORDER BY PKUID_s2 ASC;

Box 7. Considerations on the SQL.

Source: Shea & Cao, 2012, p. 1049.
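The GPI statement above is an ordinary primary-key lookup, which the following sketch reproduces with Python's standard `sqlite3` module. Several things here are assumptions made for illustration: the in-memory table stands in for the SpatiaLite file, the tile blobs are dummy bytes, and `tile_window` is a hypothetical helper that guesses how an n × n block of tile ids around a location might be listed (the source does not say how `list_of_Ids` is built).

```python
import sqlite3
from typing import List


def tile_window(center_gpi: int, n: int) -> List[int]:
    """GPI ids of an n x n block of tiles roughly centred on center_gpi (assumed layout)."""
    row, col = divmod(center_gpi, 10000)
    start = -(n // 2)
    return [(row + dr) * 10000 + (col + dc)
            for dr in range(start, start + n)
            for dc in range(start, start + n)]


def fetch_tiles(conn: sqlite3.Connection, ids: List[int]):
    """GPI-style lookup: a plain key IN (...) query, with no spatial functions involved."""
    placeholders = ",".join("?" * len(ids))
    sql = ("SELECT PKUID_s2, TileImage FROM basemap_image "
           f"WHERE PKUID_s2 IN ({placeholders}) ORDER BY PKUID_s2 ASC")
    return conn.execute(sql, ids).fetchall()


# Toy in-memory stand-in for the on-device database; blobs are dummy bytes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE basemap_image (PKUID_s2 INTEGER PRIMARY KEY, TileImage BLOB)")
for gpi in tile_window(80128018, 5):
    conn.execute("INSERT INTO basemap_image VALUES (?, ?)", (gpi, b"tile"))

rows = fetch_tiles(conn, tile_window(80128018, 3))
```

Because the ids are computed arithmetically before the query runs, the database only has to walk its ordinary key index, which is the contrast with the `MBRContains` subquery that the authors draw.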

The algorithm that measures the record retrieval time of the two methods is detailed in
Figure 21.

Figure 21. Screenshot of Algorithm 2

Source: Shea & Cao, 2012, p. 1049.

Based on Algorithm 2 and the SQL statements, the authors developed a mobile application to
measure the performance of geographic information retrieval. The authors' description of the
tests is reproduced verbatim.

Based on the above algorithm (Algorithm 2) and the 2 SQL statements, a mobile
application was developed with the aim of measuring the geospatial data retrieval
performance, using two different algorithms on mobile devices. (p. 1049)
The geospatial data are retrieved from the “HKEN_s2_encripted.spatialite”
database installed on the mobile device. The spatial information is stored in the two
tables: (1) basemap_image; and (2) basemap_meta. The size of the geospatial database
is 503,411,712 bytes.
Two Microsoft Windows Mobile-based devices are used:
(1) Dell Axim X51v – Intel PXA270 CPU @ 624 MHz, 64 MB RAM, Windows
Mobile 5.0 Professional; and (2) Acer NeoTouch – Qualcomm 8250 CPU @ 1 GHz,
256 MB SDRAM, Windows Mobile 6.5 Professional.
In order to obtain a more reliable retrieval time for each geospatial access
method, an average of 10 measurements was made at each location for each mobile
device. The collected raw data were stored in the two local geospatial database files:
(1) _X51v_test_results.spatialite; and
(2) _NeoTouch_test_results.spatialite. (Shea & Cao, 2012, p. 1049).

Two different mobile devices were used in the tests. The tables shown in Figures 22 and 23
give the results of the different retrieval strategies, which are also plotted in
Figures 24 and 25.

Retrieve Time Summary (seconds)

Device: Dell Axim X51v

      5x5            6x6            7x7            8x8            9x9            10x10          11x11
Loc.  R-Tree  GPI    R-Tree  GPI    R-Tree  GPI    R-Tree  GPI    R-Tree  GPI    R-Tree  GPI    R-Tree  GPI
L01   21.1    2.2    21.5    2.2    21.6    3.0    21.8    3.4    22.2    4.1    22.8    4.7    23.0    6.1
L02   21.6    2.6    21.8    3.7    22.1    4.0    22.4    5.0    22.9    5.9    23.4    7.4    23.6    8.6
L03   21.7    2.5    21.6    2.6    21.8    3.7    22.1    3.8    22.3    4.6    22.7    4.9    23.0    5.8
L04   22.6    4.5    23.0    6.6    23.5    6.9    24.4    9.8    24.8    11.0   25.5    15.8   25.9    15.9
L05   21.9    3.0    22.1    3.9    22.5    5.0    22.7    6.0    23.1    7.3    23.5    7.6    24.1    9.0
L06   22.2    3.5    22.6    4.0    22.9    5.2    22.8    5.9    23.6    7.2    24.0    7.8    24.5    9.3
L07   22.5    4.9    23.2    5.9    23.9    8.4    24.4    10.7   25.7    14.6   26.1    16.9   27.5    23.8
L08   21.9    3.0    22.3    4.0    22.3    4.2    22.9    5.3    22.9    6.0    23.7    7.8    24.0    8.4
L09   21.9    4.0    22.8    5.0    23.0    6.5    23.5    8.0    23.9    9.4    24.4    11.2   25.2    13.7
L10   22.1    2.7    21.9    3.8    22.2    4.6    22.5    5.4    23.0    6.8    23.4    8.0    24.2    10.0
L11   21.3    2.5    22.0    3.4    22.4    4.3    22.8    5.4    23.1    6.6    23.9    8.2    24.2    10.0

Figure 22. Table III. Retrieve time summary

Source: Shea & Cao, 2012, p. 1050.

Retrieve Time Summary (seconds)

Device: Acer NeoTouch

      5x5            6x6            7x7            8x8            9x9            10x10          11x11
Loc.  R-Tree  GPI    R-Tree  GPI    R-Tree  GPI    R-Tree  GPI    R-Tree  GPI    R-Tree  GPI    R-Tree  GPI
L01   8.426   0.962  8.506   0.995  8.587   1.070  8.681   1.316  8.717   1.519  8.926   1.854  8.985   2.252
L02   8.807   1.023  8.995   1.155  9.100   1.240  9.241   1.650  9.555   2.088  9.968   2.832  9.796   3.087
L03   8.787   0.882  8.890   1.011  8.970   1.251  9.108   1.337  9.158   1.563  9.291   1.779  9.288   1.990
L04   9.237   1.516  9.542   2.358  9.695   2.567  10.099  3.567  10.282  3.822  10.715  5.172  10.923  5.528
L05   9.001   1.006  9.189   1.227  9.392   1.674  9.495   2.025  9.671   2.510  9.756   2.950  9.997   3.349
L06   8.538   1.197  8.777   1.371  8.925   1.807  8.956   2.020  9.252   2.598  9.348   2.970  9.526   3.497
L07   8.800   1.562  9.014   2.172  9.321   2.994  9.605   3.973  10.223  5.304  10.361  5.909  10.934  7.474
L08   8.552   1.185  8.659   1.244  8.779   1.367  9.015   1.796  9.026   2.119  9.282   2.873  9.324   3.153
L09   8.683   1.381  8.909   1.702  9.059   2.314  9.331   2.846  9.529   3.650  9.836   4.272  10.106  5.190
L10   8.390   1.059  8.520   1.313  8.739   1.763  8.959   2.098  9.177   2.627  9.224   2.850  9.467   3.737
L11   8.410   1.187  8.532   1.644  8.727   1.618  8.886   1.891  9.109   2.400  9.317   3.058  9.542   3.748

Figure 23. Table IV. Retrieve time summary for Acer NeoTouch
Source: Shea & Cao, 2012, p. 1050.

Figure 24. Geospatial data retrieval performance on Dell Axim X51v

Source: Shea & Cao, 2012, p. 1051.

Figure 25. Geospatial data retrieval performance on Acer NeoTouch

Source: Shea & Cao, 2012, p. 1051.

The key point in these results is that the performance of the GPI method depends more
heavily on the amount of geographic data retrieved. The authors present the figures that
support this observation.

From the two performance ratio sets it can be concluded that the GPI method
outperforms the R-Tree access method when handling different geospatial data
densities, irrespective of the device platform. The performance efficiency for high,
medium, and low data density was 1.3, 1.9, and 4.3, respectively.
Both the gradient of the trend lines for the R-Tree method measured in the two
devices was 0.3 with a high y-intercept value. The y-intercept value is the overhead
expense in executing the R-Tree algorithm. This means that most time is spent in
executing the R-Tree algorithm in obtaining the required geospatial data and the
retrieval time is less dependent on the quantity of data to be retrieved.
In contrast, the time needed for retrieving the geospatial data using the GPI
method is more dependent on the size of the required data, which is also roughly related
to the total number of tiles retrieved. The following three ratios (retrieval size ratio, RS;
retrieval time ratio on Dell, RT1; retrieval time ratio on Acer, RT2) provide reasonable
support to the above observation: (Shea & Cao, 2012, p. 1052).

Figure 26. Screenshot

Source: Shea & Cao, 2012, p. 1052.
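The trend-line argument quoted above (a high y-intercept for the R-Tree, a size-dependent slope for GPI) can be checked against the published tables with a least-squares fit. The sketch below uses location L01 on the Acer NeoTouch, with sizes from Table II and times from Table IV. Expressing size in units of 10^6 bytes is an assumption made here, so the fitted slopes are not expected to match the gradient of 0.3 reported by the authors, whose axis units are not stated in this report.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Location L01 on the Acer NeoTouch: sizes in bytes, retrieval times in seconds.
size_mb = [b / 1e6 for b in (310136, 369654, 553705, 681416, 904568, 1137401, 1369400)]
rtree_s = [8.426, 8.506, 8.587, 8.681, 8.717, 8.926, 8.985]
gpi_s = [0.962, 0.995, 1.070, 1.316, 1.519, 1.854, 2.252]

rtree_slope, rtree_intercept = fit_line(size_mb, rtree_s)
gpi_slope, gpi_intercept = fit_line(size_mb, gpi_s)
```

For this one location the R-Tree intercept comes out at roughly eight seconds against well under one second for GPI, while the GPI slope is the steeper of the two, which is consistent with the authors' reading of fixed overhead versus size-dependent cost.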

The authors' main result is a method with better data retrieval times than the R-tree,
which they describe as relying on “expensive R-Tree spatial access methods”
(Shea & Cao, 2012, p. 1052).
The next work discussed in this state of the art deals with the organization and
representation of geographic data, the management of its storage space, and its indexing,
analysis and display on mobile platforms. Song et al. (2014) develop a model able to handle
the large volume of data held in geographic databases. The work makes it possible to address
new aspects of GIS indexing.
The description of the main existing GIS data storage solutions, and of their main
challenges, is kept in the authors' own words.

Currently, there are many solutions for application of mobile GIS and major
GIS vendors, large database vendors, handset manufacturers have also launched their
own solutions, including ESRI's ArcPad, MapInfo's MLS, Intergraph's Intelli Where,
Autodesk's Onsite, Oracle's Oracle9i AS Wireless, Sun's Java Location Service
platform.
Among of them, the mobile GIS software-ArcPad, can be well compatible with
ArcGIS, and its map engine has been optimized for Windows CE. ArcPad supports a
variety of data interfaces, such as SHP, JPG and other vector or raster format. ArcIMS
is a comprehensive strong WebGIS platform; it provides mobile spatial information
services through the Internet, which can support thousands of concurrent processing
and day millions of requests. ArcPad is a wireless mobile client of ArcIMS, it integrates
with GPS to get the spatial location of the mobile terminal, and through a wireless
network communications and high-capacity database access to obtain spatial
information services.
Onsite is the mobile enterprise solutions of Autodesk series; it’s designed for
mobile application development and exchanges the spatial data through Java Serverlet.
MapGuide is the web mapping technology of Autodesk, which can provide a good
operating environment for mobile spatial information services. Moreover, the spatial
information obtained by Onsite can be not only raster images can also be more complex
vector space data.
Comprehensive research status, we can find there are also many shortcomings
on the current mobile spatial information services, mainly in the following areas [1]
[2]:
1. It is very difficult to store vast amounts of data with limited storage space
capacity of mobile devices. So how to make full use of the limited space to store huge
amounts of data is an unresolved difficulty.
2. Because most mobile spatial information services interactive in the form
of radio networks and local storage management efficiency of spatial data is limited, so
when the data reaches a certain size, its retrieval, updating operating efficiency becomes
quite low.
3. Mobile spatial information services to support the wireless network is
not sufficient, most simply just use the base function of mobile GIS, and all services are
provided by the mobile terminal, which cannot meet the requirements for real-time
access to dynamic update service. (Song et al., 2014, p. 2730).

In their model, the authors explore spatial indexes, map layers and blocks, LOD (level of
detail) and caching in order to achieve robust data storage performance. The study also
covers “relational database storage management, spatial data objects layered block
processing, embedded spatial indexing framework, spatial data access methods and caching
mechanisms of data loading” (Song et al., 2014, p. 2731).
On the storage and management of spatial data, the authors explain how to reduce the space
needed to handle them.

… we use embedded relational database to storage and manage the
massive spatial data. The spatial data organized by file format is mapped,
transformed and storied in relational database, which can provide integrated
management for the geometry and attribute data. In this way, it can both to
ensure the integrity of the information space of the transformed data, but also
greatly reduce the space resource.
1. Mapping of spatial data. We mapped the spatial data and save
the logical information through mapping the spatial data to data tables,
indexes tables, and control tables. Principles of spatial data mapping are:
①complete record of all the spatial data and attribute data; ②record logic
and spatial index information of the spatial data. In addition, mapping table
includes three types of data tables: spatial data tables, and control tables index
table space.
2. Spatial data formats. Spatial information exchange format is
typically Binary (WKB) format and Text (WKT) format. WKT format is easy to
understand, and have higher access efficiency, but the space utilization is low,
which is not suitable for the mobile device. According to the design principles,
in this study, we use WKB format for storing geometric objects, and save the
spatial Information in BLOB type in the embedded database system. (Song et
al., 2014, p. 2731).
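The size argument for WKB can be illustrated with a single point. The sketch below hand-encodes a two-dimensional point in the standard little-endian WKB layout (one byte-order byte, a 32-bit geometry type, two 64-bit doubles) using Python's `struct` module and compares it with the equivalent WKT string. The coordinates and function names are hypothetical, and this is not the storage code used by Song et al.

```python
import struct
from typing import Tuple


def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB: byte order 1, geometry type 1, x, y."""
    return struct.pack("<BIdd", 1, 1, x, y)


def point_from_wkb(blob: bytes) -> Tuple[float, float]:
    """Decode the 21-byte little-endian WKB point produced above."""
    byte_order, geom_type, x, y = struct.unpack("<BIdd", blob)
    if byte_order != 1 or geom_type != 1:
        raise ValueError("not a little-endian WKB point")
    return x, y


wkb = point_to_wkb(114.158611, 22.281944)   # hypothetical coordinates
wkt = "POINT (114.158611 22.281944)"
```

Here the binary form takes 21 bytes and the text form 28 characters. The gap widens for geometries with many vertices, since each WKB coordinate stays at 8 bytes regardless of how many digits it has.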

On improving the response speed of the data, the authors explain how they handle the map.

In order to meet the requirements of the graphic display, to improve the
response speed of the data, we do a necessary treatment on the spatial data of
map, making a choice on map elements under different scale. On the study of
spatial data, we focus on the research on how to improve the display speed and
reduce the cost of program memory.
1. Map classification. In this work we divided map classification
into two levels by establishing data classification index: first, classify the data
of different scale to achieve the basic gradation display; second, classify the
different data of same scale to achieve the detailed gradation display.
2. Map block. After classification, we can get different levels of
vector data. However, we do not need to read the entire map sheet data owe to
the small screen of mobile device. Conversely, we build the block index to
achieve the goal of reading the desired data block when it’s needed to be
displayed, which can reduce the amount of data and improve operating
efficiency.
3. Spatial index framework. We build each index for each spatial
data layer and save the index which is generated by the algorithm into device
memory. When the system needs to load map, it can find the related grid in the
view and load the corresponding data. Firstly, we calculate the map sheet and
grid according to the current vision and central location. Secondly, we open
spatial data to read all the information blocks in grid Index according to the
current scale and LOD factor configuration. Thirdly, we read map data
elements according to the offset which is found from the load layer and grid.
[4]. (Song et al., 2014, p. 2731).
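The map block and grid index steps described above amount to loading only the blocks that overlap the current view. A minimal sketch of that idea follows, assuming point features, square cells of a fixed size and hypothetical function names. It leaves out the scale classification and LOD factor that the authors also apply.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def cell_of(x: float, y: float, cell_size: float) -> Cell:
    """Grid cell (column, row) containing the point (x, y)."""
    return int(x // cell_size), int(y // cell_size)


def build_block_index(features: Dict[str, Tuple[float, float]],
                      cell_size: float) -> Dict[Cell, List[str]]:
    """Map each grid cell to the ids of the point features that fall in it."""
    index: Dict[Cell, List[str]] = defaultdict(list)
    for fid, (x, y) in features.items():
        index[cell_of(x, y, cell_size)].append(fid)
    return index


def blocks_in_view(index: Dict[Cell, List[str]],
                   view: Tuple[float, float, float, float],
                   cell_size: float) -> List[str]:
    """Feature ids from every cell overlapped by view = (min_x, min_y, max_x, max_y)."""
    min_cx, min_cy = cell_of(view[0], view[1], cell_size)
    max_cx, max_cy = cell_of(view[2], view[3], cell_size)
    found: List[str] = []
    for cx in range(min_cx, max_cx + 1):
        for cy in range(min_cy, max_cy + 1):
            found.extend(index.get((cx, cy), []))
    return found


features = {"a": (10, 10), "b": (150, 40), "c": (900, 900)}
index = build_block_index(features, cell_size=100)
visible = blocks_in_view(index, (0, 0, 199, 99), cell_size=100)
```

The lookup returns candidates at block granularity, so a feature in an overlapped cell is loaded even if it lies just outside the view. That is the usual trade-off of a grid index: cheap arithmetic in exchange for a small amount of over-fetching.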

The core of the proposed method is to solve the problem of access methods and
storage formats. This is the work's contribution to this state of the art.
Figure 27 shows the system architecture, and the description is kept in the authors'
own words. Figures 28 and 29 show the system's results.

… we build a system application structure, which can provide data
management support for mobile GIS applications such as car navigation
systems, field data collection and other applications. Prototype system has the
following functional modules and its architecture is shown in Fig.1.
1. Data layer. The system maps the spatial data to data tables,
indexes, tables, and control tables to save the logical information among
spatial data. Finally, data will be stored in document form in embedded
relational database.
2. Logical layer. The system can achieved the function of creating
and modifying spatial index, inserting and deleting records by abstracting
common operator interface. For example, the query of spatial data includes
ranges queries and point queries, etc.
3. Presentation Layer. The system provides map-based operations,
including: zoom, pan, etc. In addition, users can dynamically query spatial
data and attribute information by entering keywords or regional scope in
spatial indexing framework. (Song et al., 2014, pp. 2731-2732).

Figure 27. The system architecture

Source: Song et al., 2014, p. 2732.

Figure 28. Narrow View map data

Source: Song et al., 2014, p. 2732.

Figure 29. Pan View map data

Source: Song et al., 2014, p. 2732.

The authors use the R-tree index to reduce query time. Figure 30 compares the system's
response time with and without the index.

Figure 30. Test result of surface

Source: Song et al., 2014, p. 2733.

Figure 31 shows the difference in response time with and without classification: “we
have to reduce the amount of query object and data query results to improve query efficiency by
controlling the different levels” (Song et al., 2014, p. 2733).

Figure 31. Test result between classification and no-classification

Source: Song et al., 2014, p. 2733.

The authors' conclusions provide guidelines for mobile GIS application systems and are
reproduced in full.

Firstly, using data caching technology can better support data read and display.
Secondly, integrated use of lossless compression and lossy compression techniques can
better adapt to lack of storage space constraints of mobile devices. Thirdly, create a
spatial index in the database can improve the speed of data read. (Song et al., 2014, p.
2733).

2.2 Developments applicable to mobile devices with cloud technology

The last work on the indexing of GIS applications discusses a cloud-based development.
Lee & Liang (2011) describe the advantages of the cloud for handling geographic
information.

Accommodating the massive use of mobile applications for the spatial-centric
purposes outlined in this article demands a large-scale spatial data service. It is
challenging to build such a large-scale data service because this service needs a flexible
and expandable computing resource that responds appropriately to the volume of mobile
users. Cloud computing (Armbrust et al. 2010) meets the high demands of this type of
service infrastructure because it is a virtualized computing resource in terms of both
computation and storage. Also, the proposed Cloud computing-based infrastructure can
be easily provisioned on demand with a service level agreement. We have collected over
265 million geolocation data points drawn from a partial database downloaded from
OpenStreetMap (Haklay and Weber 2010). It is not a straightforward matter to store
and manage this kind of data set. Even a real-time geolocation data stream from a vast
number of mobile users is a challenge. A Cloud storage service provides a content
delivery network-like data service (Hofmann and Beaumont 2005) in which data access
is routed to its nearest data access edge. Cloud storage replicates the database to
several remote sites around the world to promote geographic locality. (Lee & Liang,
2011, p. 1284).

The authors set out to build a service for handling spatial data with cloud storage,
focusing on two main actions of mobile applications: check-in and nearby search. According
to the authors, other more complex actions derive from these basic operations. The
explanation of the proposed system, called Geopot, is kept in the authors' own words.

Our system has two parts: a local data service for indexing and storing geolocation data
of check-ins to make a compact spatial index database based on an in-memory R-tree
and a local networked in-memory hash table and Cloud-based data service for global
deployment/access. The local data service controls response to all mobile interactions,
which entails the processing of check-ins and responding to nearby searches. The local
data service maintains a compact spatial index with tempo spatial clustering in which
check-ins are grouped by Euclidian distance through a time window. A centroid of a
cluster is used as a spatial index of R-tree, and members of the cluster are temporarily
stored in the local networked hash table. This ensures that the R-tree is of a compact size
that fits into the system’s main memory as more check-ins are coming in. The data stored
in the local networked hash table are published into external Cloud storage for global
accesses. By publishing local data to the Cloud, it insures staying below a specified
storage size for the local data service, but it also promotes scalable access to the Cloud
from mobile clients. We used Amazon Simple Storage Service (S3) (Garfinkel 2007),
which provides key–object pair storage and also can be used as a content delivery
network with CloudFront. (Lee & Liang, 2011, p. 1285).

Box 8 summarizes the main existing systems, pointing out their similarities to and
differences from the proposed Geopot.

Most conventional spatial databases are based on relational database management systems.
Most of these provide spatial data models and operations related to geometrics. PostGIS by Ramsey
(2005) is a representative spatial database back-ended by PostgreSQL object relational database. To
implement a large-scale database, a tightly coupled database server clustering is required. There are few
studies on large-scale spatial data services for location-based mobile applications. Recently, Windows
Azure Platform 2010, a Cloud database service, has been extended to provide the support. This works
like the spatial data support in SQL Server 2008 in that it has the same spatial data types and spatial
methods and indices. This is just a virtualized database system and still has the limitations of relational
databases for a large-scale user volume. Facebook (Saab 2010) introduces networked key-value
memory cache server for scaling its social network service. They use more than 800 servers supplying
over 28 terabytes of memory for 500 million users’ information.

Wang et al. (2009) studied retrieving and indexing spatial data in a Cloud computing
environment, especially using Google AppEngine. They tried to solve the drawbacks of spatial data
storage using redesigned classic spatial indexing algorithms such as Quad-tree and R-tree. A spatial data
object model is developed based on the Simple Feature Coding Rules from the OGC, such as Well-
Known Binary and Well-Known Text. This is different from our work in terms of design policy. They
tried to convert a non-Cloud spatial database into the Cloud using Well-Known Binary and Well-Known
Text. Our system, Geopot, maintains spatial indices in the local data service and raw data of those spatial
indices are stored in the Cloud.

Wan et al. (2009) investigated a method for making a grid index to promote an independent
data storage partition, so range query is converted into or limited to local query. Consequently a more
effective multilayered grid index has been created for use with huge amounts of data. This work
compiles additional metadata on the spatial index to isolate its spatial index from the large volume of
raw data. Fundamentally, this is different from our method that minimizes the size of the local spatial
index to enhance query performance.

Ester et al. (1998) studied efficient spatial clustering methods on large spatial databases. The
methods are mostly offline clustering algorithms, so they are not suitable to online applications such as
mobile applications.

In terms of key-value storage, there are few open-source systems in which spatial indexing is
supported: MongoDB and GeoCouch (NoSQL 2010). MongoDB provides spatial support that is from
GDAL (Geospatial Data Abstraction Library)/OGR Simple Feature Library. GeoCouch is a spatial
extension for CouchDB, which provides key-document storage. It uses the external spatial back end of
SpatiaLite, which supports geometrics of Points, LineStrings and Polygons and provides bounding
box search, polygon search and radius search. The key-value storages have been integrated with external
modules of spatial database. This is not suitable to use external Cloud storages such as Amazon S3.
This is different from our system in aspect of decoupling spatial index with raw data that are deployed
into Cloud storage for the large volume of mobile users.

Box 8. Existing systems and Geopot

Source: Lee & Liang, 2011, pp. 1285-1286.

Among the applications that use check-in and nearby search, the authors highlight
Dodgeball, Brightkite, Loopt, Foursquare, Gowalla and Google Latitude (Lee & Liang, 2011).
In a check-in, the authors explain, latitude, longitude and altitude information is
transferred from the user's device to a server. Nearby search, in turn, has to take
user-specific information into account. For the system, this means a high frequency of
requests, each transferring a small amount of data. The system's performance and capacity
must therefore keep pace with the number of users (Lee & Liang, 2011).
This ability is called scalability: “scalability is a desirable property of a
system, a network or a process. Scalability measures a system’s ability to either handle growing
amounts of work in an efficient manner or readily enlarge/expand in response to that increased
demand” (Lee & Liang, 2011, p. 1286). Cloud computing is an important resource for achieving
this property. The authors briefly explain how the structure works.

It comprises three basic segments: application, platform and infrastructure. Mostly IaaS
(Infrastructure as a Service) is offered by Cloud providers. Without a large amount of
resource investment, hosted applications in a Cloud use shared resources on demand.
The resources used are adjusted automatically according to the demand of a service. As
an infrastructure service, a Cloud provides computational resources and storage
resources such as Amazon Web Service’s EC2 (Elastic Compute Cloud) and S3
(Garfinkel 2007). (Lee & Liang, 2011, pp. 1286-1287).

The authors split Geopot into two parts, one local and one in the cloud, the latter using
Amazon's storage service, S3, for the geographic data. Figure 32 shows how this split
contributes to the system's scalability.

We use this for external geolocation data storage instead of storing it locally. This
alleviates the burden of provisioning resources required for future use and maintains
the size of a local data service provider as small as possible. Local data service
maintains a compact spatial database containing centroids of clusters in which each
cluster is a group of raw data located in a range of search space. For nearby searches
of mobile clients, it returns a link towards raw data in the Cloud. With this link, mobile
clients can directly access its raw data in the Cloud. (Lee & Liang, 2011, p. 1287).

Figure 32. Scale-out design: local data service and Cloud data service.
Source: Lee & Liang, 2011, p. 1287.

The local service handles check-in information and indexes it with an R-tree, which also
serves nearby searches. The authors also use SHA1 hashes to identify each cluster of raw
data in cloud storage. Box 9 presents the service architecture in the authors' own words.

The local data service plays a role in collecting check-in geolocation information from mobile
devices, making a spatial cluster of the raw data and indexing it into the spatial index data structure,
that is, R-tree and servicing nearby searches on the R-tree. A cluster of raw data is represented in a
spatial index of R-tree and has a SHA1 (Eastlake and Jones 2001) hash tag that is used as a unique
identification for future client access to the external Cloud storage. The local data service pushes the
raw geolocation data into the external Cloud storage and labels it with the SHA1 hashed identifier.
Cloud storage is a key-value storage, so a hashed identifier is used as the key. Consequently this
involves two steps to retrieve actual raw data from mobile devices: the first step is from the local data
service and the second step is from the Cloud storage. The current high-speed network between local
and the Cloud, with its high bandwidth and low latency, enables this.

The reasons why the data services are separated into two parts are as follows:
(1) To minimize the size of spatial index (R-tree) for a fast nearby search;
(2) To minimize the cost of building and maintaining geolocation data storage;
(3) To minimize the access cost to data through the geographically distributed Cloud
storage (closely located ones are used by mobile users).

As mentioned earlier, collected geolocation data are grouped into clusters in which each
cluster comprises a set of tightly positioned data. .... The centroids of clusters are used to construct
the spatial index to provide the nearby search. The R-tree does not have an access to all geolocation
data through this method, only centroids of clusters do. The actual raw data in a cluster resides in the
Cloud, thus speeding up a nearby search on small-sized spatial indices relatively. With the help of
expandable Cloud storage, we do not have to build large-scale storage beforehand. This significantly
reduces the TCO (Total Cost of Ownership) of a mobile application. Also, a Cloud storage service
is based on multiple data centres around the world. As a result mobile users can access it near the
local edge of the Cloud. This is usually driven by DNS (Domain Name Service) technologies with
geographical locations of IP addresses.

Box 9. On the architecture of Geopot

Source: Lee & Liang, 2011, pp. 1287-1288.
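
The two-step retrieval described in the quotation above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the dictionaries `local_index` and `cloud_store`, the function names, and the coordinate string fed to SHA-1 are all assumptions. The source states only that a SHA1 hash tag identifies each cluster and serves as the key in the Cloud key-value store.

```python
import hashlib

def hash_tag(lat: float, lon: float) -> str:
    """SHA-1 tag for a cluster; the 'lat,lon' string format is an assumption."""
    return hashlib.sha1(f"{lat:.6f},{lon:.6f}".encode("utf-8")).hexdigest()

# Step 1 store: the local data service keeps only centroids and their tags.
local_index = {}   # tag -> (lat, lon) of the cluster centroid
# Step 2 store: in-memory stand-in for the external key-value Cloud storage.
cloud_store = {}   # tag -> list of raw check-ins

def publish(centroid, raw_checkins):
    """Register the centroid locally and push the raw data under its tag."""
    tag = hash_tag(*centroid)
    local_index[tag] = centroid
    cloud_store[tag] = list(raw_checkins)
    return tag

def retrieve(tag):
    """Two steps: confirm the tag locally, then fetch the raw data by key."""
    if tag not in local_index:
        return []
    return cloud_store.get(tag, [])

tag = publish((51.0447, -114.0719), [{"user": "u1", "note": "coffee"}])
print(len(tag), retrieve(tag))
```

Because the local side holds only one small entry per cluster, its size is independent of how many raw check-ins each cluster accumulates.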

The authors use a clustering method to handle the temporal dimension of the
system. Box 10 gives the explanation.

To maintain a lightweight spatial index database (R-tree) in the local data service for a nearby
search, we devise an online clustering method to represent a group of geolocation data within a
particular time window. As shown in Figure 2, a checked-in geolocation data stream is a set of events
that have occurred in a particular geolocation. Data on each event consist of its geolocation and of
metadata describing the event itself. Tightly spaced events are grouped into a cluster. Data for a cluster
consist of its centroid coordinates, a hash key generated by its location and the radius encompassing
its cluster. This information is added into R-tree. It uses a modified version of the online
Leader–Follower method (Duda et al. 2000) that alters only the cluster centre most similar to a new pattern
and spontaneously creates a new cluster if no current cluster is sufficiently close to the data. As mobile
users produce a series of geolocation data within a short period of time, online clustering is more
effective than a K-means (MacQueen 1967) clustering method, which needs a fixed number of clusters
in advance.

Figure 33. Clustering closely located events and making a representative hash key and radius of a
cluster.
Source: Lee & Liang, 2011, p. 1288.

Box 10. Clustering method for handling the temporal dimension

Source: Lee & Liang, 2011, p. 1288.
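
As a rough illustration of why an online method suits a check-in stream, the basic Leader-Follower idea can be written as follows. The authors use a modified version whose details are not reproduced here, so this sketch shows only the textbook behaviour: the nearest centre is nudged towards each new point, and a new cluster appears when no centre is close enough. Planar coordinates, the threshold `theta`, and the learning rate `eta` are assumptions chosen for the example.

```python
import math

def leader_follower(points, theta, eta=0.1):
    """Basic online Leader-Follower clustering on planar (x, y) points.

    theta: distance threshold for joining an existing cluster.
    eta:   learning rate that pulls the winning centre towards the point.
    """
    centres = []
    for x, y in points:
        # Find the closest existing centre, if any.
        best, best_d = None, float("inf")
        for i, (cx, cy) in enumerate(centres):
            d = math.hypot(x - cx, y - cy)
            if d < best_d:
                best, best_d = i, d
        if best is not None and best_d <= theta:
            # Only the winning centre moves.
            cx, cy = centres[best]
            centres[best] = (cx + eta * (x - cx), cy + eta * (y - cy))
        else:
            # No centre is close enough: start a new cluster at this point.
            centres.append((x, y))
    return centres

stream = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (0.1, 0.0)]
centres = leader_follower(stream, theta=1.0)
print(len(centres))
```

Unlike K-means, the number of clusters is never supplied; it follows from the data and the threshold.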

Algorithm 1, shown in Figure 33, presents this clustering method, which is illustrated in
Figures 34 and 35; the explanation is kept in the authors' own words in Box 11.

Algorithm 1 describes the sequence of our clustering method. It gets an incoming geolocation
stream datum (loc). It finds a cluster encompassing the incoming datum. To find a cluster, it uses a
larger search range (locrange) that finds the largest set of cluster candidates. If it finds a cluster that is
located within the distance of θ, the datum is appended into the found cluster. If not, a new cluster is
created and the datum is appended into the new one. For the threshold of θ, we use 1 km in this article
as a minimum cluster diameter. By different θ , we can control the number of clusters because it
determines the diameter of a cluster. When a new cluster is created, it makes a temporal cluster that
includes a TTL (Time-to-Live) value, because it is not possible to maintain all data in the local data
service. The new cluster’s centroid is added to R-tree and the appended raw datum is stored in a local
hash table. To minimize the size of local storage, any hash tables of expired clusters are deleted. Upon
deleting a local hash table of the expired cluster, the contents of a local hash table are published into
the external Cloud storage. The algorithm repeats the fetching of a new incoming datum. Figure 3
shows the overall sequence of the described temporal clustering. Through Algorithm 1, it creates a
spatial cluster as a local spatial index and a local hash table object containing raw data. This is finally
transferred to the Cloud storage in which the local hash table is stored as a key-value object, as shown
in Figure 4. A spatial index is reused after publishing its current local hash table to Cloud storage. It
needs to reset the time stamp of a cluster after its TTL expiration. This is performed by
resetTimeStamp in the event of a check-in if the local hash table is no value, null.

Figure 33. Screenshot

Source: Lee & Liang, 2011, p. 1290.

Figure 34. The overall sequence of Geopot temporal clustering.

Source: Lee & Liang, 2011, p. 1289.

Figure 35. Local R-tree for centroids of clusters and Local/Cloud hash table of key-value objects for
raw data.
Source: Lee & Liang, 2011, p. 1290.

Box 11. Proposed clustering method

Source: Lee & Liang, 2011, pp. 1288-1289.
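
The check-in path of the temporal clustering described above might look like the following sketch. It is a simplification under stated assumptions, not the authors' Algorithm 1: a linear scan replaces the R-tree range query, the centroid stays at the first point of the cluster, and the class, method, and field names are invented for the example. The 1 km threshold and the TTL idea come from the quoted text.

```python
import hashlib
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

class TemporalClusters:
    def __init__(self, theta_km=1.0, ttl_s=3600.0):
        self.theta_km = theta_km
        self.ttl_s = ttl_s
        self.clusters = {}   # tag -> {"centroid": (lat, lon), "stamp": seconds}
        self.local = {}      # tag -> raw data list (the local hash table)

    def check_in(self, loc, meta, now):
        # Linear scan stands in for the R-tree range query.
        for tag, c in self.clusters.items():
            if haversine_km(c["centroid"], loc) <= self.theta_km:
                if tag not in self.local:
                    # Table was flushed earlier: restart the TTL window.
                    c["stamp"] = now
                    self.local[tag] = []
                self.local[tag].append((loc, meta))
                return tag
        # No cluster within theta: create one with a fresh time stamp.
        tag = hashlib.sha1(repr(loc).encode("utf-8")).hexdigest()
        self.clusters[tag] = {"centroid": loc, "stamp": now}
        self.local[tag] = [(loc, meta)]
        return tag

    def expired(self, now):
        """Tags whose local hash table has outlived its TTL."""
        return [t for t, c in self.clusters.items()
                if t in self.local and now - c["stamp"] >= self.ttl_s]

tc = TemporalClusters()
a = tc.check_in((51.0447, -114.0719), "cafe", now=0.0)
b = tc.check_in((51.0450, -114.0725), "bar", now=10.0)
c = tc.check_in((51.2000, -114.0719), "park", now=20.0)
print(a == b, a == c, len(tc.local[a]), tc.expired(now=3600.0))
```

The first two check-ins are a few tens of metres apart and share a cluster; the third is roughly 17 km north and opens a new one.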

The authors explain two further algorithms, with details in Box 12.

Algorithm 2 clears the expired clusters’ hash table, which are maintained in the local data
service temporarily. It is difficult to retain all the raw data from each cluster, so it is necessary to delete
it after some period of time, that is, TTL. Before deleting a cluster’s local hash table, the local hash
table is moved to Cloud storage, after which it is deleted. To indicate the availability of raw data that
have already been moved to Cloud storage, the cluster’s ccount is increased. By this count value, mobile
clients recognize the availability of these data. If the count is 3, for example, there are three hash
tables, with two of the three being in the Cloud and one being in the local data service. While a local
hash table is being published, the count value is appended into the object name of the Cloud. So, with
this count value, clients can access the recently published hash table object in the Cloud. With this
count value, clusters do not need to merge themselves. Clusters generated in time are managed with
the serial number of the count. As clusters repeat their life cycle, the local data service has all clusters’
spatial indices and their links (hash tag of identification) to the actual raw data residing in Cloud
storage.

Algorithm 3 shows a nearby search that is performed by a request from a mobile user. At first,
it finds centroids (spatial indices) from the R-tree with a search range, R that is usually several
kilometers. To find the largest set of centroids, it uses a larger search range. Then, it merges all raw
data from the local hash table if available or just returns its hash tags for the external Cloud storage if
there is no data in the local hash table. The search is performed in the local data service, but most raw
data fetching is performed by a mobile device by addressing returned hash tags towards the Cloud
storage.

Figure 34. Screenshot

Source: Lee & Liang, 2011, p. 1291.

Box 12. Geopot algorithms

Source: Lee & Liang, 2011, pp. 1290-1291.
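
The expiry and nearby-search steps quoted above can be sketched together, since the second depends on what the first has already moved to the Cloud. This is an illustrative reading, not the authors' Algorithms 2 and 3: plain dictionaries stand in for the R-tree, the local hash table, and the Cloud store; planar coordinates in kilometres are assumed; and the object name format `tag/count` is hypothetical. The counter follows the quoted example in which a count of 3 means two tables in the Cloud and one local.

```python
import math

def flush_expired(clusters, local, cloud, now, ttl_s):
    """Move expired local hash tables to the cloud and bump their counter."""
    for tag, c in clusters.items():
        if tag in local and now - c["stamp"] >= ttl_s:
            cloud[f"{tag}/{c['count']}"] = local.pop(tag)   # name format assumed
            c["count"] += 1

def nearby(clusters, local, query, range_km):
    """Return raw data held locally, plus cloud keys for flushed clusters."""
    raw, cloud_keys = [], []
    for tag, c in clusters.items():   # linear scan in place of the R-tree
        if math.dist(c["centroid"], query) <= range_km:
            if tag in local:
                raw.extend(local[tag])
            else:
                # Tables 1 .. count-1 were published; the client fetches them.
                cloud_keys.extend(f"{tag}/{n}" for n in range(1, c["count"]))
    return raw, cloud_keys

clusters = {"a1": {"centroid": (0.0, 0.0), "stamp": 0.0, "count": 1},
            "b2": {"centroid": (2.0, 0.0), "stamp": 50.0, "count": 1}}
local = {"a1": ["checkin-1", "checkin-2"], "b2": ["checkin-3"]}
cloud = {}

flush_expired(clusters, local, cloud, now=60.0, ttl_s=60.0)
raw, keys = nearby(clusters, local, query=(1.0, 0.0), range_km=3.0)
print(raw, keys, sorted(cloud))
```

In this run cluster `a1` has expired, so the search returns only its Cloud key, while `b2` is still answered from local memory.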

The authors also offer considerations on designing mobile applications around
clustering algorithms.
A cluster is made in a timely manner, that is, a sliding time window. That
is, each cluster has its own TTL. A few hours TTL value is recommended for
mobile applications in a dense area. The radius of a cluster is not large, for
example 1 km in this article. A cluster could be overloaded if we do not use the
TTL and the small size of radius for a cluster. A cluster’s TTL will be expired
and then the local hash table constructed during the short period of TTL will
be transferred to the Cloud with its counter tag. With this counter tag in addition
to the cluster’s hashed tag(name), a client application can fetch a recently
collected cluster data from the Cloud. This leads to reduced information
overload by a mobile application. A mobile application can request past data
with giving the lower counter value for the cluster’s hashed tag as the object
name of the Cloud.
A mobile client can get check-ins of a certain long period of time using
a series of requests of sequential counter values with the hashed tag(name) of
a cluster. Actually, the indices of the R-tree are not deleted in the memory after
the expiration of the TTL. The indices of clusters in R-tree will be stayed all the
time, but their counter tag will be increased after every TTL expirations to flush
its local hash table to the Cloud. (Lee & Liang, 2011, pp. 1289-1290).
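
On the client side, requesting older data with lower counter values amounts to generating a short series of object names. The helper below is a hypothetical sketch: the function name, the `tag/count` naming scheme, and the choice to start counting at 1 are assumptions, since the quotation does not give the exact format.

```python
def history_keys(tag, latest_count, how_many):
    """Object names for the most recent `how_many` flushes, newest first."""
    first = max(1, latest_count - how_many + 1)
    return [f"{tag}/{n}" for n in range(latest_count, first - 1, -1)]

print(history_keys("3f2a", latest_count=5, how_many=3))
```

A client interested only in recent activity requests the first name; one building a longer history walks further down the list.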

Box 12 covers the implementation of Geopot.

Figure 35. Geopot server implementation

Source: Lee & Liang, 2011, p. 1292.

… Geopot server’s front-end accepts incoming check-in requests that are distributed evenly
towards internal threads as shown in Figure 5. The distribution for scaling local data service is achieved
by the sharding method [emphasis added] (Cryans et al. 2008) of partitioning horizontally incoming
requests to internal threads. We use MD5 [emphasis added] as a consistent hashing to make a shard
number. Each internal thread has its own shard number. Each thread accepts two pairs of information
for check-in: geolocation information (latitude and longitude) and the metadata of its application.

Each thread has its own R-tree for indices of centroids of clusters. By the temporal clustering
algorithm, each incoming check-in is indexed into each thread’s own R-tree and the raw data are stored
in the local networked hash table that is a key-value in-memory storage. For this we use Redis
(NoSQL 2010) open-source software [emphasis added], which is a high-performance method for
key-value storage. There are three advantages to use internal sharding for local data service: the first one
is to remove internal locking for R-tree, the second one is to maintain the size of a hash table entry at
a moderate number and the third one is to make the system easily extendable by adding more shards
(thread and its R-tree pairs).

The Geopot server is implemented using a multithreaded programming method [emphasis
added]. To make the server highly concurrent, this requires a way of avoiding lock contentions
occurring in accessing the internal R-tree from multiple threads at the same time. To remove the hotspot
resulting from competing to acquire an exclusive lock for one R-tree, we deploy an individual R-tree to
each check-in thread. If we use only one R-tree, it degrades the performance of accessing R-tree due to
the high contention of locking. If we use one R-tree, each cluster’s hash table size grows as it goes.
Through a multiple R-tree of shards, this can be lessened by partitioning R-tree. Each thread of each
shard uses one process in a modern operating system that is mapped to use one CPU core on the
system. By increasing the number of shards by as much as the number of CPU cores, we can exploit
the system so as to be fully loaded.

Likewise, each incoming request is distributed, whereby raw data are distributed into a local
hash table with a hash tag that is used as a shard number. This is also a horizontal partition for
networked local hash tables. We can easily add more hash table servers to increase the capacity
of the local storage [emphasis added]. Upon expiring a cluster in the R-tree, the content of its local hash
table is transferred to the external Cloud storage. According to the local networked hash table, the
TTL can be adjusted. If there is not enough space in the local hash table, a small TTL could be used
to free up local storage. However, if we do have enough spaces, a marginal TTL will be used to achieve
a fast response time for clients who do not need to visit Cloud storage with returned hash tags of a
nearby search to get the raw data. To transfer the local hash table to Amazon S3, we use Amazon Web
Service API that is described in WSDL (Web Service Definition Language).

Box 12. Geopot implementation

Source: Lee & Liang, 2011, pp. 1291-1293.
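
The sharding step can be illustrated with a short sketch. The quotation says MD5 is used to derive a shard number but does not say which field is hashed or how the digest becomes a number, so hashing a user identifier and taking the digest modulo the shard count are both assumptions. Note that a plain modulo is not consistent hashing in the strict sense: changing the number of shards would remap most keys.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a request key to a shard via MD5 (the modulo scheme is an assumption)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_shards

# One independent index per shard, so no thread waits on a shared lock.
NUM_SHARDS = 4
shards = [[] for _ in range(NUM_SHARDS)]   # each list stands in for an R-tree

for user in ("alice", "bob", "carol", "dave", "erin"):
    shards[shard_for(user, NUM_SHARDS)].append(user)

print([len(s) for s in shards])
```

Because the mapping is deterministic, every request for the same key reaches the same thread and the same index, which is what removes the need for locking.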

Figure 36 illustrates the interface that gives access to the Geopot server; the
explanation is quoted verbatim.

Any mobile client can connect to Geopot server with the HTTP-based RESTful API ….
It supports Apache Thrift (Slee et al. 2007) network interface for the system integration
purpose that is an open-source network framework for scalable cross-language service
development developed by Facebook for a large-scale system interoperability. Any other
programming languages can make Thrift network bindings to access Geopot server
using the interface definition as described in Figure 6. As defined in the interface, it has
three simple methods of check-in, nearby search and control its server. (Lee & Liang,
2011, p. 1293).

Figure 36. Thrift interface definition

Source: Lee & Liang, 2011, p. 1293.

On the other operations available through Geopot:

As mentioned before, our data service has two essential operations for
mobile applications: a check-in and a nearby search. In addition to these two
operations, Geopot’s front-end can provide HTTP-based RESTful APIs
to do extra operations for a mobile application as follows:
(1) /login: user authentication
(2) /register: user registration/unregistration
(3) /checkin: check-in a place (current location and metadata)
(4) /nearby: request a nearby search within a search range
(5) /recent: request recent events of a user (Lee & Liang, 2011, p.
1293).
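
To make the endpoint list concrete, the sketch below builds request URLs for two of the paths. Only the paths `/checkin` and `/nearby` come from the quotation; the host name, the use of query-string parameters, and the parameter names `lat`, `lon`, `meta`, and `range_km` are hypothetical, since the source does not document the request format.

```python
from urllib.parse import urlencode, urlunsplit

BASE_HOST = "geopot.example.org"   # hypothetical host

def endpoint(path: str, **params) -> str:
    """Build a request URL for one of the RESTful paths."""
    return urlunsplit(("http", BASE_HOST, path, urlencode(params), ""))

print(endpoint("/checkin", lat=51.0447, lon=-114.0719, meta="coffee"))
print(endpoint("/nearby", lat=51.0447, lon=-114.0719, range_km=2))
```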

Box 13 summarizes how the local hash table is published to Cloud storage; the table
in Figure 37 gives examples and Figure 38 shows the storage format.

Each centroid in R-tree of the local data service has its TTL. If this has expired, the content of
the local hash table will be published into external Cloud storage. After publishing on a cluster is
finished, the content of the local hash table is deleted to free up space [emphasis added]. Each index of
R-tree of the local data service has its own hash tag and its count value as presented in Algorithm 1.
This is used as an object name for external Cloud storage. Geopot uses SHA1 [emphasis added] (160 bits)
to generate the hash tag using the coordinates of the centroid. As external Cloud storage, AWS S3 has
two-level addressing: first, to access an object in S3, it needs a bucket name as a data domain; second,
it needs an object name in a bucket. Objects are managed in a directory-like hierarchical manner
[emphasis added].

Before publishing a hash table to the Cloud, the local data service just serves an incoming
nearby search request with its local hash table. After publishing, it just returns hash tags to its clients.
Clients directly access Cloud storage with the hash tags [emphasis added]. Geopot uses the hash tag as
an object name consistently.

Table 1 presents examples of publishing a local hash table to the Cloud. In this case, each
local hash table will be replicated into two regions (Asia and the United States) by the Cloud itself
and it will be stored as a JSON (Javascript Object Notation) (Crockford 2010) format as shown in
Figure 7. AWS S3 Cloud service has three different regional data centres: The United States, Europe
and Asia. Because JSON format is widely used as a common data exchange format over HTTP for
web applications, it can be accessible from mobile devices directly without any restrictions. AWS S3
provides HTTP for accessing its objects. It is the same protocol for a general web server. So, it is
important to use a HTTP-friendly data format such as JSON. Network bandwidth to two regions from
the University of Calgary, Canada, where our test clients are located are almost the same as shown in
the table. Mobile clients residing in each region can access the local edge of AWS S3 Cloud service.
This minimizes network latency due to local access.

Figure 37. Table 1. Examples of publishing local HashTable to the Cloud, AWS S3 (Asia and the United States
region).
Source: Lee & Liang, 2011, p. 1293.

Figure 38. An example of a published JSON object of a local hash table.

Source: Lee & Liang, 2011, p. 1294.

Box 13. Publishing the hash table

Source: Lee & Liang, 2011, pp. 1293-1294.
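
The publishing step combines the two-level addressing (bucket, then object name) with a JSON body. The sketch below only prepares what would be sent and performs no network call. The bucket name, the `tag/count` object naming, and the JSON field names are hypothetical; the actual layout of the published object is the one shown in the authors' figure, which is not reproduced here.

```python
import json

def publish_payload(bucket, tag, count, checkins):
    """Return (bucket, object name, JSON body) for one expired hash table."""
    object_name = f"{tag}/{count}"   # naming scheme is an assumption
    body = json.dumps({"tag": tag, "count": count, "checkins": checkins})
    return bucket, object_name, body

bucket, name, body = publish_payload(
    "geopot-demo", "3f2a9c", 2,
    [{"lat": 51.045, "lon": -114.0725, "meta": "bar"}])
restored = json.loads(body)
print(name, restored["checkins"][0]["meta"])
```

Since the body is plain JSON served over HTTP, a mobile client can parse it with whatever standard JSON library its platform provides.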

Box 14 presents the performance evaluation of the system in terms of response time
and space efficiency. It also compares the system with PostGIS for check-in and
nearby search.

Performance evaluations
We evaluate our data service system using GPS traces (265,504,306 location data in total) that
is crawled from OpenStreetMap, an open GPS trace project. In our local service, we have one check-
in/nearby server and two memory-based hash table servers (one is located with check-in server and the
other resides on the same local network). For Cloud storage, we use Amazon Web Service S3 as a
key-value Cloud storage service that is accessible through HTTP/HTTPS by mobile users. To evaluate
the scalability of Geopot, we compare the response time with PostGIS only for two basic operations: a
check-in and a nearby search. PostGIS is a full-fledged spatial database plug-in for PostgreSQL
relational database. The purpose of this evaluation is not functional comparison, but the performance
of check-in/nearby performance of Geopot. For each test, we sampled one million check-ins (see short
description on the sampled data set in Appendix 1) from 265 million GPS trace points.

Space efficiency
Local data service maintains a spatial index of clusters and partial temporal hash tables to
reduce the loaded size of spatial information. This enables a faster nearby search than an all-in-one
centralized spatial database. Figure 8 shows (a) the percentage of the number of clusters to the number
of raw location data of check-ins and (b) a histogram of the size of clusters. In this case, the total
number of check-ins is one million. For these raw data, the local data service just makes 11,484 clusters
(1.14%). This means it only needs 11,484 spatial indices in its spatial database for one million raw
data. As shown in Figure 8(b), most of the clusters have less than 400 raw data. The range from 0 to
50 is the highest frequency of cluster size. The histogram shows the frequency of each cluster size. It
has the size of clusters more than 50 through 1000 for the one million samples of 265 million data. The
radius of clusters for our experiments is 1 km. We think this reflects all kinds of common coarse and
dense areas. In a dense area, the size of cluster during a time period (TTL) could be growing in the
number of 1000 similar to this result. Further studies are needed on the relationship between the size
of clusters and the number of check-ins with the TTL in more detail.

Figure 39. (a) Percentage of the number of clusters to the number of raw data and (b) histogram of the size of cluster.
Source: Lee & Liang, 2011, p. 1295.

Response time
The following two evaluations were performed in the same local network of the Geopot service
because response time is affected by network conditions on the client side. Thus, it is more accurate to
test its processing performance on the same network. Moreover, to measure the response time of
requests on each system in a fully loaded condition, test clients performed bulk requests to each system.

PostGIS versus Geopot: insertion (check-in)


Figure 9 shows the elapsed time of inserting location data into each system. The insertion
benchmark program is a single-threaded standalone program to check each system’s spatial index
complexity. As you can see, the response time of insertion, until the number of 1000 entries, is almost
the same. However, after that, Geopot outperforms the PostGIS. From the one million check-ins test,
Geopot is 3.82 times faster than PostGIS. PostGIS and Geopot took 1345 sec and 351 sec, respectively.
The Geopot server has a check-in buffered queue to increase the response time of check-in (inserting
location data). Clustering (Indexing) is performed by several background threads of the Geopot local
server. In this case, we use the number of threads that is the same as the number of CPUs in the testing
server. This is automatically configured by Geopot.

Figure 40. Insertion time: PostGIS versus Geopot

Source: Lee & Liang, 2011, p. 1296.

After the 10^4 insertions, the response time has been increased. This is due to the increased
index complexity and the increased index space. This leads to a longer delay time to get the actual data
from disk or cache memory of the system. Because geolocations of series of insertions are not
geographically correlated, it is hard to get a chance to get higher cache hit ratio with a limited working
memory. It is even getting worse disk I/O performance with a larger number of spatial indices than
small. Due to the frequent replacements of pages of virtual memory between main memory and the
disk storage of the system, it affects the response time to access spatial indices. Geopot has less spatial
indices in its working memory; it gets faster response time than PostGIS.

PostGIS database constructs its spatial index for check-ins. It stores the spatial indices and
check-in entries together in the same database. But, Geopot separates the raw data (check-ins) from the
spatial indices. The check-ins are in the Cloud. The spatial indices (actually the centroids of clusters)
are in the local data service. When we push the experimental raw data set to both systems for
benchmarking, both systems do the similar processing of indexing data and storing them in each
system, but Geopot reduces the size of spatial indices with the centroids of clusters and the capacity
for the raw check-ins with the movement to the Cloud. In addition to the reduced size of indices, the
difference between PostGIS and Geopot is the scalability. PostGIS stores all data in the local storage,
but Geopot does not.

PostGIS versus Geopot: selection (nearby search)


We measured the performance of the nearby search of each system with different number of
clients to see the scalability. Figure 10 shows the performance of PostGIS’s query for finding points
in proximity to a point location that is one of 1000 randomly selected data points from our test data.
The simple query is an SQL expression similar to SELECT FROM checkins WHERE ST_DWithin
(the_geom, GeometryFromText (‘POINT( 128.000 45.000)’, 4326), 0.01). This means ‘find all points
within 1 km of a location coordinate’, in this case ( 128.000, 45.000). Figure 10(a) shows the box-and-
whisker statistics on the elapsed time of PostGIS’s nearby search with different number of clients. The
plot gives five-number summaries of the elapsed time measurements: minimum observation, lower
quartile, median, upper quartile and maximum observation. As shown in Figure 10(a), we can figure
out that the average completion time of each query is doubled by as much as the number of additional
clients except between the one thread and four threads. Each number of clients’ average elapsed times
are 0.48 sec, 0.58 sec, 0.94 sec, 1.84 sec, 3.5 sec, and 6.8 sec, respectively. In Figure 10(b), the
measurements of 64 threads are presented as an example to see the actual behaviour of threads. In this
case, we used 64 threads (clients) to stress the system with 1000 queries at the same time. The straight
line of each graph indicates its linear regression. Its completion time has decreased around 600 or so.
This is due to the different running time of each thread. Each thread running simultaneously has evenly
distributed number of queries initially. As shown in Figure 10(b), however, they are not finished at the
same time, so they have different elapsed time of each query. Each thread’s ending time varies
according to the number of those results returned by a query, that is, the number of raw data of a cluster.

Figure 41. (a) PostGIS nearby search time; (b) PostGIS nearby search time with 64 clients
Source: Lee & Liang, 2011, p. 1297.

Each thread serves its query on the basis of the first-come-first-serve for 1000 queries for the
experiment. After some time (around 600 items of the graph), the number of idling threads is increased.
It leads to improve the performance of I/O of the system. As much as the number of threads has
decreased, the contention and overhead of PostGIS I/O could be decreased, so it shows decreasing
completion time after the number of 600 or so.

The clients are software threads to simulate accesses to each systems. To give a more stress to
each system, we locate them in the local network. Mostly, mobile devices have smaller network
bandwidth rather than a wired local network. The number of clients used in this experiments can be
considered as larger than actual mobile users in terms of network bandwidth and latency.

Figure 11 shows the performance of Geopot’s nearby search for a local service. This is mainly
performed by local spatial service, that is, R-tree. For a nearby search request, the local server responds
with the coordinates of centroids and their hash tags for addressing the external S3 object. So, it is
much faster than PostGIS, which returns all raw data to clients. In most test cases, Geopot’s nearby
search is completed in under 0.15 sec. As shown in Figure 11(a), we can figure out that the average
elapsed time of each query is very stable across the different number of clients until 32 clients. After
32 clients, it increases a little. Each number of clients’ average elapsed time is 0.61 msec, 0.65 msec,
0.79 msec, 1.1 msec, 4.3 msec, and 1.56 msec, respectively. Figure 11(b) shows the actual
measurements of the test with 64 clients. Across the overall nearby search, it has more stable
performance than those of PostGIS.

The purpose of this section is to show the scalability of PostGIS and Geopot against a large
number of users. From Figures 10 and 11, we can figure out the different completion time of each
system with the different number of accesses from clients. In case of PostGIS, it is a centralized system,
so it has a limitation to serve requests as the number of accesses is growing. But Geopot can be
running with less resources than what PostGIS needs, because Geopot uses the Cloud storage for
serving global data service for a large number of users with its virtualized distributed system. This is
the same as the comparison between a single system and a distributed system. For PostGIS in this
experiment, it needs more time to complete its search as the number of accesses is increased. But,
Geopot shows stable performance for even more accesses. This means Geopot can make more stable
and more available service for an unpredictable number of mobile users. Because the Cloud storage
has geographically distributed storage resources across the world, the workloads from unpredictable
number of users can be well distributed to their resources. This also leads to avoid a single point of
failure or overload of the service.

Figure 42. (a) PostGIS nearby search time; (b) PostGIS nearby search time with 64 clients
Source: Lee & Liang, 2011, p. 1297.

Figure 43. (a) Geopot nearby search time; (b) Geopot nearby search time with 64 clients
Source: Lee & Liang, 2011, p. 1298.

Box 14. Performance evaluation and comparison with PostGIS

Source: Lee & Liang, 2011, pp. 1295-1298.
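
One detail of the quoted PostGIS query is worth checking numerically. It passes 0.01 as the distance to `ST_DWithin` on geometries in SRID 4326, and for geometry arguments PostGIS measures that distance in the units of the spatial reference system, which here are degrees. The short calculation below, which assumes a spherical Earth with a mean radius of 6371 km, shows what 0.01 degrees corresponds to at the query latitude of 45 degrees.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius, spherical approximation

def degrees_to_km(deg: float, lat: float):
    """Approximate north-south and east-west extent of `deg` degrees at `lat`."""
    km_per_deg = math.pi * EARTH_RADIUS_KM / 180.0
    north_south = deg * km_per_deg
    east_west = deg * km_per_deg * math.cos(math.radians(lat))
    return north_south, east_west

ns, ew = degrees_to_km(0.01, lat=45.0)
print(f"{ns:.2f} km north-south, {ew:.2f} km east-west")
```

The result is about 1.11 km north-south and about 0.79 km east-west, so the authors' reading of the query as "within 1 km" is an approximation that holds more closely along the meridian than along the parallel.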

Geopot is designed as a cloud-based geolocation data service for mobile applications
that can handle a large volume of device accesses. To that end, its implementation is
built around the two operations most characteristic of mobile applications: the check-in
and the nearby search. Both parts of the system were described in detail above, as was
the use of the R-tree and of hash tables. The authors build a clustering-based spatial
index so that geographic data use space more efficiently. The main result is a low-cost
system, and the authors detail how to implement it with a local server plus cloud
storage, which can be done without large investments in infrastructure.

By constructing a clustering-based spatial index, it enables more space-efficient
database for a large volume of geolocation data. This leads to get a faster spatial search
for mobile applications. Through our experiments, with one million geolocation data, it
needs 11,484 spatial indices (clusters), just 1.14% of all the data. Also, because it needs
a smaller footprint of working memory for spatial indices, Geopot’s insertion time of
new data is enhanced compared with a conventional spatial database, PostGIS. This is
more effective with a large-scale data. We get almost 3.82 times faster in one million
insertions test. Moreover, Geopot’s approach is to provide a scalable service that is not
limited by the storage capacity of a local data service. In case of nearby search, Geopot
shows very stable elapsed time for searches with regardlessness of the number of
accesses. PostGIS shows that the increasing elapsed time as the number of simultaneous
accesses is growing. The Cloud storage service is highly usable because its network
transfer time is stable with load balancing across geographically distributed storage
resources regardless of the number of simultaneous clients in our study. (Lee & Liang,
2011, p. 1300).
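
The two headline figures in this conclusion can be recomputed from the values reported in the evaluation: 11,484 clusters for one million check-ins, and insertion times of 1345 s for PostGIS and 351 s for Geopot. The variable names below are chosen for the example.

```python
clusters, check_ins = 11_484, 1_000_000
postgis_s, geopot_s = 1345, 351

index_share = 100 * clusters / check_ins   # share of raw data kept as spatial indices, in percent
speedup = postgis_s / geopot_s             # insertion-time ratio, PostGIS over Geopot

print(f"{index_share:.4f}% {speedup:.4f}x")
```

The share comes to about 1.15% and the ratio to about 3.83, in line with the quoted 1.14% and 3.82 once rounding of the published timings is allowed for.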

3 Geographic Indexing Applied to Tourism

3.1 Open-source solutions


Starting from the functionality of GIS applications, the work of Bessaa, Aissa, Amara, &
Aissa (2013) discussed in this section addresses open-source solutions and proposes a new
organization of images based on geographic indexing. Applying the proposed solution to a
desktop web application for tourism in an African region, the authors propose an
algorithm that speeds up and optimizes image search.
The details of this case illustrate a development of particular interest for this
state of the art, since it concerns geographic indexing applied to tourism. The authors'
first consideration is the high cost of GIS software, which justifies the study of
open-source platforms.
The authors briefly review some background on GIS applications, adding aspects to those
already mentioned in this state of the art.

The online GIS architecture is based on the client/server concept (Moretz, 2007).
The concept of client-server structure includes a division of the given application into
tasks dispensed between the client and the server (Dugerdil, 2005). An application
based on this idea consists usually of three main parts: a client, a server and a network
for communication. Each one of these elements consists of certain software and
hardware.
In a client/server application, clients contact the server to provide services. The
server responds by providing the requested service.
The client/server architecture can be seen from the physical point of view as a set
of computers representing clients connected to a network through which they communicate
with a machine called a ‘server’. The management of data and users is on the server which
enhances security and limits access to data. The system expandability is possible
independently from the hardware and operating systems installed on different nodes.
However, the costs of server administration remain high not to mention that in case of
failure of it, the whole network is paralysed. (Bessaa et al., 2013, p. 190).

The authors classify online GIS applications according to the services they offer, their
architecture, and the way the information is published.

Online GIS in terms of services offered:


Based on the services that can provide an online GIS, Green and Bossomaier
(2002) classify them into two categories.

The first category offers one of the most widespread of online SIG: queries based
on spatial location. The Virtual Tourist is one of the examples (Plewe, 1997).
The second category allows the design and dissemination of online maps. Charles
Sturt University in Australia (Steinke et al., 1996) is an example of them.
Online GIS in terms of architecture:
Depending on the architecture of online GIS and the distribution of its components
between client and server, GIS can decomposed into two categories: (Franklin, 1996; Plewe,
1997).
- Online GIS oriented server: also called thin client solution (Green and
Bossomaier, 2002). This solution is independent of machine type and operating system,
the client interface is limited in the functionality offered by the Web browser (Plewe,
1997).
- Online GIS oriented client: In this type of architecture, analysis and processing
of GIS data are made on the web browser on the local machine (client). Other sources
used for this type of solution, the term fat client (Green and Bossomaier, 2002).
If we focus on how GIS is posted, we will find that there are two main approaches
(Green and Bossomaier, 2002; Baptista and Kemp, 2005). The first approach considers a
desktop GIS software, i.e. software that provides all the features (or almost all) on the user
workstation. A second approach would be to use the interface of a Web browser as an
interface of GIS software. (Bessaa et al., 2013, p. 190).

The authors then present the static map approach for publishing online GIS applications, on
which they base their modified algorithm. They point to existing developments, including open
source ones. Although this is the easiest way to handle maps in a web environment, its
limitations are described in the authors' own words.

This approach, called ‘dead map’ by Avril et al. (2005), is the easiest to spread
a map on the web. It involves inserting cards preconceived as images in the HTML code
corresponding to the current Web page. Using the URL of the called Web page, the
client sends a request to the remote server. The latter, after finding the web page, returns
it to the client that will post on its web browser. The ‘static map’ provides features, in
most cases sufficient to satisfy the general public. Displaying maps in the Web page is
done automatically by the Web browser but the display time depends on the size and
image resolution. The change of scale is possible but only on hotspots. Therefore, each
click requires loading a new image. It is also possible to manage the thematic layers.
However, this requires predicting all possible cases display layers and save them as
images. The layout for printing and downloading maps is displayed in the same manner
with which we proceed to any other picture or text published on the Web. In most cases
it is better to opt for a dynamic solution.
The ‘static map’ is a solution that provides online data previously treated with a
desktop GIS software. They are saved as raster images supported by web browsers (GIF,
PNG, JPG, ...). This approach is ideal when maps are rarely updated; it remains an
acceptable solution for data spread to the general public. The implementation requires little
capital investment except a Web server and a connection with sufficient bandwidth.
Regarding the software, open source software for creating cards and web pages are
available (Apache, PHP, HTML, ...).
However, this solution has a number of disadvantages, including:
- Each user request is sent to the server increasing the traffic in the
network and therefore the response time of the server (Peng and Zhang, 2004).
- It does not offer a wide range of features (ESRI, 2006).
- Managing multiple levels of scale requires the
preparation of several maps. (Bessaa et al., 2013, p. 191).

The authors use the Q-tree index to organise the georeferenced images. In their view, this
reduces the overhead of publishing on web pages (Bessaa et al., 2013). They then discuss
vector-based solutions, the Java approach and server-based solutions, as summarised in
Table 15.

Map vector
This solution is based on the vector representation of spatial objects constituting the image.
Very little information is needed to define an image which makes a small file. The vector approach
requires downloading and installing a complementary tool called ‘plugin’ on the client.

The vector solution provides better colour and texture rendering. The quality remains independent
of the scale. It provides the user a wide range of features such as zoom, pan, or management layers and
also allows queries on the data. The file size is reduced. However, it requires knowledge and time in
programming. When an image contains a lot of information, the image looks cluttered (Avril et al., 2005).

Exploitation of such a solution requires the installation of plugins, which remains an obstacle
for users wanting to access more quickly to the desired information. This problem is disappearing
with the integration of plugins in the new versions of Web browsers.

Java approach
The Java applet is one of the most widely used techniques for the online GIS. Java is the most
appropriate language for the manipulation of GIS data on the web. Applets are mini- applications that
run in a Web browser that can support Java applications. Each function such as zoom and query
formulation constitutes a separate applet (Peng, 1997). The GIS applet is an executable code on a
machine with access to a website of a web server.

From the software point of view, the solution requires installing the Java applet on the server
and Java Virtual Machine, type of interpreter needed to run the applet on the client (Avril et al., 2005).
Note that this solution is essentially open source. Finally, the implementation of an online GIS based
on such solutions requires some knowledge in programming.

Java solution offers many features (Peng, 1997; Neumann, 2007). However, the main
disadvantage is that the Java applet does not save the GIS data and treatment outcomes directly on the
local machine. This is due to restrictions imposed by JVM. They must therefore be downloaded from the
server by appending its digital signature.

Map server approach


This kind of solution uses a map server which is a program in the form of CGI application or
in another form (Neumann, 2007). A map server is designed to create maps at the request of the
remote client with the settings selected by it (Cammack, 2007).

There are two types of map servers: open source and proprietary. Open source servers have
the advantage of being inexpensive. Proprietary solutions are very similar to desktop GIS in terms of
features offered. These are turnkey solutions, requiring simple familiarisation training. The features
offered in these types of solutions are many and varied (Avril et al., 2005).

Table 15. Vector-based, Java-based and server-based solutions

Source: Bessaa et al., 2013, p. 191.

The authors focus on handling the images with the Q-tree index in order to implement the
static map approach. Since indexing has already been covered extensively in this review, the
authors' formalisation is kept in the images of Figure 44, with the corresponding explanations,
only to provide context for the development they propose. For the same purpose, the
explanation of how to index images with this algorithm is kept in the authors' own words.

Figure 44. Space decomposition and the associated tree

Source: Bessaa et al., 2013, p. 192.

As we have seen, the spatial index allows indexing of spatial objects using an
approximation of its geometry which is generally a rectangle. In our case the space objects
are already rectangles. All geo-referenced images preconceived constitute the elements of
the index. These images cover different areas of interest in space at different resolutions
correspond to rectangles progressively smaller, ranging from low to high resolutions.
At the acquisition of a new image, the webmaster has to launch the update
procedure that inserts the image in the database and in the index file without worrying
about its relationship with the other images in the database.
During browsing, when the user clicks on any position of the displayed image,
the point query procedure is started. This procedure is a research algorithm giving all
the objects containing the point defined by its coordinates. The result is a set of all
images containing that point.
According to two proposed navigation modes, and based on a selection
criterion, an image of this set is displayed. (Bessaa et al., 2013, p. 192).
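
The indexing and point query steps described in the quotation above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class names `GeoImage`, `Node` and `QTree`, the `max_depth` guard and the choice to store an image in every leaf it overlaps are assumptions made for the example. Only the leaf threshold and the idea of descending to the leaf that contains the point come from the source.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GeoImage:
    """Rectangular footprint of an image: (x0, y0) top left, (x1, y1) bottom right."""
    name: str
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x, y):
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1

    def overlaps(self, x0, y0, x1, y1):
        return self.x0 <= x1 and x0 <= self.x1 and self.y0 <= y1 and y0 <= self.y1


@dataclass
class Node:
    x0: float
    y0: float
    x1: float
    y1: float
    depth: int = 0
    images: list = field(default_factory=list)
    children: list = field(default_factory=list)

    def contains(self, x, y):
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


class QTree:
    def __init__(self, x0, y0, x1, y1, threshold=7, max_depth=8):
        self.root = Node(x0, y0, x1, y1)
        self.threshold = threshold    # maximum number of images per leaf
        self.max_depth = max_depth    # hypothetical guard against endless splitting

    def insert(self, image):
        """Update procedure: no link to other images has to be declared."""
        self._insert(self.root, image)

    def _insert(self, node, image):
        if node.children:
            # Internal node: forward the image to every quadrant it overlaps.
            for child in node.children:
                if image.overlaps(child.x0, child.y0, child.x1, child.y1):
                    self._insert(child, image)
            return
        node.images.append(image)
        if len(node.images) > self.threshold and node.depth < self.max_depth:
            self._split(node)

    def _split(self, node):
        mx, my = (node.x0 + node.x1) / 2, (node.y0 + node.y1) / 2
        quadrants = [(node.x0, node.y0, mx, my), (mx, node.y0, node.x1, my),
                     (node.x0, my, mx, node.y1), (mx, my, node.x1, node.y1)]
        node.children = [Node(*q, depth=node.depth + 1) for q in quadrants]
        pending, node.images = node.images, []
        for image in pending:
            self._insert(node, image)  # node now has children, so images move down

    def point_query(self, x, y):
        """Return every image of the leaf containing (x, y) that covers the point."""
        node = self.root
        if not node.contains(x, y):
            return []
        while node.children:
            node = next(c for c in node.children if c.contains(x, y))
        return [img for img in node.images if img.contains(x, y)]
```

Storing an image in each overlapping leaf keeps the query to a single root-to-leaf descent, at the cost of duplicated entries for large, low-resolution images.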

The authors detail the navigation mechanism for the georeferenced images based on
Figure 45.

Figure 45. Images result for different resolution

Source: Bessaa et al., 2013, p. 192.

Browsing mode
As the link between the images is not predefined, the result of a point query is a
set of images at different resolutions and which can overlap. Selecting one of these
images depends on two browsing modes:
Sequential browsing
In this mode, all the resulting images are sorted in the order of their resolutions.
The image to be displayed is selected from the group of images having the lowest
resolution that comes just after the resolution of the displayed image (images of the
space R1 after the space R0 in the example in Figure 2). Figure 2 illustrates, for the
display space, the sequentially transition from a resolution to another. It gives for each
space the group of resulting images having lower resolution (blue frames of the said
figure).
Direct browsing
All the resulting images are still sorted in the order of their resolutions, but the
image to be displayed is selected from the group of images with the highest resolution.
This mode is used in the case of a search for a specific object having its geographic
coordinates. (Bessaa et al., 2013, p. 192).

The authors formalise the point query as shown in Table 16.

In a given resolution, the results of a point query can be composed of several images. In this case,
the best image is chosen. We proposed to select the image that gives most details about the point P with
coordinates (x, y), by estimating the overlap area between the image and a square centre P.
Let a set of k images Ii and a point P(x,y). We calculate, for each image, the minimum distance
between the point P and the four sides of the image:

Figure 46. Screenshot

Source: Bessaa et al., 2013, p. 192.

where (x0, y0) and (x1, y1) represent respectively the coordinates of the top left and bottom
right of the image Ii.
The best image that covers the point P is the image Im, which corresponds to the maximum
distance dm, such as:
$d_m = \max_i (d_i)$
In case of equality, we proceed to an arbitrary choice. Figure 3 illustrates the select criterion
for three images I1, I2 and I3. In this example, we will choose the image I3.

Figure 47. Image select criterion

Source: Bessaa et al., 2013, p. 193.

Table 16. Point query

Source: Bessaa et al., 2013, pp. 192-193.
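
The selection criterion quoted in this table can be illustrated with a short sketch. The exact distance formula appears only as a screenshot in the source, so the expression used here, d_i = min(x - x0, x1 - x, y - y0, y1 - y), is an assumed reading of "the minimum distance between the point P and the four sides of the image". The function names and the sample coordinates are hypothetical.

```python
def border_distance(box, x, y):
    """Smallest distance from (x, y) to the four sides of box = (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return min(x - x0, x1 - x, y - y0, y1 - y)


def choose_image(images, x, y):
    """Among the images covering (x, y), return the name with the largest d_i.

    `images` maps a name to its (x0, y0, x1, y1) footprint. Ties are resolved
    by dictionary order, which stands in for the arbitrary choice in the text.
    """
    covering = {name: box for name, box in images.items()
                if box[0] <= x <= box[2] and box[1] <= y <= box[3]}
    if not covering:
        return None
    return max(covering, key=lambda name: border_distance(covering[name], x, y))


# Hypothetical footprints; they are not the ones drawn in the authors' figure.
footprints = {
    "I1": (0, 0, 10, 10),
    "I2": (4, 4, 20, 20),
    "I3": (0, 0, 12, 12),
}
best = choose_image(footprints, 6, 6)
```

The larger the minimum distance, the more of the surrounding area the image shows around P, which is why the maximum of the d_i is taken.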

The authors introduce a mosaic display mode, explained in Figure 48, which increases the
level of detail of the image. Figure 49 then presents the algorithm used for this image
processing.

Figure 48. Mosaicking operation

Source: Bessaa et al., 2013, p. 193.

In order to have maximum detail in a given area, we have introduced an additional option
display: the mosaic display. Indeed, in the two previous modes, a selection criterion is applied to
select the best image including the selected point, for against, in the mosaic display it’s the solution
set of images (those contain the point P) which are displayed in an adequate mosaic window (e.g.
screen). We can also use a window query, in which case, the user selects a window instead of a point,
so the result would be all the images that have an overlap with the window.

The aim is to reconstruct an image result obtained after a click on a point P (x, y) of the
current image. The clicked point must appear at the middle of the image result. This is the result of
mosaicking several images obtained during the point query operation of the user. The position of P is
not the same in each image and the images sizes are different, it is necessary to take a portion of
each image and superimpose (mosaic) them in an original image (blank image of size of the display
window) such that the point P is centred in the resulting image. The overflow parts are deleted, see
Figure 4.

It is possible that the images used for the mosaicking are not sufficient to cover the entire
surface of the empty starting image. In this case, there will be some areas not covered (black). Rather,
it is possible that an image be sufficient to cover the original image as long as the point P is near the
centre of this image and the size thereof is larger than the display window.

Also note that all images used require pre-treatment (histogram equalisation) if the data
sources differ.

The mosaicking algorithm is described as follows:

Figure 49. Screenshot

Source: Bessaa et al., 2013, p. 193.

Table 17. Image processing

Source: Bessaa et al., 2013, p. 193.
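
Because the authors' mosaicking algorithm survives here only as a screenshot, the following is a sketch of the operation as described in the prose of this table, not a transcription of their algorithm. It assumes that all images have already been resampled to a common pixel grid, that each image is given as its top-left grid position plus a list of pixel rows, and that later images in the list are drawn over earlier ones. The name `mosaic` and its parameters are hypothetical.

```python
def mosaic(images, px, py, width, height, blank=0):
    """Build a width x height window centred on grid point (px, py).

    `images` is a list of (x0, y0, pixels): (x0, y0) is the top-left corner
    of the image in the shared grid and `pixels` is a list of rows.
    Uncovered cells keep the `blank` value (the black areas in the text).
    """
    canvas = [[blank] * width for _ in range(height)]
    left = px - width // 2   # grid column shown in canvas column 0
    top = py - height // 2   # grid row shown in canvas row 0
    for x0, y0, pixels in images:
        for r, row in enumerate(pixels):
            cy = y0 + r - top
            if not 0 <= cy < height:
                continue      # this row overflows the window and is dropped
            for c, value in enumerate(row):
                cx = x0 + c - left
                if 0 <= cx < width:
                    canvas[cy][cx] = value
    return canvas


# Three small hypothetical images around the clicked point P = (10, 10).
window = mosaic(
    [
        (8, 8, [[1, 1], [1, 1]]),
        (10, 10, [[2]]),
        (11, 11, [[3, 3, 3], [3, 3, 3], [3, 3, 3]]),
    ],
    px=10, py=10, width=5, height=5,
)
```

Passing the images sorted from low to high resolution would leave the most detailed data on top wherever footprints overlap; that ordering is a design choice of this sketch.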

The point query algorithm is presented again, since the authors modify it to handle the
proposed GIS application. The details are in Table 18.

The point query algorithm is a search algorithm giving the set of objects (in this case
images) containing a point defined by its coordinates. To adapt the algorithm to our case, we made a
change that is to remove the images with a resolution lower than the current image. For this purpose
in addition to the point coordinates (x, y), the image resolution is being introduced as input parameter.

Input data:  Q-tree index with root node Root
             Coordinates (x, y) of point P
             Resolution R of the current image
             Navigation mode (D: direct - S: sequential)
Output data: An image I
Begin
    Nd = Root                          // initialise the start node
    // Browse the tree to reach the leaf node containing P.
    While Nd <> Leaf do
        For each Son of Nd do
            If (Son contains P) then Nd = Son
        done
    done
    // Retrieve all images E satisfying the conditions
    E = ∅
    For each Image of Nd do
        If (Image contains P) and (Image.Resolution is higher than R)
            then E = E + Image
    done
    Sort E by resolution of the images
    If (Mode = S) then E = images of lower resolution
    else E = images of higher resolution
    // Apply the function Choice_Image, which gives the best image
    // according to the criterion described in 4.3
    I = Choice_Image(P, E)
End.

Table 18. Modified point query

Source: Bessaa et al., 2013, pp. 193-194.
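
The filtering and mode selection in the pseudocode above can be sketched in a few lines. The sketch starts from the images of the leaf already reached, so the tree descent is left out. Resolution is represented as ground sample distance in metres (`gsd_m`), where a smaller value means a higher resolution; this representation, the names `Image` and `modified_point_query`, and the use of the border-distance criterion in place of `Choice_Image` are assumptions made for illustration.

```python
from typing import NamedTuple


class Image(NamedTuple):
    name: str
    x0: float
    y0: float
    x1: float
    y1: float
    gsd_m: float  # metres per pixel; smaller means higher resolution


def contains(img, x, y):
    return img.x0 <= x <= img.x1 and img.y0 <= y <= img.y1


def border_distance(img, x, y):
    return min(x - img.x0, img.x1 - x, y - img.y0, img.y1 - y)


def modified_point_query(leaf_images, x, y, current_gsd_m, mode):
    """Return the image to display after a click on (x, y), or None.

    mode "S" (sequential): next resolution step after the current image.
    mode "D" (direct): highest resolution available at the point.
    """
    # Keep only images that contain P and are finer than the current image.
    e = [img for img in leaf_images
         if contains(img, x, y) and img.gsd_m < current_gsd_m]
    if not e:
        return None
    if mode == "S":
        target = max(img.gsd_m for img in e)  # coarsest of the finer images
    elif mode == "D":
        target = min(img.gsd_m for img in e)  # finest image available
    else:
        raise ValueError("mode must be 'S' or 'D'")
    group = [img for img in e if img.gsd_m == target]
    # Stand-in for Choice_Image: prefer the image where P is farthest from a border.
    return max(group, key=lambda img: border_distance(img, x, y))


# Hypothetical leaf content, using the sensor resolutions quoted later in this section.
leaf = [
    Image("ETM_30", 0, 0, 100, 100, 30.0),
    Image("XS_20", 10, 10, 90, 90, 20.0),
    Image("ETM_pan_15", 20, 20, 80, 80, 15.0),
    Image("SPOT_pan_10", 30, 30, 70, 70, 10.0),
    Image("IKONOS_1", 40, 40, 60, 60, 1.0),
]
next_step = modified_point_query(leaf, 50, 50, 30.0, "S")
most_detailed = modified_point_query(leaf, 50, 50, 30.0, "D")
```

Returning `None` when no finer image covers the point corresponds to the case in which the search finds no matching image.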

Figures 48 and 49 illustrate the algorithms for the images, as shown in Table 19.

The following example illustrates the algorithm with a threshold of 7 images per leaf. Let a
collection of 18 images at different resolutions and let two points P1 and P2 (Figures 5, 6).
- For the point P1: the set E = {I0, I1, I4, I8, I17}. For the sequential mode, the result depends
on the resolution of the current image. For the direct mode, the result is the image I8.
If we are in image I0, the result would be the image I1.
If the image is I , the result would be the image I.
- For the point P2: E = {I0} (initial image). No image of the leaf contains P2, so there will be no
image matches found.

Figure 48. Images collection and original space decomposition

Source: Bessaa et al., 2013, p. 194.

Figure 49. Associated Q-tree and research path for P1 and P2

Source: Bessaa et al., 2013, p. 194.

Table 19. Algorithms for the images

Source: Bessaa et al., 2013, p. 194.
The authors' development is applied to a GIS solution for tourist areas in Algeria and builds
on the static map implementation.

Our goal is to reach a solution offering a simple and dynamic online GIS, using
open source tools and manipulating raster data. …
It comes to realise an Online GIS solution enhancing an area with tourism
potential. Through the web site of this application, the user connected to the Internet,
will have access to various information related to tourism in the region (hotels,
restaurants, museums, recreation centres ...).
The proposed approach mainly handles raster images. As mentioned earlier, this
solution requires the preparation of maps to display. However, it is possible for the
privileged user (the webmaster) to add images and integrate them into the database of
images. (Bessaa et al., 2013, p. 194).

Considerations on the images used are given in Table 20.

In order to test our application, we used several multi- resolution satellite images covering
the region of Algiers. We took the following images: ETM + multispectral at 30 m and panchromatic
at 15 m of LANDSAT satellite, SPOT XS at 20 m and Panchromatic at 10 m and IKONOS
panchromatic at 1 m (Figure 7). To expand our database, we generated more images (fusion of
multispectral and panchromatic) for other resolutions using the set of images mentioned above.

To show the advantage of the proposed organisation, we have inserted in the given order,
ETM + images (30 m), IKONOS Panchromatic (1 m), ETM panchromatic (15 m), SPOT
panchromatic (10 m) and XS (20 m).

Figure 50. Set of images at different resolution

Source: Bessaa et al., 2013, p. 195.

Table 20. About the images

Source: Bessaa et al., 2013, p. 195.
Figure 51 shows the sequence obtained after a point query operation.

Figure 51. Sequential Navigation after point query ‘El Aurassi Hotel’
Source: Bessaa et al., 2013, p. 195.

Figure 52 compares the operations required by the usual method and by the proposed one.

Figure 52. Screenshot: comparison of operations performed by the administrator for both
classical and proposed methods
Source: Bessaa et al., 2013, p. 195.

Some considerations on this comparison are kept in the authors' own words in Table 21.
Figures 53 and 54 illustrate the mosaic display.

We note that the new variant reduces the number of operations of the administrator, because
he no longer has to manage the relationship between images (hotspots).

In addition, the proposed variant automatically takes care of overlap areas between the images
of the same resolutions. The final result of the point query operation on the selected site is shown on
the application interface we implemented (Figure 9), when Figure 10 shows a mosaic display of point
query.

Figure 53. ‘El Aurassi Hotel’ site image viewing

Source: Bessaa et al., 2013, p. 196.

Figure 54. Mosaic of selected images.

Source: Bessaa et al., 2013, p. 196.

Table 21. Comparison and mosaic display

Source: Bessaa et al., 2013, p. 196.

The authors' contribution stands out for offering a solution for GIS applications that takes
interoperability between systems into account by relying on open source software. Finally, the
authors state that the proposed solution does not settle every point open to improvement,
image processing being one example.

In this paper we presented the different types of online GIS solutions and we
were interested to the ‘static map’ solution. We have proposed a dynamic variant of it
with a new organisation of images based on the use of spatial indexes which we have
made changes on the search algorithm. Finally we have implemented this technique by
developing an online GIS enhancing a region with significant potential for tourism.
Through our application, we used open source tools for the implementation of
an online GIS. The solution we have chosen is a thin client solution in which almost all
treatments are done on the server. This can overload the server if it is overworked and
the response time may depend on the speed of the Internet connection that the user has.
Design costs and investments in hardware are reduced for this solution, which makes it
ideal for organisations and countries financially limited.
The organization we proposed provides an improvement for the ‘static map’
solution to better answer to user needs. But it still remains that this solution has
disadvantages such as the preparation of maps. (Bessaa et al., 2013, p. 196).

4 Final Considerations and Further Developments


The main contribution of this state-of-the-art review is to identify the constituent elements
of indexing strategies that deal with geographic information (also called spatial or
georeferenced information, and generally referred to by the acronym GIS).
The developments studied address both web and mobile environments. These present
different challenges, most notably that mobile devices lack sufficient storage capacity,
processing power and battery life to handle geographic data. It is therefore necessary to draw
from web and desktop developments the principles that can guide suitable mobile developments.
The work analysed in detail the algorithms of the main models, examining, beyond the
mathematical formalisations, aspects to be considered by similar developments. The advantages
and disadvantages, limits, contributions and strategies of a relevant set of works and authors
make it possible, in themselves, to extend the state of the art on the topic. It should be noted
that this review must be read together with the caching issues discussed in the corresponding
literature survey.
This review makes clear the opportunity for Xplore to generate theoretical and empirical
knowledge on indexing for mobile geographic content. Developments in tourism are still far from
addressing this issue: the very fact that the literature review identified a single work dealing
with geographic information applied to tourism illustrates these limitations. That work,
however, points to the need to consider open source platforms, since they are more affordable
and interoperable.

5 References

Bessaa, B., Aissa, M. B., Amara, R., & Aissa, A. B. (2013). Spatial indexing of static maps
for navigation in online GIS: application for tourism web GIS. International Journal of
Computer Applications in Technology, 47(2/3), 189-197.
https://doi.org/10.1504/ijcat.2013.054351
Boucetta, S.K., Daman, D., Shaik, S. (2014). Intelligent selection technique for database
indexing to augment the speed performance of query processing on mobile device. Life
Science Journal 11(4):239-245.
Inoue, C. R. (2015). Tipos de revisão de literatura. Tipos de Revisão de Literatura, 9.
Retrieved from http://www.fca.unesp.br/Home/Biblioteca/tipos-de-evisao-de-
literatura.pdf
Lee, D., & Liang, S. H. L. (2011). Geopot: A cloud-based geolocation data service for mobile
applications. International Journal of Geographical Information Science, 25(8), 1283–
1301. https://doi.org/10.1080/13658816.2011.558017
Myllymaki, J., & Kaufman, J. (2004). High-performance spatial indexing for location-based
services. (May), Journal Geoinformatica, 112. https://doi.org/10.1145/775165.775168
Ferreira, N. S. de A. (2002). As pesquisas denominadas estado de arte.
Educação & Sociedade, 23(79), 257–272. Retrieved from
http://www.scielo.br/pdf/es/v23n79/10857.pdf
Shea, G. Y. K., & Cao, J. (2012). Geo-Planar Indexing (GPI) - An efficient indexing scheme
for fast retrieval of raster-based geospatial data in mobile GIS applications. 2012 5th
International Congress on Image and Signal Processing, CISP 2012, (978), 1047–1052.
https://doi.org/10.1109/CISP.2012.6469774
Song, Z., Chen, J., & Ye, J. Y. (2014). A Mobile Storage System for Massive Spatial Data.
Advanced Materials Research, 962–965, 2730–2734.
https://doi.org/10.4028/www.scientific.net/amr.962-965.2730
