3 authors:
David Moloney
Movidius
All content following this page was uploaded by Léonie Buckley on 20 August 2018.
Abstract - For volumetric data, hashing acts to map multi-dimensional space into a one-dimensional space. Hashing is a popular method to store sparse data for the purposes of both gaming and computer graphics. Traditional methods used to hash 3D volumetric data utilise large prime numbers in an attempt to achieve well-distributed hash addresses and so minimise addressing collisions. These methods generate hash addresses through randomisation. However, it has been shown that when considering dynamic data, a low addressing collision rate cannot be guaranteed through this randomising technique. In this paper, a spatial hashing implementation is investigated to determine whether various performance parameters can be improved upon through the use of DECO Hashing. DECO leverages the inherent structure present in 3D data, namely that each coordinate in 3D space is already unique. An open source version of Chisel, OpenChisel, is investigated to determine whether the algorithm can be improved upon by replacing the existing hashing function with DECO Hashing.

I. INTRODUCTION
Sparse geometric data is ubiquitous in computer graphics, GIS (Geographical Information Systems) and gaming applications, and its use on power- and memory-confined embedded systems is commonplace. Applications in gaming include human pose estimation [1], [2], the creation of ultra-realistic 3D models [3] and 3D scene reconstruction for AR gaming [4]. A challenge now exists to provide a sparse storage solution which offers both performance and efficient storage [5]. Hashing is an attractive method to store this sparse data as it does not require pointers, and allows for trivial lookup, insertion and deletion of data.

Hashing is used to map data of arbitrary size to an addressing space. More specifically, for 3D volumetric data, hashing acts to map multi-dimensional data into the one-dimensional domain [6]. Hashing is a popular method to store, retrieve and delete 3D volumetric data [7], [8], [9], [10], [11].

However, the choice of hash function impacts how efficiently the data can be stored - a high probability of unique hash addresses reduces the need for additional overheads to deal with addressing collisions. It also allows for high load factors (a measure of how full the hash table is), which reduces wasted memory in the form of empty hash table addresses.

In much of the early literature discussing hashing [12], [13], the common goal is to achieve distinct mapping addresses. However, Knuth states that "it is theoretically impossible to define a hash function that creates truly random data from nonrandom data in files" [13]. The birthday paradox [14] highlights the difficulty in achieving distinct addresses: if a random function is selected to map 23 keys to a table of size 365, the probability that no two keys map to the same location is only 0.4927.

In hashing, the address calculation is generally achieved by a randomised scrambling of key values, with many methods using large primes to achieve this scrambling [7], [8], [9], [10], [11]. A hash is "a random jumble achieved by hashing" [12]. However, it has been shown that this randomising method, using XOR Hashing, provides no predictability as to how it will perform on a diverse range of data [15]. This is due to its hardcoded prime values. If the data being considered is static, then the hashing algorithm can be run multiple times to determine the optimum prime values to use, but this is not possible when dynamic data is being received from sensors on robots and drones.

An optimal solution is to provide a perfect hashing function, which allows the retrieval of data in a hash table with a single query [16], [17], [18]. This would provide a hash with no addressing collisions. Intuitively, this could be achieved by providing a suitably large hash table. However, this is not always practical, as volumetric data is often used in memory-constrained situations, such as SLAM (Simultaneous Localisation And Mapping) and GIS applications on mobile and robotic platforms [8], [11].

The Chisel algorithm is a system for real-time 3D reconstruction onboard a Google Tango [8], which stores the 3D data through dynamic spatial hashing. In the investigation of the Chisel algorithm [8], it was noted that no justification was given for the choice of hash function - XOR Hashing (see Section II-A) - other than that it was used in previous works [7], [19], [9].

However, the properties of a hashing function can have a devastating impact on the performance of an algorithm. It is acknowledged that when different data is mapped to the same hash address, performance decreases [7]; thus unique hash addresses are desirable.

As much computation regarding volumetric data takes place on embedded platforms - Chisel is benchmarked on two Tango devices, a phone and a tablet [8] - such memory-constrained devices require algorithms with similarly low memory demands. To an extent this was taken into account in Chisel by only representing occupied data, to avoid needless computation and wasted memory [8]. However, a suboptimal hashing function produces needless computation, as addressing collisions must be resolved, diverting resources from other applications. Minimising these collisions will therefore lead to improvements in other aspects of an algorithm.
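The birthday-paradox figure quoted above (23 keys, a table of 365 addresses, probability 0.4927 that all addresses are distinct) follows from multiplying together the chance that each successive key avoids every address already taken, assuming each key is mapped independently and uniformly at random. A minimal Python sketch of that product, using a hypothetical helper `prob_all_distinct`:

```python
def prob_all_distinct(keys: int, table_size: int) -> float:
    """Probability that `keys` independent, uniformly random addresses in a
    table of `table_size` slots are pairwise distinct (the birthday problem)."""
    p = 1.0
    for i in range(keys):
        # The (i + 1)-th key must avoid the i addresses already taken.
        p *= (table_size - i) / table_size
    return p


print(round(prob_all_distinct(23, 365), 4))  # 0.4927
print(round(prob_all_distinct(23, 730), 4))  # same keys, table twice as large
```

Even doubling the table to 730 addresses still leaves about a 30% chance that at least two of the 23 keys share an address, which illustrates why a randomising hash cannot be relied on to produce distinct addresses without a very sparse table.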
2018 IEEE Games, Entertainment, Media Conference (GEM).
The address in the hash table is then found by taking the index modulo N, the number of table addresses, as shown in Equation 5 below.

$$\mathrm{hash\_addr} = \mathrm{index} \bmod N \qquad (5)$$

Similar addressing schemes to DECO Hashing exist [27], [28], but both approaches are more suited to static data, due to their large dense hash tables and linear addressing, respectively. Linear addressing removes the ability to randomly access voxels in the hash table, as the table must be traversed to find the desired voxel.

A. Adaptive Qualities of DECO Hashing
DECO Hashing is a parameterisable method of hashing and can be adapted to a particular dataset. The following properties are adaptive and can be altered to minimise addressing collisions.
1) Side Length: It is possible to choose a value of SL which guarantees that the index value of every voxel, calculated using Equation 4, is unique. This value is the cube root of the number of voxel blocks present, i.e. the number of voxel blocks that exist on a side. When this optimum value is chosen, the maximum value of any coordinate of a voxel block is SL − 1. This ensures that no two x, y, z combinations produce the same index value.
2) Coordinate Ordering: Given the distribution of the data being considered, the ordering of x, y, z can be altered to reduce the rate of addressing collisions.
For example, for large-scale LiDAR datasets such as the Dublin City Dataset [29], [30], it can be assumed that the data will be highly sparse in the z (vertical) direction (due to tall buildings). Therefore, the ordering that minimises addressing collisions under these conditions can be chosen.
The predictability that these parameters allow is in stark contrast with the stochastic nature of XOR Hashing.

IV. COMPARISON OF HASHING METHODS
Below is a description of some key concepts in hashing, as well as of the tests administered.
A. Parameters
The following parameters are important to understand when interpreting the tests administered. The definitions are taken from http://www.cplusplus.com/ [31]:
• Chunks - A chunk, as described in [8], consists of a fixed grid of $N_v^3$ voxels.
• Bucket - A bucket is a slot in the container's internal hash table to which elements are assigned based on the hash value of their key.
• Hash Table Size - The number of buckets in the hash table.
• Load Factor - The ratio between the number of entries in the hash table and the number of buckets, i.e. how full the table is.
• Maximum Load Factor - The maximum permissible load factor. If the current load factor would exceed the maximum load factor, the number of buckets is increased, causing a rehash.
• Bucket Size - The number of entries in a single bucket. The number of elements in a bucket influences the time it takes to access a particular element in that bucket. It is therefore preferable to minimise the number of elements within occupied buckets by reducing addressing collisions.
• Rehash - Rehashes are automatically performed by the hash table whenever its load factor is going to surpass its maximum load factor in an operation. A rehash is a reconstruction of the hash table: all the elements in the hash table are rearranged into the new set of buckets according to their hash value. This may alter the order of iteration of elements within the container.

B. Experiments Conducted
To compare XOR Hashing against DECO Hashing in OpenChisel, various sequences from the Freiburg [32] dataset were examined. These are the same datasets that were used in the original Chisel [8] implementation. The exact sequences examined are detailed in Appendix A. The data hashed was voxel chunks, each of size $N_v^3$ voxels. The coordinates used in XOR and DECO Hashing are the coordinates of the voxel chunks, which are found by rounding a world coordinate to a chunk coordinate [8]. The OpenChisel algorithm was run for each sequence, and the following parameters were recorded at the end of each run:
• Traversal Time - The number of cycles taken to traverse the hashed structure. The time taken to traverse the data structure is important, as traversal provides information about the mapped area - for example, how densely occupied the area is, or what the maximum occupied height is. Intuitively, the faster this traversal can take place, the faster the system can proceed with other actions. A fast traversal is especially important for applications such as drones and robotics, which deal with dynamic situations and may be required to make decisions based on information gained from the traversal. Traversal was measured at the same points for both hashing functions, on an Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz. This process was repeated numerous times; the lowest and highest outliers were excluded and the average number of cycles was recorded. The cycles were recorded using the cycle counter from FFTW [33], a cross-platform cycle counter.
• Bucket Occupancy - The percentage of hash buckets occupied.
• Collisions - The percentage of hash buckets that contained more than one voxel chunk (i.e. a collision occurred) with respect to the total number of hash buckets.
• Relative Collisions - The percentage of hash buckets that contained more than one voxel block with respect to the number of occupied hash buckets.
• Cost of Computation - The cost of computing each hashing function, measured as the number of cycles required to calculate the function.
• Locality - The locality with which voxels in the 3D space are placed relative to each other in the hash table.
These parameters were measured after the entire dataset had been evaluated by the OpenChisel algorithm, i.e. after the mesh had been constructed. The datasets used were from the Freiburg Dataset [32] and are described in more detail in Appendix A. Each dataset was tested individually, using the ROS bag files provided with the datasets.
The sole alteration made to the open source version of the Chisel algorithm was to swap the original XOR Hashing function for DECO Hashing. Both of these algorithms are described in Section II.
Collisions
OpenChisel deals with collisions by using a C++ associative container, std::unordered_map. When voxel blocks are mapped to the same address in an unordered_map, it is not technically considered that a collision has occurred, as the new voxel block is simply added to the bucket. While this pointer-based collision chaining solution is suitable for the PC-based OpenChisel implementation, pointer implementations are not suitable for memory-confined embedded systems, especially when considering dynamic data. When new data is inserted or deleted, the pointers must at times be reassigned, which introduces additional overheads.
Instead of focusing energy and resources on collision resolution, it is preferable to focus on collision reduction, so as to avoid the need for collision resolution at all. This releases the resources previously required for collision resolution for other requirements. This is of particular relevance on memory- and resource-confined embedded systems, such as those used in the original Chisel [8] implementation.

V. RESULTS
The results for the tests detailed in Section IV are outlined below.
A. Traversal Time
As discussed in Section IV, the traversal time is taken as the number of cycles required to traverse the hash table. Traversal consists of querying every bucket in the hash table to determine whether or not it is occupied. It also involves noting all voxel blocks in the bucket: if no collisions have occurred, a single query suffices for that bucket. However, if collisions have occurred, each of the additional voxel blocks must then be queried in turn, meaning that the number of elements in a bucket influences the time it takes to access a particular element in the bucket [31].
The improvements in traversal time for the sequences described in Appendix A when XOR Hashing was replaced with DECO Hashing are shown in Figure 2. There was an improvement (i.e. a decrease) in traversal time for each of the sequences examined when XOR Hashing was replaced with DECO Hashing. The improved traversal time is due to the decrease in addressing collisions that occurs when XOR Hashing is replaced with DECO Hashing, as described in Section III. The average improvement in traversal time when exchanging XOR Hashing for DECO Hashing was 16.72%.
Fig. 2: The improvement in traversal time when XOR Hashing was replaced with DECO Hashing

B. Bucket Occupancy
As described in Section IV, bucket occupancy is given as the percentage of hash buckets that are occupied. As many volumetric hashing solutions, such as Chisel [8], run on memory-constrained embedded systems, it is not practical to provide an arbitrarily large hash table to accommodate all voxels. This is why only occupied voxel chunks are represented through hashing.
Another characteristic of an optimum hash function is a fully occupied hash table [16], [17]. That is, no empty buckets exist, yet no addressing collisions have occurred. This is of course not practical when dealing with dynamic data, as new data insertions would cause addressing collisions if the table were fully occupied. Therefore, the more practical optimal solution is to minimise the number of addressing collisions and maximise the bucket occupancy.
From the results in Figure 3, it is clear that DECO Hashing had a higher bucket occupancy than XOR Hashing for each of the sequences considered.

C. Collision Rate
As described in Section IV, the collision rate is given as the percentage of hash buckets that contained more than one voxel chunk (i.e. a collision occurred) with respect to the total number of hash buckets. As previously mentioned, an optimal hashing solution is to provide a perfect hashing function, which allows the retrieval of data from the hash table with a single query [16], [17]. That is, no addressing collisions occur. While an optimal hashing solution is not always possible to implement, reducing the addressing collision rate as much as
traditional XOR method - it requires fewer cycles to compute, and thus will lead to an overall decrease in execution time for the algorithm in which it is used.

Property                          Bunny
Resolution                        64³
No. of occupied voxels            11070
Occupancy                         4.2%
Max x                             63
Max y                             62
Max z                             49
Hash table size (no. entries)     2040
Hash table size (bytes)           32kB
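The metric definitions in Section IV (bucket occupancy, collisions, relative collisions) can be made concrete with a small Python sketch that hashes one hypothetical set of chunk coordinates with both functions. Several parts of it are assumptions rather than details from this paper: the XOR primes (73856093, 19349663, 83492791, the values commonly attributed to Teschner et al.), the row-major form x + SL·y + SL²·z used as a stand-in for Equation 4, the toy 16 × 16 × 2 chunk map, and the helper names `xor_address`, `deco_address` and `bucket_metrics`. The numbers it prints are illustrative only and are not the Freiburg results.

```python
from collections import Counter
from itertools import product

# ASSUMPTION: primes commonly attributed to Teschner et al. for XOR spatial
# hashing; not taken from this paper or from the OpenChisel source.
P1, P2, P3 = 73856093, 19349663, 83492791


def xor_address(x, y, z, n):
    """XOR Hashing: scramble the coordinates with large primes, then wrap."""
    return ((x * P1) ^ (y * P2) ^ (z * P3)) % n


def deco_address(x, y, z, side_length, n):
    """ASSUMPTION: row-major stand-in for Equation 4, then Equation 5."""
    index = x + side_length * y + side_length ** 2 * z
    return index % n  # Equation 5: hash_addr = index mod N


def bucket_metrics(addresses, n):
    """Bucket occupancy, collision rate and relative collision rate, in %."""
    per_bucket = Counter(addresses)  # bucket -> number of chunks in it
    occupied = len(per_bucket)
    colliding = sum(1 for count in per_bucket.values() if count > 1)
    return {
        "occupancy": 100.0 * occupied / n,
        "collisions": 100.0 * colliding / n,
        "relative_collisions": 100.0 * colliding / occupied if occupied else 0.0,
    }


# Hypothetical toy map: a 16 x 16 chunk floor, two chunks high, 1024 buckets.
SL, N = 16, 1024
chunks = list(product(range(SL), range(SL), range(2)))

print("XOR ", bucket_metrics([xor_address(x, y, z, N) for x, y, z in chunks], N))
print("DECO", bucket_metrics([deco_address(x, y, z, SL, N) for x, y, z in chunks], N))
# DECO line: {'occupancy': 50.0, 'collisions': 0.0, 'relative_collisions': 0.0}
```

In this toy case the DECO-style addresses fill 512 of the 1024 buckets without collisions only because SL and N were chosen so that every index is smaller than N. With a smaller table, the modulus in Equation 5 can still fold distinct indices onto the same bucket, which is consistent with the observation that DECO Hashing reduces, but does not eliminate, addressing collisions.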
F. Locality of mappings
every voxel in 3D space possesses unique coordinates. This ensures that the index calculated in Equation 4 will be unique. Collisions can only occur because the index must be reduced modulo the hash table size so that the hash addresses fit in the table, as calculated using Equation 5.
While XOR Hashing is technically deterministic (the same input will always produce the same output), the address calculation is achieved through the randomised scrambling of key values, utilising large primes. This random nature can provide no guarantee of unique hash addresses. While DECO does not guarantee unique addresses either, it has been shown to add a certain level of predictability, as every index calculation is unique. This in turn increases the probability of unique hash addresses. These are the properties that allowed DECO Hashing to outperform XOR Hashing in all of the experiments that were conducted.

VII. CONCLUSION
The following findings were determined through the replacement of XOR Hashing with DECO Hashing in OpenChisel [20], an open-source implementation of Chisel [8]. The experiments administered are detailed in Section IV.
Discussing the use of XOR Hashing, [7] states that "Although the hash function does not always provide a unique mapping of grid cells, it can be generated very efficiently and does not require a complex data structure". This paper has shown that replacing XOR Hashing with DECO leads to substantial performance gains - DECO Hashing outperformed XOR Hashing in every test administered:
• Traversal Time - An improvement of 16.82% was found when XOR Hashing was replaced with DECO Hashing. This is due to the decrease in addressing collisions when comparing DECO Hashing with XOR Hashing.
• Bucket Occupancy - Again, DECO Hashing outperformed XOR Hashing in this experiment. For the sequences examined, XOR Hashing had an average bucket occupancy of 21.08%, while DECO Hashing had an average bucket occupancy of 27.76%.
• Collision Rate - Again, DECO Hashing outperformed XOR Hashing in this experiment. For the sequences examined, XOR Hashing had an average collision rate of 5.7%, while DECO Hashing had an average collision rate of 2.57%.
• Relative Collision Rate - Again, DECO Hashing outperformed XOR Hashing in this experiment. For the sequences examined, XOR Hashing had an average relative collision rate of 21.71%, while DECO Hashing had an average relative collision rate of 6.29%.
• Locality - DECO Hashing was shown to store voxels in the hash table with more locality than XOR Hashing. That is, voxels close to each other in the multi-dimensional space are close to each other in the one-dimensional space.
Given the results above, it is the view of the authors of this paper that DECO Hashing is preferable to traditional XOR Hashing for hashing volumetric data. The improvements found in each of the experiments conducted are due to the fact that DECO Hashing exploits the inherent structure of 3D data, namely that every voxel possesses unique coordinates. While DECO Hashing does dramatically decrease the rate of addressing collisions, it does not eliminate them completely.

VIII. FUTURE WORK
An obvious extension to DECO Hashing is to include addressing collision resolution. To date this has not been implemented, due to the low percentage of addressing collisions that occurred on the datasets tested.
A further consideration is to add condensation, as used with linear quad/octrees [35], [24], [25], [36]. Condensation concerns representing a large number of occupied voxels with a single identifier. The use of condensation in linear quad/octrees has been shown, in best-case scenarios, to require only 2% of the memory required by regular quad/octrees.
While it has been shown that DECO is superior to XOR Hashing when hashing 3D volumetric data, one must always take caution when using hash functions for other purposes. Knuth offers a word of caution: "we can never be completely sure that a hash function will perform properly when it is applied to a new set of data."

APPENDIX
A. Datasets
The datasets examined in this paper are from the Freiburg Dataset [32]. Varying sequences were chosen to ensure a wide range of data distributions within the datasets. Listed below are the sequences used, along with a brief description [32].
• freiburg2_xyz - This sequence contains very clean data for debugging translations. The Kinect was moved along the principal axes in x-, y- and z-direction very slowly. The slow camera motion basically ensures that there is (almost) no motion blur and rolling shutter effects in the data.
• freiburg3_walking_static - Two persons walk through an office scene. The Asus Xtion sensor has been kept in place manually. This sequence is intended to evaluate the robustness of visual SLAM and odometry algorithms to quickly moving dynamic objects in large parts of the visible scene.
• freiburg1_360 - This sequence contains a 360 degree turn in a typical office environment.
• freiburg3_walking_xyz - Two persons walk through an office scene. The Asus Xtion sensor has manually been moved along three directions (xyz) while keeping the same orientation. This sequence is intended to evaluate the robustness of visual SLAM and odometry algorithms to quickly moving dynamic objects in large parts of the visible scene.
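The Locality result listed in Section VII can be illustrated with a short Python sketch that measures how far apart, in the hash table, each chunk and its neighbour along x end up. The XOR primes (the values commonly attributed to Teschner et al.), the row-major stand-in for Equation 4, the 8 × 8 × 8 grid, the table size and the helper `mean_neighbour_gap` are all assumptions made for illustration, not the measurement used in this paper.

```python
from itertools import product

# ASSUMPTION: commonly used XOR-hashing primes (Teschner et al.).
P1, P2, P3 = 73856093, 19349663, 83492791


def xor_address(x, y, z, n):
    return ((x * P1) ^ (y * P2) ^ (z * P3)) % n


def deco_address(x, y, z, side_length, n):
    # ASSUMPTION: row-major stand-in for Equation 4, then Equation 5 (mod N).
    return (x + side_length * y + side_length ** 2 * z) % n


def mean_neighbour_gap(address_of, side_length):
    """Mean |address difference| between each chunk and its +x neighbour."""
    gaps = [
        abs(address_of(x + 1, y, z) - address_of(x, y, z))
        for x, y, z in product(range(side_length - 1),
                               range(side_length), range(side_length))
    ]
    return sum(gaps) / len(gaps)


SL, N = 8, 4096  # hypothetical 8 x 8 x 8 chunk grid; N > SL**3 avoids wrap-around
print("DECO", mean_neighbour_gap(lambda x, y, z: deco_address(x, y, z, SL, N), SL))  # 1.0
print("XOR ", mean_neighbour_gap(lambda x, y, z: xor_address(x, y, z, N), SL))
```

With the row-major ordering assumed here, x-neighbours land in adjacent buckets while y- and z-neighbours are SL and SL² buckets apart; this is one way to see why the coordinate ordering described in Section III-A can be tuned to the distribution of the data.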