Pattern Recognition
journal homepage: www.elsevier.com/locate/pr
Article history:
Received 4 June 2013
Received in revised form 2 December 2013
Accepted 9 January 2014
Available online 21 January 2014

Keywords:
Augmented reality
Fiducial marker
Computer vision

Abstract
This paper presents a fiducial marker system specially suited for camera pose estimation in applications such as augmented reality and robot localization. Three main contributions are presented. First, we propose an algorithm for generating configurable marker dictionaries (in size and number of bits) following a criterion to maximize the inter-marker distance and the number of bit transitions. In the process, we derive the maximum theoretical inter-marker distance that dictionaries of square binary markers can have. Second, a method for automatically detecting the markers and correcting possible errors is proposed. Third, a solution to the occlusion problem in augmented reality applications is shown. To that aim, multiple markers are combined with an occlusion mask calculated by color segmentation. The experiments conducted show that our proposal obtains dictionaries with higher inter-marker distances and lower false negative rates than state-of-the-art systems, and provides an effective solution to the occlusion problem.

© 2014 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.patcog.2014.01.005
S. Garrido-Jurado et al. / Pattern Recognition 47 (2014) 2280–2292
Fig. 1. Example of augmented reality scene. (a) Input image containing a set of fiducial markers. (b) Markers automatically detected and used for camera pose estimation. (c) Augmented scene without considering the user's occlusion. (d) Augmented scene considering occlusion.
Section 5 presents our solution to the occlusion problem. Finally, Section 6 shows the experimentation carried out, and Section 7 draws some conclusions.
Finally, it must be indicated that our work has been implemented in the ArUco library, which is freely available [13].

2. Related work

A fiducial marker system is composed of a set of valid markers and an algorithm which performs their detection, and possibly correction, in images. Several fiducial marker systems have been proposed in the literature, as shown in Fig. 2.
The simplest proposals consist of using points as fiducial markers, such as LEDs, retroreflective spheres or planar dots [14,15], which can be segmented using basic techniques under controlled conditions. Their identification is usually obtained from the relative position of the markers and often involves a complex process.
Other approaches use planar circular markers where the identification is encoded in circular sectors or concentric rings [16,17]. However, circular markers usually provide just one correspondence point (the center), making it necessary to detect several of them for pose estimation.
Other types of fiducial markers are based on blob detection. CyberCode [18] and VisualCode [19] are derived from 2D-barcode technology such as MaxiCode or QR, but can also accurately provide several correspondence points. Other popular fiducial markers are the ReacTIVision amoeba markers [20], which are also based on blob detection and whose design was optimized by using genetic algorithms. Some authors have proposed the use of trained classifiers to improve detection in cases of bad illumination and blurring caused by fast camera movement [21].
An alternative to the previous approaches is square-based fiducial marker systems. Their main advantage is that the presence of four prominent points can be employed to obtain the pose, while the inner region is used for identification (either using a binary code or an arbitrary pattern such as an image). In the arbitrary pattern category, one of the most popular systems is ARToolKit [10], an open-source project which has been extensively used in the last decade, especially in the academic community. ARToolKit markers are composed of a wide black border with an inner image which is stored in a database of valid patterns. Despite its popularity, it has some drawbacks. First, it uses a template matching approach to identify markers, obtaining high false positive and inter-marker confusion rates [22]. Second, the system uses a fixed global threshold to detect squares, making it very sensitive to varying lighting conditions.
Most of the square-based fiducial systems use binary codes. Matrix [23] is one of the first and simplest proposals. It uses a binary code with redundant bits for error detection. The ARTag [11] system is based on the same principles but improves the robustness to lighting and partial occlusion by using an edge-based square detection method instead of a fixed threshold. Additionally, it uses a binary coding scheme that includes checksum bits for error detection and correction. It also recommends using its dictionary markers in a specific order so as to maximize the inter-marker distances. Its main drawback is that the proposed
marker dictionary is fixed to 36 bits and the maximum number of erroneous bits that can be corrected is two, independent of the inter-marker distances of the subset of markers used.
ARToolKit Plus [24] improves some of the features of ARToolKit. First, it includes a method to automatically update the global threshold value depending on pixel values from previously detected markers. Second, it employs binary codes including error detection and correction, thus achieving higher robustness than its predecessor. The last known version of ARToolKitPlus employs a binary BCH [25] code for 36-bit markers, which presents a minimum Hamming distance of two. As a consequence, ARToolKitPlus BCH markers can detect a maximum error of one bit and cannot perform error correction. The ARToolKitPlus project was halted and followed by the Studierstube Tracker [12] project, which is not publicly available.
BinARyID [26] proposes a method to generate binary coded markers focused on avoiding rotation ambiguities; however, it only achieves a Hamming distance of one between two markers and does not present any error correction process. There are also some closed-source systems which employ square markers, such as the SCR, HOM and IGD [27] marker systems used by the ARVIKA project [28].
This paper proposes a square-based fiducial marker system with binary codes. However, instead of using a predefined set of markers, we propose a method for generating configurable marker dictionaries (with arbitrary size and number of markers), containing only the number of markers required. Our algorithm produces markers using a criterion to maximize the inter-marker distance and the number of bit transitions. Additionally, a method for detecting and correcting errors, based on the dictionary obtained, is proposed. This method allows error correction of a greater number of erroneous bits than current state-of-the-art systems.
Our last contribution is related to the occlusion problem in augmented reality applications. When designing an augmented reality application, interactivity is a key aspect to consider, so one may expect users to occlude the markers. ARTag handles the problem in two ways. First, the marker detection method allows small breaks in the square sides. Second, they employ several markers simultaneously; thus, the occlusion of some of them does not affect the global pose estimation. Despite being robust to occlusion, ARTag still has a main drawback: it cannot detect occlusion precisely. As a consequence, if an object moves between the camera and the augmented scene (e.g. the user's hands), the virtual objects will be rendered on the hands, hiding them (see Fig. 1(c,d)).
Proposals to detect the occluded regions usually fall into three main categories: depth-based, model-based, and color-based approaches. Depth-based approaches try to calculate the depth of the image pixels to detect occlusions. However, these approaches require depth-based sensors, such as stereo, time-of-flight or structured light cameras [29–31]. When a single camera is used, some authors have adopted model-based approaches [32,33]. The idea is to provide geometric models of the objects which can occlude the scene, and detect their pose. This solution is not practical in many applications where the occluding objects are not known in advance, and it imposes very strong performance limitations. Finally, color-based approaches [34] can be employed. The idea is to create a color model of the scene (background) which is then compared to the foreground objects.
In this work, we propose the use of multiple markers to handle occlusion (as in ARTag). However, we also propose the use of a color map for precisely detecting the visible pixels, so that the virtual scene is only rendered on them. In order to improve segmentation, we employ blue and green markers instead of classical black-and-white ones. As we experimentally show, our proposal is an effective method for improving current augmented reality applications, such as in the gaming or film industry, although it is not limited to that.

3. Automatic dictionary generation

The most relevant aspects to consider when designing a marker dictionary are the false positive and negative rates, the inter-marker confusion rate, and the number of valid markers [11]. The first two are often tackled in the literature using error detection and correction bits, which, on the other hand, reduce the number of valid markers. The third one depends only on the distance between the markers employed. If they are too close, a few erroneous bits can lead to another valid marker of the dictionary, and the error might not even be detected.
Another desirable property of markers is a high number of bit transitions, so that they are less likely to be confused with environment objects. For instance, binary codes with only zeros or ones will be printed as completely black or white markers, respectively, which would be easily confused with environment objects.
While previous works impose fixed dictionaries, we propose an automatic method for generating them with the desired number of markers and the desired number of bits. Our problem is then to select m markers, from the space 𝔻 of all markers with n × n bits, so that they are as far as possible from each other and have as many bit transitions as possible. In general, the problem is to find the dictionary D* that maximizes the desired criterion τ̂(D):

D* = argmax_{D ∈ 𝔻} { τ̂(D) }.   (1)

Since a complete evaluation of the search space is not feasible even for a small n, a stochastic algorithm that finds suboptimal solutions is proposed.

3.1. Algorithm overview

Our algorithm starts from an empty dictionary D that is incrementally populated with new markers. Our markers are encoded as an (n+2) × (n+2) grid (Fig. 3) where the external cells are set to black, creating an easily detectable external border. The remaining n × n cells are employed for coding. Thus, we might define a marker

m = (w_0, w_1, …, w_{n−1}),   (2)

as a tuple composed of n binary words w of length n such that

w = (b_0, …, b_{n−1} | b_i ∈ {0, 1}).   (3)

Let us also denote by W the set of all possible words of n bits, whose cardinal is |W| = 2^n.
At each iteration of the algorithm, a marker is selected based on a stochastic process that assigns more probability to markers with a higher number of bit transitions and whose words have not yet been added to D. If the distance between the generated marker and those in D is greater than a minimum value τ, then it is added. Otherwise, the marker is rejected and a new marker is randomly selected. The process stops when the required number of markers is achieved.
Because of the probabilistic nature of the algorithm, the acceptance of new markers could be improbable, or even impossible, in some cases. To guarantee the convergence of the algorithm, the distance threshold is initially set to the maximum possible inter-marker distance that the dictionary can have, τ_0. Along the process, the value of τ is reduced after a number of unproductive iterations, ψ. The final value τ̂(D) represents the minimum distance between any two markers in D, and it will be
Fig. 3. Examples of markers of different sizes, n, generated with the proposed method. From left to right: n = 5, n = 6 and n = 8.
Fig. 4. Examples of quartets for a 2 × 2 and a 3 × 3 marker. Each arrow indicates the destination of a bit after a 90° clockwise rotation.
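The generation loop of Section 3.1 can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the transition-biased stochastic selection is approximated by keeping the most transition-rich of a few uniform random proposals, the rotation quartets of Fig. 4 are ignored when measuring distances, and `tau0` and `psi` are placeholder values. The returned threshold is a lower bound on the final τ̂(D).

```python
import random

def transitions(marker):
    """Count bit transitions inside each word of the marker."""
    return sum(sum(w[i] != w[i + 1] for i in range(len(w) - 1)) for w in marker)

def distance(m1, m2):
    """Hamming distance between two markers (rotation quartets ignored here)."""
    return sum(b1 != b2 for w1, w2 in zip(m1, m2) for b1, b2 in zip(w1, w2))

def random_marker(n):
    """A random marker: n binary words of n bits each."""
    return tuple(tuple(random.randint(0, 1) for _ in range(n)) for _ in range(n))

def generate_dictionary(m, n, tau0, psi, seed=0):
    """Incrementally populate a dictionary of m markers of n x n bits.

    A candidate is accepted if its distance to every marker already in D
    is at least tau; tau starts at tau0 and is relaxed after psi
    unproductive iterations, which guarantees convergence.
    """
    random.seed(seed)
    D, tau, unproductive = [], tau0, 0
    while len(D) < m:
        # crude stand-in for the transition-biased stochastic selection:
        # keep the most transition-rich of ten random proposals
        candidate = max((random_marker(n) for _ in range(10)), key=transitions)
        if all(distance(candidate, other) >= tau for other in D):
            D.append(candidate)
            unproductive = 0
        else:
            unproductive += 1
            if unproductive > psi:
                tau, unproductive = tau - 1, 0
    return D, tau
```

Because the threshold only decreases, every accepted pair of markers is at least the final threshold apart, so the loop terminates even when acceptance becomes improbable.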
Fig. 5. Image process for automatic marker detection. (a) Original image. (b) Result of applying local thresholding. (c) Contour detection. (d) Polygonal approximation and
removal of irrelevant contours. (e) Example of marker after perspective transformation. (f) Bit assignment for each cell.
4. Marker detection and error correction

This section explains the steps employed to automatically detect the markers in an image (Fig. 5(a)). The process is comprised of several steps aimed at detecting rectangles and extracting the binary code from them. For that purpose, we take as input a gray-scale image. While the image analysis is not a novel contribution, the marker code identification and error correction is a new approach specifically designed for the dictionaries generated by our method. The steps employed by our system are described in the following:

Image segmentation: First, the most prominent contours in the gray-scale image are extracted. Our initial approach was to employ the Canny edge detector [35]; however, it is very slow for our real-time purposes. In this work, we have opted for a local adaptive thresholding approach, which has proven to be very robust to different lighting conditions (see Fig. 5(b)).
Contour extraction and filtering: Afterwards, a contour extraction is performed on the thresholded image using the Suzuki and Abe [36] algorithm. It produces the set of image contours, most of which are irrelevant for our purposes (see Fig. 5(c)). Then, a polygonal approximation is performed using the Douglas–Peucker [37] algorithm. Since markers are enclosed in rectangular contours, those that are not approximated to 4-vertex polygons are discarded. Finally, we simplify near contours, leaving only the external ones. Fig. 5(d) shows the resulting polygons from this process.
Marker code extraction: The next step consists in analyzing the inner region of these contours to extract the internal code. First, perspective projection is removed by computing the homography matrix (Fig. 5(e)). The resulting image is thresholded using Otsu's method [38], which provides the optimal image threshold value given that the image distribution is bimodal (which holds true in this case). Then, the binarized image is divided into a regular grid and each element is assigned the value 0 or 1 depending on the values of the majority of pixels in it (see Fig. 5(e,f)). A first rejection test consists in detecting the presence of the black border. If all the bits of the border are zero, then the inner grid is analyzed using the method described below.
Marker identification and error correction: At this point, it is necessary to determine which of the marker candidates obtained actually belong to the dictionary and which are just part of the environment. Once the code of a marker candidate is extracted, four different identifiers are obtained (one for each possible rotation). If any of them is found in D, we consider the candidate a valid marker. To speed up this process, the dictionary elements are sorted as a balanced binary tree. To that aim, markers are represented by the integer value obtained by concatenating all their bits. It can be deduced then that this process has a logarithmic complexity O(4 log_2(|D|)), where the factor 4 indicates that one search is necessary for each rotation of the marker candidate.

If no match is found, the correction method can be applied. Considering that the minimum distance between any two markers in D is τ̂, an error of at most ⌊(τ̂ − 1)/2⌋ bits can be detected and corrected. Therefore, our marker correction method consists in calculating the distance of the erroneous marker candidate to all the markers in D (using Eq. (8)). If the distance is equal to or smaller than ⌊(τ̂ − 1)/2⌋, we consider that the nearest marker is the correct one. This process, though, presents a linear complexity of O(4|D|), since each rotation of the candidate has to be compared to the entire dictionary. Nonetheless, it is a highly parallelizable process that can be efficiently implemented in current computers.
Note that, compared to the dictionaries of ARToolKitPlus (which cannot correct errors using the BCH set) and ARTag (only capable of recovering errors of two bits), our approach can correct errors of ⌊(τ̂ − 1)/2⌋ bits. For instance, for a dictionary generated in Section 6 with 6 × 6 bits and 30 markers, we obtained τ̂ = 12, so our approach can correct 5 bits of errors in this dictionary. Additionally, we can generate markers with more bits, which leads to a larger τ̂, thus increasing the correction capabilities.
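The identification and correction steps can be sketched in pure Python. This is an illustration, not the library code: markers are flattened row-major bit lists, `rotate90` is a hypothetical helper for the rotation quartet, a sorted list with binary search stands in for the balanced binary tree, and plain Hamming distance replaces the paper's Eq. (8).

```python
from bisect import bisect_left

def rotate90(bits, n):
    """Rotate an n x n bit matrix (flattened row-major) 90 degrees clockwise."""
    return [bits[(n - 1 - c) * n + r] for r in range(n) for c in range(n)]

def to_int(bits):
    """Concatenate the bits of a marker into a single integer identifier."""
    out = 0
    for b in bits:
        out = (out << 1) | b
    return out

def hamming(a, b):
    """Hamming distance between two integer identifiers."""
    return bin(a ^ b).count("1")

def identify(candidate, n, sorted_ids, tau_hat):
    """Look up a candidate (flattened n x n bits) in the dictionary.

    Tries the four rotations with a binary search each; if none matches
    exactly, falls back to exhaustive correction of up to
    floor((tau_hat - 1) / 2) erroneous bits.
    """
    rotations = [candidate]
    for _ in range(3):
        rotations.append(rotate90(rotations[-1], n))
    ids = [to_int(r) for r in rotations]
    for i in ids:  # exact match: one binary search per rotation
        pos = bisect_left(sorted_ids, i)
        if pos < len(sorted_ids) and sorted_ids[pos] == i:
            return i
    max_err = (tau_hat - 1) // 2  # error-correction capability
    best = min((hamming(i, d), d) for i in ids for d in sorted_ids)
    return best[1] if best[0] <= max_err else None
```

The exact-match phase corresponds to the O(4 log_2(|D|)) lookup, while the fallback scan is the O(4|D|) correction stage, returning None when the candidate is farther than ⌊(τ̂ − 1)/2⌋ bits from every dictionary marker.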
Our detection and correction method is a general framework that can be used with any dictionary (including the ARToolKitPlus and ARTag dictionaries). In fact, if our method is employed with the ARTag dictionary of 30 markers, for instance, we could recover from errors of 5 bits, instead of the 2 bits they can recover from.

Corner refinement and pose estimation: Once a marker has been detected, it is possible to estimate its pose with respect to the camera by iteratively minimizing the reprojection error of the corners (using, for instance, the Levenberg–Marquardt algorithm [39,40]). While many approaches have been proposed for corner detection [41–43], we have opted for doing a linear regression of the marker side pixels to calculate their intersections. This approach was also employed in ARTag [11], ARToolKit [10] and ARToolKitPlus [24].

5. Occlusion detection

Detecting a single marker might fail for different reasons, such as poor lighting conditions, fast camera movement, and occlusions. A common approach to improve the robustness of a marker system is the use of marker boards. A marker board is a pattern composed of multiple markers whose corner locations are referred to a common reference system. Boards present two main advantages. First, since there is more than one marker, it is less likely to lose all of them at the same time. Second, the more markers are detected, the more corner points are available for computing the camera pose; thus, the pose obtained is less influenced by noise. Fig. 1(a) shows the robustness of a marker board against partial occlusion.
Based on the marker board idea, a method to overcome the occlusion problem in augmented reality applications (i.e., virtual objects rendered on real objects as shown in Fig. 1(c,d)) is proposed. Our approach consists in defining a color map of the board that is employed to compute an occlusion mask by color segmentation.
Although the proposed method is general enough to work with any combination of colors, we have opted in our tests to replace black and white markers by others with higher chromatic contrast so as to improve color segmentation. In our case, blue and green have been selected. Additionally, we have opted for using only the hue component of the HSV color model, since we have observed that it provides the highest robustness to lighting changes and shadows.
Let us define the color map M as an n_c × m_c grid, where each cell c represents the color distribution of the pixels of a board region. If the board pose is properly estimated, it is possible to compute the homography H_m that maps the board image pixels p into the map space:

p_m = H_m p.

Then, the corresponding cell p_c is obtained by discretizing the result to its nearest value, p_c = [p_m]. Let us denote by I_c the set of image board pixels that map onto cell c.
If the grid size of M is relatively small compared to the size of the board in the images, I_c will contain pixels of the two main board colors. It is assumed then that the distribution of the colors in each cell can be modeled by a mixture of two Gaussians [44], using the Expectation–Maximization algorithm [45] to obtain its parameters. Therefore, the pdf of the color u in a cell c can be approximated by the expression

P(u; c) = ∑_{k=1,2} π_k^c N_k^c(u | μ_k^c, Σ_k^c),   (14)

where N_k^c(u | μ_k^c, Σ_k^c) is the k-th Gaussian distribution and π_k^c is the mixing coefficient, with

∑_{k=1,2} π_k^c = 1.

In an initial step, the map must be created from a view of the board without occlusion. In subsequent frames, color segmentation is done by analyzing whether the probability of a pixel is below a certain threshold θ_c. However, to avoid the hard partitioning imposed by the discretization, the probability of each pixel is computed as the weighted average of the probabilities obtained from the neighbor cells in the map:

P(p) = ( ∑_{c ∈ H(p_c)} w(p_m, c) P(p_u, c) ) / ( ∑_{c ∈ H(p_c)} w(p_m, c) ),   (15)

where p_u is the color of the pixel, H(p_c) ⊂ M are the nearest neighbor cells of p_c, and

w(p_m, c) = (2 − |p_m − c|_1)^2   (16)

is a weighting factor based on the L1-norm between the mapped value p_m and the center of the cell c. The value 2 represents the maximum possible L1 distance between neighbors. As a consequence, the proposed weighting value is very fast to compute and provides good results in practice.
Considering that the dimension of the observed board in the image is much bigger than the number of cells in the color map, neighbor pixels in the image are likely to have similar probabilities. Thus, we can speed up computation by downsampling the image pixels employed for calculating the mask and assigning the same value to their neighbors.
Fig. 6 shows the results of the detection and segmentation obtained by our method, using as input the hue channel and a downsampling factor of 4. As can be seen, the occluding hand is properly detected by color segmentation.
Finally, it must be considered that the lighting conditions might change, thus making it necessary to update the map. This process can be done with each new frame, or less frequently to avoid increasing the computing time excessively. In order to update the color map, the probability distribution of the map cells is recalculated using only the visible pixels of the board. The process only applies to cells with a minimum number of visible pixels γ_c, i.e., only if |I_c| > γ_c.

6. Experiments and results

This section explains the experimentation carried out to test our proposal. First, the processing times required for marker detection and correction are analyzed. Then, the proposed method is compared with state-of-the-art systems in terms of inter-marker distances, number of bit transitions, robustness against noise, and vertex jitter. Finally, an analysis of the occlusion method proposed is made.
As already indicated, this work is available under the BSD license in the ArUco library [13].

6.1. Processing time

Processing time is a crucial feature in many real-time fiducial applications (such as augmented reality). The marker detection process of Section 4 can be divided into two main steps: finding marker candidates and analyzing them to determine if they actually belong to the dictionary.
The detection performance of our method has been tested for a dictionary size of |D| = 24. The processing time for candidate detection, marker identification and error correction was measured for several video sequences. The tests were performed using
Fig. 6. Occlusion mask example. (a) Hue component of Fig. 1(a) with the detected markers. (b) Occlusion mask: white pixels represent visible regions of the board.
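Eqs. (15) and (16) can be illustrated numerically. This is a toy sketch: `cell_prob` is a stand-in for the per-cell Gaussian mixture P(u; c) of Eq. (14), and the neighbor set H(p_c) is assumed to be the four map cells surrounding the mapped position p_m, expressed in cell coordinates.

```python
def weight(pm, c):
    """Eq. (16): w(pm, c) = (2 - |pm - c|_1)^2, in map-cell coordinates."""
    l1 = abs(pm[0] - c[0]) + abs(pm[1] - c[1])
    return (2.0 - l1) ** 2

def pixel_probability(pm, pu, cell_prob):
    """Eq. (15): probability of the pixel color pu as the weighted average
    of the probabilities of the neighbor cells of the mapped position pm."""
    x0, y0 = int(pm[0]), int(pm[1])
    neighbors = [(x0, y0), (x0 + 1, y0), (x0, y0 + 1), (x0 + 1, y0 + 1)]
    num = sum(weight(pm, c) * cell_prob(pu, c) for c in neighbors)
    den = sum(weight(pm, c) for c in neighbors)
    return num / den
```

Since the four neighbors are at most an L1 distance of 2 from p_m, the weight always lies in [0, 4]; a pixel would then be marked as occluding when P(p) falls below the threshold θ_c.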
Table 3
Average processing times for the different steps of our method.
Fig. 11. Image examples from video sequences used to test the proposed fiducial marker system. First row shows cases of correct marker detection. Second row shows cases
where false positives have not been detected.
Nonetheless, as with any color-based method, it is sensitive to lighting conditions, i.e., too bright or too dark regions make it impossible to detect the markers or to obtain a precise occlusion mask. Fig. 16 shows an example of a scene where a lamp has been placed beside the board. It can be seen that there is a bright spot saturating the lower right region of the board, where markers cannot be detected. Additionally, because of the light saturation, the chromatic information in that region (hue channel) is not reliable, thus producing segmentation errors in the board.
7. Conclusions

Conflict of interest statement

None declared.

Acknowledgments
Fig. 15. Examples of users' interaction applying the occlusion mask. Note that hands and other real objects are not occluded by the virtual character and the virtual floor texture. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)
Fig. 16. Example of occlusion mask errors due to light saturation. (a) Original input image. (b) Markers detected. (c) Occlusion mask. (d) Augmented scene.
Appendix A. Supplementary data [19] M. Rohs, B. Gfeller, Using camera-equipped mobile phones for interacting with
real-world objects, in: Advances in Pervasive Computing, 2004, pp. 265–271.
[20] M. Kaltenbrunner, R. Bencina, Reactivision: a computer-vision framework for
Supplementary data associated with this article can be found in table-based tangible interaction, in: Proceedings of the 1st International
the online version at http://dx.doi.org/10.1016/j.patcog.2014.01. Conference on Tangible and Embedded Interaction, TEI 0 07, ACM, New York,
005. NY, USA, 2007, pp. 69–74.
[21] D. Claus, A. Fitzgibbon, Reliable automatic calibration of a marker-based
position tracking system, in: Workshop on the Applications of Computer
Vision, 2005, pp. 300–305.
[22] M. Fiala, Comparing ARTag and ARToolKit plus fiducial marker systems, in:
References IEEE International Workshop on Haptic Audio Visual Environments and Their
Applications, 2005, pp. 147–152.
[23] J. Rekimoto, Matrix: a realtime object identification and registration method
[1] R.T. Azuma, A survey of augmented reality, Presence 6 (1997) 355–385.
for augmented reality, in: Proceedings of the 3rd Asian Pacific Computer and
[2] H. Kato, M. Billinghurst, Marker tracking and HMD calibration for a video-
Human Interaction, Kangawa, Japan, IEEE Computer Society, July 15–17, 1998,
based augmented reality conferencing system, in: International Workshop on
pp. 63–69.
Augmented Reality, 1999, pp. 85–94.
[24] D. Wagner, D. Schmalstieg, ARToolKitPlus for pose tracking on mobile devices,
[3] V. Lepetit, P. Fua, Monocular model-based 3D tracking of rigid objects: a
in: Computer Vision Winter Workshop, 2007, pp. 139–146.
survey, in: Foundations and Trends in Computer Graphics and Vision, 2005,
[25] S. Lin, D. Costello, Error Control Coding: Fundamentals and Applications,
pp. 1–89.
Prentice Hall, New Jersey, 1983.
[4] B. Williams, M. Cummins, J. Neira, P. Newman, I. Reid, J. Tardós, A comparison
[26] D. Flohr, J. Fischer, A lightweight ID-based extension for marker tracking
of loop closing techniques in monocular SLAM, Robotics and Autonomous
systems, in: Short Paper Proceedings of the Eurographics Symposium on
Systems 57 (2009) 1188–1197.
Virtual Environments (EGVE), 2007, pp. 59–64.
[5] W. Daniel, R. Gerhard, M. Alessandro, T. Drummond, S. Dieter, Real-time
[27] X. Zhang, S. Fronz, N. Navab, Visual marker detection and decoding in AR
detection and tracking for augmented reality on mobile phones, IEEE Trans.
Vis. Comput. Graph. 16 (3) (2010) 355–368. systems: a comparative study, in: Proceedings of the 1st International
[6] G. Klein, D. Murray, Parallel tracking and mapping for small AR workspaces, in: Symposium on Mixed and Augmented Reality, ISMAR 0 02, IEEE Computer
Proceedings of the 2007 6th IEEE and ACM International Symposium on Mixed Society, Washington, DC, USA, 2002, pp. 97–106.
and Augmented Reality, ISMAR 0 07, IEEE Computer Society, Washington, DC, [28] W. Friedrich, D. Jahn, L. Schmidt, Arvika—augmented reality for development,
USA, 2007, pp. 1–10. production and service, in: DLR Projektträger des BMBF für Informationstech-
[7] K. Mikolajczyk, C. Schmid, Indexing based on scale invariant interest points, nik (Ed.), International Status Conference—Lead Projects Human–Computer
in: ICCV, 2001, pp. 525–531. Interaction (Saarbrücken 2001), DLR, Berlin, 2001, pp. 79–89.
[8] D.G. Lowe, Object recognition from local scale-invariant features, in: Proceed- [29] S. Zollmann, G. Reitmayr, Dense depth maps from sparse models and image
ings of the International Conference on Computer Vision, ICCV 0 99, vol. 2, IEEE coherence for augmented reality, in: 18th ACM Symposium on Virtual Reality
Computer Society, Washington, DC, USA, 1999, pp. 1150–1157. Software and Technology, 2012, pp. 53–60.
[9] P. Bhattacharya, M. Gavrilova, A survey of landmark recognition using the bag-of- [30] M.O. Berger, Resolving occlusion in augmented reality: a contour based
words framework, in: Intelligent Computer Graphics, Studies in Computational approach without 3D reconstruction, in: Proceedings of CVPR (IEEE Con-
Intelligence, vol. 441, Springer, Berlin, Heidelberg, 2013, pp. 243–263. ference on Computer Vision and Pattern Recognition), Puerto Rico, 1997,
[10] H. Kato, M. Billinghurst, Marker tracking and HMD calibration for a video- pp. 91–96.
based augmented reality conferencing system, in: Proceedings of the 2nd IEEE [31] J. Schmidt, H. Niemann, S. Vogt, Dense disparity maps in real-time with an
and ACM International Workshop on Augmented Reality, IWAR '99, IEEE Computer Society, Washington, DC, USA, 1999, pp. 85–94.
[11] M. Fiala, Designing highly reliable fiducial markers, IEEE Trans. Pattern Anal. Mach. Intell. 32 (7) (2010) 1317–1324.
[12] D. Schmalstieg, A. Fuhrmann, G. Hesina, Z. Szalavári, L.M. Encarnação, M. Gervautz, W. Purgathofer, The Studierstube augmented reality project, Presence: Teleoper. Virtual Environ. 11 (1) (2002) 33–54.
[13] R. Muñoz-Salinas, S. Garrido-Jurado, ArUco Library, 2013, 〈http://sourceforge.net/projects/aruco/〉 [Online; accessed 01-December-2013].
[14] K. Dorfmüller, H. Wirth, Real-time hand and head tracking for virtual environments using infrared beacons, in: Proceedings of CAPTECH'98, Springer-Verlag, Berlin Heidelberg, 1998, pp. 113–127.
[15] M. Ribo, A. Pinz, A.L. Fuhrmann, A new optical tracking system for virtual and augmented reality applications, in: Proceedings of the IEEE Instrumentation and Measurement Technical Conference, 2001, pp. 1932–1936.
[16] V.A. Knyaz, R.V. Sibiryakov, The development of new coded targets for automated point identification and non-contact surface measurements, in: 3D Surface Measurements, International Archives of Photogrammetry and Remote Sensing, vol. XXXII, Part 5, 1998, pp. 80–85.
[17] L. Naimark, E. Foxlin, Circular data matrix fiducial system and robust image processing for a wearable vision-inertial self-tracker, in: Proceedings of the 1st International Symposium on Mixed and Augmented Reality, ISMAR '02, IEEE Computer Society, Washington, DC, USA, 2002, pp. 27–36.
[18] J. Rekimoto, Y. Ayatsuka, CyberCode: designing augmented reality environments with visual tags, in: Proceedings of DARE 2000 on Designing Augmented Reality Environments, DARE '00, ACM, New York, NY, USA, 2000, pp. 1–10.
application to augmented reality, in: Proceedings of the 6th IEEE Workshop on Applications of Computer Vision, WACV '02, IEEE Computer Society, Washington, DC, USA, 2002, pp. 225–230.
[32] A. Fuhrmann, G. Hesina, F. Faure, M. Gervautz, Occlusion in collaborative augmented environments, Technical Report TR-186-2-98-29, Institute of Computer Graphics and Algorithms, Vienna University of Technology, Favoritenstrasse 9-11/186, A-1040 Vienna, Austria, December 1998.
[33] V. Lepetit, M.-O. Berger, A semi-automatic method for resolving occlusion in augmented reality, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2000, pp. 225–230.
[34] R. Radke, Image change detection algorithms: a systematic survey, IEEE Trans. Image Process. 14 (3) (2005) 294–307.
[35] J. Canny, A computational approach to edge detection, IEEE Trans. Pattern Anal. Mach. Intell. 8 (6) (1986) 679–698.
[36] S. Suzuki, K. Abe, Topological structural analysis of digitized binary images by border following, Comput. Vis. Graph. Image Process. 30 (1) (1985) 32–46.
[37] D.H. Douglas, T.K. Peucker, Algorithms for the reduction of the number of points required to represent a digitized line or its caricature, Cartographica: Int. J. Geogr. Inf. Geovis. 10 (2) (1973) 112–122.
[38] N. Otsu, A threshold selection method from gray-level histograms, IEEE Trans. Syst. Man Cybern. 9 (1) (1979) 62–66.
[39] D.W. Marquardt, An algorithm for least-squares estimation of nonlinear parameters, SIAM J. Appl. Math. 11 (2) (1963) 431–441.
[40] R. Hartley, A. Zisserman, Multiple View Geometry in Computer Vision, 2nd edition, Cambridge University Press, New York, NY, USA, 2003.
[41] C. Harris, M. Stephens, A combined corner and edge detector, in: Proceedings of the 4th Alvey Vision Conference, 1988, pp. 147–151.
[42] W. Förstner, E. Gülch, A fast operator for detection and precise location of distinct points, corners and centres of circular features, 1987.
[43] S.M. Smith, J.M. Brady, SUSAN—a new approach to low level image processing, Int. J. Comput. Vis. 23 (1995) 45–78.
[44] A. Sanjeev, R. Kannan, Learning mixtures of arbitrary Gaussians, in: Proceedings of the 33rd Annual ACM Symposium on Theory of Computing, STOC '01, ACM, New York, NY, USA, 2001, pp. 247–257.
[45] A.P. Dempster, N.M. Laird, D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc. Ser. B (Methodological) 39 (1) (1977) 1–38.
[46] D.F. Williamson, R.A. Parker, J.S. Kendrick, The box plot: a simple visual method to interpret data, Ann. Intern. Med. 110 (11) (1989).
[47] D.Q. Huynh, Metrics for 3D rotations: comparison and analysis, J. Math. Imaging Vis. 35 (2) (2009) 155–164.
S. Garrido-Jurado received his B.S. degree in Computer Science from Cordoba University, Spain, in 2011. He obtained his M.S. degree in 2012 and is currently a Ph.D. candidate in the Department of Computing and Numerical Analysis of Cordoba University. His research interests are in the areas of augmented reality, automatic camera pose estimation and 3D reconstruction.
R. Muñoz-Salinas received the Bachelor's degree in Computer Science and, in 2006, the Ph.D. degree from the University of Granada (Spain). Since then, he has been working with the Department of Computing and Numerical Analysis of Cordoba University, where he is currently a lecturer. His research is focused mainly on Computer Vision, Soft Computing techniques applied to Robotics, and Human–Robot Interaction.
F.J. Madrid-Cuevas received the Bachelor's degree in Computer Science from Malaga University (Spain) and the Ph.D. degree from the Polytechnic University of Madrid (Spain), in 1995 and 2003, respectively. Since 1996 he has been working with the Department of Computing and Numerical Analysis of Cordoba University, where he is currently an assistant professor. His research is focused mainly on image segmentation, object recognition and 3D shape-from-X algorithms.
M.J. Marín-Jiménez received his B.Sc. and M.Sc. degrees from the University of Granada, Spain, in 2003, and his Ph.D. degree from the University of Granada in 2010. He has worked, as a visiting student, at the Computer Vision Center of Barcelona (Spain), Vislab-ISR/IST of Lisboa (Portugal) and the Visual Geometry Group of Oxford (UK). Currently, he works as an assistant professor at the University of Cordoba (Spain). His research interests include object detection, human-centric video understanding and machine learning.