Dept. of Mechanical Engineering, Politecnico di Milano, Via La Masa, 1 - 20156 Milan, Italy
(gabriele.guidi, sara.gonizzi, laura.micoli)@polimi.it
Commission V, WG V/1
KEY WORDS: 3D Metrology; Low-cost 3D sensors; Resolution; Systematic error; Random error.
ABSTRACT:
Since the advent of the first Kinect as a motion controller device for the Microsoft XBOX platform (November 2010), several similar active and low-cost range sensing devices have been introduced on the mass market for a variety of purposes, including gesture-based interfaces, 3D multimedia interaction, robot navigation, finger tracking, 3D body scanning for garment design, and proximity sensing for automotive applications. However, given their capability to generate a real-time stream of range images, these devices have also been used in some projects as general-purpose range devices, with performance that might be satisfactory for some applications. This paper describes the working principle of the various devices, analyzing them in terms of systematic and random errors to explore their applicability to standard 3D capturing problems. Five actual devices featuring three different technologies have been tested: i) the Kinect V1 by Microsoft, the Structure Sensor by Occipital, and the Xtion PRO by ASUS, all based on different implementations of the PrimeSense sensor; ii) the F200 by Intel/Creative, implementing the RealSense pattern projection technology; and iii) the Kinect V2 by Microsoft, equipped with the Canesta TOF camera. A critical analysis of the results aims first of all to compare the devices, and secondly to identify the range of applications for which they could actually work as a viable solution.
5. EXPERIMENTAL RESULTS
The five sensors were mounted in turn on a tripod and used to digitize the test object with the optical axis approximately orthogonal to the reference plane; the tripod was moved from 550 mm to 1450 mm in steps of 100 mm, following references drawn on the floor. The orientation was actually tilted slightly away from exactly 90° to minimize the reflection that typically appears in the central part of the scanned surface, visible for example as the round area with no data in the lower central part of figures 3, 4 and 5.
All the software packages used for controlling the different sensors allowed the selection of various nominal operating volumes, ranging from small (600 mm x 600 mm x 600 mm) to room size (5 m x 5 m x 5 m). In our case the intermediate setting (1 m x 1 m x 1 m) was used, but it was not clear how this setting influenced the sensor, given that some 3D information was also collected at distances far larger than 1 m. The minimal distance (550 mm) was chosen by simply testing on-the-fly, during the experiment, the actual minimal distance giving some response. The maximum distance was defined similarly, based on the sensor giving the best results at long range (Kinect2), which under these conditions allowed working easily up to 1450 mm.

Figure 8. Structure Sensor global uncertainty, random error and quadratic fitting on the random error (ρ=0.9992).
All sensors were used to acquire a single frame of the real-time 3D stream provided by the range device in steady conditions. No time averaging was applied.

Figure 7. ASUS Xtion global uncertainty, random error and quadratic fitting on the random error (ρ=0.9989).

Given that the framed area might contain several elements of the room in addition to the reference plane, a manual selection of the points belonging to the same plane was needed before starting all the following processing.
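This plane-extraction step can be sketched in code. Below is a minimal illustration on synthetic data, assuming (as one interpretation of section 4.5, which is not reproduced in this excerpt) that the global uncertainty is estimated from the deviations of the selected points from a least-squares plane; all numeric values are hypothetical.

```python
import numpy as np

def plane_residuals(points):
    """Fit a least-squares plane to an (N, 3) point cloud and
    return the signed point-to-plane distances."""
    centroid = points.mean(axis=0)
    # The plane normal is the right singular vector associated with
    # the smallest singular value of the centered cloud.
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    normal = vt[-1]
    return (points - centroid) @ normal

# Synthetic "scanned plane": a slightly tilted plane plus Gaussian noise
# with a standard deviation of 0.8 mm (hypothetical value).
rng = np.random.default_rng(42)
xy = rng.uniform(-100.0, 100.0, size=(5000, 2))
z = 0.02 * xy[:, 0] + 0.01 * xy[:, 1] + rng.normal(0.0, 0.8, 5000)
cloud = np.column_stack([xy, z])

d = plane_residuals(cloud)
sigma_u = d.std()   # deviation w.r.t. the best-fit plane, in mm
print(f"sigma_u = {sigma_u:.2f} mm")
```

On real data the input would be the manually selected subset of the depth frame, and the residuals d would then be decomposed into systematic and random components as described in section 4.5.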
In Figures 6 to 10 the results related to the global uncertainty and the random error are represented graphically, respectively for the Microsoft Kinect1 (6), the ASUS Xtion (7), the Occipital Structure Sensor (8), the Microsoft Kinect2 (9), and the Creative RealSense F200 (10).
As expected, the global uncertainty σu generally follows a growing trend with distance for each triangulation-based device, as shown by Figures 6, 7, 8 and 10. However, this growth does not seem to follow a predictable behavior, probably due to the poor (or absent) calibration of the camera in charge of collecting the IR image from which the distances are calculated. All the depth images analyzed in this study exhibit a clear radial distortion varying with depth, which could probably be eliminated with a proper calibration and post-processing of the raw 3D data collected by the sensor.
Differently from σu, the random error σr extracted from such data coherently follows the quadratic behavior expected for any triangulation-based device (Blais et al., 1988):

    Δz ≅ (z² / (f b)) Δp

where z is the probing range, f is the camera focal length, b is the camera-projector distance, or baseline, and Δp is the error in the measurement of the position of each dot in the projected pattern, intrinsically affected by the noise of the IR camera.
The validity of this approach seems to be confirmed by the high correlation between the measured values of σr at the different working distances and the corresponding quadratic functions obtained by parabolic fitting, reported in Table 3.

Device               ρ
Microsoft Kinect1    0.9976
ASUS Xtion           0.9989
Structure Sensor     0.9992
Creative F200        0.9986

Table 3. Correlation factor between the values of random error σr at the different distances and the related quadratic error model, for all the triangulation-based devices.

Differently from the triangulation-based devices, the Kinect2 (i.e. the only TOF-based device analyzed in this test) exhibits a nearly constant random error at the different distances, following a slow, roughly linear growth (ρ=0.8127), but always maintaining values lower than 1 mm even at the maximum operating range (see Table 2). However, also for this device, the apparently poor calibration of the optoelectronic device in charge of collecting the IR echoes from the scene tends to produce a global uncertainty much more erratic than the pure random error.

z (mm)   Kinect1   ASUS Xtion   Structure Sensor   Creative F200   Kinect2
550      2.090     1.878        1.332              3.010           3.558
650      2.296     1.853        1.386              3.679           1.521
750      2.819     1.812        1.555              3.963           1.588
850      3.314     1.972        1.669              5.011           2.439
950      3.579     2.231        2.002              5.636           1.598
1050     3.584     2.573        2.398              6.702           2.462
1150     3.965     2.976        2.770              6.944           2.676
1250     4.007     -            -                  -               1.954
1350     3.903     -            -                  -               1.732
1450     -         -            -                  -               2.273

Table 1. Global measurement uncertainty σu (mm) vs. distance for the different sensors used in this test.

6. CONCLUSIONS

The performance of five low-cost 3D sensors conceived for gesture tracking has been analyzed in terms of systematic and random errors, to explore their applicability to standard 3D capturing projects. Five actual devices featuring three different technologies have been tested: i) the Kinect V1 by Microsoft, the Structure Sensor by Occipital, and the Xtion PRO by ASUS, all based on different implementations of the PrimeSense sensor; ii) the F200 by Intel/Creative, implementing the RealSense pattern projection technology; and iii) the Kinect V2 by Microsoft, equipped with the Canesta TOF camera.
The tests analyzed the range from 550 mm to 1450 mm, which seems the most suitable for possible low-cost 3D acquisition with handheld devices; acceptable results were obtained with all the devices only between 550 mm and 1150 mm.
In this range the results exhibit a similar global uncertainty for all the PrimeSense-based devices, ranging from 2 to 3.9 mm for the Kinect1, from 1.9 to 2.9 mm for the Xtion, and from 1.3 to 2.8 mm for the Structure Sensor. Much worse results are produced by the RealSense-based unit, whose global uncertainty ranges from 3 to 6.9 mm at the same operating ranges.
Finally, the Kinect2 unit, if the closest range is excluded, exhibits a measurement uncertainty ranging from 1.4 to 2.7 mm even considering the full operating range covered by the tests, and therefore seems unbeatable above 1 m compared with any of the triangulation-based devices.
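The parabolic fitting behind Table 3 can be reproduced from the data of Table 2. The sketch below does this for the Kinect1 column; since the text specifies only a "parabolic function", a general degree-2 polynomial is assumed here, and the correlation factor is computed between the measured σr values and those predicted by the fitted parabola.

```python
import numpy as np

# Kinect1 random error sigma_r (mm) vs. distance z (mm), from Table 2.
z = np.array([550, 650, 750, 850, 950, 1050, 1150, 1250, 1350], dtype=float)
sigma_r = np.array([0.677, 0.780, 0.844, 0.993, 1.183,
                    1.377, 1.496, 1.681, 1.853])

# Fit a parabola sigma_r ~ a*z^2 + b*z + c, as suggested by the
# triangulation error model dz ≈ (z^2 / (f*b)) * dp.
coeffs = np.polyfit(z, sigma_r, 2)
fitted = np.polyval(coeffs, z)

# Correlation factor between measured and modeled random error.
rho = np.corrcoef(sigma_r, fitted)[0, 1]
print(f"rho = {rho:.4f}")
```

With these data the computed ρ should be comparable to the 0.9976 reported for the Kinect1 in Table 3.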
z (mm)   Kinect1   ASUS Xtion   Structure Sensor   Creative F200   Kinect2
550      0.677     0.695        0.553              0.417           0.793
650      0.780     0.792        0.667              0.520           0.764
750      0.844     0.931        0.784              0.722           0.809
850      0.993     1.054        0.903              0.914           0.818
950      1.183     1.178        1.044              1.244           0.951
1050     1.377     1.393        1.172              1.621           0.848
1150     1.496     1.562        1.265              2.215           0.856
1250     1.681     -            -                  -               0.913
1350     1.853     -            -                  -               0.959
1450     -         -            -                  -               0.922

Table 2. Random error σr (mm) (extracted from the raw data as described in section 4.5) vs. distance for the different sensors used in this test.

These numbers make evident that such devices – as distributed on the market – cannot be seriously considered for 3D digitization projects requiring the precise reproduction of surfaces allowed by costly range devices based on pattern projection or sheets of laser light, where the measurement uncertainty can be reduced below 50 micrometers. However, for a range of 3D modeling problems where high precision is not required, such as the rough digitization of handmade mockups for design purposes, the acquisition of shapes for determining volumes or surfaces independently of the fine details, or the rough digitization of human bodies for the estimation of garment sizes, this kind of low-cost device can be effectively used.
Furthermore, the comparative results obtained suggest the possibility of greatly enhancing the performance of such devices by adding a proper model of the optical device and an associated calibration process, in order to reduce the strong systematic error component that emerged for all the tested devices. This point will be explored in future research.
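The calibration envisaged above would involve, at minimum, estimating an intrinsic model of the IR camera and compensating the radial distortion observed in the depth images. The sketch below illustrates the kind of per-pixel correction involved, using the classic Brown radial model; the intrinsics fc, cc and the coefficients k1, k2 are hypothetical values for illustration, not measured on any of the tested sensors.

```python
import numpy as np

def radial_correct(pts, fc, cc, k1, k2):
    """Apply a Brown-model radial polynomial to (N, 2) pixel coordinates:
    x' = x * (1 + k1*r^2 + k2*r^4) in normalized image coordinates.
    With coefficients estimated for the inverse mapping, this
    compensates radially distorted coordinates."""
    xn = (pts - cc) / fc                     # normalized coordinates
    r2 = np.sum(xn ** 2, axis=1, keepdims=True)
    xc = xn * (1.0 + k1 * r2 + k2 * r2 ** 2)
    return xc * fc + cc                      # back to pixel coordinates

# Hypothetical intrinsics for a 640x480 IR camera (illustration only).
fc = 575.0
cc = np.array([320.0, 240.0])
pts = np.array([[320.0, 240.0],              # principal point: unchanged
                [600.0, 450.0]])             # corner region: moved most
corrected = radial_correct(pts, fc, cc, k1=-0.1, k2=0.01)
print(corrected)
```

In practice the coefficients would come from a calibration procedure (e.g. imaging a known target with the IR camera), and the correction would be applied to the depth map before computing the 3D coordinates.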
ACKNOWLEDGEMENTS

The authors would like to thank Mario Covarrubias and Giandomenico Caruso from the Department of Mechanical Engineering of Politecnico di Milano for having provided the Kinect1 and the Structure sensors; Paolo Belluco from B10NIX S.r.l., Milan, for having provided the Kinect2 sensor; and Matteo Matteucci from DEIB for having provided the ASUS Xtion sensor.
The authors would also like to acknowledge the contribution of Roberto Belloni in collecting some of the data presented in this paper.

REFERENCES

Alnowami, M., Alnwaimi, B., Tahavori, F., Copland, M., Wells, K., 2012. A quantitative assessment of using the Kinect for Xbox360 for respiratory surface motion tracking. Proc. SPIE. doi:10.1117/12.911463

Bamji, C.S., O'Connor, P., Elkhatib, T., Mehta, S., Thompson, B., Prather, L.A., Snow, D., Akkaya, O.C., Daniel, A., Payne, A.D., Perry, T., Fenton, M., Chan, V.H., 2015. A 0.13 um CMOS System-on-Chip for a 512x424 Time-of-Flight Image Sensor with Multi-Frequency Photo-Demodulation up to 130 MHz and 2 GS/s ADC. IEEE J. Solid-State Circuits 50, 303–319. doi:10.1109/JSSC.2014.2364270

Blais, F., Rioux, M., Beraldin, J.A., 1988. Practical Considerations For A Design Of A High Precision 3-D Laser Scanner System. Proc. SPIE 0959, Optomech. Electro-Optical Des. Ind. Syst.

Bolt, R.A., 1980. "Put-that-there": Voice and Gesture at the Graphics Interface. SIGGRAPH Comput. Graph. 14, 262–270. doi:10.1145/965105.807503

DiFilippo, N.M., Jouaneh, M.K., 2015. Characterization of Different Microsoft Kinect Sensor Models. IEEE Sens. J. 15, 4554–4564. doi:10.1109/JSEN.2015.2422611

Fisher, S.S., 1987. Telepresence master glove controller for dexterous robotic end-effectors, in: Casasent, D.P. (Ed.), Proceedings of SPIE - The International Society for Optical Engineering. pp. 396–401. doi:10.1117/12.937753

Gonzalez-Jorge, H., Riveiro, B., Vazquez-Fernandez, E., Martínez-Sánchez, J., Arias, P., 2013. Metrological evaluation of Microsoft Kinect and Asus Xtion sensors. Meas. J. Int. Meas. Confed. 46, 1800–1806. doi:10.1016/j.measurement.2013.01.011

Guidi, G., 2013. Metrological characterization of 3D imaging devices. Proc. SPIE 8791, 87910M. doi:10.1117/12.2021037

Guidi, G., Russo, M., Magrassi, G., Bordegoni, M., 2010. Performance Evaluation of Triangulation Based Range Sensors. Sensors 10, 7192–7215. doi:10.3390/s100807192

Hirakawa, K., Parks, T.W., 2006. Image denoising using total least squares. IEEE Trans. Image Process. 15, 2730–2742. doi:10.1109/TIP.2006.877352

Khoshelham, K., Elberink, S.O., 2012. Accuracy and Resolution of Kinect Depth Data for Indoor Mapping Applications. Sensors 12, 1437–1454. doi:10.3390/s120201437

Lee, J.C., 2008. Hacking the Nintendo Wii Remote. IEEE Pervasive Comput. 7, 39–45. doi:10.1109/MPRV.2008.53

Lightman, K., 2016. Silicon gets sporty. IEEE Spectr. 53, 48–53. doi:10.1109/MSPEC.2016.7420400

Maizels, A., Shpunt, A., Litvak, S., 2010. Enhanced 3D interfacing for remote devices. US20100235786.

Mallick, T., Das, P.P., Majumdar, A.K., 2014. Characterizations of noise in Kinect depth images: A review. IEEE Sens. J. 14, 1731–1740. doi:10.1109/JSEN.2014.2309987

Marks, R., 2011. 3D spatial interaction for entertainment, in: 2011 IEEE Symposium on 3D User Interfaces (3DUI). IEEE. doi:10.1109/3DUI.2011.5759209

Molnár, B., Toth, C.K., Detrekői, Á., 2012. Accuracy Test of Microsoft Kinect for Human Morphologic Measurements. ISPRS - Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. XXXIX-B3, 543–547. doi:10.5194/isprsarchives-XXXIX-B3-543-2012

Newcombe, R.A., Izadi, S., Hilliges, O., Molyneaux, D., Kim, D., Davison, A.J., Kohli, P., Shotton, J., Hodges, S., Fitzgibbon, A., 2011. KinectFusion: Real-Time Dense Surface Mapping and Tracking, in: IEEE ISMAR. IEEE.

Nintendo, 2008. Consolidated Financial Highlights [WWW Document]. URL www.nintendo.co.jp/ir/pdf/2008/080124e.pdf (accessed 4.6.16).

Payne, A., Daniel, A., Mehta, A., Thompson, B., Bamji, C.S., Snow, D., Oshima, H., Prather, L., Fenton, M., Kordus, L., O'Connor, P., McCauley, R., Nayak, S., Acharya, S., Mehta, S., Elkhatib, T., Meyer, T., O'Dwyer, T., Perry, T., Chan, V.H., Wong, V., Mogallapu, V., Qian, W., Xu, Z., 2014. A 512x424 CMOS 3D Time-of-Flight Image Sensor with Multi-Frequency Photo-Demodulation up to 130MHz and 2GS/s ADC, in: Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2014 IEEE International. pp. 134–135. doi:10.1109/ISSCC.2014.6757370

Rico, J., Crossan, A., Brewster, S., 2011. Gesture based interfaces: practical applications of gestures in real world mobile settings, in: England, D. (Ed.), Whole Body Interaction. Springer London, London, pp. 173–186. doi:10.1007/978-0-85729-433-3_14

Sell, J., O'Connor, P., 2014. The Xbox One System on a Chip and Kinect Sensor. IEEE Micro 34, 44–53. doi:10.1109/MM.2014.9

Taubin, G., 1995. A Signal Processing Approach to Fair Surface Design, in: Proceedings of the 22nd Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH '95. ACM, New York, NY, USA, pp. 351–358. doi:10.1145/218380.218473