
University of Technology, Sydney

Faculty of Engineering and Information Technology


HEAD-TO-SHOULDER SIGNATURE FOR PERSON
IDENTIFICATION IN HUMAN-ROBOT INTERACTION
Alexander Joseph Virgona
Student Number: 10173639
Project Number: S10-167
Major: Mechanical and Mechatronic Engineering
Supervisor: Dr Nathan Kirchner
A 12 Credit Point Project submitted in partial fulfilment of the requirements for
the Degree of Bachelor of Engineering
26th August, 2011
Statement of Originality
I declare that I am the sole author of this report, that I have not used fragments of
text from other sources without proper acknowledgment, that theories, results and
designs of others that I have incorporated into my report have been appropriately
referenced and all sources of assistance have been acknowledged.
Abstract
Driven by recent technological advancements and social issues such as ageing populations, there is a growing desire to see robots become ubiquitous in society. For robots to integrate into society they must be able to communicate effectively with humans. A key capability for interaction with humans is to identify individuals in a way that does not disrupt their regular behaviour.
This capstone project discusses the limitations of existing identification techniques and proposes a novel method of exploiting the inter-person variations in the size and shape of people's head, neck and shoulders to achieve robust person recognition in a non-intrusive manner. The method proposed uses a scale and view-angle robust feature vector, the head-to-shoulder signature, to extract descriptive characteristics of the head, neck and shoulder regions of a person. This head-to-shoulder signature is used in combination with support-vector machines to identify humans from observations captured with a 3D depth camera.
The method is shown to successfully discriminate between 9 individuals, observed in a walking motion, with 76.8% accuracy, and between 438 individuals with 62% accuracy. The method was also shown to successfully recognise people carrying objects, whilst seated and while facing away from the robot. The method developed was implemented and tested on the RobotAssist platform and used at the international RoboCup@Home competition in Istanbul 2011, where the RobotAssist team placed 4th out of 19 teams.
Acknowledgements
First and foremost I would like to thank my parents for their unwavering love and support throughout my undergraduate studies. Secondly I would like to thank my supervisor Dr Nathan Kirchner for his patience and guidance throughout this capstone project and Dr Alen Alempijevic for his hard work and leadership on the RobotAssist project. I would like to thank Michael Koob, with whom I worked closely and who was a great help throughout this project. I would also like to thank my colleagues Dan Egan-Wyer, Sonja Caraian and David Richards for their help and support. Finally I wish to thank my friends who have kept me sane and are always the light at the end of the tunnel.
Contents
Statement of Originality
Abstract
Acknowledgements
Contents
List of Figures
Nomenclature
1 The Need for Person Identification in Human-Robot Interaction
1.1 The Demand for Robots in Society
1.2 Human-Robot Interaction and the Need for Socially Acceptable Person Identification
1.3 Existing Approaches to Person Identification
1.4 Identifying People in a Typical Human Environment
2 Exploiting Human Physical Dimensions for Person Recognition
2.1 Using Head, Neck and Shoulder Region for Person Recognition
2.2 Natural Variation in Anthropometric Dimensions
2.3 Concept Evaluation of Head-to-Shoulder Shape for Person Identification
3 A Method of Person Identification Using Head-to-Shoulder Signatures
3.1 Scene Analysis for Person Detection
3.2 Extracting the Head-to-Shoulder Signature
3.3 Classification of HSS for Person Identification
4 Empirical Evaluation
4.1 Robustness to People in Walking Motion
4.1.1 Objective
4.1.2 Experimental Procedure
4.1.3 Results and Discussion
4.2 Scalability of HSS Based Person Identification
4.2.1 Objective
4.2.2 Experimental Procedure
4.2.3 Results and Discussion
4.3 Online performance of HSS Based Person Identification System
4.3.1 Objective
4.3.2 Experimental Procedure
4.3.3 Results and Discussion
4.4 Performance of HSS Based Person Identification at RoboCup@Home 2011
4.4.1 Objective
4.4.2 Follow Me Task
4.4.3 Grand Final
5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
5.2.1 Identifying Unknown People
5.2.2 Benefits of Tracking
5.2.3 Robustness to Changes in Personal Attire
Bibliography
Appendix:
A 3D Sensing Technology
A.1 Time of Flight
A.2 Structured Light
B RobotAssist as a Development Platform
B.1 The RobotAssist Project
B.2 The RobotAssist Platform
B.2.1 Hardware
B.2.2 Software
C Classification Using Support Vector Machines
List of Figures
1.1 A robot designed to help disabled people eat meals. This type of robot could increase the independence of aged or disabled people and provide them with improved quality of life. (AFP Photo/Jiji Press)
1.2 An exemplar scenario in an office environment. The robot is asked by one person to retrieve a drink but upon returning finds two people and must identify who to give the drink to.
1.3 A collection of cropped face images taken from the Yale B database (Lee et al., 2005). This type of cropped and normalised face image is needed for training face recognition systems, limiting the capabilities of such systems to learn online.
1.4 A person leaning in to be heard by the robot. This kind of interaction is awkward and should be avoided in HRI.
1.5 The Smartgate system at Sydney airport processes e-passport holders at the immigration checkpoint. Users are required to stand still in a marked position on the floor and face the sensor for a number of seconds. This type of constrained measurement procedure should be avoided in HRI.
1.6 A typical office environment: The scene is cluttered with furniture and objects but the head-to-shoulder region of both people is clearly visible.
2.1 Data collection setup for proof of concept study: The robot is set up facing a wall and participants stand facing the robot one at a time as 3D pointcloud data is gathered.
2.2 Box plots of head and shoulder spans showing significant separation between individuals relative to the spread of data within each individual. This separation suggests that these features may be descriptive enough to facilitate identification.
3.1 System Schematic for HSS Person Identification showing the three main parts of the system: person detection, feature extraction and person identification.
3.2 Scene analysis process takes a pointcloud of the scene and detects regions of interest used to segment the pointcloud.
3.3 The Head-to-Shoulder Signature is made up of rotationally robust span features taken from a series of lateral slices of the pointcloud.
3.4 The effect of viewing angle on pointcloud measurements: Measuring orthogonally to the sensor frame of reference is only useful when the subject's orientation is known. The span measure shown in (c) provides a view-angle robust measure.
3.5 The Head-to-Shoulder Signature measured from a human is clearly different to that of a chair; this representation can therefore be exploited to disregard false person detections.
3.6 By adjusting the gamma parameter of the SVM the classification region can be made to fit the training data more or less closely. (Images generated using svmtoy (Chang and Lin, 2001))
4.1 The participant is directed to stand on the marked line but is, in fact, measured before reaching this point to ensure natural data of the person walking is captured.
4.2 The graphical user interface shows: the segment of the pointcloud being measured, the shape of the HSS and the classification success, to generate interest in the experiment and encourage repeated tests, while the prompt to confirm or correct the classification result of the system is used to label the recorded data for offline use.
4.3 The confusion matrix shows positive classification results of the system along the diagonal; points off the diagonal are misclassifications.
4.4 In the experimental setup for the scalability test participants walk towards the sensor, stop momentarily 3m away and then continue walking past, capturing a mixture of natural walking, stopping and starting behaviours.
4.5 Confusion matrix showing successful results of the scalability test on the diagonal and misclassifications off the diagonal. The grey point of the image has been shifted to reveal more detail in the lower end of the results.
4.6 Classification results near zero may indicate non-conforming trials.
4.7 Graph showing the detrimental effect of increasing the size of the problem (number of classes) on classification accuracy. Interestingly the graph indicates that this effect will plateau around 60%.
4.8 Experimental setup for online evaluation where people were recognised walking, carrying objects, from behind and seated.
4.9 Images of successful identification results in a range of poses confirming the suitability of the system to the challenges of HRI.
4.10 Stages of the RoboCup@Home 2011 Follow Me task where the RobotAssist team scored full marks.
4.11 RoboCup@Home 2011 Grand Final: Robot recognises its leader from behind.
Nomenclature
Term Definition
HRI Human-robot interaction
CAS Centre for Autonomous Systems
UTS University of Technology, Sydney
SVM Support-vector machine(s)
ROI Region(s) of interest
HSS Head-to-Shoulder signature
TOF Time of flight
CL Coded light
Chapter 1
The Need for Person Identification in Human-Robot Interaction
1.1 The Demand for Robots in Society
In recent years the field of robotics has advanced significantly, fuelled by the increasing affordability and availability of sensing technology, computing power and actuation hardware, coupled with a growing interest in robotics research. Advances in the technology have led to an increased public profile, and now more than ever there is a desire to see robots become commonplace in society (Goodrich and Schultz, 2007). Robots already play a large role in industries such as manufacturing, packaging and automotive, and many of the core capabilities enabling these robots in areas such as manipulation, sensing and cognition could be applied to applications which require some level of human interaction. The development of social robots capable of working with people has the potential to improve quality of life (Goodrich and Schultz, 2007) and address serious social issues such as the ageing of the population causing increased demand for care, and workforce shortages (Woods, 2011). Domestic service robots, for instance, could assist aged or disabled persons in the home, greatly increasing their independence and reducing
the demand for full-time care staff. The robot shown in Fig. 1.1 is an example of a care robot, called My Spoon, designed to assist people with eating meals. The robot is mounted to a desk and requires the use of manual controls; however, further research into the social abilities of robots could enable the development of assistance robots capable of working in shared environments with humans and providing this type of functionality.
Figure 1.1: A robot designed to help disabled people eat meals. This type of robot could increase the independence of aged or disabled people and provide them with improved quality of life. (AFP Photo/Jiji Press)
For robots to work in shared environments with people it is important that they are able to interact in a socially acceptable way resembling human-human interaction (Rani et al., 2006). Before this type of service robot can become a reality there are still many challenges which must be addressed with regard to human-robot interaction (HRI).
1.2 Human-Robot Interaction and the Need for Socially Acceptable Person Identification
One of the challenges in the field of HRI is the robot's ability to perceive humans in its environment. Clearly, before any interaction can take place, robots need to be able to detect and locate humans in their environment. Having detected that there are people in its environment, a robot may then be able to interact with those people. However, in most social scenarios there are likely to be numerous people in the robot's environment, and being able to distinguish individuals from one another greatly increases the potential usefulness of service robots. Effective and secure methods of identification such as fingerprint analysis are used successfully in high security applications where accuracy takes precedence over intrusiveness, but in the domain of HRI such methods of identification would be considered socially unacceptable.
For instance, consider the scenario presented in Fig. 1.2. The robot is attempting to complete a delivery task in an office environment. The robot is required to deliver an object to a specific person. Having fetched the object, the robot returns to the room to find more than one person. The robot could put the responsibility of identification back onto the people in the room (for instance by asking for the person by name), but it would be far less intrusive and more socially acceptable (that is, it would not unnecessarily disrupt the social behaviour of humans) if the robot retained the responsibility of accurate delivery and was able to identify and approach the correct person without delay and without needing to disturb any third parties.
Figure 1.2: An exemplar scenario in an office environment. The robot is asked by one person to retrieve a drink but upon returning finds two people and must identify who to give the drink to.
1.3 Existing Approaches to Person Identification
A number of broad approaches to the person recognition problem exist. Indicative examples are: audio (Sanderson, 2008), visual (Barreto et al., 2004; Wright et al., 2009; Irfanoglu et al., 2004) and behavioural characteristics (Voth, 2003; Wang et al., 2003). However, the variable conditions in home/office environments add to the non-trivial nature of socially acceptable person recognition in ways that are not thoroughly addressed by these methods. These environments tend to vary considerably in visual complexity, lighting conditions, ambient noise levels and physical structure. Furthermore, there are significant social factors which must be considered in the domain of HRI regarding disruption to human behaviour.
Perhaps the most common method for person identification is face recognition using visual features. The technique presented in (Barreto et al., 2004) uses Haar-like features
originally introduced by (Viola and Jones, 2001), and eigenimages (Moghaddam and Pentland, 1997), to perform face recognition in variable lighting conditions, but does not address the issues of partial occlusions and changes in facial expression. In (Wright et al., 2009) the authors attempt to provide treatments for the challenges of partial occlusion and variations in illumination and facial expression through a training data selection process. Whilst the authors demonstrated their method's contribution, this approach requires a prescribed set of training data and relies on the assumption that face-area detection, cropping, and normalisation have been performed a priori (Fig. 1.3), neither of which is reasonable in the aforementioned application space, where an involved training process may become intractable as the population size grows and/or face-area detection fails.
Figure 1.3: A collection of cropped face images taken from the Yale B database (Lee et al., 2005). This type of cropped and normalised face image is needed for training face recognition systems, limiting the capabilities of such systems to learn online.
A considerably different approach to face recognition is exemplified by (Irfanoglu et al., 2004). In this work, the authors exploit inter-person volumetric variations (calculated from a surface reconstructed from a dense 3D pointcloud of the face) to recognise people. The use of 3D features from a pointcloud is shown to improve robustness against the aforementioned issues of variations in illumination. However, this technique again relies on constrained and detailed observations of the face. Aside from the complex challenges of face recognition in both the 2D and 3D domains, the simple disadvantage of these methods is that they require the face to be visible, which cannot be guaranteed in the context of HRI.
An approach to person identification which escapes this reliance on face observations is the use of vocal audio signals. A number of methods for person identification based on speech are presented in (Sanderson, 2008). Due to background noise in situations such as office, home or factory environments, these approaches often require the person to be quite close to the microphone to be effective. Fig. 1.4 shows a person leaning in to be heard by a robot; clearly this type of interaction is not ideal for the purposes of HRI, as the onus here is on the user to be understood rather than on the robot to understand. Despite efforts by researchers to overcome problems associated with background noise, this type of approach is still limited in that it requires the person being identified to be speaking, which clearly is not always the case in social situations.
Figure 1.4: A person leaning in to be heard by the robot. This kind of interaction is awkward and should be avoided in HRI.
Another promising approach to person recognition is through behavioural characteristics such as a person's gait, as examined in (Voth, 2003; Wang et al., 2003). In both (Wang et al., 2003) and (Voth, 2003) the dynamic characteristics of a person's gait are derived from a series of silhouette images of the person walking. In these cases the silhouette images were constructed from a vision sensor. Similar information is potentially available from other sensing modes (3D
pointclouds for instance). Whilst the coarseness of the underlying representation has enabled recognition at greater distances and has been demonstrated to significantly improve robustness to variations in illumination, facial expressions and head-pose, a prohibitive issue remains. The method cannot, by design, recognise people that are standing or sitting.
Figure 1.5: The Smartgate system at Sydney airport processes e-passport holders at the immigration checkpoint. Users are required to stand still in a marked position on the floor and face the sensor for a number of seconds. This type of constrained measurement procedure should be avoided in HRI.
1.4 Identifying People in a Typical Human Environment
Many of the identification methods discussed above perform well under controlled conditions. For instance, visual methods such as face detection are effective provided lighting conditions can be controlled or predicted, making such techniques very suitable for biometric applications such as the Smartgate system shown in Fig. 1.5 and used at Sydney airport to process Australian and New Zealand e-passport holders at the immigration desk. The field of HRI, however, calls for the development of systems capable of functioning in unconstrained social environments.
There are a number of challenges that must be overcome in order for mobile
robots to be capable of recognising people in social environments such as an office or home. Such environments are cluttered with furniture and other objects such as tables, desks, chairs, doors and computers. Not only do these objects clutter the environment, but people are often interacting with such objects as well as with each other. As such, the people in the environment will often appear in contact with, or partially occluded by, the objects and other people around them.
This is considerably less so, however, for one region of the human body: people's head, neck and shoulders. Unlike the waist, legs, or arms, a person's head, neck and shoulders maintain a relatively consistent shape through a broader range of body poses/postures/activities and are often clearly visible even when the person is interacting with items such as chairs, tables and computers. Consider Fig. 1.6. In this photograph, taken in an office environment, both people are somewhat occluded by a partition wall and both are interacting with chairs, desks and computers. Despite this, both persons' head-to-shoulder regions are clearly visible. With the desire to utilise this region clear, the question becomes: is it suitable for discriminating between individuals?
Figure 1.6: A typical office environment: The scene is cluttered with furniture and objects but the head-to-shoulder region of both people is clearly visible
Chapter 2
Exploiting Human Physical Dimensions for Person Recognition
2.1 Using Head, Neck and Shoulder Region for Person Recognition
The visibility of the head-to-shoulder region of the human body in typical HRI situations makes it an attractive candidate for use in socially acceptable person identification. Its visibility allows it to be observed by RGB cameras, laser range finders, 3D depth cameras and any other sensor that relies on line-of-sight to measure. Physical separation from the rest of the body and from furniture and other surroundings means that, given an observation from one of the above sensors, the head-to-shoulder region can be effectively isolated from the rest of the observation to allow close inspection and extraction of some descriptive metrics.
Given the observability of the head-to-shoulder region, the next step is to identify a set of metrics which can be measured from this region that will facilitate identification. The most obvious choice is to use features of the face for identification; after all, this is the most distinctive and widely used key to a person's
identity (V. Bruce, 1986). However, as discussed in Section 1.3, this approach relies on a clear view of the face, which cannot be observed in many situations, for instance from behind the person. Vocal recognition is not dependent on observation of the face, making it a possible candidate; however, it relies on recording the person speaking and suffers from the effects of background noise (Sanderson, 2008). The skin tone of a person is visible in the head-to-shoulder region from all angles, making it a possible candidate, but it would rely on visual information and suffer from the effects of variable lighting conditions (Kakumanu et al., 2007). Hair colour is another available feature but, as with skin tone, would be difficult to measure repeatably in variable light. Citing the lighting problems associated with visual information, the remaining available features lie in geometric information.
With the goal of using geometric information of the head-to-shoulder region to discriminate between individuals, some measurable features must be chosen. In order to be useful for identification, the feature set chosen must vary significantly enough within the population to be descriptive. The measurement and study of such features and their variation is the domain of anthropometry.
2.2 Natural Variation in Anthropometric Dimensions
Anthropometry is the scientific study of the measurements and proportions of the human body (Soanes C., 2005) and has been used: to understand the development of the human species (Relethford, 2009; Darwin, 1902); as an early method for identification of criminals before the development of modern biometrics (Cole, 2002); and in modern design and ergonomics (Pheasant, 1996).
As discussed above, in order to use anthropometric dimensions of the head-to-shoulder region for the purpose of identification, some descriptive features must be measured from this region. To assess the usefulness of the physical features in the head-to-shoulder region, some data describing the variation of the features within the human population is required. Anthropometric surveys provide such data, documenting a set of standard measurements taken between landmarks (Hrdlicka,
1920) on the human body for a large sample of a population.
An anthropometric survey was conducted in 1988 by the US military to facilitate the design of clothing, protective equipment and workspaces that accommodated the full range of body sizes of its personnel (Gordon et al., 1989). This survey documents 180 anthropometric dimensions measured from almost 9000 soldiers. Amongst the set of standard measurements are several which relate directly to the head-to-shoulder region of the body. A statistical summary of the relevant measurements is provided in Table 2.1. This table shows significant variation in several anthropometric dimensions across the soldiers surveyed. Diagrammatic representations of the measurements listed, as well as descriptions of the specific measurements, can be found in (Gordon et al., 1989).

Measurement Name (and No.)        Mean    Std Dev   Min     Max
Biacromial Breadth (10)           397.0   6.9       118.5   164.2
Head Breadth (60)                 59.7    2.1       50.4    68.1
Head Circumference (61)           223.5   6.0       202.4   246.9
Head Length (62)                  77.6    2.8       68.1    86.6
Neck Circumference (80)           149.4   7.7       124.4   185.0
Neck Circumference, Base (81)     160.8   8.1       137.4   198.8
Shoulder Circumference (90)       462.7   23.8      380.3   560.6
Shoulder Length (92)              59.2    4.3       44.9    72.8
Table 2.1: A summary of anthropometric data taken from (Gordon et al., 1989) relevant to the head-to-shoulder region. The data, taken from statistical summaries of male soldiers, reveals significant variation in the head and shoulder measures. (All measurements in millimeters.)

To consider these metrics in the context of a person identification method, the variation across the population for each dimension must be compared to the precision with which it can be measured. The precision of measurement depends heavily upon the sensor used. 3D depth cameras allow measurement of 3D surfaces at reasonably high frame rates (greater than 25 fps), making them
a suitable option for this type of measurement on a robotic platform. Appendix A discusses the types of 3D depth camera technology as well as two of the most popular models of depth sensing camera. Relevant measurement specifications of the two popular depth cameras, the SwissRanger 4000 and PrimeSensor 1.08, are summarised in Table 2.2. Equivalent linear resolution (R) is calculated using Eq. 2.1 and Eq. 2.2, where D is the distance from the sensor, θ is the angular resolution of the sensor and φ is the horizontal field of view of the sensor.

R_center = D tan(θ)   (2.1)

R_edge = D [ tan(φ/2) - tan(φ/2 - θ) ]   (2.2)

SwissRanger 4000-09
  Depth Repeatability (Std Dev):          6mm (centre) - 12mm (edge)
  Equivalent Parallel Resolution at 2m:   8.4mm (centre) - 9.7mm (edge)
PrimeSensor 1.08
  Depth Resolution at 2m:                 10mm
  Parallel Resolution at 2m:              3mm
Table 2.2: Summary of measurement properties of the SwissRanger 4000 and PrimeSensor 1.08 reference design relevant to capturing spatial information from a subject at a distance of 2 metres.
The data presented in Tables 2.1 and 2.2 shows that the measuring capabilities of the available sensors are in the same order of magnitude as the variations in the anthropometric measures, and in some cases are even finer. This comparison suggests that the proposed idea of using these measures for identification warrants further investigation.
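To make this comparison concrete, the short Python sketch below evaluates Eq. 2.1 and Eq. 2.2 at a range of 2 metres. The field-of-view and pixel-count figures used are assumed, indicative values for a SwissRanger-4000-class sensor (176 horizontal pixels over roughly a 43.6 degree field of view), not manufacturer specifications.

    import math

    D = 2.0                    # distance from the sensor (m)
    fov = math.radians(43.6)   # assumed horizontal field of view
    pixels = 176               # assumed horizontal pixel count
    theta = fov / pixels       # per-pixel angular resolution

    r_center = D * math.tan(theta)                                # Eq. 2.1
    r_edge = D * (math.tan(fov / 2) - math.tan(fov / 2 - theta))  # Eq. 2.2

    print("centre: %.1f mm, edge: %.1f mm" % (r_center * 1e3, r_edge * 1e3))

Under these assumptions the sketch prints a centre resolution of about 8.6mm and an edge resolution of about 10mm, the same order as the values listed in Table 2.2.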
2.3 Concept Evaluation of Head-to-Shoulder Shape for Person Identification
Given the capabilities of the available sensors (relative to variations in anthropometric dimensions), a proof of concept study was undertaken in order to further investigate the possibility of person identification using physical dimensions of the human head-to-shoulder region. Firstly, a dataset was gathered using a SwissRanger 4000 attached to the head of the RobotAssist platform. The robot was placed in a fixed position in an empty room with a wall as a backdrop (Fig. 2.1). A total of 25 individuals (the number of students available within CAS-UTS) were separately recorded standing in front of the robot, at a distance of 2 metres, facing towards it.
Figure 2.1: Data collection setup for proof of concept study: The robot is set up facing a wall and participants stand facing the robot one at a time as 3D pointcloud data is gathered.
From the pointcloud data gathered, the physical dimensions of the head and shoulders of each person were measured. An algorithm was developed to extract the lateral span of the head and the shoulders of each person. The box plots in Fig. 2.2 show the results obtained for the head (Fig. 2.2a) and shoulder spans (Fig. 2.2b), where the red line marks the median measure for each individual and the box represents the 25th-75th percentile range. It is clear from the figure that there is a separation between the median result of each person's head span
and even more so in their shoulder span, suggesting that this technique could potentially be used for discriminating between people. The plots also show that the spread of results for each person was narrow in comparison to the separation from other people, in many cases completely separating the middle 50% from other people in the test and in some cases separating the entire range from other people in the test. This comparison suggests that not only was the variation between individuals significant but that it could be measured using the SwissRanger 4000.
(a) Box plot of head spans. (b) Box plot of shoulder spans.
Figure 2.2: Box plots of head and shoulder spans showing significant separation between individuals relative to the spread of data within each individual. This separation suggests that these features may be descriptive enough to facilitate identification.
Chapter 3
A Method of Person Identification Using Head-to-Shoulder Signatures
Based on the encouraging results of preliminary investigations into using head and shoulder shape for person identification, a system was developed to exploit this idea. In order to identify people, the system needed to detect people in a scene, examine them individually, and make some decision as to their identity. The identification system developed in this capstone is divided into three main stages: person detection, feature extraction and person identification, as shown in Fig. 3.1. The input to the system is a 3D pointcloud from any type of 3D depth camera. The pointcloud is passed to the person detection stage, which analyses the scene by segmenting it into smaller pointclouds for each of the regions of interest (ROI) potentially representing people, so they can be examined individually. The feature extraction stage then takes these pointcloud segments and extracts the HSS from each of them to be considered by the person identification stage. Finally, the person identification stage determines the identity of each person by comparing the observed HSS with a stored model of known people. These steps are explained in more detail below.
[Figure 3.1 block diagram: a 3D pointcloud from the 3D depth camera passes through person detection (scene flattening to a density image, blob detection, size checking and pointcloud segmentation into ROI), feature extraction (head-to-shoulder slicing and calculation of rotationally invariant spans to form the HSS), and person identification (verification that detected HSS are human, then either supervised training through communication with users or classification against known people's HSS, producing an identity label for the world model).]
Figure 3.1: System Schematic for HSS Person Identification showing the three main parts of the system: person detection, feature extraction and person identification.
3.1 Scene Analysis for Person Detection
To enable detection of people within a scene, a method of scene analysis based on (Hordern and Kirchner, 2010) was used. First the pointcloud is projected onto a 2D horizontal plane by taking a bivariate histogram in the two horizontal axes. This process results in a density image where the intensity of each pixel represents the concentration of points measured in that area of the scene. This image resembles an aerial view of the scene, where clusters of points typically represent vertical surfaces. An example of this density image is shown in Fig. 3.2a.
This approach to scene analysis exploits the assumption that people appear in a scene as vertical surfaces with a relatively high point-density in the histogram when compared with the many horizontal surfaces (floor, tables, chairs) in a scene. In order to detect these vertical surfaces, a threshold is applied to the density image to remove low density areas such as horizontal surfaces, resulting in a binary image where only significant point clusters are represented, as seen in Fig. 3.2b.
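A minimal sketch of this scene-flattening step is given below, assuming the pointcloud is an N x 3 NumPy array with Y as the vertical axis; the function name, bin size and density threshold are illustrative placeholders, not the values used in the project.

    import numpy as np

    def flatten_scene(points, bin_size=0.05, min_points=50):
        # points: N x 3 array of (X, Y, Z), with Y vertical
        x, z = points[:, 0], points[:, 2]
        x_edges = np.arange(x.min(), x.max() + bin_size, bin_size)
        z_edges = np.arange(z.min(), z.max() + bin_size, bin_size)
        # bivariate histogram over the two horizontal axes -> density image
        density, _, _ = np.histogram2d(x, z, bins=(x_edges, z_edges))
        # threshold away low-density (largely horizontal) structure
        binary = density >= min_points
        return density, binary, x_edges, z_edges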
Blob detection is performed on the binary image to allow each apparent vertical surface to be segmented from the scene. Apart from humans, common features identified via this method of scene analysis include walls, doors, and tall items of furniture. Many of these false positives can be eliminated by setting sensible size constraints based on the expected minimum and maximum size of a person. Some false positives may still pass this stage of the detection process; however, this is acceptable and preferred to the alternative of creating false negatives (rejecting real people), as further discrimination is performed after feature extraction which can generally eliminate remaining false positives. The result of the blob detection and size checking process is shown in Fig. 3.2c, where red blobs are those rejected by the size constraints and orange blobs are the remaining blobs identified as ROI. Fig. 3.2c shows 5 blobs detected by the system. The large red blob in the centre of Fig. 3.2c corresponds with the wall and adjacent couch (seen in Fig. 3.2d) and has been rejected as a possible person because it is too
large. The small red blob on the left of Fig. 3.2c represents the small portion of the wall visible in the top left of Fig. 3.2d and has been rejected because it is too small. Of the three orange blobs, two are correctly identified as people. The orange blob on the far right of Fig. 3.2c corresponds to the white wall on the right of Fig. 3.2d. This segment has been identified as a ROI but will be disregarded in the person identification stage based on its HSS.
(a) Bivariate histogram. (b) Thresholding. (c) Blob detection. (d) Scene from the perspective of the sensor.
Figure 3.2: Scene analysis process takes a pointcloud of the scene and detects
regions of interest used to segment the pointcloud.
After detecting and filtering ROI in the density image, the locations of the pixels included in each ROI are used to recover the points belonging to that ROI from the original pointcloud. These pointcloud segments are then passed on to the feature extraction stage.
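The blob detection, size checking and segment recovery steps can be sketched as follows using SciPy's connected-component labelling; the function names and size limits are hypothetical, and the x_edges/z_edges arguments are the histogram bin edges produced by the flattening sketch above.

    import numpy as np
    from scipy import ndimage

    def detect_roi(binary, min_bins=4, max_bins=400):
        # Label connected blobs in the binary density image and keep those
        # whose footprint falls within plausible person-size limits
        labels, n = ndimage.label(binary)
        keep = [b for b in range(1, n + 1)
                if min_bins <= np.count_nonzero(labels == b) <= max_bins]
        return labels, keep

    def segment_pointcloud(points, labels, blob_id, x_edges, z_edges):
        # Recover the 3D points whose horizontal histogram bin lies in the blob
        xi = np.clip(np.digitize(points[:, 0], x_edges) - 1, 0, labels.shape[0] - 1)
        zi = np.clip(np.digitize(points[:, 2], z_edges) - 1, 0, labels.shape[1] - 1)
        return points[labels[xi, zi] == blob_id]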
3.2 Extracting the Head-to-Shoulder Signature
The feature extraction stage of the identification system is the major innovation developed in this capstone project. The objective of this stage was to capture descriptive characteristics of the individual from a pointcloud representation of their head, neck and shoulders. Based on the outcome of the concept evaluation (in Section 2.3), where lateral measurements of the head and shoulders were identified as potentially descriptive quantities, a method was devised extending this idea to involve a series of lateral size measurements taken at regular intervals over the region of the head and shoulders. In this way lateral sizes over the region were explicitly measured and vertical characteristics, such as the distance from the top of the head to the shoulders, were captured implicitly by changes in size over the range of intervals.
In order to obtain these lateral size measurements, the head-to-shoulder region of the pointcloud is divided into a number of lateral slices of equal thickness h covering the region from the top of the head to the top of the shoulders, H (Fig. 3.3). The distance H is chosen to include the shoulders of most adults while minimising coverage of the arms. H in the experiments presented in this report was chosen to be 40cm, based on a manual survey of the pointcloud data collected in the concept evaluation study. The number and thickness of slices was chosen to maximise the resolution of the feature vector obtained while ensuring that each slice would contain sufficient data points to extract the required information. Selection of the slice thickness depends upon the sensor used and the range of distances over which people are expected to be measured (D_max). Eq. 3.1 was used to determine the minimum slice thickness (h_min) given the vertical angular resolution of the sensor used (θ) and the maximum distance of people from the sensor (D_max). The formula ensures at least one full row of depth pixels is captured within each slice.

h_min = 2 D_max tan(θ)   (3.1)
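As a worked example of Eq. 3.1, assuming a Kinect-class sensor with 480 rows over roughly a 43 degree vertical field of view and people measured at up to 4 metres (both figures are assumptions for illustration, not specifications), the minimum slice thickness comes out at roughly 12.5mm:

    import math

    theta = math.radians(43.0) / 480   # assumed vertical angular resolution
    d_max = 4.0                        # assumed maximum subject distance (m)
    h_min = 2 * d_max * math.tan(theta)
    print("h_min = %.1f mm" % (h_min * 1e3))   # roughly 12.5 mm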
Figure 3.3: The Head-to-Shoulder Signature is made up of rotationally robust
span features taken from a series of lateral slices of the pointcloud.
In the case of the concept evaluation study, the data gathered was of people standing orthogonally to the sensor, as shown in Fig. 3.4a. In this case lateral measurements representing the head length and breadth and shoulder breadth were easily obtained from the maximum and minimum x and z values of the relevant segments. However, when persons are not oriented orthogonally to the sensor (Fig. 3.4b), the same maximum and minimum values do not give a meaningful result. What is needed is a consistent lateral measure that is robust to changes in viewing angle. This has been achieved in this method by means of a search through all pairs of points in the pointcloud segment to find the greatest distance between any two points. This measurement is illustrated on real pointcloud data in Fig. 3.3 and diagrammatically in Fig. 3.4c. Due to the convex nature of the human form (in the horizontal plane), data captured from almost any single perspective (with the exception of those close to parallel with the major axis) can be used in this way to measure a relatively (relative to variation between individuals) consistent result relating to the length of the object along the major axis. This dimension is referred to in this report as the span of the pointcloud. Rather than accurately measuring prescribed dimensions such as head length or shoulder breadth, the aim of the span feature is to give a repeatable measurement of the lateral length of a convex 3D body. This span feature is extracted from each slice of the pointcloud, and the resulting vector of numbers is what we refer to in this report as the Head-to-Shoulder Signature.
(a) Measuring specific human dimensions (head length, head breadth, shoulder breadth) is simplified when the person is orthogonal to the sensor. (b) Orthogonal (to sensor) measurements are less useful when the person is not orthogonal. (c) The span measure (head span, shoulder span) is designed to capture a repeatable measure of lateral size for a convex 3D body.
Figure 3.4: The effect of viewing angle on pointcloud measurements: Measuring orthogonally to the sensor frame of reference is only useful when the subject's orientation is known. The span measure shown in (c) provides a view-angle robust measure.
The procedure presented in Alg. 1 describes the extraction of the HSS, where the inputs to the algorithm are as follows:
P - three-dimensional pointcloud of a person, a collection of (X, Y, Z) points
h - slice height, selected according to Eq. 3.1
H - segment height, 40cm in all our experiments

Algorithm 1 Feature Extraction
1: Input: P, h, H
2: Output: Feature Vector F
3: N = ceil(H / h)
4: Y_max = maxY(P)
5: for i = 1 to N - 1 do
6:   Y_upper = Y_max - ((i + 1) h)
7:   Y_lower = Y_upper - h
8:   S = getSlicePointcloud(Y_upper, Y_lower)
9:   if size(S) >= 2 then
10:    d_max = 0
11:    for j = 1 to size(S) do
12:      for k = 1 to size(S) - j do
13:        d = eucDistanceXZ(S_j, S_k)
14:        if d > d_max then
15:          d_max = d
16:        end if
17:      end for
18:    end for
19:  else
20:    return null
21:  end if
22:  F_i = d_max
23: end for
24: return F

For each of the horizontal slices identified in line 8, the maximum distance between any two points in the slice was calculated in line 13. The span was
calculated using a triangular search of every combination of two points in the slice (lines 11-18).
By searching the slice pointcloud to find the maximum span in any direction, the measurement is made robust to changes in viewing angle.
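A compact NumPy rendering of Alg. 1 is sketched below. It follows the same slicing and maximum-pairwise-distance logic, but replaces the explicit triangular search with scipy.spatial.distance.pdist over the horizontal (X, Z) coordinates of each slice; the function name and default parameter values are illustrative only.

    import numpy as np
    from scipy.spatial.distance import pdist

    def extract_hss(points, h=0.04, H=0.40):
        # points: N x 3 array of (X, Y, Z) with Y vertical
        # h: slice thickness (m), chosen per Eq. 3.1 (0.04 is illustrative)
        # H: head-to-shoulder segment height (m), 40cm as in the experiments
        y_max = points[:, 1].max()
        n_slices = int(np.ceil(H / h))
        hss = []
        for i in range(n_slices):
            y_upper = y_max - i * h
            in_slice = (points[:, 1] <= y_upper) & (points[:, 1] > y_upper - h)
            s = points[in_slice][:, [0, 2]]   # horizontal (X, Z) coordinates
            if len(s) < 2:
                return None                   # slice too sparse to measure a span
            hss.append(pdist(s).max())        # span: greatest pairwise distance
        return np.array(hss)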
3.3 Classification of HSS for Person Identification
The person identification stage of the system uses the HSS extracted in the feature extraction stage to identify the person. Before the person is identified, the HSS is used to verify that the ROI detected by the person detection stage is, in fact, a person. To achieve this, a single-class support-vector machine (SVM) is used to determine if the measured HSS fits into a pre-trained model containing HSS from a group of people of diverse sizes and shapes. The gamma parameter of the SVM
can be tuned, via a process of systematic iteration, to fit the training data such that a broad range of shapes and sizes are allowed to pass while still rejecting HSS recorded from non-human targets such as the chair shown in Fig. 3.5b. This single-class SVM acted to filter out false positives (non-human detections) reported by the scene segmentation stage. Fig. 3.5 shows the HSS from a person and a chair respectively, demonstrating the clear distinction between the two and hence the relative ease of detecting such false detections.
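A sketch of this verification step is shown below using the scikit-learn one-class SVM rather than the LIBSVM interface used in the project; the training file name and the gamma and nu values are placeholders to be tuned by the systematic iteration described above.

    import numpy as np
    from sklearn.svm import OneClassSVM

    # Placeholder training data: HSS vectors recorded from people of diverse
    # sizes and shapes (file name is hypothetical)
    human_hss = np.load("human_hss_training.npy")

    verifier = OneClassSVM(kernel="rbf", gamma=10.0, nu=0.05)
    verifier.fit(human_hss)

    def is_human(hss):
        # predict() returns +1 for points inside the learned region, -1 outside
        return verifier.predict(hss.reshape(1, -1))[0] == 1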
Once the detected person has been verified, it must be determined whether they are known or unknown to the system. This was, again, achieved using a single-class SVM to decide whether the detected person fits into a model containing HSS of people known to the system. The gamma parameter was systematically selected to positively identify persons who had been learned previously by the system while rejecting people who were not known to the system. This was not as effective as in the previous verification process. While there was a significant difference between the HSS of human and non-human detections, the differences between a known person's HSS and an unknown person's HSS were not so distinct. The effect of changing gamma in the SVM is shown in a 2D example (Fig. 3.6). Selecting the gamma parameter in this case was a trade off between:
- under-fitting the training data, achieving greater numbers of true positives (accepting known people) but also increasing the number of false positives (accepting unknown people) (Fig. 3.6a); or
- over-fitting the training data, achieving greater numbers of true negatives (rejecting unknown people) but also increasing the number of false negatives (rejecting known people) (Fig. 3.6c).
In Fig. 3.6 the light-aqua dots represent training data and the dark-aqua region represents the positive classification zone. The examples shown are of a 2D feature space for the purpose of visualisation. Each HSS observation would actually have N dimensions, where N is the number of slices used in the extraction process, but this is hard to visualise.
(a) Head-to-Shoulder Signature measured from a person. (b) Head-to-Shoulder Signature measured from a chair.
Figure 3.5: The Head-to-Shoulder Signature measured from a human is clearly different to that of a chair; this representation can therefore be exploited to disregard false person detections.
After determining whether each person is known to the system, the HSS from
the detected person is used either to train the model of known identities or to identify the person. If the person is flagged as unknown, the system can add their HSS to the model of known people. If the person is flagged as known, the system can classify the HSS as belonging to one of the people in its model and return the identity of the person.
(a) Under-fitting: gamma = 1. (b) Good fit: gamma = 10. (c) Over-fitting: gamma = 100.
Figure 3.6: By adjusting the gamma parameter of the SVM the classification region can be made to fit the training data more or less closely. (Images generated using svmtoy (Chang and Lin, 2001))
The classification of the HSS is performed using a multi-class SVM with a radial basis function kernel. The implementation of the SVM for classification, although crucial to the functionality of this system, is not the focus of this capstone, but for the sake of completeness the important details of the implementation are included in Appendix C.
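For completeness, a minimal scikit-learn equivalent of this multi-class RBF-kernel classification stage is sketched below; in the actual system this role was filled by LIBSVM (Chang and Lin, 2001), and the file names and parameter values here are illustrative only.

    import numpy as np
    from sklearn.svm import SVC

    # Placeholder training data: HSS vectors of known people and identity labels
    X_train = np.load("known_hss.npy")
    y_train = np.load("known_labels.npy")

    classifier = SVC(kernel="rbf", C=1.0, gamma=10.0)   # RBF-kernel multi-class SVM
    classifier.fit(X_train, y_train)

    def identify(hss):
        # Return the identity label of the closest-matching known person
        return classifier.predict(hss.reshape(1, -1))[0]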
Matlab and C++ implementations of the method described above were developed to facilitate evaluation of the method. Matlab was particularly useful for fast development and was used extensively in the early development phase of this project, due to its script-based programming environment and advanced data visualisation tools. The C++ implementation involved integrating the functional components of the method into the component-based software architecture already in use on the RobotAssist platform. Within this framework each stage of the method described here was developed as an individual component with inputs and outputs, allowing parts of the system to be tested in isolation or swapped for another component. This combination of component-based software and offline data analysis and visualisation allowed thorough empirical evaluation of all stages of the identification method as well as the system as a whole.
Chapter 4
Empirical Evaluation
The concept evaluation (outlined in Chapter 2) suggested that the head-to-shoulder region could be used to distinguish between individuals. Based on this, the person identification system (described in Chapter 3) was developed to detect, examine and identify people. A number of experiments were conducted with this system to evaluate its performance. Experiments were conducted using the RobotAssist robot platform (Kirchner et al., 2010). Details of the RobotAssist project not core to this capstone are attached in Appendix B.
4.1 Robustness to People in Walking Motion
4.1.1 Objective
An experiment was devised to evaluate the performance of the person identification system in identifying people walking towards the robot, since this is likely behaviour to expect from a potential user of the robot. Specifically, this experiment would test the system's robustness to changes in body pose and moving human targets.
4.1.2 Experimental Procedure
The experiment used the Microsoft Kinect sensor attached to the head of the RobotAssist platform (at a height of 1.5m). The robot was placed facing parallel to a wall while a couch was placed along the opposite side of the robot to create a corridor of open space 2m wide in front of the robot. This furniture arrangement ensured that participants would walk through the feature extraction zone (Fig. 4.1) on their approach to the robot without the need for specific instruction on where to walk, which could have potentially affected their walking style. The entrance to the corridor was 5m from the robot, and the exit was 1m from the robot on the user's left.
Participants were directed, via a number of signs mounted around the experiment, to stand at a marked position 1.5m in front of the robot so they could be either classified or added to the HSS model. However, participants were recorded from a distance of 4 metres to 2 metres from the robot, as shown in Fig. 4.1. The marked line was set up as a distractor to convince the participants that they were being recorded at the marked position when in fact they were being recorded as they approached it. The experiment was constructed in this way in order to incite natural behaviour from participants (Kirchner et al., 2011).
Upon arriving at the designated position, users were identified by the system and prompted to interact with the robot via a graphical user interface (GUI) in order to correctly label the ground truth for each data set. The GUI (Fig. 4.2) presented participants with the result of the person identification process and asked them to either confirm the true positive ID, correct the false positive ID, or add themselves to the training model if they had not already.
This input was used online to add additional classes to the online model and validate the classification accuracy, as well as to label the recorded data set for offline use. The performance of the online identification system was displayed in a confusion matrix on screen which was updated after each test. The purpose of this feedback was to encourage participants to revisit the test. At the end of each test the user was asked to leave via the exit, so as not to walk back through
the measurement area, triggering another test. The experiment ran for one day in the offices of CAS, UTS, and was set up near the door of the facility to encourage staff to participate as they entered and left the office. At the completion of the experiment, classification results for the entire day were recalculated using the first encounter with each of the participants as training data, to simulate an online learning stage, and subsequent encounters were used as testing data.
Figure 4.1: The participant is directed to stand on the marked line but is, in fact, measured before reaching this point to ensure natural data of the person walking is captured.
4.1.3 Results and Discussion
The classification results obtained are shown in the confusion matrix in Figure 4.3. Each row of the confusion matrix corresponds with the ground truth as recorded by the GUI and each column corresponds with the classification result obtained by the person identification system. Darker shading indicates a higher frequency for that truth-result pair, where the ideal result is a solid diagonal.
The mean classification success across all ground truths was 76.8%, the minimum 41.2% and the maximum 95.45%. Even the most confused class (row 7 in Fig. 4.3) recorded a classification success 3.7 times greater than random selection.
Figure 4.2: The graphical user interface shows: the segment of the pointcloud being measured, the shape of the HSS and the classification success, to generate interest in the experiment and encourage repeated tests, while the prompt to confirm or correct the classification result of the system is used to label the recorded data for offline use.
Figure 4.3: The confusion matrix shows positive classification results of the system along the diagonal; points off the diagonal are misclassifications.
The higher frequencies recorded in column 2 represent the system's tendency to classify observations as belonging to that class. A possible explanation for this result is high variation in the training data for this class, causing it to occupy a large region in the feature space and encompass observations that do not fall obviously into another class. This high variation in training data was most probably due to the way the training data was captured, without specific instructions given to the participant on how to stand. This was done to simulate an online training phase in a HRI context where the robot meets someone unknown and needs to learn them. In hindsight, more instruction on how to stand, even if delivered by the robot, would lead to a more consistent training model and improve classification results. Dark shading in an individual grid location, such as row 7, column 2, indicates a frequent confusion. In this case person 7 was frequently identified as person 2. This may indicate that these two people have a similar HSS. This, coupled with the high variation in training data for class 2, may explain this confusion.
This experiment showed positive results on a group of 9 people walking naturally,
demonstrating the usefulness of the system in a home setting where the number
of users would rarely exceed 9. However, to test the capability of the system to
operate in environments such as a large office, factory or hotel, where the number
of users would exceed 9, the next logical step was to test the scalability of the
system.
4.2 Scalability of HSS Based Person
Identification
4.2.1 Objective
In order to evaluate the applicability of this system to scenarios involving large
numbers of people, for instance in a hotel, an experiment was designed to test
the performance of the system on a significantly larger (than 9-person) problem, as
well as the direct effect of varying the problem size.
4.2.2 Experimental Procedure
This experiment was run with the help of Willow Garage (WG), a robotics research
lab in California. Researchers at WG were interested in the capabilities
of the person identification system for a joint venture with a large hotel chain
and offered to run experiments to assist in the evaluation of the system. After being
briefed on the requirements of the experimental setup, researchers from WG
gathered a large data set over a number of days at IIT Bombay and IIT Kanpur
in India. Participants were asked to walk towards the sensor, stop momentarily
at 3m, and then continue walking past the sensor as shown in Fig. 4.4. Data was
collected from 730 participants; however, due to limitations of the classification
implementation used (Chang and Lin, 2001), a subset of 438 recordings was used
for this evaluation.
Following data collection the depth image recordings were converted into 3D
pointclouds, each labelled according to the recording from which it was taken.
To simplify the scene segmentation process the pointclouds were cropped
Figure 4.4: In the experimental setup for the scalability test participants walk
towards the sensor, stop momentarily 3m away and then continue walking past,
capturing a mixture of natural walking, stopping and starting behaviours.
to contain only the area through which the participants walked. Feature extraction
was performed on the data set using a 20-layer HSS. From each recording half
the data was used for training and the other half for testing. A moving average
was applied to the training and testing data sets as a simple approach to dealing
with measurement noise (a sketch of this smoothing step is given below). The
data set was processed by the identification system, first as one large problem with
438 classes, and then over 9 more iterations with incrementally smaller problem sizes.
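The exact window length and implementation of this moving average are not specified in the text; the following is a minimal sketch of one plausible form, assuming the per-frame HSS vectors are stacked as rows of a NumPy array and each HSS layer is smoothed independently over time. The window length here is illustrative only.

```python
import numpy as np

def smooth_hss(hss_frames: np.ndarray, window: int = 5) -> np.ndarray:
    """Moving average over a sequence of HSS feature vectors.

    hss_frames: (n_frames, n_layers) array, one HSS vector per frame
    (n_layers = 20 for the 20-layer HSS used in this experiment).
    window: illustrative window length; not specified in the text.
    """
    kernel = np.ones(window) / window
    # Smooth each HSS layer independently along the time axis.
    return np.column_stack([
        np.convolve(hss_frames[:, i], kernel, mode="valid")
        for i in range(hss_frames.shape[1])
    ])
```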
4.2.3 Results and Discussion
The confusion matrix recorded from the initial test of 438 classes is shown in Fig.
4.5. The mean classification success across all classes in this test was 62%. This
result can be seen in the visible diagonal of the confusion matrix. There was a
wide range in classification success across the classes, with some greater than 90%
and others approaching 0%. Fig. 4.6 shows a magnified section of the confusion
matrix highlighting some near-0% results. A possible explanation for such low
results is non-conforming tests, for instance where participants were using mobile
phones during the test, adversely affecting the usefulness of the HSS recorded.
Closer inspection of the confusion matrix also shows a number of square
Figure 4.5: Confusion matrix showing successful results of the scalability test on
the diagonal and misclassifications off the diagonal. The grey point of the image has
been shifted to reveal more detail in the lower end of the results.
Figure 4.6: Classification results near zero may indicate non-conforming trials.
patterns along the diagonal. These represent the tendency of the system to confuse
observations within a group. These groups most likely correspond with each of
the recording sessions from the data collection process. This may indicate
inconsistency between the recording sessions in terms of sensor placement, which
could also have affected the results.
Fig. 4.7 shows the effect of varying the problem size. Not surprisingly, the
model size directly and adversely affects the performance of the system. What is of
greater interest is the trend in the data: the shape of the curve indicates that the
adverse effect of model size may plateau higher than expected, at approximately
60% classification accuracy. The classification results presented here are based on
single observations; by accumulating classification results over multiple
observations the performance of the system could potentially be increased.
[Figure 4.7 plot: Effect of Number of Classes on Classification Accuracy; x-axis: Number of Classes (0-450), y-axis: Classification Accuracy % (60-90)]
Figure 4.7: Graph showing the detrimental effect of increasing the size of the
problem (number of classes) on classification accuracy. Interestingly, the graph
indicates that this effect will plateau around 60%.
4.3 Online Performance of HSS Based Person
Identification System
4.3.1 Objective
For the person identification system to be useful in the context of HRI it would
need to be able to work in real time to identify people in an unconstrained
environment presenting a variety of different behaviours. The aim of this experiment
was to test the ability of the system to perform under such circumstances,
recognising a greater range of human behaviours than simply standing or walking
towards the robot. This test also served to evaluate the usefulness of associating
multiple classification results to improve the identification accuracy of the system,
as this was identified as a potentially useful technique in the previous test.
4.3.2 Experimental Procedure
In this experiment the RobotAssist platform was set up (as per Fig. 4.8) in a room
facing the doorway. The Microsoft Kinect sensor was attached to the head of the
robot, streaming depth images to the robot at 30 frames per second for online
processing. Participants were instructed to enter and exit the room presenting a
variety of poses to the robot including:
- normal walking,
- carrying objects,
- facing away from the robot, and
- sitting on a rolling chair.
Confirmation of identification was verbally communicated by the robot. In
order to improve performance in an unstructured environment, the online implementation
of the algorithm used temporal association of classification data with
individuals. A particle filter (Kirchner et al., 2010) was used to continuously track
people, enabling temporal association of data with an individual. The model of
known persons' HSS was trained prior to the experiment using the GUI developed
for the previous experiment. The trained HSS model was used with the SVM to
classify each observation, and upon classification a vector of probabilities associated
with each class was returned. These class probabilities were accumulated
over a number of frames and the average of the accumulated probabilities was used to
determine the outcome of classification. Applying an empirically derived threshold
(0.8 in our experiment) to the accumulated probabilities enabled rejection of
low-confidence classification results, allowing the robot to withhold identification
until a high-confidence decision could be made. This approach is preferable in the
context of HRI to announcing numerous results per track. A minimal sketch of this
accumulate-and-threshold scheme is given below.
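The class and method names in the following sketch are hypothetical; only the averaging of per-class SVM probabilities and the empirically derived 0.8 confidence threshold come from the experiment described above.

```python
import numpy as np

class TrackIdentity:
    """Accumulates per-class SVM probabilities for one tracked person and
    announces an identity only once mean confidence clears the threshold."""

    def __init__(self, n_classes: int, threshold: float = 0.8):
        self.threshold = threshold          # empirically derived (0.8 here)
        self.prob_sum = np.zeros(n_classes)
        self.n_obs = 0

    def update(self, class_probs: np.ndarray) -> None:
        """Add one frame's SVM probability vector for this track."""
        self.prob_sum += class_probs
        self.n_obs += 1

    def decide(self):
        """Return a class label if confident, else None (withhold result)."""
        if self.n_obs == 0:
            return None
        mean_probs = self.prob_sum / self.n_obs
        best = int(np.argmax(mean_probs))
        return best if mean_probs[best] >= self.threshold else None
```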
Figure 4.8: Experimental setup for online evaluation where people were recognised
walking, carrying objects, from behind and seated.
4.3.3 Results and Discussion
The experiment resulted in successful identification of all four participants over all
7 encounters with the robot. The test was video recorded and the video was used
in the team description paper (Alempijevic, 2011) for RobotAssist's successful
qualification to compete in RoboCup@Home 2011. A number of images from
the video are shown in Fig. 4.9. Specifically, Fig. 4.9a shows a person who was
identified while facing away from the robot and holding a large painting. Fig.
4.9b shows a person being identified while facing the robot. Fig. 4.9c shows a
person identified as they leave the room holding a stack of books. Finally, Fig.
4.9d shows a person identified while sitting in a rolling chair.
Although only a small-scale test, these results show the applicability of this
system to a real HRI situation. The test also demonstrates the value of temporal
(a) From behind carrying a painting (b) Front facing
(c) From behind carrying books (d) Sitting in a rolling chair
Figure 4.9: Images of successful identification results in a range of poses, confirming
the suitability of the system to the challenges of HRI.
association of classification results facilitated by the use of a tracking system.
4.4 Performance of HSS Based Person
Identification at RoboCup@Home 2011
4.4.1 Objective
The purpose of competing in the RoboCup@Home 2011 competition was to test
the robustness of the complete robotic system in a number of simulated HRI
scenarios. With regard to the person identification system, it served to evaluate
the system's ability to distinguish between known (trained online) and
unknown persons.
4.4.2 Follow Me Task
The Follow Me task at RoboCup@Home 2011 required the robot to meet and
learn a randomly allocated person as its leader (Fig. 4.10a), and then follow
this person through a series of checkpoints designed to test the capabilities of
the robot. This task served as a suitable evaluation of the person identification
system's performance in a realistic HRI scenario. The stages of the task relevant
to the evaluation of the person identification system are shown below in Fig. 4.10.
After learning its leader the robot must be able to follow them around the
arena and maintain tracking as another person passes in between the robot and
leader, as illustrated in Fig. 4.10b. The RobotAssist platform uses a laser scanner and
particle filter (Kirchner et al., 2010) to track the leader because this is a fast and
robust way to track a moving target. In the event that the target is lost the
robot uses the person identification system to identify the leader. As described
in the method, the HSS of all visible targets is compared with the trained model
of the leader to determine which is the better match (a sketch of this re-identification
step is given below). At the end of the test the
robot is instructed by the leader (via a hand signal) to stop and wait for them to
return. The leader leaves the field of view of the robot and returns with a second
unknown person. Upon their return the two people face the robot and wait. The
robot must identify the correct person as its leader and continue to follow them
(Fig. 4.10c). This is again a test of the system's ability to distinguish between a
known person, the leader, and an unknown person.
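A minimal sketch of this leader re-identification step follows. Here `classify` is a hypothetical stand-in for the trained SVM returning per-class probabilities; the function simply selects the visible person whose HSS best matches the leader's class.

```python
import numpy as np

def find_leader(visible_hss, leader_class, classify):
    """Pick which visible person best matches the trained leader model.

    visible_hss  : list of HSS feature vectors, one per visible person
    leader_class : label assigned to the leader during the learning phase
    classify     : callable mapping an HSS vector to a per-class
                   probability vector (here, the trained SVM)
    Returns the index of the visible person most likely to be the leader.
    """
    leader_probs = [classify(hss)[leader_class] for hss in visible_hss]
    return int(np.argmax(leader_probs))
```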
The system was successful and scored 1000 out of a possible 1100 points.
The remaining 100 points were bonus points for outstanding performance which
needed to be negotiated prior to the test. Nonetheless this was the highest score
in the competition for this task, tied with 2 of the other teams; the average
score across all 19 teams was 242. The results from this test confirmed the ability
of the system to successfully distinguish between two individuals based on their HSS.
(a) The robot learns the HSS of its leader
(b) The robot must follow its leader as another person passes in front
(c) After losing its target the robot must identify which person is its leader
Figure 4.10: Stages of the RoboCup@Home 2011 Follow Me task where the
RobotAssist team scored full marks.
4.4.3 Grand Final
The RoboCup@Home 2011 Grand Final task allowed teams to exhibit any functionality
they wished, and the strength of the performance was used to decide two thirds
of the team's final competition score. The 2011 RobotAssist team used this opportunity
to demonstrate the performance of the HSS based person identification
system and at the same time evaluate the robustness of the system to repeated
identification tasks in difficult scenarios. In particular, the system's ability to
recognise seated individuals from behind was tested. Not only is this one of the
more challenging identification scenarios tested, but it represents the realistic
situation of a robot approaching a table of seated people in a home environment.
In this scenario the people would most likely be facing the table and hence facing
away from the robot.
The test was structured similarly to the Follow Me task. The robot learned
a leader at the beginning of the test and proceeded to follow them through a
number of trials. The part of this test most relevant to the evaluation of the
person identification system was the final trial. In order to test the ability of the robot
to recognise its leader in challenging circumstances the robot was blinded with a
cloth while the leader and another person were positioned in front of the robot
with their backs to it, as illustrated in Fig. 4.11a. While covered the robot
would lose its ability to track the position of the leader, forcing it to rely on the
HSS based identification system alone to recognise its leader. The blinding cloth
was then removed and the robot would recognise its leader and resume following.
This process was repeated with different people and different sitting and standing
configurations until the test time had elapsed.
The system performed flawlessly in this test, correctly identifying its leader
in 3 out of 3 cases. This test showed the capability of the system for robust
performance in a simulated home environment. Although only tested with a
choice between 2 people each time, robustly performing this task online involved
a number of important capabilities. As discussed in Section 4.3, to work in a HRI
context the robot needed to not simply correctly classify the observed HSS but
(a) Robot is blinded, disrupting its position tracking system
(b) Robot uses HSS based person identification to recognise its leader
Figure 4.11: RoboCup@Home 2011 Grand Final: Robot recognises its leader
from behind
accumulate classification results until a high-confidence decision could be made.
The performance of the system was judged by a panel of external judges as well
as the RoboCup@Home technical committee. The final competition score was
a combination of the external judges' score, the technical committee's score and
the performance in the competition up to that point, with each representing 1/3
of the final score. Based on this scoring scheme RobotAssist placed 4th (out
of 19 teams) in the RoboCup@Home competition, significantly improving on the
previous year's placing of 12th.
Chapter 5
Conclusions and Future Work
5.1 Conclusions
This capstone project has presented a novel method for person recognition which
exploits the physical shape of the head, neck and shoulders to facilitate non-intrusive
person identification suitable for the context of HRI. The method presented
employs a rotationally robust feature vector (HSS) to encapsulate the head-to-
shoulder shape of individuals. Combined with a method of scene analysis for
person detection and the use of an SVM classifier, this capstone presented a
complete system for person detection, learning and identification. A series of
empirical evaluations was performed to assess the performance of the system.
The person identification system was shown to recognise people (from a pool
of 9) in a natural walking motion with 76.8% accuracy. The system was shown to
scale surprisingly well to a problem size of 438 people, with an accuracy (based on
individual observations) of 62%. The system was also demonstrated to identify
individuals walking towards the robot, facing away from the robot, carrying objects
and sitting down. Finally, the person identification system was implemented and
tested on the RobotAssist platform and used in competition at RoboCup@Home
2011 in Istanbul, where it was a major contributor to the success of the RobotAssist
team, who placed 4th out of 19 teams in the competition.
5.2 Future Work
Although the system has been shown to perform well in many situations, there
are several areas to be considered in future work.
5.2.1 Identifying Unknown People
Although a method for detecting unknown persons was devised, implemented and
utilised at RoboCup@Home, this component of the system remains to be thoroughly
evaluated. Worthwhile future work in this area could include experimental evaluation
of the single-class SVM method used. The effect of adjusting the gamma
and cost parameters in the SVM model could be investigated in greater detail,
with the aim of developing an efficient method for determining the optimal
parameters given a set of training data. This method could also be compared to
other methods of anomaly detection.
5.2.2 Benefits of Tracking
Through the process of testing the identification system it became evident that
the ability to associate HSS feature vectors and classification results with persistent
tracks would significantly improve the performance of the system. Some
simple tracking techniques were implemented on the RobotAssist platform at the
competition but not thoroughly tested. Future work could integrate a particle
filter based tracking component into the system and evaluate the benefits of using
these persistent tracks to accumulate multiple HSS observations, multiple
classification results, or both.
5.2.3 Robustness to Changes in Personal Attire
An interesting question that has not been addressed in this capstone project is
the robustness of the system to changes in the personal appearance or attire of
an individual from day to day. It is likely that items such as hats, scarves, high
collars and variations in hair style will have an adverse effect on the performance
of the system in its current state. An interesting topic for further research and
development of this identification system would be to determine what effect such
items have on the HSS and whether these effects can be mitigated, for instance via
a sparse representation of the HSS with some front-end filtering to eliminate
unexpected measurements associated with hats and other items.
Bibliography
Alen Alempijevic. RobotAssist - RoboCup@Home 2011 team description paper. In
RoboCup 2011, Sydney, 2011.
J. Barreto, P. Menezes, and J. Dias. Human-robot interaction based on haar-like
features and eigenfaces. In Robotics and Automation, 2004. Proceedings.
ICRA '04. 2004 IEEE International Conference on, volume 2, pages 1888-1893,
April 26 - May 1, 2004. doi: 10.1109/ROBOT.2004.1308099.
Chih-Chung Chang and Chih-Jen Lin. LIBSVM: a library for support vector
machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
S.A. Cole. Suspect identities: a history of fingerprinting and criminal
identification. Harvard University Press, 2002.
C. Darwin. The origin of species. Number v. 1 in The Origin of Species. D. Appleton,
1902. URL http://books.google.com.au/books?id=QrcRAAAAYAAJ.
Gordon, C. et al. 1988 anthropometric survey of U.S. army personnel: Methods and
summary statistics. Technical report, United States Army Natick Research,
Development and Engineering Center, 1989.
Michael A. Goodrich and Alan C. Schultz. Human-robot interaction: a survey.
Found. Trends Hum.-Comput. Interact., 1:203-275, January 2007. ISSN
1551-3955. doi: 10.1561/1100000005. URL
http://dl.acm.org/citation.cfm?id=1348099.1348100.
Daniel Hordern and Nathan Kirchner. Robust and Efficient People Detection
with 3-D Range Data using Shape Matching. In Proc. of the 2010 Australasian
Conference on Robotics and Automation (ACRA-10), pages 1-8, 2010.
A. Hrdlicka. Anthropometry. The Wistar Institute of Anatomy and Biology, 1920.
M.O. Irfanoglu, B. Gokberk, and L. Akarun. 3D shape-based face recognition using
automatically registered facial surfaces. In Pattern Recognition, 2004. ICPR
2004. Proceedings of the 17th International Conference on, volume 4, pages
183-186. IEEE, 2004. ISBN 0769521282.
P. Kakumanu, S. Makrogiannis, and N. Bourbakis. A survey of skin-color
modeling and detection methods. Pattern Recognition, 40(3):1106-1122, 2007.
ISSN 0031-3203. doi: 10.1016/j.patcog.2006.06.010. URL
http://www.sciencedirect.com/science/article/pii/S0031320306002767.
Nathan Kirchner, Alen Alempijevic, Sonja Caraian, Robert Fitch, Daniel
Hordern, Gibson Hu, Gavin Paul, David Richards, Surya P. N. Singh, and
Stephen Webb. RobotAssist - a Platform for Human Robot Interaction Research.
In Proc. of the 2010 Australasian Conference on Robotics and Automation
(ACRA-10), pages 1-10, 2010.
Nathan Kirchner, Alen Alempijevic, and Gamini Dissanayake. Nonverbal robot-
group interaction using an imitated gaze cue. In HRI '11: Proc. 6th ACM/IEEE
Int. Conf. on Human Robot Interaction, pages 1-8, Switzerland, 2011.
Kuang-Chih Lee, Jeffrey Ho, and David J. Kriegman. Acquiring linear subspaces
for face recognition under variable lighting. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 27:684-698, 2005. ISSN 0162-8828. doi:
10.1109/TPAMI.2005.92.
B. Moghaddam and A. Pentland. Probabilistic visual learning for object representation.
Pattern Analysis and Machine Intelligence, IEEE Transactions on,
19(7):696-710, July 1997. ISSN 0162-8828. doi: 10.1109/34.598227.
Stephen Pheasant. Bodyspace: anthropometry, ergonomics, and the design of
work. Taylor & Francis, 1996. ISBN 9780748400676.
Pramila Rani, Changchun Liu, Nilanjan Sarkar, and Eric Vanman. An empirical
study of machine learning techniques for affect recognition in human-robot
interaction. Pattern Analysis and Applications, 9:58-69, 2006. ISSN 1433-7541.
doi: 10.1007/s10044-006-0025-y. URL
http://dx.doi.org/10.1007/s10044-006-0025-y.
J.H. Relethford. The Human Species: An Introduction to Biological Anthropology.
McGraw-Hill, 2009.
C. Sanderson. Biometric Person Recognition: Face, Speech and Fusion. VDM
Verlag, 2008.
C. Soanes and A. Stevenson. Oxford Dictionary of English. Oxford University Press,
2005.
V. Bruce and A. Young. Understanding face recognition. British Journal of
Psychology, 77(3):305-327, 1986. ISSN 2044-8295. doi:
10.1111/j.2044-8295.1986.tb02199.x. URL
http://dx.doi.org/10.1111/j.2044-8295.1986.tb02199.x.
Paul Viola and Michael Jones. Rapid object detection using a boosted cascade
of simple features. Computer Vision and Pattern Recognition, IEEE Computer
Society Conference on, 1:511, 2001. ISSN 1063-6919. doi:
10.1109/CVPR.2001.990517.
D. Voth. You can tell me by the way I walk. Intelligent Systems, IEEE, 18(1):4-5,
January 2003. ISSN 1541-1672. doi: 10.1109/MIS.2003.1179185.
L. Wang, T. Tan, H. Ning, and W. Hu. Silhouette analysis-based gait recognition
for human identification. Pattern Analysis and Machine Intelligence, IEEE
Transactions on, 25(12):1505-1518, 2003. ISSN 0162-8828.
Mike Woods. Caring for older Australians: Overview, report no. 53, final inquiry
report. Technical report, Productivity Commission, Australian Government,
2011.
J. Wright, A.Y. Yang, A. Ganesh, S.S. Sastry, and Yi Ma. Robust face recognition
via sparse representation. Pattern Analysis and Machine Intelligence,
IEEE Transactions on, 31(2):210-227, Feb. 2009. ISSN 0162-8828. doi:
10.1109/TPAMI.2008.79.
Appendix A
3D Sensing Technology
Crucial to the development of the person identification system presented here is
the use of 3D range cameras. These sensors are a type of camera which measures
the distance of the scene from the sensor at each pixel in the camera's field of
view. This type of data is commonly referred to as a depth image. Using a depth
image in combination with some understanding of the optics of the range camera,
a 3D pointcloud can be obtained. A pointcloud is a list of 3D Cartesian points
which in this case represents the locations in space of physical surfaces as seen
by the range camera. This type of information is very useful in robotics because
it enables robots to interpret their environment spatially (a sketch of the depth-image-to-pointcloud
conversion is given below). There are two main
types of range camera technology: time-of-flight (TOF) and structured light (SL).
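The conversion from depth image to pointcloud is the standard pinhole back-projection; the following is a minimal sketch under that assumption, with the intrinsics (fx, fy, cx, cy) assumed to come from the camera's calibration.

```python
import numpy as np

def depth_to_pointcloud(depth: np.ndarray, fx: float, fy: float,
                        cx: float, cy: float) -> np.ndarray:
    """Back-project a depth image (in metres) into a 3D pointcloud.

    fx, fy, cx, cy are the pinhole intrinsics of the range camera.
    Returns an (N, 3) array of [x, y, z] points, one per valid pixel.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    points = np.dstack((x, y, depth)).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop invalid (zero-depth) returns
```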
A.1 Time of Flight
Time-of-flight cameras work by illuminating the scene in front of the camera with
an infrared lamp and sensing the reflected infrared light with the camera. The
processor on board the sensor uses the phase difference of the reflected infrared
light to calculate the distance of the observed surface from the sensor (the
phase-to-distance relation is sketched below). The SwissRanger 4000 is an example
of a TOF camera and was used in the development of the identification system
presented here.
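A minimal sketch of the standard phase-to-distance relation for continuous-wave TOF sensors follows; the 30 MHz modulation frequency in the comment is illustrative, not a quoted specification of the SwissRanger 4000.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second

def tof_distance(phase_shift_rad: float, mod_freq_hz: float) -> float:
    """Distance implied by the phase shift of the modulated infrared signal.

    The light travels to the surface and back (twice the distance), so
    d = c * dphi / (4 * pi * f). The measurement is unambiguous only up to
    c / (2 * f), e.g. roughly 5 m for a 30 MHz modulation frequency.
    """
    return SPEED_OF_LIGHT * phase_shift_rad / (4.0 * math.pi * mod_freq_hz)
```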
A.2 Structured Light
Structured light 3D sensors work by projecting a specially structured pattern of
infrared light into the scene from a slightly different perspective to the camera.
From the point of view of the camera this projected pattern appears distorted
depending on the spatial placement of visible surfaces in the scene. The onboard
processor uses the discrepancies between the projected pattern and the pattern
as seen by the camera to calculate the depth at each part of the image (in essence
a triangulation, sketched below). The Kinect sensor, which is sold by Microsoft
as a game controller for their popular Xbox gaming console, is based on the
PrimeSensor 1.08 reference design and is an example of a depth camera which
uses structured light technology.
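The underlying geometry reduces to stereo triangulation between the projector and the camera. A simplified sketch follows; it ignores the normalised-disparity encoding that actual PrimeSense hardware uses and assumes an ideal rectified setup.

```python
def depth_from_disparity(disparity_px: float, focal_px: float,
                         baseline_m: float) -> float:
    """Triangulated depth for an idealised structured-light sensor.

    disparity_px : observed shift of the projected pattern, in pixels
    focal_px     : camera focal length, in pixels
    baseline_m   : projector-to-camera baseline, in metres
    """
    return focal_px * baseline_m / disparity_px
```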
Appendix B
RobotAssist as a Development
Platform
B.1 The RobotAssist Project
The RobotAssist project aims to:
- provide a platform for undergraduate and postgraduate students to conduct research into robotics, in particular HRI
- contribute to the international HRI research community
- integrate teaching and research, providing a path for students at all levels (high school to university) to become involved in robotics research
- raise the profile of the university in the international robotics community through competition in the RoboCup@Home competition
- raise the public profile of the university by generating media attention for exciting projects at UTS
- develop an adaptable hardware-software platform capable of a wide range of activities in the arena of HRI and domestic service robotics.
B.2 The RobotAssist Platform
B.2.1 Hardware
The platform consists of a collection of off-the-shelf sensors, computers, and actuators
coupled with a custom-made frame and moving parts. The robot is built on a
two-wheeled, motorised Segway base for mobility, on which is mounted a desktop
computer running Ubuntu which acts as the main processor for the robot. The
sensors on the robot include:
- 3D range cameras (Kinect and SwissRanger)
- 2D laser rangefinders (Hokuyo)
- an XSens 6-DOF inertial measurement unit
- a high-resolution RGB camera
- a pair of high-quality condenser microphones (and external audio device)
B.2.2 Software
The platform is primarily controlled by a component-based software framework
called Orca. The Orca framework allows each separate capability of the robot
to be developed as an individual component which can be started or stopped
independently of other components. These components communicate with one
another via a system called IceBox using a set of predefined interfaces. Components
can publish data, subscribe to data, expose task interfaces, and invoke
task interfaces. This type of system is very versatile because components can
be stopped, started or even interchanged without shutting down the entire system,
facilitating testing and experimentation. The fact that components can be
developed independently of one another also allows multiple members of the development
team to work on different parts of the project at the same time without
the need for constant consultation. The publish/subscribe pattern at the heart of
this decoupling is illustrated below.
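Orca's real interfaces are defined through Ice's interface definition language, so the following is only an illustrative sketch of the publish/subscribe pattern in Python; all names are hypothetical and none of this is Orca's actual API.

```python
class Component:
    """Minimal stand-in for a framework component that can publish data
    and accept subscriptions, mirroring the decoupling described above."""

    def __init__(self, name: str):
        self.name = name
        self._subscribers = []  # callbacks of downstream components

    def subscribe(self, callback):
        """Register another component's handler for this component's data."""
        self._subscribers.append(callback)

    def publish(self, data):
        """Push new data to every subscriber, without knowing who they are."""
        for callback in self._subscribers:
            callback(data)

# e.g. a person-detection component feeding the identification component:
detector = Component("person_detector")
detector.subscribe(lambda cloud: print("identify person in:", cloud))
detector.publish("segmented pointcloud")
```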
Appendix C
Classification Using Support
Vector Machines
The identification system developed for this capstone works by extracting the HSS
of the human user and classifying it as one of its known user base. The problem
of classification is (although not the main focus of this project) an interesting
problem in its own right. The system described in this report uses support vector
machines (SVM) to address the important task of classification.
Support vector machines are a concept in machine learning used for solving regression
problems (to obtain continuous results) as well as classification problems
(to obtain discrete results in the form of labels). In this project SVM is
used for the latter (classification) to obtain discrete labels representing users known
to the system. In theory SVM classification produces a binary result: two
classes are trained with some data points and the SVM defines the boundary
between the two classes. Further data points can then be classified using the
trained SVM, and each point will be found to be on one side of the boundary or
the other. Despite the binary nature of SVM, several strategies exist for solving
multi-class problems using SVM, most of which involve reducing the multi-class
problem to a set of two-class problems.
For the purpose of this project an openly available library called LIBSVM
(Chang and Lin, 2001) was used to implement the classification stage of the
system. This library has been written to interface with Matlab, C, Java, Python
and several other programming languages and is widely used in the research
community. In this project the Matlab programming interface was used for offline
testing and development of the feature extraction algorithm and the system as a
whole. The C interface was also used to create the real-time implementation.