
Paper:

4W1H and Particle Swarm Optimization for
Human Activity Recognition
Leon Palafox and Hideki Hashimoto
Institute of Industrial Science, The University of Tokyo
7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
E-mail: leon@iba.t.u-tokyo.ac.jp, hashimoto@elect.chuo-u.ac.jp
[Received February 21, 2011; accepted May 27, 2011]

This paper proposes a paradigm in the forensic area for detecting and categorizing human activities. The
presented approach uses five base variables, referred
to as 4W1H (Who, When, What, Where, and
How) to describe the context in an environment.
The proposed system uses self-organizing maps to classify movements for the How variable of 4W1H,
as well as particle swarm optimization clustering techniques for the grouping (clustering) of data obtained
from observations. The paper describes the hardware
settings required for detecting these variables and the
system designed to do the sensing.
Fig. 1. iSpace sensor setting.

Keywords: self-organizing maps, particle swarm optimization, 4W1H, activity recognition

1. Introduction
Human Activity Recognition (HAR) systems are
present in cities, buildings, and rooms where they change
and adapt to continuously changing environments. To
be able to do this, these systems process data gathered
from environmental sensors and then modify the environment according to the activities of the people present.
HAR is a very broad field that includes many challenges
(e.g., it can focus on one person or several people, a single room, or even a whole city).
Although the proposed approach could potentially be applied to any setting, the focus of this paper is on intelligent
rooms, where the users are few and variables, such as objects and places, are known. In particular, we will use
the iSpace [1], which has an adequate set of sensors to do
recognition tasks (Fig. 1).
Intelligent room settings usually have three components: sensing, classification and action. The sensing
and classification problems have a close relation since the
traits of the sensed data (images, video, sound, etc.) will
dictate which classification tools should be used. Nevertheless, most conventional techniques of HAR have flaws.
For example, cameras recognize activities using only the
human pose [2], often overlooking the multiple characteristics of each scene (e.g., time, place, and environment).
Vol.15 No.7, 2011

To address this issue, some groups focus on extracting activities using context detection. Works like [3] and [4]
showed that sensing extra variables can increase the accuracy of a recognition system.
In this work, we propose describing the actions in an
environment using 4W1H. The 4W1H paradigm defines
activities as a set of 5 variables (Who, When, What,
Where, and How) deemed sufficient to describe every
action. Furthermore, by defining each activity as a set of
these variables, we can mix sensing techniques. For example, we can detect What and Who using object and
subject identification algorithms. We can use an RFID
tagged environment to sense the What variable. Then,
we can use clustering and classification techniques to process the different activities given multiple sets of 4W1H.
There are, however, also problems to solve in the
4W1H method. For example, the way a left-handed person uses a pencil is different from the way a right-handed
person does, and different people have different ways of
doing things. When we use 4W1H, therefore, the How
variable has an intrinsic complexity and needs a special
classification on its own. We need a scheme capable of
performing on-line recognition of an increasing number
of possible Hows. To do this, we used a mix of wavelets
and self-organizing maps, which showed good results [5]
when doing a rough classification of unknown inputs with
a high variance.

Journal of Advanced Computational Intelligence and Intelligent Informatics

The goal of the proposed system is to sense activities in a space. So, once we have the 4W1H variables in a
buffer, we need to cluster them into groups for identification. We map each set of 4W1H variables to a point in R^5 by assigning categorical values to each variable. In
this work, we use a clustering technique based on particle
swarm optimization to group these variables. Further, by
doing projections in any of the planes, we can visualize
the clusters with references like time, space and users.
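A minimal sketch of this mapping (the category lists, and encoding When as the hour of day, are illustrative assumptions, not the paper's actual values):

```python
# Hypothetical sketch: mapping one 4W1H observation to a point in R^5.
# The category lists below are assumed for illustration only.

WHO   = ["alice", "bob"]
WHAT  = ["pencil", "cup", "book"]
WHERE = ["desk", "sofa", "door"]
HOW   = ["grasp", "wave", "point"]   # labels produced by the SOM stage

def encode_4w1h(who, when_hours, what, where, how):
    """Assign a categorical (integer) value to each variable so that
    one observation becomes a single point in R^5."""
    return [
        float(WHO.index(who)),
        float(when_hours),            # When is already numeric (hour of day)
        float(WHAT.index(what)),
        float(WHERE.index(where)),
        float(HOW.index(how)),
    ]

point = encode_4w1h("bob", 14.5, "cup", "desk", "grasp")
```

Each observed activity thus becomes a single point in R^5, ready for clustering and for projection onto any pair of axes.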
The remainder of the paper is organized as follows. Initially, we discuss related work, and the algorithms used to
solve the problem. We then present and describe the sensing hardware and the sensing system. Finally, we present
results from experimental investigations.

2. Preliminaries
2.1. Related Work
Schilit and Theimer [6] first defined context sensing as
"the ability of the system to discover and react to changes
in the environment they are located in." Using this definition, we may also say that a context sensing system
is capable of sensing those variables which generate the
changes in the environment. Work by Schmidt [7] described how context may help to infer human activities.
Schmidt's work did not set up a set of activities or a pattern, but described context-aware applications. Robertson
and Reid [2] used an approach that used position and velocity in addition to local motion to describe an activity.
This enhanced the recognition rate of their system. Li and
Fei-Fei [3] also used context and defined three variables
for static images (what, where, and who). Their approach
used a Dirichlet mixture model to define activities as a
mixture of variables found in the scene. Overall, however, their work did not investigate the implications of relying on one sensor to get all variables. Their results, while
compelling, were limited to static images. Huang et al. [4]
used a similar approximation of context. Their work focused on the when, what, and where variables using an
arrangement of sensors. They also used a pattern matching algorithm to match sensed data to activities. This pattern-matching approach, however, may suffer from a lack of
flexibility in situations with new objects. The presented
work differs from these contributions by increasing the
number of sensing variables. The presented approach also
uses a clustering system. This clustering system will allow the system to recognize a wide range of unseen
activities.
2.2. Self-Organizing Maps
Kohonen [8] describes Self-Organizing Maps (SOM)
as an algorithm that projects a Z-dimensional feature
space into a 2-dimensional map. It places similar elements
close to each other, thus preserving the topology of the
space. Typically, a SOM is represented by an N × N matrix, where each element is a neuron that has Z weights.
In this work, we present K inputs from a Z-dimensional

space to the map and compare them with each element of the matrix. A training process updates the weight vectors
in order to bring them closer to the input. After training,
the SOM will be divided into areas where similar inputs
will be grouped together. For a full explanation of the update equations, please refer to [8]. The map
then classifies new inputs by measuring the distance between the new input D and each element of the SOM.
The resulting class assignment is set to the neuron with
the minimum distance to the input. Although SOMs are
not often used to classify new inputs, Benitez [9] and Chaplot [5] have used SOMs combined with wavelets as a
surrogate for classification with good results.
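The minimum-distance assignment can be sketched as follows (the grid size, random weights, and single-neuron update are simplifying assumptions; the full algorithm also updates a shrinking neighborhood around the winner):

```python
import random

# Minimal SOM sketch with assumed parameters (not the paper's setup):
# an N x N grid of neurons, each holding Z weights. Classification
# assigns an input to the neuron at minimum Euclidean distance (the BMU).

N, Z = 4, 3
random.seed(0)
som = [[[random.random() for _ in range(Z)] for _ in range(N)] for _ in range(N)]

def dist2(w, x):
    """Squared Euclidean distance between a weight vector and an input."""
    return sum((wi - xi) ** 2 for wi, xi in zip(w, x))

def best_matching_unit(som, x):
    """Return the (row, col) of the neuron closest to input x."""
    return min(((i, j) for i in range(N) for j in range(N)),
               key=lambda ij: dist2(som[ij[0]][ij[1]], x))

def train_step(som, x, lr=0.5):
    """One simplified update: pull the winning neuron's weights toward
    the input (the full SOM also updates the winner's neighbors)."""
    i, j = best_matching_unit(som, x)
    som[i][j] = [w + lr * (xi - w) for w, xi in zip(som[i][j], x)]
    return i, j

x = [0.9, 0.1, 0.5]
i, j = train_step(som, x)
```

After repeated presentations of the K inputs, similar inputs come to share nearby winning neurons, which is what makes the trained map usable as a classifier.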

2.3. Particle Swarm Optimization Clustering


Particle Swarm Optimization (PSO) is inspired by nature's social optimization. Visually, PSO can be imagined as a flock of birds flying in the sky or a school of
fish swimming in the water. In any of these groups every
so-called particle has a position, a velocity, and a set of
simple instructions to do its tasks. In these groups, each
individual is a candidate solution for the optimum state. In
addition, each individual has the ability to record its best
position and regulate its velocity according to its target.
In some applications [10], the PSO algorithm is also
capable of recording the global best solution of the entire
population. In this case, the basic algorithm consists in
updating the current particle velocity and position, while
the update is based on its community's best results toward
an optimum solution to a given fitness function. On the
other hand, work by Omran et al. [11] describes a PSO-based clustering algorithm that is as robust as K-means
and, sometimes, even faster. In Omran's
algorithm, the particle space is simply regarded as a set of
K possible cluster centroids, and each particle is updated
in relation to the best distribution of these centroids. The
fitness function in PSO is typically a multi-objective optimization problem, which minimizes the distance within a
cluster and maximizes the separation among clusters.
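A hedged sketch of such a fitness function follows (the particular weighting of the two objectives is an assumption for illustration; Omran et al. combine their criteria somewhat differently):

```python
import math

# Illustrative clustering fitness in the spirit of Omran et al.:
# a particle encodes K candidate centroids, and lower fitness is better.
# The weights w_intra and w_sep are assumed, not taken from the paper.

def euclid(a, b):
    return math.dist(a, b)

def fitness(particle, data, w_intra=1.0, w_sep=1.0):
    """Mean distance of points to their nearest centroid (compactness),
    minus the smallest centroid-to-centroid distance (separation)."""
    intra = sum(min(euclid(x, c) for c in particle) for x in data) / len(data)
    sep = min(euclid(c1, c2)
              for i, c1 in enumerate(particle)
              for c2 in particle[i + 1:])
    return w_intra * intra - w_sep * sep

data = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
good = [[0.05, 0.0], [5.05, 5.0]]   # centroids near the two groups
bad  = [[2.5, 2.5], [2.6, 2.5]]     # centroids stacked in the middle
```

The PSO update then moves each particle's centroids toward the positions with the best recorded fitness, so well-placed centroid sets like `good` dominate the swarm over time.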
For this work, we chose this latter algorithm [11] because of its fast convergence,
which makes it good for an on-line sensing system, as
well as its relative ease of implementation.
2.4. Wavelets
The wavelet transform coefficients, given by the inner
product of x(t) and the basis functions ψ_{τ,n}(t),

    W(τ, n) = ⟨x(t), ψ_{τ,n}(t)⟩ . . . . . . . . (1)

comprise the time-frequency representation of the original signal. In digital signal processing (as in this work),
the fast forward wavelet transform is typically implemented as a set of tree-structured filter banks. The input
signal is divided into contiguous, non-overlapping blocks
of samples called frames, and the forward transform is applied frame by frame. This
work uses wavelets as a fast way to compress and filter
the signal from the MTx sensor. It is expected that this
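The frame-by-frame forward transform described above can be sketched with a single-level Haar filter bank (the frame length and the choice of the Haar wavelet are assumptions for illustration, not the paper's configuration):

```python
import math

# One level of a tree-structured filter bank using the Haar wavelet,
# applied frame by frame. The frame length is an assumed parameter.

def haar_step(frame):
    """Split one frame into approximation (low-pass) and detail
    (high-pass) coefficients."""
    s = math.sqrt(2)
    approx = [(frame[i] + frame[i + 1]) / s for i in range(0, len(frame), 2)]
    detail = [(frame[i] - frame[i + 1]) / s for i in range(0, len(frame), 2)]
    return approx, detail

def transform(signal, frame_len=4):
    """Process the signal in contiguous, non-overlapping frames."""
    out = []
    for k in range(0, len(signal) - frame_len + 1, frame_len):
        out.append(haar_step(signal[k:k + frame_len]))
    return out

coeffs = transform([1.0, 1.0, 1.0, 1.0, 2.0, 0.0, 2.0, 0.0])
```

The detail coefficients vanish on the constant first frame and capture the oscillation in the second, which is the compression-and-filtering behavior exploited when preprocessing the MTx signal.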
