
IEEE TRANSACTIONS ON NEURAL SYSTEMS AND REHABILITATION ENGINEERING, VOL. 16, NO. 2, APRIL 2008, p. 131

Local Temporal Common Spatial Patterns for Robust Single-Trial EEG Classification
Haixian Wang and Wenming Zheng

Manuscript received August 12, 2007; revised October 29, 2007; accepted November 13, 2007. This work was supported in part by the Specialized Research Fund for the Doctoral Program of Higher Education under Grant 20070286030, in part by the National Natural Science Foundation of China under Grant 10571001 and Grant 60503023, in part by the Jiangsu Natural Science Foundation under Grant BK2005407, and in part by the Program for New Century Excellent Talents in University under Grant NCET-05-0467.
The authors are with the Key Laboratory of Child Development and Learning Science, Ministry of Education, Research Center for Learning Science, Southeast University, Nanjing, Jiangsu 210096, China (e-mail: hxwang@seu.edu.cn).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TNSRE.2007.914468
1534-4320/$25.00 © 2008 IEEE

Abstract—In this paper, we propose a novel optimal spatio-temporal filter, termed local temporal common spatial patterns (LTCSP), for robust single-trial electroencephalogram (EEG) classification. Different from classical common spatial patterns (CSP), which uses only global spatial covariances to compute the optimal filter, LTCSP considers temporally local information in the variance modelling. The underlying manifold variances of EEG signals contain more discriminative information. LTCSP is an extension to CSP in the sense that CSP can be derived from LTCSP under a special case. By constructing an adjacency matrix, LTCSP is formulated as an eigenvalue problem. So, LTCSP is computationally as straightforward as CSP. However, LTCSP has better discrimination ability than CSP and is much more robust. A simulated experiment and real EEG classification demonstrate the effectiveness of the proposed LTCSP method.

Index Terms—Brain–computer interface (BCI), common spatial patterns (CSP), manifold learning, robust electroencephalogram (EEG) classification, temporally local variance.

I. INTRODUCTION

The automatic classification of single-trial electroencephalogram (EEG) has recently received increasing attention in a wide range of biomedical engineering. EEG provides a potential nonmuscular communication channel for severely disabled persons, such as those suffering from amyotrophic lateral sclerosis (ALS) or locked-in syndrome, since some mental tasks yield distinguishable EEG signals that can be used to control an assistant device, i.e., a brain–computer interface (BCI) [27]. The feasibility and reliability of this communication depend heavily on robust and accurate recognition of the EEGs corresponding to the respective mental processes. A large number of signal processing and pattern recognition algorithms have been employed in the BCI community [1], [18]. Among them, the common spatial patterns (CSP) technique is one of the most popular methods. CSP, first introduced to the domain of EEG analysis by Koles et al. [10], constructs new time-series by projecting the high-dimensional, spatio-temporal raw EEG signals onto very few spatial filters that are elaborately designed such that the variances of the resulting signals deliver the most discriminative information. CSP has been extensively applied to abnormal components extraction in EEG [9], normal versus abnormal EEG classification [11], localization of sources [12], [13], movement-related task classification [17], [20], and multiple-channel EEG of motor imagery classification [23], and has been extended from the original binary classification to the multiclass situation [4].

Geometrically, the discriminative features of CSP are made available by finding directions (i.e., spatial filters) onto which the difference of projected scatters between two EEGs is maximized. CSP computes the spatial patterns by simultaneously diagonalizing two covariance matrices associated with two populations of EEGs so as to maximize the difference between the two projected populations. However, one critical drawback of CSP is that the estimation of the covariance matrices is severely affected by outliers, since the sample covariance has an unbounded influence function. So, even one recording point or one trial contaminated by artifacts can change the filters dramatically, which basically means that the filters obtained are meaningless. Therefore, the classification performance can be strongly weakened by outliers and artifacts, resulting in bad generalization ability.

Another limitation of CSP is that it does not take temporal information into account in the estimation of covariance matrices. In other words, CSP is a time-independent global method. It fails to capture the temporally local structure of samples. It is expected that the time-dependent local variance delivers more discriminative power than the global variance used in CSP, since the local variance explicitly considers the (temporal) manifold behind the generation of EEG signals and thus characterizes the intrinsic variance of the signals.

From the perspective of neurophysiology, EEG signals are naturally nonstationary and may show the behavior of autoregression [19]. Besides, the scalp-recorded EEG signals are very noisy, since the signals of interest are usually accompanied by a large number of simultaneously active brain activities as well as various artifacts such as electromyogram (EMG) and electrooculogram (EOG) [16]. The performance of a globally linear method such as CSP may be badly distorted by the intrinsic signal variability and the noise. So, it is important to develop a robust technique that discovers the intrinsic manifold of EEG and is as invariant as possible against such distortion.

This paper contributes a local temporal CSP (LTCSP) for EEG classification, which is motivated by the idea of manifold modelling developed recently in the field of machine learning [8], [14], [28]. LTCSP explicitly considers the temporally local structure of observed samples. Specifically, by constructing a time-dependent adjacency graph, LTCSP models the variances of EEG signals by using temporally neighboring samples.

LTCSP is an extension to CSP in the sense that CSP can be derived from the LTCSP framework when adopting a particular adjacency graph. It is worthwhile to highlight some properties of LTCSP as follows.

1) LTCSP has a linear formulation. By using the adjacency matrix, LTCSP is formulated as an eigenvalue problem that is similar to that of CSP. So, like CSP, it is computationally easy and stable.
2) LTCSP, however, explicitly considers the temporally local variance structure, which is modelled by the adjacency graph. It can therefore model the manifold structure of EEG signals. Although the underlying theory behind the generation of EEG signals is not thoroughly established, a global linear method may not adequately model the EEG signals [7], [19], while the advantages of nonlinear methods are usually suppressed by the models' complexity [6]. These deficiencies are effectively overcome by LTCSP.
3) LTCSP is less sensitive to potential outliers and artifacts. The robustness comes from two sides. First, an outlier can only affect its neighboring samples. Second, the impact within the local range is further alleviated by the heat-kernel weighting employed in the adjacency graph.

In the literature, there are two methods that are most closely related to LTCSP: the optimal kernel feature extractor (KFE) [24] and the common spatio-spectral pattern (CSSP) [16]. KFE employs the "kernel trick" to extend CSP from the linear domain into a higher dimensional kernel space, typically related to the input space via a nonlinear mapping. This kernel-based CSP, however, does not take into account the temporally local information and the outliers, since all the data points are mapped by a common function, which is globally time-independent and does not necessarily detect outliers. CSSP concatenates each original sample with a time-delayed one to form a longer vector sample, and then extracts spatial filters from these padded samples. In fact, the CSSP method still does not involve the temporally local structure of samples. Besides, the higher-dimensional samples make the estimation and computation of the covariance matrix worse.

II. METHODS

A. Common Spatial Patterns

For the EEG analysis, the raw EEG signal of a single trial is an observed time course $x(t) = [x_1(t), \ldots, x_K(t)]^{T}$ for $t = 1, \ldots, N$, with $N$ the number of samples (i.e., recording points) per trial and $K$ the number of channels (i.e., recording electrodes), where $T$ represents the transpose operator. That is, we represent the observation at a given time as a point in $K$-dimensional Euclidean space, and one EEG signal consists of $N$ such points. We assume the points are zero-mean, which usually is the case after frequency filtering. Denote the $K \times N$ matrix as $X = [x(1), \ldots, x(N)]$ with each column being a recording vector. The normalized spatial covariance of the EEG is calculated as $C_x = XX^{T}/\operatorname{tr}(XX^{T})$, where tr is the trace operator that sums up the diagonal elements of a matrix. Likewise, denote the single-trial EEG data matrix under another condition by $Y$ with dimension $K$ (channels) by $N$ (samples), whose normalized spatial covariance is estimated by $C_y = YY^{T}/\operatorname{tr}(YY^{T})$. The final spatial covariances $\bar{C}_x$ and $\bar{C}_y$ are respectively computed by averaging over the trials under each condition.

The CSP analysis seeks to find a matrix $\Omega$ and a diagonal matrix $\Lambda$ with elements in [0,1] such that

$$\Omega^{T}\bar{C}_x\,\Omega = \Lambda, \qquad \Omega^{T}\bar{C}_y\,\Omega = I - \Lambda \tag{1}$$

where $I$ is an identity matrix. This can be achieved via the following two steps. Firstly, whiten the composite spatial covariance $C = \bar{C}_x + \bar{C}_y$, i.e., determine a matrix $P$ such that

$$P\,C\,P^{T} = I. \tag{2}$$

The whitening matrix is formed as $P = \Gamma^{-1/2}U^{T}$, where $U$ is the eigenvectors matrix of $C$ and $\Gamma$ is the diagonal matrix of associated eigenvalues. Secondly, let $S_x = P\bar{C}_xP^{T}$ and $S_y = P\bar{C}_yP^{T}$, respectively. Then $S_x$ and $S_y$ share the common eigenvector matrix $B$, i.e.,

$$S_x = B\Lambda B^{T}, \qquad S_y = B(I - \Lambda)B^{T}. \tag{3}$$

This decomposition can be accomplished due to $S_x + S_y = I$. We assume, throughout this paper, that the eigenvalues mentioned are sorted in descending order. Therefore, the final spatial filter that satisfies (1) is given by

$$\Omega = P^{T}B. \tag{4}$$

Using this decomposition matrix $\Omega$, the EEG signals $X$ and $Y$ are projected as

$$Z_x = \Omega^{T}X, \qquad Z_y = \Omega^{T}Y. \tag{5}$$

Since the sum of two corresponding eigenvalues is always one, the variances of the first and last few rows of $Z_x$ and $Z_y$ are the most suitable features for classification. They (possibly after a log-transformation) are then fed into a classifier.

B. Local Temporal Common Spatial Patterns

1) Basic Idea of LTCSP: The optimal spatial filter $\Omega$ is obtained by maximizing the filtered variance of one class and minimizing the filtered variance of the other class simultaneously. Mathematically, $\Omega$ can be investigated from another viewpoint, i.e., each column $\omega$ of $\Omega$ is a solution to the optimization problem

$$\max_{\omega}\ \frac{\omega^{T}\bar{C}_x\,\omega}{\omega^{T}\bar{C}_y\,\omega} \tag{6}$$

subject to $\Omega^{T}(\bar{C}_x + \bar{C}_y)\Omega = I$. The proof is as follows. In fact, the solution to (6) is given by the eigenvectors of the matrix $\bar{C}_y^{-1}\bar{C}_x$ [5], [14]. On the other hand, from (3) and (4), it is easy to show that

$$\bar{C}_y^{-1}\bar{C}_x\,\Omega = \Omega\,\Sigma \tag{7}$$

where $\Sigma$ is a diagonal matrix with the associated eigenvalues $\lambda_i/(1 - \lambda_i)$ in the diagonal, sorted in descending order. Here $\lambda_i$ is the $i$th diagonal element of $\Lambda$. This establishes that the filter obtained by CSP consists of the eigenvectors of $\bar{C}_y^{-1}\bar{C}_x$, and so it maximizes (6). The first and last few columns of $\Omega$ carry good discriminant information.

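We now consider the first few columns of the filter matrix $\Omega$ as an example; the last few columns are treated analogously. It follows that

$$\omega^{*} = \arg\max_{\omega}\ \frac{\omega^{T}\bar{C}_x\,\omega}{\omega^{T}\bar{C}_y\,\omega} = \arg\max_{\omega}\ \frac{\dfrac{1}{n_x}\sum_{i=1}^{n_x}\dfrac{\sum_{l=1}^{N}\big(\omega^{T}x_{i,l}\big)^{2}}{\operatorname{tr}(X_iX_i^{T})}}{\dfrac{1}{n_y}\sum_{j=1}^{n_y}\dfrac{\sum_{l=1}^{N}\big(\omega^{T}y_{j,l}\big)^{2}}{\operatorname{tr}(Y_jY_j^{T})}} \tag{8}$$

where $n_x$ and $n_y$ are respectively the numbers of trials under each condition, $X_i$ denotes the $i$th trial, and $x_{i,l}$ is the $l$th column of the matrix $X_i$. Below, we consider the quadratic forms $\sum_{l}(\omega^{T}x_l)^{2}$ and $\sum_{l}(\omega^{T}y_l)^{2}$ in (8), where we skip the subscript and trial-indicator for notational simplicity.

On account of the fact [14] that, for zero-mean points,

$$\sum_{l=1}^{N}\big(\omega^{T}x_l\big)^{2} = \frac{1}{2N}\sum_{l,m=1}^{N}\big(\omega^{T}x_l - \omega^{T}x_m\big)^{2},$$

which implies that the variance after filtering can be interpreted as the sum of all squared pairwise distances between the projected data points, we thus propose using

$$\frac{1}{2}\sum_{l,m=1}^{N}\big(\omega^{T}x_l - \omega^{T}x_m\big)^{2}\,W^{x}_{lm} \tag{9}$$

instead of $\sum_{l}(\omega^{T}x_l)^{2}$ in (8), where $W^{x}_{lm}$ is the weight imposed on the edge that connects data points $x_l$ and $x_m$. Here, we are motivated by the utilization of graph theory. Specifically, from the perspective of a graph, the samples of a single trial, i.e., $x_1, \ldots, x_N$, are represented by a weighted undirected graph $G = (V, E)$, where $V$ denotes the set of nodes corresponding to the sample points, and $E$ denotes the edges connecting pairwise points with the weights $W^{x}_{lm}$. The matrix $W^{x} = (W^{x}_{lm})$, termed the adjacency matrix, can be defined as a heat kernel (10): $W^{x}_{lm}$ decays exponentially with the squared distance $\|x_l - x_m\|^{2}$, at a rate controlled by $\sigma$, if $|l - m| < \tau$, and $W^{x}_{lm} = 0$ otherwise, where $\|\cdot\|$ is the Euclidean norm in $\mathbb{R}^{K}$, and $\sigma$ and $\tau$ are positive parameters. Here, $\tau$ defines the temporally local range. In other words, $\tau$ characterizes the manifold of the data according to temporal information. Clearly, $W^{x}_{lm}$ is monotonically decreasing with respect to the distance between two temporally close data points (i.e., the difference of their recording times is less than $\tau$), and its value lies in $(0, 1]$. It follows that

$$\frac{1}{2}\sum_{l,m=1}^{N}\big(\omega^{T}x_l - \omega^{T}x_m\big)^{2}\,W^{x}_{lm} = \omega^{T}\tilde{C}_x\,\omega \tag{11}$$

where

$$\tilde{C}_x = \frac{1}{2}\sum_{l,m=1}^{N}W^{x}_{lm}\,(x_l - x_m)(x_l - x_m)^{T} \tag{12}$$

which we refer to as the temporally local covariance matrix. Likewise, we replace $\sum_{l}(\omega^{T}y_l)^{2}$ in (8) by

$$\frac{1}{2}\sum_{l,m=1}^{N}\big(\omega^{T}y_l - \omega^{T}y_m\big)^{2}\,W^{y}_{lm} \tag{13}$$

which also can be denoted by $\omega^{T}\tilde{C}_y\,\omega$.

Consequently, by substituting the above results into (8), we actually aim to solve for the $\omega$ that maximizes the objective function

$$\frac{\dfrac{1}{n_x}\sum_{i=1}^{n_x}\omega^{T}\tilde{C}_{x_i}\,\omega\big/\rho_{x_i}}{\dfrac{1}{n_y}\sum_{j=1}^{n_y}\omega^{T}\tilde{C}_{y_j}\,\omega\big/\rho_{y_j}} \tag{14}$$

where $\tilde{C}_{x_i}$ and $\tilde{C}_{y_j}$ are the temporally local covariance matrices of the individual trials, and $\rho_{x_i}$ and $\rho_{y_j}$ are two normalization factors, which will be given below.

Via the construction of the weights $W^{x}$ and $W^{y}$, the formulation of variances in (14) explicitly considers the temporally local structure of the EEG signal. Another important role of the weighting is to deemphasize atypical samples, since an outlier can only affect its neighboring samples and a light weight is put between two samples that differ greatly. Maximizing the objective function (14) amounts to maximizing the numerator and minimizing the denominator simultaneously. Maximizing the numerator makes the projected data points of one class (corresponding to one kind of mental task) scatter as far as possible. Due to the introduction of the weights, the samples are not treated equally. A heavier weight is put between two closer data points, say $x_l$ and $x_m$, which is to ensure that their projections $\omega^{T}x_l$ and $\omega^{T}x_m$ are far apart. Otherwise, a large penalty will be incurred. We thus expect that the projected data points are scattered well, since even the originally close data points are filtered far apart. Minimizing the denominator makes the data points of the other class as compact as possible. Particularly, it tries to guarantee that, if two data points are originally close, then their projections are close as well, since a heavy weight is associated with two close samples.

2) Implementation of LTCSP: We now compute the temporally local covariance matrices $\tilde{C}_x$ and $\tilde{C}_y$ involved in (14). Noting the symmetry of $W^{x}$ and using some algebraic operations, we have that

$$\tilde{C}_x = \frac{1}{2}\sum_{l,m=1}^{N}W^{x}_{lm}\,(x_l - x_m)(x_l - x_m)^{T} = X\,(D^{x} - W^{x})\,X^{T} = X L^{x} X^{T} \tag{15}$$

where the Laplacian matrix $L^{x} = D^{x} - W^{x}$, and $D^{x}$ is a diagonal matrix with the diagonal elements being the row sums of $W^{x}$, i.e., $D^{x}_{ll} = \sum_{m}W^{x}_{lm}$. This result also holds for the data from the other class.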
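The identity between the weighted pairwise form (12) and the Laplacian form (15) can be checked numerically. The sketch below assumes the common heat-kernel form exp(-||x_l - x_m||^2 / (2 sigma^2)) inside the temporal window; that scaling is an assumption made for illustration, since the exact expression of (10) is not legible in this copy. The function names are hypothetical.

```python
import numpy as np

def heat_kernel_adjacency(X, tau, sigma):
    """Adjacency matrix for one K x N trial.

    Assumed form: W[l, m] = exp(-||x_l - x_m||^2 / (2 sigma^2)) if |l - m| < tau,
    and 0 otherwise. sigma = inf gives weight 1 inside the window.
    """
    N = X.shape[1]
    diff = X[:, :, None] - X[:, None, :]           # K x N x N pairwise differences
    sq_dist = np.sum(diff ** 2, axis=0)            # N x N squared distances
    idx = np.arange(N)
    in_window = np.abs(idx[:, None] - idx[None, :]) < tau
    weights = np.exp(-sq_dist / (2.0 * sigma ** 2))
    return np.where(in_window, weights, 0.0)

def local_covariance(X, W):
    """Temporally local covariance X (D - W) X^T, the Laplacian form in (15)."""
    L = np.diag(W.sum(axis=1)) - W
    return X @ L @ X.T

# Hypothetical toy trial: 3 channels, 12 samples.
rng = np.random.default_rng(1)
X = rng.standard_normal((3, 12))
W = heat_kernel_adjacency(X, tau=4, sigma=2.0)
C_local = local_covariance(X, W)
```

The Laplacian form avoids the double loop over sample pairs, which is why the method stays as cheap as an ordinary covariance computation followed by an eigendecomposition.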

Fig. 1. Data points generated from the “S” curves, (b) is obtained by flipping the data points of (a) according to the diagonal line y = x. Lines in (a) and (b) are
respectively the directions of maximum projected variances of the neighboring data points around A and B.

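TABLE I
VARIANCES OF THE FIRST CLASS [CORRESPONDING TO FIG. 1(A)] AFTER FILTERING BY CSP AND LTCSP, RESPECTIVELY. THESE VARIANCES ARE ALSO THE EIGENVALUES OF THE COVARIANCE MATRIX OF THE WHITENED DATA POINTS

Fig. 2. Projections of the two classes onto the first row of the corresponding filter: (a) CSP, and (b) LTCSP.

TABLE II
FIRST AND LAST THREE AVERAGED VARIANCES OF TRAINING DATA OF THE FIRST CLASS (CORRESPONDING TO UPCOMING LEFT HAND) AFTER FILTERING BY CSP AND LTCSP, RESPECTIVELY

Substituting the above results into (14), the objective function becomes

$$\frac{\dfrac{1}{n_x}\sum_{i=1}^{n_x}\dfrac{\omega^{T}X_iL^{x_i}X_i^{T}\,\omega}{\operatorname{tr}(X_iL^{x_i}X_i^{T})}}{\dfrac{1}{n_y}\sum_{j=1}^{n_y}\dfrac{\omega^{T}Y_jL^{y_j}Y_j^{T}\,\omega}{\operatorname{tr}(Y_jL^{y_j}Y_j^{T})}} \tag{16}$$

where we let $\rho_{x_i} = \operatorname{tr}(X_iL^{x_i}X_i^{T})$ and $\rho_{y_j} = \operatorname{tr}(Y_jL^{y_j}Y_j^{T})$. Let

$$\bar{\tilde{C}}_x = \frac{1}{n_x}\sum_{i=1}^{n_x}\frac{X_iL^{x_i}X_i^{T}}{\operatorname{tr}(X_iL^{x_i}X_i^{T})}, \qquad \bar{\tilde{C}}_y = \frac{1}{n_y}\sum_{j=1}^{n_y}\frac{Y_jL^{y_j}Y_j^{T}}{\operatorname{tr}(Y_jL^{y_j}Y_j^{T})}$$

respectively be the average normalized temporally local covariance matrices of the two classes. Then (16) can be denoted by

$$\frac{\omega^{T}\bar{\tilde{C}}_x\,\omega}{\omega^{T}\bar{\tilde{C}}_y\,\omega}. \tag{17}$$

Noticing that (17) has the same form as (6), we thus have that the optimal spatial filter maximizing (17) subject to $\Omega^{T}(\bar{\tilde{C}}_x + \bar{\tilde{C}}_y)\Omega = I$ is given by

$$\Omega = \tilde{U}\,\tilde{\Gamma}^{-1/2}\tilde{B} \tag{18}$$

where $\tilde{U}$ is the matrix of eigenvectors of $\bar{\tilde{C}}_x + \bar{\tilde{C}}_y$ with $\tilde{\Gamma}$ being the diagonal matrix of associated eigenvalues, and $\tilde{B}$ is the eigenvectors matrix of $\tilde{\Gamma}^{-1/2}\tilde{U}^{T}\bar{\tilde{C}}_x\tilde{U}\,\tilde{\Gamma}^{-1/2}$.

3) Classification: Since $L$ (which is a generic symbol for $L^{x}$ and $L^{y}$) is a positive semidefinite matrix,¹ it can be decomposed as $L = L^{1/2}L^{1/2}$. With $L^{1/2}$ and the projection matrix $\Omega$, a trial $X$ is transformed as

$$Z = \Omega^{T}X L^{1/2}. \tag{19}$$

The variances of the first and last rows of $Z$ are used to construct a classifier.

¹ For any nonzero $N$-dimensional vector $a$, one has that $a^{T}La = a^{T}(D - W)a = \frac{1}{2}\sum_{i,j}(a_i - a_j)^{2}W_{ij} \ge 0$, where $a$ has the elements $a_1, \ldots, a_N$. Here, the symbols are generic forms omitting class and trial indicators.

4) Connection to CSP: It is worthwhile to point out that, if the temporal range $\tau$ covers the whole trial and all entries of the weighting matrices take 1, i.e., $W_{lm} = 1$ (corresponding to $\sigma = +\infty$) for $l, m = 1, \ldots, N$, then LTCSP turns out to be conventional CSP. In fact, the Laplacian matrix in this case is $L = NI - \mathbf{1}\mathbf{1}^{T}$, where $\mathbf{1}$ is the vector whose elements are all 1. The Laplacian matrix here just removes the sample mean from the samples and rescales the samples. This analysis shows that the weighting matrices play an important role in LTCSP. The conventional CSP is a special case of LTCSP in which all the data pairs are regarded as neighboring and equally important.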
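The reduction to CSP and the transform (19) can be verified on a toy trial. The sketch below is illustrative only: the helper names are hypothetical, the symmetric matrix square root is one possible choice of factor for $L$, and the identity filter used in the check merely stands in for a trained filter.

```python
import numpy as np

def laplacian(W):
    """Graph Laplacian L = D - W, with D the diagonal matrix of row sums of W."""
    return np.diag(W.sum(axis=1)) - W

def normalized_local_cov(X, W):
    """X L X^T / tr(X L X^T) for one K x N trial with adjacency matrix W."""
    C = X @ laplacian(W) @ X.T
    return C / np.trace(C)

def ltcsp_transform(Omega, X, W):
    """Z = Omega^T X L^{1/2}, using the symmetric square root of L as in (19)."""
    evals, evecs = np.linalg.eigh(laplacian(W))
    # Clip tiny negative eigenvalues caused by round-off before the square root.
    sqrt_L = (evecs * np.sqrt(np.clip(evals, 0.0, None))) @ evecs.T
    return Omega.T @ X @ sqrt_L

# Hypothetical zero-mean trial: 3 channels, 20 samples.
rng = np.random.default_rng(2)
X = rng.standard_normal((3, 20))
X -= X.mean(axis=1, keepdims=True)
W_all = np.ones((20, 20))  # every pair of samples is a neighbor with weight 1
```

With `W_all`, the normalized local covariance coincides with the CSP covariance $XX^{T}/\operatorname{tr}(XX^{T})$ because the factor $N$ introduced by the Laplacian cancels in the trace normalization. The row-wise sums of squares of `Z` equal the diagonal of $\Omega^{T}XLX^{T}\Omega$, which is the filtered local variance that (14) optimizes.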
III. EXPERIMENT

A. Materials

Two data sets are used in the experiment. The first data set is an artificial 2-D data set. The simulated data are generated from the "S" curves, as shown in Fig. 1. The data points depicted in Fig. 1(a) and (b) each represent one class of signals. We compare the filtered results of LTCSP and CSP on the two classes. The second data set is from “BCI competition 2003”—data set IV [2]. The recorded EEGs correspond to the tasks of upcoming left hand movement vs. upcoming right hand movement. This database consists of 316 training epochs and 100 testing epochs, all of 500 ms length and ending 130 ms before a key press. Twenty-eight EEG channels were recorded from a normal subject without a feedback session. The data downsampled to 100 Hz are used in the experiment. Also, we compare the robustness of CSP and LTCSP when data set 2 is contaminated by an influence function.

B. Neurophysiological Background of EEG in Finger Movement

The difference of the spatio-temporal pattern of EEG between left- and right-hand movement has been witnessed by the neurophysiological studies on movement-evoked potentials. The readiness potential (also called Bereitschaftspotential or BP) can be recorded over the vertex region with the maximum amplitude before voluntary finger movements [15], [22], and the synchronization and desynchronization (ERS/ERD) modulations of the μ- and β-rhythm over sensorimotor cortex have been revealed during both actual and imagined movement activity [21].

BP is one of the main components of movement-related potentials (MRPs), which are slowly decreasing potentials. The pronounced contralateral dominance is the prominent character of BP, where BP starts from a wide range of sensorimotor cortex and then tends to the contralateral hand cortex prior to the onset of movement. When a subject is not engaged in limb movement (actual or imagined), there exist so-called idle rhythms; that is, large populations of neurons in cortex are firing in rhythmical synchrony. But when the activity occurs, these idle rhythms are attenuated, which can be measured at the scalp in the EEG signals around 10 Hz (μ rhythm), as well as around 20 Hz (β rhythm). This physiological phenomenon resulting from loss of synchrony in the neural populations is named event-related desynchronization (ERD), and in contrast, the enhanced rhythmic activity (viewed as a rebound of ERD) is termed event-related synchronization (ERS). Like BP, ERD/ERS is commonly prominent with contralateral dominance.

The dissimilarities in the spatio-spectral topography of BP and ERD (since only the epoch before the actual keystroke is considered, ERS is not included) are employed to predict the upcoming task (left- versus right-hand movement). On account of the low frequency band of BP (say, below 2 Hz), a low-pass filter with cutoff frequency 33 Hz is applied to the EEG data to cover both BP and ERD features in our experiment. Note that frequency filtering is commonly used with CSP to enhance signal power and possibly to remove outliers [3], [16], [25].

TABLE III
COMPARISON OF CLASSIFICATION RATES (%) OF CSP AND LTCSP ON THE 100 TESTING EEG EPOCHS BY USING LDA AND NEAREST CLASS MEAN CLASSIFIERS. FOR LTCSP, THE RESULTS CORRESPONDING TO c = +∞ AS WELL AS THE OPTIMAL c (LISTED IN PARENTHESES) ARE PRESENTED

C. Results

1) For the First Data Set: In using the LTCSP algorithm, the two parameters are kept fixed in all trials (one of them is set to 10). The variances of the first class [corresponding to Fig. 1(a)] after filtering by CSP and LTCSP, respectively, are listed in Table I. The variances of the second class are one minus the corresponding variances of the first class. It can be seen that, compared with CSP, LTCSP significantly enlarges the difference of variances between the first and second classes, which implies that the variances obtained by LTCSP deliver more discriminative information. For CSP and LTCSP, the projections of the two classes onto the first row of the corresponding filter [by (5) or (19)] are shown in Fig. 2. Evidently, LTCSP can augment the discrepancy of the two classes in terms of filtered variances. The reason is that the variances after global projections of these two classes are similar, but the local projected variances differ greatly. For example, considering the local data points around A in Fig. 1(a) and around B in Fig. 1(b), the directions of maximal projected variances are nearly perpendicular. This clearly reveals that explicitly considering the manifold structure in maximizing the difference between variances is important and effective. LTCSP exploits exactly this advantage of manifold modelling, to which CSP is blind.

2) For the Second Data Set: We set $\sigma = c\,\sigma_0$ in using the LTCSP algorithm, where $c$ is a positive scale parameter and $\sigma_0$ is the standard deviation of the squared norms of the training samples. In the case $c = +\infty$ (i.e., $\sigma = +\infty$), we simply weight equally the samples that are in a rectangular time window. Table II displays the first and last three averaged variances of training data of the first class (corresponding to upcoming left hand) after filtering by CSP and LTCSP, respectively. Since the performance of LTCSP varies with respect to the value of $c$, we show the results in the case $c = +\infty$ and those averaged over $c$ taking 1 through 10. We see again that LTCSP enlarges the difference of variances between the two classes. Therefore, we expect that LTCSP has better discrimination ability than CSP. The classification results of the 100 testing EEG epochs by using CSP and LTCSP are shown in Table III. In the experiment, we adopt two popular classifiers: one is linear discriminant analysis (LDA) and the other is the nearest class mean classifier [26]. The recognition accuracies of the two methods (CSP and LTCSP) vary with the value of $r$, which determines the number of extracted features, as well as with the values of $\tau$ and $c$ applied in LTCSP. In Table III, we present the recognition rates of LTCSP in the case $c = +\infty$ and the maximal recognition rates corresponding to the optimal $c$. From Table III, it can be observed that 1) LTCSP significantly outperforms CSP in each experimental trial, 2) for LTCSP the optimal $c$ always yields a better result than $c = +\infty$ in each trial, and 3) the nearest class mean classifier is better than LDA in most cases. The best recognition rate of LTCSP can be up to 90%, which is better than that of the leading winner in the BCI competition (84% recognition rate) [25]. Note that in LTCSP, we do not perform elaborate preprocessing as in [25].

Fig. 3 shows the variances of the training EEG courses after projection onto the most important and second most important discriminative pairs of directions obtained by CSP and LTCSP, respectively. Specifically, Fig. 3(a) plots the variances when the training EEG signals are projected onto the most important pair of directions obtained by CSP. For each point in the figure, the horizontal coordinate is the variance of the first row of $Z_x$ (or $Z_y$) according to (5), and the vertical one is the variance of the last row. Fig. 3(c) uses the most important pair of directions obtained by LTCSP under the settings $\tau = 3$ and $c = +\infty$. The variances are computed according to the first and last rows of (19). Fig. 3(e) uses the most important pair of directions obtained by LTCSP under the settings $\tau = 3$ and $c = 7$. Fig. 3(b), (d), and (f) have the same settings as Fig. 3(a), (c), and (e), respectively, except that they use the second most important pair of directions. Compared with Fig. 3(a), the point clouds in Fig. 3(c) and (e) are narrower and longer, which basically means that the difference between the largest and smallest variances produced by LTCSP is greater than that of CSP. Also, Fig. 3(f) is slightly better than Fig. 3(b) and (d). So, from Fig. 3, we see that LTCSP yields more meaningful clusters than CSP in terms of classification, and the most important pair of projection directions contains much more discriminative information than the second most important pair.

Fig. 3. Variances of the training EEG signals after projection onto the most important and second most important discriminative pairs of directions obtained by CSP and LTCSP: (a) most important pair by CSP, (b) second most important pair by CSP, (c) most important pair by LTCSP (τ = 3; c = +∞), (d) second most important pair by LTCSP (τ = 3; c = +∞), (e) most important pair by LTCSP (τ = 3; c = 7), and (f) second most important pair by LTCSP (τ = 3; c = 7).

In the experiment of LTCSP, the parameter $c$ (in setting $\sigma$) is experimentally chosen to obtain the maximal recognition accuracy. We found that determining $c$ in this way is effective. Good recognition accuracy is usually achieved at some point within the tested range of $c$. Fig. 4 illustrates the recognition rates versus the variation of $c$ in two cases ($r = 1$, $\tau = 3$ and $r = 2$, $\tau = 3$) using LDA and nearest class mean classifiers, respectively. The recognition rate when taking $c = +\infty$ is also plotted in the figure.

Below, we use only the nearest class mean classifier to compare the performances of CSP and LTCSP in terms of robustness to outliers. First, we investigate the robustness of CSP and LTCSP to the impact of an influence function. Specifically, we add a sufficiently large constant on the first channel (i.e., F3) at the first recording time of the first training trial (which belongs to the first class, corresponding to upcoming left hand). In this situation, the classification accuracies of CSP and LTCSP are listed in Table IV. From Table IV, we see that the performance of CSP deteriorates greatly. For LTCSP with $c = +\infty$, the recognition rates also drop dramatically. By contrast, LTCSP with the optimal $c$ is only slightly affected by the influence function. So, even one trial spoiled by an influence function may make CSP useless. LTCSP with equal weights in the time window is not powerful enough to resist the impact, while LTCSP with the optimal $c$ effectively circumvents the disaster. Note that, if we use the opposite sign of the perturbation, we arrive at the same results, since the influence value is so large that it dominates the computation of the covariance matrices involved. Second, we add constants of value 100 at randomly chosen channels, recording times, and epochs, varying the number of contaminated data points, and then compare the maximal recognition rates (with optimal parameters) of CSP, LTCSP with fixed $c = +\infty$, and LTCSP. For each number of contaminated data points, the experiment is repeated 50 times. The average maximal recognition rates, as well as the associated standard deviations, are shown in Fig. 5. It can be seen that LTCSP is much more robust than CSP.

TABLE IV
COMPARISON OF CLASSIFICATION RATES (%) OF CSP AND LTCSP ON THE 100 TESTING EEG EPOCHS WHEN THE FIRST TRAINING EPOCH IS AFFECTED BY AN INFLUENCE FUNCTION. ALSO, THE OPTIMAL c IN LTCSP ARE LISTED IN PARENTHESES

Fig. 4. Recognition rates versus the variation of parameter c in two cases ((r = 1; τ = 3) and (r = 2; τ = 3)) using the LDA and nearest class mean classifier, respectively.

Fig. 5. Average maximal recognition rates, as well as the associated standard deviations, of CSP, LTCSP (c = +∞), and LTCSP (optimal) versus the number of data points contaminated.

IV. CONCLUSION

We have proposed a new linear subspace learning method, called LTCSP, to identify the underlying manifold variance of EEG signals. LTCSP considers the temporally local information in the variance modelling. Compared with CSP, LTCSP can discover a more accurate variance structure of EEG data and thus has better discrimination ability. LTCSP is formulated as an eigenvalue problem, and is computationally as straightforward as CSP. LTCSP actually gives a general extension to CSP in the sense that CSP can be regarded as a special case of LTCSP. Importantly, LTCSP is much more robust than CSP.

LTCSP is formally linear. One possible extension of our work is to perform LTCSP in the reproducing kernel Hilbert space induced by a nonlinear mapping. In implementation, one can resort to the kernel trick. The performance of kernel-based LTCSP needs to be further investigated. Another question is how to choose the parameters $\sigma$ and $\tau$, which are important for recognition, but may depend on the data set at hand and are not easy to determine analytically. We are currently studying these problems.

ACKNOWLEDGMENT

The authors would like to thank the anonymous referees for their helpful recommendations, which improved the paper greatly.

REFERENCES

[1] A. Bashashati, M. Fatourechi, R. K. Ward, and G. E. Birch, “A survey of signal processing algorithms in brain-computer interfaces based on electrical brain signals,” J. Neural Eng., vol. 4, pp. R32–R57, 2007.
[2] B. Blankertz, G. Curio, and K.-R. Müller, “Classifying single trial EEG: Toward brain computer interfacing,” in Advances in Neural Information Processing Systems, T. G. Dietterich, S. Becker, and Z. Ghahramani, Eds. Boston, MA: MIT Press, 2002, vol. 14, pp. 157–164.

[3] M. Congedo, F. Lotte, and A. Lécuyer, “Classification of movement intention by spatially filtered electromagnetic inverse solutions,” Phys. Med. Biol., vol. 51, pp. 1971–1989, 2006.
[4] G. Dornhege, B. Blankertz, G. Curio, and K.-R. Müller, “Boosting bit rates in noninvasive EEG single-trial classifications by feature combination and multi-class paradigms,” IEEE Trans. Biomed. Eng., vol. 51, no. 6, pp. 993–1002, Jun. 2004.
[5] K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd ed. Boston, MA: Academic, 1990.
[6] D. Garrett, D. A. Peterson, C. W. Anderson, and M. H. Thaut, “Comparison of linear, nonlinear, and feature selection methods for EEG signal classification,” IEEE Trans. Neural Syst. Rehabil. Eng., vol. 11, no. 2, pp. 141–144, Jun. 2003.
[7] N. Hazarika, A. C. Tsoi, and A. A. Sergejew, “Nonlinear considerations in EEG signal classification,” IEEE Trans. Signal Process., vol. 45, no. 4, pp. 829–936, Apr. 1997.
[8] X. He, S. Yan, Y. Hu, P. Niyogi, and H.-J. Zhang, “Face recognition using laplacianfaces,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 3, pp. 328–340, Mar. 2005.
[9] Z. J. Koles, “The quantitative extraction and topographic mapping of the abnormal components in the clinical EEG,” Electroenc. Clin. Neurophys., vol. 79, pp. 440–447, 1991.
[10] Z. J. Koles, M. S. Lazar, and S. Z. Zhou, “Spatial patterns underlying population differences in the background EEG,” Brain Topogr., vol. 2, pp. 275–284, 1990.
[11] Z. J. Koles, J. C. Lind, and P. Flor-Henry, “Spatial patterns in the background EEG underlying mental disease in man,” Electroenc. Clin. Neurophys., vol. 91, pp. 319–328, 1994.
[12] Z. J. Koles, J. C. Lind, and A. C. K. Soong, “Spatio-temporal decomposition of the EEG: A general approach to the isolation and localization of sources,” Electroencephalogr. Clin. Neurophysiol., vol. 95, pp. 219–230, 1995.
[13] Z. J. Koles and A. C. K. Soong, “EEG source localization: Implementing the spatio-temporal decomposition approach,” Electroencephalogr. Clin. Neurophysiol., vol. 107, pp. 343–352, 1998.
[14] Y. Koren and L. Carmel, “Robust linear dimensionality reduction,” IEEE Trans. Visual. Comput. Graph., vol. 10, no. 4, pp. 459–470, Jul./Aug. 2004.
[15] M. Kukleta and M. Lamarche, “Steep early negative slopes can be demonstrated in pre-movement bereitschaftspotential,” Clin. Neurophysiol., vol. 112, pp. 1642–1649, 2001.
[16] S. Lemm, B. Blankertz, G. Curio, and K.-R. Müller, “Spatio-spectral filters for improved classification of single trial EEG,” IEEE Trans. Biomed. Eng., vol. 52, no. 9, pp. 1541–1548, Sep. 2005.
[17] Y. Li, X. Gao, and S. Gao, “Classification of single-trial electroencephalogram during finger movement,” IEEE Trans. Biomed. Eng., vol. 51, no. 6, pp. 1019–1025, Jun. 2004.
[18] D. J. McFarland, C. W. Anderson, K.-R. Müller, A. Schlögl, and D. J. Krusienski, “BCI meeting 2005-workshop on BCI signal processing: Feature extraction and translation,” IEEE Trans. Neural Syst. Rehabil. Eng., vol. 14, no. 2, pp. 135–138, Jun. 2006.
[19] K.-R. Müller, C. W. Anderson, and G. E. Birch, “Linear and nonlinear methods for brain–computer interfaces,” IEEE Trans. Neural Syst. Rehabil. Eng., vol. 11, no. 2, pp. 165–169, Jun. 2003.
[20] J. Müller-Gerking, G. Pfurtscheller, and H. Flyvbjerg, “Designing optimal spatial filters for single-trial EEG classification in a movement task,” Clin. Neurophysiol., vol. 110, pp. 787–798, 1999.
[21] G. Pfurtscheller and F. H. L. da Silva, “Event-related EEG/MEG synchronization and desynchronization: Basic principles,” Clin. Neurophysiol., vol. 110, no. 11, pp. 1842–1857, Nov. 1999.
[22] J. A. Pineda, B. Z. Allison, and A. Vankov, “The effects of self-movement, observation, and imagination on mu rhythms and readiness potentials (rp’s): Toward a brain-computer interface (BCI),” IEEE Trans. Rehabil. Eng., vol. 8, no. 2, pp. 219–222, Jun. 2000.
[23] H. Ramoser, J. Müller-Gerking, and G. Pfurtscheller, “Optimal spatial filtering of single trial EEG during imagined hand movement,” IEEE Trans. Rehabil. Eng., vol. 8, no. 4, pp. 441–446, Dec. 2000.
[24] S. Sun and C. Zhang, “An optimal kernel feature extractor and its application to EEG signal classification,” Neurocomput., vol. 69, pp. 1743–1748, 2006.
[25] Y. Wang, Z. Zhang, Y. Li, X. Gao, S. Gao, and F. Yang, “BCI competition 2003-data set iv: An algorithm based on CSSD and FDA for classifying single-trial EEG,” IEEE Trans. Biomed. Eng., vol. 51, no. 6, pp. 1081–1086, Jun. 2004.
[26] A. R. Webb, Statistical Pattern Recognition. New York: Oxford Univ. Press, 1999.
[27] J. R. Wolpaw, N. Birbaumer, D. J. McFarland, G. Pfurtscheller, and T. M. Vaughan, “Brain-computer interfaces for communication and control,” Clin. Neurophysiol., vol. 113, pp. 767–791, 2002.
[28] S. Yan, D. Xu, B. Zhang, H. Zhang, Q. Yang, and S. Lin, “Graph embedding and extensions: A general framework for dimensionality reduction,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 1, pp. 40–51, Jan. 2007.

Haixian Wang received the B.S. and M.S. degrees in statistics, and the Ph.D. degree in computer science from Anhui University, China, in 1999, 2002, and 2005, respectively.
During 2002–2005, he was with the Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education of China. He is currently with the Key Laboratory of Child Development and Learning Science of Ministry of Education, and the Research Center for Learning Science at Southeast University, China. His research interests focus on EEG signal analysis, statistical pattern recognition, and machine learning.

Wenming Zheng received the B.S. degree in computer science from Fuzhou University, Fuzhou, China, in 1997, the M.S. degree from Huaqiao University, Quanzhou, China, in 2001, and the Ph.D. degree from Southeast University, Nanjing, Jiangsu, in 2004.
Since 2004, he has been with the Research Center for Learning Science (RCLS) of Southeast University, China. His research interests include neural computation, pattern recognition, machine learning, and computer vision.
