Escolar Documentos
Profissional Documentos
Cultura Documentos
https://dx.doi.org/10.22161/ijaers/3.12.34
INTRODUCTION
www.ijaers.com
METHODOLOGY
www.ijaers.com
(4)
Fig.1: Original Spectral Subtraction denoising block diagram. The same data is used to estimate the noise power spectrum
which is then removed from the overall power spectrum
In spite of the success of this method in denoising event
related functional magnetic resonance imaging time
courses, two problematic issues are present in our
application to EEG recordings. The first is the use of the
phase component of the original signal in the denoised
signal. Given that the information in the phase part is as
important as that in the magnitude part, leaving this
component intact will no doubt limit the efficiency of the
process in removing the noise components in the final
signal. This issue was also a concern in the original
application of this technique in fMRI and it was found
that the improvement is still robust and therefore this
issue is not as critical. The second issue is related to the
observed jumps between the initial and final time points
in the EEG epochs due to such effects as baseline drift
that was found to be present and in many cases severe in
the data sets we used in this work and elsewhere. This is
an important difference between our case and the
application of this method to fMRI signals where baseline
Fig.2: Illustration of border discontinuity artifact in the old spectral subtraction method and its solution in the new
method. The top plot shows the original signal, the middle plot shows the signal processed with the old spectral
subtraction method with a clear artifact at the border in the beginning of the signal. This problem is absent in the new
spectral subtraction method at the bottom.
www.ijaers.com
Page | 179
Fig.3: Block diagram of Modified Spectral Subtraction Denoising where the data are concatenated with its mirror image
to generate a symmetric signal before applying the regular steps of the spectral subtraction method. This allows the phase
of the signal to be zero and avoids artifacts from mismatch of signal levels at the borders.
The detailed steps of implementation of the new method
are given as follows:
Step 1: Read in the raw epoch data s(t) and convert it to a
symmetric signal by concatenation with its reflected
version s(-t).
Step 2: Compute the fast Fourier transform of the
symmetric raw epoch data. Estimate and keep the linear
phase of the result.
Step 3: Compute the periodogram-based estimate of the
power spectrum as the squared magnitude of the fast
Fourier transform of the raw epoch data.
Step 4: Estimate the noise level by computing the average
of the power spectrum values in the upper 20% of the
frequency range that contains no signal components.
Step 5: Use Equation (3) to compute the power spectrum
of the denoised signal. If the subtraction result at any
frequency is negative, it is clipped to zero.
Step 6: Compute the denoised signal discrete Fourier
transform as the square root of the denoised signal power
spectrum and transform it back to the time-domain
www.ijaers.com
Page | 180
Fig.4: Block diagram of proposed new pre-processing chain with an added new denoising block before the usual steps
conventionally applied to BCI signals.
This was done as follows. Since the original channel data
are recorded using a much higher sampling rate than
needed for the known frequency content of EEG signals
and what is conventionally used for activation detection,
the power spectrum of the original signal can be assumed
www.ijaers.com
Page | 181
Fig.5: Illustration of noise power spectrum estimation from the upper part of the signal power spectrum on both ends
known to have no true signal components based on the average of such areas.
The average is used because the power spectrum itself at
each point can be shown to be a random variable that is
unbiased (that is, mean is equal to true value) and
consistent (variance decreases uniformly to zero as
number of points goes to infinity). Given that the
magnitudes of these points are independent and
identically distributed, their average can be used to
improve the estimation of the common mean of their
processes. This estimation process is done for each
channel independently and used to denoise its respective
channel to account for different analog front-ends for each
channel.
2.2 Unsupervised Processing Methods
The basic principle of the new class of unsupervised
techniques is that the trial with P300 signal within each
block has to be different from the rest of the trials within
that block. In other words, if we have an N-trial block,
one signal has a different form from all the other (N-1)
signals. This means that if we could find a measure that
Fig.6: Block diagram of the new unsupervised
can be sensitive to this dissimilarity (or alternatively, the
approach.
similarity of signals without activation), then we can
indeed make a decision based on a single block without
any prior training. In the following, we present a number
Outlier Detection Method
of such measures and discuss how they will be used to
In this method, each trial is formulated as a vector in Mseparate the activated trial from the rest of trials in the
dimensional space where M is the number of points
same block. The assumption in all these methods remains
within each trial. The assumption underlying this method
that there is only one activated trial within each block.
is that the vectors of all trials that exhibit no P300
The input to each of these methods is a set of trials
activation will be similar and that they are all different
{x1 , x2 , , xN } given as a collection of M1 vectors. The
from the one that has a P300 activation. Hence, a distance
general block diagram for all methods is presented in
measure is used to compute the distance between each
Figure 6.
trial and all other trials in a pairwise manner. Then, for
each trial, the sum of all distances with other trials is used
to differentiate the one trial with the largest distance from
all other trials. In mathematical form, the trial with P300
signal is computed as the solution to the optimization
problem given by,
www.ijaers.com
Page | 182
(5)
min xi xj .
i
all ji
This means that this method detects the trial that is the
furthest from all other trials. The norm in this equation
was used as the 2-norm (Strang & Press, 2009).
Possibility of using other norm definitions is possible but
this one was selected to make its concept more visible by
appealing to the common Euclidean distance as the
measure used in this problem.
Correlation Method
The detection of the P300 signal relies on its
characteristic shape and onset that are unique and help
distinguish this type of activation from any other type. It
is also very common for the literature working with P300
signals to show the P300 signal form their data by simple
averaging of a number of trials with known activation
presence. Here, we borrow an activation detection method
from functional magnetic resonance imaging given the
similarity between this technique and that of P300 based
BCI. In particular, if the activation signal shape is
somewhat known, it is possible to detect its presence by
simple correlation of a template activation and each
trial signal. If we have N trials within a block, then the
trial with the strongest correlation with the template
activation should most likely be the one with P300 signal.
Since the onset of the true P300 signal varies between 300
and 500 ms, the template activation is used with different
amounts of time shift to detect such correlation to make
sure that such variability is taken into consideration. In a
mathematical form, the activated trial is found by solving
the following optimization over all trials in the block:
max {xiT st }.
all i,t
(6)
all ji
Cross-Correlation Method
This method bears similarities to both the dot product
method and the correlation method. In particular, rather
than computing the dot product between two trials, it
computes the cross-correlation between them and obtains
the peaks of this cross-correlation function as the measure
of similarity for this method. So, it is a generalization of
the concept in the dot product method and also a variant
of the correlation method whereby the template signal is
just a different trial in the same block. In a mathematical
form, we select the trial that satisfies the following
optimization problem:
max { max xiT xj,t }.
i
ji
(8)
1jN,ji
= UV T ,
(9)
(10)
That is, we select the trial that has all the remaining trials
forming a matrix with the largest 1-norm for its singular
values (similar to the strategy used with compressive
sensing (Cand & Wakin, 2008).
2.3 A Genetic Interval Type-2 Fuzzy Logic Based
Approach for Generating Interpretable Linguistic
Models for the Brain P300 Phenomena
Several machine learning based classification systems
have been used in BCI where linear and non-linear
classification methods have been employed including
Linear Discriminant Analysis (LDA) and Support Vector
Machine (SVM) (Lotte, Congedo, Lcuyer, Lamarche, &
Arnaldi, 2007), (Mller, Anderson, & Birch, 2003),
(Croux, Filzmoser, & Joossens, 2008). Popular LDA
include Fishers Linear Discriminant Analysis (FLDA)
and the regularized versions termed Regularized Fisher
Linear Discriminant (RFLD) as well as Bayesian Linear
Discriminant Analysis (BLDA). The regularized version
of FLDA may give better results for BCI than the nonregularized version Lotte, 2007 #35}, (Mller et al.,
2003), (Croux et al., 2008). However, these techniques
are black box models which are difficult to understand
and analyze by a normal clinician. In addition, most of
these classifiers need to be bespoke trained for the person
using them and under given circumstances. However, if
there is a change in the user or the given circumstances
then the classifier need to be retrained.
Fuzzy Logic Systems (FLSs) have been credited with
providing white box transparent models which can handle
the uncertainty and imprecision. FLSs have been used in
(Lotte, Lcuyer, & Lamarche, 2007) for motor imagery
classification in BCIs which produced similar results to
the most popular classifiers used in BCIs while providing
an easy to read and interpret model. The work on (Lotte,
Lcuyer, & Arnaldi, 2009) employed FuRIA which is a
fuzzy logic based trainable feature extraction algorithm
for BCIs which is based on inverse solutions. This
algorithm can be trained to automatically identify relevant
regions of interest and their associated frequency bands
for the discrimination of mental states. The main
drawback of FuRIA is its long training process (Lotte et
al., 2009). Palaniappan et al. (Palaniappan, Paramesran,
Nishida, & Saiwaki, 2002), presented a new BCI design
using fuzzy ARTMAP neural network whose objective
was to classify the best three of the five available mental
tasks for each subject using power spectral density of
EEG signals. The suggested fuzzy based system has been
used successfully with a tri-state switching device.
However, all these FLSs were based on type-1 fuzzy logic
systems which cannot fully handle or accommodate for
www.ijaers.com
Fig.7: The features extracted for a P300 BCI application with 4 electrodes sensors.
Figure 7 shows that only 38 features were selected as they
were found to be the least number of features to give good
prediction accuracy. Figure 7 enables us to have a
transparent interpretation of what happens in the brain
www.ijaers.com
(a)
(b)
(c)
Fig.8: The General Hierarchical Fuzzy Logic Structure.(b) Incremental hierarchical Structure. (c) Aggregated
hierarchical Structure .
To illustrate how the interval type-2 hierarchical fuzzy
logic classifier reduce the number of rules compared to
other fuzzy logic classifiers, we will give the following
example: Lets assume that the classifier extracted the
maximum number of possible rules from the data where
each rule is represented by one antecedent and one
consequent and all the inputs are numerical and are
represented by the same number of fuzzy sets. In order to
calculate the maximum number of possible rules, we will
use the following equation:
R=N S
(13)
Where:
represents the number of rules
www.ijaers.com
(14)
Fig.9: The proposed Interval Type-2 Hierarchical Fuzzy Logic Classifier structure
Genetic Learning of the Rules of the Type-2 Based Classifier
Fig.10: An overview of the proposed genetic interval type-2 hierarchical fuzzy logic based classifier.
Figure 10 shows an overview on the proposed genetic
type-2 hierarchical fuzzy logic based classifier. The
proposed classifier involve the following stages:
Building equally spaced type-2 membership
functions for each input using the inputs
minimum and maximum values.
Generating the fuzzy rules using the training data
Setting the Genetic Algorithm (GA) initial
population
Optimize the fuzzy rules using GA and evaluate
it using the training data. Evaluate the best GA
solution using validation data and store the best
solution. Classify the testing data using the best
validated solution.
www.ijaers.com
Page | 187
start nf =
stepn
2
+ (stepn (f 1))
(16)
(18)
n
To calculate sfu
and sfln we use the following equations:
n
sfu
= start nf dfou + min(xn )
n
sfl = start nf + dfou + min(xn )
(19)
(20)
2<fF
(21)
en(f1)u
2<
(22)
=sfln
Page | 188
start n =
dfou =
2
max(xn )min(xn )
4
(23)
(24)
(24)
IF xn is n THEN y is O
(25)
www.ijaers.com
Page | 189
(26)
A~ ( xs( t ) )
~q ( xs( t ) )
q
s
and As
for each fuzzy set q=1,..Vi, and for
each input variable s=1,,N. Find q*{1,Vi} such that
:
cg
q*
As
Where
(27)
A~
cg
q
s
( xs( t ) )
1
~ ( x s( t ) ) A~ ( x s( t ) )
2 A
A~ cg ( x s( t ) )
q
s
q
s
q
s
(28)
fi
(t )
could be
fi (t ) A~q ( xs (t ))
s 1
cg
(29)
EXPERIMENTAL RESULTS
www.ijaers.com
Page | 191
Fig.14:Block accuracy results for sample cases using all methods and electrode configurations.
www.ijaers.com
Page | 192
Fig.15: Relative block accuracy results for sample cases using all methods and electrode configurations
The block accuracy results of using the new approach on 4 sample subjects are shown in Figure 14. The figure presents the
results using (a) outlier detection method, (b) correlation method, (c) dot product method, (d) cross correlation method, (e)
singular value decomposition method, and (f) supervised classification using BLDA for direct comparison, each on a
separate row.
www.ijaers.com
Page | 193
Fig.16: Average block accuracy and relative block accuracy results over all cases using all methods and electrode
configurations
The results also show the cases of using 4, 8, 16 and 32
interpretation. In Figure 16, the average block accuracies
channel data on the same graph for each case/method. The
and relative block accuracies for all subjects are presented
relative block accuracy results are shown in Figure 15
for each method to see an overall picture of the
with the same order of the methods for better
performance.
Table.1: Performance of different methods in terms of their low and high block classification accuracies computed over all
subjects as compared to the results from supervised comparison method that requires 3-session training at the bottom row.
4-Channels
8-Channels
16-Channels
32-Channels
Block Accuracy
Limits
Low
High
Low
High
Low
High
Low
High
14.6% 61.5% 16.7% 62.5% 14.6% 58.3% 15.6% 67.7%
Outlier Detection
17.7% 57.3% 20.8% 57.3% 17.7% 62.5% 16.7% 69.8%
Correlation
17.7% 54.2% 20.8% 58.3% 22.9% 61.5% 19.8% 69.8%
Dot Product
17.7% 66.7% 21.9% 65.6% 17.7% 58.3% 16.7% 72.9%
Cross-Correlation
18.8% 69.8% 26.0% 79.2% 18.8% 80.2% 17.7% 84.3%
SVD
Comparison Method 38.5% 98.9% 44.8% 100% 43.8% 100% 50.0% 100%
In Table 1, the low and high limits for the block accuracies for each method are given whereas those for relative block
accuracies are given in Table 2. In the following, the analysis of the results according to different parameters is presented.
www.ijaers.com
Page | 194
Table.2: Performance of different methods in terms of their relative low and high block classification accuracies computed
over all subjects with reference to the results from supervised BLDA comparison method that requires 3-session training.
4 Channels
8 Channels
16 Channels
32 Channels
Block Accuracy
Limits
Low
High
Low
High
Low
High
Low
High
32.9%
62.0%
32.7%
62.5%
31.3%
58.3%
31.9%
67.7%
Outlier Detection
33.1% 61.8% 28.9% 57.3% 28.2% 62.5% 36.5% 69.8%
Correlation
37.1% 55.3% 36.3% 58.3% 33.2% 61.5% 37.3% 69.8%
Dot Product
Cross-Correlation 41.5% 67.8% 37.3% 65.6% 31.2% 58.3% 30.9% 72.9%
48.1% 71.3% 50.9% 79.2% 47.8% 80.2% 38.5% 84.4%
SVD
Fig.17: The KAU BCI lab setup and one of the subjects involved in the real world experiments.
Experimental Setup for Real-World Experiments
www.ijaers.com
Page | 195
For the real world experiments, the users were facing a screen on which 6 Arabic characters were displayed. The characters
were arranged in a 3 by 2 matrix, as shown in Figure 18.
www.ijaers.com
Page | 196
59.38
56.12
57.75
48.04
68.45
32
50.37
62.47
56.42
33.28
78.74
56.013
58.91
58.94
58.93
for each input) of which 19 rules have an output class +1
and 19 rules having output class -1 as shown in Table 3,
the generated rules are very simple with only one
antecedent and there is a small number of rules. Hence,
when compared to the BLDA and RFLDA the proposed
type-2 fuzzy logic based classifier gave a better average
prediction accuracy over the testing data of the Hoffmann
data set. On the other hand, the proposed type-2 fuzzy
logic classifier has resulted in an easy to understand and
analyses linguistic model which explains the P300
phenomena over various subjects. The generated model
could be easily understood and analysed by a normal
clinician. Hence, the proposed type-2 fuzzy logic
classifier can provide more understanding about the
underlying processes and phenomena happening on the
brain in a simplified way where for example rule 1 is
58.24
54.31
63.42
58.87
Page | 198
DISCUSSION
It can be observed that the block accuracy results for 4channel data (plotted in red) show a significant
improvement from the original data in both spectral
subtraction and wavelet shrinkage methods with low
number of blocks. This is also reflected as higher bitrates
in the same range. Even though the effect of denoising in
general is more apparent in experiments with lower
number of channels and low number of blocks, there is
still evident improvement in experiments with high
number of channels where 100% accuracy is reached
earlier as evident in all cases. This is important to indicate
that the inherent spatial compounding from the many
electrodes can still take advantage of temporal denoising
methods and that a combination of the two yields the best
results.
By inspecting the results further, we observe that the
spectral subtraction method offers better results than
wavelet shrinkage based denoising in most experiments
with the exception of a few cases such as in the 4-channel
data of Subject 2 where the 100% accuracy is maintained
once reached in wavelet denoising while it does not with
spectral subtraction. Nevertheless, in all other cases the
spectral subtraction results are superior as evident in the
achieved block accuracy and bit rate for any given
experiment. As a general observation, the results of
spectral subtraction and wavelet denoising methods show
www.ijaers.com
CONCLUSION
I. A., Zayed,
variation and
for medical
Imaging and
www.ijaers.com
M., Hargas,
subtraction
P300-based
engineering
Technique.
www.ijaers.com
www.ijaers.com
Page | 203