

A Multi-view Deep Learning Method for Epileptic Seizure Detection using Short-time Fourier Transform

Ye Yuan, College of Information and Communication Engineering, Beijing University of Technology, Beijing 100124, China, yuanye91@emails.bjut.edu.cn
Guangxu Xun, Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo 14260, USA, guangxux@buffalo.edu
Kebin Jia, College of Information and Communication Engineering, Beijing University of Technology, Beijing 100124, China, kebinj@bjut.edu.cn
Aidong Zhang, Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo 14260, USA, azhang@buffalo.edu

ABSTRACT

With the advances in pervasive sensor technologies, physiological signals can be captured continuously to prevent the serious outcomes caused by epilepsy. Detection of epileptic seizure onset on collected multi-channel electroencephalogram (EEG) has attracted much attention recently. Deep learning is a promising method to analyze large-scale unlabeled data. In this paper, we propose a multi-view deep learning model to capture brain abnormality from multi-channel epileptic EEG signals for seizure detection. Specifically, we first generate EEG spectrograms using the short-time Fourier transform (STFT) to represent the time-frequency information after signal segmentation. Second, we adopt stacked sparse denoising autoencoders (SSDA) to unsupervisedly learn multiple features by considering both intra and inter correlation of EEG channels, denoted as intra-channel and cross-channel features, respectively. Third, we add an SSDA-based channel selection procedure using a proposed response rate to reduce the dimension of the intra-channel features. Finally, we concatenate the learned multi-features and apply a fully-connected SSDA model with a softmax classifier to jointly learn the cross-patient seizure detector in a supervised fashion. To evaluate the performance of the proposed model, we carry out experiments on a real-world benchmark EEG dataset and compare it with six baselines. Extensive experimental results demonstrate that the proposed learning model is able to extract latent features with meaningful interpretation, and hence is effective in detecting epileptic seizure.

CCS CONCEPTS

• Applied computing → Bioinformatics; • Computing methodologies → Neural networks; • Information systems → Data mining;

KEYWORDS

Deep learning; Epileptic seizure; Electroencephalogram; Feature extraction; Time-frequency analysis

ACM-BCB'17, August 20–23, 2017, Boston, MA, USA. © 2017 ACM. ISBN 978-1-4503-4722-8/17/08. DOI: http://dx.doi.org/10.1145/3107411.3107419

1 INTRODUCTION

With the development of pervasive sensor technologies, tremendous opportunities are introduced to closely analyze patients' conditions. The advanced sensors, including implantable, wearable, and ambient sensors [20], allow continuous monitoring to prevent the serious outcomes caused by several medical diseases. The collected physiological data can be further used to recognize hidden patterns in biomedical applications to improve treatment [51]. Epilepsy, a serious chronic neurological disorder, affects approximately 50 million people worldwide [34]. Although anti-epileptic drugs do have some treatment effect, the disease may still result in serious physical injuries and even death [32, 45]. The major symptom of epilepsy is associated with recurrent and unpredictable epileptic seizures caused by sudden abnormal neuronal discharges in the brain [10]. According to previous studies, electroencephalogram (EEG) is one of the most effective methods to observe brain electrical activities by placing electrodes outside the skull (extra-cranial recording) or inside the skull (intra-cranial recording) [15]. Furthermore, with distributed continuous sensing, scalp EEG signals can be collected from a large number of input channels with high temporal resolution. This massive data can record synchronous activities of neurons in different brain areas. Therefore, epileptic seizure detection using multi-channel scalp epileptic EEG signals has gained great attention in neuro-informatics.

In clinical settings, visual inspection for epilepsy diagnosis requires scarce, highly trained professionals in neurology. In addition, it is extremely laborious and time-consuming for physicians to identify epileptic patterns from long-term EEG readings [33]. These limitations have motivated researchers to develop automated seizure detection approaches via machine learning methods. Based on the categories of epileptic EEG signals [12], seizure detection can be regarded as a binary classification task for non-ictal and ictal states.

In general, the machine learning based seizure detection process consists of three steps, namely, preprocessing, feature extraction

and classification [39]. Among these steps, feature extraction plays an important role since it aims to capture the meaningful and distinctive characteristics of EEG signals, which directly influences the final classification performance. Deep learning is a branch of machine learning that attempts to adopt a multi-layer neural network to automatically extract latent features from massive data. The initial concept of deep learning is an unsupervised greedy layer-wise training algorithm based on a deep network [16]. By constructing a better representation with a deep structure to learn a model, deep learning obtains high-level features to enhance learning performance. There are mainly two steps in training a deep network: pre-training and fine-tuning. A greedy layer-wise unsupervised training method is adopted for pre-training using unlabeled data. The weights connecting each layer are trained individually while fixing the others. This step is significantly different from the traditional neural network training strategy. To improve performance, fine-tuning is then adopted in a supervised learning fashion by tuning the weights of all layers simultaneously using labeled data. Thus, the deep architecture transforms the features into a relatively low-dimensional space, which enhances the ability of feature expression. Research has proved that the use of deep learning can achieve improved performance in a wide range of applications when compared to traditional hand-engineering approaches [3]. For this reason, numerous methods have been proposed to employ deep learning for seizure detection [23, 31, 44]. These unsupervised learning features are explored from unlabeled raw medical data [27, 28, 42], and have shown great promise in mining hidden seizure characteristics.

Although promising results have been reported using such deep learning methods, there still exist several challenges that should be addressed. First, most traditional deep learning models independently feed each channel into the seizure classifier and ignore the correlations between them, which leads to the failure to recognize the general signal patterns. Second, most of the channels in multi-channel EEG signals are irrelevant to brain related activities, including seizure onset [18, 25]. These irrelevant channels introduce lots of noise to the data and could significantly reduce the performance of the learning methods. Third, conventional deep learning features with a simple design do not always yield good performance, especially when confronted with unbalanced datasets or rare events, which is a common issue in health informatics [1]. It is necessary to integrate knowledge from handcrafted engineering, such as signal processing, during the deep learning process to enhance the performance. Finally, the patterns of seizures in EEG signals may vary significantly across patients and even over time for the same patient, which makes it difficult to develop a cross-patient detector.

To tackle the aforementioned challenges, we propose a multi-view deep learning model to simultaneously extract the multiple latent features of multi-channel EEG signals from both intra and inter patient groups. Our model is described in Fig. 1. As the seizure state is related to an interval rather than a certain point of the EEG signals, in this paper, we first slice the EEG signals of different patients into fixed-length pieces with a sliding window, and use the short-time Fourier transform (STFT) as the preprocessing step to further capture the time-frequency information, denoted as EEG spectrogram representations. We then adopt stacked sparse denoising autoencoders (SSDA) to unsupervisedly learn multiple features by considering both intra and inter correlation of EEG channels, denoted as intra-channel and cross-channel features, respectively. Since channel selection based techniques have been proven to be effective in bioinformatics [18, 25], we further extend this strategy and propose an SSDA-based channel selection procedure by integrating both zero-based and sample-based unsupervised information to derive a better understanding of the characteristics of seizure patterns. Finally, we merge the learned multi-features together and apply the fully-connected SSDA model with a softmax classifier in a supervised fashion to jointly learn the cross-patient seizure detector. Our contribution can be summarized as:

• We develop an unsupervised feature learning method using the EEG spectrogram representation model as preprocessing. The generated 2-D time-frequency-based fragments are suitable for latent feature extraction using deep learning. This method also combines features from both signal handcrafted engineering and deep learning approaches.
• We develop an EEG channel selection procedure which uses an SSDA-based stimulus-response rate considering multiple unsupervised information to jointly determine the critical channel features. This results in a marked mitigation of the noise effect and a great reduction of data dimensionality.
• We propose a multi-view deep learning model that integrates both intra and inter correlation of EEG channels, i.e., cross-channel and intra-channel features, to explore hidden inherent seizure characteristics with meaningful interpretation in multi-channel EEG signals. The results show that the proposed model outperforms the baseline methods on different evaluation criteria.

The rest of the paper is organized as follows: Section 2 introduces the related work on seizure detection. Our proposed methodology is then described in Section 3. Section 4 presents and discusses the experimental results for our method, followed by the conclusions in Section 5.

Figure 1: Schematic illustration of the overall approach pipeline.

2 RELATED WORK

In recent years, there has been a growing interest in detecting epileptic seizure through deep learning methods. Pramod et al. [35] developed a patient-specific epileptic seizure detector using deep neural networks trained with dropout from EEG data. Some researchers applied deep belief networks (DBNs) to seizure detection. Turner et al. [46] applied DBNs to seizure detection in high resolution and multi-channel EEG data. Wulsin et al. [48] applied DBNs in a semi-supervised paradigm to model EEG waveforms for classification and anomaly detection, and the results showed that fast seizure detection is possible with sophisticated machine learning methods. There are also some works related to convolutional neural network (CNN) based seizure detection approaches. Johansen et al. [19] developed a CNN model for detecting spikes in EEG of epileptic patients in an automated fashion. Antoniades et al. [2] considered deep learning for automatic feature generation from epileptic intracranial EEG data in the time domain, and applied CNNs in a subject independent fashion. To increase the training ratio and overcome the curse of dimensionality, Li et al. [24] adopted a CNN-based method trained with an accelerated proximal gradient algorithm


for feature extraction and classification simultaneously. Moreover, stacked autoencoders were also used for analysis of epileptic EEG signals. Supratak et al. [43] developed a patient-specific seizure detector using stacked autoencoders and a logistic classifier. This model investigated the possibility of utilizing unsupervised feature learning to automatically learn features from raw, unlabeled EEG data. Qi et al. [36] proposed a robust stacked autoencoder as a part of the deep network to jointly learn effective features and the classifier. The authors also proposed a maximum correntropy criterion to reduce the effect of noise. To reduce the sample rate and enhance the efficiency of the vision detection, Yan et al. [50] proposed an EEG signal classification method based on a sparse autoencoder and support vector machine. Lin et al. [26] proposed a framework for automatic detection of epileptic EEG using stacked sparse autoencoders, and the simulation results illustrated the effectiveness of the proposed framework. Majumdar et al. [29] proposed a combined solution to biomedical signal reconstruction and analysis, and used a semi-supervised stacked autoencoder to address this issue in a holistic fashion. With any strategy adopted, all the above discussed methods directly feed EEG signals as input into the deep learning model without the assistance of any handcrafted engineering knowledge. These deep learning methods yielded good performance; however, they do not consider the channels irrelevant to seizure onset and ignore the correlations between EEG channels. With the help of signal processing and clinical knowledge, we propose to build a learning procedure to extract more meaningful interpretation from multi-channel EEG signals than traditional methods.

For the spectrogram representation, time-frequency image-based features are also used to describe the non-stationary nature of EEG signals. These handcrafted representations contain richer features than raw signals. Recently, some existing approaches were proposed based on time-frequency image descriptors for epileptic seizure detection on EEG data. Kovacs et al. [22] proposed a rational discrete STFT to extract discriminative features in EEG data, and used an alternating decision tree classifier to detect the seizure segments in the presence of seizure-free segments. Şengür et al. [38] proposed a new texture descriptor to extract features from EEG time-frequency images, and employed a support vector machine to detect epileptic seizures. Boubchir et al. [6, 7] developed a model using Haralick's texture descriptor to visually describe the epileptic seizure patterns observed in the time-frequency image of EEG signals. In order to improve the performance of EEG seizure detection and classification of non-stationary EEG signals, the optimal relevant

features are selected according to maximum relevance and minimum redundancy criteria. Fu et al. [11] presented a technique for seizure classification of EEG signals using the Hilbert-Huang transform and support vector machine on the time-frequency image. Some 2-D statistical features such as mean, variance, skewness and kurtosis of pixel intensity in the histogram were extracted. Samiee et al. [37] proposed a method based on an adaptive and localized time-frequency image representation of EEG signals relying on rational functions. The authors indicated that the multi-layer perceptron was the optimal classifier compared with various classifiers such as naive Bayes, logistic regression, support vector machine, and K-nearest neighbors. To improve the accuracy of epileptic EEG signal classification, Sivasankari et al. [41] introduced a model using time-frequency features extracted on the basis of three parameters, namely, standard deviation, correlation dimension, and Lyapunov exponents. Independent component analysis was incorporated as a preprocessing step and STFT is used for adequately denoising the signal. Tzallas et al. [47] extracted features by measuring the signal segment fractional energy in specific time-frequency windows. The authors demonstrated the suitability of time-frequency analysis to classify EEG segments for epileptic seizures. To merge key instantaneous frequency descriptors, Boashash et al. [5] presented an approach for EEG abnormality detection and classification based on the features derived from EEG time-frequency representations. The authors combined signal related and image related features by calculating specific measurements such as instantaneous frequency, singular value decomposition, and morphometric features. With any strategy adopted, all the above discussed methods applied handcrafted feature engineering both in the time-frequency representation and in feature extraction. These appropriate features, however, are determined based on the specific case, which might not capture new patterns of seizure activity. In our proposed model, we employ a deep learning method to learn features that are complementary to the handcrafted engineering, in order to improve the performance of seizure detection.

3 METHODOLOGY

3.1 EEG Segmentation

The proposed approach is designed to detect epileptic seizure in real-time. Since EEG signals cannot be explicitly segmented into sub-fragments associated with physiological meanings, we segment them into several slots of fixed length. In our model, we obtain EEG fragments by sliding a fixed-length window through the entire signal, parameterized by two predefined parameters: window length l and step length s. Fig. 2 shows an example of how to segment an EEG signal into fragments. Notice that increasing the segment length can improve the recognition accuracy, at the cost of the delay for real-time applications. Thus, the length of the sliding window is fixed to l = 3 seconds and the step length to s = 1 second.

Figure 2: An example of EEG segmentation.
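As a concrete illustration of this sliding-window scheme, the sketch below slices a single EEG channel sampled at 256 Hz into 3-second fragments with a 1-second step. The function name, the synthetic input, and the array layout are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def segment_signal(signal, fs=256, window_sec=3, step_sec=1):
    """Slice a 1-D EEG channel into fixed-length, overlapping fragments.

    signal: 1-D array of samples for one channel.
    fs: sampling rate in Hz (256 Hz for the scalp EEG recordings used here).
    window_sec, step_sec: window length l and step length s in seconds.
    Returns an array of shape (num_fragments, window_sec * fs).
    """
    win = int(window_sec * fs)
    step = int(step_sec * fs)
    starts = range(0, len(signal) - win + 1, step)
    return np.stack([signal[s:s + win] for s in starts])

# Example: 60 seconds of synthetic data -> 58 fragments of 768 samples each.
fragments = segment_signal(np.random.randn(60 * 256))
print(fragments.shape)  # (58, 768)
```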
3.2 EEG Spectrogram Representation

Considering the fact that the brain abnormality in EEG is reflected by frequency changes and increased amplitudes [14], combining signal processing knowledge improves seizure detection performance. Spectrogram representation is essential for extracting interpretable features in the time-frequency domain. This enables the classifier to be more robust against amplitude changes and data shifting. In addition, the spectrogram representation also provides an implicit way for noise cancellation of the signal over time.

In our model, we use STFT to convert EEG fragments into EEG spectrogram representations in the time-frequency domain. Formally, given an EEG fragment x(t) of one single channel, the STFT of the signal in a continuous form is defined as [8]:

\mathrm{STFT}_x(\tau, \omega) = \int_{-\infty}^{\infty} x(t)\, w(t - \tau)\, e^{-j\omega t}\, dt,   (1)

where w(t) is the window function centered around zero, and \tau denotes the time index to obtain time localization by taking the Fourier transform of the windowed signal. STFT handles the non-stationarity by windowing, and thus takes the temporal variations into account. The spectrogram can be further calculated as the magnitude squared of the STFT to transform the complex values:

\mathrm{spectrogram}_x(\tau, \omega) = |\mathrm{STFT}_x(\tau, \omega)|^2.   (2)

The resulting spectrogram is a matrix where time increases across the columns and frequency increases down the rows. In this way, by generating the EEG spectrogram representation, the frequency content in epileptic EEG signals can be described over time, which can be further learned as time-frequency images by the deep learning model. To demonstrate the basic idea of the proposed method, we adopt the Blackman window as the STFT window function in the following sections. We also examine different window functions as demonstrated later in Section 4.
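One way to realize Eqs. (1) and (2) for a discrete fragment is SciPy's STFT routine with a Blackman window, as sketched below. The segment length and overlap are illustrative assumptions; the paper does not report the exact STFT parameters.

```python
import numpy as np
from scipy.signal import stft

def eeg_spectrogram(fragment, fs=256, nperseg=64, noverlap=32):
    """Time-frequency image of one EEG fragment, Eq. (2): |STFT|^2.

    A Blackman window is applied to each segment, matching the window
    function adopted in the paper; nperseg and noverlap are placeholder
    choices, not values reported by the authors.
    """
    f, t, Zxx = stft(fragment, fs=fs, window='blackman',
                     nperseg=nperseg, noverlap=noverlap)
    Sxx = np.abs(Zxx) ** 2  # frequency increases down the rows, time across the columns
    return f, t, Sxx

# Example: spectrogram of a 3-second fragment (768 samples at 256 Hz).
f, t, Sxx = eeg_spectrogram(np.random.randn(768))
print(Sxx.shape)  # (frequency bins, time frames)
```

Other window functions (Hann, Kaiser, Blackman-Harris, and so on) can be compared simply by changing the `window` argument, which is how the comparison reported later in Section 4 could be reproduced.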

3.3 Multi-view Deep Feature Extraction

In most cases, when learning seizure characteristics from EEG signals, only a standard deep learning method with a simple design is adopted to extract features. Using a simple deep learning structure causes the inability of a single feature to reach robust and accurate results. To handle this problem, we propose to unsupervisedly extract high-level and latent seizure features from different perspectives through the SSDA model [49]. Specifically, once the 2-D EEG spectrograms have been computed, a set of deep features can be automatically extracted by considering both intra and inter correlation of EEG channels, i.e., intra-channel features and cross-channel features. For instance, given a single-channel EEG spectrogram, we can regard it as a spectral image where various spatial features are extracted within each fragment. In this step, high dimensional raw features are integrated into low dimensional latent characteristics with meaningful interpretation.

Figure 3: Deep model for EEG feature learning. (a) The structure of the DA network; (b) the structure of the SSDA network.

In general, the SSDA model is a multi-layer neural network consisting of several stacked basic denoising autoencoders (DA) with a sparse constraint. DA is a single hidden layer model that attempts to let the output vector approximate the input vector by learning an encoder and a decoder, as shown in Fig. 3(a). Different from the traditional autoencoder [17], DA adds random corruption to the input before encoding, sampling \hat{x} \sim P_{corr}(\hat{x} \mid x), in order to enable the hidden layer to uncover more robust features. Formally, given a single input vector x, the output vector y can be expressed by the hypothesis function based on forward-propagation:

y = h_{W,b}(x) = f(W^{(l+1)} f(W^{(l)} \hat{x} + b^{(l)}) + b^{(l+1)}),   (3)

where f(\cdot) is the activation function, and W^{(l)} and b^{(l)} denote the weight matrix and bias in layer l, respectively.

The training objective for DA is to minimize the reconstruction error over the training set. Given an unlabeled training dataset \{x^{(i)}, i = 1, 2, ..., m\} where x^{(i)} \in \mathbb{R}^n, by minimizing the reconstruction error between input x^{(i)} and output y^{(i)}, the cost function in terms of parameters (W, b) is defined as:

J_{DA}(W, b) = \frac{1}{m} \sum_{i=1}^{m} L_{IH}(x^{(i)}, h_{W,b}(x^{(i)})),   (4)

where L_{IH} represents the reconstruction error, which is measured by the cross-entropy loss:

L_{IH}(x^{(i)}, h_{W,b}(x^{(i)})) = -\sum_{j=1}^{n} \big[ x_j^{(i)} \log(h_{W,b}(x^{(i)})_j) + (1 - x_j^{(i)}) \log(1 - h_{W,b}(x^{(i)})_j) \big].   (5)

To combine the virtues of sparse coding and neural networks and avoid overfitting, we train a DA to minimize the cost function regularized by a sparsity penalty term:

J_{SDA}(W, b) = J_{DA}(W, b) + \frac{\lambda}{2} \|W\|^2 + \beta \sum_{j=1}^{s} KL(\rho \,\|\, \hat{\rho}_j),   (6)

where J_{DA}(W, b) is the cost function defined in Equation 4, and s is the number of hidden units. \lambda and \beta are the hyper-parameters that control the weight of the regularization constraint and the sparsity constraint, respectively. Here KL(\rho \,\|\, \hat{\rho}_j) is the Kullback-Leibler (KL) divergence measuring the difference between a Bernoulli random variable with mean \rho and a Bernoulli random variable with mean \hat{\rho}_j, defined as:

KL(\rho \,\|\, \hat{\rho}_j) = \rho \log \frac{\rho}{\hat{\rho}_j} + (1 - \rho) \log \frac{1 - \rho}{1 - \hat{\rho}_j},   (7)

where \hat{\rho}_j is the average activation of hidden unit j (averaged over the training set), and \rho denotes the sparsity parameter, which is a small value close to zero. Note that the regularization term is considered to decrease the magnitude of the weights and help prevent overfitting.

In this way, the SSDA model can be regarded as a deep stacked network, as shown in Fig. 3(b). The strategy for training the SSDA is the greedy layer-wise method [4]. Specifically, the hidden features extracted in the previous layer of the SSDA are fed to the current layer as input. Thus, the parameters of each layer are trained individually while fixing the parameters of the other layers. For our model, P_{corr} is set as a binary mask on x which zeros out each entry independently with probability p. The activation function is defined as the sigmoid logistic function f(z) = 1/(1 + \exp(-z)). For further analysis, we extend the unsupervised SSDA framework by denoting the last hidden features learned from each channel and from all channels as the intra-channel features and the cross-channel features, respectively. This makes it well-suited to capture complex correlations across EEG time-frequency spectrograms in the low dimensional space, especially when working with a relatively small dataset with few or unreliable labels.
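The cost in Eqs. (3)-(7) for a single denoising-autoencoder layer can be written compactly in NumPy. The sketch below assumes inputs scaled to [0, 1]; the hyper-parameter values, the tiny random initialization, and the absence of a training loop are all simplifying assumptions rather than details given by the authors.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sda_cost(X, W1, b1, W2, b2, corruption_p=0.3, lam=1e-4, beta=3.0, rho=0.05,
             rng=np.random.default_rng(0)):
    """Sparsity-regularized reconstruction cost of one DA layer, Eqs. (3)-(7).

    X: (m, n) training fragments scaled to [0, 1]; W1/b1 encode, W2/b2 decode.
    """
    # Corruption: binary mask zeroing each entry with probability p (Eq. 3's x-hat).
    X_hat = X * (rng.random(X.shape) >= corruption_p)
    H = sigmoid(X_hat @ W1 + b1)      # hidden activations (encoder)
    Y = sigmoid(H @ W2 + b2)          # reconstruction h_{W,b}(x) (decoder)
    eps = 1e-12
    # Cross-entropy reconstruction error, Eqs. (4)-(5), averaged over the batch.
    recon = -np.mean(np.sum(X * np.log(Y + eps) + (1 - X) * np.log(1 - Y + eps), axis=1))
    # Weight decay and KL-divergence sparsity penalty, Eqs. (6)-(7).
    rho_hat = np.clip(H.mean(axis=0), 1e-6, 1 - 1e-6)
    kl = np.sum(rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    weight_decay = 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
    return recon + weight_decay + beta * kl

# Toy call with random data and small random weights (shapes are illustrative).
m, n, s = 100, 1024, 64
rng = np.random.default_rng(1)
X = rng.random((m, n))
W1, b1 = 0.01 * rng.standard_normal((n, s)), np.zeros(s)
W2, b2 = 0.01 * rng.standard_normal((s, n)), np.zeros(n)
print(sda_cost(X, W1, b1, W2, b2))
```

Stacking layers then amounts to training one such autoencoder, feeding its hidden activations H to the next layer as input, and repeating, which is the greedy layer-wise strategy described above.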

3.4 Channel Selection

Channel selection is a special case of intra-channel feature extraction. In biology, brain related activities, which include epileptic seizure, are usually dominated by several specific areas of the brain. Thus in seizure detection, there exist some irrelevant channels which bring noise to the classification task. To filter out these irrelevant channels, in this section, we propose an SSDA-based channel selection method to determine the critical features by the partition of channels. This structure of channel selection can be easily integrated into the whole proposed deep learning framework. In our previous work [18, 25], we have shown the importance of channel selection in multi-channel EEG signals. This strategy involves using a zero-stimulus method, where an all-zero vector is used as input to the trained deep learning model. After this we measure the response rate, which is defined to be the distance between the values of the resulting hidden units and random hidden units.

In our model, inspired by our previous work, we define a modified response rate considering both zero-based response and sample-based response with unsupervised information. Specifically, given the aforementioned training data and the learned intra-channel SSDA model, we measure the proposed response rate of each channel c by calculating the weighted arithmetic mean:

R_c = w_1 \|h_z - \rho\|_2 + \frac{w_2}{m} \sum_{i=1}^{m} \|h_s(x^{(i)}) - \hat{\rho}_s\|_2,   (8)

where h_z denotes the response stimulated by the all-zero vector, h_s(x^{(i)}) represents the response stimulated by the i-th sample vector x^{(i)}, and \hat{\rho}_s is the average activation vector of the last learned SSDA hidden layer. Here \{w_i, i = 1, 2\}, where w_i \in [0, 1] and \sum w_i = 1, denotes the weight constraint. Intuitively, both terms of Equation 8 represent the strength of response given the input data. A high value of R_c indicates salient activation of the hidden units in channel c. After the calculation, we rank the values of all channels and select the features from the top u channels as the critical channel features. In the implementation, we fix u to be equal to half the number of EEG channels. In this way, the whole intra-channel layers are further established on the channel selection structure, which results in a marked mitigation of the noise effect and a great reduction of data dimensionality.

3.5 Learning and Detection

After extracting meaningful features, we concatenate the above-learned feature vectors together in a feature fusion fashion. Formally, given training sample x^{(i)}, the new fusion features of this sample are defined as:

x_{Fusion}^{(i)} = [x_1^{(i)}, x_2^{(i)}, ..., x_k^{(i)}] \in \mathbb{R}^{\sum_{j=1}^{k} n_j},   (9)

where k denotes the feature index, and n_j is the dimensionality of each feature. In our model, the concatenated features are the combination of the intra-channel and cross-channel features mentioned above. Once the features are merged, we feed them into the fully-connected (FC) layers and adopt a softmax layer for classification.

Softmax regression is a supervised learning algorithm which generalizes logistic regression, a binary classification model, to a multi-class classification model. Formally, given the labeled training dataset \{(x^{(i)}, y^{(i)}), i = 1, 2, ..., m\} where x^{(i)} \in \mathbb{R}^n and y^{(i)} \in \{1, 2, ..., k\}, the hypothesis of softmax takes the form:

h_\theta(x^{(i)}) = \begin{bmatrix} p(y^{(i)} = 1 \mid x^{(i)}; \theta) \\ \vdots \\ p(y^{(i)} = k \mid x^{(i)}; \theta) \end{bmatrix} = \frac{1}{\sum_{j=1}^{k} e^{\theta_j^T x^{(i)}}} \begin{bmatrix} e^{\theta_1^T x^{(i)}} \\ \vdots \\ e^{\theta_k^T x^{(i)}} \end{bmatrix},   (10)

where \theta = [\theta_1^T\; \theta_2^T\; \cdots\; \theta_k^T]^T denotes the parameters of the model, and the term 1 / \sum_{j=1}^{k} e^{\theta_j^T x^{(i)}} is for normalization. Thus, the cost function of softmax is described as:

J(\theta) = -\frac{1}{m} \left[ \sum_{i=1}^{m} \sum_{j=1}^{k} 1\{y^{(i)} = j\} \log \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T x^{(i)}}} \right] + \frac{\lambda}{2} \sum_{i=1}^{k} \sum_{j=0}^{n} \theta_{ij}^2,   (11)

where 1\{\cdot\} is the indicator function and the second term is the regularization term which penalizes large values of the parameters. Notice that in the first term, e^{\theta_j^T x^{(i)}} / \sum_{l=1}^{k} e^{\theta_l^T x^{(i)}} = p(y^{(i)} = j \mid x^{(i)}; \theta), which means the cost function of softmax is similar to that of logistic regression except for the sum over the k classes. According to the deep learning mechanism, during the pre-training, the unlabeled data is used to train each layer individually. After that, the fine-tuning, which treats all layers as a single model, is adopted for supervised learning using labeled data. During the whole training step, minibatch stochastic gradient descent is used to minimize the cost function. Note that we also use weight decay and momentum to accelerate gradient descent.

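To make the channel ranking of Eq. (8) and the feature fusion of Eq. (9) concrete, the sketch below wires them together. The equal weights w1 = w2 = 0.5, the toy array sizes, and the helper names are illustrative assumptions; the fused vector would then be fed to the fully-connected SSDA layers with a softmax output as described above.

```python
import numpy as np

def response_rate(hidden_zero, hidden_samples, rho, rho_hat_s, w1=0.5, w2=0.5):
    """Modified response rate of one channel, Eq. (8).

    hidden_zero: activation vector h_z produced by an all-zero input.
    hidden_samples: (m, s) activations h_s(x^(i)) for the m training fragments.
    rho: sparsity parameter; rho_hat_s: average activation vector of the last
    SSDA hidden layer. Equal weights with w1 + w2 = 1 are an assumption here.
    """
    zero_term = np.linalg.norm(hidden_zero - rho)
    sample_term = np.mean(np.linalg.norm(hidden_samples - rho_hat_s, axis=1))
    return w1 * zero_term + w2 * sample_term

def select_channels(rates, u=None):
    """Rank channels by response rate and keep the top u (half of them by default)."""
    rates = np.asarray(rates)
    if u is None:
        u = len(rates) // 2
    return np.argsort(rates)[::-1][:u]

def fuse_features(feature_list):
    """Concatenate intra-channel and cross-channel features, Eq. (9)."""
    return np.concatenate(feature_list, axis=-1)

# Toy example: 23 channels, 100 fragments, 32 hidden units per channel model.
rng = np.random.default_rng(0)
rates = [response_rate(rng.random(32), rng.random((100, 32)), 0.05, rng.random(32))
         for _ in range(23)]
kept = select_channels(rates)        # indices of the 11 most responsive channels
fused = fuse_features([rng.random((100, 16)), rng.random((100, 64))])
print(kept.shape, fused.shape)       # (11,) (100, 80)
```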
4 EXPERIMENTAL RESULTS

4.1 Dataset

The EEG data we use is an open access scalp EEG dataset collected at the Children's Hospital Boston [13, 40]. In this dataset, the multi-channel EEG signals are recorded from pediatric subjects with intractable seizures. The data is obtained from 23 patients, including 18 females and 5 males from age 2 to age 22, to characterize their seizures and assess the necessity of surgery for them. The beginning and end of each seizure are both annotated in the ground truth. Specifically, the EEG signals of each patient contain 23 channels (24 or 26 in a few cases), and the data of each channel is recorded at 256 Hz with 16-bit resolution.

Following the onset of a seizure, a set of EEG channels start to show rhythmic changes, which contain different time-frequency patterns. The rhythmic patterns can assist the seizure detector in distinguishing the ictal and non-ictal states. For example, Fig. 4 shows seizure samples of multi-channel scalp EEG signals from two different patients. The red bar marks the onset of a seizure, and both patients A and B start their seizures at the 6th second. Fig. 4(a) illustrates the onset of a seizure of patient A. It can be seen that the onset of this seizure comes with significant changes of the EEG signals. In Fig. 4(b), the changes can still be observed between ictal and non-ictal states from patient B. However, the rhythmic patterns on the EEG signals differ from patient A. Thus, the characteristics of seizures on EEG signals might vary significantly in different EEG channels across patients, and this variability makes the seizure detection problem even more difficult.

4.2 Baselines

Since it is a classification task, we adopt several widely used classification algorithms as the baseline methods, including standard SVM [9], neural network (NN) [30], and the traditional deep learning method SSDA [49] as the first baseline group. For the sake of fairness and to avoid the curse of high dimensionality, it is necessary to reduce the dimensionality of the data before we send it to SVM, NN

and STFT-mSSDA methods outperform the baseline methods on all four different evaluation criteria. Our proposed STFT-mSSDA method not only achieves significantly better accuracy and F1-score, but also obtains higher AUC-ROC and AUC-PR than the others.

Feature extraction is crucial for EEG classification. Given the results of both baseline groups, we can see that the performance of NN-based methods is worse than SVM-based methods both in the time and time-frequency domain. The reason the NN-based method does not perform well is that the raw feature space is not suitable for an NN-based method using the gradient descent optimization algorithm, which easily reaches a local minimum. The results of the second baseline group demonstrate that all the classifiers benefit from the EEG spectrogram representation. This is because by investigating the handcrafted features, noise interference is effectively reduced and classifiers can obtain more useful information in the EEG time-frequency domain. Moreover, the standard SSDA-based method generates worse results than the others. This results from the low-quality features learned by simply applying a traditional deep learning model on raw data. In contrast, given the results of STFT-mSSDA, which utilizes the SSDA method appropriately, we can conclude that the proposed method can successfully extract meaningful features from EEG signals.

Table 2: Detection Performance Comparisons

            AUC-ROC   AUC-PR   F1-score   Accuracy
PSVM        0.7283    0.6012   0.8826     0.8136
PNN         0.7754    0.5487   0.7515     0.6630
PSSDA       0.7166    0.3679   0.7627     0.6750
STFT-PSVM   0.7535    0.6356   0.9342     0.8850
STFT-NN     0.7314    0.5251   0.8212     0.7338
STFT-SSDA   0.6080    0.4985   0.8947     0.8195
STFT-cSSDA  0.9612    0.8507   0.9506     0.9242
STFT-mSSDA  0.9833    0.9549   0.9605     0.9382
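For reference, the four criteria reported in Table 2 can be computed from per-fragment predictions with standard library calls. The sketch below uses scikit-learn; the 0.5 decision threshold and the synthetic labels are assumptions, not details given in the paper.

```python
import numpy as np
from sklearn.metrics import (roc_auc_score, average_precision_score,
                             f1_score, accuracy_score)

def detection_metrics(y_true, y_score, threshold=0.5):
    """Evaluate a seizure detector from fragment-level scores.

    y_true: binary ictal labels; y_score: predicted probability of the
    ictal class (e.g., the softmax output). average_precision_score is
    used here as the area under the PR curve.
    """
    y_pred = (y_score >= threshold).astype(int)
    return {
        "AUC-ROC": roc_auc_score(y_true, y_score),
        "AUC-PR": average_precision_score(y_true, y_score),
        "F1-score": f1_score(y_true, y_pred),
        "Accuracy": accuracy_score(y_true, y_pred),
    }

# Example with dummy predictions.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
y_score = np.clip(y_true * 0.6 + rng.random(200) * 0.5, 0, 1)
print(detection_metrics(y_true, y_score))
```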

Figure 5: The ROC and PR curves of the proposed method and the baselines. (a) ROC curves; (b) PR curves.

Overall, as for the proposed STFT-cSSDA method, the relative improvements verify that the intra-channel features play an important role in identifying seizure in EEG spectrograms. Moreover, based on the comparison between STFT-cSSDA and STFT-mSSDA, we arrive at the conclusion that cross-channel features provide complementary information towards intra-channel features and thus obtain more representative features from different perspectives than STFT-cSSDA, which is crucial for successful seizure detection.

Fig. 5 shows the ROC and PR curves of each method, respectively. We can observe that our STFT-mSSDA and STFT-cSSDA models perform better than the conventional classification techniques in terms of the ability of cross-patient seizure detection. In the ROC curve figure, the proposed STFT-mSSDA method achieves the best AUC of 0.9833 on the experiment data. We can also observe that the true positive rate of the STFT-mSSDA model increases at a very fast speed in the beginning, when the false positive rate is still close to zero, which means STFT-mSSDA is able to capture the important features to represent and separate interictal and ictal data points effectively. Furthermore, neither the precision nor the recall takes into account the number of true negatives, thus the PR curve is more persuasive than the ROC curve under imbalanced data. In the PR curve figure, our method achieves the best AUC of 0.9549 compared with the methods such as STFT-SSDA, STFT-NN, STFT-PSVM and PSVM with AUC of 0.4985, 0.5251, 0.6356 and 0.6012, respectively.

4.6 Analysis on Channel Selection

In this section, we conduct the experiment on both our proposed STFT-cSSDA and STFT-mSSDA models to discuss the effectiveness of our proposed channel selection method. We consider the strategy that selects all EEG channels for seizure detection, denoted as cSSDA(Full) and mSSDA(Full), respectively. Fig. 6 shows the performance comparisons on the proposed models with different selected channels. The results reveal that by using the most significant features, the performance increases in terms of F1-score and accuracy. Specifically, the accuracy of our models using the channel selection method increases by 2 percent on average compared with the full-channel models. Besides, as shown in Table 2, the good performance of these selected channels supports the effectiveness

of the proposed channel selection method. To sum up, the proposed method can select meaningful channels for the ultimate task of epileptic seizure detection.

Figure 6: Performance comparisons on proposed models with different selected channels.

4.7 STFT Comparative Performance

We show the performance of the proposed method on our experiment dataset in various transform scenarios by using different STFT window functions in Table 3. From the given results, it is obvious that our proposed STFT-mSSDA model can still extract meaningful features from different STFT spaces and outperform the baseline methods, despite the influence of different window functions. We can also see that the Blackman window yields better performance than the other window functions, which indicates that the Blackman window can obtain more information from epileptic EEG signals.

Table 3: Comparative performance of proposed STFT-mSSDA with different window functions.

Window           AUC-ROC   AUC-PR   F1-score   Accuracy
Hann             0.9553    0.8761   0.9463     0.9163
Rectangular      0.9630    0.8880   0.9521     0.9252
Bartlett         0.9651    0.9049   0.9421     0.9123
Kaiser           0.9636    0.9041   0.9366     0.9013
Bartlett-Hann    0.9637    0.8998   0.9552     0.9312
Blackman-Harris  0.9704    0.9221   0.9452     0.9163
Bohman           0.9683    0.9271   0.9562     0.9322
Chebyshev        0.9676    0.9202   0.9373     0.9053
Flat Top         0.9516    0.8205   0.9368     0.9023
Gaussian         0.9567    0.8757   0.9391     0.9063
Tapered cosine   0.9526    0.8732   0.9485     0.9212
Blackman         0.9833    0.9549   0.9605     0.9382

5 CONCLUSIONS

In this paper, we design and evaluate our proposed multi-view deep learning model (STFT-mSSDA) for automated seizure detection using multi-channel scalp EEG signals. The proposed approach is a general, cross-patient model which is capable of extracting hidden inherent features considering both intra and inter correlation of EEG channels, denoted as intra-channel and cross-channel features, respectively. To integrate signal processing knowledge, we express the time-frequency information of EEG fragments via STFT, denoted as the EEG spectrogram representation. The multiple latent features are learned from different perspectives through the SSDA model. The critical channels are selected according to our proposed SSDA-based response rate to reduce the dimension of the intra-channel features. To train the seizure detector, the multi-view features are merged in a feature fusion fashion and are further fed into the fully-connected SSDA model with a softmax classifier. The proposed method has been tested on the CHB-MIT scalp EEG dataset and compared with several baseline methods. In general, the experimental results demonstrate the effectiveness and superiority of the proposed model in detecting epileptic seizures.

ACKNOWLEDGMENTS

This paper is supported by the Project for the National Natural Science Foundation of China under Grant No. 61672064 and the China Scholarship Council Fund under Grant No. 201606540008.

REFERENCES

[1] Turkey N Alotaiby, Saleh A Alshebeili, Tariq Alshawi, Ishtiaq Ahmad, and Fathi E Abd El-Samie. 2014. EEG seizure detection and prediction algorithms: a survey. EURASIP Journal on Advances in Signal Processing 2014, 1 (2014), 183.
[2] Andreas Antoniades, Loukianos Spyrou, Clive Cheong Took, and Saeid Sanei. 2016. Deep learning for epileptic intracranial EEG data. In Machine Learning for Signal Processing (MLSP), 2016 IEEE 26th International Workshop on. IEEE, 1–6.
[3] Yoshua Bengio, Aaron Courville, and Pascal Vincent. 2013. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence 35, 8 (2013), 1798–1828.
[4] Yoshua Bengio, Pascal Lamblin, Dan Popovici, Hugo Larochelle, and others. 2007. Greedy layer-wise training of deep networks. Advances in Neural Information Processing Systems 19 (2007), 153.
[5] Boualem Boashash, Larbi Boubchir, and Ghasem Azemi. 2012. Improving the classification of newborn EEG time-frequency representations using a combined time-frequency signal and image approach. In Information Science, Signal Processing and their Applications (ISSPA), 2012 11th International Conference on. IEEE, 280–285.
[6] Larbi Boubchir, Somaya Al-Maadeed, and Ahmed Bouridane. 2014. Haralick feature extraction from time-frequency images for epileptic seizure detection and classification of EEG data. In Microelectronics (ICM), 2014 26th International Conference on. IEEE, 32–35.
[7] Larbi Boubchir, Somaya Al-Maadeed, and Ahmed Bouridane. 2014. On the use of time-frequency features for detecting and classifying epileptic seizure activities in non-stationary EEG signals. In Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. IEEE, 5889–5893.
[8] Leon Cohen. 1989. Time-frequency distributions-a review. Proc. IEEE 77, 7 (1989), 941–981.
[9] Corinna Cortes and Vladimir Vapnik. 1995. Support-vector networks. Machine Learning 20, 3 (1995), 273–297.
[10] Robert S Fisher, Walter van Emde Boas, Warren Blume, Christian Elger, Pierre Genton, Phillip Lee, and Jerome Engel. 2005. Epileptic seizures and epilepsy: definitions proposed by the International League Against Epilepsy (ILAE) and the International Bureau for Epilepsy (IBE). Epilepsia 46, 4 (2005), 470–472.
[11] Kai Fu, Jianfeng Qu, Yi Chai, and Yong Dong. 2014. Classification of seizure based on the time-frequency image of EEG signals using HHT and SVM. Biomedical Signal Processing and Control 13 (2014), 15–22.
[12] Giorgos Giannakakis, Vangelis Sakkalis, Matthew Pediaditis, and Manolis Tsiknakis. 2015. Methods for seizure detection and prediction: an overview. Modern Electroencephalographic Assessment Techniques: Theory and Applications (2015), 131–157.
[13] Ary L Goldberger, Luis AN Amaral, Leon Glass, Jeffrey M Hausdorff, Plamen Ch Ivanov, Roger G Mark, Joseph E Mietus, George B Moody, Chung-Kang Peng, and H Eugene Stanley. 2000. Physiobank, physiotoolkit, and physionet. Circulation 101, 23 (2000), e215–e220.

[14] J Gotman, D Flanagan, J Zhang, and B Rosenblatt. 1997. Automatic seizure detection in the newborn: methods and initial evaluation. Electroencephalography and Clinical Neurophysiology 103, 3 (1997), 356–362.
[15] L John Greenfield, James D Geyer, and Paul R Carney. 2012. Reading EEGs: a practical approach. Lippincott Williams & Wilkins.
[16] Geoffrey E Hinton and Ruslan R Salakhutdinov. 2006. Reducing the dimensionality of data with neural networks. Science 313, 5786 (2006), 504–507.
[17] Geoffrey E Hinton and Ruslan R Salakhutdinov. 2006. Reducing the dimensionality of data with neural networks. Science 313, 5786 (2006), 504–507.
[18] Xiaowei Jia, Kang Li, Xiaoyi Li, and Aidong Zhang. 2014. A novel semi-supervised deep learning framework for affective state recognition on EEG signals. In Bioinformatics and Bioengineering (BIBE), 2014 IEEE International Conference on. IEEE, 30–37.
[19] Alexander Rosenberg Johansen, Jing Jin, Tomasz Maszczyk, Justin Dauwels, Sydney S Cash, and M Brandon Westover. 2016. Epileptiform spike detection via convolutional neural networks. In Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on. IEEE, 754–758.
[20] Alistair EW Johnson, Mohammad M Ghassemi, Shamim Nemati, Katherine E Niehaus, David A Clifton, and Gari D Clifford. 2016. Machine learning and decision support in critical care. Proc. IEEE 104, 2 (2016), 444–466.
[21] Ian Jolliffe. 2002. Principal component analysis. Wiley Online Library.
[22] Peter Kovacs, Kaveh Samiee, and Moncef Gabbouj. 2014. On application of rational discrete short time Fourier transform in epileptic seizure classification. In Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. IEEE, 5839–5843.
[23] Martin Längkvist, Lars Karlsson, and Amy Loutfi. 2014. A review of unsupervised feature learning and deep learning for time-series modeling. Pattern Recognition Letters 42 (2014), 11–24.
[24] Dazi Li, Guifang Wang, Tianheng Song, and Qibing Jin. 2016. Improving convolutional neural network using accelerated proximal gradient method for epilepsy diagnosis. In Control (CONTROL), 2016 UKACC 11th International Conference on. IEEE, 1–6.
[25] Kang Li, Xiaoyi Li, Yuan Zhang, and Aidong Zhang. 2013. Affective state recognition from EEG with deep belief networks. In Bioinformatics and Biomedicine (BIBM), 2013 IEEE International Conference on. IEEE, 305–310.
[26] Qin Lin, Shu-qun Ye, Xiu-mei Huang, Si-you Li, Mei-zhen Zhang, Yun Xue, and Wen-Sheng Chen. 2016. Classification of Epileptic EEG Signals with Stacked Sparse Autoencoder Based on Deep Learning. In International Conference on Intelligent Computing. Springer, 802–810.
[27] Fenglong Ma, Radha Chitta, Jing Zhou, Quanzeng You, Tong Sun, and Jing Gao. 2017. Dipole: Diagnosis Prediction in Healthcare via Attention-based Bidirectional Recurrent Neural Networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM.
[28] Fenglong Ma, Chuishi Meng, Houping Xiao, Qi Li, Jing Gao, Lu Su, and Aidong Zhang. 2017. Unsupervised Discovery of Drug Side-Effects From Heterogeneous Data Sources. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM.
[29] Angshul Majumdar, Anupriya Gogna, and Rabab Ward. 2016. Semi-supervised Stacked Label Consistent Autoencoder for Reconstruction and Analysis of Biomedical Signals. IEEE Transactions on Biomedical Engineering (2016).
[30] Warren S McCulloch and Walter Pitts. 1943. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics 5, 4 (1943), 115–133.
[31] Piotr Mirowski, Deepak Madhavan, Yann LeCun, and Ruben Kuzniecky. 2009. Classification of patterns of EEG synchronization for seizure prediction. Clinical Neurophysiology 120, 11 (2009), 1927–1940.
[32] Florian Mormann, Ralph G Andrzejak, Christian E Elger, and Klaus Lehnertz. 2007. Seizure prediction: the long and winding road. Brain 130, 2 (2007), 314–333.
[33] Florian Mormann, Ralph G Andrzejak, Christian E Elger, and Klaus Lehnertz. 2007. Seizure prediction: the long and winding road. Brain 130, 2 (2007), 314–333.
[34] World Health Organization. 2017. Epilepsy Fact Sheet. http://www.who.int/mediacentre/factsheets/fs999/en/. (2017).
[35] Siddharth Pramod, Adam Page, Tinoosh Mohsenin, and Tim Oates. 2014. Detecting Epileptic Seizures from EEG Data using Neural Networks. arXiv preprint arXiv:1412.6502 (2014).
[36] Yu Qi, Yueming Wang, Jianmin Zhang, Junming Zhu, and Xiaoxiang Zheng. 2014. Robust deep network with maximum correntropy criterion for seizure detection. BioMed Research International 2014 (2014).
[37] Kaveh Samiee, Peter Kovacs, and Moncef Gabbouj. 2015. Epileptic seizure classification of EEG time-series using rational discrete short-time Fourier transform. IEEE Transactions on Biomedical Engineering 62, 2 (2015), 541–552.
[38] Abdulkadir Şengür, Yanhui Guo, and Yaman Akbulut. 2016. Time–frequency texture descriptors of EEG signals for efficient detection of epileptic seizure. Brain Informatics 3, 2 (2016), 101–108.
[39] Younghak Shin, Seungchan Lee, Minkyu Ahn, Hohyun Cho, Sung Chan Jun, and Heung-No Lee. 2015. Noise robustness analysis of sparse representation based classification method for non-stationary EEG signal classification. Biomedical Signal Processing and Control 21 (2015), 8–18.
[40] Ali Hossam Shoeb. 2009. Application of machine learning to epileptic seizure onset detection and treatment. Ph.D. Dissertation. Massachusetts Institute of Technology.
[41] K Sivasankari and K Thanushkodi. 2014. An improved EEG signal classification using neural network with the consequence of ICA and STFT. Journal of Electrical Engineering and Technology 9, 3 (2014), 1060–1071.
[42] Qiuling Suo, Hongfei Xue, Jing Gao, and Aidong Zhang. 2016. Risk Factor Analysis Based on Deep Learning Models. In Proceedings of the 7th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics. ACM, 394–403.
[43] Akara Supratak, Ling Li, and Yike Guo. 2014. Feature extraction with stacked autoencoders for epileptic seizure detection. In Engineering in Medicine and Biology Society (EMBC), 2014 36th Annual International Conference of the IEEE. IEEE, 4184–4187.
[44] Akara Supratak, Chao Wu, Hao Dong, Kai Sun, and Yike Guo. 2016. Survey on Feature Extraction and Applications of Biosignals. In Machine Learning for Health Informatics. Springer, 161–182.
[45] Shanbao Tong and Nitish Vyomesh Thakor. 2009. Quantitative EEG analysis methods and clinical applications. Artech House.
[46] JT Turner, Adam Page, Tinoosh Mohsenin, and Tim Oates. 2014. Deep belief networks used on high resolution multichannel electroencephalography data for seizure detection. In 2014 AAAI Spring Symposium Series.
[47] Alexandros T Tzallas, Markos G Tsipouras, and Dimitrios I Fotiadis. 2009. Epileptic seizure detection in EEGs using time–frequency analysis. IEEE Transactions on Information Technology in Biomedicine 13, 5 (2009), 703–710.
[48] DF Wulsin, JR Gupta, R Mani, JA Blanco, and B Litt. 2011. Modeling electroencephalography waveforms with semi-supervised deep belief nets: fast classification and anomaly measurement. Journal of Neural Engineering 8, 3 (2011), 036015.
[49] Junyuan Xie, Linli Xu, and Enhong Chen. 2012. Image denoising and inpainting with deep neural networks. In Advances in Neural Information Processing Systems. 341–349.
[50] Bo Yan, Yong Wang, Yuheng Li, Yejiang Gong, Lu Guan, and Sheng Yu. 2016. An EEG signal classification method based on sparse auto-encoders and support vector machine. In Communications in China (ICCC), 2016 IEEE/CIC International Conference on. IEEE, 1–6.
[51] Guang-Zhong Yang and Magdi Yacoub. 2006. Body sensor networks. Vol. 1. Springer.
