
A Simple and Efficient Spectral Features for Breathing and Snoring Sound Classification


Xiang Sun (a), Jin Young Kim (a,*), Yonggwan Won (a), Jung-Ja Kim (b), and Kyung-Ah Kim (c)

(a) Dept. of ECE, College of Eng., Chonnam National University, Gwangju, 500-757, S. Korea
(b) Dept. of Biomedical Eng., College of Eng., Chonbuk National University, Jeonju, S. Korea
(c) Dept. of Biomedical Eng., College of Medicine, Chonbuk National University, Jeonju, S. Korea

Abstract. This paper proposes an efficient method for detecting snoring and related events (expiration, inspiration and
silence) in sleep sound recordings. The feature vector consists of the normalized mean and standard deviation of the energy
in 3 sub-bands. The features are based on acoustic properties of snoring sounds, which our experiments confirmed to be
effective for snoring detection. Classification is then performed with a Support Vector Machine (SVM). A database of
approximately 32 hours was recorded from subjects with an acknowledged snoring habit. The method is evaluated by
classifying the different events in the sleep sound recordings and comparing the results with the ground truth. The
algorithm classified snores with an accuracy of 97.40%, breath with 99.90% and silence with 100%.
Keywords: Snoring detection, Obstructive Sleep Apnea, Support Vector Machine

1. Introduction
Sleep quality is one of the most important factors in evaluating a person's health condition. One type
of respiratory disease, obstructive sleep apnea (OSA), has been reported as a common disease in
our lives [1]. The term OSA covers a set of symptoms such as repetitive pauses in breathing during
sleep, and it is usually associated with the hypoxemia syndrome, which might cause harmful
consequences such as daytime tiredness, increased risk of strokes and cardiovascular diseases, and
even sudden apnea [2, 3]. However, an individual with OSA is rarely aware of having
difficulty breathing, so the symptoms may last for years or even decades without
identification.
In recent years, several studies have shown a relationship between snoring and OSA, which is
usually related to loud and heavy snoring. Snoring is the most common symptom of
OSA, occurring in 70% to 95% of patients [4]. Earlier studies [5] indicated that snoring may play a
key role in diagnosing OSA and in differentiating between healthy subjects and OSA patients. It is possible
to analyze a patient's sleep acoustic characteristics via whole-night polysomnography (PSG) [6] records,
but this requires a full-night diagnosis while the individual is connected to numerous facilities in the
diagnosis room. PSG is also expensive and difficult to implement for every patient. Therefore,
a fast and efficient method for snore and non-snore detection is urgently needed.

* Corresponding author. E-mail: beyond@jnu.ac.kr
Several snoring detection methods have been developed in recent years, all with
encouraging performance. Duckitt, Tuomi and Niesler [7] applied a speech recognition
technique to snoring detection (using 6 simple snoring subjects). Mel-frequency cepstral
coefficient (MFCC) features were extracted and then classified using a hidden Markov model (HMM),
which achieved a classification rate of 89%. Cavusoglu et al. [8] developed a method that applies
principal component analysis (PCA) to obtain 2-dimensional primary components from 15-dimensional
sub-band spectral energy vectors, with robust linear regression (RLR) for classification.
The detection accuracy was 90.2% (using 15 subjects for design and testing respectively). Dafna et al. [9]
proposed a Gaussian mixture model (GMM) based method for snoring detection that uses 40-dimensional
feature vectors built from MFCC, time domain and energy features. The method produced a
detection rate of 98.1% for snore and non-snore.
To develop a fast and efficient method for snoring detection, we propose simple
features for classification: the normalized mean and standard deviation of sub-band
spectral energy, which show an apparent distinction between snore and the other classes. We then adopt a
Support Vector Machine (SVM) to classify each frame and achieve an accuracy of 97.40% for
snore, 99.90% for breath and 100% for silence.
Section 2 describes the methods: the overall structure in sub-section 2.1, and the
snoring signal analysis and feature extraction in sub-section 2.2. Section 3 gives the details of the
database and the experimental results, and section 4 gives the conclusion.
2. Methods
2.1. Overall structure
The raw sleep recordings are processed by the proposed detection system shown in
Fig. 1. The overall system is composed of the following steps.
1) As shown in Fig. 1, a breathing sound signal is segmented into frames of constant size, and
spectral features are obtained from each segment.
2) In the training stage, all features tagged as silence, snore, inspiration or expiration are applied to
the SVM classifier, and SVM training yields the support vectors for each class.
3) In the testing stage, an input feature vector is classified into one of the breathing modes by the SVM
classifier.
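
The framing in step 1 can be sketched in a few lines. This is a minimal sketch and not the authors' implementation: it assumes NumPy, borrows the 128 ms Hamming window, 50% overlap and 16 kHz sampling rate stated elsewhere in the paper, and uses a hypothetical helper name `frame_signal` with random noise standing in for a recording.

```python
import numpy as np

def frame_signal(x, fs, win_ms=128, overlap=0.5):
    """Split a 1-D signal into Hamming-windowed frames, one frame per row."""
    win = int(round(fs * win_ms / 1000))    # 2048 samples at 16 kHz
    hop = int(round(win * (1 - overlap)))   # 1024 samples, i.e. 64 ms
    if len(x) < win:
        return np.empty((0, win))
    n_frames = 1 + (len(x) - win) // hop
    # Row r holds samples r*hop ... r*hop + win - 1.
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(win)

fs = 16000
x = np.random.default_rng(0).standard_normal(2 * fs)  # 2 s of placeholder noise
frames = frame_signal(x, fs)
print(frames.shape)
```

Each row of `frames` would then be passed to the spectral feature computation of sub-section 2.2.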

Fig. 1. The block diagram of the snore detection system.

Fig. 2. The spectrogram of a sequence of snoring and other sounds.
(Annotated episodes: 1. Inspiration. 2. Expiration. 3. Snore.)

2.2. Feature extraction


By observing snoring samples in the frequency domain, Beck et al. [10] indicate that snore and
breath usually have strong energy in the range of 64-800 Hz. By investigating the spectra in that
band, we found the following more distinctive characteristics.
1) Snoring signals have harmonic components (HCs), and the HCs are very clear in the frequency
range of 50-300 Hz.
2) In the range of 300-550 Hz, the HCs in snoring signals are mixed with breathing noise of
nearly equal strength.
3) In the range of 550-800 Hz, breathing noise is more dominant than the HCs, even though HCs are
still visible in the snoring case.
4) Breathing sounds without snoring have strong power in the range of 300-800 Hz.
5) Expiration produces a slightly stronger and shorter burst of noise than inspiration.
Considering the above observations, we set critical frequencies of 50 Hz, 300 Hz, 550 Hz and 800 Hz.
That is, the band of interest, 50-800 Hz, can be divided into three sub-bands by the critical
frequencies. For the features, we have to consider the tonal property of snoring signals. Tonal signals have
spectral peaks and show high kurtosis in the frequency domain. However, kurtosis is time-consuming
to calculate, so we adopt the standard deviation as the feature for breathing
and snoring detection.
Based on the background above, a 6-dimensional feature vector is proposed by computing the
normalized mean and standard deviation of 3 sub-bands for each frame: 50-300 Hz, 50-550 Hz and
50-800 Hz. In addition, we also experimented with non-overlapped sub-bands to compare with the proposed
method. Specifically, those sub-bands are 50-300 Hz, 300-550 Hz and 550-800 Hz.
Experimentally, the overlapped sub-bands perform better because of the similarity between
expiration and snore in the sub-band of 550-800 Hz. The advantage of the proposed sub-bands is
shown in Fig. 3. Each frame is shaped by a 128 ms Hamming window with 50% overlap. The
features at the j-th frame are then calculated as:

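$$\mu_i = \frac{E\left[\,|F(j,f)| \mid f \in B_i\,\right]}{E\left[\,|F(j,f)| \mid f \in B\,\right]} \qquad (1)$$

$$\sigma_i = \frac{\mathrm{Std}\left[\,|F(j,f)| \mid f \in B_i\,\right]}{\mathrm{Std}\left[\,|F(j,f)| \mid f \in B\,\right]} \qquad (2)$$

where $B_1 = [50, 300]$, $B_2 = [50, 550]$, $B_3 = [50, 800]$, $B = [50, 8000]$ (in Hz), $E[\cdot]$ and
$\mathrm{Std}[\cdot]$ denote the mean and standard deviation over the frequencies of a band, and $F(j,f)$ is
the fast Fourier transform in the sub-band $B_i$ at the j-th frame. A median filter is applied after
computing the energy for each frame. Finally, the 6-dimensional feature vector is applied to the SVM
classifier for training.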
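
A possible reading of the feature computation is sketched below, assuming NumPy. Several details are assumptions and not taken from the paper: the standard deviation is normalized by the full-band standard deviation, the median filter is applied to each feature track with a hypothetical length of 5 frames, and the helper names `subband_features` and `median_smooth` are illustrative. A 150 Hz tone with a little noise stands in for a snore-like frame.

```python
import numpy as np

BANDS_HZ = [(50, 300), (50, 550), (50, 800)]  # B1, B2, B3 from the text
FULL_BAND_HZ = (50, 8000)                     # B from the text

def subband_features(frames, fs):
    """Return (n_frames, 6): three normalized means, then three normalized stds."""
    mag = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / fs)

    def band(lo, hi):
        return mag[:, (freqs >= lo) & (freqs <= hi)]

    eps = 1e-12  # guards against division by zero on silent frames
    full = band(*FULL_BAND_HZ)
    full_mean = full.mean(axis=1) + eps
    full_std = full.std(axis=1) + eps  # assumption: std normalized by full-band std
    means = [band(lo, hi).mean(axis=1) / full_mean for lo, hi in BANDS_HZ]
    stds = [band(lo, hi).std(axis=1) / full_std for lo, hi in BANDS_HZ]
    return np.stack(means + stds, axis=1)

def median_smooth(feats, k=5):
    """Median-filter every feature column over k frames (k odd; k=5 is assumed)."""
    pad = k // 2
    padded = np.pad(feats, ((pad, pad), (0, 0)), mode="edge")
    windows = np.stack([padded[i:i + len(feats)] for i in range(k)], axis=0)
    return np.median(windows, axis=0)

fs, win = 16000, 2048
t = np.arange(win) / fs
rng = np.random.default_rng(0)
tone = np.sin(2 * np.pi * 150 * t) * np.hamming(win)   # energy inside B1
frames = np.tile(tone, (3, 1)) + 0.01 * rng.standard_normal((3, win))
feats = median_smooth(subband_features(frames, fs))
print(feats.shape)
```

Because the tone's energy lies inside 50-300 Hz, the normalized mean shrinks as the nested band widens from B1 to B3 while staying above 1.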
3. Experimental Results
A portable recorder was used to record 8 hours of overnight sleep audio from each of
four subjects at a sampling frequency of 16 kHz with 16 bits per sample. All of these subjects have an
acknowledged snoring habit, providing approximately 32 hours of recordings. The database, which contains
sufficient snoring episodes for our analysis and experiments, is categorized into 4 classes: expiration,
inspiration, snore and silence. To determine the feasibility of our method for snoring and apnea
detection, only these primary classes in the database are selected.

Fig. 3. The feature details in different sub-bands.
(a) and (b) show the feature values of the sub-band 550-800 Hz. (c) and (d) show the feature values of the sub-band
50-800 Hz, using the same signal sequence as in Fig. 2.

To obtain the experimental data, three main steps are employed:
1) From all four 8-hour recordings, randomly extract episodes that belong to each
class and annotate them accordingly.
2) Combine the extracted episodes into 4 new audio datasets, so that each new audio
dataset contains only one type of sound.
3) Divide each new audio dataset into 2 parts: the first half for training and the rest for
testing.
The experiment with the proposed approach was conducted using the datasets shown in Table 1.
The data length of each class is given in this table, and the length in each cell is converted into frames
using a frame size of 128 ms with 50% overlap. The individuals in the training and testing datasets are
the same, while the training and testing data themselves do not overlap.
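
The conversion from seconds to frames can be checked with a short calculation. This sketch assumes that one frame is counted per 64 ms hop (128 ms window, 50% overlap) and that partial hops are dropped; the helper name `n_frames` is hypothetical. Under that assumption the durations listed in Table 1 reproduce the frame counts listed there.

```python
HOP_MS = 64  # 128 ms window with 50% overlap: one new frame every 64 ms

def n_frames(duration_s, hop_ms=HOP_MS):
    """Number of whole hops that fit into a recording of duration_s seconds."""
    return int(duration_s * 1000 // hop_ms)

# Total length per class in seconds, as listed in Table 1.
totals_s = {"expiration": 360, "inspiration": 320, "silence": 410, "snore": 670}

# (total frames, frames in each half) per class.
counts = {name: (n_frames(t), n_frames(t / 2)) for name, t in totals_s.items()}
for name, (total, half) in counts.items():
    print(f"{name}: {total} frames in total, {half} per half")
```

A strict count of complete 128 ms windows would be about one frame lower per recording, so the table appears to count hops, not full windows.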
The 6-dimensional feature vector is calculated for the experimental datasets in Table 1 and applied to
the SVM for training and classification. The results are then compared with the ground truth to calculate the
accuracy. The confusion matrix is shown in Table 2. In this work, another feature set, Mel-frequency
cepstral coefficients (MFCC), is also implemented for comparison with the proposed features. MFCC is
recommended as the best feature for audio event detection [11], and several related works have used
MFCC with good performance. The MFCC features are 42-dimensional, including the log
energy feature, the 0th cepstral coefficient, and the delta and delta-delta coefficients. They are also
calculated every 64 ms with a 128 ms Hamming window on the same datasets and applied to the same
SVM classifier. The results of using the MFCC features are shown in Table 3.

Table 1
Training and testing dataset information.

Class         Total length            Training               Testing
Expiration    360 s (5625 frames)     180 s (2812 frames)    180 s (2812 frames)
Inspiration   320 s (5000 frames)     160 s (2500 frames)    160 s (2500 frames)
Silence       410 s (6406 frames)     205 s (3203 frames)    205 s (3203 frames)
Snore         670 s (10468 frames)    335 s (5234 frames)    335 s (5234 frames)

Table 2
Results using proposed feature (rows: actual class; columns: classified as).

              Expiration   Inspiration   Silence    Snore
Expiration    95.40%       4.60%         0.00%      0.00%
Inspiration   35.20%       64.60%        0.00%      0.20%
Silence       0.00%        0.00%         100.00%    0.00%
Snore         2.60%        0.00%         0.00%      97.40%

Table 3
Results using MFCC feature (rows: actual class; columns: classified as).

              Expiration   Inspiration   Silence    Snore
Expiration    14.70%       75.60%        0.00%      9.70%
Inspiration   0.10%        99.90%        0.00%      0.00%
Silence       0.00%        5.80%         94.20%     0.00%
Snore         0.00%        5.00%         0.00%      95.00%

Both experiments show that most of the errors occurred between
expiration and inspiration. However, when expiration and inspiration are considered together as breath, the
proposed feature achieves 99.90% for breath detection, while the accuracies for silence and
snore are 100% and 97.40% respectively. Moreover, Table 3 indicates that the proposed feature
outperforms MFCC in snoring classification. Specifically, the MFCC based detection
rates are 94.86% for breath, 94.2% for silence and 95% for snore.
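
The merged breath figures can be recomputed from the confusion matrices. The sketch below assumes that the merged accuracy is weighted by the number of test frames per class from Table 1, and that a breath frame counts as correct when it is assigned to either expiration or inspiration; the function name `breath_accuracy` is hypothetical.

```python
# Test-set frame counts for the two breath classes (Table 1).
TEST_FRAMES = {"expiration": 2812, "inspiration": 2500}

def breath_accuracy(rows):
    """Frame-weighted share (%) of breath frames assigned to either breath class.

    rows maps each actual class to its (expiration %, inspiration %) entries.
    """
    correct = sum(TEST_FRAMES[c] * sum(rows[c]) / 100.0 for c in TEST_FRAMES)
    return 100.0 * correct / sum(TEST_FRAMES.values())

# Entries copied from Table 2 (proposed feature) and Table 3 (MFCC).
proposed = breath_accuracy({"expiration": (95.40, 4.60),
                            "inspiration": (35.20, 64.60)})
mfcc = breath_accuracy({"expiration": (14.70, 75.60),
                        "inspiration": (0.10, 99.90)})
print(f"proposed: {proposed:.3f}%  MFCC: {mfcc:.3f}%")
```

Under these assumptions both values agree with the reported 99.90% and 94.86% to within about 0.01 percentage points.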
4. Discussion and conclusion
In this paper we proposed simple and efficient features for snoring detection. The features are the
6-dimensional normalized sub-band mean power and standard deviation. The performance of
the proposed system is encouraging and comparable with that of the MFCC based system, which is
recommended as the best feature. The advantages of the proposed system are a lower computational load,
low cost and convenient implementation in applications such as bedside devices installed at the patient's
home and smartphone applications.
In the future, we will implement post-processing based on this method, apply the results to snoring
episode detection, and explore other feature extraction methods on a more comprehensive database.
5. References
[1] M. R. Mannarino, F. D. Filippo, and M. Pirro, Obstructive sleep apnea syndrome, European Journal of Internal Medicine, vol. 7, pp. 586-593, 2012.
[2] A. Tarasiuk, S. G. Dotan, T. Simon, T. Tal, A. Oksenberg, and H. Reuveni, Low socioeconomic status is a risk factor for cardiovascular disease among adult obstructive sleep apnea syndrome patients requiring treatment, Chest, vol. 130, pp. 766-773, 2006.
[3] H. K. Yaggi, J. Concato, W. N. Kernan, J. H. Lichtman, L. M. Brass, and V. Mohsenin, Obstructive Sleep Apnea as a Risk Factor for Stroke and Death, New England Journal of Medicine, vol. 353, no. 19, pp. 2034-2041, 2005.
[4] M. Partinen, and T. Telakivi, Epidemiology of obstructive sleep apnea syndrome, Sleep, vol. 15, pp. S1-4, 1992.
[5] N. Ben-Israel, A. Tarasiuk, and Y. Zigel, Nocturnal sound analysis for the diagnosis of obstructive sleep apnea, Proceedings of the International Conference of IEEE Engineering in Medicine and Biology Society 2010, pp. 6146-6149, 2010.
[6] R. Agarwal, and J. Gotman, Digital tools in polysomnography, Journal of Clinical Neurophysiology, vol. 19(2), pp. 136-43, 2002.
[7] W. D. Duckitt, S. K. Tuomi, and T. R. Niesler, Automatic detection, segmentation and assessment of snoring from ambient acoustic data, Physiological Measurement, vol. 27, pp. 1047-1056, 2006.
[8] M. Cavusoglu, M. Kamasak, O. Erogul, T. Ciloglu, Y. Serinagaoglu, and T. Akcam, An efficient method for snore/nonsnore classification of sleep sounds, Physiological Measurement, vol. 28, pp. 841-853, 2007.
[9] E. Dafna, A. Tarasiuk, and Y. Zigel, Automatic Detection of Whole Night Snoring Events Using Non-Contact Microphone, PLoS ONE, vol. 8, no. 12, pp. e84139, 2013.
[10] R. Beck, M. Odeh, A. Oliven, and N. Gavriely, The acoustic properties of snores, European Respiratory Journal, vol. 8(12), pp. 2120-8, 1995.
[11] T. Kinnunen, and H. Li, An overview of text-independent speaker recognition: From features to supervectors, Speech Communication, vol. 52, pp. 12-40, 2010.
