
666 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 14, NO. 2, MARCH 2006

Blind Source Separation Based on a Fast-Convergence Algorithm Combining ICA and Beamforming
Hiroshi Saruwatari, Member, IEEE, Toshiya Kawamura, Tsuyoki Nishikawa, Akinobu Lee, and
Kiyohiro Shikano, Member, IEEE

Abstract: We propose a new algorithm for blind source separation (BSS), in which independent component analysis (ICA) and beamforming are combined to resolve the slow-convergence problem through optimization in ICA. The proposed method consists of the following three parts: (a) frequency-domain ICA with direction-of-arrival (DOA) estimation, (b) null beamforming based on the estimated DOA, and (c) integration of (a) and (b) based on the algorithm diversity in both iteration and frequency domain. The unmixing matrix obtained by ICA is temporally substituted by the matrix based on null beamforming through iterative optimization, and the temporal alternation between ICA and beamforming can realize fast- and high-convergence optimization. The results of the signal separation experiments reveal that the signal separation performance of the proposed algorithm is superior to that of the conventional ICA-based BSS method, even under reverberant conditions.

Index Terms: Beamforming, blind source separation, independent component analysis, microphone array.

Manuscript received July 26, 2002; revised December 21, 2004. This work was supported in part by CREST (Core Research for Evolutional Science and Technology) in Japan. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Walter Kellermann.
The authors are with the Graduate School of Information Science, Nara Institute of Science and Technology, Nara 630-0192, Japan (e-mail: sawatari@is.naist.jp).
Digital Object Identifier 10.1109/TSA.2005.855832
1558-7916/$20.00 © 2006 IEEE

I. INTRODUCTION

SOURCE separation for acoustic signals is the estimation of original sound source signals from the mixed signals observed in each input channel. This technique is applicable in the realization of noise-robust speech recognition and high-quality hands-free telecommunication systems. Methods of achieving the source separation can be classified into two groups: methods based on a single-channel input, and those based on multichannel inputs. As single-channel types of source separation, a method of tracking a formant structure [1], the organization technique for hierarchical perceptual sounds [2], and a method based on auditory scene analysis [3] have been proposed. On the other hand, as a multichannel type of source separation, the method based on array signal processing, e.g., a microphone array system, is one of the most effective techniques [4]. In this system, the directions of arrival (DOAs) of the sound sources are estimated and then each of the source signals is separately obtained using the directivity of the array. The delay-and-sum (DS) array and the adaptive beamformer (ABF) are the most conventional and widely used microphone arrays currently utilized for source separation and noise reduction.

For the high-quality acquisition of audible signals, several microphone array systems based on the DS array have been implemented since the 1980s [5]. Recently, many DS array systems with talker localization have been implemented for hands-free telecommunications or speech recognition [6]–[9]. Although the DS array has a simple structure, it requires a large number of microphones to achieve high performance, particularly in low-frequency regions. Thus, the degradation of separated signals at low frequencies cannot be avoided in these array systems.

In order to further improve the performance using more efficient methods than the DS array, the ABF has been introduced [10]–[12]. The goal of the adaptive algorithm is to determine the optimum directions of the nulls under the specific constraint that the desired signal arriving from the look direction is not significantly distorted. This method can improve the signal-separation performance with even a small array in comparison with that of the DS array. The ABF, however, has the following drawbacks. (a) The look direction of each signal which is separated must be determined in the adaptation process. Thus, the DOAs of the separated sound source signals must be determined in advance. (b) The adaptation procedure should be performed during breaks in the target signal to avoid any distortion of separated signals. However, we cannot predict signal breaks in conventional use. These requirements arise from the fact that the conventional ABF is based on supervised adaptive filtering, and this significantly limits the applicability of the ABF to source separation in practical applications.

In recent years, alternative source-separation approaches have been proposed by researchers who do not use array signal processing but a specialized branch of information theory, i.e., information-geometry theory [13], [14]. Blind source separation (BSS) is the approach for estimating original source signals using only the mixed signals observed in each input channel, where the independence among the source signals is mainly used for the separation. This technique is based on unsupervised adaptive filtering [14], and provides us with extended flexibility in that the source-separation procedure requires no training sequences and no a priori information on the DOAs of the sound sources. The early contributory studies on the BSS performed by Cardoso and Jutten [15], [16] used high-order statistics of the signals for measuring the independence. Comon clearly defined the term independent component analysis (ICA), and presented an algorithm that measures independence among the source signals [17]. In recent works on the ICA-based BSS, several methods in which the complex-valued unmixing matrices are calculated in the frequency domain have
been proposed to deal with the arrival lags among the elements of the microphone array system [18]–[21]. The ICA-based BSS approach seems to be a very flexible and effective technique for the source separation, but it has an inherent disadvantage in that there is difficulty with the slow convergence of nonlinear optimization [22].

To resolve this problem, in this paper, we describe a new algorithm for BSS in which ICA and beamforming are combined. The proposed method consists of the following three parts: (a) frequency-domain ICA with estimation of the DOA of the sound source, (b) null beamforming based on the estimated DOA, and (c) integration of (a) and (b) based on the algorithm diversity in both iteration and frequency domain. The temporal utilization of null beamforming through ICA iterations can realize fast- and high-convergence optimization. The results of the signal separation experiments reveal that the signal separation performance of the proposed algorithm is superior to that of the conventional ICA-based BSS method, and the utilization of null beamforming in ICA is effective for improving the separation performance and convergence, even under reverberant conditions.

In the similar context of a combination technique of BSS and beamforming, Parra et al. have proposed methods [23], [24] in which geometric beamforming is utilized as a specific spatial constraint in the conventional BSS. Indeed, their methods appear to be effective in separating the sound sources, particularly when the room reverberation is relatively short. However, unlike our proposed method, it is not clear that their methods can contribute toward an improvement of convergence in the filter updating. It is also worth mentioning that their methods have an inherent drawback in that all DOAs of sources should be previously identified (or known [25]) to construct the geometric beamforming, and thus additional and redundant sensors are required. For example, in [23], regarding the separation of only two sources, they needed eight microphones which are mainly used for DOA estimation. This may prevent their methods from being applied to a conventional BSS problem where the number of sources is generally equal to that of sensors. On the contrary, our proposed method can still work in theory and practice under such a condition for sources and sensors.

Several approaches to address a source permutation problem have been recently proposed as another possibility on the utilization of beamforming in the ICA framework [26]–[29]. It is indicated that spatial information is very valid for solving the source ordering ambiguity inherent in the frequency-domain ICA. However, these approaches have nothing to do with ICA-based filter optimization itself (they are only concerned with the permutation problem), and cannot contribute to the improvement of convergence in ICA. As far as we know, there were no detailed studies on a direct application of the ICA-beamforming combination to the convergence improvement before our proposed method.

The rest of this paper is organized as follows. In Sections II and III, the formulation for the general BSS problems and the principle of the proposed method are explained. In Sections IV and V, the signal separation experiments are described. Following a discussion on the results of the experiments, we present our conclusions in Section VI.

II. DATA MODEL AND CONVENTIONAL BSS METHOD

A. Sound Mixing Model of Microphone Array

In this study, a straight-line array is assumed. The coordinates of the elements are designated $d_k$ $(k = 1, \ldots, K)$, and the DOAs of multiple sound sources are designated $\theta_l$ $(l = 1, \ldots, L)$ (see Fig. 1).

Fig. 1. Configuration of a microphone array and signals.

Multiple mixed signals are observed at the microphone array, and these signals are converted into discrete time series via an A/D converter. By applying the discrete time Fourier transform, we can express the observed signals, in which multiple source signals are linearly mixed with additive noise, as follows in the frequency domain:

$$\mathbf{X}(f) = \mathbf{A}(f)\,\mathbf{S}(f) + \mathbf{N}(f) \tag{1}$$

where $\mathbf{X}(f)$ is the observed signal vector, $\mathbf{S}(f)$ is the source signal vector, and $\mathbf{A}(f)$ is the mixing matrix; these are defined as

$$\mathbf{X}(f) = [X_1(f), \ldots, X_K(f)]^{\mathrm{T}} \tag{2}$$

$$\mathbf{S}(f) = [S_1(f), \ldots, S_L(f)]^{\mathrm{T}} \tag{3}$$

$$\mathbf{A}(f) = \begin{bmatrix} A_{11}(f) & \cdots & A_{1L}(f) \\ \vdots & \ddots & \vdots \\ A_{K1}(f) & \cdots & A_{KL}(f) \end{bmatrix} \tag{4}$$

Also, $\mathbf{N}(f)$ is the additive noise term which generally represents, for example, an environment noise and/or a sensor noise.

We introduce the model for dealing with the arrival lags among each of the elements of the microphone array. In this case, $\mathbf{A}(f)$ is assumed to be complex-valued. Hereafter, for convenience, we consider only the relative lags among each of the elements with respect to the arrival time of the wavefront of each sound source, and neglect the pure delay between the microphone and sound source. Also, $\mathbf{S}(f)$ is regarded as being identical to the source signals observed at the origin. For example, by neglecting the effect of the room reverberation, we can rewrite the elements in the mixing matrix (4) as the following simple expression:

$$A_{kl}(f) = \exp\left[j 2\pi f \tau_{kl}\right], \qquad \tau_{kl} = \frac{1}{c}\, d_k \sin\theta_l \tag{5}$$

where $\tau_{kl}$ is the arrival lag with respect to the $l$th source signal from the $\theta_l$ direction, which is observed at the $k$th microphone at the coordinate of $d_k$. Also, $c$ is the velocity of sound. If the effect of room reverberation is considered, the elements in the
mixing matrix, $A_{kl}(f)$, are given by more complicated values depending on the room reflections.

B. Overview of Conventional Frequency-Domain ICA

Here we consider the case in which the number of sound sources $L$ equals the number of microphones $K$, i.e., $K = L$. In addition, similarly to the conventional ICA context, we assume that the additive noise $\mathbf{N}(f)$ is negligible in (1).

In the frequency-domain ICA, first, the convolutive mixture is simplified down to instantaneous mixtures in the time-frequency domain, i.e., the short-time Fourier transform for (1) yields

$$\mathbf{X}(f,t) = \mathbf{A}(f)\,\mathbf{S}(f,t) \tag{6}$$

where $\mathbf{X}(f,t) = [X_1(f,t), \ldots, X_K(f,t)]^{\mathrm{T}}$ and $\mathbf{S}(f,t) = [S_1(f,t), \ldots, S_L(f,t)]^{\mathrm{T}}$ are the time-frequency representations of the observed signals and source signals, respectively (see Fig. 2). Also, $t$ represents the time dependence of the short-time analysis. The frequency-domain ICA involves an assumption that the time series of the source signals, $S_l(f,t)$ $(l = 1, \ldots, L)$, at each frequency are mutually independent.

Fig. 2. Blind source separation procedure performed in frequency-domain ICA.

Next, signal separation using the complex-valued unmixing matrix $\mathbf{W}(f)$ is conducted as

$$\mathbf{Y}(f,t) = \mathbf{W}(f)\,\mathbf{X}(f,t) \tag{7}$$

where

$$\mathbf{Y}(f,t) = [Y_1(f,t), \ldots, Y_L(f,t)]^{\mathrm{T}} \tag{8}$$

$$\mathbf{W}(f) = \begin{bmatrix} W_{11}(f) & \cdots & W_{1K}(f) \\ \vdots & \ddots & \vdots \\ W_{L1}(f) & \cdots & W_{LK}(f) \end{bmatrix} \tag{9}$$

The unmixing matrix is optimized in ICA so that the time series output $\mathbf{Y}(f,t)$ becomes mutually independent; this optimization procedure is described in Section II-C. If the optimization is completed, the separated signal $\mathbf{Y}(f,t)$ becomes the original source signal $\mathbf{S}(f,t)$ up to a source permutation and gain scaling, as expressed by

$$\mathbf{Y}(f,t) = \mathbf{P}\,\mathbf{D}\,\mathbf{S}(f,t) \tag{10}$$

where $\mathbf{P}$ is an arbitrary permutation matrix, and $\mathbf{D}$ is a diagonal matrix which has arbitrary diagonal entries for each frequency. The source permutation and gain indeterminacy are problems inherent in frequency-domain ICA, but several methods for mitigating these problems have been presented [19], [21], [26]–[29].

In an actual implementation of frequency-domain ICA, we perform the signal separation procedure as described below. First, the short-time analysis of observed signals is conducted using the discrete Fourier transform (DFT) frame by frame. By plotting the spectral values in a frequency bin of one microphone input frame by frame, we consider them as a time series $X_k(f,t)$. The other inputs in the same frequency bin are dealt with in the same manner. Next, we perform the estimation of the optimal $\mathbf{W}(f)$ and the separation procedure (7) with respect to all frequency bins. Finally, by applying the inverse DFT and the overlap-add technique to the separated time series $Y_l(f,t)$, we reconstruct the resultant source signals in the time domain.

C. Estimation of Unmixing Matrix

Considering the blind estimation of the unmixing matrix, we use the optimization algorithm based on the minimization of the Kullback–Leibler divergence between the joint probability density function (PDF) of $\mathbf{Y}(f,t)$ and the product of marginal PDFs of $Y_l(f,t)$. This algorithm was basically introduced by Murata and Ikeda for on-line learning [19], but in this study it is modified by the authors for off-line learning with stable convergence. The optimal $\mathbf{W}(f)$ is obtained by using the following iterative equation:

$$\mathbf{W}_{i+1}(f) = \eta \left[ \mathrm{diag}\!\left( \left\langle \boldsymbol{\Phi}(\mathbf{Y}(f,t))\, \mathbf{Y}^{\mathrm{H}}(f,t) \right\rangle_t \right) - \left\langle \boldsymbol{\Phi}(\mathbf{Y}(f,t))\, \mathbf{Y}^{\mathrm{H}}(f,t) \right\rangle_t \right] \mathbf{W}_i(f) + \mathbf{W}_i(f) \tag{11}$$

where $\langle \cdot \rangle_t$ denotes the time-averaging operator, $i$ is used to express the value of the $i$th step in the iterations, and $\eta$ is the step-size parameter. Many kinds of nonlinear vector functions $\boldsymbol{\Phi}(\cdot)$ have been proposed, and it is well known that $\tanh(\cdot)$ or the sigmoid function is appropriate for super-Gaussian sources such as a speech signal [13]. In this study, we define the nonlinear vector function $\boldsymbol{\Phi}(\cdot)$ as

$$\boldsymbol{\Phi}(\mathbf{Y}(f,t)) = \left[ \Phi(Y_1(f,t)), \ldots, \Phi(Y_L(f,t)) \right]^{\mathrm{T}} \tag{12}$$

$$\Phi(Y_l(f,t)) = \left[ 1 + \exp\!\left( -Y_l^{(\mathrm{R})}(f,t) \right) \right]^{-1} + j \left[ 1 + \exp\!\left( -Y_l^{(\mathrm{I})}(f,t) \right) \right]^{-1} \tag{13}$$

where $Y_l^{(\mathrm{R})}(f,t)$ and $Y_l^{(\mathrm{I})}(f,t)$ are the real and imaginary parts of $Y_l(f,t)$, respectively. The nonlinear function given by (12) and (13) indicates that the nonlinearity is applied to the real and imaginary parts of the complex-valued signals separately. This type of complex-valued nonlinear function has been introduced by, e.g., Smaragdis [20] for the frequency-domain ICA, where it can be assumed in speech signals that the real (or imaginary) parts of the time-frequency representations of sources are mutually independent. Note that there is another kind of nonlinear function recently proposed by Sawada et al. [30], which is directly applied to absolute values of the complex-valued signals. Both [20] and [30] show that the speech signal separation can be well treated in the frequency domain by using such kinds of nonlinear functions.
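The batch update described in Section II-C can be illustrated for a single frequency bin as follows. This is only a minimal sketch using the Python standard library: the function names (`sigmoid`, `phi`, `matmul`, `ica_step`) and the frame-list data layout are hypothetical, and the update is written in the usual form "step size times (diagonal part minus full correlation) times the current matrix, plus the current matrix", with the sigmoid applied separately to real and imaginary parts.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def phi(y):
    """Split-complex nonlinearity: sigmoid on the real and imaginary parts."""
    return complex(sigmoid(y.real), sigmoid(y.imag))


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def ica_step(W, X, eta):
    """One batch ICA update for a single frequency bin.

    W   : L x K unmixing matrix (list of rows of complex numbers)
    X   : list of T observed frames, each a length-K list (assumed layout)
    eta : step-size parameter
    """
    L, T = len(W), len(X)
    # Separated frames Y(f,t) = W(f) X(f,t).
    Y = [[sum(W[l][k] * x[k] for k in range(len(x))) for l in range(L)] for x in X]
    # Time average <Phi(Y) Y^H>_t.
    R = [[sum(phi(y[a]) * y[b].conjugate() for y in Y) / T for b in range(L)]
         for a in range(L)]
    # diag(R) - R has a zero diagonal, so only cross terms drive the update.
    G = [[(R[a][a] if a == b else 0) - R[a][b] for b in range(L)] for a in range(L)]
    GW = matmul(G, W)
    return [[W[a][k] + eta * GW[a][k] for k in range(len(W[0]))] for a in range(L)]
```

Because the bracketed term has a zero diagonal, a single step changes each row of the matrix only through its cross-correlation with the other outputs; repeating `ica_step` over iterations, independently in every frequency bin, corresponds to the off-line learning outlined above.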

III. PROPOSED ALGORITHM

A. Motivation and Strategy

The conventional ICA method inherently has a significant disadvantage which is due to slow and poor convergence through nonlinear optimization in ICA, particularly when introducing a poor initial setting of the unmixing matrix.

Meanwhile, one of the authors has recently provided an insight into the close relationship between ICA and the fixed null beamformer [31]. It is reported that, after the filter update has been completed, ICA with a small number of sensors (e.g., $K = 2$) often provides directional nulls against the undesired source signals, unlike the traditional DS array which enhances the target signal via the directional lobe. Indeed, the null beamforming is approximately optimal for the signal separation when the effect of the room reverberation is negligible, but this optimality cannot hold under reverberant conditions because the exact signal reduction cannot be achieved by using only the directional nulls. The null-beamforming approach, however, still has the advantage that there is no difficulty with respect to the slow convergence of optimization because the null beamformer is determined by using only DOA information without independence between sound sources.

The above-mentioned findings motivate us to combine ICA and null beamforming. That is, a specific unmixing matrix which is designed on the basis of null beamforming can assist ICA in the convergence and yield a good initial value of $\mathbf{W}(f)$ with regard to an advance removal of the direct sound of the interference. In this paper, we propose an algorithm based on the temporal alternation of learning between ICA and null beamforming; the unmixing matrix $\mathbf{W}(f)$ obtained through ICA is temporally substituted by the matrix based on null beamforming for a temporal initialization or acceleration of the iterative optimization.

It is worth noting that even in the proposed algorithm, DOA information for each source is needed before the construction of the null beamformer, similarly to other beamforming techniques. However, this DOA estimation has been considered a difficult problem under common BSS tasks where the number of sources, $L$, equals that of sensors, $K$. For instance, the traditional high-resolution DOA estimators, e.g., MUSIC and minimum variance methods [32], cannot be applied because these methods require the condition that $K > L$. To achieve the DOA estimation blindly in the case of $K = L$, we introduce a new combination in which the DOA estimation follows a one-time ICA iteration and can be performed by using the unmixing matrix obtained from ICA. This DOA estimation method is mainly based on our earlier finding that the directional null is steered to the DOA of the suppressed source in ICA. Consequently, we can approximately estimate the DOAs only by finding the null directions in the directivity patterns obtained from ICA. Although the proposed combination approach partly includes heuristics with no guarantee of mathematically exact convergence, the effectiveness will be experimentally discussed in Sections IV and V.

B. Basic Algorithm in Case of $K = L = 2$

First, in order to explain an overview of our basic idea, we give full details of the proposed algorithm especially in the case of $K = L = 2$, where several procedures can be simplified and easily implemented. The extension of the proposed algorithm into the general case $K = L > 2$ will be described in the next section.

The proposed algorithm with $K = L = 2$ is conducted using the following steps with respect to all frequency bins in parallel (see Fig. 3).

Fig. 3. Proposed algorithm combining frequency-domain ICA and beamforming.

[Step 1: Initialization] Set the initial $\mathbf{W}_i(f)$, i.e., $\mathbf{W}_0(f)$, to an arbitrary value, where the subscript $i$ is set to be 0.

[Step 2: 1-time ICA iteration] Optimize $\mathbf{W}_i(f)$ using the following one-time ICA iteration:

$$\mathbf{W}^{(\mathrm{ICA})}_{i+1}(f) = \eta \left[ \mathrm{diag}\!\left( \left\langle \boldsymbol{\Phi}(\mathbf{Y}(f,t))\, \mathbf{Y}^{\mathrm{H}}(f,t) \right\rangle_t \right) - \left\langle \boldsymbol{\Phi}(\mathbf{Y}(f,t))\, \mathbf{Y}^{\mathrm{H}}(f,t) \right\rangle_t \right] \mathbf{W}_i(f) + \mathbf{W}_i(f) \tag{14}$$

where the superscript "(ICA)" is used to express the fact that the unmixing matrix is obtained by ICA, whereas $\mathbf{W}_i(f)$ in (14) originated from either ICA or null beamforming, as described in step 5.

[Step 3: DOA estimation] Estimate DOAs of the sound sources by utilizing the directivity pattern of the array system. The directivity pattern for the $l$th output $Y_l(f,t)$ is designated by $F_l(f,\theta)$, which is generally obtained by the multiplication of array weights and a steering vector as [32]

$$F_l(f,\theta) = \left[ W^{(\mathrm{ICA})}_{l1}(f),\; W^{(\mathrm{ICA})}_{l2}(f) \right] \mathbf{a}(f,\theta) \tag{15}$$
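The directivity pattern of Step 3 and the null search motivated in Section III-A can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the far-field steering vector follows the arrival-lag model of Section II-A, while the sound velocity of 340 m/s, the 0.1-degree search grid, and all function names are assumptions made for the example.

```python
import cmath
import math

SOUND_SPEED = 340.0  # m/s; assumed value, not stated in the text


def steering_vector(f, d, theta):
    """Far-field steering vector for microphone coordinates d (m) and angle theta (rad)."""
    return [cmath.exp(2j * math.pi * f * dk * math.sin(theta) / SOUND_SPEED) for dk in d]


def directivity(w_row, f, d, theta):
    """Directivity pattern of one output: array weights times the steering vector."""
    return sum(w * a for w, a in zip(w_row, steering_vector(f, d, theta)))


def null_direction(w_row, f, d, n_grid=1801):
    """Grid search over [-90, 90] degrees for the angle that minimizes the gain."""
    best_theta, best_gain = None, float("inf")
    for i in range(n_grid):
        theta = math.radians(-90.0 + 180.0 * i / (n_grid - 1))
        gain = abs(directivity(w_row, f, d, theta))
        if gain < best_gain:
            best_theta, best_gain = theta, gain
    return best_theta
```

Applying `null_direction` to each row of the unmixing matrix in every frequency bin gives per-bin candidates for the source directions; a grid search is used here only for simplicity, and a finer or adaptive search could replace it.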

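In (15), $\mathbf{a}(f,\theta)$ is the steering vector, which is defined by

$$\mathbf{a}(f,\theta) = \left[ \exp\!\left( j 2\pi f d_1 \sin\theta / c \right),\; \exp\!\left( j 2\pi f d_2 \sin\theta / c \right) \right]^{\mathrm{T}} \tag{16}$$

In the directivity patterns, directional nulls exist in only two particular directions. Accordingly, by obtaining statistics with respect to the directions of nulls at all frequency bins, we can estimate the DOAs of the sound sources. The DOA of the $l$th sound source, $\hat{\theta}_l$, can be estimated as

$$\hat{\theta}_l = \frac{2}{N} \sum_{m=1}^{N/2} \theta_l(f_m) \tag{17}$$

where $N$ is the total number of points of DFT, and $\theta_l(f_m)$ represents the DOA of the $l$th sound source at the $m$th frequency bin. These are given by

$$\theta_1(f_m) = \min\!\left[ \arg\min_{\theta} |F_1(f_m,\theta)|,\; \arg\min_{\theta} |F_2(f_m,\theta)| \right] \tag{18}$$

$$\theta_2(f_m) = \max\!\left[ \arg\min_{\theta} |F_1(f_m,\theta)|,\; \arg\min_{\theta} |F_2(f_m,\theta)| \right] \tag{19}$$

where $\min[x, y]$ ($\max[x, y]$) is a function for obtaining the smaller (larger) value between $x$ and $y$. We conduct this procedure on a specific ordering basis, such that the smaller $\theta$ corresponds to the first sound source and the larger $\theta$ corresponds to the second sound source, as depicted in Fig. 1.

[Step 4: Beamforming] Construct an alternative matrix for signal separation, $\mathbf{W}^{(\mathrm{BF})}(f)$, based on the null-beamforming technique where the DOA information obtained in the ICA section is used. Hereafter, the $(l,k)$th element of $\mathbf{W}^{(\mathrm{BF})}(f)$ is written as $W^{(\mathrm{BF})}_{lk}(f)$. In $W^{(\mathrm{BF})}_{1k}(f)$, we assume that the look direction is $\hat{\theta}_1$ and the directional null is steered to $\hat{\theta}_2$ (see solid line in Fig. 4). Also, in $W^{(\mathrm{BF})}_{2k}(f)$, the look direction is $\hat{\theta}_2$ and the directional null is steered to $\hat{\theta}_1$ (see broken line in Fig. 4). Under these assumptions, the unmixing matrix $\mathbf{W}^{(\mathrm{BF})}(f)$ satisfies the following equation:

$$\mathbf{W}^{(\mathrm{BF})}(f) \left[ \mathbf{a}(f,\hat{\theta}_1),\; \mathbf{a}(f,\hat{\theta}_2) \right] = \mathbf{I} \tag{20}$$

where $\mathbf{I}$ is a $2 \times 2$ identity matrix. From (20), we can obtain $\mathbf{W}^{(\mathrm{BF})}(f)$ as

$$\mathbf{W}^{(\mathrm{BF})}(f) = \left[ \mathbf{a}(f,\hat{\theta}_1),\; \mathbf{a}(f,\hat{\theta}_2) \right]^{-1} \tag{21}$$

Fig. 4. Example of directivity patterns constructed by beamforming in Step 4.

[Step 5: Diversity using cost function] In order to integrate the subband ICA with null beamforming, we introduce the following strategy for selecting the most suitable unmixing matrix in each frequency bin and at each iteration point, i.e., algorithm diversity in both iteration and frequency domain. As a cost function for achieving the diversity, we introduce a coherence function between the separated signals, which is basically defined by

$$\gamma(f) = \frac{\left| \left\langle Y_1(f,t)\, Y_2^{*}(f,t) \right\rangle_t \right|}{\sqrt{\left\langle |Y_1(f,t)|^2 \right\rangle_t \left\langle |Y_2(f,t)|^2 \right\rangle_t}} \tag{22}$$

where $Y_1(f,t)$ and $Y_2(f,t)$ are the separated signals defined by (7)–(9). We calculate the estimated coherence function once for $\mathbf{Y}(f,t) = \mathbf{W}^{(\mathrm{ICA})}_{i+1}(f)\,\mathbf{X}(f,t)$ and once for $\mathbf{Y}(f,t) = \mathbf{W}^{(\mathrm{BF})}(f)\,\mathbf{X}(f,t)$; these are written as $\gamma^{(\mathrm{ICA})}(f)$ and $\gamma^{(\mathrm{BF})}(f)$, respectively. In fact, the coherence function cannot indicate the exact independence between sources, unlike ICA. However, we use this function to assess the source independence approximately because of the practical advantage that the coherence function does not include any nonlinear calculations, which often entail large computational complexity. Note that a coherence function of the same kind has been previously introduced as a criterion of the separation filter optimization in BSS for acoustic signals [24].

If the expected separation performance of beamforming is superior to that of ICA, the following condition holds, $\gamma^{(\mathrm{ICA})}(f) > \gamma^{(\mathrm{BF})}(f)$; otherwise, $\gamma^{(\mathrm{ICA})}(f) \le \gamma^{(\mathrm{BF})}(f)$. Thus, an observation of the conditions yields the following algorithm:

$$\mathbf{W}_{i+1}(f) = \begin{cases} \mathbf{W}^{(\mathrm{ICA})}_{i+1}(f), & \gamma^{(\mathrm{ICA})}(f) \le \gamma^{(\mathrm{BF})}(f) \\ \mathbf{W}^{(\mathrm{BF})}(f), & \gamma^{(\mathrm{ICA})}(f) > \gamma^{(\mathrm{BF})}(f) \end{cases} \tag{23}$$

If the $(i+1)$th iteration is the final iteration, go to step 6; otherwise, go back to step 2 and repeat the ICA iteration, inserting $\mathbf{W}_{i+1}(f)$ as given by (23) into $\mathbf{W}_i(f)$ in (14) with an increment of $i$.

[Step 6: Ordering and scaling] Using the DOA information obtained in step 3, we can detect and correct the source permutation and the gain inconsistency [26]. From the directivity patterns in all frequency bins, we collect those in which the directional null is steered toward one of the estimated source directions. Also, we collect the other directivity patterns, in which the directional null is steered toward the other estimated source direction. By performing this procedure, we can resolve the permutation problem. The gain inconsistency problem is resolved by normalizing the directivity patterns according to the gain in each source direction after the classification. The resultant separated signals can be obtained as follows:

$$\begin{bmatrix} Y_1(f,t) \\ Y_2(f,t) \end{bmatrix} = \begin{bmatrix} 1 / F_1(f,\hat{\theta}_1) & 0 \\ 0 & 1 / F_2(f,\hat{\theta}_2) \end{bmatrix} \mathbf{W}(f) \begin{bmatrix} X_1(f,t) \\ X_2(f,t) \end{bmatrix} \tag{24}$$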
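Steps 4 and 5 for two microphones and two sources can be sketched as follows. This is a hedged illustration under stated assumptions, not the authors' code: the null beamformer is taken as the inverse of the 2 x 2 matrix of far-field steering terms at the estimated DOAs, the coherence is computed as a normalized cross-correlation magnitude, the sound velocity is assumed to be 340 m/s, and every function name is hypothetical.

```python
import cmath
import math

SOUND_SPEED = 340.0  # m/s; assumed value, not stated in the text


def steering(f, dk, theta):
    """Phase term exp(j 2 pi f d_k sin(theta) / c) for one microphone."""
    return cmath.exp(2j * math.pi * f * dk * math.sin(theta) / SOUND_SPEED)


def null_beamformer(f, d, theta_hat):
    """2 x 2 matrix W_bf with W_bf A_hat = I, A_hat[k][l] = steering(f, d[k], theta_hat[l])."""
    a = [[steering(f, d[k], theta_hat[l]) for l in range(2)] for k in range(2)]
    # The determinant approaches 0 at low frequencies or for closely spaced DOAs.
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


def separate(W, X):
    """Apply Y(f,t) = W(f) X(f,t) to every frame of one frequency bin."""
    return [[W[l][0] * x[0] + W[l][1] * x[1] for l in range(2)] for x in X]


def coherence(Y):
    """Magnitude coherence between the two separated time series."""
    cross = abs(sum(y[0] * y[1].conjugate() for y in Y))
    power = math.sqrt(sum(abs(y[0]) ** 2 for y in Y) * sum(abs(y[1]) ** 2 for y in Y))
    return cross / power


def select_unmixing(W_ica, W_bf, X):
    """Keep the ICA matrix unless beamforming yields the lower coherence."""
    if coherence(separate(W_ica, X)) <= coherence(separate(W_bf, X)):
        return W_ica
    return W_bf
```

In a full loop, `select_unmixing` would be called once per frequency bin after every ICA iteration, and its result fed back as the starting matrix of the next iteration. The comment on the determinant reflects why the microphone spacing matters for the quality of the null filters at low frequencies.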

In (24), $\mathbf{W}(f)$ is the final unmixing matrix obtained in (23), and $F_l(f,\hat{\theta}_l)$ is the resultant directional gain for the $l$th output at the estimated $l$th source direction $\hat{\theta}_l$, which is given by

$$F_1(f,\hat{\theta}_1) = \left[ W_{11}(f),\; W_{12}(f) \right] \mathbf{a}(f,\hat{\theta}_1) \tag{25}$$

$$F_2(f,\hat{\theta}_2) = \left[ W_{21}(f),\; W_{22}(f) \right] \mathbf{a}(f,\hat{\theta}_2) \tag{26}$$

C. Extended Algorithm to Case of $K = L > 2$

In this section, an extension for more than two sources with more than two sensors (i.e., $K = L > 2$) is described. Basically, the straightforward extension can be easily made by substituting $K$-dimensional vectors and $K \times K$ matrices for all of the 2-D vectors and $2 \times 2$ matrices in Section III-B. For example, (15) and (16) are rewritten with the $K \times K$ unmixing matrix $\mathbf{W}^{(\mathrm{ICA})}(f)$ by

$$F_l(f,\theta) = \left[ W^{(\mathrm{ICA})}_{l1}(f), \ldots, W^{(\mathrm{ICA})}_{lK}(f) \right] \mathbf{a}(f,\theta) \tag{27}$$

$$\mathbf{a}(f,\theta) = \left[ \exp\!\left( j 2\pi f d_1 \sin\theta / c \right), \ldots, \exp\!\left( j 2\pi f d_K \sin\theta / c \right) \right]^{\mathrm{T}} \tag{28}$$

Also, (21) can be rewritten as

$$\mathbf{W}^{(\mathrm{BF})}(f) = \left[ \mathbf{a}(f,\hat{\theta}_1), \ldots, \mathbf{a}(f,\hat{\theta}_L) \right]^{-1} \tag{29}$$

As the exception, the DOA estimation part (step 3) in Section III-B requires a large modification. The DOA estimation algorithm presented so far assumes that the nulls are steered in only two directions, and thus we can heuristically sort the directions into two categories, "large (max)" or "small (min)." This procedure is very simple and has the benefit of a low computational cost, but obviously the rule is not applicable in the case of $K = L > 2$.

To overcome the problem, we newly introduce an extended DOA estimation algorithm based on a directional null clustering technique, which can work even in the general case of $K = L > 2$. Step 3 in Section III-B can be modified into the following algorithm step 3'.

[Step 3': DOA estimation in $K = L > 2$] In the $l$th directivity pattern at the $m$th frequency bin, $F_l(f_m,\theta)$, at most $K - 1$ directional nulls can be found. We define the set of DOAs corresponding to the directional nulls as $\Theta_l(f_m)$, which is given by

$$\Theta_l(f_m) = \left\{ \theta \;\middle|\; \frac{\partial |F_l(f_m,\theta)|}{\partial \theta} = 0,\; |F_l(f_m,\theta)| < \varepsilon \right\} \tag{30}$$

where $\varepsilon$ is a positive small value, and $\{ x \mid A, B \}$ represents a set of $x$ which satisfies the conditions $A$ and $B$ simultaneously. $\Theta_l(f_m)$ is evidently a good candidate of source directions. To estimate the DOAs of $L$ sources, we classify $\Theta_l(f_m)$ with all $l$ and $m$ into $L$ categories, and then regard the centroids as the estimated DOAs. This classification can be carried out by using a Lloyd clustering algorithm [33] as follows.

1) Make the whole set of null directions to be classified, as

$$\Theta = \left\{ \theta(1), \theta(2), \ldots, \theta(Q) \right\} \tag{31}$$

where $Q$ is the total number of detected directional nulls and is at most $L(K-1)N/2$.

2) Set initial partitions $\{ p_0, p_1, \ldots, p_L \}$, where $p_0 < p_1 < \cdots < p_L$. Also, the terminal partitions $p_0$ and $p_L$ are fixed at $-\pi/2$ and $\pi/2$, respectively, throughout the algorithm.

3) Given the partitions, calculate the centroids $\hat{\theta}_l$ $(l = 1, \ldots, L)$ as

$$\hat{\theta}_l = \frac{1}{Q_l} \sum_{p_{l-1} \le \theta(q) < p_l} \theta(q) \tag{32}$$

where $Q_l$ denotes the number of $\theta(q)$ under $p_{l-1} \le \theta(q) < p_l$.

4) Given the centroids, update the new partitions as $p_l = (\hat{\theta}_l + \hat{\theta}_{l+1}) / 2$ for $l = 1, \ldots, L-1$.

5) Go back to Step 3), and repeat the loop in Steps 3)–5) with an appropriate number of iterations. If the centroids do not move, then stop the algorithm. The final centroids are regarded as the resultant estimated DOAs $\hat{\theta}_l$.

As for step 5 in Section III-B, a slight extension should be applied. The cost function (22) is replaced by the following averaged coherence function among $L$ separated signals:

$$\bar{\gamma}(f) = \frac{1}{{}_L C_2} \sum_{l < l'} \frac{\left| \left\langle Y_l(f,t)\, Y_{l'}^{*}(f,t) \right\rangle_t \right|}{\sqrt{\left\langle |Y_l(f,t)|^2 \right\rangle_t \left\langle |Y_{l'}(f,t)|^2 \right\rangle_t}} \tag{33}$$

where ${}_L C_2$ represents the number of combinations in which two entries are selected from $L$ separated signals.

IV. SIMULATION EXPERIMENTS AND RESULTS

In this section, computer-simulation-based BSS experiments are discussed. We use realistic (measured) room impulse responses for the generation of convolutive mixtures. As for the additive noise term $\mathbf{N}(f)$ in (1), we assume that the time-domain representations of the elements in the vector $\mathbf{N}(f)$ are mutually uncorrelated white Gaussian noises, where the ratio of the mixed signal to the noise is set to 40 dB. First, we perform a simple source separation experiment with two microphones and two sources to investigate the basic properties of the proposed method. Secondly, the extended experiment with three microphones and three sources is shown.

A. Conditions for Experiments Under $K = L = 2$

A two-element array with the interelement spacing of 4 cm is assumed. We determined this interelement spacing by considering that: 1) the spacing should be as large as possible so as not to yield the poor null filters in (21) at low frequencies, and 2) the spacing should be smaller than half of the minimum
wavelength in order to avoid the spatial aliasing effect; it corresponds to $8.5/2$ cm ($= 4.25$ cm) in 8 kHz sampling.

Fig. 5. Layout of reverberant room used in simulation experiments ($K = L = 2$).

The speech signals are assumed to arrive from two directions, $-30^\circ$ and $40^\circ$. Two kinds of sentences with 3-s length, spoken by two male and two female speakers selected from the ASJ continuous speech corpus for research [34], are used as the original speech samples. Using these sentences, we obtain 12 combinations with respect to speakers and source directions. In these experiments, we used the following signals as the source signals: 1) the original speech not convolved with the room impulse responses (only considering the arrival lags among microphones), and 2) the original speech convolved with the room impulse responses recorded in two environments characterized by different reverberation times (RTs). Hereafter, we designate the experiments using the signals described in 1) as the nonreverberant tests, and those described in 2) as the reverberant tests. The impulse responses are recorded in a room with variable RT as shown in Fig. 5. The RTs of the impulse responses recorded in the room are 150 and 300 ms, respectively. As for an initial value of $\mathbf{W}(f)$, we constructed a simple DS array in the first row of $\mathbf{W}(f)$ which steers the look direction to $-60^\circ$, and one in the second row of $\mathbf{W}(f)$ which steers the look direction to $60^\circ$.

TABLE I
ANALYSIS CONDITIONS FOR SIGNAL SEPARATION

The analysis conditions of these experiments are summarized in Table I. In this study, we only deal with the off-line (batch) algorithm because it is easy to treat, and forms a basis for the on-line algorithm. Thus, we use a frame shift (2 ms) that is very short relative to the frame length (128 ms) to obtain a large number of time-series samples of $\mathbf{X}(f,t)$, although this is not an efficient approach for a real-time application. It should be mentioned that the off-line algorithm introduced in this experiment requires 1500 frame shifts for 3 s length data within one iteration, and this corresponds to 1500 iterations in the on-line algorithm (e.g., the off-line 100 iterations correspond to the on-line 150 000 iterations). The improvement of the efficiency remains an open problem.

B. Objective Evaluation of Separated Signal Under Nonreverberant Condition

In order to compare the basic performance of the proposed algorithm with that of the conventional method, noise reduction rates (NRRs), defined as the output signal-to-noise ratio (SNR) in decibels minus the input SNR in decibels, are shown in Fig. 6 for different iteration points in ICA when RT is 0 ms. This figure contains the following three curves.

Proposed method: Our proposed BSS method described in Section III-B.

Conventional ICA: The conventional ICA-based BSS method described in Section II-B. This also corresponds to the special case that ICA is always chosen in step 5 of the proposed algorithm, i.e., always $\mathbf{W}_{i+1}(f) = \mathbf{W}^{(\mathrm{ICA})}_{i+1}(f)$ in (23).

Null beamformer: The iteratively optimized null beamformer, which corresponds to the special case that the null beamformer is always chosen in step 5 of the proposed algorithm, i.e., always $\mathbf{W}_{i+1}(f) = \mathbf{W}^{(\mathrm{BF})}(f)$ in (23).

These values are averages of all of the combinations with respect to speakers and source directions.

Fig. 6. Noise reduction rates for different iteration points in proposed method, conventional ICA, and iteratively optimized null beamformer. RT is 0 ms, and $K = L = 2$.

From Fig. 6, it is evident that the proposed algorithm shows a rapid convergence, and the separation performances of the proposed algorithm are superior to those of the conventional ICA-based BSS method at every iteration point, even when considering the additional computational cost of the proposed algorithm (the proposed algorithm has a computational complexity of approximately 1.9-fold that of the conventional ICA). For example, compared with the conventional ICA, the proposed

Fig. 7. Average of estimated DOA at each frequency corresponding to −30° under the nonreverberant condition.

Fig. 8. Deviation of estimated DOA at each frequency corresponding to −30° under the nonreverberant condition.

Fig. 9. Average of estimated DOA at each frequency corresponding to 40° under the nonreverberant condition.

Fig. 10. Deviation of estimated DOA at each frequency corresponding to 40° under the nonreverberant condition.
method can improve the NRR by about 15.2 dB at the 30-iteration point. It is also interesting that the iteratively optimized null beamformer shows almost the same performance curve as that of the proposed method. This implies that the proposed method would select the null beamformer in the diversity part; this will be discussed in detail later.

For the results of DOA estimation, Figs. 7 and 8 show the average and deviation of the estimated DOA at each frequency corresponding to −30°, respectively. Also, Figs. 9 and 10 show the average and deviation of the estimated DOA at each frequency corresponding to 40°, respectively. As shown in these figures, the proposed algorithm can update the unmixing matrix appropriately with a more accurate estimation of DOA than the conventional ICA. This contributes to the realization of fast and high convergence through the optimization of the unmixing matrix in the proposed algorithm under the nonreverberant condition.

Fig. 11 shows an example of alternation results (for a specific male-male combination) between ICA and null beamforming through iterative optimization by the proposed algorithm when the RT is 0 ms. In this figure, the black box symbol indicates that null beamforming is used at that iteration point and frequency bin. As shown in Fig. 11, the unmixing matrix obtained by ICA is substituted by the matrix based on null beamforming at almost all iteration points in every frequency bin. In general, null beamforming is more suitable for the separation of directional sound sources under the nonreverberant condition [22]. Indeed, our preliminary experiment shows that the null beamformer with ideal DOA information achieves an NRR of 35 dB. Thus, the result shown in Fig. 11 indicates that the proposed algorithm has the ability to select an appropriate signal-separation algorithm automatically.

Fig. 11. Result of alternation between ICA and null beamforming through iterative optimization by the proposed algorithm. The symbol, black box, indicates that the null beamforming is used at the iteration point and frequency bin. RT is 0 ms, and K = L = 2.

C. Objective Evaluation of Separated Signal Under Reverberant Condition

In order to compare the performance of the proposed algorithm with that of the conventional BSS under reverberant conditions, NRRs are shown in Figs. 12 and 13. In addition, as a baseline algorithm, we performed experiments using the null beamformer with ideal DOA information, and we obtained an NRR of 6.5 dB when the RT is 150 ms and 5.7 dB when the RT is 300 ms.

Fig. 12. Noise reduction rates for different iteration points in proposed method, conventional ICA, and iteratively optimized null beamformer. RT is 150 ms, and K = L = 2.

Fig. 13. Noise reduction rates for different iteration points in proposed method, conventional ICA, and iteratively optimized null beamformer. RT is 300 ms, and K = L = 2.

The results reveal that the separation performance of the proposed algorithm is superior to that of the conventional ICA-based BSS method at every iteration point, even when considering the additional computational cost of the proposed algorithm. More specifically, compared with the conventional ICA, the proposed method can improve the NRR by about 5.7 dB at the 30-iteration point when the RT is 150 ms and by about 2.3 dB when the RT is 300 ms. These results are also more promising than those of simple null beamforming.

Figs. 14 and 15 show examples of alternation results between ICA and null beamforming through iterative optimization by the proposed algorithm when the RTs are 150 and 300 ms, respectively. As shown in Figs. 14 and 15, the proposed algorithm can function automatically as follows.

• Null beamforming is used for the acceleration of learning early in the iterations because the matrix based on null beamforming is a rough approximation of the unmixing matrix.
• ICA is used after the early part of the iterations because it can update the unmixing matrix more accurately.
• The unmixing matrix obtained by ICA is substituted by the matrix based on null beamforming through all iteration points at particular frequency bins where the independence between the sources is low.

Fig. 14. Result of alternation between ICA and null beamforming through iterative optimization by the proposed algorithm. The symbol, black box, indicates that the null beamforming is used at the iteration point and frequency bin. RT is 150 ms, and K = L = 2.

Fig. 15. Result of alternation between ICA and null beamforming through iterative optimization by the proposed algorithm. The symbol, black box, indicates that the null beamforming is used at the iteration point and frequency bin. RT is 300 ms, and K = L = 2.

From these results, although null beamforming is not suitable for signal separation under the condition that direct sounds and their reflections exist, we can confirm that the temporal utilization of null beamforming for algorithm diversity through ICA iterations is effective for improving the separation performance and convergence.

D. Experimental Comparison With Alternative Combination Technique of BSS and Beamforming

As described in Section I, there are alternative approaches for combining BSS and geometric beamforming [23], [24]. Parra et al. [23] have proposed Geometric Source Separation (GSS), in which beamforming is utilized as a specific spatial constraint in the conventional BSS. The aim of this section is to discuss


the differences between the proposed method and Parra’s GSS through an experimental comparison.

The iterative learning rule of the unmixing matrix in GSS consists of the following two terms: (a) a gradient of the cost function for source separation, and (b) a gradient of the penalty function that keeps the separation filter close to a specific beamformer. Hereafter, we consider the null beamformer as the target beamformer in GSS. For example, in the case of K = L = 2, the detailed learning rule is given by

(34)

Fig. 16. Comparison among noise reduction rates for different iteration points in proposed method and Parra’s Geometric Source Separation method (GSS-Ideal and GSS-Estimated). RT is 150 ms, and K = L = 2.

where is a normalization
factor ( represents the Frobenius norm), and is a
weight given by the inverse of the condition number of the
matrix . and are the
cross-power spectra of the input and the output ,
respectively, which are calculated around the time (frame)
index . and should be estimated and given in advance
via an appropriate external DOA estimator.
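
As a rough illustration of the target beamformer mentioned above, the following Python sketch builds a two-element null beamformer from two given DOAs at a single frequency. It is not the GSS learning rule (34) and is not taken from the experiments in this paper: it assumes a plane-wave, reverberation-free model, and the function names (`steering_matrix`, `null_beamformer`, `matmul2`), the 4 cm aperture, the 340 m/s sound speed, and the example DOAs are all hypothetical choices made for illustration.

```python
import cmath
import math

SOUND_SPEED = 340.0  # m/s; assumed value, not stated in the text


def steering_matrix(freq_hz, mic_positions_m, doas_deg):
    """Plane-wave steering matrix A[k][l] for microphone k and source l (assumed model)."""
    return [
        [cmath.exp(2j * math.pi * freq_hz * d * math.sin(math.radians(theta)) / SOUND_SPEED)
         for theta in doas_deg]
        for d in mic_positions_m
    ]


def null_beamformer(freq_hz, mic_positions_m, doas_deg):
    """2x2 null beamformer taken as the inverse of the steering matrix.

    Row l then has unit gain toward source l and a directional null toward the other source.
    """
    (a, b), (c, d) = steering_matrix(freq_hz, mic_positions_m, doas_deg)
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("steering matrix is singular; the two DOAs cannot be resolved")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(x, y):
    """Product of two 2x2 complex matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


if __name__ == "__main__":
    mics = [-0.02, 0.02]   # hypothetical microphone positions in meters
    doas = [-30.0, 40.0]   # DOAs assumed to come from an external estimator
    w_bf = null_beamformer(1000.0, mics, doas)
    a_mat = steering_matrix(1000.0, mics, doas)
    # Gain magnitudes of each beamformer row toward each source direction.
    for row in matmul2(w_bf, a_mat):
        print([round(abs(v), 6) for v in row])
```

Under this model the product of the beamformer and the steering matrix is the identity, which is one way to check that each output passes its own source and nulls the other. With reverberation, the plane-wave assumption no longer holds and the nulls become only approximate.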
In this paper, we estimate and at three
time instances with each 1 s data, i.e., the total length of the
input sound is 3 s similarly in the previous experiments. The
step-size parameter is set to which is the optimal
value to provide the fastest and highest convergence. The rest of
the experimental conditions are the same as those of the previous
experiments (see Section IV-A). We introduced two types of GSSs which correspond to different DOA-estimation processes as follows.

GSS-Ideal: GSS with ideal DOAs for each of the sources, where we assume that accurate DOA information is previously known, and consequently this GSS is not blind. This will give the upper bound on the separation performance of GSS.

GSS-Estimated: Blind GSS driven by the estimated DOAs, where the DOA estimation is performed by looking at the output power of the DS array steered to various directions. The reason for choosing the DS array as a DOA estimator is that the other DOA estimation methods, e.g., the MUSIC method, cannot be used in this case because K = L; indeed, MUSIC works only when the number of sensors exceeds the number of sources. In the DS array, if two directional peaks are detected, then we use the DOAs as they are. Otherwise, when only a single directional peak is detected, we introduce a DOA hypothesis that the two DOAs for the two sources are heuristically given by combining the detected value and the initial assumption ( or ). For example, when the detected DOA is 25°, the DOAs for GSS ( and ) are set to and . Also, when the detected DOA is , and are set to and .

Fig. 17. Comparison among noise reduction rates for different iteration points in proposed method and Parra’s Geometric Source Separation method (GSS-Ideal and GSS-Estimated). RT is 300 ms, and K = L = 2.

Figs. 16 and 17 show the NRR results under RT = 150 and 300 ms for different iterations, where the NRRs are the averages of 12 speaker combinations. From these figures, the following points are revealed.

• In both cases of RT = 150 and 300 ms, GSS-Ideal can achieve the source separation to some extent, but this result is only slightly better than that of the null beamformer. As compared to the proposed method, the separation performance is relatively low. This result indicates that the spatial constraint by the ideal beamformer might help the separation, but the separation performance is trapped around almost the same level as that of the beamformer


at the end of the iterations. Needless to say, this level is never optimal under the reverberant conditions.
• In both cases of RT = 150 and 300 ms, GSS-Estimated can no longer achieve practical source separation, and the separation performance is obviously lower than that of the proposed method. This is mainly due to the deterioration in the DOA estimation. In fact, for all of the speaker combinations, the averages of the estimated and in ms were and , and the deviations were 25.1° and 24.7°. Also, the averages of the estimated and in ms were and and the deviations were 23.6° and 22.8°. This is a hard limitation for applying GSS to the case of under reverberant conditions.

In summary, the above-mentioned findings imply that the idea introduced in GSS (the spatial constraint by the beamforming with the external regular DOA estimator) is not always valid for the source separation, especially with a small number of sensors under reverberant conditions. On the contrary, these results provide convincing evidence that our proposed strategy (diversity between ICA and the null beamforming with the null-finding-based DOA estimation) is more beneficial to the BSS problem in comparison to Parra’s GSS.

E. Experimental Result Under K = L = 3

The source separation experiment with K = L = 3 is conducted. The room impulse responses are measured in an ordinary room, which has the RT of 200 ms, as shown in Fig. 18. A three-element array with interelement spacing of 4.3 cm is used. Three loudspeakers are placed as the sound sources at three directions, , and . We use the DS-array-based initial value which steers the look directions to , and . The analysis conditions for the experiment are the same as those in Table I except the step-size parameter, where .

Fig. 19 shows the NRR results (averaged NRR for 12 combinations) of the proposed method described in Section III-C and the conventional BSS. In addition, Fig. 20 shows an example of alternation results between ICA and null beamforming through iterative optimization by the proposed algorithm. These results reveal that (a) the proposed algorithm can perform the proper diversity between ICA and beamforming even in the case of K = L = 3, and (b) the proposed method obviously outperforms the conventional ICA-based BSS method at every iteration point. Thus, it can be asserted that the proposed method is feasible for the case of K = L = 3 as well as K = L = 2.

Fig. 18. Layout of reverberant room used in simulation experiment (K = L = 3) and real recording experiment.

Fig. 19. Noise reduction rates for different iteration points in proposed method, conventional ICA, and iteratively optimized null beamformer. RT is 200 ms, and K = L = 3.

Fig. 20. Result of alternation between ICA and null beamforming through iterative optimization by the proposed algorithm. The symbol, black box, indicates that the null beamforming is used at the iteration point and frequency bin. RT is 200 ms, and K = L = 3.

V. ILLUSTRATIVE EXPERIMENT WITH REAL RECORDINGS

A. Conditions for Experiment

In this section, a real-recording-based BSS experiment is performed using actual devices in a real acoustic environment. The experiment was carried out in the reverberant room shown in Fig. 18 (RT = 200 ms). Two of the three loudspeakers, S1, S2, and S3, are used to play independent speech sources, and these speech signals are received by a two-element array with interelement spacing of 4.3 cm. We carried out the following three experiments using the above-mentioned equipment: [Config. 1] separation of S1 and S2, [Config. 2] separation of S2 and S3, and [Config. 3] separation of S1 and S3, where we consistently use the DS-array-based initial value which steers the look directions to and . The analysis conditions for these experiments are the same as those in Table I.
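
The DS-array-based initial value used in these experiments can be sketched roughly as follows. This is a minimal illustration, assuming a plane-wave model in which each row of the initial matrix is a delay-and-sum beamformer steered to its own look direction. Only the 4.3 cm spacing comes from the text; the function names (`ds_weights`, `ds_initial_matrix`, `array_response`), the 340 m/s sound speed, the 2 kHz frequency, and the look directions of −60° and 60° are placeholders chosen for the example.

```python
import cmath
import math

SOUND_SPEED = 340.0  # m/s; assumed value, not stated in the text


def ds_weights(freq_hz, mic_positions_m, look_deg):
    """Delay-and-sum weights with unit gain toward look_deg (plane-wave model)."""
    s = math.sin(math.radians(look_deg))
    n = len(mic_positions_m)
    return [
        cmath.exp(-2j * math.pi * freq_hz * d * s / SOUND_SPEED) / n
        for d in mic_positions_m
    ]


def ds_initial_matrix(freq_hz, mic_positions_m, look_dirs_deg):
    """One DS beamformer per row, each row steered to its own look direction."""
    return [ds_weights(freq_hz, mic_positions_m, th) for th in look_dirs_deg]


def array_response(weights, freq_hz, mic_positions_m, doa_deg):
    """Complex gain of one weight row for a plane wave arriving from doa_deg."""
    s = math.sin(math.radians(doa_deg))
    return sum(
        w * cmath.exp(2j * math.pi * freq_hz * d * s / SOUND_SPEED)
        for w, d in zip(weights, mic_positions_m)
    )


if __name__ == "__main__":
    mics = [-0.0215, 0.0215]  # two elements, 4.3 cm apart
    looks = [-60.0, 60.0]     # hypothetical look directions
    w_init = ds_initial_matrix(2000.0, mics, looks)
    for row, look in zip(w_init, looks):
        # Gain magnitude of each row toward its own look direction.
        print(look, abs(array_response(row, 2000.0, mics, look)))
```

Such a matrix only needs to be computed once per frequency bin before the iterations start. With two closely spaced microphones the DS beam is broad, so this initial value separates the sources only coarsely and serves as a starting point rather than a solution.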


The levels of background noise and each of the sound sources measured at the array origin were 37 dB(A) and 60 dB(A), respectively. It also should be mentioned that all of the experimental apparatus may include possible sensor noise, environment noise, and/or nonlinear error which is produced in, for example, amplifiers.

B. Results

Fig. 21 shows NRR results to illustrate the performances of the proposed algorithm and the conventional BSS under a real environment. The NRRs are the averaged scores with respect to the three configurations, Config. 1–Config. 3. The results reveal that the proposed algorithm outperforms the conventional ICA-based BSS method at every iteration point, even in the presence of background noise. At the final 200-iteration point, a significant improvement of more than 4 dB is obtained over the conventional ICA.

In Fig. 22, we show an example of alternation results between ICA and null beamforming through iterative optimization by the proposed algorithm. This figure demonstrates that the proposed algorithm can work properly and achieve the automatic diversity as described in Section IV-C.

Overall, these experimental results provide encouraging evidence for the feasibility of the proposed algorithm for a real-world application such as a robust hands-free speech communication system.

Fig. 21. Noise reduction rates for different iteration points in ICA under real recording condition where K = L = 2. RT is 200 ms, and the background noise level is 37 dB(A).

Fig. 22. Result of alternation between ICA and null beamforming through iterative optimization by the proposed algorithm. The symbol, black box, indicates that the null beamforming is used at the iteration point and frequency bin. RT is 200 ms, and K = L = 2.

VI. CONCLUSION

In this paper, we described a fast- and high-convergence algorithm for BSS where null beamforming is temporally used for algorithm diversity through ICA iterations. The simulation results of the signal separation experiments reveal that the signal separation performance of the proposed algorithm is superior to that of the conventional ICA-based BSS method, and the utilization of null beamforming in ICA is effective for improving the separation performance and convergence, even under reverberant conditions. More specifically, compared with the conventional method, the proposed method can improve the NRR by about 15.2 dB at the 30-iteration point when RT is 0 ms, by about 5.7 dB when RT is 150 ms, and by about 2.3 dB when RT is 300 ms. In addition, we have experimentally shown the superiority of the proposed method over Parra’s combination approach for BSS and beamforming. The results of the BSS experiment with actual devices in a real acoustic environment demonstrate that the proposed method can work well even in the presence of background noise. Its advantage over the conventional ICA in the case of two sound sources has been illustrated.

This paper mainly discussed the BSS algorithms with off-line learning, but the extension to the on-line application has not been addressed. Further study on the on-line algorithm is an open problem, and this will be indispensable particularly when we deal with more realistic situations, e.g., moving sound sources and time-varying systems.

ACKNOWLEDGMENT

The authors are grateful to Dr. S. Makino and R. Mukai of NTT Co., Ltd. for their suggestions and discussions on this work. The authors thank S. Ukai of NAIST for his contribution to part of the experiment.

REFERENCES

[1] T. W. Parsons, “Separation of speech from interfering speech by means of harmonic selection,” J. Acoust. Soc. Amer., vol. 60, pp. 911–918, 1976.
[2] K. Kashino, K. Nakadai, T. Kinoshita, and H. Tanaka, “Organization of hierarchical perceptual sounds,” in Proc. 14th Int. Conf. Artificial Intelligence, vol. 1, 1995, pp. 158–164.
[3] M. Unoki and M. Akagi, “A method of signal extraction from noisy signal based on auditory scene analysis,” Speech Commun., vol. 27, pp. 261–279, 1999.
[4] G. W. Elko, “Microphone array systems for hands-free telecommunication,” Speech Commun., vol. 20, pp. 229–240, 1996.
[5] J. L. Flanagan, J. D. Johnston, R. Zahn, and G. W. Elko, “Computer-steered microphone arrays for sound transduction in large rooms,” J. Acoust. Soc. Amer., vol. 78, pp. 1508–1518, 1985.
[6] H. Wang and P. Chu, “Voice source localization for automatic camera pointing system in videoconferencing,” in Proc. ICASSP’97, Apr. 1997, pp. 187–190.
[7] K. Kiyohara, Y. Kaneda, S. Takahashi, H. Nomura, and J. Kojima, “A microphone array system for speech recognition,” in Proc. ICASSP’97, Apr. 1997, pp. 215–218.
[8] M. Omologo, M. Matassoni, P. Svaizer, and D. Giuliani, “Microphone array based speech recognition with different talker-array positions,” in Proc. ICASSP’97, Apr. 1997, pp. 227–230.


[9] H. F. Silverman and W. R. Patterson III, “Visualizing the performance of large-aperture microphone arrays,” in Proc. ICASSP’99, Mar. 1999, pp. 969–972.
[10] O. L. Frost, “An algorithm for linearly constrained adaptive array processing,” Proc. IEEE, vol. 60, pp. 926–935, 1972.
[11] L. J. Griffiths and C. W. Jim, “An alternative approach to linearly constrained adaptive beamforming,” IEEE Trans. Antennas Propagat., vol. 30, pp. 27–34, 1982.
[12] Y. Kaneda and J. Ohga, “Adaptive microphone-array system for noise reduction,” IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-34, pp. 1391–1400, 1986.
[13] T.-W. Lee, Independent Component Analysis. Norwell, MA: Kluwer, 1998.
[14] S. Haykin, Unsupervised Adaptive Filtering. New York: John Wiley, 2000.
[15] J. F. Cardoso, “Eigenstructure of the 4th-order cumulant tensor with application to the blind source separation problem,” in Proc. ICASSP’89, 1989, pp. 2109–2112.
[16] C. Jutten and J. Herault, “Blind separation of sources part I: An adaptive algorithm based on neuromimetic architecture,” Signal Process., vol. 24, pp. 1–10, 1991.
[17] P. Comon, “Independent component analysis, a new concept?,” Signal Process., vol. 36, pp. 287–314, 1994.
[18] V. Capdevielle, C. Serviere, and J. Lacoume, “Blind separation of wide-band sources in the frequency domain,” in Proc. ICASSP’95, 1995, pp. 2080–2083.
[19] N. Murata and S. Ikeda, “An on-line algorithm for blind source separation on speech signals,” in Proc. 1998 Int. Symp. Nonlinear Theory and Its Application (NOLTA’98), vol. 3, Sep. 1998, pp. 923–926.
[20] P. Smaragdis, “Blind separation of convolved mixtures in the frequency domain,” Neurocomput., vol. 22, pp. 21–34, 1998.
[21] L. Parra and C. Spence, “Convolutive blind separation of nonstationary sources,” IEEE Trans. Speech Audio Processing, vol. 8, pp. 320–327, 2000.
[22] H. Saruwatari, S. Kurita, K. Takeda, F. Itakura, and K. Shikano, “Blind source separation based on subband ICA and beamforming,” in Proc. ICSLP2000, vol. 3, Oct. 2000, pp. 94–97.
[23] L. Parra and C. V. Alvino, “Geometric source separation: Merging convolutive source separation with geometric beamforming,” IEEE Trans. Speech Audio Processing, vol. 10, no. 6, pp. 352–362, 2002.
[24] C. Fancourt and L. Parra, “The generalized sidelobe decorrelator,” in Proc. IEEE Workshop on Application of Signal Processing to Audio and Acoustics, 2001, pp. 167–170.
[25] M. S. Pedersen, U. Kjems, K. B. Rasmussen, and L. K. Hansen, “Semi-blind source separation using head-related transfer functions,” in Proc. ICASSP 2004, vol. V, 2004, pp. 713–716.
[26] S. Kurita, H. Saruwatari, S. Kajita, K. Takeda, and F. Itakura, “Evaluation of blind signal separation method using directivity pattern under reverberant conditions,” in Proc. ICASSP 2000, vol. 5, Jun. 2000, pp. 3140–3143.
[27] H. Sawada, R. Mukai, S. Araki, and S. Makino, “A robust and precise method for solving the permutation problem of frequency-domain blind source separation,” in Proc. Int. Symp. Independent Component Analysis and Blind Signal Separation, 2003, pp. 505–510.
[28] W. Wang, J. A. Chambers, and S. Sanei, “A novel hybrid approach to the permutation problem of frequency domain blind source separation,” in Proc. Int. Conf. Independent Component Analysis and Blind Signal Separation, 2004, pp. 532–539.
[29] N. Mitianoudis and M. Davies, “Permutation alignment for frequency domain ICA using subspace beamforming methods,” in Proc. Int. Conf. Independent Component Analysis and Blind Signal Separation, 2004, pp. 669–676.
[30] H. Sawada, R. Mukai, S. Araki, and S. Makino, “Polar coordinate based nonlinear function for frequency domain blind source separation,” IEICE Trans. Fund., vol. E86-A, no. 3, pp. 590–596, 2003.
[31] S. Araki, R. Mukai, S. Makino, T. Nishikawa, and H. Saruwatari, “The fundamental limitation of frequency domain blind source separation for convolutive mixtures of speech,” IEEE Trans. Speech Audio Processing, vol. 11, no. 2, pp. 109–116, Mar. 2003.
[32] D. H. Johnson and D. E. Dudgeon, Array Signal Processing: Concepts and Techniques. Englewood Cliffs, NJ: Prentice-Hall, 1993.
[33] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression. New York: Kluwer Academic, 1998.
[34] T. Kobayashi, S. Itabashi, S. Hayashi, and T. Takezawa, “ASJ continuous speech corpus for research,” J. Acoust. Soc. Jpn., vol. 48, no. 12, pp. 888–893, 1992 (in Japanese).

Hiroshi Saruwatari (M’00) was born in Nagoya, Japan, on July 27, 1967. He received the B.E., M.E., and Ph.D. degrees in electrical engineering from Nagoya University in 1991, 1993, and 2000, respectively.
He joined Intelligent Systems Laboratory, SECOM Co., Ltd., Tokyo, Japan, in 1993, where he engaged in the research and development on the ultrasonic array system for the acoustic imaging. He is currently an Associate Professor of Graduate School of Information Science, Nara Institute of Science and Technology. His research interests include array signal processing, blind source separation, and sound field reproduction. He is a member of the IEICE, Japan VR Society, and the Acoustical Society of Japan.

Toshiya Kawamura received the B.E. degree in electrical engineering from Kinki University in 1999. He received the M.E. degree in information science from Nara Institute of Science and Technology in 2001.
His research interests include array signal processing and blind source separation.
Mr. Kawamura is a member of the Acoustical Society of Japan.

Tsuyoki Nishikawa was born in Mie, Japan, on February 13, 1978. He received the B.E. degree in electrical engineering from Kinki University in 2000. He received the M.E. degree in information science from Nara Institute of Science and Technology in 2002. He is currently a Ph.D. candidate of Nara Institute of Science and Technology.
His research interests include array signal processing and blind source separation.
Mr. Nishikawa is a member of the IEICE and the Acoustical Society of Japan.

Akinobu Lee was born in Kyoto, Japan, on December 19, 1972. He received the B.E. and M.E. degrees in information science, and the Ph.D. degree in informatics from Kyoto University, Kyoto, Japan, in 1996, 1998, and 2000, respectively.
He is currently an Assistant Professor of Graduate School of Information Science, Nara Institute of Science and Technology. His research interests include large vocabulary continuous speech recognition and spoken language processing.
Dr. Lee is a member of the IEICE and the Acoustical Society of Japan.

Kiyohiro Shikano (M’84) received the B.S., M.S., and Ph.D. degrees in electrical engineering from Nagoya University in 1970, 1972, and 1980, respectively.
He is currently a Professor of Nara Institute of Science and Technology (NAIST), where he is directing speech and acoustics laboratory. His major research areas are speech recognition, multimodal dialog system, speech enhancement, adaptive microphone array, and acoustic field reproduction. Since 1972, he had been working at NTT Laboratories, where he had been engaged in speech recognition research. During 1990–1993, he was the Executive Research Scientist at NTT Human Interface Laboratories, where he supervised the research of speech recognition and speech coding. During 1986–1990, he was the Head of Speech Processing Department at ATR Interpreting Telephony Research Laboratories, where he was directing speech recognition and speech synthesis research.
Dr. Shikano received the IEEE Signal Processing Society 1990 Senior Award in 1991. He is a member of the IEICE, the IPSJ, the ASJ, and the Japan VR Society.
