FULL-DUPLEX SPEECH FOR HF RADIO SYSTEMS

N Serinken, B Gagnon, and O Erogul

Communications Research Centre, Canada

INTRODUCTION

Conventional telephone systems operate in full-duplex mode, where both parties can speak at the same time. HF radio systems operate in half-duplex mode, where only one person can talk at a given time. The direction of the communication link is controlled by a "push to talk" (PTT) button, which is activated by the party wishing to speak. To avoid confusion, the speaking party must indicate verbally that the channel is being relinquished by saying "over". Despite its inconvenience, half-duplex operation has retained its popularity because it uses only one radio channel, which is an important consideration when radio spectrum is at a premium. One disadvantage of a manually or automatically controlled half-duplex communications system is that it is difficult to interface to a conventional full-duplex telephone system. In addition, a "normal" conversation, in which one party may interrupt the other, is impeded by the nature of the radio link.

A full-duplex link can be achieved over a single-channel radio system if both ends of the link operate in what is known as time division duplex (TDD) mode. In this mode the channel is allocated half of the time to transmit information in each of the two directions. That is, the radio channel is divided into time slots of T/2 seconds, with each end transmitting in alternate time slots. Stated simply, when one end is transmitting, the opposite end is receiving, and vice versa. In this way a single radio channel can support information flow in both directions, resulting in a virtual full-duplex link.

Full-duplex TDD voice systems are used on radio links that can support high-speed digital data. For the TDD system to work, the channel must be capable of carrying more than twice the data rate needed for a single voice codec. For HF radio systems, the maximum rate that can be reliably carried within the assigned band is not sufficient for TDD voice transmission, given the present state of speech coding and data transmission technology.

To overcome these problems, the system described in this paper uses a technique called "time-scale modification" (TSM), which can compress a T-second segment of analog voice by a factor of two. An advantage of this approach is that the compressed voice signal occupies the same bandwidth and dynamic range, and consequently no radio modifications are required. Additionally, since the frequency occupation remains unchanged, all existing radio regulations remain satisfied and the type approval process for the radio need not be repeated. As the time-scale modification process is reversible, the time-compressed speech segments can be restored to their original form with minimal distortion. By using analog time compression and expansion, full-duplex audio can be achieved when the available bandwidth of the system does not allow for digital TDD operation. The real-time implementation and test results of an analog time division duplex (ATDD) system are presented in this paper.

TIME-SCALE MODIFICATION

Three different TSM algorithms were evaluated to determine the best technique for the real-time implementation of the ATDD system: Seneff (1), McAulay and Quatieri (2), and Verhelst and Roelands (3). These algorithms were applied to speech signals from both genders as well as to musical passages, and their performance was evaluated through a series of subjective listening tests, such as the mean opinion score (MOS) test, the degradation mean opinion score (DMOS) test and the diagnostic rhyme test (DRT), Papamichalis (4). The Waveform Similarity Overlap-and-Add algorithm (WSOLA) (3) was chosen as the reference algorithm against which the other algorithms were compared in the DMOS test. Table 1 gives the subjective evaluation results of these tests for sequential TSM compression followed by expansion of speech signals by factors of two.

TABLE 1 - Subjective evaluation test results (1 = poor to 5 = excellent)

A: WSOLA, algorithm in reference (3)
B: Algorithm in reference (1)
C: Algorithm in reference (2)

Based on these tests, the WSOLA algorithm was selected for the real-time implementation of the system because it was computationally inexpensive and the resulting speech quality was found to be very high relative to the other algorithms.
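The operation of WSOLA, as summarized from reference (3) in the paragraphs that follow, can be illustrated with a minimal offline sketch in Python (standard library only). It assumes regularly spaced synthesis instants $L_k = kL$ with 50% overlap, analysis instants $\tau^{-1}(L_k) = L_k/\beta$, a shift search over $[-\Delta_{max}, \Delta_{max}]$ that maximizes the normalized cross-correlation with the natural continuation (A') of the previous segment, and weights $v(n) = w^2(n)$ normalized by their running sum. The function names, the 160-sample window (20 ms at 8 kHz) and the 40-sample tolerance are illustrative assumptions; this is not the authors' real-time C code for the TMS320C31.

```python
import math


def hann(n_len):
    """Periodic Hann window w(n), n = 0 .. n_len-1."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_len) for n in range(n_len)]


def similarity(x, a, b, n_len):
    """Normalized cross-correlation between x[a:a+n_len] and x[b:b+n_len]."""
    num = sum(x[a + i] * x[b + i] for i in range(n_len))
    ea = sum(x[a + i] ** 2 for i in range(n_len))
    eb = sum(x[b + i] ** 2 for i in range(n_len))
    return num / math.sqrt(ea * eb + 1e-12)


def wsola(x, beta, win_len=160, tol=40):
    """Time-scale x so that t' = beta * t (beta < 1 compresses, beta > 1 expands).

    Illustrative defaults (assumptions): 160-sample window (20 ms at 8 kHz),
    shift tolerance Delta_max = 40 samples.
    """
    hop = win_len // 2                          # L: synthesis hop, 50 % overlap
    v = [wn * wn for wn in hann(win_len)]       # v(n) = w(n)^2
    out_len = int(len(x) * beta)
    xp = list(x) + [0.0] * (win_len + tol)      # zero-pad so late frames stay in range
    last = len(xp) - win_len                    # largest valid segment start
    y = [0.0] * (out_len + win_len)
    norm = [0.0] * (out_len + win_len)          # running sum of v(n - L_k)
    start = 0                                   # input start of the previous segment
    for k in range(out_len // hop + 1):
        syn = k * hop                           # synthesis instant L_k
        if k > 0:
            natural = min(start + hop, last)    # natural continuation (A')
            ana = int(round(syn / beta))        # analysis instant tau^-1(L_k)
            lo, hi = max(0, ana - tol), min(last, ana + tol)
            if lo > hi:
                break
            # Pick the shift Delta_k whose segment best resembles (A').
            start = max(range(lo, hi + 1),
                        key=lambda s: similarity(xp, s, natural, win_len))
        for n in range(win_len):                # weighted overlap-add
            y[syn + n] += v[n] * xp[start + n]
            norm[syn + n] += v[n]
    return [y[n] / norm[n] if norm[n] > 1e-6 else 0.0 for n in range(out_len)]
```

For example, with `x` a 200 ms, 200 Hz tone sampled at 8 kHz (1600 samples), `wsola(x, 0.5)` returns 800 samples whose zero-crossing rate, and hence pitch, matches the input, whereas simply dropping every second sample would double the pitch.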

'HF Radio Systems and Techniques', 7-10 July 1997, Conference Publication No. 411, © IEE, 1997
For a uniform change in the time scale, the time $t_0$ corresponding to the original articulation rate is mapped to the transformed time $t'_0$ through the mapping $t'_0 = \beta t_0$. The case $\beta > 1$ corresponds to slowing down the articulation rate by means of time-scale expansion, while the case $\beta < 1$ corresponds to speeding up the articulation rate by means of time-scale compression. Speech events which take place at a time $t'_0$ according to the new time scale will have occurred at $\beta^{-1} t'_0$ in the original time scale. The WSOLA algorithm is a time-domain process. It seeks a segment of the input signal that can be overlapped with and added to the previous output segment around the synthesis instant, where the candidate segments lie within a prescribed tolerance interval around the corresponding analysis instant. The position of the best segment is determined by finding the value $\Delta = \Delta_m$, lying within the tolerance region $[-\Delta_{max} \ldots \Delta_{max}]$ around the analysis instant, that maximizes the cross-correlation coefficient between the previous segment and the segment under consideration. The basic synthesis equation used by the WSOLA procedure is:

$$y(n) = \frac{\sum_k v(n - L_k)\, x\big(n + \tau^{-1}(L_k) - L_k + \Delta_k\big)}{\sum_k v(n - L_k)}$$

where $v(n)$ is the square of a windowing function $w(n)$, $\Delta_k$ are the shift factors within the range $[-\Delta_{max} \ldots \Delta_{max}]$, $x(n)$ and $y(n)$ are the input and output signal samples respectively, $L_k$ represents the consecutive window positions, i.e., the synthesis instants, and $\tau^{-1}(L_k)$ represents the analysis instants. $w(n)$ is a Hann window¹ with 50% overlap. The operation of the WSOLA technique is illustrated in Figure 1 and explained below.

¹ The Hann window, named after Julius von Hann, is also commonly known as the Hanning window.

In the implementation of this algorithm, regularly spaced synthesis instants $L_k = kL$ are chosen. Proceeding from left to right in Figure 1, assume that segment (A) was the last segment excised from the input and added to the output at time instant $L_{k-1} = (k-1)L$, i.e., segment (a) = segment (A). WSOLA seeks a segment (b) that will overlap-add with (a) in a synchronized way and can be excised from the input around time instant $\tau^{-1}(L_k)$. Since (A') would overlap-add with (A) = (a) in a natural way to form a portion of the original speech, WSOLA selects (b) so that it resembles (A') as closely as possible while lying within the prescribed tolerance interval around $\tau^{-1}(L_k)$ in the input wave. The position of this best segment (B) is found by maximizing a normalized cross-correlation similarity measure between the sample sequence underlying (A') and the input speech. After overlap-adding (b) with (a), WSOLA proceeds to the next output segment, where (B') now plays the same role as (A') in the previous step. In the overlap-add process each overlapping signal segment is weighted with one half of the windowing function for a smooth transition from one segment to the next. The old signal segment is weighted with the falling portion of the windowing function, while the new segment is weighted with the rising portion.

Time-scale expansion is achieved in the same manner: because the analysis instants now advance more slowly than the synthesis instants, speech segments overlapping those excised previously are repeated, each chosen within the tolerance interval as explained above. The time alignment of successive windows according to signal similarity removes phase discontinuities. Therefore, a waveform time-scale modified by the WSOLA algorithm can maintain maximal similarity to the original waveform across its segment joints.

SYSTEM IMPLEMENTATION

An ATDD system was implemented using a 50 MHz TMS320C31 floating-point digital signal processor with 512 kilobytes each of RAM and ROM. The computer system and external input/output audio were interfaced using a 14-bit linear A/D and D/A converter with an 8 kHz audio sampling frequency. The processor board has digital ports for connection to the radio Tx/Rx control lines and external timing reference signals. The digitized audio was stored in a 200 ms buffer and compressed in real time into 90 ms bursts; the remaining 10 ms of each slot was reserved for in-band timing reference signals. The radio channel was multiplexed into 100 ms transmission and reception slots. The DSP software implementing the WSOLA algorithm was written in C. Figure 2 shows the ATDD system and HF radio equipment connections for one end of a radio link; the other end of the link has the same configuration of radio and ATDD processor. Audio signals from the ATDD system were interfaced to the line-in and line-out ports of the radio equipment, and the control port of the ATDD processor was connected to the PTT line of the radio. Figure 3 shows the compressed signals to and from the audio ports of the HF transmitter and receiver. The top trace shows the compressed 100 ms bursts of audio fed to the transmitter, while the bottom trace shows the audio output from the receiver, as observed at points A and B of Figure 2 respectively.

Experimental Setup

The experiments employed a transmit site near Ottawa and a receive site 210 km southwest of this location near Lake Ontario, over a period of four days during October 1996. Test signals were transmitted at a frequency of 4 MHz using a horizontal multi-band fan dipole antenna at a power level of 100 W and were received using a horizontal dipole antenna. Since the ATDD system relies on the accurate synchronization of transmission and
reception time slots, both transmitter and receiver were synchronized using global positioning system (GPS) receivers. To compensate for the end-to-end time delay of the system, the receiver timing reference was delayed to coincide with the start of the timing reference signal received from the transmitter; the experimental system used the transmitted timing reference signal as a marker for aligning this external timing reference delay at the receiver. At the receiver output the compressed speech signal was separated from the timing reference and expanded into normal speech by the ATDD processor. To simulate part of a full-duplex radio link, ATDD transmission was tested over a one-way link with 50% channel occupation.

It was observed that the gain of the communications receiver was unable to track the signal transitions from the transmission time slot to the reception time slot because of the attack time constant of the receiver automatic gain control (AGC). The onset of a burst of compressed speech occasionally caused the ATDD time-scale expansion to distort the speech signal through over-amplification of the received audio. Overall, the ATDD system was observed to perform satisfactorily under the various HF radio channel impairments encountered during the tests, such as noise, interference and selective fading.

ATDD system performance was evaluated by comparing an ATDD-processed HF radio link with an unprocessed HF radio channel. The transmitter audio input was controlled by a two-input selector switch, which was repeatedly fed with 10 minutes of ATDD-processed signal followed by 10 minutes of conventional speech signal. The receiver audio and ATDD-processed signals were recorded with an audio tape recorder for informal subjective evaluation of the ATDD system performance.

At the receiver, the processed and unprocessed systems were evaluated using the DRT intelligibility test. This test is based on the ability of listeners to distinguish phonemes with common attributes. The listener is presented with one word from a pair and asked to determine which word was spoken; the alternative choice is a word differing from the presented word in one phoneme, usually a consonant. This test contains 464 words in 232 rhyming pairs, with word lists spoken by both male and female speakers. The intelligibility score is calculated as the percentage of correct responses.

Both systems were evaluated under similar conditions, and the scores for the female speaker were found to be consistently lower than those for the male speaker. The average scores for the ATDD and conventional radio links were 80% and 90% respectively, and the average intelligibility score for the WSOLA algorithm without the ATDD processing was found to be 92%.

CONCLUSIONS

One disadvantage of ATDD systems is the delay introduced into the speech path, which is of the order of twice the input buffer size. In the test version of the ATDD system the buffer size was chosen to be 200 ms, resulting in a delay of 400 ms, which is less than the delays observed on geostationary satellite communication links. The feasibility of the ATDD system has been demonstrated with a real-time implementation, and the speech quality was found to be satisfactory. The effect of channel impairments was similar to that on a conventional half-duplex radio link. The next phase of ATDD research will concentrate on the management of transmit/receive antenna switching, receiver AGC matched to the ATDD processing, and the synchronization of multiplexing time slots through a reference derived from the transmitted reference signals.

REFERENCES

1. Seneff S., 1982, "System to Independently Modify Excitation and/or Spectrum of Speech Waveform Without Explicit Pitch Extraction", IEEE Trans. Acoustics, Speech, and Signal Processing, Vol. ASSP-30, No. 4, pp. 566-578.

2. McAulay R. J. and Quatieri T. F., 1986, "Speech Analysis/Synthesis Based on a Sinusoidal Representation", IEEE Trans. Acoustics, Speech, and Signal Processing, Vol. ASSP-34, No. 4, pp. 744-754.

3. Verhelst W. and Roelands M., 1993, "An Overlap-Add Technique Based on Waveform Similarity (WSOLA) for High Quality Time-Scale Modification of Speech", Int. Conf. Acoustics, Speech, and Signal Processing, ICASSP-93, pp. 554-557.

4. Papamichalis P. E., 1987, "Practical Approaches to Speech Coding", Prentice-Hall Inc., Englewood Cliffs, N.J., pp. 177-198.

Figure 1 WSOLA input and output waveforms
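Figure 1 depicts WSOLA's overlap-add of successive segments. The cross-fade described in the text, with the old segment weighted by the falling half of the window and the new segment by the rising half, can be checked with the short Python sketch below. It assumes the plain halves of a periodic Hann window as the weights; at 50% overlap these two halves sum to one at every sample, so a waveform that continues smoothly across the joint passes through unchanged. The function names are illustrative.

```python
import math


def hann(n_len):
    """Periodic Hann window of length n_len."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_len) for n in range(n_len)]


def crossfade(old_tail, new_head):
    """Overlap-add two equal-length segments across a segment joint.

    The old segment is weighted by the falling half of a Hann window and
    the new segment by the rising half (50 % overlap).
    """
    m = len(old_tail)
    if len(new_head) != m:
        raise ValueError("segments must have equal length")
    w = hann(2 * m)
    rising, falling = w[:m], w[m:]
    return [f * a + r * b
            for a, b, r, f in zip(old_tail, new_head, rising, falling)]
```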

Figure 2 ATDD and radio interfaces
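In Figure 2, the control port of the ATDD processor drives the PTT line of the radio. The paper does not give this control logic; the following Python sketch is a hypothetical illustration of the alternating-slot schedule, assuming both ends share a GPS-derived time base in milliseconds, 100 ms slots, and that end "A" owns the even-numbered slots and end "B" the odd-numbered ones. `slot_index` and `ptt_keyed` are invented names.

```python
SLOT_MS = 100  # TDD slot length reported for the ATDD system


def slot_index(t_ms):
    """Index of the TDD slot containing time t_ms on the shared time base."""
    return int(t_ms // SLOT_MS)


def ptt_keyed(end, t_ms):
    """Return True if `end` ("A" or "B") should key its transmitter at t_ms.

    Hypothetical assignment: end A transmits in even slots, end B in odd slots.
    """
    if end not in ("A", "B"):
        raise ValueError("end must be 'A' or 'B'")
    even_slot = slot_index(t_ms) % 2 == 0
    return even_slot if end == "A" else not even_slot
```

At every instant exactly one end is keyed, so each direction occupies 50% of the channel time.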

Figure 3 Radio input/output audio signals (trace A: Tx input; trace B: Rx output)
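Figure 3 shows the 100 ms transmit bursts. Using the figures given for the implementation (8 kHz sampling, a 200 ms input buffer, 90 ms of compressed speech plus 10 ms of timing reference per 100 ms slot, and a speech-path delay of roughly twice the buffer size), the sample budget per slot can be worked out as below. The placement of the reference at the start of the slot and the helper names are assumptions for illustration only; the paper does not specify the slot layout.

```python
FS_HZ = 8000                   # A/D and D/A sampling rate
BUFFER_MS = 200                # input speech buffer
BURST_MS = 90                  # compressed speech per slot
REF_MS = 10                    # in-band timing reference per slot
SLOT_MS = BURST_MS + REF_MS    # 100 ms TDD slot


def ms_to_samples(ms):
    """Convert a duration in milliseconds to a sample count at FS_HZ."""
    return FS_HZ * ms // 1000


BUFFER_N = ms_to_samples(BUFFER_MS)   # 1600 samples of speech per buffer
BURST_N = ms_to_samples(BURST_MS)     # 720 samples after compression
REF_N = ms_to_samples(REF_MS)         # 80 reference samples
SLOT_N = ms_to_samples(SLOT_MS)       # 800 samples per slot
BETA = BURST_MS / BUFFER_MS           # time-scale factor beta = 0.45
DELAY_MS = 2 * BUFFER_MS              # approximate speech-path delay, 400 ms


def build_slot(reference, burst):
    """Assemble one transmit slot (assumed layout: reference, then speech)."""
    if len(reference) != REF_N or len(burst) != BURST_N:
        raise ValueError("unexpected segment length")
    return list(reference) + list(burst)


def split_slot(slot):
    """Separate a received slot into (reference, compressed speech)."""
    if len(slot) != SLOT_N:
        raise ValueError("unexpected slot length")
    return slot[:REF_N], slot[REF_N:]
```

Squeezing 200 ms of speech into a 90 ms burst corresponds to $\beta = 0.45$, slightly stronger than the nominal factor-of-two compression, which is what leaves room for the 10 ms reference in each slot.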
