
Mid/Side Stereo Coding

MUSIC 422 Final Project


March 11, 2005
Rui Wang, Harold Nyikal, James Yu

02/09/08 1
Introduction

• Why Stereo Coding?
  • Exploit correlations between the two channels
  • Flexibility for better quality or compression
  • Widely used
  • Artifact free

Topics

• Mid/Side Coding
• Bit Stream

Mid/Side Overview

• Take FFT of block
• Decide to code M/S or L/R based on channel energy differences
• Stereo perceptual model for L/R or M/S
• Allocate bits
• Quantize
• Pack to bit stream

Encoder Block Diagram
[Block diagram. Recoverable labels: 16-bit PCM WAV audio file → read block (block size) → MDCT of L/R and FFT of L/R → M/S convert → decide LR/MS → find peak and masking curve for L/R and for M/S → SMR L/R and SMR M/S → select SMR (LR/MS) → allocate bits from a common bit pool (desired bit rate) → block floating-point quantize → packing → write file]

Decoder Block Diagram
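[Block diagram. Processing steps, with the data passed between them in parentheses:]
(encoded file) → read file → (array of bytes) → unpack → (n-bit code) → dequantize → (sub-band) → convert M/S to L/R (if necessary) → (MDCT coefficients) → IMDCT → (number, double) → write audio file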

Mid/Side Coding

Mid = (L + R) / 2
Side = (L - R) / 2
• We can losslessly recover L and R by
L = Mid + Side
R = Mid - Side
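
The conversion above can be sketched in a few lines of Python. This is only an illustration, not the project's code: the function names `lr_to_ms` and `ms_to_lr` are hypothetical, and the inputs are assumed to be floating-point samples or spectral lines (with integer samples the division by 2 would drop a bit unless handled separately).

```python
def lr_to_ms(left, right):
    """Convert left/right values to mid/side: M = (L + R) / 2, S = (L - R) / 2."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side


def ms_to_lr(mid, side):
    """Invert the transform: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# Illustrative values only (chosen to be exactly representable in binary).
left = [0.5, -0.25, 0.125, 1.0]
right = [0.25, -0.25, -0.125, 0.5]

mid, side = lr_to_ms(left, right)
rec_left, rec_right = ms_to_lr(mid, side)
```

When the two channels are similar, the side values are small, which is where the coding gain comes from.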

Choose L/R or M/S

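• The decision to transmit L/R or M/S is based on the following threshold test, where l_k and r_k are the FFT spectral lines of the left and right channels:

$$\sum_{k=f_{lower}}^{f_{upper}} \left| l_k^2 - r_k^2 \right| < 0.8 \sum_{k=f_{lower}}^{f_{upper}} \left| l_k^2 + r_k^2 \right|$$

• If TRUE, transmit M/S; otherwise transmit L/R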
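
A minimal sketch of this threshold test in Python follows. The function name `use_mid_side` is hypothetical, and the code makes two assumptions that the slide does not spell out: the squared terms are read as spectral-line energies |l_k|² and |r_k|², and the absolute value of their difference is taken so that the test treats the two channels symmetrically.

```python
def use_mid_side(left_spec, right_spec, f_lower, f_upper, ratio=0.8):
    """Return True if the band f_lower..f_upper (inclusive) should be sent as M/S.

    left_spec / right_spec hold FFT lines (real magnitudes or complex values).
    """
    diff_energy = 0.0
    sum_energy = 0.0
    for k in range(f_lower, f_upper + 1):
        l2 = abs(left_spec[k]) ** 2   # energy of the left line (assumption)
        r2 = abs(right_spec[k]) ** 2  # energy of the right line (assumption)
        diff_energy += abs(l2 - r2)
        sum_energy += l2 + r2
    return diff_energy < ratio * sum_energy


# Nearly identical channels: energy difference is small, so M/S is chosen.
similar = use_mid_side([1.0, 1.0, 1.0, 1.0], [0.9, 0.9, 0.9, 0.9], 0, 3)

# Signal only in the left channel: difference equals the sum, so L/R is kept.
one_sided = use_mid_side([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], 0, 3)
```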

Generate Perceptual Model for L/R and M/S

• To calculate the M/S masking threshold, first apply the same two-slope spreading function used for L/R (from the text). This gives:
  • BTHRm: base threshold for the M channel
  • BTHRs: base threshold for the S channel
• Additionally, we must consider the stereo masking contributions in the M and S channels. These depend on the masking level difference between the M and S channels.

Cont.

• The masking level difference (MLD) is determined by the following factor, where z is the frequency in Bark:

$$\mathrm{MLDfactor} = 10^{\,1.25\left(1 - \cos\left(\pi \cdot \min(z,\,15.5)/15.5\right)\right) - 2.5}$$

• This is multiplied by BTHRm or BTHRs to obtain the respective MLD values
  • MLDm = MLDfactor * BTHRm
  • MLDs = MLDfactor * BTHRs
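
To make the behaviour of the MLD factor concrete, here is a small Python sketch. The function name `mld_factor` is hypothetical, and the base thresholds are assumed to be linear power values (not dB), since the slide multiplies them by the factor.

```python
import math


def mld_factor(z):
    """MLD factor at Bark frequency z: 10 ** (1.25 * (1 - cos(pi * min(z, 15.5) / 15.5)) - 2.5)."""
    exponent = 1.25 * (1.0 - math.cos(math.pi * min(z, 15.5) / 15.5)) - 2.5
    return 10.0 ** exponent


# The factor rises from 10 ** -2.5 at z = 0 to 1.0 at z >= 15.5 Bark.
low = mld_factor(0.0)
high = mld_factor(15.5)

# Illustrative base thresholds (made-up numbers) for one band at z = 0.
bthr_m, bthr_s = 1.0, 0.01
mld_m = mld_factor(0.0) * bthr_m
mld_s = mld_factor(0.0) * bthr_s
```

At low frequencies the factor is small, so the MLD values sit far below the base thresholds; above 15.5 Bark the factor is 1 and the MLD values equal the base thresholds.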

Perceptual Analysis

• Stereo Perceptual Analysis
  • Masking and MLD factor

The Masking Thresholds

• The masking thresholds are thus calculated as follows
  • THRm = max(BTHRm, min(BTHRs, MLDs))
  • THRs = max(BTHRs, min(BTHRm, MLDm))
• The SMR of the M/S channels is determined from these thresholds
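
The two threshold formulas can be sketched as follows. The names `ms_thresholds` and `smr_db` are hypothetical, all inputs are assumed to be linear power values for a single band, and the SMR is assumed to be the usual signal-to-mask ratio in dB, 10·log10(signal energy / threshold).

```python
import math


def ms_thresholds(bthr_m, bthr_s, mld_m, mld_s):
    """Combine base thresholds and MLD values into the final M and S thresholds."""
    thr_m = max(bthr_m, min(bthr_s, mld_s))
    thr_s = max(bthr_s, min(bthr_m, mld_m))
    return thr_m, thr_s


def smr_db(signal_energy, threshold):
    """Signal-to-mask ratio in dB (assumed definition)."""
    return 10.0 * math.log10(signal_energy / threshold)


# Illustrative numbers: strong M channel, weak S channel, MLD factor of 0.1.
bthr_m, bthr_s = 1.0, 0.01
mld_m, mld_s = 0.1 * bthr_m, 0.1 * bthr_s

thr_m, thr_s = ms_thresholds(bthr_m, bthr_s, mld_m, mld_s)
smr_m = smr_db(100.0, thr_m)
```

In this example the S threshold is raised from 0.01 to 0.1 because the strong M channel contributes masking, while the M threshold is unchanged.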

Bit Allocation

• Waterfilling Algorithm
  • All bands in the stereo signal (either M/S or L/R) are ranked in one pool according to SMR
  • Bits are allocated to each band from a common pool of bits
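
A greedy water-filling loop over a pooled list of bands might look like the sketch below. It is a simplified illustration, not the project's allocator: the name `allocate_bits` is hypothetical, each band is treated as costing one bit per step (band widths are ignored), each bit is assumed to lower the band's remaining SMR by about 6.02 dB, and the 16-bit cap per band is an arbitrary choice.

```python
def allocate_bits(smr_values_db, bit_pool, max_bits=16, db_per_bit=6.02):
    """Greedy water-filling over all bands of both channels pooled together.

    Repeatedly gives one bit to the band with the highest remaining SMR
    until the common bit pool is empty or every band has reached max_bits.
    """
    bits = [0] * len(smr_values_db)
    remaining = list(smr_values_db)
    while bit_pool > 0:
        candidates = [i for i in range(len(bits)) if bits[i] < max_bits]
        if not candidates:
            break
        best = max(candidates, key=lambda i: remaining[i])
        bits[best] += 1
        remaining[best] -= db_per_bit  # one more bit lowers the noise by ~6 dB
        bit_pool -= 1
    return bits


# Three bands from one channel and two from the other, in a single pool.
pooled_smr = [20.0, 8.0, -5.0, 12.0, 2.0]
allocation = allocate_bits(pooled_smr, bit_pool=6)
```

Because both channels draw from the same pool, a quiet S (or R) channel automatically gives up bits to the channel that needs them more.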

Bit Stream

Encoded file: Header | Block 1 | Block 2 | Block 3 | …

Header:
  File ID                     7 bytes
  Wave info                   32 bytes
  n Blocks                    8 bytes
  Blocksize                   4 bytes
  # of scale bits             2 bytes
  # of mantissa bits (m)      2 bytes
  # of bytes in a block       4 bytes

Block:
  Switch info                 25 bits
  Scale factors               4 * 25 bits
  # of mantissa bits per band m * 25 bits
  Mantissas
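
As an illustration of the header layout, the field sizes listed on this slide can be packed with Python's `struct` module. Everything beyond the field sizes is an assumption: the little-endian byte order, the unsigned integer encodings, the file ID value, and the function names `pack_header`, `unpack_header`, and `block_side_info_bits` are all hypothetical.

```python
import struct

# 7s = file ID, 32s = wave info, Q = n blocks (8 bytes), I = block size (4 bytes),
# H = # of scale bits (2 bytes), H = # of mantissa bits m (2 bytes),
# I = # of bytes in a block (4 bytes). "<" means little-endian, no padding (assumption).
HEADER_FORMAT = "<7s32sQIHHI"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def pack_header(file_id, wave_info, n_blocks, block_size,
                scale_bits, mantissa_bits, bytes_per_block):
    """Pack the seven header fields into a byte string."""
    return struct.pack(HEADER_FORMAT, file_id, wave_info, n_blocks,
                       block_size, scale_bits, mantissa_bits, bytes_per_block)


def unpack_header(data):
    """Unpack the header fields from the start of a byte string."""
    return struct.unpack(HEADER_FORMAT, data[:HEADER_SIZE])


def block_side_info_bits(m, n_bands=25, scale_bits=4):
    """Side information per block: switch bit + scale factor + m bits, per band."""
    return n_bands * (1 + scale_bits + m)


header = pack_header(b"MS422", b"\x00" * 32, 100, 1024, 4, 4, 300)
fields = unpack_header(header)
side_bits = block_side_info_bits(m=4)
```

With these sizes the header occupies 59 bytes, and with m = 4 each block carries 225 bits of side information ahead of the mantissas.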

Example: Time-domain block

FFT of the block

M/S Channels

Sum and Difference of FFT energies

Listening Test Results

1) Rock music  2) Bass singer  3) Castanets  4) Glockenspiel
5) Harpsichord  6) Quartet  7) Speech  8) Violin

Listening Test: Stereo Improvement

Conclusions

• Stereo signals usually have strong correlation between the two channels
• M/S encoding is used more often than L/R based on our decision model
• Stereo coding improves most stereo signals
