
Digital Audio Coding
Dr. T. Collins
Standard MIDI Files
Perceptual Audio Coding
MPEG-1 layers 1, 2 & 3
MPEG-4

Ancient Audio Coding Methods


Audio coding has actually been around for hundreds of years. Traditionally, composers have recorded their music by writing out the notes in a standard notation.

A 200-year-old example of audio coding

A slightly more modern equivalent is the Victorian piano roll.

Standard MIDI Files


A piano roll can be efficiently encoded digitally by recording the time at which each note begins and ends. This is what a standard MIDI file does. The MIDI standard (Musical Instrument Digital Interface) is an internationally agreed language.

Standard MIDI files encode:
MIDI events/messages, e.g. note-on, note-off, etc.
The time delay between each event

As well as encoding note events, it also allows:
Up to 16 different instruments to be played at once
Transmission of parameters containing key velocity, volume, modulation, etc.
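To make the event/delta-time idea concrete, here is a minimal sketch in Python that writes a tiny format-0 MIDI file by hand. The note numbers, delta times, velocity, ticks-per-beat value and file name are illustrative assumptions, not figures from the lecture.

```python
import struct

def vlq(value):
    """Encode a delta time as a MIDI variable-length quantity."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append(0x80 | (value & 0x7F))
        value >>= 7
    return bytes(reversed(out))

def note(delta_on, key, duration, channel=0, velocity=64):
    """A note-on event followed, `duration` ticks later, by a note-off."""
    return (vlq(delta_on) + bytes([0x90 | channel, key, velocity]) +
            vlq(duration) + bytes([0x80 | channel, key, 0]))

ticks_per_beat = 480
track = note(0, 60, 480) + note(0, 64, 480) + note(0, 67, 480)  # C4, E4, G4 in sequence
track += bytes([0x00, 0xFF, 0x2F, 0x00])                        # end-of-track meta event

header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, ticks_per_beat)  # format 0, one track
with open("example.mid", "wb") as f:
    f.write(header + b"MTrk" + struct.pack(">I", len(track)) + track)
```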


Standard MIDI Limitations

In a MIDI file, it is the instructions to play the notes that are stored, not the audio itself. The quality of the reproduction therefore depends on the synthesiser used for playback.

[Audio examples: the original recording compared with playback on other synthesisers / sound cards]

MIDI vs. Digital Audio


MIDI:
Stores instructions to turn notes on and off
Very efficient (typical rate: 1 kbps)
Playback quality depends on the MIDI device
Only synthesised instruments can be used

Digital Audio:
Stores the actual sampled audio
Less efficient (typical rate: 100 kbps)
Playback quality is always the same
Any sounds (including speech and singing) can be recorded

Sampling

Digital audio represents the continuous analogue audio waveform by a series of discrete samples. The sample rate must be at least double the bandwidth of the audio signal.

Typical hi-fi sample rates are 44.1 kHz (CD audio) and 48 kHz (DAT tape and DAB radio)
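As a quick worked check of these figures, the sketch below (assuming a 20 kHz audio bandwidth and stereo 16-bit PCM, typical hi-fi values not stated above) computes the minimum sample rate required by the sampling theorem and the raw bit rates of the two hi-fi sample rates.

```python
audio_bandwidth_hz = 20_000
min_sample_rate = 2 * audio_bandwidth_hz           # sampling theorem: at least 40 kHz
print(min_sample_rate)

bits_per_sample, channels = 16, 2                  # CD-style stereo PCM
for fs in (44_100, 48_000):
    raw_kbps = fs * bits_per_sample * channels / 1000
    print(fs, raw_kbps)                            # 1411.2 kbps and 1536.0 kbps
```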

[Figure: signal spectrum (sound pressure level vs. frequency), showing the sample rate Fs and the Fs/2 bandwidth limit]

Quantisation levels

Each sample is quantised so that it can be represented by a binary integer. The number of bits used to represent each sample sets the number of quantisation levels. The error between the quantised signal and the original audio is the quantisation noise. The peak signal-to-quantisation-noise ratio using n bits per sample can be estimated as:

SNR ≈ 6n dB

CD audio uses 16-bit resolution, giving a dynamic range of ~96 dB. To hear the quantisation noise, the signal level would have to be close to the threshold of pain!
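A minimal sketch of this rule (assuming a full-scale sine test signal) quantises the signal uniformly at a few different resolutions and measures the resulting signal-to-quantisation-noise ratio, which comes out close to 6 dB per bit.

```python
import numpy as np

fs = 48_000
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 1000 * t)               # full-scale 1 kHz sine

for n_bits in (8, 12, 16):
    levels = 2 ** (n_bits - 1)
    quantised = np.round(signal * levels) / levels   # uniform quantiser
    noise = signal - quantised
    snr_db = 10 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2))
    print(n_bits, round(snr_db, 1))                  # roughly 6 dB per bit
```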

Sub-band Coding

Like the eye, the ear is more sensitive to some frequencies than others. Many audio coding algorithms exploit this using a form of sub-band coding.
[Block diagram: digital audio in → filters → downsample → quantise → multiplex → coded audio out]

Bit rates at each stage (16-bit audio at 48 kHz split into 3 sub-bands):
Digital audio in: 16 × 48000 = 768 kbps
After filtering into 3 sub-bands: 16 × 3 × 48000 = 2304 kbps
After downsampling each sub-band to 16 kHz: 16 × 3 × 16000 = 768 kbps
After requantising to 4 bits per sample: 4 × 3 × 16000 = 192 kbps (coded audio out)
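The bookkeeping above is easy to reproduce; the sketch below assumes the same 3-band example and the 4-bit requantisation used in the diagram.

```python
def rate_kbps(bits, bands, sample_rate):
    """Raw data rate of `bands` sub-band signals at `bits` bits per sample."""
    return bits * bands * sample_rate / 1000

print(rate_kbps(16, 1, 48_000))   # input PCM:          768.0 kbps
print(rate_kbps(16, 3, 48_000))   # after filtering:    2304.0 kbps
print(rate_kbps(16, 3, 16_000))   # after downsampling: 768.0 kbps
print(rate_kbps(4, 3, 16_000))    # after requantising: 192.0 kbps
```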

Perceptual Coding

A key question when designing a sub-band coder:

What should the quantisation levels of the sub-bands be?

Remember that the quantisation process will introduce noise, and that we want this noise to be imperceptible. We want the noise to be just below the threshold of hearing (also known as the Minimum Audible Field, MAF). So, the question should really be:

What is the MAF in each sub-band?

To estimate this, look at the Robinson-Dadson (equal loudness) curves

Equal Loudness Curves

Quantisation Implications
[Figure: peak signal level and threshold of hearing vs. frequency (dB-SPL, 0-15000 Hz), with the quantisation noise floors for 12-bit and 16-bit quantisation shown below the peak signal level]

Application to Sub-band Coding


[Figure: the same peak signal level and threshold of hearing curves divided into sub-bands. Setting each sub-band's quantisation noise just below the threshold of hearing gives allocations of roughly 9 to 12 bits per band (11, 12, 12, 12, 11, 10, 9, 9, 10, 10, 10 and 9 bits across the bands shown).]
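The allocation rule can be sketched in a few lines: in each sub-band, use just enough bits that the quantisation noise (roughly 6 dB per bit below the peak signal level) falls under the hearing threshold. The per-band peak levels and thresholds below are made-up illustrative numbers, not values read from the figure.

```python
import math

peak_level_db = [70, 72, 68, 60, 55, 50, 45, 40]   # assumed peak SPL per sub-band
threshold_db  = [ 5,  2,  0,  0,  2,  5, 10, 20]   # assumed threshold of hearing per band

for peak, threshold in zip(peak_level_db, threshold_db):
    required_snr = peak - threshold                 # noise must sit below the threshold
    bits = math.ceil(required_snr / 6)              # ~6 dB of SNR per bit
    print(peak, threshold, bits)
```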

Psychoacoustics

Substantial improvements to our sub-band coder are possible using psychoacoustics. Psychoacoustics is the study of how sound is perceived by the ear-brain combination. Of particular interest to us is the fact that the threshold of hearing is not constant: it changes continuously due to masking.

Masking
[Audio examples: the signal alone, the signal plus noise (SNR = 24 dB), and the noise alone]

In the presence of the signal, the noise sounds much quieter (almost undetectable). Due to the anatomy of the ear, loud sounds mask quieter sounds at nearby frequencies. Effectively, the threshold of hearing is raised to the masking threshold. The masking threshold can be estimated using a psychoacoustic model and exploited by the coder.
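As a toy illustration of how a psychoacoustic model might raise the threshold around a loud tone, the sketch below spreads a single masker across neighbouring bands with a simple triangular spreading function. The slopes, the 15 dB offset and the flat quiet threshold are assumed illustrative values, not the model used by any particular coder.

```python
import numpy as np

n_bands = 24                                    # one value per critical band (assumption)
quiet_threshold = np.zeros(n_bands)             # flat threshold of hearing, in dB (assumption)

masker_band, masker_level_db = 8, 70            # a loud tone centred in band 8

spread = np.empty(n_bands)
for b in range(n_bands):
    distance = b - masker_band
    slope = 25.0 if distance < 0 else 10.0      # masking spreads further towards higher bands
    spread[b] = masker_level_db - 15.0 - slope * abs(distance)

masking_threshold = np.maximum(quiet_threshold, spread)
print(np.round(masking_threshold, 1))           # raised threshold around the masker
```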

The Masking Threshold


[Figure: a signal spectrum with the resulting masking threshold plotted against frequency (dB-SPL, 0-15000 Hz); around the signal the masking threshold sits well above the threshold of hearing]

Applying Masking
[Figure: one frame of Space Oddity (Bowie) used as an example. Quantising each sub-band so that the noise sits just under the masking threshold gives allocations of 5, 5, 5, 5, 4, 4, 4, 4, 4, 3, 2 and 2 bits across the sub-bands.]

Average bits per sample = 3.92
Compression ratio = 16:3.92 = 4.1:1
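A quick check of these figures from the per-band allocations in the example frame:

```python
allocations = [5, 5, 5, 5, 4, 4, 4, 4, 4, 3, 2, 2]   # bits per sub-band, from the figure
average_bits = sum(allocations) / len(allocations)
print(round(average_bits, 2))                          # 3.92
print(round(16 / average_bits, 1))                     # ~4.1:1 relative to 16-bit PCM
```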

Additional Side Information


The audio signal is processed in discrete blocks of samples known as frames. Each frame of each sub-band is:

Scaled to normalise the peak signal level
Quantised at a level appropriate for the current signal-to-mask ratio

The receiver needs to know the scale factor and the quantisation levels used, so this information must be embedded along with the samples. The resulting overhead is very small compared with the compression gains.
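A back-of-the-envelope sketch of that overhead is below. The 6-bit scale factor and 4-bit allocation code per sub-band per frame are assumed illustrative values, and the 32 sub-bands of 36 samples correspond to a layer-2-sized frame described later.

```python
sub_bands, samples_per_band = 32, 36
scalefactor_bits, allocation_bits = 6, 4
average_sample_bits = 3.92                       # from the example above

side_info_bits = sub_bands * (scalefactor_bits + allocation_bits)
audio_bits = sub_bands * samples_per_band * average_sample_bits
overhead_pct = 100 * side_info_bits / (side_info_bits + audio_bits)
print(side_info_bits, round(audio_bits), round(overhead_pct, 1))   # 320 bits, ~4516 bits, ~6.6 %
```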

Block Diagrams
ENCODER:
Digital Audio In → Sub-band filter bank → Scale and Quantise → Multiplex and Data Format → Coded Audio Out
In parallel, an FFT feeds a psychoacoustic model, which supplies the masking thresholds used by the Scale and Quantise stage
The side information (scale factors, quantisation levels) is coded and fed to the multiplexer

DECODER:
Coded Audio In → De-Multiplex → Descale & Dequantise → Inverse filter bank → Digital Audio Out
The decoded side information controls the descaling and dequantisation
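The encoder diagram can be read as a simple pipeline. The sketch below mirrors that data flow in Python with every stage reduced to a trivial stand-in (a reshape for the filter bank, a fixed 4-bit quantiser, and so on); none of the stubs implement the real DSP.

```python
import numpy as np

N_BANDS = 32

def filter_bank(frame):                 # stand-in: reshape the frame into 32 "sub-bands"
    return frame.reshape(N_BANDS, -1)

def psychoacoustic_model(frame):        # stand-in: one masking threshold per band
    spectrum = np.abs(np.fft.rfft(frame))
    return np.full(N_BANDS, spectrum.mean())

def scale(bands):                       # normalise each band by its peak level
    factors = np.abs(bands).max(axis=1) + 1e-12
    return bands / factors[:, None], factors

def quantise(bands, thresholds):        # stand-in: fixed 4 bits everywhere; a real coder
    allocations = np.full(N_BANDS, 4)   # would derive this from the signal-to-mask ratio
    return np.round(bands * 2 ** 3) / 2 ** 3, allocations

def encode_frame(pcm_frame):
    sub_bands = filter_bank(pcm_frame)
    thresholds = psychoacoustic_model(pcm_frame)
    scaled, scale_factors = scale(sub_bands)
    samples, allocations = quantise(scaled, thresholds)
    return samples, (scale_factors, allocations)   # multiplexing/formatting left implicit

frame = np.random.default_rng(0).standard_normal(N_BANDS * 12)   # one layer-1-sized frame
samples, side_info = encode_frame(frame)
print(samples.shape, side_info[0].shape, side_info[1].shape)
```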

MPEG 1: Layers 1, 2 & 3


Three perceptual coders are available in the MPEG-1 specification. They are known as layers 1, 2 & 3.

Layer 1 (.mp1)
Similar to the simple coder just described
32 sub-bands are used
Each frame contains 384 samples (32 x 12)
A version of layer 1 was used in the Digital Compact Cassette (DCC)

Layer 2 (.mp2)
Slightly more complex but better quality than layer 1
Frame length increased to 1152 samples (32 x 36)

MPEG 1: Layers 1, 2 & 3 (cont)

Layer 2 (cont)
Data formatting of samples and side information is slightly more efficient
Used in Digital Audio Broadcasting (DAB)

Layer 3 (.mp3)
Significantly more complex than layers 1 or 2
Capable of reasonable quality even at very low data rates
A combination of sub-band coding and transform coding is used to give up to 576 frequency bands (compared to 32 for layers 1 & 2)
Huffman encoding is applied to the samples
MP3 files are now hugely popular with internet and mobile users
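A small check of what those frame sizes mean in time, assuming the 48 kHz sample rate mentioned earlier:

```python
sample_rate = 48_000
for layer, frame_samples in (("layer 1", 384), ("layers 2 & 3", 1152)):
    print(layer, 1000 * frame_samples / sample_rate, "ms per frame")   # 8.0 ms and 24.0 ms
```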

Other Perceptual Coders

The same principles are applied in subtly different ways in most general-purpose audio coders, e.g.:

RealAudio
Microsoft's WMA format
MiniDisc (ATRAC)

MPEG-4

In the latest version of MPEG, MPEG-4, the specification includes:

General audio coders: similar to MPEG-1 but including multichannel support
Parametric coder: HILN (Harmonics, Individual Lines and Noise) for very low bit rates
Speech coders: HVXC and CELP speech coders
Structured Audio: similar to MIDI but including instrument models; used for synthetic audio
Synthesised Speech: allows speech to be coded as text and resynthesised at the decoder

Summary

Standard MIDI files
Work by encoding the structure of the music

MPEG-1 Layers 1 & 2
Work by removing the perceptual redundancy from digitised audio

MPEG-1 Layer 3
Removes perceptual redundancy and statistical redundancy (by entropy coding)

MPEG-4
Coding method can be chosen to suit the signal source
Perceptual, statistical and structural redundancy can all be exploited
