
MULTIMEDIA STANDARDS

1. Standardization
2. Video Coding
2.1 Part I – JPEG, Basic ideas, standards
2.2 Part II – H.261, MPEG-1, MPEG-2
2.3 Part III – AVC, HEVC, Container formats
3. Audio Coding
3.1 Part I – Psychoacoustic Fundamentals
3.2 Part II – MPEG 1 Audio
3.3 Part III – Speech coding
4. Metadata Standards
4.1 Part I – Consumer Standards: CDDB, ID3v1 + ID3v2, EXIF
4.2 Part II – Interlude standards: XML, JSON
4.3 Part III – Broadcast standards: MPEG 7, Dublin core
5. System Standards
5.1 Part I – Audio broadcast
5.2 Part II – Video Broadcast
5.3 Part III – Streaming
2. VIDEO CODING

2.1 Part I – Video Coding I

Still Image Coding – JPEG & JPEG-2000

1. Colour Subsampling:
o Color space transformation: RGB to YCbCr for better compression; changes in brightness (Y)
are better perceived by humans than changes in hue (Cb) and color saturation (Cr).
o Chroma subsampling: Reduce spatial resolution of Cb & Cr
o Block Splitting: Each channel split into 8x8 blocks (MB/MCU)
2. 2-D DCT: Each 8x8 block of Y, Cb, Cr components converted to frequency domain
representation
o Top-left component: DC; largest magnitude; represents the average value of the entire block
o Next 63 components: AC
o Advantage: Concentrates most of the signal energy in the top-left (low-frequency) corner
3. Quantization: Greatly reduces the high frequencies, as the human eye can hardly perceive
differences at high frequencies
o Only lossy operation (rounding off to the nearest integer) if the DCT is performed with high
precision
o As a result, most high-frequency components become zero, the rest become small values
o Elements of the quantization matrix control the compression ratio (larger values →
greater compression)
4. Entropy Coding: Lossless data compression
o Zig-Zag Coding: Arranging components in a zig-zag pattern and applying the RLE algorithm
(groups similar frequencies together)
o Huffman Coding
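Steps 2 and 3 above can be sketched for a single 8x8 luminance block. A minimal illustration with numpy: the quantization matrix is the standard JPEG luminance table, but the sample block itself is made up.

```python
import numpy as np

# Standard JPEG luminance quantization table (quality ~50)
Q = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99]])

# Orthonormal 8x8 DCT-II basis matrix (row index = frequency)
n, k = np.meshgrid(np.arange(8), np.arange(8))
C = np.sqrt(2 / 8) * np.cos(np.pi * (2 * n + 1) * k / 16)
C[0, :] = np.sqrt(1 / 8)

# A made-up 8x8 Y block: a smooth horizontal ramp (values 0..224)
block = np.tile(np.arange(0, 256, 32), (8, 1)).astype(float)

# 2-D DCT after level shift; the DC coefficient ends up top-left
coeffs = C @ (block - 128) @ C.T

# Quantization - the only lossy step
quantized = np.round(coeffs / Q).astype(int)
```

For this smooth block, almost all AC coefficients quantize to zero, which is exactly what makes the subsequent run-length and Huffman coding effective.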
JPEG 2000:

- Higher compression rates


- Lossy and lossless compression within a single codec
- No 2D-DCT: Wavelet transforms
- No blocking artifacts
- Decomposition into code blocks: Each block is encoded independently, giving access to
different regions of the image

Basic Ideas of Video Coding

Video compression basics – frames are coded in one of two ways:

1. Intra-frame (spatial coding)
2. Inter-frame (temporal coding)

Intra-frame coding: Still image coding (same as JPEG)

1. Color transformation + Chroma subsampling + block splitting


2. 2D – DCT
3. Quantization
4. Entropy coding (RLE + Huffman coding)

Principle of difference image coding: Instead of the full next frame, only the difference between
the current frame and the previous frame is coded
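A minimal numeric illustration of this principle (the pixel values are made up):

```python
import numpy as np

prev = np.array([[10, 10], [20, 20]], dtype=np.int16)  # previous frame
cur  = np.array([[10, 12], [20, 20]], dtype=np.int16)  # current frame

diff = cur - prev            # only changed pixels are non-zero -> cheap to code
reconstructed = prev + diff  # the decoder adds the difference back
```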

Frame Prediction:

1. I-frames (Intra-coded picture) – coded using only the current picture


2. P-frames (Predictive coded picture) – coded using current and previous
3. B-frames (Bi-directional predictive coded picture) – coded using current, previous and
successive frames

Motion Compensation:

- Provides an estimate of successive frames
- Computes the difference between the luminance components (Y) of the origin MB (in the
actual frame) and the “search” MB (in the estimated frame)
- The “search” MB is located in the previous frame to determine the motion vector
- Based on the motion vectors, successive frames can be estimated
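The search described above can be sketched as a brute-force block-matching routine. This is illustrative only: real encoders use much faster search strategies, and the function and parameter names here are made up.

```python
import numpy as np

def find_motion_vector(prev, cur, top, left, bs=16, radius=4):
    """Full search: try every displacement within +/-radius and return the
    (dy, dx) whose block in the previous frame has minimum SAD (sum of
    absolute differences) against the current macroblock's Y samples."""
    target = cur[top:top + bs, left:left + bs].astype(int)
    best_sad, best_mv = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + bs > prev.shape[0] or x + bs > prev.shape[1]:
                continue  # candidate block would fall outside the frame
            sad = np.abs(prev[y:y + bs, x:x + bs].astype(int) - target).sum()
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv

# Synthetic check: shift a random frame 2 px to the right; the recovered
# vector for an interior macroblock points 2 px left into the previous frame.
rng = np.random.default_rng(42)
prev = rng.integers(0, 256, size=(64, 64))
cur = np.roll(prev, 2, axis=1)
mv = find_motion_vector(prev, cur, 16, 16)
```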

2.2 Part II – H.261, MPEG-1, MPEG-2

H.261

Macroblocks:

- 6 8x8 pixel blocks
- 4xY : 1xCb : 1xCr
- 16x16 array of luma samples
- 8x8 array of chroma samples (each for Cb and Cr)
- 4:2:0 subsampling

GOB : 11x3 = 33 MB’s

Motion compensation inter-picture prediction

(reduces temporal redundancy, motion vectors compensate for motion)

Transform coding (2D-DCT - reduces spatial redundancy)

1. Intraframe - Scalar (for DC – 8 bit) & Vector ( for AC – variable step 2-64)
2. Interframe – Vector with variable step size 2-64 for all values

Zig-zag scan → entropy coding

Only decoders are standardized

Encoder designers can design their own pre-processing and encoding algorithms (better
enhancements/better motion estimation algorithms) as long as the output can be decoded by any
standardized decoder within defined error limits

2 frame formats:

1. CIF – 352x288; 30 fps

2. QCIF – 176x144 (quarter resolution); support is mandatory
H.261 Coder:

H.261 Source Coder:

1. Coding control:

o Segmentation of stream to a block of 8x8 pixels


o Intraframe mode – transmission of ALL blocks
o Interframe mode – only those blocks whose difference exceeds a threshold
o Transmission control – transmitted blocks are transformed → coefficients quantized → entropy
coded

2. Prediction:

o Motion compensation (optional)


o One motion vector per MB
o x, y components must not exceed ±15 for all Y blocks
o Motion vector for the Cb & Cr components = MB motion vector components / 2

3. Loop Filter:

o To minimize large prediction errors

o One vertical and one horizontal recursive filter
o Operates on an entire MB – switched on/off for all six 8x8 pixel blocks

4. DCT Transform
5. Quantization

6. Synchronization control

o By transmitting intraframes once in a while


o receiver freezes an image in case of loss of sync, requests a new intraframe image

Video Multiplex Coder:

1. Picture – PSC, PN, PS (split screen, CIF/QCIF etc), GOB Data


2. GOB – GSC, GN, quantization info, MB data
3. Macroblock – coefficients of Y, Cb, Cr components
4. Block – sync bit, data, BCH checksum (FEC)

Data rate control

- Difference block threshold adjustment


- quantization step size adjustment (larger step size – lower data rate)
- MB in intraframe or interframe mode

MPEG-1 Video

Video on CD’s and hard disks

D frames: low quality I frames for fast forward/backward search

Encoder: Input → Picture re-order → Motion estimator (motion vectors) → DCT → Q → VLC → Mux

MPEG-2 Video/ H.262

Studio and home electronics, digital TV and DVD

Better quality at lower data rates, at the cost of higher complexity

Wide range of resolutions, data rates and qualities

Divided into profiles (tool sets) and levels (parameter constraints)

2.3 Part III – MPEG-4 Video, Container formats

MPEG-4 Part 2 – Visual / H.263

- Besides video, other visual objects


- In DivX / Xvid

MPEG- 4 Part 10 – Advanced Video Coding / H.264

- Blu-Ray, HDTV (DVB-X2), Mobile TV, Video streaming


- Improvements, performance doubled compared to H.263
- Enhanced motion estimation with variable block size
- Slices (allows parallelization, intraframe only within a slice)

HEVC / H.265

- Further halves the bitrate compared to AVC


- DVB-T2, Ultra Blu-Ray
- profiles, levels, tiers
3. AUDIO CODING

3.1 Part I – Psychoacoustic Fundamentals

Frequency masking/Simultaneous masking: A sound is made inaudible by noise or an unwanted
sound that has the same duration as the original sound. For example, pure tonal sounds can be
perceived clearly when separated but not when they occur at the same time.

Frequency resolution/frequency selectivity:

 The ability to hear frequencies separately rather than a combination.


 Occurs in the basilar membrane.
 The listener tunes to the center frequency of the sound and does not let other frequencies
pass.

Critical bandwidth: When two frequencies can be heard as a combination tone, they reside in the same
critical bandwidth.

In the cochlea –

 A complex sound is split into different frequency components, these components appear as
various peaks at specific places across the length of the basilar membrane.
 Frequency selectivity in the ear occurs due to auditory filters (cochlea and outer hair cells).
 Auditory filters are centered over the frequency of the tone
 Damage to the auditory filters can limit the ability to tell sounds apart.

Masking: Exhibits the limits of frequency selectivity. If a sound is masked by noise or an unwanted
sound of different frequency, the human auditory system cannot distinguish between these two
sounds.

Effect of masking on the signal threshold:

 Masking can increase threshold of a signal


 Degree of masking depends on –
- the frequency of the signal and on the frequency of the masker
- if the signal and the masker are presented at the same time (that instant of time)/frequency
(with the same duration of occurrence)
- intensity of the masker

Effect of masking at lower/higher frequencies - Spread of masking:

 As the intensity of the masker increases, spread of masking occurs at higher frequencies
 An interfering signal masks higher frequency signals much better than lower frequency signals
 Higher frequency maskers are only effective over a narrow range of frequencies, close to the
masker frequency
 Lower frequency maskers are effective over a wide range
 Bandwidth of the masker within the auditory filter masks the tone but has no effect outside
this bandwidth

Leveraging spread of masking in MP3: Parts of the signal outside the critical bandwidth are coded
with less precision, while parts of the signal perceived by the listener are represented with higher
precision → reduces the size of the audio file.
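A toy sketch of this idea (not the actual MPEG psychoacoustic model; the levels and thresholds are invented): sub-bands whose level falls below the masking threshold receive no bits, and the rest get roughly one bit per 6 dB of signal-to-mask ratio.

```python
import numpy as np

levels_db = np.array([60, 42, 18, 55, 10, 35])  # sub-band signal levels (made up)
mask_db = np.array([30, 30, 25, 30, 25, 35])    # hypothetical masking thresholds

smr = levels_db - mask_db                        # signal-to-mask ratio per band
bits = np.clip(np.ceil(smr / 6.02), 0, None)     # ~6.02 dB of SNR per bit
# Bands masked by their neighbours (negative SMR) get zero bits
```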

Temporal masking/non-simultaneous masking:

 Pre-masking – A sound is masked by something that appears after it. Up to 20 ms. Helps
hide pre-echo if the transform block is short.
 Pre-echo – Quantized transform coefficients produce noise at all time instants, which
becomes audible with transient sounds. If a quiet signal block ends with a transient
sound (drums/castanets), noise can be heard even before the transient sound.
 Post-masking – Masking that occurs after a strong sound. Up to 200 ms. Decay in the
effect of the masker

In frequency masking, a loud sound partially or totally masks a softer sound. In temporal masking, a
loud sound blocks other sounds for a period.

Knowledge in psychoacoustics is used to transmit only data that is relevant to the human auditory
system.

Perceptual audio encoder

3.2 Part II – MPEG Audio Coding

MPEG 1 Audio

Main building blocks:

1. Filter Bank – decompose signal into spectral components (time-frequency domain)


2. Perceptual Model – calculates the masking threshold (SMR)
3. Quantization and coding – to reduce redundancy. Introduces noise

Mathematically, all transforms used in audio coding systems can be seen as filter banks.

Shape of quantization noise determines audibility of the signal


Joint stereo mode: for increased compression rate

1. M/S Stereo coding: Mid/Side instead of Left/Right channels


- M stereo channel – Sum of L & R = (L+R)/2
- S stereo channel – Difference of L & R = (L-R)/2
2. Intensity Stereo Coding: leverages the low accuracy of human hearing at perceiving the
direction of certain frequencies.
- Inter-aural-time differences (ITD) for low frequencies
- Inter-aural-amplitude differences (IAD) for higher ones, only mono signal is transmitted
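M/S coding with the formulas above is a two-line transform; a quick sketch (the sample values are arbitrary):

```python
import numpy as np

def ms_encode(left, right):
    # Mid = (L+R)/2, Side = (L-R)/2, as defined above
    return (left + right) / 2, (left - right) / 2

def ms_decode(mid, side):
    # Perfect reconstruction: L = M + S, R = M - S
    return mid + side, mid - side

L = np.array([0.5, 0.4, -0.2])
R = np.array([0.5, 0.38, -0.25])   # nearly identical channels
M, S = ms_encode(L, R)
# S is close to zero -> compresses far better than coding L and R directly
```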

                              Layer I          Layer II         Layer III
                              (TM: Philips)    (TM: MUSICAM)
Frame size (samples)          384              1152             1152
Quantization                  Block companding Block companding Non-uniform +
                              (12 samples)     (12 samples)     Huffman
Spectral resolution (bands)   32               32               576/192
Scale factor select info      No               Yes              Yes
Bit allocation                Yes              Yes              Yes
Scale factor                  Yes              Yes              Yes
Bit packing                   No               Yes              Yes
Header-less format            No               No               Yes
Joint stereo                  No               No               Yes

MPEG 1 Audio – Layer 3

Bit stream:

 Fixed part:
- independent of bit rate: 17 bytes in mono, 32 in stereo
- additional info for the frame (pointer for the variable part)
- additional info per granule (selection of Huffman coding tables)
 Variable part (main info):
- Scale factors
- Huffman coded frequency lines
- additional data
 Bit streams can be switched dynamically

2 MDCT block lengths:


- Short block – 6 samples
- Long block – 18 samples

The higher the layer, the greater the compression, but also the higher the required processing power.

3.3 Part III – Speech Coding

Parameter estimation (with the help of audio coding techniques) to model speech + data compression
(to represent these parameters in a compact bit stream)

Source filter ‘model’ of speech production:

speech = sound source + linear acoustic filter

 sound source/excitation signal models the vocal cords; periodic impulse train

 linear filter models the vocal tract; all-pole filter; coefficients obtained by performing linear
prediction

excitation signal * filter response = synthesized speech
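The filter coefficients can be sketched with the autocorrelation method of linear prediction. This is illustrative only: real speech codecs solve the same equations with the Levinson-Durbin recursion, and the test signal below is a synthetic all-pole process, not speech.

```python
import numpy as np

def lpc(signal, order):
    """Linear prediction: find coefficients a so that
    s[n] ~ a[0]*s[n-1] + ... + a[order-1]*s[n-order],
    by solving the Yule-Walker (autocorrelation) equations."""
    n = len(signal)
    r = np.array([signal[:n - k] @ signal[k:] for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])

# Synthesize a signal from a known all-pole process, then recover its coefficients
rng = np.random.default_rng(0)
x = np.zeros(5000)
e = rng.normal(size=5000)          # excitation: white noise
for i in range(2, 5000):
    x[i] = 0.75 * x[i - 1] - 0.5 * x[i - 2] + e[i]
a = lpc(x, 2)                      # close to the true [0.75, -0.5]
```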


Speech coders:

1. Waveform coders (code only the signal shape, no regard for human speech production):
   - Time domain: PCM, ADPCM
   - Frequency domain: Sub-band coding (codes only the spectral shape)
2. Vocoders (extract certain voice parameters to be transmitted):
   - Linear predictive coders
   - Formant coders

Mean opinion score (MOS): A rated survey to analyze the speech quality. Rated from 1 to 5

Pulse code modulation (PCM)/G.711:

- For voice frequencies


- Logarithmic quantization: mu-law, A-law
- 8 kHz sampling frequency
- 64 kbps
- ISDN, VoIP
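The logarithmic (mu-law) companding curve can be sketched as follows. Note this is the continuous formula, not the exact segmented 8-bit encoding tables that G.711 actually specifies.

```python
import numpy as np

MU = 255.0  # mu-law parameter used in North America / Japan

def mulaw_compress(x):
    """Compress a signal in [-1, 1]: small amplitudes are boosted,
    large ones compressed, matching the ear's logarithmic sensitivity."""
    return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

def mulaw_expand(y):
    """Exact inverse of the compression curve."""
    return np.sign(y) * ((1 + MU) ** np.abs(y) - 1) / MU

x = np.linspace(-1, 1, 101)
y = mulaw_compress(x)   # this curve is what gets quantized to 8 bits
```

Because quiet sounds are expanded before uniform quantization, the effective step size is much finer at low amplitudes, which is where speech spends most of its time.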

Code Excited Linear Prediction (CELP):

- low bit rates


- natural sounding
- complex code book search

Main ideas behind CELP:

1. Excitation codebook: Fixed and adaptive codebooks as the input (excitation) to the LP filter
2. Speech production: Source-filter model (pitch synthesis + spectral envelope filter) through
linear prediction (buffer and LPC analysis)
3. Weighting: Closed-loop search performed in the perceptually weighted domain
4. Vector quantization

Hybrid Coders: Waveform coders (excitation+synthesis) + LP (weighted errors)


G.728/ LD-CELP: Low delay CELP:

- 16 kbps
- Analysis by synthesis
- Low delay – 5 samples in the decoder (Backwards adaptation of gain and predictor)
- Short term pitch prediction

Perceptual weighting:

Hybrid window (to generate autocorrelation coefficients)

Recursion module (converts ACF coefficients to predictor coefficients)

Weighting filter coefficient calculation

4. METADATA STANDARDS

- Data about data


- Additional technical data/content data/ sender-receiver info
- Channels/frequency; name (title/album/artist); image specs (ISO/shutter speed)
- Not part of multimedia data; might not be present with multimedia data on same device

4.1 Part I – Consumer Standards

CDDB (for CDs)

ID3v1 + ID3v2 (for mp3)

EXIF (for photos)

CD Data Base:

 Software CD players can query info from an online DB for data not stored on CD
 A unique ID is created based on the titles and the track duration for querying
 New format: CD Text (if device can support)
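A sketch of the idea — this is NOT the real CDDB disc-ID algorithm (which hashes track start offsets and the total playing time); it only shows how a lookup key can be derived from the track layout alone.

```python
import hashlib

def toy_disc_id(track_lengths_sec):
    """Derive a short, stable key from the number of tracks and their
    durations, so a player can query an online database for titles."""
    layout = ",".join(str(t) for t in track_lengths_sec)
    digest = hashlib.sha1(layout.encode()).hexdigest()[:8]
    return f"{len(track_lengths_sec):02x}{digest}"

key = toy_disc_id([210, 185, 330])  # same CD -> same key, so lookups work
```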

ID3:

 For storing mp3 metadata

ID3v1:

 Additional info block of 128 bytes


 at the end of a bit stream
 not flexible and extensible
 industry specified

ID3v2:

 Frames
 lyrics + pictures possible

EXIF:

 Stored directly in the image

4.2 Part II – Interlude standards

XML

JSON

XML: Extensible markup language

 Text based, human readable


 Device independent
 Complicated to parse
 Memory-consuming (often needs to be zipped)
 Docs are well-formed – no syntactic errors
 Docs are valid – conform to a matching schema

JSON: JavaScript Object Notation

 IETF
 Easily parsable by web browsers
 less overhead
 no built-in validation
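The same (made-up) track-metadata record in both formats, parsed with Python's standard library, illustrates the trade-off the bullets describe: XML needs a tree parser, while JSON maps directly onto native dictionaries.

```python
import json
import xml.etree.ElementTree as ET

xml_doc = "<track><title>Demo</title><artist>Someone</artist></track>"
json_doc = '{"track": {"title": "Demo", "artist": "Someone"}}'

# XML: parse into an element tree, then navigate it
title_from_xml = ET.fromstring(xml_doc).find("title").text

# JSON: one call yields plain nested dicts
title_from_json = json.loads(json_doc)["track"]["title"]
```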

4.3 Part III – Broadcast Standards


MPEG 7

Dublin Core

MPEG 7:


5. SYSTEM STANDARDS

5.1 Part I – Audio broadcast: Digital Audio Broadcasting

- International consortium (broadcasters, network operators, consumer electronics, researchers)

Transmitter: Input + channel coding + multiplexing + modulation (COFDM)

2 types of data –

1. Radio program
2. Data services

Transmission channels in DAB:

1. Synchronization channels
o Zero and phase reference
2. Fast information channel
o Analyzed to decode the payload
3. Main service channel
o Radio programs and data services
DAB data modes:

1. Stream mode – constant data stream (64kbps); radio programs


2. Packet mode – asynchronous data stream (very low bit rates); pure data services

DAB Transmission mechanisms:

- For data services

Fast Information Data channel

XPAD

Packet Mode

Fast Information Data channel: Traffic message channel (TMC)

- Based on Alert C protocol


- information contained – event, severity, location, duration, alternative route

Extensible Program Associated Data (PAD/X-PAD)

DLS – Dynamic Label Segment (text):
• Short text messages
• 8 segments, 18 characters
• SMS, weather forecast, title/artist

SLS – Slide Show (images):
• Sequence of images
• MOT protocol
• JPEG & PNG

EPG – Electronic Program Guide

- 4 transmission modes with different frequency bands and frame durations

5.2 Part II – Video Broadcasting: Digital Video Broadcasting

- More channels with the same bandwidth (Combined baseband processing)


- Selection of audio video quality
- additional services – subtitles, internet etc

DVB transmission paths: satellite, terrestrial, cable

1. Satellite transmission:
 Uplink – frequency multiplex; single frequency to transponders
 Downlink – frequency multiplex; full-scale amplification; no AM; uniform power distribution; QPSK
 Bandwidth – 27 MHz in the GHz range; QPSK; frequency re-use by satellites in different
geo-stationary positions
2. Terrestrial transmission:
 Channel – multipath propagation; in-house reception
 Bandwidth – 8 MHz
 Single frequency networks possible
 Good area coverage for roof antennas
 OFDM
3. Cable transmission:
 High SNR
 Bandwidth – relatively low
 Existing cable network must be available
 Must be compatible with satellite channels
 QAM

5.3 Part III – Streaming

- Simultaneous download and play of media data via Internet

Real Time Transport Protocol (RTP)

MPEG DASH

RTP:

 Complemented by the Real Time Control Protocol (RTCP) and the Real Time Streaming Protocol (RTSP)

 No client controls

MPEG DASH:

 Dynamic Adaptive Streaming over HTTP (manifest in XML)

 No client or codec specification
 Dynamic switching between representations (bitrates/codecs)
 Interoperability between clients
 Client can control what to stream

MPEG DASH Stream Structure:

1. Period – Time frame


2. Adaptation set – variants of a period (e.g. language)
2.1 Representation – link to actual media data; segments
2.2 Segments – initialization segments & media segments
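The structure above can be illustrated with a stripped-down, hypothetical MPD (real manifests carry many more attributes and a DASH XML namespace); the client walks Period → Adaptation set → Representation and picks the variant matching its bandwidth.

```python
import xml.etree.ElementTree as ET

mpd = """
<MPD mediaPresentationDuration="PT30S">
  <Period>
    <AdaptationSet lang="en">
      <Representation id="video-low" bandwidth="500000"/>
      <Representation id="video-high" bandwidth="3000000"/>
    </AdaptationSet>
  </Period>
</MPD>
"""

root = ET.fromstring(mpd)
reps = [(r.get("id"), int(r.get("bandwidth")))
        for r in root.iter("Representation")]

# On a slow link the client dynamically switches to the cheapest representation
choice = min(reps, key=lambda rep: rep[1])
```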
