
MULTIMEDIA STANDARDS

1. Standardization
2. Video Coding
2.1 Part I – JPEG, Basic ideas, standards
2.2 Part II – H.261, MPEG-1, MPEG-2
2.3 Part III – AVC, HEVC, Container formats
3. Audio Coding
3.1 Part I – Psychoacoustic Fundamentals
3.2 Part II – MPEG 1 Audio
3.3 Part III – Speech coding
4. Metadata Standards
4.1 Part I – Consumer Standards: CDDB, ID3v1 + ID3v2, EXIF
4.2 Part II – Interlude standards: XML, JSON
4.3 Part III – Broadcast standards: MPEG 7, Dublin core
5. System Standards
5.1 Part I – Audio broadcast
5.2 Part II – Video Broadcast
5.3 Part III – Streaming
2. VIDEO CODING

2.1 Part I – Video Coding I

Still Image Coding – JPEG & JPEG-2000

1. Colour Subsampling:
o Color space transformation: RGB to YCbCr for better compression; changes in brightness (Y)
are better perceived by humans than changes in hue (Cb) and color saturation (Cr).
o Chroma subsampling: Reduce spatial resolution of Cb & Cr
o Block Splitting: Each channel split into 8x8 blocks (MB/MCU)
2. 2-D DCT: Each 8x8 block of Y, Cb, Cr components converted to frequency domain
representation
o Top-left component: DC; largest magnitude; represents the average value of the entire block
o Next 63 components: AC
o Advantage: Concentrates most of the signal energy in the top-left (low-frequency) corner
3. Quantization: Greatly reduces the high frequencies, as the human eye can hardly perceive
differences at high frequencies
o Only lossy operation (rounding off to the nearest integer) if the DCT is performed with high
precision
o As a result, most high-frequency components become zero, the rest become small values
o Elements of the quantization matrix control the compression ratio (larger values →
greater compression)
4. Entropy Coding: Lossless data compression
o Zig-Zag Coding: Arranging components in a zig-zag pattern and applying the RLE algorithm
(groups similar frequencies together)
o Huffman Coding
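Steps 2 and 3 above can be sketched for a single 8x8 luminance block. A minimal illustration with numpy: the quantization matrix is the standard JPEG luminance table, but the sample block itself is made up.

```python
import numpy as np

# Standard JPEG luminance quantization table (quality ~50)
Q = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99]])

# Orthonormal 8x8 DCT-II basis matrix (row index = frequency)
n, k = np.meshgrid(np.arange(8), np.arange(8))
C = np.sqrt(2 / 8) * np.cos(np.pi * (2 * n + 1) * k / 16)
C[0, :] = np.sqrt(1 / 8)

# A made-up 8x8 Y block: a smooth horizontal ramp (values 0..224)
block = np.tile(np.arange(0, 256, 32), (8, 1)).astype(float)

# 2-D DCT after level shift; the DC coefficient ends up top-left
coeffs = C @ (block - 128) @ C.T

# Quantization - the only lossy step
quantized = np.round(coeffs / Q).astype(int)
```

For this smooth block, almost all AC coefficients quantize to zero, which is exactly what makes the subsequent run-length and Huffman coding effective.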
JPEG 2000:

- Higher compression rates


- Lossy and lossless compression within a single codec
- No 2D-DCT: Wavelet transforms
- No blocking artifacts
- Decomposition into code blocks: Each block is encoded independently, giving access to
different regions of the image

Basic Ideas of Video Coding

Video compression basics – frames are coded in one of two ways:

1. Intra-frame (spatial coding)
2. Inter-frame (temporal coding)

Intra-frame coding: Still image coding (same as JPEG)

1. Color transformation + Chroma subsampling + block splitting


2. 2D – DCT
3. Quantization
4. Entropy coding (RLE + Huffman coding)

Principle of difference image coding: Instead of the full next frame, only the difference between
the current frame and the previous frame is coded
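A minimal numeric illustration of this principle (the pixel values are made up):

```python
import numpy as np

prev = np.array([[10, 10], [20, 20]], dtype=np.int16)  # previous frame
cur  = np.array([[10, 12], [20, 20]], dtype=np.int16)  # current frame

diff = cur - prev            # only changed pixels are non-zero -> cheap to code
reconstructed = prev + diff  # the decoder adds the difference back
```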

Frame Prediction:

1. I-frames (Intra-coded picture) – coded using only the current picture


2. P-frames (Predictive coded picture) – coded using current and previous
3. B-frames (Bi-directional predictive coded picture) – coded using current, previous and
successive frames

Motion Compensation:

- Provides an estimate of successive frames
- Computes the difference between the luminance components (Y) of the origin MB (in the
actual frame) and the “search” MB (in the estimated frame)
- The “search” MB is located in the previous frame to determine the motion vector
- Based on the motion vectors, successive frames can be estimated
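The search described above can be sketched as a brute-force block-matching routine. This is illustrative only: real encoders use much faster search strategies, and the function and parameter names here are made up.

```python
import numpy as np

def find_motion_vector(prev, cur, top, left, bs=16, radius=4):
    """Full search: try every displacement within +/-radius and return the
    (dy, dx) whose block in the previous frame has minimum SAD (sum of
    absolute differences) against the current macroblock's Y samples."""
    target = cur[top:top + bs, left:left + bs].astype(int)
    best_sad, best_mv = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + bs > prev.shape[0] or x + bs > prev.shape[1]:
                continue  # candidate block would fall outside the frame
            sad = np.abs(prev[y:y + bs, x:x + bs].astype(int) - target).sum()
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv

# Synthetic check: shift a random frame 2 px to the right; the recovered
# vector for an interior macroblock points 2 px left into the previous frame.
rng = np.random.default_rng(42)
prev = rng.integers(0, 256, size=(64, 64))
cur = np.roll(prev, 2, axis=1)
mv = find_motion_vector(prev, cur, 16, 16)
```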

2.2 Part II – H.261, MPEG-1, MPEG-2

H.261

Macroblocks:

- 6 8x8 pixel blocks
- 4xY : 1xCb : 1xCr
- 16x16 array of luma samples
- 8x8 array of chroma samples (each for Cb and Cr)
- 4:2:0 subsampling

GOB : 11x3 = 33 MB’s

Motion compensation inter-picture prediction

(reduces temporal redundancy, motion vectors compensate for motion)

Transform coding (2D-DCT - reduces spatial redundancy)

1. Intraframe - Scalar (for DC – 8 bit) & Vector ( for AC – variable step 2-64)
2. Interframe – Vector with variable step size 2-64 for all values

Zig-zag scan → entropy coding

Only decoders are standardized

Encoder designers can design their own pre-processing and encoding algorithms (better
enhancements/better motion estimation algorithms) as long as the output can be decoded by any
standardized decoder within defined error limits

2 frame formats:

1. CIF – 352x288; 30 fps

2. QCIF – 176x144 (quarter resolution); support is mandatory
H.261 Coder:

H.261 Source Coder:

1. Coding control:

o Segmentation of stream to a block of 8x8 pixels


o Intraframe mode – transmission of ALL blocks
o Interframe mode – only those blocks whose difference exceeds a threshold
o Transmission control – transmitted blocks are transformed → coefficients quantized → entropy
coded

2. Prediction:

o Motion compensation (optional)


o One motion vector per MB
o x, y components must not exceed ±15 for all Y blocks
o Motion vector for the Cb & Cr components = MB motion vector components / 2

3. Loop Filter:

o To minimize large prediction errors

o One vertical and one horizontal recursive filter
o Operates on an entire MB – switched on/off for all six 8x8 pixel blocks

4. DCT Transform
5. Quantization

6. Synchronization control

o By transmitting intraframes once in a while


o receiver freezes an image in case of loss of sync, requests a new intraframe image

Video Multiplex Coder:

1. Picture – PSC, PN, PS (split screen, CIF/QCIF etc), GOB Data


2. GOB – GSC, GN, quantization info, MB data
3. Macroblock – coefficients of Y, Cb, Cr components
4. Block – sync bit, data, BCH checksum (FEC)

Data rate control

- Difference block threshold adjustment


- quantization step size adjustment (larger step size – lower data rate)
- MB in intraframe or interframe mode

MPEG-1 Video

Video on CD’s and hard disks

D frames: low quality I frames for fast forward/backward search

Encoder: Input → Picture re-order → Motion estimator (motion vectors) → DCT → Q → VLC → Mux

MPEG-2 Video/ H.262

Studio and home electronics, digital TV and DVD

Better quality at lower data rates, at the cost of higher complexity

Wide range of resolutions, data rates and qualities

Divided into profiles (tool sets) and levels (parameter constraints)

2.3 Part III – MPEG-4 Video, Container formats

MPEG-4 Part 2 – Visual / H.263

- Besides video, other visual objects


- In DivX / Xvid

MPEG- 4 Part 10 – Advanced Video Coding / H.264

- Blu-Ray, HDTV (DVB-X2), Mobile TV, Video streaming


- Improvements, performance doubled compared to H.263
- Enhanced motion estimation with variable block size
- Slices (allows parallelization, intraframe only within a slice)

HEVC / H.265

- Further halves the bitrate compared to AVC


- DVB-T2, Ultra Blu-Ray
- profiles, levels, tiers
3. AUDIO CODING

3.1 Part I – Psychoacoustic Fundamentals

Frequency masking/Simultaneous masking: A sound is made inaudible by noise or an unwanted
sound that has the same duration as the original sound. For example, pure tonal sounds can be
perceived clearly when separated but not when they occur at the same time.

Frequency resolution/frequency selectivity:

 The ability to hear frequencies separately rather than a combination.


 Occurs in the basilar membrane.
 The listener tunes to the center frequency of the sound and does not let other frequencies
pass.

Critical bandwidth: When two frequencies can be heard as a combination tone, they reside in the same
critical bandwidth.

In the cochlea –

 A complex sound is split into different frequency components, these components appear as
various peaks at specific places across the length of the basilar membrane.
 Frequency selectivity in the ear occurs due to auditory filters (cochlea and outer hair cells).
 Auditory filters are centered over the frequency of the tone
 Damage to the auditory filters can limit the ability to tell sounds apart.

Masking: Exhibits the limits of frequency selectivity. If a sound is masked by noise or an unwanted
sound of different frequency, the human auditory system cannot distinguish between these two
sounds.

Effect of masking on the signal threshold:

 Masking can increase threshold of a signal


 Degree of masking depends on –
- the frequency of the signal and on the frequency of the masker
- if the signal and the masker are presented at the same time (that instant of time)/frequency
(with the same duration of occurrence)
- intensity of the masker

Effect of masking at lower/higher frequencies - Spread of masking:

 As the intensity of the masker increases, spread of masking occurs at higher frequencies
 An interfering signal masks higher frequency signals much better than lower frequency signals
 Higher frequency maskers are only effective over a narrow range of frequencies, close to the
masker frequency
 Lower frequency maskers are effective over a wide range
 Bandwidth of the masker within the auditory filter masks the tone but has no effect outside
this bandwidth

Leveraging spread of masking in MP3: Parts of the signal outside the critical bandwidth are coded
with less precision, while parts of the signal perceived by the listener are represented with higher
precision → reduces the size of the audio file.
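A toy sketch of this idea (not the actual MPEG psychoacoustic model; the levels and thresholds are invented): sub-bands whose level falls below the masking threshold receive no bits, and the rest get roughly one bit per 6 dB of signal-to-mask ratio.

```python
import numpy as np

levels_db = np.array([60, 42, 18, 55, 10, 35])  # sub-band signal levels (made up)
mask_db = np.array([30, 30, 25, 30, 25, 35])    # hypothetical masking thresholds

smr = levels_db - mask_db                        # signal-to-mask ratio per band
bits = np.clip(np.ceil(smr / 6.02), 0, None)     # ~6.02 dB of SNR per bit
# Bands masked by their neighbours (negative SMR) get zero bits
```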

Temporal masking/non-simultaneous masking:

 Pre-masking – A sound is masked by something that appears after it. Up to 20 ms. Helps
hide pre-echo if the transform block is short.
 Pre-echo – Quantized transform coefficients produce noise at all time instants, which
becomes audible with transient sounds. If a quiet signal block ends with a transient
sound (drums/castanets), noise can be heard even before the transient sound.
 Post-masking – Masking that occurs after a strong sound. Up to 200 ms. Decay in the
effect of the masker

In frequency masking, a loud sound partially or totally masks a softer sound. In temporal masking, a
loud sound blocks other sounds for a period.

Knowledge in psychoacoustics is used to transmit only data that is relevant to the human auditory
system.

Perceptual audio encoder

3.2 Part II – MPEG Audio Coding

MPEG 1 Audio

Main building blocks:

1. Filter Bank – decompose signal into spectral components (time-frequency domain)


2. Perceptual Model – calculates the masking threshold (SMR)
3. Quantization and coding – to reduce redundancy. Introduces noise

Mathematically, all transforms used in audio coding systems can be seen as filter banks.

Shape of quantization noise determines audibility of the signal


Joint stereo mode: for increased compression rate

1. M/S Stereo coding: Mid/Side instead of Left/Right channels


- M stereo channel – Sum of L & R = (L+R)/2
- S stereo channel – Difference of L & R = (L-R)/2
2. Intensity Stereo Coding: leverages the low accuracy of human hearing at perceiving the
direction of certain frequencies.
- Inter-aural-time differences (ITD) for low frequencies
- Inter-aural-amplitude differences (IAD) for higher ones, only mono signal is transmitted
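M/S coding with the formulas above is a two-line transform; a quick sketch (the sample values are arbitrary):

```python
import numpy as np

def ms_encode(left, right):
    # Mid = (L+R)/2, Side = (L-R)/2, as defined above
    return (left + right) / 2, (left - right) / 2

def ms_decode(mid, side):
    # Perfect reconstruction: L = M + S, R = M - S
    return mid + side, mid - side

L = np.array([0.5, 0.4, -0.2])
R = np.array([0.5, 0.38, -0.25])   # nearly identical channels
M, S = ms_encode(L, R)
# S is close to zero -> compresses far better than coding L and R directly
```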

                              Layer I          Layer II         Layer III
                              (TM: Philips)    (TM: MUSICAM)
Frame size (samples)          384              1152             1152
Quantization                  Block companding Block companding Non-uniform +
                              (12 samples)     (12 samples)     Huffman
Spectral resolution (bands)   32               32               576/192
Scale factor select info      No               Yes              Yes
Bit allocation                Yes              Yes              Yes
Scale factor                  Yes              Yes              Yes
Bit packing                   No               Yes              Yes
Header-less format            No               No               Yes
Joint stereo                  No               No               Yes

MPEG 1 Audio – Layer 3

Bit stream:

 Fixed part:
- independent of bit rate: 17 bytes in mono, 32 in stereo
- additional info for the frame (pointer for the variable part)
- additional info per granule (selection of Huffman coding tables)
 Variable part (main info):
- Scale factors
- Huffman coded frequency lines
- additional data
 Bit streams can be switched dynamically

2 MDCT block lengths:


- Short block – 6 samples
- Long block – 18 samples

The higher the layer, the greater the compression, but also the higher the required processing power.

3.3 Part III – Speech Coding

Parameter estimation (with the help of audio coding techniques) to model speech + data compression
(to represent these parameters in a compact bit stream)

Source filter ‘model’ of speech production:

speech = sound source + linear acoustic filter

 sound source/excitation signal models the vocal cords; periodic impulse train

 linear filter models the vocal tract; all-pole filter; coefficients obtained by performing linear
prediction

excitation signal * filter response = synthesized speech
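The filter coefficients can be sketched with the autocorrelation method of linear prediction. This is illustrative only: real speech codecs solve the same equations with the Levinson-Durbin recursion, and the test signal below is a synthetic all-pole process, not speech.

```python
import numpy as np

def lpc(signal, order):
    """Linear prediction: find coefficients a so that
    s[n] ~ a[0]*s[n-1] + ... + a[order-1]*s[n-order],
    by solving the Yule-Walker (autocorrelation) equations."""
    n = len(signal)
    r = np.array([signal[:n - k] @ signal[k:] for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])

# Synthesize a signal from a known all-pole process, then recover its coefficients
rng = np.random.default_rng(0)
x = np.zeros(5000)
e = rng.normal(size=5000)          # excitation: white noise
for i in range(2, 5000):
    x[i] = 0.75 * x[i - 1] - 0.5 * x[i - 2] + e[i]
a = lpc(x, 2)                      # close to the true [0.75, -0.5]
```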


Speech coders:

1. Waveform coders (code only the signal shape, no regard for human speech production):
   - Time domain: PCM, ADPCM
   - Frequency domain: Sub-band coding (codes only the spectral shape)
2. Vocoders (extract certain voice parameters to be transmitted):
   - Linear predictive coders
   - Formant coders

Mean opinion score (MOS): A rated survey to analyze the speech quality. Rated from 1 to 5

Pulse code modulation (PCM)/G.711:

- For voice frequencies


- Logarithmic quantization: mu-law, A-law
- 8 kHz sampling frequency
- 64 kbps
- ISDN, VoIP
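The logarithmic (mu-law) companding curve can be sketched as follows. Note this is the continuous formula, not the exact segmented 8-bit encoding tables that G.711 actually specifies.

```python
import numpy as np

MU = 255.0  # mu-law parameter used in North America / Japan

def mulaw_compress(x):
    """Compress a signal in [-1, 1]: small amplitudes are boosted,
    large ones compressed, matching the ear's logarithmic sensitivity."""
    return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

def mulaw_expand(y):
    """Exact inverse of the compression curve."""
    return np.sign(y) * ((1 + MU) ** np.abs(y) - 1) / MU

x = np.linspace(-1, 1, 101)
y = mulaw_compress(x)   # this curve is what gets quantized to 8 bits
```

Because quiet sounds are expanded before uniform quantization, the effective step size is much finer at low amplitudes, which is where speech spends most of its time.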

Code Excited Linear Prediction (CELP):

- low bit rates


- natural sounding
- complex code book search

Main ideas behind CELP:

1. Excitation codebook: Fixed and adaptive codebooks as the input (excitation) to the LP filter
2. Speech production: Source-filter model (pitch synthesis + spectral envelope filter) through
linear prediction (buffer and LPC analysis)
3. Weighting: Closed-loop search performed in the perceptually weighted domain
4. Vector quantization

Hybrid Coders: Waveform coders (excitation+synthesis) + LP (weighted errors)


G.728/ LD-CELP: Low delay CELP:

- 16 kbps
- Analysis by synthesis
- Low delay – 5 samples in the decoder (Backwards adaptation of gain and predictor)
- Short term pitch prediction

Perceptual weighting:

Hybrid window (to generate autocorrelation coefficients)

Recursion module (converts ACF coefficients to predictor coefficients)

Weighting filter coefficient calculation

4. METADATA STANDARDS

- Data about data


- Additional technical data/content data/ sender-receiver info
- Channels/frequency; name (title/album/artist); image specs (ISO/shutter speed)
- Not part of multimedia data; might not be present with multimedia data on same device

4.1 Part I – Consumer Standards

CDDB (for CDs)

ID3v1 + ID3v2 (for mp3)

EXIF (for photos)

CD Data Base:

 Software CD players can query info from an online DB for data not stored on CD
 A unique ID is created based on the titles and the track duration for querying
 New format: CD Text (if device can support)
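A sketch of the idea — this is NOT the real CDDB disc-ID algorithm (which hashes track start offsets and the total playing time); it only shows how a lookup key can be derived from the track layout alone.

```python
import hashlib

def toy_disc_id(track_lengths_sec):
    """Derive a short, stable key from the number of tracks and their
    durations, so a player can query an online database for titles."""
    layout = ",".join(str(t) for t in track_lengths_sec)
    digest = hashlib.sha1(layout.encode()).hexdigest()[:8]
    return f"{len(track_lengths_sec):02x}{digest}"

key = toy_disc_id([210, 185, 330])  # same CD -> same key, so lookups work
```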

ID3:

 For storing mp3 metadata

ID3v1:

 Additional info block of 128 bytes


 at the end of a bit stream
 not flexible and extensible
 industry specified

ID3v2:

 Frames
 lyrics + pictures possible

EXIF:

 Stored directly in the image

4.2 Part II – Interlude standards

XML

JSON

XML: Extensible markup language

 Text based, human readable


 Device independent
 Complicated to parse
 Memory-consuming (often needs to be zipped)
 Docs are well-formed – no syntactic errors
 Docs are valid – conform to a matching schema

JSON: JavaScript Object Notation

 IETF
 Easily parsable by web browsers
 less overhead
 no built-in validation
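The same (made-up) track-metadata record in both formats, parsed with Python's standard library, illustrates the trade-off the bullets describe: XML needs a tree parser, while JSON maps directly onto native dictionaries.

```python
import json
import xml.etree.ElementTree as ET

xml_doc = "<track><title>Demo</title><artist>Someone</artist></track>"
json_doc = '{"track": {"title": "Demo", "artist": "Someone"}}'

# XML: parse into an element tree, then navigate it
title_from_xml = ET.fromstring(xml_doc).find("title").text

# JSON: one call yields plain nested dicts
title_from_json = json.loads(json_doc)["track"]["title"]
```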

4.3 Part III – Broadcast Standards


MPEG 7

Dublin Core

MPEG 7:


5. SYSTEM STANDARDS

5.1 Part I – Audio broadcast: Digital Audio Broadcasting

- International consortium (broadcasters, network operators, consumer electronics, researchers)

Transmitter: Input + channel coding + multiplexing + modulation (COFDM)

2 types of data –

1. Radio program
2. Data services

Transmission channels in DAB:

1. Synchronization channels
o Zero and phase reference
2. Fast information channel
o Analyzed to decode the payload
3. Main service channel
o Radio programs and data services
DAB data modes:

1. Stream mode – constant data stream (64kbps); radio programs


2. Packet mode – asynchronous data stream (very low bit rates); pure data services

DAB Transmission mechanisms:

- For data services

Fast Information Data channel

XPAD

Packet Mode

Fast Information Data channel: Traffic message channel (TMC)

- Based on Alert C protocol


- information contained – event, severity, location, duration, alternative route

Extensible Program Associated Data (PAD/X-PAD)

DLS – Dynamic Label Segment (text):
• Short text messages
• 8 segments, 18 characters
• SMS, weather forecast, title/artist

SLS – Slide Show (images):
• Sequence of images
• MOT protocol
• JPEG & PNG

EPG – Electronic Program Guide

- 4 transmission modes with different frequency bands and frame durations

5.2 Part II – Video Broadcasting: Digital Video Broadcasting

- More channels with the same bandwidth (Combined baseband processing)


- Selection of audio video quality
- additional services – subtitles, internet etc

DVB transmission paths: satellite, terrestrial, cable

1. Satellite transmission:
 Uplink – frequency multiplex; single frequency to transponders
 Downlink – frequency multiplex; full-scale amplification; no AM; uniform power distribution; QPSK
 Bandwidth – 27 MHz in the GHz range; QPSK; frequency re-use by satellites in different
geo-stationary positions
2. Terrestrial transmission:
 Channel – multipath propagation; in-house reception
 Bandwidth – 8 MHz
 Single frequency networks possible
 Good area coverage for roof antennas
 OFDM
3. Cable transmission:
 High SNR
 Bandwidth – relatively low
 Existing cable network must be available
 Must be compatible with satellite channels
 QAM

5.3 Part III – Streaming

- Simultaneous download and play of media data via Internet

Real Time Transport Protocol (RTP)

MPEG DASH

RTP:

 Complemented by the Real Time Control Protocol (RTCP) and the Real Time Streaming Protocol (RTSP)

 No client controls

MPEG DASH:

 Dynamic Adaptive Streaming over HTTP (manifest in XML)

 No client or codec specification
 Dynamic switching between representations (bitrates/codecs)
 Interoperability between clients
 Client can control what to stream

MPEG DASH Stream Structure:

1. Period – Time frame


2. Adaptation set – variants of a period (e.g. language)
2.1 Representation – link to actual media data; segments
2.2 Segments – initialization segments & media segments
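The structure above can be illustrated with a stripped-down, hypothetical MPD (real manifests carry many more attributes and a DASH XML namespace); the client walks Period → Adaptation set → Representation and picks the variant matching its bandwidth.

```python
import xml.etree.ElementTree as ET

mpd = """
<MPD mediaPresentationDuration="PT30S">
  <Period>
    <AdaptationSet lang="en">
      <Representation id="video-low" bandwidth="500000"/>
      <Representation id="video-high" bandwidth="3000000"/>
    </AdaptationSet>
  </Period>
</MPD>
"""

root = ET.fromstring(mpd)
reps = [(r.get("id"), int(r.get("bandwidth")))
        for r in root.iter("Representation")]

# On a slow link the client dynamically switches to the cheapest representation
choice = min(reps, key=lambda rep: rep[1])
```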
