1. Standardization
2. Video Coding
2.1 Part I – JPEG, Basic ideas, standards
2.2 Part II – H.261, MPEG-1, MPEG-2
2.3 Part III – AVC, HEVC, Container formats
3. Audio Coding
3.1 Part I – Psychoacoustic Fundamentals
3.2 Part II – MPEG-1 Audio
3.3 Part III – Speech coding
4. Metadata Standards
4.1 Part I – Consumer Standards: CDDB, ID3v1 + ID3v2, EXIF
4.2 Part II – Interlude standards: XML, JSON
4.3 Part III – Broadcast standards: MPEG-7, Dublin Core
5. System Standards
5.1 Part I – Audio broadcast
5.2 Part II – Video Broadcast
5.3 Part III – Streaming
2. VIDEO CODING
1. Colour Subsampling:
o Color space transformation: RGB to YCbCr for better compression; humans perceive
changes in brightness (Y) more acutely than changes in the color-difference components (Cb, Cr)
o Chroma subsampling: reduce the spatial resolution of Cb & Cr
o Block splitting: each channel is split into 8x8 blocks (minimum coded units, MCUs)
2. 2-D DCT: Each 8x8 block of Y, Cb, Cr components converted to frequency domain
representation
o Top-left component: DC; largest magnitude; represents the average value of the entire block
o Next 63 components: AC
o Advantage: aggregates most of the signal energy in the top-left (low-frequency) corner
3. Quantization: Greatly reduces the high-frequency components, since the human eye is far less
sensitive to differences at high spatial frequencies
o The only lossy operation (rounding to the nearest integer), provided the DCT is
performed with high precision
o As a result, most high-frequency components become zero and the rest become small values
o The elements of the quantization matrix control the compression ratio (larger values →
greater compression)
4. Entropy Coding: Lossless data compression
o Zig-zag coding: coefficients are read out in a zig-zag pattern (grouping similar
frequencies together) and then run-length encoded (RLE)
o Huffman Coding
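The DCT + quantization steps above can be sketched as follows. The quantization matrix here is a made-up toy (coarser steps at higher frequencies), not the JPEG default table:

```python
import math

def dct2(block):
    # naive 2-D DCT-II of an 8x8 block, with the JPEG level shift of -128
    N = 8
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            cu = math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)
            cv = math.sqrt(1 / N) if v == 0 else math.sqrt(2 / N)
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += ((block[x][y] - 128)
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = cu * cv * s
    return out

# toy quantization matrix: larger (coarser) steps for higher frequencies
Q = [[10 + 4 * (u + v) for v in range(8)] for u in range(8)]

def quantize(coeffs):
    # the only lossy step: divide by the matrix entry and round
    return [[round(coeffs[u][v] / Q[u][v]) for v in range(8)] for u in range(8)]

flat = [[130] * 8 for _ in range(8)]   # nearly uniform block
q = quantize(dct2(flat))               # only the DC term survives; all AC terms quantize to 0
```

For a flat block, all the energy lands in the DC coefficient, which is exactly why the zig-zag/RLE step that follows is so effective.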
JPEG 2000:
Video compression basics: frames are coded either intra-frame (spatial coding) or inter-frame (temporal coding).
Principle of difference image coding: instead of the next frame itself, only the difference between the
current frame and the previous frame is transmitted
Frame Prediction:
Motion Compensation:
H.261
Macroblocks :
1. Intraframe – Scalar (for DC – 8 bit) & Vector (for AC – variable step 2-64)
2. Interframe – Vector with variable step size 2-64 for all values
Coefficients are zig-zag scanned, then entropy coded.
Encoder designers can design their own pre-processing and encoding algorithms (better
enhancements / better motion-estimation algorithms) as long as the output can be decoded by any
standardized decoder within defined error limits
2 frame formats: CIF and QCIF
1. Coding control:
2. Prediction:
3. Loop Filter:
4. DCT Transform
5. Quantization
6. Synchronization control
MPEG-1 Video
Encoder: Input → picture re-order → motion estimator (motion vectors) → DCT → Q → VLC → Mux
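The motion estimator in the chain above can be sketched as exhaustive block matching with a sum-of-absolute-differences (SAD) cost, shown here in 1-D for brevity:

```python
def sad(a, b):
    # sum of absolute differences between two equally sized blocks
    return sum(abs(x - y) for x, y in zip(a, b))

def motion_search(block, ref_row):
    # exhaustive search: try every placement of the block in the
    # reference row and return the offset with the smallest SAD
    n = len(block)
    return min(range(len(ref_row) - n + 1),
               key=lambda d: sad(block, ref_row[d:d + n]))

ref = [0, 0, 5, 9, 5, 0, 0, 0]   # row from the previous frame
cur = [5, 9, 5]                  # block from the current frame
mv = motion_search(cur, ref)     # the pattern sits at offset 2 in ref
```

Real encoders use 2-D blocks and smarter search patterns, but the standard only fixes how the resulting motion vectors are coded, not how they are found.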
HEVC / H.265
Critical bandwidth: when two nearby frequencies are heard as one combination tone instead of being
resolved separately, they reside in the same critical bandwidth.
In the cochlea –
A complex sound is split into different frequency components; these components appear as
various peaks at specific places along the length of the basilar membrane.
Frequency selectivity in the ear occurs due to auditory filters (cochlea and outer hair cells).
Auditory filters are centered over the frequency of the tone
Damage to the auditory filters can limit the ability to tell sounds apart.
Masking: exhibits the limits of frequency selectivity. If a sound is masked by noise or by an unwanted
sound of a different frequency, the human auditory system cannot distinguish between the two
sounds.
As the intensity of the masker increases, spread of masking occurs at higher frequencies
An interfering signal masks higher frequency signals much better than lower frequency signals
Higher frequency maskers are only effective over a narrow range of frequencies, close to the
masker frequency
Lower frequency maskers are effective over a wide range
Bandwidth of the masker within the auditory filter masks the tone but has no effect outside
this bandwidth
Leveraging spread of masking in MP3: parts of the signal outside the critical bandwidth are coded
with less precision, while parts of the signal actually perceived by the listener are represented with
higher precision. This reduces the size of the audio file.
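A toy version of the resulting bit-allocation loop. The 6 dB-per-bit rule of thumb and the SMR values are illustrative assumptions, not MP3's actual tables:

```python
def allocate_bits(smr_db, pool):
    # greedy allocation: repeatedly give one more quantizer bit to the
    # band whose quantization noise is least masked, assuming each
    # extra bit lowers the noise by ~6 dB
    bits = [0] * len(smr_db)
    while pool > 0:
        i = max(range(len(smr_db)), key=lambda k: smr_db[k] - 6 * bits[k])
        if smr_db[i] - 6 * bits[i] <= 0:   # all noise already below the mask
            break
        bits[i] += 1
        pool -= 1
    return bits

# band 0 sticks out far above the mask, band 2 is fully masked
bits = allocate_bits([20.0, 3.0, -5.0], pool=6)
```

A fully masked band (negative signal-to-mask ratio) receives zero bits: its content is inaudible and need not be transmitted at all.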
Pre-masking – Sound is masked by something that appears after it. Up to 20 ms. Helps
hide pre-echo if the transform block is short.
Pre-echo – Quantized transform coefficients produce noise at all time instants within a
block, which becomes audible with transient sounds. If a quiet signal block has a transient
sound at the end (drums/castanets), noise can be heard even before the transient sound.
Post-masking – Masking that occurs after a strong sound. Up to 200ms. Decay in the
effects of the masker
In frequency masking, a loud sound partially or totally masks a softer sound. In temporal masking, a
loud sound blocks other sounds for a period.
Knowledge in psychoacoustics is used to transmit only data that is relevant to the human auditory
system.
MPEG-1 Audio
Mathematically, all transforms used in audio coding systems can be seen as filter banks.
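As a minimal illustration, the Haar transform viewed as a 2-band filter bank (lowpass = average, highpass = difference); MPEG-1 Audio actually uses a 32-band polyphase bank:

```python
def analysis(x):
    # split the signal into a lowpass (average) and a highpass (difference)
    # band, each at half the input sample rate
    low  = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    high = [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    return low, high

def synthesis(low, high):
    # perfect reconstruction: recombine the two bands
    out = []
    for l, h in zip(low, high):
        out += [l + h, l - h]
    return out

low, high = analysis([4, 2, 6, 6])   # smooth signal -> small highpass band
```

Coding gain comes from the same observation as in the transform view: for typical signals, most of the energy concentrates in a few bands, which can then be quantized according to the psychoacoustic model.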
Bit stream:
Fixed part:
- independent of bit rate: 17 bytes in mono, 32 in stereo
- additional info for the frame (pointer for the variable part)
- additional info per granule (selection of Huffman coding tables)
Variable part (main info):
- Scale factors
- Huffman coded frequency lines
- additional data
Bit streams can be switched dynamically
Parameter estimation (with the help of audio coding techniques) to model speech + data compression
(to represent these parameters in a compact bit stream)
Formant coders (extract certain voice parameters to be transmitted)
Mean opinion score (MOS): a rated survey to analyze speech quality, rated from 1 (bad) to 5 (excellent)
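The MOS is simply the mean of the listeners' ratings (the scores below are hypothetical):

```python
# ratings from five listeners on the 1 (bad) .. 5 (excellent) scale
scores = [4, 5, 3, 4, 4]
mos = sum(scores) / len(scores)   # mean opinion score
```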
1. Excitation codebook: fixed and adaptive codebooks as the input (excitation) to the LP
2. Speech production: source-filter model (pitch synthesis + spectral envelope filter)
through linear prediction (buffer and LPC analysis)
3. Weighting: perform a closed-loop search in the perceptually weighted domain
4. Vector quantization
- 16 kbps
- Analysis by synthesis
- Low delay – 5 samples in the decoder (Backwards adaptation of gain and predictor)
- Short term pitch prediction
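The LPC analysis mentioned above solves for the predictor coefficients from the signal's autocorrelation; a sketch of the standard Levinson-Durbin recursion:

```python
def levinson_durbin(r, order):
    # solve the normal equations for LPC coefficients a[1..order]
    # given autocorrelation values r[0..order]
    a = [0.0] * (order + 1)
    err = r[0]                      # prediction error energy
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1 - k * k
    return a[1:], err

# autocorrelation of an AR(1) process x[n] = 0.9*x[n-1] + noise:
# the recursion recovers the single true predictor coefficient 0.9
coeffs, err = levinson_durbin([1.0, 0.9, 0.81], order=2)
```

The shrinking error energy `err` is exactly what the quantizer then has to spend bits on: the better the predictor, the smaller the residual to code.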
Perceptual weighting:
CD Data Base:
Software CD players can query info from an online DB for data not stored on CD
A unique disc ID is computed from the track offsets and durations (the CD itself stores no titles) for querying
New format: CD Text (if device can support)
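A sketch in the spirit of the classic CDDB/FreeDB disc ID, with the details simplified (the real algorithm has its own exact field layout):

```python
def disc_id(track_starts_s, total_s):
    # combine a checksum of the track start times, the total playing time,
    # and the track count into one 32-bit lookup key
    digit_sum = lambda n: sum(int(d) for d in str(n))
    checksum = sum(digit_sum(t) for t in track_starts_s)
    return ((checksum % 255) << 24) | (total_s << 8) | len(track_starts_s)

# hypothetical disc: 3 tracks starting at 2 s, 152 s, 335 s; 400 s total
key = disc_id([2, 152, 335], 400)
```

Because the key depends only on the table of contents, any pressing of the same album produces the same ID, which is what makes the online lookup work.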
ID3:
ID3v1:
ID3v2:
Frames
lyrics + pictures possible
EXIF:
XML
JSON
- standardized by the IETF
- easily parsable by web browsers
- less overhead
- no built-in validation
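The lack of built-in validation shows up directly when parsing: any well-formed JSON is accepted, with no schema check (the field names below are made up):

```python
import json

# parses fine even though nothing validates the field names or types
record = json.loads('{"title": "Track 1", "duration": 215, "tyop_field": true}')
duration = record["duration"]
```

Schema enforcement (e.g. JSON Schema) is a separate layer on top, unlike XML with its DTD/XSD validation.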
Dublin Core
MPEG-7:
5. SYSTEM STANDARDS
2 types of data –
1. Radio program
2. Data services
1. Synchronization channels
o Zero and phase reference
2. Fast information channel
o Analyzed to decode the payload
3. Main service channel
o Radio programs and data services
DAB data modes:
X-PAD
Packet Mode
DVB
1. Satellite Transmission:
Uplink – frequency multiplex; single frequency to transponders
Downlink – frequency multiplex; full-scale amplification; no AM; uniform power distribution; QPSK
Bandwidth – 27 MHz in the GHz range; QPSK; frequency re-use by satellites in different
geostationary positions
2. Terrestrial Transmission:
Channel – Multipath propagation; in-house reception
Bandwidth – 8 MHz
Single frequency networks possible
good area coverage for roof antennas
OFDM
3. Cable Transmission:
High SNR
Bandwidth – relatively low
Existing cable network must be available
must be compatible with satellite channels
4 QAM
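The QPSK modulation mentioned for the satellite channel maps bit pairs onto four phase states; a sketch with a Gray-coded, unit-energy constellation (the exact bit-to-symbol mapping used in DVB is fixed by the standard, so this layout is an assumption):

```python
import math

# Gray-coded QPSK: adjacent constellation points differ in only one bit
QPSK = {
    (0, 0): complex( 1,  1) / math.sqrt(2),
    (0, 1): complex(-1,  1) / math.sqrt(2),
    (1, 1): complex(-1, -1) / math.sqrt(2),
    (1, 0): complex( 1, -1) / math.sqrt(2),
}

def modulate(bits):
    # two bits per symbol
    return [QPSK[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]

symbols = modulate([0, 0, 1, 1])
```

Constant envelope (all symbols have magnitude 1) is why QPSK tolerates the satellite transponder's non-linear, full-scale amplification, while cable's high SNR allows denser QAM constellations instead.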
RTP:
MPEG DASH: