
Concepts of Multimedia Processing and Transmission


IT 481, Lecture #1
Dennis McCaughey, Ph.D.
22 January, 2007

Outline

Course Description
Instructor
Exams, Homework and Project
Grading
General Policies
Lecture Schedule
IT 481, Spring

2
01/22/2007

Course Description

Topics
The fundamentals of signal and image
processing, including algorithms for signal
processing that have applications to multimedia
Techniques for voice coding and recognition, CD
and DVD technology, streaming video, WANs
and LANs, and videoconferencing technology

Text: Multimedia Communications: Applications, Networks, Protocols and Standards, Fred Halsall, Addison-Wesley, 1st edition (2002), ISBN: 0-201-39818-4.


Instructor

Dennis McCaughey
Contact Information

703-263-7425 (Office)
703-624-6830 (Cell)
dgm@rincon.com (e-mail)
Office Hours: one hour before class

Background

PhD in EE, University of Southern California, 1977
Thesis: Degrees of Freedom for Projection Imaging

Exams, Homework and Project

Mid-Term: 1 hour, closed book
Covers the key topics from class and homework

Final: Format To Be Determined


Homework: 1) Reading assignments, 2)
Written answers to selected questions
based on reading assignments, 3) Some
limited math problems
Project: Format (Preliminary): MATLAB
implementations of selected multimedia
processing applications.


More on the Project

A course project will explore aspects of multimedia signal processing and will be computer based, using MATLAB.
Project topics will consist of a set of MATLAB implementations addressing multimedia concepts, assigned on a running basis over the semester.
Each student will be required to submit the project in the format of a final report.
The projects will be graded on the effort applied, not on MATLAB programming skills.
Details regarding topics, content, and format will be
provided during the course.


Grading

The final grade will be determined by a weighted average of the homework assignments, a midterm exam, a final exam and a project:

Component   Weight
Homework    10%
Mid-Term    20%
Project     30%
Final       40%
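As a quick check of how the weights listed above combine, the sketch below computes a final grade as a weighted average. Only the weights come from the slide; the function name and the example scores are made up for illustration.

```python
# Weights from the grading breakdown, as fractions of the final grade.
WEIGHTS = {"homework": 0.10, "midterm": 0.20, "project": 0.30, "final": 0.40}


def final_grade(scores):
    """Return the weighted average of component scores (each on a 0-100 scale)."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly: " + ", ".join(WEIGHTS))
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example_scores = {"homework": 90, "midterm": 80, "project": 85, "final": 75}
print(final_grade(example_scores))
```

Because the final exam carries 40%, a 10-point change there moves the final grade by 4 points, while the same change in homework moves it by 1 point.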


General Policies

Collaboration
Students are permitted and encouraged to collaborate on homework
assignments.
All graded work, however, must be the original effort of the student
submitting the paper.

Homework
Homework will be collected at the beginning of each class
period. Note: Late homework will be accepted provided the reason for
the delay is coordinated with the instructor within 2 days of its
assignment. Homework solutions will be discussed in class.

Make-up Exams
Make-up exams will not be given unless detailed written clarification
accompanied by documentation for the absence is provided. If this
information is not provided an F grade will be given for the exam. The
location and time for a make-up exam will be decided by the instructor.
Also, students are expected to be in class and on time for every class.


Lecture Schedule (Preliminary)


Week

Date

Chapter

1/22

1/29

None

2/5

4
5
7
8
9
10

2/12
2/19
2/26
3/5
3/12
3/19

3
3
4
1-4
None
4

11

3/26

12

4/2

13

4/9

11

14
15
16

4/16
4/23
4/30
5/14

TBS
TBS
1-6,11


Topic
Lecture #1: Introduction to Multimedia Communications
Lecture #2: Signal Processing Fundamentals and Intro to Matlab
Lecture #3: Multimedia Information Representation
Lecture #4: Text Compression
Lecture #5: Image Compression
Lecture #6: Audio Compression
Mid-Term Exam & Project Review
Spring Break
Lecture #7: Video Compression
Lecture #8: Standards for Multimedia Communications
Lecture #9: Digital Communication Basics
Lecture #10: Entertainment Networks and High Speed Modems
Lecture #11: Data Privacy
Special Topics
Final Exam Review
Final Exam 7:30pm

Reading
Assignment

Homework

1,2

3
3
4
4

5
6
11
TBD
TBD
1-6,11


Multimedia Communications

What is Multimedia?

Multimedia is a combination of text, art, sound, animation, and video.

Slide: Courtesy, Hung Nguyen



Multimedia Components Simplified

Multimedia can be viewed as the combination of audio, video and data and how they interact with the user (more than the sum of the individual components)

[Figure: Multimedia as the combination of Audio, Video and Data]

Background

Fast-paced emergence of applications in medicine, education, travel, etc.
Characterized by large documents that must be communicated with short delays
Glamorous applications such as distance learning and video teleconferencing
Applications that are enhanced by video are often seen as the driver for development of multimedia networks


Forces Driving Communications That Facilitate Multimedia Communications

Evolution of communications and data networks
Increasing availability of almost unlimited bandwidth on demand
Availability of ubiquitous access to the network
Ever increasing amount of memory and computational power
Sophisticated terminals
Digitization of virtually everything


New Information System Paradigm

[Figure: integration of multimedia integrated communication (broadband link) with multimedia processing (workstation, PC)]

Slide: Courtesy, Hung Nguyen



Elements of Multimedia Systems

Two key communication modes
Person-to-person
Person-to-machine

[Figure: person-to-person as User Interface -> Transport -> User Interface; person-to-machine as Processing, Storage and Retrieval -> Transport -> User Interface]

Slide: Courtesy, Hung Nguyen



Multimedia Networks

The world has been wrapped in copper and glass fiber and can be viewed as a hair ball with physical, wireless and satellite entry/exit points.
Physical: LAN-WAN connections
Wireless: Cellular telephony, wireless PC connectivity
Satellite: INMARSAT, Thuraya, ACeS, etc.


Multimedia Communication Model

Partitioning of information objects into distinct types, e.g., text, audio, video
Standardization of service components per information type
Creation of platforms at two levels: network service and multimedia communication
Define general applications for multiple use in various multimedia environments
Define specific applications, e.g. e-commerce, tele-training, using building blocks from platform and general applications


Requirements

User Requirements

Fast preparation and presentation


Dynamic control of multimedia applications
Intelligent support to users
Standardization

Network Requirements
High speed and variable bit rates
Multiple virtual connections using the same
access
Synchronization of different information types
Suitable standardized services along with
support


Network Requirements

ATM-BISDN and SS7 have enabled the switching-based communications capabilities over the PSTN that support the necessary services
ATM-BISDN-SS7 will evolve to all-optical, switchless networks based on packet transfer


Packet Transfer Concept

Allows voice, video and data to be dealt with in a common format
More flexible than circuit switching, which it can emulate, while allowing the multiplexing of data streams with varied bit rates
Dynamic allocation of bandwidth
Handles Variable Bit Rate (VBR) directly


Considerations

Buffering required for constant bit rate data such as audio
Re-sequencing and recovery capabilities must be provided over networks where packets may be dropped or received in an order different from that transmitted
In an ATM network some packets can be dropped while others may not (e.g. voice vs. bank transfer data packets)
Optimum packet lengths for voice, video and data differ in an ATM network
IP packets over the internet may arrive in a different order or be dropped.
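The re-sequencing requirement can be illustrated with a small sketch. It assumes each packet carries a sequence number, which the slide does not spell out; the function name and the sample arrivals are hypothetical.

```python
def resequence(packets, first_seq, last_seq):
    """Reorder received (seq, payload) pairs and report missing sequence numbers.

    Assumes every packet carries an integer sequence number (an assumption
    made for this sketch, not something stated on the slide).
    """
    received = {}
    for seq, payload in packets:
        received.setdefault(seq, payload)  # ignore duplicates, keep the first copy
    expected = range(first_seq, last_seq + 1)
    ordered = [received[s] for s in expected if s in received]
    missing = [s for s in expected if s not in received]
    return ordered, missing


# Hypothetical arrival order: packet 3 was dropped, the rest arrive out of order.
arrivals = [(2, "p2"), (0, "p0"), (4, "p4"), (1, "p1")]
ordered, missing = resequence(arrivals, 0, 4)
print(ordered, missing)
```

What to do with the missing list depends on the traffic: a voice stream would typically conceal the gap and continue, while a data transfer would request retransmission.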


Digital Video Signal Transport

The following figure will be examined over the course of the semester.

[Figure: digital video signal transport. Blocks: Video, Encoder, Application, Network, Application, Decoder, Users]
Encoder: transformation, quantization, entropy coding, bit-rate control
Application and network functions: data structuring, multiplexing/routing, re-synchronization, overhead (FEC), error detection, loss detection, error correction, retransmission, erasure correction
Decoder: de-quantization, entropy decoding, inverse transform, loss concealment, post-processing

Quality of Service (QoS)

The set of parameters that defines the properties of media streams
Can define four QoS layers:
1. User QoS: Perception of the multimedia data at the user interface (qualitative)
2. Application QoS: Parameters such as end-to-end delay (quantitative)
3. System QoS: Requirements on the communications services derived from the application QoS
4. Network QoS: Parameters such as network load and performance
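To make the quantitative side of application QoS concrete, the sketch below derives end-to-end delay figures from send and receive timestamps. The jitter measure used here (mean absolute change between consecutive delays) is one simple choice assumed for illustration, and the timestamps are made up.

```python
def delay_stats(send_times, recv_times):
    """Return (mean delay, maximum delay, jitter) for matched packet timestamps.

    Jitter is taken as the mean absolute change between consecutive delays,
    which is an assumption of this sketch rather than a standard cited in the slides.
    """
    if len(send_times) != len(recv_times) or not send_times:
        raise ValueError("need equally sized, non-empty timestamp lists")
    delays = [r - s for s, r in zip(send_times, recv_times)]
    mean_delay = sum(delays) / len(delays)
    changes = [abs(b - a) for a, b in zip(delays, delays[1:])]
    jitter = sum(changes) / len(changes) if changes else 0.0
    return mean_delay, max(delays), jitter


# Hypothetical timestamps in milliseconds for four packets sent every 20 ms.
sent = [0, 20, 40, 60]
received = [50, 75, 95, 130]
print(delay_stats(sent, received))
```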


Applications of Multimedia

Business - Business applications for multimedia include presentations, training, marketing, advertising, product demos, databases, catalogues, instant messaging, and networked communication.

Schools - Educational software can be developed to enrich the learning process.

Slide: Courtesy, Hung Nguyen



Applications of Multimedia

Home - Most multimedia projects reach the homes via television sets or monitors with built-in user inputs.

Public places - Multimedia will become available at stand-alone terminals or kiosks to provide information and help.

Slide: Courtesy, Hung Nguyen



Compact Disc Read-Only Memory (CD-ROM)

CD-ROM is the most cost-effective distribution medium for multimedia projects.
It can contain up to 80 minutes of full-screen video or sound.
CD burners are used for writing discs, as well as for reading discs and converting them to audio, video, and data formats.

Slide: Courtesy, Hung Nguyen



Digital Versatile Disc (DVD)

Multilayered DVD technology increases the capacity of current optical technology to 18 GB.
DVD authoring and integration software is used to create interactive front-end menus for films and games.
DVD burners are used for writing discs, as well as for reading discs and converting them to audio, video, and data formats.

Slide: Courtesy, Hung Nguyen



Multimedia Communications

Multimedia communications is the delivery of multimedia to the user by electronic or digitally manipulated means.

[Figure: three kinds of communications combining into Multimedia Communications]
Audio Communications (telephony, sound, broadcast)
Data, text, image Communications (data transfer, fax)
Video Communications (video telephony, TV/HDTV)

Slide: Courtesy, Hung Nguyen



Multimedia Terms


Alternative Types of Media Used in Multimedia Applications


Multimedia Communications Networks


Multimedia Networks and Their Services


Multimedia Networks and Their Services


Audio-Visual Integration

Application in Biometrics: Bimodal Person Verification

Existing methods for person verification are mainly based on a single modality, which limits their security and robustness

Audio-visual integration using a camera and microphone makes person verification more reliable

Slide: Courtesy, Hung Nguyen



Joint Audio-Video Coding

Correlation between audio and video can be used to achieve more efficient coding
Predictive coding of audio and video information is used to construct an estimate of the current frame (cross-modal redundancy)
The difference between the original and estimated signal can be transmitted as parameters
The decision on what and how to send is based on Rate Distortion (R-D) criteria

Reconstruction is done at the receiver according to agreed-upon decoding rules
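A rough sketch of an R-D decision follows. It assumes a Lagrangian cost J = D + lambda * R, which is a common way to express a rate-distortion criterion but is not stated on the slide; the uniform quantizer, the bit counts, and all names are hypothetical.

```python
def candidate_options(x, x_hat, step, bits_diff, bits_param):
    """Build (label, rate_bits, distortion) for three ways to convey parameter x.

    x_hat is the receiver-side estimate of x. The quantizer and the bit
    budgets are illustrative assumptions.
    """
    # Send nothing: the receiver uses the estimate as is.
    nothing = ("nothing", 0, (x - x_hat) ** 2)
    # Send the quantized difference between x and its estimate.
    q_diff = step * round((x - x_hat) / step)
    difference = ("difference", bits_diff, (x - (x_hat + q_diff)) ** 2)
    # Send the quantized parameter itself.
    q_x = step * round(x / step)
    parameter = ("parameter", bits_param, (x - q_x) ** 2)
    return [nothing, difference, parameter]


def rd_choose(options, lam):
    """Return the label with the smallest cost D + lam * R."""
    return min(options, key=lambda o: o[2] + lam * o[1])[0]


# Good estimate: spending bits is not worth the small gain in distortion.
print(rd_choose(candidate_options(10.3, 10.2, 0.5, 3, 8), lam=0.1))
# Poor estimate: a few bits for the difference remove most of the error.
print(rd_choose(candidate_options(12.0, 10.0, 0.5, 3, 8), lam=0.1))
```

A larger lambda penalizes rate more heavily, so the decision shifts toward sending nothing.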

Slide: Courtesy, Hung Nguyen



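Cross-Modal Predictive Coding

[Figure: Visual Analysis produces parameter X; A-to-V Mapping produces an estimate of X; a Decision Module (R-D) chooses to send nothing, the difference between X and its estimate, or parameter X]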

Slide: Courtesy, Hung Nguyen



Importance of Interaction
Multimedia is more than the
combination of text, audio, video and
data
Interaction among media is important
Consider a poorly dubbed movie

Audio not synchronized with video


Lip movements inconsistent with
language
Audio dynamic range inconsistent with
the scene
Slide: Courtesy, Hung Nguyen

Media Interaction

[Figure: media interaction (process and model), with Multimedia at the center]
Audio: compression, synthesis, 3D sound
Image/Video: compression, graphics, database indexing/retrieval
Text: translation, natural language
Audio and video: lip synch, face animation, joint A/V coding
Audio and text: speech recognition, text-to-speech
Video and text: sign language, lip reading

Slide: Courtesy, Hung Nguyen



Bimodality of Human Speech

Human speech is produced by vibration of the vocal cords and configuration of the vocal tract, together with the muscles that generate facial expressions

Audio   Visual   Perceived
ba      ga       da
pa      ga       ta
ma      ga       na

Slide: Courtesy, Hung Nguyen



Basic Definitions

The basic unit of acoustic speech is called a phoneme
In the visual domain, the basic unit of mouth movement is called a viseme
A viseme is the smallest visibly distinguishable unit of speech
A viseme can correspond to several phonemes, which thus form one viseme group
There is a many-to-one mapping between phonemes and visemes
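The many-to-one mapping can be pictured as a lookup table. The grouping below is a simplified assumption for illustration (published viseme sets differ), and the viseme labels are borrowed from articulation classes rather than taken from the slides.

```python
# Illustrative phoneme-to-viseme table; the grouping and labels are assumptions.
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "t": "alveolar", "d": "alveolar", "n": "alveolar",
    "k": "velar", "g": "velar",
}


def to_visemes(phonemes):
    """Map a phoneme sequence to the viseme sequence a viewer would see."""
    return [PHONEME_TO_VISEME[p] for p in phonemes]


def viseme_groups(mapping):
    """Invert the table: list the phonemes that share each viseme."""
    groups = {}
    for phoneme, viseme in mapping.items():
        groups.setdefault(viseme, []).append(phoneme)
    return groups


print(to_visemes(["p", "b", "m"]))
print(viseme_groups(PHONEME_TO_VISEME))
```

Since several phonemes collapse onto one viseme, the mapping cannot be inverted uniquely, which is why visual information alone leaves some sounds ambiguous.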

Slide: Courtesy, Hung Nguyen



Lip Reading System

Application to support hearing-impaired persons
People learn to understand spoken language by combining visual content with lexical, syntactic, semantic and pragmatic information
Automated lip reading systems
Speech recognition is possible using only visual information
Can be integrated with speech recognition systems to improve accuracy

Slide: Courtesy, Hung Nguyen



Lip Synchronization

Applications
In VTC (video teleconferencing), where video frames are dropped (low bandwidth requirement) but audio must still be continuous
In non-real-time use such as dubbing in a studio, where the recorded voice is full of background noise

Time-warping is commonly used in both audio and video modes
Time-frequency analysis
Video time-warping could be used for VTC
Audio time-warping could be used for dubbing
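The slide does not say which time-warping method is meant. As one common example, the sketch below uses dynamic time warping (DTW) to align two one-dimensional feature sequences; the sequences stand in for hypothetical per-frame audio and video features.

```python
def dtw(a, b):
    """Align sequences a and b with dynamic time warping.

    Returns (total cost, path), where path lists index pairs (i, j) matching
    a[i] to b[j]. Local cost is the absolute difference of the two values.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Walk back from the end to recover the alignment path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


# Hypothetical features: the second sequence is a stretched copy of the first.
short_track = [0, 1, 2, 3]
long_track = [0, 0, 1, 2, 2, 3]
distance, alignment = dtw(short_track, long_track)
print(distance, alignment)
```

The path shows which frames of one track to repeat or skip so that it lines up with the other, which is the operation lip synchronization needs.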

Slide: Courtesy, Hung Nguyen



Lip Tracking

To prevent too much jerkiness in the motion rendering and too much loss in lip synchronization
Involves real-time analysis of the video signal in three dimensions plus one temporal dimension
Produce meaningful parameters
Classification of mouth images into visemes
Measures of dimension, e.g. mouth widths and heights
Analysis tools: Fourier Transform, Karhunen-Loeve Transform (KLT), Probability Density Function (pdf) Estimation

Slide: Courtesy, Hung Nguyen



Audio-to-Visual Mapping for Lip Tracking

Conversion of acoustic speech to mouth shape parameters
A mapping of phonemes to visemes
Could be most precisely implemented with a complete speech recognizer followed by a look-up table
High computational overhead plus table look-up complexity
Do not need to recognize the spoken word to achieve audio-to-visual mapping

Physical relationships exist between vocal tract shape and the sound produced, so functional relationships exist between speech and visual parameters

Slide: Courtesy, Hung Nguyen



Classification-Based Conversion Approaches for Lip Tracking

Two-step process
Classification of the acoustic signal using VQ (vector quantization), HMM (hidden Markov model) and NN (neural network)
Mapping of the acoustic classes into corresponding visual outputs, then averaged to get the centroid

Shortcomings
Error resulting from averaging visual vectors to get the visual centroid
Not a continuous mapping: finite output levels
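The two-step process can be sketched with the simplest of the listed classifiers, a nearest-codeword vector quantizer. The codebook, the training frames, and the two visual parameters (mouth width and height) are made-up values for illustration.

```python
def nearest(codebook, vec):
    """Step 1: index of the codeword closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((c - v) ** 2 for c, v in zip(codebook[k], vec)),
    )


def train_centroids(codebook, acoustic, visual):
    """Step 2: average the visual vectors of the training frames in each acoustic class."""
    sums, counts = {}, {}
    for a, v in zip(acoustic, visual):
        k = nearest(codebook, a)
        acc = sums.setdefault(k, [0.0] * len(v))
        for d, value in enumerate(v):
            acc[d] += value
        counts[k] = counts.get(k, 0) + 1
    return {k: [s / counts[k] for s in sums[k]] for k in sums}


def audio_to_visual(codebook, centroids, frame):
    """Convert one acoustic frame to the visual centroid of its class."""
    return centroids[nearest(codebook, frame)]


# Hypothetical data: 2-D acoustic features and (mouth width, mouth height) pairs.
codebook = [[0.0, 0.0], [10.0, 10.0]]
acoustic_frames = [[0, 1], [1, 0], [9, 10], [11, 10]]
visual_frames = [[2.0, 1.0], [4.0, 3.0], [8.0, 6.0], [10.0, 8.0]]
centroids = train_centroids(codebook, acoustic_frames, visual_frames)
print(audio_to_visual(codebook, centroids, [0.5, 0.5]))
```

Both shortcomings show up here: every frame in a class gets the same averaged mouth shape, and the output can take only as many values as there are classes.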

Slide: Courtesy, Hung Nguyen



Classification-Based Conversion

[Figure: classes in phoneme space mapped to centroids in viseme space]

Slide: Courtesy, Hung Nguyen



Audio and Visual Integration for Lip Reading Applications

Three major steps
Audio-visual pre-processing: Principal Component Analysis (PCA) has been used for feature extraction
Pattern recognition strategy (HMM, NN, time-warping)
Integration strategy (decision making)

Heuristic rules to incorporate knowledge of phonemes about the two modalities
Combination of independent evaluation scores for each modality
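One simple reading of combining independent scores is a weighted sum per candidate class. The sketch assumes each recognizer outputs a log-likelihood per class; the weighting scheme, the default weight, and the example scores are illustrative assumptions.

```python
def fuse_scores(audio_scores, visual_scores, audio_weight=0.7):
    """Combine per-class scores from two modalities and pick the best class.

    Scores are assumed to be log-likelihoods (higher is better). The weight
    given to audio is a tunable assumption, not a value from the slides.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    fused = {}
    for label, a_score in audio_scores.items():
        if label in visual_scores:
            fused[label] = audio_weight * a_score + (1.0 - audio_weight) * visual_scores[label]
    best = max(fused, key=fused.get)
    return best, fused


# Hypothetical scores: audio favors "ba", video favors "ga".
audio = {"ba": -1.0, "da": -1.2, "ga": -3.0}
visual = {"ba": -4.0, "da": -1.5, "ga": -1.0}
print(fuse_scores(audio, visual))
```

In practice the weight could be lowered when the audio is noisy, so that the visual recognizer contributes more to the decision.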

Slide: Courtesy, Hung Nguyen


