
March 2007 (vol. 8, no. 3), art. no. 0703-o3006
ISSN 1541-4922 © 2007 IEEE
Published by the IEEE Computer Society

From the Editor: Distributed Multimedia Community

Multi-View Video: Get Ready for Next-Generation Television


Ishfaq Ahmad, University of Texas at Arlington
Many believe that multi-view video (MVV) is poised to change how people watch television and that it could
become a driving force in interactive multimedia entertainment, for both desktop and mobile
environments. An MVV system acquires several video sequences of the same scene
simultaneously from more than one angle and transports these streams remotely. Scenes can be
displayed interactively, letting the user rotate the view from multiple angles as if it were 3D and enjoy
the feeling of being in the scene. Owing to the massive amount of data involved and extensive
processing requirements, real-time MVV processing presents research issues that lie at the frontier of
video coding, image processing, computer vision, and display technologies. Building a complete end-to-end MVV system also hinges on several additional technologies, such as real-time acquisition,
transmission, and display of dynamic scenes that users can view interactively on conventional screens.
Several research groups around the world are actively researching MVV.

Applications
MVV technology could lead to exciting new applications in areas such as education, medicine,
surveillance, communication, and entertainment. It could also lead to a mass-media shake-up and the
birth of a new industry, especially in the mobile domain. Furthermore, researchers will need to
examine surround sound with a fresh perspective to accompany the new viewing style. MVV can also
profoundly affect telecommunication, given that telecommunication's ultimate goal is highly effective
interpersonal information exchange.
For instance, sports coverage technology keeps evolving. In the past, only a few TV channels
aired the games that interested people. Now audio and video coverage can be delivered over the
Internet or broadcast in HDTV format. Technology has always dazzled sports fans. Instant replays,
introduced in the early 1960s, added a new dimension that in-stadium fans couldn't see, and
miniature cameras let viewers see what referees see on the field. As MVV technology matures, we can
expect a revolution in coverage of sportscar racing, soccer, football, basketball, and so on. With
multiple cameras capturing and broadcasting the scene live to viewers and letting them rotate the
viewing angle, sports viewing could become a whole new concept.
Current videoconferencing systems provide a fixed view of the remote scene, so they don't give you
the feeling of being there. Multi-view video could have a broad impact on such systems. One
important feature of future communications will be interactivity with stereoscopic and 3D vision, which
makes you feel more as if you're present in the scene. In a videoconferencing scenario, participants at
different geographical sites could meet virtually and see one another in free viewpoint video or 3DTV
style.
Surveillance and remote monitoring of important sites, such as critical infrastructures, traffic, parking
lots, and banks, could also benefit from this technology because it can provide coverage of very large
areas from multiple angles. Other potential application areas include entertainment (such as concerts,
multiuser games, and movies), education (such as digital libraries and archives, training and
instruction manuals with real video, and surgeon training), culture (such as zoos, aquariums, and
museums), and archiving (such as scientific archives, national treasures, and traditional
entertainment).


Research issues
An MVV system consists of components for data acquisition, processing, and delivery. The
acquisition component captures videos from multiple cameras and obtains the acquisition
parameters. The processing component analyzes the acquired data, extracts features from it, and compresses it
for delivery and storage. On the receiving side, decoding and display devices reconstruct the view in
either two or three dimensions, depending on the device's capabilities.

Video acquisition and representation


For MVV content generation, numerous scene-acquisition methods are possible. The scene-modeling
and real-time processing requirements, together with the available bandwidth for video transmission, determine
the number, type, and placement of cameras. For instance, for model-based
representation, good-quality 3D video can be rendered using the input from only a limited number of
cameras. Image-based correspondence techniques, however, might require a large number of input
streams but little processing. Some video-acquisition schemes require static background capture
before introducing the scene's dynamic parts.
Estimating the setup's extrinsic and intrinsic parameters might require camera calibration. You can
classify acquisition setups on the basis of camera placement geometry, camera type (stationary or
moving), distance from the objects of interest, and synchrony of video acquisition. Other parameters,
such as intrinsic parameters of different camera types, also distinguish different setups. On the basis
of the acquisition system setup, MVV scenarios fall into different categories. The camera configuration
can be parallel,1 convergent, or a combination of both.2 Convergent configurations are generally used
with model-based representations of the dynamic scenes captured.3 Other capturing systems also
exist.4–7
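
To ground the calibration terminology, here is a minimal Python/NumPy sketch (illustrative, not taken from any cited system) that projects a 3D world point into pixel coordinates through the pinhole model, where the intrinsic matrix K holds the focal lengths and principal point and the extrinsics R and t place the camera in the world:

```python
import numpy as np

def project_point(K, R, t, X_world):
    """Project a 3D world point into pixel coordinates using the
    pinhole model x ~ K [R | t] X. K holds the intrinsic parameters
    (focal lengths, principal point); R and t are the extrinsic
    parameters that place the camera in the world."""
    X_cam = R @ X_world + t   # world -> camera coordinates
    x = K @ X_cam             # camera coordinates -> homogeneous pixels
    return x[:2] / x[2]       # perspective divide

# Illustrative values: 800-pixel focal length, principal point at image center.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                       # camera aligned with world axes
t = np.zeros(3)
print(project_point(K, R, t, np.array([0.1, -0.05, 2.0])))  # -> [360. 220.]
```

Calibration estimates K, R, and t for each camera (for example, from views of a known pattern), after which points can be triangulated across views.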
In an MVV system, the video streams must be synchronized to ensure that all the cameras' shutters
open at the same instant when they're sampling the scene from different angles.3,8 Video captured
from different cameras is used together with timing information to create novel views. The input from
the cameras can be synchronized using external sources such as a light flash at
periodic intervals.4 External synchronization can slow down the frame rate considerably.
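
As a toy illustration of flash-based synchronization, the sketch below (Python/NumPy, with illustrative names) detects the flash frame in each stream as the largest jump in mean brightness and trims the streams to a common time base; real systems use hardware genlock or sub-frame refinement rather than anything this coarse.

```python
import numpy as np

def flash_frame(frames):
    """Return the index of the frame containing the flash, detected as
    the largest jump in mean brightness between consecutive frames.
    `frames` is a (num_frames, height, width) grayscale array."""
    brightness = frames.reshape(len(frames), -1).mean(axis=1)
    return int(np.argmax(np.diff(brightness))) + 1

def align_streams(streams):
    """Trim each camera's stream so that frame 0 is its flash frame,
    giving all streams a common time base and a common length."""
    offsets = [flash_frame(s) for s in streams]
    length = min(len(s) - o for s, o in zip(streams, offsets))
    return [s[o:o + length] for s, o in zip(streams, offsets)]
```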
One way of representing multi-view video is to use 2D video plus a disparity map and 3D structure.
MPEG-4 multiview coding8 proposed using video streams and a disparity map. Various rendering
methods can be used with this scheme on the client side. The blue-c project at ETH (Eidgenössische
Technische Hochschule) Zürich has used a 3D hierarchical data point representation.4 It allowed
efficient spatial coding into different data streams (tree structure, color, position, and normal
information) and temporal coding using update, insert, and delete operators.
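
To make the 2D-video-plus-disparity representation concrete, here is a minimal Python/NumPy sketch of one simple client-side rendering method, a horizontal forward warp of a rectified view; this is an illustrative assumption on my part, not the specific renderer used by MPEG-4 multiview coding or blue-c.

```python
import numpy as np

def synthesize_view(left, disparity, alpha=0.5):
    """Forward-warp a rectified view toward a virtual camera position.
    `left` is an (H, W) grayscale image, `disparity` gives per-pixel
    horizontal disparity in pixels, and alpha in [0, 1] places the
    virtual camera between the two original viewpoints. Disoccluded
    pixels are left at zero; a real renderer would inpaint them."""
    h, w = left.shape
    virtual = np.zeros_like(left)
    cols = np.arange(w)
    for y in range(h):
        x_new = np.clip((cols - alpha * disparity[y]).astype(int), 0, w - 1)
        virtual[y, x_new] = left[y, cols]
    return virtual
```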

Multi-view video processing


MVV compression involves more than just independently compressing multiple streams; it must also
preserve the inter-view information without which scene reconstruction wouldn't be possible.
Traditional 2D video-coding standards, such as MPEG and
H.26x, exploit the human eye's characteristics, including its sensitivity to color. 2D video coding also
takes advantage of motion as well as the spatial and statistical redundancies in video data. In
general, MVV is reconstructed from multiple 2D video sequences. More than one view's video sequence
must be transmitted or stored, leading to a massive amount of data.
MVV compression algorithms should reduce redundancy in information from multiple views as much as
possible to provide a high degree of compression, subject to distortion and resource constraints. The
redundancy in MVV streams consists of

- intraframe redundancy (spatial), exploited by intraframe prediction coding;
- interframe redundancy (temporal), exploited by motion-compensated prediction coding;
- inter-view redundancy (geometrical), exploited by disparity-compensated prediction coding (see the sketch after this list);
- transform redundancy (frequency), exploited by DCT (discrete cosine transform) or wavelet transform coding; and
- redundancy relative to the human visual system, exploited by scalable coding.
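
As an illustration of the inter-view case, the following sketch performs block-based disparity-compensated prediction under the assumption of rectified cameras (so the search runs along a single row); the function name and parameters are illustrative, and a real encoder would entropy-code the disparity vector and transform-code the residual.

```python
import numpy as np

def disparity_compensated_residual(view, ref_view, y, x, bsize=8, max_disp=32):
    """For the block at (y, x) in `view`, search the same row of
    `ref_view` for the best-matching block under the sum of absolute
    differences (SAD), and return the disparity vector plus the
    residual that the encoder would actually transmit."""
    block = view[y:y + bsize, x:x + bsize].astype(np.int32)
    best_d, best_sad = 0, np.inf
    for d in range(max_disp + 1):
        xr = x - d                 # candidate position in the reference view
        if xr < 0:
            break
        cand = ref_view[y:y + bsize, xr:xr + bsize].astype(np.int32)
        sad = np.abs(block - cand).sum()
        if sad < best_sad:
            best_d, best_sad = d, sad
    best = ref_view[y:y + bsize, x - best_d:x - best_d + bsize].astype(np.int32)
    return best_d, block - best    # (disparity vector, residual block)
```

Because neighboring views see nearly the same scene, the residual is usually far cheaper to code than the raw block, which is exactly the redundancy that disparity-compensated prediction exploits.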


3D video compression has the following additional requirements:


Visual quality. Decompressed data should provide good visual quality. Criteria include subjective
quality (that is, how it looks to the human visual system), objective quality, and quality consistency
among views (that is, the data should provide perceptually similar visual quality over different views
that will be presented in the same time frame).
Synthesizability for reconstructed video. Decompressed data should support robust generation of a
virtual or interpolated view. So, camera calibration information and the depth/disparity map should be
compressed along with view data.
Compatibility. The compression scheme should be compatible with current and future video standards.1
Low delay. The compression algorithms should provide low delay for real-time applications. Such
delays include encoding and decoding delays, view change delays, and end-to-end delay.
Camera motion. The scheme should support encoding of video sequences subject to camera motion.
Scalability. This includes signal-to-noise-ratio scalability, spatial scalability, temporal scalability,
complexity scalability, view scalability, and scalability on a multitude of terminals and under different
network conditions.

Networking and transportation


Delivering MVV content to end users will pose serious networking challenges, involving protocols, quality
of service, channel-delay management, and error concealment and recovery. Depending on their
environments and requirements, MVV systems can be built on different architectures (see figure 1).


Figure 1. Various multi-view video system architectures: (a) distributed-acquisition
and distributed-viewers model (DADV); (b) local-acquisition and local-viewers
model (LALV, or Saito's model);6 (c) distributed-acquisition and local-viewers
model (DALV, or Heinrich-Hertz Institute model);9 (d) local-acquisition and
distributed-viewers model (LADV, or University of Central Florida model).10

Projects
Because multi-view video is a new and widely applicable research area with a broad range of open
problems, numerous related research efforts are under way worldwide. In Europe, the Digital
Stereoscopic Imaging and Application (DISTIMA) project addressed the production, presentation,
coding, and transmission of digital stereoscopic video signals over integrated broadband
communications networks. Another European research project, the Package for New Operational
Autostereoscopic Multiview System (PANORAMA), aimed to facilitate the hardware and software
development of an MVV autostereoscopic telecommunication system. The Advanced Three-Dimensional
Television System Technology (ATTEST) project aims to design an entire 3D-video chain,
including content creation, coding, transmission, and display. Mitsubishi Electric Research
Laboratories, Carnegie Mellon University's computer vision lab, Kyoto University, the Heinrich Hertz
Institute in Germany, and the blue-c project are pursuing similar endeavors.


Outlook
A workgroup of the International Organization for Standardization's Moving Picture Experts Group (MPEG) has
been exploring 3D audiovisual technology. This workgroup, 3DAV, has discussed various applications and
technologies in relation to the term multi-view video. A multi-view profile (MVP) is available in the MPEG-2
standard, defined in 1996 as an amendment for stereoscopic TV. The MVP extends the well-known
hybrid coding toward exploitation of inter-view/channel redundancies by implicitly defining
disparity-compensated prediction; however, it doesn't support interactivity. MPEG-4 version 2 includes
the Multiple Auxiliary Component (MAC), defined in 2001. MAC's basic idea is that grayscale shape can be used
not only to describe a video object's transparency but also in a more general way.
MACs are defined for a video object plane on a pixel-by-pixel basis and contain data related to the
video object, such as disparity, depth, and additional texture. Since 2003, MPEG has also accelerated
its work on MVV coding standards. The Multiview Video Coding (MVC) initiative has passed MPEG's
call-for-proposals stage. The proposals were based on the H.264/AVC video coding standard, so MVC is
currently being developed and standardized as an extension of that standard in a joint ad hoc group on
MVC within the Joint Video Team (JVT). For more information on the MPEG standardization efforts, see
www.chiariglione.org/mpeg/working_documents.htm.
MVV-based products are expected to appear in two to three years. Watching television passively might
soon be a thing of the past.

References
1. W. Matusik and H. Pfister, "3D TV: A Scalable System for Real-Time Acquisition, Transmission, and Autostereoscopic Display of Dynamic Scenes," ACM Trans. Graphics, vol. 23, no. 3, 2004, pp. 814–824.
2. T. Kanade, H. Saito, and S. Vedula, "The 3D Room: Digitizing Time-Varying 3D Events by Synchronized Multiple Video Streams," tech. report CMU-RI-TR-98-34, Carnegie Mellon Univ., 1998.
3. T. Kanade, P.W. Rander, and P.J. Narayan, "Virtual Reality: Constructing Virtual Worlds from Real Scenes," IEEE Multimedia, vol. 4, no. 1, 1997, pp. 34–47.
4. M. Gross et al., "blue-c: A Spatially Immersive Display and 3D Video Portal for Telepresence," Proc. ACM Int'l Conf. Computer Graphics and Interactive Techniques (SIGGRAPH 03), ACM Press, 2003, pp. 819–827.
5. S. Moezzi et al., "Immersive Video," Proc. Virtual Reality Ann. Int'l Symp. (VRAIS 96), IEEE CS Press, 1996, pp. 17–24.
6. H. Saito, S. Baba, and T. Kanade, "Appearance-Based Virtual View Generation from Multicamera Videos Captured in the 3-D Room," IEEE Trans. Multimedia, vol. 5, no. 3, 2003, pp. 303–316.
7. A. Smolic and D. McCutchen, "3DAV Exploration of Video-Based Rendering Technology in MPEG," IEEE Trans. Circuits and Systems for Video Technology, vol. 14, no. 3, 2004, pp. 348–356.
8. H. Cheng, "Temporal Registration of Video Sequences," Proc. 2003 IEEE Int'l Conf. Acoustics, Speech, and Signal Processing (ICASSP 03), vol. 3, IEEE Press, 2003, pp. III-489–III-492.
9. K. Mueller et al., "Coding of 3D Meshes and Video Textures for 3D Video Objects," Proc. Picture Coding Symp. (PCS 04), 2004.
10. O. Javed and M. Shah, "Tracking and Object Classification for Automated Surveillance," Proc. 7th European Conf. Computer Vision (ECCV 02), Springer, 2002, pp. 343–357.

Ishfaq Ahmad is a professor of computer science and engineering at the University of Texas at
Arlington. Contact him at iahmad@cse.uta.edu.


Related Links

- DS Online's Distributed Multimedia Community
- "Automated Visual Surveillance in Realistic Scenarios," IEEE Multimedia
- "TAVERNS: Visualization and Manipulation of GIS Data in 3D Large Screen Immersive Environments," Proc. ICAT 06

Cite this article:


Ishfaq Ahmad, "Multiview Video: Get Ready for Next-Generation Television," IEEE Distributed Systems Online, vol. 8, no.
3, 2007, art. no. 0703-o3006.

