
VIDEO

INTRODUCTION

The recording and editing of sound has long been in the domain of the PC. Doing the same with motion video gained acceptance only recently, because of the enormous file sizes involved. For example, one second of 24-bit, 640 X 480 video together with its associated audio requires about 30 MB of space, so a 20 minute clip fills roughly 36 GB of disk space and must be processed at about 30 MB/s.
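
As a rough check of these numbers, the arithmetic can be sketched in a few lines of Python (the 30 MB/s and 36 GB figures quoted above include the audio track, so the video-only numbers below come out slightly lower):

# Raw (uncompressed) data rate of 24-bit, 640 x 480 video at 30 frames per second
width, height, bytes_per_pixel, fps = 640, 480, 3, 30
bytes_per_frame = width * height * bytes_per_pixel          # 921,600 bytes
video_rate = bytes_per_frame * fps                          # ~27.6 million bytes/s
clip_bytes = video_rate * 20 * 60                           # a 20 minute clip
print(round(video_rate / 1e6, 1), "MB/s,", round(clip_bytes / 1e9, 1), "GB")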

The only solution was to compress the data, but compression hardware was very expensive in the early days of video editing. As a result video was played in very small windows of 160 X 120 pixels, occupying only 1/16th of the total screen. Only after the advent of the Pentium-II processor, coupled with falling prices of video compression hardware, did full-screen digital video finally become a reality.

Moving Pictures

In motion video the illusion of moving images is created by displaying a sequence of still images rapidly one after another. If they are displayed fast enough, the eye cannot distinguish the individual frames; because of persistence of vision it merges them into one another, creating an effect of continuous movement.
Each individual image is called a frame, and the speed at which the images are displayed one after another is called the frame rate. The frame rate should range between 20 and 30 frames per second for motion to appear smooth and realistic. Audio is added and synchronized with the apparent movement of the images.
A motion picture is recorded on film, whereas in motion video the output is an electrical signal. Film plays back at 24 fps, while video ranges from 25 to 30 fps. Visual and audio data, when digitized and combined into a file, give rise to digital video.

Video represents a sequence of real-world images taken by a movie camera, so it depicts an event that physically took place in reality. Animation works on the same principle of displaying a sequence of images at a specific speed to create the illusion of motion, but here the images are drawn by artists, by hand or with software, so they do not depict a sequence of events taking place in the physical world.
ANALOG VIDEO

In analog video systems video is stored and processed in the form of analog electrical signals; the most familiar example is television broadcasting. In digital video, by contrast, the video is represented by a string of bits. All forms of video handled inside a PC are digital video.

Video Camera

Analog video cameras are used to record a succession of still images and then convert the
brightness and color information of the images into electrical signals. These signals are
transmitted from one place to another using cables or by wireless means and in the
television set at the receiving end these signals are again converted to form the images.
The tube type analog video camera is generally used in professional studios and uses
electron beams to scan in a raster pattern, while the CCD video camera, using a light-
sensitive electronic device called the CCD, is used for home/office purposes where
portability is important.

Tube type Camera

The visual image in front of the video camera is picked up through an optical lens. This lens focuses the scene on the photosensitive surface of a tube, in the same way that the lens of a photographic camera focuses the image on the film surface.
The photo-sensitive surface, called Target, is a form of semi-conductor. It is almost an
insulator in the absence of light. With absorption of energy caused by light striking the
target, electrons acquire sufficient energy to take part in current flow. The electrons
migrate towards a positive potential applied to the lens side of the target. This positive
potential is applied to a thin layer of conductive but transparent material. The vacant
energy states left by the liberated electrons, called holes, migrate towards the inner
surface of the target. Thus a charge pattern appears on the inner surface of the target that
is most positive where the brightness or luminosity of the scene is the greatest.
The charge pattern is sampled point-by-point by a moving beam of electrons originating in an electron gun in the tube. The beam scans the charge pattern in the same way a raster is produced in a monitor, but approaches the target at a very low velocity. The beam deposits just enough carriers to neutralize the charge pattern formed by the holes; excess electrons are turned back towards the source. The electrons needed to neutralize the charge pattern constitute a flow of current in a series circuit, and it is this current, flowing across a load resistance, that forms the output signal voltage of the tube.

CCD Camera


Light passing through the lens of the camera is focussed on a chip called CCD. The surface of the CCD is
covered with an array of transistors that create electrical current in proportion to the intensity of the light
striking them. The transistors make up the pixels of the image. The transistors generate a continuous
analog electrical signal that goes to an ADC which translates the signal to a digital stream of data. The
ADC sends the digital information to a digital signal processor (DSP) that has been programmed
specifically to manipulate photographic images. The DSP adjusts the contrast and brightness of the image,
and compresses the data before sending it to the camera’s storage medium. The image is temporarily
stored on a hard drive, RAM, floppy or tape built into the camera’s body before being transferred to the
PC’s permanent storage.

Television Systems

Color Signals

Video cameras produce three output signals, which would require three parallel cables for transmission. Because of the complexities involved in transmitting three signals in exact synchronism, TV systems do not usually handle RGB signals directly. Instead, the signals are encoded in a composite format following the Luma-Chroma principle, which is based on human color perception, and distributed over a single cable or channel.
Human Color Perception

All objects that we observe are focused sharply by the lens system of the eye onto the retina. The retina, located at the back of the eye, contains light-sensitive cells which register the visual sensations. It is connected to the optic nerve, which conducts the light stimuli sensed by these cells to the optical centre of the brain.
According to the theory formulated by Helmholtz, the light-sensitive cells are of two types – rods and cones. The rods provide the brightness sensation and thus perceive objects in various shades of grey from black to white. The cones, which are sensitive to color, are broadly divided into three groups: one set of cones detects the presence of blue, the second perceives red, and the third is sensitive to green. The combined relative luminosity curve, showing the relative sensation of brightness produced by individual spectral colors, indicates that the sensitivity of the human eye is greatest in the green-yellow range and decreases towards both the red and blue ends of the spectrum. Any color other than red, green and blue excites different sets of cones to generate a cumulative sensation of that color. White is perceived by the additive mixing of the sensations from all three sets of cones.
Based on the spectral response curve and extensive tests with a large number of observers, the relative intensities of the primary colors for color transmission, e.g. for color television, have been standardized. The reference white for color television transmission has been chosen to be a mixture of 30% red, 59% green and 11% blue. These percentages are based on the sensitivity of the eye to the different colors. Thus one lumen (lm) of white light
= 0.3 lm of red + 0.59 lm of green + 0.11 lm of blue
= 0.89 lm of yellow + 0.11 lm of blue
= 0.7 lm of cyan + 0.3 lm of red
= 0.41 lm of magenta + 0.59 lm of green.

Luma-Chroma Principle

The principle states that any video signal can be broken into two components :
The luma component, which describes the variation of brightness in different portions of the image
without regard to any color information. It is denoted by Y and can be expressed as a linear combination
of RGB :

Y = 0.3R + 0.59G + 0.11B

The chroma component, which describes the variation of color information in different parts of the image without regard to any brightness information. It is denoted by C and can be further subdivided into two components, U and V.

Thus the RGB output signals from a video camera are transformed into YC format by electronic circuitry before being transmitted. At the receiving end, a B/W TV discards the C component and uses only the Y component to display a B/W image. A color TV converts the YC components back into RGB signals, which are used to drive the electron guns of the CRT.

Color Television Camera

The figure below shows a block diagram of a color TV camera. It essentially consists of
three camera tubes in which each tube receives selectively filtered primary colors. Each
camera tube develops a signal voltage proportional to the respective color intensity
received by it. Light from the scene is processed by the objective lens system. The image formed by the lens is split into three images by glass prisms. These prisms are designed as dichroic mirrors; a dichroic mirror passes one band of wavelengths and rejects the others. Thus red, green and blue images are formed. These pass through color filters which provide highly precise primary color images, which are converted into video signals by the camera tubes. This generates the three color signals R, G and B.
To generate the monochrome or brightness signal that represents the luminance of the scene, the three camera outputs are added through a resistance matrix in the proportions of 0.3, 0.59 and 0.11 for R, G and B respectively:

Y = 0.3R + 0.59G + 0.11B

The Y signal is transmitted as in a monochrome television system. However, instead of transmitting all three color signals separately, the red and blue camera outputs are combined with the Y signal to obtain what are known as the color difference signals. Color difference voltages are derived by subtracting the luminance voltage from the color voltages. Only (R-Y) and (B-Y) are produced; it is only necessary to transmit two of the three color difference signals, since the third can be derived from the other two.

The color difference signals equal zero when white or grey shades are being transmitted.
This is illustrated by the calculation below.

For any grey shade (including white) let R = G = B = v volts.
Then Y = 0.3v + 0.59v + 0.11v = v
Thus, (R-Y) = v – v = 0 volt, and (B-Y) = v – v = 0 volt.

When televising color scenes, even though the voltages R, G and B are not equal, the Y signal still represents the monochrome equivalent of the color. This can be illustrated by the example below. For simplicity of calculation, let us assume that the camera output corresponding to the maximum (100%) intensity of white light is an arbitrary value of 1 volt.

Consider a color of unsaturated magenta; it is required to find the voltage components of the luminance and color difference signals.
Since the hue is magenta, it is a mixture of red and blue. The word unsaturated indicates that some white light is also present. The white content develops all three voltages, R, G and B, whose magnitudes depend on the extent of unsaturation. Thus the R and B voltages must dominate and both must be of greater amplitude than G. Let R = 0.7 volt, G = 0.2 volt, B = 0.6 volt represent the unsaturated magenta color. The white content is represented by equal quantities of the three primaries, the actual amount being indicated by the smallest voltage, i.e. G = 0.2 volt. The remainder, R = (0.7-0.2) = 0.5 volt and B = (0.6-0.2) = 0.4 volt, is responsible for the magenta hue.
The luminance signal Y = 0.3R+0.59G+0.11B = 0.3(0.7)+0.59(0.2)+0.11(0.6) = 0.394
volt.
The color difference signals are : (R-Y) = 0.7-0.394 = 0.306 volt
(B-Y) = 0.6-0.394 = 0.206 volt

The other component (G-Y) can be derived as shown below:

Y = 0.3R + 0.59G + 0.11B
Since 0.3 + 0.59 + 0.11 = 1, this can be written as 0.3Y + 0.59Y + 0.11Y = 0.3R + 0.59G + 0.11B
Rearranging the terms, 0.59(G-Y) = -0.3(R-Y) – 0.11(B-Y)
i.e. (G-Y) = -0.51(R-Y) – 0.186(B-Y)

Since the value of the luminance is Y=0.394 volt and peak white corresponds to 1 volt,
the magenta will show up as a fairly dull grey in a monochrome television set.
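
The worked example above can be checked with a short Python sketch (the coefficients are the ones quoted in this section; the small mismatch in G-Y compared with a direct calculation is only due to rounding the 0.51 and 0.186 factors):

def luma(r, g, b):
    # Y = 0.3R + 0.59G + 0.11B
    return 0.3 * r + 0.59 * g + 0.11 * b

r, g, b = 0.7, 0.2, 0.6                  # unsaturated magenta from the example
y = luma(r, g, b)                        # 0.394 volt
r_minus_y, b_minus_y = r - y, b - y      # 0.306 volt and 0.206 volt
g_minus_y = -0.51 * r_minus_y - 0.186 * b_minus_y    # ~ -0.194 volt, recovered at the receiver
print(round(y, 3), round(r_minus_y, 3), round(b_minus_y, 3), round(g_minus_y, 3))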

Chroma Sub-sampling

Conversion of RGB signals into YC format has another important advantage: it allows bandwidth to be saved through chroma subsampling. It has been observed through experimentation that the human eye is more sensitive to brightness information than to color information. This limitation can be exploited to transmit reduced color information as compared to brightness information, a process called chroma subsampling, and thereby save on bandwidth requirements.
On account of this, we get notations like "4:2:2" and "4:1:1" to describe how the subsampling is done. Roughly, the numbers refer to the ratios of the luma sampling frequency to the sampling frequencies of the two chroma channels (typically Cb and Cr in digital video); "roughly" because this interpretation does not quite work for schemes like "4:2:0".

4:4:4 --> No chroma subsampling; each pixel has its own Y, Cr and Cb values.
4:2:2 --> Chroma is sampled at half the horizontal frequency of luma, but at the full vertical frequency. The chroma samples are horizontally aligned with luma samples.
4:1:1 --> Chroma is sampled at one-fourth the horizontal frequency of luma, but at full vertical frequency. The chroma samples are horizontally aligned with luma samples.
4:2:0 --> Chroma is sampled at half the horizontal frequency of luma and also at half the vertical frequency. Theoretically, each chroma sample is positioned between the rows and columns of luma samples (a minimal sketch of this scheme is given below).
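
The sketch below illustrates 4:2:0 subsampling in Python, assuming a chroma plane is given as a NumPy array; each 2 X 2 block of chroma samples is replaced by its average, halving both the horizontal and vertical chroma resolution, while the luma plane is left untouched:

import numpy as np

def subsample_420(chroma):
    # Average each 2x2 block of chroma samples (Cb or Cr); luma is not touched.
    h, w = chroma.shape
    chroma = chroma[:h - h % 2, :w - w % 2]     # trim to even dimensions
    h, w = chroma.shape
    return chroma.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

cb = np.random.randint(0, 256, (480, 640))
print(subsample_420(cb).shape)                  # (240, 320): a quarter of the samples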

Bandwidth and Frequencies


Each TV channel is allocated 6 MHz of bandwidth. Of this, the 0 to 4 MHz portion of the signal is devoted to the Y component, the next 1.5 MHz is taken up by the C component, and the last 0.5 MHz by the audio signal.
Video Signal Formats

Component Video

Our color television system starts out with three channels of information: Red, Green and Blue (RGB). In the process of translating these channels into a single composite video signal they are often first converted to Y, R-Y and B-Y. Both three-channel systems, RGB and Y, R-Y, B-Y, are component video signals; they are the components that eventually make up the composite video signal. Much higher program production quality is possible if the elements are assembled in the component domain.

Composite Video

A video signal format where both the luminance and chroma components are transmitted
along a single wire or channel. Usually used in normal video equipment like VCRs as
well as TV transmissions. NTSC, PAL, and SECAM are all examples of composite video
systems.
S-Video
Short for Super-Video. A video signal format where the luminance and color components are transmitted separately over multiple wires or channels. Picture quality is better than that of composite video, but the equipment is more expensive. Usually used in high-end VCRs and capture cards.

Television Broadcasting Standards


NTSC
National Television Systems Committee. Broadcast standard used in USA and Japan. Uses 525 horizontal
lines at 30 (29.97) frames / sec. Uses composite video format where luma is denoted by Y and chroma
components by I and Q. While Y utilizes 4 MHz bandwidth of a television channel, I uses 1.5 MHz and Q
only 0.5 MHz. I and Q can be expressed as combinations of RGB as shown below :
I = 0.74(R-Y) – 0.27(B-Y)
Q = 0.48(R-Y) + 0.41(B-Y)

PAL

Phase Alternating Line. Broadcast standard used in Europe, Australia, South Africa and India. Uses 625 horizontal lines at 25 frames / sec. Uses a composite video format where luma is denoted by Y and the chroma components by U and V. While Y utilizes 4 MHz of bandwidth of a television channel, U and V use 1.3 MHz each. U and V can be expressed as linear combinations of RGB as shown below :
U = 0.493(B-Y)
V = 0.877(R-Y)

SECAM

Sequential Color with Memory. Used in France and Russia. The fundamental difference between the SECAM system on the one hand and the NTSC and PAL systems on the other is that the latter transmit and receive two color signals simultaneously, while in the SECAM system only one of the two color difference signals is transmitted at a time. It also uses 625 horizontal lines at 25 frames/sec. Here the color difference signals are denoted by DR and DB, and each occupies 1.5 MHz. They are given by the relations:
DR = -1.9(R-Y)
DB = 1.5(B-Y)
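
Using the formulas quoted above, the chroma components of the three systems can be computed from RGB with a short illustrative sketch (RGB values are assumed to be normalized to the range 0 to 1):

def luma(r, g, b):
    return 0.3 * r + 0.59 * g + 0.11 * b

def ntsc_iq(r, g, b):
    y = luma(r, g, b)
    return 0.74 * (r - y) - 0.27 * (b - y), 0.48 * (r - y) + 0.41 * (b - y)

def pal_uv(r, g, b):
    y = luma(r, g, b)
    return 0.493 * (b - y), 0.877 * (r - y)

def secam_dr_db(r, g, b):
    y = luma(r, g, b)
    return -1.9 * (r - y), 1.5 * (b - y)

print(ntsc_iq(0.7, 0.2, 0.6), pal_uv(0.7, 0.2, 0.6), secam_dr_db(0.7, 0.2, 0.6))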

Other Television Systems

Enhanced Definition Television Systems (EDTV)

These are conventional systems modified to offer improved vertical and horizontal resolutions. One of the
systems emerging in US and Europe is known as the Improved Definition Television (IDTV). IDTV is
an attempt to improve NTSC image by using digital memory to double the scanning lines from 525 to
1050. The pictures are only slightly more detailed than NTSC images because the signal does not contain
any new information. By separating the chrominance and luminance parts of the video signal, IDTV
prevents cross-interference between the two.

High Definition Television (HDTV)

The next generation of television is known as the High Definition TV (HDTV). The HDTV image has
approximately twice as many horizontal and vertical pixels as conventional systems. The increased
luminance detail in the image is achieved by employing a video bandwidth approximately five times that
used in conventional systems. Additional bandwidth is used to transmit the color values separately. The
aspect ratio of the HDTV screen will be 16 : 9. Digital coding is essential in the design and implementation of HDTV, and two types are possible: composite coding and component coding. Composite coding of the whole video signal is in principle easier than digitizing the separate signal components (luma and chroma), but it suffers from serious problems, such as disturbing cross-talk between the luma and chroma information and a higher bandwidth requirement, since chroma subsampling would not be possible. Hence component coding is preferable. The luminance signal is sampled at 13.5 MHz, as it is the more critical component; the chrominance signals (R-Y, B-Y) are sampled at 6.75 MHz (4:2:2). The digitized luminance and chrominance signals are then quantized with 8 bits each. For the US system, a total of 720000 pixels are assumed per frame. If the quantization is 24 bits/pixel and the frame rate is approximately 60 frames/second, the data rate for HDTV works out to 1036.8 Mbits/second. Using a compression method, a reduction of the data rate to 24 Mbits/second is possible without noticeable quality loss. For European HDTV the data rate is approximately 1152 Mbits/second.
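
The 1036.8 Mbits/second figure follows directly from the numbers quoted above, as this small calculation shows:

pixels_per_frame = 720_000          # assumed for the US system
bits_per_pixel = 24
frames_per_second = 60
raw_rate = pixels_per_frame * bits_per_pixel * frames_per_second
print(raw_rate / 1e6, "Mbits/second")   # 1036.8
print(round(raw_rate / 1e6 / 24, 1))    # ~43:1 compression needed to reach 24 Mbits/second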

DIGITAL VIDEO

Video Capture

Source and Capture Devices

There are two main components: the source (with its source device) and the capture device. During capture the visual component and the audio component are captured separately and automatically synchronized. Source devices must use PAL or NTSC playback and must have composite-video or S-video output ports.

The source and source device can be the following :


• Camcorder with pre-recorded video tape
• VCP with pre-recorded video cassette
• Video camera with live footage
• Video CD with Video CD player

Video Capture Card


A full motion video capture card is a circuit board in the computer that consists of the
following components :
• Video INPUT port to accept the video input signals from NTSC/PAL/SECAM
broadcast signals, video camera or VCR. The input port may conform to the
composite-video or S-video standards.
• Video compression-decompression hardware for video data.
• Audio compression-decompression hardware for audio data.
• A/D converter to convert the analog input video signals to digital form.
• Video OUTPUT port to feed output video signals to camera and VCR.
• D/A converter to convert the digital video data to analog signals for feeding to
output analog devices.
• Audio INPUT/OUTPUT ports for audio input and output functions.
Rendering support for the various television signal formats e.g. NTSC, PAL, SECAM
imposes a level of complexity in the design of video capture boards.

Video Capture Software

The following capabilities might be provided by a video capture software, often bundled with a capture
card :

AVI Capture : This allows capture and digitization of the input analog video signals from external devices and their conversion to an AVI file on the disk of the computer. No compression is applied to the video data, so the files grow large quickly and this is suitable only for short clips. Playback of the video is done through the Windows Media Player. Before capturing, parameters like frame rate, brightness, contrast, hue and saturation, as well as the audio sampling rate and audio bit size, may be specified.

AVI to MPEG Converter : This utility allows the user to convert a captured AVI file to MPEG format. Here
the MPEG compression algorithm is applied to an AVI file and a separate MPG file is created on the disk.
Before compression parameters like quality, amount of compression, frame dimensions, frame rate etc.
may be specified by the user. Playback of the MPEG file is done through the Windows Media Player.

MPEG Capture : Certain cards allow the user to capture video directly in the MPEG format. Here analog video data is captured, digitized and compressed at the same time before being written to the disk. This is suitable for capturing large volumes of video data. Parameters like brightness, contrast and saturation may be specified by the user before starting the capture.

DAT to MPEG Converter : This utility converts the DAT format of a Video-CD into MPEG. Conversion to
MPEG is usually done for editing purposes. DAT and MPG are similar formats so that the file size changes
by very small amounts after conversion. The user has to specify the source DAT file and the location of the
target MPG file.

MPEG Editor : Some capture software provide the facility of editing an MPEG file. The MPG movie file is
opened in a timeline structure and functions are provided for splitting the file into small parts by
specifying the start and end of each portion. Multiple portions may also be joined together. Sometimes
functions for adding effects like transitions or sub-titling may also be present. The audio track may also be
separately edited or manipulated.

Video Compression
Types of Compression

Video compression is a process whereby the size of digital video on the disk is reduced using certain mathematical algorithms. Compression is required only for storing the data; for playback of the video, the compressed data needs to be decompressed again. Software used for the compression/decompression process is called a CODEC. During compression the algorithm analyses the source video and tries to find redundant and irrelevant portions. The greater the amount of such portions in the source data, the better the scope for compressing it.

Video compression process may be categorised using different criteria.


Lossless compression occurs when the original video data is not permanently changed in any way during the compression process, which means the original data can be recovered exactly after decompression. Though this preserves the video quality, the amount of compression achieved is usually limited. It is typically used where quality is more important than storage space, e.g. medical image processing.
Lossy compression occurs when a part of the original data is discarded during compression in order to reduce the file size. This data is lost forever and cannot be recovered after decompression, so quality is degraded. The amount of compression, and hence the degradation in quality, is usually selectable by the user – the greater the compression, the greater the degradation, and vice versa. Lossy compression is typically used where storage space is more important than quality, e.g. corporate presentations.

Since video is essentially a sequence of still images, compression can also be differentiated by the kind of redundancy it exploits. Intraframe compression exploits redundancies within each frame or still image (spatial redundancy); this process is the same as image compression. A video CODEC can also exploit the redundancies between adjacent frames in a video sequence (temporal redundancy); this is called interframe compression.

Compression can also be categorized by the time taken to compress and decompress. Symmetrical compression algorithms take almost the same time for compression as for decompression; these are typically used in live video transmission. Asymmetrical compression algorithms take much longer to compress than to decompress; these are typically used for applications like CD-ROM presentations.

Since video is essentially a sequence of still images, the initial stage of video compression is the same as that of image compression. This is the intraframe compression stage and can be either lossless or lossy. The second stage, after each frame has been individually compressed, is the interframe compression stage, where redundancies between adjacent frames are exploited to achieve further compression.

Lossy Coding Techniques

Lossy coding techniques are also known as Source Coding. The popular methods are discussed below :

Discrete Cosine Transform (DCT)

Which portion of the data is treated as irrelevant (and can be lost) and which must be preserved depends on the algorithm. One method of separating relevant from irrelevant information is Transform Coding, which transforms the data into a different mathematical representation better suited for this separation. One of the best known transform codings is the Discrete Cosine Transform (DCT). For every transform coding an inverse function must exist so that the decoder can reconstruct the relevant information.
The image is subdivided into blocks of 8 X 8 pixels, and each block is represented as a combination of DCT basis functions. 64 appropriately chosen coefficients represent the horizontal and vertical frequency content of the varying pixel intensities.
The human eye is highly sensitive to low spatial frequencies, but its sensitivity decreases at high frequencies. A reduction in the number of high-frequency DCT components therefore only weakly affects image quality. After the DCT transform, a process called Quantization is used to extract the relevant information by driving the high-frequency components to zero.
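
A minimal sketch of the 8 X 8 DCT and a crude quantization step is shown below in Python; the orthonormal DCT matrix is built directly from the cosine definition rather than taken from a signal-processing library, and the choice to keep only the lowest-frequency coefficients is an illustrative simplification, not the exact rule used by any particular CODEC:

import numpy as np

def dct2(block):
    # Orthonormal 2-D DCT-II of an N x N block: C @ block @ C.T
    n = block.shape[0]
    k = np.arange(n).reshape(-1, 1)
    m = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)              # first basis row has a smaller scale factor
    return c @ block @ c.T

block = np.random.randint(0, 256, (8, 8)).astype(float)
coeff = dct2(block)
keep = np.add.outer(np.arange(8), np.arange(8)) < 4     # low-frequency corner only
quantized = coeff * keep                                # high-frequency components set to zero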

Video Compression Techniques


After the image compression techniques, the video CODEC uses interframe algorithms to
exploit temporal redundancy, as discussed below :

Motion Compensation

By motion-compensated prediction, temporal redundancies between two frames in a video sequence can be exploited. Such redundancies can arise, for example, from the movement of objects in front of a stationary background. The basic concept is to look for an area (block) in a previous or subsequent frame that very closely matches an area of the same size in the current frame. If the search is successful, the differences in the block intensity values are calculated. In addition, the motion vector, which represents the translation of the corresponding block in the x- and y-directions, is determined. Together the difference signal and the motion vector represent the deviation between the reference block and the predicted block.
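
A simplified exhaustive block-matching search, which is one common way of finding the motion vector described above, might look like this (the 8 X 8 block size and the +/-4 pixel search range are arbitrary choices for illustration):

import numpy as np

def match_block(current, reference, top, left, block=8, search=4):
    # Find the (dy, dx) that minimises the sum of absolute differences (SAD)
    # between a block of the current frame and candidate blocks in the reference frame.
    target = current[top:top + block, left:left + block].astype(int)
    best, best_sad = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + block > reference.shape[0] or x + block > reference.shape[1]:
                continue
            sad = np.abs(target - reference[y:y + block, x:x + block].astype(int)).sum()
            if best_sad is None or sad < best_sad:
                best_sad, best = sad, (dy, dx)
    dy, dx = best
    residual = target - reference[top + dy:top + dy + block, left + dx:left + dx + block].astype(int)
    return best, residual       # motion vector and difference signal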

Some Popular CODECs

JPEG

Stands for Joint Photographic Experts Group, a joint effort by ITU and ISO. It achieves compression by first applying the DCT, then quantization, and finally entropy coding of the resulting DCT coefficients. Corresponding to the 64 DCT coefficients, a 64-element quantization table is used; each DCT coefficient is divided by the corresponding quantization table entry and the result rounded off. For entropy coding the Huffman method is used.
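
The quantization step described above can be sketched as follows; the table used here is a made-up, illustrative one (real JPEG quantization tables have specific standardized default values), but it shows the principle of coarser quantization at higher frequencies:

import numpy as np

# Hypothetical 8x8 quantization table: larger entries toward the high-frequency corner.
qtable = 16 + 4 * np.add.outer(np.arange(8), np.arange(8))

def quantize(dct_coeffs):
    # Divide each DCT coefficient by its table entry and round to the nearest integer.
    return np.round(dct_coeffs / qtable).astype(int)

def dequantize(q):
    # Decoder side: multiply back; the rounding error is where the loss occurs.
    return q * qtable
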
MPEG-1

Stands for Moving Pictures Expert Group. MPEG-1 belongs to a family of ISO standards. Provides motion
compensation and utilizes both intraframe and interframe compression. Uses 3 different types of frames :
I-frames, P-frames and B-frames.
I-frames (intracoded) : These are coded without any reference to other images. MPEG makes use of
JPEG for I frames. They can be used as a reference for other frames.
P-frames (predictive) : These require information from the previous I and/or P frame for encoding and
decoding. By exploiting temporal redundancies, the achievable compression ratio is higher than that of the
I frames. P frames can be accessed only after the referenced I or P frame has been decoded.
B-frames (bidirectional predictive) : Requires information from the previous and following I and/or P
frame for encoding and decoding. The highest compression ratio is attainable by using these frames. B
frames are never used as reference for other frames.
Reference frames must be transmitted first. Thus transmission order and display order may differ. The
first I frame must be transmitted first followed by the next P frame and then by the B frames. Thereafter
the second I frame must be transmitted. An important data structure is the Group of Pictures (GOP) . A
GOP contains a fixed number of consecutive frames and guarantees that the first picture is an I-frame. A
GOP gives an MPEG encoder information as to which picture should be encoded as an I, P or B frame and
which frames should serve as references.

The first frame in a GOP is always an I-frame which is encoded like an intraframe image i.e. with DCT,
quantization and entropy coding. The motion estimation step is activated when B or P frames appear in
the GOP. Entropy coding is done by using Huffman coding technique.
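
The reordering of frames from display order into transmission order, as described above, can be sketched like this (the GOP pattern is an assumed example; each run of B-frames is sent only after the reference frame that follows it in display order):

def transmission_order(display_order):
    out, pending_b = [], []
    for frame in display_order:
        if frame.startswith("B"):
            pending_b.append(frame)     # hold B-frames until their next reference frame is sent
        else:
            out.append(frame)           # I- or P-frame (a reference frame)
            out.extend(pending_b)
            pending_b = []
    return out + pending_b

gop = ["I0", "B1", "B2", "P3", "B4", "B5", "P6", "B7", "B8", "I9"]
print(transmission_order(gop))
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5', 'I9', 'B7', 'B8']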

Cinepak

Cinepak was originally developed to play small movies on '386 systems, from a single-speed CD-ROM
drive. Its greatest strength is its extremely low CPU requirements. Cinepak's quality/datarate was
amazing when it was first released, but does not compare well with newer CODECs available today. There
are higher-quality (and lower-datarate) solutions for almost any application. However, if you need your
movies to play back on the widest range of machines, you may not be able to use many of the newer
codecs, and Cinepak is still a solid choice.
After sitting idle for many years, Cinepak is finally being dusted off for an upgrade. Cinepak Pro from CTI
(www.cinepak.com) is now in pre-release, offering an incremental improvement in quality, as well as a
number of bug fixes. Supported by QuickTime and Video for Windows.

Sorenson
One of the major advances of QuickTime 3 is the new Sorenson Video CODEC which is included as a
standard component of the installation. It produces the highest quality low-data rate QuickTime movies.
The Sorenson Video CODEC produces excellent Web video suitable for playback on any Pentium or
PowerMac. It also delivers outstanding quality CD-ROM video at a fraction of traditional data rates, which
plays well on 100MHz systems. Compared with Cinepak, Sorenson Video generally achieves higher image
quality at a fraction of the data rate. This allows for higher quality, and either faster viewing (on the
WWW), or more movies on a CD-ROM (often four times as much material on a disc as Cinepak). It
supports variable bitrate encoding [When movies are compressed, each frame of the video must be
encoded to a certain number of bytes. There are several techniques for allocating the bytes for each
frame. Fixed bitrate is used by certain codecs (like Cinepak), which attempt to allocate approximately the
same number of bytes per frame. Variable bitrate (VBR) is supported by other codecs (such as MPEG-2
and Sorenson), and attempts to give each frame the optimum number of bytes, while still meeting set
constraints (such as the overall data rate of the movie, and the maximum peak data rate).] Supported by QuickTime. Manufacturer is Sorenson Vision Inc (www.sorensonvideo.com).

RealVideo

RealMedia currently has only two video CODECs: RealVideo (Standard) and RealVideo (Fractal).
RealVideo (Standard) is usually best for data rates below 3 Kbps. It works better with relatively static
material than it does with higher action content. It usually encodes faster. RealVideo (Standard) is
significantly more CPU intensive than the RealVideo (Fractal) CODEC. It usually requires a very fast
PowerMac or Pentium for optimal playback. It is supported by the RealMedia player. Manufacturer is
Progressive Networks (www.real.com).

H.261
H.261 is a standard video-conferencing CODEC. As such, it is optimized for low data rates and relatively
low motion. Not generally as good quality as H.263. H.261 is CPU intensive, so data rates higher than 50
Kbps may slow down most machines. It may not play well on lower-end machines. H.261 has a strong
temporal compression component, and works best on movies in which there is little change between
frames. Supported by Netshow, Video for Windows.

H.263
H.263 is an advancement of the H.261 standard, which was mainly used as a starting point for the development of MPEG (which is optimized for higher data rates). Supported by QuickTime, Netshow, Video for Windows.

Indeo Video Interactive (IVI)

Indeo Video Interactive (IVI) is a very high-quality, wavelet-based CODEC. It provides excellent image
quality, but requires a high-end Pentium for playback. There are currently two main versions of IVI.
Version 4 is included in QuickTime 3 for Windows; Version 5 is for DirectShow only. Neither version
currently runs on the Macintosh, so any files encoded with IVI will not work cross-platform. Version 5 is
very similar to 4, but uses an improved wavelet algorithm for better compression. Architecures supported
are QuickTime for Windows, Video for Windows, DirectShow. Manufacturer is Intel (www.intel.com).

VDOLive

VDOLive is an architecture for web video delivery, created by VDOnet Corporation (www.vdo.net).
VDOLive is a server-based, "true streaming" architecture that actually adjusts to viewers' connections as
they watch movies. Thus, true streaming movies play in real-time with no delays for downloading. For
example, if you clicked on a 30 second movie, it would start playing and 30 seconds later, it would be
over, regardless of your connection, with no substantial delays. VDOLive's true streaming approach differs
from QuickTime's "progressive download" approach. Progressive download allows you to watch (or hear)
as much of the movie as has downloaded at any time, but movies may periodically pause if the movie has
a higher data rate than the user's connection, or if there are problems with the connection or server, such
as very high traffic. In contrast to progressive download, the VDOLive server talks to the VDOPlayer (the
client) with each frame to determine how much bandwidth a connection can support. The server then only
sends that much information, so movies always play in real time. In order to support this real-time
adjustment of the data-stream, you must use special server software to place VDOLive files on your site.
The real-time adjustment to the viewer's connection works like this: VDOLive files are encoded in a
"pyramidal" fashion. The top level of the pyramid contains the smallest amount of the most critical image
data. If your user has a slow connection, they are only sent this top portion. The file's next level has more
data, and will be sent if the viewer's connection can handle it, and so forth. Users with very fast
connections (T1 or better) are sent the whole file. Thus, users are only sent what they can receive in real-
time, but the data has been pre-sorted so that the information they get is the best image for their
bandwidth.
MPEG-2

MPEG-2 is a standard for broadcast-quality digitally encoded video. It offers outstanding image quality and
resolution. MPEG-2 is the primary video standard for DVD-Video. Playback of MPEG-2 video currently
requires special hardware, which is built into all DVD-Video players, and most (but not all) DVD-ROM kits.
MPEG-2 was based on MPEG-1 but optimized for higher data rates. This allows for excellent quality at DVD
rates (300-1000 Kbps), but tends to produce results inferior to MPEG-1 at lower rates. MPEG-2 is
definitely not appropriate for use over network connections (except in very special, ultra-high-performance cases).

MPEG-4

MPEG-4 is a standard currently under development for the delivery of interactive multimedia across
networks. As such, it is more than a single CODEC, and will include specifications for audio, video, and
interactivity. The video component of MPEG-4 is very similar to H.263. It is optimized for delivery of video
at Internet data rates. One implementation of MPEG-4 video is included in Microsoft's NetShow. The rest
of the MPEG-4 standard is still being designed. It was recently announced that QuickTime's file format will
be used as a starting point.

Playback Architectures

QuickTime

QuickTime is Apple's multi-platform, industry-standard, multimedia software architecture. It is used by


software developers, hardware manufacturers, and content creators to author and publish synchronized
graphics, sound, video, text, music, VR, and 3D media. The latest free downloads, and more information,
are available at Apple's QuickTime site. (http://www.apple.com/quicktime). QuickTime offers support for a
wide range of delivery media, from WWW to DVD-ROM. It was recently announced that the MPEG-4
standard (now in design) will be based upon the QuickTime file format. QuickTime is also widely used in
digital video editing for output back to videotape. QuickTime is the dominant architecture for CD-ROM
video. It enjoys an impressive market share due to its cross-platform support, wide range of features, and
free licensing. QuickTime is used on the vast majority of CD-ROM titles for these reasons. QuickTime is a
good choice for kiosks, as it integrates well with Macromedia Director, MPEG, and a range of other
technologies.

RealMedia

The RealMedia architecture was developed by Progressive Networks, makers of RealAudio. It was designed
specifically to support live and on-demand video and audio across the WWW. The first version of
RealMedia is focused on video and audio, and is referred to as RealVideo. Later releases of RealMedia will
incorporate other formats including MIDI, text, images, vector graphics, animations, and presentations.
RealMedia content can be placed on your site either with or without special server software. There are
performance advantages with the server, but you don't have to buy one to get started. However, high
volume sites will definitely want a server to get substantially improved file delivery performance. Users
can view RealMedia sites with the RealPlayer, a free "client" application available from Progressive. A
Netscape plug-in is also available. The main downside to RealMedia is that it currently requires a
PowerMac or Pentium computer to view. As such, RealMedia movies aren't available to the full range of
potential users. The latest free downloads, as well as more information, are available at www.real.com.

NetShow

Microsoft's NetShow architecture is aimed at providing the best multimedia delivery over networks, from
14.4 kbps modems to high-speed LANs. There is an impressive range of audio and video CODECs built into
NetShow 3.0. Combined with a powerful media server, this is a powerful solution for networked media.
Technically, the term "NetShow" refers to the client installation and the server software. Netshow clients
are built on top of the DirectShow architecture. Because of this, NetShow has access to its own CODECs,
and also those for DirectShow, Video for Windows, and QuickTime. Netshow media on WWW pages may
be viewed via ActiveX components (for Internet Explorer), plug-ins (for Netscape Navigator), or stand-
alone viewers. NetShow servers support "true streaming" (in their case, called "intelligent streaming"):
the ability to guarantee continuous delivery of media even if the networks' performance degenerates. If
this happens, NetShow will automatically use less video data (thus reducing the quality). If the amount of
available bandwidth decreases more, NetShow will degrade video quality further, until only the audio is
left. Microsoft says that their implementation provides the most graceful handling of this situation. The
latest free downloads, as well as more information, are available at Microsoft's NetShow site
(www.microsoft.com/netshow).

DirectShow

DirectShow (formerly known as ActiveMovie) is the successor to Microsoft's Video for Windows
architecture. It is built on top of the DirectX architecture (including DirectDraw, DirectSound, and
Direct3D), for optimum access to audio and video hardware on Windows-based computers. Supported
playback media includes WWW, CD-ROM, DVD-ROM, and DVD-Video (with hardware). DV Camera support
will be added in an upcoming release. DirectShow has its own player (the Microsoft Media Player,
implemented as an ActiveX control) which may be used independently or within Internet Explorer. There is
also a plug-in for use with Netscape Navigator. And playback may also be provided by other applications
using the OCX component. As DirectShow is the playback architecture for NetShow, these playback
options support either delivery approach. Media Types Supported are Audio, Video, Closed Captioning
(SAMI), MIDI, MPEG, animation (2D or 3D). The latest free downloads, as well as more information, are
available at Microsoft's DirectX site (www.microsoft.com/directx/pavilion/dshow/default.asp).

Video for Windows

Video for Windows is similar to QuickTime. Its main advantage is that it is built into Windows 95.
However, it is limited in many ways. It runs on Windows only, doesn't handle audio/video synchronization
as well as QuickTime, and doesn't support variable-length frames. Video for Windows is no longer
supported by Microsoft, and is being replaced by DirectShow/ActiveMovie (one of the DirectX
technologies). Video for Windows is often referred to as "AVI" after the .AVI extension used by its file format.

[Some of the details discussed above are available at :
http://www.etsimo.uniovi.es/hypgraph/video/codecs/Default.htm ]

Some Concepts of Video Editing

Time Base and Frame Rates

In the natural world we experience time as a continuous flow of events. Working with video, however, requires precise synchronization, so it is necessary to measure time with precise numbers. Familiar time increments like hours, minutes and seconds are not precise enough, as each second may contain several events. When editing video, several source clips may need to be imported to create the output clip. The source frame rate of each clip determines how many frames are displayed per second within that clip. Source frame rates differ for different types of clips:
Motion picture film – 24 fps
PAL and SECAM video – 25 fps
NTSC video – 29.97 fps
Web applications – 15 fps
CD-ROM applications – 30 fps

In a video editing project file there is a single, common timeline on which all the imported clips are placed. A parameter called the timebase determines how time is measured and displayed within the editing software. For example, a timebase of 30 means that each second is divided into 30 units. The exact time at which an edit occurs depends on the timebase specified for the particular project. Since there has to be a common timebase for the editor's timeline, source clips whose frame rates do not match the specified timebase need adjustment. For example, if the frame rate of a source clip is 30 fps and the timebase of the project is also 30, then all frames are displayed as expected (figure below, half second shown).

However if the source clip was recorded at 24 fps and it is placed on a timeline with a
timebase of 30 then to preserve the proper playback speed some of the original frames
need to be repeated. In the figure below, frames 1, 5 and 9 are shown to be repeated for
half second duration.

If the final edited video clip needs to be exported at 15 fps, then every alternate frame from the timeline needs to be discarded.

On the other hand if the timebase was set at 24 and the final video needs to be exported at
15 fps, then some selective frames would need to be discarded. In the figure below
frames 3, 6, 8 and 11 are shown to be discarded for half second duration.
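
A rough sketch of this frame-rate adjustment is shown below; it simply maps every slot of the target timebase back to the nearest earlier source frame, so frames are repeated when the timebase is faster than the source and discarded when it is slower (this is a simple nearest-frame policy, not the exact rule of any particular editing package):

def resample_frames(num_source_frames, source_fps, target_fps):
    # Return, for each target-timebase slot, the index of the source frame to show.
    duration = num_source_frames / source_fps
    num_target_frames = round(duration * target_fps)
    return [min(int(t * source_fps / target_fps), num_source_frames - 1)
            for t in range(num_target_frames)]

# Half a second of 24 fps film placed on a 30-unit timebase:
print(resample_frames(12, 24, 30))
# [0, 0, 1, 2, 3, 4, 4, 5, 6, 7, 8, 8, 9, 10, 11] -> frames 1, 5 and 9 (1-based) repeat
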
SMPTE Timecode

Timecode defines how frames in a movie are counted and affects the way you view and edit a clip. For example, you count frames differently when editing video for television than when editing for motion-picture film. A standard way of representing timecode has been developed by a global body called the Society of Motion Picture and Television Engineers (SMPTE); it represents timecode by a set of numbers denoting hours, minutes, seconds and frames, which are added to the video to enable precise editing, e.g. 00:03:51:03.
When NTSC color systems were developed, the frame rate was changed by a tiny amount
to eliminate the possibility of crosstalk between the audio and color information; the
actual frame rate that is used is exactly 29.97 frames per second. This poses a problem
since this small difference will cause SMPTE time and real time (what your clock reads)
to be different over long periods. Because of this, two methods are used to generate
SMPTE time code in the video world: Drop and Non-Drop.

In SMPTE Non-Drop, the time code frames are always incremented by one in exact
synchronization to the frames of your video. However, since the video actually plays at
only 29.97 frames per second (rather than 30 frames per second), SMPTE time will
increment at a slower rate than real world time. This will lead to a SMPTE time versus
real time discrepancy. Thus, after a while, we could look at the clock on the wall and
notice it is farther ahead than the SMPTE time displayed in our application.

[Figure: one second of clock time compared with one second of SMPTE Non-Drop time, where the video plays at 29.97 frames per second]

A difference of 0.03 frames per second translates to (0.03 × 60 × 60) or 108 frames per hour.
SMPTE Drop time code (which also runs at 29.97 frames per second ) attempts to
compensate for the discrepancy between real world time and SMPTE time by "dropping"
frames from the sequence of SMPTE frames in order to catch up with real world time.
What this means is that occasionally in the SMPTE sequence of time, the SMPTE time
will jump forward by more than one frame. The time is adjusted forward by two frames
on every minute boundary which increases the numbering by 120 frames every hour.
However to achieve a total compensation of 108 frames, the increment is avoided at the
following minute boundaries : 00, 10, 20, 30, 40 and 50. Thus when SMPTE Drop time
increments from 00:01:59:29, the next value will be 00:02:00:02 in SMPTE Drop rather
than 00:02:00:00 in SMPTE Non-Drop. In SMPTE Drop, it must be remembered that
certain codes no longer exist. For instance, there is no such time as 00:02:00:00 in
SMPTE Drop. The time code is actually 00:02:00:02. No frames are lost, because drop-
frame timecode does not actually drop frames, only frame numbers. To distinguish from
the non-drop type, the numbers are separated by semicolons instead of colons i.e.
00;02;00;02

[Figure: over one hour of clock time, SMPTE Non-Drop time falls behind by 108 frames]
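
Following the rule described above (two frame numbers skipped at every minute boundary except minutes 00, 10, 20, 30, 40 and 50), a frame count can be converted to SMPTE Drop timecode with a sketch like this:

def drop_frame_timecode(frame_number):
    frames_per_min = 30 * 60 - 2                        # 1798 numbered frames in a "dropped" minute
    frames_per_10min = frames_per_min * 9 + 30 * 60     # 17982 numbered frames per ten minutes
    tens, rem = divmod(frame_number, frames_per_10min)
    if rem > 2:
        frame_number += 2 * 9 * tens + 2 * ((rem - 2) // frames_per_min)
    else:
        frame_number += 2 * 9 * tens
    f = frame_number % 30
    s = (frame_number // 30) % 60
    m = (frame_number // 1800) % 60
    h = frame_number // 108000
    return f"{h:02d};{m:02d};{s:02d};{f:02d}"

print(drop_frame_timecode(3598))    # 00;02;00;02 -- the code that follows 00;01;59;29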

Online Editing and Offline Editing

There are three phases of video production :


• Pre-production : Involves writing scripts, visualizing scenes, storyboarding etc.
• Production : Involves shooting the actual scenes
• Post-production : Involves editing the scenes and correcting / enhancing wherever
necessary.
Editing begins with a draft or rough cut called the Offline Edit, which gives a general idea of the editing possibilities. The offline edit is usually done on a low-end system using a low-resolution copy of the original video; this keeps the process economical, because a low-resolution copy is sufficient for deciding on the edit points. An edit decision list (EDL) is created which contains a list of the edit changes to be carried out. The EDL can be refined through successive iterations until the edit points and changes are finalized. Since this iterative process may take a long time (typically several days), tying up a high-end system for it is neither desirable nor economical. Once the EDL is finalized, the final editing work is done on the actual high-resolution copy of the video using a powerful system. This operation is called the Online Edit. It requires much less time than the offline edit because the operations are performed only once, based on the finalized EDL, so the higher cost of the high-end system needs to be borne only for a short duration (typically a few hours).

Edit Decision List (EDL)

An EDL is used in offline editing for recording the edit points. It contains the names of
the original clips, the In and Out points, and other editing information. In Premiere
editing decisions in the Timeline are recorded in text format and then exported in one of
the EDL formats. A standard EDL contains the following columns:
(a) Header – Contains title and type of timecode (drop-frame or nondrop-frame)

(b) Source Reel ID – Identifies the name or number of the videotape containing the
source clips
(c) Edit Mode – Indicates whether edits take place on video track, audio track or both.
(d) Transition type – Describes the type of transition e.g. wipe, cut etc.
(e) Source In and Source Out – Lists timecodes of first and last frames of clips
On a high-end system the EDL is accepted by an edit controller which applies the editing
changes to the high-quality clips.

FireWire (IEEE-1394)

Although digital video in an external device or camera is already in binary computer code, you still need to capture it to a file on a hard disk. Capturing digital video is a simple file transfer to the computer if the computer has a FireWire (IEEE-1394) card and a digital video CODEC is available. The IEEE-1394 interface standard is also known as "FireWire" by Apple Computer, Inc. and as "iLink" or "iLink 1394" by Sony Corp. Developed by the Institute of Electrical and Electronics Engineers, it is a serial data bus that allows high speed data transfers. Three data rates are supported: 100, 200 and 400 Mbps. The bus speed is governed by the slowest active node.

The cable consists of two separately shielded pairs of wires for signaling, two power conductors and an outer shield. Up to 63 devices can be connected in a daisy chain. The standard also supports hot plugging, which means that devices can be connected or disconnected without switching off the power in the cable.
IEEE 1394 is a non-proprietary standard and many organizations and companies have endorsed it. The Digital VCR Conference selected IEEE 1394 as its standard digital interface; an EIA committee selected IEEE 1394 as the point-to-point interface for digital TV; and the Video Electronics Standards Association (VESA) adopted IEEE 1394 for home networking. Microsoft first supported IEEE 1394 in the Windows 98 operating system and it is supported in newer operating systems.

THE VISUAL DISPLAY SYSTEM


The Visual Display System consists of two important components – the Monitor and the Adapter Card &
Cable.

THE MONITOR

The principle on which the monitor works is based upon the operation of a sealed glass tube called the
Cathode Ray Tube (CRT).

Monochrome CRT

The CRT is a vacuum sealed glass tube having two electrical terminals inside, the negative electrode or
cathode (K) and a positive electrode or anode (A). Across these terminals a high potential of the order of
18 KV is maintained. This produces a beam of electrons, known as cathode rays, from the cathode
towards the anode. The front face of the CRT is coated with a layer of a material called phosphor
arranged in the form of a rectangular grid of a large number of dots. The material phosphor has a
property of emitting a glow of light when it is hit by charged particles like electrons.
The beam of electrons is controlled by three other positive terminals. The control grid (G1) helps to draw out the electrons in a uniform beam, the accelerating grid (G2) accelerates the electrons in the forward direction, and the focusing grid (G3) focuses the beam to a single point on the screen ahead, so that the diameter of the beam equals the diameter of a single dot of phosphor. This dot is called a pixel, which is short for Picture Element. As the beam hits the phosphor dot, a single glowing pixel is created at the center of the screen. On the neck of the CRT are two other electrical coils called deflection coils. When current flows through these coils, the magnetic field produced interacts with the electron beam, deflecting it from its original path. One of the coils, called the horizontal deflection coil, moves the beam horizontally across the screen, and the other, the vertical deflection coil, moves the beam vertically along the height of the screen. When both coils are energized the electron beam can be moved in any direction, thus generating a single spot of light at any point on the CRT screen.

Raster Scanning

Drawing an image on the CRT screen involves the process of raster scanning. It is the process by which
the electron beam sequentially moves over all the pixels on the screen. The beam starts from the upper-
left corner of the screen, moves over the first row of pixels until it reaches the right hand margin of the
screen. The beam is then switched off and retraces back horizontally to the beginning of the second row of
pixels. This is called horizontal retrace. It is then turned on again and moves over the second row of
pixels. This process continues until it reaches the bottom-right corner of the screen, after which it retraces
back to the starting point. This is called vertical retrace. The entire pattern is called a raster and each
scan line is called a raster line.

Frames and Refresh Rate

The electron beam is said to produce a complete frame of the picture when, starting from the top-left corner, it moves over all the pixels and returns to the starting point. The human brain has the capability to hold on to the image of an object for a fraction of a second even after the object has been removed from before our eyes. This phenomenon is called persistence of vision. As the beam moves over each pixel, the glow of the pixel dies down, although its image persists in our eyes for some time after that. So if the beam can come back to the pixel before its glow has completely disappeared, to us it will seem that the pixel is glowing continuously. It has been observed that we see a steady image on the screen only if about 60 frames are generated on the screen per second, i.e. the electron beam should return to its starting point within 1/60th of a second. The monitor is then said to have a refresh rate of 60 Hz. A monitor with a refresh rate of less than 50 Hz produces a perceptible flicker on the screen and should be avoided.
Color CRT

The working principle of a color CRT is the same as that of a monochrome CRT, except that here each pixel
consists of three colored dots instead of one and is called a triad. These colors are red, green and blue
(RGB) and are called primary colors. Corresponding to the three dots there are also three electron beams
from the electron gun assembly, each of which falls on the corresponding dot. It has been experimentally
observed that the three primary colored lights can combine in various proportions to produce all other
colors. As each of the three beams hits its corresponding dot with varying intensity, they produce
different proportions of the three primary colored lights which together create the sensation of a specific
color in our eyes. Our eyes cannot distinguish the individual dots but see their net effect as a whole. A
perforated screen called a shadow mask prevents the beams from falling in the gaps between the dots.
Secondary colors are created by mixing equal quantities of primary colors, e.g. red and green create
yellow, green and blue create cyan, blue and red create magenta, while all three colors in equal proportion
produce white.
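
As a small illustration of additive mixing, the Python sketch below combines made-up 8-bit intensity values
of the three primaries to obtain the secondary colors and white described above; the names and values are
illustrative, not part of any standard.

# A minimal sketch of additive RGB mixing with 8-bit intensities (0-255).
def mix(red, green, blue):
    """Return an (R, G, B) triple clamped to the 8-bit range."""
    clamp = lambda v: max(0, min(255, v))
    return clamp(red), clamp(green), clamp(blue)

yellow  = mix(255, 255, 0)    # red + green
cyan    = mix(0, 255, 255)    # green + blue
magenta = mix(255, 0, 255)    # blue + red
white   = mix(255, 255, 255)  # all three primaries in equal proportion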

Interlacing

Interlacing is a process by which monitors with lower refresh rates can produce images comparable in
quality to those produced by a monitor with a higher refresh rate. Each frame is split into two parts
consisting of the odd and even lines of the complete image, called the odd field and the even field. The
first field is displayed for half the frame duration and then the second field is displayed so that its
lines fit between the lines of the first field. This succeeds in lowering the frame rate without increasing
the flicker correspondingly, although the picture quality is still not the same as that of a non-interlaced
monitor. One of the most popular applications of interlacing is TV broadcasting.
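
As an illustration, the short Python sketch below splits a frame into its odd and even fields, under the
simplifying assumption that a frame is just a list of scan lines.

# A minimal sketch of splitting a frame into interlaced fields, assuming
# the frame is simply a list of scan lines (line 1 at the top).
def split_fields(frame_lines):
    """Return (odd_field, even_field) from a full frame."""
    odd_field  = frame_lines[0::2]   # lines 1, 3, 5, ... (counting from 1)
    even_field = frame_lines[1::2]   # lines 2, 4, 6, ...
    return odd_field, even_field

frame = ["scan line %d" % n for n in range(1, 7)]
odd_field, even_field = split_fields(frame)
# The odd field is displayed first; the even field is then drawn so that
# its lines fall in the gaps between the odd lines.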

Monitor Specifications
(a) Refresh Rate : Number of frames displayed by a monitor in one second. Thus a monitor having a
refresh rate of 60 Hz implies that the image on the screen of the monitor is refreshed 60 times per
second.

(b) Horizontal Scan Rate : Number of horizontal lines displayed by the monitor in one second. For a
monitor having a refresh rate of 60 Hz and 600 horizontal lines on the screen, the horizontal scan rate
is 36 kHz (see the sketch after this list).

(c) Dot Pitch : Shortest distance between two neighbouring pixels or triads on the screen. Usually of the
order of 0.4 mm to 0.25 mm.

(d) Pixel Addressability : The total number of pixels that can be addressed on the screen. Measured by
the product of the horizontal number of pixels and the vertical number of pixels on the screen.
Modern monitors usually have 640 X 480 pixels or 800 X 600 pixels on the screen.

(e) Aspect Ratio : Ratio of the width of the screen to its height. For computer monitors and TV screens
it is 4:3, whereas for movie theatres it is 16:9.

(f) Size : The longest diagonal length of the monitor. Standard computer monitors are usually between
15” and 20” in size.

(g) Resolution : The total number of pixels per unit length of the monitor either in the horizontal or
vertical directions. Measured in dots per inch (dpi). Usually of the order of 75 dpi to 96 dpi for modern
monitors.

(h) Color Depth : A measure of the total number of colors that can be displayed on a monitor. Depends
on the number of varying intensities that the electron beams can be made to have. A monitor
with a color depth of 8 bits can display a total of 2⁸ or 256 colors.
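
The scan-rate and color-depth relationships above are simple arithmetic; the Python sketch below
illustrates them using the example figures from the list (assumed values, not the specifications of any
particular monitor).

import math

# Illustrative monitor specifications (assumed values).
refresh_rate_hz    = 60      # frames displayed per second
lines_per_frame    = 600     # horizontal lines on the screen
colors_displayable = 256     # total colors the monitor can show

# (b) horizontal scan rate = refresh rate x number of lines
h_scan_rate_khz = refresh_rate_hz * lines_per_frame / 1000    # 36.0 kHz

# (h) color depth = bits needed to index every displayable color
color_depth_bits = math.ceil(math.log2(colors_displayable))   # 8 bits

print(h_scan_rate_khz, color_depth_bits)
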
Problem-1

A 15” monitor with aspect ratio of 4:3 has a pixel addressability of 800 X 600. Calculate its
resolution.

Let the width of the monitor be 4x and its height be 3x.

By the Pythagorean theorem for a right-angled triangle we know,
(4x)² + (3x)² = 15²
i.e. 16x² + 9x² = 225, or 25x² = 225
i.e. x² = 9, so x = 3
The width of the monitor is therefore 12” and the height is 9”.
So, resolution is (800/12) = (600/9) ≈ 66.67 dpi
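
A quick Python check of this calculation, with the problem's figures hard-coded, is shown below.

import math

# Problem-1: resolution of a 15", 4:3 monitor at 800 x 600.
diagonal_in        = 15
aspect_w, aspect_h = 4, 3
pixels_w, pixels_h = 800, 600

x = diagonal_in / math.hypot(aspect_w, aspect_h)   # 15 / 5 = 3
width_in, height_in = aspect_w * x, aspect_h * x   # 12" x 9"

resolution_dpi = pixels_w / width_in               # equals pixels_h / height_in
print(round(resolution_dpi, 2))                    # 66.67 dpi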

Problem-2

A monitor can display 4 shades of red, 8 shades of blue and 16 shades of green. Find out its color
depth.

Each pixel can take up a total of (4 X 8 X 16) or 512 colors.


Since 2⁹ = 512, the monitor has a color depth of 9 bits.
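
The same result can be reached channel by channel, as in this brief Python sketch using the figures from
the problem statement.

import math

# Bits per channel: 4 shades -> 2 bits, 8 -> 3 bits, 16 -> 4 bits.
shades_per_channel = [4, 8, 16]
color_depth_bits = sum(int(math.log2(s)) for s in shades_per_channel)
print(color_depth_bits)   # 9 bits, since 2 ** 9 = 4 * 8 * 16 = 512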

THE VIDEO ADAPTER CARD AND CABLE

The Video Adapter is an expansion card which usually sits in a slot on the motherboard. It acts as an
interface between the processor of the computer and the monitor. The digital data required for creating
an image on the screen is generated by the central processor of the computer and consists of RGB values
for each pixel on the screen. These are called pixel attributes. For an 8-bit image, each pixel is digitally
represented by an 8-bit binary number. The adapter interprets these attributes and translates them into
one of 256 voltage levels (since 2⁸ = 256) to drive the electron gun of the monitor. These intensity signals,
along with two synchronization signals for positioning the electron beam at the location of the pixel, are
fed to the monitor from the adapter through the video cable.
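
As a loose illustration of this translation step (not how any particular adapter implements it), the Python
sketch below maps an 8-bit pixel attribute onto one of 256 drive levels, normalized to an assumed 0 to
0.7 V analog signal range.

# A loose sketch of translating an 8-bit pixel attribute into an analog
# drive voltage; the 0.7 V full-scale swing is an assumption for illustration.
MAX_LEVEL = 255
FULL_SCALE_VOLTS = 0.7

def attribute_to_voltage(attribute):
    """Map an 8-bit intensity attribute to a normalized drive voltage."""
    if not 0 <= attribute <= MAX_LEVEL:
        raise ValueError("attribute must fit in 8 bits")
    return FULL_SCALE_VOLTS * attribute / MAX_LEVEL

print(attribute_to_voltage(128))   # roughly half of full brightness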

The VGA

The Video Graphics Array (VGA) adapter was a standard introduced by IBM which was capable of
displaying text and graphics in 16 colors at 640 x 480 mode or 256 colors at 320 x 200 mode. A VGA card
had no real processing power, meaning that the CPU had to do most of the image manipulation tasks. The
VGA adapter was connected to a VGA-compatible monitor using a video cable with a 15-pin connector. The
pins on the connector carried various signals from the card to the monitor, including the color intensity
signals and the synchronization signals. The sync signals were generated by the adapter to control the
movement of the electron guns of the CRT monitor. They consisted of the horizontal sync pulses, which
controlled the left-to-right movement of the electron beam as well as the horizontal retrace, and the
vertical sync pulses, which controlled the up-and-down movement of the beam as well as the vertical
retrace. Nowadays VGA has become obsolete, having been replaced by SVGA adapters.

The SVGA

The industry extended the VGA standard to include improved capabilities like 800 x 600 mode with 16-bit
color and later on 1024 x 768 mode with 24-bit color. All of these standards were collectively called Super
VGA or SVGA. The Video Electronics Standards Association (VESA) defined a standard interface for the
SVGA adapters and called it VESA BIOS Extensions. Along with these new improved standards came
accelerated video cards which included a special graphics processor on the adapter itself and relieved the
main CPU from most of the tasks of image manipulation.

Components of an Adapter
The main components of the video adapter card include :

Display Memory

A bank of memory within the adapter card used for storing pixel attributes. It initially stores the
image data from the CPU and is later read by the adapter to generate the RGB signals for the monitor. The
amount of memory should be sufficient to hold the attributes of all the pixels on the screen and depends
on the pixel addressability as well as the color depth. Thus for an 8-bit image displayed at 640 x 480 mode,
the minimum display memory needed is 640 x 480 x 1 byte, or roughly 0.3 MB, which rounded up to the next
whole megabyte is 1 MB.

Graphics Controller

A chip within the adapter card responsible for coordinating the activities of all other components of the
card. For the earlier generation video cards, the controller simply passed on the data from the processor
to the monitor after conversion. For modern accelerated video cards, the controller also has the capability
of manipulating the image data independently of the central processor.

Digital-to-Analog Converter

The DAC actually converts the digital data stored in the display memory to analog voltage levels to drive
the electron beams of the CRT.

Problem-3

A monitor has pixel addressability of 800 X 600 and a color depth of 24-bits. Calculate the
minimum amount of display memory required in its adapter card to display an image on the
screen.

A total of 24 bits are allocated to each pixel.

So for a total of 800 X 600 pixels, the total number of bits required is (800 X 600 X 24).
To store this many bits, the amount of display memory required is (800 X 600 X 24)/(8 X 1024
X 1024) ≈ 1.37 MB, which rounded up to the next whole megabyte becomes 2 MB.
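
The same display-memory calculation, sketched in Python with the figures from the problem, is shown below.

import math

# Display memory = pixels x bits per pixel, expressed in whole megabytes.
pixels_w, pixels_h, color_depth_bits = 800, 600, 24

bits_required = pixels_w * pixels_h * color_depth_bits
megabytes = bits_required / (8 * 1024 * 1024)   # about 1.37 MB
print(math.ceil(megabytes))                     # 2 MB, rounded up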

Accelerated Graphics Port (AGP)


To combat the eventual saturation of the PCI bus with video information, a new interface was pioneered by
Intel (http://developer.intel.com/technology/agp), designed specifically for the video subsystem. AGP was
developed in response to the trend towards greater and greater performance requirements for video. As
software evolves and computer use continually moves into previously unexplored areas such as 3D
acceleration and full-motion video playback, both the processor and the video adapter need to process more
and more information. Another issue has been the increasing demand for video memory: much larger amounts
of memory are required on video cards, not just for the screen image but also for 3D calculations, which in
turn makes the video card more expensive.
AGP gets around these problems in two ways. It provides a separate AGP slot on the motherboard connected
to an AGP bus providing about 533 MB/s of bandwidth. It also utilizes a portion of the main memory, known
as the texture cache, for storing pixel attributes, thereby going beyond the limits of the display memory
on the adapter card. AGP is ideal for transferring the huge amount of data required for displaying 3D
graphics and animation. AGP is considered a port and not a bus as it involves only two devices, the
processor and the video card, and is not expandable. AGP has helped remove bandwidth overheads from the
PCI bus.

The slot itself is physically similar to the PCI slot but is offset further from the edge of the motherboard.
The Liquid Crystal Display

Principle of Operation

Liquid crystals were first discovered in the late 19th century by the Austrian botanist Friedrich Reinitzer,
and the term liquid crystal was coined by the German physicist Otto Lehmann.
Liquid crystals are transparent organic substances consisting of long rod-like molecules which, in their
natural state, arrange themselves with their axes roughly parallel to each other. By flowing the liquid
crystal over a finely grooved surface it is possible to control the alignment of the molecules, as they
follow the alignment of the grooves.

The first principle of an LCD consists of sandwiching a layer of liquid crystal between two finely grooved
surfaces whose grooves are perpendicular to each other. Thus the molecules at the two surfaces are
aligned perpendicular to each other and those at the intermediate layers are twisted by intermediate
angles. Light, in following the molecules, is also twisted by 90 degrees as it passes through the liquid
crystal.

The second principle of an LCD depends on polarizing filters. Natural light waves are oriented at random
angles. A polarizing filter acts as a net of fine parallel lines, blocking all light except that whose
waves are parallel to those lines. A second polarizer perpendicular to the first would therefore block all
of the already polarized light. An LCD consists of two polarizing filters perpendicular to each other with
a layer of twisted liquid crystal between them. Light, after passing through the first polarizer, is
twisted through 90 degrees by the liquid crystal and passes out completely through the second polarizer.
This gives us a lighted pixel. On applying an electric charge across the liquid crystal, its molecular
alignment is disturbed. In this case light is not twisted by 90 degrees by the liquid crystal and is
therefore blocked by the second polarizer. This gives us a dark pixel. Images are drawn on the screen
using arrangements of these lighted and dark pixels.
