
NPR COLLEGE OF ENGINEERING & TECHNOLOGY.

EC1252 COMMUNICATION THEORY

DEPT/ YEAR/ SEM: ECE/ II/ IV PREPARED BY: Ms. S. THENMOZHI/ Lecturer/ECE

SYLLABUS

UNIT I AMPLITUDE MODULATION SYSTEMS
Review of spectral characteristics of periodic and non-periodic signals - Generation and demodulation of AM, DSBSC, SSB and VSB signals - Comparison of amplitude modulation systems - Frequency translation - FDM - Non-linear distortion.

UNIT II ANGLE MODULATION SYSTEMS
Phase and frequency modulation - Single tone - Narrow band and wideband FM - Transmission bandwidth - Generation and demodulation of FM signal.

UNIT III NOISE THEORY
Review of probability - Random variables and random process - Gaussian process - Noise - Shot noise - Thermal noise and white noise - Narrow band noise - Noise temperature - Noise figure.

UNIT IV PERFORMANCE OF CW MODULATION SYSTEMS
Superheterodyne radio receiver and its characteristics - SNR - Noise in DSBSC systems using coherent detection - Noise in AM system using envelope detection - FM system - FM threshold effect - Pre-emphasis and de-emphasis in FM - Comparison of performances.

UNIT V INFORMATION THEORY
Discrete messages and information content - Concept of amount of information - Average information - Entropy - Information rate - Source coding to increase average information per bit - Shannon-Fano coding - Huffman coding - Lempel-Ziv (LZ) coding - Shannon's theorem - Channel capacity - Bandwidth-S/N trade-off - Mutual information and channel capacity - Rate distortion theory - Lossy source coding.

TEXT BOOKS
1. Dennis Roddy and John Coolen, Electronic Communication, 4th Edition, PHI, 1995.
2. Herbert Taub and Donald L. Schilling, Principles of Communication Systems, 3rd Edition, TMH, 2008.

REFERENCES
1. Simon Haykin, Communication Systems, 4th Edition, John Wiley and Sons, 2001.
2. Bruce Carlson, Communication Systems, 3rd Edition, TMH, 1996.
3. Lathi, B. P., Modern Digital and Analog Communication Systems, 3rd Edition, Oxford Press, 2007.
4. John G. Proakis and Masoud Salehi, Fundamentals of Communication Systems, 5th Edition, Pearson Education, 2006.

UNIT 1

AMPLITUDE MODULATION SYSTEMS

Review of spectral characteristics of periodic and non-periodic signals. Generation and demodulation of AM signal. Generation and demodulation of DSBSC signal. Generation and demodulation of SSB signal. Generation and demodulation of VSB signal. Comparison of amplitude modulation systems. Frequency translation. FDM. Non-linear distortion.

Introduction:
In electronics, a signal is an electric current or electromagnetic field used to convey data from one place to another. The simplest form of signal is a direct current (DC) that is switched on and off; this is the principle by which the early telegraph worked. More complex signals consist of an alternating-current (AC) or electromagnetic carrier that contains one or more data streams.

Modulation:

Modulation is the addition of information (or the signal) to an electronic or optical signal carrier. Modulation can be applied to direct current (mainly by turning it on and off), to alternating current, and to optical signals. One can think of blanket waving as a form of modulation used in smoke signal transmission (the carrier being a steady stream of smoke). Morse code, invented for telegraphy and still used in amateur radio, uses a binary (two-state) digital code similar to the code used by modern computers. For most of radio and telecommunication today, the carrier is alternating current (AC) in a given range of frequencies. Common modulation methods include:

1. Amplitude modulation (AM), in which the voltage applied to the carrier is varied over time.
2. Frequency modulation (FM), in which the frequency of the carrier waveform is varied in small but meaningful amounts.
3. Phase modulation (PM), in which the natural flow of the alternating current waveform is delayed temporarily.

Classification of Signals:
Some important classifications of signals:

Analog vs. digital signals: as stated in the previous lecture, a signal with a magnitude that may take any real value in a specific range is called an analog signal, while a signal with an amplitude that takes only a finite number of values is called a digital signal.

Continuous-time vs. discrete-time signals: continuous-time signals may be analog or digital signals such that their magnitudes are defined for all values of t, while discrete-time signals are analog or digital signals with magnitudes that are defined at specific instants of time only and are undefined for other time instants.

Periodic vs. aperiodic signals: periodic signals are those that are constructed from a specific shape that repeats regularly after a specific amount of time T0, [i.e., a periodic signal f(t) with period T0 satisfies f(t) = f(t+nT0) for all integer values of n], while aperiodic signals do not repeat regularly.

Deterministic vs. probabilistic signals: deterministic signals are those that can be computed beforehand at any instant of time while a probabilistic signal is one that is random and cannot be determined beforehand.

Energy vs. Power signals: as described below.

Energy and Power Signals


The total energy contained in and average power provided by a signal f(t) (which is a function of time) are defined as

E_f = ∫_{-∞}^{∞} |f(t)|² dt

and

P_f = lim_{T→∞} (1/T) ∫_{-T/2}^{T/2} |f(t)|² dt,

respectively.

For periodic signals, the power P can be computed using a simpler form based on the periodicity of the signal as

P_{f,periodic} = (1/T) ∫_{t0}^{t0+T} |f(t)|² dt,

where T here is the period of the signal and t0 is an arbitrary time instant chosen to simplify the computation of the integration (i.e., chosen so that the integration over one period is as easy as possible).

Classification of Signals into Power and Energy Signals


Most signals can be classified into Energy signals or Power signals. A signal is classified into an energy or a power signal according to the following criteria

a) Energy signals: an energy signal is a signal with finite energy and zero average power (0 ≤ E < ∞, P = 0).

b) Power signals: a power signal is a signal with infinite energy but finite average power (0 < P < ∞, E → ∞).

Comments:

1. The square root √P of the average power of a power signal is what is usually defined as the RMS value of that signal.

2. Your book says that if a signal approaches zero as t approaches ±∞ then the signal is an energy signal. This is in most cases true, but not always, as you can verify in part (d) of the following example.

3. All periodic signals are power signals (but not all non-periodic signals are energy signals).

4. Any signal f that has limited amplitude (|f| < ∞) and is time-limited (f = 0 for |t| > t0 for some t0 > 0) is an energy signal, as in part (g) of the following example.

Exercise 1: determine if the following signals are Energy signals, Power signals, or
neither, and evaluate E and P for each signal (see examples 2.1 and 2.2 on pages 17 and 18 of your textbook for help).

a) a(t) = 3 sin(2πt)

This is a periodic signal, so it must be a power signal. Let us prove it.

E_a = ∫_{-∞}^{∞} |a(t)|² dt = ∫_{-∞}^{∞} |3 sin(2πt)|² dt
    = 9 ∫_{-∞}^{∞} [1 − cos(4πt)]/2 dt
    = (9/2) ∫_{-∞}^{∞} dt − (9/2) ∫_{-∞}^{∞} cos(4πt) dt → ∞ J

Notice that the evaluation of the last line above is infinite because of the first term. The second term is bounded, so it has no effect on the overall (infinite) value of the energy.

Since a(t) is periodic with period T = 2π/(2π) = 1 second, we get

P_a = (1/1) ∫_{0}^{1} |a(t)|² dt = ∫_{0}^{1} |3 sin(2πt)|² dt
    = (9/2) ∫_{0}^{1} dt − (9/2) ∫_{0}^{1} cos(4πt) dt
    = 9/2 − (9/(8π)) [sin(4πt)]_{0}^{1} = 9/2 W

So, the energy of this signal is infinite and its average power is finite (9/2 W). This means that it is a power signal, as expected. Notice that the average power of this signal is, as expected, the square of the amplitude divided by 2.

b) b(t) = 5 e^{-2|t|}

Let us first find the total energy of the signal.

E_b = ∫_{-∞}^{∞} |b(t)|² dt = ∫_{-∞}^{∞} |5 e^{-2|t|}|² dt
    = 25 ∫_{-∞}^{0} e^{4t} dt + 25 ∫_{0}^{∞} e^{-4t} dt
    = 25/4 + 25/4 = 50/4 J

The average power of the signal is

P_b = lim_{T→∞} (1/T) ∫_{-T/2}^{T/2} |b(t)|² dt
    = lim_{T→∞} (25/T) ∫_{-T/2}^{0} e^{4t} dt + lim_{T→∞} (25/T) ∫_{0}^{T/2} e^{-4t} dt
    = lim_{T→∞} (25/(4T)) (1 − e^{-2T}) + lim_{T→∞} (25/(4T)) (1 − e^{-2T})
    = 0 + 0 = 0 W

So, the signal b(t) is definitely an energy signal: its total energy is finite (50/4 J) and its average power is zero.

c) c(t) = 4 e^{-3t} for |t| < 5, and c(t) = 0 for |t| > 5.

d) d(t) = 1/√t for t ≥ 1, and d(t) = 0 for t < 1.

Let us first find the total energy of the signal.

E_d = ∫_{-∞}^{∞} |d(t)|² dt = ∫_{1}^{∞} (1/t) dt = [ln t]_{1}^{∞} → ∞

So, this signal is NOT an energy signal. However, it is also NOT a power signal, since its average power, as shown below, is zero.

The average power of the signal is

P_d = lim_{T→∞} (1/T) ∫_{-T/2}^{T/2} |d(t)|² dt
    = lim_{T→∞} (1/T) ∫_{1}^{T/2} (1/t) dt
    = lim_{T→∞} (1/T) [ln t]_{1}^{T/2}
    = lim_{T→∞} ln(T/2) / T

Using L'Hôpital's rule, we see that the power of the signal is zero. That is,

P_d = lim_{T→∞} ln(T/2) / T = lim_{T→∞} (1/T) / 1 = 0 W

So, not every signal that approaches zero as time approaches positive and negative infinity is an energy signal; such signals may not be power signals either.

e) e(t) = 7t²

f) f(t) = 2 cos²(2πt)

g) g(t) = 12 cos²(2πt) for 8 ≤ t ≤ 31, and g(t) = 0 elsewhere.
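To make the energy/power classification above concrete, here is a minimal numerical sketch (not part of the original exercise) that approximates E and P for signals a(t) and b(t) by sampling them over a finite window; the window length and step size are arbitrary illustrative choices.

```python
# Numerical estimate of energy and average power for two exercise signals.
import numpy as np

dt = 1e-4                       # time step for the Riemann-sum approximation
T_win = 200.0                   # finite observation window approximating (-inf, inf)
t = np.arange(-T_win / 2, T_win / 2, dt)

a = 3 * np.sin(2 * np.pi * t)           # periodic signal -> expected power signal
b = 5 * np.exp(-2 * np.abs(t))          # decaying signal -> expected energy signal

for name, x in [("a(t)", a), ("b(t)", b)]:
    energy = np.sum(np.abs(x) ** 2) * dt    # E ~ integral of |x(t)|^2 dt
    power = energy / T_win                  # P ~ (1/T) * integral over the window
    print(f"{name}: energy over window = {energy:.2f} J, average power = {power:.3f} W")

# Expected behaviour: the energy of a(t) keeps growing as T_win grows (infinite
# energy, power -> 9/2 W), while the energy of b(t) converges to 50/4 = 12.5 J
# and its average power tends to zero.
```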

AMPLITUDE MODULATION:

In amplitude modulation, the instantaneous amplitude of a carrier wave is varied in accordance with the instantaneous amplitude of the modulating signal. The main advantages of AM are small bandwidth and simple transmitter and receiver designs.

AM is implemented by mixing the carrier wave in a nonlinear device with the modulating signal. This produces upper and lower sidebands, which are the sum and difference frequencies of the carrier wave and the modulating signal.

The carrier signal is represented by c(t) = A cos(ω_c t). The modulating signal is represented by m(t) = B sin(ω_m t).

Then the final modulated signal is

[1 + m(t)] c(t) = A [1 + B sin(ω_m t)] cos(ω_c t)
              = A cos(ω_c t) + (AB/2) sin((ω_c + ω_m)t) − (AB/2) sin((ω_c − ω_m)t)

For demodulation reasons, the magnitude of m(t) is always kept less than 1 and its frequency is kept much smaller than that of the carrier signal. The modulated signal therefore has frequency components at ω_c, ω_c + ω_m and ω_c − ω_m.
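The following short Python sketch (all sample rates, frequencies and amplitudes are arbitrary illustrative choices, not values from the text) builds the AM waveform A[1 + B sin(ω_m t)] cos(ω_c t) described above and confirms that its spectrum contains components at f_c and f_c ± f_m.

```python
# Single-tone AM generation and a crude spectral check.
import numpy as np

fs = 100_000                 # sample rate (Hz)
t = np.arange(0, 0.1, 1 / fs)
A, B = 1.0, 0.5              # carrier amplitude and modulation index (B < 1)
fc, fm = 10_000, 1_000       # carrier and message frequencies (fm << fc)

s = A * (1 + B * np.sin(2 * np.pi * fm * t)) * np.cos(2 * np.pi * fc * t)

spectrum = np.abs(np.fft.rfft(s)) / len(s)
freqs = np.fft.rfftfreq(len(s), 1 / fs)
peaks = freqs[spectrum > 0.05]            # crude peak picking for illustration
print("Spectral components (Hz):", peaks)  # expect ~9000, 10000, 11000 Hz
```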

DSBSC:

Double Sideband Suppressed Carrier Modulation

In amplitude modulation the amplitude of a high-frequency carrier is varied in direct proportion to the low-frequency (baseband) message signal. The carrier is usually a sinusoidal waveform, that is,

c(t) = A_c cos(ω_c t + φ_c)   or   c(t) = A_c sin(ω_c t + φ_c)

where A_c is the unmodulated carrier amplitude, ω_c is the unmodulated carrier angular frequency in radians/s (ω_c = 2πf_c), and φ_c is the unmodulated carrier phase, which we shall assume is zero. The amplitude-modulated carrier has the mathematical form

s_DSB-SC(t) = A(t) cos(ω_c t)

where A(t) is the instantaneous amplitude of the modulated carrier, and is a linear function of the message signal m(t); A(t) is also known as the envelope of the modulated signal. For double-sideband suppressed-carrier (DSB-SC) modulation the amplitude is related to the message as follows:

A(t) = A_c m(t)

Consider a message signal with spectrum (Fourier transform) M(ω) which is band-limited to 2πB, as shown in Figure 1(b). The bandwidth of this signal is B Hz and ω_c is chosen such that ω_c >> 2πB. Applying the modulation theorem, the Fourier transform of the modulated signal is

A_c m(t) cos(ω_c t) ⟷ (A_c/2) [M(ω − ω_c) + M(ω + ω_c)]

GENERATION OF DSB-SC: The DSB-SC signal can be generated using either the balanced modulator or the ring modulator. The balanced modulator uses two identical AM generators along with an adder. The two amplitude modulators have a common carrier, with one of them modulating the input message m(t) and the other modulating the inverted message −m(t). Generation of AM is not simple, and having two AM generators with identical operating conditions is extremely difficult.

Hence, laboratory implementation of the DSB-SC is usually using the ring-modulator, shown in figure 1.


Figure 1: The ring modulator used for the generation of the double-sideband suppressed-carrier (DSB-SC) signal.

This standard form of DSB-SC generation is the most preferred method of laboratory implementation. However, it cannot be used for the generation of the AM waveform. The DSB-SC and the DSB forms of AM are closely related: the DSB-SC with the addition of the carrier becomes the DSB, while the DSB with the carrier removed results in the DSB-SC form of modulation. Yet, existing methods of DSB generation cannot be used for the generation of the DSB-SC; similarly, the ring modulator cannot be used for the generation of the DSB. These two forms of modulation are generated using different methods. Our attempt in this work is to propose a single circuit capable of generating both the DSB-SC and the DSB forms of AM.

THE MODIFIED SWITCHING MODULATOR: The block diagram of the modified switching modulator, given in figure 2, has all the blocks of the switching modulator, but with an additional active device. In this case, the active device has to have three terminals to enable it to be used as a controlled switch. Another significant change is that the adder is shifted after the active device. These

changes in the switching-modulator enable the carrier to independently control the switching action of the active device, and thus eliminate the restriction existing in the usual switching-modulator (equation (2)). In addition, the same circuit can generate the DSBSC waveform. Thus the task of modulators given in figures 1 and 2 is accomplished by the single modulator of figure 3.

Figure 2: The modified switching modulator.

It is possible to obtain the AM or the DSB-SC waveform from the modified switching modulator of figure 2 by just varying the amplitude of the square-wave carrier. It may be noted that the carrier performs two tasks: (i) it controls the switching action of the active devices and (ii) it controls the depth of modulation of the generated AM waveform. Thus, the proposed modification in the switching modulator enables the generation of both the AM and the DSB-SC from a single circuit. Also, it may be noted that the method is devoid of any assumptions or stringent, difficult-to-maintain operating conditions, as in existing low-power generation of the AM. We now implement the modified switching modulator and record the observed output in the next section.

Experimental results: The circuit implemented for testing the proposed method is given in figure 3, which uses transistors CL-100 and CK-100 for the controlled switches and two transformers for the adder, followed by a passive BPF. The square-wave carrier and the sinusoidal message are

given from a function generator (6 MHz Aplab FG6M). The waveforms are observed on a mixed-signal oscilloscope (100 MHz Agilent 54622D, capable of recording the output in .tif format).

Figure 3: The implementation of the modified switching modulator to generate the AM and the DSB-SC waveforms.

The modified switching modulator is tested using a single-tone message of 706 Hz with a square-wave carrier of frequency 7.78 kHz. The depth of modulation of the generated waveform can be varied either by varying the amplitude of the carrier or by varying the amplitude of the signal. Figure 5 has the results of the modulated waveforms obtained using the modified switching modulator. It can be seen that the same circuit is able to generate AM for varying depths of modulation, including over-modulation, and the DSB-SC. The quality of the modulated waveforms is comparable to that obtained using industry-standard communication modules (such as the LabVolt).

Properties of DSB-SC Modulation:

(a) There is a 180° phase reversal at the points where the envelope A(t) = A_c m(t) goes negative. This is typical of DSB-SC modulation.

(b) The bandwidth of the DSB-SC signal is double that of the message signal, that is, BWDSB-SC =2B (Hz). (c) The modulated signal is centered at the carrier frequency c with two identical sidebands (double-sideband) the lower sideband (LSB) and the upper sideband (USB). Being identical, they both convey the same message component.

(d) The spectrum contains no isolated carrier. Thus the name suppressed carrier.

(e) The 180° phase reversal causes the positive (or negative) side of the envelope to have a shape different from that of the message signal. This is known as envelope distortion, which is typical of DSB-SC modulation.

(f) The power in the modulated signal is contained in all four sidebands.

Generation of DSB-SC Signals

The circuits for generating modulated signals are known as modulators. The basic modulators are the nonlinear, switching and ring modulators. Conceptually, the simplest modulator is the product or multiplier modulator, which is shown in figure 1-a. However, it is very difficult (and expensive) in practice to design a product modulator that maintains amplitude linearity at high carrier frequencies. One way of replacing the modulator stage is by using a non-linear device. We use the non-linearity to generate a harmonic that contains the product term, and then use a BPF to separate the term of interest. Figure 3 shows a block diagram of a nonlinear DSB-SC modulator. Figure 4 shows a double balanced modulator that uses diodes as the non-linear device and then uses a BPF to separate the product term.

The received DSB-SC signal is

s_m(t) = A_c m(t) cos(ω_c t)

The receiver first generates an exact (coherent) replica (same phase and frequency) of the unmodulated carrier,

s_c(t) = cos(ω_c t)

The coherent carrier is then multiplied with the received signal to give

s_m(t) · s_c(t) = A_c m(t) cos²(ω_c t) = (1/2) A_c m(t) + (1/2) A_c m(t) cos(2ω_c t)

The first term is the desired baseband signal, while the second is a band-pass signal centered at 2ω_c. A low-pass filter with bandwidth equal to that of m(t) will pass the first term and reject the band-pass component.
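Below is a minimal numerical sketch of the coherent detection just described. The sample rate, tone frequencies and the fifth-order Butterworth low-pass filter are illustrative assumptions, not values from the text.

```python
# DSB-SC modulation followed by coherent (synchronous) detection.
import numpy as np
from scipy.signal import butter, filtfilt

fs = 100_000
t = np.arange(0, 0.05, 1 / fs)
fc, fm = 10_000, 500
m = np.cos(2 * np.pi * fm * t)                 # message
s = m * np.cos(2 * np.pi * fc * t)             # DSB-SC: no carrier component

# Receiver: multiply by a coherent local carrier, then low-pass filter.
v = s * np.cos(2 * np.pi * fc * t)             # = m/2 + (m/2) cos(2*w_c*t)
b, a = butter(5, 2 * fm / (fs / 2))            # LPF cutoff at 2*fm (normalized)
m_hat = 2 * filtfilt(b, a, v)                  # scale by 2 to undo the 1/2 factor

print("max recovery error:", np.max(np.abs(m_hat - m)))   # should be small
```

A practical receiver must regenerate the carrier phase (e.g. with a Costas loop); here the local carrier is simply reused, which is what "coherent" assumes.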

Single Side Band (SSB) Modulation:

In DSB-SC it is observed that there is symmetry in the band structure, so even if only one half is transmitted, the other half can be recovered at the receiver. By doing so, the bandwidth and power of transmission are reduced by half.

Depending on which half of the DSB-SC signal is transmitted, there are two types of SSB modulation:

1. Lower Side Band (LSB) Modulation
2. Upper Side Band (USB) Modulation

Vestigial Side Band (VSB) Modulation: The following are the drawbacks of SSB signal generation:

1. Generation of an SSB signal is difficult.
2. Selective filtering is to be done to get the original signal back.
3. The phase shifter should be exactly tuned to 90°.

To overcome these drawbacks, VSB modulation is used. It can be viewed as a compromise between SSB and DSB-SC.

In VSB:

1. One sideband is not rejected fully.
2. One sideband is transmitted fully and a small part (vestige) of the other sideband is transmitted.

The transmission bandwidth is BW_v = B + f_v, where f_v is the width of the vestigial frequency band.

FREQUENCY TRANSLATION: The transfer of signals occupying a specified frequency band, such as a channel or group of channels, from one portion of the frequency spectrum to another, in such a way that the arithmetic frequency difference of signals within the band is unaltered.

FREQUENCY-DIVISION MULTIPLEXING (FDM): It is a form of signal multiplexing which involves assigning non-overlapping frequency ranges to different signals or to each "user" of a medium. FDM can also be used to combine signals before final modulation onto a carrier wave. In this case the carrier signals are referred to as subcarriers: an example is stereo FM transmission, where a 38 kHz subcarrier is used to separate the left-right difference signal from the central left-right sum channel, prior to the frequency modulation of the composite signal. A television channel is divided into subcarrier frequencies for video, color, and audio. DSL uses different frequencies for voice and for upstream and downstream data transmission on the same conductors, which is also an example of frequency duplex. Where frequency-division multiplexing is used to allow multiple users to share a physical communications channel, it is called frequency-division multiple access (FDMA).

NONLINEAR DISTORTION: It is a term used (in fields such as electronics, audio and telecommunications) to describe the phenomenon of a non-linear relationship between the "input" and "output" signals of, for example, an electronic device.

EFFECTS OF NONLINEARITY:

Nonlinearity can have several effects, which are unwanted in typical situations. The a3 (cubic) term, for example, would, when the input is a sine wave with frequency ω, result in an extra sine wave at 3ω, as shown below.

In certain situations, this spurious signal can be filtered away because the "harmonic" 3ω lies far outside the frequency range used, but in cable television, for example, third-order distortion could cause a 200 MHz signal to interfere with the regular channel at 600 MHz. Nonlinear distortion applied to a superposition of two signals at different frequencies causes the circuit to act as a frequency mixer, creating intermodulation distortion.
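The effect described above can be reproduced numerically. In this illustrative sketch (the tone frequencies and the coefficient a3 are arbitrary assumptions), two tones pass through a memoryless cubic nonlinearity and the output spectrum shows third harmonics and intermodulation products.

```python
# Third harmonics and intermodulation from a memoryless cubic nonlinearity.
import numpy as np

fs = 100_000
t = np.arange(0, 0.02, 1 / fs)
f1, f2 = 5_000, 6_000
x = np.cos(2 * np.pi * f1 * t) + np.cos(2 * np.pi * f2 * t)

a3 = 0.3
y = x + a3 * x**3                          # weak cubic nonlinearity

spectrum = np.abs(np.fft.rfft(y)) / len(t)
freqs = np.fft.rfftfreq(len(t), 1 / fs)
print("output frequencies (Hz):", freqs[spectrum > 0.01])
# Besides f1 and f2, expect 3*f1, 3*f2 and products such as 2*f1 +/- f2 and
# 2*f2 +/- f1 (e.g. 4 kHz, 7 kHz, 15 kHz, 16 kHz, 17 kHz, 18 kHz here).
```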

PART A (2 MARK) QUESTIONS.

1. As related to AM, what is over modulation, under modulation and 100% modulation?
2. Draw the frequency spectrum of VSB. Where is it used?
3. Define the modulation index of an AM signal.
4. Draw the circuit diagram of an envelope detector.
5. What is the mid frequency of the IF section of AM receivers, and what is its bandwidth?
6. A transmitter radiates 9 kW without modulation and 10.125 kW after modulation. Determine the depth of modulation.
7. Draw the spectrum of DSB.
8. Define the transmission efficiency of an AM signal.
9. Draw the phasor diagram of an AM signal.
10. Advantages of SSB.
11. Disadvantages of DSB-FC.
12. What are the advantages of a superheterodyne receiver?
13. Advantages of VSB.
14. Distinguish between low level and high level modulators.
15. Define FDM and frequency translation.
16. Give the parameters of a receiver.
17. Define sensitivity and selectivity.
18. Define fidelity.
19. What is meant by image frequency?
20. Define multitone modulation.

PART B (16 MARK) QUESTIONS

1. Explain the generation of AM signals using a square law modulator. (16)
2. Explain the detection of AM signals using an envelope detector. (16)
3. Explain the balanced modulator used to generate a DSB-SC signal. (16)
4. Explain the coherent detector used to detect an SSB-SC signal. (16)
5. Explain the generation of SSB using a balanced modulator. (16)
6. Draw the circuit diagram of a ring modulator and explain its operation. (16)
7. Discuss the coherent detection of a DSB-SC modulated wave with a block diagram of the detector and explain. (16)
8. Explain the working of a superheterodyne receiver with its parameters. (16)
9. Draw the block diagram for the generation and demodulation of a VSB signal and explain the principle of operation. (16)
10. Write short notes on frequency translation and FDM. (16)

UNIT II

ANGLE MODULATION SYSTEMS

Phase and frequency modulation Single tone Narrow band FM Wideband FM Transmission bandwidth Generation of FM signal. Demodulation of FM signal

PHASE MODULATION: Phase modulation (PM) is a form of modulation that represents information as variations in the instantaneous phase of a carrier wave. Unlike its more popular counterpart, frequency modulation (FM), PM is not very widely used for radio transmissions. This is because it tends to require more complex receiving hardware, and there can be ambiguity problems in determining whether, for example, the signal has changed phase by +180° or −180°. PM is used, however, in digital music synthesizers such as the Yamaha DX7, even though these instruments are usually referred to as "FM" synthesizers (both modulation types sound very similar, but PM is usually easier to implement in this area).

Figure: An example of phase modulation. The top diagram shows the modulating signal superimposed on the carrier wave; the bottom diagram shows the resulting phase-modulated signal.

PM changes the phase angle of the complex envelope in direct proportion to the message signal. Suppose that the signal to be sent (called the modulating or message signal) is m(t) and the carrier onto which the signal is to be modulated is

c(t) = A_c sin(ω_c t + φ_c),

i.e., carrier(time) = (carrier amplitude) × sin(carrier frequency × time + phase shift). This makes the modulated signal

y(t) = A_c sin(ω_c t + m(t) + φ_c).

This shows how m(t) modulates the phase: the greater m(t) is at a point in time, the greater the phase shift of the modulated signal at that point. It can also be viewed as a change of the frequency of the carrier signal, and phase modulation can thus be considered a special case of FM in which the carrier frequency modulation is given by the time derivative of the phase modulation. The spectral behavior of phase modulation is difficult to derive, but the mathematics reveals that there are two regions of particular interest:

For small amplitude signals, PM is similar to amplitude modulation (AM) and exhibits its unfortunate doubling of baseband bandwidth and poor efficiency.

For a single large sinusoidal signal, PM is similar to FM, and its bandwidth is approximately

2(h + 1) f_M,

where f_M = ω_m/2π and h is the modulation index defined below. This is also known as Carson's rule for PM.

MODULATION INDEX: As with other modulation indices, this quantity indicates by how much the modulated variable varies around its unmodulated level. It relates to the variations in the phase of the carrier signal:

h = Δθ,

where Δθ is the peak phase deviation. Compare this to the modulation index for frequency modulation.
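As a rough illustration of the PM expression reconstructed above, the sketch below (carrier frequency, tone frequency and peak deviation are arbitrary assumptions) generates y(t) = A_c sin(ω_c t + m(t)) and checks that the instantaneous frequency deviates from the carrier by (1/2π) dm/dt.

```python
# Single-tone phase modulation and its instantaneous frequency.
import numpy as np

fs = 50_000
t = np.arange(0, 0.01, 1 / fs)
Ac, fc = 1.0, 5_000
h = 0.8                                         # peak phase deviation (rad)
m = h * np.sin(2 * np.pi * 400 * t)             # message scaled to peak deviation h

y = Ac * np.sin(2 * np.pi * fc * t + m)         # phase-modulated carrier

# The instantaneous frequency is fc + (1/2*pi) * dm/dt:
inst_freq = fc + np.gradient(m, t) / (2 * np.pi)
print("peak instantaneous frequency deviation (Hz):",
      np.max(np.abs(inst_freq - fc)))           # expect roughly h * 400 = 320 Hz
```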

Variable-capacitance diode phase modulator:

This circuit varies the phase between two square waves through at least 180°. This capability finds application in fixed-frequency, phase-shift, resonant-mode converters. ICs such as the UC3875 usually only work up to about 500 kHz, whereas this circuit can be extended up to tens of megahertz. In addition, the circuit shown uses low-cost components. This example was used for a high-efficiency 2-MHz RF power supply.

The signal is delayed at each gate by the RC network formed by the 4.7k input resistor and capacitance of the 1N4003 diode. The capacitance of the diode, and hence delay, can be varied by controlling the reverse dc bias applied across the diode. The 100k resistor to ground at the input to the second stage corrects a slight loss of 1:1 symmetry. The fixed delay for output A adjusts the phase to be approximately in phase at a 5-V bias.

Note that the control voltage should not drop below approximately 3 V, because the diodes will start to be forward-biased and the signal will be lost.

FREQUENCY MODULATION:

Frequency modulation (FM) conveys information over a carrier wave by varying its instantaneous frequency. This is in contrast with amplitude modulation, in which the amplitude of the carrier is varied while its frequency remains constant. In analog applications, the difference between the instantaneous and the base frequency of the carrier is directly proportional to the instantaneous value of the input signal amplitude. Digital data can be sent by shifting the carrier's frequency among a set of discrete values, a technique known as frequency-shift keying.

Frequency modulation can be regarded as phase modulation where the carrier phase modulation is the time integral of the FM modulating signal.

FM is widely used for broadcasting of music and speech, in two-way radio systems, in magnetic tape recording systems, and in certain video transmission systems. In radio systems, frequency modulation with sufficient bandwidth provides an advantage in cancelling naturally-occurring noise. Frequency-shift keying (digital FM) is widely used in data and fax modems.

THEORY: Suppose the baseband data signal (the message) to be transmitted is x_m(t) and the sinusoidal carrier is x_c(t) = A_c cos(2π f_c t), where f_c is the carrier's base frequency

and A_c is the carrier's amplitude. The modulator combines the carrier with the baseband data signal to get the transmitted signal:

y(t) = A_c cos(2π ∫_{0}^{t} f(τ) dτ) = A_c cos(2π f_c t + 2π f_Δ ∫_{0}^{t} x_m(τ) dτ)     (1)

In this equation, f(τ) = f_c + f_Δ x_m(τ) is the instantaneous frequency of the oscillator and f_Δ is the frequency deviation, which represents the maximum shift away from f_c in one direction, assuming x_m(t) is limited to the range ±1. Although it may seem that this limits the frequencies in use to f_c ± f_Δ, this neglects the distinction between instantaneous frequency and spectral frequency. The frequency spectrum of an actual FM signal has components extending out to infinite frequency, although they become negligibly small beyond a point.

SINUSOIDAL BASEBAND SIGNAL: While it is an over-simplification, a baseband modulated signal may be approximated by a sinusoidal continuous-wave signal x_m(t) = A_m cos(2π f_m t) with a frequency f_m. The integral of such a signal is

∫_{0}^{t} x_m(τ) dτ = A_m sin(2π f_m t) / (2π f_m)

Thus, in this specific case, equation (1) above simplifies to:

y(t) = A_c cos(2π f_c t + (A_m f_Δ / f_m) sin(2π f_m t))

where the amplitude A_m of the modulating sinusoid is represented by the peak frequency deviation Δf = f_Δ A_m (see frequency deviation).

The harmonic distribution of a sine wave carrier modulated by such a sinusoidal signal can be represented with Bessel functions; this provides a basis for a mathematical understanding of frequency modulation in the frequency domain.

MODULATION INDEX: As with other modulation indices, this quantity indicates by how much the modulated variable varies around its unmodulated level. It relates to the variations in the frequency of the carrier signal:

h = Δf / f_m

where f_m is the highest frequency component present in the modulating signal x_m(t), and Δf is the peak frequency deviation, i.e. the maximum deviation of the instantaneous frequency from the carrier frequency. If h << 1, the modulation is called narrowband FM, and its bandwidth is approximately 2f_m. If h >> 1, the modulation is called wideband FM and its bandwidth is approximately 2Δf. While wideband FM uses more bandwidth, it can improve the signal-to-noise ratio significantly.

With a tone-modulated FM wave, if the modulation frequency is held constant and the modulation index is increased, the (non-negligible) bandwidth of the FM signal increases but the spacing between spectral components stays the same; some spectral components decrease in strength as others increase. If the frequency deviation is held constant and the modulation frequency is increased, the spacing between spectral components increases.

Frequency modulation can be classified as narrowband if the change in the carrier frequency is about the same as the signal frequency, or as wideband if the change in the carrier frequency is much higher (modulation index > 1) than the signal frequency. For example, narrowband FM is used for two-way radio systems such as Family Radio Service, where the carrier is allowed to deviate only 2.5 kHz above and below the center frequency, carrying speech signals of no more than 3.5 kHz bandwidth. Wideband FM is used for FM broadcasting, where music and speech are transmitted with up to 75 kHz deviation from the center frequency, carrying audio with up to 20 kHz bandwidth.

CARSON'S RULE: A rule of thumb, Carson's rule states that nearly all (~98%) of the power of a frequency-modulated signal lies within a bandwidth of

B_T = 2(Δf + f_m)

where Δf, as defined above, is the peak deviation of the instantaneous frequency from the center carrier frequency.
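The following sketch uses FM-broadcast-like numbers (75 kHz deviation, 15 kHz tone, both illustrative assumptions) and scipy's Bessel functions to check Carson's rule numerically: for a unit-amplitude single-tone FM signal, the spectral component at f_c + k·f_m has amplitude J_k(h), and summing the power of the components inside the Carson bandwidth accounts for close to 99 % of the total.

```python
# Modulation index, Bessel sideband amplitudes and Carson's rule for tone FM.
import numpy as np
from scipy.special import jv            # Bessel functions of the first kind

fm = 15e3                               # modulating frequency (Hz)
delta_f = 75e3                          # peak frequency deviation (Hz)
beta = delta_f / fm                     # modulation index (here 5 -> wideband FM)

carson_bw = 2 * (delta_f + fm)
print(f"modulation index = {beta:.1f}, Carson bandwidth = {carson_bw / 1e3:.0f} kHz")

# Power in the carrier plus the sideband pairs that fall inside Carson's bandwidth
# (total power of a unit-amplitude FM signal is 1, since sum of J_k(beta)^2 = 1).
n = int(np.ceil(carson_bw / (2 * fm)))  # sideband pairs inside the bandwidth
power_inside = jv(0, beta) ** 2 + 2 * np.sum(jv(np.arange(1, n + 1), beta) ** 2)
print(f"power within Carson bandwidth: {100 * power_inside:.1f} %")
```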

NOISE QUIETING: The noise power decreases as the signal power increases; therefore the SNR goes up significantly.

MODULATION: FM signals can be generated using either direct or indirect frequency modulation. Direct FM modulation can be achieved by directly feeding the message into the input of a VCO.

For indirect FM modulation, the message signal is integrated to generate a phase-modulated signal. This is used to modulate a crystal-controlled oscillator, and the result is passed through a frequency multiplier to give an FM signal.

DEMODULATION: Many FM detector circuits exist. One common method for recovering the information signal is through a Foster-Seeley discriminator. A phase-locked loop can be used as an FM demodulator. Slope detection demodulates an FM signal by using a tuned circuit which has its resonant frequency slightly offset from the carrier frequency. As the frequency rises and falls, the tuned circuit provides a changing amplitude of response, converting FM to AM. AM receivers may detect some FM transmissions by this means, though it does not provide an efficient method of detection for FM broadcasts.

APPLICATIONS:

MAGNETIC TAPE STORAGE: FM is also used at intermediate frequencies by all analog VCR systems, including VHS, to record both the luminance (black and white) and the chrominance portions of the video signal. FM is the only feasible method of recording video to and retrieving video from magnetic tape without extreme distortion, as video signals have a very large range of frequency components, from a few hertz to several megahertz, too wide for equalizers to work with due to electronic noise below −60 dB. FM also keeps the tape at saturation level, and therefore acts as a form of noise reduction; a simple limiter can mask variations in the playback output, and the FM capture effect removes print-through and pre-echo. A continuous pilot tone, if added to the signal (as was done on V2000 and many Hi-band formats), can keep mechanical jitter under control and assist time base correction. These FM systems are unusual in that they have a ratio of carrier to maximum modulation frequency of less than two; contrast this with FM audio broadcasting, where the ratio is around 10,000. Consider for example a 6 MHz carrier modulated at a 3.5 MHz rate; by Bessel analysis the first sidebands are at 9.5 and 2.5 MHz, while the second sidebands are at 13 MHz and −1 MHz. The result is a sideband of reversed phase at +1 MHz; on demodulation, this results in an unwanted output at 6 − 1 = 5 MHz. The system must be designed so that this is at an acceptable level.
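As a numerical counterpart to the direct (VCO) generation and frequency-discriminator style demodulation described above, here is an illustrative sketch; the sample rate, carrier, frequency sensitivity and the Hilbert-transform-based frequency estimator are all assumptions for demonstration, not the hardware methods named in the text.

```python
# Direct FM generation with a software VCO, then instantaneous-frequency recovery.
import numpy as np
from scipy.signal import hilbert

fs = 400_000
t = np.arange(0, 0.01, 1 / fs)
fc, kf = 50_000, 10_000                  # carrier (Hz) and frequency sensitivity (Hz per unit)
m = np.sin(2 * np.pi * 500 * t)          # message tone

# VCO: instantaneous frequency fc + kf*m(t); integrate it to obtain the phase.
phase = 2 * np.pi * np.cumsum(fc + kf * m) / fs
s_fm = np.cos(phase)

# Demodulation: recover the instantaneous frequency from the analytic signal.
analytic = hilbert(s_fm)
inst_freq = np.diff(np.unwrap(np.angle(analytic))) * fs / (2 * np.pi)
m_hat = (inst_freq - fc) / kf            # estimate of the message

err = np.max(np.abs(m_hat[100:-100] - m[100:-101]))   # ignore edge transients
print("max recovery error (away from the edges):", err)
```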

SOUND: FM is also used at audio frequencies to synthesize sound. This technique, known as FM synthesis, was popularized by early digital synthesizers and became a standard feature for several generations of personal computer sound cards.

RADIO: Wideband FM (WFM) requires a wider signal bandwidth than amplitude modulation by an equivalent modulating signal, but this also makes the signal more robust against noise and interference. Frequency modulation is also more robust against simple signal amplitude fading phenomena. As a result, FM was chosen as the modulation standard for high frequency, high fidelity radio transmission: hence the term "FM radio" (although for many years the BBC called it "VHF radio", because commercial FM broadcasting uses a well-known part of the VHF band, the FM broadcast band). FM receivers employ a special detector for FM signals and exhibit a phenomenon called the capture effect, where the tuner is able to clearly receive the stronger of two stations being broadcast on the same frequency. Problematically, however, frequency drift or lack of selectivity may cause one station or signal to be suddenly overtaken by another on an adjacent channel. Frequency drift typically constituted a problem on very old or inexpensive receivers, while inadequate selectivity may plague any tuner. An FM signal can also be used to carry a stereo signal: see FM stereo. However, this is done by using multiplexing and demultiplexing before and after the FM process. The rest of this discussion ignores the stereo multiplexing and demultiplexing process used in "stereo FM", and concentrates on the FM modulation and demodulation process, which is identical in stereo and mono processes. A high-efficiency radio-frequency switching amplifier can be used to transmit FM signals (and other constant-amplitude signals). For a given signal strength (measured at the receiver antenna), switching amplifiers use less battery power and typically cost less than a linear amplifier. This gives FM another advantage over other modulation schemes that require linear amplifiers, such as AM and QAM. FM is commonly used at VHF radio frequencies for high-fidelity broadcasts of music and speech (see FM broadcasting). Normal (analog) TV sound is also broadcast using FM. A narrowband form is used for voice communications in

commercial and amateur radio settings. In broadcast services, where audio fidelity is important, wideband FM is generally used. In two-way radio, narrowband FM (NBFM) is used to conserve bandwidth for land mobile radio stations, marine mobile, and many other radio services.

VARACTOR FM MODULATOR:

Another FM modulator which is widely used in transistorized circuitry uses a voltage-variable capacitor (varactor). The varactor is simply a diode, or pn junction, that is designed to have a certain amount of capacitance between junctions. View (A) of figure 2 shows the varactor schematic symbol. A diagram of a varactor in a simple oscillator circuit is shown in view (B). This is not a working circuit, but merely a simplified illustration. The capacitance of a varactor, as with regular capacitors, is determined by the area of the capacitor plates and the distance between the plates. The depletion region in the varactor is the dielectric and is located between the p and n elements, which serve as the plates. Capacitance is varied in the varactor by varying the reverse bias, which controls the thickness of the depletion region. The varactor is so designed that the change in

capacitance is linear with the change in the applied voltage. This is a special design characteristic of the varactor diode. The varactor must not be forward biased because it cannot tolerate much current flow. Proper circuit design prevents the application of forward bias.

IMPORTANT QUESTION
PART A (All questions Two Marks):

1. What do you mean by narrowband and wideband FM?
2. Give the frequency spectrum of narrowband FM.
3. Why is the Armstrong method superior to the reactance modulator?
4. Define frequency deviation in FM.
5. State Carson's rule for FM bandwidth.
6. Differentiate between narrowband and wideband FM.
7. What are the advantages of FM?
8. Define PM.
9. What is meant by indirect FM generation?
10. Draw the phasor diagram of narrowband FM.
11. Write the expression for the spectrum of a single-tone FM signal.
12. What are the applications of the phase-locked loop?
13. Define the modulation index of FM and PM.
14. Differentiate between phase and frequency modulation.
15. A carrier of frequency 100 MHz is frequency modulated by a signal x(t) = 20 sin(200×10³ t). What is the bandwidth of the FM signal if the frequency sensitivity of the modulator is 25 kHz per volt?
16. What is the bandwidth required for an FM wave in which the modulating signal frequency is 2 kHz and the maximum frequency deviation is 12 kHz?
17. Determine and draw the instantaneous frequency of a wave having a total phase angle given by θ(t) = 2000t + sin(10t).
18. Draw the block diagram of the PLL.

PART B

1. Explain the indirect method of generation of an FM wave and any one method of demodulating an FM wave. (16)
2. Derive the expression for the frequency modulated signal. Explain what is meant by narrowband FM and wideband FM using the expression. (16)
3. Explain any two techniques of demodulation of FM. (16)
4. Explain the working of the reactance tube modulator and derive an expression to show how the variation of the amplitude of the input signal changes the frequency of the output signal of the modulator. (16)
5. Discuss the effects of nonlinearities in FM. (8)
6. Discuss in detail FM stereo multiplexing. (8)
7. Draw the frequency spectrum of FM and explain. Explain how a varactor diode can be used for frequency modulation. (16)
8. Discuss the indirect method of generating a wide-band FM signal. (8)
9. Draw the circuit diagram of the Foster-Seeley discriminator and explain its working. (16)
10. Explain the principle of the indirect method of generating a wide-band FM signal with a neat block diagram. (8)

UNIT III

NOISE THEORY

Review of probability. Random variables and random process. Gaussian process. Noise. Shot noise. Thermal noise. White noise. Narrow band noise. Noise temperature. Noise figure.

INTRODUCTION TO PROBABILITY:
Probability theory is the study of uncertainty. Through this class, we will be relying on concepts from probability theory for deriving machine learning algorithms. These notes attempt to cover the basics of probability theory at an appropriate level. The mathematical theory of probability is very sophisticated, and delves into a branch of analysis known as measure theory. In these notes, we provide a basic treatment of probability that does not address these finer details.

1 Elements of probability
In order to define a probability on a set we need a few basic elements:

Sample space Ω: the set of all the outcomes of a random experiment. Here, each outcome ω can be thought of as a complete description of the state of the real world at the end of the experiment.

Set of events (or event space) F: a set whose elements A ∈ F (called events) are subsets of Ω (i.e., A ⊆ Ω is a collection of possible outcomes of an experiment).

Probability measure: a function P : F → R that satisfies the following properties:
- P(A) ≥ 0, for all A ∈ F
- P(Ω) = 1
- If A1, A2, ... are disjoint events (i.e., Ai ∩ Aj = ∅ whenever i ≠ j), then P(∪i Ai) = Σi P(Ai)

These three properties are called the Axioms of Probability.

Example: Consider the event of tossing a six-sided die. The sample space is Ω = {1, 2, 3, 4, 5, 6}. We can define different event spaces on this sample space. For example, the simplest event space is the trivial event space F = {∅, Ω}. Another event space is the set of all subsets of Ω. For the first event space, the unique probability measure satisfying the requirements above is given by P(∅) = 0, P(Ω) = 1. For the second event space, one valid probability measure is to assign the probability of each set in the event space to be i/6, where i is the number of elements of that set.

Properties:
- If A ⊆ B, then P(A) ≤ P(B).
- P(A ∩ B) ≤ min(P(A), P(B)).
- (Union bound) P(A ∪ B) ≤ P(A) + P(B).
- P(Ω \ A) = 1 − P(A).
- (Law of total probability) If A1, ..., Ak are a set of disjoint events such that ∪_{i=1}^{k} Ai = Ω, then Σ_{i=1}^{k} P(Ai) = 1.
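A tiny simulation (purely illustrative, not part of the original notes) of the die example above: under the probability measure that assigns each outcome probability 1/6, an event A containing i outcomes has probability i/6.

```python
# Monte Carlo check of the die example: P(A) = |A| / 6.
import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=100_000)      # uniform die rolls, outcomes 1..6

A = {2, 4, 6}                                  # event "the roll is even", |A| = 3
p_hat = np.mean(np.isin(rolls, list(A)))
print(f"estimated P(A) = {p_hat:.3f}, exact value = {len(A)}/6 = {len(A) / 6:.3f}")
```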

2 Random variables
Consider an experiment in which we flip 10 coins, and we want to know the number of coins that come up heads. Here, the elements of the sample space Ω are 10-length sequences of heads and tails. For example, we might have ω0 = (H, H, T, H, T, H, H, T, T, T) ∈ Ω. However, in practice, we usually do not care about the probability of obtaining any particular sequence of heads and tails. Instead we usually care about real-valued functions of outcomes, such as the number of heads that appear among our 10 tosses, or the length of the longest run of tails. These functions, under some technical conditions, are known as random variables.

More formally, a random variable X is a function X : Ω → R. Typically, we will denote random variables using upper case letters X(ω) or more simply X (where the dependence on the random outcome ω is implied). We will denote the value that a random variable may take on using lower case letters x.

Example: In our experiment above, suppose that X(ω) is the number of heads which occur in the sequence of tosses ω. Given that only 10 coins are tossed, X(ω) can take only a finite number of values, so it is known as a discrete random variable. Here, the probability of the set associated with a random variable X taking on some specific value k is

P(X = k) := P({ω : X(ω) = k}).

Example: Suppose that X(ω) is a random variable indicating the amount of time it takes for a radioactive particle to decay. In this case, X(ω) takes on an infinite number of possible values, so it is called a continuous random variable. We denote the probability that X takes on a value between two real constants a and b (where a < b) as

P(a ≤ X ≤ b) := P({ω : a ≤ X(ω) ≤ b}).

2.1 Cumulative distribution functions
In order to specify the probability measures used when dealing with random variables, it is often convenient to specify alternative functions (CDFs, PDFs, and PMFs) from which the probability measure governing an experiment immediately follows. In this section and the next two sections, we describe each of these types of functions in turn. A cumulative distribution function (CDF) is a function F_X : R → [0, 1] which specifies a probability measure as

F_X(x) = P(X ≤ x).     (1)

By using this function one can calculate the probability of any event in F. Figure 1 shows a sample CDF function.

2.2 Probability mass functions
When a random variable X takes on a finite set of possible values (i.e., X is a discrete random variable), a simpler way to represent the probability measure associated with a random variable is to directly specify the probability of each value that the random variable can assume. In particular, a probability mass function (PMF) is a function p_X : R → [0, 1] such that

p_X(x) = P(X = x).

In the case of a discrete random variable, we use the notation Val(X) for the set of possible values that the random variable X may assume. For example, if X(ω) is a random variable indicating the number of heads out of ten tosses of a coin, then Val(X) = {0, 1, 2, ..., 10}.

Properties:
- 0 ≤ p_X(x) ≤ 1.
- Σ_{x ∈ Val(X)} p_X(x) = 1.
- Σ_{x ∈ A} p_X(x) = P(X ∈ A).

2.3 Probability density functions
For some continuous random variables, the cumulative distribution function F_X(x) is differentiable everywhere. In these cases, we define the probability density function (PDF) as the derivative of the CDF, i.e.,

f_X(x) = dF_X(x)/dx.     (2)

Note here that the PDF for a continuous random variable may not always exist (i.e., if F_X(x) is not differentiable everywhere). According to the properties of differentiation, for very small Δx,

P(x ≤ X ≤ x + Δx) ≈ f_X(x) Δx.     (3)

Both CDFs and PDFs (when they exist!) can be used for calculating the probabilities of different events. But it should be emphasized that the value of the PDF at any given point x is not the probability of that event, i.e., f_X(x) ≠ P(X = x). For example, f_X(x) can take on values larger than one (but the integral of f_X(x) over any subset of R will be at most one).

Properties:
- f_X(x) ≥ 0.
- ∫_{−∞}^{∞} f_X(x) dx = 1.
- ∫_{x ∈ A} f_X(x) dx = P(X ∈ A).

2.4 Expectation
Suppose that X is a discrete random variable with PMF p_X(x) and g : R → R is an arbitrary function. In this case, g(X) can be considered a random variable, and we define the expectation or expected value of g(X) as

E[g(X)] = Σ_{x ∈ Val(X)} g(x) p_X(x).

If X is a continuous random variable with PDF f_X(x), then the expected value of g(X) is defined as

E[g(X)] = ∫_{−∞}^{∞} g(x) f_X(x) dx.

Intuitively, the expectation of g(X) can be thought of as a weighted average of the values that g(x) can take on for different values of x, where the weights are given by p_X(x) or f_X(x). As a special case of the above, note that the expectation E[X] of a random variable itself is found by letting g(x) = x; this is also known as the mean of the random variable X.

Properties:
- E[a] = a for any constant a ∈ R.
- E[a f(X)] = a E[f(X)] for any constant a ∈ R.
- (Linearity of expectation) E[f(X) + g(X)] = E[f(X)] + E[g(X)].
- For a discrete random variable X, E[1{X = k}] = P(X = k).

2.5 Variance
The variance of a random variable X is a measure of how concentrated the distribution of a random variable X is around its mean. Formally, the variance of a random variable X is defined as

Var[X] = E[(X − E[X])²]

Using the properties in the previous section, we can derive an alternate expression for the variance:

E[(X − E[X])²] = E[X² − 2E[X]X + E[X]²]
             = E[X²] − 2E[X]E[X] + E[X]²
             = E[X²] − E[X]²,

where the second equality follows from linearity of expectations and the fact that E[X] is actually a constant with respect to the outer expectation.

Properties:
- Var[a] = 0 for any constant a ∈ R.
- Var[a f(X)] = a² Var[f(X)] for any constant a ∈ R.

2.6 Some common random variables

Discrete random variables

X ~ Bernoulli(p) (where 0 ≤ p ≤ 1): one if a coin with heads probability p comes up heads, zero otherwise.
p(x) = p if x = 1, and p(x) = 1 − p if x = 0.

X ~ Binomial(n, p) (where 0 ≤ p ≤ 1): the number of heads in n independent flips of a coin with heads probability p.
p(x) = C(n, x) p^x (1 − p)^(n−x), where C(n, x) = n! / (x!(n − x)!).

X ~ Geometric(p) (where p > 0): the number of flips of a coin with heads probability p until the first heads.
p(x) = p (1 − p)^(x−1).

X ~ Poisson(λ) (where λ > 0): a probability distribution over the nonnegative integers used for modeling the frequency of rare events.
p(x) = e^(−λ) λ^x / x!.

Continuous random variables

X ~ Uniform(a, b) (where a < b): equal probability density to every value between a and b on the real line.
f(x) = 1/(b − a) if a ≤ x ≤ b, and f(x) = 0 otherwise.

X ~ Exponential(λ) (where λ > 0): decaying probability density over the nonnegative reals.
f(x) = λ e^(−λx) if x ≥ 0, and f(x) = 0 otherwise.

X ~ Normal(μ, σ²): also known as the Gaussian distribution.
f(x) = (1 / (√(2π) σ)) exp(−(x − μ)² / (2σ²)).

Figure 2: PDF and CDF of a couple of random variables.
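The following short check (an illustration using scipy.stats; the parameters are arbitrary) evaluates the identity Var[X] = E[X²] − E[X]² derived above for one of the distributions listed above, X ~ Binomial(n, p).

```python
# Verify Var[X] = E[X^2] - E[X]^2 for a Binomial PMF.
import numpy as np
from scipy import stats

n, p = 10, 0.3
x = np.arange(0, n + 1)                       # Val(X)
pmf = stats.binom.pmf(x, n, p)                # pX(x)

mean = np.sum(x * pmf)                        # E[X]          (equals n*p)
var_def = np.sum((x - mean) ** 2 * pmf)       # E[(X - E[X])^2]
var_alt = np.sum(x ** 2 * pmf) - mean ** 2    # E[X^2] - E[X]^2
print(mean, var_def, var_alt)                 # 3.0, 2.1, 2.1  (n*p, n*p*(1-p))
```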

3 Two random variables


Thus far, we have considered single random variables. In many situations, however, there may be more than one quantity that we are interested in knowing during a random experiment. For instance, in an experiment where we flip a coin ten times, we may care about both X(ω) = the number of heads that come up as well as Y(ω) = the length of the longest run of consecutive heads. In this section, we consider the setting of two random variables.

3.1 Joint and marginal distributions
Suppose that we have two random variables X and Y. One way to work with these two random variables is to consider each of them separately. If we do that we will only need F_X(x) and F_Y(y). But if we want to know about the values that X and Y assume simultaneously during outcomes of a random experiment, we require a more complicated structure known as the joint cumulative distribution function of X and Y, defined by

F_XY(x, y) = P(X ≤ x, Y ≤ y)

It can be shown that by knowing the joint cumulative distribution function, the probability of any event involving X and Y can be calculated.

The joint CDF F_XY(x, y) and the distribution functions F_X(x) and F_Y(y) of each variable separately are related by

F_X(x) = lim_{y→∞} F_XY(x, y),
F_Y(y) = lim_{x→∞} F_XY(x, y).

Here, we call F_X(x) and F_Y(y) the marginal cumulative distribution functions of F_XY(x, y).

Properties:
- 0 ≤ F_XY(x, y) ≤ 1.
- lim_{x,y→∞} F_XY(x, y) = 1.
- lim_{x,y→−∞} F_XY(x, y) = 0.
- F_X(x) = lim_{y→∞} F_XY(x, y).

3.2 Joint and marginal probability mass functions
If X and Y are discrete random variables, then the joint probability mass function p_XY : R × R → [0, 1] is defined by

p_XY(x, y) = P(X = x, Y = y).

Here, 0 ≤ p_XY(x, y) ≤ 1 for all x, y, and Σ_{x ∈ Val(X)} Σ_{y ∈ Val(Y)} p_XY(x, y) = 1.

How does the joint PMF over two variables relate to the probability mass function for each variable separately? It turns out that

p_X(x) = Σ_y p_XY(x, y),

and similarly for p_Y(y). In this case, we refer to p_X(x) as the marginal probability mass function of X. In statistics, the process of forming the marginal distribution with respect to one variable by summing out the other variable is often known as marginalization.

3.3 Joint and marginal probability density functions
Let X and Y be two continuous random variables with joint distribution function F_XY. In the case that F_XY(x, y) is everywhere differentiable in both x and y, then we can define the joint probability density function

f_XY(x, y) = ∂²F_XY(x, y) / ∂x∂y.

Like in the single-dimensional case, f_XY(x, y) ≠ P(X = x, Y = y), but rather

∫∫_{(x,y) ∈ A} f_XY(x, y) dx dy = P((X, Y) ∈ A).

Note that the values of the probability density function f_XY(x, y) are always nonnegative, but they may be greater than 1. Nonetheless, it must be the case that ∫_{−∞}^{∞} ∫_{−∞}^{∞} f_XY(x, y) dx dy = 1. Analogous to the discrete case, we define

f_X(x) = ∫_{−∞}^{∞} f_XY(x, y) dy

as the marginal probability density function (or marginal density) of X, and similarly for f_Y(y).

3.4 Conditional distributions
Conditional distributions seek to answer the question: what is the probability distribution over Y, when we know that X must take on a certain value x? In the discrete case, the conditional probability mass function of Y given X is simply

p_{Y|X}(y|x) = p_XY(x, y) / p_X(x),

assuming that p_X(x) ≠ 0. In the continuous case, the situation is technically a little more complicated, because the probability that a continuous random variable X takes on a specific value x is equal to zero (see the note below). Ignoring this technical point, we simply define, by analogy to the discrete case, the conditional probability density of Y given X = x to be

f_{Y|X}(y|x) = f_XY(x, y) / f_X(x),

provided f_X(x) ≠ 0.

3.5 Bayes's rule
A useful formula that often arises when trying to derive an expression for the conditional probability of one variable given another is Bayes's rule. In the case of discrete random variables X and Y,

p_{Y|X}(y|x) = p_XY(x, y) / p_X(x) = p_{X|Y}(x|y) p_Y(y) / Σ_{y' ∈ Val(Y)} p_{X|Y}(x|y') p_Y(y').

If the random variables X and Y are continuous,

f_{Y|X}(y|x) = f_XY(x, y) / f_X(x) = f_{X|Y}(x|y) f_Y(y) / ∫_{−∞}^{∞} f_{X|Y}(x|y') f_Y(y') dy'.

3.6 Independence
Two random variables X and Y are independent if F_XY(x, y) = F_X(x) F_Y(y) for all values of x and y. Equivalently:
- For discrete random variables, p_XY(x, y) = p_X(x) p_Y(y) for all x ∈ Val(X), y ∈ Val(Y).
- For discrete random variables, p_{Y|X}(y|x) = p_Y(y) whenever p_X(x) ≠ 0, for all y ∈ Val(Y).
- For continuous random variables, f_XY(x, y) = f_X(x) f_Y(y) for all x, y ∈ R.
- For continuous random variables, f_{Y|X}(y|x) = f_Y(y) whenever f_X(x) ≠ 0, for all y ∈ R.
Note: To get around the technical point mentioned in Section 3.4, a more reasonable way to calculate the conditional CDF is

F_{Y|X}(y, x) = lim_{Δx→0} P(Y ≤ y | x ≤ X ≤ x + Δx).

It can easily be seen that if F(x, y) is differentiable in both x and y, then

F_{Y|X}(y, x) = ∫_{−∞}^{y} f_{X,Y}(x, α) / f_X(x) dα,

and therefore we define the conditional PDF of Y given X = x in the following way:

f_{Y|X}(y|x) = f_XY(x, y) / f_X(x)

Informally, two random variables X and Y are independent if knowing the value of one variable will never have any effect on the conditional probability distribution of the other variable; that is, you know all the information about the pair (X, Y) by just knowing f(x) and f(y). The following lemma formalizes this observation:

Lemma 3.1. If X and Y are independent, then for any subsets A, B ⊆ R, we have

P(X ∈ A, Y ∈ B) = P(X ∈ A) P(Y ∈ B).

By using the above lemma one can prove that if X is independent of Y then any function of X is independent of any function of Y.

3.7 Expectation and covariance
Suppose that we have two discrete random variables X, Y and g : R² → R is a function of these two random variables. Then the expected value of g is defined in the following way:

E[g(X, Y)] = Σ_{x ∈ Val(X)} Σ_{y ∈ Val(Y)} g(x, y) p_XY(x, y).

For continuous random variables X, Y, the analogous expression is

E[g(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f_XY(x, y) dx dy.

We can use the concept of expectation to study the relationship of two random variables with each other. In particular, the covariance of two random variables X and Y is defined as

Cov[X, Y] = E[(X − E[X])(Y − E[Y])].

Using an argument similar to that for variance, we can rewrite this as

Cov[X, Y] = E[(X − E[X])(Y − E[Y])]
          = E[XY − X E[Y] − Y E[X] + E[X]E[Y]]
          = E[XY] − E[X]E[Y] − E[Y]E[X] + E[X]E[Y]
          = E[XY] − E[X]E[Y].

Here, the key step in showing the equality of the two forms of covariance is in the third equality, where we use the fact that E[X] and E[Y] are actually constants which can be pulled out of the expectation. When Cov[X, Y] = 0, we say that X and Y are uncorrelated.

Properties:
- (Linearity of expectation) E[f(X, Y) + g(X, Y)] = E[f(X, Y)] + E[g(X, Y)].
- Var[X + Y] = Var[X] + Var[Y] + 2 Cov[X, Y].
- If X and Y are independent, then Cov[X, Y] = 0.
- If X and Y are independent, then E[f(X)g(Y)] = E[f(X)]E[g(Y)].
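A quick numerical sanity check (illustrative only; the linear relationship between the two sample sets is an arbitrary construction) of two of the covariance properties listed above.

```python
# Check Cov[X,Y] = E[XY] - E[X]E[Y] and Var[X+Y] = Var[X] + Var[Y] + 2 Cov[X,Y].
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200_000)
y = 0.5 * x + rng.normal(size=200_000)       # correlated with x by construction

cov = np.mean(x * y) - np.mean(x) * np.mean(y)
lhs = np.var(x + y)
rhs = np.var(x) + np.var(y) + 2 * cov
print(f"Cov[X,Y] ~ {cov:.3f}, Var[X+Y] ~ {lhs:.3f}, Var[X]+Var[Y]+2Cov ~ {rhs:.3f}")
```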

4 Multiple random variables


The notions and ideas introduced in the previous section can be generalized to more than two random variables. In particular, suppose that we have n continuous random variables X1(ω), X2(ω), ..., Xn(ω). In this section, for simplicity of presentation, we focus only on the continuous case, but the generalization to discrete random variables works similarly.

4.1 Basic properties
We can define the joint distribution function of X1, X2, ..., Xn, the joint probability density function of X1, X2, ..., Xn, the marginal probability density function of X1, and the conditional probability density function of X1 given X2, ..., Xn, as

F_{X1,X2,...,Xn}(x1, x2, ..., xn) = P(X1 ≤ x1, X2 ≤ x2, ..., Xn ≤ xn)

f_{X1,X2,...,Xn}(x1, x2, ..., xn) = ∂ⁿ F_{X1,X2,...,Xn}(x1, x2, ..., xn) / (∂x1 ... ∂xn)

f_{X1}(x1) = ∫_{−∞}^{∞} ... ∫_{−∞}^{∞} f_{X1,X2,...,Xn}(x1, x2, ..., xn) dx2 ... dxn

f_{X1|X2,...,Xn}(x1 | x2, ..., xn) = f_{X1,X2,...,Xn}(x1, x2, ..., xn) / f_{X2,...,Xn}(x2, ..., xn)

To calculate the probability of an event A ⊆ Rⁿ we have

P((x1, x2, ..., xn) ∈ A) = ∫_{(x1,x2,...,xn) ∈ A} f_{X1,X2,...,Xn}(x1, x2, ..., xn) dx1 dx2 ... dxn.     (4)

Chain rule: From the definition of conditional probabilities for multiple random variables, one can show that

f(x1, x2, ..., xn) = f(xn | x1, x2, ..., xn−1) f(x1, x2, ..., xn−1)
                   = f(xn | x1, x2, ..., xn−1) f(xn−1 | x1, x2, ..., xn−2) f(x1, x2, ..., xn−2)
                   = ... = f(x1) Π_{i=2}^{n} f(xi | x1, ..., xi−1).

Independence: For multiple events A1, ..., Ak, we say that A1, ..., Ak are mutually independent if for any subset S ⊆ {1, 2, ..., k} we have

P(∩_{i∈S} Ai) = Π_{i∈S} P(Ai).

Likewise, we say that random variables X1, ..., Xn are independent if

f(x1, ..., xn) = f(x1) f(x2) ... f(xn).

Here, the definition of mutual independence is simply the natural generalization of independence of two random variables to multiple random variables. Independent random variables arise often in machine learning algorithms where we assume that the training examples belonging to the training set represent independent samples from some unknown probability distribution. To make the significance of independence clear, consider a bad training set in which we first sample a single training example (x(1), y(1)) from some unknown distribution, and then add m − 1 copies of the exact same training example to the training set. In this case, we have (with some abuse of notation)

P((x(1), y(1)), ..., (x(m), y(m))) ≠ Π_{i=1}^{m} P(x(i), y(i)).
Despite the fact that the training set has size m, the examples are not independent! While clearly the procedure described here is not a sensible method for building a training set for a machine learning algorithm, it turns out that in practice, non-independence of samples does come up often, and it has the effect of reducing the effective size of the training set.

4.2 Random vectors

Suppose that we have n random variables. When working with all these random variables together, we will often find it convenient to put them in a vector X = [X1 X2 ... Xn]^T. We call the resulting vector a random vector (more formally, a random vector is a mapping from Ω to Rⁿ). It should be clear that random vectors are simply an alternative notation for dealing with n random variables, so the notions of joint PDF and CDF will apply to random vectors as well.

Expectation: Consider an arbitrary function g : Rⁿ → R. The expected value of this function is defined as

E[g(X)] = ∫_{Rⁿ} g(x1, x2, ..., xn) f_{X1,X2,...,Xn}(x1, x2, ..., xn) dx1 dx2 ... dxn,    (5)

where the integral over Rⁿ denotes n consecutive integrations from −∞ to ∞. If g is a function from Rⁿ to Rᵐ, then the expected value of g is the element-wise expected value of the output vector, i.e., if g(x) = [g1(x), g2(x), ..., gm(x)]^T, then E[g(X)] = [E[g1(X)], E[g2(X)], ..., E[gm(X)]]^T.

Covariance matrix: For a given random vector X : Ω → Rⁿ, its covariance matrix Σ is the n × n square matrix whose entries are given by Σij = Cov[Xi, Xj]. From the definition of covariance, each entry satisfies

Σij = Cov[Xi, Xj] = E[Xi Xj] − E[Xi] E[Xj],

so that, collecting all entries,

Σ = E[X X^T] − E[X] E[X]^T = E[(X − E[X])(X − E[X])^T],

where the matrix expectation is defined in the obvious way. The covariance matrix has a number of useful properties:
- Σ ⪰ 0; that is, Σ is positive semi-definite.
- Σ = Σ^T; that is, Σ is symmetric.

4.3 The multivariate Gaussian distribution

One particularly important example of a probability distribution over random vectors X is the multivariate Gaussian or multivariate normal distribution. A random vector X ∈ Rⁿ is said to have a multivariate normal (or Gaussian) distribution with mean μ ∈ Rⁿ and covariance matrix Σ ∈ S^n_++ (where S^n_++ refers to the space of symmetric positive definite n × n matrices) if

f_{X1,X2,...,Xn}(x1, x2, ..., xn; μ, Σ) = ( 1 / ((2π)^{n/2} |Σ|^{1/2}) ) exp( −(1/2) (x − μ)^T Σ^{−1} (x − μ) ).

We write this as X ~ N(μ, Σ). Notice that in the case n = 1, this reduces to the regular definition of a normal distribution with mean parameter μ1 and variance Σ11. Generally speaking, Gaussian random variables are extremely useful in machine learning and statistics for two main reasons. First, they are extremely common when modeling noise in statistical algorithms. Quite often, noise can be considered to be the accumulation of a large number of small independent random perturbations affecting the measurement process; by the Central Limit Theorem, summations of independent random variables will tend to look Gaussian. Second, Gaussian random variables are convenient for many analytical manipulations, because many of the integrals involving Gaussian distributions that arise in practice have simple closed-form solutions.
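As a quick illustration of the density above, the following sketch (plain NumPy; the mean, covariance and evaluation point are arbitrary values chosen only for the example) evaluates the multivariate normal PDF directly from the formula.

    import numpy as np

    def mvn_pdf(x, mu, sigma):
        """Evaluate the multivariate normal density at x using the formula above."""
        n = len(mu)
        diff = x - mu
        norm_const = 1.0 / (np.power(2 * np.pi, n / 2) * np.sqrt(np.linalg.det(sigma)))
        quad = diff @ np.linalg.inv(sigma) @ diff
        return norm_const * np.exp(-0.5 * quad)

    # Illustrative 2-D example (values are arbitrary)
    mu = np.array([0.0, 1.0])
    sigma = np.array([[2.0, 0.5],
                      [0.5, 1.0]])   # symmetric positive definite
    x = np.array([0.3, 0.8])
    print(mvn_pdf(x, mu, sigma))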

GAUSSIAN PROCESS:
In probability theory and statistics, a Gaussian process is a stochastic process whose realizations consist of random values associated with every point in a range of times (or of space) such that each such random variable has a normal distribution. Moreover, every finite collection of those random variables has a multivariate normal distribution. Gaussian processes are important in statistical modeling because of properties inherited from the normal distribution. For example, if a random process is modeled as a Gaussian process, the distributions of various derived quantities can be obtained explicitly. Such quantities include: the average value of the process over a range of times; the error in estimating the average using sample values at a small set of times. A process {Xt ; t ∈ T} is Gaussian if and only if, for every finite set of indices t1, ..., tk in the index set T, the vector

X_{t1,...,tk} = (X_{t1}, ..., X_{tk})

is a vector-valued Gaussian random variable. Using characteristic functions of random variables, the Gaussian property can be formulated as follows: {Xt ; t ∈ T} is Gaussian if and only if, for every finite set of indices t1, ..., tk, there are reals σ_{lj} with σ_{ii} > 0 and reals μ_j such that

E[ exp( i Σ_l θ_l X_{t_l} ) ] = exp( −(1/2) Σ_{l,j} σ_{lj} θ_l θ_j + i Σ_l μ_l θ_l ).

The numbers σ_{lj} and μ_j can be shown to be the covariances and means of the variables in the process.

NOISE:
In common use, the word noise means any unwanted sound. In both analog and digital electronics, noise is an unwanted perturbation to a wanted signal; it is called noise as a generalization of the audible noise heard when listening to a weak radio transmission. Signal noise is heard as acoustic noise if played through a loudspeaker; it manifests as 'snow' on a television or video image. Noise can block, distort, change or interfere with the meaning of a message in human, animal and electronic communication. In signal processing or computing it can be considered unwanted data without meaning; that is, data that is not being used to transmit a signal, but is simply produced as an unwanted by-product of other activities. "Signal-to-noise ratio" is sometimes used informally to refer to the ratio of useful information to false or irrelevant data in a conversation or exchange, such as off-topic posts and spam in online discussion forums and other online communities. In information theory, however, noise is still considered to be information. In a broader sense, film grain or even advertisements encountered while looking for something else can be considered noise. In biology, noise can describe the variability of a measurement around the mean, for example transcriptional noise describes the variability in gene activity between cells in a population. In many of these areas, the special case of thermal noise arises, which sets a fundamental lower limit to what can be measured or signaled and is related to basic physical processes at the molecular level described by well-established thermodynamics considerations, some of which are expressible by simple formulae.

SHOT NOISE:
Shot noise consists of random fluctuations of the electric current in an electrical conductor, which are caused by the fact that the current is carried by discrete charges (electrons). The strength of this noise increases for growing magnitude of the average current flowing through the conductor. Shot noise is to be distinguished from current fluctuations in equilibrium, which happen without any applied voltage and without any average current flowing. These equilibrium current fluctuations are known as Johnson-Nyquist noise. Shot noise is important in electronics, telecommunication, and fundamental physics. The strength of the current fluctuations can be expressed by giving the variance of the current, ⟨(I − ⟨I⟩)²⟩, where ⟨I⟩ is the average ("macroscopic") current. However, the value measured in this way depends on the frequency range of fluctuations which is measured (the "bandwidth" of the measurement): the measured variance of the current grows linearly with bandwidth. Therefore, a more fundamental quantity is the noise power, which is essentially obtained by dividing the variance by the bandwidth (and therefore has the dimension ampere squared divided by hertz). It may be defined as the zero-frequency Fourier transform of the current-current correlation function.
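The paragraph above defines the noise power via the current-current correlation function but stops short of a numeric formula. For a quick order-of-magnitude estimate one commonly uses the standard Schottky result, sigma_I = sqrt(2 q ⟨I⟩ Δf), which is not derived in the text; the sketch below (Python, with illustrative numbers) simply evaluates it.

    import math

    q = 1.602e-19            # electron charge, C

    def shot_noise_rms(I_avg, bandwidth):
        """RMS shot-noise current from the standard Schottky formula: sqrt(2 q <I> df)."""
        return math.sqrt(2 * q * I_avg * bandwidth)

    # Illustrative numbers: 1 mA average current observed in a 10 kHz bandwidth
    print(shot_noise_rms(1e-3, 10e3))     # ~1.8e-09 A, i.e. about 1.8 nA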

THERMAL NOISE:
Thermal noise (Johnson-Nyquist noise, Johnson noise, or Nyquist noise) is the electronic noise generated by the thermal agitation of the charge carriers (usually the electrons) inside an electrical conductor at equilibrium, which happens regardless of any applied voltage. Thermal noise is approximately white, meaning that the power spectral density is nearly equal throughout the frequency spectrum (however, see the section below on extremely high frequencies). Additionally, the amplitude of the signal has very nearly a Gaussian probability density function. This type of noise was first measured by John B. Johnson at Bell Labs in 1928. He described his findings to Harry Nyquist, also at Bell Labs, who was able to explain the results.

Noise voltage and power: Thermal noise is distinct from shot noise, which consists of additional current fluctuations that occur when a voltage is applied and a macroscopic current starts to flow. For the general case, the above definition applies to charge carriers in any type of conducting medium (e.g. ions in an electrolyte), not just resistors. It can be modeled by a voltage source representing the noise of the non-ideal resistor in series with an ideal noise-free resistor. The power spectral density, or voltage variance (mean square) per hertz of bandwidth, is given by

v_n² / Δf = 4 k_B T R

where k_B is Boltzmann's constant in joules per kelvin, T is the resistor's absolute temperature in kelvins, and R is the resistor value in ohms (Ω). Use this equation for quick calculation at room temperature:

√(v_n² / Δf) = √(4 k_B T R) ≈ 0.13 √R  nV/√Hz.

For example, a 1 kΩ resistor at a temperature of 300 K has

√(v_n² / Δf) = √(4 × 1.38×10⁻²³ × 300 × 1000) ≈ 4.07 nV/√Hz.

For a given bandwidth, the root mean square (RMS) of the voltage, v_n, is given by

v_n = √(4 k_B T R Δf)

where Δf is the bandwidth in hertz over which the noise is measured. For a 1 kΩ resistor at room temperature and a 10 kHz bandwidth, the RMS noise voltage is 400 nV. A useful rule of thumb to remember is that 50 Ω at 1 Hz bandwidth corresponds to about 1 nV of noise at room temperature. A resistor in a short circuit dissipates a noise power of

P = v_n² / R = 4 k_B T Δf.

The noise generated at the resistor can transfer to the remaining circuit; the maximum noise power transfer happens with impedance matching, when the Thevenin equivalent resistance of the remaining circuit is equal to the noise-generating resistance. In this case each one of the two participating resistors dissipates noise in both itself and in the other resistor. Since only half of the source voltage drops across any one of these resistors, the resulting noise power is given by

P = k_B T Δf

where P is the thermal noise power in watts. Notice that this is independent of the noise-generating resistance.
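A minimal numerical check of the expressions above (plain Python; the resistor value, temperature and bandwidth are simply the worked example's numbers):

    import math

    k_B = 1.380649e-23   # Boltzmann's constant, J/K

    def thermal_noise_vrms(R, T, bandwidth):
        """RMS thermal noise voltage across a resistor: v_n = sqrt(4 k_B T R df)."""
        return math.sqrt(4 * k_B * T * R * bandwidth)

    def available_noise_power(T, bandwidth):
        """Maximum (matched-load) thermal noise power: P = k_B T df."""
        return k_B * T * bandwidth

    print(thermal_noise_vrms(R=1e3, T=300, bandwidth=10e3))   # ~4.07e-07 V, i.e. about 400 nV
    print(available_noise_power(T=300, bandwidth=10e3))       # ~4.1e-17 W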

Noise current:
The noise source can also be modeled by a current source in parallel with the resistor by taking the Norton equivalent, which corresponds simply to dividing by R. This gives the root mean square value of the current source as

i_n = v_n / R = √(4 k_B T Δf / R).
Thermal noise is intrinsic to all resistors and is not a sign of poor design or manufacture, although resistors may also have excess noise.

Noise power in decibels:


Signal power is often measured in dBm (decibels relative to 1 milliwatt, assuming a 50 ohm load). From the equation above, noise power in a resistor at room temperature, in dBm, is then:

P(dBm) = 10 log10( k_B T Δf × 1000 )

where the factor of 1000 is present because the power is given in milliwatts, rather than watts. This equation can be simplified by separating the constant parts from the bandwidth:

P(dBm) = −173.8 + 10 log10( Δf )

which is more commonly seen approximated as:

P(dBm) ≈ −174 + 10 log10( Δf )

Noise power at different bandwidths is then simple to calculate:

Bandwidth (Δf)   Thermal noise power   Notes
1 Hz             −174 dBm
10 Hz            −164 dBm
100 Hz           −154 dBm
1 kHz            −144 dBm
10 kHz           −134 dBm              FM channel of 2-way radio
100 kHz          −124 dBm
180 kHz          −121.45 dBm           One LTE resource block
200 kHz          −120.98 dBm           One GSM channel (ARFCN)
1 MHz            −114 dBm
2 MHz            −111 dBm              Commercial GPS channel
6 MHz            −106 dBm              Analog television channel
20 MHz           −101 dBm              WLAN 802.11 channel
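A short helper that reproduces the table entries from the approximation above (Python; the bandwidth list simply mirrors the table):

    import math

    def thermal_noise_dbm(bandwidth_hz):
        """Thermal noise power at room temperature, using P ~ -174 dBm + 10*log10(df)."""
        return -174 + 10 * math.log10(bandwidth_hz)

    for bw in [1, 10, 100, 1e3, 10e3, 100e3, 180e3, 200e3, 1e6, 2e6, 6e6, 20e6]:
        print(f"{bw:>10.0f} Hz  ->  {thermal_noise_dbm(bw):7.2f} dBm")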

Thermal noise on capacitors:


Thermal noise on capacitors is referred to as kTC noise. Thermal noise in an RC circuit has an unusually simple expression, as the value of the resistance (R) drops out of the equation. This is because higher R contributes to more filtering as well as to more noise. The noise bandwidth of the RC circuit is 1/(4RC), which can be substituted into the above formula to eliminate R. The mean-square and RMS noise voltage generated in such a filter are:

v_n² = k_B T / C,     v_n = √(k_B T / C).

Thermal noise accounts for 100% of kTC noise, whether it is attributed to the resistance or to the capacitance. In the extreme case of the reset noise left on a capacitor by opening an ideal switch, the resistance is infinite, yet the formula still applies; however, now the RMS must be interpreted not as a time average, but as an average over many such reset events, since the voltage is constant when the bandwidth is zero. In this sense, the Johnson noise of an RC circuit can be seen to be inherent, an effect of the thermodynamic distribution of the number of electrons on the capacitor, even without the involvement of a resistor. The noise is not caused by the capacitor itself, but by the thermodynamic equilibrium of the amount of charge on the capacitor. Once the capacitor is disconnected from a conducting circuit, the thermodynamic fluctuation is frozen at a random value with standard deviation as given above. The reset noise of capacitive sensors is often a limiting noise source, for example in image sensors. As an alternative to the voltage noise, the reset noise on the capacitor can also be quantified as the electrical charge standard deviation,

Q_n = √(k_B T C).

Since the charge variance is k_B T C, this noise is often called kTC noise. Any system in thermal equilibrium has state variables with a mean energy of kT/2 per degree of freedom. Using the formula for energy on a capacitor (E = ½CV²), the mean noise energy on a capacitor can be seen to be ½C(kT/C), i.e. kT/2. Thermal noise on a capacitor can be derived from this relationship, without consideration of resistance. The kTC noise is the dominant noise source at small capacitances.

Noise of capacitors at 300 K:

Capacitance   Noise voltage √(kT/C)   Electrons √(kTC)/q
1 fF          2 mV                    12.5 e⁻
10 fF         640 µV                  40 e⁻
100 fF        200 µV                  125 e⁻
1 pF          64 µV                   400 e⁻
10 pF         20 µV                   1250 e⁻
100 pF        6.4 µV                  4000 e⁻
1 nF          2 µV                    12500 e⁻
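The table values follow directly from v_n = sqrt(kT/C) and Q_n = sqrt(kTC); a quick sketch (Python, 300 K as in the table):

    import math

    k_B = 1.380649e-23   # J/K
    q_e = 1.602e-19      # electron charge, C
    T = 300.0            # K, as in the table

    for C in [1e-15, 10e-15, 100e-15, 1e-12, 10e-12, 100e-12, 1e-9]:
        v_rms = math.sqrt(k_B * T / C)              # kTC noise voltage
        electrons = math.sqrt(k_B * T * C) / q_e    # charge standard deviation in electrons
        print(f"C = {C:.0e} F:  {v_rms*1e6:8.1f} uV   {electrons:8.1f} e-")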

Noise at very high frequencies:


The above equations are good approximations at any practical radio frequency in use (i.e. frequencies below about 80 gigahertz). In the most general case, which includes up to optical frequencies, the power spectral density of the voltage across the resistor R, in V²/Hz, is given by

Φ(f) = 4 R h f / ( e^{h f / (k_B T)} − 1 )

where f is the frequency, h is Planck's constant, k_B is Boltzmann's constant and T is the temperature in kelvins. If the frequency is low enough, that means

h f ≪ k_B T

(this assumption is valid until a few terahertz at room temperature), then the exponential can be expanded in its Taylor series. The relationship then becomes

Φ(f) ≈ 4 R k_B T.

In general, both R and T depend on frequency. In order to know the total noise it is enough to integrate over all the bandwidth. Since the signal is real, it is possible to integrate over only the positive frequencies, then multiply by 2. Assuming that R and T are constant over all the bandwidth Δf, then the root mean square (RMS) value of the voltage across a resistor due to thermal noise is given by

v_n = √( 4 k_B T R Δf ),

that is, the same formula as above.

WHITE NOISE:
White noise is a random signal (or process) with a flat power spectral density. In other words, the signal contains equal power within a fixed bandwidth at any center frequency. White noise draws its name from white light, in which the power spectral density of the light is distributed over the visible band in such a way that the eye's three color receptors (cones) are approximately equally stimulated. In a statistical sense, a time series rt is called white noise if {rt} is a sequence of independent and identically distributed (iid) random variables with finite mean and variance. In particular, if rt is normally distributed with mean zero and variance σ², the series is called Gaussian white noise. An infinite-bandwidth white noise signal is a purely theoretical construction. The bandwidth of white noise is limited in practice by the mechanism of noise generation, by the transmission medium and by finite observation capabilities. A random signal is considered "white noise" if it is observed to have a flat spectrum over a medium's widest possible bandwidth.

WHITE NOISE IN A SPATIAL CONTEXT:


While it is usually applied in the context of frequency-domain signals, the term white noise is also commonly applied to a noise signal in the spatial domain. In this case, it has an autocorrelation which can be represented by a delta function over the relevant space dimensions. The signal is then "white" in the spatial frequency domain (this is equally true for signals in the angular frequency domain, e.g., the distribution of a signal across all angles in the night sky).

STATISTICAL PROPERTIES:
A finite-length, discrete-time realization of a white noise process is easily generated on a computer. Being uncorrelated in time does not restrict the values a signal can take. Any distribution of values is possible (although it must have zero DC component). Even a binary signal which can only take on the values 1 or -1 will be white if the sequence is statistically uncorrelated. Noise having a continuous distribution, such as a normal distribution, can of course be white. It is often incorrectly assumed that Gaussian noise (i.e., noise with a Gaussian amplitude distribution; see normal distribution) is necessarily white noise, yet neither property implies the other. Gaussianity refers to the probability distribution with respect to the value, i.e. the probability that the signal has a certain given value, while the term 'white' refers to the way the signal power is distributed over time or among frequencies. We can therefore find Gaussian white noise, but also Poisson, Cauchy, etc. white noises. Thus, the two words "Gaussian" and "white" are often both specified in mathematical models of systems. Gaussian white noise is a good approximation of many real-world situations and generates mathematically tractable models. These models are used so frequently that the term additive white Gaussian noise has a standard abbreviation: AWGN. Gaussian white noise has the useful statistical property that its values are independent (see Statistical independence). White noise is the generalized mean-square derivative of the Wiener process or Brownian motion.
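A small sketch of the point above (NumPy): Gaussian white noise and binary ±1 white noise both have a negligible correlation between neighbouring samples, because whiteness is a statement about correlation, not about the amplitude distribution.

    import numpy as np

    rng = np.random.default_rng(0)
    N = 100_000

    gaussian_wn = rng.standard_normal(N)            # Gaussian white noise
    binary_wn = rng.choice([-1.0, 1.0], size=N)     # binary (+1/-1) white noise

    for name, w in [("gaussian", gaussian_wn), ("binary", binary_wn)]:
        # sample autocorrelation at lags 0 and 1: large at 0, near zero elsewhere
        r0 = np.mean(w * w)
        r1 = np.mean(w[:-1] * w[1:])
        print(name, round(r0, 3), round(r1, 3))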

APPLICATIONS:
It is used by some emergency vehicle sirens due to its ability to cut through background noise, which makes it easier to locate. White noise is commonly used in the production of electronic music, usually either directly or as an input for a filter to create other types of noise signal. It is used extensively in audio synthesis, typically to recreate percussive instruments such as cymbals which have high noise content in their frequency domain. It is also used to generate impulse responses. To set up the equalization (EQ) for a concert or other performance in a venue, a short burst of white or pink noise is sent through the PA system and monitored from various points in the venue so that the engineer can tell if the acoustics of the building naturally boost or cut any frequencies. The engineer can then adjust the overall equalization to ensure a balanced mix. White noise can be used for frequency response testing of amplifiers and electronic filters. It is not used for testing loudspeakers as its spectrum contains too great an amount of high frequency content. Pink noise is used for testing transducers such as loudspeakers and microphones. White noise is a common synthetic noise source used for sound masking by a tinnitus masker.[1] White noise is a particularly good source signal for masking devices as it contains higher frequencies in equal volumes to lower ones, and so is capable of more effective masking for high pitched ringing tones most commonly perceived by tinnitus sufferers.

White noise is used as the basis of some random number generators. For example, Random.org uses a system of atmospheric antennae to generate random digit patterns from white noise. White noise machines and other white noise sources are sold as privacy enhancers and sleep aids and to mask tinnitus. Some people claim white noise, when used with headphones, can aid concentration by masking irritating or distracting noises in a person's environment.

MATHEMATICAL DEFINITION:
White random vector: A random vector w is a white random vector if and only if its mean vector and autocorrelation matrix are the following:

μ_w = E[w] = 0,     R_ww = E[w wᵀ] = σ² I.

That is, it is a zero-mean random vector, and its autocorrelation matrix is a multiple of the identity matrix. When the autocorrelation matrix is a multiple of the identity, we say that it has spherical correlation.

White random process (white noise): A continuous-time random process w(t), t ∈ R, is a white noise process if and only if its mean function and autocorrelation function satisfy the following:

μ_w(t) = E[w(t)] = 0,     R_ww(t1, t2) = E[w(t1) w(t2)] = (N0/2) δ(t1 − t2),

i.e. it is a zero-mean process for all time and has infinite power at zero time shift since its autocorrelation function is the Dirac delta function. The above autocorrelation function implies the following power spectral density:

S_ww(f) = N0/2 for all f,

since the Fourier transform of the delta function is equal to 1. Since this power spectral density is the same at all frequencies, we call it white as an analogy to the frequency spectrum of white light. A generalization to random elements on infinite dimensional spaces, such as random fields, is the white noise measure. Random vector transformations Two theoretical applications using a white random vector are the simulation and whitening of another arbitrary random vector. To simulate an arbitrary random vector, we transform a white random vector with a carefully chosen matrix. We choose the transformation matrix so that the mean and covariance matrix of the

transformed white random vector matches the mean and covariance matrix of the arbitrary random vector that we are simulating. To whiten an arbitrary random vector, we transform it by a different carefully chosen matrix so that the output random vector is a white random vector. These two ideas are crucial in applications such as channel estimation and channel equalization in communications and audio. These concepts are also used in data compression. Simulating a random vector Suppose that a random vector has covariance matrix Kxx. Since this matrix is Hermitian symmetric

and positive semi-definite, by the spectral theorem from linear algebra we can diagonalize or factor the matrix in the following way:

K_xx = E Λ Eᵀ

where E is the orthogonal matrix of eigenvectors and Λ is the diagonal matrix of eigenvalues. We can simulate the 1st and 2nd moment properties of a random vector x with mean μ and covariance matrix K_xx via the following transformation of a white vector w of unit variance:

x = E Λ^{1/2} w + μ

where Λ^{1/2} is the diagonal matrix whose entries are the square roots of the eigenvalues. Thus, the output of this transformation has expectation

E[x] = E Λ^{1/2} E[w] + μ = μ

and covariance matrix

E[(x − μ)(x − μ)ᵀ] = E Λ^{1/2} E[w wᵀ] Λ^{1/2} Eᵀ = E Λ Eᵀ = K_xx.

Whitening a random vector: The method for whitening a vector x with mean μ and covariance matrix K_xx is to perform the following calculation:

w = Λ^{−1/2} Eᵀ (x − μ).

Thus, the output of this transformation has expectation

E[w] = Λ^{−1/2} Eᵀ (E[x] − μ) = 0

and covariance matrix

E[w wᵀ] = Λ^{−1/2} Eᵀ E[(x − μ)(x − μ)ᵀ] E Λ^{−1/2} = Λ^{−1/2} Eᵀ K_xx E Λ^{−1/2}.

By diagonalizing K_xx, we get the following:

Λ^{−1/2} Eᵀ (E Λ Eᵀ) E Λ^{−1/2} = Λ^{−1/2} Λ Λ^{−1/2} = I.
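A compact NumPy sketch of the simulation and whitening transformations just derived (the covariance matrix and mean are arbitrary illustrative choices):

    import numpy as np

    rng = np.random.default_rng(1)
    Kxx = np.array([[4.0, 1.5],
                    [1.5, 2.0]])        # arbitrary symmetric positive-definite covariance
    mu = np.array([1.0, -2.0])

    eigvals, E = np.linalg.eigh(Kxx)    # Kxx = E @ diag(eigvals) @ E.T
    L_half = np.diag(np.sqrt(eigvals))
    L_inv_half = np.diag(1.0 / np.sqrt(eigvals))

    # Simulation: colour a white vector so it has mean mu and covariance Kxx
    w = rng.standard_normal((2, 100_000))          # white samples, identity covariance
    x = (E @ L_half @ w) + mu[:, None]
    print(np.cov(x))                               # approximately Kxx

    # Whitening: remove the mean and decorrelate
    w_rec = L_inv_half @ E.T @ (x - mu[:, None])
    print(np.cov(w_rec))                           # approximately the identity matrix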

Thus, with the above transformation, we can whiten the random vector to have zero mean and the identity covariance matrix.

Random signal transformations: We cannot extend the same two concepts of simulating and whitening to the case of continuous time random signals or processes. For simulating, we create a filter into which we feed a white noise signal. We choose the filter so that the output signal simulates the 1st and 2nd moments of any arbitrary random process. For whitening, we feed any arbitrary random signal into a specially chosen filter so that the output of the filter is a white noise signal.

Simulating a continuous-time random signal:

White noise fed into a linear, time-invariant filter to simulate the 1st and 2nd moments of an arbitrary random process.

We can simulate any wide-sense stationary, continuous-time random process x(t) with constant mean μ, covariance function

K_x(τ) = E[(x(t) − μ)(x(t + τ) − μ)]

and power spectral density

S_x(ω) = ∫ K_x(τ) e^{−jωτ} dτ.

We can simulate this signal using frequency domain techniques. Because K_x(τ) is Hermitian symmetric and positive semi-definite, it follows that S_x(ω) is real and can be factored as

S_x(ω) = H(ω) H*(ω)

if and only if S_x(ω) satisfies the Paley-Wiener criterion. If S_x(ω) is a rational function, we can then factor it into pole-zero form. Choosing a minimum phase H(ω) so that its poles and zeros lie inside the left half s-plane, we can then simulate x(t) with H(ω) as the transfer function of the filter. That is, we can simulate x(t) by constructing the following linear, time-invariant filter:

x(t) = h(t) * w(t) + μ,

where h(t) is the impulse response corresponding to H(ω) and w(t) is a continuous-time, white-noise signal with the following 1st and 2nd moment properties:

E[w(t)] = 0,     K_w(τ) = E[w(t) w(t + τ)] = δ(τ).

Thus, the resultant signal x(t) has the same 2nd moment properties as the desired signal.

Whitening a continuous-time random signal

An arbitrary random process x(t) fed into a linear, time-invariant filter that whitens x(t) to create white noise at the output.

Suppose we have a wide-sense stationary, continuous-time random process defined with the same mean , covariance function Kx(), and power spectral density Sx() as above. We can whiten this signal using frequency domain techniques. We factor the power spectral density Sx() as described above.

Choosing the minimum phase H(ω) so that its poles and zeros lie inside the left half s-plane, we can then whiten x(t) with the following inverse filter:

H_inv(ω) = 1 / H(ω).

We choose the minimum phase filter so that the resulting inverse filter is stable. Additionally, we must be sure that H(ω) is strictly positive for all ω so that H_inv(ω) does not have any singularities. The final form of the whitening procedure is as follows:

w(t) = h_inv(t) * (x(t) − μ),

where h_inv(t) is the impulse response corresponding to H_inv(ω), so that w(t) is a white noise random process with zero mean and constant, unit power spectral density

S_w(ω) = H_inv(ω) S_x(ω) H*_inv(ω) = 1.

Note that this power spectral density corresponds to a delta function for the covariance function of w(t).
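The discrete-time analogue of this filter-based simulation and whitening is easy to demonstrate; the sketch below (NumPy/SciPy, with an arbitrary first-order shaping filter chosen only for illustration) colours white noise with a filter H and then whitens it again with the inverse filter 1/H.

    import numpy as np
    from scipy import signal

    rng = np.random.default_rng(2)
    w = rng.standard_normal(200_000)          # discrete-time white noise, unit variance

    # Arbitrary minimum-phase first-order shaping filter H(z) = 1 / (1 - 0.9 z^-1)
    b, a = [1.0], [1.0, -0.9]
    x = signal.lfilter(b, a, w)               # "coloured" (simulated) process

    # Whitening: apply the inverse filter 1/H(z) = 1 - 0.9 z^-1
    w_rec = signal.lfilter(a, b, x)

    print(np.var(w), np.var(x), np.var(w_rec))       # x has boosted variance; w_rec ~ unit variance
    print(np.corrcoef(w_rec[:-1], w_rec[1:])[0, 1])  # lag-1 correlation of whitened output ~ 0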

Narrowband Noise Representation

In most communication systems, we are often dealing with band-pass filtering of signals. Wideband noise will be shaped into band-limited noise. If the bandwidth of the band-limited noise is relatively small compared to the carrier frequency, we refer to this as narrowband noise. We can derive the power spectral density G_n(f) and the autocorrelation function R_nn(τ) of the narrowband noise and use them to analyze the performance of linear systems. In practice, we often deal with mixing (multiplication), which is a non-linear operation, and the system analysis becomes difficult. In such a case, it is useful to express the narrowband noise as

n(t) = x(t) cos 2πf_c t − y(t) sin 2πf_c t    (1)

where f_c is the carrier frequency within the band occupied by the noise. x(t) and y(t) are known as the quadrature components of the noise n(t). The Hilbert transform of n(t) is

n̂(t) = H[n(t)] = x(t) sin 2πf_c t + y(t) cos 2πf_c t    (2)

Proof: The Fourier transform of n(t) is

N(f) = (1/2) X(f − f_c) + (1/2) X(f + f_c) + (j/2) Y(f − f_c) − (j/2) Y(f + f_c).

Let N̂(f) be the Fourier transform of n̂(t). In the frequency domain,

N̂(f) = N(f) [−j sgn(f)].

We simply multiply all positive frequency components of N(f) by −j and all negative frequency components of N(f) by j. Thus,

N̂(f) = −j (1/2) X(f − f_c) + j (1/2) X(f + f_c) − j (j/2) Y(f − f_c) + j (−j/2) Y(f + f_c)
      = −(j/2) X(f − f_c) + (j/2) X(f + f_c) + (1/2) Y(f − f_c) + (1/2) Y(f + f_c)

and the inverse Fourier transform of N̂(f) is

n̂(t) = x(t) sin 2πf_c t + y(t) cos 2πf_c t.

The quadrature components x(t) and y(t) can now be derived from equations (1) and (2):

x(t) = n(t) cos 2πf_c t + n̂(t) sin 2πf_c t    (3)

y(t) = n̂(t) cos 2πf_c t − n(t) sin 2πf_c t    (4)

Given n(t), the quadrature components x(t) and y(t) can thus be obtained from n(t) and its Hilbert transform using equations (3) and (4). x(t) and y(t) have the following properties:

1. E[x(t) y(t)] = 0; x(t) and y(t) are uncorrelated with each other.
2. x(t) and y(t) have the same means and variances as n(t).
3. If n(t) is Gaussian, then x(t) and y(t) are also Gaussian.
4. x(t) and y(t) have identical power spectral densities, related to the power spectral density of n(t) by

G_x(f) = G_y(f) = G_n(f − f_c) + G_n(f + f_c)    (5)

for |f| ≤ 0.5B (and zero otherwise), where B is the bandwidth of n(t).

Proof: Equation (5) is the key that will enable us to calculate the effect of noise on AM and FM systems. It implies that the power spectral density of x(t) and y(t) can be found by shifting the positive portion and negative portion of G_n(f) to zero frequency and adding them to give G_x(f) and G_y(f). In the special case where G_n(f) is symmetrical about the carrier frequency f_c, the positive- and negative-frequency contributions are shifted to zero frequency and added to give

G_x(f) = G_y(f) = 2 G_n(f − f_c) = 2 G_n(f + f_c)    (6)
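Equations (3) and (4) translate directly into a few lines of SciPy; the sketch below builds a band-limited noise n(t) around an assumed carrier f_c, forms n̂(t) with scipy.signal.hilbert (whose imaginary part is the Hilbert transform), and recovers the low-pass quadrature components x(t) and y(t). All numeric values (sample rate, carrier, filter cutoff) are illustrative assumptions.

    import numpy as np
    from scipy import signal

    fs, fc = 100_000.0, 10_000.0             # sample rate and assumed carrier (Hz)
    t = np.arange(200_000) / fs

    # Band-limited noise centred on fc: low-pass noise mixed up to the carrier
    rng = np.random.default_rng(3)
    b, a = signal.butter(4, 2_000 / (fs / 2))        # ~2 kHz low-pass prototype
    x_true = signal.lfilter(b, a, rng.standard_normal(t.size))
    y_true = signal.lfilter(b, a, rng.standard_normal(t.size))
    n = x_true * np.cos(2 * np.pi * fc * t) - y_true * np.sin(2 * np.pi * fc * t)

    # Hilbert transform and quadrature components, equations (2)-(4)
    n_hat = np.imag(signal.hilbert(n))
    x_rec = n * np.cos(2 * np.pi * fc * t) + n_hat * np.sin(2 * np.pi * fc * t)
    y_rec = n_hat * np.cos(2 * np.pi * fc * t) - n * np.sin(2 * np.pi * fc * t)

    print(np.max(np.abs(x_rec[1000:-1000] - x_true[1000:-1000])))   # small residual
    print(np.max(np.abs(y_rec[1000:-1000] - y_true[1000:-1000])))   # small residual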

Performance of Binary FSK:


Consider the synchronous detector of binary FSK signals. In the presence of additive white Gaussian noise (AWGN) w(t), the received signal is

r(t) = A cos 2πf_c1 t + w(t)

where A is a constant and f_c1 is the carrier frequency employed if a 1 has been sent. The signals at the output of the band-pass filters of centre frequencies f_c1 and f_c2 are

r1(t) = A cos 2πf_c1 t + n1(t)    and    r2(t) = n2(t)

where

n1(t) = x1(t) cos 2πf_c1 t − y1(t) sin 2πf_c1 t    and    n2(t) = x2(t) cos 2πf_c2 t − y2(t) sin 2πf_c2 t

are the narrowband noises. With appropriate design of the low-pass filters and sampling period, the sampled output signals are

v_o1 = A + x1,    v_o2 = x2    and    v = v_o1 − v_o2 = A + [x1 − x2].

x1 and x2 are statistically independent Gaussian random variables with zero mean and fixed variance σ² = N, where N is the power of the random variable. It can be seen that one of the detectors has signal plus noise, while the other detector has noise only.

When f_c2 is the carrier frequency employed for sending a 0, the received signal is r(t) = A cos 2πf_c2 t + w(t). It can be shown that

v = −A + [x1 − x2].

Since

E[(x1 − x2)²] = E[x1²] − 2 E[x1 x2] + E[x2²] = E[x1²] + E[x2²] = 2σ²,

the total variance of the noise term is σ_t² = 2σ².
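A quick Monte Carlo check of the last statement (NumPy; A and σ are arbitrary illustrative values): the decision variable v = ±A + (x1 − x2) has variance 2σ².

    import numpy as np

    rng = np.random.default_rng(4)
    A, sigma = 1.0, 0.5                       # illustrative amplitude and noise standard deviation
    n_trials = 1_000_000

    x1 = rng.normal(0.0, sigma, n_trials)     # noise in the "mark" branch
    x2 = rng.normal(0.0, sigma, n_trials)     # noise in the "space" branch
    v = A + (x1 - x2)                         # decision variable when a 1 was sent

    print(np.mean(v))          # ~ A
    print(np.var(v))           # ~ 2 * sigma**2 = 0.5
    print(2 * sigma**2)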

NOISE TEMPERATURE:
In electronics, noise temperature is a temperature (in kelvins) assigned to a component such that the noise power delivered by the noisy component to a noiseless matched resistor is given by

P_RL = k_B T_s B_n

in watts, where:
- k_B is the Boltzmann constant (1.381×10⁻²³ J/K, joules per kelvin)
- T_s is the noise temperature (K)
- B_n is the noise bandwidth (Hz)

Engineers often model noisy components as an ideal component in series with a noisy resistor. The source resistor is often assumed to be at room temperature, conventionally taken as 290 K (17 °C, 62 °F).

APPLICATIONS:
A communications system is typically made up of a transmitter, a communications channel, and a receiver. The communications channel may consist of any one or a combination of many different physical media (air, coaxial cable, printed wiring board traces). The important thing to note is that no matter what physical media the channel consists of, the transmitted signal will be randomly corrupted by a number of different processes. The most common form of signal degradation is called additive noise. The additive noise in a receiving system can be of thermal origin (thermal noise) or can be from other noise-generating processes. Most of these other processes generate noise whose spectrum and probability distributions are similar to thermal noise. Because of these similarities, the contributions of all noise sources can be lumped together and regarded as thermal noise. The noise power P_n generated by all these sources can be described by assigning to the noise a noise temperature T_n defined as:

T_n = P_n / (k B_n)

In a wireless communications receiver, the equivalent noise temperature would equal the sum of two noise temperatures:

T_n = T_ant + T_sys

T_ant is the antenna noise temperature and determines the noise power seen at the output of the antenna. The physical temperature of the antenna has no effect on T_ant. T_sys is the noise temperature of the receiver circuitry and is representative of the noise generated by the non-ideal components inside the receiver.

NOISE FACTOR AND NOISE FIGURE:


An important application of noise temperature is its use in the determination of a component's noise factor. The noise factor quantifies the noise power that the component adds to the system when its input noise temperature is T_0:

F = 1 + T_e / T_0.

The noise factor (a linear term) can be converted to noise figure (in decibels) using:

NF = 10 log10(F).

NOISE TEMPERATURE OF A CASCADE:


If there are multiple noisy components in cascade, the noise temperature of the cascade can be calculated using the Friis equation:

T_eq = T_1 + T_2/G_1 + T_3/(G_1 G_2) + ... + T_n/(G_1 G_2 ... G_(n−1))

where
- T_eq is the cascade noise temperature
- T_1, T_2, T_3, ..., T_n are the noise temperatures of the first, second, third, ..., nth component in the cascade
- G_1, G_2, G_3, ..., G_(n−1) are the linear gains of the first, second, third, ..., (n−1)th component in the cascade.

Components early in the cascade have a much larger influence on the overall noise temperature than those later in the chain. This is because noise introduced by the early stages is, along with the signal, amplified by the later stages. The Friis equation shows why a good quality preamplifier is important in a receive chain.
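A tiny helper implementing the Friis cascade equation above (Python; the example stage values are hypothetical, chosen only to show how an early low-noise stage dominates):

    def cascade_noise_temperature(stages):
        """stages: list of (noise_temperature_K, linear_gain) tuples, in signal order."""
        total, gain_product = 0.0, 1.0
        for T, G in stages:
            total += T / gain_product     # each stage's contribution is divided by the preceding gains
            gain_product *= G
        return total

    # Hypothetical chain: LNA (35 K, gain 100), mixer (600 K, gain 0.5), IF amp (1000 K, gain 1000)
    print(cascade_noise_temperature([(35, 100), (600, 0.5), (1000, 1000)]))   # ~ 61 K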

MEASURING NOISE TEMPERATURE:


The direct measurement of a component's noise temperature is a difficult process. Suppose that the noise temperature of a low noise amplifier (LNA) is measured by connecting a noise source to the LNA with a piece of transmission line. From the cascade noise temperature it can be seen that the noise temperature of the transmission line has the potential of being the largest contributor to the output measurement (especially when you consider that LNAs can have noise temperatures of only a few kelvin). To accurately measure the noise temperature of the LNA, the noise from the input coaxial cable needs to be accurately known. This is difficult because poor surface finishes and reflections in the transmission line make actual noise temperature values higher than those predicted by theoretical analysis. Similar problems arise when trying to measure the noise temperature of an antenna. Since the noise temperature is heavily dependent on the orientation of the antenna, the direction that the antenna was pointed during the test needs to be specified. In receiving systems, the system noise temperature will have three main contributors: the antenna, the transmission line, and the receiver circuitry. The antenna noise temperature is considered to be the most difficult to measure because the measurement must be made in the field on an open system. One technique for measuring antenna noise temperature involves using cryogenically cooled loads to calibrate a noise figure meter before measuring the antenna. This provides a direct reference comparison at a noise temperature in the range of very low antenna noise temperatures, so that little extrapolation of the collected data is required.

NOISE FIGURE:
Noise figure (NF) is a measure of degradation of the signal-to-noise ratio (SNR), caused by components in a radio frequency (RF) signal chain. The noise figure is defined as the ratio of the output noise power of a device to the portion thereof attributable to thermal noise in the input termination at standard noise temperature T0 (usually 290 K). The noise figure is thus the ratio of actual output noise to that which would remain if the device itself did not introduce noise. It is a number by which the performance of a radio receiver can be specified. The noise figure is the difference in decibels (dB) between the noise output of the actual receiver and the noise output of an ideal receiver with the same overall gain and bandwidth, when the receivers are connected to sources at the standard noise temperature T0 (usually 290 K). The noise power from a simple load is equal to kTB, where k is Boltzmann's constant, T is the absolute temperature of the load (for example a resistor), and B is the measurement bandwidth. This makes the noise figure a useful figure of merit for terrestrial systems, where the antenna effective temperature is usually near the standard 290 K. In this case, one receiver with a noise figure, say, 2 dB better than another will have an output signal-to-noise ratio that is about 2 dB better than the other. However, in the case of satellite communications systems, where the antenna is pointed out into cold space, the antenna effective temperature is often colder than 290 K. In these cases a 2 dB improvement in receiver noise figure will result in more than a 2 dB improvement in the output signal-to-noise ratio. For this reason, the related figure of effective noise temperature is often used instead of the noise figure for characterizing satellite-communication receivers and low noise amplifiers. In heterodyne systems, output noise power includes spurious contributions from image-frequency transformation, but the portion attributable to thermal noise in the input termination at standard noise temperature includes only that which appears in the output via the principal frequency transformation of the system, and excludes that which appears via the image frequency transformation.

DEFINITION:
The noise factor of a system is defined as:

F = SNR_in / SNR_out

where SNR_in and SNR_out are the input and output power signal-to-noise ratios, respectively. The noise figure is defined as:

NF = SNR_in,dB − SNR_out,dB

where SNR_in,dB and SNR_out,dB are in decibels (dB). The noise figure is the noise factor, given in dB:

NF = 10 log10(F).

These formulae are only valid when the input termination is at standard noise temperature T0, although in practice small differences in temperature do not significantly affect the values. The noise factor of a device is related to its noise temperature Te:

F = 1 + Te / T0.

Devices with no gain (e.g., attenuators) have a noise factor F equal to their attenuation L (absolute value, not in dB) when their physical temperature equals T0. More generally, for an attenuator at a physical temperature T, the noise temperature is Te = (L − 1)T, giving a noise factor of:

F = 1 + (L − 1) T / T0.

If several devices are cascaded, the total noise factor can be found with Friis' formula:

F = F1 + (F2 − 1)/G1 + (F3 − 1)/(G1 G2) + ... + (Fn − 1)/(G1 G2 ... G_(n−1))

where Fn is the noise factor for the n-th device and Gn is the power gain (linear, not in dB) of the n-th device. In a well designed receive chain, only the noise factor of the first amplifier should be significant.
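The same cascade idea expressed with noise factors; the helper below (Python, with a hypothetical two-stage example) applies Friis' formula for noise factor and converts the result to a noise figure in dB.

    import math

    def cascade_noise_factor(stages):
        """stages: list of (noise_factor_linear, gain_linear) tuples, in signal order."""
        total, gain_product = 0.0, 1.0
        for i, (F, G) in enumerate(stages):
            total += F if i == 0 else (F - 1) / gain_product
            gain_product *= G
        return total

    def db(x):
        return 10 * math.log10(x)

    # Hypothetical chain: LNA with NF 1 dB and 20 dB gain, followed by a mixer with NF 10 dB
    stages = [(10 ** (1 / 10), 10 ** (20 / 10)), (10 ** (10 / 10), 1.0)]
    F_total = cascade_noise_factor(stages)
    print(F_total, db(F_total))      # ~1.35, i.e. a cascade noise figure of about 1.3 dB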

IMPORTANT QUESTIONS
PART A

1. Define noise figure.
2. What is white noise?
3. What is thermal noise? Give the expression for the thermal noise voltage across a resistor.
4. What is shot noise?
5. Define noise temperature.
6. Find the thermal noise voltage developed across a resistor of 700 Ω. The bandwidth of the measuring instrument is 7 MHz and the ambient temperature is 27 °C.
7. Define a random variable.
8. What is a random process?
9. What is a Gaussian process?
10. What is a stationary random process?

PART B
1. Derive the effective noise temperature of a cascade amplifier. (16)
2. Explain how the various noises are generated and the method of representing them. (16)
3. Write notes on noise temperature and noise figure. (8)
4. Derive the noise figure for cascade stages. (8)
5. What is narrowband noise? Discuss the properties of the quadrature components of a narrowband noise. (8)
6. What is meant by noise equivalent bandwidth? Illustrate it with a diagram. (8)
7. Derive the expression for output signal-to-noise ratio for a DSB-SC receiver using coherent detection. (16)
8. Write short notes on noise in SSB. (16)
9. Discuss the following: (16)
   i) Noise equivalent bandwidth (4)
   ii) Narrowband noise (4)
   iii) Noise temperature (4)
   iv) Noise spectral density (4)
10. How is sine wave plus noise represented? Obtain the joint PDF of such a noise component. (16)

UNIT IV PERFORMANCE OF CW MODULATION SYSTEMS

Superheterodyne radio receiver and its characteristic. SNR. Noise in DSBSC systems using coherent detection. Noise in AM system using envelope detection FM system. FM threshold effect. Pre-emphasis and de-emphasis in FM. Comparison of performances.

SUPERHETERODYNE RADIO RECEIVER:


In electronics, a superheterodyne receiver uses frequency mixing or heterodyning to convert a received signal to a fixed intermediate frequency, which can be more conveniently processed than the original radio carrier frequency. Virtually all modern radio and television receivers use the superheterodyne principle.

DESIGN AND EVOLUTION:

Schematic of a typical superheterodyne receiver: the diagram shows the minimum requirements for a single-conversion superheterodyne receiver design. The essential elements are common to all superheterodyne circuits: a signal receiving antenna, a broadband R.F. amplifier, a variable frequency local oscillator, a frequency mixer, a band-pass filter to remove unwanted mixer product signals, and a demodulator to recover the original audio signal. Cost-optimized designs use one active device for both local oscillator and mixer, called a "converter" stage. One example is the pentagrid converter.

Circuit description: A suitable antenna is required to receive the chosen range of broadcast signals. The signal received is very small, sometimes only a few microvolts. Reception starts with the antenna signal fed to the R.F. stage. The R.F. amplifier stage must be selectively tuned to pass only the desired range of channels required. To allow the receiver to be tuned to a particular broadcast channel, a method of changing the frequency of the local oscillator is needed. The tuning circuit in a simple design may use a variable capacitor or varicap diode. Only one or two tuned stages need to be adjusted to track over the tuning range of the receiver.

Mixer stage: The signal is then fed into the mixer stage circuit. The mixer is also fed with a signal from the variable frequency local oscillator (VFO) circuit. The mixer produces both sum and difference beat frequency signals, each containing a copy of the desired signal. The four frequencies at the output include the wanted signal fd, the original fLO, and the two new frequencies fd + fLO and fd − fLO. The output signal also contains a number of undesirable frequencies. These are 3rd- and higher-order intermodulation products. These unwanted signals are removed by the I.F. bandpass filter, leaving only the desired offset I.F. frequency signal fIF, which contains the original broadcast information fd.

Intermediate frequency stage: All the intermediate-frequency stages operate at a fixed frequency which need not be adjusted. [6] The I.F. amplifier section at fIF is tuned to be highly selective. By changing fLO, the resulting fd − fLO (or fd + fLO) signal can be tuned to the amplifier's fIF; the suitably amplified signal includes the frequency the user wishes to tune, fd. The local oscillator is tuned to produce a frequency fLO close to fd. In typical amplitude modulation ("AM radio" in the U.S., or MW) receivers, that frequency is 455 kHz;[10] for FM receivers, it is usually 10.7 MHz; for television, 33.4 to 45.75 MHz. Other signals from the mixed output of the heterodyne are filtered out by this stage. This depends on the intermediate frequency chosen in the design process. Typically it is 455 kHz for a single-stage conversion receiver. A higher I.F. offset reduces the effect that interference from powerful radio transmissions in adjacent broadcast bands has on the required signal. Usually the intermediate frequency is lower than either the carrier or oscillator frequencies, but with some types of receiver (e.g. scanners and spectrum analyzers) it is more convenient to use a higher intermediate frequency. In order to avoid interference to and from signal frequencies close to the intermediate frequency, in many countries IF frequencies are controlled by regulatory authorities. Examples of common IFs are 455 kHz for medium-wave AM radio, 10.7 MHz for FM, 38.9 MHz (Europe) or 45 MHz (US) for television, and 70 MHz for satellite and terrestrial microwave equipment.

Bandpass filter: The filter must have a band-pass range equal to or less than the frequency spacing between adjacent broadcast channels. A perfect filter would have a high attenuation factor to adjacent channels, but with a broad bandpass response to obtain a better quality of received signal. This may be designed with a dual frequency tuned coil filter design, or a multi-pole ceramic crystal filter.

Demodulation: The received signal is now processed by the demodulator stage, where the broadcast (usually audio, but may be data) signal is recovered and amplified. A.M. demodulation requires the simple rectification of the R.F. signal to remove one sideband, and a simple resistor and capacitor low-pass RC filter to remove the high-frequency R.F. carrier component. Other modes of transmission will require more specialized circuits to recover the broadcast signal. The remaining audio signal is then amplified and fed to a suitable transducer, such as a loudspeaker or headphones.

Advanced designs: To overcome obstacles such as image response, multiple IF stages are used, and in some cases multiple stages with two IFs of different values are used. For example, the front end might be sensitive to 130 MHz, the first half of the radio to 5 MHz, and the last half to 50 kHz. Two frequency converters would be used, and the radio would be a double-conversion superheterodyne; a common example is a television receiver where the audio information is obtained from a second stage of intermediate-frequency conversion. Receivers which are tunable over a wide bandwidth (e.g. scanners) may use an intermediate frequency higher than the signal, in order to improve image rejection.

Other uses: In the case of modern television receivers, no other technique was able to produce the precise bandpass characteristic needed for vestigial sideband reception, first used with the original NTSC system introduced in 1941. This originally involved a complex collection of tunable inductors which needed careful adjustment, but since the 1970s or early 1980s these have been replaced with precision electromechanical surface acoustic wave (SAW) filters. Fabricated by precision laser milling techniques, SAW filters are cheaper to produce, can be made to extremely close tolerances, and are stable in operation. To avoid tooling costs associated with these components, most manufacturers then tended to design their receivers around the fixed range of frequencies offered, which resulted in de-facto standardization of intermediate frequencies.

Modern designs: Microprocessor technology allows replacing the superheterodyne receiver design by a software defined radio architecture, where the IF processing after the initial IF filter is implemented in software. This technique is already in use in certain designs, such as very low-cost FM radios incorporated into mobile phones, since the system already has the necessary microprocessor. Radio transmitters may also use a mixer stage to produce an output frequency, working more or less as the reverse of a superheterodyne receiver.

Technical advantages: Superheterodyne receivers have superior characteristics to simpler receiver types in frequency stability and selectivity. They offer better stability than tuned radio frequency (TRF) receivers because a tunable oscillator is more easily stabilized than a tunable amplifier, especially with modern frequency synthesizer technology. IF filters can give narrower pass bands at the same Q factor than an equivalent RF filter. A fixed IF also allows the use of a crystal filter when exceptionally high selectivity is necessary. Regenerative and super-regenerative receivers offer better sensitivity than a TRF receiver, but suffer from stability and selectivity problems.

Drawbacks of this design:

High-side and low-side injection: The amount that a signal is down-shifted by the local oscillator depends on whether its frequency f is higher or lower than fLO. That is because its new frequency is |f − fLO| in either case. Therefore, there are potentially two signals that could both shift to the same fIF: one at f = fLO + fIF and another at f = fLO − fIF. One of those signals, called the image frequency, has to be filtered out prior to the mixer to avoid aliasing. When the upper one is filtered out, it is called high-side injection, because fLO is above the frequency of the received signal. The other case is called low-side injection. High-side injection also reverses the order of a signal's frequency components. Whether that actually changes the signal depends on whether it has spectral symmetry. The reversal can be undone later in the receiver, if necessary.

Image Frequency (fimage): One major disadvantage to the superheterodyne receiver is the problem of image frequency. In heterodyne receivers, an image frequency is an undesired input frequency equal to the station frequency plus twice the intermediate frequency. The image frequency results in two stations being received at the same time, thus producing interference. Image frequencies can be eliminated by sufficient attenuation on the incoming signal by the RF amplifier filter of the superheterodyne receiver.
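The image-frequency relationship is easy to check numerically; the sketch below (Python, using the common 455 kHz IF mentioned above and a hypothetical 1000 kHz station with high-side injection) shows that both the wanted signal and its image mix down to the same IF.

    f_if = 455e3                     # intermediate frequency (Hz)
    f_station = 1000e3               # hypothetical wanted station (Hz)
    f_lo = f_station + f_if          # high-side local oscillator
    f_image = f_station + 2 * f_if   # image frequency = station + twice the IF

    print(abs(f_station - f_lo))     # 455000.0 -> wanted signal lands at the IF
    print(abs(f_image - f_lo))       # 455000.0 -> so does the image, hence the interference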

Early Autodyne receivers typically used IFs of only 150 kHz or so, as it was difficult to maintain reliable oscillation if higher frequencies were used. As a consequence, most Autodyne receivers needed quite elaborate antenna tuning networks, often involving double-tuned coils, to avoid image interference. Later super heterodynes used tubes especially designed for oscillator/mixer use, which were able to work reliably with much higher IFs, reducing the problem of image interference and so allowing simpler and cheaper aerial tuning circuitry. For medium-wave AM radio, a variety of IFs have been used, but usually 455 kHz is used. Local oscillator radiation: It is difficult to keep stray radiation from the local oscillator below the level that a nearby receiver can detect. The receiver's local oscillator can act like a miniature CW transmitter. This means that there can be mutual interference in the operation of two or more superheterodyne receivers in close proximity. In espionage, oscillator radiation gives a means to detect a covert receiver and its operating frequency. One effective way of preventing the local oscillator signal from radiating out from the receiver's antenna is by adding a shielded and power supply decoupled stage of RF amplification between the receiver's antenna and its mixer stage. Local oscillator sideband noise: Local oscillators typically generate a single frequency signal that has negligible amplitude modulation but some random phase modulation. Either of these impurities spreads some of the signal's energy into sideband frequencies. That causes a corresponding widening of the receiver's frequency response, which would defeat the aim to make a very narrow bandwidth receiver such as to receive low-rate digital signals. Care needs to be taken to minimize oscillator phase noise, usually by ensuring that the oscillator never enters a non-linear mode.

SIGNAL-TO-NOISE RATIO:

Signal-to-noise ratio (often abbreviated SNR or S/N) is a measure used in science and engineering to quantify how much a signal has been corrupted by noise. It is defined as the ratio of signal power to the noise power corrupting the signal. A ratio higher than 1:1 indicates more signal than noise. While SNR is commonly quoted for electrical signals, it can be applied to any form of signal (such as isotope levels in an ice core or biochemical signaling between cells).

In less technical terms, signal-to-noise ratio compares the level of a desired signal (such as music) to the level of background noise. The higher the ratio, the less obtrusive the background noise is. "Signal-to-noise ratio" is sometimes used informally to refer to the ratio of useful information to false or irrelevant data in a conversation or exchange. For example, in online discussion forums and other online communities, off-topic posts and spam are regarded as "noise" that interferes with the "signal" of appropriate discussion.

FM DEMODULATORS AND THRESHOLD EFFECT:


An important aspect of analogue FM satellite systems is FM threshold effect. In FM systems where the signal level is well above noise received carrier-to-noise ratio and demodulated signal-to-noise ratio are related by:

The expression however does not apply when the carrier-to-noise ratio decreases below a certain point. Below this critical point the signal-to-noise ratio decreases significantly. This is known as the FM threshold effect (FM threshold is usually defined as the carrier-to-noise ratio at which the demodulated signal-to-noise ratio fall 1 dB below the linear relationship. It generally is considered to occur at about 10 dB). Below the FM threshold point the noise signal (whose amplitude and phase are randomly varying), may instantaneously have an amplitude greater than that of the wanted signal. When this happens the noise will produce a sudden change in the phase of the FM demodulator output. In an audio system this sudden phase change makes a "click". In video applications the term "click noise" is used to describe short horizontal black and white lines that appear randomly over a picture. Because satellite communications systems are power limited they usually operate with only a small design margin above the FM threshold point (perhaps a few dB). Because of this circuit designers have tried to devise techniques to delay the onset of the FM threshold effect. These devices are generally known as FM threshold extension demodulators. Techniques such as FM feedback, phase locked loops and frequency locked loops are used to achieve this effect. By such techniques the onset of FM threshold effects can be delayed till the C/N ratio is around 7 dB Pre-emphasis and de-emphasis: Random noise has a 'triangular' spectral distribution in an FM system, with the effect that noise occurs predominantly at the highest frequencies within the baseband. This can be offset, to a limited extent, by boosting the high frequencies before transmission and reducing them by a corresponding amount in the receiver. Reducing

the high frequencies in the receiver also reduces the high-frequency noise. These processes of boosting and then reducing certain frequencies are known as pre-emphasis and de-emphasis, respectively. The amount of pre-emphasis and de-emphasis used is defined by the time constant of a simple RC filter circuit. In most of the world a 50 µs time constant is used. In North America, 75 µs is used. This applies to both mono and stereo transmissions and to baseband audio (not the subcarriers). The amount of pre-emphasis that can be applied is limited by the fact that many forms of contemporary music contain more high-frequency energy than the musical styles which prevailed at the birth of FM broadcasting. They cannot be pre-emphasized as much because it would cause excessive deviation of the FM carrier. (Systems more modern than FM broadcasting tend to use either programme-dependent variable pre-emphasis, e.g. dbx in the BTSC TV sound system, or none at all.)

FM stereo: In the late 1950s, several systems to add stereo to FM radio were considered by the FCC. Included were systems from 14 proponents including Crosley, Halstead, Electrical and Musical Industries, Ltd (EMI), Zenith Electronics Corporation and General Electric. The individual systems were evaluated for their strengths and weaknesses during field tests in Uniontown, Pennsylvania using KDKA-FM in Pittsburgh as the originating station. The Crosley system was rejected by the FCC because it degraded the signal-to-noise ratio of the main channel and did not perform well under multipath RF conditions. In addition, it did not allow for SCA services because of its wide FM sub-carrier bandwidth. The Halstead system was rejected due to lack of high frequency stereo separation and reduction in the main channel signal-to-noise ratio. The GE and Zenith systems, so similar that they were considered theoretically identical, were formally approved by the FCC in April 1961 as the standard stereo FM broadcasting method in the USA and later adopted by most other countries. It is important that stereo broadcasts should be compatible with mono receivers. For this reason, the left (L) and right (R) channels are algebraically encoded into sum (L+R) and difference (L−R) signals. A mono receiver will use just the L+R signal so the listener will hear both channels in the single loudspeaker. A stereo receiver will add the difference signal to the sum signal to recover the left channel, and subtract the difference signal from the sum to recover the right channel. The (L+R) main channel signal is transmitted as baseband audio in the range of 30 Hz to 15 kHz. The (L−R) sub-channel signal is modulated onto a 38 kHz double-sideband suppressed carrier (DSBSC) signal occupying the baseband range of 23 to 53 kHz. A 19 kHz pilot tone, at exactly half the 38 kHz sub-carrier frequency and with a precise phase relationship to it, as defined by the formula below, is also generated. This is transmitted at 8-10% of overall modulation level and used by the receiver to regenerate the 38 kHz sub-carrier with the correct phase. The final multiplex signal from the stereo generator contains the main channel (L+R), the pilot tone, and the sub-channel (L−R). This composite signal, along with any other sub-carriers, modulates the FM transmitter. The instantaneous deviation of the transmitter carrier frequency due to the stereo audio and pilot tone (at 10% modulation) is:

Δf(t) = [ 0.9 × ( (A+B)/2 + ((A−B)/2) sin(4π fp t) ) + 0.1 sin(2π fp t) ] × 75 kHz

Where A and B are the pre-emphasized Left and Right audio signals and fp is the frequency of the pilot tone. Slight variations in the peak deviation may occur in the presence of other subcarriers or because of local regulations. Converting the multiplex signal back into left and right audio signals is performed by a stereo decoder, which is built into stereo receivers. In order to preserve stereo separation and signal-to-noise parameters, it is normal practice to apply preemphasis to the left and right channels before encoding, and to apply de-emphasis at the receiver after decoding. Stereo FM signals are more susceptible to noise and multipath distortion than are mono FM signals. In addition, for a given RF level at the receiver, the signal-to-noise ratio for the stereo signal will be worse than for the mono receiver. For this reason many FM stereo receivers include a stereo/mono switch to allow listening in mono when reception conditions are less than ideal, and most car radios are arranged to reduce the separation as the signal-to-noise ratio worsens, eventually going to mono while still indicating a stereo signal is being received.
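To make the composite signal concrete, here is a minimal Python sketch (NumPy assumed available) that builds the stereo multiplex from two illustrative audio tones using the 19 kHz pilot, 38 kHz DSBSC sub-carrier and 10% pilot level described above; the 75 kHz peak deviation is the usual FM broadcast figure, and the tone frequencies and amplitudes are arbitrary choices for illustration.

import numpy as np

fs = 192_000                       # sample rate high enough for the 53 kHz multiplex
t = np.arange(0, 0.01, 1 / fs)
fp = 19_000                        # pilot tone frequency (Hz)

# Illustrative (hypothetical) pre-emphasized left and right audio channels
A = 0.5 * np.sin(2 * np.pi * 1000 * t)   # left
B = 0.5 * np.sin(2 * np.pi * 3000 * t)   # right

# Composite multiplex: mono sum, DSBSC difference on 38 kHz, 19 kHz pilot at 10%
mpx = 0.9 * ((A + B) / 2 + (A - B) / 2 * np.sin(4 * np.pi * fp * t)) \
      + 0.1 * np.sin(2 * np.pi * fp * t)

# Instantaneous carrier deviation in Hz (75 kHz peak deviation assumed)
deviation = mpx * 75_000
print("peak deviation ≈ %.1f kHz" % (abs(deviation).max() / 1000))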

IMPORTANT QUESTIONS PART A 1. How is threshold reduction achieved in an FM receiver? 2. What is meant by the FOM of a receiver? 3. What is an extended threshold demodulator? 4. Draw the phasor representation of FM noise. 5. Define pre-emphasis and de-emphasis. 6. What is the capture effect in FM? 7. What is the SNR for AM in the small-noise case? 8. What is the threshold effect with respect to noise? 9. Define SNR. 10. Define CSNR. 11. Discuss the factors that influence the choice of intermediate frequency in a radio receiver.

PART B 1. Define the Hilbert transform with a suitable example. Give the method of generation and detection of an SSB wave. (16) 2. Discuss the noise performance of an AM system using envelope detection. (16) 3. Compare the noise performance of AM and FM systems. (16) 4. Explain the significance of pre-emphasis and de-emphasis in an FM system. (8) 5. Derive the noise power spectral density of the FM demodulator and explain its performance with a diagram. (16) 6. Draw the block diagram of an FM demodulator and explain the effect of noise in detail. Explain the FM threshold effect and capture effect in FM. (16) 7. Explain the FM receiver with a block diagram. (8)

UNIT V INFORMATION THEORY

Discrete messages and information content. Concept of amount of information. Average information. Entropy. Information rate. Source coding to increase average information per bit. Shannon-Fano coding. Huffman coding. Lempel-Ziv (LZ) coding. Shannon's theorem. Channel capacity. Bandwidth. S/N trade-off. Mutual information. Channel capacity. Rate distortion theory. Lossy source coding.

INFORMATION THEORY:
Information theory is a branch of applied mathematics and electrical engineering involving the quantification of information. Information theory was developed by Claude E. Shannon to find fundamental limits on signal processing operations such as compressing data and on reliably storing and communicating data. Since its inception it has broadened to find applications in many other areas, including statistical inference, natural language processing, cryptography, networks other than communication networks (as in neurobiology), the evolution and function of molecular codes, model selection in ecology, thermal physics, quantum computing, plagiarism detection and other forms of data analysis. A key measure of information is known as entropy, which is usually expressed by the average number of bits needed for storage or communication. Entropy quantifies the uncertainty involved in predicting the value of a random variable. For example, specifying the outcome of a fair coin flip (two equally likely outcomes) provides less information (lower entropy) than specifying the outcome from a roll of a die (six equally likely outcomes). Applications of fundamental topics of information theory include lossless data compression (e.g. ZIP files), lossy data compression (e.g. MP3s), and channel coding (e.g. for DSL lines). The field is at the intersection of mathematics, statistics, computer science, physics, neurobiology, and electrical engineering. Its impact has been crucial to the success of the Voyager missions to deep space, the invention of the compact disc, the feasibility of mobile phones, the development of the Internet, the study of linguistics and of human perception, the understanding of black holes, and numerous other fields. Important sub-fields of information theory are source coding, channel coding, algorithmic complexity theory, algorithmic information theory, information-theoretic security, and measures of information.

OVERVIEW:
The main concepts of information theory can be grasped by considering the most widespread means of human communication: language. Two important aspects of a concise language are as follows: First, the most common words (e.g., "a", "the", "I") should be shorter than less common words (e.g., "benefit", "generation", "mediocre"), so that sentences will not be too long. Such a tradeoff in word length is analogous to data compression and is the essential aspect of source coding. Second, if part of a sentence is unheard or misheard due to noise (e.g., a passing car), the listener should still be able to glean the meaning of the underlying message. Such robustness is as essential for an electronic communication system as it is for a language; properly building such robustness into communications is done by channel coding. Source coding and channel coding are the fundamental concerns of information theory. Note that these concerns have nothing to do with the importance of messages. For example, a platitude such as "Thank you; come again" takes about as long to say or write as the urgent plea, "Call an ambulance!", while the latter may be more important and more meaningful in many contexts. Information theory, however, does not consider message importance or meaning, as these are matters of the quality of data rather than the quantity and readability of data, the latter of which is determined solely by probabilities. Information theory is generally considered to have been founded in 1948 by Claude Shannon in his seminal work, "A Mathematical Theory of Communication". The central paradigm of classical information theory is the engineering problem of the transmission of information over a noisy channel. The most fundamental results of this theory are Shannon's source coding theorem, which establishes that, on average, the number of bits needed to represent the result of an uncertain event is given by its entropy; and Shannon's noisy-channel coding theorem,

which states that reliable communication is possible over noisy channels provided that the rate of communication is below a certain threshold, called the channel capacity. The channel capacity can be approached in practice by using appropriate encoding and decoding systems. Information theory is closely associated with a collection of pure and applied disciplines that have been investigated and reduced to engineering practice under a variety of rubrics throughout the world over the past half century or more: adaptive systems, anticipatory systems, artificial intelligence, complex systems, complexity science, cybernetics, informatics, machine learning, along with systems sciences of many descriptions. Information theory is a broad and deep mathematical theory, with equally broad and deep applications, amongst which is the vital field of coding theory. Coding theory is concerned with finding explicit methods, called codes, of increasing the efficiency and reducing the net error rate of data communication over a noisy channel to near the limit that Shannon proved is the maximum possible for that channel. These codes can be roughly subdivided into data compression (source coding) and error-correction (channel coding) techniques. In the latter case, it took many years to find the methods Shannon's work proved were possible. A third class of information theory codes are cryptographic algorithms (both codes and ciphers). Concepts, methods and results from coding theory and information theory are widely used in cryptography and cryptanalysis. See the article ban (information) for a historical application. Information theory is also used in information retrieval, intelligence gathering, gambling, statistics, and even in musical composition.
Quantities of information

Information theory is based on probability theory and statistics. The most important quantities of information are entropy, the information in a random variable, and mutual information, the amount of information in common between two random variables. The former quantity indicates how easily message data can be compressed while the latter can be used to find the communication rate across a channel. The choice of logarithmic base in the following formulae determines the unit of information entropy that is used. The most common unit of information is the bit, based on the binary logarithm. Other units include the nat, which is based on the natural logarithm, and the hartley, which is based on the common logarithm. In what follows, an expression of the form p log p is considered by convention to be equal to zero whenever p = 0. This is justified because the limit of p log p as p tends to zero is zero for any logarithmic base.

Entropy:

The binary entropy function Hb(p) gives the entropy of a Bernoulli trial as a function of the success probability; it is maximized at 1 bit per trial when the two possible outcomes are equally probable, as in an unbiased coin toss. The entropy, H, of a discrete random variable X is a measure of the amount of uncertainty associated with the value of X. Suppose one transmits 1000 bits (0s and 1s). If these bits are known ahead of transmission (to be a certain value with absolute probability), logic dictates that no information has been transmitted. If, however, each is equally and independently likely to be 0 or 1, 1000 bits (in the information theoretic sense) have been transmitted. Between these two extremes, information can be quantified as follows. If {x1, ..., xn} is the set of all messages that X could be, and p(x) is the probability of X taking the value x, then the entropy of X is defined:

H(X) = E_X[ I(x) ] = − Σ_x p(x) log p(x)

(Here, I(x) = −log p(x) is the self-information, which is the entropy contribution of an individual message, and E_X is the expected value.) An important property of entropy is that it is maximized when all the messages in the message space are equiprobable, p(x) = 1/n (i.e., most unpredictable), in which case H(X) = log n. The special case of information entropy for a random variable with two outcomes is the binary entropy function, usually taken to the logarithmic base 2:

Hb(p) = − p log2 p − (1 − p) log2(1 − p)
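As a quick numerical check of the definition, the following short Python sketch computes H(X) for a fair coin, a fair die and an illustrative skewed source (the example probabilities are not taken from the text):

import math

def entropy(probs, base=2):
    """Shannon entropy H(X) = -sum p log p, ignoring zero-probability outcomes."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

print(entropy([0.5, 0.5]))          # fair coin: 1.0 bit
print(entropy([1/6] * 6))           # fair die: about 2.585 bits
print(entropy([0.7, 0.2, 0.1]))     # skewed source: about 1.157 bits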

Joint entropy: The joint entropy of two discrete random variables X and Y is merely the entropy of their pairing: (X, Y). This implies that if X and Y are independent, then their joint entropy is the sum of their individual entropies. For example, if (X, Y) represents the position of a chess piece (X the row and Y the column), then the joint entropy of the row of the piece and the column of the piece will be the entropy of the position of the piece.

Despite similar notation, joint entropy should not be confused with cross entropy.

Conditional entropy (equivocation):


The conditional entropy or conditional uncertainty of X given random variable Y (also called the equivocation of X about Y) is the average conditional entropy over Y:

H(X|Y) = E_Y[ H(X|y) ] = − Σ_{x,y} p(x, y) log p(x|y)

Because entropy can be conditioned on a random variable or on that random variable being a certain value, care should be taken not to confuse these two definitions of conditional entropy, the former of which is in more common use. A basic property of this form of conditional entropy is that:

H(X|Y) = H(X, Y) − H(Y)

Mutual information (transinformation):


Mutual information measures the amount of information that can be obtained about one random variable by observing another. It is important in communication where it can be used to maximize the amount of information shared between sent and received signals. The mutual information of X relative to Y is given by:

I(X; Y) = Σ_{x,y} p(x, y) log [ p(x, y) / ( p(x) p(y) ) ] = Σ_{x,y} p(x, y) SI(x, y)

where SI (specific mutual information) is the pointwise mutual information. A basic property of the mutual information is that

I(X; Y) = H(X) − H(X|Y)

That is, knowing Y, we can save an average of I(X;Y) bits in encoding X compared to not knowing Y. Mutual information is symmetric:

I(X; Y) = I(Y; X) = H(X) + H(Y) − H(X, Y)

Mutual information can be expressed as the average Kullback-Leibler divergence (information gain) of the posterior probability distribution of X given the value of Y to the prior distribution on X:

I(X; Y) = E_Y[ D_KL( p(X|Y) || p(X) ) ]

In other words, this is a measure of how much, on the average, the probability distribution on X will change if we are given the value of Y. This is often recalculated as the divergence from the product of the marginal distributions to the actual joint distribution:

I(X; Y) = D_KL( p(X, Y) || p(X) p(Y) )

Mutual information is closely related to the log-likelihood ratio test in the context of contingency tables and the multinomial distribution and to Pearson's χ² test: mutual information can be considered a statistic for assessing independence between a pair of variables, and it has a well-specified asymptotic distribution.

Kullback-Leibler divergence (information gain):


The Kullback-Leibler divergence (or information divergence, information gain, or relative entropy) is a way of comparing two distributions: a "true" probability distribution p(X), and an arbitrary probability distribution q(X). If we compress data in a manner that assumes q(X) is the distribution underlying some data, when, in reality, p(X) is the correct distribution, the Kullback-Leibler divergence is the number of average additional bits per datum necessary for compression. It is thus defined

D_KL( p(X) || q(X) ) = Σ_x p(x) log [ p(x) / q(x) ]

Although it is sometimes used as a 'distance metric', it is not a true metric since it is not symmetric and does not satisfy the triangle inequality (making it a semi-quasimetric).
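The identities above can be checked numerically. The sketch below (Python; the 2-by-2 joint distribution is a hypothetical example) computes the Kullback-Leibler divergence and uses it to obtain the mutual information as the divergence between the joint distribution and the product of its marginals:

import math

def kl(p, q, base=2):
    """D(p || q) = sum p log(p/q); assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi, base) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical joint distribution p(x, y) over X in {0, 1}, Y in {0, 1}
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
px = {x: sum(v for (xx, _), v in joint.items() if xx == x) for x in (0, 1)}
py = {y: sum(v for (_, yy), v in joint.items() if yy == y) for y in (0, 1)}

# I(X;Y) = D( p(x,y) || p(x) p(y) )
p = list(joint.values())
q = [px[x] * py[y] for (x, y) in joint.keys()]
print("I(X;Y) = %.4f bits" % kl(p, q))   # about 0.278 bits for this example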

Coding theory:
Coding theory is one of the most important and direct applications of information theory. It can be subdivided into source coding theory and channel coding theory. Using a statistical description for data, information theory quantifies the number of bits needed to describe the data, which is the information entropy of the source.
Data compression (source coding): There are two formulations for the compression problem:
1. Lossless data compression: the data must be reconstructed exactly.
2. Lossy data compression: allocates the bits needed to reconstruct the data, within a specified fidelity level measured by a distortion function. This subset of information theory is called rate-distortion theory.
Error-correcting codes (channel coding): While data compression removes as much redundancy as possible, an error-correcting code adds just the right kind of redundancy (i.e., error correction) needed to transmit the data efficiently and faithfully across a noisy channel. This division of coding theory into compression and transmission is justified by the information transmission theorems, or source-channel separation theorems, that justify the use of bits as the universal currency for information in many contexts. However, these theorems only hold in the situation where one transmitting user wishes to communicate to one receiving user. In scenarios with more than one transmitter (the multiple-access channel), more than one receiver (the broadcast channel) or intermediary "helpers" (the relay channel), or more general networks, compression followed by transmission may no longer be optimal. Network information theory refers to these multi-agent communication models.

SOURCE THEORY:
Any process that generates successive messages can be considered a source of information. A memoryless source is one in which each message is an independent identically-distributed random variable, whereas the properties of ergodicity and stationarity impose more general constraints. All such sources are stochastic. These terms are well studied in their own right outside information theory.

Rate:
Information rate is the average entropy per symbol. For memoryless sources, this is merely the entropy of each symbol, while, in the case of a stationary stochastic process, it is

r = lim (n→∞) H( Xn | Xn−1, Xn−2, ..., X1 )

that is, the conditional entropy of a symbol given all the previous symbols generated. For the more general case of a process that is not necessarily stationary, the average rate is

r = lim (n→∞) (1/n) H( X1, X2, ..., Xn )

that is, the limit of the joint entropy per symbol. For stationary sources, these two expressions give the same result.

It is common in information theory to speak of the "rate" or "entropy" of a language. This is appropriate, for example, when the source of information is English prose. The rate of a source of information is related to its redundancy and how well it can be compressed, the subject of source coding.

Channel capacity:
Communication over a channel (such as an Ethernet cable) is the primary motivation of information theory. As anyone who's ever used a telephone (mobile or landline) knows, however, such channels often fail to produce exact reconstruction of a signal; noise, periods of silence, and other forms of signal corruption often degrade quality. How much information can one hope to communicate over a noisy (or otherwise imperfect) channel? Consider the communications process over a discrete channel. A simple model of the process is shown below:

Here X represents the space of messages transmitted, and Y the space of messages received during a unit time over our channel. Let p(y | x) be the conditional probability distribution function of Y given X. We will consider p(y | x) to be an inherent fixed property of our communications channel (representing the nature of the noise of our channel). Then the joint distribution of X and Y is completely determined by our channel and by our choice of f(x), the marginal distribution of messages we choose to send over the channel. Under these constraints, we would like to maximize the rate of information, or the signal, we can communicate over the channel. The appropriate measure for this is the mutual information, and this maximum mutual information is called the channel capacity and is given by:

C = sup over f(x) of I(X; Y)

This capacity has the following property related to communicating at information rate R (where R is usually bits per symbol). For any information rate R < C and coding error ε > 0, for large enough N, there exists a code of length N and rate R and a decoding algorithm, such that the maximal probability of block error is ≤ ε; that is, it is always possible to transmit with arbitrarily small block error. In addition, for any rate R > C, it is impossible to transmit with arbitrarily small block error. Channel coding is concerned with finding such nearly optimal codes that can be used to transmit data over a noisy channel with a small coding error at a rate near the channel capacity.
BIT RATE: In telecommunications and computing, bit rate (sometimes written bitrate, data rate, or as a variable R or fb) is the number of bits that are conveyed or processed per unit of time. The bit rate is quantified using the bits per second (bit/s or bps) unit, often in conjunction with an SI prefix such as kilo- (kbit/s or kbps), mega- (Mbit/s or Mbps), giga- (Gbit/s or Gbps) or tera- (Tbit/s or Tbps). Note that, unlike many other computer-related units, 1 kbit/s is traditionally defined as 1,000 bit/s, not 1,024 bit/s, and this was so even before 1999, when distinct prefixes for binary multiples of units of information were introduced in the standard IEC 60027-2.

The formal abbreviation for "bits per second" is "bit/s" (not "bits/s", see writing style for SI units). In less formal contexts the abbreviations "b/s" or "bps" are often used, though this risks confusion with "bytes per second" ("B/s", "Bps"). 1 Byte/s (Bps or B/s) corresponds to 8 bit/s (bps or b/s).

Shannon-Fano coding
In the field of data compression, Shannon-Fano coding, named after Claude Elwood Shannon and Robert Fano, is a technique for constructing a prefix code based on a set of symbols and their probabilities (estimated or measured). It is suboptimal in the sense that it does not achieve the lowest possible expected code word length like Huffman coding; however, unlike Huffman coding, it does guarantee that all code word lengths are within one bit of their theoretical ideal −log P(x). The technique was proposed in Shannon's "A Mathematical Theory of Communication", his 1948 article introducing the field of information theory. The method was attributed to Fano, who later published it as a technical report. Shannon-Fano coding should not be confused with Shannon coding, the coding method used to prove Shannon's noiseless coding theorem, or with Shannon-Fano-Elias coding (also known as Elias coding), the precursor to arithmetic coding. In Shannon-Fano coding, the symbols are arranged in order from most probable to least probable, and then divided into two sets whose total probabilities are as close as possible to being equal. All symbols then have the first digits of their codes assigned; symbols in the first set receive "0" and symbols in the second set receive "1". As long as any sets with more than one member remain, the same process is repeated on those sets, to determine successive digits of their codes. When a set has been reduced to one symbol, of course, this means the symbol's code is complete and will not form the prefix of any other symbol's code. The algorithm works, and it produces fairly efficient variable-length encodings; when the two smaller sets produced by a partitioning are in fact of equal probability, the one bit of information used to distinguish them is used most efficiently. Unfortunately, Shannon-Fano does not always produce optimal prefix codes; the set of probabilities {0.35, 0.17, 0.17, 0.16, 0.15} is an example of one that will be assigned non-optimal codes by Shannon-Fano coding. For this reason, Shannon-Fano is almost never used; Huffman coding is almost as computationally simple and produces prefix codes that always achieve the lowest expected code word length, under the constraint that each symbol is represented by a code formed of an integral number of bits. This is a constraint that is often unneeded, since the codes will be packed end-to-end in long sequences. If we consider groups of codes at a time, symbol-by-symbol Huffman coding is only optimal if the probabilities of the symbols are independent and are some power of a half, i.e., of the form 1/2^k. In most

situations, arithmetic coding can produce greater overall compression than either Huffman or Shannon-Fano, since it can encode in fractional numbers of bits which more closely approximate the actual information content of the symbol. However, arithmetic coding has not superseded Huffman the way that Huffman supersedes Shannon-Fano, both because arithmetic coding is more computationally expensive and because it is covered by multiple patents. Shannon-Fano coding is used in the IMPLODE compression method, which is part of the ZIP file format.

SHANNON-FANO ALGORITHM:
A Shannon-Fano tree is built according to a specification designed to define an effective code table. The actual algorithm is simple:
1. For a given list of symbols, develop a corresponding list of probabilities or frequency counts so that each symbol's relative frequency of occurrence is known.
2. Sort the list of symbols according to frequency, with the most frequently occurring symbols at the left and the least common at the right.
3. Divide the list into two parts, with the total frequency counts of the left half being as close to the total of the right as possible.
4. The left half of the list is assigned the binary digit 0, and the right half is assigned the digit 1. This means that the codes for the symbols in the first half will all start with 0, and the codes in the second half will all start with 1.
5. Recursively apply steps 3 and 4 to each of the two halves, subdividing groups and adding bits to the codes until each symbol has become a corresponding code leaf on the tree.
Example

The example shows the construction of the Shannon-Fano code for a small alphabet. The five symbols which can be coded have the following frequencies:

Symbol        A            B            C            D            E
Count         15           7            6            6            5
Probability   0.38461538   0.17948718   0.15384615   0.15384615   0.12820513

All symbols are sorted by frequency, from left to right (shown in Figure a). Putting the dividing line between symbols B and C results in a total of 22 in the left group and a total of 17 in the right group. This minimizes the difference in totals between the two groups. With this division, A and B will each have a code that starts with a 0 bit, and the C, D, and E codes will all start with a 1, as shown in Figure b. Subsequently, the left half of the tree gets a new division between A and B, which puts A on a leaf with code 00 and B on a leaf with code 01. After four division procedures, a tree of codes results. In the final tree, the three symbols with the highest frequencies have all been assigned 2-bit codes, and two symbols with lower counts have 3-bit codes, as shown in the table below:

Symbol   A    B    C    D     E
Code     00   01   10   110   111

This results in 2 bits for A, B and C and 3 bits each for D and E, giving an average of
2 × (0.38461538 + 0.17948718 + 0.15384615) + 3 × (0.15384615 + 0.12820513) ≈ 2.28 bits per symbol.
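A direct Python transcription of the five steps listed above is sketched below; applied to the example frequencies it reproduces the codes 00, 01, 10, 110, 111. The function is an illustrative implementation, not a library routine.

def shannon_fano(symbols):
    """symbols: list of (symbol, weight) pairs, sorted by descending weight."""
    if len(symbols) == 1:
        return {symbols[0][0]: ""}
    total = sum(w for _, w in symbols)
    best_diff, split = None, 1
    for i in range(1, len(symbols)):           # find the most balanced split point
        left = sum(w for _, w in symbols[:i])
        diff = abs(total - 2 * left)           # |left total - right total|
        if best_diff is None or diff < best_diff:
            best_diff, split = diff, i
    codes = {}
    for prefix, group in (("0", symbols[:split]), ("1", symbols[split:])):
        for sym, code in shannon_fano(group).items():
            codes[sym] = prefix + code
    return codes

freqs = [("A", 15), ("B", 7), ("C", 6), ("D", 6), ("E", 5)]
print(shannon_fano(freqs))   # {'A': '00', 'B': '01', 'C': '10', 'D': '110', 'E': '111'}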

HUFFMAN CODING:
The Shannon-Fano algorithm doesn't always generate an optimal code. In 1952, David A. Huffman gave a different algorithm that always produces an optimal tree for any given probabilities. While the Shannon-Fano tree is created from the root to the leaves, the Huffman algorithm works from the leaves to the root, in the opposite direction.
1. Create a leaf node for each symbol and add it to a priority queue, using its frequency of occurrence as the priority.
2. While there is more than one node in the queue:
   1. Remove the two nodes of lowest probability or frequency from the queue.
   2. Prepend 0 and 1 respectively to any code already assigned to these nodes.
   3. Create a new internal node with these two nodes as children and with probability equal to the sum of the two nodes' probabilities.
   4. Add the new node to the queue.
3. The remaining node is the root node and the tree is complete.
Example

Using the same frequencies as for the Shannon-Fano example above, viz:

Symbol        A            B            C            D            E
Count         15           7            6            6            5
Probability   0.38461538   0.17948718   0.15384615   0.15384615   0.12820513

In this case D & E have the lowest frequencies and so are allocated 0 and 1 respectively and grouped together with a combined probability of 0.28205128. The lowest pair now are B and C so they're allocated 0 and 1 and grouped together with a combined probability of 0.33333333. This leaves BC and DE now with the lowest probabilities so 0 and 1 are prepended to their codes and they are combined. This then leaves just A and BCDE, which have 0 and 1 prepended respectively and are then combined. This leaves us with a single node and our algorithm is complete. The code lengths for the different characters this time are 1 bit for A and 3 bits for all other characters.

Symbol   A   B     C     D     E
Code     0   100   101   110   111

This results in 1 bit for A and 3 bits each for B, C, D and E, giving an average of
1 × 0.38461538 + 3 × (0.17948718 + 0.15384615 + 0.15384615 + 0.12820513) ≈ 2.23 bits per symbol.
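The bottom-up procedure can be written compactly with a priority queue, as in the Python sketch below using the standard heapq module. Tie-breaking between equal weights may differ from the worked example, which can change individual codes but not the code lengths or the average length here.

import heapq

def huffman(freqs):
    """freqs: dict symbol -> weight. Returns dict symbol -> code string."""
    # Each heap entry: (weight, tie_breaker, {symbol: partial_code})
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)          # two lowest-weight nodes
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (w1 + w2, count, merged))
        count += 1
    return heap[0][2]

weights = {"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}
codes = huffman(weights)
print(codes)
avg = sum(len(codes[s]) * w for s, w in weights.items()) / sum(weights.values())
print("average length = %.2f bits/symbol" % avg)   # about 2.23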

Lempel-Ziv-Welch:
Lempel-Ziv-Welch (LZW) is a universal lossless data compression algorithm created by Abraham Lempel, Jacob Ziv, and Terry Welch. It was published by Welch in 1984 as an improved implementation of the LZ78 algorithm published by Lempel and Ziv in 1978. The algorithm is simple to implement and has very high throughput.

ALGORITHM:
Idea: The scenario described in Welch's 1984 paper[1] encodes sequences of 8-bit data as fixed-length 12-bit codes. The codes from 0 to 255 represent 1-character sequences consisting of the corresponding 8-bit character, and the codes 256 through 4095 are created in a dictionary for sequences encountered in the data as it is encoded. At each stage in compression, input bytes are gathered into a sequence until the next character would make a sequence for which there is no code yet in the dictionary. The code for the sequence (without that character) is emitted, and a new code (for the sequence with that character) is added to the dictionary. The idea was quickly adapted to other situations. In an image based on a color table, for example, the natural character alphabet is the set of color table indexes, and in the 1980s, many images had small color tables (on the order of 16 colors). For such a reduced alphabet, the full 12-bit codes yielded poor compression unless the image was large, so the idea of a variable-width code was introduced: codes typically start one bit wider than the symbols being encoded, and as each code size is used up, the code width increases by 1 bit, up to some prescribed maximum (typically 12 bits). Further refinements include reserving a code to indicate that the code table should be cleared (a "clear code", typically the first value immediately after the values for the individual alphabet characters), and a code to indicate the end of data (a "stop code", typically one greater than the clear code). The clear code allows the table to be reinitialized after it fills up, which lets the encoding adapt to changing patterns in the input data. Smart encoders can monitor the compression efficiency and clear the table whenever the existing table no longer matches the input well. Since the codes are added in a manner determined by the data, the decoder mimics building the table as it sees the resulting codes. It is critical that the encoder and decoder agree on which variety of LZW is being used: the size of the alphabet, the maximum code width, whether variable-width encoding is being used, the initial code size, whether to use the clear and stop codes (and what values they have). Most formats that employ LZW build

this information into the format specification or provide explicit fields for them in a compression header for the data.
Encoding: A dictionary is initialized to contain the single-character strings corresponding to all the possible input characters (and nothing else except the clear and stop codes if they're being used). The algorithm works by scanning through the input string for successively longer substrings until it finds one that is not in the dictionary. When such a string is found, the index for the string less the last character (i.e., the longest substring that is in the dictionary) is retrieved from the dictionary and sent to output, and the new string (including the last character) is added to the dictionary with the next available code. The last input character is then used as the next starting point to scan for substrings. In this way, successively longer strings are registered in the dictionary and made available for subsequent encoding as single output values. The algorithm works best on data with repeated patterns, so the initial parts of a message will see little compression. As the message grows, however, the compression ratio tends asymptotically to the maximum.
Decoding: The decoding algorithm works by reading a value from the encoded input and outputting the corresponding string from the initialized dictionary. At the same time it obtains the next value from the input, and adds to the dictionary the concatenation of the string just output and the first character of the string obtained by decoding the next input value. The decoder then proceeds to the next input value (which was already read in as the "next value" in the previous pass) and repeats the process until there is no more input, at which point the final input value is decoded without any more additions to the dictionary. In this way the decoder builds up a dictionary which is identical to that used by the encoder, and uses it to decode subsequent input values. Thus the full dictionary does not need to be sent with the encoded data; just the initial dictionary containing the single-character strings is sufficient (and is typically defined beforehand within the encoder and decoder rather than being explicitly sent with the encoded data).
Variable-width codes: If variable-width codes are being used, the encoder and decoder must be careful to change the width at the same points in the encoded data, or they will disagree about where the boundaries between individual codes fall in the stream. In the standard version, the encoder increases the width from p to p + 1 when a sequence ω + s is encountered that is not in the table (so that a code must be added for it) but the next available code in the table is 2^p (the first code requiring p + 1 bits). The encoder emits the code for ω at width p (since that code does not require p + 1 bits), and then increases the code width so that the next code emitted will be p + 1 bits wide. The decoder is always one code behind the encoder in building the table, so when it sees the code for ω, it will generate an entry for code 2^p − 1. Since this is the point where the encoder will increase the code width, the decoder must increase the width here as well: at the point where it generates the largest code that will fit in p bits. Unfortunately, some early implementations of the encoding algorithm increase the code width and then emit ω at the new width instead of the old width, so that to the decoder it looks like the width changes one code too early.
This is called "Early Change"; it caused so much confusion that Adobe now allows both versions in PDF files, but

includes an explicit flag in the header of each LZW-compressed stream to indicate whether Early Change is being used. Most graphic file formats do not use Early Change. When the table is cleared in response to a clear code, both encoder and decoder change the code width after the clear code back to the initial code width, starting with the code immediately following the clear code.
Packing order: Since the codes emitted typically do not fall on byte boundaries, the encoder and decoder must agree on how codes are packed into bytes. The two common methods are LSB-first ("least significant bit first") and MSB-first ("most significant bit first"). In LSB-first packing, the first code is aligned so that the least significant bit of the code falls in the least significant bit of the first stream byte, and if the code has more than 8 bits, the high-order bits left over are aligned with the least significant bit of the next byte; further codes are packed with LSB going into the least significant bit not yet used in the current stream byte, proceeding into further bytes as necessary. MSB-first packing aligns the first code so that its most significant bit falls in the MSB of the first stream byte, with overflow aligned with the MSB of the next byte; further codes are written with MSB going into the most significant bit not yet used in the current stream byte.
Example: The following example illustrates the LZW algorithm in action, showing the status of the output and the dictionary at every stage, both in encoding and decoding the data. This example has been constructed to give reasonable compression on a very short message. In real text data, repetition is generally less pronounced, so longer input streams are typically necessary before the compression builds up efficiency. The plaintext to be encoded (from an alphabet using only the capital letters) is:
TOBEORNOTTOBEORTOBEORNOT#
The # is a marker used to show that the end of the message has been reached. There are thus 26 symbols in the plaintext alphabet (the 26 capital letters A through Z), plus the stop code #. We arbitrarily assign these the values 1 through 26 for the letters, and 0 for '#'. (Most flavors of LZW would put the stop code after the data alphabet, but nothing in the basic algorithm requires that. The encoder and decoder only have to agree what value it has.) A computer will render these as strings of bits. Five-bit codes are needed to give sufficient combinations to encompass this set of 27 values. The dictionary is initialized with these 27 values. As the dictionary grows, the codes will need to grow in width to accommodate the additional entries. A 5-bit code gives 2^5 = 32 possible combinations of bits, so when the 33rd dictionary word is created, the algorithm will have to switch at that point from 5-bit strings to 6-bit strings (for all code values, including those which were previously output with only five bits). Note that since the all-zero code 00000 is used, and is labeled "0", the 33rd dictionary entry will be labeled 32. (Previously generated output is not affected by the code-width change, but once a 6-bit value is generated in the dictionary, it could conceivably be the next code emitted, so the width for subsequent output shifts to 6 bits to accommodate that.)

The initial dictionary, then, will consist of the following entries:

Symbol   Binary   Decimal
#        00000    0
A        00001    1
B        00010    2
C        00011    3
D        00100    4
E        00101    5
F        00110    6
G        00111    7
H        01000    8
I        01001    9
J        01010    10
K        01011    11
L        01100    12
M        01101    13
N        01110    14
O        01111    15
P        10000    16
Q        10001    17
R        10010    18
S        10011    19
T        10100    20
U        10101    21
V        10110    22
W        10111    23
X        11000    24
Y        11001    25
Z        11010    26

Encoding: Buffer input characters in a sequence ω until ω + next character is not in the dictionary. Emit the code for ω, and add ω + next character to the dictionary. Start buffering again with the next character.

Current Sequence   Next Char   Output Code   Output Bits   Extended Dictionary   Comments
NULL               T
T                  O           20            10100         27: TO                27 = first available code after 0 through 26
O                  B           15            01111         28: OB
B                  E           2             00010         29: BE
E                  O           5             00101         30: EO
O                  R           15            01111         31: OR
R                  N           18            10010         32: RN                32 requires 6 bits, so for next output use 6 bits
N                  O           14            001110        33: NO
O                  T           15            001111        34: OT
T                  T           20            010100        35: TT
TO                 B           27            011011        36: TOB
BE                 O           29            011101        37: BEO
OR                 T           31            011111        38: ORT
TOB                E           36            100100        39: TOBE
EO                 R           30            011110        40: EOR
RN                 O           32            100000        41: RNO
OT                 #           34            100010                              # stops the algorithm; send the current sequence
                               0             000000                              and the stop code

Unencoded length = 25 symbols × 5 bits/symbol = 125 bits. Encoded length = (6 codes × 5 bits/code) + (11 codes × 6 bits/code) = 96 bits. Using LZW has saved 29 bits out of 125, reducing the message by almost 22%. If the message were longer, then the dictionary words would begin to represent longer and longer sections of text, allowing repeated words to be sent very compactly.

Decoding: To decode an LZW-compressed archive, one needs to know in advance the initial dictionary used, but additional entries can be reconstructed as they are always simply concatenations of previous entries.

Input Bits   Code   Output Sequence   Full New Entry   Conjectured Entry   Comments
10100        20     T                                   27: T?
01111        15     O                 27: TO            28: O?
00010        2      B                 28: OB            29: B?
00101        5      E                 29: BE            30: E?
01111        15     O                 30: EO            31: O?
10010        18     R                 31: OR            32: R?              created code 31 (last to fit in 5 bits)
001110       14     N                 32: RN            33: N?              so start using 6 bits
001111       15     O                 33: NO            34: O?
010100       20     T                 34: OT            35: T?
011011       27     TO                35: TT            36: TO?
011101       29     BE                36: TOB           37: BE?             36 = TO + 1st symbol (B) of next coded sequence received (BE)
011111       31     OR                37: BEO           38: OR?
100100       36     TOB               38: ORT           39: TOB?
011110       30     EO                39: TOBE          40: EO?
100000       32     RN                40: EOR           41: RN?
100010       34     OT                41: RNO           42: OT?
000000       0      #

At each stage, the decoder receives a code X; it looks X up in the table and outputs the sequence ω it codes, and it conjectures ω + ? as the entry the encoder just added, because the encoder emitted X for ω precisely because ω + ? was not in the table, and the encoder goes ahead and adds it. But what is the missing letter? It is the first letter in the sequence coded by the next code Z that the decoder receives. So the decoder looks up Z, decodes it into the sequence Ω and takes the first letter z and tacks it onto the end of ω as the next dictionary entry. This works as long as the codes received are in the decoder's dictionary, so that they can be decoded into sequences. What happens if the decoder receives a code Z that is not yet in its dictionary? Since the decoder is always just one code behind the encoder, Z can be in the encoder's dictionary only if the encoder just generated it, when emitting the previous code X for ω. Thus Z codes some Ω that is ω + ?, and the decoder can determine the unknown character as follows:
1. The decoder sees X and then Z.
2. It knows X codes the sequence ω and Z codes some unknown sequence Ω.
3. It knows the encoder just added Z to code ω + some unknown character,
4. and it knows that the unknown character is the first letter z of Ω.
5. But the first letter of Ω (= ω + ?) must then also be the first letter of ω.
6. So Ω must be ω + x, where x is the first letter of ω.
7. So the decoder figures out what Z codes even though it's not in the table,
8. and upon receiving Z, the decoder decodes it as ω + x, and adds ω + x to the table as the value of Z.
This situation occurs whenever the encoder encounters input of the form cScSc, where c is a single character, S is a string and cS is already in the dictionary, but cSc is not. The encoder emits the code for cS, putting a new code for cSc into the dictionary. Next it sees cSc in the input (starting at the second c of cScSc) and emits the new code it just inserted. The argument above shows that whenever the decoder receives a code not in its dictionary, the situation must look like this. Although input of the form cScSc might seem unlikely, this pattern is fairly common when the input stream is characterized by significant repetition. In particular, long strings of a single character (which are common in the kinds of images LZW is often used to encode) repeatedly generate patterns of this sort.
Further coding: The simple scheme described above focuses on the LZW algorithm itself. Many applications apply further encoding to the sequence of output symbols. Some package the coded stream as printable characters using some form of binary-to-text encoding; this will increase the encoded length and decrease the compression rate. Conversely, increased compression can often be achieved with an adaptive entropy encoder. Such a coder estimates

the probability distribution for the value of the next symbol, based on the observed frequencies of values so far. Standard entropy encoding such as Huffman coding or arithmetic coding then uses shorter codes for values with higher probabilities. Uses: When it was introduced, LZW compression provided the best compression ratio among all well-known methods available at that time. It became the first widely used universal data compression method on computers. A large English text file can typically be compressed via LZW to about half its original size. LZW was used in the program compress, which became a more or less standard utility in Unix systems circa 1986. It has since disappeared from many distributions, for both legal and technical reasons, but as of 2008 at least FreeBSD includes both compress and uncompress as a part of the distribution. Several other popular compression utilities also used LZW, or closely related methods. LZW became very widely used when it became part of the GIF image format in 1987. It may also (optionally) be used in TIFF and PDF files. (Although LZW is available in Adobe Acrobat software, Acrobat by default uses the DEFLATE algorithm for most text and color-table-based image data in PDF files.)
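A minimal Python sketch of the dictionary scheme described above is given below. It uses fixed-width integer codes and initializes the dictionary with all 256 single-byte values (rather than the 27-symbol alphabet and variable-width bit packing of the worked example), so it illustrates only the dictionary logic.

def lzw_encode(data):
    """Return a list of integer codes for the input string."""
    dictionary = {chr(i): i for i in range(256)}
    next_code = 256
    w, out = "", []
    for c in data:
        if w + c in dictionary:
            w += c
        else:
            out.append(dictionary[w])          # emit code for longest known prefix
            dictionary[w + c] = next_code      # register the extended string
            next_code += 1
            w = c
    if w:
        out.append(dictionary[w])
    return out

def lzw_decode(codes):
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    w = chr(codes[0])
    out = [w]
    for k in codes[1:]:
        if k in dictionary:
            entry = dictionary[k]
        else:                                  # the cScSc case discussed above
            entry = w + w[0]
        out.append(entry)
        dictionary[next_code] = w + entry[0]   # conjectured entry becomes definite
        next_code += 1
        w = entry
    return "".join(out)

msg = "TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_encode(msg)
print(codes)
assert lzw_decode(codes) == msg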

Shannon's Theorem:
Shannon's Theorem gives an upper bound to the capacity of a link, in bits per second (bps), as a function of the available bandwidth and the signal-to-noise ratio of the link. The Theorem can be stated as: C = B * log2(1+ S/N) where C is the achievable channel capacity, B is the bandwidth of the line, S is the average signal power and N is the average noise power. The signal-to-noise ratio (S/N) is usually expressed in decibels (dB) given by the formula: 10 * log10(S/N) so for example a signal-to-noise ratio of 1000 is commonly expressed as 10 * log10(1000) = 30 dB. Here is a graph showing the relationship between C/B and S/N (in dB):

Examples Here are two examples of the use of Shannon's Theorem. Modem For a typical telephone line with a signal-to-noise ratio of 30 dB and an audio bandwidth of 3 kHz, we get a maximum data rate of: C = 3000 * log2(1001) which is a little less than 30 kbps. Satellite TV Channel For a satellite TV channel with a signal-to-noise ratio of 20 dB and a video bandwidth of 10 MHz, we get a maximum data rate of: C = 10000000 * log2(101) which is about 66 Mbps.
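The two worked examples can be reproduced with a small helper that converts the signal-to-noise ratio from dB to a power ratio and applies Shannon's formula (an illustrative function, not part of any library):

import math

def capacity(bandwidth_hz, snr_db):
    """Shannon capacity C = B * log2(1 + S/N), with S/N given in dB."""
    snr = 10 ** (snr_db / 10)            # dB -> power ratio
    return bandwidth_hz * math.log2(1 + snr)

print(capacity(3_000, 30) / 1e3, "kbps")    # telephone line: ~29.9 kbps
print(capacity(10e6, 20) / 1e6, "Mbps")     # satellite TV channel: ~66.6 Mbps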

CHANNEL CAPACITY:
In electrical engineering, computer science and information theory, channel capacity is the tightest upper bound on the amount of information that can be reliably transmitted over a communications channel. By the noisychannel coding theorem, the channel capacity of a given channel is the limiting information rate (in units of information per unit time) that can be achieved with arbitrarily small error probability. Information theory, developed by Claude E. Shannon during World War II, defines the notion of channel capacity and provides a mathematical model by which one can compute it. The key result states that the capacity of the channel, as defined above, is given by the maximum of the mutual information between the input and output of the channel, where the maximization is with respect to the input distribution.

BANDWIDTH:
It has several related meanings:
Bandwidth (signal processing) or analog bandwidth, frequency bandwidth or radio bandwidth: a measure of the width of a range of frequencies, measured in hertz.
Bandwidth (computing) or digital bandwidth: a rate of data transfer, bit rate or throughput, measured in bits per second (bps).
Spectral line width: the width of an atomic or molecular spectral line, measured in hertz.


SIGNAL-TO-NOISE RATIO:
Signal-to-noise ratio (often abbreviated SNR or S/N) is a measure used in science and engineering to quantify how much a signal has been corrupted by noise. It is defined as the ratio of signal power to the noise power corrupting the signal. A ratio higher than 1:1 indicates more signal than noise. While SNR is commonly quoted for electrical signals, it can be applied to any form of signal (such as isotope levels in an ice core or biochemical signaling between cells). In less technical terms, signal-to-noise ratio compares the level of a desired signal (such as music) to the level of background noise. The higher the ratio, the less obtrusive the background noise is. "Signal-to-noise ratio" is sometimes used informally to refer to the ratio of useful information to false or irrelevant data in a conversation or exchange. For example, in online discussion forums and other online communities, off-topic posts and spam are regarded as "noise" that interferes with the "signal" of appropriate discussion. Signal-to-noise ratio is defined as the power ratio between a signal (meaningful information) and the background noise (unwanted signal):

SNR = P_signal / P_noise

where P is average power. Both signal and noise power must be measured at the same or equivalent points in a system, and within the same system bandwidth. If the signal and the noise are measured across the same impedance, then the SNR can be obtained by calculating the square of the amplitude ratio:

SNR = P_signal / P_noise = (A_signal / A_noise)^2

where A is root mean square (RMS) amplitude (for example, RMS voltage). Because many signals have a very wide dynamic range, SNRs are often expressed using the logarithmic decibel scale. In decibels, the SNR is defined as

SNR(dB) = 10 log10( P_signal / P_noise )

which may equivalently be written using amplitude ratios as

SNR(dB) = 20 log10( A_signal / A_noise )

The concepts of signal-to-noise ratio and dynamic range are closely related. Dynamic range measures the ratio between the strongest un-distorted signal on a channel and the minimum discernable signal, which for most purposes is the noise level. SNR measures the ratio between an arbitrary signal level (not necessarily the most powerful signal possible) and noise. Measuring signal-to-noise ratios requires the selection of a representative or reference signal. In audio engineering, the reference signal is usually a sine wave at a standardized nominal or alignment level, such as 1 kHz at +4 dBu (1.228 VRMS).

SNR is usually taken to indicate an average signal-to-noise ratio, as it is possible that (near) instantaneous signal-to-noise ratios will be considerably different. The concept can be understood as normalizing the noise level to 1 (0 dB) and measuring how far the signal 'stands out'.
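The power, amplitude and decibel forms of the definition can be made explicit with a small Python helper (the numerical values are illustrative only):

import math

def snr_db_from_power(p_signal, p_noise):
    return 10 * math.log10(p_signal / p_noise)

def snr_db_from_amplitude(a_signal, a_noise):
    # Squaring the amplitude ratio doubles the multiplier: 20 log10(A_s / A_n)
    return 20 * math.log10(a_signal / a_noise)

print(snr_db_from_power(1000, 1))          # 30.0 dB
print(snr_db_from_amplitude(31.62, 1.0))   # ~30.0 dB, consistent with a power ratio of 1000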

Mutual information:
In probability theory and information theory, the mutual information (sometimes known by the archaic term transinformation) of two random variables is a quantity that measures the mutual dependence of the two variables. The most common unit of measurement of mutual information is the bit, when logarithms to the base 2 are used.
Definition of mutual information: Formally, the mutual information of two discrete random variables X and Y can be defined as:

I(X; Y) = Σ_y Σ_x p(x, y) log [ p(x, y) / ( p1(x) p2(y) ) ]

where p(x,y) is the joint probability distribution function of X and Y, and p1(x) and p2(y) are the marginal probability distribution functions of X and Y respectively. In the case of continuous random variables, the summation is replaced by a definite double integral:

I(X; Y) = ∫∫ p(x, y) log [ p(x, y) / ( p1(x) p2(y) ) ] dx dy

where p(x,y) is now the joint probability density function of X and Y, and p1(x) and p2(y) are the marginal probability density functions of X and Y respectively. These definitions are ambiguous because the base of the log function is not specified. To disambiguate, the function I could be parameterized as I(X,Y,b) where b is the base. Alternatively, since the most common unit of measurement of mutual information is the bit, a base of 2 could be specified. Intuitively, mutual information measures the information that X and Y share: it measures how much knowing one of these variables reduces our uncertainty about the other. For example, if X and Y are independent, then knowing X does not give any information about Y and vice versa, so their mutual information is zero. At the other extreme, if X and Y are identical then all information conveyed by X is shared with Y: knowing X determines the value of Y and vice versa. As a result, in the case of identity the mutual information is the same as the uncertainty contained in Y (or X) alone, namely the entropy of Y (or X: clearly if X and Y are identical they have equal entropy). Mutual information quantifies the dependence between the joint distribution of X and Y and what the joint distribution would be if X and Y were independent. Mutual information is a measure of dependence in the following sense: I(X; Y) = 0 if and only if X and Y are independent random variables. This is easy to see in one direction: if X and Y are independent, then p(x, y) = p(x) p(y), and therefore:

Moreover, mutual information is nonnegative (i.e. I(X;Y) ≥ 0; see below) and symmetric (i.e. I(X;Y) = I(Y;X)).

CHANNEL CAPACITY:
In electrical engineering, computer science and information theory, channel capacity is the tightest upper bound on the amount of information that can be reliably transmitted over a communications channel. By the noisy-channel coding theorem, the channel capacity of a given channel is the limiting information rate (in units of information per unit time) that can be achieved with arbitrarily small error probability. Information theory, developed by Claude E. Shannon during World War II, defines the notion of channel capacity and provides a mathematical model by which one can compute it. The key result states that the capacity of the channel, as defined above, is given by the maximum of the mutual information between the input and output of the channel, where the maximization is with respect to the input distribution.
Formal definition

Let X represent the space of signals that can be transmitted, and Y the space of signals received, during a block of time over the channel. Let

p_{Y|X}(y|x)

be the conditional distribution function of Y given X. Treating the channel as a known statistical system, p_{Y|X}(y|x) is an inherent fixed property of the communications channel (representing the nature of the noise in it). Then the joint distribution

p_{X,Y}(x, y)

of X and Y is completely determined by the channel and by the choice of

p_X(x),

the marginal distribution of signals we choose to send over the channel. The joint distribution can be recovered by using the identity

p_{X,Y}(x, y) = p_{Y|X}(y|x) p_X(x)

Under these constraints, next maximize the amount of information, or the message, that one can communicate over the channel. The appropriate measure for this is the mutual information I(X;Y), and this maximum mutual information is called the channel capacity and is given by

C = sup over p_X of I(X; Y)

Noisy-channel coding theorem: The noisy-channel coding theorem states that for any ε > 0 and for any rate R less than the channel capacity C, there is an encoding and decoding scheme that can be used to ensure that the probability of block error is less than ε for a sufficiently long code. Also, for any rate greater than the channel capacity, the probability of block error at the receiver goes to one as the block length goes to infinity. Example application:

An application of the channel capacity concept to an additive white Gaussian noise (AWGN) channel with B Hz bandwidth and signal-to-noise ratio S/N is the Shannon-Hartley theorem:

C = B log2( 1 + S/N )

C is measured in bits per second if the logarithm is taken in base 2, or nats per second if the natural logarithm is used, assuming B is in hertz; the signal and noise powers S and N are measured in watts or volts², so the signal-to-noise ratio here is expressed as a power ratio, not in decibels (dB); since figures are often cited in dB, a conversion may be needed. For example, 30 dB is a power ratio of 10^(30/10) = 10^3 = 1000.
Slow-fading channel: In a slow-fading channel, where the coherence time is greater than the latency requirement, there is no definite capacity, as the maximum rate of reliable communications supported by the channel, log2(1 + |h|² SNR), depends on the random channel gain |h|². If the transmitter encodes data at rate R [bits/s/Hz], there is a certain probability that the decoding error probability cannot be made arbitrarily small, in which case the system is said to be in outage. With a non-zero probability that the channel is in deep fade, the capacity of the slow-fading channel in the strict sense is zero. However, it is possible to determine the largest value of R such that the outage probability p_out is less than ε. This value is known as the ε-outage capacity.
FAST-FADING CHANNEL: In a fast-fading channel, where the latency requirement is greater than the coherence time and the codeword length spans many coherence periods, one can average over many independent channel fades by coding over a large number of coherence time intervals. Thus, it is possible to achieve a reliable rate of communication of E[ log2(1 + |h|² SNR) ] [bits/s/Hz], and it is meaningful to speak of this value as the capacity of the fast-fading channel.
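As an illustration of the slow-fading discussion, the sketch below estimates the outage probability of a chosen rate R by Monte Carlo, assuming Rayleigh fading so that |h|² is exponentially distributed with unit mean; that fading model is an added assumption, not something stated above.

import math, random

def outage_probability(rate_bps_hz, snr_linear, trials=100_000):
    """Estimate P[ log2(1 + |h|^2 * SNR) < R ] for Rayleigh fading (|h|^2 ~ Exp(1))."""
    fails = 0
    for _ in range(trials):
        h2 = random.expovariate(1.0)          # |h|^2, unit-mean exponential
        if math.log2(1 + h2 * snr_linear) < rate_bps_hz:
            fails += 1
    return fails / trials

snr = 10 ** (10 / 10)                         # 10 dB signal-to-noise ratio
print("AWGN capacity:", math.log2(1 + snr), "bit/s/Hz")       # ~3.46
print("outage prob at R = 1 bit/s/Hz:", outage_probability(1.0, snr))  # ~0.095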

RATE DISTORTION THEORY:


Rate-distortion theory is a major branch of information theory which provides the theoretical foundations for lossy data compression; it addresses the problem of determining the minimal amount of entropy (or information) R that should be communicated over a channel, so that the source (input signal) can be approximately reconstructed at the receiver (output signal) without exceeding a given distortion D.
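For the classical example of a memoryless Gaussian source with squared-error distortion (a standard closed-form case quoted here as an illustration, not derived in the text), the rate-distortion function is R(D) = (1/2) log2(σ²/D) for 0 < D ≤ σ² and zero otherwise:

import math

def rate_distortion_gaussian(variance, distortion):
    """R(D) in bits/sample for a memoryless Gaussian source under MSE distortion."""
    if distortion >= variance:
        return 0.0                    # reproducing the mean alone already meets the target
    return 0.5 * math.log2(variance / distortion)

print(rate_distortion_gaussian(1.0, 0.25))   # 1.0 bit/sample
print(rate_distortion_gaussian(1.0, 1.5))    # 0.0 (no information needed)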

IMPORTANT QUESTIONS PART A All questions Two Marks 1. What is entropy? 2. What is a prefix code? 3. Define information rate. 4. What is the channel capacity of a binary symmetric channel with error probability of 0.2? 5. State the channel coding theorem. 6. Define entropy for a discrete memoryless source. 7. What is channel redundancy? 8. Write down the formula for the mutual information. 9. When is the average information delivered by a source of alphabet size 2 maximum? 10. Name the source coding techniques. 11. Write down the formula for mutual information. 12. Write the expression for code efficiency in terms of entropy. 13. Is the information of a continuous system non-negative? If so, why? 14. Explain the significance of the entropy H(X/Y) of a communication system where X is the transmitter and Y is the receiver. 15. An event has six possible outcomes with probabilities 1/2, 1/4, 1/8, 1/16, 1/32, 1/32. Find the entropy of the system.

PART B 1. Discuss the source coding theorem, give the advantages and disadvantages of channel coding in detail, and discuss data compaction. (16) 2. Explain in detail the Huffman coding algorithm and compare this with the other types of coding. (8) 3. Explain the properties of entropy and, with a suitable example, explain the entropy of a binary memoryless source. (8) 4. What is entropy? Explain the important properties of entropy. (8) 5. Five symbols of the alphabet of a discrete memoryless source and their probabilities are given below. (8) S=[S0,S1,S2,S3,S4] P[S]=[.4,.2,.2,.1,.1] Code the symbols using Huffman coding. 6. Write short notes on differential entropy, derive the channel capacity theorem and discuss the implications of the information capacity theorem. (16) 7. What do you mean by a binary symmetric channel? Derive the channel capacity formula for the symmetric channel. (8) 8. Construct a binary optimal code for the following probability symbols using the Huffman procedure and calculate the entropy of the source, average code length, efficiency, redundancy and variance: 0.2, 0.18, 0.12, 0.1, 0.1, 0.08, 0.06, 0.06, 0.06, 0.04 (16) 9. Define mutual information. Find the relation between the mutual information and the joint entropy of the channel input and channel output. Explain the important properties of mutual information. (16) 10. Derive the expression for the channel capacity of a continuous channel. Find also the expression for the channel capacity of a continuous channel of infinite bandwidth. Comment on the results. (16)

UNIVERSITY QUESTIONS

Reg. No.

Question Paper Code: E3077


B.E./B.Tech. Degree Examinations, Apr/May 2010 Regulations 2008 Fourth Semester Electronics and Communication Engineering EC2252 Communication Theory Time: Three Hours Maximum: 100 Marks Answer ALL Questions Part A - (10 x 2 = 20 Marks) 1. How many AM broadcast stations can be accommodated in a 100 kHz bandwidth if the highest frequency modulating a carrier is 5 kHz? 2. What are the causes of linear distortion? 3. Draw the block diagram of a method for generating a narrowband FM signal. 4. A carrier wave of frequency 100 MHz is frequency modulated by a signal 20 sin(200×10³ t). What is the bandwidth of the FM signal if the frequency sensitivity of the modulator is 25 kHz/V? 5. When is a random process called deterministic? 6. A receiver connected to an antenna of resistance 50 Ω has an equivalent noise resistance of 30 Ω. Find the receiver noise figure. 7. What are the characteristics of superheterodyne receivers? 8. What are the methods to improve FM threshold reduction? 9. Define the entropy function. 10. Define rate bandwidth and bandwidth efficiency.

Part B - (5 x 16 = 80 Marks) 11. (a) (i) Draw an envelope detector circuit used for demodulation of AM and explain its operation. (10) (ii) How can SSB be generated using Weaver's method? Illustrate with a neat block diagram. (6) OR 11. (b) (i) Discuss in detail frequency translation and the frequency division multiplexing technique with diagrams. (10) (ii) Compare Amplitude Modulation and Frequency Modulation. (6)

12. (a)

(i) Using suitable mathematical analysis, show that FM modulation produces infinite sidebands. Also deduce an expression for the frequency modulated output and its frequency spectrum. (10) (ii) How can you generate FM from PM and PM from FM? (6) OR

12. (b)

(i) A 20 MHz carrier is frequency modulated by a sinusoidal signal such that the maximum frequency deviation is 100 kHz. Determine the modulation index and approximate bandwidth of the FM signal for the following modulating signal frequencies: (1) 1 kHz (2) 100 kHz and (3) 500 kHz. (8) (ii) Derive the time domain expressions of FM and PM signals. (8)

13. (a) (i) Given a random process X(t) = A cos(ωt + θ), where A and ω are constants and θ is a uniform random variable, show that X(t) is ergodic in both mean and autocorrelation. (8) (ii) Write a short note on shot noise and also explain the power spectral density of shot noise. (8) OR 13. (b) Write in detail about narrowband noise and the properties of the quadrature components of narrowband noise. (16) 14. (a) Derive an expression for the SNR at the input (SNRc) and output (SNRo) of a coherent detector. (16)

OR 14. (b) (i) Explain pre-emphasis and de-emphasis in detail. 15. (a) (i) Find the code words for five symbols of the alphabet of a discrete memoryless source with probabilities {0.4, 0.2, 0.2, 0.1, 0.1}, using Huffman coding, and determine the source entropy and average code word length. (10) (ii) Discuss the source coding theorem. (6) OR 15. (b) (i) Derive the channel capacity of a continuous band-limited white Gaussian noise channel. (10) (ii) Discuss rate distortion theory. (6)
