
MATHEMATICAL STATISTICS OF PROBABILITY MODELS

NET/JRF/IAS/ISS/JAM/GATE/STATISTICAL INFERENCE

A. SANTHAKUMARAN
About the Author
A. Santhakumaran received his Ph.D. in Mathematics - Statistics from the Ramanujan Institute for Advanced Study in Mathematics, University of Madras. He has rich experience in teaching and research. He has held positions as Associate Professor and Head of the Department of Statistics at Salem Sowdeswari College, Salem, and Professor of Mathematics at the Indian Institute of Food Processing Technology, Thanjavur, Tamil Nadu. He has published research papers in Queuing Theory, Statistical Quality Control, Neural Networks, Fuzzy Statistics and Food Processing. He is the author of the book Fundamentals of Testing Statistical Hypotheses.
Dedicated to all My Teachers
A.Santhakumaran
PREFACE

Human knowledge and practical activity encompass the systematic study of the behaviour of physical observations and experiments. The purpose of this book is different from that of traditional course books. The objective is to provide the basic concepts in an elementary presentation that emphasises the fundamentals of the mathematical statistics of predictive models. The book has evolved as a versatile, powerful and indispensable instrument for analysing statistical data in real life problems. We have reached a stage where no empirical science can afford to ignore this science.
A mathematical model is a logical description of a system and of how it performs. Mathematical models are based on the systematic study of knowledge, provided the facts exist in the world. Mathematical models involve only manual calculation errors and disclose the facts as they are. Probability models, or predictive models, are based on the outcomes of physical experiments. The probability models must be identified, since they describe the causes of variation well. Predictive models contain experimental and computational errors. Simulation is a process of imitating a real system over time; it affects the logic and contains computational errors. An experimenter is interested in taking a decision about a real system without errors. The evaluation of models definitely depends on the percentage of error. The per cent error of a trial is

    (True value − Experimental value)/(True value) × 100.

The experimenter expects error-free results for a good decision with a minimum of observed data, thereby reducing cost, administrative inconvenience and time. Apart from the reduction of data, predictive models require ideal statistics for choosing the best probability models. The methods of identification deal with the estimation theory of statistical inference, namely point and confidence interval estimation. Prediction analysis is very important for making a decision. The book is intended to serve for reaching these goals.
Keeping this in mind, the first chapter of the book deals with Mathematical Models and Computational Methods. The second chapter consists of the identification of probability models from physical experimental data and also provides some of the well known distributions. Chapter 3 gives the criteria of point estimation. Chapter 4 focuses on the study of optimal estimation. Chapter 5 illustrates the properties of the regular family of distributions. Chapter 6 explains the methods of estimation. Chapter 7 discusses interval estimation and Bayesian estimation.
DISTINCTIVE FEATURES

• Care has been taken to provide conceptual clarity, simplicity and up-to-date material for current situations.

• Properly graded and solved problems to illustrate each concept and procedure are presented in the text.

• About three hundred solved problems and fifty remarks are provided to induce self-thinking.

• A separate chapter deals exclusively with the regular family of distributions.

• The book is intended to serve as a textbook for a one-semester course on Statistical Inference for Under Graduate and Post Graduate Statistics students of Indian Universities and other Applicable Sciences, Allied Statistical Courses, Mathematical Sciences, and various UGC competitive examinations like IAS, GATE, JAM, JRF, NET, ISS, SLET, etc.

A.Santhakumaran
CONTENTS

1. Mathematical Modeling and Computational Methods

1.1 Introduction . . .

1.2 Building mathematical models

1.3 Basic principles of science

1.4 Governing principles of mathematical models

1.4.1 Idealization

1.4.2 Formulation

1.4.3 Manipulation

1.4.4 Reformulation

1.4.5 Evaluation

1.4.6 Justification

1.4.7 Validation

1.5 Simulation velocity

1.6 Making decision

2. Description of Probability Models

2.1 Introduction

2.1.1 Statistics as Statistical Data

2.1.2 Statistics as Statistical Methods

2.1.3 Statistics as Statistical Models and Methods

2.2 Collecting Statistical data

2.3 Probability models from historical data

2.4 Basic concepts of probability


2.4.1 Random experiment

2.4.2 Random variable

2.5 Discrete probability models

2.6 Discrete group family of distributions

2.7 Continuous probability models

2.8 Distribution functions of random variables

2.8.1 Cumulative distribution function technique

2.8.2 Change of variable technique

2.9 Empirical probability model

2.9.1 Empirical continuous probability model

2.9.2 Empirical discrete probability model

2.10 Illustration of a probability model

2.11 Recognition test of probability model

Problems

3. Governing Criteria of Point Estimation

3.1 Introduction

3.2 Estimators

3.3 Loss function

3.4 Point estimation

3.5 Problems of point estimation

3.6 Criteria of the point estimation

3.7 Consistent estimator

3.7.1 Sufficient condition for consistent estimator


3.7.2 Invariance property of consistent estimator

3.8 Unbiased estimator

3.9 Sufficient Statistics

3.10 Neyman Criteria of sufficient Statistic

3.11 Exponential family of distributions

3.12 Distribution admitting sufficient statistics

3.13 Joint sufficient statistics

3.14 Efficient estimator

Problems

4. Complete Family of Distributions

4.1 Introduction

4.2 Minimal Statistic with ancillary information

4.3 Complete Statistics and Completeness of family of distributions

4.4 Minimal sufficient statistic

4.5 Method of constructing Minimal sufficient statistics

Problems

5 Optimal Estimation

5.1 Introduction

5.2 Uniformly Minimum Variance Unbiased Estimator

5.3 Un-correlativeness Approach

5.4 Data Reduction Statistic Approach

5.5 Information Inequality Approach


5.6 An Improvement of Cramer -Rao Inequality

5.7 Efficiency of a Statistic

5.8 Extension of Information Inequality

5.9 Multi-Parameter Information Inequality

5.10 Higher Order Information Inequality

Problems

6 Methods of Point Estimation

6.1 Introduction

6.2 Method of Maximum Likelihood Estimation

6.3 Numerical Methods of Maximum Likelihood Estimation

6.4 Optimum properties of MLE

6.5 Method of Minimum Variance Bound Estimation

6.6 Method of Moment Estimation

6.7 Method of Minimum Chi-Square Estimation

6.8 Method of Least Square Estimation

6.9 Best Linear Unbiased Estimator

Problems

7 Interval Estimation

7.1 Introduction

7.2 Confidence Intervals

7.3 Alternative Method of Confidence Intervals

7.4 Shortest Length Confidence Intervals


7.5 Bayes Estimation

7.6 Bayes Risk Related to Prior Information

7.7 Bayes Point Estimation

7.8 Bayes Risk Related to Posterior Information

7.9 Bayes Minimax Estimation

7.10 Bayes confidence intervals

Problems

Answers

Appendix

Glossary of Notation

Bibliography

Index
1. MATHEMATICAL MODELING AND COMPUTATIONAL METHODS

1.1 Introduction

This chapter motivates students to develop an active capacity for reading, the power of understanding and self-thinking, and to implement the knowledge in practice for creative work in the disciplines. For this purpose, a free falling object in vacuum at sea level on the surface of the earth is considered for building the mathematical model and the computational methods. The governing principles of building mathematical models are Idealization, Formulation, Manipulation, Reformulation if necessary, Evaluation, Justification and Validation. Finally the mathematical model results are compared with the predictive model and simulation method results.
Mathematical models are a golden chance for finding the facts from the outcomes of a random experiment. The non-mathematical nature of physical experimental data is the birth of mathematical modeling. A mathematical model is a logical description of a system and of how it performs. It is a symbolic representation of the non-mathematical form of real life problems which tells the system behaviour and helps to understand the system features before conducting experiments. Mathematical models can be classified into deterministic and predictive models. Mathematical or deterministic models are based on assumptions, axioms, principles and statements. Predictive or empirical models are obtained from the outcomes of physical experimental data after conducting experiments. Simulation is the process of designing a model of a real system and arbitrarily building the models for the purpose of understanding the behaviour and the operation of the system; in other words, simulation methods are used to get an idea of how the system will behave in future. In the simulation method one arbitrarily generates numerical outcomes of the physical experiments, without complicated integration and differential equations, with the help of computer software. Modeling on the outcomes of physical experimental values reduces time and is less expensive, whereas the non-mathematical form of the physical experimental data does not. Modeling is a value addition and adds scope to the non-mathematical form of the physical experimental observations. Mathematical models have (i) an analytical solution, (ii) a graphical solution and (iii) a numerical solution. Numerical solutions consist of (a) the finite difference method, (b) the finite element method and (c) the simulation method or bootstrap method.
An analytic solution gives how a mathematical model behaves under all circumstances. It is also known as a closed form. It helps to standardize or optimize the outcomes of the physical experimental non-mathematical form. An analytical solution contains only manual errors, but a predictive model consists of manual and experimental errors, whereas a simulation method involves manual and logic errors.

1.2 Building mathematical models

Building mathematical models depends upon the objectives for studying a particular problem. Based on the objectives, list out the causes and their effects on the problem for building the mathematical models. For example, consider studying the effect of velocity on a free falling object from a moderate height in vacuum at sea level on the surface of the earth. On the earth, the causes are

• distance

• time of travel

• gravitational force

• air resistance

• air flow

• size of the object

• shape of the object

• mass of the object, and so on

The velocity of the falling object depends on these causes and thereby the mathematical model is built.

1.3 Developing scientific understanding

Science is the systematic study of knowledge, provided the facts exist in the world. The systematic study of knowledge is based on assumptions, axioms, principles and hypotheses. Developing scientific understanding starts from these concepts for building mathematical models.

(i) Assumption: A fact is accepted as true or certain to happen without proof. For example, heat energy is transferred from more energetic to less energetic particles due to the energy gradient of an object.

(ii) Axiom: A proposition which is regarded as established, accepted or self-evidently true. For example, if a polygon is closed with three sides, then it is a triangle.

(iii) Principle: A statement of a conceptual idea which is always accepted as true. Aristotle's abstract idea (proximate matter) is that human bodies are composed of fire, water, air and earth. This principle inspires the taking of medicines for human diseases.

(iv) Hypothesis: A proposed explanation made on the basis of limited evidence as a starting point for further investigation. For example, which team gets to choose its option in a cricket match is decided by tossing a coin. The hypothesis is that team A chooses first if team A gets a head in the toss; otherwise team B chooses. The decision is taken after performing the coin tossing experiment. Thus a hypothesis needs evidence to support the proposal.

1.4 Governing principles of mathematical models

The governing principles of building a mathematical model of the velocity of a free falling object from a moderate height in vacuum at sea level on the surface of the earth are illustrated below.

1.4.1 Idealization

To build a mathematical model, a researcher forms an idea of how to treat the causes of the free falling object's motion. Not all causes contribute significantly to the velocity of the object; if some of the causes are not significant, they are not included in the model. For a falling object, the size, shape and mass of the object have no effect on its motion. The object falls in a vacuum, so that the causes size, shape, mass, air resistance, air flow and air density have no influence on the velocity of the object. The remaining causes, distance traveled, time of travel and gravitational force, alone affect the velocity of the object. Idealization means removing the unimportant or non-significant causes, so that the significant causes alone are considered for constructing the mathematical model.

1.4.2 Formulation

To build a mathematical model, first state the assumptions or principles or axioms or hypotheses related to the objectives of the study. For the motion of a free falling object from a moderate height in vacuum at sea level on the surface of the earth, the assumption is that the rate of change of the distance x in metres (m) of an object at any time t in seconds (s) is directly proportional to the distance already fallen,

    i.e., dx/dt ∝ x ⇒ dx/dt = kx

where k is the constant of proportionality of the free falling object.

1.4.3 Manipulation

One uses one's mathematical knowledge to manipulate and arrive at the solution of the mathematical model by graphical or analytical or numerical methods. The analytical solution for the motion of the free falling object is obtained by the variables separable method of integration with respect to the variables x and t:

    ∫ dx/x = ∫ k dt + log c
    log x = kt + log c
    log(x/c) = kt
    x = c e^(kt)

Using the initial condition t = 0, x = 0 gives c = 0. Thus x = 0 ∀ t ≥ 0. This means that the object remains in the same place for all time, i.e., the object does not move. Here the manipulation is perfect, but the equation x = 0 ∀ t ≥ 0 is not meaningful. Thus the assumption is wrong and needs reformulation.
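As an illustrative check of this manipulation (assuming a Python environment with the sympy library available), the differential equation dx/dt = kx with the initial condition x(0) = 0 can be solved symbolically; the only solution is x ≡ 0, confirming that the assumption must be reformulated.

```python
# Illustrative sketch: solve dx/dt = k*x with x(0) = 0 symbolically (sympy assumed available).
import sympy as sp

t, k = sp.symbols("t k", positive=True)
x = sp.Function("x")
solution = sp.dsolve(sp.Eq(x(t).diff(t), k * x(t)), x(t), ics={x(0): 0})
print(solution)   # x(t) = 0 for all t: the object never moves under this assumption
```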

1.4.4 Reformulation

The reform is that the rate of change of the distance x of the free falling object at any time t is directly proportional to the time t it has been falling, i.e., dx/dt ∝ t ⇒ dx/dt = kt, where k is the proportionality constant. It is the same for all objects, no matter what the object is. Weight is the only force acting on the object when it is falling. Newton's second law of motion is F = ma kg·m/s² (one kilogram force = 9.81 Newton), where F is the force, m the mass and a the acceleration. Thus a = F/m, and here the force is the weight W. Therefore

    a = W/m ⇒ a = mg/m = g m/s²

which is independent of the mass, so that the object's mass has no effect on the motion of the falling object. Thus the proportionality constant is k = g. Now the differential equation becomes

    ∫ dx = ∫ gt dt + c

After the integration,

    x = (1/2)gt² + c

Using the initial condition, we get c = 0,

    i.e., x = (1/2)gt² (m/s²) × s² = (1/2)gt² m

Velocity is obtained by differentiating x = (1/2)gt² with respect to t. It gives

    v(t) = distance traveled / time of travel = dx/dt = gt m/s

If an additional force acts on the motion of the free falling object, then the velocity is

    v(t) = gt + v₀ m/s

where v₀ is the initial velocity and g = 9.81 m/s².



Mathematical Model Results

Table 1.1 shows the actual calculated results based on the analytical solution, velocity v(t) = gt m/s where g = 9.81 m/s², for the given values of time t in seconds and distance x = (1/2)gt² m.

Table 1.1 Actual values

Time t (s)            1     2     3     4     5      6      7       8      9       10
Distance x (m)        4.9   19.6  44.1  78.4  122.5  176.4  240.01  313.6  397.31  490.50
Velocity v(t) (m/s)   9.81  19.6  29.4  39.24 49.05  58.86  68.67   78.40  88.29   98.10
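As an illustrative sketch (assuming a Python environment), the entries of Table 1.1 can be generated directly from the analytical solution:

```python
# Sketch: compute distance and velocity of the free falling object from the
# analytical solution x = (1/2) g t^2 and v(t) = g t, for t = 1, ..., 10 s.
g = 9.81   # acceleration due to gravity in m/s^2

for t in range(1, 11):
    x = 0.5 * g * t ** 2   # distance fallen, in metres
    v = g * t              # velocity, in m/s
    print(f"t = {t:2d} s   x = {x:7.2f} m   v = {v:6.2f} m/s")
```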

Experimental Results

Initially a stationary ball is allowed to fall freely under gravity; the distance traveled is directly proportional to the square of the elapsed time. The image of the ball over a period of half a second is captured with a flash camera at 20 flashes per second. During the first 1/20 of a second the ball drops one unit of distance (scale: one unit is equal to 12 cm), after 2/20 of a second it has dropped 4 units, and so on. Table 1.2 illustrates the experimental values. Figure 1.1 shows the free falling ball, which drops from a moderate height at sea level on the surface of the earth; the successive positions are marked at 0, 1², 2², ..., 10² units.

Figure 1.1 Free falling ball motion



Table 1.2 Experimental values

Object     Snap time   Distance traveled   Distance   Velocity v(t)     Velocity v(t)   Δv(t) = v(t_{i+1}) − v(t_i)
position   t (s)       (units)             (cm)       (cm per 1/20 s)   (cm/s)          (cm/s)
0          0           0                   0          0                 0               0
1          1/20        1                   12         12                12              12
2          2/20        4                   48         24                48              36
3          3/20        9                   108        36                108             60
4          4/20        16                  192        48                192             84
5          5/20        25                  300        60                300             108
6          6/20        36                  432        72                432             132
7          7/20        49                  588        84                588             156
8          8/20        64                  768        96                768             180
9          9/20        81                  972        108               972             204
10         10/20       100                 1200       120               1200            228

Predictive models

The predictive model of the free falling object is obtained by the least square method of fitting the linear curve v(t) on t. The standardized form of the curve is v(t) = 9.751t (note: the standardized form is independent of the unit of measurement). In the scale of measurement,

    v(t) = dx/dt = 9.751t m/s,  t = 0, 1, 2, ...

Integrating this with the initial condition t = 0 ⇒ c = 0, the estimated distance is

    x = 4.875t² m,  t = 0, 1, 2, ...
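The least squares slope of a line through the origin can be computed directly, as the following sketch shows (the data pairs here are illustrative placeholders, not the experimental values of Table 1.2; the fitted value 9.751 above is taken from the text):

```python
# Sketch: least squares fit of the model v(t) = k * t through the origin.
# The slope minimising sum (v_i - k t_i)^2 is k = sum(t_i v_i) / sum(t_i^2).

def fit_through_origin(t, v):
    """Return the least squares slope k for the model v = k * t."""
    return sum(ti * vi for ti, vi in zip(t, v)) / sum(ti * ti for ti in t)

# Illustrative placeholder data:
t = [1, 2, 3, 4, 5]
v = [9.8, 19.7, 29.2, 39.3, 48.9]
print(f"v(t) = {fit_through_origin(t, v):.3f} t")
```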

1.4.5 Evaluation

Any shortcoming of reality in the model is evaluated by the proportion of variability explained (the coefficient of determination R²), which should lie in the desirable interval 80 - 100%. This interval is considered on the basis of the conceptual principle of the normal human body water level: one knows that the normal human body contains approximately 80% water. If the proportion of variability does not lie in the interval 80 - 100%, then the model is affected by some kind of shortcoming. Further, the adjusted coefficient of determination is

    R0² = R² − (1 − R²) p/(n − p − 1)

where n stands for the number of trials and p stands for the number of independent variables. The adjusted R0² can be negative, but R² cannot take negative values. An increase in the adjusted R0² indicates that a newly added independent variable should be included in the future selection of the model building. Table 1.3 shows the observed and estimated values of the velocity.

Table 1.3 Observed and estimated values of velocity

Time t (s)             1     2     3     4     5     6     7     8     9     10
Observed v(t) (m/s)    9.81  19.6  29.43 39.24 49.05 58.86 68.67 78.40 88.29 98.10
Estimated v(t) (m/s)   9.75  19.50 29.25 39.00 48.75 58.50 68.25 78.01 87.75 97.51

The coefficient of determination R² measures the agreement between the observed and estimated values. Using Table 1.3,

    R² = Σ_{i=1}^n (Ei − Ō)² / Σ_{i=1}^n (Oi − Ō)² = 0.95

where Ei is the i-th estimated value, Oi is the i-th observed value and Ō = Σ_{i=1}^n Oi / n. R² lies in the desirable interval 80 - 100%. The effect on velocity is explained to about 95% by the cause (time). The unexplained factor left out is only 5%; it is negligible because this value is near zero. Thus v(t) = 9.751t m/s, t = 0, 1, 2, 3, ..., is evaluated as the best fit to the observed values in Table 1.3.
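The formula for R² used above can be written as a small helper function (a sketch only; the value 0.95 quoted above is taken as given):

```python
# Sketch: coefficient of determination as defined in the text,
# R^2 = sum (E_i - O_bar)^2 / sum (O_i - O_bar)^2.
def r_squared(observed, estimated):
    o_bar = sum(observed) / len(observed)
    explained = sum((e - o_bar) ** 2 for e in estimated)
    total = sum((o - o_bar) ** 2 for o in observed)
    return explained / total
```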

1.4.6 Justification

A predictive model is justified by the Root Mean Square Error (RMSE). Justification means that the observed and estimated values are close to each other. This can be easily visualized by drawing the curves of the observed and estimated values; when the two curves resemble each other, the predictive model is justified as the best one. As an alternative to the graphical method, an RMSE of the model tending to zero is appropriate, i.e., the RMSE lies in the desirable interval 0 to 1, or the coefficient of variation = (RMSE / average velocity of the observed values) × 100 lies in the desirable interval 0 to 20%. This is due to the law of the natural environment: oxygen makes up only about 21% of the air in the atmosphere. The predictive model is apt for the experimental values when the coefficient of variation of the RMSE lies in the desirable interval 0 - 20%. From Table 1.3,

    RMSE = √[ Σ_{i=1}^n (Oi − Ei)² / n ] = 0.3594.

The calculated RMSE lies in the desirable interval 0 to 1, so the predictive model v(t) = 9.751t m/s is justified as appropriate to the experimental (observed) values. The coefficient of variation of the RMSE = (0.3594 / 53.95) × 100 = 0.6661%, where the average of the experimental values is 53.95, and it lies in the desirable interval 0 - 20%.
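A short sketch of this calculation, using the observed and estimated velocities of Table 1.3, is:

```python
# Sketch: RMSE and coefficient of variation of the fitted velocity, from Table 1.3.
from math import sqrt

O = [9.81, 19.6, 29.43, 39.24, 49.05, 58.86, 68.67, 78.40, 88.29, 98.10]   # observed
E = [9.75, 19.50, 29.25, 39.00, 48.75, 58.50, 68.25, 78.01, 87.75, 97.51]  # estimated

rmse = sqrt(sum((o - e) ** 2 for o, e in zip(O, E)) / len(O))
cv = rmse / (sum(O) / len(O)) * 100           # coefficient of variation, in per cent
print(f"RMSE = {rmse:.4f}, CV = {cv:.2f}%")   # about 0.36 and 0.67%, as in the text
```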

1.4.7 Validation

For the same set of experimental values, more than one predictive model may be suitable. In such a case the χ² statistic is used to test the goodness of fit at the 5% level of error for selecting the best model. The Chi-square test statistic is

    χ² = Σ_{i=1}^n (Oi − Ei)²/Ei ∼ χ² distribution with (n − k − 1) degrees of freedom

where n is the number of classes and k is the number of parameters estimated. Here the χ² value is equal to 0.019. The critical χ² (table) value for the χ² statistic is 15.50 with 8 degrees of freedom at the 5% level of error. The hypothesis to be tested is H0: the fitted velocity v(t) = 9.751t m/s is the best one, against the alternative H1: the fitted velocity curve is not the best one. The χ² test is a right-tailed one-sided test. The acceptance region of the hypothesis H0 is 0 to 15.50 and the rejection region of the hypothesis H0 is 15.50 to ∞. The calculated χ² value 0.019 falls in the acceptance region 0 to 15.50 of the hypothesis H0. This shows that the velocity v(t) = 9.751t m/s obtained from the experimental data is validated at the 5% level of error. Sometimes the Chi-square statistic values of all the predictive models are not significant at the fixed level of error, so that all the models are suitable for the same experimental values. In this case one selects the predictive model which has the least Chi-square statistic value. The reason is that the experimental error of the predictive model is (Oi − Ei)/Oi for a single observation, and for more than one observation the average error is (1/n) Σ_{i=1}^n (Oi − Ei)/Ei. For large n the Chi-square statistic value is equal to the modified Chi-square statistic value. So the minimum Chi-square statistic value among all the possible Chi-square statistic values gives the best predictive model.
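A sketch of this goodness-of-fit test, using Table 1.3 (scipy is assumed to be available for the critical value; otherwise the tabulated value 15.51 may be used):

```python
# Sketch: chi-square goodness-of-fit check of the fitted velocity v(t) = 9.751 t.
from scipy.stats import chi2

O = [9.81, 19.6, 29.43, 39.24, 49.05, 58.86, 68.67, 78.40, 88.29, 98.10]   # observed
E = [9.75, 19.50, 29.25, 39.00, 48.75, 58.50, 68.25, 78.01, 87.75, 97.51]  # estimated

stat = sum((o - e) ** 2 / e for o, e in zip(O, E))
df = len(O) - 1 - 1                  # n - k - 1, with k = 1 estimated parameter
critical = chi2.ppf(0.95, df)        # upper 5% point, about 15.51
print(f"chi-square = {stat:.3f}, critical value = {critical:.2f}")
print("accept H0" if stat <= critical else "reject H0")
```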



1.5 Simulation velocity

The simulation phenomenon of the velocity of the free falling object is illustrated by Monte-Carlo simulation, a simulation technique in which the statistical distribution functions are created using a series of random numbers. The simulation method is used to generate arbitrary velocity observations of the free falling object in vacuum at sea level on earth for checking the results of the mathematical and predictive models. The distances of the free falling object between successive positions (Δv(t)) are equally likely and uniformly distributed when the images are captured at each snap of the free falling object. A uniform distribution is expected for Δv(t), since the pictorial representation of velocity against time (Figure 1.2) shows that as time increases the velocity increases as a straight line. The slope of the straight line v(t) = gt m/s from a moderate height in vacuum at sea level on the surface of the earth is the constant rate of acceleration g m/s². This gives the logic of the Monte-Carlo simulation, namely that the probability density function of the falling positions of the object is

    f(t) = 1/10,  t = 1, 2, 3, ..., 10
    f(t) = 0      otherwise

Using this probability density function (pdf) of the successive positions, Table 1.4 is constructed.

Figure 1.2 Velocity with acceleration (velocity plotted against time)

Figure 1.3 Constant velocity with zero acceleration (velocity plotted against time)

Figure 1.3 shows the constant velocity of the falling objects with zero acceleration.

Table 1.4 Simulated velocity of experimental data for the free falling object

Velocity    pdf    pdf f(t)   Cumulative   Random     Generated
Δv(t)       f(t)   %          pdf f(t) %   interval   Δv(t)
12          0.1    10         10           00 - 09    180 (70)
36          0.1    10         20           10 - 19    108 (46)
60          0.1    10         30           20 - 29    108 (48)
84          0.1    10         40           30 - 39    132 (57)
108         0.1    10         50           40 - 49    60 (21)
132         0.1    10         60           50 - 59    132 (51)
156         0.1    10         70           60 - 69    180 (71)
180         0.1    10         80           70 - 79    132 (55)
204         0.1    10         90           80 - 89    204 (86)
228         0.1    10         100          90 - 99    60 (26)
Total = 1200                                          1244

The following random numbers are used and given in Table 1.5

Table 1.5 Random numbers


13692 70992 65172 28053 · · · · · · ··· ···
43905 46941 72300 11641 ··· ··· ··· ···
00504 48658 38051 59408 ··· ··· ··· ···
61274 57238 47267 35303 ··· ··· ··· ···
43753 21159 16239 50595 ··· ··· ··· ···
83505 51662 21636 68192 ··· ··· ··· ···
36807 71420 35804 44862 ··· ··· ··· ···
19110 55680 18792 41487 ··· ··· ··· ···
82615 86980 93290 87971 ··· ··· ··· ···
05612 26584 63013 61881 ··· ··· ··· ···

Choose arbitrarily a column or a row or a diagonal from the random number table and take two-digit numbers, because the velocity of the object consists of two digits only. Here the second column is chosen and ten two-digit numbers are taken successively: 70, 46, 48, 57, 21, 51, 71, 55, 86 and 26. The first number 70 falls in the interval 70 - 79 of Table 1.4; the value against that interval gives the velocity, which is 180 cm/s, and it is shown in the last column of Table 1.4. Similarly the velocities are simulated for the rest. The simulated velocity of the free falling object in vacuum at sea level is v(t) = 12.44 m/s.
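A minimal sketch of this Monte-Carlo step (assuming a Python environment), using the Δv(t) values of Table 1.2 and the interval-to-increment mapping of Table 1.4:

```python
# Sketch: Monte-Carlo generation of velocity increments, following Table 1.4.
# Each two-digit random number 00-99 falls in one of ten equally likely intervals,
# and each interval selects one of the increments Δv(t) listed in Table 1.2.
import random

increments = [12, 36, 60, 84, 108, 132, 156, 180, 204, 228]   # Δv(t) in cm/s

def simulated_velocity(numbers):
    """Sum the increments selected by two-digit random numbers (0-99)."""
    return sum(increments[r // 10] for r in numbers)

# The ten numbers read from the second column of Table 1.5:
print(simulated_velocity([70, 46, 48, 57, 21, 51, 71, 55, 86, 26]), "cm/s")

# A fresh simulation run with newly generated random numbers:
print(simulated_velocity([random.randint(0, 99) for _ in range(10)]), "cm/s")
```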

1.6 Making decision

The motion of free falling objects in vacuum from moderate height at sea level on
the surface of earth is summarized in the Table 1.6

Table 1.6 Comparison of results

Item        Mathematical method   Experimental method   Simulation method
Velocity    v(t) = 9.81t m/s      v(t) = 9.751t m/s     Nil
Distance    x = 4.90t² m          x = 4.875t² m         Nil
When time   t = 1.26 s            t = 1.26 s            t = 1.26 s
Distance    x = 7.70 m            x = 7.74 m            x = 7.74 m
Velocity    v(t) = 12.36 m/s      v(t) = 12.29 m/s      v(t) = 12.44 m/s

There were three methods of obtaining the velocity and distance of the free falling object in vacuum at sea level on the surface of the earth, and the results were close to each other. Of the three methods, the exact result was the mathematical model velocity v(t) = gt m/s and the distance x = (1/2)gt² m, where g = 9.81 m/s², for the motion of the free falling object in vacuum at sea level on the surface of the earth. Differentiating the distance equation x = (1/2)gt² twice with respect to time t gives the acceleration a = d²x/dt² = g m/s².

Galileo's remarkable observation is that all free falling objects fall at the same rate of acceleration in a vacuum regardless of their mass.
2. PROBABILITY DISTRIBUTION MODELS

2.1 Introduction

Scientific techniques are inevitable when researchers have to deal with historical or experimental data. The science of Statistics provides a systematic approach to making decisions which aims to resolve real life problems. It originated more than 2000 years ago, but it was recognized as a separate discipline in India only from 1940. From then till now, Statistics has been evolving as a versatile, powerful and indispensable instrument for investigation in all fields of real life problems. It provides a wide variety of analytical tools. We have reached a stage where no empirical science can afford to ignore the science of Statistics, since the diagnosis of pattern recognition can be achieved through the science of Statistics.
In India, during the period of Chandragupta Maurya, there was an efficient system of collecting official and administrative statistics. During Akbar's reign (1556 - 1605 A.D.) people maintained good records of land and agricultural statistics, and statistical surveys were also conducted during his reign. Sir Ronald A. Fisher, known as the father of Statistics, placed Statistics on a very sound footing by applying it to various diversified fields. His contributions in Statistics led to a very responsible position of Statistics among the sciences.
Professor P. C. Mahalanobis is the founder of Statistics in India. He was a Physicist by training, a Statistician by instinct and an Economist by conviction. The Government of India observes 29th June, the birthday of Professor Prasanta Chandra Mahalanobis, as National Statistics Day. Professor C. R. Rao is an Indian legend whose career spans the history of modern Statistics. He is considered by many to be the greatest living Statistician in the world today.
There are many definitions of the term Statistics. Some authors have defined Statistics as statistical data (plural sense) and others as statistical methods (singular sense).

2.1.1 Statistics as statistical data

Yule and Kendall define Statistics as quantitative data affected to a marked extent by a multiplicity of causes. Their definition points out the following characteristics:

• Statistics are aggregates of facts.

• Statistics are affected to a marked extent by multiplicity of causes.

• Statistics are numerically expressed.

• Statistics are enumerated or estimated according to reasonable standards of ac-


curacy.

• Statistics are collected in a systematic manner.

• Statistics are collected for a pre - determined purpose and

• Statistics should be placed in relation to each other.

2.1.2 Statistics as statistical methods

One of the best definitions of Statistics is given by Croxton and Cowden. They define Statistics as the science which deals with the collection, analysis and interpretation of numerical data. This definition points out the scientific ways of:

• Data collection

• Presentation

• Analysis

• Interpretation

2.1.3 Statistics as statistical models and methods

Statistics is an imposing form of Mathematics. The usage of statistical methods expanded briskly in the late 20th century, because the application value of statistical models and methods has great implications for many inter-disciplinary sciences. The author of this book views Statistics as the science of the winding and twisting network connecting Mathematics, Scientific Philosophy, Computer Software and other intellectual sources of the millennium. This may be taken as a modern definition of Statistics.
This definition reveals that Statisticians work to translate real life problems into mathematical models by using assumptions or axioms or principles. Then they derive exact solutions by their knowledge, thereby intellectually validating the results, and express their merits in non-mathematical forms which make for the consistency of real life problems.
In real life problems, there are many situations where the actions of entities within the system under study cannot be predicted with 100 percent perfection; there is always some variation. The variation can be classified into two categories: variation due to assignable causes, which has to be identified and eliminated, and variation due to chance causes, which corresponds to the 6σ value and is also called natural variation. In general, the reduction of natural variation is not necessary and involves more cost, so it is not feasible to reduce the natural variation. However, some appropriate statistical patterns of recognition may well describe the causes of variation.
An appropriate statistical pattern of recognition can be diagnosed by repeated sampling of the phenomenon. Then, through the systematic study of these data, a Statistician can obtain a known distribution suitable for the data and estimate the parameters of the distribution. A Statistician takes continuous efforts in the selection of a distribution form.

2.2 Collecting statistical data

Collection of data is one of the most important tasks in finding a solution for real life problems. Even if the statistical pattern of the real life problem is valid, if the data are inaccurately collected, inappropriately analyzed or not representative of the real life problem, then the data will be misleading when used for decision making.
One can learn data collection from actual experience. The following suggestions may enhance and facilitate data collection. Data collection and analysis must be tackled with great care.

(i) Before collecting data, planning is very important. It could commence with a practice of pre-observation; try to collect the data while pre-observing. Forms for the data are devised for the purpose, and it is very likely that these forms will have to be modified several times before the actual data collection begins. Watch for unusual situations or circumstances and consider how they will be handled. Planning is very important even if the data are collected automatically. After collecting the data, find out whether the collected data are appropriate or not.

(ii) If the data being collected are adequate to diagnose the statistical distribution, then determine the apt distribution. If the data being used are useless for diagnosing the statistical distribution, then there is no need to collect superfluous data.

(iii) Try to combine homogeneous data sets. Check the data for homogeneity in
successive time periods and during the same time period on successive interval
of times.

(iv) Beware of the possibility of data censoring, in which a quantity of interest is


not observed in its entirety. This problem most often occurs when the analyst is
interested in the time required to complete some process but the process begins
prior to or finishes after the completion of the observation period. Censoring can
result in especially long process times being left out of the data sample.

(v) One may use scatter diagram which indicates the relationship between the two
variables of interest.

(vi) Consider the possibility that a sequence of observations which appear to be independent
may possess autocorrelation. Autocorrelation may exist in successive time periods.

2.3 Probability distribution models from historical Data

The methods for selecting families of distributions are possible only if statistical data are available. The specific distribution within a family is specified by estimating its parameters. Parameter estimation of a distribution leads to the theory of estimation.
The formation of a frequency distribution or histogram is useful in guessing the shape of a distribution. Hines and Montgomery state that the number of class intervals should be chosen approximately equal to the square root of the sample size. If the intervals are too wide, the histogram will be coarse or blocky, and its shape and other details will not represent the data smoothly. So one has to allow the interval sizes to change until a good choice is found. The histogram for continuous data corresponds to the probability density function of a theoretical distribution. If the data are continuous, a line drawn through the centre point of each class interval frequency should result in a shape like that of a probability density function (pdf) (see Figure 2.2).
A histogram for discrete data, where there are a large number of data points, should have a cell for each value in the range of the data. However, if there are only a few data points, it may be necessary to combine adjacent cells to eliminate the ragged appearance of the histogram. If a histogram is associated with discrete data, it should look like a probability mass function (pmf) (see Figure 2.1).

2.4 Basic concepts of probability

Probability models can be identified from historical or experimental data through the statistician's continuous efforts in the selection of a distribution form. For probability models, one relies on the following basic concepts of probability theory.

2.4.1 Random experiment

An experiment is called a random experiment if it can be uniformly repeated under identical trials and all possible outcomes of the experiment cannot be predicted without performing the experiment. The set of all possible outcomes of the experiment is known as the sample space S. A single point in the sample space is an elementary event or simple event. A set function defined on the sample space is called a probability measure if

(i) P(A) ≥ 0 ∀ A ∈ S

(ii) P(S) = 1

(iii) for any disjoint sets Ai, i = 1, 2, 3, ..., on the sample space S, P(∪_{i=1}^∞ Ai) = Σ_{i=1}^∞ P(Ai)

2.4.2 Random variable

Suppose a coin is tossed twice, one by one successively or both at a time, under uniform conditions; then to each event of the sample space one assigns 0 to the outcome of getting two tails, 1 to the outcome of getting one head and 2 to the outcome of getting two heads. Let X denote the number of heads in tossing the two coins; then a real valued function X(ω) is induced on the sample space S = {(T, H) × (T, H)} = {TT, TH, HT, HH} as a random variable, where ω1 = {TT}, ω2 = {TH} or {HT}, ω3 = {HH} are simple events. The possible values of the random variable X(ω) are X(ω1) = 0, X(ω2) = 1 and X(ω3) = 2. Table 2.1 shows the associated values of the random variable and their probabilities.

Table 2.1 Associated values of the random variable

Simple event ω                                 ω1    ω2    ω3
Number of heads X(ω) = x                       0     1     2
P{X(ω) = x} = [2!/(x!(2 − x)!)](1/2)²          1/4   1/2   1/4

Thus a random variable X is a finite real valued function defined on the sample space S whose inverse image is an event, i.e., X⁻¹(B) = {ω : X(ω) ∈ B} ∈ S, ∀ B ∈ 𝔅, where B is a Borel set and 𝔅 is the Borel field generated by the class of all semi-closed intervals B ∈ ℝ. Here B1 = {0} = (−∞, 0], B2 = {1} = (0, 1] and B3 = {2} = (1, 2]. Further X⁻¹(B1) = ω1, X⁻¹(B2) = ω2 and X⁻¹(B3) = ω3 are events in the sample space S.

A function F(x) of the random variable X such that F(x) = P{X ≤ x} ∀ x ∈ ℝ is called the cumulative distribution function of the random variable. Properties of the distribution function F(x) are

(i) 0 ≤ F(x) ≤ 1

(ii) F(+∞) = 1 and F(−∞) = 0

(iii) F(x) is non-decreasing

(iv) F(x) is right continuous

A random variable X is called discrete if it takes on at most countably many values x0, x1, x2, x3, ... with probabilities P{X = xi} = p(xi) ≥ 0, ∀ i = 0, 1, 2, 3, ... and Σ_{i=0}^∞ p(xi) = 1. The set {p(xi), i = 0, 1, 2, 3, ...} is called the probability mass function (pmf) of the random variable X. Further, a random variable X is continuous if F(x) is absolutely continuous. If F(x) is absolutely continuous, then there exists a function f(x) such that F(x) = ∫_{−∞}^{x} f(u) du. The function f(x) is known as the probability density function (pdf) of the random variable X. If F(x) is absolutely continuous and f(x) is continuous, then ∂F(x)/∂x = f(x) almost surely.

2.5 Discrete probability distribution models

Discrete random variables are used to describe random phenomena in which only integer values can occur. The following are some important distributions.

(i) Bernoulli distribution


An experiment consists of n trials; each trial results in a success or a failure and each trial is repeated under the same conditions. Let Xj = 1 if the j-th experiment results in a success and Xj = 0 if the j-th experiment results in a failure, so that the sample space has the values 0 and 1. The trials are independent, each trial has only two possible outcomes (success or failure), and the probability of success θ remains constant from trial to trial. For one trial the pmf

    pθ(x) = θ^x (1 − θ)^(1−x),  x = 0, 1,  0 < θ < 1
    pθ(x) = 0                   otherwise

is the Bernoulli distribution function.

Under the above assumptions, in a production process, if X denotes the quality (defective or not) of a produced item, then X follows a Bernoulli distribution.

(ii) Binomial distribution


Let X denote the number of successes in n Bernoulli trials. Then the random variable X is called a Binomial random variable with parameters n and θ. Here the sample space is {0, 1, 2, ..., n} and the pmf is

    pθ(x) = [n!/(x!(n − x)!)] θ^x (1 − θ)^(n−x),  x = 0, 1, ..., n,  0 < θ < 1
    pθ(x) = 0                                    otherwise

In the Binomial distribution the mean is always greater than the variance. If X1, X2, ..., Xn are independent and identically distributed Bernoulli random variables, then Σ_{i=1}^n Xi ∼ b(n, θ). Problems relating to tossing a coin or throwing a die lead to the Binomial distribution. In a production process, the number x of defective units in a random sample of n units follows a Binomial distribution.

(iii) Geometric distribution


The random variable X is related to a sequence of Bernoulli trials in which the number of trials (x + 1) needed to achieve the first success has pmf

    pθ(x) = θ(1 − θ)^x,  x = 0, 1, 2, ...,  0 < θ < 1
    pθ(x) = 0            otherwise

It is the probability that the event {X = x} occurs when there are x failures followed by a success.

A couple decides to have children until they have a male child. If the probability of having a male child in their family is p, they wish to know how many children to expect before the first male child is born. Let X denote the number of female children preceding the first male child; then X is a geometric random variable.

(iv) Negative Binomial distribution


If X1, X2, ..., Xn are iid geometric variables, then T = t(X) = Σ_{i=1}^n Xi is a Negative Binomial variate whose pmf is

    pθ(t) = [(t + n − 1)!/(t!(n − 1)!)] θ^n (1 − θ)^t,  t = 0, 1, ...
    pθ(t) = 0                                          otherwise

The random variable X is related to a sequence of Bernoulli trials in which the probability of x failures preceding the n-th success in (x + n) trials is given by

    pθ(x) = [(x + n − 1)!/((n − 1)!x!)] θ^n (1 − θ)^x,  x = 0, 1, 2, ...
    pθ(x) = 0                                          otherwise

This will happen if the last trial results in a success and among the previous (n + x − 1) trials there are exactly x failures. Note that if n = 1, then pθ(x) is the Geometric distribution function. The Negative Binomial distribution has Mean < Variance. In a production process, the number of units that are required to achieve the n-th defective in x + n units follows a Negative Binomial distribution.

(v) Multinomial distribution


If the sample space of a random experiment is split into more than two mutually exclusive and exhaustive events, then one can define random variables which lead to the Multinomial distribution. Let E1, E2, ..., Ek be k mutually exclusive and exhaustive events of a random experiment with respective probabilities θ1, θ2, ..., θk, such that θ1 + θ2 + ··· + θk = 1 and 0 < θi < 1, i = 1, 2, ..., k. Then the probability that E1 occurs x1 times, E2 occurs x2 times, ..., Ek occurs xk times in n independent trials is known as the Multinomial distribution with pmf given by

    pθ1,θ2,...,θk(x1, x2, ..., xk) = [n!/(x1! x2! ··· xk!)] θ1^x1 θ2^x2 ··· θk^xk,  where Σ_{i=1}^k xi = n
    pθ1,θ2,...,θk(x1, x2, ..., xk) = 0  otherwise

If k = 2, that is, the number of mutually exclusive events is only two, then the Multinomial distribution becomes a Binomial distribution, which is given by

    pθ1,θ2(x1, x2) = [n!/(x1! x2!)] θ1^x1 θ2^x2,  where x1 + x2 = n and θ1 + θ2 = 1
    pθ1,θ2(x1, x2) = 0  otherwise

That is, x2 = n − x1 and θ2 = 1 − θ1, which implies

    pθ1(x1) = [n!/(x1!(n − x1)!)] θ1^x1 (1 − θ1)^(n−x1),  0 < θ1 < 1,  x1 = 0, 1, ..., n
    pθ1(x1) = 0  otherwise

Consider two brands A and B. Each individual in the population prefers brand A to brand B with probability θ1, prefers B to A with probability θ2, and is indifferent between brands A and B with probability θ3 = 1 − θ1 − θ2. In a random sample of n individuals, X1 prefer brand A, X2 prefer brand B and X3 prefer some brand other than A and B. Then the three random variables follow a Trinomial distribution, i.e.,

    pθ1,θ2,θ3(x1, x2, x3) = P{X1 = x1, X2 = x2, X3 = x3}
                          = [n!/(x1! x2! x3!)] θ1^x1 θ2^x2 θ3^x3,  x1 + x2 + x3 = n
                          = 0  otherwise

(vi) Discrete uniform probability distribution model


A random variable X is said to follow the Uniform distribution on N points (x1, x2, ..., xN) if its pmf is given by

    pN(x) = PN{X = xi} = 1/N,  i = 1, 2, ..., N and N ∈ I+
    pN(x) = 0                  otherwise

where I+ = {1, 2, 3, ...}.

A random experiment with complete uncertainty but whose outcomes have equal probabilities may be described by the Uniform distribution. In a finite population of N units, selecting any unit xi, i = 1, 2, ..., N, from the population by the simple random sampling technique gives a discrete Uniform distribution.

(vii) Hypergeometric probability distribution model


One situation in which Bernoulli trials are encountered is that in which an object
is drawn at random from a collection of two types objects in a box. In order to
repeat this experiment so that the results are independent and identically dis-
tributed, it is necessary to replace each object drawn and to mix the objects
before the next one is drawn. This process is referred to as sampling with re-
placement. If the sampling is done no replacement of the objects drawn, the
resulting trial are still of the Bernoulli type but no longer independent.
A.Santhakumaran 24

For example, four balls are drawn one at a time, at random and no replace-
ment from 8 balls in a box, 3 black and 5 red. The probability that the third
ball drawn is black, i.e.,

P { 3rd ball black} = P (RRB) + P (RBB) + P (BRB) + P (BBB)


5 4 3 5 3 2 3 5 2 3 2 1
= × × + × × + × × + × ×
8 7 6 8 7 6 8 7 6 8 7 6
3
=
8
which is the same as probability that the first ball drawn is black. It should not
be surprising that probability for black ball is the same on the third draw as on
the first draw.

In general case, n objects are to be drawn at random, one at a time, from


a collection of N objects, M of one kind and N − M of another kind. The one
kind and of object will be thought of as success and coded 1; the other kind is
coded 0. Let X1 , X2 , · · · , Xn denote the sequence of coded outcomes; that is Xi
is 1 or 0 according to whether the ith draw results in success or failure. The total
number of success in n trials is just the sum of X 0 s ,

Sn = X1 + X2 + · · · + Xn

as it was in the case of independent identically distributed Bernoulli trials. That


is, the probability of a 1 on the ith trial is the same at each trial:
M
P {Xi = 1} = i = 1, 2, · · · , n
N
One can observe first that the probability of a given sequence of N objects is
1 1 1
···
N N −1 N −n+1
The probability that an object of type 1 occurs in the ith position in the sequence
of N objects is
M (N − 1)(N − 2) · · · (N − n + 1)
P {Xi = 1} =
N (N − 1) · · · (N − n + 2)(N − n + 1)
M
= i = 1, 2, · · · , n
N

where M is the number of ways of selecting the i-th position with an object coded 1 and (N − 1)(N − 2) ··· (N − n + 1) is the number of ways of selecting the remaining (n − 1) places in the sequence from the (N − 1) remaining objects. It does not matter whether the n objects are drawn one at a time at random or all n are drawn simultaneously at random. The probability function of Sn is

    P{Sn = k} = C(M, k) C(N − M, n − k) / C(N, n),  k = 0, 1, 2, ..., min(n, M)
    P{Sn = k} = 0  otherwise

where C(a, b) = a!/(b!(a − b)!) denotes the number of combinations. The random variable Sn with the above probability function is said to have a Hypergeometric distribution. The mean of the random variable Sn is easily obtained from the representation of a Hypergeometric variable as a sum of Bernoulli trials. That is,

    Mean = E[Sn] = E[X1 + X2 + ··· + Xn]
                 = E[X1] + E[X2] + ··· + E[Xn]
                 = 1 × P{X1 = 1} + 0 × P{X1 = 0} + ··· + 1 × P{Xn = 1} + 0 × P{Xn = 0}
                 = M/N + ··· + M/N = nM/N

    Variance = V[Sn] = n (M/N) ((N − M)/N) ((N − n)/(N − 1))  if N ∈ I+   (2.1)

The probability at each trial that the object drawn is of the type of which there are initially M is p = M/N, and with q = 1 − p,

    V[Sn] = npq (N − n)/(N − 1)  if N ∈ I+   (2.2)

The equation (2.2) differs from the Binomial variance npq by the extra factor (N − n)/(N − 1). The variance is V[Sn] = npq (N − n)/(N − 1) in the no-replacement case and V[Sn] = npq in the replacement case; for fixed p and fixed n, the factor (N − n)/(N − 1) → 1 as N becomes infinitely large. Thus the Hypergeometric distribution is exact whereas the Binomial distribution is an approximate one.

Fifty students of the M.Sc. Statistics class in a certain college are divided at random into 5 batches of 10 each for the annual practical examination in Statistics. The class consists of 20 resident students and 30 non-resident students. Let X be the number of resident students in the first batch who appear for the practical examination. The Hypergeometric distribution is apt to describe the random variable X and has the pmf

    P{X = x} = C(20, x) C(30, 10 − x) / C(50, 10),  x = 0, 1, 2, ..., 10
    P{X = x} = 0  otherwise
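A sketch of this pmf (scipy is assumed to be available; the batch of 10 is drawn from 50 students of whom 20 are residents):

```python
# Sketch: Hypergeometric pmf for the number of resident students in a batch of 10
# drawn from 50 students, 20 of whom are residents.
from scipy.stats import hypergeom

rv = hypergeom(50, 20, 10)    # scipy order: population size, number of successes, draws
for x in range(11):
    print(f"P(X = {x:2d}) = {rv.pmf(x):.4f}")
print("mean =", rv.mean())    # 10 * 20/50 = 4 resident students on average
```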

(viii) Poisson distribution


Poisson random variable is used to describe rare events. For example number of
air crashes occurred on Monday in 3 pm to 5 pm. The pmf of Poisson random
variable is given as

 e−θ θx

θ > 0, x = 0, 1, 2, · · ·
x!
pθ (x) =
 0

otherwise

where θ is a parameter. One of the important properties of the Poisson distribu-


tion is that mean and variance are the same and are equal to θ. If X1 , X2 , · · · , Xn
are iid Poisson random variables with parameter θ, then the sum of random vari-
Pn
ables i=1 Xi follows a Poisson distribution with parameter nθ.

After correcting 50 pages of a book, the proof readers find that there are, on
the average 2 errors per 5 pages. One would like to know the number of pages
with errors 0 , 1, 2, 3 · · · in 10000 pages of the first print of the book. X denote
the number of errors per page; then the random variable X follow the Poisson
2
distribution with parameter θ = 5 = 0.4.
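A short sketch of the expected page counts (assuming θ = 0.4 and 10000 pages, as above):

```python
# Sketch: expected number of pages with 0, 1, 2, 3, ... errors among 10000 pages,
# when errors per page follow a Poisson distribution with theta = 0.4.
from math import exp, factorial

theta, pages = 0.4, 10000
for x in range(5):
    p = exp(-theta) * theta ** x / factorial(x)   # Poisson pmf at x
    print(f"pages with {x} errors: about {pages * p:.0f}")
```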

2.6 Discrete group family of probability distribution models

If the random variable X follows a power series distribution, then its pmf is

    Pθ{X = x} = a_x θ^x / f(θ),  x ∈ S;  a_x ≥ 0,  θ > 0
    Pθ{X = x} = 0                otherwise

where f(θ) is the generating function, i.e., f(θ) = Σ_{x∈S} a_x θ^x, θ > 0, so that f(θ) is positive, finite and differentiable, and S is a non-empty countable subset of the non-negative integers.

Particular Cases:
(i) Binomial distribution
Let θ = p/(1 − p), f(θ) = (1 + θ)^n and S = {0, 1, 2, 3, ..., n}, a set of non-negative integers. Then

    f(θ) = Σ_{x∈S} a_x θ^x
    (1 + θ)^n = Σ_{x=0}^n a_x θ^x  ⇒  a_x = C(n, x)

    P_p{X = x} = C(n, x) (p/(1 − p))^x / [1 + p/(1 − p)]^n
               = C(n, x) p^x q^(n−x),  x = 0, 1, 2, ..., n,  q = 1 − p
               = 0  otherwise

(ii) Negative Binomial distribution

Let θ = p/(1 + p), f(θ) = (1 − θ)^(−n) and S = {0, 1, 2, ...}, 0 ≤ θ < 1 and n ∈ I+. Now

    f(θ) = Σ_{x∈S} a_x θ^x
    (1 − θ)^(−n) = Σ_{x=0}^∞ a_x θ^x
    a_x = C(−n, x)(−1)^x = (−1)^x (−1)^x C(n + x − 1, x) = C(n + x − 1, x)

    P{X = x} = C(n + x − 1, x) (p/(1 + p))^x / [1 − p/(1 + p)]^(−n)
             = C(n + x − 1, x) (p/(1 + p))^x (1 + p)^(−n)
             = C(n + x − 1, x) p^x (1 + p)^(−(n+x))
             = C(−n, x) (−p)^x (1 + p)^(−(n+x)),  x = 0, 1, 2, ...

(iii) Poisson distribution

Let f(θ) = e^θ and S = {0, 1, 2, ...}. Now

    f(θ) = Σ_{x∈S} a_x θ^x
    e^θ = Σ_{x=0}^∞ a_x θ^x = Σ_{x=0}^∞ θ^x / x!  ⇒  a_x = 1/x!

    Pθ{X = x} = a_x θ^x / f(θ) = (θ^x / x!) e^(−θ) = e^(−θ) θ^x / x!,  x = 0, 1, 2, ...

Problem 2.1 Let X be a random variable taking the values 1, 2, 3, ... with P(X = x) = 1/2^x. Find P{X ≥ 15 | X is a multiple of 3}.
Solution: The conditional probability is

    P{X ≥ 15 ∩ X is a multiple of 3} / P{X is a multiple of 3}

    P{X is a multiple of 3} = P{X = 3 or 6 or 9 or ...} = 1/2^3 + 1/2^6 + 1/2^9 + ··· = 1/7.

    P{X ≥ 15 ∩ X is a multiple of 3} = P{X = 15 or 18 or 21 or ···}
        = 1/2^15 + 1/2^18 + 1/2^21 + ··· = (1/2^15)(1 + 1/2^3 + 1/2^6 + ···) = (1/2^15)(8/7) = (1/2^12)(1/7).

    Hence P{X ≥ 15 | X is a multiple of 3} = (1/2^12)(1/7) ÷ (1/7) = 1/2^12.

Problem 2.2 In 100 sets of ten tosses of an unbiased coin, in how many cases do you expect to get 7 heads and 3 tails?
Solution: There are ten tosses of an unbiased coin, so n = 10, P(H) = 1/2 and P(T) = 1/2. Let X denote the number of heads in tossing a coin 10 times; then X ∼ B(10, 1/2).

    Expected number of sets with 7 heads and 3 tails = 100 × P(X = 7)
        = 100 × C(10, 7) × (1/2)^10
        = 100 × 120/1024 = 11.718, i.e., about 12 cases.
2

Problem 2.3 The probability distribution of a random variable X is given by

    X = x   0     1     2     3
    p(x)    0.1   0.3   0.4   0.2

Find E[X] and E[Y] where Y = X² + X.

Solution:

    E[X] = 0 × 0.1 + 1 × 0.3 + 2 × 0.4 + 3 × 0.2 = 0.3 + 0.8 + 0.6 = 1.7.

    E[X²] = 0² × 0.1 + 1² × 0.3 + 2² × 0.4 + 3² × 0.2 = 0.3 + 1.6 + 1.8 = 3.7.

    E[Y] = E[X² + X] = 3.7 + 1.7 = 5.4.

Problem 2.4 In 1800 trials of a throw of two fair dice, what is the expected number of times that the sum will be less than 5?
Solution: When two fair dice are thrown, the total number of equally likely cases is 6 × 6 = 36. The outcomes favourable to the event that the sum is less than 5 (i.e., the sum is 2, 3 or 4) are (1,1), (1,2), (2,1), (1,3), (2,2) and (3,1), so there are 6 favourable cases. The probability of getting a sum less than 5 is 6/36 = 1/6. The expected number of times the total will be less than 5 in 1800 trials is (1/6) × 1800 = 300.
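The count of favourable cases can be checked by enumeration, as in the following sketch:

```python
# Sketch: enumerate the 36 equally likely outcomes of two fair dice and count
# those whose sum is less than 5.
favourable = [(a, b) for a in range(1, 7) for b in range(1, 7) if a + b < 5]
p = len(favourable) / 36
print(favourable)                                  # six outcomes
print(p, "expected in 1800 trials:", p * 1800)     # 1/6 and 300
```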
Problem 2.5 A and B throw one die for a stake of Rs. 66 that is to be won by the player who first throws a 2. If A has the first throw, what are their respective expectations?
Solution: The chance of throwing a 2 with one die is 1/6.

A can win on the first, third, fifth, ... throws, so

    A's chance of first throwing a 2 = 1/6 + (5/6)(5/6)(1/6) + (5/6)(5/6)(5/6)(5/6)(1/6) + ···
        = (1/6)[1 + (5/6)^2 + (5/6)^4 + ···]
        = (1/6)[1 − (5/6)^2]^(−1) = 36/(6 × 11) = 6/11.

B can win on the second, fourth, sixth, ... throws, so

    B's chance of first throwing a 2 = 1 − A's chance = 1 − 6/11 = 5/11.

    A's expectation = 66 × 6/11 = Rs. 36.
    B's expectation = 66 × 5/11 = Rs. 30.
Problem 2.6 In a business, a person can make a profit of Rs. 3000 with probability
0.6 or suffer a loss of Rs. 1200 with the probability 0.4. Determine his expectation.
Solution: Expectation of profit = Rs.3000 ×0.6 = Rs. 1800.
Expectation of loss = Rs. 1200 × 0.4 = Rs. 480.
His total expectation in the business = Rs. 1800 - Rs. 480 = Rs. 1320.
Problem 2.7 A coin is tossed until a head appears. What is the expected number of tosses required?
Solution: Let X denote the number of tosses when a coin is thrown repeatedly until a head appears; then x = 1, 2, 3, ... are the throws at which the head may first appear.

    Event       H      TH        TTH       TTTH      ···
    X = x       1      2         3         4         ···
    P{X = x}    1/2    (1/2)^2   (1/2)^3   (1/2)^4   ···

The expected number of tosses required to get the head is

    E[X] = 1 × (1/2) + 2 × (1/2)^2 + 3 × (1/2)^3 + ···
         = (1/2)[1 + 2 × (1/2) + 3 × (1/2)^2 + ···]
         = (1/2)[1 − 1/2]^(−2) = (1/2) × 4 = 2.

Problem 2.8 A man draws 3 balls from a box containing 5 white and 7 black balls. He gets Rs. 10 for each white ball and Rs. 5 for each black ball. Find his expected amount.
Solution: There are two types of balls, white and black. Let X be the random variable which represents the total amount received, at Rs. 10 for each white ball and Rs. 5 for each black ball. Three balls are drawn out of 12 balls; this can happen in the following ways: WWW, WWB, WBB and BBB. The random variable X takes the value Rs. 30 for three whites, Rs. 25 for two whites and one black, Rs. 20 for one white and two blacks, and Rs. 15 for three blacks.

    P{X = 30} = C(5, 3)/C(12, 3) = 1/22
    P{X = 25} = C(5, 2)C(7, 1)/C(12, 3) = 7/22
    P{X = 20} = C(5, 1)C(7, 2)/C(12, 3) = 21/44
    P{X = 15} = C(7, 3)/C(12, 3) = 7/44

    E[X] = 30 × 1/22 + 25 × 7/22 + 20 × 21/44 + 15 × 7/44 = 21.25.

His expected amount is Rs. 21.25.
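A brute-force check of this expectation (an illustrative sketch):

```python
# Sketch: enumerate all C(12, 3) equally likely draws of 3 balls from 5 white (W)
# and 7 black (B) balls and average the amount received.
from itertools import combinations

balls = ["W"] * 5 + ["B"] * 7
draws = list(combinations(range(12), 3))
total = sum(10 * sum(balls[i] == "W" for i in d) + 5 * sum(balls[i] == "B" for i in d)
            for d in draws)
print(total / len(draws))   # 21.25
```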


Problem 2.9 A die is thrown repeatedly until the face 2 is obtained. How many tosses are expected to be required?
Solution: Let X denote the number of tosses when a die is thrown until the face 2 appears; then the number of tosses is x = 1, 2, 3, .... Let A be the event of getting the face 2 and Ā be the complement of the event A. The probability of the event A is 1/6 and the probability of the event Ā is 1 − 1/6 = 5/6.

    Trial x     1      2             3              4              ···
    Event       A      ĀA            ĀĀA            ĀĀĀA           ···
    P{X = x}    1/6    (5/6)(1/6)    (5/6)^2(1/6)   (5/6)^3(1/6)   ···

The expected number of tosses required to get the face 2 is

    E[X] = 1 × (1/6) + 2 × (5/6)(1/6) + 3 × (5/6)^2(1/6) + ···
         = (1/6)[1 − 5/6]^(−2) = 6.
Problem 2.10 A die is thrown repeatedly until the face 2 is obtained. Find the expected
number of failures preceding the first appearance of the face 2.
Solution: Let X denote the number of failures preceding the first success when a die
is thrown repeatedly, so x = 0, 1, 2, 3, ··· . Let A be the event that the face 2 appears
and Ā be the complement of A.

    First time face 2 appears   A               ĀA              ĀĀA             ĀĀĀA            ···
    Number of failures x        0               1               2               3               ···
    P{X = x}                   (5/6)^0 (1/6)   (5/6)^1 (1/6)   (5/6)^2 (1/6)   (5/6)^3 (1/6)   ···

The expected number of failures preceding the first appearance of the face 2 is

    E[X] = 0 × P{X = 0} + 1 × P{X = 1} + 2 × P{X = 2} + ···
         = 1 × (5/6)(1/6) + 2 × (5/6)^2 (1/6) + ···
         = (1/6)(5/6)[1 − 5/6]^{-2} = 5.
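Problems 2.9 and 2.10 describe the same experiment counted in two different ways (throws including the success, and failures before it). A minimal simulation sketch, using only the standard library, estimates both expectations; the names are illustrative.

    import random

    def mean_throws_until_two(trials=200_000, seed=2):
        """Return (mean number of throws including the success,
                   mean number of failures before the success)."""
        random.seed(seed)
        total_throws = total_failures = 0
        for _ in range(trials):
            throws = 0
            while True:
                throws += 1
                if random.randint(1, 6) == 2:
                    break
            total_throws += throws
            total_failures += throws - 1
        return total_throws / trials, total_failures / trials

    print(mean_throws_until_two())   # approximately (6.0, 5.0)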

Problem 2.11 Six dice are thrown 729 times. How many times does one expect at
least three dice to show 5 or 6?
Solution: The probability of getting 5 or 6 when a die is thrown is 2/6 = 1/3. If X
denotes the number of dice showing 5 or 6 when six dice are thrown, then X ∼ B(6, 1/3).
The probability of getting at least 3 dice showing 5 or 6 is

    P{X ≥ 3} = Σ_{x=3}^{6} C(6, x) (1/3)^x (2/3)^{6−x} = 233/729.

Since the six dice are thrown 729 times, the expected number of times at least 3 dice
show 5 or 6 is 729 × 233/729 = 233 times.
Problem 2.12 An irregular six-faced die is such that the probability that it gives 3
even numbers in 5 trials is twice the probability that it gives 2 even numbers in 5 trials.
How many sets of exactly 5 trials can be expected to give no even number out of 250
sets?
Solution: Let p be the probability of getting an even number with the unfair die and
q = 1 − p. The number of trials is n = 5. If X denotes the number of even numbers in
5 trials, then

    P{X = 3} = 2 P{X = 2}
    C(5, 3) p^3 q^2 = 2 × C(5, 2) p^2 q^3
    p = 2q = 2(1 − p)
    p = 2/3 and q = 1/3.

The probability of getting no even number is

    P{X = 0} = C(5, 0) (2/3)^0 (1/3)^5 = 1/243.

The expected number of sets giving no even number is 250 × 1/243 = 1.028 ≈ 1.
Problem 2.13 The probability of a man hitting a target is 1/3. How many times must
he fire so that the probability of hitting the target at least once is more than 90%?
Solution: Let X be the number of hits. Given that the probability of hitting the target
in a single shot is p = 1/3, if n is the number of shots fired, then X ∼ B(n, 1/3). We
require

    P{X ≥ 1} > 0.9
    1 − P{X = 0} > 0.9
    P{X = 0} < 0.1
    C(n, 0) (1/3)^0 (2/3)^n < 0.1
    (2/3)^n < 0.1
    n log(2/3) < log(0.1)  ⇒  n > log(0.1)/log(2/3) = (−1.0000)/(−0.1761) = 5.679.

Hence he must fire at least 6 times.
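The same threshold can be found by direct search. A brief sketch, assuming the binomial model B(n, 1/3) used above:

    def shots_needed(p_hit=1/3, target=0.9):
        """Smallest n with P{at least one hit in n shots} > target."""
        n = 1
        while 1 - (1 - p_hit) ** n <= target:
            n += 1
        return n

    print(shots_needed())   # 6, since 1 - (2/3)^6 = 0.912 > 0.9 while 1 - (2/3)^5 = 0.868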
Problem 2.14 Suppose a boy hits a target with probability 1/2 in each trial. What is
the probability that his 10th shot is his 5th hit?
Solution: The boy's 10th shot should give his 5th hit, so in his first 9 shots he has to
hit the target exactly 4 times. If X denotes the number of hits in the first 9 shots, then
X ∼ B(9, 1/2) and the probability of exactly 4 hits in 9 shots is

    P{X = 4} = C(9, 4) (1/2)^9.

The probability that the 10th shot is the 5th hit is

    P{X = 4} × P{10th shot is a hit} = C(9, 4) (1/2)^9 × 1/2 = 126/1024 ≈ 0.1230.

Problem 2.15 An item is produced in large numbers. The machine is known to
produce 2% defectives. A quality control inspector examines the items one by one at
random. What is the probability that at least 4 items have to be examined in order to
get 2 defectives?
Solution: The probability of a defective item is p = 0.02 and q = 1 − p = 0.98. At
least 4 items are examined to give 2 defectives, i.e., 4 or 5 or 6 or ··· items are examined.
The probability that 2 defectives are obtained in n items, the last item examined being
defective, is C(n − 1, 1) p q^{n−2} × p, where n = 4, 5, 6, ··· . Table 2.2 shows the trials
together with these probabilities.

Table 2.2 The probability of two defectives, the last item examined being defective

    Trial                                1                 2                 3                 ···
    Items examined n                     4                 5                 6                 ···
    Items preceding the last one n − 1   3                 4                 5                 ···
    Probability                          C(3,1) p^2 q^2    C(4,1) p^2 q^3    C(5,1) p^2 q^4    ···

The probability of 2 defectives in at least 4 items is

    P{X ≥ 4} = P{X = 4} + P{X = 5} + ···
             = C(3,1) p^2 q^2 + C(4,1) p^2 q^3 + C(5,1) p^2 q^4 + ···
             = p^2 [3q^2 + 4q^3 + 5q^4 + ···]
             = p^2 {[1 + 2q + 3q^2 + ···] − 1 − 2q}
             = p^2 [(1 − q)^{-2} − 1 − 2q]
             = p^2 [p^{-2} − 1 − 2q]
             = 1 − p^2 − 2p^2 q
             = 1 − (0.02)^2 − 2 × (0.02)^2 × 0.98 = 0.9988.

Problem 2.16 If the probability that a child is a boy is 0.80, find the expected number
of boys in a family with 5 children, given that there is at least one boy.
Solution: Let the probability of a boy child be p, 0 < p < 1, and q = 1 − p. If X
denotes the number of boys in a family with n children, then X ∼ B(n, p). The
probability of exactly x boys in the family, given that there is at least one boy, is

    P{X = x | X ≥ 1} = P{X = x ∩ X ≥ 1} / P{X ≥ 1}
                     = P{X = x} / P{X ≥ 1}
                     = C(n, x) p^x q^{n−x} / [1 − P{X = 0}]
                     = C(n, x) p^x q^{n−x} / (1 − q^n),    x = 1, 2, ···, n.

The expected number of boys in the family, given that there is at least one boy, is

    Σ_{x=1}^{n} x P{X = x | X ≥ 1} = Σ_{x=1}^{n} x C(n, x) p^x q^{n−x} / (1 − q^n)
        = [np/(1 − q^n)] Σ_{x=1}^{n} C(n − 1, x − 1) p^{x−1} q^{(n−1)−(x−1)}
        = [np/(1 − q^n)] (p + q)^{n−1}
        = np/(1 − q^n)
        = 5 × 0.8 / [1 − (0.2)^5]    where p = 0.8 and n = 5
        = 4/0.99968 = 4.0013 ≈ 4.
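A direct computation of the conditional mean confirms the closed form np/(1 − q^n). The sketch below uses only the standard library; the helper name is illustrative.

    from math import comb

    def mean_boys_given_at_least_one(n=5, p=0.8):
        """E[X | X >= 1] for X ~ Binomial(n, p)."""
        q = 1 - p
        probs = {x: comb(n, x) * p**x * q**(n - x) for x in range(1, n + 1)}
        denom = 1 - q**n                      # P{X >= 1}
        return sum(x * pr for x, pr in probs.items()) / denom

    print(mean_boys_given_at_least_one())     # 4.0013..., i.e. np / (1 - q^n)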

Problem 2.17 A communication system consists of n components, each of which
functions independently with probability p. The total system operates effectively if at
least one half of its components function. For what values of p is a 5 component system
more likely to operate effectively than a 3 component system?
Solution: Let the probability that a component functions be p and q = 1 − p with
0 < p < 1. If X denotes the number of functioning components in an n component
system, then X ∼ B(n, p).
Probability that a 5 component system functions effectively

    = Probability that 3 or 4 or 5 components function
    = P{X = 3} + P{X = 4} + P{X = 5}
    = C(5, 3) p^3 q^2 + C(5, 4) p^4 q + p^5
    = 10 p^3 q^2 + 5 p^4 q + p^5.

Probability that a 3 component system functions effectively

    = Probability that 2 or 3 components function
    = P{X = 2} + P{X = 3}
    = C(3, 2) p^2 q + p^3
    = 3 p^2 q + p^3.

The 5 component system functions more effectively than the 3 component system if

    10 p^3 q^2 + 5 p^4 q + p^5 ≥ 3 p^2 q + p^3
    10 p^3 (1 − 2p + p^2) + 5 p^4 (1 − p) + p^5 ≥ 3 p^2 (1 − p) + p^3
    6 p^5 − 15 p^4 + 12 p^3 − 3 p^2 ≥ 0
    3 p^2 (2p^3 − 5p^2 + 4p − 1) ≥ 0
    3 p^2 (2p − 1)(p − 1)^2 ≥ 0.

Since p^2 ≥ 0 and (p − 1)^2 ≥ 0 always, the inequality holds if and only if 2p − 1 ≥ 0.
Thus the 5 component system is more likely to operate effectively than the 3 component
system when p ≥ 1/2.
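The comparison can also be checked numerically over a grid of p values. A minimal sketch, assuming the majority-vote reliability model of the problem:

    from math import comb

    def majority_reliability(n, p):
        """P{at least half of the n components function}, components i.i.d. with prob p."""
        k = (n + 1) // 2                       # smallest majority: 3 of 5, 2 of 3
        return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))

    for p in (0.3, 0.5, 0.7):
        r5, r3 = majority_reliability(5, p), majority_reliability(3, p)
        print(p, round(r5, 4), round(r3, 4), r5 >= r3)
    # the 5 component system is at least as reliable exactly when p >= 0.5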
Problem 2.18 Find the probability of 6 or more telephone calls arriving in a 9 minute
period at a telephone exchange, if the telephone calls arrive at the rate of 2 every 3
minutes and follow a Poisson distribution.
Solution: Let X be the number of telephone calls received in a 9 minute period. Then
X follows a Poisson distribution, i.e.,

    P{X = x} = e^{−λ} λ^x / x!    x = 0, 1, 2, 3, ···
             = 0                  otherwise

where λ is the average number of calls in 9 minutes. The average number of calls per
minute is 2/3, so the average number of calls in 9 minutes is λ = 9 × 2/3 = 6.
Thus P{X ≥ 6} = 1 − P{X ≤ 5} = 1 − Σ_{x=0}^{5} e^{−6} 6^x / x! = 1 − 0.4457 = 0.5543.
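The Poisson tail used above is easy to evaluate directly; a short sketch using only the standard library:

    from math import exp, factorial

    def poisson_tail(lam, k):
        """P{X >= k} for X ~ Poisson(lam)."""
        return 1 - sum(exp(-lam) * lam**x / factorial(x) for x in range(k))

    print(round(poisson_tail(6, 6), 4))   # 0.5543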
Problem 2.19 The number of grasshoppers on a broad bean leaf follows a Poisson
probability model with mean λ = 2. A plant inspector, however, records the number of
grasshoppers on a leaf only if at least 1 grasshopper is present. What is the expected
number of grasshoppers recorded per leaf?
Solution: Let X be the number of grasshoppers on a leaf, which follows a Poisson
distribution with parameter λ. The probability of exactly x grasshoppers on a leaf,
given that at least one grasshopper is present, is

    P{X = x | X ≥ 1} = P{X = x ∩ X ≥ 1} / P{X ≥ 1}
                     = P{X = x} / [1 − P{X = 0}]
                     = [e^{−λ} λ^x / x!] / (1 − e^{−λ}),    x = 1, 2, 3, ···

For example, the probability of one or two grasshoppers on a leaf, given that at least
one grasshopper is present, is

    P{X = 1 or X = 2 | X ≥ 1} = P{X = 1 | X ≥ 1} + P{X = 2 | X ≥ 1}
        = e^{−λ} λ / (1 − e^{−λ}) + e^{−λ} λ^2 / [(1 − e^{−λ}) 2!]
        = [e^{−λ} / (1 − e^{−λ})] (λ + λ^2/2)
        = 4 e^{−2} / (1 − e^{−2}) = 0.6260    where λ = 2.

The expected number of grasshoppers recorded per leaf, given that at least one is
present, is

    Σ_{x=1}^{∞} x P{X = x | X ≥ 1} = Σ_{x=1}^{∞} x e^{−λ} λ^x / [(1 − e^{−λ}) x!]
        = [λ e^{−λ} / (1 − e^{−λ})] Σ_{x=1}^{∞} λ^{x−1} / (x − 1)!
        = λ e^{−λ} e^{λ} / (1 − e^{−λ}) = λ / (1 − e^{−λ}).

For λ = 2 this is 2 / (1 − e^{−2}) = 2.313, i.e., between 2 and 3 grasshoppers per recorded leaf.

2.7 Continuous probability distribution models

Continuous random variables are used to describe random phenomena in which the
variable X of interest can take any value x in some interval, and in which
P{X = x} = 0 for every x in that interval.

Problem 2.20 Find the mean and variance of the distribution that has the distribution
function

    F(x) = 0      x < 10
         = 1/4    10 ≤ x < 15
         = 3/4    15 ≤ x < 20
         = 1      20 ≤ x

Also find the probabilities P{X = x} for x = 10, 15, 20.
Solution: We know that

    P{X = a} = F(a) − lim_{x→a−} F(x) = F(a) − F(a−),

where F(a−) is the left hand limit of F at x = a.

    P{X = 10} = F(10) − F(10−) = 1/4 − 0 = 1/4
    P{X = 15} = F(15) − F(15−) = 3/4 − 1/4 = 1/2
    P{X = 20} = F(20) − F(20−) = 1 − 3/4 = 1/4

    X = x        10     15     20
    P{X = x}    1/4    1/2    1/4

    Mean = E[X] = 10 × 1/4 + 15 × 1/2 + 20 × 1/4 = 2.5 + 7.5 + 5 = 15
    E[X^2] = 10^2 × 1/4 + 15^2 × 1/2 + 20^2 × 1/4 = 950/4 = 237.5
    V[X] = E[X^2] − [E(X)]^2 = 237.5 − 225 = 12.5.

Problem 2.21 Let X have the distribution function F(x) defined by

    F(x) = 0        x < 0
         = x^2/4    0 ≤ x < 1
         = 1/2      1 ≤ x < 2
         = x/3      2 ≤ x < 3
         = 1        3 ≤ x

Find the pdf and the mean of the distribution of the random variable X.
Solution: In the interval 0 < x < 1, F(x) is continuous and differentiable, so the pdf
there is f(x) = x/2. At x = 1 the distribution function jumps, since

    P{X = 1} = F(1) − F(1−) = 1/2 − lim_{x→1−} x^2/4 = 1/2 − 1/4 = 1/4.

In the interval 2 < x < 3 the pdf is f(x) = 1/3. At x = 2,

    P{X = 2} = F(2) − F(2−) = 2/3 − lim_{x→2−} 1/2 = 2/3 − 1/2 = 1/6.

The pdf of X is

    f(x) = x/2    0 < x < 1
         = 1/4    x = 1
         = 1/6    x = 2
         = 1/3    2 < x < 3

It satisfies ∫_0^1 (x/2) dx + P{X = 1} + P{X = 2} + ∫_2^3 (1/3) dx = 1/4 + 1/4 + 1/6 + 1/3 = 1.

    Mean E[X] = ∫_0^1 x (x/2) dx + 1 × P{X = 1} + 2 × P{X = 2} + ∫_2^3 x (1/3) dx
              = 1/6 + 1/4 + 1/3 + 5/6 = 19/12.
(i) Uniform probability distribution model
A random variable X is uniformly distributed on an interval [a, b] if its pdf is given by

    p_{a,b}(x) = 1/(b − a)    a ≤ x ≤ b
               = 0            otherwise

Note that P{x_1 < X < x_2} = F(x_2) − F(x_1) = (x_2 − x_1)/(b − a) is proportional to
the length of the interval for all x_1 and x_2 satisfying a ≤ x_1 ≤ x_2 ≤ b. If a random
phenomenon is completely unpredictable over an interval, it can be described by the
uniform probability distribution model.
Problem 2.22 A passenger arrives at a bus stop at 4 PM, knowing that the bus arrives
at some time uniformly distributed between 4 PM and 5 PM. What is the probability
that he will have to wait longer than 10 minutes? If at 4.30 PM the bus has not yet
arrived, what is the probability that he will have to wait at least 10 additional minutes?
Solution: Let X be the waiting time of the passenger in minutes. The probability
density function of the random variable X is

    f(x) = 1/60    0 < x < 60
         = 0       otherwise

Probability that he will have to wait longer than 10 minutes

    = P{X > 10} = ∫_{10}^{60} (1/60) dx = 5/6.

No bus has arrived between 4 PM and 4.30 PM, so the passenger has already waited
30 minutes. The probability that he has to wait at least 10 additional minutes, given
that he has waited 30 minutes, is

    P{X > 40 | X > 30} = P{X > 40 ∩ X > 30} / P{X > 30}
                       = P{X > 40} / P{X > 30}
                       = [∫_{40}^{60} (1/60) dx] / [∫_{30}^{60} (1/60) dx] = (1/3)/(1/2) = 2/3.

Problem 2.23 The random variables a, b are independently and uniformly distributed
in the intervals (0, 6) and (0, 9) respectively. Find the probability that the roots of the
equation x^2 − ax + b = 0 are real.
Solution: The equation x^2 − ax + b = 0 has real roots if ∆ = a^2 − 4b ≥ 0, where
a ∼ U(0, 6) and b ∼ U(0, 9), i.e.,

    f(a) = 1/6    0 < a < 6            f(b) = 1/9    0 < b < 9
         = 0      otherwise                 = 0      otherwise

The probability that the roots of the equation x^2 − ax + b = 0 are real is

    P{∆ ≥ 0} = P{a^2 − 4b ≥ 0}
             = ∫∫ f(a) f(b) da db    over the region a^2 − 4b ≥ 0
             = ∫_0^6 [∫_0^{a^2/4} f(b) db] f(a) da    since 0 < a < 6 gives 0 < a^2/4 ≤ 9
             = (1/54) ∫_0^6 (a^2/4) da = (1/54) × 18 = 1/3.
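A Monte Carlo check of this probability is immediate. A minimal sketch, using the uniform ranges stated in the problem:

    import random

    def prob_real_roots(trials=500_000, seed=3):
        """Estimate P{a^2 - 4b >= 0} with a ~ U(0, 6), b ~ U(0, 9)."""
        random.seed(seed)
        hits = sum(random.uniform(0, 6) ** 2 >= 4 * random.uniform(0, 9)
                   for _ in range(trials))
        return hits / trials

    print(round(prob_real_roots(), 3))   # close to 1/3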

(ii) Normal probability distribution model
A random variable X with mean θ (−∞ < θ < ∞) and variance σ^2 (> 0) has a Normal
distribution if it has the pdf

    p_{θ,σ^2}(x) = [1/(√(2π) σ)] e^{−(x−θ)^2/(2σ^2)}    −∞ < x < ∞
                 = 0                                     otherwise

The time taken in a random experiment can often be thought of as Normally distributed.
For example, the time to assemble a product, which is the sum of the times required for
each assembly operation, may be described by a Normal random variable.
Problem 2.24 An engineering college has 100 ME students. The probability that any
student requires a copy of a particular book from the college library on any day is 0.05.
How many copies of the book should be kept in the library so that the probability may
be greater than 0.95 that none of the students requiring a copy has to go back
disappointed? Assume the sample is large.
Solution: Given the sample size n = 100 and the probability that a student requires
the book p = 0.05, so q = 0.95. Let X be the number of students requiring the book;
then X ∼ B(n, p) with mean µ = np = 100 × 0.05 = 5 and variance σ^2 = npq =
100 × 0.05 × 0.95 = 4.75, i.e., σ = 2.18. Using the Normal approximation, X ∼ N(µ, σ^2).
Let x be the required number of copies, which satisfies

    P{X < x} > 0.95
    P{(X − 5)/2.18 < (x − 5)/2.18} > 0.95
    P{Z < (x − 5)/2.18} > 0.95    where Z = (X − 5)/2.18 ∼ N(0, 1)
    0.5 + P{0 < Z < (x − 5)/2.18} > 0.95
    P{0 < Z < (x − 5)/2.18} > 0.45.

From the Normal distribution table, the ordinate corresponding to the area 0.45 is 1.65,
i.e., (x − 5)/2.18 > 1.65 ⇒ x > 5 + 2.18 × 1.65 = 8.6 ≈ 9.
Hence the college library should keep at least 9 copies of the book.
Problem 2.25 In an examination it is laid down that a student passes if he secures 40
per cent or more marks. He is placed in the first, second or third division according as
he secures 60% or more marks, between 50% and 60% marks, or between 40% and 50%
marks respectively. He gets distinction in case he secures 80% or more marks. It is
noticed from the results that 20% of the students failed in the examination, whereas 5%
of them obtained distinction. Calculate the percentage of students placed in the second
division. Assume the marks of the students follow a Normal probability distribution
model.
Solution: Let X be the marks of a student, i.e., X ∼ N(µ, σ^2). Given P{X < 40} = 0.2
(20% of students failed) and P{X ≥ 80} = 0.05 (5% of students get distinction),

    P{Z < (40 − µ)/σ} = 0.2  and  P{Z > (80 − µ)/σ} = 0.05,  where Z ∼ N(0, 1).

From the Normal distribution table, the ordinate corresponding to a lower tail area of
0.2 is −0.84, i.e., (40 − µ)/σ = −0.84 ⇒ µ − 0.84σ = 40                       (2.3)
The ordinate corresponding to an upper tail area of 0.05 is 1.645,
i.e., (80 − µ)/σ = 1.645 ⇒ µ + 1.645σ = 80                                   (2.4)
Solving equations (2.3) and (2.4), µ = 53.52 and σ = 16.10. The probability that a
student is placed in the second division is

    P{50 < X < 60} = P{(50 − 53.52)/16.10 < Z < (60 − 53.52)/16.10}
                   = P{−0.2186 < Z < 0.4025}
                   = P{−0.2186 < Z < 0} + P{0 < Z < 0.4025}
                   = 0.0865 + 0.1563 = 0.2428.

Therefore about 24% of the candidates got second division in the examination.
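The two quantile conditions can be solved and the second division probability recomputed with the standard normal CDF. A short sketch using math.erf; the constants −0.84 and 1.645 are the table values used in the text.

    from math import erf, sqrt

    def std_normal_cdf(z):
        return 0.5 * (1 + erf(z / sqrt(2)))

    # mu - 0.84*sigma = 40 and mu + 1.645*sigma = 80
    sigma = (80 - 40) / (1.645 + 0.84)          # about 16.10
    mu = 40 + 0.84 * sigma                      # about 53.52
    p_second = std_normal_cdf((60 - mu) / sigma) - std_normal_cdf((50 - mu) / sigma)
    print(round(mu, 2), round(sigma, 2), round(p_second, 4))   # about 0.243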


Problem 2.26 Marks obtained by a number of students in a certain subject are
approximately normally distributed with mean 60 and variance 25. If 5 students are
selected at random from this group, what is the probability that exactly 2 of them
would have scored above 70?
Solution: Let the random variable X denote the marks obtained by the given set of
students in the given subject, so that X ∼ N(60, 25), i.e., X has mean µ = 60 and
variance σ^2 = 25. The probability that a randomly selected student from the given set
scores over 70 is

    P{X > 70} = P{(X − 60)/5 > (70 − 60)/5} = P{Z > 2} = 0.0227.

Let Y be the number of students scoring marks above 70, so Y ∼ B(5, 0.0227). The
probability that, out of 5 students selected at random from the set, exactly 2 will have
marks over 70 is

    P{Y = 2} = C(5, 2) (0.0227)^2 (0.9773)^3 = 0.0048.

Problem 2.27 The fuel per cent specification in a rocket follows a Normal distribution
with mean µ = 30 and variance σ^2 = 25, and the specification of the fuel is that it
should lie between 25 and 35. The manufacturer gets a net profit of Rs. 100 per liter of
fuel if the fuel specification lies between 25 and 35; Rs. 40 if it lies between 20 and 25 or
between 35 and 40; and a loss of Rs. 50 per liter if it lies below 20 or above 40. Find the
expected profit of the manufacturer. If he wants to increase his expected profit by 50%
by changing the net profit on the category of fuel that meets the specification, what
should be the new profit per liter for that category?
Solution: Let X be the fuel per cent specification in a rocket, i.e., X ∼ N(µ, σ^2) with
µ = 30 and σ = 5. The probability that the fuel specification lies between 25 and 35 is

    P{25 < X < 35} = P{(25 − 30)/5 < (X − 30)/5 < (35 − 30)/5}
                   = P{−1 < Z < 1}    where Z = (X − 30)/5 ∼ N(0, 1)
                   = 2 P{0 < Z < 1}    since f(z) = f(−z)
                   = 2 × 0.3413 = 0.6826.

    P{20 < X < 25} = P{−2 < Z < −1} = P{Z < −1} − P{Z < −2}
                   = 0.1587 − 0.0228 = 0.1359.

    P{35 < X < 40} = P{1 < Z < 2} = P{0 < Z < 2} − P{0 < Z < 1}
                   = 0.4772 − 0.3413 = 0.1359.

    P{X < 20 or X > 40} = 1 − (0.6826 + 0.2718) = 0.0456.

    Probability    Profit/Liter    Expected Profit/Liter
    0.6826         Rs. 100         Rs. 68.26
    0.2718         Rs. 40          Rs. 10.87
    0.0456         Rs. −50         Rs. −2.28
    Total          −               Rs. 76.85

Expected profit per liter of the manufacturer = Rs. 76.85. Let the revised net profit per
liter of the first category of fuel be k. The expected revised profit per liter is then
Rs. [k × 0.6826 + 40 × 0.2718 − 50 × 0.0456] = Rs. [0.6826 k + 8.59].
The target of 50% more than the present expected profit is Rs. 76.85 × 1.5 = Rs. 115.28.
Therefore 0.6826 k + 8.59 = 115.28 ⇒ k = (115.28 − 8.59)/0.6826 = Rs. 156.30. The
revised net profit per liter of the first category of fuel is about Rs. 156.
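The category probabilities and the profit calculation can be reproduced with the standard normal CDF. A minimal sketch, using the profit figures assumed in the solution above:

    from math import erf, sqrt

    def ncdf(x, mu=30, sigma=5):
        return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

    p_spec   = ncdf(35) - ncdf(25)                       # 25 < X < 35
    p_margin = (ncdf(25) - ncdf(20)) + (ncdf(40) - ncdf(35))
    p_loss   = 1 - p_spec - p_margin
    expected = 100 * p_spec + 40 * p_margin - 50 * p_loss
    k = (1.5 * expected - 40 * p_margin + 50 * p_loss) / p_spec
    print(round(expected, 2), round(k, 2))               # about 76.9 and 156.4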

(iii) Exponential probability distribution model


The random variable X are said to be Exponentially distributed with parameter
θ > 0 , if the pdf is given by

 θe−θx

x>0
pθ (x) =
 0

otherwise

The value of the intercept on the vertical axis is always equal to the value of θ.
Note that all pdf 0 s eventually intersect at θ, since the Exponential distribution
has its mode at the origin. The mean and standard deviation are equal in Expo-
nential distribution. In a random phenomenon, the time between independent
events which have memory less property may appropriately follow Exponential
random variable. For example, the time between arrivals of a large number of
customers who act independently of each other may fit adequately the data to
Exponential distribution.

Problem 2.28 The length of a shower on a tropical island during the rainy season has
an Exponential distribution with parameter 2, time being measured in minutes. What
is the probability that a shower will last more than 3 minutes? If a shower has already
lasted for 2 minutes, what is the probability that it will last one more minute?
Solution: Assume X denotes the length of a shower in minutes during the rainy season,
and X follows the Exponential probability model with parameter θ = 2 per minute.
The probability that a shower will last more than 3 minutes is

    P{X > 3} = ∫_3^∞ 2 e^{−2x} dx = e^{−6} = 0.0025.

The probability that a shower which has already lasted 2 minutes will last at least one
more minute is, by the memoryless property,

    P{X > 3 | X > 2} = P{X > 3 ∩ X > 2} / P{X > 2} = P{X > 3} / P{X > 2}
                     = e^{−6}/e^{−4} = e^{−2} = 0.1353.

Problem 2.29 The time to failure of a component is Exponentially distributed with
mean failure time 5 hours. If 5 components are installed, what is the probability that
at least one half of them are still functioning at the end of 10 hours?
Solution: Let X be the time to failure of a component in hours, with X following the
Exponential distribution with parameter λ, i.e., its pdf is

    f(x) = λ e^{−λx}    λ > 0, x > 0
         = 0            otherwise

The expected failure time is E[X] = 1/λ = 5 ⇒ λ = 1/5 per hour.
The probability that a component is still functioning at the end of 10 hours (a success) is

    p = P{X ≥ 10} = ∫_{10}^∞ λ e^{−λx} dx = e^{−10λ} = e^{−10 × 1/5} = e^{−2} = 0.1353.

Five components are installed, and one half or more functioning at the end of 10 hours
means 3 or more functioning. The number of successes among the 5 components follows
a Binomial distribution with parameters (5, p):

    P{X = x} = C(5, x) p^x q^{5−x}    x = 0, 1, 2, 3, 4, 5
             = 0                      otherwise

The probability that one half or more of the components are functioning at the end of
10 hours is

    P{X ≥ 3} = Σ_{x=3}^{5} C(5, x) (0.1353)^x (0.8647)^{5−x} = 0.02.
Problem 2.30 The daily consumption of milk in excess of 20000 liters in a town is
approximately Exponentially distributed with parameter 1/3000. The town has a daily
stock of 35000 liters. What is the probability that, of 2 days selected at random, the
stock is insufficient on both days?
Solution: Let X be the daily consumption of milk in excess of 20000 liters, so that X
follows an Exponential distribution with parameter λ = 1/3000 per liter.
The daily stock is 35000 liters, so the probability that the stock is insufficient on a given
day is

    P{X + 20000 > 35000} = P{X > 15000} = ∫_{15000}^∞ λ e^{−λx} dx
                         = e^{−15000 × 1/3000} = e^{−5}.

The probability that the stock is insufficient on both of the 2 days is
(e^{−5})^2 = e^{−10} = 0.000045.


Problem 2.31 The lifetime of electric bulbs follows an Exponential distribution. Bulbs
are manufactured by two units of a company. The expected lifetime of a bulb is 1000
hours if it is manufactured by unit I and 1500 hours if it is manufactured by unit II, and
the costs of production per bulb are Rs. 40 and Rs. 30 respectively. Moreover, if a bulb
lasts less than the guaranteed life of 1200 hours, a loss of Rs. 35 is to be borne by the
manufacturer. Find which unit is advantageous to him.
Solution: Let X be the lifetime in hours of a bulb manufactured by unit I or unit II.
The random variable X follows an Exponential probability model with parameter λ and
pdf

    f(x) = λ e^{−λx}    λ > 0, x > 0
         = 0            otherwise

The probability that a bulb is still working at or after 1200 hours is

    P{X ≥ 1200} = ∫_{1200}^∞ λ e^{−λx} dx = e^{−1200λ}.

For unit I the expected lifetime is E[X] = 1/λ = 1000, so λ = 1/1000 per hour,
P{X ≥ 1200} = e^{−6/5} = 0.3012 and P{X < 1200} = 0.6988.
The cost per bulb of unit I, including the penalty, is

    C_1 = 40    if x ≥ 1200
        = 75    if x < 1200

The expected cost per bulb of unit I is

    E[C_1] = 40 × P{X ≥ 1200} + 75 × P{X < 1200}
           = 40 × 0.3012 + 75 × 0.6988 = Rs. 64.46.

Similarly, for unit II the expected lifetime is E[X] = 1/λ = 1500 hours, so λ = 1/1500
per hour, P{X ≥ 1200} = e^{−4/5} = 0.4493 and P{X < 1200} = 0.5507. The cost per
bulb of unit II is

    C_2 = 30    if x ≥ 1200
        = 65    if x < 1200

The expected cost per bulb of unit II is

    E[C_2] = 30 × 0.4493 + 65 × 0.5507 = Rs. 49.28.

Since E[C_2] < E[C_1], unit II is advantageous to the manufacturer.

(iv) Probability model of Gamma distribution
A function used to define the Gamma distribution is the Gamma function. A random
variable X follows a Gamma distribution if its pdf is

    p_{θ,β}(x) = θ^β e^{−θx} x^{β−1} / Γ(β)    x > 0, β > 0, θ > 0
               = 0                              otherwise

where β is called the shape parameter and θ is called the scale (rate) parameter.
Σ_{i=1}^{n} X_i ∼ G(n, 1/θ) if each X_i ∼ exp(1/θ), i.e., the sum of n independent
Exponential variables with mean 1/θ is Gamma with shape n and rate θ. The cumulative
distribution function F(x) = P{X ≤ x} of the random variable X is given by

    F(x) = 1 − ∫_x^∞ θ (θt)^{β−1} e^{−θt} / Γ(β) dt    x > 0
         = 0                                            otherwise

(v) Probability model of Erlang distribution
The pdf of the Gamma distribution becomes the Erlang distribution of order k when
β = k, an integer. When β = k, a positive integer, the cumulative distribution function
F(x) is given by

    F(x) = 1 − Σ_{i=0}^{k−1} e^{−θx} (θx)^i / i!    x > 0
         = 0                                        otherwise

which is a sum of Poisson terms with mean θx.


Problem 2.32 In a certain city, the daily consumption of electric power in millions of
kilowatt hours can be treated as a random variable having an Erlang distribution with
shape parameter 2. If the power plant has a daily capacity of 6 million kilowatt hours,
what is the probability that this power supply will be adequate on any given day?
Solution: Let X be the daily consumption of electric power in millions of kilowatt
hours, having an Erlang distribution with parameters λ and k. The mean consumption
is E[X] = k/λ = 2 and the shape parameter is k = 2, so λ = 1. The pdf of the Erlang
distribution is

    f(x) = λ^2 e^{−λx} x    x > 0
         = 0                otherwise

The probability that the power supply will be inadequate on any given day is

    P{X > 6} = ∫_6^∞ λ^2 x e^{−λx} dx = 6λ e^{−6λ} + e^{−6λ} = 6e^{−6} + e^{−6} = 0.0174.

The probability that the power supply will be adequate on any given day is
1 − 0.0174 = 0.9826.
Problem 2.33 A company employs n sales persons. Its gross sales in thousands of
rupees follow an Erlang probability model with scale parameter λ = 0.5 and shape
parameter k = 8000√n. If the sales cost is Rs. 4000 per sales person, how many sales
persons should the company employ to maximize the expected profit?
Solution: Let X be the gross sales in rupees when the company has n sales persons.
The random variable X follows the Erlang distribution with parameters λ and k, so the
expected gross sales are E[X] = k/λ = 16000√n. Let T denote the total expected profit
of the company; then T = total expected sales − total sales cost = 16000√n − 4000n.
For maximum profit dT/dn = 0 and d^2T/dn^2 < 0:

    dT/dn = 16000 × (1/2) n^{−1/2} − 4000 = 8000/√n − 4000
    dT/dn = 0 ⇒ 8000/√n = 4000 ⇒ √n = 2 ⇒ n = 4,
    d^2T/dn^2 = −4000 n^{−3/2} < 0 at n = 4.

T is maximum at n = 4. The company should employ 4 sales persons to maximize the
profit.

(vi) Probability model of Weibull distribution
If the random variable X has the Weibull distribution, then its pdf is

    p_{β,α,γ}(x) = (β/α) [(x − γ)/α]^{β−1} exp{−[(x − γ)/α]^β}    x ≥ γ
                 = 0                                               otherwise

The three parameters of the Weibull distribution are γ (−∞ < γ < ∞), the location
parameter, α (α > 0), the scale parameter, and β (β > 0), the shape parameter. When
γ = 0 the Weibull pdf becomes

    p_{β,α}(x) = (β/α) (x/α)^{β−1} exp[−(x/α)^β]    x ≥ 0
               = 0                                   otherwise

When γ = 0 and β = 1, the Weibull distribution reduces to the Exponential distribution
with pdf

    p_α(x) = (1/α) e^{−x/α}    x ≥ 0
           = 0                 otherwise

(vii) Probability model of Triangular distribution
If the random variable X has the Triangular distribution, then its pdf is given by

    p_{a,b,c}(x) = 2(x − a)/[(b − a)(c − a)]    a ≤ x ≤ b
                 = 2(c − x)/[(c − b)(c − a)]    b < x ≤ c
                 = 0                            otherwise

where a ≤ b ≤ c. The mode occurs at x = b and E[X] = (a + b + c)/3; since a ≤ b ≤ c,
it follows that (2a + c)/3 ≤ E[X] ≤ (a + 2c)/3. The mode is used more often than the
mean to characterize the Triangular distribution.

2.8 Distribution functions of random variables
Given the probability model of one or more random variables, one is often interested in
finding the distributions of other random variables that depend on them. There are two
techniques for finding the distribution of a function of a random variable.

2.8.1 Cumulative distribution function technique
Let X be a continuous random variable with pdf f(x), and define Y = g(X). To find the
probability density function of Y by the cumulative distribution technique, first find the
cumulative distribution function F(y) = P{Y ≤ y} = P{g(X) ≤ y}, then differentiate
F(y) with respect to y, which gives the pdf f(y) = F'(y).

2.8.2 Change of variable technique
Let X be a continuous random variable with pdf f(x), x ∈ S, where S is the support of
the function f(x) (the range of the variable X). If g(X) is strictly monotonic with
inverse function X = φ(Y), then the pdf of the random variable Y = g(X) is given by
f(y) = f(φ(y)) |φ'(y)|.
Problem 2.34 If the random variable X follows N(0, 2) and Y = 3X^2, find the pdf of
the random variable Y.
Solution: Given X ∼ N(0, 2), the pdf of Y is obtained by the cumulative distribution
function technique.

    G(y) = P{Y ≤ y} = P{3X^2 ≤ y} = P{−√(y/3) ≤ X ≤ √(y/3)}
         = F(√(y/3)) − F(−√(y/3))

    g(y) = f(√(y/3)) / (2√(3y)) + f(−√(y/3)) / (2√(3y))
         = f(√(y/3)) / √(3y)    since f(x) = f(−x)

With f(x) = (1/(2√π)) e^{−x^2/4}, the pdf of Y = 3X^2 is

    g(y) = [1/(2√(3πy))] e^{−y/12}    0 < y < ∞
         = 0                          otherwise

Problem 2.35 If X ∼ N(0, 1), find the pdf of the random variable Y = e^X.
Solution: Use the change of variable technique with Y = e^X, so that X = log_e Y and
dx/dy = 1/y. Since f(x) = (1/√(2π)) e^{−x^2/2},

    g(y) = f(log y) × (1/y) = [1/(√(2π) y)] e^{−(log y)^2/2}    0 < y < ∞
         = 0                                                     otherwise
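The change of variable result for Y = e^X (the lognormal density) can be checked by comparing a simulated histogram of e^X with the formula. A minimal sketch using only the standard library; the bin width and evaluation points are chosen only for illustration.

    import random
    from math import exp, log, pi, sqrt

    def lognormal_pdf(y):
        return exp(-(log(y)) ** 2 / 2) / (sqrt(2 * pi) * y)

    random.seed(4)
    sample = [exp(random.gauss(0, 1)) for _ in range(200_000)]
    # compare the empirical proportion in a small bin around y with pdf(y) * bin width
    for y in (0.5, 1.0, 2.0):
        width = 0.1
        prop = sum(y - width / 2 < v <= y + width / 2 for v in sample) / len(sample)
        print(y, round(prop, 4), round(lognormal_pdf(y) * width, 4))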

2.9 Empirical probability distribution models

An empirical distribution may be either continuous or discrete in nature. It is used to
establish a statistical model for the available data whenever there is a discrepancy from
the aimed-at distribution, or whenever one is unable to arrive at a known distribution.

2.9.1 Empirical continuous probability distribution models

The time taken to install 100 machines is collected. The data are given in Table 2.3,
which gives the number of machines together with the time taken. For example, 30
machines were installed between 0 and 1 hour, 25 between 1 and 2 hours, 20 between 2
and 3 hours, and 25 between 3 and 4 hours. Let X denote the time taken to install a
machine.

Table 2.3 Distribution of the time to install the machines

Duration
of Hours Frequency p(x) F (x) = P {X ≤ x}
0≤x≤1 30 .30 .30
1<x≤2 25 .25 .55
2<x≤3 20 .20 .75
3<x≤4 25 .25 1.00

2.9.2 Empirical discrete probability distribution models

At the end of day, the number of shipments on the loading docks of an export
company are observed as 0, 1 , 2, 3, 4 and 5 with frequencies 23, 15, 12, 10, 25
and 15 respectively. Let X be the number of shipments on the loading docks of
the company at end of the day. Then X have discrete random variable which
takes the values 0 , 1, 2, 3, 4 and 5 with the distribution as given in Table 2.4.
Figure 2.1 is the Histogram of shipments on the loading docks of the company.

Table 2.4 Distribution of shipments

Number of
shipments x Frequency P {X = x} F (x) = P {X ≤ x}
0 23 .23 .23
1 15 .15 .38
2 12 .12 .50
3 10 .10 .60
4 25 .25 .85
5 15 .15 1.00

Figure 2.1 Histogram of shipments (frequency versus number of shipments)
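The empirical pmf and cumulative distribution of Table 2.4 can be built directly from the observed frequencies. A short sketch (the frequency list is the one given above):

    freq = {0: 23, 1: 15, 2: 12, 3: 10, 4: 25, 5: 15}   # shipments: frequency
    n = sum(freq.values())

    cum = 0.0
    for x, f in sorted(freq.items()):
        p = f / n
        cum += p
        print(x, f, round(p, 2), round(cum, 2))          # reproduces Table 2.4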

2.10 Illustration of a probability distribution model

If the variable of interest changes by roughly constant increments, an Exponential
distribution is apt to fit the data. If the variable can deviate either positively or
negatively about a central value, then a Normal distribution is appropriate. When the
variable of interest seems to follow the Normal probability distribution but is restricted
to be greater than or less than a certain value, the truncated Normal distribution will be
adequate to fit the data. Gamma and Weibull distributions are also used to describe
such data, and the Exponential distribution is a special case of both. The differences
between the Exponential, Gamma and Weibull distributions involve the location of their
modes and the shapes of their tails for large and small times. The Exponential
distribution has its mode at the origin, but the Weibull distribution has its mode at
some point (≥ 0) which is a function of the parameter values selected. The tail of the
Gamma distribution is long, like that of an Exponential distribution, while the tail of
the Weibull distribution may decline more rapidly or less rapidly than that of an
Exponential distribution. In practice, if the data contain higher values than an
Exponential distribution would account for, a Weibull distribution may provide a better
description of the data.
Sixteen equipment were produced and placed on test and the Table 2.5 gives
the length of time intervals between failures in hours.

Table 2.5 Equipment’s time between failures

Equipment
Number 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Time
between
failures 19 12 16 1 15 5 10 1 46 7 33 25 4 9 1 10

For the sake of simplicity in processing the data, one can set up the ordered set as
given below:

Table 2.6 Ordered set of equipment time between failures

Equipment
Number 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Time
between
failures 1 1 1 4 5 7 9 10 10 12 15 16 19 25 33 46

On this basis, one may construct a Histogram to judge the pattern of the data in Table
2.6. An approximate value of the class interval can be determined from the formula

    ∆t = (maximum value − minimum value) / (1 + 3.3 log_10 N)

where the maximum and minimum are the values in the ordered set and N is the total
number of items in the order statistics. In this case the maximum value is 46, the
minimum value is 1 and N is 16. Thus ∆t = 45/(1 + 3.3 log_10 16) = 9.05 ≈ 10 = width
of the class interval.
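The class width and the counts of Table 2.7 below can be reproduced from the failure data of Table 2.6. A short sketch using only the standard library; the intervals are taken as (0, 10], (10, 20], ... as in Table 2.7.

    from math import log10

    times = [19, 12, 16, 1, 15, 5, 10, 1, 46, 7, 33, 25, 4, 9, 1, 10]
    n = len(times)
    dt = (max(times) - min(times)) / (1 + 3.3 * log10(n))
    print(round(dt, 2))                                  # 9.05, rounded up to 10

    width = 10
    counts = [sum(width * i < t <= width * (i + 1) for t in times) for i in range(5)]
    print(counts)                                        # [9, 4, 1, 1, 1], as in Table 2.7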

Table 2.7 Frequency Distribution

Time
interval 0 - 10 10 - 20 20 - 30 30 - 40 40 - 50
Number of
Equipment 9 4 1 1 1

Histogram is drawn based on the frequency distribution in Table 2.7 and is given
in Figure 2.2.

Figure 2.2 Histogram of time to failures (number of equipments versus time in hours)

The Histogram reveals that the distribution could be Negative Exponential, or the right
portion of a Normal distribution. Assume the time to failure follows an Exponential
distribution of the form

    p_θ(x) = θ e^{−θx}    θ > 0, x > 0
           = 0            otherwise

How far this assumption is valid has to be verified. The validity of the assumption is
tested by the χ^2 test of goodness of fit.

Table 2.8 Distribution of time to failures

Expected Observed
frequency frequency
Interval pi E O
0 - 10 0.5262 8.41 ≈ 8 9
10 - 20 0.2493 3.98 ≈ 4 4
20 - 30 0.1181 1.886 ≈ 2 1
30 - 40 0.0559 0.8944 ≈ 1 1
40 - 50 0.0265 0.454 ≈ 1 1
where p_i = ∫_{x_i}^{x_{i+1}} θ e^{−θx} dx = e^{−θx_i} − e^{−θx_{i+1}}, x_i = 0, 10, 20, 30, 40, and
the parameter is estimated as θ = 1/13.38 = 0.0747 = failure rate per unit hour, the
mean lifetime of the equipment being 214/16 = 13.38 hours. If the cell frequencies are
less than 5, the cells are pooled so that each expected frequency is 5 or more. Here this
leaves only two classes, with expected frequencies approximately equal to 8 each and
corresponding observed frequencies 9 and 7. With only two classes and one estimated
parameter, the χ^2 test of goodness of fit cannot be used to test the assumption that the
sample data come from an Exponential distribution. To test the validity of the
assumption that the time to failure follows an Exponential distribution, consider instead
the likelihood function of the cell frequencies o_1 = 9 and o_2 = 7,

    L = [n!/(o_1! o_2!)] (e_1/n)^{o_1} (e_2/n)^{o_2}    o_1 + o_2 = n
      = 0                                               otherwise

Under H_0 the cell frequency o_1 follows a Binomial probability law b(16, p) where
p = e_1/n. To test the hypothesis H_0: the fit is the best one vs H_1: the fit is not the
best one is equivalent to testing H_0: p ≤ 0.5 vs H_1: p > 0.5. The UMP level α = 0.05
test is given by

    φ(x) = 1       if x > 11
         = 0.17    if x = 11
         = 0       otherwise

The observed value is 9, which is less than 11, so there is no evidence to reject the
hypothesis H_0 at the 5% level of significance: the data come from an Exponential
distribution. Thus the time to failure of the equipment follows an Exponential
distribution, and one may conclude that on an average the equipment can be operated
for 13.38 hours without failure.
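The expected cell frequencies in Table 2.8 follow directly from the fitted rate θ = 1/mean. A brief sketch reproducing them:

    from math import exp

    times = [19, 12, 16, 1, 15, 5, 10, 1, 46, 7, 33, 25, 4, 9, 1, 10]
    theta = 1 / (sum(times) / len(times))            # 1/13.375 = 0.0748 per hour

    edges = [0, 10, 20, 30, 40, 50]
    for lo, hi in zip(edges, edges[1:]):
        p = exp(-theta * lo) - exp(-theta * hi)      # P{lo < X <= hi}
        print(f"{lo}-{hi}", round(p, 4), round(len(times) * p, 2))
    # reproduces the p_i and expected frequencies of Table 2.8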

2.11 Recognition test of probability distribution model

The construction of Histograms and the recognition of a distributional shape are


necessary ingredients for selecting a family of distributions to represent a sample data.
A Histogram is not useful for evaluating the fit of chosen distribution. When there
are a small number of data points ( ≤ 30), a Histogram can be rather ragged. Further
perception of the fit depends on the width of Histogram intervals. Even if the intervals
are well chosen, grouping the data into cells makes it difficult to compare a Histogram
to a continuous pdf . A Quantile - Quantile (Q -Q) plot is a useful tool for evaluating
distribution fit that does not suffer from these problems.
If X is a random variable with cumulative distribution F(x), then the q quantile of X is
the value y such that F(y) = P{X ≤ y} = q, for 0 < q < 1. When F(x) has an inverse,
y = F^{-1}(q). Let x_1, x_2, ···, x_n be sample observations of X. Order the observations
from smallest to largest and denote them y_j, j = 1 to n, where y_1 ≤ y_2 ≤ ··· ≤ y_n; j is
the rank or order number, so j = 1 for the smallest and j = n for the largest. The Q - Q
plot is based on the fact that y_j is an estimate of the (j − 1/2)/n quantile of X, i.e.,
y_j is approximately F^{-1}((j − 1/2)/n).
A distribution with cumulative distribution function F(x) is a possible representation of
the random variable X. If F(x) is a member of an appropriate family of distributions,
then a plot of y_j versus F^{-1}((j − 1/2)/n) will be approximately a straight line.
If F (x) is from an appropriate family of distributions and also has appropriate
parameter values, then the line will have slope 1. On the other hand, if the assumed
distribution is inappropriate, the points will deviate from a straight line in a systematic
manner. The decision whether to accept or reject some hypothesized distribution is
subjective.

In the construction of a Q - Q plot, the following should be borne in mind.
(i) The observed values will never fall exactly on a straight line.
(ii) The ordered values are not independent, since they have been ranked.
(iii) The variances of the extremes are much higher than the variances in the middle of
the plot, so greater discrepancies can be accepted at the extremes; linearity of the points
in the middle of the plot is more important than linearity at the extremes.
A sample of 20 repairing times of an electronic watch was considered. The repairing
time X is a random variable, measured in seconds. The values are arranged in
increasing order of magnitude in Table 2.9.

Table 2.9 Repairing times of electronic watch


j Value j Value j Value j Value
1 88.54 6 88.82 11 88.98 16 89.26
2 88.56 7 88.85 12 89.02 17 89.30
3 88.60 8 88.90 13 89.08 18 89.35
4 88.64 9 88.95 14 89.18 19 89.41
5 88.75 10 88.97 15 89.25 20 89.45

Table 2.10 Normal quantiles x_j = 88.993 + 0.08 × F^{-1}((j − 1/2)/20)

    j    (j−1/2)/20   F^{-1}    x_j        j    (j−1/2)/20   F^{-1}    x_j
    1    .025         −1.96     88.84      11   .525          0.06     89.00
    2    .075         −1.41     88.88      12   .575          0.18     89.01
    3    .125         −1.13     88.90      13   .625          0.31     89.02
    4    .175         −0.93     88.92      14   .675          0.45     89.03
    5    .225         −0.75     88.94      15   .725          0.60     89.04
    6    .275         −0.60     88.95      16   .775          0.75     89.05
    7    .325         −0.45     88.96      17   .825          0.93     89.07
    8    .375         −0.31     88.97      18   .875          1.13     89.08
    9    .425         −0.18     88.98      19   .925          1.41     89.11
    10   .475         −0.06     88.99      20   .975          1.96     89.15
The ordered observations in Table 2.9 are then plotted versus the Normal quantiles
F^{-1}((j − 1/2)/n) of Table 2.10 for j = 1, 2, ···, 20, where F(·) is the cumulative
distribution function of a Normal random variable with mean 88.993 seconds and
standard deviation 0.08 seconds, to obtain the Q - Q plot. The plotted values are shown
in Figure 2.3. The general impression of a straight line is quite clear in the Q - Q plot,
supporting the hypothesis of a Normal distribution.

Figure 2.3 Q - Q plot of the repairing times (y_j versus x_j)
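A Q - Q plot of this kind can be produced programmatically. A minimal sketch, assuming the repairing-time data of Table 2.9 and the mean and standard deviation quoted above; statistics.NormalDist supplies the inverse Normal CDF.

    from statistics import NormalDist

    data = sorted([88.54, 88.56, 88.60, 88.64, 88.75, 88.82, 88.85, 88.90, 88.95, 88.97,
                   88.98, 89.02, 89.08, 89.18, 89.25, 89.26, 89.30, 89.35, 89.41, 89.45])
    n = len(data)
    model = NormalDist(mu=88.993, sigma=0.08)

    pairs = [(model.inv_cdf((j - 0.5) / n), y) for j, y in enumerate(data, start=1)]
    for x, y in pairs:
        print(round(x, 2), y)      # (theoretical quantile, ordered observation)
    # plotting these pairs, e.g. with matplotlib, should give an approximately straight line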

Note: The diagnosis of statistical distributions for real life problems is not exact; at best
they represent reasonable approximations.

2. 12 Accelerated random variable distribution models

Nowadays marketing is highly competitive because of modernized electronic goods and
the rapid growth of science and technology, and customers expect the goods they
purchase to serve them reliably. For highly reliable goods it is very difficult to obtain a
reasonable amount of failure data under usual test conditions. Accelerated random
variable probability distributions are therefore appropriate for studying the time to
failure of the experimental units: a probability distribution model is fitted to the
accelerated failure times and then extrapolated to estimate the life distribution under
the usual experimental conditions. The accelerated life test failure times are obtained by
changing the stress applied to the accelerated random variable. The stresses are
constant, progressive and step stress.

Constant stress: An accelerated random variable stress is kept at a


constant level until all test units fail.

Progressive stress:An accelerated random variable is continuously in-


creasing until all test units fail.

Step stress: An accelerated random variable allows stress of a exper-


imental unit at specified times.

A simple step stress test considers only two stress levels. Initially a low stress is applied;
if the unit does not fail within a pre-specified time, the stress on it is raised and held for
a specified time. In general the accelerated stress is repeatedly increased until the test
unit fails or a censoring time is reached. For example, if the test is a simple step stress
test, then the cumulative exposure probability distribution model with stresses X_1 and
X_2 and pre-specified time τ, such that X_1 < X_2, is

    F(x) = F_1(x)            0 ≤ x < τ
         = F_2(x − τ + s)    τ ≤ x < ∞                                   (2.5)

where F_1(x) is the cumulative distribution function of the failure time at stress X_1, τ is
the time to change stress and s is the solution of F_1(τ) = F_2(s). Under constant stress,
an accelerated random variable follows the Pareto distribution of the second kind, which
arises as a mixture of one parameter exponential distributions. The Lomax pdf of a
constant stress random variable is given by

    f(x; θ, λ) = (λ/θ) (1 + x/θ)^{−(λ+1)}    x > 0, θ > 0, λ > 0
               = 0                            otherwise

The Lomax cumulative distribution function under constant stress is

    F(x; θ, λ) = 1 − (1 + x/θ)^{−λ}    x > 0, θ > 0, λ > 0
               = 0                      otherwise

As λ → ∞ with θ = λσ, the Lomax cumulative distribution of constant stress tends to
the exponential distribution model, since (1 + x/(λσ))^{−λ} → e^{−x/σ}, i.e.,
F(x) → 1 − e^{−x/σ}, x > 0, σ > 0.
In the case of the simple step stress accelerated random variable, the following
assumptions are made for the Lomax distribution model.
(i) Testing is done at stresses X_1 and X_2 where X_1 < X_2.
(ii) The scale parameter is θ_i at stress level i, i = 1, 2.
(iii) The lifetimes of the test units are independent and identically distributed.
(iv) All n test units are initially placed on the low stress X_1 and run until time τ, when
the stress is changed to the high stress X_2; the test is continued until all test units fail
or until a pre-specified censoring time T_c.
The Lomax cumulative exposure distribution model of the simple step stress accelerated
random variable is

    F(x) = 1 − (1 + x/θ_1)^{−λ}                      0 ≤ x < τ
         = 1 − (1 + (x − τ)/θ_2 + τ/θ_1)^{−λ}        τ ≤ x < T_c
         = 0                                          otherwise

From equation (2.5), s = τ (θ_2/θ_1). The Lomax pdf of the simple step stress accelerated
random variable is

    f(x) = (λ/θ_1) (1 + x/θ_1)^{−(λ+1)}                    0 ≤ x < τ
         = (λ/θ_2) (1 + (x − τ)/θ_2 + τ/θ_1)^{−(λ+1)}      τ ≤ x < ∞
         = 0                                                otherwise
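Failure times can be drawn from this simple step stress Lomax model by inverting F(x). A minimal sketch; the parameter values are chosen purely for illustration and are not taken from the text.

    import random

    def simulate_step_stress_lomax(theta1, theta2, lam, tau, n=5, seed=5):
        """Draw failure times from the simple step-stress Lomax cumulative
        exposure model by inverting F(x); parameters are illustrative."""
        random.seed(seed)
        out = []
        f_tau = 1 - (1 + tau / theta1) ** (-lam)        # F at the stress-change time
        for _ in range(n):
            u = random.random()
            g = (1 - u) ** (-1 / lam) - 1               # inverts (1 + .)^{-lam}
            if u < f_tau:
                x = theta1 * g                          # failure under low stress
            else:
                x = tau + theta2 * (g - tau / theta1)   # failure under high stress
            out.append(round(x, 2))
        return out

    print(simulate_step_stress_lomax(theta1=100.0, theta2=40.0, lam=1.5, tau=50.0))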

The k step stress scheme with cumulative exposure model is

    F(x) = F_1(x)                          0 ≤ x < τ_1
         = F_2(x − τ_1 + s_1)              τ_1 ≤ x < τ_2
         = F_3(x − τ_2 + s_2)              τ_2 ≤ x < τ_3                  (2.6)
         = ···                             ···
         = F_k(x − τ_{k−1} + s_{k−1})      τ_{k−1} ≤ x < ∞

where F_i(x) is the cumulative distribution function of the failure time at the i th stress
level, i = 1, 2, 3, ···, k, τ_i is the time of change from the i th to the (i + 1) th stress level
and s_{i−1} is an equivalent start time at the i th stress level which produces the same
population cumulative fraction failing. Thus s_i is the solution of

    F_{i+1}(s_i) = F_i(τ_i − τ_{i−1} + s_{i−1})    with τ_0 = s_0 = 0, i = 1, 2, 3, ···, k.

From equation (2.6), the cumulative exposure model for a three step stress accelerated
random variable is

    F(x) = F_1(x)                0 ≤ x < τ_1
         = F_2(x − τ_1 + s_1)    τ_1 ≤ x < τ_2                            (2.7)
         = F_3(x − τ_2 + s_2)    τ_2 ≤ x < ∞
         = 0                     otherwise

where s_1 is the solution of F_2(s_1) = F_1(τ_1) and s_2 is the solution of
F_3(s_2) = F_2(τ_2 − τ_1 + s_1). In the case of three step stress, the following assumptions
are made for the accelerated random variable.
(i) Testing is done at stresses X_1, X_2 and X_3 where X_1 < X_2 < X_3.
(ii) The scale parameter is θ_i at stress level i, i = 1, 2, 3.
(iii) The lifetimes of the test units are independent and identically distributed.
(iv) The shape parameter λ is constant, independent of time and stress.
(v) All n test units are initially placed on the low stress X_1 and run until the
pre-specified time τ_1, when the stress is changed to the higher stress X_2 for those
remaining test units that have not failed. The test is continued until the pre-specified
time τ_2, when the stress is changed to X_3, and continued until all remaining units fail.
The Lomax cumulative exposure model for a three step stress accelerated random
variable is

    F(x) = 1 − [1 + x/θ_1]^{−λ}                                       0 ≤ x < τ_1
         = 1 − [1 + (x − τ_1)/θ_2 + τ_1/θ_1]^{−λ}                     τ_1 ≤ x < τ_2
         = 1 − [1 + (x − τ_2)/θ_3 + (τ_2 − τ_1)/θ_2 + τ_1/θ_1]^{−λ}   τ_2 ≤ x < ∞
         = 0                                                           otherwise

From equation (2.7), s_1 = τ_1 (θ_2/θ_1) and s_2 = {τ_2 − τ_1 + τ_1 (θ_2/θ_1)} (θ_3/θ_2). The
pdf of the Lomax cumulative exposure model for a three step stress accelerated random
variable is

    f(x) = (λ/θ_1) [1 + x/θ_1]^{−(λ+1)}                                       0 ≤ x < τ_1
         = (λ/θ_2) [1 + (x − τ_1)/θ_2 + τ_1/θ_1]^{−(λ+1)}                     τ_1 ≤ x < τ_2
         = (λ/θ_3) [1 + (x − τ_2)/θ_3 + (τ_2 − τ_1)/θ_2 + τ_1/θ_1]^{−(λ+1)}   τ_2 ≤ x < ∞
         = 0                                                                   otherwise

Problems

2.1 The mean and variance of the number of defective items drawn randomly one
by one with replacement from a lot are found to be 10 and 6 respectively. The
distribution of the number of defective items is

(a) Poisson with mean 10 (c) Normal (µ = 10, σ 2 = 6)


(b) Binomial (n = 25, p = 0.4) (d) Bernoulli (n = 1, p = 0.4) Ans:(b)

2.2 Let X be a Poisson random variable with mean 3. Then P{|X − 3| < 1} will be

    (a) (1/2) e^{−3}    (c) 4.5 e^{−3}
    (b) 3 e^{−3}        (d) 27 e^{−3}    Ans: (c)

2.3 Let U_(1), U_(2), ···, U_(n) be the order statistics of a random sample U_1, U_2, ···, U_n
of size n from the Uniform (0, 1) distribution. Then the conditional distribution
of U_1 given U_(n) = u_(n) is given by:

    (a) Uniform on (0, u_(n))
    (b) P{U_1 = u_(n)} = 1/n, and with probability (n − 1)/n it is uniformly distributed over (0, u_(n))
    (c) Beta(1/n, (n − 1)/n)
    (d) Uniform (0, 1)    Ans: (b)

2.4 A biased coin is tested 4 times or until a head turns up, whichever occurs earlier.
The distribution of tails turning up is

(a) Binomial (c) Negative Binomial


(b) Geometric (d) Hypergeometric Ans:(b)

2.5 If X and Y are independent Exponential random variables with the same mean
λ, then the distribution of min(X, Y) is

    (a) Exponential with mean λ/2    (c) Not Exponential, with mean λ
    (b) Exponential with mean λ      (d) Exponential with mean 2λ    Ans: (a)

2.6 The χ2 goodness of fit is based on the assumption that a character under study is

(a) Normal (c) Any distribution


(b) Non - Normal (d) Not required Ans:(a)

2.7 The exact distribution of χ2 goodness of fit for each experiment unit is classified
into one of more k categories of a random sample of size n depends on

(a) Hypergeometric distribution (c) Multinomial distribution


(b) Normal distribution (d) Binomial distribution Ans:(c)

2.8 If X1 ∼ b(n1 , θ1 ), X2 ∼ b(n2 , θ2 ) and X1 , X2 are independent, then the sum of


variables X1 + X2 is distributed as

(a) Hypergeometric distribution (c) Poisson distribution


(b) Binomial distribution (d) Not Binomial distribution Ans:(d)

2.9 If X1 ∼ b(n1 , θ), X2 ∼ b(n2 , θ) and X1 , X2 are independent, then the sum of
variables X1 + X2 is distributed as

(a) Hypergeometric distribution (c) Poisson distribution


(b) Binomial distribution (d) Negative Binomial Distribution Ans:(b)

2.10 If X1 ∼ P (θ1 ), X2 ∼ P (θ2 ) and X1 , X2 are independent,then the sum of variables


X1 + X2 is distributed as

(a) Hypergeometric distribution (c) Poisson distribution


(b) Binomial distribution (d) Satisfies the additive property
Ans:(c) and (d)

2.11 The skewness of a Binomial distribution with parameters n and p will be zero if:

(a) p < 0.5 (c) p 6= 0.5


(b) p > 0.5 (d) p = 0.5 Ans:(d)

2.12 If the sample size n = 2 in the Student’s t - distribution, then it reduces to

(a) Normal distribution (c) χ2 - distribution


(b) F - distribution (d) Cauchy distribution Ans:(d)

2.13 The reciprocal property of the F_{n_1 − 1, n_2 − 1} distribution can be expressed as

    (a) F_{n_2, n_1}(1 − α) = 1/F_{n_1, n_2}(α)
    (b) P{F_{n_1, n_2}(α) ≥ c} = P{F_{n_2, n_1}(α) ≤ 1/c}
    (c) F_{n_2, n_1}(1 − α/2) = 1/F_{n_1, n_2}(α/2)
    (d) All the above    Ans: (d)

2.14 The distribution for which the moment generating function is not useful in finding
the moments is

    (a) Binomial distribution             (c) Hypergeometric distribution
    (b) Negative Binomial distribution    (d) Geometric distribution    Ans: (c)

2.15 Probability of selecting a unit from a population of N units in a simple random


sampling technique is a

(a) Bernoulli distribution (c) Geometric distribution


(b) Binomial distribution (d) Discrete uniform distribution Ans:(d)

2.16 A production process is a sequence of Bernoulli trials, the number of x defective


units in a sample of n units is a

(a) Bernoulli distribution (c) Multinomial distribution


(b) Binomial distribution (d) Hypergeometric distribution Ans:(b)

2.17 A random variable X is related to a sequence of Bernoulli trials in which the


number of trials (x + 1) to achieve the first success, then the distribution of X is

(a) Bernoulli distribution (c) Multinomial distribution


(b) Binomial distribution (d) Geometric distribution Ans:(d)

Pn
2.18 If X1 , X2 , · · · , Xn are iid Geometric variables, then i=1 Xi follows

(a) Negative Binomial distribution (c) Multinomial distribution


(b) Binomial distribution (d) Geometric distribution Ans:(a)

2.19 A random variable X is related to a sequence of Bernoulli trials in which x


failures preceding the nth success in (x + n) trials is a

(a) Binomial distribution (c) Negative Binomial distribution


(b) Multinomial distribution (d) Geometric distribution Ans:(c)

2.20 If a random experiment has only two mutually exclusive outcomes of a Bernoulli
trial, then the random variable leads to

(a) Binomial distribution (c) Negative Binomial distribution


(b) Multinomial distribution (d) Geometric distribution Ans:(a)

2.21 A box contains N balls M of which are white and N − M are red. If X denotes
the number of white balls in the sample contains n balls with replacement, then
X is a

(a) Binomial variable (c) Negative Binomial variable


(b) Bernoulli variable (d) Hypergeometric variate. Ans:(a)

2.22 The number of independent events that occur in a fixed amount of time may
follow

(a) Exponential distribution (c) Geometric distribution


(b) Poisson distribution (d) Gamma distribution Ans:(b)

2.23 A power series distribution is given by

    P_θ{X = x} = a_x θ^x / f(θ)    x ∈ S, a_x ≥ 0
               = 0                  otherwise

where f(θ) = (1 + θ)^n, θ = p/(1 − p) and S = {0, 1, 2, ···, n}. Then the random variable
X has a

    (a) Geometric distribution    (c) Binomial distribution
    (b) Bernoulli distribution    (d) Negative Binomial distribution    Ans: (c)

2.24 The probability function p(x) = 2/3^{x+1} for x = 0, 1, 2, 3, ···, represents a

    (a) Negative Binomial distribution    (c) Bernoulli distribution
    (b) Binomial distribution             (d) Geometric distribution    Ans: (d)

2.25 Dinesh receives 2, 2, 4 and 4 telephone calls on 4 randomly selected days. Assum-
ing that the telephone calls follow Poisson distribution, the estimate of telephone
calls in 8 days is

(a) 12 (c) 24
(b) 3 (d) 25 Ans:(c)

2.26 The exact distribution of χ2 goodness of fit test for each experiment units is
classified into one of two categories with a random sample of size n depends on

(a) Hypergeometric distribution (c) Multinomial distribution


(b) Normal distribution (d) Binomial distribution Ans:(d)

2.27 The pmf of a random variable X is

    p_θ(x) = Σ_{k=0}^{∞} (−1)^k C(k + x, k) θ^{x+k} / Γ(x + k + 1)    x = 0, 1, ···
           = 0                                                         otherwise

It is known as

    (a) Binomial             (c) Poisson
    (b) Negative Binomial    (d) Geometric    Ans: (c)

2.28 A fair coin is tossed repeatedly. Let X be the number of tails before the first
head occurs and let Y denote the number of trials observed between the occurrence
of the first and second heads. Let X + Y = N. Then which of the following
statements is true?

    (a) X and Y are independent random variables with
        P{X = k} = P{Y = k} = 2^{−(k+1)}    k = 0, 1, 2, ···
                            = 0              otherwise
    (b) N has a probability mass function given by P{N = k} = (k + 1) 2^{k−1} for k = 0, 1, 2, ···, and 0 otherwise
    (c) N has a probability mass function given by P{N = k} = (k − 1) 2^{k−1} for k = 0, 1, 2, ···, and 0 otherwise
    (d) N has a probability mass function given by P{N = k} = (k + 1) 2^{k−1} for k = 0, 1, 2, ···, and 0 otherwise

    Ans: (a)

2.29 A fair coin is tossed repeatedly. Let X be the number of tails before the first
head occurs and let Y denote the number of trials observed between the occurrence
of the first and second heads. Let X + Y = N. Then which of the following
statements is true?

    (a) Given N = n, the conditional distributions of X and Y are independent.
    (b) Given N = n, P{X = k} = 1/(n + 1) for k = 0, 1, 2, ···, n, and 0 otherwise
    (c) Given N = n, P{X = k} = 1/n for k = 0, 1, 2, ···, and 0 otherwise
    (d) Given N = n, P{X = k} = 1/(k + 1) for k = 0, 1, 2, ···, and 0 otherwise

    Ans: (b)

2.30 Suppose that (X, Y) has a joint distribution with the marginal distribution of
X being N(0, 1) and E[Y | X = x] = x^3 for all x ∈ ℜ. Then which of the following
statements is true?

(a) Corr(X, Y ) = 0 (c) Corr(X, Y )< 0


(b) Corr(X, Y ) > 0 (d) X and Y are independent Ans: (b)

2.31 An urn has 3 red and 6 black balls. Balls are drawn at random one by one
without replacement. The probability that the second red ball appears at the 5th
draw is

    (a) 1/9!     (c) 4 × 6! 4!/9!
    (b) 4!/9!    (d) 6! 4!/9!    Ans: (c)

2.32 Suppose (X, Y)' is a random vector such that the marginal distribution of X and
the marginal distribution of Y are the same and each is normally distributed
with mean 0 and variance 1. Then which of the following conditions implies
independence of X and Y?

    (a) Cov(X, Y) = 0             (c) P{X ≥ 0, Y ≤ 0} = 1/4
    (b) P{X ≤ 0, Y ≤ 0} = 1/4     (d) αX + βY ∼ N(0, α^2 + β^2) ∀ α and β    Ans: (d)

X
2.33 Suppose Y is a random vector such that the marginal distribution of X and
the marginal distribution of Y are the same and each is normally distributed
with mean 0 and variance 1. Then which of the following conditions imply
1 − 12 (x2 +y 2 )
independence of X and Y ?. For all t and s ∈ < (a) f (x)(y) = 2π e =
f (x, y)
(b) E[eitX+itY ] = E[eitX ]E[eitY ]
(c) E[eitX+isY ] = E[eitX ]E[eisY ]
(d) f (x + y) = f (x)f (y) Ans: (a),(b),(c) and (d)

2.34 Consider a region R which is a triangle with vertices (0, 0), (0, θ), (θ, 0) where θ > 0.
A sample of size n is selected at random from this region R. Denote the sample as
{(X_i, Y_i), i = 1, 2, 3, ···, n}, and write X_(n) = max(X_1, X_2, ···, X_n) and
Y_(n) = max(Y_1, Y_2, ···, Y_n). Which of the following statements is true?

    (a) X_(n) and Y_(n) are independent    (c) Cov[X_(n), Y_(n)] = 0
    (b) Corr(X_(n), Y_(n)) = 0             (d) V[X_(n)] > 0 and V[Y_(n)] > 0
    Ans: (a), (b), (c) and (d)

2.35 There are two boxes. Box I contains 2 red balls and 4 green balls; Box II contains
4 red balls and 2 green balls. A box is selected at random and a ball is chosen
randomly from the selected box. If the ball turns out to be red, what is the
probability that Box I had been selected?

    (a) 1/2    (c) 2/3
    (b) 1/3    (d) 1/6    Ans: (b)

2.36 For any two events A and B, which of the following relations always holds?

    (a) P^2(A ∩ B^c) + P^2(A ∩ B) + P(A^c) ≥ 1/3
    (b) P^2(A ∩ B) + P^2(A ∩ B^c) + P^2(A^c) = 1/3
    (c) P^2(A ∩ B^c) + P^2(A ∩ B) + P^2(A^c) = 1
    (d) P^2(A ∩ B^c) + P(A ∩ B) + P^2(A^c) = 1    Ans: (a)

2.37 Suppose customers arrive at a shop according to a Poisson process with rate 4
per hour. The shop opens at 10 AM. If it is given that the second customer arrives
at 10.40 AM, what is the probability that no customer arrived before 10.30 AM?

    (a) 1/4        (c) 1/2
    (b) e^{−2}     (d) e^{−1/2}    Ans: (a)

2.38 Suppose X_1, X_2, ···, X_n is a random sample from a distribution with probability
density function

    f(x) = 3x^2    0 < x < 1
         = 0       otherwise

Let Y = max(X_1, X_2, ···, X_n). What is the pdf of Y?

    (a) f(y) = 3n y^{3n−1},  0 < y < 1
    (b) f(y) = (1 − y^3)^n,  0 < y < 1
    (c) f(y) = 1 − (1 − y^3)^n,  0 < y < 1
    (d) f(y) = 3n y^2 (1 − y^3)^{n−1},  0 < y < 1    Ans: (a)

2.39 A and B play a game of tossing a fair coin. A starts the game by tossing the
coin twice, followed by A tossing the coin once and B tossing the coin twice and
this continues until a head turns up whoever gets the first head wins the game.
Then

(a)P {B wins} > P {Awins} (c) P {A wins} = 1 − P {B wins}


(b) P {B wins} = 2P {Bwins} (d) P {A wins} = 3P {B wins} Ans: (c)

2.40 Let {Xi : i = 1, 2, 3 · · ·} be a sequence of independent random variables each


having a Normal distribution with mean 2 and variance 5. Then which of the
following are true?.
1 Pn 2
(a) n i=1 Xi converges in probability to 9
n P o2
1 n
(b) n i=1 Xi converges in probability to 25
n P o2
1 n
(c) n i=1 Xi converges in probability to 81
n P o2
1 n
(d) n i=1 Xi converges in probability to 4 Ans:(a)

2.41 Let {Xi : i = 1, 2, 3 · · ·} be a sequence of independent random variables each


having a Normal distribution with mean 2 and variance 5. Then which of the
following are true?.
1 Pn
(a) n i=1 Xi converges in probability to 2
n P o2
1 n
(b) n i=1 Xi converges in probability to 25
(c) n1 ni=1 Xi2 converges in probability to 4
P
n P o2
1 n
(d) n i=1 Xi converges in probability to 4 Ans:(a)

2.42 Let X be a random variable with a certain non degenerate distribution. Then
identify the correct statements.
A.Santhakumaran 76

(a) If X has an Exponential distribution, then Median[X] < E[X]


(b) If X has a Uniform distribution on an interval[a, b], then E[X] < Median[X]
(c) If X has a Binomial distribution , then E[X] < V [X]
(d) If X has a Normal distribution, then E[X] < V [X] Ans:(c)

2.43 A random sample (without replacement) of sizen is drawn from a finite popu-
lation of size N (≥ 7). What is the probability that the 4th population unit is
included in the sample but the 6th population unit is not included in the sample?.

n(n−1) (N −n+1)
(a) N (N −1) (c) N (N −1)
n(N −n) n
(b) N (N −1) (d) N Ans:(b)

2.44 A fair die is thrown two times independently . Let X, Y be the outcomes of these
two throws and Z = X + Y . Then which of the following statement is true?

(a) Y and Z are independent (c) X and Z are independent


(b) Z and Y are correlated (d) X and Z are correlated Ans:(a)

2.45 Suppose the random variable T follow an Exponential distribution with unit
mean. Which of the following statement is true?.
(a) The hazard function of T is a constant function
(b) The hazard function of T 2 is canstant function
(c) The hazard function of T 3 is identity function
(d) The hazard function of T is not constant Ans:(a)

2.46 X and Y are independent random variables each having the pdf is

 1 12

−∞ < t < ∞
π 1+t
f (t) =
 0

otherwise

X+Y
The density function of Z = 3 where −∞ < z < ∞ is
A.Santhakumaran 77

6 1 3 1
(a) π 4+9z 2 (c) π 1+9z 2
6 1 6 1
(b) π 5+8z 2 (d) π 9+9z 2 Ans: (c)

2.47 Let Nt denote the number of accidents up to time t. Assume that {Nt } is a
Poisson process with intensity 2. Given that there are exactly 5 accidents during
the time period [20, 30]. What is the conditional probability that there is exactly
one accident during the time period [15, 25]?.

15 −10 1
(a) 12 e (c) 5
105 −30
(b) 5! e (d) 20e−20 Ans:(a)

2.48 Suppose (X, Y ) follows a bivariate Normal distribution with E[X] = E[Y ] =
−y 2 /2 dy,
Rx
0, V [X] = V [Y ] = 2 and Cov[X, Y ] = −1 , if φ(x) = √1
2π −∞ e then
P {X − Y > 6} is


(a) φ(−1) (c) φ( 6)
(b) φ(−3) (d) φ(−6) Ans: (c)

2.49 Let X1 ∼ N (0, 1) and let



 −X1

−2 < X1 < 2
X2 =
 X1

otherwise

Then identity the correct statement is


(a) (X1 , X2 ) does not have a bivariate Normal distribution
(b) Correlation coefficient between X1 and X2 is 1
(c) X1 is directly proportional to X2
(d) (X1 , X2 ) has a bivariate Normal distribution Ans:(a)

2.50 Let X and Y be independent and identically distributed random variables such
1 1
that P {X = 0} = P {X = 1} = 2 and P {Y = 0} = P {Y = 1} = 2 Let
Z = X + Y and W = |X − Y |. Then which statement is not correct?.
A.Santhakumaran 78

(a) X and W are independent (c) Z and W are independent


(b) Y and W are independent (d) Z and W are dependent Ans:(c)

2.51 Hundred tickets are marked 1, 2, 3 · · · 100 and are arranged at random. Four
tickets are picked from these tickets and are given to four persons A, B, C and
D. What is the probability that A gets the ticket with the largest value( among
A, B, C, D) and D gets the ticket with the smallest value( among A, B, CD) ?.

1 1
(a) 2 (c) 6
1 1
(b) 4 (d) 12 Ans: (d)

2.52 Suppose X and Y are independent and identically distributed random variables
and let Z = X + Y . Then the distribution of Z is in the same family as that of
X and Y , if X

(a) Normal distribution (c) Binomial Distribution


(b) Exponential distribution (d) Poisson Distribution Ans: (a) (c) and (d)

2.53 Suppose X is a random variable with following pdf



 pe−x + qe−x

x > 0, p + q = 1 and 0 ≤ p ≤ 1
f (x) =
 0

otherwise

Then the hazard function of X is a (a)constant function for p = 0 and 1


(b) constant function ∀ p ∈ [0, 1]
(c) increasing function ∀ p ∈ (0, 1)
(d) non-monotonic function ∀ p ∈ (0, 1) Ans: (a)

2.54 Let (Ω, F, P ) be a probability space and let A be an event with P (A) > 0. In
which of the following cases does B define a probability measure on (Ω, F )
(a) B(D) = P (A ∪ D) ∀ D ∈ F
(b) B(D) = P (A ∩ D) ∀ D ∈ F
A.Santhakumaran 79

(c)Q(D) = P (D | A) ∀ D ∈ F
(d) 
 P (A | D)

if D ∈ F with P (D) > 0
B(D) =
 0

if P (D) = 0
Ans:(c)

2.55 The joint pdf of (X, Y ) is



 6(1 − x) o < y < x, 0 < y < 1

f (x, y) =
 0

otherwise

Which among the following statement is correct?.


(a) X and Y are independent
(b) 
 3(y − 1)2

0 < y, 1
f (y) =
 0

otherwise
(c) 
 3(y − 1 y 2 ) 0 < y < 1

2
f (y) =
 0

otherwise
(d) 
 (y − 1 y 2 ) 0 < y < 1

2
f (y) =
 0

otherwise
Ans:(b)

2.56 Let {Xi : i = 1, 2, 3 · · ·} be a sequence of independent and identically distributed


random variables with common density function

 e−x

x>0
f (x) =
 0

otherwise

Let {Yi : i = 1, 2, 3 · · ·} be a sequence of iid random variables with common


density function 
 4e−4y

y>0
g(y) =
 0

otherwise
A.Santhakumaran 80

Also {Xi } and {Yi } are independent families. Let Zk = Yk − 3Xk , k = 1, 2, 3 · · ·


Which among the following statement is true?

Pn 1
(a) {Zk > 0} > 0 (c) i=1 Zk → +∞ with probability 2
P∞ 1
(b) P {Zk > 0} < 0 (d) i=1 Zk → −∞ with probability 2 Ans:(a)

2.57 Let X denote the Exponential distribution with parameter λ > 0. Fix a > 0.
Define the random variable Y by Y = k if ka ≤ x < (k + 1)a , k = 0, 1, 2, 3 · · ·.
Which of the following statement is true?.
(a)P {4 < Y < 5} = 0
(b) Y ∼ an Exponential form distribution
(c) Y ∼ a Negative Exponential distribution with parameter aλ
(d) Y ∼ a Negative Exponential distribution with parameter a Ans: (b)

2.58 Let X and Y be random variables with joint probability density function

 cxy o<x<y<1 c∈<

f (x, y) =
 0

otherwise

Which of the following statement is correct?.

1
(a) c = 8 (c) X and Y are independent
1
(b) c = 4 (d) P {X = Y } = 0 Ans:(c)

2.59 Let {Xn , n ≥ 1} be iid Uniform (−1, 2) random variables. Which of the following
1
Xi → 0 almost surely
P
statement is true ?. (a) n

(b) sup{X1 , X2 , · · ·} → 0 almost surely


(c) inf{X1 , X2 , · · ·} → 0 almost surely
1 1
(d) { 2n X2i − X2i−1 } → 1 almost surely
P P
2n

Ans:(a)

2.60 Suppose X1 , X2 , X3 , · · · are iid random variables having common density function
Assume f (x) = f (−x) ∀ x ∈ <. Which of the following statement is correct?.
A.Santhakumaran 81

(a) n1 (X1 + X2 + · · · + Xn ) → 0 in probability


1
(b) n (X1 + X2 + · · · + Xn ) → 0 almost surely
(c) P { √1n (X1 + X2 + · · · + Xn ) < 0} → 1
2
Pn Pn i
(d) i=1 Xi has the same distribution as i=1 (−1) Xi Ans: (d)

2.61 Let X and Y be iid Uniform (0, 1) random variables. Let Z = max(X, Y ) and
W = min(X, Y ). Then P {(Z − W ) > 21 } is

1 1
(a) 2 (c) 4
3 2
(b) 4 (d) 3 Ans:(c)

2.62 Two students are solving the same problem independently, if the probability
3
that the first one solves the problem is 5 and the probability that the second one
4
solves the problem is 5 . What is the probability that at least one of them solves
the problem?

16 21
(a) 25 (c) 25
18 23
(b) 25 (d) 25 Ans: (d)

2.63 A standard fair die is rolled until some face other than 5 or 6 turns up. Let X
denote the face value of the last roll and A = [ X even] and B = [ X is at most
2]. Then

1
(a) P (A ∩ B) = 0 (c) P (A ∩ B) = 4
1
(b) P (A ∩ B) = 6 (d) P (A ∩ B) = 13 . Ans:(a)

2.64 Let X be a random variable with pdf



 1

x > 0, θ > 0
θ
f (x) =
 0

otherwise

Denote Y = k if k ≤ X < k + 1, k = 0, 1, 2, 3 · · · . Then the distribution of Y is


A.Santhakumaran 82

(a) Normal distribution (c) Poisson distribution


(b) Binomial distribution (d) Geometric distribution Ans: (d)

2.65 A box contains 40 numbered red balls and 60 numbered black balls. From the
box balls are drawn one by one at random without replacement till the balls are
drawn. The probability that the last ball drawn black is

1 3
(a) 100 (c) 5
1 2
(b) 60 (d) 5 Ans: (d)

2.66 Suppose the random variable X have the pdf



 α(x − µ)α−1 e−(x−µ)2

x > µ, α > 0, −∞ < µ < ∞
f (x) =
 0

otherwise

Which of the following statements are correct? The hazard function of X is


(a) an increasing function for all α > 0
(b) a decreasing function for all α > 0
(c) an increasing function for some α > 0
(d) an implicit function for all α > 0 and −∞ < µ < ∞ Ans: (b)

2.67 Ten balls are put in 6 slots at random. Then the expected total number of balls
in the two extreme slots is

10 1
(a) 6 (c) 6
10 6
(b) 3 (d) 10 Ans:(b)

X−Y
2.68 Let X and Y be independent random variables and Z = 2 + 3. If X has
characteristic function ϕ and Y has characteristic function ψ, then Z has char-
acteristic function φ where

(a) φ(t) = e−i3t ϕ(2t)ψ(−2t)(c) φ(t) = e−i3t ϕ(t/2)ψ(t/2)


(d) φ(t) = e−i3t ϕ(t/2)ψ(−t/2)
(b) φ(t) = ei3t ϕ(t/2)ψ(−t/2)
Ans :(b)
A.Santhakumaran 83

2.69 Suppose Xn , X are random variables such that Xn converges in distribution to X


and (−)n Xn also converges to X. Then (a) X must be a symmetric distribution
(b) X must be 0
(c) X must have a density
(d) X 2 must be a constant Ans:(a)

2.70 Assume X ∼ Binomial(n, p) for some n = 1, 2, 3, · · · and 0 < p < 1 and Y ∼


Poisson ( λ) for some λ > 0. Suppose E[X] = E[Y ]. Then

(a) V [X] = V [Y ] (c) V [Y ] < V [X]


(b) V [X] < V [Y ] (d) E[X 2 ] = E[Y 2 ] Ans:(b)

2.71 Let X and Y be two random variables with joint probability density function

 1

if 0 ≤ x2 + y 2 ≤ 1
π
f (x, y) =
 0

otherwise

which of the following statement is correct ?

1
(a) P {X > 0} = 2 (c) Cov(X, Y ) = 0
1
(b) E[Y ] = 0 (d) E[Y ] = 2 Ans:(b)

2.72 Let X any Y be two random variables satisfying X ≥ 0, Y ≥ 0, E[X] =


3, V [X] = 9, E[Y ] = 2 and V [Y ] = 4. Which of the following statement is
correct?.

(a) Cov(X, Y ) ≤ 4 (c) V [X + Y ] ≤ 25


(b) E[XY ] > 6 (d) E[X + Y ] ≥ 25 Ans:(a)

2.73 A sample random sample of size n will be drawn from a class of 125 students,
and the mean mathematics score of the sample will be computed. If the standard
error of the sample mean for with replacement sampling is twice as much as the
standard error of the sample mean for without replacement sampling, the value
of n is
A.Santhakumaran 84

(a) 32 (c) 79
(b) 63 (d) 94 Ans: (c)

2.74 Let F (t) , h(t) and m(t) be the life time distribution function, the hazard function
and the mean residual lifetime function respectively, defined [0, ∞). Assume that
F (t) is absolutely continuous which of the following statements is true?.
R∞
(a) 0 h(t)dt =1
R ∞
(1−F (y))du
(b) m(t) = t
1−F (t) for t > 0
(c) m(t) is strictly increasing in t if the life time distribution is Exponential with
mean λ > 0.
R∞
(d) 0 h(t)dt 6= 1 Ans(a)

2.75 Let F (t) , h(t) and m(t) be the life time distribution function, the hazard function
and the mean residual lifetime function respectively, defined [0, ∞). Assume that
F (t) is absolutely continuous which of the following statements is true?.
(a) h(t)m(t) = 1 ∀ t > 0 if the life time distribution is Exponential with mean
λ > 0. R∞
(1−F (y))du
(b) m(t) = t
1−F (t) for t > 0
(c) m(t) is strictly increasing in t if the life time distribution is Exponential with
mean λ > 0.
R∞
(d) 0 h(t)dt 6= 1 Ans:(a)

2.76 A parallel system consists of n identical components. The life times of the
components are independently identically distributed unifom random variables
with mean 30 hours and range 60 hours. If the expected life time of the system
is 50 hours, then the value of n

(a) 3 (c) 5
(b) 4 (d) 6 Ans:(b)

2.77 Suppose A, B, C are events in a common probability space with P (A) =


0.2, P (B) = 0.2, P (C) = 0.3, P (A ∩ B) = 0.1, P (B ∩ C) = 0.1, P (A ∩ C) = 0.1.
A.Santhakumaran 85

Which of the following is a possible value of P (A ∪ B ∪ C)

(a) 0.1 (c) 0.3


(b) 0.2 (d) 0.4 Ans:( d)

2.78 Which of the following statement is correct ?.


X+Y
(a) If X and Y are N (0, 1) , then √
2
∼ N (0, 1)
X
(b) If X and Y are independent N (0, 1) , then Y has the Gamma distribution.
X+Y
(c) If X and Y are independent Uniform (0, 1), then 2 is U (0, 1)
(d) If X is Binomial(n, p), then (n − x) is Binomial(n, 1 − p). Ans:(d)

2.79 Which of the following is correct ?.


X+Y
(a) If X and Y are N (0, 1) , then √
2
∼ N (0, 1)
X
(b) If X and Y are independent N (0, 1) , then Y is a standard Cauchy random
variable
X+Y
(c) If X and Y are independent Uniform (0, 1), then 2 is U (0, 1)
(d) If X is Binomial(n, p), then (n − x) is Binomial(n, p). Ans:(b)

2.80 Let X and Y be independent Exponential random variables. If E[X] = 1 and


E[Y ] = 12 , then P {X > 2Y /X > Y } is

1 2
(a) 2 (c) 3
1 3
(b) 3 (d) 4 Ans:(c)

2.81 X and Y are independent Exponential random variables with mean 4 and 5
respectively. Which of the following statements is true?.
(a) X + Y is Exponential distribution with mean 9
(b) XY is Exponential distribution with mean 20
(c) max(X, Y ) is Exponential distribution
(d) min(X, Y ) is Exponential distribution Ans:(d)

2.82 There are five empty boxes. balls are placed independently one after another in
randomly selected boxes. The probability that the fourth ball is the first to be
A.Santhakumaran 86

placed in an occupied box equals

(a) 54 ( 53 )2 (c) ( 35 )2
(b) ( 35 )3 (d) 54 ( 35 ) Ans:(d)

2.83 From the letters A, B, C, D, E and F are chosen at random with replacement.
What is the probability that either the word BAD or the CAD can be formed
the chosen letter?.

1 6
(a) 216 (c) 216
3 12
(b) 216 (d) 216 Ans:(d)

2.84 Let X be a random variable which is symmetric about 0. Let F be the cumulative
distribution function of X. Which of the following statements is always true?
For all x ∈ <

(a) F (x) + F (−x) = 1 (d) F (x) + F (−x) = 1 − P {X = x}


(b) F (x) − F (−x) = 0 Ans: (c)
(c) F (x) + F (−x) = 1 + P {X = x}

2.85 Let Xi ’s independent random variables such that Xi ’s are symmetric about 0
and V [Xi ] = 2i − 1, i ≥ 1. Then limn→∞ P {X1 + · · · + Xn > n log}

(a) dose not exist (c) equals 1


1
(b) equals 2 (d) equals 0 Ans:(d)

2.86 Let X1 and X2 be normal random variables with mean 0 and variance 1. Let
U1 and U2 be iid U (0, 1) random variables independent of X1 , X2 . Define Z =
X√
1 U1 +X2 U2
. Then
U12 +U22

(a) E[Z] = 2 (c) Z is standard Cauchy distribution


(b) V [X] = 4 (d) Z ∼ N (0, 1) Ans:(d)
A.Santhakumaran 87

2.87 Let X and Y be independent normal random variables with mean 0 and variance
1. Let the characteristic function of XY be denoted by φ. Then

1
(a) φ(2) = 2 (c) φ(t)φ( 1t ) = |t| ∀ t 6= 0
t2
(b) φ(t) is a even function (d) φ(t) = E[e− 2 ] Ans:(b)

2.88 Let X and Y be random variables with joint cumulative distribution F (x, y).
Then which sufficient for (x, y) ∈ <2 to be a point of community of F ?.
(a) P {X = x, Y = y} = 0
(b) Either P {X = x} = 0 or P {Y = y} = 0
(c) P {X = x} = 0 and P {Y = y} = 0
(d) P {X = x, Y ≤ y} =
6 0 and P {X ≤ x, Y = y} = 0 Ans:(c)

2.89 Let X and Y be random variables with joint cumulative distribution F (x, y).
Then which sufficient for (x, y) ∈ <2 to be a point of community of F ?.
(a) P {X = x, Y = y} = 0
(b) Either P {X = x} = 0 or P {Y = y} = 0
(c) P {X = x} =
6 0 and P {Y = y} = 0
(d) P {X = x, Y ≤ y} = 0 and P {X ≤ x, Y = y} = 0 Ans:(d)

2.90 Let X1 , X2 , · · · Xn be iid standard normal variables. Which of the following is


true?

2|X1 |
(a) √ ∼ Student’s t distribution with 2 degrees of freedom
X22 +X32
X −X +X
√ 1 2 3 ∼ Student’s t distribution with 1 degree of freedom
(b)
2|X1 +X2 +X3 |
(X1 −X2 )2
(c) (X1 +X2 )2
∼ F distribution with (2, 2) degrees of freedom
3X12
(d) X12 +X22 +X32
∼ F distribution with (1, 3) degrees of freedom . Ans:(d)

2.91 The distribution function of a random variable X is F (x) = 1−(1+x)e−x , x ≥ 0.


Find the pdf of X. [Ans:f (x) = xe−x , x > 0]

2.92 If X denote the pdf 


 λe−λx

x>0
f (x) =
 0

otherwise
A.Santhakumaran 88

Find the distribution of Y = e−λx . [Ans: F (y) = (1 − y) 0 ≤ y ≤ 1 ]

2.93 The pdf of the random variable X is



1

 x 0≤x<5
 25


f (x) = 1
 25 (10 − x) 5 ≤ x < 10


 0

otherwise

1
Find the pdf of Y = X − 5. [ Ans: f (y) = 25 (y + 5) − 5 ≤ y ≤ 0 and
1
25 (5 − y) 0 ≤ x < 5]

2.94 If pdf of the random variable X is



 1
 1
−∞ < x < ∞
π 1+x2
f (x) =
 0

otherwise

Find the pdf of Y = tan1 X. [ Ans:f (y) = − π1 < x < π1 ]

2.95 If the random variable X is uniformly distributed over ( -1, 1). Find the density
function of Y = sin( πX 2
2 ). [ Ans:f (y) = 9 (y − 1) 1 < y < 4]

2.96 Let X be a random variable whose cumulative distribution function is





 0 x < −3


 1

−3 ≤ x < 6

6
F (x) =
1



 2 6 ≤ x < 10


x ≥ 10

 1

1 1
Find P {X ≤ 4}, P {−5 < X ≤ 4} and P {X = 4}.[Ans: 6, 6 and 0 ]

1
2.97 If X has the pdf f (x) = π, −π < x < π, find the pdf of Y = tan X. [ Ans:
1 1
f (y) = π 1+y 2 − ∞ < x < ∞]

2.98 The moment generating function of a random variable X is MX (t) = (1−p)+pet .


Find the pmf of the random variable X. [ Ans:P {X = x} = px (1 − p)1−x x =
0, 1]
A.Santhakumaran 89

2.99 Telephone calls are being placed a certain exchange at random times on the
average of 4 per minute. Assuming a Poisson law, determine the probability that
in a 15 seconds interval there are 3 or more calls. [ Ans: λ = 1, P {X ≥ 3} =
0.0803.]

2.100 Customer enter a waiting line at random at a rate of 4 per minute. Assuming
that the number enter the line in any given time interval has a Poisson distribu-
tion. Determine the probability that at least one customer enters the line in a
given half-minute interval. [ Ans: λ = 2, P {X ≥ 1} = 0.8647.]

2.101 Men arrive at a service counter according to a Poisson at an average of 6 per


hour, Women according to a Poisson Process at an average of 12 per hour and
Children according to a Poisson process at an average of 12 per hour. Determine
the probability that at least two customers(without regard to sex or age) arrive
5
in a 5 - minutes period. [Ans:λ = 2 , and P {X ≥ 2} = 0.7127.]
3. GOVERNING CRITERIA OF POINT ESTIMATION

3.1 Introduction

Suppose a random experiment outcomes results into random variable X which


describes some characteristics of the population.Then the associated function pθ (x) or
fθ (x) of the random variable X is known as probability model distribution, where as
θ is a measure of some unknown physical quantity. The unknown physical quantity
relates to the population is known as parameter which takes real values in the interval
(−∞, ∞). The totality of all possible values of θ is known as parameter space Ω. If
the parameter θ is known, then the pdf /pmf is completely specified and there by the
study of the random experiment is completely known for making decisions.

3.2 Estimators

The purpose of estimation theory is to arrive at some reasonable estimate of


unknown parameter θ which gives value based on the random experiment results
X1 , X2 , X3 , · · · Xn such that δ(T ) = T (X1 , X2 , X3 , · · · , Xn ). The T is called a Statis-
tic or an Estimator of the parameter θ ∈ Ω. For corresponding to the sample
X1 = x1 , X2 = x2 , · · · , Xn = xn is called estimate and is denoted by δ(t).

3.3 Loss function of estimators

The unknown quantity of the parameter θ is estimated by the estimator δ(T ).


This choice value of δ(T ) is not true to the actual value of θ . The difference between
δ(T ) and θ is known as the loss function and denoted L[θ, δ(T )] so that
L[θ, δ(T )] ≥ 0 if θ 6= δ(T ) and
L[θ, δ(T )] = 0 , if θ = δ(T )
where L[θ, δ(T )] is a random variable , since δ(T ) is a random variable. For a loss
function, the risk of an estimator δ(T ) is defined as

R[θ, δ(T )] = Eθ [L(θ, δ(T ))]


A.Santhakumaran 91

where the expectation is respect to θ. The risk R[θ, δ(T )] is an average loss and it
is assumed that R[θ, δ(T )] < ∞ ∀ ∈ θ. Risk is a measure of accuracy an estimator.
A well defined class of estimators are unbiasedness or equivariance. For obtaining an
optimal estimator for the unknown parameter θ, the risk is minimum. It leads the
successes of estimation theory.
For example, a random variable X is assumed to follow a Normal distribution
with mean θ and variance σ 2 . The parameter space Ω = {(θ, σ); −∞ < θ < ∞, 0 <
σ 2 < ∞}. Suppose a random sample X1 , X2 , X3 , · · · , Xn is taken on X. Here a statistic
T = t(X) from the sample X1 , X2 , · · · , Xn which gives the best value for the parameter
θ. Particular value of the Statistic T = t(x) = x̄ based on the values x1 , x2 , · · · , xn
is called an estimate of θ. If the statistic T = X̄ is used to estimate the unknown
parameter θ, then the sample mean is called an estimator of θ. Thus an estimator is
a rule or a procedure to estimate the value of θ. The numerical value x̄ is called an
estimate of θ.

3.4 Point estimator

Let X1 , X2 , · · · , Xn be n independent identically distributed (iid) random sample


drawn from a population with probability density function (pdf ) pθ (x), θ ∈ Ω. The
statistic T = t(X) is said to be a point estimator of θ, if the function T = t(X) has a
single point θ̂(X) which maps to θ in the parameter space Ω.

3.5 Problems of point estimation

The problems involved in point estimation are

• to select or choose a statistic T = t(X)

• to find the distribution function of the statistic T = t(X)

• to verify the selected statistic satisfies the criteria of point estimation


A.Santhakumaran 92

3.6 Criteria of point estimation

The Criteria of point estimation are

(i) Consistency (iii) Sufficiency and

(ii) Unbiasedness (iv) Efficiency

3.7 Consistent estimator

Consistency is a convergence property of an estimator. It is an asymptotic or


large sample size property. Let X1 , X2 , · · · , Xn be iid random sample drawn from a
population with common distribution Pθ , θ ∈ Ω. An estimator T = t(X) is consistent
for θ if every  > 0 and for each fixed θ ∈ Ω, Pθ {|T − θ| > } → 0 as n → ∞, i.e.
P
T → θ as n → ∞ for fixed θ ∈ Ω.
Problem 3.1 Let X1 , X2 , · · · , Xn be a random sample drawn from a Normal popula-
tion with mean θ and known variance σ 2 . Show that the sample mean X̄ is consistent
estimator of θ.
2
Solution: The statistic X̄ ∼ N(θ, σn ). To test the consistency of estimator, consider
for every  > 0 and fixed θ ∈ Ω,

Pθ {|X̄ − θ| > } = 1 − Pθ {|X̄ − θ| < }

= 1 − Pθ {− < X̄ − θ < }


√ X̄ − θ √
= 1 − Pθ {− n/σ < √ <  n/σ}
σ/ n
√ √
= 1 − Pθ {− n/σ < Z <  n/σ}
X̄ − θ
where Z = √
σ/ n
= 1 − Pθ {−∞ < Z < ∞} as n → ∞

= 1 − 1 = 0 as n → ∞
P
Thus X̄ → θ as n → ∞. The sample mean X̄ of the normal population is a consistent
estimator of the population mean θ.
Remark 3.1 In general sample mean need not be a consistent estimator of the
population mean.
A.Santhakumaran 93

Problem 3.2 Let X1 , X2 , X3 , · · · , Xn be iid random sample drawn from a Cauchy


population with pdf

 1
 1
−∞ < x < ∞
π 1+(x−θ)2
pθ (x) =
 0

otherwise

Examine whether the sample mean of the Cauchy population is consistent?


Solution: For every  > 0 and fixed θ ∈ Ω,

Pθ {|X̄ − θ| > } = 1 − Pθ {− < X̄ − θ < }

= 1 − Pθ {θ −  < X̄ < θ + }
Z θ+
1 1
= 1− 2
dx̄
θ− π 1 + (x̄ − θ)
since X̄ ∼ Cauchy distribution with parameter θ
Z 
1 1
= 1− 2
dz where x̄ − θ = z
− π 1 + z
1
= 1 − [tan−1 (z)]−
π
2
= 1 − tan−1 () since tan−1 (−θ) = − tan−1 (θ)
π

Thus Pθ {|X̄ − θ| > } 6→ 0 as n → ∞


P
i.e., X̄ 6→ θ as n → ∞ . For Cauchy population the sample mean X̄ is not a consistent
estimator of the parameter θ.

3.7.1 Sufficient condition for consistent estimator

Theorem 3.1 If {Tn }∞


n=1 is a sequence of estimator such that Eθ [Tn ] → θ and Vθ [Tn ] → 0

as n → ∞, then the statistic Tn is a consistent estimator of the parameter θ.

Proof: Consider variance of the statistic Tn , i.e.,

Eθ [Tn − θ]2 = Eθ (Tn − Eθ [Tn ] + Eθ [Tn ] − θ)2

= Eθ (Tn − Eθ [Tn ])2 + {Eθ [Tn − θ]}2

= Vθ [Tn ] + {Eθ [Tn − θ]}2

since Eθ (Tn − Eθ [Tn ]) = 0


A.Santhakumaran 94

By Chebychev’s inequality
1
Pθ {|Tn − θ| > } ≤ Eθ [Tn − θ]2
2
1 h 2
i
≤ V [T
θ n ] + {E [T
θ n − θ]}
2
→ 0 as n → ∞

since Vθ [Tn ] → 0 and Eθ [Tn ] → θ as n → ∞.


.. . Tn is a consistent estimator of θ.
Remark 3.2 The conditions are only sufficient, but not necessary. Since if {Xn }∞
n=1 is

a sequence of iid random variables from a population with finite mean θ = Eθ [X], then
X̄ converges to θ in probability for each fixed θ ∈ Ω. It is known as Khintchin’s Weak
Law of Large Numbers, i.e., sample mean X̄ finitely exists, is a consistent estimator
for the population mean θ which does not require the condition Vθ [X̄] → 0 as n → ∞
for every fixed θ ∈ Ω. Thus consistency follows the existence of expectation statistic
and assumption of finite variance is not needed.
For illustration the Cauchy pdf is

 1
 1
−∞ < x < ∞
π 1+x2
p(x) =
 0

otherwise

The mean E[X] does not exist finitely, i.e.,


Z ∞
1 x
E[X] = dx
−∞ π 1 + x2
is divergent. But the Cauchy Principle value
Z t t
1 x 1 2x
Z
lim dx = lim dx
π t→∞ −t 1 + x2 2π t→∞ −t 1 + x2
1 h it
= lim log(1 + x2 )
2π t→∞ −t
1
= lim [log(1 + t ) − log(1 + t2 )]
2
2π t→∞
= 0

The Cauchy Principle value 0 is taken as the mean of Cauchy distribution. Thus the
Cauchy distribution has not the mean finitely exist. Hence for the Cauchy population,
the sample mean X̄ is not a consistent estimator of the parameter θ.
A.Santhakumaran 95

Problem 3.3 If X1 , X2 , · · · , Xn is a random sample drawn from a normal population


1 Pn
N(0, σ 2 ). Show that 3n
4
k=1 Xk is a consistent estimator of σ 4 .
1 Pn 4
Solution: Let T = 3n k=1 Xk .

n
1 X
Eσ4 [T ] = E 4 [Xk4 ]
3n k=1 σ
n
1 X
= E 4 [Xk − 0]4 since E[Xk ] = 0 ∀ k = 1, 2, · · ·
3n k=1 σ
1 1
= nµ4 = 3nσ 4 = σ 4
3n 3n

since µ4 = 3σ 4 whereµ2n = 1 × 3 × 5 × · · · × (2n − 1)σ 2n n = 1, 2, · · ·


n
1 X
Vσ4 [T ] = V 4 [X 4 ]
(3n)2 k=1 σ
n 
1 X 4 2

4
2 
= Eσ4 [Xk ] − Eσ4 [Xk ]
(3n)2 k=1
1
= n[µ8 − µ24 ]
(3n)2
1
= 2
[105σ 8 − (3σ 4 )2 ] since µ8 = 1 × 3 × 5 × 7 × σ 8
3 n
1
= 96σ 8 → 0 as n → ∞.
32 n

Thus T is a consistent estimator of σ 4 .


Problem 3.4 Let X1 , X2 , · · · Xn be a random sample drawn from a population with
Qn 1
rectangular distribution ∪(0, θ), θ > 0. Show that ( i=1 Xi )
n is a consistent estimator
of θe−1 .
Qn 1
Solution: Let GM = ( i=1 Xi )
n ∀ Xi > 0, i = 1, 2, · · · , n.
n
1X
loge GM = log Xi
n i=1
1
Z θ
Eθ [log X] = log xdx
θ 0
( Z θ )
1 θ
= [x log x]0 − dx
θ 0
1
 
= θ log θ − lim x log x − θ
θ x→0

= log θ − 1
A.Santhakumaran 96

1
log x x
Since lim x log x = lim 1 = lim =0
x→0 x→0 x→0 − 12
x x
1

Eθ [log X]2 = (log x)2 dx
θ 0
Z θ
1 2 θ 1 log x
= [x(log x) ]0 − 2x dx
θ θ 0 x
1 2
= (log θ)2 − lim x(log x)2 − [θ log θ − θ]
θ x→0 θ
= (log θ) − 2 log θ + 2 since lim x(log x)2 = 0
2
x→0

Vθ [log X] = (log θ) − 2 log θ + 2 − (log θ − 1)2 = 1


2
n
1 X 1
Vθ [log GM ] = 2
Vθ [log Xi ] =
n i=1 n
Vθ [log GM ] → 0 as n → ∞, ∀ θ > 0

Thus loge GM is a consistent estimator of log θ − 1, i.e., GM is a consistent estimator


of θe−1 .
Problem 3.5 Let X1 , X2 , · · · , Xn be iid random sample drawn from a population
2 Pn
with Eθ [Xi ] = θ and Vθ [Xi ] = σ 2 , ∀ i = 1, 2, · · · , n. Prove that n(n+1) i=1 iXi is a
consistent estimator of θ.
Solution: Consider
" n #
X
Eθ iXi = Eθ [X1 + 2X2 + · · · + nXn ]
i=1
= θ + 2θ + · · · + nθ

= θ[1 + 2 + · · · + n]
n(n + 1)
= θ
2
n
" #
2 X
Eθ iXi = θ, ∀ θ ∈ Ω
n(n + 1) i=1
" n # n
X X
Vθ iXi = i2 Vθ [Xi ]
i=1 i=1
n
X
= σ2 i2
i=1
n(n + 1)(2n + 1)
= σ2
6
n
" #
2 X 2 (2n + 1) 2
Vθ iXi = σ → 0 as n → ∞
n(n + 1) i=1 3 n(n + 1)
A.Santhakumaran 97

2 Pn
Thus n(n+1) i=1 iXi is a consistent estimator of θ.

Consistent estimator is not unique

As the sample size increases the estimator should get closer to the parameter of
interest. Here closer means convergence. For every  > 0, there exists an N where for
all n > N , |Tn − θ| < . Of course the estimators considered are random, i.e., for
every ω ∈ S ( set of all outcomes ) one has a different estimate. The natural question
is, what does convergence mean for random sequences?.
Problem 3.6 Let T = max1≤i≤n {Xi } be the nth order statistic of a random sample
of size n drawn from a population with a uniform distribution on the interval ( 0, θ).
Show that consistent estimator is not unique.
Solution: The pdf of T is

 ntn−1

n 0 < t < θ, θ > 0
θ
pθ (t) =
 0

otherwise
Z θ
n n
Eθ [T ] = tn dt = θ
θn 0 n+1
nθ2 nθ2
Eθ [T 2 ] = , Vθ [T ] =
(n + 2) (n + 2)(n + 1)2
Thus Eθ [T ] → θ and Vθ [T ] → 0 as n → ∞. T is a consistent estimator of θ. Also
h i
(n+1) θ2
Eθ n T = θ and Vθ [ (n+1)
n T] = n(n+2) → 0 as n → ∞, i.e., (n+1)
n T is also a
(n+1)
consistent estimator of θ. The statistic T and n T are the two consistent estimators
of the same parameter θ. Thus consistent estimator is not unique.

3.7.2 Invariant property of consistent estimator

If T = t(X) is a consistent estimator of θ, then an T, T + cn , and an T + cn are


also consistent estimators of θ, where an = 1 + nk , k ∈ < and an → 1 and cn → 0 as
n → ∞ for every fixed θ ∈ Ω. In general, we have the Theorem 3.2.
Theorem 3.2 If Tn = tn (X) is a consistent estimator of τ (θ) and ψ(τ (θ)) is a
continuous function of τ (θ), then ψ(Tn ) is a consistent estimator of ψ(τ (θ)).
A.Santhakumaran 98

P
Proof: Given Tn = tn (X) is a consistent estimator τ (θ), i.e., Tn → τ (θ) as n → ∞.
Therefore for given  > 0, η > 0 , there exist a positive integer n ≥ N (, η) such that

P {|Tn − τ (θ)| < } > 1 − η ∀ n ≥ N

Also ψ(.) is a continuous function , i. e., For every  such that


{|ψ(Tn ) − ψ(τ (θ))|} < 1 whenever |Tn − τ (θ)| < 
i.e., |Tn − τ (θ)| <  ⇒ |ψ(Tn ) − ψ(τ (θ))| < 1
For any two events A and B if A ⇒ B, then A ⊆ B.
Therefore P (A) ≤ P (B), i.e., P (B) ≥ P (A) . Let A = {|Tn − τ (θ)| < } and B =
{|ψ(Tn ) − ψ(τ (θ))| < 1 } then
P {ψ(Tn ) − ψ(τ (θ))| < 1 } ≥ P {|Tn − τ (θ)| < }
P
i.e., P {|ψ(Tn ) − ψ(τ (θ))| < 1 } ≥ 1 − η ∀ n ≥ N ⇒ ψ(Tn ) → ψ(τ (θ)) as n → ∞.
i.e., ψ(Tn ) is a consistent estimator of ψ(τ (θ))
Problem 3.7 Suppose T = t(X) is a statistic with pdf pθ (x) for θ > 0, θ ∈ Ω. Prove
that T 2 = t2 (X) is a consistent estimator of θ2 , if T = t(X) is a consistent of θ.
Solution: Given T = t(X) is a consistent estimator of θ.
By the definition of consistent estimator, Pθ {|T − θ| < } → 1 as n → ∞, for θ >
0, ∀ θ ∈ Ω, consider

Pθ {|T − θ| < } = Pθ {θ −  < T < θ + }

= Pθ {(θ − )2 < T 2 < (θ + )2 }

= Pθ {−2θ < T 2 − θ2 − 2 < 2θ}

= Pθ {−0 < T 2 − θ2 − 2 < 0 }

where 0 = 2θ

= Pθ {−0 < T 0 − θ2 < 0 }

where T 0 = T 2 − 2

= Pθ {|T 0 − θ2 | < 0 } → 1 as n → ∞

T 0 = T 2 − 2 ⇒ T 2 as n → ∞ since  → 0 as n → ∞
.. . Pθ {|T 2 − θ2 | < 0 } → 1 as → ∞. Thus T 2 is a consistent estimator of θ2 .
A.Santhakumaran 99

3.8 Unbiased estimator

For any statistic g(T ), if the mathematical expectation is equal to a parameter


τ (θ), then g(T ) is called an unbiased estimator of the parameter τ (θ),

i.e., Eθ [g(T )] = τ (θ), ∀ θ ∈ Ω.

Otherwise, the statistic g(T ) is said to be a biased estimator of τ (θ). The unbiased
estimator is also called zero bias estimator. A statistic g(T ) is said to be asymptotically
unbiased estimator if Eθ [g(T )] → τ (θ) as n → ∞, ∀ θ ∈ Ω.
Problem 3.8 A random variable X has the pdf




 2θx if 0 < x < 1

pθ (x) = (1 − θ) if 1 ≤ x < 2, 0 < θ < 1



 0

otherwise
Show that g(X), a measurable function of X is an unbiased estimator of θ if and only
R 1 1 R2
if 0 xg(x)dx = 2 and 1 g(x)dx = 0.
Solution: Assume g(X) is an unbiased estimator of θ, i.e.,

Eθ [g(X)] = θ
Z 1 Z 2
g(x)2θxdx + g(x)(1 − θ)dx = θ
0 1
Z 1 Z 2  Z 2
θ 2xg(x)dx − g(x)dx + g(x)dx = θ
0 1 1
Z 1 Z 2
⇒ 2xg(x)dx − g(x)dx = 1 and
0 1
Z 2
g(x)dx = 0
1
Z 1
1
i.e., xg(x)dx = and
0 2
Z 2
g(x)dx = 0
1
R 1 1
R 2
Conversely, 0
xg(x)dx = 2 and 1
g(x)dx = 0, then g(X) is an unbiased estimator of θ.
Z 1 Z 2
Eθ [g(X)] = 2θxg(x)dx + (1 − θ)g(x)dx
0 1
Z 1 Z 2
= 2θ xg(x)dx + (1 − θ) g(x)dx
0 1
1
= 2θ + (1 − θ) × 0 = θ
2
A.Santhakumaran 100

Thus g(X) is an unbiased estimator of θ.


Problem 3.9 If T denotes the number of successes in n independent and identical
trials of an experiment with probability of success θ. Obtain an unbiased estimator of
θ2 and θ(1 − θ), 0 < θ < 1.
Pn
Solution:Let Xi ∼ b(1, θ), ∀ i = 1, 2, · · · , n, then T = i=1 Xi ∼ b(n, θ). If g(T ) is
the unbiased estimator of τ (θ) = θ(1 − θ), then Eθ [g(T )] = θ(1 − θ)
n
X
i.e., g(t)cnt θt (1 − θ)n−t = θ(1 − θ)
t=0
n  t
X θ
g(t)cnt = θ(1 − θ)1−n
t=0
1−θ
θ ρ
Consider ρ = ⇒θ =
1−θ 1+ρ
n  1−n
X ρ 1
.. . g(t)cnt ρt =
t=0
1+ρ 1+ρ
= ρ(1 + ρ)n−2

= ρ[1 + c1n−2 ρ + cn−3


2 ρ2 + · · · + ρn ]

Equating the coefficient of ρt on both sides

g(t)cnt = cn−2
t−1
(n − 2)! t!(n − t)!
g(t) =
(t − 1)!(n − t − 1)! n!
(n − 2)!t(t − 1)!(n − t)(n − t − 1)!
=
(t − 1)!n(n − 1)(n − 2)!(n − t − 1)!
t(n − t)
= , if n = 2, 3, · · ·
n(n − 1)

Thus the unbiased estimator of θ(1 − θ) is

T (n − T )
n = 2, 3, · · ·
n(n − 1)

Let the unbiased estimator of θ2 be given by

Eθ [g ∗ (T )] = θ2
n t
θ

g ∗ (t)cnt
X
(1 − θ)n = θ2
t=0
1−θ
n
g ∗ (t)cnt ρt = ρ2 (1 + ρ)n−2
X

t=0
= ρ2 [1 + cn−2
1 ρ + · · · + cn−2
t ρt + · · · + ρn−2 ]
A.Santhakumaran 101

.. .g ∗ (t)cnt = ct−2
n−2

(n − 2)!t!(n − t)!
⇒ g ∗ (t) =
(t − 2)!(n − t)!n!
(n − 2)!t(t − 1)!(t − 2)!
=
(t − 2)!n(n − 1)(n − 2)!
t(t − 1)
= n = 2, 3, · · · · · ·
n(n − 1)

Thus the unbiased estimator of θ2 is

T [T − 1]
g ∗ (T ) = n = 2, 3, · · ·
n(n − 1)

Problem 3.10 Obtain an unbiased estimator of 1θ , given a sample observation from


a Geometric population with pmf

 θ(1 − θ)x−1

x = 1, 2, 3, · · · , 0 < θ < 1
pθ (x) =
 0

otherwise

Solution: Consider

1
Eθ [g(X)] =
θ

X 1
g(x)θ(1 − θ)x−1 =
x=1
θ

X (1 − θ)
g(x)(1 − θ)x =
x=1
θ2
Take 1 − θ = ρ ⇒ θ = 1 − ρ

g(x)ρx = ρ(1 − ρ)−2
X

x=1
= ρ(1 + 2ρ + 3ρ2 + · · · + xρx−1 + · · ·)

⇒ g(x) = x ∀ x = 1, 2, 3, · · ·

Thus g(X) = X is the unbiased estimator of 1θ .

Unbiased estimator does not exist

Problem 3.11 Assume X ∼ b(1, θ), 0 < θ < 1. If a single observation x of X from a
Bernoulli population, then shown that there is no unbiased estimator exist for θ2 .
A.Santhakumaran 102

Solution: The pmf of random variable X is



 θ(1 − θ)1−x

x = 0, 1 and 0 < θ < 1
pθ (x) =
 0

otherwise

Let there be an unbiased estimator for θ2 say g(X). That is,

Eθ [g(X)] = θ2
1
X
g(x)θx (1 − θ)1−x = θ2
x=0
g(0)(1 − θ) + g(1)θ = θ2

[g(1) − g(0)]θ + g(0) = θ2 ⇒ g(1) = 0 and g(0) = 0 i.e., g(x) = 0 for x = 0, 1.

Thus the value of θ2 is 0 for x = 0 or x = 1. But the value of θ2 lies between 0 to 1.


.˙. The unbiased estimator of θ2 does not exist.
Problem 3.12 If X ∼ b(n, θ) , then show that there exist no unbiased estimator of
1
the parameter θ

Solution: Let the mathematical expectation of the statistic g(X) be

1
Eθ [g(X)] =
θ
n x
n! θ 1
X 
g(x) (1 − θ)n =
i=0
x!(n − x)! 1−θ θ
n
X n! (1 + ρ)n+1
g(x) ρx =
i=0
x!(n − x)! ρ
θ
where ρ = 1−θ

n! (1+ρ)n+1
ρx → g(0) as θ → 0 and → ∞ as ρ → 0 or θ → 0.
P
g(x) x!(n−x)! ρ

Thus there is no unbiased estimator exist of the parameter 1θ .

Unbiased estimator is unique

Problem 3.13 illustrates the uniqueness of unbiased estimator. The unbiased es-
timator of the parameter θ is negative. For practically it is not possible but it is
constructed for mathematical interest .
A.Santhakumaran 103

Problem 3.13 A random sample X is drawn from a Bernoulli population b(1, θ), θ =
{ 41 , 12 }. Show that there exists an unique unbiased estimator of θ2 .
Solution: Let g(X) be the unbiased estimator of θ2 , i.e.,

Eθ [g(X)] = θ2
1
X
g(x)θx (1 − θ)1−x = θ2
x=0

1 1
When θ = ⇒ 3g(0) + g(1) = (3.1)
4 4
1 1
When θ = ⇒ g(0) + g(1) = (3.2)
2 2
Solving the equations (3.1) and (3.2) for g(0) and g(1), one gets the values of g(0) = − 81
and g(1) = 58 , 
 −1

for x = 0
8
i.e., g(x) =
5


8 for x = 1

Thus the unbiased estimator of θ2 is g(X) = X which is unique.

Unbiased estimator is not unique

Let X1 , X2 , · · · , Xn be a iid random sample drawn from a population with Poisson


1 Pn
distribution P (θ). g1 (X) = X̄ and g2 (X) = n i=1 (Xi − X̄)2 are the two unbiased
estimators of θ. Consider a statistic g(X) = αg1 (X) + (1 − α)g2 (X), α ∈ <, 0 < θ < 1.
Then Eθ [g(X)] = θ ∀ θ ∈ Ω and α ∈ < which is not unique. Thus unbiased estimator
is not unique.
Problem 3.14 Show that the mean X̄ of a random sample of size n drawn from a
population with probability density function

 1 e− xθ

0 < x < ∞, θ > 0
θ
pθ (x) =
 0

otherwise

θ2
is an unbiased estimator of θ and has variance n.
Pn
Solution: Let T = i=1 Xi ∼ G(n, θ). The pdf of T is

1 − θt n−1
θn Γn e t 0 < t < ∞, θ > 0


pθ (t) =
 0

otherwise
A.Santhakumaran 104

Z ∞
1 − 1 t n+1−1
Eθ [T ] = e θ t dt
0 θn Γn
= nθ
" n #
X
Eθ Xi = nθ ∀ θ > 0
i=1
Eθ [nX̄] = nθ ∀ θ > 0

⇒ Eθ [X̄] = θ ∀ θ > 0

Eθ [T 2 ] = n(n + 1)θ2 ∀ θ > 0

Vθ [T ] = nθ2 ∀ θ > 0
 Pn
i=1 Xi

.
. . Vθ [X̄] = Vθ
n
1
= Vθ [T ]
n2
1 2 θ2
= nθ =
n2 n
Problem 3.15 Let X1 , X2 , · · · , Xn be a random sample drawn from a normal popu-
Pn
Xi2
lation with mean zero and variance σ 2 , 0 < σ 2 < ∞. Show that i=1
n is an unbiased
2σ 4
estimator of σ 2 and has variance n .
Pn ns2
Solution: Define ns2 = 2
i=1 Xi , then Y = σ2
∼ χ2 distribution with n degrees of
freedom , i.e.,Y ∼ G( n2 , 21 ).

1 n

 1
n e− 2 y y 2 −1 0 < y < ∞
2 2 Γn
p(y) = 2
 0

otherwise

Z ∞
1 1 n
E[Y ] = 1 e− 2 y y 2 +1−1 dy
n
0 2 Γ2
2

1 Γ( n2 + 1)
= n n =n
2 2 Γ n2 ( 1 ) 2 +1
2
E[Y 2 ] = n + 2n 2

V [Y ] = 2n
ns2
But Y = 2
" σ#
ns2
.. . Eσ2 = n
σ2
⇒ Eσ2 [s2 ] = σ 2
A.Santhakumaran 105

P
Xi2
Thus n is an unbiased estimator of σ 2 .
" #
ns2
Vσ2 = 2n
σ2
n2
V 2 [s2 ] = 2n
σ4 σ
2σ 4
Vσ2 [s2 ] =
n
Problem 3.16 Let Y1 < Y2 < Y3 be the order statistics of a random sample of size
3 drawn from an uniform population with pdf

 1

0<x<θ
θ
pθ (x) =
 0

otherwise

Show that 4Y1 and 2Y2 are unbiased estimators of θ. Also find the variance of these
estimators.
Solution: The pdf of Y1 is
 hR i2
 3! 1 θ 1
dx 0 < y1 < θ

1!2! θ y1 θ
pθ (y1 ) =
 0

otherwise

 3 [1 − y1 ]2

0 < y1 < θ
θ θ
pθ (y1 ) =
 0

otherwise

3
Z θ y1
Eθ [Y1 ] = y1 (1 − )2 dy1
θ 0 θ
Z 1
3 y1
= θt(1 − t)2 θdt where θ =t
θ 0
Z 1
= 3θ t2−1 (1 − t)3−1 dt
0
Γ2Γ3 θ
= 3θ = ∀θ>0
Γ5 4
θ2 3θ2
Similarly Eθ [Y12 ] = and Vθ [Y1 ] =
10 15
3θ 2
.. . Vθ [4Y1 ] =
5
The pdf of Y2 is !
Z y2 Z θ
3! 1 1 1

pθ (y2 ) = dx dx
1!1!1! 0 θ θ y2 θ
A.Santhakumaran 106


 62 y2 [1 − y2 ] 0 < y2 < θ

θ θ
pθ (y2 ) =
 0

otherwise
θ
.˙. Eθ [Y2 ] = 2
3θ2 θ2 θ2
⇒ 2Y2 is an unbiased estimator of θ and Eθ [Y 2 ] = 10 and Vθ [Y2 ] = 20 ⇒ Vθ [2Y2 ] = 5

Problem 3.17 Let Y1 and Y2 be two independent and unbiased estimators of θ.


If the variance of Y1 is twice the variance of Y2 , find the constant k1 and k2 so that
k1 Y1 + k2 Y2 is an unbiased estimator of θ with smaller possible variance for such a
linear combination.
Solution: Given for all θ ∈ Ω
Eθ [Y1 ] = θ, Eθ [Y2 ] = θ, Vθ [Y1 ] = 2σ 2 and Vθ [Y2 ] = σ 2 . Also Eθ [k1 Y1 +k2 Y2 ] = θ ∀ θ

k1 Eθ [Y1 ] + k2 Eθ [Y2 ] = θ

⇒ k1 + k2 = 1

i.e., k2 = 1 − k1

Consider φ = Vθ [k1 Y1 + k2 Y2 ]

= k12 Vθ [Y1 ] + k22 Vθ [Y2 ]

= k12 2σ 2 + (1 − k1 )2 σ 2

= 3k12 σ 2 − 2k1 σ 2 + σ 2

Differentiate twice this with respective to k1



= 6k1 σ 2 − 2σ 2
dk1
d2 φ
= 6σ 2
dk12
dφ d2 φ
For minimum = 0 and >0
dk1 dk12
⇒ 6k1 σ 2 − 2σ 2 = 0
1 2
i.e., k1 = and k2 =
3 3

Thus 13 Y1 + 23 Y2 has minimum variance.


A.Santhakumaran 107

Consistent estimator but not unbiased

The following problem 3.18 explains the consistent estimator which is


not unbiased.
Problem 3.18 Given an example that estimator is consistent but not unbi-
ased.
Solution: Let X1 , X2 , · · · , Xn be a sample of size n drawn from a normal
1 Pn
population with mean θ and variance σ 2 . Define s2 = n i=1 (Xi − X̄)2 , then
ns2
Y = σ2
∼ χ2 distribution with (n − 1) degrees of freedom and Y ∼ G( n−1 1
2 , 2 ).

It has the pdf


 1 n−1

 n−1
1
e− 2 y y 2
−1
0<y<∞
p(y) = 2 2 Γ n−1
2


0 otherwise

Z ∞
1 1 n−1
E[Y r ] = n−1 e− 2 y y 2
+r−1
dy
0 2 2 Γ n−1
2
 
n−1
1 Γ 2 +r
= n−1 n−1
2 Γ n−1
2
2 ( 12 ) 2
+r

2r n−1
 
= n−1 Γ +r
Γ 2 2
When r=1
2 n−1 n−1
E[Y ] = Γ =n−1
Γ n−1
2
2 2
" #
ns2
.. . Eσ2 = n−1
σ2
n−1 2
⇒ Eσ2 [s2 ] σ =
n
2(n − 1) 4
and Vσ2 [s2 ] = σ
n2
Thus Eσ2 [s2 ] → σ 2 and Vσ2 [s2 ] → 0 as n → ∞

1 Pn
.˙. n i=1 (Xi − X̄)2 is a consistent estimator of σ 2 .
1 Pn
But Eσ2 [s2 ] 6= σ 2 . .˙. n i=1 (Xi − X̄)2 is not an unbiased estimator of σ 2 .
Problem 3.19 Illustrate with an example that estimator is both consistent
and unbiased.
A.Santhakumaran 108

Solution: Let X1 , X2 , · · · , Xn be a random sample of size n drawn from a


1 Pn
normal population with mean θ and variance σ 2 . Define s2 = n i=1 (Xi − X̄)
2

1 Pn ns2
and S 2 = n−1
2
i=1 (Xi −X̄) , then Y = σ2
∼ χ2 distribution with (n−1) degrees
2(n−1) 4
of freedom and Y ∼ G( n−1 1 2
2 , 2 ). with Eσ 2 [s ] =
n−1 2
n σ and Vσ2 [s2 ] = n2
σ .

n 2
(n − 1)S 2 = ns2 ⇒ S 2 = s
n−1
n
Eσ2 [S 2 ] = E 2 [s2 ]
n−1 σ
n n−1 2
= σ = σ2
n−1 n
n2
Vσ2 [S 2 ] = E 2 [s2 ]
(n − 1)2 σ
n2 2(n − 1) 4
= σ
(n − 1)2 n2
2σ 4
= → 0 as → ∞
(n − 1)
1 Pn
Thus S 2 = n−1 i=1 (Xi − X̄)
2 is consistent and also unbiased estimator of σ 2 .
Problem 3.20 Give an example that unbiased estimator but not consistent.
Solution: Let X1 , X2 , · · · , Xn be a random sample drawn from a normal pop-
ulation with mean θ and known variance σ 2 , then the estimator X1 ( first
observation) of the sample is unbiased but not consistent. Since Eθ [X1 ] = θ
and Vθ [X1 ] = σ 2 ∀ θ ∈ Ω and

Pθ {|X1 − θ| < } = Pθ {− < X1 − θ < }

= Pθ {θ −  < X1 < θ + }
Z θ+
1 1 2
= √ e− 2σ2 (x1 −θ) dx1
2πσ θ−
6→ 1 as n → ∞

. ˙. X1 is not consistent but unbiased estimator of θ.


Problem 3.21 Give an example that estimator is not consistent and not
unbiased.
Solution: Let Y1 < Y2 < Y3 be the order statistics of a random sample of size
A.Santhakumaran 109

3 drawn from a uniform population with pdf for given θ is



 1

0<x<θ
θ
pθ (x) =
 0

otherwise

then Y1 is not consistent and not unbiased estimator of θ, since

θ
Eθ [Y1 ] = 6 θ ∀ θ ∈ Ω and
=
4 
θ θ θ
  

Pθ Y1 − < 
= Pθ −  < Y1 < + 
4 4 4
θ 2
3 4
Z+ y1

= 1− dy1
θ 4 −
θ θ
6→ 1 as n → ∞

Thus Y1 the first order statistic is not consistent and not unbiased estimator
of θ.

3.9 Sufficient statistic

Sufficient statistic conveys as much as information about the distri-


bution of a random variable which is contained in the sample. It helps
to identify a family of distributions only and not for the parameters of
distributions.
Definition 3.1 Let X1 , X2 , · · · , Xn be a random sample of size n drawn from a
population with pdf pθ (x). Let T = t(X) be a statistic whose pdf is pθ (t). For
a continuous random variable X, T = t(X) is said to be a sufficient statistic
if and only if
pθ (x1 , x2 , · · · , xn )
pθ (t)
is independent of θ for every given T = t . Similarly for a discrete random
variable X, T = t(X) is said to be a sufficient statistic if and only if

Pθ {X1 = x1 , X2 = x2 , · · · | T = t}

is independent of θ for every given T = t.


Problem 3.22 Let X be a single observation from a population with pmf
A.Santhakumaran 110

pθ (x), 0 < θ < 1. 


θ|x| (1−θ)|x|



 2 x = −1, 1

pθ (x) = 1 − θ(1 − θ) x = 0



 0

otherwise
Show that |X| is sufficient.
Solution: Let Y = |X|. Then P {Y = 0} = P {|X| = 0} = P {X = 0} = 1 − θ(1 − θ)
P {Y = 1} = P {|X| = 1} = P {X = 1orX = −1} = P {X = 1} + P {X = −1} =
θ(1 − θ)
P {X = 1 ∩ Y = 1}
ConsiderP {X = 1 | Y = 1} =
P {Y = 1}
P {X = 1 ∩ |X| = 1}
=
P {Y = 1}
P {X = 1}
=
P {Y = 1}
θ(1−θ)
2 1
= = is independent of θ
θ(1 − θ) 2
Therefore Y = |X| is sufficient.
Problem 3.23 Let X1 , X2 , · · · , Xn be independent random sample drawn from
a population with pdf

 eiθ−x

x > iθ, i = 1, 2, 3 · · · , n
pθ (x) =
 0

otherwise
Xi
Show that T = min1≤i≤n i is a sufficient statistic.
xi
Solution: Let y = i , then dx = idy
i[θ− xi ]
Given pθ (x) = e
i.e., pθ (y) = iei[θ−y] , y > θ
Take T = min1≤i≤n Yi . The pdf of T is
∞ n−1
n!
Z
pθ (t) = iei[θ−t] ieiθ−iy dy
1!(n − 1)! t
= ineinθ−int
P θ<t<∞
inθ− xi
pθ (x1 , x2 , · · · , xn ) e
=
pθ (t) ineinθ−int
1 int−P xi
= e
in
A.Santhakumaran 111

Xi
It is independent of θ. Thus T = min1≤i≤n Yi = min1≤i≤n i is sufficient.
Problem 3.24 Let X1 and X2 be iid Poisson random variables with param-
eter θ. Prove that

(i) X1 + X2 is a sufficient statistic.

(ii) X1 + 2X2 is not a sufficient statistic.

Solution: (i) Given that



 e−θ θx1

x1 = 0, 1, 2, · · ·
x1 !
Pθ {X1 = x1 } =
 0

otherwise

 e−θ θx2

x2 = 0, 1, 2, · · ·
x2 !
and Pθ {X2 = x2 } =
 0

otherwise
Let T = X1 + X2 , then

 e−θ θt

t = 0, 1, 2, · · ·
t!
Pθ {T = t} =
 0

otherwise

Pθ {X1 = x1 , X2 = t − x1 }
Consider Pθ {X1 = x1 , X2 = x2 | T = t} =
Pθ {T = t}
Pθ {X1 = x1 }Pθ {X2 = t − x1 }
=
Pθ {T = t}
e−θ θx1 e−θ θt−x2
x1 ! (t−x2 )!
= e−2θ (2θ)t
t!
t!
= is independent of θ.
(t − x1 )!x1 !2t

.˙. X1 + X2 is a sufficient statistic.

Solution : (ii) Consider Pθ {X1 + 2X2 = 2} = Pθ {X1 = 0, X2 = 1}

+ Pθ {X1 = 2, X2 = 0}

= Pθ {X1 = 0}Pθ {X2 = 1}

+ Pθ {X1 = 2}Pθ {X2 = 0}


A.Santhakumaran 112

θ2 −2θ
= θe−2θ + e
2
θ
= θe−2θ [1 + ]
2
Pθ {X1 = 0, X2 = 1}
Therefore Pθ {X1 = 0, X2 = 1 | X1 + 2X2 = 2} =
Pθ {X1 + 2X2 = 2}
e−2θ θ
=
θe−2θ [1 + 2θ ]
2
= depends on θ.
2+θ

.˙. X1 + 2X2 is not a sufficient statistic.


Problem 3.25 Let X1 and X2 be two independent Bernoulli random vari-
ables such that Pθ {X1 = 1} = 1 − Pθ {X1 = 0} = θ, 0 < θ < 1 and
1
Pθ {X2 = 1} = 1 − Pθ {X2 = 0} = 2θ, 0 < θ ≤ 2. Show that X1 + X2 is not
a sufficient statistic.

Let T = X1 + X2 . Consider

Pθ {T = 1} = Pθ {X1 + X2 = 1}

= Pθ {X1 = 0, X2 = 1} + Pθ {X1 = 1, X2 = 0}

= (1 − θ)2θ + θ(1 − 2θ)

= θ(3 − 4θ)
Pθ {X1 = 0 ∩ X1 + X2 = 1}
.˙.Pθ {X1 = 0 | X1 + X2 = 1} =
Pθ {X1 + X2 = 1}
Pθ {X1 = 0, X2 = 1}
=
Pθ {X1 + X2 = 1}
(1 − θ)2θ
=
θ(3 − 4θ)
2(1 − θ)
= is dependent on θ.
(3 − 4θ)

. ˙. X1 + X2 is not a sufficient statistic.


Problem 3.26 If X1 and X2 denote a random sample drawn from a normal
population N(θ, 1), −∞ < θ < ∞. Show that T = X1 + X2 is a sufficient
statistic.
A.Santhakumaran 113

Solution: The joint pdf of X1 and X2 is

pθ (x1 , x2 ) = pθ (x1 )pθ (x2 )


1 − 1 (x1 −θ)2 − 1 (x2 −θ)2
= e 2 2

Let T = X1 + X2 ∼ N (2θ, 2)

 √ 1 √ e− 14 (t−2θ)2

−∞ < t < ∞
p(t)θ = 2π 2
 0

otherwise

The definition of sufficient statistic gives


1 − 12 [x21 +x22 −2(x1 +x2 )θ+2θ2 ]
pθ (x1 , x2 ) 2π e
= 1 2 2
pθ (t) 1

2 π
e− 4 [t −4tθ+4θ ]
1 2 2 2
1 e− 2 (x1 +x2 )+(x1 +x2 )θ−θ
= √
π e− 14 (x1 +x2 )2 +(x1 +x2 )θ−θ2
1 1 2 2 1 2
= √ e− 2 (x1 +x2 )+ 4 (x1 +x2 ) is independent of θ.
π

. ˙. T = X1 + X2 is a sufficient statistic.

Problem 3.27 Let X1 , X2 , X3 be a sample from B(1, θ). Show that X1 X2 + X3


is not sufficient.
Solution: Let Y = X1 X2 and T = X1 X2 + X3 , then

P {Y = 0} = P {X1 = 0X2 = 0} + P {X1 = 1, X2 = 0} + P {X1 = 0, X2 = 1}

= (1 − θ)2 + θ(1 − θ) + θ(1 − θ)

= 1 − θ2

P {Y = 1} = P {X1 = 1, X2 = 1}

= θ2

P {Y + X3 = 1} = P {Y = 0, X3 = 1} + P {Y = 1, X3 = 0}

= (1 − θ2 )θ + θ2 (1 − θ)

i.e., P {T = 1} = θ(1 − θ)(1 + 2θ)


A.Santhakumaran 114

Consider

P {Y = 1, T = 1}
P {Y = 1 | T = 1} =
P {T = 1}
P {Y = 1}P {X3 = 0}
=
P {T = 1}
θ2 θ
=
θ(1 − θ)(1 + 2θ)
θ2
=
(1 − θ)(1 + 2θ)

P {Y = 1 | T = 1} depends on the parameter θ. Thus X1 X2 + X3 is not suffi-


cient

Remark 3.3 The definition of sufficient statistic is not always useful to find
a sufficient statistic, since

(i) it does not reveal which statistic is to be sufficient.

(ii) even if it is known in some cases, it is tedious to find the pdf of statistic.

(iii) it requires to derive a conditional density, which may not be easy,


namely for continuous random variables.

To avoid the above difficulties one may use the Neyman Factorization The-
orem.

3.10 Neyman criteria of sufficient statistic

Theorem 3.3 Let X1 , X2 , · · · , Xn be discrete random variables with pmf


pθ (x1 , x2 , · · · , xn ), θ ∈ Ω. Then T = t(X) is sufficient statistic if and only if

pθ (x1 , x2 , · · · , xn ) = pθ (t)h(x1 , x2 , · · · , xn )

where h(x1 , x2 , · · · , xn ) is a non-negative function of x1 , x2 , · · · , xn only and does not


depend on θ and pθ (t) is a non-negative function of θ and T = t only.
A.Santhakumaran 115

Proof: Assume that T = t(X) is a sufficient statistic. Then by definition of


sufficient statistic that

pθ (x1 , x2 , · · · , xn ) = pθ (t)h(x1 , x2 , · · · , xn ).

At any given sample point X1 = x1 , X2 = x2 , · · · , Xn = xn , let t(x) = t; then


adding the consistent restriction T = t does not alter the event X1 = x1 , X2 =
x2 , · · · , Xn = xn :

Pθ {X1 = x1 , · · · , Xn = xn } = Pθ {X1 = x1 , X2 = x2 , · · · , Xn = xn , T = t}

= Pθ {T = t}P {X1 = x1 , · · · , Xn = xn | T = t}

provided that P {X1 = x1 , · · · , Xn = xn | T = t} is well defined.


Choose Pθ {X1 = x1 , X2 = x2 , · · · , Xn = xn } > 0 for some θ.
Define h(x1 , x1 , · · · , xn ) = P {X1 = x1 , · · · , Xn = xn | T = t} and pθ (t) = Pθ {T = t},
then Pθ {X1 = x1 , X2 = x2 , · · · , Xn = xn } = pθ (t)h(x1 , x2 , · · · , xn ).
Conversely, Pθ {X1 = x1 , X2 = x2 , · · · , Xn = xn } = pθ (t)h(x1 , x2 , · · · , xn ) holds,
then prove that T = t(X) is a sufficient statistic. The marginal pmf of
T = t(X) is
X
Pθ {T = t} = Pθ {X1 = x1 , · · · , Xn = xn , t(X) = t}
t(x)=t
X
= Pθ {X1 = x2 , · · · , Xn = xn }
t(x)=t
X
= pθ (t)h(x1 , x2 , · · · , xn )
t(x)=t

Assume Pθ {T = t} > 0 for some θ > 0, then

Pθ {X1 = x1 , · · · , Xn = xn , T = t}
Pθ {X1 = x1 , · · · , Xn = xn | T = t} =
Pθ {T = t}

 0 if T 6= t

=
 Pθ {X1 =x1 ,···,Xn =xn }

if T = t
Pθ {T =t}
If T = t, then
Pθ {X1 = x1 , · · · , Xn = xn } pθ (t)h(x1 , x2 , · · · , xn )
=
Pθ {T = t} pθ (t) t(x)=t h(x1 , x2 , · · · , xn )
P
A.Santhakumaran 116

h(x1 , x2 , · · · , xn )
=
t(x)=t h(x1 , x2 , · · · , xn )
P

is independent of θ.

Thus T = t(X) is a sufficient statistic.

Invariant property of sufficient statistic

Theorem 3.4 If T = t(X) is a sufficient statistic, then any one to one function of
the sufficient statistic is also a sufficient statistic.
Proof: Let T = t(X) be a sufficient statistic, then by the Neyman Factor-
ization Theorem pθ (x1 , x2 , · · · , xn ) = pθ (t)h(x1 , x2 , · · · , xn ). Let U be any one to
one function of T = t(X), i.e., u = α(t). Since u = α(t) ⇒ t = α−1 (u)
dα−1 (u) 0
dt
= α−1 (u) .

.˙. du = du

h(x1 , x2 , · · · , xn )
pθ (x1 , x2 , · · · , xn ) = pθ (α−1 (u))[α−1 (u)]0
[α−1 (u)]0
= pθ (u)h1 (x1 , x2 , · · · , xn )

where pθ (u) = pθ (α−1 (u))[α−1 (u)]0


h(x1 , x2 , · · · , xn )
h1 (x1 , x2 , · · · , xn ) =
[α−1 (u)]0

is a function of x1 , x2 , · · · , xn for given U = u which is independent of θ. Thus


any one to one function of T = t(X) is also a sufficient statistic.
Remark 3.4

(i) Sufficient statistic is not unique. If it is unique, there is no one to one


function exist.

(ii) Every function of a sufficient statistic is itself a sufficient statistic.

Theorem 3.5 Let T (X) be a statistic such that for some θ1 , θ2 ∈ Ω and the
distributions of X, Y have the support of pθ (.), θ ∈ Ω. The statistic T (X) is
not sufficient for θ if

(i) T (X) = T (Y ) and

(ii) pθ1 (x)pθ2 (y) 6= pθ2 (x)pθ1 (y)


A.Santhakumaran 117

Proof: Define the support of pθ1 (.) by I(θ1 ) and of pθ2 (.) by I(θ2 ) so that
I(θ1 ) = {x : pθ1 (x) > 0} and I(θ2 ) = {x : pθ2 (x) > 0}. Consider the condition
(ii) pθ1 (x)pθ2 (y) 6= pθ2 (x)pθ1 (y) when either x or y is in I(θ1 ) and not in I(θ2 ),
one of its side is zero and the other side is non zero. Further , if T were
sufficient then T (X) = T (Y ) implies both x and y are in I(θ1 ) and I(θ2 ) . This
is not possible. .. . T is not sufficient for θ. Further suppose T is sufficient ,
for single observation X on x by Neyman Factorization Theorem

pθ1 (x) pθ (t)g(x) pθ (t)


= 1 = 1 (3.3)
pθ2 (x) pθ2 (t)g(x) pθ2 (t)

Again for single observation Y on y

pθ1 (y) pθ (t)g(y) pθ (t)


= 1 = 1 (3.4)
pθ2 (y) pθ2 (t)g(y) pθ2 (t)

Using equations (3.3) and ( 3.4) ⇒ pθ1 (x)pθ2 (y) = pθ2 (x)pθ1 (y) if T is sufficient.
By condition (ii) pθ1 (x)pθ2 (y) 6= pθ2 (x)pθ1 (y) and (i) T (X) = T (Y ) show that T
is not sufficient.
Problem 3.28 Let X1 , X2 , · · · , Xn be a random sample drawn from a popu-
lation with pmf 
 θ x (1 − θ)1−x

x = 0, 1
pθ (x) =
 0

otherwise
Find the sufficient statistic.

Consider pθ (x1 , x2 , · · · , xn ) = θt (1 − θ)n−t

where t = ni=1 xi
P
t
θ

= (1 − θ)n
1−θ
= pθ (t)h(x1 , x2 , · · · , xn )
t
θ

where pθ (t) = (1 − θ)n and h(x1 , x2 , · · · , xn ) = 1
1−θ
Pn
.˙. T = i=1 Xi is a sufficient statistic.
Remark 3.5 If the range of distribution depends on the parameter, Ney-
man Factorization Theorem is not convenient to find the sufficient statistic.
A.Santhakumaran 118

For such cases of the distributions definition of sufficient statistic is useful


to find the sufficient statistic.
Problem 3.28 Let X1 , X2 , · · · , Xn be a random sample drawn from a popu-
lation with pdf 
 e−(x−θ)

θ<x<∞
pθ (x) =
 0

otherwise
Obtain a sufficient statistic.
Solution: Consider Y1 ≤ Y2 ≤ · · · ≤ Yn is the order statistic of X1 , X2 , · · · , Xn .
The pdf of the statistic Y1 is
Z ∞ n−1
n!
pθ (y1 ) = e−(y1 −θ) e −(x−θ)
dx
1!(n − 1)! y1

 ne−n(y1 −θ)

θ < y1 < ∞
pθ (y1 ) =
 0

otherwise
The definition of sufficient statistic gives

pθ (x1 , x2 , · · · , xn ) e−(x1 −θ) · · · e−(xn −θ)


=
pθ (y1 ) ne−n(y1 −θ)
e −t+nθ Pn
= −ny +nθ
where t = i=1 xi
ne 1
e−t
= is independent of θ.
ne−ny1

.˙. Y1 = min1≤i≤n {Xi } is sufficient.

Again consider pθ (x1 , x2 , · · · , xn ) = e−(x1 +x2 +···+xn )+nθ

= e−yn +nθ−t+yn
n
X
where t = xi and Yn = max {Xi }
1≤i≤n
i=1
pθ (x1 , x2 , · · · , xn ) = e−yn +nθ e−t+yn

= pθ (yn )h(x1 , x2 , · · · , xn )

By Neyman Factorization Theorem, Yn = max1≤i≤n {Xi } is a sufficient statis-


tic. But if max1≤i≤n {Xi } = Y1 , then the range of the distribution θ < y1 < ∞
A.Santhakumaran 119

depends on θ. Again if max1≤i≤n {Xi } = Y2 , then the range of the dis-


tribution θ < y2 < ∞ depends on θ and so on. Thus for each fixed
Y1 = y1 , Y2 = y2 , · · · Yn = yn , h(x1 , x2 , · · · , xn ) depends on θ. h(x1 , x2 , · · · , xn )
depends on θ is a contradiction to Neyman Factorization Theorem. Hence
the Neyman Factorization Theorem is not convenient when the range of
distribution depends on the parameter θ.
Problem 3.29 Show that the set of order statistic based on a random sample
drawn from a continuous population with pdf p(x | θ) is a sufficient statistic.
The order statistic Y1 ≤ Y2 ≤ · · · ≤ Yn are jointly sufficient statistic
to identifying the distribution. If order statistic is given by Y1 = y1 , Y2 =
y2 , · · · , Yn = yn , then X1 , X2 , · · · , Xn are taking the values equally. So the
probability of the random sample equals for a particular permutations of
1
these given values of the order statistic is n! which is independent of the
parameter θ. . ˙. The set of order statistic is a sufficient statistic.
Problem 3.30 Let X1 , X2 , · · · , Xn be a random sample of size n drawn from
a population with pdf

 1
 1
−∞ < x < ∞
π 1+(x−θ)2
pθ (x) =
 0

otherwise

Can the joint pdf of X1 , X2 , · · · , Xn be written in the form given in Ney-


man Factorization Theorem ? Does Cauchy distribution have a sufficient
statistic ?
The joint pdf of X1 , X2 , · · · , Xn is
n 
1 1
Y 
pθ (x1 , x2 , · · · , xn ) =
i=1
π 1 + (x − θ)2

It cannot be written in the form of Neyman Factorization Theorem, hence


it does not have a single sufficient statistic.
A.Santhakumaran 120

3.11 Exponential family of distributions

Definition 3.2 A family {pθ (x), θ ∈ Ω} of probability functions of the form



 c(θ)eQ(θ)t(x) h(x)

a<x<b
pθ (x) =
 0

otherwise

is said to be a regular exponential family of the probability functions if

• the range a < x < b of the distribution is independent of the parameter


θ.

• Q(θ) is a non - trivial continuous function of θ.

• t(x) is a non-trivial function of x.

• h(x) is a continuous function of x in a < x < b.

If θ is a single value of the parameter space, then it is a single parameter


exponential family.
Definition 3.3 A pdf pθ (x) with single parameter θ is expressed as a single
parameter exponential family

 c(θ)eQ(θ)t(x) h(x)

a<x<b
pθ (x) =
 0

otherwise

then T = t(X) is called sufficient statistic.


Remark 3.6 The simplicity of the definition is to determine the sufficient
statistic by inspection.
Problem 3.31 Let Xi ’ s be independent and having N(iθ, 1), i = 1 to n,
where θ is unknown. Find a sufficient statistic for N( iθ, 1).

pθ(x_i) = (1/√(2π)) e^{−(x_i − iθ)²/2},  −∞ < x_i < ∞, and 0 otherwise.

Consider pθ(x_1, x_2, ⋯, x_n) = ∏_{i=1}^n pθ(x_i)
= (1/√(2π))^n e^{−(1/2)Σ_{i=1}^n (x_i − iθ)²}
= (1/√(2π))^n e^{−(1/2)Σ x_i² + θ Σ i x_i − (θ²/2)Σ i²}
= (1/√(2π))^n e^{−(1/2)Σ x_i² + θ Σ i x_i − n(n+1)(2n+1)θ²/12}
= c(θ) e^{Q(θ)t(x)} h(x),

where t(x) = Σ_{i=1}^n i x_i, h(x) = e^{−(1/2)Σ x_i²}, Q(θ) = θ,
and c(θ) = (1/√(2π))^n e^{−n(n+1)(2n+1)θ²/12}.

Thus T = Σ_{i=1}^n iX_i is a sufficient statistic.
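As a quick illustration (a sketch added here, not in the original text, assuming NumPy and illustrative parameter values), one can generate X_i ∼ N(iθ, 1) and verify that the sufficient statistic T = Σ iX_i has mean θ Σ i² = θ n(n+1)(2n+1)/6.

```python
import numpy as np

# Sketch: X_i ~ N(i*theta, 1), i = 1..n; the sufficient statistic is T = sum(i * X_i),
# whose expectation is theta * sum(i^2) = theta * n(n+1)(2n+1)/6.
rng = np.random.default_rng(1)
theta, n, reps = 0.7, 6, 100_000

i = np.arange(1, n + 1)
x = rng.normal(loc=i * theta, scale=1.0, size=(reps, n))
t = x @ i                                   # T = sum_i i * X_i for each replication

print("E[T] simulated:", t.mean(), " theory:", theta * n * (n + 1) * (2 * n + 1) / 6)
```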
Problem 3.32 Given n independent observations on a random variable X
with probability density function

pθ(x) = (1/(2θ)) e^{−x/θ}  if x > 0, θ > 0;
        (θ/2) e^{θx}       if x ≤ 0;
        and 0 otherwise.

Obtain a sufficient statistic.

Consider

pθ(x_1, x_2, ⋯, x_n) = (1/(2θ))^n e^{−t(x)/θ}  if x > 0;
                       (θ/2)^n e^{θ t(x)}      if x ≤ 0,   where t(x) = Σ_{i=1}^n x_i

i.e., pθ(x) = c_1(θ) e^{Q_1(θ)t(x)} h(x)  if x > 0, θ > 0;
              c_2(θ) e^{Q_2(θ)t(x)} h(x)  if x ≤ 0,

where c_1(θ) = (1/(2θ))^n, Q_1(θ) = −1/θ, c_2(θ) = (θ/2)^n, Q_2(θ) = θ and h(x) = 1. ∴ T = Σ_{i=1}^n X_i is a sufficient statistic.
Problem 3.33 If X has a single observation from N (0, σ 2 ), then show that
|X| is a sufficient statistic.

Given pσ(x) = (1/(√(2π)σ)) e^{−x²/(2σ²)},  −∞ < x < ∞, and 0 otherwise.

The pdf is expressed as

pσ(x) = c(σ) e^{Q(σ)t(x)} h(x),

where c(σ) = 1/(√(2π)σ), Q(σ) = −1/(2σ²), t(x) = x², h(x) = 1.
It is a one-parameter exponential family. Thus T = X² is sufficient, which is equivalent to T = |X| being sufficient.
Problem 3.34 Let X1 , X2 , · · · , Xn be a random sample from N (θ, θ), θ > 0.
Find the sufficient statistic for the random sample.

Given pθ(x) = (1/√(2πθ)) e^{−(x−θ)²/(2θ)},  −∞ < x < ∞, and 0 otherwise.
The joint pdf of X_1, X_2, ⋯, X_n is

pθ(x_1, x_2, ⋯, x_n) = (1/√(2πθ))^n e^{−(1/(2θ))Σ_{i=1}^n (x_i − θ)²}
= (1/√(2πθ))^n e^{−(1/(2θ))Σ x_i² + Σ x_i − nθ/2}
= c(θ) e^{Q(θ)t(x)} h(x),

where c(θ) = (1/√(2πθ))^n e^{−nθ/2}, h(x) = e^{Σ x_i}, t(x) = Σ x_i², Q(θ) = −1/(2θ). It is a one-parameter exponential family. Thus T = Σ X_i² is a sufficient statistic.

3.12 Distribution admitting sufficient statistic

Let X be a random sample drawn from a population with distribution


Pθ , θ ∈ Ω, whose pdf is given by pθ (x). Assume θ is a single value of the
parameter space Ω and the range of the distribution is independent of
the parameter θ. Let T = t(X) be a sufficient statistic. Using Neyman
Factorization Theorem,
pθ (x) = pθ (t)h(x)

log pθ (x) = log pθ (t) + log h(x)

Assume that the function pθ (t) is partially differentiable with respect to θ,


then
∂ log pθ(x)/∂θ = ∂ log pθ(t)/∂θ = Q_θ(t)            (3.5)

Since the equation holds for all values of θ, it is also true for θ = 0. So one can obtain the relation t(x) = k(t), where

∂ log pθ(x)/∂θ |_{θ=0} = t(x)  and  Q_0(t) = k(t).

Suppose k(t) and t(x) are differentiable with respect to x; then

∂t(x)/∂x = [∂k(t)/∂t] (∂t/∂x).

Again differentiating equation (3.5) with respect to x,

∂² log pθ(x)/∂x∂θ = [∂Q_θ(t)/∂t] (∂t/∂x)

so

[∂² log pθ(x)/∂x∂θ] / [∂t(x)/∂x] = ∂Q_θ(t)/∂k(t)      (3.6)

The left-hand side of equation (3.6) is the same for all x, so it must depend on θ alone; write ∂Q_θ(t)/∂k(t) = λ(θ), i.e.,

∂Q_θ(t) = λ(θ) ∂k(t).

Integrating with respect to t,

Q_θ(t) = λ(θ) k(t) + c_1(θ).

Again integrating with respect to θ,

∫ Q_θ(t) dθ = k(t) ∫ λ(θ) dθ + ∫ c_1(θ) dθ + c(x)
∫ [∂ log pθ(x)/∂θ] dθ = t(x) ∫ λ(θ) dθ + B(θ) + c(x),

since k(t) = t(x) for θ = 0 and B(θ) = ∫ c_1(θ) dθ.

log pθ(x) = Q(θ) t(x) + B(θ) + c(x),  where Q(θ) = ∫ λ(θ) dθ

e^{log pθ(x)} = e^{Q(θ)t(x)+B(θ)+c(x)}

pθ(x) = e^{Q(θ)t(x)} e^{B(θ)} e^{c(x)} = c(θ) e^{Q(θ)t(x)} h(x),

where c(θ) = e^{B(θ)} and h(x) = e^{c(x)}.

This is a one-parameter exponential family.


Remark 3.7 The Neyman Factorization Theorem and the Exponential fam-
ily of distributions form are the two equivalent methods of identifying the
sufficient Statistic.

3.13 Joint sufficient statistics

Definition 3.4 Let X1 , X2 , · · · , Xn be a random sample of size n drawn from


a population with pdf pθ1 ,θ2 (x), θ1 , θ2 ∈ Ω. Let T1 = t1 (X), T2 = t2 (X) be
two statistics whose joint pdf is pθ1 ,θ2 (t1 t2 ). The statistics T1 = t1 (X) and
T2 = t2 (X) are called jointly sufficient statistics iff
pθ1 ,θ2 (x1 , x2 , · · · , xn )
pθ1 ,θ2 (t1 , t2 )
is independent of the parameters θ1 and θ2 for fixed T1 = t1 and T2 = t2 .
Problem 3.35 Let X1 , X2 , · · · , Xn be a random sample drawn from a popu-
lation with density function

p_{θ1,θ2}(x) = 1/(2θ_2),  θ_1 − θ_2 < x < θ_1 + θ_2, where −∞ < θ_1 < ∞, 0 < θ_2 < ∞, and 0 otherwise.

Find a sufficient statistic.


Consider Y1 ≤ Y2 ≤ · · · ≤ Yn be the order statistic of X1 , X2 , · · · , Xn . The joint
pdf of (Y1 , Yn ) is
p_{θ1,θ2}(y_1, y_n) = [n!/(1!(n−2)!1!)] (1/(2θ_2)) [∫_{y_1}^{y_n} (1/(2θ_2)) dx]^{n−2} (1/(2θ_2))

                    = [n(n−1)/(2θ_2)^n] (y_n − y_1)^{n−2},  θ_1 − θ_2 < y_1 < y_n < θ_1 + θ_2, and 0 otherwise.

p_{θ1,θ2}(x_1, x_2, ⋯, x_n)/p_{θ1,θ2}(y_1, y_n) = (1/(2θ_2))^n / {[n(n−1)/(2θ_2)^n] (y_n − y_1)^{n−2}}
                                                = 1/[n(n−1)(y_n − y_1)^{n−2}],

which is independent of the parameters θ_1 and θ_2. ∴ (Y_1, Y_n) is a jointly sufficient statistic.
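The joint sufficiency of (Y_1, Y_n) can also be illustrated numerically: two samples with the same minimum and maximum have identical U(θ_1 − θ_2, θ_1 + θ_2) likelihoods for every admissible (θ_1, θ_2). The sketch below is an addition (assuming NumPy and illustrative sample values).

```python
import numpy as np

def uniform_log_lik(x, a, b):
    """Log-likelihood of an iid U(a, b) sample; -inf if any point falls outside (a, b)."""
    x = np.asarray(x)
    if x.min() <= a or x.max() >= b:
        return -np.inf
    return -len(x) * np.log(b - a)

# Two different samples sharing the same minimum 1.0 and maximum 4.0.
x = [1.0, 1.5, 3.9, 4.0]
y = [1.0, 2.2, 2.9, 4.0]

for theta1, theta2 in [(2.5, 2.0), (3.0, 3.5), (2.0, 1.8)]:
    a, b = theta1 - theta2, theta1 + theta2
    print((theta1, theta2), uniform_log_lik(x, a, b) == uniform_log_lik(y, a, b))
```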

Problem 3.34 Let X1 , X2 , · · · , Xn be a random sample from the pdf



p_{θ,σ}(x) = (1/σ) e^{−(x−θ)/σ},  θ < x < ∞, −∞ < θ < ∞, 0 < σ < ∞, and 0 otherwise.

Find a two dimensional sufficient statistics.


Consider a transformation

Y_1 = nX_(1)
Y_2 = (n−1)[X_(2) − X_(1)]
Y_3 = (n−2)[X_(3) − X_(2)]
⋯ ⋯⋯
Y_{n−1} = 2[X_(n−1) − X_(n−2)]
Y_n = X_(n) − X_(n−1),   so that Σ_{i=1}^n Y_i = Σ_{i=1}^n X_(i).

The Jacobian of the transformation is |J| = 1/n!. The joint pdf of X_(1), X_(2), ⋯, X_(n) is given by p(x_(1), x_(2), ⋯, x_(n)) = n! ∏_{i=1}^n p(x_(i)). The joint pdf of Y_1, Y_2, ⋯, Y_n is therefore

p_{θ,σ}(y_1, y_2, ⋯, y_n) = n! ∏_{i=1}^n p_{θ,σ}(x_(i)) × |J|
                         = (1/σ^n) e^{−(Σ y_i − nθ)/σ},  nθ < y_1 < ∞, 0 ≤ y_i < ∞ for i = 2, ⋯, n.

Consider a further transformation

U_1 = Y_2
U_2 = Y_2 + Y_3
U_3 = Y_2 + Y_3 + Y_4
⋯ ⋯⋯
U_{n−2} = Y_2 + Y_3 + ⋯ + Y_{n−1}
T = Y_2 + Y_3 + ⋯ + Y_n

i.e., Y_2 = U_1, Y_3 = U_2 − U_1, Y_4 = U_3 − U_2, ⋯, Y_{n−1} = U_{n−2} − U_{n−3}, Y_n = T − U_{n−2}.

Now Y_2 + Y_3 + ⋯ + Y_n = T and the Jacobian of the transformation is |J| = 1. The joint pdf of Y_1, U_1, ⋯, U_{n−2}, T is

p_{θ,σ}(y_1, u_1, ⋯, u_{n−2}, t) = (1/σ) e^{−(y_1 − nθ)/σ} (1/σ^{n−1}) e^{−t/σ},  nθ < y_1 < ∞, 0 ≤ u_1 ≤ u_2 ≤ ⋯ ≤ u_{n−2} ≤ t < ∞.
The marginal density of (Y_1, T) is obtained by integrating out u_1, ⋯, u_{n−2}:

p_{θ,σ}(y_1, t) = (1/σ) e^{−(y_1 − nθ)/σ} (1/σ^{n−1}) e^{−t/σ} ∫_0^t ∫_0^{u_{n−2}} ⋯ ∫_0^{u_2} du_1 du_2 ⋯ du_{n−2}
               = (1/σ) e^{−(y_1 − nθ)/σ} (1/σ^{n−1}) e^{−t/σ} (1/(n−3)!) ∫_0^t u_{n−2}^{n−3} du_{n−2}
               = (1/σ) e^{−(y_1 − nθ)/σ} (1/σ^{n−1}) e^{−t/σ} t^{n−2}/(n−2)!.

The first order statistic X_(1) has pdf (n/σ) e^{−(n/σ)(x − θ)}, θ < x < ∞, so Y_1 = nX_(1) has pdf

p_{θ,σ}(y_1) = (1/σ) e^{−(y_1 − nθ)/σ},  nθ < y_1 < ∞.

Thus Y_1 − nθ ∼ Exponential(σ) and T = Σ_{i=2}^n Y_i ∼ G(σ, n − 1).

p_{θ,σ}(y_1, t) = (1/σ) e^{−(y_1 − nθ)/σ} [t^{n−2}/((n−2)! σ^{n−1})] e^{−t/σ},  nθ < y_1 < ∞, 0 < t < ∞, and 0 otherwise.

The conditional joint density of U_1, U_2, ⋯, U_{n−2} given (Y_1, T) is

p(u_1, u_2, ⋯, u_{n−2} | y_1, t) = (n−2)!/t^{n−2},  0 < u_1 < u_2 < ⋯ < u_{n−2} < t,

which is free of (θ, σ). Thus (Y_1, T) is a jointly sufficient statistic, i.e., (X_(1), Σ_{i=1}^n [X_(i) − X_(1)]) is a jointly sufficient statistic.
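The distributional claims above can be sanity-checked by simulation. The sketch below is an addition (assuming NumPy and illustrative parameter values); it verifies that nX_(1) − nθ behaves like an Exponential(σ) variable and that T = Σ(X_(i) − X_(1)) has mean (n − 1)σ, as a G(σ, n − 1) variable should.

```python
import numpy as np

# X_i = theta + sigma * Exp(1); check Y1 = n*X_(1) and T = sum(X_(i) - X_(1)).
rng = np.random.default_rng(2)
theta, sigma, n, reps = 1.5, 2.0, 8, 200_000

x = theta + sigma * rng.exponential(size=(reps, n))
x1 = x.min(axis=1)
y1 = n * x1
t = (x - x1[:, None]).sum(axis=1)

print("E[Y1 - n*theta]:", (y1 - n * theta).mean(), " theory:", sigma)       # Exponential(sigma)
print("E[T]           :", t.mean(), " theory:", (n - 1) * sigma)            # Gamma(sigma, n-1)
```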
Definition 3.5 Let θ = (θ1 , θ2 , · · · , θk ) is a vector of parameters and T =
(T1 , T2 , · · · , Tk ) is a random vector . The vector T is jointly sufficient statistics
if pθ (x) is expressed of the form
pθ(x) = c(θ) e^{Σ_{j=1}^k Q_j(θ) t_j(x)} h(x),  a < x < b, and 0 otherwise.

Problem 3.36 Let X_1, X_2, ⋯, X_n be a random sample drawn from a population N(θ, σ²). Show that T = (Σ_{i=1}^n X_i, Σ_{i=1}^n X_i²) is a jointly sufficient statistic.

p_{θ,σ²}(x_1, x_2, ⋯, x_n) = (1/(√(2π)σ))^n e^{−(1/(2σ²))Σ_{i=1}^n (x_i − θ)²}
= (1/(√(2π)σ))^n e^{−(1/(2σ²))[Σ x_i² − 2θΣ x_i + nθ²]}
= (1/(√(2π)σ))^n e^{−nθ²/(2σ²)} e^{−(1/(2σ²))Σ x_i² + (θ/σ²)Σ x_i}
= c(θ, σ²) e^{Q_1(θ,σ²)t_1(x)+Q_2(θ,σ²)t_2(x)} h(x),

where c(θ, σ²) = (1/(√(2π)σ))^n e^{−nθ²/(2σ²)},
Q_1(θ, σ²) = θ/σ², Q_2(θ, σ²) = −1/(2σ²),
h(x) = 1, t_1(x) = Σ_{i=1}^n x_i, t_2(x) = Σ_{i=1}^n x_i².

∴ (Σ_{i=1}^n X_i, Σ_{i=1}^n X_i²) is a jointly sufficient statistic.
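To see concretely that (ΣX_i, ΣX_i²) carries all the information about (θ, σ²), take two different samples with the same sum and the same sum of squares; their N(θ, σ²) likelihoods coincide for every (θ, σ). The following sketch is an addition (assuming NumPy and illustrative sample values).

```python
import numpy as np

def normal_log_lik(x, theta, sigma):
    x = np.asarray(x)
    return -0.5 * len(x) * np.log(2 * np.pi * sigma**2) - ((x - theta) ** 2).sum() / (2 * sigma**2)

# Different samples, same sum (= 6) and same sum of squares (= 18).
x = [0.0, 3.0, 3.0]
y = [1.0, 1.0, 4.0]

for theta, sigma in [(0.0, 1.0), (2.5, 0.7), (-1.0, 3.0)]:
    print((theta, sigma), np.isclose(normal_log_lik(x, theta, sigma),
                                     normal_log_lik(y, theta, sigma)))
```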
Problem 3.36 Let X_1, X_2, ⋯, X_n be a random sample from a Gamma(α, β) population. Find a two-dimensional sufficient statistic for the random sample.

Given p_{α,β}(x) = [α^β/Γβ] e^{−αx} x^{β−1},  x > 0, α > 0, β > 0, and 0 otherwise.

The joint pdf of X_1, X_2, ⋯, X_n is

p_{α,β}(x_1, x_2, ⋯, x_n) = [α^{nβ}/(Γβ)^n] e^{−αΣ x_i} (∏_{i=1}^n x_i)^{β−1}
= [α^{nβ}/(Γβ)^n] e^{−αΣ x_i + (β−1)Σ log x_i}
= [α^{nβ}/(Γβ)^n] e^{−αΣ x_i + βΣ log x_i − Σ log x_i}
= c(α, β) e^{Q_1(α,β)t_1(x)+Q_2(α,β)t_2(x)} h(x),

where c(α, β) = α^{nβ}/(Γβ)^n, Q_1(α, β) = −α, t_1(x) = Σ_{i=1}^n x_i, Q_2(α, β) = β, t_2(x) = Σ_{i=1}^n log x_i and h(x) = e^{−Σ_{i=1}^n log x_i}. It is a two-parameter exponential family. Therefore (Σ X_i, Σ log X_i) is a jointly sufficient statistic.

3.14 Efficient estimator

There are two types of efficient estimators: the relative efficient estimator and the efficient estimator. The efficient estimator is defined through the Cramér–Rao lower bound for the variance of an unbiased estimator; the relative efficient estimator is defined below.
Definition 3.6 Let T_1 = t_1(X) and T_2 = t_2(X) be two unbiased estimators of θ with Eθ[T_1²] < ∞ and Eθ[T_2²] < ∞. The efficiency of T_1 = t_1(X) relative to T_2 = t_2(X) is defined as

Efficiency = Vθ[T_1]/Vθ[T_2].

If Vθ[T_1]/Vθ[T_2] < 1, then T_1 = t_1(X) is more efficient than T_2 = t_2(X).
Problem 3.37 Let Y1 < Y2 < Y3 < Y4 < Y5 be the order statistic of a random
sample of size 5 from a uniform population with pdf

 1

0 < x < θ, θ > 0
θ
pθ (x) =
 0

otherwise

Show that 2Y3 is unbiased estimator of θ. Find the conditional expectation


T = Eθ [2Y3 | Y5 ] . Compare the variances of 2Y3 and the statistic T .

The pdf of Y_3 is

pθ(y_3) = [5!/(2!1!2!)] [∫_0^{y_3} (1/θ) dx]² (1/θ) [∫_{y_3}^{θ} (1/θ) dx]²
        = (30/θ^5) y_3² (θ − y_3)²,  0 < y_3 < θ
        = (30/θ³) y_3² (1 − y_3/θ)²,  0 < y_3 < θ

Eθ[Y_3] = (30/θ³) ∫_0^θ y_3³ (1 − y_3/θ)² dy_3
        = 30θ ∫_0^1 t^{4−1}(1 − t)^{3−1} dt,   where t = y_3/θ
        = 30θ Γ4Γ3/Γ7 = 30θ (3! × 2!)/6! = θ/2

Eθ[2Y_3] = θ.

The joint pdf of Y_3 and Y_5 is

pθ(y_3, y_5) = [5!/(2!1!1!)] [∫_0^{y_3} (1/θ) dx]² (1/θ) [∫_{y_3}^{y_5} (1/θ) dx] (1/θ)
             = (60/θ^5) y_3² (y_5 − y_3),  0 < y_3 < y_5 < θ, and 0 otherwise.

The pdf of Y_5 is

pθ(y_5) = 5 y_5^4/θ^5,  0 < y_5 < θ, and 0 otherwise.

The conditional distribution of Y_3 given Y_5 = y_5 is

pθ(y_3 | y_5) = pθ(y_3, y_5)/pθ(y_5) = 12 y_3²(y_5 − y_3)/y_5^4,  0 < y_3 < y_5

Eθ[Y_3 | Y_5] = (12/y_5^4) ∫_0^{y_5} y_3³ (y_5 − y_3) dy_3 = (3/5) y_5

∴ Eθ[2Y_3 | Y_5 = y_5] = (6/5) y_5

Vθ[Y_3] = θ²/28,  since Eθ[Y_3²] = 2θ²/7
Vθ[2Y_3] = θ²/7
Eθ[Y_5] = (5/θ^5) ∫_0^θ y_5^5 dy_5 = 5θ/6
Eθ[Y_5²] = 5θ²/7
Vθ[Y_5] = 5θ²/7 − 25θ²/36 = 5θ²/252
Vθ[(6/5)Y_5] = (36/25) Vθ[Y_5] = θ²/35

The efficiency of (6/5)Y_5 relative to 2Y_3 is

Vθ[(6/5)Y_5]/Vθ[2Y_3] = (θ²/35)/(θ²/7) = 1/5 < 1.

Thus (6/5)Y_5 is more efficient than 2Y_3 as an unbiased estimator of θ.
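The variance comparison can be reproduced by simulation. The sketch below is an addition (assuming NumPy and an illustrative value of θ); it draws repeated samples of size 5 from U(0, θ) and compares the Monte Carlo variances of 2Y_3 and (6/5)Y_5 with θ²/7 and θ²/35.

```python
import numpy as np

rng = np.random.default_rng(3)
theta, reps = 4.0, 200_000

x = rng.uniform(0.0, theta, size=(reps, 5))
y3 = np.sort(x, axis=1)[:, 2]          # sample median Y3 of n = 5
y5 = x.max(axis=1)                     # largest order statistic Y5

t1 = 2 * y3                            # unbiased for theta
t2 = 6 * y5 / 5                        # unbiased for theta

print("Var(2*Y3)    :", t1.var(), " theory:", theta**2 / 7)
print("Var(6/5 * Y5):", t2.var(), " theory:", theta**2 / 35)
```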


Problems

3.1 Give an example for each of the following cases:

(i) Unbiased estimator (estimator with zero bias)
(ii) Estimator with non-zero bias
(iii) Consistent estimator with zero bias and
(iv) Consistent estimator with non-zero bias

3.2 Give a sufficient condition for an estimator to be consistent? Is the


sample mean a consistent estimator of the population mean?

3.3 If X1 , X2 , · · · , Xn is a random sample of size n drawn from a population


with uniform distribution ∪[−2θ, θ], examine whether max1≤i≤n {Xi } is
consistent for θ?

3.4 What is consistent estimator ? Examine whether a consistent estimator


is (i) unique and (ii) unbiased.

3.5 Show that if the bias of an estimator and its variance approach zero,
then the estimator will be consistent.

3.6 When would you say that estimator of a parameter is good? In par-
ticular discuss the requirements of consistency and unbiasedness of an
estimator. Give an example to show that a consistent estimator need
not be unbiased.

3.7 Let X1 , X2 , · · · , Xn be n independent random sample drawn from a nor-


mal population with mean θ and variance σ 2 . Obtain the unbiased
estimators of (i) θ and (ii) σ 2 .

3.8 If Tn denotes the number of successes in n independent and identi-


cal trials of an experiment with probability of success p. Obtain an
unbiased estimator of p2 in the form aTn2 + bTn + c.

3.9 Obtain the unbiased estimator of θ(1 − θ), where θ is the parameter of
Binomial distribution.

3.10 Find the unbiased estimator of λ2 in a Poisson population with pa-


rameter λ based on a random sample of size n.

3.11 Let X1 , X2 , · · · , Xn be iid random sample of size n drawn from a popu-


lation with common density

pθ(x) = (1/θ) e^{−x/θ},  θ > 0 and x > 0, and 0 otherwise.
P
Xi
(i) Show that T1 = n is the unbiased estimator of θ.
Pn
(ii) Let Tc = c i=1 Xi . Show that Ec [Tc − θ]2 = θ2 E1 [Tc − 1]2 .

3.12 Obtain the sufficient statistic, given a sample of size n from a uniform
distribution ∪(−θ, θ).

3.13 State two equivalent definition of sufficient statistic and obtain their
equivalence.

3.14 Explain the concept of sufficiency. State the Factorization Theorem


for a sufficient statistic and indicate its importance.

3.15 Let X_1, X_2, ⋯, X_n be a random sample drawn from a uniform population on the interval [0, θ]. Define T_1 = (2/n)Σ X_i and T_2 = [(n+1)/n] max_{1≤i≤n}{X_i}. Evaluate their relative efficiency.

3.16 Let X1 , X2 , · · · , Xn be a random sample drawn from a population


N (θ, σ 2 ). Prove that the sample mean is more efficient estimator as
compared to the sample median for the parameter θ.

3.17 Let X be a single observation from a normal population N (2θ, 1) and


Pn
let Y1 , Y2 , · · · , Yn be a normal population N (θ, 1). Define T = 2X+ k=1 Yk .

Show that T is sufficient statistic.

3.18 Let X_1, X_2, ⋯, X_n be a random sample drawn from a normal population N(0, θ), 0 < θ < ∞. Show that Σ_{i=1}^n X_i² is a sufficient statistic.

3.19 If T_1 = (3/2) max{X_1, X_2} and T_2 = 2(X_1 + X_2) are estimators of θ based on two independent observations X_1 and X_2 on a random variable distributed uniformly over (0, θ), which one do you prefer and why?

3.20 Let X1 , X2 , · · · , Xn be a random sample drawn from a Poisson pop-


P
Xi
ulation with parameter θ. Show that n+2 is not unbiased of θ but
consistent of θ.

3.21 Distinguish between an Estimate and Estimator. Given three obser-


vations X1 , X2 and X3 on a normal random variable X from N (θ, 1), a
person constructs the following estimators for θ

X1 + X2 + X3
T1 =
6
X1 + 2X2 + 3X3
T2 =
7
X1 + X2
T3 =
2

which one would you choose and why?



3.22 A random sample X_1, X_2, ⋯, X_n is drawn on X which takes the value 1 or 0 with respective probabilities θ and (1 − θ). Show that ΣX_i(ΣX_i − 1)/[n(n − 1)] is an unbiased estimator of θ².

3.23 Discuss whether an unbiased estimator exists for the parametric func-
tion τ (θ) = θ2 of Binomial (1, θ) based on a sample of size one.

3.24 Obtain the sufficient statistic of the pdf



pθ(x) = (1 + θ)x^θ,  0 < x < 1, and 0 otherwise,

based on an independent sample of size n.

3.25 X1 , X2 , X3 and X4 constitute a random sample of size four from a


Poisson population with parameter θ. Show that (X1 + X2 + X3 + X4 )
and (X1 +X2 , X3 +X4 ) are sufficient statistics. Which would you prefer?

3.26 A statistic Tn such that Vθ [Tn ] → 0 ∀ θ is consistent as an estimator of


θ as n → ∞
(a) if and only if Eθ [Tn ] → θ ∀ θ
(b) if, but not only if Eθ [Tn ] → θ ∀ θ
(c) if and only if Eθ [Tn ] = θ ∀ θ, for every n
(d) if and only if |Eθ [Tn ] − θ| and Vθ [Tn ] → 0 ∀ θ Ans:(b)

3.27 The sequence {Xn } of random variables is said to converge X in prob-


ability, if as n → ∞:
(a) P {|Xn − X| > } → 0 for some  > 0
(b) P {|Xn − X| > } → 0 for some  < 0
(c)P {|Xn − X| > } → 0 for every  > 0
(d) P {|Xn − X| < } → 0 for every  > 0 Ans:(c)

3.28 X1 , X2 , · · · , Xn are iid Bernoulli random variables with Eθ [Xi ] = θ and


Pn
Sn = i=1 Xi . Then, for a sequence of non - negative numbers {kn }, Tn =
Sn +kn
n+kn is a consistent estimator of θ.

kn
(a) if n → 0 as n → ∞ (c) iff kn is bounded as n → ∞
(b) if and if kn = 0 ∀ n (d) whatever {kn } is Ans:(a)

3.29 In tossing a coin, P{Head} = p². It is tossed n times to estimate the value of p². X denotes the number of heads. An unbiased estimator of p² is

(a) (X/n)²   (b) X/n   (c) X²/n   (d) X/n²   Ans: (b)

3.30 Which of the following statement is not correct for a consistent esti-
mator?
1. If there exists one consistent estimator, then an infinite number of
consistent statistics may be constructed.
2. Unbiased estimators are always consistent.
3. A consistent estimator with finite mean value must tend to be un-
biased in large samples.
Select the correct answer given below
(a) 1 (b) 2 (c) 1 and 3 (d) 1, 2 and 3 Ans: (b)

3.31 Consider the following type of population


1. Normal 2 . Cauchy 3 . Poisson
Sample mean is the best estimator of population mean in case of
(a) 1 and 3 (b) 1 and 2 (c) 2 and 3 (d) 1 , 2 and 3 Ans:(a)

3.32 Suppose X_1, X_2, ⋯ are independent random variables. Assume that X_1, X_3, ⋯ are identically distributed with mean µ_1 and variance σ_1², while X_2, X_4, ⋯ are identically distributed with mean µ_2 and variance σ_2². Let S_n = X_1 + X_2 + ⋯ + X_n. Then (S_n − a_n)/b_n converges to N(0, 1) if

(a) a_n = n(µ_1 + µ_2)/2 and b_n = √(n(σ_1² + σ_2²)/2)
(b) a_n = n(µ_1 + µ_2) and b_n = √(n(σ_1² + σ_2²)/2)
(c) a_n = n(µ_1 + µ_2)/2 and b_n = √(n(σ_1² + σ_2²))
(d) a_n = n(µ_1 + µ_2)/2 and b_n = n(σ_1² + σ_2²)/2   Ans: (a)

3.33 Let (X, Y) have a joint discrete distribution such that [X | Y = y] ∼ B(y, 0.5) and Y ∼ Poisson(λ), λ > 0, where λ is an unknown parameter. Let T = T(X, Y) be any unbiased estimator of λ. Then

(a) V[T] ≥ λ ∀ λ   (b) V[T] ≥ V[Y] ∀ λ   (c) V[T] ≤ V[Y] ∀ λ   (d) V[T] = V[Y] ∀ λ

Ans: (a) and (b)

3.34 Let X_1, X_2, ⋯, X_n be a random sample from Uniform(0, 3θ), θ > 0. Define T_n = (1/3) max(X_1, X_2, ⋯, X_n). Which of the following is not true?

(a) Tn is consistent for θ (c) Tn is sufficient statistic


(b) Tn is unbiased for θ (d) Tn is minimum order statistic

Ans: (a)

3.35 Suppose draw a random sample of size n from a population of size


N , where 1 < n < N . Using simple random sampling without replace-
ment scheme, let P be the population proportion of unit possessing
a particular attribute and p be the corresponding sample proportion.
Which of the following is unbiased estimator for P (1 − P )?.

n(N −1)
(a) p(1 − p) (c) N (n−1) p(1 − p)
N −n N (n−1)
(b) N −1 np(1 − p) (d) n(N −1) p(1 − p) Ans: ( c)

3.36 A Statistician has drawn a simple random sample of size 2 with replacement from 4 boys and recorded their heights. Let X̄_1 be the sample mean of their heights. Another Statistician has drawn a simple random sample of size 2 without replacement from those 4 boys. Let X̄_2 be the sample mean of their heights. Which of the following statements is correct?

(a) (X̄_1 + X̄_2)/2 has larger variance than (2X̄_1 + 3X̄_2)/5
(b) (X̄_1 + X̄_2)/2 has smaller variance than (2X̄_1 + 3X̄_2)/5
(c) (X̄_1 + 2X̄_2)/3 has smaller variance than (2X̄_1 + 3X̄_2)/5
(d) (X̄_1 + X̄_2)/2 has equal variance to (2X̄_1 + 3X̄_2)/5   Ans: (a)

3.37 Let X1 , X2 be iid with probability mass function



pθ(x) = θ^x (1 − θ)^{1−x},  x = 0, 1; 0 < θ < 1, and 0 otherwise.

Which of the following statements are true?.

(a) X1 + X2 is a sufficient statistic (c) X12 + X22 is a sufficient statistic


(b) X1 − X2 is a sufficient statistic (d) X1 + X22 is a sufficient statistic

Ans: (a)

3.38 Let {X_n} be a sequence of independent random variables where the distribution of X_n is Normal with mean µ and variance n, for n = 1, 2, 3, ⋯. Define X̄_n = (1/n)Σ X_i and S_n = [Σ_{i=1}^n (X_i/i)] / [Σ_{i=1}^n (1/i)]. Which of the following is true?

(a) E[X̄_n] = E[S_n] for large n
(b) X̄_n is consistent for µ
(c) X̄_n is sufficient for µ
(d) V[S_n] > V[X̄_n] for sufficiently large n   Ans: (b)

3.39 Let X_1, X_2, ⋯, X_n be random variables from pθ(x), a pdf or a pmf. Define S² = [1/(n−1)] Σ[X_i − X̄]², where X̄ = (1/n)Σ_{i=1}^n X_i. Then S² is unbiased for θ, θ > 0, if

(a) pθ(x) = e^{−θ}θ^x/x!, x = 0, 1, 2, ⋯
(b) pθ(x) = (1/√(2πθ)) e^{−x²/(2θ)}, −∞ < x < ∞
(c) pθ(x) = (1/θ) e^{−x/θ}, x > 0
(d) pθ(x) = θ e^{−θx}, x > 0   Ans: (b)

3.40 Suppose there are K groups each consisting of N boys. We want to


estimate the mean age µ of these KN boys. Fix 1 < n < N and consider
the following schemes.
(i) Draw a simple random sample without replacement of size kn out
of all KN boys.
(ii) From each of the k groups draw a simple random sample with
replacement of size n.
Let Ȳ and ȲG be the respective sample mean ages for the two schemes.
Which of the following is true?

(a) E[Ȳ] = µ   (b) E[Ȳ_G] = µ   (c) V[Ȳ] < V[Ȳ_G] in some cases   (d) V[Ȳ] = V[Ȳ_G] if all the group means are same   Ans: (c)

3.41 Let X_1, X_2, ⋯, X_{2n+1} be a random sample from a Uniform distribution on the interval (θ − 1, θ + 1). Let T_1 = X̄, the sample mean, T_2 = the sample median, and T_3 = (T_1 + T_2)/2 be three estimators of θ. Then which of the following statements is true?
(a) T_1 is consistent for θ
(b) Both T_2 and T_3 are more efficient than T_1
(c) All the three estimators are biased for θ
(d) T_2 is a sufficient statistic for θ   Ans: (a)

3.42 Let X_1, X_2, ⋯, X_{2n+1} be a random sample from a Uniform distribution on the interval (θ − 1, θ + 1). Let T_1 = X̄, the sample mean, T_2 = the sample median, and T_3 = (T_1 + T_2)/2 be three estimators of θ. Then which of the following statements is true?
(a) T_1 is not consistent for θ
(b) Both T_2 and T_3 are more efficient than T_1
(c) All the three estimators are unbiased for θ
(d) T_2 is a sufficient statistic for θ   Ans: (c)

3.43 Suppose Ȳ is the mean of the study variables corresponding to a sam-


ple of size n, using simple random sampling with replacement scheme
and Ȳst is the sample mean of the study variables corresponding to a
sample of size n, using stratified random sampling with replacement
scheme under proportional allocation. Which of the following is sufficient for V[Ȳ] = V[Ȳ_st]?
(a) All the stratum sizes are equal
(b) All the stratum totals are equal
(c) All the stratum means are equal
(d) All the stratum variances are equal Ans:(c)

3.44 Let X1 , X2 , · · · , Xn be iid random variables each following Uniform


(1 − θ, 1 + θ) distribution, θ > 0. Define X_(1) = min(X_1, X_2, ⋯, X_n), X_(n) = max(X_1, X_2, ⋯, X_n) and X̄ = (1/n)Σ_{i=1}^n X_i. Which of the following is true?
(a) (X_(1), X̄, X_(n)) is sufficient for θ
(b) (1/2)(X_(n) − X_(1))² is unbiased for θ²
(c) (1/n)Σ_{i=1}^n (X_i − 1)² is unbiased for θ²
(d) (3/n)Σ_{i=1}^n (X_i − X̄)² is unbiased for θ²   Ans: (a)

3.45 Let X1 , X2 , · · · , Xn be iid random variables each following Uniform


(1 − θ, 1 + θ) distribution, θ > 0. Define X_(1) = min(X_1, X_2, ⋯, X_n), X_(n) = max(X_1, X_2, ⋯, X_n) and X̄ = (1/n)Σ_{i=1}^n X_i. Which of the following is true?
(a) (1/n)(X_(n) − X_(1))² is unbiased for θ
(b) (1/2)(X_(n) − X_(1))² is unbiased for θ²
(c) (3/n)Σ_{i=1}^n (X_i − 1)² is unbiased for θ²
(d) (3/n)Σ_{i=1}^n (X_i − X̄)² is unbiased for θ²   Ans: (c)

3.46 Let X1 , X2 , · · · , Xn be a random sample from the probability density


function

pθ(x) = (1/2) e^{−|x−θ|},  −∞ < x < ∞, and 0 otherwise.

Which of the following statements is true?


Pn
(a) i=1 Xi is a sufficient statistic for θ
Pn 2
(b) i=1 Xi is a sufficient statistic for θ
Pn 2
(c) i=1 Xi is a sufficient statistic for θ2
1 Pn 2
(d) n i=1 Xi is a sufficient statistic for θ Ans:(a)

3.47 Suppose X and Y are two independent exponential random variables


with mean θ and 2θ respectively, where θ is unknown. Which of the
following statements is true?
(a) X + 2Y is sufficient for θ
(b) X + 2Y is an unbiased estimator of θ
(c) X + 2Y is an unbiased estimator of 3θ
5
(d) X + 2Y is a biased estimator of 2θ Ans:(a)

3.48 A set of N observations resulted in n distinct values x_1, x_2, ⋯, x_n with respective frequencies f_1, f_2, ⋯, f_n so that Σ_{i=1}^n f_i = N. Another n observations resulted in x_1, x_2, ⋯, x_n once each, so that the modified (new) sample of size N + n has observation x_i with frequency f_i + 1.

(a) The new mean is necessarily less than or equal to the original mean
(b) The new median is necessarily less than or equal to the original median
(c) The new variance is necessarily less than or equal to the original variance
(d) The new mode will be the same as the original mode   Ans: (d)

3.49 Let {X_i}, i = 1, 2, 3, ⋯ be a sequence of iid random variables with E[X_i] = 0 and V[X_i] = 1. Which of the following is true?

(a) (1/n)Σ_{i=1}^n X_i² → 0 in probability
(b) (1/n^{3/2})Σ_{i=1}^n X_i² → 0 in probability
(c) Σ_{i=1}^n X_i² → 0 in probability
(d) (1/n)Σ_{i=1}^n X_i² → 1 in probability   Ans: (d)
4. COMPLETE FAMILY OF DISTRIBUTION MODELS

4.1 Introduction

There are several sufficient statistics with varied degrees of data reduction under different statistical probability models. The degree of data reduction achieved by a sufficient statistic depends on the amount of ancillary information it carries. A sufficient statistic achieves maximum data reduction if it contains no ancillary information.
Definition: 4.1 A statistic T = T(X_1, X_2, ⋯, X_n) is known as ancillary if the distribution of the statistic T is independent of the parameter θ, and first-order ancillary if Eθ[T] is independent of the parameter θ.

4.2 Minimal statistic with ancillary information

An ancillary statistic does not contain any information about θ, whereas sufficient statistics depend on the parameter θ. Statistical inference can be made objectively conditional on the value of an ancillary statistic. In Problem 4.21, (X, Y) ∼ BVN(0, 1, 1, ρ) and (X², Y², XY) is minimal sufficient for ρ; this minimal sufficient statistic contains much ancillary information, since the ancillary components X² and Y² come from X ∼ N(0, 1) and Y ∼ N(0, 1), whose distributions do not contain any information about the parameter ρ. A sufficient statistic T has no ancillary material, and so achieves maximum data reduction, if Eθ[g(T)] = 0 ∀ θ ∈ Ω ⇒ g(t) = 0 ∀ t ∈ ℜ; that is, no non-constant function g(T) of T can have constant expectation. A sufficient statistic T that brings about the maximum simplification of the problem is called complete. Summarization of data reduction can be achieved through a complete sufficient statistic. Also, the completeness property of a statistic depends on the associated parameter space Ω. If the family of distributions F_0 is complete and F_0 ⊂ F, then the family of distributions F is also complete. Thus the family should contain a sufficiently large collection of distributions to be complete. Note that if a non-constant function of T is ancillary or first-order ancillary, then T is not complete.

4.3 Complete statistics and completeness of family of distributions

Completeness of a family of distributions is useful for investigating whether there exists only one unbiased estimator of θ. The existence of the mathematical expectation implies that the integral (or sum) involved converges absolutely; this absolute convergence is tacitly assumed in the definition of completeness.
Definition: 4.2 A family of distributions {Fθ, θ ∈ Ω} is said to be complete if for any measurable function g(X), Eθ[g(X)] = 0 ∀ θ ∈ Ω ⇒ g(x) = 0 ∀ x ∈ ℜ.
Definition: 4.3 A statistic T = T(X_1, X_2, ⋯, X_n) is said to be complete if the family of distributions of the statistic is complete, i.e., Eθ[g(T)] = 0 ∀ θ ∈ Ω ⇒ g(t) = 0 ∀ t ∈ ℜ.
Suppose T is some statistic for estimating the parameter θ. Let θ̂_1(T) be an unbiased estimator of θ, so that Eθ[θ̂_1(T)] = θ ∀ θ ∈ Ω, and let θ̂_2(T) be another unbiased estimator of θ, Eθ[θ̂_2(T)] = θ ∀ θ ∈ Ω, with θ̂_1(T) ≠ θ̂_2(T). If T is complete, then

Eθ[θ̂_1(T) − θ̂_2(T)] = 0 ∀ θ ∈ Ω  ⇒  θ̂_1(t) = θ̂_2(t) ∀ t.

Thus completeness identifies the unique unbiased estimator based on the complete statistic T; such an estimator attains the minimum risk.
The order statistic obtained from a random sample drawn from a continuous distribution is shown to be complete in the following theorem.
Theorem 4.1 Let F be a class of absolutely continuous distribution functions F such that F is convex and contains all uniform densities on intervals of ℜ. Let X_1, X_2, ⋯, X_n be iid with F ∈ F. Then the order statistic T(X) = (X_(1), X_(2), ⋯, X_(n)) is complete.
Proof: An estimator T′ is a function of T, i.e., T′ = g(T), if and only if T′(x_l) = T′(x) ∀ l, where x_l = (x_{l_1}, ⋯, x_{l_n}) and (l_1, l_2, ⋯, l_n) is one of the n! permutations of the numbers 1, 2, ⋯, n.
Consider cumulative distribution functions F_1, F_2, ⋯, F_n from F with corresponding densities f_1(x), f_2(x), ⋯, f_n(x). For all positive numbers α_1, α_2, ⋯, α_n there is some F ∈ F with density

f(x) = Σ_{i=1}^n α_i f_i(x) / Σ_{i=1}^n α_i.

Consider the expectation

∫⋯∫ T′(x_1, ⋯, x_n) ∏_{j=1}^n f(x_j) dx_1 ⋯ dx_n = 0
∫⋯∫ T′(x_1, ⋯, x_n) ∏_{j=1}^n [Σ_i α_i f_i(x_j) / Σ_i α_i] dx_1 ⋯ dx_n = 0      (4.1)

The left-hand side of equation (4.1) is a polynomial in α_1, α_2, ⋯, α_n. This polynomial is identically equal to zero, which implies that the corresponding coefficients are also zero, i.e.,

Σ_{l∈L} ∫⋯∫ T′(x_1, ⋯, x_n) ∏_{j=1}^n f_j(x_{l_j}) dx_1 ⋯ dx_n = 0
Σ_{l∈L} ∫⋯∫ T′(x_l) ∏_{j=1}^n f_j(x_j) dx_1 ⋯ dx_n = 0
n! ∫⋯∫ g(T(x)) ∏_{j=1}^n f_j(x_j) dx_1 ⋯ dx_n = 0

Taking f_j(x) = 1/(b_j − a_j), a_j < x < b_j, then

∫_{a_1}^{b_1} ⋯ ∫_{a_n}^{b_n} g(T(x)) dx_1 ⋯ dx_n = 0,

i.e., the integral of g(T) over any n-dimensional rectangle is 0.

⇒ P_F{g(T) = 0} = 1  ∀ F ∈ F.

Thus the order statistic T(X) = (X_(1), X_(2), ⋯, X_(n)) is complete.


Problem 4.1 Let X be one observation from the pmf

pθ(x) = (1/2)[θ(1 − θ)]^{|x|}  for x = −1, 1;
        1 − θ(1 − θ)           for x = 0;
        and 0 otherwise.

Show that the family is not complete but the family of distributions Y = |X| is
complete.
Solution: For a single observation X, consider

Eθ[g(X)] = 0
Σ_{x=−1}^{1} g(x) pθ(x) = 0
g(−1)(1/2)θ(1−θ) + g(0)[1 − θ(1−θ)] + g(1)(1/2)θ(1−θ) = 0
[g(−1) + g(1) − 2g(0)](1/2)θ(1−θ) + g(0) = 0

Equating the coefficients of the powers of θ on both sides,

g(0) = 0  and  g(−1) + g(1) − 2g(0) = 0  ⇒  g(−1) = −g(1),

i.e., g(x) ≠ 0 for x = −1, 1. Thus the family is not complete.

Let Y = |X|; the pmf of Y is

pθ(y) = 1 − θ(1−θ)  for y = 0;  θ(1−θ)  for y = 1;  and 0 otherwise.

Consider Eθ[g(Y)] = 0:

Σ_{y=0}^{1} g(y)[θ(1−θ)]^y [1 − θ(1−θ)]^{1−y} = 0
g(0) + g(1)ρ = 0,  where ρ = θ(1−θ)/[1 − θ(1−θ)]
⇒ g(0) = 0 and g(1) = 0,

i.e., g(y) = 0 ∀ y = 0, 1. Thus the family of distributions of Y = |X| is complete.
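The failure of completeness for X can be illustrated numerically: with g(−1) = 1, g(0) = 0, g(1) = −1, the expectation Eθ[g(X)] vanishes for every θ even though g is not identically zero. A small sketch (an addition, plain Python with illustrative θ values) follows.

```python
# Non-zero g with zero expectation under every member of the family of X,
# illustrating why the family in Problem 4.1 is not complete.
def p_x(x, theta):
    if x in (-1, 1):
        return 0.5 * theta * (1 - theta)
    if x == 0:
        return 1 - theta * (1 - theta)
    return 0.0

g = {-1: 1.0, 0: 0.0, 1: -1.0}          # g is not identically zero

for theta in (0.1, 0.3, 0.5, 0.8):
    expectation = sum(g[x] * p_x(x, theta) for x in (-1, 0, 1))
    print(theta, round(expectation, 12))  # always 0
```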


Problem 4.2 Let X_1, X_2, ⋯, X_n be an iid random sample drawn from a Bernoulli population b(1, θ), 0 < θ < 1. Prove that T = Σ_{i=1}^n X_i is a complete statistic.
Solution: Given T = Σ_{i=1}^n X_i ∼ b(n, θ); consider

Eθ[g(T)] = 0
Σ_{t=0}^n g(t) C(n,t) θ^t (1 − θ)^{n−t} = 0
(1 − θ)^n Σ_{t=0}^n g(t) C(n,t) [θ/(1 − θ)]^t = 0

Here (1 − θ)^n ≠ 0, so

Σ_{t=0}^n g(t) C(n,t) ρ^t = 0,  where ρ = θ/(1 − θ)
g(0)C(n,0) + g(1)C(n,1)ρ + ⋯ + g(n)ρ^n = 0

By comparing the coefficients of ρ^t on both sides,

g(0) = 0   (coefficient of ρ^0)
C(n,1)g(1) = 0 ⇒ g(1) = 0   (coefficient of ρ^1)
⋯⋯⋯
g(n) = 0   (coefficient of ρ^n)

Thus g(t) = 0 ∀ t = 0, 1, 2, ⋯, n.

Hence T = Σ_{i=1}^n X_i is a complete statistic.
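Completeness here amounts to the fact that the only solution of Σ_t g(t)C(n,t)θ^t(1−θ)^{n−t} = 0 for all θ is g ≡ 0. One concrete way to see this is that the (n+1)×(n+1) matrix of binomial probabilities evaluated at n+1 distinct values of θ is nonsingular, so the homogeneous system forces g = 0. The sketch below is an addition (assuming NumPy and illustrative values of n and θ).

```python
import numpy as np
from math import comb

# Matrix A[j, t] = P_{theta_j}(T = t), T ~ Binomial(n, theta_j), at n+1 distinct theta_j.
# A is nonsingular, so A g = 0 (zero expectation at every theta_j) forces g = 0.
n = 4
thetas = np.linspace(0.1, 0.9, n + 1)
A = np.array([[comb(n, t) * th**t * (1 - th) ** (n - t) for t in range(n + 1)]
              for th in thetas])

print("rank of A:", np.linalg.matrix_rank(A), "of", n + 1)   # full rank
print("det(A) is non-zero:", abs(np.linalg.det(A)) > 1e-12)
```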
Problem 4.3 Let X_1, X_2, ⋯, X_n be an iid random sample drawn from a Poisson population with parameter λ > 0. Show that T = Σ_{i=1}^n X_i is a complete statistic.
Solution: Let T = Σ_{i=1}^n X_i ∼ P(nλ), i.e.,

p_λ(t) = e^{−nλ}(nλ)^t/t!,  t = 0, 1, 2, ⋯, ∞

E_λ[g(T)] = 0
Σ_{t=0}^∞ g(t) e^{−nλ}(nλ)^t/t! = 0
Σ_{t=0}^∞ g(t)(nλ)^t/t! = 0,   since e^{−nλ} ≠ 0
g(0) + g(1)(nλ)/1! + ⋯ + g(n)(nλ)^n/n! + ⋯ = 0

By comparing the coefficients of λ^t on both sides,

g(0) = 0   (coefficient of λ^0)
n g(1) = 0 ⇒ g(1) = 0   (coefficient of λ^1)
⋯⋯⋯

Thus g(t) = 0 ∀ t = 0, 1, 2, ⋯, ∞.

Hence T = Σ_{i=1}^n X_i is a complete statistic.
Problem 4.4 Let X ∼ ∪(0, θ), θ > 0. Show that the family of distributions is complete.
Solution: For a single observation X, the mathematical expectation of a measurable function g(X) is

Eθ[g(X)] = 0
⇒ ∫_0^θ g(x)(1/θ) dx = 0
⇒ ∫_0^θ g(x) dx = 0

One can differentiate the above integral with respect to θ on both sides, using

d/dθ ∫_{a(θ)}^{b(θ)} pθ(x) dx = ∫_{a(θ)}^{b(θ)} [∂pθ(x)/∂θ] dx + pθ[b(θ)] db(θ)/dθ − pθ[a(θ)] da(θ)/dθ,

which gives

∫_0^θ 0 dx + g(θ) × 1 − g(0) × 0 = 0
g(θ) = 0 ∀ θ > 0,  i.e., g(x) = 0 ∀ 0 < x < θ, θ > 0.

Thus the family of distributions is complete.


Problem 4.5 Let X ∼ N(0, θ). Prove that the family of pdf's {N(0, θ), θ > 0} is not complete.
Solution: Take the measurable function g(X) = X. Then for every θ > 0,

Eθ[g(X)] = ∫_{−∞}^{∞} x (1/√(2πθ)) e^{−x²/(2θ)} dx = 0,

since the integrand is an odd function of x. Thus Eθ[g(X)] = 0 ∀ θ > 0, although g(x) = x is not identically zero, i.e., g(x) ≠ 0 for some x. Hence the family {N(0, θ), θ > 0} is not complete.
Problem 4.6 If X ∼ N(0, θ), θ > 0, prove that T = X² is a complete statistic.
Solution: Let T = (X − 0)²; then T/θ = X²/θ ∼ χ² distribution with one degree of freedom, i.e., T/θ has the pdf of G(1/2, 1/2). Hence

pθ(t) = (1/√(2πθ)) e^{−t/(2θ)} t^{1/2 − 1},  0 < t < ∞, and 0 otherwise.

Eθ[g(T)] = 0
∫_0^∞ g(t) (1/√(2πθ)) e^{−t/(2θ)} t^{−1/2} dt = 0
∫_0^∞ e^{−t/(2θ)} [g(t) t^{−1/2}] dt = 0  ∀ θ > 0

This is the same as the Laplace transform ∫_0^∞ e^{−st} f(t) dt of f(t) = g(t)t^{−1/2}. Using the uniqueness property of the Laplace transform,

g(t) t^{−1/2} = 0 ∀ t > 0,

i.e., g(t) = 0 ∀ t > 0. Thus T = X² is a complete statistic.
Problem 4.7 Examine whether the family of distributions

pθ(x) = 2θ  if 0 < x < 1/2, 0 < θ < 1;
        2(1 − θ)  if 1/2 ≤ x < 1

is complete.
Solution: Consider the mathematical expectation of the function g(X):

Eθ[g(X)] = 0
⇒ ∫_0^{1/2} g(x) 2θ dx + ∫_{1/2}^1 g(x) 2(1 − θ) dx = 0
2θ ∫_0^{1/2} g(x) dx + 2(1 − θ) ∫_{1/2}^1 g(x) dx = 0
θ [∫_0^{1/2} g(x) dx − ∫_{1/2}^1 g(x) dx] + ∫_{1/2}^1 g(x) dx = 0

Since this holds for every 0 < θ < 1,

∫_0^{1/2} g(x) dx − ∫_{1/2}^1 g(x) dx = 0  and  ∫_{1/2}^1 g(x) dx = 0
⇒ ∫_0^{1/2} g(x) dx = 0.

These conditions only force the two integrals to vanish; g(x) = 0 need not hold for all x, i.e., g(x) ≠ 0 for some x. Thus the family of distributions is not complete. Indeed, choose

g(x) = +1  if 0 < x < 1/4;
       −1  if 1/4 ≤ x < 1/2;
       +1  if 1/2 ≤ x < 3/4;
       −1  if 3/4 ≤ x < 1.

Then

Eθ[g(X)] = ∫_0^{1/4} (+1)2θ dx + ∫_{1/4}^{1/2} (−1)2θ dx + ∫_{1/2}^{3/4} (+1)2(1 − θ) dx + ∫_{3/4}^{1} (−1)2(1 − θ) dx
        = 2θ(1/4) − 2θ(1/4) + 2(1 − θ)(1/4) − 2(1 − θ)(1/4)
        = 0,

but g(x) ≠ 0 for some x. Thus the family of distributions is not complete.

Theorem 4.2 Let {Pθ, θ ∈ Ω} be a single-parameter exponential family of distributions. Its pdf is given by

pθ(x) = c(θ) e^{Q(θ)t(x)} h(x),  if a < x < b, and 0 otherwise,

where a and b are independent of θ. Then the family of distributions is complete.

Proof: Assume pθ(x), θ ∈ Ω, is a pmf.

Eθ[g(T)] = Σ_t g(t) Pθ{T = t}

Choose Q(θ) = θ, t(x) = t, h(x) = e^{s(t)} and c(θ) ≠ 0; then Eθ[g(T)] = 0 gives

Σ_t g(t) e^{θt + s(t)} = 0.

Define g⁺(t) = max[g(t), 0] and g⁻(t) = −min[g(t), 0];

then g(t) = g⁺(t) − g⁻(t) and both g⁺(t) and g⁻(t) are non-negative functions.

Σ_t [g⁺(t) − g⁻(t)] e^{θt+s(t)} = 0  ∀ θ ∈ Ω
Σ_t g⁺(t) e^{θt+s(t)} = Σ_t g⁻(t) e^{θt+s(t)}  ∀ θ ∈ Ω

Dividing g⁺(t)e^{θt+s(t)} by the constant Σ_t g⁺(t)e^{θt+s(t)} defines

p⁺(t) = g⁺(t)e^{θt+s(t)} / Σ_t g⁺(t)e^{θt+s(t)}.

Again dividing g⁻(t)e^{θt+s(t)} by the constant Σ_t g⁻(t)e^{θt+s(t)} defines

p⁻(t) = g⁻(t)e^{θt+s(t)} / Σ_t g⁻(t)e^{θt+s(t)}.

Now p⁺(t) and p⁻(t) are both pmf's and

Σ_t p⁺(t) e^{δt} = Σ_t p⁻(t) e^{δt}  ∀ δ ∈ Ω.

By the uniqueness property of moment generating functions, p⁺(t) = p⁻(t) ∀ t. Hence g⁺(t) = g⁻(t) ∀ t
⇒ g(t) = 0 ∀ t. Thus the family of distributions is complete.
Problem 4.8 Let X_1, X_2, ⋯, X_n be a random sample drawn from a population N(θ, θ²). Show that the sufficient statistic (Σ_{i=1}^n X_i, Σ_{i=1}^n X_i²) is not complete.
Solution: Define g(X) = 2(Σ_{i=1}^n X_i)² − (n + 1)Σ_{i=1}^n X_i², n = 2, 3, ⋯. Then

Eθ[g(X)] = 2Eθ[(Σ X_i)²] − (n + 1)Eθ[Σ X_i²].

Now (Σ X_i)² = n²X̄² and X̄ ∼ N(θ, θ²/n), so

Eθ[X̄²] = Vθ[X̄] + (Eθ[X̄])² = θ²/n + θ² = [(n + 1)/n] θ²
Eθ[(Σ X_i)²] = n² Eθ[X̄²] = n(n + 1)θ².

Also Eθ[X_i²] = Vθ[X_i] + (Eθ[X_i])² = θ² + θ² = 2θ², so

Eθ[Σ X_i²] = 2nθ².

Hence

Eθ[g(X)] = 2n(n + 1)θ² − (n + 1)·2nθ² = 0  ∀ θ,

but

g(x) = 2(Σ x_i)² − (n + 1)Σ x_i² ≠ 0 for some x,  n = 2, 3, ⋯,

i.e., 2(Σ x_i)² ≠ (n + 1)Σ x_i² for some x. Thus g(X) is a non-zero function of (Σ X_i, Σ X_i²) with zero expectation for every θ, and (Σ X_i, Σ X_i²) is not a complete statistic.
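The identity Eθ[2(ΣX_i)² − (n+1)ΣX_i²] = 0 can be verified by simulation. The sketch below is an addition (assuming NumPy and illustrative values of n and θ); the estimated expectation should be close to zero, while g itself is clearly not the zero function.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 5, 400_000

for theta in (1.0, 2.5):
    x = rng.normal(loc=theta, scale=abs(theta), size=(reps, n))   # N(theta, theta^2)
    g = 2 * x.sum(axis=1) ** 2 - (n + 1) * (x**2).sum(axis=1)
    print("theta =", theta, " estimated E[g] =", g.mean())        # close to 0
```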


Problem 4.9 Show that the family of distributions given by the pdf's

pθ(x) = θ  if 0 < x < θ;
        (1 + θ)  if θ ≤ x < 1, 0 < θ < 1;
        and 0 otherwise

is complete.
Solution: Consider the expectation Eθ[g(X)] = 0:

∫_0^θ θ g(x) dx + ∫_θ^1 (1 + θ) g(x) dx = 0 + 0
⇒ ∫_0^θ g(x) dx = 0  and  ∫_θ^1 g(x) dx = 0

One can differentiate the above integrals with respect to θ:

∫_0^θ 0 dx + g(θ) × 1 − g(0) × 0 = 0  and
∫_θ^1 0 dx + g(1) × 0 − g(θ) × 1 = 0

g(θ) = 0 and −g(θ) = 0 ∀ θ > 0,
i.e., g(x) = 0 ∀ 0 < x < θ, 0 < θ < 1.

Thus the family of distributions is complete.


Definition 4.4 A statistic T = t(X) is said to be a bounded complete statistic if, for every bounded function g (|g(t)| ≤ M, M ∈ ℜ), Eθ[g(T)] = 0 ∀ θ ∈ Ω ⇒ g(t) = 0 ∀ t ∈ ℜ.
Problem 4.10 Show that completeness implies bounded completeness, but bounded completeness does not imply completeness.
Proof: Assume T = t(X) is a complete statistic, i.e., Eθ[g(T)] = 0 ∀ θ ⇒ g(t) = 0 ∀ t ∈ ℜ for every function g. In particular the implication holds for every bounded function g, so T = t(X) is a bounded complete statistic. Thus completeness implies bounded completeness.
To see that bounded completeness does not imply completeness, consider the family of pmf's

pθ(x) = θ  for x = −1;
        (1 − θ)²θ^x  for x = 0, 1, 2, ⋯;
        and 0 otherwise,   0 < θ < 1.

Let g be any function with Eθ[g(X)] = 0 ∀ θ ∈ (0, 1):

g(−1)θ + (1 − θ)² Σ_{x=0}^∞ g(x)θ^x = 0
Σ_{x=0}^∞ g(x)θ^x = −g(−1)θ(1 − θ)^{−2}
                  = −g(−1)[θ + 2θ² + 3θ³ + ⋯]
                  = −g(−1) Σ_{x=0}^∞ xθ^x

Comparing the coefficients of θ^x on both sides,

g(x) = −g(−1)x = cx,  where c = −g(−1) and c ∈ ℜ.

Thus every unbiased estimator of zero has the form g(x) = cx, x = −1, 0, 1, 2, ⋯. For c ≠ 0 this g is not identically zero, so the family of distributions is not complete. But g(x) = cx is unbounded on {−1, 0, 1, 2, ⋯} unless c = 0; hence the only bounded function with Eθ[g(X)] = 0 ∀ θ is g ≡ 0, and the family of distributions is bounded complete. Thus bounded completeness does not imply completeness.



Problem 4.11 Examine the family of distributions of the random variable X given
by Pθ {X = −1} = θ2 , Pθ {X = 0} = 1 − θ and Pθ {X = 1} = θ(1 − θ), 0 < θ < 1 is
complete.
Solution: For a single observation X, consider

Eθ[g(X)] = 0
g(−1)θ² + g(0)(1 − θ) + g(1)θ(1 − θ) = 0
θ²[g(−1) − g(1)] + θ[g(1) − g(0)] + g(0) = 0

g(0) = 0   (coefficient of θ^0)
g(1) − g(0) = 0 ⇒ g(1) = 0   (coefficient of θ)
g(−1) − g(1) = 0 ⇒ g(−1) = 0   (coefficient of θ²)

Hence g(−1) = g(0) = g(1) = 0. Thus g(x) = 0 for x = −1, 0 and 1. ∴ The family of distributions is complete.
Problem 4.12 The random variable X has the following distribution

X=x: 0 1 2
Pθ {X = x} 1 − θ − θ2 θ θ2

Prove that the family of distributions is complete.


Solution: For completeness of the distribution, calculate

Eθ [g(X)] = 0

g(0)[1 − θ − θ2 ] + g(1)θ + g(2)θ2 = 0

θ2 [g(2) − g(0)] + θ[g(1) − g(0)] + g(0) = 0

g(2) − g(0) = 0 coefficient of θ2

g(1) − g(0) = 0 coefficient of θ

g(0) = 0 coefficient of θ0

Hence g(0) = g(1) = g(2) = 0, i.e., g(x) = 0 for x = 0, 1 and 2. Thus the family of
distributions is complete.
Problem 4.13 X has the following distribution

X=x: 1 2 3 4 5 6
1 1 1 1 1 1
Pθ {X = x} 6 6 6 6 6 6

Examine whether the family of pmf ’s is complete.


Solution: Define 
 c

when x = 1, 3, 5
g(x) =
 −c when x = 2, 4, 6

Consider E[g(X)] = 0
3c 3c
⇒ − + = 0
6 6
But g(x) 6= 0 for x = 1, 2, 3, 4, 5, 6.

Thus the family of pmf ’s is not complete.


1
Problem 4.14 Show that the family of pmf ’s {pN (x) = N, x = 1, 2, · · · , N and ∀ N =
1, 2, 3, · · ·} is complete.
Solution: The pmf of a random variable X is

1
pN (x) = , x = 1, 2, · · · , N and ∀ N = 1, 2, · · ·
N
i.e., pN =1 (x) = 1, x = 1
1
pN =2 (x) = , x = 1, 2
2
1
pN =3 (x) = , x = 1, 2, 3
3
······ ··· ···············

······ ··· ···············

Consider EN g(X) = 0 ∀ N ∈ I+
PN 1
i.e., x=1 g(x) N = 0 ⇒ g(x) = 0 ∀ x and ∀ N
When N = 1 ⇒ g(1) = 0

When N = 2 ⇒ g(1) + g(2) = 0 ⇒ g(2) = 0 since g(1) = 0


When N = 3 ⇒ g(3) = 0 since g(1) + g(2) = 0 and so on.
Thus g(x) = 0 ∀ x and ∀ N ∈ I+ .
Thus the discrete family of uniform distributions defined on the sample {x | x =
1, 2, 3, · · · , N and N ∈ I+ } is complete.
1
Problem 4.15 Examine whether the family of pmf ’s {pN (x) = N, x =
1, 2, · · · , N and N = 2, 3, · · ·} is complete.
PN 1
Solution: Consider x=1 g(x) N = 0 when N = 2, 3, · · ·
P2
When N = 2 ⇒ x=1 g(x) = 0 ⇒ g(1) + g(2) = 0 i.e., g(2) = −g(1)
When N = 3 ⇒ g(1) + g(2) + g(3) = 0
⇒ g(3) = 0 and so on. Thus EN [g(x)] = 0 ⇒ g(x) 6= 0 for x = 1 and 2 ,

i.e., g(2) = −g(1) and

g(x) = 0 ∀ x = 3, 4, · · · , N and ∀ N = 2, 3, 4, · · ·

Thus the family of distributions is not complete.


Remark 4.1 Completeness is a property of a family of distributions. As in Problem 4.15, the exclusion of even one member from the family {pN(x) = 1/N, x = 1, 2, ⋯, N and N = 1, 2, ⋯} destroys completeness.
Remark 4.2 For Problem 4.15, define

g(x) = c  if x = 1;
       −c  if x = 2;
       0  if x = 3, 4, 5, ⋯

Then Σ_{x=1}^N g(x)(1/N) = 0 for every N = 2, 3, ⋯, although g(x) ≠ 0 for x = 1, 2 when c ≠ 0. Thus there exists a whole class of unbiased estimators of zero, U_0 = {g(X) | c ∈ ℜ}, where

g(x) = (−1)^{x−1} c,  x = 1, 2 and c ∈ ℜ;  0 otherwise.

If the family of distributions is complete, then the unbiased estimator of zero is unique.

4.4 Minimal sufficient statistics

Consider a random sample (X_1, X_2, ⋯, X_n) from an iid discrete population with probability function pθ(x), θ ∈ Ω. The statistic T = (X_1, X_2, ⋯, X_{n−1}) is not sufficient, for

P{X_1 = x_1, ⋯, X_n = x_n | X_1 = x_1, X_2 = x_2, ⋯, X_{n−1} = x_{n−1}} = P{X_n = x_n} = pθ(x_n).

This conditional probability pθ(x_n), given the value of T, i.e., X_1 = x_1, ⋯, X_{n−1} = x_{n−1}, is just the probability function of the nth observation, which does depend on θ.
Using a statistic means that a given sample is reduced. This usually simplifies the methodology and the theory; the question is how far the data can be reduced without sacrificing sufficiency.
Definition 4.5 A statistic is said to be minimal sufficient if it is sufficient and if any
reduction of the partition of the sample space defined by the statistic is not sufficient.

4.5 Method of constructing minimal sufficient statistics

Lehmann and Scheffe technique for obtaining a minimal sufficient statistic is par-
tition of the sample space. Once the partition is obtained, a minimal sufficient statistic
can be defined by assigning distinct numbers to distinct partition sets.
In constructing the sets of a partition that is to be sufficient for the family of densities pθ(x), θ ∈ Ω, two sets of sample points X_1 = x_1, ⋯, X_n = x_n and Y_1 = y_1, Y_2 = y_2, ⋯, Y_n = y_n lie in the same set of the minimal sufficient partition iff the ratio of the density at x_1, x_2, ⋯, x_n to its value at y_1, y_2, ⋯, y_n,

pθ(x_1, x_2, ⋯, x_n)/pθ(y_1, y_2, ⋯, y_n) = k(y_1, ⋯, y_n; x_1, x_2, ⋯, x_n),

satisfies k(y_1, ⋯, y_n; x_1, x_2, ⋯, x_n) ≠ 0 and k(y_1, ⋯, y_n; x_1, x_2, ⋯, x_n) is independent of θ, θ ∈ Ω.
The reason for writing the definition in terms of a product rather than a ratio is to take into account the points for which pθ(x_1, x_2, ⋯, x_n) = 0; all points x_1, x_2, ⋯, x_n such that pθ(x_1, x_2, ⋯, x_n) = 0 ∀ θ ∈ Ω are equivalent. Every x_1, x_2, ⋯, x_n lies in some partition set D, namely in D(x_1, x_2, ⋯, x_n), and there is no overlapping of the D's, so that they constitute a partition of the sample space. For if two D's, say D(x_1, x_2, ⋯, x_n) and D(y_1, y_2, ⋯, y_n), have a point z_1, z_2, ⋯, z_n in common, then z_1, z_2, ⋯, z_n is equivalent to both x_1, x_2, ⋯, x_n and y_1, y_2, ⋯, y_n, which are then equivalent to each other and define the same D. Thus the partition of the sample space into the sets D defines the minimal sufficient partition.
Problem 4.16 Let X_1, X_2, ⋯, X_n be an iid random sample drawn from a Bernoulli population b(1, θ). Obtain the minimal sufficient statistic by the partition method.
Solution: The joint pmf at X_1 = x_1, X_2 = x_2, ⋯, X_n = x_n is

pθ(x_1, x_2, ⋯, x_n) = θ^{Σ x_i}(1 − θ)^{n − Σ x_i},

and the joint pmf at Y_1 = y_1, Y_2 = y_2, ⋯, Y_n = y_n is

pθ(y_1, y_2, ⋯, y_n) = θ^{Σ y_i}(1 − θ)^{n − Σ y_i}.

The ratio is

pθ(x_1, x_2, ⋯, x_n)/pθ(y_1, y_2, ⋯, y_n) = [θ/(1 − θ)]^{Σ x_i − Σ y_i}.

The ratio is independent of θ iff Σ x_i = Σ y_i. Thus the points x_1, x_2, ⋯, x_n and y_1, y_2, ⋯, y_n whose coordinates have the same total lie in the same set of the minimal sufficient partition. Therefore Σ X_i is a minimal sufficient statistic.
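The partition criterion can be checked directly: the likelihood ratio pθ(x)/pθ(y) is constant in θ exactly when Σx_i = Σy_i. The sketch below is an addition (assuming NumPy and illustrative 0–1 samples); it evaluates the ratio over a grid of θ values for two pairs of samples.

```python
import numpy as np

def bernoulli_lik(x, theta):
    x = np.asarray(x)
    return theta ** x.sum() * (1 - theta) ** (len(x) - x.sum())

thetas = np.linspace(0.1, 0.9, 9)
x = np.array([1, 0, 1, 0, 1])            # sum = 3
y = np.array([0, 1, 1, 1, 0])            # sum = 3  -> same partition set
z = np.array([1, 1, 1, 1, 0])            # sum = 4  -> different partition set

print("x vs y ratios:", np.round(bernoulli_lik(x, thetas) / bernoulli_lik(y, thetas), 6))  # constant
print("x vs z ratios:", np.round(bernoulli_lik(x, thetas) / bernoulli_lik(z, thetas), 6))  # varies with theta
```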
Problem 4.17 Let X_1, X_2, ⋯, X_n be an iid random sample from N(θ, σ²). Assume θ and σ² are unknown. Prove that (Σ X_i, Σ X_i²) is a minimal sufficient statistic.
Solution: Consider the ratio

p_{θ,σ²}(x_1, x_2, ⋯, x_n)/p_{θ,σ²}(y_1, y_2, ⋯, y_n) = exp{−(1/(2σ²))[Σ x_i² − Σ y_i² − 2θ(Σ x_i − Σ y_i)]}.

The ratio is independent of the parameters (θ, σ²) iff Σ x_i = Σ y_i and Σ x_i² = Σ y_i².
Therefore (Σ X_i, Σ X_i²) is a minimal sufficient statistic.

Problem 4.18 Determine the minimal sufficient statistic based on a random sample of size n from each of the following:
(i) pθ(x) = θe^{−θx},  x > 0, θ > 0, and 0 otherwise;
(ii) pθ(x) = (x/θ) exp[−x²/(2θ)],  x > 0, and 0 otherwise;
(iii) pσ(x) = √(2/π) (x²/σ³) e^{−x²/(2σ²)},  x > 0, and 0 otherwise.
Solution: (i) Consider the ratio

pθ(x_1, x_2, ⋯, x_n)/pθ(y_1, y_2, ⋯, y_n) = exp[−θ(Σ x_i − Σ y_i)].

The ratio is independent of θ iff Σ x_i = Σ y_i. Therefore Σ X_i is a minimal sufficient statistic.
(ii) Consider the ratio

pθ(x_1, x_2, ⋯, x_n)/pθ(y_1, y_2, ⋯, y_n) = [∏ (x_i/y_i)] exp{−(1/(2θ))[Σ x_i² − Σ y_i²]}.

The ratio is independent of the parameter θ iff Σ x_i² = Σ y_i². Therefore Σ X_i² is a minimal sufficient statistic.
(iii) Consider the ratio

pσ(x_1, x_2, ⋯, x_n)/pσ(y_1, y_2, ⋯, y_n) = [∏ (x_i²/y_i²)] exp{−(1/(2σ²))[Σ x_i² − Σ y_i²]}.

The ratio is independent of σ iff Σ x_i² = Σ y_i². Therefore Σ X_i² is a minimal sufficient statistic.
Theorem 4.3 The exponential family of distributions consists of those distributions with densities or probability functions expressible in the form pθ(x) = c(θ)e^{Q(θ)t(x)}h(x). If pθ(x) is a member of the exponential family, then a minimal sufficient statistic exists.
Proof: The joint density function of the random sample X_1, X_2, ⋯, X_n from the random variable X is

pθ(x_1, x_2, ⋯, x_n) = [c(θ)]^n exp[Q(θ) Σ t(x_i)] ∏ h(x_i).

Consider the ratio of this density at x_1, x_2, ⋯, x_n to its value at y_1, y_2, ⋯, y_n:

pθ(x_1, x_2, ⋯, x_n)/pθ(y_1, y_2, ⋯, y_n) = exp{Q(θ)[Σ t(x_i) − Σ t(y_i)]} ∏ [h(x_i)/h(y_i)].

This is independent of θ iff Σ t(x_i) = Σ t(y_i). Therefore T = Σ t(X_i) is a minimal sufficient statistic.
Remark 4.3 A complete sufficient statistics is minimal sufficient whenever minimal
sufficient statistic exists.
Theorem 4.4 Let pθ0(x) and pθ1(x) be densities with the same support (the ranges of the two densities are the same). Then the statistic T = pθ1(X)/pθ0(X) is minimal sufficient.
Proof: The necessary and sufficient conditions that T = t(X) is a sufficient statistic for the fixed values θ_1 and θ_0 are

pθ1(x_1, x_2, ⋯, x_n) = pθ1(t) h(x_1, x_2, ⋯, x_n)  and  pθ0(x_1, x_2, ⋯, x_n) = pθ0(t) h(x_1, x_2, ⋯, x_n)

respectively, so that the ratio pθ1(x_1, ⋯, x_n)/pθ0(x_1, ⋯, x_n) = pθ1(t)/pθ0(t) is a function of t alone. If U = u(X_1, X_2, ⋯, X_n) is any sufficient statistic for the family {pθ0, pθ1}, the same factorization applied to U shows that the ratio, and hence T, is a function of U. This proves that T = t(X) is a minimal sufficient statistic.

If F is a family of distributions with common support, F_0 ⊂ F, and T = t(X) is minimal sufficient for F_0 and sufficient for F, then it is minimal sufficient for F.
Problem 4.19 Let F ∼ N(θ, 1) and F_0 ∼ N(θ_0, 1). Prove that F_0 ⊂ F.
Solution: Let X_1, X_2, ⋯, X_n be a random sample of size n. Then

pθ(x_1, x_2, ⋯, x_n)/pθ0(x_1, x_2, ⋯, x_n) = e^{−(1/2)Σ(x_i−θ)²} / e^{−(1/2)Σ(x_i−θ_0)²}
                                           = e^{(1/2)[2n(θ−θ_0)x̄ − n(θ²−θ_0²)]}.

The ratio depends on the sample only through x̄. Thus T = X̄, with X̄ ∼ N(θ, 1/n), is the minimal sufficient statistic for N(θ, 1), i.e., F_0 ⊂ F.


Problem 4.20 Let X_1, X_2, ⋯, X_n be a random sample from a population defined by the Cauchy density with parameter θ:

pθ(x) = 1/{π[1 + (x − θ)²]},  −∞ < x < ∞, −∞ < θ < ∞, and 0 otherwise.

Find the minimal sufficient statistic.

Solution: Two sets of sample points x_1, x_2, ⋯, x_n and y_1, y_2, ⋯, y_n lie in the same set of the minimal sufficient partition iff the ratio

pθ(x_1, x_2, ⋯, x_n)/pθ(y_1, y_2, ⋯, y_n) = ∏_{j=1}^n [1 + (y_j − θ)²]/[1 + (x_j − θ)²]

is independent of θ. The numerator and denominator are polynomials of degree 2n in θ. The ratio is independent of θ iff the two polynomials are identical (the leading coefficients being equal). This means that the set of zeroes of the numerator polynomial, y_j + i (i = √−1, j = 1, 2, ⋯, n), is the same as the set of zeroes of the denominator polynomial, x_j + i (j = 1, 2, ⋯, n). This is true iff the real numbers (x_1, x_2, ⋯, x_n) are a permutation of the numbers (y_1, y_2, ⋯, y_n). A partition set of the minimal sufficient partition therefore consists of the n! permutations of n real numbers, and this minimal sufficient partition is defined by the order statistic (X_(1), X_(2), ⋯, X_(n)).
Problem 4.21 Construct the minimal sufficient statistic for the bivariate normal distribution (X, Y) ∼ BVN(0, 1, 1, ρ).
The bivariate normal density of the distribution is

p(x, y) = [1/(2π√(1 − ρ²))] exp{−[x² − 2ρxy + y²]/[2(1 − ρ²)]},  −1 < ρ < 1, and 0 otherwise.

Solution: Consider the ratio p(x_1, ⋯, x_n, y_1, ⋯, y_n)/p(u_1, ⋯, u_n, v_1, ⋯, v_n)

= exp{−[Σ x_i² − 2ρΣ x_i y_i + Σ y_i²]/[2(1 − ρ²)]} / exp{−[Σ u_i² − 2ρΣ u_i v_i + Σ v_i²]/[2(1 − ρ²)]}.

The ratio is independent of the parameter ρ if and only if Σ x_i² = Σ u_i², Σ y_i² = Σ v_i² and Σ x_i y_i = Σ u_i v_i. The set (Σ x_i², Σ y_i², Σ x_i y_i) is the minimal sufficient statistic, and Σ X_i² and Σ Y_i² are ancillary, since each X_i ∼ N(0, 1) and each Y_i ∼ N(0, 1) do not depend on the parameter ρ.
Problem 4.22 Let X be a discrete random variable with the following pmf pθ(x):

x     1      2      3      4      5      6
θ1    1/30   1/15   1/10   4/15   2/15   2/5
θ2    1/60   1/30   1/20   1/3    1/15   1/2

Find the minimal sufficient statistic for (θ1, θ2).

Solution: Choose the partitions A1 = {1, 2, 3, 5} and A2 = {4, 6}, and compute the likelihood ratios pθ(x)/pθ(y); two points satisfy T(x) = T(y) iff the ratio is free of θ.

Ratio   p(1)/p(2)  p(1)/p(3)  p(1)/p(4)  p(1)/p(5)  p(1)/p(6)  p(2)/p(3)  p(2)/p(4)  p(2)/p(5)
θ1      1/2        1/3        1/8        1/4        1/12       2/3        1/4        1/2
θ2      1/2        1/3        1/20       1/4        1/30       2/3        1/10       1/2

Ratio   p(2)/p(6)  p(3)/p(4)  p(3)/p(5)  p(3)/p(6)  p(4)/p(5)  p(4)/p(6)  p(5)/p(6)
θ1      1/6        3/8        3/4        1/4        2          2/3        1/3
θ2      1/15       3/20       3/4        1/10       5          2/3        2/15

The ratios pθ(x)/pθ(y) are free of θ exactly when x and y both belong to A1 or both belong to A2. The minimal sufficient statistic is therefore

T(X) = a  if x ∈ A1 = {1, 2, 3, 5};  b  if x ∈ A2 = {4, 6}.
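The likelihood-ratio table above can be generated programmatically. The sketch below is an addition (plain Python, using the fractions module for exact arithmetic); it groups the sample points whose pmf ratio pθ1(x)/pθ2(x) is the same, which reproduces the partition A1 = {1, 2, 3, 5}, A2 = {4, 6}.

```python
from fractions import Fraction as F
from collections import defaultdict

# pmf table of Problem 4.22
p_theta1 = {1: F(1, 30), 2: F(1, 15), 3: F(1, 10), 4: F(4, 15), 5: F(2, 15), 6: F(2, 5)}
p_theta2 = {1: F(1, 60), 2: F(1, 30), 3: F(1, 20), 4: F(1, 3),  5: F(1, 15), 6: F(1, 2)}

# x and y fall in the same set of the minimal sufficient partition iff
# p_theta(x)/p_theta(y) does not depend on theta, i.e. iff the ratio
# p_theta1(x)/p_theta2(x) takes the same value at x and at y.
partition = defaultdict(list)
for x in range(1, 7):
    partition[p_theta1[x] / p_theta2[x]].append(x)

print(dict(partition))   # ratio 2 -> [1, 2, 3, 5], ratio 4/5 -> [4, 6]
```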

Problems

4.1 X has the following distribution

X=x: 0 1 2
Pθ {X = x} 1 − θ − θ2 θ2 θ

Prove that the family of distributions is complete.



4.2 Let X1 , X2 , · · · , Xn be a sample from pmf



 1

x = 1, 2, 3, · · · , N ; N ∈ I+
N
pN (x) = PN {X = x} =
 0

otherwise

Examine if the family of distributions is complete.

4.3 Let X1 , X2 , · · · , Xn be iid random variables from ∪(0, θ). Prove that the statistic
YN = max1≤i≤n {Xi } is complete.

4.4 Consider the class of Hypergeometric probability distributions, {PD , D =


0, 1, 2, · · · , N } where
  
PD{X = x} = [C(D, x) C(N − D, n − x)] / C(N, n),  x = 0, 1, ⋯, min(n, D), and 0 otherwise.

Show that it is a complete class.

4.5 Examine if the family of distributions



 θ if 0 < x ≤ 1

pθ (x) =
 1 − θ if 1 < x ≤ 2

is complete.

4.6 Let X1 , X2 , · · · , Xn be a sample from ∪(θ − 12 , θ + 12 ), θ ∈ <. Show that the statistic
T = (min1≤i≤n (Xi ), max1≤i≤n (Xi )) is not complete.

4.7 Let X_1, X_2, ⋯, X_n be a sample of n independent observations from N(θ, σ²), −∞ < θ < ∞, 0 < σ² < ∞. Show that (Σ_{i=1}^n X_i, Σ_{i=1}^n X_i²) is a sufficient statistic. Is it complete? Justify.

4.8 Show that the Exponential family of distributions



pθ(x) = exp[θx + w(x) + A(θ)],  −∞ < x < ∞, and 0 otherwise,

depending on a single parameter θ is complete.



4.9 Prove that a complete sufficient statistics is minimal sufficient whenever minimal
sufficient statistic exists.

4.10 Explain the method of construction of minimal sufficient statistic.

4.11 Let X1 , X2 , · · · , Xn be a random sample drawn from a distribution having pdf


of the form

fθ(x) = (1/θ) f(x/θ)  ∀ θ ∈ Ω, and 0 otherwise,

where θ is a scale parameter( the pdf is scale density ). Find the ancillary statistic
for the family of distributions.

4.12 Let X1 , X2 , · · · , Xn be a random sample drawn from a distribution having pdf


of the form

f_{θ1,θ2}(x) = (1/θ_2) f((x − θ_1)/θ_2),  −∞ < θ_1 < ∞, θ_2 > 0, and 0 otherwise,

where θ1 is a location parameter and θ2 is a scale parameter. Prove that the


statistic T = t(X1 , X2 · · · , Xn ) is an ancillary statistic for the family of distribu-
tions.

4.13 Let X be a discrete random variable with the following pmf pθ (x) .

x 1 2 3 4 5 6
θ1 1/14 2/14 3/14 3/14 4/14 1/14
θ2 1/18 2/18 5/18 5/18 4/18 1/18

where A = {1, 2, 3, 4, 5, 6} and θ = {θ1, θ2}. Show that the minimal sufficient statistic is

T(X) = a  if x ∈ A1 = {1, 2, 5, 6};  b  if x ∈ A2 = {3, 4}.

4.14 If a family F is complete, then it is possible to conclude that completeness for



(a) a larger class (c) a small class


(b) an equal class (d) an empty class Ans:(a)

4.15 For a fixed n0 = 1, 2, · · · from the family of densities {pN(x) = 1/N, x = 1, 2, 3, · · · , N, N ∈ I+}, let F = {pN(x) = 1/N, x = 1, 2, 3, · · · , N, N ∈ I+ and N ≠ n0} where
pN(x) = 1/N   x = 1, 2, · · · , N, N ∈ I+
      = 0     otherwise
then

(a) F is complete (c) F is bounded complete


(b) F is not complete (d) F is not bounded complete Ans:(b)

4.16 If a complete sufficient statistic does not exist, then UMVUE

(a) may not exist (c) may unique


(b) may exist (d) none of the above. Ans:(b)

4.17 If a complete sufficient statistic exists, then UMVUE is

(a) unique (c) not exist


(b) not unique (d) none of the above Ans:(a)

4.18 Let X1 , X2 , · · · , Xn be iid uniform (θ1 , θ2 ) variables, where θ1 < θ2 are unknown
parameters. Which of the following is an ancillary statistic?.

(a) X(k)/X(n) for any k < n                      (c) X(k)/(X(n) − X(k)) for any k < n

(b) (X(n) − X(k))/X(n) for any k < n              (d) (X(k) − X(1))/(X(n) − X(k)) for any k where 1 < k < n

Ans:(d)
A.Santhakumaran 166

4.19 Consider the problem of estimation of a parameter θ on the basis of X, where


X ∼ N (θ, 1) and −∞ < θ < ∞. Under squared error loss, X has uniformly
smaller risk than that of kX for

(a) k < 0 (c) k > 0


(b) 0 < k < 1 (d) k > ∞ Ans:(b)

4.20 Let X1 , X2 , · · · , Xn be a random sample drawn from a distribution having pdf


of the form 
 1f(x) ∀ θ ∈ Ω

θ θ
fθ (x) =
 0

otherwise
where θ is a scale parameter.Which of the following statements are true?
(a) the pdf is scale density
(b) the pdf is a location density
(c) the pdf is not a scale density
(d) the normal pdf with mean µ and variance σ 2 is an scale and location density
Ans:(a) and (d)

4.21 The family of pmf's {pN(x) = 1/N, x = 1, 2, 3, · · · , N, N = 2, 3, 4, · · ·} is

(a) complete              (c) bounded complete

(b) not complete          (d) not bounded complete      Ans:(b) and (c)

4.22 The family of distributions {pN(x) = 1/N, x = 1, 2, 3, · · · , N, N = 2, 3, 4, · · ·} is bounded complete. Then which of the following statements is true?

(a) there exists a class of unbiased estimators of zero

(b) there exists a unique unbiased estimator of zero

(c) there exists a class of biased estimators

(d) there exists a class of biased estimators      Ans:(a)

4.23 Which of the following statements are true?. Random variable X ∼ N (θ, σ 2 ), then
(a) P {X = θ} = 0 (b) P {X > θ} = 0.5 (c) P {X < Median} = 0.5

(d) P {X ≥ Mode } = 0.5 Ans:(a),(b), (c) and (d)


5. OPTIMAL ESTIMATION

5.1 Introduction

Let g(T ) be an unbiased estimator of τ (θ) and δ(T ) be an another unbiased estimator
of τ (θ) different from g(T ). Then there always exists an infinite number of unbiased
estimators of τ (θ) such that λg(T ) + (1 − λ)δ(T ), 0 < λ < 1. In this case one can find
the best estimator or optimal estimator among all the unbiased estimators. The following
procedures are used to identify the optimal estimator.

5.2 Uniformly Minimum Variance Unbiased Estimator

Let U = {δi (T ), i = 1, 2, 3, · · ·} be the set of all unbiased estimators of the param-


eter τ (θ) ∀ θ ∈ Ω and Vθ [δi (T )] < ∞, i = 1, 2, 3, · · · ∀ θ ∈ Ω and g(T ) be a statistic
with Vθ [g(T )] < ∞. Then the estimator g(T ) is called an Uniformly Minimum Variance
Unbiased Estimator (UMVUE) of τ (θ) if

Eθ [g(T )] = τ (θ) ∀ θ ∈ Ω and

Eθ [g(T ) − τ (θ)]2 ≤ Eθ [δi (T ) − τ (θ)]2

i.e., Vθ [g(T )] ≤ Vθ [δi (T )] ∀ θ ∈ Ω and ∀ i = 1, 2, 3, · · ·

The procedures to identify the UMVUE are

• Uncorrelatedness approach

• Sufficient statistic approach

• Information inequality approach

5.3 Uncorrelatedness approach

It is a mathematical property based on the uncorrelatedness of estimators. The


following result gives a necessary and sufficient condition for an unbiased estimator to be
UMVUE.
A.Santhakumaran 168

Theorem 5.1 Let U be the class of all unbiased estimators T = t(X) of a parameter
τ (θ) ∀ θ ∈ Ω with Eθ [T 2 ] < ∞ for all θ. Suppose that U is a non-empty set. Let U0 be
the set of all unbiased estimators of V of zero, i.e.,

U0 = {V | Eθ [V ] = 0, Eθ [V 2 ] < ∞ ∀ θ ∈ Ω}

Then T ∈ U is a UMVUE of τ (θ) if and only if Eθ [V T ] = 0 ∀ θ ∈ Ω and ∀ V ∈ U0 .


Proof: Let T ∈ U and V ∈ U0 . Assume that T = t(X) is a UMVUE of τ (θ). Prove that
Eθ [V T ] = 0 ∀ θ ∈ Ω. That is, Covθ [V, T ] = 0 ∀ θ ∈ Ω and V ∈ U0 . Consider T + λ V ∈ U
for some real λ (λ 6= 0), then

Eθ [T + λ V ] = τ (θ) + λEθ [V ]

= τ (θ) since Eθ [V ] = 0

⇒ T + λ V is also an unbiased estimator of τ (θ) and Vθ [T + λ V ] ≥ Vθ [T ] ∀ θ and ∀ λ.

Vθ [T ] + λ2 Vθ [V ] + 2λCovθ [V, T ] ≥ Vθ [T ]

i.e., 2λCovθ [T, V ] + λ2 Vθ [V ] ≥ 0 ∀ θ and ∀ λ

It is a quadratic expression in λ with two real roots λ = 0 and λ = −2Covθ[T, V]/Vθ[V]. If λ = 0, trivially T is an UMVUE of τ(θ). For λ ≠ 0, take λ0 = λ/2 = −Covθ[T, V]/Eθ[V²]; then one can define T′ ∈ U where T′ = T + λ0 V and Eθ[T + λ0 V] = Eθ[T] = τ(θ) and
Vθ[T′] = Eθ[T + λ0 V]² − {Eθ[T + λ0 V]}²
       = Eθ[T + λ0 V]² − τ²(θ)
       = Eθ[T²] − τ²(θ) + λ0² Eθ[V²] + 2λ0 Covθ[T, V]
       = Vθ[T] + (Covθ[T, V])²/(Eθ[V²])² · Eθ[V²] − 2(Covθ[T, V])²/Eθ[V²]
       = Vθ[T] − (Covθ[T, V])²/Eθ[V²]
Vθ[T′] = Vθ[T] − (Covθ[T, V])²/Eθ[V²] ≤ Vθ[T]

Thus, unless Covθ[T, V] = 0, the choice λ0 = −Eθ[T V]/Eθ[V²] gives Vθ[T′] < Vθ[T], which contradicts that T is the UMVUE of τ(θ). Hence if T is the UMVUE of τ(θ), then Covθ[T, V] = 0, i.e., Eθ[T V] = 0 ∀ θ ∈ Ω.
A.Santhakumaran 169

Conversely, assume Covθ [T, V ] = 0 for some θ ∈ Ω. To prove that T is a UMVUE of τ (θ).
Let T 0 be another unbiased estimator of τ (θ) so that T 0 ∈ U, then T 0 − T ∈ U0 . Since
Eθ [T ] = τ (θ) and Eθ [T 0 ] = τ (θ) ⇒

Eθ [T 0 − T ] = 0

⇒ Eθ [T (T 0 − T )] = 0

Eθ [T T 0 ] = Eθ [T 2 ]

Applying Cauchy Schwarz inequality to Eθ [T 0 T ]


2 2
Eθ [T T 0 ] ≤ Eθ [T 2 ]Eθ [T 0 ]


n o1 n o1
2
Eθ [T T 0 ] ≤ Eθ [T 2 ] 2
Eθ [T 0 ] 2

Eθ [T 2 ] n
2
o1
1 ≤ Eθ [T 0 ] 2

{Eθ [T 2 ]} 2
Vθ [T ] ≤ Vθ [T 0 ]

Thus T = t(X) is the UMVUE of τ (θ).


Theorem 5.2 Let U be the non-empty class of unbiased estimators as in the Theorem
5.1, then there exists at most one UMVUE of τ (θ).
Proof: Assume T = t(X) is a UMVUE of τ (θ). Let T 0 = t0 (X) be another UMVUE of
τ (θ).

Eθ [T 0 ] = τ (θ)

Eθ [T ] = τ (θ)

⇒ Eθ [T 0 − T ] = 0

⇒ Eθ [T (T 0 − T )] = 0

i.e., Eθ [T 2 ] = Eθ [T T 0 ]

Covθ [T, T 0 ] = Vθ [T ]

The correlation coefficient between T and T 0 is given by


Covθ [T, T 0 ] Vθ [T ]
= =1
Vθ [T 0 ]
p p
Vθ [T ] Vθ [T 0 ]
since Vθ [T ] = Vθ [T 0 ]

⇒ Pθ {aT + bT 0 = 0} = 1 ∀ a, b ∈ <
A.Santhakumaran 170

Choose a = 1 and b = −1
⇒ Pθ {T = T 0 } = 1, then T and T 0 are the same. .˙. The UMVUE T is unique.
Theorem 5.3 If UMVUE’s Ti = ti (X), i = 1, 2 exist for real function τ1 (θ) and τ2 (θ)
of θ, then aT1 + bT2 is also UMVUE of aτ1 (θ) + bτ2 (θ).
Proof: Given T1 = t1 (X) is a UMVUE of τ1 (θ), i.e., Eθ [T1 V ] = 0 ∀ θ ∈ Ω and V ∈ U0 .
Again Eθ [T2 V ] = 0, ∀ θ ∈ Ω and V ∈ U0 .
Prove that Eθ {[(aT1 + bT2 )V ]} = 0 ∀ θ ∈ Ω.

Covθ [(aT1 + bT2 )V ] = Eθ [(aT1 V ) + (bT2 V )]

−Eθ [aT1 + bT2 ]Eθ [V ] since Eθ [V ] = 0

= Eθ [aT1 V ] + Eθ [bT2 V ]

= aCovθ [T1 , V ] + bCovθ [T2 , V ]

= a×0+b×0=0

Thus aT1 + bT2 is a UMVUE of aτ1 (θ) + bτ2 (θ).


Theorem 5.4 Let {Tn = tn (X)} be a sequence of UMVUE’s of τ (θ) and T = t(X) be
a statistic with Eθ [T 2 ] < ∞ and such that Eθ [Tn − T ]2 → 0 as n → ∞ ∀ θ ∈ Ω. Then
T = t(X) is also the UMVUE of τ (θ).
Proof: Given {Tn }∞
n=1 is a UMVUE of τ (θ), i.e., Eθ [Tn V ] = 0 ∀ n = 1, 2, 3, · · · ∀ θ

and Eθ [V ] = 0 ∀ θ. Prove that T is also an UMVUE of τ (θ), i.e., Eθ [T V ] = 0 ∀ θ and


Eθ [V ] = 0 ∀ θ.

Consider Eθ [T − τ (θ)] = Eθ [T − Tn + Tn − τ (θ)]

|Eθ [T − τ (θ)]| ≤ |Eθ [T − Tn ]| + |Eθ [Tn − τ (θ)]|

≤ Eθ |T − Tn | since Eθ |Tn − τ (θ)| ≥ 0


n o1 n o1
i.e., |Eθ [T − τ (θ)]| ≤ Eθ [T − Tn ]2 2
since |Eθ [T − Tn ]| ≤ Eθ [T − Tn ]2 2

Consider Covθ [T, V ] = Eθ [T V ] − 0

= Eθ [T V ] − Eθ [Tn V ]

Eθ [T V ] = Eθ [(T − Tn )V ]

Applying Cauchy Schwarz inequality to Eθ [(T − Tn )V ]


 1 n o1
|Eθ [(T − Tn )V ]| ≤ Eθ [V 2 ] 2
Eθ [T − Tn ]2 2
A.Santhakumaran 171

But Eθ [(T − Tn )V ] = Eθ [T V ]
 1 n o1
.. . |Eθ [T V ]| ≤ Eθ [V 2 ] 2
Eθ [T − Tn ]2 2

n o1
But Eθ [T − Tn ]2 2
≥ 0 and

Eθ [T − Tn ]2 → 0 as n → ∞

.. . Eθ [T V ] → 0 as n → ∞

i.e., Covθ [T, V ] = 0 as n → ∞ ∀ θ ∈ Ω.

Thus T = t(X) is a UMVUE of τ (θ).


Problem 5.1 if T1 = t1 (X) and T2 = t2 (X) are UMVUE of τ (θ), show that the
correlation coefficient between T1 and T2 is one.
Solution:Given Eθ [T1 ] = τ (θ) and Eθ [T2 ] = τ (θ) for θ ∈ Ω
Vθ [T1 ] = Vθ [T2 ] for θ ∈ Ω.
Consider a new estimator T = 21 [T1 + T2 ] which is also the unbiased estimator of τ (θ),
i.e.,

1 1
Eθ [T ] = Eθ [T1 ] + Eθ [T2 ]
2 2
1 1
= τ (θ) + τ (θ)
2 2
= τ (θ)

Vθ[T] = Vθ[(1/2)(T1 + T2)]
      = (1/4){Vθ[T1] + Vθ[T2] + 2Covθ[T1, T2]}
      = (1/4){Vθ[T1] + Vθ[T2] + 2ρ√(Vθ[T1] Vθ[T2])}
      = (1/4){2Vθ[T1] + 2ρVθ[T1]}
      = (1/2)Vθ[T1](1 + ρ)

where ρ is the correlation coefficient between T1 and T2 .


Since T1 is the UMVUE of τ (θ)

⇒ Vθ [T ] ≥ Vθ [T1 ]
A.Santhakumaran 172

1
Vθ [T1 ](1 + ρ) ≥ Vθ [T1 ]
2
(1 + ρ) ≥ 2

ρ ≥1

But ρ ≤ 1 ⇒ ρ = 1. Thus the correlation coefficient between the UMVUE’s T1 and T2 is


one.
Problem 5.2 Let X1 , X2 , · · · , Xn be a sample from a population with mean θ and finite
Pn
variance, and T be an estimate of θ of the form T (X1 , X2 , · · · , Xn ) = i=1 αi Xi . If T is
an unbiased estimate of θ that has minimum variance and T 0 is another linear unbiased
estimate of θ , show that
Covθ (T, T 0 ) = Vθ [T ]
Pn
Solution: Given T = i=1 αi Xi is the unbiased estimator of θ, Eθ [T ] = θ.
Also T′ is an unbiased estimator of θ, i.e., Eθ[T′] = θ.

Eθ[T] = θ
Eθ[T′] = θ
Eθ[T − T′] = 0
Eθ[T(T − T′)] = 0
Eθ[T² − T T′] = 0
Eθ[T²] − Eθ[T T′] = 0
Eθ[T T′] = Eθ[T²]
i.e., Covθ(T, T′) = Vθ[T]

Problem 5.3 Let T1 , T2 be two unbiased estimates having common variance ασ 2 (α > 1),
where σ 2 is the variance of the UMVUE. Show that the correlation coefficient between T1
2−α
and T2 is greater than or equal to α .

Solution: Given Eθ [T1 ] = τ (θ) and Vθ [T1 ] = ασ 2 .


Also Eθ [T2 ] = τ (θ) and Vθ [T2 ] = ασ 2 where α > 1
By assumption, Vθ[T1] = Vθ[T2] = ασ².
A.Santhakumaran 173

Consider an estimator 12 [T1 + T2 ]. It is also an unbiased estimator of τ (θ),


h i
1
i.e., Eθ [T1 ] = 2 Eθ [T1 ] + 12 Eθ [T2 ] = 21 τ (θ) + 12 τ (θ) = τ (θ)

1 1
Vθ [ (T1 + T2 )] = {Vθ [T1 ] + Vθ [T2 ] + 2Covθ (T1 , T2 )}
2 4
1 q 
= Vθ [T1 ] + Vθ [T2 ] + 2ρ Vθ [T1 ]Vθ [T2 ]
4
1
= [2Vθ [T1 ] + 2ρVθ [T2 ]]
4
1
= [Vθ [T1 ] + ρVθ [T1 ]
2

where ρ is the correlation coefficient between T1 and T2 . Let T be the UMVUE of τ (θ).

1
Vθ { [T1 + T2 ]} ≥ Vθ [T ]
2
1
i.e., Vθ [T1 ](1 + ρ) ≥ Vθ [T ]
2
1 2
ασ (1 + ρ) ≥ σ2
2
α(1 + ρ) ≥ 2
2
(1 + ρ) ≥
α
2−α
ρ ≥
α

5.4 Sufficient statistic approach

Rao - Blackwell Theorem helps to search for an UMVUE T = t(X) of a parametric


function τ (θ). Let δ(T ) be another statistic and a function of the sufficient statistic T =
t(X) which is an unbiased estimator for the parametric function τ (θ), i.e., Eθ [δ(T )] = τ (θ).
Rao - Blackwell Theorem improves on δ(T ) by conditioning on the sufficient statistic T = t.
That is, computing E[δ(T ) | T = t] = g(t) so that Eθ [g(T )] = τ (θ) with smaller variance
than that of δ(T ). Also it states that the conditioning on sufficient statistic T = t(X) is
made irrespective of any unbiased estimator δ(T ) of τ (θ).
Theorem 5.5 Let {Pθ , θ ∈ Ω} be a family of probability distributions and δ(T ) be
any statistic in U where U is the non-empty class of all unbiased estimators of τ (θ)
with Eθ [δ 2 (T )] < ∞. Let T = t(X) be a sufficient statistic for {Pθ , θ ∈ Ω}. Then the
conditional expectation E[δ(T ) | T = t] = g(t) is independent of θ and g(T ) is an unbiased
A.Santhakumaran 174

estimator of τ (θ). Also Eθ [g(T ) − τ (θ)]2 ≤ Eθ [δ(T ) − τ (θ)]2 ∀ θ ∈ Ω.


Proof: Given that δ(T ) is a unbiased estimator of τ (θ), ∀ θ ∈ Ω and δ(T ) is a function
of a sufficient statistic T . E[δ(T ) | T = t] = g(t) and the statistic g(T ) is an unbiased
estimator of τ (θ), since Eθ [E {δ(T ) | T }] = Eθ [δ(T )] = τ (θ) ∀ θ ∈ Ω.
Now prove that Eθ [g(T ) − τ (θ)]2 ≤ Eθ [δ(T ) − τ (θ)]2 ∀ θ ∈ Ω.
It is enough to prove that Eθ [g 2 (T )] ≤ Eθ [δ 2 (T )] ∀ θ ∈ Ω.

One knows that E[δ(T) | T] = E[δ(T) · 1 | T].

Applying the Cauchy - Schwarz inequality to E[δ(T) · 1 | T],

{E[δ(T) · 1 | T]}² ≤ E[δ²(T) | T] E[1² | T]
i.e., {E[δ(T) | T]}² ≤ E[δ²(T) | T]
i.e., g²(t) ≤ E[δ²(T) | T]
i.e., Eθ[g²(T)] ≤ Eθ{E[δ²(T) | T]} = Eθ[δ²(T)]


⇒ Eθ [g(T ) − τ (θ)]2 ≤ Eθ [δ(T ) − τ (θ)]2 ∀ θ ∈ Ω

The inequality becomes equality iff

Eθ [δ 2 (T )] = Eθ [g 2 (T )]
2
i.e., Eθ E[δ 2 (T ) | T ]

= Eθ {E[δ(T ) | T ]}

since E[E[X 2 | Y ]] = E[X 2 ] and g(t) = E[δ(T ) | T ]


h i
2
Eθ E[δ 2 (T ) | T ] − {E[δ(T ) | T ]} = 0

Eθ [V ar[δ(T ) | T ]] = 0
2
V ar[δ(T ) | T ] = 0 iff E[δ 2 (T ) | T ] = {E[δ(T ) | T ]}

If this is the case , then E[δ(T ) | T = t] = g(t) and the statistic g(T ) is a function of T.
Remark 5.1 The Rao - Blackwell Theorem has the following limitations.

(i) If the unbiased estimator T = t(X) is already a function of only one sufficient statis-
tic, then the derived statistic is identical to T = t(X). In this case there is no
improvement in the variance of the statistic T = t(X).

(ii) If more than one sufficient statistic exists, then one can improve the variance of
the unbiased estimator by using minimal sufficient statistics, since the set of jointly
A.Santhakumaran 175

sufficient statistic is an arbitrary set. To add the concept of completeness to derive


the statistic which is unique and may identify the UMVUE. This leads to the Lehmann - Scheffé Theorem.

The Lehmann - Scheffé Theorem states that if a complete sufficient statistic exists, then
the UMVUE of τ (θ) is unique. But it does not mean that only the complete sufficient
statistic has UMVUE. Even if a complete sufficient statistic does not exist, an UMVUE
may still exist.
Theorem 5.6 If T = t(X) is a complete sufficient statistic and there exists an unbiased
estimator δ(T ) of τ (θ), then there exists a unique UMVUE of τ (θ) which is given by
E[δ(T ) | T = t] = g(t).
Proof: Rao - Blackwell Theorem gives E[δ(T ) | T = t] = g(t) and g(T ) is the UMVUE
of τ (θ). It is only to prove that g(T ) is unique. If δ1 (T ) ∈ U and δ2 (T ) ∈ U , then
Eθ [E[δ1 (T ) | T ]] = τ (θ) and Eθ [E[δ2 (T ) | T ]] = τ (θ) ∀ θ ∈ Ω.

Eθ [E[δ1 (T ) | T ] − E[δ2 (T ) | T ]] = 0 ∀ θ ∈ Ω

Since T = t(X) is a complete sufficient statistic

⇒ E[δ1 (T ) | T ] − E[δ2 (T ) | T ] = 0

E[δ1 (T ) | T ] = E[δ2 (T ) | T ]

.˙. The UMVUE g(T ) is unique, if the sufficient statistic T = t(X) is complete.
From the Theorems 5.5 and 5.6, the UMVUE of τ (θ) is obtained by solving a set
of equations and conditioning on the sufficient statistic.

Solving a set of equations of the sufficient statistic

Let Pθ , θ ∈ Ω be a distribution of random variable X. If T is a complete sufficient


statistic, then the UMVUE g(T ) of any parametric function τ (θ) is uniquely determined
by solving the set of equations Eθ [g(T )] = τ (θ) ∀ θ ∈ Ω.
A.Santhakumaran 176

Conditioning on the sufficient statistic

If a random variable X has a distribution Pθ , θ ∈ Ω and δ(T ) is any unbiased


estimator of τ (θ) and T = t(X) is complete sufficient statistic, then the UMVUE g(T ) can
be obtained by conditional expectation of δ(T ) given T = t, i.e., g(t) = E[δ(T ) | T = t].
Problem 5.4 Obtain the UMVUE of θ + 2 for the pmf of the Poisson distribution

 e−θ θx

x = 0, 1, 2, · · ·
x!
p(x | θ) =
 0

otherwise

by taking a sample of size n.


Solution: Let T = Σᵢ₌₁ⁿ Xi, then T ∼ P(nθ), i.e.,
p(t | θ) = e^{−nθ}(nθ)^t / t!   t = 0, 1, 2, · · ·
         = 0                    otherwise
p(t | θ) = (1/t!) e^{−nθ} e^{t log nθ}
         = c(θ) e^{Q(θ)t(x)} h(x)
where c(θ) = e^{−nθ}, Q(θ) = log nθ, t(x) = Σᵢ₌₁ⁿ xi, h(x) = 1/t!.
.˙. The statistic is complete and sufficient. Thus the UMVUE g(T ) of θ + 2 is

Eθ[g(T)] = θ + 2
Σ_{t=0}^{∞} g(t) e^{−nθ} (nθ)^t / t! = θ + 2
Σ_{t=0}^{∞} g(t) n^t θ^t / t! = (θ + 2) e^{nθ}
                              = (θ + 2) Σ_{t=0}^{∞} (nθ)^t / t!
                              = Σ_{t=0}^{∞} n^t θ^{t+1} / t! + 2 Σ_{t=0}^{∞} n^t θ^t / t!
Equating the coefficients of θ^t on both sides,
g(t) n^t / t! = n^{t−1} / (t − 1)! + 2 n^t / t!
g(t) = t/n + 2
     = Σ xi / n + 2
     = x̄ + 2

Thus X̄ + 2 is the UMVUE of θ + 2.
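A quick Monte Carlo check (an added sketch, not from the text; θ and n are chosen only for illustration): averaging X̄ + 2 over many Poisson(θ) samples should reproduce θ + 2, confirming the unbiasedness just derived.

```python
# Illustrative sanity check that X-bar + 2 is unbiased for theta + 2.
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 3.0, 25, 200_000
samples = rng.poisson(theta, size=(reps, n))
estimates = samples.mean(axis=1) + 2          # X-bar + 2 for each replication
print(estimates.mean(), theta + 2)            # the two values should be close to 5.0
```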


Problem 5.5 Let Xi (i = 1 to n) be a sample from Poisson distribution with parameter
θ. Obtain the UMVUE of θr−1 e−rθ , r = 1, 2, 3, · · ·.
Pn
Solution: As in the problem 5.1, T = i=1 Xi is complete and sufficient. Therefore there
exists a unique UMVUE of τ (θ) = θr−1 e−rθ , r = 1, 2, · · · .

1
g(t)e−nθ (nθ)t
X
= θr−1 enθ−rθ
t=0
t!

X (n − r)t t+r−1
= θ
t=0
t!
Equating the coefficients of θ^t on both sides,
nt (n − r)t−r+1
g(t) =
t! (t − r + 1)!

(n − r)t−r+1 t!
g(t) = t
, r = 1, 2, · · · and n > r
n (t − r + 1)!
Thus the UMVUE of θr−1 e−rθ is

(n − r)T −r+1 T!
T
, r = 1, 2, · · · and n > r.
n (T − r + 1)!
Remark 5.2 When r = 1, g(t) = ((n − 1)/n)^t = (1 − 1/n)^t, n = 2, 3, · · ·; then (1 − 1/n)^T is the unbiased estimator of e^{−θ} where T = Σ Xi.
When r = 2, (T/(n − 2))(1 − 2/n)^T, n = 3, 4, · · · is the UMVUE of θ e^{−2θ} where T = Σ Xi.
Problem 5.6 Obtain the UMVUE of θr + (r − 1)θ, r = 1, 2, · · · for the random sample
of size n from Poisson distribution with parameter θ.
Pn
Solution: As in the problem 5.1, T = i=1 Xi is complete and sufficient. There exists
a UMVUE of τ (θ) = θr + (r − 1)θ, r = 1, 2, · · ·

(nθ)t
g(t)e−nθ
X
Eθ [g(T )] = = θr + (r − 1)θ
t=0
t!

X nt θ t
g(t) = [θr + (r − 1)θ]enθ
t=0
t!
A.Santhakumaran 178

= θr enθ + (r − 1)θenθ
∞ ∞
r
X nt θ t X nt θ t
= θ + (r − 1)θ
t=0
t! t=0
t!
∞ ∞
X 1 X 1
= nt θt+r + (r − 1) nt θt+1
t=0
t! t=0
t!
Equating the coefficients of θ^t on both sides,
nt nt−r nt−1
g(t) = + (r − 1)
t! (t − r)! (t − 1)!
1 t! 1 (r − 1)
= + t!
nr (t − r)! n (t − 1)!
t(t − 1) · · · · · · (t − r + 1) (r − 1)
= + t
nr n
The UMVUE of θr + (r − 1)θ is
T (T − 1) · · · · · · (T − r + 1) (r − 1)
g(T ) = + T, r = 1, 2, · · ·
nr n
Remark 5.3 When r = 1, X̄ is the UMVUE of θ.
X̄(nX̄−1)
When r = 2, n + X̄ is the UMVUE of θ2 + θ.
Problem 5.7 Obtain UMVUE of θ(1 − θ) using a random sample of size n drawn from
a Bernoulli population with parameter θ.
Solution: 
 θ x (1 − θ)1−x

x = 0, 1
Given pθ (x) =
 0

otherwise
n
X
Let T = Xi , then T ∼ b(n, θ)
i=1
i.e., pθ (x) = cnt θt (1 − θ)n−t t = 0, 1, 2, · · · , n
t
θ

= cnt (1 − θ)n
1−θ
θ
= (1 − θ)n et log( 1−θ ) cnt

= c(θ)eQ(θ)t(x) h(x)
θ X  
n
where c(θ) = (1 − θ) , Q(θ) = log , t(x) = xi and h(x) = cnt .
1−θ
P
It is an one parameter exponentially family. .˙. The statistic T = Xi is complete and
sufficient. The UMVUE of θ(1 − θ) is

Eθ [g(T )] = θ(1 − θ)
A.Santhakumaran 179


X
g(t)cnt θt (1 − θ)n−t = θ(1 − θ)
t=0
∞ t
θ

= θ(1 − θ)(1 − θ)−n
X
g(t)cnt
t=0
1−θ
θ
One can take ρ = , then
1−θ
θ 1
1+ρ = 1+ =
1−θ 1−θ
1
Thus 1 − θ =
1+ρ
ρ
⇒ θ =
1+ρ

X
g(t)ρt cnt = ρ(1 + ρ)n−2
t=0
= ρ[1 + c1n−2 ρ + · · · + ρn−2 ]

= ρ + c1n−2 ρ2 + · · · + ρn−1
n−1
!
X n-2
= t-1 ρt
t=1
g(t)cnt = cn−2
t
(n − 2)! t!(n − t)!
g(t) =
(t − 1)!(n − t − 1)! n!
(n − 2)!t(t − 1)!(n − t)(n − t − 1)!
=
(t − 1)!(n − t − 1)!n(n − 1)(n − 2)!
t(n − t)
= if n = 2, 3, · · ·
n(n − 1)
i.e., T(n − T)/(n(n − 1)) is the UMVUE of θ(1 − θ).
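As an added illustration (assumption: numpy's default generator, with θ and n chosen arbitrarily), the following sketch checks numerically that T(n − T)/(n(n − 1)) is unbiased for θ(1 − θ).

```python
# Illustrative check: the simulated mean of T(n - T)/(n(n - 1)) is close to theta(1 - theta).
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 0.3, 20, 200_000
T = rng.binomial(n, theta, size=reps)         # sufficient statistic for each Bernoulli sample
est = T * (n - T) / (n * (n - 1))
print(est.mean(), theta * (1 - theta))        # both approximately 0.21
```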
1
Problem 5.8 Obtain the UMVUE of p of the pmf

 pq x

x = 0, 1, · · ·
pp (x) =
 0

otherwise

based on a sample of size n.


Solution: If xi denotes the number of trials after the (i − 1)th success up to but not
including the ith success, the probability that Xi = x is pq x for x = 0, 1, · · · and i =
A.Santhakumaran 180

1, 2, · · · , n. The joint pmf of X1 , X2 , · · · , Xn is


pp(x1, x2, · · · , xn) = p^n q^{Σ xi}   xi = 0, 1, · · · ; i = 1, 2, · · · , n
                      = 0              otherwise
                      = p^n e^{log(1−p) Σ xi}

= c(p)eQ(p)t(x) h(x)
X
where c(p) = pn , Q(p) = log(1 − p), t(x) = xi , h(x) = 1.

This is an one parameter exponentially family which is complete and sufficient. Thus
there exist an unique UMVUE of p1 . It is given by Ep [g(T )] = p1 .
Pn
The statistic T = i=1 Xi is the sum of n iid geometric variables with same
parameter p has the Negative Binomial distribution. The pmf of T is
 n+t-1
 !

 n-1 pn q t t = 0, 1, · · ·
pp (t) = P {T = t} =


 0 otherwise

∞ n+t-1
!
X 1
g(t) n-1 pn q t =
t=0
p
∞ n+t-1
!
q t = (1 − q)−(n+1)
X
g(t) t
t=0
∞ n+t
!
X
= t qt
t=0
Equating the coefficients of q^t on both sides,
g(t) C(n + t − 1, t) = C(n + t, t)

(n + t)! t!(n − 1)!


g(t) =
t!n! (n + t − 1)!
(n + t)(n + t − 1)!(n − 1)! t+n
= =
n(n − 1)!(n + t − 1)! n
Thus (T + n)/n is the UMVUE of 1/p.
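The claim is easy to illustrate by simulation (an added sketch, not part of the text): numpy's negative_binomial draws the number of failures before the n-th success, which is exactly the distribution of T here.

```python
# Illustrative check that (T + n)/n is unbiased for 1/p.
import numpy as np

rng = np.random.default_rng(2)
p, n, reps = 0.4, 10, 200_000
T = rng.negative_binomial(n, p, size=reps)    # failures before the n-th success
print(((T + n) / n).mean(), 1 / p)            # both approximately 2.5
```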
1
Problem 5.9 For a single observation x of X, find the UMVUE of p of the pmf

 pq x

x = 0, 1, · · ·
pp (x) =
 0

otherwise
A.Santhakumaran 181

Solution:The pmf of the random variable is written as

pp (x) = pex log(1−p) = c(p)eQ(p)t(x) h(x)

where c(p) = p, Q(p) = log(1 − p), t(x) = x, h(x) = 1

It is an one parameter exponentially family. The statistic T = X is complete and sufficient.


1
The UMVUE of p is given by

1
Ep [g(X)] =
p

X 1
g(x)pq x =
x=0
p
∞ ∞
g(x)q x = (1 − q)−2 =
X X
(x + 1)q x
x=0 x=0
→ g(x) = x + 1

1
Thus the UMVUE of p is X + 1.
Problem 5.10 Let X1 , X2 , · · · Xn be iid N(θ, 1). Prove that E[X1 | Y ] = x̄ where
Pn
Y = i=1 Xi .

Solution: The sample mean X̄ ∼ N (θ, n1 ) and Eθ [X1 ] = θ, ∀ θ ∈ Ω.


The pdf of the sample size n is
n
Y
pθ (x1 , x2 , · · · , xn ) = p(xi | θ)
i=1
n
1
 
1
P 2

= e− 2 (xi −θ)

n
1
 P 2 nθ2
1
= √ e− 2 xi − 2 +nx̄θ

= c(θ)eQ(θ)t(x) h(x)
n
1
2

− nθ2 1
P 2
e− 2 xi , Q(θ) = θ and t(x) =
X
where c(θ) = e , h(x) = √ xi .

It is an one parameter exponential family. T = X̄ is complete and sufficient. The UMVUE


of θ is given by g(T ) and g(t) = E[X1 | Y ] where δ(T ) = X1 is an unbiased estimator of
Pn
θ. The conditional expectation X1 on Y = i=1 Xi is a regression line, i.e.,

σX1
E [X1 | Y ] = Eθ [X1 ] + bX1 Y (Y − Eθ [Y ]) where bX1 Y = ρ
σY
A.Santhakumaran 182

Pn
and ρ is the correlation coefficient between X1 and Y = i=1 Xi

Cov[X1 , Y ]
ρ =
σX1 σY
X √
Y = Xi ∼ N (nθ, n) σY = n, σX1 = 1

Covθ [X1 , Y ] = Eθ [X1 Y ] − Eθ [X1 ]Eθ [Y ]


n
" #
X
Eθ [X1 Y ] = Eθ X1 Xi
i=1
= Eθ [X12 ] + Eθ [X1 X2 + · · · + Xn X1 ]

= Eθ [X12 ] + Eθ [X1 ]Eθ [X2 ] + · · · + Eθ [X1 ]Eθ [Xn ]

= 1 + θ2 + (n − 1)θ2 where Vθ [X1 ] = Eθ [X12 ] − θ2

= nθ2 − θ2 + 1 + θ2

= nθ2 + 1

Covθ [X1 , Y ] = nθ2 + 1 − θnθ = 1


1 1
ρ = √ and bX1 Y =
n n
1
E[X1 | Y = y] = Eθ [X1 ] + [y − nθ]
n
y
= θ + − θ = x̄
n
E[X1 | Y = y] = x̄ and X̄ is the UMVUE of θ.
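The regression argument can be seen numerically as well (an added illustration with arbitrary θ and n): the regression coefficient of X1 on Y = Σ Xi is approximately 1/n, so the conditional mean E[X1 | Y] is approximately Y/n = X̄.

```python
# Illustrative check of the regression of X1 on Y = sum(X_i) for N(theta, 1) data.
import numpy as np

rng = np.random.default_rng(3)
theta, n, reps = 2.0, 5, 500_000
x = rng.normal(theta, 1, size=(reps, n))
y = x.sum(axis=1)
slope = np.cov(x[:, 0], y)[0, 1] / y.var()    # regression coefficient of X1 on Y
print(slope, 1 / n)                           # both approximately 0.2
```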

Problem 5.11 Let X1 , X2 , · · · Xn be iid random sample with pdf



 1

0<x<θ
θ
pθ (x) =
 0

otherwise

Find the UMVUE of θ.


Solution: Let T = max1≤i≤n {Xi }

The pdf of T is
Z t n−1
n! 1 1
pθ (t) = dx 0<t<θ
1!(n − 1)! 0 θ θ

 nn tn−1

0<t<θ
θ
pθ (t) =
 0

otherwise
A.Santhakumaran 183

The joint density of X1 , X2 , · · · , Xn is


 n
1
pθ (x1 , x2 , · · · , xn ) =
θ

The conditional density of


1
pθ (x1 , x2 · · · xn ) θn 1
= n n−1 =
pθ (t) θn t ntn−1

It is independent of θ. T = max1≤i≤n {Xi } is a sufficient statistic.


Z θ
n n−1
Eθ [g(T )] = g(t) t dt = 0
0 θn
Z θ
g(t)tn−1 dt = 0
0
Differentiate this with respect to θ
hR i
θ n n−1
∂ 0 g(t) θn t dt Z θ
= 0dt + g(θ)θn−1 × 1 − 0 = 0
∂θ 0
⇒ g(θ) = 0 ∀ θ

i.e., g(t) = 0 ∀ t and 0 < t < θ

Thus T = t(X) is a complete and sufficient statistic. δ(T ) = 2X1 is an unbiased estimator

of θ, since Eθ [X1 ] = 0 x1 1θ dx1 = θ
2. The UMVUE of θ is given by g(T ) and g(t) =
E[2X1 | T = t].
When x1 = t the conditional pmf of X1 given T = t is p(x1 | t) = n1 .
When 0 < x1 < t the conditional density of X1 given T = t is
1 (n−1) n−2
pθ (x1 , t) θ θn−1 t
pθ (x1 | t) = = n n−1 0 < x1 < t
pθ (t) nt
θ
 n−1 1

0 < x1 < t
n t
=
 0

otherwise
Z t
1
E[2X1 | T = t] = 2x1 pθ (x1 | t)dx1 + 2t
0 n
n−11 t Z
2t
= 2 x1 dx1 +
n t 0 n
n−11t 2 2t
= 2 +
n t 2 n
1
= (1 + )t
n
A.Santhakumaran 184

Thus the UMVUE of θ is ((n + 1)/n) T where T = max1≤i≤n {Xi}.
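A small simulation (added for illustration, with θ and n chosen arbitrarily) shows the correction at work: max Xi underestimates θ on average, while ((n + 1)/n) max Xi is unbiased.

```python
# Illustrative comparison of the biased maximum and its unbiased correction for U(0, theta).
import numpy as np

rng = np.random.default_rng(4)
theta, n, reps = 5.0, 8, 200_000
x = rng.uniform(0, theta, size=(reps, n))
t = x.max(axis=1)
print(t.mean(), ((n + 1) / n * t).mean(), theta)   # biased vs. corrected vs. true value
```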

Problem 5.12 Let X1 , X2 , · · · , Xn be a random sample drawn from a distribution with


probability density function

 e−(x−θ)

θ<x<∞
pθ (x) =
 0

otherwise

Derive the U M V U E of (i) θ and (ii) eθ .


Solution: Define T = min1≤i≤n {Xi }

The pdf of T is
Z ∞ n−1
n!
pθ (t) = e−(t−θ) e −(x−θ)
dx
1!(n − 1)! θ

 ne−n(t−θ)

θ<t<∞
=
 0

otherwise
Eθ [g(T )] = 0
Z ∞
g(t)ne−n(t−θ) dt = 0
θ
Z ∞
g(t)e−n(t−θ) dt = 0
θ

One can take z = t − θ, then dz = dt and when t = θ → z = 0, when t = ∞ → z = ∞


Z ∞
g(z + θ)e−nz dz = 0
0
Z ∞
It is same as f (t)e−st dt = 0
0

By the uniqueness property of the Laplace Transform

g(z + θ) = 0 ∀ 0 < z < ∞

i.e., g(t) = 0 ∀ 0 < t − θ < ∞

= 0∀ θ<t<∞

Thus T is complete. Also T = min1≤i≤n {Xi } is sufficient.


Z ∞
Eθ [X1 ] = x1 e−(x1 −θ) dx1
θ
Z ∞
= (z + θ)e−z dz where z = x1 − θ
0
A.Santhakumaran 185

Z ∞ Z ∞
= e−z z 2−1 dz + θ e−z z 1−1 dz
0 0
= Γ2 + θΓ1

= 1+θ

Eθ [X1 − 1] = θ

If one can take δ(T ) = X1 − 1, then the UMVUE of θ is given by g(T ) and g(t) =
E[(X1 − 1) | T = t].
When x1 = t, the conditional pmf of X1 given T = t is pθ (x1 | t) = n1 .
When t < x1 < ∞, the conditional density of X1 given T = t is
e−(x1 −θ) (n − 1)e−(n−1)(t−θ)
pθ (x1 | t) =
ne−n(t−θ)
n − 1 −(x1 −t)
= e
n Z ∞
(n − 1) 1
E[(X1 − 1) | T = t] = (x1 − 1)e−(x1 −t) dx1 + (t − 1)
n n
Z t∞ Z ∞
n−1 n−1
= x1 e−(x1 −t) dx1 − e−(x1 −t) dx1
n t n t
1
+ (t − 1)
n Z
n−1 ∞ n − 1 ∞ −z
Z
= (z + t)e−z dz − e dz
n 0 n 0
1
+ (t − 1) where z = x1 − t
n Z
n − 1 ∞ −z 2−1
Z ∞
n−1
= e z dz + t e−z z 1−1 dz
n 0 n 0
n − 1 ∞ −z 1−1
Z
1
− e z dz + (t − 1)
n 0 n
n−1 n−1 n−1 1
= Γ2 + t− Γ1 + (t − 1)
n n n n
n−1 1
= t + (t − 1)
n n
1
= t−
n
1
The UMVUE of θ is T − 1
n and the UMVUE of eθ is e{T − n }.
Problem 5.13 Let X1 and X2 be a random sample drawn from a population with pdf

 1 e− xθ

0<x<∞
θ
pθ (x) =
 0

otherwise

Obtain the UMVUE of θ.


A.Santhakumaran 186

Solution: The joint pdf of X1 , X2 is

1 − 1 (x1 +x2 )
pθ (x1 , x2 ) = e θ
θ2
1 −1t
= e θ
θ
= c(θ)eQ(θ)t(x) h(x)
2
1 1 X
where c(θ) = , Q(θ) = − , t(x) = xi , h(x) = 1
θ2 θ i=1

It is an one parameter exponential family. It is complete and sufficient. Let t = x1 + x2


∂x1 ∂x1
and t1 = x2 , then x1 = t − t1 and x2 = t1 . ∂t = 1, ∂t1 = −1, ∂x ∂x2
∂t = 0, ∂t1 = 1
2


∂x1 ∂x1


∂t ∂t1
J =


∂x2 ∂x2
∂t ∂t1


1 −1
=



0 1
The joint density of T and T1 is

 12 e− θ1 t

0 < t1 < t < ∞
θ
p(t, t1 | θ) =
 0

otherwise
0 < t1 < t or

 1 e− θ1 t

t1 < t < ∞
= θ2

 0 otherwise
The pdf of T is
Z t
pθ (t) = p(t, t1 | θ)dt1
0
1 t −1t
Z
= e θ dt1
θ2 0
1 −1t
= e θ t 0<t<∞
θ2

1
1
e− θ t t2−1 0<t<∞


θ2 Γ2
=
 0

otherwise
The pdf of T1 is
Z ∞
1 −1t
pθ (t1 ) = 2
e θ dt
t1 θ
− θ1 t ∞
" #
1 e
=
θ − 1θ t1
A.Santhakumaran 187

1 − 1 t1
= e θ 0 < t1 < ∞
θ
The conditional density of T1 given T = t is

 1

0 < t1 < t
t
p(t1 | t) =
 0

otherwise
Eθ[X2] = θ. .˙. δ(T) = X2 = T1 is an unbiased estimator of θ. Thus the UMVUE of θ is


Z t
1
E [T1 | T ] = t1 dt1
0 t
" #t
1 t21
=
t 2 0
t x1 + x2
= = = x̄
2 2
The UMVUE of θ is X̄.

Problem 5.14 The random variables X and Y have the joint pdf

 22 e− θ1 (x+y)

0<x<y<∞
θ
p(x, y | θ) =
 0

otherwise

Show that

(i) Eθ [Y | X = x] = x + θ

(ii) Eθ [Y ] = Eθ [X + θ] and

(iii) Vθ [X + θ] ≤ Vθ [Y ]

Solution: The marginal density of X is


Z ∞
2 x+y
p(x | θ) = e− θ dy
θ2
 x

 22 e− 2x
 θ 0<x<∞
θ
=
 0

otherwise

The marginal density of Y is


Z y
2 x+y
pθ (y) = e− θ dy
θ2 0
A.Santhakumaran 188


y
 2 e− θ − 2 e− θ2 y

0<y<∞
θ θ
=
 0

otherwise
The conditional pdf of Y given X = x is
2 − x+y
θ2
e θ
pθ (y | x) =
2 − θ2 x
e
θ
y
 1 e xθ e− θ

x<y<∞
θ
=
 0

otherwise
Z ∞
Eθ [Y | X = x] = ypθ (y | x)dy
x
x Z ∞
eθ y
= ye− θ dy
θ
Z x∞
x y
= e θ e− θ dy + x
x
= x+θ
Z ∞ ∞
2 2
Z
− yθ 2
Eθ [Y ] = ye dy − ye− θ y dy
θ 0 θ 0
2 Γ2 2 Γ2
= −
θ ( 1θ )2 θ ( 2θ )2
3
= θ
2
7θ2 5
Eθ [Y 2 ] = , Vθ (Y ) = θ2
2
Z ∞
4
2 −2x θ
Eθ [X] = 2
e θ dx =
0 θ 2
θ 3
Eθ [X + θ] = + θ = θ = Eθ [Y ]
2 2
θ2
Vθ [X + θ] = Vθ [X] =
4
Thus Vθ [X + θ] ≤ Vθ [Y ].

Problem 5.15 Let X1 , X2 , · · · , Xn be a sample from the pmf



 1

x = 1, 2, · · · , N and N ∈ I+
N
p(x | N ) =
 0

otherwise
Obtain the UMVUE of N .
Solution: Define X(n) = max1≤i≤n {Xi } , then the pmf is

PN {X(n) ≤ x} = PN {X1 ≤ x1 , X2 ≤ x2 , · · · , Xn ≤ xn }
A.Santhakumaran 189

= PN {X1 ≤ x1 } · · · PN {Xn ≤ xn }
 n
x x x
= ··· =
N N N
x−1 n
 
PN {X(n) ≤ x − 1} =
N
PN {X(n) = x} = PN {X(n) ≤ x} − PN {X(n) ≤ x − 1}
n n−1
x x−1
 
= −
N N
N
" n n−1 #
t t−1
X 
EN [g(T )] = g(t) − =0
t=1
N N
g(t) = 0 ∀ t = 1, 2, · · · , N

When N = 1, g(1)[1 − 0] = 0 ⇒ g(1) = 0


 n "  n−1 #
1 1

When N = 2, g(1) − 0 + g(2) 1 − = 0
2 2
1
 
g(1) = 0 ⇒ g(2) 1 − = 0 ⇒ g(2) = 0 and so on
2n−1
g(t) = 0 ∀ t = 1, 2, · · · , N

Thus the statistic X(n) is complete

Consider PN {X1 = x1 | X(n) = x}


PN {X1 = x1 ∩ X(n) = x}
=
PN {X(n) = x}
if x1 = 1, 2, · · · , (x − 1) and x1 6= x
x n−1
(N ) − ( x−1
N )
n−1
1
= x
 n x−1 n
×
N −( N ) N
xn−1 − (x − 1)n−1
=
xn − (x − 1)n
if x1 = 1, 2, · · · , (x − 1) and x1 6= x
PN {X1 = x1 ∩ X(n) = x}
PN {X1 = x1 | X(n) = x} = if x1 = x
PN {X(n) = x}
x n−1

N 1
= x n x−1 n × N
(N ) − ( N )
xn−1
= if x1 = x
xn − (x − 1)n
Thus X(n) is a sufficient statistic.
A.Santhakumaran 190

N
X 1
EN [X1 ] = x1
x1 =1
N
1 N (N + 1) N +1
= =
N 2 2
EN [2X1 ] = N +1

EN [2X1 − 1] = N

.. . δ(T ) = 2X1 − 1 is an unbiased estimator of N.

The UMVUE of N is given by

E[(2X1 − 1) | X(n) = x]
x−1
X
= (2x1 − 1)PN {X1 = x1 | X(n) = x}
x1 =1
+(2x − 1)PN {X1 = x1 | X(n) = x}
x−1
xn−1 − (x − 1)n−1 X
= (2x1 − 1)
xn − (x − 1)n x =1
1

xn−1
+ n (2x − 1)
x − (x − 1)n
x−1
xn−1 X
= (2x1 − 1)
xn − (x − 1)n x =1
1

xn−1
+ n (2x − 1)
x − (x − 1)n
x−1
(x − 1)n−1 X
− (2x1 − 1)
xn − (x − 1)n x =1
1

xn−1
= [1 + 3 + 5 + · · · + (2x − 1)]
xn − (x − 1)n
(x − 1)n−1
− n [1 + 3 + · · · + (2x − 3)]
x − (x − 1)n

1 + 3 + · · · + (2x − 3) + (2x − 1) = 1 + 2 + · · · + (2x − 1) + 2x

−(2 + 4 + · · · + 2x)
2x(2x + 1) x(x + 1)
= −2×
2 2
= x(2x + 1) − x(x + 1) = x2

1 + 3 + · · · + (2x − 3) = 1 + 2 + · · · + (2x − 2) − [2 + 4 + · · · + (2x − 2)]


(2x − 2)(2x − 1) 2(x − 1)x
= −
2 2
= (2x − 1)(x − 1) − x(x − 1) = (x − 1)2
A.Santhakumaran 191

h i xn−1 2 (x − 1)n−1
E 2X1 − 1 | X(n) = x = x − (x − 1)2
xn − (x − 1)n xn − (x − 1)n
xn+1 (x − 1)n+1
= −
xn − (x − 1)n xn − (x − 1)n
xn+1 − (x − 1)n+1
=
xn − (x − 1)n
X n+1 −(X−1)n+1
Thus the UMVUE of N is X n −(X−1)n .

Remark 5.4 In Chapter 4 , Example 4.15 is not complete, but it is bounded complete.
The class of unbiased estimators of zero is

U0 = {g(X) | c ∈ <}

where 
 c(−1)x−1

if x = 1, 2
g(x) =
 0

x = 3, 4, · · · , N ; N = 2, 3, · · ·
By Theorem 5.1, CovN [δ(T ), g(X)] = 0 for N = 2, 3, · · · implies that δ(T ) is a UMVUE
of N where T = t(X). That is

EN [δ(t(X))g(X)] = 0 N = 2, 3, · · · , ∀ c ∈ <
N
X 1
δ(t(x))g(x) = 0 N = 2, 3, · · · , ∀ c ∈ <
x=1
N
N
X
⇒ δ(t(x))g(x) = 0 N = 2, 3, · · · , ∀ c ∈ <
x=1
i.e., δ(t(1))c − δ(t(2))c = 0 ∀ c ∈ <

If one can take c = 1 , then δ(t(1)) = δ(t(2)).

.˙. Any estimator δ(T ) such that δ(t(1)) = δ(t(2)) is a UMVUE of N , provided
EN [δ 2 (T )] < ∞, for N = 2, 3, · · · . Thus a family of distributions is bounded complete,
then there is a class of UMVUE’s.
Problem 5.16 Let X1 , X2 , · · · , Xn be a random sample of size n from a distribution
with pdf 
 1 e− xθ

0 < x < ∞, θ > 0
θ
pθ (x) =
 0

otherwise
Obtain the UMVUE of Pθ {X ≥ 2}.
A.Santhakumaran 192

Solution:The joint pdf of the sample size n is


Pn
1 − i=1 xi
p(x1 , x2 , · · · , xn | θ) = e θ
θn
= c(θ)eQ(θ)t(x) h(x)
Pn
It is an one parameter exponential family. The statistic T = i=1 Xi is complete and
sufficient.

Pθ {X ≥ 2} = 1 − Pθ {X < 2}
Z 2
1 −x
= 1− e θ dx
0 θ
2
= e− θ
Z ∞
1 − x 2−1
Eθ [X1 ] = e θ x1 dx1 = θ
0 θ
n
X
Let T = Xi , thenT ∼ G(n, θ)
i=1

1 − θ1 t n−1
θn Γn e t t>0


pθ (t) =
 0

otherwise
n
X
Let y = xi , then
i=2

The joint probability density of (X1 , Y ) is

pθ (x1 , y) = pθ (x1 )pθ (y)


1 − 1 x1 1 1
= e θ n−1
e− θ y y n−2
θ θ Γ(n − 1)
1 − θ1
Pn
x n−2
= e i=1 i y
θn Γ(n − 1)
1 1
= n
e− θ t [t − x1 ]n−2 where y = t − x1
θ Γ(n − 1)

− θ1 t
1
θn Γ(n−1) e [t − x1 ]n−2 0 < x1 ≤ t < ∞


pθ (x1 , t) =
 0

otherwise

− θ1 t
1
θn Γ(n−1) e [t − x1 ]n−2
pθ (x1 | t) =
1 − θ1 t n−1
θn Γn e t
1
= (n − 1)[t − x1 ]n−2
tn−1
A.Santhakumaran 193


 (n − 1) 1 [1 − x1 ]n−2

0 < x1 < t
t t
=
 0

otherwise
The UMVUE of θ is
x1 n−2
Z t
n−1
 
E[X1 | T = t] = x1 1− dx1
0 t t
x1 n−2
Z t
n−1
 
= x1 1 − dx1
t 0 t
x1
One can take z = , then dx1 = tdz
t
When x1 = t → z = 1; when x1 = 0 → z = 0
Z 1
n−1
E[X1 | T = t] = (tz)[1 − z]n−2 tdz
t 0
Z 1
= (n − 1)t (1 − z)n−1−1 z 2−1 dz
0
Γ2Γ(n − 1) t nx̄
= (n − 1)t = = = x̄
Γ(n − 1 + 2) n n
Hence, by the Rao - Blackwell and Lehmann - Scheffé Theorems, the UMVUE of Pθ{X ≥ 2} = e^{−2/θ} is E[δ(X1) | T = t] with δ(X1) = 1 if X1 ≥ 2 and 0 otherwise, i.e., ∫_2^t (n − 1)(1/t)(1 − x1/t)^{n−2} dx1 = (1 − 2/t)^{n−1} for t > 2 and 0 for t ≤ 2. Thus the UMVUE of Pθ{X ≥ 2} is (1 − 2/T)^{n−1} when T > 2 and 0 otherwise; the plug-in estimator e^{−2/X̄} is not unbiased.
Problem 5.17 Let X1 , X2 , · · · , Xn be a random sample from N (θ, σ 2 ). Both θ and σ
are unknown. Find the UMVUE of σ and pth quantile.
P
2 (Xi −X̄)2
Solution: Let Y = (n−1)S σ2
= σ2
∼ χ2 distribution with (n − 1) degrees of
freedom. Y ∼ G( 12 , (n−1)
2 ).
 1 n−1

 n−1
1
e− 2 y y 2
−1
0<y<∞
p(y) = 2 2 Γ n−1
2


0 otherwise
√ Z ∞
1 1 n
E[ Y ] = n−1 e− 2 y y 2 −1 dy
0 2 2 Γ n−1
2
1 Γ n2
= n−1 1 n
2 2 Γ n−12
( 2)
2

Γ n2 √
"r #
n−1 2
i.e., Eσ S = 2
σ2 Γ n−1
2
Γ n2 √ σ
⇒ Eσ [S] = n−1 2

Γ 2 n−1
1 Γ n−1
q
2
= σ where k(n) = Γn
2
n −1
k(n) 2
A.Santhakumaran 194

Thus k(n)S is the unbiased estimator of σ.


n
1
 
− 12 [ x2i −2θ
P P
xi +nθ2 ]
p(x1 , x2 , · · · , xn | θ, σ) = √ e 2σ
2πσ
n
1 nθ 2

1
P 2 θ P
= √ e− 2σ2 xi e σ2 xi e− 2σ2
2πσ
P2
Q (θ ,θ )t (x)
= c(θ1 , θ2 )e j=1 j 1 2 j h(x) where θ1 = θ, θ2 = σ
n
1 1 θ

n 2
and c(θ1 , θ2 ) = √ e− 2σ2 θ , Q1 (θ1 , θ2 ) = − 2 , Q2 (θ1 , θ2 ) = 2
2πσ 2σ σ

Xi2 and T = (T1 , T2 ) is jointly sufficient and complete. But


P P
Hence T1 = Xi , T2 =
there is a one to one function also sufficient. .˙. T = (X̄, S 2 ) is also sufficient and
1
complete. Thus the UMVUE of σ is k(n)S where S 2 = (Xi − X̄)2 . The UMVUE
P
n−1

of pth quantile δp is given by

p = Pθ,σ {X ≤ δp }
X −θ δp − θ
 
= Pθ,σ ≤
σ σ
δp − θ
 
X−θ
= Pθ,σ Z ≤ where Z = σ ∼ N (0, 1)
σ
Z δ−θ
σ
p = p(z)dz
0
Z ∞
i.e., 1 − p = δp −θ
p(z)dz
σ
δp − θ
⇒ = z1−p ⇒ δp = z1−p σ + θ
σ

Thus the UMVUE of δp is Z1−p k(n)S + X̄.
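The constant k(n) is easy to evaluate numerically. The following sketch (an added illustration, using the standard-library lgamma for the gamma function; σ, θ and n are arbitrary) verifies by simulation that k(n)S is unbiased for σ.

```python
# Illustrative check that k(n) * S is unbiased for sigma,
# with k(n) = sqrt((n-1)/2) * Gamma((n-1)/2) / Gamma(n/2).
import math
import numpy as np

def k(n):
    return math.exp(math.lgamma((n - 1) / 2) - math.lgamma(n / 2)) * math.sqrt((n - 1) / 2)

rng = np.random.default_rng(5)
theta, sigma, n, reps = 1.0, 2.0, 10, 100_000
x = rng.normal(theta, sigma, size=(reps, n))
s = x.std(axis=1, ddof=1)                     # S with divisor n - 1
print((k(n) * s).mean(), sigma)               # both approximately 2.0
```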

5.5 Information inequality approach

Under some regularity conditions Cramer - Rao inequality provides a lower bound for
the variance of unbiased estimators. It may enable us to judge a given unbiased estimator
is an UMVUE or not. That is, the variance of an unbiased estimator coincides with the
Cramer - Rao lower bound, then the estimator is UMVUE.
A.Santhakumaran 195

Covariance inequality

Theorem 5.7 The covariance inequality between two functions T = t(X) and ψ(X, θ)
is defined as
{Covθ [T, ψ(X, θ)]}2
Vθ [T ] ≥ ∀θ∈Ω
Vθ [ψ(X, θ)]
where ψ(X, θ) is a function of X and θ and T = t(X) is a statistic with pdf pθ (t).
Proof: The Cauchy - Schwarz inequality between two variables X and Y is

{E[X − E[X]][Y − E[Y ]]}2 ≤ E[X − E[X]]2 E[Y − E[Y ]]2

(Cov[X, Y ])2 ≤ V [X]V [Y ]

Now replace X by T and Y by ψ(X, θ)

(Covθ [T, ψ(X, θ)])2 ≤ Vθ [T ]Vθ [ψ(X, θ)]


{Covθ [T, ψ(X, θ)]}2
Vθ [T ] ≥ ∀θ∈Ω
Vθ [ψ(X, θ)]

Fisher measure of information

Definition 5.1 Let Pθ , θ ∈ Ω be the distribution of the random variable X. The


function
ψ(x, θ) = (∂pθ(x)/∂θ) (1/pθ(x)) = ∂ log pθ(x)/∂θ
is the relative rate at which the density pθ(x) changes at x. The average of the square of this rate is defined by
I(θ) = Eθ[(∂ log pθ(X)/∂θ)²] = ∫ [p′θ(x)]² / pθ(x) dx

Likelihood function

Definition 5.2 Consider a random sample X1 , X2 , · · · , Xn from a distribution having pdf


pθ (x), θ ∈ Ω. The joint probability density function of X1 , X2 , · · · , Xn with a parameter
θ is p(x1 , x2 , · · · , xn | θ). The joint probability density function may be regarded as a
function of θ is called the likelihood function of the random sample and is denoted by
L(θ) = pθ (x1 , x2 , · · · , xn ) θ ∈ Ω.
A.Santhakumaran 196

Property 5.1 Let IX (θ) and IY (θ) be the amount of information of two independent
samples (X1 , X2 , · · · , Xn ) and (Y1 , Y2 , · · · Yn ) respectively. Let IXY (θ) be the amount of
information of the joint sample (X1 , Y1 )(X2 , Y2 ), · · · , (Xn , Yn ). Then IXY (θ) = IX (θ) +
IY (θ). This is known as additive property of Fisher measure of information.
Proof:

Let LXY (θ) = pθ (x1 , y1 ) · · · pθ (xn , yn )

= pθ (x1 )pθ (y1 ) · · · pθ (xn )pθ (yn )


n
Y n
Y
= pθ (xi ) pθ (yi )
i=1 i=1
= LX (θ)LY (θ)

log LXY (θ) = log LX (θ) + log LY (θ)

Differentiate this with respect to θ


∂ log LXY (θ) ∂ log LX (θ) ∂ log LY (θ)
= +
∂θ ∂θ ∂θ 
∂ log LXY (θ) ∂ log LX (θ) ∂ log LY (θ)
    
Vθ = Vθ + Vθ
∂θ ∂θ ∂θ
IXY (θ) = IX (θ) + IY (θ)

Property 5.2 Let X1 , X2 , · · · , Xn be iid random sample drawn from a population


with density function pθ (x), θ ∈ Ω. Let I(θ) be the amount of information for each
Xi , i = 1, 2, · · · , n. Then the amount of information of (X1 , X2 , · · · , Xn ) is nI(θ).
Proof: The likelihood function for θ of the sample size n is
n
Y
L(θ) = pθ (xi )
i=1
Xn
log L( θ) = log pθ (xi )
i=1
n
∂ log L(θ) X ∂ log pθ (xi )
=
∂θ i=1
∂θ
  " n #
∂ log L(θ) X ∂ log pθ (Xi )
Vθ = Vθ
∂θ i=1
∂θ
n  
X ∂ log pθ (Xi )
= Vθ since Xi0 s iid
i=1
∂θ
n
X
= I(θ) = nI(θ)
i=1
A.Santhakumaran 197

The amount of information of X1 , X2 , · · · , Xn is = nI(θ)


 
∂ log pθ (X)
where I(θ) = Vθ ∀∈Ω
∂θ

is the amount of information of a single observation x of X.


Property 5.3 Let X1 , X2 , · · · , Xn be iid random sample drawn from a population with
density function p(x | θ), θ ∈ Ω. Let IX (θ) be the amount of information of the sample
X1 , X2 , · · · , Xn and IT (θ) be the amount of information of the statistic T = t(X). Then
IX (θ) ≥ IT (θ). If T = t(X) is sufficient, then IX (θ) = IT (θ).
Proof: For a single observation x of X
  Z
∂ log pθ (X) ∂ log pθ (x)
Eθ = pθ (x)dx
∂θ ∂θ
Z
∂pθ (x) 1
= pθ (x)dx
∂θ pθ (x)
Z
∂pθ (x)
= dx
∂θ
Z

= pθ (x)dx
∂θ
Z Z
∂ ∂pθ (x)
Assume pθ (x)dx = dx and make the transformation T = X
∂θ ∂θ
   
∂ log pθ (X) ∂ log pθ (T ) ∂ log pθ (x) ∂ log pθ (t)
then Eθ = Eθ since =
∂θ ∂θ ∂θ ∂θ

2
∂ log pθ (X) ∂ log pθ (T )

Consider Eθ − ≥0
∂θ ∂θ
 2  2  
∂ log pθ (X) ∂ log pθ (T ) ∂ log pθ (X) ∂ log pθ (T )
Eθ + Eθ − 2Eθ ≥0
∂θ ∂θ ∂θ ∂θ

∂ log pθ (T ) 2
 
IX (θ) + IT (θ) − 2Eθ ≥ 0
∂θ
IX (θ) + IT (θ) − 2IT (θ) ≥ 0

IX (θ) − IT (θ) ≥ 0

IX (θ) ≥ IT (θ)

Suppose T = t(X) is a sufficient statistic, then

pθ (x) = pθ (t)h(x)

log pθ (x) = log pθ (t) + log h(x)


A.Santhakumaran 198

Differentiate this with respect to θ


∂ log pθ (x) ∂ log pθ (t)
=
∂θ ∂θ
∂ log pθ (X) ∂ log pθ (T )
   
Vθ = Vθ
∂θ ∂θ
⇒ IX (θ) = IT (θ)

When a UMVUE does not exist, one may be interested in a Locally Minimum Variance Unbiased Estimator (LMVUE), which gives the smallest variance that an unbiased estimator can achieve at θ = θ0. This is also helpful for measuring the performance of a given unbiased estimator against lower bounds that are not sharp. The
Cramer - Rao inequality is very simple to calculate the lower bound for the variance of an
unbiased estimator. Also it provides asymptotically efficient estimators. The assumptions
of the Cramer - Rao inequality are

(i) Ω is an open interval ( finite , infinite or semi infinite).

(ii) The range of the distribution Pθ (x) is independent of the parameter θ.

∂pθ (x)
(iii) For any x and θ the derivative ∂θ exists and is finite.

Theorem 5.8 Under the assumptions (i) ,(ii) and (iii) and that I(θ) > 0. Let T = t(X)
be any statistic with Eθ [T 2 ] < ∞ for which the derivative with respect to θ of Eθ [T ] =
R
tpθ (x)dx exists can be obtained by differentiating under the integral sign, then
Vθ[T] ≥ [∂Eθ[T]/∂θ]² / I(θ)   ∀ θ ∈ Ω
where I(θ) = Eθ[(∂ log pθ(X)/∂θ)²] = −Eθ[∂² log pθ(X)/∂θ²]
R
Proof: Suppose the assumptions hold for a single observation x of X and pθ (x)dx = 1
is differentiated twice under the integral sign with respect to θ, then
∂pθ (x)
Z
dx = 0
∂θ
∂pθ (x) 1
Z
pθ (x)dx = 0
∂θ pθ (x)
∂ log pθ (x)
Z
pθ (x)dx = 0 (5.1)
∂θ
A.Santhakumaran 199

∂ log pθ (X)
 
⇒ Eθ = 0
∂θ

Differentiate the equation (5.1) with respect to θ


R ∂ 2 log pθ (x) R ∂ log pθ (x) ∂pθ (x)
∂θ2
pθ (x)dx + ∂θ ∂θ dx = 0
∂ 2 log pθ (x)
+ ∂ log∂θpθ (x) ∂ log∂θpθ (x) pθ (x)dx = 0
R R
∂θ2
pθ (x)dx
2 R 2
R ∂ log pθ (x) ∂ log pθ (x)
∂θ2
pθ (x)dx + ∂θ pθ (x)dx = 0
" # 2
∂ 2 log pθ (X) ∂ log pθ (X)

Eθ 2
+ Eθ = 0
∂θ ∂θ
2 " #
∂ log pθ (X) ∂ 2 log pθ (X)

Eθ = −Eθ
∂θ ∂θ2
2 " #
∂ log pθ (X) ∂ 2 log pθ (X)

But I(θ) = Eθ = −Eθ
∂θ ∂θ2
∂ log pθ (X)
 
= Vθ
∂θ
Z
Now Eθ [T ] = tpθ (x)dx

Differentiate this with respect to θ


∂Eθ [T ] dpθ (x)
Z
= t dx
∂θ dθ
∂pθ (x) 1
Z
= t pθ (x)dx
∂θ pθ (x)
∂Eθ [T ] ∂ log pθ (x)
Z
= t pθ (x)dx
∂θ ∂θ
∂ log pθ (X)
 
= Eθ T
∂θ
∂ log pθ (X)
 
= Covθ T,
∂θ
∂ log pθ(X) ∂θ
since Eθ [ ] =0

By covariance inequality
{Covθ [T, ψ(X, θ)]}2
Vθ [T ] ≥ ∀θ∈Ω
Vθ [ψ(X, θ)]
∂ log pθ (x)
Take ψ(x, θ) =
∂θ
∂Eθ [T ] 2
 
∂θ
then Vθ [T ] ≥ ∀θ∈Ω
Vθ [ ∂ log∂θ
pθ (X)
]
A.Santhakumaran 200

∂Eθ [T ] 2
 
∂θ
i.e., Vθ [T ] ≥ ∀θ∈Ω
I(θ)

Different forms of Cramer - Rao inequality

(i) Suppose T = t(X) is a biased estimator of the parameter τ (θ), i.e.,Eθ [T ] = τ (θ)+b(θ),
then the Cramer - Rao inequality becomes

[τ 0 (θ) + b0 (θ)]2
Vθ [T ] ≥ ∀θ∈Ω
I(θ)

(ii) Suppose X1 , X2 , · · · , Xn are iid with pdf pθ (x), θ ∈ Ω and Eθ [T ] = τ (θ) ∀ θ ∈ Ω,


then the Cramer - Rao inequality is written as

[τ 0 (θ)]2
Vθ [T ] ≥ ∀θ∈Ω
nI(θ)
h i
∂ log pθ (X)
where I(θ) = Vθ ∂θ of a single observation x of X.

or

[τ 0 (θ)]2
Vθ [T ] ≥ ∀θ∈Ω
I(θ)
h i
∂ log L(θ) Qn
where I(θ) = Vθ ∂θ and L(θ) = i=1 pθ (xi ).

5.6 An improvement of Cramer -Rao inequality

Chapman Robbin - Kiefer inequality is an improvement of Cramer - Rao inequality,


since it does not involve regularity conditions as in Cramer - Rao inequality. They also
give a lower bound for the variance of an unbiased estimator.
Theorem 5.9 Suppose X is distributed with density function pθ (x) and T = t(X) is a
statistic with Eθ [T ] = τ (θ) and Eθ [T 2 ] < ∞. Suppose pθ (x) > 0 ∀ x . If θ and θ + ∆ are
pθ+∆ (x)
two values for which τ (θ) 6= τ (θ + ∆) and the function ψ(x, θ) = pθ (x) − 1, then
 
Vθ[T] ≥ sup_∆ { [τ(θ + ∆) − τ(θ)]² / Eθ[(pθ+∆(X)/pθ(X)) − 1]² }   ∀ θ ∈ Ω
A.Santhakumaran 201

Proof: First prove that

Eθ [ψ(X, θ)] = 0, ∀ θ ∈ Ω
Z
Eθ [ψ(X, θ)] = ψ(x, θ)pθ (x)dx
pθ+∆ (x)
Z  
= − 1 pθ (x)dx
pθ (x)
Z
= [pθ+∆ (x) − pθ (x)]dx

= 1−1=0

Covθ [T, ψ(X, θ)] = Eθ [T ψ(X, θ)] − Eθ [T ]Eθ [ψ(X, θ)]

= Eθ [T ψ(X, θ)]
pθ+∆ (X)
 
= Eθ T −1
pθ (X)
pθ+∆ (x) − pθ (x)
Z  
= t pθ (x)dx
pθ (x
Z Z
= tpθ+∆ (x)dx − tpθ (x)dx

= τ (θ + ∆) − τ (θ)

By covariance inequality
[τ (θ + ∆) − τ (θ)]2
Vθ [T ] ≥ h
pθ+∆ (X)
i
Vθ pθ (X) −1
It is true for all values of ∆  
 [τ (θ + ∆) − τ (θ)]2 
Vθ [T ] ≥ sup hp i
θ+∆(X)
∆  V −1 
θ pθ (X)

Remark 5.5 If the range of the distribution Pθ , θ ∈ Ω can be relaxed by S(φ) ⊂


S(θ), φ < θ, φ 6= θ, then the Chapman Robbin -Kiefer inequality becomes
 
 [τ (φ) − τ (θ)]2 
Vθ [T ] ≥ sup h
pφ (X)
i
φ:S(φ)⊂S(θ)  V θ −1 
pθ (X)

Problem 5.18 Using a single observation x of X, obtain the Chapman Robbin - Kiefer
bound for the parameter θ of the pdf

 1

0<x<θ
θ
pθ (x) =
 0

otherwise
A.Santhakumaran 202

Solution: Assume φ < θ and φ 6= θ and τ (φ) 6= τ (θ). Define



 1

0<x<φ
φ
pφ (x) =
 0

otherwise
Z φ Z θ
pφ (X) θ 0
 
Eθ = pθ (x)dx + 1 pθ (x)dx
pθ (X) 0 φ φ θ
Z φ
θ1
= dx = 1
0 φθ
2 Z φ  2
pφ (X) θ 1

Eθ = dx
pθ (X) 0 φ θ
θ2 1 θ
= 2
φ=
φ θ φ
pφ (X) pφ (X)
   
Vθ −1 = Vθ
pθ (X) pθ (X)
θ θ−φ
= −1=
φ φ
The Chapman Robbin - Kiefer inequality is
( )
(φ − θ)2
Vθ [T ] ≥ sup φ
φ:S(φ)⊂S(θ) (θ − φ)
≥ sup {φ(θ − φ)}
φ:S(φ)⊂S(θ)
Let y = φ(θ − φ)

Differentiate this with respect to φ


dy
= θ − 2φ

d2 y
= −2 < 0
dφ2
d2 y dy θ
For maximum of y, dφ2
< 0 at the value of φ for which dφ = 0. At φ = 2, y has
θ2
maximum. The maximum value of y is 4 . The Chapman Robbin - Kiefer lower bound
θ2
for the variance of the unbiased estimator of θ is 4 .

Remark 5.6 Chapman Robbin - Kiefer bound becomes the Cramer - Rao lower bound
by allowing ∆ → 0 and assume the range of the distribution is independent of the
∂ log pθ (x)
parameter, and the derivative ∂θ exists and finite, then

[τ (θ + ∆) − τ (θ)]2
Vθ [T ] ≥ h i2
1
Eθ [pθ+∆ (X) − pθ (X)] pθ (X)
A.Santhakumaran 203

[τ (θ+∆)−τ (θ) 2
h i
lim∆→0 ∆
≥ h i2
[pθ+∆ (X)−pθ (X)] 1
Eθ lim∆ →0 ∆ pθ (X)
0
[τ (θ)]2
≥ h i2
1
Eθ p0 (X | θ) pθ (X)
[τ 0 ]2

∂ log pθ (X) 2
h i
Eθ ∂θ
[τ 0 (θ)]2
≥ ∀ θ∈Ω
I(θ)
Problem 5.19 Obtain the Cramer - Rao lower bound for the variance of the unbiased
estimator of the parameter θ of the Cauchy distribution by considering a sample of size
n.

pθ(x) = (1/π) · 1/[1 + (x − θ)²]   −∞ < x < ∞, −∞ < θ < ∞
      = 0                           otherwise

Solution: For a single observation x of X,

L(θ) = pθ(x) = (1/π) · 1/[1 + (x − θ)²]
log L(θ) = − log π − log[1 + (x − θ)²]
∂ log pθ(x)/∂θ = 2(x − θ)/[1 + (x − θ)²]
[∂ log pθ(x)/∂θ]² = 4(x − θ)²/[1 + (x − θ)²]²
Eθ[(∂ log pθ(X)/∂θ)²] = Eθ[4(X − θ)²/[1 + (X − θ)²]²]
                      = (4/π) ∫_{−∞}^{∞} (x − θ)²/[1 + (x − θ)²]³ dx
                      = (4/π) ∫_{−∞}^{∞} t²/(1 + t²)³ dt   since t = x − θ
                      = (8/π) ∫_0^{∞} t²/(1 + t²)³ dt
                      = (4/π) ∫_0^{∞} u^{3/2 − 1}/(1 + u)^{3/2 + 3/2} du   since t² = u
                      = (4/π) Γ(3/2) Γ(3/2)/Γ(3)
I(θ) = (4/π) · ((1/2)√π)((1/2)√π)/2 = 1/2

The Cramer - Rao lower bound from the sample of size n for the variance of the unbiased
A.Santhakumaran 204

estimator of the parameter τ(θ) = θ is [τ′(θ)]²/(nI(θ)) = 1/(n · (1/2)) = 2/n.
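The value I(θ) = 1/2 can also be confirmed by numerical integration of the squared score against the Cauchy density (an added sketch, assuming scipy is available).

```python
# Illustrative numerical check that the Fisher information of the Cauchy location family is 1/2.
import numpy as np
from scipy.integrate import quad

theta = 0.0

def integrand(x):
    score = 2 * (x - theta) / (1 + (x - theta) ** 2)       # d/dtheta of log density
    dens = 1 / (np.pi * (1 + (x - theta) ** 2))            # Cauchy density
    return score ** 2 * dens

I, _ = quad(integrand, -np.inf, np.inf)
print(I)     # approximately 0.5, so the Cramer - Rao bound from n observations is 2/n
```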
= n2 .
Problem 5.20 Let X1 , X2 , · · · , Xn is a sample from N (θ, 1). Obtain the Cramer - Rao
lower bound for the variance of (i) θ and (ii) θ2 . Also find the unbiased estimator of θ2 .
To verify that the actual variance of the unbiased estimator of θ2 is same as Cramer -
Rao lower bound.
Solution: (i) The likelihood function for θ is
n
Y
L(θ) = pθ (xi )
i=1
n
1
 Pn
1
(xi −θ)2
= e− 2 i=1

√ n
1X
log L(θ) = −n log 2π − (xi − θ)2
2 i=1

Differentiate this with respect to θ


n
∂ log L(θ) X
= (xi − θ) = n(x̄ − θ)
∂θ i=1
∂ log L(θ) 2
 
= n2 (x̄ − θ)2
∂θ
∂ log L(θ) 2
 
Eθ = n2 Eθ [X̄ − θ]2
∂θ
1
= n2 Vθ [X̄] = n2 = n = I(θ)
n
The Cramer - Rao lower bound for the variance of the unbiased estimator X̄ of τ (θ) = θ
[τ 0 (θ)]2
is I(θ) = n1 .
1
Remark 5.7 The actual variance of the statistic X̄ is Vθ [X̄] = n. It is same as the
Cramer - Rao lower bound. .˙. X̄ is UMVUE of θ.
(ii) The likelihood function for θ becomes
√ 2 n
n 1 X
log L(θ) = − log 2π − xi − θ2
2 2 i=1
Differentiate this with respect to θ2
n √ 
∂ log L(θ) 1 X
= xi − θ 2
∂θ2 2θ i=1
n
1 X 1
= (xi − θ) = n[x̄ − θ]
2θ i=1 2θ
2
n2 n2 1

∂ log L(θ) 1 2 n
Eθ = n Eθ [X̄ − θ]2 = 2 Vθ [X̄] = 2 = 2
∂θ2 4θ 2 4θ 4θ n 4θ
A.Santhakumaran 205

The Cramer - Rao lower bound for the variance of unbiased estimator of τ (θ) = θ2 is
[τ 0 (θ)]2 4θ2 dτ (θ)
I(θ) = n where τ 0 (θ) = dθ2
= 1.

Consider Eθ [X − θ]2 = 1

Eθ [X 2 ] − 1 = θ2
"P #
n 2
i=1 Xi
Eθ − 1 = θ2
n
Pn
Xi2
.. . i=1
n − 1 is the unbiased estimator of θ2 .
Pn Pn Pn
Xi2 (Xi −θ+θ)2 (Xi −θ)2 2θ Pn
Consider i=1
n = i=1
n = i=1
n + θ2 + n i=1 (Xi − θ)
"P # "P #
Xi2 Xi2
Vθ −1 = Vθ
n n
n
"P # 2 !
(Xi − θ)2 2θ
 X
= Vθ + Vθ [Xi ] − 0
n n i=1
"P #
(Xi − θ)2 4θ2
= Vθ + 2n since Vθ [Xi ] = 1 ∀ i = 1 to n
n n
"P #
(Xi − θ)2 4θ2
= Vθ +
n n
ns2 (Xi − θ)2
P
Define Y = 2 = 2
∼ χ2 distribution with n degrees of freedom
σ σ
n 1
 
The pdf of Y ∼ G ,
2 2

1 n

 n
1
e− 2 y y 2 −1 0 < y < ∞
2 2 Γn
p(y) = 2
 0

otherwise
Z ∞
1 1 n
E [Y r ] = n
e− 2 y y 2 +r−1 dy
n
0 2 Γ2 2

1 Γ( n2 + r)
= n n
2 2 Γ n2 ( 12 ) 2 +r
2r Γ( n2 + r)
= r = 1, 2, · · ·
Γ n2
Γ( n + 1)
E[Y ] = 2 2 n =n
Γ2
E[Y 2 ] = (n + 2)n and V [Y ] = 2n
A.Santhakumaran 206

ns2
But Y and σ 2 = 1
=
σ 2 
Y 2n 2
.. . Vθ [s2 ] = Vθ = 2 =
n n n
"P #
Xi 2 4θ 2 2 4θ2
Vθ − 1 = Vθ [s2 ] + = +
n n n n
P
Xi2 4θ2
The actual variance of n − 1 is n + n2 . Here the Cramer - Rao lower bound
P
Xi2
is less than the actual variance of the unbiased estimator n − 1 of the parameter θ2 .
Note that the UMVUE of θ2 is X̄ 2 − n1 , since Eθ [X̄ 2 ] − {Eθ [X̄]}2 = 1
n
1
⇒ Eθ [X̄ 2 ] − n = θ2
1
i.e., X̄ 2 − n is unbiased estimator of θ2 .
1
Problem 5.21 Given pθ (x) = θ, 0 < x < θ, θ > 0. Compute the reciprocal
h i2
∂ log pθ (X) n+1
nEθ ∂θ . Compare this with the variance of n T where T is the largest ob-
servation of a random sample of size n for this distribution.
Solution: Given pdf of the random variable X is

 1

0<x<θ
θ
pθ (x) =
 0

otherwise
1
log pθ (x = −
θ
∂ log pθ (x) 1
= −
∂θ  θ
∂ log pθ (x) 1

=
∂θ θ2
∂ log pθ (X) 2 1
 
Eθ =
∂θ θ2
∂ log pθ (X) 2 n
 
i.e., nEθ =
∂θ θ2
1 θ2
=
∂ log pθ (X) 2 n
h i
nEθ ∂θ
Let T = max {Xi }
1≤i≤n

The pdf of T is

 nn tn−1

0<t<θ
θ
p(t | θ) =
 0

otherwise
A.Santhakumaran 207

n
Eθ [T ] = θ
n+1
n+1
⇒ T is an unbiased estimator of θ
n
n 2
Eθ [T 2 ] = θ
n+2
2
n 2 n

Vθ [T ] = θ − θ2
n+1 n+1
nθ2
=
(n + 1)(n + 2)
n+1 θ2
 
Vθ T =
n n(n + 2)
The actual variance of the unbiased estimator ((n + 1)/n)T is θ²/(n(n + 2)).
Here the actual variance of this unbiased estimator of θ is less than the Cramer - Rao lower bound, since the distribution does not satisfy the assumptions of the Cramer - Rao inequality. Note that ((n + 1)/n)T is the UMVUE of θ.
Problem 5.22 Find the Cramer - Rao lower bound for the variance of the unbiased
estimator Pθ {X > 2} for a single observation x of X with pdf


 1 e− xθ

x>0θ>0
θ
pθ (x) =
 0

otherwise

Solution:
Z 2
1 −x
Consider τ (θ) = Pθ {X > 2} = 1 − e θ dx
0 θ
#2
− xθ
"
1 e
= 1−
θ − 1θ 0
− θ2 2
= 1+e − 1 = e− θ
1
log pθ (x) = − log θ − x
θ
2
One can take λ = e− θ , then log λ = − 2θ i.e., θ = − log2 λ .

2 x
 
log pλ (x) = − log − + log λ
log λ 2
∂ log pλ (x) log λ 1 x1
 
= − (−2)(−1) (log λ)−2 +
∂λ −2 λ 2λ
A.Santhakumaran 208

1 x
= +
λ log λ 2λ
∂ log pθ (x) θ x 2
= + e− θ
2 − θ2
 
∂ e− θ e 2
2

= [x − θ]
2
 2
4
∂ log pθ (X)  eθ
Eθ   2
 = Eθ [X − θ]2
∂ e− θ 4
4
eθ 2
= θ since Eθ [X − θ]2 = θ2
4
−2
The Cramer - Rao lower bound for the variance of the unbiased estimator of τ (θ) = e θ

4 − θ4 2
is θ2
e , since τ 0 (θ) = ∂τ−(θ)2  = 1. The unbiased estimator of τ (θ) = e− θ is
∂ e θ


 1

if X > 2
T =
 0

otherwise

5.7 Efficiency of a statistic

As a consequence of the Cramer - Rao inequality, the efficient estimator is defined as follows:


Definition 5.3 Let T = t(X) be an unbiased estimator of a parameter θ. Then T = t(X)
is called an efficient estimator of θ iff the variance of T = t(X) attains the Cramer - Rao
lower bound.
Definition 5.4 The ratio of the actual variance of any unbiased estimator of a parameter
to the Cramer - Rao lower bound is called the efficiency of that estimator.

Actual Variance of the statistic


Efficiency =
Cramer - Rao lower bound of that statistic

Definition 5.5 An estimator is said to be efficient estimator if efficiency is one.


Definition 5.6 An estimator is said to be an asymptotic efficient estimator if efficiency
tends to one as n → ∞.
Using Cramer - Rao lower bound to find the efficient estimator has the following
limitations.

• UMVUE exists even the Cramer - Rao regularity conditions are not satisfied.
A.Santhakumaran 209

• UMVUE exists when the regularity conditions are satisfied but UMVUE’s are not
attained the Cramer - Rao lower bound.

Problem 5.23 Let X1 , X2 , · · · , Xn be a random sample from



 θe−θx

0 < x < ∞, θ > 0
pθ (x) =
 0

otherwise

Obtain the asymptotic efficient estimator of θ.


Solution: For a single observation x of X

L(θ) = pθ (x) = θe−θx

log L(θ) = log θ − θx


∂ log pθ (x) 1
= −x
∂θ θ
2
∂ log pθ (x) 1
= − 2
" ∂θ2 # θ
2
∂ log pθ (X) 1
Eθ = − 2
∂θ2 θ

The Cramer - Rao lower bound for the variance of the unbiased estimator of θ is
1 θ2
n 1 = n.
θ2

n
1
X  
Let T = Xi , thenT ∼ G n,
i=1
θ

 θn e−θt tn−1

0<t<∞
Γn
pθ (t) =
 0

otherwise
  Z ∞ n
1 θ −θt n−1−1
Eθ = e t dt
T 0 Γn

θn Γ(n − 1)
=
Γn θn−1
θ
=
n−1
n−1
 
Eθ = θ if n = 2, 3, · · ·
T
n−1
is the unbiased estimator of θ.
T
A.Santhakumaran 210

1 θ2
 
Eθ = if n = 3, 4, · · ·
T2 (n − 1)(n − 2)
1 θ2
 
Vθ =
T (n − 1)2 (n − 2)
Vθ[(n − 1)/T] = θ²/(n − 2),   if n = 3, 4, · · ·
The actual variance of (n − 1)/T is θ²/(n − 2). The Cramer - Rao lower bound for the unbiased estimator (n − 1)/T of θ is θ²/n.
Efficiency = [θ²/(n − 2)] / [θ²/n] = n/(n − 2) = 1/(1 − 2/n),   n = 3, 4, · · ·
           → 1 as n → ∞

Thus (n − 1)/T is the asymptotic efficient estimator of θ. Note that (n − 1)/T is also the UMVUE of θ.
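A Monte Carlo sketch of the comparison just made (added here for illustration; θ and n are arbitrary): the simulated variance of (n − 1)/T is close to θ²/(n − 2), which exceeds the Cramer - Rao bound θ²/n for finite n.

```python
# Illustrative check of the variance of (n-1)/T for an Exponential(rate theta) sample.
import numpy as np

rng = np.random.default_rng(6)
theta, n, reps = 2.0, 10, 200_000
T = rng.exponential(1 / theta, size=(reps, n)).sum(axis=1)   # sufficient statistic
est = (n - 1) / T
print(est.mean(), est.var(), theta**2 / (n - 2), theta**2 / n)
# mean ~ 2.0, variance ~ 0.5 = theta^2/(n-2), versus CR bound 0.4 = theta^2/n
```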
Theorem 5.10 A necessary and sufficient condition for an estimator to be the most
∂ log pθ (x)
efficient is that T = t(X) is sufficient and t(x) − τ (θ) is proportional to ∂θ where
Eθ [T ] = τ (θ).
∂ log pθ (x)
Proof: Assume T = t(X) is a most efficient estimator of τ (θ) and t(x)−τ (θ) ∝ ∂θ

∂ log pθ (x)
i.e., t(x) − τ (θ) = A(θ)
∂θ
Prove that T = t(X) is a sufficient statistic.
t(x) − τ (θ) ∂ log pθ (x)
=
A(θ) ∂θ
t(x) τ (θ) ∂ log pθ (x)
− =
A(θ) A(θ) ∂θ
t(x) τ (θ)
Z Z Z
dθ − dθ = d log pθ (x) + c(x)
A(θ) A(θ)
Z θ Z θ
1 τ (θ)
Choose dθ = Q(θ) and d(θ) = c1 (θ)
−∞ A(θ) −∞ A(θ)
Then t(x)Q(θ) − c1 (θ) − c(x) = log pθ (x)

eQ(θ)t(x)−c1 (θ)−c(x) = pθ (x)

pθ (x) = c(θ)eQ(θ)t(x) h(x)

where c(θ) = e−c1 (θ) and h(x) = e−c(x) .


A.Santhakumaran 211

It is an one parameter exponential family. . ˙. T = t(X) is a sufficient statistic.


Conversely, assume T = t(X) is sufficient and t(x) − τ (θ) = A(θ) ∂ log∂θpθ (x) . Prove that
T = t(X) is the most efficient estimator of τ (θ).

∂ log pθ (x)
t(x) − τ (θ) = A(θ)
∂θ
t(x) − τ (θ) ∂ log pθ (x)
=
A(θ) ∂θ
2 2
t(x) − τ (θ) ∂ log pθ (x)
 
=
A(θ) ∂θ
2
1 ∂ log pθ (X)

2
Eθ [T − τ (θ)]2 = Eθ
[A(θ)] ∂θ
2
Vθ [T ] ∂ log pθ (X)

= Eθ
[A(θ)]2 ∂θ
2
∂ log pθ (X)

Vθ [T ] = [A(θ)]2 Eθ (5.2)
∂θ

 
∂ log pθ (X)
But Eθ T, = τ 0 (θ)
∂θ
 
∂ log pθ (x)
i.e, Eθ (T − τ (θ)) , = τ 0 (θ)
∂θ
since Eθ [ ∂ log∂θpθ (x) ] = 0
"  2 #
∂ log pθ (X)
Eθ A(θ) = τ 0 (θ)
∂θ
since t(x) − τ (θ) = A(θ) ∂ log∂θpθ (x)
 2
∂ log pθ (X)
A(θ)Eθ = τ 0 (θ)
∂θ
τ 0 (θ)
i.e., A(θ) = h i2
∂ log pθ (X)
Eθ ∂θ

[τ 0 (θ)]2
From equation (5.2) ⇒Vθ [T ] = ∀θ∈Ω
Eθ [ ∂ log∂θ
pθ (X) 2
]

Thus the actual variance of T = t(X) is equal to the Cramer - Rao lower bound.
Remark 5.8 A UMVUE need not be a most efficient estimator. As discussed in Problem 5.23, (n − 1)/T, n = 3, 4, · · · is the UMVUE of θ but not the most efficient estimator of θ.
A.Santhakumaran 212

5.8 Extension of information inequality

Cramer - Rao inequality has been modified and extended in different directions. Consider
the first case, where θ is a vector. In second case, it may extend the inequality to get better
bounds for the variance of unbiased estimators. Bhattacharya gives a method of having a
whole sequence of non-decreasing lower bounds for the variance of an unbiased estimator
by successive differentiation of the likelihood function with respect to the parametric
function.
Lemma 5.1 For any random variables X1 , X2 , · · · , Xr with finite second moments, the
covariance matrix
C = [Cov(Xi , Xj )]r×r

is positive semi definite. It is positive definite iff Xi ’s i = 1 to r are independent.


Pr
Proof: Assume Xi ’s are not independent. Consider the variance of i=1 ai Xi
  
c11 · · · c1r a1
  
" r #   
X  c21 · · · c2r 
  a2 
 
i.e., V ai Xi = (a1 , a2 , · · · , ar ) 

 
 ···  ··· 
··· ··· 
  
i=1
  
cr1 · · · crr ar
= a0 Ca ≥ 0 ∀ a0 = (a1 , a2 , · · · , ar )

where C is the covariance matrix.


⇒ C is positive semi definite.
If Xi ’s are independent , then
  
c11 0 ··· 0 a1
  
" r #   
X  0 c22 · · · 0   a2 
V ai Xi = (a1 , a2 , · · · , ar ) 
  
 
 ··· · ··· ·  · 
  
i=1
  
0 · · · · crr ar
= a0 Ca > 0 ∀ a0 = (a1 , a2 , · · · , ar )

⇒ C is positive definite, since cii = V [Xi ] ∀ i = 1, 2, · · · , r and cij = 0 if i 6= j.


Lemma 5.2 Let (X1 , X2 , · · · , Xr ) and Y have finite second moment, let νi = Cov[Xi , Y ]
and Σ be the covariance matrix of the Xi ’ s. Without loss of generality suppose Σ is
A.Santhakumaran 213

ν 0 Σ−1 ν
positive definite, then ρ2 = V [Y ] , ρ is the multiple correlation coefficient between Y
and the vector (X1 , X2 , · · · , Xr ).
Proof: Define ρ is the correlation coefficient between a0 X and Y where a0 =
(a1 , a2 , · · · , ar ) and X0 = (X1 , X2 , · · · , Xr ),
{Cov [ ri=1 ai Xi , Y ]}2
P
i.e., ρ2 = .
V [Y ]V [ ri=1 ai Xi ]
P

Maximizing ρ2 is not uniquely determined by a0 , since ρ is invariant under changes of scale.


Obtaining the unique maximum of ρ, one can impose the condition that V [Σri=1 ai Xi ] =
a0 Σa = 1. Maximizing ρ subject to a0 Σa = 1 is equivalent to maximizing a0 ν subject to
a0 Σa = 1. By Lagrangian multiplier method, the Lagrangian equation is
1
L(a, λ) = a0 ν − λ[a0 Σa − 1]
2
∂L(a, λ)
= ν − λaΣ
∂a
∂L(a, λ)
The necessary condition for maximum is =0
∂a
1
⇒ ν − λaΣ = 0 i.e., a = Σ−1 ν
λ
1 0 −1
ν Σ ν = 1 since a0 Σa = 1
λ2

λ = ± ν 0 Σ−1 ν
Σ−1 ν
.. . a = √
ν 0 Σ−1 ν#
r
"
= a0 Cov [X, Y ] = a0 ν
X
Cov ai Xi , Y
i=1
a0 ν ν 0 Σ−1 ν
.. . ρ = p =√ p
V [Y ] ν 0 Σ−1 ν V [Y ]
ν 0 Σ−1 ν
ρ2 =
V [Y ]
Theorem 5.11 For any unbiased estimator T = t(X) of τ (θ) and any functions
ψi (x, θ) with finite second moments, then V [T ] ≥ ν 0 C −1 ν where ν 0 = (ν1 , ν2 , · · · , νr ) and
C = [cij ]r×r are defined by νi = Cov[T, ψi (X, θ)] and cij = Cov[ψi (X, θ)ψj (X, θ)], i, j =
1, 2, · · · , r.
Proof: As in Lemma 5.2, replace Y by T and Xi by ψi (X, θ), then
ν 0 C −1 ν
ρ2 = ≤1
V [T ]
A.Santhakumaran 214

V [T ] ≥ ν 0 C −1 ν

where νi = Cov[T, ψi (X, θ)] = τi0 (θ), i = 1, 2, · · · , r, and C = Σ.

5.9 Multiparameter information inequality

Let X be distributed with density pθ (x), θ ∈ Ω, where θ is a vector, say θ = (θ1 , θ2 , · · · , θr ).
Assumptions:

(i) Ω is an open interval ( finite, infinite or semi infinite) in each coordinate.

(ii) The range of the distribution Pθ is independent of the parameter θ = (θ1 , θ2 , · · · , θr ).

(iii) For any x and θ ∈ Ω and i = 1, 2, · · · , r the derivative ∂ log pθ (x)/∂θi exists and is finite.

Define the information matrix of order r

$$I(\theta) = [I_{ij}(\theta)]_{r\times r},
\qquad
I_{ij}(\theta) = E_\theta\!\left[\frac{\partial \log p_\theta(X)}{\partial \theta_i}\,
\frac{\partial \log p_\theta(X)}{\partial \theta_j}\right].$$
For a single observation x of X, under assumptions (i) to (iii),

$$\int p_\theta(x)\,dx = 1.$$

Differentiating partially with respect to θi under the integral sign,

$$\int \frac{\partial p_\theta(x)}{\partial \theta_i}\,dx = 0,
\qquad
\int \frac{\partial \log p_\theta(x)}{\partial \theta_i}\, p_\theta(x)\,dx = 0,
\qquad
E_\theta\!\left[\frac{\partial \log p_\theta(X)}{\partial \theta_i}\right] = 0.$$

Differentiating once more, now with respect to θj ,

$$\int \frac{\partial^2 \log p_\theta(x)}{\partial \theta_i\,\partial \theta_j}\, p_\theta(x)\,dx
+ \int \frac{\partial \log p_\theta(x)}{\partial \theta_i}\,
\frac{\partial \log p_\theta(x)}{\partial \theta_j}\, p_\theta(x)\,dx = 0,$$

$$E_\theta\!\left[\frac{\partial^2 \log p_\theta(X)}{\partial \theta_i\,\partial \theta_j}\right]
+ E_\theta\!\left[\frac{\partial \log p_\theta(X)}{\partial \theta_i}\,
\frac{\partial \log p_\theta(X)}{\partial \theta_j}\right] = 0.$$

Hence

$$I_{ij}(\theta)
= E_\theta\!\left[\frac{\partial \log p_\theta(X)}{\partial \theta_i}\,
\frac{\partial \log p_\theta(X)}{\partial \theta_j}\right]
= -E_\theta\!\left[\frac{\partial^2 \log p_\theta(X)}{\partial \theta_i\,\partial \theta_j}\right]\ \text{for } i \ne j,
\qquad
= -E_\theta\!\left[\frac{\partial^2 \log p_\theta(X)}{\partial \theta_i^2}\right]\ \text{for } i = j.$$

Theorem 5.12 Suppose that assumptions (i) to (iii) and the relation Eθ [∂ log pθ (X)/∂θi ] = 0, i = 1, 2, · · · , r, hold and that I(θ) is positive definite. Let T = t(X) be any statistic with Eθ [T ² ] < ∞ for which the derivative with respect to θi , i = 1, 2, · · · , r, of Eθ [T ] = ∫ t pθ (x)dx exists for each i and can be obtained by differentiating under the integral sign. Then Vθ [T ] ≥ α' I⁻¹(θ)α, where α is the vector with ith element αi = ∂Eθ [T ]/∂θi , i = 1, 2, · · · , r.
Proof: As in Theorem 5.11, replace ψi (x, θ) by ∂ log pθ (x)/∂θi , i = 1, 2, · · · , r, and take ν = α, C = I(θ). Then Vθ [T ] ≥ α' I⁻¹(θ)α.

Problem 5.21 Let X1 , X2 , · · · , Xn be iid N(θ, σ² ). Obtain the information inequality for the parameter θ = (θ, σ² ).
Solution: The Cramer - Rao inequality for θ = (θ1 , θ2 ) is Vθ [T] ≥ α' I⁻¹(θ)α, where T = (T1 , T2 ),

$$\alpha = \begin{pmatrix} \partial E_\theta[\mathbf{T}]/\partial \theta_1 \\[2pt] \partial E_\theta[\mathbf{T}]/\partial \theta_2 \end{pmatrix}
= \begin{pmatrix} \tau'(\theta_1) \\ \tau'(\theta_2) \end{pmatrix},
\qquad
I(\theta) = \begin{pmatrix} I_{11}(\theta) & I_{12}(\theta)\\ I_{21}(\theta) & I_{22}(\theta) \end{pmatrix},
\qquad
I_{ij}(\theta) = -E_\theta\!\left[\frac{\partial^2 \log L(\theta)}{\partial \theta_i\,\partial \theta_j}\right],\ i, j = 1, 2,$$

with θ1 = θ and θ2 = σ², so that

$$I_{11}(\theta) = -E_\theta\!\left[\frac{\partial^2 \log L(\theta)}{\partial \theta^2}\right],
\qquad
I_{12}(\theta) = I_{21}(\theta) = -E_\theta\!\left[\frac{\partial^2 \log L(\theta)}{\partial \theta\,\partial \sigma^2}\right],
\qquad
I_{22}(\theta) = -E_\theta\!\left[\frac{\partial^2 \log L(\theta)}{\partial (\sigma^2)^2}\right].$$

The likelihood function for θ is

$$L(\theta) = \prod_{i=1}^{n} p_\theta(x_i)$$

$$= \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} e^{-\frac{1}{2\sigma^2}\sum (x_i-\theta)^2},
\qquad
\log L(\theta) = -\frac{n}{2}\log 2\pi - \frac{n}{2}\log \sigma^2 - \frac{1}{2\sigma^2}\sum (x_i-\theta)^2.$$

$$\frac{\partial \log L(\theta)}{\partial \theta} = \frac{1}{\sigma^2}\sum (x_i-\theta) = \frac{n}{\sigma^2}[\bar{x}-\theta],
\qquad
E_\theta\!\left[\frac{\partial \log L(\theta)}{\partial \theta}\right]^2
= \frac{n^2}{\sigma^4}\,E_\theta[\bar{X}-\theta]^2
= \frac{n^2\sigma^2}{\sigma^4 n} = \frac{n}{\sigma^2} = I_{11}(\theta).$$

$$I_{12}(\theta) = I_{21}(\theta) = -E_\theta\!\left[\frac{\partial^2 \log L(\theta)}{\partial \theta\,\partial \sigma^2}\right] = 0
\qquad\text{since}\qquad
E_\theta\!\left[\frac{\partial^2 \log L(\theta)}{\partial \sigma^2\,\partial \theta}\right]
= -\frac{n}{\sigma^4}\,E_\theta[\bar{X}-\theta] = 0.$$

$$\frac{\partial \log L(\theta)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2(\sigma^2)^2}\sum (x_i-\theta)^2,
\qquad
\frac{\partial^2 \log L(\theta)}{\partial (\sigma^2)^2} = \frac{n}{2\sigma^4} - \frac{1}{(\sigma^2)^3}\sum (x_i-\theta)^2,$$

$$-E_{\sigma^2}\!\left[\frac{\partial^2 \log L(\theta)}{\partial (\sigma^2)^2}\right]
= -\frac{n}{2\sigma^4} + \frac{n\sigma^2}{\sigma^6},
\qquad
I_{22}(\theta) = \frac{n}{\sigma^4}\left(1-\frac{1}{2}\right) = \frac{n}{2\sigma^4}.$$

$$I(\theta) = \begin{pmatrix} \dfrac{n}{\sigma^2} & 0\\[6pt] 0 & \dfrac{n}{2\sigma^4} \end{pmatrix},
\qquad
I^{-1}(\theta) = \begin{pmatrix} \dfrac{\sigma^2}{n} & 0\\[6pt] 0 & \dfrac{2\sigma^4}{n} \end{pmatrix},
\qquad
\alpha = \begin{pmatrix} \partial\theta/\partial\theta \\[2pt] \partial\sigma^2/\partial\sigma^2 \end{pmatrix}
= \begin{pmatrix} 1\\ 1 \end{pmatrix}.$$

$$V_\theta[\mathbf{T}] \ \ge\ (1,\ 1)
\begin{pmatrix} \dfrac{\sigma^2}{n} & 0\\[6pt] 0 & \dfrac{2\sigma^4}{n} \end{pmatrix}
\begin{pmatrix} 1\\ 1 \end{pmatrix},
\qquad\text{i.e.,}\qquad
V_\theta[T_1] \ge \frac{\sigma^2}{n}
\quad\text{and}\quad
V_{\sigma^2}[T_2] \ge \frac{2\sigma^4}{n}.$$
Remark 5.9 The actual variance of the unbiased estimator T1 = X̄ of θ is σ²/n, which coincides with the Cramer - Rao lower bound, whereas the actual variance 2σ⁴/(n − 1) of the unbiased estimator T2 = (1/(n − 1)) Σni=1 (Xi − X̄)² of σ² exceeds the Cramer - Rao lower bound 2σ⁴/n.


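A small C sketch of the conclusion of Problem 5.21 and Remark 5.9 follows; the values of n and σ² are assumed purely for illustration. Since the information matrix is diagonal, the two Cramer - Rao bounds and the actual variance of the sample variance can be printed directly.

#include <stdio.h>

int main(void)
{
    int n = 25;              /* assumed sample size         */
    double sigma2 = 4.0;     /* assumed population variance */
    double I11 = n/sigma2;                      /* information for theta   */
    double I22 = n/(2.0*sigma2*sigma2);         /* information for sigma^2 */
    printf("CR bound for theta    (sigma^2/n)   : %f\n", 1.0/I11);
    printf("CR bound for sigma^2  (2 sigma^4/n) : %f\n", 1.0/I22);
    printf("Var of S^2 = 2 sigma^4/(n-1)        : %f\n", 2.0*sigma2*sigma2/(n-1));
    return 0;
}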

5.10 Higher order information inequality

When the lower bound is not sharp, it can be improved by considering higher order derivatives of the likelihood function with respect to the parameter θ.
Assumptions: Let X1 , X2 , · · · , Xn be distributed with pdf p(x | θ), θ ∈ Ω.

(i) Ω is an open interval ( finite, infinite or semi infinite).

(ii) The range of the distribution Pθ , θ ∈ Ω is independent of the parameter θ.

(iii) For any x and θ ∈ Ω, the higher order derivatives

$$\frac{\partial^{\,i_1+i_2+\cdots+i_s} \log L(\theta)}{\partial \theta_1^{i_1}\cdots\,\partial \theta_s^{i_s}}$$

exist and are finite.

(iv) Define K(θ) = [Kij (θ)]s×s , where

$$K_{ij}(\theta) = E_\theta\!\left[
\frac{\partial^{\,i_1+\cdots+i_s} \log L(\theta)}{\partial \theta_1^{i_1}\cdots\,\partial \theta_s^{i_s}}\,
\frac{\partial^{\,j_1+\cdots+j_s} \log L(\theta)}{\partial \theta_1^{j_1}\cdots\,\partial \theta_s^{j_s}}\right].$$

Theorem 5.13 Suppose that the assumptions (i) to (iv) hold and that the covariance matrix K(θ) is positive definite. Let T = t(X) be any statistic with Eθ [T ² ] < ∞ for which the higher order derivatives τ^(i1 +···+is ) (θ) exist for each i = 1, 2, · · · , s and can be obtained by differentiating under the integral sign. Then Vθ [T ] ≥ α' K⁻¹(θ)α, where α is the vector with elements

$$\alpha_i = \frac{\partial^{\,i_1+\cdots+i_s} E_\theta[T]}{\partial \theta_1^{i_1}\cdots\,\partial \theta_s^{i_s}}
= Cov_\theta\!\left(T,\ \frac{\partial^{\,i_1+\cdots+i_s} \log L(\theta)}{\partial \theta_1^{i_1}\cdots\,\partial \theta_s^{i_s}}\right)
= \tau^{\,i_1+\cdots+i_s}(\theta).$$

Proof: As in Theorem 5.11, replace

$$\psi_i(x, \theta) = \frac{\partial^{\,i_1+\cdots+i_s} \log L(\theta)}{\partial \theta_1^{i_1}\cdots\,\partial \theta_s^{i_s}},$$

take C = K(θ) = [Kij (θ)]s×s and ν = α' = ( τ '(θ)  τ ''(θ)  · · ·  τ ⁽ˢ⁾(θ) ); then Vθ [T ] ≥ α' K⁻¹(θ)α.

Problem 5.25 Given that X ∼ b(n, θ), 0 < θ < 1, obtain the Bhattacharya bound for the unbiased estimator of the parametric function τ (θ) = θ².
Solution:

$$L(\theta) = p(x \mid \theta) = \binom{n}{x}\theta^x (1-\theta)^{n-x},
\qquad
\log L(\theta) = \log\binom{n}{x} + x\log\theta + (n-x)\log(1-\theta).$$

For s = 2 the matrix K(θ) = [Kij (θ)] is

$$K(\theta) = E_\theta
\begin{pmatrix}
\left(\dfrac{\partial \log L(\theta)}{\partial \theta}\right)^2 &
\dfrac{\partial \log L(\theta)}{\partial \theta}\,\dfrac{\partial^2 \log L(\theta)}{\partial \theta^2}\\[10pt]
\dfrac{\partial^2 \log L(\theta)}{\partial \theta^2}\,\dfrac{\partial \log L(\theta)}{\partial \theta} &
\left(\dfrac{\partial^2 \log L(\theta)}{\partial \theta^2}\right)^2
\end{pmatrix}.$$

Now

$$\frac{\partial \log L(\theta)}{\partial \theta} = \frac{x}{\theta} - \frac{n-x}{1-\theta}
= \frac{x - x\theta - n\theta + x\theta}{\theta(1-\theta)} = \frac{x - n\theta}{\theta(1-\theta)},
\qquad
\left(\frac{\partial \log L(\theta)}{\partial \theta}\right)^2 = \frac{(x-n\theta)^2}{\theta^2(1-\theta)^2},$$

$$E_\theta\!\left[\frac{\partial \log L(\theta)}{\partial \theta}\right]^2
= E_\theta\!\left[\frac{(X-n\theta)^2}{\theta^2(1-\theta)^2}\right]
= \frac{n\theta(1-\theta)}{\theta^2(1-\theta)^2} = \frac{n}{\theta(1-\theta)}.$$
$$E_\theta\!\left[\frac{\partial \log L(\theta)}{\partial \theta}\,\frac{\partial^2 \log L(\theta)}{\partial \theta^2}\right]
= E_\theta\!\left[\frac{\partial \log L(\theta)}{\partial \theta}\right]
E_\theta\!\left[\frac{\partial^2 \log L(\theta)}{\partial \theta^2}\right] = 0
\quad\text{since } E_\theta\!\left[\frac{\partial \log L(\theta)}{\partial \theta}\right] = 0,$$

and similarly Eθ [ (∂² log L(θ)/∂θ²)(∂ log L(θ)/∂θ) ] = 0. Further

$$E_\theta\!\left[\frac{\partial^2 \log L(\theta)}{\partial \theta^2}\right]
= -E_\theta\!\left[\frac{\partial \log L(\theta)}{\partial \theta}\right]^2 = -\frac{n}{\theta(1-\theta)},
\qquad
E_\theta\!\left[\frac{\partial^2 \log L(\theta)}{\partial \theta^2}\right]
E_\theta\!\left[\frac{\partial^2 \log L(\theta)}{\partial \theta^2}\right]
= \frac{n^2}{\theta^2(1-\theta)^2}.$$

Hence

$$K(\theta) = \begin{pmatrix} \dfrac{n}{\theta(1-\theta)} & 0\\[8pt] 0 & \dfrac{n^2}{\theta^2(1-\theta)^2} \end{pmatrix},
\qquad
K^{-1}(\theta) = \begin{pmatrix} \dfrac{\theta(1-\theta)}{n} & 0\\[8pt] 0 & \dfrac{\theta^2(1-\theta)^2}{n^2} \end{pmatrix}.$$

With τ (θ) = θ², τ '(θ) = 2θ, τ ''(θ) = 2,

$$V_\theta[T] \ \ge\ (2\theta,\ 2)
\begin{pmatrix} \dfrac{\theta(1-\theta)}{n} & 0\\[8pt] 0 & \dfrac{\theta^2(1-\theta)^2}{n^2} \end{pmatrix}
\begin{pmatrix} 2\theta\\ 2 \end{pmatrix}
= \frac{4\theta^3(1-\theta)}{n} + \frac{4\theta^2(1-\theta)^2}{n^2}$$

= Cramer - Rao lower bound for θ² + a positive quantity.
For comparison, writing the likelihood in terms of θ²,

$$\log L(\theta) = \log\frac{n!}{x!(n-x)!} + \frac{x}{2}\log \theta^2 + (n-x)\log\!\left[1-(\theta^2)^{\frac{1}{2}}\right],$$

and differentiating with respect to θ²,

$$\frac{\partial \log L(\theta)}{\partial \theta^2} = \frac{x-n\theta}{2\theta^2(1-\theta)},
\qquad
E_\theta\!\left[\frac{\partial \log L(\theta)}{\partial \theta^2}\right]^2
= E_\theta\!\left[\frac{(X-n\theta)^2}{4\theta^4(1-\theta)^2}\right]
= \frac{n\theta(1-\theta)}{4\theta^4(1-\theta)^2},$$

so that I(θ) = n/(4θ³(1 − θ)). The Cramer - Rao lower bound for the variance of an unbiased estimator of θ² is therefore 1/I(θ) = 4θ³(1 − θ)/n, since τ '(θ²) = 1.
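The following C sketch evaluates the two bounds of Problem 5.25 side by side; the values of n and θ are assumed only for illustration. The difference between them is the positive second term 4θ²(1 − θ)²/n².

#include <stdio.h>

int main(void)
{
    int n = 20;            /* assumed number of trials      */
    double theta = 0.3;    /* assumed success probability   */
    double cr    = 4.0*theta*theta*theta*(1.0-theta)/n;               /* Cramer - Rao bound */
    double extra = 4.0*theta*theta*(1.0-theta)*(1.0-theta)/(n*(double)n);
    printf("Cramer - Rao bound for theta^2 : %f\n", cr);
    printf("Bhattacharya bound (s = 2)     : %f\n", cr + extra);
    return 0;
}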
Remark 5.10 (i) The Bhattacharya inequality reduces to the Cramer - Rao inequality when s = 1, i.e., α1 = τ '(θ) and

$$K_{11}(\theta) = E_\theta\!\left[\frac{\partial \log L(\theta)}{\partial \theta}\,\frac{\partial \log L(\theta)}{\partial \theta}\right]
= E_\theta\!\left[\frac{\partial \log L(\theta)}{\partial \theta}\right]^2 = I(\theta),$$

$$V_\theta[T] \ge \alpha_1 [I^{-1}(\theta)]\alpha_1 = \frac{\alpha_1^2}{I(\theta)}
= \frac{[\tau'(\theta)]^2}{V_\theta\!\left[\dfrac{\partial \log L(\theta)}{\partial \theta}\right]}.$$

(ii) When s = 2 the Bhattacharya inequality gives a non-decreasing lower bound for the variance of an unbiased estimator of τ (θ). The Bhattacharya inequality is

$$V_\theta[T] \ge \alpha' K^{-1}(\theta)\alpha,
\qquad
\alpha' = \big(\tau'(\theta)\ \ \tau''(\theta)\big),
\qquad
K(\theta) = \begin{pmatrix} K_{11}(\theta) & K_{12}(\theta)\\ K_{21}(\theta) & K_{22}(\theta) \end{pmatrix}_{2\times 2}.$$

Consider

$$\begin{vmatrix}
V_\theta[T] & \tau'(\theta) & \tau''(\theta)\\
\tau'(\theta) & K_{11}(\theta) & K_{12}(\theta)\\
\tau''(\theta) & K_{21}(\theta) & K_{22}(\theta)
\end{vmatrix} \ \ge\ 0.$$

Expanding the determinant,

$$V_\theta[T]\big[K_{11}(\theta)K_{22}(\theta) - K_{12}^2(\theta)\big]
- \tau'(\theta)\big[\tau'(\theta)K_{22}(\theta) - \tau''(\theta)K_{12}(\theta)\big]
+ \tau''(\theta)\big[\tau'(\theta)K_{12}(\theta) - \tau''(\theta)K_{11}(\theta)\big] \ge 0,$$

$$V_\theta[T]\big[K_{11}(\theta)K_{22}(\theta) - K_{12}^2(\theta)\big]
\ \ge\ [\tau'(\theta)]^2 K_{22}(\theta) - 2\tau'(\theta)\tau''(\theta)K_{12}(\theta) + [\tau''(\theta)]^2 K_{11}(\theta).$$

Multiplying the right hand side by K11 (θ)/K11 (θ) and completing the square,

$$V_\theta[T] \ \ge\ \frac{1}{K_{11}(\theta)}\,
\frac{[\tau'(\theta)]^2 K_{12}^2(\theta) + [\tau'(\theta)]^2\big[K_{11}(\theta)K_{22}(\theta)-K_{12}^2(\theta)\big]
- 2\tau'(\theta)\tau''(\theta)K_{11}(\theta)K_{12}(\theta) + [\tau''(\theta)]^2 K_{11}^2(\theta)}
{K_{11}(\theta)K_{22}(\theta) - K_{12}^2(\theta)}$$

$$=\ \frac{1}{K_{11}(\theta)}\,
\frac{\big[\tau'(\theta)K_{12}(\theta) - \tau''(\theta)K_{11}(\theta)\big]^2
+ [\tau'(\theta)]^2\big[K_{11}(\theta)K_{22}(\theta) - K_{12}^2(\theta)\big]}
{K_{11}(\theta)K_{22}(\theta) - K_{12}^2(\theta)},$$

$$V_\theta[T] \ \ge\ \frac{[\tau'(\theta)]^2}{K_{11}(\theta)}
+ \frac{\big[\tau'(\theta)K_{12}(\theta) - \tau''(\theta)K_{11}(\theta)\big]^2}
{K_{11}(\theta)\big[K_{11}(\theta)K_{22}(\theta) - K_{12}^2(\theta)\big]}
\ =\ \text{Cramer - Rao bound} + \text{a positive quantity},$$

since K(θ) is positive definite, so that K11 (θ)K22 (θ) − K12² (θ) > 0 and K11 (θ) = Vθ [∂ log L(θ)/∂θ] > 0. Thus the Bhattacharya inequality is sharper than the Cramer - Rao inequality.

Problems

5.1 Let X1 , X2 , · · · , Xn be a random sample drawn from a normal population with mean
X1 +X2 +···+Xn X1 +X2 +···+Xn
θ. Which among the two estimators T1 = n and T2 = n is
better? Why?

5.2 Show that, under some conditions to be stated there is a lower limit to the variance
of an unbiased estimator. How you modify the lower limit to a biased estimator?

5.3 Let X1 , X2 be independent random variables each having a Poisson distribution with mean θ. Show that Vθ [(X1 + X2 )/2] ≤ Vθ [2X1 − X2 ]. Also justify the inequality by the Rao - Blackwell Theorem.

5.4 Show that Bhattacharya bound is better than Cramer - Rao bound.

5.5 Define Bhattacharya bound of order r. Also obtain Bhattacharya bound of order
2 for estimating θ2 unbiasedly, θ being the mean of a Bernoulli distribution from
which a sample of size n is available.

5.6 Let X and Y have a bivariate normal distribution with mean θ1 and θ2 with positive
variance σ12 and σ22 and with correlation coefficient ρ. Find Eθ2 [Y | X = x] = φ(x)
and variance of φ(X).

5.7 Mention the significance of Rao - Blackwell Theorem.

5.8 In what way is the Lehmann - Scheffe Theorem different from the Rao - Blackwell Theorem?

5.9 Let X be a Hypergeometric random variable with pmf

$$P_D\{X = x\} = \frac{\dbinom{D}{x}\dbinom{N-D}{n-x}}{\dbinom{N}{n}},$$

where max(0, D + n − N ) ≤ x ≤ min(n, D). Find the UMVUE for D, where N is


assumed to be known.

5.10 Let X1 , X2 , · · · , Xn be a random sample from a population with mean θ and finite variance, and let T = t(X) be an estimator of θ of the form T = Σni=1 αi Xi . If T is an unbiased estimator of θ that has minimum variance and T ' = t'(X) is another linear unbiased estimator of θ, then Covθ (T, T ') = Vθ [T ].

5.11 Let X1 , X2 , · · · , Xn be a random sample from p(x | θ) = θe−θx , θ > 0, x > 0. Show that (n − 1)/ Σni=1 Xi is the UMVUE of θ.

5.12 Stating the assumptions clearly, derive the Chapman - Robbin lower bound for the
variance of an unbiased estimator of a function of a real valued parameter θ.

5.13 A random sample X1 , X2 , · · · , Xn is available from a Poisson population with mean


λ. Using the unbiased estimator T = t(X1 , X2 ) = X12 − X2 . Obtain the UMVUE of
λ2 based on the sample.

5.14 State the Bhattacharya bound of order s. Also prove that it is a non - decreasing
function of s.

5.15 Define Bhattacharya bound. Show that it is sharper than the Cramer - Rao bound.

5.16 On the basis of a random sample of size n, the Cramer - Rao lower bound for the variance of an unbiased estimator of θ in

$$p_\theta(x) = \begin{cases} \dfrac{1}{\pi[1+(x-\theta)^2]} & -\infty < x < \infty,\ -\infty < \theta < \infty\\[6pt] 0 & \text{otherwise} \end{cases}$$

is equal to
(a) 1/n   (b) 1/n²   (c) 2/n   (d) 2/n²    Ans: (c)

5.17 T1 = t1 (X) and T2 = t2 (X) are independent unbiased estimators of θ with V [Ti ] = vi , i = 1, 2. The best linear unbiased estimator l1 T1 + l2 T2 of θ is the one for which
(a) l1 = l2 = 0.5
(b) l1 = v2 /(v1 + v2 ), l2 = v1 /(v1 + v2 )
(c) l1 = v1⁻¹/(v1⁻¹ + v2⁻¹)
(d) l1 = 0, l2 = 1 if v1 > v2 and vice versa    Ans: (b)

5.18 Consider the following statements:


If X1 , X2 , · · · , Xn are iid random variables with uniform distribution over (0, θ),
then
1. 2X̄ is an unbiased estimator of θ.
2. The largest among X1 , X2 , · · · , Xn is an unbiased, estimator of θ.
3. The largest among X1 , X2 , · · · , Xn is sufficient for θ.
4. ((n + 1)/n) X(n) is a minimum variance unbiased estimator of θ.
Of these statements
(a) 1 and 3 are correct
(b) 1 and 4 are correct
(c) 1 and 2 are correct
(d) 1 , 3 and 4 are correct Ans:(c)

5.19 Which one of the following is not necessary for the UMVU estimation of θ by
T = t(X)?
(a) Eθ [T − θ] = 0
(b) Eθ [T − θ]2 < ∞
(c) Eθ [T − θ]2 is minimum
(d) T is a linear function of observations. Ans:(d)

5.20 If T1 = t1 (X) and T2 = t2 (X) are unbiased estimators of θ and θ2 (0 < θ < 1) and
T is a sufficient statistic, then E[T1 | T ] − E[T2 | T ] is
(a) the minimum variance unbiased estimator of θ
(b) always an unbiased estimator of θ(1 − θ) which has variance not exceeding that
of θ(1 − θ)
(c) always the minimum variance unbiased estimator of θ(1 − θ)
(d) not an unbiased estimator of θ(1 − θ). Ans:(b)

5.21 T ' = t'(X) and T = t(X) are two unbiased estimators of τ (θ) with variances Vθ [T ] < ∞ and Vθ [T '] < ∞. The estimator T is said to be an efficient estimator of τ (θ), if
(a) Vθ [T ] < Vθ [T 0 ]
(b)Vθ [T ] > Vθ [T 0 ]
(c) Vθ [T ] = Vθ [T 0 ]
(d) none of the above Ans:(d)

5.22 T ' = t'(X) and T = t(X) are two unbiased estimators of τ (θ) with variances Vθ [T ] < ∞ and Vθ [T '] < ∞. The estimator T is an efficient estimator relative to T ' of the parameter τ (θ), if
(a) Vθ [T ] < Vθ [T 0 ]
(b) Vθ [T ] > Vθ [T 0 ]
(c) Vθ [T ] 6= Vθ [T 0 ]
(d) none of the above Ans:(a)

5.23 Suppose X1 , X2 , · · · , Xn are iid N (µ, σ²), −∞ < µ < ∞, σ² > 0. Then
(a) √( Σni=1 (Xi − X̄)²/(n − 1) ) is the minimum variance unbiased estimator of σ
(b) Σni=1 (Xi − X̄)²/(n − 1) is the minimum variance unbiased estimator of σ²
(c) Σni=1 (Xi − X̄)/(n − 1) is the minimum variance unbiased estimator of σ
(d) Σni=1 (Xi − X̄)²/n is the minimum variance unbiased estimator of σ²    Ans: (b)
6 METHODS OF POINT ESTIMATION

6.1 Introduction

Chapters 3 , 4 and 5 discuss the properties of a good estimator. The methods of


obtaining such estimators are

(i) Method of Maximum Likelihood Estimation

(ii) Method of Minimum Variance Bound Estimation

(iii) Method of Moments Estimation

(iv) Method of Least Square Estimation

(v) Method of Minimum Chi-Square Estimation

6.2 Method of maximum likelihood estimation

The principle of maximum likelihood estimation states that an estimate of θ, say θ̂(x), is a value within the admissible range of θ which makes the likelihood function L(θ) as large as possible, i.e., L(θ̂) ≥ L(θ) for every admissible value θ. Thus θ̂(x) is a solution of ∂L(θ)/∂θ = 0 with ∂²L(θ)/∂θ² < 0 at θ = θ̂(x); equivalently, ∂ log L(θ)/∂θ = 0 and ∂² log L(θ)/∂θ² < 0 at θ = θ̂(x). Any non - trivial solution θ̂(X) of these equations which maximizes L(θ) is called the Maximum Likelihood Estimator (MLE) of θ.
Problem 6.1 Let X1 , X2 , · · · , Xn be an iid random sample drawn from a population with pdf N (0, θ), θ > 0. Find the MLE of θ.
Solution: The likelihood function for θ of the sample of size n is

$$L(\theta) = \prod_{i=1}^{n} p_\theta(x_i)
= \left(\frac{1}{2\pi\theta}\right)^{n/2} e^{-\frac{1}{2\theta}\sum_{i=1}^{n} x_i^2},
\qquad
\log L(\theta) = -\frac{n}{2}\log 2\pi - \frac{n}{2}\log\theta - \frac{1}{2\theta}\sum_{i=1}^{n} x_i^2.$$

Differentiating with respect to θ,

$$\frac{\partial \log L(\theta)}{\partial \theta} = -\frac{n}{2\theta} + \frac{1}{2\theta^2}\sum_{i=1}^{n} x_i^2,
\qquad
\frac{\partial^2 \log L(\theta)}{\partial \theta^2} = \frac{n}{2\theta^2} - \frac{1}{\theta^3}\sum_{i=1}^{n} x_i^2.$$

For a maximum, ∂ log L(θ)/∂θ = 0 ⇒ −n + (1/θ) Σ xi² = 0, i.e., θ̂(x) = Σ xi²/n, and

$$\frac{\partial^2 \log L(\theta)}{\partial \theta^2}\bigg|_{\theta=\hat\theta(x)}
= -\frac{n}{2\hat\theta^2(x)} < 0.$$

The MLE of θ is θ̂(X) = Σni=1 Xi²/n.
Problem 6.2 A random sample of size n is drawn from a population having density function

$$p_\theta(x) = \begin{cases} \theta x^{\theta-1} & 0 < x < 1,\ 0 < \theta < \infty\\ 0 & \text{otherwise.} \end{cases}$$

Find the MLE of θ.
Solution: The likelihood function for θ of the sample of size n is

$$L(\theta) = \prod_{i=1}^{n} p_\theta(x_i) = \theta^n \prod_{i=1}^{n} x_i^{\theta-1},
\qquad
\log L(\theta) = n\log\theta + (\theta-1)\sum_{i=1}^{n}\log x_i,$$

$$\frac{\partial \log L(\theta)}{\partial \theta} = \frac{n}{\theta} + \sum_{i=1}^{n}\log x_i,
\qquad
\frac{\partial^2 \log L(\theta)}{\partial \theta^2} = -\frac{n}{\theta^2}.$$

For a maximum, ∂ log L(θ)/∂θ = 0 ⇒ n/θ + Σ log xi = 0, i.e., θ̂(x) = −n / Σni=1 log xi , and

$$\frac{\partial^2 \log L(\theta)}{\partial \theta^2}\bigg|_{\theta=\hat\theta(x)}
= -\frac{n(\sum \log x_i)^2}{n^2} = -\frac{(\sum \log x_i)^2}{n} < 0.$$

Thus the MLE of θ is θ̂(X) = −n / Σni=1 log Xi .
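A short C sketch of Problem 6.2 follows; the sample values are invented purely for illustration.

#include <stdio.h>
#include <math.h>

int main(void)
{
    /* assumed observations from theta * x^(theta - 1) on (0, 1) */
    double x[] = {0.42, 0.81, 0.65, 0.27, 0.93, 0.55};
    int n = sizeof(x)/sizeof(x[0]);
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += log(x[i]);                 /* sum of log x_i, a negative quantity */
    printf("MLE of theta = %f\n", -n/s);
    return 0;
}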

Problem 6.3 Let X1 , X2 , · · · , Xn be iid with common pdf

$$p_\theta(x) = \begin{cases} \dfrac{1}{\theta}\,e^{-x/\theta} & 0 < x < \infty,\ \theta > 0\\[4pt] 0 & \text{otherwise.} \end{cases}$$

Obtain the MLE of Pθ {X > 2}.
Solution: Let

$$p = P_\theta\{X > 2\} = 1 - P_\theta\{X \le 2\}
= 1 - \int_0^2 \frac{1}{\theta}e^{-x/\theta}\,dx = e^{-2/\theta},
\qquad
\log p = -\frac{2}{\theta}\ \Rightarrow\ \log\frac{1}{p} = \frac{2}{\theta}
\ \Rightarrow\ \theta = \frac{2}{\log(1/p)}.$$

A sample of size n is taken and it is known that k of the observations satisfy X > 2 and (n − k) of the observations satisfy X < 2. The likelihood function for p of the sample of size n is

$$L(p) = p^k (1-p)^{n-k},
\qquad
\log L(p) = k\log p + (n-k)\log(1-p),$$

$$\frac{\partial \log L(p)}{\partial p} = \frac{k}{p} - \frac{n-k}{1-p} = \frac{k-np}{p(1-p)},
\qquad
\frac{\partial^2 \log L(p)}{\partial p^2} = \frac{-np^2 - k + 2pk}{[p(1-p)]^2}.$$

For a maximum, ∂ log L(p)/∂p = 0 ⇒ k − np = 0, i.e., p̂ = k/n, and

$$\frac{\partial^2 \log L(p)}{\partial p^2}\bigg|_{p=\hat p=k/n}
= \frac{-n\frac{k^2}{n^2} - k + 2\frac{k^2}{n}}{\left[\frac{k}{n}\left(1-\frac{k}{n}\right)\right]^2}
= -\frac{k\left(1-\frac{k}{n}\right)}{\left[\frac{k}{n}\left(1-\frac{k}{n}\right)\right]^2} < 0
\quad\text{since } \frac{k}{n} < 1.$$

Thus the value of the MLE of p is p̂ = k/n. The value of the MLE of P {X > 2} is e^(−2/θ̂(x)) , where θ̂(x) = −2/ log(k/n).
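A short C sketch of Problem 6.3 follows; the data are invented for illustration only. It counts the k observations exceeding 2, forms p̂ = k/n and the implied θ̂ = −2/ log(k/n).

#include <stdio.h>
#include <math.h>

int main(void)
{
    /* assumed exponential-looking data, for illustration only */
    double x[] = {0.7, 3.1, 1.9, 4.6, 0.4, 2.8, 5.2, 1.1, 0.9, 2.3};
    int n = sizeof(x)/sizeof(x[0]), k = 0;
    for (int i = 0; i < n; i++)
        if (x[i] > 2.0) k++;                  /* observations exceeding 2 */
    double p = (double)k/n;                   /* MLE of P{X > 2}          */
    double theta = -2.0/log(p);               /* implied estimate of theta */
    printf("k = %d, p-hat = %f, theta-hat = %f\n", k, p, theta);
    return 0;
}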
Problem 6.4 Let X1 , X2 , · · · , Xn be a random sample drawn from a normal population with mean θ and variance σ². The density function is

$$p_{\theta,\sigma^2}(x) = \begin{cases}
\dfrac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{1}{2\sigma^2}(x-\theta)^2} & -\infty < x < \infty,\ -\infty < \theta < \infty,\ \sigma^2 > 0\\[6pt]
0 & \text{otherwise.}
\end{cases}$$

Find the MLE of

(i) θ when σ is known.

(ii) σ 2 when θ is known.

(iii) both θ and σ² when both are unknown.

Solution: Case (i) When σ² is known, the likelihood function for θ is

$$L(\theta) = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{1}{2\sigma^2}(x_i-\theta)^2}
= (2\pi\sigma^2)^{-n/2}\,e^{-\frac{1}{2\sigma^2}\sum (x_i-\theta)^2},
\qquad
\log L(\theta) = -\frac{n}{2}\log 2\pi - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum (x_i-\theta)^2,$$

$$\frac{\partial \log L(\theta)}{\partial \theta} = \frac{1}{\sigma^2}\sum (x_i-\theta),
\qquad
\frac{\partial^2 \log L(\theta)}{\partial \theta^2} = -\frac{n}{\sigma^2} < 0.$$

For a maximum, ∂ log L(θ)/∂θ = 0 ⇒ Σ (xi − θ) = 0, i.e., θ̂(x) = x̄, with ∂² log L(θ)/∂θ² < 0. Thus the value of the MLE of θ is θ̂(x) = x̄.


Case (ii) When θ is known, the likelihood function for σ 2 is

n n 1 X
log L(σ 2 ) = − log 2π − log σ 2 − 2 (xi − θ)2
2 2 2σ
∂ log L(σ 2 ) (xi − θ)2
P
n 1
= − +
∂σ 2 2 σ2 2(σ 2 )2
2
∂ log L(σ ) 2 n
P
(xi − θ)2
= −
∂(σ 2 )2 2σ 4 σ6
∂ log L(σ 2 )
For maximum, = 0
∂σ 2
n 1 X
⇒ − 2+ 4 (xi − θ)2 = 0
2σ 2σ
(xi − θ)2
P
σ̂ 2 (x) = and
n
∂ 2 log L(σ 2 )
< 0
∂(σ 2 )2 σ2 =σ̂2 (x)

Pn
(xi −θ)2
Thus the value of the MLE of σ2 is σ̂ 2 (x) = i=1
n .
Case (iii) When θ and σ² are both unknown, the log likelihood function for θ and σ² is

$$\log L(\theta, \sigma^2) = -\frac{n}{2}\log 2\pi - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum (x_i-\theta)^2,$$

$$\frac{\partial \log L(\theta,\sigma^2)}{\partial \theta} = \frac{1}{\sigma^2}\sum (x_i-\theta),
\qquad
\frac{\partial \log L(\theta,\sigma^2)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum (x_i-\theta)^2,$$

$$\frac{\partial^2 \log L(\theta,\sigma^2)}{\partial \theta^2} = -\frac{n}{\sigma^2},
\qquad
\frac{\partial^2 \log L(\theta,\sigma^2)}{\partial (\sigma^2)^2} = \frac{n}{2\sigma^4} - \frac{1}{\sigma^6}\sum (x_i-\theta)^2,
\qquad
\frac{\partial^2 \log L(\theta,\sigma^2)}{\partial \theta\,\partial \sigma^2}
= \frac{\partial^2 \log L(\theta,\sigma^2)}{\partial \sigma^2\,\partial \theta}
= -\frac{1}{\sigma^4}\sum (x_i-\theta),$$

since both mixed partial derivatives exist and are continuous. For a maximum of L(θ, σ²),

$$\frac{\partial \log L(\theta,\sigma^2)}{\partial \theta} = 0,
\qquad
\frac{\partial \log L(\theta,\sigma^2)}{\partial \sigma^2} = 0,
\qquad
\frac{\partial^2 \log L(\theta,\sigma^2)}{\partial \theta^2} < 0,$$

$$\text{and}\qquad
\frac{\partial^2 \log L(\theta,\sigma^2)}{\partial \theta^2}\,
\frac{\partial^2 \log L(\theta,\sigma^2)}{\partial (\sigma^2)^2}
- \left(\frac{\partial^2 \log L(\theta,\sigma^2)}{\partial \theta\,\partial \sigma^2}\right)^2 > 0
\quad\text{at } \theta = \hat\theta(x),\ \sigma^2 = \hat\sigma^2(x).$$

The first two equations give θ̂(x) = x̄ and σ̂²(x) = Σ (xi − x̄)²/n. At these values

$$\frac{\partial^2 \log L(\theta,\sigma^2)}{\partial \theta^2}\bigg|_{\theta=\hat\theta(x)} = -\frac{n}{\hat\sigma^2(x)} < 0,
\qquad
\frac{\partial^2 \log L(\theta,\sigma^2)}{\partial (\sigma^2)^2}\bigg|_{\sigma^2=\hat\sigma^2(x)} = -\frac{n}{2(\hat\sigma^2(x))^2} < 0,
\qquad
\frac{\partial^2 \log L(\theta,\sigma^2)}{\partial \theta\,\partial \sigma^2}\bigg|_{\theta=\hat\theta(x),\,\sigma^2=\hat\sigma^2(x)}
= -\frac{1}{\hat\sigma^4(x)}\sum (x_i-\bar x) = 0,$$

so that the determinant condition becomes (−n/σ̂²(x))(−n/(2σ̂⁴(x))) − 0 > 0. Hence the MLE values of θ and σ² are θ̂(x) = x̄ and σ̂²(x) = Σ (xi − x̄)²/n.
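A minimal C sketch of Case (iii) of Problem 6.4 follows; the data are invented for illustration.

#include <stdio.h>

int main(void)
{
    /* assumed normal-looking sample */
    double x[] = {4.2, 5.1, 3.8, 4.9, 5.6, 4.4, 5.0, 4.7};
    int n = sizeof(x)/sizeof(x[0]);
    double mean = 0.0, s2 = 0.0;
    for (int i = 0; i < n; i++) mean += x[i];
    mean /= n;                                   /* theta-hat = sample mean            */
    for (int i = 0; i < n; i++) s2 += (x[i]-mean)*(x[i]-mean);
    s2 /= n;                                     /* sigma^2-hat = (1/n) sum (x - xbar)^2 */
    printf("MLE of theta   = %f\n", mean);
    printf("MLE of sigma^2 = %f\n", s2);
    return 0;
}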
Problem 6.5 Find the MLE of the parameter α and λ ( λ being large) from a sample
of n independent observations from the population represented by the following density
function 
λ λ
 ( α ) e− αλ x xλ−1

x > 0, λ > 0, α > 0
Γλ
pα,λ (x) =
 0

otherwise
Also obtain the asymptotic form of the covariance for the two parameters for large n.
∂ log Γλ 1
Solution: Given that ∂λ ≈ log λ − 2λ .

Likelihood function for α and λ of the sample size n is


n
 nλ Y
1 λ
−α
Pn
x λ
L(α, λ) = e i=1 i xλ−1
i
(Γλ)n α i=1
n n
λX X
log L(α, λ) = −n log Γλ + nλ log λ − nλ log α − xi + (λ − 1) log xi
α i=1 i=1
P
∂ log L(α, λ) nλ xi
=− +λ 2
∂α α α
∂ 2 log L(α, λ)
P
nλ xi
2
= 2 − 2λ 3
∂α α α
∂ 2 log L(α, λ)
P
n xi
=− + 2
∂λ∂α α α
Pn n
∂ log L(α, λ) ∂ log Γλ i=1 xi
X
= −n + n(1 + log λ) − n log α − + log xi
∂λ ∂λ α i=1
P
∂ log L(α, λ) 1 xi X
= −n(log λ − ) + n + n log λ − n log α − + log xi
∂λ 2λ α

P
∂ log L(α, λ) n xi X
= + n − n log α − + log xi
∂λ 2λ α
∂ 2 log L(α, λ) n
2
=− 2
∂λ 2λ
∂ log L(α,λ) ∂ log L(α,λ)
For maximum of log L(α, λ), ∂α =0 and ∂λ =0
P
λ xi
−n +λ 2 = 0 ⇒ α̂(x) = x̄ and
P α α
n xi X
+ n − n log α − + log xi = 0
2λ α
n
⇒ λ̂(x) = Pn
2 i=1 (log x̄ − log xi )

∂ 2 log L(α, λ) n nx̄
Further =− + 2 =0
∂λ∂α
α=α̂(x),λ=λ̂(x)
x̄ x̄

∂ 2 log L(α, λ)
<0 and
∂λ2


λ=λ̂(x)
" #2
∂ 2 log L(α, λ) ∂ 2 log L(α, λ) ∂ 2 log L(α, λ)
2 2
− > 0 at α = α̂(x) and λ = λ̂(x)
∂λ ∂α ∂λ∂α
" #
n nλ̂(x) 2λ̂(x)nx̄ n2 1
i.e., − − − 0 = >0
2λ̂2 (x) x̄2 x̄3 λ̂(x)x̄2 2
Thus the value of the MLE of α and λ are α̂(x) = x̄ and λ̂(x) = 2 P(log nx̄−log x ) .
i

The asymptotic covariance matrix is


 i 
2 2
h i h
−Eα,λ ∂ log∂αL(α,λ)
2 −Eα,λ ∂ log
∂λ∂α
L(α,λ)
D=
 
h 2 i h 2 i 
−Eα,λ ∂ log
∂α∂λ
L(α,λ)
−Eα,λ ∂ log∂λL(α,λ)
2

n
" # " #
∂ 2 log L(α, λ) nλ 2λ X
−Eα,λ = − 2 + 3 Eα Xi
∂α2 α α i=1
nλ 2λ
= − + 3 nα
α2 α

= since Eα [Xi ] = α ∀ i
" # α2
∂ 2 log L(α, λ) n
−Eα,λ =
∂λ2 2λ2

The asymptotic covariance matrix at α = α̂(x) and λ = λ̂(x) is


 
nλ̂(x)
α̂2 (x)
0
D=
 

n
0
2λ̂2 (x)

Remark 6.1 If the likelihood equation ∂L(θ)/∂θ = 0 or ∂ log L(θ)/∂θ = 0 has more than one root, or L(θ) is not differentiable everywhere in Ω, then the MLE may be a terminal value or the middle value of the sample, and it need not be unbiased, sufficient, unique or consistent. If the likelihood function L(θ) is continuously differentiable and bounded above, then the likelihood equation has a unique solution which maximizes L(θ).
Problem 6.6 MLE is a terminal value
Show that the maximum likelihood estimate of the parameter α, when β is known, for the pdf

$$p_{\alpha,\beta}(x) = \begin{cases} \beta e^{-\beta(x-\alpha)} & \alpha \le x < \infty,\ \beta > 0,\ \alpha > 0\\ 0 & \text{otherwise} \end{cases}$$

from a sample of size n is α̂ = x(1) .
Solution: When β is known, the likelihood function for α of the sample size n is
Pn
L(α) = β n e−β i=1
(xi −α)
n
X
log L(α) = n log β − β (xi − α)
i=1
∂ log L(α)
= nβ
∂α

The direct method cannot help to estimate the MLE of α. Since α ≤ x(1) ≤ x(2) ≤
· · · ≤ x(n) < ∞ , i.e., the range of the distribution depends on the parameter α.

log L(α) = n log β − nβ x̄ + nβα

is maximum, if α is minimum , i.e., α̂ = x(1) = value of the minimum order statistic


of the sample. Thus the MLE value of α is the terminal value x(1) .
Problem 6.7 Let X1 , X2 , · · · , Xn be a random sample drawn from a population having density

$$p_\theta(x) = \begin{cases} \tfrac{1}{2}\,e^{-|x-\theta|} & -\infty < x < \infty,\ -\infty < \theta < \infty\\ 0 & \text{otherwise.} \end{cases}$$

Show that the sample median is the MLE of θ.



Solution: The likelihood function for θ of the sample of size n is

$$L(\theta) = \left(\frac{1}{2}\right)^n e^{-\sum |x_i-\theta|}
= \left(\frac{1}{2}\right)^n \frac{1}{e^{\sum |x_i-\theta|}}.$$

L(θ) is maximum if e^(Σ|xi −θ|) is minimum, i.e., if Σ |xi − θ| is minimum, which happens when θ̂(x) = the median of the sample values, since the mean deviation is least when measured from the median. Thus the value of the MLE of θ is the middle value of the sample.
Problem 6.8 MLE is not unbiased
Let X1 , X2 , · · · , X5 be a random sample of size 5 from the uniform distribution having pdf

$$p_\theta(x) = \begin{cases} \tfrac{1}{\theta} & 0 < x < \theta,\ \theta > 0\\ 0 & \text{otherwise.} \end{cases}$$

Show that the MLE of θ is not unbiased.
Solution: The likelihood function for θ of the sample of size n = 5 is

$$L(\theta) = \frac{1}{\theta^5} \quad\text{if } 0 < x_i < \theta,\ i = 1, 2, 3, 4, 5.$$

L(θ) is maximum when the estimate of θ is as small as possible. The choice θ̂(x) = min1≤i≤5 {xi } = x(1) is not consistent with the constraint 0 < xi < θ for every i, since the observations larger than x(1) would fall outside the range. The likelihood consistent with the constraint is

$$L[\hat\theta(x)] = \begin{cases} \left(\dfrac{1}{x_{(5)}}\right)^5 & 0 < x_i \le x_{(5)}\\[6pt] 0 & \text{otherwise,} \end{cases}$$

so the value of the MLE of θ is θ̂(x) = x(5) = max1≤i≤5 {xi }.
Let Y = max1≤i≤5 {Xi }. The pdf of Y is

$$p_\theta(y) = \begin{cases} \dfrac{5y^4}{\theta^5} & 0 < y < \theta\\[6pt] 0 & \text{otherwise,} \end{cases}
\qquad
E_\theta[Y] = \int_0^\theta \frac{5y^5}{\theta^5}\,dy = \frac{5}{6}\theta \ne \theta.$$

The MLE θ̂(X) = X(5) is not an unbiased estimator.


Problem 6.9 Give an example where the MLE is not unique and not a sufficient statistic
Solution: Let X1 , X2 , · · · , Xn be iid with the pdf

 1 θ ≤x≤θ+1

pθ (x) =
 0 otherwise

The likelihood function for θ of the sample size n is



 1 if θ ≤ xi ≤ θ + 1, i = 1, 2, · · · , n

L(θ) =
 0

otherwise

 1 if θ ≤ min{xi } ≤ max{xi } ≤ θ + 1, i = 1, 2, · · · , n

L(θ) =
 0

otherwise

 1 if θ ≤ min{xi } = x(1) ≤ max{xi } = x(n) ≤ θ + 1, i = 1, 2, · · · , n

L(θ) =
 0

otherwise

 1 if θ ∈ [x(n) − 1, x(1) ]

L(θ) =
 0

otherwise

Thus any point in [x(n) − 1, x(1) ] is a value of the MLE of θ. Thus the MLE of θ is not
unique and not sufficient statistic.
Problem 6.10 Give an example to show that the MLE may not exist
Solution: Let X1 , X2 , · · · , Xn be a random sample drawn from a population with pmf b(1, θ), 0 < θ < 1, where θ is unknown, and suppose only the sample values (0, 0, 0, · · · , 0) or (1, 1, · · · , 1) are available.
The likelihood function for θ of the sample of size n is

$$L(\theta) = \theta^{\sum x_i}(1-\theta)^{\,n-\sum x_i},
\qquad
\log L(\theta) = \Big(\sum x_i\Big)\log\theta + \Big(n-\sum x_i\Big)\log(1-\theta),$$

$$\frac{\partial \log L(\theta)}{\partial \theta} = \frac{\sum x_i}{\theta} - \frac{n-\sum x_i}{1-\theta}.$$

For a maximum, ∂ log L(θ)/∂θ = 0 ⇒ θ̂(x) = x̄, and

$$\frac{\partial^2 \log L(\theta)}{\partial \theta^2}\bigg|_{\theta=\bar x} < 0.$$

If (0, 0, · · · , 0) or (1, 1, · · · , 1) alone is observed, then x̄ = 0 or 1 is the value of the MLE of θ. This is not an admissible value of θ, since θ ∈ (0, 1). Thus the MLE of θ does not exist.
Problem 6.11 Give an example to show that the MLE need not be consistent
Solution:
     
 Xi    µi 
Let  ∼N  , σ 2 In  i = 1, 2, · · · , n

Yi µi

be independent vectors, where µi , i = 1, 2, · · · , n and σ 2 are unknown.


 
Xi µ 
i
E  Yi  = µi ∀ i = 1, 2, · · · , n andV [Xi ] = V [Yi ] = σ 2 ∀ i

The likelihood function for µi , i = 1, 2, · · · , n and σ 2 is


n 
1 − 12 (xi −µi )2 − 12 (yi −µi )2
Y 
2
L(µi , σ ) = e 2σ 2σ

i=1
2πσ 2

1 Pn 1 Pn
log L(µi , σ 2 ) = −n log 2π − n log σ 2 − 2σ 2 i=1 (xi − µi )2 − 2σ 2 i=1 (yi − µi )2

∂ log L(µi , σ 2 )
= 0
∂µi
1 1
⇒ 2 (xi − µi ) + 2 (yi − µi ) = 0
σ σ
xi + yi
⇒ µ̂i = , i = 1, 2, · · · , n
2

n n
" #
∂ log L(µi , σ 2 ) −n 1 X 2
X
= + (x i − µ i ) + (yi − µi )2 = 0
∂σ 2 σ2 2σ 4 i=1 i=1
n  n 
" #
−n 1 X x i + yi 2 X x i + yi 2
 
+ x i − + y i − =0
σ2 2σ 4 i=1 2 i=1
2
n n
" #
−n 1 1X 2 1X
+ (xi − yi ) + (xi − yi )2 = 0
σ2 2σ 4 4 i= 4 i=1
n
1 X
⇒ σ̂ 2 (x, y) = (xi − yi )2
4n i=1

If Vi = Xi − Yi , then Vi ∼ N (0, 2σ 2 ), i = 1, 2, · · · , n, since Xi ∼ N (µi , σ 2 ), Yi ∼


1 Pn
N (µi , σ 2 ), then the MLE of 2σ 2 is n i=1 Vi
2 . V1 2 , V2 2 , · · · , Vn 2 are iid random
variables each having χ2 variate with one degree of freedom. By Kolmogorov’s Strong
Law of Large Numbers
n
1X as
Vi 2 → Eσ2 [Vi2 ] = 2σ 2 as n → ∞ sinceEσ2 [V 2 ] = 2σ 2
n i=1
n
1 X as
i.e., V 2 → σ2 as n → ∞
2n i=1 i
n
1 X as σ2
i.e., V 2→ 6= σ 2 as n → ∞
4n i=1 i 2

n
2 1 X
Thus σ̂ (X, Y ) = (Xi − Yi )2 is not consistent estimator of σ 2 .
4n i=1

6.3 Numerical methods of maximum likelihood estimation

The likelihood equations are often difficult to solve explicitly for θ even in cases
where all the regularity conditions hold and the unique solution exist. Equations in
the exponential cases are very often non-linear and difficult to solve. It may difficult
to locate the global maximum of the likelihood function for the following cases,

(i) the family of distributions under consideration is not of the exponential type.

(ii) there exists multiple roots of the likelihood equations.


∂ log L(θ)
The use of successive iterations to solve the likelihood equations by assuming ∂θ

is continuous at θ for each xi , i = 1, 2, 3, · · · , n, where n is the sample size.


For example, a random variable has a Cauchy distribution depending on a location
parameter θ, i.e., 
 1
 1
−∞ < x < ∞
π 1+(x−θ)2
pθ (x) =
 0

otherwise
Taking a sample of size n from the population, the log likelihood function for θ is
n
X
log L(θ) = −n log π − log[1 + (xi − θ)2 ]
i=1
n 
∂ log L(θ) 2(xi − θ)
X 
= −
∂θ i=1
1 + (xi − θ)2

The likelihood equation


n 
2(xi − θ)
X 
=0
i=1
1 + (xi − θ)2
has no explicit solution. The log likelihood function of θ may have several local maxi-
mum for a given sample X1 , X2 , · · · , Xn . Suppose − log[1 + (xi − θ)2 ] has a maximum
h i
Pn 2(xi −θ)
at θ = xi , then sum − i=1 1+(xi −θ)2
may have up to n different local maxima and
it depends on the sample values. Newton - Raphson method is used to locate the local
maxima.
(i) Newton - Raphson method
The Newton - Raphson method on the expansion around θ̂(x) of the likelihood equa-
∂ log L(θ)
tion ∂θ is
∂ log L(θ̂(x))
 ∂ 2 log L[θ +ν θ̂(x)−θ ]

( 0)
= ∂ log∂θL(θ0 )
0
∂θ + θ̂(x) − θ0 ∂θ2
for some 0 < ν < 1 (6.1)
where θ̂(x) is the root of the likelihood equation and θ0 is an initial solution or trial
∂ log L(θ̂(x))
solution. Since θ̂(x) is the solution of equation ∂θ = 0 and if ν = 0, then

∂ log L(θ0 )   ∂ 2 log L(θ )


0
+ θ̂(x) − θ0 =0
∂θ ∂θ2
∂ log L(θ0 )
∂θ
⇒ θ̂(x) = θ0 − ∂ 2 log L(θ0 )
= θ1 (say) (6.2)
∂θ2
The value θ1 can be substituted in equation (6.1) for θ0 to obtain another value θ2 , so
that
∂ log L(θ1 )
∂θ
θ2 = θ1 − ∂ 2 log L(θ1 )
(6.3)
∂θ2
and so on. Starting from an initial solution θ0 , one can generate a sequence {θk , k =
0, 1, · · ·} which is determined successively by the formula
∂ log L(θk )
∂θ
θk+1 = θk − ∂ 2 log L(θk )
,k = 0, 1, 2, · · · (6.4)
∂θ 2

If the initial solution θ0 was chosen, close to the root of the likelihood equations θ̂(x)
∂ 2 log L(θk )
and if ∂θ2
for k = 0, 1, · · · , is bounded away from zero, there is a good chance
that the sequence generated by equation (6.4) will converge to the root θ̂(x). The
sequence {θk , k = 0, 1, · · · , } generated by equation (6.4) depends on the sample values
X1 , X2 , · · · Xn . If the chosen initial solution θ0 is a consistent estimator of θ, then

the sequence obtained by the equation (6.4) will faster converge to the root θ̂(x) and
provide the best asymptotically normal estimator of θ.
In small sample situations the sequence {θk , k = 0, 1, · · · , } generated by equation (6.4)
may convey irregularities due to the particular sample values obtained in the experi-
ment. In order to avoid irregularities in the approximating sequence, two methods are
proposed. They are fixed derivative method and method of scoring.
(ii) The method of fixed derivative
∂ 2 log L(θk )
In the fixed derivative method, the term ∂θ2
in equation (6.4) is replaced by
− ank where {ak , k = 0, 1, · · ·} is a suitable chosen sequence of constants and n is the
sample size.
Now the sequence {θk , k = 0, 1, · · ·} is generated by
ak ∂ log L(θk )
θk+1 = θk + , k = 0, 1, 2, · · · (6.5)
n ∂θ
The sequence {θk , k = 0, 1, · · · , } converge to the root θ̂(x) in a more regular fashion
rather than the equation (6.4) by the choice sequence {ak }∞
k=0

Fixed derivative method fails to converge in many cases, the method of scoring
may use to locate the local maximum, since the log likelihood curve is steep in the
neighbour hood of a local maximum equation (6.5).
(iii) The method of scoring
The method of scoring is a special case of the fixed derivative method. The
n
special sequence {ak , k = 0, 1, · · · , } is chosen by Fisher. It is ak = I(θk ) , where I(θk )
is the amount of Fisher Information of n observations x of X and θk is the value of
approximation after the (k − 1)th iteration. Thus Fisher’s scoring method generates
the sequence
1 ∂ log L(θk )
θk+1 = θk +
I(θk ) ∂θ
for the (k − 1)th iteration, k = 0, 1, 2, · · · . The method of iteration continues and stop
when the sequence {θk , k = 0, 1, · · · , } converges on a local maximum.
Problem 6.12 The following data represents a sample from a Cauchy population.
Obtain the maximum likelihood estimate for the parameter involved in the distribution
by the method of successive approximation.

7.3344 3.4004 3.944 4.434 6.304


4.444 7.784 10.844 8.604 6.334
5.998 4.406 6.394 5.006 9.582
Solution: The pdf of Cauchy distribution is

 1
 1
−∞ < x < ∞
π 1+(x−θ)2
pθ (x) =
 0

otherwise

Arrange the sample values in the increasing order of magnitude. Let the first trial
value of θ is θ̂(x) = t1 = the value of the sample median. The first approximation
value is
n
4X (xi − t1 )
 
t2 = t1 +
n i=1 1 + (xi − t1 )2
The successive iteration values are t3 , t4 , · · ·. This procedure is continued until any two
successive iterations values are equal. The convergent value is the MLE value of θ.

C programme for MLE of θ of Cauchy distribution

#include <stdio.h>
#include <math.h>

int main(void)
{
    int i, j, n;
    float a[101], sum[101], t[101], temp;

    printf("Enter the number of observations n:\n");
    scanf("%d", &n);
    printf("Enter the observations a:\n");
    for (i = 1; i <= n; i++)
        scanf("%f", &a[i]);
    /* sort the observations in increasing order of magnitude */
    for (i = 1; i <= n - 1; i++) {
        for (j = i + 1; j <= n; j++) {
            if (a[i] >= a[j]) {
                temp = a[i];
                a[i] = a[j];
                a[j] = temp;
            }
        }
    }
    /* first trial value t[1] = sample median */
    if (n % 2 == 0)
        t[1] = (a[n/2] + a[n/2 + 1]) / 2;
    else
        t[1] = a[(n + 1)/2];
    printf("\nOUTPUT\n\n");
    printf("Value of the MLE of the Cauchy Distribution\n");
    printf("\n--------------\n");
    for (i = 1; i <= n; i++)
        printf("\t%f\n", a[i]);
    printf("\nResult:\n\n");
    printf("Median = t[1] = %f\n\n", t[1]);
    for (j = 1; j <= n; j++) {
        sum[j] = 0;
        for (i = 1; i <= n; i++)
            sum[j] = sum[j] + (a[i] - t[j]) / (1 + (a[i] - t[j])*(a[i] - t[j]));
        printf("Sum[%d] = %f\t\n", j, sum[j]);
        /* successive approximation t[j+1] = t[j] + (4/n) * sum[j] */
        t[j + 1] = t[j] + (4 / (float)n)*(sum[j]);
        printf("t[%d] = %f\n", j + 1, t[j + 1]);
        if (fabs(t[j] - t[j + 1]) < 0.001)   /* stop when successive values agree */
            break;
    }
    printf("\nValue of the MLE of theta = %f\n", t[j]);
    return 0;
}
The value of the MLE of θ = 6.013498.
Problem 6.13 Obtain the values of the MLE's of the parameters b and c of the pdf

$$p_{b,c}(x) = \begin{cases} \dfrac{c}{b}\,x^{c-1}e^{-x^c/b} & x, b, c > 0\\[4pt] 0 & \text{otherwise} \end{cases}$$

based on a sample of size n.
Solution: The likelihood function for b and c of the sample of size n is

$$L(c, b) = \left(\frac{c}{b}\right)^n \prod_{i=1}^{n} x_i^{c-1}\, e^{-\frac{1}{b}\sum_{i=1}^{n} x_i^c},
\qquad
\log L(c, b) = n\log c - n\log b + (c-1)\sum \log x_i - \frac{1}{b}\sum_{i=1}^{n} x_i^c,$$

$$\frac{\partial \log L(c,b)}{\partial c} = \frac{n}{c} + \sum \log x_i - \frac{1}{b}\sum_{i=1}^{n} x_i^c \log x_i,
\qquad
\frac{\partial \log L(c,b)}{\partial b} = -\frac{n}{b} + \frac{1}{b^2}\sum_{i=1}^{n} x_i^c.$$

Case (i) When c is known, the maximum over b is obtained from

$$\frac{\partial \log L(b)}{\partial b} = -\frac{n}{b} + \frac{1}{b^2}\sum_{i=1}^{n} x_i^c = 0
\ \Rightarrow\ nb = \sum_{i=1}^{n} x_i^c,$$

so the value of the MLE of b is b̂(x) = Σni=1 xiᶜ /n.
Case (ii) When b is known, the maximum over c is obtained from

$$\frac{\partial \log L(c)}{\partial c} = 0
\ \Rightarrow\ \frac{n}{c} + \sum_{i=1}^{n}\log x_i - \frac{1}{b}\sum_{i=1}^{n} x_i^c \log x_i = 0,$$

i.e., nb + bc Σ log xi − c Σ xiᶜ log xi = 0. The estimate of the MLE of c is obtained by an iterative method.
Case (iii) When both c and b are unknown, the maximum of L(c, b) is obtained by solving

$$\frac{\partial \log L(c,b)}{\partial c} = 0 \quad\text{and}\quad \frac{\partial \log L(c,b)}{\partial b} = 0,
\qquad\text{i.e.,}\qquad
nb = \sum_{i=1}^{n} x_i^c
\quad\text{and}\quad
\frac{n}{c} + \sum_{i=1}^{n}\log x_i - \frac{n\sum_{i=1}^{n} x_i^c \log x_i}{\sum_{i=1}^{n} x_i^c} = 0.$$

The estimates of c and b are obtained by solving the above equations for c and b by an iterative method.
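The iterative step of Case (iii) can be carried out, for example, by bisection on the equation in c after substituting b̂ = Σ xiᶜ /n. The C sketch below is only an illustration of one such iteration; the data values and the bracketing interval [0.1, 20] are assumptions, not part of the problem.

#include <stdio.h>
#include <math.h>

static double x[] = {0.8, 1.5, 2.3, 0.6, 1.9, 3.2, 1.1, 2.7};   /* assumed data */
static int n = 8;

/* value of d log L / dc after substituting b-hat = (1/n) sum x_i^c */
static double g(double c)
{
    double s = 0.0, sl = 0.0, slog = 0.0;
    for (int i = 0; i < n; i++) {
        double xc = pow(x[i], c);
        s    += xc;                 /* sum x_i^c          */
        sl   += xc*log(x[i]);       /* sum x_i^c log x_i  */
        slog += log(x[i]);          /* sum log x_i        */
    }
    return n/c + slog - n*sl/s;
}

int main(void)
{
    double lo = 0.1, hi = 20.0;     /* assumed bracketing interval for c */
    for (int it = 0; it < 60; it++) {
        double mid = 0.5*(lo + hi);
        if (g(lo)*g(mid) <= 0.0) hi = mid; else lo = mid;   /* bisection step */
    }
    double c = 0.5*(lo + hi), b = 0.0;
    for (int i = 0; i < n; i++) b += pow(x[i], c);
    b /= n;                          /* b-hat = (1/n) sum x_i^c */
    printf("c-hat = %f, b-hat = %f\n", c, b);
    return 0;
}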

6.4 Optimum properties of MLE

Lemma 6.1 Denote X ∼ Pθ , θ ∈ Ω and it has pdf pθ (x)

(i) The probability distributions Pθ are distinct for distinct values of θ.

(ii) The range of the density functions p(x | θ) are independent of the parameter θ.

(iii) The random observations X1 , X2 , · · · , Xn on X are independent and identically


distributed.

(iv) Ω contains an open interval and Ω containing θ0 , the true value of θ as an interior
point in Ω.

Then Pθ0 {L(θ0 ) > L(θ1 )} → 1 as n → ∞ for θ0 and θ1 ∈ Ω.


Proof: The likelihood function of θ is
n
Y
L(θ1 ) = pθ (xi ) and
i=1

n
Y
L(θ0 ) = pθ0 (xi )
i=1
Define Sn = {x : L(θ0 ) > L(θ1 )}

Prove that Pθ0 {Sn } → 1 as n → ∞


L(θ0 ) L(θ0 )
 
> 1 ↔ log >0
L(θ1 ) L(θ1 )
n
pθ0 (xi )
X  
log > 0
i=1
pθ1 (xi )
n
pθ1 (xi )
X  
log < 0
i=1
pθ0 (xi )
n
1X pθ1 (xi )
 
log < 0
n i=1 pθ0 (xi )
n
( )
1X pθ1 (Xi )
n o  
lim Pθ0 {Sn } = Pθ0 lim Sn = Pθ0 lim log <0
n→∞ n→∞ n→∞ n pθ0 (Xi )
i=1
pθ1 (xi )
Since X1 , X2 , · · · , Xn are iid ⇒ pθ0 (xi ) are iid. By Khintchin’s Law of Large Numbers

n
1X pθ1 (xi ) pθ1 (X)
    
P
log → Eθ0 log as n → ∞
n i=1 pθ0 (xi ) pθ0 (X)

By Jensen’s Inequality for the convex function f (X) → E[f (X)] ≤ f (E[X]). Here
p (x) p (x) 1
− log pθθ0 (x) = log pθθ1 (x) is strictly convex.
1 0

pθ1 (x)
 
For the convex function, log
pθ0 (x)
pθ1 (X) pθ1 (X)
     
Eθ0 log ≤ log Eθ0
pθ0 (X) pθ0 (X)
pθ1 (X) pθ1 (x)
  Z
But Eθ0 = pθ (x)dx = 1
pθ0 (X) pθ0 (x) 0

L(θ0 )
 
.˙. lim Pθ0 {Sn } = Pθ0 lim >1
n→∞ n→∞ L(θ1 )
n
( )
1X pθ1 (Xi )
 
= Pθ0 lim log <0
n→∞ n pθ0 (Xi )
i=1
pθ1 (X)
    
= Pθ0 Eθ0 log <0 → 1 as n → ∞
pθ0 (X)
1 dy 1
y = log x is a concave function and − log x is a convex function, since dx
= x
> 0 ↑ ∀ x > 0 and
d2 y
dx2
= − x12 <0

pθ1 (X)
   
= Pθ0 log Eθ0 <0 → 1 as n → ∞
pθ0 (X)
Pθ0 {L(θ0 ) > L(θ1 )} → 1 as n → ∞

MLE is consistent

Theorem 6.1 (Dugue, 1937) If log L(θ) is differentiable in an interval including


the true value of θ, say θ0 , then under the assumptions of Lemma 6.1, the likelihood
∂ log L(θ)
equation ∂θ = 0 has a root with probability 1 as n → ∞ which is consistent for
θ0 .
Proof: Let θ0 be the true value of θ and consider an interval (θ0 ± δ), δ > 0.
n o
L(θ0 )
By Lemma 6.1 Pθ0 L(θ1 ) >1 → 1 as n → ∞, where θ1 = θ0 ± δ, since θ0 ∈
(θ0 − δ, θ0 + δ) and the likelihood function is continuous in (θ0 − δ, θ0 + δ).
L(θ) should have a relative maximum within (θ0 − δ, θ0 + δ) with probability tends to 1
as n → ∞, since L(θ) is differentiable over (θ0 − δ, θ0 + δ).
∂ log L(θ)
⇒ ∂θ = 0 at some point in (θ0 − δ, θ0 + δ)
∂ log L(θ)
⇒ θ̂(x) is a solution of ∂θ = 0 in (θ0 − δ, θ0 + δ)
⇒ θ̂(X) ∈ [θ0 − δ, θ0 + δ] with probability tends to 1 as n → ∞
n o
⇒ Pθ0 θ0 − δ < θ̂(X) < θ0 + δ → 1 as n → ∞
n o
⇒ Pθ0 θ̂(X) − θ0 < δ → 1as n → ∞

P
⇒ θ̂(X) → θ0 as n → ∞
⇒ θ̂(X) is a consistent estimator of θ.

MLE maximizes the likelihood

Theorem 6.2 ( Huzurbazar, 1948) If log L(θ) is twice differentiable in an interval


including the true value of the parameter, than the consistent solution of the likelihood
equation [ which exists with probability one by Theorem 6.1 ] maximizes the likelihood
at the true value with probability tends to one, i.e.,
 
 ∂ 2 log L(θ) 
Pθ0 <0 → 1 as n → ∞

∂θ2

 
θ=θ̂(x)

∂2 log L(θ)
Proof: Expanding ∂ 2 θ2
as Taylor’s series around θ̂(x) is
∂2 log L[θ̂(x)] ∂2 log L(θ0 ) 3 log L(θ? )
∂θ2
= ∂θ2
+ [θ̂(x) − θ0 ] ∂ ∂θ3
where θ? = θ0 + ν(θ̂(x) − θ0 ), 0 < ν < 1

3 log L(θ? )

Further, assume ∂ ≤ H(x) ∀ θ ∈ Ω and Eθ0 [H(X)] < ∞ is independent of

∂θ3

θ0 .

∂ 2 log L[θ̂(x)] ∂ 2 log L(θ0 ) ∂ 3 log L(θ ? )
− ≤ |θ̂(x) − θ0 |

∂θ2 ∂θ2 3

∂θ

≤ |θ̂(x) − θ0 |H(x)

P P
|θ̂(X) − θ0 |H(X) → 0 as n → ∞ since θ̂(X) → θ0 as n → ∞

( )
∂ 2 log L[θ̂(X)] ∂ 2 log L(θ0 )
Pθ0 − < → 1 as n → ∞

∂θ2 ∂θ2

Each X1 , X2 , · · · , Xn is iid and by Khintchin’s Law of Large Numbers

n
" #
1X ∂ 2 log pθ (xi ) P ∂ 2 log pθ (X)
→ Eθ0 as n → ∞
n i=1 ∂θ2 ∂θ2
" #
∂ 2 log pθ (X)
Since I(θ0 ) ≥ 0 ⇒ Eθ0 = −I(θ0 ) < 0
∂θ2
n
( )
. 1X ∂ 2 log pθ (X)
. .Pθ0 <0 → 1 as n → ∞
n i=1 ∂θ2
 
n

Y  ∂ 2 log L(θ) 
Since L(θ) = pθ (xi ) ⇒ P θ0 <0 → 1 as n → ∞

2

i=1
 ∂θ
θ=θ̂(x)

MLE is asymptotically normal

Let X1 , X2 , · · · , Xn be random observations on X with pdf pθ (x), θ ∈ Ω.


Assumptions:

∂ log L(θ) ∂ 2 log L(θ) ∂ 3 log L(θ)


(i) ∂θ , ∂θ2 , and ∂θ3
exist for all x and over an interval containing
the true value of θ say θ0 .

∂ 2 log L(θ)
h i h i
∂ log L(θ)
(ii) Eθ0 ∂θ = 0, Eθ0 ∂θ2
= −nI(θ0 ) < 0 ∀ θ ∈ Ω where I(θ0 ) is the
amount of information for a single observation x of X.
3

(iii) ∂ log L(θ)
≤ H(x) and Eθ0 [H(X)] < ∞ is independent of θ0 .

∂θ3

Theorem 6.3 ( Cramer 1946) Let θ̂(X) be the MLE of θ, then under the regularity
p
conditions (i) to (iii) nI(θ0 )(θ̂(X) − θ0 ) has an asymptotic normal distribution with
mean zero and variance one
∂ log L(θ)
Proof: Let θ̂(X) be the solution of ∂θ = 0 in an interval containing the true
value θ0 of θ.
∂ log L(θ)
Expanding the function ∂θ around θ̂(x) by using Taylor’s series for any fixed x,

   2
∂ log L θ̂(x) ∂ log L(θ0 )   ∂ 2 log L(θ )
0
θ̂(x) − θ0 ∂ 3 log L(θ? )
i.e., = + θ̂(x) − θ0 +
∂θ ∂θ ∂θ2 2! ∂θ3
 
where θ? = θ0 + ν θ̂(x) − θ0 , 0 < ν < 1.
2
∂ log L(θ̂(x)) ∂ log L(θ0 )  ∂ 2 log L(θ0 ) θ̂(x) − θ0 ∂ 3 log L(θ? )
But =0 → + θ̂(x) − θ0 2
+ =0
∂θ ∂θ ∂θ 2 ∂θ3

 2
  ∂ 2 log L(θ )
0
θ̂(x) − θ0 ∂ 3 log L(θ? ) ∂ log L(θ0 )
θ̂(x) − θ0 + =−
∂θ2 2 ∂θ3 ∂θ
   
  ∂ 2 log L(θ )
0
θ̂(x) − θ 0 ∂ 3 log L(θ ? ) ∂ log L(θ0 )
θ̂(x) − θ0  2
+ 3
=−
∂θ 2 ∂θ ∂θ

  1 ∂ log L(θ0 )
n ∂θ
θ̂(x) − θ0 =  
1 ∂ 2 log L(θ )
0 (θ̂(x)−θ0 ) 1 ∂ 3 log L(θ ? )
−n ∂θ2
− 2 n ∂θ3

I(θ0 )
nI(θ0 ) n1 ∂ log∂θL(θ0 )
p
 
p I(θ0 )
nI(θ0 ) θ̂(x) − θ0 =  
2 (θ̂(x)−θ0 ) 1 ∂ 3 log L(θ? )
− n1 ∂ log L(θ0 )
∂θ 2 − 2 n ∂θ 3

1 ∂ log L(θ0 )
q √ ∂θ
nI(θ0 )
 
nI(θ0 ) θ̂(x) − θ0 =  
2 (θ̂(x)−θ0 ) 1 ∂ 3 log L(θ? )
1
I(θ0 ) − n1 ∂ log L(θ0 )
∂θ2
− 2 n ∂θ3

By Khintchin’s Law of Large Numbers


n
" #
1X ∂ 2 log pθ (xi ) P ∂ 2 log pθ (X)
→ Eθ 0 as n→∞
n i=1 ∂θ2 ∂θ2

n
1X ∂ 2 log pθ (xi ) P
→ −I(θ0 ) as n → ∞
n i=1 ∂θ2

P P
Also θ̂(X) → θ0 as n → ∞ ⇒ θ̂(X) − θ0 → 0 as n → ∞ and Eθ0 [H(X)] = k as
h i
∂ log pθ (xi ) ∂ log pθ (Xi )
n → ∞. Denote Zi = ∂θ ,i = 1, 2, · · · , n. Eθ0 [Zi ] = Eθ0 ∂θ =0 ∀i=
1, 2, · · · , n. Let Sn = Z1 + · · · + Zn , then E[Sn ] = 0 and V [Sn ] = I(θ0 ) + · · · + I(θ0 ) =
nI(θ0 )
1 ∂ log L(θ0 )
q √ ∂θ
nI(θ0 )
 
nI(θ0 ) θ̂(X) − θ0 = h i as n → ∞
1
I(θ0 ) − n1 (−nI(θ0 )) − 0
q   ∂ log L(θ0 )
∂θ
nI(θ0 ) θ̂(X) − θ0 = p as n → ∞
nI(θ0 )
n −E[Sn ] d
By Lindeberg - Levey Central Limit Theorem S√ → N (0, 1) as n → ∞.
V [Sn ]
p  
.˙. nI(θ0 ) θ̂(X) − θ0 ∼ N (0, 1) as n → ∞.
Remark 6.2 Any consistent estimator θ̂(X) of roots of the likelihood equation satisfies

n(θ̂(X) − θ0 ) ∼ N (0, I(θ10 ) ), then θ̂(X) is an efficient likelihood estimator of θ or
asymptotically normal and efficient estimator of θ.

MLE is unique

Theorem 6.4 ( Wald 1949) Consistent solution of a likelihood equation is unique


with probability 1 as n → ∞
∂ log L(θ)
Proof: Let θˆ1 (x) and θˆ2 (x) be two consistent solutions of ∂θ = 0 and
 
θˆ1 (x) 6= θˆ2 (x) . By Huzurbazar’s Theorem

∂ 2 log L(θˆ1 (X))


( )
Pθ <0 → 1 as n → ∞ and
∂θ2

∂ 2 log L(θˆ2 (X))


( )
Pθ <0 → 1 as n → ∞
∂θ2
∂ log L(θ) ∂ 2 log L(θˆ3 (x))
Applying Rolle’s Theorem to the function ∂θ which gives ∂θ2
= 0 for
 
some θˆ3 (x) within the interval θˆ1 (x), θˆ2 (x) where θˆ3 (x) = λθˆ1 (x)+(1−λ)θˆ2 (x), 0 <
∂ log L(θ)
λ < 1. θˆ3 (x) is also a consistent solution of ∂θ = 0. Thus

∂ 2 log L(θˆ3 (X))


( )
Pθ <0 → 1 as n → ∞
∂θ2

∂ 2 log L(θˆ3 (x)) ∂ 2 log L(θ̂(x))


∂θ2
< 0 is a contradiction to Rolle’s Theorem property that ∂θ2
=0
 
for some θˆ3 (x) within the interval θˆ1 (x), θˆ2 (x) . The only possibility is θˆ1 (x) = θˆ2 (x).
Thus θˆ1 (x) = θˆ2 (x) is a consistent solution of the likelihood equation and is unique.

Invariant Property of MLE

Let X ∼ Pθ , θ ∈ Ω, where Ω is a k dimensional parameter space. Consider


g(θ) : Ω → O where O is the r dimensional space (r ≤ k). If θ̂ is the MLE of θ, then
g(θ̂) is the MLE of g(θ).
Let g(θ) be the function of θfrom Ω to O, i.e.,g : Ω → O ∀θ ∈ Ω
i.e., g(θ) = ω ∈ O . For a fixed ω ∈ O, let

Aω = [θ | g(θ) = ω]

= the set of all θ0 s such that g(θ) = ω fixed ∀ω∈O

.. . ∩ ω Aω = Ω

In other words for any given θ ∈ Ω, we can find a ω ∈ O such that θ ∈ Aω


Let θ̂ be the MLE of θ, i.e, L(θ̂) is maximized at θ = θ̂, .. . θ̂ ∈ Ω.
⇒ given θ̂, we can find ω̂ = g(θ̂) such that θ̂ ∈ Aω .
Thus θ̂ is the MLE of θ
⇒ g(θ̂) is the MLE of g(θ).

Relation between one parameter exponential family and MLE

Let X1 , X2 , · · · , Xn be a random sample on X according to a one parameter expo-


nential family with density

pθ (x) = c(θ)eQ(θ)t(x) h(x)

= eθt(x)−A(θ) h(x)

where c(θ) = e−A(θ) and Q(θ) = θ


The likelihood function for θ of the sample size n is
Pn
L(θ) = eθ i=1
t(xi )−nA(θ)
h(x) where h(x) = h1 (x1 , x2 , · · · , xn )

n
X
log L(θ) = θ t(xi ) − nA(θ) + log h(x)
i=1
n
∂ log L(θ)
t(xi ) − nA0 (θ)
X
=
∂θ i=1

n
∂ log L(θ) 1X
For maximum, = 0 ⇒ A0 (θ) = t(xi ) (6.6)
∂θ n i=1
and
∂ 2 log L(θ)
= −nA00 (θ) < 0
∂θ2
Z
Consider eθt(x)−A(θ) h(x)dx = 1

Assume that the integral is continuous and has derivatives of all orders with respect
to θ and it can be differentiated under the integral sign.
Z Z
t(x)eθt(x)−A(θ) h(x)dx − A0 (θ)e−A(θ) eθt(x) h(x)dx = 0
Z
0
Eθ [T ] = A (θ) eθt(x)−A(θ) h(x)dx

A0 (θ) = Eθ [T ] (6.7)

1 Pn
Using equations (6.6) and (6.7), one may get Eθ [T ] = n i=1 t(xi )
Z Z
t(x)eθt(x)−A(θ) h(x)dx − A0 (θ) = 0 since eθt(x) e−A(θ) h(x)dx = 1

Again differentiating with respect to θ


Z
t2 (x)eθt(x)−A(θ) h(x)dx − A00 (θ) − A0 (θ)Eθ [T ] = 0

Eθ [T 2 ] = A00 (θ) + A0 (θ)Eθ [T ]

Eθ [T 2 ] − (Eθ [T ])2 = A00 (θ) since A0 (θ) = Eθ [T ]


∂ 2 A(θ)
i.e., = Vθ [T ]
∂θ2
√  
Thus n θ̂(X) − θ ∼ N (0, Vθ [T ]) , i.e., θ̂(X) is consistent, unique and asymptoti-
cally normal.

Relationship between sufficient statistic and MLE

If sufficient statistic exists, then the MLE is a function of sufficient statistics.


Let X1 , X2 , · · · , Xn be iid random sample with pdf pθ (x). Let T = t(X) be the
sufficient statistic. The likelihood function for θ of the sample size n is
n
Y
L(θ) = pθ (xi )
i=1
= pθ (t)h(x) where h(x) = h1 (x1 , x2 , · · · , xn )

log L(θ) = log pθ (t) + log h(x)


∂ log L(θ) ∂ log pθ (t)
= and
∂θ ∂θ
∂ 2 log L(θ) ∂ 2 log pθ (t)
=
∂θ2 ∂θ2
∂ 2 log L(θ)

∂ log L(θ) ∂ log pθ (t)
For MLE, ∂θ = 0 and ∂θ2 < 0 are equivalent to ∂θ = 0 and
θ=θ̂(x)
∂2

log pθ (t)
∂θ 2 < 0. Thus MLE is a function of the sufficient statistic.
θ=θ̂(x)

6.5 Method of minimum variance bound estimation

A statistic T = t(X) is said to be a MVBE if it attains the Cramer - Rao lower


bound.
Theorem 6.5 A necessary and sufficient condition for a statistic T = t(X) is a
∂ log L(θ)
MVBE of τ (θ) is ∂θ and [t(x) − τ (θ)] are proportional.
∂ log L(θ) ∂ log L(θ)
Proof: Assume ∂θ and t(x) − τ (θ) are proportional, i.e., ∂θ ∝ t(x) − τ (θ),
i.e.,
∂ log L(θ)
= A(θ)[t(x) − τ (θ)] (6.8)
∂θ
where A(θ) is function of θ only.
To Prove T = t(X) is MVBE of τ (θ), it is enough to prove

[τ 0 (θ)]2
Vθ [T ] = ∀ θ∈Ω
∂ log L(θ) 2
h i
Eθ ∂θ

 
∂ log L(θ)
Covθ T, = τ 0 (θ), ∀ θ ∈ Ω.
∂θ

 
∂ log L(θ)
i.e., Eθ T = τ 0 (θ), ∀ θ ∈ Ω.
∂θ
   
∂ logL(θ) 0 ∂ log L(θ)
Eθ (T − τ (θ)) = τ (θ), since Eθ =0∀θ∈Ω
∂θ ∂θ
∂ log L(θ)
A(θ)Eθ [T − τ (θ)]2 = τ 0 (θ) since = A(θ)[t(x) − τ 0 (θ)]
∂θ
A(θ)Vθ [T ] = τ 0 (θ)
τ 0 (θ)
A(θ) =
Vθ [T ]
Squaring both sides of (5.8), one can get
 2
∂ log L(θ) 2
= A2 (θ) [t(x) − τ 0 (θ)]
∂θ
 2
∂ log L(θ)
Eθ = A2 (θ)Vθ [T ]
∂θ
2
[τ 0 (θ)]2 Vθ [T ]

∂ log L(θ)
i.e., Eθ = 2
∂θ {Vθ [T ]}
[τ 0 (θ)]2
i.e., Vθ [T ] = h i2 ∀ θ ∈ Ω
∂ log L(θ)
Eθ ∂θ

T = t(X) attains the Cramer - Rao lower bound, i.e., T = t(X) is a MVBE of τ (θ).
∂ log L(θ)
Conversely, assume T = t(X) is a MVBE of τ (θ). Now to prove ∂θ ∝ [t(x) −
∂ log L(θ) 2 0 2
h i
∂ log L(θ)
τ (θ)], i.e., ∂θ = A(θ)[t(x) − τ (θ)], τ 0 (θ) = A(θ)Vθ [T ] and Eθ ∂θ = [τVθ(θ)]
[T ]

2
∂ log L(θ) A2 (θ)Vθ2 [T ]

.˙. Eθ =
∂θ Vθ [T ]
∂ log L(θ) 2
 
Eθ = A2 (θ)Vθ [T ]
∂θ
∂ log L(θ) 2
 
Eθ = A2 (θ)Eθ [T − τ (θ)]2
∂θ
∂ log L(θ)
⇒ = A(θ)[t(x) − τ (θ)]
∂θ
∂ log L(θ)
i.e., ∝ [t(x) − τ (θ)]
∂θ

Problem 6.14 Let X1 , X2 , · · · , Xn be iid random sample drawn from a population


with density function

 θxθ−1

0 < x < 1, θ > 0
pθ (x) =
 0

otherwise

Obtain the MVBE of θ.



The likelihood function for θ of the sample size n is


n
xθ−1
Y
L(θ) = θn i
i=1
n
X
log L(θ) = n log θ + (θ − 1) log xi
i=1
n
∂ log L(θ) n X
= + log xi
∂θ θ i=1
" n #
X −n
= log xi −
i=1
θ
Pn −n −n
t(x) = i=1 log xi , τ (θ) = θ and A(θ) = 1. Thus the MVBE of τ (θ)(= θ ) is
Pn Pn τ 0 (θ) n
i=1 log Xi and the variance of the estimator i=1 log Xi is A(θ) = θ2
.
The MVBE of θ is θ̂(X) = Pn −nlog X .
i=1 i

Problem 6.15 Let X1 , X2 , · · · , Xn be a random sample of size n drawn from a


population with pdf

1 − xθ p−1
θp Γp e x x > 0, θ > 0


pθ (x) =
 0

otherwise

Obtain the MVBE of θ when p is known.


Solution: The likelihood function for θ of the sample size n is
Pn
x
1 i=1 i
hY ip−1
L(θ) = e− θ xi
(Γp)n θnp
P n
xi X
log L(θ) = −n log Γp − np log θ − + (p − 1) log xi
θ i=1
∂ log L(θ) np nx̄
= 0− + 2
∂θ θ θ
n
= [x̄ − pθ]
θ2 
np x̄

= − θ
θ2 p
np x̄
τ (θ) = θ, A(θ) = 2 , t(x) =
θ p
X̄ τ 0 (θ) x̄2
The MVBE of τ (θ) is T = p, when p is known and Vθ [T ] = A(θ) = np3
.
Theorem 6.6 The necessary and sufficient condition that distribution admits the
estimator of a suitable chosen function of a parameter with variance equal to the in-
formation limit ( MVB) is that the likelihood function L(θ) = eθ1 t(x)+θ2 h(x), where

h(x) and t(x) are functions of observations only and θ1 and θ2 are functions of θ only.
The parametric functions to be estimated is − dθ dθ2 dθ
dθ1 = − dθ dθ1 and the variance of the
2

2
h  i
estimator is − ddθθ22 = d
dθ − dθ2
dθ1 dθ
1
1 dθ1
Proof: Let T = t(X) be the MVBE of τ (θ) where θ is the population parameter. For a
single observation x of X, the likelihood function for θ is L(θ) = pθ (x), and t(x)−τ (θ)
∂ log L(θ)
and ∂θ are proportional, i.e.,

∂ log L(θ)
= A(θ)[t(x) − τ (θ)]
∂θ

where A(θ) is a function of θ only.


Integrating with respect to θ, one can get
Z Z
∂ log L(θ) = A(θ)[t(x) − τ (θ)]dθ + c

where c is a constant of integration and free from θ.

log L(θ) = t(x)θ1 + θ2 + c

L(θ) = eθ1 t(x)+θ2 +c

= eθ1 t(x)+θ2 ec

= eθ1 t(x)+θ2 h(x) where ec = h(x)

Thus the condition is necessary.


Conversely, the likelihood function L(θ) is expressible in the form

L(θ) = eθ1 t(x)+θ2 h(x)


Z Z
L(θ)dx = h(x)et(x)θ1 +θ2 dx = 1
Z
i.e., h(x)et(x)θ1 dx = e−θ2

Further, assuming the differentiation with respect to θ1 under the integral sign is valid
and differentiate twice, one can get

dθ2
Z  
h(x)et(x)θ1 t(x)dx = e−θ2 − (6.9)
dθ1

2
dθ2 d2 θ2
Z 
h(x)et(x)θ1 [t2 (x)]dx = e−θ2 − e−θ2 (6.10)
dθ1 dθ12
From equation (6.9), Eθ [T ] = − dθ
dθ1 = τ (θ)
2

 2 2
dθ2
− ddθθ22
R 2
From equation (6.10), t (x)et(x)θ1 +θ2 h(x)dx = dθ1 1
2
dθ2 d2 θ2

Eθ [T 2 ] = −
dθ1 dθ12
d2 θ2
Vθ [T ] = Eθ [T 2 ] − (Eθ [T ])2 = −
dθ12
The variance of the MVBE of τ (θ) is
n o
d dθ2
dθ (− dθ1 )
[τ 0 (θ)]2
=
∂ log L(θ) 2 ∂ log L(θ) 2
h i h i
Eθ ∂θ Eθ ∂θ

[− dθd1 { dθ2 dθ1 2


dθ1 } dθ ]
=
∂ log L(θ) 2
h i
Eθ ∂θ
2
( ddθθ22 )2 { dθ1 2
dθ }
1
=
∂ log L(θ) 2
h i
Eθ ∂θ

But log L(θ) = t(x)θ1 + θ2 + log h(x)

∂ log L(θ) dθ1 dθ2


= t(x) +
∂θ dθ dθ
∂ 2 log L(θ) d2 θ1 d2 θ2
= t(x) 2 +
" ∂θ2 # dθ dθ2
∂ 2 log L(θ) d2 θ1 d2 θ2
Eθ = E θ [T ] +
∂θ2 dθ2 dθ2
d2 θ1 dθ2 d2 θ2
 
= − +
dθ2 dθ1 dθ2
dθ2 dθ2 dθ1
=
dθ dθ1 dθ
d2 θ2 d dθ2 dθ1
 
=
dθ2 dθ dθ1 dθ
dθ2
d[ dθ1
] dθ1 dθ2 d2 θ1
= +
dθ dθ dθ1 dθ2
d2 θ2 dθ1 2 dθ2 d2 θ1
 
= +
dθ12 dθ dθ1 dθ2
2 " #
∂ log L(θ) ∂ 2 log L(θ)

But Eθ = −Eθ
∂θ ∂θ2

 
The variance of the MVBE of τ (θ) = − dθ2
dθ1 is
2
{ ddθθ22 }2 { dθ1 2
dθ } d2 θ2
1
=−
− d2 θ2
( dθ1 2
dθ )
dθ12
dθ12
2
The variance of T = t(X) is − ddθθ22 . Thus T = t(X) attains the MVB of the parametric
1

function τ (θ).
Problem 6.16 Let X1 , X2 , · · · , Xn be a random sample drawn from the population
with pdf 
 θxθ−1

0 < x < 1, θ > 0
pθ (x) =
 0

otherwise
Find the MVBE of θ.
Solution: The likelihood function for θ is
n
!θ−1
Y
n
L(θ) = θ xi
i=1
n
X
log L(θ) = n log θ + (θ − 1) log xi
i=1
P Pn
log xi −
⇒ L(θ) = en log θ+θ i=1
log xi

⇒ L(θ) = eθ1 t(x)+θ2 h(x)

where θ1 = θ, θ2 = n log θ,
P
h(x) = e− log xi , t(x) =
X
log xi
dθ2 n
τ (θ) = − =−
dθ1 θ
2 d nθ

d θ2 n
Vθ [T ] = − 2 = − = 2
dθ1 dθ θ
−n −n

P
Since Eθ [T ] = τ (θ) = θ log Xi = θ .

Thus the MVBE of θ is θ̂(X) = P −n


log X
.
i

Relationship between MVBE and MLE

If MVBE exists, then the MLE is a function of the MVBE.


Assume that the MVBE T = t(X) exists for the parameter θ, then
∂ log L(θ)
= A(θ)[t(x) − θ]
∂θ

∂ log L(θ) ∂ 2 log L(θ)


L(θ) attains maximum, if ∂θ = 0 and ∂θ2
< 0 at θ = θ̂(x).

i.e., A(θ)[t(x) − θ] = 0 ⇒ θ̂(x) = t(x) and

∂ 2 log L(θ)
= A0 (θ)[t(x) − θ] + A(θ)(−1)
∂θ2
2
∂ log L(θ)
= −A(θ̂(x)) < 0 at θ = θ̂(x)
∂θ2
ˆ is MLE of θ.
where θ(X)
Problem 6.17 If T = t(X) is MVBE of τ (θ) and pθ (x1 , x2 , · · · , xn ) the joint density
function corresponding to n independent observations of a random variable X , then
∂ log pθ (x1 ,x2 ,···,xn )
show that correlation between T and ∂θ is unity.
Solution: Given T = t(X) is the MVUE of τ (θ), i.e., T attains the Cramer Rao
lower bound,
[τ 0 (θ)]2
⇒ Vθ [T ] = ] θ∈Ω
Vθ [ ∂ log pθ (x∂θ
1 ,x2 ,···,xn )

∂ log pθ (x1 , x2 , · · · , xn )
i.e., [τ 0 (θ)]2 = Vθ [T ]Vθ [ ]
∂θ
s
∂ log pθ (x1 , x2 , · · · , xn )
τ 0 (θ) = Vθ [T ]Vθ [ ]
∂θ

But τ (θ) = Eθ [T ]
Z
= tpθ (x1 , x2 , · · · , xn )dx
∂pθ (x1 , x2 , · · · , xn )
Z
τ 0 (θ) = t dx
∂θ
∂pθ (x1 , x2 , · · · , xn ) pθ (x1 , x2 , · · · , xn )
Z
= t dx
∂θ pθ (x1 , x2 , · · · , xn )
∂ log pθ (x1 , x2 , · · · , xn )
Z
= t pθ (x1 , x2 , · · · , xn )dx
∂θ
∂ log pθ (x1 , x2 , · · · , xn )
 
= Eθ T
∂θ
log pθ (x1 , x2 , · · · , xn )
 
= Covθ T,
∂θ
log pθ (x1 ,x2 ,···,xn )
Correlation coefficient between T and ∂θ is
h  i
Covθ T, log pθ (x1∂θ
,x2 ,···,xn )
ρ= r h i
Vθ [T ]Vθ ∂ log pθ (x∂θ
1 ,x2 ,···,xn )

τ 0 (θ)
ρ = r h i
∂ log pθ (x1 ,x2 ,···,xn )
Vθ [T ]Vθ ∂θ

= 1
r h i
∂ log pθ (x1 ,x2 ,···,xn )
Since τ 0 (θ) = Vθ [T ]Vθ ∂θ

6.6 Method of moment estimation

Let X1 , X2 , · · · , Xn be iid random sample of size n with pdf pθ (x) where θ =


(θ1 , θ2 , · · · , θk ) of k parameters. Define µ0r = Eθ [X r ], r = 1, 2, · · · , k. The method of
moments estimation is a principle of solving a set of k equations in θ1 , θ2 , · · · , θk to
estimate the parameters θ1 , θ2 , · · · , θk , i.e., θ̂(µ0 ) = µ01 , µ02 , · · · , µ0k . Replace µ0r by m0r ,
where m0r is the rth raw moment of the random sample. It gives the moment estimators
of the parameters.
Remark 6.3 Moment estimators are consistent under suitable conditions. For iid
random sample X1 , X2 , · · · , Xn with pdf pθ (x) ∀ θ ∈ Ω ,
n
1X P
X r ⇒ E[X r ] as n → ∞, r = 1, 2, · · ·
n i=1 i

This is not true when the moments of the distribution do not exist. For example in the
case of Cauchy distribution moment estimators do not exist.
Problem 6.18 A random sample of size n is taken from the log normal distribution

1
 √1 1 − 2σ2 (log x−θ)2
e x>0

pθ,σ2 (x) = 2πσ x
 0

otherwise

Find the moment estimates of θ and σ 2 .


Solution: Consider the rth order moment

1 xr − 12 (log x−θ)2
Z ∞
E[X r ] = e 2σ √ dx
0 2πσ x
Take y = log x, i.e., ey = x ⇒ ey dy = dx
Z ∞
1 1 2
r
E[X ] = √ ery e− 2σ2 (y−θ) dy
0 2πσ
y−θ
Let = z ⇒ y = σz + θ, dy = σdz
σ

Z ∞
r 1 1 2
E[X ] = √ erθ− 2 z +rσz dz
−∞ 2π
erθ
Z ∞
1 2 −2rσz]
= √ e− 2 [z dz
2π −∞
r2 σ2
erθ+ 2
Z ∞
1 2
= √ e− 2 [z−rσ] dz
2π −∞
2 2
rθ+ r 2σ √ Z ∞ √
0 e 1 2
µr = √ 2π since e− 2 [z−rσ] dz = 2π
2π −∞
r2 σ2
µr0 = erθ+ 2 r = 1, 2, · · ·
σ2 2
when r = 1 log µ10 = θ + , 2 log µ10 = 2θ + σ 2 , log µ10 = 2θ + σ 2
2 !
2 µ20
when r = 2 log µ20 = 2θ + 2σ , 2
log µ20 − log µ10 2
= σ , log = σ2
(µ10 )2
m02
P r
x
 
2
⇒ σ̂ (x) = log where m0r = i
r = 1, 2, · · ·
(m01 )2 n
m02
 
log(m01 )2 = 2θ̂(x) + log
(m01 )2
m02
 
log(m01 )2 − log = 2θ̂(x)
(m01 )2
 
(m01 )2 
i.e., θ̂(x) = log  q
m02

Problem 6.19 Find the moment estimates of α and β for the pdf

$$p_{\alpha,\beta}(x) = \begin{cases} \dfrac{\alpha^\beta}{\Gamma\beta}\,e^{-\alpha x}x^{\beta-1} & x > 0,\ \beta > 0,\ \alpha > 0\\[6pt] 0 & \text{otherwise} \end{cases}$$

by using a sample of size n.
Solution:

$$E[X^r] = \int_0^\infty x^r\,\frac{\alpha^\beta}{\Gamma\beta}e^{-\alpha x}x^{\beta-1}\,dx
= \frac{\alpha^\beta}{\Gamma\beta}\int_0^\infty e^{-\alpha x}x^{r+\beta-1}\,dx
= \frac{\Gamma(\beta+r)}{\alpha^r\,\Gamma\beta},
\qquad r = 1, 2, \cdots$$

$$\text{When } r = 1:\ \ \mu_1' = \frac{\Gamma(\beta+1)}{\alpha\Gamma\beta} = \frac{\beta}{\alpha},
\qquad
\text{when } r = 2:\ \ \mu_2' = \frac{\Gamma(\beta+2)}{\alpha^2\Gamma\beta} = \frac{\beta^2+\beta}{\alpha^2},$$

$$\frac{\mu_2'}{(\mu_1')^2} = 1 + \frac{1}{\beta}
\ \Rightarrow\ \beta = \frac{(\mu_1')^2}{\mu_2},$$

where µ2 = µ2' − (µ1')² is the second central moment. Thus β̂(x) = (m1')²/m2 and α̂(x) = m1'/m2 , where m1' = Σ xi /n and m2 = Σ xi²/n − (Σ xi /n)².
Problem 6.20 Obtain the moment estimate of the parameter θ of the pdf

 1 e−|x−θ|

−∞ < x < ∞
2
pθ (x) =
 0

otherwise

by taking a sample of size n.


Solution: Consider
Z ∞
x −|x−θ|
µ01 = Eθ [X] = e dx
2 −∞
|x − θ| = x − θ if x ≥ θ

= −(x − θ) if x ≤ θ
Z θ
x (x−θ)
Z ∞ x −(x−θ)
µ01 = e dx + e dx
−∞ 2 θ 2
when x − θ = t ⇒ x = t + θ
Z 0 Z ∞
2µ01 = (t + θ)et dt + (t + θ)e−t dt
−∞ 0
Z ∞ Z 0 Z ∞ 
= θ e−|t| dt + tet dt + te−t dt
−∞ −∞ 0
Z ∞ Z ∞ Z ∞
−|t| −t
= θ e dt − θ te dt + θ te−t dt
−∞ 0 0
Z ∞
1 −|t|
= θ since e dt = 1
2 −∞
P
xi
µ01 = θ ⇒ θ̂(x) = m01 where m01 = .
n

Problem 6.21 Based on a random sample of size n, obtain the moment estimates of the parameters a and b of the rectangular distribution.
Solution: The pdf of the rectangular distribution is

$$p_{a,b}(x) = \begin{cases} \dfrac{1}{b-a} & a < x < b\\[6pt] 0 & \text{otherwise.} \end{cases}$$

$$\mu_1' = E[X] = \int_a^b \frac{x}{b-a}\,dx = \frac{a+b}{2},
\qquad
\mu_2' = E[X^2] = \int_a^b \frac{x^2}{b-a}\,dx = \frac{b^3-a^3}{3(b-a)} = \frac{b^2+ab+a^2}{3}.$$

$$\mu_2' = \frac{b^2+2ab+a^2-ab}{3} = \frac{(2\mu_1')^2 - ab}{3}
\ \Rightarrow\ 3\mu_2' = 4(\mu_1')^2 - ab,
\qquad b = 2\mu_1' - a.$$

$$\therefore\ 3\mu_2' = 4(\mu_1')^2 - a(2\mu_1'-a)
\ \Rightarrow\ a^2 - 2a\mu_1' + 4(\mu_1')^2 - 3\mu_2' = 0,$$

$$a = \frac{2\mu_1' \pm \sqrt{4(\mu_1')^2 - 4\big(4(\mu_1')^2 - 3\mu_2'\big)}}{2}
= \mu_1' \pm \sqrt{3\big(\mu_2'-(\mu_1')^2\big)} = \mu_1' \pm \sqrt{3\mu_2},$$

where µ2 is the second central moment. Taking a < b gives â(x) = m1' − √(3m2) and, since b = 2µ1' − a, b̂(x) = m1' + √(3m2). Thus the values of the moment estimators of a and b are â(x) = m1' − √(3m2) and b̂(x) = m1' + √(3m2), where m1' = Σ xi /n and m2 = Σ xi²/n − (Σ xi /n)².

Problem 6.22 A random variable X has the following distribution function

X=x 0 1 2
Pθ {X = x} 1 − θ − θ2 θ θ2

Obtain the moment estimate of θ, if in a sample of 25 observation there were 10 ones


and 4 twos.
Solution: Constructing the following Table based on information given in the problem,

X=x Pθ {X = x} Frequency f
0 1 − θ − θ2 11
1 θ 10
2 θ2 4
P
Total 1 fi = 25
P
fx
Solution: One can get, µ01 = Eθ [X] = (1 − θ − θ2 ) × 0 + θ × 1 + θ2 × 2 = P fi i
i
0+10+8
0+θ+ 2θ2 = 25

50θ2 + 25θ − 18 = 0

−25± 625+4×50×18
θ̂(x) = 2×50 = 0.4

Problem 6.23 Let X1, X2, ···, Xn be a random sample drawn from a population with pdf

p_{α,β}(x) = (α/β) x^{α−1} e^{-x^α/β},  x > 0, α > 0, β > 0,  and 0 otherwise.

Obtain the moment estimates of α and β.

Solution: Compute the rth order moment

E[X^r] = ∫₀^∞ x^r (α/β) x^{α−1} e^{-x^α/β} dx
       = (1/β) ∫₀^∞ y^{r/α} e^{-y/β} dy,  where y = x^α

µ′_r = (1/β) Γ(r/α + 1)/(1/β)^{r/α + 1} = β^{r/α} Γ(r/α + 1)

µ′_1 = β^{1/α} Γ(1/α + 1) and µ′_2 = β^{2/α} Γ(2/α + 1)

µ_2 = β^{2/α} Γ(2/α + 1) − β^{2/α} [Γ(1/α + 1)]²

µ_2/(µ′_1)² = [Γ(2/α + 1) − (Γ(1/α + 1))²]/(Γ(1/α + 1))²

The coefficient of variation is S/X̄, where S² = Σ(Xᵢ − X̄)²/(n − 1) and X̄ = ΣXᵢ/n. Equating

S²/x̄² = [Γ(2/α + 1) − (Γ(1/α + 1))²]/(Γ(1/α + 1))²

and using an iterative method gives the estimate α̂(x). From the estimate α̂(x) one can then obtain the estimate of β, since m′_1 = β̂^{1/α̂} Γ(1/α̂ + 1), i.e., β̂(x) = [m′_1/Γ(1/α̂ + 1)]^{α̂}.
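A hedged sketch of the iterative step, under the assumption that SciPy is available and using simulated positive data purely for illustration:

import numpy as np
from scipy.special import gamma
from scipy.optimize import brentq

# Solve S^2/xbar^2 = [Gamma(2/a+1) - Gamma(1/a+1)^2] / Gamma(1/a+1)^2 for alpha,
# then recover beta from m'_1 = beta^(1/alpha) Gamma(1/alpha + 1)
rng = np.random.default_rng(3)
x = 2.0 * rng.weibull(1.5, size=2000)          # illustrative positive sample

xbar, s2 = np.mean(x), np.var(x, ddof=1)
target = s2 / xbar**2

def cv2(a):
    g1, g2 = gamma(1/a + 1), gamma(2/a + 1)
    return (g2 - g1**2) / g1**2

alpha_hat = brentq(lambda a: cv2(a) - target, 0.1, 20.0)   # root-finding iteration
beta_hat = (xbar / gamma(1/alpha_hat + 1)) ** alpha_hat
print(alpha_hat, beta_hat)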

6.7 Method of minimum Chi - Square estimation

Suppose that a sample contains r classes with observed frequencies f1, f2, ···, f_r such that Σᵢ fᵢ = f. Let πᵢ(θ) be the probability of an observation falling in the ith class, with Σᵢ πᵢ(θ) = 1; the probability πᵢ(θ) is a function of θ. Let T = t(X) be any statistic for the parameter θ, where θ is unknown. A statistic T = t(X) is called the minimum χ² estimator of the parameter θ if it is obtained by minimizing χ² with respect to θ, i.e.,

χ² = Σᵢ [fᵢ − f πᵢ(θ)]²/(f πᵢ(θ)) = Σᵢ fᵢ²/(f πᵢ(θ)) − f

∂χ²/∂θ = − Σᵢ [fᵢ²/(f πᵢ²(θ))] dπᵢ(θ)/dθ = 0
⇒ Σᵢ [fᵢ²/(f πᵢ²(θ))] dπᵢ(θ)/dθ = 0

A solution of this equation is called the minimum χ² estimate of θ.

Remark 6.4 Minimum χ² estimation is analogous to maximum likelihood estimation of θ. The asymptotic properties of minimum χ² estimators are similar to those of MLEs.

A modified form of the minimum χ² estimator is obtained by minimizing

χ²_mod = Σᵢ [f πᵢ(θ) − fᵢ]²/fᵢ = Σᵢ f² πᵢ²(θ)/fᵢ − f

∂χ²_mod/∂θ = 2 Σᵢ (f²/fᵢ) πᵢ(θ) dπᵢ(θ)/dθ = 0
⇒ Σᵢ (f²/fᵢ) dπᵢ²(θ)/dθ = 0

Solving this equation for θ gives the modified minimum χ² estimate of θ.


Problem 6.24 Find the minimum χ² estimate of the parameter θ of the Poisson distribution.
Solution: Let π_j(θ) = e^{-θ}θ^j/j!, j = 0, 1, ···. Then

dπ_j(θ)/dθ = e^{-θ}θ^{j−1} j/j! − e^{-θ}θ^j/j! = π_j(θ)(j/θ − 1)

dχ²/dθ = − Σ_j [f_j²/(f π_j²(θ))] dπ_j(θ)/dθ = 0

i.e., Σ_j [f_j²/π_j²(θ)] π_j(θ)(j/θ − 1) = 0, that is, Σ_j [f_j²/π_j(θ)](1 − j/θ) = 0.

An iterative method may be used to solve this equation for θ. Alternatively, expand f(θ) = Σ_j [f_j²/π_j(θ)](1 − j/θ) in a Taylor series as a function of θ up to first order about the sample mean x̄, where x̄ is the trial value of θ:

Σ_j [f_j²/π_j(θ)](1 − j/θ) = Σ_j [f_j²/m_j](1 − j/x̄) + (θ − x̄) Σ_j [f_j²/m_j][j/x̄² + (1 − j/x̄)²]

where π_j(x̄) = m_j = e^{-x̄} x̄^j/j! and f(θ) = f(x̄) + (θ − x̄)f′(x̄), since

d/dθ {[1/π_j(θ)](1 − j/θ)} = [1/π_j(θ)](j/θ²) − (1 − j/θ)[1/π_j²(θ)] dπ_j(θ)/dθ
                           = [1/π_j(θ)][j/θ² − (1 − j/θ)(j/θ − 1)]
                           = [1/π_j(θ)][j/θ² + (1 − j/θ)²]

But Σ_j [f_j²/π_j(θ)](1 − j/θ) = 0, so

Σ_j [f_j²/m_j](1 − j/x̄) + (θ − x̄) Σ_j [f_j²/m_j][j/x̄² + (1 − j/x̄)²] = 0

θ − x̄ = − Σ_j [f_j²/m_j](1 − j/x̄) / Σ_j [f_j²/m_j][j/x̄² + (1 − j/x̄)²]
      = − Σ_j [f_j²/m_j](x̄ − j)(1/x̄) / Σ_j [f_j²/m_j][j + (x̄ − j)²](1/x̄²)

Let θ_1 = x̄ + x̄ Σ_j [f_j²/m_j](j − x̄) / Σ_j [f_j²/m_j][j + (j − x̄)²]

To improve the value of θ from x̄, repeat the process until a convergent value of θ is obtained.
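The update θ_1 above can be iterated numerically; the sketch below is an assumed illustration (the class frequencies are hypothetical and SciPy is assumed):

import numpy as np
from scipy.stats import poisson

# One-step minimum chi-square update for a Poisson parameter, iterated to convergence
counts = np.array([0, 1, 2, 3, 4, 5])            # class labels j
freqs = np.array([120, 200, 170, 90, 35, 10])     # observed frequencies f_j (hypothetical)

theta = np.sum(counts * freqs) / np.sum(freqs)    # trial value x-bar
for _ in range(50):
    m = poisson.pmf(counts, theta)                # pi_j at the current trial value
    w = freqs**2 / m
    theta_new = theta + theta * np.sum(w * (counts - theta)) / np.sum(w * (counts + (counts - theta)**2))
    if abs(theta_new - theta) < 1e-8:
        theta = theta_new
        break
    theta = theta_new
print(theta)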
Problem 6.25 Show that, for large sample sizes, maximizing the likelihood function associated with the χ² statistic is equivalent to minimizing the χ² statistic.
Solution: Let o_j be the observed frequency and e_j the theoretical frequency of the jth class, so that χ² = Σ_j (o_j − e_j)²/e_j. For a large fixed sample size n, the distribution of the quantities o_j, j = 1, 2, ···, r, is given by the likelihood function

L = [n!/(o1!o2!···o_r!)] (e1/n)^{o1} (e2/n)^{o2} ··· (e_r/n)^{o_r},  with o1 + o2 + ··· + o_r = n
  = [n!/(o1!o2!···o_r!)] (e1/o1)^{o1} ··· (e_r/o_r)^{o_r} (o1/n)^{o1} ··· (o_r/n)^{o_r}

log L = constant + Σ_j o_j log(e_j/o_j)

For a large fixed sample size write e_j = o_j + a_j n^{1−δ}, δ > 0; taking δ = ½ gives e_j = o_j + a_j n^{1/2}, where the a_j are bounded. Since Σ_j o_j = Σ_j e_j = n, we have n^{1/2} Σ_j a_j = 0, i.e., Σ_j a_j = 0, and a_j n^{1/2}/o_j → 0 as n → ∞. Then

log L = constant + Σ_j o_j log[(o_j + a_j n^{1/2})/o_j]
      = constant + Σ_j o_j log(1 + a_j n^{1/2}/o_j)
      = constant + Σ_j o_j [a_j n^{1/2}/o_j − a_j² n/(2o_j²) + ···]
      = constant + Σ_j a_j n^{1/2} − ½ Σ_j a_j² n/o_j + O(n^{-1/2})
      = constant − ½ Σ_j (e_j − o_j)²/o_j + O(n^{-1/2})

If the modified χ² statistic is defined as χ²_mod = Σ_j (e_j − o_j)²/o_j, then

log L = constant − ½ χ²_mod  as n → ∞.

To prove that χ² = χ²_mod as n → ∞, consider

χ² − χ²_mod = Σ_j (o_j − e_j)²/e_j − Σ_j (e_j − o_j)²/o_j
            = Σ_j [(e_j − o_j)²/o_j][o_j/e_j − 1]
            = Σ_j [(e_j − o_j)²/o_j][(1 + a_j n^{1/2}/o_j)^{-1} − 1]
            = Σ_j [(e_j − o_j)²/o_j][−a_j n^{1/2}/o_j + a_j² n/o_j² − ···]
            = − Σ_j a_j³ n^{3/2}/o_j² + Σ_j a_j⁴ n²/o_j³ − ···,  since e_j − o_j = a_j n^{1/2}.

Since o_j is of order n and the a_j are bounded, Σ_j a_j³ n^{3/2}/o_j² = o(n^{-1/2}) and Σ_j a_j⁴ n²/o_j³ = o(n^{-1/2}). Hence

χ² − χ²_mod = o(n^{-1/2}) → 0 as n → ∞.

Thus log L = constant − ½ χ² as n → ∞, and

max log L = constant − ½ min χ² as n → ∞.

Maximizing the likelihood function of the χ² statistic is therefore equivalent to minimizing the χ² statistic.

6.8 Method of least square estimation

Consider a linear model Y = Xθ + ε, where ε is a non-observable random vector such that E[εᵢ] = 0 and V[εᵢ] = σ² ∀ i, Y is the observed vector and θ is the parameter vector to be estimated:

Y = (y1, y2, ···, yn)′ is n × 1,  θ = (θ1, θ2, ···, θm)′ is m × 1,  ε = (ε1, ε2, ···, εn)′ is n × 1,

and X = ((x_ij)) is the n × m coefficient matrix of the parameter θ.

Definition 6.1 An estimate θ̂(x) of θ which minimizes (Y − Xθ)′(Y − Xθ) is called the Least Square Estimator (LSE) of θ. With ε = Y − Xθ, ε′ = (Y − Xθ)′, define S = ε′ε = (Y − Xθ)′(Y − Xθ). The necessary condition for the minimisation of S is dS/dθ = 0 and the sufficient condition is that d²S/dθ² is positive definite at θ = θ̂(x).

dS/dθ = −2X′(Y − Xθ) = 0
X′Y − X′Xθ = 0
X′Xθ = X′Y
⇒ θ̂(x) = (X′X)^{-1} X′Y, provided (X′X)^{-1} exists.

The LSE of θ is unbiased, i.e., E_θ[θ̂(X)] = θ:

θ̂(x) = (X′X)^{-1}X′Y = (X′X)^{-1}X′[Xθ + ε] = θ + (X′X)^{-1}X′ε
E_θ[θ̂(X)] = θ + (X′X)^{-1}X′E_θ[ε] = θ, since E_θ[ε] = 0.

To find the variance of the LSE, consider

θ̂(X) − θ = (X′X)^{-1}X′Y − θ = (X′X)^{-1}X′[Xθ + ε] − θ = (X′X)^{-1}X′ε
V_θ[θ̂(X)] = E_θ[θ̂(X) − θ][θ̂(X) − θ]′ = (X′X)^{-1}X′ E_θ[εε′] X(X′X)^{-1} = σ²(X′X)^{-1},
since E[εᵢ] = 0, V[εᵢ] = σ² and E[εε′] = σ²I.
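A minimal sketch of the normal-equation solution θ̂ = (X′X)^{-1}X′Y and its estimated dispersion (the design matrix and data below are assumptions for illustration):

import numpy as np

# Least squares estimation for an assumed linear model
rng = np.random.default_rng(4)
n, m = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, m - 1))])
theta_true = np.array([2.0, -1.0, 0.5])
y = X @ theta_true + rng.normal(scale=0.3, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
theta_hat = XtX_inv @ X.T @ y                 # solves the normal equations
resid = y - X @ theta_hat
sigma2_hat = resid @ resid / (n - m)          # residual variance estimate
cov_theta_hat = sigma2_hat * XtX_inv          # estimated dispersion sigma^2 (X'X)^{-1}
print(theta_hat)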

Linear estimation

Let Y1, Y2, ···, Yn be n independent random variables with the same variance σ² and E[Y] = Xθ. A linear function b′θ = b1θ1 + b2θ2 + ··· + bmθm of θ is (unbiasedly) estimable if there exists a linear function c′Y = c1y1 + c2y2 + ··· + cnyn such that E_θ[c′Y] = b′θ.

Theorem 6.7 A necessary and sufficient condition for the estimability of b′θ is ρ(X′) = ρ(X′, b).

Proof: Let b′θ be estimable, i.e., E_θ[c′Y] = b′θ ∀ θ.
⇒ c′Xθ = b′θ ∀ θ, i.e., c′X = b′.
Hence X′c = b is solvable, i.e., ρ(X′) = ρ(X′, b).
Conversely, suppose ρ(X′) = ρ(X′, b). Then the equation X′c = b is consistent, hence solvable, i.e., c′X = b′. Therefore
c′Xθ = b′θ
c′E_θ[Y] = b′θ
E_θ[c′Y] = b′θ
⇒ b′θ is estimable.
Remark 6.5 ρ(X′) = ρ(X′, b) ⇒ ρ(X′X) = ρ(X′X, b).

Best linear unbiased estimator

Definition 6.2 The unbiased linear estimate of an estimable linear para-


metric function b0 θ = b1 θ1 + b2 θ2 + · · · + bm θm with minimum variance is called
the best linear unbiased estimator or BLUE.

6.9 Gauss Markoff Theorem

Theorem 6.8 Let Y1, Y2, ···, Yn be n independent random variables with variance σ² and E_θ[Y] = Xθ. Then every estimable parametric function b′θ possesses a unique minimum variance unbiased estimator which is a function of θ̂(X), the LSE of θ. Further, E[Y − Xθ̂]′[Y − Xθ̂] = (n − r)σ².

Proof: b′θ is estimable if there exists c′Y such that E_θ[c′Y] = b′θ, i.e.,

c′Xθ = b′θ ∀ θ ⇒ X′c = b        (6.11)
and V[c′Y] = c′c σ²             (6.12)

Minimize equation (6.12) subject to equation (6.11). Using the method of Lagrange multipliers, one determines the stationary points by considering

L(λ) = c′c − 2λ′(X′c − b), where λ is a vector of Lagrange multipliers,
dL(λ)/dc = 2c′ − 2λ′X′.

The stationary points of the function L(λ) are given by dL(λ)/dc = 0
⇒ c′ − λ′X′ = 0 ⇒ c′ = λ′X′, i.e., c = Xλ, and hence

X′Xλ = b                        (6.13)

Since b′θ is estimable, equation (6.11) is solvable, i.e., ρ(X′) = ρ(X′, b) ⇔ ρ(X′X) = ρ(X′X, b). Thus equation (6.13) is solvable. Let c(1) = Xλ(1) and c(2) = Xλ(2) be two solutions corresponding to solutions λ(1) and λ(2) of equation (6.13). Then

X′Xλ(1) = b and X′Xλ(2) = b ⇒ X′X(λ(1) − λ(2)) = 0 and c(1) − c(2) = X(λ(1) − λ(2)),

(c(1) − c(2))′(c(1) − c(2)) = (λ(1) − λ(2))′ X′X (λ(1) − λ(2)) = 0 ⇒ c(1) = c(2).

Thus, whatever solution λ of equation (6.13) is taken, the value of c is the same. Hence b′θ possesses a unique minimum variance unbiased estimator.

Suppose that ρ(X) = r and the first r columns of X are linearly independent. Write X = [X1 X2] and b = (b1, b2)′. A solution of equation (6.13) is λ = (X1′X1)^{-1} b1, so that

c = Xλ = X1 (X1′X1)^{-1} b1
c′c = b1′ (X1′X1)^{-1} (X1′X1) (X1′X1)^{-1} b1 = b1′ (X1′X1)^{-1} b1.

For every c satisfying X′c = b,

c′c = c′[I − X1(X1′X1)^{-1}X1′]c + c′X1(X1′X1)^{-1}X1′c
    = c′[I − X1(X1′X1)^{-1}X1′]c + b1′(X1′X1)^{-1} b1
    = c′[I − X1(X1′X1)^{-1}X1′][I − X1(X1′X1)^{-1}X1′]c + b1′(X1′X1)^{-1} b1,  since I − X1(X1′X1)^{-1}X1′ is an idempotent matrix,
    ≥ b1′(X1′X1)^{-1} b1.

This indicates that the minimum is actually attained. The LSE θ̂(X) of θ is obtained by minimizing (Y − Xθ̂)′(Y − Xθ̂); the normal equation is X′Xθ = X′Y, so

c′Y = λ′X′Y = λ′X′Xθ̂ = b′θ̂(X), since b′ = λ′X′X.

Thus the best linear unbiased estimator of b′θ is c′Y = b′θ̂(X).

Since I − X1(X1′X1)^{-1}X1′ is a projection matrix, it is idempotent, and it is a well-known property that for an idempotent matrix A, ρ(A) = Tr(A):

ρ(I − X1(X1′X1)^{-1}X1′) = Tr(I − X1(X1′X1)^{-1}X1′) = n − r,

since the r columns of X1 are linearly independent.

∴ E_θ[(Y − Xθ̂)′(Y − Xθ̂)] = E_θ[(Y − Xθ̂)′(I − X1(X1′X1)^{-1}X1′)(Y − Xθ̂)] = (n − r)σ².

Problem 6.26 E[Y1 ] = θ1 + θ2 , E[Y2 ] = θ2 + θ3 , E[Y3 ] = θ3 + θ1 . Show that


l1 θ1 + l2 θ2 + l3 θ3 is estimable if l1 + l2 = l3 .
Solution: Given

X = [[1, 0, 1], [0, 1, 1], [1, 0, 1]],  X′ = [[1, 0, 1], [0, 1, 0], [1, 1, 1]],  l = (l1, l2, l3)′,  θ = (θ1, θ2, θ3)′.

l′θ is estimable if and only if ρ(X′) = ρ(X′, l). Consider the system X′c = l and row reduce the augmented matrix:

[ 1 0 1 | l1 ;  0 1 0 | l2 ;  1 1 1 | l3 ]
→ [ 1 0 1 | l1 ;  0 1 0 | l2 ;  0 1 0 | l3 − l1 ]
→ [ 1 0 1 | l1 ;  0 1 0 | l2 ;  0 0 0 | l3 − l1 − l2 ]

ρ(X′) = ρ(X′, l) if l3 − l1 − l2 = 0, i.e., l3 = l1 + l2.


Problem 6.27 The feed intake Y of a cow with weight X1 and milk yield X2 may be described by the linear model Y = a + b1X1 + b2X2 + ε, where ε is the random error (random residual). Let yᵢ, xᵢ1 and xᵢ2 be the values of Y, X1 and X2 for cow i = 1, 2, 3, 4, 5. The following observations are made on 5 cows:

i    Y    X1   X2
1    62   2    6
2    60   9    10
3    57   6    4
4    48   3    13
5    23   5    2

The estimate θ̂(X) = (a, b1, b2)′ is calculated from θ̂(X) = (X′X)^{-1}X′Y, where

Y = (62, 60, 57, 48, 23)′,  X = [[1, 2, 6], [1, 9, 10], [1, 6, 4], [1, 3, 13], [1, 5, 2]],  X′Y = (250, 1265, 1870)′,

X′X = [[5, 25, 35], [25, 155, 175], [35, 175, 325]],  (X′X)^{-1} = (1/480) [[790, −80, −42], [−80, 16, 0], [−42, 0, 6]].

θ̂(X) = (X′X)^{-1}X′Y = (1/480) [[790, −80, −42], [−80, 16, 0], [−42, 0, 6]] (250, 1265, 1870)′ = (37, 1/2, 3/2)′ = (a, b1, b2)′.

The estimated linear model is Y = 37 + (1/2)X1 + (3/2)X2.
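A short verification of this fit (illustration only; NumPy assumed):

import numpy as np

# Least squares fit for the five cows of Problem 6.27
X = np.array([[1, 2, 6], [1, 9, 10], [1, 6, 4], [1, 3, 13], [1, 5, 2]], dtype=float)
Y = np.array([62, 60, 57, 48, 23], dtype=float)

theta_hat = np.linalg.solve(X.T @ X, X.T @ Y)   # solves the normal equations
print(theta_hat)                                # approximately [37. , 0.5, 1.5]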
Problems

6.1 Define LSE. Show that under certain assumptions to be stated, the
LSE’s are minimum variance unbiased estimators.

6.2 Let yᵢ = βxᵢ + εᵢ, i = 1, 2, 3, ···, n, where ε1, ε2, ···, εn are uncorrelated random
variables with mean 0 and variance σ². Find the LSE of β. Show that the LSE
of β is unbiased. Find the variance of the LSE of β. Also show that the LSE
of β is the best linear unbiased estimator of β.

6.3 Examine the sufficiency and unbiasedness of the MLE.

6.4 Independent random samples of sizes n1 , n2 , and n3 are available from


three normal populations with mean α + β + γ, α − β, and β − γ respec-
tively, and a common variance σ 2 . Find the MLE of α, β, γ and σ 2 . Are
they UMVUE’s?

6.5 Give the conditions for which


(a) the likelihood equation has a consistent estimator with probability
approaching one as n → ∞.
(b) the consistent estimator of the likelihood equation is asymptoti-
cally normal.

6.6 Explain the principle of Maximum Likelihood of Estimation of param-


eter θ of p(x | θ). Obtain MLE of the parameters of N (θ, σ 2 ). Also
examine them for unbiasedness.

6.7 Show that under what regularity conditions to be stated, the MLE is
asymptotically normally distributed.

6.8 Let X1 , X2 , · · · , Xn be a random sample drawn from a population with


mean θ and finite variance. Let T = t(X) be an estimator for θ and has
minimum variance and T 0 = t0 (X) is any other unbiased estimator of θ,
then Covθ [T, T 0 ] = V [T ].

6.9 Derive the formula to calculate the MLE of θ, using a random sample
from the distribution with P_θ{X = x} = a_x θ^x / g(θ), x = 1, 2, ···, where g(θ) = Σ_x a_x θ^x. Also obtain the explicit expression for the case of the truncated
Poisson distribution with x = 1, 2, 3, ···.

6.10 Show that MLE of θ based on n independent observations from a


uniform distribution in (0, θ) is consistent.

6.11 Find the MLE of θ given the observations 0.8 and 0.3 on a random
variable with pdf

p_θ(x) = 2x/θ if 0 < x < θ, and p_θ(x) = 2(1 − x)/(1 − θ) if θ ≤ x < 1, where 0 < θ < 1.

6.12 Let X1 , X2 , · · · , Xn be n independent random observations with pdf


N (0, θ). Find the MLE of θ.

6.13 Given a random sample from N(θ, 1), θ = 0, ±1, ±2, ···. Find the
MLE of θ.

6.14 Explain the method of scoring to obtain the MLE.

6.15 Obtain the MLE of θ based on random samples of sizes n and m from
populations with respective frequency functions (1/θ)e^{-x/θ} and θe^{-xθ}, x > 0, θ > 0.

6.16 What is MVBE? Obtain sufficient conditions for an estimator to be


MVBE.

6.17 Give an account of estimation by the method of (i) Moments (ii)


Minimum χ2 , giving one illustration in each case.

6.19 Examine the truth of the following statements


(i) MLE is unique
(ii) MLE is unbiased
(iii) If sufficient statistics T = t(X) exists for parameter θ, then MLE
is a function of T

6.20 Show that under certain conditions to be stated MLE is consistent.

6.21 Examine whether MLE always exists.

6.22 Obtain the general form of distribution admitting MVBE’s.

6.23 A random sample of size n is available from p_θ(x) = θx^{θ−1}, 0 < x < 1, θ > 0. Find that function of θ for which an MVBE exists. Also find the MVBE of this function and its variance.

6.24 Derive the MVUE of θ² in p_θ(x) = e^{-θ}θ^x/x!, x = 0, 1, ···, by taking a sample
of size n and show that it is not an MVBE of θ².

6.25 Describe the Minimum χ2 method of estimation. Show that, under


what certain conditions to be stated, the methods of Minimum χ2 and
Maximum likelihood χ2 statistic are equally efficient estimators.

6.26 Show that MVBE’s exist for the exponential family of densities.

6.27 Find MLE of β in Gamma(1, β) based on a sample of size n where


the actual observations are not available but it is known that k of the
observations are less than or equal to a fixed positive number M .

6.28 Obtain the BLUE of θ for the normal distribution with mean θ and
variance σ 2 based on n observations x1 , x2 , · · · , xn .

6.29 Obtain the MLE for the coefficient of variation from a population
with N (θ, σ 2 ) based on n observations.

6.30 Obtain the MLE of θ for the pdf

p_θ(x) = (1 + θ)x^θ,  0 < x < 1, θ > 0,  and 0 otherwise,

based on an independent sample of size n.

6.31 Obtain the MLE of θ using a random sample of size n from

p_θ(x) = 1/(2θ),  −θ < x < θ,  and 0 otherwise.

6.32 Show that maximum likelihood estimation χ2 statistic and Minimum


χ2 statistic give the same results as n → ∞.

6.33 Find the MLE of N of

p_N(x) = 1/N if x = 1, 2, ···, N, N ∈ I₊,  and 0 otherwise,

based on a random sample of size n.

6.34 Suppose X1, X2, ···, Xn are iid observations from the density

p_θ(x) = (2x/θ²) e^{-x²/θ²},  x > 0, θ > 0,  and 0 otherwise.

Obtain the MLE of θ.

6.35 If the random variable X takes the value 0 or 1 with probability 1 − p
and p respectively and p ∈ [0.1, 0.9], then the maximum likelihood estimate
of p on the basis of a single observation x would be
(a) (8x + 1)/10 (b) x (c) (9 − 8x)/10 (d) x/2 Ans: (a)
Hint: L(p) = 1 − p if x = 0 and L(p) = p if x = 1; since p is restricted to [0.1, 0.9],
the maximum is attained at p̂ = 0.1 when x = 0 and at p̂ = 0.9 when x = 1, i.e., p̂ = (8x + 1)/10.

6.36 The maximum likelihood estimator of σ² in a normal population with
mean zero is
(a) (1/n) Σ(xᵢ − x̄)²
(b) (1/(n − 1)) Σ(xᵢ − x̄)²
(c) (1/n) Σxᵢ²
(d) (1/(n − 1)) Σxᵢ² Ans: (c)

6.37 Consider the following statements:


The maximum likelihood estimators
1. are consistent
2. have invariant property
3. can be made unbiased using an adjustment factor even if they are
biased. Of these statements
(a) 1 and 3 are correct
(b) 1 and 2 are correct
(c) 2 and 3 are correct
(d) 1, 2 and 3 are correct Ans:(b)

6.38 Which of the following statements are not correct?


1. From the Cramer - Rao inequality one can always find the lower
bound of the variance of an unbiased estimator
2. If sufficient statistic exits, then maximum likelihood estimator is
itself a sufficient statistic
3. UMVUE and MVBE’s are same
4. MLE’s may not be unique
Select the correct answer given below
(a) 1 and 3 (b) 1 and 2 (c) 1 and 4 (d) 2 and 3 Ans:(a)

6.39 Which one of the following is not necessary for the UMVU estimation
of θ by T = t(X)?
(a) E[T − θ] = 0

(b) E[T − θ]2 < ∞


(c) E[T − θ]2 is minimum
(d) T = t(X) is a linear function of observations Ans:(d)

6.40 Consider the following statements


If X1 , X2 , · · · , Xn are iid random variables with uniform distribution over
(0, θ), then
1. 2X̄ is an unbiased estimator of θ.
2. The largest among X1, X2, ···, Xn is an unbiased estimator of θ
3. The largest among X1, X2, ···, Xn is sufficient for θ
4. ((n + 1)/n) X(n) is a minimum variance unbiased estimator of θ
Of these statements
(a) 1 alone is correct
(b) 1 and 2 are correct
(c) 1, 3 and 4 are correct
(d) 1 and 4 are correct Ans:(c)

6.41 LSE and MLE are the same if the sample comes from the population
is
(a) Normal (b) Binomial (c) Cauchy ( d) Exponential Ans:(a)

6.42 LSE of the parameters of a linear model are


(a) unbiased (b) BLUE (c) UMVU (d) all the above Ans:(d)

6.43 Let Y1 , Y2 , Y3 be uncorrelated observations with common variance σ 2


and expectations given by E[Y1 ] = β1 , E[Y2 ] = β2 and E[Y3 ] = β1 +
β2 where β1 , β2 are unknown parameters. The best linear unbiased
estimator of β1 + β2 is
(a) Y3 (b) Y1 + Y2 (c) 31 (Y1 + Y2 + 2Y3 ) (d) 21 (Y1 + Y2 + Y3 ) Ans:(c)

6.44 Consider a series system with two independent components. Let the

component lifespan have exponential distribution with density



 λe−λt

t>0 λ>0
pλ (t) =
 0

otherwise

If n observations T1 , T2 , · · · , Tn on life span of this component are avail-


1 Pn
able and the mean time to failure X̄ = n i=1 Ti , then the maximum
likelihood estimator of the reliability of the system is given by
(a) e^{-2t/X̄} (b) e^{-t/X̄} (c) 1 − e^{-2t/X̄} (d) 1 − e^{-t/X̄} Ans: (a)

6.45 Let Y1, Y2, Y3, Y4 be uncorrelated observations such that E[Y1] = β1 + β2 + β3 = E[Y2], E[Y3] = β1 − β2 = E[Y4] and V[Yᵢ] = σ² for i = 1, 2, 3, 4.
Which of the following statements are true?
(a) p1 β1 + p2 β2 + p3 β3 is estimable if and only if p1 + p2 = 2p3
[(Y1 −Y2 )2 +(Y3 −Y4 )2 ]
(b) An unbiased estimator of σ 2 is 4
[(Y1 −Y2 )+(Y3 −Y4 )]
(c) An unbiased estimator of σ 2 is 4

(d) An unbiased estimator of σ 2 is (Y1 − Y2 )2 + (Y3 − Y4 )2


 

Ans:(a) and (b)

6.46 Let Y1, Y2, Y3, Y4 be uncorrelated observations such that E[Y1] = β1 + β2 + β3 = E[Y2], E[Y3] = β1 − β2 = E[Y4] and V[Yᵢ] = σ² for i = 1, 2, 3, 4.
Which of the following statements are true?
(a) p1 β1 + p2 β2 + p3 β3 is estimable if and only if p1 + p2 = p3
[Y3 +Y4 )]
(b) An unbiased estimator of σ 2 is 2
[(Y3 +Y4 )]
(c) The best linear unbiased estimator of β1 − β2 is 2

(d) The variance of the best linear unbiased estimator of β1 + β2 + β3


is σ 2 Ans: (b) and (d)

6.47 Consider the model Yᵢ = iβ + εᵢ, i = 1, 2, 3, where ε1, ε2, ε3 are independent
with mean 0 and variances σ², 2σ², 3σ² respectively. Which of the following
is the best linear unbiased estimator of β?
(a) (Y1 + 2Y2 + 3Y3)/6 (b) (6/11)[Y1 + Y2/2 + Y3/3] (c) (Y1 + Y2 + Y3)/6 (d) (3Y1 + 2Y2 + Y3)/10 Ans: (c)

6.48 Let X1 , X2 · · · , Xn be a random θ ∈ (1, 2). Then which of the following


statements about the maximum likelihood estimator of θ is correct?
(a) MLE of θ does not exist
(b) MLE of θ is X̄
(c) MLE of θ exists but it is not X̄
(d) MLE of θ is an unbiased estimator of θ Ans:(c)

6.49 Consider a randomized block design involving 3 treatments and 3


replicates and ti denote the effect of the ith , i = 1, 2, 3. If σ 2 denotes the
variance of an observation which of the following statements is true?.
−t2
t1√ t1 −2t
√2 +t3
(a) The variance of BLUE of 2
and 6
are equal
(b) The covariance between the BLUE of t1 − t2 and the BLUE of
2σ 2
(t1 − 2t2 + t3 ) is 3
σ2
(c) The variance of BLUE of ti − tj (i 6= j, i, j = 1, 2, 3) is 6
σ2
(d) The variance of BLUE of (t1 − 2t2 + t3 ) is 6 Ans:(a)

6.50 Consider a region R which is a triangle with vertices (0, 0), (0, θ), (θ, 0)
where θ > 0. A sample of size n is selected at random from this region
R. Denote the sample as {(Xi , Yi ), i = 1, 2, 3 · · · , n}. Then denoting
X(n) = max(X1 , X2 , · · · , Xn ) and Y(n) = max(Y1 , Y2 , · · · , Yn ) which of the
following statements is true
X(n) +Y(n)
(a) MLE of θ is 2
(Xi +Yi )
(b) MLE of θ is n

(c) MLE of θ is max1≤i≤n (Xi + Yi )


(d) MLE of θ is max{X(N ) , Y(n) } Ans:(a)

6.51 Let X1 , X2 , · · · , Xn be the iid random variables with N (µ, 1) distribution.


Assume that µ ∈ [0, ∞). Let µ̂ be MLE of µ. Then which of the following
statements are true?
(a) µ̂ = max(X̄, 0)
(b) µ̂ is unbiased for µ

(c) X¯n is sufficient for µ


(d) µ̂ is consistent for µ Ans:(a), (c) and (d)

6.52 Let Y1 , Y2 , · · · , Yn be random variables with common unknown covari-


ance matrix V of the vector (Yi , Y2 , · · · , Yn ) is such that the inverse of V
has all its diagonal elements equal to c and all its off diagonal elements
equal to d. Let T1 be the best linear unbiased estimator of θ and T2
be the ordinary least squares estimator of θ which of the following
statement are true?
1 Pn
(a) T1 = n i=1 Yi = T2
Pn
(b) T2 = nȲ and T1 = i=n Ȳ where Ȳ is the mean of Yi0 s
(c) There are exactly (n − 1) linearly independent linear function of
Y1 , Y2 , · · · Yn each with zero expectation
(d)There are exactly (n − 2) linearly independent linear function of
Y1 , Y2 , · · · Yn each with zero expectation Ans:(a) and (c)

6.53 Let X1 , X2 , · · · , Xn are independent and identically distributed N (θ, 1)


random variables, where θ takes only integer values, i.e., θ ∈
(· · · , −2, −1, 0, 1, 2, 3 · · ·) which of the following estimator of θ?.
(a) X̄
(b) Integer closest to X̄
(c) Integer part of X̄ ( Largest integer ≤ X̄)
(d) Median of (X1 , X2 , · · · , Xn ) Ans:(b)

6.54 Let (X1 , Y1 ) · · · (Xn , Yn ) be data on X cultivated land in a district and Y


the area actually under cultivation, both measured in square feet. Let
α̂, β̂, be the least squares estimates t of α, β in the model Y = α + βx + 
where  is the random error. If the data are converted to square
meters, then
(a) α̂ may change but β̂ will not change
(b) β̂ may change but α̂ will not change

(c) both α̂ and β̂ may change


(d) neither α̂ nor β̂ will change Ans:(a)

6.55 The set X1, X2, ···, Xn are iid as N(µ, σ²), −∞ < µ < ∞, σ² > 0. Then
which of the following statements are true?
(a) Σᵢ(Xᵢ − X̄)²/n is an unbiased estimator of σ²
(b) √(Σᵢ(Xᵢ − X̄)²/(n − 1)) is the minimum variance unbiased estimator of σ
(c) √(Σᵢ(Xᵢ − X̄)²/n) is the MLE of σ
(d) Σᵢ(Xᵢ − X̄)²/n is the MLE of σ² Ans: (b), (c) and (d)

6.56 For the set (x1, y1), ···, (xn, yn) the following two models were fitted
using the least squares method. Model I: yᵢ = β0 + β1xᵢ, i = 1, 2, ···, n.
Model II: yᵢ = β0 + β1xᵢ + β2xᵢ², i = 1, 2, ···, n. Let β̂0, β̂1 be the least
squares estimates of β0 and β1 from Model I, and β0*, β1*, β2* be the
least squares estimates from Model II. Let A = Σᵢ[yᵢ − (β̂0 + β̂1xᵢ)]² and
B = Σᵢ[yᵢ − (β0* + β1*xᵢ + β2*xᵢ²)]². Then which of the following statements
are true?
(a) A ≥ B (b) A ≤ B
(c) It can happen that A = 0 but B > 0
(d) It can happen that B = 0 but A > 0 Ans: (a) and (d)

6.57 A finite population has N units labelled U1, U2, ···, U_N and the value of
a study variable on unit Uᵢ is Yᵢ, i = 1, 2, 3, ···, N. Let Y = Σᵢ₌₁^N Yᵢ and
Ȳ = (1/N) Σᵢ₌₁^N Yᵢ. A sample of size n > 1 is drawn from the population
with probability proportional to size with replacement, with selection
probabilities p1, p2, ···, p_N; 0 < pᵢ < 1, i = 1, 2, 3, ···, N and Σᵢ₌₁^N pᵢ = 1.
Define T = (1/n) Σ_{i∈S} Yᵢ/pᵢ, where the sum extends over the units in the sample.
Then, which of the following statements are true?


(a) T is an unbiased estimator of Ȳ
(b) T is an unbiased estimator of Y
(c) The variance of T is zero if Yi is proportional to pi ∀ i = 1, 2, 3, · · · , N

PN Yi
(d) An unbiased estimator of the variance of T is i=1 ( pi − T )2
Ans: (b) and (c)

6.58 Let X1 , X2 , · · · , Xn be a random sample from N (θ, 1) , θ ∈ [−10, 10] and


Y1 , Y2 , · · · , Yn be defined by

 0

if Xi < 0
Yi =
 1

if Xi ≥ o

Suppose θˆn and θ̃ denote MLE of θ based on {X1 , X2 , · · · , Xn } and on


{Y1 , Y2 , · · · , Yn } respectively. Which of the following statements are
true?
(a)limn→∞ E[θˆn ] = 0
(b)limn→∞ E[θ˜n ] = 0
(c) θˆn is a consistent estimator
(d) θ˜n is a consistent estimator Ans:(c) and (d)

6.59 Consider the following regression model Y = β1 X1 + β2 X2 + . Here  ∼


N (0, σ 2 ) random variable. If βˆ1 and βˆ2 are the least square estimators
of β1 and β2 respectively, then which of the following statements are
correct?.
(a) E[βˆ1 ] = β1 (b) E[βˆ2 ] = β2 (c) V [βˆ1 ] > V [βˆ2 ] (d) Cov(βˆ1 , βˆ2 ) < 0
Ans:(a), (b) and (d)

6.60 Consider the following regression problem Yᵢ = α + βi + εᵢ, i = 1, 2, 3, ···, n.
Here εᵢ, i = 1, 2, 3, ···, n are iid N(0, 1) random variables. It is assumed
that α 6= 0 and β is known. If αˆn is the MLE of α which of the following
statements is true?.
(a)limn→∞ E[αˆn ] 6= α
(b) limn→∞ E[αˆn ] = 0
(c) limn→∞ V [αˆn ] = ∞
(d) limn→∞ V [αˆn ] = 0 Ans:(a)

6.61 Consider the model Y = Xβ + ε, where X = ((x_ij))_{n×p}, Y = (Y1, Y2, ···, Yn)′, β = (β1, β2, ···, βp)′, ε = (ε1, ε2, ···, εn)′, E[ε] = 0 and D(ε) = σ²I_n, p < n.
Let β̂ be a solution of X^T Xβ = X^T Y. Which of the following are true?
(a) If C T β is the estimable, then C T β̂ is the BLUE of C T
(b) All linear parametric function are estimable iff Rank (X) > p
(c) If Rank(X) < p, then some linear parametric functions are not
estimable
(Y −X β̂)T (Y −X β̂)
(d) n−p is an unbiased estimator of σ 2 Ans:(b) and (c)

6.62 Twenty identical items are put in a life testing experiment starting
at time 0. The failure times of the items are recorded in a sequential
manner. The experiment stops if all the items fail or a pre-fixed time
T > 0 reached which ever is earlier. If the life times of the items are
iid exponential random variables with mean θ, where 0 < θ ≤ 10, then
which of the following statements are true?
(a) The MLE of θ always exist
(b) The MLE of θ may not exist
(c) The MLE of θ is an unbiased estimator of θ, if it exists
(d) The MLE of θ is bounded with probability 1 , if it exists
Ans:(c) and (d)

6.63 Let X1, X2, ···, Xn be a random sample from U(0, 5θ), θ > 0. Define
X(1) = min{X1, X2, ···, Xn} and X(n) = max{X1, X2, ···, Xn}. The MLE of θ is
(a) X(1)/5 (b) X(n) (c) X(1) (d) X(n)/5 Ans: (d)

6.64 Let Y1 , Y2 , Y3 and Y4 be uncorrelated observations with common un-


known variance σ 2 and expectation given by E[Y1 ] , E[Y3 ] = β1 − β2 =
1
E[Y4 ] where β1 , β2 and β3 are unknown parameters. Define e1 = √
2(Y1 −Y2 )

1
and e2 = √
2(Y3 −Y4 )
. An unbiased estimator of σ 2 is
(a) 13 (e21 − e22 ) (b) 12 (e21 + e22 ) (c) 14 (e21 + e22 ) (d) e21 + e22 Ans:(b)

6.65 Suppose that the life time of an electric bulb follows an exponential
distribution with mean θ hours. In order to estimate θ, n bulbs are
switched on at the same time. After t hours, n − m (> 0) bulbs are found to
be in a functioning state. If the life times of the other m (> 0) bulbs
are noted as X1, X2, ···, X_m respectively, then the maximum likelihood
estimate of θ is given by
(a) θ̂ = t / log(n/(n − m))
(b) θ̂ = Σᵢ₌₁^m xᵢ / m
(c) θ̂ = [Σᵢ₌₁^m xᵢ + (n − m)t] / m
(d) θ̂ = [Σᵢ₌₁^m xᵢ + (n − m)t] / n Ans: (a)

6.66 In the linear model Y1 = β1x11 + β2x12 + β3x13 + ε1; Y2 = β1x21 + β2x22 + β3x23 + ε2; Y3 = β1x31 + β2x32 + β3x33 + ε3, where ε1, ε2, ε3 are iid N(0, σ²) and the 3 × 3 matrix X = ((x_ij)) satisfies |X| ≠ 0.
Let (β̂1, β̂2, β̂3) be the least squares estimate of (β1, β2, β3) and let l1, l2, l3 ∈ ℝ.
(a) (β̂1, β̂2, β̂3) is unique
(b) Σᵢ₌₁³ lᵢβ̂ᵢ is the linear unbiased estimate of Σᵢ₌₁³ lᵢβᵢ
(c) Σᵢ₌₁³ lᵢβ̂ᵢ is the uniformly minimum variance unbiased estimate of Σᵢ₌₁³ lᵢβᵢ
(d) Σᵢ₌₁³ lᵢβ̂ᵢ is BLUE but not UMVUE of Σᵢ₌₁³ lᵢβᵢ

Ans(a), (b), (c) and (d)

6.67 Let Y1 , Y2 , Y3 be uncorrelated observations with common variance σ 2


and expectations given by E[Y1 ] = θ0 + θ1 , E[Y2 ] = θ0 + θ2 , E[Y3 ] = θ0 + θ3
where θi0 s are unknown parameters. In the frame work of the linear

model which of the following statements are true ?.


(a) Each of θ0 , θ1 and θ3 is individually estimable
P3
(b) i=0 θi is estimable
(c) θ1 − θ2 , θ1 − θ3 and θ2 − θ3 are estimable
(d) The error sum of squares is zero Ans:(c) and (d)

6.68 Consider the pdf f(x, θ, σ²) = (0.9/σ) φ((x − θ)/σ) + 0.1 φ(x − θ), where −∞ < θ < ∞
and σ > 0 are unknown parameters and φ(x) denotes the pdf of N(0, 1).
Let X1, X2, ···, Xn be a random sample from this probability density function.
Then which of the following statements are correct?
(a) Method of moments estimators for θ and σ exist
(b) This model is not parametric
(c) An unbiased estimator of θ exists
(d) Consistent estimators of θ do not exist Ans: (a) and (c)

6.69 Consider the linear model Y ∼ Nn (Xβ, σ 2 I) where X is a n × (k + 1)


matrix of rank k + 1 < n. Let β̂ and σ 2 be maximum likelihood estima-
tors of β and σ 2 respectively. Then which of the following statements
are true?
(a) C0v(β̂) = σ 2 X T X
(b) β̂ and σˆ2 are independently distributed
(c) σˆ2 is sufficient for σ 2
(d) σˆ2 = Y T AY where A is a suitable matrix of rank (n − k − 1)
Ans:(b) and (d)

6.70 Consider the linear model

Y1 = µ1 − µ2 + 1

Y2 = µ2 − µ3 + 2

··· = ·········

Yn−1 = µn−1 − µn + n−1

Yn = µn − µ1 + n

where µ1, µ2, ···, µn are unknown parameters and ε1, ε2, ···, εn are uncorrelated with mean 0 and common variance σ². Let Y be the column
1 Pn
vector(Y1 , Y2 , · · · , Yn )T and Ȳ = n i=1 Yi which of the following state-
ments are true?.
(a) If E(C T Y ) = 0, then all elements of C T are equal.
(b) The best linear unbiased estimator of µ1 − µ3 is Y1 + Y2
(c) The best linear unbiased estimator of µ2 − µ3 is Y2 − Ȳ
(d) All linear functions d1 µ1 + d2 µ2 + · · · + dn µn are estimable
Ans:(a) and (c)

6.71 Suppose {X1, X2, ···, Xn}, n ≥ 2, is a random sample from the distribution with pdf

f(x, θ) = (θ^θ/Γθ) e^{-θx} x^{θ−1},  x > 0,  and 0 otherwise,

with θ > 0. Then the method of moments estimator of θ
(a) does not exist (b) is n/Σ(xᵢ − 1)² (c) is n/Σ(xᵢ − x̄)² (d) is (n − 1)/Σ(xᵢ − 1)²
Ans: (c)

6.72 Let X1, X2, ···, Xn be a random sample from the probability density function

f(x, θ) = ½ e^{-|x − θ|},  −∞ < x < ∞,  and 0 otherwise.

Which of the following statements are true?
(a) The MLE of θ is the sample median
(b) Σᵢ₌₁ⁿ Xᵢ is a sufficient statistic for θ
(c) The MLE of θ is a function of a sufficient statistic
(d) Σᵢ₌₁ⁿ Xᵢ/n is the moment estimator of θ Ans: (a), (b), (c) and (d)

6.73 Let X1 and X2 be a random sample of size two from a distribution with pdf

f_θ(x) = θ (1/√(2π)) e^{-x²/2} + (1 − θ)(1/2) e^{-|x|},  θ ∈ {0, 1/2, 1}, −∞ < x < ∞,  and 0 otherwise.

If the observed values of X1 and X2 are 0 and 2 respectively, then the maximum likelihood estimate of θ is
(a) 0 (b) 1/2 (c) 1 (d) not unique Ans: (c)

6.74 Let X1, X2, ···, Xn be a random sample from the following pdf:

f(x, µ, α) = (1/Γα)(x − µ)^{α−1} e^{-(x − µ)} if x > µ,  and 0 otherwise.

Here −∞ < µ < ∞ and α > 0. Then which of the following statements are true?
(a) The method of moments estimators of neither α nor µ exist
(b) The method of moments estimator of α exists and it is a consistent estimator of α
(c) The method of moments estimator of µ exists and it is a consistent estimator of µ
(d) The method of moments estimators of both α and µ exist, but they are not consistent Ans: (b) and (c)
7. INTERVAL ESTIMATION

7.1 Introduction

Let X be a random sample drawn from a population with pdf pθ (x), θ ∈ Ω.


For every distinct value of θ, θ ∈ Ω, there corresponds one member of the family of
distributions. Thus one has a family of pdf ’s {pθ (x), θ ∈ Ω}. The experimenter needs
to select a point estimate of θ, θ ∈ Ω. Even though the estimator may have some valid
statistical properties, the estimator may not reflect the true value of the parameters,
due to the randomness of the observations. Hence one may instead seek interval estimates that contain the unknown parameter with a specified probability, i.e., interval estimation at a given confidence level. This chapter deals with interval estimation.

Family of random sets

Let Pθ , θ ∈ Ω ⊆ <k , be the set of probability distributions of the random variable


X. A family of subsets S(X) of Ω that depends on the observations x of X but not on θ is called a family of random sets.

7.2 Confidence intervals

The problem of interval estimation is to find a family of random sets S(X) for the parameter θ such that, for a given α, 0 < α < 1, Pθ{S(X) contains θ} ≥ 1 − α ∀ θ ∈ Ω.

Let θ ∈ Ω ⊆ ℝ and 0 < α < 1. A function θ(X) satisfying Pθ{θ(X) ≤ θ} ≥ 1 − α ∀ θ is called a lower confidence bound for θ at confidence level (1 − α). The infimum over all possible values of θ ∈ Ω ⊆ ℝ of Pθ{θ(X) ≤ θ} is (1 − α); the quantity (1 − α) is called the confidence coefficient.

A function θ̄(X) satisfying Pθ{θ ≤ θ̄(X)} ≥ 1 − α ∀ θ ∈ Ω ⊆ ℝ is called an upper confidence bound for θ at confidence level (1 − α).

If S(x) is of the form S(x) = (θ(x), θ̄(x)), then it is called a confidence interval at confidence level (1 − α), provided Pθ{θ(X) ≤ θ ≤ θ̄(X)} ≥ 1 − α ∀ θ ∈ Ω ⊆ ℝ. The confidence coefficient (1 − α) is associated with the random interval (θ(X), θ̄(X)).

Let X be a random sample drawn from a population with pdf p_θ(x), θ ∈ Ω ⊆ ℝ, and let a, b be two given positive numbers such that a < b. Consider

Pθ{a < X < b} = Pθ{a < X and X < b} = Pθ{X < b and b < (b/a)X} = Pθ{X < b < (b/a)X}.

The end points X and (b/a)X are functions of X, so I(X) = (X, (b/a)X) is a random interval; it covers the point b with the fixed probability above, and when X takes the value x the interval takes the value (x, (b/a)x).
Let θ be an unknown parameter and let (θ(X), θ̄(X)) be a (1 − α) level confidence
interval for θ. One desires the confidence limit for g(θ), a monotonic function of
 
θ. The set θ(X), θ̄(X) is equivalent to the set g(θ(X)), g(θ̄(X)) as long as g(θ)

is a monotonic increasing function of θ. Thus g(θ(X)), g(θ̄(X)) is a (1 − α) level

confidence interval for g(θ). If g(θ) is monotonic decreasing, then g(θ̄(X)), g(θ(X))
is a (1 − α) level confidence interval for g(θ).
Problem 7.1 For a single observation x of a random variable X with density function

p_θ(x) = 1/θ,  0 < x < θ, θ > 0,  and 0 otherwise,

obtain the probability of confidence of the random interval (X, 10X) for θ, θ ∈ Ω.
Solution: The probability of confidence of the interval (X, 10X) for θ is

Pθ{X < θ < 10X} = Pθ{1 < θ/X < 10} = Pθ{θ/10 < X < θ} = ∫_{θ/10}^θ (1/θ) dx = 0.9.
Problem 7.2 Find the confidence coefficient of the confidence interval (1/(19X), 19/X) for θ based on a single observation x of a random variable X with pdf

p_θ(x) = θ/(1 + θx)²,  0 < x < ∞, θ > 0,  and 0 otherwise.

Solution: The confidence coefficient of the interval (1/(19X), 19/X) for θ is

Pθ{1/(19X) < θ < 19/X} = Pθ{1/(19θ) < X < 19/θ}
= ∫_{1/(19θ)}^{19/θ} θ/(1 + θx)² dx
= [−1/(1 + θx)]_{1/(19θ)}^{19/θ}
= −1/20 + 19/20 = 0.9.
Problem 7.3 Compute the confidence coefficient of the interval (X/(1 + X), 2X/(1 + 2X)) for θ/(1 + θ), where X has the pdf

p_θ(x) = 1/θ,  0 < x < θ, θ > 0,  and 0 otherwise.

Solution: The confidence coefficient of the interval (X/(1 + X), 2X/(1 + 2X)) for θ/(1 + θ) is

Pθ{X/(1 + X) < θ/(1 + θ) < 2X/(1 + 2X)} = Pθ{(1 + 2X)/(2X) < (1 + θ)/θ < (1 + X)/X}
= Pθ{1/(2X) + 1 < 1/θ + 1 < 1/X + 1}
= Pθ{1/(2X) < 1/θ < 1/X}
= Pθ{X < θ < 2X} = Pθ{θ/2 < X < θ}
= ∫_{θ/2}^θ (1/θ) dx = (1/θ)(θ − θ/2) = 0.5.

Problem 7.4 Let T = t(X) be the maximum of two independent observations drawn from a population with uniform distribution over the interval (0, θ). Compute the confidence coefficient of the interval (0, 2T).
Solution: Let T = max{X1, X2}. The pdf of T is

p_θ(t) = 2t/θ²,  0 < t < θ,  and 0 otherwise.

The confidence coefficient of the interval (0, 2T) is

Pθ{0 < θ < 2T} = Pθ{0 < θ/T < 2} = Pθ{θ/2 < T < θ} = ∫_{θ/2}^θ (2/θ²) t dt = (2/θ²)[t²/2]_{θ/2}^θ = 0.75.

Problem 7.5 A sample observation x of X is drawn from a population with pdf

p_θ(x) = (2/θ²)(θ − x),  0 < x < θ, θ > 0,  and 0 otherwise.

Find a (1 − α) level confidence interval for θ.
Solution: Consider the pdf of Y = X/θ. It is given by

p(y) = 2(1 − y),  0 < y < 1,  and 0 otherwise.

Choose λ1 and λ2 such that

Pθ{λ1 < Y < λ2} = 1 − α, i.e., Pθ{Y ≥ λ2} = α/2 and Pθ{Y ≤ λ1} = α/2.

Thus ∫_{λ2}^1 2(1 − y) dy = α/2 ⇒ (1 − λ2)² = α/2 ⇒ λ2 = 1 − √(α/2) = c2.

Similarly ∫_0^{λ1} 2(1 − y) dy = α/2 ⇒ λ1² − 2λ1 + α/2 = 0 ⇒ λ1 = 1 − √(1 − α/2) = c1.

The (1 − α) level confidence interval for θ then follows from

Pθ{c1 < Y < c2} = 1 − α, i.e., Pθ{c1 < X/θ < c2} = 1 − α, i.e., Pθ{X/c2 < θ < X/c1} = 1 − α.

(X/c2, X/c1) is the (1 − α) level confidence interval for θ.
of size n from a population with pdf

 e−(x−θ)

x ≥ θ, θ > 0
pθ (x) =
 0

otherwise

Solution: Let Y1 = min1≤i≤n {Xi } be the first order statistic of random sample
X1 , X2 , · · · , Xn . The pdf of Y1 is given by

 ne−n(y1 −θ)

θ < y1 < ∞
pθ (y1 ) =
 0

otherwise

Denote t = y1 − θ, then the pdf of T = t(X) is



 ne−nt

0<t<∞
p(t) =
 0 otherwise

The (1 − α) level confidence interval for θ is

Pθ {λ1 < T < λ2 } = 1 − α


A.Santhakumaran 292

Z λ2
ne−nt dt = 1 − α
λ1
−nλ1
e − e−nλ2 = 1−α

This equation has infinitely many solutions. If one can choose λ1 = 0, then 1−e−nλ2 =
1 − α, i.e., e−nλ2 = α ⇒ −nλ2 = log α. Thus λ2 = 1
n log( α1 ). .˙. The (1 − α) level
confidence interval for θ is
1 1
  
Pθ 0 < T < log = 1−α
n α
1 1
  
Pθ 0 < Y1 − θ < log = 1−α
n α
1 1
   
Pθ Y1 − log < θ < Y1 = 1−α
n α
 
1
Y1 − n log( α1 ), Y1 is the (1 − α) level confidence interval for θ.
Problem 7.7 Given a sample of size n from U(0, θ), show that the confidence interval for θ based on the sample range R, with confidence coefficient (1 − α) and of the form (R, R/c), has c given as a root of the equation

c^{n−1}[n − (n − 1)c] = α.

Also give the case n = 2.
Solution: The pdf of the range R of a sample of size n is given by

p_θ(R) = n(n − 1) ∫_{-∞}^∞ p(x | θ) p(x + R | θ) [∫_x^{x+R} p(t | θ) dt]^{n−2} dx,  and 0 otherwise.

Here p_θ(x) = 1/θ, 0 < x < θ, and p_θ(x + R) = 1/θ for 0 < x + R < θ, i.e., 0 < x < θ − R. Hence

p_θ(R) = n(n − 1) ∫_0^{θ−R} (1/θ²) [∫_x^{x+R} (1/θ) dt]^{n−2} dx
       = n(n − 1) ∫_0^{θ−R} (1/θ²)(R^{n−2}/θ^{n−2}) dx
       = [n(n − 1)/θⁿ] R^{n−2}(θ − R)
       = [n(n − 1)/θ] (R/θ)^{n−2}(1 − R/θ),  0 < R < θ.

If y = R/θ, then p(y) = n(n − 1) y^{n−2}(1 − y), 0 < y < 1, and 0 otherwise.

The (1 − α) level confidence interval for θ is given by

Pθ{λ1 < R/θ < λ2} = 1 − α, i.e., ∫_{λ1}^{λ2} n(n − 1) y^{n−2}(1 − y) dy = 1 − α
n(n − 1)[y^{n−1}/(n − 1) − yⁿ/n]_{λ1}^{λ2} = 1 − α
nλ2^{n−1} − (n − 1)λ2ⁿ − nλ1^{n−1} + (n − 1)λ1ⁿ = 1 − α.

This equation has infinitely many solutions. Choosing λ1 = c and λ2 = 1, the confidence statement becomes

P{c < Y < 1} = 1 − α, i.e., P{c < R/θ < 1} = 1 − α, i.e., P{R < θ < R/c} = 1 − α.

(R, R/c) is the (1 − α) level confidence interval for θ, where c is given by c^{n−1}[n − (n − 1)c] = α. For n = 2, c = 1 − √(1 − α).
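For general n the root c can be found numerically; a hedged sketch (assumed values of n and α, SciPy assumed):

from scipy.optimize import brentq

# Solve c^(n-1)[n - (n-1)c] = alpha for c on (0, 1)
n, alpha = 5, 0.05
c = brentq(lambda c: c**(n - 1) * (n - (n - 1) * c) - alpha, 1e-9, 1.0)
print(c)   # for n = 2 this reduces to 1 - sqrt(1 - alpha)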

7.3 Alternative method of confidence intervals

For large or small samples, Chebychev's inequality can be employed to find a confidence interval for a parameter θ, θ ∈ Ω. For a random variable X with E_θ[X] = θ and V_θ[X] = σ²,

Pθ{|X − θ| < ε√(V[X])} > 1 − 1/ε²,  where ε > 1.

If θ̂(X) is an estimate of θ (not necessarily unbiased) with finite mean square error, then by Chebychev's inequality

Pθ{|θ̂(X) − θ| < ε√(E_θ[θ̂(X) − θ]²)} > 1 − 1/ε²

⇒ (θ̂(x) − ε√(E_θ[θ̂(X) − θ]²), θ̂(x) + ε√(E_θ[θ̂(X) − θ]²)) is a (1 − 1/ε²) level confidence interval for θ.
Problem 7.8 Let X1, X2, ···, Xn be iid b(1, θ) random variables. Obtain a (1 − α) level confidence interval for θ by using Chebychev's inequality.
Solution: Σᵢ₌₁ⁿ Xᵢ ∼ b(n, θ) since each Xᵢ ∼ b(1, θ); E_θ[X̄] = θ and V_θ[X̄] = V_θ[X]/n = θ(1 − θ)/n. Now

Pθ{|X̄ − θ| < ε√(θ(1 − θ)/n)} > 1 − 1/ε².

Since θ(1 − θ) ≤ 1/4,

Pθ{|X̄ − θ| < ε/(2√n)} > 1 − 1/ε²
Pθ{X̄ − ε/(2√n) < θ < X̄ + ε/(2√n)} > 1 − 1/ε².

If n is kept constant, one can choose 1 − 1/ε² = 1 − α ⇒ ε² = 1/α ⇒ ε = 1/√α. Thus the (1 − α) level confidence interval for θ is

(x̄ − 1/(2√(nα)), x̄ + 1/(2√(nα))).

Problem 7.9 Let X be a Binomial random variable with parameters n and θ. Obtain a (1 − α) level confidence interval for θ.
Solution: For each θ one can find the largest integer n1(θ) such that Pθ{X ≤ n1(θ)} ≤ α/2 and the smallest integer n2(θ) such that Pθ{X ≥ n2(θ)} ≤ α/2.

Because of the discreteness of the Binomial probabilities, one cannot make these probabilities exactly equal to α/2 for all θ. The events {X ≤ n1(θ)} and {X ≥ n2(θ)} are mutually exclusive, so

Pθ{X ≤ n1(θ) or X ≥ n2(θ)} ≤ α/2 + α/2 = α, i.e., Pθ{n1(θ) < X < n2(θ)} ≥ 1 − α.

The two functions n1(θ) and n2(θ) are monotonic, non-decreasing, discontinuous step functions, so that the (1 − α) level confidence statement for θ becomes

Pθ{n2^{-1}(X) < θ < n1^{-1}(X)} ≥ 1 − α.

If the observed value is X = x, then n1^{-1}(x) is the upper confidence limit θ̄(x) for θ and n1(n1^{-1}(x)) = x, so that θ̄(x) solves

Σᵢ₌₀ˣ C(n, i) θ^i (1 − θ)^{n−i} = α/2        (7.1)

Similarly the lower confidence limit θ(x) solves

Σᵢ₌ₓⁿ C(n, i) θ^i (1 − θ)^{n−i} = α/2        (7.2)

Solving equations (7.1) and (7.2) for θ (when n and α are known) gives the (1 − α) level confidence interval (θ(X), θ̄(X)), where θ̄(x) is the solution of equation (7.1) and θ(x) is the solution of equation (7.2).
Problem 7.10 Assume there is a constant probability θ that a person entering a supermarket will make a purchase, so that the customers constitute a random sample of a Bernoulli random variable (success = purchase made, failure = no purchase made). If 10 persons were selected at random and it was found that 4 made a purchase, obtain a 90% confidence interval for θ.
Solution: The 90% confidence limits for θ satisfy

Σᵢ₌₀⁴ C(10, i) θ^i (1 − θ)^{10−i} = 0.05 and Σᵢ₌₄¹⁰ C(10, i) θ^i (1 − θ)^{10−i} = 0.05.

Solving these equations for θ gives θ̄(x) = 0.696 and θ(x) = 0.150. Thus, if a random sample of 10 independent Bernoulli random variables gives x = 4 successes, the 90% confidence interval for θ is (0.150, 0.696).
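The two tail equations can be solved numerically; a sketch under the assumption that SciPy is available:

from scipy.stats import binom
from scipy.optimize import brentq

# Solve the tail equations of Problem 7.10 for the 90% confidence limits
n, x, alpha = 10, 4, 0.10
upper = brentq(lambda th: binom.cdf(x, n, th) - alpha / 2, 1e-9, 1 - 1e-9)
lower = brentq(lambda th: 1 - binom.cdf(x - 1, n, th) - alpha / 2, 1e-9, 1 - 1e-9)
print(lower, upper)   # approximately 0.150 and 0.696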
Problem 7.11 Let X1, X2, ···, Xn be a random sample from a Poisson random variable X with parameter θ. Obtain a (1 − α) level confidence interval for θ.
Solution: Let Y = Σᵢ₌₁ⁿ Xᵢ. Given that each Xᵢ follows P(θ), Y ∼ P(nθ). The exact (1 − α) level confidence interval for θ follows from

Pθ{λ1(θ) < Y < λ2(θ)} = 1 − α, i.e.,

Pθ{Y ≥ λ2(θ)} ≤ α/2 ⇒ Σ_{x=y}^∞ e^{-nθ}(nθ)^x/x! = α/2        (7.3)
and Pθ{Y ≤ λ1(θ)} ≤ α/2 ⇒ Σ_{x=0}^y e^{-nθ}(nθ)^x/x! = α/2    (7.4)

The (1 − α) level confidence interval for θ is equivalent to

Pθ{λ2^{-1}(Y) < θ < λ1^{-1}(Y)} = 1 − α.

Solving equations (7.3) and (7.4) for θ gives the (1 − α) level confidence interval (θ(X), θ̄(X)), where θ̄(x) is the solution of equation (7.3) and θ(x) is the solution of equation (7.4).
Problem 7.12 Let X1, X2, ···, Xn be a random sample of a Uniform random variable X on (0, θ). Obtain a (1 − α) level confidence interval for θ.
Solution: Let T = t(X) = max_{1≤i≤n}{Xᵢ}. The pdf of T is

p(t | θ) = n t^{n−1}/θⁿ,  0 < t < θ,  and 0 otherwise.

The (1 − α) level confidence interval for θ follows from

Pθ{λ1(θ) < T < λ2(θ)} = 1 − α with Pθ{T ≤ λ1(θ)} = α/2 and Pθ{T ≥ λ2(θ)} = α/2.

Thus Pθ{T ≥ λ2(θ)} = ∫_{λ2(θ)}^θ (n/θⁿ) t^{n−1} dt = α/2
⇒ 1 − [λ2(θ)]ⁿ/θⁿ = α/2 ⇒ λ2(θ) = θ(1 − α/2)^{1/n}.

Similarly Pθ{T ≤ λ1(θ)} = ∫_0^{λ1(θ)} (n/θⁿ) t^{n−1} dt = α/2 ⇒ λ1(θ) = θ(α/2)^{1/n}.

Thus the (1 − α) level confidence interval for θ is obtained from

Pθ{θ(α/2)^{1/n} < T < θ(1 − α/2)^{1/n}} = 1 − α
Pθ{T/(1 − α/2)^{1/n} < θ < T/(α/2)^{1/n}} = 1 − α

(T/(1 − α/2)^{1/n}, T/(α/2)^{1/n}) provides the (1 − α) level confidence interval for θ.

Problem 7.13 Let X1, X2, ···, Xn be iid random samples drawn from a normal population with mean θ and variance σ². Find a (1 − α) level confidence interval for θ, (i) when σ² is known and (ii) when σ² is unknown.
Solution: Case (i) When σ² is known, consider

P{a < Z < b} = 1 − α, where Z = (X̄ − θ)/(σ/√n) ∼ N(0, 1)
P{a < (X̄ − θ)/(σ/√n) < b} = 1 − α
P{X̄ − bσ/√n < θ < X̄ − aσ/√n} = 1 − α,

where a is given by ∫_{-∞}^a φ(z) dz = α/2 and b is given by ∫_b^∞ φ(z) dz = α/2.

Case (ii) When σ² is unknown and the sample size n ≤ 30, the statistic

t = (X̄ − θ)/(S/√n) ∼ t distribution with n − 1 d.f.,

where S² = Σᵢ₌₁ⁿ(Xᵢ − X̄)²/(n − 1). In this case

P{t1 < (X̄ − θ)/(S/√n) < t2} = 1 − α
P{X̄ − t2 S/√n < θ < X̄ − t1 S/√n} = 1 − α,

where t1 is given by ∫_{-∞}^{t1} p_{n−1}(t) dt = α/2 and t2 by ∫_{t2}^∞ p_{n−1}(t) dt = α/2.

If n > 30, then t = (X̄ − θ)/(S/√n) ∼ N(0, 1) approximately, and in such a case the (1 − α) confidence interval is

(X̄ − z_{α/2} S/√n, X̄ + z_{α/2} S/√n), where α/2 = ∫_{z_{α/2}}^∞ φ(z) dz.
Problem 7.14 A random sample of size 50 taken from N(θ, σ = 5) has mean 40. Obtain a 95% confidence interval for 2θ + 3.
Solution: Given the sample mean x̄ = 40 and population standard deviation σ = 5, the 95% confidence interval for θ satisfies

P{X̄ − 1.96 σ/√n < θ < X̄ + 1.96 σ/√n} = 0.95
P{2(X̄ − 1.96 σ/√n) + 3 < 2θ + 3 < 2(X̄ + 1.96 σ/√n) + 3} = 0.95.

The 95% confidence limits for 2θ + 3 are

2x̄ + 3 ± 1.96 (2 × 5)/√50 = 83 ± 1.96 × 10/√50.

The 95% confidence interval for 2θ + 3 is (80.2281, 85.7718).

7.4 Shortest length confidence intervals

Let X1, X2, ···, Xn be a random sample from a pdf p(x | θ) and let t(X; θ) = T_θ be a random variable whose distribution is independent of θ. Suppose λ1(α) and λ2(α) are chosen such that

Pθ{λ1(α) < T_θ < λ2(α)} = 1 − α        (7.5)

Equation (7.5) can also be written as Pθ{θ(X) < θ < θ̄(X)} = 1 − α.

For every T_θ, λ1(α) and λ2(α) can be chosen in a number of ways. However, one would like to choose λ1(α) and λ2(α) such that θ̄(X) − θ(X) is minimum; this gives the (1 − α) level shortest confidence interval based on T_θ.

A random variable T_θ = t(X, θ) that is a function of (X1, X2, ···, Xn) and θ and whose distribution is independent of θ is called a pivot; it is convenient to construct T_θ from a sufficient statistic.
Problem 7.15 Let X1, X2, ···, Xn be a random sample from N(θ, σ²) where σ² is known. Obtain the (1 − α) level shortest confidence interval for θ.
Solution: Consider the statistic T_θ = (X̄ − θ)/(σ/√n), which is a pivot, since X̄ is sufficient and T_θ ∼ N(0, 1), i.e., the distribution of T_θ is independent of θ. A (1 − α) level confidence interval for θ follows from

Pθ{a < T_θ < b} = 1 − α
Pθ{a < (X̄ − θ)/(σ/√n) < b} = 1 − α
Pθ{X̄ − bσ/√n < θ < X̄ − aσ/√n} = 1 − α.

The length of this confidence interval is L = (σ/√n)(b − a). Minimize L subject to

∫_a^b φ(x) dx = 1 − α        (7.6)

where φ(x) is the N(0, 1) density. The necessary condition for a minimum of L is

∂L/∂a = (σ/√n)(db/da − 1) = 0 ⇒ db/da = 1.

Differentiating equation (7.6) with respect to a gives φ(b) db/da − φ(a) = 0, i.e., db/da = φ(a)/φ(b). Thus the condition becomes φ(a)/φ(b) = 1, i.e., φ(a) = φ(b), which holds when a = b or a = −b. If a = b, then ∫_a^b φ(x) dx = 0, which does not satisfy (7.6). Hence a = −b, and ∫_{-b}^b φ(x) dx = 1 − α. Thus the shortest length confidence interval based on T_θ is the equal-two-tails confidence interval. The (1 − α) level confidence interval for θ is

(X̄ − z_{α/2} σ/√n, X̄ + z_{α/2} σ/√n),

where z_{α/2} is the point cutting off an upper tail area of α/2. The shortest length of this interval is L = 2 z_{α/2} σ/√n.
Problem 7.16 Let X1, X2, ···, Xn be a sample from U(0, θ). Find the (1 − α) level shortest confidence interval for θ.
Solution: Let T = max_{1≤i≤n}{Xᵢ}. The pdf of T is

p_θ(t) = n t^{n−1}/θⁿ,  0 < t < θ,  and 0 otherwise.

The pdf of Y = T/θ is p(y) = n y^{n−1}, 0 < y < 1, and 0 otherwise; the statistic Y = T/θ is a pivot. The (1 − α) level confidence interval for θ follows from

P{a < T/θ < b} = 1 − α, i.e., P{T/b < θ < T/a} = 1 − α.

The length of the interval is L = (1/a − 1/b)T. To find the shortest confidence interval, minimize L subject to

∫_a^b n y^{n−1} dy = 1 − α, i.e., bⁿ − aⁿ = 1 − α.

Differentiating this with respect to b gives n b^{n−1} − n a^{n−1} da/db = 0, i.e., da/db = (b/a)^{n−1}. Now (1 − α)^{1/n} < b ≤ 1 and

dL/db = T(−(1/a²) da/db + 1/b²) = T (a^{n+1} − b^{n+1})/(a^{n+1} b²) < 0

since a < b ≤ 1. The minimum therefore occurs at b = 1, i.e., 1 − aⁿ = 1 − α ⇒ aⁿ = α and a = α^{1/n}. Thus the (1 − α) shortest confidence interval for θ is

(T, T/α^{1/n}).
Problem 7.17 Let X1, X2, ···, Xn be a sample drawn from a normal population N(θ, σ²) where σ² is unknown. Obtain the (1 − α) level shortest confidence interval for θ.
Solution: The statistic T_θ = (X̄ − θ)/(S/√n) is a pivot, where S² = Σᵢ₌₁ⁿ(Xᵢ − X̄)²/(n − 1), since X̄ is sufficient and the distribution of T_θ is independent of θ; T_θ follows the t distribution with (n − 1) degrees of freedom. The (1 − α) level confidence interval for θ is given by

Pθ{a < T_θ < b} = 1 − α, i.e., Pθ{X̄ − bS/√n < θ < X̄ − aS/√n} = 1 − α.

The length of the confidence interval is L = (b − a)S/√n. Minimize L subject to ∫_a^b p_{n−1}(t) dt = 1 − α, where p_{n−1}(t) is the pdf of the t distribution with n − 1 degrees of freedom:

dL/da = (db/da − 1) S/√n and p_{n−1}(b) db/da − p_{n−1}(a) = 0
⇒ dL/da = [p_{n−1}(a)/p_{n−1}(b) − 1] S/√n.

As in Problem 7.15, the minimum occurs at a = −b, so the (1 − α) level confidence interval is the equal-two-tails confidence interval

(X̄ − t_{α/2}(n − 1) S/√n, X̄ + t_{α/2}(n − 1) S/√n),

where b = t_{α/2}(n − 1) is given by ∫_b^∞ p_{n−1}(t) dt = α/2 and a = −b. The shortest length of this interval is L = 2 t_{α/2}(n − 1) S/√n.
Problem 7.18 Let X1, X2, ···, Xn be iid random samples drawn from a normal population with mean θ and variance σ². Find the (1 − α) level shortest confidence interval for σ² when (i) θ is known and (ii) θ is unknown.
Solution: The statistic

T_{σ²} = Σᵢ₌₁ⁿ(Xᵢ − θ)²/σ² ∼ χ² with n degrees of freedom

is a pivot, since the distribution of T_{σ²} is independent of σ².

Case (i) The (1 − α) level confidence interval for σ² follows from

P{a < T_{σ²} < b} = 1 − α, i.e., P{Σᵢ(Xᵢ − θ)²/b < σ² < Σᵢ(Xᵢ − θ)²/a} = 1 − α.

The length of the confidence interval is

L = Σᵢ₌₁ⁿ(Xᵢ − θ)² (1/a − 1/b).

To find the shortest length confidence interval, minimize L subject to ∫_a^b p_n(χ²) dχ² = 1 − α, where p_n(χ²) is the pdf of the χ² statistic with n d.f.:

dL/da = [−1/a² + (1/b²) db/da] Σᵢ(Xᵢ − θ)² and p_n(b) db/da − p_n(a) = 0, i.e., db/da = p_n(a)/p_n(b)
dL/da = [−1/a² + (1/b²) p_n(a)/p_n(b)] Σᵢ(Xᵢ − θ)².

For a minimum dL/da = 0 ⇒ 1/a² = (1/b²) p_n(a)/p_n(b) ⇒ b² p_n(b) = a² p_n(a).

An iterative method is used to solve b² p_n(b) = a² p_n(a) for a and b together with ∫_a^b p_n(χ²) dχ² = 1 − α, where a < b and a ≠ b. If â and b̂ are the solutions of these equations, then the shortest confidence interval for σ² is

(Σᵢ₌₁ⁿ(Xᵢ − θ)²/b̂, Σᵢ₌₁ⁿ(Xᵢ − θ)²/â).

Case (ii) If θ is unknown, then

T_{σ²} = Σᵢ₌₁ⁿ(Xᵢ − X̄)²/σ² = (n − 1)S²/σ² ∼ χ² with (n − 1) d.f., where S² = Σᵢ₌₁ⁿ(Xᵢ − X̄)²/(n − 1).

In this case one solves a² p_{n−1}(a) = b² p_{n−1}(b), where p_{n−1}(χ²) is the pdf of the statistic T_{σ²} = (n − 1)S²/σ² with (n − 1) d.f. The shortest confidence interval for σ² is

((n − 1)S²/b̂, (n − 1)S²/â).
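The pair (â, b̂) can be computed numerically; a hedged sketch with assumed n and α (SciPy assumed):

from scipy.stats import chi2
from scipy.optimize import brentq

# Find a < b with chi-square_n coverage 1 - alpha and a^2 p_n(a) = b^2 p_n(b)
n, alpha = 10, 0.05

def upper_limit(a):
    # b(a) chosen so that P(a < chi2_n < b) = 1 - alpha
    return chi2.ppf(chi2.cdf(a, n) + 1 - alpha, n)

def balance(a):
    b = upper_limit(a)
    return a**2 * chi2.pdf(a, n) - b**2 * chi2.pdf(b, n)

a_hat = brentq(balance, 1e-6, chi2.ppf(alpha - 1e-9, n))
b_hat = upper_limit(a_hat)
print(a_hat, b_hat)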

Problem 7.19 Let X and Y be two independent random variables that are N(θ, σ1²) and N(θ, σ2²) respectively. Obtain a (1 − α) level confidence interval for the ratio σ2²/σ1², by considering a random sample X1, X2, ···, X_{n1} of size n1 ≥ 2 from the distribution of X and a random sample Y1, Y2, ···, Y_{n2} of size n2 ≥ 2 from the distribution of Y.
Solution: Let s1² = Σᵢ₌₁^{n1}(Xᵢ − X̄)²/n1 and s2² = Σᵢ₌₁^{n2}(Yᵢ − Ȳ)²/n2 be the variances of the two samples. The independent random variables n1 s1²/σ1² and n2 s2²/σ2² have χ² distributions with n1 − 1 and n2 − 1 degrees of freedom respectively. By the definition of the F statistic,

F = [n1 s1²/(σ1²(n1 − 1))] / [n2 s2²/(σ2²(n2 − 1))] ∼ F distribution with n1 − 1 and n2 − 1 degrees of freedom.

The (1 − α) level confidence interval for σ2²/σ1² follows from

P{a < F < b} = 1 − α
P{a n2 s2²(n1 − 1)/(n1 s1²(n2 − 1)) < σ2²/σ1² < b n2 s2²(n1 − 1)/(n1 s1²(n2 − 1))} = 1 − α.

The (1 − α) level confidence interval for σ2²/σ1² is therefore

(a n2 s2²(n1 − 1)/(n1 s1²(n2 − 1)), b n2 s2²(n1 − 1)/(n1 s1²(n2 − 1))),

where a and b are given by

∫_0^a dF(n1 − 1, n2 − 1) = α/2 and ∫_b^∞ dF(n1 − 1, n2 − 1) = α/2.

Problem 7.20 Let $X_1, X_2, \cdots, X_n$ be a random sample of size n from an Exponential family of distributions with parameter θ. Assume the pdf is
$$p_\theta(x) = \begin{cases} \theta e^{-\theta x} & x > 0,\ \theta > 0 \\ 0 & \text{otherwise} \end{cases}$$
Obtain a (1 − α) level confidence interval for θ.
Solution: The joint pdf of the random sample $X_1, X_2, \cdots, X_n$ is
$$p(x_1, x_2, \cdots, x_n) = \theta^n e^{-\theta \sum x_i}$$
Let $T = \sum_{i=1}^{n} X_i$; then $T \sim G(n, \frac{1}{\theta})$. Its pdf is
$$p_\theta(t) = \begin{cases} \frac{\theta^n e^{-\theta t} t^{n-1}}{\Gamma n} & 0 < t < \infty \\ 0 & \text{otherwise} \end{cases}$$
The transformation $Y = 2\theta T$ gives
$$p_\theta(y) = \begin{cases} \frac{1}{2^n \Gamma n}\,e^{-\frac{y}{2}}\,y^{\frac{2n}{2}-1} & 0 < y < \infty \\ 0 & \text{otherwise} \end{cases}$$
That is, $Y = 2\theta \sum X_i$ follows a χ² distribution with 2n degrees of freedom. The (1 − α) level confidence interval for θ is obtained from
$$P_\theta\left\{a < 2\theta \sum X_i < b\right\} = 1 - \alpha$$
$$P_\theta\left\{\frac{a}{2\sum X_i} < \theta < \frac{b}{2\sum X_i}\right\} = 1 - \alpha$$
where a is given by $\int_0^a p_{2n}(\chi^2)\,d\chi^2 = \frac{\alpha}{2}$ and b is given by $\int_b^\infty p_{2n}(\chi^2)\,d\chi^2 = \frac{\alpha}{2}$.

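A small computational sketch of this result (my own, not from the book): the χ² quantiles with 2n degrees of freedom give the interval for the rate θ directly. The data and level below are placeholders.

```python
# Sketch: (1 - alpha) interval for the exponential rate theta via 2*theta*sum(X) ~ chi2(2n).
from scipy import stats

def exponential_rate_ci(x, alpha=0.05):
    n, total = len(x), sum(x)
    a = stats.chi2.ppf(alpha / 2, 2 * n)       # lower chi-square quantile
    b = stats.chi2.ppf(1 - alpha / 2, 2 * n)   # upper chi-square quantile
    return a / (2 * total), b / (2 * total)

print(exponential_rate_ci([0.8, 1.9, 0.3, 2.4, 1.1], alpha=0.10))
```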
Problem 7.21 The time to failure of an electronic component is assumed to follow an Exponential distribution with unknown parameter θ,
$$\text{i.e.,}\quad p_\theta(x) = \begin{cases} \theta e^{-\theta x} & x > 0,\ \theta > 0 \\ 0 & \text{otherwise} \end{cases}$$
10 electronic components are placed on test and their observed times to failure are 607.5, 1947.0, 37.6, 129.9, 409.5, 529.5, 109.0, 582.4, 499.0, 188.1 hours respectively. Find the 90% confidence interval for θ and the 90% confidence interval for the mean time to failure. Also obtain the 90% confidence interval for the probability that the component survives a 100 hour period.
Solution: As in Problem 7.16, $\sum x_i = 5039.5$ and 2n = 20 degrees of freedom. From the χ² table, $\chi^2_{0.05}(20) = 10.9$ and $\chi^2_{0.95}(20) = 31.4$. The 90% confidence interval for θ is
$$\left(\frac{10.9}{2 \times 5039.5},\ \frac{31.4}{2 \times 5039.5}\right) = (0.00108,\ 0.00312)$$
The mean time to failure is 1/θ. The 90% confidence interval for the mean time to failure therefore lies between 1/0.00312 = 320.5 hours and 1/0.00108 = 925.9 hours.
The probability that one of these components will work at least t hours without failure is $P\{X > t\} = e^{-\theta t}$. The 90% confidence interval for the probability that the component survives a 100 hour period lies between $e^{-100 \times 0.00312} = 0.732$ and $e^{-100 \times 0.00108} = 0.898$.
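To double-check the arithmetic (a sketch of mine, not the author's), the exact χ² quantiles and the three derived intervals can be reproduced with SciPy:

```python
# Sketch: reproduce the 90% intervals of Problem 7.21 with exact chi-square quantiles.
import math
from scipy import stats

times = [607.5, 1947.0, 37.6, 129.9, 409.5, 529.5, 109.0, 582.4, 499.0, 188.1]
total, n = sum(times), len(times)                    # total = 5039.5, 2n = 20 df
a = stats.chi2.ppf(0.05, 2 * n)                      # about 10.85
b = stats.chi2.ppf(0.95, 2 * n)                      # about 31.41
theta_lo, theta_hi = a / (2 * total), b / (2 * total)
print(theta_lo, theta_hi)                            # roughly (0.00108, 0.00312)
print(1 / theta_hi, 1 / theta_lo)                    # mean time to failure bounds
print(math.exp(-100 * theta_hi), math.exp(-100 * theta_lo))  # 100-hour reliability bounds
```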
Problem 7.22 Explain a method of construction of a large sample confidence interval for θ in Poisson (θ).
Solution: For large samples the variable
$$Z = \frac{\partial \log L / \partial\theta}{\sqrt{V\left[\partial \log L / \partial\theta\right]}} \sim N(0, 1)$$
Hence, from the distribution of Z, one can easily construct the confidence limits for θ for large samples. We have
$$\log L(\theta) = \sum x_i \log\theta - n\theta - \sum \log(x_i!)$$
$$\frac{\partial \log L(\theta)}{\partial\theta} = \frac{n\bar x}{\theta} - n$$
$$V\left[\frac{\partial \log L(\theta)}{\partial\theta}\right] = V\left[\frac{n\bar X}{\theta} - n\right] = \frac{1}{\theta^2}V\left[\sum_{i=1}^{n} X_i\right] = \frac{1}{\theta^2}\sum_{i=1}^{n} V[X_i] = \frac{1}{\theta^2}\,n\theta = \frac{n}{\theta}$$
$$\text{Thus}\quad Z = \frac{\frac{n\bar x}{\theta} - n}{\sqrt{n/\theta}}$$
The 95% large sample confidence interval for θ is
$$P\{-1.96 < Z < 1.96\} = 0.95$$
$$P\left\{-1.96 < \sqrt{\frac{n}{\theta}}\,(\bar X - \theta) < 1.96\right\} = 0.95$$
Hence the 95% confidence limits for θ satisfy
$$\sqrt{\frac{n}{\theta}}\,(\bar x - \theta) = \pm 1.96$$
$$\theta^2 - \left(2\bar x + \frac{3.84}{n}\right)\theta + \bar x^2 = 0$$
$$\theta = \bar x + \frac{1.92}{n} \pm \sqrt{\frac{3.84}{n}\bar x + \frac{3.69}{n^2}}$$

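A brief numerical sketch (mine, with arbitrary counts) of the quadratic-solution limits derived above:

```python
# Sketch: large-sample (score-based) 95% confidence limits for a Poisson mean.
import math

def poisson_score_ci(x):
    n = len(x)
    xbar = sum(x) / n
    centre = xbar + 1.92 / n
    half = math.sqrt(3.84 * xbar / n + 3.69 / n ** 2)
    return centre - half, centre + half

counts = [3, 3, 4, 6, 2, 4, 4, 3, 1, 2, 0, 5, 7, 1, 4]   # illustrative data
print(poisson_score_ci(counts))                           # approximately (2.47, 4.32)
```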
7.5 Bayes estimation

Bayes estimation treats the parameter θ of a statistical distribution as the realization of a random variable defined on Ω with a known distribution, rather than as an unknown constant. So far, the specification of a distribution has assumed that only its shape is known, not the values of its parameters. Bayes estimation uses prior information about the parameter to specify the distribution completely. This is the major difference in Bayes estimation, and it may be quite reasonable if the past experience is sufficiently extensive and relevant to the problem. The choice of prior distribution is made, like that of the distribution Pθ, by combining experience with convenience.

A number of observations are available from the distribution Pθ, θ ∈ Ω, of a random variable X, and they may be used to check the assumed form of the distribution. But in Bayes estimation only a single realization is available from the distribution of the parameter θ on Ω, and it cannot be used to check the assumption about the prior distribution. This needs special care in Bayes estimation.
In the usual estimation, replication of a random experiment consists of drawing another set of observations from the distribution Pθ of the random variable X. In Bayes estimation, replication of the random experiment means taking another value θ0 on Ω from the prior distribution and then drawing a set of observations from the distribution Pθ0 of the random variable X.
The determination of a Bayes estimator is quite simple in principle. Consider the situation before observations are taken: the distribution of θ on Ω at this stage is known as the prior distribution.
A decision function d(X) is a statistic that takes values in Ω. A non-negative function L(θ, d(X)), θ ∈ Ω, is called a loss function. The function R defined by R(θ, d) = Eθ[L(θ, d(X))] is known as the risk function associated with the decision function d(X) at θ. For example, if L(θ, d) = [θ − d]², θ ∈ Ω ⊆ ℜ, then the risk R(θ, d) = Eθ[d(X) − θ]² is the mean squared error; it is the variance of the estimator d(X) when Eθ[d(X)] = θ.

7.6 Bayes risk to prior information

In Bayes estimation, the pdf (pmf) π(θ) of θ on Ω ⊆ ℜ is known as the prior distribution. For a fixed θ ∈ Ω, the pdf (pmf) p(x | θ) represents the conditional pdf (pmf) of a random variable X given θ. If π(θ) is the pdf (pmf) of θ on Ω ⊆ ℜ, then the joint pdf (pmf) of θ on Ω and X is given by p(x, θ) = π(θ)p(x | θ).
The Bayes risk of a decision function d with respect to the loss function L(θ, d) is defined by R(π, d) = Eθ[R(θ, d)]. If θ on Ω is a continuous random variable and X is of the continuous type, then the Bayes risk with respect to the loss function L(θ, d) is
$$R(\pi, d) = E_\theta[R(\theta, d)] = \int R(\theta, d)\,\pi(\theta)\,d\theta = \int E_\theta[L(\theta, d(X))]\,\pi(\theta)\,d\theta = \int\!\!\int L[\theta, d(x)]\,p(x \mid \theta)\,\pi(\theta)\,dx\,d\theta$$
If θ on Ω is a discrete variable with pmf π(θ) and X is of the discrete type, then
$$R(\pi, d) = \sum_\theta \sum_x L[\theta, d(x)]\,p(x \mid \theta)\,\pi(\theta)$$

7.7 Bayes point estimation

A decision function d⋆(X) is known as a Bayes estimator if it minimizes the Bayes risk, i.e., if R(π, d⋆) = inf_d R(π, d).
p(θ | x) is the conditional distribution of the random variable θ on Ω given X = x; it is also called the posterior distribution of θ on Ω given the sample. The joint pdf of X and θ on Ω can be expressed in the form p(x, θ) = g(x)p(θ | x), where g(x) denotes the marginal pdf (pmf) of X. The prior pdf (pmf) π(θ) gives the distribution of θ on Ω before the sample is taken, and the posterior pdf (pmf) p(θ | x) gives the distribution of θ on Ω after sampling.

7.8 Bayes risk to posterior information

The Bayes risk of a decision function d(X) with respect to a loss function L(θ, d(X)), written in terms of p(θ | x), is
$$R(\pi, d) = E_\theta[R(\theta, d)] = \int g(x)\left[\int L(\theta, d(x))\,p(\theta \mid x)\,d\theta\right]dx$$
or, in the discrete case,
$$R(\pi, d) = \sum_x g(x)\left[\sum_\theta L(\theta, d(x))\,p(\theta \mid x)\right]$$
E[R(θ, d)] is the mean (expected) value of the risk R(θ, d). It is evident that a Bayes estimator d⋆(X) minimizes this mean value of the risk.
Theorem 7.1 Let $X_1, X_2, \cdots, X_n$ be a random sample from the pdf p(x | θ) and let π(θ) be a prior pdf of θ on Ω ⊆ ℜ. Let L(θ, d) = (θ − d)² be the loss function for estimating the parameter θ. Then the Bayes estimator of θ is given by d⋆(X) = E[θ | X = x].
Proof: The Bayes risk of a decision function d(X) with respect to the loss function L(θ, d) = [θ − d]² is
$$R(\pi, d) = \int g(x)\left[\int [\theta - d(x)]^2\,p(\theta \mid x)\,d\theta\right]dx$$
The Bayes estimator is a function d⋆(X) that minimizes R(π, d). Since g(x) ≥ 0, minimizing R(π, d) is the same as minimizing, for each fixed x,
$$\int [\theta - d(x)]^2\,p(\theta \mid x)\,d\theta$$
This mean squared deviation is minimum iff d⋆(x) = E[θ | X = x], since the expected squared deviation of a random variable about a constant is smallest when the constant is its mean. Hence d⋆(X) = E[θ | X = x] is the Bayes estimator.

Remark 7.1 If L(θ, d) = |θ − d| is the loss function for estimating the parameter θ, then the Bayes estimator of θ is the median of the posterior distribution of θ ∈ Ω ⊆ ℜ, since E|X − a|, as a function of a, is minimized when a⋆ is a median of the distribution of X. Also, a Bayes estimator need not be unbiased.

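A small numerical sketch (my own illustration, with an arbitrary discrete posterior) of the two facts above: the posterior mean minimizes the expected squared loss, while the posterior median minimizes the expected absolute loss.

```python
# Sketch: compare expected posterior losses over a grid of candidate estimates d.
import numpy as np

theta = np.array([0.1, 0.3, 0.5, 0.7, 0.9])        # support of an assumed posterior
post = np.array([0.05, 0.10, 0.20, 0.40, 0.25])    # assumed posterior probabilities

grid = np.linspace(0, 1, 1001)
sq_loss = [(post * (theta - d) ** 2).sum() for d in grid]
abs_loss = [(post * np.abs(theta - d)).sum() for d in grid]

print(grid[np.argmin(sq_loss)], (post * theta).sum())   # both close to the posterior mean 0.64
print(grid[np.argmin(abs_loss)])                        # close to the posterior median 0.7
```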
7.9 Bayes minimax estimation

The principle of minimax estimation is to choose d⋆ so that max_θ R(θ, d⋆) ≤ max_θ R(θ, d) for all d. If such a function d⋆ exists, it is a minimax estimator of θ ∈ Ω ⊆ ℜ.
Theorem 7.2 If d⋆(X) is a Bayes estimator having constant risk, that is R(θ, d⋆) = constant, then d⋆(X) is a minimax estimator.
Proof: Let π⋆(θ) be the prior density corresponding to the Bayes estimator d⋆(X) with respect to the loss function L(θ, d). Then
$$\sup_{\theta \in \Omega} R(\theta, d^\star) = \text{constant} = \int R(\theta, d^\star)\,\pi^\star(\theta)\,d\theta = R(\pi^\star, d^\star) \le R(\pi^\star, d) = \int R(\theta, d)\,\pi^\star(\theta)\,d\theta \le \sup_{\theta \in \Omega} R(\theta, d)$$
for any other estimator d(X) of the parameter θ, where the first inequality uses the fact that d⋆ is Bayes with respect to π⋆. Thus d⋆(X) is a minimax estimator.
For any estimator d(X),
$$\text{Mean squared error of } d(X) = E[d(X) - \theta]^2 = E\big[d(X) - E[d(X)] + E[d(X)] - \theta\big]^2 = E\big[d(X) - E[d(X)]\big]^2 + \big[E[d(X)] - \theta\big]^2 = V_\theta[d(X)] + [\text{bias}]^2$$
where Eθ[d(X)] − θ is called the bias of the estimator d(X).


Problem 7.23 Let X ∼ b(n, θ) and let the prior pdf of θ on Ω ⊆ ℜ be U(0, 1). Find the Bayes estimate of θ using the quadratic loss function. Also find the minimax estimate of θ.
Solution: The prior pdf of θ on Ω is
$$\pi(\theta) = \begin{cases} 1 & 0 < \theta < 1 \\ 0 & \text{otherwise} \end{cases}$$
The marginal pdf of X is
$$g(x) = \int p(x, \theta)\,d\theta = \int \pi(\theta)\,p(x \mid \theta)\,d\theta = \int_0^1 \binom{n}{x}\theta^{x+1-1}(1-\theta)^{n-x+1-1}\,d\theta = \binom{n}{x}\frac{\Gamma(x+1)\Gamma(n-x+1)}{\Gamma(n+2)} = \frac{n!}{x!(n-x)!}\,\frac{x!\,(n-x)!}{(n+1)!}$$
$$g(x) = \begin{cases} \frac{1}{n+1} & x = 0, 1, 2, \cdots, n \\ 0 & \text{otherwise} \end{cases}$$
The posterior pdf of θ on Ω is
$$p(\theta \mid x) = \frac{p(x, \theta)}{g(x)} = \frac{\pi(\theta)\,p(x \mid \theta)}{g(x)} = (n+1)\binom{n}{x}\theta^x(1-\theta)^{n-x}, \quad 0 < \theta < 1$$
The Bayes estimate d⋆(x) of the parameter θ is
$$d^\star(x) = E(\theta \mid X = x) = \int_0^1 \theta\,p(\theta \mid x)\,d\theta = (n+1)\binom{n}{x}\int_0^1 \theta^{x+2-1}(1-\theta)^{n-x+1-1}\,d\theta = (n+1)\frac{n!}{x!(n-x)!}\,\frac{(x+1)!\,(n-x)!}{(n+2)!} = \frac{x+1}{n+2}$$
The Bayes estimator of the parameter θ is $d^\star(X) = \frac{X+1}{n+2}$.
The Bayes risk of d⋆(X) with respect to the loss function $L(\theta, d^\star(x)) = [d^\star(x) - \theta]^2$ is
$$R(\pi, d^\star) = \int\!\!\int L[\theta, d^\star(x)]\,\pi(\theta)\,p(x \mid \theta)\,dx\,d\theta = \int_0^1 E_\theta\left[\left(\frac{X+1}{n+2} - \theta\right)^2\right]d\theta$$
Since $E_\theta[(X + 1 - (n+2)\theta)^2] = V_\theta[X] + (E_\theta[X] + 1 - (n+2)\theta)^2 = n\theta(1-\theta) + (1 - 2\theta)^2$,
$$R(\pi, d^\star) = \frac{1}{(n+2)^2}\int_0^1\left[n\theta(1-\theta) + (1-2\theta)^2\right]d\theta = \frac{1}{(n+2)^2}\left(\frac{n}{6} + \frac{1}{3}\right) = \frac{1}{6(n+2)}$$

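A quick simulation sketch (mine, not the book's) that checks the closed forms above: averaging the squared error of the posterior mean over θ drawn from the U(0, 1) prior should recover 1/(6(n + 2)). The sample size and replication count are arbitrary choices.

```python
# Sketch: Monte Carlo check of the Bayes estimate (X+1)/(n+2) and its Bayes risk 1/(6(n+2)).
import numpy as np

rng = np.random.default_rng(0)
n, reps = 10, 200_000
theta = rng.uniform(0, 1, reps)          # draw theta from the U(0,1) prior
x = rng.binomial(n, theta)               # then X | theta ~ b(n, theta)
d_star = (x + 1) / (n + 2)               # Bayes estimate under squared error loss
print(np.mean((d_star - theta) ** 2))    # should be close to 1/(6*(n+2)) = 0.01389
print(1 / (6 * (n + 2)))
```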
Problem 7.24 Let $X_1, X_2, \cdots, X_n$ be a random sample drawn from a population with pmf
$$p_\theta(x) = \begin{cases} \theta^x(1-\theta)^{1-x} & x = 0, 1 \ \text{and}\ 0 < \theta < 1 \\ 0 & \text{otherwise} \end{cases}$$
Assume that the prior distribution of θ on Ω is given by
$$\pi(\theta) = \begin{cases} 1 & 0 < \theta < 1 \\ 0 & \text{otherwise} \end{cases}$$
Find the Bayes estimates of θ and θ(1 − θ) using the quadratic loss function.
Solution: The marginal pdf of $X_1, X_2, \cdots, X_n$ is
$$g(x_1, \cdots, x_n) = \int \pi(\theta)\,p(x_1, \cdots, x_n \mid \theta)\,d\theta = \int_0^1 \theta^{t+1-1}(1-\theta)^{n-t+1-1}\,d\theta = \frac{t!\,(n-t)!}{(n+1)!}, \quad t = \sum x_i = 0, 1, \cdots, n$$
The posterior pdf of θ on Ω is
$$p(\theta \mid x_1, \cdots, x_n) = \frac{\pi(\theta)\,p(x_1, \cdots, x_n \mid \theta)}{g(x_1, \cdots, x_n)} = \begin{cases} \frac{(n+1)!}{t!\,(n-t)!}\,\theta^t(1-\theta)^{n-t} & 0 < \theta < 1 \\ 0 & \text{otherwise} \end{cases}$$
The Bayes estimate of the parameter θ is
$$d^\star(x_1, \cdots, x_n) = E[\theta \mid X_1 = x_1, \cdots, X_n = x_n] = \frac{(n+1)!}{t!\,(n-t)!}\int_0^1 \theta^{t+2-1}(1-\theta)^{n-t+1-1}\,d\theta = \frac{(n+1)!}{t!\,(n-t)!}\,\frac{(t+1)!\,(n-t)!}{(n+2)!} = \frac{t+1}{n+2} = \frac{\sum x_i + 1}{n+2}$$
The Bayes estimate of the parameter θ(1 − θ) is
$$d^\star(x_1, \cdots, x_n) = \int_0^1 \theta(1-\theta)\,p(\theta \mid x_1, \cdots, x_n)\,d\theta = \frac{(n+1)!}{t!\,(n-t)!}\int_0^1 \theta^{t+2-1}(1-\theta)^{n-t+2-1}\,d\theta = \frac{(t+1)(n-t+1)}{(n+2)(n+3)} = \frac{(\sum x_i + 1)(n - \sum x_i + 1)}{(n+2)(n+3)}$$

Problem 7.25 Let $X_1, X_2, \cdots, X_n$ be a random sample drawn from a Poisson population with parameter θ. For estimating θ, the quadratic error loss function is used together with the prior distribution of θ on Ω given by the pdf
$$\pi(\theta) = \begin{cases} e^{-\theta} & \theta > 0 \\ 0 & \text{otherwise} \end{cases}$$
Find the Bayes estimate of (i) θ and (ii) $e^{-\theta}$.
Solution: The marginal pdf of $X_1, X_2, \cdots, X_n$ is
$$g(x_1, \cdots, x_n) = \int_0^\infty \pi(\theta)\,p(x_1, \cdots, x_n \mid \theta)\,d\theta = \int_0^\infty e^{-\theta}\,\frac{e^{-n\theta}\,\theta^{\sum x_i}}{x_1!\cdots x_n!}\,d\theta = \frac{1}{\prod_{i=1}^n x_i!}\int_0^\infty e^{-(n+1)\theta}\theta^{t+1-1}\,d\theta = \frac{t!}{\prod_{i=1}^n x_i!\,(n+1)^{t+1}}, \quad t = \sum x_i$$
The posterior pdf of θ on Ω is
$$p(\theta \mid x_1, \cdots, x_n) = \frac{\pi(\theta)\,p(x_1, \cdots, x_n \mid \theta)}{g(x_1, \cdots, x_n)} = \frac{(n+1)^{t+1}\,e^{-(n+1)\theta}\,\theta^t}{t!}, \quad 0 < \theta < \infty$$
Case (i) The Bayes estimate of θ is
$$d^\star(x_1, \cdots, x_n) = \int_0^\infty \theta\,\frac{(n+1)^{t+1}e^{-(n+1)\theta}\theta^t}{t!}\,d\theta = \frac{(n+1)^{t+1}}{t!}\,\frac{\Gamma(t+2)}{(n+1)^{t+2}} = \frac{t+1}{n+1}$$
Case (ii) The Bayes estimate of $e^{-\theta}$ is
$$d^\star(x_1, \cdots, x_n) = \int_0^\infty e^{-\theta}\,\frac{(n+1)^{t+1}e^{-(n+1)\theta}\theta^t}{t!}\,d\theta = \frac{(n+1)^{t+1}}{t!}\,\frac{\Gamma(t+1)}{(n+2)^{t+1}} = \left(\frac{n+1}{n+2}\right)^{t+1}$$

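As a numerical sketch (not the author's), both posterior expectations above can be checked against the Gamma posterior with SciPy; the data below are invented.

```python
# Sketch: check the Bayes estimates of Problem 7.25 against the Gamma(t+1, rate n+1) posterior.
import math
from scipy import stats

x = [2, 0, 3, 1, 4]                                   # assumed Poisson observations
n, t = len(x), sum(x)

post = stats.gamma(a=t + 1, scale=1.0 / (n + 1))      # posterior of theta
print(post.mean(), (t + 1) / (n + 1))                 # Bayes estimate of theta

# E[e^{-theta} | data]: numerical posterior expectation versus the closed form.
numeric = stats.gamma.expect(lambda th: math.exp(-th), args=(t + 1,), scale=1.0 / (n + 1))
print(numeric, ((n + 1) / (n + 2)) ** (t + 1))
```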
Problem 7.26 Let X ∼ b(n, θ) and suppose that the prior pdf of θ on Ω is U(0, 1). Find the Bayes estimate of θ. Using the loss function $L(\theta, d) = \frac{(\theta - d)^2}{\theta(1-\theta)}$, find the Bayes minimax estimate of θ.
Solution: As in Problem 7.23, the Bayes estimate of θ under squared error loss is $d^\star(x) = \frac{x+1}{n+2}$. The Bayes risk of this estimate with respect to the loss function L(θ, d⋆) is
$$R(\pi, d^\star) = \int_0^1 \pi(\theta)\left[\sum_{x=0}^{n} L(\theta, d^\star(x))\,p(x \mid \theta)\right]d\theta = \int_0^1 E_\theta\left[\left(\frac{X+1}{n+2} - \theta\right)^2\right]\frac{1}{\theta(1-\theta)}\,d\theta$$
Using $E_\theta[(X + 1 - (n+2)\theta)^2] = n\theta(1-\theta) + (1-2\theta)^2 = (n-4)\theta(1-\theta) + 1$,
$$R(\pi, d^\star) = \frac{1}{(n+2)^2}\int_0^1\left[(n-4) + \frac{1}{\theta(1-\theta)}\right]d\theta = \frac{n-4}{(n+2)^2} + \frac{1}{(n+2)^2}\int_0^1\left(\frac{1}{\theta} + \frac{1}{1-\theta}\right)d\theta$$

Problem 7.27 Let $X_1, X_2, \cdots, X_n$ be a random sample drawn from a population with pdf $G(1, \frac{1}{\theta})$. To estimate θ, let the prior pdf of θ be $\pi(\theta) = e^{-\theta}$, θ > 0, and let the loss function be squared error. Find the Bayes estimate of θ.
Solution: The marginal pdf of $X_1, X_2, \cdots, X_n$ is
$$g(x_1, \cdots, x_n) = \int_0^\infty \pi(\theta)\,p(x_1, \cdots, x_n \mid \theta)\,d\theta = \int_0^\infty e^{-\theta(1+t)}\,\theta^{n+1-1}\,d\theta = \frac{n!}{(1+t)^{n+1}}, \quad t = \sum_{i=1}^n x_i,\ 0 < t < \infty$$
The posterior pdf of θ on Ω is
$$p(\theta \mid x_1, \cdots, x_n) = \frac{\pi(\theta)\,p(x_1, \cdots, x_n \mid \theta)}{g(x_1, \cdots, x_n)} = \frac{(1+t)^{n+1}}{n!}\,e^{-\theta(1+t)}\,\theta^n, \quad 0 < \theta < \infty$$
The Bayes estimate of θ is
$$d^\star(x) = \int_0^\infty \theta\,p(\theta \mid x_1, \cdots, x_n)\,d\theta = \frac{(1+t)^{n+1}}{n!}\int_0^\infty e^{-\theta(1+t)}\,\theta^{n+2-1}\,d\theta = \frac{(1+t)^{n+1}}{n!}\,\frac{(n+1)!}{(1+t)^{n+2}} = \frac{n+1}{1+t} = \frac{n+1}{\sum x_i + 1}$$

Problem 7.28 Let $X_1, X_2, \cdots, X_n$ be an iid random sample drawn from a population with pmf b(1, θ). Assume the prior pdf of θ on Ω is
$$\pi(\theta) = \begin{cases} \frac{\theta^{a-1}(1-\theta)^{b-1}}{\beta(a, b)} & 0 < \theta < 1 \\ 0 & \text{otherwise} \end{cases}$$
Find the Bayes estimate of θ using the quadratic loss function.
Solution: The marginal pdf of $X_1, X_2, \cdots, X_n$ is
$$g(x_1, \cdots, x_n) = \int_0^1 \pi(\theta)\,p(x_1, \cdots, x_n \mid \theta)\,d\theta = \int_0^1 \frac{\theta^{t+a-1}(1-\theta)^{n-t+b-1}}{\beta(a, b)}\,d\theta = \frac{1}{\beta(a, b)}\,\frac{\Gamma(a+t)\,\Gamma(n-t+b)}{\Gamma(n+a+b)}, \quad t = \sum x_i$$
The posterior pdf of θ on Ω is
$$p(\theta \mid x_1, \cdots, x_n) = \frac{\Gamma(n+a+b)}{\Gamma(a+t)\,\Gamma(n+b-t)}\,\theta^{a+t-1}(1-\theta)^{n+b-t-1}, \quad 0 < \theta < 1$$
The Bayes estimate of θ is
$$d^\star(x) = \frac{\Gamma(n+a+b)}{\Gamma(a+t)\,\Gamma(n+b-t)}\int_0^1 \theta^{a+1+t-1}(1-\theta)^{n+b-t-1}\,d\theta = \frac{a+t}{n+a+b} = \frac{\sum x_i + a}{n+a+b}$$

Problem 7.29 Let the prior pdf of θ on Ω be N(0, 1). Let $X_1, X_2, \cdots, X_n$ be an iid random sample drawn from a normal population with mean θ and variance 1. Find the Bayes estimate of θ and the Bayes risk with respect to the loss function L[θ, d] = [θ − d]².
Solution: The prior pdf of θ on Ω is
$$\pi(\theta) = \begin{cases} \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}\theta^2} & -\infty < \theta < \infty \\ 0 & \text{otherwise} \end{cases}$$
and the pdf of X given θ is
$$p(x \mid \theta) = \begin{cases} \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}(x-\theta)^2} & -\infty < x < \infty \\ 0 & \text{otherwise} \end{cases}$$
The marginal density of $X_1, X_2, \cdots, X_n$ is
$$g(x_1, \cdots, x_n) = \int_{-\infty}^{\infty} \pi(\theta)\,p(x_1, \cdots, x_n \mid \theta)\,d\theta = \frac{e^{-\frac{1}{2}\sum x_i^2}}{(2\pi)^{\frac{n+1}{2}}}\int_{-\infty}^{\infty} e^{-\frac{1}{2}\left[(n+1)\theta^2 - 2n\theta\bar x\right]}\,d\theta$$
Completing the square and putting $\sqrt{n+1}\left(\theta - \frac{n\bar x}{n+1}\right) = t$,
$$g(x_1, \cdots, x_n) = \frac{e^{-\frac{1}{2}\sum x_i^2 + \frac{n^2\bar x^2}{2(n+1)}}}{(2\pi)^{\frac{n+1}{2}}\sqrt{n+1}}\int_{-\infty}^{\infty} e^{-\frac{1}{2}t^2}\,dt = \frac{e^{-\frac{1}{2}\sum x_i^2 + \frac{n^2\bar x^2}{2(n+1)}}}{\sqrt{n+1}\,(2\pi)^{\frac{n}{2}}}$$
The posterior pdf of θ on Ω is therefore
$$p(\theta \mid x_1, \cdots, x_n) = \frac{\pi(\theta)\,p(x_1, \cdots, x_n \mid \theta)}{g(x_1, \cdots, x_n)} = \sqrt{\frac{n+1}{2\pi}}\,e^{-\frac{n+1}{2}\left[\theta - \frac{n\bar x}{n+1}\right]^2}, \quad -\infty < \theta < \infty$$
that is, $\theta \mid x \sim N\!\left(\frac{n\bar x}{n+1}, \frac{1}{n+1}\right)$. The Bayes estimate of θ is
$$d^\star(x) = E[\theta \mid X_1 = x_1, \cdots, X_n = x_n] = \int_{-\infty}^{\infty}\theta\,p(\theta \mid x_1, \cdots, x_n)\,d\theta = \frac{n\bar x}{n+1}$$
where the same substitution $t = \sqrt{n+1}\left(\theta - \frac{n\bar x}{n+1}\right)$ makes the remaining term vanish by symmetry. The Bayes risk of d⋆ is
$$R(\pi, d^\star) = \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty}\left(\frac{n\bar x}{n+1} - \theta\right)^2 p(\bar x \mid \theta)\,\pi(\theta)\,d\bar x\,d\theta = \int_{-\infty}^{\infty}\pi(\theta)\,E_\theta\!\left[\frac{n\bar X}{n+1} - \theta\right]^2 d\theta$$
$$= \frac{1}{(n+1)^2}\int_{-\infty}^{\infty}\pi(\theta)\,E_\theta\big[n(\bar X - \theta) - \theta\big]^2\,d\theta = \frac{1}{(n+1)^2}\int_{-\infty}^{\infty}\pi(\theta)\left[n^2 V_\theta[\bar X] + \theta^2\right]d\theta$$
$$= \frac{1}{(n+1)^2}\int_{-\infty}^{\infty}\pi(\theta)\left[n + \theta^2\right]d\theta = \frac{n+1}{(n+1)^2} = \frac{1}{n+1}, \quad \text{since } V_\theta[\bar X] = \frac{1}{n} \text{ and } \pi(\theta) \sim N(0, 1)$$

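A short simulation sketch (my own check, with arbitrary settings) of the Normal–Normal result above: drawing θ from the N(0, 1) prior and then data given θ, the mean squared error of nX̄/(n + 1) should be close to 1/(n + 1).

```python
# Sketch: Monte Carlo check of the Bayes estimate n*xbar/(n+1) and Bayes risk 1/(n+1).
import numpy as np

rng = np.random.default_rng(1)
n, reps = 5, 200_000
theta = rng.normal(0.0, 1.0, reps)                   # theta ~ N(0, 1) prior
xbar = rng.normal(theta, 1.0 / np.sqrt(n))           # Xbar | theta ~ N(theta, 1/n)
d_star = n * xbar / (n + 1)                          # posterior mean of theta
print(np.mean((d_star - theta) ** 2), 1 / (n + 1))   # both should be about 0.167
```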
7.10 Bayes confidence intervals

Bayes confidence interval estimation takes into account the prior knowledge of the experiment in constructing a confidence interval for the parameter θ. If the posterior pdf $p(\theta \mid x_1, x_2, \cdots, x_n)$ of θ on Ω is known, then one can easily find functions $l_1(x)$ and $l_2(x)$ such that
$$P\{l_1(X) < \theta < l_2(X)\} = 1 - \alpha$$
This gives the (1 − α) level Bayes confidence interval for θ. Thus
$$P\{l_1(X) < \theta < l_2(X)\} = \int_{l_1(x)}^{l_2(x)} p(\theta \mid x_1, x_2, \cdots, x_n)\,d\theta \quad \text{or} \quad \sum_{l_1(x)}^{l_2(x)} p(\theta \mid x_1, x_2, \cdots, x_n)$$

Problem 7.30 Let $X_1, X_2, \cdots, X_n$ be iid b(1, θ) random variables and let the prior pdf π(θ) of θ on Ω be U(0, 1). Find the (1 − α) level Bayes confidence interval for θ.
Solution: As in Problem 7.24,
$$p(\theta \mid x_1, x_2, \cdots, x_n) = \begin{cases} \frac{1}{\beta(t+1,\ n-t+1)}\,\theta^t(1-\theta)^{n-t} & 0 < \theta < 1,\ t = \sum x_i \\ 0 & \text{otherwise} \end{cases}$$
The (1 − α) level Bayes confidence interval for θ is given by
$$P\{l_1(X) < \theta < l_2(X)\} = 1 - \alpha$$
Taking equal tails, $l_2(x)$ is obtained from $P\{\theta \ge l_2(x)\} = \frac{\alpha}{2}$, i.e.,
$$\int_{l_2(x)}^{1} \frac{1}{\beta(t+1, n-t+1)}\,\theta^t(1-\theta)^{n-t}\,d\theta = \frac{\alpha}{2} \qquad (7.7)$$
and $l_1(x)$ from $P\{\theta \le l_1(x)\} = \frac{\alpha}{2}$, i.e.,
$$\int_{0}^{l_1(x)} \frac{1}{\beta(t+1, n-t+1)}\,\theta^t(1-\theta)^{n-t}\,d\theta = \frac{\alpha}{2} \qquad (7.8)$$
Solving the equations (7.7) and (7.8) for θ, one may get the (1 − α) level Bayes confidence interval $(\underline{\theta}(x), \bar{\theta}(x))$ for θ.
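Equations (7.7) and (7.8) are simply the upper and lower α/2 quantiles of a Beta(t + 1, n − t + 1) distribution, so a sketch of the computation (mine, with made-up data) is a few lines of SciPy:

```python
# Sketch: equal-tailed (1 - alpha) Bayes interval for a Bernoulli theta with a U(0,1) prior.
from scipy import stats

x = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]        # assumed Bernoulli observations
n, t, alpha = len(x), sum(x), 0.05
post = stats.beta(t + 1, n - t + 1)        # posterior of theta
print(post.ppf(alpha / 2), post.ppf(1 - alpha / 2))   # (l1(x), l2(x))
```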
Problem 7.31 Let $X_1, X_2, \cdots, X_n$ be an iid random sample drawn from a normal population N(θ, 1), θ ∈ Ω ⊆ ℜ, and let the prior pdf π(θ) of θ on Ω be N(0, 1). Find the (1 − α) level Bayes confidence interval for θ.
Solution: As in Problem 7.29, the posterior pdf of θ on Ω is
$$p(\theta \mid x_1, x_2, \cdots, x_n) \sim N\!\left(\frac{n\bar x}{n+1},\ \frac{1}{n+1}\right)$$
The (1 − α) level Bayes confidence interval is determined by $P\{l_1(X) < \theta < l_2(X)\} = 1 - \alpha$. Consider the statistic
$$Z = \frac{\theta - \frac{n\bar x}{n+1}}{\frac{1}{\sqrt{n+1}}} \sim N(0, 1)$$
Here θ is the random variable. If one selects the equal-tails confidence interval, then
$$P\left\{-z_{\alpha/2} < \left(\theta - \frac{n\bar X}{n+1}\right)\sqrt{n+1} < z_{\alpha/2}\right\} = 1 - \alpha$$
$$P\left\{\frac{n\bar X}{n+1} - \frac{z_{\alpha/2}}{\sqrt{n+1}} < \theta < \frac{n\bar X}{n+1} + \frac{z_{\alpha/2}}{\sqrt{n+1}}\right\} = 1 - \alpha$$
Thus
$$\left(\frac{n\bar x}{n+1} - \frac{z_{\alpha/2}}{\sqrt{n+1}},\ \frac{n\bar x}{n+1} + \frac{z_{\alpha/2}}{\sqrt{n+1}}\right)$$
is the (1 − α) level Bayes confidence interval for θ.


Problem 7.32 Let $X_1, X_2, \cdots, X_n$ be a random sample from a Poisson distribution with unknown parameter θ. Assume that the prior pdf π(θ) of θ on Ω is
$$\pi(\theta) = \begin{cases} \frac{\alpha^\beta}{\Gamma\beta}\,e^{-\alpha\theta}\,\theta^{\beta-1} & \theta > 0,\ \alpha, \beta > 0 \\ 0 & \text{otherwise} \end{cases}$$
Find the (1 − α) level Bayes confidence interval for θ.
Solution: The pdf of $X_1, X_2, \cdots, X_n$ given θ is
$$p(x_1, x_2, \cdots, x_n \mid \theta) = \frac{e^{-n\theta}\,\theta^t}{\prod_{i=1}^n x_i!}, \quad t = \sum_{i=1}^n x_i$$
The marginal pdf of $X_1, X_2, \cdots, X_n$ is
$$g(x_1, \cdots, x_n) = \int_0^\infty \frac{\alpha^\beta}{\Gamma\beta}\,e^{-\alpha\theta}\theta^{\beta-1}\,\frac{e^{-n\theta}\theta^t}{\prod x_i!}\,d\theta = \frac{1}{\prod x_i!}\,\frac{\alpha^\beta}{\Gamma\beta}\,\frac{\Gamma(\beta+t)}{(\alpha+n)^{\beta+t}}$$
The posterior pdf of θ on Ω is
$$p(\theta \mid x_1, x_2, \cdots, x_n) = \frac{(\alpha+n)^{\beta+t}}{\Gamma(\beta+t)}\,e^{-(n+\alpha)\theta}\,\theta^{\beta+t-1}, \quad \theta > 0$$
The (1 − α) level Bayes confidence interval for θ is determined by $P\{l_1(X) < \theta < l_2(X)\} = 1 - \alpha$. Taking equal tails, $P\{\theta \ge l_2(x)\} = \frac{\alpha}{2}$, i.e.,
$$\int_{l_2(x)}^{\infty} p(\theta \mid x_1, x_2, \cdots, x_n)\,d\theta = \frac{\alpha}{2} \qquad (7.9)$$
and $P\{\theta \le l_1(x)\} = \frac{\alpha}{2}$, i.e.,
$$\int_{0}^{l_1(x)} p(\theta \mid x_1, x_2, \cdots, x_n)\,d\theta = \frac{\alpha}{2} \qquad (7.10)$$
Solving the equations (7.9) and (7.10) for θ, one may get the (1 − α) level Bayes confidence interval $(\underline{\theta}(X), \bar{\theta}(X))$ for θ.
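The posterior above is a Gamma(β + t, rate α + n) distribution, so equations (7.9) and (7.10) reduce to its tail quantiles; a sketch (mine, with illustrative hyperparameters and data) using SciPy:

```python
# Sketch: equal-tailed Bayes interval for a Poisson theta with a Gamma(beta, rate alpha) prior.
from scipy import stats

x = [4, 2, 5, 3, 6, 1]                    # assumed Poisson counts
alpha_prior, beta_prior, level = 1.0, 2.0, 0.05
n, t = len(x), sum(x)
post = stats.gamma(a=beta_prior + t, scale=1.0 / (alpha_prior + n))   # posterior of theta
print(post.ppf(level / 2), post.ppf(1 - level / 2))                    # (l1(x), l2(x))
```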
Problem 7.33 Let $X_1, X_2, \cdots, X_n$ be a sample drawn from a normal population N(θ, 1). Assume that the prior pdf π(θ) on Ω is U(−1, 1). Find the (1 − α) level Bayes confidence interval for θ.
Solution: The pdf of $X_1, X_2, \cdots, X_n$ given θ is
$$p(x_1, x_2, \cdots, x_n \mid \theta) = \begin{cases} \left(\frac{1}{\sqrt{2\pi}}\right)^n e^{-\frac{1}{2}\sum(x_i - \theta)^2} & -\infty < x_i < \infty \\ 0 & \text{otherwise} \end{cases}$$
and the prior pdf of θ on Ω is
$$\pi(\theta) = \begin{cases} \frac{1}{2} & -1 < \theta < 1 \\ 0 & \text{otherwise} \end{cases}$$
The marginal pdf of $X_1, X_2, \cdots, X_n$ is
$$g(x_1, \cdots, x_n) = \int \pi(\theta)\,p(x_1, \cdots, x_n \mid \theta)\,d\theta = \frac{e^{-\frac{1}{2}\sum x_i^2}}{2(2\pi)^{\frac{n}{2}}}\int_{-1}^{1} e^{-\frac{n}{2}[\theta^2 - 2\theta\bar x]}\,d\theta = \frac{e^{-\frac{1}{2}\sum x_i^2 + \frac{n\bar x^2}{2}}}{2(2\pi)^{\frac{n}{2}}}\int_{-1}^{1} e^{-\frac{n}{2}[\theta - \bar x]^2}\,d\theta$$
Treating the range of integration as the whole real line and putting $t = (\theta - \bar x)\sqrt{n}$,
$$g(x_1, \cdots, x_n) \approx \frac{e^{-\frac{1}{2}\sum x_i^2 + \frac{n\bar x^2}{2}}}{2(2\pi)^{\frac{n}{2}}}\,\frac{\sqrt{2\pi}}{\sqrt{n}}$$
The posterior pdf of θ on Ω is then
$$p(\theta \mid x_1, x_2, \cdots, x_n) = \frac{\pi(\theta)\,p(x_1, \cdots, x_n \mid \theta)}{g(x_1, \cdots, x_n)} = \sqrt{\frac{n}{2\pi}}\,e^{-\frac{n}{2}[\theta - \bar x]^2}, \quad -\infty < \theta < \infty$$
that is, approximately $\theta \sim N\!\left(\bar x, \frac{1}{n}\right)$. The (1 − α) level Bayes confidence interval for θ is given by
$$P\{a < Z < b\} = 1 - \alpha, \quad \text{where } Z = \frac{\theta - \bar x}{1/\sqrt{n}} \sim N(0, 1)$$
$$P\{-z_{\alpha/2} < Z < z_{\alpha/2}\} = 1 - \alpha \quad\Rightarrow\quad P\left\{\bar X - \frac{z_{\alpha/2}}{\sqrt{n}} < \theta < \bar X + \frac{z_{\alpha/2}}{\sqrt{n}}\right\} = 1 - \alpha$$
Thus the (1 − α) level Bayes confidence interval for θ is
$$\left(\bar x - \frac{z_{\alpha/2}}{\sqrt{n}},\ \bar x + \frac{z_{\alpha/2}}{\sqrt{n}}\right)$$

Problems

7.1 Distinguish between point estimation and interval estimation.

7.2 Explain the shortest confidence interval. Also obtain the (1 − α) level shortest confidence interval for θ, using a random sample of size n from
$$p(x \mid \theta) = \begin{cases} e^{-(x-\theta)} & x \ge \theta,\ \theta > 0 \\ 0 & \text{otherwise} \end{cases}$$

7.3 Let X1 , X2 , · · · , Xn be a random sample from U (0, θ). Find the shortest - length
confidence interval for θ at level (1 − α).

7.4 Obtain (1 − α) level confidence interval for σ 2 when θ is known in N (θ, σ 2 ).

7.5 Suggest (1 − α) level shortest confidence interval for θ in N (θ, σ 2 ), σ 2 is known.


what is its length?

7.6 Obtain (1−α) coefficient confidence interval for θ based on a random sample from

 1 e− θ1 x

x ≥ 0, θ > 0
θ
p(x | θ) =
 0

otherwise

7.7 Obtain (1 − α) level shortest confidence interval for θ using a random sample from
N (θ, 1).

7.8 Given X1 , X2 , · · · , Xn is a random sample from N (θ, σ 2 ), where σ 2 is known .


Find (1 − α) level upper confidence bound for θ.

7.9 Obtain a confidence interval for the range of a rectangular distribution in random
sample of size n.

7.10 The number of houses sold per week for 15 weeks by Dinesh real estate firm were
3 , 3, 4, 6, 2, 4, 4, 3, 1, 2, 0 , 5, 7, 1, 4 respectively. Assuming these are the
observed values for a random sample size 15 of a Poisson random variable with
parameter θ. Compute 95 % confidence limits for θ. Ans.(2.36, 4.18)

7.11 Show that in large samples, the 95% level confidence limits for the means of a
Poisson distribution are given by
$$\bar X + \frac{1.92}{n} \pm \sqrt{\frac{3.84}{n}\bar X}$$
where terms of order $n^{-2}$ are neglected.

7.12 Show that for the pdf
$$p(x \mid \theta) = \begin{cases} \theta e^{-\theta x} & x > 0,\ \theta > 0 \\ 0 & \text{otherwise} \end{cases}$$

the 95% level confidence limits for large samples are given by
$$\theta = \frac{1}{\bar x}\left(1 \pm \frac{1.96}{\sqrt{n}}\right)$$

7.13 Obtain the large sample confidence interval with confidence coefficient (1 − α)
for the parameter of Bernoulli distribution.

7.14 Examine the connection between shortest confidence interval and sufficient statis-
tics.

7.15 Given n independent observations from a Poisson distribution with mean λ, find
Bayes' estimate of λ, assuming the prior distribution π(λ) = e^{−λ}, 0 < λ < ∞.

7.16 If d is a Bayes estimator of θ relative to some prior distributions and the risk
function does not depend on θ, show that d is minimax.

7.17 Define the terms: loss function, risk function and minimax estimator. Explain a
procedure of computing the minimax estimator under squared error loss function.

7.18 Explain Bayes and Minimax estimation procedures. Find out the Bayes estimate
of θ by using the quadratic loss function. Given a random sample from p(x | θ) =
θx (1 − θ)1−x , x = 0, 1. The a priori distribution of θ is π(θ) = 2θ, 0 ≤ θ ≤ 1.

7.19 Let $X_1, X_2, \cdots, X_n$ be a sample drawn from a normal population N(θ, 1). Assume that the prior pdf π(θ) on Ω is U(−1, 1). Find the (1 − α) level Bayesian confidence interval for θ. Also comment on your confidence interval.

7.20 Explain the concepts of Baye’s estimation.

7.21 Distinguish between interval estimation and Bayes interval estimation.

7.22 90 % confidence interval for θ based on a single observation X from the density function

1

θ 0 < x < θ, θ > 0
p(x | θ) =
 0 otherwise

is
 20X   50 
(a) [X, 10X] (b) 19 , 20X (c) 49 , 12.5 (d) All the above Ans:(a)

7.23 The correct interpretation regarding the confidence interval (T1 , T2 ) of the pa-
rameter θ for a distribution F (x | θ), θ ∈ < with confidence coefficient 1 − α is
(a) θ belongs to (T1 , T2 ) with probability 1 − α
(b) (T1 , T2 ) covers the parameter θ with probability 1 − α
(c) (T1 , T2 ) includes the parameter θ with confidence coefficient 1 − α
(d) θ0 belongs to (T1 , T2 ) with confidence α where θ(6= θ0 ) is the true value.
Ans:(c)

7.24 If a random sample of n = 100 voters in a community produced 59 votes in


favour of the candidate A , then 95 % confidence interval of fraction p of the
voting population favouring A is
q
59×41
(a) 59 ± 1.96 100
q
0.59×0.41
(b) 0.59 ± 1.96 100
q
0.59×0.41
(c) 59 ± 2.58 100
q
59×41
(d) 59 ± 2.58 100 Ans:(b)

7.25 Let X1 , X2 , · · · , Xn be a sample from U (0, θ). The equal two tails (1 − α) level
confidence interval for θ is
 
X(n) X(n)
(a) 1 , 1
 (1−α/2) n (α/2) n 
X(n) X(n)
(b) 1 , 1
 (α/2)
X(n)
n (1−α/2) n
X(n)

(c) (1−α/2) n , (α/2) n

(d) None of the above Ans:(a)

7.26 The joint pdf p(x, θ) can be expressed for the given value θ on Ω ⊆ < and the a
prior density π(θ) as
(a)p(x, θ) = p(x | θ)π(θ)
(b) p(x, θ) = g(x)p(x | θ)
g(θ)
(c) p(x, θ) = p(θ|x)
π(θ)
(d)p(x, θ) = p(x|θ) Ans:(a)

7.27 The joint pdf p(x, θ) can be expressed for the given value X = x. p(θ | x) is the
posterior pdf of θ on Ω ⊆ < and g(x) is the marginal density of X as

(a) p(x, θ) = g(x)p(θ | x)


g(x)
(b) p(x, θ) = p(θ|x)
π(θ)
(c) p(x, θ) = p(θ|x)

(d) p(x, θ) = g(x)p(x | θ) Ans:(a)

7.28 Which of the following statements are correct?


(1) Properties of Bayes estimator are given in terms of minimum risk
(2) For large n, Bayes estimators tend to MLE’s irrespective of prior density π(θ)
of θ on Ω
(3) Bayes estimators in many cases are asymptotically consistent
(4) Goodness of a Bayes estimator is measured in terms of mean squared error
loss function
State the correct answer given below
(a) 1 and 2 ( b) 2 and 3 (c) 3 and 4 (d) 1, 2, 3 and 4 Ans:(d)

7.29 Bayes estimator is


(a) unbiased
(b) not unbiased
(c) asymptotically normal
(d) None of the above Ans:(b)

7.30 Which of the following statement is true?


Main feature of Bayes’ approach in the estimation of parameter is
(a) to consider the parameter a random variable
(b) to specify prior distribution
(c) to specify posterior distribution
(d) All the above Ans:(a)

7.31 Bayes estimator is


(a) always asymptotically normal
(b) always a function of minimal sufficient statistics
(c) most efficient

(d) both (a) and (c) Ans:(b)

7.32 Which of the following statements are true?


(1) Bayes estimation uses the prior information of the distribution to completely
specify the realization of distribution.
(2) Bayes estimation involves only a single observation from the ditribution of θ
on Ω
(3) Bayes estimation consists of repeating a random experiment means taking
another value θ0 on Ω from the prior distribution, then drawing a set of observa-
tions from the distribution Pθ0 of a random variable X
Choose the correct answer given below
(a) 1 and 2 (b) 1 and 3 (c) 2 and 3 (d) 1, 2 and 3 Ans:(d)

7.33 A random sample of size n, (n ≥ 1) is drawn from the density



 2λxe−λx2

x>0
fλ (x) =
 0

otherwise

If the prior of λ as an exponential distribution with mean 1, then which of the


following statement is correct?
(a) The posterior distribution of λ ia an exponential distribution
(b) The Bayes estimator of λ with respect to the squared error loss function
exists and is unique
(c) The Bayes estimator of λ with respect to the absolute error loss function
exists and is unique
(d) The Bayes estimator of e−λ does not exist Ans:(c)

7.34 Consider a Cauchy population with pdf



 1
 1
−∞ < x < ∞
π [1+(x−θ)2 ]
fθ (x) =
 0

otherwise

Let X1 , X2 , · · · , Xn be a random sample from the above population which of the


following confidence intervals for θ have confidence coefficient 1 − α, (0 < α < 1

(a) [X1 − tan π(1−α)


2 , X1 + tan π(1−α)
2 ]
(b) [ X1 +X
2
2
− tan π(1−α)
2 , X1 +X
2
2
+ tan π(1−α)
2 ]
(c) [ X1 +X
2
2
− tan 5π(1−α)
2 , X1 +X
2
2
+ tan 2π(1−α)
7

(d) [ X1 +X22 +X3 − tan 5π(1−α)


2 , X1 +X22 +X3 + tan π(1−α)
7 ] Ans:(a) and (c)

7.35 Let X = (X1 , X2 , X3 , X4 )T be 4 random vestor such that X ∼ N4 (O,


P
) where
 
1 ρ ρ ρ
 
 
 ρ 1
X  ρ ρ 
=


 ρ ρ 1 ρ 


 
ρ ρ ρ 1

is positive definite . Then which of the following statements are true?


(a) X1 X2 , X2 X3 and X3 X4 have identical distribution.
(X1 −X2 )2
(b) (X1 −X3 )2
∼ F1,1
(X1 −X2 )2
(c) (X1 +X3 )2
∼ F1,1
(X1 −X2 )2
(d) (X3 −X4 )2
∼ F1,1 Ans:(a) and (d)

7.36 Let X be a random sample from a Poisson distribution with parameter λ has a
prior distribution f (z) where

 e−z

z>0
f (z) =
 0

otherwise

Under the squared error loss function which of the following statements are cor-
rect ?
(a) The Bayes estimator of eλ is 2X+1
X+1
(b) The posterior mean of λ is 2

(c) The posterior distribution of λ is Gamma distribution.


(d) The Baye’s estimator of e2λ is 22(X+1) Ans:(a) , (b) and (c)

7.37 Let (X, Y ) follow a bivariate normal distribution with mean vector (0, 0) and
dispersion matrix  
X  1 ρ 
= 
ρ 1

q
X−Y 1+ρ
where ρ 6= 0 . Suppose Z = X+Y 1−ρ . Then which of the following statements
are correct?
q
1+ρ X−Y
(a) 1−ρ × √
X 2 +Y 2 +2XY
has a Student t distribution
q
1−ρ X−Y
(b) 1+ρ × √
X 2 +Y 2 −2XY
has a Student’s t distribution
(c) Z is symmetric about zero
(d) E[Z] exists and equal to zero Ans:(a) and (c)

1
7.38 Let X be a random sample from an exponential distribution with mean λ. If λ
has a prior distribution with probability density function

 λe−λ

λ>0
g(λ) =
 0

λ≤0

1
Then the Bayes estimator of λ with respect to the squared error loss function is
2 1 X+1
(a) X+1 (b) X (c) X (d) 2 Ans:(a)

7.39 Let X1 , X2 , · · · , Xn for n ≥ 5 be a random sample from the distribution with pdf

 e−(x−θ)

if x > θ
f (x, θ) =
 0

otherwise

for θ > 0. The confidence coefficient of the confidence interval


h i
ln 4 ln 2
min{X1 , X2 , · · · , Xn } − n , min{X1 , X2 , · · · , Xn } + n for θ is
1
(a) 0.5 (b) 0.75 (c) 0.95 (d) 1 − 2n Ans:(d)

7.40 Let X1 , X2 , X3 and X4 be independent and identically distribution random vari-


ables with common distribution normal with mean µ and variance 2. If the prior
distribution µ is normal with mean 0 and variance 12 , then which of the following
is true?
(a) The prior distribution is not conjugate prior
P
Xi
(b) Posterior Mode of µ given X1 , X2 , X3 and X4 is 8
P
Xi
(c) Posterior Median of µ given X1 , X2 , X3 and X4 is
8 P 2
Xi
(d) Posterior Variance of µ given X1 , X2 , X3 and X4 is 4 Ans:(b)

7.41 Let X1 , X2 , · · · , Xn denote a random sample from a N (µ, σ 2 ) distribution. Let


µ ∈ < be known and σ 2 > 0 be unknown. Let χ2n,α/2 be an upper (α/2)th
percentile point of a χ2n distribution. Then a 100(1 − α)% confidence interval for
σ 2 is given by
 Pn Pn   Pn Pn 
( i=1 Xi2 −µ2 ) ( i=1 Xi2 −µ2 ) (Xi −X̄)2 (Xi −X̄)2
(a) ,
(n−1)χ2(n−1),α/2 (n−1)χ2(n−1),1−α/2
(c) i=1
nχ2n,α/2
, i=1
nχ2n,1−α/2
 Pn Pn   Pn P n 
( X 2 −µ2 )
i=1 i
( X 2 −µ2 )
i=1 i
(Xi −µ)2 (Xi −µ)2
(b) nχ2n,α/2
, nχ2n,1−α/2
(d) i=1
nχ2n,α/2
, i=1
nχ2n,1−α/2
Ans:(d)

7.42 Consider a sample of size one say X = x from a population with pdf

 22 (x − θ) θ < x < 2θ, θ > 0

θ
fθ (x) =
 0

otherwise

Which of the following statements are confidence intervals of θ with confidence


coefficient (1 − α)? true?
h i  
X
(a) , X√ (c) X X

 2 1+ α  1−α , 1+ α/2
√X X

 
(b) , √X √X
1+ 1−α/2 1+ α/2 (d) ,
1+ 1−α/4 1+ 3α/4
Ans:(a),(b),(c) and (d)

7.43 Let X1 , X2 , · · · , Xn be iid N (µ, σ 2 ) variables where µ and σ 2 both are unknown
Consider a confidence interval for σ 2 , which is of the form Ia,b =
parameter. P
Xi −X̄)2
P 
(Xi −X̄)2 (
b , a where b > a > 0. Let Gn be the cumulative distribution
function of a χ2 random variable with n degrees of freedom. Which of the
following statements are true?
(a) It is possible to find a 95% confidence interval of the form Ia,b where ab = 1
(b) If Gn−1 (a) = 1 − Gn−1 (b) = 0.025 then it is the shortest 95% confidence
interval
(c) If it is the shortest confidence interval, then a and b must satisfy the condition
b − a = (n − 3) log ab

(d) If Gn−1 (a) − Gn−1 (b) = 0.95, then the expected length of a 95% confidence
 
1 1
interval of the form Ia,b is (n − 1) a − b σ2 Ans:(b) and (c)

7.44 Let X be Binomial distribution with parameters n and p where n ∈


{0, 1, 2, 3, · · · , } and 0 < p < 1 . When n = 0, X is degenerate at 0. Sup-
pose that n has a prior distribution which is Poisson with a known mean λ > 0.
Which of the following statement are true?
(a) The posterior distribution of n is also Poisson but with a mean different from
λ
(b) If X = 0 , the posterior distribution of n is Poisson with mean λ(1 − p)
(λ−n) 1
(c) The Bayes estimate of n has bias 2 when p = 2

(d) The Bayes estimate of n has larger variance than the variance of unbiased
X
estimate p Ans:(a),(b) and (d)
SUBJECT INDEX

Accelerated random variable 62 Completeness 14 , 152


Analytic solution 2 Convex function 242
Ancillary information 140 Constant stress 63
Ancillary statistic 140 Consistent estimator 92
Assumptions 1, 3 Convergence 97
Asymptotic efficient 208 Convergence in probability 75
Asymptotically unbiased 99 Confidence coefficient 287
Asymptotically normal 244 Covariance inequality 195
Auto-correlation 17 Condition on the sufficient statistic 176
Axioms 2, 3 Coefficient of determination 7
Bayes minimax estimator 310 Correlation coefficient 8
Bayes confidencial intervals 319 Cramer Rao inequality 194
Bayes estimation 307 Cumulative distribution 20
Bayes risk 308, 309 Data reduction 140
Bernoulli distribution 20 Deterministic models 1
Bhattacharya bound 218 Decision function 308
Binomial distribution 21 Efficiency 128 , 208
BLUE 267 Empirical discrete probability 54
Bounded family of completeness 152 Empirical continuous probability 54
Bootstrap method 2 Empirical models 1
Cauchy Schwarz inequality 169,170 Erlang distribution 50
Cauchy principle value 94 Estimator 90, 208
Central limit theorem 246 Exponential family 120
Chapman Robbin- Kiefer 200 Exponential probability 46
Chi-Square test 9 Event 19
Chebychev’s inequality 293 Evaluation 1,7
Collection of data 16 Family of random sets 287
Complete statistics 141 Finite element method 2

Finite difference method 2 LMVUE 198


Fixed derivative 237 Loss function 90,308
Fisher measure of information 195 Lomax pdf 63
Formation 4 Lomax cumulative distribution 64,66
Gamma distribution 49 Mahalanobis 15
Gauss Markoff Theorem 265 Mathematical model 1
Geometric distribution 21 Manipulation 1,4
Grapical solution 2 Mean squared error 308
Histogram 18,55,58 Method of scoring 237
Hyper geometric distribution 23 ,25 Minimum Chi-Square 260
Hypothesis 3, 9 Minimal sufficient 140
Huzurbazar’s Theorem 246 MLE 224
Idealization 1, 3 Monte-Carlo Simulation 10
Jacobian transformation 126 Moment estimation 256
Jensen’s inequality 242 Modern definition of statistics 16
Joint sufficient statistic 124 Modifed minimum Chi-Square 261
Justification 1,8 Multinomial distribution 22
Kendall definition of statistics 15 MVBE 224
Khintchin’s weak law 94 Neyman Criteria 114
Lagrangian multiplier 213 Newton - Raphson method 236
Laplace Transform 147 Negative Binomial distribution 21, 22
Lehmann-Scheffe technique 157 Normal probability distribution 42
Lehmann -Scheffe Theorem 175 Normal equation 269
Least square estimation 265 Numerical solution 2
Linear estimation 266 Numerical methods of MLE 235
Lindeberg - Levey 246 Pareto distribution 63
Lower confidence bound 287 Parameter 90
Linearly independent 269 Point estimation criteria 92
Likelihood function 195 Positive definite 212

Positive semi-definite 212 Root mean square error 8, 9


Posterior distribution 309 Sample space 19
Point estimation 91 Shortest confidence interval 298
Poisson distribution 26 Simulation 1,10
Power series distribution 27 Solving a set of equations 175
Predictive models 1 Statements 3
Probability measure 19 Step stress 63
Probability distribution model 23,25 Step stress scheme 65
Probability mass function 20 Stationary points 267
Probability density function 20 Student’s t distribution 87
Projection matrix 269 Sufficient statistic 109
Prior distribution 308 Successive iterations 235
Progressive stress 63 Terminal value 231
Quadratic loss function 308 Triangular distribution 52
Q - Q plot 60,61,62 Two dimension sufficient statistic 125
Random number 12,13 Unbiased estimator 99
Random variable 19 Uncorrelatedness 167
Random experiment 18,90 Uniform distribution 23,41
Random interval 288 UMVUE 167
Rao - Blackwell Theorem 173 Upper confidence bound 287
Reformation 1,5 Validation 1,9
Relative efficient estimator 128 Yule and Kendall definition 15
Risk function 90, 308 Zero bias estimator 99
Rolle’s Theorem 246
