To cite this article: Margaret Land, Sarjinder Singh & Stephen A. Sedory (2012) Estimation of a rare sensitive attribute using Poisson distribution, Statistics, 46:3, 351–360, DOI: 10.1080/02331888.2010.524300
Statistics, Vol. 46, No. 3, June 2012, 351–360
In this paper, a new method is proposed for estimating the mean number of persons possessing a rare sensitive attribute, using the Poisson distribution in survey sampling. Two situations are discussed: when the proportion of persons possessing a rare unrelated attribute is known and when it is unknown. Unbiased estimators of the mean number of persons possessing the rare sensitive attribute are proposed for both situations, and their variance expressions are derived. The relative efficiencies of the proposed estimators with respect to the direct question method estimator are investigated and discussed for different choices of the parameters. A technical point is made that the traditional randomized response models cannot be used to estimate the mean of the Poisson random variable.
1. Introduction
Warner [1] considered a case in which the respondents in a population can be divided into two
mutually exclusive groups: one group with stigmatizing/sensitive characteristic A and the other
group without it. For estimating π, the proportion of respondents in the population belonging to the sensitive group A, a simple random sample with replacement (SRSWR) of n respondents is selected from the population. To collect information on the sensitive characteristic, Warner [1] made use of a randomization device. One such device is a deck of cards, each card bearing one of the following two statements: (i) ‘I belong to group A’; (ii) ‘I do not belong to group A’. The statements occur with relative frequencies P0 and (1 − P0), respectively, in the deck. Each respondent in the sample is asked to select a card at random from the well-shuffled deck. Without showing the card to the interviewer, the interviewee answers the question, ‘Is the
statement on the card true for you?’ The number of respondents n1 who answer ‘yes’ is binomially distributed with parameters n and P0π + (1 − P0)(1 − π). The maximum likelihood estimator of π exists for P0 ≠ 0.5 and is given by
$$\hat{\pi}_w = \frac{(n_1/n) - (1 - P_0)}{2P_0 - 1}. \tag{1}$$
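As a minimal sketch of Equation (1), the Python function below computes Warner's estimate from a count of ‘yes’ answers. The function name `warner_estimate` and the survey numbers are hypothetical illustrations, not taken from the paper; the guard reflects the requirement P0 ≠ 0.5.

```python
def warner_estimate(n1: int, n: int, p0: float) -> float:
    """Warner's estimator of pi, Equation (1).

    n1: number of 'yes' answers, n: sample size, p0: relative frequency of
    'I belong to group A' cards. p0 = 0.5 makes 2*p0 - 1 zero, so pi cannot
    be estimated.
    """
    if p0 == 0.5:
        raise ValueError("P0 must differ from 0.5")
    return (n1 / n - (1 - p0)) / (2 * p0 - 1)


# Hypothetical survey: 1000 respondents, P0 = 0.7, 380 'yes' answers.
# (0.38 - 0.30) / 0.40 = 0.2, up to floating-point rounding.
print(warner_estimate(380, 1000, 0.7))
```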
Let π1 be the true proportion of the population possessing the rare sensitive attribute A1. Examples include the proportion of AIDS patients who continue having affairs with strangers; the proportion of persons who have witnessed a murder; the proportion of persons who have been told by their doctors that they will not survive long due to a terminal disease; the proportion of girls raped by their own fathers, etc. Note that crime and criminals have no limits; thus there is no end to such issues. Consider selecting a large sample of n persons from the population such that, as n → ∞ and π1 → 0, nπ1 = λ1 (finite). Let π2 be the true proportion of the population possessing the rare unrelated attribute A2 such that, as n → ∞ and π2 → 0, nπ2 = λ2 (finite and known). For example, π2 might be the proportion of persons who were born exactly at 12:00 o'clock, the proportion of babies born blind, or the proportion of mothers delivering triplets. Each respondent selected in the sample is requested to rotate a spinner bearing two types of statements:
(a) Do you possess the rare sensitive attribute A1?
and
(b) Do you possess the rare unrelated attribute A2?
with probabilities P and (1 − P), respectively.
Sometimes it may not be feasible to go door to door with a spinner to collect data from the respondents. In such situations, we suggest replacing the spinner with a question based on a third unrelated characteristic, say A3, whose proportion in the population is known. Let P be the known proportion of persons possessing A3, say, those born in summer. Each respondent could then be asked, through a secure e-mail system or by telephone, to do the following: if you were born in summer, report ‘yes’ if you possess the rare sensitive attribute A1 and ‘no’ otherwise; if you were not born in summer, report ‘yes’ if you possess the rare unrelated non-sensitive attribute A2 and ‘no’ otherwise. In this way, privacy is maintained whether one uses a spinner or a third characteristic.
In either case, the probability of a ‘yes’ answer is given by
$$\theta_0 = P\pi_1 + (1 - P)\pi_2. \tag{3}$$
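The reply rule for the e-mail or telephone variant can be checked against Equation (3) by enumeration. In the sketch below, the hypothetical `reported_answer` encodes the rule and `prob_yes` sums the probabilities of all attribute combinations that produce a ‘yes’. For simplicity it treats A1, A2 and A3 as mutually independent, although Equation (3) only needs A3 to be independent of A1 and A2; the numerical values are arbitrary.

```python
from itertools import product


def reported_answer(in_a3: bool, has_a1: bool, has_a2: bool) -> bool:
    """A3 holders (e.g. born in summer) report on A1; everyone else on A2."""
    return has_a1 if in_a3 else has_a2


def prob_yes(p: float, pi1: float, pi2: float) -> float:
    """P(yes) by enumerating A3, A1, A2 (assumed mutually independent here)."""
    total = 0.0
    for in_a3, has_a1, has_a2 in product([True, False], repeat=3):
        weight = ((p if in_a3 else 1 - p)
                  * (pi1 if has_a1 else 1 - pi1)
                  * (pi2 if has_a2 else 1 - pi2))
        if reported_answer(in_a3, has_a1, has_a2):
            total += weight
    return total


# Hypothetical values: P = 0.25, pi1 = 0.002, pi2 = 0.001.
print(prob_yes(0.25, 0.002, 0.001), 0.25 * 0.002 + 0.75 * 0.001)
```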
Note that both attributes A1 and A2 are very rare in the population. As before, we assume that, as n → ∞ and θ0 → 0, nθ0 = λ0 (finite), so that λ0 = Pλ1 + (1 − P)λ2. Let y1, y2, ..., yn be a random sample of n observations from the Poisson distribution with parameter λ0. The likelihood function of the random sample of n observations is then
$$L = \prod_{i=1}^{n} \frac{e^{-\lambda_0}\lambda_0^{y_i}}{y_i!}. \tag{4}$$
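On setting
$$\frac{\partial \ln L}{\partial \lambda_1} = 0, \tag{6}$$
the maximum-likelihood estimator of λ1 is given by
$$\hat{\lambda}_1 = \frac{1}{P}\left[\frac{1}{n}\sum_{i=1}^{n} y_i - (1 - P)\lambda_2\right]. \tag{7}$$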
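Proof Since yi ∼ P(λ0), that is, yi follows a Poisson distribution with parameter λ0 = Pλ1 + (1 − P)λ2, we have
$$E(\hat{\lambda}_1) = \frac{1}{P}\left[\frac{1}{n}\sum_{i=1}^{n} E(y_i) - (1 - P)\lambda_2\right] = \frac{1}{P}\left[\frac{1}{n}\sum_{i=1}^{n} \lambda_0 - (1 - P)\lambda_2\right] = \frac{1}{P}\left[\lambda_0 - (1 - P)\lambda_2\right] = \lambda_1,$$
which proves the theorem.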
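A quick numerical check of Equation (7): a grid search over λ1 on the Poisson log-likelihood of Equation (4) should land on the closed form. The sample `y` and the values P = 0.7, λ2 = 0.5 are made-up assumptions, and the helper names are hypothetical. Note that Equation (7) is the unrestricted stationary point, so it can be negative when the sample mean falls below (1 − P)λ2.

```python
import math


def log_likelihood(lam1: float, y: list[int], p: float, lam2: float) -> float:
    """ln L from Equation (4), with lambda0 = P*lam1 + (1 - P)*lam2."""
    lam0 = p * lam1 + (1 - p) * lam2
    # math.lgamma(k + 1) equals ln(k!)
    return sum(-lam0 + yi * math.log(lam0) - math.lgamma(yi + 1) for yi in y)


def mle_lambda1(y: list[int], p: float, lam2: float) -> float:
    """Closed-form estimator of Equation (7)."""
    return (sum(y) / len(y) - (1 - p) * lam2) / p


y = [0, 2, 1, 3, 1, 0, 2, 1]  # hypothetical responses
p, lam2 = 0.7, 0.5

# Grid search over lambda1 in [0.01, 5.00] with step 0.0001.
grid = [k / 10_000 for k in range(100, 50_001)]
best = max(grid, key=lambda lam1: log_likelihood(lam1, y, p, lam2))

print(mle_lambda1(y, p, lam2))  # 1.1 / 0.7, about 1.5714
print(best)                      # grid point nearest the maximiser
```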
The variance of λ̂1 is given by
$$V(\hat{\lambda}_1) = \frac{\lambda_1}{nP} + \frac{(1 - P)\lambda_2}{nP^2}. \tag{8}$$
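Proof Since yi ∼ P(λ0), that is, yi follows a Poisson distribution with parameter λ0 = Pλ1 + (1 − P)λ2, and all yi are independent, we have
$$V(\hat{\lambda}_1) = V\left\{\frac{1}{P}\left[\frac{1}{n}\sum_{i=1}^{n} y_i - (1 - P)\lambda_2\right]\right\} = \frac{1}{P^2 n^2}\sum_{i=1}^{n} V(y_i) = \frac{1}{P^2 n^2}\sum_{i=1}^{n} \lambda_0 = \frac{\lambda_0}{nP^2} = \frac{P\lambda_1 + (1 - P)\lambda_2}{nP^2} = \frac{\lambda_1}{nP} + \frac{(1 - P)\lambda_2}{nP^2}.$$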
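The unbiasedness of λ̂1 and the variance in Equation (8) can be checked by simulation. The sketch below draws Poisson responses with Knuth's multiplication method, chosen only to keep the example free of third-party libraries; the values λ1 = 1.5, λ2 = 0.5, P = 0.7, n = 20, the seed and the replication count are illustrative assumptions.

```python
import math
import random
import statistics


def poisson_draw(rng: random.Random, lam: float) -> int:
    """Knuth's method: count extra uniforms until their product drops below e^-lam."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def lambda1_hat(y: list[int], p: float, lam2: float) -> float:
    """Estimator of Equation (7)."""
    return (sum(y) / len(y) - (1 - p) * lam2) / p


lam1, lam2, p, n, reps = 1.5, 0.5, 0.7, 20, 20_000
lam0 = p * lam1 + (1 - p) * lam2  # Poisson mean of each response

rng = random.Random(2012)
estimates = [
    lambda1_hat([poisson_draw(rng, lam0) for _ in range(n)], p, lam2)
    for _ in range(reps)
]

theory_var = lam1 / (n * p) + (1 - p) * lam2 / (n * p**2)  # Equation (8)
print(statistics.fmean(estimates), lam1)
print(statistics.variance(estimates), theory_var)
```

With this many replications, the simulated mean and variance should sit close to λ1 and to Equation (8).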
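An unbiased estimator of V(λ̂1) is given by
$$\hat{v}(\hat{\lambda}_1) = \frac{1}{n^2P^2}\sum_{i=1}^{n} y_i. \tag{9}$$
Proof Taking the expected value on both sides of Equation (9), we have
$$E[\hat{v}(\hat{\lambda}_1)] = \frac{1}{n^2P^2}E\left(\sum_{i=1}^{n} y_i\right) = \frac{1}{n^2P^2}\sum_{i=1}^{n} \lambda_0 = \frac{\lambda_0}{nP^2} = \frac{P\lambda_1 + (1 - P)\lambda_2}{nP^2} = V(\hat{\lambda}_1),$$
which proves the result.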
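As a small illustration with a made-up sample and assumed values P = 0.7, λ2 = 0.5, the sketch below computes the variance estimator (1/(n²P²))Σyi and checks that it coincides with Equation (8) after λ1 is replaced by λ̂1 from Equation (7). Algebraically, λ̂1/(nP) + (1 − P)λ2/(nP²) = ȳ/(nP²), so the two agree exactly. All function names are hypothetical.

```python
def lambda1_hat(y: list[int], p: float, lam2: float) -> float:
    """Estimator of Equation (7)."""
    return (sum(y) / len(y) - (1 - p) * lam2) / p


def v_hat(y: list[int], p: float) -> float:
    """Variance estimator sum(y) / (n^2 P^2)."""
    n = len(y)
    return sum(y) / (n**2 * p**2)


def plug_in_variance(y: list[int], p: float, lam2: float) -> float:
    """Equation (8) with lambda1 replaced by its estimate."""
    n = len(y)
    return lambda1_hat(y, p, lam2) / (n * p) + (1 - p) * lam2 / (n * p**2)


y = [0, 2, 1, 3, 1, 0, 2, 1]  # hypothetical responses
print(v_hat(y, 0.7))                  # 10 / (64 * 0.49), about 0.3189
print(plug_in_variance(y, 0.7, 0.5))  # same value
```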
The per cent relative efficiency of the proposed estimator λ̂1 with respect to the direct question method based estimator (equivalently, the case P = 1) reduces to
$$RE = \frac{\lambda_1 P^2}{P\lambda_1 + (1 - P)\lambda_2} \times 100\%. \tag{10}$$
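Equation (10) is straightforward to evaluate. The sketch below, with a hypothetical function name, tabulates it at P = 0.70 and P = 0.85 for the three (λ1 : λ2) pairs considered in the text. For λ1 = λ2 it reduces to 100P², and at P = 1 it returns 100%.

```python
def relative_efficiency(lam1: float, lam2: float, p: float) -> float:
    """Per cent RE of Equation (10): V(direct) / V(proposed) * 100."""
    return 100 * lam1 * p**2 / (p * lam1 + (1 - p) * lam2)


pairs = [(1.5, 1.5), (1.5, 0.5), (0.5, 1.5)]  # (lambda1 : lambda2) pairs from the text
for p in (0.70, 0.85):
    row = ", ".join(
        f"{l1}:{l2} -> {relative_efficiency(l1, l2, p):.2f}%" for l1, l2 in pairs
    )
    print(f"P = {p:.2f}: {row}")
```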
From Equation (10), it is clear that the relative efficiency of the proposed estimator is free from the sample size n. Also, if the two parameters λ1 and λ2 are equal, the relative efficiency reduces to P² × 100%, a function of the randomization device parameter P alone. If P = 1, the relative efficiency attains its maximum of 100%, but the choice P = 1 is not practicable because every respondent would then answer the sensitive question directly. To examine the magnitude of the relative efficiency, we chose three pairs (λ1 : λ2), namely (1.5:1.5), (1.5:0.5) and (0.5:1.5), and varied P from 0.60 to 0.90 in steps of 0.0001. For (λ1 : λ2) = (1.5:0.5), the relative efficiency remains somewhat higher than in the other two cases, which indicates that it is preferable to choose a rare unrelated attribute A2 whose mean is smaller than that of the rare sensitive attribute A1, provided this does not affect the cooperation of the respondents in using the proposed randomization device, such as the spinner shown in Figure 1.
The results based on the proposed method are presented in Figure 2. For (λ1 : λ2) = (1.5:0.5), the relative efficiency of the proposed estimator can be kept between about 60% and 80% by choosing P between 0.70 and 0.85, at some cost to the protection of the respondents' privacy. The choice of P should be made such that the respondents do not feel that their privacy is threatened.
[Figure 1: Spinner with the rare sensitive attribute A1 (proportion P) and the rare unrelated attribute A2 (proportion 1 − P).]
[Figure 2: Relative efficiency (%) versus P for λ1 = λ2 = 1.5; λ1 = 1.5, λ2 = 0.5; and λ1 = 0.5, λ2 = 1.5.]
The main limitation of the method proposed in Section 2 is that the mean of the rare unrelated attribute is sometimes unknown. In the next section, we suggest a method that is free from this limitation.
In this method, each respondent in a sample of n persons, selected from the given population by SRSWR, is requested to rotate two spinners, one after the other. First, the respondent uses spinner-I with the statements:
(a) Do you possess the rare sensitive attribute A1?
and
(b) Do you possess the rare unrelated attribute A2?
with probabilities P and (1 − P), respectively.
Next, the respondent is requested to use spinner-II with the statements:
(a) Do you possess the rare sensitive attribute A1?
and
(b) Do you possess the rare unrelated attribute A2?
with probabilities T and (1 − T), respectively. Spinners I and II are shown in Figure 3. We expect the cost of the survey to increase only slightly if each respondent is requested to rotate two spinners rather than one. As discussed earlier, the spinners can also be replaced by two unrelated characteristics with known proportions if one anticipates a large-scale survey by e-mail or telephone.
[Figure 3: Spinner-I with two rare attributes (probabilities P and 1 − P) and Spinner-II with two rare attributes (probabilities T and 1 − T).]
As before, the probabilities of a ‘yes’ answer when using spinners I and II are given, respectively, by
$$\theta_1 = P\pi_1 + (1 - P)\pi_2 \quad\text{and}\quad \theta_2 = T\pi_1 + (1 - T)\pi_2.$$
Assuming that, as n → ∞, θ1 → 0 and θ2 → 0 with nθ1 = λ1* (finite) and nθ2 = λ2* (finite), and following Section 2, we have
$$P\hat{\lambda}_1 + (1 - P)\hat{\lambda}_2 = \frac{1}{n}\sum_{i=1}^{n} y_{1i} \tag{11}$$
and
$$T\hat{\lambda}_1 + (1 - T)\hat{\lambda}_2 = \frac{1}{n}\sum_{i=1}^{n} y_{2i}, \tag{12}$$
where y1i and y2i denote the observed values of the first and second responses from the ith respondent, respectively. Solving Equations (11) and (12) for λ̂1, we have:
Theorem 3.1 An unbiased estimator of the parameter λ1 for the rare sensitive attribute A1 is given by
$$\hat{\lambda}_1 = \frac{1}{n(P - T)}\left[(1 - T)\sum_{i=1}^{n} y_{1i} - (1 - P)\sum_{i=1}^{n} y_{2i}\right], \tag{13}$$
with T ≠ P.
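Proof Since y1i ∼ P(λ1*) and y2i ∼ P(λ2*), taking expected values on both sides of Equation (13) gives
$$\begin{aligned}
E(\hat{\lambda}_1) &= \frac{1}{n(P - T)}\left[(1 - T)\sum_{i=1}^{n} E(y_{1i}) - (1 - P)\sum_{i=1}^{n} E(y_{2i})\right] \\
&= \frac{1}{n(P - T)}\left[(1 - T)\sum_{i=1}^{n} \lambda_1^* - (1 - P)\sum_{i=1}^{n} \lambda_2^*\right] \\
&= \frac{1}{P - T}\left[(1 - T)\{P\lambda_1 + (1 - P)\lambda_2\} - (1 - P)\{T\lambda_1 + (1 - T)\lambda_2\}\right] \\
&= \frac{1}{P - T}\left[(1 - T)P\lambda_1 + (1 - T)(1 - P)\lambda_2 - (1 - P)T\lambda_1 - (1 - P)(1 - T)\lambda_2\right] \\
&= \lambda_1,
\end{aligned}$$
since (1 − T)P − (1 − P)T = P − T. This proves the theorem.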
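Equations (11) and (12) form a 2 × 2 linear system whose determinant is P − T, which is why T ≠ P is required. The sketch below, with hypothetical responses and assumed P = 0.8, T = 0.2, evaluates Equation (13) together with the companion estimator of λ2 given in Corollary 3.1, and confirms that the pair satisfies both equations. The function name is hypothetical.

```python
def two_spinner_estimates(
    y1: list[int], y2: list[int], p: float, t: float
) -> tuple[float, float]:
    """Return (lambda1_hat, lambda2_hat) from Equations (13) and (19)."""
    if p == t:
        raise ValueError("P and T must differ: the system (11)-(12) is singular")
    n = len(y1)
    s1, s2 = sum(y1), sum(y2)
    lam1 = ((1 - t) * s1 - (1 - p) * s2) / (n * (p - t))
    lam2 = (t * s1 - p * s2) / (n * (t - p))
    return lam1, lam2


y1 = [1, 0, 2, 1, 3, 0, 1, 2]  # hypothetical responses to spinner-I
y2 = [0, 1, 1, 0, 2, 1, 0, 1]  # hypothetical responses to spinner-II
p, t = 0.8, 0.2
lam1, lam2 = two_spinner_estimates(y1, y2, p, t)

# Residuals of Equations (11) and (12); both should be ~0.
res11 = p * lam1 + (1 - p) * lam2 - sum(y1) / len(y1)
res12 = t * lam1 + (1 - t) * lam2 - sum(y2) / len(y2)
print(lam1, lam2)  # about 1.4167 and 0.5833
print(res11, res12)
```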
Theorem 3.2 The variance of the unbiased estimator of the parameter λ1 is given by
$$V(\hat{\lambda}_1) = \frac{1}{n(P - T)^2}\Big[\{P(1 - T)^2 + T(1 - P)^2 - 2PT(1 - P)(1 - T)\}\lambda_1 + \{(1 - P)(1 - T)(2 - P - T) - 2(1 - P)^2(1 - T)^2\}\lambda_2\Big]. \tag{14}$$
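Proof Since y1i ∼ P(λ1*) and y2i ∼ P(λ2*), we have V(y1i) = λ1* and V(y2i) = λ2*. Since the two responses from the same respondent are not independent, we have
$$\begin{aligned}
V(\hat{\lambda}_1) &= \frac{1}{n^2(P - T)^2}\left[(1 - T)^2\sum_{i=1}^{n} V(y_{1i}) + (1 - P)^2\sum_{i=1}^{n} V(y_{2i}) - 2(1 - T)(1 - P)\sum_{i=1}^{n} \mathrm{Cov}(y_{1i}, y_{2i})\right] \\
&= \frac{1}{n^2(P - T)^2}\left[(1 - T)^2\sum_{i=1}^{n} \lambda_1^* + (1 - P)^2\sum_{i=1}^{n} \lambda_2^* - 2(1 - T)(1 - P)\sum_{i=1}^{n} \lambda_{12}^*\right],
\end{aligned} \tag{15}$$
where
$$\lambda_1^* = V(y_{1i}) = E(y_{1i}) = P\lambda_1 + (1 - P)\lambda_2, \tag{16}$$
$$\lambda_2^* = V(y_{2i}) = E(y_{2i}) = T\lambda_1 + (1 - T)\lambda_2, \tag{17}$$
$$\lambda_{12}^* = \mathrm{Cov}(y_{1i}, y_{2i}) = PT\lambda_1 + (1 - P)(1 - T)\lambda_2. \tag{18}$$
On substituting Equations (16), (17) and (18) into Equation (15), we obtain the theorem.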
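To see Theorems 3.1 and 3.2 at work, one needs a joint model for the two correlated responses. The sketch below assumes one: each response is built from independent Poisson pieces (a common-shock, or Poisson-thinning, construction) whose means make E(y1i) = Pλ1 + (1 − P)λ2, E(y2i) = Tλ1 + (1 − T)λ2 and Cov(y1i, y2i) = PTλ1 + (1 − P)(1 − T)λ2; these moments reproduce Equation (14). The parameter values, seed and Knuth sampler are illustrative assumptions.

```python
import math
import random
import statistics


def poisson_draw(rng: random.Random, lam: float) -> int:
    """Knuth's multiplication method for a Poisson(lam) variate."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def variance_eq14(lam1: float, lam2: float, p: float, t: float, n: int) -> float:
    """Theoretical V(lambda1_hat) from Equation (14)."""
    c1 = p * (1 - t) ** 2 + t * (1 - p) ** 2 - 2 * p * t * (1 - p) * (1 - t)
    c2 = (1 - p) * (1 - t) * (2 - p - t) - 2 * (1 - p) ** 2 * (1 - t) ** 2
    return (c1 * lam1 + c2 * lam2) / (n * (p - t) ** 2)


lam1, lam2, p, t, n, reps = 1.5, 0.5, 0.8, 0.2, 20, 10_000
# Means of the shared and response-specific Poisson pieces (assumed model).
shared = p * t * lam1 + (1 - p) * (1 - t) * lam2
only1 = p * (1 - t) * lam1 + (1 - p) * t * lam2
only2 = (1 - p) * t * lam1 + p * (1 - t) * lam2

rng = random.Random(2012)
estimates = []
for _ in range(reps):
    s1 = s2 = 0
    for _ in range(n):
        z = poisson_draw(rng, shared)       # counted in both responses
        s1 += z + poisson_draw(rng, only1)  # y1i
        s2 += z + poisson_draw(rng, only2)  # y2i
    # Equation (13)
    estimates.append(((1 - t) * s1 - (1 - p) * s2) / (n * (p - t)))

print(statistics.fmean(estimates), lam1)
print(statistics.variance(estimates), variance_eq14(lam1, lam2, p, t, n))
```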
Corollary 3.1 An unbiased estimator of the parameter λ2 for the rare unrelated attribute A2 is given by
$$\hat{\lambda}_2 = \frac{1}{n(T - P)}\left[T\sum_{i=1}^{n} y_{1i} - P\sum_{i=1}^{n} y_{2i}\right]. \tag{19}$$
Corollary 3.2 An unbiased estimator of the variance of the estimator λ̂1 is given by
$$\hat{v}(\hat{\lambda}_1) = \frac{1}{n(P - T)^2}\Big[\{P(1 - T)^2 + T(1 - P)^2 - 2PT(1 - P)(1 - T)\}\hat{\lambda}_1 + \{(1 - P)(1 - T)(2 - P - T) - 2(1 - P)^2(1 - T)^2\}\hat{\lambda}_2\Big], \tag{21}$$
where λ̂1 and λ̂2 are defined in Equations (13) and (19), respectively.
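A minimal sketch of Corollary 3.2, using hypothetical responses and assumed P = 0.8, T = 0.2: it forms λ̂1 and λ̂2 from Equations (13) and (19) and plugs them into Equation (21). Because λ̂2 (and even λ̂1) can be negative in small samples, the estimate is not guaranteed to be positive; the code simply reports the value. The function name is hypothetical.

```python
def variance_estimate(y1: list[int], y2: list[int], p: float, t: float) -> float:
    """Estimated variance of lambda1_hat, Equation (21)."""
    n = len(y1)
    s1, s2 = sum(y1), sum(y2)
    lam1_hat = ((1 - t) * s1 - (1 - p) * s2) / (n * (p - t))  # Equation (13)
    lam2_hat = (t * s1 - p * s2) / (n * (t - p))              # Equation (19)
    c1 = p * (1 - t) ** 2 + t * (1 - p) ** 2 - 2 * p * t * (1 - p) * (1 - t)
    c2 = (1 - p) * (1 - t) * (2 - p - t) - 2 * (1 - p) ** 2 * (1 - t) ** 2
    return (c1 * lam1_hat + c2 * lam2_hat) / (n * (p - t) ** 2)


y1 = [1, 0, 2, 1, 3, 0, 1, 2]  # hypothetical responses to spinner-I
y2 = [0, 1, 1, 0, 2, 1, 0, 1]  # hypothetical responses to spinner-II
print(variance_estimate(y1, y2, 0.8, 0.2))  # 0.7276 / 2.88, about 0.2526
```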
The per cent relative efficiency of the proposed estimator λ̂1 with respect to the direct question method based estimator (the case P = 1 and T = 0, for which Equation (14) reduces to λ1/n) is defined as
$$RE = \frac{\lambda_1(P - T)^2}{\{P(1 - T)^2 + T(1 - P)^2 - 2PT(1 - P)(1 - T)\}\lambda_1 + \{(1 - P)(1 - T)(2 - P - T) - 2(1 - P)^2(1 - T)^2\}\lambda_2} \times 100\%. \tag{23}$$
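The sketch below computes the per cent relative efficiency as the ratio of the direct-question variance λ1/n to V(λ̂1) from Equation (14), so that n cancels. It evaluates the three (λ1 : λ2) pairs at one assumed design point, P = 0.8 and T = 0.2; at P = 1, T = 0 it returns 100%. The function name is hypothetical.

```python
def relative_efficiency(lam1: float, lam2: float, p: float, t: float) -> float:
    """Per cent RE: (lam1 / n) divided by V(lambda1_hat) of Equation (14).

    The sample size n cancels, so it is not an argument.
    """
    c1 = p * (1 - t) ** 2 + t * (1 - p) ** 2 - 2 * p * t * (1 - p) * (1 - t)
    c2 = (1 - p) * (1 - t) * (2 - p - t) - 2 * (1 - p) ** 2 * (1 - t) ** 2
    return 100 * lam1 * (p - t) ** 2 / (c1 * lam1 + c2 * lam2)


for lam1, lam2 in [(1.5, 1.5), (1.5, 0.5), (0.5, 1.5)]:
    print(lam1, lam2, round(relative_efficiency(lam1, lam2, 0.8, 0.2), 2))
```

At this design point the (1.5:0.5) pair again gives the highest relative efficiency of the three, in line with the discussion that follows.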
From Equation (23), it is clear that the relative efficiency of the proposed estimator is free from the sample size n. It is also clear that if the two parameters λ1 and λ2 are equal, the relative efficiency is a function of the randomization device parameters P and T alone. If P = 1 and T = 0, the relative efficiency attains its maximum of 100%, but the choice P = 1 and T = 0 is not practicable because it amounts to asking the sensitive question directly. To investigate the magnitude of the relative efficiency, we chose three pairs (λ1 : λ2), namely (1.5:1.5), (1.5:0.5) and (0.5:1.5), and let P range from 0.60 to 0.90 and T range from 0.1 to 0.4, both in steps of 0.001. For (λ1 : λ2) = (1.5:0.5), the relative efficiency remains somewhat higher than in the other two cases, which suggests that it is preferable to choose a rare unrelated attribute A2 whose mean is smaller than that of the rare sensitive attribute A1, provided this does not affect the cooperation of the respondents in using the proposed randomization device, such as the spinners shown in Figure 3. The results based on the proposed method are presented in the three graphs in Figure 4. The relative efficiency of the proposed estimator can be kept between 60% and 80% by choosing P between 0.70 and 0.85 and T between 0.15 and 0.30. P and T should be chosen so that the respondents do not feel that their privacy is threatened, while the difference (P − T) should be kept as large as possible so that the variance of the proposed estimator remains small.
Important remarks: (1) Following the unrelated question model of Greenberg et al. [2], it is also possible to take two independent samples of large sizes to handle an unknown mean of the rare unrelated attribute, but this may increase the cost of the survey too much because of the rarity of the attributes.
(2) Note that other types of randomized response models, such as those due to Warner [1], Kuk [5] and Franklin [6], are not easily extendable to estimating the mean of a Poisson random variable, because one cannot eliminate the factor (1 − π). This factor results in a very high probability of ‘yes’ answers for rare sensitive attributes, which may lead to an inconsistent estimator of the mean of the Poisson random variable if one tries to use these models.
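Remark (2) can be made concrete for Warner's model. There the probability of a ‘yes’ is P0π + (1 − P0)(1 − π), which tends to 1 − P0 rather than 0 as π → 0, so n times this probability grows without bound and no finite Poisson limit exists. Under the proposed design, with π1 = λ1/n and π2 = λ2/n, nθ0 stays at Pλ1 + (1 − P)λ2. The sketch below (hypothetical function name; the values P0 = P = 0.7, λ1 = 1.5, λ2 = 0.5 are arbitrary) covers Warner's model only.

```python
def expected_yes_counts(
    n: int, lam1: float = 1.5, lam2: float = 0.5, p0: float = 0.7, p: float = 0.7
) -> tuple[float, float]:
    """Return (n * theta_Warner, n * theta_0) when pi1 = lam1/n and pi2 = lam2/n."""
    pi1, pi2 = lam1 / n, lam2 / n                    # rare proportions shrinking with n
    theta_warner = p0 * pi1 + (1 - p0) * (1 - pi1)   # Warner's P(yes)
    theta_unrelated = p * pi1 + (1 - p) * pi2        # Equation (3)
    return n * theta_warner, n * theta_unrelated


for n in (1_000, 10_000, 100_000):
    print(n, *expected_yes_counts(n))
```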
Acknowledgements
The authors are thankful to the Editor-in-Chief Professor O. Bunke and a learned referee for the valuable comments on
the original version of the manuscript.
References
[1] S.L. Warner, Randomized response: A survey technique for eliminating evasive answer bias, J. Amer. Statist. Assoc.
60 (1965), pp. 63–69.
[2] B.G. Greenberg, A.L.A. Abul-Ela, W.R. Simmons, and D.G. Horvitz, The unrelated question randomized response
model – theoretical framework, J. Amer. Statist. Assoc. 64 (1969), pp. 520–539.
[3] D.S. Tracy and N.S. Mangat, Some developments in randomized response sampling during the last decade – a follow
up of review by Chaudhuri and Mukerjee, J. Appl. Statist. Sci. 4(2/3) (1996), pp. 147–158.
[4] J.A. Fox and P.E. Tracy, Randomized Response: A Method for Sensitive Surveys, SAGE Publications, Beverly Hills,
CA, 1986.
[5] A.Y.C. Kuk, Asking sensitive questions indirectly, Biometrika 77 (1990), pp. 436–438.
[6] L.A. Franklin, A comparison of estimators for randomized response sampling with continuous distribution from a
dichotomous population, Comm. Statist. Theory Methods 18 (1989), pp. 489–505.