
Forensic Science International: Genetics 5 (2011) 281-284

Contents lists available at ScienceDirect

Forensic Science International: Genetics


journal homepage: www.elsevier.com/locate/fsig

Original research paper

The predictive value of the maximum likelihood estimator of the number of contributors to a DNA mixture
H. Haned a,*, L. Pene b, F. Sauvage a, D. Pontier a
a Université de Lyon, Université Lyon 1, CNRS, UMR 5558, Laboratoire de biométrie et biologie évolutive, 69622 Villeurbanne, France
b Institut National de Police Scientifique, Laboratoire de Police Scientifique de Lyon, France

ARTICLE INFO

Article history:
Received 13 November 2009
Received in revised form 22 February 2010
Accepted 21 April 2010

Keywords:
DNA mixtures
Likelihood estimator
Traces
Body fluids
Predictive value
Bayes theorem

ABSTRACT

We propose to quantify the accuracy of a likelihood-based estimator that was recently proposed for the determination of the number of contributors to a DNA mixture when genetic data alone are considered [H. Haned, L. Pene, J.R. Lobry, A.B. Dufour, D. Pontier, Estimating the number of contributors to forensic DNA mixtures: does maximum likelihood perform better than maximum allele count? J. Forensic Sci., in press]. Using Bayes theorem, we derive a formula for the calculation of the predictive value (PV) of the likelihood-based estimator. The PV gives the probability that a DNA stain contains the DNA of i people, given that the maximum likelihood estimator gave an estimate of i contributors for this stain. We illustrate the PV calculations for two different types of DNA evidence: traces and body fluids. The PV varied according to the number of contributors involved in the DNA stain. Setting the maximum number of possible contributors to five, the lowest predictive values were scored for five-person mixtures, with a minimum value of 0.26 for traces, but values were never below 0.94 for stains comprising one, two or three contributors, for both traces and body fluids. Values remained relatively high for four-person mixtures, with a minimum value of 0.69. These findings confirm that likelihood maximization is a powerful approach for the determination of the number of contributors to forensic DNA mixtures.

© 2010 Elsevier Ireland Ltd. All rights reserved.
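The Bayes computation summarized in this abstract (formalized as Eq. (2) in Section 2.2) can be checked numerically. The Python below is a minimal sketch for illustration only: it is not the authors' implementation and not the API of the R package forensim mentioned in the Conclusion. It assumes the rounded conditional probabilities of Table 2 and the expert priors of Table 3 as inputs; the names COND, PRIORS and predictive_values are hypothetical.

```python
# Sketch: predictive value (PV) of the ML estimator via Bayes' theorem, Eq. (2).
# Assumption: COND copies Table 2 (rows: true x = k, k = 1..5; columns:
# estimate x_hat = i, i = 1..6) and PRIORS copies Table 3. Names are
# illustrative only; this is not the forensim API.

# COND[k - 1][i - 1] = Pr(x_hat = i | x = k)
COND = [
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.998, 0.002, 0.0, 0.0, 0.0],
    [0.0, 0.005, 0.937, 0.058, 0.0, 0.0],
    [0.0, 0.0, 0.067, 0.805, 0.127, 0.001],
    [0.0, 0.0, 0.0, 0.131, 0.662, 0.207],
]

# Prior Pr(x = k), k = 1..5, for each expert and stain type (Table 3).
PRIORS = {
    ("Expert 1", "Traces"): [0.45, 0.04, 0.30, 0.15, 0.06],
    ("Expert 1", "Body fluids"): [0.87, 0.07, 0.04, 0.01, 0.01],
    ("Expert 2", "Traces"): [0.45, 0.04, 0.35, 0.15, 0.01],
    ("Expert 2", "Body fluids"): [0.87, 0.07, 0.05, 0.01, 0.0],
    ("Expert 3", "Traces"): [0.45, 0.04, 0.25, 0.20, 0.06],
    ("Expert 3", "Body fluids"): [0.87, 0.07, 0.05, 0.01, 0.0],
}


def predictive_values(cond, prior):
    """Return [PV(1), ..., PV(K)], where PV(i) = Pr(x = i | x_hat = i).

    Numerator:   Pr(x_hat = i | x = i) * Pr(x = i).
    Denominator: Pr(x_hat = i), expanded by the law of total probability
                 as the sum over k = 1..K of Pr(x_hat = i | x = k) * Pr(x = k).
    The x_hat = 6 column of COND is never used, since PVs are only
    reported for i = 1..K.
    """
    K = len(prior)
    pvs = []
    for i in range(K):
        numerator = cond[i][i] * prior[i]
        denominator = sum(cond[k][i] * prior[k] for k in range(K))
        # PV is undefined when the estimator can never return i under this
        # prior; this sketch reports 0.0 in that case.
        pvs.append(numerator / denominator if denominator > 0 else 0.0)
    return pvs


if __name__ == "__main__":
    for (expert, stain), prior in PRIORS.items():
        pvs = predictive_values(COND, prior)
        print(expert, stain, " ".join(f"{pv:.2f}" for pv in pvs))
```

Running the script prints one row of predictive values per expert and stain type. With these inputs the values agree with Table 4 to within about 0.01; small gaps are expected because Table 2 is itself rounded. Because the conditional probabilities do not depend on the prior, expanding the denominator by the law of total probability lets a different prior be plugged in without rerunning the simulation.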

* Corresponding author.
E-mail address: haned@biomserv.univ-lyon1.fr (H. Haned).

1872-4973/$ - see front matter © 2010 Elsevier Ireland Ltd. All rights reserved.
doi:10.1016/j.fsigen.2010.04.005

1. Introduction

As the sensitivity of typing methods keeps increasing, forensic experts deal with more and more complex cases of evidence containing the DNA of several individuals. Though numerous statistical methods exist to calculate the strength of DNA evidence, the most challenging step in the interpretation of such mixed stains is still the determination of the number of contributors involved [1]. Usually, the circumstances of the investigated crime, combined with genetic and non-genetic evidence, provide good grounds for determining this number. But the task is seriously complicated when little data is available about the origin of the stain. This is common in DNA casework, where often no suspect or known contributor is available. A common laboratory practice consists of bounding the number of contributors to the minimum required to explain the observed DNA profiles, without making any use of the available data except the number of alleles per locus [2]. Recently, an alternative approach based on the maximum likelihood principle was proposed to overcome this issue [3]. Using qualitative information on which alleles are present in the mixture, this maximum likelihood estimator searches for the number of contributors that maximizes the likelihood of the observed DNA profiles. Using computer-simulated DNA mixtures, the authors of this study showed that maximizing the likelihood of the data to find the most likely number of contributors gives more accurate estimates than using a lower bound when dealing with mixtures of more than three contributors. However, before considering the use of this estimator in practical cases, it is important to have a method for quantifying the level of confidence that can be placed in its results.

In this paper, we propose to globally quantify the accuracy of the maximum likelihood estimator. Relying on Bayes theorem, we derive a formula for the calculation of the predictive value (PV) of the estimator. The PV aims to give a global appreciation of the confidence that can be placed in the estimates, while taking into account prior information about the occurrence of mixed DNA stains in forensic casework. We explain the method and illustrate its potential use in forensic studies.

2. Methods

2.1. Theoretical background

The maximum likelihood estimator takes into account genetic data, namely, the frequencies of the alleles present at each locus
characterizing the analyzed DNA stain, and searches for the number of contributors that maximizes the likelihood of the observed profiles [3]. We define the predictive value of this estimator as the probability of having i contributor(s) to the tested DNA stain, knowing that the likelihood estimator gave an estimate of i contributor(s) for this stain. The PV is data-independent, which means that the observed data, namely the DNA profiles in the stain, are not involved in the calculations. The PV can thus be likened to a precision rate of the estimator, specific to each mixture type.

2.2. Formulation of the predictive value of the likelihood estimator

Denoting $x$ the true number of contributors to the mixture and $\hat{x}$ its estimate, the predictive value of the estimator can be written as the conditional probability $\Pr(x = i \mid \hat{x} = i)$. A simple way to estimate this unknown probability is to rewrite it in terms of its inverse, $\Pr(\hat{x} = i \mid x = i)$. The transformation is done using Bayes formula:

$$\Pr(x = i \mid \hat{x} = i) = \frac{\Pr(\hat{x} = i \mid x = i)\,\Pr(x = i)}{\Pr(\hat{x} = i)} \tag{1}$$

The term $\Pr(\hat{x} = i \mid x = i)$ is the probability that the estimator classifies the considered stain as a mixture of i contributor(s), given that there are actually i contributor(s). Haned et al. [3] used a simulation procedure to estimate these conditional probabilities: a thousand mixtures comprising two to five contributors were simulated by combining alleles at random according to their allele frequencies. The efficiency of the estimator was estimated as the proportion of correctly identified mixtures. Here, we follow a similar procedure: we simulated 1000 DNA stains containing one to five individuals, using the US African American allele frequencies published in [4]. The conditional probabilities of success of the estimator were then estimated for each simulated number of contributors.

Hereafter, we refer to $\Pr(x = i)$ as the prior probability of encountering a mixture of i contributors. $\Pr(\hat{x} = i)$ is the probability of the estimator giving i as an estimate of the number of contributors to the stain, regardless of the actual mixture type. Using the law of total probability, we rewrite $\Pr(\hat{x} = i)$ as a sum of products of conditional and prior probabilities, which gives:

$$\Pr(x = i \mid \hat{x} = i) = \frac{\Pr(\hat{x} = i \mid x = i)\,\Pr(x = i)}{\sum_{k=1}^{K} \Pr(\hat{x} = i \mid x = k)\,\Pr(x = k)} \tag{2}$$

$\Pr(\hat{x} = i \mid x = k)$ is the probability that the estimator classifies the considered stain as a mixture of i contributor(s), knowing that there are actually k contributor(s), where k may or may not equal i. Values of k range from 1 to K, where K is a biologically meaningful threshold for the number of contributors. For illustrative purposes, we set K to 5 and search for the maximum likelihood estimates in the discrete interval [1,6]. As discussed later, this threshold can be extended to K > 5.

2.3. Constructing the prior distribution of mixed DNA stains

Given Eq. (2), the only term left to determine is the prior probability $\Pr(x = i)$. To construct this prior distribution, we used a survey of the crime scene profiles analyzed at the Institut National de Police Scientifique (INPS), the national forensic laboratory in Lyon, France (data communicated by Laurent Pene). For the year 2008, 8479 crime scene profiles were analyzed at the INPS using the Applied Biosystems AmpFlSTR® Identifiler™ kit [5]. These samples were classified either as traces, when they came from contact traces (for instance, epithelial cells on a given object or tool), or as body fluids, when they came from biological fluids, namely blood, saliva and semen. The number of individuals involved in the stain was also recorded. Samples comprising one contributor were classified as single-source stains, samples comprising two contributors as resolvable mixtures, and stains comprising more than two contributors as unresolvable mixtures. This restricted classification reflects the difficulty of determining the real number of individuals involved [6].

Two-person mixtures are believed to account for the majority of mixtures encountered in casework [7]. Three-, four- and five-person mixtures are believed to be rarer. However, as a consequence of the restricted classification, very little data is available in the literature about the occurrence of these complex mixtures in forensic casework. A prior distribution of mixture occurrences therefore had to be constructed for mixtures comprising more than two contributors.

The prior probabilities for stains comprising one or two contributors were set using the available data (survey of the INPS casework for year 2008). We chose to set the remaining probabilities, for mixtures comprising more than two contributors, using experts' prior beliefs. We asked three experienced forensic experts at the INPS to set the proportions of mixed stains comprising three, four or five contributors. We focused on two key issues in setting up this prior distribution:

(i) the probability of encountering a mixture with i contributors must decrease as i increases;
(ii) the probability of encountering a complex mixture with more than two contributors must be greater for traces than for body fluids. We justify this by the difficulty of distinguishing single-source contributors in traces [8].

These requirements are meant to help the forensic experts set the prior distribution, but they are not essential to the method; they can of course be modified or dropped.

3. Results and discussion

3.1. Crime scene profiles survey

Among the 8479 casework stains, 5169 were body fluids and 3310 were traces. The majority of stains (71%) comprised one contributor and were classified as single-source stains. The remaining 29% comprised resolvable mixtures, classified as two-person mixtures (6% of all stains), and unresolvable mixtures (23% of all stains). There were more mixed DNA stains among traces than among body fluids (Table 1). This finding agrees with our predictions and can be explained by the fact that, in body fluids, the major contributor masks the signal of the other contributors to the mixture, whereas in traces, the low quantities of DNA contributed by each individual prevent the detection of single-source DNA contributors.

Table 1
Percentages of crime scene profiles comprising one, two or more than two individuals.

               x = 1   x = 2   x > 2   N
Traces         45%     4%      51%     3310
Body fluids    87%     7%      6%      5169

3.2. Predictive value of the likelihood estimator

The conditional probabilities of success were estimated from simulated data (Table 2). We obtained results similar to those of
Haned et al. [3]. Different prior values were chosen for traces and body fluids (Table 3).

Table 2
Estimates of the conditional probabilities $\Pr(\hat{x} = i \mid x = k)$. Rows give the true number of contributors (x = k) and columns the estimate (x̂ = i). For example, the probability of having an estimate of 5, knowing that there are actually 4 people in the DNA stain, is 0.127.

         x̂ = 1   x̂ = 2   x̂ = 3   x̂ = 4   x̂ = 5   x̂ = 6
x = 1    1       0       0       0       0       0
x = 2    0       0.998   0.002   0       0       0
x = 3    0       0.005   0.937   0.058   0       0
x = 4    0       0       0.067   0.805   0.127   0.001
x = 5    0       0       0       0.131   0.662   0.207

The predictive values varied according to the prior probabilities used (Table 4). Where non-null priors were used, the predictive values were relatively high for both traces and body fluids, ranging from 0.69 to 1 for stains containing one, two, three or four contributors. The lowest values were scored for five-person mixtures (0.26 for traces). When similar priors were used, the PVs differed only slightly; in that case, distinguishing between the types of DNA stains under analysis did not appear necessary.

The priors used in this study are not arbitrary, as they are defined by experts' prior beliefs. The use of such priors in likelihood ratios is controversial, as discussed by Buckleton et al. [9]; in this study, however, the focus is on method evaluation, and these priors are not related to prior knowledge about the number of contributors before the DNA evidence is analyzed.

We set the threshold for the number of contributors to five (Tables 3 and 4), which led to searching for the maximum likelihood estimates in the discrete interval [1,6]. We believe that this is a biologically meaningful threshold for searching for the most plausible number of contributors. However, this threshold can be extended, depending on the crime scene context and the type of evidence being analyzed. For instance, traces are likely to contain more contributors than stains from body fluids. Once the prior distributions of mixed stains are set, the calculations are straightforward.

Table 3
Prior distribution probabilities for traces and body fluids, set by three forensic experts (Expert 1, Expert 2 and Expert 3). Values for x = 1 and x = 2 were set using the data survey shown in Table 1; values for x = 3, ..., 5 were given by the interviewed forensic experts.

                         x = 1   x = 2   x = 3   x = 4   x = 5
Expert 1   Traces        0.45    0.04    0.30    0.15    0.06
           Body fluids   0.87    0.07    0.04    0.01    0.01
Expert 2   Traces        0.45    0.04    0.35    0.15    0.01
           Body fluids   0.87    0.07    0.05    0.01    0
Expert 3   Traces        0.45    0.04    0.25    0.20    0.06
           Body fluids   0.87    0.07    0.05    0.01    0

Table 4
Predictive values of the maximum likelihood estimator according to the prior distributions defined by Experts 1-3 and shown in Table 3. Predictive values are given for traces and body fluids, according to the number of individuals contributing to the stain (x = 1, ..., 5).

                         x = 1   x = 2   x = 3   x = 4   x = 5
Expert 1   Traces        1       0.96    0.96    0.83    0.67
           Body fluids   1       0.99    0.98    0.69    0.84
Expert 2   Traces        1       0.96    0.97    0.85    0.26
           Body fluids   1       0.99    0.98    0.73    0
Expert 3   Traces        1       0.96    0.94    0.88    0.61
           Body fluids   1       0.99    0.98    0.73    0

4. Conclusion

In this paper, we propose the predictive value as a global measure of the efficiency of the likelihood-based estimator. Note that the PV is not meant to be a measure of the uncertainty related to the estimates.

The values presented in this study depend on the simulated data and on the priors we defined. These can be adapted to the context in which the DNA evidence is analyzed. PV calculations using priors different from those proposed here can be carried out using the R package forensim, available from http://forensim.r-forge.r-project.org/.

The maximum likelihood estimator of the number of contributors to forensic DNA mixtures can be powerful in critical cases, for instance in DNA casework. Very often in such cases, little data is available about the origin of the stain and only genetic data are available. These data consist of qualitative information about which alleles are present in the stain and quantitative information about the alleles' peak heights and areas. The maximum likelihood estimator only considers qualitative data. Quantitative information might not always help to separate the DNA profiles into individual components. Moreover, there is no consensus in the literature about how peak heights or areas should be taken into account, and the developments in the literature dealing with quantitative data [10-15] have not met with the expected success in the forensic community.

The fact that genetic data support a certain number of contributors to the evidentiary stain can be of significant help to investigators before any suspect is identified or any comparison between DNA profiles can be made. When no other information is available, this estimate can guide investigators in their search for potential suspects. To conclude, even if the maximum likelihood approach might seem too complex for presentation in court, it should not be neglected as a valuable tool for determining the number of contributors to DNA stains, and forensic experts should be aware that an alternative to the maximum allele count method exists.

Acknowledgments

We thank two referees for a thorough review and constructive comments. We are grateful to Anne Viallefont and David Fouchet for their helpful comments.

References

[1] T. Clayton, J. Buckleton, Mixtures, in: J. Buckleton, C.M. Triggs, S.J. Walsh (Eds.), Forensic DNA Evidence Interpretation, CRC Press, 2005, pp. 217-274.
[2] D.R. Paoletti, T.E. Doom, C.M. Krane, M.L. Raymer, D.E. Krane, Empirical analysis of the STR profiles resulting from conceptual mixtures, J. Forensic Sci. 50 (2005) 1361-1366.
[3] H. Haned, L. Pene, J.R. Lobry, A.B. Dufour, D. Pontier, Estimating the number of contributors to forensic DNA mixtures: does maximum likelihood perform better than maximum allele count? J. Forensic Sci., 2011, in press.
[4] J.M. Butler, R. Schoske, P.M. Vallone, J.W. Redman, M.C. Kline, Allele frequencies for 15 autosomal STR loci on U.S. Caucasian, African American, and Hispanic populations, J. Forensic Sci. 48 (2003) 908-911.
[5] Applied Biosystems, AmpFlSTR® Identifiler™ PCR Amplification Kit User's Manual, Foster City, CA, 2001, P/N 4323291.
[6] B. Budowle, J. Onorato, T.F. Callaghan, A.D. Manna, A.M. Gross, R.A. Guerrieri, et al., Mixture interpretation: defining the relevant features for guidelines for the assessment of mixed DNA profiles in forensic casework, J. Forensic Sci. 54 (2009) 810-821.
[7] Y. Torres, I. Flores, V. Prieto, M. Lopez-Soto, M.J. Farfan, A. Carracedo, P. Sanz, DNA mixtures in forensic casework: a 4-year retrospective study, Forensic Sci. Int. 134 (2003) 180-186.
[8] J. Buckleton, P. Gill, Low copy number, in: J. Buckleton, C.M. Triggs, S.J. Walsh (Eds.), Forensic DNA Evidence Interpretation, CRC Press, 2005, pp. 275-297.
[9] J.S. Buckleton, J.M. Curran, P. Gill, Towards understanding the effect of uncertainty in the number of contributors to DNA stains, Forensic Sci. Int. Genet. 1 (2007) 20-28.
[10] P. Gill, R. Sparkes, R. Pinchin, T. Clayton, J. Whitaker, J. Buckleton, Interpreting simple STR mixtures using allele peak areas, Forensic Sci. Int. 91 (1998) 41-53.
[11] I.W. Evett, P. Gill, J. Lambert, Taking account of peak areas when interpreting mixed DNA profiles, J. Forensic Sci. 43 (1998) 62-69.
[12] M. Perlin, B. Szabady, Linear mixture analysis: a mathematical approach to resolving mixed DNA samples, J. Forensic Sci. 46 (2001) 1372-1378.
[13] T. Clayton, J. Buckleton, Mixtures, in: J. Buckleton, C.M. Triggs, S.J. Walsh (Eds.), Forensic DNA Evidence Interpretation, CRC Press, 2005, pp. 217-274.
[14] T. Wang, N. Xue, J. Douglas Birdwell, Least-square deconvolution: a framework for interpreting short tandem repeat mixtures, J. Forensic Sci. 51 (2006) 1284-1297.
[15] R. Cowell, S. Lauritzen, J. Mortera, Identification and separation of DNA mixtures using peak area information, Forensic Sci. Int. 166 (2007) 28-34.
