Microsatellite loci, also known as short tandem repeats, are tandem repeat loci with repeat motifs of two to six nucleotides in length (Tautz 1993). Microsatellites are highly informative polymorphic markers. Variation at microsatellite loci has been used to study the history and genetic structure of populations, with applications including DNA fingerprinting, paternity and relatedness testing, reconstruction of evolutionary trees, and estimation of genetic distance. Microsatellites are also useful for inferring migration histories, for identifying individuals of unknown origin, and for detecting hidden population substructure, and they are widely used in linkage mapping.
Statistical properties of all measures of genetic variation critically depend upon the composite parameter $\theta = 4N\mu$, where $N$ is the effective population size and $\mu$ is the mutation rate per locus per generation. An accurate estimate of $\theta$ will greatly facilitate inference on the basis of variation at microsatellite loci. Although variation at microsatellites is extremely useful, little has been done to estimate $\theta$ using microsatellite data. This is partly because the mutation mechanism at such loci is unknown. Microsatellite loci are hypervariable, and the mechanisms that produce new variation at such loci are unusual in comparison with those of classical loci. While the exact mechanism of mutation at such loci is still not well characterized at the molecular level (Jeffreys et al. 1994), it is generally believed that the processes and the patterns of mutations at different loci may differ
1 Corresponding author: Human Genetics Center, School of Public Health, University of Texas, 1200 Herman Pressler, Houston, TX 77030. E-mail: yunxin.fu@uth.tmc.edu
H. Xu and Y.-X. Fu
TABLE 1
Large sample variance of estimator $\hat{\theta}_V$

$\theta$    Var($\hat{\theta}_V$)    SD^a
1           1.67                     1.29
5           35.0                     5.92
10          136.67                   11.69
50          3350.0                   57.88

^a Standard deviation of $\hat{\theta}_V$.
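The entries of Table 1 are consistent with the closed form $\mathrm{Var}(\hat{\theta}_V) = (4\theta^2 + \theta)/3$. The short Python sketch below reproduces the table under that assumption; the function name `large_sample_var_theta_v` is an illustrative choice, not something defined in the paper.

```python
import math

def large_sample_var_theta_v(theta: float) -> float:
    """Large-sample variance of the variance-based estimator.

    Assumed closed form (4*theta**2 + theta) / 3, inferred from the
    values printed in Table 1 rather than quoted from the text.
    """
    return (4.0 * theta**2 + theta) / 3.0

# Print variance and standard deviation for the theta values of Table 1.
for theta in (1, 5, 10, 50):
    var = large_sample_var_theta_v(theta)
    print(f"theta={theta:>2}  Var={var:8.2f}  SD={math.sqrt(var):6.2f}")
```

The standard deviation grows roughly in proportion to $\theta$, which is why a single-locus variance-based estimate is imprecise even in large samples.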
$$\hat{\theta}_V = 2V_s. \qquad (1)$$
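$$\mathrm{Var}(V_s) = \frac{1}{12}\theta + \frac{1}{3}\theta^2. \qquad (2)$$

$$\mathrm{Var}(\hat{\theta}_V) = 4\,\mathrm{Var}(V_s) = \frac{1}{3}\theta + \frac{4}{3}\theta^2. \qquad (3)$$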
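$$F = \frac{1}{\sqrt{1 + 2\theta}}. \qquad (4)$$

$$\hat{F} = \sum_{i \geq 1} p_i^2, \qquad (5)$$

$$\theta = \frac{1}{2}\left(\frac{1}{F^2} - 1\right). \qquad (6)$$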
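As a rough illustration of the two ingredients used here, the following Python sketch computes the sample homozygosity, inverts the stepwise-model relation $F = 1/\sqrt{1 + 2\theta}$, and forms the variance-based estimate $2V_s$. The function names, the example allele sizes, and the use of the $n - 1$ divisor for $V_s$ are assumptions made for the sketch. The homozygosity inversion shown is the uncorrected plug-in value, not the regression-corrected estimator.

```python
import math
from collections import Counter

def sample_homozygosity(alleles):
    """Sum of squared sample allele frequencies."""
    n = len(alleles)
    return sum((count / n) ** 2 for count in Counter(alleles).values())

def theta_from_homozygosity(f):
    """Invert F = 1 / sqrt(1 + 2*theta) for theta (uncorrected plug-in)."""
    return 0.5 * (1.0 / f**2 - 1.0)

def theta_from_size_variance(alleles):
    """Variance-based estimate 2 * V_s; V_s uses an n - 1 divisor (assumed)."""
    n = len(alleles)
    mean = sum(alleles) / n
    v_s = sum((a - mean) ** 2 for a in alleles) / (n - 1)
    return 2.0 * v_s

# Hypothetical repeat counts for eight sampled chromosomes at one locus.
alleles = [10, 10, 11, 11, 11, 12, 12, 13]
f_hat = sample_homozygosity(alleles)
print(f_hat, theta_from_homozygosity(f_hat), theta_from_size_variance(alleles))
```

The two estimates can differ substantially on a single small sample, since one depends only on allele frequencies and the other only on allele sizes.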
E ( F) 1.1313
3.4882 28.2878
0.3998. (7)
n
n2
For 10,
3.3232 63.698
E( F ) 1.1675
0.2569.
n
n2
(8)
TABLE 2
Properties of $\hat{\theta}_F$

$\theta$   n      Fe       Mean     Bias     MSE      Variance
2          30     3.094    2.019    0.019    3.66     3.66
2          50     3.009    2.053    0.053    3.40     3.39
2          100    2.927    2.056    0.056    3.16     3.16
2          200    2.884    2.054    0.054    3.11     3.11
2          300    2.878    2.058    0.058    3.02     3.02
2          400    2.869    2.056    0.056    3.06     3.06
2          600    2.865    2.057    0.057    3.05     3.05
2          1000   2.878    2.071    0.071    3.07     3.07
5          30     7.225    4.999    -0.001   16.52    16.52
5          50     6.939    5.031    0.031    14.25    14.25
5          100    6.742    5.043    0.043    12.90    12.90
5          200    6.636    5.036    0.036    12.26    12.26
5          300    6.604    5.035    0.035    12.10    12.10
5          400    6.603    5.046    0.046    12.00    12.00
5          600    6.561    5.023    0.023    11.78    11.78
5          1000   6.575    5.045    0.045    11.98    11.97
10         30     14.186   10.157   0.157    61.03    61.00
10         50     13.354   10.026   0.026    48.81    48.81
10         100    12.958   10.051   0.051    42.02    42.01
10         200    12.778   10.062   0.062    39.51    39.51
10         300    12.691   10.041   0.041    38.43    38.43
10         400    12.605   9.994    -0.006   37.79    37.79
10         600    12.625   10.035   0.035    36.90    36.90
10         1000   12.610   10.043   0.043    36.88    36.88
20         30     27.940   19.909   -0.091   217.52   217.51
20         50     26.249   19.973   -0.027   171.77   171.77
20         100    25.260   20.012   0.012    142.82   142.82
20         200    24.941   20.099   0.099    131.68   131.67
20         300    24.600   19.923   -0.077   125.90   125.89
20         400    24.604   19.977   -0.023   124.55   124.55
20         600    24.579   20.006   0.006    123.72   123.72
20         1000   24.493   19.972   -0.028   122.48   122.48
30         30     41.942   30.104   0.104    514.50   514.49
30         50     39.140   30.010   0.010    377.85   377.85
30         100    37.721   30.126   0.126    308.05   308.03
30         200    36.932   30.002   0.002    278.08   278.08
30         300    36.850   30.094   0.094    270.54   270.53
30         400    36.646   30.000   0.000    262.49   262.49
30         600    36.565   30.007   0.007    262.79   262.79
30         1000   36.480   29.994   -0.006   256.49   256.49
40         30     55.986   40.358   0.358    927.53   927.40
40         50     51.862   39.946   -0.054   643.28   643.28
40         100    49.832   39.986   -0.014   527.34   527.34
40         200    49.100   40.085   0.085    481.06   481.05
40         300    48.776   40.028   0.028    466.07   466.07
40         400    48.829   40.174   0.174    451.92   451.89
40         600    48.553   40.044   0.044    450.49   450.49
40         1000   48.433   40.020   0.020    441.63   441.63
TABLE 3
Comparison of $\hat{\theta}_F$ and $\hat{\theta}_V$

            $\hat{\theta}_F$                  $\hat{\theta}_V$
$\theta$    Bias     MSE      Var       Bias     MSE       Var       Var(T)^a   Efficiency^b
1           0.052    1.16     1.16      0.002    1.65      1.65      1.67       1.44
2           0.071    3.07     3.07      0.015    6.11      6.11      6.00       1.96
3           0.062    5.36     5.35      0.010    12.57     12.57     13.00      2.43
4           0.045    8.36     8.36      0.001    23.32     23.32     22.67      2.71
5           0.045    11.98    11.97     0.012    35.71     35.71     35.00      2.92
6           0.049    16.25    16.25     0.006    51.54     51.54     50.00      3.08
8           0.040    25.85    25.85     0.016    86.66     86.66     88.00      3.40
10          0.043    36.88    36.88     0.059    129.69    129.69    136.67     3.71
12          0.064    51.23    51.22     0.023    191.19    191.19    196.00     3.83
14          0.071    65.10    65.09     0.005    262.74    262.74    266.00     4.09
16          0.055    81.94    81.93     0.109    337.60    337.59    346.67     4.23
18          0.021    101.61   101.61    0.070    423.15    423.14    438.00     4.31
20          0.028    122.48   122.48    0.108    549.14    549.13    540.00     4.41
25          0.003    182.73   182.73    0.019    847.60    847.60    841.67     4.61
30          0.006    256.49   256.49    0.124    1221.03   1221.01   1210.00    4.72
35          0.068    340.53   340.53    0.105    1649.56   1649.54   1645.00    4.83
40          0.020    441.63   441.63    0.129    2103.78   2103.76   2146.67    4.86
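The footnote defining the efficiency column of Table 3 did not survive, but the printed values appear consistent with the ratio of the theoretical variance of $\hat{\theta}_V$, taken here as $(4\theta^2 + \theta)/3$, to the simulated MSE of $\hat{\theta}_F$. The Python sketch below checks that reading against the tabulated numbers; both the formula and the interpretation of the column are inferences from the data, and the variable names are illustrative.

```python
# Values copied from Table 3 (theta, simulated MSE of the homozygosity-based
# estimator, and the printed efficiency column).
thetas = [1, 2, 3, 4, 5, 6, 8, 10, 12, 14, 16, 18, 20, 25, 30, 35, 40]
mse_f = [1.16, 3.07, 5.36, 8.36, 11.98, 16.25, 25.85, 36.88, 51.23,
         65.10, 81.94, 101.61, 122.48, 182.73, 256.49, 340.53, 441.63]
printed_efficiency = [1.44, 1.96, 2.43, 2.71, 2.92, 3.08, 3.40, 3.71, 3.83,
                      4.09, 4.23, 4.31, 4.41, 4.61, 4.72, 4.83, 4.86]

def theoretical_var_theta_v(theta):
    """Assumed large-sample variance of the variance-based estimator."""
    return (4.0 * theta**2 + theta) / 3.0

# Efficiency read as Var(T) / MSE of the homozygosity-based estimator.
efficiency = [theoretical_var_theta_v(t) / m for t, m in zip(thetas, mse_f)]

for t, e, p in zip(thetas, efficiency, printed_efficiency):
    print(f"theta={t:>2}  computed={e:.2f}  printed={p:.2f}")
```

Under this reading the relative efficiency rises from about 1.4 at $\theta = 1$ to nearly 5 at $\theta = 40$, so the advantage of the homozygosity-based estimator grows with $\theta$.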
(9)
The performance of both estimators under this generalized stepwise mutation model was investigated through computer simulation. A total of 50,000 samples were simulated assuming the generalized model with $p = 0.67$. With this value,
$$E(|U|) = 1/p \approx 1.5.$$
That is, on average each mutation changes the allele size by 1.5 repeat units. For each simulated sample, the same procedure as before was used to obtain the two estimators, $\hat{\theta}_F$ and $\hat{\theta}_V$. The bias and MSE were also computed for each estimator. The corresponding theoretical values for the bias and MSE of $\hat{\theta}_V$ were also computed; the details are in the appendix. The simulated values agree well with the theoretical values. The results are shown in Table 6.
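To make the step-size distribution concrete, the following Python sketch computes the moments of a geometric step size and draws symmetric mutation steps. It is only an illustration of the distribution of $|U|$, not the simulation used for Table 6. The function names, the truncation at 500 terms, and the fixed seed are assumptions of the sketch.

```python
import math
import random

def geometric_abs_moment(p, k, terms=500):
    """k-th moment of |U| when P(|U| = x) = p * (1 - p)**(x - 1), x = 1, 2, ...

    The series is truncated at `terms`; the tail is negligible for p near 2/3.
    """
    return sum(x**k * p * (1.0 - p) ** (x - 1) for x in range(1, terms + 1))

def draw_step(rng, p):
    """Draw one symmetric step: geometric size with a random sign."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    size = int(math.log(u) / math.log(1.0 - p)) + 1  # inverse-transform sampling
    sign = 1 if rng.random() < 0.5 else -1
    return sign * size

rng = random.Random(0)
steps = [draw_step(rng, 0.67) for _ in range(20000)]
mean_abs_step = sum(abs(s) for s in steps) / len(steps)

print(geometric_abs_moment(0.67, 1), mean_abs_step)
print(geometric_abs_moment(2.0 / 3.0, 2), geometric_abs_moment(2.0 / 3.0, 4))
```

With $p$ taken as exactly 2/3, the second and fourth moments come out as 3 and 30, which are the values that make the theoretical bias and MSE columns of Table 6 whole numbers.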
Table 6 shows that under the generalized stepwise mutation model, both estimators are upwardly biased; that is, both on average overestimate the true value of $\theta$. The bias is an increasing function of $\theta$. $\hat{\theta}_F$ always has a smaller bias than $\hat{\theta}_V$, especially when $\theta$ is high. Comparison of the corresponding MSEs also shows that $\hat{\theta}_F$ has a smaller MSE than $\hat{\theta}_V$. These two points make $\hat{\theta}_F$ still preferable to $\hat{\theta}_V$ even when
TABLE 4

                  $\hat{\theta}_F$        $\hat{\theta}_L$
$\theta$   n      Mean     MSE        Mean     MSE
2          30     2.027    3.635      2.358    3.249
2          50     2.028    3.403      2.306    2.812
2          100    2.041    3.193      2.238    2.159
2          200    2.086    3.189      2.203    1.901
2          300    2.039    3.015      2.158    1.725
5          30     4.921    16.328     5.727    19.949
5          50     4.956    13.502     5.534    14.356
5          100    5.036    12.618     5.382    11.569
5          200    5.014    11.694     5.321    10.029
5          300    5.075    12.091     5.224    9.579
10         30     9.987    58.355     12.189   106.184
10         50     10.002   47.552     11.945   85.583
10         100    9.967    42.251     11.296   58.665
10         200    9.930    38.276     10.945   47.266
10         300    10.058   38.725     10.866   45.992
20         30     20.044   241.480    26.200   635.676
20         50     20.008   178.937    25.259   392.604
20         100    20.200   151.968    24.227   272.930
20         200    20.196   136.492    23.290   217.196
20         300    19.945   129.011    22.569   192.065

TABLE 5

                        $\hat{\theta}_F$      $\hat{\theta}_L$
MC replicates    n      Mean     MSE      Mean     MSE
10,000           30     10.12    60.28    12.59    129.90
10,000           50     9.93     48.83    11.93    96.72
10,000           100    10.07    42.87    11.55    81.39
10,000           200    10.03    39.50    11.08    73.05
10,000           300    10.03    37.65    11.01    70.18
100,000          30     9.99     58.36    12.19    106.18
100,000          50     10.00    47.55    11.95    85.58
100,000          100    9.97     42.25    11.30    58.67
100,000          200    9.93     38.28    10.95    47.27
100,000          300    10.06    38.73    10.87    45.99
1,000,000        30     9.95     61.30    11.90    102.60
1,000,000        50     10.07    48.44    11.59    76.51
1,000,000        100    9.98     40.65    11.08    55.96
1,000,000        200    10.10    40.56    10.86    47.88
1,000,000        300    9.98     36.89    10.56    39.73
Kimmel and Chakraborty (1996) showed that sample homozygosity at a microsatellite locus depends not only on $\theta$, but also on the pattern of allele size change caused by mutation. Therefore, any attempt to estimate $\theta$ on the basis of homozygosity has to be mutation model dependent. Interestingly, the regression formula we found on the basis of the single-step stepwise mutation model is reasonably robust against deviations from the single-step model. This is a useful property since it is very difficult to specify the model with confidence. On the other hand, if one has sufficient confidence in a particular model, a similar approach can be used to derive the regression formula under that model. This can be seen from our simulation study when the mutation
TABLE 6
Comparison of $\hat{\theta}_F$ and $\hat{\theta}_V$ under the generalized model

            $\hat{\theta}_F$          $\hat{\theta}_V$
$\theta$    Bias     MSE          Bias     Bias(T)^a   MSE          MSE(T)^b
1           0.39     2.30         1.99     2           25.72        26
2           0.96     7.42         3.97     4           84.52        84
3           1.63     15.46        5.90     6           170.98       174
4           2.33     26.38        7.75     8           288.51       296
5           3.17     40.12        9.86     10          453.10       450
6           4.05     58.32        11.97    12          641.03       636
8           5.88     105.93       15.75    16          1,086.61     1,104
10          7.88     168.99       19.72    20          1,660.19     1,700
12          9.95     250.23       23.73    24          2,387.10     2,424
14          12.00    341.60       27.22    28          3,047.05     3,276
16          14.41    468.47       31.88    32          4,396.28     4,256
18          16.56    601.07       35.07    36          5,181.32     5,364
20          19.01    770.27       39.00    40          6,440.48     6,600
25          25.16    1,276.12     49.09    50          10,362.25    10,250
30          31.66    1,921.98     59.05    60          13,896.41    14,700
35          38.43    2,756.88     67.99    70          18,798.79    19,950
40          45.13    3,706.23     77.91    80          25,021.12    26,000
TABLE 7
Comparison of estimates of ratio of mutation rates with $\hat{\theta}_F$ and $\hat{\theta}_V$

                            $\hat{\theta}_F$            $\hat{\theta}_V$
Population                  2/1      3/1     4/1      2/1     3/1     4/1
Biaka                       5.58     3.61    2.28     1.68    1.76    2.42
Mbuti                       5.48     8.96    4.40     4.23    5.66    10.55
Druze                       3.11     4.18    0.94     0.93    1.69    2.58
Danes                       5.61     3.30    1.48     0.61    0.84    0.56
Han                         5.19     1.03    0.72     0.66    0.67    0.88
Japanese                    10.16    3.86    1.19     0.60    0.24    1.48
Nasioi                      1.68     3.61    4.91     0.77    1.77    3.14
Yakut                       3.83     2.33    0.60     0.27    0.90    0.05
Yucatan                     2.86     3.18    0.23     0.91    1.51    1.43
Surui                       5.33     5.20    3.24     1.47    1.99    3.58
Mean                        4.88     3.93    2.00     1.21    1.70    2.67
SD of mean                  0.73     0.66    0.52     0.36    0.48    0.95
Variance                    5.34     4.34    2.74     1.30    2.26    8.96
Coefficient of variation    0.47     0.53    0.83     0.94    0.88    1.12
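The four summary rows of Table 7 can be recomputed from the population rows. The Python sketch below does so for the first column only, assuming the variance uses the $n - 1$ divisor and that "SD of mean" is the standard error $\sqrt{\text{variance}/n}$. These definitions are inferred from the printed numbers, and the names `summarize` and `theta_f_ratio_2_1` are illustrative.

```python
import math
import statistics

# First column of Table 7: the 2/1 ratio for the homozygosity-based
# estimator in the ten populations, Biaka through Surui.
theta_f_ratio_2_1 = [5.58, 5.48, 3.11, 5.61, 5.19, 10.16, 1.68, 3.83, 2.86, 5.33]

def summarize(values):
    """Mean, standard error of the mean, sample variance and CV of a column."""
    n = len(values)
    mean = statistics.mean(values)
    variance = statistics.variance(values)  # n - 1 divisor (assumed)
    return {
        "mean": mean,
        "sd_of_mean": math.sqrt(variance / n),
        "variance": variance,
        "cv": math.sqrt(variance) / mean,
    }

summary = summarize(theta_f_ratio_2_1)
for name, value in summary.items():
    print(f"{name:>10}: {value:.2f}")
```

Small differences in the last digit relative to the printed table are expected, because the table entries were presumably computed from unrounded ratios.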
A
E( ) 1 ,
n
(10)
V
,
E(U 02)
(11)
LITERATURE CITED
Chakraborty, R., and K. M. Weiss, 1991 Genetic variation of the mitochondrial DNA genome in American Indians is at mutation-drift equilibrium. Am. J. Anthropol. 86: 497–506.
Chakraborty, R., M. Kimmel, D. Stivers, L. Davison and R. Deka, 1997 Relative mutation rates at di-, tri-, and tetra-nucleotide microsatellite loci. Proc. Natl. Acad. Sci. USA 94: 1041–1046.
Cheung, K. H., M. V. Osier, J. R. Kidd, A. J. Pakstis, P. L. Miller et al., 2000 ALFRED: an allele frequency database for diverse populations and DNA polymorphisms. Nucleic Acids Res. 28: 361–363.
Deka, R., G. Sun, D. Smelser, Y. Zhong, M. Kimmel et al., 1999 Rate and directionality of mutations and effects of allele size constraints at anonymous, gene-associated and disease-causing trinucleotide loci. Mol. Biol. Evol. 16: 1166–1177.
Di Rienzo, A., A. C. Peterson, J. C. Garza, A. M. Valdes, M. Slatkin et al., 1994 Mutational process of simple-sequence repeat loci in human populations. Proc. Natl. Acad. Sci. USA 91: 3166–3170.
Fu, Y. X., and R. Chakraborty, 1998 Simultaneous estimation of all the parameters of a stepwise mutation model. Genetics 150: 487–497.
Jeffreys, A. J., K. Tamaki, A. MacLeod, D. G. Monckton, D. L. Neil et al., 1994 Complex gene conversion events in germline mutation at human minisatellites. Nat. Genet. 6: 136–145.
Kimmel, M., and R. Chakraborty, 1996 Measures of variation at DNA repeat loci under a general stepwise mutation model. Theor. Popul. Biol. 50: 345–367.
Kimmel, M., R. Chakraborty, D. Stivers and R. Deka, 1996 Dynamics of repeat polymorphisms under a forward-backward mutation model: within- and between-population variability at microsatellite loci. Genetics 143: 549–555.
Manly, B. F. J., 1997 Randomization, Bootstrap and Monte Carlo Methods in Biology. Chapman & Hall, New York.
Nielsen, R., 1997 A likelihood approach to population samples of microsatellite alleles. Genetics 146: 711–716.
Ohta, T., and M. Kimura, 1973 A model of mutation appropriate to estimate the number of electrophoretically detectable alleles in a finite population. Genet. Res. 22: 201–204.
Rubinsztein, D. C., W. Amos, J. Leggo, S. Goodburn, S. Jain et al., 1995 Microsatellite evolution: evidence for directionality and variation in rate between species. Nat. Genet. 10: 337–343.
Shriver, M. D., L. Jin, R. Chakraborty and E. Boerwinkle, 1993 VNTR allele frequency distributions under the stepwise mutation model: a computer simulation approach. Genetics 134: 983–993.
Tautz, D., 1993 Notes on the definition and nomenclature of tandemly repetitive DNA sequence, pp. 21–28 in DNA Fingerprinting: Current State of the Science, edited by S. D. J. Pena, R. Chakraborty, J. T. Epplen and A. J. Jeffreys. Birkhäuser Publishing, Basel, Switzerland.
Weber, J. L., and C. Wong, 1993 Mutation of human short tandem repeats. Hum. Mol. Genet. 2: 1123–1128.
Wehrhahn, C. F., 1975 The evolution of selectively similar electrophoretically detectable alleles in finite natural populations. Genetics 80: 375–394.
Zhivotovsky, L. A., and M. W. Feldman, 1995 Microsatellite variability and genetic distances. Proc. Natl. Acad. Sci. USA 92: 11549–11552.
Zouros, E., 1979 Mutation rates, population sizes and amounts of electrophoretic variation of enzyme loci in natural populations. Genetics 92: 623–646.
Communicating editor: J. B. Walsh
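APPENDIX

Given that
$$P(|U| = x) = p(1 - p)^{x - 1}, \quad \text{where } p = 0.67,$$
that is, $|U| \sim \mathrm{geometric}(0.67)$, since the mutation is symmetric, $U_0$ … $U$, we have
$$E(U_0^2) = \mathrm{Var}(U) + (E(U))^2,$$
$$E(\hat{\theta}_V) = 2E(V_s) = 2 \cdot \frac{\theta E(U_0^2)}{2} = \theta E(U_0^2), \qquad (A3)$$
$$\mathrm{Bias}(\hat{\theta}_V) = 3\theta - \theta = 2\theta. \qquad (A5)$$
To calculate the MSE of $\hat{\theta}_V$ we need to calculate the variance of the size variance $V_s$ first. From Equation 16 of Kimmel and Chakraborty (1996),
$$\mathrm{Var}(V_s) = \frac{1}{3}V^2 + \frac{1}{12}V\,\frac{E(U_0^4)}{E(U_0^2)}, \quad \text{with } V = \theta E(U_0^2).$$
Therefore,
$$\mathrm{MSE}(\hat{\theta}_V) = [\mathrm{Bias}(\hat{\theta}_V)]^2 + \mathrm{Var}(\hat{\theta}_V) = (2\theta)^2 + 12\theta^2 + 10\theta = 16\theta^2 + 10\theta. \qquad (A12)$$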
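The result in (A12) can be checked numerically. The Python sketch below assumes that the variance of the size variance has the form $\mathrm{Var}(V_s) = V^2/3 + (V/12)\,E(U^4)/E(U^2)$ with $V = \theta E(U^2)$, and that the step size is geometric with $p$ taken as exactly 2/3 (the text's 0.67 treated as a rounding). Both are assumptions of the sketch, as are the function names. Under them the bias is $2\theta$ and the MSE is $16\theta^2 + 10\theta$, matching the Bias(T) and MSE(T) columns of Table 6.

```python
def geometric_abs_moment(p, k, terms=500):
    """k-th moment of |U| for P(|U| = x) = p * (1 - p)**(x - 1), truncated series."""
    return sum(x**k * p * (1.0 - p) ** (x - 1) for x in range(1, terms + 1))

P_STEP = 2.0 / 3.0                       # assumed exact value behind "0.67"
M2 = geometric_abs_moment(P_STEP, 2)     # E(U^2), equals 3 for p = 2/3
M4 = geometric_abs_moment(P_STEP, 4)     # E(U^4), equals 30 for p = 2/3

def bias_theta_v(theta):
    """Bias of 2 * V_s when E(2 * V_s) = theta * E(U^2)."""
    return theta * M2 - theta

def var_theta_v(theta):
    """4 * Var(V_s), using the assumed form of Var(V_s) stated above."""
    v = theta * M2
    var_vs = v**2 / 3.0 + (v / 12.0) * (M4 / M2)
    return 4.0 * var_vs

def mse_theta_v(theta):
    """Mean square error as squared bias plus variance."""
    return bias_theta_v(theta) ** 2 + var_theta_v(theta)

for theta in (1, 2, 3, 10, 40):
    print(theta, round(bias_theta_v(theta), 6), round(mse_theta_v(theta), 6))
```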