
Methods Available for the Analysis of Data from Dominant Molecular Markers

Lisa Wallace
Department of Biology, University of South Dakota, 414 East Clark St, Vermillion, SD 57069
Email: lwallace@usd.edu
February, 2003

In the following descriptions, locus = band (these are observable on a gel); an allele is an estimated entity based on dominant data.

I. Descriptive statistics of levels of diversity

Descriptive population genetic statistics can be calculated from phenotypic data (i.e., band presence/absence) or genotypic data (i.e., allele frequencies). If you choose to calculate allele frequencies, you are assuming Hardy-Weinberg equilibrium in populations, an outcrossing mating system, and nearly random mating. If you have information from other sources on the mating system or the extent of random mating (e.g., from allozymes), it can be incorporated into your estimates of diversity based on dominant data.

Only two alleles are considered to exist at a dominant marker locus: the dominant (or present) allele and the null (or visually absent) allele. "Dominant" here does not imply that the presence of a band is dominant over its absence in a Mendelian sense. The presence of a band indicates either a heterozygote or a homozygote for the dominant allele, so only individuals without the band have a known genotype (null homozygotes). Allele frequencies are therefore calculated from the proportion of individuals without the band, which under Hardy-Weinberg equilibrium is expected to equal q_i^2. Where q_i represents the frequency of the null allele and p_i represents the frequency of the dominant allele at locus i,

$$q_i = \sqrt{\frac{\text{number of individuals in which the band was NOT present}}{\text{total number of individuals surveyed}}}, \qquad p_i = 1 - q_i$$

Several descriptive measures of diversity can be calculated, including:

1. Band frequencies (phenotypic data)

2. Allele frequencies (genotypic data)
3. Number of bands per primer, population, taxon, etc.
4. Number and frequencies of rare bands (it's up to you to determine and defend what constitutes rarity)
5. Percentage of polymorphic loci: based on the number of loci where the band was observed in some individuals and not in others. This too can be determined at various levels (e.g., population, taxon).
6. Gene diversity: often reported as Nei's heterozygosity or expected heterozygosity. Even when calculated from allele frequencies, this is still only an estimate of the expected heterozygosity, because the allele frequencies are themselves estimates. Thus, estimates of this statistic should not be compared directly to estimates of heterozygosity that are based on true allele frequencies from allozymes or other codominant markers. The following estimates of diversity are from Mariette et al. (2002).
   a. Phenotypic gene diversity: $H_p = 1 - P_i^2 - Q_i^2$, where $P_i$ and $Q_i$ are the frequencies of band presence and absence, respectively. Estimates of $H_p$ are calculated for each locus, and the mean over all loci is used as the overall estimate of diversity at whatever hierarchical level you are interested in quantifying.
   b. Genotypic gene diversity: $H_g = 1 - p_i^2 - q_i^2$, where $p_i$ and $q_i$ are the frequencies of the dominant and null alleles, respectively. Calculate it for each locus, and then take the mean over all loci, just as for phenotypic diversity above.
   c. I do not know of a program that will calculate Hp in this manner, but both POPGENE and TFPGA will calculate Hg. With POPGENE, you can specify whether you want to assume complete inbreeding, complete outcrossing, or something in between in the estimates of Hg. In TFPGA, though, you only have the option of assuming complete outcrossing (i.e., Hardy-Weinberg equilibrium) in populations. TFPGA will give three estimates of genotypic diversity or heterozygosity: a direct count, expected heterozygosity under HWE, and Nei's (1978) unbiased heterozygosity. The first two measures should be the same and are Hg. I don't recommend using Nei's estimate because I don't fully understand how the program calculates it. POPGENE calculates Nei's (1973) diversity, which should also be Hg. I calculated estimates of Hp by hand in a spreadsheet. You just need to do a lot of adding, multiplying, and subtracting.
7. Differences in the various estimates of diversity among taxa (with populations as the experimental units) can be tested using non-parametric tests such as Kruskal-Wallis, followed by Dunn's multiple comparisons if you find significant overall differences. See Zar (1996) or Sokal and Rohlf (1995) for formulas. You could also test for differences across populations using loci as the experimental units.

II. Comparative statistics of diversity

Genetic identities or distances are useful for getting an overall idea of how similar (or different) populations and taxa are. Like estimates of levels of diversity, genetic identities/distances can be calculated from phenotypic or genotypic data. For phenotypic data, the similarity coefficient of Nei and Li (1979; = Dice's coefficient) is a commonly used measure and can be calculated using NTSYS-pc. ARLEQUIN will calculate a raw estimate of the differences (i.e., the mean number of pairwise differences in bands within and between populations and taxa); its inter-taxon distances are corrected to account for the relative differences found within species. For genotypic data, any number of measures can be used, and the reader is referred to the manuals of TFPGA and POPGENE.

Once calculated, identities/distances can be used in multivariate analyses (e.g., principal coordinates analysis, PCO) and in tree-building algorithms (UPGMA or neighbor-joining). NTSYS-pc will perform multivariate analyses and build trees. For PCO, generate a Dice similarity matrix of the data, DCENTER the matrix, and use the double-centered matrix in EIGEN. For a tree, put the Dice similarity matrix into the NJOIN program or into SAHN (for UPGMA). POPGENE and TFPGA only do UPGMA.
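The Dice (Nei and Li 1979) similarity mentioned above is simple enough to compute directly, which can help when checking output from a program. The following is a minimal sketch, not NTSYS-pc's implementation: the function names and the three example band profiles are made up for illustration, and scoring a pair of profiles with no bands at all as 0.0 is an arbitrary convention chosen here.

```python
def dice_similarity(a, b):
    """Dice / Nei-Li similarity between two 0/1 band profiles: 2*shared / (bands_a + bands_b)."""
    if len(a) != len(b):
        raise ValueError("profiles must be scored for the same loci")
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    total = sum(a) + sum(b)
    # Assumption: a pair of profiles with no bands at all is scored as 0.0.
    return 2.0 * shared / total if total else 0.0


def dice_matrix(profiles):
    """Square similarity matrix for a list of band profiles."""
    n = len(profiles)
    return [[dice_similarity(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical data: rows are individuals (or populations), columns are bands (1 = present).
profiles = [
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 1],
    [0, 1, 1, 0, 0],
]
sim = dice_matrix(profiles)
for row in sim:
    print(["%.3f" % value for value in row])
```

Shared absences do not contribute to this coefficient, which is one reason it is often preferred for dominant markers, where band absence can arise in more than one way.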
I recommend using the NEIGHBOR algorithm in PHYLIP for a neighbor-joining analysis because you can then view the tree easily in TreeView. PAUP will also implement the neighbor-joining algorithm. Bootstrap support for trees can be determined in PAUP or with the RAPD programs developed by Bill Black. Use RAPDPLOT or RAPDDIST to generate a file of multiple pseudo-replicate distance matrices. Then move this file into the PHYLIP directory and use NEIGHBOR to generate a tree from each of the distance matrices. Rename the resulting treefile and outfile to something like treefile1 and outfile1. Input treefile1 into CONSENSE to generate a consensus of the trees produced by NEIGHBOR. The bootstrap values will be in the outfile, and the consensus tree can be viewed from the treefile in TreeView. These programs have limits on the number of populations and loci that can be used, so it might be easier to use PAUP to construct and bootstrap a tree.

Similarity/distance matrices can also be compared to matrices based on other sets of data using a Mantel test (e.g., to compare physical distance among populations with how genetically similar or dissimilar they are, or to compare taxonomic similarity based on molecular and morphological data). Mantel tests are most easily implemented using NTSYS-pc.
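To make the logic of a Mantel test concrete, here is a minimal permutation sketch. It is not the NTSYS-pc routine: it assumes a Pearson correlation between the upper triangles of the two matrices, a one-tailed test, and 999 permutations, and both matrices below are invented values for four hypothetical populations.

```python
import random
from math import sqrt


def upper_triangle(m):
    """Off-diagonal values above the diagonal of a square matrix, row by row."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel(m1, m2, permutations=999, seed=0):
    """Return (observed r, one-tailed p) by permuting rows and columns of m2 together."""
    rng = random.Random(seed)
    n = len(m1)
    x = upper_triangle(m1)
    observed = pearson(x, upper_triangle(m2))
    count = 1  # the observed arrangement counts as one permutation
    order = list(range(n))
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [[m2[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(x, upper_triangle(permuted)) >= observed - 1e-12:
            count += 1
    return observed, count / (permutations + 1)


# Hypothetical geographic (km) and genetic distance matrices for four populations.
geo = [[0, 10, 20, 30], [10, 0, 10, 20], [20, 10, 0, 10], [30, 20, 10, 0]]
gen = [[0, 0.10, 0.25, 0.40], [0.10, 0, 0.12, 0.30],
       [0.25, 0.12, 0, 0.15], [0.40, 0.30, 0.15, 0]]
r, p = mantel(geo, gen)
print("r = %.3f, p = %.3f" % (r, p))
```

Whole rows and columns are permuted together, rather than individual cells, because the entries of a distance matrix are not independent of one another. With only four populations there are just 24 possible orderings, so very small p-values are not attainable; real datasets need more populations for the test to have any power.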

III. Genetic structure

Genetic structure, or the degree of differentiation among populations, can be estimated using a variety of measures, including an analysis of molecular variance (AMOVA) and ratios of other diversity statistics. I recommend using AMOVA. AMOVA can be implemented using ARLEQUIN, and the help file that comes with the program is quite thorough in its explanation of how to carry out the analysis.

Should you wish to use ratios of estimates of diversity (e.g., Hp or Hg) to determine population differentiation, the following is a guide:

Amount of variation within populations = mean population diversity / total species diversity

Amount of variation among populations = [total species diversity - mean population diversity] / total species diversity = 1 - amount of variation within populations

If you are interested in genetic structure at more than two levels, just adjust the above to match the number of levels you have. For example, if you want to determine the amount of divergence among multiple regions as well as among populations, then the following would apply:

Amount of variation among groups = [total species diversity - mean regional diversity] / total species diversity
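The ratio approach can be sketched in a few lines, starting from raw band scores. This is an illustration under stated assumptions, not the procedure used by POPGENE or TFPGA: it assumes Hardy-Weinberg equilibrium, so the null allele frequency is taken as the square root of the band-absent fraction; it takes "total species diversity" to be the diversity of all individuals pooled, which is one possible reading; and the function names and the two small populations are hypothetical.

```python
from math import sqrt


def phenotypic_diversity(bands):
    """Hp = 1 - P^2 - Q^2 for one locus, from band presence (1) / absence (0)."""
    q_pheno = bands.count(0) / len(bands)
    p_pheno = 1.0 - q_pheno
    return 1.0 - p_pheno ** 2 - q_pheno ** 2


def genotypic_diversity(bands):
    """Hg = 1 - p^2 - q^2 for one locus, with q = sqrt(band-absent fraction) under HWE."""
    q = sqrt(bands.count(0) / len(bands))
    p = 1.0 - q
    return 1.0 - p ** 2 - q ** 2


def mean_diversity(individuals, measure):
    """Mean over loci; individuals is a list of 0/1 profiles (rows = individuals)."""
    loci = [list(locus) for locus in zip(*individuals)]
    return sum(measure(locus) for locus in loci) / len(loci)


def partition(populations, measure=genotypic_diversity):
    """Return (within, among) proportions of diversity for a list of populations."""
    pooled = [ind for pop in populations for ind in pop]
    total = mean_diversity(pooled, measure)  # assumption: total = pooled diversity
    if total == 0:
        raise ValueError("no polymorphism in the pooled sample")
    mean_pop = sum(mean_diversity(pop, measure) for pop in populations) / len(populations)
    within = mean_pop / total
    return within, 1.0 - within


# Hypothetical band scores: two populations, four individuals each, three loci.
pop_a = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 0]]
pop_b = [[0, 0, 1], [0, 1, 1], [0, 0, 1], [1, 0, 1]]
within, among = partition([pop_a, pop_b])
print("within = %.3f, among = %.3f" % (within, among))
```

Passing `measure=phenotypic_diversity` to `partition` gives the same partition based on Hp instead of Hg. For a third level such as regions, the same function could be applied to regional groupings of individuals in place of populations.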

A new program called HICKORY, developed by Kent Holsinger and Paul Lewis at the University of Connecticut, will calculate F-statistics, including FST and the inbreeding coefficient, f. Some of the analyses included in this software use Bayesian statistics to estimate population genetic parameters. See Holsinger et al. (2002) for more details about the analyses performed by HICKORY.

Programs and where to find them:

NTSYS-pc (F.J. Rohlf) $230 for the latest version 2.1 from Exeter Software, Setauket, NY. http://www.exetersoftware.com/cat/ntsyspc/ntsyspc.html
ARLEQUIN (S. Schneider, D. Roessli, L. Excoffier) Free at http://lgb.unige.ch/arlequin/
POPGENE (F. Yeh, R. Yang, T. Boyle) Free at http://www.ualberta.ca/~fyeh/
TFPGA (M. Miller) Free at http://bioweb.usu.edu/mpmbio/tfpga.htm
PHYLIP (J. Felsenstein) Free at http://evolution.genetics.washington.edu/phylip.html
RAPD programs (B. Black) Free at ftp://lamar.colostate.edu/pub/wcb4/
TreeView (R. Page) Free at http://taxonomy.zoology.gla.ac.uk/rod/treeview.html
HICKORY (K. Holsinger, P. Lewis) Free at http://darwin.eeb.uconn.edu/hickory/hickory.html

References

Holsinger, K. E., P. O. Lewis, and D. K. Dey. 2002. A Bayesian approach to inferring population structure from dominant markers. Molecular Ecology 11: 1157-1164.
Mariette, S., V. Le Corre, F. Austerlitz, and A. Kremer. 2002. Sampling within the genome for measuring within-population diversity: trade-offs between markers. Molecular Ecology 11: 1145-1156.
Nei, M. 1973. Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences, USA 70: 3321-3323.
Nei, M. 1978. Estimation of average heterozygosity and genetic distance from a small number of individuals. Genetics 89: 583-590.
Nei, M. and W. H. Li. 1979. Mathematical model for studying genetic variation in terms of restriction endonucleases. Proceedings of the National Academy of Sciences, USA 76: 5269-5273.
Sokal, R. R. and F. J. Rohlf. 1995. Biometry. Freeman, NY.
Zar, J. H. 1996. Biostatistical Analysis. Prentice Hall, Upper Saddle River, NJ.
