
Sub-Saharan admixture in West Eurasian groups

- Moorjani et al. 2011 Critique (by Dienekes Pontikos)


Let me preface this by saying that I don't doubt that there exists some
Sub-Saharan admixture in some West Eurasian (Caucasoid) groups, and I've
quantified the different types of African admixture that can be found in many
such groups, most recently here.

However, there are serious methodological flaws in a new paper by Moorjani et
al. which render its estimates unreliable. This is unfortunate, as the authors
assembled an important dataset, but they only consider a very simplistic model
of 2-population admixture which is completely inappropriate for the problem
they are studying.

Caucasoids on the Chinese-San axis of variation

Moorjani et al. motivate their study by projecting various West Eurasian
groups from Europe and the Near East onto the first principal component
of variation defined by CHB (Chinese) and San (Bushmen). The reasoning is
the following:
"To study the signal of African gene flow into West Eurasian
populations, we began by computing principal components (PCs)
using San Bushmen (HGDP-CEPH-San) and East Eurasians
(HapMap3 Han Chinese-CHB), and plotted the mean values of the
samples from each West Eurasian population onto the first PC, a
procedure called PCA projection [17,18]. The choice of San and
CHB, which are both diverged from the West Eurasian ancestral
populations [19,20], ensures that the patterns in PCA are not
affected by genetic drift in West Eurasians that has occurred since
their common divergence from East Eurasians and South Africans."

This is indeed a good idea: if some Caucasoid group A has a common
ancestral element with Sub-Saharans that is lacking in another Caucasoid
group B, then A is expected to be shifted towards the San side of the first
PC relative to B. Indeed, this is what the authors observe:
"We observe that many Levantine, Southern European and Jewish
populations are shifted towards San compared to Northern
Europeans, consistent with African mixture, and motivating formal
testing for the presence of African ancestry (Figure 1, Figure S2)."

However, this is clearly a case of seeing the glass half full. The authors prefer
the hypothesis that some Caucasoid groups have African ancestry, although
the hypothesis that other Caucasoid groups have East Asian ancestry can
equally well explain the observed pattern. Indeed, both hypotheses may
explain the phenomenon they observe.

For example, African ancestry in Palestinians has been well-documented, so
Palestinians are expected to be San-shifted relative to northern Europeans. On
the other hand, East Eurasian ancestry has also been well-documented in HGDP
Russians, so we expect them to be CHB-shifted relative to southern Europeans.

Things are not that clear for other Caucasoid populations, e.g., southern
Europeans or northwestern Europeans. The authors assume that the different
position of these two groups on the San-Chinese axis is due only to
Sub-Saharan admixture in southern Europeans. This implicit assumption is
the Achilles' heel of the paper.

Tests of population admixture

Because of genetic drift, two populations that diverged from a common ancestor
will have different allele frequencies. However, imagine if we looked at these
allele differences and saw that a population A not only had different frequencies
than B, but also the difference in frequencies tended to be in the direction of a
Sub-Saharan population. For example, at some locus f(A)=0.4, f(B)=0.3, and
f(Sub-Saharan)=0.1. You can see that B's frequency deviates from A's in the
direction of Sub-Saharans. This may occur due to random drift for one particular
marker, but if it occurs systematically across the genome, then admixture is a
likely explanation. This is the basis of the 3-population test used by the authors.
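
As a rough illustration of the idea, here is a minimal sketch of a 3-population statistic in the form commonly written as f3(C; A, B): the average over markers of (c - a)(c - b). The function name and the plain-list input are assumptions made for this example; the published test also involves normalization and standard errors (hence the Z-scores), which are left out here.

```python
def f3(target, ref1, ref2):
    """Average of (target - ref1) * (target - ref2) across markers.

    Each argument is a list of allele frequencies, one entry per marker.
    A clearly negative value means the target's frequencies tend to lie
    between those of the two references, as expected under admixture.
    """
    if not (len(target) == len(ref1) == len(ref2)):
        raise ValueError("all populations need the same number of markers")
    products = [(c - a) * (c - b) for c, a, b in zip(target, ref1, ref2)]
    return sum(products) / len(products)


# The single-locus example from the text: f(A)=0.4, f(B)=0.3, f(Sub-Saharan)=0.1.
# B is the candidate admixed population, A and Sub-Saharans are the references.
value = f3(target=[0.3], ref1=[0.4], ref2=[0.1])
print(value < 0)  # the product (0.3 - 0.4) * (0.3 - 0.1) is negative
```

One marker proves nothing, of course; the test only becomes meaningful when the average stays negative across a very large number of markers.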
Another idea is to see whether frequency differences between A and B are
correlated with frequency differences between Sub-Saharans and another
Eurasian population unrelated to either A or B. Differences between Caucasoids
and Sub-Saharans are (in part) due to divergence between Sub-Saharans and
ancestral Eurasians. Suppose, for example, that we've identified a group (e.g.,
Papuans) unlikely to have admixed with Caucasoids. If B differs from A (over
many markers) in the same direction that Sub-Saharans differ from Papuans, this
is consistent with the notion that B has some Sub-Saharan admixture that A
lacks. This is the basis of the 4-population test.
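
The correlation described above can be sketched as a 4-population statistic of the usual form f4(A, B; C, D): the average over markers of (a - b)(c - d). The frequencies below are made-up toy values, and the function name is an assumption for this example; the point is only that swapping the last two populations flips the sign, so the same statistic can read as a shift toward either of them.

```python
def f4(pop_a, pop_b, pop_c, pop_d):
    """Average of (a - b) * (c - d) across markers.

    Values near zero mean the A/B differences are unrelated to the C/D
    differences; a consistent sign means the two pairs co-vary.
    """
    lengths = {len(pop_a), len(pop_b), len(pop_c), len(pop_d)}
    if len(lengths) != 1:
        raise ValueError("all populations need the same number of markers")
    products = [
        (a - b) * (c - d)
        for a, b, c, d in zip(pop_a, pop_b, pop_c, pop_d)
    ]
    return sum(products) / len(products)


# Hypothetical allele frequencies at three markers (illustration only).
x = [0.30, 0.55, 0.20]
ceu = [0.40, 0.50, 0.10]
papuan = [0.45, 0.40, 0.05]
yri = [0.10, 0.70, 0.60]

forward = f4(x, ceu, papuan, yri)  # negative for these toy values
swapped = f4(x, ceu, yri, papuan)  # same magnitude, opposite sign
print(forward < 0, swapped > 0)
```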

Note that because of symmetry, a highly negative value in their 4-population test
(x, CEU, Papuan, YRI) indicates Sub-Saharan admixture, while a highly
positive one would indicate "Papuan" admixture! The authors do observe
positive values, suggesting that some northern European populations are
Papuan-shifted even with respect to CEU, most notably Russia with a Z-score of
11.4. Thankfully, we are spared a paper on Papuan admixture in Russia.

Comparison to the Indian Cline work

These tests are an important statistical tool, and many of this paper's authors
have used them before to study the Indian Cline of populations. However, the
current paper has two important shortcomings in comparison to Reich et al.
(2009).

In their study of the Indian Cline, Reich et al. (2009) excluded groups that
were shifted towards CHB, thus ensuring that they were left with groups that
could be modeled as a simple mix of two ancestral population elements.

Moreover, they used the Onge, a relatively isolated population from the Indian
Ocean, as a control group that could be said to form a clade with Ancestral
South Indians to the exclusion of West Eurasians. In the current paper it is
simply assumed that northern Europeans have no African admixture.
"Application of the test to each West Eurasian population (using A =
YRI and B = CEU) finds little or no evidence of mixture in North
Europeans but highly significant evidence in many Southern
European, Levantine and Jewish groups (Table 1)."

In other words: taking CEU (a northern European population) as the
standard, northern Europeans have no evidence of African admixture.

Sardinians: an important test case

Sardinians are an important test case for the authors' model. Their 3-population
test shows no evidence of admixture, while the 4-population test does. Moreover,
their STRUCTURE analysis shows a trivial 0.2%, whereas the authors estimate
their Sub-Saharan admixture as 2.9%.

Let's begin by performing a PCA analysis of Sardinians, CHB, and CEU, which
is shown below.

(All PCA analyses are done in smartpca as implemented in EIGENSOFT 4.0
beta, with numoutlieriter set to 0. All analyses are performed over datasets
merged in PLINK with the --geno 0.001 flag, which effectively keeps only
common markers and ensures a high quality dataset.)
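
To see why a strict missingness threshold on a merged dataset keeps only the markers shared by all sources, consider this small sketch. It is not PLINK itself, only a pure-Python imitation of the filtering rule (drop a marker when its fraction of missing calls exceeds the threshold); the function names, the dictionary layout, and the marker IDs are all hypothetical.

```python
def missing_rate(calls):
    """Fraction of genotype calls that are missing (represented as None)."""
    return sum(1 for g in calls if g is None) / len(calls)


def filter_by_missingness(genotypes, max_missing=0.001):
    """Keep markers whose missing-call rate does not exceed max_missing.

    genotypes maps a marker ID to a list of calls, one per sample,
    where each call is 0, 1, 2, or None when missing.
    """
    return {
        marker: calls
        for marker, calls in genotypes.items()
        if missing_rate(calls) <= max_missing
    }


# After merging two datasets, a marker typed in only one of them is
# missing for every sample of the other, so it fails the threshold.
merged = {
    "marker_in_both": [0, 1, 2, 1],
    "marker_in_first_only": [0, 1, None, None],
}
kept = filter_by_missingness(merged)
print(sorted(kept))
```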

CEU is shifted towards CHB relative to Sardinian. This is made more visually
obvious if we blow up the CEU/CHB portion of the above plot:

CEU is shifted towards CHB by 2.4% relative to Sardinians. This is quite close
to the 2.5% East/South Asian K=3 admixture for Britons in my most recent
analysis, done with a different East Asian reference and a different method
(ADMIXTURE); the CEU sample of White Utahns has been repeatedly shown
to be most similar to people from the British Isles or Northwestern Europe.
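
The post does not spell out how the percentage shift is read off the plot. One plausible reading, assumed here, is the displacement of one population's mean along the first eigenvector, expressed as a fraction of the distance between the baseline population and the reference population. The scores below are invented toy numbers, not values from the actual analysis.

```python
def mean(values):
    return sum(values) / len(values)


def relative_shift(test_scores, baseline_scores, reference_scores):
    """Shift of the test group's mean away from the baseline group's mean,
    as a fraction of the baseline-to-reference distance on one PC."""
    base = mean(baseline_scores)
    return (mean(test_scores) - base) / (mean(reference_scores) - base)


# Hypothetical PC1 scores for three groups (illustration only).
baseline_pc1 = [0.00, 0.02, -0.02]   # e.g. Sardinians
test_pc1 = [0.02, 0.03, 0.01]        # e.g. CEU
reference_pc1 = [1.00, 0.98, 1.02]   # e.g. CHB

shift = relative_shift(test_pc1, baseline_pc1, reference_pc1)
print(f"{100 * shift:.1f}% of the way toward the reference")
```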

Now, let's look at Sardinians, CHB, and YRI:

and a blowup:

Sardinians are shifted 1.1% relative to CEU towards YRI. Again, this is close to
the 0.9% K=3 Sub-Saharan ADMIXTURE result I recently obtained.

So, where does the 2.9% Sub-Saharan admixture in Sardinians come from?
Moorjani et al. estimate this percentage under the assumption that Northern
Europeans are not shifted towards Chinese, i.e., that East Eurasians are
irrelevant. Clearly, as we have seen, this is wrong. As we shall see, this
erroneous assumption leads to the erroneous admixture estimate.

2.9% Sub-Saharan admixture in Sardinians (?)

Now, I will demonstrate how the spurious 2.9% result can be obtained. Doing
so makes it obvious that Moorjani et al. arrived at this figure because their
analysis ignores the East Asian shift of their northern European sample.

Here is a PCA plot of Sardinians, CEU, CHB, YRI:

and the blowup:

When we run all four populations together, Sardinians are shifted towards YRI
along Dimension 1, and CEU are shifted towards CHB along Dimension 2.
Given that the eigenvalue for PC1 is approximately twice (50.15) that for PC2
(25.31), and doing a little high school geometry on the triangle (Sardinian, CEU,
YRI), we project Sardinian onto the CEU-YRI line, intersecting at point X. We
thus obtain the estimated "CEU" admixture as:

[distance(YRI,X) - distance(X,CEU)] / [distance(YRI,X) + distance(X,CEU)]
= [distance(YRI,Sardinian)^2 - distance(CEU,Sardinian)^2] / distance(CEU,YRI)^2

which equals 0.971021, and so, "YRI" admixture is 2.9%!
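
The geometric identity used above can be checked numerically. The sketch below drops a perpendicular from a point onto the line through two others and confirms that the ratio computed from the foot X matches the one computed from the squared side lengths. The coordinates are hypothetical round numbers chosen for easy arithmetic, not the actual PCA coordinates, and the equality assumes X falls between CEU and YRI.

```python
import math


def dist(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def foot_of_perpendicular(point, line_start, line_end):
    """Orthogonal projection of point onto the line through the two ends."""
    dx, dy = line_end[0] - line_start[0], line_end[1] - line_start[1]
    t = ((point[0] - line_start[0]) * dx + (point[1] - line_start[1]) * dy) / (
        dx * dx + dy * dy
    )
    return (line_start[0] + t * dx, line_start[1] + t * dy)


def ratio_from_foot(sardinian, ceu, yri):
    x = foot_of_perpendicular(sardinian, ceu, yri)
    to_yri, to_ceu = dist(yri, x), dist(x, ceu)
    return (to_yri - to_ceu) / (to_yri + to_ceu)


def ratio_from_sides(sardinian, ceu, yri):
    return (dist(yri, sardinian) ** 2 - dist(ceu, sardinian) ** 2) / dist(
        ceu, yri
    ) ** 2


# Hypothetical coordinates on (PC1, PC2), for illustration only.
ceu, yri, sardinian = (0.0, 0.0), (10.0, 0.0), (0.5, 1.0)
left = ratio_from_foot(sardinian, ceu, yri)
right = ratio_from_sides(sardinian, ceu, yri)
print(math.isclose(left, right))
```

The second form is the convenient one in practice, since it needs only the three pairwise distances and never the point X itself.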

Ashkenazi Jews

The example of the Sardinians showed how failing to control for East Eurasian
shift tends to overestimate the degree of Sub-Saharan admixture. Another test
case is that of Ashkenazi Jews. The authors find no evidence of admixture with
their 3-population test, but do find such evidence with their 4-population test, as
well as with STRUCTURE.

On a PCA plot of CHB, Ashkenazi (Behar et al. 2011), and CEU, the Ashkenazi
are shifted 3.3% towards CHB along eigenvector 1.

On a PCA plot of YRI, CEU, and Ashkenazi, the Ashkenazi are shifted by 5.3%
towards YRI.

In the case of the Sardinians, their African-shift together with CEU's Asian-shift
caused Sardinians/CEU to diverge on the African-Asian axis, and Moorjani et al.
took the entirety of this divergence to represent African admixture in Sardinians.

In this case Ashkenazi are both Asian- and African-shifted relative to CEU. The
two shifts partially cancel each other out: Ashkenazi are pulled towards Africans
on the YRI-CHB axis because of their YRI-shift, and away from them because
of their CHB-shift. Failing to account for these processes, the authors assume
that only Sub-Saharan admixture in Ashkenazi can account for the different
position of CEU and Ashkenazi on the Asian-African axis, coming up with a
2.8-3.2% "Sub-Saharan admixture" in two different samples.

And, here is a second way of seeing how this spurious admixture estimate
follows from the phenomenon I am describing. In terms of Fst, CEU are 0.76
times as distant from CHB as they are from YRI (Fst = 0.129 and 0.17,
respectively). In other words, Sub-Saharan admixture is more "potent" at
shifting a population than East Eurasian ancestry is. Ashkenazi are
YRI-shifted by 5.3%, and they are CHB-shifted by 3.3%. Multiplying the latter
by 0.76 we obtain: 5.3 - 0.76*3.3 = 2.8%!
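
The arithmetic can be reproduced in a few lines. The only assumption is the assignment of the two Fst values, taking 0.129 as CEU-CHB and 0.17 as CEU-YRI, which is the assignment that yields the stated ratio of 0.76; the variable names are made up for this example.

```python
# Fst values quoted in the text (assignment to population pairs is inferred
# from the stated ratio of 0.76).
fst_ceu_chb = 0.129
fst_ceu_yri = 0.17

# Relative "potency" of an East Eurasian shift versus a Sub-Saharan one.
potency_ratio = fst_ceu_chb / fst_ceu_yri

# Ashkenazi shifts relative to CEU, in percent, from the two PCA plots.
yri_shift = 5.3
chb_shift = 3.3

# Net displacement on the Asian-African axis, read as "African admixture".
net_shift = yri_shift - potency_ratio * chb_shift
print(round(potency_ratio, 2), round(net_shift, 1))
```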

In other words, the 2.8% Sub-Saharan admixture in Ashkenazi Jews is a
compromise between two different phenomena in a tug-of-war. It is not an
accurate estimate of admixture.

Papuans

I have also carried out an experiment with Sardinians, Ashkenazi Jews, CEU, and
Papuans, instead of CHB, as Papuans are also used in the paper as an outgroup
population.

and the blowup:

It is clear that the populations show differential shift towards Papuans that is
concordant with their above-described shift towards the Chinese.

Luhya and Bilala

Failure to correct for differential shift towards Chinese/Papuans is problem
enough, but the paper also fails to properly take into account non-West African
populations. North African groups are conspicuous in their absence, while the
HapMap3 Luhya (LWK) and a Bilala sample are used to represent East Africa.

Henn et al. (2011) contains Tuscan, Yoruba, Maasai, Bulala samples, so I ran
the Tuscans as test data in a supervised ADMIXTURE 1.1 analysis together
with these African groups, HGDP-CEPH North_Italian, and HapMap3 CEU.
That is, I'm playing along, for the sake of argument, with the idea that East
Eurasians are irrelevant, and Tuscans can be seen as a mixture of CEU
"Europeans" and African groups.

The results are unambiguous: Tuscans/North Italians are found to be 2.1%/1.2%
"Maasai" and 0% of all the other African groups. In other words, whatever
element there is in common between Tuscans and Africans is not
particularly West African.

The inclusion in the paper of HapMap3 Luhya Bantu but not of HapMap3
Maasai is puzzling, and the choice of one group over the other is passed
over in silence.

In my own experiments, I distinguish between North, Sub-Saharan, and East
African ancestral components.

Beyond a binary worldview

Much more can be said, but let's summarize: the model of Moorjani et al. (2011)
fails because:
1. It does not account for the West-East Eurasian axis, folding everything
   onto the North European-Sub-Saharan African one
2. It undersamples African diversity by excluding both North African and
   East African populations

