Specifying the African origins of the African American genome

(Zakharia et al., 2009) (Fig. 1)

Characterizing the admixed African ancestry of African Americans (Zakharia et al., 2009)

***

(Zakharia et al., 2009) (Table 1)

Characterizing the admixed African ancestry of African Americans (Zakharia et al., 2009)

____________________

“We found that all the African Americans are admixed in the African component of their ancestry, with estimated contributions of 19% West (for example, Mandenka), 63% West Central (for example, Yoruba), and 14% South West Central or Eastern (for example, Bantu speakers), with little variation among individuals.” (Zakharia et al., 2009, p.8)

“These results are consistent with historic mating patterns among African Americans that are largely uncorrelated to African ancestral origins, and they cast doubt on the general utility of mtDNA or Y-chromosome markers alone to delineate the full African ancestry of African Americans.” (Zakharia et al., 2009, p.1)

____________________

***

In this blog post I’ll continue revisiting DNA studies that were published a while ago but can still prove useful. They are especially useful when trying to learn more about the African origins of African Americans at an approximate regional level, and also for anyone curious about intra-African genetic variety. Expectations of a fine-structured ethnic breakdown should be lowered, though, despite the somewhat misleading labelling of the ancestral components identified. Unlike the haplogroup studies reviewed previously, these studies examine the whole genome of their sample groups and therefore provide an autosomal analysis, which should potentially be more informative. These papers can also help if you want to learn more about the science behind your personal DNA admixture results from 23andme, AncestryDNA, Family Tree DNA etc., or from Dr. McDonald and the various calculators on GEDMatch.

***


The first study discussed goes into the greatest detail describing the within-Africa ancestry of African Americans. The others are aimed more at the genetics of African populations, with the African American genome coming up only as an afterthought. They are still very valuable in themselves for gaining a proper understanding of how DNA researchers determine African ancestral components or genetic clusters. These studies are all fairly technical, so it’s advisable to at least familiarize yourself with a couple of recurring concepts and the basics of their methodology. For a good introduction first read the following:

If you’re only interested in the outcomes, just be aware of the following:

  • Ancestral components are often named after the ethnic groups that were sampled and used in the analysis. These potentially misleading labels are NOT per se an indication of actual descent but rather of genetic affinity with the samples. Taking the labels literally is a common “newbie” mistake.
  • The configuration or tweaking of the dataset can produce vastly different results (just try the GEDMatch calculators for yourself 😉).
  • The availability of crucial African reference populations or sample groups is inherently limited and is bound to skew the results to some degree.
  • This is a young field of science that is still developing.
  • Research results are therefore not to be taken as final, at least not when it comes to the percentages. The overall analysis can still be generally correct, but that can only be judged after a careful reading and pending new research insights.

***

Characterizing the admixed African ancestry of African Americans

(Zakharia et al., 2009)
Link to online article

____________________

“In this study, we characterize the African origins of African Americans by making use of the high-density genotype data generated for 94 HGDP indigenous Africans from differing geographic and linguistic groups, including 21 Mandenka from West Africa, 21 Yoruba from West Central Africa, 15 Bantu speakers from Southwestern and Eastern Africa, 20 Biaka Pygmy and 12 Mbuti Pygmy from Central Africa, and five San from Southern Africa [18]. These subjects are used to represent the potential African ancestors of 136 African Americans recently genotyped in a GWA study of early-onset coronary artery disease” (Zakharia et al., 2009, p.2)

“Our results are based on examination of the entire autosomal genome and, therefore, provide a more-robust picture of the admixed African ancestry of individual African Americans compared with prior analyses, which focused on only a single locus (mtDNA or Y chromosome).” (Zakharia et al., 2009, p.8)

“To our knowledge, the analyses reported here represent the first effort to characterize the African origin of African Americans by isolating the African-derived genome in each African American individual.” (Zakharia et al., 2009, p.2)

____________________

This study was pretty much pioneering when it came out in 2009. The available sampling was quite limited, though, as the authors point out themselves. For example, when it comes to the US state origins of the 136 African American samples, it’s clear they were overwhelmingly from the West. The relative lack of African samples is also acknowledged. However, the authors caution that adding more African reference populations might not really have improved their analysis, because they found a great degree of genetic overlap between their African ethnic sample groups. The other studies develop this argument in even more detail, so I will return to this theme later on.

Keeping its limitations in mind, the interesting part for me was how consistent the regional breakdown within Africa seems to have been across their African American sample group. This implies that the intermixing of their relocated African ancestors from different ethnic backgrounds has been quite extensive and seemingly random. The result is an evenly divided gene pool without any real outliers, meaning no individuals who scored more “Bantu” (light blue) or “Mandenka” (orange) than “Yoruba” (red). This stands in some contrast with what I personally found for AncestryDNA results of African Americans, which show more variation (see this spreadsheet), although the various regions/ancestral components might not be perfect equivalents.
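Why would extensive, seemingly random intermixing leave so little individual variation? A toy simulation shows the mechanism. This is a minimal sketch built on simplifying assumptions of my own, not the model used by Zakharia et al. (2009). It assumes a closed population, random pairing, no new African-born arrivals, and each child inheriting exactly the average of its parents' regional proportions. The founders and the function name `simulate_random_mating` are hypothetical.

```python
import numpy as np

def simulate_random_mating(initial, generations, seed=0):
    """Toy model: each generation, individuals are paired at random and each
    child inherits the mean of its two parents' ancestry proportions.
    `initial` is an (n_individuals, n_sources) array whose rows sum to 1.
    Returns the final population and the per-source standard deviation
    (spread between individuals) for every generation."""
    rng = np.random.default_rng(seed)
    pop = np.asarray(initial, dtype=float)
    spreads = [pop.std(axis=0)]
    for _ in range(generations):
        # Two random orderings of the population form the parent pairs;
        # each pair has one child, so the population size stays constant.
        mothers = pop[rng.permutation(len(pop))]
        fathers = pop[rng.permutation(len(pop))]
        pop = (mothers + fathers) / 2.0
        spreads.append(pop.std(axis=0))
    return pop, np.array(spreads)

# Hypothetical founders: 1000 people, each fully from one of three regions,
# drawn roughly in proportion to the 19/63/14 split quoted above (renormalized).
rng = np.random.default_rng(1)
labels = rng.choice(3, size=1000, p=np.array([19, 63, 14]) / 96)
founders = np.eye(3)[labels]

final, spreads = simulate_random_mating(founders, generations=10)
print(spreads[0].round(3))          # large spread: everyone is 0% or 100%
print(spreads[-1].round(3))         # much smaller spread after 10 generations
print(final.mean(axis=0).round(3))  # the population average is unchanged
```

In this model the population-wide average never changes, because every parent is used exactly once per generation. Only the differences between individuals shrink, roughly halving in variance each generation. That matches the pattern of a consistent breakdown with no real outliers.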

It’s apparent that the authors at times use the so-called “Mandenka”, “Yoruba” and “Bantu” labels somewhat carelessly, and they should not be taken literally. These clusters are better considered as proxies for Upper Guinean, Lower Guinean and Central African ancestry (as mentioned in the very first quote above). Taken that way, their three-way breakdown is not that different from the slave trade records. Those records also indicate that Lower Guinean origins (between Liberia and Cameroon) would be predominant for African Americans, even though Upper Guinean and Central African origins are of course also present. So that seems like a pretty solid finding actually.

***

(Zakharia et al., 2009) (Fig. 4)

(Zakharia et al., 2009)

***

The greater affinity with the Yoruba samples is also clearly visible in the figure below. A couple of red Yoruba samples are even located within the purple African American cluster! Had Igbo samples been used instead of Yoruba samples, you would undoubtedly see much the same pattern. It would probably be even more pronounced, given the predominance of Bight of Biafra origins over Bight of Benin origins according to the Slave Voyages Database.

(Zakharia et al., 2009) (Fig. 3)

(Zakharia et al., 2009)

***

Another interesting finding is that the Bantu samples used in this study are genetically quite close to the Yoruba samples, a finding also confirmed by the other studies. It’s good to keep this in mind when interpreting the so-called “Cameroon/Congo” region in AncestryDNA’s Ethnicity Estimates. The same applies to any upcoming new (sub)regional resolution for Africa on 23andme. Judging from the graph below, the Mandenka samples seem the most distinct, as is also the case for the “Senegal” region on Ancestry.com (see AncestryDNA regions). The number of samples is very small, though.

____________________

“The Bantu appear to have closest ancestry to the Yoruba. This is consistent with the Nigerian origins of the Yoruba and the presumed origins of the Bantu from the southwestern modern boundary of Nigeria and Cameroon [24], and the subsequent migration of the Bantu east and south.” (Zakharia et al., 2009, p.4)

____________________

(Zakharia et al., 2009) (Supplement)

(Zakharia et al., 2009)

***

Genome-wide patterns of population structure and admixture in West Africans and African Americans

(Bryc et al., 2010)
Link to online article

***

Bryc et al. (2009), Figure 1E,F

Genome-wide patterns of population structure and admixture in West Africans and African Americans (Bryc et al., 2010)

____________________

“To obtain a fine-scale genome-wide perspective of ancestry, we analyze Affymetrix GeneChip 500K genotype data from African Americans (n = 365) and individuals with ancestry from West Africa (n = 203 from 12 populations)” (Bryc et al., 2010, p.1)

“Finally, patterns of genetic similarity among inferred African segments of African-American genomes and genomes of contemporary African populations included in this study suggest African ancestry is most similar to non-Bantu Niger-Kordofanian-speaking populations, consistent with historical documents of the African Diaspora and trans-Atlantic slave trade.” (Bryc et al., 2010, p.1)

____________________

This study is quite similar to the previous one but describes its outcomes in less detail. The number of African American samples has increased to 365, apparently “from throughout the United States”. The African sample groups are also more numerous, but this time no Mandenka or other Upper Guinean samples were included! The westernmost group sampled is the Brong from Ghana. So despite its greater number of African samples, this study may actually be inherently less equipped to describe the full regional origins of African Americans than Zakharia et al. (2009). That study used a smaller dataset, but one ranging more widely and in line with the full regional scope of African American origins. The visual presentation therefore also lacks detail. If you look closely at the figure above, you’ll notice that the African part of the African Americans is depicted only in light blue and purple. The study does offer more insight when dealing strictly with its African sample groups, though.

Despite their somewhat limited analysis of African Americans, the authors make the interesting observation quoted below. It shows that in DNA research, sample groups are often used as proxies because of limited availability, and they do not necessarily represent the best historical “fit”. People taking personal DNA admixture tests do not always realize this, especially on GEDMatch and with the results given by Dr. McDonald. People not familiar with the science behind DNA testing tend to take the ethnic labels literally and thereby run a great risk of misidentifying their actual ancestors. After all, had other proxies been chosen, their ancestry could very well have been described in terms of different ethnic groups! All of this is aside from the mathematical unlikelihood that any Afro-descendant would actually have all of their dozens or hundreds of African-born ancestors belonging to one single ethnic group (see Fictional Family Tree incl. African Born Ancestors).
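The arithmetic behind that last point is easy to sketch. Going back g generations gives up to 2^g ancestor slots. If each slot were filled independently from several ethnic groups, the chance that every single one came from the same group shrinks extremely fast. The independence assumption and the group shares below are purely hypothetical simplifications. Real ancestors were not independent draws, and pedigree collapse reduces the number of distinct people. Treat the output as an order-of-magnitude illustration only; both function names are mine.

```python
def ancestors_in_generation(g):
    """Number of genealogical ancestor slots g generations back (2**g),
    ignoring pedigree collapse (the same person appearing more than once)."""
    return 2 ** g

def prob_all_from_one_group(shares, n):
    """Probability that n ancestors all belong to the same group, ASSUMING
    (unrealistically) that each ancestor's group is an independent draw
    with the given shares. Shares are normalized to sum to 1."""
    total = sum(shares)
    return sum((s / total) ** n for s in shares)

n = ancestors_in_generation(5)   # 5 generations back
print(n)  # 32
# Hypothetical: ten equally common groups.
print(prob_all_from_one_group([1] * 10, n))        # on the order of 1e-31
# Hypothetical: one group supplies half of all ancestors, nine others the rest.
print(prob_all_from_one_group([45] + [5] * 9, n))  # roughly 2.3e-10
```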

____________________

“A concern in estimating admixture is the effect of choice of ancestral populations. Often, the true ancestral population is no longer available for sampling; thus, using a proxy may introduce bias when evaluating the admixed population”  […]

“Some studies estimating admixture proportions in African Americans have used a single ancestral African population, the Yoruba (39), and our data provide an effective means of testing whether other populations may serve as better proxies for the ancestral population of African Americans and whether using the Yoruba biases inferences.” […]

“That these FST values are all nearly identical (and quite small), coupled with the small pairwise FST values of the Igbo, Yoruba, and Brong populations (Table 1), suggests that considering the set of West African populations sampled, any of these three populations may serve as a proxy for the ancestral population of the African Americans and that, in fact, all three are likely to have contributed ancestry to present-day African Americans” […]

“it is important to note that other African populations not sampled, including those from Sierra Leone, Senegal, Guinea Bissau, and Angola, may also serve as good (or potentially even better) proxies for the ancestral population of some African Americans” (Bryc et al., 2010, p.5)

____________________
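For readers unfamiliar with FST: it measures how much of the total genetic variation is due to differences between populations. A value of 0 means identical allele frequencies, and 1 means the populations share no variation at all. The sketch below uses the classic textbook formula FST = (HT − HS) / HT for biallelic markers, summing the heterozygosities over loci before taking the ratio. This is a simplification chosen for illustration. The quoted passages do not say which estimator Bryc et al. (2010) used, and many studies use sample-size-corrected estimators instead. The allele frequencies in the example are made up.

```python
from typing import Sequence

def fst_two_pops(p1: Sequence[float], p2: Sequence[float]) -> float:
    """Wright-style F_ST = (H_T - H_S) / H_T for two populations, given the
    frequency of one allele at each biallelic locus in each population.
    H_S: mean expected heterozygosity within the two populations.
    H_T: expected heterozygosity of the pooled (averaged) frequencies.
    Heterozygosities are summed over loci before taking the ratio."""
    if len(p1) != len(p2):
        raise ValueError("both populations need the same loci")
    h_s = h_t = 0.0
    for a, b in zip(p1, p2):
        p_bar = (a + b) / 2
        h_s += (2 * a * (1 - a) + 2 * b * (1 - b)) / 2
        h_t += 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # no variation at any locus
    return (h_t - h_s) / h_t

# Made-up allele frequencies at five loci for three hypothetical populations.
pop_a = [0.10, 0.45, 0.70, 0.30, 0.55]
pop_b = [0.12, 0.43, 0.68, 0.33, 0.52]  # very similar to pop_a: tiny F_ST
pop_c = [0.60, 0.05, 0.20, 0.85, 0.10]  # very different: much larger F_ST
print(round(fst_two_pops(pop_a, pop_b), 4))
print(round(fst_two_pops(pop_a, pop_c), 4))
```

Small, nearly identical FST values like those quoted for the Igbo, Yoruba and Brong correspond to the first case. The populations differ so little that any of them can stand in for the others as a reference.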

Another interesting part of the study is the table below, which shows the genetic distances between the various African sample groups. As the authors point out in the quote below, such close relationships can make it difficult to distinguish separate ancestral components. They remain optimistic, however, that this issue might be solved in the near future…

***

Bryc et al. (2009), Table 1

(Bryc et al., 2010)

____________________

“some populations (e.g., Igbo, Yoruba, and, to a lesser extent, Brong) are so closely related genetically that their contribution to patterns of African ancestry in African Americans is not reliably distinguishable. We believe that increasing the density of markers and, more importantly, sequencing directly in these populations to identify ancestry-informative markers may make this possible in the future.” (Bryc et al., 2010, p.5)

____________________

The Genetic Structure and History of Africans and African Americans

(Tishkoff et al., 2009)
Link to online article
Link to supplement

***

Tishkoff et al. (2009) AA breakdown p.1039

(Tishkoff et al., 2009)

____________________

“We studied 121 African populations, four African American populations, and 60 non-African populations for patterns of variation at 1327 nuclear microsatellite and insertion/deletion markers. We identified 14 ancestral population clusters in Africa that correlate with self-described ethnicity and shared cultural and/or linguistic properties. We observed high levels of mixed ancestry in most populations, reflecting historical migration events across the continent.” […]

“The ancestry of African Americans is predominantly from Niger-Kordofanian (~71%), European (~13%), and other African (~8%) populations, although admixture levels varied considerably among individuals.” (Tishkoff et al., 2009, p.1035)

____________________

This paper is a landmark study because of its coverage of sampled African populations and its description of genetic variation across the continent. If you search for it online, you will find it has already been discussed extensively on various websites, blogs and forums. I will therefore focus only on the parts relevant to this particular blog post. I recommend reading the supplement as well, even though it’s HUGE! For a nice visual presentation of this research, see this slide show.

Compared with the previous studies, this one is on a whole other level when it comes to African sampling. However, the main results shown above are actually not that radically different. A so-called Niger-Kordofanian (= Niger-Congo) ancestral component (orange in the figures above) is found to be predominant and is shared with other western Africans. In a way this first breakdown seems quite similar to what 23andme currently does (see Ancestry Composition). However, the authors also performed other STRUCTURE runs that give more specific results. The breakdown shown below seems much more informative, although it is described in somewhat misleading terms…

____________________

“Supervised STRUCTURE analysis (fig. S34) (4) was used to infer African American ancestry from global training populations, including both Bantu (Lemande) and non-Bantu (Mandinka) Niger-Kordofanian–speaking populations (fig. S34 and table S7). These results were generally consistent with the unsupervised STRUCTURE analysis (table S6) and demonstrate that most African Americans have high proportions of both Bantu (~0.45 mean) and non-Bantu (~0.22 mean) Niger-Kordofanian ancestry, concordant with diasporas originating as far west as Senegambia and as far south as Angola and South Africa (62).” (Tishkoff et al., 2009, p.1043)

____________________

***

Tishkoff et al. (2009) AA breakdown (Fig. S32)

(Tishkoff et al., 2009, supplement p.69)

***

Tishkoff et al. (2009) AA breakdown (Table S7)

(Tishkoff et al., 2009, supplement, p.90)

***

We can see that the so-called Niger-Kordofanian (NK) cluster has been split into “Bantu Lemande” (still orange) and “Non-Bantu Mandenka” (shown in grey). The “Bantu Lemande” cluster actually describes the shared ancestral connections between the Yoruba, Igbo, Brong etc. from Lower Guinea and the Bantu-speaking populations from Central Africa. Simply calling it “Bantu ancestry”, as in the quote above, seems imprecise to say the least. Instead it seems to correspond to a primary split between Upper Guinea on the one hand and Lower Guinea and Central Africa combined on the other. That would also make more sense in light of slave trade statistics. The Upper Guinean portion is estimated in the range of 19-24%, interestingly peaking in North Carolina. However, the regional AA sample sizes seem too small to be conclusive.
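In a supervised run like the one quoted above, the training populations are fixed in advance. The program then estimates, for each African American, which mixture of those references best explains their genotypes. STRUCTURE itself does this with a Bayesian MCMC model. The sketch below swaps in a simpler maximum-likelihood EM (expectation-maximization) estimate for a single individual and treats the reference allele frequencies as known. The function name, reference frequencies and genotypes are all hypothetical. It only shows why the result can never name a population that was not among the references.

```python
import numpy as np

def estimate_admixture(genotypes, ref_freqs, n_iter=500, tol=1e-10):
    """Maximum-likelihood ancestry proportions for ONE individual (EM).

    genotypes: length-L array with 0, 1 or 2 copies of the counted allele.
    ref_freqs: (K, L) array, frequency of the counted allele in each of K
               reference populations (treated as known and fixed).
    Returns a length-K vector of proportions summing to 1.
    """
    g = np.asarray(genotypes, dtype=float)
    f = np.clip(np.asarray(ref_freqs, dtype=float), 1e-6, 1 - 1e-6)
    k = f.shape[0]
    q = np.full(k, 1.0 / k)  # start from equal proportions
    for _ in range(n_iter):
        # E-step: chance that each observed allele copy came from population k.
        p_alt = q @ f          # P(counted allele) at each locus, shape (L,)
        p_ref = q @ (1 - f)    # P(other allele) at each locus
        resp_alt = q[:, None] * f / p_alt
        resp_ref = q[:, None] * (1 - f) / p_ref
        # M-step: average responsibility over all 2L allele copies.
        q_new = (resp_alt @ g + resp_ref @ (2 - g)) / (2 * g.size)
        if np.abs(q_new - q).max() < tol:
            return q_new
        q = q_new
    return q

# Two hypothetical reference populations at six loci (made-up frequencies).
ref = np.array([
    [0.9, 0.8, 0.1, 0.7, 0.2, 0.9],  # "reference A"
    [0.1, 0.3, 0.8, 0.2, 0.9, 0.2],  # "reference B"
])
person = np.array([2, 1, 0, 2, 1, 1])
print(estimate_admixture(person, ref).round(2))
```

Whatever the individual's true origins, the output is always expressed as shares of the supplied references. That is the same reason a cluster labelled “Bantu Lemande” can end up absorbing Lower Guinean ancestry.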

Smaller components are also identified, such as Fulani, Nilo-Saharan, Chadic and even Cushitic. Before jumping to any conclusions, it’s pertinent to look at the breakdown of the other African sample groups. They too show minor percentages of these non-Niger-Congo ancestral clusters. Tishkoff et al. (2009) explain these as remnant traces of ancient population migrations, far removed from a genealogical timeframe (~500 years). They mention for example that:

____________________

“Our data support the hypothesis that the Sahel has been a corridor for bidirectional migration between eastern and western Africa” (Tishkoff et al., 2009, p.1041)

____________________

This makes it more likely, I suppose, that African Americans inherited these seemingly “exotic” DNA markers from Western African ancestors. The alternative, inheriting them directly from Eastern African ancestors within the recent timeframe of the Trans-Atlantic Slave Trade, seems less likely.

Tishkoff et al. (2009) (Table S9)

(Tishkoff et al., 2009, supplement, p.95)

***

Tishkoff et al. (2009) map

(Tishkoff et al., 2009)

***

To touch briefly upon the central theme of this paper, the genetic structure and diversity of Africans: it is well known that Africa is the most genetically diverse continent. What is usually not realized is that much of this diversity shows up when comparing Niger-Congo-speaking groups with other major African language groups (see also the Ethno-Linguistic Map section). Some of those groups are very small in number, such as the Khoi-San and the Pygmies. Within the Niger-Congo language family there may still be much genetic diversity, but it is generally spread out across ethnic groups rather than ethnic groups being completely distinct from one another. In other words, intragroup diversity is often greater than intergroup diversity. So individuals from different but neighbouring ethnic groups could have quite similar ancestral breakdowns. Meanwhile, two random individuals from the same ethnic group could be surprisingly different from each other. In some cases this reflects a differentiated deep ancestry predating ethnic formation. In others it reflects different patterns of more recent external gene flow. Either way, it demonstrates that, at least from a genetic perspective, ethnicity can indeed be considered a social construct in individual cases. However, this also has implications for designing predictive ancestral clusters in personal DNA testing!
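The statement that intragroup diversity is often greater than intergroup diversity can be made concrete. Total variation is split into a within-group part and a between-group part, the same logic that underlies FST and AMOVA-style analyses. The sketch below does this for a single numeric score per individual, for example one ancestry proportion. The function name, group labels and values are invented for illustration.

```python
from statistics import fmean

def variance_partition(groups):
    """Split the total sum of squares of per-individual values into a
    between-group and a within-group part.
    groups: dict mapping a group label to a list of values.
    Returns (between_fraction, within_fraction), which sum to 1."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = fmean(all_values)
    total_ss = sum((v - grand_mean) ** 2 for v in all_values)
    if total_ss == 0:
        raise ValueError("no variation to partition")
    # Between-group part: how far each group mean sits from the grand mean.
    between_ss = sum(
        len(values) * (fmean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    within_ss = total_ss - between_ss
    return between_ss / total_ss, within_ss / total_ss

# Invented scores for individuals of two hypothetical neighbouring groups
# whose distributions overlap heavily.
example = {
    "group_1": [0.62, 0.71, 0.55, 0.80, 0.66],
    "group_2": [0.58, 0.74, 0.69, 0.51, 0.77],
}
between, within = variance_partition(example)
print(f"between groups: {between:.0%}, within groups: {within:.0%}")
```

When the within-group fraction dominates like this, knowing someone's group tells you little about their individual score. Predictive ancestral clusters run into exactly this problem.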

***

Genome-wide Comparison of African-Ancestry Populations 

(Bhatia et al., 2011)
Link to online article

***

Bhatia et al. (2011) (PCA)

(Bhatia et al., 2011)

***

Bhatia et al. (2011) (Fig. 2)

(Bhatia et al., 2011)

____________________

“500 individuals from each of our African-American, Nigerian, and Gambian data sets were studied together with 500 European individuals via PCA with EIGENSOFT [49] (see Figure 1).” […]

“We note that although several African-American individuals come very close to the Nigerian cluster, there remains a nonzero distance between all African-American individuals and the Nigerian cluster. This is consistent with a small, but measurable, FST between the African ancestors of African Americans and Nigerians.” […]

“The Gambian individuals are separated from Europeans on PC1 and from the Nigerians on PC2. We label each of the Gambian individuals with their subpopulation label (Mandinka, Jola, Fula, Wolloff) and note the existence of cryptic population structure within the Gambia. Several Fula individuals show significant evidence of European-related admixture by their position on PC1. Additionally, the four subpopulations form overlapping but distinguishable clusters along PC2.” (Bhatia et al., 2011, p.372)

____________________

The figures and quotes above should be quite self-explanatory, so I will not comment too much on this study. The findings are very intriguing, however, as they seem to show that African Americans share more ancestry with Nigerians than with Gambians. Given the absence of a proxy sample group for Central African ancestry, though, the PCA plot above might be pulling the African American samples towards the Nigerian cluster in an exaggerated manner. Then again, Figure 3 from Zakharia et al. (2009), which I posted earlier, shows a similar plot that also includes Bantu samples besides Yoruba Nigerians and Mandenka from Senegambia. There too, the African American samples seem to cluster nearest to the Nigerians. That plot also has no possible distraction from European samples, which account for the main variation along eigenvector 1 in the plot from Bhatia et al. (2011), so it shows only intra-African variation. Either way, this outcome seems in line with the results of Zakharia et al. (2009) and Tishkoff et al. (2009). Both studies report Upper Guinean ancestry to be minor compared with Lower Guinean and Central African ancestry combined.
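The point about European samples dominating eigenvector 1 follows from how PCA works. The first component captures the largest axis of variation in the data. If one population is much more divergent than the rest, PC1 is mostly spent on separating that population, and finer distinctions are pushed to later components. The toy simulation below illustrates this with made-up allele frequencies for three hypothetical populations: two closely related ones and one distant one. It uses a plain SVD of the standardized genotype matrix. That is roughly the standard approach for SNP data, but without the refinements of EIGENSOFT, and none of it reproduces the data of Bhatia et al. (2011).

```python
import numpy as np

def simulate_genotypes(freqs, n, rng):
    """Draw n diploid individuals (0/1/2 allele counts) from per-locus allele frequencies."""
    return rng.binomial(2, freqs, size=(n, len(freqs)))

def pca_scores(genotypes, n_components=2):
    """PCA on an individuals x loci genotype matrix: center each locus,
    scale it by sqrt(p(1-p)) as is common for SNP data, then take the SVD."""
    x = genotypes.astype(float)
    p = x.mean(axis=0) / 2
    keep = (p > 0) & (p < 1)  # drop loci that are monomorphic in this sample
    x = (x[:, keep] - 2 * p[keep]) / np.sqrt(p[keep] * (1 - p[keep]))
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Invented allele frequencies for three hypothetical populations at 2000 loci.
rng = np.random.default_rng(42)
base = rng.uniform(0.1, 0.9, size=2000)
close = np.clip(base + rng.normal(0, 0.03, size=base.size), 0.01, 0.99)
distant = np.clip(1 - base + rng.normal(0, 0.05, size=base.size), 0.01, 0.99)

geno = np.vstack([
    simulate_genotypes(base, 30, rng),     # population A
    simulate_genotypes(close, 30, rng),    # population B, closely related to A
    simulate_genotypes(distant, 30, rng),  # population C, strongly divergent
])
pc1 = pca_scores(geno)[:, 0]  # the sign of a principal component is arbitrary

for name, rows in [("A", slice(0, 30)), ("B", slice(30, 60)), ("C", slice(60, 90))]:
    print(name, round(float(pc1[rows].mean()), 2))
```

In this toy setting, A and B land almost on top of each other along PC1 while C sits far away. A plot restricted to African samples can therefore show intra-African structure more directly than one that also includes a distant outgroup.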

Another noteworthy aspect of this study is the HUGE dataset of Gambian samples, consisting of no fewer than “2946 individuals from the WTCCC-TB study [27] genotyped on the Affymetrix 500k array.” (Bhatia et al., 2011, p.370). Various ethnic groups are represented among them. Interestingly, these samples were used again in a very recent and important study named “The African Genome Variation Project shapes medical genetics in Africa” (2015). I will review that study in an upcoming blog post, as it features a highly insightful separate ancestral cluster for Upper Guineans. It looks very promising for any prospects of refined African analysis in personal DNA testing, that is, if these samples are ever made available to companies like 23andme or Ancestry.com…

The PCA plot below clearly shows the overlap between ethnic groups in the Gambia, despite some substructure. Quite likely all of these ethnic groups could be counted among the ancestors of present-day Cape Verdeans (see also “Top 20 Ethnic Roots”). Attempting to genetically disentangle their separate contributions seems practically impossible at this stage. But who knows, new ways of pinpointing ethnic origins might arise in the near future, possibly based on frequencies of IBD segment sharing.

Jallow et al. (2009) PCA

Genome-wide and fine-resolution association analysis of malaria in West Africa. (Jallow et al., 2009)
