Did Ancestry kill their African breakdown? (part 2)

Earlier this month Ancestry finally rolled out the updated version of its Ethnicity Estimates for all its customers. Sadly the concerns I raised in July have become reality. Many people are now left confused by their revised African breakdown as reported by AncestryDNA.1 Understandably so given the often drastic and seemingly incoherent changes compared with the previous set-up. In this three-part blog post I will argue that Ancestry’s pioneering analysis of especially West African DNA has been downgraded rather than upgraded! In the first part I evaluated the accuracy of Ancestry’s new African breakdown by analyzing the before & after results of 130 African customers. I found that in most cases the informational value to be derived from their results has decreased rather than improved. In the upcoming last part I will discuss FAQs about this update as well as look into promising new developments. See also:

 

Table 1

Updated Reference Panel++

Source: Ancestry’s White Paper 2018. Text in red added by myself. Compare also with this overview of Ancestry’s previous Reference Panel. The number of African samples included in Ancestry’s Reference Panel has increased considerably. However take note that this increase of African samples has been disproportionate. Mostly benefiting the “Benin/Togo”, “Mali” and “Cameroon, Congo & Southern Bantu people” regions.

***

The title of this blog series was sort of meant to be tongue-in-cheek 😉 as I do believe that Ancestry still offers opportunities for those wanting to learn more about their African lineage. Nonetheless it seems very clear to me that Ancestry’s update may indeed have “killed it”, but only with their new Asian & European breakdowns! However not so with their African breakdown which has taken a big step backwards instead of forwards. At least in most aspects.

In this part 2 I will explore how the changed composition of Ancestry’s Reference Panel as well as Ancestry’s new algorithm may have contributed to this very disappointing outcome. Main topics:

  1. More is not always better: over-sampling for “Cameroon, Congo & Southern Bantu”, “Benin/Togo” and “Mali” causing inflated scores?
  2. What are the ethnic backgrounds of Ancestry’s African samples?
  3. New algorithm has issues with describing mixed/complex lineage?

***

1) More is not always better

As I have maintained throughout my AncestryDNA survey it is always essential to be aware of any shortcomings in DNA testing. Luckily Ancestry still provides sufficient information on their website to help you understand your results better. However you do need to actively seek it out and not be inclined to skip the small print 😉 I do find that the level of transparency has decreased somewhat when compared to their previous update in 2013. Then again Ancestry’s new white paper is still an insightful, albeit rather technical, account. Recommended reading:

____________________

“We’ve added 13,000 more samples to our reference panel, which increases our ability to identify and find the genetic signature of a region within one’s DNA.” (Source: Ancestry)

“The rollout of our enhanced ethnicity estimates will take place on September 12, 2018 and with this update, new and existing customers can expect more precise results across Asia and Europe.” (Source: Ancestry) [take note how Africa is not mentioned!]

“We’ve used the expanded reference panel and updated algorithm to add more specific regions in Asia and Europe.” (Source: Ancestry) [take note how Africa is again not mentioned!]

“Africa presents special challenges. The African continent is the ancient birthplace of humanity, and humans there are the most genetically diverse on earth. This makes Africa a tricky place for ethnicity estimation because you need lots of DNA samples to account for all that diversity. We’re working to increase the number of African samples in our reference panel so we can take full advantage of our new methods of analysis and provide even better estimates for Africa.” (Source: Ancestry)

____________________

“More is better” is a very common belief. Not only in DNA testing but also generally speaking.3 However this assumption does not always hold true! As illustrated by the quotations above Ancestry itself also acknowledges its lackluster African update. There has been no meaningful improvement of performance when describing African DNA for people of African descent but rather a deterioration. In spite of an impressive expansion of Ancestry’s Reference Panel by over 13,000 samples. The total number of samples Ancestry uses to compare your own DNA with is now 16,638 versus 3,000 previously.

Granted most of these new samples are either European or Asian. But in fact also the number of African samples has been increased from 464 to 1,395. Which represents a tripling in sample size! Then again the proportion of African samples in Ancestry’s previous Reference Panel used to be about 15% (464/3000). This share has now however decreased to 8% (1395/16638). This is already an indication of how it is not only absolute numbers you should be concerned with, but also relative standing.
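The proportions mentioned above are easy to re-check yourself. What follows is only a minimal sketch in Python: the four sample counts are the ones quoted in this post, while the variable names are my own.

```python
# Sample counts as quoted in this post (previous panel vs. the 2018 update).
prev_african, prev_total = 464, 3000
new_african, new_total = 1395, 16638

prev_share = prev_african / prev_total  # African share of the previous panel
new_share = new_african / new_total     # African share of the updated panel
growth = new_african / prev_african     # growth factor of the African sample count

print(f"Previous African share: {prev_share:.1%}")
print(f"Updated African share:  {new_share:.1%}")
print(f"Growth in African samples: x{growth:.2f}")
```

So the absolute number of African samples roughly tripled, while their relative weight within the panel was nearly halved.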

When we have a closer look at the newly updated Reference Panel (see table 1) and compare with the previous version (see this page) more imbalances are revealed. It can be seen for example how the “Cameroon, Congo & Southern Bantu” region now easily has the greatest sample size (n=579). Also the number of Malian samples has been expanded quite spectacularly. This region used to have the least number of samples (n=16) in the previous set-up but now there are 169 samples from Mali available! This is surely to be seen as an improvement in itself.4 However this accomplishment stands in stark contrast with how the already low number of samples for “Senegal” has practically stagnated (n=31 versus n=28). Also for “Nigeria” and “Ivory Coast/Ghana” only a modest increase in sample size has been achieved. In fact it turns out that more than 80% of the increase in African samples went to just three regions: “Cameroon, Congo & Southern Bantu” (+446), “Benin/Togo” (+164), and “Mali” (+153).
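The “more than 80%” figure follows directly from the numbers quoted above. Here is a short sketch, again using only figures mentioned in this post (the dictionary layout and names are just my own way of organizing them):

```python
# African sample totals quoted in this post.
prev_african_total = 464
new_african_total = 1395

# Increase in sample size for the three most expanded regions, as quoted above.
top_three_increase = {
    "Cameroon, Congo & Southern Bantu": 446,
    "Benin/Togo": 164,
    "Mali": 153,
}

total_increase = new_african_total - prev_african_total
top_three_sum = sum(top_three_increase.values())
top_three_share = top_three_sum / total_increase

print(f"Total increase in African samples: {total_increase}")
print(f"Captured by the top three regions: {top_three_sum} ({top_three_share:.0%})")
```

Whatever remains of the increase had to be shared among all the other African regions combined.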

Is it any coincidence that these three regions are also the ones which seem to appear in very inflated amounts among the updated results of Afro-Diasporans? I will elaborate further below. However based on only table 1 one might already (intuitively) say that the over-sampled regions seem to suck in ethnicity estimate %’s at the expense of under-sampled regions. In a way functioning like a magnet. Given the reasonable predictive accuracy of the previous version (see this post) it makes you wonder why Ancestry bothered with adding more African samples in such an imbalanced manner when it only ended up making things worse and not better! No change is better than bad change after all!

As highlighted in one of the quotations above Ancestry has added mostly new European & Asian regions in their update. Resulting in a grand total of 43 global regions compared with 26 global regions in the previous version. To be fair Ancestry did introduce one new African region labeled “Eastern Africa”. But at the same time they also merged the former “Cameroon/Congo” region with the “Southeastern Bantu” region, undoing the former useful distinction between Central & Southern Africans.5 Leaving the total number of African regions within AncestryDNA unchanged at 9.

Going by my survey findings before the update there were however only 7 African regions which really mattered for Afro-Diasporans when describing their main ancestral connections with Africa. The so-called “Hunter-Gatherers” and “Africa North” regions usually being minimal or introduced by way of Iberian detour. Arguably after Ancestry’s update the African breakdown now only has 6 regions instead of 7 which really matter. The “Eastern Africa” region again being minimal for Afro-descendants in the Americas (and not even having a good prediction accuracy). While 3 out of 6 remaining regions (“Senegal”, “Ivory Coast/Ghana” and “Nigeria”) have been severely compromised because of Ancestry’s faulty sampling strategy. So basically it all became more generic instead of more specific. Even though the total number of African samples did increase…

***

Chart 1

Predictive accuracy African regions

Source: Ancestry’s White Paper (2018, p.8). This chart depicts the prediction accuracy for each African region. The average or median being marked by the black line in the middle of each coloured boxplot. Based on how Ancestry’s African samples themselves score for each region. Notice the wide range of estimates. Also including 0% for Nigeria! Compare with this chart for the previous version (2013-2018).

____________________

“We predicted nearly 100% of the genetic ethnicity from the correct region for the following groups:  […],Cameroon, Congo & Southern Bantu Peoples,  […], Africa South-Central Hunter-Gatherers. For some regions, such as Nigeria […], the numbers are not as high, with average assignment of 28% […] to the correct region, respectively.” Source: Ancestry’s White Paper (2018, p.20).

____________________

Based on 130 African customer results I have already established in part 1 of this blog series that AncestryDNA’s update seems to work out best for actual Central Africans, Malians and North Africans. For Beninese and Ghanaian Ewe there seems to be not much difference, on average. For Southern Africans it also seems to be mostly an intermediate outcome. But the worst hit would be Nigerians (both north & south), Ghanaians (Akan & Ga), Ivorians, Liberians, Senegambians as well as Northeast Africans. And by default also people in the Afro-Diaspora descended from these populations! Many of these patterns are also reflected in chart 1 above.

It is especially illuminating to compare with this overview of average prediction accuracy of each African region before the update. If I understood Ancestry’s White Paper correctly the coloured boxplot covers the 25th, 50th and 75th percentiles. The complete range (also including outliers) extends even further though. The bold black vertical line would represent the median or average. One thing that stands out is that 100% accurate estimates seem to have become less common, while the range downwards to below 50% has increased. Otherwise:

  • Regions with decreased prediction accuracy:  “Nigeria”, “Ivory Coast/Ghana”, “Senegal”, “Northern Africa”.
  • Regions with increased or equal prediction accuracy: “Cameroon, Congo & Southern Bantu”, “Mali”, “Hunter-Gatherers”, “Benin/Togo”.

Although chart 1 is quite insightful it remains regrettable that the former genetic diversity tabs have disappeared as they contained more specific statistical details (in numbers!) for each separate region. It should also be kept in mind that these indications of prediction accuracy are based on the African samples already included in Ancestry’s Reference Panel. From my survey findings based on randomly collected African customer results (before the update) I found that Ancestry tended to overestimate the prediction accuracy of its African regions. For example for 77 Nigerian survey participants I calculated an average “Nigeria” score of around 52% (see this link) while Ancestry mentioned a median score of 69% “Nigeria” for its 67 Nigerian samples (see this screenshot).
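Keep in mind that an average and a median need not coincide when scores are skewed, so a part of such a gap may be down to the type of statistic being compared. The sketch below illustrates this with made-up scores. These numbers are purely hypothetical: they are not my survey data and not Ancestry’s samples.

```python
import statistics

# Hypothetical "Nigeria" scores (in %) for nine imaginary samples.
# Invented for illustration only; not taken from any real dataset.
scores = [0, 10, 20, 60, 70, 75, 80, 90, 100]

mean_score = statistics.mean(scores)
median_score = statistics.median(scores)
# Quartiles: the 25th, 50th and 75th percentiles a boxplot is built from.
q1, q2, q3 = statistics.quantiles(scores, n=4)

print(f"mean = {mean_score:.1f}")
print(f"median = {median_score}")
print(f"quartiles = {q1}, {q2}, {q3}")
```

A few very low scores pull the average down much more than they pull the median down, which is why it matters which of the two is being reported.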

 

2) What are the ethnic backgrounds of Ancestry’s African samples?

____________________

“Our samples came from these sources (approximate numbers):

  •  500 from Human Genome Diversity Project (HGDP) samples
  •  800 from One Thousand Genome Project samples
  •  4,400 from AncestryDNA proprietary samples [SMGF]
  •  10,800 from AncestryDNA customers” (Source: Ancestry)

____________________

____________________

“Today there are five main companies in the United States offering genealogical testing, including 23andMe, AncestryDNA, National Geographic, MyHeritage and Living DNA.  Along with their popularity has come controversy. Some scientists note that because none of them release their reference panel data, it’s impossible to evaluate them.” (source: Ancestry.com’s ethnicity updates likely won’t be the last, USA Today, 2018)

____________________

Table 2

***

Sorenson database

Source: former website of the Sorenson Database. This database (SMGF) was acquired by Ancestry in 2012. This overview is showing the number of samples collected in selected countries. Take note how Mali, Benin, Togo and especially Cameroon had abundant sampling but Congo much less so.

***

Africa’s ethnic diversity is a fact. Even if often underestimated or misunderstood (see this page for maps). In many cases this means that also within any given African country a great deal of genetic diversity will exist. Invalidating regions referring to modern-day countries (with colonial borders). As Ancestry insists on maintaining in this update. For correct interpretation of AncestryDNA’s African regions it is however still crucial to not only know the nationality but also the ethnic backgrounds of the African samples included in Ancestry’s Reference Panel. Imagine for example a “Nigeria” region being defined solely by Hausa-Fulani samples from the north. Surely this would lead to markedly different “Nigeria” scores than if only southern Nigerian samples (Igbo, Yoruba etc.) had been used to compare your own DNA with!

When AncestryDNA first came out with its pioneering West African breakdown I therefore emailed them several times in 2014 for more details about the specific ethnic groups being included for each African region. They never provided this info… Earlier this month I once more asked about these ethnic details on their website. Again no reply… By necessity then this section will be mostly based on guesswork and (informed) speculation on my part. However to be fair Ancestry does mention some key aspects about their African samples on their website which I will try to incorporate as well.

I will not discuss the possible ethnic background of the samples being used for “Senegal” (Mandenka from HGDP?), “Ivory Coast/Ghana” (Akan/Brong and Ivorian Kru & southwestern Mande?), “Nigeria” (Igbo & Yoruba from HGDP?) and “Northern Africa” (Mozabite from HGDP?). The sample size of these regions has not been expanded that much after this update (see table 1). And frankly I suspect only customer samples have been used whenever there was modest addition.6 Otherwise the sample composition of these regions will have remained the same. For previous discussion see this page as well as this one.

It could very well be a different story though for the leading trio of “Mali”, “Benin/Togo” and “Cameroon, Congo & Southern Bantu”! As I highly suspect that most if not all of their notably greater increase in sampling may have been sourced by way of the former Sorenson (SMGF) database. This sample collection is referred to as proprietary by Ancestry in the first quote above. Because Ancestry acquired the Sorenson Molecular Genealogy Foundation (SMGF) in 2012. The previous version of Ancestry’s Reference Panel probably already contained samples from this collection. But it could very well be that the number of African samples from this invaluable database has increased even more so with this update.7

The website of the Sorenson database was regrettably taken down by Ancestry in 2015. But luckily it can still be accessed by way of the internet archive 🙂 By performing a search I could verify that all expected countries from AncestryDNA’s African breakdown have indeed been sampled by SMGF (see table 2). However it is to be kept in mind that these samples were originally obtained for either Y-DNA or Mitochondrial DNA. But *possibly* Ancestry has now also managed to extract autosomal DNA from these samples. Again this is speculation on my part!

Looking into the 153 newly added Malian samples for example it is very tempting to go with this SMGF scenario though. Because Malian samples are quite rare in other publicly available databases to my knowledge. Not at all present in either the HGDP or 1000 Genomes databases (mentioned as other sources for Ancestry’s Reference Panel). The number of Malian Ancestry customers may also be assumed to be much too small to support an increase of 153 samples. So by way of elimination only the Sorenson database seems to remain as a viable option. A similar line of reasoning might also be valid for the 446 (!) newly added samples from presumably either Cameroon or Congo. As well as the 164 newly added samples from Benin and/or Togo.

Regrettably I was not able to find any specific ethnic or other relevant details being mentioned on the former website of the Sorenson database. However a possible clue might be taken from the photo credits on Ancestry’s “Mali” page which explicitly refer to SMGF! Pursuing this lead a bit further it turns out that at the time SMGF also organized a photo exhibition called “Faces of Mali”. And from the description it seems that at least some of the sampling may have taken place in southwestern Mali.8 In fact from another source it may already be confirmed that both Bambara & Dogon samples were among them! For more details see:

I did not find many useful clues when looking into the possible origins of the newly added samples for the “Benin/Togo” and the “Cameroon, Congo & Southern Bantu” regions. However just going by my before & after survey findings for 130 continental Africans, I suspect that also samples with a non-Gbe origin might have been added for the “Benin/Togo” region. This might explain the lack of any substantial improvement in describing the DNA of my Ewe and Benin samples. Plus it might also (partially) explain the great extension of the Benin/Togo region across West Africa, and especially in southern Nigeria. One would hope that Ancestry did not also include Beninese Yoruba samples, as this would frankly be nothing less than a blunder…

The increase in sample size has by far been the greatest for the “Cameroon, Congo & Southern Bantu” region (+446!). However Ancestry has not been very informative of this major change. Even if this newly combined region now includes more than 20 countries! Again I have to go by my unconfirmed assumptions, but I highly suspect that most newly added samples may have been obtained from Cameroon and not from the Congo, given the vast overrepresentation of the former country in the Sorenson database (2,453 versus 87, see table 2). This would actually be in line with a general trend whereby the genetic importance of Cameroon in DNA testing for Diasporans has been overstated because of a relative abundance of Cameroonian samples to be matched with (both for haplogroup and autosomal testing). While other samples from especially southeastern Nigeria but also from the Congo and Angola are relatively lacking. See also:

New samples added from Kenya as well as Tanzania?

***

HG map

Source: Ancestry. This is the updated map of the “Hunter-Gatherers” region. Although the accompanying text has remained the same, the map itself has drastically changed. Only Tanzania and Southwestern Africa are being highlighted as main locations. Take note how Central Africa is no longer indicated. Even though this is the known homeland of the Mbuti and Baka Pygmy populations! Follow this link for the pre-update version of this map.

***

____________________

{"version":4,"regions":[{"color":"#f1e000","key":"Luhya","lowConfidenceAssignment":false,"lowerConfidence":57,"percentage":57,"upperConfidence":60},{"color":"#75cd00","key":"CameroonCongo","lowConfidenceAssignment":false,"lowerConfidence":0,"percentage":31,"upperConfidence":31},{"color":"#00cc99","key":"AfricaSanPygmy","lowConfidenceAssignment":false,"lowerConfidence":0,"percentage":11,"upperConfidence":24},{"color":"#00b8cd","key":"NearEast","lowConfidenceAssignment":false,"lowerConfidence":0,"percentage":1,"upperConfidence":2}]}

***

These are the updated results of a Kenyan with 57% “Eastern Africa”, seen in preview code. See this screenshot for the website version. By using a trick (see this link) you were already able to see such results before Ancestry had rolled out their update to all their customers. The interesting thing is that programming codes have been used instead of the usual regional labeling. And quite tellingly the code name for “Eastern Africa” is “Luhya”!

____________________
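For those who want to read such preview code more comfortably, here is a minimal Python sketch that parses the snippet quoted above (written with straight quotes so that it is valid JSON). The only label mapping I fill in is “Luhya” = “Eastern Africa”, as that is the one confirmed by the results shown; the other code names are simply printed as they are.

```python
import json

# The preview payload quoted in this post, with straight quotes so it parses.
preview = (
    '{"version":4,"regions":['
    '{"color":"#f1e000","key":"Luhya","lowConfidenceAssignment":false,'
    '"lowerConfidence":57,"percentage":57,"upperConfidence":60},'
    '{"color":"#75cd00","key":"CameroonCongo","lowConfidenceAssignment":false,'
    '"lowerConfidence":0,"percentage":31,"upperConfidence":31},'
    '{"color":"#00cc99","key":"AfricaSanPygmy","lowConfidenceAssignment":false,'
    '"lowerConfidence":0,"percentage":11,"upperConfidence":24},'
    '{"color":"#00b8cd","key":"NearEast","lowConfidenceAssignment":false,'
    '"lowerConfidence":0,"percentage":1,"upperConfidence":2}]}'
)

# Only this mapping is confirmed by the results discussed in this post.
LABELS = {"Luhya": "Eastern Africa"}

data = json.loads(preview)
regions = sorted(data["regions"], key=lambda r: r["percentage"], reverse=True)

for r in regions:
    label = LABELS.get(r["key"], r["key"])  # fall back to the raw code name
    print(f'{label}: {r["percentage"]}% '
          f'(range {r["lowerConfidence"]}-{r["upperConfidence"]}%)')

total = sum(r["percentage"] for r in regions)
print(f"Total: {total}%")
```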

Once more I would like to underline that I have no confirmation for what is about to follow. However starting with the newly updated map for the “Hunter-Gatherer” region one might wonder: did Ancestry perhaps replace their Central African hunter-gatherer samples (Mbuti and Baka Pygmies) with Tanzanian ones? These possibly new hunter-gatherer samples for Tanzania being either Sandawe and/or Hadza. All of these populations are heavily marginalized, living in very remote places and subsisting in small numbers. Frankly speaking I do not find them very relevant to understand the origins of especially Afro-Diasporans (one notable exception being the Khoi-San and their genetic legacy among the South African Coloureds). However because of their distinctive genetics they have been studied extensively and many academic samples are available. Which is why they are often featured in DNA testing.

In part 1 of this blog series I already mentioned the quite outlandish reporting of this “Hunter-Gatherer” region in clearly inflated amounts among Northeast Africans. According to Ancestry’s own information this region is now to be found as far north as Djibouti! Far removed from any historical Pygmy or Khoi-San population! But perhaps less absurd when also taking into account an (unconfirmed!) addition of Tanzanian samples. Despite having distinctive DNA markers it is also known that Tanzanian hunter-gatherers have intermingled with surrounding populations across time, incl. Bantu-speaking ones but also Nilotic and (South) Cushitic ones. Which might explain the genetic similarities now being detected (in absence of better fitting samples from Northeast Africa!). See also this very recent study:

It should be noted also that the “Hunter-Gatherer” scores have mostly disappeared for Central Africans themselves (as well as West Africans), going by my before & after survey. Additional sampling from Tanzania however would clearly be in contradiction with the regional overview given by Ancestry which still only mentions the Khoi-San and Pygmy (see map above). But perhaps this text is still under revision. Also the number of samples for the “Hunter-Gatherer” region (previously n=35) has not increased with this update but has actually been reduced by one sample (see table 1)! Then again it might also be that the regional map was made in error or possibly it’s just some quirk of Ancestry’s new algorithm which is causing these inflated “Hunter-Gatherer” amounts to appear among Northeast Africans. Either way the current outcome is very unsatisfying and hardly in support of decent quality control by Ancestry.

Moving on to the new “Eastern Africa” region I have more solid ground to believe that the Kenyan Luhya people have been used as a reference population. Perhaps in addition to other ones but I would not be surprised if they are the only defining samples being used for “Eastern Africa”. As shown above in the preview mode of the updated results of a Kenyan with 57% “Eastern Africa”, it can be revealed that Ancestry uses “Luhya” as a code name for their “Eastern Africa” region. By using a trick (see this link) you were already able to see a preview of your updated results before Ancestry had actually rolled out their update to all their customers. Which is how I obtained this insight 😉 The interesting thing is that in this preview mode programming codes have been used instead of the usual regional labeling. It may not be a watertight confirmation but certainly it is no coincidence that Ancestry’s programmers picked out the Luhya as their code name. When one reads the regional description for “Eastern Africa” provided by Ancestry (see this screenshot), again the Luhya are explicitly mentioned and seemingly singled out.

In fact there is more supporting evidence because Ancestry has itself mentioned that the One Thousand Genome Project has been one of their main sources for their newly added samples. And within this 1000 Genomes database 116 Luhya samples from Kenya can be found (see this link). Sufficiently covering the 82 samples being used for Ancestry’s “Eastern Africa” region (see table 1). These very same Luhya samples have actually also been used by 23andMe but quite perversely for their West African category (!) (this was before their current update which is still to be rolled out completely). Highly illustrative of the sometimes arbitrary and ill-designed usage of African reference populations by DNA testing companies…

In their white paper Ancestry makes it a point to emphasize how they are committed to “developing the best possible set of reference samples.” They mention that “the genetic distinctness of each region” should be kept in mind. And quite rightfully they mention that it is not only about quantity but also about quality when designing an appropriate Reference Panel. A tool based on comparing relevant populations with your own DNA and which is able then to achieve reasonably accurate ethnicity estimates in line with either historical plausibility or verifiable genealogy.

It would be quite contradictory therefore if the Luhya have indeed been chosen as the sole defining reference population for “Eastern Africa”. As genetic studies have already revealed this Bantu-speaking population not to be the perfect choice for strictly covering genetic similarity with Nilotic(-like) DNA among Northeast Africans. Many Kenyan populations actually being composites of Bantu-, Nilo-Saharan- and Cushitic-speaking populations to varying degrees. However for the Luhya it seems their ancestral ties with Bantu populations from Central Africa are the strongest. Which probably accounts for the rather disappointing prediction accuracy of the new “Eastern Africa” region among native Northeast Africans (around 50% according to my before & after survey), as well as the subdued appearance of this region among Southeast Africans and the occasional trace reporting (1%) among Afro-Diasporans. The Maasai (successfully used by 23andMe for their own “East Africa” category!) arguably make for a much better candidate. See also:

3) New algorithm has issues with describing mixed/complex lineage?

Table 3

Precision & recall

Source: Ancestry’s White Paper (2018, p.22). Red arrows added by myself.  Precision & recall are defined by Ancestry as follows: “Precision can be thought of as how much of the reported ethnicity is true.”, “Recall can be thought of as how much of the true ethnicity is called by the process.” Based on the results of mixed (within Africa) individuals with known background. Take note of the (very) low recall rates for “Nigeria” but also “Senegal” and “Ivory Coast/Ghana”.

____________________

“We also evaluated the accuracy of ethnicity estimates for “synthetic” individuals of mixed ethnicity origins. These test cases are simulations we construct with known mixtures of ethnicities. Each synthetically admixed individual can have as few as 2 or as many as 20 ethnicity regions, with various proportions. Since the true ethnicity proportions are known, we can calculate precision and recall for each ethnicity region. Precision and recall are two important factors in evaluating our estimation process.” (source: Ancestry)

“For regions with low recall, it’s mostly because part of the ethnicity from these regions are assigned to nearby regions. Hence underestimation and the low recall. For regions with low precision, it’s mostly likely part of the nearby regions are assigned there. Hence overestimation and low precision.” (source: Ancestry)

“Our new algorithm analyzes longer segments of genetic information and is a fundamental change in how we interpret DNA.” (source: Ancestry)

____________________
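To make precision and recall a bit more tangible, here is a minimal Python sketch. Take note that it is based on my own reading of Ancestry’s verbal definitions quoted above, not on any formula confirmed by Ancestry: I assume that the correctly assigned part for a region is the overlap between the true and the estimated percentage. The example individual is hypothetical and so are the function and variable names.

```python
def precision_recall(true_pct, est_pct):
    """Per-region (precision, recall) from true and estimated percentages.

    Assumption: the correctly assigned amount for a region is the overlap
    min(true, estimated). Ancestry's exact calculation may differ.
    """
    results = {}
    for region in sorted(set(true_pct) | set(est_pct)):
        t = true_pct.get(region, 0.0)
        e = est_pct.get(region, 0.0)
        overlap = min(t, e)
        precision = overlap / e if e > 0 else None  # undefined if nothing was reported
        recall = overlap / t if t > 0 else None     # undefined if nothing was truly there
        results[region] = (precision, recall)
    return results


# Hypothetical synthetic individual: truly 50/50, but estimated as 20/80.
true_pct = {"Nigeria": 50, "Benin/Togo": 50}
est_pct = {"Nigeria": 20, "Benin/Togo": 80}

results = precision_recall(true_pct, est_pct)
for region, (precision, recall) in results.items():
    print(f"{region}: precision={precision:.3f}, recall={recall:.3f}")
```

In this made-up case “Nigeria” gets full precision but low recall (it is underestimated), while “Benin/Togo” gets full recall but reduced precision (it is overestimated). This is the kind of pattern Ancestry describes for neighbouring regions.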

The new algorithm used by Ancestry most likely also had a major impact on the updated African breakdown on AncestryDNA. It would be useful to see how the updated results would have turned out if Ancestry had maintained its former algorithm while still using their expanded Reference Panel. For proper understanding it will be mandatory to closely read Ancestry’s white paper. But in order not to digress too much I will keep this section brief and not overly technical. One important consequence of Ancestry’s new algorithm seems to be the tendency to stick everything in as few as possible big regions rather than having things divided up into a dozen small percentages. Something which some customers seemed to take in with great delight as an expression of “diversity” and “exotic” lineage 😉 . But in fact such overly detailed breakdowns often also were confusing or misleading.

The new algorithm also accounts for the disappearance of most “Low Confidence” a.k.a. “Trace regions”. These latter regional scores were often mislabeled and obviously to be taken with a grain of salt. On the one hand this may be considered an improvement as Ancestry is now focusing on larger stretches of DNA, which should be more reliable and less likely to represent statistical noise. But from my experience with correct interpretation and proper follow-up research these minimal scores could sometimes still already be indicative of distinctive ancestors.

It has been said that this update seems good for people with low genetic diversity and good representation of their nationality within Ancestry’s Reference Panel. However for people with a more complex background, incl. recently mixed individuals, Ancestry’s new algorithm does not always perform as expected. This has been observed for example for people with known mixed northern & southern European background, whereby the northern European component tends to get overestimated. Over-simplification that works well to eliminate noise for someone who is predominantly from one ethnic group has the opposite effect for someone who is recently mixed or has more complex origins from several generations ago (see this link for insightful discussion). Likewise also for Africans of known mixed background Ancestry often does not get it right. As shown in table 3 this goes especially for people of mixed Nigerian, Senegalese or Ivorian/Ghanaian background. These three regions have already stood out before as having a worse prediction accuracy than in the previous version (see chart 1).

Within its white paper Ancestry specifically makes a distinction between prediction accuracy for so-called “single-origin individuals” as opposed to “synthetic individuals” with known mixed ethnicities. The implications for Afro-Diasporans could be even more far-reaching as after all almost by default Trans-Atlantic Afro-descendants will have intricately mixed origins from across West, Central and Southeast Africa in mostly unknown regional proportions. Generally speaking only historically documented slave trade patterns, African ethnonyms being recorded among enslaved people as well as cultural retention serving as ways to roughly verify any DNA results (see this link). But more so on a group level than for individuals! Therefore the previous algorithm might have been more suitable to deal with this complexity. While the current one might serve to underestimate or simplify the various regional origins of Afro-Diasporans. At least on their African side. From what I have seen their Asian & European admixture is now however much more in line with historical plausibility. Which seems to illustrate you cannot always have it both ways.

If you are discontent about this update let Ancestry know about it!

It is often advised not to take your DNA results too seriously because of all the imperfections and inherent limitations. And it is indeed always good to be well-informed and critical without being over-dismissive. However for myself and many other Afro-Diasporans the African breakdown provided by AncestryDNA represented a promising and valuable tool for learning more about our previously unknown regional lineage within Africa. As we generally do not have much to go by otherwise this is not something to take lightly! Which is why many people have been unsettled by the drastic and seemingly random changes in their AncestryDNA results. I will discuss some of their reactions in the last part of this blog series. Right now I would like to repeat that whenever you are asked for feedback by Ancestry make sure to let them know! When in agreement please also forward them this link:

Achieving improved ethnicity estimates is more difficult than it seems at first sight. I can imagine it often involves balancing opposed considerations and making tough calls. It is an ongoing challenge which Ancestry in their own words is dedicated to taking on. And in fact I do appreciate the efforts which have gone into this update. I have noted any improvements whenever I came across them. But generally speaking, in regards to the African breakdown, the outcomes have been very disappointing and frankly a setback!

What I find particularly frustrating is that the current issue of highly inflated “Benin/Togo”, “Cameroon, Congo, Southern Bantu” and “Mali” scores could have been prevented if only Ancestry had carried out their update in a more thoughtful manner. I have been blogging about the misleading country name labeling of especially “Benin/Togo” for several years already (see this blog post). Also from the start I have pointed out that the “Cameroon/Congo” region is poorly designed as it covers ancestral connections to both the Bight of Biafra and Central Africa. Instead of addressing these issues or at least attempting to achieve some improvement Ancestry has only made things worse with this update…

___________________________________________________________________________

Notes

1) It might be a different story for the European and Asian breakdowns. I have actually seen quite encouraging updated results in this regard. And generally speaking they could be an improvement indeed. Although there are also still some remaining issues. The non-African regional breakdowns are however not a topic of discussion in this blog post.

2) For example crucial statistical information to determine the predictive accuracy of each region is no longer provided as it used to be in the “genetic diversity” tabs (see end of this page for examples). Also I have not yet seen an equivalent of the chart below depicting the “Average ethnicity estimates for natives from each region”. It used to be available by way of this link. But this page has not been updated so far…

***

Source: Ancestry. Average ethnicity estimates for natives from each region, based on previous version (2013-2018).

***

3) Contrary to what is commonly assumed you do not need to sample entire populations to obtain informational value with wider implications. Naturally greater sample size does (usually) help matters. But if you randomly test a given population, and if your sample group is fairly representative of the whole population, you can make generalizations. Naturally methodology and the assumptions being made should be made explicit, but this is common scientific practice. See also:

This is an important lesson I learnt while performing my AncestryDNA survey: robust patterns (in line with historical plausibility) might already be discernible from a sample-size of around n=30. Which is actually often considered a general rule of thumb. Adding more results will indeed lead to greater finesse and more detailed statistics but the main outline might then already be established. Even more so when you are aware of any possible sampling bias or substructure and know how to account for it in your analysis.
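
To make this rule of thumb a bit more tangible, here is a minimal sketch of how the uncertainty around an average regional score shrinks as the sample size grows. It relies on the textbook standard error of a mean and a normal approximation, and it assumes random sampling. The standard deviation of 15 percentage points is a purely hypothetical figure chosen for illustration; it is not taken from my survey or from Ancestry's data, and the function names are made up for this example.

```python
import math

# Hypothetical spread of one regional score across individuals,
# in percentage points. This is an assumption, not survey data.
ASSUMED_SD = 15.0


def standard_error(sd: float, n: int) -> float:
    """Standard error of a sample mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)


def margin_of_error(sd: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error (normal approximation)."""
    return z * standard_error(sd, n)


# Compare a few sample sizes, including the n=30 rule of thumb.
for n in (10, 30, 100, 500):
    moe = margin_of_error(ASSUMED_SD, n)
    print(f"n={n:>3}: group average known to within about +/- {moe:.1f} points")
```

Under these assumed numbers the margin is roughly 5.4 points at n=30 and 2.9 points at n=100. Because the margin only shrinks with the square root of the sample size, quadrupling the sample merely halves it, which fits the idea that the main outline may already be visible at around n=30 while extra results mostly add finesse. None of this corrects for sampling bias or substructure, which still need to be accounted for separately.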

4) This spectacular increase in Malian samples (+153) certainly is to be commended in itself. It was in fact one of the main suggestions for improvement I blogged about in July (see this link). In order to prevent the currently inflated “Mali” scores it would have been preferable though if Ancestry had also augmented the sample size of their “Senegal” region. Which is still very low now (n=31). Like I suggested in July many Senegambian samples are to be obtained from the 1000 Genomes database (which was actually used by Ancestry in this update!) as well as the MalariaGEN database.

5) Given correct interpretation the distinction being made between “Cameroon/Congo” and “Southeastern Bantu” could be very useful for Afro-descendants as well as many Africans. This was demonstrated most clearly by the frequency of top-ranking scores for “Southeastern Bantu” for my Brazilian and Mexican survey participants, corroborating their strong ancestral ties with Angola (see this blog post).

6) I am making this assumption based on the observation of atypical 100% “Nigeria” scores for 4 Nigerian persons in my before & after update survey. As well as one single 100% “Northern Africa” score for a Moroccan. Such unexpected scores seem to be the result of including customer samples into Ancestry’s Reference Panel. Causing an overfitting or calculator effect. The ethnic backgrounds of the Nigerians scoring 100% “Nigeria” are quite diverse btw, but all hailing from southern Nigeria: Igbo (2x?), Yoruba, Urhobo (?).

7) You might wonder (like I did) why Ancestry did not use all of the available African samples in its Sorenson database right away in 2013. When they first provided their pioneering West African breakdown. However it seems that at the time there were still some issues to resolve about required consent for commercial purposes. Which perhaps may have caused the delay. See also these articles for more references:

8) Again I have to indulge in some speculation at this point. But it seems quite likely to me that Ancestry’s Malian samples were drawn from several ethnic groups and not just one or two. As southwestern Mali is very much a multi-ethnic region already let alone other parts of this large country (see this page). To repeat myself Ancestry has not disclosed the actual ethnic backgrounds of its 153 newly added Malian samples. Which is rather crucial because as I have argued before it is not only the number of additional samples which matters but also their relevancy and how they fit in the Reference Panel. Additional samples being a means to an end. But coherent regional scores in line with historical plausibility or even verifiable genealogy should remain the main goal!

Just to name one possibly problematic issue: the inclusion of Gur/Senoufo speaking samples from southern Mali could very well cause greater regional overlap with Mali’s neighbouring countries to the south & east. Possibly Dogon samples may already have a similar effect. While also any inclusion of Malian Fula samples from the Maasina area would be rather ill-advised as they have quite distinctive genetics, incl. a North African(-like) component. And in fact they are not unique to Mali at all as the Fula people are arguably the epitome of Africans migrating across the continent (see this map). In my previous survey findings I found however that despite great dispersion and also some degree of local intermingling many Fula people (incl. also the more hybrid Hausa-Fulani) still clearly preserve a distinctive genetic component tied to their presumed origins along the Senegal river. Which is why I find it quite lamentable that their formerly predominant “Senegal” scores have now been replaced by “Mali” ones. See also:

36 thoughts on “Did Ancestry kill their African breakdown? (part 2)”

  1. I remember when you did my great aunt’s DNA, which was 100% African, and she was from Southern Georgia. I can’t find the article you wrote about her. She passed away at 106yrs. She was a blessing to talk to and hear her stories of the past.

  2. I always enjoy your thoughtful and well-researched analyses of Ancestry.com endeavors. Why do you think Ancestry has not responded to your requests? The point you made about knowing the ethnicity of the groups, not just the regions makes perfect sense, since borders are man-made and subject to change and if you are comparing ethnicities, then one would compare ethnicities, not a region and an ethnicity. I had 13% Southeastern Bantu in the old estimate. Now that is combined with Cameroon Congo. That new region is nearly 3/5 the size of the African continent! It would have been useful to see the old algorithm used with the larger sample size, as you point out. Lastly, I have found it interesting to compare GEDmatch with Ancestry. I have used a few of their tests. They reported Khoi San and Mbuti Pygmy groups in my genetic makeup, as well as Omotic and North African strands. You mention these in your report. I also think the trace regions can be useful. Have you ever compared the GEDmatch results for people to the other large DNA company results? They do post their methods. The complex genetic makeup of Africans poses challenges. I hope that with advances in genetic testing and rigorous measurements, more can be discovered.

    • Thanks so much for your continued appreciation!

      I am not sure why Ancestry has not answered my queries about the ethnic backgrounds of their African samples. However as mentioned in that USA Today article they are not the only DNA testing company not to reveal such information. I cannot imagine this type of information being compromising in itself. On the contrary I think it will be greatly appreciated by Ancestry’s customers if they provide such details! It would be of tremendous help to arrive at correct interpretation of their African regional scores! Before the update I was quite content with the additional context given by Ancestry to their customers. There is always room for improvement of course but in comparison with other companies I felt they did a reasonably good job.

      With this update however I did notice that the level of transparency has decreased. Notably the genetic diversity tabs which have gone missing for each region. Also apparent in the whole manner in which this update has been dragging on since at least April but never was officially announced till much later …

      13% Southeastern Bantu is quite a considerable score for an African American btw! Have you ever found any African matches possibly associated with this part of your DNA? It is very lamentable that Ancestry has decided to make things more generic instead of specific. Although possibly any trace amount of “Eastern Africa” might still remain in some cases I’m guessing.

      I have actually once followed an inspirational survey done by somebody else about Gedmatch results for the Dodecad/Africa9 calculator (see this link). This is already a couple of years ago though. It was a survey held among African Americans, trying to establish the distribution of total African admixture rates as reported by 23andme but also focusing on other aspects. Therefore also featuring their Oracle results. These primary Oracle populations were predominantly Central African (Fang, Bamoun, Kongo etc.). Because intuitively this did not make historical sense to me this actually triggered my interest in pursuing my own AncestryDNA survey since 2013 which I found to be much more in line with genetically inherited individual variation as well as historical plausibility on a group level.

      Frankly I have since then not really taken much notice of Gedmatch and other third party websites because I found their ancestral categories not up to par with AncestryDNA (before the update). I do think you can get valuable insights from the various (and wide-ranging!) Gedmatch calculators, but only if you are able to make correct interpretations and are aware of their limitations. Seemingly due to the algorithm being applied or else because of a low resolution of their samples it appears that their predictions are more reflective of (very) ancient migrations. Rather than anything focusing on the last 500 or 1000 years or so. Which from a genealogical perspective, and in particular for the Afro-Diaspora would be most relevant.

      Going by other people’s reactions I also find Gedmatch to be highly confusing and potentially misleading because of the way their results are presented as seemingly very “precise” and “specific”. When in fact such a presumed accuracy cannot be attained with current DNA testing technology. AncestryDNA’s country name labeling may be misleading as well, but on a different scale I would say. Especially since they do mention the limitations of their “estimates” and also quite clearly illustrate the inevitable overlap with their regional maps.

      The labeling of ancestral categories is trickier than many people may realize. But I find it more reproachable when false hope is being generated of pinpointing a particular “tribe” based on the ethnic labeling of DNA scores which are based on mere genetic similarity with a given selection of samples and not actual genealogical descent!

      Although this particular update by AncestryDNA has been a major letdown (in regards to the African breakdown) I do share your hope that new advances will lead to more discoveries in the near future. I will expand on this topic in the last part of this blog series. So stay tuned 😉

      • A maternal aunt of mine has Southeast African Bantu as her largest (African) region in the previous version at a full 16%. I’ve always thought this might either be an error or just an interesting recombination, because I had a trace of it in my last version (2%). Unfortunately, try as I might, I have not gotten her interested enough to log back into her account and check her update. And she’s also not the type who’d let me have any control over that kind of thing.

        I’d never come across anyone in the previous version where SE African Bantu was their largest African region or a percentage that high, which is why I was thinking it was in error.

        • From my survey among African Americans it was indeed uncommon to see that region as top ranking however in some cases it did occur. In my latest count 6 times only among 515 results. It will be interesting to see how many of these former “SE Bantu” scores will be translated into “Eastern African” ones. So far I have only seen trace amounts of 1% for that new region among AA’s. But still already quite regularly.

  3. This explained A LOT about the new results; it has been illuminating. It looks like the biggest sin was the imbalance in the increase by region. If you greatly increase a region, while the others have small increases, stagnate, or even lose samples, of course you’re going to get more of people’s DNA (on average) pulled to this increased region(s). For whatever reason “Mali” dropped for me, my father and his mother, often fairly substantially. But that seems to be more uncommon than an increase. I’d be kind of interested in theories as to what samples were used in other regions that may have pulled us away from Mali. Or Senegal certainly didn’t increase either (mine disappeared altogether), so I’m unsure of which regions benefitted.

    Another thing I wish Ancestry would have done is that if they were going to get rid of trace regions – and really, for most folks, most of them were false positives I’ve found anecdotally – that they wouldn’t simply parcel out those regions to existing regions. I wish that, like other testing companies have done, if they were unsure of a particular segment they’d have stuck it in a macro level above the regional level (i.e. Broadly West African, Broadly Central African, etc.). And, BTW, this doesn’t just go for the African results. While my “England, Wales, & Northwestern Europe” results line up much more closely with what I know my European ancestry to be, I still think that it’s probably slightly overstated because they gave some trace regions over to it in the update.

    I think “broad” macro-regions can help with folks as genetically diverse/mixed as African Americans such as myself. Because as it currently stands, when you’re giving segments of DNA to regions you’re not really sure are actually from those regions while also giving segments of DNA to regions that you are really sure are from those regions, well, that’s apples to oranges. Not knowing something is not necessarily bad, and in fact the honesty would be more refreshing if they’d say this.

    Because of this update, the only African regions I trust, quite frankly, are those in low-double digit ranges or high-single digit ranges. Other than that, the oversampled “big three” regions are just “broad” macro regions as far as I’m concerned, and low single-digits might STILL just be false trace regions. So, that really leaves me with trusting my “Ivory Coast/Ghana” region (8%) and my “Mali” region (5%). I don’t have any African regions below that threshold, and the two above it (Benin/Togo & Cameroon, Congo and Southern Bantu) I imagine are either overstated and/or they overlap too much to be accurate. For instance, the vast majority of my found African matches are southern Nigerians. I have no “Nigeria” region. The two big regions have gobbled “Nigeria” up to nothing.

    • “I’d be kind of interested in theories as to what samples were used in other regions that may have pulled us away from Mali. Or Senegal certainly didn’t increase either (mine disappeared altogether), so I’m unsure of which regions benefitted.”

      Perhaps the additional samples that have been added to “Benin/Togo” may have been the culprit. As mentioned in this blog post I have a feeling that not all of them were from Gbe groups from the south but also from other ethno-linguistic groups in the north (see this map). From my before & after survey among Africans one striking outcome was also a sharp increase of “Benin/Togo” among Ivorians and even Liberians! Although really I’m just guessing as I think this update has created such a mess that sometimes you will not be able to disentangle any seemingly logical explanations…

      “Not knowing something is not necessarily bad, and in fact the honesty would be more refreshing if they’d say this.”

      Yes I agree! I’ve once heard it formulated as “no information is better than false information” or “don’t be more specific than your data supports”. Then again I also find it really appalling how the level of specificity from before the update such as the “Nigeria” or the “Southeastern Bantu” scores has now been totally wasted… Again it was far from 100% accurate then but at least on a group level it did fall in line more or less with historical plausibility for Afro-Diasporans and also (broadly speaking) with verifiable genealogy for Africans.

  4. In the old estimate for me, SE Bantu was the highest at 13% While my siblings also had SE Bantu, it was not nearly as high as mine. The second highest was Benin Togo at 11%. In looking at my DNA matches, I have encountered some higher SE Bantu scores based on the old estimate. In fact, the map at the top of the area originally leaned more towards the east and now leans more towards the west while remaining mostly the same at the bottom. Is this the result of major migrations both west and east across Africa and a reason for the problems with accuracy in the DNA tests taken by Africans and those of African ancestry? It’s interesting.

    • Very interesting! From my survey among African Americans it was indeed uncommon to see that region as top ranking however in some cases it did occur. In my latest count 6 times only among 515 results. The regional maps by Ancestry are meant as proxies. But migrations across Africa certainly may account for difficulties when wanting to make clear distinctions. What I find striking is that going by my survey results the new region “Eastern Africa” is also to be found in minor amounts among Southeastern Africans, such as Malagasy, Zimbabweans, Malawians and even South Africans, however this is not always clearly indicated on the map and the “also found in” details.

      • Given where the region is centered, is it really “striking,” though, that Southeastern Africans aren’t showing this as their largest region? It’s clearly centered much further north, and as you laid out weighed heavily toward the Luhya. You’d expect for the countries you mentioned – given that Southeast African Bantu was merged with it – “Cameroon, Congo, and Southern Bantu” to come up as the largest region.

        This is all to say that it’d be rather odd, in fact, if Eastern Africa showed up high in these countries.

        • From my survey findings minor but still considerable amounts (10-25%) of “Eastern Africa” are being reported for Southeastern Africans. Whereas I was expecting that (similar to Central Africans) the new region (or should I say macroregion 😉 ) of “Cameroon, Congo, South Bantu” would be clearly prevailing, like in between 90-100%. Only additional regional scores possibly coming from “Hunter-Gatherers” but actually also not expecting those scores for non-South Africans. So that is what I found striking!

          I do indeed think that “Eastern Africa” was intended to indicate Northeast African lineage. But in practice this doesn’t seem to hold up. Of course some regional overlap into neighbouring countries of Kenya was to be expected. And I would be okay with some occasional trace amounts of “Eastern Africa” appearing further south. But when going by a Northeast African interpretation one would not expect (or wish) that these scores for Southeast Africans would be seemingly consistent and at this level of low double-digit %’s. Frankly I think it’s quite misleading if you’re not aware of it.

  5. Justin said:

    “From what I have seen and read (including on this website), this is because the new samples they are using for Nigeria since the update are (or tend to be) less representative of southern Nigerian ethnic groups (like the Igbo, Yoruba and Edo) than the old ones were,”

    As mentioned in this blogpost I actually have no reason to assume that the samples being used for “Nigeria” have changed that much. There has been a modest addition, probably only customer samples (probably mostly Southern Nigerians). But otherwise it has remained the same. Presumably it is the much bigger increase for some of the other regions (“Benin/Togo”, “Cameroon, Congo and South Bantu” and “Mali”) which has caused a structural imbalance in Ancestry’s Reference Panel. Also add in the new algorithm and then that may explain the drastic decline in “Nigeria” amounts. Not only for southern Nigerians and their New World descendants btw but also for northern Nigerians (see part 1).

    “But also since the update, my Nigeria was reduced from 9% to 0, and my Benin/Togo increased from 5% to 15%, which is not really plausible, both given the similar effects of the change on the score of actual Nigerians (and others with formerly high Nigerian scores), and the fact that my family came from a fairly “Igbo-rich” region (Virginia and nearby parts of North Carolina), the new Benin/Togo score, in my case, seems very likely to reflect Nigerian ancestry (and is fairly close in amount to the Nigerian ancestry I scored before the update).”

    I agree that’s historically speaking very plausible. Have you had any success yet in finding African/Nigerian matches? They could very well corroborate this scenario.

    “my Ivory Coast/Ghana has been reduced from 6% to 3% (which seems very plausible since there generally were not that many Akan people enslaved in the US)”

    It is good to keep in mind though that the “Ivory Coast/Ghana” region also was indicative of Liberian and (southern) Sierra Leonean lineage for African Americans. Again looking into your African matches could give you more confirmation/insight. See also:

    https://tracingafricanroots.wordpress.com/2018/02/24/ivory-coast-ghana-also-describes-liberian-dna/

    “my 9% Senegal has been replaced by 11% Mali (So I am probably about 11% Malian Mande, likely Bambara and/or Mandinka. And I had read that their sample for Senegal was a Mande-speaking group anyway; a Wolof sample for Senegal would have been more representative and informative).”

    Yes I agree! As mentioned in this blogpost it is actually very likely that Bambara samples have been used for the “Mali” region. But possibly in combination with other samples from Mali with a different background (Dogon and perhaps even Malian Fula?)

    • Hello. I am not sure I understand how to go about finding African matches. In the conventional “dnamatches” section (a subcategory under the “dna” tab at the menu at the top of the web page) of the ancestrydna website, just about all my matches have been either African Americans or Ashkenazi Jews (which is what the other side of my family is), with a tiny minority of people of British Isles/Western European descent, since I’m also a very small percent British and Western European through my African American side. Is it not the aforementioned section of the ancestry website that you are referring to, but perhaps rather a different kind of results available through the ancestrydnahelper app (which I have seen referred to on this website) that can yield African matches? I would be very interested to see my African matches, I’m just not sure how to do it. (I’m a bit confused I’m afraid, and not at all tech savvy.)

    • Yes. I am fairly sure that my new 15% “Benin/Togo” score is actually entirely southern Nigerian (likely mostly Igbo) ancestry. In my pre-update results, I had received not only 9% Nigeria, but also 5% Benin/Togo (which was likely to represent southern Nigerian ancestry as well, especially given my family’s origin, as had been explained in various places on this site by you, and by me in my original post which you have excerpted above). Thus I had inferred that my pre-update results indicated that I was about 12-15% Nigerian (likely Southeastern Nigeria) by combining the 9% “Nigeria” and the 5% from my “Benin/Togo” result (which was also likely to be mislabeled southern Nigerian ancestry, since very few actual “Benin/Togo” Gbe-speaking people were brought as slaves to the US, and southern Nigerians like Yorubas/Edos/Igbos/etc also scored “Benin/Togo” pre-update). And this inferred 12-15% fits with/is about the same as the new 15% “Benin/Togo” score of the updated version, suggesting that the new 15% Benin/Togo result (as I mentioned) all represents mislabeled southern Nigerian ancestry. That, as well as the fact that my results from 23andme (I sent my dna to them as well) have also yielded almost precisely the same percentage at 14% Nigerian!

      I would be very interested to see my African matches (if any; I certainly hope there are some). I expect that many would be Southeastern Nigerians, likely with some central Africans as well (Bakongos perhaps), and possibly some from “Upper Guinea”, though I am now not sure whether to expect more of Mande or West Atlantic ethnicity (as I now suspect my Malian score may have been inflated due to Ancestry’s oversampling of Malians relative to Senegalese). And it would be interesting to know how many, if any, of their Senegalese samples were even West Atlantic at all as opposed to people from Mande-speaking ethnic groups living in Senegal such as Mandinka or Soninke/Djula (a pretty unrepresentative part of the Senegalese population I would say, and if used, would represent a blunder as well). However, even Mande people living in Senegal (assuming they had been there for a very long time) might be expected to have some “West Atlantic” admixture (from peoples like the Wolof, Fula or Toucouleur) and thus serve as a very rough proxy or indicator for possible West Atlantic ancestry to be differentiated/contrasted from the likely more un-admixed Mande-speaking ancestry represented by the “Mali” component.

    • “It is good to keep in mind though that the “Ivory Coast/Ghana” region also was indicative of Liberian and (southern) Sierra Leonean lineage for African Americans.”

      I see (I had forgotten that). I suppose that could be true in my case; 23andme did give me more than 6% in a component they call “Coastal West African” that includes Sierra Leone/Liberia and the Akan regions (of Ivory Coast/Ghana). And on DNA.Land I score about 6% “Mende/Akan”. My “Senegal River Valley” on DNA.Land is about 8%, almost the same as it was in my Ancestrydna results pre-update. DNA.Land though lacks a Mali sample. And unfortunately, its “Senegal River Valley” sample is somewhat misleadingly named, and is actually, as their website says, “Mandenka in Senegal and Gambian in Western Gambia”. The Senegal River valley is the heartland of the Wolof, though the Mande-speaking people of their sample still might have Wolof/West Atlantic admixture, especially, I suspect, those of Western Gambia near the Atlantic coast.

      Also, if a small part of my Ancestrydna 17% Cameroon/Congo score is really S.E Nigerian, I suspect that my true S.E Nigerian (likely Igbo) percentage could be something like 16-20%, perhaps leaving me at about 13-16% genuine Cameroon/Congo (most likely really Congo/Angola); I think Congo/Angolan is still a pretty significant proportion of my African ancestry given its general commonness in the African diaspora (and in the regions my family lived, and cultural/anthropological evidence from the area).

      At any rate, it seems pretty clear that my largest African components are: Southeast Nigerian, Central African Bantu, and Upper Guinea (though what region of Upper Guinea predominantly is somewhat less clear, Senegal at the moment seemingly most likely).

      I will describe my DNA match results (using DNAGEDcom) soon, hopefully on the relevant page (the link you gave me to your tutorial), and that I hope should give me a better sense of things.

  6. Fonte, you were wondering about whether or not the old SE Bantu would in the new estimate become East African. You commented that by East African, they actually meant Northeast African. My 13% Bantu, my highest African score, did not shift to East African. It just became part of the large Cameroon/Congo/Southern Bantu category.

    • Thanks for letting me know! I have a feeling this will also be the case for most other Afro-Diasporans. As this so-called “Southeastern Bantu” region was in fact also (partially) indicative of southwestern Bantu lineage (from Angola, Congo). Having a systematic look into your DNA matches might tell you more.

      • While I suspect we may never know their samples, and while I suspect a lot of Angolans are being “lost,” from my own personal experience – and your own data seems to show this – the old Southeast African Bantu was even more heavily weighted toward those of Zambian and Zimbabwean descent than Angolans. It seems this region really was heavily weighted toward the east coast of Africa.

        Angola is definitely underrepresented as it relates to those who’ve taken the test, though I’d imagine by and large that they’d match heavily with “Congo” since the entire northwest of the country is basically an extension of the KiKongo speaking folks in the two Congos.


        • My Angolan sample size was regrettably very small and also mostly consisted of northern Angolans, who indeed showed up as overwhelmingly “Cameroon/Congo” as they happen to be Bakongo themselves! Given that I highly suspect some Namibian samples were used for the former “Southeastern Bantu” region, I’m guessing that this region would have appeared in greater amounts for central and southern Angolans. But I never got a chance to see their results (prior to the update).

          But either way, given the results of Brazilians and Mexicans whose top-ranking African region was very often “Southeastern Bantu”, I was even more inclined to think “Southeastern Bantu” could often be indicating southwestern Bantu for other Afro-Diasporans as well (not exclusively of course). Historically speaking this also seems most plausible.


  7. Oh, so I see now that ancestry.com at least did add new samples for their current categories, though it’s not said what ethnic groups those samples are from, and that’s only going to do so much without also having new populations.

    Obviously this was an improvement for European ancestry (and I am glad to see that they did, after five years, come out with something for East Asian). For example, they got rid of my dad’s “14% Iberian Peninsula”, which was not listed as a “trace” aka “low confidence region” and which I had been curious about since 2014, when we did ancestry.com’s test. We are regular African Americans, and my dad has no family history in South America or the Caribbean, or even in Louisiana, where there could possibly have been at least a white French ancestor who would have fit the “Iberian.”
    But I’m still upset that this was not an improvement for African ancestry. I don’t get why ancestry.com couldn’t add the international genome populations. If Myheritage was able to use them, then why couldn’t ancestry.com? I’m guessing maybe they just didn’t want to.

    I was also under the impression that this update would get rid of potentially noisy results and be more definite on whether or not very small single-digit results are genuine ancestry and not “noise.”
    If my granddad really does have Native American ancestry, I wonder how it could be “Andean”, which is in South America. That kind of result has been seen on some other tests but not all.
    Tribecode (now called “Teloyears Advanced Ancestry”) and Living DNA did not report any Native American for my granddad. Neither did a sample he did for Myheritage (after Myheritage updated).
    But he gets around “1% Native American” on the Gedmatch admixture calculators, and when the “Native American” is specified, it is always something like “South American”, as with some (but not all) raw data uploads for Myheritage (“Indigenous Amazonian”). He also has a “<2% South American” on Family Tree DNA’s “myorigins 2.0” (from a 23andme V3 upload).
    The “Turkey and the Caucasus NEW 1%” is also similar to the “<1 Central Asia” on FTDNA myorigins 2.0.

    I also don’t know what the deal is with my dad’s “Native American—North, Central, South NEW 1%” and “Philippines 1%”. Dad also has “North and Central America <1%” on FTDNA myorigins 2.0 (along with other stuff like “South Central Asia <1%”, “Northeast Asia <1%” and “Southeast Asia <1%”).

    Do you think these might still be artifacts because ancestry.com (and really all the companies) still doesn’t have adequate African sampling?


    • I have no way of verifying, but they could indeed still be artifacts. However, I’m doubtful it would be related to any misreading of African DNA. Some diluted Native American DNA doesn’t seem that odd to me for an African American, even when misplaced within the Americas. The 1% “Philippines” seems to be a proxy for diluted Southeast Asian DNA, which possibly might be indicating a Malagasy line.


      • Yeah, these tiny non-European results have had me scratching my head for years now. I’ve wondered why it is that Caribbean Hispanics like Dominicans can have Indigenous American ancestry from the 1500’s or 1600’s yet today still show like 3-8% “Native American”, while African Americans consistently show less than 2%.
        Dr. Doug McDonald told me five years ago that he’s sure it’s definitely legitimate ancestry (though he couldn’t explain why they’re so low for African Americans while Caribbean Hispanics, who would have had their most recent unmixed Indigenous ancestors from well over 300 years ago, get so much higher percentages) because Africans from Africa don’t get those results.
        That had me thinking, maybe wrongly, that Africans like say Yoruba Nigerians are pretty much fully Yoruba Nigerians and would fit with the reference population that is available (which of course is Yoruba) whereas African Americans are such a blend of several different West African ethnicities and maybe some of those African ethnic groups haven’t been covered yet, if that makes sense.

        BTW while Tribecode and Living DNA didn’t report “Native American” for my granddad, Tribecode did report “2.16% South Asian” on their “standard” view (similar to 23andme’s “Speculative”), while my grandma has “1.20% South Asian Kalash”.
        Living DNA gave granddad “1.9% Asia East”, my dad (who hasn’t done Tribecode) “1.7% Asia South”, and my grandma “1.9% Asia South”.

        I’d like to say that I not only have European admixture (regardless of how it got there, it’s still part of us) in my family tree, but also East Asian, South Asian and Native American (and South American Amazonian and Andean Indigenous), but this has been really confusing over the years.


        • I’d say the reason Hispanic Caribbeans have preserved more of their Native American lineage is the more widespread initial intermingling with Native Americans, which led to a greater dispersion within their general genepool from the start. Taking the DR as an example, if practically everyone has about 5-10% Amerindian DNA then that proportion will tend to stabilize among the population, because it is passed on through all lines.

          For African Americans I imagine there was much less opportunity for widespread intermingling with Native Americans because of the way Native Americans were being pushed westwards. Also, English colonizers, who often arrived with their wives, tended to have fewer Native American concubines than the Spaniards did. For the Hispanic Caribbean I think this was rather crucial, as their mestizo offspring in the 1500’s would continue to intermingle with Africans and mixed-race Dominicans in the 1600’s and onwards.

          Personally I think that even at trace level for most AA’s any reported Native American DNA will tend to be genuine. But I’m guessing because it is often only inherited through one particular family line for many people it will tend to dilute each generation instead of being reinforced by additional lines.
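
          As a rough illustration of the dilution described above, here is a minimal sketch in Python. It assumes a deliberately simplified model in which the expected autosomal share from a single ancestor halves with every generation; real inheritance varies around that expectation because of recombination, and the sketch does not model the population-wide stabilization mentioned for the Dominican case. The function names are hypothetical and only serve this example.

```python
def expected_share(generations_back: int) -> float:
    """Expected autosomal share from ONE ancestor n generations back.

    Assumption: a plain halving model (1/2 per generation). Actual
    inherited amounts scatter around this value due to recombination.
    """
    if generations_back < 0:
        raise ValueError("generations_back must be non-negative")
    return 0.5 ** generations_back


def generations_for_share(share: float) -> int:
    """Smallest number of generations at which the expected share
    from a single ancestor drops to `share` or below."""
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    n = 0
    while expected_share(n) > share:
        n += 1
    return n


if __name__ == "__main__":
    # Print the expected share for a single ancestral line, generation by generation.
    for n in range(1, 9):
        print(f"{n} generations back: {expected_share(n) * 100:.2f}%")
```

          Under this simplified model a single fully Native American ancestor would be expected to contribute roughly 1% or less once they are seven or more generations back, which is one way to read trace-level scores that come through only one family line.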


          • Yes, you are understanding the history of this correctly. Hispanic folks of predominantly African ancestry are way more likely to have incorporated more of the native population than people in what would become the United States, just because the “Europeans” were already more mixed within a generation or two in the islands. There was a big cultural difference between how the Spanish and Portuguese colonized and how the English did. In fact, the former were often encouraged to mix, not just by the government but also by the Catholic church. So there is really no mystery about how Caribbean folks would have more native ancestry.

            And, you’re right about African Americans in the U.S. Though we often have a false belief about just how much Native American ancestry we have, we do, indeed, on average have SOME native ancestry.


            • For better understanding of the Dominican context this is a great source: Guitar, L. (1998). Cultural Genesis: Relationships among Indians, Africans, and Spaniards in rural Hispaniola, first half of the sixteenth century.

              As I have blogged about previously, I believe that this early formation of a mixed-race nucleus of Dominicans ensured not only the preservation of their Amerindian lineage but also that of their early Upper Guinean lineage from the 1500’s. This formerly showed up in pronounced “Senegal” scores, often ranking in first place in their African breakdown (see this chart). Regrettably this Upper Guinean founding effect for Dominicans as well as other Hispanics is now much less clear-cut because of the way “Mali” has become inflated and the country labeling is less appropriate. See also:

              https://tracingafricanroots.wordpress.com/2015/06/14/documented-african-roots-of-dominicans/


            • “And, you’re right about African Americans in the U.S. Though we often have a false belief about just how much Native American ancestry we have, we do, indeed, on average have SOME native ancestry.”

              I think the misconceptions around Native American lineage within American society generally speaking, but also specifically for African Americans, are a very interesting phenomenon. I would love to read some in-depth research about it which also takes into account the recent surge in personal DNA testing.

              I think very often the belief of having one (full-blooded) Native American great-grandparent or so was indeed mistaken. However, going back more generations into the early 1800’s and 1700’s, the opportunity for intermingling with Native Americans would have been greater. And the generally diluted scores within the 0-1% range do seem to indicate that this time period might be most relevant when wanting to trace any Native American ancestor. In individual cases of course more recent lineage might also apply.

              I’m not sure if I will ever blog about it (as I intended) but from my survey of 200 African Americans these were the statistics I had calculated. Due to misreading of Native American DNA the “Asia Central” and “Asia East” scores are arguably to be added. From what I’ve seen such potentially misleading overlap between Native American and Asian regions has become much less with this update. So that should be counted as an improvement as well I guess.


              • I’m still not sure when to take it as definite ancestry when different tests are reporting different trace results (“Native American” on some analyses, “East Asian” on another, yet “South Asian” on another). As for the Native American possibility, my granddad was born in 1931, so I would imagine that if he really does have NA ancestry it might be from sometime in the 1600’s or very early 1700’s at the latest.
                The same goes for the “Turkey and the Caucasus NEW 1%” my granddad also got on the “updated” ancestry.com.

                Forgot to mention that myheritage DNA, on all my dad’s raw data uploads and his sample, assigned a 1% “Nepali” result.
                Like I said, it’s just been so weird.
                And I’m not sure why I’m seeing “North African” on some analyses, either. I uploaded my granddad’s raw data 11 times (6 23andme and 5 ancestry.com) in addition to his sample, and except for one 23andme upload all of them had a North African result of 1% or less.
                My dad’s sample has “8.1%” for “North African”.
                Myheritage assigned me a “7.8% North African” result from a V5 23andme upload. Just odd.

                Have you started looking at the new v5 23andme results yet? I didn’t save my previous results. The new ones are:
                Sub-Saharan African 74.7%
                Nigerian 27.4%
                Coastal West African 14.4%
                Senegambian & Guinean 8.4%
                Congolese 8.2%
                Southern East African 0.4%
                African Hunter-Gatherer 0.1%
                Sudanese 0.1%
                Broadly West African 11.3%
                Broadly Congolese & Southern East African 1.0%
                Broadly Northern East African 0.1%
                Broadly Sub-Saharan African 3.1%

                European 23.2%
                British & Irish 8.2%
                French & German 5.8%
                Scandinavian 0.6%
                Iberian 0.4%
                Broadly Northwestern European 5.2%
                Broadly Southern European 0.6%
                Broadly European 2.5%

                East Asian & Native American 1.3%
                Filipino & Austronesian 0.4%
                Native American 0.4%
                Indonesian, Thai, Khmer & Myanma 0.1%
                Broadly Chinese & Southeast Asian 0.3%
                Broadly Northern Asian & Native American 0.1%
                Broadly East Asian & Native American 0.1%

                Unassigned 0.9%


                • When comparing DNA test results from different companies/websites it is crucial to keep in mind they will not always be perfectly inter-comparable due to differences in algorithm, different databases of reference samples and different labeling rationale of categories. So for example despite some overlap an “Africa North” score on Ancestry will be measuring something else than the equivalent on 23andme or elsewhere.

                  I personally have only tested with Ancestry and 23andme because frankly I have not been favourably impressed by the other companies, having read reviews and seen how people with verifiable backgrounds are being described by MyHeritage, FTDNA, DNA Tribes, Nat Geo, etc. This also goes for third-party analysis offered on Gedmatch, DNAland and such: again too confusing/misleading even for people with verifiable backgrounds, and too little added value.

                  I have not been updated yet on 23andme but from what I’ve seen it’s quite promising. I will definitely blog about it eventually.


  8. They definitely used Beninese Yoruba samples. There’s no other way my 42 percent Nigerian (which I trace to the Yoruba tribe, or at least the majority of it) went down to 3 percent and my Benin/Togo percent went from 11 percent to 50! Craziness.


    • Confounding border-crossing ethnicity with nationality would truly be a major blunder on Ancestry’s part! But if this indeed happened I suppose it’s not too late yet to rectify. One would hope that Ancestry aims to be a learning organization… They should just be transparent about the ethnic backgrounds of their samples. And if they do not have the in-house knowledge to make sound judgements they should be more open to customer feedback pointing them in the right direction!


      • I honestly think they should invest time and money and sample 3 or 4 tribes from a couple of countries. If you look at the map for Benin & Togo, you can see it is mostly focused in East Nigeria! I think they should label the regions by tribes rather than countries. Even though it would seem like a stretch, it would be a major win for Ancestry and they would definitely attract more customers. But yes, I totally agree with you. Your blog has been so useful; I’ve been here since 2015.

