The accurate representation of ancestry is essential for interpreting, accessing, and integrating human genomics data. For this purpose, the concept of a human reference genome allows ethnicity to be estimated against a type population from a geographical location, which yields a good approximation of the DNA of any single individual.

However, the representation of ancestry differs among test providers. There are no established guidelines for representing ancestry information, so each company relies on its own proprietary database of reference populations as the basis for its ancestry findings.

Of course, all of this depends on each testing company's current reference population samples. It can be argued that the reference population databases of major test providers such as 23andMe, AncestryDNA, FTDNA, and MyHeritage all fall short of achieving genomic diversity.

For example, people of European ancestry make up less than 25% of the global population, yet they represent the majority of the research participants at these labs, which are largely based in the U.S. These labs have extensive Eurocentric databases but relatively small and limited Asian reference populations for estimating ancestry. However, as Figure 2 shows, these labs are continuously refining their ancestry estimates by expanding their reference population databases.

Figure 2
Size of DNA Labs' Reference Populations


The people of Asia account for ~60% of the world's population. Within this group, the DNA testing lab XCode serves people of South Asian heritage, who make up 23% of the world's population. This includes the 1.75 billion people of Afghanistan, Bangladesh, Bhutan, India, Maldives, Nepal, Pakistan, and Sri Lanka.

Additionally, two DNA labs located in Asia, ZuyanDNA and WeGene, address the ancestry of the 1.6 billion people, 38% of the world's population, who live in East Asia (China, Mongolia, North Korea, South Korea, Japan, Hong Kong, Taiwan, Macau), and of the 9% of the world's population living in Southeast Asia (Brunei, Cambodia, Southern China, Indonesia, Laos, Malaysia, Myanmar, Philippines, Singapore, Thailand, Timor Leste, Vietnam, Christmas Island, Cocos Islands).

The WeGene lab has an extensive Asian reference population database and is "one of the rare companies that specialize in the genetic exploration of Asian heritage." WeGene's analytical algorithm uses machine learning and is based on ADMIXTURE, an ancestry analysis tool developed at the University of California, Los Angeles (UCLA). The algorithm compares the tested person's autosomal DNA with that of the reference populations in its database and quantifies the similarities.
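WeGene's own algorithm is proprietary, so what follows is only a minimal sketch of the general idea behind ADMIXTURE-style estimation, assuming the allele frequencies of each reference population are already known and held fixed. For one tested person, it searches for the ancestry proportions q that make their genotypes most likely, where the expected allele frequency at each SNP is the q-weighted mix of the reference frequencies. It uses a simple expectation-maximization (EM) update rather than the faster optimizer ADMIXTURE itself uses; the function name and the example frequencies are hypothetical.

```python
import math
from typing import List, Sequence


def estimate_ancestry_proportions(
    genotypes: Sequence[int],
    ref_freqs: Sequence[Sequence[float]],
    n_iter: int = 500,
    tol: float = 1e-8,
) -> List[float]:
    """Hypothetical sketch: estimate admixture proportions for one person.

    genotypes[j]:    count (0, 1 or 2) of the chosen allele at SNP j
    ref_freqs[k][j]: frequency of that allele in reference population k
    Returns q, K non-negative proportions that sum to 1.
    Assumes reference frequencies are fixed (not re-estimated).
    """
    k_pops = len(ref_freqs)
    m_snps = len(genotypes)
    eps = 1e-6
    # Clamp frequencies away from 0 and 1 so the logarithms stay finite.
    freqs = [[min(max(f, eps), 1 - eps) for f in row] for row in ref_freqs]
    q = [1.0 / k_pops] * k_pops  # start from equal proportions
    prev_ll = -math.inf

    for _ in range(n_iter):
        new_q = [0.0] * k_pops
        ll = 0.0
        for j in range(m_snps):
            g = genotypes[j]
            # Expected allele frequency for this person at SNP j.
            p = sum(q[k] * freqs[k][j] for k in range(k_pops))
            ll += g * math.log(p) + (2 - g) * math.log(1 - p)
            for k in range(k_pops):
                # E-step: share of each allele copy attributed to population k.
                new_q[k] += g * q[k] * freqs[k][j] / p
                new_q[k] += (2 - g) * q[k] * (1 - freqs[k][j]) / (1 - p)
        # M-step: every SNP contributes exactly two allele copies.
        q = [x / (2 * m_snps) for x in new_q]
        if ll - prev_ll < tol:  # stop once the likelihood stops improving
            break
        prev_ll = ll
    return q


if __name__ == "__main__":
    # Two hypothetical reference populations (rows) and four SNPs (columns).
    ref = [[0.9, 0.8, 0.1, 0.7],
           [0.1, 0.2, 0.9, 0.3]]
    print(estimate_ancestry_proportions([2, 2, 0, 2], ref))
```

Because each update only redistributes a person's allele copies among the reference populations, q stays non-negative and sums to 1 without extra constraints. Real tools apply the same kind of model to many thousands of SNPs and far larger reference panels, which is why the size and diversity of a lab's reference database matter so much.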

In any case, genomics studies need a standardized framework for representing ancestry data, one that records the ancestry of each sample in two forms:

1. A detailed description

2. An ancestry category from a controlled list
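The two-part record described above can be sketched as a small data structure: a free-text description alongside a category that must come from a fixed list. This is a minimal illustration, not an established standard; the class names and the category values below are hypothetical placeholders (loosely echoing the regions mentioned in this article), and a real framework would define its own controlled vocabulary.

```python
from dataclasses import dataclass
from enum import Enum


class AncestryCategory(Enum):
    """Hypothetical controlled list; a real framework would define its own."""
    EUROPEAN = "European"
    SOUTH_ASIAN = "South Asian"
    EAST_ASIAN = "East Asian"
    SOUTHEAST_ASIAN = "Southeast Asian"
    OTHER = "Other"


@dataclass(frozen=True)
class SampleAncestry:
    """One sample's ancestry recorded in both forms."""
    sample_id: str
    description: str             # form 1: detailed free-text description
    category: AncestryCategory   # form 2: value from the controlled list

    def __post_init__(self) -> None:
        if not self.description.strip():
            raise ValueError("description must not be empty")
        if not isinstance(self.category, AncestryCategory):
            raise TypeError("category must come from AncestryCategory")


def parse_category(label: str) -> AncestryCategory:
    """Map a text label onto the controlled list, rejecting unknown values."""
    for member in AncestryCategory:
        if member.value.lower() == label.strip().lower():
            return member
    raise ValueError(f"{label!r} is not in the controlled list")
```

Keeping the category in an enumeration means that two studies using the same list can be compared directly, while the free-text description preserves detail the list cannot capture.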

For example, Table 1 organizes accurate, informative, and comprehensive information about the ancestry or genealogy of each distinct sample into a standard framework.

Table 1
World Standard Framework
Nebula Whole Genome Sequencing



Email me at: holeman1@gmail.com