Representing Genetic Diversity with Pangenomes by Xian Chang

Xian Chang

UC Santa Cruz | Ji Ing Soong Endowment Fund

The human reference genome is one of the most important and widely used resources in biological research, but it is essentially just one person’s genome, and genomic analyses that rely on it can be inaccurate for people who are genetically dissimilar to that person. A “pangenome” reference better represents the genetic diversity of the human population and improves genomic analyses by mitigating the bias inherent in using a single reference genome.


The human reference genome is one of the most widely used resources in biological research. It is the basis for studying the functional biology of the human genome, genetic variation and its implications in disease, evolutionary relationships between humans and other species, and countless other basic biological and clinical questions. As a “reference”, it serves as a standard scaffold against which new genomic data are compared. For a reference to be effective, it must be similar enough to the sample that the two can be compared and the differences between them identified and interpreted. However, the human reference genome is a “linear” genome that represents just one copy of a genome and carries no information about genetic diversity. Because it lacks this diversity, the reference genome can differ significantly from an individual’s genome, and analyses that rely on it can be biased so that new samples appear more similar to the reference than they are. In its current form, the human reference genome is not representative of the human population, and analyses that use it can be inaccurate for people who are dissimilar to the reference. One emerging alternative to a linear reference genome is a “pangenome” reference that represents a collection of genomic sequences. A pangenome incorporates information about genetic variants and can therefore better represent the genetic makeup of a population. With an improved reference, we can reduce biases in genetic analyses and make genomic research and future genetic testing more accurate and useful for diverse populations. The computational tools that my lab has developed can improve genomic analyses over the current standard and require less computation time, despite the added complexity of using a larger pangenome reference.

