
Published in today's edition of Nature, the research, led by Dr Monkol Lek of the University of Sydney and Dr Daniel MacArthur of the Broad Institute of MIT and Harvard, reveals patterns of genetic variation worldwide by sequencing the exomes of 60,706 individuals of diverse geographic ancestries, including European, African, South Asian, East Asian and Latino populations.

Using a massive exome sequencing database made available through the Exome Aggregation Consortium (ExAC), the international research team identified around 7.4 million genetic variants, providing unprecedented resolution into low-frequency protein-coding variants in human populations. ExAC catalogues exome data from 60,706 unrelated individuals sequenced as part of numerous disease-specific and population genetic studies. The ExAC website has been visited over 5.2 million times, and currently receives about 70,000 page views per week.

In a sub-analysis of the new Nature paper, the authors analysed 192 variants reported as pathogenic in other studies and found that only nine had sufficient data to support a strong disease association.

"Large-scale reference datasets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes," says Dr Lek.

"This analysis reveals global patterns of genetic variation, providing resolution that hasn't been possible with smaller datasets."

Exome sequencing is a method for sequencing the subset of the human genome that encodes proteins, made up of regions known as exons. Humans have about 180,000 exons, constituting about one per cent of the human genome, or approximately 30 million base pairs. A base pair is a unit of two nucleotide bases bound to each other; base pairs form the building blocks of the DNA double helix. The genome contains about 3.2 billion nucleotides and about 23,500 genes.
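As a rough check on the figures above, the short Python sketch below recomputes the exome's share of the genome and the average exon length. It is only illustrative: the constant names are made up for this example, all inputs are the approximate values quoted in the article, and the genome size is treated as 3.2 billion base pairs.

```python
# Approximate figures quoted in the article; names are illustrative only.
EXON_COUNT = 180_000               # number of exons in the human genome
EXOME_BASE_PAIRS = 30_000_000      # total length of all exons
GENOME_BASE_PAIRS = 3_200_000_000  # assumption: genome size read as base pairs

# Share of the genome covered by the exome, as a percentage.
exome_percent = EXOME_BASE_PAIRS / GENOME_BASE_PAIRS * 100

# Average exon length implied by the two exome figures.
mean_exon_length = EXOME_BASE_PAIRS / EXON_COUNT

print(f"Exome share of genome: {exome_percent:.2f}%")
print(f"Mean exon length: {mean_exon_length:.0f} base pairs")
```

With these inputs the exome comes out at a little under one per cent of the genome, consistent with the "about one per cent" quoted above, and the implied average exon is roughly 167 base pairs long.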

Three-quarters of the known genetic disease-causing variants are located in the protein-coding exome. Given the cost and technical challenges of analysing all genomic sequence data, researchers are focusing much of their research on the exome.

Interpreting findings is a significant challenge at the heart of sequencing. Each exome contains about 13,500 single nucleotide variants that change an amino acid, and a large number of these are expected to be functional variants. The daunting task for medical researchers is to distinguish variants that are pathogenic from those that have little or no detectable clinical effect.

More information: Nature, DOI: 10.1038/nature19057
Genetics in Medicine, DOI: 10.1038/gim.2016.90
Nature Genetics, DOI: 10.1038/ng.3638

Journal information: Nature