Connecting genes to diseases through proteins
An international research team led by scientists from the Medical Research Council (MRC) Epidemiology Unit at the University of Cambridge has uncovered hundreds of connections between different human diseases through their shared origin in our genome, challenging the categorisation of diseases by organ, symptoms, or clinical speciality.
The study, published today in Science, generated data on thousands of proteins circulating in our blood and combined these with genetic data to produce a map showing how genetic differences that affect these proteins link both related and seemingly diverse diseases.
Proteins are essential functional units of the human body; they are composed of amino acids and encoded by our genes. Malfunctioning proteins cause diseases across most medical specialties and organ systems, and proteins are also the most common targets of existing drugs.
The findings help explain why seemingly unrelated symptoms can occur at the same time in patients, and suggest that we should reconsider how seemingly diverse diseases can be caused by the same underlying protein or mechanism. Where a protein is a drug target, this information can point to new strategies for treating a variety of conditions and for minimizing adverse effects.
Using blood samples from more than 10,000 participants in the Fenland study, the team led by senior author Dr. Claudia Langenberg, of the MRC Epidemiology Unit and the Berlin Institute of Health at Charité Universitätsmedizin, Germany, demonstrated that natural variation in 2,500 regions of the human genome is robustly associated with differences in the abundance or function of 5,000 proteins circulating in the blood.
This approach addresses an important bottleneck in translating basic science into clinically actionable insights. Large-scale studies of the human genome have identified many thousands of variants in our DNA sequence that are associated with disease, but the underlying mechanisms often remain poorly understood because it is uncertain which genes those variants act through. By linking such disease-related DNA variants to the abundance or function of an encoded protein, the team produced strong evidence for which genes are involved and identified novel mechanisms through which proteins mediate the path from genetic risk to disease onset.
For example, multiple genome-wide association studies (GWAS) have linked a region of the human genome known as the KAT8 locus with Alzheimer's disease, but failed to identify which gene in this region was involved. By combining data on both proteins and genes, the team identified a gene in the KAT8 region named PRSS8, which codes for the protein prostasin, as a novel candidate gene for Alzheimer's disease. Similarly, they identified RSPO3 as a novel risk gene for endometrial cancer.
The authors used these new insights to systematically test which of these protein-encoding genes affected a wide range of diseases. They discovered more than 1,800 examples in which more than one disease was driven by variation in a single gene and its protein products. What emerged was a network-like structure of human diseases, in which many genes connected both related and seemingly diverse conditions across different tissues. Where one gene links several conditions in this way, this provides strong evidence that the encoded protein is their common origin, and points to new potential treatment strategies.
Dr. Langenberg explained: "An extreme example we discovered of how one protein can be connected to several diseases is the protein Fibulin-3, which we connected to 37 conditions, including hypermobility, hernias, varicose veins, and a lower risk of carpal tunnel syndrome. A likely explanation is atypical formation of elastic fibers covering our organs and joints, leading to differences in elasticity of soft and connective tissues. This is also in line with features that others have observed in mice in which this gene was deleted."
Dr. Maik Pietzner, a researcher at the MRC Epidemiology Unit and co-lead author of the study, added: "Using our genome as the basis was key to the success of this study. Because we know that most of the proteins detected in blood originate in cells from other tissues, we integrated different biological layers, like gene expression, to enable us to trace proteins back to disease-relevant tissues. For example, we found that higher activity of the enzyme bile salt sulfotransferase was associated with an increased risk of gallstones through a liver-specific mechanism. We linked around 900 proteins to their tissue of origin in this way."
In collaboration with colleagues at the Helmholtz Centre in Munich, Germany, the authors have developed a bespoke web application (www.omicscience.org) to make the results immediately available and to allow researchers worldwide to explore in depth the genes, proteins and diseases that interest them most.
Dr. Eleanor Wheeler, also at the MRC Epidemiology Unit and co-lead author of the study, concluded: "For most genomic regions associated with disease risk, the underlying causal gene and mechanism are not known. Our work demonstrates the distinctive value of proteins to zoom in on the causal gene for a disease and helps us to understand the mechanism through which genetic variation can cause disease. We envisage that the large amount of information we are sharing with the scientific community will help ongoing and emerging efforts to connect genes to diseases more directly via the encoded protein, thus facilitating accelerated identification of drug targets."