A controversy at last: most of our DNA is junk, no it isn't, yes it is. Actually, I think it is – up to 90% really is junk.
Last year The Conversation published an article with an exciting headline:
Human Genome 2.0: ENCODE project debunks "junk" DNA.
ENCODE, in this case, referred to the Encyclopedia of DNA Elements, a large international research project that undertook new mapping of the genome in terms of features associated with gene regulation.
But one odd thing about the article was that – apart from the title and first line – it hardly mentioned junk DNA.
The most important statement came from an insightful comment, from Brendan Zietsch, a post-doctoral researcher at the University of Queensland, who pointed out that it was misleading to say the ENCODE project had debunked junk DNA.
He referenced some excellent work by genomics expert Sean Eddy explaining that junk DNA lives on (for details, see Eddy's blog and his published commentary).
Last month another top geneticist, Dan Graur from the University of Houston, Texas, and his colleagues published a great paper in Genome Biology and Evolution that also countered the ENCODE conclusions about the death of junk DNA.
Graur's is one of the most spirited demolition jobs I have ever read. If you have time it is worth reading it in full. If not, the title gives you a taste:
On the immortality of television sets: "function" in the human genome according to the evolution-free gospel of ENCODE.
Graur's is a bitingly witty paper. It runs for 40 withering pages and doesn't hold back.
Below are some of my thoughts on junk DNA. In short, it looks like the Eulogy for junk DNA, published in Science last September, together with the catchy paper No more junk DNA, could win prizes for the most misleading headlines of the year.
What is junk DNA?
Viruses and other small things that replicate rapidly and have large populations have genomes with very little junk DNA. Viruses can't afford to carry unnecessary baggage. Competing viruses without baggage take over.
Bacterial, and many – but not all – fungal genomes are also pretty compact. Big things, like us, seem to have accumulated DNA, and done so at a much faster rate than we jettison it.
Since we lumber along slowly and only reproduce every 20 years or so, the extra load hasn't seemed to matter.
Think of DNA as being like computer data or code. Since your phone is small it just can't store that much, but in the case of your office hard-drive or server, there is no need to delete every spam email or every draft copy of every document you write.
It is an effort to find things and delete them. You don't want to delete the wrong thing or something that you might need one day. Gradually stuff builds up.
Every now and then email attachments, computer viruses or worms arrive. They are inactivated or quarantined, but lifeless copies of them pile up as well.
It is estimated that perhaps two-thirds of our genome is made up of parasitic virus-like sequences – transposable elements (or jumping genes), which are simply selfish entities that replicate themselves, much like computer viruses or worms. Nearly all of them are now inactive and harmless.
It isn't easy for our genomes to actually throw them out because they are stitched in among more valuable DNA – so we simply leave them there.
Then there are extra copies of genes – sometimes replication goes wrong and extra copies arise. Some of these acquire new functions but most just lose function altogether and are called pseudogenes.
There are lots of repeated sequences in our genomes.
As a general rule the genome hangs onto things rather than throwing them out. All the machinery is blind.
Since some of the stuff in our genomic shed is so important that our lives depend on it, it is usually best not to throw things away.
Junk and garbage
The non-functional bits are termed junk DNA. The expression was coined by the respected geneticist Susumu Ohno in 1972.
The term has always been controversial. The Nobel Laureate Sydney Brenner made the distinction between junk and garbage.
Junk is stuff you keep – because you don't get round to throwing it out and perhaps a few bits and pieces will become useful.
Garbage is stuff that begins to smell and you get rid of it. There isn't much garbage on computer drives and there isn't much garbage in genomes – but there is lots of junk.
So why did the ENCODE project and the media announce that 80% of our DNA is functional and that junk is dead? Mostly because they defined the word "function" loosely.
What is function?
Function is a tricky word. The junk in the bottom layer of my shed has a function – it serves as a shelf for the top layer and keeps it off the damp floor. The top layer has a function too – it serves as a cover for the bottom layer and keeps it free from dust. My junk has also recently acquired a new function – that is, causing my house to fill up with stuff since the shed is now full.
But those functions are ridiculous. In Dan Graur's paper he uses other nice examples of ludicrous functions. The biological function of the heart is to pump blood but one could argue that another function of the heart is to make a noise.
He points out that the ENCODE team defined function in the wrong way and this in part led them to suggest most of the genome is functional and therefore not junk.
For the ENCODE team, DNA was considered functional if it:
- is transcribed (i.e. copied into RNA)
- binds a DNA-binding protein, lacks associated packaging proteins called histones, or has histones with special marks
- is methylated.
Both Sean Eddy and Dan Graur point out that if ENCODE had included the ability to be replicated (copied into DNA) rather than simply transcribed (copied into RNA), ENCODE could quickly have declared 100% of the genome to be functional.
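To see why a loose definition inflates the count, the criteria listed above can be written as a simple yes/no test. What follows is only a toy sketch, not ENCODE's actual analysis: the `Region` record, the function names and the four example regions are all hypothetical and invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical record of the marks observed on one stretch of DNA."""
    name: str
    transcribed: bool = False      # copied into RNA
    binds_protein: bool = False    # bound by a DNA-binding protein
    lacks_histones: bool = False   # no packaging proteins attached
    marked_histones: bool = False  # histones carrying special marks
    methylated: bool = False
    replicated: bool = True        # every stretch is copied when a cell divides


def encode_style_functional(region: Region) -> bool:
    """Loose test: any single biochemical signal counts as 'function'."""
    return (
        region.transcribed
        or region.binds_protein
        or region.lacks_histones
        or region.marked_histones
        or region.methylated
    )


def with_replication(region: Region) -> bool:
    """Same loose test, with 'is replicated' added as one more criterion."""
    return encode_style_functional(region) or region.replicated


def fraction(regions, predicate) -> float:
    """Share of regions for which the predicate is true."""
    return sum(predicate(r) for r in regions) / len(regions)


# Invented toy data: one useful gene and three pieces of junk.
regions = [
    Region("essential_gene", transcribed=True, binds_protein=True),
    Region("dead_transposon", transcribed=True),
    Region("pseudogene", methylated=True),
    Region("silent_repeat"),
]

encode_fraction = fraction(regions, encode_style_functional)
replication_fraction = fraction(regions, with_replication)
print(encode_fraction, replication_fraction)
```

In this made-up example three of the four regions pass the loose test even though only one of them does anything useful, and adding replication as a criterion makes every region pass. Nothing in the test asks whether the organism would miss the region if it were gone.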
But being subjected to these activities, or having markers such as methylation or histone marks, is not a function.
The key point is that the bits of junk in my shed are not things I would miss if they were broken. Thus they are non-functional.
What has gone wrong here? The fact is that ENCODE was a "big science" reference data collection exercise that checked for genomic labels and activity but not for function. Previous work had assessed function by looking at conservation, that is, whether a sequence has been preserved by natural selection over evolutionary time. Useful things are conserved and one misses them when they are gone.
Current estimates suggest only about 9% of our genome shows evidence of being under selective pressure and functional, not 80%. In other words up to 91% is junk and it is still junk despite the headlines.
So was ENCODE bad? Not at all. The purpose of ENCODE was not to determine whether or not our DNA is junk: the purpose was to catalogue the markings. A great deal of cataloguing was done, and done very well. The data will be useful. The problem was that the work was published in 30 papers.
Exciting headlines and take-home messages had to be squeezed out. Big investments in big science demand big outcomes, and this can cause problems.
In a world of "publish or perish" it wasn't enough to just say: "the data is now available on the web" and leave it at that. There had to be headlines. Nothing is more exciting than the idea that most of our genome has a secret function waiting to be discovered.
What could be better than overturning the idea that most of our genomes are junk and rewriting all the textbooks?
Science is driven by the hope of discovery. Humans are motivated by hope and hype. But that is not such a bad thing. Columbus may have wanted to find a short cut to India; Burke and Wills wanted to find the inland sea, thought to be in the middle of Australia.
Newspaper articles frequently declare that we only use 10% of our brains. These hopes were not well-founded but I rather admire the people who go out on a limb and are always looking for new things.
Columbus made a big discovery and although Burke and Wills didn't find rich farmlands, the land they mapped is rich in minerals. And there will be new discoveries in the genome too. Junkyards can become evolution's playgrounds.
Every now and then a bit of DNA that was termed junk will be found to have acquired a genuine function – it will be a real treasure, and that will make the headlines.
And, yes, there may be some functional bits amid the junk that we have overlooked. But most of our DNA will still be junk.
So in my view junk is junk and I expect that at least 80% of our DNA is junk.
But don't worry: I also predict at least 80% of the mysteries of the genome remain to be discovered. So genetics, too, is far from dead.