ENCODE and its first impractical application
Department of Forensic and Investigative Genetics, Institute of Applied Genetics, University of North Texas Health Science Center at Fort Worth, 3500 Camp Bowie Blvd, Fort Worth, TX, 76107, USA
Investigative Genetics 2013, 4:4 doi:10.1186/2041-2223-4-4
Published: 17 January 2013
First paragraph (this article has no abstract)
The C-value paradox, as initially coined, arose in early eukaryotic genomic studies from the observation that genome size was not necessarily correlated with organism complexity. With the discovery of non-coding DNA in the 1970s, it became apparent that the size of the eukaryotic genome was not related to the number of genes contained within it. Indeed, only a small portion (approximately 2%) of the human genome carries coding genes [2-4], the rest being the so-called “junk DNA”. The human genome project further elucidated the number of genes in our genomes, counting a paltry 20,000 to 25,000 genes [2-4]. With so few genes one might ask “how could such a complex organism as Homo sapiens pass on the necessary genetic blueprint to the next generation?” An equally enticing question could be “how could nature be so wasteful and commit so much junk DNA to the human genome?” The Encyclopedia of DNA Elements (ENCODE) project has shed some light on these two questions. There is not one paper to cite but more than 30 studies that were coordinated and published in concert, describing the results of a multi-year consortium effort to catalogue the functional elements of human DNA. Hundreds of authors reported on analyses of thousands of data sets. A good summary of the work is captured in the ENCODE Project Consortium’s September 2012 publication titled “An integrated encyclopedia of DNA elements in the human genome”. The ENCODE project identified a large number of functional elements, defined as sites that encoded a product or exhibited a biochemical signature in the human genome. The power of current DNA sequencing technologies made the Consortium project possible. The depth of analysis is impressive.
In this one paper, more than 1,600 data sets were analyzed for a multitude of elements including human protein-coding and non-coding RNAs, pseudogenes, RNA from different cell lines, binding locations of a number of DNA-binding proteins and RNA polymerase components, DNase I hypersensitive sites, locations for histone modifications, and DNA methylation. The most exciting finding, and one that may begin to address the two questions posed above, was that 80.4% of the genome has a biochemical function; that is, it is covered by or near at least one ENCODE-identified element. More precisely, a large portion of the human genome contains a regulatory event. The authors state that “95% of the genome lies within 8 kilobases (kb) of a DNA–protein interaction…, and 99% is within 1.7 kb of at least one of the biochemical events measured by ENCODE.” The outcome is that the noncoding junk DNA is far from being useless genome filler. Instead, seemingly inert DNA can influence functional genes. The nature of genetic and epigenetic control is quite complex and exquisite, and today is all the more appreciated. ENCODE is a public resource that will contribute substantially to the understanding of gene expression and mechanisms of disease and, hopefully, cures.
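A coverage figure such as the 80.4% quoted above is, at its core, the fraction of bases that fall inside the union of all annotated elements. The sketch below illustrates that arithmetic only. It is not the Consortium's pipeline: the function names, the half-open (start, end) coordinate convention, and the toy coordinates are all assumptions made for illustration, and the numbers are invented rather than taken from ENCODE.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open intervals (start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_fraction(intervals, genome_length):
    """Fraction of a genome of the given length covered by at least one interval."""
    covered = sum(end - start for start, end in merge_intervals(intervals))
    return covered / genome_length


# Hypothetical toy data: three annotated elements on a 1,000-base "genome".
toy_elements = [(0, 300), (200, 500), (700, 900)]
toy_length = 1000

fraction = covered_fraction(toy_elements, toy_length)
print(f"{fraction:.1%} of the toy genome is covered by at least one element")
```

Merging before summing matters because overlapping elements would otherwise be counted twice; here the first two toy elements share 100 bases, so the covered total is 700 bases rather than 800.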