Speaking the language of DNA
EMBL’s scientists have been instrumental in helping the world understand, decode, archive, and manipulate genomes at scale and across many branches of the evolutionary tree
By Ewan Birney, Eileen Furlong, and Arnaud Krebs
In 1977, British biochemist Frederick Sanger’s team released the genome sequence of ϕX174 – a virus that infects bacteria. The sequence – the five thousand or so nucleotide ‘letters’ that make up the instruction manual that is this virus’s genome – represented a remarkable feat, especially since DNA sequencing was still a young technique at the time. The development of DNA sequencing methods would bring Sanger his second Nobel Prize in Chemistry three years later.
Since then, the field of genomics has grown by leaps and bounds. Throughout this rapid expansion, EMBL has played a pivotal role, one which it continues to build on today.
On sequences and sequencers
One of the pioneers of the field, Wilhelm Ansorge, joined EMBL in 1979 as a group leader and later became Head of the Functional Genomics Technology Unit. His group developed several methods for increasing the speed and accuracy of DNA sequencing, as well as the first automated fluorescence DNA sequencing system for large genomic DNA. Some of these methods were later used in the Human Genome Project – the large-scale international project which released the first sequence of the human genome in 2003. Ansorge’s team also participated in sequencing the genomes of yeast, mouse, mosquito, and Arabidopsis, a plant often used as a model organism by biologists.
Cumulative and rapid advances in methods for DNA sequencing resulted in the arrival of next-generation sequencing (NGS) methods in the late 1990s and 2000s, which relied on carrying out huge numbers of sequencing reactions in parallel and then putting the sequences together by aligning them where they overlapped. Companies such as Roche, Solexa, and Illumina entered the field with commercial sequencing machines which could analyse and sequence genomes with ever-growing speed and accuracy, at larger scales and lower costs than older technologies.
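The idea of putting short sequences together where they overlap can be illustrated with a toy example. The sketch below is only an illustration under simplifying assumptions (error-free reads, a single DNA strand, no repeats); it is not the algorithm used by any particular sequencing platform or assembler, and the function names `overlap` and `greedy_assemble` are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that is also a prefix of `b`.

    Overlaps shorter than `min_len` are ignored and reported as 0.
    """
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two sequences with the longest overlap.

    Toy greedy strategy: it stops when no remaining pair overlaps by at
    least `min_len` letters and returns whatever contigs are left.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_len:
                        best_len, best_i, best_j = size, i, j
        if best_len == 0:
            return contigs
        # Join the pair, writing the shared letters only once.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    # Three made-up reads that overlap by three letters each.
    print(greedy_assemble(["ATGCGT", "CGTACC", "ACCTGA"]))
```

Real genomes contain repeats and real reads contain errors, so production assemblers rely on far more elaborate graph-based methods; the sketch only conveys the principle of joining reads at their overlaps.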
EMBL was one of the first European research institutes to take advantage of this rapid technological progress. By 2001, it had set up the genomics core facility, GeneCore – then a Sanger sequencing and microarray provider, led by biochemist Vladimir Benes, who still heads the facility today. GeneCore was one of the first facilities in Europe to embrace the new sequencing methods, adapting them and making them available to researchers across Europe as a service. Today, it is an advanced next-generation sequencing platform, providing end-to-end support for the genomics projects of users from across EMBL’s member states, as well as globally.
A parallel revolution in data
The growing number of DNA sequences also created a demand for computational support to deal with the resulting data deluge. In 1980, EMBL founded the EMBL Nucleotide Sequence Data Library, the first central depository of nucleotide sequence data in the world. Under Graham Cameron’s leadership, this later developed into the EMBL-European Bioinformatics Institute (EMBL-EBI), established in 1994 and located on the Wellcome Trust Genome Campus in Hinxton, UK.
Since then, EMBL-EBI has developed and made available systematic annotations of genomes with flagship projects such as Ensembl, providing a roadmap for scientists to probe genome functions. This initiative – which allowed scientists to view and annotate genomes, identifying the location of genes and non-coding regions – was born in 1999 and was headed at the time by Ewan Birney, currently the Deputy Director General of EMBL and Director of EMBL-EBI. Moreover, EMBL-EBI was a pioneer in the development of repositories for scientists to deposit their data, ArrayExpress for microarray data being an early example. These initiatives transformed the way we do science, as they promoted the reuse of datasets in different contexts, improving reproducibility and speeding up discovery. More EMBL achievements in the fields of bioinformatics and computational biology are discussed here.
Understanding the biology of the genome
Genome sequences, databases, and annotations provide scientists with a roadmap to decipher how the genome functions. Microarrays and next-generation sequencing enabled the measurement of molecules, and their regulation, at a scale that wasn’t possible before. These technologies allow scientists to ask fundamental questions such as: how does the cell maintain, replicate, and decode its genome in order to perform its many functions and adapt rapidly to changes? EMBL’s Genome Biology Unit tries to answer these questions. Headed by Senior Scientist Eileen Furlong since 2009, the Unit uses various systems-level approaches to unravel the complex processes involved in regulating the cell’s information flow from DNA to RNA to proteins, integrating cutting-edge experimental and computational approaches and working across scales.
One of the central challenges of genome biology is to understand which parts of the genome are transcribed into mRNA with various cellular functions. Ansorge’s pioneering work allowed researchers for the first time to detect mRNA transcripts using a microarray chip – a small device that can help scientists simultaneously measure the expression levels of all genes in a sample. This groundbreaking technology later allowed the group of Lars Steinmetz to show that large parts of eukaryotic genomes, even outside of genes, are transcribed into RNA. Following this, Steve Cohen’s group was one of the first to identify a function for a particular type of non-genic transcripts – microRNAs – in regulating the expression of other genes.
Expression of a gene is controlled by the interplay of multiple molecular mechanisms, many of which were identified at EMBL. Inside cells, DNA is wrapped around proteins called histones in a structure called chromatin. The shape and structure of chromatin help regulate how easily key proteins called transcriptional activators can access their target genes, which, in turn, regulates gene expression. EMBL researchers made several key contributions in identifying and understanding the mode of action of molecular machines that regulate chromatin structure. For instance, Asifa Akhtar and Peter Becker were the first to discover and characterise a chromatin modifier that controls differences in gene expression between male and female fruit flies.
Jürg Müller’s group discovered the enzymatic function of PRC1, one of the key regulators of the genes that control organismal development. These include the Hox-gene cluster, whose role in regulating fruit fly development was discovered at EMBL by Christiane Nüsslein-Volhard and Eric Wieschaus, and which was later shown to also be central for mammalian development by EMBL alumnus Denis Duboule.
EMBL researchers have also pioneered the study of ‘cis-regulatory elements’ – the DNA switches that turn genes on and off. This has been the focus of Eileen Furlong’s lab for the last two decades. Its studies range from developing methods to map the binding of transcription factors across the genome, to predicting their activity, to pushing the boundaries of single-cell genomics to uncover the first map of cis-regulatory elements at the level of an entire organism.
In addition, research at EMBL is highly interdisciplinary. While genome function has been dissected using sequencing techniques, genome regulation processes have also been deciphered from a molecular perspective using structural methods. Multiple research groups have provided key insights into the structure-function relationships of enzymes involved in genomic function, such as the work of Christoph Müller’s group on RNA polymerase and chromatin remodelling complexes.
EMBL has also led the way in developing, validating, and distributing computational tools to analyse gene expression. For example, the software package DESeq (and its successor DESeq2) created by Wolfgang Huber’s team is one of the most widely used packages to identify changes in the expression of genes or their regulators.
From the DNA code to human health, biodiversity, and more
The study of genomes has opened up exciting possibilities for improving human health and helping solve global challenges. One area of progress has been the study of microbiomes – all the microorganisms that inhabit a given environment. Rapid sequencing of DNA in representative samples from an environment can give us a snapshot of all the microorganisms living in it, creating a ‘metagenome’. This method has been used successfully by EMBL researchers to investigate microbiomes in diverse contexts, e.g. those living in soil or in our guts, and the effect of factors like antibiotics on them. EMBL Heidelberg’s Bork Group was one of the pioneers of metagenomics and microbiome sequencing and developed many of the tools that are widely used in this field today. Research by several groups at EMBL (Typas, Patil, Bork) has also shown that our gut microbiome influences how efficiently drugs can act and modulates the risk of certain diseases.
The Pan-Cancer Analysis of Whole Genomes project also deserves a special mention here. This project involved more than 1300 scientists and clinicians from 37 countries, who analysed more than 2600 genomes of 38 different tumour types, creating a huge resource of primary cancer genomes. The project was co-led by Jan Korbel and served as the starting point for many working groups across the world to study and provide novel insights into the early evolution of cancer, the mutational processes acting in tumours, and the genes active during distinct stages of tumour progression.
Genomic databases and studies are crucial for the progress of genomic medicine – the use of genomics-based technologies such as DNA sequencing to support healthcare. EMBL, particularly EMBL-EBI, contributes strongly in this regard via collaborations with initiatives such as Genomics England and the Nordic EMBL Partnership. EMBL is currently supporting Genomic Medicine initiatives across Europe, including in the UK, Denmark, Finland, Estonia, and Norway. The European Genomic Data Infrastructure (GDI) project brings many of these projects together. EMBL also plays a crucial role in empowering the scientific community by organising a number of meetings, conferences, workshops, and courses around genomics, focusing on both experimental work and data analysis.
As we head into the next 50 years of genomics, it is interesting to speculate on the future of the field. New strides are already being made in the fields of single-cell omics, where DNA, RNA, and the regulation of RNA transcription can be studied at the level of single cells, and even from intact tissues and organs. These studies provide new insights into a cell’s state in any given condition and how that compares to other cell types in the same organ or organism. Such technologies reveal changes during development and disease and are currently being used by EMBL scientists to understand how embryonic development is impacted by genetic and environmental variation, as well as to understand changes during cancer progression. Cancer is caused by multiple changes in DNA, and EMBL scientists currently collaborate with oncologists at Heidelberg University to pinpoint which DNA changes are the most detrimental and to understand how they function.
Large-scale perturbations of the genome are also giving researchers more powerful tools than ever before to understand genome function in precise detail. Moreover, the emergence of methods to synthesise and edit entire genomes, at least for smaller organisms, will revolutionise our ability to understand and modify the functionalities of genomes. The growth of spatial-omics – the integration of imaging with genomics, transcriptomics, and proteomics technologies – and the use of AI-based systems for analysing genomic codes and predicting gene locations might soon provide a holistic view of genome function. Finally, we are seeing a revolution in the processing of genomes from environmental samples, such as those collected during the Traversing European Coastlines (TREC) expedition, leading not only to a better understanding of global biodiversity but also to a clearer picture of the effects of our changing climate and of human activity.
Learn more about EMBL’s contribution to life science research and services in our 50th anniversary commemorative publication.