Wellcome Trust Announces Major Investment in Genome Bioinformatics

Five-year investment to support the Ensembl project, the database providing automatic annotation of the human genome

The Wellcome Trust today announced a major investment of at least £8 million over five years in the Ensembl project, the database providing automatic annotation of the human genome.

The increased resources in staff and computer power for the gene “software” will mean a much speedier collection and dissemination of information on the function of genes, greatly aiding the work of researchers around the world in finding new diagnostic methods and treatments for a huge variety of diseases.

“Mapping the human genome is an amazing scientific achievement with the power to touch the lives of everybody on the planet. It is important that information is made available in the most ‘user-friendly’ and complete way – and made available free of charge – and this is why the Ensembl project is so vital.

“Ensembl is a wonderful way of transmitting genetic information clearly and quickly across the world. Having such a reference centre, and a pipeline to the wider scientific world, will prove invaluable in the coming years in the fight against a wide range of illnesses.”

Dr Michael Dexter, Director of the Wellcome Trust

Ensembl has been developed at the Sanger Centre and the European Bioinformatics Institute (EMBL-EBI – part of the European Molecular Biology Laboratory) on the Wellcome Trust Genome Campus in Hinxton, Cambridgeshire.

On June 26th an international consortium of public laboratories announced the first ‘working draft’ of the human genome sequence, which was hailed as one of the most outstanding scientific achievements of our lifetime. The public availability of this comprehensive genetic information presents huge opportunities to develop new treatments for diseases based on understanding of the basic molecular processes of life.

However, to understand and exploit the information in the genome, sophisticated computer methods must seek biological meaning by analysing the sequence. One important part of this is to locate genes, which make up only a small part (probably less than three per cent) of the total DNA in humans. The resulting “annotated” DNA sequence must then be made accessible to scientists throughout the world.

The aim of Ensembl is to provide the reference view of genome sequence data as a freely available resource for scientists and the public. Ensembl has been providing automatically generated analysis of human genome sequence since the end of 1999. Since the completion of the working draft the Ensembl team has been collaborating with other international public bioinformatics centres connected with the Human Genome Project to provide an ordered view of the working draft sequence for researchers as quickly as possible. A full analysis of the first version of the working draft, in which the fragments of genome sequence have been organised and connected into a whole, has already been made available via the Ensembl web site.

“This is a superb example of the synergy that is possible through collaborations of institutions in Europe and of the quality of work that is possible in the public domain.”

Fotis Kafatos, Director-General of EMBL

“This is wonderful news for open, public domain bioinformatics. This grant will enable Ensembl to expand its team and give the project sufficient compute resources to process the avalanche of sequence data that is being generated.

“Since Ensembl went live in 1999, the Ensembl team have worked to provide researchers worldwide with both an integrated view of what our DNA means and the programming tools to develop their own ways of exploring that data.”

Ewan Birney, who heads the Ensembl initiative from the EMBL-EBI side

The Ensembl project is based on an entirely ‘open’ philosophy: all data and program source code are available for the free and unrestricted use of both academic and commercial biomedical researchers worldwide. The Ensembl site and data resources are already being used by large numbers of researchers.

Software developers from both academia and major pharmaceutical companies have also begun participating in a totally open software collaboration with the Ensembl team to speed the development of the software.

“New resources such as Ensembl are critical to add value and organise raw sequence data being deposited in the public sequence archives and so maximise the benefit to mankind from this exciting era in science.”

Graham Cameron, Joint Head of EBI

“Ensembl puts the genome on the desktop of biologists worldwide, and will provide key infrastructure for functional genomics programmes being pursued at the Sanger Centre and elsewhere.”

Richard Durbin, Head of Informatics at the Sanger Centre

Although Ensembl plans to provide a comprehensive view of genomic data for biologists, it is structured so as to be as open as possible to ideas and data from other groups.

“The human genome is too complex for any organisation to have a monopoly of ideas or data.”

Tim Hubbard, who heads the Ensembl initiative from the Sanger Centre side

More information

  1. The Wellcome Trust is the world’s largest medical research charity with an annual spend of some £600 million in the current financial year 1999/2000. The Wellcome Trust supports more than 5000 researchers at 300 locations in 42 different countries, laying the foundations for the healthcare advances of the 21st century and helping to maintain the UK’s reputation as one of the world’s leading scientific nations. As well as funding major initiatives in the public understanding of science, the Wellcome Trust is the country’s leading supporter of research into the history of medicine. http://wellcome.org/
  2. The Sanger Centre, which receives the majority of its funding from the Wellcome Trust, is one of the world’s leading genome sequencing centres. Both the Sanger Centre and the Wellcome Trust have been at the forefront of efforts to keep sequence data in the public domain. The Sanger Centre employs about 500 people on the purpose-built campus at Hinxton. The Centre is a leading partner in the Human Genome Project, is responsible for sequencing one-third of the human genome, and also contributes to international projects to sequence the genomes of disease-causing organisms. https://www.sanger.ac.uk/
  3. The EBI is an Outstation of The European Molecular Biology Laboratory (EMBL); it maintains some of the world’s largest databases of DNA and protein sequence data, develops tools to help biologists use it, and is the home of research groups who are looking for the biological significance of this data. The EBI is also one of the world’s most important centres for bioinformatics training. EMBL is a basic research institute funded by 16 member states, including most of the EU, Switzerland and Israel. Research at EMBL is conducted by approximately 80 independent groups covering the spectrum of molecular biology. The Laboratory has five units: the main Laboratory in Heidelberg, Outstations in Hinxton (the European Bioinformatics Institute), Grenoble (on the campus of the ILL and ESRF), Hamburg (on the DESY site), and an external research programme in Monterotondo, Italy (sharing a campus with EMMA and the CNRS). The Laboratory provides essential services to the European scientific community, welcomes a large number of scientific visitors each year, and has an active international PhD programme. http://www.ebi.ac.uk/
  4. Ensembl integrates and is built on top of data from existing database resources provided by both institutes. EMBL-EBI is one of the three worldwide repositories for biological sequence data. It houses both the EMBL DNA database and the SWISSPROT protein sequence database, which are core resources used by researchers worldwide. The total amount of DNA sequence data deposited doubles every 6 months, while computers only double in speed every 18 months. Functional annotation of genes in Ensembl is provided from Pfam and other protein domain resources combined in the INTERPRO project. Ensembl is also integrating emerging data resources that are being generated by post-genomic initiatives, such as genetic variation projects. These include the efforts to find where on the genome single bases differ from individual to individual (single nucleotide polymorphisms, or SNPs for short).

    The SNP Consortium Ltd, a collaborative effort of commercial companies and the Wellcome Trust to create a freely available genome wide map of such information, has recently announced the release of a total of 100,000 of these points. The Sanger Centre has located 45% of these. The SNPs – many of which have been linked to genetic diseases – have already been integrated into Ensembl and are visible on its web displays.

  5. Internet resources:
    Ensembl: http://www.ensembl.org/
    The Sanger Centre: https://www.sanger.ac.uk/
    EMBL-EBI: http://www.ebi.ac.uk/
    The Wellcome Trust: http://wellcome.org/
    EMBL: http://www.embl.org/
    The SNP Consortium: http://snp.cshl.org/
    Pfam: http://pfam.sanger.ac.uk/
    DAS: http://stein.cshl.org/das/