A collaboration between the Sanger Centre and the EBI adds critical annotation to sequence data from the Human Genome Project

Today researchers at the Sanger Centre, a world-leading DNA sequencing centre, and the European Bioinformatics Institute (EBI) announced that their joint bioinformatics project, Ensembl, has now confirmed the locations of more than 35,000 genes on the human genome and has identified a further 150,000 potential gene fragments.

Ensembl is an automatic tool that adds critical annotation to sequence data as it is submitted to genome databases, making the data more useful to academia and industry. One effect will be to speed up the process of identifying new targets for drug development. The new tool can be found at http://www.ensembl.org/.

The public Human Genome Project has already released three-quarters of the human genome sequence. Ensembl aims to provide a comprehensive analysis of this data. Ensembl data and program source code will be available for the free and unrestricted use of biomedical researchers worldwide. Teams from Sanger and the EBI thereby hope to encourage worldwide collaborations in "adding value" to genome databases.

Annotating this data is necessary to interpret DNA sequences and identify genes. In simple organisms such as bacteria, most of the DNA consists of genes, but the roughly 100,000 human genes make up only about two per cent of the DNA molecule. The function of the other 98 per cent is unknown; in some cases it appears to be "noise". Ensembl contains automated routines which scan sequences for typical patterns found in genes and mark their positions in the molecule. Since new sequences arrive in bits and pieces, another of Ensembl's jobs is to plot each sequence onto the "map" of human chromosomes.
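To give a feel for what "scanning sequences for typical patterns" can mean, here is a deliberately simplified sketch in Python. It is not Ensembl's method: it only looks for one textbook pattern, an open reading frame that begins with the start codon ATG and runs to a stop codon, on one strand. The function name `find_orfs` and the `min_codons` parameter are invented for this illustration, and real gene finders weigh far more evidence than this.

```python
# Toy illustration only; NOT how Ensembl annotates genes.
# ATG and TAA/TAG/TGA are the standard start and stop codons.
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return sorted (start, end) positions of open reading frames.

    Only the forward strand is scanned. Positions are 0-based and
    end-exclusive, and each reported span includes its stop codon.
    `min_codons` (a hypothetical tuning knob) is the minimum length
    in codons, counting the start and stop codons.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):  # the three possible reading frames
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i  # remember where this candidate begins
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))  # mark the position
                start = None  # look for the next candidate
    return sorted(orfs)


if __name__ == "__main__":
    # ATG begins at position 2 and the stop codon TAG ends at position 14.
    print(find_orfs("CCATGAAATTTTAGGG"))  # prints [(2, 14)]
```

The marked positions are the kind of annotation that turns a raw string of letters into something a researcher can search; placing each fragment on the chromosome map is a separate step that this sketch does not attempt.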

A number of new features are planned for the database in the near future, including integrating information about variant forms of genes called SNPs, many of which have been linked to genetic diseases. The SNP Consortium Ltd, a collaborative effort to create a freely available genome-wide map of genetic markers, has recently announced the release of a total of 100,000 SNPs, 45% of which have been contributed by the Sanger Centre.

More information

1. The Sanger Centre, which receives the majority of its funding from the Wellcome Trust, is one of the world's leading genome sequencing centres. Both the Sanger Centre and the Wellcome Trust have been at the forefront of efforts to keep sequence data in the public domain. The Sanger Centre employs about 500 people on its purpose-built campus at Hinxton. The Centre is a leading partner in the Human Genome Project and also contributes to international projects to sequence the genomes of disease-causing organisms.

2. The Wellcome Trust is the world's largest medical research charity with an annual spend of some £600 million in the current financial year 1999/2000. The Wellcome Trust supports more than 3000 researchers at 300 locations in 30 different countries, laying the foundations for the healthcare advances of the 21st century and helping to maintain the UK's reputation as one of the world's leading scientific nations. As well as funding major initiatives in the public understanding of science, the Wellcome Trust is the country's leading supporter of research into the history of medicine.

3. The EBI is an Outstation of The European Molecular Biology Laboratory (EMBL); it maintains some of the world's largest databases of DNA and protein sequence data, develops tools to help biologists use it, and is the home of research groups who are looking for the biological significance of this data. The EBI is also one of the world's most important centres for bioinformatics training. EMBL is a basic research institute funded by 16 member states, including most of the EU, Switzerland and Israel. Research at EMBL is conducted by approximately 80 independent groups covering the spectrum of molecular biology. The Laboratory has five units: the main Laboratory in Heidelberg, Outstations in Hinxton (the European Bioinformatics Institute), Grenoble (on the campus of the ILL and ESRF), Hamburg (on the DESY site), and an external research programme in Monterotondo, Italy (sharing a campus with EMMA and the CNRS). The Laboratory provides essential services to the European scientific community, welcomes a large number of scientific visitors each year, and has an active international PhD programme.