8th October 2009

Standards for a new genomic age

Joint Announcement sets Six Genome Sequence Standards

The Wellcome Trust Sanger Institute's sequencing centre. [Wellcome Library, London]

New standards in genome sequencing are called for today: the report authors assert that the world of genome sequencing must establish a suite of benchmarks against which a new genome sequence can be measured. The measures are independent of the technology used to deliver the sequence.

The result, they say, is that researchers will have a much clearer idea of where they stand, because the quality of any sequence will be estimated more transparently and unambiguously.

There is a desperate need to address the variety of data output from next-generation sequencing and to provide guidance on the quality of the assembled sequence. The rapid deployment of different, high-throughput, next-generation sequencing platforms has challenged the traditional sequence assembly and analysis systems.

"The first three decades of sequencing produced relatively limited amounts of data in a small range of formats designed to deliver quality assemblies of DNA sequences," says Darren Grafham, from the Wellcome Trust Sanger Institute and joint first author on the report. "In the past couple of years a quiet revolution has shaken genomics. As a community, we had to provide guidance for researchers to help them use the outpouring of next-generation sequences as efficiently as possible."

"Standards are a major issue to be tackled in genomics right now," says Patrick Chain from Los Alamos National Laboratory (LANL), New Mexico, USA and joint first author. "These proposals are guideposts meant to inform users and generators."

A range of next-generation sequencing technologies, increasingly deployed in research, generate massive amounts of data in any one of several formats. One example is the Wellcome Trust Sanger Institute where, over the past two years, sequence output has gone from around 100 million bases per day to around 60 billion bases per day.

Perhaps more importantly, many of these data are short sequence stretches intended for comparative genomics or other studies of related sequences, not data designed to produce draft or finished genome assemblies.

"There is a widening gap between the output data, draft genomes and finished genomes," explains Chris Detter, Director of the LANL Joint Genome Institute and senior author on the report, "and a developing confusion over which data sets are of a high quality."

"Until now, we have simply had no descriptors or standards to help researchers. Initial discussions began at the Sequencing and Finishing in the Future meeting and have culminated in today's article."

The new standards will take into account the technologies, chemistry or computer programs used to produce and analyse the sequences to place new data into one of six categories.

The categories range from a 'standard draft sequence', the minimum for submission to the public DNA databases, to 'finished sequence', where a sequence is as complete as it reasonably can be with current methods and has fewer than one error per 100,000 bases.

"Genome sequences are a resource that many researchers use to understand biology and disease," says Professor Julian Parkhill, Director of Sequencing and Head of Pathogen Genomics at the Sanger Institute. "However, it is crucial that they know in advance the quality of any sequence so that they can make best use of it. These guidelines will help to maximize the value of new genome sequences by ensuring that they are used in the most appropriate way."

Notes to Editors

Publication details

  • Genomics. Genome project standards in a new era of sequencing.

    Chain PS, Grafham DV, Fulton RS, Fitzgerald MG, Hostetler J, Muzny D, Ali J, Birren B, Bruce DC, Buhay C, Cole JR, Ding Y, Dugan S, Field D, Garrity GM, Gibbs R, Graves T, Han CS, Harrison SH, Highlander S, Hugenholtz P, Khouri HM, Kodira CD, Kolker E, Kyrpides NC, Lang D, Lapidus A, Malfatti SA, Markowitz V, Metha T, Nelson KE, Parkhill J, Pitluck S, Qin X, Read TD, Schmutz J, Sozhamannan S, Sterk P, Strausberg RL, Sutton G, Thomson NR, Tiedje JM, Weinstock G, Wollam A, Genomic Standards Consortium, Human Microbiome Project Jumpstart Consortium and Detter JC

    Science (New York, N.Y.) 2009;326;5950;236-7

Participating Centres

A full list of participating centres is available at the Science website.

The Wellcome Trust Sanger Institute

The Wellcome Trust Sanger Institute, which receives the majority of its funding from the Wellcome Trust, was founded in 1992. The Institute is responsible for the completion of the sequence of approximately one-third of the human genome as well as genomes of model organisms and more than 90 pathogen genomes. In October 2006, new funding was awarded by the Wellcome Trust to exploit the wealth of genome data now available to answer important questions about health and disease.


The Wellcome Trust

The Wellcome Trust is a global charitable foundation dedicated to achieving extraordinary improvements in human and animal health. We support the brightest minds in biomedical research and the medical humanities. Our breadth of support includes public engagement, education and the application of research to improve health. We are independent of both political and commercial interests.


Sanger Institute Contact Information:

Don Powell Press Officer
Wellcome Trust Sanger Institute Hinxton, Cambs, CB10 1SA, UK

Tel +44 (0)1223 496 928
Mobile +44 (0)7753 7753 97
Fax +44 (0)1223 494 919
Email press.office@sanger.ac.uk