International Vertebrate Genomes Project releases first 15 high-quality reference genomes

Publicly available data will impact studies on life, disease, and conservation efforts

The Vertebrate Genomes Project (VGP) has been set up by the Genome 10K consortium to provide high-quality, near error-free, and complete genome assemblies of all 66,000 vertebrate species on Earth

The Genome 10K (G10K) consortium announces the official launch of a new project, the international Vertebrate Genomes Project (VGP), and its first release of 15 new, high-quality reference genomes for 14 species representing all five vertebrate classes – mammals, birds, reptiles, amphibians, and fishes. The mission of the VGP is to provide high-quality, near error-free, and complete genome assemblies of all 66,000 vertebrate species on Earth to address fundamental questions in biology, disease, and conservation.

The new sequences are stored and publicly available in the Genome Ark database, a new digital open-access library of genomes generated by the G10K-VGP consortium and hosted by Amazon, and will soon be processed for gene identifications in international public genome browsing and analyses databases, including the National Center for Biotechnology Information (NCBI), Ensembl, and University of California, Santa Cruz (UCSC) genome browser. The G10K-VGP consortium has convened more than 150 experts from academia, industry, and government, from over 50 institutions in 12 countries, to develop high-resolution sequencing and genome assembly methods that reduce cost and eliminate errors that plague current reference genomes. The new VGP genomes eliminate many of these errors. For conservation efforts, these VGP genomes will be used to identify species most genetically at risk for extinction, preserving their genetic information for the future and helping to save them from extinction.

One of the species included in the first release is the kakapo, a flightless parrot found only in New Zealand that is on the brink of extinction, with fewer than 150 alive. In partnership with the Kakapo Genetic Rescue Project, G10K Chair Erich Jarvis, professor at Rockefeller University and Howard Hughes Medical Institute Investigator, and his group helped sequence samples from a bird named Jane to create a high-quality assembly that will now become the reference genome for her species. Jane unfortunately died on May 17, 2018, just before the completion of her genome. This first data release of species is being dedicated to Jane and to conservation efforts all over the world to preserve Earth’s biodiversity.

The 15 genomes created through the VGP are a proof of principle demonstrating the strength of the G10K-VGP consortium and the new sequencing technology’s dependability and scalability to sequence all vertebrate genomes. These genomes are currently the most complete versions of their species to date:

Mammals (4 species)
Reptiles (1 species)
Amphibians (1 species)
Birds (3 species, 4 genomes)
Fish (5 species)

Anna's hummingbird, one of three species whose genomes have been read by the Vertebrate Genomes Project. Image credit: Alan Vernon, Flickr

These species represent a large diversity of traits and are used to study species evolution and adaptation.

Over the last three years, the G10K-VGP consortium worked behind the scenes to compare all the major sequencing and analysis technologies on just a few animals to help advance and develop the needed technologies to create higher quality, “platinum-level” genomes. They found, as some others have, that sequencing technologies with long reads always gave higher-quality results than with short reads and that technologies that measure long-range genome interactions are necessary to “assemble” these DNA reads into whole chromosomes. Further, they found that the common practice of merging the paternal and maternal chromosomes (haplotypes) into one genome was causing numerous errors. Therefore, they are now assembling the paternal and maternal DNA of an individual separately (called phasing).

“I got tired of having my students spend months to a year or more, and more money, re-cloning and re-sequencing genes because the current draft genome assemblies were not good enough for our studies of genetics of vocal learning and spoken language in songbirds and humans. So, when I was asked and voted in as G10K Chair, I decided to make it a mission to help generate high-quality genome assemblies for studies using any vertebrate species. The bird genomes are also being generated as part of an associated Bird 10,000 (B10K) genomes project.”

Erich Jarvis, Chair of Genome 10K (G10K), Professor at Rockefeller University, and Howard Hughes Medical Institute Investigator

“The advances in long-read sequencing and long-range scaffolding technologies are revolutionizing de novo DNA sequencing. After a 10-year hiatus, this trend inspired me to return to genome assembly as I believe we will ultimately be able to produce near-perfect, telomere-to-telomere genome reconstructions, and if current cost trends continue, for less than $1,000 on average per vertebrate species, thus dramatically altering the landscape of genomics.”

Gene Myers, a director of the Max-Planck Society in Dresden and well-known bioinformatician, G10K Council member and lead of one of the sequencing hubs

The current Phase 1 genomes are being built with Pacific Biosciences long reads to generate an initial assembly of pieces of chromosomes (called contigs), 10X Genomics linked reads to join them together in bigger pieces (called scaffolds), Bionano Genomics optical DNA maps to link them at a larger scale and correct structural errors in the sequence assembly, Arima Genomics (also Dovetail Genomics and Phase Genomics) Hi-C proximity-ligation data to bring larger pieces together into whole chromosomes, and G10K-VGP genome assembly computer algorithms, which were specifically developed by this consortium and will become useful for all species.

“Until recently, sequencing the complete genome of a single animal required millions of dollars and years of effort. New sequencing technologies have dramatically reduced the cost and made it possible to reconstruct near-perfect genomes for the first time. Despite these advances, the computational challenges of assembling and analyzing thousands of genomes remain. To tackle these remarkable challenges, we have assembled an all-star team of bioinformaticians and are recruiting help from around the world. In addition, our corporate informatics partners at DNAnexus and Amazon Web Services have been instrumental in getting this project off the ground.”

Adam Phillippy, Chair of the Vertebrate Genomes Project (VGP) Assembly Working Group and Head of the Genome Informatics Section at the National Human Genome Research Institute

The G10K-VGP consortium plans to complete the VGP following the taxonomic hierarchy, from Phase 1 representing all 260 orders of living vertebrates, to Phase 2 representing 1,045 families, Phase 3 representing 9,478 genera, and finally Phase 4, representing all of the approximately 66,000 species of vertebrates. Additionally, the VGP will sequence the heterogametic sex where it exists, so that both sex chromosomes can be recovered for each species. The species in Phase 1 are based on a proposed new definition of orders based on species that diverged from each other soon after the last mass extinction event that killed off the dinosaurs 66 million years ago. Studying these ordinal-level species will help scientists determine what type of species survived that mass extinction and inform efforts on how to help species survive the current anthropogenic sixth mass extinction event.

“The last 20 years have proven the value of openly available high-quality reference genome sequences to scientific research, but until now, these have mostly been available just for humans and other key organisms. We are entering an era in which we will obtain reference genome sequences for all species across the Tree of Life. This announcement and data release are key steps towards this goal, for vertebrates, the phylum of animals that we belong to.”

Richard Durbin, of the University of Cambridge and the Wellcome Sanger Institute, G10K Council member and lead of the sequencing hubs

“Today represents a monumental example of what is possible when determined people imagine the future. Working together we have sequenced 15 exquisite genomes from across deep evolutionary time, unique in their quality and perfection, enabling us for the first time to uncover the genetic basis of vertebrate life. Now that we started producing exquisite genomes of all living vertebrate orders at high-quality, imagine doing so for all life. Why not?”

Professor Emma Teeling, University College Dublin, Ireland and Director of the associated Bat 1K Project

"This is a real tour-de-force. We could not have imagined, twenty years ago, that we would ever have genome sequences of more than a handful of animals. Now we have real prospects of solving evolutionary mysteries and charting population health in endangered (even extinct) animals."

Jenny Graves, one of the pioneers of comparative genomics and sex chromosome evolution who was not involved in recent sequencing projects

Notes to Editors

The G10K-VGP leadership consists of a 15-member council, a board of trustees, and 16 subgroups that perform the daily operations of the VGP, including obtaining tissue sample permits, executing DNA extractions, sequencing genomes, performing genome alignments and annotation, and managing the project within and across institutions and countries. The genome sequencing hubs are currently based at the Rockefeller University in New York, led by Olivier Fedrigo and Erich Jarvis; the Sanger Institute in the United Kingdom, led by Richard Durbin and his team including Shane McCarthy and Kerstin Howe; and the Max Planck Institute of Molecular Cell Biology and Genetics in Dresden, Germany, led by Gene Myers and his team including Martin Pippel and Sylke Winkler. The assembly team to which they all belong is led by Adam Phillippy, along with his team members Arang Rhie and Sergey Koren at the NIH. Building on her previous experience assembling and phasing human and animal genomes, Dr. Rhie made a massive effort to help develop a standard assembly process for the VGP. Harris Lewin and his postdoc Joana Damas at UC Davis and others played essential roles in evaluation of assemblies and other stages of the project. The VGP hubs are currently working with major sequencing and assembly companies to further test, improve, and generate new approaches for producing the most complete and error-free reference genomes possible. The G10K-VGP has an open-door policy for scientists and others who want to join, so long as they follow the G10K policies.

Approximately $600 million is needed to complete all VGP phases. The G10K-VGP is currently focused on completing Phase 1 through crowdsourcing among scientists, having so far raised $2.5 million of the $6 million needed for this phase. For members of the public who wish to help support the project, or even sponsor a species, more information is available at https://vertebrategenomesproject.org/ways-to-help-1/. Financial gifts to the G10K-VGP can be donated at https://giveandjoin.rockefeller.edu/vgl-donate.

Contact the Press Office

Dr Samantha Wynne, Media Officer

Tel +44 (0)1223 492 368

Emily Mobley, Media Officer

Tel +44 (0)1223 496 851

Wellcome Sanger Institute,
Hinxton,
Cambridgeshire,
CB10 1SA,
UK

Mobile +44 (0) 7900 607793
