The Sanger Institute is developing a major programme in biological diversity genome sequencing across the tree of life. One of the driver projects for this programme is our major collaborative role in the international Vertebrate Genomes Project (VGP).

The Vertebrate Genomes Project (VGP) is a project of the Genome 10K (G10K) consortium. The mission of the VGP is to provide high-quality genome assemblies of all vertebrate species to address fundamental questions in biology and disease. The current Phase 1 of the project aims to create reference-quality, near-gapless, near-error-free, chromosome-level, haplotype-phased assemblies of representative species for all ~260 vertebrate orders. Alongside this, additional species are being sequenced to the same standard under the VGP umbrella to provide greater phylogenetic depth in certain clades of high research interest, paving the way for Phases 2 and 3 of the VGP.

The Sanger Institute is serving as one of the key hubs of the VGP. Since 2017 we have focused on the sequencing, assembly generation and subsequent analysis of fish, caecilian amphibians and rodents. Alongside order representatives from these groups, we currently focus on cyprinid, certain cichlid, notothenioid and anabantoid fishes, as well as select rodents and caecilians. Other hubs include the Vertebrate Genomes Lab at the Rockefeller University, the Max Planck Institute for Cellular and Molecular Biology, and the Genome Informatics Section at the National Human Genome Research Institute.
We are also sequencing some additional vertebrate genomes as part of other projects and collaborations, including the Sanger 25 Genomes Project.


Species to be sequenced are selected with collaborators, and flash-frozen material is transferred to the Sanger Institute at -80 degrees Celsius. A variety of DNA extraction, sequencing and assembly technologies are then combined, currently including Pacific Biosciences, Oxford Nanopore and 10X Chromium sequencing, BioNano optical mapping, and Hi-C chromosome cross-linking.

Data use policy

The VGP releases sequence data, assemblies, SNPs and other variant calls, including those generated at Sanger, as a service to the research community. These data are released under the G10K Data Use Policy. Raw data from the Sanger Institute and some assemblies are also available under Sanger Institute policies. The G10K-VGP members, including those at Sanger, reserve the right to first publication of a genome-wide analysis of the data we have generated, including the use of genome-wide data for phylogenetic and evolutionary analysis, on behalf of ourselves as data producers, the sample providers and other collaborators. We strongly urge researchers to contact us at the email address below, or the G10K-VGP Chair, Erich Jarvis, with any queries about referencing or publishing analyses based on pre-publication data from the VGP.

If you have a query about using the project data in your studies or publications, we are happy to answer any queries and can be contacted at and


The spreadsheet lists species in the sequencing and assembly pipeline, together with the progress made.
