15 August 2014

Samtools CRAMS in support for improved compression formats

Key upgrade to genomics software will underpin global data sharing

Samtools 1.0 is freely available at http://www.htslib.org/. This new version supports the highly efficient genomic data format CRAM, adds new functionality, and integrates more cleanly with other tools. [Genome Research Limited]

Computer scientists at the Wellcome Trust Sanger Institute have released a major upgrade of Samtools, one of the most popular next-generation sequence analysis tools. The revised Samtools 1.0 enables researchers to easily compress, share and analyse genomic sequence data, reducing costs and supporting genomics research around the world.

The Global Alliance for Genomics and Health, in which the Sanger Institute is a partner, has been set up to enable researchers and clinicians to work together using standardised and efficient DNA sequence data formats to find the genetic variants responsible for disease. Samtools 1.0 supports this initiative by enabling researchers to read and write data in the new CRAM format, which was recently adopted by the Global Alliance, in addition to the existing SAM and BAM file formats for genomic sequence information.

The benefits of using CRAM are immediate: it gives a size reduction of 10-30 per cent. In addition, much as the JPEG format does for images, CRAM supports far greater compression, up to a hundredfold, in 'lossy' mode, which preserves almost all of the important information.

"This major rebuild of Samtools reflects our commitment to supporting the global use of sequencing data," says Dr Richard Durbin, Head of Computational Genomics at the Sanger Institute. "Genome science worldwide relies on fast and efficient data analysis and storage, and Samtools 1.0 fulfils this need by supporting new sequencing and analysis technologies."

Samtools software is embedded in many bioinformatics pipelines and is the foundation of many thousands of genomic research papers. Since its creation in 2009, the program has been downloaded more than 225,000 times. Samtools 1.0 is freely available at http://www.htslib.org/. This new version was substantially rewritten to support the highly efficient genomic data format CRAM, add new functionality, and integrate more cleanly with other tools.

"Samtools 1.0 embeds CRAM into genomic data analysis pipelines and removes the need for additional processing," says John Marshall, from the Sanger Institute. "This development paves the way for widespread uptake of this highly efficient file format in genomic research and will lead to lower storage costs."

The significant storage savings come from data compression techniques developed jointly by the Sanger Institute and the EMBL-European Bioinformatics Institute.

"It has been exciting to work on implementing CRAM into Samtools," says James Bonfield, at the Sanger Institute. "The great flexibility of CRAM has allowed a number of new compression techniques to be incorporated, which when combined with Samtools 1.0 will help to future-proof genomic data storage and analysis."

Notes to Editors

The Global Alliance for Genomics and Health

The Global Alliance for Genomics and Health is an international, non-profit alliance formed to help accelerate the potential of genomic medicine to advance human health. Bringing together over 150 leading institutions working in healthcare, research, disease and patient advocacy, life science, and information technology, partners in the Global Alliance are working together to create a common framework of standards and harmonized approaches to enable the responsible, voluntary, and secure sharing of genomic and clinical data.

The Wellcome Trust Sanger Institute

The Wellcome Trust Sanger Institute is one of the world's leading genome centres. Through its ability to conduct research at scale, it is able to engage in bold and long-term exploratory projects that are designed to influence and empower medical science globally. Institute research findings, generated through its own research programmes and through its leading role in international consortia, are being used to develop new diagnostics and treatments for human disease.

The Wellcome Trust

The Wellcome Trust is a global charitable foundation dedicated to achieving extraordinary improvements in human and animal health. We support the brightest minds in biomedical research and the medical humanities. Our breadth of support includes public engagement, education and the application of research to improve health. We are independent of both political and commercial interests.

Contact the Press Office

Don Powell, Media and Public Relations Manager
Wellcome Trust Sanger Institute, Hinxton, Cambs, CB10 1SA, UK

Tel +44 (0)1223 496 928
Mobile +44 (0)7753 775 397
Fax +44 (0)1223 494 919
Email press.office@sanger.ac.uk

* quick link - http://q.sanger.ac.uk/4ssxa3pf