11 May 2012

Institute researcher is sequence-squeezing champ

James Bonfield wins worldwide competition to speed up access to genetic data

James Bonfield has won the Pistoia Alliance Sequence Squeeze competition to produce a more efficient and accurate way to compress next-generation DNA sequencing information. [Genome Research Limited]

Wellcome Trust Sanger Institute researcher James Bonfield has won the $15,000 Pistoia Alliance Sequence Squeeze prize for creating the best ways to compress genetic data efficiently. His work will help to speed the sharing of genetic information around the world, and he couldn't have done it without the help of his competitors.

"My programs would have been substantially weaker had I not had the challenge of my fellow competitors. The mix of competition and open discussion really produced amazing results," James said. "But, perhaps the most exciting thought is where this work will go next. Several entrants shared ideas with each other and I suspect that we can produce an even better solution by combining the best parts from each of our entries."

This competition was created by the Pistoia Alliance - a precompetitive alliance of research groups, pharmaceutical companies and scientific societies seeking to improve worldwide genetic research by solving the problems that all researchers in the field face. The aim was to drive the creation of solutions to one of the most pressing problems in genetic research today: the storage and sharing of the vast volumes of genetic data that researchers need to find disease-causing gene variants.

" High-speed sequencing [is] opening up the genetic study of disease in incredible depth...[but] current storage solutions are struggling to cope, which is why James' work is so vital. It literally reduces the size of the problem. "

Tony Cox

"The latest high-speed sequencing machines are opening up the genetic study of disease and biological pathways in incredible depth because they allow hundreds or thousands of genes or genomes to be read and compared," said Tony Cox, Head of Operation Production Software and Informatics at the Sanger Institute. "However, this major leap forward is creating mountains of data that need to be stored and distributed around the world. Current storage solutions and internet transfer methods are struggling to cope, which is why James' work is so vital. It literally reduces the size of the problem."

The competition itself demonstrated a novel way to drive innovation through its open and interactive setup. The Alliance encouraged continual improvement by posting a leaderboard that showed, day by day, which entrant had produced the most efficient approach. In addition, the collaborative nature of the competition saw entrants sharing their problems and ideas on a variety of discussion forums.

"Seeing my entry being beaten by others spurred me on to improve my code again and again," James said. "Forums, such as encode.ru, had numerous and surprisingly open discussions on ideas, particularly from respected programmer Matt Mahoney, who went as far as to post code snippets. The views on that thread gave me ideas for improving my own program, so the final outcome was better than if I had worked purely in isolation."

Out of the more than 100 entries, James' solutions were judged to be the best overall for compressing the avalanche of information produced by the latest high-speed sequencing machines into forms that can be easily stored and transferred across the internet. The judging panel evaluated the approaches on their ability to:

  • Squeeze the data into the smallest possible space (have the highest compression ratio)
  • Achieve this in the shortest possible time (fastest compression time)
  • Allow others to unpack the compressed data as quickly as possible for use (fastest decompression time)
  • Use the least amount of computing memory to compress and decompress the data
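
As a rough illustration of how criteria like these can be measured, the sketch below scores a compressor on one input. It is not the competition's scoring code: the function name `score_codec`, the sample record and the use of Python's standard `zlib` module as a stand-in compressor are all assumptions made for the example, and the memory figure is only what Python's `tracemalloc` can observe.

```python
import time
import tracemalloc
import zlib


def score_codec(data: bytes) -> dict:
    """Measure ratio, timings and traced memory for one input.

    zlib is only a stand-in compressor (an assumption for this sketch);
    it is not any entrant's program.
    """
    tracemalloc.start()

    # Time the compression step.
    start = time.perf_counter()
    packed = zlib.compress(data, 9)
    compress_seconds = time.perf_counter() - start

    # Time the decompression step.
    start = time.perf_counter()
    unpacked = zlib.decompress(packed)
    decompress_seconds = time.perf_counter() - start

    # Peak memory as seen by tracemalloc; an approximation only.
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    # The round trip must return exactly the original bytes.
    assert unpacked == data

    return {
        "ratio": len(data) / len(packed),  # higher means smaller output
        "compress_seconds": compress_seconds,
        "decompress_seconds": decompress_seconds,
        "peak_traced_bytes": peak_bytes,
    }


# Made-up, FASTQ-like record repeated many times (hypothetical data).
sample = b"@read1\nACGTACGTACGTACGT\n+\nIIIIIIIIIIIIIIII\n" * 1000
result = score_codec(sample)
print(result)
```

The timings and memory figure will vary from machine to machine, so a fair comparison between compressors would run every candidate on the same data and hardware.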

James' algorithms scored highly on the first three criteria and ensured that alignment data was preserved to allow genetic sequences to be put together quickly and efficiently.

James will be giving half of his prize money to the British Heart Foundation.

"We are delighted for James that his work has been recognised in this way," said Emma Millican, Head of DNA Pipelines and responsible for sequencing at the Sanger Institute. "We hope that his efforts will benefit the Institute and genetic researchers around the world for years to come."

Notes to Editors

The Pistoia Alliance

The Pistoia Alliance is a global, not-for-profit, precompetitive alliance of life science companies, vendors, publishers, and academic groups that aims to lower barriers to innovation by improving the interoperability of R&D business processes.

The Wellcome Trust Sanger Institute

The Wellcome Trust Sanger Institute is one of the world's leading genome centres. Through its ability to conduct research at scale, it is able to engage in bold and long-term exploratory projects that are designed to influence and empower medical science globally. Institute research findings, generated through its own research programmes and through its leading role in international consortia, are being used to develop new diagnostics and treatments for human disease.

The Wellcome Trust

The Wellcome Trust is a global charitable foundation dedicated to achieving extraordinary improvements in human and animal health. We support the brightest minds in biomedical research and the medical humanities. Our breadth of support includes public engagement, education and the application of research to improve health. We are independent of both political and commercial interests.

Contact the Press Office

Don Powell, Media and Public Relations Manager
Wellcome Trust Sanger Institute, Hinxton, Cambs, CB10 1SA, UK

Tel +44 (0)1223 496 928
Mobile +44 (0)7753 775 397
Fax +44 (0)1223 494 919
Email press.office@sanger.ac.uk

* quick link - http://q.sanger.ac.uk/120511