Sanger’s cloud computing wins award
Sanger’s use of cloud computing has been recognised by HPCwire, the journal of high performance and data-intensive computing. The IT team won the Readers’ Choice award for Best Use of High Performance Computing in the Cloud.
The high performance computing community cited the way that the Sanger Institute is using a private OpenStack cloud IT environment to enable the sequencing and assembly of 100 complete human genomes a day. The flexible computing space is a vital part of the pipeline developed at Sanger to deliver the UK Biobank Vanguard project. This ambitious pilot project will sequence 50,000 genomes from volunteers whose DNA samples are stored with UK Biobank, and the techniques and pipelines developed will pave the way for further large-scale projects.
“I’m delighted for the teams, who thoroughly deserve this award. Their creativity and dedication to overcoming the unique challenges posed by storing and analysing genomic data means that we are not only able to cope with, but can also explore at scale, the vast volume of data being generated by our sequencing operation, and to deliver that capacity quickly. Winning the Readers’ Choice award is especially humbling as it means that our peers worldwide acknowledge the groundbreaking work of our scientific computing and informatics teams.”
Tim Cutts, Head of Scientific Computing at the Wellcome Sanger Institute
To enable Sanger’s high-throughput sequencing teams to store, assemble and analyse the data of more than 100 genomes a day at 30X coverage, the High Performance Computing Team drew on the flexibility and scalability of cloud computing. Because a person’s genome is one of the most personal and sensitive datasets that can be stored, the team built an onsite instance of the OpenStack cloud computing environment, combining the flexibility of cloud computing with the security of onsite management behind the Institute’s enterprise-grade firewall.
"The high bandwidth and data locality, coupled with large scale-out performance that our on-site OpenStack cloud supplies, allows us to flexibly adapt and dynamically scale our compute to meet emerging scientific challenges. This enabled us to rapidly deploy the infrastructure required to sequence up to 3,000 human genomes per month, with the capability of further growth in capability as needed."
This implementation will allow high-quality whole genome sequences to be assembled, stored and analysed at scale.