Information Communications Technology
Our goal is "To provide World Class High Performance Computing and First Class Production Platforms and Services for genome and biodata research."
In IT we run one of the world’s largest Life Sciences Data Centres. Currently, the Data Centre contains about 64PB of storage capacity and has 38,000 processing cores. We are adding to this at roughly 5PB a year from the analysis performed by the scientists, plus 2PB a year (about 5TB a day) from the sequencers.
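The growth figures quoted here can be sanity-checked with a short calculation. The sketch below is illustrative only: it assumes decimal units (1PB = 1000TB) and a 365-day year, and the function and variable names are hypothetical rather than part of any Sanger tooling.

```python
# Rough check of the storage growth figures quoted in the text.
TB_PER_PB = 1000      # assumption: decimal units (1 PB = 1000 TB)
DAYS_PER_YEAR = 365   # assumption: non-leap year


def pb_per_year_to_tb_per_day(pb_per_year: float) -> float:
    """Convert a yearly growth rate in PB to a daily rate in TB."""
    return pb_per_year * TB_PER_PB / DAYS_PER_YEAR


analysis_pb_per_year = 5    # from the text: output of scientists' analyses
sequencer_pb_per_year = 2   # from the text: output of the sequencers

sequencer_tb_per_day = pb_per_year_to_tb_per_day(sequencer_pb_per_year)
total_growth_pb_per_year = analysis_pb_per_year + sequencer_pb_per_year

print(f"Sequencers: about {sequencer_tb_per_day:.1f} TB/day")
print(f"Total growth: about {total_growth_pb_per_year} PB/year")
```

Under these assumptions, 2PB a year works out to a little over 5TB a day, consistent with the figure quoted, and combined growth is about 7PB a year.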
The IT infrastructure at the Sanger Institute is one of the most extensive in the life sciences. Every day we serve data to researchers across the globe; every week our web pages serve 80,000 page views across more than 50 web domains.
At the turn of the century the Sanger Institute had just finished a big push to produce its share of the Human Genome Project, generating DNA sequence for public release. It was a major scientific endeavour and throughout the project provided significant challenges for the IT infrastructure.
Now, with the tremendous sequencing capacity of emerging next-generation technologies, the IT infrastructure continues to grow dramatically and adapt to the Institute's scientific needs.
The high performance compute facility (supercomputer) runs about two million tasks (programs / jobs) a week to cater for the research programmes and sequencing production pipelines. It can be thought of as one very large computer which is roughly 20,000 times larger than the average PC.
This is a fantastic tool to perform science. Our aim has always been to make this Big Data facility as easy to use as possible, so that our scientists can quickly extract the information they are looking for.
Discussions with all other large-scale genome sequencing centres are integral to maintaining and improving our IT infrastructure. We must address the particular challenges posed by the explosion of genetic sequence data, and we are working with other centres to investigate international models for future data sharing, such as the Global Alliance for Genomics and Health.
To that end we are tenants in the Jisc Shared Data Centre, a collaborative effort between the Sanger Institute and Jisc, University College London (UCL), King's College London, Queen Mary University of London (QMUL), the Francis Crick Institute and others. We keep a second copy of all of our sequencing data in this facility.
The same facility also hosts eMedLab, a collaborative project for scientific computing in a cloud services environment based on OpenStack. eMedLab is a collaboration between UCL, the Francis Crick Institute, the Sanger Institute, European Bioinformatics Institute (EBI), the London School of Hygiene and Tropical Medicine, QMUL and others, with operational responsibility shared between UCL, Crick and Sanger.
The shape of our IT infrastructure will change dramatically in the future. Large-scale collaborative science and an extremely diverse software landscape are driving us towards a more cloud-services-oriented approach over the next few years, allowing scientists from other organisations to run their own bespoke analyses against our data, and vice versa.
Likewise, the advent of genomics within the clinical space increases our requirements for security, validation and resilience. Meeting these needs while not sacrificing the flexibility required by the Institute’s cutting-edge research science is our key challenge over the next few years.
Director of ICT
Paul Woobey is the Director of ICT and leads the IT facility, which provides high-performance IT infrastructure (HPC) and Enterprise computing to support the large-scale science performed at the Sanger Institute. Our HPC (supercomputing) environment is recognised as World Class and we aim to provide First Class Production Platforms and Services for the Institute. However, we do recognise that the next five years will bring a number of challenges for IT at the Sanger Institute. The first is catering for the increasing scale of science made possible by decreasing costs and higher throughput from sequencing, which is predicted to double our data and computational requirements in the next five years. Our data centre is 90% full, so we need to be prepared to open the 4th quadrant (3 are used currently) early in the next quinquennium to cope with growth. The second is supporting spin-out and bio-incubator organisations, in line with our ambition to offer our IT expertise and experience to these organisations under the banner of “Science as a Service”. The third is collaborating with our peers around the world amidst changing regulatory and validation requirements.
Human Genetics Informatics (HGI)
Human Genetics Informatics (HGI) supports the scientific aims of the Human Genetics programme by developing and operating computational analysis workflows, ...
Informatics Support Group
High Performance Computing
Our Informatics support team is responsible for both developing and providing scale out scientific compute platforms that can both meet ...