Information Communications Technology
In IT we run one of the world’s largest life sciences data centres. The data centre currently provides about 64 PB of storage capacity and 38,000 processing cores. Storage use grows by roughly 5 PB a year from the analysis performed by our scientists, plus 2 PB a year (about 5 TB a day) from the sequencers.
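As a rough check on these growth figures, the short Python sketch below converts the yearly rates into daily ones. It assumes decimal storage units (1 PB = 1000 TB) and a 365-day year, neither of which is stated above, and the function and variable names are purely illustrative.

```python
# Rough conversion of the quoted growth rates; a sketch, not an official calculation.
# Assumptions: decimal units (1 PB = 1000 TB) and a 365-day year.
TB_PER_PB = 1000
DAYS_PER_YEAR = 365


def pb_per_year_to_tb_per_day(pb_per_year: float) -> float:
    """Convert a growth rate in PB per year to TB per day."""
    return pb_per_year * TB_PER_PB / DAYS_PER_YEAR


sequencer_tb_per_day = pb_per_year_to_tb_per_day(2)  # raw data from the sequencers
analysis_tb_per_day = pb_per_year_to_tb_per_day(5)   # output of scientific analysis
total_pb_per_year = 2 + 5                            # combined yearly growth in PB

print(f"Sequencers: {sequencer_tb_per_day:.1f} TB/day")
print(f"Analysis:   {analysis_tb_per_day:.1f} TB/day")
print(f"Combined:   {total_pb_per_year} PB/year")
```

Under these assumptions the sequencer rate comes out at roughly 5.5 TB a day, consistent with the rounded figure of 5 TB a day quoted above.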
The IT infrastructure at the Sanger Institute is one of the most extensive in the life sciences. Every day we serve data to researchers across the globe; every week our web pages receive 80,000 page views across more than 50 web domains.
At the turn of the century the Sanger Institute had just finished a big push to produce its share of the Human Genome Project, generating DNA sequence for public release. It was a major scientific endeavour, and it posed significant challenges for the IT infrastructure throughout the project.
Now, with the tremendous sequencing capacity of emerging next-generation technologies, the IT infrastructure continues to grow dramatically and adapt to the Institute’s scientific needs.
The high performance compute facility (supercomputer) runs about two million tasks (programs / jobs) a week to cater for the research programmes and sequencing production pipelines. It can be thought of as one very large computer which is roughly 20,000 times larger than the average PC.
This is a fantastic tool to perform science. Our aim has always been to make this Big Data facility as easy to use as possible, so that our scientists can quickly extract the information they are looking for.
Discussions with all other large-scale genome sequencing centres are integral to maintaining and improving our IT infrastructure. We must address the particular challenges posed by the explosion of genetic sequence data, and we are working with other centres to investigate international models for future data sharing, such as the Global Alliance for Genomics and Health.
To that end we are tenants in the Jisc Shared Data Centre, a collaborative effort between the Sanger Institute and Jisc, University College London (UCL), King’s College London, Queen Mary University of London (QMUL), the Francis Crick Institute and others. We keep a second copy of all of our sequencing data in this facility.
The same facility also hosts eMedLab, a collaborative project for scientific computing in a cloud services environment based on OpenStack. eMedLab is a collaboration between UCL, the Francis Crick Institute, the Sanger Institute, the European Bioinformatics Institute (EBI), the London School of Hygiene and Tropical Medicine, QMUL and others, with operational responsibility shared between UCL, the Crick and the Sanger Institute.
The shape of our IT infrastructure will change dramatically in the future. Large-scale collaborative science and an extremely diverse software landscape are driving us towards a more cloud-services-oriented approach over the next few years, allowing scientists from other organisations to run their own bespoke analyses against our data, and vice versa.
Likewise, the advent of genomics within the clinical space increases our requirements for security, validation and resilience. Meeting these needs without sacrificing the flexibility required by the Institute’s cutting-edge research science is our key challenge over the next few years.
Human Genetics Informatics (HGI)
Human Genetics Informatics (HGI) supports the scientific aims of the Human Genetics programme by developing and operating computational analysis workflows, managing ...
Informatics Support Group
High Performance Computing
Our Informatics Support team is responsible for developing and providing scale-out scientific compute platforms that can both meet today's ...
New Pipeline Group (NPG)
DNA Pipelines Informatics
NPG is responsible for DNA Pipelines' production informatics analysis pipelines, Illumina sequencing QC tools and expertise, and internal archiving of ...
Production Software Development
LIMS compute and infrastructure
The Production Software Development team is responsible for core operations LIMS development in DNA Pipelines and Cellular Genetics within the Wellcome ...
Sequence Analysis and Management (SAM)
SAM contributes to various software packages for processing DNA sequence data, including samtools, htslib, biobambam and the Staden package. We also ...
Stem Cell Informatics
Stem Cell Informatics (SCI) develops custom laboratory information management systems (LIMS) and computational research tools (WGE) for high-throughput laboratory analysis of human ...