We help the Human Genetics faculty groups evaluate and access the best methods to process and absorb the huge amounts of sequencing data produced by modern studies at Sanger. In practice, this means creating, testing and running variant-calling pipelines, RNA-seq pipelines and annotation pipelines on cohorts of tens of thousands of genomes, exomes and transcriptomes. All this requires superb understanding and control of:
- Sanger’s High Performance Compute architecture (many server farms with thousands of cores)
- Sanger’s OpenStack Flexible Compute Architecture and how to deploy into it reliably
- Frameworks to run biological pipelines (e.g. Cromwell, Nextflow)
- Frameworks to store, annotate, filter and analyse very large amounts of genomic variant data (e.g. Hail, or more experimental software such as Tachyon)
- Tools to help us view and account for our storage and processing
We rely on Sanger's systems teams, and cooperate extensively with Sanger core teams and other Sanger program informatics teams such as Cancer Informatics and Cell Gen Informatics to share experience and practice.
We aim to deliver data in a reliable way and continuously improve how and what we deliver. One thing is clear: we can no longer hand over VCF files and call it a day! Interested? Come talk to us!