Tim is responsible for the provision of all the IT services which specifically deliver scientific data and computation. This includes:
- High performance computing (HPC) clusters
- High performance parallel filesystems for data analysis
- Training in the use of HPC resources
- Data management systems for scientific data
- Web services for the delivery of scientific content
- Research and development of IT technologies for solving tomorrow's scientific problems
He has a particular bee in his bonnet about using resources efficiently.
The changing focus of the Institute has led to us concentrating on a few areas over the next year or two:
- Validated IT systems. The advent of personalised medicine and clinical genomics means we must provide some validated infrastructure for projects in these areas.
- Pre-configured systems. To free the team's time to assist scientists with their computational challenges we are moving, where possible, to pre-configured hardware systems rather than building them by hand. One area where we are following this approach is Lustre parallel filesystems. This becomes easier as these technologies gradually become more mature and mainstream.
- Flexible secure architecture. Science changes all the time, as do the approaches to it. IT technology generally lasts 5 years, so we must deploy solutions with maximum flexibility to adapt to unknown requirements years from now. We are actively developing private cloud infrastructure with software-defined storage and networking to enable scientists to deploy experimental working environments rapidly and without adversely impacting their colleagues.
- Collaboration. We are actively involved in the Global Alliance for Genomics and Health, the Pan-cancer project, and eMedLab, all of which are building infrastructures for the sharing and collaborative analysis of large volumes of data.
During my previous role creating and leading the Infrastructure Management Team, I focussed on a number of areas:
- Server virtualisation and consolidation. Our previous strategy of buying traditional highly available server pairs for providing services was replaced with a virtualisation strategy, going from zero to more than 1000 virtual machines in the course of a couple of years. Virtualisation is now moving from this established base towards use in computational and data sharing areas.
- Automation. Configuration management of our Linux and Windows estates to minimise the effort in administering thousands of machines; reorganising our Active Directory according to best practices, deploying Munki for managing Macintoshes, and adopting cfengine across the Institute for Linux systems.
In the Informatics Systems Group, I was an enthusiastic early adopter of blade server technology, deploying our first blade cluster in 2002, squeezing a then-unprecedented 768 cores into just two 19" racks.
I also took part in our early work with parallel filesystems, deploying IBM GPFS for the storing of large shared datasets.
Incyte Genomics (formerly Hexagen)
Here I gained experience of being a customer of scientific IT services, and learned the skills and techniques needed to run large-scale analysis efficiently on limited IT resources.
I developed and ran the company's SNP-calling pipeline, and visualised that data for the senior company scientists.
I was also involved in customer engagement, discussing customers' data needs, and contributed to the Perl framework used by the LifeSeq product to generate and deliver the data to customers.
PhD: Checkpoint Controls in the Latter Half of the Mammalian Cell Cycle
CRC DNA Repair Research Group
Dept of Zoology, University of Cambridge
My research was focussed on the contrasting behaviour of human and rodent transformed cell lines with respect to their G2/M phase checkpoints following perturbation of their nucleotide pool levels during S-phase.
Mammalian S-phase checkpoint integrity is dependent on transformation status and purine deoxyribonucleosides.
Journal of Cell Science 2000;113(Pt 6):1089-96
The Ensembl computing architecture.
Genome Research 2004;14(5):971-5