Human Genetics Informatics (HGI)

Sanger Institute, Genome Research Limited

Our Research and Approach

All of the Human Genetics faculty groups use computers to process data and carry out analyses. Some analyses involve relatively small data sets that could be analysed on a laptop or desktop computer. Many others, however, involve vast amounts of data and intensive processing that would take many years (or even centuries) to run on a single machine. To stay at the cutting edge of research in this field, our researchers use large computational clusters made up of hundreds of individual computers, which together have the processing power of tens of thousands of laptop or desktop computers. By running their analyses on many machines simultaneously, they can complete work in a single day that might have taken 10 years on a single machine.

The Informatics Support Group maintains the large-scale computational clusters that we use to run these analyses, while the Human Genetics Informatics (HGI) team looks after the computational needs that are shared across Human Genetics faculty groups. For example, we:

  • install and maintain specialised analysis software used by researchers to carry out their analyses.
  • manage shared data storage.
  • develop and operate computational workflows for pre-analysis processing of human genetics data sets.

The computational workflows that we run can be very complicated, involving hundreds or even many thousands of individual steps. Each step needs to be able to access its input data and pass its output along to the next step. It is also important not to overload any of the individual computers involved by giving them more work than they can handle, because overloading them tends to make them less efficient. Since we share our computers with other groups at the institute, fewer computational resources will sometimes be available because researchers in other groups are using them.

At the same time, we would ideally like to be able to reliably recreate the same output data each time we repeat a particular analysis, so that we do not have to keep every piece of data we generate stored. (In science it is generally important to be able to refer back to the data that supported a result, so if we cannot regenerate it we have no choice but to store it indefinitely.) This means isolating the software environment that runs each step of our workflows as much as possible, so that each step can be run in exactly the same way every time. It is these competing interests that ultimately make running large-scale computational workflows a non-trivial task. For this reason it is useful for HGI to develop expertise in handling this work centrally rather than distributing it to individual researchers within the faculty teams.
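To make the first of these constraints concrete, here is a toy sketch of running a workflow as a dependency graph. Each step starts only once the steps it depends on have produced their outputs, and no more than a fixed number of steps run at once. This is an illustration under stated assumptions, not HGI's actual pipeline code or the behaviour of any particular workflow system. The function name `run_workflow`, its `max_parallel` limit (a local stand-in for "not overloading any one computer") and the step names in the usage example are all hypothetical.

```python
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter
from typing import Callable, Dict, Iterable

# A step receives a dict mapping each dependency's name to that dependency's output.
Step = Callable[[Dict[str, object]], object]


def run_workflow(
    steps: Dict[str, Step],
    deps: Dict[str, Iterable[str]],
    max_parallel: int = 4,
) -> Dict[str, object]:
    """Hypothetical sketch: run every step once all of its dependencies have
    finished, never running more than max_parallel steps at the same time.
    Every name listed in deps must also be a key of steps."""
    # TopologicalSorter expects a mapping of node -> set of predecessors;
    # prepare() raises graphlib.CycleError if the workflow contains a cycle.
    sorter = TopologicalSorter({name: set(deps.get(name, ())) for name in steps})
    sorter.prepare()

    results: Dict[str, object] = {}
    running: Dict[Future, str] = {}
    # The thread pool size caps how many steps can execute concurrently.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        while sorter.is_active():
            # Submit every step whose inputs are now all available.
            for name in sorter.get_ready():
                inputs = {d: results[d] for d in deps.get(name, ())}
                running[pool.submit(steps[name], inputs)] = name
            # Block until at least one running step finishes, then record its
            # output so that downstream steps can become ready.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                name = running.pop(fut)
                results[name] = fut.result()  # re-raises if the step failed
                sorter.done(name)
    return results


if __name__ == "__main__":
    # Hypothetical four-step workflow: two steps fan out from "align",
    # and "report" waits for both of them.
    steps = {
        "align": lambda inp: "aligned",
        "call": lambda inp: inp["align"] + "+called",
        "qc": lambda inp: inp["align"] + "+qc",
        "report": lambda inp: sorted([inp["call"], inp["qc"]]),
    }
    deps = {"call": ["align"], "qc": ["align"], "report": ["call", "qc"]}
    out = run_workflow(steps, deps, max_parallel=2)
    print(out["report"])  # ['aligned+called', 'aligned+qc']
```

A real cluster scheduler also has to account for resources that other groups are using at the same time, and the second constraint, reproducibility, is usually addressed by pinning each step's software environment (for example in a container image). Neither is modelled in this sketch.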


Dr Joshua C. Randall
Group Leader

Joshua leads the Human Genetics Informatics team, who are responsible for handling the informatics needs of the Human Genetics faculty.

Emyr James
Principal Systems Administrator / Principal DevOps Engineer

HGI develops all of our software in the open in our public GitHub repositories and, in accordance with Sanger's software policy, all of it is available under a free and open source licence. A selection of potentially useful software tools is listed below. We are also part of several large collaborations, including the MRC-funded medical informatics infrastructure projects eMedLab and UMIC.

Within Sanger, we work closely with the data production teams who supply the input to our processing pipelines, the systems teams who manage our infrastructure, and other informatics teams with whom we collaborate on some development projects. In addition, we work with researchers in the faculty teams to assist them with their informatics needs. We also have an industrial collaboration with Curoverse, Inc., in which we are trialling the use of Arvados, an open-source system for scientific workflow management.
