Next-generation sequencing applies glass micro-chip-based methods and small-volume liquid handling (microfluidics) to sequence DNA more quickly and more cheaply than ever before: thousands of times less costly than the technology used to sequence the first human genome just a few years ago. These methods rely on reacting millions of molecules simultaneously in a single vessel and analysing those molecules in parallel on a single chip using a state-of-the-art optical detection instrument. A further increase in speed and a decrease in cost are attained by running multiple instruments concurrently.
Our current fleet of sequencing instruments includes the HiSeq-X platform, 13 HiSeq 2500s and 6 MiSeqs, which enables us to support ambitious research projects in genomic medicine.
The advent of next-generation technologies has fuelled an explosion in the quantity of raw DNA sequence that can be generated by a reasonably sized genomics facility. Compared to about 800 million bases per week generated at the Sanger Institute at the height of the human genome project using conventional capillary electrophoresis methods, the Illumina production facility currently averages about 300 billion bases (300 gigabases, Gb) per week. This translates into roughly 5,000 human-genome equivalents per year, given the approximately 3 billion bases in a human genome.
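The arithmetic behind these figures can be checked with a short sketch. It assumes 52 production weeks per year and counts raw bases only (one genome's worth of sequence, not a genome sequenced to any particular depth); the function and constant names are illustrative.

```python
# Figures quoted in the text.
CAPILLARY_BASES_PER_WEEK = 800e6   # Sanger Institute, height of the human genome project
ILLUMINA_BASES_PER_WEEK = 300e9    # current Illumina production average
HUMAN_GENOME_BASES = 3e9           # approximate size of one human genome

# Assumption (not stated in the text): production runs 52 weeks a year.
WEEKS_PER_YEAR = 52


def genome_equivalents_per_year(bases_per_week: float) -> float:
    """Raw bases produced in a year, divided by the size of one human genome."""
    return bases_per_week * WEEKS_PER_YEAR / HUMAN_GENOME_BASES


def fold_increase(new_rate: float, old_rate: float) -> float:
    """How many times larger the new weekly output is than the old one."""
    return new_rate / old_rate


if __name__ == "__main__":
    print(genome_equivalents_per_year(ILLUMINA_BASES_PER_WEEK))
    print(fold_increase(ILLUMINA_BASES_PER_WEEK, CAPILLARY_BASES_PER_WEEK))
```

Under these assumptions the yearly figure comes out slightly above the rounded 5,000 quoted in the text, and the weekly output is a few hundred times that of the capillary era.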
Institute faculty are translating this enormous capacity into new scientific endeavours, tackling exciting new genomics projects. Researchers here are cataloguing what makes cancer cells dangerous down at the level of individual genetic changes, how and why pathogens like malaria evolve to be more (or less) harmful and how humans adapt to those changes. Metagenomics is the study of the sequences of large populations of different organisms all growing in a common environment (for example seawater, soil or the human gut), and these studies are made vastly easier by next-generation sequencing. We are looking at how human (and mouse) genomes vary between individuals to help understand how genetics plays a role in the risk, generation, prognosis and treatment of disease.
The Solexa/Illumina technology is based on amplified single-molecule arrays - many millions of single molecules of DNA are placed onto a glass chip and each of those molecules is amplified in situ to form localised colonies or clusters of DNA. Each element of a cluster is virtually identical to its neighbours and thereby the signal from a single molecule is increased linearly to give robust, reliable detection. Sequencing-by-synthesis is carried out on all these clusters simultaneously, using fluorescent reversible terminators, which allow one and only one nucleotide to be added to a growing strand in a single cycle of sequencing.
After incorporation of the terminators, the instrument images the chip and distinguishes the four different terminators (A, C, G and T) by the unique fluorescent dye attached to each, using two different lasers (red and green) and four different optical filters. After imaging one small part of the chip, the instrument continues scanning over the 960 imaging tiles. The last step of the cycle is removal of the fluorescent group and reversal of the termination, allowing the next single base to be sequenced in the subsequent cycle. One complete cycle of chemistry and imaging typically takes about 1 hour on the instrument.
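The cycle described above (incorporate one terminator, image it, remove the dye and the block) can be illustrated with a toy model. This is only a sketch under simplifying assumptions: it treats each cluster as a single template string read left to right, ignores strand orientation, imaging tiles, signal noise and base-calling errors, and takes the figure of about 1 hour per cycle from the text. All names are hypothetical and do not correspond to any instrument software.

```python
# Watson-Crick pairing: the terminator incorporated is complementary
# to the template base at the current position.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Approximate duration of one chemistry-plus-imaging cycle, from the text.
HOURS_PER_CYCLE = 1.0


def sequence_clusters(templates, cycles):
    """Build one read per cluster, extending every cluster by one base per cycle."""
    reads = ["" for _ in templates]
    for cycle in range(cycles):
        for i, template in enumerate(templates):
            if cycle >= len(template):
                continue  # this template has no bases left to read
            # Step 1: exactly one blocked, dye-labelled nucleotide is incorporated.
            incorporated = COMPLEMENT[template[cycle]]
            # Step 2: imaging identifies the base from its dye; record the call.
            reads[i] += incorporated
            # Step 3: dye and block are removed, so the next cycle can add one more base.
    return reads


def estimated_run_hours(read_length, paired_end=True):
    """Rough chemistry time only: one cycle per base, two reads if paired-end."""
    total_cycles = read_length * (2 if paired_end else 1)
    return total_cycles * HOURS_PER_CYCLE


if __name__ == "__main__":
    print(sequence_clusters(["ACGT", "GGCA"], cycles=3))
    print(estimated_run_hours(100))
```

Because every cluster advances by exactly one base per cycle, the number of cycles sets the read length, and at roughly an hour per cycle a paired-end run of 100 bases per end occupies the instrument for on the order of 200 hours of chemistry and imaging.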
The chips have eight channels or lanes, allowing up to eight sample libraries to be analysed simultaneously. Additional samples can be analysed by employing a technique called multiplexing or indexing to mix different samples in a single lane of the chip; these samples can be subsequently separated in software using their unique sequence barcodes. Typically, all eight lanes of a 100-cycle run generate about 30 Gb of sequence in paired-end mode (e.g., sequencing 100 bases sequentially from each end of the molecules).
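Separating multiplexed samples in software can be sketched as follows. This is a minimal illustration, not the facility's pipeline: it assumes each read arrives as a pair of (index sequence, insert sequence), that every sample has one known barcode, and that a read is assigned only when exactly one barcode matches within an allowed number of mismatches. The sample names and barcodes are invented for the example.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, barcodes, max_mismatches=0):
    """Group insert sequences by sample.

    reads: iterable of (index_sequence, insert_sequence) tuples
    barcodes: dict mapping sample name -> barcode sequence
    Reads matching no barcode, or more than one, go to "undetermined".
    """
    bins = defaultdict(list)
    for index_seq, insert_seq in reads:
        matches = [
            sample
            for sample, barcode in barcodes.items()
            if len(barcode) == len(index_seq)
            and hamming(barcode, index_seq) <= max_mismatches
        ]
        key = matches[0] if len(matches) == 1 else "undetermined"
        bins[key].append(insert_seq)
    return dict(bins)


if __name__ == "__main__":
    # Hypothetical barcodes and reads for two samples sharing one lane.
    barcodes = {"sample_1": "ACGTAC", "sample_2": "TGCATG"}
    reads = [
        ("ACGTAC", "GATTACA"),
        ("TGCATG", "CCCGGG"),
        ("ACGTAA", "TTTT"),   # one mismatch against sample_1
        ("GGGGGG", "AAAA"),   # matches nothing
    ]
    print(demultiplex(reads, barcodes))
    print(demultiplex(reads, barcodes, max_mismatches=1))
```

Allowing a mismatch rescues reads whose barcode contains a single sequencing error, at the cost of requiring barcodes that differ from one another at several positions so that assignments stay unambiguous.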
Running a facility of this size requires a massive amount of support, and we work closely with several teams: the library preparation team, which supplies large numbers of DNA templates in a form ready to be sequenced; the Institute's IT team, which maintains the extensive compute and storage infrastructure necessary; sequencing informatics, which develops software tools to process, analyse, store and track all the data, projects and samples for the Illumina pipeline; and the development team, which invents novel and improved protocols to take better advantage of this new technology.
The Illumina Sequencing platform, delivered by a high-throughput team and a bespoke team, uses Illumina HiSeqs and the HiSeq X Ten system. The capacity of these machines allows the teams to combine multiple sample libraries into a single lane, routinely generating more than 600 Gigabases (Gb) per run (anticipated to rise to 1 Tb per run with upgrades). To ensure our processes are robust and scalable, we seek to conform to the standards of Good Clinical Laboratory Practice (GCLP).
In 2013 the Illumina Sequencing platform delivered more than 70,000 libraries and 12,000 lanes across a range of library types, for a total of 400 Terabases (Tb) of data. In addition, quality control has improved: our libraries now meet a 99–100 per cent pass rate, and sequencing lane failure rates have fallen to less than 5 per cent.