Genome analysis pipelines

The Genome analysis pipelines are dedicated to high-throughput delivery in sample logistics, genome-wide data generation, PCR target preparation for re-sequencing, data quality control, analysis and storage.

Cordelia Langford, as Head of Genome analysis production, is responsible for translating Sanger Institute science strategy into operational requirements for handling several hundred thousand samples a year. These samples are submitted by a large number of internal and external collaborators and consortia, and are processed across several different platforms. The genome analysis teams play a vital role in delivering important strategic collaborations such as the Wellcome Trust Case Control Consortium (WTCCC). State-of-the-art technologies and expertise, together with critical data storage, tracking and informatics, are fundamental components of the four coordinated facilities: sample logistics, genotyping, microarray and PCR target preparation.

The Sample logistics pipeline handles sample importation to the Institute. The team carry out DNA and RNA extractions, plus quality assessment and concentration normalisation, for samples destined for the genotyping, microarray or medical sequencing pipelines. All processes are fully tracked, all plates are barcoded and all plate or tube freezer locations are registered. Integrated robotics minimise the need for manual intervention and manipulation. We have deployed specialised equipment and software to help in selecting and handling samples for a range of applications.
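As an illustration of the concentration normalisation step, the arithmetic behind diluting a stock sample to a common working concentration can be sketched as below. This is a minimal sketch rather than the pipeline's own software: the function name, the units (ng/µl and µl) and the example values are assumptions chosen for illustration.

```python
def normalisation_volumes(stock_ng_per_ul, target_ng_per_ul, final_volume_ul):
    """Return (sample_ul, diluent_ul) needed to reach the target concentration.

    Uses the standard dilution relation C1 * V1 = C2 * V2.
    All names and units here are illustrative assumptions.
    """
    if min(stock_ng_per_ul, target_ng_per_ul, final_volume_ul) <= 0:
        raise ValueError("concentrations and volume must be positive")
    if stock_ng_per_ul < target_ng_per_ul:
        # A sample that is already too dilute cannot be normalised by dilution.
        raise ValueError("stock is below the target concentration")
    sample_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    diluent_ul = final_volume_ul - sample_ul
    return sample_ul, diluent_ul


# Hypothetical example: a 100 ng/ul stock diluted to 50 ng/ul in 20 ul.
sample_ul, diluent_ul = normalisation_volumes(100.0, 50.0, 20.0)
```

In a robotic setting, volumes like these would typically be written to a worklist per barcoded well; that step is not shown here.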

The PCR target preparation pipeline performs the high-throughput generation of DNA targets for re-sequencing experiments.

The Genotyping pipeline uses Illumina, Affymetrix and Sequenom platforms. The main platform for genome-wide association studies (GWAS) is the Illumina 660W quad BeadChip, run at a throughput of 3000 samples per week. Output data quality is monitored by a system that integrates information held in the Illumina-supplied LIMS and our internal LIMS, so that poor-quality samples can be identified and recovered or replaced promptly. The Affymetrix GCAS system runs at a capacity of 96 samples per day using the SNP6.0 chip. The CQC algorithm is used to assess data quality and Birdseed is used for genotype assignment. Sequenom has been used to complete several major replication studies, with a weekly throughput of 30,000 samples. In addition, Sequenom is used for sample QC: a low-plex 'molecular fingerprint' is generated for every sample to facilitate identity checks after genotyping.
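The identity check based on the molecular fingerprint can be pictured as a concordance calculation between the low-plex genotypes and the calls from the genome-wide array for the same SNPs. The following is a hedged sketch only: the data layout (dictionaries of SNP identifier to a two-letter genotype, with None for a failed call), the function name and the SNP identifiers are all assumptions, not the facility's actual implementation.

```python
def fingerprint_concordance(fingerprint, array_calls):
    """Fraction of SNPs called in both assays whose genotypes agree.

    Genotypes are two-letter strings such as "AG"; allele order is ignored.
    Returns None if no SNP was called in both assays.
    """
    shared = [
        snp for snp, genotype in fingerprint.items()
        if genotype is not None and array_calls.get(snp) is not None
    ]
    if not shared:
        return None
    matches = sum(
        1 for snp in shared
        if sorted(fingerprint[snp]) == sorted(array_calls[snp])
    )
    return matches / len(shared)


# Made-up SNP identifiers and genotypes, for illustration only.
fingerprint = {"snp1": "AG", "snp2": "CC", "snp3": "TT", "snp4": None}
array_calls = {"snp1": "GA", "snp2": "CC", "snp3": "CT"}
concordance = fingerprint_concordance(fingerprint, array_calls)
```

A low concordance would suggest a possible sample swap or contamination; the threshold at which a sample is flagged is a project decision and is not specified here.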

The Microarray facility performs transcription analysis and array Comparative Genomic Hybridisation (aCGH) using Illumina, Agilent and Affymetrix platforms.

The Variation informatics group provides data quality control and analysis for genotyping projects. An analysis pipeline for GWAS data extracts intensity values for all samples in a collection and submits them to a standard genotype calling and QC process. The Illuminus algorithm is used to call genotypes for the Illumina platform and BirdSuite is used for the Affymetrix platform. Further analysis, most commonly imputation, is provided according to the needs of the project. A secure website has been developed to give specific groups of external collaborators access to relevant data.
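One common element of a genotype QC process is a per-sample call rate filter, sketched below. The text does not state which QC metrics or thresholds the pipeline applies, so the 0.95 cut-off, the function names and the data layout (sample identifier mapped to a list of calls, with None marking a no-call) are assumptions for illustration.

```python
def sample_call_rates(genotypes):
    """Map each sample to the fraction of its SNPs with a genotype call.

    Samples with an empty call list are skipped.
    """
    return {
        sample: sum(call is not None for call in calls) / len(calls)
        for sample, calls in genotypes.items()
        if calls
    }


def failing_samples(genotypes, min_call_rate=0.95):
    """Return the sorted samples whose call rate is below min_call_rate.

    The default threshold is an illustrative assumption.
    """
    rates = sample_call_rates(genotypes)
    return sorted(s for s, rate in rates.items() if rate < min_call_rate)


# Made-up data: sample_a has one no-call out of four SNPs.
genotypes = {
    "sample_a": ["AA", "AG", None, "GG"],
    "sample_b": ["AA", "AG", "GG", "GG"],
}
flagged = failing_samples(genotypes)
```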

Microarray informatics performs large-scale microarray data analysis using Bioconductor packages and R. The compute farm is used for the rapid execution of large-memory jobs. Projects, data and code are archived and tracked via an internal user wiki. A Solexa sequence analysis pipeline has also been constructed, encompassing automated peak-finding (for ChIP-seq data) and the estimation of RNA abundance levels, splice variants and sequence variants in the transcriptome.

