Sequence Analysis and Management (SAM) | Scientific Operations
Sanger Institute, Genome Research Limited
Our Research and Approach
SAM contributes to several software packages for processing DNA sequence data, including samtools, htslib, biobambam and the Staden package. We also submit raw sequencing data to the European Bioinformatics Institute (EBI) on behalf of research groups.
Gap5 is a DNA sequence assembly visualisation and editing tool. It supports low-level, base-by-base editing as well as larger-scale contig rearrangements such as complementing, joining and breaking contigs.
It accepts input in CAF or ACE format or, more typically, in SAM, BAM or CRAM format.
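To make the contig-level operations concrete, here is a minimal toy sketch that applies them to plain consensus strings. The function names are hypothetical and this is not how Gap5 works internally: a real assembly editor also carries the underlying reads, quality values and annotations through every edit, and it aligns contig ends before joining them rather than assuming an exact overlap as this sketch does.

```python
# Toy illustration of contig rearrangements on consensus strings only.
# Hypothetical helpers; not Gap5's implementation, which also moves reads,
# quality values and annotations along with the consensus.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def complement_contig(seq: str) -> str:
    """Reverse-complement a contig so it reads from the opposite strand."""
    return seq.translate(_COMPLEMENT)[::-1]


def break_contig(seq: str, pos: int) -> tuple[str, str]:
    """Split a contig into two pieces at 0-based position pos."""
    if not 0 < pos < len(seq):
        raise ValueError("break position must fall inside the contig")
    return seq[:pos], seq[pos:]


def join_contigs(left: str, right: str, overlap: int) -> str:
    """Join two contigs whose ends overlap by exactly `overlap` bases.

    Assumes an exact match over the overlap; a real editor would align
    the ends and resolve any disagreements.
    """
    if overlap < 0 or overlap > min(len(left), len(right)):
        raise ValueError("overlap out of range")
    # Guard overlap == 0, where left[-0:] would be the whole string.
    if overlap and left[-overlap:] != right[:overlap]:
        raise ValueError("contig ends do not match over the overlap")
    return left + right[overlap:]


if __name__ == "__main__":
    a, b = break_contig("ACGTTGCAAC", 4)
    print(a, b)                                  # ACGT TGCAAC
    print(complement_contig("ACGTTG"))           # CAACGT
    print(join_contigs("ACGTTG", "TTGCAA", 3))   # ACGTTGCAA
```

Complementing twice returns the original sequence, which is a quick sanity check for any implementation of this operation.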
Many high-profile projects, such as the Vertebrate Genomes Project (VGP), Darwin Tree of Life and the Cancer Genome Project, need high-quality assemblies built from second- and third-generation sequencing data for downstream analysis. Efficient bioinformatics tools for processing and analysing large quantities of genomic data are crucial both for producing high-quality assemblies and for visualising the data. The High Performance Algorithm Group (HPAG), headed by Zemin Ning, develops algorithms and software tools for genome analysis. We work with a range of sequencing technologies and applications, including PacBio, ONT, 10X and Hi-C. The team currently focuses on genome scaffolding, visualisation and data QC using long-range information such as 10X linked reads and Hi-C.
The SAM team is responsible for archiving data produced by NPG in external repositories, typically the EGA, ENA or ArrayExpress at the EBI. The team also creates and maintains tools that are essential components of our analysis pipelines.