Software

As a leading genomics centre, the Sanger Institute often needs to develop software solutions to novel biological problems.

All our software is made available to the research community and is open access, recognising that community improvement is essential to maximising efficiencies in software development.

Top downloads

  • Artemis - a free genome viewer and annotation tool that allows visualisation of sequence features and the results of analyses within the context of the sequence, and also its six-frame translation
    • ACT - a DNA sequence comparison viewer written in Java. It is based on the software for Artemis, the genome viewer and annotation tool
  • SSAHA2 - a pairwise sequence alignment program designed for the efficient mapping of sequencing reads onto genomic reference sequences

All downloads

Analysis

Genotype/Phenotype analysis

  • Evoker - a graphical tool for visualising genotype intensity data in order to assess genotype calls as part of quality control procedures for genome-wide association studies
  • Genevar - a platform of database and web services for integrative analysis and visualization of SNP-gene associations in eQTL studies
  • GLIDERS - Genome-wide linkage disequilibrium repository and search engine
  • Illuminus - a fast and accurate algorithm for assigning single nucleotide polymorphism (SNP) genotypes to microarray data from the Illumina BeadArray technology
  • Olorin - an interactive filtering tool for next-generation sequencing data from studies of large complex disease pedigrees
  • optiCall - a robust genotype-calling algorithm for calling rare, low-frequency and common variants from SNP microarray intensity data
  • Optimist - a simple software package for inferring positive selection from marker dynamics in an asexual population
  • PEER - a Bayesian framework to account for complex non-genetic factors in high-dimensional phenotype data

Protein analysis

  • Doublescan - a program for comparative ab initio prediction of protein-coding genes in mouse and human DNA
  • Logomat-P - illustrates the similarities of pairs of protein family profiles in an intuitive way
  • Mascot Percolator - a software package that interfaces the database search algorithm Mascot with Percolator
  • Projector - a program for the comparative, homology-based prediction of protein-coding genes in mouse and human DNA
  • Quicktree - allows the reconstruction of phylogenies for very large protein families that would be infeasible using other popular methods
  • SCOOP - allows the comparison of families of proteins
  • Turbo SLoMo - a software tool which can localise and score sites of protein modification in mass spectrometry data

Sequence analysis

  • Alfresco - FRont-End for Sequence COmparison
  • Alien_hunter - an application for the prediction of putative Horizontal Gene Transfer (HGT) events with the implementation of Interpolated Variable Order Motifs (IVOMs)
  • AMELIA - a program that employs an allele-matching approach that is robust to the presence of both directions of effect for variants within the locus analysed
  • ARIEL - analysis software that employs a locus-wide regression-based collapsing approach that incorporates variant quality scores
  • AutoCSA (Automatic Comparative Sequence Analysis) - mutation detection program designed to detect small mutations (1-50 bases) in sequence traces
  • BioView - a suite of tools for generating lightweight chromatogram images from any trace file that can be cast as a biojava chromatogram interface
  • Blast - the sequencing projects' Blast Search Services
  • CAROL - a combined functional annotation score of non-synonymous coding variants
  • CnD - a copy number variant caller for inbred strains
  • CCRaVAT & QuTie - enables analysis of rare variants in large-scale case control and quantitative trait association studies
  • Dindel - accurate indel calls from short-read data
  • EMu - software for inferring the mutational signatures present in a number of cancer mutation sets
  • Eponine - a probabilistic method for detecting transcription start sites in mammalian genomic sequence
  • ESGI - information about bioinformatics and computational tools available for the analysis of high-throughput genomic data
  • Est_DB - a software suite and database system designed to support expressed sequence tag (EST) sequencing projects, and to provide comprehensive bioinformatic analysis of sequenced EST libraries, for gene discovery and other purposes
  • GWAVA - a functional annotation tool for non-coding sequence variation
  • Hexamer - scans for likely coding regions using 6-mers but without deriving information from base composition
  • Image - a package of analysis algorithms for processing gel images from restriction digest fingerprinting experiments
  • KATE - a program that analyses the effects of low frequency and rare variants on quantitative traits within a chromosomal region
  • Logomat-M - a method to graphically visualise all central aspects of profile Hidden Markov Models (pHMMs), thus generalizing the concept of sequence logos
  • Margarita - infers genealogies from population genotype data and uses these to map disease loci
  • NestedMICA - a method for discovering over-represented short motifs in large sets of strings, for example in finding transcription-factor-binding sites in DNA sequences
  • PICNIC - an algorithm designed to identify copy number segments and genotypes in cancer using a SNP6 'cel' file as input
  • RetroSeq - transposable element discovery from next-generation sequencing data

Annotation

  • Anacode and Annotools - specialist analysis pipelines and annotation tools
    • Otterlace - an interactive, graphical annotation tool
    • ZMap - a feature annotation viewer
    • SeqTools - a suite of tools for visualising sequence alignments
    • AceDB - a genome database system
  • Artemis - a free genome viewer and annotation tool that allows visualisation of sequence features and the results of analyses within the context of the sequence, and also its six-frame translation
    • ACT - a DNA sequence comparison viewer written in Java. It is based on the software for Artemis, the genome viewer and annotation tool
    • BamView - interactive display of read alignments in BAM data files
    • DNAPlotter - makes use of the existing circular plot in Jemboss and the Artemis sequence libraries

Assembly

  • Lookseq - a web-based application for alignment visualisation, browsing and analysis of genome sequence data
  • NPG - short read sequencing pipeline
  • PAGIT - tools to automatically generate high-quality sequence by ordering contigs, closing gaps, correcting sequence errors and transferring annotation
  • Phusion - a software package for assembling genome sequences from whole genome shotgun (WGS) reads
  • REAPR - a tool that evaluates the accuracy of a genome assembly using mapped paired-end reads
  • SMALT - a highly efficient and accurate mapper of DNA sequencing reads from a variety of platforms including paired reads
  • SSAHA - a software tool for very fast matching and alignment of DNA sequences
  • SSAHA2 - a pairwise sequence alignment program designed for the efficient mapping of sequencing reads onto genomic reference sequences
  • SSAHAest - a software tool for very fast matching and alignment of ESTs/cDNAs to genomic DNA sequences
  • SSAHAsnp - a polymorphism detection tool, which detects homozygous SNPs and indels by aligning shotgun reads to the finished genome sequence

Database software

  • DAS - the Institute supports the Distributed Annotation System through a range of projects, websites and applications
  • DBCon - database pooling, distributed configuration and SQL Libraries for Java
  • Proserver - a very lightweight DAS server written in Perl

Data formats

  • CAF - a text format for describing sequence assemblies
  • GFF - a format for describing genes and other features associated with DNA, RNA and protein sequences

Gene finding

  • GAZE - integrates gene prediction signal and content sensor information into complete gene structures
  • PSILC - Pseudogene inference from loss of constraint