SeqTools

A suite of tools for visualising sequence alignments.

The SeqTools package contains three tools for visualising sequence alignments: Blixem, Dotter and Belvu.

Blixem is an interactive browser of sequence alignments that have been stacked up in a "master-slave" multiple alignment; it is not a 'true' multiple alignment but a 'one-to-many' alignment. Dotter is a graphical dot-matrix program for detailed comparison of two sequences. Belvu is a multiple sequence alignment viewer and phylogenetic tool with an extensive set of user-configurable modes for colouring residues.

Supported platforms

Our primary supported platform is Ubuntu 14.04 (64-bit). SeqTools is well tested and in daily use on this platform. It is also tested frequently on Mac OS X. It should work on the other platforms listed below, but these are less thoroughly supported.

  • Linux - Ubuntu 14.04 (64-bit) - primary supported architecture
  • Linux - other
  • Mac OS X (Intel)
  • Windows (under Cygwin)
  • Other platforms (using VirtualBox)
  • FreeBSD (in the ports)

Blixem features

  • Overview section showing the positions of genes and alignments around the alignment window.
  • Detail section showing the actual alignment of protein or nucleotide sequences to the genomic DNA sequence.
  • View alignments against both strands of the reference sequence.
  • View sequences in nucleotide or protein mode; in protein mode, Blixem will display the three-frame translation of the reference sequence.
  • Residues are highlighted in different colours depending on whether they are an exact match, conserved substitution or mismatch.
  • Gapped alignments are supported, with insertions and deletions being highlighted in the match sequence.
  • Matches can be sorted and filtered.
  • SNPs and other variations can be highlighted in the reference sequence.
  • Poly(A) tails can be displayed and poly(A) signals highlighted in the reference sequence.
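
The three-frame translation mentioned above can be illustrated with a short Python sketch. This is not Blixem's own code: the function names are hypothetical, the standard genetic code is assumed, and incomplete or ambiguous codons are shown as "X" purely for illustration.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse strand of a DNA sequence (hypothetical helper)."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def three_frame_translation(dna):
    """Translate one strand in reading frames 0, 1 and 2.

    Codons containing characters outside ACGT are translated as 'X'.
    """
    dna = dna.upper()
    frames = []
    for offset in range(3):
        codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
        frames.append("".join(CODON_TABLE.get(codon, "X") for codon in codons))
    return frames


if __name__ == "__main__":
    reference = "ATGGCCTAA"
    print(three_frame_translation(reference))
    print(three_frame_translation(reverse_complement(reference)))
```

Translating the reverse complement as well gives the frames for the other strand of the reference sequence.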

Dotter features

  • Every residue in one sequence is compared to every residue in the other, and a matrix of scores is calculated.
  • One sequence is plotted on the x-axis and the other on the y-axis.
  • Noise is filtered out so that alignments appear as diagonal lines.
  • Pairwise scores are averaged over a sliding window to make the score matrix more intelligible.
  • The averaged score matrix forms a three-dimensional landscape, with the two sequences in two dimensions and the height of the peaks in the third. This landscape is projected onto two dimensions using a grey-scale image: the darker the grey of a peak, the higher its score.
  • The contrast and threshold of the grey-scale image can be adjusted interactively, without having to recalculate the score matrix.
  • An Alignment Tool is provided to examine the sequence alignment that the grey-scale image represents.
  • Known high-scoring pairs can be loaded from a GFF file and overlaid onto the plot.
  • Gene models can be loaded from GFF and displayed alongside the relevant axis.
  • Compare a sequence against itself to find internal repeats.
  • Find overlaps between multiple sequences by making a dot-plot of all sequences versus themselves.
  • Run Dotter in batch mode to create large, time-consuming dot-plots as a background process.
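
The dot-matrix idea described in the list above can be sketched in a few lines of Python. This is a simplified illustration, not Dotter's implementation: it assumes a plain match/mismatch score rather than a substitution matrix, averages along diagonals over a fixed window, and uses hypothetical function and parameter names.

```python
def dot_matrix(seq_x, seq_y, window=3, match=1.0, mismatch=0.0):
    """Score every residue pair, then average along diagonals.

    Returns a list of rows (one per residue of seq_y); each row holds one
    averaged score per residue of seq_x. Cells too close to the edge for a
    full window are left at 0.0.
    """
    nx, ny = len(seq_x), len(seq_y)
    raw = [
        [match if seq_x[i] == seq_y[j] else mismatch for i in range(nx)]
        for j in range(ny)
    ]
    averaged = [[0.0] * nx for _ in range(ny)]
    for j in range(ny - window + 1):
        for i in range(nx - window + 1):
            total = sum(raw[j + k][i + k] for k in range(window))
            # Store the window average at the centre of the window.
            averaged[j + window // 2][i + window // 2] = total / window
    return averaged


def apply_threshold(matrix, cutoff):
    """Suppress noise by zeroing scores below the cutoff.

    The matrix itself is not recalculated, only re-rendered.
    """
    return [[s if s >= cutoff else 0.0 for s in row] for row in matrix]


if __name__ == "__main__":
    scores = dot_matrix("ACGTACGT", "ACGTACGT", window=3)
    for row in apply_threshold(scores, cutoff=0.9):
        print("".join("#" if s else "." for s in row))
```

Comparing a sequence against itself, as in the usage example, shows the main diagonal plus off-diagonal lines wherever there are internal repeats.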

Belvu features

  • View multiple sequence alignments.
  • Residues can be coloured by conservation, with user-configurable cutoffs and colours.
  • Residues can be coloured by residue type (user-configurable).
  • Colour schemes can be imported or exported.
  • Swissprot (or PIR) entries can be fetched by double clicking.
  • The position in the alignment can be easily tracked.
  • Manual deletion of rows and columns.
  • Automatic editing of rows and columns based on customisable criteria:
    • removal of all-gap columns;
    • removal of all gaps;
    • removal of redundant sequences;
    • removal of columns by a user-specified percentage of gaps;
    • filtering of sequences by percent identity;
    • removal of sequences by a user-specified percentage of gaps;
    • removal of partial sequences (those starting or ending with gaps); and
    • removal of columns by conservation (with user-specified upper/lower cutoffs).
  • The alignment can be saved in Stockholm, Selex, MSF or FASTA format.
  • Distance matrices between sequences can be generated using a variety of distance metrics.
  • Distance matrices can be imported or exported.
  • Phylogenetic trees can be constructed based on various distance-based tree reconstruction algorithms.
  • Trees can be saved in New Hampshire format.
  • Belvu can perform bootstrap phylogenetic reconstruction.
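
Some of the automatic editing operations listed above can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, the alignment is assumed to be a dictionary of equal-length strings, and percent identity is computed here over columns where neither sequence has a gap, which may differ from the definition Belvu uses.

```python
def remove_all_gap_columns(alignment, gap_chars="-."):
    """Drop columns in which every sequence has a gap character."""
    if not alignment:
        return {}
    rows = list(alignment.values())
    keep = [
        col for col in range(len(rows[0]))
        if any(row[col] not in gap_chars for row in rows)
    ]
    return {
        name: "".join(seq[col] for col in keep)
        for name, seq in alignment.items()
    }


def percent_identity(seq_a, seq_b, gap_chars="-."):
    """Percentage of identical residues over ungapped aligned columns."""
    compared = identical = 0
    for a, b in zip(seq_a, seq_b):
        if a in gap_chars or b in gap_chars:
            continue
        compared += 1
        identical += a.upper() == b.upper()
    return 100.0 * identical / compared if compared else 0.0


def remove_redundant(alignment, max_identity=80.0):
    """Keep a sequence only if it is no more than max_identity percent
    identical to every sequence already kept (order-dependent)."""
    kept = {}
    for name, seq in alignment.items():
        if all(percent_identity(seq, other) <= max_identity
               for other in kept.values()):
            kept[name] = seq
    return kept


if __name__ == "__main__":
    aln = {"a": "AC-GT", "b": "AC-GA", "c": "TT-CC"}
    print(remove_all_gap_columns(aln))
    print(list(remove_redundant(aln, max_identity=70.0)))
```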

Software pipelines

As well as being used independently, Blixem, Dotter and Belvu can also be called from other tools as part of a software pipeline. A common workflow is to call Blixem from the ZMap genome browser to analyse a set of alignments in more detail, and to call Dotter from within Blixem to give a graphical representation of a particular alignment. Belvu has an extensive set of command-line arguments for specifying processing and output parameters, making it possible to perform complete processes in a single command-line call. See our team page for more information.

Background

Blixem, Dotter and Belvu were originally written as part of the AceDB genome database system. Version 4 of the programs involved an extensive re-write to take advantage of modern GUI toolkits and to separate them from AceDB to form this independent SeqTools package. They can be used independently or with any other tool that outputs data in a suitable format - the current preferred file formats are FASTA and GFF v3 for Blixem and Dotter; a variety of file formats are supported by Belvu.

Links

Further information

Getting started

Run the programs without arguments to see their usage information, or try out the examples given in the examples directory of the source-code download.
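
When the tools are driven from a script, the same usage check can be automated. The sketch below assumes the executables are installed on the PATH under the names blixem, dotter and belvu (an assumption about the installation, not something stated here) and simply runs each one with no arguments, as described above. The helper name is hypothetical.

```python
import shutil
import subprocess


def show_usage(program, timeout=10):
    """Run a program with no arguments and return whatever it prints.

    Returns None if the program is not found on the PATH.
    """
    path = shutil.which(program)
    if path is None:
        return None
    result = subprocess.run(
        [path], capture_output=True, text=True, timeout=timeout
    )
    # Usage text may go to either stream, so return both.
    return result.stdout + result.stderr


if __name__ == "__main__":
    # Executable names are assumed; adjust them to match your installation.
    for name in ("blixem", "dotter", "belvu"):
        usage = show_usage(name)
        print(f"--- {name} ---")
        print(usage if usage is not None else "not found on PATH")
```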

For more details, see the README file in the source code.

Help pages

Help pages, including a quick-start guide and user manual, are installed along with the programs. They can be accessed from within the programs using the Help menu, the lifebuoy icon on the toolbar, or the Ctrl-H keyboard shortcut. They are also included in the doc/User_doc directory of the source code.

User manuals

User manuals are installed along with the programs. The manuals for the current production versions can also be downloaded here:

Other documentation

Other documentation, such as design notes, is included in the doc directory of the source code. It can also be viewed here.

SeqTools is free software and is distributed under the terms of the Apache Version 2.0 License.

SeqTools should be credited to “Genome Research Ltd”.

Contact

If you need help or have any queries, please contact us using the details below.

SeqTools is maintained by the Flicek team at the EBI.

If you have a bug or feature request, please raise a ticket by emailing seqtools.

For any other enquiries, please email annosoft.


Sanger Institute Contributors

Previous contributors

Gemma Guest

Former Senior Software Developer in the Annosoft Team

Dr Ed Griffiths

Former Senior Scientific Manager

External Contributors