SSAHA

Overview

SSAHA: Sequence Search and Alignment by Hashing Algorithm

SSAHA is a software tool for very fast matching and alignment of DNA sequences.

It achieves its fast search speed by converting sequence information into a 'hash table' data structure, which can then be searched very rapidly for matches.
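The idea of a hash table of sequence words can be illustrated with a minimal Python sketch. This is not SSAHA's own implementation: the function names `build_index` and `find_hits` are hypothetical, and a short word length of 5 is used only to keep the example small.

```python
from collections import defaultdict

def build_index(sequence, word_len=10):
    """Map every word of length word_len to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(sequence) - word_len + 1):
        index[sequence[pos:pos + word_len]].append(pos)
    return index

def find_hits(index, query, word_len=10):
    """Return (query_pos, subject_pos) pairs for every word shared with the index."""
    hits = []
    for qpos in range(len(query) - word_len + 1):
        # .get() avoids creating empty entries for words that are absent
        for spos in index.get(query[qpos:qpos + word_len], ()):
            hits.append((qpos, spos))
    return hits

subject = "ACGTTGCAAGGCTTACGATCGGATCCA"
index = build_index(subject, word_len=5)
hits = find_hits(index, "GGCTTACG", word_len=5)
print(hits)
```

Each lookup is a single dictionary access, so the cost of a search depends mainly on the query length and the number of hits rather than on the size of the database.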

For improved alignment and mapping of paired-end sequencing reads, please use SSAHA2.

Download and Installation

Learn and Support

The SSAHA algorithm is best suited to applications that require exact or 'almost exact' matches between two sequences, such as SNP detection or sequence assembly. Decreasing the step length (command line option -sl) increases the sensitivity of the algorithm, at the cost of higher RAM usage. Whatever the step length, the algorithm cannot detect a stretch of consecutive matching bases that is shorter than the hash word length (10 bases by default).
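The trade-off between step length, memory and sensitivity can be sketched as follows. The sketch assumes that the step length controls how often a word is sampled from the database while the query is scanned at every base; the helper names `indexed_words` and `is_detected` are hypothetical and the word length of 4 is chosen only for brevity.

```python
def indexed_words(sequence, word_len, step):
    """Collect the words sampled from the database every `step` bases."""
    return {sequence[i:i + word_len]
            for i in range(0, len(sequence) - word_len + 1, step)}

def is_detected(subject, query, word_len, step):
    """True if any word of the query (scanned at every base) was indexed."""
    words = indexed_words(subject, word_len, step)
    return any(query[i:i + word_len] in words
               for i in range(len(query) - word_len + 1))

subject = "AAAAACGTGCATTTTT"

# A 5-base shared stretch is missed with step 4 but found with step 1.
coarse = is_detected(subject, "CGTGC", word_len=4, step=4)
fine = is_detected(subject, "CGTGC", word_len=4, step=1)

# A stretch shorter than the word length contains no complete word,
# so it cannot be found at any step length.
too_short = is_detected(subject, "GTG", word_len=4, step=1)

# A smaller step stores more words, which is why RAM usage grows.
words_coarse = len(indexed_words(subject, 4, 4))
words_fine = len(indexed_words(subject, 4, 1))
```

Under these assumptions, a shared stretch is guaranteed to contain an indexed word only once its length reaches the word length plus the step length minus one.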

If you are likely to need to search the same set of sequence data on more than one occasion, use the -sn option on the first run to save the hash table to a file. Subsequent runs can then load in this hash table using the -sf hash option instead of computing it from scratch.
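The build-once, reuse-later pattern can be sketched in Python as below. This only illustrates the idea: SSAHA writes its own file format, whereas the sketch uses `pickle`, and the function `load_or_build_index` and the file name `subject.index` are hypothetical.

```python
import os
import pickle
import tempfile

def load_or_build_index(sequence, word_len, cache_path):
    """Return (index, loaded_from_cache), building and saving on the first call."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as handle:
            return pickle.load(handle), True
    index = {}
    for pos in range(len(sequence) - word_len + 1):
        index.setdefault(sequence[pos:pos + word_len], []).append(pos)
    with open(cache_path, "wb") as handle:
        pickle.dump(index, handle)
    return index, False

cache_dir = tempfile.mkdtemp()
cache_path = os.path.join(cache_dir, "subject.index")  # hypothetical file name

# The first call computes the table; the second call reads it back from disk.
first, first_from_cache = load_or_build_index("ACGTACGTAC", 4, cache_path)
second, second_from_cache = load_or_build_index("ACGTACGTAC", 4, cache_path)
```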

Getting too many short matches? Try the following:

Set the -ms parameter to a lower value (default is 100000). This causes the software to ignore more of the commonest words in the database. Conversely, sensitivity is increased by setting this parameter to a higher value.

Set the -nr parameter. This causes each query sequence to be scanned for tandem repeats using a simple algorithm.

Set the -mg and -mi parameters. When set, these cause the software to try to join up adjacent shorter matches into larger matches.

Set the -mp parameter. When set, the software prints only matches whose total number of matching bases exceeds a threshold.
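Three of the filters above can be sketched in Python to show how they reduce short matches. The sketch rests on assumptions: the exact semantics of -ms, -mg, -mi and -mp are not reproduced, a single hypothetical `max_gap` value stands in for the joining parameters, the tandem repeat scan of -nr is not modelled, and all function names are illustrative.

```python
def drop_common_words(index, max_occurrences):
    """Ignore words that occur more than max_occurrences times (the idea behind -ms)."""
    return {word: positions for word, positions in index.items()
            if len(positions) <= max_occurrences}

def join_hits(hits, word_len, max_gap=0):
    """Join word hits on the same diagonal into larger matches.

    Returns (query_start, subject_start, span, matched_bases) tuples.
    """
    by_diagonal = {}
    for qpos, spos in hits:
        by_diagonal.setdefault(spos - qpos, []).append(qpos)
    matches = []
    for diagonal, qstarts in sorted(by_diagonal.items()):
        qstarts.sort()
        start, end, matched = qstarts[0], qstarts[0] + word_len, word_len
        for qpos in qstarts[1:]:
            if qpos <= end + max_gap:
                # Count only the bases not already covered by earlier words.
                matched += qpos + word_len - max(end, qpos)
                end = qpos + word_len
            else:
                matches.append((start, start + diagonal, end - start, matched))
                start, end, matched = qpos, qpos + word_len, word_len
        matches.append((start, start + diagonal, end - start, matched))
    return matches

def filter_by_matched_bases(matches, threshold):
    """Keep matches whose matched base count exceeds threshold (the idea behind -mp)."""
    return [m for m in matches if m[3] > threshold]

index = {"AAAAA": [0, 1, 2], "ACGTA": [7]}
rare_only = drop_common_words(index, max_occurrences=2)

hits = [(0, 9), (1, 10), (2, 11), (3, 12), (20, 5)]
matches = join_hits(hits, word_len=5)
reported = filter_by_matched_bases(matches, threshold=6)

# Two words separated by a 2-base gap are joined only when max_gap allows it.
gapped = join_hits([(0, 0), (7, 7)], word_len=5, max_gap=2)
ungapped = join_hits([(0, 0), (7, 7)], word_len=5, max_gap=0)
```

In this sketch the four overlapping hits collapse into a single 8-base match, and the isolated 5-base hit is discarded by the threshold.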

Some Applications

Fast sequence assembly (Zemin Ning)

SNP detection (Jim Mullikin)

Ordering and orientation of contigs (Tony Cox)

License and Citation

Copyright (C) 2001 - 2015 Genome Research Ltd.

Authors: Zemin Ning, Tony Cox, Adam Spargo and James Mullikin


SSAHA is free software: you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation, either version 3 of the License, or (at your
option) any later version.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.

You should have received a copy of the GNU General Public License along
with this program. If not, see <http://www.gnu.org/licenses/>.

Contact

For more information, please contact:

Zemin Ning (zn1@sanger.ac.uk)

Publications

  • SSAHA: a fast search method for large DNA databases.

    Ning Z, Cox AJ and Mullikin JC

    Genome Research 2001;11(10):1725-9
