SSAHA

SSAHA: Sequence Search and Alignment by Hashing Algorithm

About

SSAHA is a software tool for very fast matching and alignment of DNA sequences.

It achieves its fast search speed by converting sequence information into a ‘hash table’ data structure, which can then be searched very rapidly for matches.
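As a rough illustration of the idea, the sketch below builds a word index over a subject sequence and looks up every word of a query in it. It is a minimal Python sketch, not the SSAHA implementation: the function names `build_index` and `find_hits` are hypothetical, words are stored as plain strings rather than a compact encoding, and a Python dictionary stands in for the hash table.

```python
from collections import defaultdict


def build_index(subject, k=10, step=None):
    """Map each k-base word, sampled every `step` bases, to its start positions.

    Assumption for this sketch: when `step` is not given, words are taken
    at non-overlapping positions (step == k).
    """
    if step is None:
        step = k
    index = defaultdict(list)
    for pos in range(0, len(subject) - k + 1, step):
        index[subject[pos:pos + k]].append(pos)
    return dict(index)


def find_hits(query, index, k=10):
    """Return (query_pos, subject_pos) for every query word present in the index."""
    hits = []
    for qpos in range(len(query) - k + 1):
        for spos in index.get(query[qpos:qpos + k], ()):
            hits.append((qpos, spos))
    return hits


# Toy example with a short word length so the result is easy to follow.
index = build_index("ACGTACGTTTGACCA", k=4)
print(find_hits("GTTTGAC", index, k=4))  # [(2, 8)]
```

Building the index is the expensive step; each query word is then resolved with a single dictionary lookup, which is where the speed comes from.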

For improved alignment and mapping of paired-end sequencing reads, please use SSAHA2.

Downloads

The package can be downloaded from:

https://ftp.sanger.ac.uk/resources/software/ssaha/

Further information

The SSAHA algorithm is most suitable for applications requiring exact or ‘almost exact’ matches between two sequences, such as SNP detection or sequence assembly. Decreasing the step length (command-line option -sl) increases the sensitivity of the algorithm, at the cost of higher RAM usage. In no case, however, will the algorithm detect a stretch of consecutive matching bases that is shorter than the hash word length (10 bases by default).
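The trade-off can be seen in a toy model. The following Python sketch assumes that the step length is the spacing between the subject words that get stored; the helper names (`sampled_words`, `stored_positions`, `is_detected`) are hypothetical and the code is not taken from SSAHA itself.

```python
def sampled_words(subject, k, step):
    """Set of k-base words taken from the subject every `step` bases."""
    return {subject[p:p + k] for p in range(0, len(subject) - k + 1, step)}


def stored_positions(subject, k, step):
    """Number of word positions stored, a rough proxy for memory use."""
    return len(range(0, len(subject) - k + 1, step))


def is_detected(query, subject, k, step):
    """True if any k-base word of the query is among the sampled subject words."""
    words = sampled_words(subject, k, step)
    return any(query[q:q + k] in words for q in range(len(query) - k + 1))


subject = "AAAACCCCGGGGTTTT"
query = "AACCC"  # identical to subject[2:7]

print(is_detected(query, subject, k=4, step=4))  # False: the match falls between sampled words
print(is_detected(query, subject, k=4, step=1))  # True: a smaller step catches it
print(stored_positions(subject, k=4, step=4))    # 4
print(stored_positions(subject, k=4, step=1))    # 13

# A match shorter than the word length contains no complete word,
# so no step length can detect it.
print(is_detected("ACC", subject, k=4, step=1))  # False
```

In this model a smaller step stores more positions, which corresponds to the higher RAM usage noted above.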

If you are likely to search the same set of sequence data more than once, use the -sn option on the first run to save the hash table to a file. Subsequent runs can then load this hash table using the -sf hash option instead of recomputing it from scratch.

Loads and loads of short matches? Try the following:

Set the -ms parameter to a lower value (default is 100000). This causes the software to ignore more of the most common words in the database. Conversely, setting this parameter to a higher value increases sensitivity.

Set the -nr parameter. This causes each query sequence to be scanned for tandem repeats using a simple algorithm.

Set the -mg and -mi parameters. When set, these cause the software to try to join up adjacent shorter matches into larger matches.

Set the -mp parameter. When set, the software prints only matches whose total number of matching bases exceeds a threshold.
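The effect of these options can be pictured with a small Python sketch. It is an illustration under stated assumptions, not SSAHA's code: all function and parameter names are hypothetical, a match is modelled as a tuple (query_start, subject_start, span, matching_bases), joining is limited to non-overlapping matches on the same diagonal with a single `max_gap` (the exact roles of -mg and -mi are not modelled), and tandem repeat screening (-nr) is left out.

```python
def drop_common_words(index, max_store):
    """In the spirit of -ms: ignore words that occur more than `max_store` times."""
    return {word: pos for word, pos in index.items() if len(pos) <= max_store}


def join_matches(matches, max_gap):
    """Join non-overlapping matches on the same diagonal separated by at most `max_gap` bases."""
    joined = []
    # Sort by diagonal (subject_start - query_start), then by query position.
    for q, s, span, matched in sorted(matches, key=lambda m: (m[1] - m[0], m[0])):
        if joined:
            pq, ps, pspan, pmatched = joined[-1]
            gap = q - (pq + pspan)
            if ps - pq == s - q and 0 <= gap <= max_gap:
                # Extend the previous match to cover this one.
                joined[-1] = (pq, ps, q + span - pq, pmatched + matched)
                continue
        joined.append((q, s, span, matched))
    return joined


def printable(matches, min_print):
    """In the spirit of -mp: keep matches whose matching bases exceed `min_print`."""
    return [m for m in matches if m[3] > min_print]


index = {"ACGT": [0, 4, 9], "TTGA": [8]}
print(drop_common_words(index, max_store=2))  # {'TTGA': [8]}

matches = [(0, 100, 10, 10), (12, 112, 8, 8), (40, 5, 10, 10)]
joined = join_matches(matches, max_gap=3)
print(joined)                 # [(40, 5, 10, 10), (0, 100, 20, 18)]
print(printable(joined, 15))  # [(0, 100, 20, 18)]
```

Lowering `max_store` discards more repetitive words before matching, while joining followed by a threshold removes short isolated matches from the output.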

Some Applications

Fast sequence assembly (Zemin Ning)

SNP detection (Jim Mullikin)

Ordering and orientation of contigs (Tony Cox)

Copyright (C) 2001 – 2015 Genome Research Ltd.

Authors: Zemin Ning, Tony Cox, Adam Spargo and James Mullikin

SSAHA is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.

Contact

If you need help or have any queries, please contact Zemin Ning (zn1@sanger.ac.uk).


Sanger Institute Contributors

Dr Zemin Ning

Senior Scientific Manager

External Contributors

Dr Jim Mullikin

Tony Cox