The SSAHA algorithm is best suited to applications that require exact or 'almost exact' matches between two sequences, such as SNP detection or sequence assembly. Sensitivity can be increased by decreasing the step length (command line option -sl), although this also increases RAM usage. Whatever the step length, the algorithm cannot detect a stretch of consecutive matching bases that is shorter than the hash word length (10 bases by default).
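The trade-off between step length, memory and sensitivity can be illustrated with a minimal Python sketch. It is not the SSAHA implementation: the function names build_index and find_hits, the toy sequence, and the assumption that words are sampled from the subject every `step` bases while the query is scanned at every position are all illustrative.

```python
from collections import defaultdict

def build_index(subject, word_len=10, step=10):
    """Map each word sampled every `step` bases of `subject` to its start positions."""
    index = defaultdict(list)
    for pos in range(0, len(subject) - word_len + 1, step):
        index[subject[pos:pos + word_len]].append(pos)
    return index

def find_hits(index, query, word_len=10):
    """Return (query_pos, subject_pos) pairs for every query word present in the index."""
    hits = []
    for qpos in range(len(query) - word_len + 1):
        # .get() looks a word up without adding a new entry to the defaultdict
        for spos in index.get(query[qpos:qpos + word_len], ()):
            hits.append((qpos, spos))
    return hits

# Toy 60-base subject (made up for illustration)
subject = "ACGTTGCAAGCTTAGGCATCGATCGGATTACAGGCTTAACGTAGCTAGGATCCGATTGCA"
query = subject[13:28]    # 15-base exact match that starts off the step-10 grid
too_short = subject[13:22]  # 9 bases, shorter than the word length

coarse = build_index(subject, word_len=10, step=10)
fine = build_index(subject, word_len=10, step=1)

print("words stored:", len(coarse), "vs", len(fine))
print("step 10 hits:", find_hits(coarse, query))
print("step 1 hits: ", find_hits(fine, query))
print("9-base query:", find_hits(fine, too_short))
```

In this sketch the smaller step stores many more words (more memory) and finds the 15-base match that the coarser sampling misses, while the 9-base query yields no hits at any step because it contains no complete 10-base word.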
If you are likely to search the same set of sequence data more than once, use the -sn option on the first run to save the hash table to a file. Subsequent runs can then load this hash table using the -sf hash option instead of computing it from scratch.
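The build-once, reuse-later pattern can be sketched in Python as follows. This is only an analogy: SSAHA's own hash table file format is not described here, so the sketch uses Python's pickle module as a stand-in, and load_or_build and the file name subject.index are hypothetical.

```python
import os
import pickle
import tempfile
from collections import defaultdict

def build_index(subject, word_len=10, step=10):
    """Map each word sampled every `step` bases of `subject` to its start positions."""
    index = defaultdict(list)
    for pos in range(0, len(subject) - word_len + 1, step):
        index[subject[pos:pos + word_len]].append(pos)
    return dict(index)

def load_or_build(subject, path, word_len=10, step=10):
    """Return (index, loaded_from_file); build and save the index if no file exists."""
    if os.path.exists(path):
        with open(path, "rb") as fh:
            return pickle.load(fh), True
    index = build_index(subject, word_len, step)
    with open(path, "wb") as fh:
        pickle.dump(index, fh)
    return index, False

subject = "ACGTTGCAAGCTTAGGCATCGATCGGATTACAGGCTTAACGTAGCTAGGATCCGATTGCA"
cache_path = os.path.join(tempfile.mkdtemp(), "subject.index")  # hypothetical file name

first, first_from_file = load_or_build(subject, cache_path)    # builds and saves
second, second_from_file = load_or_build(subject, cache_path)  # loads from disk
print(first_from_file, second_from_file, first == second)
```

In this sketch a saved index is only valid for the word length and step it was built with, so a cached file should be rebuilt if either parameter changes.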
Getting far too many short matches? Try the following:
Set the -ms parameter to a lower value (default is 100000). This causes the software to ignore more of the most common words in the database. Conversely, setting this parameter to a higher value increases sensitivity.
Set the -nr parameter. This causes each query sequence to be scanned for tandem repeats using a simple algorithm.
Set the -mg and -mi parameters. These cause the software to try to join adjacent shorter matches into larger matches.
Set the -mp parameter. The software then prints only matches whose total number of matching bases exceeds the given threshold.
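Three of the ideas above (ignoring very common words, joining adjacent matches, and printing only matches above a threshold) can be sketched in Python. This is an illustration under stated assumptions, not SSAHA's actual logic: the function names, the parameters max_occurrences, max_gap and min_bases, the exact comparison rules, and the representation of a match as (query_start, subject_start, length) are all hypothetical.

```python
from collections import defaultdict

def build_filtered_index(subject, word_len, step, max_occurrences):
    """Index sampled words, dropping any that occur more than max_occurrences times."""
    index = defaultdict(list)
    for pos in range(0, len(subject) - word_len + 1, step):
        index[subject[pos:pos + word_len]].append(pos)
    # Assumed analogue of -ms: a lower cutoff discards more of the common words
    return {word: hits for word, hits in index.items() if len(hits) <= max_occurrences}

def join_matches(matches, max_gap):
    """Merge matches on the same diagonal separated by at most max_gap bases.

    Returns (query_start, subject_start, span, matching_bases) tuples.
    Overlapping matches (negative gap) are left unmerged in this sketch.
    """
    joined = []
    # Sort by diagonal (subject_start - query_start), then by query position
    for q, s, n in sorted(matches, key=lambda m: (m[1] - m[0], m[0])):
        if joined:
            pq, ps, pspan, pbases = joined[-1]
            gap = q - (pq + pspan)
            if (ps - pq) == (s - q) and 0 <= gap <= max_gap:
                joined[-1] = (pq, ps, q + n - pq, pbases + n)
                continue
        joined.append((q, s, n, n))
    return joined

def filter_matches(joined, min_bases):
    """Keep only matches whose matching-base count exceeds min_bases."""
    return [m for m in joined if m[3] > min_bases]

# Common word "ACGT" occurs five times on the sampling grid; "TTGA" occurs once
index = build_filtered_index("ACGT" * 5 + "TTGACCA", word_len=4, step=4, max_occurrences=3)
print(sorted(index))

# Two nearby matches on one diagonal plus two isolated short matches
matches = [(0, 100, 12), (14, 114, 10), (40, 300, 11), (5, 700, 10)]
joined = join_matches(matches, max_gap=3)
reported = filter_matches(joined, min_bases=15)
print(joined)
print(reported)
```

Here the two matches on the same diagonal are separated by a 2-base gap, so they are joined into one match with 22 matching bases, which is the only one that survives the threshold of 15.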
Fast sequence assembly (Zemin Ning)
SNP detection (Jim Mullikin)
Ordering and orientation of contigs (Tony Cox)