SMALT efficiently aligns DNA sequencing reads with a reference genome.

It can process reads, including paired reads, from a wide range of sequencing platforms, for example Illumina, Roche-454, Ion Torrent, PacBio and ABI-Sanger.


The software employs a hash index of short words (< 21 nucleotides long), sampled at equidistant steps along the genomic reference sequences.
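The idea of sampling words at equidistant steps can be illustrated with a minimal Python sketch. It is not SMALT's implementation: the function name `build_index`, the use of a plain dictionary keyed by strings, and sampling that starts at position 0 are all assumptions made for illustration. A production aligner would likely use a more compact encoding.

```python
from collections import defaultdict


def build_index(reference: str, k: int = 13, step: int = 6) -> dict:
    """Map each sampled word of length k to the reference positions where it starts.

    Hypothetical helper for illustration only; not part of SMALT.
    """
    index = defaultdict(list)
    # Sample word start positions 0, step, 2*step, ... while a full word still fits.
    for pos in range(0, len(reference) - k + 1, step):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


# Toy example with short words so the result is easy to inspect.
toy_index = build_index("ACGTACGTACGTAC", k=4, step=2)
for word, positions in sorted(toy_index.items()):
    print(word, positions)
```

With a step larger than 1, only a fraction of all words in the reference is stored, which is what keeps the index small.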

For each read, potentially matching segments in the reference are identified from seed matches in the index and subsequently aligned with the read using a banded Smith-Waterman algorithm.
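The seed-then-align procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions, not SMALT's code: the helper names (`sample_index`, `candidate_diagonals`, `banded_sw_score`), the scoring values, the linear gap penalty and the band width are all hypothetical choices. Candidate locations are represented as diagonals (reference position minus read position), and the Smith-Waterman recursion is evaluated only for cells within a fixed distance of the chosen diagonal.

```python
from collections import Counter


def sample_index(ref, k, step):
    """Index words of length k starting at positions 0, step, 2*step, ..."""
    index = {}
    for pos in range(0, len(ref) - k + 1, step):
        index.setdefault(ref[pos:pos + k], []).append(pos)
    return index


def candidate_diagonals(read, index, k):
    """Count seed hits per diagonal (reference position minus read position)."""
    votes = Counter()
    for i in range(len(read) - k + 1):  # look up every word of the read
        for ref_pos in index.get(read[i:i + k], ()):
            votes[ref_pos - i] += 1
    return votes


def banded_sw_score(read, ref, diagonal, band=3, match=1, mismatch=-2, gap=-3):
    """Best local alignment score using only cells within `band` of `diagonal`."""
    best = 0
    prev = {}  # scores of the previous row: column -> score
    for i in range(1, len(read) + 1):
        cur = {}
        centre = i + diagonal  # 1-based column lying on the diagonal in this row
        lo = max(1, centre - band)
        hi = min(len(ref), centre + band)
        for j in range(lo, hi + 1):
            sub = match if read[i - 1] == ref[j - 1] else mismatch
            score = max(
                0,                         # start a new local alignment
                prev.get(j - 1, 0) + sub,  # match or mismatch
                prev.get(j, 0) + gap,      # gap in the reference
                cur.get(j - 1, 0) + gap,   # gap in the read
            )
            cur[j] = score
            best = max(best, score)
        prev = cur
    return best


ref = "TTGACCATGCAGTTCAAG"
read = ref[4:14]  # a read copied from the reference without errors
votes = candidate_diagonals(read, sample_index(ref, k=4, step=2), k=4)
diagonal, hits = votes.most_common(1)[0]
print("best diagonal:", diagonal, "seed hits:", hits)
print("banded score:", banded_sw_score(read, ref, diagonal))
```

Restricting the dynamic programming to a band keeps the cost proportional to read length times band width rather than read length times reference length.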

The best gapped alignment of each read is reported, together with a score for the reliability of the mapping. The user can adjust the trade-off between sensitivity and speed by tuning the length and spacing of the hashed words.
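A small worked calculation shows why word length and spacing trade sensitivity against index size. It assumes words are sampled at reference positions 0, s, 2s, ... and that every word of the read is looked up; both are illustrative assumptions, and the function names are hypothetical. Under these assumptions, any error-free stretch of k + s - 1 bases is guaranteed to contain one sampled word in full, because among s consecutive start positions exactly one is a multiple of s.

```python
def sampled_word_count(ref_len: int, k: int, step: int) -> int:
    """Number of words stored when sampling starts 0, step, 2*step, ..."""
    if ref_len < k:
        return 0
    return (ref_len - k) // step + 1


def min_exact_stretch(k: int, step: int) -> int:
    """Shortest error-free stretch guaranteed to contain a sampled word in full."""
    return k + step - 1


# Compare a dense and a sparse sampling of a hypothetical 1 Mbp reference.
for k, step in [(13, 1), (13, 6)]:
    print(
        f"k={k} s={step}: "
        f"{sampled_word_count(1_000_000, k, step)} words indexed, "
        f"seed guaranteed by {min_exact_stretch(k, step)} exact bases"
    )
```

A larger spacing shrinks the index roughly by the factor s, at the price of requiring a longer error-free stretch before a seed is guaranteed.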

A mode for the detection of split (chimeric) reads is provided. Multi-threaded program execution is supported.

Running SMALT

Mapping with SMALT involves two steps: First, a hash index has to be generated for the genomic reference sequences. Then the sequencing reads are mapped onto the reference using the index.

All sequence input files have to be in FASTA or FASTQ format.

  1. smalt index -k 13 -s 6 hs37k13s6 NCBI37.fasta

    builds a hash table for the human genome in file NCBI37.fasta. Two files, hs37k13s6.smi and hs37k13s6.sma, are written to disk.

    -k 13 specifies the length, -s 6 the spacing of the hashed words. This setting is suitable for human DNA reads of the Illumina-Solexa platform with read length > 70 nucleotides.

  2. smalt map -i 800 -f samsoft -o map.sam hs37k13s6 mate_1.fastq mate_2.fastq

    loads the hash table created by the previous step into memory and maps paired-end reads in the files mate_1.fastq and mate_2.fastq with an expected range of insert sizes of up to 800 bp.

    The output is written to the file map.sam in SAM output format using soft clipping of sequences.
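The two steps above can be chained from a script. The following Python sketch only uses the options shown in the two commands above (-k, -s, -i, -f samsoft, -o); the function names and the wrapper itself are hypothetical and not part of SMALT. It assumes a smalt executable is available on the PATH when run_pipeline is called.

```python
import shutil
import subprocess


def index_command(index_name, reference_fasta, word_len=13, step=6):
    """Argument list for step 1: build the hash index."""
    return ["smalt", "index", "-k", str(word_len), "-s", str(step),
            index_name, reference_fasta]


def map_command(index_name, mate1, mate2, out_sam, max_insert=800):
    """Argument list for step 2: map paired-end reads, SAM output with soft clipping."""
    return ["smalt", "map", "-i", str(max_insert), "-f", "samsoft",
            "-o", out_sam, index_name, mate1, mate2]


def run_pipeline(index_name, reference_fasta, mate1, mate2, out_sam):
    """Run both steps in order, stopping at the first failure."""
    if shutil.which("smalt") is None:
        raise FileNotFoundError("smalt executable not found on PATH")
    subprocess.run(index_command(index_name, reference_fasta), check=True)
    subprocess.run(map_command(index_name, mate1, mate2, out_sam), check=True)


# Show the commands that would be executed for the example in this section.
print(" ".join(index_command("hs37k13s6", "NCBI37.fasta")))
print(" ".join(map_command("hs37k13s6", "mate_1.fastq", "mate_2.fastq", "map.sam")))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.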


Current version - SMALT v0.7.5

Released 16th July 2013

Older versions

Older versions of SMALT, up to version 0.7.4, are available as binaries.

License and copyright

© 2010 - 2013 Genome Research Limited.

The source code is available under the GNU General Public License.


Questions and comments about SMALT should be directed to the author, Hannes Ponstingl.
