SMALT aligns DNA sequencing reads with a reference genome.
Reads from a wide range of sequencing platforms can be processed, for example Illumina, Roche-454, Ion Torrent, PacBio or ABI-Sanger. Paired reads are supported. There is no support for SOLiD reads.
A mode for the detection of split (chimeric) reads is provided. Multi-threaded program execution is supported.
Clone the git repository or download the gzipped tar archive from
and follow the installation instructions there or issue the following commands:
tar zxvf smalt-0.7.6.tar.gz
Binary distributions up to version 0.7.4 are available from
SMALT employs a hash index of short words up to 20 nucleotides long and sampled at equidistant steps along the reference genome. For each sequencing read, potentially matching segments in the reference genome are identified from seed matches in the index and subsequently aligned with the read using dynamic programming.
The best gapped alignments of each read are reported, including a score for the reliability of the best mapping. The user can adjust the trade-off between sensitivity and speed by tuning the length and spacing of the hashed words.
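The seeding idea described above can be illustrated with a toy sketch. This is not SMALT's implementation: the function names build_index and candidate_offsets, the example sequence and the simple voting scheme are assumptions made for illustration, and the dynamic programming step is left out.

```python
from collections import Counter, defaultdict


def build_index(reference, k, step):
    """Map each k-mer sampled every `step` bases to its reference positions."""
    index = defaultdict(list)
    for pos in range(0, len(reference) - k + 1, step):
        index[reference[pos:pos + k]].append(pos)
    return index


def candidate_offsets(read, index, k):
    """Count seed hits per implied read start (reference pos minus read pos)."""
    votes = Counter()
    for i in range(len(read) - k + 1):
        for ref_pos in index.get(read[i:i + k], ()):
            votes[ref_pos - i] += 1
    return votes


# Hypothetical toy data: the read is an exact copy of reference[10:26].
reference = "ACGTTGCATGGCTAAGCTTAGGCTAACGTAGCTAGGATCC"
index = build_index(reference, k=4, step=2)
read = reference[10:26]

votes = candidate_offsets(read, index, k=4)
best_offset, hits = votes.most_common(1)[0]
print("best candidate start:", best_offset, "seed hits:", hits)
```

In this sketch the true start position collects the most seed hits, while repeated words produce a few scattered single votes elsewhere. A real mapper would then align the read against the top-ranked segments. Increasing step shrinks the index but leaves fewer seeds per read, which is the sensitivity versus speed trade-off mentioned above.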
Mapping with SMALT involves two steps: First, a hash index has to be generated for the genomic reference sequences. Then the sequencing reads are mapped onto the reference using the index.
> smalt index -k 14 -s 8 hs38_k14s8 GRCh38.fasta
builds a hash index for the human genome in the FASTA file GRCh38.fasta. Words 14 base pairs long are sampled at every 8th position in the genome. Two files, hs38_k14s8.smi and hs38_k14s8.sma, are written to disk.
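To get a feel for how the word length and step affect the index, the number of sampled words in one sequence can be estimated with simple arithmetic. This is a back-of-the-envelope sketch assuming plain equidistant sampling starting at the first base; the function name sampled_word_count is hypothetical, and SMALT's actual handling of sequence boundaries or ambiguous bases may differ.

```python
def sampled_word_count(seq_len, k, step):
    """Number of k-mers taken every `step` bases from a sequence of seq_len bases."""
    if seq_len < k:
        return 0
    return (seq_len - k) // step + 1


# Hypothetical 1000-base sequence, using the -k and -s values from the command above.
n_step8 = sampled_word_count(1000, k=14, step=8)
n_step1 = sampled_word_count(1000, k=14, step=1)
print(n_step8, n_step1)
```

Sampling every 8th position stores roughly one eighth as many words as sampling every position.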
> smalt map -o mapped.sam hs38_k14s8 mates_1.fastq mates_2.fastq
loads the hash table created by the previous step into memory and maps paired-end reads in the files mates_1.fastq and mates_2.fastq. The output is written to the file mapped.sam in SAM format.
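The two steps can also be driven from a script. The following is a minimal sketch that assumes the smalt executable is on the PATH and uses only the options shown in the commands above; the helper names index_command, map_command and run_pipeline are hypothetical.

```python
import shlex
import subprocess


def index_command(index_name, reference_fasta, word_len=14, step=8):
    """Build the argument list for 'smalt index' as used above."""
    return ["smalt", "index", "-k", str(word_len), "-s", str(step),
            index_name, reference_fasta]


def map_command(index_name, output_sam, mates_1, mates_2):
    """Build the argument list for paired-end 'smalt map' as used above."""
    return ["smalt", "map", "-o", output_sam, index_name, mates_1, mates_2]


def run_pipeline(index_name, reference_fasta, output_sam, mates_1, mates_2):
    """Run indexing, then mapping; raises CalledProcessError if a step fails."""
    subprocess.run(index_command(index_name, reference_fasta), check=True)
    subprocess.run(map_command(index_name, output_sam, mates_1, mates_2),
                   check=True)


# Print the commands without running them (smalt may not be installed here).
idx_cmd = index_command("hs38_k14s8", "GRCh38.fasta")
map_cmd = map_command("hs38_k14s8", "mapped.sam",
                      "mates_1.fastq", "mates_2.fastq")
print(shlex.join(idx_cmd))
print(shlex.join(map_cmd))
```

Calling run_pipeline with the same arguments would execute both steps in order. Because the index only needs to be built once per reference, a real script would typically skip the first step when the index files already exist.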
SMALT is Copyright (C) 2010 – 2015 Genome Research Ltd.
The source code is provided under the GNU General Public License version 3 (GPLv3).