SMALT efficiently aligns DNA sequencing reads with a reference genome.
Reads from a wide range of sequencing platforms, for example Illumina, Roche-454, Ion Torrent, PacBio or ABI-Sanger, can be processed, including paired reads.
The software employs a hash index of short words (< 21 nucleotides long), sampled at equidistant steps along the genomic reference sequences.
For each read, potentially matching segments in the reference are identified from seed matches in the index and subsequently aligned with the read using a banded Smith-Waterman algorithm.
The best gapped alignment of each read is reported, together with a score for the reliability of that mapping. The user can adjust the trade-off between sensitivity and speed by tuning the length and spacing of the hashed words.
A mode for the detection of split (chimeric) reads is provided. Multi-threaded program execution is supported.
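The seeding step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not SMALT's actual implementation: the function names build_index and seed_diagonals, the toy reference sequence and the values k=5, s=3 are all hypothetical, and a plain dictionary stands in for the hash index.

```python
from collections import defaultdict


def build_index(reference, k, s):
    """Map each k-mer starting at positions 0, s, 2s, ... to its start positions."""
    index = defaultdict(list)
    for pos in range(0, len(reference) - k + 1, s):
        index[reference[pos:pos + k]].append(pos)
    return index


def seed_diagonals(read, index, k):
    """Count seed hits per diagonal (reference position minus read offset)."""
    hits = defaultdict(int)
    # Every k-mer of the read is looked up, because only the reference is sampled.
    for offset in range(len(read) - k + 1):
        for ref_pos in index.get(read[offset:offset + k], ()):
            hits[ref_pos - offset] += 1
    return dict(hits)


# Toy data (hypothetical): the read is an exact copy of reference[7:21].
reference = "ACGTTGCAAGGCTTAACCGGTTAGCATCG"
index = build_index(reference, k=5, s=3)
read = reference[7:21]
candidates = seed_diagonals(read, index, k=5)
print(candidates)
```

Diagonals that collect several seed hits mark the reference segments worth passing on to the more expensive banded alignment; in this sketch the read's true offset, 7, is the only diagonal with hits.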
Mapping with SMALT involves two steps: First, a hash index has to be generated for the genomic reference sequences. Then the sequencing reads are mapped onto the reference using the index.
All sequence input files have to be in FASTA or FASTQ format.
smalt index -k 13 -s 6 hs37k13s6 NCBI37.fasta
builds a hash table for the human genome in the file NCBI37.fasta. Two files, hs37k13s6.smi
and hs37k13s6.sma, are written to disk.
The option -k 13 specifies the length and -s 6 the spacing of the hashed words. This setting is
suitable for human DNA reads from the Illumina-Solexa platform that are longer than 70 nucleotides.
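The effect of word length and spacing can be made concrete with some simple arithmetic. The sketch below assumes that words are sampled starting at position 0 of a single sequence; the function names and the example reference length of 1,000,000 nucleotides are hypothetical, and SMALT's real index layout may differ in detail.

```python
def sampled_word_count(ref_length, k, s):
    """Number of k-mers stored when sampling every s-th position from 0."""
    if ref_length < k:
        return 0
    return (ref_length - k) // s + 1


def min_guaranteed_match_length(k, s):
    """Shortest exact match that is certain to contain one sampled k-mer.

    Any s consecutive positions include a sampled start, so a window of
    k + s - 1 matching bases always covers a complete sampled word.
    """
    return k + s - 1


# Hypothetical 1 Mb reference, with the -k 13 -s 6 setting from the example.
print(sampled_word_count(1_000_000, k=13, s=6))
print(sampled_word_count(1_000_000, k=13, s=1))
print(min_guaranteed_match_length(k=13, s=6))
```

A larger spacing shrinks the index roughly in proportion to s, but a read then needs a longer error-free stretch before a seed is guaranteed, which is one way to view the trade-off between speed and sensitivity.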
smalt map -i 800 -f samsoft -o map.sam hs37k13s6 mate_1.fastq mate_2.fastq
loads the hash table created by the previous step into memory and maps paired-end reads in the files
mate_1.fastq and mate_2.fastq with an expected range of insert sizes of up to 800 bp.
The output is written to the file map.sam in SAM output format using soft clipping of sequences.
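Soft clipping shows up in the CIGAR column of the SAM file as S operations. As a rough illustration of how such output could be inspected, the following sketch parses the first six mandatory SAM columns and counts soft-clipped bases. The helper names and the example record are hypothetical and were not produced by SMALT.

```python
import re

# CIGAR operations as defined in the SAM format specification.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def soft_clipped_bases(cigar):
    """Total number of read bases marked as soft-clipped (S)."""
    return sum(n for n, op in parse_cigar(cigar) if op == "S")


def parse_sam_line(line):
    """Extract the first six mandatory columns of a SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    return {
        "qname": fields[0],
        "flag": int(fields[1]),
        "rname": fields[2],
        "pos": int(fields[3]),   # 1-based leftmost mapping position
        "mapq": int(fields[4]),
        "cigar": fields[5],
    }


# Hypothetical record for illustration only.
example = "read1\t99\tchr1\t1000\t60\t5S90M5S\t=\t1300\t400\t*\t*"
record = parse_sam_line(example)
print(record["rname"], record["pos"], soft_clipped_bases(record["cigar"]))
```

Header lines in a SAM file start with @ and would need to be skipped before calling parse_sam_line.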
Released 10th April 2013
Note: BAM output was broken in versions 0.7.0 to 0.7.1: reference positions were off by +1 bp, and CIGAR strings could be incorrect.
Older versions of SMALT are available on the FTP site.
© 2010 - 2013 Genome Research Limited.
The source code will be made available shortly under the GNU General Public License. www.gnu.org/licenses/
Questions and comments about SMALT should be directed to the author, Hannes Ponstingl.