Tools to automatically generate high-quality sequence by ordering contigs, closing gaps, correcting sequence errors and transferring annotation.
Are there other ways to improve the assembly, e.g. manually?
This is a very complex topic and not really part of PAGIT as such, but the Wellcome Trust Advanced Courses do teach how to generate and improve assemblies (in the working with pathogens workshop). Please find here the PDF of the module; it may help you find mis-assemblies, understand the reasons behind them, and fix them manually.
- The script to join chromosomes for ABACAS was missing. Please download it and unzip its contents into the PAGIT/ABACAS directory.
- Promer for ABACAS. The paths in the promer file were set incorrectly. A new file is here: download it. Please use it to replace the file in the directory PAGIT/bin/.
- ABACAS option order. A bug was reported in ABACAS: the order in which the parameters are given matters. Please double-check the order of your options.
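Because the option order can matter, one way to avoid surprises is to build the ABACAS call in a single place so that the order never varies between runs. The following is only a minimal sketch: the helper name `abacas_command` is hypothetical, and it assumes that the order shown in the ABACAS usage message (`-r` reference, `-q` query, `-p` nucmer/promer) is the safe one. The bug report above does not state which order triggers the problem.

```python
def abacas_command(reference, query, program="nucmer", extra_options=()):
    """Build an ABACAS argument list with a fixed option order.

    Hypothetical helper: it only assembles the command, it does not run it.
    The order -r, -q, -p follows the ABACAS usage message; treating that
    order as the safe one is an assumption.
    """
    if program not in ("nucmer", "promer"):
        raise ValueError("program must be 'nucmer' or 'promer'")
    # Required options first, always in the same order; extras go last.
    return ["perl", "abacas.pl",
            "-r", reference,
            "-q", query,
            "-p", program,
            *extra_options]


if __name__ == "__main__":
    # File names are placeholders for illustration.
    print(" ".join(abacas_command("reference.fasta", "contigs.fasta")))
```

The resulting list can be passed to `subprocess.run` once the path to `abacas.pl` has been adjusted to your installation.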
PAGIT is free software and is distributed under the terms of the GNU General Public License.
PAGIT relies on other freely available bioinformatics software developed by third parties. The list of this third-party software is as follows:
- Artemis – annotation & BAM visualization tool
- ACT – a DNA sequence comparison viewer
- BLASTALL – sequence comparison tool
- BWA – read mapping tool (Burrows-Wheeler transform based)
- MUMmer – sequence comparison tools
- SAMTOOLS – suite to work with BAM files
- SMALT – read mapping tool (k-mer based)
- VELVET – short read assembler
Extra care must be taken when working with genomes larger than 200 Mb.
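To see whether an assembly falls into this category, the total sequence length of a FASTA file can be summed before running the toolkit. This is a minimal sketch and not part of PAGIT; the function names and the constant are hypothetical, and it assumes the threshold refers to 200 million bases.

```python
import sys

# Assumption: "200 Mb" is read as 200 million bases.
SIZE_THRESHOLD_BP = 200_000_000


def genome_size(fasta_handle):
    """Return the total number of sequence characters in a FASTA stream."""
    total = 0
    for line in fasta_handle:
        line = line.strip()
        # Skip blank lines and header lines starting with '>'.
        if line and not line.startswith(">"):
            total += len(line)
    return total


def needs_extra_care(size_bp, threshold_bp=SIZE_THRESHOLD_BP):
    """True if the genome is larger than the threshold."""
    return size_bp > threshold_bp


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        size = genome_size(handle)
    print(f"{size} bp; larger than 200 Mb: {needs_extra_care(size)}")
```

Run as a script, it takes the path to a FASTA file as its only argument.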
If you make use of this software in your research, please cite as:
Swain MT, Tsai IJ, Assefa SA, Newbold C, Berriman M, Otto TD. A post-assembly genome-improvement toolkit (PAGIT) to obtain annotated genomes from contigs. Nat Protoc. 2012;7(7):1260-84. doi: 10.1038/nprot.2012.068. PubMed PMID: 22678431; PubMed Central PMCID: PMC3648784.
PAGIT is built from the following four tools, which can also be cited individually:
ABACAS: Assefa S, Keane TM, Otto TD, Newbold C, Berriman M. ABACAS: algorithm-based automatic contiguation of assembled sequences. Bioinformatics. 2009 Aug 1;25(15):1968-9. doi: 10.1093/bioinformatics/btp347. Epub 2009 Jun 3. PubMed PMID: 19497936; PubMed Central PMCID: PMC2712343.
IMAGE: Tsai IJ, Otto TD, Berriman M. Improving draft assemblies by iterative mapping and assembly of short reads to eliminate gaps. Genome Biol. 2010;11(4):R41. doi: 10.1186/gb-2010-11-4-r41. Epub 2010 Apr 13. PubMed PMID: 20388197; PubMed Central PMCID: PMC2884544.
ICORN: Otto TD, Sanders M, Berriman M, Newbold C. Iterative Correction of Reference Nucleotides (iCORN) using second generation sequencing technology. Bioinformatics. 2010 Jul 15;26(14):1704-7. doi: 10.1093/bioinformatics/btq269. Epub 2010 Jun 18. PubMed PMID: 20562415; PubMed Central PMCID: PMC2894513.
RATT: Otto TD, Dillon GP, Degrave WS, Berriman M. RATT: Rapid Annotation Transfer Tool. Nucleic Acids Res. 2011 May;39(9):e57. doi: 10.1093/nar/gkq1268. Epub 2011 Feb 8. PubMed PMID: 21306991; PubMed Central PMCID: PMC3089447.