PAGIT

Tools to automatically generate high-quality sequence by ordering contigs, closing gaps, correcting sequence errors and transferring annotation.

PAGIT addresses the need for software to generate high-quality draft genomes. It is based on a series of programs that we developed:

  • ABACAS, which contiguates (orders and orients) contigs from a de novo assembly against a closely related reference.
  • IMAGE, an iterative approach for closing gaps in assembled genomes using mate-pair information. It can close gaps left open by the assembler in a draft genome, even when using the same data sets as the original assembler.
  • iCORN, which corrects errors in the consensus sequence by iteratively mapping reads to the current assembly. An improved version, designed especially for correcting Pacific Biosciences (PacBio) assemblies, is also available.
  • RATT, a tool to transfer the annotation from a reference genome, or an earlier assembly, onto the latest assembly.

PAGIT bundles these programs and makes them more accessible to users.
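The contiguation idea behind ABACAS can be pictured with a small sketch. This is a simplified illustration, not ABACAS's implementation: it assumes each contig has already been placed on the reference (for example from nucmer or promer alignments) and is described by a hypothetical `Placement` record holding a start coordinate and a strand. Contigs are sorted by reference position, minus-strand contigs are reverse-complemented, and a fixed run of `N`s (an assumed gap size) separates neighbours.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical record: where a contig was placed on the reference,
# e.g. derived from nucmer/promer alignments (not ABACAS's own format).
@dataclass
class Placement:
    contig: str     # contig identifier
    ref_start: int  # start coordinate of the contig on the reference
    strand: str     # "+" if the contig aligns forward, "-" if reversed


_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def order_contigs(contigs: dict[str, str], placements: list[Placement], gap: int = 100) -> str:
    """Join placed contigs in reference order into one pseudo-molecule.

    Minus-strand contigs are reverse-complemented; neighbours are separated
    by `gap` Ns (an assumed fixed gap size). Contigs without a placement
    are simply left out.
    """
    pieces = []
    for placement in sorted(placements, key=lambda p: p.ref_start):
        seq = contigs[placement.contig]
        if placement.strand == "-":
            seq = reverse_complement(seq)
        pieces.append(seq)
    return ("N" * gap).join(pieces)


if __name__ == "__main__":
    contigs = {"c1": "AAAC", "c2": "GGTT", "c3": "CCCA"}
    placements = [
        Placement("c2", 500, "+"),
        Placement("c1", 10, "-"),
        Placement("c3", 900, "+"),
    ]
    print(order_contigs(contigs, placements, gap=3))  # GTTTNNNGGTTNNNCCCA
```

Real tools also have to deal with contigs that align to several places or not at all, and may size gaps from the reference coordinates; the fixed gap here keeps the example short.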

Further information

Are there other ways to improve the assembly, e.g. manually?

This is a very complex topic and not really part of PAGIT as such, but the Wellcome Trust Advanced Courses do teach how to generate and improve assemblies (in the working with pathogens workshop). The PDF of that module may help you find mis-assemblies, understand why they arise, and fix them manually.

Bug fixes

  • The script to join chromosomes for ABACAS was missing. Please download it and unzip the contents into the PAGIT/ABACAS directory.
  • Promer path in ABACAS. The paths in the promer file were set incorrectly. Download the new file and use it to replace the existing one in the PAGIT/bin/ directory.
  • ABACAS option order. A bug was reported in ABACAS whereby the order of the command-line parameters matters. Please double-check the order of your parameters.

PAGIT is free software and is distributed under the terms of the GNU General Public License.

PAGIT relies on the following freely available bioinformatics software developed by third parties:

  • Artemis – annotation & BAM visualization tool
  • ACT – a DNA sequence comparison viewer
  • BLASTALL – sequence comparison tool
  • BWA – read mapping tool (Burrows-Wheeler transform based)
  • MUMmer – sequence comparison tools
  • SAMTOOLS – suite to work with BAM files
  • SMALT – read mapping tool (k-mer based)
  • VELVET – short read assembler

Warning

Extra care must be taken when working with genomes larger than 200 Mb.
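Before running PAGIT on a large genome, it can help to check the assembly size against the 200 Mb threshold. The sketch below is a hypothetical helper, not part of PAGIT: it sums the sequence lengths in a FASTA file, skipping `>` header lines, and assumes 1 Mb = 1,000,000 bases.

```python
from __future__ import annotations

import sys
from typing import Iterable

# Threshold from the warning above; assumes 1 Mb = 1,000,000 bases.
LARGE_GENOME_BASES = 200_000_000


def total_length(lines: Iterable[str]) -> int:
    """Sum the sequence lengths in FASTA text, skipping '>' header lines."""
    total = 0
    for line in lines:
        line = line.strip()
        if line and not line.startswith(">"):
            total += len(line)
    return total


def needs_extra_care(lines: Iterable[str], threshold: int = LARGE_GENOME_BASES) -> bool:
    """Return True if the assembly is larger than the threshold."""
    return total_length(lines) > threshold


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python check_size.py assembly.fasta
    with open(sys.argv[1]) as handle:
        size = total_length(handle)
    print(f"{size:,} bases; larger than 200 Mb: {size > LARGE_GENOME_BASES}")
```

The file is read line by line, so even a very large assembly is not loaded into memory at once.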

Citations

If you make use of this software in your research, please cite as:

Swain MT, Tsai IJ, Assefa SA, Newbold C, Berriman M, Otto TD. A post-assembly genome-improvement toolkit (PAGIT) to obtain annotated genomes from contigs. Nat Protoc. 2012;7(7):1260-84. doi: 10.1038/nprot.2012.068. PubMed PMID: 22678431; PubMed Central PMCID: PMC3648784.

PAGIT is built from the following four tools, which can also be cited individually:

ABACAS: Assefa S, Keane TM, Otto TD, Newbold C, Berriman M. ABACAS: algorithm-based automatic contiguation of assembled sequences. Bioinformatics. 2009 Aug 1;25(15):1968-9. doi: 10.1093/bioinformatics/btp347. Epub 2009 Jun 3. PubMed PMID: 19497936; PubMed Central PMCID: PMC2712343.

IMAGE: Tsai IJ, Otto TD, Berriman M. Improving draft assemblies by iterative mapping and assembly of short reads to eliminate gaps. Genome Biol. 2010;11(4):R41. doi: 10.1186/gb-2010-11-4-r41. Epub 2010 Apr 13. PubMed PMID: 20388197; PubMed Central PMCID: PMC2884544.

iCORN: Otto TD, Sanders M, Berriman M, Newbold C. Iterative Correction of Reference Nucleotides (iCORN) using second generation sequencing technology. Bioinformatics. 2010 Jul 15;26(14):1704-7. doi: 10.1093/bioinformatics/btq269. Epub 2010 Jun 18. PubMed PMID: 20562415; PubMed Central PMCID: PMC2894513.

RATT: Otto TD, Dillon GP, Degrave WS, Berriman M. RATT: Rapid Annotation Transfer Tool. Nucleic Acids Res. 2011 May;39(9):e57. doi: 10.1093/nar/gkq1268. Epub 2011 Feb 8. PubMed PMID: 21306991; PubMed Central PMCID: PMC3089447.

Contact

If you need help or have any queries, please contact us using the details below.

For further information please contact Thomas Otto.


Sanger Institute Contributors


Dr Matt Berriman

Senior Group Leader

Previous contributors


Dr Thomas D Otto

Senior Staff Scientist