Tools to automatically generate high-quality sequence by ordering contigs, closing gaps, correcting sequence errors and transferring annotation.
This page is maintained as a historical record and is no longer being updated.
NOTE: Links on this page may no longer work as the page is no longer actively maintained
PAGIT is based on a series of programs that we developed:
- ABACAS, which contiguates contigs from a de novo assembly against a closely related reference.
- IMAGE, an iterative approach for closing gaps in assembled genomes using mate-pair information. It can close gaps left open by the assembler in a draft genome, even when using the same data sets as the original assembler.
- iCORN, which corrects errors in the consensus sequence by iteratively mapping reads to the current assembly. An improved version, especially for correcting Pacific Biosciences (PacBio) assemblies, can be found here.
- RATT, a tool to transfer the annotation from a reference genome, or an earlier assembly, onto the latest assembly.
PAGIT bundles these programs and makes them more accessible for users.
PAGIT is compiled for Linux/Unix systems and is also available as a virtual machine. The installation procedure is below.
- Virtual Machine 32 bit (5.5Gb)
- Virtual Machine 32 bit – bzip2 (1.5Gb)
- Virtual Machine 64 bit (3.9Gb)
- Virtual Machine 64 bit – bzip2 (0.9Gb)
- Download the appropriate compressed tar archive for your Linux system by clicking on the Linux binary (64-bit) link above.
- Move the compressed tar archive to the location where you want PAGIT installed, then decompress the tar ball by typing the following commands in a terminal window:
mv PAGIT.V1.64bit.tgz /path/to/my/installed/software
cd /path/to/my/installed/software
tar xzf PAGIT.V1.64bit.tgz
- Now execute the install script by typing the following in a terminal window:
- Each time you want to run, source the environment settings to run PAGIT:
- (Optional) The environment settings for PAGIT must be sourced in every new shell session before PAGIT is run. Alternatively, the command source PAGIT/sourceme.pagit can be added to your shell startup file (for example .bashrc) so that the PAGIT environment is initialised automatically.
- We assume that the tcsh shell and Java (version 1.6 or above) are installed on the system.
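The optional startup-file step and the prerequisite check above can be sketched as two small shell helpers. This is only an illustration: the function names add_pagit_source_line and check_pagit_prereqs are hypothetical and not part of PAGIT, and the install directory is whatever path you unpacked the archive into. The first helper appends the source line only when it is not already present, so running it twice does not duplicate the entry.

```shell
# Sketch only: helper names and example paths are assumptions, not PAGIT code.

# Append "source <pagit_dir>/sourceme.pagit" to a startup file unless the
# exact line is already there.
add_pagit_source_line() {
    local rc_file="$1"      # e.g. "$HOME/.bashrc"
    local pagit_dir="$2"    # e.g. /path/to/my/installed/software/PAGIT
    local line="source ${pagit_dir}/sourceme.pagit"

    touch "$rc_file"
    # -x: match the whole line, -F: treat the pattern as a fixed string
    if ! grep -qxF -- "$line" "$rc_file"; then
        printf '%s\n' "$line" >> "$rc_file"
    fi
}

# Report whether tcsh and java are on the PATH.
# Returns 0 if both are found, 1 otherwise. The Java version is not checked.
check_pagit_prereqs() {
    local tool
    local missing=0
    for tool in tcsh java; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, add_pagit_source_line "$HOME/.bashrc" /path/to/my/installed/software/PAGIT would register the environment file for future bash sessions, assuming the archive unpacked into a PAGIT directory at that location.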
Installation: Virtual Machine
The virtual machine was tested on Windows and Mac OS. At least 4GB of memory is recommended when working with bacterial-size genomes. If the machine has less memory, setting up swap space might be required; see below.
- If you have not already done so, download the VirtualBox software from the VirtualBox website and install it according to the VirtualBox documentation.
- Download the PAGIT virtual machine required for your operating system. Click on either the Virtual Machine 32 bit or the Virtual Machine 64 bit link above.
- If you choose the bzip2 version, you will need to decompress the file first. Depending on your operating system, double-clicking the file should do this.
- Open VirtualBox and click on New to create a new virtual machine. Click on Next to move through the registration screens.
- Give the virtual machine a name (e.g. PAGIT) and select the operating system and version: Linux, and then either Ubuntu or Ubuntu64.
- Specify the amount of memory to be allocated. Do not give the virtual machine more than 75% of the host's total memory, but allocate at least 2GB.
- Specify the virtual hard disk: select the "use existing hard disk" option, then click on the file icon to find and select the downloaded PAGIT virtual machine.
- To start the virtual machine, select it and click on the green arrow.
- If a terminal is not already open, open one using the third-to-last icon on the left side.
- As all variables are already set, you can try the test set with:
cd ~/bin/PAGIT/exampleTestset/
./dotestrun.sh
All four programs of PAGIT should run through and at the end, an ACT window will open.
IMPORTANT: The password for root is wt. For the user pagit it is pagitvm.
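The memory guidance in the steps above (no more than 75% of the host's total memory, and at least 2GB) amounts to a small calculation, sketched here as a shell function. The function name pagit_vm_memory_mb is hypothetical, and the sketch assumes you already know the host's total memory in MB; it is not part of PAGIT or VirtualBox.

```shell
# Sketch only: the function name is an assumption, not part of PAGIT.

# Suggest a memory allocation (in MB) for the virtual machine, given the
# host's total memory in MB. The suggestion is the 75% ceiling; the function
# fails if that ceiling falls below the 2GB (2048 MB) minimum.
pagit_vm_memory_mb() {
    local total_mb="$1"
    local min_mb=2048
    local cap_mb=$(( total_mb * 75 / 100 ))

    if [ "$cap_mb" -lt "$min_mb" ]; then
        echo "75% of ${total_mb} MB is ${cap_mb} MB, below the ${min_mb} MB minimum" >&2
        return 1
    fi
    echo "$cap_mb"
}
```

With a host that has 8192 MB, pagit_vm_memory_mb 8192 prints 6144. A host with only 2048 MB cannot satisfy both limits at once, so the function reports an error instead of a value.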
Are there other ways to improve the assembly, e.g. manually?
This is a very complex topic and not really part of PAGIT as such, but the Wellcome Trust Advanced Courses do teach how to generate and improve assemblies (in the working with pathogens workshop). The PDF of the module can be found here; it might help you find mis-assemblies, understand the reasons behind them, and fix them manually.
- The script to join chromosomes for ABACAS was missing. Please download it and unzip its contents into the PAGIT/ABACAS directory.
- Promer in ABACAS. The paths in the promer file were set incorrectly. A new file is here: download it. Please replace the existing file in the directory PAGIT/bin/.
- ABACAS option order. A bug was reported in ABACAS: the order of the parameters matters. Please double-check the order when running it.
PAGIT is free software and is distributed under the terms of the GNU General Public License.
PAGIT relies on other freely available bioinformatics software developed by third parties. The list of this third-party software is as follows:
- Artemis – annotation & BAM visualization tool
- ACT – a DNA sequence comparison viewer
- BLASTALL – sequence comparison tool
- BWA – read mapping tool (Burrows-Wheeler transformation based)
- MUMmer – sequence comparison tools
- SAMTOOLS – suite to work with BAM files
- SMALT – read mapping tool (k-mer based)
- VELVET – short read assembler
Extra care must be taken when working with genomes larger than 200 Mb.
If you make use of this software in your research, please cite as:
Swain MT, Tsai IJ, Assefa SA, Newbold C, Berriman M, Otto TD. A post-assembly genome-improvement toolkit (PAGIT) to obtain annotated genomes from contigs. Nature protocols 2012;7;7;1260-84, PUBMED: 22678431; PMC: 3648784; DOI: 10.1038/nprot.2012.068
PAGIT is built from the following four tools, which can also be cited individually:
ABACAS: Assefa S, Keane TM, Otto TD, Newbold C, Berriman M. ABACAS: algorithm-based automatic contiguation of assembled sequences. Bioinformatics. 2009 Aug 1;25(15):1968-9. doi: 10.1093/bioinformatics/btp347. Epub 2009 Jun 3. PubMed PMID: 19497936; PubMed Central PMCID: PMC2712343.
IMAGE: Tsai IJ, Otto TD, Berriman M. Improving draft assemblies by iterative mapping and assembly of short reads to eliminate gaps. Genome Biol. 2010;11(4):R41. doi: 10.1186/gb-2010-11-4-r41. Epub 2010 Apr 13. PubMed PMID: 20388197; PubMed Central PMCID: PMC2884544
iCORN: Otto TD, Sanders M, Berriman M, Newbold C. Iterative Correction of Reference Nucleotides (iCORN) using second generation sequencing technology. Bioinformatics. 2010 Jul 15;26(14):1704-7. doi: 10.1093/bioinformatics/btq269. Epub 2010 Jun 18. PubMed PMID: 20562415; PubMed Central PMCID: PMC2894513.
RATT: Otto TD, Dillon GP, Degrave WS, Berriman M. RATT: Rapid Annotation Transfer Tool. Nucleic Acids Res. 2011 May;39(9):e57. doi: 10.1093/nar/gkq1268. Epub 2011 Feb 8. PubMed PMID: 21306991; PubMed Central PMCID: PMC3089447.