REAPR - Recognising Errors in Assemblies using Paired Reads

A tool that evaluates the accuracy of a genome assembly using mapped paired-end reads.

REAPR evaluates the accuracy of a genome assembly using mapped paired-end reads, without needing a reference genome for comparison. It can be used at any stage of an assembly pipeline to automatically break incorrect scaffolds and to flag other errors in the assembly for manual inspection. It reports mis-assemblies and other warnings, and produces a new assembly in which scaffolds have been broken at the error calls.

REAPR was published in Genome Biology: "REAPR: a universal tool for genome assembly evaluation". Genome Biology 2013, 14:R47, doi:10.1186/gb-2013-14-5-r47.


Information

REAPR requires two inputs: an assembly in FASTA format, and paired reads mapped to that assembly in a BAM file. It analyses mapping information, such as the fragment coverage and the insert size distribution, to locate mis-assemblies. REAPR works best with mapped read pairs from a large-insert library (insert size of at least 1000 bp). If a short-insert Illumina library is also available, REAPR can combine it with the large-insert library to score each base of the assembly.
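
As a rough way to check whether a mapped library meets the 1000 bp guideline, the sketch below averages the template length (column 9, TLEN, of SAM records) over records where it is positive, so each mapped pair is counted once. This is only an illustration, not REAPR's own insert size calculation; the function name mean_insert_size and the file name mapped.bam are made up for the example.

```shell
# Hypothetical helper (not part of REAPR): read SAM records on standard input
# and print the mean template length, truncated to a whole number.
# Only records with TLEN > 0 are used, which counts each mapped pair once.
mean_insert_size() {
    awk -F '\t' '
        /^@/   { next }                  # skip SAM header lines
        $9 > 0 { sum += $9; n++ }        # leftmost read of each pair
        END    { if (n > 0) printf "%d\n", sum / n; else print "NA" }
    '
}

# Example use, assuming samtools is installed and mapped.bam exists:
#   samtools view mapped.bam | mean_insert_size
```

A mean is sensitive to a few wildly mis-mapped pairs, so treat the number as a sanity check rather than a description of the full distribution.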

Please read the manual for instructions on installing and running REAPR.

Download

Latest Linux version

Please see inside the tarball for installation instructions (in the README file) and the manual (manual.pdf).

Note: to prepare input for REAPR, it is recommended that reads are mapped with version 0.7.0.1 of SMALT, without the -f bam option (use -f samsoft and convert the output to BAM afterwards). Later versions of SMALT have not been tested with REAPR. Alternatively, the latest version of REAPR can run the mapping for you.
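
The mapping recommended in the note above might look like the following sketch. It prints the commands instead of running them, so they can be reviewed first. The file names, the index prefix and the -k/-s index settings are illustrative assumptions rather than values taken from this page, and print_mapping_cmds is a made-up helper name. The REAPR manual may require further BAM preparation (such as sorting and indexing), so check it before using the output.

```shell
# Hypothetical helper: print (do not run) the commands for mapping read pairs
# with SMALT in "samsoft" format and then converting the result to BAM.
print_mapping_cmds() {
    assembly=$1   # assembly in FASTA format
    reads_1=$2    # first reads of each pair (FASTQ)
    reads_2=$3    # second reads of each pair (FASTQ)
    prefix=$4     # prefix for the SMALT index and the output files

    # Build the SMALT index (k-mer length and step size are assumptions).
    echo "smalt index -k 13 -s 2 $prefix $assembly"
    # Map with -f samsoft rather than -f bam, as recommended.
    echo "smalt map -f samsoft -o $prefix.sam $prefix $reads_1 $reads_2"
    # Import the SAM output to BAM afterwards.
    echo "samtools view -b -S -o $prefix.bam $prefix.sam"
}

print_mapping_cmds assembly.fa reads_1.fastq reads_2.fastq mapped
```

Once the printed commands look right, they can be run by piping the output to sh.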

Latest Mac/Windows version (virtual machine)

REAPR is also available as a virtual machine for Mac OS X and Windows users.

For installation of the virtual machine, please follow the instructions on the download page of PAGIT. Once the virtual machine is running, you can run the test by typing these commands in a terminal window:

cd ~/bin/REAPR/Reapr_test/
./test.sh

The REAPR pipeline should run and produce files for viewing in Artemis, and Artemis should then open. The virtual machine is set up so that REAPR is already in your path.

Important: the root password is wt. The password for the user pagit is pagitvm.

Previous versions

Previous versions of REAPR are available on the FTP site.

Contact

Please send any enquiries to Martin Hunt.
