bam2cram-check

Overview

This tool compares the contents of a BAM file with the contents of a CRAM file after one has been converted to the other. It checks that the data was unaffected by the format change by comparing the stats generated for each file. The tool also checks that neither file is truncated (using samtools quickcheck) and outputs a list of differences if the files don't contain the same sequence data. As a side effect, the tool creates a .stats and a .flagstat file in the directory where each input file is, or uses the existing ones if any.
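The comparison described above can be sketched roughly as follows. This is an illustrative sketch, not the code in main.py: the function names `is_intact`, `parse_summary_numbers` and `diff_summaries` are hypothetical, and it assumes that only the SN (summary numbers) lines of `samtools stats` output are compared, which may not match what the tool actually does.

```python
import subprocess


def is_intact(path):
    """Return True if `samtools quickcheck` exits with status 0 for this file."""
    return subprocess.run(["samtools", "quickcheck", path]).returncode == 0


def parse_summary_numbers(stats_text):
    """Collect the SN (summary numbers) lines of `samtools stats` output into a dict."""
    summary = {}
    for line in stats_text.splitlines():
        fields = line.split("\t")
        # SN lines look like: SN<TAB>key:<TAB>value[<TAB># comment]
        if len(fields) >= 3 and fields[0] == "SN":
            summary[fields[1].rstrip(":")] = fields[2]
    return summary


def diff_summaries(bam_summary, cram_summary):
    """Map each differing key to its (BAM value, CRAM value); None marks a missing key."""
    keys = sorted(set(bam_summary) | set(cram_summary))
    return {
        key: (bam_summary.get(key), cram_summary.get(key))
        for key in keys
        if bam_summary.get(key) != cram_summary.get(key)
    }
```

An empty dictionary from `diff_summaries` would mean the two summaries agree on every key; any entry names a statistic that changed during conversion.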

Requirements:

Python >= 3.5

samtools >= 1.3

Usage:

python main.py -b <bam_file> -c <cram_file> -e <err_file> --log <log_file>

Alternatively, there is a shell script for checking a full directory of BAMs and CRAMs, which submits one LSF job for each converted pair of files:

./run_batch.sh <bam_dir> <cram_dir> <log_dir> <output_dir> <issues_dir>

where each BAM-CRAM conversion to be checked will have its own file in:

  • log_dir - a log of all the commands run and their results
  • output_dir - what the commands send to stdout
  • issues_dir - what the commands send to stderr

There is no need to create these directories beforehand, as the shell script creates them if they don't already exist.
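As a rough illustration of the per-pair bookkeeping involved, the sketch below pairs files and builds the documented main.py command for each pair. It rests on assumptions the text above does not confirm: that a BAM and its CRAM share a base name, that the error file goes in issues_dir, and that the per-pair files are named `<base>.err` and `<base>.log`. The helpers `pair_conversions` and `check_command` are hypothetical and are not part of run_batch.sh; LSF submission is left out.

```python
from pathlib import Path


def pair_conversions(bam_dir, cram_dir):
    """Pair each BAM with the CRAM sharing its base name; also list BAMs with no CRAM."""
    crams = {p.stem: p for p in Path(cram_dir).glob("*.cram")}
    pairs, unmatched = [], []
    for bam in sorted(Path(bam_dir).glob("*.bam")):
        cram = crams.get(bam.stem)
        if cram is None:
            unmatched.append(bam)
        else:
            pairs.append((bam, cram))
    return pairs, unmatched


def check_command(bam, cram, log_dir, issues_dir):
    """Build the main.py invocation for one pair (output file naming is an assumption)."""
    name = Path(bam).stem
    return [
        "python", "main.py",
        "-b", str(bam),
        "-c", str(cram),
        "-e", str(Path(issues_dir) / (name + ".err")),
        "--log", str(Path(log_dir) / (name + ".log")),
    ]
```

Listing the unmatched BAMs separately makes a missing conversion visible instead of silently skipping it.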

Download and Installation

bam2cram-check is currently available for download from its GitHub repository.

License and Citation

bam2cram-check is licensed under the GNU Affero General Public License, version 3 or later (AGPLv3+).

Contact

For questions or bug reports, please use GitHub Issues.

Authors

Sanger Contributors