bam2cram-check

A tool for comparing a BAM file with a CRAM file after converting from one format to the other.

bam2cram-check compares the contents of a BAM file with the contents of a CRAM file after one has been converted to the other. It checks that the data was unaffected by the format change by comparing the stats generated for each file. It also checks that the file isn't truncated (using samtools quickcheck) and outputs a list of differences if the files don't contain the same sequence data. As a side effect, the tool creates a .stats and a .flagstat file in the directory where the file is located, or reuses the existing ones if present.
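The kind of check described above can be sketched in a few lines of Python. This is only an illustration and not the code in main.py: the function names are hypothetical, and it compares samtools flagstat output alone rather than the full stats. It assumes samtools is on the PATH.

```python
import subprocess


def quickcheck(path):
    """Return True if `samtools quickcheck` accepts the file (exit status 0)."""
    result = subprocess.run(["samtools", "quickcheck", path])
    return result.returncode == 0


def flagstat(path):
    """Return the text printed by `samtools flagstat` for the file."""
    result = subprocess.run(
        ["samtools", "flagstat", path],
        stdout=subprocess.PIPE,
        universal_newlines=True,  # text mode, available on Python 3.5
        check=True,
    )
    return result.stdout


def diff_reports(bam_report, cram_report):
    """Return (bam_line, cram_line) pairs for lines that differ between two reports."""
    bam_lines = bam_report.strip().splitlines()
    cram_lines = cram_report.strip().splitlines()
    # Pad the shorter report so that missing lines also show up as differences.
    length = max(len(bam_lines), len(cram_lines))
    bam_lines += [""] * (length - len(bam_lines))
    cram_lines += [""] * (length - len(cram_lines))
    return [(b, c) for b, c in zip(bam_lines, cram_lines) if b != c]


def check_pair(bam_path, cram_path):
    """Hypothetical end-to-end check: a list of problems, empty if none were found."""
    problems = []
    for path in (bam_path, cram_path):
        if not quickcheck(path):
            problems.append("{} failed samtools quickcheck".format(path))
    if not problems:
        for b, c in diff_reports(flagstat(bam_path), flagstat(cram_path)):
            problems.append("flagstat differs: {!r} vs {!r}".format(b, c))
    return problems
```

Keeping diff_reports free of any samtools call means the comparison logic can be exercised on saved report text without the original files.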

To run it you need:

  • python >= 3.5
  • samtools >= 1.3
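If it helps, the two requirements can be verified up front with a short script. The sketch below is not part of bam2cram-check; the function names are hypothetical, and it assumes the first line of the samtools --version output has the form "samtools 1.3.1".

```python
import re
import subprocess
import sys


def parse_samtools_version(version_output):
    """Extract (major, minor) from the first line of `samtools --version` output."""
    match = re.match(r"samtools (\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("unrecognised samtools version output")
    return int(match.group(1)), int(match.group(2))


def check_requirements():
    """Return a list of unmet requirements, empty if everything looks fine."""
    problems = []
    if sys.version_info < (3, 5):
        problems.append("python >= 3.5 is required")
    try:
        output = subprocess.run(
            ["samtools", "--version"],
            stdout=subprocess.PIPE,
            universal_newlines=True,
            check=True,
        ).stdout
        if parse_samtools_version(output) < (1, 3):
            problems.append("samtools >= 1.3 is required")
    except (OSError, subprocess.CalledProcessError, ValueError):
        # samtools is missing, failed to run, or printed something unexpected.
        problems.append("could not determine the samtools version")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```

Versions are compared as integer tuples so that, for example, 1.10 is correctly treated as newer than 1.3.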

Usage:

python main.py -b <bam_file> -c <cram_file> -e <err_file> --log <log_file>

Alternatively, there is a shell script for checking a full directory of BAMs and CRAMs, which submits one LSF job for each pair of converted files:

./run_batch.sh <bam_dir> <cram_dir> <log_dir> <output_dir> <issues_dir>

where each BAM-CRAM conversion to be checked will have its own file in:

  • log_dir - a log of all the commands run and their results
  • output_dir - what the commands send to stdout
  • issues_dir - what the commands send to stderr

There is no need to create these directories beforehand, as the shell script creates them if they do not already exist.
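To illustrate the batch workflow, here is a rough Python sketch of how BAM and CRAM files might be paired and turned into one LSF submission each. It does not reproduce run_batch.sh. The pairing rule (same file name apart from the extension), the per-pair file names and suffixes, and the function names are all assumptions made for the example. Only the main.py options shown under Usage and the standard bsub -o and -e options are used.

```python
import os


def pair_files(bam_names, cram_names):
    """Pair file names that share a stem, e.g. sample1.bam with sample1.cram."""
    crams = {os.path.splitext(n)[0]: n for n in cram_names if n.endswith(".cram")}
    pairs = []
    for name in sorted(bam_names):
        stem, ext = os.path.splitext(name)
        if ext == ".bam" and stem in crams:
            pairs.append((stem, name, crams[stem]))
    return pairs


def build_bsub_command(stem, bam, cram, bam_dir, cram_dir,
                       log_dir, output_dir, issues_dir):
    """Build the argument list for one LSF job (file suffixes are assumptions)."""
    return [
        "bsub",
        "-o", os.path.join(output_dir, stem + ".out"),     # job stdout
        "-e", os.path.join(issues_dir, stem + ".stderr"),  # job stderr
        "python", "main.py",
        "-b", os.path.join(bam_dir, bam),
        "-c", os.path.join(cram_dir, cram),
        "-e", os.path.join(issues_dir, stem + ".err"),
        "--log", os.path.join(log_dir, stem + ".log"),
    ]


def plan_batch(bam_dir, cram_dir, log_dir, output_dir, issues_dir):
    """Create the output directories if needed and return one command per pair."""
    for directory in (log_dir, output_dir, issues_dir):
        os.makedirs(directory, exist_ok=True)
    pairs = pair_files(os.listdir(bam_dir), os.listdir(cram_dir))
    return [
        build_bsub_command(stem, bam, cram, bam_dir, cram_dir,
                           log_dir, output_dir, issues_dir)
        for stem, bam, cram in pairs
    ]
```

Each returned list could be passed to subprocess.run to submit the job. Building the commands separately from running them makes it possible to print and inspect them first.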

Downloads

bam2cram-check is currently available for download from its GitHub repository.

Further information

bam2cram-check is licensed under the GNU Affero General Public License, version 3 or later (AGPLv3+).

Contact

If you need help or have any queries, please contact us using the details below.

For questions or bug reports, please use GitHub Issues.


Sanger Institute Contributors

Previous contributors

Irina Gabriela Colgiu

Senior Software Developer