MitoHiFi

A Python pipeline for mitochondrial genome assembly from PacBio high-fidelity (HiFi) reads, developed within the Darwin Tree of Life Project.

MitoHiFi can assemble mitochondrial genomes from PacBio HiFi data across a wide phylogenetic range of taxa.

The full paper introducing MitoHiFi was published in BMC Bioinformatics.

At the time of publication, MitoHiFi had been used to assemble 374 mitochondrial genomes (from 368 metazoan and 6 fungal species) for the Darwin Tree of Life Project, the Vertebrate Genomes Project and the Aquatic Symbiosis Genome Project. In addition, inspection of 60 mitochondrial genomes assembled with MitoHiFi, for species that already had reference sequences in public databases, revealed widespread, previously unreported repeats.

Background

PacBio high-fidelity (HiFi) sequencing reads are both long (15–20 kb) and highly accurate (Q20 or higher, i.e. at least 99% base-call accuracy). These properties have revolutionised genome assembly, producing more accurate and contiguous genomes. In eukaryotes, the mitochondrial genome is sequenced alongside the nuclear genome, often at very high coverage. However, until recently there was no dedicated tool for assembling mitochondrial genomes from HiFi reads.
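HiFi read accuracy is quoted on the Phred scale, where Q = -10 log10(P_error). Q20 therefore means a 1-in-100 chance that a base is called incorrectly, or 99% accuracy. The short Python sketch below converts between the two scales; the function names are our own and are not part of MitoHiFi.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Probability that a base call is wrong, from Q = -10 * log10(P_error)."""
    return 10 ** (-q / 10)


def phred_to_accuracy(q: float) -> float:
    """Expected per-base accuracy (1 - P_error) for a Phred score Q."""
    return 1.0 - phred_to_error_probability(q)


def accuracy_to_phred(accuracy: float) -> float:
    """Inverse conversion: per-base accuracy (strictly between 0 and 1) to a Phred score."""
    if not 0.0 < accuracy < 1.0:
        raise ValueError("accuracy must be strictly between 0 and 1")
    return -10 * math.log10(1.0 - accuracy)


if __name__ == "__main__":
    for q in (10, 20, 30):
        print(f"Q{q}: {phred_to_accuracy(q):.3%} accurate")
```

Running it prints 90.000%, 99.000% and 99.900% for Q10, Q20 and Q30 respectively: each additional 10 Phred units divides the error rate by ten.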

About MitoHiFi

MitoHiFi was developed within the Darwin Tree of Life Project – an affiliated project of the Earth BioGenome Project – to assemble mitochondrial genomes from the HiFi reads generated for target species. This project ultimately aims to sequence all eukaryotic species on the archipelago of Britain and Ireland.

The input for MitoHiFi is either raw HiFi reads or already assembled contigs, and the tool outputs the mitochondrial genome sequence as a FASTA file along with an annotation of its protein-coding and RNA genes. Variants arising from heteroplasmy (the presence of more than one mitochondrial genome variant within a cell or individual) are assembled independently using different tools. Nuclear insertions of mitochondrial sequences (NUMTs) are identified and excluded from the organellar genome assembly.
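One way to picture how nuclear insertions can be kept out of the assembly is sketched below. A nuclear contig that merely carries a NUMT aligns to a reference mitogenome over only a small fraction of its length, whereas a genuine mitochondrial contig is mostly aligned and roughly reference-sized. This is an illustrative heuristic, not MitoHiFi's actual code: the `ContigHit` structure, the function names and both thresholds are hypothetical, and the alignment summaries are assumed to come from an aligner such as BLAST run beforehand.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ContigHit:
    """Hypothetical summary of one contig's alignment to a reference mitogenome."""
    name: str
    contig_length: int   # total contig length in bp
    aligned_length: int  # contig bases covered by alignments to the reference


def looks_mitochondrial(hit: ContigHit, reference_length: int,
                        min_aligned_fraction: float = 0.5,
                        max_length_ratio: float = 5.0) -> bool:
    """Keep contigs that are mostly reference-like and not far longer than the reference.

    Both thresholds are illustrative placeholders, not MitoHiFi defaults.
    """
    if hit.contig_length <= 0 or reference_length <= 0:
        return False
    aligned_fraction = hit.aligned_length / hit.contig_length
    length_ratio = hit.contig_length / reference_length
    return aligned_fraction >= min_aligned_fraction and length_ratio <= max_length_ratio


def filter_candidates(hits: Iterable[ContigHit], reference_length: int) -> List[str]:
    """Return the names of contigs that pass the hypothetical filter."""
    return [h.name for h in hits if looks_mitochondrial(h, reference_length)]


if __name__ == "__main__":
    reference_length = 16_500  # illustrative mitogenome length in bp
    hits = [
        ContigHit("contig_A", contig_length=16_800, aligned_length=16_400),
        # A 2.5 Mb nuclear contig whose only mitochondrial-like part is a 6 kb NUMT.
        ContigHit("contig_B", contig_length=2_500_000, aligned_length=6_000),
    ]
    print(filter_candidates(hits, reference_length))  # ['contig_A']
```

The two criteria are complementary: the aligned fraction rejects nuclear contigs carrying a NUMT, while the length ratio rejects very long contigs even when most of their sequence resembles the reference.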

MitoHiFi is written in Python and is freely available on GitHub. It is also distributed, together with its dependencies, as a Docker container image on the GitHub Container Registry (ghcr.io/marcelauliano/mitohifi:master).
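Because the container bundles MitoHiFi with its dependencies, a run can be scripted without a local installation. The Python helper below is a sketch that only assembles a `docker run` command line. It assumes that `mitohifi.py` is on the container's PATH and accepts `-r` (reads) or `-c` (contigs), `-f` and `-g` (FASTA and GenBank files of a related reference mitogenome), `-t` (threads) and `-o` (genetic code), as described in the project README at the time of writing. The helper name and the /data mount point are our own choices, and `mitohifi.py --help` inside the container is the authoritative source for the current options.

```python
import shlex
from pathlib import Path
from typing import List, Optional

# Image tag as given in the text above.
IMAGE = "ghcr.io/marcelauliano/mitohifi:master"


def build_mitohifi_docker_cmd(workdir: str, reference_fasta: str, reference_genbank: str,
                              reads: Optional[str] = None, contigs: Optional[str] = None,
                              threads: int = 4, genetic_code: int = 2) -> List[str]:
    """Build (but do not run) a `docker run` command for MitoHiFi.

    Exactly one of `reads` (raw HiFi reads) or `contigs` (an existing assembly) must be
    given, matching the two input modes. File paths are relative to `workdir`, which is
    mounted into the container at /data. MitoHiFi flag names are assumptions (see above).
    """
    if (reads is None) == (contigs is None):
        raise ValueError("provide exactly one of reads or contigs")
    input_flag, input_path = ("-r", reads) if reads is not None else ("-c", contigs)
    return [
        "docker", "run", "--rm",
        "-v", f"{Path(workdir).resolve()}:/data", "-w", "/data",
        IMAGE,
        "mitohifi.py",
        input_flag, input_path,
        "-f", reference_fasta,
        "-g", reference_genbank,
        "-t", str(threads),
        "-o", str(genetic_code),  # NCBI translation table; 2 = vertebrate mitochondrial
    ]


if __name__ == "__main__":
    cmd = build_mitohifi_docker_cmd(".", "reference.fasta", "reference.gb",
                                    reads="hifi_reads.fasta", threads=8)
    print(shlex.join(cmd))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would execute it; keeping the arguments as a list rather than a single string avoids shell-quoting problems with unusual file names.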

Sanger Institute Contributors

Professor Mark Blaxter

Programme Lead for the Tree of Life Programme and Senior Group Leader

Dr Richard Durbin

Associate Faculty

Ksenia Krasheninnikova

Senior Bioinformatician

James Torrance

Senior Bioinformatician

Dr Marcela Uliano-Silva

Senior Bioinformatician

External Contributor

João Gabriel R. N. Ferreira

Bio Bureau Biotecnologia, Rio de Janeiro, Brazil
