MetaHIT Consortium (Metagenomics of the Human Intestinal Tract consortium)
The MetaHIT project aims to understand the role of the human intestinal microbiota in health and disease.
The consortium involves 13 research centres from eight countries. The project is funded with €11.4 million from the European Commission and runs for four years from 1 January 2008.
Background
The Sanger Institute contributes to MetaHIT by producing draft nucleotide sequence for the genomes of 100 bacterial strains commonly found in the human intestinal tract. This will provide a reference set of genomes for future studies. The first 30 strains sequenced will be cultured bacteria, while the remaining 70 will be bacteria that cannot yet be grown in the laboratory. For these 70 uncultured bacteria, individual bacterial cells will be isolated and the whole of the single copy of the genome amplified to generate sufficient DNA for nucleotide sequencing.
This work is being carried out with the help of our collaborators at the Rowett Institute of Nutrition and Health in Aberdeen, and Dr Annick Bernalier (Microbiology Unit, INRA Clermont-Ferrand).
Paired-end sequencing generates sequence reads from both ends of a DNA fragment. Sequence reads from many DNA fragments are then assembled into contiguous sequences (contigs). If the two reads from one DNA fragment occur in different contigs, it is likely that the two contigs represent regions that are adjacent in the genome. In this way, contigs may be linked together into scaffolds. Within a scaffold, the sequences between the contigs are unknown, but their lengths can be estimated from the length of the DNA fragments being sequenced.
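The gap estimate described above can be illustrated with a minimal Python sketch. It is not the assembler's actual method: the function name, the coordinate convention and the use of a single mean fragment length are all assumptions made for illustration. Each linking pair is assumed to have one read starting in the upstream contig and its mate ending in the downstream contig.

```python
from statistics import mean


def estimate_gap(fragment_len, contig_a_len, pairs):
    """Estimate the gap between two linked contigs (illustrative only).

    fragment_len  -- assumed mean length of the sequenced DNA fragments
    contig_a_len  -- length of the upstream contig
    pairs         -- list of (start_in_a, end_in_b) tuples, where start_in_a
                     is the 0-based position in contig A at which the
                     fragment begins and end_in_b is the number of bases of
                     contig B that the fragment covers
    """
    if not pairs:
        raise ValueError("at least one linking read pair is required")
    # Each fragment covers the tail of contig A, the gap, and the head of
    # contig B, so: gap = fragment_len - tail_of_A - head_of_B.
    estimates = [
        fragment_len - (contig_a_len - start_a) - end_b
        for start_a, end_b in pairs
    ]
    # Average over all linking pairs and round to whole bases.
    return round(mean(estimates))
```

For example, with 3,000 bp fragments, a 10,000 bp upstream contig and two linking pairs at `(9000, 500)` and `(9200, 900)`, the individual estimates are 1,500 and 1,300 bases, giving a mean of 1,400. A negative result would suggest that the two contigs overlap rather than being separated by a gap.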
The scaffold information for each genome is given in two files produced by the GS De Novo Assembler. The first is a FASTA file of the concatenated contig sequences that were scaffolded by paired-end analysis. The contigs are separated by runs of Ns, with the number of Ns corresponding to the estimated gap size (a minimum of 20 Ns marks each gap). The scaffold information is also presented as a text file in the NCBI AGP format.
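For readers who want to recover the individual contigs from a scaffold FASTA file, the following Python sketch splits a scaffold sequence at runs of at least 20 Ns. The function name and the output layout are hypothetical, and the sketch assumes that any shorter run of Ns is an ambiguous stretch within a contig rather than a gap.

```python
import re

MIN_GAP = 20  # minimum run of Ns that marks a gap, as stated above


def split_scaffold(seq, min_gap=MIN_GAP):
    """Split a scaffold sequence into contig and gap parts.

    Returns a list of (kind, start, end) tuples with 0-based, half-open
    coordinates, where kind is "contig" or "gap".
    """
    parts = []
    pos = 0
    # Find every run of at least min_gap Ns (upper or lower case).
    for match in re.finditer(r"[Nn]{%d,}" % min_gap, seq):
        if match.start() > pos:
            parts.append(("contig", pos, match.start()))
        parts.append(("gap", match.start(), match.end()))
        pos = match.end()
    # Anything after the last gap is the final contig.
    if pos < len(seq):
        parts.append(("contig", pos, len(seq)))
    return parts
```

The length of each gap part (`end - start`) is the estimated gap size recorded in the file; it is an estimate from the paired-end data, not a measured distance.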
Draft genome sequences for cultured strains
Please note that these draft sequences are unchecked and unedited, and will contain errors. The 454 platform is known to have problems with homopolymeric tracts, so these regions are likely to contain a significant number of errors.
Note: The improved assemblies listed in the table below were produced using both 454 and Illumina reads. Each genome was assembled initially with SOAP. Newbler was then used to create a combined assembly. Contigs were joined to scaffolds created by Newbler based on overlaps and read pair information. IMAGE was then run on each genome. IMAGE closes positive gaps using Illumina sequence that has not been assembled. It also finds negative gaps so that these can be closed manually. The sequence was then corrected using ICORN: all indels and SNPs were checked, and the changes suggested by ICORN were made where appropriate. Finally, all repeats over 100 bp within the genome were checked to ensure that they were confirmed by at least two spanning read pairs. Any obvious misassemblies were addressed, and repeats that were not confirmed by spanning read pairs were broken apart.
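The final repeat check in the note above can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline that was actually run: the function names are hypothetical, repeats and read pairs are assumed to be given as 0-based, half-open intervals on the assembly, and a read pair is assumed to "span" a repeat when its fragment covers the whole repeat.

```python
def count_spanning_pairs(repeat, pairs):
    """Count read pairs whose fragment covers the whole repeat interval."""
    repeat_start, repeat_end = repeat
    return sum(
        1
        for frag_start, frag_end in pairs
        if frag_start <= repeat_start and frag_end >= repeat_end
    )


def repeats_to_break(repeats, pairs, min_len=100, min_pairs=2):
    """Return repeats over min_len bp with fewer than min_pairs spanning pairs.

    The defaults follow the thresholds in the note: repeats over 100 bp
    need at least two spanning read pairs to be considered confirmed.
    """
    unconfirmed = []
    for repeat in repeats:
        start, end = repeat
        if end - start <= min_len:
            continue  # repeats of min_len bp or less are not checked
        if count_spanning_pairs(repeat, pairs) < min_pairs:
            unconfirmed.append(repeat)
    return unconfirmed
```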
Strain | Fold coverage | de novo sequence (fasta/qual) | Scaffolds (fasta/text) | Improved assembly | EMBL accession | Last update |
---|---|---|---|---|---|---|
Alistipes shahii WAL 8301 | 22x | seq qual | seq text | seq | FP929032 | 26/08/2011 |
Bacteroides xylanisolvens XB1A | 18x | seq qual | seq text | seq | FP929033 | 26/08/2011 |
Bifidobacterium longum subsp. longum F8 | 45x | seq qual | seq text | seq | FP929034 | 26/08/2011 |
Bifidobacterium pseudocatenulatum D2CA | 17x | seq qual | seq text | seq | | 26/08/2011 |
Brachyspira aalborgii 513 | 21x | seq qual | seq text | seq | | 26/08/2011 |
Brachyspira pilosicoli WesB | 34x | seq qual | seq text | seq | | 26/08/2011 |
Butyrivibrio fibrisolvens 16/4 | 63x | seq qual | seq text | seq | FP929036 | 26/08/2011 |
Clostridiales sp. SM4/1 | 14x | seq qual | seq text | seq | FP929060 | 09/11/2011 |
Clostridiales sp. SSC/2 | 29x | seq qual | seq text | seq | FP929061 | 09/11/2011 |
Clostridiales sp. SS3/4 | 16x | seq qual | seq text | seq | FP929062 | 26/08/2011 |
Clostridium saccharolyticum-like K10 | 20x | seq qual | seq text | seq | FP929037 | 26/08/2011 |
Coprococcus catus GD/7 | 21x | seq qual | seq text | seq | FP929038 | 26/08/2011 |
Coprococcus comes SL7/1 | 14x | seq qual | seq | | | 26/08/2011 |
Coprococcus sp. ART55/1 | 17x | seq qual | seq text | seq | FP929039 | 26/08/2011 |
Enterobacter cloacae subsp. cloacae NCTC 9394 | 15x | seq qual | seq text | seq | FP929040 | 26/08/2011 |
Enterococcus sp. 7L76 | 16x | seq qual | seq text | seq | FP929058 | 26/08/2011 |
Eubacterium cylindroides T2-87 | 19x | seq qual | seq text | seq | FP929041 | 26/08/2011 |
Eubacterium rectale DSM 17629 | 25x | seq qual | seq text | seq | FP929042 | 26/08/2011 |
Eubacterium rectale M104/1 | 21x | seq qual | seq text | seq | FP929043 | 09/11/2011 |
Eubacterium siraeum 70/3 | 25x | seq qual | seq text | seq | FP929044 | 26/08/2011 |
Eubacterium siraeum V10Sc8a | 26x | seq qual | seq text | seq | FP929059 | 26/08/2011 |
Faecalibacterium prausnitzii L2-6 | 29x | seq qual | seq text | seq | FP929045 | 26/08/2011 |
Faecalibacterium prausnitzii SL3/3 | 20x | seq qual | seq text | seq | FP929046 | 09/11/2011 |
Gordonibacter pamelaeae 7-10-1-b | 20x | seq qual | seq text | seq | FP929047 | 26/08/2011 |
Megamonas hypermegale ART12/1 | 27x | seq qual | seq text | seq | FP929048 | 26/08/2011 |
Roseburia faecis CC123 | | | | | | |
Roseburia faecis 11SE37 | | | | | | |
Roseburia intestinalis M50/1 | 25x | seq qual | seq text | seq | FP929049 | 26/08/2011 |
Roseburia intestinalis XB6B4 | 34x | seq qual | seq text | seq | FP929050 | 26/08/2011 |
Ruminococcus bromii L2-63 | 26x | seq qual | seq text | seq | FP929051 | 26/08/2011 |
Ruminococcus sp. 18P13 | 26x | seq qual | seq text | seq | FP929052 | 26/08/2011 |
Ruminococcus sp. SR1/5 | 32x | seq qual | seq text | seq | FP929053 | 26/08/2011 |
Ruminococcus obeum A2-162 | 22x | seq qual | seq text | seq | FP929054 | 09/11/2011 |
Ruminococcus torques L2-14 | 27x | seq qual | seq text | seq | FP929055 | 26/08/2011 |
Draft genome sequences for uncultured strains
Note that the following draft genomes were derived from whole-genome-amplified DNA. In addition to the potential sequencing errors mentioned above for the cultured strains, these draft genome sequences may therefore also contain errors due to rearrangements that occurred during the genome amplification process.
Note: The improved assemblies listed in the table below were produced using both 454 and Illumina reads. Each genome was assembled initially with SOAP. Newbler was then used to create a combined assembly. Contigs were joined to scaffolds created by Newbler based on overlaps and read pair information. IMAGE was then run on each genome. IMAGE closes positive gaps using Illumina sequence that has not been assembled. It also finds negative gaps so that these can be closed manually. The sequence was then corrected using ICORN: all indels and SNPs were checked, and the changes suggested by ICORN were made where appropriate. Finally, all repeats over 100 bp within the genome were checked to ensure that they were confirmed by at least two spanning read pairs. Any obvious misassemblies were addressed, and repeats that were not confirmed by spanning read pairs were broken apart.
Strain | Fold coverage | de novo sequence (fasta/qual) | Scaffolds (fasta/text) | Improved assembly | EMBL accession | Last update |
---|---|---|---|---|---|---|
Bacteroides dorei D8 | 66x | seq qual | seq text | seq | | 26/08/2011 |
Eubacterium hallii SM6/1 | 22x | seq qual | seq text | seq | | 26/08/2011 |
Synergistetes sp. SGP1 | 47x | seq qual | seq text | seq | FP929056 | 26/08/2011 |
Studies
Blautia producta genome comparisons
Sample | Strain | Run Accession |
---|---|---|
TL266 | TL266 | ERR033896 |
2950 | 2950 | ERR033897 |
3507 | 3507 | ERR033898 |
Bulk data download
To download MetaHIT data in bulk, please use this ftp link.
Contact
Please address all sequencing enquiries to Dr Keith Turner.