We generated a reference genome assembly for gorilla using DNA sampled from a single individual, a female western lowland gorilla (Gorilla gorilla gorilla) named Kamilah, resident at San Diego Zoo. We collected 5.4 Gbp of capillary sequence and 166.8 Gbp of Illumina read pairs, and combined both data sets in an initial hybrid de novo assembly. Long-range structure was then improved using human homology as a guide: contigs were placed into scaffolds wherever read pairs confirmed collinearity between gorilla and human. Base-pair contiguity was improved by local reassembly within each scaffold, merging or extending contigs using Illumina read pairs. Finally, we used additional bacterial artificial chromosome (BAC) and fosmid end-pair capillary sequences from Kamilah to provide longer-range scaffolding. Base errors were corrected by mapping all Illumina reads back to the assembly and rectifying apparent homozygous variants.
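The final base-correction step can be illustrated with a small sketch. This is not the project's actual pipeline. It assumes variant calls have already been made from reads mapped back to the assembly, and that a site counts as an "apparent homozygous variant" when nearly all reads support the alternative base. The `Variant` record, the `correct_homozygous` function and the 0.9 threshold are hypothetical choices made for illustration only.

```python
from typing import Dict, Iterable, NamedTuple


class Variant(NamedTuple):
    """A hypothetical single-base variant call against the assembly."""
    contig: str
    pos: int             # 0-based position within the contig
    ref: str             # base currently in the assembly
    alt: str             # base supported by the mapped reads
    alt_fraction: float  # fraction of reads at this site supporting alt


def correct_homozygous(
    contigs: Dict[str, str],
    variants: Iterable[Variant],
    min_alt_fraction: float = 0.9,  # assumed cut-off, not from the source
) -> Dict[str, str]:
    """Return a copy of the contigs with apparent homozygous
    single-base variants replaced by the read-supported base."""
    corrected = {name: list(seq) for name, seq in contigs.items()}
    for v in variants:
        # Sites with mixed read support look heterozygous: leave them alone.
        if v.alt_fraction < min_alt_fraction:
            continue
        # This sketch handles substitutions only, not insertions or deletions.
        if len(v.ref) != 1 or len(v.alt) != 1:
            continue
        seq = corrected[v.contig]
        # Skip calls that do not match the assembly base at that position.
        if seq[v.pos] != v.ref:
            continue
        seq[v.pos] = v.alt
    return {name: "".join(seq) for name, seq in corrected.items()}


example_contigs = {"contig1": "ACGTACGT"}
example_variants = [
    Variant("contig1", 2, "G", "T", 1.0),  # all reads disagree: corrected
    Variant("contig1", 5, "C", "A", 0.5),  # mixed support: kept as is
]
example_corrected = correct_homozygous(example_contigs, example_variants)
```

The reasoning behind this step is that the reads and the assembly come from the same individual, so a site where essentially every read disagrees with the assembly is more likely an assembly error than a true variant, whereas a site with mixed support is more likely heterozygous and is left unchanged.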
In addition to data from Kamilah, we collected sequence data for three other gorillas, including one eastern lowland gorilla, to enable a study of diversity within the Gorilla genus. We also generated gorilla RNA-seq and ChIP-seq data to support studies of great ape transcriptomic and regulatory evolution.
The assembly, analysis and other results of the Gorilla Genome Project are described in the publication below.
Accession numbers for all primary sequencing data are presented there. The assembly itself and its annotation (genes, transcripts, and predicted gene orthologues and paralogues) are available at Ensembl. The RNA-seq data are available from the European Nucleotide Archive under accession ERP002094. More information about the results of the project is also available here.
This sequencing centre plans on publishing the completed and annotated sequences in a peer-reviewed journal as soon as possible. Permission of the principal investigator should be obtained before publishing analyses of the sequence/open reading frames/genes on a chromosome or genome scale. See our data sharing policy.