Gorilla genome - data download

Gorillas, the largest living primates, are humans' closest living relatives after chimpanzees, and are important for the study of human origins and evolution. They are found today only within several endangered populations in the equatorial forests of central Africa.

We generated a reference genome assembly for gorilla using DNA from a single individual: Kamilah, a female western lowland gorilla (Gorilla gorilla gorilla) resident at San Diego Zoo. We collected 5.4 Gbp of capillary sequence and 166.8 Gbp of Illumina read pairs, and combined both data sets in an initial hybrid de novo assembly. Improvements in long-range structure were guided by homology with human: contigs were placed into scaffolds wherever read pairs confirmed collinearity between the gorilla and human genomes. Base-pair contiguity was improved by local reassembly within each scaffold, merging or extending contigs using Illumina read pairs. Finally, we used additional capillary end-pair sequences from Kamilah bacterial artificial chromosome (BAC) and fosmid clones to provide longer-range scaffolding. Base errors were corrected by mapping all Illumina reads back to the assembly and rectifying apparent homozygous variants, that is, positions where reads from this single individual consistently disagreed with the assembled base.
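The final base-correction step can be illustrated with a minimal sketch. Because all reads come from one diploid individual, a genuine heterozygous site shows roughly half the reads supporting each allele, whereas a position where nearly all reads disagree with the assembly points to an assembly error rather than real variation. The function `correct_homozygous_errors` and its `min_depth` and `min_fraction` thresholds are hypothetical illustrations, not the project's actual pipeline. The sketch handles substitutions only and assumes a pre-computed pileup mapping each assembly position to the read bases aligned there.

```python
from collections import Counter


def correct_homozygous_errors(assembly: str,
                              pileup: dict[int, list[str]],
                              min_depth: int = 5,
                              min_fraction: float = 0.9) -> str:
    """Hypothetical sketch: replace assembly bases that mapped reads
    consistently contradict (an apparent homozygous variant).

    pileup maps a 0-based assembly position to the list of read bases
    aligned at that position. Thresholds are illustrative assumptions.
    """
    bases = list(assembly)
    for pos, read_bases in pileup.items():
        depth = len(read_bases)
        if depth < min_depth:
            continue  # too little coverage to call a variant confidently
        # Most frequent base among the reads covering this position.
        base, count = Counter(read_bases).most_common(1)[0]
        # A heterozygous site (~50/50 split) fails the fraction test and is
        # left alone; only near-unanimous disagreement is treated as an error.
        if base != bases[pos] and count / depth >= min_fraction:
            bases[pos] = base
    return "".join(bases)
```

For example, `correct_homozygous_errors("ACGTACGT", {2: ["T"] * 10})` returns `"ACTTACGT"`: ten reads unanimously report `T` where the assembly has `G`, so that base is replaced.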

In addition to data from Kamilah, we collected sequence data for three other gorillas, including one eastern lowland gorilla (Gorilla beringei graueri), to enable a study of diversity within the genus Gorilla. We also generated gorilla RNA-seq and ChIP-seq data to support studies of transcriptomic and regulatory evolution in the great apes.

The assembly, analysis and other results of the Gorilla Genome Project are described in the publication below.

Accession numbers for all primary sequencing data are given in that paper. The assembly itself, together with annotation of genes and transcripts and predictions of gene orthologues and paralogues, is available at Ensembl. The RNA-seq data are available from the European Nucleotide Archive under accession ERP002094. More information about the results of the project is also available here.
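One way to locate the RNA-seq files is the European Nucleotide Archive (ENA) Portal API `filereport` endpoint, which lists the sequencing runs under an accession together with their FASTQ locations as tab-separated text. The sketch below assumes that this endpoint resolves the study accession ERP002094 and that the runs have a populated `fastq_ftp` field; the helper names `filereport_url`, `parse_fastq_urls` and `fetch_fastq_urls` are hypothetical. It uses only the Python standard library.

```python
import csv
import io
import urllib.parse
import urllib.request

# ENA Portal API endpoint that reports the files belonging to an accession.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession: str,
                   fields: tuple[str, ...] = ("run_accession", "fastq_ftp")) -> str:
    """Build a filereport query listing sequencing runs and their FASTQ paths."""
    query = urllib.parse.urlencode({
        "accession": accession,
        "result": "read_run",       # one row per sequencing run
        "fields": ",".join(fields),
    })
    return f"{ENA_FILEREPORT}?{query}"


def parse_fastq_urls(tsv_text: str) -> dict[str, list[str]]:
    """Map each run accession to its FASTQ download URLs.

    ENA returns FASTQ paths without a scheme and separates paired-end
    files with ';', so each path is split out and prefixed with ftp://.
    """
    runs: dict[str, list[str]] = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        paths = (row.get("fastq_ftp") or "").split(";")
        runs[row["run_accession"]] = ["ftp://" + p for p in paths if p]
    return runs


def fetch_fastq_urls(accession: str) -> dict[str, list[str]]:
    """Query ENA over the network and return the run-to-URL mapping."""
    with urllib.request.urlopen(filereport_url(accession)) as resp:
        return parse_fastq_urls(resp.read().decode("utf-8"))
```

Calling `fetch_fastq_urls("ERP002094")` would return a dictionary keyed by run accession; the listed URLs can then be downloaded with any FTP-capable tool. Parsing is kept separate from the network call so it can be checked against a saved report.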

  • Insights into hominid evolution from the gorilla genome sequence.

    Scally A, Dutheil JY, Hillier LW, Jordan GE, Goodhead I, Herrero J, Hobolth A, Lappalainen T, Mailund T, Marques-Bonet T, McCarthy S, Montgomery SH, Schwalie PC, Tang YA, Ward MC, Xue Y, Yngvadottir B, Alkan C, Andersen LN, Ayub Q, Ball EV, Beal K, Bradley BJ, Chen Y, Clee CM, Fitzgerald S, Graves TA, Gu Y, Heath P, Heger A, Karakoc E, Kolb-Kokocinski A, Laird GK, Lunter G, Meader S, Mort M, Mullikin JC, Munch K, O'Connor TD, Phillips AD, Prado-Martinez J, Rogers AS, Sajjadian S, Schmidt D, Shaw K, Simpson JT, Stenson PD, Turner DJ, Vigilant L, Vilella AJ, Whitener W, Zhu B, Cooper DN, de Jong P, Dermitzakis ET, Eichler EE, Flicek P, Goldman N, Mundy NI, Ning Z, Odom DT, Ponting CP, Quail MA, Ryder OA, Searle SM, Warren WC, Wilson RK, Schierup MH, Rogers J, Tyler-Smith C and Durbin R

    Nature 2012;483(7388):169-75