Sanger Institute - Publications 2011
Number of papers published in 2011: 228
Genomics in 2011: challenges and opportunities.
Wellcome Trust Sanger Institute.
As we come to the end of 2011, Genome Biology has asked some members of our Editorial Board for their views on the state of play in genomics. What was their favorite paper of 2011? What are the challenges in their particular research area? Who has had the biggest influence on their careers? What advice would they give to young researchers embarking on a career in research?
Genome biology 2011;12;12;137
Exome sequencing identifies NBEAL2 as the causative gene for gray platelet syndrome.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK. email@example.com
Gray platelet syndrome (GPS) is a predominantly recessive platelet disorder that is characterized by mild thrombocytopenia with large platelets and a paucity of α-granules; these abnormalities cause mostly moderate but in rare cases severe bleeding. We sequenced the exomes of four unrelated individuals and identified NBEAL2 as the causative gene; it has no previously known function but is a member of a gene family that is involved in granule development. Silencing of nbeal2 in zebrafish abrogated thrombocyte formation.
Funded by: British Heart Foundation: RG/09/012/28096; Medical Research Council: MC_U105260799; Wellcome Trust: 082597, 082961, 084183
Nature genetics 2011;43;8;735-7
Dindel: accurate indel calls from short-read data.
Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1HH, United Kingdom. firstname.lastname@example.org
Small insertions and deletions (indels) are a common and functionally important type of sequence polymorphism. Most of the focus of studies of sequence variation is on single nucleotide variants (SNVs) and large structural variants. In principle, high-throughput sequencing studies should allow identification of indels just as SNVs. However, inference of indels from next-generation sequence data is challenging, and so far methods for identifying indels lag behind methods for calling SNVs in terms of sensitivity and specificity. We propose a Bayesian method to call indels from short-read sequence data in individuals and populations by realigning reads to candidate haplotypes that represent alternative sequence to the reference. The candidate haplotypes are formed by combining candidate indels and SNVs identified by the read mapper, while allowing for known sequence variants or candidates from other methods to be included. In our probabilistic realignment model we account for base-calling errors, mapping errors, and also, importantly, for increased sequencing error indel rates in long homopolymer runs. We show that our method is sensitive and achieves low false discovery rates on simulated and real data sets, although challenges remain. The algorithm is implemented in the program Dindel, which has been used in the 1000 Genomes Project call sets.
Funded by: British Heart Foundation: RG/09/012/28096; Wellcome Trust: 086084, 090532, WT089088/Z/09/Z
Genome research 2011;21;6;961-73
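The Dindel abstract above describes scoring reads against candidate haplotypes and calling indels in a Bayesian framework. The following is a minimal sketch of that general idea only, not Dindel's actual model: it assumes the per-read likelihoods under a reference and an alternative (indel) haplotype have already been computed by some realignment step, and it combines them into posterior probabilities for 0, 1 or 2 copies of the alternative haplotype in a diploid sample. The function name, the prior values and the input format are all hypothetical.

```python
import math

def genotype_posteriors(read_liks, priors=(0.998, 0.001, 0.001)):
    """Posterior over 0, 1 or 2 copies of the alternative haplotype.

    read_liks: list of (P(read | ref haplotype), P(read | alt haplotype)),
               each strictly positive (assumed to come from realignment).
    priors:    hypothetical prior probabilities for 0, 1, 2 alt copies.
    """
    log_post = []
    for alt_copies, prior in enumerate(priors):
        w_alt = alt_copies / 2.0  # chance a read is drawn from the alt haplotype
        total = math.log(prior)
        for p_ref, p_alt in read_liks:
            # Each read comes from one of the two haplotypes of the genotype.
            total += math.log((1.0 - w_alt) * p_ref + w_alt * p_alt)
        log_post.append(total)
    # Normalise in log space to avoid underflow with many reads.
    m = max(log_post)
    weights = [math.exp(x - m) for x in log_post]
    z = sum(weights)
    return [w / z for w in weights]

# Ten reads that all fit the indel haplotype far better than the reference.
posteriors = genotype_posteriors([(1e-6, 0.9)] * 10)
print(posteriors)
```

A real caller would additionally model mapping errors and the elevated indel error rate in homopolymer runs that the abstract mentions; those are omitted here for brevity.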
IDH1 and IDH2 mutations are frequent events in central chondrosarcoma and central and periosteal chondromas but not in other mesenchymal tumours.
Department of Histopathology, Royal National Orthopaedic Hospital NHS Trust, Stanmore, Middlesex HA7 4LP, UK.
Somatic mutations in isocitrate dehydrogenase 1 (IDH1) and IDH2 occur in gliomas and acute myeloid leukaemia (AML). Since patients with multiple enchondromas have occasionally been reported to have these conditions, we hypothesized that the same mutations would occur in cartilaginous neoplasms. Approximately 1200 mesenchymal tumours, including 220 cartilaginous tumours, 222 osteosarcomas and another ∼750 bone and soft tissue tumours, were screened for IDH1 R132 mutations, using Sequenom® mass spectrometry. Cartilaginous tumours and chondroblastic osteosarcomas, wild-type for IDH1 R132, were analysed for IDH2 (R172, R140) mutations. Validation was performed by capillary sequencing and restriction enzyme digestion. Heterozygous somatic IDH1/IDH2 mutations, which result in the production of a potential oncometabolite, 2-hydroxyglutarate, were only detected in central and periosteal cartilaginous tumours, and were found in at least 56% of these, ∼40% of which were represented by R132C. IDH1 R132H mutations were confirmed by immunoreactivity for this mutant allele. The ratio of IDH1:IDH2 mutation was 10.6:1. No IDH2 R140 mutations were detected. Mutations were detected in enchondromas through to conventional central and dedifferentiated chondrosarcomas, in patients with both solitary and multiple neoplasms. No germline mutations were detected. No mutations were detected in peripheral chondrosarcomas and osteochondromas. In conclusion, IDH1 and IDH2 mutations represent the first common genetic abnormalities to be identified in conventional central and periosteal cartilaginous tumours. As in gliomas and AML, the mutations appear to occur early in tumourigenesis. We speculate that a mosaic pattern of IDH-mutation-bearing cells explains the reports of diverse tumours (gliomas, AML, multiple cartilaginous neoplasms, haemangiomas) occurring in the same patient.
Funded by: Wellcome Trust: WT077012
The Journal of pathology 2011;224;3;334-43
Ollier disease and Maffucci syndrome are caused by somatic mosaic mutations of IDH1 and IDH2.
Histopathology Unit, Royal National Orthopaedic Hospital National Health Service Trust, Stanmore, UK. email@example.com
Ollier disease and Maffucci syndrome are characterized by multiple central cartilaginous tumors that are accompanied by soft tissue hemangiomas in Maffucci syndrome. We show that in 37 of 40 individuals with these syndromes, at least one tumor has a mutation in isocitrate dehydrogenase 1 (IDH1) or in IDH2, 65% of which result in a R132C substitution in the protein. In 18 of 19 individuals with more than one tumor analyzed, all tumors from a given individual shared the same IDH1 mutation affecting Arg132. In 2 of 12 subjects, a low level of mutated DNA was identified in non-neoplastic tissue. The levels of the metabolite 2HG were measured in a series of central cartilaginous and vascular tumors, including samples from syndromic and nonsyndromic subjects, and these levels correlated strongly with the presence of IDH1 mutations. The findings are compatible with a model in which IDH1 or IDH2 mutations represent early post-zygotic occurrences in individuals with these syndromes.
Funded by: Wellcome Trust: WT077012
Nature genetics 2011;43;12;1262-5
Synthetic associations are unlikely to account for many common disease genome-wide association signals.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1HH, United Kingdom. firstname.lastname@example.org
Funded by: Wellcome Trust: WT089120/Z/09/Z, WT091745/Z/10/Z
PLoS biology 2011;9;1;e1000580
Comparative whole genome sequence analysis of the carcinogenic bacterial model pathogen Helicobacter felis.
Institute of Molecular Cancer Research, University of Zürich, Switzerland.
The gram-negative bacterium Helicobacter felis naturally colonizes the gastric mucosa of dogs and cats. Due to its ability to persistently infect laboratory mice, H. felis has been used extensively to experimentally model gastric disorders induced in humans by H. pylori. We determined the 1.67 Mb genome sequence of H. felis using combined Solexa and 454 pyrosequencing, annotated the genome, and compared it with multiple previously published Helicobacter genomes. About 1,063 (63.6%) of the 1,671 genes identified in the H. felis genome have orthologues in H. pylori, its closest relative among the fully sequenced Helicobacter species. Many H. pylori virulence factors are shared by H. felis: these include the gamma-glutamyl transpeptidase GGT, the immunomodulator NapA, and the secreted enzymes collagenase and HtrA. Helicobacter felis lacks a Cag pathogenicity island and the vacuolating cytotoxin VacA but possesses a complete comB system conferring natural competence. Remarkable features of the H. felis genome include its paucity of transcriptional regulators and an extraordinary abundance of chemotaxis sensors and restriction/modification systems. Helicobacter felis possesses an episomally replicating 6.7-kb plasmid and harbors three chromosomal regions with deviating GC content. These putative horizontally acquired regions show homology and synteny with the recently isolated H. pylori plasmid pHPPC4 and homology to Campylobacter bacteriophage genes (transposases, structural, and lytic genes), respectively. In summary, the H. felis genome harbors a variety of putative mobile elements that are unique among Helicobacter species and may contribute to this pathogen's carcinogenic properties.
Funded by: Wellcome Trust: 076962, 076964
Genome biology and evolution 2011;3;302-8
Enterotypes of the human gut microbiome.
European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany.
Our knowledge of species and functional composition of the human gut microbiome is rapidly increasing, but it is still based on very few cohorts and little is known about variation across the world. By combining 22 newly sequenced faecal metagenomes of individuals from four countries with previously published data sets, here we identify three robust clusters (referred to as enterotypes hereafter) that are not nation or continent specific. We also confirmed the enterotypes in two published, larger cohorts, indicating that intestinal microbiota variation is generally stratified, not continuous. This indicates further the existence of a limited number of well-balanced host-microbial symbiotic states that might respond differently to diet and drug intake. The enterotypes are mostly driven by species composition, but abundant molecular functions are not necessarily provided by abundant species, highlighting the importance of a functional analysis to understand microbial communities. Although individual host properties such as body mass index, age, or gender cannot explain the observed enterotypes, data-driven marker genes or functional modules can be identified for each of these host properties. For example, twelve genes significantly correlate with age and three functional modules with the body mass index, hinting at a diagnostic potential of microbial markers.
Funded by: Wellcome Trust: 076964, 082372
Comprehensive comparison of three commercial human whole-exome capture platforms.
Beijing Institute of Genomics, Chinese Academy of Sciences, No.7 Beitucheng West Road, Chaoyang District, Beijing 100029, China. email@example.com
Background: Exome sequencing, which allows the global analysis of protein coding sequences in the human genome, has become an effective and affordable approach to detecting causative genetic mutations in diseases. Currently, there are several commercial human exome capture platforms; however, the relative performances of these have not been characterized sufficiently to know which is best for a particular study.
Results: We comprehensively compared three platforms: NimbleGen's Sequence Capture Array and SeqCap EZ, and Agilent's SureSelect. We assessed their performance in a variety of ways, including number of genes covered and capture efficacy. Differences that may impact on the choice of platform were that Agilent SureSelect covered approximately 1,100 more genes, while NimbleGen provided better flanking sequence capture. Although all three platforms achieved similar capture specificity of targeted regions, the NimbleGen platforms showed better uniformity of coverage and greater genotype sensitivity at 30- to 100-fold sequencing depth. All three platforms showed similar power in exome SNP calling, including medically relevant SNPs. Compared with genotyping and whole-genome sequencing data, the three platforms achieved a similar accuracy of genotype assignment and SNP detection. Importantly, all three platforms showed similar levels of reproducibility, GC bias and reference allele bias.
Conclusions: We demonstrate key differences between the three platforms, particularly advantages of solutions over array capture and the importance of a large gene target set.
Funded by: Wellcome Trust
Genome biology 2011;12;9;R95
An effective method to purify Plasmodium falciparum DNA directly from clinical blood samples for whole genome high-throughput sequencing.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom. firstname.lastname@example.org
Highly parallel sequencing technologies permit cost-effective whole genome sequencing of hundreds of Plasmodium parasites. The ability to sequence clinical Plasmodium samples, extracted directly from patient blood without a culture step, presents a unique opportunity to sample the diversity of "natural" parasite populations in high resolution clinical and epidemiological studies. A major challenge to sequencing clinical Plasmodium samples is the abundance of human DNA, which may substantially reduce the yield of Plasmodium sequence. We tested a range of human white blood cell (WBC) depletion methods on P. falciparum-infected patient samples in search of a method displaying an optimal balance of WBC-removal efficacy, cost, simplicity, and applicability to low resource settings. In the first of a two-part study, combinations of three different WBC depletion methods were tested on 43 patient blood samples in Mali. A two-step combination of Lymphoprep plus Plasmodipur best fitted our requirements, although moderate variability was observed in human DNA quantity. This approach was further assessed in a larger sample of 76 patients from Burkina Faso. WBC-removal efficacy remained high (<30% human DNA in >70% samples) and lower variation was observed in human DNA quantities. In order to assess the Plasmodium sequence yield at different human DNA proportions, 59 samples with up to 60% human DNA contamination were sequenced on the Illumina Genome Analyzer platform. An average ~40-fold coverage of the genome was observed per lane for samples with ≤ 30% human DNA. Even in low resource settings, using a simple two-step combination of Lymphoprep plus Plasmodipur, over 70% of clinical sample preparations should exhibit sufficiently low human DNA quantities to enable ~40-fold sequence coverage of the P. falciparum genome using a single lane on the Illumina Genome Analyzer platform. This approach should greatly facilitate large-scale clinical and epidemiologic studies of P. falciparum.
Funded by: Howard Hughes Medical Institute: 55005502; Medical Research Council: G0600718, G19/9; Wellcome Trust: 090532, 090770
PloS one 2011;6;7;e22213
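The abstract above relates the proportion of human DNA in a sample to the parasite genome coverage obtained from one sequencing lane. As a back-of-the-envelope sketch of that relationship, the function below estimates coverage from the lane yield and the human fraction. It assumes, as a simplification not stated in the abstract, that every non-human base maps to the parasite genome, and it uses an approximate P. falciparum genome size of 23.3 Mb; the function name and the example lane yield are hypothetical.

```python
# Approximate P. falciparum genome size in base pairs (an assumption of this sketch).
P_FALCIPARUM_GENOME_BP = 23_300_000

def expected_parasite_coverage(lane_yield_bp, human_fraction,
                               genome_bp=P_FALCIPARUM_GENOME_BP):
    """Estimate fold coverage of the parasite genome from one lane.

    lane_yield_bp:  total sequenced bases in the lane (hypothetical input).
    human_fraction: proportion of sequenced DNA that is human, between 0 and 1.
    """
    if not 0.0 <= human_fraction <= 1.0:
        raise ValueError("human_fraction must be between 0 and 1")
    parasite_bp = lane_yield_bp * (1.0 - human_fraction)
    return parasite_bp / genome_bp

# Illustrative lane yield only; not a figure from the study.
for fraction in (0.1, 0.3, 0.6):
    coverage = expected_parasite_coverage(1_400_000_000, fraction)
    print(f"human DNA {fraction:.0%}: about {coverage:.0f}-fold coverage")
```

The linear dependence on `1 - human_fraction` is why depleting white blood cells before sequencing matters: each extra percentage point of human DNA removes the same share of usable parasite sequence.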
Male lineages in the Himalayan foothills: a commentary on Y-chromosome haplogroup diversity in the sub-Himalayan Terai and Duars populations of East India.
Journal of human genetics 2011;56;12;813-4
Parallel evolution of genes and languages in the Caucasus region.
Research Centre for Medical Genetics, Russian Academy of Medical Sciences, Moscow, Russia. email@example.com
We analyzed 40 single nucleotide polymorphism and 19 short tandem repeat Y-chromosomal markers in a large sample of 1,525 indigenous individuals from 14 populations in the Caucasus and 254 additional individuals representing potential source populations. We also employed a lexicostatistical approach to reconstruct the history of the languages of the North Caucasian family spoken by the Caucasus populations. We found a different major haplogroup to be prevalent in each of four sets of populations that occupy distinct geographic regions and belong to different linguistic branches. The haplogroup frequencies correlated with geography and, even more strongly, with language. Within haplogroups, a number of haplotype clusters were shown to be specific to individual populations and languages. The data suggested a direct origin of Caucasus male lineages from the Near East, followed by high levels of isolation, differentiation, and genetic drift in situ. Comparison of genetic and linguistic reconstructions covering the last few millennia showed striking correspondences between the topology and dates of the respective gene and language trees and with documented historical events. Overall, in the Caucasus region, unmatched levels of gene-language coevolution occurred within geographically isolated populations, probably due to its mountainous terrain.
Funded by: Wellcome Trust: 077009
Molecular biology and evolution 2011;28;10;2905-20
Gene inactivation and its implications for annotation in the era of personal genomics.
Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA.
The first wave of personal genomes documents how no single individual genome contains the full complement of functional genes. Here, we describe the extent of variation in gene and pseudogene numbers between individuals arising from inactivation events such as premature termination or aberrant splicing due to single-nucleotide polymorphisms. This highlights the inadequacy of the current reference sequence and gene set. We present a proposal to define a reference gene set that will remain stable as more individuals are sequenced. In particular, we recommend that the ancestral allele be used to define the reference sequence from which a core human reference gene annotation set can be derived. In addition, we call for the development of an expanded gene set to include human-specific genes that have arisen recently and are absent from the ancestral set.
Funded by: Wellcome Trust
Genes & development 2011;25;1;1-10
CCR4-associated factor 1 coordinates the expression of Plasmodium falciparum egress and invasion proteins.
Department of Global Health, College of Public Health, University of South Florida, 3720 Spectrum Blvd., Suite 304, Tampa, FL, USA.
Coordinated regulation of gene expression is a hallmark of the Plasmodium falciparum asexual blood-stage development cycle. We report that carbon catabolite repressor protein 4 (CCR4)-associated factor 1 (CAF1) is critical in regulating more than 1,000 genes during malaria parasites' intraerythrocytic stages, especially egress and invasion proteins. CAF1 knockout results in mistimed expression, aberrant accumulation and localization of proteins involved in parasite egress, and invasion of new host cells, leading to premature release of predominantly half-finished merozoites, drastically reducing the intraerythrocytic growth rate of the parasite. This study demonstrates that CAF1 of the CCR4-Not complex is a significant gene regulatory mechanism needed for Plasmodium development within the human host.
Funded by: NIAID NIH HHS: R01 AI094973, R01 AI094973-01, R01AI033656, R01AI094973; Wellcome Trust
Eukaryotic cell 2011;10;9;1257-63
RNAcentral: A vision for an international database of RNA sequences.
During the last decade there has been a great increase in the number of noncoding RNA genes identified, including new classes such as microRNAs and piRNAs. There is also a large growth in the amount of experimental characterization of these RNA components. Despite this growth in information, it is still difficult for researchers to access RNA data, because key data resources for noncoding RNAs have not yet been created. The most pressing omission is the lack of a comprehensive RNA sequence database, much like UniProt, which provides a comprehensive set of protein knowledge. In this article we propose the creation of a new open public resource that we term RNAcentral, which will contain a comprehensive collection of RNA sequences and fill an important gap in the provision of biomedical databases. We envision RNA researchers from all over the world joining a federated RNAcentral network, contributing specialized knowledge and databases. RNAcentral would centralize key data that are currently held across a variety of databases, allowing researchers instant access to a single, unified resource. This resource would facilitate the next generation of RNA research and help drive further discoveries, including those that improve food production and human and animal health. We encourage additional RNA database resources and research groups to join this effort. We aim to obtain international network funding to further this endeavor.
RNA (New York, N.Y.) 2011;17;11;1941-6
Characterization of the proteome, diseases and evolution of the human postsynaptic density.
Genes to Cognition Programme, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire, UK.
We isolated the postsynaptic density from human neocortex (hPSD) and identified 1,461 proteins. hPSD mutations cause 133 neurological and psychiatric diseases and were enriched in cognitive, affective and motor phenotypes underpinned by sets of genes. Strong protein sequence conservation in mammalian lineages, particularly in hub proteins, indicates conserved function and organization in primate and rodent models. The hPSD is an important structure for nervous system disease and behavior.
Funded by: Chief Scientist Office: CZB/4/486; Medical Research Council: G0802238, G0802238(89569); Wellcome Trust: 066717, 077155
Nature neuroscience 2011;14;1;19-21
Essential thrombocythemia: seeing the wood for the trees Response
Metagenomics and the molecular identification of novel viruses.
Department of Veterinary Medicine, University of Cambridge, Cambridge, UK. firstname.lastname@example.org
There have been rapid recent developments in establishing methods for identifying and characterising viruses associated with animal and human diseases. These methodologies, commonly based on hybridisation or PCR techniques, are combined with advanced sequencing techniques termed 'next generation sequencing'. Allied advances in data analysis, including the use of computational transcriptome subtraction, have also impacted the field of viral pathogen discovery. This review details these molecular detection techniques, discusses their application in viral discovery, and provides an overview of some of the novel viruses discovered. The problems encountered in attributing disease causality to a newly identified virus are also considered.
Veterinary journal (London, England : 1997) 2011;190;2;191-8
Meta-analysis of genome-wide association studies from the CHARGE consortium identifies common variants associated with carotid intima media thickness and plaque.
Cardiovascular Health Research Unit and Department of Medicine, University of Washington, Seattle, Washington, USA. email@example.com
Carotid intima media thickness (cIMT) and plaque determined by ultrasonography are established measures of subclinical atherosclerosis that each predicts future cardiovascular disease events. We conducted a meta-analysis of genome-wide association data in 31,211 participants of European ancestry from nine large studies in the setting of the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium. We then sought additional evidence to support our findings among 11,273 individuals using data from seven additional studies. In the combined meta-analysis, we identified three genomic regions associated with common carotid intima media thickness and two different regions associated with the presence of carotid plaque (P < 5 × 10⁻⁸). The associated SNPs mapped in or near genes related to cellular signaling, lipid metabolism and blood pressure homeostasis, and two of the regions were associated with coronary artery disease (P < 0.006) in the Coronary Artery Disease Genome-Wide Replication and Meta-Analysis (CARDIoGRAM) consortium. Our findings may provide new insight into pathways leading to subclinical atherosclerosis and subsequent cardiovascular events.
Funded by: Chief Scientist Office: CZB/4/710; Intramural NIH HHS: Z01 HL006002-01, Z99 HL999999; Medical Research Council: MC_U127561128; NCATS NIH HHS: UL1 TR000005; NCRR NIH HHS: M01 RR 16500, M01RR00069, UL1RR025005; NHGRI NIH HHS: HG005581, U01HG004402; NHLBI NIH HHS: HL075366, HL080295, HL084729, HL087652, HL105756, N01 HC-15103, N01 HC-55222, N01-HC-25195, N01-HC-35129, N01-HC-45133, N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022, N01-HC-75150, N01-HC-85079, N01-HC-85080, N01-HC-85081, N01-HC-85082, N01-HC-85083, N01-HC-85084, N01-HC-85085, N01-HC-85086, N01-HC-85239, N02-HL-6-4278, R01HL086694, R01HL087641, R01HL59367, U01 HL072515-06; NIA NIH HHS: AG-023629, AG-027058, AG-15928, AG-20098, AG033193, AG08122, AG16495, N01-AG-1-2109, N01-AG-12100, R01 AG18728; NIDDK NIH HHS: DK063491, P30 DK072488; NIGMS NIH HHS: U01 GM074518-04; NINDS NIH HHS: NS17950; PHS HHS: 268200625226C
Nature genetics 2011;43;10;940-7
Abdominal aortic aneurysm is associated with a variant in low-density lipoprotein receptor-related protein 1.
Department of Cardiovascular Sciences, University of Leicester, Leicester LE2 7LX, UK. firstname.lastname@example.org
Abdominal aortic aneurysm (AAA) is a common cause of morbidity and mortality and has a significant heritability. We carried out a genome-wide association discovery study of 1866 patients with AAA and 5435 controls and replication of promising signals (lead SNP with a p value < 1 × 10⁻⁵) in 2871 additional cases and 32,687 controls and performed further follow-up in 1491 AAA and 11,060 controls. In the discovery study, nine loci demonstrated association with AAA (p < 1 × 10⁻⁵). In the replication sample, the lead SNP at one of these loci, rs1466535, located within intron 1 of low-density-lipoprotein receptor-related protein 1 (LRP1) demonstrated significant association (p = 0.0042). We confirmed the association of rs1466535 and AAA in our follow-up study (p = 0.035). In a combined analysis (6228 AAA and 49182 controls), rs1466535 had a consistent effect size and direction in all sample sets (combined p = 4.52 × 10⁻¹⁰, odds ratio 1.15 [1.10-1.21]). No associations were seen for either rs1466535 or the 12q13.3 locus in independent association studies of coronary artery disease, blood pressure, diabetes, or hyperlipidaemia, suggesting that this locus is specific to AAA. Gene-expression studies demonstrated a trend toward increased LRP1 expression for the rs1466535 CC genotype in arterial tissues; there was a significant (p = 0.029) 1.19-fold (1.04-1.36) increase in LRP1 expression in CC homozygotes compared to TT homozygotes in aortic adventitia. Functional studies demonstrated that rs1466535 might alter a SREBP-1 binding site and influence enhancer activity at the locus. In conclusion, this study has identified a biologically plausible genetic variant associated specifically with AAA, and we suggest that this variant has a possible functional role in LRP1 expression.
Funded by: British Heart Foundation: FS/11/16/28696, PG/10/001/28098, RG2008/08; Wellcome Trust: 076113, 084695, 085475
American journal of human genetics 2011;89;5;619-27
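The abdominal aortic aneurysm abstract above reports sample sizes for three stages (discovery, replication, follow-up) and then a combined analysis. A short arithmetic check, using only the counts given in the abstract, shows how the combined totals arise from the per-stage figures; the variable names are illustrative.

```python
# Per-stage sample sizes as reported in the abstract.
cases = {"discovery": 1866, "replication": 2871, "follow_up": 1491}
controls = {"discovery": 5435, "replication": 32687, "follow_up": 11060}

total_cases = sum(cases.values())
total_controls = sum(controls.values())

# These should match the combined analysis (6228 AAA and 49182 controls).
print(total_cases, total_controls)
```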
TSIDER1, a short and non-autonomous Salivarian trypanosome-specific retroposon related to the ingi6 subclade.
Centre de Résonance Magnétique des Systèmes Biologiques, UMR 5536, Université Bordeaux Segalen, CNRS, 146 rue Léo Saignat, 33076 Bordeaux, France. email@example.com
Retroposons of the ingi clade are the most abundant transposable elements identified in the trypanosomatid genomes. Some are long autonomous elements (ingi, L1Tc) while others, such as RIME and NARTc, are short non-coding elements that parasitize the retrotransposition machinery of the active autonomous ones for their own mobilization. Here, we identified a new family of short non-autonomous retroposons of the ingi clade, called TSIDER1, which are present in the genome of Salivarian (African) trypanosomes, Trypanosoma brucei, T. congolense and T. vivax, but absent in the T. cruzi and Leishmania spp. genomes and, as such, TSIDER1 is the only retroposon subfamily conserved at the nucleotide level between African trypanosome species. We identified three TvSIDER1 families within the genome of T. vivax and the high level of sequence conservation within the TvSIDER1a and TvSIDER1b groups suggests that they are still active. We propose that TvSIDER1a/b elements are using the Tvingi retrotransposition machinery, as they are preceded by the same conserved pattern characteristic of the ingi6 subclade, which corresponds to the retroposon-encoded endonuclease binding site. In contrast, TcoSIDER1, TbSIDER1 and TvSIDER1c are too divergent to be considered as active retroposons. The relatively low number of SIDER elements identified in the T. congolense (70 copies), T. vivax (32 copies) and T. brucei (22 copies) genomes confirms that trypanosomes have not expanded short transposable elements, which is in contrast to Leishmania spp. (∼2000 copies), where SIDER play a role in the regulation of gene expression.
Funded by: Wellcome Trust: WT 085775/Z//08/Z
Molecular and biochemical parasitology 2011;179;1;30-6
Shotgun proteomics aids discovery of novel protein-coding genes, alternative splicing, and "resurrected" pseudogenes in the mouse genome.
The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.
Recent advances in proteomic mass spectrometry (MS) offer the chance to marry high-throughput peptide sequencing to transcript models, allowing the validation, refinement, and identification of new protein-coding loci. We present a novel pipeline that integrates highly sensitive and statistically robust peptide spectrum matching with genome-wide protein-coding predictions to perform large-scale gene validation and discovery in the mouse genome for the first time. In searching in excess of 10 million spectra, we have been able to validate 32%, 17%, and 7% of all protein-coding genes, exons, and splice boundaries, respectively. Moreover, we present strong evidence for the identification of multiple alternatively spliced translations from 53 genes and have uncovered 10 entirely novel protein-coding genes, which are not covered in any mouse annotation data sources. One such novel protein-coding gene is a fusion protein that spans the Ins2 and Igf2 loci to produce a transcript encoding the insulin II and the insulin-like growth factor 2-derived peptides. We also report nine processed pseudogenes that have unique peptide hits, demonstrating, for the first time, that they are not just transcribed but are translated and are therefore resurrected into new coding loci. This work not only highlights an important utility for MS data in genome annotation but also provides unique insights into the gene structure and propagation in the mouse genome. All these data have been subsequently used to improve the publicly available mouse annotation in both the Vega and Ensembl genome browsers (http://vega.sanger.ac.uk).
Funded by: Cancer Research UK: 13031; Wellcome Trust: 077198
Genome research 2011;21;5;756-67
Population genetic analysis of Plasmodium falciparum parasites using a customized Illumina GoldenGate genotyping assay.
Wellcome Trust Sanger Institute, Hinxton, United Kingdom. firstname.lastname@example.org
The diversity in the Plasmodium falciparum genome can be used to explore parasite population dynamics, with practical applications to malaria control. The ability to identify the geographic origin and trace the migratory patterns of parasites with clinically important phenotypes such as drug resistance is particularly relevant. With increasing single-nucleotide polymorphism (SNP) discovery from ongoing Plasmodium genome sequencing projects, there is a demand for genotyping platforms with high SNP and sample throughput for large-scale population genetic studies. Low parasitaemias and multiple clone infections present a number of challenges to genotyping P. falciparum. We addressed some of these issues using a custom 384-SNP Illumina GoldenGate assay on P. falciparum DNA from laboratory clones (long-term culture-adapted parasite clones), short-term cultured parasite isolates and clinical samples (non-cultured isolates) from East and West Africa, Southeast Asia and Oceania. Eighty percent of the SNPs (n = 306) produced reliable genotype calls on samples containing as little as 2 ng of total genomic DNA and on whole genome amplified DNA. Analysis of artificial mixtures of laboratory clones demonstrated high genotype calling specificity and moderate sensitivity to call minor frequency alleles. Clear resolution of geographically distinct populations was demonstrated using Principal Components Analysis (PCA), and global patterns of population genetic diversity were consistent with previous reports. These results validate the utility of the platform in performing population genetic studies of P. falciparum.
Funded by: Howard Hughes Medical Institute; Intramural NIH HHS; Medical Research Council: G0600718, G19/9; NIAID NIH HHS: R37 AI048071; Wellcome Trust: 090532, 093956
PloS one 2011;6;6;e20251
Determinants of bluetongue virus virulence in murine models of disease.
Medical Research Council-University of Glasgow Centre for Virus Research, Institute of Infection, Inflammation and Immunity, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, United Kingdom.
Bluetongue is a major infectious disease of ruminants that is caused by bluetongue virus (BTV). In this study, we analyzed virulence and genetic differences of (i) three BTV field strains from Italy maintained at either a low (L strains) or high (H strains) passage number in cell culture and (ii) three South African "reference" wild-type strains and their corresponding live attenuated vaccine strains. The Italian BTV L strains, in general, were lethal for both newborn NIH-Swiss mice inoculated intracerebrally and adult type I interferon receptor-deficient (IFNAR(-/-)) mice, while the virulence of the H strains was attenuated significantly in both experimental models. Similarly, the South African vaccine strains were not pathogenic for IFNAR(-/-) mice, while the corresponding wild-type strains were virulent. Thus, attenuation of the virulence of the BTV strains used in this study is not mediated by the presence of an intact interferon system. No clear distinction in virulence was observed for the South African BTV strains in newborn NIH-Swiss mice. Full genomic sequencing revealed relatively few amino acid substitutions, scattered in several different viral proteins, for the strains found to be attenuated in mice compared to the pathogenic related strains. However, only the genome segments encoding VP1, VP2, and NS2 consistently showed nonsynonymous changes between all virulent and attenuated strain pairs. This study established an experimental platform for investigating the determinants of BTV virulence. Future studies using reverse genetics will allow researchers to precisely map and "weight" the relative influences of the various genome segments and viral proteins on BTV virulence.
Funded by: Medical Research Council: G0801822; Wellcome Trust
Journal of virology 2011;85;21;11479-89
A modified vaccinia Ankara virus (MVA) vaccine expressing African horse sickness virus (AHSV) VP2 protects against AHSV challenge in an IFNAR -/- mouse model.
Institute for Animal Health, Pirbright, Woking, Surrey, United Kingdom. email@example.com
African horse sickness (AHS) is a lethal viral disease of equids, which is transmitted by Culicoides midges that become infected after biting a viraemic host. The use of live attenuated vaccines has been vital for the control of this disease in endemic regions. However, there are safety concerns over their use in non-endemic countries. Research efforts over the last two decades have therefore focused on developing alternative vaccines based on recombinant baculovirus or live viral vectors expressing structural components of the AHS virion. However, ethical and financial considerations, relating to the use of infected horses in high biosecurity installations, have made progress very slow. We have therefore assessed the potential of an experimental mouse-model for AHSV infection for vaccine and immunology research. We initially characterised AHSV infection in this model, then tested the protective efficacy of a recombinant vaccine based on modified vaccinia Ankara expressing AHS-4 VP2 (MVA-VP2).
Funded by: Biotechnology and Biological Sciences Research Council: BBS/B/00654
PloS one 2011;6;1;e16503
The impact of recombination on dN/dS within recently emerged bacterial clones.
Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath, United Kingdom.
The development of next-generation sequencing platforms is set to reveal an unprecedented level of detail on short-term molecular evolutionary processes in bacteria. Here we re-analyse genome-wide single nucleotide polymorphism (SNP) datasets for recently emerged clones of methicillin resistant Staphylococcus aureus (MRSA) and Clostridium difficile. We note a highly significant enrichment of synonymous SNPs in those genes which have been affected by recombination, i.e. those genes on mobile elements designated "non-core" (in the case of S. aureus), or those core genes which have been affected by homologous replacements (S. aureus and C. difficile). This observation suggests that the previously documented decrease in dN/dS over time in bacteria applies not only to genomes of differing levels of divergence overall, but also to horizontally acquired genes of differing levels of divergence within a single genome. We also consider the role of increased drift acting on recently emerged, highly specialised clones, and the impact of recombination on selection at linked sites. This work has implications for a wide range of genomic analyses.
PLoS pathogens 2011;7;7;e1002129
Genome-wide association study identifies loci influencing concentrations of liver enzymes in plasma.
Epidemiology and Biostatistics, Imperial College London, Norfolk Place, London, UK. firstname.lastname@example.org
Concentrations of liver enzymes in plasma are widely used as indicators of liver disease. We carried out a genome-wide association study in 61,089 individuals, identifying 42 loci associated with concentrations of liver enzymes in plasma, of which 32 are new associations (P = 10⁻⁸ to P = 10⁻¹⁹⁰). We used functional genomic approaches including metabonomic profiling and gene expression analyses to identify probable candidate genes at these regions. We identified 69 candidate genes, including genes involved in biliary transport (ATP8B1 and ABCB11), glucose, carbohydrate and lipid metabolism (FADS1, FADS2, GCKR, JMJD1C, HNF1A, MLXIPL, PNPLA3, PPP1R3B, SLC2A2 and TRIB1), glycoprotein biosynthesis and cell surface glycobiology (ABO, ASGR1, FUT2, GPLD1 and ST3GAL4), inflammation and immunity (CD276, CDH6, GCKR, HNF1A, HPR, ITGA1, RORA and STAT4) and glutathione metabolism (GSTT1, GSTT2 and GGT), as well as several genes of uncertain or unknown function (including ABHD12, EFHD1, EFNA1, EPHA2, MICAL3 and ZNF827). Our results provide new insight into genetic mechanisms and pathways influencing markers of liver function.
Funded by: British Heart Foundation: FS/10/011/27881, PG/09/002/26056, PG/09/023/26806, RG/07/008/23674; Cancer Research UK: 14136; Department of Health: PHCS/C4/4/016; Intramural NIH HHS: Z01 AG000675-02, Z99 DK999999, ZIA DK075013-05, ZIA DK075013-07; Medical Research Council: G0100222, G0401527, G0601653, G0601966, G0700342, G0700931, G0701863, G0902037, G1000143, G19/35, G8802774, G9521010, MC_PC_U127561128, MC_U106179471, MC_U106188470, MC_U127561128, MC_UP_A100_1003, MC_UP_A620_1015; NHLBI NIH HHS: R01 HL087647; NIAAA NIH HHS: K05 AA017688; Wellcome Trust: 090532
Nature genetics 2011;43;11;1131-8
Defining the power limits of genome-wide association scan meta-analyses.
Wellcome Trust Centre for Human Genetics, Roosevelt Drive, University of Oxford, Oxford, United Kingdom.
Large-scale meta-analyses of genome-wide association scans (GWAS) have been successful in discovering common risk variants with modest and small effects. The detection of lower frequency signals will undoubtedly require concerted efforts of at least similar scale. We investigate the sample size-dictated power limits of GWAS meta-analyses, in the presence and absence of modest levels of heterogeneity and across a range of different allelic architectures. We find that data combination through large-scale collaboration is vital in the quest for complex trait susceptibility loci, but that effect size heterogeneity across meta-analyzed studies drawn from similar populations does not appear to have a profound effect on sample size requirements.
Funded by: Wellcome Trust: 088885, 090532, WT079557MA, WT081682/Z/06/Z, WT088885/Z/09/Z
Genetic epidemiology 2011;35;8;781-9
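The dependence of power on sample size described in the abstract above can be illustrated with a minimal sketch. It is not the authors' method: it assumes a single-variant, 1-df additive test on a standardized quantitative trait and a normal approximation in which the variant explains 2p(1-p)β² of the trait variance. The function name `gwas_power` and its parameters are hypothetical.

```python
from statistics import NormalDist


def gwas_power(n, maf, beta, alpha=5e-8):
    """Approximate power of a 1-df additive association test.

    Assumptions (illustrative only): standardized quantitative trait,
    Hardy-Weinberg proportions, and a normal approximation to the test
    statistic. `beta` is the per-allele effect in trait SD units.
    """
    nd = NormalDist()
    # Approximate fraction of trait variance explained by the variant.
    var_explained = 2.0 * maf * (1.0 - maf) * beta ** 2
    # Expected shift of the z statistic under the alternative.
    shift = (n * var_explained) ** 0.5
    # Two-sided critical value at significance level alpha.
    z_crit = nd.inv_cdf(1.0 - alpha / 2.0)
    # Probability that |z| exceeds the critical value.
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Power rises steeply with combined sample size for a small effect.
    for n in (10_000, 50_000, 100_000):
        print(n, round(gwas_power(n, maf=0.3, beta=0.05), 4))
```

Under these assumptions the sketch shows why pooling studies matters: for a fixed small effect, power at a genome-wide threshold is negligible in a single moderate-sized study and approaches 1 only at meta-analysis scale.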
Expressions of individuality.
Nature reviews. Microbiology 2011;9;10;701
Genome-wide association study reveals three susceptibility loci for common migraine in the general population.
Division of Preventive Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA.
Migraine is a common, heterogeneous and heritable neurological disorder. Its pathophysiology is incompletely understood, and its genetic influences at the population level are unknown. In a population-based genome-wide analysis including 5,122 migraineurs and 18,108 non-migraineurs, rs2651899 (1p36.32, PRDM16), rs10166942 (2q37.1, TRPM8) and rs11172113 (12q13.3, LRP1) were among the top seven associations (P < 5 × 10⁻⁶) with migraine. These SNPs were significant in a meta-analysis among three replication cohorts and met genome-wide significance in a meta-analysis combining the discovery and replication cohorts (rs2651899, odds ratio (OR) = 1.11, P = 3.8 × 10⁻⁹; rs10166942, OR = 0.85, P = 5.5 × 10⁻¹²; and rs11172113, OR = 0.90, P = 4.3 × 10⁻⁹). The associations at rs2651899 and rs10166942 were specific for migraine compared with non-migraine headache. None of the three SNP associations was preferential for migraine with aura or without aura, nor were any associations specific for migraine features. TRPM8 has been the focus of neuropathic pain models, whereas LRP1 modulates neuronal glutamate signaling, plausibly linking both genes to migraine pathophysiology.
Funded by: NCI NIH HHS: CA-47988, R01 CA047988, R01 CA047988-21; NHLBI NIH HHS: HL-043851, HL-080467, HL-099355, R01 HL043851, R01 HL043851-10, R01 HL080467, R01 HL080467-05, RC1 HL099355, RC1 HL099355-02; NINDS NIH HHS: NS-061836, R01 NS061836, R01 NS061836-03
Nature genetics 2011;43;7;695-8
Population genetic structure in Indian Austroasiatic speakers: the role of landscape barriers and sex-specific admixture.
Department of Evolutionary Biology, Institute of Molecular and Cell Biology, University of Tartu and Estonian Biocentre, Tartu, Estonia.
The geographic origin and time of dispersal of Austroasiatic (AA) speakers, presently settled in south and southeast Asia, remains disputed. Two rival hypotheses, both assuming a demic component to the language dispersal, have been proposed. The first of these places the origin of Austroasiatic speakers in southeast Asia with a later dispersal to south Asia during the Neolithic, whereas the second hypothesis advocates pre-Neolithic origins and dispersal of this language family from south Asia. To test the two alternative models, this study combines the analysis of uniparentally inherited markers with 610,000 common single nucleotide polymorphism loci from the nuclear genome. Indian AA speakers have high frequencies of Y chromosome haplogroup O2a; our results show that this haplogroup has significantly higher diversity and coalescent time (17-28 thousand years ago) in southeast Asia, strongly supporting the first of the two hypotheses. Nevertheless, the results of principal component and "structure-like" analyses on autosomal loci also show that the population history of AA speakers in India is more complex, being characterized by two ancestral components: one represented in the pattern of Y chromosomal and EDAR results and the other by mitochondrial DNA diversity and genomic structure. We propose that AA speakers in India today are derived from dispersal from southeast Asia, followed by extensive sex-specific admixture with local Indian populations.
Funded by: Wellcome Trust: 077009
Molecular biology and evolution 2011;28;2;1013-24
Quantitative RNA-seq analysis of the Campylobacter jejuni transcriptome.
Department of Veterinary Medicine, University of Cambridge, Cambridge, UK.
Campylobacter jejuni is the most common bacterial cause of foodborne disease in the developed world. Its general physiology and biochemistry, as well as the mechanisms enabling it to colonize and cause disease in various hosts, are not well understood, and new approaches are required to understand its basic biology. High-throughput sequencing technologies provide unprecedented opportunities for functional genomic research. Recent studies have shown that direct Illumina sequencing of cDNA (RNA-seq) is a useful technique for the quantitative and qualitative examination of transcriptomes. In this study we report RNA-seq analyses of the transcriptomes of C. jejuni (NCTC11168) and its rpoN mutant. This has allowed the identification of hitherto unknown transcriptional units, and further defines the regulon that is dependent on rpoN for expression. The analysis of the NCTC11168 transcriptome was supplemented by additional proteomic analysis using liquid chromatography-MS. The transcriptomic and proteomic datasets represent an important resource for the Campylobacter research community.
Funded by: Biotechnology and Biological Sciences Research Council: BB/D00019X/1; Medical Research Council: G0801161; Wellcome Trust: 079643/Z/06/Z
Microbiology (Reading, England) 2011;157;Pt 10;2922-32
Genetic screens using the piggyBac transposon.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1HH, UK.
Transposons are an attractive system to use in genetic screens as they are molecularly tractable and the disrupted loci that give rise to the desired phenotype are easily mapped. We consider herein the characteristics of the piggyBac transposon system in complementing existing mammalian screen strategies, including the Sleeping Beauty transposon system. We also describe the design of the piggyBac resources that we have developed for both forward and reverse genetic screens, and the protocols we use in these experiments.
Funded by: Wellcome Trust
Methods (San Diego, Calif.) 2011;53;4;366-71
Modernizing reference genome assemblies.
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America. email@example.com
Funded by: Wellcome Trust: 077198, 095908
PLoS biology 2011;9;7;e1001091
The GENCODE exome: sequencing the complete human exome.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK.
Sequencing the coding regions, the exome, of the human genome is one of the major current strategies to identify low frequency and rare variants associated with human disease traits. So far, the most widely used commercial exome capture reagents have mainly targeted the consensus coding sequence (CCDS) database. We report the design of an extended set of targets for capturing the complete human exome, based on annotation from the GENCODE consortium. The extended set covers an additional 5594 genes and 10.3 Mb compared with the current CCDS-based sets. The additional regions include potential disease genes previously inaccessible to exome resequencing studies, such as 43 genes linked to ion channel activity and 70 genes linked to protein kinase activity. In total, the new GENCODE exome set developed here covers 47.9 Mb and performed well in sequence capture experiments. In the sample set used in this study, we identified over 5000 more SNP variants (24%) in the GENCODE exome target than in CCDS-based exome sequencing.
Funded by: NHGRI NIH HHS: 5U54HG004555; Wellcome Trust: 077198, WT062023, WT077198, WT089062
European journal of human genetics : EJHG 2011;19;7;827-31
A world in a grain of sand: human history from genetic data.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK.
Genome-wide genotypes and sequences are enriching our understanding of the past 50,000 years of human history and providing insights into earlier periods largely inaccessible to mitochondrial DNA and Y-chromosomal studies. "To see a world in a grain of sand ..." William Blake, Auguries of Innocence.
Funded by: Wellcome Trust
Genome biology 2011;12;11;234
Variation in genome-wide mutation rates within and between human families.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.
J.B.S. Haldane proposed in 1947 that the male germline may be more mutagenic than the female germline. Diverse studies have supported Haldane's contention of a higher average mutation rate in the male germline in a variety of mammals, including humans. Here we present, to our knowledge, the first direct comparative analysis of male and female germline mutation rates from the complete genome sequences of two parent-offspring trios. Through extensive validation, we identified 49 and 35 germline de novo mutations (DNMs) in two trio offspring, as well as 1,586 non-germline DNMs arising either somatically or in the cell lines from which the DNA was derived. Most strikingly, in one family, we observed that 92% of germline DNMs were from the paternal germline, whereas in the other family, 64% of DNMs were from the maternal germline. These observations suggest considerable variation in mutation rates within and between families.
Funded by: NHGRI NIH HHS: R01 HG004960; NIGMS NIH HHS: R01 GM070806; Wellcome Trust: 077014, 077014/Z/05/Z, 085532, 090532
Nature genetics 2011;43;7;712-4
A genome-wide association study in Europeans and South Asians identifies five new loci for coronary artery disease.
Genome-wide association studies have identified 11 common variants convincingly associated with coronary artery disease (CAD)¹⁻⁷, a modest number considering the apparent heritability of CAD⁸. All of these variants have been discovered in European populations. We report a meta-analysis of four large genome-wide association studies of CAD, with ∼575,000 genotyped SNPs in a discovery dataset comprising 15,420 individuals with CAD (cases) (8,424 Europeans and 6,996 South Asians) and 15,062 controls. There was little evidence for ancestry-specific associations, supporting the use of combined analyses. Replication in an independent sample of 21,408 cases and 19,185 controls identified five loci newly associated with CAD (P < 5 × 10⁻⁸ in the combined discovery and replication analysis): LIPA on 10q23, PDGFD on 11q22, ADAMTS7-MORF4L1 on 15q25, a gene rich locus on 7q22 and KIAA1462 on 10p11. The CAD-associated SNP in the PDGFD locus showed tissue-specific cis expression quantitative trait locus effects. These findings implicate new pathways for CAD susceptibility.
Funded by: British Heart Foundation: RG/08/014/24067; Cancer Research UK: 10293; Medical Research Council: G0601966, G0700931, G0801056, G9521010, MC_U137686854, MC_U137686857
Nature genetics 2011;43;4;339-44
Basigin is a receptor essential for erythrocyte invasion by Plasmodium falciparum.
Cell Surface Signalling Laboratory, Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, UK.
Erythrocyte invasion by Plasmodium falciparum is central to the pathogenesis of malaria. Invasion requires a series of extracellular recognition events between erythrocyte receptors and ligands on the merozoite, the invasive form of the parasite. None of the few known receptor-ligand interactions involved are required in all parasite strains, indicating that the parasite is able to access multiple redundant invasion pathways. Here, we identify a receptor-ligand pair that is essential for erythrocyte invasion in all tested P. falciparum strains. By systematically screening a library of erythrocyte proteins, we have found that the Ok blood group antigen, basigin, is a receptor for PfRh5, a parasite ligand that is essential for blood stage growth. Erythrocyte invasion was potently inhibited by soluble basigin or by basigin knockdown, and invasion could be completely blocked using low concentrations of anti-basigin antibodies; importantly, these effects were observed across all laboratory-adapted and field strains tested. Furthermore, Ok(a-) erythrocytes, which express a basigin variant that has a weaker binding affinity for PfRh5, had reduced invasion efficiencies. Our discovery of a cross-strain dependency on a single extracellular receptor-ligand pair for erythrocyte invasion by P. falciparum provides a focus for new anti-malarial therapies.
Funded by: Medical Research Council: G19/9; NCEZID CDC HHS: R36 CK000119-01; NIAID NIH HHS: 2T32 AI007535-12, R01 AI057919, R01 AI057919-05, R01AI057919; Wellcome Trust: 077108, 089084, 090532
Disruption of mouse Slx4, a regulator of structure-specific nucleases, phenocopies Fanconi anemia.
Medical Research Council, Laboratory of Molecular Biology, Cambridge, UK.
The evolutionarily conserved SLX4 protein, a key regulator of nucleases, is critical for DNA damage response. SLX4 nuclease complexes mediate repair during replication and can also resolve Holliday junctions formed during homologous recombination. Here we describe the phenotype of a mouse knockout of Btbd12, the mouse ortholog of SLX4, which recapitulates many key features of the human genetic illness Fanconi anemia. Btbd12-deficient animals are born at sub-Mendelian ratios, have greatly reduced fertility, are developmentally compromised and are prone to blood cytopenias. Btbd12(-/-) cells prematurely senesce, spontaneously accumulate damaged chromosomes and are particularly sensitive to DNA crosslinking agents. Genetic complementation reveals a crucial requirement for Btbd12 (also known as Slx4) to interact with the structure-specific endonuclease Xpf-Ercc1 to promote crosslink repair. The Btbd12 knockout mouse therefore establishes a disease model for Fanconi anemia and genetically links a regulator of nuclease incision complexes to the Fanconi anemia DNA crosslink repair pathway.
Funded by: Cancer Research UK: 12401, A11073, A11376, A12401, A8449; Medical Research Council: MC_U105178811, U.1051.03.009(78811); Wellcome Trust: 098051
Nature genetics 2011;43;2;147-52
Rapid pneumococcal evolution in response to clinical interventions.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Epidemiological studies of the naturally transformable bacterial pathogen Streptococcus pneumoniae have previously been confounded by high rates of recombination. Sequencing 240 isolates of the PMEN1 (Spain(23F)-1) multidrug-resistant lineage enabled base substitutions to be distinguished from polymorphisms arising through horizontal sequence transfer. More than 700 recombinations were detected, with genes encoding major antigens frequently affected. Among these were 10 capsule-switching events, one of which accompanied a population shift as vaccine-escape serotype 19A isolates emerged in the USA after the introduction of the conjugate polysaccharide vaccine. The evolution of resistance to fluoroquinolones, rifampicin, and macrolides was observed to occur on multiple occasions. This study details how genomic plasticity within lineages of recombinogenic bacteria can permit adaptation to clinical interventions over remarkably short time scales.
Funded by: Medical Research Council: G0800596; Wellcome Trust: 076962, 076964
Science (New York, N.Y.) 2011;331;6016;430-4
Assessing the complex architecture of polygenic traits in diverged yeast populations.
Centre for Genetics and Genomics, Queen's Medical Centre, University of Nottingham, Nottingham, UK.
Phenotypic variation arising from populations adapting to different niches has a complex underlying genetic architecture. A major challenge in modern biology is to identify the causative variants driving phenotypic variation. Recently, the baker's yeast, Saccharomyces cerevisiae has emerged as a powerful model for dissecting complex traits. However, past studies using a laboratory strain were unable to reveal the complete architecture of polygenic traits. Here, we present a linkage study using 576 recombinant strains obtained from crosses of isolates representative of the major lineages. The meiotic recombinational landscape appears largely conserved between populations; however, strain-specific hotspots were also detected. Quantitative measurements of growth in 23 distinct ecologically relevant environments show that our recombinant population recapitulates most of the standing phenotypic variation described in the species. Linkage analysis detected an average of 6.3 distinct QTLs for each condition tested in all crosses, explaining on average 39% of the phenotypic variation. The QTLs detected are not constrained to a small number of loci, and the majority are specific to a single cross-combination and to a specific environment. Moreover, crosses between strains of similar phenotypes generate greater variation in the offspring, suggesting the presence of many antagonistic alleles and epistatic interactions. We found that subtelomeric regions play a key role in defining individual quantitative variation, emphasizing the importance of the adaptive nature of these regions in natural populations. This set of recombinant strains is a powerful tool for investigating the complex architecture of polygenic traits.
Funded by: Biotechnology and Biological Sciences Research Council: BB/F015216/1, BB/G01616X/1, BBF0152161; Wellcome Trust: WT 084507MA, WT077192/Z/05/Z
Molecular ecology 2011;20;7;1401-13
A viral discovery methodology for clinical biopsy samples utilising massively parallel next generation sequencing.
Department of Veterinary Medicine, The University of Cambridge, Cambridge, United Kingdom.
Here we describe a virus discovery protocol for a range of different virus genera that can be applied to biopsy-sized tissue samples. Our viral enrichment procedure, validated using canine and human liver samples, significantly improves viral read copy number and increases the length of viral contigs that can be generated by de novo assembly. This in turn enables the Illumina next generation sequencing (NGS) platform to be used as an effective tool for viral discovery from tissue samples.
Funded by: Wellcome Trust
PloS one 2011;6;12;e28879
The effect of next-generation sequencing technology on complex trait research.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.
Background: Advances in the understanding of complex trait genetics have always been enabled by advances in genomic technology. Next-generation sequencing (NGS) is set to revolutionize the way complex trait genetics research is carried out.
Results: NGS has multiple applications in the field of human genetics, but is accompanied by substantial study design, analysis and interpretation challenges. This review discusses key aspects of study design considerations, data handling issues and required analytical developments. We also highlight early successes in mapping genetic traits using NGS.
Conclusion: NGS opens the entire spectrum of genomic alterations for the genetic analysis of complex traits and there are early publications illustrating its power. Continuing development in analytical tools will allow the promise of NGS to be realized.
European journal of clinical investigation 2011;41;5;561-7
Linkage analysis without defined pedigrees.
Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095-7088, USA.
The need to collect accurate and complete pedigree information has been a drawback of family-based linkage and association studies. Even in case-control studies, investigators should be aware of, and condition on, familial relationships. In single nucleotide polymorphism (SNP) genome scans, relatedness can be directly inferred from the genetic data rather than determined through interviews. Various methods of estimating relatedness have previously been implemented, most notably in PLINK. We present new fast and accurate algorithms for estimating global and local kinship coefficients from dense SNP genotypes. These algorithms require only a single pass through the SNP genotype data. We also show that these estimates can be used to cluster individuals into pedigrees. With these estimates in hand, quantitative trait locus linkage analysis proceeds via traditional variance components methods without any prior relationship information. We demonstrate the success of our algorithms on simulated and real data sets. Our procedures make linkage analysis as easy as a typical genomewide association study.
Funded by: NHGRI NIH HHS: R01 HG006139; NHLBI NIH HHS: P01 HL045522-18; NIGMS NIH HHS: GM053275, R01 GM053275, R01 GM053275-15; NIMH NIH HHS: MH059490, R37 MH059490-12
Genetic epidemiology 2011;35;5;360-70
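The idea of inferring relatedness directly from dense SNP genotypes, as described in the abstract above, can be sketched with a generic method-of-moments estimator. This is not the authors' algorithm: it is the standard "half the genetic relationship matrix" estimator, using allele frequencies estimated from the same sample, which biases estimates in small samples. The function name `kinship_matrix` and the input layout are hypothetical.

```python
def kinship_matrix(genotypes):
    """Estimate pairwise kinship coefficients in one pass over the SNPs.

    `genotypes` is a list of SNPs; each SNP is a list of allele counts
    (0, 1 or 2), one per individual. Illustrative assumptions: no missing
    data, sample allele frequencies stand in for population frequencies.
    """
    n = len(genotypes[0])
    total = [[0.0] * n for _ in range(n)]
    used = 0
    for snp in genotypes:  # single pass through the SNP data
        p = sum(snp) / (2.0 * n)
        if p <= 0.0 or p >= 1.0:
            continue  # monomorphic SNPs carry no information
        denom = 2.0 * p * (1.0 - p)
        centered = [g - 2.0 * p for g in snp]
        for i in range(n):
            for j in range(i, n):
                total[i][j] += centered[i] * centered[j] / denom
        used += 1
    if used == 0:
        raise ValueError("no polymorphic SNPs")
    kin = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Kinship is half the average standardized genotype covariance.
            kin[i][j] = kin[j][i] = total[i][j] / (2.0 * used)
    return kin


if __name__ == "__main__":
    # Three SNPs, four individuals; individuals 0 and 1 share all genotypes.
    toy = [[0, 0, 2, 1], [2, 2, 0, 1], [1, 1, 1, 0]]
    for row in kinship_matrix(toy):
        print([round(x, 3) for x in row])
```

With real data, pairs whose estimated kinship exceeds a chosen threshold could then be grouped into putative pedigrees; the choice of threshold is outside the scope of this sketch.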
An evaluation of different target enrichment methods in pooled sequencing designs for complex disease association studies.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom.
Pooled sequencing can be a cost-effective approach to disease variant discovery, but its applicability in association studies remains unclear. We compare sequence enrichment methods coupled to next-generation sequencing in non-indexed pools of 1, 2, 10, 20 and 50 individuals and assess their ability to discover variants and to estimate their allele frequencies. We find that pooled resequencing is most usefully applied as a variant discovery tool due to limitations in estimating allele frequency with high enough accuracy for association studies, and that in-solution hybrid-capture performs best among the enrichment methods examined regardless of pool size.
Funded by: Wellcome Trust: WT088885/Z/09/Z
PloS one 2011;6;11;e26279
A variant in MCF2L is associated with osteoarthritis.
Wellcome Trust Sanger Institute, Hinxton, UK.
Osteoarthritis (OA) is a prevalent, heritable degenerative joint disease with a substantial public health impact. We used a 1000-Genomes-Project-based imputation in a genome-wide association scan for osteoarthritis (3177 OA cases and 4894 controls) to detect a previously unidentified risk locus. We discovered a small disease-associated set of variants on chromosome 13. Through large-scale replication, we establish a robust association with SNPs in MCF2L (rs11842874, combined odds ratio [95% confidence interval] 1.17 [1.11-1.23], p = 2.1 × 10⁻⁸) across a total of 19,041 OA cases and 24,504 controls of European descent. This risk locus represents the third established signal for OA overall. MCF2L regulates a nerve growth factor (NGF), and treatment with a humanized monoclonal antibody against NGF is associated with reduction in pain and improvement in function for knee OA patients.
Funded by: Medical Research Council: G0100594, G0901461, MC_U122886349
American journal of human genetics 2011;89;3;446-50
Contrasting signals of positive selection in genes involved in human skin-color variation from tests based on SNP scans and resequencing.
Department of Forensic Molecular Biology, Erasmus MC University Medical Center, PO Box 2040, Rotterdam, 3000 CA, The Netherlands. firstname.lastname@example.org.
Background: Numerous genome-wide scans conducted by genotyping previously ascertained single-nucleotide polymorphisms (SNPs) have provided candidate signatures for positive selection in various regions of the human genome, including in genes involved in pigmentation traits. However, it is unclear how well the signatures discovered by such haplotype-based test statistics can be reproduced in tests based on full resequencing data. Four genes (oculocutaneous albinism II (OCA2), tyrosinase-related protein 1 (TYRP1), dopachrome tautomerase (DCT), and KIT ligand (KITLG)) implicated in human skin-color variation, have shown evidence for positive selection in Europeans and East Asians in previous SNP-scan data. In the current study, we resequenced 4.7 to 6.7 kb of DNA from each of these genes in Africans, Europeans, East Asians, and South Asians.
Results: Applying all commonly used neutrality-test statistics for allele frequency distribution to the newly generated sequence data provided conflicting results regarding evidence for positive selection. Previous haplotype-based findings could not be clearly confirmed. Although some tests were marginally significant for some populations and genes, none of them were significant after multiple-testing correction. Combined P values for each gene-population pair did not improve these results. Application of an Approximate Bayesian Computation approach based on Markov chain Monte Carlo to these sequence data, using a simple forward simulator, revealed broad posterior distributions of the selective parameters for all four genes, providing no support for positive selection. However, when we applied this approach to published sequence data on SLC45A2, another human pigmentation candidate gene, we could readily confirm evidence for positive selection, as previously detected with sequence-based and some haplotype-based tests.
Conclusions: Overall, our data indicate that even genes that are strong biological candidates for positive selection and show reproducible signatures of positive selection in SNP scans do not always show the same replicability of selection signals in other tests, which should be considered in future studies on detecting positive selection in genetic data.
Investigative genetics 2011;2;1;24
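As a rough illustration of the two statistical steps mentioned in the abstract above (multiple-testing correction and combining P values per gene-population pair), here is a minimal Python sketch. The abstract does not say which correction or combination method was used; Bonferroni correction and Fisher's combined probability test are assumptions chosen for simplicity, and the function names and P values are hypothetical.

```python
import math


def bonferroni(p_values):
    """Bonferroni-adjust a list of P values (assumed correction method)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def fisher_combined_p(p_values):
    """Combine independent P values with Fisher's method (assumed method).

    X = -2 * sum(ln p) is chi-square distributed with 2k degrees of freedom
    under the null hypothesis. For even degrees of freedom the survival
    function has a closed form, so no external library is needed.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Hypothetical P values from several neutrality tests for one gene-population pair
tests = [0.04, 0.20, 0.35]
adjusted = bonferroni(tests)
combined = fisher_combined_p(tests)
```

Fisher's method assumes the combined tests are independent, which neutrality statistics computed on the same sequence data generally are not, so a combined value from this sketch would need cautious interpretation.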
Computational identification of insertional mutagenesis targets for cancer gene discovery.
Bioinformatics and Statistics, The Netherlands Cancer Institute, Plesmanlaan 121, 1066CX Amsterdam, The Netherlands.
Insertional mutagenesis is a potent forward genetic screening technique used to identify candidate cancer genes in mouse model systems. An important yet unresolved issue in the analysis of these screens is the identification of the genes affected by the insertions. To address this, we developed Kernel Convolved Rule Based Mapping (KC-RBM). KC-RBM exploits distance, orientation and insertion density across tumors to automatically map integration sites to target genes. We perform the first genome-wide evaluation of the association of insertion occurrences with aberrant gene expression of the predicted targets in both retroviral and transposon data sets. We demonstrate the efficiency of KC-RBM by showing its superior performance over existing approaches in recovering true positives from a list of independently, manually curated cancer genes. The results of this work will significantly enhance the accuracy and speed of cancer gene discovery in forward genetic screens. KC-RBM is available as an R package.
Funded by: Cancer Research UK; Wellcome Trust
Nucleic acids research 2011;39;15;e105
Genetic risk reclassification for type 2 diabetes by age below or above 50 years using 40 type 2 diabetes risk single nucleotide polymorphisms.
General Medicine Division, Massachusetts General Hospital, Boston, Massachusetts, USA.
Objective: To test if knowledge of type 2 diabetes genetic variants improves disease prediction.
Research design and methods: We tested 40 single nucleotide polymorphisms (SNPs) associated with diabetes in 3,471 Framingham Offspring Study subjects followed over 34 years using pooled logistic regression models stratified by age (<50 years, diabetes cases = 144; or ≥50 years, diabetes cases = 302). Models included clinical risk factors and a 40-SNP weighted genetic risk score.
Results: In people <50 years of age, the clinical risk factors model C-statistic was 0.908; the 40-SNP score increased it to 0.911 (P = 0.3; net reclassification improvement (NRI): 10.2%, P = 0.001). In people ≥50 years of age, the C-statistics without and with the score were 0.883 and 0.884 (P = 0.2; NRI: 0.4%). The risk per risk allele was higher in people <50 than ≥50 years of age (24 vs. 11%; P value for age interaction = 0.02).
Conclusions: Knowledge of common genetic variation appropriately reclassifies younger people for type 2 diabetes risk beyond clinical risk factors but not older people.
Funded by: Medical Research Council: MC_U106179474; NCRR NIH HHS: 1S10RR163736-01A1; NHLBI NIH HHS: N01-HC-25195; NIDDK NIH HHS: K23 DK65978, K24 DK080140, R01 DK078616, R21 DK084527, R21 DK084527-01
Diabetes care 2011;34;1;121-5
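The two quantities central to the abstract above, a weighted genetic risk score and the C-statistic, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the study's analysis: the per-SNP weights and example scores are hypothetical, and the pooled logistic regression models used in the paper are not reproduced. The C-statistic is computed here as the fraction of case/non-case pairs in which the case has the higher predicted risk, with ties counted as one half.

```python
def weighted_grs(risk_allele_counts, weights):
    """Weighted genetic risk score: sum of weight * risk-allele count (0, 1 or 2)."""
    if len(risk_allele_counts) != len(weights):
        raise ValueError("need one weight per SNP")
    return sum(w * c for w, c in zip(weights, risk_allele_counts))


def c_statistic(case_scores, control_scores):
    """Concordance (C-statistic) over all case/control pairs; ties count 0.5."""
    concordant = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                concordant += 1.0
            elif case == control:
                concordant += 0.5
    return concordant / (len(case_scores) * len(control_scores))


# Hypothetical example: three SNPs with made-up weights
score = weighted_grs([2, 1, 0], [0.5, 1.0, 2.0])
# Hypothetical predicted risks for two cases and two non-cases
auc = c_statistic([0.9, 0.8], [0.1, 0.8])
```

Because the C-statistic depends only on the ranking of predicted risks, a score that shifts many individuals slightly can improve reclassification measures while leaving the C-statistic almost unchanged, which is consistent with the pattern reported in the Results.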
Ethical issues in human genomics research in developing countries.
The Ethox Centre, Department of Public Health and Primary Care, University of Oxford, Old Road Campus, Headington, Oxford, OX3 7LF, UK. email@example.com
Background: Genome-wide association studies (GWAS) provide a powerful means of identifying genetic variants that play a role in common diseases. Such studies present important ethical challenges. An increasing number of GWAS are taking place in lower income countries and there is a pressing need to identify the particular ethical challenges arising in such contexts. In this paper, we draw upon the experiences of the MalariaGEN Consortium to identify specific ethical issues raised by such research in Africa, Asia and Oceania.
Discussion: We explore ethical issues in three key areas: protecting the interests of research participants, regulation of international collaborative genomics research and protecting the interests of scientists in low income countries. With regard to participants, important challenges are raised about community consultation and consent. Genomics research raises ethical and governance issues about sample export and ownership, about the use of archived samples and about the complexity of reviewing such large international projects. In the context of protecting the interests of researchers in low income countries, we discuss aspects of data sharing and capacity building that need to be considered for sustainable and mutually beneficial collaborations.
Summary: Many ethical issues are raised when genomics research is conducted on populations that are characterised by lower average income and literacy levels, such as the populations included in MalariaGEN. It is important that such issues are appropriately addressed in such research. Our experience suggests that the ethical issues in genomics research can best be identified, analysed and addressed where ethics is embedded in the design and implementation of such research projects.
Funded by: Medical Research Council: G0600230, G0600718, G19/9; Wellcome Trust: 077383/Z/05/Z, 087285/Z/08/Z, WT 083326/Z/07/Z
BMC medical ethics 2011;12;5
Cell type-specific DNA methylation at intragenic CpG islands in the immune system.
Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh, United Kingdom
Human and mouse genomes contain a similar number of CpG islands (CGIs), which are discrete CpG-rich DNA sequences associated with transcription start sites. In both species, ∼50% of all CGIs are remote from annotated promoters but, nevertheless, often have promoter-like features. To determine the role of CGI methylation in cell differentiation, we analyzed DNA methylation at a comprehensive CGI set in cells of the mouse hematopoietic lineage. Using a method that potentially detects ∼33% of genomic CpGs in the methylated state, we found that large differences in gene expression were accompanied by surprisingly few DNA methylation changes. There were, however, many DNA methylation differences between hematopoietic cells and a distantly related tissue, brain. Altered DNA methylation in the immune system occurred predominantly at CGIs within gene bodies, which have the properties of cell type-restricted promoters, but infrequently at annotated gene promoters or CGI flanking sequences (CGI "shores"). Unexpectedly, elevated intragenic CGI methylation correlated with silencing of the associated gene. Differentially methylated intragenic CGIs tended to lack H3K4me3 and associate with a transcriptionally repressive environment regardless of methylation state. Our results indicate that DNA methylation changes play a relatively minor role in the late stages of differentiation and suggest that intragenic CGIs represent regulatory sites of differential gene expression during the early stages of lineage specification.
Funded by: Medical Research Council; Wellcome Trust
Genome research 2011;21;7;1074-86
Does a short breastfeeding period protect from FTO-induced adiposity in children?
Department of Dietetics and Nutrition, Harokopio University, Athens, Greece. firstname.lastname@example.org
Context: A number of studies have reported replicable associations between common genetic loci and obesity indices. One of these loci is the fat mass and obesity associated locus (FTO). We aimed to assess whether breastfeeding mediated the known association between FTO and indices of body fatness.
Methods: This study includes three independent pediatric cohorts, two of Greek origin (the Gene-Diet Attica Investigation: GENDAI, n=1 138 and the "Growth, Exercise and Nutrition Epidemiological Study In preschoolers": the GENESIS study, n=2 374) and one British (the Avon Longitudinal Study of Parents and Children: ALSPAC, n=4 325). Among other information, breastfeeding history was recorded. A DNA sample was obtained from either blood or saliva. Genotyping for FTO variants was performed for rs9939609 in GENDAI and ALSPAC, and for rs17817449 in GENESIS.
Results: In all cohorts, multivariate analysis showed that the association between FTO:rs9939609 and measures of obesity was consistent across newly presented cohorts (GENDAI: Body mass index [BMI], β=0.43, p=0.009; Waist Circumference, β=1.067, p=0.019; triceps skinfold, β=0.972, p=0.003; subscapular skinfold, β=0.593, p=0.023; GENESIS: Waist Circumference, β=0.473, p=0.008 and subscapular skinfold, β=0.227, p=0.014). Inclusion of one month of breastfeeding as an interaction term effectively removed these associations with indices of obesity (BMI, Waist-Hip-Ratio and subscapular skinfold). No evidence of such interaction was observed for the independent cohort of British children.
Conclusions: Our findings indicate that in two moderately sized Greek samples, breastfeeding may exert a modifying effect on the relationship between variants at the FTO locus and indices of adiposity. These findings were not replicated in a larger British collection.
Funded by: Medical Research Council: G0600705, G9815508; NIDDK NIH HHS: K23 DK067288; Wellcome Trust
International journal of pediatric obesity : IJPO : an official journal of the International Association for the Study of Obesity 2011;6;2-2;e326-35
Specific capture and whole-genome sequencing of viruses from clinical samples.
Division of Infection and Immunity, University College London, London, United Kingdom. email@example.com
Whole genome sequencing of viruses directly from clinical samples is integral to understanding the genetics of host-virus interactions. Here, we report the use of sample-sparing target enrichment (by hybridisation) for viral nucleic acid separation and deep-sequencing of herpesvirus genomes directly from a range of clinical samples including saliva, blood, virus vesicles, cerebrospinal fluid, and tumour cell lines. We demonstrate the effectiveness of the method by deep-sequencing 13 highly cell-associated human herpesvirus genomes and generating full length genome alignments at high read depth. Moreover, we show the specificity of the method enables the study of viral population structures and their diversity within a range of clinical sample types.
Funded by: Department of Health; Medical Research Council: G07008, G0700814, G0900950; Wellcome Trust: 081703MA
PloS one 2011;6;11;e27805
Live vaccines and their role in modern vaccinology
Replicating Vaccines. 2011;Part 1;3-14
Dalliance: interactive genome viewing on the web.
Wellcome Trust/CRUK Gurdon Institute, Cambridge CB2 1QN, UK. firstname.lastname@example.org
Summary: Dalliance is a new genome viewer which offers a high level of interactivity while running within a web browser. All data is fetched using the established distributed annotation system (DAS) protocol, making it easy to customize the browser and add extra data.
Funded by: Wellcome Trust: 077198, 083563
Bioinformatics (Oxford, England) 2011;27;6;889-90
Whole genome sequencing of multiple Leishmania donovani clinical isolates provides insights into population structure and mechanisms of drug resistance.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, United Kingdom.
Visceral leishmaniasis is a potentially fatal disease endemic to large parts of Asia and Africa, primarily caused by the protozoan parasite Leishmania donovani. Here, we report a high-quality reference genome sequence for a strain of L. donovani from Nepal, and use this sequence to study variation in a set of 16 related clinical lines, isolated from visceral leishmaniasis patients from the same region, which also differ in their in vitro drug susceptibility. We show that whole-genome sequence data reveals genetic structure within these lines not shown by multilocus typing, and suggests that drug resistance has emerged multiple times in this closely related set of lines. Sequence comparisons with other Leishmania species and analysis of single-nucleotide diversity within our sample showed evidence of selection acting in a range of surface- and transport-related genes, including genes associated with drug resistance. Against a background of relative genetic homogeneity, we found extensive variation in chromosome copy number between our lines. Other forms of structural variation were significantly associated with drug resistance, notably including gene dosage and the copy number of an experimentally verified circular episome present in all lines and described here for the first time. This study provides a basis for more powerful molecular profiling of visceral leishmaniasis, providing additional power to track the drug resistance and epidemiology of an important human pathogen.
Funded by: Wellcome Trust: 076355, 085775/Z/08/Z
Genome research 2011;21;12;2143-56
Developing and implementing an institute-wide data sharing policy.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. email@example.com.
The Wellcome Trust Sanger Institute has a strong reputation for prepublication data sharing as a result of its policy of rapid release of genome sequence data and particularly through its contribution to the Human Genome Project. The practicalities of broad data sharing remain largely uncharted, especially to cover the wide range of data types currently produced by genomic studies and to adequately address ethical issues. This paper describes the processes and challenges involved in implementing a data sharing policy on an institute-wide scale. This includes questions of governance, practical aspects of applying principles to diverse experimental contexts, building enabling systems and infrastructure, incentives and collaborative issues.
Genome medicine 2011;3;9;60
A user's guide to the encyclopedia of DNA elements (ENCODE).
HudsonAlpha Institute for Biotechnology, Huntsville, Alabama, United States of America. firstname.lastname@example.org
The mission of the Encyclopedia of DNA Elements (ENCODE) Project is to enable the scientific and medical communities to interpret the human genome sequence and apply it to understand human biology and improve health. The ENCODE Consortium is integrating multiple technologies and approaches in a collective effort to discover and define the functional elements encoded in the human genome, including genes, transcripts, and transcriptional regulatory regions, together with their attendant chromatin states and DNA methylation patterns. In the process, standards to ensure high-quality data have been implemented, and novel algorithms have been developed to facilitate analysis. Data and derived results are made available through a freely accessible database. Here we provide an overview of the project and the resources it is generating and illustrate the application of ENCODE data to interpret the human genome.
Funded by: NHGRI NIH HHS: R01 HG003143, R01 HG004037, RC2 HG005573; NIDDK NIH HHS: R01 DK054369, R01 DK065806; Wellcome Trust: 095908
PLoS biology 2011;9;4;e1001046
Meta-analysis of genome-wide association studies confirms a susceptibility locus for knee osteoarthritis on chromosome 7q22.
Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina, Greece.
Objectives: Osteoarthritis (OA) is the most prevalent form of arthritis and accounts for substantial morbidity and disability, particularly in older people. It is characterised by changes in joint structure, including degeneration of the articular cartilage, and its aetiology is multifactorial with a strong postulated genetic component.
Methods: A meta-analysis was performed of four genome-wide association (GWA) studies of 2371 cases of knee OA and 35 909 controls in Caucasian populations. Replication of the top hits was attempted with data from 10 additional replication datasets.
Results: With a cumulative sample size of 6709 cases and 44 439 controls, one genome-wide significant locus was identified on chromosome 7q22 for knee OA (rs4730250, p=9.2 × 10⁻⁹), thereby confirming its role as a susceptibility locus for OA.
Conclusion: The associated signal is located within a large (500 kb) linkage disequilibrium block that contains six genes: PRKAR2B (protein kinase, cAMP-dependent, regulatory, type II, β), HBP1 (HMG-box transcription factor 1), COG5 (component of oligomeric golgi complex 5), GPR22 (G protein-coupled receptor 22), DUS4L (dihydrouridine synthase 4-like) and BCAP29 (B cell receptor-associated protein 29). Gene expression analyses of the six genes in primary cells derived from different joint tissues confirmed expression of all the genes in the joint environment.
Funded by: Arthritis Research UK: 17489, 18030; Medical Research Council: G0000934, G0100594, G0901461, MC_U122886349; Wellcome Trust: 068545, 083948, 088785, WT079557MA, WT088885/Z/09/Z
Annals of the rheumatic diseases 2011;70;2;349-55
Differential protein expression throughout the life cycle of Trypanosoma congolense, a major parasite of cattle in Africa.
Department of Biochemistry and Microbiology, University of Victoria, Victoria, British Columbia, Canada.
Trypanosoma congolense is an important pathogen of livestock in Africa. To study protein expression throughout the T. congolense life cycle, we used culture-derived parasites of each of the three main insect stages and bloodstream stage parasites isolated from infected mice, to perform differential protein expression analysis. Three complete biological replicates of all four life cycle stages were produced from T. congolense IL3000, a cloned parasite that is amenable to culture of major life cycle stages in vitro. Cellular proteins from each life cycle stage were trypsin digested and the resulting peptides were labeled with isobaric tags for relative and absolute quantification (iTRAQ). The peptides were then analyzed by tandem mass spectrometry (MS/MS). This method was used to identify and relatively quantify proteins from the different life cycle stages in the same experiment. A search of the Wellcome Trust Sanger Institute's semi-annotated T. congolense database was performed using the MS/MS fragmentation data to identify the corresponding source proteins. A total of 2088 unique protein sequences were identified, representing 23% of the ∼9000 proteins predicted for the T. congolense proteome. The 1291 most confidently identified proteins were prioritized for further study. Of these, 784 yielded annotated hits while 501 were described as "hypothetical proteins". Six proteins showed no significant sequence similarity to any known proteins (from any species) and thus represent new, previously uncharacterized T. congolense proteins. Of particular interest among the remainder are several membrane molecules that showed drastic differential expression, including, not surprisingly, the well-studied variant surface glycoproteins (VSGs), invariant surface glycoproteins (ISGs) 65 and 75, congolense epimastigote specific protein (CESP), the surface protease GP63, an amino acid transporter, a pteridine transporter and a haptoglobin-hemoglobin receptor. Several of these surface-disposed proteins are of functional interest as they are necessary for survival of the parasites.
Funded by: Wellcome Trust: WT 085775/Z/08/Z
Molecular and biochemical parasitology 2011;177;2;116-25
Examining the overlap between genome-wide rare variant association signals and linkage peaks in rheumatoid arthritis.
University of Manchester, Manchester, UK.
Objective: With the exception of the major histocompatibility complex (MHC) and STAT4, no other rheumatoid arthritis (RA) linkage peak has been successfully fine-mapped to date. This apparent failure to identify association under peaks of linkage could be ascribed to the examination of common variation, when linkage is likely to be driven by rare variants. The purpose of this study was to investigate the overlap between genome-wide rare variant RA association signals observed in the Wellcome Trust Case Control Consortium (WTCCC) study and 11 replicating RA linkage peaks, defined as regions with evidence for linkage in >1 study.
Methods: The WTCCC data set contained 40,482 variants with minor allele frequency of ≤0.05 in 1,860 RA patients and 2,938 controls. Genotypes of all rare variants within a given gene region were collapsed into a single locus and a global P value was calculated per gene.
Results: The distribution of rare variant signals (association P≤10⁻⁵) was found to differ significantly between regions with and without linkage evidence (P=2×10⁻¹⁷ by Fisher's exact test). No significant difference was observed after data from the MHC region were removed or when the effect of the HLA-DRB1 locus was accounted for.
Conclusion: The results suggest that rare variant association signals are significantly overrepresented under linkage peaks in RA, but the effect is driven by the MHC. This is the first study to examine the overlap between linkage peaks and rare variant association signals genome-wide in a complex disease.
Funded by: Arthritis Research UK: 17552, 18030; Wellcome Trust: 076113, 079557MA, 088885, WT088885/Z/09/Z
Arthritis and rheumatism 2011;63;6;1522-6
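The collapsing step described in the Methods above, and the 2x2 comparison by Fisher's exact test reported in the Results, can be sketched as follows. This is a minimal illustration under assumptions: the abstract does not specify how genotypes were collapsed or how the per-gene global P value was computed, so a simple carrier indicator (an individual carries at least one rare allele in the gene region) is assumed, and all genotype data below are made up.

```python
from math import comb


def collapse_carriers(genotype_rows):
    """Collapse each individual's rare-variant genotypes in one gene region
    into a single 0/1 carrier status (assumed collapsing rule)."""
    return [int(any(count > 0 for count in row)) for row in genotype_rows]


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


# Hypothetical minor-allele counts at two rare variants in one gene region
cases = [[0, 1], [1, 0], [0, 2]]
controls = [[0, 0], [0, 0], [0, 0]]
case_carriers = sum(collapse_carriers(cases))
control_carriers = sum(collapse_carriers(controls))
p_value = fisher_exact_two_sided(case_carriers, len(cases) - case_carriers,
                                 control_carriers, len(controls) - control_carriers)
```

The same `fisher_exact_two_sided` helper applies to any 2x2 table, for example counts of genes with and without a rare variant signal inside and outside linkage regions.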
Troponin T is essential for sarcomere assembly in zebrafish skeletal muscle.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK.
In striated muscle, the basic contractile unit is the sarcomere, which comprises myosin-rich thick filaments intercalated with thin filaments made of actin, tropomyosin and troponin. Troponin is required to regulate Ca²⁺-dependent contraction, and mutant forms of troponins are associated with muscle diseases. We have disrupted several genes simultaneously in zebrafish embryos and have followed the progression of muscle degeneration in the absence of troponin. Complete loss of troponin T activity leads to loss of sarcomere structure, in part owing to the destructive nature of deregulated actin-myosin activity. When troponin T and myosin activity are simultaneously disrupted, immature sarcomeres are rescued. However, tropomyosin fails to localise to sarcomeres, and intercalating thin filaments are missing from electron microscopic cross-sections, indicating that loss of troponin T affects thin filament composition. If troponin activity is only partially disrupted, myofibrils are formed but eventually disintegrate owing to deregulated actin-myosin activity. We conclude that the troponin complex has at least two distinct activities: regulation of actin-myosin activity and, independently, a role in the proper assembly of thin filaments. Our results also indicate that sarcomere assembly can occur in the absence of normal thin filaments.
Funded by: Wellcome Trust: WT 077037/Z/05/Z, WT 077047/Z/05/Z
Journal of cell science 2011;124;Pt 4;565-77
The Genomic Standards Consortium.
Centre for Ecology & Hydrology, Maclean Building, Crowmarsh Gifford, Wallingford, Oxfordshire, United Kingdom. email@example.com
A vast and rich body of information has grown up as a result of the world's enthusiasm for 'omics technologies. Finding ways to describe and make available this information that maximise its usefulness has become a major effort across the 'omics world. At the heart of this effort is the Genomic Standards Consortium (GSC), an open-membership organization that drives community-based standardization activities. Here we provide a short history of the GSC, provide an overview of its range of current activities, and make a call for the scientific community to join forces to improve the quality and quantity of contextual information about our public collections of genomes, metagenomes, and marker gene sequences.
PLoS biology 2011;9;6;e1001088
A call for papers for the second special issue of SIGS from the genomic standards consortium
Standards in Genomic Sciences 2011;4;111-2
The Deciphering Developmental Disorders (DDD) study.
Department of Medical Genetics, Cambridge University Hospitals Foundation Trust, Cambridge, UK.
Funded by: Wellcome Trust
Developmental medicine and child neurology 2011;53;8;702-3
Germline fitness-based scoring of cancer mutations.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.
A key goal in cancer research is to find the genomic alterations that underlie malignant cells. Genomics has proved successful in identifying somatic variants at a large scale. However, it has become evident that a typical cancer exhibits a heterogeneous mutation pattern across samples. Cases where the same alteration is observed repeatedly seem to be the exception rather than the norm. Thus, pinpointing the key alterations (driver mutations) from a background of variations with no direct causal link to cancer (passenger mutations) is difficult. Here we analyze somatic missense mutations from cancer samples and their healthy tissue counterparts (germline mutations) from the viewpoint of germline fitness. We calibrate a scoring system from protein domain alignments to score mutations and their target loci. We show first that this score predicts to a good degree the rate of polymorphism of the observed germline variation. The scoring is then applied to somatic mutations. We show that candidate cancer genes prone to copy number loss harbor mutations with germline fitness effects that are significantly more deleterious than expected by chance. This suggests that missense mutations play a driving role in tumor suppressor genes. Furthermore, these mutations fall preferentially onto loci in sequence neighborhoods that are high scoring in terms of germline fitness. In contrast, for somatic mutations in candidate oncogenes we do not observe a statistically significant effect. These results help to inform how to exploit germline fitness predictions in discovering new genes and mutations responsible for cancer.
Funded by: Wellcome Trust: 091747
aCGH.Spline--an R package for aCGH dye bias normalization.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK. firstname.lastname@example.org
Motivation: The careful normalization of array-based comparative genomic hybridization (aCGH) data is of critical importance for the accurate detection of copy number changes. The difference in labelling affinity between the two fluorophores used in aCGH (usually Cy5 and Cy3) can be observed as a bias within the intensity distributions. If left unchecked, this bias is likely to skew data interpretation during downstream analysis and lead to an increased number of false discoveries.
Results: In this study, we have developed aCGH.Spline, a natural cubic spline interpolation method followed by linear interpolation of outlier values, which is able to remove a large portion of the dye bias from large aCGH datasets in a quick and efficient manner.
Conclusions: We have shown that removing this bias and reducing the experimental noise has a strong positive impact on the ability to detect accurately both copy number variation (CNV) and copy number alterations (CNA).
Funded by: Wellcome Trust: WT077008
Bioinformatics (Oxford, England) 2011;27;9;1195-200
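To make the term "natural cubic spline interpolation" in the abstract above concrete, here is a small pure-Python sketch. It is not the aCGH.Spline package (which is written in R) and does not reproduce its algorithm: the abstract does not say how knots are chosen or how outliers are handled, so the matched-intensity knots below are hypothetical and the linear interpolation of outliers is omitted. The sketch only shows how a natural cubic spline through a set of knots can map intensities from one channel onto the scale of the other.

```python
import bisect


def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys).

    xs must be strictly increasing with at least two points. The second
    derivative is zero at both end knots (the "natural" boundary condition).
    """
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)  # second derivatives at the knots
    if n >= 2:
        # Tridiagonal system for the interior knots (Thomas algorithm)
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n)]
        for j in range(1, n - 1):
            factor = h[j] / diag[j - 1]
            diag[j] -= factor * h[j]
            rhs[j] -= factor * rhs[j - 1]
        m[n - 1] = rhs[-1] / diag[-1]
        for j in range(n - 3, -1, -1):
            m[j + 1] = (rhs[j] - h[j + 1] * m[j + 2]) / diag[j]

    def evaluate(x):
        # Pick the interval containing x; values outside the knot range
        # are extrapolated with the nearest end segment.
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 1)
        a, b = xs[i + 1] - x, x - xs[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)

    return evaluate


# Hypothetical matched log2 intensities of the two dyes at a few knots
cy3_knots = [6.0, 8.0, 10.0, 12.0]
cy5_knots = [6.4, 8.1, 10.0, 11.7]
to_cy5_scale = natural_cubic_spline(cy3_knots, cy5_knots)
corrected = [to_cy5_scale(v) for v in [7.0, 9.5, 11.0]]
```

After such a mapping, log ratios between the channels would be computed on a common scale, which is the general idea behind removing an intensity-dependent dye bias.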
European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK. email@example.com
The Ensembl project (http://www.ensembl.org) seeks to enable genomic science by providing high quality, integrated annotation on chordate and selected eukaryotic genomes within a consistent and accessible infrastructure. All supported species include comprehensive, evidence-based gene annotations and a selected set of genomes includes additional data focused on variation, comparative, evolutionary, functional and regulatory annotation. The most advanced resources are provided for key species including human, mouse, rat and zebrafish reflecting the popularity and importance of these species in biomedical research. As of Ensembl release 59 (August 2010), 56 species are supported of which 5 have been added in the past year. Since our previous report, we have substantially improved the presentation and integration of both data of disease relevance and the regulatory state of different cell types.
Funded by: Biotechnology and Biological Sciences Research Council: BBS/B/13438; Wellcome Trust: 062023, 077198
Nucleic acids research 2011;39;Database issue;D800-6
Salmonella bongori provides insights into the evolution of the Salmonellae.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.
The genus Salmonella contains two species, S. bongori and S. enterica. Compared to the well-studied S. enterica there is a marked lack of information regarding the genetic makeup and diversity of S. bongori. S. bongori has been found predominantly associated with cold-blooded animals, but it can infect humans. To define the phylogeny of this species, and compare it to S. enterica, we have sequenced 28 isolates representing most of the known diversity of S. bongori. This cross-species analysis allowed us to confidently differentiate ancestral functions from those acquired following speciation, which include both metabolic and virulence-associated capacities. We show that, although S. bongori inherited a basic set of Salmonella common virulence functions, it has subsequently elaborated on this in a different direction to S. enterica. It is an established feature of S. enterica evolution that the acquisition of the type III secretion systems (T3SS-1 and T3SS-2) has been followed by the sequential acquisition of genes encoding secreted targets, termed effector proteins. We show that this is also true of S. bongori, which has acquired an array of novel effector proteins (sboA-L). All but two of these effectors have no significant S. enterica homologues and instead are highly similar to those found in enteropathogenic Escherichia coli (EPEC). Remarkably, SboH is found to be a chimeric effector protein, encoded by a fusion of the T3SS-1 effector gene sopA and a gene highly similar to the EPEC effector gene nleH. We demonstrate that representatives of these new effectors are translocated and that SboH, similarly to NleH, blocks intrinsic apoptotic pathways while being targeted to the mitochondria by the SopA part of the fusion. This work suggests that S. bongori has inherited the ancestral Salmonella virulence gene set, but has adapted by incorporating virulence determinants that resemble those employed by EPEC.
Funded by: Medical Research Council; Wellcome Trust: 076964
PLoS pathogens 2011;7;8;e1002191
Assessment of a 44 gene classifier for the evaluation of chronic fatigue syndrome from peripheral blood mononuclear cell gene expression.
Department of Infection, Division of Infection and Immunity, University College London, London, United Kingdom.
Chronic fatigue syndrome (CFS) is a clinically defined illness estimated to affect millions of people worldwide causing significant morbidity and an annual cost of billions of dollars. Currently there are no laboratory-based diagnostic methods for CFS. However, differences in gene expression profiles between CFS patients and healthy persons have been reported in the literature. Using mRNA relative quantities for 44 previously identified reporter genes taken from a large dataset comprising both CFS patients and healthy volunteers, we derived a gene profile scoring metric to accurately classify CFS and healthy samples. This metric out-performed any of the reporter genes used individually as a classifier of CFS. To determine whether the reporter genes were robust across populations, we applied this metric to classify a separate blind dataset of mRNA relative quantities from a new population of CFS patients and healthy persons with limited success. Although the metric correctly classified roughly two-thirds of both CFS and healthy samples, the level of misclassification was high. We conclude that many of the previously identified reporter genes are study-specific and thus cannot be used as a broad CFS diagnostic.
PloS one 2011;6;3;e16872
Clustered coding variants in the glutamate receptor complexes of individuals with schizophrenia and bipolar disorder.
Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire, United Kingdom.
Current models of schizophrenia and bipolar disorder implicate multiple genes; however, their biological relationships remain elusive. To test the genetic role of glutamate receptors and their interacting scaffold proteins, the exons of ten glutamatergic 'hub' genes in 1304 individuals were re-sequenced in case and control samples. No significant difference in the overall number of non-synonymous single nucleotide polymorphisms (nsSNPs) was observed between cases and controls. However, cluster analysis of nsSNPs identified two exons encoding the cysteine-rich domain and first transmembrane helix of GRM1 as a risk locus with five mutations highly enriched within these domains. A new splice variant lacking the transmembrane GPCR domain of GRM1 was discovered in the human brain and the GRM1 mutation cluster could perturb the regulation of this variant. The predicted effect on individuals harbouring multiple mutations distributed in their ten hub genes was also examined. Diseased individuals possessed an increased load of deleteriousness from multiple concurrent rare and common coding variants. Together, these data suggest a disease model in which the interplay of compound genetic coding variants, distributed among glutamate receptors and their interacting proteins, contributes to the pathogenesis of schizophrenia and bipolar disorders.
Funded by: Chief Scientist Office: CZB/4/505, ETM/55; Medical Research Council: G0700704, MC_U127592696; Wellcome Trust
PloS one 2011;6;4;e19011
Perilipin deficiency and autosomal dominant partial lipodystrophy.
University of Cambridge Metabolic Research Laboratories, Institute of Metabolic Science, Addenbrooke's Hospital, Cambridge, United Kingdom.
Perilipin is the most abundant adipocyte-specific protein that coats lipid droplets, and it is required for optimal lipid incorporation and release from the droplet. We identified two heterozygous frameshift mutations in the perilipin gene (PLIN1) in three families with partial lipodystrophy, severe dyslipidemia, and insulin-resistant diabetes. Subcutaneous fat from the patients was characterized by smaller-than-normal adipocytes, macrophage infiltration, and fibrosis. In contrast to wild-type perilipin, mutant forms of the protein failed to increase triglyceride accumulation when expressed heterologously in preadipocytes. These findings define a novel dominant form of inherited lipodystrophy and highlight the serious metabolic consequences of a primary defect in the formation of lipid droplets in adipose tissue.
Funded by: Medical Research Council; Wellcome Trust: 077016, 077016/Z/05/Z, 091551, 095515
The New England journal of medicine 2011;364;8;740-8
Meticillin-resistant Staphylococcus aureus with a novel mecA homologue in human and bovine populations in the UK and Denmark: a descriptive study.
Department of Veterinary Medicine, University of Cambridge, UK.
Background: Animals can act as a reservoir and source for the emergence of novel meticillin-resistant Staphylococcus aureus (MRSA) clones in human beings. Here, we report the discovery of a strain of S aureus (LGA251), isolated from bulk milk, that was phenotypically resistant to meticillin but tested negative for the mecA gene, together with a preliminary investigation of the extent to which such strains are present in bovine and human populations.
Methods: Isolates of bovine MRSA were obtained from the Veterinary Laboratories Agency in the UK, and isolates of human MRSA were obtained from diagnostic or reference laboratories (two in the UK and one in Denmark). From these collections, we searched for mecA PCR-negative bovine and human S aureus isolates showing phenotypic meticillin resistance. We used whole-genome sequencing to establish the genetic basis for the observed antibiotic resistance.
Findings: A divergent mecA homologue (mecA(LGA251)) was discovered in the LGA251 genome located in a novel staphylococcal cassette chromosome mec element, designated type-XI SCCmec. The mecA(LGA251) was 70% identical to S aureus mecA homologues and was initially detected in 15 S aureus isolates from dairy cattle in England. These isolates were from three different multilocus sequence type lineages (CC130, CC705, and ST425); spa type t843 (associated with CC130) was identified in 60% of bovine isolates. When human mecA-negative MRSA isolates were tested, the mecA(LGA251) homologue was identified in 12 of 16 isolates from Scotland, 15 of 26 from England, and 24 of 32 from Denmark. As in cows, t843 was the most common spa type detected in human beings.
Interpretation: Although routine culture and antimicrobial susceptibility testing will identify S aureus isolates with this novel mecA homologue as meticillin resistant, present confirmatory methods will not identify them as MRSA. New diagnostic guidelines for the detection of MRSA should consider the inclusion of tests for mecA(LGA251).
Funding: Department for Environment, Food and Rural Affairs, Higher Education Funding Council for England, Isaac Newton Trust (University of Cambridge), and the Wellcome Trust.
Funded by: Wellcome Trust
The Lancet. Infectious diseases 2011;11;8;595-603
RNIE: genome-wide prediction of bacterial intrinsic terminators.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK. firstname.lastname@example.org
Bacterial Rho-independent terminators (RITs) are important genomic landmarks involved in gene regulation and terminating gene expression. In this investigation we present RNIE, a probabilistic approach for predicting RITs. The method is based upon covariance models which have been known for many years to be the most accurate computational tools for predicting homology in structural non-coding RNAs. We show that RNIE has superior performance in model species from a spectrum of bacterial phyla. Further analysis of species where a low number of RITs were predicted revealed a highly conserved structural sequence motif enriched near the genic termini of the pathogenic actinobacterium Mycobacterium tuberculosis. This motif, together with classical RITs, accounts for up to 90% of all the significantly structured regions from the termini of M. tuberculosis genic elements. The software, predictions and alignments described below are available from http://github.com/ppgardne/RNIE.
Funded by: Howard Hughes Medical Institute
Nucleic acids research 2011;39;14;5845-52
Analysis of XMRV integration sites from human prostate cancer tissues suggests PCR contamination rather than genuine human infection.
MRC Centre for Medical Molecular Virology, Division of Infection and Immunity, University College London, 46 Cleveland St, London W1T 4JF, UK.
XMRV is a gammaretrovirus associated in some studies with human prostate cancer and chronic fatigue syndrome. Central to the hypothesis of XMRV as a human pathogen is the description of integration sites in DNA from prostate tumour tissues. Here we demonstrate that 2 of 14 patient-derived sites are identical to sites cloned in the same laboratory from experimentally infected DU145 cells. Identical integration sites have never previously been described in any retrovirus infection. We propose that the patient-derived sites are the result of PCR contamination. This observation further undermines the notion that XMRV is a genuine human pathogen.
Funded by: Medical Research Council: G0801172, G9721629; Wellcome Trust: 090940, WT076608, WT090940
New gene functions in megakaryopoiesis and platelet formation.
Institute of Genetic Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Ingolstädter Landstr 1, 85764 Neuherberg, Germany. email@example.com
Platelets are the second most abundant cell type in blood and are essential for maintaining haemostasis. Their count and volume are tightly controlled within narrow physiological ranges, but there is only limited understanding of the molecular processes controlling both traits. Here we carried out a high-powered meta-analysis of genome-wide association studies (GWAS) in up to 66,867 individuals of European ancestry, followed by extensive biological and functional assessment. We identified 68 genomic loci reliably associated with platelet count and volume mapping to established and putative novel regulators of megakaryopoiesis and platelet formation. These genes show megakaryocyte-specific gene expression patterns and extensive network connectivity. Using gene silencing in Danio rerio and Drosophila melanogaster, we identified 11 of the genes as novel regulators of blood cell formation. Taken together, our findings advance understanding of novel gene functions controlling fate-determining events during megakaryopoiesis and platelet formation, providing a new example of successful translation of GWAS to function.
Funded by: Biotechnology and Biological Sciences Research Council: BB/F019394/1; British Heart Foundation: RG/09/012/28096; Chief Scientist Office: CZB/4/505, ETM/55; Medical Research Council: G0000111, G0601966, G0700704, G0700931, G0701120, G0701863, G0801056, G1000143, MC_PC_15018, MC_U105260799, MC_U106179471, MC_U106188470; NCRR NIH HHS: K12 RR023250, K12 RR023250-05, M01 RR016500, M01 RR016500-08, U54 RR020278, U54 RR020278-06, UL1 RR025005, UL1 RR025005-05; NHGRI NIH HHS: P41 HG003751; NHLBI NIH HHS: N01 HC055015, N01 HC055016, N01 HC055018, N01 HC055019, N01 HC055020, N01 HC055021, N01 HC055022, N01 HC085079, P01 HL076491, P01 HL076491-09, P01 HL098055, P01 HL098055-03, R01 HL059367, R01 HL059367-11, R01 HL068986, R01 HL068986-06, R01 HL073410, R01 HL073410-08, R01 HL085251, R01 HL085251-04, R01 HL086694, R01 HL086694-05, R01 HL087641, R01 HL087641-03, R01 HL087679-03, R01 HL088119, R01 HL088119-04, R01 HL103866, R01 HL103866-03, R01 HL105756, U01 HL072515, U01 HL072515-06, U01 HL084756, U01 HL084756-03; NIA NIH HHS: R01 AG018728, R01 AG018728-05S1; NICHD NIH HHS: R01 HD042157, R01 HD042157-01A1; NIDDK NIH HHS: P30 DK072488, P30 DK072488-08; NIGMS NIH HHS: R01 GM053275, R01 GM053275-14, U01 GM074518, U01 GM074518-04; NIMH NIH HHS: RL1 MH083268, RL1 MH083268-05; Wellcome Trust: 092731, 098051, WT077037/Z/05/Z, WT077047/Z/05/Z, WT082597/Z/07/Z
Common variants near ATM are associated with glycemic response to metformin in type 2 diabetes.
Biomedical Research Institute, University of Dundee, Dundee, UK.
Metformin is the most commonly used pharmacological therapy for type 2 diabetes. We report a genome-wide association study for glycemic response to metformin in 1,024 Scottish individuals with type 2 diabetes with replication in two cohorts including 1,783 Scottish individuals and 1,113 individuals from the UK Prospective Diabetes Study. In a combined meta-analysis, we identified a SNP, rs11212617, associated with treatment success (n = 3,920, P = 2.9 × 10(-9), odds ratio = 1.35, 95% CI 1.22-1.49) at a locus containing ATM, the ataxia telangiectasia mutated gene. In a rat hepatoma cell line, inhibition of ATM with KU-55933 attenuated the phosphorylation and activation of AMP-activated protein kinase in response to metformin. We conclude that ATM, a gene known to be involved in DNA repair and cell cycle control, plays a role in the effect of metformin upstream of AMP-activated protein kinase, and variation in this gene alters glycemic response to metformin.
Funded by: Chief Scientist Office; Department of Health: PDA/02/06/016; Medical Research Council: G0601261, G0901310, G19/2; Wellcome Trust: 084726, 084726/Z/08/Z, 085475/B/08/Z, 085475/Z/08/Z
Nature genetics 2011;43;2;117-20
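As a rough consistency check on the metformin association above, the standard error of the log odds ratio can be backed out of the reported 95% CI and turned into an approximate Wald P value. This is a minimal sketch assuming a normal approximation on the log scale; the function name is illustrative and does not come from the paper, and the calculation is not the authors' meta-analysis.

```python
import math

def wald_p_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Approximate two-sided P value from an odds ratio and its 95% CI.

    Assumes the interval is symmetric on the log scale (normal approximation).
    """
    # Half-width of the CI on the log scale divided by the critical value
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided tail probability of a standard normal
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p

# Values reported in the abstract: OR = 1.35, 95% CI 1.22-1.49
se, z, p = wald_p_from_ci(1.35, 1.22, 1.49)
print(f"SE(log OR) = {se:.3f}, z = {z:.2f}, P ~ {p:.1e}")
```

With the rounded values from the abstract, this should give a P value of the same order of magnitude as the published P = 2.9 × 10(-9); exact agreement is not expected because the interval is rounded to two decimals.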
Transition of Plasmodium sporozoites into liver stage-like forms is regulated by the RNA binding protein Pumilio.
Malaria Unit, Instituto de Medicina Molecular, Lisboa, Portugal.
Many eukaryotic developmental and cell fate decisions that are effected post-transcriptionally involve RNA binding proteins as regulators of translation of key mRNAs. In malaria parasites (Plasmodium spp.), the development of round, non-motile and replicating exo-erythrocytic liver stage forms from slender, motile and cell-cycle arrested sporozoites is believed to depend on environmental changes experienced during the transmission of the parasite from the mosquito vector to the vertebrate host. Here we identify a Plasmodium member of the RNA binding protein family PUF as a key regulator of this transformation. In the absence of Pumilio-2 (Puf2) sporozoites initiate EEF development inside mosquito salivary glands independently of the normal transmission-associated environmental cues. Puf2- sporozoites exhibit genome-wide transcriptional changes that result in loss of gliding motility, cell traversal ability and reduction in infectivity, and, moreover, trigger metamorphosis typical of early Plasmodium intra-hepatic development. These data demonstrate that Puf2 is a key player in regulating sporozoite developmental control, and imply that transformation of salivary gland-resident sporozoites into liver stage-like parasites is regulated by a post-transcriptional mechanism.
Funded by: Wellcome Trust: 083811
PLoS pathogens 2011;7;5;e1002046
No evidence of XMRV or related retroviruses in a London HIV-1-positive patient cohort.
Department of Infection and Immunity, University College London, London, United Kingdom. firstname.lastname@example.org
Background: Several studies have implicated a recently discovered gammaretrovirus, XMRV (Xenotropic murine leukaemia virus-related virus), in chronic fatigue syndrome and prostate cancer, though whether as causative agent or opportunistic infection is unclear. It has also been suggested that the virus can be found circulating amongst the general population. The discovery has been controversial, with conflicting results from attempts to reproduce the original studies.
Methodology/principal findings: We extracted peripheral blood DNA from a cohort of 540 HIV-1-positive patients (approximately 20% of whom have never been on anti-retroviral treatment) and determined the presence of XMRV and related viruses using TaqMan PCR. While we were able to amplify as few as 5 copies of positive control DNA, we did not find any positive samples in the patient cohort.
Conclusions/significance: In view of these negative findings in this highly susceptible group, we conclude that it is unlikely that XMRV or related viruses are circulating at a significant level, if at all, in HIV-1-positive patients in London or in the general population.
Funded by: Department of Health; Medical Research Council: G0801172, G9721629; Wellcome Trust: 090940, WT090940
PloS one 2011;6;3;e18096
Influences of history, geography, and religion on genetic structure: the Maronites in Lebanon.
The Lebanese American University, Chouran, Beirut, Lebanon.
Cultural expansions, including of religions, frequently leave genetic traces of differentiation and in-migration. These expansions may be driven by complex doctrinal differentiation, together with major population migrations and gene flow. The aim of this study was to explore the genetic signature of the establishment of religious communities in a region where some of the most influential religions originated, using the Y chromosome as an informative male-lineage marker. A total of 3139 samples were analyzed, including 647 Lebanese and Iranian samples newly genotyped for 28 binary markers and 19 short tandem repeats on the non-recombinant segment of the Y chromosome. Genetic organization was identified by geography and religion across Lebanon in the context of surrounding populations important in the expansions of the major sects of Lebanon, including Italy, Turkey, the Balkans, Syria, and Iran by employing principal component analysis, multidimensional scaling, and AMOVA. Timing of population differentiations was estimated using BATWING, in comparison with dates of historical religious events to determine if these differentiations could be caused by religious conversion, or rather, whether religious conversion was facilitated within already differentiated populations. Our analysis shows that the great religions in Lebanon were adopted within already distinguishable communities. Once religious affiliations were established, subsequent genetic signatures of the older differentiations were reinforced. Post-establishment differentiations are most plausibly explained by migrations of peoples seeking refuge to avoid the turmoil of major historical events.
Funded by: Wellcome Trust
European journal of human genetics : EJHG 2011;19;3;334-40
Y-chromosome R-M343 African lineages and sickle cell disease reveal structured assimilation in Lebanon.
Medical School, The Lebanese American University, Beirut, Lebanon.
We have sought to identify signals of assimilation of African male lines in Lebanon by exploring the association of sickle cell disease (SCD) in Lebanon with Y-chromosome haplogroups that are informative of the disease origin and its exclusivity to the Muslim community. A total of 732 samples were analyzed, including 33 SCD patients from Lebanon genotyped for 28 binary markers and 19 short tandem repeats on the non-recombinant segment of the Y chromosome. Genetic organization was identified using populations known to have influenced the genetic structure of the Lebanese population, in addition to African populations with high incidence of SCD. Y-chromosome haplogroup R-M343 sub-lineages distinguish between sub-Saharan African and Lebanese Y chromosomes. We detected a limited penetration of SCD into Lebanese R-M343 carriers, restricted to Lebanese Muslims. We suggest that this penetration brought the sickle cell gene along with the African R-M343, probably with the Saharan caravan slave trade.
Funded by: Wellcome Trust: 077009
Journal of human genetics 2011;56;1;29-33
A worldwide analysis of beta-defensin copy number variation suggests recent selection of a high-expressing DEFB103 gene copy in East Asia.
Department of Genetics, University of Leicester, University Road, Leicester, United Kingdom.
Beta-defensins are a family of multifunctional genes with roles in defense against pathogens, reproduction, and pigmentation. In humans, six beta-defensin genes are clustered in a repeated region which is copy-number variable (CNV) as a block, with a diploid copy number between 1 and 12. The role in host defense makes the evolutionary history of this CNV particularly interesting, because morbidity due to infectious disease is likely to have been an important selective force in human evolution, and to have varied between geographical locations. Here, we show CNV of the beta-defensin region in chimpanzees, and identify a beta-defensin block in the human lineage that contains rapidly evolving noncoding regulatory sequences. We also show that variation at one of these rapidly evolving sequences affects expression levels and cytokine responsiveness of DEFB103, a key inhibitor of influenza virus fusion at the cell surface. A worldwide analysis of beta-defensin CNV in 67 populations shows an unusually high frequency of high-DEFB103-expressing copies in East Asia, the geographical origin of historical and modern influenza epidemics, possibly as a result of selection for increased resistance to influenza in this region.
Funded by: Medical Research Council: G0801123, GO801123; Wellcome Trust: 067948, 077009, 087663
Human mutation 2011;32;7;743-50
Genomic Analysis of Hepatitis B Virus Reveals Antigen State and Genotype as Sources of Evolutionary Rate Variation
EpiChIP: gene-by-gene quantification of epigenetic modification levels.
MRC Laboratory of Molecular Biology, Hills Rd, CB2 0QH Cambridge, UK. email@example.com
The combination of chromatin immunoprecipitation with next-generation sequencing technology (ChIP-seq) is a powerful and increasingly popular method for mapping protein-DNA interactions in a genome-wide fashion. The conventional way of analyzing this data is to identify sequencing peaks along the chromosomes that are significantly higher than the read background. For histone modifications and other epigenetic marks, it is often preferable to find a characteristic region of enrichment in sequencing reads relative to gene annotations. For instance, many histone modifications are typically enriched around transcription start sites. Calculating the optimal window that describes this enrichment allows one to quantify modification levels for each individual gene. Using data sets for the H3K9/14ac histone modification in Th cells and an accompanying IgG control, we present an analysis strategy that alternates between single gene and global data distribution levels and allows a clear distinction between experimental background and signal. Curve fitting permits false discovery rate-based classification of genes as modified versus unmodified. We have developed a software package called EpiChIP that carries out this type of analysis, including integration with and visualization of gene expression data.
Funded by: Medical Research Council: MC_U105161047
Nucleic acids research 2011;39;5;e27
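The idea of quantifying a modification gene by gene, as described in the EpiChIP abstract above, can be sketched as counting reads in a strand-aware window around each transcription start site. This is an illustration only: the window here is fixed by hand rather than optimised, the function name and toy coordinates are hypothetical, and none of it reflects EpiChIP's actual implementation.

```python
from bisect import bisect_left, bisect_right

def window_counts(read_positions, genes, upstream=1000, downstream=1000):
    """Count reads in a strand-aware window around each gene's TSS.

    read_positions: read midpoints on a single chromosome (any order).
    genes: dict mapping gene name -> (tss, strand), strand is '+' or '-'.
    """
    reads = sorted(read_positions)
    counts = {}
    for name, (tss, strand) in genes.items():
        if strand == '+':
            start, end = tss - upstream, tss + downstream
        else:
            # On the minus strand, upstream lies at higher coordinates
            start, end = tss - downstream, tss + upstream
        # Number of sorted reads falling in the closed interval [start, end]
        counts[name] = bisect_right(reads, end) - bisect_left(reads, start)
    return counts

# Toy data (hypothetical coordinates)
chip = [950, 1020, 1100, 1480, 5000, 5200, 9000]
genes = {"geneA": (1000, '+'), "geneB": (5100, '-')}
print(window_counts(chip, genes, upstream=100, downstream=500))
# prints {'geneA': 4, 'geneB': 2}
```

Running the same function on a control sample (for example the IgG track) would give per-gene background counts to compare against.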
Exome sequencing identifies a missense mutation in Isl1 associated with low penetrance otitis media in dearisch mice.
Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.
Background: Inflammation of the middle ear (otitis media) is very common and can lead to serious complications if not resolved. Genetic studies suggest an inherited component, but few of the genes that contribute to this condition are known. Mouse mutants have contributed significantly to the identification of genes predisposing to otitis media.
Results: The dearisch mouse mutant is an ENU-induced mutant detected by its impaired Preyer reflex (ear flick in response to sound). Auditory brainstem responses revealed raised thresholds from as early as three weeks old. Pedigree analysis suggested a dominant but partially penetrant mode of inheritance. The middle ear of dearisch mutants shows a thickened mucosa and cellular effusion suggesting chronic otitis media with effusion with superimposed acute infection. The inner ear, including the sensory hair cells, appears normal. Due to the low penetrance of the phenotype, normal backcross mapping of the mutation was not possible. Exome sequencing was therefore employed to identify a non-conservative tyrosine to cysteine (Y71C) missense mutation in the Islet1 gene, Isl1(Drsh). Isl1 is expressed in the normal middle ear mucosa. The findings suggest the Isl1(Drsh) mutation is likely to predispose carriers to otitis media.
Conclusions: Dearisch, Isl1(Drsh), represents the first point mutation in the mouse Isl1 gene and suggests a previously unrecognized role for this gene. It is also the first recorded exome sequencing of the C3HeB/FeJ background relevant to many ENU-induced mutants. Most importantly, the power of exome resequencing to identify ENU-induced mutations without a mapped gene locus is illustrated.
Funded by: Medical Research Council: G0300212, G0800024, MC_QA137918; Wellcome Trust: 077189
Genome biology 2011;12;9;R90
A very early-branching Staphylococcus aureus lineage lacking the carotenoid pigment staphyloxanthin.
Menzies School of Health Research, Charles Darwin University, Darwin, Northern Territory, Australia.
Here we discuss the evolution of the northern Australian Staphylococcus aureus isolate MSHR1132 genome. MSHR1132 belongs to the divergent clonal complex 75 lineage. The average nucleotide divergence between orthologous genes in MSHR1132 and typical S. aureus is approximately sevenfold greater than the maximum divergence observed in this species to date. MSHR1132 has a small accessory genome, which includes the well-characterized genomic islands, νSaα and νSaβ, suggesting that these elements were acquired well before the expansion of the typical S. aureus population. Other mobile elements show mosaic structure (the prophage ϕSa3) or evidence of recent acquisition from a typical S. aureus lineage (SCCmec, ICE6013 and plasmid pMSHR1132). There are two differences in gene repertoire compared with typical S. aureus that may be significant clues as to the genetic basis underlying the successful emergence of S. aureus as a pathogen. First, MSHR1132 lacks the genes for production of staphyloxanthin, the carotenoid pigment that confers upon S. aureus its characteristic golden color and protects against oxidative stress. The lack of pigment was demonstrated in 126 of 126 CC75 isolates. Second, a mobile clustered regularly interspaced short palindromic repeat (CRISPR) element is inserted into orfX of MSHR1132. Although common in other staphylococcal species, these elements are very rare within S. aureus and may impact accessory genome acquisition. The CRISPR spacer sequences reveal a history of attempted invasion by known S. aureus mobile elements. There is a case for the creation of a new taxon to accommodate this and related isolates.
Genome biology and evolution 2011;3;881-95
A homozygous mutant embryonic stem cell bank applicable for phenotype-driven genetic screening.
Department of Social and Environmental Medicine, Graduate School of Medicine, Osaka University, Suita, Osaka, Japan. firstname.lastname@example.org
Genome-wide mutagenesis in mouse embryonic stem cells (ESCs) is a powerful tool, but the diploid nature of the mammalian genome hampers its application for recessive genetic screening. We have previously reported a method to induce homozygous mutant ESCs from heterozygous mutants by tetracycline-dependent transient disruption of the Bloom's syndrome gene. However, we could not purify homozygous mutants from a large population of heterozygous mutant cells, limiting the applications. Here we developed a strategy for rapid enrichment of homozygous mutant mouse ESCs and demonstrated its feasibility for cell-based phenotypic analysis. The method uses G418-plus-puromycin double selection to enrich for homozygotes and single-nucleotide polymorphism analysis for identification of homozygosity. We combined this simple approach with gene-trap mutagenesis to construct a homozygous mutant ESC bank with 138 mutant lines and demonstrate its use in phenotype-driven genetic screening.
Nature methods 2011;8;12;1071-7
An activating mutation of AKT2 and human hypoglycemia.
Clinical and Molecular Genetics Unit, Developmental Endocrinology Research Group, Institute of Child Health, University College London, London WC1N 1EH, UK.
Pathological fasting hypoglycemia in humans is usually explained by excessive circulating insulin or insulin-like molecules or by inborn errors of metabolism impairing liver glucose production. We studied three unrelated children with unexplained, recurrent, and severe fasting hypoglycemia and asymmetrical growth. All were found to carry the same de novo mutation, p.Glu17Lys, in the serine/threonine kinase AKT2, in two cases as heterozygotes and in one case in mosaic form. In heterologous cells, the mutant AKT2 was constitutively recruited to the plasma membrane, leading to insulin-independent activation of downstream signaling. Thus, systemic metabolic disease can result from constitutive, cell-autonomous activation of signaling pathways normally controlled by insulin.
Funded by: Medical Research Council: G0502115; Wellcome Trust: 077016, 077016/Z/05/Z, 078986, 078986/Z/06/Z, 080952, 080952/Z/06/Z, 091551, 091551/Z/10/Z, 095515
Science (New York, N.Y.) 2011;334;6055;474
Large-scale gene-centric analysis identifies novel variants for coronary artery disease.
Coronary artery disease (CAD) has a significant genetic contribution that is incompletely characterized. To complement genome-wide association (GWA) studies, we conducted a large and systematic candidate gene study of CAD susceptibility, including analysis of many uncommon and functional variants. We examined 49,094 genetic variants in ∼2,100 genes of cardiovascular relevance, using a customised gene array in 15,596 CAD cases and 34,992 controls (11,202 cases and 30,733 controls of European descent; 4,394 cases and 4,259 controls of South Asian origin). We attempted to replicate putative novel associations in an additional 17,121 CAD cases and 40,473 controls. Potential mechanisms through which the novel variants could affect CAD risk were explored through association tests with vascular risk factors and gene expression. We confirmed associations of several previously known CAD susceptibility loci (eg, 9p21.3:p<10(-33); LPA:p<10(-19); 1p13.3:p<10(-17)) as well as three recently discovered loci (COL4A1/COL4A2, ZC3HC1, CYP17A1:p<5×10(-7)). However, we found essentially null results for most previously suggested CAD candidate genes. In our replication study of 24 promising common variants, we identified novel associations of variants in or near LIPA, IL5, TRIB1, and ABCG5/ABCG8, with per-allele odds ratios for CAD risk with each of the novel variants ranging from 1.06-1.09. Associations with variants at LIPA, TRIB1, and ABCG5/ABCG8 were supported by gene expression data or effects on lipid levels. Apart from the previously reported variants in LPA, none of the other ∼4,500 low frequency and functional variants showed a strong effect. Associations in South Asians did not differ appreciably from those in Europeans, except for 9p21.3 (per-allele odds ratio: 1.14 versus 1.27 respectively; P for heterogeneity = 0.003). 
This large-scale gene-centric analysis has identified several novel genes for CAD that relate to diverse biochemical and cellular functions and clarified the literature with regard to many previously suggested genes.
Funded by: British Heart Foundation: RG/08/014/24067, RG/09/12/28096; Medical Research Council: G0401527, G0601966, G0700931, G0701863, G0801056, G1000143, MC_U105260792, MC_U106179471, MC_U137686857; NHLBI NIH HHS: R01 HL087647; Wellcome Trust: 090532
PLoS genetics 2011;7;9;e1002260
Distinguishing driver and passenger mutations in an evolutionary history categorized by interference.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.
In many biological scenarios, from the development of drug resistance in pathogens to the progression of healthy cells toward cancer, quantifying the selection acting on observed mutations is a central question. One difficulty in answering this question is the complexity of the background upon which mutations can arise, with multiple potential interactions between genetic loci. We here present a method for discerning selection from a population history that accounts for interference between mutations. Given sequences sampled from multiple time points in the history of a population, we infer selection at each locus by maximizing a likelihood function derived from a multilocus evolution model. We apply the method to the question of distinguishing between loci where new mutations are under positive selection (drivers) and loci that emit neutral mutations (passengers) in a Wright-Fisher model of evolution. Relative to an otherwise equivalent method in which the genetic background of mutations was ignored, our method inferred selection coefficients more accurately for both driver mutations evolving under clonal interference and passenger mutations reaching fixation in the population through genetic drift or hitchhiking. In a population history recorded by 750 sets of sequences of 100 individuals taken at intervals of 100 generations, a set of 50 loci were divided into drivers and passengers with a mean accuracy of >0.95 across a range of numbers of driver loci. The potential application of our model, either in full or in part, to a range of biological systems, is discussed.
Funded by: Wellcome Trust: 091747
Analysis of Complex Disease Association Studies 2011;Chapter 5;69-86
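For readers unfamiliar with the Wright-Fisher model mentioned in the abstract above, the sketch below simulates a single haploid locus under selection and drift. It is deliberately simpler than the paper's approach: it tracks one locus in isolation, whereas the method described accounts for interference between multiple loci, and it does no likelihood-based inference. All names and parameter values are illustrative assumptions.

```python
import random

def wright_fisher(pop_size, s, p0, generations, rng):
    """Haploid single-locus Wright-Fisher trajectory.

    s is the selection coefficient of the mutant allele (0 = neutral).
    Returns the mutant frequency at generation 0..generations.
    """
    freqs = [p0]
    p = p0
    for _ in range(generations):
        # Selection: mutant fitness 1 + s relative to wild-type fitness 1
        p_sel = p * (1 + s) / (p * (1 + s) + (1 - p))
        # Drift: binomial sampling of pop_size offspring
        mutants = sum(1 for _ in range(pop_size) if rng.random() < p_sel)
        p = mutants / pop_size
        freqs.append(p)
    return freqs

rng = random.Random(1)
driver = wright_fisher(100, 0.05, 0.05, 100, rng)     # positively selected
passenger = wright_fisher(100, 0.0, 0.05, 100, rng)   # neutral
print(driver[-1], passenger[-1])
```

In a single run either allele can be lost or fixed by chance, which is why inference from sampled trajectories needs a likelihood model rather than inspection of one outcome.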
Design and cohort description of the InterAct Project: an examination of the interaction of genetic and lifestyle factors on the incidence of type 2 diabetes in the EPIC Study.
Medical Research Council Epidemiology Unit, Institute of Metabolic Science, Addenbrooke’s Hospital, Box 285, Cambridge CB2 0QQ, UK e-mail: email@example.com
Aims/hypothesis: Studying gene-lifestyle interaction may help to identify lifestyle factors that modify genetic susceptibility and uncover genetic loci exerting important subgroup effects. Adequately powered studies with prospective, unbiased, standardised assessment of key behavioural factors for gene-lifestyle studies are lacking. This case-cohort study aims to investigate how genetic and potentially modifiable lifestyle and behavioural factors, particularly diet and physical activity, interact in their influence on the risk of developing type 2 diabetes.
Methods: Incident cases of type 2 diabetes occurring in European Prospective Investigation into Cancer and Nutrition (EPIC) cohorts between 1991 and 2007 from eight of the ten EPIC countries were ascertained and verified. Prentice-weighted Cox regression and random-effects meta-analyses were used to investigate differences in diabetes incidence by age and sex.
Results: A total of 12,403 verified incident cases of type 2 diabetes occurred during 3.99 million person-years of follow-up of 340,234 EPIC participants eligible for InterAct. We defined a centre-stratified subcohort of 16,154 individuals for comparative analyses. Individuals with incident diabetes who were randomly selected into the subcohort (n = 778) were included as cases in the analyses. All prevalent diabetes cases were excluded from the study. InterAct cases were followed up for an average of 6.9 years; 49.7% were men. Mean baseline age and age at diagnosis were 55.6 and 62.5 years; mean BMI and waist circumference values were 29.4 kg/m(2) and 102.7 cm in men, and 30.1 kg/m(2) and 92.8 cm in women, respectively. Risk of type 2 diabetes increased linearly with age, with an overall HR of 1.56 (95% CI 1.48-1.64) for a 10 year age difference, adjusted for sex. A male excess in the risk of incident diabetes was consistently observed across all countries, with a pooled HR of 1.51 (95% CI 1.39-1.64), adjusted for age.
Conclusions/interpretation: InterAct is a large, well-powered, prospective study that will inform our understanding of the interplay between genes and lifestyle factors on the risk of type 2 diabetes development.
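As a rough aid to reading the figures above, the following sketch derives two quantities that the abstract does not itself report: the crude incidence rate implied by the case count and person-years, and the per-year hazard ratio implied by the 10-year HR. The second step assumes a log-linear age effect, which is an assumption of this sketch; the variable names are illustrative.

```python
import math

# Figures quoted in the abstract.
cases = 12_403
person_years = 3.99e6
hr_per_10_years = 1.56

# Crude incidence rate, expressed per 1,000 person-years.
rate_per_1000 = cases / person_years * 1000

# Under an assumed log-linear age effect, the 10-year HR is the
# per-year HR raised to the 10th power.
hr_per_year = math.exp(math.log(hr_per_10_years) / 10)

print(f"crude incidence: {rate_per_1000:.2f} per 1,000 person-years")
print(f"implied HR per year of age: {hr_per_year:.4f}")
```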
Funded by: Canadian Institutes of Health Research: G0601261; Cancer Research UK: 11692; Medical Research Council: G0401527, G0601261, G1000143, MC_EX_G0800783, MC_U106179471, MC_U106179473, MC_U106179474, MC_UP_A090_1006, MC_UP_A100_1003; Wellcome Trust: 083270/083270/z, 090532
Genetic variants in novel pathways influence blood pressure and cardiovascular disease risk.
Blood pressure is a heritable trait influenced by several biological pathways and responsive to environmental stimuli. Over one billion people worldwide have hypertension (≥140 mm Hg systolic blood pressure or ≥90 mm Hg diastolic blood pressure). Even small increments in blood pressure are associated with an increased risk of cardiovascular events. This genome-wide association study of systolic and diastolic blood pressure, which used a multi-stage design in 200,000 individuals of European descent, identified sixteen novel loci: six of these loci contain genes previously known or suspected to regulate blood pressure (GUCY1A3-GUCY1B3, NPR3-C5orf23, ADM, FURIN-FES, GOSR2, GNAS-EDN3); the other ten provide new clues to blood pressure physiology. A genetic risk score based on 29 genome-wide significant variants was associated with hypertension, left ventricular wall thickness, stroke and coronary artery disease, but not kidney disease or kidney function. We also observed associations with blood pressure in East Asian, South Asian and African ancestry individuals. Our findings provide new insights into the genetics and biology of blood pressure, and suggest potential novel therapeutic pathways for cardiovascular disease prevention.
Funded by: AHRQ HHS: HS06516; Biotechnology and Biological Sciences Research Council: G20234; British Heart Foundation: CH/03/001, FS05/125, G0501942, PG/02/128, PG97012, PG97027, RG/07/005/23633, RG/07/008/23674, RG/08/008/25291, RG/08/013/25942, RG/08/014/24067, RG/98002, RG08/01, SP/04/002, SP/08/005/25115; Canadian Institutes of Health Research: MOP-82810, MOP172605, MOP77682; Chief Scientist Office: CZB/4/276, CZB/4/710; FIC NIH HHS: R03 TW007165, TW008288, TW05596; Howard Hughes Medical Institute: 55005617; Intramural NIH HHS; Medical Research Council: G0000934, G0100222, G0400874, G0401527, G0500539, G0501942, G0600331, G0600705, G0601966, G0700931, G0701863, G0801056, G0902037, G0902313, G1000143, G19/35, G8802774, G9521010, G9521010D, MC_PC_U127561128, MC_U106179471, MC_U106188470, MC_U123092720, MC_U123092723, MC_U127561128, MC_U137686857, MC_UP_A100_1003; NCI NIH HHS: 5U01CA086308, P01CA055075, P01CA087969; NCRR NIH HHS: 2M01RR010284, K12RR023250, M01 RR16500, M01-RR00425, RR-024156, RR20649, U54 RR020278, UL1RR025005; NHGRI NIH HHS: HG003054, HG005581, U01HG004399, U01HG004402, U01HG004415, U01HG004422, U01HG004423, U01HG004436, U01HG004438, U01HG004446, U01HG004726, U01HG004728, U01HG004729, U01HG004735, U01HG004738; NHLBI NIH HHS: 5R01HL086694-03, 5R01HL087679-02, 5R01HL08770002, HL 54512, HL-87660, HL043851, HL080025, HL084729, HL085144, HL086718, HL087647, HL098283, HL36310, HL45508, HL53353, HL54512, N01 HC-15103, N01 HC-55222, N01 HC-95159, N01 HC-95169, N01-HC-25195, N01-HC-35129, N01-HC-45133, N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022, N01-HC-75150, N01-HC-85079, N01-HC-85080, N01-HC-85081, N01-HC-85082, N01-HC-85083, N01-HC-85084, N01-HC-85085, N01-HC-85086, N01-HC-95160, N01-HC-95161, N01-HC-95162, N01-HC-95163, N01-HC-95164, N01-HC-95165, N01-HC-95166, N01-HC-95167, N01-HC-95168, N02-HL-6-4278, R01 HL073410, R01 HL085251, R01 HL086694, R01 HL086694-03, R01 HL086694-04A1, R01 HL086694-05, R01 HL087647, R01 HL087652, R01 HL088119, R01HL056931, R01HL060894, R01HL060919, R01HL06094, R01HL061019, R01HL071051, R01HL071205, R01HL071250, R01HL071251, R01HL071252, R01HL071258, R01HL071259, R01HL086694, R01HL087641, R01HL089650-02, R01HL59367, R37HL051021, U01 HL054466, U01 HL054466-11, U01 HL054471, U01 HL054473, U01 HL054527, U01 HL072515-06, U01 HL080295, U01 HL084756, U10 HL054512, U10HL054512; NIA NIH HHS: 1R01AG032098-01A, AG13196, N01-AG-1-2109, N01-AG-12100, N01AG6210, N01AG62101, N01AG62103, R01 AG017644-09S1, R01 AG18728; NICHD NIH HHS: N01-HD-1-3107; NIDCR NIH HHS: U01DE018903, U01DE01899; NIDDK NIH HHS: DK062370, DK063491, DK072193, DK075787, DK078150, DK56350, R01 DK072193, R01 DK078150, R01DK058845, R01DK066574, U01 DK062418; NIEHS NIH HHS: ES10126, P30 ES010126, P30ES007033; NIGMS NIH HHS: S06GM008016-320107, S06GM008016-380111, U01 GM074518-04; NIMH NIH HHS: 1RL1MH083268-01, 5R01MH63706:02; NIMHD NIH HHS: 263 MD 821336, 263 MD 9164; NINDS NIH HHS: R01 NS39987, R01 NS42733, U01 NS069208, U01 NS069208-01; PHS HHS: 263-MA-410953, 33014, HHSN268200625226C, HHSN268200782096, HHSN268200782096C; Wellcome Trust: 068545/Z/02, 070191/Z/03/Z, 077016/Z/05/Z, 079895, 080747/Z/06/Z, 090532
Blood pressure loci identified with a gene-centric array.
Clinical Pharmacology and Barts and The London Genome Centre, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, Charterhouse Square, London, UK. firstname.lastname@example.org
Raised blood pressure (BP) is a major risk factor for cardiovascular disease. Previous studies have identified 47 distinct genetic variants robustly associated with BP, but collectively these explain only a few percent of the heritability for BP phenotypes. To find additional BP loci, we used a bespoke gene-centric array to genotype an independent discovery sample of 25,118 individuals that combined hypertensive case-control and general population samples. We followed up four SNPs associated with BP at our p < 8.56 × 10(-7) study-specific significance threshold and six suggestively associated SNPs in a further 59,349 individuals. We identified and replicated a SNP at LSP1/TNNT3, a SNP at MTHFR-NPPB independent (r(2) = 0.33) of previous reports, and replicated SNPs at AGT and ATP2B1 reported previously. An analysis of combined discovery and follow-up data identified SNPs significantly associated with BP at p < 8.56 × 10(-7) at four further loci (NPR3, HFE, NOS3, and SOX6). The high number of discoveries made with modest genotyping effort can be attributed to using a large-scale yet targeted genotyping array and to the development of a weighting scheme that maximized power when meta-analyzing results from samples ascertained with extreme phenotypes, in combination with results from nonascertained or population samples. Chromatin immunoprecipitation and transcript expression data highlight potential gene regulatory mechanisms at the MTHFR and NOS3 loci. These results provide candidates for further study to help dissect mechanisms affecting BP and highlight the utility of studying SNPs and samples that are independent of those studied previously even when the sample size is smaller than that in previous studies.
Funded by: AHRQ HHS: HS06516; British Heart Foundation: CH/98001, FS05/125, PG/07/131/24254, PG/07/132/24256, PG/07/133/24260, PG/97012, RG/07/005/23633, RG/07/008/23674, RG/08/008, RG/08/008/25291, RG/08/013/25942, RG/2001004, SP/07/007/2367, SP/08/005/25115; Canadian Institutes of Health Research: MOP172605, MOP77682, MOP82810; Department of Health; Medical Research Council: G0100222, G0400874, G0401527, G0501942, G0701863, G0801056, G0802432, G0902037, G1000143, G19/35, G8802774, G9521010, G9521010D, MC_U106179471, MC_U123092720, MC_U123092723, MC_U137686857, MC_UP_A100_1003; NIA NIH HHS: AG13196, R01 AG017644-09S1; Wellcome Trust: 070191/Z/03/A, 070191/Z/03/Z, 076113/C/04/Z, 090532, 093078/Z/10/Z
American journal of human genetics 2011;89;6;688-700
Chromosomal rearrangements maintain a polymorphic supergene controlling butterfly mimicry.
CNRS UMR 7205, Muséum National d'Histoire Naturelle, CP50, 45 Rue Buffon, 75005 Paris, France. email@example.com
Supergenes are tight clusters of loci that facilitate the co-segregation of adaptive variation, providing integrated control of complex adaptive phenotypes. Polymorphic supergenes, in which specific combinations of traits are maintained within a single population, were first described for 'pin' and 'thrum' floral types in Primula and Fagopyrum, but classic examples are also found in insect mimicry and snail morphology. Understanding the evolutionary mechanisms that generate these co-adapted gene sets, as well as the mode of limiting the production of unfit recombinant forms, remains a substantial challenge. Here we show that individual wing-pattern morphs in the polymorphic mimetic butterfly Heliconius numata are associated with different genomic rearrangements at the supergene locus P. These rearrangements tighten the genetic linkage between at least two colour-pattern loci that are known to recombine in closely related species, with complete suppression of recombination being observed in experimental crosses across a 400-kilobase interval containing at least 18 genes. In natural populations, notable patterns of linkage disequilibrium (LD) are observed across the entire P region. The resulting divergent haplotype clades and inversion breakpoints are found in complete association with wing-pattern morphs. Our results indicate that allelic combinations at known wing-patterning loci have become locked together in a polymorphic rearrangement at the P locus, forming a supergene that acts as a simple switch between complex adaptive phenotypes found in sympatry. These findings highlight how genomic rearrangements can have a central role in the coexistence of adaptive phenotypes involving several genes acting in concert, by locally limiting recombination and gene flow.
Funded by: Biotechnology and Biological Sciences Research Council: BB/E008836/1, BBE0118451; Medical Research Council: G0900740; Wellcome Trust: 079643, 098051
Genetic risk prediction in complex disease.
Statistical and Computational Genetics, Wellcome Trust Sanger Institute, Cambs, UK
Attempting to classify patients into high or low risk for disease onset or outcomes is one of the cornerstones of epidemiology. For some (but by no means all) diseases, clinically usable risk prediction can be performed using classical risk factors such as body mass index, lipid levels, smoking status, family history and, under certain circumstances, genetics (e.g. BRCA1/2 in breast cancer). The advent of genome-wide association studies (GWAS) has led to the discovery of common risk loci for the majority of common diseases. These discoveries raise the possibility of using these variants for risk prediction in a clinical setting. We discuss the different ways in which the predictive accuracy of these loci can be measured, and survey the predictive accuracy of GWAS variants for 18 common diseases. We show that predictive accuracy from genetic models varies greatly across diseases, but that the range is similar to that of non-genetic risk-prediction models. We discuss what factors drive differences in predictive accuracy, and how much value these predictions add over classical predictive tests. We also review the uses and pitfalls of idealized models of risk prediction. Finally, we look forward towards possible future clinical implementation of genetic risk prediction, and discuss realistic expectations for future utility.
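One common way to quantify the predictive accuracy discussed above is the area under the ROC curve (AUC) of a weighted genetic risk score. The sketch below is a minimal illustration, not the review's own procedure: it assumes an additive score built from risk-allele counts weighted by per-allele log odds ratios, and the genotypes and weights are made-up toy values.

```python
def risk_score(genotypes, log_odds_ratios):
    """Additive score: risk-allele counts (0, 1 or 2) weighted by
    per-allele log odds ratios (hypothetical weights)."""
    return sum(g * b for g, b in zip(genotypes, log_odds_ratios))


def auc(case_scores, control_scores):
    """Probability that a random case scores higher than a random
    control, counting ties as one half."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy data: two loci, two cases and two controls.
weights = [0.2, 0.1]
cases = [[2, 1], [1, 1]]
controls = [[0, 1], [1, 0]]

case_scores = [risk_score(g, weights) for g in cases]
control_scores = [risk_score(g, weights) for g in controls]
toy_auc = auc(case_scores, control_scores)
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of cases from controls; the toy data here are constructed to separate perfectly and carry no empirical meaning.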
Funded by: Wellcome Trust: WT089120/Z/09/Z
Human molecular genetics 2011;20;R2;R182-8
Imputation of low-frequency variants using the HapMap3 benefits from large, diverse reference sets.
Statistical and Computational Genetics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
Imputation allows the inference of unobserved genotypes in low-density data sets, and is often used to test for disease association at variants that are poorly captured by standard genotyping chips (such as low-frequency variants). Although much effort has gone into developing the best imputation algorithms, less is known about the effects of reference set choice on imputation accuracy. We assess the improvements afforded by increases in reference size and diversity, specifically comparing the HapMap2 data set, which has been used to date for imputation, and the new HapMap3 data set, which contains more samples from a more diverse range of populations. We find that, for imputation into Western European samples, the HapMap3 reference provides more accurate imputation with better-calibrated quality scores than HapMap2, and that increasing the number of HapMap3 populations included in the reference set grants further improvements. Improvements are most pronounced for low-frequency variants (frequency <5%), with the largest and most diverse reference sets bringing the accuracy of imputation of low-frequency variants close to that of common ones. For low-frequency variants, reference set diversity can improve the accuracy of imputation, independent of reference sample size. HapMap3 reference sets provide significant increases in imputation accuracy relative to HapMap2, and are of particular use if highly accurate imputation of low-frequency variants is required. Our results suggest that, although the sample sizes from the 1000 Genomes Pilot Project will not allow reliable imputation of low-frequency variants, the larger sample sizes of the main project will do so.
Funded by: Wellcome Trust: WT089120/Z/09/Z
European journal of human genetics : EJHG 2011;19;6;662-6
CEP152 is a genome maintenance protein disrupted in Seckel syndrome.
Department of Medical Biology, Faculty of Medicine, Karadeniz Technical University, Trabzon, Turkey. firstname.lastname@example.org
Functional impairment of DNA damage response pathways leads to increased genomic instability. Here we describe the centrosomal protein CEP152 as a new regulator of genomic integrity and cellular response to DNA damage. Using homozygosity mapping and exome sequencing, we identified CEP152 mutations in Seckel syndrome and showed that impaired CEP152 function leads to accumulation of genomic defects resulting from replicative stress through enhanced activation of ATM signaling and increased H2AX phosphorylation.
Funded by: Medical Research Council: MC_U120081295, MC_U127580972, MC_U127597124; Wellcome Trust: 077014
Nature genetics 2011;43;1;23-6
Total zinc intake may modify the glucose-raising effect of a zinc transporter (SLC30A8) variant: a 14-cohort meta-analysis.
Department of Nutrition-Dietetics, Harokopio University, Athens, Greece. email@example.com
Objective: Many genetic variants have been associated with glucose homeostasis and type 2 diabetes in genome-wide association studies. Zinc is an essential micronutrient that is important for β-cell function and glucose homeostasis. We tested the hypothesis that zinc intake could influence the glucose-raising effect of specific variants.
Research design and methods: We conducted a 14-cohort meta-analysis to assess the interaction of 20 genetic variants known to be related to glycemic traits and zinc metabolism with dietary zinc intake (food sources) and a 5-cohort meta-analysis to assess the interaction with total zinc intake (food sources and supplements) on fasting glucose levels among individuals of European ancestry without diabetes.
Results: We observed a significant association of total zinc intake with lower fasting glucose levels (β-coefficient ± SE per 1 mg/day of zinc intake: -0.0012 ± 0.0003 mmol/L, summary P value = 0.0003), while the association of dietary zinc intake was not significant. We identified a nominally significant interaction between total zinc intake and the SLC30A8 rs11558471 variant on fasting glucose levels (β-coefficient ± SE per A allele for 1 mg/day of greater total zinc intake: -0.0017 ± 0.0006 mmol/L, summary interaction P value = 0.005); this result suggests a stronger inverse association between total zinc intake and fasting glucose in individuals carrying the glucose-raising A allele compared with individuals who do not carry it. None of the other interaction tests were statistically significant.
Conclusions: Our results suggest that higher total zinc intake may attenuate the glucose-raising effect of the rs11558471 SLC30A8 (zinc transporter) variant. Our findings also support evidence for the association of higher total zinc intake with lower fasting glucose levels.
Funded by: Medical Research Council: G0701863, MC_U106179471, MC_U106188470, MC_UP_A100_1003; NHLBI NIH HHS: R01 HL087700; Wellcome Trust: 090532
In vivo identification of tumor-suppressive PTEN ceRNAs in an oncogenic BRAF-induced mouse model of melanoma.
Cancer Genetics Program, Division of Genetics, Beth Israel Deaconess Cancer Center, Department of Medicine and Pathology, Harvard Medical School, Boston, MA 02215, USA.
We recently proposed that competitive endogenous RNAs (ceRNAs) sequester microRNAs to regulate mRNA transcripts containing common microRNA recognition elements (MREs). However, the functional role of ceRNAs in cancer remains unknown. Loss of PTEN, a tumor suppressor regulated by ceRNA activity, frequently occurs in melanoma. Here, we report the discovery of significant enrichment of putative PTEN ceRNAs among genes whose loss accelerates tumorigenesis following Sleeping Beauty insertional mutagenesis in a mouse model of melanoma. We validated several putative PTEN ceRNAs and further characterized one, the ZEB2 transcript. We show that ZEB2 modulates PTEN protein levels in a microRNA-dependent, protein coding-independent manner. Attenuation of ZEB2 expression activates the PI3K/AKT pathway, enhances cell transformation, and commonly occurs in human melanomas and other cancers expressing low PTEN levels. Our study genetically identifies multiple putative microRNA decoys for PTEN, validates ZEB2 mRNA as a bona fide PTEN ceRNA, and demonstrates that abrogated ZEB2 expression cooperates with BRAF(V600E) to promote melanomagenesis.
Funded by: Cancer Research UK; NCI NIH HHS: 1P50 CA121974, P50 CA121974, P50 CA121974-01, R01 CA-82328-09, R01 CA082328, R01 CA082328-09; NCRR NIH HHS: UL1 RR025758, UL1 RR025758-04; Wellcome Trust
Phylogenetic analysis of murine leukemia virus sequences from longitudinally sampled chronic fatigue syndrome patients suggests PCR contamination rather than viral evolution.
Department of Zoology, University of Oxford, South Parks Road, Oxford OX1 3PS, United Kingdom.
Xenotropic murine leukemia virus (MLV)-related virus (XMRV) has been amplified from human prostate cancer and chronic fatigue syndrome (CFS) patient samples. Other studies failed to replicate these findings and suggested PCR contamination with a prostate cancer cell line, 22Rv1, as a likely source. MLV-like sequences have also been detected in CFS patients in longitudinal samples 15 years apart. Here, we tested whether sequence data from these samples are consistent with viral evolution. Our phylogenetic analyses strongly reject a model of within-patient evolution and demonstrate that the sequences from the first and second time points represent distinct endogenous murine retroviruses, suggesting contamination.
Funded by: Medical Research Council: G0801172, G9721629; Wellcome Trust: 090940, WT090940
Journal of virology 2011;85;20;10909-13
Mouse genomic variation and its effect on phenotypes and gene regulation.
The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, UK.
We report genome sequences of 17 inbred strains of laboratory mice and identify almost ten times more variants than previously known. We use these genomes to explore the phylogenetic history of the laboratory mouse and to examine the functional consequences of allele-specific variation on transcript abundance, revealing that at least 12% of transcripts show a significant tissue-specific expression bias. By identifying candidate functional variants at 718 quantitative trait loci we show that the molecular nature of functional variants and their position relative to genes vary according to the effect size of the locus. These sequences provide a starting point for a new era in the functional analysis of a key model organism.
Funded by: Biotechnology and Biological Sciences Research Council: BB/F022697/1; Cancer Research UK: A6997; Medical Research Council: G0800024, MC_U127561112, MC_U137761446; NHLBI NIH HHS: K25 HL080079; NLM NIH HHS: 2T15LM007359; Wellcome Trust: 077192, 079912, 082356, 083573, 083573/Z/07/Z, 085906, 085906/Z/08/Z, 090532
Genomic insights into the origin of parasitism in the emerging plant pathogen Bursaphelenchus xylophilus.
Forestry and Forest Products Research Institute, Tsukuba, Japan. firstname.lastname@example.org
Bursaphelenchus xylophilus is the nematode responsible for a devastating epidemic of pine wilt disease in Asia and Europe, and represents a recent, independent origin of plant parasitism in nematodes, ecologically and taxonomically distinct from other nematodes for which genomic data is available. As well as being an important pathogen, the B. xylophilus genome thus provides a unique opportunity to study the evolution and mechanism of plant parasitism. Here, we present a high-quality draft genome sequence from an inbred line of B. xylophilus, and use this to investigate the biological basis of its complex ecology which combines fungal feeding, plant parasitic and insect-associated stages. We focus particularly on putative parasitism genes as well as those linked to other key biological processes and demonstrate that B. xylophilus is well endowed with RNA interference effectors, peptidergic neurotransmitters (including the first description of ins genes in a parasite), stress response and developmental genes and has a contracted set of chemosensory receptors. B. xylophilus has the largest number of digestive proteases known for any nematode and displays expanded families of lysosome pathway genes, ABC transporters and cytochrome P450 pathway genes. This expansion in digestive and detoxification proteins may reflect the unusual diversity in foods it exploits and environments it encounters during its life cycle. In addition, B. xylophilus possesses a unique complement of plant cell wall modifying proteins acquired by horizontal gene transfer, underscoring the impact of this process on the evolution of plant parasitism by nematodes. Together with the lack of proteins homologous to effectors from other plant parasitic nematodes, this confirms the distinctive molecular basis of plant parasitism in the Bursaphelenchus lineage. The genome sequence of B. xylophilus adds to the diversity of genomic data for nematodes, and will be an important resource in understanding the biology of this unusual parasite.
Funded by: Wellcome Trust: WT 085775/Z/08/Z
PLoS pathogens 2011;7;9;e1002219
Genetic variation near IRS1 associates with reduced adiposity and an impaired metabolic profile.
Medical Research Council (MRC) Epidemiology Unit, Institute of Metabolic Science, Cambridge, UK.
Genome-wide association studies have identified 32 loci influencing body mass index, but this measure does not distinguish lean from fat mass. To identify adiposity loci, we meta-analyzed associations between ∼2.5 million SNPs and body fat percentage from 36,626 individuals and followed up the 14 most significant (P < 10(-6)) independent loci in 39,576 individuals. We confirmed a previously established adiposity locus in FTO (P = 3 × 10(-26)) and identified two new loci associated with body fat percentage, one near IRS1 (P = 4 × 10(-11)) and one near SPRY2 (P = 3 × 10(-8)). Both loci contain genes with potential links to adipocyte physiology. Notably, the body-fat-decreasing allele near IRS1 is associated with decreased IRS1 expression and with an impaired metabolic profile, including an increased visceral to subcutaneous fat ratio, insulin resistance, dyslipidemia, risk of diabetes and coronary artery disease and decreased adiponectin levels. Our findings provide new insights into adiposity and insulin resistance.
Funded by: Biotechnology and Biological Sciences Research Council: G20234; British Heart Foundation: PG/07/133/24260, RG/07/008/23674, RG/08/008, RG/08/008/25291, SP/04/002, SP/07/007/23671; Cancer Research UK; Chief Scientist Office: CZB/4/710; Department of Health; Medical Research Council: G0100222, G0401527, G0601966, G0700931, G0701863, G0802051, G0902037, G1000143, G19/35, G8802774, MC_U106179471, MC_U106188470, MC_U127561128; NCRR NIH HHS: M01 RR 16500, M01 RR000425-36, M01 RR016500-04, M01-RR00425; NHLBI NIH HHS: N01 HC015103, N01 HC025195, N01 HC045133, N01 HC055222, N01 HC075150, N01 HC085079, N01 HC085086, N01-HC15103, N01-HC25195, N01-HC35129, N01-HC45133, N01-HC55222, N01-HC75150, N01-HC85079-86, N01HC25195, N02 HL64278, R01 HL087652, R01 HL087652-03, R01 HL087700, R01 HL087700-03, R01 HL088119, R01 HL088119-04, R01 HL117078, R01-HL036310-20A2, R01-HL087652, R01-HL08770003, R01-HL088119, U01 HL072515, U01 HL072515-06, U01 HL080295, U01 HL080295-04, U01 HL084756, U01 HL084756-03, U01-HL080295, U01-HL72515, U01-HL84756; NIA NIH HHS: AG13196, N01-AG12100, N01-AG62101, N01-AG62103, N01-AG62106, N01AG12100, N1AG62101A, N1AG62103A, N1AG62106A, R01 AG018728, R01 AG018728-05S1, R01 AG032098, R01 AG032098-01A1, R01-AG031890-01, R01-AG032098-01A1, R01-AG18728, R01-AR/AG41398; NIAMS NIH HHS: R01 AR041398, R01 AR041398-19, R01 AR046838, R01 AR046838-05, R01-AR046838; NIDDK NIH HHS: DK063491, K23 DK080145, K23 DK080145-05, K23-DK080145, P30 DK063491-03, P30 DK072488, P30 DK072488-04S1, P30-DK072488, R01 DK068336, R01 DK068336-03, R01 DK075681, R01 DK075681-04, R01 DK075787, R01 DK075787-05, R01 DK089256, R01-DK06833603, R01-DK07568102, R01-DK075787; Wellcome Trust: 077016/Z/05/Z, 084723/Z/08/Z, 091551, 091746/Z/10/Z, 095515
Nature genetics 2011;43;8;753-60
Glyburide is anti-inflammatory and associated with reduced mortality in melioidosis.
Department of Medicine, University of Cambridge, Addenbrooke's Hospital, Cambridge, UK. email@example.com
Background: Patients with diabetes mellitus are more prone to bacterial sepsis, but there are conflicting data on whether outcomes are worse in diabetics after presentation with sepsis. Glyburide is an oral hypoglycemic agent used to treat diabetes mellitus. This K(ATP)-channel blocker and broad-spectrum ATP-binding cassette (ABC) transporter inhibitor has broad-ranging effects on the immune system, including inhibition of inflammasome assembly and would be predicted to influence the host response to infection.
Methods: We studied a cohort of 1160 patients with gram-negative sepsis caused by a single pathogen (Burkholderia pseudomallei), 410 (35%) of whom were known to have diabetes. We subsequently conducted a prospective study of diabetics with B. pseudomallei infection (n = 20) to compare the gene expression profile of peripheral whole blood leukocytes in patients who were taking glyburide against those not taking any sulfonylurea.
Results: Survival was greater in diabetics than in nondiabetics (38% vs 45%, respectively, P = .04), but the survival benefit was confined to the patient group taking glyburide (adjusted odds ratio .47, 95% confidence interval .28-.74, P = .005). We identified differential expression of 63 immune-related genes (P = .001) in patients taking glyburide, the sum effect of which we predict to be anti-inflammatory in the glyburide group.
Conclusions: We present observational evidence for a glyburide-associated benefit during human melioidosis and correlate this with an anti-inflammatory effect of glyburide on the immune system.
Funded by: Wellcome Trust: 093956
Clinical infectious diseases : an official publication of the Infectious Diseases Society of America 2011;52;6;717-25
Diabetes does not influence activation of coagulation, fibrinolysis or anticoagulant pathways in Gram-negative sepsis (melioidosis).
Center for Experimental and Molecular Medicine, Department of Infectious Diseases, Tropical Medicine & AIDS, Academic Medical Center, Amsterdam, The Netherlands. firstname.lastname@example.org
Diabetes is associated with a disturbance of the haemostatic balance and is an important risk factor for sepsis, but the influence of diabetes on the pathogenesis of sepsis remains unclear. Melioidosis (Burkholderia pseudomallei infection) is a common cause of community-acquired sepsis in Southeast Asia and northern Australia. We sought to investigate the impact of pre-existing diabetes on the coagulation and fibrinolytic systems during sepsis caused by B. pseudomallei. We recruited a cohort of 44 patients (34 with diabetes and 10 without diabetes) with culture-proven melioidosis. Diabetes was defined as a pre-admission diagnosis of diabetes or an HbA₁c>7.8% at enrolment. Thirty healthy blood donors and 52 otherwise healthy diabetes patients served as controls. Citrated plasma was collected from all subjects; additionally in melioidosis patients follow-up specimens were collected seven and ≥ 28 days after enrolment where possible. Relative to uninfected healthy controls, diabetes per se (i.e. in the absence of infection) was characterised by a procoagulant effect. Melioidosis was associated with activation of coagulation (thrombin-antithrombin complexes (TAT), prothrombin fragment F₁+₂ and fibrinogen concentrations were elevated; PT and PTT prolonged), suppression of anti-coagulation (antithrombin, protein C, total and free protein S levels were depressed) and abnormalities of fibrinolysis (D-dimer and plasmin-antiplasmin complex [PAP] were elevated). Remarkably, none of these haemostatic alterations were influenced by pre-existing diabetes. In conclusion, although diabetes is associated with multiple abnormalities of coagulation, anticoagulation and fibrinolysis, these changes are not detectable when superimposed on the background of larger abnormalities attributable to B. pseudomallei sepsis.
Funded by: Wellcome Trust
Thrombosis and haemostasis 2011;106;6;1139-48
Genome-wide association study in individuals of South Asian ancestry identifies six new type 2 diabetes susceptibility loci.
National Heart and Lung Institute (NHLI), Imperial College London, Hammersmith Hospital, London, UK. email@example.com
We carried out a genome-wide association study of type-2 diabetes (T2D) in individuals of South Asian ancestry. Our discovery set included 5,561 individuals with T2D (cases) and 14,458 controls drawn from studies in London, Pakistan and Singapore. We identified 20 independent SNPs associated with T2D at P < 10⁻⁴ for testing in a replication sample of 13,170 cases and 25,398 controls, also all of South Asian ancestry. In the combined analysis, we identified common genetic variants at six loci (GRB14, ST6GAL1, VPS26A, HMG20A, AP3S2 and HNF4A) newly associated with T2D (P = 4.1 × 10⁻⁸ to P = 1.9 × 10⁻¹¹). SNPs at GRB14 were also associated with insulin sensitivity (P = 5.0 × 10⁻⁴), and SNPs at ST6GAL1 and HNF4A were also associated with pancreatic beta-cell function (P = 0.02 and P = 0.001, respectively). Our findings provide additional insight into mechanisms underlying T2D and show the potential for new discovery from genetic association studies in South Asians, a population with increased susceptibility to T2D.
Funded by: British Heart Foundation: SP/04/002; FIC NIH HHS: KO1TW006087; Medical Research Council: G0700931; NIDDK NIH HHS: DK-25446, R01DK082766; Wellcome Trust: 070854/Z/03/Z, 080747/Z/06/Z, 083270/Z/07/Z, 084723/Z/08/Z
Nature genetics 2011;43;10;984-9
High-throughput semiquantitative analysis of insertional mutations in heterogeneous tumors.
Division of Molecular Biology and Cancer Systems Biology Center, Netherlands Cancer Institute, 1066 CX Amsterdam, The Netherlands.
Retroviral and transposon-based insertional mutagenesis (IM) screens are widely used for cancer gene discovery in mice. Exploiting the full potential of IM screens requires methods for high-throughput sequencing and mapping of transposon and retroviral insertion sites. Current protocols are based on ligation-mediated PCR amplification of junction fragments from restriction endonuclease-digested genomic DNA, resulting in amplification biases due to uneven genomic distribution of restriction enzyme recognition sites. Consequently, sequence coverage cannot be used to assess the clonality of individual insertions. We have developed a novel method, called shear-splink, for the semiquantitative high-throughput analysis of insertional mutations. Shear-splink employs random fragmentation of genomic DNA, which reduces unwanted amplification biases. Additionally, shear-splink enables us to assess clonality of individual insertions by determining the number of unique ligation points (LPs) between the adapter and genomic DNA. This parameter serves as a semiquantitative measure of the relative clonality of individual insertions within heterogeneous tumors. Mixing experiments with clonal cell lines derived from mouse mammary tumor virus (MMTV)-induced tumors showed that shear-splink enables the semiquantitative assessment of the clonality of MMTV insertions. Further, shear-splink analysis of 16 MMTV- and 127 Sleeping Beauty (SB)-induced tumors showed enrichment for cancer-relevant insertions by exclusion of irrelevant background insertions marked by single LPs, thereby facilitating the discovery of candidate cancer genes. To fully exploit the use of the shear-splink method, we set up the Insertional Mutagenesis Database (iMDB), offering a publicly available web-based application to analyze both retroviral- and transposon-based insertional mutagenesis data.
Funded by: Cancer Research UK; Wellcome Trust
Genome research 2011;21;12;2181-9
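The ligation-point idea in the abstract above can be illustrated with a minimal sketch. It counts the unique ligation points (LPs) supporting each insertion and drops insertions marked by a single LP. The tuple layout of `reads` and both function names are assumptions made for illustration; they are not the shear-splink pipeline or the iMDB data format.

```python
from collections import defaultdict

def count_ligation_points(reads):
    """Count unique ligation points (LPs) per insertion site.

    `reads` is an iterable of (chromosome, insertion_pos, ligation_pos)
    tuples; this layout is hypothetical, chosen for illustration only.
    """
    lps = defaultdict(set)
    for chrom, insertion_pos, ligation_pos in reads:
        # Reads amplified from the same fragment share a ligation point,
        # so storing positions in a set collapses PCR duplicates.
        lps[(chrom, insertion_pos)].add(ligation_pos)
    return {site: len(points) for site, points in lps.items()}

def drop_single_lp(counts, min_lps=2):
    """Keep only insertions supported by at least `min_lps` unique LPs."""
    return {site: n for site, n in counts.items() if n >= min_lps}

# Toy input: one insertion seen from three distinct fragments (one of them
# sequenced twice) and one insertion seen from a single fragment.
reads = [
    ("chr1", 1000, 1150), ("chr1", 1000, 1150),
    ("chr1", 1000, 1210), ("chr1", 1000, 1302),
    ("chr7", 5000, 5090),
]
counts = count_ligation_points(reads)
filtered = drop_single_lp(counts)
```

Because random shearing gives each genomic fragment its own break point, the LP count tracks the number of independent template molecules rather than the raw read depth, which is why it can serve as a semiquantitative clonality measure.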
FoSTeS, MMBIR and NAHR at the human proximal Xp region and the mechanisms of human Xq isochromosome formation.
Department of Medical Genetics, University Medical Center Utrecht, Universiteitsweg 100, 3584 CG Utrecht, The Netherlands.
The recently described DNA replication-based mechanisms of fork stalling and template switching (FoSTeS) and microhomology-mediated break-induced replication (MMBIR) were previously shown to catalyze complex exonic, genic and genomic rearrangements. By analyzing a large number of isochromosomes of the long arm of chromosome X (i(Xq)), using whole-genome tiling path array comparative genomic hybridization (aCGH), ultra-high resolution targeted aCGH and sequencing, we provide evidence that the FoSTeS and MMBIR mechanisms can generate large-scale gross chromosomal rearrangements leading to the deletion and duplication of entire chromosome arms, thus suggesting an important role for DNA replication-based mechanisms in both the development of genomic disorders and cancer. Furthermore, we elucidate the mechanisms of dicentric i(Xq) (idic(Xq)) formation and show that most idic(Xq) chromosomes result from non-allelic homologous recombination between palindromic low copy repeats and highly homologous palindromic LINE elements. We also show that non-recurrent-breakpoint idic(Xq) chromosomes have microhomology-associated breakpoint junctions and are likely catalyzed by microhomology-mediated replication-dependent recombination mechanisms such as FoSTeS and MMBIR. Finally, we stress the role of the proximal Xp region as a chromosomal rearrangement hotspot.
Funded by: Wellcome Trust: 077008
Human molecular genetics 2011;20;10;1925-36
96-plex molecular barcoding for the Illumina Genome Analyzer.
The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.
Next-generation sequencing technologies have a massive throughput, which dramatically reduces the cost of sequencing per gigabase, compared to standard Sanger sequencing. To make the most efficient use of this throughput when sequencing small regions or genomes, we developed a barcoding method, which allows multiplexing of 96 or more samples per lane. The method employs 8 bp tags, incorporated into each sequencing library during the library preparation enrichment polymerase chain reaction (PCR), pooling bar-coded libraries in equimolar ratios based on quantitative PCR, and sequencing using the three-read Illumina method.
Methods in molecular biology (Clifton, N.J.) 2011;733;279-98
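Once pooled libraries have been sequenced, each index read has to be assigned back to a sample. The sketch below shows one common way to do this for 8 bp tags. The tag sequences, sample names and the one-mismatch tolerance are illustrative assumptions and are not taken from the published protocol.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def assign_sample(index_read, tag_to_sample, max_mismatches=1):
    """Return the sample for an index read, or None if unassignable.

    A read is assigned only when exactly one tag lies within
    `max_mismatches` of it; ambiguous or distant reads are rejected.
    """
    hits = [
        sample
        for tag, sample in tag_to_sample.items()
        if len(tag) == len(index_read)
        and hamming(tag, index_read) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None

# Hypothetical 8 bp tags; a real tag set would be designed so that any two
# tags differ at several positions.
tag_to_sample = {"ACGTACGT": "sample_01", "TTGCAAGC": "sample_02"}

exact = assign_sample("ACGTACGT", tag_to_sample)
one_error = assign_sample("ACGTACGA", tag_to_sample)
unknown = assign_sample("GGGGGGGG", tag_to_sample)
```

Tolerating a mismatch recovers reads with a single sequencing error in the tag, at the cost of requiring tags that are far enough apart for the assignment to stay unambiguous.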
Amplification-free library preparation for paired-end Illumina sequencing.
The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.
The library preparation step is of critical importance for the quality of next-generation sequencing data. The use of the polymerase chain reaction (PCR) as a part of the standard Illumina library preparation protocol causes an appreciable proportion of the obtained sequences to be duplicates, making the sequencing run less efficient. Also, amplification introduces biases, particularly for genomes with high or low GC content, which reduces the complexity of the resulting library. To overcome these difficulties, we developed an amplification-free library preparation. By the use of custom adapters, unamplified, ligated samples can hybridize directly to the oligonucleotides on the flowcell surface.
Methods in molecular biology (Clifton, N.J.) 2011;733;257-66
miR-96 regulates the progression of differentiation in mammalian cochlear inner and outer hair cells.
Department of Biomedical Science, University of Sheffield, Sheffield S10 2TN, United Kingdom.
MicroRNAs (miRNAs) are small noncoding RNAs able to regulate a broad range of protein-coding genes involved in many biological processes. miR-96 is a sensory organ-specific miRNA expressed in the mammalian cochlea during development. Mutations in miR-96 cause nonsyndromic progressive hearing loss in humans and mice. The mouse mutant diminuendo has a single base change in the seed region of the Mir96 gene leading to widespread changes in the expression of many genes. We have used this mutant to explore the role of miR-96 in the maturation of the auditory organ. We found that the physiological development of mutant sensory hair cells is arrested at around the day of birth, before their biophysical differentiation into inner and outer hair cells. Moreover, maturation of the hair cell stereocilia bundle and remodelling of auditory nerve connections within the cochlea fail to occur in miR-96 mutants. We conclude that miR-96 regulates the progression of the physiological and morphological differentiation of cochlear hair cells and, as such, coordinates one of the most distinctive functional refinements of the mammalian auditory system.
Funded by: Action on Hearing Loss: G41; Medical Research Council: G0300212, MC_QA137918; Wellcome Trust: 077189, 088719
Proceedings of the National Academy of Sciences of the United States of America 2011;108;6;2355-60
X-box binding protein 1 induces the expression of the lytic cycle transactivator of Kaposi's sarcoma-associated herpesvirus but not Epstein-Barr virus in co-infected primary effusion lymphoma.
University College London, MRC Centre for Molecular Virology, Department of Infection, Division of Infection and Immunity, Windeyer Institute of Medical Science, 46 Cleveland Street, London W1T 4JF, UK.
Cells of primary effusion lymphoma (PEL), a B-cell non-Hodgkin's lymphoma, are latently infected by Kaposi's sarcoma-associated herpesvirus (KSHV), with about 80 % of PEL also co-infected with Epstein-Barr virus (EBV). Both viruses can be reactivated into their lytic replication cycle in PEL by chemical inducers. However, simultaneous activation of both lytic cascades leads to mutual lytic cycle co-repression. The plasma cell-differentiation factor X-box binding protein 1 (XBP-1) transactivates the KSHV immediate-early promoter leading to the production of the replication and transcription activator protein (RTA), and reactivation of KSHV from latency. XBP-1 has been reported to act similarly on the EBV immediate-early promoter Zp, leading to the production of the lytic-cycle transactivator protein BZLF1. Here we show that activated B-cell terminal-differentiation transcription factor X-box binding protein 1 (XBP-1s) does not induce EBV BZLF1 and BRLF1 expression in PEL and BL cell lines, despite inducing lytic reactivation of KSHV in PEL. We show that XBP-1s transactivates the KSHV RTA promoter but does not transactivate the EBV BZLF1 promoter in non-B-cells by using a luciferase assay. Co-expression of activated protein kinase D, which can phosphorylate and inactivate class II histone deacetylases (HDACs), does not rescue XBP-1 activity on Zp nor does it induce BZLF1 and BRLF1 expression in PEL. Finally, chemical inducers of KSHV and EBV lytic replication in PEL, including HDAC inhibitors, do not lead to XBP-1 activation. We conclude that XBP-1 specifically reactivates the KSHV lytic cycle in dually infected PELs.
Funded by: Cancer Research UK; Wellcome Trust
The Journal of general virology 2011;92;Pt 2;421-31
Annotation of two large contiguous regions from the Haemonchus contortus genome using RNA-seq and comparative analysis with Caenorhabditis elegans.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.
The genomes of numerous parasitic nematodes are currently being sequenced, but their complexity and size, together with high levels of intra-specific sequence variation and a lack of reference genomes, make their assembly and annotation a challenging task. Haemonchus contortus is an economically significant parasite of livestock that is widely used for basic research as well as for vaccine development and drug discovery. It is one of many medically and economically important parasites within the strongylid nematode group. This group of parasites has the closest phylogenetic relationship with the model organism Caenorhabditis elegans, making comparative analysis a potentially powerful tool for genome annotation and functional studies. To investigate this hypothesis, we sequenced two contiguous fragments from the H. contortus genome and undertook detailed annotation and comparative analysis with C. elegans. The adult H. contortus transcriptome was sequenced using an Illumina platform and RNA-seq was used to annotate a 409 kb overlapping BAC tiling path relating to the X chromosome and a 181 kb BAC insert relating to chromosome I. In total, 40 genes and 12 putative transposable elements were identified. 97.5% of the annotated genes had detectable homologues in C. elegans of which 60% had putative orthologues, significantly higher than previous analyses based on EST analysis. Gene density appears to be less in H. contortus than in C. elegans, with annotated H. contortus genes being an average of two-to-three times larger than their putative C. elegans orthologues due to a greater intron number and size. Synteny appears high but gene order is generally poorly conserved, although areas of conserved microsynteny are apparent. C. elegans operons appear to be partially conserved in H. contortus. Our findings suggest that a combination of RNA-seq and comparative analysis with C. elegans is a powerful approach for the annotation and analysis of strongylid nematode genomes.
Funded by: Biotechnology and Biological Sciences Research Council: BB/E018130/1; Wellcome Trust: WT 085775/Z/08/Z
PloS one 2011;6;8;e23216
ITFoM - The IT future of medicine
Procedia Computer Science 2011;7;26-9
Q8IYL2 is a candidate gene for the familial epilepsy syndrome of Partial Epilepsy with Pericentral Spikes (PEPS).
Division of Neuroscience, Imperial College London, UK; Wellcome Trust Sanger Institute, Cambridge, UK. firstname.lastname@example.org
Purpose: Partial Epilepsy with Pericentral Spikes (PEPS) is a novel Mendelian idiopathic epilepsy with evidence of linkage to Chromosome 4p15. Our aim was to identify the causative mutation in this epilepsy syndrome.
Methods: We re-annotated all 42 genes in the linked chromosomal region and sequenced all genes within the linked interval. All exons, intron-exon boundaries and untranslated regions were sequenced in the original pedigree, and novel changes segregating correctly were subjected to bioinformatic analysis. Quantitative polymerase chain reaction was performed to examine for potential copy number variation (CNV).
Results: 29 previously undescribed variants correctly segregating with the linked haplotype were identified. Bioinformatic analysis demonstrated that six variants were non-synonymous coding sequence polymorphisms, one of which, in Q8IYL2 (Gly400Ala), was found neither in Caucasian (n=243) or ancestry-matched Brazilian (n=180) control samples nor in subjects from the 1000 Genomes Project. No gene duplications or deletions were identified in the linked region.
Discussion: We postulate that Q8IYL2 is a causative gene for PEPS, after exhaustive resequencing and bioinformatic analysis. The function of this gene is unknown, but it is expressed in brain tissue.
Epilepsy research 2011;96;1-2;109-15
Inference of human population history from individual whole-genome sequences.
The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.
The history of human population size is important for understanding human evolution. Various studies have found evidence for a founder event (bottleneck) in East Asian and European populations, associated with the human dispersal out-of-Africa event around 60 thousand years (kyr) ago. However, these studies have had to assume simplified demographic models with few parameters, and they do not provide a precise date for the start and stop times of the bottleneck. Here, with fewer assumptions on population size changes, we present a more detailed history of human population sizes between approximately ten thousand and a million years ago, using the pairwise sequentially Markovian coalescent model applied to the complete diploid genome sequences of a Chinese male (YH), a Korean male (SJK), three European individuals (J. C. Venter, NA12891 and NA12878 (ref. 9)) and two Yoruba males (NA18507 (ref. 10) and NA19239). We infer that European and Chinese populations had very similar population-size histories before 10-20 kyr ago. Both populations experienced a severe bottleneck 10-60 kyr ago, whereas African populations experienced a milder bottleneck from which they recovered earlier. All three populations have an elevated effective population size between 60 and 250 kyr ago, possibly due to population substructure. We also infer that the differentiation of genetically modern humans may have started as early as 100-120 kyr ago, but considerable genetic exchanges may still have occurred until 20-40 kyr ago.
Funded by: Wellcome Trust: 077192, WT077192
Mobilization of giant piggyBac transposons in the mouse genome.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK, CB10 1SA.
The development of technologies that allow the stable delivery of large genomic DNA fragments in mammalian systems is important for genetic studies as well as for applications in gene therapy. DNA transposons have emerged as flexible and efficient molecular vehicles to mediate stable cargo transfer. However, the ability to carry DNA fragments >10 kb is limited in most DNA transposons. Here, we show that the DNA transposon piggyBac can mobilize 100-kb DNA fragments in mouse embryonic stem (ES) cells, making it the only known transposon with such a large cargo capacity. The integrity of the cargo is maintained during transposition, the copy number can be controlled and the inserted giant transposons express the genomic cargo. Furthermore, these 100-kb transposons can also be excised from the genome without leaving a footprint. The development of piggyBac as a large cargo vector will facilitate a wider range of genetic and genomic applications.
Funded by: Howard Hughes Medical Institute; Wellcome Trust: WT077187
Nucleic acids research 2011;39;22;e148
Zebrafish Fukutin family proteins link the unfolded protein response with dystroglycanopathies.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Allelic mutations in putative glycosyltransferase genes, fukutin and fukutin-related protein (fkrp), lead to a wide range of muscular dystrophies associated with hypoglycosylation of α-dystroglycan, commonly referred to as dystroglycanopathies. Defective glycosylation affecting dystroglycan-ligand interactions is considered to underlie the disease pathogenesis. We have modelled dystroglycanopathies in zebrafish using a novel loss-of-function dystroglycan allele and by inhibition of Fukutin family protein activities. We show that muscle pathology in embryos lacking Fukutin or FKRP is different from loss of dystroglycan. In addition to hypoglycosylated α-dystroglycan, knockdown of Fukutin or FKRP leads to a notochord defect and a perturbation of laminin expression before muscle degeneration. These are a consequence of endoplasmic reticulum stress and activation of the unfolded protein response (UPR), preceding loss of dystroglycan-ligand interactions. Together, our results suggest that Fukutin family proteins may play important roles in protein secretion and that the UPR may contribute to the phenotypic spectrum of some dystroglycanopathies in humans.
Funded by: Medical Research Council: G0601943; Wellcome Trust: 077037/Z/05/Z, 077047/Z/05/Z
Human molecular genetics 2011;20;9;1763-75
Stella-Cre mice are highly efficient Cre deleters.
College of Animal Science and Technology, Huazhong Agriculture University, Wuhan, China.
Cre-loxP recombination is widely used for genetic manipulation of the mouse genome. Here, we report the generation and characterization of a new Cre line, Stella-Cre, in which a Cre expression cassette was targeted to the 3' UTR of the Stella locus. Stella is specifically expressed in preimplantation embryos and in the germline. Cre-loxP recombination efficiency in Stella-Cre mice was investigated at several genomic loci including Rosa26, Jak2, and Npm1. At all the loci examined, we observed 100% Cre-loxP recombination efficiency in the embryos and in the germline. Thus, Stella-Cre mice serve as a very efficient deleter line.
Funded by: Wellcome Trust
Genesis (New York, N.Y. : 2000) 2011;49;8;689-95
Comparative and demographic analysis of orang-utan genomes.
The Genome Center at Washington University, Washington University School of Medicine, 4444 Forest Park Avenue, Saint Louis, Missouri 63108, USA. email@example.com
'Orang-utan' is derived from a Malay term meaning 'man of the forest' and aptly describes the southeast Asian great apes native to Sumatra and Borneo. The orang-utan species, Pongo abelii (Sumatran) and Pongo pygmaeus (Bornean), are the most phylogenetically distant great apes from humans, thereby providing an informative perspective on hominid evolution. Here we present a Sumatran orang-utan draft genome assembly and short read sequence data from five Sumatran and five Bornean orang-utan genomes. Our analyses reveal that, compared to other primates, the orang-utan genome has many unique features. Structural evolution of the orang-utan genome has proceeded much more slowly than other great apes, evidenced by fewer rearrangements, less segmental duplication, a lower rate of gene family turnover and surprisingly quiescent Alu repeats, which have played a major role in restructuring other primate genomes. We also describe a primate polymorphic neocentromere, found in both Pongo species, emphasizing the gradual evolution of orang-utan genome structure. Orang-utans have extremely low energy usage for a eutherian mammal, far lower than their hominid relatives. Adding their genome to the repertoire of sequenced primates illuminates new signals of positive selection in several pathways including glycolipid metabolism. From the population perspective, both Pongo species are deeply diverse; however, Sumatran individuals possess greater diversity than their Bornean counterparts, and more species-specific variation. Our estimate of Bornean/Sumatran speciation time, 400,000 years ago, is more recent than most previous studies and underscores the complexity of the orang-utan speciation process. Despite a smaller modern census population size, the Sumatran effective population size (N(e)) expanded exponentially relative to the ancestral N(e) after the split, while Bornean N(e) declined over the same period. Overall, the resources and analyses presented here offer new opportunities in evolutionary genomics, insights into hominid biology, and an extensive database of variation for conservation efforts.
Funded by: Medical Research Council: G0501331, MC_U137761446; NHGRI NIH HHS: HG002238, HG002385, R01 HG002939, U54 HG003079, U54 HG003079-08, U54 HG003273; NIA NIH HHS: P01 AG022064; NIGMS NIH HHS: R01 GM059290, R01 GM59290
ATMIN is required for maintenance of genomic stability and suppression of B cell lymphoma.
Mammalian Genetics Lab, Cancer Research UK, London Research Institute, 44, Lincoln's Inn Fields, London WC2A 3LY, UK.
Defective V(D)J rearrangement of immunoglobulin heavy or light chain (IgH or IgL) or class switch recombination (CSR) can initiate chromosomal translocations. The DNA-damage kinase ATM is required for the suppression of chromosomal translocations but ATM regulation is incompletely understood. Here, we show that mice lacking the ATM cofactor ATMIN in B cells (ATMIN(ΔB/ΔB)) have impaired ATM signaling and develop B cell lymphomas. Notably, ATMIN(ΔB/ΔB) cells exhibited defective peripheral V(D)J rearrangement and CSR, resulting in translocations involving the Igh and Igl loci, indicating that ATMIN is required for efficient repair of DNA breaks generated during somatic recombination. Thus, our results identify a role for ATMIN in regulating the maintenance of genomic stability and tumor suppression in B cells.
Funded by: Biotechnology and Biological Sciences Research Council: BB/F012217/1, BBS/E/B/0000C163; Cancer Research UK; Medical Research Council: MC_U105178806; Wellcome Trust
Cancer cell 2011;19;5;587-600
PoolHap: inferring haplotype frequencies from pooled samples by next generation sequencing.
Gregor Mendel Institute, Vienna, Austria. firstname.lastname@example.org
With the advance of next-generation sequencing (NGS) technologies, increasingly ambitious applications are becoming feasible. A particularly powerful one is the sequencing of polymorphic, pooled samples. The pool can be naturally occurring, as in the case of multiple pathogen strains in a blood sample, multiple types of cells in a cancerous tissue sample, or multiple isoforms of mRNA in a cell. In these cases, it is difficult or impossible to partition the subtypes experimentally before sequencing, and those subtype frequencies must hence be inferred. In addition, investigators may occasionally want to artificially pool the samples of a large number of individuals for reasons of cost-efficiency, e.g., when carrying out genetic mapping using bulked segregant analysis. Here we describe PoolHap, a computational tool for inferring haplotype frequencies from pooled samples when haplotypes are known. The key insight into why PoolHap works is that the large number of SNPs that come with genome-wide coverage can compensate for the uneven coverage across the genome. The performance of PoolHap is illustrated and discussed using simulated and real data. We show that PoolHap is able to accurately estimate the proportions of haplotypes with less than 2% error for 34-strain mixtures with 2X total coverage Arabidopsis thaliana whole genome polymorphism data. This method should facilitate greater biological insight into heterogeneous samples that are difficult or impossible to isolate experimentally. Software and user's manual are freely available at http://arabidopsis.gmi.oeaw.ac.at/quan/poolhap/.
Funded by: Wellcome Trust: 085775/Z/08/Z
PloS one 2011;6;1;e15292
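The inference problem described in the PoolHap abstract, estimating the proportions of known haplotypes from pooled reads, can be sketched with a small expectation-maximisation (EM) loop. This is a generic mixture-proportion estimator written for illustration; it is not PoolHap's own algorithm, which the abstract does not spell out. The function name, the input layout and the assumption of error-free reads are all hypothetical simplifications.

```python
def estimate_frequencies(haplotypes, alt_counts, ref_counts, n_iter=200):
    """EM estimate of haplotype proportions in a pooled sample.

    haplotypes[k][j] is the allele (0 = ref, 1 = alt) of known haplotype k
    at SNP j; alt_counts[j] and ref_counts[j] are pooled read counts.
    Reads are assumed error-free, which real data would not satisfy.
    """
    n_haps = len(haplotypes)
    freqs = [1.0 / n_haps] * n_haps  # start from a uniform guess
    for _ in range(n_iter):
        expected = [0.0] * n_haps
        for j, (n_alt, n_ref) in enumerate(zip(alt_counts, ref_counts)):
            for allele, n_reads in ((1, n_alt), (0, n_ref)):
                # E-step: share these reads among the haplotypes that
                # carry this allele, in proportion to current estimates.
                weight = sum(f for f, h in zip(freqs, haplotypes)
                             if h[j] == allele)
                if weight == 0.0:
                    continue  # no known haplotype explains these reads
                for k, h in enumerate(haplotypes):
                    if h[j] == allele:
                        expected[k] += n_reads * freqs[k] / weight
        # M-step: new proportions are the normalised expected read counts.
        assigned = sum(expected)
        freqs = [e / assigned for e in expected]
    return freqs

# Toy pool: three known haplotypes over four SNPs, ten reads per SNP,
# with counts consistent with proportions 0.5 / 0.3 / 0.2.
haplotypes = [
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
alt_counts = [5, 3, 2, 8]
ref_counts = [5, 7, 8, 2]
freqs = estimate_frequencies(haplotypes, alt_counts, ref_counts)
```

Each SNP contributes only a few reads, but every SNP constrains the same small set of proportions, which is one way to picture why many low-coverage sites can still pin down the mixture.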
ADAM-15 disintegrin-like domain structure and function
A large palindrome with interchromosomal gene duplications in the pericentromeric region of the D. melanogaster Y chromosome.
Centro de Biología Molecular Severo Ochoa (CSIC-UAM), Universidad Autónoma de Madrid, Cantoblanco, Madrid, Spain.
The non-recombining Y chromosome is expected to degenerate over evolutionary time; however, gene gain is a common feature of Y chromosomes of mammals and Drosophila. Here, we report that a large palindrome containing interchromosomal segmental duplications is located in the vicinity of the first amplicon detected in the Y chromosome of D. melanogaster. The recent appearance of such amplicons suggests that duplications to the Y chromosome, followed by the amplification of the segmental duplications, are a mechanism for the continuing evolution of Drosophila Y chromosomes.
Funded by: Wellcome Trust
Molecular biology and evolution 2011;28;7;1967-71
A research agenda for malaria eradication: basic science and enabling technologies.
Today's malaria control efforts are limited by our incomplete understanding of the biology of Plasmodium and of the complex relationships between human populations and the multiple species of mosquito and parasite. Research priorities include the development of in vitro culture systems for the complete life cycle of P. falciparum and P. vivax and the development of an appropriate liver culture system to study hepatic stages. In addition, genetic technologies for the manipulation of Plasmodium need to be improved, the entire parasite metabolome needs to be characterized to identify new druggable targets, and improved information systems for monitoring the changes in epidemiology, pathology, and host-parasite-vector interactions as a result of intensified control need to be established to bridge the gap between bench, preclinical, clinical, and population-based sciences.
Funded by: Medical Research Council: G0501670
PLoS medicine 2011;8;1;e1000399
Low-bias, strand-specific transcriptome Illumina sequencing by on-flowcell reverse transcription (FRT-seq).
The Wellcome Trust Sanger Institute, Cambridge, UK. email@example.com
The unifying feature of second-generation sequencing technologies is that single template strands are amplified clonally onto a solid surface prior to the sequencing reaction. To convert template strands into a compatible state for attachment to this surface, a multistep library preparation is required, which typically culminates in amplification by the PCR. PCR is an inherently biased process, which decreases the efficiency of data acquisition. Flowcell reverse transcription sequencing is a method of transcriptome sequencing for Illumina sequencers in which the reverse transcription reaction is performed on the flowcell by using unamplified, adapter-ligated mRNA as a template. This approach removes PCR biases and duplicates, generates strand-specific paired-end data and is highly reproducible. The procedure can be performed quickly, taking 2 d to generate clusters from mRNA.
Funded by: Wellcome Trust: WT079643
Nature protocols 2011;6;11;1736-47
APC15 drives the turnover of MCC-CDC20 to make the spindle assembly checkpoint responsive to kinetochore attachment.
The Gurdon Institute and Department of Zoology, Tennis Court Road, Cambridge CB2 1QN, UK.
Faithful chromosome segregation during mitosis depends on the spindle assembly checkpoint (SAC), which monitors kinetochore attachment to the mitotic spindle. Unattached kinetochores generate mitotic checkpoint protein complexes (MCCs) that bind and inhibit the anaphase-promoting complex, or cyclosome (APC/C). How the SAC proficiently inhibits the APC/C but still allows its rapid activation when the last kinetochore attaches to the spindle is important for the understanding of how cells maintain genomic stability. We show that the APC/C subunit APC15 is required for the turnover of the APC/C co-activator CDC20 and release of MCCs during SAC signalling but not for APC/C activity per se. In the absence of APC15, MCCs and ubiquitylated CDC20 remain 'locked' onto the APC/C, which prevents the ubiquitylation and degradation of cyclin B1 when the SAC is satisfied. We conclude that APC15 mediates the constant turnover of CDC20 and MCCs on the APC/C to allow the SAC to respond to the attachment state of kinetochores.
Funded by: Biotechnology and Biological Sciences Research Council: BB/G001537/1; Cancer Research UK: A3211; Wellcome Trust: 079643/Z/06/Z
Nature cell biology 2011;13;10;1234-43
Insertional mutagenesis identifies multiple networks of cooperating genes driving intestinal tumorigenesis.
Cancer Research UK Cambridge Research Institute, Li Ka Shing Centre, UK.
The evolution of colorectal cancer suggests the involvement of many genes. To identify new drivers of intestinal cancer, we performed insertional mutagenesis using the Sleeping Beauty transposon system in mice carrying germline or somatic Apc mutations. By analyzing common insertion sites (CISs) isolated from 446 tumors, we identified many hundreds of candidate cancer drivers. Comparison to human data sets suggested that 234 CIS-targeted genes are also dysregulated in human colorectal cancers. In addition, we found 183 CIS-containing genes that are candidate Wnt targets and showed that 20 CIS-containing genes are newly discovered modifiers of canonical Wnt signaling. We also identified mutations associated with a subset of tumors containing an expanded number of Paneth cells, a hallmark of deregulated Wnt signaling, and genes associated with more severe dysplasia included those encoding members of the FGF signaling cascade. Some 70 genes had co-occurrence of CIS pairs, clustering into 38 sub-networks that may regulate tumor development.
Funded by: Cancer Research UK: 13031, A6997; Wellcome Trust
Nature genetics 2011;43;12;1202-9
Introducing the Human Brain Project
Procedia Computer Science 2011;7;39-42
HLA-A*3101 and carbamazepine-induced hypersensitivity reactions in Europeans.
Molecular and Cellular Therapeutics, the Royal College of Surgeons in Ireland, Dublin, Ireland.
Background: Carbamazepine causes various forms of hypersensitivity reactions, ranging from maculopapular exanthema to severe blistering reactions. The HLA-B*1502 allele has been shown to be strongly correlated with carbamazepine-induced Stevens-Johnson syndrome and toxic epidermal necrolysis (SJS-TEN) in the Han Chinese and other Asian populations but not in European populations.
Methods: We performed a genomewide association study of samples obtained from 22 subjects with carbamazepine-induced hypersensitivity syndrome, 43 subjects with carbamazepine-induced maculopapular exanthema, and 3987 control subjects, all of European descent. We tested for an association between disease and HLA alleles through proxy single-nucleotide polymorphisms and imputation, confirming associations by high-resolution sequence-based HLA typing. We replicated the associations in samples from 145 subjects with carbamazepine-induced hypersensitivity reactions.
Results: The HLA-A*3101 allele, which has a prevalence of 2 to 5% in Northern European populations, was significantly associated with the hypersensitivity syndrome (P = 3.5 × 10⁻⁸). An independent genomewide association study of samples from subjects with maculopapular exanthema also showed an association with the HLA-A*3101 allele (P = 1.1 × 10⁻⁶). Follow-up genotyping confirmed the variant as a risk factor for the hypersensitivity syndrome (odds ratio, 12.41; 95% confidence interval [CI], 1.27 to 121.03), maculopapular exanthema (odds ratio, 8.33; 95% CI, 3.59 to 19.36), and SJS-TEN (odds ratio, 25.93; 95% CI, 4.93 to 116.18).
Conclusions: The presence of the HLA-A*3101 allele was associated with carbamazepine-induced hypersensitivity reactions among subjects of Northern European ancestry. The presence of the allele increased the risk from 5.0% to 26.0%, whereas its absence reduced the risk from 5.0% to 3.8%. (Funded by the U.K. Department of Health and others.)
Funded by: Department of Health; Intramural NIH HHS; Medical Research Council: G0400126; PHS HHS: HHS-N261200800001E, HHSN261200800001E; Wellcome Trust: 084730
The New England journal of medicine 2011;364;12;1134-43
Reply: Ileal pouch microbial diversity
Annals of Surgery. 2011;254;669-70
Genome-wide association study identifies 12 new susceptibility loci for primary biliary cirrhosis.
Academic Department of Medical Genetics, Cambridge University, Cambridge, UK; Department of Hepatology, Cambridge University Hospitals National Health Service (NHS) Foundation Trust, Cambridge, UK.
In addition to the HLA locus, six genetic risk factors for primary biliary cirrhosis (PBC) have been identified in recent genome-wide association studies (GWAS). To identify additional loci, we carried out a GWAS using 1,840 cases from the UK PBC Consortium and 5,163 UK population controls as part of the Wellcome Trust Case Control Consortium 3 (WTCCC3). We followed up 28 loci in an additional UK cohort of 620 PBC cases and 2,514 population controls. We identified 12 new susceptibility loci (at a genome-wide significance level of P < 5 × 10⁻⁸) and replicated all previously associated loci. We identified three further new loci in a meta-analysis of data from our study and previously published GWAS results. New candidate genes include STAT4, DENND1B, CD80, IL7R, CXCR5, TNFRSF1A, CLEC16A and NFKB1. This study has considerably expanded our knowledge of the genetic architecture of PBC.
Funded by: Medical Research Council: G0500020, G0800460, G0802068; PHS HHS: 1R01LEY018246; Wellcome Trust: 085925/Z/08/Z, 091745, WT090355/B/09/Z, WT09355A/09/Z, WT91745/Z/10/Z
Nature genetics 2011;43;4;329-32
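The genome-wide significance level of P < 5 × 10⁻⁸ quoted in the primary biliary cirrhosis abstract above is conventionally read as a Bonferroni correction of α = 0.05 for roughly one million independent common-variant tests. That test count is a widely used convention assumed here for illustration; the abstract does not state how the threshold was derived. A minimal sketch:

```python
# Bonferroni-style genome-wide threshold.
# ASSUMPTION: about one million independent tests (a common convention,
# not a figure taken from the abstract).
ALPHA = 0.05
N_INDEPENDENT_TESTS = 1_000_000

THRESHOLD = ALPHA / N_INDEPENDENT_TESTS  # approximately 5e-8


def is_genome_wide_significant(p_value, threshold=THRESHOLD):
    """Return True when a P value falls below the genome-wide threshold."""
    return p_value < threshold


if __name__ == "__main__":
    # Hypothetical P values, chosen only to show both outcomes.
    for p in (1e-9, 1e-7):
        print(p, is_genome_wide_significant(p))
```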
The origins, evolution, and functional potential of alternative splicing in vertebrates.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK. firstname.lastname@example.org
Alternative splicing (AS) has the potential to greatly expand the functional repertoire of mammalian transcriptomes. However, few variant transcripts have been characterized functionally, making it difficult to assess the contribution of AS to the generation of phenotypic complexity and to study the evolution of splicing patterns. We have compared the AS of 309 protein-coding genes in the human ENCODE pilot regions against their mouse orthologs in unprecedented detail, utilizing traditional transcriptomic and RNAseq data. The conservation status of every transcript has been investigated, and each functionally categorized as coding (separated into coding sequence [CDS] or nonsense-mediated decay [NMD] linked) or noncoding. In total, 36.7% of human and 19.3% of mouse coding transcripts are species specific, and we observe a 3.6 times excess of human NMD transcripts compared with mouse; in contrast to previous studies, the majority of species-specific AS is unlinked to transposable elements. We observe one conserved CDS variant and one conserved NMD variant per 2.3 and 11.4 genes, respectively. Subsequently, we identify and characterize equivalent AS patterns for 22.9% of these CDS or NMD-linked events in nonmammalian vertebrate genomes, and our data indicate that functional NMD-linked AS is more widespread and ancient than previously thought. Furthermore, although we observe an association between conserved AS and elevated sequence conservation, as previously reported, we emphasize that 30% of conserved AS exons display sequence conservation below the average score for constitutive exons. In conclusion, we demonstrate the value of detailed comparative annotation in generating a comprehensive set of AS transcripts, increasing our understanding of AS evolution in vertebrates. Our data supports a model whereby the acquisition of functional AS has occurred throughout vertebrate evolution and is considered alongside amino acid change as a key mechanism in gene evolution.
Funded by: NHGRI NIH HHS: 5U54HG004555; Wellcome Trust: 077198, WT077198/Z/05/Z
Molecular biology and evolution 2011;28;10;2949-59
Sequencing skippy: the genome sequence of an Australian kangaroo, Macropus eugenii.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Sequencing of the tammar wallaby (Macropus eugenii) reveals insights into genome evolution, and mammalian reproduction and development.
Genome biology 2011;12;8;123
Evidence for several waves of global transmission in the seventh cholera pandemic.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Vibrio cholerae is a globally important pathogen that is endemic in many areas of the world and causes 3-5 million reported cases of cholera every year. Historically, there have been seven acknowledged cholera pandemics; recent outbreaks in Zimbabwe and Haiti are included in the seventh and ongoing pandemic. Only isolates in serogroup O1 (consisting of two biotypes known as 'classical' and 'El Tor') and the derivative O139 can cause epidemic cholera. It is believed that the first six cholera pandemics were caused by the classical biotype, but El Tor has subsequently spread globally and replaced the classical biotype in the current pandemic. Detailed molecular epidemiological mapping of cholera has been compromised by a reliance on sub-genomic regions such as mobile elements to infer relationships, making El Tor isolates associated with the seventh pandemic seem superficially diverse. To understand the underlying phylogeny of the lineage responsible for the current pandemic, we identified high-resolution markers (single nucleotide polymorphisms; SNPs) in 154 whole-genome sequences of globally and temporally representative V. cholerae isolates. Using this phylogeny, we show here that the seventh pandemic has spread from the Bay of Bengal in at least three independent but overlapping waves with a common ancestor in the 1950s, and identify several transcontinental transmission events. Additionally, we show how the acquisition of the SXT family of antibiotic resistance elements has shaped pandemic spread, and show that this family was first acquired at least ten years before its discovery in V. cholerae.
Funded by: Wellcome Trust: 076962, 076964
Activation of K-RAS by co-mutation of codons 19 and 20 is transforming.
Department of Pathology, University of Cambridge, Addenbrooke's Hospital, Cambridge CB2 0QQ, UK. email@example.com
The K-RAS oncogene is widely mutated in human cancers. Activating mutations in K-RAS give rise to constitutive signalling through the MAPK/ERK and PI3K/AKT pathways promoting increased cell division, reduced apoptosis and transformation. The majority of activating mutations in K-RAS are located in codons 12 and 13. In a human colorectal cancer we identified a novel K-RAS co-mutation that altered codons 19 and 20 resulting in transitions at both codons (L19F/T20A) in the same allele. Using focus forming transformation assays in vitro, we showed that co-mutation of L19F/T20A in K-RAS demonstrated intermediate transforming ability that was greater than that of individual L19F and T20A mutants, but less than that of G12D and G12V K-RAS mutants. This demonstrated the synergistic effects of co-mutation of codons 19 and 20 and illustrated that co-mutation of these codons is functionally significant.
Journal of molecular signaling 2011;6;2
High incidence of recurrent copy number variants in patients with isolated and syndromic Müllerian aplasia.
Department of Obstetrics and Gynecology, University-Clinic Erlangen, Erlangen, Germany.
Background: Congenital malformations involving the Müllerian ducts are observed in around 5% of infertile women. Complete aplasia of the uterus, cervix, and upper vagina, also termed Müllerian aplasia or Mayer-Rokitansky-Küster-Hauser (MRKH) syndrome, occurs with an incidence of around 1 in 4500 female births, and occurs in both isolated and syndromic forms. Previous reports have suggested that a proportion of cases, especially syndromic cases, are caused by variation in copy number at different genomic loci.
Methods: In order to obtain an overview of the contribution of copy number variation to both isolated and syndromic forms of Müllerian aplasia, copy number assays were performed in a series of 63 cases, of which 25 were syndromic and 38 isolated.
Results: A high incidence (9/63, 14%) of recurrent copy number variants in this cohort is reported here. These comprised four cases of microdeletion at 16p11.2, an autism susceptibility locus not previously associated with Müllerian aplasia, four cases of microdeletion at 17q12, and one case of a distal 22q11.2 microdeletion. Microdeletions at 16p11.2 and 17q12 were found in 4/38 (10.5%) cases with isolated Müllerian aplasia, and at 16p11.2, 17q12 and 22q11.2 (distal) in 5/25 cases (20%) with syndromic Müllerian aplasia.
Conclusion: The finding of microdeletion at 16p11.2 in 2/38 (5%) of isolated and 2/25 (8%) of syndromic cases suggests a significant contribution of this copy number variant alone to the pathogenesis of Müllerian aplasia. Overall, the high incidence of recurrent copy number variants in all forms of Müllerian aplasia has implications for the understanding of the aetiopathogenesis of the condition, and for genetic counselling in families affected by it.
Funded by: Wellcome Trust: 077008, 077014, 079973
Journal of medical genetics 2011;48;3;197-204
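The percentages in the Müllerian aplasia abstract above follow directly from the stated case counts. The short check below recomputes them. The labels are descriptive names chosen here, and only the counts come from the abstract.

```python
# Case counts as quoted in the abstract: (cases with the variant, cases tested).
COUNTS = {
    "recurrent CNVs, all cases": (9, 63),
    "16p11.2 or 17q12, isolated cases": (4, 38),
    "16p11.2, 17q12 or distal 22q11.2, syndromic cases": (5, 25),
    "16p11.2 only, isolated cases": (2, 38),
    "16p11.2 only, syndromic cases": (2, 25),
}


def percent(k, n):
    """Proportion k/n expressed as a percentage."""
    return 100.0 * k / n


if __name__ == "__main__":
    for label, (k, n) in COUNTS.items():
        print(f"{label}: {k}/{n} = {percent(k, n):.1f}%")
```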
Impact of temperament on depression and anxiety symptoms and depressive disorder in a population-based birth cohort.
Public Health Genomics Unit, Institute for Molecular Medicine Finland FIMM, University of Helsinki and National Institute for Health and Welfare, Helsinki, Finland.
Background: The aim of this study was to characterize at the population level how innate features of temperament relate to experience of depressive mood and anxiety, and whether these symptoms have separable temperamental backgrounds.
Methods: The study subjects were 4773 members of the population-based Northern Finland Birth Cohort 1966, a culturally and genetically homogeneous study sample. Temperament was measured at age 31 using the temperament items of the Temperament and Character Inventory and a separate Pessimism score. Depressive mood was assessed based on a previous diagnosis of depressive disorder or symptoms of depression according to the Hopkins Symptom Check List - 25. Anxiety was assessed analogously.
Results: High levels of Harm avoidance and Pessimism were related to both depressive mood (effect sizes; d=0.84 and d=1.25, respectively) and depressive disorder (d=0.68 and d=0.68, respectively). Of the dimensions of Harm avoidance, Anticipatory worry and Fatigability had the strongest effects. Symptoms of depression and anxiety showed very similar underlying temperament patterns.
Limitations: Although Harm avoidance and Pessimism appear to be important endophenotype candidates for depression and anxiety, their potential usefulness as endophenotypes, and whether they meet all the suggested criteria for endophenotypes will remain to be confirmed in future studies.
Conclusions: Personality characteristics of Pessimism and Harm avoidance, in particular its dimensions Anticipatory worry and Fatigability, are strongly related to symptoms of depression and anxiety as well as to depressive disorder in this population. These temperamental features may be used as dimensional susceptibility factors in etiological studies of depression, which may aid in the development of improved clinical practice.
Journal of affective disorders 2011;131;1-3;393-7
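The temperament abstract above reports effect sizes as d values. Assuming these are Cohen's d (the standardized mean difference with a pooled standard deviation), which the abstract does not spell out, the statistic can be sketched as follows. The scores in the usage example are made up for illustration and are not study data.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


if __name__ == "__main__":
    # Hypothetical Harm avoidance scores for two groups (illustrative only).
    with_symptoms = [22, 25, 19, 24, 21]
    without_symptoms = [18, 20, 17, 21, 19]
    print(f"d = {cohens_d(with_symptoms, without_symptoms):.2f}")
```

By the usual rule of thumb, d near 0.8 or above is read as a large effect, which is the range the abstract reports for depressive mood.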
A comprehensive evaluation of potential lung function associated genes in the SpiroMeta general population sample.
Nottingham Respiratory Biomedical Research Unit, Division of Therapeutics and Molecular Medicine, University Hospital of Nottingham, Nottingham, United Kingdom.
Rationale: Lung function measures are heritable traits that predict population morbidity and mortality and are essential for the diagnosis of chronic obstructive pulmonary disease (COPD). Variations in many genes have been reported to affect these traits, but attempts at replication have provided conflicting results. Recently, we undertook a meta-analysis of Genome Wide Association Study (GWAS) results for lung function measures in 20,288 individuals from the general population (the SpiroMeta consortium).
Objectives: To comprehensively analyse previously reported genetic associations with lung function measures, and to investigate whether single nucleotide polymorphisms (SNPs) in these genomic regions are associated with lung function in a large population sample.
Methods: We analysed association for SNPs tagging 130 genes and 48 intergenic regions (+/-10 kb), after conducting a systematic review of the literature in the PubMed database for genetic association studies reporting lung function associations.
Results: The analysis included 16,936 genotyped and imputed SNPs. No loci showed overall significant association for FEV₁ or FEV₁/FVC traits using a carefully defined significance threshold of 1.3 × 10⁻⁵. The most significant loci associated with FEV₁ include SNPs tagging MACROD2 (P = 6.81 × 10⁻⁵), CNTN5 (P = 4.37 × 10⁻⁴), and TRPV4 (P = 1.58 × 10⁻³). Among ever-smokers, SERPINA1 showed the most significant association with FEV₁ (P = 8.41 × 10⁻⁵), followed by PDE4D (P = 1.22 × 10⁻⁴). The strongest association with FEV₁/FVC ratio was observed with ABCC1 (P = 4.38 × 10⁻⁴), and ESR1 (P = 5.42 × 10⁻⁴) among ever-smokers.
Conclusions: Polymorphisms spanning previously associated lung function genes did not show strong evidence for association with lung function measures in the SpiroMeta consortium population. Common SERPINA1 polymorphisms may affect FEV₁ among smokers in the general population.
Funded by: Cancer Research UK; Chief Scientist Office: CZB/4/710; Medical Research Council: G0000934, G0401540, G0600705, G0701863, G0800582, G0801056, G0902125, G0902313, G9815508, G990146, MC_QA137934, MC_U106179471, MC_U106188470; NHLBI NIH HHS: 5R01HL087679-02; NIDDK NIH HHS: U01 DK062418; NIMH NIH HHS: 1RL1MH083268-01; Wellcome Trust: 068545/Z/02, 076113/B/04/Z, 077016/Z/05/Z, 079895, 092731
PloS one 2011;6;5;e19382
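The SpiroMeta abstract above states that no locus passed its significance threshold of 1.3 × 10⁻⁵. The check below compares the P values quoted in the abstract against that threshold. The dictionary keys are labels chosen here for readability.

```python
THRESHOLD = 1.3e-5  # significance threshold quoted in the abstract

# P values as quoted in the abstract, keyed by (trait, gene, subgroup).
REPORTED = {
    ("FEV1", "MACROD2", "all"): 6.81e-5,
    ("FEV1", "CNTN5", "all"): 4.37e-4,
    ("FEV1", "TRPV4", "all"): 1.58e-3,
    ("FEV1", "SERPINA1", "ever-smokers"): 8.41e-5,
    ("FEV1", "PDE4D", "ever-smokers"): 1.22e-4,
    ("FEV1/FVC", "ABCC1", "all"): 4.38e-4,
    ("FEV1/FVC", "ESR1", "ever-smokers"): 5.42e-4,
}

# Loci whose P value falls below the threshold.
PASSING = [key for key, p in REPORTED.items() if p < THRESHOLD]

# The locus with the smallest P value, for comparison with the threshold.
STRONGEST = min(REPORTED, key=REPORTED.get)

if __name__ == "__main__":
    print("passing:", PASSING)
    print("strongest:", STRONGEST, REPORTED[STRONGEST])
```

Even the strongest signal, MACROD2, sits about fivefold above the threshold, consistent with the abstract's conclusion.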
Nature reviews. Microbiology 2011;9;9;633
RATT: Rapid Annotation Transfer Tool.
Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK. firstname.lastname@example.org
Second-generation sequencing technologies have made large-scale sequencing projects commonplace. However, making use of these datasets often requires gene function to be ascribed genome wide. Although tool development has kept pace with the changes in sequence production, for tasks such as mapping, de novo assembly or visualization, genome annotation remains a challenge. We have developed a method to rapidly provide accurate annotation for new genomes using previously annotated genomes as a reference. The method, implemented in a tool called RATT (Rapid Annotation Transfer Tool), transfers annotations from a high-quality reference to a new genome on the basis of conserved synteny. We demonstrate that a Mycobacterium tuberculosis genome or a single 2.5 Mb chromosome from a malaria parasite can be annotated in less than five minutes with only modest computational resources. RATT is available at http://ratt.sourceforge.net.
Funded by: Wellcome Trust: WT 085775/Z/08/Z
Nucleic acids research 2011;39;9;e57
Coordinating cell cycle progression via cyclin specificity.
Cell cycle (Georgetown, Tex.) 2011;10;24;4195-6
Quantitative proteomics reveals the basis for the biochemical specificity of the cell-cycle machinery.
The Gurdon Institute, University of Cambridge, Cambridge, UK.
Cyclin-dependent kinases comprise the conserved machinery that drives progress through the cell cycle, but how they do this in mammalian cells is still unclear. To identify the mechanisms by which cyclin-cdks control the cell cycle, we performed a time-resolved analysis of the in vivo interactors of cyclins E1, A2, and B1 by quantitative mass spectrometry. This global analysis of context-dependent protein interactions reveals the temporal dynamics of cyclin function in which networks of cyclin-cdk interactions vary according to the type of cyclin and cell-cycle stage. Our results explain the temporal specificity of the cell-cycle machinery, thereby providing a biochemical mechanism for the genetic requirement for multiple cyclins in vivo and reveal how the actions of specific cyclins are coordinated to control the cell cycle. Furthermore, we identify key substrates (Wee1 and c15orf42/Sld3) that reveal how cyclin A is able to promote both DNA replication and mitosis.
Funded by: Cancer Research UK: A7397; Wellcome Trust: 079643/Z/06/Z; Worldwide Cancer Research: 10-0908
Molecular cell 2011;43;3;406-17
Genome-wide association study identifies a locus at 7p15.2 associated with endometriosis.
Molecular Epidemiology, Queensland Institute of Medical Research, Herston, Queensland, Australia. email@example.com
Endometriosis is a common gynecological disease associated with pelvic pain and subfertility. We conducted a genome-wide association study (GWAS) in 3,194 individuals with surgically confirmed endometriosis (cases) and 7,060 controls from Australia and the UK. Polygenic predictive modeling showed significantly increased genetic loading among 1,364 cases with moderate to severe endometriosis. The strongest association signal was on 7p15.2 (rs12700667) for 'all' endometriosis (P = 2.6 × 10⁻⁷, odds ratio (OR) = 1.22, 95% CI 1.13-1.32) and for moderate to severe disease (P = 1.5 × 10⁻⁹, OR = 1.38, 95% CI 1.24-1.53). We replicated rs12700667 in an independent cohort from the United States of 2,392 self-reported, surgically confirmed endometriosis cases and 2,271 controls (P = 1.2 × 10⁻³, OR = 1.17, 95% CI 1.06-1.28), resulting in a genome-wide significant P value of 1.4 × 10⁻⁹ (OR = 1.20, 95% CI 1.13-1.27) for 'all' endometriosis in our combined datasets of 5,586 cases and 9,331 controls. rs12700667 is located in an intergenic region upstream of the plausible candidate genes NFE2L3 and HOXA10.
Funded by: Howard Hughes Medical Institute; NCI NIH HHS: P01 CA087969, R01 CA049449, R01 CA050385, R01 CA067262, U01 CA098233; NICHD NIH HHS: R01 HD052473, R01 HD057210; NIDDK NIH HHS: P01 DK070756; Wellcome Trust: 064890, 081682, 084766, 085235, WT084766/Z/08/Z, WT085235/Z/08/Z, WT91745/Z/10/Z
Nature genetics 2011;43;1;51-4
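The odds ratios, confidence intervals and P values in the endometriosis abstract above are linked through the log odds ratio. As a rough consistency check, the sketch below recovers the standard error from a 95% CI and derives a two-sided P value under a normal (Wald) approximation. It assumes the intervals are symmetric on the log scale. Because the published figures are rounded, it can be expected to reproduce only the order of magnitude of the reported P values.

```python
import math


def se_from_ci(lower, upper, z_crit=1.96):
    """Standard error of log(OR), recovered from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z_crit)


def z_and_p(odds_ratio, lower, upper):
    """Wald z statistic and two-sided normal P value for an odds ratio."""
    z = math.log(odds_ratio) / se_from_ci(lower, upper)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the normal
    return z, p


if __name__ == "__main__":
    # Values quoted in the abstract for rs12700667.
    for label, args in {
        "all endometriosis": (1.22, 1.13, 1.32),
        "moderate to severe": (1.38, 1.24, 1.53),
    }.items():
        z, p = z_and_p(*args)
        print(f"{label}: z = {z:.2f}, P = {p:.1e}")
```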
Identity-by-descent-based phasing and imputation in founder populations using graphical models.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.
Accurate knowledge of haplotypes, the combination of alleles co-residing on a single copy of a chromosome, enables powerful gene mapping and sequence imputation methods. Since humans are diploid, haplotypes must be derived from genotypes by a phasing process. In this study, we present a new computational model for haplotype phasing based on pairwise sharing of haplotypes inferred to be Identical-By-Descent (IBD). We apply the Bayesian network based model in a new phasing algorithm, called systematic long-range phasing (SLRP), that can capitalize on the close genetic relationships in isolated founder populations, and show with simulated and real genome-wide genotype data that SLRP substantially reduces the rate of phasing errors compared to previous phasing algorithms. Furthermore, the method accurately identifies regions of IBD, enabling linkage-like studies without pedigrees, and can be used to impute most genotypes with very low error rate.
Funded by: Chief Scientist Office: CZB/4/710; Medical Research Council: MC_U127561128; Wellcome Trust: 076113, 077192, 085475, WT077192
Genetic epidemiology 2011;35;8;853-60
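The phasing problem described in the abstract above arises because an unphased genotype is compatible with several haplotype pairs. The toy enumeration below illustrates that ambiguity only. It is not the SLRP algorithm, and the 0/1/2 genotype coding and function name are assumptions made for this sketch.

```python
from itertools import product


def consistent_haplotype_pairs(genotype):
    """Enumerate unordered haplotype pairs compatible with a genotype.

    genotype: sequence of alternate-allele counts per site (0, 1 or 2).
    Returns a set of (haplotype, haplotype) tuples, each haplotype a
    tuple of 0/1 alleles.
    """
    per_site = []
    for g in genotype:
        if g == 0:
            per_site.append([(0, 0)])
        elif g == 2:
            per_site.append([(1, 1)])
        elif g == 1:
            # Heterozygous site: the alternate allele may sit on either copy.
            per_site.append([(0, 1), (1, 0)])
        else:
            raise ValueError(f"invalid genotype value: {g}")

    pairs = set()
    for combo in product(*per_site):
        hap_a = tuple(a for a, _ in combo)
        hap_b = tuple(b for _, b in combo)
        pairs.add(tuple(sorted((hap_a, hap_b))))  # ignore pair order
    return pairs


if __name__ == "__main__":
    for pair in sorted(consistent_haplotype_pairs([1, 1, 0])):
        print(pair)
```

With k heterozygous sites (k ≥ 1) there are 2^(k-1) candidate pairs, which is why phasing methods need extra information such as haplotypes shared identical-by-descent between individuals.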
Insights into the genetic architecture of osteoarthritis from stage 1 of the arcOGEN study.
Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK.
Objectives: The genetic aetiology of osteoarthritis has not yet been elucidated. To enable a well-powered genome-wide association study (GWAS) for osteoarthritis, the authors have formed the arcOGEN Consortium, a UK-wide collaborative effort aiming to scan genome-wide over 7500 osteoarthritis cases in a two-stage genome-wide association scan. Here the authors report the findings of the stage 1 interim analysis.
Methods: The authors have performed a genome-wide association scan for knee and hip osteoarthritis in 3177 cases and 4894 population-based controls from the UK. Replication of promising signals was carried out in silico in five further scans (44,449 individuals), and de novo in 14,534 independent samples, all of European descent.
Results: None of the association signals the authors identified reach genome-wide levels of statistical significance, therefore stressing the need for corroboration in sample sets of a larger size. Application of analytical approaches to examine the allelic architecture of disease to the stage 1 genome-wide association scan data suggests that osteoarthritis is a highly polygenic disease with multiple risk variants conferring small effects.
Conclusions: Identifying loci conferring susceptibility to osteoarthritis will require large-scale sample sizes and well-defined phenotypes to minimise heterogeneity.
Funded by: Arthritis Research UK: 17489; Medical Research Council: G0901461, MC_U122886349; NIAMS NIH HHS: K24 AR048841, R01 AR052000
Annals of the rheumatic diseases 2011;70;5;864-7
Somatic SF3B1 mutation in myelodysplasia with ring sideroblasts.
Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, United Kingdom
Background: Myelodysplastic syndromes are a diverse and common group of chronic hematologic cancers. The identification of new genetic lesions could facilitate new diagnostic and therapeutic strategies.
Methods: We used massively parallel sequencing technology to identify somatically acquired point mutations across all protein-coding exons in the genome in 9 patients with low-grade myelodysplasia. Targeted resequencing of the gene encoding RNA splicing factor 3B, subunit 1 (SF3B1), was also performed in a cohort of 2087 patients with myeloid or other cancers.
Results: We identified 64 point mutations in the 9 patients. Recurrent somatically acquired mutations were identified in SF3B1. Follow-up revealed SF3B1 mutations in 72 of 354 patients (20%) with myelodysplastic syndromes, with particularly high frequency among patients whose disease was characterized by ring sideroblasts (53 of 82 [65%]). The gene was also mutated in 1 to 5% of patients with a variety of other tumor types. The observed mutations were less deleterious than was expected on the basis of chance, suggesting that the mutated protein retains structural integrity with altered function. SF3B1 mutations were associated with down-regulation of key gene networks, including core mitochondrial pathways. Clinically, patients with SF3B1 mutations had fewer cytopenias and longer event-free survival than patients without SF3B1 mutations.
Conclusions: Mutations in SF3B1 implicate abnormalities of messenger RNA splicing in the pathogenesis of myelodysplastic syndromes. (Funded by the Wellcome Trust and others.)
Funded by: Medical Research Council: G0800784, G1000729, MC_U105161083; NCI NIH HHS: P01 CA078378, P01 CA078378-10, R01 CA124929, R01 CA124929-05; PHS HHS: P01-155249, P01-78378, P50-100007, R01-124929; Wellcome Trust: 077012/Z/05/Z, 088340, 093867, WT088340MA
The New England journal of medicine 2011;365;15;1384-95
Fetal-specific DNA methylation ratio permits noninvasive prenatal diagnosis of trisomy 21.
Cytogenetics and Genomics Department, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus.
The trials performed worldwide toward noninvasive prenatal diagnosis (NIPD) of Down's syndrome (or trisomy 21) have shown the commercial and medical potential of NIPD compared to the currently used invasive prenatal diagnostic procedures. Extensive investigation of methylation differences between the mother and the fetus has led to the identification of differentially methylated regions (DMRs). In this study, we present a strategy using the methylated DNA immunoprecipitation (MeDiP) methodology in combination with real-time quantitative PCR (qPCR) to achieve fetal chromosome dosage assessment, which can be performed noninvasively through the analysis of fetal-specific DMRs. We achieved noninvasive prenatal detection of trisomy 21 by determining the methylation ratio of normal and trisomy 21 cases for each tested fetal-specific DMR present in maternal peripheral blood, followed by further statistical analysis. The application of this fetal-specific methylation ratio approach provided correct diagnosis of 14 trisomy 21 and 26 normal cases.
Funded by: Wellcome Trust: 079643
Nature medicine 2011;17;4;510-3
Fetal-specific DNA methylation ratio permits noninvasive prenatal diagnosis of trisomy 21
Obstetrical and Gynecological Survey 2011;66;419
Bacterial epidemiology and biology--lessons from genome sequencing.
The Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Next-generation sequencing has ushered in a new era of microbial genomics, enabling the detailed historical and geographical tracing of bacteria. This is helping to shape our understanding of bacterial evolution.
Funded by: Wellcome Trust
Genome biology 2011;12;10;230
Joint genetic analysis of gene expression data with inferred cellular phenotypes.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom. firstname.lastname@example.org
Even within a defined cell type, the expression level of a gene differs in individual samples. The effects of genotype, measured factors such as environmental conditions, and their interactions have been explored in recent studies. Methods have also been developed to identify unmeasured intermediate factors that coherently influence transcript levels of multiple genes. Here, we show how to bring these two approaches together and analyse genetic effects in the context of inferred determinants of gene expression. We use a sparse factor analysis model to infer hidden factors, which we treat as intermediate cellular phenotypes that in turn affect gene expression in a yeast dataset. We find that the inferred phenotypes are associated with locus genotypes and environmental conditions and can explain genetic associations to genes in trans. For the first time, we consider and find interactions between genotype and intermediate phenotypes inferred from gene expression levels, complementing and extending established results.
Funded by: Wellcome Trust: WT077192/Z/05/Z
PLoS genetics 2011;7;1;e1001276
Maps of open chromatin guide the functional follow-up of genome-wide association signals: application to hematological traits.
Wellcome Trust Sanger Institute, Hinxton, United Kingdom. email@example.com
Turning genetic discoveries identified in genome-wide association (GWA) studies into biological mechanisms is an important challenge in human genetics. Many GWA signals map outside exons, suggesting that the associated variants may lie within regulatory regions. We applied the formaldehyde-assisted isolation of regulatory elements (FAIRE) method in a megakaryocytic and an erythroblastoid cell line to map active regulatory elements at known loci associated with hematological quantitative traits, coronary artery disease, and myocardial infarction. We showed that the two cell types exhibit distinct patterns of open chromatin and that cell-specific open chromatin can guide the finding of functional variants. We identified an open chromatin region at chromosome 7q22.3 in megakaryocytes but not erythroblasts, which harbors the common non-coding sequence variant rs342293 known to be associated with platelet volume and function. Resequencing of this open chromatin region in 643 individuals provided strong evidence that rs342293 is the only putative causative variant in this region. We demonstrated that the C- and G-alleles differentially bind the transcription factor EVI1 affecting PIK3CG gene expression in platelets and macrophages. A protein-protein interaction network including up- and down-regulated genes in Pik3cg knockout mice indicated that PIK3CG is associated with gene pathways with an established role in platelet membrane biogenesis and thrombus formation. Thus, rs342293 is the functional common variant at this locus; to the best of our knowledge this is the first such variant to be elucidated among the known platelet quantitative trait loci (QTLs). Our data suggested a molecular mechanism by which a non-coding GWA index SNP modulates platelet phenotype.
Funded by: British Heart Foundation: RG/09/012/28096, RG/09/12/28096; Medical Research Council: G0800784, G0900339, MC_U105260799; Wellcome Trust: 081917/Z/07/Z, 091746/Z/10/Z
PLoS genetics 2011;7;6;e1002139
Acquired bleeding disorders
Blood and Bone Marrow Pathology 2011;565-82
Citrobacter rodentium is an unstable pathogen showing evidence of significant genomic flux.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.
Citrobacter rodentium is a natural mouse pathogen that causes attaching and effacing (A/E) lesions. It shares a common virulence strategy with the clinically significant human A/E pathogens enteropathogenic E. coli (EPEC) and enterohaemorrhagic E. coli (EHEC) and is widely used to model this route of pathogenesis. We previously reported the complete genome sequence of C. rodentium ICC168, where we found that the genome displayed many characteristics of a newly evolved pathogen. In this study, through PFGE, sequencing of isolates showing variation, whole genome transcriptome analysis and examination of the mobile genetic elements, we found that, consistent with our previous hypothesis, the genome of C. rodentium is unstable as a result of repeat-mediated, large-scale genome recombination and because of active transposition of mobile genetic elements such as the prophages. We sequenced an additional C. rodentium strain, EX-33, to reveal that the reference strain ICC168 is representative of the species and that most of the inactivating mutations were common to both isolates and likely to have occurred early on in the evolution of this pathogen. We draw parallels with the evolution of other bacterial pathogens and conclude that C. rodentium is a recently evolved pathogen that may have emerged alongside the development of inbred mice as a model for human disease.
Funded by: Biotechnology and Biological Sciences Research Council; Medical Research Council; Wellcome Trust
PLoS pathogens 2011;7;4;e1002018
A scalable pipeline for highly effective genetic modification of a malaria parasite.
The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.
In malaria parasites, the systematic experimental validation of drug and vaccine targets by reverse genetics is constrained by the inefficiency of homologous recombination and by the difficulty of manipulating adenine and thymine (A+T)-rich DNA of most Plasmodium species in Escherichia coli. We overcame these roadblocks by creating a high-integrity library of Plasmodium berghei genomic DNA (>77% A+T content) in a bacteriophage N15-based vector that can be modified efficiently using the lambda Red method of recombineering. We built a pipeline for generating P. berghei genetic modification vectors at genome scale in serial liquid cultures on 96-well plates. Vectors have long homology arms, which increase recombination frequency up to tenfold over conventional designs. The feasibility of efficient genetic modification at scale will stimulate collaborative, genome-wide knockout and tagging programs for P. berghei.
Funded by: Medical Research Council: G0501670, G0501670(76331); Wellcome Trust: 089085, WT089085/Z/09/Z
Nature methods 2011;8;12;1078-82
Mendelian randomization study of B-type natriuretic peptide and type 2 diabetes: evidence of causal association from population studies.
Medical Research Council Epidemiology Unit, Institute of Metabolic Science, University of Cambridge, Cambridge, United Kingdom.
Background: Genetic and epidemiological evidence suggests an inverse association between B-type natriuretic peptide (BNP) levels in blood and risk of type 2 diabetes (T2D), but the prospective association of BNP with T2D is uncertain, and it is unclear whether the association is confounded.
Methods and findings: We analysed the association between levels of the N-terminal fragment of pro-BNP (NT-pro-BNP) in blood and risk of incident T2D in a prospective case-cohort study and genotyped the variant rs198389 within the BNP locus in three T2D case-control studies. We combined our results with existing data in a meta-analysis of 11 case-control studies. Using a Mendelian randomization approach, we compared the observed association between rs198389 and T2D to that expected from the NT-pro-BNP level to T2D association and the NT-pro-BNP difference per C allele of rs198389. In participants of our case-cohort study who were free of T2D and cardiovascular disease at baseline, we observed a 21% (95% CI 3%-36%) decreased risk of incident T2D per one standard deviation (SD) higher log-transformed NT-pro-BNP levels in analysis adjusted for age, sex, body mass index, systolic blood pressure, smoking, family history of T2D, history of hypertension, and levels of triglycerides, high-density lipoprotein cholesterol, and low-density lipoprotein cholesterol. The association between rs198389 and T2D observed in case-control studies (odds ratio = 0.94 per C allele, 95% CI 0.91-0.97) was similar to that expected (0.96, 0.93-0.98) based on the pooled estimate for the log-NT-pro-BNP level to T2D association derived from a meta-analysis of our study and published data (hazard ratio = 0.82 per SD, 0.74-0.90) and the difference in NT-pro-BNP levels (0.22 SD, 0.15-0.29) per C allele of rs198389. No significant associations were observed between the rs198389 genotype and potential confounders.
Conclusions: Our results provide evidence for a potential causal role of the BNP system in the aetiology of T2D. Further studies are needed to investigate the mechanisms underlying this association and possibilities for preventive interventions.
Funded by: British Heart Foundation: FS/10/005/28147; Medical Research Council: G0401527, G0601463, G1000143; Wellcome Trust: 077016/Z/05/Z
PLoS medicine 2011;8;10;e1001112
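The "expected" per-allele odds ratio quoted in the Mendelian randomization abstract above can be approximately reproduced from the two point estimates it reports. The following is a minimal sketch, assuming the usual log-linear scaling (expected ratio = exp(SD difference per allele × ln(ratio per SD))); the function name is hypothetical, and the sketch does not propagate the uncertainty behind the reported interval (0.93-0.98).

```python
import math

def expected_per_allele_ratio(ratio_per_sd: float, sd_diff_per_allele: float) -> float:
    """Scale a per-SD ratio to a per-allele ratio on the log scale.

    Assumption (not stated in the abstract): the biomarker-to-outcome
    association is log-linear, so the effects multiply on the log scale.
    """
    return math.exp(sd_diff_per_allele * math.log(ratio_per_sd))

# Point estimates taken from the abstract.
hr_per_sd = 0.82        # pooled hazard ratio per SD higher log NT-pro-BNP
sd_per_c_allele = 0.22  # NT-pro-BNP difference (in SD) per C allele of rs198389

expected = expected_per_allele_ratio(hr_per_sd, sd_per_c_allele)
print(round(expected, 2))  # 0.96, matching the expected value in the abstract
```

Under this assumption the computed value agrees with the reported expectation of 0.96, which the abstract compares with the observed odds ratio of 0.94 per C allele.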
Jamb and jamc are essential for vertebrate myocyte fusion.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.
Cellular fusion is required in the development of several tissues, including skeletal muscle. In vertebrates, this process is poorly understood and lacks an in vivo-validated cell surface heterophilic receptor pair that is necessary for fusion. Identification of essential cell surface interactions between fusing cells is an important step in elucidating the molecular mechanism of cellular fusion. We show here that the zebrafish orthologues of JAM-B and JAM-C receptors are essential for fusion of myocyte precursors to form syncytial muscle fibres. Both jamb and jamc are dynamically co-expressed in developing muscles and encode receptors that physically interact. Heritable mutations in either gene prevent myocyte fusion in vivo, resulting in an overabundance of mononuclear, but otherwise overtly normal, functional fast-twitch muscle fibres. Transplantation experiments show that the Jamb and Jamc receptors must interact between neighbouring cells (in trans) for fusion to occur. We also show that jamc is ectopically expressed in prdm1a mutant slow muscle precursors, which inappropriately fuse with other myocytes, suggesting that control of myocyte fusion through regulation of jamc expression has important implications for the growth and patterning of muscles. Our discovery of a receptor-ligand pair critical for fusion in vivo has important implications for understanding the molecular mechanisms responsible for myocyte fusion and its regulation in vertebrate myogenesis.
Funded by: Wellcome Trust: 077047/Z/05/Z, 077108/Z/05/Z
PLoS biology 2011;9;12;e1001216
A resource of vectors and ES cells for targeted deletion of microRNAs in mice.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
The 21-23 nucleotide, single-stranded RNAs classified as microRNAs (miRNA) perform fundamental roles in diverse cellular and developmental processes. In contrast to the situation for protein-coding genes, no public resource of miRNA mouse mutant alleles exists. Here we describe a collection of 428 miRNA targeting vectors covering 476 of the miRNA genes annotated in the miRBase registry. Using these vectors, we generated a library of highly germline-transmissible C57BL/6N mouse embryonic stem (ES) cell clones harboring targeted deletions for 392 miRNA genes. For most of these targeted clones, chimerism and germline transmission can be scored through a coat color marker. The targeted alleles have been designed to be adaptable research tools that can be efficiently altered by recombinase-mediated cassette exchange to create reporter, conditional and other allelic variants. This miRNA knockout (mirKO) resource can be searched electronically and is available from ES cell repositories for distribution to the scientific community.
Funded by: Wellcome Trust: 079643, WT079643
Nature biotechnology 2011;29;9;840-5
Genomic libraries: I. Construction and screening of fosmid genomic libraries.
Sequencing Research and Development, Wellcome Trust Sanger Institute, Cambridge, UK.
Large insert genome libraries have been a core resource required to sequence genomes, analyze haplotypes, and aid gene discovery. While next generation sequencing technologies are revolutionizing the field of genomics, traditional genome libraries will still be required for accurate genome assembly. Their utility is also being extended to functional studies for understanding DNA regulatory elements. Here, we present a detailed method for constructing genomic fosmid libraries, testing for common contaminants, gridding the library to nylon membranes, then hybridizing the library membranes with a radiolabeled probe to identify corresponding genomic clones. While this chapter focuses on fosmid libraries, many of these steps can also be applied to bacterial artificial chromosome libraries.
Methods in molecular biology (Clifton, N.J.) 2011;772;37-58
Genomic libraries: II. Subcloning, sequencing, and assembling large-insert genomic DNA clones.
Sequencing Research and Development, Wellcome Trust Sanger Institute, Cambridge, UK.
Sequencing large insert clones to completion is useful for characterizing specific genomic regions, identifying haplotypes, and closing gaps in whole genome sequencing projects. Despite being a standard technique in molecular laboratories, DNA sequencing using the Sanger method can be highly problematic when complex secondary structures or sequence repeats are encountered in genomic clones. Here, we describe methods to isolate DNA from a large insert clone (fosmid or BAC), subclone the sample, and sequence the region to the highest industry standard. Troubleshooting solutions for sequencing difficult templates are discussed.
Methods in molecular biology (Clifton, N.J.) 2011;772;59-81
Early Diagnosis of Werner's Syndrome Using Exome-Wide Sequencing in a Single, Atypical Patient.
Institute of Metabolic Science, University of Cambridge Metabolic Research Laboratories Cambridge, UK.
Genetic diagnosis of inherited metabolic disease is conventionally achieved through syndrome recognition and targeted gene sequencing, but many patients receive no specific diagnosis. Next-generation sequencing allied to capture of expressed sequences from genomic DNA now offers a powerful new diagnostic approach. Barriers to routine diagnostic use include cost, and the complexity of interpreting results arising from simultaneous identification of large numbers of variants. We applied exome-wide sequencing to an individual, 16-year-old daughter of consanguineous parents with a novel syndrome of short stature, severe insulin resistance, ptosis, and microcephaly. Pulldown of expressed sequences from genomic DNA followed by massively parallel sequencing was undertaken. Single nucleotide variants were called using SAMtools prior to filtering based on sequence quality and existence in control genomes and exomes. Of 485 genetic variants predicted to alter protein sequence and absent from control data, 24 were homozygous in the patient. One mutation - the p.Arg732X mutation in the WRN gene - has previously been reported in Werner's syndrome (WS). On re-evaluation of the patient several early features of WS were detected including loss of fat from the extremities and frontal hair thinning. Lymphoblastoid cells from the proband exhibited a defective decatenation checkpoint, consistent with loss of WRN activity. We have thus diagnosed WS some 15 years earlier than average, permitting aggressive prophylactic therapy and screening for WS complications, illustrating the potential of exome-wide sequencing to achieve early diagnosis and change management of rare autosomal recessive disease, even in individual patients of consanguineous parentage with apparently novel syndromes.
Funded by: Cancer Research UK: 8300; Medical Research Council: G0700733; Wellcome Trust: 095515
Frontiers in endocrinology 2011;2;8
Founder effect in the Horn of Africa for an insulin receptor mutation that may impair receptor recycling.
University of Cambridge Metabolic Research Laboratories, Institute of Metabolic Science, University of Cambridge, Addenbrooke's Hospital B289, Cambridge, CB2 0QR, UK.
Aims/hypothesis: Genetic insulin receptoropathies are a rare cause of severe insulin resistance. We identified the Ile119Met missense mutation in the insulin receptor INSR gene, previously reported in a Yemeni kindred, in four unrelated patients with Somali ancestry. We aimed to investigate a possible genetic founder effect, and to study the mechanism of loss of function of the mutant receptor.
Methods: Biochemical profiling and DNA haplotype analysis of affected patients were performed. Insulin receptor expression in lymphoblastoid cells from a homozygous p.Ile119Met INSR patient, and in cells heterologously expressing the mutant receptor, was examined. Insulin binding, insulin-stimulated receptor autophosphorylation, and cooperativity and pH dependency of insulin dissociation were also assessed.
Results: All patients had biochemical profiles pathognomonic of insulin receptoropathy, while haplotype analysis revealed the putative shared region around the INSR mutant to be no larger than 28 kb. An increased insulin proreceptor to β subunit ratio was seen in patient-derived cells. Steady state insulin binding and insulin-stimulated autophosphorylation of the mutant receptor were normal; however, the receptor exhibited decreased insulin dissociation rates with preserved cooperativity, a difference accentuated at low pH.
Conclusions/interpretation: The p.Ile119Met INSR appears to have arisen around the Horn of Africa, and should be sought first in severely insulin resistant patients with ancestry from this region. Despite collectively compelling genetic, clinical and biochemical evidence for its pathogenicity, loss of function in conventional in vitro assays is subtle, suggesting mildly impaired receptor recycling only.
Funded by: Medical Research Council; Wellcome Trust: 077016/Z/05/Z, 078986/Z/06/Z, 080952/Z/06/Z, 087678/Z/08/Z, 095515
Evidence that Cd101 is an autoimmune diabetes gene in nonobese diabetic mice.
Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, Cambridge Institute for Medical Research, University of Cambridge, Cambridge CB2 0XY, United Kingdom.
We have previously proposed that sequence variation of the CD101 gene between NOD and C57BL/6 mice accounts for the protection from type 1 diabetes (T1D) provided by the insulin-dependent diabetes susceptibility region 10 (Idd10), a <1 Mb region on mouse chromosome 3. In this study, we provide further support for the hypothesis that Cd101 is Idd10 using haplotype and expression analyses of novel Idd10 congenic strains coupled to the development of a CD101 knockout mouse. Susceptibility to T1D was correlated with genotype-dependent CD101 expression on multiple cell subsets, including Foxp3(+) regulatory CD4(+) T cells, CD11c(+) dendritic cells, and Gr1(+) myeloid cells. The correlation of CD101 expression on immune cells from four independent Idd10 haplotypes with the development of T1D supports the identity of Cd101 as Idd10. Because CD101 has been associated with regulatory T and Ag presentation cell functions, our results provide a further link between immune regulation and susceptibility to T1D.
Funded by: NIAID NIH HHS: AI 15416, N01 AI015416, P01 AI039671, P01 AI039671-16, P01AI039671; NIDDK NIH HHS: P30 DK078392, P30 DK078392-01, R01 DK084054, R01 DK084054-03, R01DK084054; Wellcome Trust: 079895, 091157
Journal of immunology (Baltimore, Md. : 1950) 2011;187;1;325-36
Cutting edge: the membrane attack complex of complement is required for the development of murine experimental cerebral malaria.
Department of Microbiology, University of Alabama at Birmingham, Birmingham, AL 35294, USA.
Cerebral malaria is the most severe complication of Plasmodium falciparum infection and accounts for a large number of malaria fatalities worldwide. Recent studies demonstrated that C5(-/-) mice are resistant to experimental cerebral malaria (ECM) and suggested that protection was due to loss of C5a-induced inflammation. Surprisingly, we observed that C5aR(-/-) mice were fully susceptible to disease, indicating that C5a is not required for ECM. C3aR(-/-) and C3aR(-/-) × C5aR(-/-) mice were equally susceptible to ECM as were wild-type mice, indicating that neither complement anaphylatoxin receptor is critical for ECM development. In contrast, C9 deposition in the brains of mice with ECM suggested an important role for the terminal complement pathway. Treatment with anti-C9 Ab significantly increased survival time and reduced mortality in ECM. Our data indicate that protection from ECM in C5(-/-) mice is mediated through inhibition of membrane attack complex formation and not through C5a-induced inflammation.
Funded by: Medical Research Council: G0501670; NIAID NIH HHS: AI08382, R03 AI083820, R03 AI083820-02, T32 AI007051, T32 AI007051-35, T32 AI07051
Journal of immunology (Baltimore, Md. : 1950) 2011;186;12;6657-60
Kino: A generic document management system for biologists using SA-REST and faceted search
Proceedings - 5th IEEE International Conference on Semantic Computing, ICSC 2011;205-8
A plethora of Plasmodium species in wild apes: a source of human infection?
Malaria Programme, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.
Recent studies of captive and wild-living apes in Africa have uncovered evidence of numerous new Plasmodium species, one of which was identified as the immediate precursor of human Plasmodium falciparum. These findings raise the question whether wild apes could be a recurrent source of Plasmodium infections in humans. This question is not new, but was the subject of intense investigation by researchers in the first half of the last century. Re-examination of their work in the context of recent molecular findings provides a new framework to understand the diversity of Plasmodium species and to assess the risk of future cross-species transmissions to humans in the context of proposed malaria eradication programs.
Funded by: NIAID NIH HHS: P30 AI 27767, R01 AI091595, R01 AI50529, R01 AI58715, R03 AI074778, R37 AI050529; Wellcome Trust
Trends in parasitology 2011;27;5;222-9
Genome sequencing gets func-y.
Nature reviews. Microbiology 2011;9;6;401
Genetic predisposition to long-term nondiabetic deteriorations in glucose homeostasis: Ten-year follow-up of the GLACIER study.
Department of Public Health and Clinical Medicine, Umeå University Hospital, Sweden.
Objective: To assess whether recently discovered genetic loci associated with hyperglycemia also predict long-term changes in glycemic traits.
Research design and methods: Sixteen fasting glucose-raising loci were genotyped in middle-aged adults from the Gene x Lifestyle interactions And Complex traits Involved in Elevated disease Risk (GLACIER) Study, a population-based prospective cohort study from northern Sweden. Genotypes were tested for association with baseline fasting and 2-h postchallenge glycemia (N = 16,330), and for changes in these glycemic traits during a 10-year follow-up period (N = 4,059).
Results: Cross-sectional directionally consistent replication with fasting glucose concentrations was achieved for 12 of 16 variants; 10 variants were also associated with impaired fasting glucose (IFG) and 7 were independently associated with 2-h postchallenge glucose concentrations. In prospective analyses, the effect alleles at four loci (GCK rs4607517, ADRA2A rs10885122, DGKB-TMEM195 rs2191349, and G6PC2 rs560887) were nominally associated with worsening fasting glucose concentrations during 10 years of follow-up. MTNR1B rs10830963, which was predictive of elevated fasting glucose concentrations in cross-sectional analyses, was associated with a protective effect on postchallenge glucose concentrations during follow-up; however, this was only when baseline fasting and 2-h glucoses were adjusted for. An additive effect of multiple risk alleles on glycemic traits was observed: a weighted genetic risk score (80th vs. 20th centiles) was associated with a 0.16 mmol/l (P = 2.4 × 10⁻⁶) greater elevation in fasting glucose and a 64% (95% CI: 33-201%) higher risk of developing IFG during 10 years of follow-up.
Conclusions: Our findings imply that genetic profiling might facilitate the early detection of persons who are genetically susceptible to deteriorating glucose control; studies of incident type 2 diabetes and discrete cardiovascular end points will help establish whether the magnitude of these changes is clinically relevant.
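The weighted genetic risk score mentioned in the GLACIER abstract above is not specified in detail there. The sketch below assumes the common construction, in which each person's risk-allele counts (0, 1 or 2 per variant) are multiplied by a per-variant weight and summed. The SNP identifiers come from the abstract; the weights and genotypes are hypothetical illustration values, not study data.

```python
def weighted_risk_score(allele_counts: dict, weights: dict) -> float:
    """Sum of risk-allele counts (0, 1 or 2) times a per-variant weight.

    Assumption: this is the conventional weighted-score construction;
    the abstract does not state how the study's score was built.
    """
    return sum(weights[snp] * count for snp, count in allele_counts.items())

# HYPOTHETICAL weights (e.g. per-allele effects in mmol/l); not from the study.
weights = {"rs4607517": 0.06, "rs560887": 0.07, "rs10830963": 0.08}

# HYPOTHETICAL genotype for one participant: number of risk alleles per variant.
participant = {"rs4607517": 1, "rs560887": 2, "rs10830963": 0}

score = weighted_risk_score(participant, weights)
print(f"{score:.2f}")  # 0.06*1 + 0.07*2 + 0.08*0
```

Scores computed this way for a whole cohort could then be compared between centiles, such as the 80th versus 20th contrast the abstract reports.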
Effect of using varying negative examples in transcription factor binding site predictions
Lecture Notes in Computer Science 2011;6623 LNCS;1-12
The IKMC web portal: a central point of entry to data and resources from the International Knockout Mouse Consortium.
The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA.
The International Knockout Mouse Consortium (IKMC) aims to mutate all protein-coding genes in the mouse using a combination of gene targeting and gene trapping in mouse embryonic stem (ES) cells and to make the generated resources readily available to the research community. The IKMC database and web portal (www.knockoutmouse.org) serves as the central public web site for IKMC data and facilitates the coordination and prioritization of work within the consortium. Researchers can access up-to-date information on IKMC knockout vectors, ES cells and mice for specific genes, and follow links to the respective repositories from which corresponding IKMC products can be ordered. Researchers can also use the web site to nominate genes for targeting, or to indicate that targeting of a gene should receive high priority. The IKMC database provides data to, and features extensive interconnections with, other community databases.
Funded by: Medical Research Council: MC_U127527203; NHGRI NIH HHS: HG004074
Nucleic acids research 2011;39;Database issue;D849-55
Drug-resistant genotypes and multi-clonality in Plasmodium falciparum analysed by direct genome sequencing from peripheral blood of malaria patients.
Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom.
Naturally acquired blood-stage infections of the malaria parasite Plasmodium falciparum typically harbour multiple haploid clones. The apparent number of clones observed in any single infection depends on the diversity of the polymorphic markers used for the analysis, and the relative abundance of rare clones, which frequently fail to be detected among PCR products derived from numerically dominant clones. However, minority clones are of clinical interest as they may harbour genes conferring drug resistance, leading to enhanced survival after treatment and the possibility of subsequent therapeutic failure. We deployed new generation sequencing to derive genome data for five non-propagated parasite isolates taken directly from 4 different patients treated for clinical malaria in a UK hospital. Analysis of depth of coverage and length of sequence intervals between paired reads identified both previously described and novel gene deletions and amplifications. Full-length sequence data was extracted for 6 loci considered to be under selection by antimalarial drugs, and both known and previously unknown amino acid substitutions were identified. Full mitochondrial genomes were extracted from the sequencing data for each isolate, and these are compared against a panel of polymorphic sites derived from published or unpublished but publicly available data. Finally, genome-wide analysis of clone multiplicity was performed, and the number of infecting parasite clones estimated for each isolate. Each patient harboured at least 3 clones of P. falciparum by this analysis, consistent with results obtained with conventional PCR analysis of polymorphic merozoite antigen loci. We conclude that genome sequencing of peripheral blood P. falciparum taken directly from malaria patients provides high quality data useful for drug resistance studies, genomic structural analyses and population genetics, and also robustly represents clonal multiplicity.
Funded by: Medical Research Council: G0600718, G19/9; Wellcome Trust: 077012/Z/05/Z, 090532
PloS one 2011;6;8;e23204
Chromosome and gene copy number variation allow major structural change between species and strains of Leishmania.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SA, United Kingdom.
Leishmania parasites cause a spectrum of clinical pathology in humans ranging from disfiguring cutaneous lesions to fatal visceral leishmaniasis. We have generated a reference genome for Leishmania mexicana and refined the reference genomes for Leishmania major, Leishmania infantum, and Leishmania braziliensis. This has allowed the identification of a remarkably low number of genes or paralog groups (2, 14, 19, and 67, respectively) unique to one species. These were found to be conserved in additional isolates of the same species. We have predicted allelic variation and find that in these isolates, L. major and L. infantum have a surprisingly low number of predicted heterozygous SNPs compared with L. braziliensis and L. mexicana. We used short read coverage to infer ploidy and gene copy numbers, identifying large copy number variations between species, with 200 tandem gene arrays in L. major and 132 in L. mexicana. Chromosome copy number also varied significantly between species, with nine supernumerary chromosomes in L. infantum, four in L. mexicana, two in L. braziliensis, and one in L. major. A significant bias against gene arrays on supernumerary chromosomes was shown to exist, indicating that duplication events occur more frequently on disomic chromosomes. Taken together, our data demonstrate that there is little variation in unique gene content across Leishmania species, but large-scale genetic heterogeneity can result through gene amplification on disomic chromosomes and variation in chromosome number. Increased gene copy number due to chromosome amplification may contribute to alterations in gene expression in response to environmental conditions in the host, providing a genetic basis for disease tropism.
Funded by: Wellcome Trust: 076355, 085775, 085822
Genome research 2011;21;12;2129-42
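The Leishmania abstract above states that short read coverage was used to infer ploidy, without giving the procedure. One simple way such an inference can work is sketched here, assuming that most chromosomes are disomic, so the median per-chromosome read depth represents two copies. The function name, chromosome labels and depth values are hypothetical.

```python
from statistics import median

def infer_chromosome_copy_number(depth_by_chrom: dict, baseline_ploidy: int = 2) -> dict:
    """Estimate copy number per chromosome from mean read depth.

    Assumption: the median depth across chromosomes corresponds to the
    baseline (disomic) state; this is an illustrative heuristic, not the
    method used in the paper.
    """
    baseline_depth = median(depth_by_chrom.values())
    return {
        chrom: round(baseline_ploidy * depth / baseline_depth)
        for chrom, depth in depth_by_chrom.items()
    }

# HYPOTHETICAL mean read depths per chromosome.
depths = {"chr1": 40.0, "chr2": 41.0, "chr3": 39.0, "chr4": 60.5, "chr5": 40.0}

copies = infer_chromosome_copy_number(depths)
print(copies)  # chr4 has about 1.5x the baseline depth, so it is called trisomic
```

Using the median rather than the mean keeps a few supernumerary chromosomes from shifting the baseline, although the heuristic would fail if most chromosomes were not disomic.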
Emergent neutrality in adaptive asexual evolution.
Institut für Theoretische Physik, Universität zu Köln, 50937 Köln, Germany.
In nonrecombining genomes, genetic linkage can be an important evolutionary force. Linkage generates interference interactions, by which simultaneously occurring mutations affect each other's chance of fixation. Here, we develop a comprehensive model of adaptive evolution in linked genomes, which integrates interference interactions between multiple beneficial and deleterious mutations into a unified framework. By an approximate analytical solution, we predict the fixation rates of these mutations, as well as the probabilities of beneficial and deleterious alleles at fixed genomic sites. We find that interference interactions generate a regime of emergent neutrality: all genomic sites with selection coefficients smaller in magnitude than a characteristic threshold have nearly random fixed alleles, and both beneficial and deleterious mutations at these sites have nearly neutral fixation rates. We show that this dynamic limits not only the speed of adaptation, but also a population's degree of adaptation in its current environment. We apply the model to different scenarios: stationary adaptation in a time-dependent environment and approach to equilibrium in a fixed environment. In both cases, the analytical predictions are in good agreement with numerical simulations. Our results suggest that interference can severely compromise biological functions in an adapting population, which sets viability limits on adaptive evolution under linkage.
Funded by: Wellcome Trust: 091747
Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease.
Universität zu Lübeck, Medizinische Klinik II, Lübeck, Germany.
We performed a meta-analysis of 14 genome-wide association studies of coronary artery disease (CAD) comprising 22,233 individuals with CAD (cases) and 64,762 controls of European descent followed by genotyping of top association signals in 56,682 additional individuals. This analysis identified 13 loci newly associated with CAD at P < 5 × 10⁻⁸ and confirmed the association of 10 of 12 previously reported CAD loci. The 13 new loci showed risk allele frequencies ranging from 0.13 to 0.91 and were associated with a 6% to 17% increase in the risk of CAD per allele. Notably, only three of the new loci showed significant association with traditional CAD risk factors and the majority lie in gene regions not previously implicated in the pathogenesis of CAD. Finally, five of the new CAD risk loci appear to have pleiotropic effects, showing strong association with various other human diseases or traits.
Funded by: British Heart Foundation: PG/08/094/26019, RG/08/014/24067, RG/09/012/28096; Medical Research Council: G0401527, G0801566, G1000143, MC_U106179471; NHLBI NIH HHS: HL087647, N01 HC025195, R01 HL087647, R01HL089650-02
Nature genetics 2011;43;4;333-8
A role for cohesin in T-cell-receptor rearrangement and thymocyte differentiation.
Lymphocyte Development Group, MRC Clinical Sciences Centre, Imperial College London, Du Cane Road, London W12 0NN, UK.
Cohesin enables post-replicative DNA repair and chromosome segregation by holding sister chromatids together from the time of DNA replication in S phase until mitosis. There is growing evidence that cohesin also forms long-range chromosomal cis-interactions and may regulate gene expression in association with CTCF, mediator or tissue-specific transcription factors. Human cohesinopathies such as Cornelia de Lange syndrome are thought to result from impaired non-canonical cohesin functions, but a clear distinction between the cell-division-related and cell-division-independent functions of cohesin, as exemplified in Drosophila, has not been demonstrated in vertebrate systems. To address this, here we deleted the cohesin locus Rad21 in mouse thymocytes at a time in development when these cells stop cycling and rearrange their T-cell receptor (TCR) α locus (Tcra). Rad21-deficient thymocytes had a normal lifespan and retained the ability to differentiate, albeit with reduced efficiency. Loss of Rad21 led to defective chromatin architecture at the Tcra locus, where cohesin-binding sites flank the TEA promoter and the Eα enhancer, and demarcate Tcra from interspersed Tcrd elements and neighbouring housekeeping genes. Cohesin was required for long-range promoter-enhancer interactions, Tcra transcription, H3K4me3 histone modifications that recruit the recombination machinery and Tcra rearrangement. Provision of pre-rearranged TCR transgenes largely rescued thymocyte differentiation, demonstrating that among thousands of potential target genes across the genome, defective Tcra rearrangement was limiting for the differentiation of cohesin-deficient thymocytes. These findings firmly establish a cell-division-independent role for cohesin in Tcra locus rearrangement and provide a comprehensive account of the mechanisms by which cohesin enables cellular differentiation in a well-characterized mammalian system.
Funded by: Cancer Research UK: 13031; Howard Hughes Medical Institute; Medical Research Council: MC_U120027516, MC_U120081295; NIAID NIH HHS: R37 AI032524, R37 AI032524-20; NIGMS NIH HHS: R37 GM041052, R37 GM041052-22; Wellcome Trust
Silencing of RhoA nucleotide exchange factor, ARHGEF3, reveals its unexpected role in iron uptake.
Department of Haematology, University of Cambridge and NHS Blood and Transplant, Cambridge, UK.
Genomewide association meta-analysis studies have identified > 100 independent genetic loci associated with blood cell indices, including volume and count of platelets and erythrocytes. Although several of these loci encode known regulators of hematopoiesis, the mechanism by which most sequence variants exert their effect on blood cell formation remains elusive. An example is the Rho guanine nucleotide exchange factor, ARHGEF3, which was previously implicated by genomewide association meta-analysis studies in bone cell biology. Here, we report on the unexpected role of ARHGEF3 in regulation of iron uptake and erythroid cell maturation. Although early erythroid differentiation progressed normally, silencing of arhgef3 in Danio rerio resulted in microcytic and hypochromic anemia. This was rescued by intracellular supplementation of iron, showing that arhgef3-depleted erythroid cells are fully capable of hemoglobinization. Disruption of the arhgef3 target, RhoA, also produced severe anemia, which was, again, corrected by iron injection. Moreover, silencing of ARHGEF3 in erythromyeloblastoid cells K562 showed that the uptake of transferrin was severely impaired. Taken together, this is the first study to provide evidence for ARHGEF3 being a regulator of transferrin uptake in erythroid cells, through activation of RHOA.
Funded by: British Heart Foundation: RG/09/012/28096; Wellcome Trust: WT 077037/Z/05/Z, WT077047/Z/05/Z, WT082597/Z/07/Z
Indian Siddis: African descendants with Indian admixture.
Centre for Cellular and Molecular Biology, Council of Scientific and Industrial Research, Hyderabad, India.
The Siddis (Afro-Indians) are a tribal population whose members live in coastal Karnataka, Gujarat, and in some parts of Andhra Pradesh. Historical records indicate that the Portuguese brought the Siddis to India from Africa about 300-500 years ago; however, there is little information about their more precise ancestral origins. Here, we perform a genome-wide survey to understand the population history of the Siddis. Using hundreds of thousands of autosomal markers, we show that they have inherited ancestry from Africans, Indians, and possibly Europeans (Portuguese). Additionally, analyses of the uniparental (Y-chromosomal and mitochondrial DNA) markers indicate that the Siddis trace their ancestry to Bantu speakers from sub-Saharan Africa. We estimate that the admixture between the African ancestors of the Siddis and neighboring South Asian groups probably occurred in the past eight generations (∼200 years ago), consistent with historical records.
American journal of human genetics 2011;89;1;154-61
Common variants on 8p12 and 1q24.2 confer risk of schizophrenia.
Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, China. firstname.lastname@example.org
Schizophrenia is a severe mental disorder affecting ∼1% of the world population, with heritability of up to 80%. To identify new common genetic risk factors, we performed a genome-wide association study (GWAS) in the Han Chinese population. The discovery sample set consisted of 3,750 individuals with schizophrenia and 6,468 healthy controls (1,578 cases and 1,592 controls from northern Han Chinese, 1,238 cases and 2,856 controls from central Han Chinese, and 934 cases and 2,020 controls from the southern Han Chinese). We further analyzed the strongest association signals in an additional independent cohort of 4,383 cases and 4,539 controls from the Han Chinese population. Meta-analysis identified common SNPs that associated with schizophrenia with genome-wide significance on 8p12 (rs16887244, P = 1.27 × 10⁻¹⁰) and 1q24.2 (rs10489202, P = 9.50 × 10⁻⁹). Our findings provide new insights into the pathogenesis of schizophrenia.
Nature genetics 2011;43;12;1224-7
The tammar wallaby major histocompatibility complex shows evidence of past genomic instability.
Faculty of Veterinary Science, University of Sydney, NSW 2006, Australia.
Background: The major histocompatibility complex (MHC) is a group of genes with a variety of roles in the innate and adaptive immune responses. MHC genes form a genetically linked cluster in eutherian mammals, an organization that is thought to confer functional and evolutionary advantages to the immune system. The tammar wallaby (Macropus eugenii), an Australian marsupial, provides a unique model for understanding MHC gene evolution, as many of its antigen presenting genes are not linked to the MHC, but are scattered around the genome.
Results: Here we describe the 'core' tammar wallaby MHC region on chromosome 2q by ordering and sequencing 33 BAC clones, covering over 4.5 Mb and containing 129 genes. When compared to the MHC region of the South American opossum, eutherian mammals and non-mammals, the wallaby MHC has a novel gene organization. The wallaby has undergone an expansion of MHC class II genes, which are separated into two clusters by the class III genes. The antigen processing genes have undergone duplication, resulting in two copies of TAP1 and three copies of TAP2. Notably, Kangaroo Endogenous Retroviral Elements are present within the region and may have contributed to the genomic instability.
Conclusions: The wallaby MHC has been extensively remodeled since the American and Australian marsupials last shared a common ancestor. The instability is characterized by the movement of antigen presenting genes away from the core MHC, most likely via the presence and activity of retroviral elements. We propose that the movement of class II genes away from the ancestral class II region has allowed this gene family to expand and diversify in the wallaby. The duplication of TAP genes in the wallaby MHC makes this species a unique model organism for studying the relationship between MHC gene organization and function.
Funded by: Wellcome Trust: 084071, 089305
BMC genomics 2011;12;421
A conditional knockout resource for the genome-wide study of mouse gene function.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. email@example.com
Gene targeting in embryonic stem cells has become the principal technology for manipulation of the mouse genome, offering unrivalled accuracy in allele design and access to conditional mutagenesis. To bring these advantages to the wider research community, large-scale mouse knockout programmes are producing a permanent resource of targeted mutations in all protein-coding genes. Here we report the establishment of a high-throughput gene-targeting pipeline for the generation of reporter-tagged, conditional alleles. Computational allele design, 96-well modular vector construction and high-efficiency gene-targeting strategies have been combined to mutate genes on an unprecedented scale. So far, more than 12,000 vectors and 9,000 conditional targeted alleles have been produced in highly germline-competent C57BL/6N embryonic stem cells. High-throughput genome engineering highlighted by this study is broadly applicable to rat and human stem cells and provides a foundation for future genome-wide efforts aimed at deciphering the function of all genes encoded by the mammalian genome.
Funded by: NHGRI NIH HHS: U01-HG004080; Wellcome Trust: 077188
Identification of an imprinted master trans regulator at the KLF14 locus related to multiple metabolic phenotypes.
Department of Twin Research and Genetic Epidemiology, King's College London, London, UK.
Genome-wide association studies have identified many genetic variants associated with complex traits. However, at only a minority of loci have the molecular mechanisms mediating these associations been characterized. In parallel, whereas cis regulatory patterns of gene expression have been extensively explored, the identification of trans regulatory effects in humans has attracted less attention. Here we show that the type 2 diabetes and high-density lipoprotein cholesterol-associated cis-acting expression quantitative trait locus (eQTL) of the maternally expressed transcription factor KLF14 acts as a master trans regulator of adipose gene expression. Expression levels of genes regulated by this trans-eQTL are highly correlated with concurrently measured metabolic traits, and a subset of the trans-regulated genes harbor variants directly associated with metabolic phenotypes. This trans-eQTL network provides a mechanistic understanding of the effect of the KLF14 locus on metabolic disease risk and offers a potential model for other complex traits.
Funded by: Medical Research Council: G0900339; NIMH NIH HHS: R01 MH090941; Wellcome Trust: 079771, 081878, 081917, 090532, 095515
Nature genetics 2011;43;6;561-4
Candidate gene association study for diabetic retinopathy in persons with type 2 diabetes: the Candidate gene Association Resource (CARe).
Department of Ophthalmology, Harvard Medical School, Massachusetts Eye and Ear Infirmary, Boston, Massachusetts 02114, USA. firstname.lastname@example.org
Purpose: To investigate whether variants in cardiovascular candidate genes, some of which have been previously associated with type 2 diabetes (T2D), diabetic retinopathy (DR), and diabetic nephropathy (DN), are associated with DR in the Candidate gene Association Resource (CARe).
Methods: Persons with T2D who were enrolled in the study (n = 2691) had fundus photography and genotyping of single nucleotide polymorphisms (SNPs) in 2000 candidate genes. Two case definitions were investigated: Early Treatment Diabetic Retinopathy Study (ETDRS) grades ≥ 14 and ≥ 30. The χ² analyses for each CARe cohort were combined by Cochran-Mantel-Haenszel (CMH) pooling of odds ratios (ORs) and corrected for multiple hypothesis testing. Logistic regression was performed with adjustment for other DR risk factors. Results from replication in independent cohorts were analyzed with CMH meta-analysis methods.
Results: Among 39 genes previously associated with DR, DN, or T2D, three SNPs in P-selectin (SELP) were associated with DR. The strongest association was to rs6128 (OR = 0.43, P = 0.0001, after Bonferroni correction). These associations remained significant after adjustment for DR risk factors. Among other genes examined, several variants were associated with DR with significant P values, including rs6856425 tagging α-l-iduronidase (IDUA) (P = 2.1 × 10⁻⁵, after Bonferroni correction). However, replication in independent cohorts did not reveal study-wide significant effects. The P values after replication were 0.55 and 0.10 for rs6128 and rs6856425, respectively.
Conclusions: Genes associated with DN, T2D, and vascular diseases do not appear to be consistently associated with DR. A few genetic variants associated with DR, particularly those in SELP and near IDUA, should be investigated in additional DR cohorts.
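As a rough illustration of the Cochran-Mantel-Haenszel pooling and Bonferroni correction named in the Methods above, the following Python sketch computes a pooled odds ratio across per-cohort 2×2 tables. The table layout, the function names and the example counts are assumptions made for illustration; this is not the CARe analysis code.

```python
from typing import Iterable, Tuple

# One 2x2 table per cohort (hypothetical layout):
# (cases with allele, cases without, controls with allele, controls without)
Table = Tuple[int, int, int, int]


def cmh_pooled_odds_ratio(tables: Iterable[Table]) -> float:
    """Mantel-Haenszel pooled odds ratio across stratified 2x2 tables."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum contributes nothing
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("pooled odds ratio is undefined for these tables")
    return numerator / denominator


def bonferroni(p_value: float, n_tests: int) -> float:
    """Bonferroni-adjusted P value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Made-up counts for two cohorts, purely to show the call.
example_tables = [(10, 20, 30, 40), (5, 15, 10, 30)]
pooled_or = cmh_pooled_odds_ratio(example_tables)
```

Pooling within strata rather than summing all counts into one table keeps differences in allele frequency between cohorts from distorting the combined estimate.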
Funded by: NCRR NIH HHS: UL1 RR 025758; NEI NIH HHS: K12-EY16335, Z01 EY000401-06, Z01 EY000401-07, Z01 EY000403-06, Z01 EY000403-07, Z01 EY000425-04, Z99 EY999999, ZIA EY000401-08, ZIA EY000401-09, ZIA EY000401-10, ZIA EY000403-08, ZIA EY000403-09, ZIA EY000403-10, ZIA EY000425-06; NHLBI NIH HHS: N01-HC-65226; Wellcome Trust: 090532
Investigative ophthalmology & visual science 2011;52;10;7593-602
Genome-wide association and large-scale follow up identifies 16 new loci influencing lung function.
Department of Health Sciences, University of Leicester, Leicester, UK.
Pulmonary function measures reflect respiratory health and are used in the diagnosis of chronic obstructive pulmonary disease. We tested genome-wide association with forced expiratory volume in 1 second and the ratio of forced expiratory volume in 1 second to forced vital capacity in 48,201 individuals of European ancestry with follow up of the top associations in up to an additional 46,411 individuals. We identified new regions showing association (combined P < 5 × 10⁻⁸) with pulmonary function in or near MFAP2, TGFB2, HDAC4, RARB, MECOM (also known as EVI1), SPATA9, ARMC2, NCR3, ZKSCAN3, CDC123, C10orf11, LRP1, CCDC38, MMP15, CFDP1 and KCNE2. Identification of these 16 new loci may provide insight into the molecular mechanisms regulating pulmonary function and into molecular targets for future therapy to alleviate reduced lung function.
Funded by: Biotechnology and Biological Sciences Research Council: BB/F019394/1, G20234; British Heart Foundation: FS05/125, PG/06/154/22043, PG/97012, RG/08/013/25942; Canadian Institutes of Health Research: MOP-82893; Cancer Research UK; Chief Scientist Office: CZB/4/710, CZD/16/6, CZD/16/6/2, CZD/16/6/4; Department of Health; Intramural NIH HHS; Medical Research Council: G0000934, G0401540, G0500539, G0501942, G0600705, G0701863, G0800582, G0801056, G0902125, G0902313, G1000861, G9815508, G9901462, MC_PC_U127561128, MC_U106188470, MC_U123092720, MC_U123092721, MC_U127561128, MC_UP_A620_1014, MC_UP_A620_1015; NCI NIH HHS: 1P50 CA70907, CA127219, CA55769, P50 CA070907, R01 CA121197, R01CA111703, U19 CA148127; NCRR NIH HHS: 5M01 RR00997, M01-RR00425, RR-024156, UL1RR025005; NHGRI NIH HHS: U01-HG-004402, U01-HG-004729; NHLBI NIH HHS: 1K23HL094531-01, 5R01HL087679-02, HL075336, HL080295, HL087652, HL088133, HL105756, N01 HC-15103, N01 HC-25195, N01 HC-55222, N01 HC095159, N01-HC-05187, N01-HC-35129, N01-HC-45133, N01-HC-45134, N01-HC-45204, N01-HC-45205, N01-HC-48047, N01-HC-48048, N01-HC-48049, N01-HC-48050, N01-HC-75150, N01-HC-85079, N01-HC-85080, N01-HC-85081, N01-HC-85082, N01-HC-85083, N01-HC-85084, N01-HC-85085, N01-HC-85086, N01-HC-95095, N01-HC-95159, N01-HC-95160, N01-HC-95161, N01-HC-95162, N01-HC-95163, N01-HC-95164, N01-HC-95165, N01-HC-95166, N01-HC-95167, N01-HC-95168, N01-HC-95169, N02-HL-6-4278, R01 HL-071022, R01 HL-074104, R01 HL-077612, R01 HL075476, R01 HL077612, R01-HL-084099, R01-HL084099, R01HL071051, R01HL071205, R01HL071250, R01HL071251, R01HL071252, R01HL071258, R01HL071259, R01HL086694, R01HL087641, R01HL59367, RC1 HL100543; NIA NIH HHS: 1R01AG032098-01A1, AG-023269, AG-027058, AG-15928, AG-20098, AG035835, N01AG12100, N01AG62101, N01AG62103, N01AG62106, R01 AG032098, RC1 AG035835, RC1 AG035835-01; NIDDK NIH HHS: DK063491; NIEHS NIH HHS: ES015794, ZO1 ES49019; NIMH NIH HHS: 1RL1MH083268-01, 5R01MH63706:02; PHS HHS: 268200625226C, 268200782096C, 268201100005C, 268201100006C, 268201100007C, 268201100008C, 268201100009C, 268201100010C, 268201100011C, 268201100012C; Wellcome Trust: 068545/Z/02, 076113/B/04/Z, 077016/Z/05/Z, 079895, 090532, 092731, GR069224
Nature genetics 2011;43;11;1082-90
Interpreting Association Signals
Analysis of Complex Disease Association Studies 2011;Chapter 16;261-76
The effect of genome-wide association scan quality control on imputation outcome for common variants.
Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK.
Imputation is an extremely valuable tool in conducting and synthesising genome-wide association studies (GWASs). Directly typed SNP quality control (QC) is thought to affect imputation quality. It is, therefore, common practice to use quality-controlled (QCed) data as an input for imputing genotypes. This study aims to determine the effect of commonly applied QC steps on imputation outcomes. We performed several iterations of imputing SNPs across chromosome 22 in a dataset consisting of 3177 samples with Illumina 610 k (Illumina, San Diego, CA, USA) GWAS data, applying different QC steps each time. The imputed genotypes were compared with the directly typed genotypes. In addition, we investigated the correlation between alternatively QCed data. We also applied a series of post-imputation QC steps balancing elimination of poorly imputed SNPs and information loss. We found that the difference between the unQCed data and the fully QCed data on imputation outcome was minimal. Our study shows that imputation of common variants is generally very accurate and robust to GWAS QC, which is not a major factor affecting imputation outcome. A minority of common-frequency SNPs with particular properties cannot be accurately imputed regardless of QC stringency. These findings may not generalise to the imputation of low frequency and rare variants.
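One simple way to compare imputed genotypes with directly typed genotypes, as the abstract above describes, is a per-SNP concordance rate. The Python sketch below assumes genotypes are coded as 0, 1 or 2 copies of an allele with None for missing calls; the coding, the function name and the example values are illustrative assumptions, and the study may have used a different accuracy measure.

```python
from typing import Optional, Sequence


def genotype_concordance(
    imputed: Sequence[Optional[int]],
    typed: Sequence[Optional[int]],
) -> float:
    """Fraction of samples whose imputed genotype equals the typed genotype.

    Samples with a missing call (None) in either vector are skipped.
    """
    if len(imputed) != len(typed):
        raise ValueError("genotype vectors must have the same length")
    pairs = [
        (i, t) for i, t in zip(imputed, typed)
        if i is not None and t is not None
    ]
    if not pairs:
        raise ValueError("no samples with both calls present")
    matches = sum(1 for i, t in pairs if i == t)
    return matches / len(pairs)


# Made-up genotypes for five samples at one SNP.
rate = genotype_concordance([0, 1, 2, 1, None], [0, 1, 1, 1, 2])
```

Running the same comparison on input prepared with different QC steps would give one concordance value per QC setting, which is the kind of side-by-side comparison the abstract reports as showing minimal differences.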
Funded by: Arthritis Research UK: 18030; Medical Research Council: G0100594, G0901461; Wellcome Trust: 079557, 088885, 090532, WT079557MA, WT088885/Z/09/Z
European journal of human genetics : EJHG 2011;19;5;610-4
Genome-wide association analysis identifies variants associated with nonalcoholic fatty liver disease that have distinct effects on metabolic traits.
Department of Internal Medicine, Division of Gastroenterology, University of Michigan, Ann Arbor, Michigan, United States of America. email@example.com
Nonalcoholic fatty liver disease (NAFLD) clusters in families, but the only known common genetic variants influencing risk are near PNPLA3. We sought to identify additional genetic variants influencing NAFLD using genome-wide association (GWA) analysis of computed tomography (CT) measured hepatic steatosis, a non-invasive measure of NAFLD, in large population based samples. Using variance components methods, we show that CT hepatic steatosis is heritable (∼26%-27%) in family-based Amish, Family Heart, and Framingham Heart Studies (n = 880 to 3,070). By carrying out a fixed-effects meta-analysis of genome-wide association (GWA) results between CT hepatic steatosis and ∼2.4 million imputed or genotyped SNPs in 7,176 individuals from the Old Order Amish, Age, Gene/Environment Susceptibility-Reykjavik study (AGES), Family Heart, and Framingham Heart Studies, we identify variants associated at genome-wide significant levels (p<5×10⁻⁸) in or near PNPLA3, NCAN, and PPP1R3B. We genotype these and 42 other top CT hepatic steatosis-associated SNPs in 592 subjects with biopsy-proven NAFLD from the NASH Clinical Research Network (NASH CRN). In comparisons with 1,405 healthy controls from the Myocardial Infarction Genetics Consortium (MIGen), we observe significant associations with histologic NAFLD at variants in or near NCAN, GCKR, LYPLAL1, and PNPLA3, but not PPP1R3B. Variants at these five loci exhibit distinct patterns of association with serum lipids, as well as glycemic and anthropometric traits. We identify common genetic variants influencing CT-assessed steatosis and risk of NAFLD. Hepatic steatosis associated variants are not uniformly associated with NASH/fibrosis or result in abnormalities in serum lipids or glycemic and anthropometric traits, suggesting genetic heterogeneity in the pathways influencing these traits.
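The fixed-effects meta-analysis mentioned in the abstract above is commonly done by inverse-variance weighting of per-study effect estimates. The Python sketch below shows that standard calculation for a single SNP; the function name and the example effect sizes are hypothetical, and the abstract does not state which weighting scheme or software the authors used.

```python
import math
from typing import Sequence, Tuple


def fixed_effects_meta(
    betas: Sequence[float],
    standard_errors: Sequence[float],
) -> Tuple[float, float, float]:
    """Inverse-variance fixed-effects meta-analysis for one SNP.

    Returns the pooled effect, its standard error and a two-sided
    P value from the normal approximation.
    """
    if len(betas) != len(standard_errors) or not betas:
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return pooled_beta, pooled_se, p_value


# Made-up per-study estimates for one SNP.
beta, se, p = fixed_effects_meta([0.2, 0.4], [0.1, 0.1])
genome_wide_significant = p < 5e-8  # threshold quoted in the abstract
```

Studies with smaller standard errors receive more weight, so larger cohorts dominate the pooled estimate.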
Funded by: British Heart Foundation: PG/09/002/26056; Medical Research Council: G0401527, G0701863, G0801056, G0902037, G1000143, G19/35, MC_U106179471, MC_U127561128, MC_UP_A100_1003, MC_UP_A620_1014, MC_UP_A620_1015; NCRR NIH HHS: M01RR000065, M01RR000750, M01RR000827, M01RR00188, M01RR020359, UL1 RR024989, UL1RR024989, UL1RR02501401; NHLBI NIH HHS: N01-HC-25195, N02-HL-6-4278, R01 HL087647, R01HL087700, R01HL088119, U01 HL084756, U01 HL72515; NIA NIH HHS: N01-AG-12100, R01 AG18728, T32AG000262; NIAMS NIH HHS: F32AR059469; NIDDK NIH HHS: F32 DK079466-01, K01 DK067207, K23DK080145-01, K24 DK002957, P30DK072488, P60 DK079637, R01DK075681, R01DK075787, T32 DK07191-32, U01 DK061728, U01DK061713, U01DK061718, U01DK061728, U01DK061730, U01DK061731, U01DK061732, U01DK061734, U01DK061737, U01DK061738; NIGMS NIH HHS: T32 GM074905; PHS HHS: ULRR02413101; Wellcome Trust: 090532
PLoS genetics 2011;7;3;e1001324
The role of vitamin D receptor gene polymorphisms in the bone mineral density of Greek postmenopausal women with low calcium intake.
Department of Nutrition - Dietetics, Harokopio University, 17671 Athens, Greece.
The aim of this study was to investigate the effect of common vitamin D receptor (VDR) gene polymorphisms on the bone mineral density (BMD) of Greek postmenopausal women. Healthy postmenopausal women (n=578) were recruited for the study. The BMD of the lumbar spine and hip was measured using dual-energy X-ray absorptiometry with the Lunar DPX-MD device. Assessment of dietary calcium intake was performed with multiple 24-h recalls. Genotyping was performed for the BsmI, TaqI and Cdx-2 polymorphisms of the VDR gene. The selected polymorphisms were not associated with BMD, osteoporosis or osteoporotic fractures. Stratification by calcium intake revealed that in the low calcium intake group (<680 mg/day), all polymorphisms were associated with the BMD of the lumbar spine (P<.05). After adjustment for potential covariates, BsmI and TaqI polymorphisms were associated with the presence of osteoporosis (P<.05), while the presence of the minor A allele of Cdx-2 polymorphism was associated with a lower spine BMD (P=.025). In the higher calcium intake group (>680 mg/day), no significant differences were observed within the genotypes for all polymorphisms. The VDR gene is shown to affect BMD in women with low calcium intake, while its effect is masked in women with higher calcium intake. This result underlines the significance of adequate calcium intake in postmenopausal women, given that it exerts a positive effect on BMD even in the presence of negative genetic predisposition.
The Journal of nutritional biochemistry 2011;22;8;752-7
Massive genomic rearrangement acquired in a single catastrophic event during cancer development.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.
Cancer is driven by somatically acquired point mutations and chromosomal rearrangements, conventionally thought to accumulate gradually over time. Using next-generation sequencing, we characterize a phenomenon, which we term chromothripsis, whereby tens to hundreds of genomic rearrangements occur in a one-off cellular crisis. Rearrangements involving one or a few chromosomes crisscross back and forth across involved regions, generating frequent oscillations between two copy number states. These genomic hallmarks are highly improbable if rearrangements accumulate over time and instead imply that nearly all occur during a single cellular catastrophe. The stamp of chromothripsis can be seen in at least 2%-3% of all cancers, across many subtypes, and is present in ∼25% of bone cancers. We find that one, or indeed more than one, cancer-causing lesion can emerge out of the genomic crisis. This phenomenon has important implications for the origins of genomic remodeling and temporal emergence of cancer.
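The copy-number hallmark described above, frequent oscillation between two copy number states, can be summarised very crudely by counting distinct states and state changes along a chromosome. The Python sketch below does only that; the segment representation and the function name are assumptions for illustration, and this is not the statistical inference used in the paper.

```python
from typing import Sequence, Tuple


def summarize_copy_number_profile(
    segment_copy_numbers: Sequence[int],
) -> Tuple[int, int]:
    """Return (number of distinct copy-number states, number of switches).

    `segment_copy_numbers` lists the copy number of consecutive segments
    along one chromosome, ordered by genomic position.
    """
    distinct_states = len(set(segment_copy_numbers))
    switches = sum(
        1
        for previous, current in zip(segment_copy_numbers, segment_copy_numbers[1:])
        if previous != current
    )
    return distinct_states, switches


# Made-up profile that alternates between one and two copies.
states, switches = summarize_copy_number_profile([2, 1, 2, 1, 2, 1, 2])
```

A profile with many switches but only two states matches the pattern the abstract describes, whereas rearrangements accumulated gradually would be expected to produce a wider range of copy-number states.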
Funded by: Wellcome Trust: 077012/Z/05/Z, 088340, 093867, WT088340MA
Two covariance models for iron-responsive elements.
Biochemistry and Genetics Otago, University of Otago, Dunedin, New Zealand.
Iron-responsive elements (IREs) function in the 5' or 3' untranslated regions (UTRs) of mRNAs as post-transcriptional structured cis-acting RNA regulatory elements. One known functional mechanism is the binding of Iron Regulatory Proteins (IRPs) to 5' UTR IREs, reducing translation rates at low iron levels. Another known mechanism is IRPs binding to 3' UTR IREs in other mRNAs, increasing RNA stability. Experimentally proven elements are quite small, have some diversity of sequence and structure, and functional genes have similar pseudogenes in the genome. This paper presents two new IRE covariance models, comprising a new IRE clan in the RFAM database to encompass this variation without over-generalisation. The use of two IRE models rather than a single model is consistent with experimentally proven structures and predictions. All of the IREs with experimental support are modelled. These two new models show a marked increase in the sensitivity and specificity in detection of known iron-responsive elements and ability to predict novel IREs.
Funded by: Wellcome Trust: WT077044/Z/05/Z
RNA biology 2011;8;5;792-801
Genome-wide association identifies nine common variants associated with fasting proinsulin levels and provides new insights into the pathophysiology of type 2 diabetes.
Atherosclerosis Research Unit, Department of Medicine Solna, Karolinska Institutet, Karolinska University Hospital Solna, Stockholm, Sweden.
Objective: Proinsulin is a precursor of mature insulin and C-peptide. Higher circulating proinsulin levels are associated with impaired β-cell function, raised glucose levels, insulin resistance, and type 2 diabetes (T2D). Studies of the insulin processing pathway could provide new insights about T2D pathophysiology.
Research design and methods: We have conducted a meta-analysis of genome-wide association tests of ∼2.5 million genotyped or imputed single nucleotide polymorphisms (SNPs) and fasting proinsulin levels in 10,701 nondiabetic adults of European ancestry, with follow-up of 23 loci in up to 16,378 individuals, using additive genetic models adjusted for age, sex, fasting insulin, and study-specific covariates.
Results: Nine SNPs at eight loci were associated with proinsulin levels (P < 5 × 10⁻⁸). Two loci (LARP6 and SGSM2) have not been previously related to metabolic traits, one (MADD) has been associated with fasting glucose, one (PCSK1) has been implicated in obesity, and four (TCF7L2, SLC30A8, VPS13C/C2CD4A/B, and ARAP1, formerly CENTD2) increase T2D risk. The proinsulin-raising allele of ARAP1 was associated with a lower fasting glucose (P = 1.7 × 10⁻⁴), improved β-cell function (P = 1.1 × 10⁻⁵), and lower risk of T2D (odds ratio 0.88; P = 7.8 × 10⁻⁶). Notably, PCSK1 encodes the protein prohormone convertase 1/3, the first enzyme in the insulin processing pathway. A genotype score composed of the nine proinsulin-raising alleles was not associated with coronary disease in two large case-control datasets.
Conclusions: We have identified nine genetic variants associated with fasting proinsulin. Our findings illuminate the biology underlying glucose homeostasis and T2D development in humans and argue against a direct role of proinsulin in coronary artery disease pathogenesis.
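A genotype score of the kind mentioned in the Results above can be sketched as a count of trait-raising alleles across the selected SNPs. The Python example below assumes an unweighted score with each SNP coded as 0, 1 or 2 copies of the proinsulin-raising allele; whether the authors weighted alleles by effect size is not stated in the abstract, so the unweighted form and the function name are assumptions.

```python
from typing import Sequence


def unweighted_genotype_score(raising_allele_counts: Sequence[int]) -> int:
    """Sum of trait-raising allele copies across SNPs for one individual.

    Each entry must be 0, 1 or 2 (copies of the raising allele at that SNP).
    """
    for count in raising_allele_counts:
        if count not in (0, 1, 2):
            raise ValueError(f"invalid allele count: {count!r}")
    return sum(raising_allele_counts)


# Made-up genotypes for one person at nine SNPs; the score ranges from 0 to 18.
score = unweighted_genotype_score([0, 1, 2, 1, 0, 2, 1, 1, 0])
```

The resulting score can then be used as a single predictor in a case-control regression, which is one common way to test the joint effect of several alleles on an outcome.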
Funded by: British Heart Foundation: RG/08/014/24067; Medical Research Council: 81696, G0601261, G0601966, G0700222, G0700222(81696), G0700931, G0801056, MC_PC_U127561128, MC_U106188470, MC_U127561128, MC_U137686857, MC_UP_A620_1014, MC_UP_A620_1015; NHLBI NIH HHS: R01 HL087647, U01 HL054527; NIDDK NIH HHS: DK062370, K24 DK080140, R01 DK078616; Wellcome Trust: 077016/Z/05/Z, 083270/Z/07/Z, 090532
Human metabolic individuality in biomedical and pharmaceutical research.
Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Ingolstädter Landstraße 1, 85764 Neuherberg, Germany. firstname.lastname@example.org
Genome-wide association studies (GWAS) have identified many risk loci for complex diseases, but effect sizes are typically small and information on the underlying biological processes is often lacking. Associations with metabolic traits as functional intermediates can overcome these problems and potentially inform individualized therapy. Here we report a comprehensive analysis of genotype-dependent metabolic phenotypes using a GWAS with non-targeted metabolomics. We identified 37 genetic loci associated with blood metabolite concentrations, of which 25 show effect sizes that are unusually high for GWAS and account for 10-60% differences in metabolite levels per allele copy. Our associations provide new functional insights for many disease-related associations that have been reported in previous studies, including those for cardiovascular and kidney disorders, type 2 diabetes, cancer, gout, venous thromboembolism and Crohn's disease. The study advances our knowledge of the genetic basis of metabolic individuality in humans and generates many new hypotheses for biomedical and pharmaceutical research.
Funded by: Biotechnology and Biological Sciences Research Council; British Heart Foundation; Canadian Institutes of Health Research: MOP172605, MOP77682, MOP‐82810; Cancer Research UK; Intramural NIH HHS; Medical Research Council; NHLBI NIH HHS: 1R01HL103931‐01, HL087647, N01‐HC‐55015, N01‐HC‐55016, N01‐HC‐55018, N01‐HC‐55019, N01‐HC‐55020, N01‐HC‐55021, N01‐HC‐55022, P01 HL098055, P01HL076491‐06, P01HL087018, R01 HL087647, R01 HL087676, R01HL089650‐02; NIA NIH HHS: N01‐AG‐12100; NIDDK NIH HHS: R01DK080732; Wellcome Trust: 091746, 091746/Z/10/Z
A genome-wide screen for interactions reveals a new locus on 4p15 modifying the effect of waist-to-hip ratio on total cholesterol.
Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland.
Recent genome-wide association (GWA) studies described 95 loci controlling serum lipid levels. These common variants explain ∼25% of the heritability of the phenotypes. To date, no unbiased screen for gene-environment interactions for circulating lipids has been reported. We screened for variants that modify the relationship between known epidemiological risk factors and circulating lipid levels in a meta-analysis of genome-wide association (GWA) data from 18 population-based cohorts with European ancestry (maximum N = 32,225). We collected 8 further cohorts (N = 17,102) for replication, and rs6448771 on 4p15 demonstrated genome-wide significant interaction with waist-to-hip-ratio (WHR) on total cholesterol (TC) with a combined P-value of 4.79×10⁻⁹. There were two potential candidate genes in the region, PCDH7 and CCKAR, with differential expression levels for rs6448771 genotypes in adipose tissue. The effect of WHR on TC was strongest for individuals carrying two copies of the G allele, for whom a one standard deviation (sd) difference in WHR corresponds to a 0.19 sd difference in TC concentration, while for A-allele homozygotes the difference was 0.12 sd. Our findings may open up possibilities for targeted intervention strategies for people characterized by specific genomic profiles. However, more refined measures of both body-fat distribution and metabolic measures are needed to understand how their joint dynamics are modified by the newly found locus.
Funded by: British Heart Foundation: PG/08/094/26019; Cancer Research UK: C865/A2883; Chief Scientist Office: CZB/4/710; Medical Research Council: G0300128, G0801566, G9502233, g0500539, g600705; NHLBI NIH HHS: 5R01HL087679; NIAAA NIH HHS: AA07535, AA10248, AA11998, AA13320, AA13321, AA13326, AA14041, AA17688, K05 AA017688; NIDA NIH HHS: DA12854; NIMH NIH HHS: 1R01MH083268-01, MH66206, U24 MH068457-06; PHS HHS: R01D0042157-01A; Wellcome Trust: 090532, gr069224
PLoS genetics 2011;7;10;e1002333
An optimized microarray platform for assaying genomic variation in Plasmodium falciparum field populations.
The Eck Institute for Global Health, University of Notre Dame, 100 Galvin Life Sciences, Notre Dame, IN 46556, USA.
We present an optimized probe design for copy number variation (CNV) and SNP genotyping in the Plasmodium falciparum genome. We demonstrate that variable length and isothermal probes are superior to static length probes. We show that sample preparation and hybridization conditions mitigate the effects of host DNA contamination in field samples. The microarray and workflow presented can be used to identify CNVs and SNPs with 95% accuracy in a single hybridization, in field samples containing up to 92% human DNA contamination.
Funded by: Medical Research Council: G19/9; NCRR NIH HHS: RR013556; NIAID NIH HHS: AI072517, AI075145; Wellcome Trust: 090532
Genome biology 2011;12;4;R35
The clinical and molecular genetic features of idiopathic infantile periodic alternating nystagmus.
Ophthalmology Group, School of Medicine, University of Leicester, RKCSB, PO Box 65, Leicester LE2 7LX, UK.
Periodic alternating nystagmus consists of involuntary oscillations of the eyes with cyclical changes of nystagmus direction. It can occur during infancy (e.g. idiopathic infantile periodic alternating nystagmus) or later in life. Acquired forms are often associated with cerebellar dysfunction arising due to instability of the optokinetic-vestibular systems. Idiopathic infantile periodic alternating nystagmus can be familial or occur in isolation; however, very little is known about the clinical characteristics, genetic aetiology and neural substrates involved. Five loci (NYS1-5) have been identified for idiopathic infantile nystagmus; three are autosomal (NYS2, NYS3 and NYS4) and two are X-chromosomal (NYS1 and NYS5). We previously identified the FRMD7 gene on chromosome Xq26 (NYS1 locus); mutations of FRMD7 are causative of idiopathic infantile nystagmus influencing neuronal outgrowth and development. It is unclear whether the periodic alternating nystagmus phenotype is linked to NYS1, NYS5 (Xp11.4-p11.3) or a separate locus. From a cohort of 31 X-linked families and 14 singletons (70 patients) with idiopathic infantile nystagmus we identified 10 families and one singleton (21 patients) with periodic alternating nystagmus of which we describe clinical phenotype, genetic aetiology and neural substrates involved. Periodic alternating nystagmus was not detected clinically but only on eye movement recordings. The cycle duration varied from 90 to 280 s. Optokinetic reflex was not detectable horizontally. Mutations of the FRMD7 gene were found in all 10 families and the singleton (including three novel mutations). Periodic alternating nystagmus was predominantly associated with missense mutations within the FERM domain. There was significant sibship clustering of the phenotype although in some families not all affected members had periodic alternating nystagmus. 
In situ hybridization studies during mid-late human embryonic stages in normal tissue showed restricted FRMD7 expression in neuronal tissue with strong hybridization signals within the afferent arms of the vestibulo-ocular reflex consisting of the otic vesicle, cranial nerve VIII and vestibular ganglia. Similarly within the afferent arm of the optokinetic reflex we showed expression in the developing neural retina and ventricular zone of the optic stalk. Strong FRMD7 expression was seen in rhombomeres 1 to 4, which give rise to the cerebellum and the common integrator site for both these reflexes (vestibular nuclei). Based on the expression and phenotypic data, we hypothesize that periodic alternating nystagmus arises from instability of the optokinetic-vestibular systems. This study shows for the first time that mutations in FRMD7 can cause idiopathic infantile periodic alternating nystagmus and may affect neuronal circuits that have been implicated in acquired forms.
Funded by: Medical Research Council: G9900837
Brain : a journal of neurology 2011;134;Pt 3;892-902
Genome-wide analysis of simultaneous GATA1/2, RUNX1, FLI1, and SCL binding in megakaryocytes identifies hematopoietic regulators.
Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Cambridge CB2 0XY, UK.
Hematopoietic differentiation critically depends on combinations of transcriptional regulators controlling the development of individual lineages. Here, we report the genome-wide binding sites for the five key hematopoietic transcription factors--GATA1, GATA2, RUNX1, FLI1, and TAL1/SCL--in primary human megakaryocytes. Statistical analysis of the 17,263 regions bound by at least one factor demonstrated that simultaneous binding by all five factors was the most enriched pattern and often occurred near known hematopoietic regulators. Eight genes not previously appreciated to function in hematopoiesis that were bound by all five factors were shown to be essential for thrombocyte and/or erythroid development in zebrafish. Moreover, one of these genes encoding the PDZK1IP1 protein shared transcriptional enhancer elements with the blood stem cell regulator TAL1/SCL. Multifactor ChIP-Seq analysis in primary human cells coupled with a high-throughput in vivo perturbation screen therefore offers a powerful strategy to identify essential regulators of complex mammalian differentiation processes.
Funded by: British Heart Foundation: RG/09/012/28096; Medical Research Council: G0800784, G0900951, G0900951(91754); Wellcome Trust: 077037/Z/05/Z, 077047/Z/05/Z, 082597/Z/07/Z
Developmental cell 2011;20;5;597-609
Association of known loci with lipid levels among children and prediction of dyslipidemia in adults.
Institute for Molecular Medicine, Finland FIMM, University of Helsinki, Helsinki, Finland.
Background: Recent genome-wide association studies have found 95 distinct genetic loci associated with high-density (HDL-C) and low-density (LDL-C) lipoprotein cholesterol, total cholesterol (TC), and triglycerides (TG), using adult samples. It is not known if these variants are associated with lipid levels in children and adolescents and if the genetic risk score (GRS), based on these variants, could improve adulthood dyslipidemia prediction over the childhood lipid measurements.
Methods and results: We used 2443 participants of the Cardiovascular Risk in Young Finns study cohort with up to 5 measurements of serum lipids taken between ages 3 and 45 years to estimate the effect of individual single-nucleotide polymorphisms and the GRS on lipids. The GRSs were strongly associated with lipids in all age groups (1.5 × 10^-20 < P < 8.7 × 10^-12 for HDL-C, 3.5 × 10^-27 < P < 5.6 × 10^-9 for LDL-C, 2.0 × 10^-25 < P < 5.2 × 10^-9 for TC, and 4.1 × 10^-20 < P < 8.4 × 10^-5 for TG). Jointly, the lipid loci explained 11.8-26.7% of the total variance in lipids among 3- to 6-year-old children, and the proportion dropped over age, except for TG. The discrimination of adult hypertriglyceridemia improved when GRS was added to childhood lipid measurement (C statistic=0.04, P=0.01).
Conclusions: Previously identified lipid loci are associated with lipid levels in children and adolescents and explain up to more than twice as much of the lipid variation in children as in adults. The TG-GRS improves the risk discrimination over childhood lipid measurement for adult hypertriglyceridemia.
Funded by: Wellcome Trust
Circulation. Cardiovascular genetics 2011;4;6;673-80
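Illustrative note on the entry above: the abstract does not say how the genetic risk score (GRS) was constructed. A common construction, assumed here, is a weighted sum of risk-allele counts across the associated variants. The SNP names, weights and genotypes below are hypothetical placeholders, not values from the study.

```python
from typing import Dict


def genetic_risk_score(genotypes: Dict[str, int], weights: Dict[str, float]) -> float:
    """Return the weighted sum of risk-allele counts (0, 1 or 2 per SNP).

    Assumption: each weight is a per-allele effect size; the study's
    actual weighting scheme is not given in the abstract.
    """
    score = 0.0
    for snp, weight in weights.items():
        count = genotypes[snp]
        if count not in (0, 1, 2):
            raise ValueError(f"risk-allele count for {snp} must be 0, 1 or 2")
        score += weight * count
    return score


# Hypothetical example data for one individual.
weights = {"snp_a": 0.5, "snp_b": 0.25, "snp_c": 1.0}
genotypes = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
grs = genetic_risk_score(genotypes, weights)
```

A score of this kind could then be entered alongside a childhood lipid measurement in a prediction model, which is the comparison the abstract reports.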
Tumor-specific diagnostic marker for transmissible facial tumors of Tasmanian devils: immunohistochemistry studies.
Menzies Research Institute, University of Tasmania, Hobart, Tasmania, Australia.
Devil facial tumor disease (DFTD) is a transmissible neoplasm that is threatening the survival of the Tasmanian devil. Genetic analyses have indicated that the disease is a peripheral nerve sheath neoplasm of Schwann cell origin. DFTD cells express genes characteristic of myelinating Schwann cells, and periaxin, a Schwann cell protein, has been proposed as a marker for the disease. Diagnosis of DFTD is currently based on histopathology, cytogenetics, and clinical appearance of the disease in affected animals. As devils are susceptible to a variety of neoplastic processes, a specific diagnostic test is required to differentiate DFTD from cancers of similar morphological appearance. This study presents a thorough examination of the expression of a set of Schwann cell and other neural crest markers in DFTD tumors and normal devil tissues. Samples from 20 primary DFTD tumors and 10 DFTD metastases were evaluated by immunohistochemistry for the expression of periaxin, S100 protein, peripheral myelin protein 22, nerve growth factor receptor, nestin, neuron specific enolase, chromogranin A, and myelin basic protein. Of these, periaxin was confirmed as the most sensitive and specific marker, labeling the majority of DFTD cells in 100% of primary DFTD tumors and DFTD metastases. In normal tissues, periaxin showed specificity for Schwann cells in peripheral nerve bundles. This marker was then evaluated in cultured devil Schwann cells, DFTD cell lines, and xenografted DFTD tumors. Periaxin expression was maintained in all these models, validating its utility as a diagnostic marker for the disease.
Veterinary pathology 2011;48;6;1195-203
Messenger RNA and microRNA profiling during early mouse EB formation.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.
Embryonic stem (ES) cells can be induced to differentiate into embryoid bodies (EBs) in a synchronised manner when plated at a fixed density in hanging drops. This differentiation procedure mimics post-implantation development in mouse embryos and also serves as the starting point of protocols used in differentiation of stem cells into various lineages. Currently, little is known about the potential influence of microRNAs (miRNAs) on mRNA expression patterns during EB formation. We have measured mRNA and miRNA expression in developing EBs plated in hanging drops until day 3, when discrete structural changes occur involving their differentiation into three germ layers. We observe significant alterations in mRNA and miRNA expression profiles during this early developmental time frame, in particular of genes involved in germ layer formation, stem cell pluripotency and nervous system development. Computational target prediction using Pictar, TargetScan and miRBase Targets reveals an enrichment of binding sites corresponding to differentially and highly expressed miRNAs in stem cell pluripotency genes and a neuroectodermal marker, Nes. We also find that members of let-7 family are significantly down-regulated at day 3 and the corresponding up-regulated genes are enriched in let-7 seed sequences. These results depict how miRNA expression changes may affect the expression of mRNAs involved in EB formation on a genome-wide scale. Understanding the regulatory effects of miRNAs during EB formation may enable more efficient derivation of different cell types in culture.
Funded by: Wellcome Trust
Gene expression patterns : GEP 2011;11;5-6;334-44
Dense genotyping identifies and localizes multiple common and rare variant association signals in celiac disease.
Genetics Department, University Medical Center and University of Groningen, The Netherlands.
Using variants from the 1000 Genomes Project pilot European CEU dataset and data from additional resequencing studies, we densely genotyped 183 non-HLA risk loci previously associated with immune-mediated diseases in 12,041 individuals with celiac disease (cases) and 12,228 controls. We identified 13 new celiac disease risk loci reaching genome-wide significance, bringing the number of known loci (including the HLA locus) to 40. We found multiple independent association signals at over one-third of these loci, a finding that is attributable to a combination of common, low-frequency and rare genetic variants. Compared to previously available data such as those from HapMap3, our dense genotyping in a large sample collection provided a higher resolution of the pattern of linkage disequilibrium and suggested localization of many signals to finer scale regions. In particular, 29 of the 54 fine-mapped signals seemed to be localized to single genes and, in some instances, to gene regulatory elements. Altogether, we define the complex genetic architecture of the risk regions of and refine the risk signals for celiac disease, providing the next step toward uncovering the causal mechanisms of the disease.
Funded by: Medical Research Council: G0000934, G0700545, G1001158, G1001158(95979), G1001799; NCATS NIH HHS: UL1 TR000005; NCI NIH HHS: 1R01CA141743, R01 CA141743; NIDDK NIH HHS: U01-DK062418; Wellcome Trust: 068545/Z/02, 076113/C/04/Z, 084743
Nature genetics 2011;43;12;1193-201
Genome watch: Honey, I shrunk the mimiviral genome.
Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK. email@example.com
This month's Genome Watch describes how the large size of the mimiviral genome is a result of the sympatric lifestyle of mimivirus in host amoebae.
Nature reviews. Microbiology 2011;9;8;563
Dlg3 trafficking and apical tight junction formation is regulated by Nedd4 and Nedd4-2 E3 ubiquitin ligases.
Institute of Stem Cell Research, Helmholtz Zentrum München, 85764 Neuherberg, Germany.
The Drosophila Discs large (Dlg) scaffolding protein acts as a tumor suppressor regulating basolateral epithelial polarity and proliferation. In mammals, four Dlg homologs have been identified; however, their functions in cell polarity remain poorly understood. Here, we demonstrate that the X-linked mental retardation gene product Dlg3 contributes to apical-basal polarity and epithelial junction formation in mouse organizer tissues, as well as to planar cell polarity in the inner ear. We purified complexes associated with Dlg3 in polarized epithelial cells, including proteins regulating directed trafficking and tight junction formation. Remarkably, of the four Dlg family members, Dlg3 exerts a distinct function by recruiting the ubiquitin ligases Nedd4 and Nedd4-2 through its PPxY motifs. We found that these interactions are required for Dlg3 monoubiquitination, apical membrane recruitment, and tight junction consolidation. Our findings reveal an unexpected evolutionary diversification of the vertebrate Dlg family in basolateral epithelium formation.
Funded by: European Research Council: 242807
Developmental cell 2011;21;3;479-91
Partitioning core and satellite taxa from within cystic fibrosis lung bacterial communities.
NERC Centre for Ecology and Hydrology, Wallingford, UK. firstname.lastname@example.org
Cystic fibrosis (CF) patients suffer from chronic bacterial lung infections that lead to death in the majority of cases. The need to maintain lung function in these patients means that characterising these infections is vital. Increasingly, culture-independent analyses are expanding the number of bacterial species associated with CF respiratory samples; however, the potential significance of these species is not known. Here, we applied ecological statistical tools to such culture-independent data, in a novel manner, to partition taxa within the metacommunity into core and satellite species. Sputa and clinical data were obtained from 14 clinically stable adult CF patients. Fourteen rRNA gene libraries were constructed with 35 genera and 82 taxa, identified in 2139 bacterial clones. Shannon-Wiener and taxa-richness analyses confirmed no undersampling of bacterial diversity. By decomposing the distribution using the ratio of variance to the mean taxon abundance, we partitioned objectively the species abundance distribution into core and satellite species. The satellite group comprised 67 bacterial taxa from 33 genera and the core group, 15 taxa from 7 genera (including Pseudomonas (1 taxon), Streptococcus (2), Neisseria (2), Catonella (1), Porphyromonas (1), Prevotella (5) and Veillonella (3), the last four being anaerobes). The core group was dominated by Pseudomonas aeruginosa. Other recognised CF pathogens were rare. Mantel and partial Mantel tests assessed which clinical factors influenced the composition observed. CF transmembrane conductance regulator genotype and antibiotic treatment correlated with all core taxa. Lung function correlated with richness. The clinical significance of these core and satellite species findings in the CF lung is discussed. GenBank accession numbers: FM995625–FM997761
Funded by: Wellcome Trust: WT076964
The ISME journal 2011;5;5;780-91
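Illustrative note on the entry above: the abstract describes partitioning taxa by the ratio of variance to mean abundance (the index of dispersion). The minimal sketch below computes that ratio per taxon across samples and splits taxa at a cut-off. The cut-off value, the choice to treat the more overdispersed taxa as "core", and the abundance table are assumptions made for illustration; the study's exact criterion is not given in the abstract.

```python
from statistics import mean, variance
from typing import Dict, List, Tuple


def dispersion_index(abundances: List[float]) -> float:
    """Sample variance divided by mean abundance across samples."""
    m = mean(abundances)
    if m == 0:
        raise ValueError("taxon is absent from every sample")
    return variance(abundances) / m


def partition_taxa(
    table: Dict[str, List[float]], threshold: float
) -> Tuple[List[str], List[str]]:
    """Split taxa into (core, satellite) by a hypothetical dispersion cut-off."""
    core, satellite = [], []
    for taxon, abundances in table.items():
        if dispersion_index(abundances) > threshold:
            core.append(taxon)
        else:
            satellite.append(taxon)
    return core, satellite


# Hypothetical clone counts for three taxa across four samples.
table = {
    "taxon_x": [10, 0, 30, 0],
    "taxon_y": [1, 1, 1, 1],
    "taxon_z": [0, 2, 0, 2],
}
core, satellite = partition_taxa(table, threshold=2.0)
```

In practice the cut-off would come from a statistical test against a random (Poisson-like) expectation rather than a fixed number.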
Acute sensitivity of the oral mucosa to oncogenic K-ras.
Experimental Cancer Genetics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1HH, UK. email@example.com
Mouse models of cancer represent powerful tools for analysing the role of genetic alterations in carcinogenesis. Using a mouse model that allows tamoxifen-inducible somatic activation (by Cre-mediated recombination) of oncogenic K-ras(G12D) in a wide range of tissues, we observed hyperplasia of squamous epithelium located in moist or frequently abraded mucosa, with the most dramatic effects in the oral mucosa. This epithelium showed a sequence of squamous hyperplasia followed by squamous papilloma with dysplasia, in which some areas progressed to early invasive squamous cell carcinoma, within 14 days of widespread oncogenic K-ras activation. The marked proliferative response of the oral mucosa to K-ras(G12D) was most evident in the basal layers of the squamous epithelium of the outer lip with hair follicles and wet mucosal surface, with these cells staining positively for pAKT and cyclin D1, showing Ras/AKT pathway activation and increased proliferation with Ki-67 and EdU positivity. The stromal cells also showed gene activation by recombination and immunopositivity for pERK indicating K-Ras/ERK pathway activation, but without Ki-67 positivity or increase in stromal proliferation. The oral neoplasms showed changes in the expression pattern of cytokeratins (CK6 and CK13), similar to those observed in human oral tumours. Sporadic activation of the K-ras(G12D) allele (due to background spontaneous recombination in occasional cells) resulted in the development of benign oral squamous papillomas only showing a mild degree of dysplasia with no invasion. In summary, we show that oral mucosa is acutely sensitive to oncogenic K-ras, as widespread expression of activated K-ras in the murine oral mucosal squamous epithelium and underlying stroma can drive the oral squamous papilloma-carcinoma sequence.
Funded by: Cancer Research UK: 13031; Medical Research Council: MC_U105370181; Wellcome Trust
The Journal of pathology 2011;224;1;22-32
Modeling the evolution of ETV6-RUNX1-induced B-cell precursor acute lymphoblastic leukemia in mice.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK. firstname.lastname@example.org
The t(12;21) translocation that generates the ETV6-RUNX1 (TEL-AML1) fusion gene, is the most common chromosomal rearrangement in childhood cancer and is exclusively associated with B-cell precursor acute lymphoblastic leukemia (BCP-ALL). The translocation arises in utero and is necessary but insufficient for the development of leukemia. Single-nucleotide polymorphism array analysis of ETV6-RUNX1 patient samples has identified multiple additional genetic alterations; however, the role of these lesions in leukemogenesis remains undetermined. Moreover, murine models of ETV6-RUNX1 ALL that faithfully recapitulate the human disease are lacking. To identify novel genes that cooperate with ETV6-RUNX1 in leukemogenesis, we generated a mouse model that uses the endogenous Etv6 locus to coexpress the Etv6-RUNX1 fusion and Sleeping Beauty transposase. An insertional mutagenesis screen was performed by intercrossing these mice with those carrying a Sleeping Beauty transposon array. In contrast to previous models, a substantial proportion (20%) of the offspring developed BCP-ALL. Isolation of the transposon insertion sites identified genes known to be associated with BCP-ALL, including Ebf1 and Epor, in addition to other novel candidates. This is the first mouse model of ETV6-RUNX1 to develop BCP-ALL and provides important insight into the cooperating genetic alterations in ETV6-RUNX1 leukemia.
Funded by: Biotechnology and Biological Sciences Research Council; Cancer Research UK: 13031, A12401; Medical Research Council: G116/187; Wellcome Trust: 082356
The mouse genetics toolkit: revealing function and mechanism.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.
Large-scale projects are providing rapid global access to a wealth of mouse genetic resources to help discover disease genes and to manipulate their function.
Funded by: Cancer Research UK: 13031; Medical Research Council: G0800024
Genome biology 2011;12;6;224
Exome sequencing identifies frequent mutation of the SWI/SNF complex gene PBRM1 in renal carcinoma.
Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.
The genetics of renal cancer is dominated by inactivation of the VHL tumour suppressor gene in clear cell carcinoma (ccRCC), the commonest histological subtype. A recent large-scale screen of ∼3,500 genes by PCR-based exon re-sequencing identified several new cancer genes in ccRCC including UTX (also known as KDM6A), JARID1C (also known as KDM5C) and SETD2 (ref. 2). These genes encode enzymes that demethylate (UTX, JARID1C) or methylate (SETD2) key lysine residues of histone H3. Modification of the methylation state of these lysine residues of histone H3 regulates chromatin structure and is implicated in transcriptional control. However, together these mutations are present in fewer than 15% of ccRCC, suggesting the existence of additional, currently unidentified cancer genes. Here, we have sequenced the protein coding exome in a series of primary ccRCC and report the identification of the SWI/SNF chromatin remodelling complex gene PBRM1 (ref. 4) as a second major ccRCC cancer gene, with truncating mutations in 41% (92/227) of cases. These data further elucidate the somatic genetic architecture of ccRCC and emphasize the marked contribution of aberrant chromatin biology.
Funded by: Cancer Research UK; NCI NIH HHS: R01 CA113636, R01 CA134759; Wellcome Trust: 077012, 077012/Z/05/Z, 088340, 093867
Mutant nucleophosmin and cooperating pathways drive leukemia initiation and progression in mice.
Mouse Genomics Team, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK. email@example.com
Acute myeloid leukemia (AML) is a molecularly diverse malignancy with a poor prognosis whose largest subgroup is characterized by somatic mutations in NPM1, which encodes nucleophosmin. These mutations, termed NPM1c, result in cytoplasmic dislocation of nucleophosmin and are associated with distinctive transcriptional signatures, yet their role in leukemogenesis remains obscure. Here we report that activation of a humanized Npm1c knock-in allele in mouse hemopoietic stem cells causes Hox gene overexpression, enhanced self renewal and expanded myelopoiesis. One third of mice developed delayed-onset AML, suggesting a requirement for cooperating mutations. We identified such mutations using a Sleeping Beauty transposon, which caused rapid-onset AML in 80% of mice with Npm1c, associated with mutually exclusive integrations in Csf2, Flt3 or Rasgrp1 in 55 of 70 leukemias. We also identified recurrent integrations in known and newly discovered leukemia genes including Nf1, Bach2, Dleu2 and Nup98. Our results provide new pathogenetic insights and identify possible therapeutic targets in NPM1c+ AML.
Funded by: Cancer Research UK: A7273; Medical Research Council: MC_UP_A652_1001
Nature genetics 2011;43;5;470-5
Genome-wide association study identifies six new loci influencing pulse pressure and mean arterial pressure.
Department of Health Sciences, University of Leicester, Leicester, UK.
Numerous genetic loci have been associated with systolic blood pressure (SBP) and diastolic blood pressure (DBP) in Europeans. We now report genome-wide association studies of pulse pressure (PP) and mean arterial pressure (MAP). In discovery (N = 74,064) and follow-up studies (N = 48,607), we identified at genome-wide significance (P = 2.7 × 10^-8 to P = 2.3 × 10^-13) four new PP loci (at 4q12 near CHIC2, 7q22.3 near PIK3CG, 8q24.12 in NOV and 11q24.3 near ADAMTS8), two new MAP loci (3p21.31 in MAP4 and 10q25.3 near ADRB1) and one locus associated with both of these traits (2q24.3 near FIGN) that has also recently been associated with SBP in east Asians. For three of the new PP loci, the estimated effect for SBP was opposite of that for DBP, in contrast to the majority of common SBP- and DBP-associated variants, which show concordant effects on both traits. These findings suggest new genetic pathways underlying blood pressure variation, some of which may differentially influence SBP and DBP.
Funded by: Biotechnology and Biological Sciences Research Council: BB/F019394/1; Chief Scientist Office: CZB/4/505, CZB/4/710, ETM/55; Intramural NIH HHS: Z01 HG000024-13; Medical Research Council: G0401527, G0601966, G0700704, G0700931, G0701863, G0801056, G0902313, G1000143, G9521010, MC_PC_U127561128, MC_U106179471, MC_U106188470, MC_U127561128, MC_U127592696, MC_U137686857; NHLBI NIH HHS: K23 HL080025, N01 HC025195, N01 HC055015, N01 HC085079, N01 HC095159, R01 HL043851, R01 HL086694, R01 HL087647, R01 HL105756, U10 HL054512; NIA NIH HHS: N01 AG012109; NIMHD NIH HHS: 263 MD9164 13; Wellcome Trust: 090532
Nature genetics 2011;43;10;1005-11
Dominant and diet-responsive groups of bacteria within the human colonic microbiota.
Pathogen Genomics, Wellcome Trust Sanger Institute, Cambridge, UK.
The populations of dominant species within the human colonic microbiota can potentially be modified by dietary intake with consequences for health. Here we examined the influence of precisely controlled diets in 14 overweight men. Volunteers were provided successively with a control diet, diets high in resistant starch (RS) or non-starch polysaccharides (NSPs) and a reduced carbohydrate weight loss (WL) diet, over 10 weeks. Analysis of 16S rRNA sequences in stool samples of six volunteers detected 320 phylotypes (defined at >98% identity) of which 26, including 19 cultured species, each accounted for >1% of sequences. Although samples clustered more strongly by individual than by diet, time courses obtained by targeted qPCR revealed that 'blooms' in specific bacterial groups occurred rapidly after a dietary change. These were rapidly reversed by the subsequent diet. Relatives of Ruminococcus bromii (R-ruminococci) increased in most volunteers on the RS diet, accounting for a mean of 17% of total bacteria compared with 3.8% on the NSP diet, whereas the uncultured Oscillibacter group increased on the RS and WL diets. Relatives of Eubacterium rectale increased on RS (to mean 10.1%) but decreased, along with Collinsella aerofaciens, on WL. Inter-individual variation was marked, however, with >60% of RS remaining unfermented in two volunteers on the RS diet, compared to <4% in the other 12 volunteers; these two individuals also showed low numbers of R-ruminococci (<1%). Dietary non-digestible carbohydrate can produce marked changes in the gut microbiota, but these depend on the initial composition of an individual's gut microbiota.
Funded by: Wellcome Trust: 076964, WT 76964
The ISME journal 2011;5;2;220-30
High-throughput clone library analysis of the mucosa-associated microbiota reveals dysbiosis and differences between inflamed and non-inflamed regions of the intestine in inflammatory bowel disease.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK. firstname.lastname@example.org
Background: The gut microbiota is thought to play a key role in the development of the inflammatory bowel diseases Crohn's disease (CD) and ulcerative colitis (UC). Shifts in the composition of resident bacteria have been postulated to drive the chronic inflammation seen in both diseases (the "dysbiosis" hypothesis). We therefore specifically sought to compare the mucosa-associated microbiota from both inflamed and non-inflamed sites of the colon in CD and UC patients to that from non-IBD controls and to detect disease-specific profiles.
Results: Paired mucosal biopsies of inflamed and non-inflamed intestinal tissue from 6 CD (n = 12) and 6 UC (n = 12) patients were compared to biopsies from 5 healthy controls (n = 5) by in-depth sequencing of over 10,000 near full-length bacterial 16S rRNA genes. The results indicate that mucosal microbial diversity is reduced in IBD, particularly in CD, and that the species composition is disturbed. Firmicutes were reduced in IBD samples and there were concurrent increases in Bacteroidetes, and in CD only, Enterobacteriaceae. There were also significant differences in microbial community structure between inflamed and non-inflamed mucosal sites. However, these differences varied greatly between individuals, meaning there was no obvious bacterial signature that was positively associated with the inflamed gut.
Conclusions: These results may support the hypothesis that the overall dysbiosis observed in inflammatory bowel disease patients relative to non-IBD controls might to some extent be a result of the disturbed gut environment rather than the direct cause of disease. Nonetheless, the observed shifts in microbiota composition may be important factors in disease maintenance and severity.
Funded by: Wellcome Trust: WT076964
BMC microbiology 2011;11;7
Rapid and efficient reprogramming of somatic cells to induced pluripotent stem cells by retinoic acid receptor gamma and liver receptor homolog 1.
Wellcome Trust Sanger Institute, Hinxton CB10 1HH, United Kingdom.
Somatic cells can be reprogrammed to induced pluripotent stem cells (iPSCs) by expressing four transcription factors: Oct4, Sox2, Klf4, and c-Myc. Here we report that enhancing RA signaling by expressing RA receptors (RARs) or by RA agonists profoundly promoted reprogramming, but inhibiting it using a RAR-α dominant-negative form completely blocked it. Coexpressing Rarg (RAR-γ) and Lrh-1 (liver receptor homologue 1; Nr5a2) with the four factors greatly accelerated reprogramming so that reprogramming of mouse embryonic fibroblast cells to ground-state iPSCs requires only 4 d induction of these six factors. The six-factor combination readily reprogrammed primary human neonatal and adult fibroblast cells to exogenous factor-independent iPSCs, which resembled ground-state mouse ES cells in growth properties, gene expression, and signaling dependency. Our findings demonstrate that signaling through RARs has critical roles in molecular reprogramming and that the synergistic interaction between Rarg and Lrh1 directs reprogramming toward ground-state pluripotency. The human iPSCs described here should facilitate functional analysis of the human genome.
Funded by: Medical Research Council: G0700665; Wellcome Trust: 077186/Z/05/Z
Proceedings of the National Academy of Sciences of the United States of America 2011;108;45;18283-8
Genome-wide association studies and type 2 diabetes.
Wellcome Trust Sanger Institute, Cambridge, UK.
In recent years, the search for genetic determinants of type 2 diabetes (T2D) has changed dramatically. Although linkage and small-scale candidate gene studies were highly successful in the identification of genes, which, when mutated, caused monogenic forms of T2D, they were largely unsuccessful when applied to the more common forms of the disease. To date, these approaches have only identified two loci (PPARG, KCNJ11) robustly implicated in T2D susceptibility. The ability to perform large-scale association analysis, including genome-wide association studies (GWAS) in many thousands of samples from different populations, and subsequently, the shift to form large international collaborations to perform meta-analyses across many studies has taken the number of independent loci showing genome-wide significant associations with T2D to 44. This number includes six loci identified initially through the analysis of quantitative glycaemic phenotypes, illustrating the usefulness of this approach both to identify new disease genes and gain insight into the mechanisms leading to disease. Combined, these loci still only account for ∼10% of the observed familial clustering in Europeans, leaving much of the variance unexplained. In this review, we will describe what GWAS have taught us about the genetic basis of T2D and discuss possible next steps to uncover the remaining heritability.
Funded by: Wellcome Trust: 077016/Z/05/Z
Briefings in functional genomics 2011;10;2;52-60
An Exceptional Gene: Evolution of the TSPY Gene Family in Humans and Other Great Apes.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambs. CB10 1SA, UK. email@example.com.
The TSPY gene stands out from all other human protein-coding genes because of its high copy number and tandemly-repeated organization. Here, we review its evolutionary history in great apes in order to assess whether these unusual properties are more likely to result from a relaxation of constraint or an unusual functional role. Detailed comparisons with chimpanzee are possible because a finished sequence of the chimpanzee Y chromosome is available, together with more limited data from other apes. These comparisons suggest that the human-chimpanzee ancestral Y chromosome carried a tandem array of TSPY genes which expanded on the human lineage while undergoing multiple duplication events followed by pseudogene formation on the chimpanzee lineage. The protein coding region is the most highly conserved of the multi-copy Y genes in human-chimpanzee comparisons, and the analysis of the dN/dS ratio indicates that TSPY is evolutionarily highly constrained, but may have experienced positive selection after the human-chimpanzee split. We therefore conclude that the exceptionally high copy number in humans is most likely due to a human-specific but unknown functional role, possibly involving rapid production of a large amount of TSPY protein at some stage during spermatogenesis.
Sequence-based characterization of structural variation in the mouse genome.
The Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford OX3 7BN, UK.
Structural variation is widespread in mammalian genomes and is an important cause of disease, but just how abundant and important structural variants (SVs) are in shaping phenotypic variation remains unclear. Without knowing how many SVs there are, and how they arise, it is difficult to discover what they do. Combining experimental with automated analyses, we identified 711,920 SVs at 281,243 sites in the genomes of thirteen classical and four wild-derived inbred mouse strains. The majority of SVs are less than 1 kilobase in size and 98% are deletions or insertions. The breakpoints of 160,000 SVs were mapped to base pair resolution, allowing us to infer that insertion of retrotransposons causes more than half of SVs. Yet, despite their prevalence, SVs are less likely than other sequence variants to cause gene expression or quantitative phenotypic variation. We identified 24 SVs that disrupt coding exons, acting as rare variants of large effect on gene function. One-third of the genes so affected have immunological functions.
Funded by: Cancer Research UK: 13031; Medical Research Council: G0800024, MC_U137761446; Wellcome Trust: 079912, 082356, 090532, 098051
Phase I trial of a selective c-MET inhibitor ARQ 197 incorporating proof of mechanism pharmacodynamic studies.
Royal Marsden National Health Service Foundation Trust, The Institute of Cancer Research, Sutton, Surrey, UK.
Purpose: The hepatocyte growth factor/c-MET axis is implicated in tumor cell proliferation, survival, and angiogenesis. ARQ 197 is an oral, selective, non-adenosine triphosphate competitive c-MET inhibitor. A phase I trial of ARQ 197 was conducted to assess safety, tolerability, and target inhibition, including intratumoral c-MET signaling, apoptosis, and angiogenesis.
Patients and methods: Patients with solid tumors amenable to pharmacokinetic and pharmacodynamic studies using serial biopsies, dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI), and circulating endothelial cell (CEC) and circulating tumor cell (CTC) enumeration were enrolled.
Results: Fifty-one patients received ARQ 197 at 100 to 400 mg twice per day. ARQ 197 was well tolerated, with the most common toxicities being grade 1 to 2 fatigue, nausea, and vomiting. Dose-limiting toxicities included grade 3 fatigue (200 mg twice per day; n = 1); grade 3 mucositis, palmar-plantar erythrodysesthesia, and hypokalemia (400 mg twice per day; n = 1); and grade 3 to 4 febrile neutropenia (400 mg twice per day, n = 2; 360 mg twice per day, n = 1). The recommended phase II dose was 360 mg twice per day. ARQ 197 systemic exposure was dose dependent and supported twice per day oral dosing. ARQ 197 decreased phosphorylated c-MET, total c-MET, and phosphorylated focal adhesion kinase and increased terminal deoxynucleotidyl transferase-mediated deoxyuridine triphosphate-biotin nick-end labeling (TUNEL) staining in tumor biopsies (n = 15). CECs decreased in 25 (58.1%) of 43 patients, but no significant changes in DCE-MRI parameters were observed after ARQ 197 treatment. Of 15 patients with detectable CTCs, eight (53.3%) had ≥ 30% decline in CTCs after treatment. Stable disease, as defined by Response Evaluation Criteria in Solid Tumors (RECIST), ≥ 4 months was observed in 14 patients, with minor regressions in gastric and Merkel cell cancers.
Conclusion: ARQ 197 safely inhibited intratumoral c-MET signaling. Further clinical evaluation focusing on combination approaches, including an erlotinib combination in non-small-cell lung cancer, is ongoing.
Funded by: Cancer Research UK: 10334, C1060/A10334; Department of Health; Medical Research Council; Wellcome Trust
Journal of clinical oncology : official journal of the American Society of Clinical Oncology 2011;29;10;1271-9
Targeted gene correction of α1-antitrypsin deficiency in induced pluripotent stem cells.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Human induced pluripotent stem cells (iPSCs) represent a unique opportunity for regenerative medicine because they offer the prospect of generating unlimited quantities of cells for autologous transplantation, with potential application in treatments for a broad range of disorders. However, the use of human iPSCs in the context of genetically inherited human disease will require the correction of disease-causing mutations in a manner that is fully compatible with clinical applications. The methods currently available, such as homologous recombination, lack the necessary efficiency and also leave residual sequences in the targeted genome. Therefore, the development of new approaches to edit the mammalian genome is a prerequisite to delivering the clinical promise of human iPSCs. Here we show that a combination of zinc finger nucleases (ZFNs) and piggyBac technology in human iPSCs can achieve biallelic correction of a point mutation (Glu342Lys) in the α1-antitrypsin (A1AT, also known as SERPINA1) gene that is responsible for α1-antitrypsin deficiency. Genetic correction of human iPSCs restored the structure and function of A1AT in subsequently derived liver cells in vitro and in vivo. This approach is significantly more efficient than any other gene-targeting technology that is currently available and crucially prevents contamination of the host genome with residual non-human sequences. Our results provide the first proof of principle, to our knowledge, for the potential of combining human iPSCs with genetic correction to generate clinically relevant cells for autologous cell-based therapies.
Funded by: Medical Research Council: G0601840, G0701448, G0800784, G0901786, G1000847; Wellcome Trust: 077187, WT077187
A hyperactive piggyBac transposase for mammalian applications.
Wellcome Trust Sanger Institute, Cambridge CB10 1SA, United Kingdom.
DNA transposons have been widely used for transgenesis and insertional mutagenesis in various organisms. Among the transposons active in mammalian cells, the moth-derived transposon piggyBac is most promising with its highly efficient transposition, large cargo capacity, and precise repair of the donor site. Here we report the generation of a hyperactive piggyBac transposase. The active transposition of piggyBac in multiple organisms allowed us to screen a transposase mutant library in yeast for hyperactive mutants and then to test candidates in mouse ES cells. We isolated 18 hyperactive mutants in yeast, among which five were also hyperactive in mammalian cells. By combining all mutations, a total of 7 aa substitutions, into a single reading frame, we generated a unique hyperactive piggyBac transposase with 17-fold and ninefold increases in excision and integration, respectively. We showed its applicability by demonstrating an increased efficiency of generation of transgene-free mouse induced pluripotent stem cells. We also analyzed whether this hyperactive piggyBac transposase affects the genomic integrity of the host cells. The frequency of footprints left by the hyperactive piggyBac transposase was as low as that of the WT transposase (~1%), and we found no evidence that the expression of the transposase affects genomic integrity. This hyperactive piggyBac transposase expands the utility of the piggyBac transposon for applications in mammalian genetics and gene therapy.
Funded by: Howard Hughes Medical Institute; Wellcome Trust: WT077187
Proceedings of the National Academy of Sciences of the United States of America 2011;108;4;1531-6
Meta analysis of candidate gene variants outside the LPA locus with Lp(a) plasma levels in 14,500 participants of six White European cohorts.
University College London Genetics Institute, Department of Genetics, Environment and Evolution, Gower St, London WC1E 6BT, UK.
Background: Both genome-wide association studies and candidate gene studies have reported that the major determinant of plasma levels of lipoprotein(a) [Lp(a)] resides within the LPA locus on chromosome 6. We have used data from the HumanCVD BeadChip to explore the contribution of other candidate genes to determining Lp(a) levels.
Methods: 48,032 single nucleotide polymorphisms (SNPs) from the Illumina HumanCVD BeadChip were genotyped in 5059 participants of the Whitehall II study (WHII) of randomly ascertained healthy men and women. SNPs outside the LPA locus showing association with Lp(a) levels at p < 10⁻⁴ were selected for replication in an additional 9463 participants of five European-based studies (EAS, EPIC-Norfolk, NPHSII, PROCARDIS, and SAPHIR).
Results: In Whitehall II, apart from the LPA locus (where p values for several SNPs were < 10⁻³⁰), there was significant association at four loci: GALNT2, FABP1, PPARGC1A and TNFRSF11A. However, a meta-analysis of the six studies did not confirm any of these findings.
Conclusion: Results from this meta analysis of 14,522 participants revealed no candidate genes from the HumanCVD BeadChip outside the LPA locus to have an effect on Lp(a) levels. Further studies with genome-wide and denser SNP coverage are required to confirm or refute this finding.
Funded by: AHRQ HHS: HS06516; British Heart Foundation: PG/07/133/24260, RG/08/008, SP/07/007/23671; Cancer Research UK; Department of Health; Medical Research Council; NHLBI NIH HHS: HL36310; NIA NIH HHS: AG13196; Wellcome Trust
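Illustrative note: the abstract above does not state which pooling model was used for the six cohorts. The sketch below assumes a standard inverse-variance-weighted fixed-effect meta-analysis, purely to show how a signal seen in one discovery cohort can fail to hold once all cohorts are combined. The function name and the per-cohort effect sizes are hypothetical and are not data from the study.

```python
import math


def fixed_effect_meta(betas, ses):
    """Pool per-cohort effect estimates with inverse-variance weights.

    betas: per-cohort effect estimates for one SNP (hypothetical inputs)
    ses:   their standard errors
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled_beta / pooled_se
    # Two-sided p-value under a standard normal null
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p


# Hypothetical example: a strong effect in the discovery cohort (first entry)
# that is not supported by five replication cohorts.
betas = [0.30, -0.05, 0.02, -0.10, 0.04, 0.01]
ses = [0.07, 0.10, 0.09, 0.12, 0.08, 0.11]
pooled_beta, pooled_se, z, p = fixed_effect_meta(betas, ses)
print(f"pooled beta = {pooled_beta:.3f}, SE = {pooled_se:.3f}, p = {p:.3g}")
```

A random-effects model would be a reasonable alternative when cohorts are heterogeneous; the fixed-effect form is used here only because it is the simplest to read.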
Genetic and structural variation in the gastric cancer kinome revealed through targeted deep sequencing.
Cellular and Molecular Research, National Cancer Centre, Singapore.
Genetic alterations in kinases have been linked to multiple human pathologies. To explore the landscape of kinase genetic variation in gastric cancer (GC), we used targeted, paired-end deep sequencing to analyze 532 protein and phosphoinositide kinases in 14 GC cell lines. We identified 10,604 single-nucleotide variants (SNVs) in kinase exons, including more than 300 novel nonsynonymous SNVs. Family-wise analysis of the nonsynonymous SNVs revealed a significant enrichment in mitogen-activated protein kinase (MAPK)-related genes (P < 0.01), suggesting a preferential involvement of this kinase family in GC. A potential antioncogenic role for MAP2K4, a gene exhibiting recurrent alterations in 2 lines, was functionally supported by siRNA knockdown and overexpression studies in wild-type and MAP2K4 variant lines. The deep sequencing data also revealed novel, large-scale structural rearrangement events involving kinases, including gene fusions involving CDK12 and the ERBB2 receptor tyrosine kinase in MKN7 cells. Integrating SNVs and copy number alterations, we identified Hs746T as a cell line exhibiting both splice-site mutations and genomic amplification of MET, resulting in MET protein overexpression. When we applied the approach to primary GCs, we identified somatic mutations in 8 kinases, 4 of which were recurrently altered in both primary tumors and cell lines (MAP3K6, STK31, FER, and CDKL5). These results demonstrate how targeted deep sequencing approaches can deliver unprecedented multilevel characterization of a medically and pharmacologically relevant gene family. The catalog of kinome genetic variants assembled here may broaden our knowledge of kinases and provide useful information on genetic alterations in GC.
Cancer research 2011;71;1;29-39
Animals learn new tricks from microorganisms.
Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK. firstname.lastname@example.org
Nature reviews. Microbiology 2011;9;12;836
Next-generation association studies for complex traits.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK. email@example.com
A new study successfully applies complementary whole-genome sequencing and imputation approaches to establish robust disease associations in an isolated population. This strategy is poised to help elucidate the role of variants at the low end of the allele frequency spectrum in the genetic architecture of complex traits.
Funded by: Wellcome Trust: 088885
Nature genetics 2011;43;4;287-8
An evaluation of power to detect low-frequency variant associations using allele-matching tests that account for uncertainty.
Wellcome Trust Sanger Institute, Hinxton, CB10 1HH, UK. Eleftheria@sanger.ac.uk
There is growing interest in the role of rare variants in multifactorial disease etiology, and increasing evidence that rare variants are associated with complex traits. Single SNP tests are underpowered in rare variant association analyses, so locus-based tests must be used. Quality scores at both the SNP and genotype level are available for sequencing data, but they are rarely accounted for. A locus-based method that has high power in the presence of rare variants is extended to incorporate such quality scores as weights, and its power is compared with that of the original method via a simulation study. Preliminary results suggest that taking uncertainty into account does not improve the power.
Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing 2011;100-5
Analysis of complex disease association studies
Analysis of complex disease association studies 2011
Replication of the association of a MET variant with autism in a Chinese Han population.
Department of Children's and Adolescent Health, Public Health College of Harbin Medical University, Harbin, Heilongjiang, People's Republic of China.
Background: Autism is a common, severe and highly heritable neurodevelopmental disorder in children, affecting up to 100 children per 10,000. The MET gene has been regarded as a promising candidate gene for this disorder because it is located within a replicated linkage interval, is involved in pathways affecting the development of the cerebral cortex and cerebellum in ways relevant to autism patients, and has shown significant association signals in previous studies.
Principal findings: Here, we present new ASD patient and control samples from Heilongjiang, China and use them in a case-control and family-based replication study of two MET variants. One SNP, rs38845, was successfully replicated in a case-control association study, but failed to replicate in a family-based study, possibly due to small sample size. The other SNP, rs1858830, failed to replicate in both case-control and family-based studies.
Conclusions: This is the first attempt to replicate associations in Chinese autism samples, and our result provides evidence that MET variants may be relevant to autism susceptibility in the Chinese Han population.
Funded by: Wellcome Trust
PloS one 2011;6;11;e27428
The Lin28/let-7 axis regulates glucose metabolism.
Stem Cell Transplantation Program, Division of Pediatric Hematology/Oncology, Children's Hospital Boston and Dana Farber Cancer Institute, Boston, MA, USA.
The let-7 tumor suppressor microRNAs are known for their regulation of oncogenes, while the RNA-binding proteins Lin28a/b promote malignancy by inhibiting let-7 biogenesis. We have uncovered unexpected roles for the Lin28/let-7 pathway in regulating metabolism. When overexpressed in mice, both Lin28a and LIN28B promote an insulin-sensitized state that resists high-fat-diet induced diabetes. Conversely, muscle-specific loss of Lin28a or overexpression of let-7 results in insulin resistance and impaired glucose tolerance. These phenomena occur, in part, through the let-7-mediated repression of multiple components of the insulin-PI3K-mTOR pathway, including IGF1R, INSR, and IRS2. In addition, the mTOR inhibitor, rapamycin, abrogates Lin28a-mediated insulin sensitivity and enhanced glucose uptake. Moreover, let-7 targets are enriched for genes containing SNPs associated with type 2 diabetes and control of fasting glucose in human genome-wide association studies. These data establish the Lin28/let-7 pathway as a central regulator of mammalian glucose metabolism.
Funded by: British Heart Foundation: RG/07/008/23674; Howard Hughes Medical Institute; Medical Research Council: G0100222, G0902037, G19/35, G8802774, MC_PC_U127561128, MC_U127561128, MC_UP_A100_1003, MC_UP_A620_1015; NCI NIH HHS: K08 CA157727, T32 CA009172; NIDDK NIH HHS: R01 DK070055; Wellcome Trust: 090532