Sanger Institute - Publications 2012
Number of papers published in 2012: 208
An integrated map of genetic variation from 1,092 human genomes.
By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here we describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By developing methods to integrate information across several algorithms and diverse data sources, we provide a validated haplotype map of 38 million single nucleotide polymorphisms, 1.4 million short insertions and deletions, and more than 14,000 larger deletions. We show that individuals from different populations carry different profiles of rare and common variants, and that low-frequency variants show substantial geographic differentiation, which is further increased by the action of purifying selection. We show that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways, and that each individual contains hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites. This resource, which captures up to 98% of accessible single nucleotide polymorphisms at a frequency of 1% in related populations, enables analysis of common and low-frequency variants in individuals from diverse, including admixed, populations.
Funded by: Biotechnology and Biological Sciences Research Council: BB/I021213/1, BB_BB/I02593X/1; British Heart Foundation: BHF_RG/09/012/28096, RG/09/12/28096; Howard Hughes Medical Institute; Medical Research Council: G0900747(91070), MRC_G0701805, MRC_G0801823, MRC_G0900747; NCATS NIH HHS: UL1 TR000124; NCI NIH HHS: R01 CA166661, R01CA166661; NCRR NIH HHS: G12 RR003050, UL1 RR024131, UL1RR024131; NHGRI NIH HHS: P01 HG004120, P01HG4120, P41 HG002371, P41 HG004221, P41HG2371, P41HG4221, R01 HG002898, R01 HG003698, R01 HG004719, R01 HG004960, R01 HG005701, R01 HG007022, R01HG2898, R01HG3698, R01HG4719, R01HG4960, R01HG5701, RC2 HG005552, RC2 HG005581, RC2HG5552, RC2HG5581, U01 HG005208, U01 HG005209, U01 HG005211, U01 HG005214, U01 HG005715, U01 HG005725, U01 HG005728, U01 HG006513, U01 HG006569, U01HG5208, U01HG5209, U01HG5211, U01HG5214, U01HG5715, U01HG5725, U01HG5728, U01HG6513, U01HG6569, U41 HG004568, U41HG4568, U54 HG003067, U54 HG003079, U54 HG003273, U54HG3067, U54HG3079, U54HG3273; NHLBI NIH HHS: HHSN268201100040C, HL078885, R01 HL078885, R01 HL088133, R01 HL095045, R01HL95045, RC2 HL102925, RC2HL102925, T32 1f40 HL094284, T32HL94284; NIAID NIH HHS: AI077439, AI2009061, U19 AI077439; NIEHS NIH HHS: ES015794, R01 ES015794; NIGMS NIH HHS: R01 GM059290, R01GM59290, T32 GM007748, T32 GM008283, T32GM7748, T32GM8283; NIH HHS: DP2 OD006514, DP2OD6514; NIMH NIH HHS: R01 MH084698, R01MH84698; NIMHD NIH HHS: G12 MD007579, P20 MD006899; NLM NIH HHS: T15 LM007033, T15 LM007056, T15LM7033; PHS HHS: HHSN268201100040C; Wellcome Trust: WT085475/Z/08/Z, WT085532, WT085532AIA, WT086084, WT086084/Z/08/Z, WT089250/Z/09/Z, WT090532, WT090532/Z/09/Z, WT095552/Z/11/Z, WT095908, WT096599, WT098051
Analysis of context-dependent errors for Illumina sequencing.
Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK. email@example.com
The new generation of short-read sequencing technologies requires reliable measures of data quality. Such measures are especially important for variant calling. In the particular case of SNP calling, however, a great number of false-positive SNPs (FP-SNPs) may be obtained, so putative SNPs must be distinguished from sequencing or other errors. We found that distinguishing an FP-SNP depends not only on the probability of a sequencing error (i.e. the quality value) but also on the conditional probability of "correcting" this error (the "second best call" probability, conditional on that of the first call). Surprisingly, around 80% of mismatches can be "corrected" with this second call. Another way to reduce the rate of FP-SNPs is to retrieve DNA motifs that seem to be prone to sequencing errors, and to attach a corresponding conditional quality value to these motifs. We have developed several measures to distinguish between sequence errors and candidate SNPs, based on a base call's nucleotide context and its mismatch type. In addition, we suggest a simple method to correct the majority of mismatches, based on the conditional probability of their "second" best intensity call, and we attach to each mismatch a corresponding second-call confidence (quality value) of being corrected.
Journal of bioinformatics and computational biology 2012;10;2;1241005
BLUEPRINT to decode the epigenetic signature written in blood.
Funded by: Cancer Research UK: 13031; Medical Research Council: G0801156, MR/J001597/1; Wellcome Trust: 079249, 095606, 095645, 095908
Nature biotechnology 2012;30;3;224-6
A suggested new bacteriophage genus: "Viunalikevirus".
Laboratory of Gene Technology, Katholieke Universiteit Leuven, Kasteelpark Arenberg 21, Heverlee, Belgium.
We suggest a bacteriophage genus, "Viunalikevirus", as a new genus within the family Myoviridae. To date, this genus includes seven sequenced members: Salmonella phages ViI, SFP10 and ΦSH19; Escherichia phages CBA120 and PhaxI; Shigella phage phiSboM-AG3; and Dickeya phage LIMEstone1. Their shared myovirus morphology, with comparable head sizes and tail dimensions, and genome organization are considered distinguishing features. They appear to have conserved regulatory sequences, a horizontally acquired tRNA set and the probable substitution of an alternate base for thymine in the DNA. A close examination of the tail spike region in the DNA revealed four distinct tail spike proteins, an arrangement which might lead to the umbrella-like structures of the tails visible on electron micrographs. These properties set the suggested genus apart from the recently ratified subfamily Tevenvirinae, although a significant evolutionary relationship can be observed.
Funded by: NIGMS NIH HHS: 2R15GM63637-3A1
Archives of virology 2012;157;10;2035-46
Compound inheritance of a low-frequency regulatory SNP and a rare null mutation in exon-junction complex subunit RBM8A causes TAR syndrome.
Department of Haematology, University of Cambridge, Cambridge, UK. firstname.lastname@example.org
The exon-junction complex (EJC) performs essential RNA processing tasks. Here, we describe the first human disorder, thrombocytopenia with absent radii (TAR), caused by deficiency in one of the four EJC subunits. Compound inheritance of a rare null allele and one of two low-frequency SNPs in the regulatory regions of RBM8A, encoding the Y14 subunit of EJC, causes TAR. We found that this inheritance mechanism explained 53 of 55 cases (P < 5 × 10^-228) of the rare congenital malformation syndrome. Of the 53 cases with this inheritance pattern, 51 carried a submicroscopic deletion of 1q21.1 that has previously been associated with TAR, and two carried a truncation or frameshift null mutation in RBM8A. We show that the two regulatory SNPs result in diminished RBM8A transcription in vitro and that Y14 expression is reduced in platelets from individuals with TAR. Our data implicate Y14 insufficiency and, presumably, an EJC defect as the cause of TAR syndrome.
Funded by: British Heart Foundation: BHF_FS/09/039/27788, BHF_RG/08/014/24067, BHF_RG/09/012/28096, FS/09/039, RG/09/12/28096; Department of Health: DH_RP-PG-0310-1002; Wellcome Trust: WT-082597/Z/07/Z, WT-084183/2/07/2, WT082597, WT084183, WT091310
Nature genetics 2012;44;4;435-9, S1-2
High-throughput decoding of antitrypanosomal drug efficacy and resistance.
London School of Hygiene and Tropical Medicine, Keppel Street, London, WC1E 7HT, UK.
The concept of disease-specific chemotherapy was developed a century ago. Dyes and arsenical compounds that displayed selectivity against trypanosomes were central to this work, and the drugs that emerged remain in use for treating human African trypanosomiasis (HAT). The importance of understanding the mechanisms underlying selective drug action and resistance for the development of improved HAT therapies has been recognized, but these mechanisms have remained largely unknown. Here we use all five current HAT drugs for genome-scale RNA interference target sequencing (RIT-seq) screens in Trypanosoma brucei, revealing the transporters, organelles, enzymes and metabolic pathways that function to facilitate antitrypanosomal drug action. RIT-seq profiling identifies both known drug importers and the only known pro-drug activator, and links more than fifty additional genes to drug action. A bloodstream stage-specific invariant surface glycoprotein (ISG75) family mediates suramin uptake, and the AP1 adaptin complex, lysosomal proteases and major lysosomal transmembrane protein, as well as spermidine and N-acetylglucosamine biosynthesis, all contribute to suramin action. Further screens link ubiquinone availability to nitro-drug action, plasma membrane P-type H(+)-ATPases to pentamidine action, and trypanothione and several putative kinases to melarsoprol action. We also demonstrate a major role for aquaglyceroporins in pentamidine and melarsoprol cross-resistance. These advances in our understanding of mechanisms of antitrypanosomal drug efficacy and resistance will aid the rational design of new therapies and help to combat drug resistance, and provide unprecedented molecular insight into the mode of action of antitrypanosomal drugs.
Funded by: Wellcome Trust: 085775, 085775/Z/08/Z, 090007, 090007/Z/09/Z, 093010, 093010/Z/10/Z
Identification of new susceptibility loci for osteoarthritis (arcOGEN): a genome-wide association study.
Wellcome Trust Sanger Institute, Morgan Building, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1HH, UK. email@example.com
Background: Osteoarthritis is the most common form of arthritis worldwide and is a major cause of pain and disability in elderly people. The health economic burden of osteoarthritis is increasing commensurate with obesity prevalence and longevity. Osteoarthritis has a strong genetic component but the success of previous genetic studies has been restricted due to insufficient sample sizes and phenotype heterogeneity.
Methods: We undertook a large genome-wide association study (GWAS) in 7410 unrelated and retrospectively and prospectively selected patients with severe osteoarthritis in the arcOGEN study, 80% of whom had undergone total joint replacement, and 11,009 unrelated controls from the UK. We replicated the most promising signals in an independent set of up to 7473 cases and 42,938 controls, from studies in Iceland, Estonia, the Netherlands, and the UK. All patients and controls were of European descent.
Findings: We identified five genome-wide significant loci (binomial test p≤5·0×10^-8) for association with osteoarthritis and three loci just below this threshold. The strongest association was on chromosome 3 with rs6976 (odds ratio 1·12 [95% CI 1·08-1·16]; p=7·24×10^-11), which is in perfect linkage disequilibrium with rs11177. This SNP encodes a missense polymorphism within the nucleostemin-encoding gene GNL3. Levels of nucleostemin were raised in chondrocytes from patients with osteoarthritis in functional studies. Other significant loci were on chromosome 9 close to ASTN2, chromosome 6 between FILIP1 and SENP6, chromosome 12 close to KLHDC5 and PTHLH, and in another region of chromosome 12 close to CHST11. One of the signals close to genome-wide significance was within the FTO gene, which is involved in regulation of bodyweight, a strong risk factor for osteoarthritis. All risk variants were common in frequency and exerted small effects.
Interpretation: Our findings provide insight into the genetics of arthritis and identify new pathways that might be amenable to future therapeutic intervention.
Funding: arcOGEN was funded by a special purpose grant from Arthritis Research UK.
Funded by: Arthritis Research UK: 18030; Medical Research Council: MRC_G0100594, MRC_G0901461, MRC_MC_U122886349
Lancet (London, England) 2012;380;9844;815-23
An evaluation of different meta-analysis approaches in the presence of allelic heterogeneity.
Department of Human Genetics, Wellcome Trust Sanger Institute, Hinxton, UK.
Meta-analysis has proven a useful tool in genetic association studies. Allelic heterogeneity can arise from ethnic background differences across populations being meta-analyzed (for example, in search of common frequency variants through genome-wide association studies), and through the presence of multiple low frequency and rare associated variants in the same functional unit of interest (for example, within a gene or a regulatory region). The latter challenge will be increasingly relevant in whole-genome and whole-exome sequencing studies investigating association with complex traits. Here, we evaluate the performance of different approaches to meta-analysis in the presence of allelic heterogeneity. We simulate allelic heterogeneity scenarios in three populations and examine the performance of current approaches to the analysis of these data. We show that current approaches can detect only a small fraction of common frequency causal variants. We also find that for low-frequency variants with large effects (odds ratios 2-3), single-point tests have high power, but also high false-positive rates. P-value based meta-analysis of summary results from allele-matching locus-wide tests outperforms collapsing approaches. We conclude that current strategies for the combination of genetic association data in the presence of allelic heterogeneity are insufficiently powered.
Funded by: Wellcome Trust: 098051
European journal of human genetics : EJHG 2012;20;6;709-12
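Illustrative note: the abstract above refers to P-value based meta-analysis of locus-wide test results but does not say which combination rule was used. The sketch below assumes Fisher's method purely as an example of the idea; the function name `fisher_combined_p` is hypothetical and is not taken from the paper.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent P-values with Fisher's method (assumed rule)."""
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("P-values must lie in (0, 1]")
    k = len(p_values)
    # Fisher's statistic is X = -2 * sum(ln p_i); under the null it follows
    # a chi-square distribution with 2k degrees of freedom. We keep X / 2.
    half_x = -sum(math.log(p) for p in p_values)
    # For even degrees of freedom the chi-square survival function is
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!, so no external library is needed.
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return min(1.0, math.exp(-half_x) * total)


# One P-value per study for the same locus (made-up numbers).
combined = fisher_combined_p([0.05, 0.05])
```

Because only per-study P-values are combined, the direction of effect of individual variants does not need to agree across populations, which is one reason such an approach can tolerate allelic heterogeneity.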
Imputation of rare variants in next-generation association studies.
Wellcome Trust Sanger Institute, Hinxton, UK. firstname.lastname@example.org
The role of rare variants has become a focus in the search for association with complex traits. Imputation is a powerful and cost-efficient tool to access variants that have not been directly typed, but there are several challenges when imputing rare variants, most notably reference panel selection. Extensions to rare variant association tests to incorporate genotype uncertainty from imputation are discussed, as well as the use of imputed low-frequency and rare variants in the study of population isolates.
Funded by: Wellcome Trust: WT098051
Human heredity 2012;74;3-4;196-204
ARIEL and AMELIA: testing for an accumulation of rare variants using next-generation sequencing data.
Wellcome Trust Sanger Institute, Hinxton, UK.
Objectives: There is increasing evidence that rare variants play a role in some complex traits, but their analysis is not straightforward. Locus-based tests become necessary due to low power in rare variant single-point association analyses. In addition, variant quality scores are available for sequencing data, but are rarely taken into account. Here, we propose two locus-based methods that incorporate variant quality scores: a regression-based collapsing approach and an allele-matching method.
Methods: Using simulated sequencing data we compare 4 locus-based tests of trait association under different scenarios of data quality. We test two collapsing-based approaches and two allele-matching-based approaches, taking into account variant quality scores and ignoring variant quality scores. We implement the collapsing and allele-matching approaches accounting for variant quality in the freely available ARIEL and AMELIA software.
Results: The incorporation of variant quality scores in locus-based association tests has power advantages over weighting each variant equally. The allele-matching methods are robust to the presence of both protective and risk variants in a locus, while collapsing methods exhibit a dramatic loss of power in this scenario.
Conclusions: The incorporation of variant quality scores should be a standard protocol when performing locus-based association analysis on sequencing data. The ARIEL and AMELIA software implement collapsing and allele-matching locus association analysis methods, respectively, that allow the incorporation of variant quality scores.
Funded by: Wellcome Trust: 098051, WT088885, WT090532
Human heredity 2012;73;2;84-94
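Illustrative note: the collapsing idea described above can be sketched as a per-individual burden score in which each variant is weighted by its quality. This is not the ARIEL implementation (which the abstract describes as regression-based). The function names, the 0 to 1 quality scale and the simple case-control mean difference are all assumptions made for illustration.

```python
def collapsed_scores(genotypes, quality):
    """Quality-weighted burden score per individual.

    genotypes: one list per individual of minor-allele counts (0, 1 or 2)
    quality:   one weight per variant, assumed here to lie in [0, 1]
    """
    return [sum(g * q for g, q in zip(row, quality)) for row in genotypes]


def mean_difference(scores, is_case):
    """Difference in mean burden score between cases and controls."""
    cases = [s for s, c in zip(scores, is_case) if c]
    controls = [s for s, c in zip(scores, is_case) if not c]
    return sum(cases) / len(cases) - sum(controls) / len(controls)


# Three individuals, two rare variants; the second variant call is less reliable.
genotypes = [[1, 0], [0, 1], [0, 0]]
quality = [1.0, 0.5]
scores = collapsed_scores(genotypes, quality)
difference = mean_difference(scores, [True, True, False])
```

Because the score sums allele counts across the locus, risk and protective variants pull the case-control difference in opposite directions, which is consistent with the loss of power the abstract reports for collapsing methods in that scenario.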
Characterization of within-host Plasmodium falciparum diversity using next-generation sequence data.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom. email@example.com
The composition of multi-clonal malarial infections and the epidemiological factors which shape their diversity remain poorly understood. Traditionally, within-host diversity has been defined in terms of the multiplicity of infection (MOI) derived by PCR-based genotyping. Massively parallel, single molecule sequencing technologies now enable individual read counts to be derived on genome-wide datasets, facilitating the development of new statistical approaches to describe within-host diversity. In this class of measures the F(WS) metric characterizes within-host diversity and its relationship to population level diversity. Utilizing P. falciparum field isolates from patients in West Africa we here explore the relationship between the traditional MOI and F(WS) approaches. F(WS) statistics were derived from read count data at 86,158 SNPs in 64 samples sequenced on the Illumina GA platform. MOI estimates were derived by PCR at the msp-1 and -2 loci. Significant correlations were observed between the two measures, particularly with the msp-1 locus (P = 5.92×10^-5). The F(WS) metric should be more robust than the PCR-based approach owing to reduced sensitivity to potential locus-specific artifacts. Furthermore, the F(WS) metric captures information on a range of parameters which influence out-crossing risk, including the number of clones (MOI), their relative proportions and genetic divergence. This approach should provide novel insights into the factors which correlate with, and shape, within-host diversity.
Funded by: Howard Hughes Medical Institute: 55005502; Medical Research Council: G0600718, G19/9; Wellcome Trust: 089275, 090532, 090770
PloS one 2012;7;2;e32891
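Illustrative note: the abstract above does not give a formula for F(WS). The sketch below assumes the commonly quoted form F(WS) = 1 - H_W / H_S, where H_W is the heterozygosity within one sample (from read counts) and H_S is the heterozygosity in the local population. The published estimator may differ in detail, and the function names and inputs here are hypothetical.

```python
def heterozygosity(p):
    """Expected heterozygosity at a biallelic site with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def f_ws(sample_counts, population_freqs):
    """Assumed estimator: 1 - mean(H_W) / mean(H_S) across SNPs.

    sample_counts:    (ref_reads, alt_reads) per SNP for one sample
    population_freqs: population reference-allele frequency per SNP
    """
    h_w, h_s = [], []
    for (ref, alt), p_pop in zip(sample_counts, population_freqs):
        depth = ref + alt
        if depth == 0:
            continue  # no reads at this SNP, so it carries no information
        h_w.append(heterozygosity(ref / depth))
        h_s.append(heterozygosity(p_pop))
    if not h_s or sum(h_s) == 0:
        raise ValueError("no informative SNPs")
    return 1.0 - (sum(h_w) / len(h_w)) / (sum(h_s) / len(h_s))


# A sample whose reads all carry one allele per SNP behaves as a single clone.
clonal = f_ws([(10, 0), (0, 20)], [0.5, 0.5])
# A sample whose read counts mirror the population frequencies is fully mixed.
mixed = f_ws([(5, 5), (10, 10)], [0.5, 0.5])
```

Under this assumed definition, values near 1 suggest a single dominant clone and values near 0 suggest within-host diversity comparable to that of the population.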
Mouse large-scale phenotyping initiatives: overview of the European Mouse Disease Clinic (EUMODIC) and of the Wellcome Trust Sanger Institute Mouse Genetics Project.
Institut Clinique de la Souris, PHENOMIN, IGBMC/ICS-MCI, CNRS, INSERM, Université de Strasbourg, UMR7104, UMR964, 1 rue Laurent Fries, 67404 Illkirch, France.
Two large-scale phenotyping efforts, the European Mouse Disease Clinic (EUMODIC) and the Wellcome Trust Sanger Institute Mouse Genetics Project (SANGER-MGP), started during the late 2000s with the aim to deliver a comprehensive assessment of phenotypes or to screen for robust indicators of diseases in mouse mutants. They both took advantage of available mouse mutant lines but predominantly of the embryonic stem (ES) cells resources derived from the European Conditional Mouse Mutagenesis programme (EUCOMM) and the Knockout Mouse Project (KOMP) to produce and study 799 mouse models that were systematically analysed with a comprehensive set of physiological and behavioural paradigms. They captured more than 400 variables and an additional panel of metadata describing the conditions of the tests. All the data are now available through EuroPhenome database (www.europhenome.org) and the WTSI mouse portal (http://www.sanger.ac.uk/mouseportal/), and the corresponding mouse lines are available through the European Mouse Mutant Archive (EMMA), the International Knockout Mouse Consortium (IKMC), or the Knockout Mouse Project (KOMP) Repository. Overall conclusions from both studies converged, with at least one phenotype scored in at least 80% of the mutant lines. In addition, 57% of the lines were viable, 13% subviable, 30% embryonic lethal, and 7% displayed fertility impairments. These efforts provide an important underpinning for a future global programme that will undertake the complete functional annotation of the mammalian genome in the mouse model.
Funded by: Cancer Research UK: CRUK_13031; Medical Research Council: MRC_G0300212, MRC_MC_U142684171, MRC_MC_U142684172, MRC_MC_U142684175, MRC_MC_qA137918; Wellcome Trust: 098051
Mammalian genome : official journal of the International Mammalian Genome Society 2012;23;9-10;600-10
Ubiquitous Hepatocystis infections, but no evidence of Plasmodium falciparum-like malaria parasites in wild greater spot-nosed monkeys (Cercopithecus nictitans).
Institut de Recherche pour le Développement, University of Montpellier, 34394 Montpellier, France.
Western gorillas (Gorilla gorilla) have been identified as the natural reservoir of the parasites that were the immediate precursor of Plasmodium falciparum infecting humans. Recently, a P. falciparum-like sequence was reported in a sample from a captive greater spot-nosed monkey (Cercopithecus nictitans), and was taken to indicate that this species may also be a natural reservoir for P. falciparum-related parasites. To test this hypothesis we screened blood samples from 292 wild C. nictitans monkeys that had been hunted for bushmeat in Cameroon. We detected Hepatocystis spp. in 49% of the samples, as well as one sequence from a clade of Plasmodium spp. previously found in birds, lizards and bats. However, none of the 292 wild C. nictitans harbored P. falciparum-like parasites.
Funded by: NIAID NIH HHS: AI91595, R01 AI50529; Wellcome Trust: 090851
International journal for parasitology 2012;42;8;709-13
A dominantly acting murine allele of Mcm4 causes chromosomal abnormalities and promotes tumorigenesis.
School of Pharmacy and UW Carbone Cancer Center, University of Wisconsin Madison, Madison, WI, USA.
Here we report the isolation of a murine model for heritable T cell lymphoblastic leukemia/lymphoma (T-ALL) called Spontaneous dominant leukemia (Sdl). Sdl heterozygous mice develop disease with a short latency and high penetrance, while mice homozygous for the mutation die early during embryonic development. Sdl mice exhibit an increase in the frequency of micronucleated reticulocytes, and T-ALLs from Sdl mice harbor small amplifications and deletions, including activating deletions at the Notch1 locus. Using exome sequencing it was determined that Sdl mice harbor a spontaneously acquired mutation in Mcm4 (Mcm4(D573H)). MCM4 is part of the heterohexameric complex of MCM2-7 that is important for licensing of DNA origins prior to S phase and also serves as the core of the replicative helicase that unwinds DNA at replication forks. Previous studies in murine models have discovered that genetic reductions of MCM complex levels promote tumor formation by causing genomic instability. However, Sdl mice possess normal levels of Mcms, and there is no evidence for loss-of-heterozygosity at the Mcm4 locus in Sdl leukemias. Studies in Saccharomyces cerevisiae indicate that the Sdl mutation produces a biologically inactive helicase. Together, these data support a model in which chromosomal abnormalities in Sdl mice result from the ability of MCM4(D573H) to incorporate into MCM complexes and render them inactive. Our studies indicate that dominantly acting alleles of MCMs can be compatible with viability but have dramatic oncogenic consequences by causing chromosomal abnormalities.
Funded by: Cancer Research UK; Medical Research Council: G0800024; NCI NIH HHS: K01CA122183, P30 CA014520, P30CA014520, R03CA137751; NIGMS NIH HHS: GM102756; Wellcome Trust
PLoS genetics 2012;8;11;e1003034
Evolutionary dynamics of local pandemic H1N1/2009 influenza virus lineages revealed by whole-genome analysis.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom.
Virus gene sequencing and phylogenetics can be used to study the epidemiological dynamics of rapidly evolving viruses. With complete genome data, it becomes possible to identify and trace individual transmission chains of viruses such as influenza virus during the course of an epidemic. Here we sequenced 153 pandemic influenza H1N1/09 virus genomes from United Kingdom isolates from the first (127 isolates) and second (26 isolates) waves of the 2009 pandemic and used their sequences, dates of isolation, and geographical locations to infer the genetic epidemiology of the epidemic in the United Kingdom. We demonstrate that the epidemic in the United Kingdom was composed of many cocirculating lineages, among which at least 13 were exclusively or predominantly United Kingdom clusters. The estimated divergence times of two of the clusters predate the detection of pandemic H1N1/09 virus in the United Kingdom, suggesting that the pandemic H1N1/09 virus was already circulating in the United Kingdom before the first clinical case. Crucially, three clusters contain isolates from the second wave of infections in the United Kingdom, two of which represent chains of transmission that appear to have persisted within the United Kingdom between the first and second waves. This demonstrates that whole-genome analysis can track in fine detail the behavior of individual influenza virus lineages during the course of a single epidemic or pandemic.
Funded by: Medical Research Council: MC_U117512723; Wellcome Trust: 095831
Journal of virology 2012;86;1;11-8
From HLA association to function.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK. firstname.lastname@example.org
A new study refines the association signals for rheumatoid arthritis susceptibility in the major histocompatibility complex (MHC) region to five amino-acid positions encoded in three HLA genes, all within peptide-binding grooves. By adapting statistical methods from genome-wide association studies (GWAS) and using imputation from a large reference panel, they demonstrate the potential for this approach to identify functional variants in associated regions.
Nature genetics 2012;44;3;235-6
Semaphorin-7A is an erythrocyte receptor for P. falciparum merozoite-specific TRAP homolog, MTRAP.
Cell Surface Signalling Laboratory, Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom; Malaria Programme, Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.
The motility and invasion of Plasmodium parasites is believed to require a cytoplasmic actin-myosin motor associated with a cell surface ligand belonging to the TRAP (thrombospondin-related anonymous protein) family. Current models of invasion usually invoke the existence of specific receptors for the TRAP-family ligands on the surface of the host cell; however, the identities of these receptors remain largely unknown. Here, we identify the GPI-linked protein Semaphorin-7A (CD108) as an erythrocyte receptor for the P. falciparum merozoite-specific TRAP homolog (MTRAP) by using a systematic screening approach designed to detect extracellular protein interactions. The specificity of the interaction was demonstrated by showing that binding was saturable and by quantifying the equilibrium and kinetic biophysical binding parameters using surface plasmon resonance. We found that two MTRAP monomers interact via their tandem TSR domains with the Sema domains of a Semaphorin-7A homodimer. Known naturally-occurring polymorphisms in Semaphorin-7A did not quantitatively affect MTRAP binding nor did the presence of glycans on the receptor. Attempts to block the interaction during in vitro erythrocyte invasion assays using recombinant proteins and antibodies showed no significant inhibitory effect, suggesting the inaccessibility of the complex to proteinaceous blocking agents. These findings now provide important experimental evidence to support the model that parasite TRAP-family ligands interact with specific host receptors during cellular invasion.
Funded by: Medical Research Council: MR/J002283/1; Wellcome Trust: 098051
PLoS pathogens 2012;8;11;e1003031
Deficiency for the ubiquitin ligase UBE3B in a blepharophimosis-ptosis-intellectual-disability syndrome.
Raphael Recanati Genetics Institute, Rabin Medical Center, Beilinson Campus, Petah Tikva 49100, Israel. email@example.com
Ubiquitination plays a crucial role in neurodevelopment as exemplified by Angelman syndrome, which is caused by genetic alterations of the ubiquitin ligase-encoding UBE3A gene. Although the function of UBE3A has been widely studied, little is known about its paralog UBE3B. By using exome and capillary sequencing, we here identify biallelic UBE3B mutations in four patients from three unrelated families presenting an autosomal-recessive blepharophimosis-ptosis-intellectual-disability syndrome characterized by developmental delay, growth retardation with a small head circumference, facial dysmorphisms, and low cholesterol levels. UBE3B encodes an uncharacterized E3 ubiquitin ligase. The identified UBE3B variants include one frameshift and two splice-site mutations as well as a missense substitution affecting the highly conserved HECT domain. Disruption of mouse Ube3b leads to reduced viability and recapitulates key aspects of the human disorder, such as reduced weight and brain size and a downregulation of cholesterol synthesis. We establish that the probable Caenorhabditis elegans ortholog of UBE3B, oxi-1, functions in the ubiquitin/proteasome system in vivo and is especially required under oxidative stress conditions. Our data reveal the pleiotropic effects of UBE3B deficiency and reinforce the physiological importance of ubiquitination in neuronal development and function in mammals.
Funded by: Cancer Research UK: 13031; Medical Research Council: G0300212, MC_QA137918; Wellcome Trust: 090532
American journal of human genetics 2012;91;6;998-1010
Rapid-throughput skeletal phenotyping of 100 knockout mice identifies 9 new genes that determine bone strength.
Molecular Endocrinology Group, Department of Medicine, Imperial College London, London, United Kingdom.
Osteoporosis is a common polygenic disease and global healthcare priority but its genetic basis remains largely unknown. We report a high-throughput multi-parameter phenotype screen to identify functionally significant skeletal phenotypes in mice generated by the Wellcome Trust Sanger Institute Mouse Genetics Project and discover novel genes that may be involved in the pathogenesis of osteoporosis. The integrated use of primary phenotype data with quantitative x-ray microradiography, micro-computed tomography, statistical approaches and biomechanical testing in 100 unselected knockout mouse strains identified nine new genetic determinants of bone mass and strength. These nine new genes include five whose deletion results in low bone mass and four whose deletion results in high bone mass. None of the nine genes have been implicated previously in skeletal disorders and detailed analysis of the biomechanical consequences of their deletion revealed a novel functional classification of bone structure and strength. The organ-specific and disease-focused strategy described in this study can be applied to any biological system or tractable polygenic disease, thus providing a general basis to define gene function in a system-specific manner. Application of the approach to diseases affecting other physiological systems will help to realize the full potential of the International Mouse Phenotyping Consortium.
Funded by: Arthritis Research UK: 18292; Medical Research Council: G0800261; Wellcome Trust: 094134, 77157/Z/05/Z
PLoS genetics 2012;8;8;e1002858
Epigenome-wide scans identify differentially methylated regions for age and age-related phenotypes in a healthy ageing population.
Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK. firstname.lastname@example.org
Age-related changes in DNA methylation have been implicated in cellular senescence and longevity, yet the causes and functional consequences of these variants remain unclear. To elucidate the role of age-related epigenetic changes in healthy ageing and potential longevity, we tested for association between whole-blood DNA methylation patterns in 172 female twins aged 32 to 80 with age and age-related phenotypes. Twin-based DNA methylation levels at 26,690 CpG-sites showed evidence for mean genome-wide heritability of 18%, which was supported by the identification of 1,537 CpG-sites with methylation QTLs in cis at FDR 5%. We performed genome-wide analyses to discover differentially methylated regions (DMRs) for sixteen age-related phenotypes (ap-DMRs) and chronological age (a-DMRs). Epigenome-wide association scans (EWAS) identified age-related phenotype DMRs (ap-DMRs) associated with LDL (STAT5A), lung function (WT1), and maternal longevity (ARL4A, TBX20). In contrast, EWAS for chronological age identified hundreds of predominantly hyper-methylated age DMRs (490 a-DMRs at FDR 5%), of which only one (TBX20) was also associated with an age-related phenotype. Therefore, the majority of age-related changes in DNA methylation are not associated with phenotypic measures of healthy ageing in later life. We replicated a large proportion of a-DMRs in a sample of 44 younger adult MZ twins aged 20 to 61, suggesting that a-DMRs may initiate at an earlier age. We next explored potential genetic and environmental mechanisms underlying a-DMRs and ap-DMRs. Genome-wide overlap across cis-meQTLs, genotype-phenotype associations, and EWAS ap-DMRs identified CpG-sites that had cis-meQTLs with evidence for genotype-phenotype association, where the CpG-site was also an ap-DMR for the same phenotype. Monozygotic twin methylation difference analyses identified one potential environmentally-mediated ap-DMR associated with total cholesterol and LDL (CSMD1). Our results suggest that in a small set of genes DNA methylation may be a candidate mechanism of mediating not only environmental, but also genetic effects on age-related phenotypes.
Funded by: European Research Council: 250157; Medical Research Council: G0900339; Wellcome Trust: 090532, 095515
PLoS genetics 2012;8;4;e1002629
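The abstract above reports CpG-sites and DMRs "at FDR 5%" without saying which false-discovery-rate procedure was used. As an illustration only, the following is a minimal sketch of the widely used Benjamini-Hochberg step-up procedure; the function name and the example p-values are hypothetical and are not taken from the paper.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True if rejected at FDR level alpha.

    Illustrative sketch of the Benjamini-Hochberg step-up rule; the paper's
    own FDR method is not stated in the abstract.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical per-site p-values, for demonstration only.
    example = [0.039, 0.001, 0.216, 0.008]
    print(benjamini_hochberg(example, alpha=0.05))
```

Because the rule is step-up, a p-value that misses its own rank threshold can still be rejected when a larger p-value further down the sorted list meets its threshold.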
A robust clustering algorithm for identifying problematic samples in genome-wide association studies.
Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford OX3 7BN, UK.
Summary: High-throughput genotyping arrays provide an efficient way to survey single nucleotide polymorphisms (SNPs) across the genome in large numbers of individuals. Downstream analysis of the data, for example in genome-wide association studies (GWAS), often involves statistical models of genotype frequencies across individuals. The complexities of the sample collection process and the potential for errors in the experimental assay can lead to biases and artefacts in an individual's inferred genotypes. Rather than attempting to model these complications, it has become a standard practice to remove individuals whose genome-wide data differ from the sample at large. Here we describe a simple, but robust, statistical algorithm to identify samples with atypical summaries of genome-wide variation. Its use as a semi-automated quality control tool is demonstrated using several summary statistics, selected to identify different potential problems, and it is applied to two different genotyping platforms and sample collections.
Availability: The algorithm is written in R and is freely available at www.well.ox.ac.uk/chris-spencer
Supplementary data are available at Bioinformatics online.
Funded by: Wellcome Trust: 075491/Z/04/B, 084575/Z/08/Z, 090532, 090532/Z/09/Z
Bioinformatics (Oxford, England) 2012;28;1;134-5
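The idea of flagging samples whose genome-wide summaries differ from the sample at large can be sketched with a much simpler rule than the published one. The code below is not the paper's algorithm (which is distributed in R); it swaps in a median and median-absolute-deviation (MAD) rule, and the function name, the 3.5 cut-off and the example heterozygosity values are all assumptions made for illustration.

```python
from statistics import median


def flag_atypical_samples(values, threshold=3.5):
    """Flag values whose MAD-based modified z-score exceeds the threshold.

    Simplified stand-in for a robust outlier rule, not the published
    clustering algorithm. The threshold of 3.5 is a conventional choice.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # More than half the values equal the median, so the score is
        # undefined; this sketch flags nothing in that case.
        return [False] * len(values)
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # when the bulk of the data is roughly normal.
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]


if __name__ == "__main__":
    # Hypothetical per-sample heterozygosity summaries.
    heterozygosity = [0.30, 0.31, 0.29, 0.30, 0.32, 0.31, 0.45]
    print(flag_atypical_samples(heterozygosity))
```

The median and MAD are used instead of the mean and standard deviation because a few extreme samples barely move them, which is the property the abstract calls robustness.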
The genome of Mycobacterium africanum West African 2 reveals a lineage-specific locus and genome erosion common to the M. tuberculosis complex.
Wellcome Trust Genome Campus, Wellcome Trust Sanger Institute, Hinxton, UK.
Background: M. africanum West African 2 constitutes an ancient lineage of the M. tuberculosis complex that commonly causes human tuberculosis in West Africa and has an attenuated phenotype relative to M. tuberculosis.
Methodology/principal findings: In search of candidate genes underlying these differences, the genome of M. africanum West African 2 was sequenced using classical capillary sequencing techniques. Our findings reveal a unique sequence, RD900, that was independently lost during the evolution of two important lineages within the complex: the "modern" M. tuberculosis group and the lineage leading to M. bovis. Closely related to M. bovis and other animal strains within the M. tuberculosis complex, M. africanum West African 2 shares an abundance of pseudogenes with M. bovis but also with M. africanum West African clade 1. Comparison with other strains of the M. tuberculosis complex revealed pseudogene events in all the known lineages, pointing toward ongoing genome erosion likely due to increased genetic drift and relaxed selection linked to serial transmission-bottlenecks and an intracellular lifestyle.
Conclusions/significance: The genomic differences identified between M. africanum West African 2 and the other strains of the Mycobacterium tuberculosis complex may explain its attenuated phenotype, and pave the way for targeted experiments to elucidate the phenotypic characteristic of M. africanum. Moreover, availability of the whole genome data allows for verification of conservation of targets used for the next generation of diagnostics and vaccines, in order to ensure similar efficacy in West Africa.
Funded by: Medical Research Council: MRC_MC_U190071468, MRC_MC_U190074190, MRC_MC_U190081982, MRC_MC_U190081991, MRC_MC_U190085850, MRC_MC_UP_A900_1122; Wellcome Trust
PLoS neglected tropical diseases 2012;6;2;e1552
An insertional mutagenesis screen identifies genes that cooperate with Mll-AF9 in a murine leukemogenesis model.
Department of Genetics, Cell Biology and Development, Masonic Cancer Center, University of Minnesota Twin Cities, Minneapolis, MN 55455, USA.
Patients with a t(9;11) translocation (MLL-AF9) develop acute myeloid leukemia (AML), and while in mice the expression of this fusion oncogene also results in the development of myeloid leukemia, it is with long latency. To identify mutations that cooperate with Mll-AF9, we infected neonatal wild-type (WT) or Mll-AF9 mice with a murine leukemia virus (MuLV). MuLV-infected Mll-AF9 mice succumbed to disease significantly faster than controls presenting predominantly with myeloid leukemia while infected WT animals developed predominantly lymphoid leukemia. We identified 88 candidate cancer genes near common sites of proviral insertion. Analysis of transcript levels revealed significantly elevated expression of Mn1, and a trend toward increased expression of Bcl11a and Fosb in Mll-AF9 murine leukemia samples with proviral insertions proximal to these genes. Accordingly, FOSB and BCL11A were also overexpressed in human AML harboring MLL gene translocations. FOSB was revealed to be essential for growth in mouse and human myeloid leukemia cells using shRNA lentiviral vectors in vitro. Importantly, MN1 cooperated with Mll-AF9 in leukemogenesis in an in vivo BM viral transduction and transplantation assay. Together, our data identified genes that define transcription factor networks and important genetic pathways acting during progression of leukemia induced by MLL fusion oncogenes.
Funded by: Cancer Research UK: 13031; Howard Hughes Medical Institute; NCI NIH HHS: CA009138, F32 CA106192, K01 CA122183, U01 CA84221; Wellcome Trust
Pancreatic cancer genomes reveal aberrations in axon guidance pathway genes.
The Kinghorn Cancer Centre, 370 Victoria Street, Darlinghurst, Sydney, New South Wales 2010, Australia.
Pancreatic cancer is a highly lethal malignancy with few effective therapies. We performed exome sequencing and copy number analysis to define genomic aberrations in a prospectively accrued clinical cohort (n = 142) of early (stage I and II) sporadic pancreatic ductal adenocarcinoma. Detailed analysis of 99 informative tumours identified substantial heterogeneity with 2,016 non-silent mutations and 1,628 copy-number variations. We define 16 significantly mutated genes, reaffirming known mutations (KRAS, TP53, CDKN2A, SMAD4, MLL3, TGFBR2, ARID1A and SF3B1), and uncover novel mutated genes including additional genes involved in chromatin modification (EPC1 and ARID2), DNA damage repair (ATM) and other mechanisms (ZIM2, MAP2K4, NALCN, SLC16A4 and MAGEA6). Integrative analysis with in vitro functional data and animal models provided supportive evidence for potential roles for these genetic aberrations in carcinogenesis. Pathway-based analysis of recurrently mutated genes recapitulated clustering in core signalling pathways in pancreatic ductal adenocarcinoma, and identified new mutated genes in each pathway. We also identified frequent and diverse somatic aberrations in genes described traditionally as embryonic regulators of axon guidance, particularly SLIT/ROBO signalling, which was also evident in murine Sleeping Beauty transposon-mediated somatic mutagenesis models of pancreatic cancer, providing further supportive evidence for the potential involvement of axon guidance genes in pancreatic carcinogenesis.
Funded by: Cancer Research UK: 13031; NCI NIH HHS: 2P50CA101955, P01CA134292, P50 CA101955, P50 CA102701, P50CA062924, R01 CA097075, R01 CA97075; NHGRI NIH HHS: U54 HG003273; Wellcome Trust
Impact of common variation in bone-related genes on type 2 diabetes and related traits.
Center for Human Genetic Research, Massachusetts General Hospital, Boston, Massachusetts, USA.
Exploring genetic pleiotropy can provide clues to a mechanism underlying the observed epidemiological association between type 2 diabetes and heightened fracture risk. We examined genetic variants associated with bone mineral density (BMD) for association with type 2 diabetes and glycemic traits in large well-phenotyped and -genotyped consortia. We undertook follow-up analysis in ∼19,000 individuals and assessed gene expression. We queried single nucleotide polymorphisms (SNPs) associated with BMD at levels of genome-wide significance, variants in linkage disequilibrium (r(2) > 0.5), and BMD candidate genes. SNP rs6867040, at the ITGA1 locus, was associated with a 0.0166 mmol/L (0.004) increase in fasting glucose per C allele in the combined analysis. Genetic variants in the ITGA1 locus were associated with its expression in the liver but not in adipose tissue. ITGA1 variants appeared among the top loci associated with type 2 diabetes, fasting insulin, β-cell function by homeostasis model assessment, and 2-h post-oral glucose tolerance test glucose and insulin levels. ITGA1 has demonstrated genetic pleiotropy in prior studies, and its suggested role in liver fibrosis, insulin secretion, and bone healing lends credence to its contribution to both osteoporosis and type 2 diabetes. These findings further underscore the link between skeletal and glucose metabolism and highlight a locus to direct future investigations.
Funded by: Medical Research Council: G0900339, G1002084, MC_U106179471; NCATS NIH HHS: UL1 TR000150; NCRR NIH HHS: 1-S10-RR-163736-01A1, M01-RR-16500, UL1-RR-025005; NHGRI NIH HHS: U01-HG-004402; NHLBI NIH HHS: 5-R01-HL-08770003, 5-R01-HL-08821502, N01-HC-25195, N02-HL-6-4278, R01-HL-086694, R01-HL-087641, R01-HL-59367, U01-HL-72515; NIA NIH HHS: R01-AG-18728, R01-AR/AG-41398; NIAMS NIH HHS: R21-AR-056405; NIDDK NIH HHS: 1-L30-DK-089944-01, 5-R01-DK-06833603, 5-R01-DK-07568102, K24 DK080140, K24-DK-080140, P30-DK-072488, P60-DK-079637, R01-DK-04261, R01-DK-078616, T32-DK-007028-35; PHS HHS: HHSN268200625226C, HHSN268201100005C, HHSN268201100006C, HHSN268201100007C, HHSN268201100008C, HHSN268201100009C, HHSN268201100010C, HHSN268201100011C, HHSN268201100012C; Wellcome Trust
Rare MTNR1B variants impairing melatonin receptor 1B function contribute to type 2 diabetes.
Centre National de la Recherche Scientifique Unité Mixte de Recherche, Lille Pasteur Institute, France.
Genome-wide association studies have revealed that common noncoding variants in MTNR1B (encoding melatonin receptor 1B, also known as MT(2)) increase type 2 diabetes (T2D) risk(1,2). Although the strongest association signal was highly significant (P < 1 × 10(-20)), its contribution to T2D risk was modest (odds ratio (OR) of ∼1.10-1.15)(1-3). We performed large-scale exon resequencing in 7,632 Europeans, including 2,186 individuals with T2D, and identified 40 nonsynonymous variants, including 36 very rare variants (minor allele frequency (MAF) <0.1%), associated with T2D (OR = 3.31, 95% confidence interval (CI) = 1.78-6.18; P = 1.64 × 10(-4)). A four-tiered functional investigation of all 40 mutants revealed that 14 were non-functional and rare (MAF < 1%), and 4 were very rare with complete loss of melatonin binding and signaling capabilities. Among the very rare variants, the partial- or total-loss-of-function variants but not the neutral ones contributed to T2D (OR = 5.67, CI = 2.17-14.82; P = 4.09 × 10(-4)). Genotyping the four complete loss-of-function variants in 11,854 additional individuals revealed their association with T2D risk (8,153 individuals with T2D and 10,100 controls; OR = 3.88, CI = 1.49-10.07; P = 5.37 × 10(-3)). This study establishes a firm functional link between MTNR1B and T2D risk.
Funded by: Medical Research Council: MC_U106179471; Wellcome Trust: 077016, 077016/Z/05/Z, 090532
Nature genetics 2012;44;3;297-301
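The odds ratios, confidence intervals and P values quoted in this abstract can be checked against each other. The sketch below assumes the 95% intervals are Wald intervals, symmetric on the log-odds scale, and that the P values are two-sided normal tests; the abstract does not state this, so treat the calculation as a consistency check under those assumptions and not as the authors' method. The function name is hypothetical.

```python
import math


def p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Back-calculate SE, z and a two-sided p-value from an OR and its 95% CI.

    Assumes a Wald interval on the log-odds scale (an assumption of this
    sketch, not something stated in the abstract).
    """
    # Half the CI width on the log scale equals z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided tail probability of a standard normal.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


if __name__ == "__main__":
    # (OR, CI low, CI high) triples exactly as quoted in the abstract.
    reported = [(3.31, 1.78, 6.18), (5.67, 2.17, 14.82), (3.88, 1.49, 10.07)]
    for odds_ratio, low, high in reported:
        se, z, p = p_from_or_ci(odds_ratio, low, high)
        print(f"OR={odds_ratio}: SE={se:.3f}, z={z:.2f}, p={p:.2e}")
```

Under these assumptions the recovered p-values land close to the reported ones, which suggests the quoted intervals and P values are mutually consistent.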
Genome-wide meta-analysis of common variant differences between men and women.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
The male-to-female sex ratio at birth is constant across world populations with an average of 1.06 (106 male to 100 female live births) for populations of European descent. The sex ratio is considered to be affected by numerous biological and environmental factors and to have a heritable component. The aim of this study was to investigate the presence of common allele modest effects at autosomal and chromosome X variants that could explain the observed sex ratio at birth. We conducted a large-scale genome-wide association scan (GWAS) meta-analysis across 51 studies, comprising overall 114 863 individuals (61 094 women and 53 769 men) of European ancestry and 2 623 828 common (minor allele frequency >0.05) single-nucleotide polymorphisms (SNPs). Allele frequencies were compared between men and women for directly-typed and imputed variants within each study. Forward-time simulations for unlinked, neutral, autosomal, common loci were performed under the demographic model for European populations with a fixed sex ratio and a random mating scheme to assess the probability of detecting significant allele frequency differences. We do not detect any genome-wide significant (P < 5 × 10(-8)) common SNP differences between men and women in this well-powered meta-analysis. The simulated data provided results entirely consistent with these findings. This large-scale investigation across ~115 000 individuals shows no detectable contribution from common genetic variants to the observed skew in the sex ratio. The absence of sex-specific differences is useful in guiding genetic association study design, for example when using mixed controls for sex-biased traits.
Funded by: Canadian Institutes of Health Research: MOP-82893; Cancer Research UK; Chief Scientist Office: CSO_CZB/4/710; Intramural NIH HHS; Medical Research Council: MRC_G0401527, MRC_G1000143, MRC_G1001799, MRC_MC_PC_U127561128, MRC_MC_U106179471, MRC_MC_U127561128; NCRR NIH HHS: P20 RR018787, RR018787, UL1 RR025005, UL1RR025005; NHGRI NIH HHS: U01 HG004402, U01HG004402; NHLBI NIH HHS: HHSN268201100005C, HHSN268201100005G, HHSN268201100005I, HHSN268201100006C, HHSN268201100007C, HHSN268201100007I, HHSN268201100008C, HHSN268201100008I, HHSN268201100009C, HHSN268201100009I, HHSN268201100010C, HHSN268201100011C, HHSN268201100011I, HHSN268201100012C, HL65234, HL67466, R01 HL059367, R01 HL065234, R01 HL067466, R01 HL086694, R01 HL087641, R01HL086694, R01HL087641, R01HL59367; NIA NIH HHS: N.1-AG-1-1, N.1-AG-1-2111, N01-AG-1-2100, N01-AG-5-0002; NIAAA NIH HHS: AA10248, AA13320, AA13321, AA13326, AA14041, K05 AA017688, R01 AA007535, R01 AA013320, R01 AA013321, R01 AA013326, R01 AA014041; NIDDK NIH HHS: DK062370, P30 DK020572, R01 DK062370, R56 DK062370, U01 DK062370; NIMH NIH HHS: MH081802, MH66206, R01 MH059160, R01 MH062633, R01 MH066206, R01 MH081802, U24 MH068457, U24 MH068457-06; NIMHD NIH HHS: R01 MD009164; NLM NIH HHS: LM010098, R01 LM010098; PHS HHS: HHSN268200625226C, HHSN268201100005C, HHSN268201100006C, HHSN268201100007C, HHSN268201100008C, HHSN268201100009C, HHSN268201100010C, HHSN268201100011C, HHSN268201100012C; Wellcome Trust: 076113, 089062/Z/09/Z, 092447/Z/10/Z, 098051, 89061/Z/09/Z, WT095831
Human molecular genetics 2012;21;21;4805-15
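The core comparison in this study, allele frequencies in men versus women judged against the genome-wide threshold of P < 5 × 10(-8), can be sketched as a two-proportion z-test at a single autosomal SNP. The sample sizes below are those given in the abstract; the allele counts and the function name are hypothetical, and the sketch ignores the per-study analysis and meta-analysis across 51 studies that the paper describes.

```python
import math

MEN = 53_769      # from the abstract
WOMEN = 61_094    # from the abstract
GENOME_WIDE_P = 5e-8


def allele_freq_diff_test(alt_men, alleles_men, alt_women, alleles_women):
    """Two-sided two-proportion z-test for an allele frequency difference."""
    freq_men = alt_men / alleles_men
    freq_women = alt_women / alleles_women
    # Pooled frequency under the null hypothesis of no sex difference.
    pooled = (alt_men + alt_women) / (alleles_men + alleles_women)
    se = math.sqrt(pooled * (1 - pooled) * (1 / alleles_men + 1 / alleles_women))
    z = (freq_men - freq_women) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Two alleles per person at an autosomal locus.
    alleles_men, alleles_women = 2 * MEN, 2 * WOMEN
    # Hypothetical minor-allele counts near a frequency of 0.30 in both sexes.
    z, p = allele_freq_diff_test(32_261, alleles_men, 36_656, alleles_women)
    print(f"z={z:.3f}, p={p:.3f}, genome-wide significant: {p < GENOME_WIDE_P}")
```

Chromosome X variants would need different allele counts per sex (one copy in men, two in women), which this autosomal sketch does not handle.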
A genome-wide association meta-analysis identifies new childhood obesity loci.
Center for Applied Genomics, Abramson Research Center, The Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, USA.
Multiple genetic variants have been associated with adult obesity and a few with severe obesity in childhood; however, less progress has been made in establishing genetic influences on common early-onset obesity. We performed a North American, Australian and European collaborative meta-analysis of 14 studies consisting of 5,530 cases (≥95th percentile of body mass index (BMI)) and 8,318 controls (<50th percentile of BMI) of European ancestry. Taking forward the eight newly discovered signals yielding association with P < 5 × 10(-6) in nine independent data sets (2,818 cases and 4,083 controls), we observed two loci that yielded genome-wide significant combined P values near OLFM4 at 13q14 (rs9568856; P = 1.82 × 10(-9); odds ratio (OR) = 1.22) and within HOXB5 at 17q21 (rs9299; P = 3.54 × 10(-9); OR = 1.14). Both loci continued to show association when two extreme childhood obesity cohorts were included (2,214 cases and 2,674 controls). These two loci also yielded directionally consistent associations in a previous meta-analysis of adult BMI(1).
Funded by: British Heart Foundation: PG/09/023, PG/09/023/26806, PG/1996183/9569; Canadian Institutes of Health Research: MOP-82893; Department of Health: PHCS/C4/4/016; Medical Research Council: 74882, G0000934, G0100103, G0500539, G0600705, G0601653, G0800582, G0801056, G0900554, MC_UP_A620_1014; NHLBI NIH HHS: 1RC2HL101543, 1RC2HL101651, 5R01HL061768, 5R01HL076647, 5R01HL087679-02, 5R01HL087680; NICHD NIH HHS: R01 HD056465, R01 HD056465-01A1, R01 HD056465-02, R01 HD056465-03, R01 HD056465-04, R01 HD056465-05, R24 HD050924; NIDDK NIH HHS: R01 DK075787, U01 DK062418; NIEHS NIH HHS: 5P01ES009581, 5P01ES011627, 5P30ES007048, 5R01ES014447, 5R01ES014708, 5R01ES016535, 5R03ES014046, P30 ES007048; NIMH NIH HHS: 1RL1MH083268-01, 5R01MH63706:02; ORD VA: RD831861-01; PHS HHS: R826708-01; Wellcome Trust: 052515/2/97/2, 068545/Z/02, 076467, 077016/Z/05/Z, 083948, 086596/Z/08/Z, 090532, 092731, 098395, GR069224, WT088431MA
Nature genetics 2012;44;5;526-31
Image-based characterization of thrombus formation in time-lapse DIC microscopy.
Computer Aided Medical Procedures, Technische Universität München (TUM), Garching bei München 85748, Germany.
The characterization of thrombus formation in time-lapse DIC microscopy is of increased interest for identifying genes which account for atherothrombosis and coronary artery diseases (CADs). In particular, we are interested in large-scale studies on zebrafish, which result in large amounts of data and require automatic processing. In this work, we present an image-based solution for the automatized extraction of parameters quantifying the temporal development of thrombotic plugs. Our system is based on the joint segmentation of thrombotic and aortic regions over time. This task is made difficult by the low contrast and the high dynamic conditions observed in in vivo DIC microscopic scenes. Our key idea is to perform this segmentation by distinguishing the different motion patterns in image time series rather than by solving standard image segmentation tasks in each image frame. Thus, we are able to compensate for the poor imaging conditions. We model motion patterns by energies based on the idea of dynamic textures, and regularize the model by two prior energies on the shape of the aortic region and on the topological relationship between the thrombus and the aorta. We demonstrate the performance of our segmentation algorithm by qualitative and quantitative experiments on synthetic examples as well as on real in vivo microscopic sequences.
Medical image analysis 2012;16;4;915-31
The PRoteomics IDEntification (PRIDE) Converter 2 framework: an improved suite of tools to facilitate data submission to the PRIDE database and the ProteomeXchange consortium.
Proteomics Services Team, EMBL Outstation, European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
The original PRIDE Converter tool greatly simplified the process of submitting mass spectrometry (MS)-based proteomics data to the PRIDE database. However, after much user feedback, it was noted that the tool had some limitations and could not handle several user requirements that were now becoming commonplace. This prompted us to design and implement a whole new suite of tools that would build on the successes of the original PRIDE Converter and allow users to generate submission-ready, well-annotated PRIDE XML files. The PRIDE Converter 2 tool suite allows users to convert search result files into PRIDE XML (the format needed for performing submissions to the PRIDE database), generate mzTab skeleton files that can be used as a basis to submit quantitative and gel-based MS data, and post-process PRIDE XML files by filtering out contaminants and empty spectra, or by merging several PRIDE XML files together. All the tools have both a graphical user interface that provides a dialog-based, user-friendly way to convert and prepare files for submission, as well as a command-line interface that can be used to integrate the tools into existing or novel pipelines, for batch processing and power users. The PRIDE Converter 2 tool suite will thus become a cornerstone in the submission process to PRIDE and, by extension, to the ProteomeXchange consortium of MS-proteomics data repositories.
Funded by: Biotechnology and Biological Sciences Research Council: BB/I024204/1; Wellcome Trust: WT085949MA
Molecular & cellular proteomics : MCP 2012;11;12;1682-9
Artemis: an integrated platform for visualization and analysis of high-throughput sequence-based experimental data.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Motivation: High-throughput sequencing (HTS) technologies have made low-cost sequencing of large numbers of samples commonplace. An explosion in the type, not just number, of sequencing experiments has also taken place including genome re-sequencing, population-scale variation detection, whole transcriptome sequencing and genome-wide analysis of protein-bound nucleic acids.
Results: We present Artemis as a tool for integrated visualization and computational analysis of different types of HTS datasets in the context of a reference genome and its corresponding annotation.
Availability: Artemis is freely available (under a GPL licence) for download (for MacOSX, UNIX and Windows) at the Wellcome Trust Sanger Institute websites: http://www.sanger.ac.uk/resources/software/artemis/.
Funded by: Wellcome Trust: WT 076964
Bioinformatics (Oxford, England) 2012;28;4;464-9
Phylogeographic variation in recombination rates within a global clone of methicillin-resistant Staphylococcus aureus.
Department of Biology and Biochemistry, University of Bath, Claverton Down Bath, Bath and North East Somerset BA2 7AY, UK. E.Feil@bath.ac.uk.
BACKGROUND: Next-generation sequencing (NGS) is a powerful tool for understanding both patterns of descent over time and space (phylogeography) and the molecular processes underpinning genome divergence in pathogenic bacteria. Here, we describe a synthesis between these perspectives by employing a recently developed Bayesian approach, BRATNextGen, for detecting recombination on an expanded NGS dataset of the globally disseminated methicillin-resistant Staphylococcus aureus (MRSA) clone ST239. RESULTS: The data confirm strong geographical clustering at continental, national and city scales and demonstrate that the rate of recombination varies significantly between phylogeographic sub-groups representing independent introductions from Europe. These differences are most striking when mobile non-core genes are included, but remain apparent even when only considering the stable core genome. The monophyletic ST239 sub-group corresponding to isolates from South America shows heightened recombination, the sub-group predominantly from Asia shows an intermediate level, and a very low level of recombination is noted in a third sub-group representing a large collection from Turkey. CONCLUSIONS: We show that the rapid global dissemination of a single pathogenic bacterial clone results in local variation in measured recombination rates. Possible explanatory variables include the size and time since emergence of each defined sub-population (as determined by the sampling frame), variation in transmission dynamics due to host movement, and changes in the bacterial genome affecting the propensity for recombination.
Genome biology 2012;13;12;R126
Finding a needle in a haystack. Microbial metatranscriptomes.
This month's Genome Watch highlights some of the technical challenges that need to be overcome to gain further insight into microbial metatranscriptomes.
Nature reviews. Microbiology 2012;10;7;446
Copy number variation of the APC gene is associated with regulation of bone mineral density.
Department of Endocrinology and Diabetes, Sir Charles Gairdner Hospital, Nedlands 6009, Australia. Shelby.Chew@health.wa.gov.au
Introduction: Genetic studies of osteoporosis have commonly examined SNPs in candidate genes or whole genome analyses, but insertions and deletions of DNA, collectively called copy number variations (CNVs), also comprise a large amount of the genetic variability between individuals. Previously, SNPs in the APC gene have been strongly associated with femoral neck and lumbar spine volumetric bone mineral density in older men. In addition, familial adenomatous polyposis patients carrying heterozygous mutations in the APC gene have been shown to have significantly higher mean bone mineral density than age- and sex-matched controls suggesting the importance of this gene in regulating bone mineral density. We examined CNV within the APC gene region to test for association with bone mineral density.
Methods: DNA was extracted from venous blood, genotyped using the Human Hap610 arrays and CNV determined from the fluorescence intensity data in 2070 Caucasian men and women aged 47.0 ± 13.0 (mean ± SD) years, to assess the effects of the CNV on bone mineral density at the forearm, spine and total hip sites.
Results: Data for covariate adjusted bone mineral density from subjects grouped by APC CNV genotype showed significant difference (P=0.02-0.002). Subjects with a single copy loss of APC had a 7.95%, 13.10% and 13.36% increase in bone mineral density at the forearm, spine and total hip sites respectively, compared to subjects with two copies of the APC gene.
Conclusions: These data support previous findings of APC regulating bone mineral density and demonstrate that a novel CNV of the APC gene is significantly associated with bone mineral density in Caucasian men and women.
Funded by: Canadian Institutes of Health Research; Wellcome Trust
The 1000 Genomes Project: data management and community access.
European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
The 1000 Genomes Project was launched as one of the largest distributed data collection and analysis projects ever undertaken in biology. In addition to the primary scientific goals of creating both a deep catalog of human genetic variation and extensive methods to accurately discover and characterize variation using new sequencing technologies, the project makes all of its data publicly available. Members of the project data coordination center have developed and deployed several tools to enable widespread data access.
Funded by: Intramural NIH HHS: Z99 LM999999; Medical Research Council: G1000758; NCI NIH HHS: P01 CA101937; NHGRI NIH HHS: U54 HG003273; NIMHD NIH HHS: P20 MD006899; Wellcome Trust: 085532, 090532, 095908, WT085532
Nature methods 2012;9;5;459-62
TNiK is required for postsynaptic and nuclear signaling pathways and cognitive function.
Genes to Cognition Programme, The Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire CB10 1SA, United Kingdom.
Traf2 and NcK interacting kinase (TNiK) contains serine-threonine kinase and scaffold domains and has been implicated in cell proliferation and glutamate receptor regulation in vitro. Here we report its role in vivo using mice carrying a knock-out mutation. TNiK binds protein complexes in the synapse linking it to the NMDA receptor (NMDAR) via AKAP9. NMDAR and metabotropic receptors bidirectionally regulate TNiK phosphorylation and TNiK is required for AMPA expression and synaptic function. TNiK also organizes nuclear complexes and in the absence of TNiK, there was a marked elevation in GSK3β and phosphorylation levels of its cognate phosphorylation sites on NeuroD1 with alterations in Wnt pathway signaling. We observed impairments in dentate gyrus neurogenesis in TNiK knock-out mice and cognitive testing using the touchscreen apparatus revealed impairments in pattern separation on a test of spatial discrimination. Object-location paired associate learning, which is dependent on glutamatergic signaling, was also impaired. Additionally, TNiK knock-out mice displayed hyperlocomotor behavior that could be rapidly reversed by GSK3β inhibitors, indicating the potential for pharmacological rescue of a behavioral phenotype. These data establish TNiK as a critical regulator of cognitive functions and suggest it may play a regulatory role in diseases impacting on its interacting proteins and complexes.
Funded by: Medical Research Council: G0802238; NIMH NIH HHS: MH609197, R01 MH060919; Wellcome Trust
The Journal of neuroscience : the official journal of the Society for Neuroscience 2012;32;40;13987-99
Fibrinogen-binding and platelet-aggregation activities of a Lactobacillus salivarius septicaemia isolate are mediated by a novel fibrinogen-binding protein.
Department of Microbiology and Alimentary Pharmabiotic Centre, University College Cork, Cork, Ireland.
The marketplace for probiotic foods is burgeoning, measured in billions of euro per annum. It is imperative, however, that all bacterial strains are fully assessed for human safety. The ability to bind fibrinogen is considered a potential pathogenicity trait that can lead to platelet aggregation, serious medical complications, and in some instances, death. Here we examined strains from species frequently used as probiotics for their ability to bind human fibrinogen. Only one strain (CCUG 47825), a Lactobacillus salivarius isolate from a case of septicaemia, was found to strongly adhere to fibrinogen. Furthermore, this strain was found to aggregate human platelets at a level comparable to the human pathogen Staphylococcus aureus. By sequencing the genome of CCUG 47825, we were able to identify candidate genes responsible for fibrinogen binding. Complementing the genetic analysis with traditional molecular microbiological techniques enabled the identification of the novel fibrinogen receptor, CCUG_2371. Although only strain CCUG 47825 bound fibrinogen under laboratory conditions, homologues of the novel fibrinogen binding gene CCUG_2371 are widespread among L. salivarius strains, maintaining their potential to bind fibrinogen if expressed. We highlight the fact that without a full genetic analysis of strains for human consumption, potential pathogenicity traits may go undetected.
Funded by: Wellcome Trust: WT098051
Molecular microbiology 2012;85;5;862-77
Direct sequencing of small genomes on the Pacific Biosciences RS without library preparation.
Wellcome Trust Sanger Institute, Hinxton, UK.
We have developed a sequencing method on the Pacific Biosciences RS sequencer (the PacBio) for small DNA molecules that avoids the need for a standard library preparation. To date this approach has been applied toward sequencing single-stranded and double-stranded viral genomes, bacterial plasmids, plasmid vector models for DNA-modification analysis, and linear DNA fragments covering an entire bacterial genome. Using direct sequencing it is possible to generate sequence data from as little as 1 ng of DNA, offering a significant advantage over current protocols which typically require 400-500 ng of sheared DNA for the library preparation.
Funded by: Medical Research Council: G0801156; Wellcome Trust: 095645, 098051
LRP1B deletion in high-grade serous ovarian cancers is associated with acquired chemotherapy resistance to liposomal doxorubicin.
Cancer Genomics and Genetics Program, Peter MacCallum Cancer Centre, Melbourne, Victoria, Australia.
High-grade serous cancer (HGSC), the most common subtype of ovarian cancer, often becomes resistant to chemotherapy, leading to poor patient outcomes. Intratumoral heterogeneity occurs in nearly all solid cancers, including ovarian cancer, contributing to the development of resistance mechanisms. In this study, we examined the spatial and temporal genomic variation in HGSC using high-resolution single-nucleotide polymorphism arrays. Multiple metastatic lesions from individual patients were analyzed along with 22 paired pretreatment and posttreatment samples. We documented regions of differential DNA copy number between multiple tumor biopsies that correlated with altered expression of genes involved in cell polarity and adhesion. In the paired primary and relapse cohort, we observed a greater degree of genomic change in tumors from patients that were initially sensitive to chemotherapy and had longer progression-free interval compared with tumors from patients that were resistant to primary chemotherapy. Notably, deletion or downregulation of the lipid transporter LRP1B emerged as a significant correlate of acquired resistance in our analysis. Functional studies showed that reducing LRP1B expression was sufficient to reduce the sensitivity of HGSC cell lines to liposomal doxorubicin, but not to doxorubicin, whereas LRP1B overexpression was sufficient to increase sensitivity to liposomal doxorubicin. Together, our findings underscore the large degree of variation in DNA copy number in spatially and temporally separated tumors in HGSC patients, and they define LRP1B as a potential contributor to the emergence of chemotherapy resistance in these patients.
Funded by: Worldwide Cancer Research: 11-0748
Cancer research 2012;72;16;4060-73
High levels of RNA-editing site conservation amongst 15 laboratory mouse strains.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK.
Background: Adenosine-to-inosine (A-to-I) editing is a site-selective post-transcriptional alteration of double-stranded RNA by ADAR deaminases that is crucial for homeostasis and development. Recently the Mouse Genomes Project generated genome sequences for 17 laboratory mouse strains and rich catalogues of variants. We also generated RNA-seq data from whole brain RNA from 15 of the sequenced strains.
Results: Here we present a computational approach that takes an initial set of transcriptome/genome mismatch sites and filters these calls taking into account systematic biases in alignment, single nucleotide variant calling, and sequencing depth to identify RNA editing sites with high accuracy. We applied this approach to our panel of mouse strain transcriptomes identifying 7,389 editing sites with an estimated false-discovery rate of between 2.9 and 10.5%. The overwhelming majority of these edits were of the A-to-I type, with less than 2.4% not of this class, and only three of these edits could not be explained as alignment artifacts. We validated 24 novel RNA editing sites in coding sequence, including two non-synonymous edits in the Cacna1d gene that fell into the IQ domain portion of the Cav1.2 voltage-gated calcium channel, indicating a potential role for editing in the generation of transcript diversity.
Conclusions: We show that despite over two million years of evolutionary divergence, the sites edited and the level of editing at each site is remarkably consistent across the 15 strains. In the Cds2 gene we find evidence for RNA editing acting to preserve the ancestral transcript sequence despite genomic sequence divergence.
Funded by: Cancer Research UK; Medical Research Council: G0800024, MC_EX_G0802457, MC_U137761446; Wellcome Trust: 079912, 090532
Genome biology 2012;13;4;26
Meta-analysis of genome-wide association studies for personality.
Department of Biological Psychology, VU University Amsterdam, Amsterdam, The Netherlands.
Personality can be thought of as a set of characteristics that influence people's thoughts, feelings and behavior across a variety of settings. Variation in personality is predictive of many outcomes in life, including mental health. Here we report on a meta-analysis of genome-wide association (GWA) data for personality in 10 discovery samples (17,375 adults) and five in silico replication samples (3294 adults). All participants were of European ancestry. Personality scores for Neuroticism, Extraversion, Openness to Experience, Agreeableness and Conscientiousness were based on the NEO Five-Factor Inventory. Genotype data of ≈ 2.4M single-nucleotide polymorphisms (SNPs; directly typed and imputed using HapMap data) were available. In the discovery samples, classical association analyses were performed under an additive model followed by meta-analysis using the weighted inverse variance method. Results showed genome-wide significance for Openness to Experience near the RASA1 gene on 5q14.3 (rs1477268 and rs2032794, P=2.8 × 10(-8) and 3.1 × 10(-8)) and for Conscientiousness in the brain-expressed KATNAL2 gene on 18q21.1 (rs2576037, P=4.9 × 10(-8)). We further conducted a gene-based test that confirmed the association of KATNAL2 to Conscientiousness. In silico replication did not, however, show significant associations of the top SNPs with Openness and Conscientiousness, although the direction of effect of the KATNAL2 SNP on Conscientiousness was consistent in all replication samples. Larger scale GWA studies and alternative approaches are required for confirmation of KATNAL2 as a novel gene affecting Conscientiousness.
Funded by: Biotechnology and Biological Sciences Research Council; Medical Research Council; NCI NIH HHS: CA89392, P01 CA089392, P01 CA089392-08; NHGRI NIH HHS: U01 HG004422, U01 HG004422-02, U01 HG004446, U01HG004438; NIA NIH HHS: N01-AG-1-2109, Z99 AG999999, ZIA AG000180-25, ZIA AG000180-26, ZIA AG000196-03, ZIA AG000196-04, ZIA AG000197-03, ZIA AG000197-04; NIAAA NIH HHS: AA07580, AA07728, AA11998, AA13320, AA13321, K05 AA017688-04, U10 AA008401, U10AA008401; NIDA NIH HHS: DA019951, DA12854, R01 DA012854-10, R01 DA013423, R01 DA013423-05, R01 DA019963-01A2, R01 DA019963-02, R01 DA019963-03; NIMH NIH HHS: MH081802, R01 MH059160; Wellcome Trust: 089062/Z/09/Z, 89061/Z/09/Z
Molecular psychiatry 2012;17;3;337-49
The Clostridium difficile spo0A gene is a persistence and transmission factor.
Microbial Pathogenesis Laboratory, Wellcome Trust Sanger Institute, Hinxton, United Kingdom.
Clostridium difficile is a major cause of chronic antibiotic-associated diarrhea and a significant health care-associated pathogen that forms highly resistant and infectious spores. Spo0A is a highly conserved transcriptional regulator that plays a key role in initiating sporulation in Bacillus and Clostridium species. Here, we use a murine model to study the role of the C. difficile spo0A gene during infection and transmission. We demonstrate that C. difficile spo0A mutant derivatives can cause intestinal disease but are unable to persist within and effectively transmit between mice. Thus, the C. difficile Spo0A protein plays a key role in persistent infection, including recurrence and host-to-host transmission in mice.
Funded by: Medical Research Council: G0800170, G0901743; Wellcome Trust: 098051, WT086418MA
Infection and immunity 2012;80;8;2704-11
The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression.
Bioinformatics and Genomics, Centre for Genomic Regulation and UPF, 08003 Barcelona, Catalonia, Spain.
The human genome contains many thousands of long noncoding RNAs (lncRNAs). While several studies have demonstrated compelling biological and disease roles for individual examples, analytical and experimental approaches to investigate these genes have been hampered by the lack of comprehensive lncRNA annotation. Here, we present and analyze the most complete human lncRNA annotation to date, produced by the GENCODE consortium within the framework of the ENCODE project and comprising 9277 manually annotated genes producing 14,880 transcripts. Our analyses indicate that lncRNAs are generated through pathways similar to that of protein-coding genes, with similar histone-modification profiles, splicing signals, and exon/intron lengths. In contrast to protein-coding genes, however, lncRNAs display a striking bias toward two-exon transcripts, they are predominantly localized in the chromatin and nucleus, and a fraction appear to be preferentially processed into small RNAs. They are under stronger selective pressure than neutrally evolving sequences, particularly in their promoter regions, which display levels of selection comparable to protein-coding genes. Importantly, about one-third seem to have arisen within the primate lineage. Comprehensive analysis of their expression in multiple human organs and brain regions shows that lncRNAs are generally lower expressed than protein-coding genes, and display more tissue-specific expression patterns, with a large fraction of tissue-specific lncRNAs expressed in the brain. Expression correlation analysis indicates that lncRNAs show particularly striking positive correlation with the expression of antisense coding genes. This GENCODE annotation represents a valuable resource for future studies of lncRNAs.
Funded by: NHGRI NIH HHS: 1U54HG004555-01, 1U54HG004557-01, K99 HG006698, U54 HG004555, U54 HG004557
Genome research 2012;22;9;1775-89
Landscape of transcription in human cells.
Centre for Genomic Regulation and UPF, Doctor Aiguader 88, Barcelona 08003, Catalonia, Spain.
Eukaryotic cells make many types of primary and processed RNAs that are found either in specific subcellular compartments or throughout the cells. A complete catalogue of these RNAs is not yet available and their characteristic subcellular localizations are also poorly understood. Because RNA represents the direct output of the genetic information encoded by genomes and a significant proportion of a cell's regulatory capabilities are focused on its synthesis, processing, transport, modification and translation, the generation of such a catalogue is crucial for understanding genome function. Here we report evidence that three-quarters of the human genome is capable of being transcribed, as well as observations about the range and levels of expression, localization, processing fates, regulatory regions and modifications of almost all currently annotated and thousands of previously unannotated RNAs. These observations, taken together, prompt a redefinition of the concept of a gene.
Funded by: European Research Council: 249968; NCI NIH HHS: P30 CA045508; NHGRI NIH HHS: 1RC2HG005591, R01 HG003700, R01HG003700, RC2 HG005591, U01 HG003147, U54 HG004555, U54 HG004557, U54 HG004558, U54 HG004576, U54 HG007004, U54HG004555, U54HG004557, U54HG004558, U54HG004576; NIGMS NIH HHS: R37 GM062534; Wellcome Trust: 062023
Human SH2B1 mutations are associated with maladaptive behaviors and obesity.
Department of Molecular and Integrative Physiology, University of Michigan Medical School, Ann Arbor, Michigan 48109-5622, USA.
Src homology 2 B adapter protein 1 (SH2B1) modulates signaling by a variety of ligands that bind to receptor tyrosine kinases or JAK-associated cytokine receptors, including leptin, insulin, growth hormone (GH), and nerve growth factor (NGF). Targeted deletion of Sh2b1 in mice results in increased food intake, obesity, and insulin resistance, with an intermediate phenotype seen in heterozygous null mice on a high-fat diet. We identified SH2B1 loss-of-function mutations in a large cohort of patients with severe early-onset obesity. Mutation carriers exhibited hyperphagia, childhood-onset obesity, disproportionate insulin resistance, and reduced final height as adults. Unexpectedly, mutation carriers exhibited a spectrum of behavioral abnormalities that were not reported in controls, including social isolation and aggression. We conclude that SH2B1 plays a critical role in the control of human food intake and body weight and is implicated in maladaptive human behavior.
Funded by: Medical Research Council: G0900554, G9824984; NCI NIH HHS: P30-CA46592; NIDDK NIH HHS: P60 DK020572, P60-DK20572, R01 DK054222, R01 DK065122, R01 DK073601, R01-DK065122, R01-DK073601, R01-DK54222; NIGMS NIH HHS: T32 GM008322, T32-GM008322; Wellcome Trust: 077016/Z/05/Z, 082390/Z/07/Z, 098497
The Journal of clinical investigation 2012;122;12;4732-6
Variation in human genes encoding adhesion and proinflammatory molecules are associated with severe malaria in the Vietnamese.
Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam.
The genetic basis for susceptibility to malaria has been studied widely in African populations but less is known of the contribution of specific genetic variants in Asian populations. We genotyped 67 single-nucleotide polymorphisms (SNPs) in 1030 severe malaria cases and 2840 controls from Vietnam. After data quality control, genotyping data of 956 cases and 2350 controls were analysed for 65 SNPs (3 gender confirmation, 62 positioned in/near 42 malarial candidate genes). A total of 14 SNPs were monomorphic and 2 (rs8078340 and rs33950507) were not in Hardy-Weinberg equilibrium in controls (P<0.01). In all, 7/46 SNPs in 6 genes (ICAM1, IL1A, IL17RC, IL13, LTA and TNF) were associated with severe malaria, with 3/7 SNPs in the TNF/LTA region. Genotype-phenotype correlations between SNPs and clinical parameters revealed that genotypes of rs708567 (IL17RC) correlate with parasitemia (P=0.028, r(2)=0.0086), with GG homozygotes having the lowest parasite burden. Additionally, rs708567 GG homozygotes had a decreased risk of severe malaria (P=0.007, OR=0.78 (95% CI; 0.65-0.93)) and death (P=0.028, OR=0.58 (95% CI; 0.37-0.93)) than those with AA and AG genotypes. In summary, variants in six genes encoding adhesion and proinflammatory molecules are associated with severe malaria in the Vietnamese. Further replicative studies in independent populations will be necessary to confirm these findings.
Funded by: Medical Research Council: G0600230, G0600718, G19/9; Wellcome Trust: 075491, 075491/Z/04, 077012, 077012/Z/05/Z, 077383, 089275, 089276, 089276/Z/09/Z, 090532, 090532/Z/09/Z, 098051, WT077383/Z/05/Z
Genes and immunity 2012;13;6;503-8
Genomics: ENCODE explained.
Howard Hughes Medical Institute and the Salk Institute for Biological Studies, La Jolla, California 92037, USA.
An integrated encyclopedia of DNA elements in the human genome.
The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.
Funded by: Intramural NIH HHS; NCI NIH HHS: P30 CA016086, P30 CA045508; NHGRI NIH HHS: K99 HG006698, R01 HG003143, R01 HG003541, R01 HG003700, R01 HG003988, R01 HG004456, R01 HG005085, R01HG003143, R01HG003541, R01HG003700, R01HG003988, R01HG004456-03, RC2 HG005573, RC2 HG005591, RC2 HG005679, RC2HG005591, RC2HG005679, U01 HG004561, U01 HG004571, U01 HG004695, U01HG004561, U01HG004571, U01HG004695, U41 HG004568, U41HG004568, U54 HG004555, U54 HG004557, U54 HG004558, U54 HG004563, U54 HG004570, U54 HG004576, U54 HG004592, U54HG004555, U54HG004557, U54HG004558, U54HG004563, U54HG004570, U54HG004576, U54HG004592, ZIA HG200323, ZIA HG200341; NIDDK NIH HHS: R01 DK054369, R01 DK065806, R37 DK044746; NIGMS NIH HHS: T32 GM007205; PHS HHS: ZIAHG200323, ZIAHG200341; Wellcome Trust: WT095908
IFITM3 restricts the morbidity and mortality associated with influenza.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK.
The 2009 H1N1 influenza pandemic showed the speed with which a novel respiratory virus can spread and the ability of a generally mild infection to induce severe morbidity and mortality in a subset of the population. Recent in vitro studies show that the interferon-inducible transmembrane (IFITM) protein family members potently restrict the replication of multiple pathogenic viruses. Both the magnitude and breadth of the IFITM proteins' in vitro effects suggest that they are critical for intrinsic resistance to such viruses, including influenza viruses. Using a knockout mouse model, we now test this hypothesis directly and find that IFITM3 is essential for defending the host against influenza A virus in vivo. Mice lacking Ifitm3 display fulminant viral pneumonia when challenged with a normally low-pathogenicity influenza virus, mirroring the destruction inflicted by the highly pathogenic 1918 'Spanish' influenza. Similar increased viral replication is seen in vitro, with protection rescued by the re-introduction of Ifitm3. To test the role of IFITM3 in human influenza virus infection, we assessed the IFITM3 alleles of individuals hospitalized with seasonal or pandemic influenza H1N1/09 viruses. We find that a statistically significant number of hospitalized subjects show enrichment for a minor IFITM3 allele (SNP rs12252-C) that alters a splice acceptor site, and functional assays show the minor CC genotype IFITM3 has reduced influenza virus restriction in vitro. Together these data reveal that the action of a single intrinsic immune effector, IFITM3, profoundly alters the course of influenza virus infection in mouse and humans.
Funded by: Cancer Research UK: CRUK_13031; Chief Scientist Office; Department of Health: DH_DHCS/04/G121/68; Medical Research Council: MRC_G0600371, MRC_G0600511, MRC_G0800767, MRC_G0800777, MRC_G0802752, MRC_G0901697, MRC_G1000758, MRC_MC_G1001212, MRC_MC_U122785833; NIAID NIH HHS: R01 AI091786, R01AI091786; NIDDK NIH HHS: P30 DK043351; Wellcome Trust: 090382/Z/09/Z, 090385/Z/09/Z, WT090382, WT098051
High-density genetic mapping identifies new susceptibility loci for rheumatoid arthritis.
Arthritis Research UK Epidemiology Unit, Centre for Musculoskeletal Research, University of Manchester, Manchester Academic Health Science Centre, Manchester, UK.
Using the Immunochip custom SNP array, which was designed for dense genotyping of 186 loci identified through genome-wide association studies (GWAS), we analyzed 11,475 individuals with rheumatoid arthritis (cases) of European ancestry and 15,870 controls for 129,464 markers. We combined these data in a meta-analysis with GWAS data from additional independent cases (n = 2,363) and controls (n = 17,872). We identified 14 new susceptibility loci, 9 of which were associated with rheumatoid arthritis overall and five of which were specifically associated with disease that was positive for anticitrullinated peptide antibodies, bringing the number of confirmed rheumatoid arthritis risk loci in individuals of European ancestry to 46. We refined the peak of association to a single gene for 19 loci, identified secondary independent effects at 6 loci and identified association to low-frequency variants at 4 loci. Bioinformatic analyses generated strong hypotheses for the causal SNP at seven loci. This study illustrates the advantages of dense SNP mapping analysis to inform subsequent functional investigations.
Funded by: Arthritis Research UK: 17552, 18475; Chief Scientist Office: CZB/4/540, ETM/137, ETM/75; Medical Research Council: G0000934, G0600329, G0800759, G1001518, MC_U147585819, MC_UP_A620_1014; NIAID NIH HHS: R01 AI068759; NIAMS NIH HHS: 1R01AR062886-01, K08AR055688, N01-AR-2-2263, N01-AR1-2256, R01 AR056768, R01 AR062886, R01-AR-4-4422, R01-AR056768, R01-AR057108, R01-AR059648, RC2AR059092-01; NIGMS NIH HHS: U01-GM092691; Wellcome Trust: 068545/Z/02, 076113/C/04/Z, 090532, 091157
Nature genetics 2012;44;12;1336-40
Comparative proteomics reveals a significant bias toward alternative protein isoforms with conserved structure and function.
Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre, Madrid, Spain.
Advances in high-throughput mass spectrometry are making proteomics an increasingly important tool in genome annotation projects. Peptides detected in mass spectrometry experiments can be used to validate gene models and verify the translation of putative coding sequences (CDSs). Here, we have identified peptides that cover 35% of the genes annotated by the GENCODE consortium for the human genome as part of a comprehensive analysis of experimental spectra from two large publicly available mass spectrometry databases. We detected the translation to protein of "novel" and "putative" protein-coding transcripts as well as transcripts annotated as pseudogenes and nonsense-mediated decay targets. We provide a detailed overview of the population of alternatively spliced protein isoforms that are detectable by peptide identification methods. We found that 150 genes expressed multiple alternative protein isoforms. This constitutes the largest set of reliably confirmed alternatively spliced proteins yet discovered. Three groups of genes were highly overrepresented. We detected alternative isoforms for 10 of the 25 possible heterogeneous nuclear ribonucleoproteins, proteins with a key role in the splicing process. Alternative isoforms generated from interchangeable homologous exons and from short indels were also significantly enriched, both in human experiments and in parallel analyses of mouse and Drosophila proteomics experiments. Our results show that a surprisingly high proportion (almost 25%) of the detected alternative isoforms are only subtly different from their constitutive counterparts. Many of the alternative splicing events that give rise to these alternative isoforms are conserved in mouse. It was striking that very few of these conserved splicing events broke Pfam functional domains or would damage globular protein structures. This evidence of a strong bias toward subtle differences in CDS and likely conserved cellular function and structure is remarkable and strongly suggests that the translation of alternative transcripts may be subject to selective constraints.
Funded by: NHGRI NIH HHS: U54 HG0004555, U54 HG004555
Molecular biology and evolution 2012;29;9;2265-83
Progressive cross-reactivity in IgE responses: an explanation for the slow development of human immunity to schistosomiasis?
Department of Pathology, University of Cambridge, Cambridge, United Kingdom.
People in regions of Schistosoma mansoni endemicity slowly acquire immunity, but why this takes years to develop is still not clear. It has been associated with increases in parasite-specific IgE, induced, some investigators propose, to antigens exposed during the death of adult worms. These antigens include members of the tegumental-allergen-like protein family (TAL1 to TAL13). Previously, in a group of S. mansoni-infected Ugandan males, we showed that IgE responses to three TALs expressed in worms (TAL1, -3, and -5) became more prevalent with age. Now, in a subcohort we examined associations of these responses with resistance to reinfection and use the data to propose a mechanism for the slow development of immunity. IgE was measured 9 weeks posttreatment and at reinfection at 2 years (n = 144). An anti-TAL5 IgE (herein referred to as TAL5 IgE) response was associated with reduced reinfection even after adjusting for age using regression analysis (geometric mean odds ratio, 0.24; P = 0.016). TAL5 IgE responders were a subset of TAL3 IgE responders, themselves a subset of TAL1 responders. TAL3 IgE and TAL5 IgE were highly cross-reactive, with TAL3 the immunizing antigen and TAL5 the cross-reactive antigen. Transcriptional and translational studies show that TAL3 is most abundant in adult worms and that TAL5 is most abundant in infectious larvae. We propose that in chronic schistosomiasis, older individuals have repeatedly experienced IgE antigens exposed when adult worms die (e.g., TAL3) and that this leads to increasing cross-reactivity with antigens of invading larvae (e.g., TAL5). Progressive accumulation of worm/larvae cross-reactivity could explain the age-dependent immunity observed in areas of endemicity.
Funded by: Wellcome Trust: 083931/∼/07/Z
Infection and immunity 2012;80;12;4264-70
The importance of identifying alternative splicing in vertebrate genome annotation.
Human and Vertebrate Analysis and Annotation Team, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
While alternative splicing (AS) can potentially expand the functional repertoire of vertebrate genomes, relatively few AS transcripts have been experimentally characterized. We describe our detailed manual annotation of vertebrate genomes, which is generating a publicly available geneset rich in AS. In order to achieve this we have adopted a highly sensitive approach to annotating gene models supported by correctly mapped, canonically spliced transcriptional evidence combined with a highly cautious approach to adding unsupported extensions to models and making decisions on their functional potential. We use information about the predicted functional potential and structural properties of every AS transcript annotated at a protein-coding or non-coding locus to place them into one of eleven subclasses. We describe the incorporation of new sequencing and proteomics technologies into our annotation pipelines, which are used to identify and validate AS. Combining all data sources has led to the production of a rich geneset containing an average of 6.3 AS transcripts for every human multi-exon protein-coding gene. The datasets produced have proved very useful in providing context to studies investigating the functional potential of genes and the effect that variation may have on gene structure and function. DATABASE URL: http://www.ensembl.org/index.html, http://vega.sanger.ac.uk/index.html.
Funded by: NHGRI NIH HHS: 5U54HG004555-04S1; Wellcome Trust: WT077198
Database : the journal of biological databases and curation 2012;2012;bas014
Genome-wide association analysis identifies susceptibility loci for migraine without aura.
Institute for Stroke and Dementia Research, Klinikum der Universität München, Munich, Germany.
Migraine without aura is the most common form of migraine, characterized by recurrent disabling headache and associated autonomic symptoms. To identify common genetic variants associated with this migraine type, we analyzed genome-wide association data of 2,326 clinic-based German and Dutch individuals with migraine without aura and 4,580 population-matched controls. We selected SNPs from 12 loci with 2 or more SNPs associated with P values of <1 × 10(-5) for replication testing in 2,508 individuals with migraine without aura and 2,652 controls. SNPs at two of these loci showed convincing replication: at 1q22 (in MEF2D; replication P = 4.9 × 10(-4); combined P = 7.06 × 10(-11)) and at 3p24 (near TGFBR2; replication P = 1.0 × 10(-4); combined P = 1.17 × 10(-9)). In addition, SNPs at the PHACTR1 and ASTN2 loci showed suggestive evidence of replication (P = 0.01; combined P = 3.20 × 10(-8) and P = 0.02; combined P = 3.86 × 10(-8), respectively). We also replicated associations at two previously reported migraine loci in or near TRPM8 and LRP1. This study identifies the first susceptibility loci for migraine without aura, thereby expanding our knowledge of this debilitating neurological disorder.
Funded by: Wellcome Trust: 098051
Nature genetics 2012;44;7;777-82
A familial case with interstitial 2q36 deletion: variable phenotypic expression in full and mosaic state.
Department of Medical Genetics, Faculty of Medical Sciences, University of Campinas, Rua Tessália Vieira de Camargo, 126 CEP 13083-887 Campinas, São Paulo, Brazil.
Submicroscopic chromosomal anomalies play an important role in the etiology of craniofacial malformations, including midline facial defects with hypertelorism (MFDH). MFDH is a common feature combination in several conditions, of which Frontonasal Dysplasia is the most frequently encountered manifestation; in most cases the etiology remains unknown. We identified a parent-to-child transmission of a 6.2 Mb interstitial deletion of chromosome region 2q36.1q36.3 by array-CGH and confirmed it by FISH and microsatellite analysis. The patient and her mother both presented an MFDH phenotype, although the phenotype in the mother was much milder than in her daughter. Inspection of haplotype segregation of the 2q36.1 region within the family suggests that the deletion arose on a chromosome derived from the maternal grandfather. Evidence from FISH, microsatellite and array-CGH analysis points to high-frequency mosaicism for the 2q36 deletion in the blood of the mother. The frequency of mosaicism in other tissues could not be determined. We here suggest that the milder phenotype observed in the proband's mother can be explained by the mosaic state of the deletion. This most likely arose by an early embryonic deletion in the maternal embryo resulting in both gonadal and somatic mosaicism of two cell lines, with and without the deleted chromosome. The occurrence of gonadal mosaicism increases the recurrence risk significantly and is often either underestimated or not even taken into account in genetic counseling where new mutation is suspected.
European journal of medical genetics 2012;55;11;660-5
Controls of nucleosome positioning in the human genome.
Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America.
Nucleosomes are important for gene regulation because their arrangement on the genome can control which proteins bind to DNA. Currently, few human nucleosomes are thought to be consistently positioned across cells; however, this has been difficult to assess due to the limited resolution of existing data. We performed paired-end sequencing of micrococcal nuclease-digested chromatin (MNase-seq) from seven lymphoblastoid cell lines and mapped over 3.6 billion MNase-seq fragments to the human genome to create the highest-resolution map of nucleosome occupancy to date in a human cell type. In contrast to previous results, we find that most nucleosomes have more consistent positioning than expected by chance and a substantial fraction (8.7%) of nucleosomes have moderate to strong positioning. In aggregate, nucleosome sequences have 10 bp periodic patterns in dinucleotide frequency and DNase I sensitivity; and, across cells, nucleosomes frequently have translational offsets that are multiples of 10 bp. We estimate that almost half of the genome contains regularly spaced arrays of nucleosomes, which are enriched in active chromatin domains. Single nucleotide polymorphisms that reduce DNase I sensitivity can disrupt the phasing of nucleosome arrays, which indicates that they often result from positioning against a barrier formed by other proteins. However, nucleosome arrays can also be created by DNA sequence alone. The most striking example is an array of over 400 nucleosomes on chromosome 12 that is created by tandem repetition of sequences with strong positioning properties. In summary, a large fraction of nucleosomes are consistently positioned--in some regions because they adopt favored sequence positions, and in other regions because they are forced into specific arrangements by chromatin remodeling or DNA binding proteins.
Funded by: Howard Hughes Medical Institute; NHGRI NIH HHS: HG006123; NIGMS NIH HHS: T32 GM007197; NIMH NIH HHS: MH090951
PLoS genetics 2012;8;11;e1003036
Dissecting the regulatory architecture of gene expression QTLs.
Department of Human Genetics, University of Chicago, 920 E58th Street, Chicago, IL 60637, USA.
Background: Expression quantitative trait loci (eQTLs) are likely to play an important role in the genetics of complex traits; however, their functional basis remains poorly understood. Using the HapMap lymphoblastoid cell lines, we combine 1000 Genomes genotypes and an extensive catalogue of human functional elements to investigate the biological mechanisms that eQTLs perturb.
Results: We use a Bayesian hierarchical model to estimate the enrichment of eQTLs in a wide variety of regulatory annotations. We find that approximately 40% of eQTLs occur in open chromatin, and that they are particularly enriched in transcription factor binding sites, suggesting that many directly impact protein-DNA interactions. Analysis of core promoter regions shows that eQTLs also frequently disrupt some known core promoter motifs but, surprisingly, are not enriched in other well-known motifs such as the TATA box. We also show that information from regulatory annotations alone, when weighted by the hierarchical model, can provide a meaningful ranking of the SNPs that are most likely to drive gene expression variation.
Conclusions: Our study demonstrates how regulatory annotation and the association signal derived from eQTL-mapping can be combined into a single framework. We used this approach to further our understanding of the biology that drives human gene expression variation, and of the putatively causal SNPs that underlie it.
Funded by: Howard Hughes Medical Institute; NHGRI NIH HHS: R01 HG006123; NHLBI NIH HHS: R01 HL092206; NIGMS NIH HHS: GM077959, T32 GM007197; NIMH NIH HHS: MH084703, MH090951
Genome biology 2012;13;1;R7
Universal amplification, next-generation sequencing, and assembly of HIV-1 genomes.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.
Whole HIV-1 genome sequences are pivotal for large-scale studies of inter- and intrahost evolution, including the acquisition of drug resistance mutations. The ability to rapidly and cost-effectively generate large numbers of HIV-1 genome sequences from different populations and geographical locations and determine the effect of minority genetic variants is, however, a limiting factor. Next-generation sequencing promises to bridge this gap but is hindered by the lack of methods for the enrichment of virus genomes across the phylogenetic breadth of HIV-1 and methods for the robust assembly of the virus genomes from short-read data. Here we report a method for the amplification, next-generation sequencing, and unbiased de novo assembly of HIV-1 genomes of groups M, N, and O, as well as recombinants, that does not require prior knowledge of the sequence or subtype. A sensitivity of at least 3,000 copies/ml was determined by using plasma virus samples of known copy numbers. We applied our novel method to compare the genome diversities of HIV-1 groups, subtypes, and genes. The highest level of diversity was found in the env, nef, vpr, tat, and rev genes and parts of the gag gene. Furthermore, we used our method to investigate mutations associated with HIV-1 drug resistance in clinical samples at the level of the complete genome. Drug resistance mutations were detected as both major variant and minor species. In conclusion, we demonstrate the feasibility of our method for large-scale HIV-1 genome sequencing. This will enable the phylogenetic and phylodynamic resolution of the ongoing pandemic and efficient monitoring of complex HIV-1 drug resistance genotypes.
Funded by: Wellcome Trust: S0753
Journal of clinical microbiology 2012;50;12;3838-44
Systematic identification of genomic markers of drug sensitivity in cancer cells.
Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.
Clinical responses to anticancer therapies are often restricted to a subset of patients. In some cases, mutated cancer genes are potent biomarkers for responses to targeted agents. Here, to uncover new biomarkers of sensitivity and resistance to cancer therapeutics, we screened a panel of several hundred cancer cell lines--which represent much of the tissue-type and genetic diversity of human cancers--with 130 drugs under clinical and preclinical investigation. In aggregate, we found that mutated cancer genes were associated with cellular response to most currently available cancer drugs. Classic oncogene addiction paradigms were modified by additional tissue-specific or expression biomarkers, and some frequently mutated genes were associated with sensitivity to a broad range of therapeutic agents. Unexpected relationships were revealed, including the marked sensitivity of Ewing's sarcoma cells harbouring the EWS (also known as EWSR1)-FLI1 gene translocation to poly(ADP-ribose) polymerase (PARP) inhibitors. By linking drug activity to the functional complexity of cancer genomes, systematic pharmacogenomic profiling in cancer cell lines provides a powerful biomarker discovery platform to guide rational cancer therapeutic strategies.
Funded by: Howard Hughes Medical Institute; NHGRI NIH HHS: 1U54HG006097-01, U54 HG006097; NIGMS NIH HHS: P41 GM079575, P41GM079575-02; Wellcome Trust: 086357
Intratumor heterogeneity and branched evolution revealed by multiregion sequencing.
Cancer Research UK London Research Institute (M. Gerlinger, A.J.R., S.H., D.E., E.G., P.M., N.M., A.S., B.P., S.B., N.Q.M., C.R.S., B.S.-D., G.C., G.S., J.D., C.S.), Royal Marsden Hospital Department of Medicine (J.L., M.N., L.P., G.S., M. Gore), Wellcome Trust Sanger Institute (P.T., I.V., A.B., D.J., K.R., C.L., P.A.F.), Barts Cancer Institute at the Barts and the London School of Medicine and Dentistry (M. Gerlinger), and the University College London Cancer Institute (C.S.) - all in London; the Technical University of Denmark, Lyngby (A.C.E., Z.S.); and Harvard Medical School, Boston (Z.S.). Address reprint requests to Dr. Swanton at the Cancer Research UK London Research Institute, Translational Cancer Therapeutics Laboratory, 44 Lincoln's Inn Fields, London WC2A 3LY, United Kingdom, or at firstname.lastname@example.org.
Background: Intratumor heterogeneity may foster tumor evolution and adaptation and hinder personalized-medicine strategies that depend on results from single tumor-biopsy samples.
Methods: To examine intratumor heterogeneity, we performed exome sequencing, chromosome aberration analysis, and ploidy profiling on multiple spatially separated samples obtained from primary renal carcinomas and associated metastatic sites. We characterized the consequences of intratumor heterogeneity using immunohistochemical analysis, mutation functional analysis, and profiling of messenger RNA expression.
Results: Phylogenetic reconstruction revealed branched evolutionary tumor growth, with 63 to 69% of all somatic mutations not detectable across every tumor region. Intratumor heterogeneity was observed for a mutation within an autoinhibitory domain of the mammalian target of rapamycin (mTOR) kinase, correlating with S6 and 4EBP phosphorylation in vivo and constitutive activation of mTOR kinase activity in vitro. Mutational intratumor heterogeneity was seen for multiple tumor-suppressor genes converging on loss of function; SETD2, PTEN, and KDM5C underwent multiple distinct and spatially separated inactivating mutations within a single tumor, suggesting convergent phenotypic evolution. Gene-expression signatures of good and poor prognosis were detected in different regions of the same tumor. Allelic composition and ploidy profiling analysis revealed extensive intratumor heterogeneity, with 26 of 30 tumor samples from four tumors harboring divergent allelic-imbalance profiles and with ploidy heterogeneity in two of four tumors.
Conclusions: Intratumor heterogeneity can lead to underestimation of the tumor genomics landscape portrayed from single tumor-biopsy samples and may present major challenges to personalized-medicine and biomarker development. Intratumor heterogeneity, associated with heterogeneous protein function, may foster tumor adaptation and therapeutic failure through Darwinian selection. (Funded by the Medical Research Council and others.)
Funded by: Cancer Research UK; Medical Research Council: MRC_G0701935, MRC_G0902275; Wellcome Trust: WT079643
The New England journal of medicine 2012;366;10;883-892
JAK2V617F homozygosity arises commonly and recurrently in PV and ET, but PV is characterized by expansion of a dominant homozygous subclone.
Cambridge Institute for Medical Research and Department of Haematology, University of Cambridge, Cambridge, United Kingdom.
Subclones homozygous for JAK2V617F are more common in polycythemia vera (PV) than essential thrombocythemia (ET), but their prevalence and significance remain unclear. The JAK2 mutation status of 6495 BFU-E, grown in low erythropoietin conditions, was determined in 77 patients with PV or ET. Homozygous-mutant colonies were common in patients with JAK2V617F-positive PV and were surprisingly prevalent in JAK2V617F-positive ET and JAK2 exon 12-mutated PV. Using microsatellite PCR to map loss-of-heterozygosity breakpoints within individual colonies, we demonstrate that recurrent acquisition of JAK2V617F homozygosity occurs frequently in both PV and ET. PV was distinguished from ET by expansion of a dominant homozygous subclone, the selective advantage of which is likely to reflect additional genetic or epigenetic lesions. Our results suggest a model in which development of a dominant JAK2V617F-homozygous subclone drives erythrocytosis in many PV patients, with alternative mechanisms operating in those with small or undetectable homozygous-mutant clones.
Funded by: Medical Research Council; Wellcome Trust
Extensive compensatory cis-trans regulation in the evolution of mouse gene expression.
European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom.
Gene expression levels are thought to diverge primarily via regulatory mutations in trans within species, and in cis between species. To test this hypothesis in mammals we used RNA-sequencing to measure gene expression divergence between C57BL/6J and CAST/EiJ mouse strains and allele-specific expression in their F1 progeny. We identified 535 genes with parent-of-origin specific expression patterns, although few of these showed full allelic silencing. This suggests that the number of imprinted genes in a typical mouse somatic tissue is relatively small. In the set of nonimprinted genes, 32% showed evidence of divergent expression between the two strains. Of these, 2% could be attributed purely to variants acting in trans, while 43% were attributable only to variants acting in cis. The genes with expression divergence driven by changes in trans showed significantly higher sequence constraint than genes where the divergence was explained by variants acting in cis. The remaining genes with divergent patterns of expression (55%) were regulated by a combination of variants acting in cis and variants acting in trans. Intriguingly, the changes in expression induced by the cis and trans variants were in opposite directions more frequently than expected by chance, implying that compensatory regulation to stabilize gene expression levels is widespread. We propose that expression levels of genes regulated by this mechanism are fine-tuned by cis variants that arise following regulatory changes in trans, suggesting that many cis variants are not the primary targets of natural selection.
Funded by: Cancer Research UK: CRUK_15603, CRUK_A15603; European Research Council: ERC_202218; Wellcome Trust
Genome research 2012;22;12;2376-84
Detection of cytoplasmic nucleophosmin expression by imaging flow cytometry.
Haemato-Oncology Diagnostics Service, Department of Haematology, Addenbrooke's Hospital, Cambridge, United Kingdom. email@example.com
Mutations within the nucleophosmin NPM1 gene occur in approximately one-third of cases of acute myeloid leukemia (AML). These mutations result in cytoplasmic accumulation of the mutant NPM protein. NPM1 mutations are currently detected by molecular methods. Using samples from 37 AML patients, we investigated whether imaging flow cytometry could be a viable alternative to this current technique. Bone marrow/peripheral blood cells were stained with anti-NPM antibody and DRAQ5 nuclear stain, and data were acquired on an ImageStream imaging flow cytometer (Amnis Corp., Seattle, USA). Using the similarity feature for data analysis, we demonstrated that this technique could successfully identify cases of AML with an NPM1 mutation based on cytoplasmic NPM protein staining (at a similarity threshold of 1.1, sensitivity was 88% and specificity 90%). Combining data of mean fluorescence intensity and % dissimilar staining in a 0-2 scoring system further improved the sensitivity (100%). Imaging flow cytometry has the potential to be included as part of a standard flow cytometry antibody panel to identify potential NPM1 mutations as part of diagnosis and minimal residual disease monitoring. Imaging flow cytometry is an exciting technology that has many possible applications in the diagnosis of hematological malignancies, including the potential to integrate modalities.
Funded by: Wellcome Trust: 095663
Cytometry. Part A : the journal of the International Society for Analytical Cytology 2012;81;10;896-900
Analyses of pig genomes provide insight into porcine demography and evolution.
Animal Breeding and Genomics Centre, Wageningen University, De Elst 1, 6708 WD, Wageningen, The Netherlands. firstname.lastname@example.org
For 10,000 years pigs and humans have shared a close and complex relationship. From domestication to modern breeding practices, humans have shaped the genomes of domestic pigs. Here we present the assembly and analysis of the genome sequence of a female domestic Duroc pig (Sus scrofa) and a comparison with the genomes of wild and domestic pigs from Europe and Asia. Wild pigs emerged in South East Asia and subsequently spread across Eurasia. Our results reveal a deep phylogenetic split between European and Asian wild boars ∼1 million years ago, and a selective sweep analysis indicates selection on genes involved in RNA processing and regulation. Genes associated with immune response and olfaction exhibit fast evolution. Pigs have the largest repertoire of functional olfactory receptor genes, reflecting the importance of smell in this scavenging animal. The pig genome sequence provides an important resource for further improvements of this important livestock species, and our identification of many putative disease-causing variants extends the potential of the pig as a biomedical model.
Funded by: Biotechnology and Biological Sciences Research Council: BB/E010520/1, BB/E010520/2, BB/E010768/1, BB/E011640/1, BB/G004013/1, BB/H005935/1, BB/I025328/1; Chief Scientist Office: ETM/32; European Research Council: 249894; Medical Research Council; NCRR NIH HHS: P20 RR017686, P20-RR017686, R13 RR020283, R13 RR020283A, R13 RR032267, R13 RR032267A; NHGRI NIH HHS: R21 HG006464; NIAID NIH HHS: T32 AI083196; NIDA NIH HHS: P30 DA018310, R21 DA027548; NLM NIH HHS: 5 P41 LM006252, 5 P41LM006252, P41 LM006252; Unspecified: 095908
Mapping cis- and trans-regulatory effects across multiple tissues in twins.
Wellcome Trust Sanger Institute, Hinxton, UK.
Sequence-based variation in gene expression is a key driver of disease risk. Common variants regulating expression in cis have been mapped in many expression quantitative trait locus (eQTL) studies, typically in single tissues from unrelated individuals. Here, we present a comprehensive analysis of gene expression across multiple tissues conducted in a large set of mono- and dizygotic twins that allows systematic dissection of genetic (cis and trans) and non-genetic effects on gene expression. Using identity-by-descent estimates, we show that at least 40% of the total heritable cis effect on expression cannot be accounted for by common cis variants, a finding that reveals the contribution of low-frequency and rare regulatory variants with respect to both transcriptional regulation and complex trait susceptibility. We show that a substantial proportion of gene expression heritability is trans to the structural gene, and we identify several replicating trans variants that act predominantly in a tissue-restricted manner and may regulate the transcription of many genes.
Funded by: Medical Research Council: G0900339, G9815508; Wellcome Trust: 081917/Z/07/Z, 085235, 090532, 092731, 095515
Nature genetics 2012;44;10;1084-9
Lipoprotein(a) and risk of coronary, cerebrovascular, and peripheral artery disease: the EPIC-Norfolk prospective population study.
Department of Public Health and Primary Care, Institute of Public Health, University of Cambridge, Cambridge, United Kingdom.
Objective: Although the association between circulating levels of lipoprotein(a) [Lp(a)] and risk of coronary artery disease (CAD) and stroke is well established, its role in risk of peripheral arterial disease (PAD) remains unclear. Here, we examine the association between Lp(a) levels and PAD in a large prospective cohort. To contextualize these findings, we also examined the association between Lp(a) levels and risk of stroke and CAD and studied the role of low-density lipoprotein as an effect modifier of Lp(a)-associated cardiovascular risk.
Methods and results: Lp(a) levels were measured in apparently healthy participants in the European Prospective Investigation of Cancer (EPIC)-Norfolk cohort. Cox regression was used to quantify the association between Lp(a) levels and risk of PAD, stroke, and CAD outcomes. During 212 981 person-years at risk, a total of 2365 CAD, 284 ischemic stroke, and 596 PAD events occurred in 18 720 participants. Lp(a) was associated with PAD and CAD outcomes but not with ischemic stroke (hazard ratio per 2.7-fold increase in Lp(a) of 1.37, 95% CI 1.25-1.50, 1.13, 95% CI 1.04-1.22 and 0.91, 95% CI 0.79-1.03, respectively). Low-density lipoprotein cholesterol levels did not modify these associations.
Conclusions: Lp(a) levels were associated with future PAD and CAD events. The association between Lp(a) and cardiovascular disease was not modified by low-density lipoprotein cholesterol levels.
Funded by: Cancer Research UK: CRUK_14136, CRUK_A14136, CRUK_A8257; Medical Research Council: MRC_G0401527, MRC_G0701863, MRC_G0801566, MRC_G1000143, MRC_MC_U106179471
Arteriosclerosis, thrombosis, and vascular biology 2012;32;12;3058-65
Afghanistan's ethnic groups share a Y-chromosomal heritage structured by historical events.
The Lebanese American University, Chouran, Beirut, Lebanon.
Afghanistan has held a strategic position throughout history. It has been inhabited since the Paleolithic and later became a crossroad for expanding civilizations and empires. Afghanistan's location, history, and diverse ethnic groups present a unique opportunity to explore how nations and ethnic groups emerged, and how major cultural evolutions and technological developments in human history have influenced modern population structures. In this study we have analyzed, for the first time, the four major ethnic groups in present-day Afghanistan: Hazara, Pashtun, Tajik, and Uzbek, using 52 binary markers and 19 short tandem repeats on the non-recombinant segment of the Y-chromosome. A total of 204 Afghan samples were investigated along with more than 8,500 samples from surrounding populations important to Afghanistan's history through migrations and conquests, including Iranians, Greeks, Indians, Middle Easterners, East Europeans, and East Asians. Our results suggest that all current Afghans largely share a heritage derived from a common unstructured ancestral population that could have emerged during the Neolithic revolution and the formation of the first farming communities. Our results also indicate that inter-Afghan differentiation started during the Bronze Age, probably driven by the formation of the first civilizations in the region. Later migrations and invasions into the region have been assimilated differentially among the ethnic groups, increasing inter-population genetic differences, and giving the Afghans a unique genetic diversity in Central Asia.
Funded by: Wellcome Trust
PloS one 2012;7;3;e34288
Whole-genome analysis of diverse Chlamydia trachomatis strains identifies phylogenetic relationships masked by current clinical typing.
Pathogen Genomics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK. email@example.com
Chlamydia trachomatis is responsible for both trachoma and sexually transmitted infections, causing substantial morbidity and economic cost globally. Despite this, our knowledge of its population and evolutionary genetics is limited. Here we present a detailed phylogeny based on whole-genome sequencing of representative strains of C. trachomatis from both trachoma and lymphogranuloma venereum (LGV) biovars from temporally and geographically diverse sources. Our analysis shows that predicting phylogenetic structure using ompA, which is traditionally used to classify Chlamydia, is misleading because extensive recombination in this region masks any true relationships present. We show that in many instances, ompA is a chimera that can be exchanged in part or as a whole both within and between biovars. We also provide evidence for exchange of, and recombination within, the cryptic plasmid, which is another key diagnostic target. We used our phylogenetic framework to show how genetic exchange has manifested itself in ocular, urogenital and LGV C. trachomatis strains, including the epidemic LGV serotype L2b.
Funded by: Department of Health: DH_937; Wellcome Trust: 080348, WT089472, WT098051
Nature genetics 2012;44;4;413-9, S1
GENCODE: the reference human genome annotation for The ENCODE Project.
Wellcome Trust Sanger Institute, Wellcome Trust Campus, Hinxton, Cambridge CB10 1SA, United Kingdom. firstname.lastname@example.org
The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternative splicing transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two exon models indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.
Funded by: NHGRI NIH HHS: 5U54HG004555, U54 HG004555; Wellcome Trust: 095908, WT098051
Genome research 2012;22;9;1760-74
Comparative genomic analyses of the Taylorellae.
Wellcome Trust Sanger Institute, Hinxton, CB10 1SA, UK.
Contagious equine metritis (CEM) is an important venereal disease of horses that is of concern to the thoroughbred industry. Taylorella equigenitalis is a causative agent of CEM but very little is known about it or its close relative Taylorella asinigenitalis. To reveal novel information about Taylorella biology, comparative genomic analyses were undertaken. Whole genome sequencing was performed for the T. equigenitalis type strain, NCTC11184. Draft genome sequences were produced for a second T. equigenitalis strain and for a strain of T. asinigenitalis. These genome sequences were analysed and compared to each other and the recently released genome sequence of T. equigenitalis MCE9. These analyses revealed that T. equigenitalis strains appear to be very similar to each other with relatively little strain-specific DNA content. A number of genes were identified that encode putative toxins and adhesins that are possibly involved in infection. Analysis of T. asinigenitalis revealed that it has a very similar gene repertoire to that of T. equigenitalis but shares surprisingly little DNA sequence identity with it. The generation of genome sequence information greatly increases knowledge of these poorly characterised bacteria and greatly facilitates study of them.
Veterinary microbiology 2012;159;1-2;195-203
Genome mapping and genomics of Caenorhabditis elegans.
Genome Mapping and Genomics in Animals 2012;4;17-30
Shigella sonnei genome sequencing and phylogenetic analysis indicate recent global dissemination from Europe.
Department of Microbiology and Immunology, University of Melbourne, Victoria, Australia. email@example.com
Shigella are human-adapted Escherichia coli that have gained the ability to invade the human gut mucosa and cause dysentery(1,2), spreading efficiently via low-dose fecal-oral transmission(3,4). Historically, S. sonnei has been predominantly responsible for dysentery in developed countries but is now emerging as a problem in the developing world, seeming to replace the more diverse Shigella flexneri in areas undergoing economic development and improvements in water quality(4-6). Classical approaches have shown that S. sonnei is genetically conserved and clonal(7). We report here whole-genome sequencing of 132 globally distributed isolates. Our phylogenetic analysis shows that the current S. sonnei population descends from a common ancestor that existed less than 500 years ago and that diversified into several distinct lineages with unique characteristics. Our analysis suggests that the majority of this diversification occurred in Europe and was followed by more recent establishment of local pathogen populations on other continents, predominantly due to the pandemic spread of a single, rapidly evolving, multidrug-resistant lineage.
Funded by: Medical Research Council: G0800173, G0800173(86345); Wellcome Trust: 0689
Nature genetics 2012;44;9;1056-9
Combining RT-PCR-seq and RNA-seq to catalog all genic elements encoded in the human genome.
Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland.
Within the ENCODE Consortium, GENCODE aimed to accurately annotate all protein-coding genes, pseudogenes, and noncoding transcribed loci in the human genome through manual curation and computational methods. Annotated transcript structures were assessed, and less well-supported loci were systematically, experimentally validated. Predicted exon-exon junctions were evaluated by RT-PCR amplification followed by highly multiplexed sequencing readout, a method we called RT-PCR-seq. Seventy-nine percent of all assessed junctions are confirmed by this evaluation procedure, demonstrating the high quality of the GENCODE gene set. RT-PCR-seq was also efficient for screening gene models predicted using the Human Body Map (HBM) RNA-seq data. We validated 73% of these predictions, thus confirming 1168 novel genes, mostly noncoding, which will further complement the GENCODE annotation. Our novel experimental validation pipeline is extremely sensitive, far more than unbiased transcriptome profiling through RNA sequencing, which is becoming the norm. For example, exon-exon junctions unique to GENCODE annotated transcripts are five times more likely to be corroborated with our targeted approach than with extensive large human transcriptome profiling. Data sets such as the HBM and ENCODE RNA-seq data fail to sample low-expressed transcripts. Our RT-PCR-seq targeted approach also has the advantage of identifying novel exons of known genes, as we discovered unannotated exons in ~11% of assessed introns. We thus estimate that at least 18% of known loci have yet-unannotated exons. Our work demonstrates that the cataloging of all of the genic elements encoded in the human genome will necessitate a coordinated effort between unbiased and targeted approaches, like RNA-seq and RT-PCR-seq.
Funded by: NHGRI NIH HHS: U54 HG004555, U54 HG004557; Wellcome Trust: 095908, WT077198/Z/05/Z
Genome research 2012;22;9;1698-710
WormBase: Annotating many nematode genomes.
European Bioinformatics Institute; Wellcome Trust Genome Campus; Hinxton, Cambridge UK.
WormBase (www.wormbase.org) has been serving the scientific community for over 11 years as the central repository for genomic and genetic information for the soil nematode Caenorhabditis elegans. The resource has evolved from its beginnings as a database housing the genomic sequence and genetic and physical maps of a single species, and now represents the breadth and diversity of nematode research, currently serving genome sequence and annotation for around 20 nematodes. In this article, we focus on WormBase's role of genome sequence annotation, describing how we annotate and integrate data from a growing collection of nematode species and strains. We also review our approaches to sequence curation, and discuss the impact on annotation quality of large functional genomics projects such as modENCODE.
Funded by: Medical Research Council: G0701197
Exploration of signals of positive selection derived from genotype-based human genome scans using re-sequencing data.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK.
We have investigated whether regions of the genome showing signs of positive selection in scans based on haplotype structure also show evidence of positive selection when sequence-based tests are applied, whether the target of selection can be localized more precisely, and whether such extra evidence can lead to increased biological insights. We used two tools: simulations under neutrality or selection, and experimental investigation of two regions identified by the HapMap2 project as putatively selected in human populations. Simulations suggested that neutral and selected regions should be readily distinguished and that it should be possible to localize the selected variant to within 40 kb at least half of the time. Re-sequencing of two ~300 kb regions (chr4:158Mb and chr10:22Mb) lacking known targets of selection in HapMap CHB individuals provided strong evidence for positive selection within each and suggested the micro-RNA gene hsa-miR-548c as the best candidate target in one region, and changes in regulation of the sperm protein gene SPAG6 in the other.
Funded by: Wellcome Trust: 077009
Human genetics 2012;131;5;665-74
Genome-wide association study for circulating levels of PAI-1 provides novel insights into its regulation.
National Heart, Lung, and Blood Institute (NHLBI) Framingham Heart Study, Framingham, MA 01702, USA.
We conducted a genome-wide association study to identify novel associations between genetic variants and circulating plasminogen activator inhibitor-1 (PAI-1) concentration, and examined functional implications of variants and genes that were discovered. A discovery meta-analysis was performed in 19 599 subjects, followed by replication analysis of genome-wide significant (P < 5 × 10^-8) single nucleotide polymorphisms (SNPs) in 10 796 independent samples. We further examined associations with type 2 diabetes and coronary artery disease, assessed the functional significance of the SNPs for gene expression in human tissues, and conducted RNA-silencing experiments for one novel association. We confirmed the association of the 4G/5G proxy SNP rs2227631 in the promoter region of SERPINE1 (7q22.1) and discovered genome-wide significant associations at 3 additional loci: chromosome 7q22.1 close to SERPINE1 (rs6976053, discovery P = 3.4 × 10^-10); chromosome 11p15.2 within ARNTL (rs6486122, discovery P = 3.0 × 10^-8); and chromosome 3p25.2 within PPARG (rs11128603, discovery P = 2.9 × 10^-8). Replication was achieved for the 7q22.1 and 11p15.2 loci. There was nominal association with type 2 diabetes and coronary artery disease at ARNTL (P < .05). Functional studies identified MUC3 as a candidate gene for the second association signal on 7q22.1. In summary, SNPs in SERPINE1 and ARNTL and an SNP associated with the expression of MUC3 were robustly associated with circulating levels of PAI-1.
Funded by: Biotechnology and Biological Sciences Research Council; British Heart Foundation; Department of Health; Intramural NIH HHS; Medical Research Council: G0700931, MC_U137686857; NCATS NIH HHS: UL1 TR000124; NCRR NIH HHS: M01 RR00052, RR-024156, RR018787, UL1-RR-025005; NHGRI NIH HHS: U01-HG-004402; NHLBI NIH HHS: 1U01 HL072518, HL080295, HL087652, HL105756, HL65234, HL67466, N01 HC-15103, N01 HC-55222, N01 HC085086, N01 HC095166, N01-HC-25195, N01-HC-35129, N01-HC-45133, N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022, N01-HC-75150, N01-HC-85079, N01-HC-85080, N01-HC-85081, N01-HC-85082, N01-HC-85083, N01-HC-85084, N01-HC-85085, N01-HC-85086, N01-HC-85239, N01-HC-95159, N01-HC-95160, N01-HC-95161, N01-HC-95162, N01-HC-95163, N01-HC-95164, N01-HC-95165, N01-HC-95166, N01-HC-95167, N01-HC-95168, N01-HC-95169, N02-HL-6-4278, P01 HL074940, R01 HL095603, R01 HL095796, R01 HL59684, R01-HL-086694, R01-HL-087641, R01-HL-59367, R01-HL59367, R01HL071051, R01HL071205, R01HL071250, R01HL071251, R01HL071252, R01HL071258, R01HL071259, U01 HL072518; NIA NIH HHS: 1R01AG032098-01A1, AG-023629, AG-027058, AG-15928, AG-20098, N01AG62101, N01AG62103, N01AG62106; NIDDK NIH HHS: DK063491, K24 DK080140; NIGMS NIH HHS: P20 GM103534, T32 GM080178; NLM NIH HHS: LM010098, R01 LM010098; PHS HHS: HHSN268200625226C, HHSN268200782096C, HHSN268201200036C; Wellcome Trust: 090532
MED12 controls the response to multiple cancer drugs through regulation of TGF-β receptor signaling.
Division of Molecular Carcinogenesis, Cancer Genomics Center and Cancer Systems Biology Center, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands.
Inhibitors of the ALK and EGF receptor tyrosine kinases provoke dramatic but short-lived responses in lung cancers harboring EML4-ALK translocations or activating mutations of EGFR, respectively. We used a large-scale RNAi screen to identify MED12, a component of the transcriptional MEDIATOR complex that is mutated in cancers, as a determinant of response to ALK and EGFR inhibitors. MED12 is in part cytoplasmic where it negatively regulates TGF-βR2 through physical interaction. MED12 suppression therefore results in activation of TGF-βR signaling, which is both necessary and sufficient for drug resistance. TGF-β signaling causes MEK/ERK activation, and consequently MED12 suppression also confers resistance to MEK and BRAF inhibitors in other cancers. MED12 loss induces an EMT-like phenotype, which is associated with resistance to chemotherapy in colon cancer patients and to gefitinib in lung cancer. Inhibition of TGF-βR signaling restores drug responsiveness in MED12(KD) cells, suggesting a strategy to treat drug-resistant tumors that have lost MED12.
Funded by: Wellcome Trust: WT093868
A method to infer positive selection from marker dynamics in an asexual population.
Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.
Motivation: The observation of positive selection acting on a mutant indicates that the corresponding mutation has some form of functional relevance. Determining the fitness effects of mutations thus has relevance to many interesting biological questions. One means of identifying beneficial mutations in an asexual population is to observe changes in the frequency of marked subsets of the population. We here describe a method to estimate the establishment times and fitnesses of beneficial mutations from neutral marker frequency data.
Results: The method accurately reproduces complex marker frequency trajectories. In simulations for which positive selection is close to 5% per generation, we obtain correlations upwards of 0.91 between correct and inferred haplotype establishment times. Where mutation selection coefficients are exponentially distributed, the inferred distribution of haplotype fitnesses is close to being correct. Applied to data from a bacterial evolution experiment, our method reproduces an observed correlation between evolvability and initial fitness defect.
Funded by: Wellcome Trust: 098051
Bioinformatics (Oxford, England) 2012;28;6;831-7
Components of selection in the evolution of the influenza virus: linkage effects beat inherent selection.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.
The influenza virus is an important human pathogen, with a rapid rate of evolution in the human population. The rate of homologous recombination within genes of influenza is essentially zero. As such, where two alleles within the same gene are in linkage disequilibrium, interference between alleles will occur, whereby selection acting upon one allele has an influence upon the frequency of the other. We here measured the relative importance of selection and interference effects upon the evolution of influenza. We considered time-resolved allele frequency data from the global evolutionary history of the haemagglutinin gene of human influenza A/H3N2, conducting an in-depth analysis of sequences collected since 1996. Using a model that accounts for selection-caused interference between alleles in linkage disequilibrium, we estimated the inherent selective benefit of individual polymorphisms in the viral population. These inherent selection coefficients were in turn used to calculate the total selective effect of interference acting upon each polymorphism, considering the effect of the initial background upon which a mutation arose, and the subsequent effect of interference from other alleles that were under selection. Viewing events in retrospect, we estimated the influence of each of these components in determining whether a mutant allele eventually fixed or died in the global viral population. Our inherent selection coefficients, when combined across different regions of the protein, were consistent with previous measurements of dN/dS for the same system. Alleles going on to fix in the global population tended to be under more positive selection, to arise on more beneficial backgrounds, and to avoid strong negative interference from other alleles under selection. However, on average, the fate of a polymorphism was determined more by the combined influence of interference effects than by its inherent selection coefficient.
Funded by: Wellcome Trust: 098051
PLoS pathogens 2012;8;12;e1003091
Quantifying selection acting on a complex trait using allele frequency time series data.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.
When selection is acting on a large genetically diverse population, beneficial alleles increase in frequency. This fact can be used to map quantitative trait loci by sequencing the pooled DNA from the population at consecutive time points and observing allele frequency changes. Here, we present a population genetic method to analyze time series data of allele frequencies from such an experiment. Beginning with a range of proposed evolutionary scenarios, the method measures the consistency of each with the observed frequency changes. Evolutionary theory is utilized to formulate equations of motion for the allele frequencies, following which likelihoods for having observed the sequencing data under each scenario are derived. Comparison of these likelihoods gives an insight into the prevailing dynamics of the system under study. We illustrate the method by quantifying selective effects from an experiment, in which two phenotypically different yeast strains were first crossed and then propagated under heat stress (Parts L, Cubillos FA, Warringer J, et al. [14 co-authors]. 2011. Revealing the genetic structure of a trait by sequencing a population under selection. Genome Res). From these data, we discover that about 6% of polymorphic sites evolve nonneutrally under heat stress conditions, either because of their linkage to beneficial (driver) alleles or because they are drivers themselves. We further identify 44 genomic regions containing one or more candidate driver alleles, quantify their apparent selective advantage, obtain estimates of recombination rates within the regions, and show that the dynamics of the drivers display a strong signature of selection going beyond additive models. Our approach is applicable to study adaptation in a range of systems under different evolutionary pressures.
Funded by: Wellcome Trust: 098051, WT077192/Z/05/Z
Molecular biology and evolution 2012;29;4;1187-97
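The abstract above describes deriving equations of motion for allele frequencies and comparing likelihoods of the sequencing data under competing scenarios. The sketch below illustrates that general idea under strong simplifying assumptions that are not taken from the paper: a single haploid locus, deterministic logistic dynamics dx/dt = s·x·(1 − x), and binomial sampling of reads. The function names and the read counts are hypothetical and invented for illustration.

```python
import math

def allele_frequency(x0, s, t):
    """Deterministic allele frequency after t generations.

    Closed-form solution of dx/dt = s * x * (1 - x) with x(0) = x0.
    This single-locus model is an assumption made for illustration.
    """
    growth = math.exp(s * t)
    return x0 * growth / (1.0 - x0 + x0 * growth)

def log_likelihood(reads, x0, s):
    """Binomial log-likelihood of pooled read counts under one scenario.

    reads: list of (generation, allele_reads, total_reads) tuples.
    """
    total = 0.0
    for t, k, n in reads:
        p = allele_frequency(x0, s, t)
        log_binom = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        total += log_binom + k * math.log(p) + (n - k) * math.log(1.0 - p)
    return total

# Hypothetical read counts, invented for this example only.
reads = [(0, 50, 100), (10, 62, 100), (20, 73, 100)]

ll_neutral = log_likelihood(reads, x0=0.5, s=0.0)    # scenario 1: no selection
ll_selected = log_likelihood(reads, x0=0.5, s=0.05)  # scenario 2: 5% advantage
print("log-likelihood ratio (selected vs neutral):", ll_selected - ll_neutral)
```

A positive log-likelihood ratio favours the selected scenario for these invented counts; a real analysis would also need to handle linkage between sites and sampling noise, which this sketch ignores.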
Antigenic diversity is generated by distinct evolutionary mechanisms in African trypanosome species.
Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, United Kingdom.
Antigenic variation enables pathogens to avoid the host immune response by continual switching of surface proteins. The protozoan blood parasite Trypanosoma brucei causes human African trypanosomiasis ("sleeping sickness") across sub-Saharan Africa and is a model system for antigenic variation, surviving by periodically replacing a monolayer of variant surface glycoproteins (VSG) that covers its cell surface. We compared the genome of Trypanosoma brucei with two closely related parasites Trypanosoma congolense and Trypanosoma vivax, to reveal how the variant antigen repertoire has evolved and how it might affect contemporary antigenic diversity. We reconstruct VSG diversification showing that Trypanosoma congolense uses variant antigens derived from multiple ancestral VSG lineages, whereas in Trypanosoma brucei VSG have recent origins, and ancestral gene lineages have been repeatedly co-opted to novel functions. These historical differences are reflected in fundamental differences between species in the scale and mechanism of recombination. Using phylogenetic incompatibility as a metric for genetic exchange, we show that the frequency of recombination is comparable between Trypanosoma congolense and Trypanosoma brucei but is much lower in Trypanosoma vivax. Furthermore, in showing that the C-terminal domain of Trypanosoma brucei VSG plays a crucial role in facilitating exchange, we reveal substantial species differences in the mechanism of VSG diversification. Our results demonstrate how past VSG evolution indirectly determines the ability of contemporary parasites to generate novel variant antigens through recombination and suggest that the current model for antigenic variation in Trypanosoma brucei is only one means by which these parasites maintain chronic infections.
Funded by: Wellcome Trust: 085349/Z/08/Z, WT 055558/Z/98/A, WT 055558/Z/98/C, WT 085775/Z/08/Z, WT085349
Proceedings of the National Academy of Sciences of the United States of America 2012;109;9;3416-21
Bcl11a is required for neuronal morphogenesis and sensory circuit formation in dorsal spinal cord development.
Institute of Molecular and Cellular Anatomy, Ulm University, 89081 Ulm, Germany.
Dorsal spinal cord neurons receive and integrate somatosensory information provided by neurons located in dorsal root ganglia. Here we demonstrate that dorsal spinal neurons require the Krüppel-C(2)H(2) zinc-finger transcription factor Bcl11a for terminal differentiation and morphogenesis. The disrupted differentiation of dorsal spinal neurons observed in Bcl11a mutant mice interferes with their correct innervation by cutaneous sensory neurons. To understand the mechanism underlying the innervation deficit, we characterized changes in gene expression in the dorsal horn of Bcl11a mutants and identified dysregulated expression of the gene encoding secreted frizzled-related protein 3 (sFRP3, or Frzb). Frzb mutant mice show a deficit in the innervation of the spinal cord, suggesting that the dysregulated expression of Frzb can account in part for the phenotype of Bcl11a mutants. Thus, our genetic analysis of Bcl11a reveals essential functions of this transcription factor in neuronal morphogenesis and sensory wiring of the dorsal spinal cord and identifies Frzb, a component of the Wnt pathway, as a downstream acting molecule involved in this process.
Funded by: Wellcome Trust: 079643
Development (Cambridge, England) 2012;139;10;1831-41
Analysis of protein palmitoylation reveals a pervasive role in Plasmodium development and pathogenesis.
Malaria Programme, The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
Asexual stage Plasmodium falciparum replicates and undergoes a tightly regulated developmental process in human erythrocytes. One mechanism involved in the regulation of this process is posttranslational modification (PTM) of parasite proteins. Palmitoylation is a PTM in which cysteine residues undergo a reversible lipid modification, which can regulate target proteins in diverse ways. Using complementary palmitoyl protein purification approaches and quantitative mass spectrometry, we examined protein palmitoylation in asexual-stage P. falciparum parasites and identified over 400 palmitoylated proteins, including those involved in cytoadherence, drug resistance, signaling, development, and invasion. Consistent with the prevalence of palmitoylated proteins, palmitoylation is essential for P. falciparum asexual development and influences erythrocyte invasion by directly regulating the stability of components of the actin-myosin invasion motor. Furthermore, P. falciparum uses palmitoylation in diverse ways, stably modifying some proteins while dynamically palmitoylating others. Palmitoylation therefore plays a central role in regulating P. falciparum blood stage development.
Funded by: Wellcome Trust: 079643/Z/06/Z, 089084
Cell host & microbe 2012;12;2;246-58
Misuse of hierarchical linear models overstates the significance of a reported association between OXTR and prosociality.
Proceedings of the National Academy of Sciences of the United States of America 2012;109;18;E1048
Host-microbe interactions have shaped the genetic architecture of inflammatory bowel disease.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1HH, UK.
Crohn's disease and ulcerative colitis, the two common forms of inflammatory bowel disease (IBD), affect over 2.5 million people of European ancestry, with rising prevalence in other populations. Genome-wide association studies and subsequent meta-analyses of these two diseases as separate phenotypes have implicated previously unsuspected mechanisms, such as autophagy, in their pathogenesis and showed that some IBD loci are shared with other inflammatory diseases. Here we expand on the knowledge of relevant pathways by undertaking a meta-analysis of Crohn's disease and ulcerative colitis genome-wide association scans, followed by extensive validation of significant findings, with a combined total of more than 75,000 cases and controls. We identify 71 new associations, for a total of 163 IBD loci, that meet genome-wide significance thresholds. Most loci contribute to both phenotypes, and both directional (consistently favouring one allele over the course of human history) and balancing (favouring the retention of both alleles within populations) selection effects are evident. Many IBD loci are also implicated in other immune-mediated disorders, most notably with ankylosing spondylitis and psoriasis. We also observe considerable overlap between susceptibility loci for IBD and mycobacterial infection. Gene co-expression network analysis emphasizes this relationship, with pathways shared between host responses to mycobacteria and those predisposing to IBD.
Funded by: Arthritis Research UK: 18475; British Heart Foundation: G0000934; Chief Scientist Office: CZB/4/540, ETM/137, ETM/75; Medical Research Council: G0600329, G0800675, G0800759, G1002033; NCATS NIH HHS: UL1 TR000005, UL1 TR000124, UL1 TR000124-01; NCI NIH HHS: CA141743, R01 CA141743; NCRR NIH HHS: M01 RR000425, M01-RR00425; NIAID NIH HHS: AI062773, R01 AI062773; NIDDK NIH HHS: DK043351, DK062413, DK062420, DK062422, DK062423, DK062429, DK062429-S1, DK062431, DK062432, DK063491, DK076984, DK084554, DK83756, K23 DK097142, P01 DK046763, P01DK046763, P30 DK043351, P30 DK063491, R01 DK055731, R01 DK083756, R03 DK076984, R21 DK084554, U01 DK062413, U01 DK062418, U01 DK062420, U01 DK062422, U01 DK062423, U01 DK062429, U01 DK062431, U01 DK062432; NIGMS NIH HHS: T32 GM007205, T32GM07205; Wellcome Trust: 068545/Z/02, 083948/Z/07/Z, 085475/B/08/Z, 085475/Z/08/Z, 089120, 090532, 098051
Interaction of insulin and PPAR-α genes in Alzheimer's disease: the Epistasis Project.
Department of Psychiatry, University of Bonn, Bonn, Germany.
Altered glucose metabolism has been described in Alzheimer's disease (AD). We re-investigated the interaction of the insulin (INS) and the peroxisome proliferator-activated receptor alpha (PPARA) genes in AD risk in the Epistasis Project, including 1,757 AD cases and 6,294 controls. Allele frequencies of both SNPs (PPARA L162V, INS intron 0 A/T) differed between Northern Europeans and Northern Spanish. The PPARA 162LL genotype increased AD risk in Northern Europeans (p = 0.04), but not in Northern Spanish (p = 0.2). There was no association of the INS intron 0 TT genotype with AD. We observed an interaction on AD risk between PPARA 162LL and INS intron 0 TT genotypes in Northern Europeans (Synergy factor 2.5, p = 0.016), but not in Northern Spanish. We suggest that dysregulation of glucose metabolism contributes to the development of AD and might be due in part to genetic variations in INS and PPARA and their interaction especially in Northern Europeans.
Journal of neural transmission (Vienna, Austria : 1996) 2012;119;4;473-9
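The abstract above reports an interaction as a synergy factor. As commonly defined for case-control data, the synergy factor is the odds ratio for carrying both risk factors divided by the product of the odds ratios for carrying each alone, all relative to carriers of neither. The sketch below computes it from a 2 × 4 table. The function names, stratum labels, and counts are hypothetical; the counts are invented and are not the Epistasis Project data.

```python
def odds_ratio(cases_exposed, controls_exposed, cases_ref, controls_ref):
    """Odds ratio of an exposed stratum relative to the reference stratum."""
    return (cases_exposed / controls_exposed) / (cases_ref / controls_ref)

def synergy_factor(counts):
    """Synergy factor SF = OR_both / (OR_a * OR_b).

    counts maps each stratum name to a (cases, controls) tuple.
    Strata: 'neither' (reference), 'a_only', 'b_only', 'both'.
    SF > 1 suggests the joint effect exceeds the product of the single effects.
    """
    ref = counts["neither"]
    or_a = odds_ratio(*counts["a_only"], *ref)
    or_b = odds_ratio(*counts["b_only"], *ref)
    or_both = odds_ratio(*counts["both"], *ref)
    return or_both / (or_a * or_b)

# Invented counts for illustration only.
example_counts = {
    "neither": (200, 400),
    "a_only": (60, 100),
    "b_only": (50, 100),
    "both": (36, 30),
}
sf = synergy_factor(example_counts)
print("synergy factor:", sf)
```

This gives only the point estimate; a confidence interval or p-value, such as the one quoted in the abstract, needs a standard error on the log scale or a logistic regression with an interaction term.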
The B10 Idd9.3 locus mediates accumulation of functionally superior CD137(+) regulatory T cells in the nonobese diabetic type 1 diabetes model.
Division of Immunology, Allergy and Rheumatology, University of Cincinnati College of Medicine, Cincinnati, OH 45267, USA.
CD137 is a T cell costimulatory molecule encoded by the prime candidate gene (designated Tnfrsf9) in NOD.B10 Idd9.3 congenic mice protected from type 1 diabetes (T1D). NOD T cells show decreased CD137-mediated T cell signaling compared with NOD.B10 Idd9.3 T cells, but it has been unclear how this decreased CD137 T cell signaling could mediate susceptibility to T1D. We and others have shown that a subset of regulatory T cells (Tregs) constitutively expresses CD137 (whereas effector T cells do not, and only express CD137 briefly after activation). In this study, we show that the B10 Idd9.3 region intrinsically contributes to accumulation of CD137(+) Tregs with age. NOD.B10 Idd9.3 mice showed significantly increased percentages and numbers of CD137(+) peripheral Tregs compared with NOD mice. Moreover, Tregs expressing the B10 Idd9.3 region preferentially accumulated in mixed bone marrow chimeric mice reconstituted with allotypically marked NOD and NOD.B10 Idd9.3 bone marrow. We demonstrate a possible significance of increased numbers of CD137(+) Tregs by showing functional superiority of FACS-purified CD137(+) Tregs in vitro compared with CD137(-) Tregs in T cell-suppression assays. Increased functional suppression was also associated with increased production of the alternatively spliced CD137 isoform, soluble CD137, which has been shown to suppress T cell proliferation. We show for the first time, to our knowledge, that CD137(+) Tregs are the primary cellular source of soluble CD137. NOD.B10 Idd9.3 mice showed significantly increased serum soluble CD137 compared with NOD mice with age, consistent with their increased numbers of CD137(+) Tregs with age. These studies demonstrate the importance of CD137(+) Tregs in T1D and offer a new hypothesis for how the NOD Idd9.3 region could act to increase T1D susceptibility.
Funded by: NIAID NIH HHS: U19 AI056374, U19AI56374; Wellcome Trust: 079895, 091157
Journal of immunology (Baltimore, Md. : 1950) 2012;189;10;5001-15
The fallacy of ratio correction to address confounding factors.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Scientists aspire to measure cause and effect. Unfortunately, confounding variables, ones that are associated with both the probable cause and the outcome, can lead to an association that is true but potentially misleading. For example, altered body weight is often observed in a gene knockout; however, many other variables, such as lean mass, will also change as the body weight changes. This leaves the researcher asking whether the change in that variable is expected for that change in weight. Ratio correction, which is often referred to as normalization, is a method used commonly to remove the effect of a confounding variable. Although ratio correction is used widely in biological research, it is not the method recommended in the statistical literature to address confounding factors; instead regression methods such as the analysis of covariance (ANCOVA) are proposed. This method examines the difference in means after adjusting for the confounding relationship. Using real data, this manuscript demonstrates how the ratio correction approach is flawed and can result in erroneous calls of significance leading to inappropriate biological conclusions. This arises as some of the underlying assumptions are not met. The manuscript goes on to demonstrate that researchers should use ANCOVA, and discusses how graphical tools can be used readily to judge the robustness of this method. This study is therefore a clear example of why assumption testing is an important component of a study and thus why it is included in the Animal Research: Reporting of In Vivo Experiments (ARRIVE) guidelines.
Funded by: Wellcome Trust: 098051, WT077157/Z/05/Z
Laboratory animals 2012;46;3;245-52
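The contrast drawn in the abstract above between ratio correction and ANCOVA can be shown with a toy calculation. The sketch below uses invented data in which lean mass depends on body weight through the same line with a non-zero intercept in both groups, so there is no true genotype effect. The ratio lean/weight still differs between groups because the knockout group is lighter, whereas the ANCOVA-style adjusted difference (common within-group slope) is zero. All names and numbers are hypothetical; this is not the manuscript's data or code, and it gives point estimates only, without significance tests.

```python
from statistics import mean

def ancova_adjusted_difference(x0, y0, x1, y1):
    """Difference in y between group 1 and group 0 after adjusting for covariate x.

    Uses the pooled within-group slope, i.e. the group coefficient of the
    least-squares model y = a + b*x + c*group.
    """
    def cross_product_sum(x, y):
        mx, my = mean(x), mean(y)
        return sum((a - mx) * (b - my) for a, b in zip(x, y))

    slope = (cross_product_sum(x0, y0) + cross_product_sum(x1, y1)) / (
        cross_product_sum(x0, x0) + cross_product_sum(x1, x1)
    )
    return (mean(y1) - mean(y0)) - slope * (mean(x1) - mean(x0))

# Invented data: lean mass = 5 + 0.5 * body weight in BOTH groups.
wt_weight = [28.0, 30.0, 32.0, 34.0]   # wild type (heavier)
ko_weight = [20.0, 22.0, 24.0, 26.0]   # knockout (lighter)
wt_lean = [5 + 0.5 * w for w in wt_weight]
ko_lean = [5 + 0.5 * w for w in ko_weight]

# Ratio correction: compare mean lean/weight between groups.
ratio_wt = mean(l / w for l, w in zip(wt_lean, wt_weight))
ratio_ko = mean(l / w for l, w in zip(ko_lean, ko_weight))
ratio_difference = ratio_ko - ratio_wt

# ANCOVA-style adjustment: compare lean mass at equal body weight.
adjusted_difference = ancova_adjusted_difference(wt_weight, wt_lean, ko_weight, ko_lean)

print("ratio difference (KO - WT):", ratio_difference)
print("ANCOVA adjusted difference (KO - WT):", adjusted_difference)
```

The ratio method implicitly assumes the relationship passes through the origin; the non-zero intercept in the invented data violates that assumption, which is one way an apparent group difference can arise.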
Genome-wide association study identifies multiple loci influencing human serum metabolite levels.
Institute for Molecular Medicine Finland, University of Helsinki, Finland.
Nuclear magnetic resonance assays allow for measurement of a wide range of metabolic phenotypes. We report here the results of a GWAS on 8,330 Finnish individuals genotyped and imputed at 7.7 million SNPs for a range of 216 serum metabolic phenotypes assessed by NMR of serum samples. We identified significant associations (P < 2.31 × 10(-10)) at 31 loci, including 11 for which there have not been previous reports of associations to a metabolic trait or disorder. Analyses of Finnish twin pairs suggested that the metabolic measures reported here show higher heritability than comparable conventional metabolic phenotypes. In accordance with our expectations, SNPs at the 31 loci associated with individual metabolites account for a greater proportion of the genetic component of trait variance (up to 40%) than is typically observed for conventional serum metabolic phenotypes. The identification of such associations may provide substantial insight into cardiometabolic disorders.
Funded by: Medical Research Council: G0500539, G0600705; NHLBI NIH HHS: 5R01HL087679, R01 HL087679; NIAAA NIH HHS: AA-08315, AA-12502, AA-15416, K02 AA018755, R01 AA009203, R01 AA012502, R01 AA015416, R37 AA012502; NIMH NIH HHS: 1RL1MH083268, RL1 MH083268; Wellcome Trust: 089062/Z/09/Z, 090532, 098051, 89061/Z/09/Z, GR069224
Nature genetics 2012;44;3;269-76
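The significance threshold quoted in the abstract above (P < 2.31 × 10(-10)) is numerically what one obtains by dividing the conventional genome-wide threshold of 5 × 10(-8) by the 216 phenotypes tested. The abstract does not state how the threshold was derived, so treating it as a Bonferroni-style correction is an assumption; the sketch below only reproduces the arithmetic, and its names are hypothetical.

```python
GENOME_WIDE_ALPHA = 5e-8   # conventional genome-wide threshold (assumed starting point)
N_PHENOTYPES = 216         # number of serum metabolic phenotypes stated in the abstract

# Bonferroni-style correction across phenotypes (assumed derivation).
threshold = GENOME_WIDE_ALPHA / N_PHENOTYPES

def is_significant(p_value):
    """True if a p-value passes the phenotype-corrected threshold."""
    return p_value < threshold

print(f"corrected threshold: {threshold:.3g}")
```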
De novo CNV analysis implicates specific abnormalities of postsynaptic signalling complexes in the pathogenesis of schizophrenia.
Department of Psychological Medicine and Neurology, MRC Centre for Neuropsychiatric Genetics and Genomics, School of Medicine, Neuroscience and Mental Health Research Institute, Cardiff University, Cardiff, UK.
A small number of rare, recurrent genomic copy number variants (CNVs) are known to substantially increase susceptibility to schizophrenia. As a consequence of the low fecundity in people with schizophrenia and other neurodevelopmental phenotypes to which these CNVs contribute, CNVs with large effects on risk are likely to be rapidly removed from the population by natural selection. Accordingly, such CNVs must frequently occur as recurrent de novo mutations. In a sample of 662 schizophrenia proband-parent trios, we found that rare de novo CNV mutations were significantly more frequent in cases (5.1% all cases, 5.5% family history negative) compared with 2.2% among 2623 controls, confirming the involvement of de novo CNVs in the pathogenesis of schizophrenia. Eight de novo CNVs occurred at four known schizophrenia loci (3q29, 15q11.2, 15q13.3 and 16p11.2). De novo CNVs of known pathogenic significance in other genomic disorders were also observed, including deletion at the TAR (thrombocytopenia absent radius) region on 1q21.1 and duplication at the WBS (Williams-Beuren syndrome) region at 7q11.23. Multiple de novos spanned genes encoding members of the DLG (discs large) family of membrane-associated guanylate kinases (MAGUKs) that are components of the postsynaptic density (PSD). Two de novos also affected EHMT1, a histone methyl transferase known to directly regulate DLG family members. Using a systems biology approach and merging novel CNV and proteomics data sets, systematic analysis of synaptic protein complexes showed that, compared with control CNVs, case de novos were significantly enriched for the PSD proteome (P=1.72 × 10⁻⁶). This was largely explained by enrichment for members of the N-methyl-D-aspartate receptor (NMDAR) (P=4.24 × 10⁻⁶) and neuronal activity-regulated cytoskeleton-associated protein (ARC) (P=3.78 × 10⁻⁸) postsynaptic signalling complexes. In an analysis of 18 492 subjects (7907 cases and 10 585 controls), case CNVs were enriched for members of the NMDAR complex (P=0.0015) but not ARC (P=0.14). Our data indicate that defects in NMDAR postsynaptic signalling and, possibly, ARC complexes, which are known to be important in synaptic plasticity and cognition, play a significant role in the pathogenesis of schizophrenia.
Funded by: Medical Research Council: G0800509; NIMH NIH HHS: MH066392-05A1, P50 MH066392
Molecular psychiatry 2012;17;2;142-53
Gene expression profiles in white blood cells of volunteers exposed to a 50 Hz electromagnetic field.
Department of Biochemistry, University of Cambridge, Sanger Building, Cambridge, CB2 1GA, United Kingdom.
Consistent and independently replicated laboratory evidence to support a causative relationship between environmental exposure to extremely low-frequency electromagnetic fields (EMFs) at power line frequencies and the associated increase in risk of childhood leukemia has not been obtained. In particular, although gene expression responses have been reported in a wide variety of cells, none has emerged as a robust, widely replicated effect. DNA microarrays facilitate comprehensive searches for changes in gene expression without a requirement to select candidate responsive genes. To determine if gene expression changes occur in white blood cells of volunteers exposed to an ELF-EMF, each of 17 pairs of male volunteers age 20-30 was subjected either to a 50 Hz EMF exposure of 62.0 ± 7.1 μT for 2 h or to a sham exposure (0.21 ± 0.05 μT) at the same time (11:00 to 13:00). The alternative regime for each volunteer was repeated on the following day and the two-day sequence was repeated 6 days later, with the exception that a null exposure (0.085 ± 0.01 μT) replaced the sham exposure. Five blood samples (10 ml) were collected at 2 h intervals from 9:00 to 17:00 with five additional samples during the exposure and sham or null exposure periods on each study day. RNA samples were pooled for the same time on each study day for the group of 17 volunteers that were subjected to the ELF-EMF exposure/sham or null exposure sequence and were analyzed on Illumina microarrays. Time courses for 16 mammalian genes previously reported to be responsive to ELF-EMF exposure, including immediate early genes, stress response, cell proliferation and apoptotic genes were examined in detail. No genes or gene sets showed consistent response profiles to repeated ELF-EMF exposures. A stress response was detected as a transient increase in plasma cortisol at the onset of either exposure or sham exposure on the first study day. The cortisol response diminished progressively on subsequent exposures or sham exposures, and was attributable to mild stress associated with the experimental protocol.
Radiation research 2012;178;3;138-49
Transposon mutagenesis identifies genes that transform neural stem cells into glioma-initiating cells.
Division of Genetics and Genomics, Institute of Molecular and Cell Biology, Agency for Science, Technology and Research, Singapore 138673.
Neural stem cells (NSCs) are considered to be the cell of origin of glioblastoma multiforme (GBM). However, the genetic alterations that transform NSCs into glioma-initiating cells remain elusive. Using a unique transposon mutagenesis strategy that mutagenizes NSCs in culture, followed by additional rounds of mutagenesis to generate tumors in vivo, we have identified genes and signaling pathways that can transform NSCs into glioma-initiating cells. Mobilization of Sleeping Beauty transposons in NSCs induced the immortalization of astroglial-like cells, which were then able to generate tumors with characteristics of the mesenchymal subtype of GBM on transplantation, consistent with a potential astroglial origin for mesenchymal GBM. Sequence analysis of transposon insertion sites from tumors and immortalized cells identified more than 200 frequently mutated genes, including human GBM-associated genes, such as Met and Nf1, and made it possible to discriminate between genes that function during astroglial immortalization vs. later stages of tumor development. We also functionally validated five GBM candidate genes using a previously undescribed high-throughput method. Finally, we show that even clonally related tumors derived from the same immortalized line have acquired distinct combinations of genetic alterations during tumor development, suggesting that tumor formation in this model system involves competition among genetically variant cells, which is similar to the Darwinian evolutionary processes now thought to generate many human cancers. This mutagenesis strategy is faster and simpler than conventional transposon screens and can potentially be applied to any tissue stem/progenitor cells that can be grown and differentiated in vitro.
Funded by: Cancer Research UK: 13031
Proceedings of the National Academy of Sciences of the United States of America 2012;109;44;E2998-3007
Novel mutations consolidate KCTD7 as a progressive myoclonus epilepsy gene.
Folkhälsan Institute of Genetics, Biomedicum Helsinki, PO Box 63, Haartmaninkatu 8, University of Helsinki, FIN-00014 Helsinki, Finland.
Background: The progressive myoclonus epilepsies (PMEs) comprise a group of clinically and genetically heterogeneous disorders characterised by myoclonus, epilepsy, and neurological deterioration. This study aimed to identify the underlying gene(s) in childhood onset PME patients with unknown molecular genetic background.
Methods: Homozygosity mapping was applied on genome-wide single nucleotide polymorphism data of 18 Turkish patients. The potassium channel tetramerisation domain-containing 7 (KCTD7) gene, previously associated with PME in a single inbred family, was screened for mutations. The spatiotemporal expression of KCTD7 was assessed in cellular cultures and mouse brain tissue.
Results: Overlapping homozygosity in 8/18 patients defined a 1.5 Mb segment on 7q11.21 as the major candidate locus. Screening of the positional candidate gene KCTD7 revealed homozygous missense mutations in two of the eight cases. Screening of KCTD7 in a further 132 PME patients revealed four additional mutations (two missense, one in-frame deletion, and one frameshift-causing) in five families. Eight patients presented with myoclonus and epilepsy and one with ataxia, the mean age of onset being 19 months. Within 2 years after onset, progressive loss of mental and motor skills ensued leading to severe dementia and motor handicap. KCTD7 showed cytosolic localisation and predominant neuronal expression, with widespread expression throughout the brain. None of three polypeptides carrying patient missense mutations affected the subcellular distribution of KCTD7.
Discussion: These data confirm the causality of KCTD7 defects in PME, and imply that KCTD7 mutation screening should be considered in PME patients with onset around 2 years of age followed by rapid mental and motor deterioration.
Journal of medical genetics 2012;49;6;391-9
The transcriptional landscape and small RNAs of Salmonella enterica serovar Typhimurium.
Department of Microbiology, School of Genetics and Microbiology, Moyne Institute of Preventive Medicine, Trinity College Dublin, Dublin 2, Ireland.
More than 50 years of research have provided great insight into the physiology, metabolism, and molecular biology of Salmonella enterica serovar Typhimurium (S. Typhimurium), but important gaps in our knowledge remain. It is clear that a precise choreography of gene expression is required for Salmonella infection, but basic genetic information such as the global locations of transcription start sites (TSSs) has been lacking. We combined three RNA-sequencing techniques and two sequencing platforms to generate a robust picture of transcription in S. Typhimurium. Differential RNA sequencing identified 1,873 TSSs on the chromosome of S. Typhimurium SL1344 and 13% of these TSSs initiated antisense transcripts. Unique findings include the TSSs of the virulence regulators phoP, slyA, and invF. Chromatin immunoprecipitation revealed that RNA polymerase was bound to 70% of the TSSs, and two-thirds of these TSSs were associated with σ(70) (including phoP, slyA, and invF) from which we identified the -10 and -35 motifs of σ(70)-dependent S. Typhimurium gene promoters. Overall, we corrected the location of important genes and discovered 18 times more promoters than identified previously. S. Typhimurium expresses 140 small regulatory RNAs (sRNAs) at early stationary phase, including 60 newly identified sRNAs. Almost half of the experimentally verified sRNAs were found to be unique to the Salmonella genus, and <20% were found throughout the Enterobacteriaceae. This description of the transcriptional map of SL1344 advances our understanding of S. Typhimurium, arguably the most important bacterial infection model.
Funded by: Biotechnology and Biological Sciences Research Council: BBS/E/F/00042248
Proceedings of the National Academy of Sciences of the United States of America 2012;109;20;E1277-86
Rapid turnover of long noncoding RNAs and the evolution of gene expression.
Cancer Research UK, Cambridge Research Institute, Li Ka Shing Centre, Cambridge, United Kingdom.
A large proportion of functional sequence within mammalian genomes falls outside protein-coding exons and can be transcribed into long RNAs. However, the roles in mammalian biology of long noncoding RNA (lncRNA) are not well understood. Few lncRNAs have experimentally determined roles, with some of these being lineage-specific. Determining the extent to which transcription of lncRNA loci is retained or lost across multiple evolutionary lineages is essential if we are to understand their contribution to mammalian biology and to lineage-specific traits. Here, we experimentally investigated the conservation of lncRNA expression among closely related rodent species, allowing the evolution of DNA sequence to be uncoupled from evolution of transcript expression. We generated total RNA (RNAseq) and H3K4me3-bound (ChIPseq) DNA data, and combined both to construct catalogues of transcripts expressed in the adult liver of Mus musculus domesticus (C57BL/6J), Mus musculus castaneus, and Rattus norvegicus. We estimated the rate of transcriptional turnover of lncRNAs and investigated the effects of their lineage-specific birth or death. LncRNA transcription showed considerably greater gain and loss during rodent evolution, compared with protein-coding genes. Nucleotide substitution rates were found to mirror the in vivo transcriptional conservation of intergenic lncRNAs between rodents: only the sequences of noncoding loci with conserved transcription were constrained. Finally, we found that lineage-specific intergenic lncRNAs appear to be associated with modestly elevated expression of genomically neighbouring protein-coding genes. Our findings show that nearly half of intergenic lncRNA loci have been gained or lost since the last common ancestor of mouse and rat, and they predict that such rapid transcriptional turnover contributes to the evolution of tissue- and lineage-specific gene expression.
Funded by: Cancer Research UK: CRUK_15603, CRUK_A15603; European Research Council: ERC_202218; Medical Research Council: MRC_MC_U137761446; Wellcome Trust
PLoS genetics 2012;8;7;e1002841
Targeted restoration of the intestinal microbiota with a simple, defined bacteriotherapy resolves relapsing Clostridium difficile disease in mice.
Wellcome Trust Sanger Institute, Hinxton, United Kingdom. firstname.lastname@example.org
Relapsing C. difficile disease in humans is linked to a pathological imbalance within the intestinal microbiota, termed dysbiosis, which remains poorly understood. We show that mice infected with epidemic C. difficile (genotype 027/BI) develop highly contagious, chronic intestinal disease and persistent dysbiosis characterized by a distinct, simplified microbiota containing opportunistic pathogens and altered metabolite production. Chronic C. difficile 027/BI infection was refractory to vancomycin treatment leading to relapsing disease. In contrast, treatment of C. difficile 027/BI infected mice with feces from healthy mice rapidly restored a diverse, healthy microbiota and resolved C. difficile disease and contagiousness. We used this model to identify a simple mixture of six phylogenetically diverse intestinal bacteria, including novel species, which can re-establish a health-associated microbiota and clear C. difficile 027/BI infection from mice. Thus, targeting a dysbiotic microbiota with a defined mixture of phylogenetically diverse bacteria can trigger major shifts in the microbial community structure that displaces C. difficile and, as a result, resolves disease and contagiousness. Further, we demonstrate a rational approach to harness the therapeutic potential of health-associated microbial communities to treat C. difficile disease and potentially other forms of intestinal dysbiosis.
Funded by: Medical Research Council: 93614, MRC_G0901743; Wellcome Trust: 076964, 098051
PLoS pathogens 2012;8;10;e1002995
Characterization and gene expression analysis of the cir multi-gene family of Plasmodium chabaudi chabaudi (AS).
Division of Parasitology, MRC National Institute for Medical Research, London, UK.
Background: The pir genes comprise the largest multi-gene family in Plasmodium, with members found in P. vivax, P. knowlesi and the rodent malaria species. Despite comprising up to 5% of the genome, little is known about the functions of the proteins encoded by pir genes. P. chabaudi causes chronic infection in mice, which may be due to antigenic variation. In this model, pir genes are called cirs and may be involved in this mechanism, allowing evasion of host immune responses. In order to fully understand the role(s) of CIR proteins during P. chabaudi infection, a detailed characterization of the cir gene family was required.
Results: The cir repertoire was annotated and a detailed bioinformatic characterization of the encoded CIR proteins was performed. Two major sub-families were identified, which have been named A and B. Members of each sub-family displayed different amino acid motifs, and were thus predicted to have undergone functional divergence. In addition, the expression of the entire cir repertoire was analyzed via RNA sequencing and microarray. Up to 40% of the cir gene repertoire was expressed in the parasite population during infection, and dominant cir transcripts could be identified. In addition, some differences were observed in the pattern of expression between the cir subgroups at the peak of P. chabaudi infection. Finally, specific cir genes were expressed at different time points during asexual blood stages.
Conclusions: In conclusion, the large number of cir genes and their expression throughout the intraerythrocytic cycle of development indicates that CIR proteins are likely to be important for parasite survival. In particular, the detection of dominant cir transcripts at the peak of P. chabaudi infection supports the idea that CIR proteins are expressed, and could perform important functions in the biology of this parasite. Further application of the methodologies described here may allow the elucidation of CIR sub-family A and B protein functions, including their contribution to antigenic variation and immune evasion.
Funded by: Medical Research Council: MRC_MC_EX_G0901345, U117584248
BMC genomics 2012;13;125
Transferrin and HFE genes interact in Alzheimer's disease risk: the Epistasis Project.
Oxford Project to Investigate Memory and Ageing, University Department of Physiology, Anatomy and Genetics, Oxford, UK. email@example.com
Iron overload may contribute to the risk of Alzheimer's disease (AD). In the Epistasis Project, with 1757 cases of AD and 6295 controls, we studied 4 variants in 2 genes of iron metabolism: hemochromatosis (HFE) C282Y and H63D, and transferrin (TF) C2 and -2G/A. We replicated the reported interaction between HFE 282Y and TF C2 in the risk of AD: synergy factor, 1.75 (95% confidence interval, 1.1-2.8, p = 0.02) in Northern Europeans. The synergy factor was 3.1 (1.4-6.9; 0.007) in subjects with the APOEε4 allele. We found another interaction, between HFE 63HH and TF -2AA, markedly modified by age. Both interactions were found mainly or only in Northern Europeans. The interaction between HFE 282Y and TF C2 has now been replicated twice, in altogether 2313 cases of AD and 7065 controls, and has also been associated with increased iron load. We therefore suggest that iron overload may be a causative factor in the development of AD. Treatment for iron overload might thus be protective in some cases.
Funded by: Medical Research Council: G0400074, G0400546, G0502157, G0900652, G1100540
Neurobiology of aging 2012;33;1;202.e1-13
Using mouse models to study function of transcriptional factors in T cell development.
Key Laboratory of Regenerative Biology, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou, China; Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, Guangzhou, China.
Laboratory mice have been widely used as tools for basic biological research and as models for studying human diseases. With advances in genetic engineering and conditional knockout (CKO) mice, we now understand that hematopoiesis is a dynamic stepwise process starting from hematopoietic stem cells (HSCs), which are responsible for replenishing all blood cells. Transcriptional factors play an important role in hematopoiesis. In this review we compile several studies that use genetically modified mice and humanized mice to study the function of transcriptional factors in lymphopoiesis, including T lymphocyte and natural killer (NK) cell development. Finally, we focus on the key transcriptional factor Bcl11b and its function in regulating T cell specification and commitment.
Cell regeneration (London, England) 2012;1;1;8
Design and implementation of ProteinWorldDB
PUC-Rio, Departamento de Informática, Rio de Janeiro, RJ, Brazil; Laboratório de Genômica Funcional e Bioinformática, Instituto Oswaldo Cruz, Rio de Janeiro, RJ, Brazil; Laboratório de Biologia Computacional e Sistemas, Instituto Oswaldo Cruz, Rio de Janeiro, RJ, Brazil; Wellcome Trust Sanger Institute, Hinxton, United Kingdom. Correspondence: Lifschitz, S., PUC-Rio, Departamento de Informática, Rio de Janeiro, RJ, Brazil.
This work involves the comparison of protein information on a genomic scale. The main goal is to improve the quality and interpretation of biological data, as well as our understanding of biological systems and their interactions. Stringent comparisons were obtained by applying the Smith-Waterman algorithm in a pairwise manner to all predicted proteins encoded in both completely sequenced and unfinished genomes available in the public database RefSeq. Comparisons were run on a computational grid, and the complete result exceeds 900 GB. Consequently, database system design is a critical step in storing and managing the comparison results. This paper describes conceptual design issues for a database that represents a data set of protein sequence cross-comparisons. We show that our conceptual schema and its relational mapping enable users to extract relevant information, from simple to complex queries integrating distinct data sources.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 2012;7409;144-55
Mosaic overgrowth with fibroadipose hyperplasia is caused by somatic activating mutations in PIK3CA.
The National Human Genome Research Institute, US National Institutes of Health, Bethesda, Maryland, USA.
The phosphatidylinositol 3-kinase (PI3K)-AKT signaling pathway is critical for cellular growth and metabolism. Correspondingly, loss of function of PTEN, a negative regulator of PI3K, or activating mutations in AKT1, AKT2 or AKT3 have been found in distinct disorders featuring overgrowth or hypoglycemia. We performed exome sequencing of DNA from unaffected and affected cells from an individual with an unclassified syndrome of congenital progressive segmental overgrowth of fibrous and adipose tissue and bone and identified the cancer-associated mutation encoding p.His1047Leu in PIK3CA, the gene that encodes the p110α catalytic subunit of PI3K, only in affected cells. Sequencing of PIK3CA in ten additional individuals with overlapping syndromes identified either the p.His1047Leu alteration or a second cancer-associated alteration, p.His1047Arg, in nine cases. Affected dermal fibroblasts showed enhanced basal and epidermal growth factor (EGF)-stimulated phosphatidylinositol 3,4,5-trisphosphate (PIP(3)) generation and concomitant activation of downstream signaling relative to their unaffected counterparts. Our findings characterize a distinct overgrowth syndrome, biochemically demonstrate activation of PI3K signaling and thereby identify a rational therapeutic target.
Funded by: Biotechnology and Biological Sciences Research Council: BBS/E/B/0000C227, BBS/E/B/000C0415; Wellcome Trust: 077016, 078986, 080952, 091551, 095515, 097721
Nature genetics 2012;44;8;928-33
Metal binding in proteins: Machine learning complements X-ray absorption spectroscopy
Lecture Notes in Computer Science 2012;7524;854-857
Dense fine-mapping study identifies new susceptibility loci for primary biliary cirrhosis.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
We genotyped 2,861 cases of primary biliary cirrhosis (PBC) from the UK PBC Consortium and 8,514 UK population controls across 196,524 variants within 186 known autoimmune risk loci. We identified 3 loci newly associated with PBC (at P<5×10(-8)), increasing the number of known susceptibility loci to 25. The most associated variant at 19p12 is a low-frequency nonsynonymous SNP in TYK2, further implicating JAK-STAT and cytokine signaling in disease pathogenesis. An additional five loci contained nonsynonymous variants in high linkage disequilibrium (LD; r2>0.8) with the most associated variant at the locus. We found multiple independent common, low-frequency and rare variant association signals at five loci. Of the 26 independent non-human leukocyte antigen (HLA) signals tagged on the Immunochip, 15 have SNPs in B-lymphoblastoid open chromatin regions in high LD (r2>0.8) with the most associated variant. This study shows how data from dense fine-mapping arrays coupled with functional genomic data can be used to identify candidate causal variants for functional follow-up.
Funded by: Medical Research Council: G0000934, G0800460, MRC_G0500020, MRC_G0802068; NIDDK NIH HHS: U01 DK062418, U01-DK-062418; Wellcome Trust: 068545/Z/02, 076113/C/04/Z, 085925/Z/08/Z, WT090355/A/09/Z, WT090355/B/09/Z, WT090532, WT098051
Nature genetics 2012;44;10;1137-41
Expression of chemosensory proteins in the tsetse fly Glossina morsitans morsitans is related to female host-seeking behaviour.
Department of Biological Chemistry, Rothamsted Research, Harpenden, UK.
Chemosensory proteins (CSPs) are a class of soluble proteins present in high concentrations in the sensilla of insect antennae. It has been proposed that they play an important role in insect olfaction by mediating interactions between odorants and odorant receptors. Here we report, for the first time, the presence of five CSP genes in the tsetse fly Glossina morsitans morsitans, a major vector transmitting nagana in livestock. Real-time quantitative reverse transcription PCR showed that three of the CSPs are expressed in antennae. One of them, GmmCSP2, is transcribed at a very high level and could be involved in olfaction. We also determined expression in the antennae of both males and females at different life stages and with different blood feeding regimes. The transcription of GmmCSP2 was lower in male antennae than in females, with a sharp increase in 10-week-old flies, 48 h after a bloodmeal. Thus there is a clear relationship between CSP gene transcription and host searching behaviour. Genome annotation and phylogenetic analyses comparing G. morsitans morsitans CSPs with those of other Diptera showed rapid evolution after speciation of mosquitoes.
Funded by: Biotechnology and Biological Sciences Research Council; Wellcome Trust: WT085775/Z/08/Z
Insect molecular biology 2012;21;1;41-8
Learned recognition of maternal signature odors mediates the first suckling episode in mice.
Department of Cell Biology, The Scripps Research Institute, La Jolla, CA 92037, USA.
Background: Soon after birth, all mammals must initiate milk suckling to survive. In rodents, this innate behavior is critically dependent on uncharacterized maternally derived chemosensory ligands. Recently, the first pheromone sufficient to initiate suckling was isolated from the rabbit. Identification of the olfactory cues that trigger first suckling in the mouse would provide the means to determine the neural mechanisms that generate innate behavior.
Results: Here we use behavioral analysis, metabolomics, and calcium imaging of primary sensory neurons and find no evidence of ligands with intrinsic bioactivity, such as pheromones, acting to promote first suckling in the mouse. Instead, we find that the initiation of suckling is dependent on variable blends of maternal "signature odors" that are learned and recognized prior to first suckling.
Conclusions: As observed with pheromone-mediated behavior, the response to signature odors releases innate behavior. However, this mechanism tolerates variability in both the signaling ligands and sensory neurons, which may maximize the probability that this first essential behavior is successfully initiated. These results suggest that mammalian species have evolved multiple strategies to ensure the onset of this critical behavior.
Funded by: NIDCD NIH HHS: R01 DC006885, R01 DC009413; Wellcome Trust: 098051
Current biology : CB 2012;22;21;1998-2007
A combined functional annotation score for non-synonymous variants.
Wellcome Trust Sanger Institute, Hinxton, UK. firstname.lastname@example.org
Aims: Next-generation sequencing has opened the possibility of large-scale sequence-based disease association studies. A major challenge in interpreting whole-exome data is predicting which of the discovered variants are deleterious or neutral. To address this question in silico, we have developed a score called Combined Annotation scoRing toOL (CAROL), which combines information from 2 bioinformatics tools: PolyPhen-2 and SIFT, in order to improve the prediction of the effect of non-synonymous coding variants.
Methods: We used a weighted Z method that combines the probabilistic scores of PolyPhen-2 and SIFT. We defined 2 dataset pairs to train and test CAROL using information from the dbSNP: 'HGMD-PUBLIC' and 1000 Genomes Project databases. The training pair comprises a total of 980 positive control (disease-causing) and 4,845 negative control (non-disease-causing) variants. The test pair consists of 1,959 positive and 9,691 negative controls.
Results: CAROL has higher predictive power and accuracy for the effect of non-synonymous variants than each individual annotation tool (PolyPhen-2 and SIFT) and benefits from higher coverage.
Conclusion: The combination of annotation tools can help improve automated prediction of whole-genome/exome non-synonymous variant functional consequences.
Funded by: Wellcome Trust: WT088885/Z/09/Z, WT095908, WT098051
Human heredity 2012;73;1;47-51
Community gene annotation in practice.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK. email@example.com
Manual annotation of genomic data is extremely valuable to produce an accurate reference gene set but is expensive compared with automatic methods and so has been limited to model organisms. Annotation tools that have been developed at the Wellcome Trust Sanger Institute (WTSI, http://www.sanger.ac.uk/) are being used to fill that gap, as they can be used remotely and so open up viable community annotation collaborations. We introduce the 'Blessed' annotator and 'Gatekeeper' approach to Community Annotation using the Otterlace/ZMap genome annotation tool. We also describe the strategies adopted for annotation consistency, quality control and viewing of the annotation. DATABASE URL: http://vega.sanger.ac.uk/index.html.
Funded by: Biotechnology and Biological Sciences Research Council: BB/F02195X/1; Wellcome Trust: WT077198
Database : the journal of biological databases and curation 2012;2012;bas009
A balanced translocation truncates Neurotrimin in a family with intracranial and thoracic aortic aneurysm.
Institute for Molecular Medicine Finland FIMM, Helsinki, Finland.
Background: Balanced chromosomal rearrangements occasionally have strong phenotypic effects, which may be useful in understanding pathobiology. However, conventional strategies for characterising breakpoints are laborious and inaccurate. We present here a proband with a thoracic aortic aneurysm (TAA) and a balanced translocation t(10;11) (q23.2;q24.2). Our purpose was to sequence the chromosomal breaks in this family to reveal a novel candidate gene for aneurysm.
Intracranial aneurysm (IA) and TAAs appear to run in the family in an autosomal dominant manner: After exploring the family history, we observed that the proband's two siblings both died from cerebral haemorrhage, and the proband's parent and parent's sibling died from aortic rupture. After application of a genome-wide paired-end DNA sequencing method for breakpoint mapping, we demonstrate that this translocation breaks intron 1 of a splicing isoform of Neurotrimin at 11q25 in a previously implicated candidate region for IAs and AAs (OMIM 612161).
Conclusions: Our results demonstrate the feasibility of genome-wide paired-end sequencing for the characterisation of balanced rearrangements and identification of candidate genes in patients with potentially disease-associated chromosome rearrangements. The family samples were gathered as a part of our recently launched National Registry of Reciprocal Balanced Translocations and Inversions in Finland (n=2575), and we believe that such a registry will be a powerful resource for the localisation of chromosomal aberrations, which can bring insight into the aetiology of related phenotypes.
Funded by: NIMH NIH HHS: MH084995, R01 MH084995; Wellcome Trust: 098051
Journal of medical genetics 2012;49;10;621-9
Estimating reassortment rates in co-circulating Eurasian swine influenza viruses.
Institute of Evolutionary Biology, University of Edinburgh, Kings Buildings, West Mains Road, Edinburgh EH9 3JT, UK. firstname.lastname@example.org
Swine have often been considered as a mixing vessel for different influenza strains. In order to assess their role in more detail, we undertook a retrospective sequencing study to detect and characterize the reassortants present in European swine and to estimate the rate of reassortment between H1N1, H1N2 and H3N2 subtypes with Eurasian (avian-like) internal protein-coding segments. We analysed 69 newly obtained whole genome sequences of subtypes H1N1-H3N2 from swine influenza viruses sampled between 1982 and 2008, using Illumina and 454 platforms. Analyses of these genomes, together with previously published genomes, revealed a large monophyletic clade of Eurasian swine-lineage polymerase segments containing H1N1, H1N2 and H3N2 subtypes. We subsequently examined reassortments between the haemagglutinin and neuraminidase segments and estimated the reassortment rates between lineages using a recently developed evolutionary analysis method. High rates of reassortment between H1N2 and H1N1 Eurasian swine lineages were detected in European strains, with an average of one reassortment every 2-3 years. This rapid reassortment results from co-circulating lineages in swine, and in consequence we should expect further reassortments between currently circulating swine strains and the recent swine-origin H1N1v pandemic strain.
Funded by: Biotechnology and Biological Sciences Research Council: BB/H014306/1; Medical Research Council: MC_G0902096, MC_U117512723; Wellcome Trust
The Journal of general virology 2012;93;Pt 11;2326-36
Genetically distinct subsets within ANCA-associated vasculitis.
Cambridge Institute for Medical Research, and Department of Medicine, University of Cambridge School of Clinical Medicine, Addenbrooke's Hospital, Cambridge, United Kingdom. email@example.com
Background: Antineutrophil cytoplasmic antibody (ANCA)-associated vasculitis is a severe condition encompassing two major syndromes: granulomatosis with polyangiitis (formerly known as Wegener's granulomatosis) and microscopic polyangiitis. Its cause is unknown, and there is debate about whether it is a single disease entity and what role ANCA plays in its pathogenesis. We investigated its genetic basis.
Methods: A genomewide association study was performed in a discovery cohort of 1233 U.K. patients with ANCA-associated vasculitis and 5884 controls and was replicated in 1454 Northern European case patients and 1666 controls. Quality control, population stratification, and statistical analyses were performed according to standard criteria.
Results: We found both major-histocompatibility-complex (MHC) and non-MHC associations with ANCA-associated vasculitis and also that granulomatosis with polyangiitis and microscopic polyangiitis were genetically distinct. The strongest genetic associations were with the antigenic specificity of ANCA, not with the clinical syndrome. Anti-proteinase 3 ANCA was associated with HLA-DP and the genes encoding α(1)-antitrypsin (SERPINA1) and proteinase 3 (PRTN3) (P=6.2×10(-89), P=5.6×10(-12), and P=2.6×10(-7), respectively). Anti-myeloperoxidase ANCA was associated with HLA-DQ (P=2.1×10(-8)).
Conclusions: This study confirms that the pathogenesis of ANCA-associated vasculitis has a genetic component, shows genetic distinctions between granulomatosis with polyangiitis and microscopic polyangiitis that are associated with ANCA specificity, and suggests that the response against the autoantigen proteinase 3 is a central pathogenic feature of proteinase 3 ANCA-associated vasculitis. These data provide preliminary support for the concept that proteinase 3 ANCA-associated vasculitis and myeloperoxidase ANCA-associated vasculitis are distinct autoimmune syndromes. (Funded by the British Heart Foundation and others.)
Funded by: British Heart Foundation: SP/09/001/27117; Medical Research Council; Wellcome Trust: 083650/Z/07/Z
The New England journal of medicine 2012;367;3;214-23
Genome-wide association analysis of imputed rare variants: application to seven common complex diseases.
Estonian Genome Centre, University of Tartu, Tartu, Estonia.
Genome-wide association studies have been successful in identifying loci contributing effects to a range of complex human traits. The majority of reproducible associations within these loci are with common variants, each of modest effect, which together explain only a small proportion of heritability. It has been suggested that much of the unexplained genetic component of complex traits can thus be attributed to rare variation. However, genome-wide association study genotyping chips have been designed primarily to capture common variation, and thus are underpowered to detect the effects of rare variants. Nevertheless, we demonstrate here, by simulation, that imputation from an existing scaffold of genome-wide genotype data up to high-density reference panels has the potential to identify rare variant associations with complex traits, without the need for costly re-sequencing experiments. By application of this approach to genome-wide association studies of seven common complex diseases, imputed up to publicly available reference panels, we identify genome-wide significant evidence of rare variant association in PRDM10 with coronary artery disease and multiple genes in the major histocompatibility complex (MHC) with type 1 diabetes. The results of our analyses highlight that genome-wide association studies have the potential to offer an exciting opportunity for gene discovery through association with rare variants, conceivably leading to substantial advancements in our understanding of the genetic architecture underlying complex human traits.
Funded by: Wellcome Trust: WT076113, WT081682, WT090532, WT098017, WT098051
Genetic epidemiology 2012;36;8;785-96
A systematic survey of loss-of-function variants in human protein-coding genes.
Wellcome Trust Sanger Institute, Hinxton, UK. firstname.lastname@example.org
Genome-sequencing studies indicate that all humans carry many genetic variants predicted to cause loss of function (LoF) of protein-coding genes, suggesting unexpected redundancy in the human genome. Here we apply stringent filters to 2951 putative LoF variants obtained from 185 human genomes to determine their true prevalence and properties. We estimate that human genomes typically contain ~100 genuine LoF variants with ~20 genes completely inactivated. We identify rare and likely deleterious LoF alleles, including 26 known and 21 predicted severe disease-causing variants, as well as common LoF variants in nonessential genes. We describe functional and evolutionary differences between LoF-tolerant and recessive disease genes and a method for using these differences to prioritize candidate genes found in clinical sequencing studies.
Funded by: Biotechnology and Biological Sciences Research Council: BB/I02593X/1; British Heart Foundation: RG/09/012/28096; NHGRI NIH HHS: U54 HG003273; NIAAA NIH HHS: R21 AA022707; Wellcome Trust: 085532, 090532, 090532/Z/09/Z, 098051
Science (New York, N.Y.) 2012;335;6070;823-8
Accessing data from the International Mouse Phenotyping Consortium: state of the art and future plans.
Mammalian Genetics Unit, Medical Research Council Harwell, Harwell, Oxfordshire OX11 0RD, UK. email@example.com
The International Mouse Phenotyping Consortium (IMPC) (http://www.mousephenotype.org) will reveal the pleiotropic functions of every gene in the mouse genome and uncover the wider role of genetic loci within diverse biological systems. Comprehensive informatics solutions are vital to ensuring that this vast array of data is captured in a standardised manner and made accessible to the scientific community for interrogation and analysis. Here we review the existing EuroPhenome and WTSI phenotype informatics systems and the IKMC portal, and present plans for extending these systems and lessons learned to the development of a robust IMPC informatics infrastructure.
Funded by: Medical Research Council: MC_U142684171, MC_U142684172, MC_U142684175; NHGRI NIH HHS: U54 HG006370
Mammalian genome : official journal of the International Mammalian Genome Society 2012;23;9-10;641-52
Sleeping Beauty mutagenesis reveals cooperating mutations and pathways in pancreatic adenocarcinoma.
Division of Genetics and Genomics, Institute of Molecular and Cell Biology, Singapore 138673.
Pancreatic cancer is one of the most deadly cancers affecting the Western world. Because the disease is highly metastatic and difficult to diagnose until late stages, the 5-y survival rate is around 5%. The identification of molecular cancer drivers is critical for furthering our understanding of the disease and development of improved diagnostic tools and therapeutics. We have conducted a mutagenic screen using Sleeping Beauty (SB) in mice to identify new candidate cancer genes in pancreatic cancer. By combining SB with an oncogenic Kras allele, we observed highly metastatic pancreatic adenocarcinomas. Using two independent statistical methods to identify loci commonly mutated by SB in these tumors, we identified 681 loci that comprise 543 candidate cancer genes (CCGs); 75 of these CCGs, including Mll3 and Ptk2, have known mutations in human pancreatic cancer. We identified point mutations in human pancreatic patient samples for another 11 CCGs, including Acvr2a and Map2k4. Importantly, 10% of the CCGs are involved in chromatin remodeling, including Arid4b, Kdm6a, and Nsd3, and all SB tumors have at least one mutated gene involved in this process; 20 CCGs, including Ctnnd1, Fbxo11, and Vgll4, are also significantly associated with poor patient survival. SB mutagenesis provides a rich resource of mutations in potential cancer drivers for cross-comparative analyses with ongoing sequencing efforts in human pancreatic adenocarcinoma.
Funded by: Cancer Research UK: 13031
Proceedings of the National Academy of Sciences of the United States of America 2012;109;16;5934-41
A genome-wide approach accounting for body mass index identifies genetic variants influencing fasting glycemic traits and insulin resistance.
Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts, USA.
Recent genome-wide association studies have described many loci implicated in type 2 diabetes (T2D) pathophysiology and β-cell dysfunction but have contributed little to the understanding of the genetic basis of insulin resistance. We hypothesized that genes implicated in insulin resistance pathways might be uncovered by accounting for differences in body mass index (BMI) and potential interactions between BMI and genetic variants. We applied a joint meta-analysis approach to test associations with fasting insulin and glucose on a genome-wide scale. We present six previously unknown loci associated with fasting insulin at P < 5 × 10^-8 in combined discovery and follow-up analyses of 52 studies comprising up to 96,496 non-diabetic individuals. Risk variants were associated with higher triglyceride and lower high-density lipoprotein (HDL) cholesterol levels, suggesting a role for these loci in insulin resistance pathways. The discovery of these loci will aid further characterization of the role of insulin resistance in T2D pathophysiology.
Funded by: British Heart Foundation: BHF_RG/07/008/23674; Chief Scientist Office: CSO_CZB/4/710; Medical Research Council: MRC_G0100222, MRC_G0701863, MRC_G0900339, MRC_G0902037, MRC_G1002084, MRC_G19/35, MRC_G8802774, MRC_MC_PC_U127592696, MRC_MC_U106179471, MRC_MC_U106179472, MRC_MC_U127561128, MRC_MC_U127592696, MRC_MC_U137686857, MRC_MC_UP_A100_1003; NCATS NIH HHS: UL1 TR000124; NCRR NIH HHS: S10 RR029392; NHLBI NIH HHS: R01 HL105756; NIDDK NIH HHS: K24 DK080140, P30 DK020572, P30 DK063491, R01 DK072193, R01 DK078616; NIMH NIH HHS: R37 MH059490; Wellcome Trust: WT090532, WT091551
Nature genetics 2012;44;6;659-69
Analysis of Plasmodium falciparum diversity in natural infections by deep sequencing.
Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.
Malaria elimination strategies require surveillance of the parasite population for genetic changes that demand a public health response, such as new forms of drug resistance. Here we describe methods for the large-scale analysis of genetic variation in Plasmodium falciparum by deep sequencing of parasite DNA obtained from the blood of patients with malaria, either directly or after short-term culture. Analysis of 86,158 exonic single nucleotide polymorphisms that passed genotyping quality control in 227 samples from Africa, Asia and Oceania provides genome-wide estimates of allele frequency distribution, population structure and linkage disequilibrium. By comparing the genetic diversity of individual infections with that of the local parasite population, we derive a metric of within-host diversity that is related to the level of inbreeding in the population. An open-access web application has been established for the exploration of regional differences in allele frequency and of highly differentiated loci in the P. falciparum genome.
Funded by: Howard Hughes Medical Institute: 55005502; Intramural NIH HHS; Medical Research Council: G0600718, G19/9; Wellcome Trust: 075491/Z/04, 077012/Z/05/Z, 082370, 089275, 090532, 090532/Z/09/Z, 090770, 090770/Z/09/Z, 092654, 093956, 098051
Disruption of mouse Cenpj, a regulator of centriole biogenesis, phenocopies Seckel syndrome.
Experimental Cancer Genetics, Wellcome Trust Sanger Institute, Hinxton, United Kingdom.
Disruption of the centromere protein J gene, CENPJ (CPAP, MCPH6, SCKL4), which is a highly conserved and ubiquitously expressed centrosomal protein, has been associated with primary microcephaly and the microcephalic primordial dwarfism disorder Seckel syndrome. The mechanism by which disruption of CENPJ causes the proportionate, primordial growth failure that is characteristic of Seckel syndrome is unknown. By generating a hypomorphic allele of Cenpj, we have developed a mouse (Cenpj(tm/tm)) that recapitulates many of the clinical features of Seckel syndrome, including intrauterine dwarfism, microcephaly with memory impairment, ossification defects, and ocular and skeletal abnormalities, thus providing clear confirmation that specific mutations of CENPJ can cause Seckel syndrome. Immunohistochemistry revealed increased levels of DNA damage and apoptosis throughout Cenpj(tm/tm) embryos and adult mice showed an elevated frequency of micronucleus induction, suggesting that Cenpj-deficiency results in genomic instability. Notably, however, genomic instability was not the result of defective ATR-dependent DNA damage signaling, as is the case for the majority of genes associated with Seckel syndrome. Instead, Cenpj(tm/tm) embryonic fibroblasts exhibited irregular centriole and centrosome numbers and mono- and multipolar spindles, and many were near-tetraploid with numerical and structural chromosomal abnormalities when compared to passage-matched wild-type cells. Increased cell death due to mitotic failure during embryonic development is likely to contribute to the proportionate dwarfism that is associated with CENPJ-Seckel syndrome.
Funded by: Cancer Research UK: CRUK_11224, CRUK_12401, CRUK_13031, CRUK_A11224; European Research Council: ERC_268536; Medical Research Council: MRC_G0901338; NEI NIH HHS: K08 EY020530, NIH 1K08EY020530-01A1, R01 EY018213; Wellcome Trust: 098051, WT092096
PLoS genetics 2012;8;11;e1003022
Cancer gene discovery in the mouse.
Experimental Cancer Genetics, The Wellcome Trust Sanger Institute, Hinxton, Cambs CB10 1HH, UK.
Developments in high-throughput genome analysis and in computational tools have made it possible to rapidly profile entire cancer genomes with basepair resolution. In parallel with these advances, mouse models of cancer have evolved into powerful tools for cancer gene discovery. Here we discuss some of the approaches that may be used for cancer gene identification in the mouse and discuss how a cross-species 'oncogenomics' approach to cancer gene discovery represents a powerful strategy for finding genes that drive tumorigenesis.
Funded by: Cancer Research UK: 13031; Wellcome Trust
Current opinion in genetics & development 2012;22;1;14-20
SynGAP isoforms exert opposing effects on synaptic strength.
Centre for Integrative Physiology, University of Edinburgh, Edinburgh EH8 9XD, UK.
Alternative promoter usage and alternative splicing enable diversification of the transcriptome. Here we demonstrate that the function of Synaptic GTPase-Activating Protein (SynGAP), a key synaptic protein, is determined by the combination of its amino-terminal sequence with its carboxy-terminal sequence. 5' rapid amplification of cDNA ends and primer extension show that different N-terminal protein sequences arise through alternative promoter usage that is regulated by synaptic activity and postnatal age. Heterogeneity in C-terminal protein sequence arises through alternative splicing. Overexpression of SynGAP α1 versus α2 C-termini-containing proteins in hippocampal neurons has opposing effects on synaptic strength, decreasing and increasing miniature excitatory synaptic current amplitude/frequency, respectively. The magnitude of this C-terminal-dependent effect is modulated by the N-terminal peptide sequence. This is the first demonstration that activity-dependent alternative promoter usage can change the function of a synaptic protein at excitatory synapses. Furthermore, the direction and degree of synaptic modulation exerted by different protein isoforms from a single gene locus depend on the combination of differential promoter usage and alternative splicing.
Funded by: Medical Research Council: G0300466, G0601584, G0700967, G0902044, G0902044(94018); Wellcome Trust
Nature communications 2012;3;900
Communication about DTC testing: commentary on a 'family experience of personal genomics'.
This paper provides a commentary on 'Family Experience of Personal Genomics' (Corpas 2012). An overview is offered on the communication literature available to help support individuals and families to communicate about genetic information. Despite there being a wealth of evidence, built on years of genetic counseling practice, this does not appear to have been translated clearly to the Direct to Consumer (DTC) testing market. In many countries it is possible to order a DTC genetic test without the involvement of any health professional; there has been heated debate about whether this is appropriate or not. Much of the focus surrounding this has been on whether it is necessary to have a health professional available to offer their clinical knowledge and help with interpreting the DTC genetic test data. What has been missed from this debate is the importance of giving customers of DTC testing services access to the abundance of information about how to communicate their genetic risks to others, including immediate family. Family communication about health and indeed genetics can be fraught with difficulty. Genetic health professionals, specifically genetic counselors, have particular expertise in family communication about genetics. Such information could be incredibly useful to kinships as they grapple with knowing how to communicate their genomic information with relatives.
Funded by: Wellcome Trust: WT077008
Journal of genetic counseling 2012;21;3;392-8
Generation of the Sotos syndrome deletion in mice.
Experimental Cancer Genetics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1HH, UK.
Haploinsufficiency of the human 5q35 region spanning the NSD1 gene results in a rare genomic disorder known as Sotos syndrome (Sotos), with patients displaying a variety of clinical features, including pre- and postnatal overgrowth, intellectual disability, and urinary/renal abnormalities. We used chromosome engineering to generate a segmental monosomy, i.e., mice carrying a heterozygous 1.5-Mb deletion of 36 genes on mouse chromosome 13 (4732471D19Rik-B4galt7), syntenic with 5q35.2-q35.3 in humans (Df(13)Ms2Dja(+/-) mice). Surprisingly, Df(13)Ms2Dja(+/-) mice were significantly smaller for their gestational age and also showed decreased postnatal growth, in contrast to Sotos patients. Df(13)Ms2Dja(+/-) mice did, however, display deficits in long-term memory retention and dilation of the pelvicalyceal system, which in part may model the learning difficulties and renal abnormalities observed in Sotos patients. Thus, genes within the mouse 4732471D19Rik-B4galt7 deletion interval play important roles in growth, memory retention, and the development of the renal pelvicalyceal system.
Funded by: Cancer Research UK: 13031; Wellcome Trust
Mammalian genome : official journal of the International Mammalian Genome Society 2012;23;11-12;749-57
Modeling partial monosomy for human chromosome 21q11.2-q21.1 reveals haploinsufficient genes influencing behavior and fat deposition.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.
Haploinsufficiency of part of human chromosome 21 results in a rare condition known as Monosomy 21. This disease displays a variety of clinical phenotypes, including intellectual disability, craniofacial dysmorphology, skeletal and cardiac abnormalities, and respiratory complications. To search for dosage-sensitive genes involved in this disorder, we used chromosome engineering to generate a mouse model carrying a deletion of the Lipi-Usp25 interval, syntenic with 21q11.2-q21.1 in humans. Haploinsufficiency for the 6 genes in this interval resulted in no gross morphological defects, and behavioral analyses performed using an open field test, a test of anxiety, and tests for social interaction were normal in monosomic mice. Monosomic mice did, however, display impaired memory retention compared to control animals. Moreover, when fed a high-fat diet (HFD), monosomic mice exhibited a significant increase in fat mass/fat percentage estimate compared with controls, severe fatty changes in their livers, and thickened subcutaneous fat. Thus, genes within the Lipi-Usp25 interval may participate in memory retention and in the regulation of fat deposition.
Funded by: Cancer Research UK: 13031; Wellcome Trust: 098330
PloS one 2012;7;1;e29681
Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes.
Wellcome Trust Centre for Human Genetics, University of Oxford, UK.
To extend understanding of the genetic architecture and molecular basis of type 2 diabetes (T2D), we conducted a meta-analysis of genetic variants on the Metabochip, including 34,840 cases and 114,981 controls, overwhelmingly of European descent. We identified ten previously unreported T2D susceptibility loci, including two showing sex-differentiated association. Genome-wide analyses of these data are consistent with a long tail of additional common variant loci explaining much of the variation in susceptibility to T2D. Exploration of the enlarged set of susceptibility loci implicates several processes, including CREBBP-related transcription, adipocytokine signaling and cell cycle regulation, in diabetes pathogenesis.
Funded by: British Heart Foundation: RG/07/008/23674, RG/98002, RG2008/08; Cancer Research UK; Chief Scientist Office: CZB/4/672, CZB/4/710; Department of Health: DHCS/07/07/008; Medical Research Council: G0000649, G0100222, G0401527, G0601261, G0701863, G0902037, G1000143, G19/35, G8802774, MC_U106179471, MC_UP_A100_1003; NCI NIH HHS: CA055075; NCRR NIH HHS: UL1 RR029887, UL1RR025005; NHGRI NIH HHS: 1Z01HG000024, N01HG65403, U01HG004399, U01HG004402; NHLBI NIH HHS: N01HC25195, N02HL64278, R01HL086694, R01HL087641, R01HL59367; NIA NIH HHS: AG028555, AG04563, AG08724, AG08861, AG10175; NIDDK NIH HHS: DK058845, DK062370, DK072193, DK073490, DK078616, DK080140, K24 DK080140, R01 DK072193, R01 DK073490, U01 DK085545; NIGMS NIH HHS: T32 GM007753; NINDS NIH HHS: 1R21NS064908; PHS HHS: HHSN268200625226C, HHSN268201100005C, HHSN268201100006C, HHSN268201100007C, HHSN268201100008C, HHSN268201100009C, HHSN268201100010C, HHSN268201100011C, HHSN268201100012C; Wellcome Trust: 064890, 081682, 090367, 090532, 098017, GR072960, GR076113, GR077016, GR081682, GR083270, GR083948, GR084711, GR086596, GR090532, GR098051
Nature genetics 2012;44;9;981-90
Olorin: combining gene flow with exome sequencing in large family studies of complex disease.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK.
Motivation: The existence of families with many individuals affected by the same complex disease has long suggested the possibility of rare alleles of high penetrance. In contrast to Mendelian diseases, however, linkage studies have identified very few reproducibly linked loci in diseases such as diabetes and autism. Genome-wide association studies have had greater success with such diseases, but these results explain neither the extreme disease load nor the within-family linkage peaks of some large pedigrees. Combining linkage information with exome or genome sequencing from large complex disease pedigrees might finally identify family-specific, high-penetrance mutations.
Results: Olorin is a tool that integrates gene flow within families with next-generation sequencing data to enable the analysis of complex disease pedigrees. Users can interactively filter and prioritize variants based on haplotype sharing across selected individuals and other measures of importance, including predicted functional consequence and population frequency.
Funded by: Wellcome Trust: WT098051
Bioinformatics (Oxford, England) 2012;28;24;3320-1
Generation of multipotent lung and airway progenitors from mouse ESCs and patient-specific cystic fibrosis iPSCs.
Center for Regenerative Medicine, Massachusetts General Hospital, Boston, 02114, USA.
Deriving lung progenitors from patient-specific pluripotent cells is a key step in producing differentiated lung epithelium for disease modeling and transplantation. By mimicking the signaling events that occur during mouse lung development, we generated murine lung progenitors in a series of discrete steps. Definitive endoderm derived from mouse embryonic stem cells (ESCs) was converted into foregut endoderm, then into replicating Nkx2.1+ lung endoderm, and finally into multipotent embryonic lung progenitor and airway progenitor cells. We demonstrated that precisely-timed BMP, FGF, and WNT signaling are required for NKX2.1 induction. Mouse ESC-derived Nkx2.1+ progenitor cells formed respiratory epithelium (tracheospheres) when transplanted subcutaneously into mice. We then adapted this strategy to produce disease-specific lung progenitor cells from human Cystic Fibrosis induced pluripotent stem cells (iPSCs), creating a platform for dissecting human lung disease. These disease-specific human lung progenitors formed respiratory epithelium when subcutaneously engrafted into immunodeficient mice.
Funded by: NHLBI NIH HHS: P30 HL101287, R21 HL108055, R21 HL109786
Cell stem cell 2012;10;4;385-97
Behavior and target site selection of conjugative transposon Tn916 in two different strains of toxigenic Clostridium difficile.
Department of Microbial Diseases, UCL Eastman Dental Institute, University College London, London, UK.
The insertion sites of the conjugative transposon Tn916 in the anaerobic pathogen Clostridium difficile were determined using Illumina Solexa high-throughput DNA sequencing of Tn916 insertion libraries in two different clinical isolates: 630ΔE, an erythromycin-sensitive derivative of 630 (ribotype 012), and the ribotype 027 isolate R20291, which was responsible for a severe outbreak of C. difficile disease. A consensus 15-bp Tn916 insertion sequence was identified which was similar in both strains, although an extended consensus sequence was observed in R20291. A search of the C. difficile 630 genome showed that the Tn916 insertion motif was present 100,987 times, with approximately 63,000 of these motifs located in genes and 35,000 in intergenic regions. To test the usefulness of Tn916 as a mutagen, a functional screen allowed the isolation of a mutant. This mutant contained Tn916 inserted into a gene involved in flagellar biosynthesis.
Funded by: Medical Research Council: G0601176
Applied and environmental microbiology 2012;78;7;2147-53
Genome sequencing and analysis of the Tasmanian devil and its transmissible cancer.
Wellcome Trust Sanger Institute, Hinxton, CB10 1SA, UK.
The Tasmanian devil (Sarcophilus harrisii), the largest marsupial carnivore, is endangered due to a transmissible facial cancer spread by direct transfer of living cancer cells through biting. Here we describe the sequencing, assembly, and annotation of the Tasmanian devil genome and whole-genome sequences for two geographically distant subclones of the cancer. Genomic analysis suggests that the cancer first arose from a female Tasmanian devil and that the clone has subsequently genetically diverged during its spread across Tasmania. The devil cancer genome contains more than 17,000 somatic base substitution mutations and bears the imprint of a distinct mutational process. Genotyping of somatic mutations in 104 geographically and temporally distributed Tasmanian devil tumors reveals the pattern of evolution and spread of this parasitic clonal lineage, with evidence of a selective sweep in one geographical area and persistence of parallel lineages in other populations.
Funded by: Wellcome Trust: 077012/Z/05/Z, WT088340, WT095908
An atypical facial appearance and growth pattern in a child with Cornelia de Lange Syndrome: an intragenic deletion predicting loss of the N-terminal region of NIPBL.
South-east Scotland Clinical Genetics Services, Western General Hospital, Edinburgh, UK.
Cornelia de Lange Syndrome (CdLS) is a multisystem disorder with a live birth prevalence of approximately one per 15 000. Clinical diagnosis is based on a characteristic facies – low frontal hair line, short nose, triangular nasal tip, crescent-shaped mouth, upturned nose, and arched eyebrows – characteristic limb defects and a distinctive pattern of growth and development. Approximately half of all classical cases of CdLS have heterozygous loss-of-function mutations in the gene encoding NIPBL, a component of the cohesin-loading apparatus (Dorsett and Krantz, 2009). Herein we describe a patient with a rare intragenic deletion of NIPBL who has typical microcephaly and developmental problems but an atypical growth pattern and facial features.
Funded by: Medical Research Council: MC_PC_U127561093, MC_U127561093; Wellcome Trust: WT077008
Clinical dysmorphology 2012;21;1;22-3
A GWAS sequence variant for platelet volume marks an alternative DNM3 promoter in megakaryocytes near a MEIS1 binding site.
Department of Haematology, University of Cambridge and National Health Service Blood and Transplant, Cambridge, United Kingdom.
We recently identified 68 genomic loci where common sequence variants are associated with platelet count and volume. Platelets are formed in the bone marrow by megakaryocytes, which are derived from hematopoietic stem cells by a process mainly controlled by transcription factors. The homeobox transcription factor MEIS1 is uniquely transcribed in megakaryocytes and not in the other lineage-committed blood cells. By ChIP-seq, we show that 5 of the 68 loci pinpoint a MEIS1 binding event within a group of 252 MK-overexpressed genes. In one such locus in DNM3, regulating platelet volume, the MEIS1 binding site falls within a region acting as an alternative promoter that is solely used in megakaryocytes, where allelic variation dictates different levels of a shorter transcript. The importance of dynamin activity to the latter stages of thrombopoiesis was confirmed by the observation that the inhibitor Dynasore reduced murine proplatelet formation in vitro.
Funded by: British Heart Foundation: BHF_RG/08/014/24067, BHF_RG/09/012/28096, RG/09/12/28096; Cancer Research UK: CRUK_12765, CRUK_14136; Department of Health: DH_RP-PG-0310-1002; Medical Research Council: MRC_G0401527, MRC_G0800784, MRC_G1000143; NHLBI NIH HHS: HL68130, R01 HL068130; National Centre for the Replacement, Refinement and Reduction of Animals in Research: NC3RS_G0900729/1; Wellcome Trust: WT-084183/2/07/2
The genomic landscape shaped by selection on transposable elements across 18 mouse strains.
MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, UK.
Background: Transposable element (TE)-derived sequence dominates the landscape of mammalian genomes and can modulate gene function by dysregulating transcription and translation. Our current knowledge of TEs in laboratory mouse strains is limited primarily to those present in the C57BL/6J reference genome, with most mouse TEs being drawn from three distinct classes, namely short interspersed nuclear elements (SINEs), long interspersed nuclear elements (LINEs) and the endogenous retrovirus (ERV) superfamily. Despite their high prevalence, the different genomic and gene properties controlling whether TEs are preferentially purged from, or are retained by, genetic drift or positive selection in mammalian genomes remain poorly defined.
Results: Using whole genome sequencing data from 13 classical laboratory and 4 wild-derived mouse inbred strains, we developed a comprehensive catalogue of 103,798 polymorphic TE variants. We employ this extensive data set to characterize TE variants across the Mus lineage, and to infer neutral and selective processes that have acted over 2 million years. Our results indicate that the majority of TE variants are introduced through the male germline and that only a minority of TE variants exert detectable changes in gene expression. However, among genes with differential expression across the strains there are twice as many TE variants identified as being putative causal variants as expected.
Conclusions: Most TE variants that cause gene expression changes appear to be purged rapidly by purifying selection. Our findings demonstrate that past TE insertions have often been highly deleterious, and help to prioritize TE variants according to their likely contribution to gene expression or phenotype variation.
Funded by: Cancer Research UK: 13031; Intramural NIH HHS; Medical Research Council: G0800024, MC_EX_G0802457, MC_U137761446; NINDS NIH HHS: R01 NS031348; Wellcome Trust: 079912, 090532
Genome biology 2012;13;6;R45
An integrated functional genomics approach identifies the regulatory network directed by brachyury (T) in chordoma.
Randall Division of Cell and Molecular Biophysics, New Hunt's House, King's College London, Guy's Campus, London, SE1 1UL, UK.
Chordoma is a rare malignant tumour of bone, the molecular marker of which is the expression of the transcription factor, brachyury. Having recently demonstrated that silencing brachyury induces growth arrest in a chordoma cell line, we now seek to identify its downstream target genes. Here we use an integrated functional genomics approach involving shRNA-mediated brachyury knockdown, gene expression microarray, ChIP-seq experiments, and bioinformatics analysis to achieve this goal. We confirm that the T-box binding motif of human brachyury is identical to that found in mouse, Xenopus, and zebrafish development, and that brachyury acts primarily as an activator of transcription. Using human chordoma samples for validation purposes, we show that brachyury binds 99 direct targets and indirectly influences the expression of 64 other genes, thereby acting as a master regulator of an elaborate oncogenic transcriptional network encompassing diverse signalling pathways including components of the cell cycle, and extracellular matrix components. Given the wide repertoire of its active binding and the relatively specific localization of brachyury to the tumour cells, we propose that an RNA interference-based gene therapy approach is a plausible therapeutic avenue worthy of investigation.
Funded by: Medical Research Council: G0700095, G0700213
The Journal of pathology 2012;228;3;274-85
The role of sphingosine-1-phosphate transporter Spns2 in immune system function.
Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, United Kingdom.
Sphingosine-1-phosphate (S1P) is a lipid messenger involved in the regulation of embryonic development, immune system functions, and many other physiological processes. However, the mechanisms of S1P transport across cellular membranes remain poorly understood, with several ATP-binding cassette family members and the spinster 2 (Spns2) member of the major facilitator superfamily known to mediate S1P transport in cell culture. Spns2 was also shown to control S1P activities in zebrafish in vivo and to play a critical role in zebrafish cardiovascular development. However, the in vivo roles of Spns2 in mammals and its involvement in the different S1P-dependent physiological processes have not been investigated. In this study, we characterized a Spns2-null mouse line carrying the Spns2(tm1a(KOMP)Wtsi) allele (Spns2(tm1a)). The Spns2(tm1a/tm1a) animals were viable, indicating a divergence in Spns2 function from its zebrafish ortholog. However, the immunological phenotype of the Spns2(tm1a/tm1a) mice closely mimicked the phenotypes of partial S1P deficiency and impaired S1P-dependent lymphocyte trafficking, with a depletion of lymphocytes in circulation, an increase in mature single-positive T cells in the thymus, and a selective reduction in mature B cells in the spleen and bone marrow. Spns2 activity in the nonhematopoietic cells was critical for normal lymphocyte development and localization. Overall, Spns2(tm1a/tm1a) resulted in impaired humoral immune responses to immunization. This study thus demonstrated a physiological role for Spns2 in mammalian immune system functions but not in cardiovascular development. Other components of the S1P signaling network are investigated as drug targets for immunosuppressive therapy, but the selective action of Spns2 may present an advantage in this regard.
Funded by: Canadian Institutes of Health Research; Cancer Research UK: 13031; Medical Research Council: G0300212, MC_QA137918; Wellcome Trust: 098051
Journal of immunology (Baltimore, Md. : 1950) 2012;189;1;102-11
The critical role of histone H2A-deubiquitinase Mysm1 in hematopoiesis and lymphocyte differentiation.
Wellcome Trust Genome Campus, The Wellcome Trust Sanger Institute, Cambridge, United Kingdom.
Stem cell differentiation and lineage specification depend on coordinated programs of gene expression, but our knowledge of the chromatin-modifying factors regulating these events remains incomplete. Ubiquitination of histone H2A (H2A-K119u) is a common chromatin modification associated with gene silencing, and controlled by the ubiquitin-ligase polycomb repressor complex 1 (PRC1) and H2A-deubiquitinating enzymes (H2A-DUBs). The roles of H2A-DUBs in mammalian development, stem cells, and hematopoiesis have not been addressed. Here we characterized an H2A-DUB targeted mouse line Mysm1(tm1a/tm1a) and demonstrated defects in BM hematopoiesis, resulting in lymphopenia, anemia, and thrombocytosis. Development of lymphocytes was impaired from the earliest stages of their differentiation, and there was also a depletion of erythroid cells and a defect in erythroid progenitor function. These phenotypes resulted from a cell-intrinsic requirement for Mysm1 in the BM. Importantly, Mysm1(tm1a/tm1a) HSCs were functionally impaired, and this was associated with elevated levels of reactive oxygen species, the γH2AX DNA damage marker, and p53 protein in the hematopoietic progenitors. Overall, these data establish a role for Mysm1 in the maintenance of BM stem cell function, in the control of oxidative stress and genetic stability in hematopoietic progenitors, and in the development of lymphoid and erythroid lineages.
Funded by: Canadian Institutes of Health Research; Wellcome Trust
Mutational processes molding the genomes of 21 breast cancers.
Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.
All cancers carry somatic mutations. The patterns of mutation in cancer genomes reflect the DNA damage and repair processes to which cancer cells and their precursors have been exposed. To explore these mechanisms further, we generated catalogs of somatic mutation from 21 breast cancers and applied mathematical methods to extract mutational signatures of the underlying processes. Multiple distinct single- and double-nucleotide substitution signatures were discernible. Cancers with BRCA1 or BRCA2 mutations exhibited a characteristic combination of substitution mutation signatures and a distinctive profile of deletions. Complex relationships between somatic mutation prevalence and transcription were detected. A remarkable phenomenon of localized hypermutation, termed "kataegis," was observed. Regions of kataegis differed between cancers but usually colocalized with somatic rearrangements. Base substitutions in these regions were almost exclusively of cytosine at TpC dinucleotides. The mechanisms underlying most of these mutational signatures are unknown. However, a role for the APOBEC family of cytidine deaminases is proposed.
Funded by: Department of Health; Medical Research Council: MC_U105178806; NCI NIH HHS: CA089393, P50 CA089393; Wellcome Trust: 088340, 098051, WT088340MA
The life history of 21 breast cancers.
Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.
Cancer evolves dynamically as clonal expansions supersede one another driven by shifting selective pressures, mutational processes, and disrupted cancer genes. These processes mark the genome, such that a cancer's life history is encrypted in the somatic mutations present. We developed algorithms to decipher this narrative and applied them to 21 breast cancers. Mutational processes evolve across a cancer's lifespan, with many emerging late but contributing extensive genetic variation. Subclonal diversification is prominent, and most mutations are found in just a fraction of tumor cells. Every tumor has a dominant subclonal lineage, representing more than 50% of tumor cells. Minimal expansion of these subclones occurs until many hundreds to thousands of mutations have accumulated, implying the existence of long-lived, quiescent cell lineages capable of substantial proliferation upon acquisition of enabling genomic changes. Expansion of the dominant subclone to an appreciable mass may therefore represent the final rate-limiting step in a breast cancer's development, triggering diagnosis.
Funded by: Department of Health; NCI NIH HHS: CA089393, P50 CA089393; Wellcome Trust: 088340, 093867, 098051
Transmission of malaria to mosquitoes blocked by bumped kinase inhibitors.
Division of Allergy and Infectious Diseases, Department of Medicine, University of Washington, Seattle, Washington 98195-6423, USA.
Effective control and eradication of malaria will require new tools to prevent transmission. Current antimalarial therapies targeting the asexual stage of Plasmodium do not prevent transmission of circulating gametocytes from infected humans to mosquitoes. Here, we describe a new class of transmission-blocking compounds, bumped kinase inhibitors (BKIs), which inhibit microgametocyte exflagellation. Oocyst formation and sporozoite production, necessary for transmission to mammals, were inhibited in mosquitoes fed on either BKI-1-treated human blood or mice treated with BKI-1. BKIs are hypothesized to act via inhibition of Plasmodium calcium-dependent protein kinase 4 and predicted to have little activity against mammalian kinases. Our data show that BKIs do not inhibit proliferation of mammalian cell lines and are well tolerated in mice. Used in combination with drugs active against asexual stages of Plasmodium, BKIs could prove an important tool for malaria control and eradication.
Funded by: Medical Research Council: G0501670; NIAID NIH HHS: R01 AI089441, R01AI080625, R01AI089441; NIGMS NIH HHS: R01 GM086858, R01GM086858; Wellcome Trust: WT089085/Z/09/Z
The Journal of clinical investigation 2012;122;6;2301-5
Intracontinental spread of human invasive Salmonella Typhimurium pathovariants in sub-Saharan Africa.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
A highly invasive form of non-typhoidal Salmonella (iNTS) disease has recently been documented in many countries in sub-Saharan Africa. The most common Salmonella enterica serovar causing this disease is Typhimurium (Salmonella Typhimurium). We applied whole-genome sequence-based phylogenetic methods to define the population structure of sub-Saharan African invasive Salmonella Typhimurium isolates and compared these to global Salmonella Typhimurium populations. Notably, the vast majority of sub-Saharan invasive Salmonella Typhimurium isolates fell within two closely related, highly clustered phylogenetic lineages that we estimate emerged independently ∼52 and ∼35 years ago in close temporal association with the current HIV pandemic. Clonal replacement of isolates from lineage I by those from lineage II was potentially influenced by the use of chloramphenicol for the treatment of iNTS disease. Our analysis suggests that iNTS disease is in part an epidemic in sub-Saharan Africa caused by highly related Salmonella Typhimurium lineages that may have occupied new niches associated with a compromised human population and antibiotic treatment.
Funded by: Wellcome Trust: WT098051
Nature genetics 2012;44;11;1215-21
Recessive HYDIN mutations cause primary ciliary dyskinesia without randomization of left-right body asymmetry.
Department of General Pediatrics, University Children's Hospital Muenster, 48149 Muenster, Germany.
Primary ciliary dyskinesia (PCD) is a genetically heterogeneous recessive disorder characterized by defective cilia and flagella motility. Chronic destructive-airway disease is caused by abnormal respiratory-tract mucociliary clearance. Abnormal propulsion of sperm flagella contributes to male infertility. Genetic defects in most individuals affected by PCD cause randomization of left-right body asymmetry; approximately half show situs inversus or situs ambiguous. Almost 70 years after the hy3 mouse possessing Hydin mutations was described as a recessive hydrocephalus model, we report HYDIN mutations in PCD-affected persons without hydrocephalus. By homozygosity mapping, we identified a PCD-associated locus, chromosomal region 16q21-q23, which contains HYDIN. However, a nearly identical 360 kb paralogous segment (HYDIN2) in chromosomal region 1q21.1 complicated mutational analysis. In three affected German siblings linked to HYDIN, we identified homozygous c.3985G>T mutations that affect an evolutionarily conserved splice acceptor site and that subsequently cause aberrantly spliced transcripts predicting premature protein termination in respiratory cells. Parallel whole-exome sequencing identified a homozygous nonsense HYDIN mutation, c.922A>T (p.Lys307*), in six individuals from three Faroe Island PCD-affected families that all carried an 8.8 Mb shared haplotype across HYDIN, indicating an ancestral founder mutation in this isolated population. We demonstrate by electron microscopy tomography that, consistent with the effects of loss-of-function mutations, HYDIN mutant respiratory cilia lack the C2b projection of the central pair (CP) apparatus; similar findings were reported in Hydin-deficient Chlamydomonas and mice. High-speed videomicroscopy demonstrated markedly reduced beating amplitudes of respiratory cilia and stiff sperm flagella. Like the hy3 mouse model, all nine PCD-affected persons had normal body composition because nodal cilia function is apparently not dependent on the function of the CP apparatus.
Funded by: Wellcome Trust: 090532, 091310, 091551
American journal of human genetics 2012;91;4;672-84
Cestode genomics - progress and prospects for advancing basic and applied aspects of flatworm biology.
Department of Zoology, The Natural History Museum, London, UK.
Characterization of the first tapeworm genome, Echinococcus multilocularis, is now nearly complete, and genome assemblies of E. granulosus, Taenia solium and Hymenolepis microstoma are in advanced draft versions. These initiatives herald the beginning of a genomic era in cestodology and underpin a diverse set of research agendas targeting both basic and applied aspects of tapeworm biology. We discuss the progress in the genomics of these species, provide insights into the presence and composition of immunologically relevant gene families, including the antigen B- and EG95/45W families, and discuss chemogenomic approaches toward the development of novel chemotherapeutics against cestode diseases. In addition, we discuss the evolution of tapeworm parasites and introduce the research programmes linked to genome initiatives that are aimed at understanding signalling systems involved in basic host-parasite interactions and morphogenesis.
Funded by: Biotechnology and Biological Sciences Research Council: BBG0038151
Parasite immunology 2012;34;2-3;130-50
Exome sequencing of liver fluke-associated cholangiocarcinoma.
National Cancer Centre Singapore-Van Andel Research Institute Translational Research Laboratory, Division of Medical Sciences, Singapore.
Opisthorchis viverrini-related cholangiocarcinoma (CCA), a fatal bile duct cancer, is a major public health concern in areas endemic for this parasite. We report here whole-exome sequencing of eight O. viverrini-related tumors and matched normal tissue. We identified and validated 206 somatic mutations in 187 genes using Sanger sequencing and selected 15 genes for mutation prevalence screening in an additional 46 individuals with CCA (cases). In addition to the known cancer-related genes TP53 (mutated in 44.4% of cases), KRAS (16.7%) and SMAD4 (16.7%), we identified somatic mutations in 10 newly implicated genes in 3.7-14.8% of cases. These included inactivating mutations in MLL3 (in 14.8% of cases), ROBO2 (9.3%), RNF43 (9.3%) and PEG3 (5.6%), and activating mutations in the GNAS oncogene (9.3%). These genes have functions that can be broadly grouped into three biological classes: (i) deactivation of histone modifiers, (ii) activation of G protein signaling and (iii) loss of genome stability. This study provides insight into the mutational landscape contributing to O. viverrini-related CCA.
Nature genetics 2012;44;6;690-3
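As a quick arithmetic check on the prevalences quoted in the abstract above, the sketch below assumes (the abstract does not state this explicitly) that the denominator is the 54 cases formed by the 8 exome-sequenced tumors plus the 46 additionally screened cases. Under that assumption, each quoted percentage corresponds to a whole number of cases.

```python
# Assumption (not stated in the abstract): the prevalence denominator is
# the 8 exome-sequenced tumors plus the 46 additionally screened cases.
TOTAL_CASES = 8 + 46

# Percentages as quoted in the abstract.
reported = {
    "TP53": 44.4, "KRAS": 16.7, "SMAD4": 16.7, "MLL3": 14.8,
    "ROBO2": 9.3, "RNF43": 9.3, "GNAS": 9.3, "PEG3": 5.6,
}


def implied_count(percent, total=TOTAL_CASES):
    """Return the whole number of cases closest to the given percentage."""
    return round(percent / 100 * total)


implied = {gene: implied_count(p) for gene, p in reported.items()}

for gene, n in implied.items():
    # Convert the whole-number count back to a percentage for comparison.
    back = round(100 * n / TOTAL_CASES, 1)
    print(f"{gene}: {n}/{TOTAL_CASES} = {back}% (reported {reported[gene]}%)")
```

Every back-calculated percentage matches the quoted value to one decimal place, which is consistent with, but does not prove, a denominator of 54.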
Pluripotency and its layers of complexity.
Wellcome Trust Sanger Institute, Hinxton, CB10 1SA, UK; Agency for Science, Technology and Research, 1 Fusionopolis Way, #20-10, Connexis North Tower, Kragujevac, 138632 Singapore.
Pluripotency is characterized by a self-renewing state that can competently differentiate to form the three germ layers. Different stages of early murine development can be captured on a petri dish, delineating a spectrum of pluripotent states, ranging from embryonic stem cells, embryonic germ cells to epiblast stem cells. Anomalous cell populations displaying signs of pluripotency have also been uncovered, from the isolation of embryonic carcinoma cells to the derivation of induced pluripotent stem cells. Gaining insight into the molecular circuitry within these cell types enlightens us about the significance and contribution of each stage, hence deepening our understanding of vertebrate development. In this review, we aim to describe experimental milestones that led to the understanding of embryonic development and the conception of pluripotency. We also discuss attempts at exploring the realm of pluripotency with the identification of pluripotent stem cells within mouse teratocarcinomas and embryos, and the generation of pluripotent cells through nuclear reprogramming. In conclusion, we illustrate pluripotent cells derived from other organisms, including human derivatives, and describe current paradigms in the comprehension of human pluripotency.
Cell regeneration (London, England) 2012;1;1;7
Optimizing Illumina next-generation sequencing library preparation for extremely AT-biased genomes.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.
Background: Massively parallel sequencing technology is revolutionizing approaches to genomic and genetic research. Since its advent, the scale and efficiency of Next-Generation Sequencing (NGS) has rapidly improved. In spite of this success, sequencing genomes or genomic regions with extremely biased base composition is still a great challenge to the currently available NGS platforms. The genomes of some important pathogenic organisms like Plasmodium falciparum (high AT content) and Mycobacterium tuberculosis (high GC content) display extremes of base composition. The standard library preparation procedures that employ PCR amplification have been shown to cause uneven read coverage particularly across AT and GC rich regions, leading to problems in genome assembly and variation analyses. Alternative library-preparation approaches that omit PCR amplification require large quantities of starting material and hence are not suitable for small amounts of DNA/RNA such as those from clinical isolates. We have developed and optimized library-preparation procedures suitable for low quantity starting material and tolerant to extremely high AT content sequences.
Results: We have used our optimized conditions in parallel with standard methods to prepare Illumina sequencing libraries from a non-clinical and a clinical isolate (containing ~53% host contamination). By analyzing and comparing the quality of sequence data generated, we show that our optimized conditions, which involve a PCR additive (TMAC), produce amplified libraries with improved coverage of extremely AT-rich regions and reduced bias toward GC neutral templates.
Conclusion: We have developed a robust and optimized Next-Generation Sequencing library amplification method suitable for extremely AT-rich genomes. The new amplification conditions significantly reduce bias and retain the complexity of either extreme of base composition. This development will greatly benefit sequencing clinical samples that often require amplification due to low mass of DNA starting material.
Funded by: Medical Research Council: G0600718, G19/9; Wellcome Trust: 079355/Z/06/Z, 090532
BMC genomics 2012;13;1
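The base-composition bias discussed in the abstract above is usually quantified as the AT (or GC) fraction of a sequence or of windows along it. The following is a minimal sketch of that calculation; the function names and the toy sequence are hypothetical illustrations and are not taken from the paper.

```python
def at_fraction(sequence):
    """Fraction of A/T bases among the unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    at = sum(seq.count(base) for base in "AT")
    gc = sum(seq.count(base) for base in "GC")
    total = at + gc
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return at / total


def windowed_at(sequence, window):
    """AT fraction of consecutive, non-overlapping windows."""
    return [
        at_fraction(sequence[i:i + window])
        for i in range(0, len(sequence) - window + 1, window)
    ]


# Hypothetical toy sequence, not real Plasmodium falciparum data.
toy = "ATATATTTAAGCATTTTAAA"
print(at_fraction(toy))
print(windowed_at(toy, 10))
```

Windowed values of this kind can be compared against per-window read depth to see whether coverage drops in the most AT-rich regions.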
The deubiquitinase USP9X suppresses pancreatic ductal adenocarcinoma.
Li Ka Shing Centre, Cambridge Research Institute, Cancer Research UK, Cambridge CB2 0RE, UK.
Pancreatic ductal adenocarcinoma (PDA) remains a lethal malignancy despite much progress concerning its molecular characterization. PDA tumours harbour four signature somatic mutations in addition to numerous lower frequency genetic events of uncertain significance. Here we use Sleeping Beauty (SB) transposon-mediated insertional mutagenesis in a mouse model of pancreatic ductal preneoplasia to identify genes that cooperate with oncogenic Kras(G12D) to accelerate tumorigenesis and promote progression. Our screen revealed new candidate genes for PDA and confirmed the importance of many genes and pathways previously implicated in human PDA. The most commonly mutated gene was the X-linked deubiquitinase Usp9x, which was inactivated in over 50% of the tumours. Although previous work had attributed a pro-survival role to USP9X in human neoplasia, we found instead that loss of Usp9x enhances transformation and protects pancreatic cancer cells from anoikis. Clinically, low USP9X protein and messenger RNA expression in PDA correlates with poor survival after surgery, and USP9X levels are inversely associated with metastatic burden in advanced disease. Furthermore, chromatin modulation with trichostatin A or 5-aza-2'-deoxycytidine elevates USP9X expression in human PDA cell lines, indicating a clinical approach for certain patients. The conditional deletion of Usp9x cooperated with Kras(G12D) to accelerate pancreatic tumorigenesis in mice, validating their genetic interaction. We propose that USP9X is a major tumour suppressor gene with prognostic and therapeutic relevance in PDA.
Funded by: Cancer Research UK: CRUK_13031; NCI NIH HHS: 2P50CA101955, CA106610, CA122183, CA128920, CA62924, K01 CA122183, K01 CA122183-05, K08 CA106610, P50 CA062924, P50 CA101955, P50CA62924, R01 CA128920; Wellcome Trust
High altitude adaptation in Daghestani populations from the Caucasus.
The Wellcome Trust Sanger Institute, Hinxton, UK.
We have surveyed 15 high-altitude adaptation candidate genes for signals of positive selection in North Caucasian highlanders using targeted re-sequencing. A total of 49 unrelated Daghestani from three ethnic groups (Avars, Kubachians, and Laks) living in ancient villages located at around 2,000 m above sea level were chosen as the study population. Caucasian (Adygei living at sea level, N = 20) and CEU (CEPH Utah residents with ancestry from northern and western Europe; N = 20) were used as controls. Candidate genes were compared with 20 putatively neutral control regions resequenced in the same individuals. The regions of interest were amplified by long-PCR, pooled according to individual, indexed by adding an eight-nucleotide tag, and sequenced using the Illumina GAII platform. 1,066 SNPs were called using false discovery and false negative thresholds of ~6%. The neutral regions provided an empirical null distribution to compare with the candidate genes for signals of selection. Two genes stood out. In Laks, a non-synonymous variant within HIF1A already known to be associated with improvement in oxygen metabolism was rediscovered, and in Kubachians a cluster of 13 SNPs located in a conserved intronic region within EGLN1 showing high population differentiation was found. These variants illustrate both the common pathways of adaptation to high altitude in different populations and features specific to the Daghestani populations, showing how even a mildly hypoxic environment can lead to genetic adaptation.
Funded by: Wellcome Trust
Human genetics 2012;131;3;423-33
Ethiopian genetic diversity reveals linguistic stratification and complex influences on the Ethiopian gene pool.
Division of Biological Anthropology, University of Cambridge, UK.
Humans and their ancestors have traversed the Ethiopian landscape for millions of years, and present-day Ethiopians show great cultural, linguistic, and historical diversity, which makes them essential for understanding African variability and human origins. We genotyped 235 individuals from ten Ethiopian and two neighboring (South Sudanese and Somali) populations on an Illumina Omni 1M chip. Genotypes were compared with published data from several African and non-African populations. Principal-component and STRUCTURE-like analyses confirmed substantial genetic diversity both within and between populations, and revealed a match between genetic data and linguistic affiliation. Using comparisons with African and non-African reference samples in 40-SNP genomic windows, we identified "African" and "non-African" haplotypic components for each Ethiopian individual. The non-African component, which includes the SLC24A5 allele associated with light skin pigmentation in Europeans, may represent gene flow into Africa, which we estimate to have occurred ~3 thousand years ago (kya). The non-African component was found to be more similar to populations inhabiting the Levant rather than the Arabian Peninsula, but the principal route for the expansion out of Africa ~60 kya remains unresolved. Linkage-disequilibrium decay with genomic distance was less rapid in both the whole genome and the African component than in southern African samples, suggesting a less ancient history for Ethiopian populations.
Funded by: Wellcome Trust: 098051
American journal of human genetics 2012;91;1;83-96
Assignment of protein interactions from affinity purification/mass spectrometry data.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA Cambridgeshire, United Kingdom.
The combination of affinity purification with mass spectrometry analysis has become the method of choice for protein complex characterization. With the improved performance of mass spectrometry technology, the sensitivity of the analyses is increasing, probing deeper into molecular interactions and yielding longer lists of proteins. These identify not only core complex subunits but also the more inaccessible proteins that interact weakly or transiently. Alongside them, contaminant proteins, which are often abundant proteins in the cell, tend to be recovered in affinity experiments because they bind nonspecifically and with low affinity to matrix, tag, and/or antibody. The challenge now lies in discriminating nonspecific binders from true interactors, particularly at low levels and on a larger scale. This review aims to summarize the variety of methods that have been used to distinguish contaminants from specific interactions in the past few years, ranging from manual elimination using heuristic rules to more sophisticated probabilistic scoring approaches. We aim to raise awareness of the processing that takes place before an interaction list is reported and of the different types of list curation approaches suited to the different experiments.
Funded by: Wellcome Trust: 079643/Z/06/Z
Journal of proteome research 2012;11;3;1462-74
The GENCODE pseudogene resource.
Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA.
Background: Pseudogenes have long been considered as nonfunctional genomic sequences. However, recent evidence suggests that many of them might have some form of biological activity, and the possibility of functionality has increased interest in their accurate annotation and integration with functional genomics data.
Results: As part of the GENCODE annotation of the human genome, we present the first genome-wide pseudogene assignment for protein-coding genes, based on both large-scale manual annotation and in silico pipelines. A key aspect of this coupled approach is that it allows us to identify pseudogenes in an unbiased fashion as well as untangle complex events through manual evaluation. We integrate the pseudogene annotations with the extensive ENCODE functional genomics information. In particular, we determine the expression level, transcription-factor and RNA polymerase II binding, and chromatin marks associated with each pseudogene. Based on their distribution, we develop simple statistical models for each type of activity, which we validate with large-scale RT-PCR-Seq experiments. Finally, we compare our pseudogenes with conservation and variation data from primate alignments and the 1000 Genomes project, producing lists of pseudogenes potentially under selection.
Conclusions: At one extreme, some pseudogenes possess conventional characteristics of functionality; these may represent genes that have recently died. On the other hand, we find interesting patterns of partial activity, which may suggest that dead genes are being resurrected as functioning non-coding RNAs. The activity data of each pseudogene are stored in an associated resource, psiDR, which will be useful for the initial identification of potentially functional pseudogenes.
Funded by: NHGRI NIH HHS: U54 HG004555; Wellcome Trust: 77198/Z/05/Z
Genome biology 2012;13;9;R51
Impact of restricted marital practices on genetic variation in an endogamous Gujarati group.
Institute for Genetic Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA.
Recent studies have examined the influence of a variety of cultural practices on patterns of human genetic variation. In India, centuries-old marriage customs have introduced extensive social structuring into the contemporary population, potentially with significant consequences for genetic variation. Social stratification in India is evident as social classes that are defined by endogamous groups known as castes. Within a caste, there exist endogamous groups known as gols (marriage circles), each of which comprises a small number of exogamous gotra (lineages). Thus, while consanguinity is strictly avoided and some randomness in mate selection occurs within the gol, gene flow is limited with groups outside the gol. Gujarati Patels practice this form of "exogamic endogamy." We have analyzed genetic variation in one such group of Gujarati Patels, the Chha Gaam Patels (CGP), who comprise individuals from six villages. Population structure analysis of 1,200 autosomal loci offers support for the existence of distinctive multilocus genotypes in the CGP with respect to both non-Gujaratis and other Gujaratis, and indicates that CGP individuals are genetically very similar. Analysis of Y-chromosomal and mitochondrial haplotypes provides support for both patrilocal and patrilineal practices within the gol, and a low level of female gene flow into the gol. Our study illustrates how the practice of gol endogamy has introduced fine-scale genetic structure into the population of India, and contributes more generally to an understanding of the way in which marriage practices affect patterns of genetic variation.
Funded by: NCI NIH HHS: CA62528-01; NCRR NIH HHS: RR10600-01, RR14514-01; NICHD NIH HHS: P30 HD024064; NIGMS NIH HHS: GM081441, R01 GM081441; Wellcome Trust
American journal of physical anthropology 2012;149;1;92-103
Evolutionary genetics of the human Rh blood group system.
Department of Anthropology, Pennsylvania State University, University Park, PA 16801, USA.
The evolutionary history of variation in the human Rh blood group system, determined by variants in the RHD and RHCE genes, has long been an unresolved puzzle in human genetics. Prior to medical treatments and interventions developed in the last century, the D-positive (RhD positive) children of D-negative (RhD negative) women were at risk for hemolytic disease of the newborn, if the mother produced anti-D antibodies following sensitization to the blood of a previous D-positive child. Given the deleterious fitness consequences of this disease, the appreciable frequencies in European populations of the responsible RHD gene deletion variant (for example, 0.43 in our study) seem surprising. In this study, we used new molecular and genomic data generated from four HapMap population samples to test the idea that positive selection for an as-of-yet unknown fitness benefit of the RHD deletion may have offset the otherwise negative fitness effects of hemolytic disease of the newborn. We found no evidence that positive natural selection affected the frequency of the RHD deletion. Thus, the initial rise to intermediate frequency of the RHD deletion in European populations may simply be explained by genetic drift/founder effect, or by an older or more complex sweep that we are insufficiently powered to detect. However, our simulations recapitulate previous findings that selection on the RHD deletion is frequency dependent and weak or absent near 0.5. Therefore, once such a frequency was achieved, it could have been maintained by a relatively small amount of genetic drift. We unexpectedly observed evidence for positive selection on the C allele of RHCE in non-African populations (on chromosomes with intact copies of the RHD gene) in the form of an unusually high F(ST) value and the high frequency of a single haplotype carrying the C allele. RhCE function is not well understood, but the C/c antigenic variant is clinically relevant and can result in hemolytic disease of the newborn, albeit much less commonly and severely than that related to the D-negative blood type. Therefore, the potential fitness benefits of the RHCE C allele are currently unknown but merit further exploration.
Funded by: Medical Research Council: G0801123, GO801123; NHGRI NIH HHS: P41-HG004221; NICHD NIH HHS: R01 HD021244, R01-HD21244; Wellcome Trust: 098051, WT098051
Human genetics 2012;131;7;1205-16
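The abstract above cites an unusually high F(ST) value as evidence of population differentiation. As a minimal sketch of what that statistic measures, the code below computes Wright's F(ST) for one biallelic locus in two equally weighted populations as (H_T - H_S) / H_T, where H_S is the mean within-population expected heterozygosity and H_T is the expected heterozygosity of the pooled population. The allele frequencies are hypothetical, and the paper's own estimator may differ.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) at a biallelic locus."""
    return 2 * p * (1 - p)


def fst_two_populations(p1, p2):
    """Wright's F(ST) for two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    h_s = (expected_heterozygosity(p1) + expected_heterozygosity(p2)) / 2
    h_t = expected_heterozygosity((p1 + p2) / 2)
    if h_t == 0:
        # The allele is fixed or absent everywhere: no variation to partition.
        return 0.0
    return (h_t - h_s) / h_t


# Hypothetical allele frequencies, chosen only for illustration.
print(fst_two_populations(0.1, 0.7))  # strongly differentiated locus
print(fst_two_populations(0.4, 0.4))  # identical frequencies give 0.0
```

Values near 0 indicate similar allele frequencies across populations, while larger values indicate stronger differentiation.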
Clinically significant copy number alterations and complex rearrangements of MYB and NFIB in head and neck adenoid cystic carcinoma.
Sahlgrenska Cancer Center, Department of Pathology, Sahlgrenska Academy at University of Gothenburg, Gothenburg, Sweden.
Adenoid cystic carcinoma (ACC) of the head and neck is a malignant tumor with poor long-term prognosis. Besides the recently identified MYB-NFIB fusion oncogene generated by a t(6;9) translocation, little is known about other genetic alterations in ACC. Using high-resolution, array-based comparative genomic hybridization, and massively paired-end sequencing, we explored genomic alterations in 40 frozen ACCs. Eighty-six percent of the tumors expressed MYB-NFIB fusion transcripts and 97% overexpressed MYB mRNA, indicating that MYB activation is a hallmark of ACC. Thirty-five recurrent copy number alterations (CNAs) were detected, including losses involving 12q, 6q, 9p, 11q, 14q, 1p, and 5q and gains involving 1q, 9p, and 22q. Grade III tumors had on average a significantly higher number of CNAs/tumor compared to Grade I and II tumors (P = 0.007). Losses of 1p, 6q, and 15q were associated with high-grade tumors, whereas losses of 14q were exclusively seen in Grade I tumors. The t(6;9) rearrangements were associated with a complex pattern of breakpoints, deletions, insertions, inversions, and for 9p also gains. Analyses of fusion-negative ACCs using high-resolution arrays and massively paired-end sequencing revealed that MYB may also be deregulated by other mechanisms in addition to gene fusion. Our studies also identified several down-regulated candidate tumor suppressor genes (CTNNBIP1, CASP9, PRDM2, and SFN) in 1p36.33-p35.3 that may be of clinical significance in high-grade tumors. Further studies of these and other potential target genes may lead to the identification of novel driver genes in ACC.
Funded by: Wellcome Trust: 077012/Z/05/Z
Genes, chromosomes & cancer 2012;51;8;805-17
Automatic segmentation and tracking of thrombus formation within in vitro microscopic video sequences
Computer Aided Medical Procedures, Technische Universitaet Muenchen, Munich, Germany; Department of Haematology, University of Cambridge, Cambridge, United Kingdom; National Health Service Blood and Transplant, Cambridge, United Kingdom; Wellcome Trust Sanger Institute, Hinxton, United Kingdom
Proceedings - International Symposium on Biomedical Imaging 2012;1635-7
A common single-nucleotide variant in T is strongly associated with chordoma.
Cancer Institute, University College London, UK.
Chordoma is a rare malignant bone tumor that expresses the transcription factor T. We conducted an association study of 40 individuals with chordoma and 358 ancestry-matched controls, with replication in an independent cohort. Whole-exome and Sanger sequencing of T exons showed strong association of the common nonsynonymous SNP rs2305089 with chordoma risk (allelic odds ratio (OR) = 6.1, 95% confidence interval (CI) = 3.1-12.1; P = 4.4 × 10^-9), a finding that is exceptional in cancers with a non-Mendelian mode of inheritance.
Funded by: Medical Research Council: MC_U123160651; Wellcome Trust
Nature genetics 2012;44;11;1185-7
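The odds ratio and confidence interval quoted in the abstract above can be checked for internal consistency. The sketch below assumes a Wald-type 95% interval that is symmetric on the log scale, which the abstract does not confirm. Under that assumption, the point estimate should equal the geometric mean of the interval bounds, and the standard error of log(OR) can be recovered from the interval width.

```python
import math

# Values quoted in the abstract.
odds_ratio = 6.1
ci_low, ci_high = 3.1, 12.1

# Assumption: a Wald-type 95% interval, symmetric on the log scale.
Z_95 = 1.96

# For such an interval, the point estimate is the geometric mean of the bounds.
geometric_mean = math.sqrt(ci_low * ci_high)

# Implied standard error of log(OR), recovered from the interval width.
se_log_or = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)

print(f"geometric mean of CI bounds: {geometric_mean:.2f}")
print(f"implied SE of log(OR): {se_log_or:.3f}")
```

The geometric mean of the bounds agrees with the reported odds ratio to within rounding, so the quoted interval is consistent with a log-scale construction.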
A tetracycline-repressible transactivator system to study essential genes in malaria parasites.
Department of Microbiology and Molecular Medicine, CMU, University of Geneva, 1 Rue Michel-Servet, 1211 Geneva 4, Switzerland.
A major obstacle in analyzing gene function in apicomplexan parasites is the absence of a practical regulatable expression system. Here, we identified functional transcriptional activation domains within Apicomplexan AP2 (ApiAP2) family transcription factors. These ApiAP2 transactivation domains were validated in blood-, liver-, and mosquito-stage parasites and used to create a robust conditional expression system for stage-specific, tetracycline-dependent gene regulation in Toxoplasma gondii, Plasmodium berghei, and Plasmodium falciparum. To demonstrate the utility of this system, we created conditional knockdowns of two essential P. berghei genes: profilin (PRF), a protein implicated in parasite invasion, and N-myristoyltransferase (NMT), which catalyzes protein acylation. Tetracycline-induced repression of PRF and NMT expression resulted in a dramatic reduction in parasite viability. This efficient regulatable system will allow for the functional characterization of essential proteins that are found in these important parasites.
Funded by: Howard Hughes Medical Institute; Medical Research Council: G0501670; NIAID NIH HHS: R01 AI076276; NIGMS NIH HHS: P50 GM071508, P50GM071508; Wellcome Trust: WT077502MA, WT098051
Cell host & microbe 2012;12;6;824-34
Shared loci for migraine and epilepsy on chromosomes 14q12-q23 and 12q24.2-q24.3.
Folkhälsan Institute of Genetics, Folkhälsan Research Center, Helsinki, Finland.
Objectives: To describe clinical characteristics and to identify susceptibility loci for epilepsy and migraine in a Finnish family with a complex phenotype.
Methods: Participating family members were interviewed and medical files were reviewed. The seizure classification was made according to International League Against Epilepsy criteria. Migraine diagnosis was made using the validated Finnish Migraine Specific Questionnaire for Family Studies and criteria according to the current International Classification of Headache Disorders-II. DNA samples were obtained from 56 family members and nonparametric genome-wide linkage analyses were performed using 382 polymorphic microsatellite markers. The most promising loci were fine-mapped with additional microsatellite markers.
Results: Clinical data were obtained from 60 family members of whom 12 (20%) had idiopathic epileptic seizures. Eight of those 12 (67%) also had migraine. Altogether 33 of the 60 family members (55%) had migraine. Significant evidence of linkage was found between a locus on 14q12-q23 and migraine (p = 0.0001). Suggestive evidence of linkage in this region was also found for epilepsy with generalized tonic-clonic seizures (p = 0.0034). In addition, significant evidence of linkage was found at a locus on 12q24.2-q24.3 (p < 0.001) for migraine alone and for the combined phenotype of migraine and epilepsy.
Conclusions: Our data suggest the occurrence of common susceptibility loci for epilepsy and migraine on chromosomes 14q12-q23 and 12q24.2-q24.3, implicating a shared genetic etiology for these 2 diseases.
Funded by: NHGRI NIH HHS: R01 HG006139; NIGMS NIH HHS: R01 GM053275
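The proportions reported in the Results of the entry above follow directly from the stated counts. The short sketch below reproduces them. It also derives the migraine frequency among family members without epileptic seizures, a figure the abstract does not report and which is computed here only for comparison.

```python
# Counts quoted in the abstract.
family_members = 60          # members with clinical data
epilepsy = 12                # idiopathic epileptic seizures
epilepsy_and_migraine = 8    # members with both conditions
migraine = 33                # all members with migraine


def percent(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


epilepsy_pct = percent(epilepsy, family_members)
migraine_given_epilepsy_pct = percent(epilepsy_and_migraine, epilepsy)
migraine_pct = percent(migraine, family_members)

# Derived here, not reported in the abstract: migraine among members
# without epileptic seizures.
migraine_without_epilepsy_pct = percent(
    migraine - epilepsy_and_migraine, family_members - epilepsy
)

print(epilepsy_pct, migraine_given_epilepsy_pct, migraine_pct)
print(migraine_without_epilepsy_pct)
```

The first three values match the 20%, 67% and 55% quoted in the Results.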
The transcription factor T-bet regulates intestinal inflammation mediated by interleukin-7 receptor+ innate lymphoid cells.
Department of Experimental Immunobiology, Division of Transplantation Immunology and Mucosal Biology, King's College London, UK.
Mice lacking the transcription factor T-bet in the innate immune system develop microbiota-dependent colitis. Here, we show that interleukin-17A (IL-17A)-producing IL-7Rα⁺ innate lymphoid cells (ILCs) were potent promoters of disease in Tbx21⁻/⁻Rag2⁻/⁻ ulcerative colitis (TRUC) mice. TNF-α produced by CD103⁻CD11b⁺ dendritic cells synergized with IL-23 to drive IL-17A production by ILCs, demonstrating a previously unrecognized layer of cellular crosstalk between dendritic cells and ILCs. We have identified Helicobacter typhlonius as a key disease trigger driving excess TNF-α production and promoting colitis in TRUC mice. Crucially, T-bet also suppressed the expression of IL-7R, a key molecule involved in controlling intestinal ILC homeostasis. The importance of IL-7R signaling in TRUC disease was highlighted by the dramatic reduction in intestinal ILCs and attenuated colitis following IL-7R blockade. Taken together, these data demonstrate the mechanism by which T-bet regulates the complex interplay between mucosal dendritic cells, ILCs, and the intestinal microbiota.
Funded by: Department of Health: DRF-2009-02-22; Medical Research Council: G0600081, G0802068, MR/J006742/1, MR/J011118/1, MR/K002996/1; Wellcome Trust: WT076964, WT088747MA
A systematically improved high quality genome and transcriptome of the human blood fluke Schistosoma mansoni.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.
Schistosomiasis is one of the most prevalent parasitic diseases, affecting millions of people in developing countries. Amongst the human-infective species, Schistosoma mansoni is also the most commonly used in the laboratory and here we present the systematic improvement of its draft genome. We used Sanger capillary and deep-coverage Illumina sequencing from clonal worms to upgrade the highly fragmented draft 380 Mb genome to one with only 885 scaffolds and more than 81% of the bases organised into chromosomes. We have also used transcriptome sequencing (RNA-seq) from four time points in the parasite's life cycle to refine gene predictions and profile their expression. More than 45% of predicted genes have been extensively modified and the total number has been reduced from 11,807 to 10,852. Using the new version of the genome, we identified trans-splicing events occurring in at least 11% of genes and identified clear cases where it is used to resolve polycistronic transcripts. We have produced a high-resolution map of temporal changes in expression for 9,535 genes, covering an unprecedented dynamic range for this organism. All of these data have been consolidated into a searchable format within the GeneDB (www.genedb.org) and SchistoDB (www.schistodb.net) databases. With further transcriptional profiling and genome sequencing increasingly accessible, the upgraded genome will form a fundamental dataset to underpin further advances in schistosome research.
Funded by: FIC NIH HHS: D43 TW007012, TW007012; NIAID NIH HHS: HHSN272201000009I; PHS HHS: HHSN272201000009I; Wellcome Trust: 085775/Z/08/Z
PLoS neglected tropical diseases 2012;6;1;e1455
Evaluation and optimisation of preparative semi-automated electrophoresis systems for Illumina library preparation.
Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK. firstname.lastname@example.org
Size selection can be a critical step in preparation of next-generation sequencing libraries. Traditional methods employing gel electrophoresis lack reproducibility, are labour intensive, do not scale well and employ hazardous intercalating dyes. In a high-throughput setting, solid-phase reversible immobilisation beads are commonly used for size selection, but result in quite a broad fragment size range. We have evaluated and optimised the use of two semi-automated preparative DNA electrophoresis systems, the Caliper Labchip XT and the Sage Science Pippin Prep, for size selection of Illumina sequencing libraries.
Funded by: Wellcome Trust: 098051
A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers.
Wellcome Trust Sanger Institute, Hinxton, UK. email@example.com
Background: Next generation sequencing (NGS) technology has revolutionized genomic and genetic research. The pace of change in this area is rapid with three major new sequencing platforms having been released in 2011: Ion Torrent's PGM, Pacific Biosciences' RS and the Illumina MiSeq. Here we compare the results obtained with those platforms to the performance of the Illumina HiSeq, the current market leader. In order to compare these platforms, and get sufficient coverage depth to allow meaningful analysis, we have sequenced a set of 4 microbial genomes with mean GC content ranging from 19.3 to 67.7%. Together, these represent a comprehensive range of genome content. Here we report our analysis of that sequence data in terms of coverage distribution, bias, GC distribution, variant detection and accuracy.
Results: Sequence generated by Ion Torrent, MiSeq and Pacific Biosciences technologies displays near perfect coverage behaviour on GC-rich, neutral and moderately AT-rich genomes, but a profound bias was observed upon sequencing the extremely AT-rich genome of Plasmodium falciparum on the PGM, resulting in no coverage for approximately 30% of the genome. We analysed the ability to call variants from each platform and found that we could call slightly more variants from Ion Torrent data compared to MiSeq data, but at the expense of a higher false positive rate. Variant calling from Pacific Biosciences data was possible but higher coverage depth was required. Context specific errors were observed in both PGM and MiSeq data, but not in that from the Pacific Biosciences platform.
Conclusions: All three fast turnaround sequencers evaluated here were able to generate usable sequence. However, there are key differences between the quality of that data and the applications it will support.
Funded by: Wellcome Trust: 098051
BMC genomics 2012;13;341
Changes in HDL cholesterol and cardiovascular outcomes after lipid modification therapy.
Division of Clinical Sciences, St George's University of London, Cranmer Terrace, London, UK. firstname.lastname@example.org
Background: Lipid modification therapy (LMT) produces cardiovascular benefits principally through reductions in low density lipoprotein cholesterol. While recent evidence, using data from 454 participants in the Framingham Offspring Study, has suggested that increases in high density lipoprotein cholesterol (HDL-C) are also associated with a reduction in cardiovascular outcomes, independently of changes in low density lipoprotein cholesterol, replication of this finding is important. The authors therefore present further results using data from the EPIC-Norfolk (UK) and Rotterdam (The Netherlands) prospective cohort studies.
Methods: A total of 1148 participants, 446 from the EPIC-Norfolk and 702 from the Rotterdam study, were assessed for lipids before and after starting LMT. Subsequent risk of cardiovascular events, ascertained through linkage with mortality records and hospital databases, was investigated using Cox proportional hazards regression. Random effects meta-analysis was used to combine results across studies.
Results: Based on combined data from the EPIC-Norfolk and Rotterdam studies there was some evidence that change in HDL-C resulting from LMT was associated with reduced cardiovascular risk (HR per pooled SD (=0.34 mmol/l) increase=0.74, 95% CI 0.56 to 0.99, adjusted for age, sex and baseline HDL-C). However, this association was attenuated and was not (statistically) significant with further adjustments for non-HDL-C and for cigarette smoking history, prevalent diabetes, systolic blood pressure, body mass index, use of antihypertensive medication, previous myocardial infarction, prevalent angina and previous stroke (0.92, 0.70 to 1.20).
Conclusions: Following adjustment for conventional non-lipid risk factors of cardiovascular disease, this study provides no evidence to support a significant benefit from increasing HDL-C independent of the effect of lowering non-HDL-C.
Funded by: Cancer Research UK; Medical Research Council
Heart (British Cardiac Society) 2012;98;10;780-5
Comparative genomics of the apicomplexan parasites Toxoplasma gondii and Neospora caninum: Coccidia differing in host range and transmission strategy.
Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, United Kingdom.
Toxoplasma gondii is a zoonotic protozoan parasite which infects nearly one third of the human population and is found in an extraordinary range of vertebrate hosts. Its epidemiology depends heavily on horizontal transmission, especially between rodents and its definitive host, the cat. Neospora caninum is a recently discovered close relative of Toxoplasma, whose definitive host is the dog. Both species are tissue-dwelling Coccidia and members of the phylum Apicomplexa; they share many common features, but Neospora neither infects humans nor shares the same wide host range as Toxoplasma; rather, it shows a striking preference for highly efficient vertical transmission in cattle. These species therefore provide a remarkable opportunity to investigate mechanisms of host restriction, transmission strategies, virulence and zoonotic potential. We sequenced the genome of N. caninum and transcriptomes of the invasive stage of both species, undertaking an extensive comparative genomics and transcriptomics analysis. We estimate that these organisms diverged from their common ancestor around 28 million years ago and find that both genomes and gene expression are remarkably conserved. However, in N. caninum we identified an unexpected expansion of surface antigen gene families and the divergence of secreted virulence factors, including rhoptry kinases. Specifically, we show that the rhoptry kinase ROP18 is pseudogenised in N. caninum and that, as a possible consequence, Neospora is unable to phosphorylate host immunity-related GTPases, as Toxoplasma does. This defense strategy is thought to be key to virulence in Toxoplasma. We conclude that the ecological niches occupied by these species are influenced by a relatively small number of gene products which operate at the host-parasite interface and that the dominance of vertical transmission in N. caninum may be associated with the evolution of reduced virulence in this species.
Funded by: Biotechnology and Biological Sciences Research Council: BBS/B/08493; Canadian Institutes of Health Research; Wellcome Trust: 085775/Z/08/Z
PLoS pathogens 2012;8;3;e1002567
The TCA cycle is not required for selection or survival of multidrug-resistant Salmonella.
Antimicrobial Agents Research Group, School of Immunity and Infection, University of Birmingham, Edgbaston, Birmingham, UK.
Objectives: The initial aim of this study was to use a systems biology approach to analyse a ciprofloxacin-selected multidrug-resistant (MDR) Salmonella enterica serotype Typhimurium, L664.
Methods: The whole genome sequence and transcriptome of L664 were analysed. Site-directed mutagenesis to recreate each mutation was carried out, followed by phenotypic characterization and mutation frequency analysis. As a mutation in the TCA cycle was detected we tested the controversial hypothesis regarding the bacterial response to bactericidal antibiotics, put forward by Kohanski et al. (Cell 2007; 130: 797-810 and Mol Cell 2010; 37: 311-20), that exposure of bacteria to agents such as ciprofloxacin produces reactive oxygen species (ROS), which transiently increase the mutation rate giving rise to MDR bacteria.
Results: L664 contained a mutation in ramR that conferred MDR. A mutation in tctA affected the TCA cycle and conferred the inability to grow on minimal agar. The virulence of L664 was not attenuated. Ciprofloxacin exposure produced ROS in L664 and SL1344 (tctA::aph), but ROS production was reduced and occurred later. There were no significant differences in the rates of killing or mutations per generation to antibiotic resistance between the strains.
Conclusions: Whilst we confirm production of ROS in response to ciprofloxacin, we have no data to support the hypothesis that this leads to selection of MDR strains. Our results indicate that the mutations in tctA and glgA were random as they did not pre-exist in the parental strain, and that the mutation in tctA did not provide a survival advantage or disadvantage in the presence of antibiotic.
Funded by: Medical Research Council: G0501415, G0801977, GO501415
The Journal of antimicrobial chemotherapy 2012;67;3;589-99
Mutations in ISPD cause Walker-Warburg syndrome and defective glycosylation of α-dystroglycan.
Department of Human Genetics, Nijmegen Centre for Molecular Life Sciences, Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands.
Walker-Warburg syndrome (WWS) is an autosomal recessive multisystem disorder characterized by complex eye and brain abnormalities with congenital muscular dystrophy (CMD) and aberrant α-dystroglycan glycosylation. Here we report mutations in the ISPD gene (encoding isoprenoid synthase domain containing) as the second most common cause of WWS. Bacterial IspD is a nucleotidyl transferase belonging to a large glycosyltransferase family, but the role of the orthologous protein in chordates is obscure to date, as this phylum does not have the corresponding non-mevalonate isoprenoid biosynthesis pathway. Knockdown of ispd in zebrafish recapitulates the human WWS phenotype with hydrocephalus, reduced eye size, muscle degeneration and hypoglycosylated α-dystroglycan. These results implicate ISPD in α-dystroglycan glycosylation in maintaining sarcolemma integrity in vertebrates.
Funded by: Wellcome Trust: 077037, 077037/Z/05/Z, 077047/Z/05/Z
Nature genetics 2012;44;5;581-5
Insights into hominid evolution from the gorilla genome sequence.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK.
Gorillas are humans' closest living relatives after chimpanzees, and are of comparable importance for the study of human origins and evolution. Here we present the assembly and analysis of a genome sequence for the western lowland gorilla, and compare the whole genomes of all extant great ape genera. We propose a synthesis of genetic and fossil evidence consistent with placing the human-chimpanzee and human-chimpanzee-gorilla speciation events at approximately 6 and 10 million years ago. In 30% of the genome, gorilla is closer to human or chimpanzee than the latter are to each other; this is rarer around coding genes, indicating pervasive selection throughout great ape evolution, and has functional consequences in gene expression. A comparison of protein coding genes reveals approximately 500 genes showing accelerated evolution on each of the gorilla, human and chimpanzee lineages, and evidence for parallel acceleration, particularly of genes involved in hearing. We also compare the western and eastern gorilla species, estimating an average sequence divergence time 1.75 million years ago, but with evidence for more recent genetic exchange and a population bottleneck in the eastern species. The use of the genome sequence in these and future analyses will promote a deeper understanding of great ape biology and evolution.
Funded by: Biotechnology and Biological Sciences Research Council; Cancer Research UK: 15603, A15603; European Research Council: 202218; Howard Hughes Medical Institute; Intramural NIH HHS; Medical Research Council: G0501331, G0701805; NHGRI NIH HHS: HG002385, R01 HG002385, U54 HG003079; Wellcome Trust: 062023, 075491/Z/04, 077009, 077192, 077198, 089066, 090532, 095908, WT062023, WT077009, WT077192, WT077198, WT089066
Waves of retrotransposon expansion remodel genome organization and CTCF binding in multiple mammalian lineages.
Cancer Research UK, Cambridge Research Institute, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK.
CTCF-binding locations represent regulatory sequences that are highly constrained over the course of evolution. To gain insight into how these DNA elements are conserved and spread through the genome, we defined the full spectrum of CTCF-binding sites, including a 33/34-mer motif, and identified over five thousand highly conserved, robust, and tissue-independent CTCF-binding locations by comparing ChIP-seq data from six mammals. Our data indicate that activation of retroelements has produced species-specific expansions of CTCF binding in rodents, dogs, and opossum, which often functionally serve as chromatin and transcriptional insulators. We discovered fossilized repeat elements flanking deeply conserved CTCF-binding regions, indicating that similar retrotransposon expansions occurred hundreds of millions of years ago. Repeat-driven dispersal of CTCF binding is a fundamental, ancient, and still highly active mechanism of genome evolution in mammalian lineages.
Funded by: Cancer Research UK: CRUK_15603, CRUK_A15603; European Research Council: ERC_202218; Wellcome Trust: WT062023, WT095908, WT098051
Bioinformatics Training Network (BTN): a community resource for bioinformatics trainers.
EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK. email@example.com
Funding bodies are increasingly recognizing the need to provide graduates and researchers with access to short intensive courses in a variety of disciplines, in order both to improve the general skills base and to provide solid foundations on which researchers may build their careers. In response to the development of 'high-throughput biology', the need for training in the field of bioinformatics, in particular, is seeing a resurgence: it has been defined as a key priority by many institutions and research programmes and is now an important component of many grant proposals. Nevertheless, when it comes to planning and preparing to meet such training needs, tension arises between the reward structures that predominate in the scientific community, which compel individuals to publish or perish, and the time that must be devoted to the design, delivery and maintenance of high-quality training materials. Conversely, there is much relevant teaching material and training expertise available worldwide that, were it properly organized, could be exploited by anyone who needs to provide training or needs to set up a new course. To do this, however, the materials would have to be centralized in a database and clearly tagged in relation to target audiences, learning objectives, etc. Ideally, they would also be peer reviewed, and easily and efficiently accessible for downloading. Here, we present the Bioinformatics Training Network (BTN), a new enterprise that has been initiated to address these needs, and compare it with similar initiatives and collections.
Briefings in bioinformatics 2012;13;3;383-9
No interactions between previously associated 2-hour glucose gene variants and physical activity or BMI on 2-hour glucose levels.
Medical Research Council Epidemiology Unit, Institute of Metabolic Science, Addenbrooke’s Hospital, Cambridge, UK. firstname.lastname@example.org
Gene-lifestyle interactions have been suggested to contribute to the development of type 2 diabetes. Glucose levels 2 h after a standard 75-g glucose challenge are used to diagnose diabetes and are associated with both genetic and lifestyle factors. However, whether these factors interact to determine 2-h glucose levels is unknown. We meta-analyzed single nucleotide polymorphism (SNP) × BMI and SNP × physical activity (PA) interaction regression models for five SNPs previously associated with 2-h glucose levels from up to 22 studies comprising 54,884 individuals without diabetes. PA levels were dichotomized, with individuals below the first quintile classified as inactive (20%) and the remainder as active (80%). BMI was considered a continuous trait. Inactive individuals had higher 2-h glucose levels than active individuals (β = 0.22 mmol/L [95% CI 0.13-0.31], P = 1.63 × 10⁻⁶). All SNPs were associated with 2-h glucose (β = 0.06-0.12 mmol/allele, P ≤ 1.53 × 10⁻⁷), but no significant interactions were found with PA (P > 0.18) or BMI (P ≥ 0.04). In this large study of gene-lifestyle interaction, we observed no interactions between genetic and lifestyle factors, both of which were associated with 2-h glucose. It is perhaps unlikely that top loci from genome-wide association studies will exhibit strong subgroup-specific effects, and may not, therefore, make the best candidates for the study of interactions.
Funded by: British Heart Foundation: RG/07/008/23674; Medical Research Council: G0100222, G0701863, G0902037, G1002084, G19/35, G8802774, MC_U106179471, MC_U106179473, MC_UP_A100_1003, MC_UP_A620_1014, MC_UP_A620_1015; NCATS NIH HHS: UL1 TR000124; NCRR NIH HHS: UL1 RR024148, UL1 RR025741; NHLBI NIH HHS: T32 HL007575; NIDDK NIH HHS: K24 DK080140, P30 DK063491, R01 DK072041
Large-scale association analyses identify new loci influencing glycemic traits and provide insight into the underlying biological pathways.
Medical Research Council Epidemiology Unit, Institute of Metabolic Science, Addenbrooke's Hospital, Cambridge, UK.
Through genome-wide association meta-analyses of up to 133,010 individuals of European ancestry without diabetes, including individuals newly genotyped using the Metabochip, we have increased the number of confirmed loci influencing glycemic traits to 53, of which 33 also increase type 2 diabetes risk (q < 0.05). Loci influencing fasting insulin concentration showed association with lipid levels and fat distribution, suggesting impact on insulin resistance. Gene-based analyses identified further biologically plausible loci, suggesting that additional loci beyond those reaching genome-wide significance are likely to represent real associations. This conclusion is supported by an excess of directionally consistent and nominally significant signals between discovery and follow-up studies. Functional analysis of these newly discovered loci will further improve our understanding of glycemic control.
Funded by: AHRQ HHS: HS06516; Biotechnology and Biological Sciences Research Council: G20234; British Heart Foundation: BHF_RG/07/008/23674, PG/07/133/24260; Chief Scientist Office: CSO_CZB/4/672, CSO_CZB/4/710; Department of Health; FIC NIH HHS: R01 TW005596, TW05596; Intramural NIH HHS; Medical Research Council: 74882, 85374, G0500539, G0600705, G0701863, MRC_G0100222, MRC_G0401527, MRC_G0800582, MRC_G0902037, MRC_G1000143, MRC_G1002084, MRC_G19/35, MRC_G8802774, MRC_G9521010, MRC_MC_PC_U127592696, MRC_MC_U106179471, MRC_MC_U127561128, MRC_MC_U127592696, MRC_MC_UP_A100_1003, MRC_U.1061.00.001 (79471); NCATS NIH HHS: UL1 TR000130; NCRR NIH HHS: M01 RR 16500, M01 RR000425, M01 RR016500, M01-RR00425, P20 RR020649, RR20649, UL1 RR025005, UL1RR025005; NHGRI NIH HHS: HHSN268200782096C, N01HG65403, U01 HG004402, U01HG004402, Z01 HG000024; NHLBI NIH HHS: 5R01HL087679-02, HL075366, HL080295, HL085144, HL087652, HL087660, HL100245, HL105756, N01 HC-55222, N01 HC015103, N01 HC025195, N01 HC035129, N01 HC045133, N01-HC-25195, N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022, N01-HC-75150, N01-HC-85079, N01-HC-85080, N01-HC-85081, N01-HC-85082, N01-HC-85083, N01-HC-85084, N01-HC-85085, N01-HC-85086, N01-HC-85239, N01HC55015, N01HC55016, N01HC55018, N01HC55019, N01HC55020, N01HC55021, N01HC55022, N01HC55222, N01HC75150, N01HC85079, N01HC85086, N02 HL64278, N02-HL-6-4278, R01 HL036310, R01 HL059367, R01 HL075366, R01 HL080295, R01 HL085144, R01 HL086694, R01 HL087641, R01 HL087652, R01 HL087660, R01 HL087679, R01 HL087700, R01 HL088119, R01 HL088215, R01 HL105756, R01-HL-087700, R01-HL-088215, R01HL086694, R01HL087641, R01HL59367, RC1 HL100245, U01 HL072515, U01 HL072515-06, U01 HL080295, U01 HL084756, U01 HL84756; NIA NIH HHS: 1R01AG032098-01A1, AG-023629, AG-027058, AG-20098, AG028555, AG04563, AG08724, AG08861, AG10175, AG13196, N01 AG062101, N01 AG062103, N01 AG062106, N01-AG-1-2100, N01-AG-1-2109, R01 AG010175, R01 AG013196, R01 AG015928, R01 AG018728, R01 AG020098, R01 AG023629, R01 AG027058, R01 AG028555, R01 AG032098, R01 AG18728, R37 AG013196, R56 AG020098, R56 AG023629, T32 AG000219; NIAMS NIH HHS: K08 AR055688, K08AR055688; NICHD NIH HHS: R24 HD050924; NIDDK NIH HHS: DK063491, DK078150, DK56350, K24 DK080140, P30 DK020572, P30 DK056350, P30 DK063491, P30 DK072488, P30 DK72488, P60 DK079637, P60DK79637, R01 DK054261, R01 DK062370, R01 DK072193, R01 DK075681, R01 DK078150, R01 DK078616, R01 DK54261, R01-DK-075681, R01-DK-8925601, R01-DK062370, R01-DK072193; NIEHS NIH HHS: ES10126, P30 ES010126; NIGMS NIH HHS: U01 GM074518; NIMH NIH HHS: 1RL1MH083268-01, 5R01MH63706:02, R01 MH063706, RL1 MH083268, U24 MH068457, U24 MH068457-06; NIMHD NIH HHS: R01 MD009164; NLM NIH HHS: LM010098, R01 LM010098; PHS HHS: HHSN268200625226C, HHSN268200782096C, R01D0042157-01A; Wellcome Trust: 075491/Z/04, 076467, 083948, 092731, GR069224, WT081682, WT090532, WT098017, WT098051
Nature genetics 2012;44;9;991-1005
A Plasmodium calcium-dependent protein kinase controls zygote development and transmission by translationally activating repressed mRNAs.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.
Calcium-dependent protein kinases (CDPKs) play key regulatory roles in the life cycle of the malaria parasite, but in many cases their precise molecular functions are unknown. Using the rodent malaria parasite Plasmodium berghei, we show that CDPK1, which is known to be essential in the asexual blood stage of the parasite, is expressed in all life stages and is indispensable during the sexual mosquito life-cycle stages. Knockdown of CDPK1 in sexual stages resulted in developmentally arrested parasites and prevented mosquito transmission, and these effects were independent of the previously proposed function for CDPK1 in regulating parasite motility. In-depth translational and transcriptional profiling of arrested parasites revealed that CDPK1 translationally activates mRNA species in the developing zygote that in macrogametes remain repressed via their 3' and 5'UTRs. These findings indicate that CDPK1 is a multifunctional protein that translationally regulates mRNAs to ensure timely and stage-specific protein expression.
Funded by: Medical Research Council: G0501670; Wellcome Trust: 079643/Z/06/Z, WT098051
Cell host & microbe 2012;12;1;9-19
The dynamics of genome-wide DNA methylation reprogramming in mouse primordial germ cells.
Epigenetics Programme, The Babraham Institute, Cambridge CB22 3AT, UK.
Genome-wide DNA methylation reprogramming occurs in mouse primordial germ cells (PGCs) and preimplantation embryos, but the precise dynamics and biological outcomes are largely unknown. We have carried out whole-genome bisulfite sequencing (BS-Seq) and RNA-Seq across key stages from E6.5 epiblast to E16.5 PGCs. Global loss of methylation takes place during PGC expansion and migration with evidence for passive demethylation, but sequences that carry long-term epigenetic memory (imprints, CpG islands on the X chromosome, germline-specific genes) only become demethylated upon entry of PGCs into the gonads. The transcriptional profile of PGCs is tightly controlled despite global hypomethylation, with transient expression of the pluripotency network, suggesting that reprogramming and pluripotency are inextricably linked. Our results provide a framework for the understanding of the epigenetic ground state of pluripotency in the germline.
Funded by: Biotechnology and Biological Sciences Research Council; Medical Research Council: MRC_G0700098, MRC_G0801156; Wellcome Trust: WT095645
Molecular cell 2012;48;6;849-62
Cancer develops, progresses and responds to therapies through restricted perturbation of the protein-protein interaction network.
Translational Research Laboratory, Breast Cancer Unit, Catalan Institute of Oncology (ICO), Bellvitge Institute for Biomedical Research (IDIBELL), Gran via 199, L'Hospitalet del Llobregat, Barcelona 08908, Catalonia, Spain.
The products of genes mutated or differentially expressed in cancer tend to occupy central positions within the network of protein-protein interactions, or the interactome network. Integration of different types of gene and protein relationships has considerably increased the understanding of the mechanisms of carcinogenesis, while also enhancing the applicability of expression signatures. In this scenario, however, it remains unknown how cancer develops, progresses and responds to therapies in a potentially controlled manner at the systems level. Here, by applying the concepts of load transfer and cascading failures in power grids, we examine the impact and transmission of cancer-related gene expression changes in the interactome network. Relative to random perturbations, this study reveals topological robustness associated with all cancer conditions. In addition, experimental perturbation of a central cancer node, which consists of over-expression of the α-synuclein (SNCA) protein in MCF7 breast cancer cells, also reveals robustness. Conversely, a search for proteins with an opposite topological impact identifies the autophagy pathway. Mechanistically, the existence of smaller shortest paths among cancer-related proteins appears to be a topological feature that partially contributes to the restricted perturbation of the network. Together, the results of this study suggest that cancer develops, progresses and responds to therapies following controlled, restricted perturbation of the interactome network.
Integrative biology : quantitative biosciences from nano to macro 2012;4;9;1038-48
Finding NECA: zebrafish screen identifies key signalling pathway in β-cell regeneration.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK. email@example.com
Disease models & mechanisms 2012;5;6;709-10
optiCall: a robust genotype-calling algorithm for rare, low-frequency and common variants.
Human Genetics, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, UK.
Motivation: Existing microarray genotype-calling algorithms adopt either SNP-by-SNP (SNP-wise) or sample-by-sample (sample-wise) approaches to calling. We have developed a novel genotype-calling algorithm for the Illumina platform, optiCall, that uses both SNP-wise and sample-wise calling to more accurately ascertain genotypes at rare, low-frequency and common variants.
Results: Using data from 4537 individuals from the 1958 British Birth Cohort genotyped on the Immunochip, we estimate the proportion of SNPs lost to downstream analysis due to false quality control failures, and rare variants misclassified as monomorphic, is only 1.38% with optiCall, in comparison to 3.87, 7.85 and 4.09% for Illuminus, GenoSNP and GenCall, respectively. We show that optiCall accurately captures rare variants and can correctly account for SNPs where probe intensity clouds are shifted from their expected positions.
Availability and implementation: optiCall is implemented in C++ for use on UNIX operating systems and is available for download at http://www.sanger.ac.uk/resources/software/opticall/.
Funded by: Medical Research Council: G1001799; Wellcome Trust: 098051
Bioinformatics (Oxford, England) 2012;28;12;1598-603
An In-Solution Hybridisation Method for the Isolation of Pathogen DNA from Human DNA-rich Clinical Samples for Analysis by NGS.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK.
Studies on DNA from pathogenic organisms within clinical samples are often complicated by the presence of large amounts of host (e.g., human) DNA. Isolation of pathogen DNA from these samples would improve the efficiency of next-generation sequencing (NGS) and pathogen identification. Here we describe a solution-based hybridisation method for isolation of pathogen DNA from a mixed population. This straightforward and inexpensive technique uses probes made from whole-genome DNA and off-the-shelf reagents. In this study, Escherichia coli DNA was successfully enriched from a mixture of E. coli and human DNA. After enrichment, genome coverage following NGS was significantly higher and the evenness of coverage and GC content were unaffected. This technique was also applied to samples containing a mixture of human and Plasmodium falciparum DNA. The P. falciparum genome is particularly difficult to sequence due to its high AT content (80.6%) and repetitive nature. Post enrichment, a bias in the recovered DNA was observed, with a poorer representation of the AT-rich non-coding regions. This uneven coverage was also observed in pre-enrichment samples, but to a lesser degree. Despite the coverage bias in enriched samples, SNP (single-nucleotide polymorphism) calling in coding regions was unaffected and the majority of samples had over 90% of their coding region covered at 5× depth. This technique shows significant promise as an effective method to enrich pathogen DNA from samples with heavy human contamination, particularly when applied to GC-neutral genomes.
Funded by: Wellcome Trust: 082370, 090532
The open genomics journal 2012;5
Predisposition gene identification in common cancers by exome sequencing: insights from familial breast cancer.
Division of Genetics and Epidemiology, Institute of Cancer Research, 15 Cotswold Road, Sutton, Surrey, SM2 5NG, UK.
The genetic component of breast cancer predisposition remains largely unexplained. Candidate gene case-control resequencing has identified predisposition genes characterised by rare, protein truncating mutations that confer moderate risks of disease. In theory, exome sequencing should yield additional genes of this class. Here, we explore the feasibility and design considerations of this approach. We performed exome sequencing in 50 individuals with familial breast cancer, applying frequency and protein function filters to identify variants most likely to be pathogenic. We identified 867,378 variants that passed the call quality filters of which 1,296 variants passed the frequency and protein truncation filters. The median number of validated, rare, protein truncating variants was 10 in individuals with, and without, mutations in known genes. The functional candidacy of mutated genes was similar in both groups. Without prior knowledge, the known genes would not have been recognisable as breast cancer predisposition genes. Everyone carries multiple rare mutations that are plausibly related to disease. Exome sequencing in common conditions will therefore require intelligent sample and variant prioritisation strategies in large case-control studies to deliver robust genetic evidence of disease association.
Funded by: Cancer Research UK: C8620/A8372, C8620/A8857
Breast cancer research and treatment 2012;134;1;429-33
Bayesian inference analyses of the polygenic architecture of rheumatoid arthritis.
Division of Rheumatology Immunology and Allergy, Brigham and Women's Hospital, Boston, Massachusetts, USA. firstname.lastname@example.org
The genetic architectures of common, complex diseases are largely uncharacterized. We modeled the genetic architecture underlying genome-wide association study (GWAS) data for rheumatoid arthritis and developed a new method using polygenic risk-score analyses to infer the total liability-scale variance explained by associated GWAS SNPs. Using this method, we estimated that, together, thousands of SNPs from rheumatoid arthritis GWAS explain an additional 20% of disease risk (excluding known associated loci). We further tested this method on datasets for three additional diseases and obtained comparable estimates for celiac disease (43% excluding the major histocompatibility complex), myocardial infarction and coronary artery disease (48%) and type 2 diabetes (49%). Our results are consistent with simulated genetic models in which hundreds of associated loci harbor common causal variants and a smaller number of loci harbor multiple rare causal variants. These analyses suggest that GWAS will continue to be highly productive for the discovery of additional susceptibility loci for common diseases.
Funded by: Canadian Institutes of Health Research: IIN-84042, MOP79321; Intramural NIH HHS; Medical Research Council: MRC_MC_U106179471; NIAMS NIH HHS: K08AR055688-01A1, N01-AR-2-2263, R01 AR056768, R01-AR056768, R01-AR057108, R01-AR059648, R01-AR44422; NIGMS NIH HHS: R01 GM045295, U01 GM092691, U01-GM092691; Wellcome Trust: WT090532
Nature genetics 2012;44;5;483-9
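Illustrative note on the entry above: a polygenic risk score is commonly computed as a weighted sum of risk-allele counts, with log odds ratios as weights. The sketch below shows only that basic score, not the Bayesian inference method the paper develops. The odds ratios, genotypes and the function name `polygenic_risk_score` are hypothetical values chosen for illustration.

```python
import math


def polygenic_risk_score(genotypes, log_odds_weights):
    """Weighted sum of risk-allele counts (0, 1 or 2 per SNP)."""
    if len(genotypes) != len(log_odds_weights):
        raise ValueError("one weight per SNP is required")
    return sum(g * w for g, w in zip(genotypes, log_odds_weights))


# Hypothetical per-SNP odds ratios from a discovery GWAS (illustrative only).
odds_ratios = [1.10, 1.05, 0.95, 1.20]
weights = [math.log(odds_ratio) for odds_ratio in odds_ratios]

# Hypothetical risk-allele counts for two individuals in a target sample.
individual_a = [2, 1, 0, 1]
individual_b = [0, 0, 2, 0]

score_a = polygenic_risk_score(individual_a, weights)
score_b = polygenic_risk_score(individual_b, weights)

# Individual A carries more risk-increasing alleles, so scores higher.
print(score_a > score_b)
```

In practice, such scores are computed over many thousands of SNPs selected at different significance thresholds; the toy example keeps four SNPs only to make the arithmetic easy to follow.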
The landscape of cancer genes and mutational processes in breast cancer.
Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK.
All cancers carry somatic mutations in their genomes. A subset, known as driver mutations, confer clonal selective advantage on cancer cells and are causally implicated in oncogenesis, and the remainder are passenger mutations. The driver mutations and mutational processes operative in breast cancer have not yet been comprehensively explored. Here we examine the genomes of 100 tumours for somatic copy number changes and mutations in the coding exons of protein-coding genes. The number of somatic mutations varied markedly between individual tumours. We found strong correlations between mutation number, age at which cancer was diagnosed and cancer histological grade, and observed multiple mutational signatures, including one present in about ten per cent of tumours characterized by numerous mutations of cytosine at TpC dinucleotides. Driver mutations were identified in several new cancer genes including AKT2, ARID1B, CASP8, CDKN1B, MAP3K1, MAP3K13, NCOR1, SMARCD1 and TBX3. Among the 100 tumours, we found driver mutations in at least 40 cancer genes and 73 different combinations of mutated cancer genes. The results highlight the substantial genetic diversity underlying this common disease.
Funded by: Cancer Research UK: CRUK_10118; Chief Scientist Office; Department of Health; NCI NIH HHS: CA089393, P30 CA016672, P50 CA089393; Wellcome Trust: 077012/Z/05/Z, WT088340, WT088340MA, WT093867
Lineage-specific virulence determinants of Haemophilus influenzae biogroup aegyptius.
Imperial College London, Medicine, St Mary’s Hospital campus, Norfolk Place, London W2 1PG, UK.
An emergent clone of Haemophilus influenzae biogroup aegyptius (Hae) is responsible for outbreaks of Brazilian purpuric fever (BPF). First recorded in Brazil in 1984, the so-called BPF clone of Hae caused a fulminant disease that started with conjunctivitis but developed into septicemic shock; mortality rates were as high as 70%. To identify virulence determinants, we conducted a pan-genomic analysis. Sequencing of the genomes of the BPF clone strain F3031 and a noninvasive conjunctivitis strain, F3047, and comparison of these sequences with 5 other complete H. influenzae genomes showed that >77% of the F3031 genome is shared among all H. influenzae strains. Delineation of the Hae accessory genome enabled characterization of 163 predicted protein-coding genes; identified differences in established autotransporter adhesins; and revealed a suite of novel adhesins unique to Hae, including novel trimeric autotransporter adhesins and 4 new fimbrial operons. These novel adhesins might play a critical role in host-pathogen interactions.
Funded by: Wellcome Trust
Emerging infectious diseases 2012;18;3;449-57
A benchmarked protein microarray-based platform for the identification of novel low-affinity extracellular protein interactions.
Cell Surface Signalling Laboratory, Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, UK.
Low-affinity extracellular protein interactions are critical for cellular recognition processes, but existing methods to detect them are limited in scale, making genome-wide interaction screens technically challenging. To address this, we report here the miniaturization of the AVEXIS (avidity-based extracellular interaction screen) assay by using protein microarray technology. To achieve this, we have developed protein tags and sample preparation methods that enable the parallel purification of hundreds of recombinant proteins expressed in mammalian cells. We benchmarked the protein microarray-based assay against a set of known quantified receptor-ligand pairs and show that it is sensitive enough to detect even very weak interactions that are typical of this class of interactions. The increase in scale enables interaction screening against a dilution series of immobilized proteins on the microarray enabling the observation of saturation binding behaviors to show interaction specificity and also the estimation of interaction affinities directly from the primary screen. These methodological improvements now permit screening for novel extracellular receptor-ligand interactions on a genome-wide scale.
Funded by: Wellcome Trust: 077108
Analytical biochemistry 2012;424;1;45-53
A genome-wide association study of monozygotic twin-pairs suggests a locus related to variability of serum high-density lipoprotein cholesterol.
Institute for Molecular Medicine, University of Helsinki, Finland.
Genome-wide association analysis on monozygotic twin-pairs offers a route to discovery of gene-environment interactions through testing for variability loci associated with sensitivity to individual environment/lifestyle. We present a genome-wide scan of loci associated with intra-pair differences in serum lipid and apolipoprotein levels. We report data for 1,720 monozygotic female twin-pairs from the GenomEUtwin project with 2.5 million SNPs, imputed or genotyped, and measured serum lipid fractions for both twins. We found one locus associated with intra-pair differences in high-density lipoprotein cholesterol, rs2483058 in an intron of SRGAP2, where twins carrying the C allele are more sensitive to environmental factors (P = 3.98 × 10(-8)). We followed up the association in further genotyped monozygotic twins (N = 1,261), which showed a moderate association for the variant (P = 0.200, same direction of effect). In addition, we report a new association on the level of apolipoprotein A-II (P = 4.03 × 10(-8)).
Funded by: NIA NIH HHS: AG028555, AG04563, AG08724, AG08861, AG10175; NIAAA NIH HHS: AA07535, AA10248, AA11998, AA13320, AA13321, AA13326, AA14041, AA17688; NIDA NIH HHS: DA12854; NIMH NIH HHS: MH66206; Wellcome Trust
Twin research and human genetics : the official journal of the International Society for Twin Studies 2012;15;6;691-9
A post-assembly genome-improvement toolkit (PAGIT) to obtain annotated genomes from contigs.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK.
Genome projects now produce draft assemblies within weeks owing to advanced high-throughput sequencing technologies. For milestone projects such as Escherichia coli or Homo sapiens, teams of scientists were employed to manually curate and finish these genomes to a high standard. Nowadays, this is not feasible for most projects, and the quality of genomes is generally of a much lower standard. This protocol describes software (PAGIT) that is used to improve the quality of draft genomes. It offers flexible functionality to close gaps in scaffolds, correct base errors in the consensus sequence and exploit reference genomes (if available) in order to improve scaffolding and generate annotations. The protocol is most accessible for bacterial and small eukaryotic genomes (up to 300 Mb), such as pathogenic bacteria, malaria and parasitic worms. Applying PAGIT to an E. coli assembly takes ∼24 h: it doubles the average contig size and annotates over 4,300 gene models.
Funded by: Wellcome Trust: 098051
Nature protocols 2012;7;7;1260-84
DECIPHER: web-based, community resource for clinical interpretation of rare variants in developmental disorders.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.
Patients with developmental disorders often harbour sub-microscopic deletions or duplications that lead to a disruption of normal gene expression or perturbation in the copy number of dosage-sensitive genes. Clinical interpretation for such patients in isolation is hindered by the rarity and novelty of such disorders. The DECIPHER project (https://decipher.sanger.ac.uk) was established in 2004 as an accessible online repository of genomic and associated phenotypic data with the primary goal of aiding the clinical interpretation of rare copy-number variants (CNVs). DECIPHER integrates information from a variety of bioinformatics resources and uses visualization tools to identify potential disease genes within a CNV. A two-tier access system permits clinicians and clinical scientists to maintain confidential linked anonymous records of phenotypes and CNVs for their patients that, with informed consent, can subsequently be shared with the wider clinical genetics and research communities. Advances in next-generation sequencing technologies are making it practical and affordable to sequence the whole exome/genome of patients who display features suggestive of a genetic disorder. This approach enables the identification of smaller intragenic mutations including single-nucleotide variants that are not accessible even with high-resolution genomic array analysis. This article briefly summarizes the current status and achievements of the DECIPHER project and looks ahead to the opportunities and challenges of jointly analysing structural and sequence variation in the human genome.
Funded by: Wellcome Trust: WT077008
Human molecular genetics 2012;21;R1;R37-44
Rapid whole-genome sequencing of bacterial pathogens in the clinical microbiology laboratory--pipe dream or reality?
Department of Medicine, University of Cambridge, Box 157, Addenbrooke's Hospital, Hills Road, Cambridge CB2 0QQ, UK. email@example.com
The ability to perform rapid, high-throughput whole-genome sequencing using bench-top platforms represents a step-change in capabilities for diagnostic and public health microbiology laboratories. As the cost of sequencing continues to decline, the challenge will be to define when and where to apply this technology. This article reviews its potential applications in the clinical microbiology laboratory and discusses the current barriers to implementation.
Funded by: Medical Research Council: G1000803
The Journal of antimicrobial chemotherapy 2012;67;10;2307-8
Common variants at 12q15 and 12q24 are associated with infant head circumference.
Department of Epidemiology, Erasmus Medical Center, Rotterdam, The Netherlands.
To identify genetic variants associated with head circumference in infancy, we performed a meta-analysis of seven genome-wide association studies (GWAS) (N = 10,768 individuals of European ancestry enrolled in pregnancy and/or birth cohorts) and followed up three lead signals in six replication studies (combined N = 19,089). rs7980687 on chromosome 12q24 (P = 8.1 × 10(-9)) and rs1042725 on chromosome 12q15 (P = 2.8 × 10(-10)) were robustly associated with head circumference in infancy. Although these loci have previously been associated with adult height, their effects on infant head circumference were largely independent of height (P = 3.8 × 10(-7) for rs7980687 and P = 1.3 × 10(-7) for rs1042725 after adjustment for infant height). A third signal, rs11655470 on chromosome 17q21, showed suggestive evidence of association with head circumference (P = 3.9 × 10(-6)). SNPs correlated to the 17q21 signal have shown genome-wide association with adult intracranial volume, Parkinson's disease and other neurodegenerative diseases, indicating that a common genetic variant in this region might link early brain growth with neurological disease in later life.
Funded by: British Heart Foundation; Canadian Institutes of Health Research: MOP 82893; Medical Research Council: G0500539, G0600331, G0600705, G0800582, G0801056; NHLBI NIH HHS: 5R01HL087679-02, R01 HL087679; NICHD NIH HHS: 1R01HD056465-01A1, R01 HD056465; NIMH NIH HHS: 1RL1MH083268-01; Wellcome Trust: 079643, 085541/Z/08/Z, 090532, 092731, 098395, GR069224, WT083431MA, WT088431MA
Nature genetics 2012;44;5;532-8
Rare variant association testing for next-generation sequencing data via hierarchical clustering.
Wellcome Trust Sanger Institute, Hinxton, UK. firstname.lastname@example.org
Objectives: It is thought that a proportion of the genetic susceptibility to complex diseases is due to low-frequency and rare variants. Next-generation sequencing in large populations facilitates the detection of rare variant associations to disease risk. In order to achieve adequate power to detect association at low-frequency and rare variants, locus-specific statistical methods are being developed that combine information across variants within a functional unit and test for association with this enriched signal through so-called burden tests.
Methods: We propose a hierarchical clustering approach and a similarity kernel-based association test for continuous phenotypes. This method clusters individuals into groups, within which samples are assumed to be genetically similar, and subsequently tests the group effects among the different clusters.
Results: The power of this approach is comparable to that of collapsing methods when causal variants have the same direction of effect, but its power is significantly higher compared to burden tests when both protective and risk variants are present in the region of interest. Overall, we observe that the Sequence Kernel Association Test (SKAT) is the most powerful approach under the allelic architectures considered.
Conclusions: In our overall comparison, we find the analytical framework within which SKAT operates to yield higher power and to control type I error appropriately.
Funded by: Wellcome Trust: 098051, WT090532, WT098017
Human heredity 2012;74;3-4;165-71
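Illustrative note on the entry above: the abstract observes that burden tests lose power when protective and risk variants occur in the same region. The sketch below shows why, using a simple collapsing score (the count of rare alleles per individual) and an ordinary least-squares slope. It is not the hierarchical clustering method or SKAT; the genotypes, phenotypes and function names are hypothetical.

```python
def burden_scores(genotype_matrix):
    """Collapse each individual's rare-variant genotypes into one count."""
    return [sum(row) for row in genotype_matrix]


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical genotypes: six individuals, two rare variants in one gene.
genotypes = [[0, 0], [1, 0], [0, 1], [1, 0], [0, 1], [0, 0]]
burden = burden_scores(genotypes)

# Scenario 1: both variants raise the phenotype by one unit.
phenotype_same = [g[0] + g[1] for g in genotypes]
# Scenario 2: variant 1 raises and variant 2 lowers the phenotype.
phenotype_opposite = [g[0] - g[1] for g in genotypes]

slope_same = ols_slope(burden, phenotype_same)
slope_opposite = ols_slope(burden, phenotype_opposite)

# The burden signal is clear in scenario 1 and cancels out in scenario 2.
print(slope_same, slope_opposite)
```

Methods that do not assume a common direction of effect, such as kernel-based tests, are designed to avoid this cancellation.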
Biochemical and functional analysis of two Plasmodium falciparum blood-stage 6-cys proteins: P12 and P41.
Burnet Institute, Melbourne, Victoria, Australia.
The genomes of Plasmodium parasites that cause malaria in humans, other primates, birds, and rodents all encode multiple 6-cys proteins. Distinct 6-cys protein family members reside on the surface at each extracellular life cycle stage and those on the surface of liver infective and sexual stages have been shown to play important roles in hepatocyte growth and fertilization respectively. However, 6-cys proteins associated with the blood-stage forms of the parasite have no known function. Here we investigate the biochemical nature and function of two blood-stage 6-cys proteins in Plasmodium falciparum, the most pathogenic species to afflict humans. We show that native P12 and P41 form a stable heterodimer on the infective merozoite surface and are secreted following invasion, but could find no evidence that this complex mediates erythrocyte-receptor binding. That P12 and P41 do not appear to have a major role as adhesins to erythrocyte receptors was supported by the observation that antisera to these proteins did not substantially inhibit erythrocyte invasion. To investigate other functional roles for these proteins, their genes were successfully disrupted in P. falciparum; however, P12 and P41 knockout parasites grew at normal rates in vitro and displayed no other obvious phenotypic changes. It now appears likely that these blood-stage 6-cys proteins operate as a pair and play redundant roles either in erythrocyte invasion or in host-immune interactions.
Funded by: Wellcome Trust: 077108/Z/05/Z
PloS one 2012;7;7;e41937
The tomato genome sequence provides insights into fleshy fruit evolution.
Tomato (Solanum lycopersicum) is a major crop plant and a model system for fruit development. Solanum is one of the largest angiosperm genera and includes annual and perennial plants from diverse habitats. Here we present a high-quality genome sequence of domesticated tomato, a draft sequence of its closest wild relative, Solanum pimpinellifolium, and compare them to each other and to the potato genome (Solanum tuberosum). The two tomato genomes show only 0.6% nucleotide divergence and signs of recent admixture, but show more than 8% divergence from potato, with nine large and several smaller inversions. In contrast to Arabidopsis, but similar to soybean, tomato and potato small RNAs map predominantly to gene-rich chromosomal regions, including gene promoters. The Solanum lineage has experienced two consecutive genome triplications: one that is ancient and shared with rosids, and a more recent one. These triplications set the stage for the neofunctionalization of genes controlling fruit characteristics, such as colour and fleshiness.
Funded by: Biotechnology and Biological Sciences Research Council: BB/C509731/1, BB/G006199/1
Molecular characterization of the 2011 Hong Kong scarlet fever outbreak.
Department of Microbiology, The University of Hong Kong, Hong Kong Special Administrative Region, China. email@example.com
A scarlet fever outbreak occurred in Hong Kong in 2011. The majority of cases resulted in the isolation of Streptococcus pyogenes emm12 with multiple antibiotic resistances. Phylogenetic analysis of 22 emm12 scarlet fever outbreak isolates, 7 temporally and geographically matched emm12 non-scarlet fever isolates, and 18 emm12 strains isolated during 2005-2010 indicated the outbreak was multiclonal. Genome sequencing of 2 nonclonal scarlet fever isolates (HKU16 and HKU30), coupled with diagnostic polymerase chain reaction assays, identified 2 mobile genetic elements distributed across the major lineages: a 64.9-kb integrative and conjugative element encoding tetracycline and macrolide resistance and a 46.4-kb prophage encoding superantigens SSA and SpeC and the DNase Spd1. Phenotypic comparison of HKU16 and HKU30 with the S. pyogenes M1T1 strain 5448 revealed that HKU16 displays increased adherence to HEp-2 human epithelial cells, whereas HKU16, HKU30, and 5448 exhibit equivalent resistance to neutrophils and virulence in a humanized plasminogen murine model. However, in contrast to M1T1, the virulence of HKU16 and HKU30 was not associated with covRS mutation. The multiclonal nature of the emm12 scarlet fever isolates suggests that factors such as mobile genetic elements, environmental factors, and host immune status may have contributed to the 2011 scarlet fever outbreak.
The Journal of infectious diseases 2012;206;3;341-51
Detailed metabolic and genetic characterization reveals new associations for 30 known lipid loci.
Institute for Molecular Medicine Finland FIMM, Helsinki University Hospital, FI-00014 University of Helsinki, Helsinki, Finland.
Almost 100 genetic loci are known to affect serum cholesterol and triglyceride levels. For many of these loci, the biological function and causal variants remain unknown. We performed an association analysis of the reported 95 lipid loci against 216 metabolite measures, including 95 measurements on lipids and lipoprotein subclasses, obtained via serum nuclear magnetic resonance metabolomics and four enzymatic lipid traits in 8330 individuals from Finland. The genetic variation in the loci was investigated using a dense set of 440 807 directly genotyped and imputed variants around the previously identified lead single nucleotide polymorphisms (SNPs). For 30 of the 95 loci, we identified new metabolic or genetic associations (P < 5 × 10(-8)). In the majority of the loci, the strongest association was to a more specific metabolite measure than the enzymatic lipids. In four loci, the smallest high-density lipoprotein measures showed effects opposite to the larger ones, and 14 loci had associations beyond the individual lipoprotein measures. In 27 loci, we identified SNPs with a stronger association than the previously reported markers and 12 loci harboured multiple, statistically independent variants. Our data show considerable diversity in association patterns between the loci originally identified through associations with enzymatic lipid measures and reveal association profiles of far greater detail than from routine clinical lipid measures. Additionally, a dense marker set and a homogeneous population allow for detailed characterization of the genetic association signals to a resolution exceeding that achieved so far. Further understanding of the rich variability in genetic effects on metabolites provides insights into the biological processes modifying lipid levels.
Human molecular genetics 2012;21;6;1444-55
Gene-gene interactions in breast cancer susceptibility.
Division of Genetics and Epidemiology, The Institute of Cancer Research, Sutton, Surrey SM2 5NG, UK. firstname.lastname@example.org
There have been few definitive examples of gene-gene interactions in humans. Through mutational analyses in 7325 individuals, we report four interactions (defined as departures from a multiplicative model) between mutations in the breast cancer susceptibility genes ATM and CHEK2 with BRCA1 and BRCA2 (case-only interaction between ATM and BRCA1/BRCA2 combined, P = 5.9 × 10(-4); ATM and BRCA1, P = 0.01; ATM and BRCA2, P = 0.02; CHEK2 and BRCA1/BRCA2 combined, P = 2.1 × 10(-4); CHEK2 and BRCA1, P = 0.01; CHEK2 and BRCA2, P = 0.01). The interactions are such that the resultant risk of breast cancer is lower than the multiplicative product of the constituent risks, and plausibly reflect the functional relationships of the encoded proteins in DNA repair. These findings have important implications for models of disease predisposition and clinical translation.
Funded by: Cancer Research UK: 10118, 11990, C1287/A10118, C1287/A8874, C8620/A8372, C8620/A8857; Medical Research Council: G0000934, G0700491; Wellcome Trust: 068545/Z/02, 090532
Human molecular genetics 2012;21;4;958-62
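Illustrative note on the entry above: an interaction defined as a departure from a multiplicative model can be expressed as the ratio of the observed joint relative risk to the product of the two single-gene relative risks. The sketch below shows only this definition, not the case-only analysis used in the paper. The relative risks and the function name are hypothetical.

```python
def multiplicative_interaction_ratio(rr_a, rr_b, rr_joint):
    """Observed joint relative risk divided by the multiplicative expectation.

    A ratio of 1 means no departure from the multiplicative model;
    a ratio below 1 means the joint risk is lower than expected.
    """
    expected_joint = rr_a * rr_b
    return rr_joint / expected_joint


# Hypothetical relative risks (illustrative values, not taken from the paper).
rr_gene_a = 2.0     # carriers of a mutation in gene A only
rr_gene_b = 10.0    # carriers of a mutation in gene B only
rr_observed = 12.0  # carriers of mutations in both genes

ratio = multiplicative_interaction_ratio(rr_gene_a, rr_gene_b, rr_observed)

# Expected joint risk is 2.0 * 10.0 = 20.0, so the ratio is below 1.
print(ratio < 1.0)
```

A ratio below 1, as in this toy case, corresponds to the pattern the abstract describes: the combined risk is lower than the multiplicative product of the constituent risks.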
A British approach to sampling.
Funded by: Wellcome Trust: 077009
European journal of human genetics : EJHG 2012;20;2;129-30
Sibling rivalry among paralogs promotes evolution of the human brain.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK. email@example.com
Geneticists have long sought to identify the genetic changes that made us human, but pinpointing the functionally relevant changes has been challenging. Two papers in this issue suggest that partial duplication of SRGAP2, producing an incomplete protein that antagonizes the original, contributed to human brain evolution.
Funded by: Wellcome Trust: 098051
Generation of diversity by somatic mutation in the Camelus dromedarius T-cell receptor gamma variable domains.
Department of Biology, University of Bari, Bari, Italy.
In jawed vertebrates the V-(D)-J rearrangement is the main mechanism generating limitless variations of antigen-specific receptors, immunoglobulins (IGs), and T-cell receptors (TCRs) from few genes. Once the initial diversity is established in primary lymphoid organs, further diversification occurs in IGs by somatic hypermutation, a mechanism from which rearranged TCR genes were thought to be excluded. Here, we report the locus organization and expression of the T-cell receptor gamma (TCRG) genes in the Arabian camel (Camelus dromedarius). Expression data provide evidence that dromedary utilizes only two TCRG V-J genomic arrangements and, as expected, CDR3 contributes the major variability in the V domain. The data also suggest that diversity might be generated by mutation in the productively rearranged TCRGV genes. As for IG genes, the mutational target is biased toward G and C bases and (A/G/T)G(C/T)(A/T) motif (or DGYW). The replacement and synonymous substitutions (R/S) ratios in TCRGV regions are higher for CDR than for framework region, thus suggesting selection toward amino acid changes in CDR. Using the counterpart human TCR γδ receptor as a template, structural models computed adopting a comparative procedure show that nonconservative mutations contribute to diversity in CDR2 and at the γδ V domain interface.
European journal of immunology 2012;42;12;3416-28
Seventy-five genetic loci influencing the human red blood cell.
Department of Cardiology, University of Groningen, University Medical Center Groningen, 9700 RB Groningen, The Netherlands. firstname.lastname@example.org
Anaemia is a chief determinant of global ill health, contributing to cognitive impairment, growth retardation and impaired physical capacity. To understand further the genetic factors influencing red blood cells, we carried out a genome-wide association study of haemoglobin concentration and related parameters in up to 135,367 individuals. Here we identify 75 independent genetic loci associated with one or more red blood cell phenotypes at P < 10(-8), which together explain 4-9% of the phenotypic variance per trait. Using expression quantitative trait loci and bioinformatic strategies, we identify 121 candidate genes enriched in functions relevant to red blood cell biology. The candidate genes are expressed preferentially in red blood cell precursors, and 43 have haematopoietic phenotypes in Mus musculus or Drosophila melanogaster. Through open-chromatin and coding-variant analyses we identify potential causal genetic variants at 41 loci. Our findings provide extensive new insights into genetic mechanisms and biological pathways controlling red blood cell formation and function.
Funded by: British Heart Foundation: BHF_RG/08/014/24067, BHF_RG/09/012/28096; Cancer Research UK: CRUK_14136; Chief Scientist Office: CSO_CZB/4/505, CSO_ETM/55; Department of Health: DH_RP-PG-0310-1002; Medical Research Council: MRC_G0401527, MRC_G0600705, MRC_G0700704, MRC_G0801056, MRC_G1000143, MRC_G1002084, MRC_G9815508, MRC_MC_U106179471, MRC_MC_U106188470; NCATS NIH HHS: UL1 TR000439; NCI NIH HHS: R01 CA165001; NCRR NIH HHS: K12 RR023250, U54 RR020278, UL1 RR025005; NHGRI NIH HHS: U01 HG004402; NHLBI NIH HHS: HHSN268201100005C, HHSN268201100006C, HHSN268201100007C, HHSN268201100008C, HHSN268201100009C, HHSN268201100010C, HHSN268201100011C, HHSN268201100012C, P01 HL076491, P01 HL098055, P20 HL113452, R01 HL059367, R01 HL086694, R01 HL087641, R01 HL087679, R01 HL088119, R01 HL103866, R01 HL103931, U01 HL072515, U01 HL084756; NIA NIH HHS: N01AG12109, R01 AG018728; NICHD NIH HHS: R01 HD042157; NIDA NIH HHS: HHSN271201100005C; NIDDK NIH HHS: P30 DK072488; NIGMS NIH HHS: R01 GM053275, U01 GM074518; NIMH NIH HHS: R01 MH081802, RL1 MH083268, U24 MH068457; NLM NIH HHS: R01 LM010098; Wellcome Trust: WT092731, WT097117
Increased tumorigenesis associated with loss of the tumor suppressor gene Cadm1.
Experimental Cancer Genetics, The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK. email@example.com
Background: CADM1 encodes an immunoglobulin superfamily (IGSF) cell adhesion molecule. Inactivation of CADM1, either by promoter hypermethylation or loss of heterozygosity, has been reported in a wide variety of tumor types; it has therefore been postulated to be a tumor suppressor gene.
Findings: We show for the first time that Cadm1 homozygous null mice die significantly faster than wildtype controls due to the spontaneous development of tumors at an earlier age and an increased tumor incidence of predominantly lymphomas, but also some solid tumors. Tumorigenesis was accelerated after irradiation of Cadm1 mice, with the reduced latency in tumor formation suggesting there are genes that collaborate with loss of Cadm1 in tumorigenesis. To identify these co-operating genetic events, we performed a Sleeping Beauty transposon-mediated insertional mutagenesis screen in Cadm1 mice, and identified several common insertion sites (CIS) found specifically on a Cadm1-null background (and not wildtype background).
Conclusion: We confirm that Cadm1 is indeed a bona fide tumor suppressor gene and provide new insights into genetic partners that co-operate in tumorigenesis when Cadm1-expression is lost.
Funded by: Cancer Research UK: 13031
Molecular cancer 2012;11;29
Loss of RASSF1A synergizes with deregulated RUNX2 signaling in tumorigenesis.
Experimental Cancer Genetics, The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK.
The tumor suppressor gene RASSF1A is inactivated through point mutation or promoter hypermethylation in many human cancers. In this study, we conducted a Sleeping Beauty transposon-mediated insertional mutagenesis screen in Rassf1a-null mice to identify candidate genes that collaborate with loss of Rassf1a in tumorigenesis. We identified 10 genes, including the transcription factor Runx2, a transcriptional partner of Yes-associated protein (YAP1) that displays tumor suppressive activity through competing with the oncogenic TEA domain family of transcription factors (TEAD) for YAP1 association. While loss of RASSF1A promoted the formation of oncogenic YAP1-TEAD complexes, the combined loss of both RASSF1A and RUNX2 further increased YAP1-TEAD levels, showing that loss of RASSF1A, together with RUNX2, is consistent with the multistep model of tumorigenesis. Clinically, RUNX2 expression was frequently downregulated in various cancers, and reduced RUNX2 expression was associated with poor survival in patients with diffuse large B-cell or atypical Burkitt/Burkitt-like lymphomas. Interestingly, decreased expression levels of RASSF1 and RUNX2 were observed in both precursor T-cell acute lymphoblastic leukemia and colorectal cancer, further supporting the hypothesis that dual regulation of YAP1-TEAD promotes oncogenic activity. Together, our findings provide evidence that loss of RASSF1A expression switches YAP1 from a tumor suppressor to an oncogene through regulating its association with transcription factors, thereby suggesting a novel mechanism for RASSF1A-mediated tumor suppression.
Funded by: Cancer Research UK: AI2932, C20510/A6997, CRUK_13031, CRUK_A6997, CRUK_A9318; Medical Research Council; Wellcome Trust: WT082356
Cancer research 2012;72;15;3817-3827
Genetic markers for SSG resistance in Leishmania donovani and SSG treatment failure in visceral leishmaniasis patients of the Indian subcontinent.
Department of Biomedical Sciences, Institute of Tropical Medicine Antwerp, Belgium.
The current standard to assess pentavalent antimonial (SSG) susceptibility of Leishmania is a laborious in vitro assay of which the result has little clinical value because SSG-resistant parasites are also found in SSG-cured patients. Candidate genetic markers for clinically relevant SSG-resistant parasites identified by full genome sequencing were here validated on a larger set of clinical strains. We show that 3 genomic locations suffice to specifically detect the SSG-resistant parasites found only in patients experiencing SSG treatment failure. This finding allows the development of rapid assays to monitor the emergence and spread of clinically relevant SSG-resistant Leishmania parasites.
Funded by: Wellcome Trust: 076355, WT 085775/Z/08/Z
The Journal of infectious diseases 2012;206;5;752-5
Plasma HDL cholesterol and risk of myocardial infarction: a mendelian randomisation study.
Department of Pharmacology, University of Pennsylvania, Philadelphia, PA, USA.
Background: High plasma HDL cholesterol is associated with reduced risk of myocardial infarction, but whether this association is causal is unclear. Exploiting the fact that genotypes are randomly assigned at meiosis, are independent of non-genetic confounding, and are unmodified by disease processes, mendelian randomisation can be used to test the hypothesis that the association of a plasma biomarker with disease is causal.
Methods: We performed two mendelian randomisation analyses. First, we used as an instrument a single nucleotide polymorphism (SNP) in the endothelial lipase gene (LIPG Asn396Ser) and tested this SNP in 20 studies (20,913 myocardial infarction cases, 95,407 controls). Second, we used as an instrument a genetic score consisting of 14 common SNPs that exclusively associate with HDL cholesterol and tested this score in up to 12,482 cases of myocardial infarction and 41,331 controls. As a positive control, we also tested a genetic score of 13 common SNPs exclusively associated with LDL cholesterol.
Findings: Carriers of the LIPG 396Ser allele (2·6% frequency) had higher HDL cholesterol (0·14 mmol/L higher, p=8×10⁻¹³) but similar levels of other lipid and non-lipid risk factors for myocardial infarction compared with non-carriers. This difference in HDL cholesterol is expected to decrease risk of myocardial infarction by 13% (odds ratio [OR] 0·87, 95% CI 0·84-0·91). However, we noted that the 396Ser allele was not associated with risk of myocardial infarction (OR 0·99, 95% CI 0·88-1·11, p=0·85). From observational epidemiology, an increase of 1 SD in HDL cholesterol was associated with reduced risk of myocardial infarction (OR 0·62, 95% CI 0·58-0·66). However, a 1 SD increase in HDL cholesterol due to genetic score was not associated with risk of myocardial infarction (OR 0·93, 95% CI 0·68-1·26, p=0·63). For LDL cholesterol, the estimate from observational epidemiology (a 1 SD increase in LDL cholesterol associated with OR 1·54, 95% CI 1·45-1·63) was concordant with that from genetic score (OR 2·13, 95% CI 1·69-2·69, p=2×10⁻¹⁰).
Interpretation: Some genetic mechanisms that raise plasma HDL cholesterol do not seem to lower risk of myocardial infarction. These data challenge the concept that raising of plasma HDL cholesterol will uniformly translate into reductions in risk of myocardial infarction.
Funding: US National Institutes of Health, The Wellcome Trust, European Union, British Heart Foundation, and the German Federal Ministry of Education and Research.
Funded by: British Heart Foundation; Medical Research Council: MC_U137686857; NCRR NIH HHS: UL1 RR024148; NHLBI NIH HHS: N01 HC025195, R00 HL094535; NIDDK NIH HHS: R01 DK072193; Wellcome Trust: 090532
Lancet (London, England) 2012;380;9841;572-80
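The odds ratios, confidence intervals and p-values quoted in the Findings above can be related to one another with a normal approximation on the log-odds scale. The sketch below is only an illustration of that relationship, not the authors' analysis code: it assumes the reported intervals are symmetric Wald-type 95% intervals on the log scale, and the function name `p_from_or_ci` is hypothetical.

```python
import math

def p_from_or_ci(or_point, ci_low, ci_high, z_crit=1.959964):
    """Approximate log(OR), its standard error, z and two-sided p.

    Assumes (our assumption) a Wald-type 95% CI that is symmetric
    on the log-odds scale.
    """
    log_or = math.log(or_point)
    # Half the CI width on the log scale equals z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    # Two-sided tail probability of a standard normal deviate.
    p = math.erfc(abs(z) / math.sqrt(2))
    return log_or, se, z, p

# Values taken from the abstract above.
estimates = {
    "LIPG 396Ser": (0.99, 0.88, 1.11),
    "HDL genetic score": (0.93, 0.68, 1.26),
    "LDL genetic score": (2.13, 1.69, 2.69),
}

for name, (or_point, lo, hi) in estimates.items():
    log_or, se, z, p = p_from_or_ci(or_point, lo, hi)
    print(f"{name}: log OR = {log_or:.3f}, SE = {se:.3f}, z = {z:.2f}, p = {p:.2g}")
```

Because the published values are rounded, the recomputed p-values should be close to, but not identical with, those reported.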
Permissive and restricted virus infection of murine embryonic stem cells.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.
Recent RNA interference (RNAi) studies have identified many host proteins that modulate virus infection, but small interfering RNA 'off-target' effects and the use of transformed cell lines limit their conclusiveness. As murine embryonic stem (mES) cells can be genetically modified and resources exist where many and eventually all known mouse genes are insertionally inactivated, it was reasoned that mES cells would provide a useful alternative to RNAi screens. Beyond allowing investigation of host-pathogen interactions in vitro, mES cells have the potential to differentiate into other primary cell types, as well as being used to generate knockout mice for in vivo studies. However, mES cells are poorly characterized for virus infection. To investigate whether ES cells can be used to explore host-virus interactions, this study characterized the responses of mES cells following infection by herpes simplex virus type 1 (HSV-1) and influenza A virus. HSV-1 replicated lytically in mES cells, although mES cells were less permissive than most other cell types tested. Influenza virus was able to enter mES cells and express some viral proteins, but the replication cycle was incomplete and no infectious virus was produced. Knockdown of the host protein AHCYL1 in mES cells reduced HSV-1 replication, showing the potential for using mES cells to study host-virus interactions. Transcriptional profiling, however, indicated the lack of an efficient innate immune response in these cells. mES cells may thus be useful to identify host proteins that play a role in virus replication, but they are not suitable to determine factors that are involved in innate host defence.
Funded by: Medical Research Council: G9800943, MR/J002232/1; Wellcome Trust
The Journal of general virology 2012;93;Pt 10;2118-30
Bayesian refinement of association signals for 14 loci in 3 common diseases.
The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.
To further investigate susceptibility loci identified by genome-wide association studies, we genotyped 5,500 SNPs across 14 associated regions in 8,000 samples from a control group and 3 diseases: type 2 diabetes (T2D), coronary artery disease (CAD) and Graves' disease. We defined, using Bayes theorem, credible sets of SNPs that were 95% likely, based on posterior probability, to contain the causal disease-associated SNPs. In 3 of the 14 regions, TCF7L2 (T2D), CTLA4 (Graves' disease) and CDKN2A-CDKN2B (T2D), much of the posterior probability rested on a single SNP, and, in 4 other regions (CDKN2A-CDKN2B (CAD) and CDKAL1, FTO and HHEX (T2D)), the 95% sets were small, thereby excluding most SNPs as potentially causal. Very few SNPs in our credible sets had annotated functions, illustrating the limitations in understanding the mechanisms underlying susceptibility to common diseases. Our results also show the value of more detailed mapping to target sequences for functional studies.
Funded by: Arthritis Research UK: 17552; Chief Scientist Office: CZB/4/540, ETM/137, ETM/75; Medical Research Council: G0600329, G0800759, G19/9, G9521010; Wellcome Trust: 083948, 090532, 091157, 095552, 098051
Nature genetics 2012;44;12;1294-301
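The credible sets described in the abstract above can be illustrated with a short sketch. It assumes exactly one causal SNP per region and an equal prior for every SNP, so that each SNP's posterior probability is its Bayes factor divided by the sum over the region; these assumptions, the function name `credible_set`, and the SNP labels and Bayes factors are all hypothetical and are not taken from the paper.

```python
import math

def credible_set(log_bayes_factors, coverage=0.95):
    """Return (SNPs in the credible set, posterior probability per SNP).

    log_bayes_factors maps SNP id -> natural-log Bayes factor.
    Assumes one causal SNP per region and a flat prior (our assumptions).
    """
    # Subtract the maximum before exponentiating for numerical stability.
    max_lbf = max(log_bayes_factors.values())
    weights = {s: math.exp(v - max_lbf) for s, v in log_bayes_factors.items()}
    total = sum(weights.values())
    posterior = {s: w / total for s, w in weights.items()}

    # Add SNPs in order of decreasing posterior until coverage is reached.
    chosen, cumulative = [], 0.0
    for snp in sorted(posterior, key=posterior.get, reverse=True):
        chosen.append(snp)
        cumulative += posterior[snp]
        if cumulative >= coverage:
            break
    return chosen, posterior

# Hypothetical region where one SNP dominates the posterior.
peaked = {"snp_a": 12.0, "snp_b": 8.0, "snp_c": 7.5, "snp_d": 2.0, "snp_e": 0.0}
# Hypothetical region where the signal cannot be resolved.
flat = {"s1": 5.0, "s2": 5.0, "s3": 5.0, "s4": 5.0}

peaked_set, peaked_post = credible_set(peaked)
flat_set, _ = credible_set(flat)
print(len(peaked_set), len(flat_set))
```

A small set, as in the first region, excludes most SNPs as potentially causal; a set containing every SNP, as in the second, gives no refinement.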
Nuclear receptor binding protein 1 regulates intestinal progenitor cell homeostasis and tumour formation.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK.
Genetic screens in simple model organisms have identified many of the key components of the conserved signal transduction pathways that are oncogenic when misregulated. Here, we identify H37N21.1 as a gene that regulates vulval induction in let-60(n1046gf), a strain with a gain-of-function mutation in the Caenorhabditis elegans Ras orthologue, and show that somatic deletion of Nrbp1, the mouse orthologue of this gene, results in an intestinal progenitor cell phenotype that leads to profound changes in the proliferation and differentiation of all intestinal cell lineages. We show that Nrbp1 interacts with key components of the ubiquitination machinery and that loss of Nrbp1 in the intestine results in the accumulation of Sall4, a key mediator of stem cell fate, and of Tsc22d2. We also reveal that somatic loss of Nrbp1 results in tumourigenesis, with haematological and intestinal tumours predominating, and that nuclear receptor binding protein 1 (NRBP1) is downregulated in a range of human tumours, where low expression correlates with a poor prognosis. Thus, NRBP1 is a conserved regulator of cell fate that plays an important role in tumour suppression.
Funded by: Cancer Research UK: 13031; Medical Research Council: G0600127; Wellcome Trust
The EMBO journal 2012;31;11;2486-97
Sequencing and characterization of the FVB/NJ mouse genome.
Background: The FVB/NJ mouse strain has its origins in a colony of outbred Swiss mice established in 1935 at the National Institutes of Health. Mice derived from this source were selectively bred for sensitivity to histamine diphosphate and the B strain of Friend leukemia virus. This led to the establishment of the FVB/N inbred strain, which was subsequently imported to the Jackson Laboratory and designated FVB/NJ. The FVB/NJ mouse has several distinct characteristics, such as large pronuclear morphology, vigorous reproductive performance, and consistently large litters that make it highly desirable for transgenic strain production and general purpose use.
Results: Using next-generation sequencing technology, we have sequenced the genome of FVB/NJ to approximately 50-fold coverage, and have generated a comprehensive catalog of single nucleotide polymorphisms, small insertion/deletion polymorphisms, and structural variants, relative to the reference C57BL/6J genome. We have examined a previously identified quantitative trait locus for atherosclerosis susceptibility on chromosome 10 and identify several previously unknown candidate causal variants.
Conclusion: The sequencing of the FVB/NJ genome and generation of this catalog has increased the number of known variant sites in FVB/NJ by a factor of four, and will help accelerate the identification of the precise molecular variants that are responsible for phenotypes observed in this widely used strain.
Funded by: Cancer Research UK; Medical Research Council: MRC_G0800024; NIH HHS: P40 OD010972; Wellcome Trust
Genome biology 2012;13;8;R72
Enhanced peptide identification by electron transfer dissociation using an improved Mascot Percolator.
Proteomic Mass Spectrometry, Wellcome Trust Sanger Institute, Hinxton, Cambridge.
Peptide identification using tandem mass spectrometry is a core technology in proteomics. Latest generations of mass spectrometry instruments enable the use of electron transfer dissociation (ETD) to complement collision induced dissociation (CID) for peptide fragmentation. However, a critical limitation to the use of ETD has been the lack of optimal database search software. Percolator is a post-search algorithm, which uses semi-supervised machine learning to improve the rate of peptide spectrum identifications (PSMs) together with providing reliable significance measures. We have previously interfaced the Mascot search engine with Percolator and demonstrated sensitivity and specificity benefits with CID data. Here, we report recent developments in the Mascot Percolator V2.0 software including an improved feature calculator and support for a wider range of ion series. The updated software is applied to the analysis of several CID and ETD fragmented peptide data sets. This version of Mascot Percolator increases the number of CID PSMs by up to 80% and ETD PSMs by up to 60% at a 0.01 q-value (1% false discovery rate) threshold over a standard Mascot search, notably recovering PSMs from high charge state precursor ions. The greatly increased number of PSMs and peptide coverage afforded by Mascot Percolator has enabled a fuller assessment of CID/ETD complementarity to be performed. Using a data set of CID and ETcaD spectral pairs, we find that at a 1% false discovery rate, the overlap in peptide identifications by CID and ETD is 83%, which is significantly higher than that obtained using either stand-alone Mascot (69%) or OMSSA (39%). We conclude that Mascot Percolator is a highly sensitive and accurate post-search algorithm for peptide identification and allows direct comparison of peptide identifications using multiple alternative fragmentation techniques.
Funded by: Wellcome Trust: 079643/Z/06/Z
Molecular & cellular proteomics : MCP 2012;11;8;478-91
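The 0.01 q-value threshold mentioned in the abstract above can be illustrated with a simple target-decoy calculation. This is a minimal sketch of q-value thresholding only, not the Mascot Percolator implementation: it assumes each PSM carries a single score and a decoy flag, estimates the false discovery rate as decoys divided by targets, and omits the semi-supervised re-scoring that Percolator performs. The function names and the example scores are hypothetical.

```python
def q_values(psms):
    """Assign q-values to PSMs given as (score, is_decoy) pairs.

    Returns (score, is_decoy, q) tuples sorted by decreasing score.
    FDR at a threshold is estimated as decoys / targets (our assumption).
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = decoys = 0
    fdrs = []
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # The q-value is the smallest FDR at this score or any more lenient one.
    qs = []
    running_min = 1.0
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qs.append(running_min)
    qs.reverse()
    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, qs)]

def accepted_targets(psms, threshold=0.01):
    """Count target PSMs whose q-value is at or below the threshold."""
    return sum(1 for _, is_decoy, q in q_values(psms) if not is_decoy and q <= threshold)

# Hypothetical scores: False marks a target PSM, True a decoy PSM.
example = [(10, False), (9, False), (8, False), (7, False),
           (6, True), (5, False), (4, True)]
print(accepted_targets(example, threshold=0.01))
```

Comparing the accepted count before and after re-scoring is one way to express the kind of gain the abstract reports, since better separation of targets from decoys lets more target PSMs pass the same threshold.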
Genomic variation in the vomeronasal receptor gene repertoires of inbred mice.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.
Background: Vomeronasal receptors (VRs), expressed in sensory neurons of the vomeronasal organ, are thought to bind pheromones and mediate innate behaviours. The mouse reference genome has over 360 functional VRs arranged in highly homologous clusters, but the vast majority are of unknown function. Differences in these receptors within and between closely related species of mice are likely to underpin a range of behavioural responses. To investigate these differences, we interrogated the VR gene repertoire from 17 inbred strains of mice using massively parallel sequencing.
Results: Approximately half of the 6222 VR genes that we investigated could be successfully resolved, and those that were unambiguously mapped resulted in an extremely accurate dataset. Collectively VRs have over twice the coding sequence variation of the genome average; but we identify striking non-random distribution of these variants within and between genes, clusters, clades and functional classes of VRs. We show that functional VR gene repertoires differ considerably between different Mus subspecies and species, suggesting these receptors may play a role in mediating behavioural adaptations. Finally, we provide evidence that widely-used, highly inbred laboratory-derived strains have a greatly reduced, but not entirely redundant capacity for differential pheromone-mediated behaviours.
Conclusions: Together our results suggest that the unusually variable VR repertoires of mice have a significant role in encoding differences in olfactory-mediated responses and behaviours. Our dataset has expanded over nine fold the known number of mouse VR alleles, and will enable mechanistic analyses into the genetics of innate behavioural differences in mice.
Funded by: Wellcome Trust: 098051
BMC genomics 2012;13;415
Deleterious- and disease-allele prevalence in healthy individuals: insights from current predictions, mutation databases, and population-scale resequencing.
The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.
We have assessed the numbers of potentially deleterious variants in the genomes of apparently healthy humans by using (1) low-coverage whole-genome sequence data from 179 individuals in the 1000 Genomes Pilot Project and (2) current predictions and databases of deleterious variants. Each individual carried 281-515 missense substitutions, 40-85 of which were homozygous, predicted to be highly damaging. They also carried 40-110 variants classified by the Human Gene Mutation Database (HGMD) as disease-causing mutations (DMs), 3-24 variants in the homozygous state, and many polymorphisms putatively associated with disease. Whereas many of these DMs are likely to represent disease-allele-annotation errors, between 0 and 8 DMs (0-1 homozygous) per individual are predicted to be highly damaging, and some of them provide information of medical relevance. These analyses emphasize the need for improved annotation of disease alleles both in mutation databases and in the primary literature; some HGMD mutation data have been recategorized on the basis of the present findings, an iterative process that is both necessary and ongoing. Our estimates of deleterious-allele numbers are likely to be subject to both overcounting and undercounting. However, our current best mean estimates of ~400 damaging variants and ~2 bona fide disease mutations per individual are likely to increase rather than decrease as sequencing studies ascertain rare variants more effectively and as additional disease alleles are discovered.
Funded by: Wellcome Trust: WT085532, WT098051
American journal of human genetics 2012;91;6;1022-32
Next-generation sequencing of experimental mouse strains.
Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland. Binnaz.Yalcin@unil.ch
Since the turn of the century the complete genome sequence of just one mouse strain, C57BL/6J, has been available. Knowing the sequence of this strain has enabled large-scale forward genetic screens to be performed, the creation of an almost complete set of embryonic stem (ES) cell lines with targeted alleles for protein-coding genes, and the generation of a rich catalog of mouse genomic variation. However, many experiments that use other common laboratory mouse strains have been hindered by a lack of whole-genome sequence data for these strains. The last 5 years have witnessed a revolution in DNA sequencing technologies. Recently, these technologies have been used to expand the repertoire of fully sequenced mouse genomes. In this article we review the main findings of these studies and discuss how the sequence of mouse genomes is helping pave the way from sequence to phenotype. Finally, we discuss the prospects for using de novo assembly techniques to obtain high-quality assembled genome sequences of these laboratory mouse strains, and what advances in sequencing technologies may be required to achieve this goal.
Funded by: Cancer Research UK: 13031; Medical Research Council: G0800024; Wellcome Trust: 090532
Mammalian genome : official journal of the International Mammalian Genome Society 2012;23;9-10;490-8
WormBase 2012: more genomes, more data, new website.
Division of Biology 156-29, California Institute of Technology, Pasadena, CA 91125, USA.
Since its release in 2000, WormBase (http://www.wormbase.org) has grown from a small resource focusing on a single species and serving a dedicated research community, to one now spanning 15 species essential to the broader biomedical and agricultural research fields. To enhance the rate of curation, we have automated the identification of key data in the scientific literature and use similar methodology for data extraction. To ease access to the data, we are collaborating with journals to link entities in research publications to their report pages at WormBase. To facilitate discovery, we have added new views of the data, integrated large-scale datasets and expanded descriptions of models for human disease. Finally, we have introduced a dramatic overhaul of the WormBase website for public beta testing. Designed to balance complexity and usability, the new site is species-agnostic, highly customizable, and interactive. Casual users and developers alike will be able to leverage the public RESTful application programming interface (API) to generate custom data mining solutions and extensions to the site. We report on the growth of our database and on our work in keeping pace with the growing demand for data, efforts to anticipate the requirements of users and new collaborations with the larger science community.
Funded by: Howard Hughes Medical Institute; Medical Research Council: G070119, G0701197; NHGRI NIH HHS: P41 HG02223, P41-HG02223
Nucleic acids research 2012;40;Database issue;D735-41
Bcl11a is essential for lymphoid development and negatively regulates p53.
Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China.
Transcription factors play important roles in lymphopoiesis. We have previously demonstrated that Bcl11a is essential for normal lymphocyte development in the mouse embryo. We report here that, in the adult mouse, Bcl11a is expressed in most hematopoietic cells and is highly enriched in B cells, early T cell progenitors, common lymphoid progenitors (CLPs), and hematopoietic stem cells (HSCs). In the adult mouse, Bcl11a deletion causes apoptosis in early B cells and CLPs and completely abolishes the lymphoid development potential of HSCs to B, T, and NK cells. Myeloid development, in contrast, is not obviously affected by the loss of Bcl11a. Bcl11a regulates expression of Bcl2, Bcl2-xL, and Mdm2, which inhibits p53 activities. Overexpression of Bcl2 and Mdm2, or p53 deficiency, rescues both lethality and proliferative defects in Bcl11a-deficient early B cells and enables the mutant CLPs to differentiate to lymphocytes. Bcl11a is therefore essential for lymphopoiesis and negatively regulates p53 activities. Deletion of Bcl11a may represent a new approach for generating a mouse model that completely lacks an adaptive immune system.
Funded by: Wellcome Trust: 098051
The Journal of experimental medicine 2012;209;13;2467-83
Exome sequencing of gastric adenocarcinoma identifies recurrent somatic mutations in cell adhesion and chromatin remodeling genes.
Cellular and Molecular Research, National Cancer Centre, Singapore.
Gastric cancer is a major cause of global cancer mortality. We surveyed the spectrum of somatic alterations in gastric cancer by sequencing the exomes of 15 gastric adenocarcinomas and their matched normal DNAs. Frequently mutated genes in the adenocarcinomas included TP53 (11/15 tumors), PIK3CA (3/15) and ARID1A (3/15). Cell adhesion was the most enriched biological pathway among the frequently mutated genes. A prevalence screening confirmed mutations in FAT4, a cadherin family gene, in 5% of gastric cancers (6/110) and FAT4 genomic deletions in 4% (3/83) of gastric tumors. Frequent mutations in chromatin remodeling genes (ARID1A, MLL3 and MLL) also occurred in 47% of the gastric cancers. We detected ARID1A mutations in 8% of tumors (9/110), which were associated with concurrent PIK3CA mutations and microsatellite instability. In functional assays, we observed both FAT4 and ARID1A to exert tumor-suppressor activity. Somatic inactivation of FAT4 and ARID1A may thus be key tumorigenic events in a subset of gastric cancers.
Nature genetics 2012;44;5;570-4