Sanger Institute - Publications 2012
-
BLUEPRINT to decode the epigenetic signature written in blood.
Nature biotechnology 2012;30;3;224-6
PUBMED: 22398613; DOI: 10.1038/nbt.2153
-
Compound inheritance of a low-frequency regulatory SNP and a rare null mutation in exon-junction complex subunit RBM8A causes TAR syndrome.
Department of Haematology, University of Cambridge, Cambridge, UK. caa@sanger.ac.uk
The exon-junction complex (EJC) performs essential RNA processing tasks. Here, we describe the first human disorder, thrombocytopenia with absent radii (TAR), caused by deficiency in one of the four EJC subunits. Compound inheritance of a rare null allele and one of two low-frequency SNPs in the regulatory regions of RBM8A, encoding the Y14 subunit of EJC, causes TAR. We found that this inheritance mechanism explained 53 of 55 cases (P < 5 × 10(-228)) of the rare congenital malformation syndrome. Of the 53 cases with this inheritance pattern, 51 carried a submicroscopic deletion of 1q21.1 that has previously been associated with TAR, and two carried a truncation or frameshift null mutation in RBM8A. We show that the two regulatory SNPs result in diminished RBM8A transcription in vitro and that Y14 expression is reduced in platelets from individuals with TAR. Our data implicate Y14 insufficiency and, presumably, an EJC defect as the cause of TAR syndrome.
Funded by: British Heart Foundation: FS/09/039, RG/09/12/28096; Wellcome Trust: WT-082597/Z/07/Z, WT-084183/2/07/2, WT091310
Nature genetics 2012;44;4;435-9, S1-2
PUBMED: 22366785; DOI: 10.1038/ng.1083
-
High-throughput decoding of antitrypanosomal drug efficacy and resistance.
London School of Hygiene and Tropical Medicine, Keppel Street, London, WC1E 7HT, UK.
The concept of disease-specific chemotherapy was developed a century ago. Dyes and arsenical compounds that displayed selectivity against trypanosomes were central to this work, and the drugs that emerged remain in use for treating human African trypanosomiasis (HAT). The importance of understanding the mechanisms underlying selective drug action and resistance for the development of improved HAT therapies has been recognized, but these mechanisms have remained largely unknown. Here we use all five current HAT drugs for genome-scale RNA interference target sequencing (RIT-seq) screens in Trypanosoma brucei, revealing the transporters, organelles, enzymes and metabolic pathways that function to facilitate antitrypanosomal drug action. RIT-seq profiling identifies both known drug importers and the only known pro-drug activator, and links more than fifty additional genes to drug action. A bloodstream stage-specific invariant surface glycoprotein (ISG75) family mediates suramin uptake, and the AP1 adaptin complex, lysosomal proteases and major lysosomal transmembrane protein, as well as spermidine and N-acetylglucosamine biosynthesis, all contribute to suramin action. Further screens link ubiquinone availability to nitro-drug action, plasma membrane P-type H(+)-ATPases to pentamidine action, and trypanothione and several putative kinases to melarsoprol action. We also demonstrate a major role for aquaglyceroporins in pentamidine and melarsoprol cross-resistance. These advances in our understanding of the mechanisms of antitrypanosomal drug efficacy and resistance will aid the rational design of new therapies, help to combat drug resistance and provide unprecedented molecular insight into the mode of action of antitrypanosomal drugs.
Funded by: Wellcome Trust: 085775/Z/08/Z, 090007/Z/09/Z, 093010/Z/10/Z
Nature 2012;482;7384;232-6
PUBMED: 22278056; PMC: 3303116; DOI: 10.1038/nature10771
-
Comprehensive comparison of three commercial human whole-exome capture platforms.
Beijing Genomics Institute at Shenzhen, 11F, Bei Shan Industrial Zone, Yantian District, Shenzhen 518083, China. asan@genomics.org.cn
Background: Exome sequencing, which allows the global analysis of protein coding sequences in the human genome, has become an effective and affordable approach to detecting causative genetic mutations in diseases. Currently, there are several commercial human exome capture platforms; however, the relative performances of these have not been characterized sufficiently to know which is best for a particular study.
Results: We comprehensively compared three platforms: NimbleGen's Sequence Capture Array and SeqCap EZ, and Agilent's SureSelect. We assessed their performance in a variety of ways, including number of genes covered and capture efficacy. Differences that may impact on the choice of platform were that Agilent SureSelect covered approximately 1,100 more genes, while NimbleGen provided better flanking sequence capture. Although all three platforms achieved similar capture specificity of targeted regions, the NimbleGen platforms showed better uniformity of coverage and greater genotype sensitivity at 30- to 100-fold sequencing depth. All three platforms showed similar power in exome SNP calling, including medically relevant SNPs. Compared with genotyping and whole-genome sequencing data, the three platforms achieved a similar accuracy of genotype assignment and SNP detection. Importantly, all three platforms showed similar levels of reproducibility, GC bias and reference allele bias.
Conclusions: We demonstrate key differences between the three platforms, particularly the advantages of solution-based over array-based capture and the importance of a large gene target set.
Genome biology 2011;12;9;R95
PUBMED: 21955857; PMC: 3308058; DOI: 10.1186/gb-2011-12-9-r95
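Coverage uniformity can be summarized in several ways. One simple, commonly used summary is the fraction of targeted bases sequenced to at least some fraction of the mean target depth. The sketch below assumes per-base depths over the target regions are already available as a list; the function name `coverage_uniformity` and the default cut-off of 0.2 x mean depth are illustrative assumptions, not necessarily the metric used in the platform comparison above.

```python
def coverage_uniformity(depths, fraction_of_mean=0.2):
    """Fraction of target bases whose depth is >= fraction_of_mean * mean depth.

    depths: per-base read depths over the capture targets (hypothetical input).
    fraction_of_mean: illustrative cut-off relative to the mean depth.
    """
    if not depths:
        raise ValueError("need at least one target base")
    mean_depth = sum(depths) / len(depths)
    cutoff = fraction_of_mean * mean_depth
    # Count bases that reach the cut-off; a more uniform capture gives a value nearer 1.
    return sum(1 for d in depths if d >= cutoff) / len(depths)
```

For example, depths of [100, 100, 100, 10] have a mean of 77.5, so the cut-off is 15.5 and three of four bases pass, giving 0.75. A relative cut-off keeps the summary comparable between platforms sequenced to different overall depths.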
-
An evaluation of different meta-analysis approaches in the presence of allelic heterogeneity.
Department of Human Genetics, Wellcome Trust Sanger Institute, Hinxton, UK.
Meta-analysis has proven a useful tool in genetic association studies. Allelic heterogeneity can arise from ethnic background differences across the populations being meta-analyzed (for example, in the search for common-frequency variants through genome-wide association studies) and from the presence of multiple low-frequency and rare associated variants in the same functional unit of interest (for example, within a gene or a regulatory region). The latter challenge will be increasingly relevant in whole-genome and whole-exome sequencing studies investigating association with complex traits. Here, we evaluate the performance of different approaches to meta-analysis in the presence of allelic heterogeneity. We simulate allelic heterogeneity scenarios in three populations and examine the performance of current approaches to the analysis of these data. We show that current approaches can detect only a small fraction of common-frequency causal variants. We also find that for low-frequency variants with large effects (odds ratios 2-3), single-point tests have high power, but also high false-positive rates. P-value based meta-analysis of summary results from allele-matching locus-wide tests outperforms collapsing approaches. We conclude that current strategies for the combination of genetic association data in the presence of allelic heterogeneity are insufficiently powered.
European journal of human genetics : EJHG 2012;20;6;709-12
PUBMED: 22293689; DOI: 10.1038/ejhg.2011.274
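As an illustration of what "P-value based meta-analysis" can mean in practice, the sketch below uses Stouffer's weighted Z method with square-root-of-sample-size weights. This is one widely used P-value combination rule, assumed here for illustration; it is not necessarily the exact scheme evaluated in the study above. The one-sided P-values must refer to the same allele and direction in every study, which is one reason allele matching matters. The function name `stouffer_weighted_z` is hypothetical.

```python
import math
from statistics import NormalDist


def stouffer_weighted_z(p_values, sample_sizes):
    """Combine one-sided P-values with Stouffer's weighted Z (weights = sqrt(N)).

    Each P-value must lie strictly between 0 and 1 and be oriented to the
    same effect allele in every study (assumed here, not checked).
    """
    if not p_values or len(p_values) != len(sample_sizes):
        raise ValueError("need one sample size per P-value")
    nd = NormalDist()
    weights = [math.sqrt(n) for n in sample_sizes]
    # Convert each one-sided P-value to a Z score.
    z_scores = [nd.inv_cdf(1.0 - p) for p in p_values]
    # Weighted sum of Z scores, rescaled so the combined Z is standard normal under H0.
    z = sum(w * zi for w, zi in zip(weights, z_scores)) / math.sqrt(
        sum(w * w for w in weights)
    )
    return z, 1.0 - nd.cdf(z)  # combined Z and its one-sided P-value
```

With a single study the combined P-value equals the input; two equally sized studies that each give P = 0.05 combine to a Z larger by a factor of sqrt(2), and hence a smaller P-value.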
-
Characterization of within-host Plasmodium falciparum diversity using next-generation sequence data.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom. sa3@sanger.ac.uk
The composition of multi-clonal malarial infections, and the epidemiological factors that shape their diversity, remain poorly understood. Traditionally, within-host diversity has been defined in terms of the multiplicity of infection (MOI) derived by PCR-based genotyping. Massively parallel, single-molecule sequencing technologies now enable individual read counts to be derived on genome-wide datasets, facilitating the development of new statistical approaches to describe within-host diversity. In this class of measures, the F(WS) metric characterizes within-host diversity and its relationship to population-level diversity. Using P. falciparum field isolates from patients in West Africa, we explore the relationship between the traditional MOI and F(WS) approaches. F(WS) statistics were derived from read count data at 86,158 SNPs in 64 samples sequenced on the Illumina GA platform. MOI estimates were derived by PCR at the msp-1 and msp-2 loci. Significant correlations were observed between the two measures, particularly with the msp-1 locus (P = 5.92×10(-5)). The F(WS) metric should be more robust than the PCR-based approach owing to its reduced sensitivity to potential locus-specific artifacts. Furthermore, the F(WS) metric captures information on a range of parameters that influence out-crossing risk, including the number of clones (MOI), their relative proportions and their genetic divergence. This approach should provide novel insights into the factors that correlate with, and shape, within-host diversity.
Funded by: Howard Hughes Medical Institute: 55005502; Medical Research Council; Wellcome Trust
PloS one 2012;7;2;e32891
PUBMED: 22393456; PMC: 3290604; DOI: 10.1371/journal.pone.0032891
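The F(WS) metric compares within-host heterozygosity (H_W, computed from a sample's read-count allele frequencies) with population-level heterozygosity (H_S), as F(WS) = 1 - H_W/H_S, so a clonal infection scores near 1 and a highly mixed one scores lower. The sketch below is a simplified version under stated assumptions: biallelic SNPs, per-sample reference/alternate read counts and population allele frequencies as plain lists, and a plain average across SNPs. Published estimators may aggregate SNPs differently, and the function names are hypothetical.

```python
def heterozygosity(p):
    """Expected heterozygosity 1 - p^2 - (1 - p)^2 at a biallelic SNP."""
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def fws(ref_counts, alt_counts, pop_alt_freqs):
    """Simplified F_WS = 1 - mean(H_W) / mean(H_S) for one sample.

    ref_counts, alt_counts: per-SNP read counts for the sample (hypothetical layout).
    pop_alt_freqs: per-SNP alternate-allele frequency in the population.
    """
    if not (len(ref_counts) == len(alt_counts) == len(pop_alt_freqs)):
        raise ValueError("inputs must have one entry per SNP")
    h_w, h_s = [], []
    for ref, alt, p_pop in zip(ref_counts, alt_counts, pop_alt_freqs):
        depth = ref + alt
        if depth == 0:
            continue  # no reads: within-host allele frequency is undefined
        h_w.append(heterozygosity(alt / depth))  # within-host diversity
        h_s.append(heterozygosity(p_pop))        # population-level diversity
    if not h_s or sum(h_s) == 0:
        raise ValueError("no informative SNPs")
    return 1.0 - (sum(h_w) / len(h_w)) / (sum(h_s) / len(h_s))
```

A sample whose reads all carry a single allele at every SNP has H_W = 0 and therefore F(WS) = 1; a sample with a 50/50 read split at a SNP whose population frequency is 0.5 gives F(WS) = 0.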
-
Frequency and patterns of protease gene resistance mutations in HIV-infected patients treated with lopinavir/ritonavir as their first protease inhibitor.
Medical Research Council Clinical Trials Unit, St Stephen's Centre, Chelsea and Westminster Hospital, 125 Kingsway, London, UK. t.barber@nhs.net
Background: Selection of protease mutations on antiretroviral therapy (ART) including a ritonavir-boosted protease inhibitor (PI) has been reported infrequently. Few data from long-term cohorts exist on the incidence of resistance or on the mutational patterns that emerge with different PIs.
Methods: We studied UK patients receiving lopinavir/ritonavir as their first PI, either while naive to ART or having previously received non-PI-based ART. Virological failure was defined as viral load ≥ 400 copies/mL after previous suppression <400 copies/mL, or failure to achieve <400 copies/mL during the first 6 months. pol sequences obtained whilst failing lopinavir or within 30 days after stopping it were analysed. Major and minor mutations (IAS-USA 2008, after exclusion of polymorphisms) were considered. Predicted susceptibility was determined using the Stanford HIVdb algorithm.
Results: A total of 3,056 patients were followed for a median (IQR) of 14 (6-30) months, of whom 811 (27%) experienced virological failure. Of these, resistance test results were available for 291 (36%). One or more protease mutations were detected in 32 (11%) patients; the most frequent were I54V (n = 12), M46I (n = 11), V82A (n = 7) and L76V (n = 3). No association with viral subtype was evident. Of the 32 patients with protease mutations, many retained virus predicted to be susceptible to lopinavir (14, 44%), tipranavir (26, 81%) and darunavir (27, 84%).
Conclusions: This study reflects the experience of patients in routine care. Selection of protease gene mutations by lopinavir/ritonavir occurred at a much higher rate than in clinical trials. The mutations observed showed only partial overlap with those previously identified by structural chemistry models, serial cell culture passage and genotype-phenotype analyses. There remained a low degree of predicted cross-resistance to other widely used PIs.
Funded by: Medical Research Council: G00001999, G0600337, G0900274
The Journal of antimicrobial chemotherapy 2012;67;4;995-1000
PUBMED: 22258921; DOI: 10.1093/jac/dkr569
-
From HLA association to function.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK. barrett@sanger.ac.uk
A new study refines the association signals for rheumatoid arthritis susceptibility in the major histocompatibility complex (MHC) region to five amino-acid positions encoded in three HLA genes, all within peptide-binding grooves. By adapting statistical methods from genome-wide association studies (GWAS) and using imputation from a large reference panel, the authors demonstrate the potential of this approach to identify functional variants in associated regions.
Nature genetics 2012;44;3;235-6
PUBMED: 22366857; DOI: 10.1038/ng.2207
-
Epigenome-Wide Scans Identify Differentially Methylated Regions for Age and Age-Related Phenotypes in a Healthy Ageing Population.
Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom.
Age-related changes in DNA methylation have been implicated in cellular senescence and longevity, yet the causes and functional consequences of these changes remain unclear. To elucidate the role of age-related epigenetic changes in healthy ageing and potential longevity, we tested whole-blood DNA methylation patterns in 172 female twins aged 32 to 80 for association with age and age-related phenotypes. Twin-based estimates for DNA methylation levels at 26,690 CpG sites showed a mean genome-wide heritability of 18%, which was supported by the identification of 1,537 CpG sites with methylation QTLs in cis at FDR 5%. We performed genome-wide analyses to discover differentially methylated regions (DMRs) for sixteen age-related phenotypes (ap-DMRs) and for chronological age (a-DMRs). Epigenome-wide association scans (EWAS) identified ap-DMRs associated with LDL (STAT5A), lung function (WT1) and maternal longevity (ARL4A, TBX20). In contrast, EWAS for chronological age identified hundreds of predominantly hypermethylated a-DMRs (490 a-DMRs at FDR 5%), of which only one (TBX20) was also associated with an age-related phenotype. Therefore, the majority of age-related changes in DNA methylation are not associated with phenotypic measures of healthy ageing in later life. We replicated a large proportion of a-DMRs in a sample of 44 younger adult monozygotic (MZ) twins aged 20 to 61, suggesting that a-DMRs may initiate at an earlier age. We next explored potential genetic and environmental mechanisms underlying a-DMRs and ap-DMRs. Genome-wide overlap across cis-meQTLs, genotype-phenotype associations and EWAS ap-DMRs identified CpG sites that had cis-meQTLs with evidence for genotype-phenotype association, where the CpG site was also an ap-DMR for the same phenotype. MZ twin methylation difference analyses identified one potential environmentally mediated ap-DMR, associated with total cholesterol and LDL (CSMD1).
Our results suggest that, in a small set of genes, DNA methylation may be a candidate mechanism for mediating not only environmental but also genetic effects on age-related phenotypes.
PLoS genetics 2012;8;4;e1002629
PUBMED: 22532803; PMC: 3330116; DOI: 10.1371/journal.pgen.1002629
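Twin studies estimate heritability by comparing how similar monozygotic (MZ) and dizygotic (DZ) pairs are. A minimal illustration, assuming per-CpG MZ and DZ twin-pair correlations are already available, is Falconer's approximation h^2 = 2(r_MZ - r_DZ), averaged across sites to give a "mean genome-wide" figure. This is a swapped-in textbook estimator for illustration only; the study above may have fitted a different (for example variance-component) model, and the function names are hypothetical.

```python
def falconer_h2(r_mz, r_dz):
    """Falconer's approximation h^2 = 2 * (r_MZ - r_DZ), clipped to [0, 1]."""
    return min(1.0, max(0.0, 2.0 * (r_mz - r_dz)))


def mean_heritability(per_site_r_mz, per_site_r_dz):
    """Average Falconer heritability over CpG sites (hypothetical inputs)."""
    if not per_site_r_mz or len(per_site_r_mz) != len(per_site_r_dz):
        raise ValueError("need matching, non-empty per-site correlations")
    estimates = [falconer_h2(m, d) for m, d in zip(per_site_r_mz, per_site_r_dz)]
    return sum(estimates) / len(estimates)
```

For instance, a site with r_MZ = 0.5 and r_DZ = 0.41 gives h^2 = 0.18, while a site where DZ pairs are as similar as MZ pairs gives 0. Clipping keeps noisy per-site estimates inside the valid range before averaging.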
-
A robust clustering algorithm for identifying problematic samples in genome-wide association studies.
Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford OX3 7BN, UK.
Summary: High-throughput genotyping arrays provide an efficient way to survey single nucleotide polymorphisms (SNPs) across the genome in large numbers of individuals. Downstream analysis of the data, for example in genome-wide association studies (GWAS), often involves statistical models of genotype frequencies across individuals. The complexities of the sample collection process and the potential for errors in the experimental assay can lead to biases and artefacts in an individual's inferred genotypes. Rather than attempting to model these complications, it has become a standard practice to remove individuals whose genome-wide data differ from the sample at large. Here we describe a simple, but robust, statistical algorithm to identify samples with atypical summaries of genome-wide variation. Its use as a semi-automated quality control tool is demonstrated using several summary statistics, selected to identify different potential problems, and it is applied to two different genotyping platforms and sample collections.
Availability: The algorithm is written in R and is freely available at www.well.ox.ac.uk/chris-spencer
Contact: chris.spencer@well.ox.ac.uk
Supplementary data are available at Bioinformatics online.
Funded by: Wellcome Trust: 075491/Z/04/B, 084575/Z/08/Z, 090532/Z/09/Z
Bioinformatics (Oxford, England) 2012;28;1;134-5
PUBMED: 22057162; PMC: 3244763; DOI: 10.1093/bioinformatics/btr599
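The idea of flagging samples with atypical genome-wide summaries can be illustrated with a much simpler robust rule than the clustering algorithm described above: flag any sample whose summary statistic (for example, heterozygosity rate or call rate) lies more than a chosen number of robust standard deviations from the median, using the median absolute deviation (MAD). This is an assumed stand-in for illustration, not the published R algorithm; the function name and default threshold of 3 are hypothetical.

```python
from statistics import median


def flag_outliers(values, threshold=3.0):
    """Flag values more than `threshold` robust SDs (1.4826 * MAD) from the median."""
    if not values:
        raise ValueError("need at least one sample")
    med = median(values)
    mad = median(abs(v - med) for v in values)
    scale = 1.4826 * mad  # MAD rescaled to match the SD under normality
    if scale == 0:
        # Degenerate case: most samples are identical, so flag anything off the median.
        return [v != med for v in values]
    return [abs(v - med) / scale > threshold for v in values]
```

Because the median and MAD are barely moved by a few extreme samples, one sample with a heterozygosity of 0.60 among others near 0.30 is flagged, whereas a mean/SD rule can be inflated by the very outliers it is meant to catch.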
-
The genome of Mycobacterium africanum West African 2 reveals a lineage-specific locus and genome erosion common to the M. tuberculosis complex.
Wellcome Trust Genome Campus, Wellcome Trust Sanger Institute, Hinxton, UK.
Background: M. africanum West African 2 constitutes an ancient lineage of the M. tuberculosis complex that commonly causes human tuberculosis in West Africa and has an attenuated phenotype relative to M. tuberculosis.
In search of candidate genes underlying these differences, the genome of M. africanum West African 2 was sequenced using classical capillary sequencing techniques. Our findings reveal a unique sequence, RD900, that was independently lost during the evolution of two important lineages within the complex: the "modern" M. tuberculosis group and the lineage leading to M. bovis. Closely related to M. bovis and other animal strains within the M. tuberculosis complex, M. africanum West African 2 shares an abundance of pseudogenes with M. bovis, but also with M. africanum West African clade 1. Comparison with other strains of the M. tuberculosis complex revealed pseudogene events in all the known lineages, pointing to ongoing genome erosion, likely due to increased genetic drift and relaxed selection linked to serial transmission bottlenecks and an intracellular lifestyle.
The genomic differences identified between M. africanum West African 2 and the other strains of the Mycobacterium tuberculosis complex may explain its attenuated phenotype and pave the way for targeted experiments to elucidate the phenotypic characteristics of M. africanum. Moreover, the availability of whole-genome data allows verification that the targets used for the next generation of diagnostics and vaccines are conserved, in order to ensure similar efficacy in West Africa.
Funded by: Wellcome Trust
PLoS neglected tropical diseases 2012;6;2;e1552
PUBMED: 22389744; PMC: 3289620; DOI: 10.1371/journal.pntd.0001552
-
Genomic Comparison of the Closely Related Salmonella enterica Serovars Enteritidis and Dublin.
Departamento de Desarrollo Biotecnológico, Instituto de Higiene, Facultad de Medicina, Universidad de la República, Av. A. Navarro 3051, CP 11600, Montevideo, Uruguay.
The Enteritidis and Dublin serovars of Salmonella enterica are closely related, yet they differ significantly in pathogenicity and epidemiology. S. Enteritidis is a broad-host-range serovar that commonly causes gastroenteritis and infrequently causes invasive disease in humans. S. Dublin mainly colonizes cattle but, upon infecting humans, often results in invasive disease. To gain a broader view of the extent of these differences, we conducted microarray-based comparative genomics on several field isolates from each serovar. Because genome degradation has been correlated with host adaptation in Salmonella, we also compared the available genome sequences of both serovars at whole-genome scale to evaluate pseudogene composition within each serovar. Microarray analysis revealed 3771 CDS shared by both serovars, while 33 were present only in Enteritidis and 87 were exclusive to Dublin. Pseudogene evaluation showed 177 inactive CDS in S. Dublin that correspond to active genes in S. Enteritidis, nine of which are also inactive in the host-adapted S. Gallinarum and S. Choleraesuis serovars. Sequencing of these nine CDS in several S. Dublin clinical isolates revealed that they are pseudogenes in all of them, indicating that this feature is not peculiar to the sequenced strain. Among these CDS, shdA (Peyer's patch colonization factor) and mglA (galactoside transport ATP-binding protein) also appear to be inactive in the human-adapted S. Typhi and S. Paratyphi A, suggesting that the functionality of these genes may be relevant to the capacity of certain Salmonella serovars to infect a broad range of hosts.
The open microbiology journal 2012;6;5-13
PUBMED: 22371816; PMC: 3282883; DOI: 10.2174/1874285801206010005
-
Rare MTNR1B variants impairing melatonin receptor 1B function contribute to type 2 diabetes.
Centre National de la Recherche Scientifique Unité Mixte de Recherche, Lille Pasteur Institute, France.
Genome-wide association studies have revealed that common noncoding variants in MTNR1B (encoding melatonin receptor 1B, also known as MT(2)) increase type 2 diabetes (T2D) risk(1,2). Although the strongest association signal was highly significant (P < 1 × 10(-20)), its contribution to T2D risk was modest (odds ratio (OR) of ∼1.10-1.15)(1-3). We performed large-scale exon resequencing in 7,632 Europeans, including 2,186 individuals with T2D, and identified 40 nonsynonymous variants, including 36 very rare variants (minor allele frequency (MAF) <0.1%), associated with T2D (OR = 3.31, 95% confidence interval (CI) = 1.78-6.18; P = 1.64 × 10(-4)). A four-tiered functional investigation of all 40 mutants revealed that 14 were non-functional and rare (MAF < 1%), and 4 were very rare with complete loss of melatonin binding and signaling capabilities. Among the very rare variants, the partial- or total-loss-of-function variants but not the neutral ones contributed to T2D (OR = 5.67, CI = 2.17-14.82; P = 4.09 × 10(-4)). Genotyping the four complete loss-of-function variants in 11,854 additional individuals revealed their association with T2D risk (8,153 individuals with T2D and 10,100 controls; OR = 3.88, CI = 1.49-10.07; P = 5.37 × 10(-3)). This study establishes a firm functional link between MTNR1B and T2D risk.
Funded by: Medical Research Council; Wellcome Trust: 077016/Z/05/Z
Nature genetics 2012;44;3;297-301
PUBMED: 22286214; DOI: 10.1038/ng.1053
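The odds ratios and 95% confidence intervals quoted above summarize case-control tables of carriers versus non-carriers. A minimal sketch of the standard calculation for a single 2x2 table uses Woolf's log-scale interval; the study may have used adjusted or exact methods instead, the table layout and function name are assumptions, and the example counts are hypothetical rather than taken from the paper.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, level=0.95):
    """Odds ratio with a Woolf (log-scale) confidence interval for a 2x2 table.

    a: carrier cases, b: carrier controls, c: non-carrier cases, d: non-carrier controls
    (hypothetical layout; all cells must be positive).
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)
```

With hypothetical counts of 20 carrier cases, 80 carrier controls, 10 non-carrier cases and 90 non-carrier controls, the odds ratio is (20 × 90)/(80 × 10) = 2.25. Very rare variants give small cell counts, which is why their intervals (such as 2.17-14.82 above) are wide.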
-
Genome-wide association study to identify common variants associated with brachial circumference: a meta-analysis of 14 cohorts.
Wellcome Trust Sanger Institute, The Morgan Building, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.
Brachial circumference (BC), also known as upper arm or mid-arm circumference, can be used as an indicator of muscle mass and fat tissue, which are distributed differently in men and women. Analysis of anthropometric measures of peripheral fat distribution such as BC could help in understanding the complex pathophysiology behind overweight and obesity. The purpose of this study was to identify genetic variants associated with BC through a large-scale genome-wide association scan (GWAS) meta-analysis. We used fixed-effects meta-analysis to synthesise summary results across 14 GWAS discovery and 4 replication cohorts comprising 22,376 individuals in total (12,031 women and 10,345 men) of European ancestry. Individual analyses were carried out for men, women and both sexes combined, using linear regression under an additive genetic model with two sets of covariates: age alone, and age plus BMI. We prioritised signals for follow-up in two stages. We did not detect any signals reaching genome-wide significance. The FTO rs9939609 SNP showed nominal evidence for association (p<0.05) in the age-adjusted strata for men and across both sexes. In this first GWAS meta-analysis for BC to date, we have not identified any genome-wide significant signals and do not observe robust association of previously established obesity loci with BC. Large-scale collaborations will be necessary to achieve the power needed to detect loci underlying BC.
PloS one 2012;7;3;e31369
PUBMED: 22479309; PMC: 3315559; DOI: 10.1371/journal.pone.0031369
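Fixed-effects meta-analysis of GWAS summary results is commonly done by inverse-variance weighting: each cohort's effect estimate is weighted by 1/SE^2, so larger, more precise cohorts count more. The sketch below assumes per-cohort regression coefficients (beta) and standard errors for one SNP as inputs; the function name is hypothetical and the sketch omits the strand and allele alignment that real pipelines perform first.

```python
import math
from statistics import NormalDist


def fixed_effects_meta(betas, standard_errors):
    """Inverse-variance-weighted fixed-effects meta-analysis for one SNP."""
    if not betas or len(betas) != len(standard_errors):
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in standard_errors]  # w_i = 1 / SE_i^2
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total  # pooled effect
    se = math.sqrt(1.0 / total)  # pooled standard error
    z = beta / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))  # two-sided P-value
    return beta, se, p
```

For two cohorts with beta = 0.2 (SE 0.1) and beta = 0.0 (SE 0.2), the weights are 100 and 25, so the pooled beta is 20/125 = 0.16, pulled toward the more precise cohort.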
-
Image-based characterization of thrombus formation in time-lapse DIC microscopy.
Computer Aided Medical Procedures, Technische Universität München (TUM), Garching bei München 85748, Germany. brieu@in.tum.de
The characterization of thrombus formation in time-lapse DIC microscopy is of increasing interest for identifying genes that contribute to atherothrombosis and coronary artery diseases (CADs). In particular, we are interested in large-scale studies on zebrafish, which generate large amounts of data and require automatic processing. In this work, we present an image-based solution for the automated extraction of parameters quantifying the temporal development of thrombotic plugs. Our system is based on the joint segmentation of thrombotic and aortic regions over time. This task is made difficult by the low contrast and the highly dynamic conditions observed in in vivo DIC microscopy scenes. Our key idea is to perform this segmentation by distinguishing the different motion patterns in image time series rather than by solving standard image segmentation tasks in each image frame, which allows us to compensate for the poor imaging conditions. We model motion patterns by energies based on the idea of dynamic textures, and regularize the model by two prior energies on the shape of the aortic region and on the topological relationship between the thrombus and the aorta. We demonstrate the performance of our segmentation algorithm by qualitative and quantitative experiments on synthetic examples as well as on real in vivo microscopic sequences.
Medical image analysis 2012;16;4;915-31
PUBMED: 22482997; DOI: 10.1016/j.media.2012.02.002
-
Biocurators and biocuration: surveying the 21st century challenges.
European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK.
Curated databases are an integral part of the tool set that researchers use on a daily basis for their work. For most users, however, how databases are maintained, and by whom, is rather obscure. The International Society for Biocuration (ISB) represents biocurators, software engineers, developers and researchers with an interest in biocuration. Its goals include fostering communication between biocurators, promoting and describing their work, and highlighting the added value of biocuration to the world. The ISB recently conducted a survey of biocurators to better understand their educational and scientific backgrounds, their motivations for choosing a curatorial job and their career goals. The results are reported here. From the responses received, it is evident that biocuration is performed by highly trained scientists and perceived to be a stimulating career, offering both intellectual challenges and the satisfaction of performing work essential to the modern scientific community. It is also apparent that the ISB has at least a dual role to play in facilitating biocurators' work: (i) to promote biocuration as a career within the greater scientific community; (ii) to aid the development of resources for biomedical research through promotion of nomenclature and data-sharing standards that will allow interconnection of biological databases and better exploit the pivotal contributions that biocurators are making. DATABASE URL: http://biocurator.org.
Database : the journal of biological databases and curation 2012;2012;bar059
PUBMED: 22434828; PMC: 3308150; DOI: 10.1093/database/bar059
-
Telomeres and cancer: from crisis to stability to crisis to stability.
Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK.
Telomere attrition unleashes genomic instability, promoting cancer development. Once established, however, the malignant clone often re-establishes genomic stability through overexpression of telomerase. In two papers, one in this issue of Cell and one in the subsequent issue, DePinho and colleagues explore the consequences of telomerase re-expression and its validity as a therapeutic target in mouse models of cancer.
Funded by: Wellcome Trust: 093867
Cell 2012;148;4;633-5
PUBMED: 22341437; PMC: 3322332; DOI: 10.1016/j.cell.2012.01.043
-
Specific expression of Kcna10, Pxn and Odf2 in the organ of Corti.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
The development of the organ of Corti and the highly specialized cells required for hearing involves a multitude of genes, many of which remain unknown. Here we describe the expression patterns of three genes not previously studied in the inner ear, in mice at a range of embryonic and early postnatal ages. Kcna10, a tetrameric Shaker-like potassium channel, is strongly expressed in the hair cells themselves. Odf2, as its centriolar isoform Cenexin, marks the dendrites extending to and contacting hair cells, and Pxn, a focal adhesion scaffold protein, is most strongly expressed in pillar cells at the ages studied. The roles of these genes have yet to be elucidated, but their specific expression patterns imply potential functional significance in the inner ear.
Gene expression patterns : GEP 2012;12;5-6;172-179
PUBMED: 22446089; DOI: 10.1016/j.gep.2012.03.001
-
Association study of nonsynonymous single nucleotide polymorphisms in schizophrenia.
Fundación Pública Galega de Medicina Xenómica-SERGAS, Hospital Clínico Universitario, Santiago de Compostela, Spain.
Background: Genome-wide association studies using several hundred thousand anonymous markers present limited statistical power. Alternatively, association studies restricted to common nonsynonymous single nucleotide polymorphisms (nsSNPs) have the advantage of strongly reducing the multiple testing problem, while increasing the probability of testing functional single nucleotide polymorphisms (SNPs).
Methods: We performed a case-control association study of common nsSNPs in Galician (northwest Spain) samples using the Affymetrix GeneChip Human 20k cSNP Kit, followed by a replication study of the most promising results. After quality control procedures, the discovery sample consisted of 5100 nsSNPs at minor allele frequency >5% analyzed in 476 schizophrenia patients and 447 control subjects. The replication sample consisted of 4069 cases and 15,128 control subjects of European origin. We also performed multilocus analysis, using aggregated scores of nsSNPs at liberal significance thresholds and cross-validation procedures.
Results: The 5 independent nsSNPs with false discovery rate q ≤ .25, as well as 13 additional nsSNPs at p < .01 and located in functional candidate genes, were genotyped in the replication samples. One SNP, rs13107325, located in the metal ion transporter gene SLC39A8, reached significance in the combined sample after Bonferroni correction (trend test, p = 2.7 × 10(-6), allelic odds ratio = 1.32). This SNP has a minor allele frequency of 5% to 10% in many European populations but is rare outside Europe. We also confirmed the polygenic component of susceptibility.
Conclusions: Given that another metal ion transporter gene, SLC39A3, is associated with bipolar disorder, our findings reveal a role for brain metal homeostasis in psychosis.
Biological psychiatry 2012;71;2;169-77
PUBMED: 22078303; DOI: 10.1016/j.biopsych.2011.09.032
-
Microevolution of extensively drug-resistant tuberculosis in Russia.
National Mycobacterium Reference Laboratory, Blizard Institute, Queen Mary, University of London, London E1 2AT, United Kingdom.
Extensively drug-resistant (XDR) tuberculosis (TB), which is resistant to both first- and second-line antibiotics, is an escalating problem, particularly in the Russian Federation. Molecular fingerprinting of 2348 Mycobacterium tuberculosis isolates collected in Samara Oblast, Russia, revealed that 72% belonged to the Beijing lineage, a genotype associated with enhanced acquisition of drug resistance and increased virulence. Whole-genome sequencing of 34 Samaran isolates, plus 25 isolates representing global M. tuberculosis complex diversity, revealed that Beijing isolates originating in Eastern Europe formed a monophyletic group. Homoplasic polymorphisms within this clade were almost invariably associated with antibiotic resistance, indicating that the evolution of this population is primarily driven by drug therapy. Resistance genotypes showed a strong correlation with drug susceptibility phenotypes. A novel homoplasic mutation in rpoC, found only in isolates carrying a common rpoB rifampicin-resistance mutation, may play a role in fitness compensation. Most multidrug-resistant (MDR) isolates also had mutations in the promoter of a virulence gene, eis, which increase its expression and confer kanamycin resistance. Kanamycin therapy may thus select for mutants with increased virulence, helping preserve bacterial fitness and promoting transmission of drug-resistant TB strains. The East European clade was dominated by two MDR clusters, each disseminated across Samara. Polymorphisms conferring fluoroquinolone resistance were independently acquired multiple times within each cluster, indicating that XDR TB is currently not widely transmitted.
Genome research 2012;22;4;735-45
PUBMED: 22294518; PMC: 3317155; DOI: 10.1101/gr.128678.111
-
Association of the GGCX (CAA)16/17 repeat polymorphism with higher warfarin dose requirements in African Americans.
Department of Pharmacy Practice, University of Illinois, Chicago, IL 60612-7230, USA. humma@uic.edu
Objective: Little is known about genetic contributors to higher than usual warfarin dose requirements, particularly for African Americans. This study tested the hypothesis that the γ-glutamyl carboxylase (GGCX) genotype contributes to warfarin dose requirements greater than 7.5 mg/day in an African American population.
Methods: A total of 338 African Americans on a stable dose of warfarin were enrolled. The GGCX rs10654848 (CAA)n, rs12714145 (G>A), and rs699664 (p.R325Q); VKORC1 c.-1639G>A and rs61162043; and CYP2C9*2, *3, *5, *8, *11, and rs7089580 genotypes were tested for their association with dose requirements greater than 7.5 mg/day alone and in the context of other variables known to influence dose variability.
Results: The GGCX rs10654848 (CAA)16 or 17 repeat occurred at a frequency of 2.6% in African Americans and was overrepresented among patients requiring greater than 7.5 mg/day versus those who required lower doses (12 vs. 3%, P=0.003; odds ratio 4.0, 95% confidence interval, 1.5-10.5). The GGCX rs10654848 genotype remained associated with high dose requirements on regression analysis including age, body size, and VKORC1 genotype. On linear regression, the GGCX rs10654848 genotype explained 2% of the overall variability in warfarin dose in African Americans. An examination of the GGCX rs10654848 genotype in warfarin-treated Caucasians revealed a (CAA)16 repeat frequency of only 0.27% (P=0.008 compared with African Americans).
Conclusion: These data support the GGCX rs10654848 genotype as a predictor of higher than usual warfarin doses in African Americans, who have a 10-fold higher frequency of the (CAA)16/17 repeat compared with Caucasians.
Funded by: NHLBI NIH HHS: K23 HL089808-01A2; Wellcome Trust
Pharmacogenetics and genomics 2012;22;2;152-8
PUBMED: 22158446; PMC: 3261355; DOI: 10.1097/FPC.0b013e32834f288f
-
A common X-linked inborn error of carnitine biosynthesis may be a risk factor for nondysmorphic autism.
Departments of Molecular and Human Genetics, Psychiatry, and Pediatrics, Baylor College of Medicine, Houston, TX 77030.
We recently reported a deletion of exon 2 of the trimethyllysine hydroxylase epsilon (TMLHE) gene in a proband with autism. TMLHE maps to the X chromosome and encodes the first enzyme in carnitine biosynthesis, 6-N-trimethyllysine dioxygenase. Deletion of exon 2 of TMLHE causes enzyme deficiency, resulting in increased substrate concentration (6-N-trimethyllysine) and decreased product levels (3-hydroxy-6-N-trimethyllysine and γ-butyrobetaine) in plasma and urine. TMLHE deficiency is common in control males (24 in 8,787 or 1 in 366) and was not significantly increased in frequency in probands from simplex autism families (9 in 2,904 or 1 in 323). However, it was 2.82-fold more frequent in probands from male-male multiplex autism families compared with controls (7 in 909 or 1 in 130; P = 0.023). Additionally, six of seven autistic male siblings of probands in male-male multiplex families had the deletion, suggesting that TMLHE deficiency is a risk factor for autism (meta-analysis Z-score = 2.90 and P = 0.0037), although with low penetrance (2-4%). These data suggest that dysregulation of carnitine metabolism may be important in nondysmorphic autism; that abnormalities of carnitine intake, loss, transport, or synthesis may be important in a larger fraction of nondysmorphic autism cases; and that the carnitine pathway may provide a novel target for therapy or prevention of autism.
Proceedings of the National Academy of Sciences of the United States of America 2012
PUBMED: 22566635; DOI: 10.1073/pnas.1120210109
-
Inheritance of coronary artery disease in men: an analysis of the role of the Y chromosome.
School of Health Sciences, University of Ballarat, Ballarat, VIC, Australia.
Background: A sexual dimorphism exists in the incidence and prevalence of coronary artery disease: men are more commonly affected than age-matched women. We explored the role of the Y chromosome in coronary artery disease in the context of this sexual inequity.
Methods: We genotyped 11 markers of the male-specific region of the Y chromosome in 3233 biologically unrelated British men from three cohorts: the British Heart Foundation Family Heart Study (BHF-FHS), West of Scotland Coronary Prevention Study (WOSCOPS), and Cardiogenics Study. On the basis of this information, each Y chromosome was tracked back into one of 13 ancient lineages defined as haplogroups. We then examined associations between common Y chromosome haplogroups and the risk of coronary artery disease in cross-sectional BHF-FHS and prospective WOSCOPS. Finally, we undertook functional analysis of Y chromosome effects on monocyte and macrophage transcriptome in British men from the Cardiogenics Study.
Findings: Of nine haplogroups identified, two (R1b1b2 and I) accounted for roughly 90% of the Y chromosome variants among British men. Carriers of haplogroup I had about a 50% higher age-adjusted risk of coronary artery disease than did men with other Y chromosome lineages in BHF-FHS (odds ratio 1·75, 95% CI 1·20-2·54, p=0·004), WOSCOPS (1·45, 1·08-1·95, p=0·012), and joint analysis of both populations (1·56, 1·24-1·97, p=0·0002). The association between haplogroup I and increased risk of coronary artery disease was independent of traditional cardiovascular and socioeconomic risk factors. Analysis of the macrophage transcriptome in the Cardiogenics Study revealed that 19 molecular pathways showing strong differential expression between men with haplogroup I and other lineages of the Y chromosome were interconnected by common genes related to inflammation and immunity, and that some of them have a strong relevance to atherosclerosis.
Interpretation: The human Y chromosome is associated with risk of coronary artery disease in men of European ancestry, possibly through interactions of immunity and inflammation.
Funding: British Heart Foundation; UK National Institute for Health Research; LEW Carty Charitable Fund; National Health and Medical Research Council of Australia; European Union 6th Framework Programme; Wellcome Trust.
Funded by: British Heart Foundation: PG/06/097/21331; Wellcome Trust: 087576, WT-0841383/2/07/2
Lancet 2012;379;9819;915-22
PUBMED: 22325189; PMC: 3314981; DOI: 10.1016/S0140-6736(11)61453-0
-
Bayesian estimation of bacterial community composition from 454 sequencing data.
Department of Mathematics and Statistics, P.O. Box 68 (Gustaf Hällströmin katu 2b), University of Helsinki, 00014 Helsinki, Finland and Pathogen Genomics Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.
Estimating bacterial community composition from a mixed sample in different applied contexts is an important task for many microbiologists. The bacterial community composition is commonly estimated by clustering 16S rRNA gene sequences amplified by polymerase chain reaction (PCR). Current taxonomy-independent clustering methods for analyzing these sequences, such as UCLUST, ESPRIT-Tree and CROP, have two limitations: (i) expert knowledge is needed, i.e. a difference cutoff between species needs to be specified; (ii) closely related species cannot be separated. The first limitation imposes a burden on the user, since considerable effort is needed to select appropriate parameters, whereas the second limitation leads to an inaccurate description of the underlying bacterial community composition. We propose a probabilistic model-based method to estimate bacterial community composition which tackles these limitations. Our method requires very little expert knowledge: only the maximum possible number of clusters needs to be specified. Our method also demonstrated its ability to separate closely related species in two experiments, despite sequencing errors and individual variation.
Nucleic acids research 2012
PUBMED: 22406836; DOI: 10.1093/nar/gks227
-
Circulating DNA and next-generation sequencing.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK.
Personalising cancer medicine depends upon the implementation of personalised diagnostics and therapeutics. Detailed genomic screening is likely to play a central role in this. As the range of drugs and other therapies for cancer continues to increase, there is an increasingly urgent need for sensitive and specific measures of disease burden to guide treatment regimens. The ability to quantify disease burden with high accuracy and sensitivity in patients with cancer would open many potential routes to personalising therapeutic choices. For example, the intensity of therapy could be guided by the amount of disease at diagnosis; monitoring the response of patients to drugs could allow extension of the period of treatment in responders or early changeover of therapy in nonresponders; and early prediction of recurrence could allow salvage therapy to be instituted before complications of relapse develop. The detection of tumour-specific rearrangements in DNA free in the serum or plasma may provide a substantial advance in the accuracy of monitoring disease burden in patients with solid tumours.
Recent results in cancer research. Fortschritte der Krebsforschung. Progrès dans les recherches sur le cancer 2012;195;143-9
PUBMED: 22527501; DOI: 10.1007/978-3-642-28160-0_12
-
Streptococcus pneumoniae: the evolution of antimicrobial resistance to beta-lactams, fluoroquinolones and macrolides.
Malawi-Liverpool-Wellcome Clinical Research Programme, PO Box 30096, Chichiri, Blantyre 3, Malawi; Institute of Infection and Global Health, The University of Liverpool, Liverpool, UK.
Multidrug-resistant Streptococcus pneumoniae strains constitute a major public health concern worldwide. In this review we discuss how the transformable nature of the pneumococcus, in parallel with antimicrobial-induced stress, contributes to the evolution of antimicrobial resistance, and how the introduction of the pneumococcal conjugate vaccine has affected the situation.
Microbes and infection / Institut Pasteur 2012
PUBMED: 22342898; DOI: 10.1016/j.micinf.2012.01.012
-
Interpretation of genomic copy number variants using DECIPHER.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.
Many patients suffering from developmental disorders have submicroscopic deletions or duplications affecting the copy number of dosage-sensitive genes or disrupting normal gene expression. Many of these changes are novel or extremely rare, making clinical interpretation problematic and genotype/phenotype correlations difficult. Identification of patients sharing a genomic rearrangement and having phenotypes in common increases certainty in the diagnosis and allows characterization of new syndromes. The DECIPHER database is an online repository of genotype and phenotype data whose chief objective is to facilitate the association of genomic variation with phenotype to enable the clinical interpretation of copy number variation (CNV). This unit shows how DECIPHER can be used to (1) search for consented patients sharing a defined chromosomal location, (2) navigate regions of interest using in-house visualization tools and the Ensembl genome browser, (3) analyze affected genes and prioritize them according to their likelihood of haploinsufficiency, (4) upload patient aberrations and phenotypes, and (5) create printouts at different levels of detail. By following this protocol, clinicians and researchers alike will be able to learn how to characterize their patients' chromosomal imbalances using DECIPHER.
Funded by: Wellcome Trust: WT077008
Current protocols in human genetics / editorial board, Jonathan L. Haines ... [et al.] 2012;Chapter 8;Unit 8.14
PUBMED: 22241657; DOI: 10.1002/0471142905.hg0814s72
-
Novel Loci for adiponectin levels and their influence on type 2 diabetes and metabolic traits: a multi-ethnic meta-analysis of 45,891 individuals.
Department of Epidemiology, Biostatistics, and Occupational Health, Jewish General Hospital, Lady Davis Institute, McGill University, Montreal, Canada.
Circulating levels of adiponectin, a hormone produced predominantly by adipocytes, are highly heritable and are inversely associated with type 2 diabetes mellitus (T2D) and other metabolic traits. We conducted a meta-analysis of genome-wide association studies in 39,883 individuals of European ancestry to identify genes associated with metabolic disease. We identified 8 novel loci associated with adiponectin levels and confirmed 2 previously reported loci (P = 4.5×10(-8)-1.2×10(-43)). Using a novel method to combine data across ethnicities (N = 4,232 African Americans, N = 1,776 Asians, and N = 29,347 Europeans), we identified two additional novel loci. Expression analyses of 436 human adipocyte samples revealed that mRNA levels of 18 genes at candidate regions were associated with adiponectin concentrations after accounting for multiple testing (p<3×10(-4)). We next developed a multi-SNP genotypic risk score to test the association of adiponectin-decreasing risk alleles with metabolic traits and diseases using consortia-level meta-analytic data. This risk score was associated with increased risk of T2D (p = 4.3×10(-3), n = 22,044), increased triglycerides (p = 2.6×10(-14), n = 93,440), increased waist-to-hip ratio (p = 1.8×10(-5), n = 77,167), increased glucose two hours post oral glucose tolerance testing (p = 4.4×10(-3), n = 15,234), and increased fasting insulin (p = 0.015, n = 48,238), but with lower HDL-cholesterol concentrations (p = 4.5×10(-13), n = 96,748) and decreased BMI (p = 1.4×10(-4), n = 121,335). These findings identify novel genetic determinants of adiponectin levels, which, taken together, influence risk of T2D and markers of insulin resistance.
PLoS genetics 2012;8;3;e1002607
PUBMED: 22479202; PMC: 3315470; DOI: 10.1371/journal.pgen.1002607
-
Long-range DNA looping and gene expression analyses identify DEXI as an autoimmune disease candidate gene.
Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, NIHR Biomedical Research Centre, University of Cambridge, Cambridge, UK. lucy.davison@cimr.cam.ac.uk
The chromosome 16p13 region has been associated with several autoimmune diseases, including type 1 diabetes (T1D) and multiple sclerosis (MS). CLEC16A has been reported as the most likely candidate gene in the region, since it contains the most disease-associated single-nucleotide polymorphisms (SNPs), as well as an immunoreceptor tyrosine-based activation motif. However, here we report that intron 19 of CLEC16A, containing the most autoimmune disease-associated SNPs, appears to behave as a regulatory sequence, affecting the expression of a neighbouring gene, DEXI. The CLEC16A alleles that are protective from T1D and MS are associated with increased expression of DEXI, and no other genes in the region, in two independent monocyte gene expression data sets. Critically, using chromosome conformation capture (3C), we identified physical proximity between the DEXI promoter region and intron 19 of CLEC16A, separated by a loop of >150 kb. In reciprocal experiments, a 20 kb fragment of intron 19 of CLEC16A, containing SNPs associated with T1D and MS, as well as with DEXI expression, interacted with the promoter region of DEXI but not with candidate DNA fragments containing other potential causal genes in the region, including CLEC16A. Intron 19 of CLEC16A is highly enriched for transcription-factor-binding events and markers associated with enhancer activity. Taken together, these data indicate that although the causal variants in the 16p13 region lie within CLEC16A, DEXI is an unappreciated autoimmune disease candidate gene, and illustrate the power of the 3C approach in progressing from genome-wide association studies results to candidate causal genes.
Funded by: Medical Research Council; Wellcome Trust: 061858, 076113, 076113/C/04/Z, 079895, 082549/Z/07/Z, 089989/Z/09/Z
Human molecular genetics 2012;21;2;322-33
PUBMED: 21989056; PMC: 3276289; DOI: 10.1093/hmg/ddr468
-
Diagnostic interpretation of array data using public databases and internet sources.
Department of Human Genetics, Radboud University Nijmegen Medical Centre, Nijmegen, the Netherlands. N.deLeeuw@antrg.umcn.nl.
The range of commercially available array platforms and analysis software packages is expanding and their utility is improving, making reliable detection of copy number variation (CNV) relatively straightforward. Reliable interpretation of CNV data, however, is often difficult and requires expertise. As knowledge of the human genome grows rapidly, applications for array testing continue to broaden and the resolution of CNV detection increases, interpreting these often daunting data becomes highly complex. Correct CNV interpretation and optimal use of the genotype information provided by SNP probes on an array depend largely on knowledge present in various resources. In addition to host laboratories' own datasets and national registries, several public databases and Internet resources with genotype and phenotype information can be used for array data interpretation. With so many resources now available, it is important to know which are fit for purpose in a diagnostic setting. We summarise the characteristics of the most commonly used Internet databases and resources, and propose a general data interpretation strategy that can be used for comparative hybridisation, comparative intensity and genotype-based array data.
Human mutation 2012
PUBMED: 22334422; DOI: 10.1002/humu.22049
-
Genomic restructuring in the Tasmanian devil facial tumour: chromosome painting and gene mapping provide clues to evolution of a transmissible tumour.
Research School of Biology, The Australian National University, Canberra, Australia. janine.deakin@anu.edu.au
Devil facial tumour disease (DFTD) is a fatal, transmissible malignancy that threatens the world's largest marsupial carnivore, the Tasmanian devil, with extinction. First recognised in 1996, DFTD has had a catastrophic effect on wild devil numbers, and intense research efforts to understand and contain the disease have since demonstrated that the tumour is a clonal cell line transmitted by allograft. We used chromosome painting and gene mapping to deconstruct the DFTD karyotype and determine the chromosome and gene rearrangements involved in carcinogenesis. Chromosome painting on three different DFTD tumour strains determined the origins of marker chromosomes and provided a general overview of the rearrangement in DFTD karyotypes. Mapping of 105 BAC clones by fluorescence in situ hybridisation provided a finer level of resolution of genome rearrangements in DFTD strains. Our findings demonstrate that only limited regions of the genome, mainly chromosomes 1 and X, are rearranged in DFTD. Regions rearranged in DFTD are also highly rearranged between different marsupials. Differences between strains are limited, reflecting the unusually stable nature of DFTD. Finally, our detailed maps of both the devil and tumour karyotypes provide a physical framework for future genomic investigations into DFTD.
PLoS genetics 2012;8;2;e1002483
PUBMED: 22359511; PMC: 3280961; DOI: 10.1371/journal.pgen.1002483
-
Molecular mechanisms of drug resistance in natural Leishmania populations vary with genetic background.
Department of Biomedical Sciences, Institute of Tropical Medicine, Antwerp, Belgium.
The evolution of drug resistance in pathogens is a major global health threat. Elucidating the molecular basis of pathogen drug resistance has been the focus of many studies, but it is rarely known whether an identified drug-resistance mechanism is universal for the studied pathogen; it has seldom been clarified whether drug-resistance mechanisms vary with the pathogen's genotype. Nevertheless, this is of critical importance in gaining an understanding of the complexity of this global threat and in underpinning epidemiological surveillance of pathogen drug resistance in the field. This study aimed to assess the molecular and phenotypic heterogeneity that emerges in natural parasite populations under drug treatment pressure. We studied lines of the protozoan parasite Leishmania (L.) donovani with differential susceptibility to antimonial drugs; the lines were derived from clinical isolates belonging to two distinct genetic populations that circulate in the leishmaniasis-endemic region of Nepal. Parasite pathways known to be affected by antimonial drugs were characterised on five experimental levels in the lines of the two populations. Characterisation of DNA sequence, gene expression, protein expression and thiol levels revealed a number of molecular features that mark antimonial-resistant parasites in only one of the two populations studied. A final series of in vitro stress phenotyping experiments confirmed this heterogeneity amongst drug-resistant parasites from the two populations. These data provide evidence that the molecular changes associated with antimonial resistance in natural Leishmania populations depend on the genetic background of the Leishmania population, which has resulted in a divergent set of resistance markers in the Leishmania populations. This heterogeneity of parasite adaptations poses severe challenges for the control of drug resistance in the field and for the design of molecular surveillance tools with widespread applicability.
Funded by: Wellcome Trust: 085349, WT061173MA-SM
PLoS neglected tropical diseases 2012;6;2;e1514
PUBMED: 22389733; PMC: 3289598; DOI: 10.1371/journal.pntd.0001514
-
Genome-wide association study identifies novel loci associated with circulating phospho- and sphingolipid concentrations.
Genetic Epidemiology Unit, Department of Epidemiology, Erasmus University Medical Center, Rotterdam, The Netherlands.
Phospho- and sphingolipids are crucial cellular and intracellular compounds. These lipids are required for active transport, a number of enzymatic processes, membrane formation, and cell signalling. Disruption of their metabolism leads to several diseases, with diverse neurological, psychiatric, and metabolic consequences. A large number of phospholipid and sphingolipid species can be detected and measured in human plasma. We conducted a meta-analysis of five European family-based genome-wide association studies (N = 4034) on plasma levels of 24 sphingomyelins (SPM), 9 ceramides (CER), 57 phosphatidylcholines (PC), 20 lysophosphatidylcholines (LPC), 27 phosphatidylethanolamines (PE), and 16 PE-based plasmalogens (PLPE), as well as their proportions in each major class. This effort yielded 25 genome-wide significant loci for phospholipids (smallest P-value = 9.88×10(-204)) and 10 loci for sphingolipids (smallest P-value = 3.10×10(-57)). After a correction for multiple comparisons (P-value<2.2×10(-9)), we observed four novel loci significantly associated with phospholipids (PAQR9, AGPAT1, PKD2L1, PDXDC1) and two with sphingolipids (PLD2 and APOE), explaining up to 3.1% of the variance. Further analysis of the top findings with respect to within-class molar proportions uncovered three additional loci for phospholipids (PNLIPRP2, PCDH20, and ABDH3), suggesting their involvement in either fatty acid elongation/saturation processes or fatty acid-specific turnover mechanisms. Among these, 14 loci (KCNH7, AGPAT1, PNLIPRP2, SYT9, FADS1-2-3, DLG2, APOA1, ELOVL2, CDK17, LIPC, PDXDC1, PLD2, LASS4, and APOE) mapped to the glycerophospholipid pathway and 12 loci (ILKAP, ITGA9, AGPAT1, FADS1-2-3, APOA1, PCDH20, LIPC, PDXDC1, SGPP1, APOE, LASS4, and PLD2) to the sphingolipid pathway. In large meta-analyses, associations between FADS1-2-3 and carotid intima media thickness, AGPAT1 and type 2 diabetes, and APOA1 and coronary artery disease were observed.
In conclusion, our study identified nine novel phospho- and sphingolipid loci, substantially increasing our knowledge of the genetic basis for these traits.
Funded by: Medical Research Council; Wellcome Trust
PLoS genetics 2012;8;2;e1002490
PUBMED: 22359512; PMC: 3280968; DOI: 10.1371/journal.pgen.1002490
-
Evidence for transcript networks composed of chimeric RNAs in human cells.
Bioinformatics and Genomics, Centre for Genomic Regulation and Universitat Pompeu Fabra, Barcelona, Catalonia, Spain.
The classic organization of a gene structure has followed the Jacob and Monod bacterial gene model proposed more than 50 years ago. Since then, empirical determinations of the complexity of the transcriptomes found in organisms from yeast to human have blurred the definition and physical boundaries of genes. Using multiple analysis approaches, we have characterized individual gene boundaries mapping on human chromosomes 21 and 22. Analyses of the locations of the 5' and 3' transcriptional termini of 492 protein-coding genes revealed that for 85% of these genes the boundaries extend beyond the current annotated termini, most often connecting with exons of transcripts from other well-annotated genes. The biological and evolutionary importance of these chimeric transcripts is underscored by (1) the non-random interconnections of the genes involved, (2) the greater phylogenetic depth of the genes involved in many chimeric interactions, (3) the coordination of the expression of connected genes and (4) the close in vivo three-dimensional proximity of the genomic regions being transcribed and contributing to parts of the chimeric RNAs. The non-random nature of the connections between the genes involved suggests that chimeric transcripts should not be studied in isolation, but together, as an RNA network.
Funded by: NHGRI NIH HHS: HG003143, R01 HG003143-08, U01HG003147, U01HG003150, U54 HG004592, U54HG004557; Wellcome Trust
PloS one 2012;7;1;e28213
PUBMED: 22238572; PMC: 3251577; DOI: 10.1371/journal.pone.0028213
-
Hyperactive piggyBac gene transfer in human cells and in vivo.
Medical Scientist Training Program, Baylor College of Medicine, Houston, TX 77030, USA.
We characterized a recently developed hyperactive piggyBac (pB) transposase enzyme [containing seven mutations (7pB)] for gene transfer in human cells in vitro and to somatic cells in mice in vivo. Despite protein expression levels similar to those of native pB, 7pB significantly increased the gene transfer efficiency of a neomycin resistance cassette transposon in both HEK293 and HeLa cultured human cells. Native pB and SB100X, the most active transposase of the Sleeping Beauty transposon system, exhibited similar transposition efficiency in cultured human cell lines. When delivered to primary human T cells ex vivo, 7pB increased gene delivery two- to threefold compared with piggyBac and SB100X. The activity of hyperactive 7pB transposase was not affected by the addition of a 24-kDa N-terminal tag, whereas SB100X manifested a 50% reduction in transposition. Hyperactive 7pB was compared with native pB and SB100X in vivo in mice using hydrodynamic tail-vein injection of a limiting dose of transposase DNA combined with luciferase reporter transposons. We followed transgene expression for up to 6 months and observed approximately 10-fold greater long-term gene expression in mice injected with a codon-optimized version of 7pB compared with mice injected with native pB or SB100X. We conclude that hyperactive piggyBac elements can increase gene transfer in human cells and in vivo and should enable improved gene delivery using the piggyBac transposon system in a variety of cell and gene-therapy applications.
Funded by: NIDDK NIH HHS: T32DK064717; NIGMS NIH HHS: T32GM007330
Human gene therapy 2012;23;3;311-20
PUBMED: 21992617; PMC: 3300075; DOI: 10.1089/hum.2011.138
-
Genome-wide SNP and microsatellite variation illuminate population-level epidemiology in the Leishmania donovani species complex.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK. Tim.Downing@sanger.ac.uk
The species of the Leishmania donovani species complex cause visceral leishmaniasis, a debilitating infectious disease transmitted by sandflies. Understanding molecular changes associated with population structure in these parasites can help unravel their epidemiology and spread in humans. In this study, we used a panel of standard microsatellite loci and genome-wide SNPs to investigate population-level diversity in L. donovani strains recently isolated from a small geographic area spanning India, Bihar and Nepal, and compared their variation to that found in diverse strains of the L. donovani complex isolated from Europe, Africa and Asia. Microsatellites and SNPs could clearly resolve the phylogenetic relationships of the strains between continents, and microsatellite phylogenies indicated that certain older Indian strains were closely related to African strains. In the context of the anti-malaria spraying campaigns in the 1960s, this was consistent with a pattern of episodic population size contractions and clonal expansions in these parasites that was supported by population history simulations. In sharp contrast to the low resolution provided by microsatellites, SNPs retained a much finer-scale resolution of population-level variability, to the extent that they identified four different lineages from the same region, one of which was more closely related to African and European strains than to Indian or Nepalese ones. Combining the results of in vitro antimonial drug sensitivity testing with the phylogenetic signals from the SNP data highlighted protein-level mutations revealing a distinct drug-resistant group of Nepalese and Indian L. donovani. This study demonstrates the power of genomic data for exploring parasite population structure. Furthermore, markers defining different genetic groups have been discovered that could potentially be applied to investigate drug resistance in clinical Leishmania strains.
Funded by: Wellcome Trust: WT 085775/Z/08/Z
Infection, genetics and evolution : journal of molecular epidemiology and evolutionary genetics in infectious diseases 2012;12;1;149-59
PUBMED: 22119748; PMC: 3315668; DOI: 10.1016/j.meegid.2011.11.005
-
AntiFam: a tool to help identify spurious ORFs in protein annotation.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK. re3@sanger.ac.uk
As the deluge of genomic DNA sequence grows, the fraction of protein sequences that have been manually curated falls. In turn, as the number of laboratories with the ability to sequence genomes in a high-throughput manner grows, the informatics capability of those labs to accurately identify and annotate all genes within a genome may often be lacking. These issues have led to fears about transitive annotation errors making sequence databases less reliable. During the lifetime of the Pfam protein families database a number of protein families have been built, which were later identified as composed solely of spurious open reading frames (ORFs) either on the opposite strand or in a different, overlapping reading frame with respect to the true protein-coding or non-coding RNA gene. These families were deleted and are no longer available in Pfam. However, we realized that they could be useful for identifying new spurious ORFs. We have collected these families together in AntiFam along with additional custom-made families of spurious ORFs. This resource currently contains 23 families that identified 1310 spurious proteins in UniProtKB and a further 4119 spurious proteins in a collection of metagenomic sequences. UniProt has adopted AntiFam as a part of the UniProtKB quality control process and will investigate these spurious proteins for exclusion.
Funded by: NHGRI NIH HHS: R01 HG004881).; Wellcome Trust: WT077044/Z/05/Z
Database : the journal of biological databases and curation 2012;2012;bas003
PUBMED: 22434837; PMC: 3308159; DOI: 10.1093/database/bas003
-
IFITM3 restricts the morbidity and mortality associated with influenza.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK.
The 2009 H1N1 influenza pandemic showed the speed with which a novel respiratory virus can spread and the ability of a generally mild infection to induce severe morbidity and mortality in a subset of the population. Recent in vitro studies show that the interferon-inducible transmembrane (IFITM) protein family members potently restrict the replication of multiple pathogenic viruses. Both the magnitude and breadth of the IFITM proteins' in vitro effects suggest that they are critical for intrinsic resistance to such viruses, including influenza viruses. Using a knockout mouse model, we now test this hypothesis directly and find that IFITM3 is essential for defending the host against influenza A virus in vivo. Mice lacking Ifitm3 display fulminant viral pneumonia when challenged with a normally low-pathogenicity influenza virus, mirroring the destruction inflicted by the highly pathogenic 1918 'Spanish' influenza. Similar increased viral replication is seen in vitro, with protection rescued by the re-introduction of Ifitm3. To test the role of IFITM3 in human influenza virus infection, we assessed the IFITM3 alleles of individuals hospitalized with seasonal or pandemic influenza H1N1/09 viruses. We find that hospitalized subjects show a statistically significant enrichment for a minor IFITM3 allele (SNP rs12252-C) that alters a splice acceptor site, and functional assays show that IFITM3 from the minor CC genotype has reduced influenza virus restriction in vitro. Together these data reveal that the action of a single intrinsic immune effector, IFITM3, profoundly alters the course of influenza virus infection in mice and humans.
Funded by: Chief Scientist Office; Medical Research Council; NIAID NIH HHS: R01AI091786; Wellcome Trust: 090382/Z/09/Z, 090385/Z/09/Z
Nature 2012;484;7395;519-23
PUBMED: 22446628; DOI: 10.1038/nature10921
-
Genetics of gene expression in primary immune cells identifies cell type-specific master regulators and roles of HLA alleles.
Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.
Trans-acting genetic variants have a substantial, albeit poorly characterized, role in the heritable determination of gene expression. Using paired purified primary monocytes and B cells, we identify new predominantly cell type-specific cis and trans expression quantitative trait loci (eQTLs), including multi-locus trans associations to LYZ and KLF4 in monocytes and B cells, respectively. Additionally, we observe a B cell-specific trans association of rs11171739 at 12q13.2, a known autoimmune disease locus, with IP6K2 (P = 5.8 × 10(-15)), PRIC285 (P = 3.0 × 10(-10)) and an upstream region of CDKN1A (P = 2 × 10(-52)), suggesting roles for cell cycle regulation and peroxisome proliferator-activated receptor γ (PPARγ) signaling in autoimmune pathogenesis. We also find that specific human leukocyte antigen (HLA) alleles form trans associations with the expression of AOAH and ARHGAP24 in monocytes but not in B cells. In summary, we show that mapping gene expression in defined primary cell populations identifies new cell type-specific trans-regulated networks and provides insights into the genetic basis of disease susceptibility.
Nature genetics 2012;44;5;502-10
PUBMED: 22446964; DOI: 10.1038/ng.2205
-
Automatic categorization of diverse experimental information in the bioscience literature.
Howard Hughes Medical Institute and Biology Division, California Institute of Technology, Pasadena, CA 91125, USA.
Background: Curation of information from bioscience literature into biological knowledge databases is a crucial way of capturing experimental information in a computable form. During the biocuration process, a critical first step is to identify from all published literature the papers that contain results for a specific data type the curator is interested in annotating. This step normally requires curators to manually examine many papers to ascertain which few contain information of interest, and is thus usually time-consuming. We developed an automatic method for identifying papers containing these curation data types among a large pool of published scientific papers based on the machine learning method Support Vector Machine (SVM). This classification system is completely automatic and can be readily applied to diverse experimental data types. It has been in use in production for automatic categorization of 10 different experimental data types in the biocuration process at WormBase for the past two years, and it is in the process of being adopted in the biocuration process at FlyBase and the Saccharomyces Genome Database (SGD). We anticipate that this method can be readily adopted by various databases in the biocuration community, thereby greatly reducing the time spent on an otherwise laborious and demanding task. We also developed a simple, readily automated procedure to utilize training papers of similar data types from different bodies of literature, such as C. elegans and D. melanogaster, to identify papers with any of these data types for a single database. This approach has great significance because for some data types, especially those of low occurrence, a single corpus often does not have enough training papers to achieve satisfactory performance.
Results: We successfully tested the method on ten data types from WormBase, fifteen data types from FlyBase and three data types from Mouse Genome Informatics (MGI). It is being used in the curation workflow at WormBase for automatic association of newly published papers with ten data types including RNAi, antibody, phenotype, gene regulation, mutant allele sequence, gene expression, gene product interaction, overexpression phenotype, gene interaction, and gene structure correction.
Conclusions: Our methods are applicable to a variety of data types with training sets containing several hundred to a few thousand documents. The approach is completely automatic and thus can be readily incorporated into the workflows of different literature-based databases. We believe that the work presented here can contribute greatly to the tremendous task of automating the important yet labor-intensive biocuration effort.
Funded by: Howard Hughes Medical Institute; NHGRI NIH HHS: P41 HG000739, P41 HG002223, P41 HG002223-10S1, R01 HG004090
BMC bioinformatics 2012;13;16
PUBMED: 22280404; PMC: 3305665; DOI: 10.1186/1471-2105-13-16
-
Bifidobacterial surface-exopolysaccharide facilitates commensal-host interaction through immune modulation and pathogen protection.
Alimentary Pharmabiotic Centre, University College Cork, Cork, Ireland.
Bifidobacteria comprise a significant proportion of the human gut microbiota. Several bifidobacterial strains are currently used as therapeutic interventions, claiming various health benefits by acting as probiotics. However, the precise mechanisms by which they maintain habitation within their host and consequently provide these benefits are not fully understood. Here we show that Bifidobacterium breve UCC2003 produces a cell surface-associated exopolysaccharide (EPS), the biosynthesis of which is directed by either half of a bidirectional gene cluster, thus leading to production of one of two possible EPSs. Alternate transcription of the two opposing halves of this cluster appears to be the result of promoter reorientation. Surface EPS provided stress tolerance and promoted in vivo persistence, but not initial colonization. Marked differences were observed in host immune response: strains producing surface EPS (EPS(+)) failed to elicit a strong immune response compared with EPS-deficient variants. Specifically, EPS production was shown to be linked to the evasion of adaptive B-cell responses. Furthermore, presence of EPS(+) B. breve reduced colonization levels of the gut pathogen Citrobacter rodentium. Our data thus assign a pivotal and beneficial role for EPS in modulating various aspects of bifidobacterial-host interaction, including the ability of commensal bacteria to remain immunologically silent and in turn provide pathogen protection. This finding reinforces the probiotic concept and provides mechanistic insights into health-promoting benefits for both animal and human hosts.
Funded by: Wellcome Trust
Proceedings of the National Academy of Sciences of the United States of America 2012;109;6;2108-13
PUBMED: 22308390; PMC: 3277520; DOI: 10.1073/pnas.1115621109
-
Making your database available through Wikipedia: the pros and cons.
HHMI Janelia Farm Research Campus, 19700 Helix Drive, Ashburn, VA, USA. finnr@janelia.hhmi.org
Wikipedia, the online encyclopedia, is the most famous wiki in use today. It contains over 3.7 million pages of content, with many pages written on scientific subject matters that include peer-reviewed citations, yet are written in an accessible manner and generally reflect the consensus opinion of the community. In this, the 19th Annual Database Issue of Nucleic Acids Research, there are 11 articles that describe the use of a wiki in relation to a biological database. In this commentary, we discuss how biological databases can be integrated with Wikipedia, thereby utilising the pre-existing infrastructure, tools and above all, large community of authors (or Wikipedians). The limitations to the content that can be included in Wikipedia are highlighted, with examples drawn from articles found in this issue and other wiki-based resources, indicating why other wiki solutions are necessary. We discuss the merits of using open wikis, like Wikipedia, versus other models, with particular reference to potential vandalism. Finally, we raise the question of the future role of dedicated database biocurators in the context of the thousands of crowdsourced, community annotations that are now being stored in wikis.
Funded by: Howard Hughes Medical Institute; Wellcome Trust: WT098051
Nucleic acids research 2012;40;Database issue;D9-12
PUBMED: 22144683; PMC: 3245093; DOI: 10.1093/nar/gkr1195
-
Ensembl 2012.
European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton Cambridge CB10 1SD, UK. flicek@ebi.ac.uk
The Ensembl project (http://www.ensembl.org) provides genome resources for chordate genomes with a particular focus on human genome data as well as data for key model organisms such as mouse, rat and zebrafish. Five additional species were added in the last year including gibbon (Nomascus leucogenys) and Tasmanian devil (Sarcophilus harrisii) bringing the total number of supported species to 61 as of Ensembl release 64 (September 2011). Of these, 55 species appear on the main Ensembl website and six species are provided on the Ensembl preview site (Pre!Ensembl; http://pre.ensembl.org) with preliminary support. The past year has also seen improvements across the project.
Funded by: NHGRI NIH HHS: U01HG004695, U41HG006104, U54HG004563; Wellcome Trust: WT062023, WT079643
Nucleic acids research 2012;40;Database issue;D84-90
PUBMED: 22086963; PMC: 3245178; DOI: 10.1093/nar/gkr991
-
The importance of identifying alternative splicing in vertebrate genome annotation.
Human and Vertebrate Analysis and Annotation Team, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. af2@sanger.ac.uk
While alternative splicing (AS) can potentially expand the functional repertoire of vertebrate genomes, relatively few AS transcripts have been experimentally characterized. We describe our detailed manual annotation of vertebrate genomes, which is generating a publicly available geneset rich in AS. In order to achieve this we have adopted a highly sensitive approach to annotating gene models supported by correctly mapped, canonically spliced transcriptional evidence combined with a highly cautious approach to adding unsupported extensions to models and making decisions on their functional potential. We use information about the predicted functional potential and structural properties of every AS transcript annotated at a protein-coding or non-coding locus to place them into one of eleven subclasses. We describe the incorporation of new sequencing and proteomics technologies into our annotation pipelines, which are used to identify and validate AS. Combining all data sources has led to the production of a rich geneset containing an average of 6.3 AS transcripts for every human multi-exon protein-coding gene. The datasets produced have proved very useful in providing context to studies investigating the functional potential of genes and the effect that variation may have on gene structure and function. DATABASE URL: http://www.ensembl.org/index.html, http://vega.sanger.ac.uk/index.html.
Funded by: NHGRI NIH HHS: 5U54HG004555-04S1; Wellcome Trust: WT077198
Database : the journal of biological databases and curation 2012;2012;bas014
PUBMED: 22434846; PMC: 3308168; DOI: 10.1093/database/bas014
-
Intracranial aneurysm risk locus 5q23.2 is associated with elevated systolic blood pressure.
Public Health Genomics Unit, Department of Chronic Disease Prevention, National Institute for Health and Welfare, Helsinki, Finland. emilia.gaal@helsinki.fi
Although genome-wide association studies (GWAS) have identified hundreds of complex trait loci, the pathomechanisms of most remain elusive. Studying the genetics of risk factors predisposing to disease is an attractive approach to identify targets for functional studies. Intracranial aneurysms (IA) are rupture-prone pouches at cerebral artery branching sites. IA is a complex disease for which GWAS have identified five loci with strong association and a further 14 loci with suggestive association. To decipher potential underlying disease mechanisms, we tested whether there are IA loci that convey their effect through elevating blood pressure (BP), a strong risk factor of IA. We performed a meta-analysis of four population-based Finnish cohorts (n(FIN) = 11 266) not selected for IA, to assess the association of previously identified IA candidate loci (n = 19) with BP. We defined systolic BP (SBP), diastolic BP, mean arterial pressure, and pulse pressure as quantitative outcome variables. The most significant result was further tested for association in the ICBP-GWAS cohort of 200 000 individuals. We found that the suggestive IA locus at 5q23.2 in PRDM6 was significantly associated with SBP in individuals of European descent (p(FIN) = 3.01E-05, p(ICBP-GWAS) = 0.0007, p(ALL) = 8.13E-07). The risk allele of IA was associated with higher SBP. PRDM6 encodes a protein predominantly expressed in vascular smooth muscle cells. Our study connects a complex disease (IA) locus with a common risk factor for the disease (SBP). We hypothesize that common variants in PRDM6 can contribute to altered vascular wall structure, hence increasing SBP and predisposing to IA. True positive associations often fail to reach genome-wide significance in GWAS. Our findings show that analysis of traditional risk factors as intermediate phenotypes is an effective tool for deciphering hidden heritability. Further, we demonstrate that common disease loci identified in a population isolate may bear wider significance.
Funded by: Medical Research Council: G0500539, G0600705; NHLBI NIH HHS: 5R01HL087679-02; NIMH NIH HHS: 1RL1MH083268-01; Wellcome Trust: GR069224
PLoS genetics 2012;8;3;e1002563
PUBMED: 22438818; PMC: 3305343; DOI: 10.1371/journal.pgen.1002563
-
Exploiting genetic complexity in cancer to improve therapeutic strategies.
Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, UK. mj12@sanger.ac.uk
Advances in genome sequencing technologies are enabling researchers to make rapid progress in defining the entire repertoire of causal genetic changes in cancer. The response of patients with cancer to therapy is often highly variable and there is an increasing number of examples where mutations in cancer genomes have been shown to have a profound effect on the clinical effectiveness of drugs. An urgent challenge for the research and clinical communities is how to translate these genomic data sets into new and improved therapeutic strategies for the treatment of patients. The use of large-scale cell line-based drug screens to identify genomic 'biomarkers' of drug response for the stratification of patients has the potential to transform how patients with cancer are treated.
Drug discovery today 2012;17;5-6;188-93
PUBMED: 22342219; DOI: 10.1016/j.drudis.2012.01.025
-
Systematic identification of genomic markers of drug sensitivity in cancer cells.
Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.
Clinical responses to anticancer therapies are often restricted to a subset of patients. In some cases, mutated cancer genes are potent biomarkers for responses to targeted agents. Here, to uncover new biomarkers of sensitivity and resistance to cancer therapeutics, we screened a panel of several hundred cancer cell lines--which represent much of the tissue-type and genetic diversity of human cancers--with 130 drugs under clinical and preclinical investigation. In aggregate, we found that mutated cancer genes were associated with cellular response to most currently available cancer drugs. Classic oncogene addiction paradigms were modified by additional tissue-specific or expression biomarkers, and some frequently mutated genes were associated with sensitivity to a broad range of therapeutic agents. Unexpected relationships were revealed, including the marked sensitivity of Ewing's sarcoma cells harbouring the EWS (also known as EWSR1)-FLI1 gene translocation to poly(ADP-ribose) polymerase (PARP) inhibitors. By linking drug activity to the functional complexity of cancer genomes, systematic pharmacogenomic profiling in cancer cell lines provides a powerful biomarker discovery platform to guide rational cancer therapeutic strategies.
Funded by: Howard Hughes Medical Institute; NHGRI NIH HHS: 1U54HG006097-01; NIGMS NIH HHS: P41GM079575-02; Wellcome Trust: 086357
Nature 2012;483;7391;570-5
PUBMED: 22460902; PMC: 3349233; DOI: 10.1038/nature11005
-
Intratumor heterogeneity and branched evolution revealed by multiregion sequencing.
Cancer Research UK London Research Institute, London, United Kingdom.
Background: Intratumor heterogeneity may foster tumor evolution and adaptation and hinder personalized-medicine strategies that depend on results from single tumor-biopsy samples.
Methods: To examine intratumor heterogeneity, we performed exome sequencing, chromosome aberration analysis, and ploidy profiling on multiple spatially separated samples obtained from primary renal carcinomas and associated metastatic sites. We characterized the consequences of intratumor heterogeneity using immunohistochemical analysis, mutation functional analysis, and profiling of messenger RNA expression.
Results: Phylogenetic reconstruction revealed branched evolutionary tumor growth, with 63 to 69% of all somatic mutations not detectable across every tumor region. Intratumor heterogeneity was observed for a mutation within an autoinhibitory domain of the mammalian target of rapamycin (mTOR) kinase, correlating with S6 and 4EBP phosphorylation in vivo and constitutive activation of mTOR kinase activity in vitro. Mutational intratumor heterogeneity was seen for multiple tumor-suppressor genes converging on loss of function; SETD2, PTEN, and KDM5C underwent multiple distinct and spatially separated inactivating mutations within a single tumor, suggesting convergent phenotypic evolution. Gene-expression signatures of good and poor prognosis were detected in different regions of the same tumor. Allelic composition and ploidy profiling analysis revealed extensive intratumor heterogeneity, with 26 of 30 tumor samples from four tumors harboring divergent allelic-imbalance profiles and with ploidy heterogeneity in two of four tumors.
Conclusions: Intratumor heterogeneity can lead to underestimation of the tumor genomics landscape portrayed from single tumor-biopsy samples and may present major challenges to personalized-medicine and biomarker development. Intratumor heterogeneity, associated with heterogeneous protein function, may foster tumor adaptation and therapeutic failure through Darwinian selection. (Funded by the Medical Research Council and others.).
Funded by: Cancer Research UK; Medical Research Council; Wellcome Trust
The New England journal of medicine 2012;366;10;883-92
PUBMED: 22397650; DOI: 10.1056/NEJMoa1113205
-
The role of variation at AβPP, PSEN1, PSEN2, and MAPT in late onset Alzheimer's disease.
MRC Centre for Neuropsychiatric Genetics and Genomics, Department of Psychological Medicine and Neurology, School of Medicine, Neuroscience and Mental Health Research Institute, Cardiff University, Cardiff, UK.
Rare mutations in AβPP, PSEN1, and PSEN2 cause uncommon early onset forms of Alzheimer's disease (AD), and common variants in MAPT are associated with risk of other neurodegenerative disorders. We sought to establish whether common genetic variation in these genes confers risk to the common form of AD, which occurs later in life (>65 years). We therefore tested single-nucleotide polymorphisms at these loci for association with late-onset AD (LOAD) in a large case-control sample consisting of 3,940 cases and 13,373 controls. Single-marker analysis did not identify any variants that reached genome-wide significance, a result which is supported by other recent genome-wide association studies. However, we did observe a significant association at the MAPT locus using a gene-wide approach (p = 0.009). We also observed suggestive association between AD and the marker rs9468, which defines the H1 haplotype, an extended haplotype that spans the MAPT gene and has previously been implicated in other neurodegenerative disorders including Parkinson's disease, progressive supranuclear palsy, and corticobasal degeneration. In summary, common variants at AβPP, PSEN1, PSEN2, and MAPT are unlikely to make strong contributions to susceptibility for LOAD. However, the gene-wide effect observed at MAPT indicates a possible contribution to disease risk which requires further study.
Funded by: Biotechnology and Biological Sciences Research Council: G0700704/84698; Chief Scientist Office; Medical Research Council; Wellcome Trust
Journal of Alzheimer's disease : JAD 2012;28;2;377-87
PUBMED: 22027014; DOI: 10.3233/JAD-2011-110824
-
Haplotype analyses of haemoglobin C and haemoglobin S and the dynamics of the evolutionary response to malaria in Kassena-Nankana District of Ghana.
Noguchi Memorial Institute for Medical Research, University of Ghana, Accra, Ghana.
Background: Haemoglobin S (HbS) and C (HbC) are variants of the HBB gene which both protect against malaria. It is not clear, however, how these two alleles have evolved in the West African countries where they co-exist at high frequencies. Here we use haplotypic signatures of selection to investigate the evolutionary history of the malaria-protective alleles HbS and HbC in the Kassena-Nankana District (KND) of Ghana.
Methodology/Principal findings: The haplotypic structure of HbS and HbC alleles was investigated by genotyping 56 SNPs around the HBB locus. We found that, in the KND population, both alleles reside on extended haplotypes (approximately 1.5 Mb for HbS and 650 kb for HbC) that are significantly less diverse than those of the ancestral HbA allele. The extended haplotypes span a recombination hotspot that is known to exist in this region of the genome.
Significance: Our findings show strong support for recent positive selection of both the HbS and HbC alleles and provide insights into how these two alleles have both evolved in the population of northern Ghana.
PloS one 2012;7;4;e34565
PUBMED: 22506028; PMC: 3323552; DOI: 10.1371/journal.pone.0034565
-
Binding of more than one Tva800 molecule is required for ASLV-A entry.
Division of Virology, MRC National Institute for Medical Research, The Ridgeway, Mill Hill, London NW7 1AA, UK.
Background: Understanding the mechanism by which viruses enter their target cell is an essential part of understanding their infectious cycle. Previous studies have focussed on the multiplicity of viral envelope proteins that need to bind to their cognate receptor to initiate entry. Avian sarcoma and leukosis virus Envelope protein (ASLV Env) mediates entry via a receptor, Tva, which can be attached to the cell surface either by a phospholipid anchor (Tva800) or a transmembrane domain (Tva950). In these studies, we have now investigated the number of target receptors necessary for entry of ASLV Env-pseudotyped virions.
Results: Using titration and modelling experiments we provide evidence that binding of more than one receptor, probably two, is needed for entry of virions via Tva800. However, binding of just one Tva950 receptor is sufficient for successful entry.
Conclusions: The different modes of attachment of Tva800 and Tva950 to the cell membrane have important implications for the utilisation of these proteins as receptors for viral binding and/or uptake.
Funded by: Medical Research Council: U117512710; NCI NIH HHS: R37 CA 089441, R37 CA089441-12; Wellcome Trust: 091747
Retrovirology 2011;8;96
PUBMED: 22099981; PMC: 3267798; DOI: 10.1186/1742-4690-8-96
-
BioMart Central Portal: an open database network for the biological community.
Ontario Institute for Cancer Research, Toronto, M5G 0A3, Canada.
BioMart Central Portal is a first-of-its-kind, community-driven effort to provide unified access to dozens of biological databases spanning genomics, proteomics, model organisms, cancer data, ontology information and more. Anybody can contribute an independently maintained resource to the Central Portal, allowing it to be exposed to and shared with the research community, and linking it with the other resources in the portal. Users can take advantage of the common interface to quickly utilize different sources without learning a new system for each. The system also simplifies cross-database searches that might otherwise require several complicated steps. Several integrated tools streamline common tasks, such as converting between ID formats and retrieving sequences. The combination of a wide variety of databases, an easy-to-use interface, robust programmatic access and the array of tools makes Central Portal a one-stop shop for biological data querying. Here, we describe the structure of Central Portal and show example queries to demonstrate its capabilities.
Database : the journal of biological databases and curation 2011;2011;bar041
PUBMED: 21930507; PMC: 3263598; DOI: 10.1093/database/bar041
-
Using RNA-seq to determine the transcriptional landscape and the hypoxic response of the pathogenic yeast Candida parapsilosis.
School of Medicine and Medical Science, Conway Institute, University College Dublin, Belfield, Dublin 4, Ireland.
Background: Candida parapsilosis is one of the most common causes of Candida infection worldwide. However, the genome sequence annotation was made without experimental validation and little is known about the transcriptional landscape. The transcriptional response of C. parapsilosis to hypoxic (low oxygen) conditions, such as those encountered in the host, is also relatively unexplored.
Results: We used next generation sequencing (RNA-seq) to determine the transcriptional profile of C. parapsilosis growing in several conditions including different media, temperatures and oxygen concentrations. We identified 395 novel protein-coding sequences that had not previously been annotated. We removed > 300 unsupported gene models, and corrected approximately 900. We mapped the 5' and 3' UTR for thousands of genes. We also identified 422 introns, including two introns in the 3' UTR of one gene. This is the first report of 3' UTR introns in the Saccharomycotina. Comparing the introns in coding sequences with other species shows that small numbers have been gained and lost throughout evolution. Our analysis also identified a number of novel transcriptionally active regions (nTARs). We used both RNA-seq and microarray analysis to determine the transcriptional profile of cells grown in normoxic and hypoxic conditions in rich media, and we showed that there was a high correlation between the approaches. We also generated a knockout of the UPC2 transcriptional regulator, and we found that, similar to C. albicans, Upc2 is required for conferring resistance to azole drugs, and for regulation of expression of the ergosterol pathway in hypoxia.
Conclusion: We provide the first detailed annotation of the C. parapsilosis genome, based on gene predictions and transcriptional analysis. We identified a number of novel ORFs and other transcribed regions, and detected transcripts from approximately 90% of the annotated protein coding genes. We found that the transcription factor Upc2 has a conserved role as a major regulator of the hypoxic response in C. parapsilosis and C. albicans.
Funded by: Wellcome Trust: WT 085775/Z/08/Z
BMC genomics 2011;12;628
PUBMED: 22192698; PMC: 3287387; DOI: 10.1186/1471-2164-12-628
-
Chado controller: advanced annotation management with a community annotation system.
CIRAD, UMR AGAP, F-34398 Montpellier, France. valentin.guignon@cirad.fr
Summary: We developed a controller that is compliant with the Chado database schema, GBrowse and genome annotation-editing tools such as Artemis and Apollo. It enables the management of public and private data, monitors manual annotation (with controlled vocabularies, structural and functional annotation controls) and stores versions of annotation for all modified features. The Chado controller uses PostgreSQL and Perl.
Availability: The Chado Controller package is available for download at http://www.gnpannot.org/content/chado-controller and runs on any Unix-like operating system, and documentation is available at http://www.gnpannot.org/content/chado-controller-doc. The system can be tested using the GNPAnnot Sandbox at http://www.gnpannot.org/content/gnpannot-sandbox-form
Contact: valentin.guignon@cirad.fr; stephanie.sidibe-bocs@cirad.fr
Supplementary data are available at Bioinformatics online.
Bioinformatics (Oxford, England) 2012;28;7;1054-6
PUBMED: 22285827; PMC: 3315714; DOI: 10.1093/bioinformatics/bts046
-
Afghanistan's ethnic groups share a Y-chromosomal heritage structured by historical events.
The Lebanese American University, Chouran, Beirut, Lebanon.
Afghanistan has held a strategic position throughout history. It has been inhabited since the Paleolithic and later became a crossroads for expanding civilizations and empires. Afghanistan's location, history, and diverse ethnic groups present a unique opportunity to explore how nations and ethnic groups emerged, and how major cultural evolutions and technological developments in human history have influenced modern population structures. In this study we have analyzed, for the first time, the four major ethnic groups in present-day Afghanistan: Hazara, Pashtun, Tajik, and Uzbek, using 52 binary markers and 19 short tandem repeats on the non-recombinant segment of the Y-chromosome. A total of 204 Afghan samples were investigated along with more than 8,500 samples from surrounding populations important to Afghanistan's history through migrations and conquests, including Iranians, Greeks, Indians, Middle Easterners, East Europeans, and East Asians. Our results suggest that all current Afghans largely share a heritage derived from a common unstructured ancestral population that could have emerged during the Neolithic revolution and the formation of the first farming communities. Our results also indicate that inter-Afghan differentiation started during the Bronze Age, probably driven by the formation of the first civilizations in the region. Later migrations and invasions into the region have been assimilated differentially among the ethnic groups, increasing inter-population genetic differences, and giving the Afghans a unique genetic diversity in Central Asia.
Funded by: Wellcome Trust
PloS one 2012;7;3;e34288
PUBMED: 22470552; PMC: 3314501; DOI: 10.1371/journal.pone.0034288
-
Whole-genome analysis of diverse Chlamydia trachomatis strains identifies phylogenetic relationships masked by current clinical typing.
Pathogen Genomics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK. sh16@sanger.ac.uk
Chlamydia trachomatis is responsible for both trachoma and sexually transmitted infections, causing substantial morbidity and economic cost globally. Despite this, our knowledge of its population and evolutionary genetics is limited. Here we present a detailed phylogeny based on whole-genome sequencing of representative strains of C. trachomatis from both trachoma and lymphogranuloma venereum (LGV) biovars from temporally and geographically diverse sources. Our analysis shows that predicting phylogenetic structure using ompA, which is traditionally used to classify Chlamydia, is misleading because extensive recombination in this region masks any true relationships present. We show that in many instances, ompA is a chimera that can be exchanged in part or as a whole both within and between biovars. We also provide evidence for exchange of, and recombination within, the cryptic plasmid, which is another key diagnostic target. We used our phylogenetic framework to show how genetic exchange has manifested itself in ocular, urogenital and LGV C. trachomatis strains, including the epidemic LGV serotype L2b.
Funded by: Wellcome Trust: 080348, 098051
Nature genetics 2012;44;4;413-9, S1
PUBMED: 22406642; DOI: 10.1038/ng.2214
-
Comparative genomic analyses of the Taylorellae.
Wellcome Trust Sanger Institute, Hinxton, CB10 1SA, UK.
Contagious equine metritis (CEM) is an important venereal disease of horses that is of concern to the thoroughbred industry. Taylorella equigenitalis is a causative agent of CEM but very little is known about it or its close relative Taylorella asinigenitalis. To reveal novel information about Taylorella biology, comparative genomic analyses were undertaken. Whole genome sequencing was performed for the T. equigenitalis type strain, NCTC11184. Draft genome sequences were produced for a second T. equigenitalis strain and for a strain of T. asinigenitalis. These genome sequences were analysed and compared to each other and the recently released genome sequence of T. equigenitalis MCE9. These analyses revealed that T. equigenitalis strains appear to be very similar to each other with relatively little strain-specific DNA content. A number of genes were identified that encode putative toxins and adhesins that are possibly involved in infection. Analysis of T. asinigenitalis revealed that it has a very similar gene repertoire to that of T. equigenitalis but shares surprisingly little DNA sequence identity with it. The generation of genome sequence information substantially increases knowledge of these poorly characterised bacteria and greatly facilitates their study.
Veterinary microbiology 2012
PUBMED: 22541164; DOI: 10.1016/j.vetmic.2012.03.041
-
Interactions between PPAR-α and inflammation-related cytokine genes on the development of Alzheimer's disease, observed by the Epistasis Project.
Department of Psychiatry, University of Bonn, Bonn, Germany; Department of Psychiatry, Royal Derby Hospital, Uttoxeter Road, Derby DE22 3WQ, UK.
Objective: Neuroinflammation contributes to the pathogenesis of sporadic Alzheimer's disease (AD). Variations in genes relevant to inflammation may therefore be candidate genes for AD risk. Whole-genome association studies have identified relevant new and known genes, but their combined effects do not explain all of the risk, and genetic interactions may contribute. We investigated whether genes involved in inflammation, i.e. PPAR-α and the interleukins (IL) IL-1α, IL-1β, IL-6 and IL-10, may interact to increase AD risk.
Methods: The Epistasis Project identifies interactions that affect the risk of AD. Single nucleotide polymorphisms (SNPs) in PPARA, IL1A, IL1B, IL6 and IL10 were genotyped. Possible associations were analyzed by fitting logistic regression models with AD as the outcome, controlling for centre, age, sex and presence of the apolipoprotein ε4 allele (APOEε4). Adjusted synergy factors were derived from the interaction terms, with significance set at p < 0.05 (two-sided).
Results: We observed four significant interactions between different SNPs in PPARA and in interleukins IL1A, IL1B, IL10 that may affect AD risk. There were no significant interactions between PPARA and IL6.
Conclusions: In addition to an association of the PPARA L162V polymorphism with AD risk, we observed four significant interactions between SNPs in PPARA and SNPs in IL1A, IL1B and IL10 affecting AD risk. We prove that gene-gene interactions explain part of the heritability of AD and should be considered when assessing the genetic risk. Replication will require between 1450 and 2950 cases and as many controls, depending on the prevalence of the SNP, to have 80% power to detect the observed synergy factors.
International journal of molecular epidemiology and genetics 2012;3;1;39-47
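The synergy factor used in this entry can be made concrete with a short sketch. Assuming a saturated, unadjusted logistic model with two binary risk factors A and B, the exponentiated interaction coefficient equals SF = OR(A and B) / (OR(A only) x OR(B only)), with every odds ratio taken relative to carriers of neither factor. The Python below is illustrative only: the function names and counts are hypothetical, and it omits the adjustment for centre, age, sex and APOEε4 described in the Methods.

```python
import math


def odds_ratio(cases, controls, ref_cases, ref_controls):
    """Odds ratio of one exposure group relative to the reference group."""
    return (cases / controls) / (ref_cases / ref_controls)


def synergy_factor(counts):
    """counts maps (has_a, has_b) -> (cases, controls).

    Unadjusted SF = OR(A and B) / (OR(A only) * OR(B only)),
    all odds ratios relative to the (False, False) group.
    """
    ref_cases, ref_controls = counts[(False, False)]
    or_a = odds_ratio(*counts[(True, False)], ref_cases, ref_controls)
    or_b = odds_ratio(*counts[(False, True)], ref_cases, ref_controls)
    or_ab = odds_ratio(*counts[(True, True)], ref_cases, ref_controls)
    return or_ab / (or_a * or_b)


def synergy_from_interaction(beta, se, z=1.96):
    """SF = exp(beta) for a logistic interaction coefficient beta,
    with an approximate 95% Wald confidence interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


if __name__ == "__main__":
    # Hypothetical case/control counts for each combination of two risk factors.
    counts = {
        (False, False): (100, 200),
        (True, False): (60, 80),
        (False, True): (50, 100),
        (True, True): (60, 40),
    }
    print(synergy_factor(counts))  # → 2.0
```

An SF above 1 means the joint effect exceeds the product of the separate odds ratios; with covariates in the model, the adjusted SF is exp of the fitted interaction coefficient rather than this raw ratio.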
-
A review of the role of stem cells in the development and treatment of glioma.
Centre for Brain Repair, Department of Clinical Neurosciences, Cambridge University, E.D. Adrian Building, Forvie Site, Hills Road, Cambridge, CB2 0PY, UK.
The neurosurgical management of patients with intrinsic glial cancers is one of the most rapidly evolving areas of practice. This has been fuelled by advances in surgical technique not only in cytoreduction but also in drug delivery. Further innovation will depend on a deeper understanding of the biology of the disease and an appreciation of the limitations of current knowledge. Here we review the controversial topic of cancer stem cells applied to glioma to provide neurosurgeons with a working overview. It is now recognised that the adult human brain contains regionally specified cell populations capable of self-renewal that may contribute to tumour growth and maintenance following accumulated mutational change. Tumour cells adapted to maintain growth demonstrate some stem-like characteristics and as such constitute a legitimate therapeutic target. Cellular reprogramming technologies raise the potential of developing stem cells as novel surgical tools to target disease and possibly ameliorate some of the consequences of treatment. Achieving these goals remains a significant challenge to neurosurgical oncologists, not least in challenging how we think about treating brain cancer. This review will briefly examine our understanding of adult stem cells within the brain, the evidence that they contribute to the development of brain tumours as tumour-initiating cells, and the potential implications for therapy. It will also look at the role stem cells may play in the future management of glioma.
Acta neurochirurgica 2012;154;6;951-69
PUBMED: 22527576; DOI: 10.1007/s00701-012-1338-9
-
High-resolution genotyping of the endemic Salmonella Typhi population during a Vi (typhoid) vaccination trial in Kolkata.
Department of Microbiology and Immunology, University of Melbourne, Melbourne, Australia. kholt@unimelb.edu.au
Background: Typhoid fever, caused by Salmonella enterica serovar Typhi (S. Typhi), is a major health problem especially in developing countries. Vaccines against typhoid are commonly used by travelers but less so by residents of endemic areas.
Methodology: We used single nucleotide polymorphism (SNP) typing to investigate the population structure of 372 S. Typhi isolates collected during a typhoid disease burden study and Vi vaccine trial in Kolkata, India. Approximately sixty thousand people were enrolled for fever surveillance for 19 months prior to, and 24 months following, Vi vaccination of one third of the study population (May 2003-December 2006, vaccinations given December 2004).
A diverse S. Typhi population was detected, including 21 haplotypes. The most common were of the H58 haplogroup (69%), which included all multidrug-resistant isolates (defined as resistance to chloramphenicol, ampicillin and co-trimoxazole). Quinolone resistance was particularly high among H58-G isolates (97% nalidixic acid resistant, 30% with reduced susceptibility to ciprofloxacin). Multiple typhoid fever episodes were detected in 22 households; however, household clustering was not associated with specific S. Typhi haplotypes.
Conclusions: Typhoid fever in Kolkata is caused by a diverse population of S. Typhi; however, H58 haplotypes dominate and are associated with multidrug and quinolone resistance. Vi vaccination did not obviously affect the haplotype population structure of the S. Typhi circulating during the study period.
Funded by: Wellcome Trust
PLoS neglected tropical diseases 2012;6;1;e1490
PUBMED: 22303491; PMC: 3269425; DOI: 10.1371/journal.pntd.0001490
-
1000 Genomes-based imputation identifies novel and refined associations for the Wellcome Trust Case Control Consortium phase 1 Data.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.
We hypothesize that imputation based on data from the 1000 Genomes Project can identify novel association signals on a genome-wide scale, owing to its dense marker map and large number of haplotypes. To test this hypothesis, the Wellcome Trust Case Control Consortium (WTCCC) Phase I genotype data were imputed using 1000 Genomes as reference (20100804 EUR), and seven case/control association studies were performed using imputed dosages. We observed two 'missed' disease-associated variants that were undetectable by the original WTCCC analysis but were reported by later studies after the 2007 WTCCC publication. One is within the IL2RA gene, for association with type 1 diabetes, and the other is near the CDKN2B gene, for association with type 2 diabetes. We also identified two refined associations. One is SNP rs11209026 in exon 9 of IL23R, for association with Crohn's disease, which PolyPhen2 predicts to be 'probably damaging'. The other refined variant is in the CUX2 gene region, for association with type 1 diabetes, where the newly identified top SNP rs1265564 has an association P-value of 1.68 × 10(-16). The new lead SNPs at the two refined loci provide a more plausible explanation for the disease association. We demonstrated that 1000 Genomes-based imputation can indeed identify both novel signals (in our case, 'missed' signals that were detected and replicated only by studies after 2007) and refined signals. We anticipate that these findings will provide timely information as individual groups and consortia begin to engage in 1000 Genomes-based imputation. European Journal of Human Genetics advance online publication, 1 February 2012; doi:10.1038/ejhg.2012.3.
Funded by: NCI NIH HHS: R01 CA082659-11S1
European journal of human genetics : EJHG 2012
PUBMED: 22293688; DOI: 10.1038/ejhg.2012.3
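The imputed dosages mentioned in this entry are commonly expected allele counts computed from per-genotype posterior probabilities. The sketch below assumes a biallelic SNP with imputed probabilities for genotypes AA, AB and BB; the function name is hypothetical and not part of any imputation package. It returns the expected number of B alleles, a value between 0 and 2 that can then serve as a continuous predictor in an association test.

```python
def expected_dosage(p_aa: float, p_ab: float, p_bb: float, tol: float = 1e-6) -> float:
    """Expected count of allele B (between 0 and 2) from imputed genotype probabilities."""
    probs = (p_aa, p_ab, p_bb)
    if any(p < 0 for p in probs) or abs(sum(probs) - 1.0) > tol:
        raise ValueError("genotype probabilities must be non-negative and sum to 1")
    # A heterozygote carries one B allele and a BB homozygote carries two.
    return p_ab + 2.0 * p_bb


if __name__ == "__main__":
    # Hypothetical posterior genotype probabilities for three individuals at one SNP.
    samples = [(1.0, 0.0, 0.0), (0.1, 0.3, 0.6), (0.0, 0.0, 1.0)]
    print([round(expected_dosage(*p), 3) for p in samples])  # → [0.0, 1.5, 2.0]
```

Using the dosage rather than the single most likely genotype keeps the imputation uncertainty in the test statistic instead of discarding it.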
-
InterPro in 2011: new developments in the family and domain prediction database.
EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SD Cambridge, UK. hunter@ebi.ac.uk
InterPro (http://www.ebi.ac.uk/interpro/) is a database that integrates diverse information about protein families, domains and functional sites, and makes it freely available to the public via Web-based interfaces and services. Central to the database are diagnostic models, known as signatures, against which protein sequences can be searched to determine their potential function. InterPro has utility in the large-scale analysis of whole genomes and meta-genomes, as well as in characterizing individual protein sequences. Herein we give an overview of new developments in the database and its associated software since 2009, including updates to database content, curation processes and Web and programmatic interfaces.
Funded by: Biotechnology and Biological Sciences Research Council: BB/F010508/1; NIGMS NIH HHS: GM081084
Nucleic acids research 2012;40;Database issue;D306-12
PUBMED: 22096229; PMC: 3245097; DOI: 10.1093/nar/gkr948
-
A method to infer positive selection from marker dynamics in an asexual population.
Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.
MOTIVATION: The observation of positive selection acting on a mutant indicates that the corresponding mutation has some form of functional relevance. Determining the fitness effects of mutations is thus relevant to many interesting biological questions. One means of identifying beneficial mutations in an asexual population is to observe changes in the frequency of marked subsets of the population. Here we describe a method to estimate the establishment times and fitnesses of beneficial mutations from neutral marker frequency data. RESULTS: The method accurately reproduces complex marker frequency trajectories. In simulations in which positive selection is close to 5% per generation, we obtain correlations of 0.91 or more between true and inferred haplotype establishment times. Where mutation selection coefficients are exponentially distributed, the inferred distribution of haplotype fitnesses is close to the true distribution. Applied to data from a bacterial evolution experiment, our method reproduces an observed correlation between evolvability and initial fitness defect.
Funded by: Wellcome Trust: 098051
Bioinformatics (Oxford, England) 2012;28;6;831-7
PUBMED: 22223745; PMC: 3307107; DOI: 10.1093/bioinformatics/btr722
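To illustrate why marker frequency trajectories carry information about fitness, consider the simplest deterministic case, which is an assumption and far simpler than the published method: a marked subpopulation carrying a beneficial mutation with selection coefficient s rises in frequency logistically, so logit p(t) = logit p(0) + s·t. The hypothetical helpers below simulate such a trajectory and recover s from two observations; they ignore drift, clonal interference and multiple competing haplotypes, which real marker data typically involve.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def frequency_after(p0: float, s: float, t: float) -> float:
    """Frequency after t generations of a subpopulation with fitness advantage s
    (continuous-time deterministic logistic approximation)."""
    growth = math.exp(s * t)
    return p0 * growth / (1.0 - p0 + p0 * growth)


def estimate_s(p1: float, t1: float, p2: float, t2: float) -> float:
    """Selection coefficient as the slope of logit frequency between two time points."""
    return (logit(p2) - logit(p1)) / (t2 - t1)


if __name__ == "__main__":
    # Hypothetical marker at 1% frequency carrying a mutation with s = 0.05 per
    # generation, comparable to the 5% selection used in the simulations above.
    p100 = frequency_after(0.01, 0.05, 100)
    print(round(p100, 3))
    print(round(estimate_s(0.01, 0, p100, 100), 6))  # → 0.05
```

In this toy model the logit of the marker frequency is a straight line whose slope is s; departures from a straight line in real data are what more elaborate methods must explain.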
-
Genome-wide association study identifies a variant in HDAC9 associated with large vessel ischemic stroke.
Wellcome Trust Centre for Human Genetics, University of Oxford, UK.
Genetic factors have been implicated in stroke risk, but few replicated associations have been reported. We conducted a genome-wide association study (GWAS) for ischemic stroke and its subtypes in 3,548 affected individuals and 5,972 controls, all of European ancestry. Replication of potential signals was performed in 5,859 affected individuals and 6,281 controls. We replicated previous associations for cardioembolic stroke near PITX2 and ZFHX3 and for large vessel stroke at a 9p21 locus. We identified a new association for large vessel stroke within HDAC9 (encoding histone deacetylase 9) on chromosome 7p21.1 (including further replication in an additional 735 affected individuals and 28,583 controls) (rs11984041; combined P = 1.87 × 10(-11); odds ratio (OR) = 1.42, 95% confidence interval (CI) = 1.28-1.57). All four loci exhibited evidence for heterogeneity of effect across the stroke subtypes, with some and possibly all affecting risk for only one subtype. This suggests distinct genetic architectures for different stroke subtypes.
Funded by: Medical Research Council: G0000934; Wellcome Trust: 068545/Z/02, 085475/B/08/Z, 085475/Z/08/Z, WT084724MA
Nature genetics 2012;44;3;328-33
PUBMED: 22306652; PMC: 3303115; DOI: 10.1038/ng.1081
-
Antigenic diversity is generated by distinct evolutionary mechanisms in African trypanosome species.
Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, United Kingdom. andrew.jackson@sanger.ac.uk
Antigenic variation enables pathogens to avoid the host immune response by continual switching of surface proteins. The protozoan blood parasite Trypanosoma brucei causes human African trypanosomiasis ("sleeping sickness") across sub-Saharan Africa and is a model system for antigenic variation, surviving by periodically replacing a monolayer of variant surface glycoproteins (VSG) that covers its cell surface. We compared the genome of Trypanosoma brucei with those of two closely related parasites, Trypanosoma congolense and Trypanosoma vivax, to reveal how the variant antigen repertoire has evolved and how it might affect contemporary antigenic diversity. We reconstruct VSG diversification, showing that Trypanosoma congolense uses variant antigens derived from multiple ancestral VSG lineages, whereas in Trypanosoma brucei VSG have recent origins, and ancestral gene lineages have been repeatedly co-opted to novel functions. These historical differences are reflected in fundamental differences between species in the scale and mechanism of recombination. Using phylogenetic incompatibility as a metric for genetic exchange, we show that the frequency of recombination is comparable between Trypanosoma congolense and Trypanosoma brucei but is much lower in Trypanosoma vivax. Furthermore, in showing that the C-terminal domain of Trypanosoma brucei VSG plays a crucial role in facilitating exchange, we reveal substantial species differences in the mechanism of VSG diversification. Our results demonstrate how past VSG evolution indirectly determines the ability of contemporary parasites to generate novel variant antigens through recombination and suggest that the current model for antigenic variation in Trypanosoma brucei is only one means by which these parasites maintain chronic infections.
Funded by: Wellcome Trust: 085349/Z/08/Z, WT 055558/Z/98/A, WT 055558/Z/98/C, WT 085775/Z/08/Z
Proceedings of the National Academy of Sciences of the United States of America 2012;109;9;3416-21
PUBMED: 22331916; PMC: 3295286; DOI: 10.1073/pnas.1117313109
-
Bcl11a is required for neuronal morphogenesis and sensory circuit formation in dorsal spinal cord development.
Institute of Molecular and Cellular Anatomy, Ulm University, 89081 Ulm, Germany.
Dorsal spinal cord neurons receive and integrate somatosensory information provided by neurons located in dorsal root ganglia. Here we demonstrate that dorsal spinal neurons require the Krüppel-C(2)H(2) zinc-finger transcription factor Bcl11a for terminal differentiation and morphogenesis. The disrupted differentiation of dorsal spinal neurons observed in Bcl11a mutant mice interferes with their correct innervation by cutaneous sensory neurons. To understand the mechanism underlying the innervation deficit, we characterized changes in gene expression in the dorsal horn of Bcl11a mutants and identified dysregulated expression of the gene encoding secreted frizzled-related protein 3 (sFRP3, or Frzb). Frzb mutant mice show a deficit in the innervation of the spinal cord, suggesting that the dysregulated expression of Frzb can account in part for the phenotype of Bcl11a mutants. Thus, our genetic analysis of Bcl11a reveals essential functions of this transcription factor in neuronal morphogenesis and sensory wiring of the dorsal spinal cord and identifies Frzb, a component of the Wnt pathway, as a downstream acting molecule involved in this process.
Development (Cambridge, England) 2012;139;10;1831-41
PUBMED: 22491945; DOI: 10.1242/dev.072850
-
The genomic basis of adaptive evolution in threespine sticklebacks.
Department of Developmental Biology, Beckman Center B300, Stanford University School of Medicine, Stanford California 94305, USA.
Marine stickleback fish have colonized and adapted to thousands of streams and lakes formed since the last ice age, providing an exceptional opportunity to characterize genomic mechanisms underlying repeated ecological adaptation in nature. Here we develop a high-quality reference genome assembly for threespine sticklebacks. By sequencing the genomes of twenty additional individuals from a global set of marine and freshwater populations, we identify a genome-wide set of loci that are consistently associated with marine-freshwater divergence. Our results indicate that reuse of globally shared standing genetic variation, including chromosomal inversions, has an important role in repeated evolution of distinct marine and freshwater sticklebacks, and in the maintenance of divergent ecotypes during early stages of reproductive isolation. Both coding and regulatory changes occur in the set of loci underlying marine-freshwater evolution, but regulatory changes appear to predominate in this well known example of repeated adaptive evolution in nature.
Funded by: Howard Hughes Medical Institute; NHGRI NIH HHS: P50-HG002568
Nature 2012;484;7392;55-61
PUBMED: 22481358; PMC: 3322419; DOI: 10.1038/nature10944
-
Misuse of hierarchical linear models overstates the significance of a reported association between OXTR and prosociality.
Statistical and Computational Genetics, Wellcome Trust Sanger Institute, Cambridge CB10 1HH, United Kingdom.
Proceedings of the National Academy of Sciences of the United States of America 2012;109;18;E1048
PUBMED: 22499788; PMC: 3344973; DOI: 10.1073/pnas.1202539109
-
Reply to "Human genetic studies on osteoarthritis from clinicians' viewpoints".
Osteoarthritis and cartilage / OARS, Osteoarthritis Research Society 2012;20;3;250-1; author reply 252
PUBMED: 22233813; DOI: 10.1016/j.joca.2011.09.008
-
Avidity-based extracellular interaction screening (AVEXIS) for the scalable detection of low-affinity extracellular receptor-ligand interactions.
Cell Surface Signalling Laboratory, Wellcome Trust Sanger Institute.
Extracellular protein:protein interactions between secreted or membrane-tethered proteins are critical for both initiating intercellular communication and ensuring cohesion within multicellular organisms. Proteins predicted to form extracellular interactions are encoded by approximately a quarter of human genes, but despite their importance and abundance, the majority of these proteins have no documented binding partner. Primarily, this is due to their biochemical intractability: membrane-embedded proteins are difficult to solubilise in their native conformation and contain structurally-important posttranslational modifications. Also, the interaction affinities between receptor proteins are often characterised by extremely low interaction strengths (half-lives < 1 second) precluding their detection with many commonly-used high throughput methods. Here, we describe an assay, AVEXIS (AVidity-based EXtracellular Interaction Screen) that overcomes these technical challenges enabling the detection of very weak protein interactions (t(1/2) ≤ 0.1 sec) with a low false positive rate. The assay is usually implemented in a high throughput format to enable the systematic screening of many thousands of interactions in a convenient microtitre plate format (Fig. 1). It relies on the production of soluble recombinant protein libraries that contain the ectodomain fragments of cell surface receptors or secreted proteins within which to screen for interactions; therefore, this approach is suitable for type I, type II, GPI-linked cell surface receptors and secreted proteins but not for multipass membrane proteins such as ion channels or transporters. The recombinant protein libraries are produced using a convenient and high-level mammalian expression system, to ensure that important posttranslational modifications such as glycosylation and disulphide bonds are added. 
Expressed recombinant proteins are secreted into the medium and produced in two forms: a biotinylated bait which can be captured on a streptavidin-coated solid phase suitable for screening, and a pentamerised enzyme-tagged (β-lactamase) prey. The bait and prey proteins are presented to each other in a binary fashion to detect direct interactions between them, similar to a conventional ELISA (Fig. 1). The pentamerisation of the proteins in the prey is achieved through a peptide sequence from the cartilage oligomeric matrix protein (COMP) and increases the local concentration of the ectodomains thereby providing significant avidity gains to enable even very transient interactions to be detected. By normalising the activities of both the bait and prey to predetermined levels prior to screening, we have shown that interactions having monomeric half-lives of 0.1 sec can be detected with low false positive rates.
Funded by: Wellcome Trust: 077108
Journal of visualized experiments : JoVE 2012;61;e3881
PUBMED: 22414956; DOI: 10.3791/3881
-
Genome-wide association study identifies multiple loci influencing human serum metabolite levels.
Institute for Molecular Medicine Finland, University of Helsinki, Finland.
Nuclear magnetic resonance assays allow for measurement of a wide range of metabolic phenotypes. We report here the results of a GWAS on 8,330 Finnish individuals genotyped and imputed at 7.7 million SNPs for a range of 216 serum metabolic phenotypes assessed by NMR of serum samples. We identified significant associations (P < 2.31 × 10(-10)) at 31 loci, including 11 for which there have not been previous reports of associations to a metabolic trait or disorder. Analyses of Finnish twin pairs suggested that the metabolic measures reported here show higher heritability than comparable conventional metabolic phenotypes. In accordance with our expectations, SNPs at the 31 loci associated with individual metabolites account for a greater proportion of the genetic component of trait variance (up to 40%) than is typically observed for conventional serum metabolic phenotypes. The identification of such associations may provide substantial insight into cardiometabolic disorders.
Funded by: Medical Research Council: G0500539, G0600705; NHLBI NIH HHS: 5R01HL087679; NIAAA NIH HHS: AA-08315, AA-09203, AA-12502, AA-15416; NIMH NIH HHS: 1RL1MH083268; Wellcome Trust: 089062/Z/09/Z, 098051, 89061/Z/09/Z, GR069224
Nature genetics 2012;44;3;269-76
PUBMED: 22286219; DOI: 10.1038/ng.1073
-
The transcriptional landscape and small RNAs of Salmonella enterica serovar Typhimurium.
Department of Microbiology, School of Genetics and Microbiology, Moyne Institute of Preventive Medicine, and Department of Genetics, School of Genetics and Microbiology, Smurfit Institute of Genetics, Trinity College Dublin, Dublin 2, Ireland.
More than 50 years of research have provided great insight into the physiology, metabolism, and molecular biology of Salmonella enterica serovar Typhimurium (S. Typhimurium), but important gaps in our knowledge remain. It is clear that a precise choreography of gene expression is required for Salmonella infection, but basic genetic information such as the global locations of transcription start sites (TSSs) has been lacking. We combined three RNA-sequencing techniques and two sequencing platforms to generate a robust picture of transcription in S. Typhimurium. Differential RNA sequencing identified 1,873 TSSs on the chromosome of S. Typhimurium SL1344, and 13% of these TSSs initiated antisense transcripts. Unique findings include the TSSs of the virulence regulators phoP, slyA, and invF. Chromatin immunoprecipitation revealed that RNA polymerase was bound to 70% of the TSSs, and two-thirds of these TSSs were associated with σ(70) (including phoP, slyA, and invF), from which we identified the -10 and -35 motifs of σ(70)-dependent S. Typhimurium gene promoters. Overall, we corrected the location of important genes and discovered 18 times more promoters than identified previously. S. Typhimurium expresses 140 small regulatory RNAs (sRNAs) at early stationary phase, including 60 newly identified sRNAs. Almost half of the experimentally verified sRNAs were found to be unique to the Salmonella genus, and <20% were found throughout the Enterobacteriaceae. This description of the transcriptional map of SL1344 advances our understanding of S. Typhimurium, arguably the most important bacterial infection model.
Proceedings of the National Academy of Sciences of the United States of America 2012;109;20;E1277-86
PUBMED: 22538806; DOI: 10.1073/pnas.1201061109
-
The population pharmacokinetics of R- and S-warfarin: effect of genetic and clinical factors.
Department of Biostatistics, Brownlow Street, University of Liverpool, Liverpool L69 3GS, UK. slane@liverpool.ac.uk
Background: Warfarin is a drug with a narrow therapeutic index and large interindividual variability in daily dosing requirements. Patients commencing warfarin treatment are at risk of bleeding due to excessive anticoagulation caused by overdosing. The interindividual variability in dose requirements is influenced by a number of factors, including polymorphisms in genes mediating warfarin pharmacology, co-medication, age, sex, body size and diet.
Aims: To develop population pharmacokinetic models of both R- and S-warfarin using clinical and genetic factors and to identify the covariates which influence the interindividual variability in the pharmacokinetic parameters of clearance and volume of distribution in patients on long-term warfarin therapy.
Methods: Patients commencing warfarin therapy were followed up for 26 weeks. Plasma warfarin enantiomer concentrations were determined in 306 patients for S-warfarin and in 309 patients for R-warfarin at 1, 8 and 26 weeks. Patients were also genotyped for CYP2C9 variants (CYP2C9*1, *2 and *3), two single-nucleotide polymorphisms (SNPs) in CYP1A2, one SNP in CYP3A4 and six SNPs in CYP2C19. A base pharmacokinetic model was developed using NONMEM software to determine the warfarin clearance and volume of distribution. The model was extended to include covariates that influenced the between-subject variability.
Results: Bodyweight, age, sex and CYP2C9 genotype significantly influenced S-warfarin clearance. The S-warfarin clearance was estimated to be 0.144 l h⁻¹ (95% confidence interval 0.131, 0.157) in a 70 kg woman aged 69.8 years with the wild-type CYP2C9 genotype, and the volume of distribution was 16.6 l (95% confidence interval 13.5, 19.7). Bodyweight and age, along with the SNPs rs3814637 (in CYP2C19) and rs2242480 (in CYP3A4), significantly influenced R-warfarin clearance. The R-warfarin clearance was estimated to be 0.125 l h⁻¹ (95% confidence interval 0.115, 0.135) in a 70 kg individual aged 69.8 years with the wild-type CYP2C19 and CYP3A4 genotypes, and the volume of distribution was 10.9 l (95% confidence interval 8.63, 13.2).
Conclusions: Our analysis, based on exposure rather than dose, provides quantitative estimates of the clinical and genetic factors impacting on the clearance of both the S- and R-enantiomers of warfarin, which can be used in developing improved dosing algorithms.
Funded by: Department of Health; Wellcome Trust
British journal of clinical pharmacology 2012;73;1;66-76
PUBMED: 21692828; PMC: 3248257; DOI: 10.1111/j.1365-2125.2011.04051.x
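Exposure and clearance are linked by a standard pharmacokinetic relationship: at steady state, the average plasma concentration equals the dosing rate divided by clearance, Css,avg = F x dose / (CL x tau). The sketch below applies this to the typical-patient clearances reported in this entry. The 5 mg/day racemic dose and the bioavailability F = 1 are assumptions for illustration, not values from the study, and the helper name is hypothetical; racemic warfarin is split equally between the R- and S-enantiomers.

```python
def average_steady_state_conc(dose_mg: float, interval_h: float,
                              clearance_l_per_h: float,
                              bioavailability: float = 1.0) -> float:
    """Average steady-state plasma concentration (mg/l) = F * dose / (CL * tau)."""
    return bioavailability * dose_mg / (clearance_l_per_h * interval_h)


# Typical-patient clearances quoted in the abstract (l/h).
CL_S = 0.144
CL_R = 0.125

if __name__ == "__main__":
    daily_racemic_dose_mg = 5.0                      # hypothetical dose
    per_enantiomer_mg = daily_racemic_dose_mg / 2.0  # racemate is 50:50 R/S
    css_s = average_steady_state_conc(per_enantiomer_mg, 24.0, CL_S)
    css_r = average_steady_state_conc(per_enantiomer_mg, 24.0, CL_R)
    print(f"S-warfarin ~{css_s:.2f} mg/l, R-warfarin ~{css_r:.2f} mg/l")
    # prints "S-warfarin ~0.72 mg/l, R-warfarin ~0.83 mg/l"
```

Under this relationship, at a fixed dose any covariate that lowers clearance raises average exposure in inverse proportion.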
-
Comprehensive sequence analysis of nine Usher syndrome genes in the UK National Collaborative Usher Study.
Clinical and Molecular Genetics, Institute of Child Health, UCL, London, UK.
Background: Usher syndrome (USH) is an autosomal recessive disorder comprising retinitis pigmentosa, hearing loss and, in some cases, vestibular dysfunction. It is clinically and genetically heterogeneous with three distinctive clinical types (I-III) and nine Usher genes identified. This study is a comprehensive clinical and genetic analysis of 172 Usher patients and evaluates the contribution of digenic inheritance.
Methods: The genes MYO7A, USH1C, CDH23, PCDH15, USH1G, USH2A, GPR98, WHRN, CLRN1 and the candidate gene SLC4A7 were sequenced in 172 UK Usher patients, regardless of clinical type.
Results: No subject had definite mutations (nonsense, frameshift or consensus splice site mutations) in two different USH genes. Novel missense variants were graded as unclassified variants UV1-4: UV4 ('probably pathogenic') was assigned on the basis of a control frequency <0.23%, identification in trans to a pathogenic/probably pathogenic mutation and segregation with USH in only one family; UV3 ('likely pathogenic') met the same criteria but lacked information on phase. Overall, 79% of identified pathogenic/UV4/UV3 variants were truncating and 21% were missense changes. MYO7A accounted for 53.2% and USH1C for 14.9% of USH1 families (USH1C:c.496+1G>A being the most common USH1 mutation in the cohort). USH2A was responsible for 79.3% of USH2 families and GPR98 for only 6.6%. No mutations were found in USH1G, WHRN or SLC4A7.
Conclusions: One or two pathogenic/likely pathogenic variants were identified in 86% of cases. No convincing cases of digenic inheritance were found. It is concluded that digenic inheritance does not make a significant contribution to Usher syndrome; the observation of multiple variants in different genes is likely to reflect polymorphic variation, rather than digenic effects.
Funded by: Wellcome Trust
Journal of medical genetics 2012;49;1;27-36
PUBMED: 22135276; DOI: 10.1136/jmedgenet-2011-100468
-
Characterising chromosome rearrangements: recent technical advances in molecular cytogenetics.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK. sls2@sanger.ac.uk
Genomic rearrangements can result in losses, amplifications, translocations and inversions of DNA fragments, thereby modifying genome architecture and potentially having clinical consequences. Many genomic disorders caused by structural variation were initially uncovered by early cytogenetic methods. The last decade has seen significant progress in molecular cytogenetic techniques, allowing rapid and precise detection of structural rearrangements on a whole-genome scale. The high resolution attainable with these recently developed techniques has also uncovered the role of structural variants in normal genetic variation alongside single-nucleotide polymorphisms (SNPs). We describe how array-based comparative genomic hybridisation, SNP arrays, array painting and next-generation sequencing analytical methods (read depth, read pair and split read) allow the extensive characterisation of chromosome rearrangements in human genomes.
Funded by: Wellcome Trust: WT098051
Heredity 2012;108;1;75-85
PUBMED: 22086080; PMC: 3238113; DOI: 10.1038/hdy.2011.100
-
Transferrin and HFE genes interact in Alzheimer's disease risk: the Epistasis Project.
Oxford Project to Investigate Memory and Ageing, University Department of Physiology, Anatomy and Genetics, Oxford, UK. donald.lehmann@pharm.ox.ac.uk
Iron overload may contribute to the risk of Alzheimer's disease (AD). In the Epistasis Project, with 1757 cases of AD and 6295 controls, we studied 4 variants in 2 genes of iron metabolism: hemochromatosis (HFE) C282Y and H63D, and transferrin (TF) C2 and -2G/A. We replicated the reported interaction between HFE 282Y and TF C2 in the risk of AD: synergy factor, 1.75 (95% confidence interval, 1.1-2.8, p = 0.02) in Northern Europeans. The synergy factor was 3.1 (95% confidence interval, 1.4-6.9, p = 0.007) in subjects with the APOEε4 allele. We found another interaction, between HFE 63HH and TF -2AA, markedly modified by age. Both interactions were found mainly or only in Northern Europeans. The interaction between HFE 282Y and TF C2 has now been replicated twice, in a total of 2313 cases of AD and 7065 controls, and has also been associated with increased iron load. We therefore suggest that iron overload may be a causative factor in the development of AD. Treatment for iron overload might thus be protective in some cases.
Funded by: Medical Research Council: G0400546
Neurobiology of aging 2012;33;1;202.e1-13
PUBMED: 20817350; DOI: 10.1016/j.neurobiolaging.2010.07.018
-
Expression of chemosensory proteins in the tsetse fly Glossina morsitans morsitans is related to female host-seeking behaviour.
Department of Biological Chemistry, Rothamsted Research, Harpenden, UK.
Chemosensory proteins (CSPs) are a class of soluble proteins present in high concentrations in the sensilla of insect antennae. It has been proposed that they play an important role in insect olfaction by mediating interactions between odorants and odorant receptors. Here we report, for the first time, the presence of five CSP genes in the tsetse fly Glossina morsitans morsitans, a major vector transmitting nagana in livestock. Real-time quantitative reverse transcription PCR showed that three of the CSPs are expressed in antennae. One of them, GmmCSP2, is transcribed at a very high level and could be involved in olfaction. We also determined expression in the antennae of both males and females at different life stages and with different blood feeding regimes. The transcription of GmmCSP2 was lower in male antennae than in females, with a sharp increase in 10-week-old flies, 48 h after a bloodmeal. Thus there is a clear relationship between CSP gene transcription and host searching behaviour. Genome annotation and phylogenetic analyses comparing G. morsitans morsitans CSPs with those of other Diptera showed rapid evolution after speciation of mosquitoes.
Funded by: Biotechnology and Biological Sciences Research Council; Wellcome Trust: WT085775/Z/08/Z
Insect molecular biology 2012;21;1;41-8
PUBMED: 22074189; DOI: 10.1111/j.1365-2583.2011.01114.x
-
GeneDB--an annotation database for pathogens.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK. fl2@sanger.ac.uk
GeneDB (http://www.genedb.org) is a genome database for prokaryotic and eukaryotic pathogens and closely related organisms. The resource provides a portal to genome sequence and annotation data, which is primarily generated by the Pathogen Genomics group at the Wellcome Trust Sanger Institute. It combines data from completed and ongoing genome projects with curated annotation, which is readily accessible from a web-based resource. The development of the database in recent years has focused on providing database-driven annotation tools and pipelines, as well as catering for increasingly frequent assembly updates. The website has been significantly redesigned to take advantage of current web technologies and to improve usability. The current release stores 41 data sets, of which 17 are manually curated and maintained by biologists, who review and incorporate data from the scientific literature, as well as other sources. GeneDB is primarily a production and annotation database for the genomes of predominantly pathogenic organisms.
Funded by: Wellcome Trust: WT 043565, WT 085775/Z/08/Z
Nucleic acids research 2012;40;Database issue;D98-108
PUBMED: 22116062; PMC: 3245030; DOI: 10.1093/nar/gkr1032
-
A combined functional annotation score for non-synonymous variants.
Wellcome Trust Sanger Institute, Hinxton, UK. ml10@sanger.ac.uk
Aims: Next-generation sequencing has opened the possibility of large-scale sequence-based disease association studies. A major challenge in interpreting whole-exome data is predicting which of the discovered variants are deleterious or neutral. To address this question in silico, we have developed a score called Combined Annotation scoRing toOL (CAROL), which combines information from 2 bioinformatics tools: PolyPhen-2 and SIFT, in order to improve the prediction of the effect of non-synonymous coding variants.
Methods: We used a weighted Z method that combines the probabilistic scores of PolyPhen-2 and SIFT. We defined 2 dataset pairs to train and test CAROL using information from the dbSNP, 'HGMD-PUBLIC' and 1000 Genomes Project databases. The training pair comprises a total of 980 positive control (disease-causing) and 4,845 negative control (non-disease-causing) variants. The test pair consists of 1,959 positive and 9,691 negative controls.
Results: CAROL has higher predictive power and accuracy for the effect of non-synonymous variants than each individual annotation tool (PolyPhen-2 and SIFT) and benefits from higher coverage.
Conclusion: The combination of annotation tools can help improve automated prediction of whole-genome/exome non-synonymous variant functional consequences.
Funded by: Wellcome Trust: WT088885/Z/09/Z
Human heredity 2012;73;1;47-51
PUBMED: 22261837; DOI: 10.1159/000334984
-
Multiple origins, migratory paths and molecular profiles of cells populating the avian interpeduncular nucleus.
Department of Human Anatomy and Psychobiology, School of Medicine, University of Murcia, 30100 Murcia, Spain. bl2@sanger.ac.uk
The interpeduncular nucleus (IP) is a key limbic structure, highly conserved evolutionarily among vertebrates. The IP receives indirect input from limbic areas of the telencephalon, relayed by the habenula via the fasciculus retroflexus. The function of the habenulo-IP complex is poorly understood, although there is evidence that in rodents it modulates behaviors such as learning and memory, avoidance, reward and affective states. The IP has been an important subject of interest for neuroscientists, and there are multiple studies of its adult structure, chemoarchitecture and connectivity, with complex results due to the presence of multiple cell types across a variety of subnuclei. However, the ontogenetic origins of these populations have not been examined, and there is some controversy about its location in the midbrain-anterior hindbrain area. To address these issues, we first investigated the anteroposterior (AP) origin of the IP complex by fate-mapping its neuromeric origin in the chick, discovering that the IP develops strictly within the isthmus and rhombomere 1. Next, we studied the dorsoventral (DV) positional identity of subpopulations of the IP complex. Our results indicate that there are at least four IP progenitor domains along the DV axis. These specific domains give rise to distinct subtypes of cell populations that target the IP with variable subnuclear specificity. Interestingly, these populations can be characterized by differential expression of the transcription factors Pax7, Nkx6.1, Otp, and Otx2. Each of these subpopulations follows a specific route of migration from its source, and all reach the IP at roughly the same stage. Remarkably, IP progenitor domains were found in both the alar and basal plates. Some IP populations showed rostrocaudal restriction in their origins (isthmus versus anterior or posterior r1 regions). A tentative developmental model of the structure of the avian IP is proposed. The IP emerges as a plurisegmental and developmentally heterogeneous formation that forms ventromedially within the isthmus and r1. These findings are relevant because they help to explain the highly complex chemoarchitecture, hodology and functions of this important brainstem structure.
Developmental biology 2012;361;1;12-26
PUBMED: 22019302; DOI: 10.1016/j.ydbio.2011.09.032
-
Community gene annotation in practice.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK. jel@sanger.ac.uk
Manual annotation of genomic data is extremely valuable to produce an accurate reference gene set but is expensive compared with automatic methods and so has been limited to model organisms. Annotation tools that have been developed at the Wellcome Trust Sanger Institute (WTSI, http://www.sanger.ac.uk/) are being used to fill that gap, as they can be used remotely and so open up viable community annotation collaborations. We introduce the 'Blessed' annotator and 'Gatekeeper' approach to Community Annotation using the Otterlace/ZMap genome annotation tool. We also describe the strategies adopted for annotation consistency, quality control and viewing of the annotation. DATABASE URL: http://vega.sanger.ac.uk/index.html.
Funded by: Biotechnology and Biological Sciences Research Council: BB/F02195X/1; Wellcome Trust: WT077198
Database : the journal of biological databases and curation 2012;2012;bas009
PUBMED: 22434843; PMC: 3308165; DOI: 10.1093/database/bas009
-
A systematic survey of loss-of-function variants in human protein-coding genes.
Wellcome Trust Sanger Institute, Hinxton, UK. macarthur@atgu.mgh.harvard.edu
Genome-sequencing studies indicate that all humans carry many genetic variants predicted to cause loss of function (LoF) of protein-coding genes, suggesting unexpected redundancy in the human genome. Here we apply stringent filters to 2951 putative LoF variants obtained from 185 human genomes to determine their true prevalence and properties. We estimate that human genomes typically contain ~100 genuine LoF variants with ~20 genes completely inactivated. We identify rare and likely deleterious LoF alleles, including 26 known and 21 predicted severe disease-causing variants, as well as common LoF variants in nonessential genes. We describe functional and evolutionary differences between LoF-tolerant and recessive disease genes and a method for using these differences to prioritize candidate genes found in clinical sequencing studies.
Funded by: Wellcome Trust: 090532/Z/09/Z, 098051
Science (New York, N.Y.) 2012;335;6070;823-8
PUBMED: 22344438; PMC: 3299548; DOI: 10.1126/science.1215040
-
Sleeping Beauty mutagenesis reveals cooperating mutations and pathways in pancreatic adenocarcinoma.
Division of Genetics and Genomics, Institute of Molecular and Cell Biology, Singapore 138673.
Pancreatic cancer is one of the most deadly cancers affecting the Western world. Because the disease is highly metastatic and difficult to diagnose until late stages, the 5-y survival rate is around 5%. The identification of molecular cancer drivers is critical for furthering our understanding of the disease and development of improved diagnostic tools and therapeutics. We have conducted a mutagenic screen using Sleeping Beauty (SB) in mice to identify new candidate cancer genes in pancreatic cancer. By combining SB with an oncogenic Kras allele, we observed highly metastatic pancreatic adenocarcinomas. Using two independent statistical methods to identify loci commonly mutated by SB in these tumors, we identified 681 loci that comprise 543 candidate cancer genes (CCGs); 75 of these CCGs, including Mll3 and Ptk2, have known mutations in human pancreatic cancer. We identified point mutations in human pancreatic patient samples for another 11 CCGs, including Acvr2a and Map2k4. Importantly, 10% of the CCGs are involved in chromatin remodeling, including Arid4b, Kdm6a, and Nsd3, and all SB tumors have at least one mutated gene involved in this process; 20 CCGs, including Ctnnd1, Fbxo11, and Vgll4, are also significantly associated with poor patient survival. SB mutagenesis provides a rich resource of mutations in potential cancer drivers for cross-comparative analyses with ongoing sequencing efforts in human pancreatic adenocarcinoma.
Proceedings of the National Academy of Sciences of the United States of America 2012;109;16;5934-41
PUBMED: 22421440; PMC: 3341075; DOI: 10.1073/pnas.1202490109
-
Detection of recombination events in bacterial genomes from large population samples.
Department of Biomedical Engineering and Computational Science, Aalto University, PO Box 12200, FI-00076 AALTO, Finland. pekka.marttinen@aalto.fi
Analysis of important human pathogen populations is currently under transition toward whole-genome sequencing of growing numbers of samples collected on a global scale. Since recombination in bacteria is often an important factor shaping their evolution by enabling resistance elements and virulence traits to rapidly transfer from one evolutionary lineage to another, it is highly beneficial to have access to tools that can detect recombination events. Multiple advanced statistical methods exist for such purposes; however, they are typically limited either to only a few samples or to data from relatively short regions of a total genome. By harnessing the power of recent advances in Bayesian modeling techniques, we introduce here a method for detecting homologous recombination events from whole-genome sequence data for bacterial population samples on a large scale. Our statistical approach can efficiently handle hundreds of whole genome sequenced population samples and identify separate origins of the recombinant sequence, offering an enhanced insight into the diversification of bacterial clones at the level of the whole genome. A data set of 241 whole genome sequences from an important pandemic lineage of Streptococcus pneumoniae is used together with multiple simulated data sets to demonstrate the potential of our approach.
Funded by: NIGMS NIH HHS: U54GM088558
Nucleic acids research 2012;40;1;e6
PUBMED: 22064866; PMC: 3245952; DOI: 10.1093/nar/gkr928
-
Molecular tracing of the emergence, adaptation, and transmission of hospital-associated methicillin-resistant Staphylococcus aureus.
The Roslin Institute and Edinburgh Infectious Diseases, Royal (Dick) School of Veterinary Studies, University of Edinburgh, Midlothian EH25 9RG, United Kingdom.
Hospital-associated infections caused by methicillin-resistant Staphylococcus aureus (MRSA) are a global health burden dominated by a small number of bacterial clones. The pandemic EMRSA-16 clone (ST36-II) has been widespread in UK hospitals for 20 y, but its evolutionary origin and the molecular basis for its hospital association are unclear. We carried out a Bayesian phylogenetic reconstruction on the basis of the genome sequences of 87 S. aureus isolates including 60 EMRSA-16 and 27 additional clonal complex 30 (CC30) isolates, collected from patients in three continents over a 53-y period. The three major pandemic clones to originate from the CC30 lineage, including phage type 80/81, Southwest Pacific, and EMRSA-16, shared a most recent common ancestor that existed over 100 y ago, whereas the hospital-associated EMRSA-16 clone is estimated to have emerged about 35 y ago. Our CC30 genome-wide analysis revealed striking molecular correlates of hospital- or community-associated pandemics represented by mobile genetic elements and nonsynonymous mutations affecting antibiotic resistance and virulence. Importantly, phylogeographic analysis indicates that EMRSA-16 spread within the United Kingdom by transmission from hospitals in large population centers in London and Glasgow to regional health-care settings, implicating patient referrals as an important cause of nationwide transmission. Taken together, the high-resolution phylogenomic approach used resulted in a unique understanding of the emergence and transmission of a major MRSA clone and provided molecular correlates of its hospital adaptation. Similar approaches for hospital-associated clones of other bacterial pathogens may inform appropriate measures for controlling their intra- and interhospital spread.
Funded by: NIGMS NIH HHS: R01 GM080602
Proceedings of the National Academy of Sciences of the United States of America 2012
PUBMED: 22586109; DOI: 10.1073/pnas.1202869109
-
Tandem duplication of chromosomal segments is common in ovarian and breast cancer genomes.
Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, UK.
The application of paired-end next generation sequencing approaches has made it possible to systematically characterize rearrangements of the cancer genome to base-pair level. Utilizing this approach, we report the first detailed analysis of ovarian cancer rearrangements, comparing high grade serous and clear cell cancers, and these histotypes with other solid cancers. Somatic rearrangements were systematically characterized in 8 high grade serous and 5 clear cell ovarian cancer genomes and we report here the identification of more than 600 somatic rearrangements. Recurrent rearrangements of the transcriptional regulator gene, TSHZ3, were found in 3/8 serous cases. Comparison to breast, pancreatic and prostate cancer genomes revealed that a subset of ovarian cancers share a marked tandem duplication phenotype with triple-negative breast cancers. The tandem duplication phenotype was not linked to BRCA1/2 mutation, suggesting that other common mechanisms or carcinogenic exposures are operative. High grade serous cancers arising in women with germline BRCA1 or BRCA2 mutation showed a high frequency of small chromosomal deletions. These findings indicate that BRCA1/2 germline mutation may contribute to widespread structural change and that other undefined mechanism(s), which are potentially shared with triple negative breast cancer, promote tandem chromosomal duplications that sculpt the ovarian cancer genome.
The Journal of pathology 2012
PUBMED: 22514011; DOI: 10.1002/path.4042
-
Cancer gene discovery in the mouse.
Experimental Cancer Genetics, The Wellcome Trust Sanger Institute, Hinxton, Cambs CB10 1HH, UK.
Developments in high-throughput genome analysis and in computational tools have made it possible to rapidly profile entire cancer genomes with basepair resolution. In parallel with these advances, mouse models of cancer have evolved into powerful tools for cancer gene discovery. Here we review some of the approaches that may be used for cancer gene identification in the mouse and discuss how a cross-species 'oncogenomics' approach to cancer gene discovery represents a powerful strategy for finding genes that drive tumorigenesis.
Funded by: Cancer Research UK; Wellcome Trust
Current opinion in genetics & development 2012;22;1;14-20
PUBMED: 22265936; DOI: 10.1016/j.gde.2011.12.003
-
New insights into the bacterial fitness-associated mechanisms revealed by the characterization of large plasmids of an avian pathogenic E. coli.
The Biodesign Institute, Arizona State University, Tempe, Arizona, United States of America. melha.mellata@asu.edu
Extra-intestinal pathogenic E. coli (ExPEC), including avian pathogenic E. coli (APEC), pose a considerable threat to both human and animal health, with illness causing substantial economic loss. APEC strain χ7122 (O78:K80:H9), containing three large plasmids [pChi7122-1 (IncFIB/FIIA-FIC), pChi7122-2 (IncFII), and pChi7122-3 (IncI(2))] and a small plasmid pChi7122-4 (ColE2-like), has been used for many years as a model strain to study the molecular mechanisms of ExPEC pathogenicity and zoonotic potential. We previously sequenced and characterized the plasmid pChi7122-1 and determined its importance in systemic APEC infection; however, the roles of the other pChi7122 plasmids were still ambiguous. Herein we present the sequence of the remaining pChi7122 plasmids, confirming that pChi7122-2 and pChi7122-3 encode an ABC iron transport system (eitABCD) and a putative type IV fimbriae respectively, whereas pChi7122-4 is a cryptic plasmid. New features were also identified, including a gene cluster on pChi7122-2 that is not present in other E. coli strains but is found in Salmonella serovars and is predicted to encode sugar catabolic pathways. In vitro evaluation of the APEC χ7122 derivative strains with the three large plasmids, either individually or in combinations, provided new insights into the role of plasmids in biofilm formation, bile and acid tolerance, and the interaction of E. coli strains with 3-D cultures of intestinal epithelial cells. In this study, we show that the nature and combinations of plasmids, as well as the background of the host strains, have an effect on these phenomena. Our data reveal new insights into the role of extra-chromosomal sequences in the fitness and phenotypic diversity of ExPEC.
Funded by: NIAID NIH HHS: R21 AI090416
PloS one 2012;7;1;e29481
PUBMED: 22238616; PMC: 3251573; DOI: 10.1371/journal.pone.0029481
-
Communication about DTC Testing: Commentary on a 'Family Experience of Personal Genomics'.
Ethics Researcher and Registered Genetic Counselor, Wellcome Trust Sanger Institute, Cambridge, CB10 1SA, United Kingdom, am33@sanger.ac.uk.
This paper provides a commentary on 'Family Experience of Personal Genomics' (Corpas 2012). An overview is offered on the communication literature available to help support individuals and families to communicate about genetic information. Despite there being a wealth of evidence, built on years of genetic counseling practice, this does not appear to have been translated clearly to the Direct to Consumer (DTC) testing market. In many countries it is possible to order a DTC genetic test without the involvement of any health professional; there has been heated debate about whether this is appropriate or not. Much of the focus surrounding this has been on whether it is necessary to have a health professional available to offer their clinical knowledge and help with interpreting the DTC genetic test data. What has been missed from this debate is the importance of giving customers of DTC testing services access to the abundance of information about how to communicate their genetic risks to others, including immediate family. Family communication about health and indeed genetics can be fraught with difficulty. Genetic health professionals, specifically genetic counselors, have particular expertise in family communication about genetics. Such information could be incredibly useful to kinships as they grapple with knowing how to communicate their genomic information to relatives.
Journal of genetic counseling 2012;21;3;392-8
PUBMED: 22223062; DOI: 10.1007/s10897-011-9472-8
-
Modeling partial monosomy for human chromosome 21q11.2-q21.1 reveals haploinsufficient genes influencing behavior and fat deposition.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.
Haploinsufficiency of part of human chromosome 21 results in a rare condition known as Monosomy 21. This disease displays a variety of clinical phenotypes, including intellectual disability, craniofacial dysmorphology, skeletal and cardiac abnormalities, and respiratory complications. To search for dosage-sensitive genes involved in this disorder, we used chromosome engineering to generate a mouse model carrying a deletion of the Lipi-Usp25 interval, syntenic with 21q11.2-q21.1 in humans. Haploinsufficiency for the 6 genes in this interval resulted in no gross morphological defects, and behavior in an open field test, a test of anxiety, and tests for social interaction was normal in monosomic mice. Monosomic mice did, however, display impaired memory retention compared to control animals. Moreover, when fed a high-fat diet (HFD), monosomic mice exhibited a significant increase in fat mass/fat percentage estimate compared with controls, severe fatty changes in their livers, and thickened subcutaneous fat. Thus, genes within the Lipi-Usp25 interval may participate in memory retention and in the regulation of fat deposition.
Funded by: Cancer Research UK; Wellcome Trust
PloS one 2012;7;1;e29681
PUBMED: 22276124; PMC: 3262805; DOI: 10.1371/journal.pone.0029681
-
Behavior and target site selection of conjugative transposon Tn916 in two different strains of toxigenic Clostridium difficile.
Department of Microbial Diseases, UCL Eastman Dental Institute, University College London, London, UK. p.mullany@ucl.ac.uk
The insertion sites of the conjugative transposon Tn916 in the anaerobic pathogen Clostridium difficile were determined using Illumina Solexa high-throughput DNA sequencing of Tn916 insertion libraries in two different clinical isolates: 630ΔE, an erythromycin-sensitive derivative of 630 (ribotype 012), and the ribotype 027 isolate R20291, which was responsible for a severe outbreak of C. difficile disease. A consensus 15-bp Tn916 insertion sequence was identified which was similar in both strains, although an extended consensus sequence was observed in R20291. A search of the C. difficile 630 genome showed that the Tn916 insertion motif was present 100,987 times, with approximately 63,000 of these motifs located in genes and 35,000 in intergenic regions. To test the usefulness of Tn916 as a mutagen, a functional screen allowed the isolation of a mutant. This mutant contained Tn916 inserted into a gene involved in flagellar biosynthesis.
Funded by: Medical Research Council: G0601176
Applied and environmental microbiology 2012;78;7;2147-53
PUBMED: 22267673; PMC: 3302608; DOI: 10.1128/AEM.06193-11
-
Genome sequencing and analysis of the Tasmanian devil and its transmissible cancer.
Wellcome Trust Sanger Institute, Hinxton, CB10 1SA, UK. elizabeth.murchison@sanger.ac.uk
The Tasmanian devil (Sarcophilus harrisii), the largest marsupial carnivore, is endangered due to a transmissible facial cancer spread by direct transfer of living cancer cells through biting. Here we describe the sequencing, assembly, and annotation of the Tasmanian devil genome and whole-genome sequences for two geographically distant subclones of the cancer. Genomic analysis suggests that the cancer first arose from a female Tasmanian devil and that the clone has subsequently genetically diverged during its spread across Tasmania. The devil cancer genome contains more than 17,000 somatic base substitution mutations and bears the imprint of a distinct mutational process. Genotyping of somatic mutations in 104 geographically and temporally distributed Tasmanian devil tumors reveals the pattern of evolution and spread of this parasitic clonal lineage, with evidence of a selective sweep in one geographical area and persistence of parallel lineages in other populations.
Funded by: Wellcome Trust: 077012/Z/05/Z
Cell 2012;148;4;780-91
PUBMED: 22341448; PMC: 3281993; DOI: 10.1016/j.cell.2011.11.065
-
An atypical facial appearance and growth pattern in a child with Cornelia de Lange Syndrome: an intragenic deletion predicting loss of the N-terminal region of NIPBL.
South-east Scotland Clinical Genetics Services, Western General Hospital, Edinburgh, UK. Jennie.murray@luht.scot.nhs.uk
Cornelia de Lange Syndrome (CdLS) is a multisystem disorder with a live birth prevalence of approximately one per 15 000. Clinical diagnosis is based on a characteristic facies (low frontal hairline, short nose, triangular nasal tip, crescent-shaped mouth, upturned nose, and arched eyebrows), characteristic limb defects and a distinctive pattern of growth and development. Approximately half of all classical cases of CdLS have heterozygous loss-of-function mutations in the gene encoding NIPBL, a component of the cohesin-loading apparatus (Dorsett and Krantz, 2009). Herein we describe a patient with a rare intragenic deletion of NIPBL who has typical microcephaly and developmental problems but an atypical growth pattern and facial features.
Funded by: Wellcome Trust: WT077008
Clinical dysmorphology 2012;21;1;22-3
PUBMED: 21934607; DOI: 10.1097/MCD.0b013e32834c4afc
-
Genome wide adaptations of Plasmodium falciparum in response to lumefantrine selective drug pressure.
Kenya Medical Research Institute, Wellcome Trust Research Programme, Kilifi, Kenya.
The combination therapy of the Artemisinin-derivative Artemether (ART) with Lumefantrine (LM) (Coartem®) is an important malaria treatment regimen in many endemic countries. Resistance to Artemisinin has already been reported, and it is feared that LM resistance (LMR) could also evolve quickly. Therefore molecular markers which can be used to track Coartem® efficacy are urgently needed. Often, stable resistance arises from initial, unstable phenotypes that can be identified in vitro. Here we have used the Plasmodium falciparum multidrug resistant reference strain V1S to induce LMR in vitro by culturing the parasite under continuous drug pressure for 16 months. The initial IC(50) (inhibitory concentration that kills 50% of the parasite population) was 24 nM. The resulting resistant strain V1S(LM), obtained after culture for an estimated 166 cycles under LM pressure, grew steadily in 378 nM of LM, corresponding to 15 times the IC(50) of the parental strain. However, after two weeks of culturing V1S(LM) in drug-free medium, the IC(50) returned to that of the initial, parental strain V1S. This transient drug tolerance was associated with major changes in gene expression profiles: using the PFSANGER Affymetrix custom array, we identified 184 differentially expressed genes in V1S(LM). Among those are 18 known and putative transporters including the multidrug resistance gene 1 (pfmdr1), the multidrug resistance associated protein and the V-type H+ pumping pyrophosphatase 2 (pfvp2) as well as genes associated with fatty acid metabolism. In addition we detected a clear selective advantage provided by two genomic loci in parasites grown under LM drug pressure, suggesting that all or some of those genes contribute to the development of LM tolerance; they may prove useful as molecular markers to monitor P. falciparum LM susceptibility.
PloS one 2012;7;2;e31623
PUBMED: 22384044; PMC: 3288012; DOI: 10.1371/journal.pone.0031623
-
Genomic islands of divergence in hybridizing Heliconius butterflies identified by large-scale targeted sequencing.
Department of Zoology, University of Cambridge, Cambridge CB2 3EJ, UK. njn27@cam.ac.uk
Heliconius butterflies represent a recent radiation of species, in which wing pattern divergence has been implicated in speciation. Several loci that control wing pattern phenotypes have been mapped and two were identified through sequencing. These same gene regions play a role in adaptation across the whole Heliconius radiation. Previous studies of population genetic patterns at these regions have sequenced small amplicons. Here, we use targeted next-generation sequence capture to survey patterns of divergence across these entire regions in divergent geographical races and species of Heliconius. This technique was successful both within and between species for obtaining high coverage of almost all coding regions and sufficient coverage of non-coding regions to perform population genetic analyses. We find major peaks of elevated population differentiation between races across hybrid zones, which indicate regions under strong divergent selection. These 'islands' of divergence appear to be more extensive between closely related species, but there is less clear evidence for such islands between more distantly related species at two further points along the 'speciation continuum'. We also sequence fosmid clones across these regions in different Heliconius melpomene races. We find no major structural rearrangements but many relatively large (greater than 1 kb) insertion/deletion events (including gain/loss of transposable elements) that are variable between races.
Funded by: Biotechnology and Biological Sciences Research Council; Medical Research Council
Philosophical transactions of the Royal Society of London. Series B, Biological sciences 2012;367;1587;343-53
PUBMED: 22201164; PMC: 3233711; DOI: 10.1098/rstb.2011.0198
-
Chromosomal rearrangements and karyotype evolution in carnivores revealed by chromosome painting.
State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, the Chinese Academy of Sciences, Kunming, Yunnan, PR China. whnie@mail.kiz.ac.cn
Chromosomal evolution in carnivores has been revisited extensively using cross-species chromosome painting. Painting probes derived from flow-sorted chromosomes of the domestic dog, which has one of the most rearranged karyotypes in mammals and the highest diploid number (2n=78) in carnivores, are a powerful tool for detecting both evolutionary intra- and inter-chromosomal rearrangements. However, only a few comparative maps have been established between dog and other non-Canidae species. Here, we extended cross-species painting with dog probes to seven more species representing six carnivore families: the Eurasian lynx (Lynx lynx), the stone marten (Martes foina), the small Indian civet (Viverricula indica), the Asian palm civet (Paradoxurus hermaphroditus), the Javan mongoose (Herpestes javanicus), the raccoon (Procyon lotor) and the giant panda (Ailuropoda melanoleuca). The numbers and positions of intra-chromosomal rearrangements were found to differ among these carnivore species. A comparative map between human and stone marten, and a map among the Yangtze finless porpoise (Neophocaena phocaenoides asiaeorientalis), stone marten and human were also established to facilitate outgroup comparison and to integrate comparative maps between stone marten and other carnivores with such maps between human and other species. These comparative maps give further insight into genome evolution and karyotype phylogenetic relationships among carnivores, and will facilitate the transfer of gene mapping data from human, domestic dog and cat to other species.
Heredity 2012;108;1;17-27
PUBMED: 22086079; PMC: 3238119; DOI: 10.1038/hdy.2011.107
-
Sex-specific influence of DRD2 on ADHD-type temperament in a large population-based birth cohort.
Public Health Genomics Unit, Institute for Molecular Medicine Finland (FIMM), University of Helsinki and National Institute for Health and Welfare; Department of Medical Genetics, University of Helsinki; Department of Public Health, Hjelt Institute, University of Helsinki; Institute of Health Sciences, University of Oulu; Unit of General Practice, University Hospital of Oulu; Clinic of Child Psychiatry, University and University Hospital of Oulu; Department of Child and Adolescent Health, National Public Health Institute, Finland; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts; Semel Institute for Neuroscience and Human Behavior; Department of Human Genetics, University of California, Los Angeles, California, USA; Department of Epidemiology and Public Health, Imperial College London; Wellcome Trust Sanger Institute, Cambridge, UK.
Attention-deficit/hyperactivity disorder (ADHD) is a childhood-onset neurodevelopmental disorder with a significant public-health impact. Previously, we described a candidate gene study in a population-based birth cohort that demonstrated an association between the dopamine receptor D2 gene (DRD2) and ADHD in males. The current study evaluates potential associations between dopamine receptor genes and Cloninger temperament traits within this same sample. Participants with stringent lifetime ADHD diagnoses were ascertained systematically from the genetically isolated Northern Finland 1986 Birth Cohort (n=9432), resulting in 178 cases and 157 controls. Markers in all known dopamine receptor genes were genotyped. We report an association of DRD2 with low Persistence in females (rs1079727 P=0.02, rs1124491 P=0.02, rs1800497 P=0.03). The associated DRD2 minor allelic haplotype (CAA, P=0.03) is the same haplotype we previously associated with ADHD in males in this birth cohort. The current study further supports previous results on the role of DRD2 in individuals with ADHD. Our findings suggest that DRD2 may have an impact on both males and females, but that the particular outcome appears sex-specific, manifesting as ADHD in males and low Persistence in females. Furthermore, these findings suggest that the putative role of low Persistence as an endophenotype for ADHD deserves further investigation.
Psychiatric genetics 2012
PUBMED: 22531292; DOI: 10.1097/YPG.0b013e32834c0cc8
-
Transmission of malaria to mosquitoes blocked by bumped kinase inhibitors.
Effective control and eradication of malaria will require new tools to prevent transmission. Current antimalarial therapies targeting the asexual stage of Plasmodium do not prevent transmission of circulating gametocytes from infected humans to mosquitoes. Here, we describe a new class of transmission-blocking compounds, bumped kinase inhibitors (BKIs), which inhibit microgametocyte exflagellation. Oocyst formation and sporozoite production, necessary for transmission to mammals, were inhibited in mosquitoes fed on either BKI-1-treated human blood or mice treated with BKI-1. BKIs are hypothesized to act via inhibition of Plasmodium calcium-dependent protein kinase 4 and predicted to have little activity against mammalian kinases. Our data show that BKIs do not inhibit proliferation of mammalian cell lines and are well tolerated in mice. Used in combination with drugs active against asexual stages of Plasmodium, BKIs could prove an important tool for malaria control and eradication.
The Journal of clinical investigation 2012
PUBMED: 22565309; DOI: 10.1172/JCI61822
-
High-resolution single nucleotide polymorphism analysis distinguishes recrudescence and reinfection in recurrent invasive nontyphoidal Salmonella typhimurium disease.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
Background: Bloodstream infection with invasive nontyphoidal Salmonella (iNTS) is common and severe among human immunodeficiency virus (HIV)-infected adults throughout sub-Saharan Africa. The epidemiology of iNTS is poorly understood. Survivors frequently experience multiply recurrent iNTS disease, despite appropriate antimicrobial therapy, but recrudescence and reinfection have previously been difficult to distinguish.
Methods: We used high-resolution single nucleotide polymorphism (SNP) typing and whole-genome phylogenetics to investigate 47 iNTS isolates from 14 patients with multiple recurrences following an index presentation with iNTS disease in Blantyre, Malawi. We isolated nontyphoidal salmonellae organisms from blood (n = 35), bone marrow (n = 8), stool (n = 2), urine (n = 1), and throat (n = 1) samples; these isolates comprised serotypes Typhimurium (n = 43) and Enteritidis (n = 4).
Results: Recrudescence with identical or highly phylogenetically related isolates accounted for 78% of recurrences, and reinfection with phylogenetically distinct isolates accounted for 22% of recurrences. Both recrudescence and reinfection could occur in the same individual, and reinfection could either precede or follow recrudescence. The number of days to recurrence (23-486 d) was not different for recrudescence or reinfection. The number of days to recrudescence was unrelated to the number of SNPs accumulated by recrudescent organisms, suggesting that there was little genetic change during persistence in the host, despite exposure to multiple courses of antibiotics. Of Salmonella Typhimurium isolates, 42 of 43 were pathovar ST313.
Conclusions: High-resolution whole-genome phylogenetics successfully discriminated recrudescent iNTS from reinfection, despite a high level of clonality within and among individuals, giving insights into pathogenesis and management. These methods also have adequate resolution to investigate the epidemiology and transmission of this important African pathogen.
Funded by: Wellcome Trust: 076964, 098051
Clinical infectious diseases : an official publication of the Infectious Diseases Society of America 2012;54;7;955-63
PUBMED: 22318974; PMC: 3297646; DOI: 10.1093/cid/cir1032
-
Exome sequencing of liver fluke-associated cholangiocarcinoma.
[1] National Cancer Centre Singapore-Van Andel Research Institute Translational Research Laboratory, Division of Medical Sciences, National Cancer Centre, Singapore. [2] Division of Cancer and Stem Cell Biology, Duke-National University of Singapore (NUS) Graduate Medical School, Singapore.
Opisthorchis viverrini-related cholangiocarcinoma (CCA), a fatal bile duct cancer, is a major public health concern in areas endemic for this parasite. We report here whole-exome sequencing of eight O. viverrini-related tumors and matched normal tissue. We identified and validated 206 somatic mutations in 187 genes using Sanger sequencing and selected 15 genes for mutation prevalence screening in an additional 46 individuals with CCA (cases). In addition to the known cancer-related genes TP53 (mutated in 44.4% of cases), KRAS (16.7%) and SMAD4 (16.7%), we identified somatic mutations in 10 newly implicated genes in 3.7-14.8% of cases. These included inactivating mutations in MLL3 (in 14.8% of cases), ROBO2 (9.3%), RNF43 (9.3%) and PEG3 (5.6%), and activating mutations in the GNAS oncogene (9.3%). These genes have functions that can be broadly grouped into three biological classes: (i) deactivation of histone modifiers, (ii) activation of G protein signaling and (iii) loss of genome stability. This study provides insight into the mutational landscape contributing to O. viverrini-related CCA.
Nature genetics 2012
PUBMED: 22561520; DOI: 10.1038/ng.2273
-
Delineating nuclear reprogramming.
Wellcome Trust Sanger Institute, Hinxton, CB10 1SA, UK.
Nuclear reprogramming is described as a molecular switch that triggers the conversion of one cell type to another. Several key experiments in the past century have provided insight into the field of nuclear reprogramming. Previously deemed impossible, this research area is now brimming with new findings and developments. In this review, we aim to give a historical perspective on how the notion of nuclear reprogramming was established, describing the main experiments that were performed, including (1) somatic cell nuclear transfer, (2) exposure to cell extracts and cell fusion, and (3) transcription factor-induced lineage switch. Finally, we focus on (4) transcription factor-induced pluripotency, as initiated by a landmark discovery in 2006, in which the process of converting somatic cells to a pluripotent state was narrowed down to four transcription factors. The concept that somatic cells possess the capacity to revert to an immature state has major clinical implications, including personalized therapy, drug screening and disease modeling. Although this technology has the potential to revolutionize the medical field, it is still impeded by technical and biological obstacles. This review describes the rapid changes in this field, addresses the bottlenecks hindering its advancement and, in conclusion, applies the latest findings to overcome these issues.
Protein & cell 2012
PUBMED: 22467264; DOI: 10.1007/s13238-012-2920-x
-
Xirp proteins mark injured skeletal muscle in zebrafish.
Max Delbrück Center (MDC) for Molecular Medicine, Berlin, Germany.
Myocellular regeneration in vertebrates involves the proliferation of activated progenitor or dedifferentiated myogenic cells that have the potential to replenish lost tissue. In comparison, little is known about cellular repair mechanisms within myocellular tissue in response to small injuries caused by biomechanical or cellular stress. Using a microarray analysis for genes upregulated upon myocellular injury, we identified zebrafish Xin-actin-binding repeat-containing protein1 (Xirp1) as a marker for wounded skeletal muscle cells. By combining laser-induced micro-injury with proliferation analyses, we found that Xirp1 and Xirp2a localize to nascent myofibrils within wounded skeletal muscle cells and that the repair of injuries does not involve cell proliferation or Pax7(+) cells. Using Xirp1 and Xirp2a as markers, myocellular injury can now be detected, even though functional studies indicate that these proteins are not essential in this process. Previous work in chicken has implicated Xirps in cardiac looping morphogenesis. However, we found that zebrafish cardiac morphogenesis is normal in the absence of Xirp expression, and animals deficient for cardiac Xirp expression are adult viable. Although the functional involvement of Xirps in developmental and repair processes currently remains enigmatic, our findings demonstrate that skeletal muscle harbours a rapid, cell-proliferation-independent response to injury which has now become accessible to detailed molecular and cellular characterization.
PloS one 2012;7;2;e31041
PUBMED: 22355335; PMC: 3280289; DOI: 10.1371/journal.pone.0031041
-
Optimizing Illumina next-generation sequencing library preparation for extremely AT-biased genomes.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK. so1@sanger.ac.uk
Background: Massively parallel sequencing technology is revolutionizing approaches to genomic and genetic research. Since its advent, the scale and efficiency of Next-Generation Sequencing (NGS) have rapidly improved. In spite of this success, sequencing genomes or genomic regions with extremely biased base composition is still a great challenge for the currently available NGS platforms. The genomes of some important pathogenic organisms, such as Plasmodium falciparum (high AT content) and Mycobacterium tuberculosis (high GC content), display extremes of base composition. Standard library preparation procedures that employ PCR amplification have been shown to cause uneven read coverage, particularly across AT- and GC-rich regions, leading to problems in genome assembly and variation analyses. Alternative library-preparation approaches that omit PCR amplification require large quantities of starting material and hence are not suitable for small amounts of DNA/RNA, such as those from clinical isolates. We have developed and optimized library-preparation procedures suitable for low quantities of starting material and tolerant of sequences with extremely high AT content.
Results: We have used our optimized conditions in parallel with standard methods to prepare Illumina sequencing libraries from a non-clinical and a clinical isolate (containing ~53% host contamination). By analyzing and comparing the quality of the sequence data generated, we show that our optimized conditions, which involve a PCR additive (tetramethylammonium chloride, TMAC), produce amplified libraries with improved coverage of extremely AT-rich regions and reduced bias toward GC-neutral templates.
Conclusion: We have developed a robust and optimized Next-Generation Sequencing library amplification method suitable for extremely AT-rich genomes. The new amplification conditions significantly reduce bias and retain the complexity of templates at either extreme of base composition. This development will greatly benefit the sequencing of clinical samples, which often require amplification because of the low mass of DNA starting material.
Funded by: Wellcome Trust: 079355/Z/06/Z
BMC genomics 2012;13;1
PUBMED: 22214261; PMC: 3312816; DOI: 10.1186/1471-2164-13-1
-
A genome-wide association search for type 2 diabetes genes in African Americans.
Department of Biochemistry, Wake Forest University School of Medicine, Winston-Salem, North Carolina, United States of America. nallred@wfubmc.edu
African Americans are disproportionately affected by type 2 diabetes (T2DM), yet few studies have examined T2DM using genome-wide association approaches in this ethnicity. The aim of this study was to identify genes associated with T2DM in the African American population. We performed a genome-wide association study (GWAS) using the Affymetrix 6.0 array in 965 African-American cases with T2DM and end-stage renal disease (T2DM-ESRD) and 1029 population-based controls. The most significant SNPs (n = 550 independent loci) were genotyped in a replication cohort, and 122 SNPs (n = 98 independent loci) were further tested by genotyping three additional validation cohorts, followed by meta-analysis in all five cohorts totaling 3,132 cases and 3,317 controls. Twelve SNPs had evidence of association in the GWAS (P<0.0071), were directionally consistent in the replication cohort and were associated with T2DM in subjects without nephropathy (P<0.05). Meta-analysis in all cases and controls revealed a single SNP reaching genome-wide significance (P<2.5×10(-8)). SNP rs7560163 (P = 7.0×10(-9), OR (95% CI) = 0.75 (0.67-0.84)) is located intergenically between RND3 and RBM43. Four additional loci (rs7542900, rs4659485, rs2722769 and rs7107217) were associated with T2DM (P<0.05), reached nominal levels of significance (P<2.5×10(-5)) in the overall analysis and may represent novel loci that contribute to T2DM. We have identified novel T2DM-susceptibility variants in the African-American population. Notably, T2DM risk was associated with the major allele, which implies an interesting genetic architecture in this population. These results suggest that multiple loci underlie T2DM susceptibility in the African-American population and that these loci are distinct from those identified in other ethnic populations.
Funded by: NCRR NIH HHS: M01 RR07122; NHLBI NIH HHS: R01 HL56266; NIDDK NIH HHS: K99 DK081350, R01 DK053591, R01 DK066358, R01 DK070941, R01 DK070941-06; PHS HHS: HHSC268200782096C
PloS one 2012;7;1;e29202
PUBMED: 22238593; PMC: 3251563; DOI: 10.1371/journal.pone.0029202
-
Assignment of protein interactions from affinity purification/mass spectrometry data.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA Cambridgeshire, United Kingdom. mp3@sanger.ac.uk
The combination of affinity purification with mass spectrometry analysis has become the method of choice for protein complex characterization. With the improved performance of mass spectrometry technology, the sensitivity of the analyses is increasing, probing deeper into molecular interactions and yielding longer lists of proteins. These lists identify not only core complex subunits but also the less accessible proteins that interact weakly or transiently. Alongside them, contaminant proteins, which are often abundant proteins in the cell, tend to be recovered in affinity experiments because they bind nonspecifically and with low affinity to the matrix, tag and/or antibody. The challenge now lies in discriminating nonspecific binders from true interactors, particularly at low abundance and on a larger scale. This review summarizes the variety of methods that have been used in the past few years to distinguish contaminants from specific interactions, ranging from manual elimination using heuristic rules to more sophisticated probabilistic scoring approaches. We aim to raise awareness of the processing that takes place before an interaction list is reported and of the different types of list curation approaches suited to different experiments.
Funded by: Wellcome Trust: 079643/Z/06/Z
Journal of proteome research 2012;11;3;1462-74
PUBMED: 22283744; DOI: 10.1021/pr2011632
-
Extent, causes, and consequences of small RNA expression variation in human adipose tissue.
Wellcome Trust Sanger Institute, Hinxton, United Kingdom.
Small RNAs are functional molecules that modulate mRNA transcripts and have been implicated in the aetiology of several common diseases. However, little is known about the extent of their variability within the human population. Here, we characterise the extent, causes, and effects of naturally occurring variation in expression and sequence of small RNAs from adipose tissue in relation to genotype, gene expression, and metabolic traits in the MuTHER reference cohort. We profiled the expression of 15 to 30 base pair RNA molecules in subcutaneous adipose tissue from 131 individuals using high-throughput sequencing, and quantified levels of 591 microRNAs and small nucleolar RNAs. We identified three genetic variants and three RNA editing events. Highly expressed small RNAs are more conserved within mammals than average, as are those with highly variable expression. We identified 14 genetic loci significantly associated with nearby small RNA expression levels, seven of which also regulate an mRNA transcript level in the same region. In addition, these loci are enriched for variants significant in genome-wide association studies for body mass index. Contrary to expectation, we found no evidence for negative correlation between expression level of a microRNA and its target mRNAs. Trunk fat mass, body mass index, and fasting insulin were associated with more than twenty small RNA expression levels each, while fasting glucose had no significant associations. This study highlights the similar genetic complexity and shared genetic control of small RNA and mRNA transcripts, and gives a quantitative picture of small RNA expression variation in the human population.
PLoS genetics 2012;8;5;e1002704
PUBMED: 22589741; PMC: 3349731; DOI: 10.1371/journal.pgen.1002704
-
Bioimage informatics: a new category in Bioinformatics.
Janelia Farm Research Campus, Howard Hughes Medical Institute, VA 20147, USA, Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK, Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre, E-28029 Madrid, Spain and Oklahoma Medical Research Foundation, Oklahoma City, OK 73104, USA.
Bioinformatics (Oxford, England) 2012;28;8;1057
PUBMED: 22399678; PMC: 3324521; DOI: 10.1093/bioinformatics/bts111
-
Evolutionary genetics of the human Rh blood group system.
Department of Anthropology, Pennsylvania State University, University Park, PA, 16801, USA.
The evolutionary history of variation in the human Rh blood group system, determined by variants in the RHD and RHCE genes, has long been an unresolved puzzle in human genetics. Prior to the medical treatments and interventions developed in the last century, the D-positive (RhD positive) children of D-negative (RhD negative) women were at risk of hemolytic disease of the newborn if the mother produced anti-D antibodies following sensitization to the blood of a previous D-positive child. Given the deleterious fitness consequences of this disease, the appreciable frequencies in European populations of the responsible RHD gene deletion variant (for example, 0.43 in our study) seem surprising. In this study, we used new molecular and genomic data generated from four HapMap population samples to test the idea that positive selection for an as-yet-unknown fitness benefit of the RHD deletion may have offset the otherwise negative fitness effects of hemolytic disease of the newborn. We found no evidence that positive natural selection affected the frequency of the RHD deletion. Thus, the initial rise to intermediate frequency of the RHD deletion in European populations may simply be explained by genetic drift/founder effect, or by an older or more complex sweep that we are insufficiently powered to detect. However, our simulations recapitulate previous findings that selection on the RHD deletion is frequency dependent and weak or absent near 0.5. Therefore, once such a frequency was achieved, it could have been maintained by a relatively small amount of genetic drift. We unexpectedly observed evidence for positive selection on the C allele of RHCE in non-African populations (on chromosomes with intact copies of the RHD gene) in the form of an unusually high F(ST) value and the high frequency of a single haplotype carrying the C allele. RhCE function is not well understood, but the C/c antigenic variant is clinically relevant and can result in hemolytic disease of the newborn, albeit much less commonly and less severely than the D-negative blood type. The potential fitness benefits of the RHCE C allele are therefore currently unknown but merit further exploration.
Human genetics 2012
PUBMED: 22367406; DOI: 10.1007/s00439-012-1147-5
-
Clinically significant copy number alterations and complex rearrangements of MYB and NFIB in head and neck adenoid cystic carcinoma.
Sahlgrenska Cancer Center, Department of Pathology, Sahlgrenska Academy at University of Gothenburg, Gothenburg, Sweden.
Adenoid cystic carcinoma (ACC) of the head and neck is a malignant tumor with poor long-term prognosis. Besides the recently identified MYB-NFIB fusion oncogene generated by a t(6;9) translocation, little is known about other genetic alterations in ACC. Using high-resolution array-based comparative genomic hybridization and massively parallel paired-end sequencing, we explored genomic alterations in 40 frozen ACCs. Eighty-six percent of the tumors expressed MYB-NFIB fusion transcripts and 97% overexpressed MYB mRNA, indicating that MYB activation is a hallmark of ACC. Thirty-five recurrent copy number alterations (CNAs) were detected, including losses involving 12q, 6q, 9p, 11q, 14q, 1p, and 5q and gains involving 1q, 9p, and 22q. Grade III tumors had, on average, significantly more CNAs per tumor than Grade I and II tumors (P = 0.007). Losses of 1p, 6q, and 15q were associated with high-grade tumors, whereas losses of 14q were seen exclusively in Grade I tumors. The t(6;9) rearrangements were associated with a complex pattern of breakpoints, deletions, insertions, inversions, and, for 9p, also gains. Analyses of fusion-negative ACCs using high-resolution arrays and massively parallel paired-end sequencing revealed that MYB may also be deregulated by mechanisms other than gene fusion. Our studies also identified several down-regulated candidate tumor suppressor genes (CTNNBIP1, CASP9, PRDM2, and SFN) in 1p36.33-p35.3 that may be of clinical significance in high-grade tumors. Further studies of these and other potential target genes may lead to the identification of novel driver genes in ACC.
Genes, chromosomes & cancer 2012
PUBMED: 22505352; DOI: 10.1002/gcc.21965
-
Shared loci for migraine and epilepsy on chromosomes 14q12-q23 and 12q24.2-q24.3.
Folkhälsan Institute of Genetics, Folkhälsan Research Center, Helsinki, Finland. anne.polvi@helsinki.fi
Objectives: To describe clinical characteristics and to identify susceptibility loci for epilepsy and migraine in a Finnish family with a complex phenotype.
Methods: Participating family members were interviewed and medical files were reviewed. The seizure classification was made according to International League Against Epilepsy criteria. Migraine diagnosis was made using the validated Finnish Migraine Specific Questionnaire for Family Studies and criteria according to the current International Classification of Headache Disorders-II. DNA samples were obtained from 56 family members and nonparametric genome-wide linkage analyses were performed using 382 polymorphic microsatellite markers. The most promising loci were fine-mapped with additional microsatellite markers.
Results: Clinical data were obtained from 60 family members of whom 12 (20%) had idiopathic epileptic seizures. Eight of those 12 (67%) also had migraine. Altogether 33 of the 60 family members (55%) had migraine. Significant evidence of linkage was found between a locus on 14q12-q23 and migraine (p = 0.0001). Suggestive evidence of linkage in this region was also found for epilepsy with generalized tonic-clonic seizures (p = 0.0034). In addition, significant evidence of linkage was found at a locus on 12q24.2-q24.3 (p < 0.001) for migraine alone and for the combined phenotype of migraine and epilepsy.
Conclusions: Our data suggest the occurrence of common susceptibility loci for epilepsy and migraine on chromosomes 14q12-q23 and 12q24.2-q24.3, implicating a shared genetic etiology for these 2 diseases.
Neurology 2012;78;3;202-9
PUBMED: 22218271; DOI: 10.1212/WNL.0b013e31823fcd87
-
A systematically improved high quality genome and transcriptome of the human blood fluke Schistosoma mansoni.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.
Schistosomiasis is one of the most prevalent parasitic diseases, affecting millions of people in developing countries. Amongst the human-infective species, Schistosoma mansoni is also the most commonly used in the laboratory and here we present the systematic improvement of its draft genome. We used Sanger capillary and deep-coverage Illumina sequencing from clonal worms to upgrade the highly fragmented draft 380 Mb genome to one with only 885 scaffolds and more than 81% of the bases organised into chromosomes. We have also used transcriptome sequencing (RNA-seq) from four time points in the parasite's life cycle to refine gene predictions and profile their expression. More than 45% of predicted genes have been extensively modified and the total number has been reduced from 11,807 to 10,852. Using the new version of the genome, we identified trans-splicing events occurring in at least 11% of genes and identified clear cases where it is used to resolve polycistronic transcripts. We have produced a high-resolution map of temporal changes in expression for 9,535 genes, covering an unprecedented dynamic range for this organism. All of these data have been consolidated into a searchable format within the GeneDB (www.genedb.org) and SchistoDB (www.schistodb.net) databases. With further transcriptional profiling and genome sequencing increasingly accessible, the upgraded genome will form a fundamental dataset to underpin further advances in schistosome research.
Funded by: FIC NIH HHS: TW007012; PHS HHS: HHSN272201000009I; Wellcome Trust: 085775/Z/08/Z
PLoS neglected tropical diseases 2012;6;1;e1455
PUBMED: 22253936; PMC: 3254664; DOI: 10.1371/journal.pntd.0001455
-
The Pfam protein families database.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK. mp13@sanger.ac.uk
Pfam is a widely used database of protein families, currently containing more than 13,000 manually curated protein families as of release 26.0. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/). Here, we report on changes that have occurred since our 2010 NAR paper (release 24.0). Over the last 2 years, we have generated 1840 new families and increased coverage of the UniProt Knowledgebase (UniProtKB) to nearly 80%. Notably, we have taken the step of opening up the annotation of our families to the Wikipedia community, by linking Pfam families to relevant Wikipedia pages and encouraging the Pfam and Wikipedia communities to improve and expand those pages. We continue to improve the Pfam website and add new visualizations, such as the 'sunburst' representation of taxonomic distribution of families. In this work we additionally address two topics that will be of particular interest to the Pfam community. First, we explain the definition and use of family-specific, manually curated gathering thresholds. Second, we discuss some of the features of domains of unknown function (also known as DUFs), which constitute a rapidly growing class of families within Pfam.
Funded by: Biotechnology and Biological Sciences Research Council: BB/F010435/1; Howard Hughes Medical Institute; Wellcome Trust: WT077044/Z/05/Z
Nucleic acids research 2012;40;Database issue;D290-301
PUBMED: 22127870; PMC: 3245129; DOI: 10.1093/nar/gkr1065
-
MEROPS: the database of proteolytic enzymes, their substrates and inhibitors.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK. ndr@sanger.ac.uk
Peptidases, their substrates and inhibitors are of great relevance to biology, medicine and biotechnology. The MEROPS database (http://merops.sanger.ac.uk) aims to fulfil the need for an integrated source of information about these. The database has hierarchical classifications in which homologous sets of peptidases and protein inhibitors are grouped into protein species, which are grouped into families, which are in turn grouped into clans. The database has been expanded to include proteolytic enzymes other than peptidases. Special identifiers for peptidases from a variety of model organisms have been established so that orthologues can be detected in other species. A table of predicted active-site residue and metal ligand positions and the residue ranges of the peptidase domains in orthologues has been added to each peptidase summary. New displays of tertiary structures, which can be rotated or have the surfaces displayed, have been added to the structure pages. New indexes for gene names and peptidase substrates have been made available. Among the enhancements to existing features are the inclusion of small-molecule inhibitors in the tables of peptidase-inhibitor interactions, a table of known cleavage sites for each protein substrate, and tables showing the substrate-binding preferences of peptidases derived from combinatorial peptide substrate libraries.
Funded by: Wellcome Trust: WT098051
Nucleic acids research 2012;40;Database issue;D343-50
PUBMED: 22086950; PMC: 3245014; DOI: 10.1093/nar/gkr987
-
Changes in HDL cholesterol and cardiovascular outcomes after lipid modification therapy.
Division of Clinical Sciences, St George's University of London, Cranmer Terrace, London SW17 0RE, UK. kray@sgul.ac.uk
Background: Lipid modification therapy (LMT) produces cardiovascular benefits principally through reductions in low density lipoprotein cholesterol. While recent evidence, using data from 454 participants in the Framingham Offspring Study, has suggested that increases in high density lipoprotein cholesterol (HDL-C) are also associated with a reduction in cardiovascular outcomes, independently of changes in low density lipoprotein cholesterol, replication of this finding is important. The authors therefore present further results using data from the EPIC-Norfolk (UK) and Rotterdam (The Netherlands) prospective cohort studies.
Methods: A total of 1148 participants, 446 from the EPIC-Norfolk and 702 from the Rotterdam study, were assessed for lipids before and after starting LMT. Subsequent risk of cardiovascular events, ascertained through linkage with mortality records and hospital databases, was investigated using Cox proportional hazards regression. Random effects meta-analysis was used to combine results across studies.
Results: Based on combined data from the EPIC-Norfolk and Rotterdam studies, there was some evidence that change in HDL-C resulting from LMT was associated with reduced cardiovascular risk (HR per pooled SD (0.34 mmol/l) increase = 0.74, 95% CI 0.56 to 0.99, adjusted for age, sex and baseline HDL-C). However, this association was attenuated and was not (statistically) significant with further adjustments for non-HDL-C and for cigarette smoking history, prevalent diabetes, systolic blood pressure, body mass index, use of antihypertensive medication, previous myocardial infarction, prevalent angina and previous stroke (HR 0.92, 95% CI 0.70 to 1.20).
Conclusions: Following adjustment for conventional non-lipid risk factors of cardiovascular disease, this study provides no evidence to support a significant benefit from increasing HDL-C independent of the effect of lowering non-HDL-C.
Heart (British Cardiac Society) 2012;98;10;780-5
PUBMED: 22447463; DOI: 10.1136/heartjnl-2011-301405
-
Adapting to domesticity.
Nature reviews. Microbiology 2012;10;3;163
PUBMED: 22337165; DOI: 10.1038/nrmicro2752
-
Comparative genomics of the apicomplexan parasites Toxoplasma gondii and Neospora caninum: Coccidia differing in host range and transmission strategy.
Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, United Kingdom.
Toxoplasma gondii is a zoonotic protozoan parasite which infects nearly one third of the human population and is found in an extraordinary range of vertebrate hosts. Its epidemiology depends heavily on horizontal transmission, especially between rodents and its definitive host, the cat. Neospora caninum is a recently discovered close relative of Toxoplasma, whose definitive host is the dog. Both species are tissue-dwelling Coccidia and members of the phylum Apicomplexa; they share many common features, but Neospora neither infects humans nor shares the same wide host range as Toxoplasma. Rather, it shows a striking preference for highly efficient vertical transmission in cattle. These species therefore provide a remarkable opportunity to investigate mechanisms of host restriction, transmission strategies, virulence and zoonotic potential. We sequenced the genome of N. caninum and transcriptomes of the invasive stage of both species, undertaking an extensive comparative genomics and transcriptomics analysis. We estimate that these organisms diverged from their common ancestor around 28 million years ago and find that both genomes and gene expression are remarkably conserved. However, in N. caninum we identified an unexpected expansion of surface antigen gene families and the divergence of secreted virulence factors, including rhoptry kinases. Specifically we show that the rhoptry kinase ROP18 is pseudogenised in N. caninum and that, as a possible consequence, Neospora is unable to phosphorylate host immunity-related GTPases, as Toxoplasma does. This defense strategy is thought to be key to virulence in Toxoplasma. We conclude that the ecological niches occupied by these species are influenced by a relatively small number of gene products which operate at the host-parasite interface and that the dominance of vertical transmission in N. caninum may be associated with the evolution of reduced virulence in this species.
Funded by: Biotechnology and Biological Sciences Research Council: BBS/B/08493; Canadian Institutes of Health Research; Wellcome Trust: 085775/Z/08/Z
PLoS pathogens 2012;8;3;e1002567
PUBMED: 22457617; PMC: 3310773; DOI: 10.1371/journal.ppat.1002567
-
Marked endotheliotropism of highly pathogenic avian influenza virus H5N1 following intestinal inoculation in cats.
Department of Ecology and Evolutionary Biology, Princeton University, Princeton, New Jersey, USA.
Highly pathogenic avian influenza virus (HPAIV) H5N1 can infect mammals via the intestine; this is unusual since influenza viruses typically infect mammals via the respiratory tract. The dissemination of HPAIV H5N1 following intestinal entry and associated pathogenesis are largely unknown. To assess the route of spread of HPAIV H5N1 to other organs and to determine its associated pathogenesis, we inoculated infected chicken liver homogenate directly into the intestine of cats by use of enteric-coated capsules. Intestinal inoculation of HPAIV H5N1 resulted in fatal systemic disease. The spread of HPAIV H5N1 from the lumen of the intestine to other organs took place via the blood and lymphatic vascular systems but not via neuronal transmission. Remarkably, the systemic spread of the virus via the vascular system was associated with massive infection of endothelial and lymphendothelial cells, resulting in widespread hemorrhages. This is unique for influenza in mammals and resembles the pathogenesis of HPAIV infection in terrestrial poultry. It contrasts with the pathogenesis of systemic disease from the same virus following entry via the respiratory tract, where lesions are characterized mainly by necrosis and inflammation and are associated with the presence of influenza virus antigen in parenchymal, not endothelial, cells. The marked endotheliotropism of the virus following intestinal inoculation indicates that the pathogenesis of systemic influenza virus infection in mammals may differ according to the portal of entry.
Journal of virology 2012;86;2;1158-65
PUBMED: 22090101; PMC: 3255817; DOI: 10.1128/JVI.06375-11
-
The TCA cycle is not required for selection or survival of multidrug-resistant Salmonella.
Antimicrobial Agents Research Group, School of Immunity and Infection, University of Birmingham, Edgbaston, Birmingham, UK.
Objectives: The initial aim of this study was to use a systems biology approach to analyse a ciprofloxacin-selected multidrug-resistant (MDR) Salmonella enterica serotype Typhimurium, L664.
Methods: The whole genome sequence and transcriptome of L664 were analysed. Site-directed mutagenesis to recreate each mutation was carried out, followed by phenotypic characterization and mutation frequency analysis. As a mutation in the TCA cycle was detected, we tested the controversial hypothesis regarding the bacterial response to bactericidal antibiotics, put forward by Kohanski et al. (Cell 2007; 130: 797-810 and Mol Cell 2010; 37: 311-20), that exposure of bacteria to agents such as ciprofloxacin produces reactive oxygen species (ROS), which transiently increase the mutation rate giving rise to MDR bacteria.
Results: L664 contained a mutation in ramR that conferred MDR. A mutation in tctA affected the TCA cycle and conferred the inability to grow on minimal agar. The virulence of L664 was not attenuated. Ciprofloxacin exposure produced ROS in L664 and SL1344 (tctA::aph), but it was reduced and occurred later. There were no significant differences in the rates of killing or mutations per generation to antibiotic resistance between the strains.
Conclusions: Whilst we confirm production of ROS in response to ciprofloxacin, we have no data to support the hypothesis that this leads to selection of MDR strains. Our results indicate that the mutations in tctA and glgA were random as they did not pre-exist in the parental strain, and that the mutation in tctA did not provide a survival advantage or disadvantage in the presence of antibiotic.
Funded by: Medical Research Council: G0501415
The Journal of antimicrobial chemotherapy 2012;67;3;589-99
PUBMED: 22186876; DOI: 10.1093/jac/dkr515
-
Molecular cytogenetics: karyotype evolution, phylogenomics and future prospects.
Heredity 2012;108;1;1-3
PUBMED: 22167088; PMC: 3238121; DOI: 10.1038/hdy.2011.117
-
Mutations in ISPD cause Walker-Warburg syndrome and defective glycosylation of α-dystroglycan.
[1] Department of Human Genetics, Nijmegen Centre for Molecular Life Sciences, Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands. [2] School of Women's and Children's Health, Sydney Children's Hospital and the University of New South Wales, Sydney, New South Wales, Australia.
Walker-Warburg syndrome (WWS) is an autosomal recessive multisystem disorder characterized by complex eye and brain abnormalities with congenital muscular dystrophy (CMD) and aberrant α-dystroglycan glycosylation. Here we report mutations in the ISPD gene (encoding isoprenoid synthase domain containing) as the second most common cause of WWS. Bacterial IspD is a nucleotidyl transferase belonging to a large glycosyltransferase family, but the role of the orthologous protein in chordates is obscure to date, as this phylum does not have the corresponding non-mevalonate isoprenoid biosynthesis pathway. Knockdown of ispd in zebrafish recapitulates the human WWS phenotype with hydrocephalus, reduced eye size, muscle degeneration and hypoglycosylated α-dystroglycan. These results implicate ISPD in α-dystroglycan glycosylation in maintaining sarcolemma integrity in vertebrates.
Nature genetics 2012
PUBMED: 22522421; DOI: 10.1038/ng.2253
-
Variation at the capsule locus, cps, of mistyped and non-typeable Streptococcus pneumoniae isolates.
Wellcome Trust Sanger Institute, Hinxton, UK.
The capsule polysaccharide locus (cps) is the site of the capsule biosynthesis gene cluster in encapsulated Streptococcus pneumoniae. A set of pneumococcal samples and non-pneumococcal streptococci from Denmark, the Gambia, the Netherlands, Thailand, the United Kingdom and the USA were sequenced at the cps locus to elucidate serologically mistyped or non-typeable isolates. We identified a novel serotype 33B/33C mosaic capsule cluster and previously unseen serotype 22F capsule genes, disrupted and deleted cps clusters, the presence of aliB and nspA genes that are unrelated to capsule production, and similar genes in the non-pneumococcal samples. These data provide greater understanding of diversity at a locus which is crucial to antigenic diversity of the pathogen and current vaccine strategies.
Microbiology (Reading, England) 2012
PUBMED: 22403189; DOI: 10.1099/mic.0.056580-0
-
Sequencing parasite populations.
Nature reviews. Microbiology 2012;10;2;85
PUBMED: 22245933; DOI: 10.1038/nrmicro2738
-
Insights into hominid evolution from the gorilla genome sequence.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK.
Gorillas are humans' closest living relatives after chimpanzees, and are of comparable importance for the study of human origins and evolution. Here we present the assembly and analysis of a genome sequence for the western lowland gorilla, and compare the whole genomes of all extant great ape genera. We propose a synthesis of genetic and fossil evidence consistent with placing the human-chimpanzee and human-chimpanzee-gorilla speciation events at approximately 6 and 10 million years ago. In 30% of the genome, gorilla is closer to human or chimpanzee than the latter are to each other; this is rarer around coding genes, indicating pervasive selection throughout great ape evolution, and has functional consequences in gene expression. A comparison of protein coding genes reveals approximately 500 genes showing accelerated evolution on each of the gorilla, human and chimpanzee lineages, and evidence for parallel acceleration, particularly of genes involved in hearing. We also compare the western and eastern gorilla species, dating their average sequence divergence to 1.75 million years ago, but with evidence for more recent genetic exchange and a population bottleneck in the eastern species. The use of the genome sequence in these and future analyses will promote a deeper understanding of great ape biology and evolution.
Funded by: Biotechnology and Biological Sciences Research Council; Howard Hughes Medical Institute; Medical Research Council; NHGRI NIH HHS: HG002385, U54 HG003079; Wellcome Trust: 075491/Z/04, WT062023, WT077009, WT077192, WT077198, WT089066
Nature 2012;483;7388;169-75
PUBMED: 22398555; PMC: 3303130; DOI: 10.1038/nature10842
-
Waves of retrotransposon expansion remodel genome organization and CTCF binding in multiple mammalian lineages.
Cancer Research UK, Cambridge Research Institute, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK.
CTCF-binding locations represent regulatory sequences that are highly constrained over the course of evolution. To gain insight into how these DNA elements are conserved and spread through the genome, we defined the full spectrum of CTCF-binding sites, including a 33/34-mer motif, and identified over five thousand highly conserved, robust, and tissue-independent CTCF-binding locations by comparing ChIP-seq data from six mammals. Our data indicate that activation of retroelements has produced species-specific expansions of CTCF binding in rodents, dogs, and opossum, which often functionally serve as chromatin and transcriptional insulators. We discovered fossilized repeat elements flanking deeply conserved CTCF-binding regions, indicating that similar retrotransposon expansions occurred hundreds of millions of years ago. Repeat-driven dispersal of CTCF binding is a fundamental, ancient, and still highly active mechanism of genome evolution in mammalian lineages.
Funded by: Wellcome Trust: WT062023, WT098051
Cell 2012;148;1-2;335-48
PUBMED: 22244452; DOI: 10.1016/j.cell.2011.11.058
-
No Interactions Between Previously Associated 2-Hour Glucose Gene Variants and Physical Activity or BMI on 2-Hour Glucose Levels.
Corresponding author: Robert A. Scott, robert.scott@mrc-epid.cam.ac.uk.
Gene-lifestyle interactions have been suggested to contribute to the development of type 2 diabetes. Glucose levels 2 h after a standard 75-g glucose challenge are used to diagnose diabetes and are associated with both genetic and lifestyle factors. However, whether these factors interact to determine 2-h glucose levels is unknown. We meta-analyzed single nucleotide polymorphism (SNP) × BMI and SNP × physical activity (PA) interaction regression models for five SNPs previously associated with 2-h glucose levels from up to 22 studies comprising 54,884 individuals without diabetes. PA levels were dichotomized, with individuals below the first quintile classified as inactive (20%) and the remainder as active (80%). BMI was considered a continuous trait. Inactive individuals had higher 2-h glucose levels than active individuals (β = 0.22 mmol/L [95% CI 0.13-0.31], P = 1.63 × 10(-6)). All SNPs were associated with 2-h glucose (β = 0.06-0.12 mmol/allele, P ≤ 1.53 × 10(-7)), but no significant interactions were found with PA (P > 0.18) or BMI (P ≥ 0.04). In this large study of gene-lifestyle interaction, we observed no interactions between genetic and lifestyle factors, both of which were associated with 2-h glucose. It is perhaps unlikely that top loci from genome-wide association studies will exhibit strong subgroup-specific effects, and such loci may not, therefore, make the best candidates for the study of interactions.
Funded by: NIDDK NIH HHS: R01 DK072041
Diabetes 2012;61;5;1291-6
PUBMED: 22415877; DOI: 10.2337/db11-0973
-
Beyond the palaeomicrobiology.
This month's Genome Watch highlights the power of palaeomicrobiology in extracting detailed information about the genomes of ancient microorganisms.
Nature reviews. Microbiology 2012;10;4;240
PUBMED: 22406951; DOI: 10.1038/nrmicro2768
-
Structure, diversity, and mobility of the Salmonella pathogenicity island 7 family of integrative and conjugative elements within Enterobacteriaceae.
Pathogen Genetics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK.
Integrative and conjugative elements (ICEs) are self-mobile genetic elements found in the genomes of some bacteria. These elements may confer a fitness advantage upon their host bacteria through the cargo genes that they carry. Salmonella pathogenicity island 7 (SPI-7), found within some pathogenic strains of Salmonella enterica, possesses features indicative of an ICE and carries genes implicated in virulence. We aimed to identify and fully analyze ICEs related to SPI-7 within the genus Salmonella and other Enterobacteriaceae. We report the sequence of two novel SPI-7-like elements, found within strains of Salmonella bongori, which share 97% nucleotide identity over conserved regions with SPI-7 and with each other. Although SPI-7 within Salmonella enterica serovar Typhi appears to be fixed within the chromosome, we present evidence that these novel elements are capable of excision and self-mobility. Phylogenetic analyses show that these Salmonella mobile elements share an ancestor which existed approximately 3.6 to 15.8 million years ago. Additionally, we identified more distantly related ICEs, with distinct cargo regions, within other strains of Salmonella as well as within Citrobacter, Erwinia, Escherichia, Photorhabdus, and Yersinia species. In total, we report on a collection of 17 SPI-7 related ICEs within enterobacterial species, of which six are novel. Using comparative and mutational studies, we have defined a core of 27 genes essential for conjugation. We present a growing family of SPI-7-related ICEs whose mobility, abundance, and cargo variability indicate that these elements may have had a large impact on the evolution of the Enterobacteriaceae.
Funded by: Wellcome Trust: 098051
Journal of bacteriology 2012;194;6;1494-504
PUBMED: 22247511; PMC: 3294861; DOI: 10.1128/JB.06403-11
-
optiCall: A robust genotype-calling algorithm for rare, low frequency and common variants.
Human Genetics, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, United Kingdom.
MOTIVATION: Existing microarray genotype-calling algorithms adopt either SNP-by-SNP (SNP-wise) or sample-by-sample (sample-wise) approaches to calling. We have developed a novel genotype-calling algorithm for the Illumina platform, optiCall, that uses both SNP-wise and sample-wise calling to more accurately ascertain genotypes at rare, low frequency and common variants. RESULTS: Using data from 4,537 individuals from the 1958 British Birth Cohort genotyped on the Immunochip, we estimate that the proportion of SNPs lost to downstream analysis due to false QC failures, and rare variants misclassified as monomorphic, is only 1.38% with optiCall, in comparison to 3.87%, 7.85% and 4.09% for Illuminus, GenoSNP and GenCall, respectively. We show that optiCall accurately captures rare variants and can correctly account for SNPs where probe intensity clouds are shifted from their expected positions. AVAILABILITY AND IMPLEMENTATION: optiCall is implemented in C++ for use on Unix operating systems and is available for download at http://www.sanger.ac.uk/resources/software/opticall/ CONTACT: optiCall@sanger.ac.uk.
Bioinformatics (Oxford, England) 2012
PUBMED: 22500001; DOI: 10.1093/bioinformatics/bts180
-
Monocyte gene expression signature of patients with early onset coronary artery disease.
Department of Vascular Medicine, Academic Medical Center, Amsterdam, The Netherlands. s.sivapalaratnam@amc.uva.nl
The burden of cardiovascular disease (CVD) cannot be fully addressed by therapy targeting known pathophysiological pathways. Even with stringent control of all risk factors, CVD events are reduced by only half. A number of additional pathways probably play a role in the development of CVD and might serve as novel therapeutic targets. Genome-wide expression studies represent a powerful tool to identify such novel pathways. We compared the expression profiles in monocytes from twenty-two young male patients with premature familial coronary artery disease (CAD) with those from controls matched for age, sex and smoking status, without a family history of CVD. Since all patients were on statin and aspirin treatment, potentially affecting the expression of genes in monocytes, twelve controls were subsequently treated with simvastatin and aspirin for 6 and 2 weeks, respectively. Whole-genome expression arrays identified six genes as differentially expressed in the monocytes of patients versus controls; ABCA1, ABCG1 and RGS1 were downregulated in patients, whereas ADRB2, FOLR3 and GSTM1 were upregulated. Differential expression of all genes, apart from GSTM1, was confirmed by qPCR. Aspirin and statins altered gene expression of ABCG1 and ADRB2. All findings were then tested in a second group of twenty-four patients and controls. Differential expression of ABCA1, RGS1 and ADRB2 was replicated. In conclusion, we identified three genes that are differentially expressed in CAD cases and might play a role in the pathogenesis of atherosclerotic vascular disease.
PloS one 2012;7;2;e32166
PUBMED: 22363809; PMC: 3283726; DOI: 10.1371/journal.pone.0032166
-
Predisposition gene identification in common cancers by exome sequencing: insights from familial breast cancer.
Division of Genetics and Epidemiology, Institute of Cancer Research, 15 Cotswold Road, Sutton, Surrey, SM2 5NG, UK.
The genetic component of breast cancer predisposition remains largely unexplained. Candidate gene case-control resequencing has identified predisposition genes characterised by rare, protein truncating mutations that confer moderate risks of disease. In theory, exome sequencing should yield additional genes of this class. Here, we explore the feasibility and design considerations of this approach. We performed exome sequencing in 50 individuals with familial breast cancer, applying frequency and protein function filters to identify variants most likely to be pathogenic. We identified 867,378 variants that passed the call quality filters, of which 1,296 variants passed the frequency and protein truncation filters. The median number of validated, rare, protein truncating variants was 10 in individuals with, and without, mutations in known genes. The functional candidacy of mutated genes was similar in both groups. Without prior knowledge, the known genes would not have been recognisable as breast cancer predisposition genes. Everyone carries multiple rare mutations that are plausibly related to disease. Exome sequencing in common conditions will therefore require intelligent sample and variant prioritisation strategies in large case-control studies to deliver robust genetic evidence of disease association.
Breast cancer research and treatment 2012
PUBMED: 22527104; DOI: 10.1007/s10549-012-2057-x
-
Phenotype-specific effect of chromosome 1q21.1 rearrangements and GJA5 duplications in 2436 congenital heart disease patients and 6760 controls.
Institute of Genetic Medicine, Newcastle University, Newcastle, UK.
Recurrent rearrangements of chromosome 1q21.1 that occur via non-allelic homologous recombination have been associated with variable phenotypes exhibiting incomplete penetrance, including congenital heart disease (CHD). However, the gene or genes within the ~1 Mb critical region responsible for each of the associated phenotypes remain unknown. We examined the 1q21.1 locus in 948 patients with tetralogy of Fallot (TOF), 1488 patients with other forms of CHD and 6760 ethnically matched controls using single nucleotide polymorphism genotyping arrays (Illumina 660W and Affymetrix 6.0) and multiplex ligation-dependent probe amplification. We found that duplication of 1q21.1 was more common in cases of TOF than in controls [odds ratio (OR) 30.9 (95% confidence interval (CI) 8.9-107.6); P = 2.2 × 10(-7)], but deletion was not. In contrast, deletion of 1q21.1 was more common in cases of non-TOF CHD than in controls [OR 5.5 (95% CI 1.4-22.0); P = 0.04] while duplication was not. We also detected rare (n = 3) 100-200 kb duplications within the critical region of 1q21.1 in cases of TOF. These small duplications encompassed a single gene in common, GJA5, and were enriched in cases of TOF in comparison to controls [OR = 10.7 (95% CI 1.8-64.3), P = 0.01]. These findings show that duplication and deletion at chromosome 1q21.1 exhibit a degree of phenotypic specificity in CHD, and implicate GJA5 as the gene responsible for the CHD phenotypes observed with copy number imbalances at this locus.
Funded by: British Heart Foundation
Human molecular genetics 2012;21;7;1513-20
PUBMED: 22199024; PMC: 3298277; DOI: 10.1093/hmg/ddr589
-
Bayesian inference analyses of the polygenic architecture of rheumatoid arthritis.
[1] Division of Rheumatology Immunology and Allergy, Brigham and Women's Hospital, Boston, Massachusetts, USA. [2] Program in Medical and Population Genetics, Broad Institute, Cambridge, Massachusetts, USA. [3] Division of Genetics, Brigham and Women's Hospital, Boston, Massachusetts, USA.
The genetic architectures of common, complex diseases are largely uncharacterized. We modeled the genetic architecture underlying genome-wide association study (GWAS) data for rheumatoid arthritis and developed a new method using polygenic risk-score analyses to infer the total liability-scale variance explained by associated GWAS SNPs. Using this method, we estimated that, together, thousands of SNPs from rheumatoid arthritis GWAS explain an additional 20% of disease risk (excluding known associated loci). We further tested this method on datasets for three additional diseases and obtained comparable estimates for celiac disease (43% excluding the major histocompatibility complex), myocardial infarction and coronary artery disease (48%) and type 2 diabetes (49%). Our results are consistent with simulated genetic models in which hundreds of associated loci harbor common causal variants and a smaller number of loci harbor multiple rare causal variants. These analyses suggest that GWAS will continue to be highly productive for the discovery of additional susceptibility loci for common diseases.
Nature genetics 2012;44;5;483-9
PUBMED: 22446960; DOI: 10.1038/ng.2232
-
Gut inflammation can boost horizontal gene transfer between pathogenic and commensal Enterobacteriaceae.
Institute of Microbiology, ETH Zürich, 8093 Zürich, Switzerland.
The mammalian gut harbors a dense microbial community interacting in multiple ways, including horizontal gene transfer (HGT). Pangenome analyses established particularly high levels of genetic flux between Gram-negative Enterobacteriaceae. However, the mechanisms fostering intraenterobacterial HGT are incompletely understood. Using a mouse colitis model, we found that Salmonella-inflicted enteropathy elicits parallel blooms of the pathogen and of resident commensal Escherichia coli. These blooms boosted conjugative HGT of the colicin-plasmid p2 from Salmonella enterica serovar Typhimurium to E. coli. Transconjugation efficiencies of ~100% in vivo were attributable to high intrinsic p2-transfer rates. Plasmid-encoded fitness benefits contributed little. Under normal conditions, HGT was blocked by the commensal microbiota inhibiting contact-dependent conjugation between Enterobacteriaceae. Our data show that pathogen-driven inflammatory responses in the gut can generate transient enterobacterial blooms in which conjugative transfer occurs at unprecedented rates. These blooms may favor reassortment of plasmid-encoded genes between pathogens and commensals fostering the spread of fitness-, virulence-, and antibiotic-resistance determinants.
Funded by: Wellcome Trust: 076964
Proceedings of the National Academy of Sciences of the United States of America 2012;109;4;1269-74
PUBMED: 22232693; PMC: 3268327; DOI: 10.1073/pnas.1113246109
-
Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses.
Max Planck Institute for Intelligent Systems, Tübingen, Germany. oliver.stegle@tuebingen.mpg.de
We present PEER (probabilistic estimation of expression residuals), a software package implementing statistical models that improve the sensitivity and interpretability of genetic associations in population-scale expression data. This approach builds on factor analysis methods that infer broad variance components in the measurements. PEER takes as input transcript profiles and covariates from a set of individuals, and then outputs hidden factors that explain much of the expression variability. Optionally, these factors can be interpreted as pathway or transcription factor activations by providing prior information about which genes are involved in the pathway or targeted by the factor. The inferred factors are used in genetic association analyses. First, they are treated as additional covariates, and are included in the model to increase detection power for mapping expression traits. Second, they are analyzed as phenotypes themselves to understand the causes of global expression variability. PEER extends previous related surrogate variable models and can be implemented within hours on a desktop computer.
Funded by: Wellcome Trust: WT077192/Z/05/Z
Nature protocols 2012;7;3;500-7
PUBMED: 22343431; DOI: 10.1038/nprot.2011.457
-
Meta-analyses identify 13 loci associated with age at menopause and highlight DNA repair and immune pathways.
Department of Internal Medicine, Erasmus MC, Rotterdam, The Netherlands.
To newly identify loci for age at natural menopause, we carried out a meta-analysis of 22 genome-wide association studies (GWAS) in 38,968 women of European descent, with replication in up to 14,435 women. In addition to four known loci, we identified 13 loci newly associated with age at natural menopause (at P < 5 × 10(-8)). Candidate genes located at these newly associated loci include genes implicated in DNA repair (EXO1, HELQ, UIMC1, FAM175A, FANCI, TLK1, POLG and PRIM1) and immune function (IL11, NLRP11 and PRRC2A (also known as BAT2)). Gene-set enrichment pathway analyses using the full GWAS data set identified exoDNase, NF-κB signaling and mitochondrial dysfunction as biological processes related to timing of menopause.
Nature genetics 2012;44;3;260-8
PUBMED: 22267201; PMC: 3288642; DOI: 10.1038/ng.1051
-
Patterns of cis regulatory variation in diverse human populations.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.
The genetic basis of gene expression variation has long been studied with the aim of understanding the landscape of regulatory variants and, more recently, of assisting in the interpretation and elucidation of disease signals. To date, many studies have focused on specific tissues and population-based samples, but there has been limited assessment of the degree of inter-population variability in regulatory variation. We analyzed genome-wide gene expression in lymphoblastoid cell lines from a total of 726 individuals from 8 global populations from the HapMap3 project and correlated gene expression levels with HapMap3 SNPs located in cis to the genes. We describe the influence of ancestry on gene expression levels within and between these diverse human populations and uncover a non-negligible impact on global patterns of gene expression. We further dissect the specific functional pathways differentiated between populations. We also identify 5,691 expression quantitative trait loci (eQTLs) after controlling for both non-genetic factors and population admixture and observe that half of the cis-eQTLs are replicated in one or more of the populations. We highlight patterns of eQTL-sharing between populations, which are partially determined by population genetic relatedness, and discover significant sharing of eQTL effects between Asians, European-admixed, and African subpopulations. Specifically, we observe that both the effect size and the direction of effect for eQTLs are highly conserved across populations. We observe an increasing proximity of eQTLs toward the transcription start site (TSS) as sharing of eQTLs among populations increases, highlighting that variants close to the TSS have stronger effects and therefore are more likely to be detected across a wider panel of populations.
Together these results offer a unique picture and resource of the degree of differentiation among human populations in functional regulatory variation and provide an estimate for the transferability of complex trait variants across populations.
PLoS genetics 2012;8;4;e1002639
PUBMED: 22532805; PMC: 3330104; DOI: 10.1371/journal.pgen.1002639
-
Lineage-specific virulence determinants of Haemophilus influenzae biogroup aegyptius.
Imperial College London, Medicine, St Mary’s Hospital campus, Norfolk Place, London W2 1PG, UK.
An emergent clone of Haemophilus influenzae biogroup aegyptius (Hae) is responsible for outbreaks of Brazilian purpuric fever (BPF). First recorded in Brazil in 1984, the so-called BPF clone of Hae caused a fulminant disease that started with conjunctivitis but developed into septicemic shock; mortality rates were as high as 70%. To identify virulence determinants, we conducted a pan-genomic analysis. Sequencing of the genomes of the BPF clone strain F3031 and a noninvasive conjunctivitis strain, F3047, and comparison of these sequences with 5 other complete H. influenzae genomes showed that >77% of the F3031 genome is shared among all H. influenzae strains. Delineation of the Hae accessory genome enabled characterization of 163 predicted protein-coding genes; identified differences in established autotransporter adhesins; and revealed a suite of novel adhesins unique to Hae, including novel trimeric autotransporter adhesins and 4 new fimbrial operons. These novel adhesins might play a critical role in host-pathogen interactions.
Funded by: Wellcome Trust
Emerging infectious diseases 2012;18;3;449-57
PUBMED: 22377449; DOI: 10.3201/eid1803.110728
-
A benchmarked protein microarray-based platform for the identification of novel low-affinity extracellular protein interactions.
Cell Surface Signalling Laboratory, Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, UK.
Low-affinity extracellular protein interactions are critical for cellular recognition processes, but existing methods to detect them are limited in scale, making genome-wide interaction screens technically challenging. To address this, we report here the miniaturization of the AVEXIS (avidity-based extracellular interaction screen) assay by using protein microarray technology. To achieve this, we have developed protein tags and sample preparation methods that enable the parallel purification of hundreds of recombinant proteins expressed in mammalian cells. We benchmarked the protein microarray-based assay against a set of known quantified receptor-ligand pairs and show that it is sensitive enough to detect even very weak interactions that are typical of this class of interactions. The increase in scale enables interaction screening against a dilution series of immobilized proteins on the microarray, enabling the observation of saturation binding behaviors to show interaction specificity and also the estimation of interaction affinities directly from the primary screen. These methodological improvements now permit screening for novel extracellular receptor-ligand interactions on a genome-wide scale.
Funded by: Wellcome Trust: 077108
Analytical biochemistry 2012;424;1;45-53
PUBMED: 22342946; PMC: 3325482; DOI: 10.1016/j.ab.2012.01.034
-
Common variants at 12q15 and 12q24 are associated with infant head circumference.
Department of Epidemiology, Erasmus Medical Center, Rotterdam, The Netherlands; Department of Paediatrics, Erasmus Medical Center, Rotterdam, The Netherlands; The Generation R Study Group, Erasmus Medical Center, Rotterdam, The Netherlands.
To identify genetic variants associated with head circumference in infancy, we performed a meta-analysis of seven genome-wide association studies (GWAS) (N = 10,768 individuals of European ancestry enrolled in pregnancy and/or birth cohorts) and followed up three lead signals in six replication studies (combined N = 19,089). rs7980687 on chromosome 12q24 (P = 8.1 × 10(-9)) and rs1042725 on chromosome 12q15 (P = 2.8 × 10(-10)) were robustly associated with head circumference in infancy. Although these loci have previously been associated with adult height, their effects on infant head circumference were largely independent of height (P = 3.8 × 10(-7) for rs7980687 and P = 1.3 × 10(-7) for rs1042725 after adjustment for infant height). A third signal, rs11655470 on chromosome 17q21, showed suggestive evidence of association with head circumference (P = 3.9 × 10(-6)). SNPs correlated to the 17q21 signal have shown genome-wide association with adult intracranial volume, Parkinson's disease and other neurodegenerative diseases, indicating that a common genetic variant in this region might link early brain growth with neurological disease in later life.
Nature genetics 2012;44;5;532-8
PUBMED: 22504419; DOI: 10.1038/ng.2238
-
Common variants at 6q22 and 17q21 are associated with intracranial volume.
Department of Epidemiology, Erasmus Medical Center University Medical Center, Rotterdam, The Netherlands; Department of Radiology, Erasmus Medical Center University Medical Center, Rotterdam, The Netherlands; Netherlands Consortium for Healthy Aging, Leiden, The Netherlands.
During aging, intracranial volume remains unchanged and represents maximally attained brain size, while various interacting biological phenomena lead to brain volume loss. Consequently, intracranial volume and brain volume in late life reflect different genetic influences. Our genome-wide association study (GWAS) in 8,175 community-dwelling elderly persons did not reveal any associations at genome-wide significance (P < 5 × 10(-8)) for brain volume. In contrast, intracranial volume was significantly associated with two loci: rs4273712 (P = 3.4 × 10(-11)), a known height-associated locus on chromosome 6q22, and rs9915547 (P = 1.5 × 10(-12)), localized to the inversion on chromosome 17q21. We replicated the associations of these loci with intracranial volume in a separate sample of 1,752 elderly persons (P = 1.1 × 10(-3) for 6q22 and 1.2 × 10(-3) for 17q21). Furthermore, we also found suggestive associations of the 17q21 locus with head circumference in 10,768 children (mean age of 14.5 months). Our data identify two loci associated with head size, with the inversion at 17q21 also likely to be involved in attaining maximal brain size.
Nature genetics 2012;44;5;539-44
PUBMED: 22504418; DOI: 10.1038/ng.2245
-
A genome-wide association meta-analysis identifies new childhood obesity loci.
Center for Applied Genomics, Abramson Research Center, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA.
Multiple genetic variants have been associated with adult obesity and a few with severe obesity in childhood; however, less progress has been made in establishing genetic influences on common early-onset obesity. We performed a North American, Australian and European collaborative meta-analysis of 14 studies consisting of 5,530 cases (≥95th percentile of body mass index (BMI)) and 8,318 controls (<50th percentile of BMI) of European ancestry. Taking forward the eight newly discovered signals yielding association with P < 5 × 10(-6) in nine independent data sets (2,818 cases and 4,083 controls), we observed two loci that yielded genome-wide significant combined P values near OLFM4 at 13q14 (rs9568856; P = 1.82 × 10(-9); odds ratio (OR) = 1.22) and within HOXB5 at 17q21 (rs9299; P = 3.54 × 10(-9); OR = 1.14). Both loci continued to show association when two extreme childhood obesity cohorts were included (2,214 cases and 2,674 controls). These two loci also yielded directionally consistent associations in a previous meta-analysis of adult BMI.
Nature genetics 2012
PUBMED: 22484627; DOI: 10.1038/ng.2247
-
The first green revolution.
Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. microbes@sanger.ac.uk.
This month's Genome Watch describes how analysis of a basal member of the Plantae can inform us about the evolution of photosynthetic eukaryotes.
Nature reviews. Microbiology 2012;10;5;314
PUBMED: 22504859; DOI: 10.1038/nrmicro2781
-
Detailed metabolic and genetic characterization reveals new associations for 30 known lipid loci.
Institute for Molecular Medicine Finland FIMM, Helsinki University Hospital, FI-00014 University of Helsinki, Helsinki, Finland.
Almost 100 genetic loci are known to affect serum cholesterol and triglyceride levels. For many of these loci, the biological function and causal variants remain unknown. We performed an association analysis of the reported 95 lipid loci against 216 metabolite measures, including 95 measurements on lipids and lipoprotein subclasses, obtained via serum nuclear magnetic resonance metabolomics and four enzymatic lipid traits in 8330 individuals from Finland. The genetic variation in the loci was investigated using a dense set of 440 807 directly genotyped and imputed variants around the previously identified lead single nucleotide polymorphisms (SNPs). For 30 of the 95 loci, we identified new metabolic or genetic associations (P < 5 × 10(-8)). In the majority of the loci, the strongest association was to a more specific metabolite measure than the enzymatic lipids. In four loci, the smallest high-density lipoprotein measures showed effects opposite to the larger ones, and 14 loci had associations beyond the individual lipoprotein measures. In 27 loci, we identified SNPs with a stronger association than the previously reported markers and 12 loci harboured multiple, statistically independent variants. Our data show considerable diversity in association patterns between the loci originally identified through associations with enzymatic lipid measures and reveal association profiles of far greater detail than from routine clinical lipid measures. Additionally, a dense marker set and a homogeneous population allow for detailed characterization of the genetic association signals to a resolution exceeding that achieved so far. Further understanding of the rich variability in genetic effects on metabolites provides insights into the biological processes modifying lipid levels.
Human molecular genetics 2012;21;6;1444-55
PUBMED: 22156771; DOI: 10.1093/hmg/ddr581
-
Gene-gene interactions in breast cancer susceptibility.
Division of Genetics and Epidemiology, The Institute of Cancer Research, Sutton, Surrey SM2 5NG, UK. clare.turnbull@icr.ac.uk
There have been few definitive examples of gene-gene interactions in humans. Through mutational analyses in 7325 individuals, we report four interactions (defined as departures from a multiplicative model) between mutations in the breast cancer susceptibility genes ATM and CHEK2 with BRCA1 and BRCA2 (case-only interaction between ATM and BRCA1/BRCA2 combined, P = 5.9 × 10(-4); ATM and BRCA1, P = 0.01; ATM and BRCA2, P = 0.02; CHEK2 and BRCA1/BRCA2 combined, P = 2.1 × 10(-4); CHEK2 and BRCA1, P = 0.01; CHEK2 and BRCA2, P = 0.01). The interactions are such that the resultant risk of breast cancer is lower than the multiplicative product of the constituent risks, and plausibly reflect the functional relationships of the encoded proteins in DNA repair. These findings have important implications for models of disease predisposition and clinical translation.
Funded by: Cancer Research UK: C1287/A10118, C1287/A8874, C8620/A8372, C8620/A8857; Medical Research Council: G0000934; Wellcome Trust: 068545/Z/02
Human molecular genetics 2012;21;4;958-62
PUBMED: 22072393; DOI: 10.1093/hmg/ddr525
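The case-only interaction test mentioned above can be sketched as follows. Assuming the two genotypes are independent in the source population and the disease is rare, the odds ratio for co-carriage among cases estimates RR_AB / (RR_A × RR_B); a value of 1 fits a multiplicative model, and a value below 1 means the joint risk is lower than the multiplicative product. The function name, the Woolf (log-scale Wald) test, and the counts in the usage line are illustrative assumptions, not the study's analysis or data.

```python
import math


def case_only_interaction(both, a_only, b_only, neither):
    """Case-only interaction odds ratio with a two-sided Woolf (log-scale Wald) test.

    Counts are cases cross-classified by carrier status for two genes.
    All four cells must be > 0 (otherwise a continuity correction is needed).
    """
    odds_ratio = (both * neither) / (a_only * b_only)
    se_log_or = math.sqrt(1 / both + 1 / a_only + 1 / b_only + 1 / neither)
    z = math.log(odds_ratio) / se_log_or
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return odds_ratio, p_two_sided


# Hypothetical counts: 2 double carriers, 40 gene-A-only, 60 gene-B-only, 600 neither.
interaction_or, p_value = case_only_interaction(2, 40, 60, 600)
print(f"interaction OR = {interaction_or:.2f}, P = {p_value:.3f}")  # OR = 0.50
```

An interaction OR below 1, as here, corresponds to the sub-multiplicative pattern the abstract describes; with so few double carriers the hypothetical example is far from significant.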
-
A British approach to sampling.
Funded by: Wellcome Trust
European journal of human genetics : EJHG 2012;20;2;129-30
PUBMED: 21829226; PMC: 3260911; DOI: 10.1038/ejhg.2011.153
-
Sibling Rivalry among Paralogs Promotes Evolution of the Human Brain.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Geneticists have long sought to identify the genetic changes that made us human, but pinpointing the functionally relevant changes has been challenging. Two papers in this issue suggest that partial duplication of SRGAP2, producing an incomplete protein that antagonizes the original, contributed to human brain evolution.
Cell 2012;149;4;737-9
PUBMED: 22579279; DOI: 10.1016/j.cell.2012.04.020
-
Jdp2 downregulates Trp53 transcription to promote leukaemogenesis in the context of Trp53 heterozygosity.
The Wellcome Trust Sanger Institute, Cambridge, UK.
We performed a genetic screen in mice to identify candidate genes that are associated with leukaemogenesis in the context of Trp53 heterozygosity. To do this we generated Trp53 heterozygous mice carrying the T2/Onc transposon and SB11 transposase alleles to allow transposon-mediated insertional mutagenesis to occur. From the resulting leukaemias/lymphomas that developed in these mice, we identified nine loci that are potentially associated with tumour formation in the context of Trp53 heterozygosity, including AB041803 and the Jun dimerization protein 2 (Jdp2). We show that Jdp2 transcriptionally regulates the Trp53 promoter, via an atypical AP-1 site, and that Jdp2 expression negatively regulates Trp53 expression levels. This study is the first to identify a genetic mechanism for tumour formation in the context of Trp53 heterozygosity. Oncogene advance online publication, 27 February 2012; doi:10.1038/onc.2012.56.
Oncogene 2012
PUBMED: 22370638; DOI: 10.1038/onc.2012.56
-
Using CF11 cellulose columns to inexpensively and effectively remove human DNA from Plasmodium falciparum-infected whole blood samples.
Howard Hughes Medical Institute, University of Maryland School of Medicine, Baltimore, MD, USA.
Background: Genome and transcriptome studies of Plasmodium nucleic acids obtained from parasitized whole blood are greatly improved by depletion of human DNA or enrichment of parasite DNA prior to next-generation sequencing and microarray hybridization. The most effective method currently used is a two-step procedure to deplete leukocytes: centrifugation using density gradient media followed by filtration through expensive, commercially available columns. This method is not easily implemented in field studies that collect hundreds of samples and simultaneously process samples for multiple laboratory analyses. Inexpensive syringes, hand-packed with CF11 cellulose powder, were recently shown to improve ex vivo cultivation of Plasmodium vivax obtained from parasitized whole blood. This study was undertaken to determine whether CF11 columns could be adapted to isolate Plasmodium falciparum DNA from parasitized whole blood and achieve current quantity and purity requirements for Illumina sequencing.
Methods: The CF11 procedure was compared with the current two-step standard of leukocyte depletion using parasitized red blood cells cultured in vitro and parasitized blood obtained ex vivo from Cambodian patients with malaria. Procedural variations in centrifugation and column size were tested, along with a range of blood volumes and parasite densities.
Results: CF11 filtration reliably produces 500 nanograms of DNA with less than 50% human DNA contamination, which is comparable to that obtained by the two-step method and falls within the current quality control requirements for Illumina sequencing. In addition, a centrifuge-free version of the CF11 filtration method to isolate P. falciparum DNA at remote and minimally equipped field sites in malaria-endemic areas was validated.
Conclusions: CF11 filtration is a cost-effective, scalable, one-step approach to remove human DNA from P. falciparum-infected whole blood samples.
Funded by: Howard Hughes Medical Institute; Wellcome Trust: 098051
Malaria journal 2012;11;41
PUBMED: 22321373; PMC: 3295709; DOI: 10.1186/1475-2875-11-41
-
PfSET10, a Plasmodium falciparum methyltransferase, maintains the active var gene in a poised state during parasite division.
The Walter and Eliza Hall Institute for Medical Research, Melbourne, Victoria, Australia.
A major virulence factor of the malaria parasite Plasmodium falciparum is erythrocyte membrane protein 1 (PfEMP1), a variant protein expressed on the infected erythrocyte surface. PfEMP1 is responsible for adherence of infected erythrocytes to the endothelium and plays an important role in pathogenesis. Mutually exclusive transcription and switched expression of one of 60 var genes encoding PfEMP1 in each parasite genome provides a mechanism for antigenic variation. We report the identification of a parasite protein, designated PfSET10, which localizes exclusively to the perinuclear active var gene expression site. PfSET10 is a histone 3 lysine 4 methyltransferase required to maintain the active var gene in a poised state during division for reactivation in daughter parasites, and as such is required for P. falciparum antigenic variation. PfSET10 likely maintains the transcriptionally permissive chromatin environment of the active var promoter and thus retains memory for heritable transmission of epigenetic information during parasite division.
Cell host & microbe 2012;11;1;7-18
PUBMED: 22264509; DOI: 10.1016/j.chom.2011.11.011
-
Welcome to the plasmidome.
Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. microbes@sanger.ac.uk.
This month's Genome Watch reviews a recent paper that used a metagenomics approach to characterize the plasmid content of bovine rumen samples.
Nature reviews. Microbiology 2012;10;6;379
PUBMED: 22580363; DOI: 10.1038/nrmicro2804
-
GFAP-Cre-Mediated Transgenic Activation of Bmi1 Results in Pituitary Tumors.
Division of Molecular Genetics, The Netherlands Cancer Institute, Amsterdam, The Netherlands.
Bmi1 is a member of the polycomb repressive complex 1 and plays different roles during embryonic development, depending on the developmental context. Bmi1 overexpression is observed in many types of cancer, including tumors of astroglial and neural origin. Although genetic depletion of Bmi1 has been reported to result in tumor-inhibitory effects, partly through INK4A/Arf-mediated senescence and apoptosis and also through INK4A/Arf-independent effects, it has not been proven that Bmi1 can be causally involved in the formation of these tumors. To see whether this is the case, we developed two conditional Bmi1 transgenic models that were crossed with GFAP-Cre mice to activate transgenic expression in neural and glial lineages. We show here that these mice generate intermediate and anterior lobe pituitary tumors that are positive for ACTH and beta-endorphin. Combined transgenic expression of Bmi1 together with conditional loss of Rb resulted in pituitary tumors but was insufficient to induce medulloblastoma, indicating that the oncogenic function of Bmi1 depends on regulation of p16(INK4A)/Rb rather than on regulation of p19(ARF)/p53. Human pituitary adenomas show Bmi1 overexpression in over 50% of the cases, which indicates that Bmi1 could be causally involved in the formation of these tumors, as in our mouse model.
PloS one 2012;7;5;e35943
PUBMED: 22574128; PMC: 3344841; DOI: 10.1371/journal.pone.0035943
-
Nuclear receptor binding protein 1 regulates intestinal progenitor cell homeostasis and tumour formation.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK.
Genetic screens in simple model organisms have identified many of the key components of the conserved signal transduction pathways that are oncogenic when misregulated. Here, we identify H37N21.1 as a gene that regulates vulval induction in let-60(n1046gf), a strain with a gain-of-function mutation in the Caenorhabditis elegans Ras orthologue, and show that somatic deletion of Nrbp1, the mouse orthologue of this gene, results in an intestinal progenitor cell phenotype that leads to profound changes in the proliferation and differentiation of all intestinal cell lineages. We show that Nrbp1 interacts with key components of the ubiquitination machinery and that loss of Nrbp1 in the intestine results in the accumulation of Sall4, a key mediator of stem cell fate, and of Tsc22d2. We also reveal that somatic loss of Nrbp1 results in tumourigenesis, with haematological and intestinal tumours predominating, and that nuclear receptor binding protein 1 (NRBP1) is downregulated in a range of human tumours, where low expression correlates with a poor prognosis. Thus, NRBP1 is a conserved regulator of cell fate that plays an important role in tumour suppression.
The EMBO journal 2012
PUBMED: 22510880; DOI: 10.1038/emboj.2012.91
-
Diversity in parasitic nematode genomes: the microRNAs of Brugia pahangi and Haemonchus contortus are largely novel.
Institute of Infection, Immunity and Inflammation, College of Medical, Veterinary and Life Sciences; University of Glasgow, Garscube Estate, Bearsden Road, Glasgow, G61 1QH, UK. Alan.Winter@glasgow.ac.uk
Background: MicroRNAs (miRNAs) play key roles in regulating post-transcriptional gene expression and are essential for development in the free-living nematode Caenorhabditis elegans and in higher organisms. Whether microRNAs are involved in regulating developmental programs of parasitic nematodes is currently unknown. Here we describe the miRNA repertoire of two important parasitic nematodes as an essential first step in addressing this question.
Results: The small RNAs from larval and adult stages of two parasitic species, Brugia pahangi and Haemonchus contortus, were identified using deep-sequencing and bioinformatic approaches. Comparative analysis to known miRNA sequences reveals that the majority of these miRNAs are novel. Some novel miRNAs are abundantly expressed and display developmental regulation, suggesting important functional roles. Despite the lack of conservation in the miRNA repertoire, genomic positioning of certain miRNAs within or close to specific coding genes is remarkably conserved across diverse species, indicating selection for these associations. Endogenous small-interfering RNAs and Piwi-interacting (pi)RNAs, which regulate gene and transposon expression, were also identified. piRNAs are expressed in adult stage H. contortus, supporting a conserved role in germline maintenance in some parasitic nematodes.
Conclusions: This in-depth comparative analysis of nematode miRNAs reveals the high level of divergence across species and identifies novel sequences potentially involved in development. Expression of novel miRNAs may reflect adaptations to different environments and lifestyles. Our findings provide a detailed foundation for further study of the evolution and function of miRNAs within nematodes and for identifying potential targets for intervention.
Funded by: Wellcome Trust: 085775/Z/08/Z, 086823/Z/08/Z
BMC genomics 2012;13;4
PUBMED: 22216965; PMC: 3282659; DOI: 10.1186/1471-2164-13-4
-
PomBase: a comprehensive online resource for fission yeast.
Cambridge Systems Biology Centre, Department of Biochemistry, University of Cambridge, Sanger Building, 80 Tennis Court Road, Cambridge CB2 1GA, UK. vw253@cam.ac.uk
PomBase (www.pombase.org) is a new model organism database established to provide access to comprehensive, accurate, and up-to-date molecular data and biological information for the fission yeast Schizosaccharomyces pombe to effectively support both exploratory and hypothesis-driven research. PomBase encompasses annotation of genomic sequence and features, comprehensive manual literature curation and genome-wide data sets, and supports sophisticated user-defined queries. The implementation of PomBase integrates a Chado relational database that houses manually curated data with Ensembl software that supports sequence-based annotation and web access. PomBase will provide user-friendly tools to promote curation by experts within the fission yeast community. This will make a key contribution to shaping its content and ensuring its comprehensiveness and long-term relevance.
Funded by: Wellcome Trust: WT090548MA
Nucleic acids research 2012;40;Database issue;D695-9
PUBMED: 22039153; PMC: 3245111; DOI: 10.1093/nar/gkr853
-
Enhanced peptide identification by electron transfer dissociation using an improved mascot percolator.
Wellcome Trust Sanger Institute, United Kingdom.
Peptide identification using tandem mass spectrometry is a core technology in proteomics. The latest generations of mass spectrometry instruments enable the use of electron transfer dissociation (ETD) as a complement to collision induced dissociation (CID) for peptide fragmentation. However, a critical limitation in the use of ETD has been the lack of optimal database search software. PERCOLATOR is a post-search algorithm that uses semi-supervised machine learning to increase the number of identified peptide-spectrum matches (PSMs) while providing reliable significance measures. We have previously demonstrated sensitivity and specificity benefits on CID data from the MASCOT search engine using MASCOT PERCOLATOR. Here we report recent developments in the MASCOT PERCOLATOR software, including an improved feature calculator and support for a wider range of ion series. The updated software is applied to the analysis of CID and ETD fragmented peptide datasets. The new MASCOT PERCOLATOR increases the number of CID PSMs by up to 80% and ETD PSMs by up to 60% at a 0.01 q-value (1% false discovery rate) threshold over a standard MASCOT search, including PSMs from high charge state precursor ions. The greatly increased number of PSMs and peptide coverage afforded by MASCOT PERCOLATOR has enabled a fuller assessment of CID/ETD complementarity. Using a dataset of CID and ETcaD spectral pairs, we find that at a 1% false discovery rate the overlap in peptide identifications is 83%, which is significantly higher than previously reported. We suggest how acquisition parameters for combined CID and ETD experiments can be optimised to maximise the number of identifiable spectra.
Molecular & cellular proteomics : MCP 2012
PUBMED: 22493177; DOI: 10.1074/mcp.O111.014522
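The 0.01 q-value (1% FDR) threshold above can be illustrated with a minimal sketch. It assumes a target-decoy search, the usual way post-search tools estimate FDR. It covers only the thresholding step, not Percolator's semi-supervised rescoring; it ignores score ties; and the function name and scores are hypothetical.

```python
def target_decoy_qvalues(psms):
    """Return (score, q-value) for each target PSM, best score first.

    psms: (score, is_decoy) pairs; higher scores are better.
    FDR at a score threshold is estimated as decoys / targets at or above it;
    a PSM's q-value is the smallest FDR at which it would still be accepted.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    fdrs = []
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append((score, is_decoy, decoys / max(targets, 1)))
    # Walk from the worst score upwards, keeping the running minimum FDR.
    results = []
    running_min = float("inf")
    for score, is_decoy, fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        if not is_decoy:
            results.append((score, running_min))
    results.reverse()
    return results


psms = [(9.1, False), (8.7, False), (8.2, False), (7.9, True),
        (7.5, False), (7.0, False), (6.1, True), (5.8, False)]
qvals = target_decoy_qvalues(psms)
accepted = sum(1 for _, q in qvals if q <= 0.01)
print(accepted)  # 3 target PSMs pass the 1% FDR threshold
```

Taking the running minimum makes q-values monotone in score, so accepting all PSMs with q ≤ 0.01 is equivalent to choosing the most permissive score cut-off whose estimated FDR is at most 1%.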
-
WormBase 2012: more genomes, more data, new website.
Division of Biology 156-29, California Institute of Technology, Pasadena, CA 91125, USA. kyook@wormbase.org
Since its release in 2000, WormBase (http://www.wormbase.org) has grown from a small resource focusing on a single species and serving a dedicated research community, to one now spanning 15 species essential to the broader biomedical and agricultural research fields. To enhance the rate of curation, we have automated the identification of key data in the scientific literature and use similar methodology for data extraction. To ease access to the data, we are collaborating with journals to link entities in research publications to their report pages at WormBase. To facilitate discovery, we have added new views of the data, integrated large-scale datasets and expanded descriptions of models for human disease. Finally, we have introduced a dramatic overhaul of the WormBase website for public beta testing. Designed to balance complexity and usability, the new site is species-agnostic, highly customizable, and interactive. Casual users and developers alike will be able to leverage the public RESTful application programming interface (API) to generate custom data mining solutions and extensions to the site. We report on the growth of our database and on our work in keeping pace with the growing demand for data, efforts to anticipate the requirements of users and new collaborations with the larger science community.
Funded by: Howard Hughes Medical Institute; Medical Research Council: G070119; NHGRI NIH HHS: P41 HG02223, P41-HG02223
Nucleic acids research 2012;40;Database issue;D735-41
PUBMED: 22067452; PMC: 3245152; DOI: 10.1093/nar/gkr954
-
Genetic determinants of lipid homeostasis.
Genetic Epidemiology Group, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK; Department of Public Health & Primary Care, University of Cambridge, Cambridge, UK.
Circulating levels of blood lipids are heritable risk factors for atherosclerosis and heart disease, and are the target of therapeutic intervention. Studies of monogenic disorders and, more recently, genome-wide association studies have identified several important genetic determinants of blood lipid levels. These have the potential to provide new drug targets to alter blood lipid levels and may improve prediction of cardiovascular disease. Better functional validation of lipid loci is required to clarify the biological role of proteins encoded by specific genomic regions and understand how they influence lipid metabolism and confer disease risk.
Best practice & research. Clinical endocrinology & metabolism 2012;26;2;203-9
PUBMED: 22498249; DOI: 10.1016/j.beem.2011.11.004
-
Exome sequencing of gastric adenocarcinoma identifies recurrent somatic mutations in cell adhesion and chromatin remodeling genes.
Cellular and Molecular Research, National Cancer Centre, Singapore; Cancer and Stem Cell Biology Program, Duke-National University of Singapore (NUS) Graduate Medical School, Singapore.
Gastric cancer is a major cause of global cancer mortality. We surveyed the spectrum of somatic alterations in gastric cancer by sequencing the exomes of 15 gastric adenocarcinomas and their matched normal DNAs. Frequently mutated genes in the adenocarcinomas included TP53 (11/15 tumors), PIK3CA (3/15) and ARID1A (3/15). Cell adhesion was the most enriched biological pathway among the frequently mutated genes. A prevalence screening confirmed mutations in FAT4, a cadherin family gene, in 5% of gastric cancers (6/110) and FAT4 genomic deletions in 4% (3/83) of gastric tumors. Frequent mutations in chromatin remodeling genes (ARID1A, MLL3 and MLL) also occurred in 47% of the gastric cancers. We detected ARID1A mutations in 8% of tumors (9/110), which were associated with concurrent PIK3CA mutations and microsatellite instability. In functional assays, we observed both FAT4 and ARID1A to exert tumor-suppressor activity. Somatic inactivation of FAT4 and ARID1A may thus be key tumorigenic events in a subset of gastric cancers.
Nature genetics 2012
PUBMED: 22484628; DOI: 10.1038/ng.2246
-
Multi-isotope imaging mass spectrometry reveals slow protein turnover in hair-cell stereocilia.
Department of Neurobiology, Harvard Medical School and Howard Hughes Medical Institute, Boston, Massachusetts 02115, USA.
Hair cells of the inner ear are not normally replaced during an animal's life, and must continually renew components of their various organelles. Among these are the stereocilia, each with a core of several hundred actin filaments that arise from their apical surfaces and that bear the mechanotransduction apparatus at their tips. Actin turnover in stereocilia has previously been studied by transfecting neonatal rat hair cells in culture with a β-actin-GFP fusion, and evidence was found that actin is replaced, from the top down, in 2-3 days. Overexpression of the actin-binding protein espin causes elongation of stereocilia within 12-24 hours, also suggesting rapid regulation of stereocilia lengths. Similarly, the mechanosensory 'tip links' are replaced in 5-10 hours after cleavage in chicken and mammalian hair cells. In contrast, turnover in chick stereocilia in vivo is much slower. It might be that only certain components of stereocilia turn over quickly, that rapid turnover occurs only in neonatal animals, only in culture, or only in response to a challenge like breakage or actin overexpression. Here we quantify protein turnover by feeding animals with a (15)N-labelled precursor amino acid and using multi-isotope imaging mass spectrometry to measure appearance of new protein. Surprisingly, in adult frogs and mice and in neonatal mice, in vivo and in vitro, the stereocilia were remarkably stable, incorporating newly synthesized protein at <10% per day. Only stereocilia tips had rapid turnover and no treadmilling was observed. Other methods confirmed this: in hair cells expressing β-actin-GFP we bleached fiducial lines across hair bundles, but they did not move in 6 days. When we stopped expression of β- or γ-actin with tamoxifen-inducible recombination, neither actin isoform left the stereocilia, except at the tips. Thus, rapid turnover in stereocilia occurs only at the tips and not by a treadmilling process.
Funded by: Howard Hughes Medical Institute; NCRR NIH HHS: 2P41RR0112553-12, P41RR14579; NEI NIH HHS: R01EY12963; NIAMS NIH HHS: R01AR049899; NIBIB NIH HHS: P41EB001974; NIDCD NIH HHS: F32DC009539, R01DC00033, R01DC02281, R01DC03463, R01DC04179; NIDDK NIH HHS: R37DK39773; NIGMS NIH HHS: R01GM47214; PHS HHS: R01DK58762; Wellcome Trust: WT079643
Nature 2012;481;7382;520-4
PUBMED: 22246323; PMC: 3267870; DOI: 10.1038/nature10745
-
PASSion: a pattern growth algorithm-based pipeline for splice junction detection in paired-end RNA-Seq data.
Department of Molecular Epidemiology, Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, The Netherlands. y.zhang@lumc.nl
Motivation: RNA-Seq is a powerful technology for the study of transcriptome profiles that uses deep-sequencing technologies. Moreover, it may be used for cellular phenotyping and help establish the etiology of diseases characterized by abnormal splicing patterns. In RNA-Seq, the exact nature of splicing events is buried in the reads that span exon-exon boundaries. The accurate and efficient mapping of these reads to the reference genome is a major challenge.
Results: We developed PASSion, a pattern growth algorithm-based pipeline for splice site detection in paired-end RNA-Seq reads. Comparing the performance of PASSion to three existing RNA-Seq analysis pipelines, TopHat, MapSplice and HMMSplicer, revealed that PASSion is competitive with these packages. Moreover, the performance of PASSion is not affected by read length and coverage. It performs better than the other three approaches when detecting junctions in highly abundant transcripts. PASSion has the ability to detect junctions that do not have known splicing motifs, which cannot be found by the other tools. For the two public RNA-Seq datasets, PASSion predicted ≈ 137,000 and 173,000 splicing events, of which on average 82% are known junctions annotated in the Ensembl transcript database and 18% are novel. In addition, our package can discover differential and shared splicing patterns among multiple samples.
Availability: The code and utilities can be freely downloaded from https://trac.nbic.nl/passion and ftp://ftp.sanger.ac.uk/pub/zn1/passion.
Bioinformatics (Oxford, England) 2012;28;4;479-86
PUBMED: 22219203; PMC: 3278765; DOI: 10.1093/bioinformatics/btr712
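The abstract does not spell out how a read spanning an exon-exon boundary is resolved. A greatly simplified, hypothetical split-read search is sketched below. The function name, its parameters, and the greedy prefix-extension strategy are assumptions for illustration, not PASSion's implementation. The read is grown base by base from a known anchor (standing in for the position implied by the mapped mate). The remaining suffix is then searched for within a bounded window downstream, and the gap is reported as a candidate intron. The flanking dinucleotides are reported but not required to be GT-AG.

```python
from typing import Optional, Tuple


def find_split_junction(
    reference: str,
    read: str,
    anchor: int,
    max_intron: int = 1000,
    min_anchor: int = 5,
) -> Optional[Tuple[int, int, str]]:
    """Hypothetical split-read search (illustrative; not PASSion's code).

    Returns (donor, acceptor, motif) in 0-based reference coordinates,
    where donor is the first intron base and acceptor is the first base
    of the downstream exon, or None if no junction is found.
    Exact matching only: sequencing errors are not tolerated.
    """
    # Grow the prefix: extend one base at a time while the read still
    # matches the reference contiguously from the anchor position.
    k = 0
    while (
        k < len(read)
        and anchor + k < len(reference)
        and reference[anchor + k] == read[k]
    ):
        k += 1
    if k == len(read):
        return None  # read aligns contiguously; no junction to report

    # Try split points from the longest matching prefix downwards,
    # keeping at least min_anchor bases on the prefix side.
    for split in range(k, min_anchor - 1, -1):
        suffix = read[split:]
        if len(suffix) < min_anchor:
            continue  # suffix too short to place reliably
        donor = anchor + split
        start = donor + 1  # require an intron of at least one base
        # Search for the suffix at most max_intron bases past the donor.
        window = reference[start : donor + max_intron + len(suffix)]
        pos = window.find(suffix)
        if pos != -1:
            acceptor = start + pos
            # Report flanking dinucleotides without requiring GT-AG.
            motif = (
                reference[donor : donor + 2]
                + "-"
                + reference[acceptor - 2 : acceptor]
            )
            return donor, acceptor, motif
    return None


# Toy example: two exons separated by a 14-base intron.
exon1, intron, exon2 = "AAAAACCCCC", "GTACGATCGATCAG", "TTTTTGGGGG"
reference = exon1 + intron + exon2
print(find_split_junction(reference, exon1 + exon2, anchor=0))  # (10, 24, 'GT-AG')
```

A real pipeline must also handle mismatches, multiple candidate placements of the suffix, and ambiguous junctions that can slide by a few bases when exon and intron ends share sequence. This sketch simply takes the longest exact prefix and the first downstream hit.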
-
Abstracts of the 52nd Annual Scientific Meeting of the British Society for Haematology. April 16-18, 2012. Glasgow, United Kingdom.
British journal of haematology 2012;157 Suppl 1;1-94
PUBMED: 22532974
-
Abstracts of the International Society for Cellular Oncology 2012 Conference, Joint Meeting with the European Workshop on Cytogenetics, and Molecular Genetics of Solid Tumors. March 4-8, 2012. Palma, Mallorca.
Cellular oncology (Dordrecht) 2012;35 Suppl 1;S5-60
PUBMED: 22361930; DOI: 10.1007/s13402-012-0074-8

