Sanger Institute - Publications 2010

Number of papers published in 2010: 437

  • A map of human genome variation from population-scale sequencing.

    1000 Genomes Project Consortium, Abecasis GR, Altshuler D, Auton A, Brooks LD, Durbin RM, Gibbs RA, Hurles ME and McVean GA

    The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10⁻⁸ per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.

    Funded by: British Heart Foundation: RG/09/012/28096; Howard Hughes Medical Institute; Medical Research Council: G0801823, G0801823(89305); NCRR NIH HHS: S10RR025056; NHGRI NIH HHS: 01HG3229, N01HG62088, P01HG4120, P41HG2371, P41HG4221, P41HG4222, P50HG2357, R01 HG003229, R01 HG003229-05, R01 HG004719-01, R01 HG004719-02, R01 HG004719-02S1, R01 HG004719-03, R01 HG004719-04, R01HG2651, R01HG3698, R01HG4333, R01HG4719, R01HG4960, RC2 HG005552-01, RC2 HG005552-02, RC2HG5552, U01HG5208, U01HG5209, U01HG5210, U01HG5211, U01HG5214, U41HG4568, U54 HG003273, U54HG2750, U54HG2757, U54HG3067, U54HG3079, U54HG3273; NIGMS NIH HHS: R01GM59290, R01GM72861, T32 GM007753; NIMH NIH HHS: 01MH84698; Wellcome Trust: 075491, 077009, 077014, 077192, 081407, 085532, 086084, 089061, 089062, 089088, WT075491/Z/04, WT077009, WT081407/Z/06/Z, WT085532AIA, WT086084/Z/08/Z, WT089088/Z/09/Z

    Nature 2010;467;7319;1061-73
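
    As a rough illustration of what the mutation rate quoted in the abstract above implies, the sketch below multiplies it by an approximate genome size. The haploid genome length of 3.1 × 10⁹ bp, the diploid factor and the function name are assumptions introduced here for illustration; they are not figures from the paper.

```python
def expected_de_novo_mutations(rate_per_bp: float,
                               haploid_genome_bp: float,
                               ploidy: int = 2) -> float:
    """Expected number of new base substitutions per generation.

    rate_per_bp       -- mutation rate per base pair per generation
    haploid_genome_bp -- haploid genome length in base pairs (assumed value)
    ploidy            -- number of genome copies inherited (2 for a diploid child)
    """
    return rate_per_bp * haploid_genome_bp * ploidy


# 1e-8 is the approximate rate stated in the abstract.
# 3.1e9 bp is an assumed round figure for the human haploid genome.
expected = expected_de_novo_mutations(1e-8, 3.1e9)
print(f"Expected de novo substitutions per child: about {expected:.0f}")
```

    This is an order-of-magnitude figure only, since the abstract gives the rate as approximate and not every base of the genome was accessible to sequencing.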

  • Statistical comparison of methods to estimate the error probability in short-read Illumina sequencing.

    Abnizova I, Skelly T, Naumenko F, Whiteford N, Brown C and Cox T

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    As was the case in the beginning of the sequencing era, the new generation of short-read sequencing technologies still requires both accuracy of data processing methods and reliable measures of that accuracy. Inspired by the classic of the genre, the Phred method, we generalized those findings in the area of base quality value calibration. We introduce a simple, straightforward, statistically established way to measure the performance of a calibrator, and to find an optimal way to assess its reliability. We illustrate the method by assessing the performance of several calibrators/predictors for Illumina Genome Analyser 2 (GA2) data. The choice of the best predictor is based on optimization of validity, discriminative ability and discrimination power for several candidate predictors. We applied the method to data from one experimental run for the genome of the phage φX, and found the best predictor out of ten candidates to be 'Purity', a statistic derived from corrected cluster intensities. The source code for the comparison of the predictors is available from the authors on request.

    Journal of bioinformatics and computational biology 2010;8;3;579-91

  • Genetic evidence of multiple loci in dystocia--difficult labour.

    Algovik M, Kivinen K, Peterson H, Westgren M and Kere J

    Department of Clinical Science, Intervention and Technology, Karolinska Institutet, Stockholm, Sweden.

    Background: Dystocia, difficult labour, is a common but also complex problem during childbirth. It can be attributed to either weak contractions of the uterus, a large infant, reduced capacity of the pelvis or combinations of these. Previous studies have indicated that there is a genetic component in the susceptibility to dystocia. The purpose of this study was to identify susceptibility genes in dystocia.

    Methods: A total of 104 women in 47 families were included where at least two sisters had undergone caesarean section at a gestational length of 286 days or more at their first delivery. Study of medical records and a telephone interview was performed to identify subjects with dystocia. Whole-genome scanning using Affymetrix genotyping-arrays and non-parametric linkage (NPL) analysis was made in 39 women exhibiting the phenotype of dystocia from 19 families. In 68 women re-sequencing was performed of candidate genes showing suggestive linkage: oxytocin (OXT) on chromosome 20 and oxytocin-receptor (OXTR) on chromosome 3.

    Results: We found a trend towards linkage with suggestive NPL-score (3.15) on chromosome 12p12. Suggestive linkage peaks were observed on chromosomes 3, 4, 6, 10, 20. Re-sequencing of OXT and OXTR did not reveal any causal variants.

    Conclusions: Dystocia is likely to have a genetic component with variations in multiple genes affecting the patient outcome. We found 6 loci that could be re-evaluated in larger patient cohorts.

    BMC medical genetics 2010;11;105

  • An insight into the sialome of Glossina morsitans morsitans.

    Alves-Silva J, Ribeiro JM, Van Den Abbeele J, Attardo G, Hao Z, Haines LR, Soares MB, Berriman M, Aksoy S and Lehane MJ

    Vector Group, Liverpool School of Tropical Medicine, Liverpool, L3 5QA, UK.

    Background: Blood feeding evolved independently in worms, arthropods and mammals. Among the adaptations to this peculiar diet, these animals developed an armament of salivary molecules that disarm their host's anti-bleeding defenses (hemostasis), inflammatory and immune reactions. Recent sialotranscriptome analyses (from the Greek sialo = saliva) of blood feeding insects and ticks have revealed that the saliva contains hundreds of polypeptides, many unique to their genus or family. Adult tsetse flies feed exclusively on vertebrate blood and are important vectors of human and animal diseases. Thus far, only limited information exists regarding the Glossina sialome, or any other fly belonging to the Hippoboscidae.

    Results: As part of the effort to sequence the genome of Glossina morsitans morsitans, several organ specific, high quality normalized cDNA libraries have been constructed, from which over 20,000 ESTs from an adult salivary gland library were sequenced. These ESTs have been assembled using previously described ESTs from the fat body and midgut libraries of the same fly, thus totaling 62,251 ESTs, which have been assembled into 16,743 clusters (8,506 of which had one or more EST from the salivary gland library). Coding sequences were obtained for 2,509 novel proteins, 1,792 of which had at least one EST expressed in the salivary glands. Despite library normalization, 59 transcripts were overrepresented in the salivary library indicating high levels of expression. This work presents a detailed analysis of the salivary protein families identified. Protein expression was confirmed by 2D gel electrophoresis, enzymatic digestion and mass spectrometry. Concurrently, an initial attempt to determine the immunogenic properties of selected salivary proteins was undertaken.

    Conclusions: The sialome of G. m. morsitans contains over 250 proteins that are possibly associated with blood feeding. This set includes alleles of previously described gene products, reveals new evidence that several salivary proteins are multigenic and identifies at least seven new polypeptide families unique to Glossina. Most of these proteins have no known function and thus, provide a discovery platform for the identification of novel pharmacologically active compounds, innovative vector-based vaccine targets, and immunological markers of vector exposure.

    Funded by: NIAID NIH HHS: AI51584, R21 AI076879-01A1, R21 AI076879-02; NIGMS NIH HHS: F32 GM077964

    BMC genomics 2010;11;213

  • Data quality control in genetic case-control association studies.

    Anderson CA, Pettersson FH, Clarke GM, Cardon LR, Morris AP and Zondervan KT

    Genetic and Genomic Epidemiology Unit, Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.

    This protocol details the steps for data quality assessment and control that are typically carried out during case-control association studies. The steps described involve the identification and removal of DNA samples and markers that introduce bias. These critical steps are paramount to the success of a case-control study and are necessary before statistically testing for association. We describe how to use PLINK, a tool for handling SNP data, to perform assessments of failure rate per individual and per SNP and to assess the degree of relatedness between individuals. We also detail other quality-control procedures, including the use of SMARTPCA software for the identification of ancestral outliers. These platforms were selected because they are user-friendly, widely used and computationally efficient. Steps needed to detect and establish a disease association using case-control data are not discussed here. Issues concerning study design and marker selection in case-control studies have been discussed in our earlier protocols. This protocol, which is routinely used in our labs, should take approximately 8 h to complete.

    Funded by: Wellcome Trust: 081682, 085235, WT91745/Z/10/Z

    Nature protocols 2010;5;9;1564-73
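
    The per-individual and per-SNP failure rates mentioned in the abstract above can be illustrated with a minimal sketch. It does not reproduce PLINK or the protocol itself: the genotype matrix, the 0.3 cut-off and all function names are hypothetical and chosen only to show the idea of flagging samples and markers with too many failed calls.

```python
from typing import List, Optional

# One genotype call: 0, 1 or 2 copies of the minor allele, or None if the call failed.
Genotype = Optional[int]


def missing_rate_per_individual(matrix: List[List[Genotype]]) -> List[float]:
    """Fraction of failed calls in each row (one row per individual)."""
    return [sum(g is None for g in row) / len(row) for row in matrix]


def missing_rate_per_snp(matrix: List[List[Genotype]]) -> List[float]:
    """Fraction of failed calls in each column (one column per SNP)."""
    n_individuals = len(matrix)
    n_snps = len(matrix[0])
    return [
        sum(row[j] is None for row in matrix) / n_individuals
        for j in range(n_snps)
    ]


def failing_indices(rates: List[float], max_rate: float) -> List[int]:
    """Indices whose failure rate exceeds the chosen cut-off."""
    return [i for i, rate in enumerate(rates) if rate > max_rate]


# Toy data: 4 individuals (rows) by 4 SNPs (columns); values are invented.
genotypes: List[List[Genotype]] = [
    [0, 1, 2, 0],
    [None, None, 1, None],
    [1, 0, None, 2],
    [2, 1, None, 0],
]

MAX_RATE = 0.3  # hypothetical cut-off, not a value from the protocol

bad_individuals = failing_indices(missing_rate_per_individual(genotypes), MAX_RATE)
bad_snps = failing_indices(missing_rate_per_snp(genotypes), MAX_RATE)
print("Individuals to review:", bad_individuals)
print("SNPs to review:", bad_snps)
```

    In practice the cut-offs depend on the study and genotyping platform, so the values here should be read as placeholders.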

  • Genome-wide association study of migraine implicates a common susceptibility variant on 8q22.1.

    Anttila V, Stefansson H, Kallela M, Todt U, Terwindt GM, Calafato MS, Nyholt DR, Dimas AS, Freilinger T, Müller-Myhsok B, Artto V, Inouye M, Alakurtti K, Kaunisto MA, Hämäläinen E, de Vries B, Stam AH, Weller CM, Heinze A, Heinze-Kuhn K, Goebel I, Borck G, Göbel H, Steinberg S, Wolf C, Björnsson A, Gudmundsson G, Kirchmann M, Hauge A, Werge T, Schoenen J, Eriksson JG, Hagen K, Stovner L, Wichmann HE, Meitinger T, Alexander M, Moebus S, Schreiber S, Aulchenko YS, Breteler MM, Uitterlinden AG, Hofman A, van Duijn CM, Tikka-Kleemola P, Vepsäläinen S, Lucae S, Tozzi F, Muglia P, Barrett J, Kaprio J, Färkkilä M, Peltonen L, Stefansson K, Zwart JA, Ferrari MD, Olesen J, Daly M, Wessman M, van den Maagdenberg AM, Dichgans M, Kubisch C, Dermitzakis ET, Frants RR, Palotie A and International Headache Genetics Consortium

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK.

    Migraine is a common episodic neurological disorder, typically presenting with recurrent attacks of severe headache and autonomic dysfunction. Apart from rare monogenic subtypes, no genetic or molecular markers for migraine have been convincingly established. We identified the minor allele of rs1835740 on chromosome 8q22.1 to be associated with migraine (P = 5.38 × 10⁻⁹, odds ratio = 1.23, 95% CI 1.150-1.324) in a genome-wide association study of 2,731 migraine cases ascertained from three European headache clinics and 10,747 population-matched controls. The association was replicated in 3,202 cases and 40,062 controls for an overall meta-analysis P value of 1.69 × 10⁻¹¹ (odds ratio = 1.18, 95% CI 1.127-1.244). rs1835740 is located between MTDH (astrocyte elevated gene 1, also known as AEG-1) and PGCP (encoding plasma glutamate carboxypeptidase). In an expression quantitative trait study in lymphoblastoid cell lines, transcript levels of MTDH were found to have a significant correlation to rs1835740 (P = 3.96 × 10⁻⁵, permuted threshold for genome-wide significance 7.7 × 10⁻⁵). To our knowledge, our data establish rs1835740 as the first genetic risk factor for migraine.

    Funded by: Wellcome Trust: 089062, WT089062

    Nature genetics 2010;42;10;869-73

  • The SH3 domain of postsynaptic density 95 mediates inflammatory pain through phosphatidylinositol-3-kinase recruitment.

    Arbuckle MI, Komiyama NH, Delaney A, Coba M, Garry EM, Rosie R, Allchorne AJ, Forsyth LH, Bence M, Carlisle HJ, O'Dell TJ, Mitchell R, Fleetwood-Walker SM and Grant SG

    Centre for Neuroregeneration, The University of Edinburgh, Institute of Immunology and Infection, Ashworth Buildings, Kings Buildings, Edinburgh EH9 3JT, UK.

    Sensitization to inflammatory pain is a pathological form of neuronal plasticity that is poorly understood and treated. Here we examine the role of the SH3 domain of postsynaptic density 95 (PSD95) by using mice that carry a single amino-acid substitution in the polyproline-binding site. Testing multiple forms of plasticity we found sensitization to inflammation was specifically attenuated. The inflammatory response required recruitment of phosphatidylinositol-3-kinase-C2alpha to the SH3-binding site of PSD95. In wild-type mice, wortmannin or peptide competition attenuated the sensitization. These results show that different types of behavioural plasticity are mediated by specific domains of PSD95 and suggest novel therapeutic avenues for reducing inflammatory pain.

    Funded by: Medical Research Council; Wellcome Trust

    EMBO reports 2010;11;6;473-8

  • Pig genome sequence--analysis and publication strategy.

    Archibald AL, Bolund L, Churcher C, Fredholm M, Groenen MA, Harlizius B, Lee KT, Milan D, Rogers J, Rothschild MF, Uenishi H, Wang J, Schook LB and Swine Genome Sequencing Consortium

    Background: The pig genome is being sequenced and characterised under the auspices of the Swine Genome Sequencing Consortium. The sequencing strategy followed a hybrid approach combining hierarchical shotgun sequencing of BAC clones and whole genome shotgun sequencing.

    Results: Assemblies of the BAC clone derived genome sequence have been annotated using the Pre-Ensembl and Ensembl automated pipelines and made accessible through the Pre-Ensembl/Ensembl browsers. The current annotated genome assembly (Sscrofa9) was released with Ensembl 56 in September 2009. A revised assembly (Sscrofa10) is under construction and will incorporate whole genome shotgun sequence (WGS) data providing > 30x genome coverage. The WGS sequences, most of which comprise short Illumina/Solexa reads, were generated from DNA from the same single Duroc sow as the source of the BAC library from which clones were preferentially selected for sequencing. In accordance with the Bermuda and Fort Lauderdale agreements and the more recent Toronto Statement, the data have been released into public sequence repositories (Genbank/EMBL, NCBI/Ensembl trace repositories) in a timely manner and in advance of publication.

    Conclusions: In this marker paper, the Swine Genome Sequencing Consortium (SGSC) sets out its plans for analysis of the pig genome sequence, for the application and publication of the results.

    Funded by: Biotechnology and Biological Sciences Research Council

    BMC genomics 2010;11;438

  • Rare variant association analysis methods for complex traits.

    Asimit J and Zeggini E

    Wellcome Trust Sanger Institute, Hinxton CB10 1SA, United Kingdom.

    There has been increasing interest in rare variants and their association with disease, and several rare variant-disease associations have already been detected. The usual association tests for common variants are underpowered for detecting variants of lower frequency, so alternative approaches are required. In addition to reviewing the association analysis methods for rare variants, we discuss the limitations of genome-wide association studies in identifying rare variants and the problems that arise in the imputation of rare variants.

    Funded by: Wellcome Trust: WT088885/Z/09/Z

    Annual review of genetics 2010;44;293-308

  • TriTrypDB: a functional genomic resource for the Trypanosomatidae.

    Aslett M, Aurrecoechea C, Berriman M, Brestelli J, Brunk BP, Carrington M, Depledge DP, Fischer S, Gajria B, Gao X, Gardner MJ, Gingle A, Grant G, Harb OS, Heiges M, Hertz-Fowler C, Houston R, Innamorato F, Iodice J, Kissinger JC, Kraemer E, Li W, Logan FJ, Miller JA, Mitra S, Myler PJ, Nayak V, Pennington C, Phan I, Pinney DF, Ramasamy G, Rogers MB, Roos DS, Ross C, Sivam D, Smith DF, Srinivasamoorthy G, Stoeckert CJ, Subramanian S, Thibodeau R, Tivey A, Treatman C, Velarde G and Wang H

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.

    TriTrypDB is an integrated database providing access to genome-scale datasets for kinetoplastid parasites, and supporting a variety of complex queries driven by research and development needs. TriTrypDB is a collaborative project, utilizing the GUS/WDK computational infrastructure developed by the Eukaryotic Pathogen Bioinformatics Resource Center to integrate genome annotation and analyses from GeneDB and elsewhere with a wide variety of functional genomics datasets made available by members of the global research community, often pre-publication. Currently, TriTrypDB integrates datasets from Leishmania braziliensis, L. infantum, L. major, L. tarentolae, Trypanosoma brucei and T. cruzi. Users may examine individual genes or chromosomal spans in their genomic context, including syntenic alignments with other kinetoplastid organisms. Data within TriTrypDB can be interrogated utilizing a sophisticated search strategy system that enables a user to construct complex queries combining multiple data types. All search strategies are stored, allowing future access and integrated searches. 'User Comments' may be added to any gene page, enhancing available annotation; such comments become immediately searchable via the text search, and are forwarded to curators for incorporation into the reference annotation when appropriate.

    Funded by: PHS HHS: HHSN266200400037C; Wellcome Trust: 085822, WT085775, WT085775/Z/08/Z, WT085822MA

    Nucleic acids research 2010;38;Database issue;D457-62

  • Lack of association between the Trp719Arg polymorphism in kinesin-like protein-6 and coronary artery disease in 19 case-control studies.

    Assimes TL, Hólm H, Kathiresan S, Reilly MP, Thorleifsson G, Voight BF, Erdmann J, Willenborg C, Vaidya D, Xie C, Patterson CC, Morgan TM, Burnett MS, Li M, Hlatky MA, Knowles JW, Thompson JR, Absher D, Iribarren C, Go A, Fortmann SP, Sidney S, Risch N, Tang H, Myers RM, Berger K, Stoll M, Shah SH, Thorgeirsson G, Andersen K, Havulinna AS, Herrera JE, Faraday N, Kim Y, Kral BG, Mathias RA, Ruczinski I, Suktitipat B, Wilson AF, Yanek LR, Becker LC, Linsel-Nitschke P, Lieb W, König IR, Hengstenberg C, Fischer M, Stark K, Reinhard W, Winogradow J, Grassl M, Grosshennig A, Preuss M, Schreiber S, Wichmann HE, Meisinger C, Yee J, Friedlander Y, Do R, Meigs JB, Williams G, Nathan DM, MacRae CA, Qu L, Wilensky RL, Matthai WH, Qasim AN, Hakonarson H, Pichard AD, Kent KM, Satler L, Lindsay JM, Waksman R, Knouff CW, Waterworth DM, Walker MC, Mooser VE, Marrugat J, Lucas G, Subirana I, Sala J, Ramos R, Martinelli N, Olivieri O, Trabetti E, Malerba G, Pignatti PF, Guiducci C, Mirel D, Parkin M, Hirschhorn JN, Asselta R, Duga S, Musunuru K, Daly MJ, Purcell S, Eifert S, Braund PS, Wright BJ, Balmforth AJ, Ball SG, Myocardial Infarction Genetics Consortium, Wellcome Trust Case Control Consortium, Cardiogenics, Ouwehand WH, Deloukas P, Scholz M, Cambien F, Huge A, Scheffold T, Salomaa V, Girelli D, Granger CB, Peltonen L, McKeown PP, Altshuler D, Melander O, Devaney JM, Epstein SE, Rader DJ, Elosua R, Engert JC, Anand SS, Hall AS, Ziegler A, O'Donnell CJ, Spertus JA, Siscovick D, Schwartz SM, Becker D, Thorsteinsdottir U, Stefansson K, Schunkert H, Samani NJ and Quertermous T

    Department of Medicine, Stanford University School of Medicine, Stanford, California 94304-1334, USA.

    Objectives: We sought to replicate the association between the kinesin-like protein 6 (KIF6) Trp719Arg polymorphism (rs20455), and clinical coronary artery disease (CAD).

    Background: Recent prospective studies suggest that carriers of the 719Arg allele in KIF6 are at increased risk of clinical CAD compared with noncarriers.

    Methods: The KIF6 Trp719Arg polymorphism (rs20455) was genotyped in 19 case-control studies of nonfatal CAD either as part of a genome-wide association study or in a formal attempt to replicate the initial positive reports.

    Results: A total of 17,000 cases and 39,369 controls of European descent as well as a modest number of South Asians, African Americans, Hispanics, East Asians, and admixed cases and controls were successfully genotyped. None of the 19 studies demonstrated an increased risk of CAD in carriers of the 719Arg allele compared with noncarriers. Regression analyses and fixed-effects meta-analyses ruled out with a high degree of confidence an increase of ≥2% in the risk of CAD among European 719Arg carriers. We also observed no increase in the risk of CAD among 719Arg carriers in the subset of Europeans with early-onset disease (younger than 50 years of age for men and younger than 60 years of age for women) compared with similarly aged controls as well as all non-European subgroups.

    Conclusions: The KIF6 Trp719Arg polymorphism was not associated with the risk of clinical CAD in this large replication study.

    Funded by: NHLBI NIH HHS: R01 HL056931-04, R01 HL087676-03

    Journal of the American College of Cardiology 2010;56;19;1552-63

  • Gene variants associated with schizophrenia in a Norwegian genome-wide study are replicated in a large European cohort.

    Athanasiu L, Mattingsdal M, Kähler AK, Brown A, Gustafsson O, Agartz I, Giegling I, Muglia P, Cichon S, Rietschel M, Pietiläinen OP, Peltonen L, Bramon E, Collier D, Clair DS, Sigurdsson E, Petursson H, Rujescu D, Melle I, Steen VM, Djurovic S and Andreassen OA

    Institute of Psychiatry, University of Oslo, P.O. 1130, Blindern, N-0318 Oslo, Norway

    We have performed a genome-wide association study (GWAS) of schizophrenia in a Norwegian discovery sample of 201 cases and 305 controls (TOP study) with a focused replication analysis in a larger European sample of 2663 cases and 13,780 control subjects (SGENE-plus study). Firstly, the discovery sample was genotyped with Affymetrix Genome-Wide Human SNP Array 6.0 and 572,888 markers were tested for schizophrenia association. No SNPs in the discovery sample attained genome-wide significance (P < 8.7 × 10⁻⁸). Secondly, based on the GWAS data, we selected 1000 markers with the lowest P values in the discovery TOP sample, and tested these (or HapMap-based surrogates) for association in the replication sample. Sixteen loci were associated with schizophrenia (nominal P value < 0.05 and concurring OR) in the replication sample. As a next step, we performed a combined analysis of the findings from these two studies, and the strongest evidence for association with schizophrenia was provided for markers rs7045881 on 9p21, rs433598 on 16p12 and rs10761482 on 10q21. The markers are located in PLAA, ACSM1 and ANK3, respectively. PLAA has not previously been described as a susceptibility gene, but 9p21 is implied as a schizophrenia linkage region. ACSM1 has been identified as a susceptibility gene in a previous schizophrenia GWAS study. The association of ANK3 with schizophrenia is intriguing in light of recent associations of ANK3 with bipolar disorder, thereby supporting the hypothesis of an overlap in genetic susceptibility between these psychopathological entities.

    Funded by: Wellcome Trust: 089061

    Journal of psychiatric research 2010;44;12;748-53
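
    The genome-wide significance threshold quoted in the abstract above appears consistent with a Bonferroni correction over the 572,888 markers tested, although the abstract does not name the method. The sketch below shows that arithmetic; the family-wise error rate of 0.05 and the function name are assumptions made here for illustration.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# 572,888 is the number of markers stated in the abstract.
# alpha = 0.05 is an assumed family-wise error rate.
threshold = bonferroni_threshold(0.05, 572_888)
print(f"Per-marker threshold: {threshold:.2e}")
```

    Under these assumptions the per-marker threshold comes out close to the 8.7 × 10⁻⁸ reported in the abstract.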

  • Transcriptome analysis of reproductive tissue and intrauterine developmental stages of the tsetse fly (Glossina morsitans morsitans).

    Attardo GM, Ribeiro JM, Wu Y, Berriman M and Aksoy S

    Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT, USA.

    Background: Tsetse flies, vectors of African trypanosomes, undergo viviparous reproduction (the deposition of live offspring). This reproductive strategy results in a large maternal investment and the deposition of a small number of progeny during a female's lifespan. The reproductive biology of tsetse has been studied on a physiological level; however the molecular analysis of tsetse reproduction requires deeper investigation. To build a foundation from which to base molecular studies of tsetse reproduction, a cDNA library was generated from female tsetse (Glossina morsitans morsitans) reproductive tissues and the intrauterine developmental stages. 3438 expressed sequence tags were sequenced and analyzed.

    Results: Analysis of a nonredundant catalogue of 1391 contigs resulted in 520 predicted proteins. 475 of these proteins were full length. We predict that 412 of these represent cytoplasmic proteins while 57 are secreted. Comparison of these proteins with other tissue specific tsetse cDNA libraries (salivary gland, fat body/milk gland, and midgut) identified 51 that are unique to the reproductive/immature cDNA library. 11 unique proteins were homologous to uncharacterized putative proteins within the NR database suggesting the identification of novel genes associated with reproductive functions in other insects (hypothetical conserved). The analysis also yielded seven putative proteins without significant homology to sequences present in the public database (unknown genes). These proteins may represent unique functions associated with tsetse's viviparous reproductive cycle. RT-PCR analysis of hypothetical conserved and unknown contigs was performed to determine basic tissue and stage specificity of the expression of these genes.

    Conclusion: This paper identifies 51 putative proteins specific to a tsetse reproductive/immature EST library. 11 of these proteins correspond to hypothetical conserved genes and 7 proteins are tsetse specific.

    Funded by: NIAID NIH HHS: AI081774, AI51584, R01 AI081774-02, R01 AI081774-03, R01 AI081774-04, R21 AI076879-01A1, R21 AI076879-02; NIGMS NIH HHS: F32 GM077964

    BMC genomics 2010;11;160

  • Further evidence supporting a role for Gs signal transduction in severe malaria pathogenesis.

    Auburn S, Fry AE, Clark TG, Campino S, Diakite M, Green A, Richardson A, Jallow M, Sisay-Joof F, Pinder M, Molyneux ME, Taylor TE, Haldar K, Rockett KA and Kwiatkowski DP

    Wellcome Trust Centre for Human Genetics, Oxford, United Kingdom.

    With the functional demonstration of a role in erythrocyte invasion by Plasmodium falciparum parasites, implications in the aetiology of common conditions that prevail in individuals of African origin, and a wealth of pharmacological knowledge, the stimulatory G protein (Gs) signal transduction pathway presents an exciting target for anti-malarial drug intervention. Having previously demonstrated a role for the G-alpha-s gene, GNAS, in severe malaria disease, we sought to identify other important components of the Gs pathway. Using meta-analysis across case-control and family trio (affected child and parental controls) studies of severe malaria from The Gambia and Malawi, we sought evidence of association in six Gs pathway candidate genes: adenosine receptor 2A (ADORA2A) and 2B (ADORA2B), beta-adrenergic receptor kinase 1 (ADRBK1), adenylyl cyclase 9 (ADCY9), G protein beta subunit 3 (GNB3), and regulator of G protein signalling 2 (RGS2). Our study amassed a total of 2278 cases and 2364 controls. Allele-based models of association were investigated in all genes, and genotype and haplotype-based models were investigated where significant allelic associations were identified. Although no significant associations were observed in the other genes, several were identified in ADORA2A. The most significant association was observed at the rs9624472 locus, where the G allele (approximately 20% frequency) appeared to confer increased risk of severe malaria [OR = 1.22 (1.09-1.37); P = 0.001]. Further investigation of the ADORA2A gene region is required to validate the associations identified here, and to identify and functionally characterize the responsible causal variant(s). Our results provide further evidence supporting a role of the Gs signal transduction pathway in the regulation of severe malaria, and call for further exploration of this pathway in future studies.

    Funded by: Medical Research Council; NIAID NIH HHS: 5R01AI034969-09

    PloS one 2010;5;4;e10017

  • Genome-wide association study of ankylosing spondylitis identifies non-MHC susceptibility loci.

    Australo-Anglo-American Spondyloarthritis Consortium (TASC), Reveille JD, Sims AM, Danoy P, Evans DM, Leo P, Pointon JJ, Jin R, Zhou X, Bradbury LA, Appleton LH, Davis JC, Diekman L, Doan T, Dowling A, Duan R, Duncan EL, Farrar C, Hadler J, Harvey D, Karaderi T, Mogg R, Pomeroy E, Pryce K, Taylor J, Savage L, Deloukas P, Kumanduri V, Peltonen L, Ring SM, Whittaker P, Glazov E, Thomas GP, Maksymowych WP, Inman RD, Ward MM, Stone MA, Weisman MH, Wordsworth BP and Brown MA

    University of Texas Health Science Center at Houston, USA.

    To identify susceptibility loci for ankylosing spondylitis, we undertook a genome-wide association study in 2,053 unrelated ankylosing spondylitis cases among people of European descent and 5,140 ethnically matched controls, with replication in an independent cohort of 898 ankylosing spondylitis cases and 1,518 controls. Cases were genotyped with Illumina HumHap370 genotyping chips. In addition to strong association with the major histocompatibility complex (MHC; P < 10⁻⁸⁰⁰), we found association with SNPs in two gene deserts at 2p15 (rs10865331; combined P = 1.9 × 10⁻¹⁹) and 21q22 (rs2242944; P = 8.3 × 10⁻²⁰), as well as in the genes ANTXR2 (rs4333130; P = 9.3 × 10⁻⁸) and IL1R2 (rs2310173; P = 4.8 × 10⁻⁷). We also replicated previously reported associations at IL23R (rs11209026; P = 9.1 × 10⁻¹⁴) and ERAP1 (rs27434; P = 5.3 × 10⁻¹²). This study reports four genetic loci associated with ankylosing spondylitis risk and identifies a major role for the interleukin (IL)-23 and IL-1 cytokine pathways in disease susceptibility.

    Funded by: Arthritis Research UK; Medical Research Council; NCRR NIH HHS: MO1-RR00425, UL1RR024188; NIAMS NIH HHS: R01-AR046208; PHS HHS: P01-052915; Wellcome Trust: 076113, 077011, 089061

    Nature genetics 2010;42;2;123-7

  • RamA, a member of the AraC/XylS family, influences both virulence and efflux in Salmonella enterica serovar Typhimurium.

    Bailey AM, Ivens A, Kingsley R, Cottell JL, Wain J and Piddock LJ

    Antimicrobial Agents Research Group, Department of Immunity and Infection, The Medical School, The University of Birmingham, Birmingham, B15 2TT, United Kingdom.

    The transcriptomes of Salmonella enterica serovar Typhimurium SL1344 lacking a functional ramA or ramR or with plasmid-mediated high-level overexpression of ramA were compared to those of the wild-type parental strain. Inactivation of ramA led to increased expression of 14 SPI-1 genes and decreased expression of three SPI-2 genes, and it altered expression of ribosomal biosynthetic genes and several amino acid biosynthetic pathways. Furthermore, disruption of ramA led to decreased survival within RAW 264.7 mouse macrophages and attenuation within the BALB/c ByJ mouse model. Highly overexpressed ramA led to increased expression of genes encoding multidrug resistance (MDR) efflux pumps, including acrAB, acrEF, and tolC. Decreased expression of 34 Salmonella pathogenicity island (SPI) 1 and 2 genes, decreased SipC production, decreased adhesion to and survival within macrophages, and decreased colonization of Caenorhabditis elegans were also seen. Disruption of ramR led to the increased expression of ramA, acrAB, and tolC, but not to the same level as when ramA was overexpressed on a plasmid. Inactivation of ramR had a more limited effect on pathogenicity gene expression. In silico analysis of a suggested RamA-binding consensus sequence identified target genes, including ramR, acrA, tolC, sipABC, and ssrA. This study demonstrates that the regulation of a mechanism of MDR and expression of virulence genes show considerable overlap, and we postulate that such a mechanism is dependent on transcriptional activator concentration and promoter sensitivity. However, we have no evidence to support the hypothesis that increased MDR via RamA regulation of AcrAB-TolC gives rise to a hypervirulent strain.

    Funded by: Medical Research Council: GO501415, GO801977; Wellcome Trust

    Journal of bacteriology 2010;192;6;1607-16

  • Searching for the elusive typhoid diagnostic.

    Baker S, Favorov M and Dougan G

    Oxford University Clinical Research Unit, The Hospital for Tropical Diseases, Ho Chi Minh City, Vietnam.

    Typhoid (enteric) fever is still a common disease in many developing countries but current diagnostic tests are inadequate. Studies on pathogenesis and genomics have provided new insight into the organisms that cause enteric fever. Better understanding of the microorganisms explains, in part, why our current typhoid methodologies are limited in their diagnostic information and why developing new strategies may be a considerable challenge. Here we discuss the current position of typhoid diagnostics, highlight the need for technological improvements and suggest potential ways of advancing this area.

    Funded by: Wellcome Trust

    BMC infectious diseases 2010;10;45

  • The structure of Jann_2411 (DUF1470) from Jannaschia sp. at 1.45 Å resolution reveals a new fold (the ABATE domain) and suggests its possible role as a transcription regulator.

    Bakolitsa C, Bateman A, Jin KK, McMullan D, Krishna SS, Miller MD, Abdubek P, Acosta C, Astakhova T, Axelrod HL, Burra P, Carlton D, Chiu HJ, Clayton T, Das D, Deller MC, Duan L, Elias Y, Feuerhelm J, Grant JC, Grzechnik A, Grzechnik SK, Han GW, Jaroszewski L, Klock HE, Knuth MW, Kozbial P, Kumar A, Marciano D, Morse AT, Murphy KD, Nigoghossian E, Okach L, Oommachen S, Paulsen J, Reyes R, Rife CL, Sefcovic N, Tien H, Trame CB, Trout CV, van den Bedem H, Weekes D, White A, Xu Q, Hodgson KO, Wooley J, Elsliger MA, Deacon AM, Godzik A, Lesley S and Wilson IA

    Stanford Synchrotron Radiation Lightsource, SLAC National Accelerator Laboratory, Menlo Park, CA, USA.

    The crystal structure of Jann_2411 from Jannaschia sp. strain CCS1, a member of the Pfam PF07336 family classified as a domain of unknown function (DUF1470), was solved to a resolution of 1.45 Å by multiple-wavelength anomalous dispersion (MAD). This protein is the first structural representative of the DUF1470 Pfam family. Structural analysis revealed a two-domain organization, with the N-terminal domain presenting a new fold called the ABATE domain that may bind an as yet unknown ligand. The C-terminal domain forms a treble-clef zinc finger that is likely to be involved in DNA binding. Analysis of the Jann_2411 protein and the broader ABATE-domain family suggests a role as stress-induced transcriptional regulators.

    Funded by: NIGMS NIH HHS: P50 GM62411, U54 GM074898; Wellcome Trust: 087656, WT077044/Z/05/Z

    Acta crystallographica. Section F, Structural biology and crystallization communications 2010;66;Pt 10;1198-204

  • A predominantly neolithic origin for European paternal lineages.

    Balaresque P, Bowden GR, Adams SM, Leung HY, King TE, Rosser ZH, Goodwin J, Moisan JP, Richard C, Millward A, Demaine AG, Barbujani G, Previderè C, Wilson IJ, Tyler-Smith C and Jobling MA

    Department of Genetics, University of Leicester, Leicester, United Kingdom.

    The relative contributions to modern European populations of Paleolithic hunter-gatherers and Neolithic farmers from the Near East have been intensely debated. Haplogroup R1b1b2 (R-M269) is the commonest European Y-chromosomal lineage, increasing in frequency from east to west, and carried by 110 million European men. Previous studies suggested a Paleolithic origin, but here we show that the geographical distribution of its microsatellite diversity is best explained by spread from a single source in the Near East via Anatolia during the Neolithic. Taken with evidence on the origins of other haplogroups, this indicates that most European Y chromosomes originate in the Neolithic expansion. This reinterpretation makes Europe a prime example of how technological and cultural change is linked with the expansion of a Y-chromosomal lineage, and the contrast of this pattern with that shown by maternally inherited mitochondrial DNA suggests a unique role for males in the transition.

    Funded by: Wellcome Trust: 057559, 065569, 084060, 087576

    PLoS biology 2010;8;1;e1000285

  • Attack of the clones.

    Baldry S

    Nature reviews. Microbiology 2010;8;6;390

  • Comparative genomics of prevaccination and modern Bordetella pertussis strains.

    Bart MJ, van Gent M, van der Heide HG, Boekhorst J, Hermans P, Parkhill J and Mooi FR

    Laboratory for Infectious Diseases and Screening, Netherlands Centre for Infectious Diseases Control, RIVM, Bilthoven, Netherlands.

    Background: Despite vaccination since the 1950s, pertussis has persisted and resurged. It remains a major cause of infant death worldwide and is the most prevalent vaccine-preventable disease in developed countries. The resurgence of pertussis has been associated with the expansion of Bordetella pertussis strains with a novel allele for the pertussis toxin (Ptx) promoter, ptxP3, which have replaced resident ptxP1 strains. Compared to ptxP1 strains, ptxP3 strains produce more Ptx, resulting in increased virulence and immune suppression. To elucidate how B. pertussis has adapted to vaccination, we compared genome sequences of two ptxP3 strains with four strains isolated before and after the introduction of vaccination.

    Results: The distribution of SNPs in regions involved in transcription and translation suggested that changes in gene regulation play an important role in adaptation. No evidence was found for acquisition of novel genes. Modern strains differed significantly from prevaccination strains, both phylogenetically and with respect to particular alleles. The ptxP3 strains were found to have diverged recently from modern ptxP1 strains. Differences between ptxP3 and modern ptxP1 strains included SNPs in a number of pathogenicity-associated genes. Further, both gene inactivation and reactivation were observed in ptxP3 strains relative to modern ptxP1 strains.

    Conclusions: Our work suggests that B. pertussis adapted by successive accumulation of SNPs and by gene (in)activation. In particular, changes in gene regulation may have played a role in adaptation.

    BMC genomics 2010;11;627

  • Curators of the world unite: the International Society of Biocuration.

    Bateman A

    Bioinformatics (Oxford, England) 2010;26;8;991

  • Time to underpin Wikipedia wisdom.

    Bateman A and Logan DW

    Nature 2010;468;7325;765

  • DUFs: families in search of function.

    Bateman A, Coggill P and Finn RD

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, England.

    Domains of unknown function (DUFs) are a large set of uncharacterized protein families that are found in the Pfam database. Here, the scale and growth of functionally uncharacterized families in biological databases are surveyed and the prospects for discovering their function are examined. In particular, the important role that structural genomics can play in identifying potential function is evaluated.

    Funded by: Wellcome Trust: 087656, WT077044/Z/05/Z

    Acta crystallographica. Section F, Structural biology and crystallization communications 2010;66;Pt 10;1148-52

  • Signatures of adaptation to obligate biotrophy in the Hyaloperonospora arabidopsidis genome.

    Baxter L, Tripathy S, Ishaque N, Boot N, Cabral A, Kemen E, Thines M, Ah-Fong A, Anderson R, Badejoko W, Bittner-Eddy P, Boore JL, Chibucos MC, Coates M, Dehal P, Delehaunty K, Dong S, Downton P, Dumas B, Fabro G, Fronick C, Fuerstenberg SI, Fulton L, Gaulin E, Govers F, Hughes L, Humphray S, Jiang RH, Judelson H, Kamoun S, Kyung K, Meijer H, Minx P, Morris P, Nelson J, Phuntumart V, Qutob D, Rehmany A, Rougon-Cardoso A, Ryden P, Torto-Alalibo T, Studholme D, Wang Y, Win J, Wood J, Clifton SW, Rogers J, Van den Ackerveken G, Jones JD, McDowell JM, Beynon J and Tyler BM

    School of Life Sciences, Warwick University, Wellesbourne, CV35 9EF, UK.

    Many oomycete and fungal plant pathogens are obligate biotrophs, which extract nutrients only from living plant tissue and cannot grow apart from their hosts. Although these pathogens cause substantial crop losses, little is known about the molecular basis or evolution of obligate biotrophy. Here, we report the genome sequence of the oomycete Hyaloperonospora arabidopsidis (Hpa), an obligate biotroph and natural pathogen of Arabidopsis thaliana. In comparison with genomes of related, hemibiotrophic Phytophthora species, the Hpa genome exhibits dramatic reductions in genes encoding (i) RXLR effectors and other secreted pathogenicity proteins, (ii) enzymes for assimilation of inorganic nitrogen and sulfur, and (iii) proteins associated with zoospore formation and motility. These attributes comprise a genomic signature of evolution toward obligate biotrophy.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/C509123/1, BB/E024815/1, BB/E024882/1, BB/F0161901, EP/F500025/1, T12144; Wellcome Trust

    Science (New York, N.Y.) 2010;330;6010;1549-51

  • Genomic hotspots for adaptation: the population genetics of Müllerian mimicry in the Heliconius melpomene clade.

    Baxter SW, Nadeau NJ, Maroja LS, Wilkinson P, Counterman BA, Dawson A, Beltran M, Perez-Espona S, Chamberlain N, Ferguson L, Clark R, Davidson C, Glithero R, Mallet J, McMillan WO, Kronforst M, Joron M, Ffrench-Constant RH and Jiggins CD

    Department of Zoology, University of Cambridge, Cambridge, United Kingdom.

    Wing patterning in Heliconius butterflies is a longstanding example of both Müllerian mimicry and phenotypic radiation under strong natural selection. The loci controlling such patterns are "hotspots" for adaptive evolution with great allelic diversity across different species in the genus. We characterise nucleotide variation, genotype-by-phenotype associations, linkage disequilibrium, and candidate gene expression at two loci and across multiple hybrid zones in Heliconius melpomene and relatives. Alleles at HmB control the presence or absence of the red forewing band, while alleles at HmYb control the yellow hindwing bar. Across HmYb two regions, separated by approximately 100 kb, show significant genotype-by-phenotype associations that are replicated across independent hybrid zones. In contrast, at HmB a single peak of association indicates the likely position of functional sites at three genes, encoding a kinesin, a G-protein coupled receptor, and an mRNA splicing factor. At both HmYb and HmB there is evidence for enhanced linkage disequilibrium (LD) between associated sites separated by up to 14 kb, suggesting that multiple sites are under selection. However, there was no evidence for reduced variation or deviations from neutrality that might indicate a recent selective sweep, consistent with these alleles being relatively old. Of the three genes showing an association with the HmB locus, the kinesin shows differences in wing disc expression between races that are replicated in the co-mimic, Heliconius erato, providing striking evidence for parallel changes in gene expression between Müllerian co-mimics. Wing patterning loci in Heliconius melpomene therefore show a haplotype structure maintained by selection, but no evidence for a recent selective sweep. The complex genetic pattern contrasts with the simple genetic basis of many adaptive traits studied previously, but may provide a better model for most adaptation in natural populations that has arisen over millions rather than tens of years.

    Funded by: Biotechnology and Biological Sciences Research Council: 011845

    PLoS genetics 2010;6;2;e1000794

  • Comparison of different criteria for the diagnosis of primary myelofibrosis reveals limited clinical utility for measurement of serum lactate dehydrogenase.

    Beer PA, Campbell PJ and Green AR

    Department of Haematology, University of Cambridge, Cambridge Institute for Medical Research, Cambridge, United Kingdom.

    Primary myelofibrosis shows histological and pathogenetic overlap with essential thrombocythemia and polycythemia vera. Several diagnostic classifications have been proposed for primary myelofibrosis, although little is known about their clinical utility. In a comparison of three recent classifications, overall concordance was 79%. Inclusion of raised serum lactate dehydrogenase categorized 9% of patients as primary myelofibrosis when other criteria were not met. Although mean serum lactate dehydrogenase levels were higher in patients with primary myelofibrosis, levels were also increased in the majority of patients with essential thrombocythemia or polycythemia vera, and significant overlap was observed. A positive correlation with higher leukocyte and platelet count, and disease duration in primary myelofibrosis, suggests that serum lactate dehydrogenase is a biomarker for disease bulk and/or cellular proliferation. In conclusion, raised lactate dehydrogenase lacks specificity for primary myelofibrosis, consistent with the concept of a phenotypic continuum between essential thrombocythemia, polycythemia vera and primary myelofibrosis.

    Funded by: Medical Research Council; Wellcome Trust: 088340

    Haematologica 2010;95;11;1960-3

  • Independently acquired biallelic JAK2 mutations are present in a minority of patients with essential thrombocythemia.

    Beer PA, Ortmann CA, Campbell PJ and Green AR

    Funded by: Medical Research Council; Wellcome Trust: 088340

    Blood 2010;116;6;1013-4

  • Mobile genetic element proliferation and gene inactivation impact over the genome structure and metabolic capabilities of Sodalis glossinidius, the secondary endosymbiont of tsetse flies.

    Belda E, Moya A, Bentley S and Silva FJ

    Institut Cavanilles de Biodiversitat i Biologia Evolutiva, Universitat de València, Spain.

    Background: Genome reduction is a common evolutionary process in symbiotic and pathogenic bacteria. This process has been extensively characterized in bacterial endosymbionts of insects, where primary mutualistic bacteria represent the most extreme cases of genome reduction, the consequence of a massive process of gene inactivation and loss during their evolution from free-living ancestors. Sodalis glossinidius, the secondary endosymbiont of tsetse flies, has one of the few complete genomes of bacteria at the very beginning of the symbiotic association, making it possible to evaluate the relative impact of mobile genetic element proliferation and gene inactivation on the structure and functional capabilities of this bacterial endosymbiont during the transition to a host-dependent lifestyle.

    Results: A detailed characterization of mobile genetic elements and pseudogenes reveals a massive presence of different types of prophage elements together with five different families of IS elements that have proliferated across the genome of Sodalis glossinidius at different levels. In addition, a detailed survey of intergenic regions allowed the characterization of 1501 pseudogenes, a much higher number than the 972 pseudogenes described in the original annotation. Pseudogene structure reveals a minor impact of mobile genetic element proliferation in the process of gene inactivation, with most pseudogenes originating from multiple frameshift mutations and premature stop codons. The comparison of metabolic profiles of Sodalis glossinidius and the tsetse fly primary endosymbiont Wigglesworthia glossinidia based on their whole gene and pseudogene repertoires revealed a novel case of pathway inactivation, arginine biosynthesis, in Sodalis glossinidius together with a possible case of metabolic complementation with Wigglesworthia glossinidia for thiamine biosynthesis.

    Conclusions: The complete re-analysis of the genome sequence of Sodalis glossinidius reveals novel insights into the evolutionary transition from a free-living ancestor to a host-dependent lifestyle, with a massive proliferation of mobile genetic elements, mainly of phage origin, although with minor impact on the process of gene inactivation that is taking place in this bacterial genome. The metabolic analysis of the whole endosymbiotic consortium of tsetse flies has revealed a possible phenomenon of metabolic complementation between primary and secondary endosymbionts that may help to explain the co-existence of both bacterial endosymbionts in the context of the tsetse host.

    BMC genomics 2010;11;449

  • Integrated genetic and epigenetic analysis identifies haplotype-specific methylation in the FTO type 2 diabetes and obesity susceptibility locus.

    Bell CG, Finer S, Lindgren CM, Wilson GA, Rakyan VK, Teschendorff AE, Akan P, Stupka E, Down TA, Prokopenko I, Morison IM, Mill J, Pidsley R, International Type 2 Diabetes 1q Consortium, Deloukas P, Frayling TM, Hattersley AT, McCarthy MI, Beck S and Hitman GA

    Medical Genomics, UCL Cancer Institute, University College London, London, United Kingdom.

    Recent multi-dimensional approaches to the study of complex disease have revealed powerful insights into how genetic and epigenetic factors may underlie their aetiopathogenesis. We examined genotype-epigenotype interactions in the context of Type 2 Diabetes (T2D), focussing on known regions of genomic susceptibility. We assayed DNA methylation in 60 females, stratified according to disease susceptibility haplotype using previously identified association loci. CpG methylation was assessed using methylated DNA immunoprecipitation on a targeted array (MeDIP-chip) and absolute methylation values were estimated using a Bayesian algorithm (BATMAN). Absolute methylation levels were quantified across LD blocks, and we identified increased DNA methylation on the FTO obesity susceptibility haplotype, tagged by the rs8050136 risk allele A (p = 9.40×10(-4), permutation p = 1.0×10(-3)). Further analysis across the 46 kb LD block using sliding windows localised the most significant difference to be within a 7.7 kb region (p = 1.13×10(-7)). Sequence level analysis, followed by pyrosequencing validation, revealed that the methylation difference was driven by the co-ordinated phase of CpG-creating SNPs across the risk haplotype. This 7.7 kb region of haplotype-specific methylation (HSM), encapsulates a Highly Conserved Non-Coding Element (HCNE) that has previously been validated as a long-range enhancer, supported by the histone H3K4me1 enhancer signature. This study demonstrates that integration of Genome-Wide Association (GWA) SNP and epigenomic DNA methylation data can identify potential novel genotype-epigenotype interactions within disease-associated loci, thus providing a novel route to aid unravelling common complex diseases.

    Funded by: Medical Research Council; NIDDK NIH HHS: DK-073490; Wellcome Trust: 084071, WT086596/Z/08/Z

    PloS one 2010;5;11;e14040

  • Independent evolution of the core and accessory gene sets in the genus Neisseria: insights gained from the genome of Neisseria lactamica isolate 020-06.

    Bennett JS, Bentley SD, Vernikos GS, Quail MA, Cherevach I, White B, Parkhill J and Maiden MC

    Department of Zoology, University of Oxford, UK.

    Background: The genus Neisseria contains two important yet very different pathogens, N. meningitidis and N. gonorrhoeae, in addition to non-pathogenic species, of which N. lactamica is the best characterized. Genomic comparisons of these three bacteria will provide insights into the mechanisms and evolution of pathogenesis in this group of organisms, which are applicable to understanding these processes more generally.

    Results: Non-pathogenic N. lactamica exhibits very similar population structure and levels of diversity to the meningococcus, whilst gonococci are essentially recent descendants of a single clone. All three species share a common core gene set estimated to comprise around 1190 CDSs, corresponding to about 60% of the genome. However, some of the nucleotide sequence diversity within this core genome is particular to each group, indicating that cross-species recombination is rare in this shared core gene set. Other than the meningococcal cps region, which encodes the polysaccharide capsule, relatively few members of the large accessory gene pool are exclusive to one species group, and cross-species recombination within this accessory genome is frequent.

    Conclusion: The three Neisseria species groups represent coherent biological and genetic groupings which appear to be maintained by low rates of inter-species horizontal genetic exchange within the core genome. There is extensive evidence for exchange among positively selected genes and the accessory genome and some evidence of hitch-hiking of housekeeping genes with other loci. It is not possible to define a 'pathogenome' for this group of organisms and the disease causing phenotypes are therefore likely to be complex, polygenic, and different among the various disease-associated phenotypes observed.

    Funded by: Wellcome Trust: 087622

    BMC genomics 2010;11;652

  • Taming the next-gen beast.

    Bentley S

    Stephen Bentley is at the Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    This month's Genome Watch discusses how alternative approaches to using second-generation sequencing technologies are powerful tools for the analysis of common pathogenic bacteria.

    Nature reviews. Microbiology 2010;8;3;161

  • Study of smell and reproductive organs in a mouse model for CHARGE syndrome.

    Bergman JE, Bosman EA, van Ravenswaaij-Arts CM and Steel KP

    Department of Genetics, University Medical Centre Groningen, University of Groningen, Groningen, The Netherlands.

    CHARGE syndrome is a multiple congenital anomaly syndrome characterised by Coloboma, Heart defects, Atresia of choanae, Retardation of growth and/or development, Genital hypoplasia, and Ear anomalies often associated with deafness. It is caused by heterozygous mutations in the CHD7 gene and shows a highly variable phenotype. Anosmia and hypogonadotropic hypogonadism occur in the majority of the CHARGE patients, but the underlying pathogenesis is unknown. Therefore, we studied the ability to smell and aspects of the reproductive system (reproductive performance, gonadotropin-releasing hormone (GnRH) neurons and anatomy of testes and uteri) in a mouse model for CHARGE syndrome, the whirligig mouse (Chd7(Whi/+)). We showed that Chromodomain Helicase DNA-binding protein 7 (Chd7) is expressed in brain areas involved in olfaction and reproduction during embryonic development. We observed poorer performance in the smell test in adult Chd7(Whi/+) mice, secondary either to olfactory dysfunction or to balance disturbances. Olfactory bulb and reproductive organ abnormalities were observed in a proportion of Chd7(Whi/+) mice. Hypothalamic GnRH neurons were slightly reduced in Chd7(Whi/+) females and reproductive performance was slightly less in Chd7(Whi/+) mice. This study shows that the penetrance of anosmia and hypogonadotropic hypogonadism is lower in Chd7(Whi/+) mice than in CHARGE patients. Interestingly, many phenotypic features of the Chd7 mutation showed incomplete penetrance in our model mice, despite the use of inbred, genetically identical mice. This supports the theory that the extreme variability of the CHARGE phenotype in both humans and mice might be attributed to variations in the fetal microenvironment or to purely stochastic events.

    Funded by: Medical Research Council; Wellcome Trust

    European journal of human genetics : EJHG 2010;18;2;171-7

  • Prevalence of Salmonella enterica in poultry and eggs in Uruguay during an epidemic due to Salmonella enterica serovar Enteritidis.

    Betancor L, Pereira M, Martinez A, Giossa G, Fookes M, Flores K, Barrios P, Repiso V, Vignoli R, Cordeiro N, Algorta G, Thomson N, Maskell D, Schelotto F and Chabalgoity JA

    Bacteriology and Virology Department, Instituto de Higiene, Facultad de Medicina, Universidad de la República, Montevideo, Uruguay.

    Salmonella enterica serovar Enteritidis (S. Enteritidis) is frequently associated with food-borne disease worldwide. Poultry-derived products are a major source. An epidemic of human infection with S. Enteritidis occurred in Uruguay, and to evaluate the extent of poultry contamination, we conducted a nationwide survey over 2 years that included the analysis of sera from 5,751 birds and 12,400 eggs. Serological evidence of infection with Salmonella group O:9 was found in 24.4% of the birds. All positive sera were retested with a gm flagellum-based enzyme-linked immunosorbent assay, and based on these results, the national prevalence of S. Enteritidis infection was estimated to be 6.3%. Salmonellae were recovered from 58 of 620 pools made up of 20 eggs each, demonstrating a prevalence of at least 1 in every 214 eggs. Surprisingly, the majority of the isolates were not S. Enteritidis. Thirty-nine isolates were typed as S. Derby, 9 as S. Gallinarum, 8 as S. Enteritidis, and 2 as S. Panama. Despite having the highest prevalence in eggs, S. Derby was not isolated from humans in the period of analysis, suggesting a low capacity to infect humans. Microarray-based comparative genomic hybridization analysis of S. Derby and S. Enteritidis revealed more than 350 genetic differences. S. Derby lacked pathogenicity islands 13 and 14, the fimbrial lpf operon, and other regions encoding metabolic functions. Several of these regions are present not only in serovar Enteritidis but also in all sequenced strains of S. Typhimurium, suggesting that these regions might be related to the capacity of Salmonella to cause food-borne disease.

    Funded by: Wellcome Trust: 078168/Z/05/Z

    Journal of clinical microbiology 2010;48;7;2413-23
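
    The pooled-egg bound quoted in the abstract above ("at least 1 in every 214 eggs") can be reproduced with a short calculation. This is a minimal sketch using only the figures stated in the abstract; the function name is hypothetical, and the one-contaminated-egg-per-positive-pool reading is an assumption about how the lower bound was derived.

```python
# Figures taken from the abstract.
POOLS_TESTED = 620
EGGS_PER_POOL = 20
POSITIVE_POOLS = 58


def min_prevalence_one_in(pools, eggs_per_pool, positive_pools):
    """Return N such that at least 1 in N eggs was contaminated.

    Assumption: each positive pool contains at least one contaminated egg,
    so the number of positive pools is a lower bound on contaminated eggs.
    """
    total_eggs = pools * eggs_per_pool
    min_positive_eggs = positive_pools
    return total_eggs / min_positive_eggs


total_eggs = POOLS_TESTED * EGGS_PER_POOL
one_in = min_prevalence_one_in(POOLS_TESTED, EGGS_PER_POOL, POSITIVE_POOLS)
print(f"{total_eggs} eggs tested; at least 1 in {round(one_in)} contaminated")
```

    Because a positive pool may hold more than one contaminated egg, the true prevalence can only be equal to or higher than this figure.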

  • Variants in ACAD10 are associated with type 2 diabetes, insulin resistance and lipid oxidation in Pima Indians.

    Bian L, Hanson RL, Muller YL, Ma L, MAGIC Investigators, Kobes S, Knowler WC, Bogardus C and Baier LJ

    Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, 445 N. 5th Street, Suite 210, Phoenix, AZ 85004, USA.

    Aims/hypothesis: A prior genome-wide association study in Pima Indians identified a variant within the ACAD10 gene that is associated with early-onset type 2 diabetes. Acyl-coenzyme A dehydrogenase 10 (ACAD10) catalyses mitochondrial fatty acid beta-oxidation, which plays a pivotal role in developing insulin resistance and type 2 diabetes. Therefore, ACAD10 was analysed as a positional and biological candidate for type 2 diabetes.

    Methods: Twenty-three SNPs were genotyped in 1,500 Pima Indians to determine the linkage disequilibrium pattern across ACAD10. Association with type 2 diabetes was determined by genotyping four tag single nucleotide polymorphisms (SNPs) in a population-based sample of 3,501 full-heritage Pima Indians; two associated SNPs were further genotyped in a second population-based sample of 3,723 American Indians. Associations with quantitative traits were assessed in 415 non-diabetic full heritage Pima individuals who had been metabolically phenotyped.

    Results: SNPs rs601663 and rs659964 were associated with type 2 diabetes in the full-heritage Pima Indian sample (p=0.04 and 0.0006, respectively), and rs659964 was further associated with type 2 diabetes in the second American Indian sample (p=0.04). Combination of these two samples provided the strongest evidence for association (p=0.009 and 0.00007, for rs601663 and rs659964, respectively). Quantitative trait analyses identified nominal associations with both lower lipid oxidation rate and larger subcutaneous abdominal adipocyte size, which is consistent with the known physiology of ACAD10, and also identified associations with increased insulin resistance.

    Conclusions/interpretation: We propose that ACAD10 variation may increase type 2 diabetes susceptibility by impairing insulin sensitivity via abnormal lipid oxidation.

    Funded by: NIDDK NIH HHS: ZIA DK075012-04

    Diabetologia 2010;53;7;1349-53

  • Signatures of mutation and selection in the cancer genome.

    Bignell GR, Greenman CD, Davies H, Butler AP, Edkins S, Andrews JM, Buck G, Chen L, Beare D, Latimer C, Widaa S, Hinton J, Fahey C, Fu B, Swamy S, Dalgliesh GL, Teh BT, Deloukas P, Yang F, Campbell PJ, Futreal PA and Stratton MR

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    The cancer genome is moulded by the dual processes of somatic mutation and selection. Homozygous deletions in cancer genomes occur over recessive cancer genes, where they can confer selective growth advantage, and over fragile sites, where they are thought to reflect an increased local rate of DNA breakage. However, most homozygous deletions in cancer genomes are unexplained. Here we identified 2,428 somatic homozygous deletions in 746 cancer cell lines. These overlie 11% of protein-coding genes that, therefore, are not mandatory for survival of human cells. We derived structural signatures that distinguish between homozygous deletions over recessive cancer genes and fragile sites. Application to clusters of unexplained homozygous deletions suggests that many are in regions of inherent fragility, whereas a small subset overlies recessive cancer genes. The results illustrate how structural signatures can be used to distinguish between the influences of mutation and selection in cancer genomes. The extensive copy number, genotyping, sequence and expression data available for this large series of publicly available cancer cell lines renders them informative reagents for future studies of cancer biology and drug discovery.

    Funded by: Wellcome Trust: 077012/Z/05/Z, 088340, 093867

    Nature 2010;463;7283;893-8

  • Sex determination in the social amoeba Dictyostelium discoideum.

    Bloomfield G, Skelton J, Ivens A, Tanaka Y and Kay RR

    Medical Research Council Laboratory of Molecular Biology, Hills Road, Cambridge CB2 0QH, UK.

    The genetics of sex determination remain mysterious in many organisms, including some that are otherwise well studied. Here we report the discovery and analysis of the mating-type locus of the model organism Dictyostelium discoideum. Three forms of a single genetic locus specify this species' three mating types: two versions of the locus are entirely different in sequence, and the third resembles a composite of the other two. Single, unrelated genes are sufficient to determine two of the mating types, whereas homologs of both these genes are required in the composite type. The key genes encode polypeptides that possess no recognizable similarity to established protein families. Sex determination in the social amoebae thus appears to use regulators that are unrelated to any others currently known.

    Funded by: Medical Research Council; Wellcome Trust: 06724

    Science (New York, N.Y.) 2010;330;6010;1533-6

  • Large, rare chromosomal deletions associated with severe early-onset obesity.

    Bochukova EG, Huang N, Keogh J, Henning E, Purmann C, Blaszczyk K, Saeed S, Hamilton-Shield J, Clayton-Smith J, O'Rahilly S, Hurles ME and Farooqi IS

    University of Cambridge Metabolic Research Laboratories, Institute of Metabolic Science, Addenbrooke's Hospital, Cambridge CB2 0QQ, UK.

    Obesity is a highly heritable and genetically heterogeneous disorder. Here we investigated the contribution of copy number variation to obesity in 300 Caucasian patients with severe early-onset obesity, 143 of whom also had developmental delay. Large (>500 kilobases), rare (<1%) deletions were significantly enriched in patients compared to 7,366 controls (P < 0.001). We identified several rare copy number variants that were recurrent in patients but absent or at much lower prevalence in controls. We identified five patients with overlapping deletions on chromosome 16p11.2 that were found in 2 out of 7,366 controls (P < 5 x 10(-5)). In three patients the deletion co-segregated with severe obesity. Two patients harboured a larger de novo 16p11.2 deletion, extending through a 593-kilobase region previously associated with autism and mental retardation; both of these patients had mild developmental delay in addition to severe obesity. In an independent sample of 1,062 patients with severe obesity alone, the smaller 16p11.2 deletion was found in an additional two patients. All 16p11.2 deletions encompass several genes but include SH2B1, which is known to be involved in leptin and insulin signalling. Deletion carriers exhibited hyperphagia and severe insulin resistance disproportionate for the degree of obesity. We show that copy number variation contributes significantly to the genetic architecture of human obesity.

    Funded by: Medical Research Council: G0900554; Wellcome Trust: 077014, 077014/Z/05/0Z, 082390, 082390/Z/07/Z, 085475

    Nature 2010;463;7281;666-70

  • Variants at DGKB/TMEM195, ADRA2A, GLIS3 and C2CD4B loci are associated with reduced glucose-stimulated beta cell function in middle-aged Danish people.

    Boesgaard TW, Grarup N, Jørgensen T, Borch-Johnsen K, Meta-Analysis of Glucose and Insulin-Related Trait Consortium (MAGIC), Hansen T and Pedersen O

    Hagedorn Research Institute, Niels Steensens Vej 2, 2820 Gentofte, Denmark.

    Aims/hypothesis: A meta-analysis of 21 genome-wide association studies identified 11 novel genetic loci implicated in fasting glucose homeostasis. We aimed to evaluate the impact of these variants on insulin release and insulin sensitivity estimated from OGTTs.

    Methods: Eleven variants in or near DGKB/TMEM195, ADCY5, MADD, ADRA2A, FADS1, CRY2, SLC2A2, GLIS3, PROX1, C2CD4B and IGF1 were genotyped in 6,784 middle-aged participants of the population-based Inter99 cohort. Association studies of quantitative estimates of insulin release and insulin sensitivity were performed in 5,722 non-diabetic Danish participants on whom an OGTT was performed.

    Results: Assuming an additive genetic model, carriers of the alleles increasing fasting glucose in DGKB/TMEM195, ADRA2A, GLIS3 and C2CD4B showed decreased glucose-stimulated insulin release as assessed by the BIGTT-acute insulin response index (2.7-3.5%; p < 0.005 for all) and by corrected insulin response (2.8-5.9%; p < 0.03 for all). In addition, the PROX1 glucose-raising allele showed a 2.9% decreased corrected insulin response (p = 0.03), while the hyperglycaemic allele of variants in or near ADRA2A, FADS1, CRY2 and C2CD4B were associated with a 2.6% to 9.3% decrease in one or both of two different OGTT-based disposition indices (p < 0.02 for all). After correction for multiple testing, variants in the DGKB/TMEM195, ADRA2A, GLIS3 and C2CD4B loci were associated with estimates of beta cell function.

    Conclusions/interpretation: We found that the lead variants at the DGKB/TMEM195, ADRA2A, GLIS3 and C2CD4B loci were associated with decreased glucose-stimulated insulin response. This association underlines the importance of pancreatic beta cell dysfunction in the genetic predisposition to hyperglycaemia and type 2 diabetes.

    Diabetologia 2010;53;8;1647-55

  • Gap5--editing the billion fragment sequence assembly.

    Bonfield JK and Whitwham A

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK.

    MOTIVATION: Existing sequence assembly editors struggle with the volumes of data now readily available from the latest generation of DNA sequencing instruments. RESULTS: We describe the Gap5 software along with the data structures and algorithms used that allow it to be scalable. We demonstrate this with an assembly of 1.1 billion sequence fragments and compare the performance with several other programs. We analyse the memory, CPU, I/O usage and file sizes used by Gap5. AVAILABILITY AND IMPLEMENTATION: Gap5 is part of the Staden Package and is available under an Open Source licence. It is implemented in C and Tcl/Tk. Currently it works on Unix systems only.

    Funded by: Medical Research Council; Wellcome Trust: 077200/Z/05/Z

    Bioinformatics (Oxford, England) 2010;26;14;1699-703

  • Large-scale association analysis of TNF/LTA gene region polymorphisms in type 2 diabetes.

    Boraska V, Rayner NW, Groves CJ, Frayling TM, Diakite M, Rockett KA, Kwiatkowski DP, Day-Williams AG, McCarthy MI and Zeggini E

    Department of Medical Biology, University of Split School of Medicine, Split, Croatia.

    Background: The TNF/LTA locus has been a long-standing T2D candidate gene. Several studies have examined association of TNF/LTA SNPs with T2D but the majority have been small-scale and produced no convincing evidence of association. The purpose of this study is to examine T2D association of tag SNPs in the TNF/LTA region capturing the majority of common variation in a large-scale sample set of UK/Irish origin.

    Methods: This study comprised a case-control (1520 cases and 2570 control samples) and a family-based component (423 parent-offspring trios). Eleven tag SNPs (rs928815, rs909253, rs746868, rs1041981 (T60N), rs1800750, rs1800629 (G-308A), rs361525 (G-238A), rs3093662, rs3093664, rs3093665, and rs3093668) were selected across the TNF/LTA locus and genotyped using a fluorescence-based competitive allele specific assay. Quality control of the obtained genotypes was performed prior to single- and multi-point association analyses under the additive model.

    Results: We did not find any consistent SNP associations with T2D in the case-control or family-based datasets.

    Conclusions: The present study, designed to analyse a set of tag SNPs specifically selected to capture the majority of common variation in the TNF/LTA gene region, found no robust evidence for association with T2D. To investigate the presence of smaller effects of TNF/LTA gene variation with T2D, a large-scale meta-analysis will be required.

    Funded by: Medical Research Council: G0000934; Wellcome Trust: 068545/Z/02, 076113, WT088885/Z/09/Z

    BMC medical genetics 2010;11;69

  • 53BP1 loss rescues BRCA1 deficiency and is associated with triple-negative and BRCA-mutated breast cancers.

    Bouwman P, Aly A, Escandell JM, Pieterse M, Bartkova J, van der Gulden H, Hiddingh S, Thanasoula M, Kulkarni A, Yang Q, Haffty BG, Tommiska J, Blomqvist C, Drapkin R, Adams DJ, Nevanlinna H, Bartek J, Tarsounas M, Ganesan S and Jonkers J

    Division of Molecular Biology, The Netherlands Cancer Institute, Amsterdam, The Netherlands.

    Germ-line mutations in breast cancer 1, early onset (BRCA1) result in predisposition to breast and ovarian cancer. BRCA1-mutated tumors show genomic instability, mainly as a consequence of impaired recombinatorial DNA repair. Here we identify p53-binding protein 1 (53BP1) as an essential factor for sustaining the growth arrest induced by Brca1 deletion. Depletion of 53BP1 abrogates the ATM-dependent checkpoint response and G2 cell-cycle arrest triggered by the accumulation of DNA breaks in Brca1-deleted cells. This effect of 53BP1 is specific to BRCA1 function, as 53BP1 depletion did not alleviate proliferation arrest or checkpoint responses in Brca2-deleted cells. Notably, loss of 53BP1 partially restores the homologous-recombination defect of Brca1-deleted cells and reverts their hypersensitivity to DNA-damaging agents. We find reduced 53BP1 expression in subsets of sporadic triple-negative and BRCA-associated breast cancers, indicating the potential clinical implications of our findings.

    Funded by: Cancer Research UK: A6997, A8784; Wellcome Trust: 082356

    Nature structural & molecular biology 2010;17;6;688-95

  • Rare variation at the TNFAIP3 locus and susceptibility to rheumatoid arthritis.

    Bowes J, Lawrence R, Eyre S, Panoutsopoulou K, Orozco G, Elliott KS, Ke X, Morris AP, UKRAG, Thomson W, Worthington J, Barton A and Zeggini E

    Arthritis Research UK, Epidemiology Unit, University of Manchester, Manchester, UK.

    Genome-wide association studies (GWAS) conducted using commercial single nucleotide polymorphism (SNP) arrays have proven to be a powerful tool for the detection of common disease susceptibility variants. However, their utility for the detection of lower frequency variants is yet to be practically investigated. Here we describe the application of a rare variant collapsing method to a large genome-wide SNP dataset, the Wellcome Trust Case Control Consortium rheumatoid arthritis (RA) GWAS. We partitioned the data into gene-centric bins and collapsed genotypes of low frequency variants (defined here as MAF ≤ 0.05) into a single count coupled with univariate analysis. We then prioritized gene regions for further investigation in an independent cohort of 3,355 cases and 2,427 controls based on rare variant signal p value and prior evidence to support involvement in RA. A total of 14,536 gene bins were investigated in the primary analysis and signals mapping to the TNFAIP3 and chr17q24 loci were selected for further investigation. We detected replicating association to low frequency variants in the TNFAIP3 gene (combined p = 6.6 × 10(-6)). Even though rare variants are not well-represented and can be difficult to genotype in GWAS, our study supports the application of low frequency variant collapsing methods to genome-wide SNP datasets as a means of exploiting data that are routinely ignored.

    Funded by: Arthritis Research UK: 17552; Wellcome Trust: 064890, 081682

    Human genetics 2010;128;6;627-33

  • Towards Viral Genome Annotation Standards, Report from the 2010 NCBI Annotation Workshop

    Brister JR, Bao Y, Kuiken C, Lefkowitz EJ, Le Mercier P, Leplae R, Madupu R, Scheuermann RH, Schobel S, Seto D, Shrivastava S, Sterk P, Zeng Q, Klimke W and Tatusova T

    Viruses-Basel. 2010;2;2258-68

  • Scoring and validation of tandem MS peptide identification methods.

    Brosch M and Choudhary J

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    A variety of methods are described in the literature to assign peptide sequences to observed tandem MS data. Typically, the identified peptides are associated only with an arbitrary score that reflects the quality of the peptide-spectrum match but not with a statistically meaningful significance measure. In this chapter, we discuss why statistical significance measures can simplify and unify the interpretation of MS-based proteomic experiments. In addition, we also present available software solutions that convert scores into sound statistical measures.

    Methods in molecular biology (Clifton, N.J.) 2010;604;43-53

  • Quantifying the mechanisms of domain gain in animal proteins.

    Buljan M, Frankish A and Bateman A

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    Background: Protein domains are protein regions that are shared among different proteins and are frequently functionally and structurally independent from the rest of the protein. Novel domain combinations have a major role in evolutionary innovation. However, the relative contributions of the different molecular mechanisms that underlie domain gains in animals are still unknown. By using animal gene phylogenies we were able to identify a set of high confidence domain gain events and by looking at their coding DNA investigate the causative mechanisms.

    Results: Here we show that the major mechanism for gains of new domains in metazoan proteins is likely to be gene fusion through joining of exons from adjacent genes, possibly mediated by non-allelic homologous recombination. Retroposition and insertion of exons into ancestral introns through intronic recombination are, in contrast to previous expectations, only minor contributors to domain gains and have accounted for less than 1% and 10% of high confidence domain gain events, respectively. Additionally, exonization of previously non-coding regions appears to be an important mechanism for addition of disordered segments to proteins. We observe that gene duplication has preceded domain gain in at least 80% of the gain events.

    Conclusions: The interplay of gene duplication and domain gain demonstrates an important mechanism for fast neofunctionalization of genes.

    Genome biology 2010;11;7;R74

  • New views on natural killer cell-based immunotherapy for melanoma treatment.

    Burke S, Lakshmikanth T, Colucci F and Carbone E

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Natural killer (NK) cell-based immunotherapies treat hematopoietic malignancies, but are less effective against solid tumors. Here, we review recent data on NK cell recognition of melanoma at various stages of the disease and propose a combinatorial strategy to exploit fully the potential of NK cells. Depending on the stage of melanoma progression, NK cell-based therapies could be combined with pharmacological and T cell-based immunotherapies, to: (i) prevent lymph node metastases by redistributing cytotoxic NK cells; (ii) boost NK cell activity using chemotherapy to upregulate activating ligands on tumor cells; and (iii) target visceral metastases by transfer of autologous or allogeneic NK cells.

    Funded by: Medical Research Council; Wellcome Trust

    Trends in immunology 2010;31;9;339-45

  • The patterns and dynamics of genomic instability in metastatic pancreatic cancer.

    Campbell PJ, Yachida S, Mudie LJ, Stephens PJ, Pleasance ED, Stebbings LA, Morsberger LA, Latimer C, McLaren S, Lin ML, McBride DJ, Varela I, Nik-Zainal SA, Leroy C, Jia M, Menzies A, Butler AP, Teague JW, Griffin CA, Burton J, Swerdlow H, Quail MA, Stratton MR, Iacobuzio-Donahue C and Futreal PA

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

    Pancreatic cancer is an aggressive malignancy with a five-year mortality of 97-98%, usually due to widespread metastatic disease. Previous studies indicate that this disease has a complex genomic landscape, with frequent copy number changes and point mutations, but genomic rearrangements have not been characterized in detail. Despite the clinical importance of metastasis, there remain fundamental questions about the clonal structures of metastatic tumours, including phylogenetic relationships among metastases, the scale of ongoing parallel evolution in metastatic and primary sites, and how the tumour disseminates. Here we harness advances in DNA sequencing to annotate genomic rearrangements in 13 patients with pancreatic cancer and explore clonal relationships among metastases. We find that pancreatic cancer acquires rearrangements indicative of telomere dysfunction and abnormal cell-cycle control, namely dysregulated G1-to-S-phase transition with intact G2-M checkpoint. These initiate amplification of cancer genes and occur predominantly in early cancer development rather than the later stages of the disease. Genomic instability frequently persists after cancer dissemination, resulting in ongoing, parallel and even convergent evolution among different metastases. We find evidence that there is genetic heterogeneity among metastasis-initiating cells, that seeding metastasis may require driver mutations beyond those required for primary tumours, and that phylogenetic trees across metastases show organ-specific branches. These data attest to the richness of genetic variation in cancer, brought about by the tandem forces of genomic instability and evolutionary selection.

    Funded by: NCI NIH HHS: CA106610, CA140599, K08 CA106610, K08 CA106610-03, K08 CA106610-04, K08 CA106610-05, R01 CA140599-01, R01 CA140599-02, R01 CA140599-03; Wellcome Trust: 077012/Z/05/Z, 088340, 093867, WT088340MA

    Nature 2010;467;7319;1109-13

  • Structural modelling and comparative analysis of homologous, analogous and specific proteins from Trypanosoma cruzi versus Homo sapiens: putative drug targets for chagas' disease treatment.

    Capriles PV, Guimarães AC, Otto TD, Miranda AB, Dardenne LE and Degrave WM

    Grupo de Modelagem Molecular de Sistemas Biológicos, Laboratório Nacional de Computação Científica, LNCC/MCT, Petrópolis, CEP 25651-075, Brazil.

    Background: Trypanosoma cruzi is the etiological agent of Chagas' disease, an endemic infection that causes thousands of deaths every year in Latin America. Therapeutic options remain inefficient, demanding the search for new drugs and/or new molecular targets. Such efforts can focus on proteins that are specific to the parasite, but analogous enzymes and enzymes with a three-dimensional (3D) structure sufficiently different from the corresponding host proteins may represent equally interesting targets. In order to find these targets we used the workflows MHOLline and AnEnΠ obtaining 3D models from homologous, analogous and specific proteins of Trypanosoma cruzi versus Homo sapiens.

    Results: We applied genome wide comparative modelling techniques to obtain 3D models for 3,286 predicted proteins of T. cruzi. In combination with comparative genome analysis to Homo sapiens, we were able to identify a subset of 397 enzyme sequences, of which 356 are homologous, 3 analogous and 38 specific to the parasite.

    Conclusions: In this work, we present a set of 397 enzyme models of T. cruzi that can constitute potential structure-based drug targets to be investigated for the development of new strategies to fight Chagas' disease. The strategies presented here support the concept of structural analysis in conjunction with protein functional analysis as an interesting computational methodology to detect potential targets for structure-based rational drug design. For example, 2,4-dienoyl-CoA reductase and triacylglycerol lipase, classified as analogous proteins in relation to H. sapiens enzymes, were identified as new potential molecular targets.

    BMC genomics 2010;11;610

  • BamView: viewing mapped read alignment data in the context of the reference sequence.

    Carver T, Böhme U, Otto TD, Parkhill J and Berriman M

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    SUMMARY: BamView is an interactive Java application for visualizing the large amounts of data stored for sequence reads which are aligned against a reference genome sequence. It supports the BAM (Binary Alignment/Map) format. It can be used in a number of contexts including SNP calling and structural annotation. BamView has also been integrated into Artemis so that the reads can be viewed in the context of the nucleotide sequence and genomic features. AVAILABILITY: BamView and Artemis are freely available (under a GPL licence) for download (for MacOSX, UNIX and Windows).

    Funded by: Wellcome Trust: WT 085775/Z/08/Z

    Bioinformatics (Oxford, England) 2010;26;5;676-7

  • PLA2G7 genotype, lipoprotein-associated phospholipase A2 activity, and coronary heart disease risk in 10 494 cases and 15 624 controls of European Ancestry.

    Casas JP, Ninio E, Panayiotou A, Palmen J, Cooper JA, Ricketts SL, Sofat R, Nicolaides AN, Corsetti JP, Fowkes FG, Tzoulaki I, Kumari M, Brunner EJ, Kivimaki M, Marmot MG, Hoffmann MM, Winkler K, März W, Ye S, Stirnadel HA, Boekholdt SM, Khaw KT, Humphries SE, Sandhu MS, Hingorani AD and Talmud PJ

    Department of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, UK.

    Background: Higher lipoprotein-associated phospholipase A(2)(Lp-PLA2) activity is associated with increased risk of coronary heart disease (CHD), making Lp-PLA2 a potential therapeutic target. PLA2G7 variants associated with Lp-PLA2 activity could evaluate whether this relationship is causal.

    Methods and Results: A meta-analysis including a total of 12 studies (5 prospective, 4 case-control, 1 case-only, and 2 cross-sectional studies; n=26 118) was undertaken to examine the association of the following: (1) Lp-PLA2 activity versus cardiovascular biomarkers and risk factors and CHD events (2 prospective studies; n=4884); (2) PLA2G7 single-nucleotide polymorphisms and Lp-PLA2 activity (3 prospective, 2 case-control, 2 cross-sectional studies; up to n=6094); and (3) PLA2G7 single-nucleotide polymorphisms and angiographic coronary artery disease (2 case-control, 1 case-only study; n=4971 cases) and CHD events (5 prospective, 2 case-control studies; n=5523). Lp-PLA2 activity correlated with several CHD risk markers. Hazard ratios for CHD events for the top versus bottom quartile of Lp-PLA2 activity were 1.61 (95% confidence interval, 1.31 to 1.99) and 1.17 (95% confidence interval, 0.91 to 1.51) after adjustment for baseline traits. Of 7 single-nucleotide polymorphisms, rs1051931 (A379V) showed the strongest association with Lp-PLA2 activity, with VV subjects having 7.2% higher activity than AAs. Genotype was not associated with risk markers, angiographic coronary disease (odds ratio, 1.03; 95% confidence interval, 0.80 to 1.32), or CHD events (odds ratio, 0.98; 95% confidence interval, 0.82 to 1.17).

    Conclusions: Unlike Lp-PLA2 activity, PLA2G7 variants associated with modest effects on Lp-PLA2 activity were not associated with cardiovascular risk markers, coronary atheroma, or CHD. Larger association studies, identification of single-nucleotide polymorphisms with larger effects, or randomized trials of specific Lp-PLA2 inhibitors are needed to confirm or refute a contributory role for Lp-PLA2 in CHD.

    Funded by: AHRQ HHS: HS06516; British Heart Foundation: FS/07/011, PG98/183, RG05/014; NIA NIH HHS: AG13196; Wellcome Trust: 085475

    Circulation 2010;121;21;2284-93

  • Beyond the Genome: genomics research ten years after the human genome sequence.

    Casto AM and Amid C

    Department of Genetics, Stanford University, Stanford, CA 94305, USA.

    A report on the meeting 'Beyond the Genome', Boston, USA, 11-13 October 2010.

    Genome biology 2010;11;11;309

  • Molecular and physiological analysis of three Pseudomonas aeruginosa phages belonging to the "N4-like viruses".

    Ceyssens PJ, Brabban A, Rogge L, Lewis MS, Pickard D, Goulding D, Dougan G, Noben JP, Kropinski A, Kutter E and Lavigne R

    Division of Gene Technology, Katholieke Universiteit Leuven, Kasteelpark Arenberg, Leuven, B-3001, Belgium.

    We present a detailed analysis of the genome architecture, structural proteome and infection-related properties of three Pseudomonas phages, designated LUZ7, LIT1 and PEV2. These podoviruses encapsulate 72.5 to 74.9 kb genomes and lyse their host after 25 min aerobic infection. PEV2 can successfully infect under anaerobic conditions, but its latent period is tripled, the lysis proceeds far slower and the burst size decreases significantly. While the overall genome structure of these phages resembles the well-studied coliphage N4, these Pseudomonas phages encode a cluster of tail genes which displays significant similarity to a Pseudomonas aeruginosa (cryptic) prophage region. Using ESI-MS/MS, these tail proteins were shown to be part of the phage particle, as well as ten other proteins including a giant 370 kDa virion RNA polymerase. These phages are the first described representatives of a novel kind of obligatory lytic P. aeruginosa-infecting phages, belonging to the widespread "N4-like viruses" genus.

    Funded by: Wellcome Trust

    Virology 2010;405;1;26-30

  • Genetic loci influencing kidney function and chronic kidney disease.

    Chambers JC, Zhang W, Lord GM, van der Harst P, Lawlor DA, Sehmi JS, Gale DP, Wass MN, Ahmadi KR, Bakker SJ, Beckmann J, Bilo HJ, Bochud M, Brown MJ, Caulfield MJ, Connell JM, Cook HT, Cotlarciuc I, Davey Smith G, de Silva R, Deng G, Devuyst O, Dikkeschei LD, Dimkovic N, Dockrell M, Dominiczak A, Ebrahim S, Eggermann T, Farrall M, Ferrucci L, Floege J, Forouhi NG, Gansevoort RT, Han X, Hedblad B, Homan van der Heide JJ, Hepkema BG, Hernandez-Fuentes M, Hypponen E, Johnson T, de Jong PE, Kleefstra N, Lagou V, Lapsley M, Li Y, Loos RJ, Luan J, Luttropp K, Maréchal C, Melander O, Munroe PB, Nordfors L, Parsa A, Peltonen L, Penninx BW, Perucha E, Pouta A, Prokopenko I, Roderick PJ, Ruokonen A, Samani NJ, Sanna S, Schalling M, Schlessinger D, Schlieper G, Seelen MA, Shuldiner AR, Sjögren M, Smit JH, Snieder H, Soranzo N, Spector TD, Stenvinkel P, Sternberg MJ, Swaminathan R, Tanaka T, Ubink-Veltmaat LJ, Uda M, Vollenweider P, Wallace C, Waterworth D, Zerres K, Waeber G, Wareham NJ, Maxwell PH, McCarthy MI, Jarvelin MR, Mooser V, Abecasis GR, Lightstone L, Scott J, Navis G, Elliott P and Kooner JS

    Department of Epidemiology and Biostatistics, School of Public Health, Imperial College of London, London, UK.

    Using genome-wide association, we identify common variants at 2p12-p13, 6q26, 17q23 and 19q13 associated with serum creatinine, a marker of kidney function (P = 10(-10) to 10(-15)). Of these, rs10206899 (near NAT8, 2p12-p13) and rs4805834 (near SLC7A9, 19q13) were also associated with chronic kidney disease (P = 5.0 x 10(-5) and P = 3.6 x 10(-4), respectively). Our findings provide insight into metabolic, solute and drug-transport pathways underlying susceptibility to chronic kidney disease.

    Nature genetics 2010;42;5;373-5

  • α8-integrins are required for hippocampal long-term potentiation but not for hippocampal-dependent learning.

    Chan CS, Chen H, Bradley A, Dragatsis I, Rosenmund C and Davis RL

    Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, USA.

    Integrins are heterodimeric transmembrane cell adhesion receptors that are essential for a wide range of biological functions via cell-matrix and cell-cell interactions. Recent studies have provided evidence that some of the subunits in the integrin family are involved in synaptic and behavioral plasticity. To further understand the role of integrins in the mammalian central nervous system, we generated a postnatal forebrain and excitatory neuron-specific knockout of alpha8-integrin in the mouse. Behavioral studies showed that the mutant mice are normal in multiple hippocampal-dependent learning tasks, including a T-maze, non-match-to-place working memory task for which other integrin subunits like alpha3- and beta1-integrin are required. In contrast, mice mutant for alpha8-integrin exhibited a specific impairment of long-term potentiation (LTP) at Schaffer collateral-CA1 synapses, whereas basal synaptic transmission, paired-pulse facilitation and long-term depression (LTD) remained unaffected. Because LTP is also impaired in the absence of alpha3-integrin, our results indicate that multiple integrin molecules are required for the normal expression of LTP, and different integrins display distinct roles in behavioral and neurophysiological processes like synaptic plasticity.

    Funded by: NICHD NIH HHS: HD24064; NIMH NIH HHS: MH60420, R01 MH060420-10

    Genes, brain, and behavior 2010;9;4;402-10

  • The impact of gene expression regulation on evolution of extracellular signaling pathways.

    Charoensawan V, Adryan B, Martin S, Söllner C, Thisse B, Thisse C, Wright GJ and Teichmann SA

    Medical Research Council Laboratory of Molecular Biology, Cambridge CB2 0QH, United Kingdom.

    Extracellular protein interactions are crucial to the development of multicellular organisms because they initiate signaling pathways and enable cellular recognition cues. Despite their importance, extracellular protein interactions are often under-represented in large scale protein interaction data sets because most high throughput assays are not designed to detect low affinity extracellular interactions. Due to the lack of a comprehensive data set, the evolution of extracellular signaling pathways has remained largely a mystery. We investigated this question using a combined data set of physical pairwise interactions between zebrafish extracellular proteins, mainly from the immunoglobulin superfamily and leucine-rich repeat families, and their spatiotemporal expression profiles. We took advantage of known homology between proteins to estimate the relative rates of changes of four parameters after gene duplication, namely extracellular protein interaction, expression pattern, and the divergence of extracellular and intracellular protein sequences. We showed that change in expression profile is a major contributor to the evolution of signaling pathways followed by divergence in intracellular protein sequence, whereas extracellular sequence and interaction profiles were relatively more conserved. Rapidly evolving expression profiles will eventually drive other parameters to diverge more quickly because differentially expressed proteins get exposed to different environments and potential binding partners. This allows homologous extracellular receptors to attain specialized functions and become specific to tissues and/or developmental stages.

    Funded by: Medical Research Council: MC_U105161047; Wellcome Trust: 077108/Z/05/Z

    Molecular & cellular proteomics : MCP 2010;9;12;2666-77

  • Complete genome sequence and comparative metabolic profiling of the prototypical enteroaggregative Escherichia coli strain 042.

    Chaudhuri RR, Sebaihia M, Hobman JL, Webber MA, Leyton DL, Goldberg MD, Cunningham AF, Scott-Tucker A, Ferguson PR, Thomas CM, Frankel G, Tang CM, Dudley EG, Roberts IS, Rasko DA, Pallen MJ, Parkhill J, Nataro JP, Thomson NR and Henderson IR

    Department of Veterinary Medicine, University of Cambridge, Cambridge, United Kingdom.

    Background: Escherichia coli can experience a multifaceted life, in some cases acting as a commensal while in other cases causing intestinal and/or extraintestinal disease. Several studies suggest enteroaggregative E. coli are the predominant cause of E. coli-mediated diarrhea in the developed world and are second only to Campylobacter sp. as a cause of bacterial-mediated diarrhea. Furthermore, enteroaggregative E. coli are a predominant cause of persistent diarrhea in the developing world where infection has been associated with malnourishment and growth retardation.

    Methods: In this study we determined the complete genomic sequence of E. coli 042, the prototypical member of the enteroaggregative E. coli, which has been shown to cause disease in volunteer studies. We performed genomic and phylogenetic comparisons with other E. coli strains revealing previously uncharacterised virulence factors including a variety of secreted proteins and a capsular polysaccharide biosynthetic locus. In addition, by using Biolog Phenotype Microarrays we have provided a full metabolic profiling of E. coli 042 and the non-pathogenic lab strain E. coli K-12. We have highlighted the genetic basis for many of the metabolic differences between E. coli 042 and E. coli K-12.

    Conclusion: This study provides a genetic context for the vast amount of experimental and epidemiological data published thus far and provides a template for future diagnostic and intervention strategies.

    Funded by: Medical Research Council: G0801209

    PloS one 2010;5;1;e8801

  • Distinct clinical phenotypes associated with JAK2V617F reflect differential STAT1 signaling.

    Chen E, Beer PA, Godfrey AL, Ortmann CA, Li J, Costa-Pereira AP, Ingle CE, Dermitzakis ET, Campbell PJ and Green AR

    Cambridge Institute for Medical Research and Department of Haematology, University of Cambridge, Hills Road, Cambridge, CB2 0XY, UK.

    The JAK2V617F mutation is associated with distinct myeloproliferative neoplasms, including polycythemia vera (PV) and essential thrombocythemia (ET), but it remains unclear how it generates disparate disorders. By comparing clonally-derived mutant and wild-type cells from individual patients, we demonstrate that the transcriptional consequences of JAK2V617F are subtle, and that JAK2V617F-heterozygous erythroid cells from ET and PV patients exhibit differential interferon signaling and STAT1 phosphorylation. Increased STAT1 activity in normal CD34-positive progenitors produces an ET-like phenotype, whereas downregulation of STAT1 activity in JAK2V617F-heterozygous ET progenitors produces a PV-like phenotype. Our results illustrate the power of clonal analysis, indicate that the consequences of JAK2V617F reflect a balance between STAT5 and STAT1 activation and are relevant for other neoplasms associated with signaling pathway mutations.

    Funded by: Wellcome Trust: 088340

    Cancer cell 2010;18;5;524-35

  • Ensembl variation resources.

    Chen Y, Cunningham F, Rios D, McLaren WM, Smith J, Pritchard B, Spudich GM, Brent S, Kulesha E, Marin-Garcia P, Smedley D, Birney E and Flicek P

    European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

    Background: The maturing field of genomics is rapidly increasing the number of sequenced genomes and producing more information from those previously sequenced. Much of this additional information is variation data derived from sampling multiple individuals of a given species with the goal of discovering new variants and characterising the population frequencies of the variants that are already known. These data have immense value for many studies, including those designed to understand evolution and connect genotype to phenotype. Maximising the utility of the data requires that it be stored in an accessible manner that facilitates the integration of variation data with other genome resources such as gene annotation and comparative genomics.

    Description: The Ensembl project provides comprehensive and integrated variation resources for a wide variety of chordate genomes. This paper provides a detailed description of the sources of data and the methods for creating the Ensembl variation databases. It also explores the utility of the information by explaining the range of query options available, from using interactive web displays, to online data mining tools and connecting directly to the data servers programmatically. It gives a good overview of the variation resources and future plans for expanding the variation data within Ensembl.

    Conclusions: Variation data is an important key to understanding the functional and phenotypic differences between individuals. The development of new sequencing and genotyping technologies is greatly increasing the amount of variation data known for almost all genomes. The Ensembl variation resources are integrated into the Ensembl genome browser and provide a comprehensive way to access this data in the context of a widely used genome bioinformatics system. All Ensembl data is freely available at and from the public MySQL database server at

    Funded by: Medical Research Council; Wellcome Trust

    BMC genomics 2010;11;293

  • A cancer-derived mutation in the PSTAIRE helix of cyclin-dependent kinase 2 alters the stability of cyclin binding.

    Child ES, Hendrychová T, McCague K, Futreal A, Otyepka M and Mann DJ

    Department of Life Sciences, Imperial College, South Kensington, London SW7 2AZ, UK.

    Cyclin-dependent kinase 2 (cdk2) is a central regulator of the mammalian cell cycle. Here we describe the properties of a mutant form of cdk2 identified during large-scale sequencing of protein kinases from cancerous tissue. The mutation substituted a leucine for a proline in the PSTAIRE helix, the central motif in the interaction of the cdk with its regulatory cyclin subunit. We demonstrate that whilst the mutant cdk2 is considerably impaired in stable cyclin association, it is still able to generate an active kinase that can functionally complement defective cdks in vivo. Molecular dynamic simulations and biophysical measurements indicate that the observed biochemical properties likely stem from increased flexibility within the cyclin-binding helix.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/C510859/1, BB/D524208/1; Wellcome Trust: 077012/Z/05/Z

    Biochimica et biophysica acta 2010;1803;7;858-64

  • A Bayesian approach using covariance of single nucleotide polymorphism data to detect differences in linkage disequilibrium patterns between groups of individuals.

    Clark TG, Campino SG, Anastasi E, Auburn S, Teo YY, Small K, Rockett KA, Kwiatkowski DP and Holmes CC

    Department of Epidemiology and Public Health, London School of Hygiene and Tropical Medicine, London, UK.

    MOTIVATION: Quantifying differences in linkage disequilibrium (LD) between sub-groups can highlight genetic regions or sites under selection and/or associated with disease, and may have utility in trans-ethnic mapping studies. RESULTS: We present a novel pseudo Bayes factor (PBF) approach that assesses differences in covariance of genotype frequencies from single nucleotide polymorphism (SNP) data from a genome-wide study. The magnitude of the PBF reflects the strength of evidence for a difference, while accounting for the sample size and number of SNPs, without the requirement for permutation testing to establish statistical significance. Application of the PBF to HapMap and Gambian malaria SNP data reveals regional LD differences, some known to be under selection. AVAILABILITY AND IMPLEMENTATION: The PBF approach has been implemented in the BALD (Bayesian analysis of LD differences) C++ software, and is available from

    Funded by: Medical Research Council; Wellcome Trust

    Bioinformatics (Oxford, England) 2010;26;16;1999-2003

  • Common variants near TERC are associated with mean telomere length.

    Codd V, Mangino M, van der Harst P, Braund PS, Kaiser M, Beveridge AJ, Rafelt S, Moore J, Nelson C, Soranzo N, Zhai G, Valdes AM, Blackburn H, Mateo Leach I, de Boer RA, Kimura M, Aviv A, Wellcome Trust Case Control Consortium, Goodall AH, Ouwehand W, van Veldhuisen DJ, van Gilst WH, Navis G, Burton PR, Tobin MD, Hall AS, Thompson JR, Spector T and Samani NJ

    Department of Cardiovascular Sciences, University of Leicester, Glenfield Hospital, Leicester, UK.

    We conducted genome-wide association analyses of mean leukocyte telomere length in 2,917 individuals, with follow-up replication in 9,492 individuals. We identified an association with telomere length on 3q26 (rs12696304, combined P = 3.72 x 10(-14)) at a locus that includes TERC, which encodes the telomerase RNA component. Each copy of the minor allele of rs12696304 was associated with an approximately 75-base-pair reduction in mean telomere length, equivalent to approximately 3.6 years of age-related telomere-length attrition.

    Funded by: Biotechnology and Biological Sciences Research Council: G20234; Wellcome Trust

    Nature genetics 2010;42;3;197-9
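
    The two figures quoted in the abstract above (about 75 base pairs per minor allele, equivalent to about 3.6 years of attrition) imply an average attrition rate that the abstract does not state directly. The sketch below works it out. The per-copy additivity used for two copies is an assumption read from the phrase "each copy of the minor allele", and the function names are hypothetical.

```python
# Figures taken from the abstract (both are approximate).
BP_PER_MINOR_ALLELE = 75.0      # reduction in mean telomere length per allele copy
YEARS_PER_MINOR_ALLELE = 3.6    # equivalent years of age-related attrition

# Implied average age-related attrition rate, in base pairs per year.
implied_bp_per_year = BP_PER_MINOR_ALLELE / YEARS_PER_MINOR_ALLELE


def shortening_bp(copies):
    """Expected reduction in base pairs, assuming a purely additive per-copy effect."""
    return copies * BP_PER_MINOR_ALLELE


def age_equivalent_years(copies):
    """Same effect expressed as years of attrition, under the same additive assumption."""
    return copies * YEARS_PER_MINOR_ALLELE


print(f"implied attrition: {implied_bp_per_year:.1f} bp/year")
print(f"two copies: {shortening_bp(2):.0f} bp, {age_equivalent_years(2):.1f} years")
```

    Under these assumptions the implied rate is roughly 21 base pairs per year.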

  • The dopamine β-hydroxylase -1021C/T polymorphism is associated with the risk of Alzheimer's disease in the Epistasis Project.

    Combarros O, Warden DR, Hammond N, Cortina-Borja M, Belbin O, Lehmann MG, Wilcock GK, Brown K, Kehoe PG, Barber R, Coto E, Alvarez V, Deloukas P, Gwilliam R, Heun R, Kölsch H, Mateo I, Oulhaj A, Arias-Vásquez A, Schuur M, Aulchenko YS, Ikram MA, Breteler MM, van Duijn CM, Morgan K, Smith AD and Lehmann DJ

    Neurology Service and Centro de Investigación Biomédica en Red sobre Enfermedades Neurodegenerativas, Marqués de Valdecilla University Hospital (University of Cantabria), 39008 Santander, Spain.

    Background: The loss of noradrenergic neurones of the locus coeruleus is a major feature of Alzheimer's disease (AD). Dopamine β-hydroxylase (DBH) catalyses the conversion of dopamine to noradrenaline. Interactions have been reported between the low-activity -1021T allele (rs1611115) of DBH and polymorphisms of the pro-inflammatory cytokine genes, IL1A and IL6, contributing to the risk of AD. We therefore examined the associations with AD of the DBH -1021T allele and of the above interactions in the Epistasis Project, with 1757 cases of AD and 6294 elderly controls.

    Methods: We genotyped eight single nucleotide polymorphisms (SNPs) in the three genes, DBH, IL1A and IL6. We used logistic regression models and synergy factor analysis to examine potential interactions and associations with AD.

    Results: We found that the presence of the -1021T allele was associated with AD: odds ratio = 1.2 (95% confidence interval: 1.06-1.4, p = 0.005). This association was nearly restricted to men < 75 years old: odds ratio = 2.2 (1.4-3.3, 0.0004). We also found an interaction between the presence of DBH -1021T and the -889TT genotype (rs1800587) of IL1A: synergy factor = 1.9 (1.2-3.1, 0.005). All these results were consistent between North Europe and North Spain.

    Conclusions: Extensive, previous evidence (reviewed here) indicates an important role for noradrenaline in the control of inflammation in the brain. Thus, the -1021T allele with presumed low activity may be associated with misregulation of inflammation, which could contribute to the onset of AD. We suggest that such misregulation is the predominant mechanism of the association we report here.

    Funded by: Medical Research Council: G0400546

    BMC medical genetics 2010;11;162

  • Mutation spectrum revealed by breakpoint sequencing of human germline CNVs.

    Conrad DF, Bird C, Blackburne B, Lindsay S, Mamanova L, Lee C, Turner DJ and Hurles ME

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Precisely characterizing the breakpoints of copy number variants (CNVs) is crucial for assessing their functional impact. However, fewer than 10% of known germline CNVs have been mapped to the single-nucleotide level. We characterized the sequence breakpoints from a dataset of all CNVs detected in three unrelated individuals in previous array-based CNV discovery experiments. We used targeted hybridization-based DNA capture and 454 sequencing to sequence 324 CNV breakpoints, including 315 deletions. We observed two major breakpoint signatures: 70% of the deletion breakpoints have 1-30 bp of microhomology, whereas 33% of deletion breakpoints contain 1-367 bp of inserted sequence. The co-occurrence of microhomology and inserted sequence is low (10%), suggesting that there are at least two different mutational mechanisms. Approximately 5% of the breakpoints represent more complex rearrangements, including local microinversions, suggesting a replication-based strand switching mechanism. Despite a rich literature on DNA repair processes, reconstruction of the molecular events generating each of these mutations is not yet possible.

    Funded by: Wellcome Trust: 077014, 077014/Z/05/Z

    Nature genetics 2010;42;5;385-91

  • Origins and functional impact of copy number variation in the human genome.

    Conrad DF, Pinto D, Redon R, Feuk L, Gokcumen O, Zhang Y, Aerts J, Andrews TD, Barnes C, Campbell P, Fitzgerald T, Hu M, Ihm CH, Kristiansson K, Macarthur DG, Macdonald JR, Onyiah I, Pang AW, Robson S, Stirrups K, Valsesia A, Walter K, Wei J, Wellcome Trust Case Control Consortium, Tyler-Smith C, Carter NP, Lee C, Scherer SW and Hurles ME

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA UK.

    Structural variations of DNA greater than 1 kilobase in size account for most bases that vary among human genomes, but are still relatively under-ascertained. Here we use tiling oligonucleotide microarrays, comprising 42 million probes, to generate a comprehensive map of 11,700 copy number variations (CNVs) greater than 443 base pairs, of which most (8,599) have been validated independently. For 4,978 of these CNVs, we generated reference genotypes from 450 individuals of European, African or East Asian ancestry. The predominant mutational mechanisms differ among CNV size classes. Retrotransposition has duplicated and inserted some coding and non-coding DNA segments randomly around the genome. Furthermore, by correlation with known trait-associated single nucleotide polymorphisms (SNPs), we identified 30 loci with CNVs that are candidates for influencing disease susceptibility. Despite this, having assessed the completeness of our map and the patterns of linkage disequilibrium between CNVs and SNPs, we conclude that, for complex traits, the heritability void left by genome-wide association studies will not be accounted for by common CNVs.

    Funded by: Canadian Institutes of Health Research; NHGRI NIH HHS: HG004221; NIGMS NIH HHS: GM081533; Wellcome Trust: 077006/Z/05/Z, 077008, 077009, 077014

    Nature 2010;464;7289;704-12

  • Community-associated methicillin-resistant Staphylococcus aureus infections.

    Cooke FJ and Brown NM

    Clinical Microbiology and Public Health Laboratory, Health Protection Agency, Addenbrooke's Hospital, Cambridge CB2 0QW, UK.

    Community-associated methicillin-resistant Staphylococcus aureus (CA-MRSA) has been recognized for over a decade, and usually refers to MRSA identified in previously healthy individuals with no recognized MRSA risk factors. Infections range from minor skin and soft tissue infections, through to severe pneumonia and necrotizing fasciitis. This review summarizes the current data on the epidemiology and molecular features of CA-MRSA, in addition to diagnosis and therapeutic measures. We also refer to current national guidelines for the management of these infections. Areas of agreement include the important genotypic and phenotypic differences of community MRSA strains compared with hospital strains. Areas of controversy include the precise epidemiological definition of community-acquired/associated MRSA. Fortunately, true CA-MRSA can be differentiated from hospital MRSA by molecular techniques, as discussed herein. Recent interest has focused on the changing epidemiology of CA-MRSA. Worldwide, CA-MRSA is now seen outside of the initial specific population groups, and in the USA, the successful USA300 community strain is beginning to spread back into hospitals. Reasons why USA300 remains relatively uncommon in Europe are unclear. Topics timely for research include the investigation of the epidemiology of infections and evolutionary genomics.

    British medical bulletin 2010;94;215-27

  • Microfluidics for the upstream pipeline of DNA sequencing--a worthy application?

    Coupland P

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK CB10 1SA.

    Technological advances and economic investment into DNA sequencing during this decade have provided the industry of genome sequencing with a suite of dedicated sequencing machines capable of rapidly generating vast quantities of sequence data. This next generation of equipment for DNA sequencing is freely available and is utilised more commonly; this has led to the traditional bottle-neck in the sequencing pipeline transferring from the sequencing process, i.e. reading the bases on the older capillary based machines, to the upstream processes of sample preparation, i.e. creating the DNA libraries that are to be read. Essentially, advancement in sequencing technology is running faster than the equivalent for sample preparation technology and, without a remedy, we will no longer be able to provide samples quickly enough to keep the sequencing machines running at full capacity.

    Lab on a chip 2010;10;5;544-7

  • Strong genetic evidence for a selective influence of GABAA receptors on a component of the bipolar disorder phenotype.

    Craddock N, Jones L, Jones IR, Kirov G, Green EK, Grozeva D, Moskvina V, Nikolov I, Hamshere ML, Vukcevic D, Caesar S, Gordon-Smith K, Fraser C, Russell E, Norton N, Breen G, St Clair D, Collier DA, Young AH, Ferrier IN, Farmer A, McGuffin P, Holmans PA, Wellcome Trust Case Control Consortium (WTCCC), Donnelly P, Owen MJ and O'Donovan MC

    Department of Psychological Medicine, School of Medicine, Cardiff University, Cardiff, UK.

    Despite compelling evidence for a major genetic contribution to risk of bipolar mood disorder, conclusive evidence implicating specific genes or pathophysiological systems has proved elusive. In part this is likely to be related to the unknown validity of current phenotype definitions and consequent aetiological heterogeneity of samples. In the recent Wellcome Trust Case Control Consortium genome-wide association analysis of bipolar disorder (1868 cases, 2938 controls) one of the most strongly associated polymorphisms lay within the gene encoding the GABA(A) receptor beta1 subunit, GABRB1. Aiming to increase biological homogeneity, we sought the diagnostic subset that showed the strongest signal at this polymorphism and used this to test for independent evidence of association with other members of the GABA(A) receptor gene family. The index signal was significantly enriched in the 279 cases meeting Research Diagnostic Criteria for schizoaffective disorder, bipolar type (P=3.8 x 10(-6)). Independently, these cases showed strong evidence that variation in GABA(A) receptor genes influences risk for this phenotype (independent system-wide P=6.6 x 10(-5)) with association signals also at GABRA4, GABRB3, GABRA5 and GABRR3. [corrected] Our findings have the potential to inform understanding of presentation, pathogenesis and nosology of bipolar disorders. Our method of phenotype refinement may be useful in studies of other complex psychiatric and non-psychiatric disorders.

    Funded by: Medical Research Council: G0000934; Wellcome Trust: 079643

    Molecular psychiatry 2010;15;2;146-53

  • Automated design of genomic Southern blot probes.

    Croning MD, Fricker DG, Komiyama NH and Grant SG

    Genes to Cognition Programme, The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    Background: Southern blotting is a DNA analysis technique that has found widespread application in molecular biology. It has been used for gene discovery and mapping and has diagnostic and forensic applications, including mutation detection in patient samples and DNA fingerprinting in criminal investigations. Southern blotting has been employed as the definitive method for detecting transgene integration, and successful homologous recombination in gene targeting experiments. The technique employs a labeled DNA probe to detect a specific DNA sequence in a complex DNA sample that has been separated by restriction-digest and gel electrophoresis. Critically, for the technique to succeed, the probe must be unique to the target locus so as not to cross-hybridize to other endogenous DNA within the sample. Investigators routinely employ a manual approach to probe design. A genome browser is used to extract DNA sequence from the locus of interest, which is searched against the target genome using a BLAST-like tool. Ideally a single perfect match is obtained to the target, with little cross-reactivity caused by homologous DNA sequence present in the genome and/or repetitive and low-complexity elements in the candidate probe. This is a labor-intensive process often requiring several attempts to find a suitable probe for laboratory testing.

    Results: We have written an informatic pipeline to automatically design genomic Southern blot probes that specifically attempts to optimize the resultant probe, employing a brute-force strategy of generating many candidate probes of acceptable length in the user-specified design window, searching all against the target genome, then scoring and ranking the candidates by uniqueness and repetitive DNA element content. Using these in silico measures we can automatically design probes that we predict to perform as well as, or better than, our previous manual designs, while considerably reducing design time. We went on to experimentally validate a number of these automated designs by Southern blotting. The majority of probes we tested performed well, confirming our in silico prediction methodology and the general usefulness of the software for automated genomic Southern probe design.

    Conclusions: Software and supplementary information are freely available at:

    Funded by: Wellcome Trust

    BMC genomics 2010;11;74
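
    The brute-force strategy described in the abstract above (enumerate candidate probes in a design window, search each against the genome, then rank by uniqueness and repeat content) can be illustrated with a toy sketch. This is not the authors' pipeline: the exact-match search stands in for a BLAST-like tool, lowercase bases stand in for soft-masked repeats, and every name below (`Candidate`, `count_hits`, `repeat_fraction`, `design_probes`) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    start: int              # 0-based start of the probe in the genome string
    length: int             # probe length in bases
    genome_hits: int        # number of matches in the genome (1 means unique)
    repeat_fraction: float  # fraction of probe bases flagged as repeat


def count_hits(probe, genome):
    """Toy stand-in for a genome search: count exact, overlapping, case-insensitive matches."""
    probe, genome = probe.upper(), genome.upper()
    hits, pos = 0, genome.find(probe)
    while pos != -1:
        hits += 1
        pos = genome.find(probe, pos + 1)
    return hits


def repeat_fraction(probe):
    """Assumption: repeats are soft-masked, i.e. written in lowercase."""
    return sum(base.islower() for base in probe) / len(probe)


def design_probes(genome, window_start, window_end, min_len, max_len, step=1):
    """Enumerate every candidate in [window_start, window_end), score it, and rank."""
    candidates = []
    for length in range(min_len, max_len + 1):
        for start in range(window_start, window_end - length + 1, step):
            probe = genome[start:start + length]
            candidates.append(
                Candidate(start, length, count_hits(probe, genome), repeat_fraction(probe))
            )
    # Fewest genome hits first, then lowest repeat content, then leftmost start.
    return sorted(candidates, key=lambda c: (c.genome_hits, c.repeat_fraction, c.start))


# Tiny made-up "genome": a duplicated segment flanking a unique core,
# with the second copy soft-masked as a repeat.
genome = "ACGTACGT" + "GATTACAGGC" + "acgtacgt"
ranked = design_probes(genome, 0, len(genome), min_len=6, max_len=6)
best = ranked[0]
print(best, genome[best.start:best.start + best.length])
```

    Sorting on a tuple keeps the ranking rule explicit and easy to change. A real pipeline would replace the exact-match count with alignment scores and use annotated repeat tracks.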

  • A rapid and scalable method for selecting recombinant mouse monoclonal antibodies.

    Crosnier C, Staudt N and Wright GJ

    Cell Surface Signalling Laboratory, Wellcome Trust Sanger Institute, Cambridge CB10 1HH, UK.

    Background: Monoclonal antibodies with high affinity and selectivity that work on wholemount fixed tissues are valuable reagents to the cell and developmental biologist, and yet isolating them remains a long and unpredictable process. Here we report a rapid and scalable method to select and express recombinant mouse monoclonal antibodies that are essentially equivalent to those secreted by parental IgG-isotype hybridomas.

    Results: Increased throughput was achieved by immunizing mice with pools of antigens and cloning - from small numbers of hybridoma cells - the functionally rearranged light and heavy chains into a single expression plasmid. By immunizing with the ectodomains of zebrafish cell surface receptor proteins expressed in mammalian cells and screening for formalin-resistant epitopes, we selected antibodies that gave expected staining patterns on wholemount fixed zebrafish embryos.

    Conclusions: This method can be used to quickly select several high quality monoclonal antibodies from a single immunized mouse and facilitates their distribution using plasmids.

    Funded by: NINDS NIH HHS: R01NS063400; Wellcome Trust: 077108/Z/05/Z

    BMC biology 2010;8;76

  • A commensal gone bad: complete genome sequence of the prototypical enterotoxigenic Escherichia coli strain H10407.

    Crossman LC, Chaudhuri RR, Beatson SA, Wells TJ, Desvaux M, Cunningham AF, Petty NK, Mahon V, Brinkley C, Hobman JL, Savarino SJ, Turner SM, Pallen MJ, Penn CW, Parkhill J, Turner AK, Johnson TJ, Thomson NR, Smith SG and Henderson IR

    The Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridge, United Kingdom.

    In most cases, Escherichia coli exists as a harmless commensal organism, but it may on occasion cause intestinal and/or extraintestinal disease. Enterotoxigenic E. coli (ETEC) is the predominant cause of E. coli-mediated diarrhea in the developing world and is responsible for a significant portion of pediatric deaths. In this study, we determined the complete genomic sequence of E. coli H10407, a prototypical strain of enterotoxigenic E. coli, which reproducibly elicits diarrhea in human volunteer studies. We performed genomic and phylogenetic comparisons with other E. coli strains, revealing that the chromosome is closely related to that of the nonpathogenic commensal strain E. coli HS and to those of the laboratory strains E. coli K-12 and C. Furthermore, these analyses demonstrated that there were no chromosomally encoded factors unique to any sequenced ETEC strains. Comparison of the E. coli H10407 plasmids with those from several ETEC strains revealed that the plasmids had a mosaic structure but that several loci were conserved among ETEC strains. This study provides a genetic context for the vast amount of experimental and epidemiological data that have been published.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/C510075/1; Medical Research Council: G0801209; Wellcome Trust

    Journal of bacteriology 2010;192;21;5822-31

  • Studying bacterial transcriptomes using RNA-seq.

    Croucher NJ and Thomson NR

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, Cambridgeshire, CB10 1SA, UK.

    Genome-wide studies of bacterial gene expression are shifting from microarray technology to second generation sequencing platforms. RNA-seq has a number of advantages over hybridization-based techniques, such as annotation-independent detection of transcription, improved sensitivity and increased dynamic range. Early studies have uncovered a wealth of novel coding sequences and non-coding RNA, and are revealing a transcriptional landscape that increasingly mirrors that of eukaryotes. Already basic RNA-seq protocols have been improved and adapted to looking at particular aspects of RNA biology, often with an emphasis on non-coding RNAs, and further refinements to current techniques will improve our understanding of gene expression, and genome content, in the future.

    Funded by: Wellcome Trust

    Current opinion in microbiology 2010;13;5;619-24

  • Identification of attractive drug targets in neglected-disease pathogens using an in silico approach.

    Crowther GJ, Shanmugam D, Carmona SJ, Doyle MA, Hertz-Fowler C, Berriman M, Nwaka S, Ralph SA, Roos DS, Van Voorhis WC and Agüero F

    Division of Allergy and Infectious Diseases, Department of Medicine, University of Washington, Seattle, Washington, United States of America.

    Background: The increased sequencing of pathogen genomes and the subsequent availability of genome-scale functional datasets are expected to guide the experimental work necessary for target-based drug discovery. However, a major bottleneck in this has been the difficulty of capturing and integrating relevant information in an easily accessible format for identifying and prioritizing potential targets. The open-access resource facilitates drug target prioritization for major tropical disease pathogens such as the mycobacteria Mycobacterium leprae and Mycobacterium tuberculosis; the kinetoplastid protozoans Leishmania major, Trypanosoma brucei, and Trypanosoma cruzi; the apicomplexan protozoans Plasmodium falciparum, Plasmodium vivax, and Toxoplasma gondii; and the helminths Brugia malayi and Schistosoma mansoni.

    Here we present strategies to prioritize pathogen proteins based on whether their properties meet criteria considered desirable in a drug target. These criteria are based upon both sequence-derived information (e.g., molecular mass) and functional data on expression, essentiality, phenotypes, metabolic pathways, assayability, and druggability. This approach also highlights the fact that data for many relevant criteria are lacking in less-studied pathogens (e.g., helminths), and we demonstrate how this can be partially overcome by mapping data from homologous genes in well-studied organisms. We also show how individual users can easily upload external datasets and integrate them with existing data in to generate highly customized ranked lists of potential targets.

    Using the datasets and the tools available in, we have generated illustrative lists of potential drug targets in seven tropical disease pathogens. While these lists are broadly consistent with the research community's current interest in certain specific proteins, and suggest novel target candidates that may merit further study, the lists can easily be modified in a user-specific manner, either by adjusting the weights for chosen criteria or by changing the criteria that are included.

    PLoS neglected tropical diseases 2010;4;8;e804
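
    The prioritization idea in the abstract above (score each protein against weighted criteria, then let users adjust the weights or the set of criteria) can be sketched as a simple weighted sum. This is only an illustration of the general technique, not the resource's actual scoring code; the protein names, criteria and weights are all made up.

```python
def rank_targets(targets, weights):
    """Rank targets by the summed weights of the criteria each one satisfies.

    targets: dict mapping a target name to a dict of criterion -> bool
    weights: dict mapping a criterion to its numeric weight
    Criteria missing for a target count as not satisfied.
    """
    scored = []
    for name, properties in targets.items():
        score = sum(w for criterion, w in weights.items() if properties.get(criterion, False))
        scored.append((name, score))
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical proteins and criteria, for illustration only.
targets = {
    "protein_A": {"essential": True, "druggable": True, "assayable": False},
    "protein_B": {"essential": True, "druggable": False, "assayable": True},
    "protein_C": {"essential": False, "druggable": False, "assayable": True},
}

default_ranking = rank_targets(targets, {"essential": 3, "druggable": 2, "assayable": 1})
assay_first = rank_targets(targets, {"essential": 1, "druggable": 1, "assayable": 5})

print(default_ranking)
print(assay_first)
```

    Changing only the weights reorders the list, which mirrors the user-specific customization the abstract describes.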

  • High-throughput analysis of candidate imprinted genes and allele-specific gene expression in the human term placenta.

    Daelemans C, Ritchie ME, Smits G, Abu-Amero S, Sudbery IM, Forrest MS, Campino S, Clark TG, Stanier P, Kwiatkowski D, Deloukas P, Dermitzakis ET, Tavaré S, Moore GE and Dunham I

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK.

    Background: Imprinted genes show expression from one parental allele only and are important for development and behaviour. This extreme mode of allelic imbalance has been described for approximately 56 human genes. Imprinting status is often disrupted in cancer and dysmorphic syndromes. More subtle variation of gene expression, that is not parent-of-origin specific, termed 'allele-specific gene expression' (ASE) is more common and may give rise to milder phenotypic differences. Using two allele-specific high-throughput technologies alongside bioinformatics predictions, normal term human placenta was screened to find new imprinted genes and to ascertain the extent of ASE in this tissue.

    Results: Twenty-three family trios of placental cDNA, placental genomic DNA (gDNA) and gDNA from both parents were tested for 130 candidate genes with the Sequenom MassArray system. Six genes were found differentially expressed but none imprinted. The Illumina ASE BeadArray platform was then used to test 1536 SNPs in 932 genes. The array was enriched for the human orthologues of 124 mouse candidate genes from bioinformatics predictions and 10 human candidate imprinted genes from EST database mining. After quality control pruning, a total of 261 informative SNPs (214 genes) remained for analysis. Imprinting with maternal expression was demonstrated for the lymphocyte imprinted gene ZNF331 in human placenta. Two potential differentially methylated regions (DMRs) were found in the vicinity of ZNF331. None of the bioinformatically predicted candidates tested showed imprinting except for a skewed allelic expression in a parent-specific manner observed for PHACTR2, a neighbour of the imprinted PLAGL1 gene. ASE was detected for two or more individuals in 39 candidate genes (18%).

    Conclusions: Both Sequenom and Illumina assays were sensitive enough to study imprinting and strong allelic bias. Previous bioinformatics approaches were not predictive of new imprinted genes in the human term placenta. ZNF331 is imprinted in human term placenta and might be a new ubiquitously imprinted gene, part of a primate-specific locus. Demonstration of partial imprinting of PHACTR2 calls for re-evaluation of the allelic pattern of expression for the PHACTR2-PLAGL1 locus. ASE was common in human term placenta.

    Funded by: Medical Research Council; Wellcome Trust

    BMC genetics 2010;11;25

  • Systematic sequencing of renal carcinoma reveals inactivation of histone modifying genes.

    Dalgliesh GL, Furge K, Greenman C, Chen L, Bignell G, Butler A, Davies H, Edkins S, Hardy C, Latimer C, Teague J, Andrews J, Barthorpe S, Beare D, Buck G, Campbell PJ, Forbes S, Jia M, Jones D, Knott H, Kok CY, Lau KW, Leroy C, Lin ML, McBride DJ, Maddison M, Maguire S, McLay K, Menzies A, Mironenko T, Mulderrig L, Mudie L, O'Meara S, Pleasance E, Rajasingham A, Shepherd R, Smith R, Stebbings L, Stephens P, Tang G, Tarpey PS, Turrell K, Dykema KJ, Khoo SK, Petillo D, Wondergem B, Anema J, Kahnoski RJ, Teh BT, Stratton MR and Futreal PA

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

    Clear cell renal cell carcinoma (ccRCC) is the most common form of adult kidney cancer, characterized by the presence of inactivating mutations in the VHL gene in most cases, and by infrequent somatic mutations in known cancer genes. To determine further the genetics of ccRCC, we have sequenced 101 cases through 3,544 protein-coding genes. Here we report the identification of inactivating mutations in two genes encoding enzymes involved in histone modification, namely SETD2, a histone H3 lysine 36 methyltransferase, and JARID1C (also known as KDM5C), a histone H3 lysine 4 demethylase, as well as mutations in the histone H3 lysine 27 demethylase, UTX (KDM6A), that we recently reported. The results highlight the role of mutations in components of the chromatin modification machinery in human cancer. Furthermore, NF2 mutations were found in non-VHL mutated ccRCC, and several other probable cancer genes were identified. These results indicate that substantial genetic heterogeneity exists in a cancer type dominated by mutations in a single gene, and that systematic screens will be key to fully determining the somatic genetic architecture of cancer.

    Funded by: Wellcome Trust: 077012, 077012/Z/05/Z, 082359, 088340, 093867

    Nature 2010;463;7279;360-3

  • Multi-platform next-generation sequencing of the domestic turkey (Meleagris gallopavo): genome assembly and analysis.

    Dalloul RA, Long JA, Zimin AV, Aslam L, Beal K, Blomberg Le Ann, Bouffard P, Burt DW, Crasta O, Crooijmans RP, Cooper K, Coulombe RA, De S, Delany ME, Dodgson JB, Dong JJ, Evans C, Frederickson KM, Flicek P, Florea L, Folkerts O, Groenen MA, Harkins TT, Herrero J, Hoffmann S, Megens HJ, Jiang A, de Jong P, Kaiser P, Kim H, Kim KW, Kim S, Langenberger D, Lee MK, Lee T, Mane S, Marcais G, Marz M, McElroy AP, Modise T, Nefedov M, Notredame C, Paton IR, Payne WS, Pertea G, Prickett D, Puiu D, Qioa D, Raineri E, Ruffier M, Salzberg SL, Schatz MC, Scheuring C, Schmidt CJ, Schroeder S, Searle SM, Smith EJ, Smith J, Sonstegard TS, Stadler PF, Tafer H, Tu ZJ, Van Tassell CP, Vilella AJ, Williams KP, Yorke JA, Zhang L, Zhang HB, Zhang X, Zhang Y and Reed KM

    Avian Immunobiology Laboratory, Department of Animal and Poultry Sciences, Virginia Tech, Blacksburg, Virginia, United States of America.

    A synergistic combination of two next-generation sequencing platforms with a detailed comparative BAC physical contig map provided a cost-effective assembly of the genome sequence of the domestic turkey (Meleagris gallopavo). Heterozygosity of the sequenced source genome allowed discovery of more than 600,000 high quality single nucleotide variants. Despite this heterozygosity, the current genome assembly (∼1.1 Gb) includes 917 Mb of sequence assigned to specific turkey chromosomes. Annotation identified nearly 16,000 genes, with 15,093 recognized as protein coding and 611 as non-coding RNA genes. Comparative analysis of the turkey, chicken, and zebra finch genomes, and comparing avian to mammalian species, supports the characteristic stability of avian genomes and identifies genes unique to the avian lineage. Clear differences are seen in number and variety of genes of the avian immune system where expansions and novel genes are less frequent than examples of gene loss. The turkey genome sequence provides resources to further understand the evolution of vertebrate genomes and genetic variation underlying economically important quantitative traits in poultry. This integrated approach may be a model for providing both gene and chromosome level assemblies of other species with agricultural, ecological, and evolutionary interest.

    Funded by: NHGRI NIH HHS: #R01-HG002945, R01 HG006677-12; NIGMS NIH HHS: R01 GM083873-08, R01 GM083873-09; NLM NIH HHS: #R01-LM006845, R01 LM006845-09, R01 LM006845-10, R01 LM006845-11

    PLoS biology 2010;8;9

  • FLU, an amino acid substitution model for influenza proteins.

    Dang CC, Le QS, Gascuel O and Le VS

    College of Technology, Vietnam National University Hanoi, Cau Giay, Hanoi, Vietnam.

    Background: The amino acid substitution model is the core component of many protein analysis systems such as sequence similarity search, sequence alignment, and phylogenetic inference. Although several general amino acid substitution models have been estimated from large and diverse protein databases, they remain inappropriate for analyzing specific species, e.g., viruses. Emerging epidemics of influenza viruses raise the need for comprehensive studies of these dangerous viruses. We propose an influenza-specific amino acid substitution model to enhance the understanding of the evolution of influenza viruses.

    Results: A maximum likelihood approach was applied to estimate an amino acid substitution model (FLU) from approximately 113,000 influenza protein sequences, consisting of approximately 20 million residues. FLU outperforms 14 widely used models in constructing maximum likelihood phylogenetic trees for the majority of influenza protein alignments. On average, FLU gains approximately 42 log likelihood points with an alignment of 300 sites. Moreover, topologies of trees constructed using FLU and other models are frequently different. FLU does indeed have an impact on likelihood improvement as well as tree topologies. It was implemented in PhyML and can be downloaded from or included in PhyML 3.0 server at

    Conclusions: FLU should be useful for any influenza protein analysis system which requires an accurate description of amino acid substitutions.

    BMC evolutionary biology 2010;10;99

  • The crystal structure of a bacterial Sufu-like protein defines a novel group of bacterial proteins that are similar to the N-terminal domain of human Sufu.

    Das D, Finn RD, Abdubek P, Astakhova T, Axelrod HL, Bakolitsa C, Cai X, Carlton D, Chen C, Chiu HJ, Chiu M, Clayton T, Deller MC, Duan L, Ellrott K, Farr CL, Feuerhelm J, Grant JC, Grzechnik A, Han GW, Jaroszewski L, Jin KK, Klock HE, Knuth MW, Kozbial P, Krishna SS, Kumar A, Lam WW, Marciano D, Miller MD, Morse AT, Nigoghossian E, Nopakun A, Okach L, Puckett C, Reyes R, Tien HJ, Trame CB, van den Bedem H, Weekes D, Wooten T, Xu Q, Yeh A, Zhou J, Hodgson KO, Wooley J, Elsliger MA, Deacon AM, Godzik A, Lesley SA and Wilson IA

    Joint Center for Structural Genomics.

    Sufu (Suppressor of Fused), a two-domain protein, plays a critical role in regulating Hedgehog signaling and is conserved from flies to humans. A few bacterial Sufu-like proteins have previously been identified based on sequence similarity to the N-terminal domain of eukaryotic Sufu proteins, but none have been structurally or biochemically characterized and their function in bacteria is unknown. We have determined the crystal structure of a more distantly related Sufu-like homolog, NGO1391 from Neisseria gonorrhoeae, at 1.4 Å resolution, which provides the first biophysical characterization of a bacterial Sufu-like protein. The structure revealed a striking similarity to the N-terminal domain of human Sufu (r.m.s.d. of 2.6 Å over 93% of the NGO1391 protein), despite an extremely low sequence identity of ∼15%. Subsequent sequence analysis revealed that NGO1391 defines a new subset of smaller, Sufu-like proteins that are present in ∼200 bacterial species and has resulted in expansion of the SUFU (PF05076) family in Pfam.

    Funded by: NIGMS NIH HHS: U54 GM074898; Wellcome Trust: WT077044/Z/05/Z

    Protein science : a publication of the Protein Society 2010;19;11;2131-40

  • The structure of BVU2987 from Bacteroides vulgatus reveals a superfamily of bacterial periplasmic proteins with possible inhibitory function.

    Das D, Finn RD, Carlton D, Miller MD, Abdubek P, Astakhova T, Axelrod HL, Bakolitsa C, Chen C, Chiu HJ, Chiu M, Clayton T, Deller MC, Duan L, Ellrott K, Ernst D, Farr CL, Feuerhelm J, Grant JC, Grzechnik A, Han GW, Jaroszewski L, Jin KK, Klock HE, Knuth MW, Kozbial P, Krishna SS, Kumar A, Marciano D, McMullan D, Morse AT, Nigoghossian E, Nopakun A, Okach L, Puckett C, Reyes R, Rife CL, Sefcovic N, Tien HJ, Trame CB, van den Bedem H, Weekes D, Wooten T, Xu Q, Hodgson KO, Wooley J, Elsliger MA, Deacon AM, Godzik A, Lesley SA and Wilson IA

    Stanford Synchrotron Radiation Lightsource, SLAC National Accelerator Laboratory, Menlo Park, CA, USA.

    Proteins that contain the DUF2874 domain constitute a new Pfam family PF11396. Members of this family have predominantly been identified in microbes found in the human gut and oral cavity. The crystal structure of one member of this family, BVU2987 from Bacteroides vulgatus, has been determined, revealing a β-lactamase inhibitor protein-like structure with a tandem repeat of domains. Sequence analysis and structural comparisons reveal that BVU2987 and other DUF2874 proteins are related to β-lactamase inhibitor protein, PepSY and SmpA_OmlA proteins and hence are likely to function as inhibitory proteins.

    Funded by: NIGMS NIH HHS: U54 GM074898; Wellcome Trust: WT077044/Z/05/Z

    Acta crystallographica. Section F, Structural biology and crystallization communications 2010;66;Pt 10;1265-73

  • Analysis of TBC1D4 in patients with severe insulin resistance.

    Dash S, Langenberg C, Fawcett KA, Semple RK, Romeo S, Sharp S, Sano H, Lienhard GE, Rochford JJ, Howlett T, Massoud AF, Hindmarsh P, Howell SJ, Wilkinson RJ, Lyssenko V, Groop L, Baroni MG, Barroso I, Wareham NJ, O'Rahilly S and Savage DB

    Funded by: Medical Research Council: G0600414, G0800203, MC_U106179471, MC_U117588499; NIDDK NIH HHS: DK25336, R01 DK025336, R56 DK025336; Wellcome Trust: 072070, 077016, 088316

    Diabetologia 2010;53;6;1239-42

  • Proapoptotic Rassf1A/Mst1 signaling in cardiac fibroblasts is protective against pressure overload in mice.

    Del Re DP, Matsuda T, Zhai P, Gao S, Clark GJ, Van Der Weyden L and Sadoshima J

    Department of Cell Biology and Molecular Medicine, Cardiovascular Research Institute, New Jersey Medical School, University of Medicine and Dentistry of New Jersey, Newark, New Jersey 07103-2714, USA.

    Mammalian sterile 20-like kinase 1 (Mst1) is a mammalian homolog of Drosophila Hippo, the master regulator of cell death, proliferation, and organ size in flies. It is the chief component of the mammalian Hippo pathway and promotes apoptosis and inhibits compensatory cardiac hypertrophy, playing a critical role in mediating heart failure. How Mst1 is regulated, however, remains unclear. Using genetically altered mice in which expression of the tumor suppressor Ras-association domain family 1 isoform A (Rassf1A) was modulated in a cell type-specific manner, we demonstrate here that Rassf1A is an endogenous activator of Mst1 in the heart. Although the Rassf1A/Mst1 pathway promoted apoptosis in cardiomyocytes, thereby playing a detrimental role, the same pathway surprisingly inhibited fibroblast proliferation and cardiac hypertrophy through both cell-autonomous and autocrine/paracrine mechanisms, playing a protective role during pressure overload. In cardiac fibroblasts, the Rassf1A/Mst1 pathway negatively regulated TNF-α, a key mediator of hypertrophy, fibrosis, and resulting cardiac dysfunction. These results suggest that the functional consequence of activating the proapoptotic Rassf1A/Mst1 pathway during pressure overload is cell type dependent in the heart and that suppressing this mechanism in cardiac fibroblasts could be detrimental.

    Funded by: NHLBI NIH HHS: HL099148, HL59139, HL67724, HL69020, HL91469, P01 HL059139-060011, P01 HL069020-080007, P01 HL069020-090007, P01 HL069020-100007, R01 HL067724-08, R01 HL067724-09, R01 HL067724-10, R01 HL067724-11, R01 HL091469-02, R01 HL091469-04, R01 HL102738-02; NIA NIH HHS: AG27211, R01 AG023039-08

    The Journal of clinical investigation 2010;120;10;3555-67

  • In vivo composition of NMDA receptor signaling complexes differs between membrane subdomains and is modulated by PSD-95 and PSD-93.

    Delint-Ramirez I, Fernández E, Bayés A, Kicsi E, Komiyama NH and Grant SG

    Genes to Cognition Programme, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire CB10 1SA, United Kingdom.

    Lipid rafts are dynamic membrane microdomains enriched in cholesterol and sphingolipids involved in the compartmentalization of signaling pathways, trafficking and sorting of proteins. At synapses, the glutamatergic NMDA receptor and its cytoplasmic scaffold protein PSD-95 move between postsynaptic density (PSD) and rafts following learning or ischemia. However, it is not known whether the signaling complexes formed by these proteins differ in rafts, nor which molecular mechanisms govern their localization. To examine these issues in vivo we used mice carrying genetically encoded tags for purification of protein complexes and specific mutations in NMDA receptors, PSD-95 and other postsynaptic scaffold proteins. Isolation of PSD-95 complexes from mice carrying tandem affinity purification tags showed differential composition in lipid rafts, postsynaptic density and detergent-soluble fractions. Raft PSD-95 complexes showed less CaMKIIα and SynGAP and enrichment in Src and Arc/Arg3.1 compared with PSD complexes. Mice carrying knock-outs of PSD-95 or PSD-93 show a key role for PSD-95 in localizing NR2A-containing NMDA receptor complexes to rafts. Deletion of the NR2A C terminus or the C-terminal valine residue of NR2B, which prevents all PDZ interactions, reduced the NR1 association with rafts. Interestingly, the deletion of the NR2B valine residue increased the total amount of lipid rafts. These data show critical roles for scaffold proteins and their interactions with NMDA receptor subunits in organizing the differential expression in rafts and postsynaptic densities of synaptic signaling complexes.

    Funded by: Wellcome Trust: 066717

    The Journal of neuroscience : the official journal of the Society for Neuroscience 2010;30;24;8162-70

  • Differential DNA methylation as a tool for noninvasive prenatal diagnosis (NIPD) of X chromosome aneuploidies.

    Della Ragione F, Mastrovito P, Campanile C, Conti A, Papageorgiou EA, Hultén MA, Patsalis PC, Carter NP and D'Esposito M

    Institute of Genetics and Biophysics A. Buzzati Traverso, Naples, Italy.

    The demographic tendency in industrial countries to delay childbearing, coupled with the maternal age effect in common chromosomal aneuploidies and the risk to the fetus of invasive prenatal diagnosis, are potent drivers for the development of strategies for noninvasive prenatal diagnosis. One breakthrough has been the discovery of differentially methylated cell-free fetal DNA in the maternal circulation. We describe novel bisulfite conversion- and methylation-sensitive enzyme digestion DNA methylation-related approaches that we used to diagnose Turner syndrome from first trimester samples. We used an X-linked marker, EF3, and an autosomal marker, RASSF1A, to discriminate between placental and maternal blood cell DNA using real-time methylation-specific PCR after bisulfite conversion and real-time PCR after methylation-sensitive restriction digestion. By normalizing EF3 amplifications versus RASSF1A outputs, we were able to calculate sex chromosome/autosome ratios in chorionic villus samples, thus permitting us to correctly diagnose Turner syndrome. The identification of this new marker coupled with the strategy outlined here may be instrumental in the development of an efficient, noninvasive method of diagnosis of sex chromosome aneuploidies in plasma samples.

    Funded by: Wellcome Trust: WT077008

    The Journal of molecular diagnostics : JMD 2010;12;6;797-807

  • Protein variation in blood-dwelling schistosome worms generated by differential splicing of micro-exon gene transcripts.

    DeMarco R, Mathieson W, Manuel SJ, Dillon GP, Curwen RS, Ashton PD, Ivens AC, Berriman M, Verjovski-Almeida S and Wilson RA

    Department of Biology, University of York, York YO10 5YW, United Kingdom.

    Schistosoma mansoni is a well-adapted blood-dwelling parasitic helminth, persisting for decades in its human host despite being continually exposed to potential immune attack. Here, we describe in detail micro-exon genes (MEG) in S. mansoni, some present in multiple copies, which represent a novel molecular system for creating protein variation through the alternate splicing of short (≤36 bp) symmetric exons organized in tandem. Analysis of three closely related copies of one MEG family allowed us to trace several evolutionary events and propose a mechanism for micro-exon generation and diversification. Microarray experiments show that the majority of MEGs are up-regulated in life cycle stages associated with establishment in the mammalian host after skin penetration. Sequencing of RT-PCR products allowed the description of several alternate splice forms of micro-exon genes, highlighting the potential use of these transcripts to generate a complex pool of protein variants. We obtained direct evidence for the existence of such pools by proteomic analysis of secretions from migrating schistosomula and mature eggs. Whole-mount in situ hybridization and immunolocalization showed that MEG transcripts and proteins were restricted to glands or epithelia exposed to the external environment. The ability of schistosomes to produce a complex pool of variant proteins aligns them with the other major groups of blood parasites, but using a completely different mechanism. We believe that our data open a new chapter in the study of immune evasion by schistosomes, and their ability to generate variant proteins could represent a significant obstacle to vaccine development.

    Funded by: Biotechnology and Biological Sciences Research Council; NIAID NIH HHS: AI054711-01A2; Wellcome Trust: WT085775/Z/08/Z

    Genome research 2010;20;8;1112-21

  • Leishmania-specific surface antigens show sub-genus sequence variation and immune recognition.

    Depledge DP, MacLean LM, Hodgkinson MR, Smith BA, Jackson AP, Ma S, Uliana SR and Smith DF

    Centre for Immunology and Infection, Department of Biology, Hull York Medical School, University of York, York, United Kingdom.

    Background: A family of hydrophilic acylated surface (HASP) proteins, containing extensive and variant amino acid repeats, is expressed at the plasma membrane in infective extracellular (metacyclic) and intracellular (amastigote) stages of Old World Leishmania species. While HASPs are antigenic in the host and can induce protective immune responses, the biological functions of these Leishmania-specific proteins remain unresolved. Previous genome analysis has suggested that parasites of the sub-genus Leishmania (Viannia) have lost HASP genes from their genomes.

    We have used molecular and cellular methods to analyse HASP expression in New World Leishmania mexicana complex species and show that, unlike in L. major, these proteins are expressed predominantly following differentiation into amastigotes within macrophages. Further genome analysis has revealed that the L. (Viannia) species, L. (V.) braziliensis, does express HASP-like proteins of low amino acid similarity but with similar biochemical characteristics, from genes present on a region of chromosome 23 that is syntenic with the HASP/SHERP locus in Old World Leishmania species and the L. (L.) mexicana complex. A related gene is also present in Leptomonas seymouri and this may represent the ancestral copy of these Leishmania-genus specific sequences. The L. braziliensis HASP-like proteins (named the orthologous (o) HASPs) are predominantly expressed on the plasma membrane in amastigotes and are recognised by immune sera taken from 4 out of 6 leishmaniasis patients tested in an endemic region of Brazil. Analysis of the repetitive domains of the oHASPs has shown considerable genetic variation in parasite isolates taken from the same patients, suggesting that antigenic change may play a role in immune recognition of this protein family.

    These findings confirm that antigenic hydrophilic acylated proteins are expressed from genes in the same chromosomal region in species across the genus Leishmania. These proteins are surface-exposed on amastigotes (although L. (L.) major parasites also express HASPB on the metacyclic plasma membrane). The central repetitive domains of the HASPs are highly variant in their amino acid sequences, both within and between species, consistent with a role in immune recognition in the host.

    Funded by: Biotechnology and Biological Sciences Research Council; Wellcome Trust: 048615, 076355, 077503

    PLoS neglected tropical diseases 2010;4;9;e829

  • Genomic approaches uncover increasing complexities in the regulatory landscape at the human SCL (TAL1) locus.

    Dhami P, Bruce AW, Jim JH, Dillon SC, Hall A, Cooper JL, Bonhoure N, Chiang K, Ellis PD, Langford C, Andrews RM and Vetrie D

    The Wellcome Trust Sanger Institute, Hinxton, United Kingdom.

    The SCL (TAL1) transcription factor is a critical regulator of haematopoiesis and its expression is tightly controlled by multiple cis-acting regulatory elements. To elaborate further the DNA elements which control its regulation, we used genomic tiling microarrays covering 256 kb of the human SCL locus to perform a concerted analysis of chromatin structure and binding of regulatory proteins in human haematopoietic cell lines. This approach allowed us to characterise further or redefine known human SCL regulatory elements and led to the identification of six novel elements with putative regulatory function both upstream and downstream of the SCL gene. They bind a number of haematopoietic transcription factors (GATA1, E2A, LMO2, SCL, LDB1), CTCF or components of the transcriptional machinery and are associated with relevant histone modifications, accessible chromatin and low nucleosomal density. Functional characterisation shows that these novel elements are able to enhance or repress SCL promoter activity, have endogenous promoter function or enhancer-blocking insulator function. Our analysis opens up several areas for further investigation and adds new layers of complexity to our understanding of the regulation of SCL expression.

    Funded by: Wellcome Trust

    PloS one 2010;5;2;e9059

  • Complex exon-intron marking by histone modifications is not determined solely by nucleosome distribution.

    Dhami P, Saffrey P, Bruce AW, Dillon SC, Chiang K, Bonhoure N, Koch CM, Bye J, James K, Foad NS, Ellis P, Watkins NA, Ouwehand WH, Langford C, Andrews RM, Dunham I and Vetrie D

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.

    It has recently been shown that nucleosome distribution, histone modifications and RNA polymerase II (Pol II) occupancy show preferential association with exons ("exon-intron marking"), linking chromatin structure and function to co-transcriptional splicing in a variety of eukaryotes. Previous ChIP-sequencing studies suggested that these marking patterns reflect the nucleosomal landscape. By analyzing ChIP-chip datasets across the human genome in three cell types, we have found that this marking system is far more complex than previously observed. We show here that a range of histone modifications and Pol II are preferentially associated with exons. However, there is noticeable cell-type specificity in the degree of exon marking by histone modifications and, surprisingly, this is also reflected in some histone modification patterns showing biases towards introns. Exon-intron marking is laid down in the absence of transcription on silent genes, with some marking biases changing or becoming reversed for genes expressed at different levels. Furthermore, the relationship of this marking system with splicing is not simple, with only some histone modifications reflecting exon usage/inclusion, while others mirror patterns of exon exclusion. By examining nucleosomal distributions in all three cell types, we demonstrate that these histone modification patterns cannot solely be accounted for by differences in nucleosome levels between exons and introns. In addition, because of inherent differences between ChIP-chip array and ChIP-sequencing approaches, these platforms report different nucleosome distribution patterns across the human genome. Our findings confound existing views and point to active cellular mechanisms which dynamically regulate histone modification levels and account for exon-intron marking. We believe that these histone modification patterns provide links between chromatin accessibility, Pol II movement and co-transcriptional splicing.

    Funded by: NHGRI NIH HHS: U01HG003168

    PloS one 2010;5;8;e12339

  • Ectodomains of the LDL receptor-related proteins LRP1b and LRP4 have anchorage independent functions in vivo.

    Dietrich MF, van der Weyden L, Prosser HM, Bradley A, Herz J and Adams DJ

    Department of Molecular Genetics, UT Southwestern, Dallas, Texas, United States of America.

    Background: The low-density lipoprotein (LDL) receptor gene family is a highly conserved group of membrane receptors with diverse functions in developmental processes, lipoprotein trafficking, and cell signaling. The low-density lipoprotein (LDL) receptor-related protein 1b (LRP1B) was reported to be deleted in several types of human malignancies, including non-small cell lung cancer. Our group has previously reported that a distal extracellular truncation of murine Lrp1b that is predicted to secrete the entire intact extracellular domain (ECD) is fully viable with no apparent phenotype.

    Here, we have used a gene targeting approach to create two mouse lines carrying internally rearranged exons of Lrp1b that are predicted to truncate the protein closer to the N-terminus and to prevent normal trafficking through the secretory pathway. Both mutations result in early embryonic lethality, but, as expected from the restricted expression pattern of LRP1b in vivo, loss of Lrp1b does not cause cellular lethality as homozygous Lrp1b-deficient blastocysts can be propagated normally in culture. This is similar to findings for another LDL receptor family member, Lrp4. We provide in vitro evidence that Lrp4 undergoes regulated intramembranous processing through metalloproteases and gamma-secretase cleavage. We further demonstrate negative regulation of the Wnt signaling pathway by the soluble extracellular domain.

    Our results underline a crucial role for Lrp1b in development. The expression in mice of truncated alleles of Lrp1b and Lrp4 with deletions of the transmembrane and intracellular domains leads to release of the extracellular domain into the extracellular space, which is sufficient to confer viability. In contrast, null mutations are embryonically (Lrp1b) or perinatally (Lrp4) lethal. These findings suggest that the extracellular domains of both proteins may function as a scavenger for signaling ligands or signal modulators in the extracellular space, thereby preserving signaling thresholds that are critical for embryonic development, as well as for the clear, but poorly understood role of LRP1b in cancer.

    Funded by: Cancer Research UK; NHLBI NIH HHS: R37 HL063762-12, R37 HL063762-13; Wellcome Trust

    PloS one 2010;5;4;e9960

  • A parasite calcium switch and Achilles' heel revealed.

    Doerig C and Billker O

    Institut National de la Santé et de la Recherche Médicale (INSERM)-Ecole Polytechnique Fédérale de Lausanne (EPFL) Joint Laboratory, INSERM U609, Global Health Institute, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland.

    Nature structural & molecular biology 2010;17;5;541-3

  • Assessment of the neuropeptide S system in anxiety disorders.

    Donner J, Haapakoski R, Ezer S, Melén E, Pirkola S, Gratacòs M, Zucchelli M, Anedda F, Johansson LE, Söderhäll C, Orsmark-Pietras C, Suvisaari J, Martín-Santos R, Torrens M, Silander K, Terwilliger JD, Wickman M, Pershagen G, Lönnqvist J, Peltonen L, Estivill X, D'Amato M, Kere J, Alenius H and Hovatta I

    Research Program of Molecular Neurology, Biomedicum Helsinki, Helsinki, Finland.

    Background: The G protein-coupled receptor neuropeptide S receptor 1 (NPSR1) and its ligand neuropeptide S (NPS) form a signaling system mainly implicated in susceptibility to asthma and inflammatory disorders in humans and regulation of anxiety and arousal in rodents. We addressed here the role of NPS and NPSR1 as susceptibility genes for human anxiety disorders.

    Methods: We performed comprehensive association analysis of genetic variants in NPS and NPSR1 in three independent study samples. We first studied a population-based sample (Health 2000, Finland) of 321 anxiety disorder patients and 1317 control subjects and subsequently a Spanish clinical panic disorder sample consisting of 188 cases and 315 control subjects. In addition, we examined a birth cohort of 2020 children (Barn Allergi Miljö Stockholm Epidemiologi [BAMSE], Sweden). We then tested whether alleles of the most significantly associated single nucleotide polymorphisms alter DNA-protein complex formation in electrophoretic mobility shift assays. Finally, we compared acute stress responses on the gene expression level in wild-type and Npsr1(-/-) mice.

    Results: We confirmed previously observed epidemiological association between anxiety and asthma in two population-based cohorts. Single nucleotide polymorphisms within NPS and NPSR1 associated with panic disorder diagnosis in the Finnish and Spanish samples and with parent-reported anxiety/depression in the BAMSE sample. Moreover, some of the implicated single nucleotide polymorphisms potentially affect transcription factor binding. Expression of neurotrophin-3, a neurotrophic factor connected to stress and panic reaction, was significantly downregulated in brain regions of stressed Npsr1(-/-) mice, whereas interleukin-1 beta, an active stress-related immunotransmitter, was upregulated.

    Conclusions: Our results suggest that NPS-NPSR1 signaling is likely involved in anxiety.

    Biological psychiatry 2010;68;5;474-83

  • A new era in the genomics of bacteria.

    Dougan G and Weinstock GM

    Current opinion in microbiology 2010;13;5;616-8

  • Genome-wide analysis reveals loci encoding anti-macrophage factors in the human pathogen Burkholderia pseudomallei K96243.

    Dowling AJ, Wilkinson PA, Holden MT, Quail MA, Bentley SD, Reger J, Waterfield NR, Titball RW and Ffrench-Constant RH

    Biosciences, University of Exeter, Penryn, United Kingdom.

    Burkholderia pseudomallei is an important human pathogen whose infection biology is still poorly understood. The bacterium is endemic to tropical regions, including South East Asia and Northern Australia, where it causes melioidosis, a serious disease associated with both high mortality and antibiotic resistance. B. pseudomallei is a Gram-negative facultative intracellular pathogen that is able to replicate in macrophages. However, despite the critical nature of its interaction with macrophages, few anti-macrophage factors have been characterized to date. Here we perform a genome-wide gain of function screen of B. pseudomallei strain K96243 to identify loci encoding factors with anti-macrophage activity. We identify a total of 113 such loci scattered across both chromosomes, with positive gene clusters encoding transporters and secretion systems, enzymes/toxins, secondary metabolite, biofilm, adhesion and signal response related factors. Further phenotypic analysis of four of these regions shows that the encoded factors cause striking cellular phenotypes relevant to infection biology, including apoptosis, formation of actin 'tails' and multi-nucleation within treated macrophages. The detailed analysis of the remaining host of loci will facilitate genetic dissection of the interaction of this important pathogen with host macrophages and thus further elucidate this critical part of its infection cycle.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/E021182/1

    PloS one 2010;5;12;e15693

  • Multiple common variants for celiac disease influencing immune gene expression.

    Dubois PC, Trynka G, Franke L, Hunt KA, Romanos J, Curtotti A, Zhernakova A, Heap GA, Adány R, Aromaa A, Bardella MT, van den Berg LH, Bockett NA, de la Concha EG, Dema B, Fehrmann RS, Fernández-Arquero M, Fiatal S, Grandone E, Green PM, Groen HJ, Gwilliam R, Houwen RH, Hunt SE, Kaukinen K, Kelleher D, Korponay-Szabo I, Kurppa K, MacMathuna P, Mäki M, Mazzilli MC, McCann OT, Mearin ML, Mein CA, Mirza MM, Mistry V, Mora B, Morley KI, Mulder CJ, Murray JA, Núñez C, Oosterom E, Ophoff RA, Polanco I, Peltonen L, Platteel M, Rybak A, Salomaa V, Schweizer JJ, Sperandeo MP, Tack GJ, Turner G, Veldink JH, Verbeek WH, Weersma RK, Wolters VM, Urcelay E, Cukrowska B, Greco L, Neuhausen SL, McManus R, Barisani D, Deloukas P, Barrett JC, Saavalainen P, Wijmenga C and van Heel DA

    Blizard Institute of Cell and Molecular Science, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK.

    We performed a second-generation genome-wide association study of 4,533 individuals with celiac disease (cases) and 10,750 control subjects. We genotyped 113 selected SNPs with P(GWAS) < 10⁻⁴ and 18 SNPs from 14 known loci in a further 4,918 cases and 5,684 controls. Variants from 13 new regions reached genome-wide significance (P(combined) < 5 × 10⁻⁸); most contain genes with immune functions (BACH2, CCR4, CD80, CIITA-SOCS1-CLEC16A, ICOSLG and ZMIZ1), with ETS1, RUNX3, THEMIS and TNFRSF14 having key roles in thymic T-cell selection. There was evidence to suggest associations for a further 13 regions. In an expression quantitative trait meta-analysis of 1,469 whole blood samples, 20 of 38 (52.6%) tested loci had celiac risk variants correlated (P < 0.0028, FDR 5%) with cis gene expression.

    Funded by: Medical Research Council: G0700545, G0700545(82277); NIDDK NIH HHS: DK050678, DK071003, DK081645, DK57892, R01 DK081645-02; NINDS NIH HHS: NS058980; Wellcome Trust: 084743

    Nature genetics 2010;42;4;295-302

  • New genetic loci implicated in fasting glucose homeostasis and their impact on type 2 diabetes risk.

    Dupuis J, Langenberg C, Prokopenko I, Saxena R, Soranzo N, Jackson AU, Wheeler E, Glazer NL, Bouatia-Naji N, Gloyn AL, Lindgren CM, Mägi R, Morris AP, Randall J, Johnson T, Elliott P, Rybin D, Thorleifsson G, Steinthorsdottir V, Henneman P, Grallert H, Dehghan A, Hottenga JJ, Franklin CS, Navarro P, Song K, Goel A, Perry JR, Egan JM, Lajunen T, Grarup N, Sparsø T, Doney A, Voight BF, Stringham HM, Li M, Kanoni S, Shrader P, Cavalcanti-Proença C, Kumari M, Qi L, Timpson NJ, Gieger C, Zabena C, Rocheleau G, Ingelsson E, An P, O'Connell J, Luan J, Elliott A, McCarroll SA, Payne F, Roccasecca RM, Pattou F, Sethupathy P, Ardlie K, Ariyurek Y, Balkau B, Barter P, Beilby JP, Ben-Shlomo Y, Benediktsson R, Bennett AJ, Bergmann S, Bochud M, Boerwinkle E, Bonnefond A, Bonnycastle LL, Borch-Johnsen K, Böttcher Y, Brunner E, Bumpstead SJ, Charpentier G, Chen YD, Chines P, Clarke R, Coin LJ, Cooper MN, Cornelis M, Crawford G, Crisponi L, Day IN, de Geus EJ, Delplanque J, Dina C, Erdos MR, Fedson AC, Fischer-Rosinsky A, Forouhi NG, Fox CS, Frants R, Franzosi MG, Galan P, Goodarzi MO, Graessler J, Groves CJ, Grundy S, Gwilliam R, Gyllensten U, Hadjadj S, Hallmans G, Hammond N, Han X, Hartikainen AL, Hassanali N, Hayward C, Heath SC, Hercberg S, Herder C, Hicks AA, Hillman DR, Hingorani AD, Hofman A, Hui J, Hung J, Isomaa B, Johnson PR, Jørgensen T, Jula A, Kaakinen M, Kaprio J, Kesaniemi YA, Kivimaki M, Knight B, Koskinen S, Kovacs P, Kyvik KO, Lathrop GM, Lawlor DA, Le Bacquer O, Lecoeur C, Li Y, Lyssenko V, Mahley R, Mangino M, Manning AK, Martínez-Larrad MT, McAteer JB, McCulloch LJ, McPherson R, Meisinger C, Melzer D, Meyre D, Mitchell BD, Morken MA, Mukherjee S, Naitza S, Narisu N, Neville MJ, Oostra BA, Orrù M, Pakyz R, Palmer CN, Paolisso G, Pattaro C, Pearson D, Peden JF, Pedersen NL, Perola M, Pfeiffer AF, Pichler I, Polasek O, Posthuma D, Potter SC, Pouta A, Province MA, Psaty BM, Rathmann W, Rayner NW, Rice K, Ripatti S, Rivadeneira F, Roden M, Rolandsson O, Sandbaek A, Sandhu M, Sanna S, Sayer AA, Scheet P, Scott LJ, Seedorf U, Sharp SJ, Shields B, Sigurðsson G, Sijbrands EJ, Silveira A, Simpson L, Singleton A, Smith NL, Sovio U, Swift A, Syddall H, Syvänen AC, Tanaka T, Thorand B, Tichet J, Tönjes A, Tuomi T, Uitterlinden AG, van Dijk KW, van Hoek M, Varma D, Visvikis-Siest S, Vitart V, Vogelzangs N, Waeber G, Wagner PJ, Walley A, Walters GB, Ward KL, Watkins H, Weedon MN, Wild SH, Willemsen G, Witteman JC, Yarnell JW, Zeggini E, Zelenika D, Zethelius B, Zhai G, Zhao JH, Zillikens MC, DIAGRAM Consortium, GIANT Consortium, Global BPgen Consortium, Borecki IB, Loos RJ, Meneton P, Magnusson PK, Nathan DM, Williams GH, Hattersley AT, Silander K, Salomaa V, Smith GD, Bornstein SR, Schwarz P, Spranger J, Karpe F, Shuldiner AR, Cooper C, Dedoussis GV, Serrano-Ríos M, Morris AD, Lind L, Palmer LJ, Hu FB, Franks PW, Ebrahim S, Marmot M, Kao WH, Pankow JS, Sampson MJ, Kuusisto J, Laakso M, Hansen T, Pedersen O, Pramstaller PP, Wichmann HE, Illig T, Rudan I, Wright AF, Stumvoll M, Campbell H, Wilson JF, Anders Hamsten on behalf of Procardis Consortium, MAGIC investigators, Bergman RN, Buchanan TA, Collins FS, Mohlke KL, Tuomilehto J, Valle TT, Altshuler D, Rotter JI, Siscovick DS, Penninx BW, Boomsma DI, Deloukas P, Spector TD, Frayling TM, Ferrucci L, Kong A, Thorsteinsdottir U, Stefansson K, van Duijn CM, Aulchenko YS, Cao A, Scuteri A, Schlessinger D, Uda M, Ruokonen A, Jarvelin MR, Waterworth DM, Vollenweider P, Peltonen L, Mooser V, Abecasis GR, Wareham NJ, Sladek R, Froguel P, Watanabe RM, Meigs JB, Groop L, Boehnke M, McCarthy MI, Florez JC and Barroso I

    Department of Biostatistics, Boston University School of Public Health, Massachusetts, USA.

    Levels of circulating glucose are tightly regulated. To identify new loci influencing glycemic traits, we performed meta-analyses of 21 genome-wide association studies informative for fasting glucose, fasting insulin and indices of beta-cell function (HOMA-B) and insulin resistance (HOMA-IR) in up to 46,186 nondiabetic participants. Follow-up of 25 loci in up to 76,558 additional subjects identified 16 loci associated with fasting glucose and HOMA-B and two loci associated with fasting insulin and HOMA-IR. These include nine loci newly associated with fasting glucose (in or near ADCY5, MADD, ADRA2A, CRY2, FADS1, GLIS3, SLC2A2, PROX1 and C2CD4B) and one influencing fasting insulin and HOMA-IR (near IGF1). We also demonstrated association of ADCY5, PROX1, GCK, GCKR and DGKB-TMEM195 with type 2 diabetes. Within these loci, likely biological candidate genes influence signal transduction, cell proliferation, development, glucose-sensing and circadian regulation. Our results demonstrate that genetic studies of glycemic traits can identify type 2 diabetes risk loci, as well as loci containing gene variants that are associated with a modest elevation in glucose levels but are not associated with overt diabetes.

    Funded by: Chief Scientist Office: CZB/4/710; Medical Research Council: G0600705, G0601261, G0700222, G0700222(81696), G0701863, G0801056, G19/35, MC_U106179471, MC_U106188470, MC_U127561128, MC_U127592696, MC_U137686854, MC_UP_A620_1014, MC_UP_A620_1015; NIDDK NIH HHS: K24 DK080140-05, P30 DK040561-14, R01 DK029867, R01 DK072193, R01 DK078616-01A1; Wellcome Trust: 064890, 077011, 077016, 081682, 088885, 089061, 091746

    Nature genetics 2010;42;2;105-16

  • Traces of sub-Saharan and Middle Eastern lineages in Indian Muslim populations.

    Eaaswarkhanth M, Haque I, Ravesh Z, Romero IG, Meganathan PR, Dubey B, Khan FA, Chaubey G, Kivisild T, Tyler-Smith C, Singh L and Thangaraj K

    National DNA Analysis Centre, Central Forensic Science Laboratory, Kolkata, India.

    Islam is the second most practiced religion in India, next to Hinduism. It is still unclear whether the spread of Islam in India has been only a cultural transformation or is associated with detectable levels of gene flow. To estimate the contribution of West Asian and Arabian admixture to Indian Muslims, we assessed genetic variation in mtDNA, Y-chromosomal and LCT/MCM6 markers in 472, 431 and 476 samples, respectively, representing six Muslim communities from different geographical regions of India. We found that most of the Indian Muslim populations received their major genetic input from geographically close non-Muslim populations. However, low levels of likely sub-Saharan African, Arabian and West Asian admixture were also observed among Indian Muslims in the form of L0a2a2 mtDNA and E1b1b1a and J(*)(xJ2) Y-chromosomal lineages. The distinction between Iranian and Arabian sources was difficult to make with mtDNA and the Y chromosome, as the estimates were highly correlated because of similar gene pool compositions in the sources. In contrast, the LCT/MCM6 locus, which shows a clear distinction between the two sources, enabled us to rule out significant gene flow from Arabia. Overall, our results support a model according to which the spread of Islam in India was predominantly cultural conversion associated with minor but still detectable levels of gene flow from outside, primarily from Iran and Central Asia, rather than directly from the Arabian Peninsula.

    Funded by: Wellcome Trust: 077009

    European journal of human genetics : EJHG 2010;18;3;354-63

  • Prostate cancer in BRCA2 germline mutation carriers is associated with poorer prognosis.

    Edwards SM, Evans DG, Hope Q, Norman AR, Barbachano Y, Bullock S, Kote-Jarai Z, Meitz J, Falconer A, Osin P, Fisher C, Guy M, Jhavar SG, Hall AL, O'Brien LT, Gehr-Swain BN, Wilkinson RA, Forrest MS, Dearnaley DP, Ardern-Jones AT, Page EC, Easton DF, Eeles RA and UK Genetic Prostate Cancer Study Collaborators and BAUS Section of Oncology

    Oncogenetics team, Section of Cancer Genetics, Institute of Cancer Research, Sutton SM2 5PT, UK.

    Background: The germline BRCA2 mutation is associated with increased prostate cancer (PrCa) risk. We have assessed survival in young PrCa cases with a germline mutation in BRCA2 and investigated loss of heterozygosity at BRCA2 in their tumours.

    Methods: Two cohorts were compared: one was a group with young-onset PrCa, tested for germline BRCA2 mutations (6 of 263 cases had a germline BRCA2 mutation), and the second was a validation set consisting of a clinical set from Manchester of known BRCA2 mutation carriers (15 cases) with PrCa. Survival data were compared with a control series of patients in a single clinic as determined by Kaplan-Meier estimates. Loss of heterozygosity was tested for in the DNA of tumour tissue of the young-onset group by typing four microsatellite markers that flanked the BRCA2 gene, followed by sequencing.

    Results: Median survival of all PrCa cases with a germline BRCA2 mutation was shorter at 4.8 years than was survival in controls at 8.5 years (P=0.002). Loss of heterozygosity was found in the majority of tumours of BRCA2 mutation carriers. Multivariate analysis confirmed that the poorer survival of PrCa in BRCA2 mutation carriers is associated with the germline BRCA2 mutation per se.

    Conclusion: BRCA2 germline mutation is an independent prognostic factor for survival in PrCa. Such patients should not be managed with active surveillance as they have more aggressive disease.

    Funded by: Cancer Research UK: A3354, C5047/A3354

    British journal of cancer 2010;103;6;918-24

  • Candidate malaria susceptibility/protective SNPs in hospital and population-based studies: the effect of sub-structuring.

    Eid NA, Hussein AA, Elzein AM, Mohamed HS, Rockett KA, Kwiatkowski DP and Ibrahim ME

    Department of Molecular Biology, Institute of Endemic Diseases, Medical Campus, Qasser Street, University of Khartoum, Khartoum, Sudan.

    Background: Populations of East Africa, including Sudan, exhibit some of the highest indices of genetic diversity in the continent and worldwide. The current study aims to address the possible impact of population structure and population stratification on the outcome of case-control association analysis of malaria candidate genes in different Sudanese populations, where the pronounced genetic heterogeneity becomes a source of concern for its potential effect on study outcomes.

    Methods: A total of 72 SNPs were genotyped using the Sequenom iPLEX Gold assay in 449 DNA samples that included cases and controls from two village populations, malaria patients and out-patients from the area of Sinnar, and additional controls consisting of healthy Nilo-Saharan speaking individuals. The population substructure was estimated using the Structure 2.2 programme.

    Results: The Hardy-Weinberg Equilibrium values were generally within expectation in Hausa and Massalit. However, in the Sinnar area there was a notable excess of homozygosity, which was attributed to the Wahlund effect arising from population amalgamation within the sample. The programme STRUCTURE revealed a division of both Hausa and Massalit into two substructures, with the partition in Hausa more pronounced than in Massalit; in Sinnar there was no defined substructure. More than 25 of the 72 SNPs assayed were informative in all areas. Some important SNPs were not differentially distributed between malaria cases and controls, including SNPs in CD36 and NOS2. A number of SNPs showed significant p-values for differences in distribution of genotypes between cases and controls, including: rs1805015 (in IL4R1) (P = 0.001), rs17047661 (in CR1) (P = 0.02) and rs1800750 (TNF-376) (P = 0.01) in the hospital samples; rs1050828 (G6PD+202) (P = 0.02) and rs1800896 (IL10-1082) (P = 0.04) in Massalit; and rs2243250 (IL4-589) (P = 0.04) in Hausa.

    Conclusions: The difference in population structure partly accounts for some of these significant associations, and the strength of association proved to be sensitive to all levels of sub-structuring whether in the hospital or population-based study.

    Malaria journal 2010;9;119

  • Thirty new loci for age at menarche identified by a meta-analysis of genome-wide association studies.

    Elks CE, Perry JR, Sulem P, Chasman DI, Franceschini N, He C, Lunetta KL, Visser JA, Byrne EM, Cousminer DL, Gudbjartsson DF, Esko T, Feenstra B, Hottenga JJ, Koller DL, Kutalik Z, Lin P, Mangino M, Marongiu M, McArdle PF, Smith AV, Stolk L, van Wingerden SH, Zhao JH, Albrecht E, Corre T, Ingelsson E, Hayward C, Magnusson PK, Smith EN, Ulivi S, Warrington NM, Zgaga L, Alavere H, Amin N, Aspelund T, Bandinelli S, Barroso I, Berenson GS, Bergmann S, Blackburn H, Boerwinkle E, Buring JE, Busonero F, Campbell H, Chanock SJ, Chen W, Cornelis MC, Couper D, Coviello AD, d'Adamo P, de Faire U, de Geus EJ, Deloukas P, Döring A, Smith GD, Easton DF, Eiriksdottir G, Emilsson V, Eriksson J, Ferrucci L, Folsom AR, Foroud T, Garcia M, Gasparini P, Geller F, Gieger C, GIANT Consortium, Gudnason V, Hall P, Hankinson SE, Ferreli L, Heath AC, Hernandez DG, Hofman A, Hu FB, Illig T, Järvelin MR, Johnson AD, Karasik D, Khaw KT, Kiel DP, Kilpeläinen TO, Kolcic I, Kraft P, Launer LJ, Laven JS, Li S, Liu J, Levy D, Martin NG, McArdle WL, Melbye M, Mooser V, Murray JC, Murray SS, Nalls MA, Navarro P, Nelis M, Ness AR, Northstone K, Oostra BA, Peacock M, Palmer LJ, Palotie A, Paré G, Parker AN, Pedersen NL, Peltonen L, Pennell CE, Pharoah P, Polasek O, Plump AS, Pouta A, Porcu E, Rafnar T, Rice JP, Ring SM, Rivadeneira F, Rudan I, Sala C, Salomaa V, Sanna S, Schlessinger D, Schork NJ, Scuteri A, Segrè AV, Shuldiner AR, Soranzo N, Sovio U, Srinivasan SR, Strachan DP, Tammesoo ML, Tikkanen E, Toniolo D, Tsui K, Tryggvadottir L, Tyrer J, Uda M, van Dam RM, van Meurs JB, Vollenweider P, Waeber G, Wareham NJ, Waterworth DM, Weedon MN, Wichmann HE, Willemsen G, Wilson JF, Wright AF, Young L, Zhai G, Zhuang WV, Bierut LJ, Boomsma DI, Boyd HA, Crisponi L, Demerath EW, van Duijn CM, Econs MJ, Harris TB, Hunter DJ, Loos RJ, Metspalu A, Montgomery GW, Ridker PM, Spector TD, Streeten EA, Stefansson K, Thorsteinsdottir U, Uitterlinden AG, Widen E, Murabito JM, Ong KK and Murray A

    Medical Research Council (MRC) Epidemiology Unit, Institute of Metabolic Science, Addenbrooke's Hospital, Cambridge, UK.

    To identify loci for age at menarche, we performed a meta-analysis of 32 genome-wide association studies in 87,802 women of European descent, with replication in up to 14,731 women. In addition to the known loci at LIN28B (P = 5.4 × 10⁻⁶⁰) and 9q31.2 (P = 2.2 × 10⁻³³), we identified 30 new menarche loci (all P < 5 × 10⁻⁸) and found suggestive evidence for a further 10 loci (P < 1.9 × 10⁻⁶). The new loci included four previously associated with body mass index (in or near FTO, SEC16B, TRA2B and TMEM18), three in or near other genes implicated in energy homeostasis (BSX, CRTC1 and MCHR2) and three in or near genes implicated in hormonal regulation (INHBA, PCSK2 and RXRG). Ingenuity and gene-set enrichment pathway analyses identified coenzyme A and fatty acid biosynthesis as biological processes related to menarche timing.

    Funded by: Canadian Institutes of Health Research: 166067; Cancer Research UK: 10118, A10119, A10124; Chief Scientist Office: CZB/4/710; Medical Research Council: G0000934, G0401527, G0500539, G0600705, G0701863, G9815508, MC_U106179471, MC_U106179472, MC_U106188470, MC_U127561128; NCI NIH HHS: CA047988, CA089392, CA104021, CA136792, CA40356, CA54281, CA63464, CA98233, P01 CA055075-17, P01 CA087969-13, P01 CA089392-08, P01 CA089392-09, P01CA055075, P01CA087969, R01 CA040356-15S1, R01 CA047988, R01 CA047988-20, R01 CA063464, R01 CA063464-10, R01 CA104021-05, R37 CA054281-17, U01 CA098233, U01 CA098233-08, U01 CA136792, U01 CA136792-03, Z01 CP010200-03, Z01CP010200; NCRR NIH HHS: M01 RR 16500, M01 RR-00750, M01 RR000750-31, M01 RR016500-04, U54RR025204-01, UL1 RR025005, UL1 RR025005-05, UL1 RR025774, UL1 RR025774-05, UL1RR025005; NHGRI NIH HHS: U01 HG004399-02, U01 HG004402-02, U01 HG004415-02, U01 HG004422-01, U01 HG004422-02, U01 HG004423, U01 HG004423-01, U01 HG004424-04, U01 HG004436, U01 HG004436-02, U01 HG004438-04, U01 HG004446-04, U01 HG004726-02, U01 HG004728, U01 HG004728-01, U01 HG004729-02, U01 HG004735, U01 HG004735-02, U01 HG004738, U01 HG004738-02, U01HG004399, U01HG004402, U01HG004415, U01HG004422, U01HG004423, U01HG004436, U01HG004438, U01HG004446, U01HG004728, U01HG004729, U01HG004735, U01HG004738, U01HG04424; NHLBI NIH HHS: HL 043851, HL087679, HL69757, N01 HC025195, N01 HC055015, N01 HC055016, N01 HC055018, N01 HC055019, N01 HC055020, N01 HC055021, N01 HC055022, N01-HC-25195, N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022, N02 HL64278, R01 HL043851, R01 HL043851-10, R01 HL059367, R01 HL059367-11, R01 HL086694-03, R01 HL087641-03, R01 HL087679-03, R01 HL088119, R01 HL088119-04, R01HL086694, R01HL087641, R01HL59367, RC2 HL102419, RC2 HL102419-02, U01 HL072515-06, U01 HL084756, U01 HL084756-03, U01 HL84756, U01HL72515, U19 HL069757-11; NIA NIH HHS: AG-16592, N.1-AG-1-1, N.1-AG-1-2111, N01 AG012100, N01 AG012109, N01 AG050002, N01-AG-1-2109, N01-AG-12100, N01-AG-5-0002, P01 AG018397-08, P01 AG025204-03, P01-AG-18397, R01 AG016592, R01 AG016592-10, R01 AG041517, R01 AR/AG 41398, R21 AG032598-02, R21AG032598; NIAAA NIH HHS: AA07535, AA10248, AA13320, AA13321, AA13326, AA14041, K05 AA017688-04, R01 AA007535-08, R01 AA013320-05, R01 AA013321-05, R01 AA013326-05, R01 AA014041-05, U10 AA008401-23, U10AA008401; NIAMS NIH HHS: R01 AR041398-15, R01 AR041398-20; NICHD NIH HHS: HD-061437, R03 HD061437-02; NIDA NIH HHS: R01 DA012854-09, R01 DA013423-05, R01 DA019963-01A2, R01 DA019963-02, R01 DA019963-03, R01-DA013423; NIDCR NIH HHS: U01 DE018903-02, U01 DE018993, U01 DE018993-01, U01DE018903, U01DE018993; NIDDK NIH HHS: P30 DK072488, R01 DK058845-11, R01DK058845, U01 DK062418, U01 DK062418-06; NIMH NIH HHS: MH66206, R01 MH066206-05; NIMHD NIH HHS: 263 MD 821336, 263 MD 9164, 263 MD821336, 263 MD9164 13; PHS HHS: HHSN268200625226C, HHSN268200782096C, R01-088119, RFAHG006033; Wellcome Trust: 068545/Z/02, 076467/Z/05/Z, 077016/Z/05/Z, 079895, 89061/Z/09/Z

    Nature genetics 2010;42;12;1077-85

  • Evaluation of association of HNF1B variants with diverse cancers: collaborative analysis of data from 19 genome-wide association studies.

    Elliott KS, Zeggini E, McCarthy MI, Gudmundsson J, Sulem P, Stacey SN, Thorlacius S, Amundadottir L, Grönberg H, Xu J, Gaborieau V, Eeles RA, Neal DE, Donovan JL, Hamdy FC, Muir K, Hwang SJ, Spitz MR, Zanke B, Carvajal-Carmona L, Brown KM, Australian Melanoma Family Study Investigators, Hayward NK, Macgregor S, Tomlinson IP, Lemire M, Amos CI, Murabito JM, Isaacs WB, Easton DF, Brennan P, PanScan Consortium, Barkardottir RB, Gudbjartsson DF, Rafnar T, Hunter DJ, Chanock SJ, Stefansson K and Ioannidis JP

    Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom.

    Background: Genome-wide association studies have found type 2 diabetes-associated variants in the HNF1B gene to exhibit reciprocal associations with prostate cancer risk. We aimed to identify whether these variants may have an effect on cancer risk in general versus a specific effect on prostate cancer only.

    In a collaborative analysis, we collected data from GWAS of cancer phenotypes for the frequently reported variants of HNF1B, rs4430796 and rs7501939, which are in linkage disequilibrium (r² = 0.76, HapMap CEU). Overall, the analysis included 16 datasets on rs4430796 with 19,640 cancer cases and 21,929 controls; and 21 datasets on rs7501939 with 26,923 cases and 49,085 controls. Malignancies other than prostate cancer included colorectal, breast, lung and pancreatic cancers, and melanoma. Meta-analysis showed large between-dataset heterogeneity that was driven by different effects in prostate cancer and other cancers. The per-T2D-risk-allele odds ratios (95% confidence intervals) for rs4430796 were 0.79 (0.76, 0.83) per G allele for prostate cancer (p<10⁻¹⁵ for both); and 1.03 (0.99, 1.07) for all other cancers. Similarly for rs7501939 the per-T2D-risk-allele odds ratios (95% confidence intervals) were 0.80 (0.77, 0.83) per T allele for prostate cancer (p<10⁻¹⁵ for both); and 1.00 (0.97, 1.04) for all other cancers. No malignancy other than prostate cancer had a nominally statistically significant association.

    The examined HNF1B variants have a highly specific effect on prostate cancer risk with no apparent association with any of the other studied cancer types.

    Funded by: NCI NIH HHS: R01-CA83115; NHLBI NIH HHS: N01-HC-25195, N02-HL-6-4278

    PloS one 2010;5;5;e10858

  • A high-throughput pharmaceutical screen identifies compounds with specific toxicity against BRCA2-deficient tumors.

    Evers B, Schut E, van der Burg E, Braumuller TM, Egan DA, Holstege H, Edser P, Adams DJ, Wade-Martins R, Bouwman P and Jonkers J

    Division of Molecular Biology, The Netherlands Cancer Institute, Amsterdam, the Netherlands.

    Purpose: Hereditary breast cancer is partly explained by germline mutations in BRCA1 and BRCA2. Although patients carry heterozygous mutations, their tumors have typically lost the remaining wild-type allele. Selectively targeting BRCA deficiency may therefore constitute an important therapeutic approach. Clinical trials applying this principle are underway, but it is unknown whether the compounds tested are optimal. It is therefore important to identify alternative compounds that specifically target BRCA deficiency and to test new combination therapies to establish optimal treatment strategies.

    We did a high-throughput pharmaceutical screen on BRCA2-deficient mouse mammary tumor cells and isogenic controls with restored BRCA2 function. Subsequently, we validated positive hits in vitro and in vivo using mice carrying BRCA2-deficient mammary tumors.

    Results: Three alkylators (chlorambucil, melphalan, and nimustine) displayed strong and specific toxicity against BRCA2-deficient cells. In vivo, these showed heterogeneous but generally strong BRCA2-deficient antitumor activity, with melphalan and nimustine doing better than cisplatin and the poly-(ADP-ribose)-polymerase inhibitor olaparib (AZD2281) in this small study. In vitro drug combination experiments showed synergistic interactions between the alkylators and olaparib. Tumor intervention studies combining nimustine and olaparib resulted in recurrence-free survival exceeding 330 days in 3 of 5 animals tested.

    Conclusions: We generated and validated a platform for identification of compounds with specific activity against BRCA2-deficient cells that translates well to the preclinical setting. Our data call for the re-evaluation of alkylators, especially melphalan and nimustine, alone or in combination with the poly-(ADP-ribose)-polymerase inhibitors, for the treatment of breast cancers with a defective BRCA pathway.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/D012910/1; Cancer Research UK; Wellcome Trust

    Clinical cancer research : an official journal of the American Association for Cancer Research 2010;16;1;99-108

  • The genetics of obesity: FTO leads the way.

    Fawcett KA and Barroso I

    Metabolic Disease Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.

    In 2007, an association of single nucleotide polymorphisms (SNPs) in the fat mass and obesity-associated (FTO) gene region with body mass index (BMI) and risk of obesity was identified in multiple populations, making FTO the first locus unequivocally associated with adiposity. At the time, FTO was a gene of unknown function and it was not known whether these SNPs exerted their effect on adiposity by affecting FTO or neighboring genes. Therefore, this breakthrough association inspired a wealth of in silico, in vitro, and in vivo analyses in model organisms and humans to improve knowledge of FTO function. These studies suggested that FTO plays a role in controlling feeding behavior and energy expenditure. Here, we review the approaches taken that provide a blueprint for the study of other obesity-associated genes in the hope that this strategy will result in increased understanding of the biological mechanisms underlying body weight regulation.

    Funded by: Wellcome Trust: 077016/Z/05/Z

    Trends in genetics : TIG 2010;26;6;266-74

  • Detailed investigation of the role of common and low-frequency WFS1 variants in type 2 diabetes risk.

    Fawcett KA, Wheeler E, Morris AP, Ricketts SL, Hallmans G, Rolandsson O, Daly A, Wasson J, Permutt A, Hattersley AT, Glaser B, Franks PW, McCarthy MI, Wareham NJ, Sandhu MS and Barroso I

    Metabolic Disease Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK.

    Objective: Wolfram syndrome 1 (WFS1) single nucleotide polymorphisms (SNPs) are associated with risk of type 2 diabetes. In this study we aimed to refine this association and investigate the role of low-frequency WFS1 variants in type 2 diabetes risk.

    For fine-mapping, we sequenced WFS1 exons, splice junctions, and conserved noncoding sequences in samples from 24 type 2 diabetic case and 68 control subjects, selected tagging SNPs, and genotyped these in 959 U.K. type 2 diabetic case and 1,386 control subjects. The same genomic regions were sequenced in samples from 1,235 type 2 diabetic case and 1,668 control subjects to compare the frequency of rarer variants between case and control subjects.

    Results: Of 31 tagging SNPs, the strongest associated was the previously untested 3' untranslated region rs1046320 (P = 0.008); odds ratio 0.84 and P = 6.59 × 10⁻⁷ on further replication in 3,753 case and 4,198 control subjects. High correlation between rs1046320 and the original strongest SNP (rs10010131) (r² = 0.92) meant that we could not differentiate between their effects in our samples. There was no difference in the cumulative frequency of 82 rare (minor allele frequency [MAF] <0.01) nonsynonymous variants between type 2 diabetic case and control subjects (P = 0.79). Two intermediate frequency (MAF 0.01-0.05) nonsynonymous changes also showed no statistical association with type 2 diabetes.

    Conclusions: We identified six highly correlated SNPs that show strong and comparable associations with risk of type 2 diabetes, but further refinement of these associations will require large sample sizes (>100,000) or studies in ethnically diverse populations. Low frequency variants in WFS1 are unlikely to have a large impact on type 2 diabetes risk in white U.K. populations, highlighting the complexities of undertaking association studies with low-frequency variants identified by resequencing.

    Funded by: British Heart Foundation; Medical Research Council: MC_U106179471; Wellcome Trust: 064890, 077016, 077016/Z/05/Z, 081682

    Diabetes 2010;59;3;741-6

  • Characterization of a hotspot for mimicry: assembly of a butterfly wing transcriptome to genomic sequence at the HmYb/Sb locus.

    Ferguson L, Lee SF, Chamberlain N, Nadeau N, Joron M, Baxter S, Wilkinson P, Papanicolaou A, Kumar S, Kee TJ, Clark R, Davidson C, Glithero R, Beasley H, Vogel H, Ffrench-Constant R and Jiggins C

    Department of Zoology, University of Cambridge, UK.

    The mimetic wing patterns of Heliconius butterflies are an excellent example of both adaptive radiation and convergent evolution. Alleles at the HmYb and HmSb loci control the presence/absence of hindwing bar and hindwing margin phenotypes respectively between divergent races of Heliconius melpomene, and also between sister species. Here, we used fine-scale linkage mapping to identify and sequence a BAC tilepath across the HmYb/Sb loci. We also generated transcriptome sequence data for two wing pattern forms of H. melpomene that differed in HmYb/Sb alleles using 454 sequencing technology. Custom scripts were used to process the sequence traces and generate transcriptome assemblies. Genomic sequence for the HmYb/Sb candidate region was annotated both using the MAKER pipeline and manually using transcriptome sequence reads. In total, 28 genes were identified in the HmYb/Sb candidate region, six of which have alternative splice forms. None of these are orthologues of genes previously identified as being expressed in butterfly wing pattern development, implying previously undescribed molecular mechanisms of pattern determination on Heliconius wings. The use of next-generation sequencing has therefore facilitated DNA annotation of a poorly characterized genome, and generated hypotheses regarding the identity of wing pattern at the HmYb/Sb loci.

    Funded by: Biotechnology and Biological Sciences Research Council; Medical Research Council: G0900740

    Molecular ecology 2010;19 Suppl 1;240-54

  • Association of mouse Dlg4 (PSD-95) gene deletion and human DLG4 gene variation with phenotypes relevant to autism spectrum disorders and Williams' syndrome.

    Feyder M, Karlsson RM, Mathur P, Lyman M, Bock R, Momenan R, Munasinghe J, Scattoni ML, Ihne J, Camp M, Graybeal C, Strathdee D, Begg A, Alvarez VA, Kirsch P, Rietschel M, Cichon S, Walter H, Meyer-Lindenberg A, Grant SG and Holmes A

    Section on Behavioral Science and Genetics, Laboratory for Integrative Neuroscience, National Institute on Alcoholism and Alcohol Abuse, Rockville, MD 20852-9411, USA.

    Objective: Research is increasingly linking autism spectrum disorders and other neurodevelopmental disorders to synaptic abnormalities ("synaptopathies"). PSD-95 (postsynaptic density-95, DLG4) orchestrates protein-protein interactions at excitatory synapses and is a major functional bridge interconnecting a neurexin-neuroligin-SHANK pathway implicated in autism spectrum disorders.

    Method: The authors characterized behavioral, dendritic, and molecular phenotypic abnormalities relevant to autism spectrum disorders in mice with PSD-95 deletion (Dlg4⁻/⁻). The data from mice led to the identification of single-nucleotide polymorphisms (SNPs) in human DLG4 and the examination of associations between these variants and neural signatures of Williams' syndrome in a normal population, using functional and structural neuroimaging.

    Results: Dlg4⁻/⁻ mice showed increased repetitive behaviors, abnormal communication and social behaviors, impaired motor coordination, and increased stress reactivity and anxiety-related responses. Dlg4⁻/⁻ mice had subtle dysmorphology of amygdala dendritic spines and altered forebrain expression of various synaptic genes, including Cyln2, which regulates cytoskeletal dynamics and is a candidate gene for Williams' syndrome. A significant association was observed between variations in two human DLG4 SNPs and reduced intraparietal sulcus volume and abnormal cortico-amygdala coupling, both of which characterize Williams' syndrome.

    Conclusions: These findings demonstrate that DLG4 gene disruption in mice produces a complex range of behavioral and molecular abnormalities relevant to autism spectrum disorders and Williams' syndrome. The study provides an initial link between human DLG4 gene variation and key neural endophenotypes of Williams' syndrome and perhaps corticoamygdala regulation of emotional and social processes more generally.

    Funded by: NIAAA NIH HHS: ZIA AA000411-07, ZIA AA000421-02, ZIA AA000421-03; Wellcome Trust

    The American journal of psychiatry 2010;167;12;1508-17

  • The first special issue of Standards in Genomic Sciences from the Genomic Standards Consortium.

    Field D, Kottmann R and Sterk P

    Standards in genomic sciences 2010;3;3;214-5

  • The Pfam protein families database.

    Finn RD, Mistry J, Tate J, Coggill P, Heger A, Pollington JE, Gavin OL, Gunasekaran P, Ceric G, Forslund K, Holm L, Sonnhammer EL, Eddy SR and Bateman A

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.

    Pfam is a widely used database of protein families and domains. This article describes a set of major updates that we have implemented in the latest release (version 24.0). The most important change is that we now use HMMER3, the latest version of the popular profile hidden Markov model package. This software is approximately 100 times faster than HMMER2 and is more sensitive due to the routine use of the forward algorithm. The move to HMMER3 has necessitated numerous changes to Pfam that are described in detail. Pfam release 24.0 contains 11,912 families, of which a large number have been significantly updated during the past two years. Pfam is available via servers in the UK, the USA and Sweden.

    Funded by: Howard Hughes Medical Institute; Medical Research Council: MC_U137761446; Wellcome Trust: 087656, WT077044/Z/05/Z

    Nucleic acids research 2010;38;Database issue;D211-22

  • Additional sex combs-like 1 belongs to the enhancer of trithorax and polycomb group and genetically interacts with Cbx2 in mice.

    Fisher CL, Lee I, Bloyer S, Bozza S, Chevalier J, Dahl A, Bodner C, Helgason CD, Hess JL, Humphries RK and Brock HW

    Department of Zoology, University of British Columbia, 2350 Health Sciences Mall, Vancouver, British Columbia V6T1Z3, Canada.

    The Additional sex combs (Asx) gene of Drosophila behaves genetically as an enhancer of trithorax and polycomb (ETP) in displaying bidirectional homeotic phenotypes, suggesting that it is required for maintenance of both activation and silencing of Hox genes. There are three murine homologs of Asx called Additional sex combs-like 1, 2, and 3. Asxl1 is required for normal adult hematopoiesis; however, its embryonic function is unknown. We used a targeted mouse mutant line Asxl1(tm1Bc) to determine if Asxl1 is required to silence and activate Hox genes in mice during axial patterning. The mutant embryos exhibit simultaneous anterior and posterior transformations of the axial skeleton, consistent with a role for Asxl1 in activation and silencing of Hox genes. Transformations of the axial skeleton are enhanced in compound mutant embryos for the polycomb group gene M33/Cbx2. Hoxa4, Hoxa7, and Hoxc8 are derepressed in Asxl1(tm1Bc) mutants in the antero-posterior axis, but Hoxc8 expression is reduced in the brain of mutants, consistent with Asxl1 being required both for activation and repression of Hox genes. We discuss the genetic and molecular definition of ETPs, and suggest that the function of Asxl1 depends on its cellular context.

    Funded by: Canadian Institutes of Health Research; NCI NIH HHS: R01 CA078815-03, R01 CA092251-11, R01-CA-0078815

    Developmental biology 2010;337;1;9-15

  • Ensembl's 10th year.

    Flicek P, Aken BL, Ballester B, Beal K, Bragin E, Brent S, Chen Y, Clapham P, Coates G, Fairley S, Fitzgerald S, Fernandez-Banet J, Gordon L, Gräf S, Haider S, Hammond M, Howe K, Jenkinson A, Johnson N, Kähäri A, Keefe D, Keenan S, Kinsella R, Kokocinski F, Koscielny G, Kulesha E, Lawson D, Longden I, Massingham T, McLaren W, Megy K, Overduin B, Pritchard B, Rios D, Ruffier M, Schuster M, Slater G, Smedley D, Spudich G, Tang YA, Trevanion S, Vilella A, Vogel J, White S, Wilder SP, Zadissa A, Birney E, Cunningham F, Dunham I, Durbin R, Fernández-Suarez XM, Herrero J, Hubbard TJ, Parker A, Proctor G, Smith J and Searle SM

    European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

    Ensembl integrates genomic information for a comprehensive set of chordate genomes with a particular focus on resources for human, mouse, rat, zebrafish and other high-value sequenced genomes. We provide complete gene annotations for all supported species in addition to specific resources that target genome variation, function and evolution. Ensembl data is accessible in a variety of formats including via our genome browser, API and BioMart. This year marks the tenth anniversary of Ensembl and in that time the project has grown with advances in genome technology. As of release 56 (September 2009), Ensembl supports 51 species including marmoset, pig, zebra finch, lizard, gorilla and wallaby, which were added in the past year. Major additions and improvements to Ensembl since our previous report include the incorporation of the human GRCh37 assembly, enhanced visualisation and data-mining options for the Ensembl regulatory features and continued development of our software infrastructure.

    Funded by: Biotechnology and Biological Sciences Research Council: BBE0116401; Wellcome Trust: 062023, 077198

    Nucleic acids research 2010;38;Database issue;D557-62

  • Special issue: The Human Intestinal Microbiota.

    Flint HJ, O'Toole PW and Walker AW

    Microbial Ecology Group, Rowett Institute of Nutrition and Health, University of Aberdeen, Bucksburn, Aberdeen AB21 9SB, UK.

    Microbiology (Reading, England) 2010;156;Pt 11;3203-4

  • CINister thoughts.

    Foijer F

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK.

    Chromosome instability (CIN) is the process that leads to aneuploidy, a known hallmark of human tumours for over a century. Nowadays, it is believed that CIN promotes tumorigenesis by shuffling the genome into a malignant order through translocations, amplifications, deletions (structural CIN), and gains and losses of whole chromosomes (numerical CIN or nCIN). The present review focuses on the causes and consequences of nCIN. Several roads can lead to nCIN, including a compromised spindle assembly checkpoint, cohesion defects, p53 deficiency and flawed microtubule-kinetochore attachments. Whereas the link between nCIN and tumorigenesis is becoming more evident, indications have emerged recently that nCIN can suppress tumour formation as well. To understand these paradoxical findings, novel reagents and more sophisticated mouse models are needed. This will provide us with a better understanding of nCIN and eventually with therapies that exploit this characteristic of human tumours.

    Biochemical Society transactions 2010;38;6;1715-21

  • Major depression and the metabolic syndrome.

    Foley DL, Morley KI, Madden PA, Heath AC, Whitfield JB and Martin NG

    Biostatistics Unit, Orygen Youth Health Research Centre & Centre for Youth Mental Health, The University of Melbourne, Australia.

    The aim of this study is to characterize the relationship between major depression and the metabolic syndrome in a large community-based sample of Australian men and women aged 26-90 years. A lifetime history of major depression was assessed by telephone interview following the DSM-III-R. A current history of metabolic syndrome was assessed following the United States National Cholesterol Education Program Adult Treatment Panel III (NCEP ATP-III) guidelines 1 to 3 years later. Logistic regression was used to estimate the association between depression and the metabolic syndrome, and its component criteria, controlling for age, sex and alcohol dependence. There was no association between a lifetime history of major depression and the presence of the metabolic syndrome. There was a weak association between depression and low high-density lipoprotein cholesterol but not with other component criteria of the metabolic syndrome. Despite calls for interventions directed at depression to reduce the onset of the metabolic syndrome there are important failures to replicate in large samples such as this, no consensus regarding the threshold at which depression may pose a significant risk even allowing for heterogeneity across populations, and no consensus regarding confounders that may explain inter-study differences. The absence of any dosage effect of depression on the associated risk for the metabolic syndrome in other unselected samples does not support a direct causal relationship. The call for intervention studies on the basis of the currently published evidence base is unwarranted.

    Funded by: Medical Research Council; NIAAA NIH HHS: AA014041, AA07535, AA13320, AA13326, K05 AA017688-04, R01 AA007535-08, R01 AA013320-05, R01 AA013326-05, R01 AA014041-05; NIDA NIH HHS: R01 DA012854-08

    Twin research and human genetics : the official journal of the International Society for Twin Studies 2010;13;4;347-58

  • Evaluating the discriminative power of multi-trait genetic risk scores for type 2 diabetes in a northern Swedish population.

    Fontaine-Bisson B, Renström F, Rolandsson O, MAGIC, Payne F, Hallmans G, Barroso I and Franks PW

    Department of Nutrition Sciences, University of Ottawa, Ottawa, ON, Canada.

    We determined whether single nucleotide polymorphisms (SNPs) previously associated with diabetogenic traits improve the discriminative power of a type 2 diabetes genetic risk score.

    Methods: Participants (n = 2,751) were genotyped for 73 SNPs previously associated with type 2 diabetes, fasting glucose/insulin concentrations, obesity or lipid levels, from which five genetic risk scores (one for each of the four traits and one combining all SNPs) were computed. Type 2 diabetes patients and non-diabetic controls (n = 1,327/1,424) were identified using medical records in addition to an independent oral glucose tolerance test.

    Results: Model 1, including only SNPs associated with type 2 diabetes, had a discriminative power of 0.591 (p < 1.00 x 10(-20) vs null model) as estimated by the area under the receiver operator characteristic curve (ROC AUC). Model 2, including only fasting glucose/insulin SNPs, had a significantly higher discriminative power than the null model (ROC AUC 0.543; p = 9.38 x 10(-6) vs null model), but lower discriminative power than model 1 (p = 5.92 x 10(-5)). Model 3, with only lipid-associated SNPs, had significantly higher discriminative power than the null model (ROC AUC 0.565; p = 1.44 x 10(-9)) and was not statistically different from model 1 (p = 0.083). The ROC AUC of model 4, which included only obesity SNPs, was 0.557 (p = 2.30 x 10(-7) vs null model) and smaller than model 1 (p = 0.025). Finally, the model including all SNPs yielded a significant improvement in discriminative power compared with the null model (p < 1.0 x 10(-20)) and model 1 (p = 1.32 x 10(-5)); its ROC AUC was 0.626.

    Adding SNPs previously associated with fasting glucose, insulin, lipids or obesity to a genetic risk score for type 2 diabetes significantly increases the power to discriminate between people with and without clinically manifest type 2 diabetes compared with a model including only conventional type 2 diabetes loci.

    Funded by: Wellcome Trust: 077016/Z/05/Z

    Diabetologia 2010;53;10;2155-62

  • COSMIC (the Catalogue of Somatic Mutations in Cancer): a resource to investigate acquired mutations in human cancer.

    Forbes SA, Tang G, Bindal N, Bamford S, Dawson E, Cole C, Kok CY, Jia M, Ewing R, Menzies A, Teague JW, Stratton MR and Futreal PA

    Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    The Catalogue of Somatic Mutations in Cancer (COSMIC) is the largest public resource for information on somatically acquired mutations in human cancer and is available freely without restrictions. Currently (v43, August 2009), COSMIC contains details of 1.5 million experiments performed through 13,423 genes in almost 370,000 tumours, describing over 90,000 individual mutations. Data are gathered from two sources: publications in the scientific literature (v43 contains 7797 curated articles) and the full output of the genome-wide screens from the Cancer Genome Project (CGP) at the Sanger Institute, UK. Most of the world's literature on point mutations in human cancer has now been curated into COSMIC and while this is continually updated, a greater emphasis on curating fusion gene mutations is driving the expansion of this information; over 2700 fusion gene mutations are now described. Whole-genome sequencing screens are now identifying large numbers of genomic rearrangements in cancer and COSMIC is now displaying details of these analyses also. Examination of COSMIC's data is primarily web-driven, focused on providing mutation range and frequency statistics based upon a choice of gene and/or cancer phenotype. Graphical views provide easily interpretable summaries of large quantities of data, and export functions can provide precise details of user-selected data.

    Funded by: Wellcome Trust: 077012/Z/05/Z

    Nucleic acids research 2010;38;Database issue;D652-7

  • Quality, quantity and harmony: the DataSHaPER approach to integrating data across bioclinical studies.

    Fortier I, Burton PR, Robson PJ, Ferretti V, Little J, L'Heureux F, Deschênes M, Knoppers BM, Doiron D, Keers JC, Linksted P, Harris JR, Lachance G, Boileau C, Pedersen NL, Hamilton CM, Hveem K, Borugian MJ, Gallagher RP, McLaughlin J, Parker L, Potter JD, Gallacher J, Kaaks R, Liu B, Sprosen T, Vilain A, Atkinson SA, Rengifo A, Morton R, Metspalu A, Wichmann HE, Tremblay M, Chisholm RL, Garcia-Montero A, Hillege H, Litton JE, Palmer LJ, Perola M, Wolffenbuttel BH, Peltonen L and Hudson TJ

    Public Population Project in Genomics (P³G), Montreal, QC, Canada.

    Background: Vast sample sizes are often essential in the quest to disentangle the complex interplay of the genetic, lifestyle, environmental and social factors that determine the aetiology and progression of chronic diseases. The pooling of information between studies is therefore of central importance to contemporary bioscience. However, there are many technical, ethico-legal and scientific challenges to be overcome if an effective, valid, pooled analysis is to be achieved. Perhaps most critically, any data that are to be analysed in this way must be adequately 'harmonized'. This implies that the collection and recording of information and data must be done in a manner that is sufficiently similar in the different studies to allow valid synthesis to take place.

    Methods: This conceptual article describes the origins, purpose and scientific foundations of the DataSHaPER (DataSchema and Harmonization Platform for Epidemiological Research), which has been created by a multidisciplinary consortium of experts that was pulled together and coordinated by three international organizations: P³G (Public Population Project in Genomics), PHOEBE (Promoting Harmonization of Epidemiological Biobanks in Europe) and CPT (Canadian Partnership for Tomorrow Project).

    Results: The DataSHaPER provides a flexible, structured approach to the harmonization and pooling of information between studies. Its two primary components, the 'DataSchema' and 'Harmonization Platforms', together support the preparation of effective data-collection protocols and provide a central reference to facilitate harmonization. The DataSHaPER supports both 'prospective' and 'retrospective' harmonization.

    Conclusion: It is hoped that this article will encourage readers to investigate the project further: the more the research groups and studies are actively involved, the more effective the DataSHaPER programme will ultimately be.

    Funded by: Wellcome Trust: 086160/Z/08/A

    International journal of epidemiology 2010;39;5;1383-93

  • Lysine-specific demethylase 1 regulates the embryonic transcriptome and CoREST stability.

    Foster CT, Dovey OM, Lezina L, Luo JL, Gant TW, Barlev N, Bradley A and Cowley SM

    Department of Biochemistry, University of Leicester, Leicester, United Kingdom.

    Lysine-specific demethylase 1 (LSD1), which demethylates mono- and dimethylated histone H3-Lys4 as part of a complex including CoREST and histone deacetylases (HDACs), is essential for embryonic development in the mouse beyond embryonic day 6.5 (e6.5). To determine the role of LSD1 during this early period of embryogenesis, we have generated loss-of-function gene trap mice and conditional knockout embryonic stem (ES) cells. Analysis of postimplantation gene trap embryos revealed that LSD1 expression, and therefore function, is restricted to the epiblast. Conditional deletion of LSD1 in mouse ES cells, the in vitro counterpart of the epiblast, revealed a reduction in CoREST protein and associated HDAC activity, resulting in a global increase in histone H3-Lys56 acetylation, but not H3-Lys4 methylation. Despite this biochemical perturbation, ES cells with LSD1 deleted proliferate normally and retain stem cell characteristics. Loss of LSD1 causes the aberrant expression of 588 genes, including those coding for transcription factors with roles in anterior/posterior patterning and limb development, such as brachyury, Hoxb7, Hoxd8, and retinoic acid receptor γ (RARγ). The gene coding for brachyury, a key regulator of mesodermal differentiation, is a direct target gene of LSD1 and is overexpressed in e6.5 Lsd1 gene trap embryos. Thus, LSD1 regulates the expression and appropriate timing of key developmental regulators, as part of the LSD1/CoREST/HDAC complex, during early embryonic development.

    Funded by: Medical Research Council: G0600135

    Molecular and cellular biology 2010;30;20;4851-63

  • Genome-wide meta-analysis increases to 71 the number of confirmed Crohn's disease susceptibility loci.

    Franke A, McGovern DP, Barrett JC, Wang K, Radford-Smith GL, Ahmad T, Lees CW, Balschun T, Lee J, Roberts R, Anderson CA, Bis JC, Bumpstead S, Ellinghaus D, Festen EM, Georges M, Green T, Haritunians T, Jostins L, Latiano A, Mathew CG, Montgomery GW, Prescott NJ, Raychaudhuri S, Rotter JI, Schumm P, Sharma Y, Simms LA, Taylor KD, Whiteman D, Wijmenga C, Baldassano RN, Barclay M, Bayless TM, Brand S, Büning C, Cohen A, Colombel JF, Cottone M, Stronati L, Denson T, De Vos M, D'Inca R, Dubinsky M, Edwards C, Florin T, Franchimont D, Gearry R, Glas J, Van Gossum A, Guthery SL, Halfvarson J, Verspaget HW, Hugot JP, Karban A, Laukens D, Lawrance I, Lemann M, Levine A, Libioulle C, Louis E, Mowat C, Newman W, Panés J, Phillips A, Proctor DD, Regueiro M, Russell R, Rutgeerts P, Sanderson J, Sans M, Seibold F, Steinhart AH, Stokkers PC, Torkvist L, Kullak-Ublick G, Wilson D, Walters T, Targan SR, Brant SR, Rioux JD, D'Amato M, Weersma RK, Kugathasan S, Griffiths AM, Mansfield JC, Vermeire S, Duerr RH, Silverberg MS, Satsangi J, Schreiber S, Cho JH, Annese V, Hakonarson H, Daly MJ and Parkes M

    Institute of Clinical Molecular Biology, Christian-Albrechts-University Kiel, Kiel, Germany.

    We undertook a meta-analysis of six Crohn's disease genome-wide association studies (GWAS) comprising 6,333 affected individuals (cases) and 15,056 controls and followed up the top association signals in 15,694 cases, 14,026 controls and 414 parent-offspring trios. We identified 30 new susceptibility loci meeting genome-wide significance (P < 5 × 10⁻⁸). A series of in silico analyses highlighted particular genes within these loci and, together with manual curation, implicated functionally interesting candidate genes including SMAD3, ERAP2, IL10, IL2RA, TYK2, FUT2, DNMT3A, DENND1B, BACH2 and TAGAP. Combined with previously confirmed loci, these results identify 71 distinct loci with genome-wide significant evidence for association with Crohn's disease.

    Funded by: Chief Scientist Office: CZB/4/540, ETM/137, ETM/75; Medical Research Council: G0600329, G0800675, G0800759; NCRR NIH HHS: M01-RR00425; NHLBI NIH HHS: N01 HC-15103, N01 HC-55222, N01-HC-35129, N01-HC-45133, N01-HC-75150, N01-HC-85079, N01-HC-85080, N01-HC-85081, N01-HC-85082, N01-HC-85083, N01-HC-85084, N01-HC-85085, N01-HC-85086, R01 HL087652, U01 HL080295; NIAMS NIH HHS: K08 AR055688-01A1S1, K08 AR055688-03, K08 AR055688-04; NIDDK NIH HHS: DK 063491, DK062413, DK062420, DK062422, DK062423, DK062429, DK062431, DK062432, DK064869, DK069513, DK084554, DK76984, P01-DK046763, R01 DK064869-09, U01 DK062420; Wellcome Trust: 089120, WT089120/Z/09/Z

    Nature genetics 2010;42;12;1118-25

  • Nonobese diabetic congenic strain analysis of autoimmune diabetes reveals genetic complexity of the Idd18 locus and identifies Vav3 as a candidate gene.

    Fraser HI, Dendrou CA, Healy B, Rainbow DB, Howlett S, Smink LJ, Gregory S, Steward CA, Todd JA, Peterson LB and Wicker LS

    Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, Cambridge Institute for Medical Research, University of Cambridge, Cambridge.

    We have used the public sequencing and annotation of the mouse genome to delimit the previously resolved type 1 diabetes (T1D) insulin-dependent diabetes (Idd)18 interval to a region on chromosome 3 that includes the immunologically relevant candidate gene, Vav3. To test the candidacy of Vav3, we developed a novel congenic strain that enabled the resolution of Idd18 to a 604-kb interval, designated Idd18.1, which contains only two annotated genes: the complete sequence of Vav3 and the last exon of the gene encoding NETRIN G1, Ntng1. Targeted sequencing of Idd18.1 in the NOD mouse strain revealed that allelic variation between NOD and C57BL/6J (B6) occurs in noncoding regions with 138 single nucleotide polymorphisms concentrated in the introns between exons 20 and 27 and immediately after the 3' untranslated region. We observed differential expression of VAV3 RNA transcripts in thymocytes when comparing congenic mouse strains with B6 or NOD alleles at Idd18.1. The T1D protection associated with B6 alleles of Idd18.1/Vav3 requires the presence of B6 protective alleles at Idd3, which are correlated with increased IL-2 production and regulatory T cell function. In the absence of B6 protective alleles at Idd3, we detected a second T1D protective B6 locus, Idd18.3, which is closely linked to, but distinct from, Idd18.1. Therefore, genetic mapping, sequencing, and gene expression evidence indicate that alteration of VAV3 expression is an etiological factor in the development of autoimmune beta-cell destruction in NOD mice. This study also demonstrates that a congenic strain mapping approach can isolate closely linked susceptibility genes.

    Funded by: NIAID NIH HHS: AI 15416; Wellcome Trust: 061858, 061859, 079895

    Journal of immunology (Baltimore, Md. : 1950) 2010;184;9;5075-84

  • Variants in ADCY5 and near CCNL1 are associated with fetal growth and birth weight.

    Freathy RM, Mook-Kanamori DO, Sovio U, Prokopenko I, Timpson NJ, Berry DJ, Warrington NM, Widen E, Hottenga JJ, Kaakinen M, Lange LA, Bradfield JP, Kerkhof M, Marsh JA, Mägi R, Chen CM, Lyon HN, Kirin M, Adair LS, Aulchenko YS, Bennett AJ, Borja JB, Bouatia-Naji N, Charoen P, Coin LJ, Cousminer DL, de Geus EJ, Deloukas P, Elliott P, Evans DM, Froguel P, Genetic Investigation of ANthropometric Traits (GIANT) Consortium, Glaser B, Groves CJ, Hartikainen AL, Hassanali N, Hirschhorn JN, Hofman A, Holly JM, Hyppönen E, Kanoni S, Knight BA, Laitinen J, Lindgren CM, Meta-Analyses of Glucose and Insulin-related traits Consortium, McArdle WL, O'Reilly PF, Pennell CE, Postma DS, Pouta A, Ramasamy A, Rayner NW, Ring SM, Rivadeneira F, Shields BM, Strachan DP, Surakka I, Taanila A, Tiesler C, Uitterlinden AG, van Duijn CM, Wellcome Trust Case Control Consortium, Wijga AH, Willemsen G, Zhang H, Zhao J, Wilson JF, Steegers EA, Hattersley AT, Eriksson JG, Peltonen L, Mohlke KL, Grant SF, Hakonarson H, Koppelman GH, Dedoussis GV, Heinrich J, Gillman MW, Palmer LJ, Frayling TM, Boomsma DI, Davey Smith G, Power C, Jaddoe VW, Jarvelin MR, Early Growth Genetics (EGG) Consortium and McCarthy MI

    Genetics of Complex Traits, Peninsula College of Medicine and Dentistry, University of Exeter, Exeter, UK.

    To identify genetic variants associated with birth weight, we meta-analyzed six genome-wide association (GWA) studies (n = 10,623 Europeans from pregnancy/birth cohorts) and followed up two lead signals in 13 replication studies (n = 27,591). rs900400 near LEKR1 and CCNL1 (P = 2 x 10(-35)) and rs9883204 in ADCY5 (P = 7 x 10(-15)) were robustly associated with birth weight. Correlated SNPs in ADCY5 were recently implicated in regulation of glucose levels and susceptibility to type 2 diabetes, providing evidence that the well-described association between lower birth weight and subsequent type 2 diabetes has a genetic component, distinct from the proposed role of programming by maternal nutrition. Using data from both SNPs, we found that the 9% of Europeans carrying four birth weight-lowering alleles were, on average, 113 g (95% CI 89-137 g) lighter at birth than the 24% with zero or one alleles (P(trend) = 7 x 10(-30)). The impact on birth weight is similar to that of a mother smoking 4-5 cigarettes per day in the third trimester of pregnancy.

    Funded by: British Heart Foundation; Canadian Institutes of Health Research: MOP 82893; Chief Scientist Office: CZB/4/710; Department of Health; FIC NIH HHS: TW05596; Medical Research Council: G0000934, G0500070, G0500539, G0600705, G0601261, G0601653, G0800582, G0801056, G9815508; NCRR NIH HHS: RR20649; NHLBI NIH HHS: HL068041, HL085144, HL0876792; NICHD NIH HHS: HD034568, HD05450, HD056465, R24 HD050924-07; NIDDK NIH HHS: 1R01DK075787, DK075787, DK078150, DK56350, R01 DK078150-01, R01 DK078150-02, R01 DK078150-03; NIEHS NIH HHS: ES10126; NIMH NIH HHS: MH083268, MH63706; Wellcome Trust: 068545/Z/02, 076113/B/04/Z, 085301, 085541, 89061/Z/09/Z

    Nature genetics 2010;42;5;430-5

  • Mouse welfare terms

    Gardiner M, Wells S, Trower C, Salisbury J, Mallon AM, Beck T, Melvin D and Bussell J

    Animal Technology and Welfare. 2010;9;175

  • SnoPatrol: how many snoRNA genes are there?

    Gardner PP, Bateman A and Poole AM

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK.

    Small nucleolar RNAs (snoRNAs) are among the most evolutionarily ancient classes of small RNA. Two experimental screens published in BMC Genomics expand the eukaryotic snoRNA catalog, but many more snoRNAs remain to be found.

    Funded by: Wellcome Trust: 077044, WT077044/Z/05/Z

    Journal of biology 2010;9;1;4

  • The Gene Ontology in 2010: extensions and refinements.

    Gene Ontology Consortium

    The Gene Ontology (GO) Consortium (GOC) continues to develop, maintain and use a set of structured, controlled vocabularies for the annotation of genes, gene products and sequences. The GO ontologies are expanding both in content and in structure. Several new relationship types have been introduced and used, along with existing relationships, to create links between and within the GO domains. These improve the representation of biology, facilitate querying, and allow GO developers to systematically check for and correct inconsistencies within the GO. Gene product annotation using GO continues to increase both in the number of total annotations and in species coverage. GO tools, such as OBO-Edit, an ontology-editing tool, and AmiGO, the GOC ontology browser, have seen major improvements in functionality, speed and ease of use.

    Funded by: British Heart Foundation: SP/07/007/23671; Medical Research Council: G0500293; NHGRI NIH HHS: #P41HG02273, HG000330, HG002659, HG003751, HG004341, HG01315, HG02223, P41 HG02273; NHLBI NIH HHS: HL64541; NIGMS NIH HHS: U24GM077905, U24GM088849

    Nucleic acids research 2010;38;Database issue;D331-5

  • A genome-wide association study identifies new psoriasis susceptibility loci and an interaction between HLA-C and ERAP1.

    Genetic Analysis of Psoriasis Consortium & the Wellcome Trust Case Control Consortium 2, Strange A, Capon F, Spencer CC, Knight J, Weale ME, Allen MH, Barton A, Band G, Bellenguez C, Bergboer JG, Blackwell JM, Bramon E, Bumpstead SJ, Casas JP, Cork MJ, Corvin A, Deloukas P, Dilthey A, Duncanson A, Edkins S, Estivill X, Fitzgerald O, Freeman C, Giardina E, Gray E, Hofer A, Hüffmeier U, Hunt SE, Irvine AD, Jankowski J, Kirby B, Langford C, Lascorz J, Leman J, Leslie S, Mallbris L, Markus HS, Mathew CG, McLean WH, McManus R, Mössner R, Moutsianas L, Naluai AT, Nestle FO, Novelli G, Onoufriadis A, Palmer CN, Perricone C, Pirinen M, Plomin R, Potter SC, Pujol RM, Rautanen A, Riveira-Munoz E, Ryan AW, Salmhofer W, Samuelsson L, Sawcer SJ, Schalkwijk J, Smith CH, Ståhle M, Su Z, Tazi-Ahnini R, Traupe H, Viswanathan AC, Warren RB, Weger W, Wolk K, Wood N, Worthington J, Young HS, Zeeuwen PL, Hayday A, Burden AD, Griffiths CE, Kere J, Reis A, McVean G, Evans DM, Brown MA, Barker JN, Peltonen L, Donnelly P and Trembath RC

    Wellcome Trust Centre for Human Genetics, Oxford, UK.

    To identify new susceptibility loci for psoriasis, we undertook a genome-wide association study of 594,224 SNPs in 2,622 individuals with psoriasis and 5,667 controls. We identified associations at eight previously unreported genomic loci. Seven loci harbored genes with recognized immune functions (IL28RA, REL, IFIH1, ERAP1, TRAF3IP2, NFKBIA and TYK2). These associations were replicated in 9,079 European samples (six loci with a combined P < 5 × 10⁻⁸ and two loci with a combined P < 5 × 10⁻⁷). We also report compelling evidence for an interaction between the HLA-C and ERAP1 loci (combined P = 6.95 × 10⁻⁶). ERAP1 plays an important role in MHC class I peptide processing. ERAP1 variants only influenced psoriasis susceptibility in individuals carrying the HLA-C risk allele. Our findings implicate pathways that integrate epidermal barrier dysfunction with innate and adaptive immune dysregulation in psoriasis pathogenesis.

    Funded by: Department of Health; Medical Research Council: G0000934, G0601387; Wellcome Trust: 068545/Z/02, 083948/Z/07/Z, 084726

    Nature genetics 2010;42;11;985-90

  • Cell-type-specific long-range looping interactions identify distant regulatory elements of the CFTR gene.

    Gheldof N, Smith EM, Tabuchi TM, Koch CM, Dunham I, Stamatoyannopoulos JA and Dekker J

    Program in Gene Function and Expression and Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, 364 Plantation Street, Worcester, MA 01605-0103, USA.

    Identification of regulatory elements and their target genes is complicated by the fact that regulatory elements can act over large genomic distances. Identification of long-range acting elements is particularly important in the case of disease genes as mutations in these elements can result in human disease. It is becoming increasingly clear that long-range control of gene expression is facilitated by chromatin looping interactions. These interactions can be detected by chromosome conformation capture (3C). Here, we employed 3C as a discovery tool for identification of long-range regulatory elements that control the cystic fibrosis transmembrane conductance regulator gene, CFTR. We identified four elements in a 460-kb region around the locus that loop specifically to the CFTR promoter exclusively in CFTR expressing cells. The elements are located 20 and 80 kb upstream; and 109 and 203 kb downstream of the CFTR promoter. These elements contain DNase I hypersensitive sites and histone modification patterns characteristic of enhancers. The elements also interact with each other and the latter two activate the CFTR promoter synergistically in reporter assays. Our results reveal novel long-range acting elements that control expression of CFTR and suggest that 3C-based approaches can be used for discovery of novel regulatory elements.

    Funded by: NHGRI NIH HHS: HG003143, HG004592, R01 HG003143-06, U01 HG003168; Wellcome Trust

    Nucleic acids research 2010;38;13;4325-36

  • No correlation between childhood maltreatment and telomere length.

    Glass D, Parts L, Knowles D, Aviv A and Spector TD

    Funded by: Wellcome Trust

    Biological psychiatry 2010;68;6;e21-2; author reply e23-4

  • Transcription profiling in human platelets reveals LRRFIP1 as a novel protein regulating platelet function.

    Goodall AH, Burns P, Salles I, Macaulay IC, Jones CI, Ardissino D, de Bono B, Bray SL, Deckmyn H, Dudbridge F, Fitzgerald DJ, Garner SF, Gusnanto A, Koch K, Langford C, O'Connor MN, Rice CM, Stemple D, Stephens J, Trip MD, Zwaginga JJ, Samani NJ, Watkins NA, Maguire PB, Ouwehand WH and Bloodomics Consortium

    Department of Cardiovascular Science, University of Leicester, Clinical Sciences Wing, Glenfield Hospital, Leicester, UK.

    Within the healthy population, there is substantial, heritable, and interindividual variability in the platelet response. We explored whether a proportion of this variability could be accounted for by interindividual variation in gene expression. Through a correlative analysis of genome-wide platelet RNA expression data from 37 subjects representing the normal range of platelet responsiveness within a cohort of 500 subjects, we identified 63 genes in which transcript levels correlated with variation in the platelet response to adenosine diphosphate and/or the collagen-mimetic peptide, cross-linked collagen-related peptide. Many of these encode proteins with no reported function in platelets. An association study of 6 of the 63 genes in 4235 cases and 6379 controls showed a putative association with myocardial infarction for COMMD7 (COMM domain-containing protein 7) and a major deviation from the null hypothesis for LRRFIP1 [leucine-rich repeat (in FLII) interacting protein 1]. Morpholino-based silencing in Danio rerio identified a modest role for commd7 and a significant effect for lrrfip1 as positive regulators of thrombus formation. Proteomic analysis of human platelet LRRFIP1-interacting proteins indicated that LRRFIP1 functions as a component of the platelet cytoskeleton, where it interacts with the actin-remodeling proteins Flightless-1 and Drebrin. Taken together, these data reveal novel proteins regulating the platelet response.

    Funded by: Medical Research Council: MC_U105292688

    Blood 2010;116;22;4646-56

  • BioRuby: bioinformatics software for the Ruby programming language.

    Goto N, Prins P, Nakao M, Bonnal R, Aerts J and Katayama T

    Department of Genome Informatics, Genome Information Research Center, Research Institute for Microbial Diseases, Osaka University, Japan.

    Summary: The BioRuby software toolkit contains a comprehensive set of free development tools and libraries for bioinformatics and molecular biology, written in the Ruby programming language. BioRuby has components for sequence analysis, pathway analysis, protein modelling and phylogenetic analysis; it supports many widely used data formats and provides easy access to databases, external programs and public web services, including BLAST, KEGG, GenBank, MEDLINE and GO. BioRuby comes with a tutorial, documentation and an interactive environment, which can be used in the shell, and in the web browser.

    Availability: BioRuby is free and open source software, made available under the Ruby license. BioRuby runs on all platforms that support Ruby, including Linux, Mac OS X and Windows. And, with JRuby, BioRuby runs on the Java Virtual Machine. The source code is available from


    Bioinformatics (Oxford, England) 2010;26;20;2617-9

  • Targeted TAP tags, phosphoproteomes and the biology of thought.

    Grant SG

    Expert review of proteomics 2010;7;2;169-71

  • Computing behaviour in complex synapses

    Grant SG

    Biochemist. 2010;32;6-9

  • PICNIC: an algorithm to predict absolute allelic copy number variation with microarray cancer data.

    Greenman CD, Bignell G, Butler A, Edkins S, Hinton J, Beare D, Swamy S, Santarius T, Chen L, Widaa S, Futreal PA and Stratton MR

    Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    High-throughput oligonucleotide microarrays are commonly employed to investigate genetic disease, including cancer. The algorithms employed to extract genotypes and copy number variation function optimally for diploid genomes usually associated with inherited disease. However, cancer genomes are aneuploid in nature leading to systematic errors when using these techniques. We introduce a preprocessing transformation and hidden Markov model algorithm bespoke to cancer. This produces genotype classification, specification of regions of loss of heterozygosity, and absolute allelic copy number segmentation. Accurate prediction is demonstrated with a combination of independent experimental techniques. These methods are exemplified with affymetrix genome-wide SNP6.0 data from 755 cancer cell lines, enabling inference upon a number of features of biological interest. These data and the coded algorithm are freely available for download.

    Biostatistics (Oxford, England) 2010;11;1;164-75

  • Rare copy number variants: a point of rarity in genetic risk for bipolar disorder and schizophrenia.

    Grozeva D, Kirov G, Ivanov D, Jones IR, Jones L, Green EK, St Clair DM, Young AH, Ferrier N, Farmer AE, McGuffin P, Holmans PA, Owen MJ, O'Donovan MC, Craddock N and Wellcome Trust Case Control Consortium

    Medical Research Council Centre for Neuropsychiatric Genetics and Genomics, School of Medicine, Cardiff University, Cardiff CF14 4XN, Wales, UK.

    Context: Recent studies suggest that copy number variation in the human genome is extensive and may play an important role in susceptibility to disease, including neuropsychiatric disorders such as schizophrenia and autism. The possible involvement of copy number variants (CNVs) in bipolar disorder has received little attention to date.

    Objectives: To determine whether large (>100,000 base pairs) and rare (found in <1% of the population) CNVs are associated with susceptibility to bipolar disorder and to compare with findings in schizophrenia.

    Design: A genome-wide survey of large, rare CNVs in a case-control sample using a high-density microarray.

    Setting: The Wellcome Trust Case Control Consortium.

    Participants: There were 1697 cases of bipolar disorder and 2806 nonpsychiatric controls. All participants were white UK residents.

    Main outcome measures: Overall load of CNVs and presence of rare CNVs.

    Results: The burden of CNVs in bipolar disorder was not increased compared with controls and was significantly less than in schizophrenia cases. The CNVs previously implicated in the etiology of schizophrenia were not more common in cases with bipolar disorder.

    Conclusions: Schizophrenia and bipolar disorder differ with respect to CNV burden in general and association with specific CNVs in particular. Our data are consistent with the possibility that possession of large, rare deletions may modify the phenotype in those at risk of psychosis: those possessing such events are more likely to be diagnosed as having schizophrenia, and those without them are more likely to be diagnosed as having bipolar disorder.

    Funded by: Chief Scientist Office: CZB/4/540, ETM/137, ETM/75; Medical Research Council: G0600329, G0701003, G0701420, G0800509, G0800759, G90/106; Wellcome Trust: 061858

    Archives of general psychiatry 2010;67;4;318-27
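
The size and frequency thresholds stated in the objectives of the entry above (CNVs larger than 100,000 base pairs and found in fewer than 1% of the population) can be illustrated with a minimal Python sketch. The `CNV` record, its field names and the `burden` helper are hypothetical and chosen only for illustration; they are not taken from the study's actual pipeline.

```python
from dataclasses import dataclass

# Thresholds quoted in the abstract; everything else here is an assumption.
MIN_LENGTH_BP = 100_000   # "large": more than 100,000 base pairs
MAX_FREQUENCY = 0.01      # "rare": found in fewer than 1% of the population


@dataclass(frozen=True)
class CNV:
    """Hypothetical CNV call with 1-based inclusive coordinates."""
    chrom: str
    start: int
    end: int
    frequency: float  # population frequency as a fraction between 0 and 1

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def is_large_and_rare(cnv: CNV) -> bool:
    """Return True if the call passes both the size and the frequency filter."""
    return cnv.length > MIN_LENGTH_BP and cnv.frequency < MAX_FREQUENCY


def burden(calls_by_sample: dict) -> dict:
    """Count large, rare CNVs per sample (a simple per-sample burden)."""
    return {
        sample: sum(1 for cnv in calls if is_large_and_rare(cnv))
        for sample, calls in calls_by_sample.items()
    }
```

A per-sample count of this kind is only one possible way to summarise burden; the abstract does not specify how overall CNV load was computed.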

  • CASK mutations are frequent in males and cause X-linked nystagmus and variable XLMR phenotypes.

    Hackett A, Tarpey PS, Licata A, Cox J, Whibley A, Boyle J, Rogers C, Grigg J, Partington M, Stevenson RE, Tolmie J, Yates JR, Turner G, Wilson M, Futreal AP, Corbett M, Shaw M, Gecz J, Raymond FL, Stratton MR, Schwartz CE and Abidi FE

    Genetics of Learning Disability Service, Hunter Genetics, Waratah, New South Wales, Australia.

    Mutations of the calcium/calmodulin-dependent serine protein kinase (CASK) gene have recently been associated with X-linked mental retardation (XLMR) with microcephaly, optic atrophy and brainstem and cerebellar hypoplasia, as well as with an X-linked syndrome having some FG-like features. Our group has recently identified four male probands from 358 probable XLMR families with missense mutations (p.Y268H, p.P396S, p.D710G and p.W919R) in the CASK gene. Congenital nystagmus, a rare and striking feature, was present in two of these families. We screened a further 45 probands with either nystagmus or microcephaly and mental retardation (MR), and identified two further mutations, a missense mutation (p.Y728C) and a splice mutation (c.2521-2A>T) in two small families with nystagmus and MR. Detailed clinical examinations of all six families, including an ophthalmological review in four families, were undertaken to further characterise the phenotype. We report on the clinical features of 24 individuals, mostly male, from six families with CASK mutations. The phenotype was variable, ranging from non-syndromic mild MR to severe MR associated with microcephaly and dysmorphic facial features. Carrier females were variably affected. Congenital nystagmus was found in members of four of the families. Our findings reinforce the CASK gene as a relatively frequent cause of XLMR in females and males. We further define the phenotypic spectrum and demonstrate that affected males with missense mutations or in-frame deletions in CASK are frequently associated with congenital nystagmus and XLMR, a striking feature not previously reported.

    Funded by: Medical Research Council; NICHD NIH HHS: HD260202; Wellcome Trust

    European journal of human genetics : EJHG 2010;18;5;544-52

  • NK cells influence both innate and adaptive immune responses after mucosal immunization with antigen and mucosal adjuvant.

    Hall LJ, Clare S and Dougan G

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom.

    NK cells were found to be recruited in a temporally controlled manner to the nasal-associated lymphoid tissue and the cervical lymph nodes of mice after intranasal immunization with Ag85B-early secreted antigenic target 6 kDa from Mycobacterium tuberculosis mixed with Escherichia coli heat-labile toxin as adjuvant. These NK cells were activated and secreted a diverse range of cytokines and other immunomodulators. Using Ab depletion targeting anti-asialo GM1, we found evidence for altered trafficking, impaired activation, and cytokine secretion of dendritic cells, macrophages, and neutrophils in immunized NK cell-depleted mice compared with control animals. Analysis of Ag-specific immune responses revealed an attenuated Ab and cytokine response in immunized NK cell-depleted animals. Systemic administration of rIL-6 but not rIFN-gamma significantly restored immune responses in mice depleted of NK cells. In conclusion, cytokine production, particularly IL-6, via NK cells and NK cell-activated immune populations plays an important role in the establishment of local innate immune responses and the consequent development of adaptive immunity after mucosal immunization.

    Journal of immunology (Baltimore, Md. : 1950) 2010;184;8;4327-37

  • Probing local innate immune responses after mucosal immunisation.

    Hall LJ, Clare S and Dougan G

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.

    Background: Intranasal immunisation is potentially a very effective route for inducing both mucosal and systemic immunity to an infectious agent.

    Methods: Balb/c mice were intranasally immunised with the mucosal adjuvant heat labile toxin and the Mycobacterium tuberculosis fusion protein Ag85B-ESAT6 and early changes in innate immune responses within local mucosal tissues were examined using flow cytometry and confocal microscopy. Antigen-specific humoral and cellular immune responses were also evaluated.

    Results: Intranasal immunisation induced significant changes in both number and distribution of dendritic cells, macrophages and neutrophils within the nasal-associated lymphoid tissue and cervical lymph nodes in comparison to controls as early as 5 h post immunisation. Immunisation also resulted in a rapid and transient increase in activation marker expression first in the nasal-associated lymphoid tissue, and then in the cervical lymph nodes. This heightened activation status was also apparent from the pro-inflammatory cytokine profiles of these innate populations. In addition we also showed increased expression and distribution of a number of different cell adhesion molecules early after intranasal immunisation within these lymphoid tissues. These observed early changes correlated with the induction of a TH1 type immune response.

    Conclusions: These data provide insights into the complex nature of innate immune responses induced following intranasal immunisation within the upper respiratory tract, and may help clarify the concepts and provide the tools that are needed to exploit the full potential of mucosal vaccines.

    Journal of immune based therapies and vaccines 2010;8;5

  • Being more realistic about the public health impact of genomic medicine.

    Hall WD, Mathews R and Morley KI

    University of Queensland Centre for Clinical Research, The University of Queensland, Herston, Queensland, Australia.

    PLoS medicine 2010;7;10

  • A pharmacometric model describing the relationship between warfarin dose and INR response with respect to variations in CYP2C9, VKORC1, and age.

    Hamberg AK, Wadelius M, Lindh JD, Dahl ML, Padrini R, Deloukas P, Rane A and Jonsson EN

    Department of Medical Sciences, Clinical Pharmacology, Uppsala University Hospital, Uppsala, Sweden.

    The objective of the study was to update a previous NONMEM model to describe the relationship between warfarin dose and international normalized ratio (INR) response, to decrease the dependence of the model on pharmacokinetic (PK) data, and to improve the characterization of rare genotype combinations. The effects of age and CYP2C9 genotype on S-warfarin clearance were estimated from high-quality PK data. Thereafter, a temporal dose-response (K-PD) model was developed from information on dose, INR, age, and CYP2C9 and VKORC1 genotype, with drug clearance as a covariate. Two transit compartment chains accounted for the delay between exposure and response. CYP2C9 genotype was identified as the single most important predictor of required dose, causing a difference of up to 4.2-fold in the maintenance dose. VKORC1 accounted for a difference of up to 2.1-fold in dose, and age reduced the dose requirement by ~6% per decade. This reformulated K-PD model decreases dependence on PK data and enables robust assessment of INR response and dose predictions, even in individuals with rare genotype combinations.

    Clinical pharmacology and therapeutics 2010;87;6;727-34
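
The abstract above mentions transit compartment chains that account for the delay between exposure and response. The following Python sketch shows the general idea of a single transit chain integrated with forward Euler. The number of compartments, the rate constant `ktr`, the step size and the function name are assumptions made for illustration; they are not the parameters or the NONMEM implementation of the published model.

```python
def simulate_transit_chain(signal, n_compartments, ktr, dt):
    """Pass an input signal through a chain of transit compartments.

    Each compartment i follows dA_i/dt = ktr * (A_{i-1} - A_i), where A_0 is
    the input signal. Forward Euler is used, so dt * ktr should be well
    below 1 for a stable and reasonably accurate result.
    Returns the value of the last compartment at every time step.
    """
    state = [0.0] * n_compartments
    output = []
    for u in signal:
        upstream = u
        new_state = []
        for amount in state:
            # Update from the previous step's values only (explicit Euler).
            new_state.append(amount + dt * ktr * (upstream - amount))
            upstream = amount
        state = new_state
        output.append(state[-1])
    return output


if __name__ == "__main__":
    # A constant input of 1.0: the output starts at zero and rises with a lag.
    response = simulate_transit_chain([1.0] * 2000, n_compartments=3, ktr=0.5, dt=0.1)
    print(response[0], response[-1])
```

Longer chains or smaller values of `ktr` give a longer delay before the last compartment follows the input, which is the property such chains are used for.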

  • KSHV-encoded miRNAs target MAF to induce endothelial cell reprogramming.

    Hansen A, Henderson S, Lagos D, Nikitenko L, Coulter E, Roberts S, Gratrix F, Plaisance K, Renne R, Bower M, Kellam P and Boshoff C

    Cancer Research UK Viral Oncology Group, University College London Cancer Institute, University College London, London WC1E 6BT, United Kingdom.

    Kaposi sarcoma herpesvirus (KSHV) induces transcriptional reprogramming of endothelial cells. In particular, KSHV-infected lymphatic endothelial cells (LECs) show an up-regulation of genes associated with blood vessel endothelial cells (BECs). Consequently, KSHV-infected tumor cells in Kaposi sarcoma are poorly differentiated endothelial cells, expressing markers of both LECs and BECs. MicroRNAs (miRNAs) are short noncoding RNA molecules that act post-transcriptionally to negatively regulate gene expression. Here we validate expression of the KSHV-encoded miRNAs in Kaposi sarcoma lesions and demonstrate that these miRNAs contribute to viral-induced reprogramming by silencing the cellular transcription factor MAF (musculoaponeurotic fibrosarcoma oncogene homolog). MAF is expressed in LECs but not in BECs. We identify a novel role for MAF as a transcriptional repressor, preventing expression of BEC-specific genes, thereby maintaining the differentiation status of LECs. These findings demonstrate that viral miRNAs could influence the differentiation status of infected cells, and thereby contribute to KSHV-induced oncogenesis.

    Funded by: Cancer Research UK; Medical Research Council: G0800168

    Genes & development 2010;24;2;195-205

  • Evolution of MRSA during hospital transmission and intercontinental spread.

    Harris SR, Feil EJ, Holden MT, Quail MA, Nickerson EK, Chantratita N, Gardete S, Tavares A, Day N, Lindsay JA, Edgeworth JD, de Lencastre H, Parkhill J, Peacock SJ and Bentley SD

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Current methods for differentiating isolates of predominant lineages of pathogenic bacteria often do not provide sufficient resolution to define precise relationships. Here, we describe a high-throughput genomics approach that provides a high-resolution view of the epidemiology and microevolution of a dominant strain of methicillin-resistant Staphylococcus aureus (MRSA). This approach reveals the global geographic structure within the lineage, its intercontinental transmission through four decades, and the potential to trace person-to-person transmission within a hospital environment. The ability to interrogate and resolve bacterial populations is applicable to a range of infectious diseases, as well as microbial ecology.

    Funded by: Department of Health; Wellcome Trust: 076964

    Science (New York, N.Y.) 2010;327;5964;469-74

  • WormBase: a comprehensive resource for nematode research.

    Harris TW, Antoshechkin I, Bieri T, Blasiar D, Chan J, Chen WJ, De La Cruz N, Davis P, Duesbury M, Fang R, Fernandes J, Han M, Kishore R, Lee R, Müller HM, Nakamura C, Ozersky P, Petcherski A, Rangarajan A, Rogers A, Schindelman G, Schwarz EM, Tuli MA, Van Auken K, Wang D, Wang X, Williams G, Yook K, Durbin R, Stein LD, Spieth J and Sternberg PW

    Ontario Institute For Cancer Research, Toronto, ON, Canada.

    WormBase is a central data repository for nematode biology. Initially created as a service to the Caenorhabditis elegans research field, WormBase has evolved into a powerful research tool in its own right. In the past 2 years, we expanded WormBase to include the complete genomic sequence, gene predictions and orthology assignments from a range of related nematodes. These comparative data enrich the C. elegans data with improved gene predictions and a better understanding of gene function. In turn, they bring the wealth of experimental knowledge of C. elegans to other systems of medical and agricultural importance. Here, we describe new species and data types now available at WormBase. In addition, we detail enhancements to our curatorial pipeline and website infrastructure to accommodate new genomes and an extensive user base.

    Funded by: Howard Hughes Medical Institute; NHGRI NIH HHS: P41-HG02223, P41HG02223

    Nucleic acids research 2010;38;Database issue;D463-7

  • no tail integrates two modes of mesoderm induction.

    Harvey SA, Tümpel S, Dubrulle J, Schier AF and Smith JC

    Wellcome Trust and Cancer Research UK, Gurdon Institute and Department of Zoology, University of Cambridge, Tennis Court Road, Cambridge CB2 1QN, UK.

    During early zebrafish development the nodal signalling pathway patterns the embryo into three germ layers, in part by inducing the expression of no tail (ntl), which is essential for correct mesoderm formation. When nodal signalling is inhibited ntl fails to be expressed in the dorsal margin, but ventral ntl expression is unaffected. These observations indicate that ntl transcription is under both nodal-dependent and nodal-independent regulation. Consistent with these observations and with a role for ntl in mesoderm formation, some somites form within the tail region of embryos lacking nodal signalling. In an effort to understand how ntl is regulated and thus how mesoderm forms, we have mapped the elements responsible for nodal-dependent and nodal-independent expression of ntl in the margin of the embryo. Our work demonstrates that expression of ntl in the margin is the consequence of two separate enhancers, which act to mediate different mechanisms of mesoderm formation. One of these enhancers responds to nodal signalling, and the other to Wnt and BMP signalling. We demonstrate that the nodal-independent regulation of ntl is essential for tail formation. Misexpression of Wnt and BMP ligands can induce the formation of an ectopic tail, which contains somites, in embryos devoid of nodal signalling, and this tail formation is dependent on ntl function. Similarly, nodal-independent tail somite formation requires ntl. At later stages in development ntl is required for notochord formation, and our analysis has also led to the identification of the enhancer required for ntl expression in the developing notochord.

    Funded by: Wellcome Trust

    Development (Cambridge, England) 2010;137;7;1127-35

  • Evolutionary dynamics of Clostridium difficile over short and long time scales.

    He M, Sebaihia M, Lawley TD, Stabler RA, Dawson LF, Martin MJ, Holt KE, Seth-Smith HM, Quail MA, Rance R, Brooks K, Churcher C, Harris D, Bentley SD, Burrows C, Clark L, Corton C, Murray V, Rose G, Thurston S, van Tonder A, Walker D, Wren BW, Dougan G and Parkhill J

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, United Kingdom.

    Clostridium difficile has rapidly emerged as the leading cause of antibiotic-associated diarrheal disease, with the transcontinental spread of various PCR ribotypes, including 001, 017, 027 and 078. However, the genetic basis for the emergence of C. difficile as a human pathogen is unclear. Whole genome sequencing was used to analyze genetic variation and virulence of a diverse collection of thirty C. difficile isolates, to determine both macro and microevolution of the species. Horizontal gene transfer and large-scale recombination of core genes has shaped the C. difficile genome over both short and long time scales. Phylogenetic analysis demonstrates C. difficile is a genetically diverse species, which has evolved within the last 1.1-85 million years. By contrast, the disease-causing isolates have arisen from multiple lineages, suggesting that virulence evolved independently in the highly epidemic lineages.

    Funded by: Wellcome Trust

    Proceedings of the National Academy of Sciences of the United States of America 2010;107;16;7527-32

  • Clear detection of ADIPOQ locus as the major gene for plasma adiponectin: results of genome-wide association analyses including 4659 European individuals.

    Heid IM, Henneman P, Hicks A, Coassin S, Winkler T, Aulchenko YS, Fuchsberger C, Song K, Hivert MF, Waterworth DM, Timpson NJ, Richards JB, Perry JR, Tanaka T, Amin N, Kollerits B, Pichler I, Oostra BA, Thorand B, Frants RR, Illig T, Dupuis J, Glaser B, Spector T, Guralnik J, Egan JM, Florez JC, Evans DM, Soranzo N, Bandinelli S, Carlson OD, Frayling TM, Burling K, Smith GD, Mooser V, Ferrucci L, Meigs JB, Vollenweider P, Dijk KW, Pramstaller P, Kronenberg F and van Duijn CM

    Department of Epidemiology and Preventive Medicine, Regensburg University Medical Center, Regensburg, Germany.

    Objective: Plasma adiponectin is strongly associated with various components of metabolic syndrome, type 2 diabetes and cardiovascular outcomes. Concentrations are highly heritable and differ between men and women. We therefore aimed to investigate the genetics of plasma adiponectin in men and women.

    Methods: We combined genome-wide association scans of three population-based studies including 4659 persons. For the replication stage in 13795 subjects, we selected the 20 top signals of the combined analysis, as well as the 10 top signals with p-values less than 1.0 x 10(-4) for each of the men- and women-specific analyses. We further selected 73 SNPs that were consistently associated with metabolic syndrome parameters in previous genome-wide association studies to check for their association with plasma adiponectin.

    Results: The ADIPOQ locus showed genome-wide significant p-values in the combined (p=4.3 x 10(-24)) as well as in both women- and men-specific analyses (p=8.7 x 10(-17) and p=2.5 x 10(-11), respectively). None of the other 39 top signal SNPs showed evidence for association in the replication analysis. None of 73 SNPs from metabolic syndrome loci exhibited association with plasma adiponectin (p>0.01).

    Conclusions: We demonstrated the ADIPOQ gene as the only major gene for plasma adiponectin, which explains 6.7% of the phenotypic variance. We further found that neither this gene nor any of the metabolic syndrome loci explained the sex differences observed for plasma adiponectin. Larger studies are needed to identify more moderate genetic determinants of plasma adiponectin.

    Funded by: NHLBI NIH HHS: N01 HC025195; NIDDK NIH HHS: R01 DK075787-01A1

    Atherosclerosis 2010;208;2;412-20

  • Meta-analysis identifies 13 new loci associated with waist-hip ratio and reveals sexual dimorphism in the genetic basis of fat distribution.

    Heid IM, Jackson AU, Randall JC, Winkler TW, Qi L, Steinthorsdottir V, Thorleifsson G, Zillikens MC, Speliotes EK, Mägi R, Workalemahu T, White CC, Bouatia-Naji N, Harris TB, Berndt SI, Ingelsson E, Willer CJ, Weedon MN, Luan J, Vedantam S, Esko T, Kilpeläinen TO, Kutalik Z, Li S, Monda KL, Dixon AL, Holmes CC, Kaplan LM, Liang L, Min JL, Moffatt MF, Molony C, Nicholson G, Schadt EE, Zondervan KT, Feitosa MF, Ferreira T, Lango Allen H, Weyant RJ, Wheeler E, Wood AR, MAGIC, Estrada K, Goddard ME, Lettre G, Mangino M, Nyholt DR, Purcell S, Smith AV, Visscher PM, Yang J, McCarroll SA, Nemesh J, Voight BF, Absher D, Amin N, Aspelund T, Coin L, Glazer NL, Hayward C, Heard-Costa NL, Hottenga JJ, Johansson A, Johnson T, Kaakinen M, Kapur K, Ketkar S, Knowles JW, Kraft P, Kraja AT, Lamina C, Leitzmann MF, McKnight B, Morris AP, Ong KK, Perry JR, Peters MJ, Polasek O, Prokopenko I, Rayner NW, Ripatti S, Rivadeneira F, Robertson NR, Sanna S, Sovio U, Surakka I, Teumer A, van Wingerden S, Vitart V, Zhao JH, Cavalcanti-Proença C, Chines PS, Fisher E, Kulzer JR, Lecoeur C, Narisu N, Sandholt C, Scott LJ, Silander K, Stark K, Tammesoo ML, Teslovich TM, Timpson NJ, Watanabe RM, Welch R, Chasman DI, Cooper MN, Jansson JO, Kettunen J, Lawrence RW, Pellikka N, Perola M, Vandenput L, Alavere H, Almgren P, Atwood LD, Bennett AJ, Biffar R, Bonnycastle LL, Bornstein SR, Buchanan TA, Campbell H, Day IN, Dei M, Dörr M, Elliott P, Erdos MR, Eriksson JG, Freimer NB, Fu M, Gaget S, Geus EJ, Gjesing AP, Grallert H, Grässler J, Groves CJ, Guiducci C, Hartikainen AL, Hassanali N, Havulinna AS, Herzig KH, Hicks AA, Hui J, Igl W, Jousilahti P, Jula A, Kajantie E, Kinnunen L, Kolcic I, Koskinen S, Kovacs P, Kroemer HK, Krzelj V, Kuusisto J, Kvaloy K, Laitinen J, Lantieri O, Lathrop GM, Lokki ML, Luben RN, Ludwig B, McArdle WL, McCarthy A, Morken MA, Nelis M, Neville MJ, Paré G, Parker AN, Peden JF, Pichler I, Pietiläinen KH, Platou CG, Pouta A, Ridderstråle M, Samani NJ, Saramies J, Sinisalo J, Smit JH, Strawbridge RJ, Stringham HM, Swift AJ, Teder-Laving M, Thomson B, Usala G, van Meurs JB, van Ommen GJ, Vatin V, Volpato CB, Wallaschofski H, Walters GB, Widen E, Wild SH, Willemsen G, Witte DR, Zgaga L, Zitting P, Beilby JP, James AL, Kähönen M, Lehtimäki T, Nieminen MS, Ohlsson C, Palmer LJ, Raitakari O, Ridker PM, Stumvoll M, Tönjes A, Viikari J, Balkau B, Ben-Shlomo Y, Bergman RN, Boeing H, Smith GD, Ebrahim S, Froguel P, Hansen T, Hengstenberg C, Hveem K, Isomaa B, Jørgensen T, Karpe F, Khaw KT, Laakso M, Lawlor DA, Marre M, Meitinger T, Metspalu A, Midthjell K, Pedersen O, Salomaa V, Schwarz PE, Tuomi T, Tuomilehto J, Valle TT, Wareham NJ, Arnold AM, Beckmann JS, Bergmann S, Boerwinkle E, Boomsma DI, Caulfield MJ, Collins FS, Eiriksdottir G, Gudnason V, Gyllensten U, Hamsten A, Hattersley AT, Hofman A, Hu FB, Illig T, Iribarren C, Jarvelin MR, Kao WH, Kaprio J, Launer LJ, Munroe PB, Oostra B, Penninx BW, Pramstaller PP, Psaty BM, Quertermous T, Rissanen A, Rudan I, Shuldiner AR, Soranzo N, Spector TD, Syvanen AC, Uda M, Uitterlinden A, Völzke H, Vollenweider P, Wilson JF, Witteman JC, Wright AF, Abecasis GR, Boehnke M, Borecki IB, Deloukas P, Frayling TM, Groop LC, Haritunians T, Hunter DJ, Kaplan RC, North KE, O'Connell JR, Peltonen L, Schlessinger D, Strachan DP, Hirschhorn JN, Assimes TL, Wichmann HE, Thorsteinsdottir U, van Duijn CM, Stefansson K, Cupples LA, Loos RJ, Barroso I, McCarthy MI, Fox CS, Mohlke KL and Lindgren CM

    Regensburg University Medical Center, Department of Epidemiology and Preventive Medicine, Regensburg, Germany.

    Waist-hip ratio (WHR) is a measure of body fat distribution and a predictor of metabolic consequences independent of overall adiposity. WHR is heritable, but few genetic variants influencing this trait have been identified. We conducted a meta-analysis of 32 genome-wide association studies for WHR adjusted for body mass index (comprising up to 77,167 participants), following up 16 loci in an additional 29 studies (comprising up to 113,636 subjects). We identified 13 new loci in or near RSPO3, VEGFA, TBX15-WARS2, NFE2L3, GRB14, DNM3-PIGC, ITPR2-SSPN, LY86, HOXC13, ADAMTS9, ZNRF3-KREMEN1, NISCH-STAB1 and CPEB4 (P = 1.9 × 10⁻⁹ to P = 1.8 × 10⁻⁴⁰) and the known signal at LYPLAL1. Seven of these loci exhibited marked sexual dimorphism, all with a stronger effect on WHR in women than men (P for sex difference = 1.9 × 10⁻³ to P = 1.2 × 10⁻¹³). These findings provide evidence for multiple loci that modulate body fat distribution independent of overall adiposity and reveal strong gene-by-sex interactions.

    Funded by: British Heart Foundation; Chief Scientist Office: CZB/4/710; Department of Health; Medical Research Council: G0000934, G0401527, G0500115, G0501184, G0600705, G0601261, G0701863, G0801056, G9521010, MC_QA137934, MC_U106179472, MC_U106188470, MC_U127561128, MC_UP_A390_1107; NCI NIH HHS: CA047988, CA49449, CA50385, CA65725, CA67262, CA87969, P01 CA087969-12, R01 CA047988, R01 CA047988-20, R01 CA050385-20, R01 CA065725, R01 CA065725-14, R01 CA067262-14, U01 CA049449-21, U01 CA098233, U01 CA098233-08, U01-CA098233; NCRR NIH HHS: UL1 RR025005, UL1 RR025005-04, UL1-RR025005; NHGRI NIH HHS: HG002651, HG005581, N01 HG065403, N01-HG-65403, R01 HG002651-05, RC2 HG005581-02, T32 HG000040, T32 HG000040-14, U01 HG004399-02, U01 HG004402-02, Z01 HG000024-14, T32-HG00040, U01-HG004399, U01-HG004402; NHLBI NIH HHS: HL043851, HL084729, HL71981, K99 HL094535, K99 HL094535-02, N01 HC015103, N01 HC025195, N01 HC035129, N01 HC045133, N01 HC055015, N01 HC055016, N01 HC055018, N01 HC055019, N01 HC055020, N01 HC055021, N01 HC055022, N01 HC055222, N01 HC075150, N01 HC085079, N01 HC085080, N01 HC085081, N01 HC085082, N01 HC085083, N01 HC085084, N01 HC085085, N01 HC085086, N01-HC-55018, N01-HC55222, R01 HL043851, R01 HL043851-10, R01 HL059367, R01 HL059367-10, R01 HL071981-07, R01 HL086694-03, R01 HL087641-03, R01 HL087647, R01 HL087647-03, R01 HL087652-03, R01 HL087679-03, R01 HL087700-03, R01 HL088119, R01 HL088119-04, R01-HL087647, R01-HL59367, U01 HL072515-06, U01 HL080295, U01 HL080295-04, U01 HL084729-03, U01 HL084756, U01 HL084756-03, U01-HL72515, K99HL094535, N01-HC-25195, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022, N01-HC15103, N01-HC35129, N01-HC45133, N01-HC55015, N01-HC55016, N01-HC75150, N01-HC85079, N01-HC85080, N01-HC85081, N01-HC85082, N01-HC85083, N01-HC85084, N01-HC85085, N01-HC85086, R01-HL086694, R01-HL087641, R01-HL087679, R01-HL087700, R01-HL088119, R01-HL087652, U01-HL084756; NIA NIH HHS: N01 AG012100, N01 AG012109, N01-AG-1-2109, R01 AG031890-02, N01-AG-12100, R01-AG031890; NIDDK NIH HHS: DK062370, DK072193, DK075787, DK58845, F32 DK079466-01, K23 DK080145-01, K23-DK080145, P30 DK046200-14, P30 DK072488-06, R01 DK056690-12, R01 DK058845-11, R01 DK068336-03, R01 DK072193-05, R01 DK073490-05, R01 DK075681-04, R01 DK075787-05, R01 DK089256, R01 DK089256-02, R01-DK068336, R01-DK075787, U01 DK062370, U01 DK062370-08, U01 DK062418-06, P30-DK072488, R01-DK-073490, R01-DK075681, U01-DK062418; NIGMS NIH HHS: U01 GM074518-05, U01-GM074518; NIMH NIH HHS: R01 MH063706-05, R01 MH084698-03, RL1 MH083268-05, 1RL1-MH083268-01, MH084698, R01-MH63706; PHS HHS: 263-MA-410953; Wellcome Trust: 064890, 068545, 072960, 075491, 076113, 077011, 077016, 077016/Z/05/Z, 079557, 079895, 081682, 083270, 085235, 085301, 086596, 088885, 089061, 091746, 068545/Z/02, 076113/B/04/Z, 091746/Z/10/Z, WT086596/Z/08/Z

    Nature genetics 2010;42;11;949-60

  • A trans-acting locus regulates an anti-viral expression network and type 1 diabetes risk.

    Heinig M, Petretto E, Wallace C, Bottolo L, Rotival M, Lu H, Li Y, Sarwar R, Langley SR, Bauerfeind A, Hummel O, Lee YA, Paskas S, Rintisch C, Saar K, Cooper J, Buchan R, Gray EE, Cyster JG, Cardiogenics Consortium, Erdmann J, Hengstenberg C, Maouche S, Ouwehand WH, Rice CM, Samani NJ, Schunkert H, Goodall AH, Schulz H, Roider HG, Vingron M, Blankenberg S, Münzel T, Zeller T, Szymczak S, Ziegler A, Tiret L, Smyth DJ, Pravenec M, Aitman TJ, Cambien F, Clayton D, Todd JA, Hubner N and Cook SA

    Max-Delbrück-Center for Molecular Medicine (MDC), Berlin, Germany.

    Combined analyses of gene networks and DNA sequence variation can provide new insights into the aetiology of common diseases that may not be apparent from genome-wide association studies alone. Recent advances in rat genomics are facilitating systems-genetics approaches. Here we report the use of integrated genome-wide approaches across seven rat tissues to identify gene networks and the loci underlying their regulation. We defined an interferon regulatory factor 7 (IRF7)-driven inflammatory network (IDIN) enriched for viral response genes, which represents a molecular biomarker for macrophages and which was regulated in multiple tissues by a locus on rat chromosome 15q25. We show that Epstein-Barr virus induced gene 2 (Ebi2, also known as Gpr183), which lies at this locus and controls B lymphocyte migration, is expressed in macrophages and regulates the IDIN. The human orthologous locus on chromosome 13q32 controlled the human equivalent of the IDIN, which was conserved in monocytes. IDIN genes were more likely to associate with susceptibility to type 1 diabetes (T1D), a macrophage-associated autoimmune disease, than randomly selected immune response genes (P = 8.85 × 10(-6)). The human locus controlling the IDIN was associated with the risk of T1D at single nucleotide polymorphism rs9585056 (P = 7.0 × 10(-10); odds ratio, 1.15), which was one of five single nucleotide polymorphisms in this region associated with EBI2 (GPR183) expression. These data implicate IRF7 network genes and their regulatory locus in the pathogenesis of T1D.

    Funded by: British Heart Foundation: P301/10/0290; Medical Research Council: MC_U120061454, MC_U120085815, MC_U120097112; Wellcome Trust: 061858, 076113, 089989

    Nature 2010;467;7314;460-4

  • Meeting Report: "Metagenomics, Metadata and Meta-analysis" (M3) Workshop at the Pacific Symposium on Biocomputing 2010.

    Hirschman L, Sterk P, Field D, Wooley J, Cochrane G, Gilbert J, Kolker E, Kyrpides N, Meyer F, Mizrachi I, Nakamura Y, Sansone SA, Schriml L, Tatusova T, White O and Yilmaz P

    This report summarizes the M3 Workshop held at the January 2010 Pacific Symposium on Biocomputing. The workshop, organized by Genomic Standards Consortium members, included five contributed talks, a series of short presentations from stakeholders in the genomics standards community, a poster session, and, in the evening, an open discussion session to review current projects and examine future directions for the GSC and its stakeholders.

    Standards in genomic sciences 2010;2;3;357-60

  • Intrahost evolutionary dynamics of canine influenza virus in naive and partially immune dogs.

    Hoelzer K, Murcia PR, Baillie GJ, Wood JL, Metzger SM, Osterrieder N, Dubovi EJ, Holmes EC and Parrish CR

    Baker Institute for Animal Health, Department of Microbiology and Immunology, College of Veterinary Medicine, Cornell University, Hungerford Hill Road, Ithaca, NY 14853, USA.

    The patterns and dynamics of evolution in acutely infecting viruses within individual hosts are largely unknown. To this end, we investigated the intrahost variation of canine influenza virus (CIV) during the course of experimental infections in naïve and partially immune dogs and in naturally infected dogs. Tracing sequence diversity in the gene encoding domain 1 of the hemagglutinin (HA1) protein over the time course of infection provided information on the patterns and processes of intrahost viral evolution and revealed some of the effects of partial host immunity. Viral populations sampled on any given day were generally characterized by mean pairwise genetic diversities between 0.1 and 0.2% and by mutational spectra that changed considerably on different days. Some observed mutations may have affected antigenicity or host range, including reversions of CIV host-associated mutations. Patterns of sequence diversity differed between naïve and vaccinated dogs, with some presumably antigenic mutations transiently reaching high frequency in the latter. CIV populations are therefore characterized by the rapid generation and clearance of genetic diversity. Potentially advantageous mutations arise readily during the course of single infections and may give rise to antigenic escape or host range variants.

    Funded by: NIGMS NIH HHS: R01 GM080533

    Journal of virology 2010;84;10;5329-35

  • Platelets release novel thiol isomerase enzymes which are recruited to the cell surface following activation.

    Holbrook LM, Watkins NA, Simmonds AD, Jones CI, Ouwehand WH and Gibbins JM

    Institute for Cardiovascular and Metabolic Research, School of Biological Sciences, University of Reading, Whiteknights, Reading, Berkshire.

    The thiol isomerase enzymes protein disulphide isomerase (PDI) and endoplasmic reticulum protein 5 (ERp5) are released by resting and activated platelets. These re-associate with the cell surface where they modulate a range of platelet responses including adhesion, secretion and aggregation. Recent studies suggest the existence of as yet uncharacterised platelet thiol isomerase proteins. This study aimed to identify which other thiol isomerase enzymes are present in human platelets. Through the use of immunoblotting, flow cytometry, cell-surface biotinylation and gene array analysis, we report the presence of five additional thiol isomerases in human and mouse platelets and megakaryocytes, namely ERp57, ERp72, ERp44, ERp29 and TMX3. ERp72, ERp57, ERp44 and ERp29 are released by platelets and relocate to the cell surface following platelet activation. The transmembrane thiol isomerase TMX3 was also detected on the platelet surface but does not increase following activation. Extracellular PDI is also implicated in the regulation of coagulation by the modulation of tissue factor activity. ERp57 was identified within platelet-derived microparticle fractions, suggesting that ERp57 may also be involved in the regulation of coagulation as well as platelet function. These data collectively implicate the expanding family of platelet-surface thiol isomerases in the regulation of haemostasis.

    Funded by: British Heart Foundation; Medical Research Council

    British journal of haematology 2010;148;4;627-37

  • Cooking with GAS.

    Holden M

    Nature reviews. Microbiology 2010;8;4;249

  • Genome sequence of a recently emerged, highly transmissible, multi-antibiotic- and antiseptic-resistant variant of methicillin-resistant Staphylococcus aureus, sequence type 239 (TW).

    Holden MT, Lindsay JA, Corton C, Quail MA, Cockfield JD, Pathak S, Batra R, Parkhill J, Bentley SD and Edgeworth JD

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SA, United Kingdom.

    The 3.1-Mb genome of an outbreak methicillin-resistant Staphylococcus aureus (MRSA) strain (TW20) contains evidence of recently acquired DNA, including two large regions (635 kb and 127 kb). The strain is resistant to a wide range of antibiotics, antiseptics, and heavy metals due to resistance genes encoded on mobile genetic elements and also mutations in housekeeping genes.

    Funded by: Wellcome Trust

    Journal of bacteriology 2010;192;3;888-92

  • Emx2 and early hair cell development in the mouse inner ear.

    Holley M, Rhodes C, Kneebone A, Herde MK, Fleming M and Steel KP

    Department of Biomedical Science, Addison Building, Western Bank, Sheffield S10 2TN, UK.

    Emx2 is a homeodomain protein that plays a critical role in inner ear development. Homozygous null mice die at birth with a range of defects in the CNS, renal system and skeleton. The cochlea is shorter than normal with about 60% fewer auditory hair cells. It appears to lack outer hair cells and some supporting cells are either absent or fail to differentiate. Many of the hair cells differentiate in pairs and although their hair bundles develop normally their planar cell polarity is compromised. Measurements of cell polarity suggest that classic planar cell polarity molecules are not directly influenced by Emx2 and that polarity is compromised by developmental defects in the sensory precursor population or by defects in epithelial cues for cell alignment. Planar cell polarity is normal in the vestibular epithelia although polarity reversal across the striola is absent in both the utricular and saccular maculae. In contrast, cochlear hair cell polarity is disorganized. The expression domain for Bmp4 is expanded and Fgfr1 and Prox1 are expressed in fewer cells in the cochlear sensory epithelium of Emx2 null mice. We conclude that Emx2 regulates early developmental events that balance cell proliferation and differentiation in the sensory precursor population.

    Funded by: Medical Research Council: G0300212, MC_QA137918; Wellcome Trust

    Developmental biology 2010;340;2;547-56

  • High-throughput bacterial SNP typing identifies distinct clusters of Salmonella Typhi causing typhoid in Nepalese children.

    Holt KE, Baker S, Dongol S, Basnyat B, Adhikari N, Thorson S, Pulickal AS, Song Y, Parkhill J, Farrar JJ, Murdoch DR, Kelly DF, Pollard AJ and Dougan G

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    Background: Salmonella Typhi (S. Typhi) causes typhoid fever, which remains an important public health issue in many developing countries. Kathmandu, the capital of Nepal, is an area of high incidence and the pediatric population appears to be at high risk of exposure and infection.

    Methods: We recently defined the population structure of S. Typhi, using new sequencing technologies to identify nearly 2,000 single nucleotide polymorphisms (SNPs) that can be used as unequivocal phylogenetic markers. Here we have used the GoldenGate (Illumina) platform to simultaneously type 1,500 of these SNPs in 62 S. Typhi isolates causing severe typhoid in children admitted to Patan Hospital in Kathmandu.

    Results: Eight distinct S. Typhi haplotypes were identified during the 20-month study period, with 68% of isolates belonging to a subclone of the previously defined H58 S. Typhi. This subclone was closely associated with resistance to nalidixic acid, with all isolates from this group demonstrating a resistant phenotype and harbouring the same resistance-associated SNP in GyrA (Phe83). A secondary clone, comprising 19% of isolates, was observed only during the second half of the study.

    Conclusions: Our data demonstrate the utility of SNP typing for monitoring bacterial populations over a defined period in a single endemic setting. We provide evidence for genotype introduction and define a nalidixic acid resistant subclone of S. Typhi, which appears to be the dominant cause of severe pediatric typhoid in Kathmandu during the study period.

    Funded by: Wellcome Trust

    BMC infectious diseases 2010;10;144

  • Disease-associated XMRV sequences are consistent with laboratory contamination.

    Hué S, Gray ER, Gall A, Katzourakis A, Tan CP, Houldcroft CJ, McLaren S, Pillay D, Futreal A, Garson JA, Pybus OG, Kellam P and Towers GJ

    MRC Centre for Medical Molecular Virology, Division of Infection and Immunity, University College London, 46 Cleveland St, London W1T 4JF, UK.

    Background: Xenotropic murine leukaemia viruses (MLV-X) are endogenous gammaretroviruses that infect cells from many species, including humans. Xenotropic murine leukaemia virus-related virus (XMRV) is a retrovirus that has been the subject of intense debate since its detection in samples from humans with prostate cancer (PC) and chronic fatigue syndrome (CFS). Controversy has arisen from the failure of some studies to detect XMRV in PC or CFS patients and from inconsistent detection of XMRV in healthy controls.

    Results: Here we demonstrate that Taqman PCR primers previously described as XMRV-specific can amplify common murine endogenous viral sequences from mouse, suggesting that mouse DNA can contaminate patient samples and confound specific XMRV detection. To consider the provenance of XMRV we sequenced XMRV from the cell line 22Rv1, which is infected with an MLV-X that is indistinguishable from patient-derived XMRV. Bayesian phylogenies clearly show that XMRV sequences reportedly derived from unlinked patients form a monophyletic clade with interspersed 22Rv1 clones (posterior probability >0.99). The cell line-derived sequences are ancestral to the patient-derived sequences (posterior probability >0.99). Furthermore, pol sequences apparently amplified from PC patient material (VP29 and VP184) are recombinants of XMRV and Moloney MLV (MoMLV), a virus with an envelope that lacks tropism for human cells. Considering the diversity of XMRV we show that the mean pairwise genetic distance among env and pol 22Rv1-derived sequences exceeds that of patient-associated sequences (Wilcoxon rank sum test: p = 0.005 and p < 0.001 for pol and env, respectively). Thus XMRV sequences acquire diversity in a cell line but not in patient samples. These observations are difficult to reconcile with the hypothesis that published XMRV sequences are related by a process of infectious transmission.

    Conclusions: We provide several independent lines of evidence that XMRV detected by sensitive PCR methods in patient samples is the likely result of PCR contamination with mouse DNA and that the described clones of XMRV arose from the tumour cell line 22Rv1, which was probably infected with XMRV during xenografting in mice. We propose that XMRV might not be a genuine human pathogen.

    Funded by: Medical Research Council: G0801172, G0801172(87743), G9721629; Wellcome Trust: 090940, WT076608, WT090940

    Retrovirology 2010;7;1;111

  • Interleukin-8 mediates resistance to antiangiogenic agent sunitinib in renal cell carcinoma.

    Huang D, Ding Y, Zhou M, Rini BI, Petillo D, Qian CN, Kahnoski R, Futreal PA, Furge KA and Teh BT

    Laboratory of Cancer Genetics, Laboratory of Computational Biology, Van Andel Research Institute, Grand Rapids, Michigan 49503, USA.

    The broad spectrum kinase inhibitor sunitinib is a first-line therapy for advanced clear cell renal cell carcinoma (ccRCC), a deadly form of kidney cancer. Unfortunately, most patients develop sunitinib resistance and progressive disease after about 1 year of treatment. In this study, we evaluated the mechanisms of resistance to sunitinib to identify the potential tactics to overcome it. Xenograft models were generated that mimicked clinical resistance to sunitinib. Higher microvessel density was found in sunitinib-resistant tumors, indicating that an escape from antiangiogenesis occurred. Notably, escape coincided with increased secretion of interleukin-8 (IL-8) from tumors into the plasma, and coadministration of an IL-8 neutralizing antibody resensitized tumors to sunitinib treatment. In patients who were refractory to sunitinib treatment, IL-8 expression was elevated in ccRCC tumors, supporting the concept that IL-8 levels might predict clinical response to sunitinib. Our results reveal IL-8 as an important contributor to sunitinib resistance in ccRCC and a candidate therapeutic target to reverse acquired or intrinsic resistance to sunitinib in this malignancy.

    Funded by: Wellcome Trust: 077012/Z/05/Z

    Cancer research 2010;70;3;1063-71

  • Characterising and predicting haploinsufficiency in the human genome.

    Huang N, Lee I, Marcotte EM and Hurles ME

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK.

    Haploinsufficiency, wherein a single functional copy of a gene is insufficient to maintain normal function, is a major cause of dominant disease. Human disease studies have identified several hundred haploinsufficient (HI) genes. We have compiled a map of 1,079 haplosufficient (HS) genes by systematic identification of genes unambiguously and repeatedly compromised by copy number variation among 8,458 apparently healthy individuals and contrasted the genomic, evolutionary, functional, and network properties between these HS genes and known HI genes. We found that HI genes are typically longer and have more conserved coding sequences and promoters than HS genes. HI genes exhibit higher levels of expression during early development and greater tissue specificity. Moreover, within a probabilistic human functional interaction network HI genes have more interaction partners and greater network proximity to other known HI genes. We built a predictive model on the basis of these differences and annotated 12,443 genes with their predicted probability of being haploinsufficient. We validated these predictions of haploinsufficiency by demonstrating that genes with a high predicted probability of exhibiting haploinsufficiency are enriched among genes implicated in human dominant diseases and among genes causing abnormal phenotypes in heterozygous knockout mice. We have transformed these gene-based haploinsufficiency predictions into haploinsufficiency scores for genic deletions, which we demonstrate to better discriminate between pathogenic and benign deletions than consideration of the deletion size or numbers of genes deleted. These robust predictions of haploinsufficiency support clinical interpretation of novel loss-of-function variants and prioritization of variants and genes for follow-up studies.

    Funded by: NIGMS NIH HHS: R01 GM067779-08; Wellcome Trust: 077014/Z/05/Z

    PLoS genetics 2010;6;10;e1001154

  • Characterization of a novel wood mouse virus related to murid herpesvirus 4.

    Hughes DJ, Kipar A, Milligan SG, Cunningham C, Sanders M, Quail MA, Rajandream MA, Efstathiou S, Bowden RJ, Chastel C, Bennett M, Sample JT, Barrell B, Davison AJ and Stewart JP

    School of Infection and Host Defence, University of Liverpool, Liverpool L69 3GA, UK.

    Two novel gammaherpesviruses were isolated, one from a field vole (Microtus agrestis) and the other from wood mice (Apodemus sylvaticus). The genome of the latter, designated wood mouse herpesvirus (WMHV), was completely sequenced. WMHV had the same genome structure and predicted gene content as murid herpesvirus 4 (MuHV4; murine gammaherpesvirus 68). Overall nucleotide sequence identity between WMHV and MuHV4 was 85 % and most of the 10 kb region at the left end of the unique region was particularly highly conserved, especially the viral tRNA-like sequences and the coding regions of genes M1 and M4. The partial sequence (71 913 bp) of another gammaherpesvirus, Brest herpesvirus (BRHV), which was isolated ostensibly from a white-toothed shrew (Crocidura russula), was also determined. The BRHV sequence was 99.2 % identical to the corresponding portion of the WMHV genome. Thus, WMHV and BRHV appeared to be strains of a new virus species. Biological characterization of WMHV indicated that it grew with similar kinetics to MuHV4 in cell culture. The pathogenesis of WMHV in wood mice was also extremely similar to that of MuHV4, except for the absence of inducible bronchus-associated lymphoid tissue at day 14 post-infection and a higher load of latently infected cells at 21 days post-infection.

    Funded by: Medical Research Council; NCI NIH HHS: CA090208; Wellcome Trust

    The Journal of general virology 2010;91;Pt 4;867-79

  • Experimental evolution, genetic analysis and genome re-sequencing reveal the mutation conferring artemisinin resistance in an isogenic lineage of malaria parasites.

    Hunt P, Martinelli A, Modrzynska K, Borges S, Creasey A, Rodrigues L, Beraldi D, Loewe L, Fawcett R, Kumar S, Thomson M, Trivedi U, Otto TD, Pain A, Blaxter M and Cravo P

    Institute for Immunology and Infection Research, School of Biological Sciences, University of Edinburgh, Edinburgh, UK.

    Background: Classical and quantitative linkage analyses of genetic crosses have traditionally been used to map genes of interest, such as those conferring chloroquine or quinine resistance in malaria parasites. Next-generation sequencing technologies now present the possibility of determining genome-wide genetic variation at single base-pair resolution. Here, we combine in vivo experimental evolution, a rapid genetic strategy and whole genome re-sequencing to identify the precise genetic basis of artemisinin resistance in a lineage of the rodent malaria parasite, Plasmodium chabaudi. Such genetic markers will further the investigation of resistance and its control in natural infections of the human malaria, P. falciparum.

    Results: A lineage of isogenic in vivo drug-selected mutant P. chabaudi parasites was investigated. By measuring the artemisinin responses of these clones, the appearance of an in vivo artemisinin resistance phenotype within the lineage was defined. The underlying genetic locus was mapped to a region of chromosome 2 by Linkage Group Selection in two different genetic crosses. Whole-genome deep coverage short-read re-sequencing (Illumina Solexa) defined the point mutations, insertions, deletions and copy-number variations arising in the lineage. Eight point mutations arise within the mutant lineage, only one of which appears on chromosome 2. This missense mutation arises contemporaneously with artemisinin resistance and maps to a gene encoding a de-ubiquitinating enzyme.

    Conclusions: This integrated approach facilitates the rapid identification of mutations conferring selectable phenotypes, without prior knowledge of biological and molecular mechanisms. For malaria, this model can identify candidate genes before resistant parasites are commonly observed in natural human malaria populations.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/D019621/1; Medical Research Council: G0400476; Wellcome Trust: 082611/Z/07/Z

    BMC genomics 2010;11;499

  • Systematic analysis of human protein complexes identifies chromosome segregation proteins.

    Hutchins JR, Toyoda Y, Hegemann B, Poser I, Hériché JK, Sykora MM, Augsburg M, Hudecz O, Buschhorn BA, Bulkescher J, Conrad C, Comartin D, Schleiffer A, Sarov M, Pozniakovsky A, Slabicki MM, Schloissnig S, Steinmacher I, Leuschner M, Ssykor A, Lawo S, Pelletier L, Stark H, Nasmyth K, Ellenberg J, Durbin R, Buchholz F, Mechtler K, Hyman AA and Peters JM

    Research Institute of Molecular Pathology (IMP), Dr. Bohr-Gasse 7, A-1030 Vienna, Austria.

    Chromosome segregation and cell division are essential, highly ordered processes that depend on numerous protein complexes. Results from recent RNA interference screens indicate that the identity and composition of these protein complexes is incompletely understood. Using gene tagging on bacterial artificial chromosomes, protein localization, and tandem-affinity purification-mass spectrometry, the MitoCheck consortium has analyzed about 100 human protein complexes, many of which had not or had only incompletely been characterized. This work has led to the discovery of previously unknown, evolutionarily conserved subunits of the anaphase-promoting complex and the gamma-tubulin ring complex, large complexes that are essential for spindle assembly and chromosome segregation. The approaches we describe here are generally applicable to high-throughput follow-up analyses of phenotypic screens in mammalian cells.

    Funded by: Austrian Science Fund FWF: F 3407-B03

    Science (New York, N.Y.) 2010;328;5978;593-9

  • Epilepsy and mental retardation limited to females with PCDH19 mutations can present de novo or in single generation families.

    Hynes K, Tarpey P, Dibbens LM, Bayly MA, Berkovic SF, Smith R, Raisi ZA, Turner SJ, Brown NJ, Desai TD, Haan E, Turner G, Christodoulou J, Leonard H, Gill D, Stratton MR, Gecz J and Scheffer IE

    SA Pathology, Women's and Children's Hospital, 72 King William Road, North Adelaide, SA 5006, Australia.

    Background: Epilepsy and mental retardation limited to females (EFMR) is an intriguing X-linked disorder affecting heterozygous females and sparing hemizygous males. Mutations in the protocadherin 19 (PCDH19) gene have been identified in seven unrelated families with EFMR.

    Here, we assessed the frequency of PCDH19 mutations in individuals with clinical features which overlap those of EFMR. We analysed 185 females from three cohorts: 42 with Rett syndrome who were negative for MECP2 and CDKL5 mutations, 57 with autism spectrum disorders, and 86 with epilepsy with or without intellectual disability. No mutations were identified in the Rett syndrome and autism spectrum disorders cohorts suggesting that despite sharing similar clinical characteristics with EFMR, PCDH19 mutations are not generally associated with these disorders. Among the 86 females with epilepsy (of whom 51 had seizure onset before 3 years), with or without intellectual disability, we identified two (2.3%) missense changes. One (c.1671C>G, p.N557K), reported previously without clinical data, was found in two affected sisters, the first EFMR family without a multigenerational family history of affected females. The second, reported here, is a novel de novo missense change identified in a sporadic female. The change, p.S276P, is predicted to result in functional disturbance of PCDH19 as it affects a highly conserved residue adjacent to the adhesion interface of EC3 of PCDH19.

    Conclusions: This de novo PCDH19 mutation in a sporadic female highlights that mutational analysis should be considered in isolated instances of girls with infantile onset seizures and developmental delay, in addition to those with the characteristic family history of EFMR.

    Funded by: Wellcome Trust

    Journal of medical genetics 2010;47;3;211-6

  • Four novel Loci (19q13, 6q24, 12q24, and 5q14) influence the microcirculation in vivo.

    Ikram MK, Sim X, Xueling S, Jensen RA, Cotch MF, Hewitt AW, Ikram MA, Wang JJ, Klein R, Klein BE, Breteler MM, Cheung N, Liew G, Mitchell P, Uitterlinden AG, Rivadeneira F, Hofman A, de Jong PT, van Duijn CM, Kao L, Cheng CY, Smith AV, Glazer NL, Lumley T, McKnight B, Psaty BM, Jonasson F, Eiriksdottir G, Aspelund T, Global BPgen Consortium, Harris TB, Launer LJ, Taylor KD, Li X, Iyengar SK, Xi Q, Sivakumaran TA, Mackey DA, Macgregor S, Martin NG, Young TL, Bis JC, Wiggins KL, Heckbert SR, Hammond CJ, Andrew T, Fahy S, Attia J, Holliday EG, Scott RJ, Islam FM, Rotter JI, McAuley AK, Boerwinkle E, Tai ES, Gudnason V, Siscovick DS, Vingerling JR and Wong TY

    Department of Epidemiology, Erasmus Medical Center, Rotterdam, The Netherlands.

    There is increasing evidence that the microcirculation plays an important role in the pathogenesis of cardiovascular diseases. Changes in retinal vascular caliber reflect early microvascular disease and predict incident cardiovascular events. We performed a genome-wide association study to identify genetic variants associated with retinal vascular caliber. We analyzed data from four population-based discovery cohorts with 15,358 unrelated Caucasian individuals, who are members of the Cohort for Heart and Aging Research in Genomic Epidemiology (CHARGE) consortium, and replicated findings in four independent Caucasian cohorts (n = 6,652). All participants had retinal photography and retinal arteriolar and venular caliber measured from computer software. In the discovery cohorts, 179 single nucleotide polymorphisms (SNP) spread across five loci were significantly associated (p < 5.0×10(-8)) with retinal venular caliber, but none showed association with arteriolar caliber. Collectively, these five loci explain 1.0%-3.2% of the variation in retinal venular caliber. Four out of these five loci were confirmed in independent replication samples. In the combined analyses, the top SNPs at each locus were: rs2287921 (19q13; p = 1.61×10(-25), within the RASIP1 locus), rs225717 (6q24; p = 1.25×10(-16), adjacent to the VTA1 and NMBR loci), rs10774625 (12q24; p = 2.15×10(-13), in the region of ATXN2, SH2B3 and PTPN11 loci), and rs17421627 (5q14; p = 7.32×10(-16), adjacent to the MEF2C locus). In two independent samples, locus 12q24 was also associated with coronary heart disease and hypertension. Our population-based genome-wide association study demonstrates four novel loci associated with retinal venular caliber, an endophenotype of the microcirculation associated with clinical cardiovascular disease. These data provide further insights into the contribution and biological mechanisms of microcirculatory changes that underlie cardiovascular disease.

    Funded by: NCRR NIH HHS: M01RR00069, UL1RR025005; NEI NIH HHS: Z01 EY000401-06, Z01 EY000401-07, Z01 EY000426-04, Z01 EY000426-05, Z01EY000401, Z01EY000426, Z99 EY999999, ZIA EY000401-08, ZIA EY000401-09, ZIA EY000401-10, ZIA EY000403-09, ZIA EY000403-10, ZIA EY000426-06, ZIA EY000426-07, ZIA EY000426-08; NHGRI NIH HHS: U01HG004402; NHLBI NIH HHS: N01 HC-15103, N01-HC-35129, N01-HC-45133, N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022, N01-HC-75150, N01-HC-85079, N01-HC-85086, N01HC-55222, R01 HL087652, R01HL087641, T32HL007902, U01 HL080295; NIA NIH HHS: N01-AG-12100, Z01AG007380; NIDDK NIH HHS: DK063491

    PLoS genetics 2010;6;10;e1001184

  • A genome-wide perspective of genetic variation in human metabolism.

    Illig T, Gieger C, Zhai G, Römisch-Margl W, Wang-Sattler R, Prehn C, Altmaier E, Kastenmüller G, Kato BS, Mewes HW, Meitinger T, de Angelis MH, Kronenberg F, Soranzo N, Wichmann HE, Spector TD, Adamski J and Suhre K

    Institute of Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany.

    Serum metabolite concentrations provide a direct readout of biological processes in the human body, and they are associated with disorders such as cardiovascular and metabolic diseases. We present a genome-wide association study (GWAS) of 163 metabolic traits measured in human blood from 1,809 participants from the KORA population, with replication in 422 participants of the TwinsUK cohort. For eight out of nine replicated loci (FADS1, ELOVL2, ACADS, ACADM, ACADL, SPTLC3, ETFDH and SLC16A9), the genetic variant is located in or near genes encoding enzymes or solute carriers whose functions match the associating metabolic traits. In our study, the use of metabolite concentration ratios as proxies for enzymatic reaction rates reduced the variance and yielded robust statistical associations with P values ranging from 3 x 10(-24) to 6.5 x 10(-179). These loci explained 5.6%-36.3% of the observed variance in metabolite concentrations. For several loci, associations with clinically relevant parameters have been reported previously.

    Funded by: Biotechnology and Biological Sciences Research Council: G20234; Wellcome Trust: 091746

    Nature genetics 2010;42;2;137-41

  • Orphan CpG islands identify numerous conserved promoters in the mammalian genome.

    Illingworth RS, Gruenewald-Schneider U, Webb S, Kerr AR, James KD, Turner DJ, Smith C, Harrison DJ, Andrews R and Bird AP

    Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh, United Kingdom.

    CpG islands (CGIs) are vertebrate genomic landmarks that encompass the promoters of most genes and often lack DNA methylation. Querying their apparent importance, the number of CGIs is reported to vary widely in different species and many do not co-localise with annotated promoters. We set out to quantify the number of CGIs in mouse and human genomes using CXXC Affinity Purification plus deep sequencing (CAP-seq). We also asked whether CGIs not associated with annotated transcripts share properties with those at known promoters. We found that, contrary to previous estimates, CGI abundance in humans and mice is very similar and many are at conserved locations relative to genes. In each species CpG density correlates positively with the degree of H3K4 trimethylation, supporting the hypothesis that these two properties are mechanistically interdependent. Approximately half of mammalian CGIs (>10,000) are "orphans" that are not associated with annotated promoters. Many orphan CGIs show evidence of transcriptional initiation and dynamic expression during development. Unlike CGIs at known promoters, orphan CGIs are frequently subject to DNA methylation during development, and this is accompanied by loss of their active promoter features. In colorectal tumors, however, orphan CGIs are not preferentially methylated, suggesting that cancer does not recapitulate a developmental program. Human and mouse genomes have similar numbers of CGIs, over half of which are remote from known promoters. Orphan CGIs nevertheless have the characteristics of functional promoters, though they are much more likely than promoter CGIs to become methylated during development and hence lose these properties. The data indicate that orphan CGIs correspond to previously undetected promoters whose transcriptional activity may play a functional role during development.

    Funded by: Medical Research Council: G0800026, G0900627; Wellcome Trust: 077224

    PLoS genetics 2010;6;9;e1001134

  • A large replication study and meta-analysis in European samples provides further support for association of AHI1 markers with schizophrenia.

    Ingason A, Giegling I, Cichon S, Hansen T, Rasmussen HB, Nielsen J, Jürgens G, Muglia P, Hartmann AM, Strengman E, Vasilescu C, Mühleisen TW, Djurovic S, Melle I, Lerer B, Möller HJ, Francks C, Pietiläinen OP, Lonnqvist J, Suvisaari J, Tuulio-Henriksson A, Walshe M, Vassos E, Di Forti M, Murray R, Bonetto C, Tosato S, GROUP Investigators, Cantor RM, Rietschel M, Craddock N, Owen MJ, Peltonen L, Andreassen OA, Nöthen MM, St Clair D, Ophoff RA, O'Donovan MC, Collier DA, Werge T and Rujescu D

    Research Institute of Biological Psychiatry, Copenhagen University Hospital, Roskilde, Denmark.

    The Abelson helper integration site 1 (AHI1) gene locus on chromosome 6q23 is among a group of candidate loci for schizophrenia susceptibility that were initially identified by linkage followed by linkage disequilibrium mapping, and subsequent replication of the association in an independent sample. Here, we present results of a replication study of AHI1 locus markers, previously implicated in schizophrenia, in a large European sample (in total 3907 affected and 7429 controls). Furthermore, we perform a meta-analysis of the implicated markers in 4496 affected and 18,920 controls. Both the replication study of new samples and the meta-analysis show evidence for significant overrepresentation of all tested alleles in patients compared with controls (meta-analysis; P = 8.2 x 10(-5) to 1.7 x 10(-3), common OR = 1.09 to 1.11). The region contains two genes, AHI1 and C6orf217, and both genes, as well as the neighbouring phosphodiesterase 7B (PDE7B), may be considered candidates for involvement in the genetic aetiology of schizophrenia.

    Funded by: Medical Research Council; NIMH NIH HHS: R01 MH078075; Wellcome Trust: 076113

    Human molecular genetics 2010;19;7;1379-86

  • Detailed physiologic characterization reveals diverse mechanisms for novel genetic loci regulating glucose and insulin metabolism in humans.

    Ingelsson E, Langenberg C, Hivert MF, Prokopenko I, Lyssenko V, Dupuis J, Mägi R, Sharp S, Jackson AU, Assimes TL, Shrader P, Knowles JW, Zethelius B, Abbasi FA, Bergman RN, Bergmann A, Berne C, Boehnke M, Bonnycastle LL, Bornstein SR, Buchanan TA, Bumpstead SJ, Böttcher Y, Chines P, Collins FS, Cooper CC, Dennison EM, Erdos MR, Ferrannini E, Fox CS, Graessler J, Hao K, Isomaa B, Jameson KA, Kovacs P, Kuusisto J, Laakso M, Ladenvall C, Mohlke KL, Morken MA, Narisu N, Nathan DM, Pascoe L, Payne F, Petrie JR, Sayer AA, Schwarz PE, Scott LJ, Stringham HM, Stumvoll M, Swift AJ, Syvänen AC, Tuomi T, Tuomilehto J, Tönjes A, Valle TT, Williams GH, Lind L, Barroso I, Quertermous T, Walker M, Wareham NJ, Meigs JB, McCarthy MI, Groop L, Watanabe RM, Florez JC and MAGIC investigators

    Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden.

    Objective: Recent genome-wide association studies have revealed loci associated with glucose and insulin-related traits. We aimed to characterize 19 such loci using detailed measures of insulin processing, secretion, and sensitivity to help elucidate their role in regulation of glucose control, insulin secretion and/or action.

    Research design and methods: We investigated associations of loci identified by the Meta-Analyses of Glucose and Insulin-related traits Consortium (MAGIC) with circulating proinsulin, measures of insulin secretion and sensitivity from oral glucose tolerance tests (OGTTs), euglycemic clamps, insulin suppression tests, or frequently sampled intravenous glucose tolerance tests in nondiabetic humans (n = 29,084).

    Results: The glucose-raising allele in MADD was associated with abnormal insulin processing (a dramatic effect on higher proinsulin levels, but no association with insulinogenic index) at extremely persuasive levels of statistical significance (P = 2.1 x 10(-71)). Defects in insulin processing and insulin secretion were seen in glucose-raising allele carriers at TCF7L2, SLC30A8, GIPR, and C2CD4B. Abnormalities in early insulin secretion were suggested in glucose-raising allele carriers at MTNR1B, GCK, FADS1, DGKB, and PROX1 (lower insulinogenic index; no association with proinsulin or insulin sensitivity). Two loci previously associated with fasting insulin (GCKR and IGF1) were associated with OGTT-derived insulin sensitivity indices in a consistent direction.

    Conclusions: Genetic loci identified through their effect on hyperglycemia and/or hyperinsulinemia demonstrate considerable heterogeneity in associations with measures of insulin processing, secretion, and sensitivity. Our findings emphasize the importance of detailed physiological characterization of such loci for improved understanding of pathways associated with alterations in glucose homeostasis and eventually type 2 diabetes.

    Funded by: Medical Research Council: G0701863, MC_U106179471, MC_U147574213, MC_U147574239, MC_UP_A620_1014, MC_UP_A620_1015; NIDDK NIH HHS: R01 DK029867, R01 DK072193-05

    Diabetes 2010;59;5;1266-75

  • Metabonomic, transcriptomic, and genomic variation of a population cohort.

    Inouye M, Kettunen J, Soininen P, Silander K, Ripatti S, Kumpula LS, Hämäläinen E, Jousilahti P, Kangas AJ, Männistö S, Savolainen MJ, Jula A, Leiviskä J, Palotie A, Salomaa V, Perola M, Ala-Korpela M and Peltonen L

    Immunology Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia.

    Comprehensive characterization of human tissues promises novel insights into the biological architecture of human diseases and traits. We assessed metabonomic, transcriptomic, and genomic variation for a large population-based cohort from the capital region of Finland. Network analyses identified a set of highly correlated genes, the lipid-leukocyte (LL) module, as having a prominent role in over 80 serum metabolites (of 134 measures quantified), including lipoprotein subclasses, lipids, and amino acids. Concurrent association with immune response markers suggested the LL module as a possible link between inflammation, metabolism, and adiposity. Further, genomic variation was used to generate a directed network and infer LL module's largely reactive nature to metabolites. Finally, gene co-expression in circulating leukocytes was shown to be dependent on serum metabolite concentrations, providing evidence for the hypothesis that the coherence of molecular networks themselves is conditional on environmental factors. These findings show the importance and opportunity of systematic molecular investigation of human population samples. To facilitate and encourage this investigation, the metabonomic, transcriptomic, and genomic data used in this study have been made available as a resource for the research community.

    Molecular systems biology 2010;6;441

  • An immune response network associated with blood lipid levels.

    Inouye M, Silander K, Hamalainen E, Salomaa V, Harald K, Jousilahti P, Männistö S, Eriksson JG, Saarela J, Ripatti S, Perola M, van Ommen GJ, Taskinen MR, Palotie A, Dermitzakis ET and Peltonen L

    Department of Human Genetics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.

    While recent scans for genetic variation associated with human disease have been immensely successful in uncovering large numbers of loci, far fewer studies have focused on the underlying pathways of disease pathogenesis. Many loci which are associated with disease and complex phenotypes map to non-coding, regulatory regions of the genome, indicating that modulation of gene transcription plays a key role. Thus, this study generated genome-wide profiles of both genetic and transcriptional variation from the total blood extracts of over 500 randomly-selected, unrelated individuals. Using measurements of blood lipids, key players in the progression of atherosclerosis, three levels of biological information are integrated in order to investigate the interactions between circulating leukocytes and proximal lipid compounds. Pair-wise correlations between gene expression and lipid concentration indicate a prominent role for basophil granulocytes and mast cells, cell types central to powerful allergic and inflammatory responses. Network analysis of gene co-expression showed that the top associations function as part of a single, previously unknown gene module, the Lipid Leukocyte (LL) module. This module replicated in T cells from an independent cohort while also displaying potential tissue specificity. Further, genetic variation driving LL module expression included the single nucleotide polymorphism (SNP) most strongly associated with serum immunoglobulin E (IgE) levels, a key antibody in allergy. Structural Equation Modeling (SEM) indicated that LL module is at least partially reactive to blood lipid levels. Taken together, this study uncovers a gene network linking blood lipids and circulating cell types and offers insight into the hypothesis that the inflammatory response plays a prominent role in metabolism and the potential control of atherogenesis.

    Funded by: Wellcome Trust: WT089061, WT089062

    PLoS genetics 2010;6;9;e1001113

  • International network of cancer genome projects.

    International Cancer Genome Consortium, Hudson TJ, Anderson W, Artez A, Barker AD, Bell C, Bernabé RR, Bhan MK, Calvo F, Eerola I, Gerhard DS, Guttmacher A, Guyer M, Hemsley FM, Jennings JL, Kerr D, Klatt P, Kolar P, Kusada J, Lane DP, Laplace F, Youyong L, Nettekoven G, Ozenberger B, Peterson J, Rao TS, Remacle J, Schafer AJ, Shibata T, Stratton MR, Vockley JG, Watanabe K, Yang H, Yuen MM, Knoppers BM, Bobrow M, Cambon-Thomsen A, Dressler LG, Dyke SO, Joly Y, Kato K, Kennedy KL, Nicolás P, Parker MJ, Rial-Sebbag E, Romeo-Casabona CM, Shaw KM, Wallace S, Wiesner GL, Zeps N, Lichter P, Biankin AV, Chabannon C, Chin L, Clément B, de Alava E, Degos F, Ferguson ML, Geary P, Hayes DN, Hudson TJ, Johns AL, Kasprzyk A, Nakagawa H, Penny R, Piris MA, Sarin R, Scarpa A, Shibata T, van de Vijver M, Futreal PA, Aburatani H, Bayés M, Botwell DD, Campbell PJ, Estivill X, Gerhard DS, Grimmond SM, Gut I, Hirst M, López-Otín C, Majumder P, Marra M, McPherson JD, Nakagawa H, Ning Z, Puente XS, Ruan Y, Shibata T, Stratton MR, Stunnenberg HG, Swerdlow H, Velculescu VE, Wilson RK, Xue HH, Yang L, Spellman PT, Bader GD, Boutros PC, Campbell PJ, Flicek P, Getz G, Guigó R, Guo G, Haussler D, Heath S, Hubbard TJ, Jiang T, Jones SM, Li Q, López-Bigas N, Luo R, Muthuswamy L, Ouellette BF, Pearson JV, Puente XS, Quesada V, Raphael BJ, Sander C, Shibata T, Speed TP, Stein LD, Stuart JM, Teague JW, Totoki Y, Tsunoda T, Valencia A, Wheeler DA, Wu H, Zhao S, Zhou G, Stein LD, Guigó R, Hubbard TJ, Joly Y, Jones SM, Kasprzyk A, Lathrop M, López-Bigas N, Ouellette BF, Spellman PT, Teague JW, Thomas G, Valencia A, Yoshida T, Kennedy KL, Axton M, Dyke SO, Futreal PA, Gerhard DS, Gunter C, Guyer M, Hudson TJ, McPherson JD, Miller LJ, Ozenberger B, Shaw KM, Kasprzyk A, Stein LD, Zhang J, Haider SA, Wang J, Yung CK, Cros A, Cross A, Liang Y, Gnaneshan S, Guberman J, Hsu J, Bobrow M, Chalmers DR, Hasel KW, Joly Y, Kaan TS, Kennedy KL, Knoppers BM, Lowrance WW, Masui T, Nicolás P, Rial-Sebbag E, Rodriguez LL, Vergely C, Yoshida T, Grimmond SM, Biankin AV, Bowtell DD, Cloonan N, deFazio A, Eshleman JR, Etemadmoghadam D, Gardiner BB, Gardiner BA, Kench JG, Scarpa A, Sutherland RL, Tempero MA, Waddell NJ, Wilson PJ, McPherson JD, Gallinger S, Tsao MS, Shaw PA, Petersen GM, Mukhopadhyay D, Chin L, DePinho RA, Thayer S, Muthuswamy L, Shazand K, Beck T, Sam M, Timms L, Ballin V, Lu Y, Ji J, Zhang X, Chen F, Hu X, Zhou G, Yang Q, Tian G, Zhang L, Xing X, Li X, Zhu Z, Yu Y, Yu J, Yang H, Lathrop M, Tost J, Brennan P, Holcatova I, Zaridze D, Brazma A, Egevard L, Prokhortchouk E, Banks RE, Uhlén M, Cambon-Thomsen A, Viksna J, Ponten F, Skryabin K, Stratton MR, Futreal PA, Birney E, Borg A, Børresen-Dale AL, Caldas C, Foekens JA, Martin S, Reis-Filho JS, Richardson AL, Sotiriou C, Stunnenberg HG, Thoms G, van de Vijver M, van't Veer L, Calvo F, Birnbaum D, Blanche H, Boucher P, Boyault S, Chabannon C, Gut I, Masson-Jacquemier JD, Lathrop M, Pauporté I, Pivot X, Vincent-Salomon A, Tabone E, Theillet C, Thomas G, Tost J, Treilleux I, Calvo F, Bioulac-Sage P, Clément B, Decaens T, Degos F, Franco D, Gut I, Gut M, Heath S, Lathrop M, Samuel D, Thomas G, Zucman-Rossi J, Lichter P, Eils R, Brors B, Korbel JO, Korshunov A, Landgraf P, Lehrach H, Pfister S, Radlwimmer B, Reifenberger G, Taylor MD, von Kalle C, Majumder PP, Sarin R, Rao TS, Bhan MK, Scarpa A, Pederzoli P, Lawlor RA, Delledonne M, Bardelli A, Biankin AV, Grimmond SM, Gress T, Klimstra D, Zamboni G, Shibata T, Nakamura Y, Nakagawa H, Kusada J, Tsunoda T, Miyano S, Aburatani H, Kato K, Fujimoto A, Yoshida T, Campo E, López-Otín C, Estivill X, Guigó R, de Sanjosé S, Piris MA, Montserrat E, González-Díaz M, Puente XS, Jares P, Valencia A, Himmelbauer H, Himmelbaue H, Quesada V, Bea S, Stratton MR, Futreal PA, Campbell PJ, Vincent-Salomon A, Richardson AL, Reis-Filho JS, van de Vijver M, Thomas G, Masson-Jacquemier JD, Aparicio S, Borg A, Børresen-Dale AL, Caldas C, Foekens JA, Stunnenberg HG, van't Veer L, Easton DF, Spellman PT, Martin S, Barker AD, Chin L, Collins FS, Compton CC, Ferguson ML, Gerhard DS, Getz G, Gunter C, Guttmacher A, Guyer M, Hayes DN, Lander ES, Ozenberger B, Penny R, Peterson J, Sander C, Shaw KM, Speed TP, Spellman PT, Vockley JG, Wheeler DA, Wilson RK, Hudson TJ, Chin L, Knoppers BM, Lander ES, Lichter P, Stein LD, Stratton MR, Anderson W, Barker AD, Bell C, Bobrow M, Burke W, Collins FS, Compton CC, DePinho RA, Easton DF, Futreal PA, Gerhard DS, Green AR, Guyer M, Hamilton SR, Hubbard TJ, Kallioniemi OP, Kennedy KL, Ley TJ, Liu ET, Lu Y, Majumder P, Marra M, Ozenberger B, Peterson J, Schafer AJ, Spellman PT, Stunnenberg HG, Wainwright BJ, Wilson RK and Yang H

    The International Cancer Genome Consortium (ICGC) was launched to coordinate large-scale cancer genome studies in tumours from 50 different cancer types and/or subtypes that are of clinical and societal importance across the globe. Systematic studies of more than 25,000 cancer genomes at the genomic, epigenomic and transcriptomic levels will reveal the repertoire of oncogenic mutations, uncover traces of the mutagenic influences, define clinically relevant subtypes for prognosis and therapeutic management, and enable the development of new cancer therapies.

    Funded by: Cancer Research UK: 6613; NCI NIH HHS: P01 CA117969-04S1, P01 CA117969-05, P50 CA102701-08, P50 CA127003-04, P50 CA127003-05; NHGRI NIH HHS: R01 HG001806-02; NIDDK NIH HHS: K08 DK071329, K08 DK071329-04, K08 DK071329-05; Wellcome Trust: 077198, 088340, 093867

    Nature 2010;464;7291;993-8

  • Integrating common and rare genetic variation in diverse human populations.

    International HapMap 3 Consortium, Altshuler DM, Gibbs RA, Peltonen L, Altshuler DM, Gibbs RA, Peltonen L, Dermitzakis E, Schaffner SF, Yu F, Peltonen L, Dermitzakis E, Bonnen PE, Altshuler DM, Gibbs RA, de Bakker PI, Deloukas P, Gabriel SB, Gwilliam R, Hunt S, Inouye M, Jia X, Palotie A, Parkin M, Whittaker P, Yu F, Chang K, Hawes A, Lewis LR, Ren Y, Wheeler D, Gibbs RA, Muzny DM, Barnes C, Darvishi K, Hurles M, Korn JM, Kristiansson K, Lee C, McCarrol SA, Nemesh J, Dermitzakis E, Keinan A, Montgomery SB, Pollack S, Price AL, Soranzo N, Bonnen PE, Gibbs RA, Gonzaga-Jauregui C, Keinan A, Price AL, Yu F, Anttila V, Brodeur W, Daly MJ, Leslie S, McVean G, Moutsianas L, Nguyen H, Schaffner SF, Zhang Q, Ghori MJ, McGinnis R, McLaren W, Pollack S, Price AL, Schaffner SF, Takeuchi F, Grossman SR, Shlyakhter I, Hostetter EB, Sabeti PC, Adebamowo CA, Foster MW, Gordon DR, Licinio J, Manca MC, Marshall PA, Matsuda I, Ngare D, Wang VO, Reddy D, Rotimi CN, Royal CD, Sharp RR, Zeng C, Brooks LD and McEwen JE

    Broad Institute, 7 Cambridge Center, Cambridge, Massachusetts 02138, USA.

    Despite great progress in identifying genetic variants that influence human disease, most inherited risk remains unexplained. A more complete understanding requires genome-wide studies that fully examine less common alleles in populations with a wide range of ancestry. To inform the design and interpretation of such studies, we genotyped 1.6 million common single nucleotide polymorphisms (SNPs) in 1,184 reference individuals from 11 global populations, and sequenced ten 100-kilobase regions in 692 of these individuals. This integrated data set of common and rare alleles, called 'HapMap 3', includes both SNPs and copy number polymorphisms (CNPs). We characterized population-specific differences among low-frequency variants, measured the improvement in imputation accuracy afforded by the larger reference panel, especially in imputing SNPs with a minor allele frequency of ≤5%, and demonstrated the feasibility of imputing newly discovered CNPs and SNPs. This expanded public resource of genome variants in global populations supports deeper interrogation of genomic variation and its role in human disease, and serves as a step towards a high-resolution map of the landscape of human genetic variation.

    Funded by: Medical Research Council: G0000934; NHGRI NIH HHS: U54 HG003273; Wellcome Trust: 068545, 068545/Z/02, 076113, 077011, 077014, 082371, 089061, 089062, 091746

    Nature 2010;467;7311;52-8

  • Lack of support for association between the KIF1B rs10492972[C] variant and multiple sclerosis.

    International Multiple Sclerosis Genetics Consortium (IMSGC), Booth DR, Heard RN, Stewart GJ, Cox M, Scott RJ, Lechner-Scott J, Goris A, Dobosi R, Dubois B, Saarela J, Leppä V, Peltonen L, Pirttila T, Cournu-Rebeix I, Fontaine B, Bergamaschi L, D'Alfonso S, Leone M, Lorentzen AR, Harbo HF, Celius EG, Spurkland A, Link J, Kockum I, Olsson T, Hillert J, Ban M, Baker A, Kemppinen A, Sawcer S, Compston A, Robertson NP, De Jager PL, Hafler DA, Barcellos LF, Ivinson AJ, McCauley JL, Pericak-Vance MA, Oksenberg JR, Hauser SL, Sexton D and Haines J

    Funded by: NINDS NIH HHS: R01 NS049477-01A1

    Nature genetics 2010;42;6;469-70; author reply 470-1

  • IL12A, MPHOSPH9/CDK2AP1 and RGS1 are novel multiple sclerosis susceptibility loci.

    International Multiple Sclerosis Genetics Consortium (IMSGC)

    A recent meta-analysis identified seven single-nucleotide polymorphisms (SNPs) with suggestive evidence of association with multiple sclerosis (MS). We report an analysis of these polymorphisms in a replication study that includes 8,085 cases and 7,777 controls. A meta-analysis across the replication collections and a joint analysis with the discovery data set were performed. The possible functional consequences of the validated susceptibility loci were explored using RNA expression data. For all of the tested SNPs, the effect observed in the replication phase involved the same allele and the same direction of effect observed in the discovery phase. Three loci exceeded genome-wide significance in the joint analysis: RGS1 (P value=3.55 x 10(-9)), IL12A (P=3.08 x 10(-8)) and MPHOSPH9/CDK2AP1 (P=3.96 x 10(-8)). The RGS1 risk allele is shared with celiac disease (CD), and the IL12A risk allele seems to be protective for celiac disease. Within the MPHOSPH9/CDK2AP1 locus, the risk allele correlates with diminished RNA expression of the cell cycle regulator CDK2AP1; this effect is seen in both lymphoblastic cell lines (P=1.18 x 10(-5)) and in peripheral blood mononuclear cells from subjects with MS (P=0.01). Thus, we report three new MS susceptibility loci, including a novel inflammatory disease locus that could affect autoreactive cell proliferation.

    Funded by: Medical Research Council: G0700061; NINDS NIH HHS: R01NS049477

    Genes and immunity 2010;11;5;397-405

  • Failure to validate association between 12p13 variants and ischemic stroke.

    International Stroke Genetics Consortium and Wellcome Trust Case-Control Consortium 2

    Funded by: British Heart Foundation: RG/08/014/24067; Medical Research Council: G0000934, G0701075; NCI NIH HHS: CA 047988; NCRR NIH HHS: M01 RR 165001, M01 RR07122, R54 RR020278; NHGRI NIH HHS: U01 HG004436; NHLBI NIH HHS: HL 043851, HL69757, R01 HL087676, R25 HL088724; NINDS NIH HHS: 1R01 NS059727, K08 NS045802, NS056302, NS30678, NS34447, NS36695, R01 NS 42733, R01 NS059727-01A1, R01 NS45012, R21NS064908; PHS HHS: P60 12583; Wellcome Trust: 068545/Z/02

    The New England journal of medicine 2010;362;16;1547-50

  • The genome sequence of Trypanosoma brucei gambiense, causative agent of chronic human African trypanosomiasis.

    Jackson AP, Sanders M, Berry A, McQuillan J, Aslett MA, Quail MA, Chukualim B, Capewell P, MacLeod A, Melville SE, Gibson W, Barry JD, Berriman M and Hertz-Fowler C

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom.

    Background: Trypanosoma brucei gambiense is the causative agent of chronic Human African Trypanosomiasis or sleeping sickness, a disease endemic across often poor and rural areas of Western and Central Africa. We have previously published the genome sequence of a T. b. brucei isolate, and have now employed a comparative genomics approach to understand the scale of genomic variation between T. b. gambiense and the reference genome. We sought to identify features that were uniquely associated with T. b. gambiense and its ability to infect humans.

    Methodology/Principal Findings: An improved high-quality draft genome sequence for the group 1 T. b. gambiense DAL 972 isolate was produced using a whole-genome shotgun strategy. Comparison with T. b. brucei showed that sequence identity averages 99.2% in coding regions, and gene order is largely collinear. However, variation associated with segmental duplications and tandem gene arrays suggests some reduction of functional repertoire in T. b. gambiense DAL 972. A comparison of the variant surface glycoproteins (VSG) in T. b. brucei with all T. b. gambiense sequence reads showed that the essential structural repertoire of VSG domains is conserved across T. brucei.

    Conclusions: This study provides the first estimate of intraspecific genomic variation within T. brucei, and so has important consequences for future population genomics studies. We have shown that the T. b. gambiense genome corresponds closely with the reference, which should therefore be an effective scaffold for any T. brucei genome sequence data. As VSG repertoire is also well conserved, it may be feasible to describe the total diversity of variant antigens. While we describe several as yet uncharacterized gene families with predicted cell surface roles that were expanded in number in T. b. brucei, no T. b. gambiense-specific gene was identified outside of the subtelomeres that could explain the ability to infect humans.

    Funded by: Wellcome Trust: 079703, WT085775/Z/08/Z

    PLoS neglected tropical diseases 2010;4;4;e658

  • Genome-wide association study in a high-risk isolate for multiple sclerosis reveals associated variants in STAT3 gene.

    Jakkula E, Leppä V, Sulonen AM, Varilo T, Kallio S, Kemppinen A, Purcell S, Koivisto K, Tienari P, Sumelahti ML, Elovaara I, Pirttilä T, Reunanen M, Aromaa A, Oturai AB, Søndergaard HB, Harbo HF, Mero IL, Gabriel SB, Mirel DB, Hauser SL, Kappos L, Polman C, De Jager PL, Hafler DA, Daly MJ, Palotie A, Saarela J and Peltonen L

    Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland.

    Genetic risk for multiple sclerosis (MS) is thought to involve both common and rare risk alleles. Recent GWAS and subsequent meta-analysis have established the critical role of the HLA locus and identified new common variants associated with MS. These variants have small odds ratios (ORs) and explain only a fraction of the genetic risk. To expose potentially rare, high-impact alleles, we conducted a GWAS of 68 distantly related cases and 136 controls from a high-risk internal isolate of Finland with increased prevalence and familial occurrence of MS. The top 27 loci with p < 10(-4) were tested in 711 cases and 1029 controls from Finland, and the top two findings were validated in 3859 cases and 9110 controls from more heterogeneous populations. A SNP (rs744166) within the STAT3 gene was associated with MS (p = 2.75 x 10(-10), OR 0.87, confidence interval 0.83-0.91). The protective haplotype for MS in STAT3 is a risk allele for Crohn disease, implying that STAT3 represents a shared risk locus for at least two autoimmune diseases. This study also demonstrates the potential of special isolated populations in the search for variants contributing to complex traits.

    Funded by: NCRR NIH HHS: U54 RR020278; NINDS NIH HHS: R01 NS 43559; Wellcome Trust: 089061/Z/09/Z

    American journal of human genetics 2010;86;2;285-91

  • Genetic diversity of two haploid markers in the Udegey population from southeastern Siberia.

    Jin HJ, Kim KC and Kim W

    Department of Biological Sciences, Dankook University, Cheonan 330-714, Korea.

    The Udegeys are a small ethnic group who live along the tributaries of the Amur River Basin of southeastern Siberia in Russia. They are thought to speak a language belonging to a subdivision of the Tungusic-Manchu branch of the Altaic family. To understand the genetic features and genetic history of the Udegeys, we analyzed two haploid markers, mitochondrial DNA (mtDNA) and Y-chromosomal variation, in 51 individuals (including 21 males) from the Udegey population. In general, the Udegeys' mtDNA profiles revealed similarities to Siberians and other northeastern Asian populations, although a moderate European contribution was also detected. Interestingly, pairwise values of F(ST) and the MDS plots based on the mtDNA variation showed that the Orok and Nivkh, who inhabit the very same region as the Udegey, were significantly different from the Udegey, implying that they may have been isolated and undergone substantial genetic drift. The Udegeys were characterized by a high frequency (66.7%) of Y chromosome haplogroup C, indicating a close genetic relationship with Mongolians and Siberians. On the paternal side, however, very little admixture was observed between the Udegeys and Europeans. Thus, the combined haploid genetic markers of both mtDNA and the Y chromosome imply that the Udegeys are overall closest to Siberians and northeast Asians of the Altaic linguistic family, with a minor maternal contribution from the European part of the continent.

    American journal of physical anthropology 2010;142;2;303-13

  • The JAK2 46/1 haplotype predisposes to MPL-mutated myeloproliferative neoplasms.

    Jones AV, Campbell PJ, Beer PA, Schnittger S, Vannucchi AM, Zoi K, Percy MJ, McMullin MF, Scott LM, Tapper W, Silver RT, Oscier D, Harrison CN, Grallert H, Kisialiou A, Strike P, Chase AJ, Green AR and Cross NC

    Wessex Regional Genetics Laboratory, Salisbury District Hospital, Salisbury, United Kingdom.

    The 46/1 JAK2 haplotype predisposes to V617F-positive myeloproliferative neoplasms, but the underlying mechanism is obscure. We analyzed essential thrombocythemia patients entered into the PT-1 studies and, as expected, found that 46/1 was overrepresented in V617F-positive cases (n = 404) versus controls (n = 1492, P = 3.9 x 10(-11)). The 46/1 haplotype was also overrepresented in cases without V617F (n = 347, P = .009), with an excess seen for both MPL exon 10 mutated and V617F, MPL exon 10 nonmutated cases. Analysis of further MPL-positive, V617F-negative cases confirmed an excess of 46/1 (n = 176, P = .002), but no association between MPL mutations and MPL haplotype was seen. An excess of 46/1 was also seen in JAK2 exon 12 mutated cases (n = 69, P = .002), and these mutations preferentially arose on the 46/1 chromosome (P = .029). No association between 46/1 and clinical or laboratory features was seen in the PT-1 cohort either with or without V617F. The excess of 46/1 in JAK2 exon 12 cases is compatible with both the "hypermutability" and "fertile ground" hypotheses, but the excess in MPL-mutated cases argues against the former. No difference in sequence, splicing, or expression of JAK2 was found on 46/1 compared with other haplotypes, suggesting that any functional difference of JAK2 on 46/1, if it exists, must be relatively subtle.

    Funded by: NIA NIH HHS: N01-AG-1-1, N01-AG-1-2111, N01-AG-5-0002; NIMHD NIH HHS: 263 MD 821336, 263 MD 9164; Wellcome Trust: 07611, 088340

    Blood 2010;115;22;4517-23

  • Genetic evidence implicates the immune system and cholesterol metabolism in the aetiology of Alzheimer's disease.

    Jones L, Holmans PA, Hamshere ML, Harold D, Moskvina V, Ivanov D, Pocklington A, Abraham R, Hollingworth P, Sims R, Gerrish A, Pahwa JS, Jones N, Stretton A, Morgan AR, Lovestone S, Powell J, Proitsi P, Lupton MK, Brayne C, Rubinsztein DC, Gill M, Lawlor B, Lynch A, Morgan K, Brown KS, Passmore PA, Craig D, McGuinness B, Todd S, Holmes C, Mann D, Smith AD, Love S, Kehoe PG, Mead S, Fox N, Rossor M, Collinge J, Maier W, Jessen F, Schürmann B, Heun R, Kölsch H, van den Bussche H, Heuser I, Peters O, Kornhuber J, Wiltfang J, Dichgans M, Frölich L, Hampel H, Hüll M, Rujescu D, Goate AM, Kauwe JS, Cruchaga C, Nowotny P, Morris JC, Mayo K, Livingston G, Bass NJ, Gurling H, McQuillin A, Gwilliam R, Deloukas P, Al-Chalabi A, Shaw CE, Singleton AB, Guerreiro R, Mühleisen TW, Nöthen MM, Moebus S, Jöckel KH, Klopp N, Wichmann HE, Rüther E, Carrasquillo MM, Pankratz VS, Younkin SG, Hardy J, O'Donovan MC, Owen MJ and Williams J

    Department of Psychological Medicine and Neurology, School of Medicine, Cardiff University, Medical Research Council Centre for Neuropsychiatric Genetics and Genomics, Cardiff, United Kingdom.

    Background: Late Onset Alzheimer's disease (LOAD) is the leading cause of dementia. Recent large genome-wide association studies (GWAS) identified the first strongly supported LOAD susceptibility genes since the discovery of the involvement of APOE in the early 1990s. We have now exploited these GWAS datasets to uncover key LOAD pathophysiological processes.

    Methodology: We applied a recently developed tool for mining GWAS data for biologically meaningful information to a LOAD GWAS dataset. The principal findings were then tested in an independent GWAS dataset.

    Principal Findings: We found a significant overrepresentation of association signals in pathways related to cholesterol metabolism and the immune response in both of the two largest genome-wide association studies for LOAD.

    Significance: Processes related to cholesterol metabolism and the innate immune response have previously been implicated by pathological and epidemiological studies of Alzheimer's disease, but it has been unclear whether those findings reflected primary aetiological events or consequences of the disease process. Our independent evidence from two large studies now demonstrates that these processes are aetiologically relevant, and suggests that they may be suitable targets for novel and existing therapeutic approaches.

    Funded by: Medical Research Council; NIA NIH HHS: Z01 AG000950-06; Wellcome Trust

    PloS one 2010;5;11;e13950

  • Reverse engineering a gene network using an asynchronous parallel evolution strategy.

    Jostins L and Jaeger J

    Laboratory for Development & Evolution, University Museum of Zoology, Department of Zoology, University of Cambridge, Cambridge, CB2 3EJ, UK.

    Background: The use of reverse engineering methods to infer gene regulatory networks by fitting mathematical models to gene expression data is becoming increasingly popular and successful. However, increasing model complexity means that more powerful global optimisation techniques are required for model fitting. The parallel Lam Simulated Annealing (pLSA) algorithm has been used in such approaches, but recent research has shown that island Evolutionary Strategies can produce faster, more reliable results. However, no parallel island Evolutionary Strategy (piES) has yet been demonstrated to be effective for this task.

    Results: Here, we present synchronous and asynchronous versions of the piES algorithm, and apply them to a real reverse engineering problem: inferring parameters in the gap gene network. We find that the asynchronous piES exhibits very little communication overhead, and shows significant speed-up for up to 50 nodes: the piES running on 50 nodes is nearly 10 times faster than the best serial algorithm. We compare the asynchronous piES to pLSA on the same test problem, measuring the time required to reach particular levels of residual error, and show that it converges much faster than pLSA across all optimisation conditions tested.

    Conclusions: Our results demonstrate that the piES is consistently faster and more reliable than the pLSA algorithm on this problem, and scales better with increasing numbers of nodes. In addition, the piES is especially well suited to further improvements and adaptations: Firstly, the algorithm's fast initial descent speed and high reliability make it a good candidate for being used as part of a global/local search hybrid algorithm. Secondly, it has the potential to be used as part of a hierarchical evolutionary algorithm, which takes advantage of modern multi-core computing architectures.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/D00513

    BMC systems biology 2010;4;17

  • iCLIP reveals the function of hnRNP particles in splicing at individual nucleotide resolution.

    König J, Zarnack K, Rot G, Curk T, Kayikci M, Zupan B, Turner DJ, Luscombe NM and Ule J

    Medical Research Council Laboratory of Molecular Biology, Cambridge, UK.

    In the nucleus of eukaryotic cells, nascent transcripts are associated with heterogeneous nuclear ribonucleoprotein (hnRNP) particles that are nucleated by hnRNP C. Despite their abundance, however, it remained unclear whether these particles control pre-mRNA processing. Here, we developed individual-nucleotide resolution UV cross-linking and immunoprecipitation (iCLIP) to study the role of hnRNP C in splicing regulation. iCLIP data show that hnRNP C recognizes uridine tracts with a defined long-range spacing consistent with hnRNP particle organization. hnRNP particles assemble on both introns and exons but remain generally excluded from splice sites. Integration of transcriptome-wide iCLIP data and alternative splicing profiles into an 'RNA map' indicates how the positioning of hnRNP particles determines their effect on the inclusion of alternative exons. The ability of high-resolution iCLIP data to provide insights into the mechanism of this regulation holds promise for studies of other higher-order ribonucleoprotein complexes.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/E01075X/1; Medical Research Council: U.1051.04.028.00001.01 (85858)

    Nature structural & molecular biology 2010;17;7;909-15

  • Life-course analysis of a fat mass and obesity-associated (FTO) gene variant and body mass index in the Northern Finland Birth Cohort 1966 using structural equation modeling.

    Kaakinen M, Läärä E, Pouta A, Hartikainen AL, Laitinen J, Tammelin TH, Herzig KH, Sovio U, Bennett AJ, Peltonen L, McCarthy MI, Elliott P, De Stavola B and Järvelin MR

    Institute of Health Sciences, Faculty of Medicine, University of Oulu, Oulu, Finland.

    The association between variation in the fat mass and obesity-associated (FTO) gene and adulthood body mass index (BMI; weight (kg)/height (m)(2)) is well-replicated. More thorough analyses utilizing phenotypic data over the life course may deepen our understanding of the development of BMI and thus help in the prevention of obesity. The authors used a structural equation modeling approach to explore the network of variables associated with BMI from the prenatal period to age 31 years (1965-1997) in 4,435 subjects from the Northern Finland Birth Cohort 1966. The use of structural equation modeling permitted the easy inclusion of variables with missing values in the analyses without separate imputation steps, as well as differentiation between direct and indirect effects. There was an association between the FTO single nucleotide polymorphism rs9939609 and BMI at age 31 years that persisted after controlling for several relevant factors during the life course. The total effect of the FTO variant on adult BMI was mostly composed of the direct effect, but a notable part was also arising indirectly via its effects on earlier BMI development. In addition to well-established genetic determinants, many life-course factors such as physical activity, in spite of not showing mediation or interaction, had a strong independent effect on BMI.

    Funded by: Medical Research Council: G0500539; NHLBI NIH HHS: 5R01HL087679-02; NIMH NIH HHS: 1RL1MH083268-01, 5R01MH63706:02

    American journal of epidemiology 2010;172;6;653-65

  • Childhood adversities are associated with shorter telomere length at adult age both in individuals with an anxiety disorder and controls.

    Kananen L, Surakka I, Pirkola S, Suvisaari J, Lönnqvist J, Peltonen L, Ripatti S and Hovatta I

    Research Program of Molecular Neurology, Faculty of Medicine, University of Helsinki, Helsinki, Finland.

    Accelerated leukocyte telomere shortening has been previously associated with self-perceived stress and psychiatric disorders, including schizophrenia and mood disorders. We set out to investigate whether telomere length is affected in patients with anxiety disorders, in which stress is a known risk factor. We also studied the effects of childhood and recent psychological distress on telomere length. We utilized samples from the nationally representative population-based Health 2000 Survey that was carried out in 2000-2001 in Finland to assess major public health problems and their determinants. We measured the relative telomere length of the peripheral blood cells by quantitative real-time PCR from 321 individuals with DSM-IV anxiety disorder or subthreshold diagnosis and 653 matched controls aged 30-87 years, who all had undergone the Composite International Diagnostic Interview. While telomere length did not differ significantly between cases and controls in the entire cohort, the older half of the anxiety disorder patients (48-87 years) exhibited significantly shorter telomeres than healthy controls of the same age (P = 0.013). Interestingly, shorter telomere length was also associated with a greater number of reported childhood adverse life events, among both the anxiety disorder cases and controls (P = 0.005). Childhood chronic or serious illness was the most significantly associated single event affecting telomere length at the adult age (P = 0.004). Self-reported current psychological distress did not affect telomere length. Our results suggest that childhood stress might lead to accelerated telomere shortening seen at the adult age. This finding has potentially important implications supporting the view that childhood adversities might have a considerable impact on well-being later in life.

    PloS one 2010;5;5;e10826

  • Prokaryote-derived protein inhibitors of peptidases: A sketchy occurrence and mostly unknown function.

    Kantyka T, Rawlings ND and Potempa J

    Department of Microbiology, Jagiellonian University, Krakow, Poland.

    In metazoan organisms, protein inhibitors of peptidases are important factors essential for regulation of proteolytic activity. In vertebrates, genes encoding peptidase inhibitors constitute up to 1% of genes, reflecting a need for tight and specific control of proteolysis, especially in extracellular body fluids. In stark contrast, unicellular organisms, both prokaryotic and eukaryotic, consistently contain only a few, if any, genes coding for putative peptidase inhibitors. This may seem perplexing in light of the fact that these organisms produce large numbers of proteases of different catalytic classes, with the genes constituting up to 6% of the total gene count and the average being about 3%. Apparently, however, a unicellular life-style is fully compatible with other mechanisms of regulation of proteolysis and does not require protein inhibitors to control intracellular and extracellular proteolytic activity. So in prokaryotes the occurrence of genes encoding different types of peptidase inhibitors is infrequent and often scattered among phylogenetically distinct orders or even phyla of microbiota. Genes encoding proteins homologous to alpha-2-macroglobulin (family I39), serine carboxypeptidase Y inhibitor (family I51), alpha-1-peptidase inhibitor (family I4) and ecotin (family I11) are the most frequently represented in Bacteria. Although several of these gene products were shown to possess inhibitory activity, with the exception of ecotin and staphostatins, the biological function of microbial inhibitors is unclear. In this review we present the distribution of protein inhibitors from different families among prokaryotes, describe their mode of action and hypothesize on their role in microbial physiology and interactions with hosts and environment.

    Funded by: NIDCR NIH HHS: R01 DE009761-18; Wellcome Trust: WT077044/Z/05/Z

    Biochimie 2010;92;11;1644-56

  • Typhoid in Kenya is associated with a dominant multidrug-resistant Salmonella enterica serovar Typhi haplotype that is also widespread in Southeast Asia.

    Kariuki S, Revathi G, Kiiru J, Mengo DM, Mwituria J, Muyodi J, Munyalo A, Teo YY, Holt KE, Kingsley RA and Dougan G

    Centre for Microbiology Research, Kenya Medical Research Institute, P.O. Box 43640-00100, Nairobi, Kenya.

    In sub-Saharan Africa, the burden of typhoid fever, caused by Salmonella enterica serovar Typhi, remains largely unknown, in part because of a lack of blood or bone marrow culture facilities. We characterized a total of 323 S. Typhi isolates from outbreaks in Kenya over the period 1988 to 2008 for antimicrobial susceptibilities and phylogenetic relationships using single-nucleotide polymorphism (SNP) analysis. There was a dramatic increase in the number and percentage of multidrug-resistant (MDR) S. Typhi isolates over the study period. Overall, only 54 (16.7%) S. Typhi isolates were fully sensitive, while the majority, 195 (60.4%), were multiply resistant to most commonly available drugs (ampicillin, chloramphenicol, tetracycline, and cotrimoxazole); 74 (22.9%) isolates were resistant to a single antimicrobial, usually ampicillin, cotrimoxazole, or tetracycline. Resistance to these antibiotics was encoded on self-transferable IncHI1 plasmids of the ST6 sequence type. Of the 94 representative S. Typhi isolates selected for genome-wide haplotype analysis, sensitive isolates fell into several phylogenetically different groups, whereas MDR isolates all belonged to a single haplotype, H58, associated with MDR and decreased ciprofloxacin susceptibility, which is also dominant in many parts of Southeast Asia. Derivatives of the same S. Typhi lineage, H58, are responsible for multidrug resistance in Kenya and parts of Southeast Asia, suggesting intercontinental spread of a single MDR clone. Given the emergence of this aggressive MDR haplotype, careful selection and monitoring of antibiotic usage will be required in Kenya, and potentially other regions of sub-Saharan Africa.

    Funded by: Wellcome Trust: 064616/01/Z.

    Journal of clinical microbiology 2010;48;6;2171-6

  • The burden and characteristics of enteric fever at a healthcare facility in a densely populated area of Kathmandu.

    Karkey A, Arjyal A, Anders KL, Boni MF, Dongol S, Koirala S, My PV, Nga TV, Clements AC, Holt KE, Duy PT, Day JN, Campbell JI, Dougan G, Dolecek C, Farrar J, Basnyat B and Baker S

    Oxford University Clinical Research Unit, Patan Academy of Health Sciences, Lagankhel, Kathmandu, Nepal.

    Enteric fever, caused by Salmonella enterica serovars Typhi and Paratyphi A (S. Typhi and S. Paratyphi A) remains a major public health problem in many settings. The disease is limited to locations with poor sanitation which facilitates the transmission of the infecting organisms. Efficacious and inexpensive vaccines are available for S. Typhi, yet are not commonly deployed to control the disease. Lack of vaccination is due partly to uncertainty of the disease burden arising from a paucity of epidemiological information in key locations. We have collected and analyzed data from 3,898 cases of blood culture-confirmed enteric fever from Patan Hospital in Lalitpur Sub-Metropolitan City (LSMC), between June 2005 and May 2009. Demographic data was available for a subset of these patients (n = 527) that were resident in LSMC and who were enrolled in trials. We show a considerable burden of enteric fever caused by S. Typhi (2,672; 68.5%) and S. Paratyphi A (1,226; 31.5%) at this Hospital over a four year period, which correlate with seasonal fluctuations in rainfall. We found that local population density was not related to incidence and we identified a focus of infections in the east of LSMC. With data from patients resident in LSMC we found that the median age of those with S. Typhi (16 years) was significantly less than S. Paratyphi A (20 years) and that males aged 15 to 25 were disproportionately infected. Our findings provide a snapshot into the epidemiological patterns of enteric fever in Kathmandu. The uneven distribution of enteric fever patients within the population suggests local variation in risk factors, such as contaminated drinking water. These findings are important for initiating a vaccination scheme and improvements in sanitation. We suggest any such intervention should be implemented throughout the LSMC area.

    Funded by: Medical Research Council: G0600718; Wellcome Trust

    PloS one 2010;5;11;e13988

  • Optimising experimental design for high-throughput phenotyping in mice: a case study.

    Karp NA, Baker LA, Gerdin AK, Adams NC, Ramírez-Solis R and White JK

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    To further the functional annotation of the mammalian genome, the Sanger Mouse Genetics Programme aims to generate and characterise knockout mice in a high-throughput manner. Annually, approximately 200 lines of knockout mice will be characterised using a standardised battery of phenotyping tests covering key disease indications ranging from obesity to sensory acuity. From these findings secondary centres will select putative mutants of interest for more in-depth, confirmatory experiments. Optimising experimental design and data analysis is essential to maximise output using the resources with greatest efficiency, thereby attaining our biological objective of understanding the role of genes in normal development and disease. This study uses the example of the noninvasive blood pressure test to demonstrate how statistical investigation is important for generating meaningful, reliable results and assessing the design for the defined research objectives. The analysis adjusts for the multiple-testing problem by applying the false discovery rate, which controls the number of false calls within those highlighted as significant. A variance analysis finds that the variation between mice dominates this assay. These variance measures were used to examine the interplay between days, readings, and number of mice on power, the ability to detect change. If an experiment is underpowered, we cannot conclude whether failure to detect a biological difference arises from low power or lack of a distinct phenotype, hence the mice are subjected to testing without gain. Consequently, in confirmatory studies, a power analysis along with the 3Rs can provide justification to increase the number of mice used.

    Funded by: Wellcome Trust: WT077157/Z/05/Z

    Mammalian genome : official journal of the International Mammalian Genome Society 2010;21;9-10;467-76

  • Addressing accuracy and precision issues in iTRAQ quantitation.

    Karp NA, Huber W, Sadowski PG, Charles PD, Hester SV and Lilley KS

    European Bioinformatics Institute, European Molecular Biology Laboratory Outstation, Hinxton, UK.

    iTRAQ (isobaric tags for relative or absolute quantitation) is a mass spectrometry technology that allows quantitative comparison of protein abundance by measuring peak intensities of reporter ions released from iTRAQ-tagged peptides by fragmentation during MS/MS. However, current data analysis techniques for iTRAQ struggle to report reliable relative protein abundance estimates and suffer with problems of precision and accuracy. The precision of the data is affected by variance heterogeneity: low signal data have higher relative variability; however, low abundance peptides dominate data sets. Accuracy is compromised as ratios are compressed toward 1, leading to underestimation of the ratio. This study investigated both issues and proposed a methodology that combines the peptide measurements to give a robust protein estimate even when the data for the protein are sparse or at low intensity. Our data indicated that ratio compression arises from contamination during precursor ion selection, which occurs at a consistent proportion within an experiment and thus results in a linear relationship between expected and observed ratios. We proposed that a correction factor can be calculated from spiked proteins at known ratios. Then we demonstrated that variance heterogeneity is present in iTRAQ data sets irrespective of the analytical packages, LC-MS/MS instrumentation, and iTRAQ labeling kit (4-plex or 8-plex) used. We proposed using an additive-multiplicative error model for peak intensities in MS/MS quantitation and demonstrated that a variance-stabilizing normalization is able to address the error structure and stabilize the variance across the entire intensity range. The resulting uniform variance structure simplifies the downstream analysis. Heterogeneity of variance consistent with an additive-multiplicative model has been reported in other MS-based quantitation including fields outside of proteomics; consequently the variance-stabilizing normalization methodology has the potential to increase the capabilities of MS in quantitation across diverse areas of biology and chemistry.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/C50694/1

    Molecular & cellular proteomics : MCP 2010;9;9;1885-97

  • Mass Spectrometry for Microbial Proteomics: Issues in Data Analysis with Electrophoretic or Mass Spectrometric Expression Proteomic Data

    Karp, N.

    Mass Spectrometry for Microbial Proteomics 2010;Chapter 18;423-40

  • European lactase persistence genotype shows evidence of association with increase in body mass index.

    Kettunen J, Silander K, Saarela O, Amin N, Müller M, Timpson N, Surakka I, Ripatti S, Laitinen J, Hartikainen AL, Pouta A, Lahermo P, Anttila V, Männistö S, Jula A, Virtamo J, Salomaa V, Lehtimäki T, Raitakari O, Gieger C, Wichmann EH, Van Duijn CM, Smith GD, McCarthy MI, Järvelin MR, Perola M and Peltonen L

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1HH, UK.

    The global prevalence of obesity has increased significantly in recent decades, mainly due to excess calorie intake and increasingly sedentary lifestyle. Here, we test the association between obesity measured by body mass index (BMI) and one of the best-known genetic variants showing strong selective pressure: the functional variant in the cis-regulatory element of the lactase gene. We tested this variant since it is presumed to provide nutritional advantage in specific physical and cultural environments. We genetically defined lactase persistence (LP) in 31 720 individuals from eight European population-based studies and one family study by genotyping or imputing the European LP variant (rs4988235). We performed a meta-analysis by pooling the beta-coefficient estimates of the relationship between rs4988235 and BMI from the nine studies and found that the carriers of the allele responsible for LP among Europeans showed higher BMI (P = 7.9 x 10(-5)). Since this locus has been shown to be prone to population stratification, we paid special attention to reveal any population substructure which might be responsible for the association signal. The best evidence of exclusion of stratification came from the Dutch family sample which is robust for stratification. In this study, we highlight issues in model selection in the genome-wide association studies and problems in imputation of these special genomic regions.

    Funded by: CCR NIH HHS: N01-RC-37004, N01-RC-45035; Medical Research Council: G0600705; NCI NIH HHS: N01-CN-45165; NHLBI NIH HHS: 1-R01-HL087679-01

    Human molecular genetics 2010;19;6;1129-36

  • Specific replication origins promote DNA amplification in fission yeast.

    Kiang L, Heichinger C, Watt S, Bähler J and Nurse P

    Laboratory of Yeast Genetics and Cell Biology, The Rockefeller University, 1230 York Avenue, Box 5, New York, NY 10065, USA.

    To ensure equal replication of the genome in every eukaryotic cell cycle, replication origins fire only once each S phase and do not fire after passive replication. Failure in these controls can lead to local amplification, contributing to genome instability and the development of cancer. To identify features of replication origins important for such amplification, we have investigated origin firing and local genome amplification in the presence of excess helicase loaders Cdc18 and Cdt1 in fission yeast. We find that S phase controls are attenuated and coordination of origin firing is lost, resulting in local amplification. Specific origins are necessary for amplification but act only within a permissive chromosomal context. Origins associated with amplification are highly AT-rich, fire efficiently and early during mitotic S phase, and are located in large intergenic regions. We propose that these features predispose replication origins to re-fire within a single S phase, or to remain active after passive replication.

    Funded by: Cancer Research UK; NIGMS NIH HHS: GM07739

    Journal of cell science 2010;123;Pt 18;3047-51

  • Analysis of a genome-wide set of gene deletions in the fission yeast Schizosaccharomyces pombe.

    Kim DU, Hayles J, Kim D, Wood V, Park HO, Won M, Yoo HS, Duhig T, Nam M, Palmer G, Han S, Jeffery L, Baek ST, Lee H, Shim YS, Lee M, Kim L, Heo KS, Noh EJ, Lee AR, Jang YJ, Chung KS, Choi SJ, Park JY, Park Y, Kim HM, Park SK, Park HJ, Kang EJ, Kim HB, Kang HS, Park HM, Kim K, Song K, Song KB, Nurse P and Hoe KL

    Integrative Omics Research Centre, Korea Research Institute of Bioscience and Biotechnology, Yuseong, Daejeon, Korea.

    We report the construction and analysis of 4,836 heterozygous diploid deletion mutants covering 98.4% of the fission yeast genome providing a tool for studying eukaryotic biology. Comprehensive gene dispensability comparisons with budding yeast--the only other eukaryote for which a comprehensive knockout library exists--revealed that 83% of single-copy orthologs in the two yeasts had conserved dispensability. Gene dispensability differed for certain pathways between the two yeasts, including mitochondrial translation and cell cycle checkpoint control. We show that fission yeast has more essential genes than budding yeast and that essential genes are more likely than nonessential genes to be present in a single copy, to be broadly conserved and to contain introns. Growth fitness analyses determined sets of haploinsufficient and haploproficient genes for fission yeast, and comparisons with budding yeast identified specific ribosomal proteins and RNA polymerase subunits, which may act more generally to regulate eukaryotic cell growth.

    Funded by: Cancer Research UK; Wellcome Trust: 093917

    Nature biotechnology 2010;28;6;617-23

  • Loss of NPC1 function in a patient with a co-inherited novel insulin receptor mutation does not grossly modify the severity of the associated insulin resistance.

    Kirk J, Porter KM, Parker V, Barroso I, O'Rahilly S, Hendriksz C and Semple RK

    Department of Endocrinology, Birmingham Children's Hospital, Steelhouse Lane, Birmingham B4 6NH, United Kingdom.

    In Npc1 null mice, a model for Niemann Pick Disease Type C1, it has been reported that hepatocyte insulin receptor function is significantly impaired, consistent with growing evidence that membrane fluidity and microdomain structure have an important role in insulin signal transduction. However, whether insulin receptor function is also compromised in human Niemann Pick Disease Type C1 is unclear. We now report a girl who developed progressive dementia, ataxia and ophthalmoplegia from 9 years old, followed by severe acanthosis nigricans, hirsutism and acne at 11 years old. She was diagnosed with Niemann Pick Disease Type C1 (OMIM#257220) based on positive filipin staining and reduced cholesterol-esterifying activity in dermal fibroblasts, and homozygosity for the p.Ile1061Thr NPC1 mutation. Further analysis revealed her also to be heterozygous for a novel trinucleotide deletion (c.3659 + 1_3659 + 3delGTG) at the end of exon 20 of INSR, encoding the insulin receptor, leading to deletion of Trp1193 in the intracellular tyrosine kinase domain. INSR mRNA and protein levels were normal in dermal fibroblasts, consistent with a primary signal transduction defect in the mutant receptor. Although the proband was significantly more insulin resistant than her father, who carried the INSR mutation but was only heterozygous for the NPC1 variant, their respective degrees of IR were very similar to those previously reported in a father-daughter pair with the closely related p.Trp1193Leu INSR mutation. This suggests that loss of NPC1 function, with attendant changes in membrane cholesterol composition, does not significantly modify the IR phenotype, even in the context of severely impaired INSR function.

    Funded by: Medical Research Council; Wellcome Trust: 077016, 078986, 078986/Z/06/Z, 080952, 080952/Z/06/Z

    Journal of inherited metabolic disease 2010;33 Suppl 3;S227-32

  • Identification of networks of co-occurring, tumor-related DNA copy number changes using a genome-wide scoring approach.

    Klijn C, Bot J, Adams DJ, Reinders M, Wessels L and Jonkers J

    Division of Molecular Biology, The Netherlands Cancer Institute, Amsterdam, The Netherlands.

    Tumorigenesis is a multi-step process in which normal cells transform into malignant tumors following the accumulation of genetic mutations that enable them to evade the growth control checkpoints that would normally suppress their growth or result in apoptosis. It is therefore important to identify those combinations of mutations that collaborate in cancer development and progression. DNA copy number alterations (CNAs) are one of the ways in which cancer genes are deregulated in tumor cells. We hypothesized that synergistic interactions between cancer genes might be identified by looking for regions of co-occurring gain and/or loss. To this end we developed a scoring framework to separate truly co-occurring aberrations from passenger mutations and dominant single signals present in the data. The resulting regions of high co-occurrence can be investigated for between-region functional interactions. Analysis of high-resolution DNA copy number data from a panel of 95 hematological tumor cell lines correctly identified co-occurring recombinations at the T-cell receptor and immunoglobulin loci in T- and B-cell malignancies, respectively, showing that we can recover truly co-occurring genomic alterations. In addition, our analysis revealed networks of co-occurring genomic losses and gains that are enriched for cancer genes. These networks are also highly enriched for functional relationships between genes. We further examine sub-networks of these networks, core networks, which contain many known cancer genes. The core network for co-occurring DNA losses we find seems to be independent of the canonical cancer genes within the network. Our findings suggest that large-scale, low-intensity copy number alterations may be an important feature of cancer development or maintenance by affecting gene dosage of a large interconnected network of functionally related genes.

    PLoS computational biology 2010;6;1;e1000631

  • AnnoTrack--a tracking system for genome annotation.

    Kokocinski F, Harrow J and Hubbard T

    Vertebrate Genome Analysis, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1HH, UK.

    Background: As genome sequences are determined for increasing numbers of model organisms, demand has grown for better tools to facilitate unified genome annotation efforts by communities of biologists. Typically this process involves numerous experts from the field and the use of data from dispersed sources as evidence. This kind of collaborative annotation project requires specialized software solutions for efficient data tracking and processing.

    Results: As part of the scale-up phase of the ENCODE project (Encyclopedia of DNA Elements), the aim of the GENCODE project is to produce a highly accurate evidence-based reference gene annotation for the human genome. The AnnoTrack software system was developed to aid this effort. It integrates data from multiple distributed sources, highlights conflicts and facilitates the quick identification, prioritisation and resolution of problems during the process of genome annotation.

    Conclusions: AnnoTrack has been in use for the last year and has proven a very valuable tool for large-scale genome annotation. Designed to interface with standard bioinformatics components, such as DAS servers and Ensembl databases, it is easy to set up and configure for different genome projects. The source code is available at

    Funded by: NHGRI NIH HHS: 5U54HG004555; Wellcome Trust: 077198, WT077198/Z/05/Z

    BMC genomics 2010;11;538

  • Slingshot: a PiggyBac based transposon system for tamoxifen-inducible 'self-inactivating' insertional mutagenesis.

    Kong J, Wang F, Brenton JD and Adams DJ

    Experimental Cancer Genetics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    We have developed a self-inactivating PiggyBac transposon system for tamoxifen inducible insertional mutagenesis from a stably integrated chromosomal donor. This system, which we have named 'Slingshot', utilizes a transposon carrying elements for both gain- and loss-of-function screens in vitro. We show that the Slingshot transposon can be efficiently mobilized from a range of chromosomal loci with high inducibility and low background generating insertions that are randomly dispersed throughout the genome. Furthermore, we show that once the Slingshot transposon has been mobilized it is not remobilized producing stable clonal integrants in all daughter cells. To illustrate the efficacy of Slingshot as a screening tool we set out to identify mediators of resistance to puromycin and the chemotherapeutic drug vincristine by performing genetrap screens in mouse embryonic stem cells. From these genome-wide screens we identified multiple independent insertions in the multidrug resistance transporter genes Abcb1a/b and Abcg2 conferring resistance to drug treatment. Importantly, we also show that the Slingshot transposon system is functional in other mammalian cell lines such as human HEK293, OVCAR-3 and PE01 cells suggesting that it may be used in a range of cell culture systems. Slingshot represents a flexible and potent system for genome-wide transposon-mediated mutagenesis with many potential applications.

    Funded by: Cancer Research UK; Wellcome Trust

    Nucleic acids research 2010;38;18;e173

  • Insertional mutagenesis in mice deficient for p15Ink4b, p16Ink4a, p21Cip1, and p27Kip1 reveals cancer gene interactions and correlations with tumor phenotypes.

    Kool J, Uren AG, Martins CP, Sie D, de Ridder J, Turner G, van Uitert M, Matentzoglu K, Lagcher W, Krimpenfort P, Gadiot J, Pritchard C, Lenz J, Lund AH, Jonkers J, Rogers J, Adams DJ, Wessels L, Berns A and van Lohuizen M

    Division of Molecular Genetics, The Centre of Biomedical Genetics, Academic Medical Center and Cancer Genomics Centre, Netherlands Cancer Institute, 1066CX, Amsterdam, the Netherlands.

    The cyclin dependent kinase (CDK) inhibitors p15, p16, p21, and p27 are frequently deleted, silenced, or downregulated in many malignancies. Inactivation of CDK inhibitors predisposes mice to tumor development, showing that these genes function as tumor suppressors. Here, we describe high-throughput murine leukemia virus insertional mutagenesis screens in mice that are deficient for one or two CDK inhibitors. We retrieved 9,117 retroviral insertions from 476 lymphomas to define hundreds of loci that are mutated more frequently than expected by chance. Many of these loci are skewed toward a specific genetic context of predisposing germline and somatic mutations. We also found associations between these loci with gender, age of tumor onset, and lymphocyte lineage (B or T cell). Comparison of retroviral insertion sites with single nucleotide polymorphisms associated with chronic lymphocytic leukemia revealed a significant overlap between the datasets. Together, our findings highlight the importance of genetic context within large-scale mutation detection studies, and they show a novel use for insertional mutagenesis data in prioritizing disease-associated genes that emerge from genome-wide association studies.

    Funded by: Cancer Research UK: A6997, A8784; Wellcome Trust: 082356

    Cancer research 2010;70;2;520-31

  • Microindel detection in short-read sequence data.

    Krawitz P, Rödelsperger C, Jäger M, Jostins L, Bauer S and Robinson PN

    Institute for Medical Genetics, Charité-Universitätsmedizin Berlin, 13353 Berlin.

    Motivation: Several recent studies have demonstrated the effectiveness of resequencing and single nucleotide variant (SNV) detection by deep short-read sequencing platforms. While several reliable algorithms are available for automated SNV detection, the automated detection of microindels in deep short-read data presents a new bioinformatics challenge.

    Results: We systematically analyzed how the short-read mapping tools MAQ, Bowtie, Burrows-Wheeler alignment tool (BWA), Novoalign and RazerS perform on simulated datasets that contain indels and evaluated how indels affect error rates in SNV detection. We implemented a simple algorithm to compute the equivalent indel region eir, which can be used to process the alignments produced by the mapping tools in order to perform indel calling. Using simulated data that contains indels, we demonstrate that indel detection works well on short-read data: the detection rate for microindels (<4 bp) is >90%. Our study provides insights into systematic errors in SNV detection that is based on ungapped short sequence read alignments. Gapped alignments of short sequence reads can be used to reduce this error and to detect microindels in simulated short-read data. A comparison with microindels automatically identified on the ABI Sanger and Roche 454 platform indicates that microindel detection from short sequence reads identifies both overlapping and distinct indels.
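    The equivalent indel region idea can be illustrated with a short sketch. This is not the authors' implementation: it assumes a simple deletion in an error-free reference string, and the function name `equivalent_deletion_starts` is hypothetical. It shifts a deletion left and right for as long as the resulting sequence is unchanged, and returns the range of start positions that describe the same event.

```python
def equivalent_deletion_starts(ref, start, length):
    """Return (leftmost, rightmost) 0-based start positions at which deleting
    `length` bases from `ref` yields the same sequence as deleting at `start`.

    Assumes 0 <= start and start + length <= len(ref).
    """
    # Shift left: the deletion can move one base left when the base entering
    # the deleted window equals the base leaving it on the right.
    left = start
    while left > 0 and ref[left - 1] == ref[left + length - 1]:
        left -= 1

    # Shift right: the mirror-image condition.
    right = start
    while right + length < len(ref) and ref[right] == ref[right + length]:
        right += 1

    return left, right


if __name__ == "__main__":
    # A 2-bp deletion inside an AG repeat can be placed at several positions.
    print(equivalent_deletion_starts("ACAGAGAGT", 4, 2))
```

    Indel calls whose start positions fall inside the same returned range can then be treated as one event rather than as distinct indels.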


    Supplementary data are available at Bioinformatics online.

    Bioinformatics (Oxford, England) 2010;26;6;722-9

  • Association of JAG1 with bone mineral density and osteoporotic fractures: a genome-wide association study and follow-up replication studies.

    Kung AW, Xiao SM, Cherny S, Li GH, Gao Y, Tso G, Lau KS, Luk KD, Liu JM, Cui B, Zhang MJ, Zhang ZL, He JW, Yue H, Xia WB, Luo LM, He SL, Kiel DP, Karasik D, Hsu YH, Cupples LA, Demissie S, Styrkarsdottir U, Halldorsson BV, Sigurdsson G, Thorsteinsdottir U, Stefansson K, Richards JB, Zhai G, Soranzo N, Valdes A, Spector TD and Sham PC

    Department of Medicine, Research Centre of Heart, Brain, Hormone & Healthy Aging, Faculty of Medicine, The University of Hong Kong, Hong Kong, China.

    Bone mineral density (BMD), a diagnostic parameter for osteoporosis and a clinical predictor of fracture, is a polygenic trait with high heritability. To identify genetic variants that influence BMD in different ethnic groups, we performed a genome-wide association study (GWAS) on 800 unrelated Southern Chinese women with extreme BMD and carried out follow-up replication studies in six independent study populations of European descent and Asian populations including 18,098 subjects. In the meta-analysis, rs2273061 of the Jagged1 (JAG1) gene was associated with high BMD (p = 5.27 x 10(-8) for lumbar spine [LS] and p = 4.15 x 10(-5) for femoral neck [FN], n = 18,898). This SNP was further found to be associated with the low risk of osteoporotic fracture (p = 0.009, OR = 0.7, 95% CI 0.57-0.93, n = 1881). Region-wide and haplotype analysis showed that the strongest association evidence was from the linkage disequilibrium block 5, which included rs2273061 of the JAG1 gene (p = 8.52 x 10(-9) for LS and 3.47 x 10(-5) at FN). To assess the function of identified variants, an electrophoretic mobility shift assay demonstrated the binding of c-Myc to the "G" but not "A" allele of rs2273061. A mRNA expression study in both human bone-derived cells and peripheral blood mononuclear cells confirmed association of the high BMD-related allele G of rs2273061 with higher JAG1 expression. Our results identify the JAG1 gene as a candidate for BMD regulation in different ethnic groups, and it is a potential key factor for fracture pathogenesis.

    Funded by: NHLBI NIH HHS: N02-HL-64278, N01-HC-25195; NIA NIH HHS: R01 AR/AG 41398; NIAMS NIH HHS: R01 AR 050066, R01 AR050066-04; Wellcome Trust

    American journal of human genetics 2010;86;2;229-39

  • Molecular analysis of tumor-promoting CD8+ T cells in two-stage cutaneous chemical carcinogenesis.

    Kwong BY, Roberts SJ, Silberzahn T, Filler RB, Neustadter JH, Galan A, Reddy S, Lin WM, Ellis PD, Langford CF, Hayday AC and Girardi M

    Department of Dermatology and Skin Diseases Research Center, Yale University School of Medicine, 333 Cedar Street, New Haven, CT 06520, USA.

    T-pro are tumor-infiltrating TCRalphabeta(+)CD8(+) cells of reduced cytotoxic potential that promote experimental two-stage chemical cutaneous carcinogenesis. Toward understanding their mechanism of action, this study uses whole-genome expression analysis to compare T-pro with systemic CD8(+) T cells from multiple groups of tumor-bearing mice. T-pro show an overt T helper 17-like profile (high retinoic acid-related orphan receptor-(ROR)gammat, IL-17A, IL-17F; low T-bet and eomesodermin), regulatory potential (high FoxP3, IL-10, Tim-3), and transcripts encoding epithelial growth factors (amphiregulin, Gro-1, Gro-2). Tricolor flow cytometry subsequently confirmed the presence of TCRbeta(+) CD8(+) IL-17(+) T cells among tumor-infiltrating lymphocytes (TILs). Moreover, a time-course analysis of independent TIL isolates from papillomas versus carcinomas exposed a clear association of the "T-pro phenotype" with malignant progression. This molecular characterization of T-pro builds a foundation for elucidating the contributions of inflammation to cutaneous carcinogenesis, and may provide useful biomarkers for cancer immunotherapy in which the widely advocated use of tumor-specific CD8(+) cytolytic T cells should perhaps accommodate the cells' potential corruption toward the T-pro phenotype. The data are also likely germane to psoriasis, in which the epidermis may be infiltrated by CD8(+) IL-17-producing T cells.

    Funded by: Medical Research Council; NCI NIH HHS: P50 CA121974, P50 CA121974-01, R01 CA102703, R01 CA102703-05; NIAMS NIH HHS: P30 AR041942-07, P30 AR41942; Wellcome Trust

    The Journal of investigative dermatology 2010;130;6;1726-36

  • Meeting Report from the Genomic Standards Consortium (GSC) Workshop 8.

    Kyrpides N, Field D, Sterk P, Kottmann R, Glöckner FO, Hirschman L, Garrity GM, Cochrane G and Wooley J

    This report summarizes the proceedings of the 8th meeting of the Genomic Standards Consortium held at the Department of Energy Joint Genome Institute in Walnut Creek, CA, USA on September 9-11, 2009. This three-day workshop marked the maturing of Genomic Standards Consortium from an informal gathering of researchers interested in developing standards in the field of genomic and metagenomics to an established community with a defined governance mechanism, its own open access journal, and a family of established standards for describing genomes, metagenomes and marker studies (i.e. ribosomal RNA gene surveys). There will be increased efforts within the GSC to reach out to the wider scientific community via a range of new projects. Further information about the GSC and its activities can be found at

    Standards in genomic sciences 2010;3;1;93-6

  • Jarid2 is a PRC2 component in embryonic stem cells required for multi-lineage differentiation and recruitment of PRC1 and RNA Polymerase II to developmental regulators.

    Landeira D, Sauer S, Poot R, Dvorkina M, Mazzarella L, Jørgensen HF, Pereira CF, Leleu M, Piccolo FM, Spivakov M, Brookes E, Pombo A, Fisher C, Skarnes WC, Snoek T, Bezstarosti K, Demmers J, Klose RJ, Casanova M, Tavares L, Brockdorff N, Merkenschlager M and Fisher AG

    Lymphocyte Development Group, MRC Clinical Sciences Centre, Imperial College School of Medicine, Hammersmith Hospital Campus, Du Cane Road, London, W12 0NN UK.

    Polycomb Repressor Complexes (PRCs) are important regulators of embryogenesis. In embryonic stem (ES) cells many genes that regulate subsequent stages in development are enriched at their promoters for PRC1, PRC2 and Ser 5-phosphorylated RNA Polymerase II (RNAP), and contain domains of 'bivalent' chromatin (enriched for H3K4me3; histone H3 di- or trimethylated at Lys 4 and H3K27me3; histone H3 trimethylated at Lys 27). Loss of individual PRC components in ES cells can lead to gene de-repression and to unscheduled differentiation. Here we show that Jarid2 is a novel subunit of PRC2 that is required for the co-recruitment of PRC1 and RNAP to genes that regulate development in ES cells. Jarid2-deficient ES cells showed reduced H3K4me2/me3 and H3K27me3 marking and PRC1/PRC2 recruitment, and did not efficiently establish Ser 5-phosphorylated RNAP at target genes. ES cells lacking Jarid2, in contrast to previously characterized PRC1 and PRC2 mutants, did not inappropriately express PRC2 target genes. Instead, they showed a severely compromised capacity for successful differentiation towards neural or mesodermal fates and failed to correctly initiate lineage-specific gene expression in vitro. Collectively, these data indicate that transcriptional priming of bivalent genes in pluripotent ES cells is Jarid2-dependent, and suggest that priming is critical for subsequent multi-lineage differentiation.

    Funded by: Medical Research Council

    Nature cell biology 2010;12;6;618-24

  • Hundreds of variants clustered in genomic loci and biological pathways affect human height.

    Lango Allen H, Estrada K, Lettre G, Berndt SI, Weedon MN, Rivadeneira F, Willer CJ, Jackson AU, Vedantam S, Raychaudhuri S, Ferreira T, Wood AR, Weyant RJ, Segrè AV, Speliotes EK, Wheeler E, Soranzo N, Park JH, Yang J, Gudbjartsson D, Heard-Costa NL, Randall JC, Qi L, Vernon Smith A, Mägi R, Pastinen T, Liang L, Heid IM, Luan J, Thorleifsson G, Winkler TW, Goddard ME, Sin Lo K, Palmer C, Workalemahu T, Aulchenko YS, Johansson A, Zillikens MC, Feitosa MF, Esko T, Johnson T, Ketkar S, Kraft P, Mangino M, Prokopenko I, Absher D, Albrecht E, Ernst F, Glazer NL, Hayward C, Hottenga JJ, Jacobs KB, Knowles JW, Kutalik Z, Monda KL, Polasek O, Preuss M, Rayner NW, Robertson NR, Steinthorsdottir V, Tyrer JP, Voight BF, Wiklund F, Xu J, Zhao JH, Nyholt DR, Pellikka N, Perola M, Perry JR, Surakka I, Tammesoo ML, Altmaier EL, Amin N, Aspelund T, Bhangale T, Boucher G, Chasman DI, Chen C, Coin L, Cooper MN, Dixon AL, Gibson Q, Grundberg E, Hao K, Juhani Junttila M, Kaplan LM, Kettunen J, König IR, Kwan T, Lawrence RW, Levinson DF, Lorentzon M, McKnight B, Morris AP, Müller M, Suh Ngwa J, Purcell S, Rafelt S, Salem RM, Salvi E, Sanna S, Shi J, Sovio U, Thompson JR, Turchin MC, Vandenput L, Verlaan DJ, Vitart V, White CC, Ziegler A, Almgren P, Balmforth AJ, Campbell H, Citterio L, De Grandi A, Dominiczak A, Duan J, Elliott P, Elosua R, Eriksson JG, Freimer NB, Geus EJ, Glorioso N, Haiqing S, Hartikainen AL, Havulinna AS, Hicks AA, Hui J, Igl W, Illig T, Jula A, Kajantie E, Kilpeläinen TO, Koiranen M, Kolcic I, Koskinen S, Kovacs P, Laitinen J, Liu J, Lokki ML, Marusic A, Maschio A, Meitinger T, Mulas A, Paré G, Parker AN, Peden JF, Petersmann A, Pichler I, Pietiläinen KH, Pouta A, Ridderstråle M, Rotter JI, Sambrook JG, Sanders AR, Schmidt CO, Sinisalo J, Smit JH, Stringham HM, Bragi Walters G, Widen E, Wild SH, Willemsen G, Zagato L, Zgaga L, Zitting P, Alavere H, Farrall M, McArdle WL, Nelis M, Peters MJ, Ripatti S, van Meurs JB, Aben KK, Ardlie KG, Beckmann JS, Beilby JP, 
Bergman RN, Bergmann S, Collins FS, Cusi D, den Heijer M, Eiriksdottir G, Gejman PV, Hall AS, Hamsten A, Huikuri HV, Iribarren C, Kähönen M, Kaprio J, Kathiresan S, Kiemeney L, Kocher T, Launer LJ, Lehtimäki T, Melander O, Mosley TH, Musk AW, Nieminen MS, O'Donnell CJ, Ohlsson C, Oostra B, Palmer LJ, Raitakari O, Ridker PM, Rioux JD, Rissanen A, Rivolta C, Schunkert H, Shuldiner AR, Siscovick DS, Stumvoll M, Tönjes A, Tuomilehto J, van Ommen GJ, Viikari J, Heath AC, Martin NG, Montgomery GW, Province MA, Kayser M, Arnold AM, Atwood LD, Boerwinkle E, Chanock SJ, Deloukas P, Gieger C, Grönberg H, Hall P, Hattersley AT, Hengstenberg C, Hoffman W, Lathrop GM, Salomaa V, Schreiber S, Uda M, Waterworth D, Wright AF, Assimes TL, Barroso I, Hofman A, Mohlke KL, Boomsma DI, Caulfield MJ, Cupples LA, Erdmann J, Fox CS, Gudnason V, Gyllensten U, Harris TB, Hayes RB, Jarvelin MR, Mooser V, Munroe PB, Ouwehand WH, Penninx BW, Pramstaller PP, Quertermous T, Rudan I, Samani NJ, Spector TD, Völzke H, Watkins H, Wilson JF, Groop LC, Haritunians T, Hu FB, Kaplan RC, Metspalu A, North KE, Schlessinger D, Wareham NJ, Hunter DJ, O'Connell JR, Strachan DP, Wichmann HE, Borecki IB, van Duijn CM, Schadt EE, Thorsteinsdottir U, Peltonen L, Uitterlinden AG, Visscher PM, Chatterjee N, Loos RJ, Boehnke M, McCarthy MI, Ingelsson E, Lindgren CM, Abecasis GR, Stefansson K, Frayling TM and Hirschhorn JN

    Genetics of Complex Traits, Peninsula College of Medicine and Dentistry, University of Exeter, Exeter EX1 2LU, UK.

    Most common human traits and diseases have a polygenic pattern of inheritance: DNA sequence variants at many genetic loci influence the phenotype. Genome-wide association (GWA) studies have identified more than 600 variants associated with human traits, but these typically explain small fractions of phenotypic variation, raising questions about the use of further studies. Here, using 183,727 individuals, we show that hundreds of genetic variants, in at least 180 loci, influence adult height, a highly heritable and classic polygenic trait. The large number of loci reveals patterns with important implications for genetic studies of common human diseases and traits. First, the 180 loci are not random, but instead are enriched for genes that are connected in biological pathways (P = 0.016) and that underlie skeletal growth defects (P < 0.001). Second, the likely causal gene is often located near the most strongly associated variant: in 13 of 21 loci containing a known skeletal growth gene, that gene was closest to the associated variant. Third, at least 19 loci have multiple independently associated variants, suggesting that allelic heterogeneity is a frequent feature of polygenic traits, that comprehensive explorations of already-discovered loci should discover additional variants and that an appreciable fraction of associated loci may have been identified. Fourth, associated variants are enriched for likely functional effects on genes, being over-represented among variants that alter amino-acid structure of proteins and expression levels of nearby genes. Our data explain approximately 10% of the phenotypic variation in height, and we estimate that unidentified common variants of similar effect sizes would increase this figure to approximately 16% of phenotypic variation (approximately 20% of heritable variation). Although additional approaches are needed to dissect the genetic architecture of polygenic human traits fully, our findings indicate that GWA studies can identify large numbers of loci that implicate biologically relevant genes and pathways.

    Funded by: British Heart Foundation: PG/02/128, PG/02/128/14470; Cancer Research UK; Chief Scientist Office: CZB/4/276, CZB/4/279, CZB/4/710; Medical Research Council: G0000649, G0000934, G0500539, G0600331, G0600331(77796), G0601261, G0601261(80227), G0701863, G9521010, G9521010(63660), G9521010D, MC_QA137934, MC_U106179471, MC_U106188470, MC_U127561128; NCI NIH HHS: CA047988, CA49449, CA50385, CA65725, CA67262, CA87969, P01 CA087969-12, R01 CA047988, R01 CA047988-20, R01 CA050385-20, R01 CA065725, R01 CA065725-14, R01 CA067262-14, R01 CA104021, R01 CA104021-02, U01 CA049449-21, U01 CA098233, U01 CA098233-08, U01-CA098233; NCRR NIH HHS: M01-RR00425, U54-RR020278, UL1-RR025005; NHGRI NIH HHS: HG002651, HG005214, HG005581, R01 HG002651-05, RC2 HG005581-02, T32-HG00040, U01 HG004399-02, U01 HG004402-02, U01 HG005214-02, U01-HG004399, U01-HG004402, Z01-HG000024; NHLBI NIH HHS: HL043851, HL084729, HL69757, HL71981, K99-HL094535, N01-HC15103, N01-HC25195, N01-HC35129, N01-HC45133, N01-HC55015, N01-HC55016, N01-HC55018, N01-HC55019, N01-HC55020, N01-HC55021, N01-HC55022, N01-HC55222, N01-HC75150, N01-HC85079, N01-HC85080, N01-HC85081, N01-HC85082, N01-HC85083, N01-HC85084, N01-HC85085, N01-HC85086, N02-HL-6-4278, R01 HL043851, R01 HL043851-10, R01 HL059367, R01 HL059367-10, R01 HL071981-07, R01 HL086694-02, R01 HL087641-01, R01 HL087647, R01 HL087647-01, R01 HL087652-01, R01 HL087676-01, R01 HL087679-01, R01 HL087700-03, R01 HL088119, R01 HL088119-01, R01-HL086694, R01-HL087641, R01-HL087647, R01-HL087652, R01-HL087676, R01-HL087679, R01-HL087700, R01-HL088119, R01-HL59367, U01 HL069757-10, U01 HL072515-06, U01 HL080295-04, U01 HL084729-03, U01 HL084756, U01 HL084756-03, U01-HL080295, U01-HL084756, U01-HL72515; NIA NIH HHS: N01-AG12100, N01-AG12109, R01 AG031890-02, R01-AG031890, Z01-AG00675, Z01-AG007380; NIAAA NIH HHS: AA014041, AA07535, AA10248, AA13320, AA13321, AA13326, K05 AA017688-04, R01 AA007535-08, R01 AA013320-04, R01 AA013321-05, R01 AA013326-05, R01 
AA014041-05; NIAMS NIH HHS: K08 AR055688-03, K08 AR055688-04, K08-AR055688; NIDA NIH HHS: DA12854, R01 DA012854-09; NIDDK NIH HHS: DK062370, DK063491, DK072193, DK079466, DK080145, DK46200, DK58845, F32 DK079466-01, K23 DK080145-01, K23-DK080145, P30 DK072488, R01 DK058845-11, R01 DK068336-01, R01 DK072193-05, R01 DK073490-01, R01 DK075681-02, R01 DK075787-03, R01 DK089256, R01 DK089256-02, R01 DK091718, R01-DK068336, R01-DK073490, R01-DK075681, R01-DK075787, U01 DK062370, U01 DK062370-08, U01 DK062418; NIGMS NIH HHS: U01 GM074518-05, U01-GM074518; NIMH NIH HHS: MH084698, R01 MH059160-04, R01 MH059565, R01 MH059565-06, R01 MH059566-08, R01 MH059571-05, R01 MH059586-08, R01 MH059587-09, R01 MH059588-08, R01 MH060870-09, R01 MH060879-08, R01 MH061675-09, R01 MH067257-04, R01 MH081800-01, R01-MH059160, R01-MH59565, R01-MH59566, R01-MH59571, R01-MH59586, R01-MH59587, R01-MH59588, R01-MH60870, R01-MH60879, R01-MH61675, R01-MH63706, R01-MH67257, R01-MH79469, R01-MH81800, RL1 MH083268-05, RL1-MH083268, U01 MH079469-03, U01 MH079470-03, U01-MH79469, U01-MH79470; PHS HHS: 263-MA-410953, HHSN268200625226C, N01-G65403; Wellcome Trust: 064890, 068545, 068545/Z/02, 072856, 072960, 075491, 076113, 076113/B/04/Z, 076113/C/04/Z, 077016, 077016/Z/05/Z, 079557, 079771, 079895, 081682, 081682/Z/06/Z, 083270, 084183/Z/07/Z, 085301, 085301/Z/08/Z, 086596, 086596/Z/08/Z, 088885, 091746, 091746/Z/10/Z

    Nature 2010;467;7317;832-8

  • Genetic association and interaction analysis of USF1 and APOA5 on lipid levels and atherosclerosis.

    Laurila PP, Naukkarinen J, Kristiansson K, Ripatti S, Kauttu T, Silander K, Salomaa V, Perola M, Karhunen PJ, Barter PJ, Ehnholm C and Peltonen L

    Public Health Genomics Unit, National Institute for Health and Welfare and Institute for Molecular Medicine, Helsinki, Finland.

    Objective: USF1 is a ubiquitous transcription factor governing the expression of numerous genes of lipid and glucose metabolism. APOA5 is a well-established candidate gene regulating triglyceride (TG) levels and has been identified as a downstream target of upstream stimulatory factor. No detailed studies about the effect of APOA5 on atherosclerotic lesion formation have been conducted, nor has its potential interaction with USF1 been examined.

    Methods and Results: We analyzed allelic variants of USF1 and APOA5 in families (n=516) ascertained for atherogenic dyslipidemia and in an autopsy series of middle-aged men (n=300) with precise quantitative measurements of atherosclerotic lesions. The impact of previously associated APOA5 variants on TGs was observed in the dyslipidemic families, and variant rs3135506 was associated with size of fibrotic aortic lesions in the autopsy series. The USF1 variant rs2516839, associated previously with atherosclerotic lesions, showed an effect on TGs in members of the dyslipidemic families with documented coronary artery disease. We provide preliminary evidence of gene-gene interaction between these variants in an autopsy series with a fibrotic lesion area in the abdominal aorta (P=0.0028), with TGs in dyslipidemic coronary artery disease subjects (P=0.03), and with high-density lipoprotein cholesterol (P=0.008) in a large population cohort of coronary artery disease patients (n=1065) in which the interaction for TGs was not replicated.

    Conclusions: Our findings in these unique samples reinforce the roles of APOA5 and USF1 variants on cardiovascular phenotypes and suggest that both genes contribute to lipid levels and aortic atherosclerosis individually and possibly through epistatic effects.

    Funded by: Wellcome Trust: 089061

    Arteriosclerosis, thrombosis, and vascular biology 2010;30;2;346-52

  • Use of purified Clostridium difficile spores to facilitate evaluation of health care disinfection regimens.

    Lawley TD, Clare S, Deakin LJ, Goulding D, Yen JL, Raisen C, Brandt C, Lovell J, Cooke F, Clark TG and Dougan G

    Microbial Pathogenesis Laboratory, Wellcome Trust, Hinxton, Cambridgeshire CB10 1SA, United Kingdom.

    Clostridium difficile is a major cause of antibiotic-associated diarrheal disease in many parts of the world. In recent years, distinct genetic variants of C. difficile that cause severe disease and persist within health care settings have emerged. Highly resistant and infectious C. difficile spores are proposed to be the main vectors of environmental persistence and host transmission, so methods to accurately monitor spores and their inactivation are urgently needed. Here we describe simple quantitative methods, based on purified C. difficile spores and a murine transmission model, for evaluating health care disinfection regimens. We demonstrate that disinfectants that contain strong oxidizing active ingredients, such as hydrogen peroxide, are very effective in inactivating pure spores and blocking spore-mediated transmission. Complete inactivation of 10⁶ pure C. difficile spores on indicator strips (a six-log reduction and a standard measure of stringent disinfection regimens) requires at least 5 min of exposure to hydrogen peroxide vapor (HPV; 400 ppm). In contrast, a 1-min treatment with HPV was required to disinfect an environment that was heavily contaminated with C. difficile spores (17 to 29 spores/cm²) and block host transmission. Thus, pure C. difficile spores facilitate practical methods for evaluating the efficacy of C. difficile spore disinfection regimens and bringing scientific acumen to C. difficile infection control.

    Funded by: Medical Research Council: G0901743; Wellcome Trust

    Applied and environmental microbiology 2010;76;20;6895-900

  • CCRaVAT and QuTie - enabling analysis of rare variants in large-scale case control and quantitative trait association studies.

    Lawrence R, Day-Williams AG, Elliott KS, Morris AP and Zeggini E

    Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.

    Background: Genome-wide association studies have been successful in finding common variants influencing common traits. However, these associations only account for a fraction of trait heritability. There has been a shift in the field towards studying low frequency and rare variants, which are now widely recognised as putative complex trait determinants. Despite this increasing focus on examining the role of low frequency and rare variants in complex disease susceptibility, there is a lack of user-friendly analytical packages implementing powerful association tests for the analysis of rare variants.

    Results: We have developed two software tools, CCRaVAT (Case-Control Rare Variant Analysis Tool) and QuTie (Quantitative Trait), which enable efficient large-scale analysis of low frequency and rare variants. Both programs implement a collapsing method examining the accumulation of low frequency and rare variants across a locus of interest that has more power than single variant analysis. CCRaVAT carries out case-control analyses whereas QuTie has been developed for continuous trait analysis.
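    A collapsing test of the kind described can be sketched as follows. The sketch assumes the simplest form of the approach: an individual counts as a carrier if they hold at least one minor allele at any rare variant in the locus, and carrier status is then compared between cases and controls with a Pearson chi-square statistic on the resulting 2x2 table. The function names and the toy genotypes are illustrative and are not taken from CCRaVAT or QuTie.

```python
def collapse_carriers(genotypes):
    """Collapse per-variant minor-allele counts into one carrier flag per person.

    `genotypes` is a list with one entry per individual; each entry is a list
    of minor-allele counts (0, 1 or 2) at the rare variants in the locus.
    """
    return [any(count > 0 for count in person) for person in genotypes]


def carrier_table(carriers, is_case):
    """Build the 2x2 table (case carriers, case non-carriers,
    control carriers, control non-carriers)."""
    a = sum(1 for c, k in zip(carriers, is_case) if k and c)
    b = sum(1 for c, k in zip(carriers, is_case) if k and not c)
    c_ = sum(1 for c, k in zip(carriers, is_case) if not k and c)
    d = sum(1 for c, k in zip(carriers, is_case) if not k and not c)
    return a, b, c_, d


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 degree of freedom, no continuity correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


if __name__ == "__main__":
    # Toy data only: four cases followed by four controls, two rare variants.
    genotypes = [[0, 1], [1, 0], [0, 0], [2, 0],
                 [0, 0], [0, 0], [0, 1], [0, 0]]
    is_case = [True] * 4 + [False] * 4
    table = carrier_table(collapse_carriers(genotypes), is_case)
    print(table, chi_square_2x2(*table))
```

    Collapsing trades per-variant resolution for power: many variants that are individually too rare to test contribute to a single carrier count per locus.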

    Conclusions: CCRaVAT and QuTie are easy to use software tools that allow users to perform genome-wide association analysis on low frequency and rare variants for both binary and quantitative traits. The software is freely available and provides the genetics community with a resource to perform association analysis on rarer genetic variants.

    Funded by: Wellcome Trust: 064890, 079557, 079557MA, 081682, WT088885/Z/09/Z

    BMC bioinformatics 2010;11;527

  • Improvements to services at the European Nucleotide Archive.

    Leinonen R, Akhtar R, Birney E, Bonfield J, Bower L, Corbett M, Cheng Y, Demiralp F, Faruque N, Goodgame N, Gibson R, Hoad G, Hunter C, Jang M, Leonard S, Lin Q, Lopez R, Maguire M, McWilliam H, Plaister S, Radhakrishnan R, Sobhany S, Slater G, Ten Hoopen P, Valentin F, Vaughan R, Zalunin V, Zerbino D and Cochrane G

    European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

    The European Nucleotide Archive (ENA) is Europe's primary nucleotide sequence archival resource, safeguarding open nucleotide data access, engaging in worldwide collaborative data exchange and integrating with the scientific publication process. ENA has made significant contributions to the collaborative nucleotide archival arena as an active proponent of extending the traditional collaboration to cover capillary and next-generation sequencing information. We have continued to co-develop data and metadata representation formats with our collaborators for both data exchange and public data dissemination. In addition to the DDBJ/EMBL/GenBank feature table format, we share metadata formats for capillary and next-generation sequencing traces and are using and contributing to the NCBI SRA Toolkit for the long-term storage of the next-generation sequence traces. During the course of 2009, ENA has significantly improved sequence submission, search and access functionalities provided at EMBL-EBI. In this article, we briefly describe the content and scope of our archive and introduce major improvements to our services.

    Funded by: Wellcome Trust

    Nucleic acids research 2010;38;Database issue;D39-45

  • Phylogenetic analysis of gene structure and alternative splicing in alpha-actinins.

    Lek M, MacArthur DG, Yang N and North KN

    Institute for Neuroscience and Muscle Research, The Children's Hospital at Westmead, Sydney, NSW, Australia.

    The alpha-actinins are an important family of actin-binding proteins with the ability to cross-link actin filaments when in dimer form. Members of the alpha-actinin family share a domain topology composed of highly conserved actin-binding and EF-hand domains separated by a rod domain composed of spectrin-like repeats. Functional diversity within this family has arisen through exon duplication and the formation of alternate splice isoforms as well as gene duplications during the evolution of vertebrates. In addition to the known functional domains, alpha-actinins also contain a consensus PDZ-binding site. The completed genome sequence of over 32 invertebrate species has allowed the analysis of gene structure and exon-gene duplication over a diverse range of phyla. Our analysis shows that relative to early branching metazoans, there has been considerable intron loss especially in arthropods with few cases of intron gains. The C-terminal PDZ-binding site is conserved in nearly all invertebrates but is missing in some nematodes and platyhelminths. Alternative splicing in the actin-binding domain is conserved in chordates, arthropods, and some nematodes and platyhelminths. In contrast, alternative splicing of the EF-hand domain is only observed in chordates. Finally, given the prevalence of exon duplications seen in the actin-binding domain, this may act as a significant mechanism in the modification of actin-binding properties.

    Molecular biology and evolution 2010;27;4;773-80

  • Integration of genetic, clinical, and INR data to refine warfarin dosing.

    Lenzini P, Wadelius M, Kimmel S, Anderson JL, Jorgensen AL, Pirmohamed M, Caldwell MD, Limdi N, Burmester JK, Dowd MB, Angchaisuksiri P, Bass AR, Chen J, Eriksson N, Rane A, Lindh JD, Carlquist JF, Horne BD, Grice G, Milligan PE, Eby C, Shin J, Kim H, Kurnik D, Stein CM, McMillin G, Pendleton RC, Berg RL, Deloukas P and Gage BF

    Department of Internal Medicine, Washington University, St Louis, Missouri, USA.

    Well-characterized genes that affect warfarin metabolism (cytochrome P450 (CYP) 2C9) and sensitivity (vitamin K epoxide reductase complex 1 (VKORC1)) explain one-third of the variability in therapeutic dose before the international normalized ratio (INR) is measured. To determine genotypic relevance after INR becomes available, we derived clinical and pharmacogenetic refinement algorithms on the basis of INR values (on day 4 or 5 of therapy), clinical factors, and genotype. After adjusting for INR, CYP2C9 and VKORC1 genotypes remained significant predictors (P < 0.001) of warfarin dose. The clinical algorithm had an R(2) of 48% (median absolute error (MAE): 7.0 mg/week) and the pharmacogenetic algorithm had an R(2) of 63% (MAE: 5.5 mg/week) in the derivation set (N = 969). In independent validation sets, the R(2) was 26-43% with the clinical algorithm and 42-58% when genotype was added (P = 0.002). After several days of therapy, a pharmacogenetic algorithm estimates the therapeutic warfarin dose more accurately than one using clinical factors and INR response alone.
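    The two accuracy measures quoted above can be computed as in the sketch below. It assumes that R(2) is the usual coefficient of determination and, following the abstract, that MAE is the median (not the mean) absolute error. The function names are illustrative, and the doses in the example are invented for demonstration and are not study data.

```python
from statistics import mean, median


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    centre = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - centre) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def median_absolute_error(observed, predicted):
    """Median of the absolute differences between observed and predicted doses."""
    return median(abs(o - p) for o, p in zip(observed, predicted))


if __name__ == "__main__":
    # Invented weekly doses in mg/week, for illustration only.
    observed = [20, 30, 40, 50]
    predicted = [25, 30, 35, 50]
    print(r_squared(observed, predicted))
    print(median_absolute_error(observed, predicted))
```

    Reporting both measures is useful because R(2) summarises explained variability across the cohort, while the median absolute error expresses the typical dosing error in clinical units.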

    Funded by: Department of Health; NHLBI NIH HHS: HL097036, R01 HL074724-01, R01 HL074724-02, R01 HL074724-03, R01 HL074724-04, R01 HL092173-03, R01 HL092173-04, R01 HL092173-05, R01 HL097036-01, R01S HL074724; NINDS NIH HHS: K23 NS045598-05; Wellcome Trust

    Clinical pharmacology and therapeutics 2010;87;5;572-8

  • The genome of a pathogenic rhodococcus: cooptive virulence underpinned by key gene acquisitions.

    Letek M, González P, Macarthur I, Rodríguez H, Freeman TC, Valero-Rello A, Blanco M, Buckley T, Cherevach I, Fahey R, Hapeshi A, Holdstock J, Leadon D, Navas J, Ocampo A, Quail MA, Sanders M, Scortti MM, Prescott JF, Fogarty U, Meijer WG, Parkhill J, Bentley SD and Vázquez-Boland JA

    Microbial Pathogenesis Unit, Centres for Infectious Diseases and Immunity, Infection, and Evolution, University of Edinburgh, Edinburgh, United Kingdom.

    We report the genome of the facultative intracellular parasite Rhodococcus equi, the only animal pathogen within the biotechnologically important actinobacterial genus Rhodococcus. The 5.0-Mb R. equi 103S genome is significantly smaller than those of environmental rhodococci. This is due to genome expansion in nonpathogenic species, via a linear gain of paralogous genes and an accelerated genetic flux, rather than reductive evolution in R. equi. The 103S genome lacks the extensive catabolic and secondary metabolic complement of environmental rhodococci, and it displays unique adaptations for host colonization and competition in the short-chain fatty acid-rich intestine and manure of herbivores--two main R. equi reservoirs. Except for a few horizontally acquired (HGT) pathogenicity loci, including a cytoadhesive pilus determinant (rpl) and the virulence plasmid vap pathogenicity island (PAI) required for intramacrophage survival, most of the potential virulence-associated genes identified in R. equi are conserved in environmental rhodococci or have homologs in nonpathogenic Actinobacteria. This suggests a mechanism of virulence evolution based on the cooption of existing core actinobacterial traits, triggered by key host niche-adaptive HGT events. We tested this hypothesis by investigating R. equi virulence plasmid-chromosome crosstalk, by global transcription profiling and expression network analysis. Two chromosomal genes conserved in environmental rhodococci, encoding putative chorismate mutase and anthranilate synthase enzymes involved in aromatic amino acid biosynthesis, were strongly coregulated with vap PAI virulence genes and required for optimal proliferation in macrophages. The regulatory integration of chromosomal metabolic genes under the control of the HGT-acquired plasmid PAI is thus an important element in the cooptive virulence of R. equi.

    PLoS genetics 2010;6;9;e1001145

  • NordicDB: a Nordic pool and portal for genome-wide control data.

    Leu M, Humphreys K, Surakka I, Rehnberg E, Muilu J, Rosenström P, Almgren P, Jääskeläinen J, Lifton RP, Kyvik KO, Kaprio J, Pedersen NL, Palotie A, Hall P, Grönberg H, Groop L, Peltonen L, Palmgren J and Ripatti S

    Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden.

    A cost-efficient way to increase power in a genetic association study is to pool controls from different sources. The genotyping effort can then be directed to large case series. The Nordic Control database, NordicDB, has been set up as a unique resource in the Nordic area and the data are available for authorized users through the web portal. The current version of NordicDB pools together high-density genome-wide SNP information from ∼5000 controls originating from Finnish, Swedish and Danish studies and shows country-specific allele frequencies for SNP markers. The genetic homogeneity of the samples was investigated using multidimensional scaling (MDS) analysis and pairwise allele frequency differences between the studies. The plot of the first two MDS components showed excellent resemblance to the geographical placement of the samples, with a clear NW-SE gradient. We advise researchers to assess the impact of population structure when incorporating NordicDB controls in association studies. This harmonized Nordic database presents a unique genome-wide resource for future genetic association studies in the Nordic countries.

    European journal of human genetics : EJHG 2010;18;12;1322-6

  • Genome-wide association identifies OBFC1 as a locus involved in human leukocyte telomere biology.

    Levy D, Neuhausen SL, Hunt SC, Kimura M, Hwang SJ, Chen W, Bis JC, Fitzpatrick AL, Smith E, Johnson AD, Gardner JP, Srinivasan SR, Schork N, Rotter JI, Herbig U, Psaty BM, Sastrasinh M, Murray SS, Vasan RS, Province MA, Glazer NL, Lu X, Cao X, Kronmal R, Mangino M, Soranzo N, Spector TD, Berenson GS and Aviv A

    National Heart, Lung, and Blood Institute's Framingham Heart Study, Framingham, MA, USA.

    Telomeres are engaged in a host of cellular functions, and their length is regulated by multiple genes. Telomere shortening, in the course of somatic cell replication, ultimately leads to replicative senescence. In humans, rare mutations in genes that regulate telomere length have been identified in monogenic diseases such as dyskeratosis congenita and idiopathic pulmonary fibrosis, which are associated with shortened leukocyte telomere length (LTL) and increased risk for aplastic anemia. Shortened LTL is observed in a host of aging-related complex genetic diseases and is associated with diminished survival in the elderly. We report results of a genome-wide association study of LTL in a consortium of four observational studies (n = 3,417 participants with LTL and genome-wide genotyping). SNPs in the regions of the oligonucleotide/oligosaccharide-binding folds containing one gene (OBFC1; rs4387287; P = 3.9 x 10(-9)) and chemokine (C-X-C motif) receptor 4 gene (CXCR4; rs4452212; P = 2.9 x 10(-8)) were associated with LTL at a genome-wide significance level (P < 5 x 10(-8)). We attempted replication of the top SNPs at these loci through de novo genotyping of 1,893 additional individuals and in silico lookup in another observational study (n = 2,876), and we confirmed the association findings for OBFC1 but not CXCR4. In addition, we confirmed the telomerase RNA component (TERC) as a gene associated with LTL (P = 1.1 x 10(-5)). The identification of OBFC1 through genome-wide association as a locus for interindividual variation in LTL in the general population advances the understanding of telomere biology in humans and may provide insights into aging-related disorders linked to altered LTL dynamics.

    Proceedings of the National Academy of Sciences of the United States of America 2010;107;20;9293-8

  • MicroRNAs in mouse development and disease.

    Lewis MA and Steel KP

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    MicroRNAs, small non-coding RNAs which act as repressors of target genes, were discovered in 1993, and since then have been shown to play important roles in the development of numerous systems. Consistent with this role, they are also implicated in the pathogenesis of multiple diseases. Here we review the involvement of microRNAs in mouse development and disease, with particular reference to deafness as an example.

    Funded by: Medical Research Council: G0300212, MC_QA137918; Wellcome Trust

    Seminars in cell & developmental biology 2010;21;7;774-80

  • Fast and accurate long-read alignment with Burrows-Wheeler transform.

    Li H and Durbin R

    Wellcome Trust Sanger Institute, Wellcome Genome Campus, Cambridge, CB10 1SA, UK.

    Motivation: Many programs for aligning short sequencing reads to a reference genome have been developed in the last 2 years. Most of them are very efficient for short reads but inefficient or not applicable for reads >200 bp because the algorithms are heavily and specifically tuned for short queries with low sequencing error rate. However, some sequencing platforms already produce longer reads and others are expected to become available soon. For longer reads, hashing-based software such as BLAT and SSAHA2 remain the only choices. Nonetheless, these methods are substantially slower than short-read aligners in terms of aligned bases per unit time.

    Results: We designed and implemented a new algorithm, Burrows-Wheeler Aligner's Smith-Waterman Alignment (BWA-SW), to align long sequences up to 1 Mb against a large sequence database (e.g. the human genome) with a few gigabytes of memory. The algorithm is as accurate as SSAHA2, more accurate than BLAT, and is several to tens of times faster than both.

    Funded by: Wellcome Trust: 077192/Z/05/Z

    Bioinformatics (Oxford, England) 2010;26;5;589-95

  • A genome-wide association scan on estrogen receptor-negative breast cancer.

    Li J, Humphreys K, Darabi H, Rosin G, Hannelius U, Heikkinen T, Aittomäki K, Blomqvist C, Pharoah PD, Dunning AM, Ahmed S, Hooning MJ, Hollestelle A, Oldenburg RA, Alfredsson L, Palotie A, Peltonen-Palotie L, Irwanto A, Low HQ, Teoh GH, Thalamuthu A, Kere J, D'Amato M, Easton DF, Nevanlinna H, Liu J, Czene K and Hall P

    Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm 17177, Sweden.

    Introduction: Breast cancer is a heterogeneous disease and may be characterized on the basis of whether estrogen receptors (ER) are expressed in the tumour cells. ER status of breast cancer is important clinically, and is used both as a prognostic indicator and treatment predictor. In this study, we focused on identifying genetic markers associated with ER-negative breast cancer risk.

    Methods: We conducted a genome-wide association analysis of 285,984 single nucleotide polymorphisms (SNPs) genotyped in 617 ER-negative breast cancer cases and 4,583 controls. We also conducted a genome-wide pathway analysis on the discovery dataset using permutation-based tests on pre-defined pathways. The extent of shared polygenic variation between ER-negative and ER-positive breast cancers was assessed by relating risk scores, derived using ER-positive breast cancer samples, to disease state in independent, ER-negative breast cancer cases.

    Results: Association with ER-negative breast cancer was not validated for any of the five most strongly associated SNPs followed up in independent studies (1,011 ER-negative breast cancer cases, 7,604 controls). However, an excess of small P-values for SNPs with known regulatory functions in cancer-related pathways was found (global P = 0.052). We found no evidence to suggest that ER-negative breast cancer shares a polygenic basis to disease with ER-positive breast cancer.

    Conclusions: ER-negative breast cancer is a distinct breast cancer subtype that merits independent analyses. Given the clinical importance of this phenotype and the likelihood that genetic effect sizes are small, greater sample sizes and further studies are required to understand the etiology of ER-negative breast cancers.

    Funded by: Cancer Research UK: A10119, A10124; NCI NIH HHS: R01 CA58427

    Breast cancer research : BCR 2010;12;6;R93

  • JAK2 V617F impairs hematopoietic stem cell function in a conditional knock-in mouse model of JAK2 V617F-positive essential thrombocythemia.

    Li J, Spensberger D, Ahn JS, Anand S, Beer PA, Ghevaert C, Chen E, Forrai A, Scott LM, Ferreira R, Campbell PJ, Watson SP, Liu P, Erber WN, Huntly BJ, Ottersbach K and Green AR

    Cambridge Institute for Medical Research, Department of Haematology, University of Cambridge, United Kingdom.

    The JAK2 V617F mutation is found in most patients with a myeloproliferative neoplasm and is sufficient to produce a myeloproliferative phenotype in murine retroviral transplantation or transgenic models. However, several lines of evidence suggest that disease phenotype is influenced by the level of mutant JAK2 signaling, and we have therefore generated a conditional knock-in mouse in which a human JAK2 V617F is expressed under the control of the mouse Jak2 locus. Human and murine Jak2 transcripts are expressed at similar levels, and mice develop modest increases in hemoglobin and platelet levels reminiscent of human JAK2 V617F-positive essential thrombocythemia. The phenotype is transplantable and accompanied by increased terminal erythroid and megakaryocyte differentiation together with increased numbers of clonogenic progenitors, including erythropoietin-independent erythroid colonies. Unexpectedly, JAK2(V617F) mice develop reduced numbers of lineage(-)Sca-1(+)c-Kit(+) cells, which exhibit increased DNA damage, reduced apoptosis, and reduced cell cycling. Moreover, competitive bone marrow transplantation studies demonstrated impaired hematopoietic stem cell function in JAK2(V617F) mice. These results suggest that the chronicity of human myeloproliferative neoplasms may reflect a balance between impaired hematopoietic stem cell function and the accumulation of additional mutations.

    Funded by: British Heart Foundation: FS09039; Wellcome Trust: 088340

    Blood 2010;116;9;1528-38

  • Genome-wide forward genetic screens in mouse ES cells.

    Li MA, Pettitt SJ, Yusa K and Bradley A

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

    Mouse embryonic stem (ES) cells are an attractive model system for investigating mammalian biology. Their relatively stable genome and high amenability to genome modification enables the generation of large-scale mutant libraries, which can be subsequently used for phenotype-driven genetic screens. While retroviral vectors have traditionally been used to generate insertional mutations in ES cells, their severe distribution-bias in the mammalian genome substantially limits genome-wide mutagenesis. The recent development of the DNA transposon piggyBac offers an efficient and highly versatile alternative for achieving more unbiased mutagenesis. Furthermore, heterozygous mutations created by insertional mutagens can be converted in parallel to homozygosity by using Blm-deficient ES cells, allowing genome-wide loss-of-function screens to be conducted. In this chapter, we describe the principles underpinning genetic screens in mouse ES cells with examples of previously successful screens. Protocols are provided for piggyBac transposon-mediated mutagenesis, production of the corresponding homozygous mutants in a Blm-deficient genetic background, and methods for mapping and validation of mutations recovered from screens of such libraries.

    Funded by: Wellcome Trust

    Methods in enzymology 2010;477;217-42

  • Reprogramming of T cells to natural killer-like cells upon Bcl11b deletion.

    Li P, Burke S, Wang J, Chen X, Ortiz M, Lee SC, Lu D, Campos L, Goulding D, Ng BL, Dougan G, Huntly B, Gottgens B, Jenkins NA, Copeland NG, Colucci F and Liu P

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, UK.

    T cells develop in the thymus and are critical for adaptive immunity. Natural killer (NK) lymphocytes constitute an essential component of the innate immune system in tumor surveillance, reproduction, and defense against microbes and viruses. Here, we show that the transcription factor Bcl11b was expressed in all T cell compartments and was indispensable for T lineage development. When Bcl11b was deleted, T cells from all developmental stages acquired NK cell properties and concomitantly lost or decreased T cell-associated gene expression. These induced T-to-natural killer (ITNK) cells, which were morphologically and genetically similar to conventional NK cells, killed tumor cells in vitro, and effectively prevented tumor metastasis in vivo. Therefore, ITNKs may represent a new cell source for cell-based therapies.

    Funded by: Biotechnology and Biological Sciences Research Council; Medical Research Council: G0501150, G0800784, G116/187; Wellcome Trust: 076962, 077186

    Science (New York, N.Y.) 2010;329;5987;85-9

  • Meta-analysis and imputation refines the association of 15q25 with smoking quantity.

    Liu JZ, Tozzi F, Waterworth DM, Pillai SG, Muglia P, Middleton L, Berrettini W, Knouff CW, Yuan X, Waeber G, Vollenweider P, Preisig M, Wareham NJ, Zhao JH, Loos RJ, Barroso I, Khaw KT, Grundy S, Barter P, Mahley R, Kesaniemi A, McPherson R, Vincent JB, Strauss J, Kennedy JL, Farmer A, McGuffin P, Day R, Matthews K, Bakke P, Gulsvik A, Lucae S, Ising M, Brueckl T, Horstmann S, Wichmann HE, Rawal R, Dahmen N, Lamina C, Polasek O, Zgaga L, Huffman J, Campbell S, Kooner J, Chambers JC, Burnett MS, Devaney JM, Pichard AD, Kent KM, Satler L, Lindsay JM, Waksman R, Epstein S, Wilson JF, Wild SH, Campbell H, Vitart V, Reilly MP, Li M, Qu L, Wilensky R, Matthai W, Hakonarson HH, Rader DJ, Franke A, Wittig M, Schäfer A, Uda M, Terracciano A, Xiao X, Busonero F, Scheet P, Schlessinger D, St Clair D, Rujescu D, Abecasis GR, Grabe HJ, Teumer A, Völzke H, Petersmann A, John U, Rudan I, Hayward C, Wright AF, Kolcic I, Wright BJ, Thompson JR, Balmforth AJ, Hall AS, Samani NJ, Anderson CA, Ahmad T, Mathew CG, Parkes M, Satsangi J, Caulfield M, Munroe PB, Farrall M, Dominiczak A, Worthington J, Thomson W, Eyre S, Barton A, Wellcome Trust Case Control Consortium, Mooser V, Francks C and Marchini J

    Department of Statistics, University of Oxford, Oxford, UK.

    Smoking is a leading global cause of disease and mortality. We established the Oxford-GlaxoSmithKline study (Ox-GSK) to perform a genome-wide meta-analysis of SNP association with smoking-related behavioral traits. Our final data set included 41,150 individuals drawn from 20 disease, population and control cohorts. Our analysis confirmed an effect on smoking quantity at a locus on 15q25 (P = 9.45 x 10(-19)) that includes CHRNA5, CHRNA3 and CHRNB4, three genes encoding neuronal nicotinic acetylcholine receptor subunits. We used data from the 1000 Genomes project to investigate the region using imputation, which allowed for analysis of virtually all common SNPs in the region and offered a fivefold increase in marker density over HapMap2 (ref. 2) as an imputation reference panel. Our fine-mapping approach identified a SNP showing the highest significance, rs55853698, located within the promoter region of CHRNA5. Conditional analysis also identified a secondary locus (rs6495308) in CHRNA3.

    Funded by: Chief Scientist Office: CZB/4/540, CZB/4/710, ETM/137, ETM/75; Medical Research Council: G0401527, G0600329, G0701863, G0800759, G9521010, MC_U106179471, MC_U106188470, MC_U127561128; NIA NIH HHS: Z99 AG999999, ZIA AG000196-03, ZIA AG000196-04

    Nature genetics 2010;42;5;436-40

  • Critical roles of Bcl11b in T-cell development and maintenance of T-cell identity.

    Liu P, Li P and Burke S

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    T-cell development primarily occurs in the thymus and involves the interactions of many important transcription factors. Until recently, no single transcription factor had been identified as essential for T-cell lineage commitment or maintenance of T-cell identity. Recent studies have now identified the zinc finger transcription factor Bcl11b to be essential for T-cell development and for maintenance of T-cell identity. Remarkably, T cells acquire NK cell properties upon Bcl11b deletion. These reprogrammed cells have unique properties in proliferation, cytokine dependency and killing target cells, and may therefore provide a new cell source for some cell-based therapies.

    Immunological reviews 2010;238;1;138-49

  • Characterisations of odorant-binding proteins in the tsetse fly Glossina morsitans morsitans.

    Liu R, Lehane S, He X, Lehane M, Hertz-Fowler C, Berriman M, Pickett JA, Field LM and Zhou JJ

    Department of Biological Chemistry, Harpenden, UK.

    Odorant-binding proteins (OBPs) play an important role in insect olfaction by mediating interactions between odorants and odorant receptors. We report for the first time 20 OBP genes in the tsetse fly Glossina morsitans morsitans. qRT-PCR revealed that 8 of these genes were highly transcribed in the antennae. The transcription of these genes in the antennae was significantly lower in males than in females and there was a clear correlation between OBP gene transcription and feeding status. Starvation over 72 h post-blood meal (PBM) did not significantly affect the transcription. However, the transcription in the antennae of 10-week-old flies was much higher than in 3-day-old flies at 48 h PBM and decreased sharply after 72 h starvation, suggesting that the OBP gene expression is affected by the insect's nutritional status. Sequence comparisons with OBPs of other Dipterans identified several homologs to sex pheromone-binding proteins and OBPs of Drosophila melanogaster.

    Funded by: Wellcome Trust: WT085775/Z/08/Z

    Cellular and molecular life sciences : CMLS 2010;67;6;919-29

  • Origin of the human malaria parasite Plasmodium falciparum in gorillas.

    Liu W, Li Y, Learn GH, Rudicell RS, Robertson JD, Keele BF, Ndjango JB, Sanz CM, Morgan DB, Locatelli S, Gonder MK, Kranzusch PJ, Walsh PD, Delaporte E, Mpoudi-Ngole E, Georgiev AV, Muller MN, Shaw GM, Peeters M, Sharp PM, Rayner JC and Hahn BH

    Department of Medicine, University of Alabama at Birmingham, Birmingham, Alabama 35294, USA.

    Plasmodium falciparum is the most prevalent and lethal of the malaria parasites infecting humans, yet the origin and evolutionary history of this important pathogen remain controversial. Here we develop a single-genome amplification strategy to identify and characterize Plasmodium spp. DNA sequences in faecal samples from wild-living apes. Among nearly 3,000 specimens collected from field sites throughout central Africa, we found Plasmodium infection in chimpanzees (Pan troglodytes) and western gorillas (Gorilla gorilla), but not in eastern gorillas (Gorilla beringei) or bonobos (Pan paniscus). Ape plasmodial infections were highly prevalent, widely distributed and almost always made up of mixed parasite species. Analysis of more than 1,100 mitochondrial, apicoplast and nuclear gene sequences from chimpanzees and gorillas revealed that 99% grouped within one of six host-specific lineages representing distinct Plasmodium species within the subgenus Laverania. One of these from western gorillas comprised parasites that were nearly identical to P. falciparum. In phylogenetic analyses of full-length mitochondrial sequences, human P. falciparum formed a monophyletic lineage within the gorilla parasite radiation. These findings indicate that P. falciparum is of gorilla origin and not of chimpanzee, bonobo or ancient human origin.

    Funded by: Howard Hughes Medical Institute; NIAID NIH HHS: P30 AI 7767, P30 AI027767, P30 AI027767-21A1, R01 AI058715-06A1, R01 AI058715-07, R01 AI50529, R03 AI074778, R03 AI074778-02, R37 AI050529, R37 AI050529-07, R37 AI050529-08, R37 AI050529-09, T32 AI007245, T32 AI007245-26, U19 AI 067854, U19 AI067854, U19 AI067854-06; NIGMS NIH HHS: T32 GM008111, T32 GM008111-13; PHS HHS: R01 I58715; Wellcome Trust

    Nature 2010;467;7314;420-5

  • Ten simple rules for editing Wikipedia.

    Logan DW, Sandal M, Gardner PP, Manske M and Bateman A

    PLoS computational biology 2010;6;9

  • Loss-of-function variants in the genomes of healthy humans.

    MacArthur DG and Tyler-Smith C

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK.

    Genetic variants predicted to seriously disrupt the function of human protein-coding genes--so-called loss-of-function (LOF) variants--have traditionally been viewed in the context of severe Mendelian disease. However, recent large-scale sequencing and genotyping projects have revealed a surprisingly large number of these variants in the genomes of apparently healthy individuals--at least 100 per genome, including more than 30 in a homozygous state--suggesting a previously unappreciated level of variation in functional gene content between humans. These variants are mostly found at low frequency, suggesting that they are enriched for mildly deleterious polymorphisms suppressed by negative natural selection, and thus represent an attractive set of candidate variants for complex disease susceptibility. However, they are also enriched for sequencing and annotation artefacts, so overall present serious challenges for clinical sequencing projects seeking to identify severe disease genes amidst the 'noise' of technical error and benign genetic polymorphism. Systematic, high-quality catalogues of LOF variants present in the genomes of healthy individuals, built from the output of large-scale sequencing studies such as the 1000 Genomes Project, will help to distinguish between benign and disease-causing LOF variants, and will provide valuable resources for clinical genomics.

    Funded by: Wellcome Trust

    Human molecular genetics 2010;19;R2;R125-30

  • Dysregulated humoral immunity to nontyphoidal Salmonella in HIV-infected African adults.

    MacLennan CA, Gilchrist JJ, Gordon MA, Cunningham AF, Cobbold M, Goodall M, Kingsley RA, van Oosterhout JJ, Msefula CL, Mandala WL, Leyton DL, Marshall JL, Gondwe EN, Bobat S, López-Macías C, Doffinger R, Henderson IR, Zijlstra EE, Dougan G, Drayson MT, MacLennan IC and Molyneux ME

    Medical Research Council Centre for Immune Regulation and Clinical Immunology Service, Institute of Biomedical Research, School of Immunity and Infection, University of Birmingham, Birmingham, UK.

    Nontyphoidal Salmonellae are a major cause of life-threatening bacteremia among HIV-infected individuals. Although cell-mediated immunity controls intracellular infection, antibodies protect against Salmonella bacteremia. We report that high-titer antibodies specific for Salmonella lipopolysaccharide (LPS) are associated with a lack of Salmonella-killing in HIV-infected African adults. Killing was restored by genetically shortening LPS from the target Salmonella or removing LPS-specific antibodies from serum. Complement-mediated killing of Salmonella by healthy serum is shown to be induced specifically by antibodies against outer membrane proteins. This killing is lost when excess antibody against Salmonella LPS is added. Thus, our study indicates that impaired immunity against nontyphoidal Salmonella bacteremia in HIV infection results from excess inhibitory antibodies against Salmonella LPS, whereas serum killing of Salmonella is induced by antibodies against outer membrane proteins.

    Funded by: Biotechnology and Biological Sciences Research Council; Medical Research Council; Wellcome Trust

    Science (New York, N.Y.) 2010;328;5977;508-12

  • Meeting report: a workshop on Best Practices in Genome Annotation.

    Madupu R, Brinkac LM, Harrow J, Wilming LG, Böhme U, Lamesch P and Hannick LI

    Informatics, J. Craig Venter Institute, Rockville, MD 20850 USA, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK and The Arabidopsis Information Resource, Carnegie Institution of Washington, Stanford, CA 94305 USA.

    Efforts to annotate the genomes of a wide variety of model organisms are currently carried out by sequencing centers, model organism databases and academic/institutional laboratories around the world. Different annotation methods and tools have been developed over time to meet the needs of biologists faced with the task of annotating biological data. While standardized methods are essential for consistent curation within each annotation group, methods and tools can differ between groups, especially when the groups are curating different organisms. Biocurators from several institutes met at the Third International Biocuration Conference in Berlin, Germany, April 2009 and hosted the 'Best Practices in Genome Annotation: Inference from Evidence' workshop to share their strategies, pipelines, standards and tools. This article documents the material presented in the workshop.

    Funded by: NHGRI NIH HHS: U54 HG004555-03; Wellcome Trust: 077198

    Database : the journal of biological databases and curation 2010;2010;baq001

  • Distinct effects of allelic NFIX mutations on nonsense-mediated mRNA decay engender either a Sotos-like or a Marshall-Smith syndrome.

    Malan V, Rajan D, Thomas S, Shaw AC, Louis Dit Picard H, Layet V, Till M, van Haeringen A, Mortier G, Nampoothiri S, Puseljić S, Legeai-Mallet L, Carter NP, Vekemans M, Munnich A, Hennekam RC, Colleaux L and Cormier-Daire V

    Département de Génétique, Université Paris Descartes, Hôpital Necker-Enfants Malades, Paris 75015, France.

    By using a combination of array comparative genomic hybridization and a candidate gene approach, we identified nuclear factor I/X (NFIX) deletions or nonsense mutation in three sporadic cases of a Sotos-like overgrowth syndrome with advanced bone age, macrocephaly, developmental delay, scoliosis, and unusual facies. Unlike the aforementioned human syndrome, Nfix-deficient mice are unable to gain weight and die in the first 3 postnatal weeks, while they also present with a spinal deformation and decreased bone mineralization. These features prompted us to consider NFIX as a candidate gene for Marshall-Smith syndrome (MSS), a severe malformation syndrome characterized by failure to thrive, respiratory insufficiency, accelerated osseous maturation, kyphoscoliosis, osteopenia, and unusual facies. Distinct frameshift and splice NFIX mutations that escaped nonsense-mediated mRNA decay (NMD) were identified in nine MSS subjects. NFIX belongs to the Nuclear factor one (NFI) family of transcription factors, but its specific function is presently unknown. We demonstrate that NFIX is normally expressed prenatally during human brain development and skeletogenesis. These findings demonstrate that allelic NFIX mutations trigger distinct phenotypes, depending specifically on their impact on NMD.

    American journal of human genetics 2010;87;2;189-98

  • Butyrate greatly enhances derivation of human induced pluripotent stem cells by promoting epigenetic remodeling and the expression of pluripotency-associated genes.

    Mali P, Chou BK, Yen J, Ye Z, Zou J, Dowey S, Brodsky RA, Ohm JE, Yu W, Baylin SB, Yusa K, Bradley A, Meyers DJ, Mukherjee C, Cole PA and Cheng L

    Stem Cell Program, Institute for Cell Engineering, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA.

    We report here that butyrate, a naturally occurring fatty acid commonly used as a nutritional supplement and differentiation agent, greatly enhances the efficiency of induced pluripotent stem (iPS) cell derivation from human adult or fetal fibroblasts. After transient butyrate treatment, the iPS cell derivation efficiency is enhanced by 15- to 51-fold using either retroviral or piggyBac transposon vectors expressing 4 to 5 reprogramming genes. Butyrate stimulation is more remarkable (>100- to 200-fold) on reprogramming in the absence of either KLF4 or MYC transgene. Butyrate treatment did not negatively affect properties of iPS cell lines established by either 3 or 4 retroviral vectors or a single piggyBac DNA transposon vector. These characterized iPS cell lines, including those derived from an adult patient with sickle cell disease by either the piggyBac or retroviral vectors, show normal karyotypes and pluripotency. To gain insights into the underlying mechanisms of butyrate stimulation, we conducted genome-wide gene expression and promoter DNA methylation microarrays and other epigenetic analyses on established iPS cells and cells from intermediate stages of the reprogramming process. By days 6 to 12 during reprogramming, butyrate treatment enhanced histone H3 acetylation, promoter DNA demethylation, and the expression of endogenous pluripotency-associated genes, including DPPA2, whose overexpression partially substitutes for butyrate stimulation. Thus, butyrate as a cell permeable small molecule provides a simple tool to further investigate molecular mechanisms of cellular reprogramming. Moreover, butyrate stimulation provides an efficient method for reprogramming various human adult somatic cells, including cells from patients that are more refractory to reprogramming.

    Funded by: NHLBI NIH HHS: R01 HL073781-06, RC2 HL101582, RC2 HL101582-01; NIGMS NIH HHS: GM62437; Wellcome Trust

    Stem cells (Dayton, Ohio) 2010;28;4;713-20

  • FRT-seq: amplification-free, strand-specific transcriptome sequencing.

    Mamanova L, Andrews RM, James KD, Sheridan EM, Ellis PD, Langford CF, Ost TW, Collins JE and Turner DJ

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    We report an alternative approach to transcriptome sequencing for the Illumina Genome Analyzer, in which the reverse transcription reaction takes place on the flowcell. No amplification is performed during the library preparation, so PCR biases and duplicates are avoided, and because the template is poly(A)(+) RNA rather than cDNA, the resulting sequences are necessarily strand-specific. The method is compatible with paired- or single-end sequencing.

    Funded by: Wellcome Trust: 079643, WT079643

    Nature methods 2010;7;2;130-2

  • Target-enrichment strategies for next-generation sequencing.

    Mamanova L, Coffey AJ, Scott CE, Kozarewa I, Turner EH, Kumar A, Howard E, Shendure J and Turner DJ

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    We have not yet reached a point at which routine sequencing of large numbers of whole eukaryotic genomes is feasible, and so it is often necessary to select genomic regions of interest and to enrich these regions before sequencing. There are several enrichment approaches, each with unique advantages and disadvantages. Here we describe our experiences with the leading target-enrichment technologies, the optimizations that we have performed and typical results that can be obtained using each. We also provide detailed protocols for each technology so that end users can find the best compromise between sensitivity, specificity and uniformity for their particular project.

    Funded by: NHGRI NIH HHS: 5R21HG004749; NHLBI NIH HHS: 5R01HL094976; Wellcome Trust: WT079643

    Nature methods 2010;7;2;111-8

  • Construction of a large extracellular protein interaction network and its resolution by spatiotemporal expression profiling.

    Martin S, Söllner C, Charoensawan V, Adryan B, Thisse B, Thisse C, Teichmann S and Wright GJ

    Cell Surface Signalling Laboratory, Wellcome Trust Sanger Institute, Cambridge CB10 1HH, United Kingdom.

    Extracellular interactions involving both secreted and membrane-tethered receptor proteins are essential to initiate signaling pathways that orchestrate cellular behaviors within biological systems. Because of the biochemical properties of these proteins and their interactions, identifying novel extracellular interactions remains experimentally challenging. To address this, we have recently developed an assay, AVEXIS (avidity-based extracellular interaction screen) to detect low affinity extracellular interactions on a large scale and have begun to construct interaction networks between zebrafish receptors belonging to the immunoglobulin and leucine-rich repeat protein families to identify novel signaling pathways important for early development. Here, we expanded our zebrafish protein library to include other domain families and many more secreted proteins and performed our largest screen to date totaling 16,544 potential unique interactions. We report 111 interactions of which 96 are novel and include the first documented extracellular ligands for 15 proteins. By including 77 interactions from previous screens, we assembled an expanded network of 188 extracellular interactions between 92 proteins and used it to show that secreted proteins have twice as many interaction partners as membrane-tethered receptors and that the connectivity of the extracellular network behaves as a power law. To try to understand the functional role of these interactions, we determined new expression patterns for 164 genes within our clone library by using whole embryo in situ hybridization at five key stages of zebrafish embryonic development. These expression data were integrated with the binding network to reveal where each interaction was likely to function within the embryo and were used to resolve the static interaction network into dynamic tissue- and stage-specific subnetworks within the developing zebrafish embryo. All these data were organized into a freely accessible on-line database called ARNIE (AVEXIS Receptor Network with Integrated Expression) and provide a valuable resource of new extracellular signaling interactions for developmental biology.

    Funded by: Medical Research Council; Wellcome Trust: 077108/Z/05/Z

    Molecular & cellular proteomics : MCP 2010;9;12;2654-65

  • Review of genetic factors in intestinal malrotation.

    Martin V and Shaw-Smith C

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Intestinal malrotation is well covered in the surgical literature from the point of view of operative management, but few reviews to date have attempted to provide a comprehensive examination of the topic from the point of view of aetiology, in particular genetic aetiology. Following a brief overview of the molecular embryology of midgut rotation, we present in this article instances, case reports and case series of intestinal malrotation in which a genetic aetiology is likely. Autosomal dominant, autosomal recessive, X-linked and chromosomal forms of the disorder are represented. Most occur in syndromic form, that is to say, in association with other malformations. In many instances, recognition of a specific syndrome is possible, one of several examples discussed being the recently described association of intestinal malrotation with alveolar capillary dysplasia, due to mutations in the forkhead box transcription factor FOXF1. New advances in sequencing technology mean that the identification of the genes mutated in these disorders is more accessible than ever, and paediatric surgeons are encouraged to refer to their colleagues in clinical genetics where a genetic aetiology seems likely.

    Funded by: Wellcome Trust

    Pediatric surgery international 2010;26;8;769-81

  • Genome-wide association study identifies two novel regions at 11p15.5-p13 and 1p31 with major impact on acute-phase serum amyloid A.

    Marzi C, Albrecht E, Hysi PG, Lagou V, Waldenberger M, Tönjes A, Prokopenko I, Heim K, Blackburn H, Ried JS, Kleber ME, Mangino M, Thorand B, Peters A, Hammond CJ, Grallert H, Boehm BO, Kovacs P, Geistlinger L, Prokisch H, Winkelmann BR, Spector TD, Wichmann HE, Stumvoll M, Soranzo N, März W, Koenig W, Illig T and Gieger C

    Institute of Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany.

    Elevated levels of acute-phase serum amyloid A (A-SAA) cause amyloidosis and are a risk factor for atherosclerosis and its clinical complications, type 2 diabetes, as well as various malignancies. To investigate the genetic basis of A-SAA levels, we conducted the first genome-wide association study on baseline A-SAA concentrations in three population-based studies (KORA, TwinsUK, Sorbs) and one prospective case cohort study (LURIC), including a total of 4,212 participants of European descent, and identified two novel genetic susceptibility regions at 11p15.5-p13 and 1p31. The region at 11p15.5-p13 (rs4150642; p = 3.20×10(-111)) contains serum amyloid A1 (SAA1) and the adjacent general transcription factor 2 H1 (GTF2H1), Hermansky-Pudlak Syndrome 5 (HPS5), lactate dehydrogenase A (LDHA), and lactate dehydrogenase C (LDHC). This region explains 10.84% of the total variation of A-SAA levels in our data, which makes up 18.37% of the total estimated heritability. The second region encloses the leptin receptor (LEPR) gene at 1p31 (rs12753193; p = 1.22×10(-11)) and has been found to be associated with CRP and fibrinogen in previous studies. Our findings demonstrate a key role of the 11p15.5-p13 region in the regulation of baseline A-SAA levels and provide confirmative evidence of the importance of the 1p31 region for inflammatory processes and the close interplay between A-SAA, leptin, and other acute-phase proteins.

    PLoS genetics 2010;6;11;e1001213

  • Ten years experience of Salmonella infections in Cambridge, UK.

    Matheson N, Kingsley RA, Sturgess K, Aliyu SH, Wain J, Dougan G and Cooke FJ

    Clinical Microbiology and Public Health Laboratory, Health Protection Agency, Box 236, Addenbrooke's Hospital, Hills Road, Cambridge CB2 0QW, UK.

    Objectives: Review of all Salmonella infections diagnosed in the Cambridge area over 10 years.

    Methods: All Salmonella enterica isolated in the Clinical Microbiology Laboratory, Addenbrooke's Hospital between 1.1.1999 and 31.12.2008 were included. Patient demographics, serotype and additional relevant details (travel history, resistance-type, phage-type) were recorded.

    Results: 1003 episodes of Salmonella gastroenteritis were confirmed by stool culture, representing 88 serotypes. Serotypes Enteritidis (59%), Typhimurium (4.7%), Virchow (2.6%), Newport (1.8%) and Braenderup (1.7%) were the 5 most common isolates. There were an additional 37 invasive Salmonella infections (32 blood cultures, 4 tissue samples, 1 CSF). 13/15 patients with Salmonella Typhi or Salmonella Paratyphi isolated from blood or faeces with an available travel history had returned from the Indian subcontinent. 8/10 S. Typhi or Paratyphi isolates tested had reduced susceptibility to fluoroquinolones (MIC ≥ 0.125 mg/L). 7/21 patients with non-typhoidal Salmonella bacteraemia were known to be immunosuppressed.

    Conclusion: This study describes Salmonella serotypes circulating within a defined geographical area over a decade. Prospective molecular analysis of isolates of S. enterica by multi-locus sequence typing (MLST) and single nucleotide polymorphism (SNP) detection will determine the geo-phylogenetic relationship of isolates within our region.

    Funded by: Wellcome Trust

    The Journal of infection 2010;60;1;21-5

  • Novel candidate cancer genes identified by a large-scale cross-species comparative oncogenomics approach.

    Mattison J, Kool J, Uren AG, de Ridder J, Wessels L, Jonkers J, Bignell GR, Butler A, Rust AG, Brosch M, Wilson CH, van der Weyden L, Largaespada DA, Stratton MR, Futreal PA, van Lohuizen M, Berns A, Collier LS, Hubbard T and Adams DJ

    The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, United Kingdom.

    Comparative genomic hybridization (CGH) can reveal important disease genes but the large regions identified could sometimes contain hundreds of genes. Here we combine high-resolution CGH analysis of 598 human cancer cell lines with insertion sites isolated from 1,005 mouse tumors induced with the murine leukemia virus (MuLV). This cross-species oncogenomic analysis revealed candidate tumor suppressor genes and oncogenes mutated in both human and mouse tumors, making them strong candidates for novel cancer genes. A significant number of these genes contained binding sites for the stem cell transcription factors Oct4 and Nanog. Notably, mice carrying tumors with insertions in or near stem cell module genes, which are thought to participate in cell self-renewal, died significantly faster than mice without these insertions. A comparison of the profile we identified to that induced with the Sleeping Beauty (SB) transposon system revealed significant differences in the profile of recurrently mutated genes. Collectively, this work provides a rich catalogue of new candidate cancer genes for functional analysis.

    Funded by: Cancer Research UK: A6997, A8784; NCI NIH HHS: K01 CA122183-04, K01CA122183, R01 CA113636-01A1, R01 CA113636-02, R01 CA113636-03, R01 CA113636-04, R01 CA113636-05, R01 CA134759-01A1, R01 CA134759-02, R01 CA134759-03, R01 CA134759-04; Wellcome Trust: 077198, 082356

    Cancer research 2010;70;3;883-95

  • Notch2 is required for progression of pancreatic intraepithelial neoplasia and development of pancreatic ductal adenocarcinoma.

    Mazur PK, Einwächter H, Lee M, Sipos B, Nakhai H, Rad R, Zimber-Strobl U, Strobl LJ, Radtke F, Klöppel G, Schmid RM and Siveke JT

    Second Department of Internal Medicine and Institute of Pathology, Technical University of Munich, 81675 Munich, Germany.

    Pancreatic cancer is one of the most fatal malignancies lacking effective therapies. Notch signaling is a key regulator of cell fate specification and pancreatic cancer development; however, the role of individual Notch receptors and downstream signaling is largely unknown. Here, we show that Notch2 is predominantly expressed in ductal cells and pancreatic intraepithelial neoplasia (PanIN) lesions. Using genetically engineered mice, we demonstrate the effect of conditional Notch receptor ablation in KrasG12D-driven pancreatic carcinogenesis. Deficiency of Notch2 but not Notch1 stops PanIN progression, prolongs survival, and leads to a phenotypical switch toward anaplastic pancreatic cancer with epithelial-mesenchymal transition. By expression profiling, we identified increased Myc signaling regulated by Notch2 during tumor development, placing Notch2 as a central regulator of PanIN progression and malignant transformation. Our study supports the concept of distinctive roles of individual Notch receptors in cancer development.

    Proceedings of the National Academy of Sciences of the United States of America 2010;107;30;13438-43

  • Use of cancer-specific genomic rearrangements to quantify disease burden in plasma from patients with solid tumors.

    McBride DJ, Orpana AK, Sotiriou C, Joensuu H, Stephens PJ, Mudie LJ, Hämäläinen E, Stebbings LA, Andersson LC, Flanagan AM, Durbecq V, Ignatiadis M, Kallioniemi O, Heckman CA, Alitalo K, Edgren H, Futreal PA, Stratton MR and Campbell PJ

    Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK.

    Detection of recurrent somatic rearrangements routinely allows monitoring of residual disease burden in leukemias, but is not used for most solid tumors. However, next-generation sequencing now allows rapid identification of patient-specific rearrangements in solid tumors. We mapped genomic rearrangements in three cancers and showed that PCR assays for rearrangements could detect a single copy of the tumor genome in plasma without false positives. Disease status, drug responsiveness, and incipient relapse could be serially assessed. In future, this strategy could be readily established in diagnostic laboratories, with major impact on monitoring of disease status and personalizing treatment of solid tumors.

    Funded by: Wellcome Trust: 077012/Z/05/Z, 088340, 088340/Z/09/Z, 093867

    Genes, chromosomes & cancer 2010;49;11;1062-9

  • Regulation of the Epstein-Barr virus Zp promoter in B lymphocytes during reactivation from latency.

    McDonald C, Karstegl CE, Kellam P and Farrell PJ

    Department of Virology, Imperial College Faculty of Medicine, St Mary's Campus, London W2 1PG, UK.

    Ten novel mutations were introduced into the Zp promoter to test the role of sequences outside the established transcription factor-binding sites in Epstein-Barr virus (EBV) reactivation. Most of these had only small effects, but mutations in the ZID site were shown to reduce Zp activity strongly at early times after induction by anti-immunoglobulin (anti-Ig). The binding of MEF2 transcription factor to ZID was characterized in detail and linked functionally to Zp promoter activity. The presence of XBP-1s, the active form of XBP-1, after administration of anti-Ig to Akata Burkitt's lymphoma cells is consistent with a role for this factor in reactivation of the EBV lytic cycle, although signalling through MEF2D was quantitatively much more significant in activation of Zp. Silencing of Zp during latency is thought to be primarily a consequence of a repressive chromatin structure on Zp, and this aspect of Zp regulation can be observed in the Akata genome through protection of Zp from activation by BZLF1 in the absence of signalling from the B-cell receptor.

    The Journal of general virology 2010;91;Pt 3;622-9

  • Visualizing chromosome mosaicism and detecting ethnic outliers by the method of "rare" heterozygotes and homozygotes (RHH).

    McGinnis RE, Deloukas P, McLaren WM and Inouye M

    Wellcome Trust Sanger Institute, Cambridge, UK.

    We describe a novel approach for evaluating SNP genotypes of a genome-wide association scan to identify "ethnic outlier" subjects whose ethnicity is different or admixed compared to most other subjects in the genotyped sample set. Each ethnic outlier is detected by counting a genomic excess of "rare" heterozygotes and/or homozygotes whose frequencies are low (<1%) within genotypes of the sample set being evaluated. This method also enables simple and striking visualization of non-Caucasian chromosomal DNA segments interspersed within the chromosomes of ethnically admixed individuals. We show that this visualization of the mosaic structure of admixed human chromosomes gives results similar to another visualization method (SABER) but with much less computational time and burden. We also show that other methods for detecting ethnic outliers are enhanced by evaluating only genomic regions of visualized admixture rather than diluting outlier ancestry by evaluating the entire genome considered in aggregate. We have validated our method in the Wellcome Trust Case Control Consortium (WTCCC) study of 17,000 subjects as well as in HapMap subjects and simulated outliers of known ethnicity and admixture. The method's ability to precisely delineate chromosomal segments of non-Caucasian ethnicity has enabled us to demonstrate previously unreported non-Caucasian admixture in two HapMap Caucasian parents and in a number of WTCCC subjects. Its sensitive detection of ethnic outliers and simple visual discrimination of discrete chromosomal segments of different ethnicity implies that this method of rare heterozygotes and homozygotes (RHH) is likely to have diverse and important applications in humans and other species.

    Funded by: Medical Research Council: G0000934; Wellcome Trust: 068545/Z/02, 076113

    Human molecular genetics 2010;19;13;2539-53

  • Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor.

    McLaren W, Pritchard B, Rios D, Chen Y, Flicek P and Cunningham F

    European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    SUMMARY: A tool to predict the effect that newly discovered genomic variants have on known transcripts is indispensable in prioritizing and categorizing such variants. In Ensembl, a web-based tool (the SNP Effect Predictor) and API can now functionally annotate variants in all Ensembl and Ensembl Genomes supported species. AVAILABILITY: The Ensembl SNP Effect Predictor can be accessed via the Ensembl website. The Ensembl API is open source software, with installation instructions available online.

    Funded by: Wellcome Trust

    Bioinformatics (Oxford, England) 2010;26;16;2069-70

  • Hypomorphic temperature-sensitive alleles of NSDHL cause CK syndrome.

    McLarren KW, Severson TM, du Souich C, Stockton DW, Kratz LE, Cunningham D, Hendson G, Morin RD, Wu D, Paul JE, An J, Nelson TN, Chou A, DeBarber AE, Merkens LS, Michaud JL, Waters PJ, Yin J, McGillivray B, Demos M, Rouleau GA, Grzeschik KH, Smith R, Tarpey PS, Shears D, Schwartz CE, Gecz J, Stratton MR, Arbour L, Hurlburt J, Van Allen MI, Herman GE, Zhao Y, Moore R, Kelley RI, Jones SJ, Steiner RD, Raymond FL, Marra MA and Boerkoel CF

    Department of Medical Genetics, Vancouver, Canada.

    CK syndrome (CKS) is an X-linked recessive intellectual disability syndrome characterized by dysmorphism, cortical brain malformations, and an asthenic build. Through an X chromosome single-nucleotide variant scan in the first reported family, we identified linkage to a 5 Mb region on Xq28. Sequencing of this region detected a segregating 3 bp deletion (c.696_698del [p.Lys232del]) in exon 7 of NAD(P) dependent steroid dehydrogenase-like (NSDHL), a gene that encodes an enzyme in the cholesterol biosynthesis pathway. We also found that males with intellectual disability in another reported family with an NSDHL mutation (c.1098 dup [p.Arg367SerfsX33]) have CKS. These two mutations, which alter protein folding, show temperature-sensitive protein stability and complementation in Erg26-deficient yeast. As described for the allelic disorder CHILD syndrome, cells and cerebrospinal fluid from CKS patients have increased methyl sterol levels. We hypothesize that methyl sterol accumulation, not only cholesterol deficiency, causes CKS, given that cerebrospinal fluid cholesterol, plasma cholesterol, and plasma 24S-hydroxycholesterol levels are normal in males with CKS. In summary, CKS expands the spectrum of cholesterol-related disorders and offers insight into the role of cholesterol in human development.

    American journal of human genetics 2010;87;6;905-14

  • The bacteriology of pouchitis: a molecular phylogenetic analysis using 16S rRNA gene cloning and sequencing.

    McLaughlin SD, Walker AW, Churcher C, Clark SK, Tekkis PP, Johnson MW, Parkhill J, Ciclitira PJ, Dougan G, Nicholls RJ and Petrovska L

    Department of Biosurgery and Surgical Technology, St Mary's Hospital, Imperial College London, London, United Kingdom.

    Objective: To identify, compare, and contrast the microbiota in patients with and without pouchitis after restorative proctocolectomy (RPC) for ulcerative colitis (UC) and familial adenomatous polyposis (FAP).

    Background: Pouchitis is the most common complication following RPC. An abnormal host-microbial interaction has been implicated. We investigated the pouch microbiota in patients with and without pouchitis undergoing restorative proctocolectomy for UC and FAP.

    Methods: Mucosal pouch biopsies, taken from 16 UC (pouchitis 8) and 8 FAP (pouchitis 3) patients were analyzed to the species (or phylotype) level by cloning and sequencing of 3184 full-length bacterial 16S rRNA genes.

    Results: There was a significant increase in Proteobacteria (P = 0.019) and a significant decrease in Bacteroidetes (P = 0.001) and Faecalibacterium prausnitzii (P = 0.029) in the total UC compared with the total FAP cohort, but only limited differences were found between the UC nonpouchitis and pouchitis groups and the FAP pouchitis and nonpouchitis groups. Bacterial diversity in the FAP nonpouchitis group was significantly greater than in UC nonpouchitis (P = 0.019) and significantly greater in UC nonpouchitis compared with UC pouchitis (P = 0.009). No individual species or phylotype specifically associated with either UC or FAP pouchitis was found.

    Conclusions: UC pouch patients have a different, less diverse, gut microbiota than FAP patients. A further reduction in bacterial diversity but no significant dysbiosis occurs in those with pouchitis. The study suggests that a dysbiosis occurs in the ileal pouch of UC RPC patients which predisposes to, but may not directly cause, pouchitis.

    Funded by: Wellcome Trust: 076964

    Annals of surgery 2010;252;1;90-8

  • A variant in LIN28B is associated with 2D:4D finger-length ratio, a putative retrospective biomarker of prenatal testosterone exposure.

    Medland SE, Zayats T, Glaser B, Nyholt DR, Gordon SD, Wright MJ, Montgomery GW, Campbell MJ, Henders AK, Timpson NJ, Peltonen L, Wolke D, Ring SM, Deloukas P, Martin NG, Smith GD and Evans DM

    Genetic Epidemiology, Queensland Institute of Medical Research, Australia.

    The ratio of the lengths of an individual's second to fourth digit (2D:4D) is commonly used as a noninvasive retrospective biomarker for prenatal androgen exposure. In order to identify the genetic determinants of 2D:4D, we applied a genome-wide association approach to 1507 11-year-old children from the Avon Longitudinal Study of Parents and Children (ALSPAC) in whom 2D:4D ratio had been measured, as well as a sample of 1382 12- to 16-year-olds from the Brisbane Adolescent Twin Study. A meta-analysis of the two scans identified a single variant in the LIN28B gene that was strongly associated with 2D:4D (rs314277: p = 4.1 x 10(-8)) and was subsequently independently replicated in an additional 3659 children from the ALSPAC cohort (p = 1.53 x 10(-6)). The minor allele of the rs314277 variant has previously been linked to increased height and delayed age at menarche, but in our study it was associated with increased 2D:4D in the direction opposite to that of previous reports on the correlation between 2D:4D and age at menarche. Our findings call into question the validity of 2D:4D as a simplistic retrospective biomarker for prenatal testosterone exposure.

    Funded by: Medical Research Council: 90600705, G0800582; Wellcome Trust

    American journal of human genetics 2010;86;4;519-25

  • Genome-wide association studies of serum magnesium, potassium, and sodium concentrations identify six loci influencing serum magnesium levels.

    Meyer TE, Verwoert GC, Hwang SJ, Glazer NL, Smith AV, van Rooij FJ, Ehret GB, Boerwinkle E, Felix JF, Leak TS, Harris TB, Yang Q, Dehghan A, Aspelund T, Katz R, Homuth G, Kocher T, Rettig R, Ried JS, Gieger C, Prucha H, Pfeufer A, Meitinger T, Coresh J, Hofman A, Sarnak MJ, Chen YD, Uitterlinden AG, Chakravarti A, Psaty BM, van Duijn CM, Kao WH, Witteman JC, Gudnason V, Siscovick DS, Fox CS, Köttgen A, Genetic Factors for Osteoporosis Consortium and Meta Analysis of Glucose and Insulin Related Traits Consortium

    Human Genetics Center and Division of Epidemiology, The University of Texas Health Science Center at Houston, School of Public Health, Houston, Texas, USA.

    Magnesium, potassium, and sodium, cations commonly measured in serum, are involved in many physiological processes including energy metabolism, nerve and muscle function, signal transduction, and fluid and blood pressure regulation. To evaluate the contribution of common genetic variation to normal physiologic variation in serum concentrations of these cations, we conducted genome-wide association studies of serum magnesium, potassium, and sodium concentrations using approximately 2.5 million genotyped and imputed common single nucleotide polymorphisms (SNPs) in 15,366 participants of European descent from the international CHARGE Consortium. Study-specific results were combined using fixed-effects inverse-variance weighted meta-analysis. SNPs demonstrating genome-wide significant (p<5 x 10(-8)) or suggestive associations (p<4 x 10(-7)) were evaluated for replication in an additional 8,463 subjects of European descent. The association of common variants at six genomic regions (in or near MUC1, ATP2B1, DCDC5, TRPM6, SHROOM3, and MDS1) with serum magnesium levels was genome-wide significant when meta-analyzed with the replication dataset. All initially significant SNPs from the CHARGE Consortium showed nominal association with clinically defined hypomagnesemia, two showed association with kidney function, two with bone mineral density, and one of these also associated with fasting glucose levels. Common variants in CNNM2, a magnesium transporter studied only in model systems to date, as well as in CNNM3 and CNNM4, were also associated with magnesium concentrations in this study. We observed no associations with serum sodium or potassium levels exceeding p<4 x 10(-7). Follow-up studies of newly implicated genomic loci may provide additional insights into the regulation and homeostasis of human serum magnesium levels.

    Funded by: NCRR NIH HHS: M01-RR00425, UL1RR025005; NHGRI NIH HHS: U01HG004402; NHLBI NIH HHS: N01 HC-15103, N01 HC-55222, N01-HC-25195, N01-HC-35129, N01-HC-45133, N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022, N01-HC-75150, N01-HC-85079, N01-HC-85080, N01-HC-85081, N01-HC-85082, N01-HC-85083, N01-HC-85084, N01-HC-85085, N01-HC-85086, N02-HL-6-4278, R01 HL087652, R01HL087641, U01 HL080295; NIA NIH HHS: N01-AG-12100; NIDDK NIH HHS: DK063491

    PLoS genetics 2010;6;8

  • Consensus statement: chromosomal microarray is a first-tier clinical diagnostic test for individuals with developmental disabilities or congenital anomalies.

    Miller DT, Adam MP, Aradhya S, Biesecker LG, Brothman AR, Carter NP, Church DM, Crolla JA, Eichler EE, Epstein CJ, Faucett WA, Feuk L, Friedman JM, Hamosh A, Jackson L, Kaminsky EB, Kok K, Krantz ID, Kuhn RM, Lee C, Ostell JM, Rosenberg C, Scherer SW, Spinner NB, Stavropoulos DJ, Tepperberg JH, Thorland EC, Vermeesch JR, Waggoner DJ, Watson MS, Martin CL and Ledbetter DH

    Division of Genetics and Department of Laboratory Medicine, Children's Hospital Boston and Harvard Medical School, Boston, MA, USA.

    Chromosomal microarray (CMA) is increasingly utilized for genetic testing of individuals with unexplained developmental delay/intellectual disability (DD/ID), autism spectrum disorders (ASD), or multiple congenital anomalies (MCA). Performing CMA and G-banded karyotyping on every patient substantially increases the total cost of genetic testing. The International Standard Cytogenomic Array (ISCA) Consortium held two international workshops and conducted a literature review of 33 studies, including 21,698 patients tested by CMA. We provide an evidence-based summary of clinical cytogenetic testing comparing CMA to G-banded karyotyping with respect to technical advantages and limitations, diagnostic yield for various types of chromosomal aberrations, and issues that affect test interpretation. CMA offers a much higher diagnostic yield (15%-20%) for genetic testing of individuals with unexplained DD/ID, ASD, or MCA than a G-banded karyotype (approximately 3%, excluding Down syndrome and other recognizable chromosomal syndromes), primarily because of its higher sensitivity for submicroscopic deletions and duplications. Truly balanced rearrangements and low-level mosaicism are generally not detectable by arrays, but these are relatively infrequent causes of abnormal phenotypes in this population (<1%). Available evidence strongly supports the use of CMA in place of G-banded karyotyping as the first-tier cytogenetic diagnostic test for patients with DD/ID, ASD, or MCA. G-banded karyotype analysis should be reserved for patients with obvious chromosomal syndromes (e.g., Down syndrome), a family history of chromosomal rearrangement, or a history of multiple miscarriages.

    Funded by: Howard Hughes Medical Institute; NICHD NIH HHS: RC2 HD064525; NIMH NIH HHS: MH074090

    American journal of human genetics 2010;86;5;749-64

  • Annotating the regulatory genome.

    Montgomery SB, Kasaian K, Jones SJ and Griffith OL

    Wellcome Trust Sanger Institute, Cambridge, UK.

    Determining the timing and molecular repertoire responsible for gene expression is fundamental to understanding a gene's function. Heritable differences in this character are increasingly regarded as explanatory for complex and common traits. For many known trait-predisposing genes, studies have sought to elucidate the associated logic behind gene regulation. However, there exist many challenges in deciphering these mechanisms. Among them, it is recognized that we have limited understanding of regulatory complexity, the current models of gene regulation have low specificity and any gene's regulatory logic is dependent on biological context. Addressing these limitations and defining the regulatory genome is an ongoing challenge for molecular biology. We discuss current efforts to define and annotate the regulatory genome by focusing on curation and text-mining activities. We further highlight the type of information and curation process for describing regulatory elements within the ORegAnno database and how the general standards for such information are changing.

    Methods in molecular biology (Clifton, N.J.) 2010;674;313-49

  • Transcriptome genetics using second generation sequencing in a Caucasian population.

    Montgomery SB, Sammeth M, Gutierrez-Arcelus M, Lach RP, Ingle C, Nisbett J, Guigo R and Dermitzakis ET

    Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, 1211 Switzerland.

    Gene expression is an important phenotype that informs about genetic and environmental effects on cellular state. Many studies have previously identified genetic variants for gene expression phenotypes using custom and commercially available microarrays. Second generation sequencing technologies are now providing unprecedented access to the fine structure of the transcriptome. We have sequenced the mRNA fraction of the transcriptome in 60 extended HapMap individuals of European descent and have combined these data with genetic variants from the HapMap3 project. We have quantified exon abundance based on read depth and have also developed methods to quantify whole transcript abundance. We have found that approximately 10 million reads of sequencing can provide access to the same dynamic range as arrays with better quantification of alternative and highly abundant transcripts. Correlation with SNPs (single nucleotide polymorphisms) leads to a larger discovery of eQTLs (expression quantitative trait loci) than with arrays. We also detect a substantial number of variants that influence the structure of mature transcripts indicating variants responsible for alternative splicing. Finally, measures of allele-specific expression allowed the identification of rare eQTLs and allelic differences in transcript structure. This analysis shows that high throughput sequencing technologies reveal new properties of genetic effects on the transcriptome and allow the exploration of genetic effects in cellular processes.

    Funded by: Wellcome Trust: 077046

    Nature 2010;464;7289;773-7

  • Yersinia pestis genome sequencing identifies patterns of global phylogenetic diversity.

    Morelli G, Song Y, Mazzoni CJ, Eppinger M, Roumagnac P, Wagner DM, Feldkamp M, Kusecek B, Vogler AJ, Li Y, Cui Y, Thomson NR, Jombart T, Leblois R, Lichtner P, Rahalison L, Petersen JM, Balloux F, Keim P, Wirth T, Ravel J, Yang R, Carniel E and Achtman M

    Max-Planck-Institut für Infektionsbiologie, Department of Molecular Biology, Berlin, Germany.

    Plague is a pandemic human invasive disease caused by the bacterial agent Yersinia pestis. We here report a comparison of 17 whole genomes of Y. pestis isolates from global sources. We also screened a global collection of 286 Y. pestis isolates for 933 SNPs using Sequenom MassArray SNP typing. We conducted phylogenetic analyses on this sequence variation dataset, assigned isolates to populations based on maximum parsimony and, from these results, made inferences regarding historical transmission routes. Our phylogenetic analysis suggests that Y. pestis evolved in or near China and spread through multiple radiations to Europe, South America, Africa and Southeast Asia, leading to country-specific lineages that can be traced by lineage-specific SNPs. All 626 current isolates from the United States reflect one radiation, and 82 isolates from Madagascar represent a second radiation. Subsequent local microevolution of Y. pestis is marked by sequential, geographically specific SNPs.

    Funded by: NIAID NIH HHS: AI065359, N01 AI-30071; Science Foundation Ireland: 05/FE1/B882; Wellcome Trust

    Nature genetics 2010;42;12;1140-3

  • EuroPhenome: a repository for high-throughput mouse phenotyping data.

    Morgan H, Beck T, Blake A, Gates H, Adams N, Debouzy G, Leblanc S, Lengger C, Maier H, Melvin D, Meziane H, Richardson D, Wells S, White J, Wood J, EUMODIC Consortium, de Angelis MH, Brown SD, Hancock JM and Mallon AM

    MRC Harwell, Mammalian Genetics Unit, MRC Harwell, Mary Lyon Centre, Harwell Science and Innovation Campus, Oxfordshire OX11 0RD, UK.

    The broad aim of biomedical science in the postgenomic era is to link genomic and phenotype information to allow deeper understanding of the processes leading from genomic changes to altered phenotype and disease. The EuroPhenome project is a comprehensive resource for raw and annotated high-throughput phenotyping data arising from projects such as EUMODIC. EUMODIC is gathering data from the EMPReSSslim pipeline, which is performed on inbred mouse strains and knock-out lines arising from the EUCOMM project. The EuroPhenome interface allows the user to access the data via the phenotype or genotype. It also allows the user to access the data in a variety of ways, including graphical display, statistical analysis and access to the raw data via web services. The raw phenotyping data captured in EuroPhenome is annotated by an annotation pipeline which automatically identifies statistically different mutants from the appropriate baseline and assigns ontology terms for that specific test. Mutant phenotypes can be quickly identified using two EuroPhenome tools: PhenoMap, a graphical representation of statistically relevant phenotypes, and mining for a mutant using ontology terms. To assist with data definition and cross-database comparisons, phenotype data is annotated using combinations of terms from biological ontologies.

    Funded by: Medical Research Council

    Nucleic acids research 2010;38;Database issue;D577-85

  • Mutations in SLC29A3, encoding an equilibrative nucleoside transporter ENT3, cause a familial histiocytosis syndrome (Faisalabad histiocytosis) and familial Rosai-Dorfman disease.

    Morgan NV, Morris MR, Cangul H, Gleeson D, Straatman-Iwanowska A, Davies N, Keenan S, Pasha S, Rahman F, Gentle D, Vreeswijk MP, Devilee P, Knowles MA, Ceylaner S, Trembath RC, Dalence C, Kismet E, Köseoğlu V, Rossbach HC, Gissen P, Tannahill D and Maher ER

    Wellchild Paediatric Research Centre and Department of Medical and Molecular Genetics, University of Birmingham College of Medical and Dental Sciences, Edgbaston, Birmingham, United Kingdom.

    The histiocytoses are a heterogeneous group of disorders characterised by an excessive number of histiocytes. In most cases the pathophysiology is unclear and treatment is nonspecific. Faisalabad histiocytosis (FHC) (MIM 602782) has been classed as an autosomal recessively inherited form of histiocytosis with similarities to Rosai-Dorfman disease (RDD) (also known as sinus histiocytosis with massive lymphadenopathy (SHML)). To elucidate the molecular basis of FHC, we performed autozygosity mapping studies in a large consanguineous family and identified a novel locus at chromosome 10q22.1. Mutation analysis of candidate genes within the target interval identified biallelic germline mutations in SLC29A3 in the FHC kindred and in two families reported to have familial RDD. Analysis of SLC29A3 expression during mouse embryogenesis revealed widespread expression by e14.5 with prominent expression in the central nervous system, eye, inner ear, and epithelial tissues including the gastrointestinal tract. SLC29A3 encodes an intracellular equilibrative nucleoside transporter (hENT3) with affinity for adenosine. Recently germline mutations in SLC29A3 were also described in two rare autosomal recessive disorders with overlapping phenotypes: (a) H syndrome (MIM 612391) that is characterised by cutaneous hyperpigmentation and hypertrichosis, hepatomegaly, heart anomalies, hearing loss, and hypogonadism; and (b) PHID (pigmented hypertrichosis with insulin-dependent diabetes mellitus) syndrome. Our findings suggest that a variety of clinical diagnoses (H and PHID syndromes, FHC, and familial RDD) can be included in a new diagnostic category of SLC29A3 spectrum disorder.

    Funded by: Cancer Research UK; Wellcome Trust

    PLoS genetics 2010;6;2;e1000833

  • Identification of protective and broadly conserved vaccine antigens from the genome of extraintestinal pathogenic Escherichia coli.

    Moriel DG, Bertoldi I, Spagnuolo A, Marchi S, Rosini R, Nesta B, Pastorello I, Corea VA, Torricelli G, Cartocci E, Savino S, Scarselli M, Dobrindt U, Hacker J, Tettelin H, Tallon LJ, Sullivan S, Wieler LH, Ewers C, Pickard D, Dougan G, Fontana MR, Rappuoli R, Pizza M and Serino L

    Novartis Vaccines and Diagnostics, 53100 Siena, Italy.

    Extraintestinal pathogenic Escherichia coli (ExPEC) are a common cause of disease in both mammals and birds. A vaccine to prevent such infections would be desirable given the increasing antibiotic resistance of these bacteria. We have determined the genome sequence of ExPEC IHE3034 (ST95) isolated from a case of neonatal meningitis and compared this to available genome sequences of other ExPEC strains and a few nonpathogenic E. coli. We found 19 genomic islands present in the genome of IHE3034, which are absent in the nonpathogenic E. coli isolates. By using subtractive reverse vaccinology we identified 230 antigens present in ExPEC but absent (or present with low similarity) in nonpathogenic strains. Nine antigens were protective in a mouse challenge model. Some of them were also present in other pathogenic non-ExPEC strains, suggesting that a broadly protective E. coli vaccine may be possible. The gene encoding the most protective antigen was detected in most of the E. coli isolates, highly conserved in sequence and found to be exported by a type II secretion system which seems to be nonfunctional in nonpathogenic strains.

    Proceedings of the National Academy of Sciences of the United States of America 2010;107;20;9072-7

  • Assessment of protein domain fusions in human protein interaction networks prediction: application to the human kinetochore model.

    Morilla I, Lees JG, Reid AJ, Orengo C and Ranea JA

    Department of Molecular Biology and Biochemistry, University of Malaga, Malaga, Spain.

    In order to understand how biological systems function it is necessary to determine the interactions and associations between proteins. Some proteins, involved in a common biological process and encoded by separate genes in one organism, can be found fused within a single protein chain in other organisms. By detecting these triplets, a functional relationship can be established between the unfused proteins. Here we use a domain fusion prediction method to predict these protein interactions for the human interactome. We observed that gene fusion events are more related to physical interaction between proteins than to other weaker functional relationships such as participation in a common biological pathway. These results suggest that domain fusion is an appropriate method for predicting protein complexes. The most reliable fused domain predictions were used to build protein-protein interaction (PPI) networks. These predicted PPI network models showed the same topological features as real biological networks and different features from random behaviour. We built the PPI domain fusion sub-network model of the human kinetochore and observed that the majority of the predicted interactions have not yet been experimentally characterised in the publicly available PPI repositories. The study of the human kinetochore domain fusion sub-network reveals undiscovered kinetochore proteins with presumably relevant functions, such as hubs with many connections in the kinetochore sub-network. These results suggest that experimentally hidden regions in the predicted PPI networks contain key functional elements, associated with important functional areas, still undiscovered in the human interactome. Until novel experiments shed light on these hidden regions, domain fusion predictions provide a valuable approach for exploring them.

    Funded by: Biotechnology and Biological Sciences Research Council

    New biotechnology 2010;27;6;755-65
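
    The domain fusion idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' method: the function name, the dictionary layout and the toy protein and domain identifiers are all hypothetical, and real pipelines add filters for promiscuous domains and homology that are omitted here. Two query proteins are linked when their (distinct) domains are found together in a single fused protein from another organism, giving one triplet per link.

```python
from itertools import combinations


def predict_fusion_links(query_proteins, fused_proteins):
    """Return (protein_a, protein_b, fused_protein) triplets.

    query_proteins: dict mapping protein id -> set of domain ids (organism of interest)
    fused_proteins: dict mapping protein id -> set of domain ids (other organisms)
    All identifiers are hypothetical placeholders.
    """
    triplets = set()
    for a, b in combinations(sorted(query_proteins), 2):
        dom_a, dom_b = query_proteins[a], query_proteins[b]
        if dom_a & dom_b:
            # Proteins sharing a domain are skipped: the match could reflect
            # homology rather than a fusion event.
            continue
        for fused_id in sorted(fused_proteins):
            fused_domains = fused_proteins[fused_id]
            if dom_a & fused_domains and dom_b & fused_domains:
                triplets.add((a, b, fused_id))
                break  # one supporting fused protein is enough for this sketch
    return triplets


# Toy data: P1 and P2 are separate proteins whose domains co-occur in F1.
query = {"P1": {"D1"}, "P2": {"D2"}, "P3": {"D3"}}
fused = {"F1": {"D1", "D2"}}
links = predict_fusion_links(query, fused)
print(links)
```

    In this toy input only P1 and P2 are linked, because D3 never appears in a fused protein.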

  • An evaluation of statistical approaches to rare variant analysis in genetic association studies.

    Morris AP and Zeggini E

    Wellcome Trust Centre for Human Genetics, University of Oxford, United Kingdom.

    Genome-wide association (GWA) studies have proved to be extremely successful in identifying novel common polymorphisms contributing effects to the genetic component underlying complex traits. Nevertheless, one source of, as yet, undiscovered genetic determinants of complex traits are those mediated through the effects of rare variants. With the increasing availability of large-scale re-sequencing data for rare variant discovery, we have developed a novel statistical method for the detection of complex trait associations with these loci, based on searching for accumulations of minor alleles within the same functional unit. We have undertaken simulations to evaluate strategies for the identification of rare variant associations in population-based genetic studies when data are available from re-sequencing discovery efforts or from commercially available GWA chips. Our results demonstrate that methods based on accumulations of rare variants discovered through re-sequencing offer substantially greater power than conventional analysis of GWA data, and thus provide an exciting opportunity for future discovery of genetic determinants of complex traits.

    Funded by: Wellcome Trust: 064890, 081682, WT081682/Z/06/Z, WT088885/Z/09/Z

    Genetic epidemiology 2010;34;2;188-93
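
    The phrase "accumulations of minor alleles within the same functional unit" in the abstract above can be made concrete with a small sketch. This is an illustrative burden calculation only, not the statistical method developed in the paper: the function names and the toy genotype matrix are hypothetical, and a real analysis would fit a regression model and assess significance rather than compare raw means.

```python
from statistics import mean


def minor_allele_burden(genotypes, rare_variant_idx):
    """Per-individual sum of minor-allele counts (0, 1 or 2) over the chosen variants."""
    return [sum(row[i] for i in rare_variant_idx) for row in genotypes]


def burden_difference(genotypes, is_case, rare_variant_idx):
    """Mean burden in cases minus mean burden in controls."""
    burden = minor_allele_burden(genotypes, rare_variant_idx)
    cases = [b for b, c in zip(burden, is_case) if c]
    controls = [b for b, c in zip(burden, is_case) if not c]
    return mean(cases) - mean(controls)


# Toy data: four individuals (rows) typed at three rare variants (columns)
# that are assumed to lie in the same functional unit, e.g. one gene.
genotypes = [
    [1, 0, 1],
    [0, 1, 0],
    [0, 0, 0],
    [0, 0, 1],
]
is_case = [True, True, False, False]
diff = burden_difference(genotypes, is_case, rare_variant_idx=[0, 1, 2])
print(diff)
```

    Pooling variants in this way trades per-variant resolution for power, which is the motivation the abstract gives for accumulation-based tests.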

  • A powerful approach to sub-phenotype analysis in population-based genetic association studies.

    Morris AP, Lindgren CM, Zeggini E, Timpson NJ, Frayling TM, Hattersley AT and McCarthy MI

    The Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford, United Kingdom.

    The ultimate goal of genome-wide association (GWA) studies is to identify genetic variants contributing effects to complex phenotypes in order to improve our understanding of the biological architecture underlying the trait. One approach to allow us to meet this challenge is to consider more refined sub-phenotypes of disease, defined by pattern of symptoms, for example, which may be physiologically distinct, and thus may have different underlying genetic causes. The disadvantage of sub-phenotype analysis is that large disease cohorts are sub-divided into smaller case categories, thus reducing power to detect association. To address this issue, we have developed a novel test of association within a multinomial regression modeling framework, allowing for heterogeneity of genetic effects between sub-phenotypes. The modeling framework is extremely flexible, and can be generalized to any number of distinct sub-phenotypes. Simulations demonstrate the power of the multinomial regression-based analysis over existing methods when genetic effects differ between sub-phenotypes, with minimal loss of power when these effects are homogenous for the unified phenotype. Application of the multinomial regression analysis to a genome-wide association study of type 2 diabetes, with cases categorized according to body mass index, highlights previously recognized differential mechanisms underlying obese and non-obese forms of the disease, and provides evidence of a potential novel association that warrants follow-up in independent replication cohorts.

    Funded by: Wellcome Trust: 076113, 081682, WT081682/Z/06/Z

    Genetic epidemiology 2010;34;4;335-43

  • Evoker: a visualization tool for genotype intensity data.

    Morris JA, Randall JC, Maller JB and Barrett JC

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK.

    Summary: Genome-wide association studies (GWAS), which produce huge volumes of data, are now being carried out by many groups around the world, creating a need for user-friendly tools for data quality control (QC) and analysis. One critical aspect of GWAS QC is evaluating genotype cluster plots to verify sensible genotype calling in putatively associated single nucleotide polymorphisms (SNPs). Evoker is a tool for visualizing genotype cluster plots, and provides a solution to the computational and storage problems related to working with such large datasets.


    Funded by: Wellcome Trust: 089120, WT08912/Z/09/Z

    Bioinformatics (Oxford, England) 2010;26;14;1786-7

  • C-kit gene mutations in adenoid cystic carcinoma are rare.

    Moskaluk CA, Frierson HF, El-Naggar AK and Futreal PA

    Modern pathology : an official journal of the United States and Canadian Academy of Pathology, Inc 2010;23;6;905-6; author reply 906-7

  • Methods for Improving Genome Annotation

    Mudge J and Harrow J

    Knowledge-Based Bioinformatics: From Analysis to Interpretation 2010;Chapter 9;209-32

  • Intra- and interhost evolutionary dynamics of equine influenza virus.

    Murcia PR, Baillie GJ, Daly J, Elton D, Jervis C, Mumford JA, Newton R, Parrish CR, Hoelzer K, Dougan G, Parkhill J, Lennard N, Ormond D, Moule S, Whitwham A, McCauley JW, McKinley TJ, Holmes EC, Grenfell BT and Wood JL

    Cambridge Infectious Diseases Consortium, Department of Veterinary Medicine, University of Cambridge, Madingley Road, CB3 0ES Cambridge, England, United Kingdom.

    Determining the evolutionary basis of cross-species transmission and immune evasion is key to understanding the mechanisms that control the emergence of either new viruses or novel antigenic variants with pandemic potential. The hemagglutinin glycoprotein of influenza A viruses is a critical host range determinant and a major target of neutralizing antibodies. Equine influenza virus (EIV) is a significant pathogen of the horse that causes periodical outbreaks of disease even in populations with high vaccination coverage. EIV has also jumped the species barrier and emerged as a novel respiratory pathogen in dogs, canine influenza virus. We studied the dynamics of equine influenza virus evolution in horses at the intrahost level and how this evolutionary process is affected by interhost transmission in a natural setting. To this end, we performed clonal sequencing of the hemagglutinin 1 gene derived from individual animals at different times postinfection. Our results show that despite the population consensus sequence remaining invariant, genetically distinct subpopulations persist during the course of infection and are also transmitted, with some variants likely to change antigenicity. We also detected a natural case of mixed infection in an animal infected during an outbreak of equine influenza, raising the possibility of reassortment between different strains of virus. In sum, our data suggest that transmission bottlenecks may not be as narrow as originally perceived and that the genetic diversity required to adapt to new host species may be partially present in the donor host and potentially transmitted to the recipient host.

    Funded by: NICHD NIH HHS: R24 HD047879; NIGMS NIH HHS: R01 GM080533, R01 GM083983-01, R01 GM083983-05; Wellcome Trust

    Journal of virology 2010;84;14;6943-54

  • The two most common histological subtypes of malignant germ cell tumour are distinguished by global microRNA profiles, associated with differential transcription factor expression.

    Murray MJ, Saini HK, van Dongen S, Palmer RD, Muralidhar B, Pett MR, Piipari M, Thornton CM, Nicholson JC, Enright AJ and Coleman N

    Medical Research Council Cancer Cell Unit, Cambridge, CB2 0XZ, UK.

    Background: We hypothesised that differences in microRNA expression profiles contribute to the contrasting natural history and clinical outcome of the two most common types of malignant germ cell tumour (GCT), yolk sac tumours (YSTs) and germinomas.

    Results: By direct comparison, using microarray data for paediatric GCT samples and published qRT-PCR data for adult samples, we identified microRNAs significantly up-regulated in YSTs (n = 29 paediatric, 26 adult, 11 overlapping) or germinomas (n = 37 paediatric). By Taqman qRT-PCR we confirmed differential expression of 15 of 16 selected microRNAs and further validated six of these (miR-302b, miR-375, miR-200b, miR-200c, miR-122, miR-205) in an independent sample set. Interestingly, the miR-302 cluster, which is over-expressed in all malignant GCTs, showed further over-expression in YSTs versus germinomas, representing six of the top eight microRNAs over-expressed in paediatric YSTs and seven of the top 11 in adult YSTs. To explain this observation, we used mRNA expression profiles of paediatric and adult malignant GCTs to identify 10 transcription factors (TFs) consistently over-expressed in YSTs versus germinomas, followed by linear regression to confirm associations between TF and miR-302 cluster expression levels. Using the sequence motif analysis environment iMotifs, we identified predicted binding sites for four of the 10 TFs (GATA6, GATA3, TCF7L2 and MAF) in the miR-302 cluster promoter region. Finally, we showed that miR-302 family over-expression in YST is likely to be functionally significant, as mRNAs down-regulated in YSTs were enriched for 3' untranslated region sequences complementary to the common seed of miR-302a~miR-302d. Such mRNAs included mediators of key cancer-associated processes, including tumour suppressor genes, apoptosis regulators and TFs.

    Conclusions: Differential microRNA expression is likely to contribute to the relatively aggressive behaviour of YSTs and may enable future improvements in clinical diagnosis and/or treatment.

    Funded by: Medical Research Council

    Molecular cancer 2010;9;290

  • Allelic variants of IL1R1 gene associate with severe hand osteoarthritis.

    Näkki A, Kouhia ST, Saarela J, Harilainen A, Tallroth K, Videman T, Battié MC, Kaprio J, Peltonen L and Kujala UM

    Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland.

    Background: In the search for genes predisposing to osteoarthritis (OA), several genome-wide scans have provided evidence for linkage on 2q. In this study we targeted a 470 kb region on 2q11.2, the locus with the most evidence for linkage to severe OA of distal interphalangeal joints (DIP) in our genome-wide scan families.

    Methods: We genotyped 32 single nucleotide polymorphisms (SNPs) in this 470 kb region comprising six genes belonging to the interleukin 1 superfamily and monitored for association with individual SNPs and SNP haplotypes among severe familial hand OA cases (material extended from our previous linkage study; n = 134), unrelated end-stage bilateral primary knee OA cases (n = 113), and population based controls (n = 436).

    Results: Four SNPs in the IL1R1 gene, mapping to a 125 kb LD block, provided evidence for association with hand OA in family-based and case-control analysis, the strongest association being with SNP rs2287047 (p-value = 0.0009).

    Conclusions: This study demonstrates an association between severe hand OA and the IL1R1 gene. This gene represents a highly relevant biological candidate since it encodes a protein that is a known modulator of inflammatory processes associated with joint destruction and resides within a locus providing consistent evidence for linkage to hand OA. As the observed association did not fully explain the linkage obtained in the previous study, it is plausible that other variants in this genome region also predispose to hand OA.

    BMC medical genetics 2010;11;50

  • Limited variation in vaccine candidate Plasmodium falciparum Merozoite Surface Protein-6 over multiple transmission seasons.

    Neal AT, Jordan SJ, Oliveira AL, Hernandez JN, Branch OH and Rayner JC

    William C Gorgas Center for Geographic Medicine, Division of Infectious Diseases, Department of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294-2170, USA.

    Background: Plasmodium falciparum Merozoite Surface Protein-6 (PfMSP6) is a component of the complex proteinaceous coat that surrounds P. falciparum merozoites. This location, and the presence of anti-PfMSP6 antibodies in P. falciparum-exposed individuals, makes PfMSP6 a potential blood stage vaccine target. However, genetic diversity has proven to be a major hurdle for vaccines targeting other blood stage P. falciparum antigens, and few endemic field studies assessing PfMSP6 gene diversity have been conducted. This study follows PfMSP6 diversity in the Peruvian Amazon from 2003 to 2006 and is the first longitudinal assessment of PfMSP6 sequence dynamics.

    Methods: Parasite DNA was extracted from 506 distinct P. falciparum infections spanning the transmission seasons from 2003 to 2006 as part of the Malaria Immunology and Genetics in the Amazon (MIGIA) cohort study near Iquitos, Peru. PfMSP6 was amplified from each sample using a nested PCR protocol, genotyped for allele class by agarose gel electrophoresis, and sequenced to detect diversity. Allele frequencies were analysed using JMP v. and correlated with clinical and epidemiological data collected as part of the MIGIA project.

    Results: Both PfMSP6 allele classes, K1-like and 3D7-like, were detected at the study site, confirming that both are globally distributed. Allele frequencies varied significantly between transmission seasons, with 3D7-class alleles dominating and K1-class alleles nearly disappearing in 2005 and 2006. There was a significant association between allele class and village location (p-value = 0.0008), but no statistically significant association between allele class and age, sex, or symptom status. No intra-allele class sequence diversity was detected.

    Conclusions: Both PfMSP6 allele classes are globally distributed, and this study shows that allele frequencies can fluctuate significantly between communities separated by only a few kilometres, and over time in the same community. By contrast, PfMSP6 was highly stable at the sequence level, with no SNPs detected in the 506 samples analysed. This limited diversity supports further investigation of PfMSP6 as a blood stage vaccine candidate, with the clear caveat that any such vaccine must either contain both alleles or generate cross-protective responses that react against both allele classes. Detailed immunoepidemiology studies are needed to establish the viability of these approaches before PfMSP6 advances further down the vaccine development pipeline.

    Funded by: NIAID NIH HHS: R01 AI064831, R21 AI072421, R21 AI072421-02

    Malaria journal 2010;9;138

  • Interactions of dietary whole-grain intake with fasting glucose- and insulin-related genetic loci in individuals of European descent: a meta-analysis of 14 cohort studies.

    Nettleton JA, McKeown NM, Kanoni S, Lemaitre RN, Hivert MF, Ngwa J, van Rooij FJ, Sonestedt E, Wojczynski MK, Ye Z, Tanaka T, Garcia M, Anderson JS, Follis JL, Djousse L, Mukamal K, Papoutsakis C, Mozaffarian D, Zillikens MC, Bandinelli S, Bennett AJ, Borecki IB, Feitosa MF, Ferrucci L, Forouhi NG, Groves CJ, Hallmans G, Harris T, Hofman A, Houston DK, Hu FB, Johansson I, Kritchevsky SB, Langenberg C, Launer L, Liu Y, Loos RJ, Nalls M, Orho-Melander M, Renstrom F, Rice K, Riserus U, Rolandsson O, Rotter JI, Saylor G, Sijbrands EJ, Sjogren P, Smith A, Steingrímsdóttir L, Uitterlinden AG, Wareham NJ, Prokopenko I, Pankow JS, van Duijn CM, Florez JC, Witteman JC, MAGIC Investigators, Dupuis J, Dedoussis GV, Ordovas JM, Ingelsson E, Cupples L, Siscovick DS, Franks PW and Meigs JB

    Division of Epidemiology, Human Genetics, and Environmental Sciences, University of Texas Health Sciences Center, Houston, Houston, Texas, USA.

    Objective: Whole-grain foods are touted for multiple health benefits, including enhancing insulin sensitivity and reducing type 2 diabetes risk. Recent genome-wide association studies (GWAS) have identified several single nucleotide polymorphisms (SNPs) associated with fasting glucose and insulin concentrations in individuals free of diabetes. We tested the hypothesis that whole-grain food intake and genetic variation interact to influence concentrations of fasting glucose and insulin.

    Research design and methods: Via meta-analysis of data from 14 cohorts comprising ∼48,000 participants of European descent, we studied interactions of whole-grain intake with loci previously associated in GWAS with fasting glucose (16 loci) and/or insulin (2 loci) concentrations. For tests of interaction, we considered a P value <0.0028 (0.05 divided by 18 tests) as statistically significant.

    Results: Greater whole-grain food intake was associated with lower fasting glucose and insulin concentrations independent of demographics, other dietary and lifestyle factors, and BMI (β [95% CI] per 1-serving-greater whole-grain intake: -0.009 mmol/l glucose [-0.013 to -0.005], P < 0.0001 and -0.011 pmol/l [ln] insulin [-0.015 to -0.007], P = 0.0003). No interactions met our multiple testing-adjusted statistical significance threshold. The strongest SNP interaction with whole-grain intake was rs780094 (GCKR) for fasting insulin (P = 0.006), where greater whole-grain intake was associated with a smaller reduction in fasting insulin concentrations in those with the insulin-raising allele.

    Conclusions: Our results support the favorable association of whole-grain intake with fasting glucose and insulin and suggest a potential interaction between variation in GCKR and whole-grain intake in influencing fasting insulin concentrations.

    Funded by: Medical Research Council: G0701863, G0902037, G19/35, MC_U106179471, MC_U106188470, MC_U127561128, MC_UP_A100_1003, MC_UP_A620_1015; NIA NIH HHS: R01 AG032098-03, R01 AG032098-04

    Diabetes care 2010;33;12;2684-91
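
    The significance threshold quoted in the whole-grain meta-analysis above is consistent with a Bonferroni-style correction, in which the nominal level is divided by the number of tests. The short sketch below reproduces that arithmetic from the figures given in the abstract; the function name is a hypothetical helper, and the abstract itself does not name the correction.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold when alpha is split evenly over n_tests."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# 16 fasting-glucose loci plus 2 fasting-insulin loci give 18 interaction tests.
threshold = bonferroni_threshold(0.05, 16 + 2)
print(f"{threshold:.4f}")  # 0.0028

# The strongest reported interaction (rs780094, P = 0.006) does not pass it.
print(0.006 < threshold)  # False
```

    This matches the abstract's statement that no interaction met the multiple testing-adjusted threshold.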

  • Phenotypic profiling of the human genome by time-lapse microscopy reveals cell division genes.

    Neumann B, Walter T, Hériché JK, Bulkescher J, Erfle H, Conrad C, Rogers P, Poser I, Held M, Liebel U, Cetin C, Sieckmann F, Pau G, Kabbe R, Wünsche A, Satagopam V, Schmitz MH, Chapuis C, Gerlich DW, Schneider R, Eils R, Huber W, Peters JM, Hyman AA, Durbin R, Pepperkok R and Ellenberg J

    MitoCheck Project Group, European Molecular Biology Laboratory (EMBL), Meyerhofstrasse 1, D-69117 Heidelberg, Germany.

    Despite our rapidly growing knowledge about the human genome, we do not know all of the genes required for some of the most basic functions of life. To start to fill this gap we developed a high-throughput phenotypic screening platform combining potent gene silencing by RNA interference, time-lapse microscopy and computational image processing. We carried out a genome-wide phenotypic profiling of each of the approximately 21,000 human protein-coding genes by two-day live imaging of fluorescently labelled chromosomes. Phenotypes were scored quantitatively by computational image processing, which allowed us to identify hundreds of human genes involved in diverse biological functions including cell division, migration and survival. As part of the Mitocheck consortium, this study provides an in-depth analysis of cell division phenotypes and makes the entire high-content data set available as a resource to the community.

    Funded by: Wellcome Trust: 077192

    Nature 2010;464;7289;721-7

  • Laser excitation power and the flow cytometric resolution of complex karyotypes.

    Ng BL and Carter NP

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, United Kingdom.

    The analytical resolution of individual chromosome peaks in the flow karyotype of cell lines is dependent on sample preparation and the detection sensitivity of the flow cytometer. We have investigated the effect of laser power on the resolution of chromosome peaks in cell lines with complex karyotypes. Chromosomes were prepared from a human gastric cancer cell line and a cell line from a patient with an abnormal phenotype using a modified polyamine isolation buffer. The stained chromosome suspensions were analyzed on a MoFlo sorter (Beckman Coulter) equipped with two water-cooled lasers (Coherent). A bivariate flow karyotype was obtained from each of the cell lines at various laser power settings and compared to a karyotype generated using laser power settings of 300 mW. The best separation of chromosome peaks was obtained with laser powers of 300 mW. This study demonstrates the requirement for high-laser powers for the accurate detection and purification of chromosomes, particularly from complex karyotypes, using a conventional flow cytometer.

    Funded by: Wellcome Trust: WT077008

    Cytometry. Part A : the journal of the International Society for Analytical Cytology 2010;77;6;585-8

  • The sudden dominance of blaCTX-M harbouring plasmids in Shigella spp. circulating in Southern Vietnam.

    Nguyen NT, Ha V, Tran NV, Stabler R, Pham DT, Le TM, van Doorn HR, Cerdeño-Tárraga A, Thomson N, Campbell J, Nguyen VM, Tran TT, Pham MV, Cao TT, Wren B, Farrar J and Baker S

    The Hospital for Tropical Diseases, Ho Chi Minh City, Vietnam.

    Background: Plasmid mediated antimicrobial resistance in the Enterobacteriaceae is a global problem. The rise of CTX-M class extended spectrum beta lactamases (ESBLs) has been well documented in industrialized countries. Vietnam is representative of a typical transitional middle income country where the spectrum of infectious diseases combined with the spread of drug resistance is shifting and bringing new healthcare challenges.

    Methodology: We collected hospital admission data from the pediatric population attending the Hospital for Tropical Diseases in Ho Chi Minh City with Shigella infections. Organisms were cultured from all enrolled patients and subjected to antimicrobial susceptibility testing. Those that were ESBL positive were subjected to further investigation. These investigations included PCR amplification for common ESBL genes, plasmid investigation, conjugation, microarray hybridization and DNA sequencing of a bla(CTX-M) encoding plasmid.

    Principal findings: We show that two different bla(CTX-M) genes are circulating in this bacterial population in this location. The sequence of one of the ESBL plasmids shows that, rather than the gene being integrated into a preexisting MDR plasmid, the bla(CTX-M) gene is located on a relatively simple conjugative plasmid. The sequenced plasmid (pEG356) carried the bla(CTX-M-24) gene on an ISEcp1 element and demonstrated considerable sequence homology with other IncFI plasmids.

    Significance: The rapid dissemination, spread of antimicrobial resistance and changing population of Shigella spp. concurrent with economic growth are pertinent to many other countries undergoing similar development. Third generation cephalosporins are commonly used empiric antibiotics in Ho Chi Minh City. We recommend that these agents should not be considered for therapy of dysentery in this setting.

    Funded by: Wellcome Trust

    PLoS neglected tropical diseases 2010;4;6;e702

  • Candidate causal regulatory effects by integration of expression QTLs with complex trait genetic associations.

    Nica AC, Montgomery SB, Dimas AS, Stranger BE, Beazley C, Barroso I and Dermitzakis ET

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

    The recent success of genome-wide association studies (GWAS) is now followed by the challenge to determine how the reported susceptibility variants mediate complex traits and diseases. Expression quantitative trait loci (eQTLs) have been implicated in disease associations through overlaps between eQTLs and GWAS signals. However, the abundance of eQTLs and the strong correlation structure (LD) in the genome make it likely that some of these overlaps are coincidental and not driven by the same functional variants. In the present study, we propose an empirical methodology, which we call Regulatory Trait Concordance (RTC) that accounts for local LD structure and integrates eQTLs and GWAS results in order to reveal the subset of association signals that are due to cis eQTLs. We simulate genomic regions of various LD patterns with both a single or two causal variants and show that our score outperforms SNP correlation metrics, be they statistical (r(2)) or historical (D'). Following the observation of a significant abundance of regulatory signals among currently published GWAS loci, we apply our method with the goal to prioritize relevant genes for each of the respective complex traits. We detect several potential disease-causing regulatory effects, with a strong enrichment for immunity-related conditions, consistent with the nature of the cell line tested (LCLs). Furthermore, we present an extension of the method in trans, where interrogating the whole genome for downstream effects of the disease variant can be informative regarding its unknown primary biological effect. We conclude that integrating cellular phenotype associations with organismal complex traits will facilitate the biological interpretation of the genetic effects on these traits.

    Funded by: Wellcome Trust

    PLoS genetics 2010;6;4;e1000895

  • Out of the sequencer and into the wiki as we face new challenges in genome informatics.

    Ning Z and Montgomery SB

    Sequencing Informatics, The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    A report on the joint Cold Spring Harbor Laboratory/Wellcome Trust Conference 'Genome Informatics', 15-19 September 2010, Hinxton, Cambridge, UK.

    Genome biology 2010;11;10;308

  • Salmonella enterica serovar Typhimurium mutants completely lacking the F(0)F(1) ATPase are novel live attenuated vaccine strains.

    Northen H, Paterson GK, Constantino-Casas F, Bryant CE, Clare S, Mastroeni P, Peters SE and Maskell DJ

    Department of Veterinary Medicine, University of Cambridge, Cambridge, UK.

    The F(0)F(1) ATPase plays a central role in both the generation of ATP and the utilisation of ATP for cellular processes such as rotation of bacterial flagella. We have deleted the entire operon encoding the F(0)F(1) ATPase, as well as genes encoding individual F(0) or F(1) subunits, in Salmonella enterica serovar Typhimurium. These mutants were attenuated for virulence, as assessed by bacterial counts in the livers and spleens of intravenously infected mice. The attenuated in vivo growth of the entire atp operon mutant was complemented by the insertion of the atp operon into the malXY pseudogene region. Following clearance of the attenuated mutants from the organs, mice were protected against challenge with the virulent wild type parent strain. We have shown that the F(0)F(1) ATPase is important for bacterial growth in vivo and that atp mutants are effective live attenuated vaccines against Salmonella infection.

    Funded by: Biotechnology and Biological Sciences Research Council: BBS/S/N/2006/13095; Wellcome Trust

    Vaccine 2010;28;4;940-9

  • Long- and short-term selective forces on malaria parasite genomes.

    Nygaard S, Braunstein A, Malsen G, Van Dongen S, Gardner PP, Krogh A, Otto TD, Pain A, Berriman M, McAuliffe J, Dermitzakis ET and Jeffares DC

    Bioinformatics Centre, University of Copenhagen, Copenhagen, Denmark.

    Plasmodium parasites, the causal agents of malaria, result in more than 1 million deaths annually. Plasmodium are unicellular eukaryotes with small ∼23 Mb genomes encoding ∼5200 protein-coding genes. The protein-coding genes comprise about half of these genomes. Although evolutionary processes have a significant impact on malaria control, the selective pressures within Plasmodium genomes are poorly understood, particularly in the non-protein-coding portion of the genome. We use evolutionary methods to describe selective processes in both the coding and non-coding regions of these genomes. Based on genome alignments of seven Plasmodium species, we show that protein-coding, intergenic and intronic regions are all subject to purifying selection and we identify 670 conserved non-genic elements. We then use genome-wide polymorphism data from P. falciparum to describe short-term selective processes in this species and identify some candidate genes for balancing (diversifying) selection. Our analyses suggest that there are many functional elements in the non-genic regions of these genomes and that adaptive evolution has occurred more frequently in the protein-coding regions of the genome.

    Funded by: Wellcome Trust: 085775/Z/08/Z

    PLoS genetics 2010;6;9

  • Multi-locus sequence typing of enteroaggregative Escherichia coli isolates from Nigerian children uncovers multiple lineages.

    Okeke IN, Wallace-Gadsden F, Simons HR, Matthews N, Labar AS, Hwang J and Wain J

    Department of Biology, Haverford College, Haverford, Pennsylvania, USA.

    Background: Enteroaggregative Escherichia coli (EAEC) are defined by their stacked-brick adherence pattern to human epithelial cells. There is no all-encompassing genetic marker for EAEC. The category is commonly implicated in diarrhea but research is hampered by perplexing heterogeneity.

    To identify key EAEC lineages, we applied multilocus sequence typing to 126 E. coli isolates from a Nigerian case-control study that showed aggregative adherence in the HEp-2 adherence assay, and 24 other EAEC strains from diverse locations. EAEC largely belonged to the A, B1 and D phylogenetic groups and only 7 (4.6%) isolates were in the B2 cluster. As many as 96 sequence types (STs) were identified but 60 (40%) of the EAEC strains belong to or are double locus variants of STs 10, 31, and 394. The remainder did not belong to predominant complexes. The most common ST complex, with predicted ancestor ST10, included 32 (21.3%) of the isolates. Significant age-related distribution suggests that weaned children in Nigeria are at risk for diarrhea from ST10-complex EAEC. Phylogenetic group D EAEC strains, predominantly from ST31- and ST394 complexes, represented 38 (25.3%) of all isolates, include genome-sequenced strain 042, and possessed conserved chromosomal loci.

    We have developed a molecular phylogenetic framework, which demonstrates that although grouped by a shared phenotype, the category of 'EAEC' encompasses multiple pathogenic lineages. Principal among isolates from Nigeria were ST10-complex EAEC that were associated with diarrhea in children over one year and ECOR D strains that share horizontally acquired loci.

    Funded by: Howard Hughes Medical Institute; Wellcome Trust

    PloS one 2010;5;11;e14093

  • Lechevalieria atacamensis sp. nov., Lechevalieria deserti sp. nov. and Lechevalieria roselyniae sp. nov., isolated from hyperarid soils.

    Okoro CK, Bull AT, Mutreja A, Rong X, Huang Y and Goodfellow M

    School of Biology, University of Newcastle, Newcastle-upon-Tyne, UK.

    The taxonomic positions of three Lechevalieria-like strains isolated from hyperarid soils of the Atacama Desert, Chile, were established by using a polyphasic approach. The organisms had chemical and morphological properties consistent with their classification in the genus Lechevalieria. They formed a distinct subclade in the Lechevalieria 16S rRNA gene clade and were most closely related to the type strain of Lechevalieria xinjiangensis. DNA-DNA relatedness data showed that each of the isolates and Lechevalieria xinjiangensis DSM 45081(T) belong to distinct genomic species. The new isolates and the type strains of recognized Lechevalieria species were readily distinguished based on a number of phenotypic properties. A combination of the genotypic and phenotypic data showed that the three isolates represent three novel species of the genus Lechevalieria. The names proposed for these taxa are Lechevalieria atacamensis sp. nov. (type strain C61(T) =CGMCC 4.5536(T) =NRRL B-24706(T)), Lechevalieria deserti sp. nov. (type strain C68(T) =CGMCC 4.5535(T) =NRRL B-24707(T)) and Lechevalieria roselyniae sp. nov. (type strain C81(T) =CGMCC 4.5537(T) =NRRL B-24708(T)).

    International journal of systematic and evolutionary microbiology 2010;60;Pt 2;296-300

  • Synthetic associations in the context of genome-wide association scan signals.

    Orozco G, Barrett JC and Zeggini E

    Arthritis Research UK Epidemiology Unit, University of Manchester, Manchester, UK.

    Genome-wide association studies (GWAS) have successfully identified a large number of genetic variants associated with complex traits, but these only explain a small proportion of the total heritability. It has been recently proposed that rare variants can create 'synthetic association' signals in GWAS, by occurring more often in association with one of the alleles of a common tag single nucleotide polymorphism. While the ultimate evaluation of this hypothesis will require the completion of large-scale sequencing studies, it is informative to place it in the broader context of what is known about the genetic architecture of complex disease. In this review, we draw from empirical and theoretical data to summarize evidence showing that synthetic associations do not underlie many reported GWAS associations.

    Funded by: Wellcome Trust: WT088885/Z/09/Z, WT089120/Z/09/Z

    Human molecular genetics 2010;19;R2;R137-44

  • Dual RMCE for efficient re-engineering of mouse mutant alleles.

    Osterwalder M, Galli A, Rosen B, Skarnes WC, Zeller R and Lopez-Rios J

    Developmental Genetics, Department of Biomedicine, University of Basel, Basel, Switzerland.

    We have developed dual recombinase-mediated cassette exchange (dRMCE) to efficiently re-engineer the thousands of available conditional alleles in mouse embryonic stem cells. dRMCE takes advantage of the wild-type loxP and FRT sites present in these conditional alleles and in many gene-trap lines. dRMCE is a scalable, flexible tool to introduce tags, reporters and mutant coding regions into an endogenous locus of interest in an easy and highly efficient manner.

    Nature methods 2010;7;11;893-5

  • Thioredoxin and glutathione systems differ in parasitic and free-living platyhelminths.

    Otero L, Bonilla M, Protasio AV, Fernández C, Gladyshev VN and Salinas G

    Cátedra de Inmunología, Facultad de Química, Instituto de Higiene, Universidad de la República, Avda. A. Navarro 3051, Montevideo, Uruguay.

    Background: The thioredoxin and/or glutathione pathways occur in all organisms. They provide electrons for deoxyribonucleotide synthesis, function in antioxidant defense, detoxification and Fe/S biogenesis, and participate in a variety of cellular processes. In contrast to their mammalian hosts, platyhelminth (flatworm) parasites studied so far lack conventional thioredoxin and glutathione systems. Instead, they possess a linked thioredoxin-glutathione system with the selenocysteine-containing enzyme thioredoxin glutathione reductase (TGR) as the single redox hub that controls the overall redox homeostasis. TGR has been recently validated as a drug target for schistosomiasis and new drug leads targeting TGR have recently been identified for these platyhelminth infections that affect more than 200 million people and for which a single drug is currently available. Little is known regarding the genomic structure of flatworm TGRs, the expression of TGR variants and whether the absence of conventional thioredoxin and glutathione systems is a signature of the entire platyhelminth phylum.

    Results: We examine platyhelminth genomes and transcriptomes and find that all platyhelminth parasites (from classes Cestoda and Trematoda) conform to a biochemical scenario involving, exclusively, a selenium-dependent linked thioredoxin-glutathione system having TGR as a central redox hub. In contrast, the free-living platyhelminth Schmidtea mediterranea (Class Turbellaria) possesses conventional and linked thioredoxin and glutathione systems. We identify TGR variants in Schistosoma spp. derived from a single gene, and demonstrate their expression. We also provide experimental evidence that alternative initiation of transcription and alternative transcript processing contribute to the generation of TGR variants in platyhelminth parasites.

    Conclusions: Our results indicate that thioredoxin and glutathione pathways differ in parasitic and free-living flatworms and that canonical enzymes were specifically lost in the parasitic lineage. Platyhelminth parasites possess a unique and simplified redox system for diverse essential processes, and thus TGR is an excellent drug target for platyhelminth infections. Inhibition of the central redox wire hub would lead to overall disruption of redox homeostasis and disable DNA synthesis.

    Funded by: FIC NIH HHS: TW006959; NIGMS NIH HHS: GM065204; Wellcome Trust: WT 085775/Z/08/Z

    BMC genomics 2010;11;237

  • Seeking perfection.

    Otto TD

    Nature reviews. Microbiology 2010;8;10;681

  • Iterative Correction of Reference Nucleotides (iCORN) using second generation sequencing technology.

    Otto TD, Sanders M, Berriman M and Newbold C

    Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK.

    Motivation: The accuracy of reference genomes is important for downstream analysis but a low error rate requires expensive manual interrogation of the sequence. Here, we describe a novel algorithm (Iterative Correction of Reference Nucleotides) that iteratively aligns deep coverage of short sequencing reads to correct errors in reference genome sequences and evaluate their accuracy.

    Results: Using Plasmodium falciparum (81% A + T content) as an extreme example, we show that the algorithm is highly accurate and corrects over 2000 errors in the reference sequence. We give examples of its application to numerous other eukaryotic and prokaryotic genomes and suggest additional applications.

    Availability: The software is available at

    Funded by: Wellcome Trust: WT085775/Z/08/Z

    Bioinformatics (Oxford, England) 2010;26;14;1704-7
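
    The iterative idea described in the iCORN abstract above (align reads, correct the reference from the read consensus, repeat until nothing changes) can be illustrated with a toy sketch. This is not the iCORN software: it handles substitutions only, uses a brute-force ungapped alignment, and every name and threshold (`best_start`, `correct_once`, `iterative_correct`, `max_mismatches`, `min_depth`) is an assumption made for illustration.

```python
from collections import Counter


def best_start(read, ref, max_mismatches):
    """Return the first start position with the fewest mismatches, or None if none is within the limit."""
    best = None
    for start in range(len(ref) - len(read) + 1):
        window = ref[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (start, mismatches)
    return None if best is None else best[0]


def correct_once(ref, reads, max_mismatches=1, min_depth=3):
    """One round: pile up aligned reads and replace bases the consensus contradicts."""
    piles = [Counter() for _ in ref]
    for read in reads:
        start = best_start(read, ref, max_mismatches)
        if start is None:
            continue  # read does not align well enough to this version of the reference
        for offset, base in enumerate(read):
            piles[start + offset][base] += 1

    corrected = list(ref)
    changes = 0
    for i, pile in enumerate(piles):
        if not pile:
            continue
        base, count = pile.most_common(1)[0]
        # Only trust a correction that enough reads support.
        if count >= min_depth and base != ref[i]:
            corrected[i] = base
            changes += 1
    return "".join(corrected), changes


def iterative_correct(ref, reads, max_rounds=10, max_mismatches=1, min_depth=3):
    """Repeat correction rounds until a round makes no change or the round limit is reached."""
    for _ in range(max_rounds):
        ref, changes = correct_once(ref, reads, max_mismatches, min_depth)
        if changes == 0:
            break
    return ref


# Hypothetical data: a short "true" sequence, 3x read coverage, and a draft with one wrong base.
truth = "GATTACAGCCTA"
reads = [truth[i:i + 8] for i in range(5)] * 3
draft = "C" + truth[1:]
print(iterative_correct(draft, reads))
```

    Re-aligning after each round matters because a corrected base can let reads align that previously exceeded the mismatch limit. The toy data above converges after a single correction.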

  • New insights into the blood-stage transcriptome of Plasmodium falciparum using RNA-Seq.

    Otto TD, Wilinski D, Assefa S, Keane TM, Sarry LR, Böhme U, Lemieux J, Barrell B, Pain A, Berriman M, Newbold C and Llinás M

    Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Recent advances in high-throughput sequencing present a new opportunity to deeply probe an organism's transcriptome. In this study, we used Illumina-based massively parallel sequencing to gain new insight into the transcriptome (RNA-Seq) of the human malaria parasite, Plasmodium falciparum. Using data collected at seven time points during the intraerythrocytic developmental cycle, we (i) detect novel gene transcripts; (ii) correct hundreds of gene models; (iii) propose alternative splicing events; and (iv) predict 5' and 3' untranslated regions. Approximately 70% of the unique sequencing reads map to previously annotated protein-coding genes. The RNA-Seq results greatly improve existing annotation of the P. falciparum genome with over 10% of gene models modified. Our data confirm 75% of predicted splice sites and identify 202 new splice sites, including 84 previously uncharacterized alternative splicing events. We also discovered 107 novel transcripts and expression of 38 pseudogenes, with many demonstrating differential expression across the developmental time series. Our RNA-Seq results correlate well with DNA microarray analysis performed in parallel on the same samples, and provide improved resolution over the microarray-based method. These data reveal new features of the P. falciparum transcriptional landscape and significantly advance our understanding of the parasite's red blood cell-stage transcriptome.

    Funded by: NIGMS NIH HHS: P50 GM071508; Wellcome Trust: WT 085775/Z/08/Z

    Molecular microbiology 2010;76;1;12-24

  • Genome-wide association study of blood pressure extremes identifies variant near UMOD associated with hypertension.

    Padmanabhan S, Melander O, Johnson T, Di Blasio AM, Lee WK, Gentilini D, Hastie CE, Menni C, Monti MC, Delles C, Laing S, Corso B, Navis G, Kwakernaak AJ, van der Harst P, Bochud M, Maillard M, Burnier M, Hedner T, Kjeldsen S, Wahlstrand B, Sjögren M, Fava C, Montagnana M, Danese E, Torffvit O, Hedblad B, Snieder H, Connell JM, Brown M, Samani NJ, Farrall M, Cesana G, Mancia G, Signorini S, Grassi G, Eyheramendy S, Wichmann HE, Laan M, Strachan DP, Sever P, Shields DC, Stanton A, Vollenweider P, Teumer A, Völzke H, Rettig R, Newton-Cheh C, Arora P, Zhang F, Soranzo N, Spector TD, Lucas G, Kathiresan S, Siscovick DS, Luan J, Loos RJ, Wareham NJ, Penninx BW, Nolte IM, McBride M, Miller WH, Nicklin SA, Baker AH, Graham D, McDonald RA, Pell JP, Sattar N, Welsh P, Global BPgen Consortium, Munroe P, Caulfield MJ, Zanchetti A and Dominiczak AF

    Institute of Cardiovascular and Medical Sciences, College of Medical Veterinary and Life Sciences, University of Glasgow, Glasgow, United Kingdom.

    Hypertension is heritable and a major contributor to the global burden of disease. The sum of rare and common genetic variants robustly identified so far explains only 1%-2% of the population variation in BP and hypertension. This suggests the existence of more undiscovered common variants. We conducted a genome-wide association study in 1,621 hypertensive cases and 1,699 controls and follow-up validation analyses in 19,845 cases and 16,541 controls using an extreme case-control design. We identified a locus on chromosome 16 in the 5' region of Uromodulin (UMOD; rs13333226, combined P value of 3.6 × 10⁻¹¹). The minor G allele is associated with a lower risk of hypertension (OR [95%CI]: 0.87 [0.84-0.91]), reduced urinary uromodulin excretion, better renal function; and each copy of the G allele is associated with a 7.7% reduction in risk of CVD events after adjusting for age, sex, BMI, and smoking status (H.R. = 0.923, 95% CI 0.860-0.991; p = 0.027). In a subset of 13,446 individuals with estimated glomerular filtration rate (eGFR) measurements, we show that rs13333226 is independently associated with hypertension (unadjusted for eGFR: 0.89 [0.83-0.96], p = 0.004; after eGFR adjustment: 0.89 [0.83-0.96], p = 0.003). In clinical functional studies, we also consistently show the minor G allele is associated with lower urinary uromodulin excretion. The exclusive expression of uromodulin in the thick portion of the ascending limb of Henle suggests a putative role of this variant in hypertension through an effect on sodium homeostasis. The newly discovered UMOD locus for hypertension has the potential to give new insights into the role of uromodulin in BP regulation and to identify novel druggable targets for reducing cardiovascular risk.

    Funded by: British Heart Foundation: CH/98001, FS/05/095/19937, RG/07/005/23633, SP/08/005/25115

    PLoS genetics 2010;6;10;e1001177

  • Multi-heuristic dynamic task allocation using genetic algorithms in a heterogeneous distributed system.

    Page AJ, Keane TM and Naughton TJ

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK.

    We present a multi-heuristic evolutionary task allocation algorithm to dynamically map tasks to processors in a heterogeneous distributed system. It utilizes a genetic algorithm, combined with eight common heuristics, in an effort to minimize the total execution time. It operates on batches of unmapped tasks and can preemptively remap tasks to processors. The algorithm has been implemented on a Java distributed system and evaluated with a set of six problems from the areas of bioinformatics, biomedical engineering, computer science and cryptography. Experiments using up to 150 heterogeneous processors show that the algorithm achieves better efficiency than other state-of-the-art heuristic algorithms.

    Journal of parallel and distributed computing 2010;70;7;758-766
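
    As a rough illustration of the approach summarised in the abstract above, the sketch below seeds a genetic algorithm with one heuristic mapping and evolves task-to-processor assignments to reduce total execution time (makespan). It is a simplified, static, single-batch toy and not the authors' algorithm, which combines eight heuristics and remaps tasks dynamically. The function names, the single greedy heuristic, the task costs and the processor speeds are all assumptions made for illustration.

```python
import random


def makespan(assignment, task_costs, speeds):
    """Finish time of the busiest processor for a task -> processor mapping."""
    loads = [0.0] * len(speeds)
    for task, proc in enumerate(assignment):
        loads[proc] += task_costs[task] / speeds[proc]
    return max(loads)


def greedy_min_completion(task_costs, speeds):
    """Heuristic seed: give each task to the processor that would finish it earliest."""
    loads = [0.0] * len(speeds)
    assignment = []
    for cost in task_costs:
        proc = min(range(len(speeds)), key=lambda p: loads[p] + cost / speeds[p])
        loads[proc] += cost / speeds[proc]
        assignment.append(proc)
    return assignment


def evolve(task_costs, speeds, generations=200, pop_size=30, seed=0):
    """Evolve assignments; needs at least 2 tasks and pop_size >= 4."""
    rng = random.Random(seed)
    n_tasks, n_procs = len(task_costs), len(speeds)

    def fitness(assignment):
        return makespan(assignment, task_costs, speeds)

    # Initial population: the heuristic solution plus random mappings.
    population = [greedy_min_completion(task_costs, speeds)]
    population += [[rng.randrange(n_procs) for _ in range(n_tasks)]
                   for _ in range(pop_size - 1)]

    for _ in range(generations):
        population.sort(key=fitness)
        survivors = population[:pop_size // 2]  # elitism: the best half is kept unchanged
        children = []
        while len(survivors) + len(children) < pop_size:
            mum, dad = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_tasks)          # one-point crossover
            child = mum[:cut] + dad[cut:]
            child[rng.randrange(n_tasks)] = rng.randrange(n_procs)  # point mutation
            children.append(child)
        population = survivors + children

    return min(population, key=fitness)


# Hypothetical workload: eight tasks, three processors of different speed.
costs = [4, 8, 2, 6, 10, 3, 7, 5]
speeds = [1.0, 2.0, 4.0]
best = evolve(costs, speeds)
print(best, makespan(best, costs, speeds))
```

    Because the heuristic mapping is placed in the initial population and the best half always survives, the result can never be worse than the heuristic alone. This is one simple way to combine heuristics with evolutionary search.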

  • Characterization of a family with rare deletions in CNTNAP5 and DOCK4 suggests novel risk loci for autism and dyslexia.

    Pagnamenta AT, Bacchelli E, de Jonge MV, Mirza G, Scerri TS, Minopoli F, Chiocchetti A, Ludwig KU, Hoffmann P, Paracchini S, Lowy E, Harold DH, Chapman JA, Klauck SM, Poustka F, Houben RH, Staal WG, Ophoff RA, O'Donovan MC, Williams J, Nöthen MM, Schulte-Körne G, Deloukas P, Ragoussis J, Bailey AJ, Maestrini E, Monaco AP and International Molecular Genetic Study Of Autism Consortium

    The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom.

    Background: Autism spectrum disorders (ASDs) are characterized by social, communication, and behavioral deficits and complex genetic etiology. A recent study of 517 ASD families implicated DOCK4 by single nucleotide polymorphism (SNP) association and a microdeletion in an affected sibling pair.

    Methods: The DOCK4 microdeletion on 7q31.1 was further characterized in this family using QuantiSNP analysis of 1M SNP array data and reverse transcription polymerase chain reaction. Extended family members were tested by polymerase chain reaction amplification of junction fragments. DOCK4 dosage was measured in additional samples using SNP arrays. Since QuantiSNP analysis identified a novel CNTNAP5 microdeletion in the same affected sibling pair, this gene was sequenced in 143 additional ASD families. Further polymerase chain reaction-restriction fragment length polymorphism analysis included 380 ASD cases and suitable control subjects.

    Results: The maternally inherited microdeletion encompassed chr7:110,663,978-111,257,682 and led to a DOCK4-IMMP2L fusion transcript. It was also detected in five extended family members with no ASD. However, six of nine individuals with this microdeletion had poor reading ability, which prompted us to screen 606 other dyslexia cases. This led to the identification of a second DOCK4 microdeletion co-segregating with dyslexia. Assessment of genomic background in the original ASD family detected a paternal 2q14.3 microdeletion disrupting CNTNAP5 that was also transmitted to both affected siblings. Analysis of other ASD cohorts revealed four additional rare missense changes in CNTNAP5. No exonic deletions of DOCK4 or CNTNAP5 were seen in 2091 control subjects.

    Conclusions: This study highlights two new risk factors for ASD and dyslexia and demonstrates the importance of performing a high-resolution assessment of genomic background, even after detection of a rare and likely damaging microdeletion using a targeted approach.

    Funded by: Medical Research Council: G0000934; Wellcome Trust: 068545/Z/02.

    Biological psychiatry 2010;68;4;320-8

  • Identification of three novel superantigen-encoding genes in Streptococcus equi subsp. zooepidemicus, szeF, szeN, and szeP.

    Paillot R, Darby AC, Robinson C, Wright NL, Steward KF, Anderson E, Webb K, Holden MT, Efstratiou A, Broughton K, Jolley KA, Priestnall SL, Marotti Campi MC, Hughes MA, Radford A, Erles K and Waller AS

    Centre for Preventive Medicine, Animal Health Trust, Kentford, Newmarket, Suffolk, United Kingdom.

    The acquisition of superantigen-encoding genes by Streptococcus pyogenes has been associated with increased morbidity and mortality in humans, and the gain of four superantigens by Streptococcus equi is linked to the evolution of this host-restricted pathogen from an ancestral strain of the opportunistic pathogen Streptococcus equi subsp. zooepidemicus. A recent study determined that the culture supernatants of several S. equi subsp. zooepidemicus strains possessed mitogenic activity but lacked known superantigen-encoding genes. Here, we report the identification and activities of three novel superantigen-encoding genes. The products of szeF, szeN, and szeP share 59%, 49%, and 34% amino acid sequence identity with SPEH, SPEM, and SPEL, respectively. Recombinant SzeF, SzeN, and SzeP stimulated the proliferation of equine peripheral blood mononuclear cells, and tumor necrosis factor alpha (TNF-α) and gamma interferon (IFN-γ) production, in vitro. Although none of these superantigen genes were encoded within functional prophage elements, szeN and szeP were located next to a prophage remnant, suggesting that they were acquired by horizontal transfer. Eighty-one of 165 diverse S. equi subsp. zooepidemicus strains screened, including 7 out of 15 isolates from cases of disease in humans, contained at least one of these new superantigen-encoding genes. The presence of szeN or szeP, but not szeF, was significantly associated with mitogenic activity in the S. equi subsp. zooepidemicus population (P < 0.000001, P < 0.000001, and P = 0.104, respectively). We conclude that horizontal transfer of these novel superantigens from and within the diverse S. equi subsp. zooepidemicus population is likely to have implications for veterinary and human disease.

    Infection and immunity 2010;78;11;4817-27

  • Identification of susceptibility loci at 7q31 and 9p13 for bipolar disorder in an isolated population.

    Palo OM, Soronen P, Silander K, Varilo T, Tuononen K, Kieseppä T, Partonen T, Lönnqvist J, Paunio T and Peltonen L

    FIMM, Institute for Molecular Medicine and National Institute for Health and Welfare, Helsinki, Finland.

    We performed a linkage analysis on 23 Finnish families with bipolar disorder and originating from the North-Eastern region of Finland, using the Illumina Linkage Panel IV (6K) Array with an average intermarker spacing of 0.65 cM across the genome. We detected genome-wide significant evidence for linkage of mood disorder (bipolar disorder type I, II, or not otherwise specified, manic type of schizoaffective psychosis, cyclothymia, or recurrent depression) to chromosomes 7q31 (LOD = 3.20) and 9p13.1 (LOD = 4.02). Analyzing the best markers on the complete set of 179 Finnish bipolar families supported the findings on chromosome 9p13 (maximum LOD score of 3.02 at position 383 Mb, immediately upstream of the centromere). This region harbors several interesting candidate genes, including contactin associated protein-like 3 (CNTNAP3) and aldehyde dehydrogenase 1 (ALDH1B1). For the 7q31 locus, only one extended pedigree and ten families originating from the same late settlement region in North-Eastern Finland provided evidence for linkage, suggesting that a gene predisposing to bipolar disorder is enriched in that region. Candidate genes of interest in this locus include potassium-voltage-gated channel, member 2 (KCND2) and calcium-dependent activator protein for secretion 2 (CADPS2). The loci on the centromeric region of 9p13 and the telomeric region of 7q31 may represent susceptibility loci for mood disorder in the Finnish population.

    American journal of medical genetics. Part B, Neuropsychiatric genetics : the official publication of the International Society of Psychiatric Genetics 2010;153B;3;723-35

  • Towards a comprehensive structural variation map of an individual human genome.

    Pang AW, MacDonald JR, Pinto D, Wei J, Rafiq MA, Conrad DF, Park H, Hurles ME, Lee C, Venter JC, Kirkness EF, Levy S, Feuk L and Scherer SW

    Department of Molecular Genetics, University of Toronto, 1 King's College Circle, Toronto, Ontario M5S 1A8, Canada.

    Background: Several genomes have now been sequenced, with millions of genetic variants annotated. While significant progress has been made in mapping single nucleotide polymorphisms (SNPs) and small (<10 bp) insertion/deletions (indels), the annotation of larger structural variants has been less comprehensive. It is still unclear to what extent a typical genome differs from the reference assembly, and analyses of the genomes sequenced to date have shown varying results for copy number variation (CNV) and inversions.

    Results: We have combined computational re-analysis of existing whole genome sequence data with novel microarray-based analysis, and detect 12,178 structural variants covering 40.6 Mb that were not reported in the initial sequencing of the first published personal genome. We estimate a total non-SNP variation content of 48.8 Mb in a single genome. Our results indicate that this genome differs from the consensus reference sequence by approximately 1.2% when considering indels/CNVs, 0.1% by SNPs and approximately 0.3% by inversions. The structural variants impact 4,867 genes, and >24% of structural variants would not be imputed by SNP-association.

    Conclusions: Our results indicate that a large number of structural variants have been unreported in the individual genomes published to date. This significant extent and complexity of structural variants, as well as the growing recognition of their medical relevance, necessitate they be actively studied in health-related analyses of personal genomes. The new catalogue of structural variants generated for this genome provides a crucial resource for future comparison studies.

    Funded by: Canadian Institutes of Health Research

    Genome biology 2010;11;5;R52

  • A locus on chromosome 1p36 is associated with thyrotropin and thyroid function as identified by genome-wide association study.

    Panicker V, Wilson SG, Walsh JP, Richards JB, Brown SJ, Beilby JP, Bremner AP, Surdulescu GL, Qweitin E, Gillham-Nasenya I, Soranzo N, Lim EM, Fletcher SJ and Spector TD

    Department of Endocrinology and Diabetes, Sir Charles Gairdner Hospital, Nedlands, Western Australia.

    Thyroid hormones are key regulators of cellular growth, development, and metabolism, and thyroid disorders are a common cause of ill health in the community. Circulating concentrations of thyrotropin (TSH), thyroxine (T4) and triiodothyronine (T3) have a strong heritable component and are thought to be under polygenic control, but the genes responsible are mostly unknown. In order to identify genetic loci associated with these metabolic phenotypes, we performed a genome-wide association study of 2,120,505 SNPs in 2014 female twins from the TwinsUK study and found a significant association between rs10917469 on chromosome 1p36.13 and serum TSH (p = 3.2 × 10(-8)). The association of rs10917469 with serum TSH was replicated (p = 2.0 × 10(-4)) in an independent community-based sample of 1154 participants in the Busselton Health Study. This SNP is located near CAPZB, which might be a regulator of TSH secretion and thus of pituitary-thyroid axis function. Twenty-nine percent of white individuals carry the variant, and the difference in mean TSH concentrations between wild-type individuals and those homozygous for the minor G allele was 0.5 mU/l, which is likely to be clinically relevant. We also provide evidence of suggestive association (p < 5.0 × 10(-6)) of other SNPs with serum TSH, free T4, and free T3 concentrations, and these SNPs might be good targets for further studies. These results advance understanding of the genetic basis of pituitary-thyroid axis function and metabolic regulation.

    Funded by: Canadian Institutes of Health Research; Wellcome Trust

    American journal of human genetics 2010;87;3;430-5

  • The RING-CH ligase K5 antagonizes restriction of KSHV and HIV-1 particle release by mediating ubiquitin-dependent endosomal degradation of tetherin.

    Pardieu C, Vigan R, Wilson SJ, Calvi A, Zang T, Bieniasz P, Kellam P, Towers GJ and Neil SJ

    MRC Centre for Medical Molecular Virology, University College London, London, United Kingdom.

    Tetherin (CD317/BST2) is an interferon-induced membrane protein that inhibits the release of diverse enveloped viral particles. Several mammalian viruses have evolved countermeasures that inactivate tetherin, with the prototype being the HIV-1 Vpu protein. Here we show that the human herpesvirus Kaposi's sarcoma-associated herpesvirus (KSHV) is sensitive to tetherin restriction and its activity is counteracted by the KSHV encoded RING-CH E3 ubiquitin ligase K5. Tetherin expression in KSHV-infected cells inhibits viral particle release, as does depletion of K5 protein using RNA interference. K5 induces a species-specific downregulation of human tetherin from the cell surface followed by its endosomal degradation. We show that K5 targets a single lysine (K18) in the cytoplasmic tail of tetherin for ubiquitination, leading to relocalization of tetherin to CD63-positive endosomal compartments. Tetherin degradation is dependent on ESCRT-mediated endosomal sorting, but does not require a tyrosine-based sorting signal in the tetherin cytoplasmic tail. Importantly, we also show that the ability of K5 to substitute for Vpu in HIV-1 release is entirely dependent on K18 and the RING-CH domain of K5. By contrast, while Vpu induces ubiquitination of tetherin cytoplasmic tail lysine residues, mutation of these positions has no effect on its antagonism of tetherin function, and residual tetherin is associated with the trans-Golgi network (TGN) in Vpu-expressing cells. Taken together our results demonstrate that K5 is a mechanistically distinct viral countermeasure to tetherin-mediated restriction, and that herpesvirus particle release is sensitive to this mode of antiviral inhibition.

    Funded by: Medical Research Council: G0801172, G0801172(87743), G0801937, G9721629; Wellcome Trust: 076608, WT082274MA

    PLoS pathogens 2010;6;4;e1000843

  • An expanded Oct4 interaction network: implications for stem cell biology, development, and disease.

    Pardo M, Lang B, Yu L, Prosser H, Bradley A, Babu MM and Choudhary J

    Proteomic Mass Spectrometry, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK.

    The transcription factor Oct4 is key in embryonic stem cell identity and reprogramming. Insight into its partners should illuminate how the pluripotent state is established and regulated. Here, we identify a considerably expanded set of Oct4-binding proteins in mouse embryonic stem cells. We find that Oct4 associates with a varied set of proteins including regulators of gene expression and modulators of Oct4 function. Half of its partners are transcriptionally regulated by Oct4 itself or other stem cell transcription factors, whereas one-third display a significant change in expression upon cell differentiation. The majority of Oct4-associated proteins studied to date show an early lethal phenotype when mutated. A fraction of the human orthologs is associated with inherited developmental disorders or causative of cancer. The Oct4 interactome provides a resource for dissecting mechanisms of Oct4 function, enlightening the basis of pluripotency and development, and identifying potential additional reprogramming factors.

    Funded by: Medical Research Council: MC_U105185859; Wellcome Trust

    Cell stem cell 2010;6;4;382-95

  • Discovery of common Asian copy number variants using integrated high-resolution array CGH and massively parallel DNA sequencing.

    Park H, Kim JI, Ju YS, Gokcumen O, Mills RE, Kim S, Lee S, Suh D, Hong D, Kang HP, Yoo YJ, Shin JY, Kim HJ, Yavartanoo M, Chang YW, Ha JS, Chong W, Hwang GR, Darvishi K, Kim H, Yang SJ, Yang KS, Kim H, Hurles ME, Scherer SW, Carter NP, Tyler-Smith C, Lee C and Seo JS

    Genomic Medicine Institute, Medical Research Center, Seoul National University, Seoul, Korea.

    Copy number variants (CNVs) account for the majority of human genomic diversity in terms of base coverage. Here, we have developed and applied a new method to combine high-resolution array comparative genomic hybridization (CGH) data with whole-genome DNA sequencing data to obtain a comprehensive catalog of common CNVs in Asian individuals. The genomes of 30 individuals from three Asian populations (Korean, Chinese and Japanese) were interrogated with an ultra-high-resolution array CGH platform containing 24 million probes. Whole-genome sequencing data from a reference genome (NA10851, with 28.3x coverage) and two Asian genomes (AK1, with 27.8x coverage and AK2, with 32.0x coverage) were used to transform the relative copy number information obtained from array CGH experiments into absolute copy number values. We discovered 5,177 CNVs, of which 3,547 were putative Asian-specific CNVs. These common CNVs in Asian populations will be a useful resource for subsequent genetic studies in these populations, and the new method of calling absolute CNVs will be essential for applying CNV data to personalized medicine.
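    The abstract does not give the formula used to turn relative array CGH signal into absolute copy number. The sketch below is only an illustration of the general idea, assuming the conventional log2(test/reference) intensity ratio and a reference genome whose absolute copy number at the locus is already known from sequencing. The function name and parameters are hypothetical and are not taken from the paper.

```python
def absolute_copy_number(log2_ratio: float, reference_copies: int) -> int:
    """Estimate a sample's absolute copy number at one locus.

    Assumption (not from the abstract): the array reports
    log2(test / reference), so test = reference * 2 ** log2_ratio.
    """
    if reference_copies <= 0:
        # With zero reference copies the ratio is undefined, so a
        # relative measurement cannot be converted at this locus.
        raise ValueError("reference_copies must be a positive integer")
    estimate = reference_copies * 2 ** log2_ratio
    # Copy number is an integer; round the noisy estimate to the nearest one.
    return round(estimate)
```

    For example, a log2 ratio of -1 against a two-copy reference corresponds to a single copy in the test sample, which is the kind of statement a relative ratio alone cannot make.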

    Funded by: NHGRI NIH HHS: HG004221; Wellcome Trust: 077008, 077009, 077014

    Nature genetics 2010;42;5;400-5

  • Using caching and optimization techniques to improve performance of the Ensembl website.

    Parker A, Bragin E, Brent S, Pritchard B, Smith JA and Trevanion S

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambs CB10 1SA, UK.

    Background: The Ensembl web site has provided access to genomic information for almost 10 years. During this time the amount of data available through Ensembl has grown dramatically. At the same time, the World Wide Web itself has become a dramatically more important component of the scientific workflow and the way that scientists share and access data and scientific information. Since 2000, the Ensembl web interface has had three major updates and numerous smaller updates. These have largely been in response to expanding data types and valuable representations of existing data types. In 2007 it was realised that a radical new approach would be required in order to serve the project's future requirements, and development therefore focused on identifying suitable web technologies for implementation in the 2008 site redesign.

    Results: By comparing the Ensembl website to well-known "Web 2.0" sites, we were able to identify two main areas in which cutting-edge technologies could be advantageously deployed: server efficiency and interface latency. We then evaluated the performance of the existing site using browser-based tools and Apache benchmarking, and selected appropriate technologies to overcome any issues found. Solutions included optimization of the Apache web server, introduction of caching technologies and widespread implementation of AJAX code. These improvements were successfully deployed on the Ensembl website in late 2008 and early 2009.

    Conclusions: Web 2.0 technologies provide a flexible and efficient way to access the terabytes of data now available from Ensembl, enhancing the user experience through improved website responsiveness and a rich, interactive interface.
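    The abstract names caching as one of the solutions but does not describe the implementation. As a generic illustration only, the sketch below shows a small in-process cache with a time-to-live, placed in front of an expensive page-fragment renderer. The class and method names are hypothetical, and this is not the Ensembl code or the caching technology the authors deployed.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Cache rendered fragments for a fixed number of seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, str]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], str]) -> str:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # cache hit: skip the expensive call
        value = compute()    # cache miss or expired entry: recompute
        self._store[key] = (now, value)
        return value
```

    Repeated requests for the same key within the time-to-live return the stored value, so the expensive computation runs once per expiry window rather than once per request.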

    BMC bioinformatics 2010;11;239

  • Genomic information infrastructure after the deluge.

    Parkhill J, Birney E and Kersey P

    Maintaining up-to-date annotation on reference genomes is becoming more important, not less, as the ability to rapidly and cheaply resequence genomes expands.

    Funded by: Wellcome Trust

    Genome biology 2010;11;7;402

  • Genome-wide association meta-analysis of cortical bone mineral density unravels allelic heterogeneity at the RANKL locus and potential pleiotropic effects on bone.

    Paternoster L, Lorentzon M, Vandenput L, Karlsson MK, Ljunggren O, Kindmark A, Mellstrom D, Kemp JP, Jarett CE, Holly JM, Sayers A, St Pourcain B, Timpson NJ, Deloukas P, Davey Smith G, Ring SM, Evans DM, Tobias JH and Ohlsson C

    University of Bristol, Bristol, UK.

    Previous genome-wide association (GWA) studies have identified SNPs associated with areal bone mineral density (aBMD). However, this measure is influenced by several different skeletal parameters, such as periosteal expansion, cortical bone mineral density (BMD(C)), cortical thickness, trabecular number, and trabecular thickness, which may be under distinct biological and genetic control. We have carried out a GWA and replication study of BMD(C), as measured by peripheral quantitative computed tomography (pQCT), a more homogenous and valid measure of actual volumetric bone density. After initial GWA meta-analysis of two cohorts (ALSPAC n = 999, aged ∼15 years and GOOD n = 935, aged ∼19 years), we attempted to replicate the BMD(C) associations that had p<1×10(-5) in an independent sample of ALSPAC children (n = 2803) and in a cohort of elderly men (MrOS Sweden, n = 1052). The rs1021188 SNP (near RANKL) was associated with BMD(C) in all cohorts (overall p = 2×10(-14), n = 5739). Each minor allele was associated with a decrease in BMD(C) of ∼0.14 SD. There was also evidence for an interaction between this variant and sex (p = 0.01), with a stronger effect in males than females (at age 15, males -6.77 mg/cm(3) per C allele, p = 2×10(-6); females -2.79 mg/cm(3) per C allele, p = 0.004). Furthermore, in a preliminary analysis, the rs1021188 minor C allele was associated with higher circulating levels of sRANKL (p<0.005). We show this variant to be independent from the previously aBMD associated SNP (rs9594738) and possibly from a third variant in the same RANKL region, which demonstrates important allelic heterogeneity at this locus. Associations with skeletal parameters reflecting bone dimensions were either not found or were much less pronounced. This finding implicates RANKL as a locus containing variation associated with volumetric bone density and provides further insight into the mechanism by which the RANK/RANKL/OPG pathway may be involved in skeletal development.

    Funded by: Medical Research Council: 74882, G0800582; Wellcome Trust: 076467

    PLoS genetics 2010;6;11;e1001217

  • A genome-wide association study identifies a novel major locus for glycemic control in type 1 diabetes, as measured by both A1C and glucose.

    Paterson AD, Waggott D, Boright AP, Hosseini SM, Shen E, Sylvestre MP, Wong I, Bharaj B, Cleary PA, Lachin JM, MAGIC (Meta-Analyses of Glucose and Insulin-related traits Consortium), Below JE, Nicolae D, Cox NJ, Canty AJ, Sun L, Bull SB and Diabetes Control and Complications Trial/Epidemiology of Diabetes Interventions and Complications Research Group

    Program in Genetics and Genome Biology, Hospital for Sick Children, Toronto, Canada.

    Objective: Glycemia is a major risk factor for the development of long-term complications in type 1 diabetes; however, no specific genetic loci have been identified for glycemic control in individuals with type 1 diabetes. To identify such loci in type 1 diabetes, we analyzed longitudinal repeated measures of A1C from the Diabetes Control and Complications Trial.

    Research design and methods: We performed a genome-wide association study using the mean of quarterly A1C values measured over 6.5 years, separately in the conventional (n = 667) and intensive (n = 637) treatment groups of the DCCT. At loci of interest, linear mixed models were used to take advantage of all the repeated measures. We then assessed the association of these loci with capillary glucose and repeated measures of multiple complications of diabetes.

    Results: We identified a major locus for A1C levels in the conventional treatment group near SORCS1 (10q25.1, P = 7 x 10(-10)), which was also associated with mean glucose (P = 2 x 10(-5)). This was confirmed using A1C in the intensive treatment group (P = 0.01). Other loci achieved evidence close to genome-wide significance: 14q32.13 (GSC) and 9p22 (BNC2) in the combined treatment groups and 15q21.3 (WDR72) in the intensive group. Further, these loci gave evidence for association with diabetic complications, specifically SORCS1 with hypoglycemia and BNC2 with renal and retinal complications. We replicated the SORCS1 association in Genetics of Diabetes in Kidneys (GoKinD) study control subjects (P = 0.01) and the BNC2 association with A1C in nondiabetic individuals.

    Conclusions: A major locus for A1C and glucose in individuals with diabetes is near SORCS1. This may influence the design and analysis of genetic studies attempting to identify risk factors for long-term diabetic complications.

    Funded by: Canadian Institutes of Health Research; NIDDK NIH HHS: N01-DK-6-2204, P60-DK20595, R01-DK-077510, R01-DK077489

    Diabetes 2010;59;2;539-49

  • Antagonistic coevolution accelerates molecular evolution.

    Paterson S, Vogwill T, Buckling A, Benmayor R, Spiers AJ, Thomson NR, Quail M, Smith F, Walker D, Libberton B, Fenton A, Hall N and Brockhurst MA

    School of Biological Sciences, Biosciences Building, University of Liverpool, Crown Street, Liverpool L69 7ZB, UK.

    The Red Queen hypothesis proposes that coevolution of interacting species (such as hosts and parasites) should drive molecular evolution through continual natural selection for adaptation and counter-adaptation. Although the divergence observed at some host-resistance and parasite-infectivity genes is consistent with this, the long time periods typically required to study coevolution have so far prevented any direct empirical test. Here we show, using experimental populations of the bacterium Pseudomonas fluorescens SBW25 and its viral parasite, phage Phi2 (refs 10, 11), that the rate of molecular evolution in the phage was far higher when both bacterium and phage coevolved with each other than when phage evolved against a constant host genotype. Coevolution also resulted in far greater genetic divergence between replicate populations, which was correlated with the range of hosts that coevolved phage were able to infect. Consistent with this, the most rapidly evolving phage genes under coevolution were those involved in host infection. These results demonstrate, at both the genomic and phenotypic level, that antagonistic coevolution is a cause of rapid and divergent evolution, and is likely to be a major driver of evolutionary change within species.

    Funded by: Wellcome Trust

    Nature 2010;464;7286;275-8

  • Twenty-eight divergent polysaccharide loci specifying within- and amongst-strain capsule diversity in three strains of Bacteroides fragilis.

    Patrick S, Blakely GW, Houston S, Moore J, Abratt VR, Bertalan M, Cerdeño-Tárraga AM, Quail MA, Corton N, Corton C, Bignell A, Barron A, Clark L, Bentley SD and Parkhill J

    Centre for Infection and Immunity, School of Medicine, Dentistry and Biomedical Sciences, Queen's University Belfast, Medical Biology Centre, 97 Lisburn Road, Belfast BT9 7BL, UK.

    Comparison of the complete genome sequence of Bacteroides fragilis 638R, originally isolated in the USA, was made with two previously sequenced strains isolated in the UK (NCTC 9343) and Japan (YCH46). The presence of 10 loci containing genes associated with polysaccharide (PS) biosynthesis, each including a putative Wzx flippase and Wzy polymerase, was confirmed in all three strains, despite a lack of cross-reactivity between NCTC 9343 and 638R surface PS-specific antibodies by immunolabelling and microscopy. Genomic comparisons revealed an exceptional level of PS biosynthesis locus diversity. Of the 10 divergent PS-associated loci apparent in each strain, none is similar between NCTC 9343 and 638R. YCH46 shares one locus with NCTC 9343, confirmed by mAb labelling, and a second different locus with 638R, making a total of 28 divergent PS biosynthesis loci amongst the three strains. The lack of expression of the phase-variable large capsule (LC) in strain 638R, observed in NCTC 9343, is likely to be due to a point mutation that generates a stop codon within a putative initiating glycosyltransferase, necessary for the expression of the LC in NCTC 9343. Other major sequence differences were observed to arise from different numbers and variety of inserted extra-chromosomal elements, in particular prophages. Extensive horizontal gene transfer has occurred within these strains, despite the presence of a significant number of divergent DNA restriction and modification systems that act to prevent acquisition of foreign DNA. The level of amongst-strain diversity in PS biosynthesis loci is unprecedented.

    Funded by: Wellcome Trust: 061696

    Microbiology (Reading, England) 2010;156;Pt 11;3255-69

  • Genetic evidence that raised sex hormone binding globulin (SHBG) levels reduce the risk of type 2 diabetes.

    Perry JR, Weedon MN, Langenberg C, Jackson AU, Lyssenko V, Sparsø T, Thorleifsson G, Grallert H, Ferrucci L, Maggio M, Paolisso G, Walker M, Palmer CN, Payne F, Young E, Herder C, Narisu N, Morken MA, Bonnycastle LL, Owen KR, Shields B, Knight B, Bennett A, Groves CJ, Ruokonen A, Jarvelin MR, Pearson E, Pascoe L, Ferrannini E, Bornstein SR, Stringham HM, Scott LJ, Kuusisto J, Nilsson P, Neptin M, Gjesing AP, Pisinger C, Lauritzen T, Sandbaek A, Sampson M, MAGIC, Zeggini E, Lindgren CM, Steinthorsdottir V, Thorsteinsdottir U, Hansen T, Schwarz P, Illig T, Laakso M, Stefansson K, Morris AD, Groop L, Pedersen O, Boehnke M, Barroso I, Wareham NJ, Hattersley AT, McCarthy MI and Frayling TM

    Genetics of Complex Traits, Peninsula College of Medicine and Dentistry, University of Exeter, Magdalen Road, Exeter, UK.

    Epidemiological studies consistently show that circulating sex hormone binding globulin (SHBG) levels are lower in type 2 diabetes patients than non-diabetic individuals, but the causal nature of this association is controversial. Genetic studies can help dissect causal directions of epidemiological associations because genotypes are much less likely to be confounded, biased or influenced by disease processes. Using this Mendelian randomization principle, we selected a common single nucleotide polymorphism (SNP) near the SHBG gene, rs1799941, that is strongly associated with SHBG levels. We used data from this SNP, or closely correlated SNPs, in 27 657 type 2 diabetes patients and 58 481 controls from 15 studies. We then used data from additional studies to estimate the difference in SHBG levels between type 2 diabetes patients and controls. The SHBG SNP rs1799941 was associated with type 2 diabetes [odds ratio (OR) 0.94, 95% CI: 0.91, 0.97; P = 2 x 10(-5)], with the SHBG raising allele associated with reduced risk of type 2 diabetes. This effect was very similar to that expected (OR 0.92, 95% CI: 0.88, 0.96), given the SHBG-SNP versus SHBG levels association (SHBG levels are 0.2 standard deviations higher per copy of the A allele) and the SHBG levels versus type 2 diabetes association (SHBG levels are 0.23 standard deviations lower in type 2 diabetic patients compared to controls). Results were very similar in men and women. There was no evidence that this variant is associated with diabetes-related intermediate traits, including several measures of insulin secretion and resistance. Our results, together with those from another recent genetic study, strengthen evidence that SHBG and sex hormones are involved in the aetiology of type 2 diabetes.

    Funded by: Department of Health: DHCS/07/07/008; Medical Research Council: G0000649, G016121, G0601261, MC_U106179471; NHGRI NIH HHS: 1 Z01 HG000024; NIA NIH HHS: R01 AG24233-0; NIDA NIH HHS: U54 DA021519; NIDDK NIH HHS: DK062370, DK069922, DK072193; Wellcome Trust: 076113, 077016/Z/05/Z, 083270/Z/07/Z, GR072960

    Human molecular genetics 2010;19;3;535-44

  • Genome annotation: man versus machine.

    Petty NK

    Nature reviews. Microbiology 2010;8;11;762

  • The Citrobacter rodentium genome sequence reveals convergent evolution with human pathogenic Escherichia coli.

    Petty NK, Bulgin R, Crepin VF, Cerdeño-Tárraga AM, Schroeder GN, Quail MA, Lennard N, Corton C, Barron A, Clark L, Toribio AL, Parkhill J, Dougan G, Frankel G and Thomson NR

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    Citrobacter rodentium (formerly Citrobacter freundii biotype 4280) is a highly infectious pathogen that causes colitis and transmissible colonic hyperplasia in mice. In common with enteropathogenic and enterohemorrhagic Escherichia coli (EPEC and EHEC, respectively), C. rodentium exploits a type III secretion system (T3SS) to induce attaching and effacing (A/E) lesions that are essential for virulence. Here, we report the fully annotated genome sequence of the 5.3-Mb chromosome and four plasmids harbored by C. rodentium strain ICC168. The genome sequence revealed key information about the phylogeny of C. rodentium and identified 1,585 C. rodentium-specific (without orthologues in EPEC or EHEC) coding sequences, 10 prophage-like regions, and 17 genomic islands, including the locus for enterocyte effacement (LEE) region, which encodes a T3SS and effector proteins. Among the 29 T3SS effectors found in C. rodentium are all 22 of the core effectors of EPEC strain E2348/69. In addition, we identified a novel C. rodentium effector, named EspS. C. rodentium harbors two type VI secretion systems (T6SS) (CTS1 and CTS2), while EHEC contains only one T6SS (EHS). Our analysis suggests that C. rodentium and EPEC/EHEC have converged on a common host infection strategy through access to a common pool of mobile DNA and that C. rodentium has lost gene functions associated with a previous pathogenic niche.

    Funded by: Medical Research Council: G0700823

    Journal of bacteriology 2010;192;2;525-38

  • A conserved acetyl esterase domain targets diverse bacteriophages to the Vi capsular receptor of Salmonella enterica serovar Typhi.

    Pickard D, Toribio AL, Petty NK, van Tonder A, Yu L, Goulding D, Barrell B, Rance R, Harris D, Wetter M, Wain J, Choudhary J, Thomson N and Dougan G

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Sulston Building, Hinxton, Cambridge CB10 1SA, United Kingdom.

    A number of bacteriophages have been identified that target the Vi capsular antigen of Salmonella enterica serovar Typhi. Here we show that these Vi phages represent a remarkably diverse set of phages belonging to three phage families, including Podoviridae and Myoviridae. Genome analysis facilitated the further classification of these phages and highlighted aspects of their independent evolution. Significantly, a conserved protein domain carrying an acetyl esterase was found to be associated with at least one tail fiber gene for all Vi phages, and the presence of this domain was confirmed in representative phage particles by mass spectrometric analysis. Thus, we provide a simple explanation and paradigm of how a diverse group of phages target a single key virulence antigen associated with this important human-restricted pathogen.

    Funded by: Wellcome Trust

    Journal of bacteriology 2010;192;21;5746-54

  • Metamotifs--a generative model for building families of nucleotide position weight matrices.

    Piipari M, Down TA and Hubbard TJ

    Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK.

    Background: Development of high-throughput methods for measuring DNA interactions of transcription factors together with computational advances in short motif inference algorithms is expanding our understanding of transcription factor binding site motifs. The consequential growth of sequence motif data sets makes it important to systematically group and categorise regulatory motifs. It has been shown that there are familial tendencies in DNA sequence motifs that are predictive of the family of factors that binds them. Further development of methods that detect and describe familial motif trends has the potential to help in measuring the similarity of novel computational motif predictions to previously known data and sensitively detecting regulatory motifs similar to previously known ones from novel sequence.

    Results: We propose a probabilistic model for position weight matrix (PWM) sequence motif families. The model, which we call the 'metamotif', describes recurring familial patterns in a set of motifs. The metamotif framework models variation within a family of sequence motifs. It allows for simultaneous estimation of a series of independent metamotifs from input position weight matrix (PWM) motif data and does not assume that all input motif columns contribute to a familial pattern. We describe an algorithm for inferring metamotifs from weight matrix data. We then demonstrate the use of the model in two practical tasks: in the Bayesian NestedMICA model inference algorithm as a PWM prior to enhance motif inference sensitivity, and in a motif classification task where motifs are labelled according to their interacting DNA binding domain.

    Conclusions: We show that metamotifs can be used as PWM priors in the NestedMICA motif inference algorithm to dramatically increase the sensitivity to infer motifs. Metamotifs were also successfully applied to a motif classification problem where sequence motif features were used to predict the family of protein DNA binding domains that would interact with it. The metamotif based classifier is shown to compare favourably to previous related methods. The metamotif has great potential for further use in machine learning tasks, especially those related to de novo computational sequence motif inference. The metamotif methods presented have been incorporated into the NestedMICA suite.
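    For readers unfamiliar with the position weight matrix (PWM) that the metamotif builds on, the sketch below shows the basic object: per-column base probabilities estimated from aligned binding sites, and a log-odds score of a candidate sequence against a uniform background. This is a minimal illustration under stated assumptions (a flat pseudocount, a 0.25 background, hypothetical function names); it is not the metamotif model and does not use the NestedMICA API.

```python
import math
from typing import Dict, List

BASES = "ACGT"


def build_pwm(sites: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Estimate column-wise base probabilities from equal-length aligned sites."""
    if not sites:
        raise ValueError("at least one site is required")
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for i in range(length):
        # Start every base at the pseudocount so no probability is zero.
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[i]] += 1  # raises KeyError for non-ACGT symbols
        total = sum(counts.values())
        pwm.append({base: counts[base] / total for base in BASES})
    return pwm


def log_odds_score(pwm: List[Dict[str, float]], seq: str,
                   background: float = 0.25) -> float:
    """Sum of per-position log2(probability / background) for a sequence."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match the PWM width")
    return sum(math.log2(col[base] / background) for col, base in zip(pwm, seq))
```

    A sequence resembling the training sites scores above zero and an unrelated one scores below it. A family-level model such as the one described above would then place a distribution over such columns rather than treating each matrix in isolation.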

    Funded by: Wellcome Trust: 077198, 077198/Z/05/Z

    BMC bioinformatics 2010;11;348

  • iMotifs: an integrated sequence motif visualization and analysis environment.

    Piipari M, Down TA, Saini H, Enright A and Hubbard TJ

    Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK.

    Motivation: Short sequence motifs are an important class of models in molecular biology, used most commonly for describing transcription factor binding site specificity patterns. High-throughput methods have been recently developed for detecting regulatory factor binding sites in vivo and in vitro and consequently high-quality binding site motif data are becoming available for an increasing number of organisms and regulatory factors. Development of intuitive tools for the study of sequence motifs is therefore important. iMotifs is a graphical motif analysis environment that allows visualization of annotated sequence motifs and scored motif hits in sequences. It also offers motif inference with the sensitive NestedMICA algorithm, as well as overrepresentation and pairwise motif matching capabilities. All of the analysis functionality is provided without the need to convert between file formats or learn different command line interfaces. The application includes a bundled and graphically integrated version of the NestedMICA motif inference suite that has no outside dependencies. Problems associated with local deployment of software are therefore avoided.

    Availability: iMotifs is licensed with the GNU Lesser General Public License v2.0 (LGPL 2.0). The software and its source is available at and can be run on Mac OS X Leopard (Intel/PowerPC). We also provide a cross-platform (Linux, OS X, Windows) LGPL 2.0 licensed library libxms for the Perl, Ruby, R and Objective-C programming languages for input and output of XMS formatted annotated sequence motif set files.


    Funded by: Wellcome Trust: 077198, 077198/Z/05/Z

    Bioinformatics (Oxford, England) 2010;26;6;843-4

  • Genome-wide association study reveals multiple loci associated with primary tooth development during infancy.

    Pillas D, Hoggart CJ, Evans DM, O'Reilly PF, Sipilä K, Lähdesmäki R, Millwood IY, Kaakinen M, Netuveli G, Blane D, Charoen P, Sovio U, Pouta A, Freimer N, Hartikainen AL, Laitinen J, Vaara S, Glaser B, Crawford P, Timpson NJ, Ring SM, Deng G, Zhang W, McCarthy MI, Deloukas P, Peltonen L, Elliott P, Coin LJ, Smith GD and Jarvelin MR

    Department of Epidemiology and Public Health, Imperial College London, London, United Kingdom.

    Tooth development is a highly heritable process which relates to other growth and developmental processes, and which interacts with the development of the entire craniofacial complex. Abnormalities of tooth development are common, with tooth agenesis being the most common developmental anomaly in humans. We performed a genome-wide association study of time to first tooth eruption and number of teeth at one year in 4,564 individuals from the 1966 Northern Finland Birth Cohort (NFBC1966) and 1,518 individuals from the Avon Longitudinal Study of Parents and Children (ALSPAC). We identified 5 loci at P<5x10(-8), and 5 with suggestive association (P<5x10(-6)). The loci included several genes with links to tooth and other organ development (KCNJ2, EDA, HOXB2, RAD51L1, IGF2BP1, HMGA2, MSRB3). Genes at four of the identified loci are implicated in the development of cancer. A variant within the HOXB gene cluster was associated with occlusion defects requiring orthodontic treatment by age 31 years.

    Funded by: Medical Research Council: G0500539, G0600705, G0800582; NHLBI NIH HHS: 5R01HL087679-02; NIMH NIH HHS: 1RL1MH083268-01; Wellcome Trust: GR069224

    PLoS genetics 2010;6;2;e1000856

  • A comprehensive catalogue of somatic mutations from a human cancer genome.

    Pleasance ED, Cheetham RK, Stephens PJ, McBride DJ, Humphray SJ, Greenman CD, Varela I, Lin ML, Ordóñez GR, Bignell GR, Ye K, Alipaz J, Bauer MJ, Beare D, Butler A, Carter RJ, Chen L, Cox AJ, Edkins S, Kokko-Gonzales PI, Gormley NA, Grocock RJ, Haudenschild CD, Hims MM, James T, Jia M, Kingsbury Z, Leroy C, Marshall J, Menzies A, Mudie LJ, Ning Z, Royce T, Schulz-Trieglaff OB, Spiridou A, Stebbings LA, Szajkowski L, Teague J, Williamson D, Chin L, Ross MT, Campbell PJ, Bentley DR, Futreal PA and Stratton MR

    Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

    All cancers carry somatic mutations. A subset of these somatic alterations, termed driver mutations, confer selective growth advantage and are implicated in cancer development, whereas the remainder are passengers. Here we have sequenced the genomes of a malignant melanoma and a lymphoblastoid cell line from the same person, providing the first comprehensive catalogue of somatic mutations from an individual cancer. The catalogue provides remarkable insights into the forces that have shaped this cancer genome. The dominant mutational signature reflects DNA damage due to ultraviolet light exposure, a known risk factor for malignant melanoma, whereas the uneven distribution of mutations across the genome, with a lower prevalence in gene footprints, indicates that DNA repair has been preferentially deployed towards transcribed regions. The results illustrate the power of a cancer genome sequence to reveal traces of the DNA damage, repair, mutation and selection processes that were operative years before the cancer became symptomatic.

    Funded by: Wellcome Trust: 077012/Z/05/Z, 088340, 093867

    Nature 2010;463;7278;191-6

  • A small-cell lung cancer genome with complex signatures of tobacco exposure.

    Pleasance ED, Stephens PJ, O'Meara S, McBride DJ, Meynert A, Jones D, Lin ML, Beare D, Lau KW, Greenman C, Varela I, Nik-Zainal S, Davies HR, Ordoñez GR, Mudie LJ, Latimer C, Edkins S, Stebbings L, Chen L, Jia M, Leroy C, Marshall J, Menzies A, Butler A, Teague JW, Mangion J, Sun YA, McLaughlin SF, Peckham HE, Tsung EF, Costa GL, Lee CC, Minna JD, Gazdar A, Birney E, Rhodes MD, McKernan KJ, Stratton MR, Futreal PA and Campbell PJ

    Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

    Cancer is driven by mutation. Worldwide, tobacco smoking is the principal lifestyle exposure that causes cancer, exerting carcinogenicity through >60 chemicals that bind and mutate DNA. Using massively parallel sequencing technology, we sequenced a small-cell lung cancer cell line, NCI-H209, to explore the mutational burden associated with tobacco smoking. A total of 22,910 somatic substitutions were identified, including 134 in coding exons. Multiple mutation signatures testify to the cocktail of carcinogens in tobacco smoke and their proclivities for particular bases and surrounding sequence context. Effects of transcription-coupled repair and a second, more general, expression-linked repair pathway were evident. We identified a tandem duplication that duplicates exons 3-8 of CHD7 in frame, and another two lines carrying PVT1-CHD7 fusion genes, indicating that CHD7 may be recurrently rearranged in this disease. These findings illustrate the potential for next-generation sequencing to provide unprecedented insights into mutational processes, cellular repair pathways and gene networks associated with cancer.

    Funded by: NCI NIH HHS: P50CA70907; Wellcome Trust: 077012, 077012/Z/05/Z, 088340, 093867

    Nature 2010;463;7278;184-90

  • Telomere length in prospective and retrospective cancer case-control studies.

    Pooley KA, Sandhu MS, Tyrer J, Shah M, Driver KE, Luben RN, Bingham SA, Ponder BA, Pharoah PD, Khaw KT, Easton DF and Dunning AM

    Cancer Research UK Genetic Epidemiology Unit, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Li Ka Shing Centre, Cambridge, United Kingdom.

    Previous studies have reported that shorter mean telomere length in lymphocytes was associated with increased susceptibility to common diseases of aging, and may be predictive of cancer risk. However, most analyses have examined retrospectively collected case-control studies. Mean telomere length was measured using high-throughput quantitative real-time PCR. Blood for DNA extraction was collected after cancer diagnosis in the East Anglian SEARCH Breast (2,243 cases and 2,181 controls) and SEARCH Colorectal (2,249 cases and 2,161 controls) studies. Prospective case-control studies were conducted for breast cancer (199 cases) and colorectal cancer (185 cases), nested within the EPIC-Norfolk cohort. Blood was collected at least 6 months prior to diagnosis, and was matched to DNA from two cancer-free controls per case. In the retrospective SEARCH studies, the age-adjusted odds ratio for shortest (Q4) versus longest (Q1) quartile of mean telomere length was 15.5 [95% confidence interval (CI), 11.6-20.8; p-het = 5.7 x 10(-75)], with a "per quartile" P-trend = 2.1 x 10(-80) for breast cancer; and 2.14 (95% CI, 1.77-2.59; p-het = 7.3 x 10(-15)), with a per quartile P-trend = 1.8 x 10(-13) for colorectal cancer. In the prospective EPIC study, the comparable odds ratios (Q4 versus Q1) were 1.58 (95% CI, 0.75-3.31; p-het = 0.23) for breast cancer and 1.13 (95% CI, 0.54-2.36; p-het = 0.75) for colorectal cancer risk. Mean telomere length was shorter in retrospectively collected cases than in controls but the equivalent association was markedly weaker in the prospective studies. This suggests that telomere shortening largely occurs after diagnosis, and therefore, might not be of value in cancer prediction.

    Funded by: Cancer Research UK: A10119, A10123, A10124, A9540, C1287/A9540; Medical Research Council

    Cancer research 2010;70;8;3170-6

  • PARK2 deletions occur frequently in sporadic colorectal cancer and accelerate adenoma development in Apc mutant mice.

    Poulogiannis G, McIntyre RE, Dimitriadi M, Apps JR, Wilson CH, Ichimura K, Luo F, Cantley LC, Wyllie AH, Adams DJ and Arends MJ

    Department of Pathology, University of Cambridge, Cambridge CB2 0QQ, United Kingdom.

    In 100 primary colorectal carcinomas, we demonstrate by array comparative genomic hybridization (aCGH) that 33% show DNA copy number (DCN) loss involving PARK2, the gene encoding PARKIN, the E3 ubiquitin ligase whose deficiency is responsible for a form of autosomal recessive juvenile parkinsonism. PARK2 is located on chromosome 6 (at 6q25-27), a chromosome with one of the lowest overall frequencies of DNA copy number alterations recorded in colorectal cancers. The PARK2 deletions are mostly focal (31% approximately 0.5 Mb on average), heterozygous, and show maximum incidence in exons 3 and 4. As PARK2 lies within FRA6E, a large common fragile site, it has been argued that the observed DCN losses in PARK2 in cancer may represent merely the result of enforced replication of locally vulnerable DNA. However, we show that deficiency in expression of PARK2 is significantly associated with adenomatous polyposis coli (APC) deficiency in human colorectal cancer. Evidence of some PARK2 mutations and promoter hypermethylation is described. PARK2 overexpression inhibits cell proliferation in vitro. Moreover, interbreeding of Park2 heterozygous knockout mice with Apc(Min) mice resulted in a dramatic acceleration of intestinal adenoma development and increased polyp multiplicity. We conclude that PARK2 is a tumor suppressor gene whose haploinsufficiency cooperates with mutant APC in colorectal carcinogenesis.

    Funded by: Cancer Research UK

    Proceedings of the National Academy of Sciences of the United States of America 2010;107;34;15145-50

  • Independent and population-specific association of risk variants at the IRGM locus with Crohn's disease.

    Prescott NJ, Dominy KM, Kubo M, Lewis CM, Fisher SA, Redon R, Huang N, Stranger BE, Blaszczyk K, Hudspith B, Parkes G, Hosono N, Yamazaki K, Onnie CM, Forbes A, Dermitzakis ET, Nakamura Y, Mansfield JC, Sanderson J, Hurles ME, Roberts RG and Mathew CG

    Department of Medical and Molecular Genetics, King's College London School of Medicine, Guy's Hospital, London SE1 9RT, UK.

    DNA polymorphisms in a region on chromosome 5q33.1 which contains two genes, immunity related GTPase related family, M (IRGM) and zinc finger protein 300 (ZNF300), are associated with Crohn's disease (CD). The deleted allele of a 20 kb copy number variation (CNV) upstream of IRGM was recently shown to be in strong linkage disequilibrium (LD) with the CD-associated single nucleotide polymorphisms and is itself associated with CD (P < 0.01). The deletion was correlated with increased or reduced expression of IRGM in transformed cells in a cell line-dependent manner, and has been proposed as a likely causal variant. We report here that small insertion/deletion polymorphisms in the promoter and 5' untranslated region of IRGM are, together with the CNV, strongly associated with CD (P = 1.37 x 10(-5) to 1.40 x 10(-9)), and that the CNV and the 5'-untranslated region variant -308(GTTT)(5) contribute independently to CD susceptibility (P = 2.6 x 10(-7) and P = 2 x 10(-5), respectively). We also show that the CD risk haplotype is associated with a significant decrease in IRGM expression (P < 10(-12)) in untransformed lymphocytes from CD patients. Further analysis of these variants in a Japanese CD case-control sample and of IRGM expression in HapMap populations revealed that neither the IRGM insertion/deletion polymorphisms nor the CNV was associated with CD or with altered IRGM expression in the Asian population. This suggests that the involvement of the IRGM risk haplotype in the pathogenesis of CD requires gene-gene or gene-environment interactions which are absent in Asian populations, or that none of the variants analysed are causal, and that the true causal variants arose after the European-Asian split.

    Funded by: Medical Research Council: G0000934; Wellcome Trust: 068545/Z/02, 081808/

    Human molecular genetics 2010;19;9;1828-39

  • Characterization of pneumonia due to Streptococcus equi subsp. zooepidemicus in dogs.

    Priestnall SL, Erles K, Brooks HW, Cardwell JM, Waller AS, Paillot R, Robinson C, Darby AC, Holden MT and Schöniger S

    Department of Pathology and Infectious Diseases, Royal Veterinary College, Hawkshead Lane, North Mymms, Hatfield, Hertfordshire AL9 7TA, United Kingdom.

    Streptococcus equi subsp. zooepidemicus has been linked to cases of acute fatal pneumonia in dogs in several countries. Outbreaks can occur in kenneled dog populations and result in significant levels of morbidity and mortality. This highly contagious disease is characterized by the sudden onset of clinical signs, including pyrexia, dyspnea, and hemorrhagic nasal discharge. The pathogenesis of S. equi subsp. zooepidemicus infection in dogs is poorly understood. This study systematically characterized the histopathological changes in the lungs of 39 dogs from a large rehoming shelter in London, United Kingdom; the dogs were infected with S. equi subsp. zooepidemicus. An objective scoring system demonstrated that S. equi subsp. zooepidemicus caused pneumonia in 26/39 (66.7%) dogs, and most of these dogs (17/26 [65.4%]) were classified as severe fibrino-suppurative, necrotizing, and hemorrhagic. Three recently described superantigen genes (szeF, szeN, and szeP) were detected by PCR in 17/47 (36.2%) of the S. equi subsp. zooepidemicus isolates; however, there was no association between the presence of these genes and the histopathological score. The lungs of S. equi subsp. zooepidemicus-infected dogs with severe respiratory signs and lung pathology did however have significantly higher mRNA levels of the proinflammatory cytokines tumor necrosis factor alpha (TNF-α), interleukin 6 (IL-6), and interleukin 8 (IL-8) than in uninfected controls, suggesting a role for an exuberant host immune response in the pathogenesis of this disease.

    Funded by: Wellcome Trust

    Clinical and vaccine immunology : CVI 2010;17;11;1790-6

  • Genetic variants at 2q24 are associated with susceptibility to type 2 diabetes.

    Qi L, Cornelis MC, Kraft P, Stanya KJ, Linda Kao WH, Pankow JS, Dupuis J, Florez JC, Fox CS, Paré G, Sun Q, Girman CJ, Laurie CC, Mirel DB, Manolio TA, Chasman DI, Boerwinkle E, Ridker PM, Hunter DJ, Meigs JB, Lee CH, Hu FB, van Dam RM, Meta-Analysis of Glucose and Insulin-related traits Consortium (MAGIC) and Diabetes Genetics Replication and Meta-analysis (DIAGRAM) Consortium

    Department of Nutrition, Harvard School of Public Health, and Brigham and Women's Hospital, Boston, MA, USA.

    To identify type 2 diabetes (T2D) susceptibility loci, we conducted genome-wide association (GWA) scans in nested case-control samples from two prospective cohort studies, including 2591 patients and 3052 controls of European ancestry. Validation was performed in 11 independent GWA studies of 10,870 cases and 73,735 controls. We identified significantly associated variants near RBMS1 and ITGB6 genes at 2q24, best-represented by SNP rs7593730 (combined OR=0.90, 95% CI=0.86-0.93; P=3.7x10(-8)). The frequency of the risk-lowering allele T is 0.23. Variants in this region were nominally related to lower fasting glucose and HOMA-IR in the MAGIC consortium (P<0.05). These data suggest that the 2q24 locus may influence the T2D risk by affecting glucose metabolism and insulin resistance.

    Funded by: NCI NIH HHS: CA047988, CA136792, CA54281, CA63464, P01CA089392, P01CA055075, P01CA087969, Z01CP010200; NCRR NIH HHS: UL1RR025005; NHGRI NIH HHS: U01HG004436, U01HG004399, U01HG004402, U01HG004415, U01HG004422, U01HG004423, U01HG004438, U01HG004446, U01HG004729, U01HG004726, U01HG004728, U01HG004735, U01HG004738, U01HG04424; NHLBI NIH HHS: HL043851, HL69757, N01-HC-55022, N01-HC-55018, N01-HC-25195, N01-HC-55015, N01-HC-55016, N01-HC-55019, N01-HC-55020, N01-HC-55021, N02-HL-6-427, R01 HL071981-07, R01 HL71981, R01HL086694, R01HL087641, R01HL59367; NIAAA NIH HHS: U10AA008401; NIDA NIH HHS: R01DA013423; NIDCR NIH HHS: U01DE018993, U01DE018903; NIDDK NIH HHS: DK46200, K01-DK067207, K23 DK65978, K24 DK080140, R01DK058845, R01DK075046, R01DK078616, R90DK071507, T90 DK070078, T90 DK070078-05; PHS HHS: HHSN268200625226C, HHSN268200782096C, RFAHG006033

    Human molecular genetics 2010;19;13;2706-15

  • A human gut microbial gene catalogue established by metagenomic sequencing.

    Qin J, Li R, Raes J, Arumugam M, Burgdorf KS, Manichanh C, Nielsen T, Pons N, Levenez F, Yamada T, Mende DR, Li J, Xu J, Li S, Li D, Cao J, Wang B, Liang H, Zheng H, Xie Y, Tap J, Lepage P, Bertalan M, Batto JM, Hansen T, Le Paslier D, Linneberg A, Nielsen HB, Pelletier E, Renault P, Sicheritz-Ponten T, Turner K, Zhu H, Yu C, Li S, Jian M, Zhou Y, Li Y, Zhang X, Li S, Qin N, Yang H, Wang J, Brunak S, Doré J, Guarner F, Kristiansen K, Pedersen O, Parkhill J, Weissenbach J, MetaHIT Consortium, Bork P, Ehrlich SD and Wang J

    BGI-Shenzhen, Shenzhen 518083, China.

    To understand the impact of gut microbes on human health and well-being it is crucial to assess their genetic potential. Here we describe the Illumina-based metagenomic sequencing, assembly and characterization of 3.3 million non-redundant microbial genes, derived from 576.7 gigabases of sequence, from faecal samples of 124 European individuals. The gene set, approximately 150 times larger than the human gene complement, contains an overwhelming majority of the prevalent (more frequent) microbial genes of the cohort and probably includes a large proportion of the prevalent human intestinal microbial genes. The genes are largely shared among individuals of the cohort. Over 99% of the genes are bacterial, indicating that the entire cohort harbours between 1,000 and 1,150 prevalent bacterial species and each individual at least 160 such species, which are also largely shared. We define and describe the minimal gut metagenome and the minimal gut bacterial genome in terms of functions present in all individuals and most bacteria, respectively.

    Nature 2010;464;7285;59-65

  • Genome-wide mapping and assembly of structural variant breakpoints in the mouse genome.

    Quinlan AR, Clark RA, Sokolova S, Leibowitz ML, Zhang Y, Hurles ME, Mell JC and Hall IM

    Department of Biochemistry and Molecular Genetics, University of Virginia School of Medicine, Charlottesville, Virginia 22908, USA.

    Structural variation (SV) is a rich source of genetic diversity in mammals, but due to the challenges associated with mapping SV in complex genomes, basic questions regarding their genomic distribution and mechanistic origins remain unanswered. We have developed an algorithm (HYDRA) to localize SV breakpoints by paired-end mapping, and a general approach for the genome-wide assembly and interpretation of breakpoint sequences. We applied these methods to two inbred mouse strains: C57BL/6J and DBA/2J. We demonstrate that HYDRA accurately maps diverse classes of SV, including those involving repetitive elements such as transposons and segmental duplications; however, our analysis of the C57BL/6J reference strain shows that incomplete reference genome assemblies are a major source of noise. We report 7196 SVs between the two strains, more than two-thirds of which are due to transposon insertions. Of the remainder, 59% are deletions (relative to the reference), 26% are insertions of unlinked DNA, 9% are tandem duplications, and 6% are inversions. To investigate the origins of SV, we characterized 3316 breakpoint sequences at single-nucleotide resolution. We find that approximately 16% of non-transposon SVs have complex breakpoint patterns consistent with template switching during DNA replication or repair, and that this process appears to preferentially generate certain classes of complex variants. Moreover, we find that SVs are significantly enriched in regions of segmental duplication, but that this effect is largely independent of DNA sequence homology and thus cannot be explained by non-allelic homologous recombination (NAHR) alone. This result suggests that the genetic instability of such regions is often the cause rather than the consequence of duplicated genomic architecture.

    Funded by: NHGRI NIH HHS: 1F32HG005197-01; NIH HHS: DP2OD006493-01

    Genome research 2010;20;5;623-35

  • PiggyBac transposon mutagenesis: a tool for cancer gene discovery in mice.

    Rad R, Rad L, Wang W, Cadinanos J, Vassiliou G, Rice S, Campos LS, Yusa K, Banerjee R, Li MA, de la Rosa J, Strong A, Lu D, Ellis P, Conte N, Yang FT, Liu P and Bradley A

    Wellcome Trust Sanger Institute, Genome Campus, Hinxton-Cambridge CB10 1SA, UK.

    Transposons are mobile DNA segments that can disrupt gene function by inserting in or near genes. Here, we show that insertional mutagenesis by the PiggyBac transposon can be used for cancer gene discovery in mice. PiggyBac transposition in genetically engineered transposon-transposase mice induced cancers whose type (hematopoietic versus solid) and latency were dependent on the regulatory elements introduced into transposons. Analysis of 63 hematopoietic tumors revealed that PiggyBac is capable of genome-wide mutagenesis. The PiggyBac screen uncovered many cancer genes not identified in previous retroviral or Sleeping Beauty transposon screens, including Spic, which encodes a PU.1-related transcription factor, and Hdac7, a histone deacetylase gene. PiggyBac and Sleeping Beauty have different integration preferences. To maximize the utility of the tool, we engineered 21 mouse lines to be compatible with both transposon systems in constitutive, tissue- or temporal-specific mutagenesis. Mice with different transposon types, copy numbers, and chromosomal locations support wide applicability.

    Funded by: Wellcome Trust: 077186, 079643

    Science (New York, N.Y.) 2010;330;6007;1104-7

  • Human aging-associated DNA hypermethylation occurs preferentially at bivalent chromatin domains.

    Rakyan VK, Down TA, Maslau S, Andrew T, Yang TP, Beyan H, Whittaker P, McCann OT, Finer S, Valdes AM, Leslie RD, Deloukas P and Spector TD

    Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London E1 2AT, UK.

    There is a growing realization that some aging-associated phenotypes/diseases have an epigenetic basis. Here, we report the first genome-scale study of epigenomic dynamics during normal human aging. We identify aging-associated differentially methylated regions (aDMRs) in whole blood in a discovery cohort, and then replicate these aDMRs in sorted CD4(+) T-cells and CD14(+) monocytes in an independent cohort, suggesting that aDMRs occur in precursor haematopoietic cells. Further replication of the aDMRs in buccal cells, representing a tissue that originates from a different germ layer compared with blood, demonstrates that the aDMR signature is a multitissue phenomenon. Moreover, we demonstrate that aging-associated DNA hypermethylation occurs predominantly at bivalent chromatin domain promoters. This same category of promoters, associated with key developmental genes, is frequently hypermethylated in cancers and in vitro cell culture, pointing to a novel mechanistic link between aberrant hypermethylation in cancer, aging, and cell culture.

    Funded by: Medical Research Council; Wellcome Trust

    Genome research 2010;20;4;434-9

  • Peptidase inhibitors in the MEROPS database.

    Rawlings ND

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.

    The MEROPS website includes information on peptidase inhibitors as well as on peptidases and their substrates. Displays have been put in place to link peptidases and inhibitors together. The classification of protein peptidase inhibitors is continually being revised, and currently inhibitors are grouped into 67 families based on comparisons of protein sequences. These families can be further grouped into 38 clans based on comparisons of tertiary structure. Small molecule inhibitors are important reagents for peptidase characterization and, with the increasing importance of peptidases as drug targets, they are also important to the pharmaceutical industry. Small molecule inhibitors are now included in MEROPS and over 160 summaries have been written.

    Funded by: Wellcome Trust: WT077044/Z/05/Z

    Biochimie 2010;92;11;1463-83

  • MEROPS: the peptidase database.

    Rawlings ND, Barrett AJ and Bateman A

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.

    Peptidases, their substrates and inhibitors are of great relevance to biology, medicine and biotechnology. The MEROPS database aims to fulfil the need for an integrated source of information about these. The database has a hierarchical classification in which homologous sets of peptidases and protein inhibitors are grouped into protein species, which are grouped into families, which are in turn grouped into clans. The classification framework is used for attaching information at each level. An important focus of the database has become distinguishing one peptidase from another through identifying the specificity of the peptidase in terms of where it will cleave substrates and with which inhibitors it will interact. We have collected over 39,000 known cleavage sites in proteins, peptides and synthetic substrates. These allow us to display peptidase specificity and alignments of protein substrates to give an indication of how well a cleavage site is conserved, and thus its probable physiological relevance. While the number of new peptidase families and clans has only grown slowly the number of complete genomes has greatly increased. This has allowed us to add an analysis tool to the relevant species pages to show significant gains and losses of peptidase genes relative to related species.

    Funded by: Wellcome Trust: WT077044/Z/05/Z

    Nucleic acids research 2010;38;Database issue;D227-33

  • It's alive!

    Reid AJ

    Nature reviews. Microbiology 2010;8;7;468

  • CODA: accurate detection of functional associations between proteins in eukaryotic genomes using domain fusion.

    Reid AJ, Ranea JA, Clegg AB and Orengo CA

    Wellcome Trust Sanger Institute, Cambridge, United Kingdom.

    Background: In order to understand how biological systems function it is necessary to determine the interactions and associations between proteins. Gene fusion prediction is one approach to detection of such functional relationships. Its use is however known to be problematic in higher eukaryotic genomes due to the presence of large homologous domain families. Here we introduce CODA (Co-Occurrence of Domains Analysis), a method to predict functional associations based on the gene fusion idiom.

    We apply a novel scoring scheme which takes account of the genome-specific size of homologous domain families involved in fusion to improve accuracy in predicting functional associations. We show that CODA is able to accurately predict functional similarities in human with comparison to state-of-the-art methods and show that different methods can be complementary. CODA is used to produce evidence that a currently uncharacterised human protein may be involved in pathways related to depression and that another is involved in DNA replication.

    The relative performance of different gene fusion methodologies has not previously been explored. We find that they are largely complementary, with different methods being more or less appropriate in different genomes. Our method is the only one currently available for download and can be run on an arbitrary dataset by the user. The CODA software and datasets are freely available, and predictions are also available via web services.

    Funded by: Biotechnology and Biological Sciences Research Council

    PloS one 2010;5;6;e10908

  • Genome-wide association study identifies five loci associated with lung function.

    Repapi E, Sayers I, Wain LV, Burton PR, Johnson T, Obeidat M, Zhao JH, Ramasamy A, Zhai G, Vitart V, Huffman JE, Igl W, Albrecht E, Deloukas P, Henderson J, Granell R, McArdle WL, Rudnicka AR, Wellcome Trust Case Control Consortium, Barroso I, Loos RJ, Wareham NJ, Mustelin L, Rantanen T, Surakka I, Imboden M, Wichmann HE, Grkovic I, Jankovic S, Zgaga L, Hartikainen AL, Peltonen L, Gyllensten U, Johansson A, Zaboli G, Campbell H, Wild SH, Wilson JF, Gläser S, Homuth G, Völzke H, Mangino M, Soranzo N, Spector TD, Polasek O, Rudan I, Wright AF, Heliövaara M, Ripatti S, Pouta A, Naluai AT, Olin AC, Torén K, Cooper MN, James AL, Palmer LJ, Hingorani AD, Wannamethee SG, Whincup PH, Smith GD, Ebrahim S, McKeever TM, Pavord ID, MacLeod AK, Morris AD, Porteous DJ, Cooper C, Dennison E, Shaheen S, Karrasch S, Schnabel E, Schulz H, Grallert H, Bouatia-Naji N, Delplanque J, Froguel P, Blakey JD, NSHD Respiratory Study Team, Britton JR, Morris RW, Holloway JW, Lawlor DA, Hui J, Nyberg F, Jarvelin MR, Jackson C, Kähönen M, Kaprio J, Probst-Hensch NM, Koch B, Hayward C, Evans DM, Elliott P, Strachan DP, Hall IP and Tobin MD

    Departments of Health Sciences and Genetics, Adrian Building, University of Leicester, Leicester, UK.

    Pulmonary function measures are heritable traits that predict morbidity and mortality and define chronic obstructive pulmonary disease (COPD). We tested genome-wide association with forced expiratory volume in 1 s (FEV(1)) and the ratio of FEV(1) to forced vital capacity (FVC) in the SpiroMeta consortium (n = 20,288 individuals of European ancestry). We conducted a meta-analysis of top signals with data from direct genotyping (n ≤ 32,184 additional individuals) and in silico summary association data from the CHARGE Consortium (n = 21,209) and the Health 2000 survey (n ≤ 883). We confirmed the reported locus at 4q31 and identified associations with FEV(1) or FEV(1)/FVC and common variants at five additional loci: 2q35 in TNS1 (P = 1.11 x 10(-12)), 4q24 in GSTCD (P = 2.18 x 10(-23)), 5q33 in HTR4 (P = 4.29 x 10(-9)), 6p21 in AGER (P = 3.07 x 10(-15)) and 15q23 in THSD4 (P = 7.24 x 10(-15)). mRNA analyses showed expression of TNS1, GSTCD, AGER, HTR4 and THSD4 in human lung tissue. These associations offer mechanistic insight into pulmonary function regulation and indicate potential targets for interventions to alleviate respiratory disease.

    Funded by: Biotechnology and Biological Sciences Research Council; British Heart Foundation: PG/06/154/22043, PG/97012, RG/08/013/25942; Cancer Research UK; Chief Scientist Office: CZB/4/710, CZD/16/6/2, CZD/16/6/4; Department of Health: 0020029; Medical Research Council: G0000934, G0000943, G0401540, G0500539, G0501942, G0600705, G0800582, G0801056, G0902125, G9815508, G990146, MC_U106179471, MC_U106188470, MC_U123092720, MC_U123092721, MC_U127561128, MC_UP_A620_1014, U.1230.00.008.00005.02; NHLBI NIH HHS: 5R01HL087679-02; NIDDK NIH HHS: U01 DK062418; NIMH NIH HHS: 1RL1MH083268-01, 5R01MH63706:02; Wellcome Trust: 068545/Z/02, 075883, 076113/B/04/Z, 077016/Z/05/Z, 079895, 086160/Z/08/A

    Nature genetics 2010;42;1;36-44

  • Using randomised vectors in transcription factor binding site predictions.

    Rezwan F, Sun Y, Davey N, Adams R, Rust AG and Robinson M

    Proceedings - 9th International Conference on Machine Learning and Applications, ICMLA 2010;5708880

  • Cross-species chromosome painting in bats from Madagascar: the contribution of Myzopodidae to revealing ancestral syntenies in Chiroptera.

    Richards LR, Rambau RV, Lamb JM, Taylor PJ, Yang F, Schoeman MC and Goodman SM

    School of Biological and Conservation Sciences, University of KwaZulu-Natal, Westville Campus, Durban, 4001, South Africa.

    The chiropteran fauna of Madagascar comprises eight of the 19 recognized families of bats, including the endemic Myzopodidae. While recent systematic studies of Malagasy bats have contributed to our understanding of the morphological and genetic diversity of the island's fauna, little is known about their cytosystematics. Here we investigate karyotypic relationships among four species, representing four families of Chiroptera endemic to the Malagasy region using cross-species chromosome painting with painting probes of Myotis myotis: Myzopodidae (Myzopoda aurita, 2n = 26), Molossidae (Mormopterus jugularis, 2n = 48), Miniopteridae (Miniopterus griveaudi, 2n = 46), and Vespertilionidae (Myotis goudoti, 2n = 44). This study represents the first time a member of the family Myzopodidae has been investigated using chromosome painting. Painting probes of M. myotis were used to delimit 29, 24, 23, and 22 homologous chromosomal segments in the genomes of M. aurita, M. jugularis, M. griveaudi, and M. goudoti, respectively. Comparison of GTG-banded homologous chromosomes/chromosomal segments among the four species revealed the genome of M. aurita has been structured through 14 fusions of chromosomes and chromosomal segments of M. myotis chromosomes leading to a karyotype consisting solely of bi-armed chromosomes. In addition, chromosome painting revealed a novel X-autosome translocation in M. aurita. Comparison of our results with published chromosome maps provided further evidence for karyotypic conservatism within the genera Mormopterus, Miniopterus, and Myotis. Mapping of chromosomal rearrangements onto a molecular consensus phylogeny revealed ancestral syntenies shared between Myzopoda and other bat species of the infraorders Pteropodiformes and Vespertilioniformes. Our study provides further evidence for the involvement of Robertsonian (Rb) translocations and fusions/fissions in chromosomal evolution within Chiroptera.

    Funded by: Wellcome Trust

    Chromosome research : an international journal on the molecular, supramolecular and evolutionary aspects of chromosome biology 2010;18;6;635-53

  • A multilocus genetic risk score for coronary heart disease: case-control and prospective cohort analyses.

    Ripatti S, Tikkanen E, Orho-Melander M, Havulinna AS, Silander K, Sharma A, Guiducci C, Perola M, Jula A, Sinisalo J, Lokki ML, Nieminen MS, Melander O, Salomaa V, Peltonen L and Kathiresan S

    Institute for Molecular Medicine Finland FIMM, University of Helsinki, Helsinki, Finland.

    Background: Comparison of patients with coronary heart disease and controls in genome-wide association studies has revealed several single nucleotide polymorphisms (SNPs) associated with coronary heart disease. We aimed to establish the external validity of these findings and to obtain more precise risk estimates using a prospective cohort design.

    Methods: We tested 13 recently discovered SNPs for association with coronary heart disease in a case-control design including participants differing from those in the discovery samples (3829 participants with prevalent coronary heart disease and 48,897 controls free of the disease) and a prospective cohort design including 30,725 participants free of cardiovascular disease from Finland and Sweden. We modelled the 13 SNPs as a multilocus genetic risk score and used Cox proportional hazards models to estimate the association of genetic risk score with incident coronary heart disease. For case-control analyses we analysed associations between individual SNPs and quintiles of genetic risk score using logistic regression.

    Findings: In prospective cohort analyses, 1264 participants had a first coronary heart disease event during a median 10·7 years' follow-up (IQR 6·7-13·6). Genetic risk score was associated with a first coronary heart disease event. When compared with the bottom quintile of genetic risk score, participants in the top quintile were at 1·66-times increased risk of coronary heart disease in a model adjusting for traditional risk factors (95% CI 1·35-2·04, p value for linear trend=7·3×10(-10)). Adjustment for family history did not change these estimates. Genetic risk score did not improve C index over traditional risk factors and family history (p=0·19), nor did it have a significant effect on net reclassification improvement (2·2%, p=0·18); however, it did have a small effect on integrated discrimination index (0·004, p=0·0006). Results of the case-control analyses were similar to those of the prospective cohort analyses.

    Interpretation: Using a genetic risk score based on 13 SNPs associated with coronary heart disease, we can identify the 20% of individuals of European ancestry who are at roughly 70% increased risk of a first coronary heart disease event. The potential clinical use of this panel of SNPs remains to be defined.

    Funding: The Wellcome Trust; Academy of Finland Center of Excellence for Complex Disease Genetics; US National Institutes of Health; the Donovan Family Foundation.

    Funded by: NHLBI NIH HHS: R01 HL087676; Wellcome Trust: WT089061/Z/09/Z, WT089062/Z/09/Z

    Lancet 2010;376;9750;1393-400

  • Data analysis issues for allele-specific expression using Illumina's GoldenGate assay.

    Ritchie ME, Forrest MS, Dimas AS, Daelemans C, Dermitzakis ET, Deloukas P and Tavaré S

    Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Victoria, 3052, Australia.

    Background: High-throughput measurement of allele-specific expression (ASE) is a relatively new and exciting application area for array-based technologies. In this paper, we explore several data sets which make use of Illumina's GoldenGate BeadArray technology to measure ASE. This platform exploits coding SNPs to obtain relative expression measurements for alleles at approximately 1500 positions in the genome.

    Results: We analyze data from a mixture experiment where genomic DNA samples from pairs of individuals of known genotypes are pooled to create allelic imbalances at varying levels for the majority of SNPs on the array. We observe that GoldenGate has less sensitivity at detecting subtle allelic imbalances (around 1.3 fold) compared to extreme imbalances, and note the benefit of applying local background correction to the data. Analysis of data from a dye-swap control experiment allowed us to quantify dye-bias, which can be reduced considerably by careful normalization. The need to filter the data before carrying out further downstream analysis to remove non-responding probes, which show either weak, or non-specific signal for each allele, was also demonstrated. Throughout this paper, we find that a linear model analysis of the data from each SNP is a flexible modelling strategy that allows for testing of allelic imbalances in each sample when replicate hybridizations are available.

    Conclusions: Our analysis shows that local background correction carried out by Illumina's software, together with quantile normalization of the red and green channels within each array, provides optimal performance in terms of false positive rates. In addition, we strongly encourage intensity-based filtering to remove SNPs which only measure non-specific signal. We anticipate that a similar analysis strategy will prove useful when quantifying ASE on Illumina's higher density Infinium BeadChips.

    BMC bioinformatics 2010;11;280

  • ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level.

    Rocca-Serra P, Brandizi M, Maguire E, Sklyar N, Taylor C, Begley K, Field D, Harris S, Hide W, Hofmann O, Neumann S, Sterk P, Tong W and Sansone SA

    The European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK.

    The ISA software suite is the first open source software suite for experimentalists and curators that (i) assists in the annotation and local management of experimental metadata from high-throughput studies employing one or a combination of omics and other technologies; (ii) empowers users to uptake community-defined checklists and ontologies; and (iii) facilitates submission to international public repositories. Availability and Implementation: Software, documentation, case studies and implementations are available online.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/E025080/1, BB/G000638/1, BB/I000917/1

    Bioinformatics (Oxford, England) 2010;26;18;2354-6

  • Lung infections in cystic fibrosis: deriving clinical insight from microbial complexity.

    Rogers GB, Stressmann FA, Walker AW, Carroll MP and Bruce KD

    Molecular Microbiology Research Laboratory, Pharmaceutical Science Division, 150 Stamford Street, Franklin-Wilkins Building, King's College London, London, SE1 9NH, UK.

    Lower respiratory tract bacterial infections, such as those associated with cystic fibrosis lung disease, represent a major healthcare burden. Treatment strategies are currently informed by culture-based routine diagnostics whose limitations, including an inability to isolate all potentially clinically significant bacterial species present in a sample, are well documented. Some advances have resulted from the introduction of culture-independent molecular assays for the detection of specific pathogens. However, the application of bacterial community profiling techniques to the characterization of these infections has revealed much higher levels of microbial diversity than previously recognized. These findings are leading to a fundamental shift in the way such infections are considered. Increasingly, polymicrobial infections are being viewed as complex communities of interacting organisms, with dynamic processes key to their pathogenicity. Such a model requires an analytical strategy that provides insight into the interactions of all members of the infective community. The rapid advance in sequencing technology, along with protocols that limit analysis to viable bacterial cells, is for the first time providing an opportunity to gain such insight.

    Expert review of molecular diagnostics 2010;10;2;187-96

  • Identification and characterization of two novel JARID1C mutations: suggestion of an emerging genotype-phenotype correlation.