Sanger Institute - Publications 2007

Number of papers published in 2007: 315

  • Predicted functions and linkage specificities of the products of the Streptococcus pneumoniae capsular biosynthetic loci.

    Aanensen DM, Mavroidi A, Bentley SD, Reeves PR and Spratt BG

    Department of Infectious Disease Epidemiology, Imperial College London, Room G22, Old Medical School Building, St. Mary's Hospital, Norfolk Place, London W2 1PG, United Kingdom.

    The sequences of the capsular biosynthetic (cps) loci of 90 serotypes of Streptococcus pneumoniae have recently been determined. Bioinformatic procedures were used to predict the general functions of 1,973 of the 1,999 gene products and to identify proteins within the same homology group, Pfam family, and CAZy glycosyltransferase family. Correlating cps gene content with the 54 known capsular polysaccharide (CPS) structures provided tentative assignments of the specific functions of the different homology groups of each functional class (regulatory proteins, enzymes for synthesis of CPS constituents, polymerases, flippases, initial sugar transferases, glycosyltransferases [GTs], phosphotransferases, acetyltransferases, and pyruvyltransferases). Assignment of the glycosidic linkages catalyzed by the 342 GTs (92 homology groups) is problematic, but tentative assignments could be made by using this large set of cps loci and CPS structures to correlate the presence of particular GTs with specific glycosidic linkages, by correlating inverting or retaining linkages in CPS repeat units with the inverting or retaining mechanisms of the GTs predicted from their CAZy family membership, and by comparing the CPS structures of serotypes that have very similar cps gene contents. These large-scale comparisons between structure and gene content assigned the linkages catalyzed by 72% of the GTs, and all linkages were assigned in 32 of the serotypes with known repeat unit structures. Clear examples where very similar initial sugar transferases or glycosyltransferases catalyze different linkages in different serotypes were also identified. These assignments should provide a stimulus for biochemical studies to evaluate the reactions that are proposed.

    Funded by: Wellcome Trust

    Journal of bacteriology 2007;189;21;7856-76

  • WebACT: an online genome comparison suite.

    Abbott JC, Aanensen DM and Bentley SD

    Centre for Bioinformatics, Imperial College London, UK.

    Comparison of related genomes is an enormously powerful technique for explaining phenotypic differences and revealing recent evolutionary events. Genomes evolve through a host of mechanisms including long- and short-range intragenomic rearrangements, insertion of laterally acquired DNA, gene loss, and single-nucleotide polymorphisms. The Artemis Comparison Tool (ACT) was developed to enable the intuitive visualization of the consequences of such events in the context of two or more aligned genomes. WebACT is an online resource designed to allow the alignment of up to five genomic sequences within the ACT environment without the need for local software installation. Comparisons can be carried out between uploaded sequences, or those selected from the EMBL or RefSeq databases, using BLASTZ, MUMmer, or Basic Local Alignment Search Tool (BLAST). Precomputed comparisons can be selected from a database covering all the completed bacterial chromosome and plasmid sequences in the Genome Reviews database (1). This allows the rapid visualization of regions of interest, without the need to handle the full genome sequences. Here, we describe the process of using WebACT to prepare comparisons for visualization, and the selection of precomputed comparisons from the database. The use of ACT to view the selected comparison is then explored using examples from bacterial genomes.

    Funded by: Wellcome Trust

    Methods in molecular biology (Clifton, N.J.) 2007;395;57-74

  • Candidate live, attenuated Salmonella enterica serotype Typhimurium vaccines with reduced fecal shedding are immunogenic and effective oral vaccines.

    Abd El Ghany M, Jansen A, Clare S, Hall L, Pickard D, Kingsley RA and Dougan G

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.

    Environmental shedding of genetically manipulated microorganisms is an issue impeding the development of new live vaccines. We have investigated the immunogenicity of a number of novel Salmonella enterica serotype Typhimurium oral vaccine candidates that express the fragment C (TetC) component of tetanus toxin and harbor combinations of additional mutations in genes shdA, misL, and ratB that contribute to the persistence of serotype Typhimurium's colonization of the intestine. Serotype Typhimurium aroA (TetC) derivatives harboring additional mutations in either shdA or misL or combinations of these mutations exhibited a marked decrease in shedding of the vaccine strain in the feces of orally vaccinated mice. However, equivalent levels of anti-TetC and anti-Salmonella lipopolysaccharide immunoglobulin G (IgG), IgG1, IgG2a, and IgA were detected in sera of the vaccinated but not of the control mice. Cellular immune responses to TetC were detected in all vaccinated mice, regardless of the presence of the additional mutations in shdA or misL. Further, immunization with serotype Typhimurium aroA candidate vaccines harboring shdA and misL afforded complete protection against challenge with a virulent strain of serotype Typhimurium.

    Funded by: Wellcome Trust

    Infection and immunity 2007;75;4;1835-42

  • BCL11B is required for positive selection and survival of double-positive thymocytes.

    Albu DI, Feng D, Bhattacharya D, Jenkins NA, Copeland NG, Liu P and Avram D

    Center for Cell Biology and Cancer Research, Albany Medical College, Albany, NY 12208, USA.

    Transcriptional control of gene expression in double-positive (DP) thymocytes remains poorly understood. We show that the transcription factor BCL11B plays a critical role in DP thymocytes by controlling positive selection of both CD4 and CD8 lineages. BCL11B-deficient DP thymocytes rearrange T cell receptor (TCR) alpha; however, they display impaired proximal TCR signaling and attenuated extracellular signal-regulated kinase phosphorylation and calcium flux, which are all required for initiation of positive selection. Further, provision of transgenic TCRs did not improve positive selection of BCL11B-deficient DP thymocytes. BCL11B-deficient DP thymocytes have altered expression of genes with a role in positive selection, TCR signaling, and other signaling pathways intersecting the TCR, which may account for the defect. BCL11B-deficient DP thymocytes also presented increased susceptibility to spontaneous apoptosis associated with high levels of cleaved caspase-3 and an altered balance of proapoptotic/prosurvival factors. This latter susceptibility was manifested even in the absence of TCR signaling and was only partially rescued by provision of the BCL2 transgene, indicating that control of DP thymocyte survival by BCL11B is nonredundant and, at least in part, independent of BCL2 prosurvival factors.

    Funded by: NHLBI NIH HHS: T32-HL-07194; NIAID NIH HHS: R01 AI067846, R01 AI067846-01A2; NIAMS NIH HHS: K01 AR-02194

    The Journal of experimental medicine 2007;204;12;3003-15

  • Mutations in TCF4, encoding a class I basic helix-loop-helix transcription factor, are responsible for Pitt-Hopkins syndrome, a severe epileptic encephalopathy associated with autonomic dysfunction.

    Amiel J, Rio M, de Pontual L, Redon R, Malan V, Boddaert N, Plouin P, Carter NP, Lyonnet S, Munnich A and Colleaux L

    From the Departments of Genetics, Pediatric Radiology and INSERM U-797, Universite Paris-Descartes, Faculte de Medecine, Hopitaux de Paris, Hopital Necker-Enfants Malades, Paris, France.

    Pitt-Hopkins syndrome (PHS) is a rare syndromic encephalopathy characterized by daily bouts of hyperventilation and a facial gestalt. We report a 1.8-Mb de novo microdeletion on chromosome 18q21.1, identified by array-comparative genomic hybridization in one patient with PHS. We subsequently identified two de novo heterozygous missense mutations of a conserved amino acid in the basic region of the TCF4 gene in three additional subjects with PHS. These findings demonstrate that TCF4 anomalies are responsible for PHS and provide the first evidence of a human disorder related to class I basic helix-loop-helix transcription-factor defects (also known as "E proteins"). Moreover, our data may shed new light on the normal processes underlying autonomic nervous system development and maintenance of an appropriate ventilatory neuronal circuitry.

    Funded by: Wellcome Trust

    American journal of human genetics 2007;80;5;988-93

  • SISYPHUS--structural alignments for proteins with non-trivial relationships.

    Andreeva A, Prlić A, Hubbard TJ and Murzin AG

    MRC Centre for Protein Engineering, Hills Road, Cambridge CB2 2QH, UK.

    With the increasing amount of structural data, the number of homologous protein structures bearing topological irregularities is steadily growing. These include proteins with circular permutations, segment-swapping, context-dependent folding or chameleon sequences that can adopt alternative secondary structures. Their non-trivial structural relationships are readily identified during expert analysis but their automatic identification using the existing computational tools still remains difficult or impossible. Such non-trivial cases of protein relationships are known to pose a problem to multiple alignment algorithms and to impede comparative modeling studies. They support a new emerging concept of evolutionary changeable protein fold, which creates practical difficulties for the hierarchical classifications of protein structures. To facilitate the understanding of, and to provide a comprehensive annotation of proteins with such non-trivial structural relationships we have created SISYPHUS (Σίσυφος--in Greek crafty), a compendium to the SCOP database. The SISYPHUS database contains a collection of manually curated structural alignments and their inter-relationships. The multiple alignments are constructed for protein structural regions that range from oligomeric biological units, or individual domains to fragments of different size. The SISYPHUS multiple alignments are displayed with SPICE, a browser that provides an integrated view of protein sequences, structures and their annotations. The database is available from

    Funded by: Medical Research Council: MC_U105192716; Wellcome Trust: 077198

    Nucleic acids research 2007;35;Database issue;D253-9

  • Karyotypic evolution and phylogenetic relationships in the order Chiroptera as revealed by G-banding comparison and chromosome painting.

    Ao L, Mao X, Nie W, Gu X, Feng Q, Wang J, Su W, Wang Y, Volleth M and Yang F

    Key Laboratory of Cellular and Molecular Evolution, Kunming Institute of Zoology, and Graduate School of the Chinese Academy of Sciences, Kunming, Yunnan, 650223, P.R. China.

    Bats are a unique but enigmatic group of mammals and have a world-wide distribution. The phylogenetic relationships of extant bats are far from being resolved. Here, we investigated the karyotypic relationships of representative species from four families of the order Chiroptera by comparative chromosome painting and banding. A complete set of painting probes derived from flow-sorted chromosomes of Myotis myotis (family Vespertilionidae) was hybridized onto metaphases of Cynopterus sphinx (2n=34, family Pteropodidae), Rhinolophus sinicus (2n=36, family Rhinolophidae) and Aselliscus stoliczkanus (2n=30, family Hipposideridae) and delimited 27, 30 and 25 conserved chromosomal segments in the three genomes, respectively. The results substantiate that Robertsonian translocation is the main mode of chromosome evolution in the order Chiroptera, with extensive conservation of whole chromosomal arms. The use of M. myotis (2n=44) probes has enabled the integration of C. sphinx, R. sinicus and A. stoliczkanus chromosomes into the previously established comparative maps between human and Eonycteris spelaea (2n=36), Rhinolophus mehelyi (2n=58), Hipposideros larvatus (2n=32), and M. myotis. Our results provide the first cytogenetic signature rearrangement that supports the grouping of Pteropodidae and Rhinolophoidea in a common clade (i.e. Pteropodiformes or Yinpterochiroptera) and thus improve our understanding of the karyotypic relationships and genome phylogeny of these bat species.

    Chromosome research : an international journal on the molecular, supramolecular and evolutionary aspects of chromosome biology 2007;15;3;257-67

  • Functional characterization of the Plasmodium falciparum and P. berghei homologues of macrophage migration inhibitory factor.

    Augustijn KD, Kleemann R, Thompson J, Kooistra T, Crawford CE, Reece SE, Pain A, Siebum AH, Janse CJ and Waters AP

    Department of Parasitology, LUMC, Albinusdreef 2, Room P4-35, 2333 ZA Leiden, The Netherlands.

    Macrophage migration inhibitory factor (MIF) is a mammalian cytokine that participates in innate and adaptive immune responses. Homologues of mammalian MIF have been discovered in parasite species infecting mammalian hosts (nematodes and malaria parasites), which suggests that the parasites express MIF to modulate the host immune response upon infection. Here we report the first biochemical and genetic characterization of a Plasmodium MIF (PMIF). Like human MIF, histidine-tagged purified recombinant PMIF shows tautomerase and oxidoreductase activities (although the activities are reduced compared to those of histidine-tagged human MIF) and efficiently inhibits AP-1 activity in human embryonic kidney cells. Furthermore, we found that Plasmodium berghei MIF is expressed in both a mammalian host and a mosquito vector and that, in blood stages, it is secreted into the infected erythrocytes and released upon schizont rupture. Mutant P. berghei parasites lacking PMIF were able to complete the entire life cycle and exhibited no significant changes in growth characteristics or virulence features during blood stage infection. However, rodent hosts infected with knockout parasites had significantly higher numbers of circulating reticulocytes. Our results suggest that PMIF is produced by the parasite to influence host immune responses and the course of anemia upon infection.

    Funded by: Wellcome Trust: 072171

    Infection and immunity 2007;75;3;1116-28

  • The genome of Salmonella enterica serovar Typhi.

    Baker S and Dougan G

    The Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.

    The generation of complete genome sequences provides a blueprint that facilitates the genetic characterization of pathogens and their hosts. The genome of Salmonella enterica serovar Typhi (S. Typhi) harbors ~5 million base pairs encoding some 4000 genes, of which >200 are functionally inactive. Comparison of S. Typhi isolates from around the world indicates that they are highly related (clonal) and that they emerged from a single point of origin ~30,000-50,000 years ago. Evidence suggests that, as well as undergoing gene degradation, S. Typhi has also recently acquired genes, such as those encoding the Vi antigen, by horizontal transfer events.

    Funded by: Wellcome Trust

    Clinical infectious diseases : an official publication of the Infectious Diseases Society of America 2007;45 Suppl 1;S29-33

  • A novel linear plasmid mediates flagellar variation in Salmonella Typhi.

    Baker S, Hardy J, Sanderson KE, Quail M, Goodhead I, Kingsley RA, Parkhill J, Stocker B and Dougan G

    The Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, United Kingdom.

    Unlike the majority of Salmonella enterica serovars, Salmonella Typhi (S. Typhi), the etiological agent of human typhoid, is monophasic. S. Typhi normally harbours only the phase 1 flagellin gene (fliC), which encodes the H:d antigen. However, some S. Typhi strains found in Indonesia express an additional flagellin antigen termed H:z66. Molecular analysis of H:z66+ S. Typhi revealed that the H:z66 flagellin structural gene (fljB(z66)) is encoded on a linear plasmid that we have named pBSSB1. The DNA sequence of pBSSB1 was determined to be just over 27 kbp, and was predicted to encode 33 coding sequences. To our knowledge, pBSSB1 is the first non-bacteriophage-related linear plasmid to be described in the Enterobacteriaceae.

    Funded by: NIAID NIH HHS: NIH-AI034829

    PLoS pathogens 2007;3;5;e59

  • A linear plasmid truncation induces unidirectional flagellar phase change in H:z66 positive Salmonella Typhi.

    Baker S, Holt K, Whitehead S, Goodhead I, Perkins T, Stocker B, Hardy J and Dougan G

    The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.

    The process by which bacteria regulate flagellar expression is known as phase variation and in Salmonella enterica this process permits the expression of one of two flagellin genes, fliC or fljB, at any one time. Salmonella Typhi (S. Typhi) is normally not capable of phase variation of flagellar antigen expression as isolates only harbour the fliC gene (H:d) and lack an equivalent fljB locus. However, some S. Typhi isolates, exclusively from Indonesia, harbour an fljB equivalent encoded on a linear plasmid, pBSSB1, that drives the expression of a novel flagellin named H:z66. H:z66+ S. Typhi isolates were stimulated to change flagellar phase and genetically analysed for the mechanism of variation. The phase change was demonstrated to be unidirectional, reverting to expression from the resident chromosomal fliC gene. DNA sequencing demonstrated that pBSSB1 linear DNA was still detectable but that these derivatives had undergone deletion and were lacking fljA(z66) (encoding a flagellar repressor) and fljB(z66). The deletion end-point was found to involve one of the plasmid termini and a palindromic repeat sequence within fljB(z66), distinct from that found at the terminus of pBSSB1. These data demonstrate that, like some Streptomyces linear elements, at least one of the terminal inverted repeats of pBSSB1 is non-essential, but that a palindromic repeat sequence may be necessary for replication.

    Funded by: Wellcome Trust: 076962

    Molecular microbiology 2007;66;5;1207-18

  • Cross-species chromosome painting among camel, cattle, pig and human: further insights into the putative Cetartiodactyla ancestral karyotype.

    Balmus G, Trifonov VA, Biltueva LS, O'Brien PC, Alkalaeva ES, Fu B, Skidmore JA, Allen T, Graphodatsky AS, Yang F and Ferguson-Smith MA

    Cambridge Resource Centre for Comparative Genomics, Department of Veterinary Medicine, Cambridge, UK.

    The great karyotypic differences between camel, cattle and pig, three important domestic animals, have been a challenge for comparative cytogenetic studies based on conventional cytogenetic approaches. To construct a genome-wide comparative chromosome map among these artiodactyls, we made a set of chromosome painting probes from the dromedary camel (Camelus dromedarius) by flow sorting and degenerate oligonucleotide primed-PCR. The painting probes were first used to characterize the karyotypes of the dromedary camel (C. dromedarius), the Bactrian camel (C. bactrianus), the guanaco (Lama guanicoe), the alpaca (L. pacos) and dromedary x guanaco hybrid karyotypes (all with 2n = 74). These FISH experiments enabled the establishment of a high-resolution GTG-banded karyotype, together with chromosome nomenclature and idiogram for C. dromedarius, and revealed that these camelid species have almost identical karyotypes, with only slight variations in the amount and distribution patterns of heterochromatin. Further cross-species chromosome painting between camel, cattle, pig and human with painting probes from the camel and human led to the establishment of genome-wide comparative maps. Between human and camel, pig and camel, and cattle and camel 47, 53 and 53 autosomal conserved segments were detected, respectively. Integrated analysis with previously published comparative maps of human/pig/cattle enabled us to propose a Cetartiodactyla ancestral karyotype and to discuss the early karyotype evolution of Cetartiodactyla. Furthermore, these maps will facilitate the positional cloning of genes by aiding the cross-species transfer of mapping information.

    Funded by: Wellcome Trust

    Chromosome research : an international journal on the molecular, supramolecular and evolutionary aspects of chromosome biology 2007;15;4;499-515

  • 'Species' of peptidases.

    Barrett AJ and Rawlings ND

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK.

    A good system for the naming and classification of peptidases can contribute much to the study of these enzymes. Having already described the building of families and clans in the MEROPS system, we here focus on the lowest level in the hierarchy, in which the huge number of individual peptidase proteins are assigned to a lesser number of what we term 'species' of peptidases. Just over 2000 peptidase species are recognised today, but we estimate that 25 000 will one day be known. Each species is built around a peptidase protein that has been adequately characterised. The cluster of peptidase proteins that represent the single species is then assembled primarily by analysis of a sequence 'tree' for the family. Each peptidase species is given a systematic identifier and a summary page of data regarding it is assembled. Because the characterisation of new peptidases lags far behind the sequencing, the majority of peptidase proteins are so far known only as amino acid sequences and cannot yet be assigned to species. We suggest that new forms of analysis of the sequences of the unassigned peptidases may give early indications of how they will cluster into the new species of the future.

    Biological chemistry 2007;388;11;1151-7

  • Biallelic mutation of MSH2 in primary human cells is associated with sensitivity to irradiation and altered RAD51 foci kinetics.

    Barwell J, Pangon L, Hodgson S, Georgiou A, Kesterton I, Slade T, Taylor M, Payne SJ, Brinkman H, Smythe J, Sebire NJ, Solomon E, Docherty Z, Camplejohn R, Homfray T and Morris JR

    Department of Genetics, St. George's Medical School, University of London, Cranmer Terrace, London, UK.

    Background: Reports of differential mutagen sensitivity conferred by a defect in the mismatch repair (MMR) pathway are inconsistent in their conclusions. Previous studies have investigated cells established from immortalised human colorectal tumour lines or cells from animal models.

    Methods: We examined primary human MSH2-deficient neonatal cells, bearing a biallelic truncating mutation in MSH2, for viability and chromosomal damage after exposure to DNA-damaging agents.

    Results: MSH2-deficient cells exhibit no response to interstrand DNA cross-linking agents but do show reduced viability in response to irradiation. They also show increased chromosome damage and exhibit altered RAD51 foci kinetics after irradiation exposure, indicating defective homologous recombinational repair.

    Discussion: The cellular features and sensitivity of MSH2-deficient primary human cells are broadly in agreement with observations of primary murine cells lacking the same gene. The data therefore support the view that the murine model recapitulates early features of MMR deficiency in humans, and imply that the variable data reported for MMR-deficient immortalised human cells may be due to further genetic or epigenetic lesions. We suggest caution in the use of radiotherapy for treatment of malignancies in individuals with functional loss of MSH2.

    Journal of medical genetics 2007;44;8;516-20

  • Prediction of noncoding transcripts

    Bateman A

    Bioinformatics. 2007;103-116

  • SCOOP: a simple method for identification of novel protein superfamily relationships.

    Bateman A and Finn RD

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK.

    Motivation: Profile searches of sequence databases are a sensitive way to detect sequence relationships. Sophisticated profile-profile comparison algorithms that have been recently introduced increase search sensitivity even further.

    Results: In this article, a simpler approach than profile-profile comparison is presented that has a comparable performance to state-of-the-art tools such as COMPASS, HHsearch and PRC. This approach is called SCOOP (Simple Comparison Of Outputs Program), and is shown to find known relationships between families in the Pfam database as well as detect novel distant relationships between families. Several novel discoveries are presented including the discovery that a domain of unknown function (DUF283) found in Dicer proteins is related to double-stranded RNA-binding domains.

    Availability: SCOOP is freely available under a GNU GPL license from

    Supplementary data are available at Bioinformatics online.

    Funded by: Wellcome Trust: 087656

    Bioinformatics (Oxford, England) 2007;23;7;809-14


    Bateman, A

    Nucleic Acids Res. 2007;35;D1

  • The Genographic Project public participation mitochondrial DNA database.

    Behar DM, Rosset S, Blue-Smith J, Balanovsky O, Tzur S, Comas D, Mitchell RJ, Quintana-Murci L, Tyler-Smith C, Wells RS and Genographic Consortium

    The Genographic Project is studying the genetic signatures of ancient human migrations and creating an open-source research database. It allows members of the public to participate in a real-time anthropological genetics study by submitting personal samples for analysis and donating the genetic results to the database. We report our experience from the first 18 months of public participation in the Genographic Project, during which we have created the largest standardized human mitochondrial DNA (mtDNA) database ever collected, comprising 78,590 genotypes. Here, we detail our genotyping and quality assurance protocols including direct sequencing of the mtDNA HVS-I, genotyping of 22 coding-region SNPs, and a series of computational quality checks based on phylogenetic principles. This database is very informative with respect to mtDNA phylogeny and mutational dynamics, and its size allows us to develop a nearest neighbor-based methodology for mtDNA haplogroup prediction based on HVS-I motifs that is superior to classic rule-based approaches. We make available to the scientific community and general public two new resources: a periodically updated database comprising all data donated by participants, and the nearest neighbor haplogroup prediction tool.

    Funded by: Wellcome Trust

    PLoS genetics 2007;3;6;e104
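
    The nearest neighbor idea mentioned in the abstract above can be illustrated with a minimal sketch. It is not the Genographic Project's tool: the reference panel, the variant positions, the distance measure (count of differing positions) and the function names are all assumptions made for illustration.

```python
from collections import Counter

# Toy reference panel: (set of HVS-I variant positions, haplogroup label).
# Positions and labels are invented for illustration; they are not real data.
REFERENCE = [
    (frozenset({16069, 16126}), "J"),
    (frozenset({16069, 16126, 16145}), "J"),
    (frozenset({16224, 16311}), "K"),
    (frozenset({16224, 16311, 16093}), "K"),
    (frozenset({16223, 16278}), "X"),
]


def distance(a, b):
    """Number of positions present in one motif but not the other."""
    return len(a ^ b)


def predict_haplogroup(motif, reference=REFERENCE):
    """Return the majority label among the reference motifs closest to `motif`."""
    motif = frozenset(motif)
    best = min(distance(motif, ref) for ref, _ in reference)
    votes = Counter(
        label for ref, label in reference if distance(motif, ref) == best
    )
    return votes.most_common(1)[0][0]
```

    A query motif that shares most of its variants with one cluster of reference motifs is assigned that cluster's label, even when it carries a private variant that a fixed rule might not anticipate.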

  • Nodal signaling activates differentiation genes during zebrafish gastrulation.

    Bennett JT, Joubin K, Cheng S, Aanstad P, Herwig R, Clark M, Lehrach H and Schier AF

    Developmental Genetics Program, Skirball Institute of Biomolecular Medicine, and Department of Cell Biology, New York University School of Medicine, New York, NY 10016, USA.

    Nodal signals induce mesodermal and endodermal progenitors during vertebrate development. To determine the role of Nodal signaling at a genomic level, we isolated Nodal-regulated genes by expression profiling using macroarrays and gene expression databases. Putative Nodal-regulated genes were validated by in situ hybridization screening in wild type and Nodal signaling mutants. 46 genes were identified, raising the currently known number of Nodal-regulated genes to 72. Based on their expression patterns along the dorsoventral axis, most of these genes can be classified into two groups. One group is expressed in the dorsal margin, whereas the other group is expressed throughout the margin. In addition to transcription factors and signaling components, the screens identified several new functional classes of Nodal-regulated genes, including cytoskeletal components and molecules involved in protein secretion or endoplasmic reticulum stress. We found that x-box binding protein-1 (xbp1) is a direct target of Nodal signaling and required for the terminal differentiation of the hatching gland, a specialized secretory organ whose specification is also dependent on Nodal signaling. These results indicate that Nodal signaling regulates not only specification genes but also differentiation genes.

    Funded by: NIGMS NIH HHS: R01 GM056211-06, R01 GM056211-07, R01 GM056211-08, R01 GM056211-09, R01 GM056211-10

    Developmental biology 2007;304;2;525-40

  • Bacterial therapeutics.

    Bentley S and Sebaihia M

    Nature reviews. Microbiology 2007;5;3;170-1

  • Meningococcal genetic variation mechanisms viewed through comparative analysis of serogroup C strain FAM18.

    Bentley SD, Vernikos GS, Snyder LA, Churcher C, Arrowsmith C, Chillingworth T, Cronin A, Davis PH, Holroyd NE, Jagels K, Maddison M, Moule S, Rabbinowitsch E, Sharp S, Unwin L, Whitehead S, Quail MA, Achtman M, Barrell B, Saunders NJ and Parkhill J

    Wellcome Trust Sanger Institute, Hinxton, United Kingdom.

    The bacterium Neisseria meningitidis is commonly found harmlessly colonising the mucosal surfaces of the human nasopharynx. Occasionally strains can invade host tissues causing septicaemia and meningitis, making the bacterium a major cause of morbidity and mortality in both the developed and developing world. The species is known to be diverse in many ways, as a product of its natural transformability and of a range of recombination and mutation-based systems. Previous work on pathogenic Neisseria has identified several mechanisms for the generation of diversity of surface structures, including phase variation based on slippage-like mechanisms and sequence conversion of expressed genes using information from silent loci. Comparison of the genome sequences of two N. meningitidis strains, serogroup B MC58 and serogroup A Z2491, suggested further mechanisms of variation, including C-terminal exchange in specific genes and enhanced localised recombination and variation related to repeat arrays. We have sequenced the genome of N. meningitidis strain FAM18, a representative of the ST-11/ET-37 complex, providing the first genome sequence for the disease-causing serogroup C meningococci; it has 1,976 predicted genes, of which 60 do not have orthologues in the previously sequenced serogroup A or B strains. Through genome comparison with Z2491 and MC58 we have further characterised specific mechanisms of genetic variation in N. meningitidis, describing specialised loci for generation of cell surface protein variants and measuring the association between noncoding repeat arrays and sequence variation in flanking genes. Here we provide a detailed view of novel genetic diversification mechanisms in N. meningitidis. Our analysis provides evidence for the hypothesis that the noncoding repeat arrays in neisserial genomes (neisserial intergenic mosaic elements) provide a crucial mechanism for the generation of surface antigen variants. Such variation will have an impact on the interaction with the host tissues, and understanding these mechanisms is important to aid our understanding of the intimate and complex relationship between the human nasopharynx and the meningococcus.

    Funded by: Wellcome Trust

    PLoS genetics 2007;3;2;e23

  • Interactions among genes in the ErbB-Neuregulin signalling network are associated with increased susceptibility to schizophrenia.

    Benzel I, Bansal A, Browning BL, Galwey NW, Maycox PR, McGinnis R, Smart D, St Clair D, Yates P and Purvis I

    Psychiatry CEDD, GlaxoSmithKline, New Frontiers Science Park, Third Avenue, Harlow, Essex CM19 5AW, UK.

    Background: Evidence of genetic association between the NRG1 (Neuregulin-1) gene and schizophrenia is now well-documented. Furthermore, several recent reports suggest association between schizophrenia and single-nucleotide polymorphisms (SNPs) in ERBB4, one of the receptors for Neuregulin-1. In this study, we have extended the previously published associations by investigating the involvement of all eight genes from the ERBB and NRG families for association with schizophrenia.

    Methods: Eight genes from the ERBB and NRG families were tested for association with schizophrenia using a collection of 396 cases and 1,342 blood bank controls ascertained from Aberdeen, UK. A total of 365 SNPs were tested. Association testing of both alleles and genotypes was carried out using the fast Fisher's Exact Test (FET). To understand better the nature of the associations, all pairs of SNPs separated by ≥ 0.5 cM with at least nominal evidence of association (P < 0.10) were tested for evidence of pairwise interaction by logistic regression analysis.

    Results: 42 out of 365 tested SNPs in the eight genes from the ERBB and NRG gene families were significantly associated with schizophrenia (P < 0.05). Associated SNPs were located in ERBB4 and NRG1, confirming earlier reports. However, novel associations were also seen in NRG2, NRG3 and EGFR. In pairwise interaction tests, clear evidence of gene-gene interaction was detected for NRG1-NRG2, NRG1-NRG3 and EGFR-NRG2, and suggestive evidence was also seen for ERBB4-NRG1, ERBB4-NRG2, ERBB4-NRG3 and ERBB4-ERBB2. Evidence of intragenic interaction was seen for SNPs in ERBB4.

    Conclusion: These new findings suggest that observed associations between NRG1 and schizophrenia may be mediated through functional interaction not just with ERBB4, but with other members of the NRG and ERBB families. There is evidence that genetic interaction among these loci may increase susceptibility to schizophrenia.

    Behavioral and brain functions : BBF 2007;3;31

  • Helminth initiative for drug discovery

    Berriman M

    Expert Opinion on Drug Discovery 2007;2;Suppl 1

  • Variety is the spice of eukaryotic life.

    Berriman M and Pain A

    Nature reviews. Microbiology 2007;5;9;660-1

  • WormBase: new content and better access.

    Bieri T, Blasiar D, Ozersky P, Antoshechkin I, Bastiani C, Canaran P, Chan J, Chen N, Chen WJ, Davis P, Fiedler TJ, Girard L, Han M, Harris TW, Kishore R, Lee R, McKay S, Müller HM, Nakamura C, Petcherski A, Rangarajan A, Rogers A, Schindelman G, Schwarz EM, Spooner W, Tuli MA, Van Auken K, Wang D, Wang X, Williams G, Durbin R, Stein LD, Sternberg PW and Spieth J

    Genome Sequencing Center, Washington University School of Medicine, St Louis, MO 63108, USA.

    WormBase, a model organism database for Caenorhabditis elegans and other related nematodes, continues to evolve and expand. Over the past year WormBase has added new data on C.elegans, including data on classical genetics, cell biology and functional genomics; expanded the annotation of closely related nematodes with a new genome browser for Caenorhabditis remanei; and deployed new hardware for stronger performance. Several existing datasets including phenotype descriptions and RNAi experiments have seen a large increase in new content. New datasets such as the C.remanei draft assembly and annotations, the Vancouver Fosmid library and TEC-RED 5' end sites are now available as well. Access to and searching WormBase has become more dependable and flexible via multiple mirror sites and indexing through Google.

    Funded by: NHGRI NIH HHS: P41-HG02223

    Nucleic acids research 2007;35;Database issue;D506-10

  • Architectures of somatic genomic rearrangement in human cancer amplicons at sequence-level resolution.

    Bignell GR, Santarius T, Pole JC, Butler AP, Perry J, Pleasance E, Greenman C, Menzies A, Taylor S, Edkins S, Campbell P, Quail M, Plumb B, Matthews L, McLay K, Edwards PA, Rogers J, Wooster R, Futreal PA and Stratton MR

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, United Kingdom.

    For decades, cytogenetic studies have demonstrated that somatically acquired structural rearrangements of the genome are a common feature of most classes of human cancer. However, the characteristics of these rearrangements at sequence-level resolution have thus far been subject to very limited description. One process that is dependent upon somatic genome rearrangement is gene amplification, a mechanism often exploited by cancer cells to increase copy number and hence expression of dominantly acting cancer genes. The mechanisms underlying gene amplification are complex but must involve chromosome breakage and rejoining. We sequenced 133 different genomic rearrangements identified within four cancer amplicons involving the frequently amplified cancer genes MYC, MYCN, and ERBB2. The observed architectures of rearrangement were diverse and highly distinctive, with evidence for sister chromatid breakage-fusion-bridge cycles, formation and reinsertion of double minutes, and the presence of bizarre clusters of small genomic fragments. There were characteristic features of sequences at the breakage-fusion junctions, indicating roles for nonhomologous end joining and homologous recombination-mediated repair mechanisms together with nontemplated DNA synthesis. Evidence was also found for sequence-dependent variation in susceptibility of the genome to somatic rearrangement. The results therefore provide insights into the DNA breakage and repair processes operative in somatic genome rearrangement and illustrate how the evolutionary histories of individual cancers can be reconstructed from large-scale cancer genome sequencing.

    Funded by: Wellcome Trust

    Genome research 2007;17;9;1296-303

  • RPS6KA2, a putative tumour suppressor gene at 6q27 in sporadic epithelial ovarian cancer.

    Bignone PA, Lee KY, Liu Y, Emilion G, Finch J, Soosay AE, Charnock FM, Beck S, Dunham I, Mungall AJ and Ganesan TS

    Cancer Research UK, Molecular Oncology Laboratories, Ovarian Cancer Group, Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, Headington, Oxford, UK.

    We had previously defined by allele loss studies a minimal region at 6q27 (between D6S264 and D6S297) to contain a putative tumour suppressor gene. The p90 ribosomal S6 kinase-3 gene (p90 Rsk-3, RPS6KA2) maps in this interval. It is a serine-threonine kinase that signals downstream of the mitogen-activated protein kinase pathway. It is expressed in normal ovarian epithelium, whereas its expression is reduced or absent in tumours or cell lines. We show that RPS6KA2 is monoallelically expressed in the ovary, suggesting that loss of a single expressed allele is sufficient to cause complete loss of expression in cancer cells. Further, we have identified two new isoforms of RPS6KA2 with an alternative start codon. Homozygous deletions were identified within the RPS6KA2 gene in two cell lines. Re-expression of RPS6KA2 in ovarian cancer cell lines suppressed colony formation. In UCI101 cells, the expression of RPS6KA2 reduced proliferation, caused G1 arrest, increased apoptosis, reduced levels of phosphorylated extracellular signal-regulated kinase and altered other cell cycle proteins. In contrast, small interfering RNA against RPS6KA2 showed the opposite effect in 41M cells. The above results suggest that RPS6KA2 is a putative tumour suppressor gene to explain allele loss at 6q27.

    Funded by: Wellcome Trust

    Oncogene 2007;26;5;683-700

  • Fast-evolving noncoding sequences in the human genome.

    Bird CP, Stranger BE, Liu M, Thomas DJ, Ingle CE, Beazley C, Miller W, Hurles ME and Dermitzakis ET

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK.

    Background: Gene regulation is considered one of the driving forces of evolution. Although protein-coding DNA sequences and RNA genes have been subject to recent evolutionary events in the human lineage, it has been hypothesized that the large phenotypic divergence between humans and chimpanzees has been driven mainly by changes in gene regulation rather than altered protein-coding gene sequences. Comparative analysis of vertebrate genomes has revealed an abundance of evolutionarily conserved but noncoding sequences. These conserved noncoding (CNC) sequences may well harbor critical regulatory variants that have driven recent human evolution.

    Results: Here we identify 1,356 CNC sequences that appear to have undergone dramatic human-specific changes in selective pressures, at least 15% of which have substitution rates significantly above that expected under neutrality. The 1,356 'accelerated CNC' (ANC) sequences are enriched in recent segmental duplications, suggesting a recent change in selective constraint following duplication. In addition, single nucleotide polymorphisms within ANC sequences have a significant excess of high frequency derived alleles and high F(ST) values relative to controls, indicating that acceleration and positive selection are recent in human populations. Finally, a significant number of single nucleotide polymorphisms within ANC sequences are associated with changes in gene expression. The probability of variation in an ANC sequence being associated with a gene expression phenotype is fivefold higher than variation in a control CNC sequence.

    Conclusion: Our analysis suggests that ANC sequences have until very recently played a role in human evolution, potentially through lineage-specific changes in gene regulation.

    Funded by: Wellcome Trust

    Genome biology 2007;8;6;R118

  • Cell attachment properties and infectivity of host-adapted and environmentally adapted Citrobacter rodentium.

    Bishop AL, Wiles S, Dougan G and Frankel G

    Division of Cell and Molecular Biology, Flowers Building, Imperial College London, Exhibition Road, London SW7 2AZ, UK.

    Citrobacter rodentium belongs to a family of extracellular enteric pathogens that include enterohaemorrhagic and enteropathogenic Escherichia coli, which colonises the gastrointestinal mucosa by the attaching and effacing (A/E) mechanism. We previously described the appearance of a 'hyper-infectious' state after passage of C. rodentium through the murine gastrointestinal tract. Here we report that host-adapted C. rodentium is able to efficiently adhere and trigger actin polymerisation on cultured epithelial cells. Consistent with these observations we recorded higher levels of expression of genes carried on the LEE pathogenicity island and type III secretion system effector genes carried on prophages compared with in vitro-grown bacteria; importantly, the level of ler gene expression was unchanged. These phenotypes were lost after shed C. rodentium was adapted to the external environment. Upon exposure of C57Bl/6 mice, environmentally adapted C. rodentium was no longer infectious at the low doses associated with host-adapted bacteria and the bacteria were found to be localised in the caecal patch in a similar way to C. rodentium cultured in laboratory media. Thus, the 'hyper-infectious' host-adapted state, allowing efficient transmission and colonisation of naive hosts, is transient in nature and gradually lost after shedding into the environment.

    Funded by: Wellcome Trust: 071006

    Microbes and infection / Institut Pasteur 2007;9;11;1316-24

  • My 2,000 best films: parallel phenotyping of Dictyostelium development.

    Bloomfield G and Kay RR

    MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 0QH, UK.

    A new study has used parallel filming to record the development of 2,000 Dictyostelium mutants, and clustered them into related groups using morphological staging and wavelet analysis of aggregation patterns.

    Genome biology 2007;8;7;220

  • Genome plasticity of BCG and impact on vaccine efficacy.

    Brosch R, Gordon SV, Garnier T, Eiglmeier K, Frigui W, Valenti P, Dos Santos S, Duthoy S, Lacroix C, Garcia-Pelayo C, Inwald JK, Golby P, Garcia JN, Hewinson RG, Behr MA, Quail MA, Churcher C, Barrell BG, Parkhill J and Cole ST

    Unité de Génétique Moléculaire Bactérienne, Institut Pasteur, 28 Rue du Docteur Roux, 75724 Paris Cedex 15, France.

    To understand the evolution, attenuation, and variable protective efficacy of bacillus Calmette-Guérin (BCG) vaccines, Mycobacterium bovis BCG Pasteur 1173P2 has been subjected to comparative genome and transcriptome analysis. The 4,374,522-bp genome contains 3,954 protein-coding genes, 58 of which are present in two copies as a result of two independent tandem duplications, DU1 and DU2. DU1 is restricted to BCG Pasteur, although four forms of DU2 exist; DU2-I is confined to early BCG vaccines, like BCG Japan, whereas DU2-III and DU2-IV occur in the late vaccines. The glycerol-3-phosphate dehydrogenase gene, glpD2, is one of only three genes common to all four DU2 variants, implying that BCG requires higher levels of this enzyme to grow on glycerol. Further amplification of the DU2 region is ongoing, even within vaccine preparations used to immunize humans. An evolutionary scheme for BCG vaccines was established by analyzing DU2 and other markers. Lesions in genes encoding sigma-factors and pleiotropic transcriptional regulators, like PhoR and Crp, were also uncovered in various BCG strains; together with gene amplification, these affect gene expression levels, immunogenicity, and, possibly, protection against tuberculosis. Furthermore, the combined findings suggest that early BCG vaccines may even be superior to the later ones that are more widely used.

    Funded by: Wellcome Trust

    Proceedings of the National Academy of Sciences of the United States of America 2007;104;13;5596-601

  • Avian genomics in the 21st century.

    Burt DW and White SJ

    Department of Genomics and Genetics, Roslin Institute (Edinburgh), Roslin, Midlothian, UK.

    The chicken has long been an important model organism for developmental biology, as well as a major source of protein with billions of birds used in meat and egg production each year. Chicken genomics has been transformed in recent years, with the characterisation of large EST collections and most recently with the assembly of the chicken genome sequence. As the first livestock genome to be fully sequenced it leads the way for others to follow, with zebra finch later this year. The genome sequence and the availability of three million genetic polymorphisms are expected to aid the identification of genes that control traits of importance in poultry. As the first bird genome to be sequenced it is a model for the remaining 9,600 species thought to exist today. Many of the features of avian biology and organisation of the chicken genome make it an ideal model organism for phylogenetics and embryology, along with applications in agriculture and medicine. The availability of new tools such as whole-genome gene expression arrays and SNP panels, coupled with information resources on the genes and proteins, is likely to enhance this position.

    Funded by: Wellcome Trust: 062023

    Cytogenetic and genome research 2007;117;1-4;6-13

  • Generation of an inducible and optimized piggyBac transposon system.

    Cadiñanos J and Bradley A

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.

    Genomic studies in the mouse have been slowed by the lack of transposon-mediated mutagenesis. However, since the resurrection of Sleeping Beauty (SB), the possibility of performing forward genetics in mice has been reinforced. Recently, piggyBac (PB), a functional transposon from insects, was also described to work in mammals. As the activity of PB is higher than that of SB11 and SB12, two hyperactive SB transposases, we have characterized and improved the PB system in mouse ES cells. We have generated a mouse codon-optimized version of the PB transposase coding sequence (CDS) which provides transposition levels greater than the original. We have also found that the promoter sequence predicted in the 5'-terminal repeat of the PB transposon is active in the mammalian context. Finally, we have engineered inducible versions of the optimized piggyBac transposase fused with ERT2. One of them, when induced, provides higher levels of transposition than the native piggyBac CDS, whereas in the absence of induction its activity is indistinguishable from background. We expect that these tools, adaptable to perform mouse-germline mutagenesis, will facilitate the identification of genes involved in pathological and physiological processes, such as cancer or ES cell differentiation.

    Funded by: Wellcome Trust

    Nucleic acids research 2007;35;12;e87

  • Loss of the mismatch repair protein MSH6 in human glioblastomas is associated with tumor progression during temozolomide treatment.

    Cahill DP, Levine KK, Betensky RA, Codd PJ, Romany CA, Reavie LB, Batchelor TT, Futreal PA, Stratton MR, Curry WT, Iafrate AJ and Louis DN

    Molecular Pathology Unit, Neurosurgical Service, Brain Tumor Center, and Center for Cancer Research, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA.

    Purpose: Glioblastomas are treated by surgical resection followed by radiotherapy [X-ray therapy (XRT)] and the alkylating chemotherapeutic agent temozolomide. Recently, inactivating mutations in the mismatch repair gene MSH6 were identified in two glioblastomas recurrent post-temozolomide. Because mismatch repair pathway inactivation is a known mediator of alkylator resistance in vitro, these findings suggested that MSH6 inactivation was causally linked to these two recurrences. However, the extent of involvement of MSH6 in glioblastoma is unknown. We sought to determine the overall frequency and clinical relevance of MSH6 alterations in glioblastomas.

    Experimental Design: The MSH6 gene was sequenced in 54 glioblastomas. MSH6 and O(6)-methylguanine methyltransferase (MGMT) immunohistochemistry was systematically scored in a panel of 46 clinically well-characterized glioblastomas, and the corresponding patient response to treatment evaluated.

    Results: MSH6 mutation was not observed in any pretreatment glioblastoma (0 of 40), whereas 3 of 14 recurrent cases had somatic mutations (P = 0.015). MSH6 protein expression was detected in all pretreatment (17 of 17) cases examined but, notably, expression was lost in 7 of 17 (41%) recurrences from matched post-XRT + temozolomide cases (P = 0.016). Loss of MSH6 was not associated with O(6)-methylguanine methyltransferase status. Measurements of in vivo tumor growth using three-dimensional reconstructed magnetic resonance imaging showed that MSH6-negative glioblastomas had a markedly increased rate of growth while under temozolomide treatment (3.17 versus 0.04 cc/mo for MSH6-positive tumors; P = 0.020).

    Conclusions: Loss of MSH6 occurs in a subset of post-XRT + temozolomide glioblastoma recurrences and is associated with tumor progression during temozolomide treatment, mirroring the alkylator resistance conferred by MSH6 inactivation in vitro. MSH6 deficiency may therefore contribute to the emergence of recurrent glioblastomas during temozolomide treatment.

    Funded by: Wellcome Trust: 077012

    Clinical cancer research : an official journal of the American Association for Cancer Research 2007;13;7;2038-45

  • Methods and strategies for analyzing copy number variation using DNA microarrays.

    Carter NP

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    The association of DNA copy-number variation (CNV) with specific gene function and human disease has been long known, but the wide scope and prevalence of this form of variation has only recently been fully appreciated. The latest studies using microarray technology have demonstrated that as much as 12% of the human genome and thousands of genes are variable in copy number, and this diversity is likely to be responsible for a significant proportion of normal phenotypic variation. Current challenges involve developing methods not only for detecting and cataloging CNVs in human populations at increasingly higher resolution but also for determining the association of CNVs with biological function, recent human evolution, and common and complex human disease.

    Funded by: Wellcome Trust: 077008

    Nature genetics 2007;39;7 Suppl;S16-21

  • Altered phenotype and gene transcription in endothelial cells, induced by Plasmodium falciparum-infected red blood cells: pathogenic or protective?

    Chakravorty SJ, Carret C, Nash GB, Ivens A, Szestak T and Craig AG

    Molecular & Biochemical Parasitology, Liverpool School of Tropical Medicine, University of Liverpool, Liverpool, L3 5QA, United Kingdom.

    Severe malaria is associated with sequestration of Plasmodium falciparum-infected red blood cells (PRBC) in the microvasculature and elevation of intercellular adhesion molecule-1 (ICAM-1) and TNF. In vitro co-culture of human umbilical vein endothelial cells (HUVEC), with either PRBC or uninfected RBC, required the presence of low level TNF (5pg/ml) for significant up-regulation of ICAM-1, which may contribute to increased cytoadhesion in vivo. These effects were independent of P. falciparum erythrocyte membrane protein-1 (PfEMP-1)-mediated adhesion but critically dependent on cell-cell contact. Further changes included increases in IL8 release and soluble TNF receptor shedding. Microarray analysis of HUVEC transcriptome following co-culture, using a human Affymetrix microarray chip, showed significant differential regulation of genes which defined gene ontologies such as cell communication, cell adhesion, signal transduction and immune response. Our data demonstrate that endothelial cells have the ability to mobilise immune and pro-adhesive responses when exposed to both PRBC and TNF. In addition, there is also a previously un-described positive regulation by RBC and TNF and a concurrent negative regulation of a range of genes involved in inflammation and cell-death, by PRBC and TNF. We propose that the balance between positive and negative regulation demonstrated in our study will determine endothelial pathology during a malaria infection.

    Funded by: Wellcome Trust

    International journal for parasitology 2007;37;8-9;975-87

  • A recombineering based approach for high-throughput conditional knockout targeting vector construction.

    Chan W, Costantino N, Li R, Lee SC, Su Q, Melvin D, Court DL and Liu P

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, UK.

    Functional analysis of mammalian genes in vivo is primarily achieved through analysing knockout mice. Now that the sequencing of several mammalian genomes has been completed, understanding functions of all the genes represents the next major challenge in the post-genome era. Generation of knockout mutant mice has currently been achieved by many research groups but only by making individual knockouts, one by one. New technological advances and the refinements of existing technologies are critical for genome-wide targeted mutagenesis in the mouse. We describe here new recombineering reagents and protocols that enable recombineering to be carried out in a 96-well format. Consequently, we are able to construct 96 conditional knockout targeting vectors simultaneously. Our new recombineering system makes it a reality to generate large numbers of precisely engineered DNA constructs for functional genomics studies.

    Funded by: Wellcome Trust

    Nucleic acids research 2007;35;8;e64

  • Serodiagnosis of Salmonella enterica serovar Typhi and S. enterica serovars Paratyphi A, B and C human infections.

    Chart H, Cheasty T, de Pinna E, Siorvanes L, Wain J, Alam D, Nizami Q, Bhutta Z and Threlfall EJ

    Laboratory of Enteric Pathogens, Department of Gastrointestinal Infections, Centre for Infections, Health Protection Agency, 61 Colindale Avenue, London NW9 5EQ, UK.

    The aim of this study was to evaluate an immunoassay for the detection of human serum antibodies to the LPS and flagellar antigens of Salmonella Typhi and Salmonella Paratyphi A, B and C, and to the Vi capsular polysaccharide of S. Typhi and S. Paratyphi C. A total of 330 sera were used; these originated from 15 patients who were culture-positive for S. Typhi and 15 healthy controls, together with 300 sera submitted to the Laboratory of Enteric Pathogens for Salmonella serodiagnosis. By SDS-PAGE/immunoblotting, all 15 sera from culture-positive patients had serum antibodies to the 9,12 LPS antigens and 10 had antibodies to the 'd' flagellar antigens. Of the 300 reference sera, 22 had antibodies to the 9,12 LPS antigens, one to the 1,4,5,12 LPS antigens and 12 to the 6,7 LPS antigens. Only two sera had antibodies to flagellar antigens, one of which bound to the 'b' and the other to the 'd' antigen. An ELISA was developed that successfully detected serum antibodies to the Vi capsular polysaccharides, but because of the kinetics of serum antibody production to the Vi, these antibodies may be of limited value in the serodiagnosis of acute infection with S. Typhi and S. Paratyphi C. The immunoassays described here provide a sensitive means of detecting serum antibodies to the LPS, flagellar and Vi antigens of S. Typhi and S. Paratyphi, and constitute a viable replacement for the Widal assay for the screening of sera. The Salmonella serodiagnosis protocols described here are the new standard operating procedures used by the Health Protection Agency's National Salmonella Reference Centre based in the Laboratory of Enteric Pathogens, Colindale, UK.

    Journal of medical microbiology 2007;56;Pt 9;1161-6

  • Antimicrobial drug resistance of Salmonella enterica serovar Typhi in Asia and molecular mechanism of reduced susceptibility to the fluoroquinolones.

    Chau TT, Campbell JI, Galindo CM, Van Minh Hoang N, Diep TS, Nga TT, Van Vinh Chau N, Tuan PQ, Page AL, Ochiai RL, Schultsz C, Wain J, Bhutta ZA, Parry CM, Bhattacharya SK, Dutta S, Agtini M, Dong B, Honghui Y, Anh DD, Canh do G, Naheed A, Albert MJ, Phetsouvanh R, Newton PN, Basnyat B, Arjyal A, La TT, Rang NN, Phuong le T, Van Be Bay P, von Seidlein L, Dougan G, Clemens JD, Vinh H, Hien TT, Chinh NT, Acosta CJ, Farrar J and Dolecek C

    Oxford University Clinical Research Unit, Hospital for Tropical Diseases, 190 Ben Ham Tu, Ho Chi Minh City, Vietnam.

    This study describes the pattern and extent of drug resistance in 1,774 strains of Salmonella enterica serovar Typhi isolated across Asia between 1993 and 2005 and characterizes the molecular mechanisms underlying the reduced susceptibilities to fluoroquinolones of these strains. For 1,393 serovar Typhi strains collected in southern Vietnam, the proportion of multidrug resistance has remained high since 1993 (50% in 2004) and there was a dramatic increase in nalidixic acid resistance between 1993 (4%) and 2005 (97%). In a cross-sectional sample of 381 serovar Typhi strains from 8 Asian countries, Bangladesh, China, India, Indonesia, Laos, Nepal, Pakistan, and central Vietnam, collected in 2002 to 2004, various rates of multidrug resistance (16 to 37%) and nalidixic acid resistance (5 to 51%) were found. The eight Asian countries involved in this study are home to approximately 80% of the world's typhoid fever cases. These results document the scale of drug resistance across Asia. The Ser83→Phe substitution in GyrA was the predominant alteration in serovar Typhi strains from Vietnam (117/127 isolates; 92.1%). No mutations in gyrB, parC, or parE were detected in 55 of these strains. In vitro time-kill experiments showed a reduction in the efficacy of ofloxacin against strains harboring a single-amino-acid substitution at codon 83 or 87 of GyrA; this effect was more marked against a strain with a double substitution. The 8-methoxy fluoroquinolone gatifloxacin showed rapid killing of serovar Typhi harboring both the single- and double-amino-acid substitutions.

    Funded by: Wellcome Trust

    Antimicrobial agents and chemotherapy 2007;51;12;4315-23

  • Multilocus sequence typing analysis of Shigella flexneri isolates collected in Asian countries.

    Choi SY, Jeon YS, Lee JH, Choi B, Moon SH, von Seidlein L, Clemens JD, Dougan G, Wain J, Yu J, Lee JC, Seol SY, Lee BK, Song JH, Song M, Czerkinsky C, Chun J and Kim DW

    International Vaccine Institute, San 4-8 Bongcheon 7 dong, Kwanak gu, Seoul 151-818, Republic of Korea.

    The multilocus sequence typing scheme used previously for phylogenetic analysis of Escherichia coli was applied to 107 clinical isolates of Shigella flexneri. DNA sequencing of 3423 bp throughout seven housekeeping genes identified eight new allele types and ten new sequence types among the isolates. S. flexneri serotypes 1-5, X and Y were clustered together in a group containing many allelic variants while serotype 6 formed a distinct group, as previously established.

    Funded by: Wellcome Trust: 076962

    Journal of medical microbiology 2007;56;Pt 11;1460-6

  • Activating transcription factor 6 (ATF6) sequence polymorphisms in type 2 diabetes and pre-diabetic traits.

    Chu WS, Das SK, Wang H, Chan JC, Deloukas P, Froguel P, Baier LJ, Jia W, McCarthy MI, Ng MC, Damcott C, Shuldiner AR, Zeggini E and Elbein SC

    Division of Endocrinology 111J-1/LR, Department of Medicine, University of Arkansas for Medical Sciences, John L. McClellan Memorial Veterans Hospital, 4700 W. 7th Street, Little Rock, AR 72205, USA.

    Activating transcription factor 6 (ATF6) is located within the region of linkage to type 2 diabetes on chromosome 1q21-q23 and is a key activator of the endoplasmic reticulum stress response. We evaluated 78 single nucleotide polymorphisms (SNPs) spanning >213 kb in 95 people, from which we selected 64 SNPs for evaluation in 191 Caucasian case subjects from Utah and between 165 and 188 control subjects. Six SNPs showed nominal associations with type 2 diabetes (P = 0.001-0.04), including the nonsynonymous SNP rs1058405 (M67V) in exon 3 and rs11579627 in the 3' flanking region. Only rs11579627 remained significant on permutation testing. The associations were not replicated in 353 African-American case subjects and 182 control subjects, nor were ATF6 SNPs associated with altered insulin secretion or insulin sensitivity in nondiabetic Caucasian individuals. No association with type 2 diabetes was found in a subset of 44 SNPs in Caucasian (n = 2,099), Pima Indian (n = 293), and Chinese (n = 287) samples. Allelic expression imbalance was found in transformed lymphocyte cDNA for 3' untranslated region variants, thus suggesting cis-acting regulatory variants. ATF6 does not appear to play a major role in type 2 diabetes, but further work is required to identify the cause of the allelic expression imbalance.

    Funded by: NIDDK NIH HHS: R01 DK039311-24; Wellcome Trust: 076113, 079557

    Diabetes 2007;56;3;856-62

  • Validation of a new prognostic index for advanced epithelial ovarian cancer: results from its application to a UK-based cohort.

    Clark TG, Stewart M, Rye T, Smyth JF and Gourley C

    Journal of clinical oncology : official journal of the American Society of Clinical Oncology 2007;25;35;5669-70; author reply 5670-1

  • Genomix: a method for combining gene-finders' predictions, which uses evolutionary conservation of sequence and intron-exon structure.

    Coghlan A and Durbin R

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Motivation: Correct gene predictions are crucial for most analyses of genomes. However, in the absence of transcript data, gene prediction is still challenging. One way to improve gene-finding accuracy in such genomes is to combine the exons predicted by several gene-finders, so that gene-finders that make uncorrelated errors can correct each other.

    Results: We present a method for combining gene-finders called Genomix. Genomix selects the predicted exons that are best conserved within and/or between species in terms of sequence and intron-exon structure, and combines them into a gene structure. Genomix was used to combine predictions from four gene-finders for Caenorhabditis elegans, by selecting the predicted exons that are best conserved with C.briggsae and C.remanei. On a set of approximately 1500 confirmed C.elegans genes, Genomix increased the exon-level specificity by 10.1% and sensitivity by 2.7% compared to the best input gene-finder.

    Availability: Scripts and Supplementary Material can be found at

    Funded by: Wellcome Trust: 077192

    Bioinformatics (Oxford, England) 2007;23;12;1468-75

  • Supramolecular signalling complexes in the nervous system.

    Collins MO and Grant SG

    Proteomic Mass Spectrometry, The Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, United Kingdom.

    It is now apparent that multiprotein signalling complexes or "signalling machines" are responsible for orchestrating many complex signalling pathways in the cell. The synapse is a sub-cellular specialisation which transmits and converts patterns of electrical activity into cellular memory. This processing of electrical information is mediated by the protein components of the synapse. The organisation of synaptic proteins has been investigated over the last number of years using proteomic methods and with the application of bioinformatics; a landscape of modular protein complexes at the synapse is emerging. Many share a common organisation centred on a receptor/channel, a protein scaffold (in which the signalling molecules are localised) and membrane to cytoskeleton interactions. The use of PDZ-domain based protein scaffolds is a particularly common feature in the construction of neuronal protein complexes and the differential presence of these proteins in complexes can have functional consequences. Here we overview current proteomic methodologies for the analysis of multiprotein complexes. In addition, we describe the characterisation of a number of multiprotein complexes associated with ion channels (NMDAR, P2X7 and Kir2) and GPCRs (5-HT2A/5-HT2C, D2 and mGluR5) and discuss their common components and organisation.

    Funded by: Wellcome Trust

    Sub-cellular biochemistry 2007;43;185-207

  • Analysis of protein phosphorylation on a proteome-scale.

    Collins MO, Yu L and Choudhary JS

    Proteomic Mass Spectrometry, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Phosphorylation, the most intensively studied and common PTM on proteins, is a complex biological phenomenon. Its complexity manifests itself in the large numbers of proteins that attach it, remove it and recognise it as a protein code. Since the first report of protein phosphorylation on vitellin 100 years ago, a wide variety of biochemical and analytical chemical approaches have been developed to enrich and detect protein phosphorylation. The last 5 years have witnessed a renaissance in methodologies capable of characterising protein phosphorylation on a proteome-scale. These technological advances have allowed identification of hundreds to thousands of phosphorylation sites in a proteome and have resulted in a profound paradigm shift. For the first time, using quantitative MS, the topology and significance of global phosphorylation networks may be investigated, marking a new era of cell signalling research. This review addresses recent technological advances in the purification of phosphorylated proteins and peptides and current MS-based strategies used to qualitatively and quantitatively probe these enriched phosphoproteomes. In addition, we review the application of complementary array-based technologies to derive signalling networks from kinase-substrate interactions and discuss future challenges in the field.

    Proteomics 2007;7;16;2751-68

  • Adiponectin receptor genes: mutation screening in syndromes of insulin resistance and association studies for type 2 diabetes and metabolic traits in UK populations.

    Collins SC, Luan J, Thompson AJ, Daly A, Semple RK, O'Rahilly S, Wareham NJ and Barroso I

    Metabolic Disease Group, The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.

    Adiponectin is an adipokine with insulin-sensitising and anti-atherogenic properties. Several reports suggest that genetic variants in the adiponectin gene are associated with circulating levels of adiponectin, insulin sensitivity and type 2 diabetes risk. Recently two receptors for adiponectin have been cloned. Genetic studies have yielded conflicting results on the role of these genes and type 2 diabetes predisposition. In this study we aimed to evaluate the potential role of genetic variation in these genes in syndromes of severe insulin resistance, type 2 diabetes and in related metabolic traits in UK Europid populations.

    Exons and splice junctions of the adiponectin receptor 1 and 2 genes (ADIPOR1; ADIPOR2) were sequenced in patients from our severe insulin resistance cohort (n=129). Subsequently, 24 polymorphisms were tested for association with type 2 diabetes in population-based type 2 diabetes case-control studies (n=2,127) and with quantitative traits in a population-based longitudinal study (n=1,721).

    Results: No missense or nonsense mutations in ADIPOR1 and ADIPOR2 were detected in the cohort of patients with severe insulin resistance. None of the 24 polymorphisms (allele frequency 2.3-48.3%) tested was associated with type 2 diabetes in the case-control study. Similarly, none of the polymorphisms was associated with fasting plasma insulin, fasting and 2-h post-load plasma glucose, 30-min insulin increment or BMI.

    Genetic variation in ADIPOR1 and ADIPOR2 is not a major cause of extreme insulin resistance in humans, nor does it contribute in a significant manner to type 2 diabetes risk and related traits in UK Europid populations.

    Funded by: Medical Research Council: MC_U106179471; Wellcome Trust

    Diabetologia 2007;50;3;555-62

  • The population genetics of structural variation.

    Conrad DF and Hurles ME

    Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA.

    Population genetics is central to our understanding of human variation, and by linking medical and evolutionary themes, it enables us to understand the origins and impacts of our genomic differences. Despite current limitations in our knowledge of the locations, sizes and mutational origins of structural variants, our characterization of their population genetics is developing apace, bringing new insights into recent human adaptation, genome biology and disease. We summarize recent dramatic advances, describe the diverse mutational origins of chromosomal rearrangements and argue that their complexity necessitates a re-evaluation of existing population genetic methods.

    Funded by: Wellcome Trust: 077014

    Nature genetics 2007;39;7 Suppl;S30-6

  • The missing care bundle: antibiotic prescribing in hospitals.

    Cooke FJ and Holmes AH

    Department of Infectious Diseases and Immunology, Hammersmith Hospitals NHS Trust, Imperial College, Du Cane Road, London W12 0HS, UK.

    The care bundle involves grouping together key elements of care for procedures and the management of specific diagnoses in order to provide a systematic method to improve and monitor the delivery of clinical care processes. In short, care bundles aim to ensure that all patients consistently receive the best care or treatment, all of the time. This approach has been successfully applied to the management of various conditions, particularly in the critical care setting. The Institute for Healthcare Improvement's '100K lives campaign' consisted of six care bundles, three of which have addressed preventing hospital-acquired infection. The UK Department of Health's delivery programme to reduce healthcare-associated infections (HCAIs), including methicillin-resistant Staphylococcus aureus (MRSA), includes six 'high-impact interventions', which are care bundles to reduce HCAIs. However, we suggest that one key intervention is missing, and consider this intervention will be increasingly important if hospitals are to address the rising incidence of Clostridium difficile, to tackle antibiotic resistance and to improve patient care. The missing intervention addresses the process of antibiotic prescribing. We propose that the time is right to consider the application of the care bundle approach to improve the prescribing of antibiotics, both for treatment and prophylaxis.

    International journal of antimicrobial agents 2007;30;1;25-9

  • Cases of typhoid fever imported into England, Scotland and Wales (2000-2003).

    Cooke FJ, Day M, Wain J, Ward LR and Threlfall EJ

    Health Protection Agency, Centre for Infections, 61 Colindale Avenue, London, UK.

    Although typhoid fever is no longer endemic in most of the developed world, it remains a major infectious disease in less developed regions and imported cases continue to occur in returning travellers, immigrants or migrant workers. We analysed all 692 isolates of Salmonella enterica subspecies enterica serovar Typhi from cases in England, Scotland and Wales that were sent to the Laboratory of Enteric Pathogens at the Health Protection Agency, Centre for Infections, London, UK between 2000 and 2003. The country of acquisition was known for 416 isolates (60%), and the majority of these (70%) came from India or Pakistan. Overall, 24 countries were listed, mainly in Asia and Africa. A total of 48 phage types were detected, 41% of which were Vi-phage type E1. Antimicrobial susceptibility testing revealed that 22% of isolates were multidrug resistant (MDR) (defined as resistance to chloramphenicol, ampicillin and co-trimoxazole) and 39% were quinolone resistant. A significant number of isolates (n=49) were sensitive to nalidixic acid by disk test but exhibited low-level ciprofloxacin resistance, suggesting a novel mechanism of resistance and reinforcing the need for minimum inhibitory concentration determination. Overall, 13% of isolates were both MDR and likely to show a poor response to a fluoroquinolone. A third-generation cephalosporin (e.g. ceftriaxone) should be considered as empirical therapy in regions of the Indian subcontinent where resistance is now at high levels as well as in patients returning from these areas. This study helps to describe the epidemiology of antimicrobial drug resistance in typhoid fever.

    Funded by: Wellcome Trust

    Transactions of the Royal Society of Tropical Medicine and Hygiene 2007;101;4;398-404

  • Prophage sequences defining hot spots of genome variation in Salmonella enterica serovar Typhimurium can be used to discriminate between field isolates.

    Cooke FJ, Wain J, Fookes M, Ivens A, Thomson N, Brown DJ, Threlfall EJ, Gunn G, Foster G and Dougan G

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.

    Sixty-one Salmonella enterica serovar Typhimurium isolates of animal and human origin, matched by phage type, antimicrobial resistance pattern, and place of isolation, were analyzed by microbiological and molecular techniques, including pulsed-field gel electrophoresis (PFGE) and plasmid profiling. PFGE identified 10 profiles that clustered by phage type and antibiotic resistance pattern with human and animal isolates distributed among different PFGE profiles. Genomic DNA was purified from 23 representative strains and hybridized to the composite Salmonella DNA microarray, and specific genomic regions that exhibited significant variation between isolates were identified. Bioinformatic analysis showed that variable regions of DNA were associated with prophage-like elements. Subsequently, simple multiplex PCR assays were designed on the basis of these variable regions that could be used to discriminate between S. enterica serovar Typhimurium isolates from the same geographical region. These multiplex PCR assays, based on prophage-like elements and Salmonella genomic island 1, provide a simple method for identifying new variants of S. enterica serovar Typhimurium in the field.

    Funded by: Wellcome Trust

    Journal of clinical microbiology 2007;45;8;2590-8

  • The candidate genes TAF5L, TCF7, PDCD1, IL6 and ICAM1 cannot be excluded from having effects in type 1 diabetes.

    Cooper JD, Smyth DJ, Bailey R, Payne F, Downes K, Godfrey LM, Masters J, Zeitels LR, Vella A, Walker NM and Todd JA

    Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, Cambridge Institute for Medical Research, University of Cambridge, Addenbrooke's Hospital, Cambridge, UK.

    Background: As genes associated with immune-mediated diseases have an increased prior probability of being associated with other immune-mediated diseases, we tested three such genes, IL23R, IRF5 and CD40, for an association with type 1 diabetes. In addition, we tested seven genes, TAF5L, PDCD1, TCF7, IL12B, IL6, ICAM1 and TBX21, with published marginal or inconsistent evidence of an association with type 1 diabetes.

    Methods: We genotyped reported polymorphisms of the ten genes, nonsynonymous SNPs (nsSNPs) and, for the IL12B and IL6 regions, tag SNPs in up to 7,888 case, 8,858 control and 3,142 parent-child trio samples. In addition, we analysed data from the Wellcome Trust Case Control Consortium genome-wide association study to determine whether there was any further evidence of an association in each gene region.

    Results: We found some evidence of associations between type 1 diabetes and TAF5L, PDCD1, TCF7 and IL6 (ORs = 1.05 - 1.13; P = 0.0291 - 4.16 × 10⁻⁴). No evidence of an association was obtained for IL12B, IRF5, IL23R, ICAM1, TBX21 and CD40, although there was some evidence of an association (OR = 1.10; P = 0.0257) from the genome-wide association study for the ICAM1 region.

    Conclusion: We failed to exclude the possibility of some effect in type 1 diabetes for TAF5L, PDCD1, TCF7, IL6 and ICAM1. Additional studies, of these and other candidate genes, employing much larger sample sizes and analysis of additional polymorphisms in each gene and its flanking region will be required to ascertain their contributions to type 1 diabetes susceptibility.

    Funded by: Wellcome Trust: 068545/Z/02, 076113

    BMC medical genetics 2007;8;71

  • Epigenetic silencing of Plasmodium falciparum genes linked to erythrocyte invasion.

    Cortés A, Carret C, Kaneko O, Yim Lim BY, Ivens A and Holder AA

    Division of Parasitology, Medical Research Council National Institute for Medical Research (NIMR), London, United Kingdom.

    The process of erythrocyte invasion by merozoites of Plasmodium falciparum involves multiple steps, including the formation of a moving junction between parasite and host cell, and it is characterised by the redundancy of many of the receptor-ligand interactions involved. Several parasite proteins that interact with erythrocyte receptors or participate in other steps of invasion are encoded by small subtelomerically located gene families of four to seven members. We report here that members of the eba, rhoph1/clag, acbp, and pfRh multigene families exist in either an active or a silenced state. In the case of two members of the rhoph1/clag family, clag3.1 and clag3.2, expression was mutually exclusive. Silencing was clonally transmitted and occurred in the absence of detectable DNA alterations, suggesting that it is epigenetic. This was demonstrated for eba-140. Our data demonstrate that variant or mutually exclusive expression and epigenetic silencing in Plasmodium are not unique to genes such as var, which encode proteins that are exported to the surface of the erythrocyte, but also occur for genes involved in host cell invasion. Clonal variant expression of invasion-related ligands increases the flexibility of the parasite to adapt to its human host.

    Funded by: NHLBI NIH HHS: HL078826; Wellcome Trust: 066742

    PLoS pathogens 2007;3;8;e107

  • Quorum sensing has an unexpected role in virulence in the model pathogen Citrobacter rodentium.

    Coulthurst SJ, Clare S, Evans TJ, Foulds IJ, Roberts KJ, Welch M, Dougan G and Salmond GP

    Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge CB2 1QW, UK.

    The bacterial mouse pathogen Citrobacter rodentium causes attaching and effacing (AE) lesions in the same manner as pathogenic Escherichia coli, and is an important model for this mode of pathogenesis. Quorum sensing (QS) involves chemical signalling by bacteria to regulate gene expression in response to cell density. E. coli has never been reported to have N-acylhomoserine lactone (AHL) QS, but it does utilize luxS-dependent signalling. We found production of AHL QS signalling molecules by an AE pathogen, C. rodentium. AHL QS is directed by the croIR locus and a croI mutant is affected in its surface attachment, although not in Type III secretion. AHL QS has an important role in virulence in the mouse as, unexpectedly, the QS mutant is hypervirulent; by contrast, we detected no impact of luxS inactivation. Further study of QS in Citrobacter should provide new insights into AE pathogenesis. As the croIR locus might have been horizontally acquired, AHL QS might exist in some strains of pathogenic E. coli.

    Funded by: Wellcome Trust

    EMBO reports 2007;8;7;698-703

  • Sink or swim.

    Crossman LC

    Nature reviews. Microbiology 2007;5;11;834-5

  • It's hip to be square!

    Crossman LC and Walker A

    Nature reviews. Microbiology 2007;5;6;400-1

  • The continuing search for cancer-causing somatic mutations.

    Dalgliesh GL and Futreal PA

    It is known that cancer is caused by an accumulation of mutations in DNA. Many genes have been associated with tumour progression either through germline or somatic mutations, but mutations in these genes by no means account for all instances of the disease. The availability of the completed human genome sequence and reduced costs of sequencing have allowed large-scale screens to uncover genes that are somatically mutated in cancer. In this issue, Chanock and colleagues present a screen of 91 breast cancers for somatic variants in a set of 21 genes.

    Breast cancer research : BCR 2007;9;1;101

  • Tissue-specific histone modification and transcription factor binding in alpha globin gene expression.

    De Gobbi M, Anguita E, Hughes J, Sloane-Stanley JA, Sharpe JA, Koch CM, Dunham I, Gibbons RJ, Wood WG and Higgs DR

    Medical Research Council, Molecular Haematology Unit, Weatherall Institute of Molecular Medicine, Oxford University, Oxford, UK.

    To address the mechanism by which the human globin genes are activated during erythropoiesis, we have used a tiled microarray to analyze the pattern of transcription factor binding and associated histone modifications across the telomeric region of human chromosome 16 in primary erythroid and nonerythroid cells. This 220-kb region includes the alpha globin genes and 9 widely expressed genes flanking the alpha globin locus. The unbiased, comprehensive analysis of transcription factor binding and histone modifications (acetylation and methylation) described here not only identified all known cis-acting regulatory elements in the human alpha globin cluster but also demonstrated that there are no additional erythroid-specific regulatory elements in the 220-kb region tested. In addition, the pattern of histone modification distinguished promoter elements from potential enhancer elements across this region. Finally, comparison of the human and mouse orthologous regions in a unique mouse model, with both regions coexpressed in the same animal, showed significant differences that may explain how these 2 clusters are regulated differently in vivo.

    Funded by: Medical Research Council: MC_U137961145, MC_U137961147; NHGRI NIH HHS: U01 HG003168; Wellcome Trust

    Blood 2007;110;13;4503-10

  • NMDA receptor activation dephosphorylates AMPA receptor glutamate receptor 1 subunits at threonine 840.

    Delgado JY, Coba M, Anderson CN, Thompson KR, Gray EE, Heusner CL, Martin KC, Grant SG and O'Dell TJ

    Interdepartmental PhD Program for Neuroscience, University of California, Los Angeles, 90095, USA.

    Phosphorylation-dependent changes in AMPA receptor function have a crucial role in activity-dependent forms of synaptic plasticity such as long-term potentiation (LTP) and long-term depression (LTD). Although three previously identified phosphorylation sites in AMPA receptor glutamate receptor 1 (GluR1) subunits (S818, S831, and S845) appear to have important roles in LTP and LTD, little is known about the role of other putative phosphorylation sites in GluR1. Here, we describe the characterization of a recently identified phosphorylation site in GluR1 at threonine 840. The results of in vivo and in vitro phosphorylation assays suggest that T840 is not a substrate for protein kinases known to phosphorylate GluR1 at previously identified phosphorylation sites, such as protein kinase A, protein kinase C, and calcium/calmodulin-dependent kinase II. Instead, in vitro phosphorylation assays suggest that T840 is a substrate for p70S6 kinase. Although LTP-inducing patterns of synaptic stimulation had no effect on GluR1 phosphorylation at T840 in the hippocampal CA1 region, bath application of NMDA induced a strong, protein phosphatase 1- and/or 2A-mediated decrease in T840 phosphorylation. Moreover, GluR1 phosphorylation at T840 was transiently decreased by a chemical LTD induction protocol that induced a short-term depression of synaptic strength and persistently decreased by a chemical LTD induction protocol that induced a lasting depression of synaptic transmission. Together, our results show that GluR1 phosphorylation at T840 is regulated by NMDA receptor activation and suggest that decreases in GluR1 phosphorylation at T840 may have a role in LTD.

    Funded by: NIMH NIH HHS: R01 MH060919-06A1; Wellcome Trust: 077155

    The Journal of neuroscience : the official journal of the Society for Neuroscience 2007;27;48;13210-21

  • Prominent use of distal 5' transcription start sites and discovery of a large number of additional exons in ENCODE regions.

    Denoeud F, Kapranov P, Ucla C, Frankish A, Castelo R, Drenkow J, Lagarde J, Alioto T, Manzano C, Chrast J, Dike S, Wyss C, Henrichsen CN, Holroyd N, Dickson MC, Taylor R, Hance Z, Foissac S, Myers RM, Rogers J, Hubbard T, Harrow J, Guigó R, Gingeras TR, Antonarakis SE and Reymond A

    Grup de Recerca en Informática Biomèdica, Institut Municipal d'Investigació Mèdica/Universitat Pompeu Fabra, 08003 Barcelona, Catalonia, Spain.

    This report presents systematic empirical annotation of transcript products from 399 annotated protein-coding loci across the 1% of the human genome targeted by the Encyclopedia of DNA elements (ENCODE) pilot project using a combination of 5' rapid amplification of cDNA ends (RACE) and high-density resolution tiling arrays. We identified previously unannotated and often tissue- or cell-line-specific transcribed fragments (RACEfrags), both 5' distal to the annotated 5' terminus and internal to the annotated gene bounds for the vast majority (81.5%) of the tested genes. Half of the distal RACEfrags span large segments of genomic sequences away from the main portion of the coding transcript and often overlap with the upstream-annotated gene(s). Notably, at least 20% of the resultant novel transcripts have changes in their open reading frames (ORFs), most of them fusing ORFs of adjacent transcripts. A significant fraction of distal RACEfrags show expression levels comparable to those of known exons of the same locus, suggesting that they are not part of very minority splice forms. These results have significant implications concerning (1) our current understanding of the architecture of protein-coding genes; (2) our views on locations of regulatory regions in the genome; and (3) the interpretation of sequence polymorphisms mapping to regions hitherto considered to be "noncoding," ultimately relating to the identification of disease-related sequence alterations.

    Funded by: NHGRI NIH HHS: U01HG03147, U01HG03150; PHS HHS: N01C012400; Wellcome Trust: 077198

    Genome research 2007;17;6;746-59

  • AutoCSA, an algorithm for high throughput DNA sequence variant detection in cancer genomes.

    Dicks E, Teague JW, Stephens P, Raine K, Yates A, Mattocks C, Tarpey P, Butler A, Menzies A, Richardson D, Jenkinson A, Davies H, Edkins S, Forbes S, Gray K, Greenman C, Shepherd R, Stratton MR, Futreal PA and Wooster R

    Cancer Genome Project, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    The undertaking of large-scale DNA sequencing screens for somatic variants in human cancers requires accurate and rapid processing of traces for variants. Due to their often aneuploid nature and admixed normal tissue, heterozygous variants found in primary cancers are often subtle and difficult to detect. To address these issues, we have developed a mutation detection algorithm, AutoCSA, specifically optimized for the high throughput screening of cancer samples. Availability:

    Funded by: Wellcome Trust

    Bioinformatics (Oxford, England) 2007;23;13;1689-91

  • A bimodal pattern of relatedness between the Salmonella Paratyphi A and Typhi genomes: convergence or divergence by homologous recombination?

    Didelot X, Achtman M, Parkhill J, Thomson NR and Falush D

    Department of Statistics, University of Oxford, Oxford OX1 3SY, United Kingdom.

    All Salmonella can cause disease but severe systemic infections are primarily caused by a few lineages. Paratyphi A and Typhi are the deadliest human restricted serovars, responsible for approximately 600,000 deaths per annum. We developed a Bayesian changepoint model that uses variation in the degree of nucleotide divergence along two genomes to detect homologous recombination between these strains, and with other lineages of Salmonella enterica. Paratyphi A and Typhi showed an atypical and surprising pattern. For three quarters of their genomes, they appear to be distantly related members of the species S. enterica, both in their gene content and nucleotide divergence. However, the remaining quarter is much more similar in both aspects, with average nucleotide divergence of 0.18% instead of 1.2%. We describe two different scenarios that could have led to this pattern, convergence and divergence, and conclude that the former is more likely based on a variety of criteria. The convergence scenario implies that, although Paratyphi A and Typhi were not especially close relatives within S. enterica, they have gone through a burst of recombination involving more than 100 recombination events. Several of the recombination events transferred novel genes in addition to homologous sequences, resulting in similar gene content in the two lineages. We propose that recombination between Typhi and Paratyphi A has allowed the exchange of gene variants that are important for their adaptation to their common ecological niche, the human host.

    Funded by: Wellcome Trust

    Genome research 2007;17;1;61-8

  • Large-scale discovery of promoter motifs in Drosophila melanogaster.

    Down TA, Bergman CM, Su J and Hubbard TJ

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.

    A key step in understanding gene regulation is to identify the repertoire of transcription factor binding motifs (TFBMs) that form the building blocks of promoters and other regulatory elements. Identifying these experimentally is very laborious, and the number of TFBMs discovered remains relatively small, especially when compared with the hundreds of transcription factor genes predicted in metazoan genomes. We have used a recently developed statistical motif discovery approach, NestedMICA, to detect candidate TFBMs from a large set of Drosophila melanogaster promoter regions. Of the 120 motifs inferred in our initial analysis, 25 were statistically significant matches to previously reported motifs, while 87 appeared to be novel. Analysis of sequence conservation and motif positioning suggested that the great majority of these discovered motifs are predictive of functional elements in the genome. Many motifs showed associations with specific patterns of gene expression in the D. melanogaster embryo, and we were able to obtain confident annotation of expression patterns for 25 of our motifs, including eight of the novel motifs. The motifs are available through Tiffin, a new database of DNA sequence motifs. We have discovered many new motifs that are overrepresented in D. melanogaster promoter regions, and offer several independent lines of evidence that these are novel TFBMs. Our motif dictionary provides a solid foundation for further investigation of regulatory elements in Drosophila, and demonstrates techniques that should be applicable in other species. We suggest that further improvements in computational motif discovery should narrow the gap between the set of known motifs and the total number of transcription factors in metazoan genomes.

    Funded by: Wellcome Trust: 077198

    PLoS computational biology 2007;3;1;e7

  • An H-NS-like stealth protein aids horizontal DNA transmission in bacteria.

    Doyle M, Fookes M, Ivens A, Mangan MW, Wain J and Dorman CJ

    Department of Microbiology, Moyne Institute of Preventive Medicine, Trinity College Dublin, Dublin 2, Ireland.

    The Sfh protein is encoded by self-transmissible plasmids involved in human typhoid and is closely related to the global regulator H-NS. We have found that Sfh provides a stealth function that allows the plasmids to be transmitted to new bacterial hosts with minimal effects on their fitness. Introducing the plasmid without the sfh gene imposes a mild H-NS(-) phenotype and a severe loss of fitness due to titration of the cellular pool of H-NS by the A+T-rich plasmid. This stealth strategy seems to be used widely to aid horizontal DNA transmission and has important implications for bacterial evolution.

    Funded by: Wellcome Trust

    Science (New York, N.Y.) 2007;315;5809;251-2

  • Evolution of genes and genomes on the Drosophila phylogeny.

    Drosophila 12 Genomes Consortium, Clark AG, Eisen MB, Smith DR, Bergman CM, Oliver B, Markow TA, Kaufman TC, Kellis M, Gelbart W, Iyer VN, Pollard DA, Sackton TB, Larracuente AM, Singh ND, Abad JP, Abt DN, Adryan B, Aguade M, Akashi H, Anderson WW, Aquadro CF, Ardell DH, Arguello R, Artieri CG, Barbash DA, Barker D, Barsanti P, Batterham P, Batzoglou S, Begun D, Bhutkar A, Blanco E, Bosak SA, Bradley RK, Brand AD, Brent MR, Brooks AN, Brown RH, Butlin RK, Caggese C, Calvi BR, Bernardo de Carvalho A, Caspi A, Castrezana S, Celniker SE, Chang JL, Chapple C, Chatterji S, Chinwalla A, Civetta A, Clifton SW, Comeron JM, Costello JC, Coyne JA, Daub J, David RG, Delcher AL, Delehaunty K, Do CB, Ebling H, Edwards K, Eickbush T, Evans JD, Filipski A, Findeiss S, Freyhult E, Fulton L, Fulton R, Garcia AC, Gardiner A, Garfield DA, Garvin BE, Gibson G, Gilbert D, Gnerre S, Godfrey J, Good R, Gotea V, Gravely B, Greenberg AJ, Griffiths-Jones S, Gross S, Guigo R, Gustafson EA, Haerty W, Hahn MW, Halligan DL, Halpern AL, Halter GM, Han MV, Heger A, Hillier L, Hinrichs AS, Holmes I, Hoskins RA, Hubisz MJ, Hultmark D, Huntley MA, Jaffe DB, Jagadeeshan S, Jeck WR, Johnson J, Jones CD, Jordan WC, Karpen GH, Kataoka E, Keightley PD, Kheradpour P, Kirkness EF, Koerich LB, Kristiansen K, Kudrna D, Kulathinal RJ, Kumar S, Kwok R, Lander E, Langley CH, Lapoint R, Lazzaro BP, Lee SJ, Levesque L, Li R, Lin CF, Lin MF, Lindblad-Toh K, Llopart A, Long M, Low L, Lozovsky E, Lu J, Luo M, Machado CA, Makalowski W, Marzo M, Matsuda M, Matzkin L, McAllister B, McBride CS, McKernan B, McKernan K, Mendez-Lago M, Minx P, Mollenhauer MU, Montooth K, Mount SM, Mu X, Myers E, Negre B, Newfeld S, Nielsen R, Noor MA, O'Grady P, Pachter L, Papaceit M, Parisi MJ, Parisi M, Parts L, Pedersen JS, Pesole G, Phillippy AM, Ponting CP, Pop M, Porcelli D, Powell JR, Prohaska S, Pruitt K, Puig M, Quesneville H, Ram KR, Rand D, Rasmussen MD, Reed LK, Reenan R, Reily A, Remington KA, Rieger TT, Ritchie MG, Robin C, Rogers YH, Rohde C, Rozas J, Rubenfield MJ, Ruiz A, Russo S, Salzberg SL, Sanchez-Gracia A, Saranga DJ, Sato H, Schaeffer SW, Schatz MC, Schlenke T, Schwartz R, Segarra C, Singh RS, Sirot L, Sirota M, Sisneros NB, Smith CD, Smith TF, Spieth J, Stage DE, Stark A, Stephan W, Strausberg RL, Strempel S, Sturgill D, Sutton G, Sutton GG, Tao W, Teichmann S, Tobari YN, Tomimura Y, Tsolas JM, Valente VL, Venter E, Venter JC, Vicario S, Vieira FG, Vilella AJ, Villasante A, Walenz B, Wang J, Wasserman M, Watts T, Wilson D, Wilson RK, Wing RA, Wolfner MF, Wong A, Wong GK, Wu CI, Wu G, Yamamoto D, Yang HP, Yang SP, Yorke JA, Yoshida K, Zdobnov E, Zhang P, Zhang Y, Zimin AV, Baldwin J, Abdouelleil A, Abdulkadir J, Abebe A, Abera B, Abreu J, Acer SC, Aftuck L, Alexander A, An P, Anderson E, Anderson S, Arachi H, Azer M, Bachantsang P, Barry A, Bayul T, Berlin A, Bessette D, Bloom T, Blye J, Boguslavskiy L, Bonnet C, Boukhgalter B, Bourzgui I, Brown A, Cahill P, Channer S, Cheshatsang Y, Chuda L, Citroen M, Collymore A, Cooke P, Costello M, D'Aco K, Daza R, De Haan G, DeGray S, DeMaso C, Dhargay N, Dooley K, Dooley E, Doricent M, Dorje P, Dorjee K, Dupes A, Elong R, Falk J, Farina A, Faro S, Ferguson D, Fisher S, Foley CD, Franke A, Friedrich D, Gadbois L, Gearin G, Gearin CR, Giannoukos G, Goode T, Graham J, Grandbois E, Grewal S, Gyaltsen K, Hafez N, Hagos B, Hall J, Henson C, Hollinger A, Honan T, Huard MD, Hughes L, Hurhula B, Husby ME, Kamat A, Kanga B, Kashin S, Khazanovich D, Kisner P, Lance K, Lara M, Lee W, Lennon N, Letendre F, LeVine R, Lipovsky A, Liu X, Liu J, Liu S, Lokyitsang T, Lokyitsang Y, Lubonja R, Lui A, MacDonald P, Magnisalis V, Maru K, Matthews C, McCusker W, McDonough S, Mehta T, Meldrim J, Meneus L, Mihai O, Mihalev A, Mihova T, Mittelman R, Mlenga V, Montmayeur A, Mulrain L, Navidi A, Naylor J, Negash T, Nguyen T, Nguyen N, Nicol R, Norbu C, Norbu N, Novod N, O'Neill B, Osman S, Markiewicz E, Oyono OL, Patti C, Phunkhang P, Pierre F, Priest M, Raghuraman S, Rege F, Reyes R, Rise C, Rogov P, Ross K, Ryan E, Settipalli S, Shea T, Sherpa N, Shi L, Shih D, Sparrow T, Spaulding J, Stalker J, Stange-Thomann N, Stavropoulos S, Stone C, Strader C, Tesfaye S, Thomson T, Thoulutsang Y, Thoulutsang D, Topham K, Topping I, Tsamla T, Vassiliev H, Vo A, Wangchuk T, Wangdi T, Weiand M, Wilkinson J, Wilson A, Yadav S, Young G, Yu Q, Zembek L, Zhong D, Zimmer A, Zwirko Z, Jaffe DB, Alvarez P, Brockman W, Butler J, Chin C, Gnerre S, Grabherr M, Kleber M, Mauceli E and MacCallum I

    Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA.

    Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here for the first time (sechellia, simulans, yakuba, erecta, ananassae, persimilis, willistoni, mojavensis, virilis and grimshawi), illustrate how rates and patterns of sequence divergence across taxa can illuminate evolutionary processes on a genomic scale. These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution. Despite remarkable similarities among these Drosophila species, we identified many putatively non-neutral changes in protein-coding genes, non-coding RNA genes, and cis-regulatory regions. These may prove to underlie differences in the ecology and behaviour of these diverse species.

    Funded by: Medical Research Council: MC_U105161047, MC_U137761446; NHGRI NIH HHS: R01 HG000747, R01 HG000747-16, R01 HG002779-05, R01 HG002779-06, R01 HG004037; NIDDK NIH HHS: Z01 DK015600-12; NIGMS NIH HHS: F32 GM067504, R01 GM074813-04; NLM NIH HHS: R01 LM006845-08, R01 LM006845-09

    Nature 2007;450;7167;203-18

  • A TNF region haplotype offers protection from typhoid fever in Vietnamese patients.

    Dunstan SJ, Nguyen TH, Rockett K, Forton J, Morris AP, Diakite M, Mai NL, Le TP, House D, Parry CM, Ha V, Nguyen TH, Dougan G, Tran TH, Kwiatowski D and Farrar JJ

    Oxford University Clinical Research Unit, Hospital for Tropical Diseases, 190 Ben Ham Tu, Quan 5, District 5, Ho Chi Minh City, Vietnam.

    The genomic region surrounding the TNF locus on human chromosome 6 has previously been associated with typhoid fever in Vietnam (Dunstan et al. in J Infect Dis 183:261-268, 2001). We used a haplotypic approach to understand this association further. Eighty single nucleotide polymorphisms (SNPs) spanning a 150 kb region were genotyped in 95 Vietnamese individuals (typhoid case/mother/father trios). A subset of data from 33 SNPs with a minor allele frequency of >4.3% was used to construct haplotypes. Fifteen SNPs, which tagged the 42 constructed haplotypes, were selected. The haplotype tagging SNPs (T1-T15) were genotyped in 380 confirmed typhoid cases and 380 Vietnamese ethnically matched controls. Allelic frequencies of seven SNPs (T1, T2, T3, T5, T6, T7, T8) were significantly different between typhoid cases and controls. Logistic regression results support the hypothesis that there is just one signal associated with disease at this locus. Haplotype-based analysis of the tag SNPs provided positive evidence of association with typhoid (posterior probability 0.821). The analysis highlighted a low-risk cluster of haplotypes that each carry the minor allele of T1 or T7, but not both, and otherwise carry the combination of alleles *12122*1111 at T1-T11, further supporting the one associated signal hypothesis. Finally, individuals that carry the typhoid fever protective haplotype *12122*1111 also produce a relatively low TNF-alpha response to LPS.

    Funded by: Wellcome Trust: 076962, 081682

    Human genetics 2007;122;1;51-61

  • Genome-wide association study identifies novel breast cancer susceptibility loci.

    Easton DF, Pooley KA, Dunning AM, Pharoah PD, Thompson D, Ballinger DG, Struewing JP, Morrison J, Field H, Luben R, Wareham N, Ahmed S, Healey CS, Bowman R, SEARCH collaborators, Meyer KB, Haiman CA, Kolonel LK, Henderson BE, Le Marchand L, Brennan P, Sangrajrang S, Gaborieau V, Odefrey F, Shen CY, Wu PE, Wang HC, Eccles D, Evans DG, Peto J, Fletcher O, Johnson N, Seal S, Stratton MR, Rahman N, Chenevix-Trench G, Bojesen SE, Nordestgaard BG, Axelsson CK, Garcia-Closas M, Brinton L, Chanock S, Lissowska J, Peplonska B, Nevanlinna H, Fagerholm R, Eerola H, Kang D, Yoo KY, Noh DY, Ahn SH, Hunter DJ, Hankinson SE, Cox DG, Hall P, Wedren S, Liu J, Low YL, Bogdanova N, Schürmann P, Dörk T, Tollenaar RA, Jacobi CE, Devilee P, Klijn JG, Sigurdson AJ, Doody MM, Alexander BH, Zhang J, Cox A, Brock IW, MacPherson G, Reed MW, Couch FJ, Goode EL, Olson JE, Meijers-Heijboer H, van den Ouweland A, Uitterlinden A, Rivadeneira F, Milne RL, Ribas G, Gonzalez-Neira A, Benitez J, Hopper JL, McCredie M, Southey M, Giles GG, Schroen C, Justenhoven C, Brauch H, Hamann U, Ko YD, Spurdle AB, Beesley J, Chen X, kConFab, AOCS Management Group, Mannermaa A, Kosma VM, Kataja V, Hartikainen J, Day NE, Cox DR and Ponder BA

    CR-UK Genetic Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge CB1 8RN, UK.

    Breast cancer exhibits familial aggregation, consistent with variation in genetic susceptibility to the disease. Known susceptibility genes account for less than 25% of the familial risk of breast cancer, and the residual genetic variance is likely to be due to variants conferring more moderate risks. To identify further susceptibility alleles, we conducted a two-stage genome-wide association study in 4,398 breast cancer cases and 4,316 controls, followed by a third stage in which 30 single nucleotide polymorphisms (SNPs) were tested for confirmation in 21,860 cases and 22,578 controls from 22 studies. We used 227,876 SNPs that were estimated to correlate with 77% of known common SNPs in Europeans at r² > 0.5. SNPs in five novel independent loci exhibited strong and consistent evidence of association with breast cancer (P < 10⁻⁷). Four of these contain plausible causative genes (FGFR2, TNRC9, MAP3K1 and LSP1). At the second stage, 1,792 SNPs were significant at the P < 0.05 level compared with an estimated 1,343 that would be expected by chance, indicating that many additional common susceptibility alleles may be identifiable by this approach.

    Funded by: Cancer Research UK: A3353

    Nature 2007;447;7148;1087-93
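
    The closing comparison in the abstract above (1,792 SNPs significant at P < 0.05 against an estimated 1,343 expected by chance) can be restated as simple arithmetic. The sketch below is only an illustration of that comparison; the function name is hypothetical, and it does not reproduce how the authors derived the expected count.

```python
def excess_over_chance(observed, expected):
    """Compare an observed count of significant SNPs with the chance expectation.

    Returns (excess count, observed/expected ratio, excess as a share of observed).
    """
    excess = observed - expected
    return excess, observed / expected, excess / observed


# Figures quoted in the abstract for the second stage.
excess, ratio, share = excess_over_chance(1792, 1343)
print(excess)            # 449
print(round(ratio, 2))   # 1.33
print(round(share, 2))   # 0.25
```

    Read this way, roughly a quarter of the nominally significant second-stage SNPs are in excess of the quoted chance expectation.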

  • The evolution of imprinting: chromosomal mapping of orthologues of mammalian imprinted domains in monotreme and marsupial mammals.

    Edwards CA, Rens W, Clarke O, Mungall AJ, Hore T, Graves JA, Dunham I, Ferguson-Smith AC and Ferguson-Smith MA

    Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, UK.

    Background: The evolution of genomic imprinting, the parental-origin specific expression of genes, is the subject of much debate. There are several theories to account for how the mechanism evolved including the hypothesis that it was driven by the evolution of X-inactivation, or that it arose from an ancestrally imprinted chromosome.

    Results: Here we demonstrate that mammalian orthologues of imprinted genes are dispersed amongst autosomes in both monotreme and marsupial karyotypes.

    Conclusion: These data, along with the similar distribution seen in birds, suggest that imprinted genes were not located on an ancestrally imprinted chromosome or associated with a sex chromosome. Our results suggest imprinting evolution was a stepwise, adaptive process, with each gene/cluster independently becoming imprinted as the need arose.

    Funded by: Wellcome Trust

    BMC evolutionary biology 2007;7;157

  • Karyotypic differences in two sibling species of Scotophilus from South Africa (Vespertilionidae, Chiroptera, Mammalia).

    Eick GN, Jacobs DS, Yang F and Volleth M

    Evolutionary Genomics Group, Department of Botany and Zoology, University of Stellenbosch, Stellenbosch (South Africa).

    Karyotype descriptions are given for Scotophilus dinganii (2n = 36, FNa = 50) and a recently discovered sister-species, Scotophilus sp. nov. (2n = 36, FNa = 52). These two sibling species occur sympatrically and are distinguished by body size, echolocation frequency and cytochrome b sequence. Cytogenetically, both species differ from other Scotophilus species in the subtelocentric morphology of chromosome 2 and a terminal heterochromatic segment on the X chromosome. Further, Scotophilus sp. nov. is characterized by a subtelocentric chromosome 4 not found in any other Scotophilus species. Comparing the Scotophilus karyotype with that of the vespertilionid genus Myotis, extensive conservation of whole chromosome arms has been found recently. However, out of 25 chromosomal arms six could not be identified in Scotophilus. Therefore, in the present study fluorescence in situ hybridization with whole chromosome painting probes from Myotis myotis was carried out on metaphase preparations from Scotophilus dinganii and Scotophilus sp. nov. These experiments revealed that three previously unidentified Scotophilus chromosomes (A, B, C) contain homologous sequences to Myotis chromosomes 18 plus 22, 19 plus 25, and 16/17, respectively.

    Cytogenetic and genome research 2007;118;1;72-7

  • Zebrafish genome project: bringing new biology to the vertebrate genome field.

    Ekker SC, Stemple DL, Clark M, Chien CB, Rasooly RS and Javois LC

    Department of Biochemistry and Molecular Biology, Mayo Clinic Cancer Center, Rochester, Minnesota 55905, USA.

    Zebrafish 2007;4;4;239-51

  • Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project.

    ENCODE Project Consortium, Birney E, Stamatoyannopoulos JA, Dutta A, Guigó R, Gingeras TR, Margulies EH, Weng Z, Snyder M, Dermitzakis ET, Thurman RE, Kuehn MS, Taylor CM, Neph S, Koch CM, Asthana S, Malhotra A, Adzhubei I, Greenbaum JA, Andrews RM, Flicek P, Boyle PJ, Cao H, Carter NP, Clelland GK, Davis S, Day N, Dhami P, Dillon SC, Dorschner MO, Fiegler H, Giresi PG, Goldy J, Hawrylycz M, Haydock A, Humbert R, James KD, Johnson BE, Johnson EM, Frum TT, Rosenzweig ER, Karnani N, Lee K, Lefebvre GC, Navas PA, Neri F, Parker SC, Sabo PJ, Sandstrom R, Shafer A, Vetrie D, Weaver M, Wilcox S, Yu M, Collins FS, Dekker J, Lieb JD, Tullius TD, Crawford GE, Sunyaev S, Noble WS, Dunham I, Denoeud F, Reymond A, Kapranov P, Rozowsky J, Zheng D, Castelo R, Frankish A, Harrow J, Ghosh S, Sandelin A, Hofacker IL, Baertsch R, Keefe D, Dike S, Cheng J, Hirsch HA, Sekinger EA, Lagarde J, Abril JF, Shahab A, Flamm C, Fried C, Hackermüller J, Hertel J, Lindemeyer M, Missal K, Tanzer A, Washietl S, Korbel J, Emanuelsson O, Pedersen JS, Holroyd N, Taylor R, Swarbreck D, Matthews N, Dickson MC, Thomas DJ, Weirauch MT, Gilbert J, Drenkow J, Bell I, Zhao X, Srinivasan KG, Sung WK, Ooi HS, Chiu KP, Foissac S, Alioto T, Brent M, Pachter L, Tress ML, Valencia A, Choo SW, Choo CY, Ucla C, Manzano C, Wyss C, Cheung E, Clark TG, Brown JB, Ganesh M, Patel S, Tammana H, Chrast J, Henrichsen CN, Kai C, Kawai J, Nagalakshmi U, Wu J, Lian Z, Lian J, Newburger P, Zhang X, Bickel P, Mattick JS, Carninci P, Hayashizaki Y, Weissman S, Hubbard T, Myers RM, Rogers J, Stadler PF, Lowe TM, Wei CL, Ruan Y, Struhl K, Gerstein M, Antonarakis SE, Fu Y, Green ED, Karaöz U, Siepel A, Taylor J, Liefer LA, Wetterstrand KA, Good PJ, Feingold EA, Guyer MS, Cooper GM, Asimenos G, Dewey CN, Hou M, Nikolaev S, Montoya-Burgos JI, Löytynoja A, Whelan S, Pardi F, Massingham T, Huang H, Zhang NR, Holmes I, Mullikin JC, Ureta-Vidal A, Paten B, Seringhaus M, Church D, Rosenbloom K, Kent WJ, Stone EA, NISC Comparative 
Sequencing Program, Baylor College of Medicine Human Genome Sequencing Center, Washington University Genome Sequencing Center, Broad Institute, Children's Hospital Oakland Research Institute, Batzoglou S, Goldman N, Hardison RC, Haussler D, Miller W, Sidow A, Trinklein ND, Zhang ZD, Barrera L, Stuart R, King DC, Ameur A, Enroth S, Bieda MC, Kim J, Bhinge AA, Jiang N, Liu J, Yao F, Vega VB, Lee CW, Ng P, Shahab A, Yang A, Moqtaderi Z, Zhu Z, Xu X, Squazzo S, Oberley MJ, Inman D, Singer MA, Richmond TA, Munn KJ, Rada-Iglesias A, Wallerman O, Komorowski J, Fowler JC, Couttet P, Bruce AW, Dovey OM, Ellis PD, Langford CF, Nix DA, Euskirchen G, Hartman S, Urban AE, Kraus P, Van Calcar S, Heintzman N, Kim TH, Wang K, Qu C, Hon G, Luna R, Glass CK, Rosenfeld MG, Aldred SF, Cooper SJ, Halees A, Lin JM, Shulha HP, Zhang X, Xu M, Haidar JN, Yu Y, Ruan Y, Iyer VR, Green RD, Wadelius C, Farnham PJ, Ren B, Harte RA, Hinrichs AS, Trumbower H, Clawson H, Hillman-Jackson J, Zweig AS, Smith K, Thakkapallayil A, Barber G, Kuhn RM, Karolchik D, Armengol L, Bird CP, de Bakker PI, Kern AD, Lopez-Bigas N, Martin JD, Stranger BE, Woodroffe A, Davydov E, Dimas A, Eyras E, Hallgrímsdóttir IB, Huppert J, Zody MC, Abecasis GR, Estivill X, Bouffard GG, Guan X, Hansen NF, Idol JR, Maduro VV, Maskeri B, McDowell JC, Park M, Thomas PJ, Young AC, Blakesley RW, Muzny DM, Sodergren E, Wheeler DA, Worley KC, Jiang H, Weinstock GM, Gibbs RA, Graves T, Fulton R, Mardis ER, Wilson RK, Clamp M, Cuff J, Gnerre S, Jaffe DB, Chang JL, Lindblad-Toh K, Lander ES, Koriabine M, Nefedov M, Osoegawa K, Yoshinaga Y, Zhu B and de Jong PJ

    We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.

    Funded by: NCI NIH HHS: F32 CA108313; NHGRI NIH HHS: K22 HG003169, K22 HG003169-01A1, P41 HG002371, P41 HG002371-03S1, R01 HG002238, R01 HG002238-15, R01 HG003110, R01 HG003110-03, R01 HG003129-03, R01 HG003143, R01 HG003143-04, R01 HG003521, R01 HG003521-01, R01 HG003532, R01 HG003532-01, R01 HG003541, R01 HG003541-03, U01 HG002523, U01 HG002523-01, U01 HG003147, U01 HG003147-02, U01 HG003150-03, U01 HG003151, U01 HG003151-03, U01 HG003156, U01 HG003156-03, U01 HG003157, U01 HG003157-03, U01 HG003161, U01 HG003161-03, U01 HG003162, U01 HG003162-03, U01 HG003168-02, U54 HG003067, U54 HG003067-01, U54 HG003079, U54 HG003079-01, U54 HG003273, U54 HG003273-01; Wellcome Trust: 062023, 077198

    Nature 2007;447;7146;799-816

  • HCOP: a searchable database of human orthology predictions.

    Eyre TA, Wright MW, Lush MJ and Bruford EA

    HUGO Gene Nomenclature Committee (HGNC), Department of Biology, University College London, Wolfson House, 4 Stephenson Way, London.

    The HUGO Gene Nomenclature Committee (HGNC) Comparison of Orthology Predictions (HCOP) search tool combines the human, mouse, rat and chicken orthology assertions made by PhIGs, HomoloGene, Ensembl, Inparanoid, Mouse Genome Informatics (MGI) and HGNC, enabling users to identify predicted ortholog pairs for a specified gene or genes. The HCOP resource provides a useful method to integrate, compare and access a variety of disparate sources of human orthology data. The HCOP search tool, data and documentation are available at

    Funded by: NHGRI NIH HHS: P41 HG003345; Wellcome Trust

    Briefings in bioinformatics 2007;8;1;2-5

  • Clinical and molecular genetic spectrum of congenital deficiency of the leptin receptor.

    Farooqi IS, Wangensteen T, Collins S, Kimber W, Matarese G, Keogh JM, Lank E, Bottomley B, Lopez-Fernandez J, Ferraz-Amaro I, Dattani MT, Ercan O, Myhre AG, Retterstol L, Stanhope R, Edge JA, McKenzie S, Lessan N, Ghodsi M, De Rosa V, Perna F, Fontana S, Barroso I, Undlien DE and O'Rahilly S

    Cambridge Institute for Medical Research, University Department of Clinical Biochemistry, Addenbrooke's Hospital, Cambridge, United Kingdom.

    Background: A single family has been described in which obesity results from a mutation in the leptin-receptor gene (LEPR), but the prevalence of such mutations in severe, early-onset obesity has not been systematically examined.

    Methods: We sequenced LEPR in 300 subjects with hyperphagia and severe early-onset obesity, including 90 probands from consanguineous families, and investigated the extent to which mutations cosegregated with obesity and affected receptor function. We evaluated metabolic, endocrine, and immune function in probands and affected relatives.

    Results: Of the 300 subjects, 8 (3%) had nonsense or missense LEPR mutations--7 were homozygotes, and 1 was a compound heterozygote. All missense mutations resulted in impaired receptor signaling. Affected subjects were characterized by hyperphagia, severe obesity, alterations in immune function, and delayed puberty due to hypogonadotropic hypogonadism. Serum leptin levels were within the range predicted by the elevated fat mass in these subjects. Their clinical features were less severe than those of subjects with congenital leptin deficiency.

    Conclusions: The prevalence of pathogenic LEPR mutations in a cohort of subjects with severe, early-onset obesity was 3%. Circulating levels of leptin were not disproportionately elevated, suggesting that serum leptin cannot be used as a marker for leptin-receptor deficiency. Congenital leptin-receptor deficiency should be considered in the differential diagnosis in any child with hyperphagia and severe obesity in the absence of developmental delay or dysmorphism.

    Funded by: Medical Research Council: G0502115; Telethon: GJT04008; Wellcome Trust: 067457, 068086, 077016

    The New England journal of medicine 2007;356;3;237-47

  • High resolution array-CGH analysis of single cells.

    Fiegler H, Geigl JB, Langer S, Rigler D, Porter K, Unger K, Carter NP and Speicher MR

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Heterogeneity in the genome copy number of tissues is of particular importance in solid tumor biology. Furthermore, many clinical applications such as pre-implantation and non-invasive prenatal diagnosis would benefit from the ability to characterize individual single cells. As the amount of DNA from single cells is so small, several PCR protocols have been developed in an attempt to achieve unbiased amplification. Many of these approaches are suitable for subsequent cytogenetic analyses using conventional methodologies such as comparative genomic hybridization (CGH) to metaphase spreads. However, attempts to harness array-CGH for single-cell analysis to provide improved resolution have been disappointing. Here we describe a strategy that combines single-cell amplification using GenomePlex library technology (GenomePlex Single Cell Whole Genome Amplification Kit, Sigma-Aldrich, UK) and detailed analysis of genomic copy number changes by high-resolution array-CGH. We show that single copy changes as small as 8.3 Mb in single cells are detected reliably with single cells derived from various tumor cell lines as well as patients presenting with trisomy 21 and Prader-Willi syndrome. Our results demonstrate the potential of this technology for studies of tumor biology and for clinical diagnostics.

    Funded by: Wellcome Trust

    Nucleic acids research 2007;35;3;e15

  • Construction and use of spotted large-insert clone DNA microarrays for the detection of genomic copy number changes.

    Fiegler H, Redon R and Carter NP

    The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Microarray-based comparative genomic hybridization has become a widespread method for the analysis of DNA copy number changes across the human genome. Initial methods for microarray construction using large-insert clones required the preparation of DNA from large-scale cultures. This rapidly became an expensive and time-consuming process when expanded to the number of clones needed for higher resolution arrays. To overcome this problem, several PCR-based strategies have been developed to enable array construction from small amounts of cloned DNA. Here, we describe the construction of microarrays composed of human-specific large-insert clones (40-200 kb) using a specific degenerate oligonucleotide PCR strategy. In addition, we also describe array hybridization using manual and automated procedures and methods for array analysis. The technology and protocols described in this article can easily be adapted for other species dependent on the availability of clone libraries. According to our protocols, the procedure will take approximately 3 days from labeling the DNA to scanning the hybridized slides.

    Nature protocols 2007;2;3;577-87

  • Mutations in the BRWD3 gene cause X-linked mental retardation associated with macrocephaly.

    Field M, Tarpey PS, Smith R, Edkins S, O'Meara S, Stevens C, Tofts C, Teague J, Butler A, Dicks E, Barthorpe S, Buck G, Cole J, Gray K, Halliday K, Hills K, Jenkinson A, Jones D, Menzies A, Mironenko T, Perry J, Raine K, Richardson D, Shepherd R, Small A, Varian J, West S, Widaa S, Mallya U, Wooster R, Moon J, Luo Y, Hughes H, Shaw M, Friend KL, Corbett M, Turner G, Partington M, Mulley J, Bobrow M, Schwartz C, Stevenson R, Gecz J, Stratton MR, Futreal PA and Raymond FL

    GOLD Service, Hunter Genetics, Waratah, Australia.

    In the course of systematic screening of the X-chromosome coding sequences in 250 families with nonsyndromic X-linked mental retardation (XLMR), two families were identified with truncating mutations in BRWD3, a gene encoding a bromodomain and WD-repeat domain-containing protein. In both families, the mutation segregates with the phenotype in affected males. Affected males have macrocephaly with a prominent forehead, large cupped ears, and mild-to-moderate intellectual disability. No truncating variants were found in 520 control X chromosomes. BRWD3 is therefore a new gene implicated in the etiology of XLMR associated with macrocephaly and may cause disease by altering intracellular signaling pathways affecting cellular proliferation.

    Funded by: NICHD NIH HHS: HD26202

    American journal of human genetics 2007;81;2;367-74

  • ProServer: a simple, extensible Perl DAS server.

    Finn RD, Stalker JW, Jackson DK, Kulesha E, Clements J and Pettett R

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Summary: The increasing size and complexity of biological databases has led to a growing trend to federate rather than duplicate them. In order to share data between federated databases, protocols for the exchange mechanism must be developed. One such data exchange protocol that is widely used is the Distributed Annotation System (DAS). For example, DAS has enabled small experimental groups to integrate their data into the Ensembl genome browser. We have developed ProServer, a simple, lightweight, Perl-based DAS server that does not depend on a separate HTTP server. The ProServer package is easily extensible, allowing data to be served from almost any underlying data model. Recent additions to the DAS protocol have enabled both structure and alignment (sequence and structural) data to be exchanged. ProServer allows both of these data types to be served.

    Availability: ProServer can be downloaded from or CPAN Details on the system requirements and installation of ProServer can be found at

    Funded by: Wellcome Trust

    Bioinformatics (Oxford, England) 2007;23;12;1568-70
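
    As a rough illustration of the kind of request a DAS server such as ProServer answers, the sketch below builds the URL for a DAS "features" query over one sequence segment. It assumes the DAS 1.5-style layout of "/das/<source>/features?segment=<reference>:<start>,<stop>"; the host, port and source name are hypothetical placeholders, and the helper is not part of ProServer itself.

```python
from urllib.parse import urlencode


def das_features_url(base_url, source, reference, start, stop):
    """Build a DAS 'features' request URL for a single segment.

    base_url and source are placeholders for a real server and data source.
    """
    if start < 1 or stop < start:
        raise ValueError("expected 1 <= start <= stop")
    # Keep ':' and ',' unescaped so the segment reads reference:start,stop.
    query = urlencode({"segment": f"{reference}:{start},{stop}"}, safe=":,")
    return f"{base_url.rstrip('/')}/das/{source}/features?{query}"


# Hypothetical server and source name, for illustration only.
print(das_features_url("http://localhost:9000", "example_source", "1", 1000, 2000))
```

    A client would fetch this URL over HTTP and parse the XML response; that step is omitted here because it depends on a running server.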

  • Y-chromosomal evidence for a limited Greek contribution to the Pathan population of Pakistan.

    Firasat S, Khaliq S, Mohyuddin A, Papaioannou M, Tyler-Smith C, Underhill PA and Ayub Q

    Biomedical and Genetic Engineering Division, Dr. AQ Khan Research Laboratories, Islamabad, Pakistan.

    Three Pakistani populations residing in northern Pakistan, the Burusho, Kalash and Pathan, claim descent from Greek soldiers associated with Alexander's invasion of southwest Asia. Earlier studies have excluded a substantial Greek genetic input into these populations, but left open the question of a smaller contribution. We have now typed 90 binary polymorphisms and 16 multiallelic, short-tandem-repeat (STR) loci mapping to the male-specific portion of the human Y chromosome in 952 males, including 77 Greeks, in order to re-investigate this question. In pairwise comparisons between the Greeks and the three Pakistani populations using genetic distance measures sensitive to recent events, the lowest distances were observed between the Greeks and the Pathans. Clade E3b1 lineages, which were frequent in the Greeks but not in Pakistan, were nevertheless observed in two Pathan individuals, one of whom shared a 16 Y-STR haplotype with the Greeks. The worldwide distribution of a shortened (9 Y-STR) version of this haplotype, determined from database information, was concentrated in Macedonia and Greece, suggesting an origin there. Although based on only a few unrelated descendants, this provides strong evidence for a European origin for a small proportion of the Pathan Y chromosomes.

    Funded by: Wellcome Trust: 077009

    European journal of human genetics : EJHG 2007;15;1;121-6

  • Characterization of a 3;6 translocation associated with renal cell carcinoma.

    Foster RE, Abdulrahman M, Morris MR, Prigmore E, Gribble S, Ng B, Gentle D, Ready S, Weston PM, Wiesener MS, Kishida T, Yao M, Davison V, Barbero JL, Chu C, Carter NP, Latif F and Maher ER

    Department of Medical and Molecular Genetics, University of Birmingham, The Medical School, Birmingham B15 2TT, UK.

    The most frequent cause of familial clear cell renal cell carcinoma (RCC) is von Hippel-Lindau disease and the VHL tumor suppressor gene (TSG) is inactivated in most sporadic clear cell RCC. Although there is relatively little information on the mechanisms of tumorigenesis of clear cell RCC without VHL inactivation, a subset of familial cases harbors a balanced constitutional chromosome 3 translocation. To date nine different chromosome 3 translocations have been associated with familial or multicentric clear cell RCC; and in three cases chromosome 6 was also involved. To identify candidate genes for renal tumorigenesis we characterized a constitutional translocation, t(3;6)(q22;q16.1) associated with multicentric RCC without evidence of VHL target gene dysregulation. Analysis of breakpoint sequences revealed a 1.3-kb deletion on chromosome 6 within the intron of a 2 exon predicted gene (NT_007299.434). However, RT-PCR analysis failed to detect the expression of this gene in lymphoblast, fibroblast, or kidney tumor cell lines. No known genes were disrupted by the translocation breakpoints but several candidate TSGs (e.g., EPHB1, EPHA7, PPP2R3A, RNF184, and STAG1) map within close proximity to the breakpoints.

    Funded by: Wellcome Trust: 077008

    Genes, chromosomes & cancer 2007;46;4;311-7

  • Diffusible signal factor-dependent cell-cell signaling and virulence in the nosocomial pathogen Stenotrophomonas maltophilia.

    Fouhy Y, Scanlon K, Schouest K, Spillane C, Crossman L, Avison MB, Ryan RP and Dow JM

    BIOMERIT Research Centre, Department of Microbiology, BioSciences Institute, National University of Ireland, Cork, Ireland.

    The genome of Stenotrophomonas maltophilia encodes a cell-cell signaling system that is highly related to the diffusible signal factor (DSF)-dependent system of the phytopathogen Xanthomonas campestris. Here we show that in S. maltophilia, DSF signaling controls factors contributing to the virulence and antibiotic resistance of this important nosocomial pathogen.

    Journal of bacteriology 2007;189;13;4964-8

  • Structure, mechanism and catalytic duality of thiamine-dependent enzymes.

    Frank RA, Leeper FJ and Luisi BF

    Department of Biochemistry, University of Cambridge, Cambridge, UK.

    Thiamine is an essential cofactor that is required for processes of general metabolism amongst all organisms, and it is likely to have played a role in the earliest stages of the evolution of life. Here, we review from a structural perspective the enzymatic mechanisms that involve this cofactor. We explore asymmetry within homodimeric thiamine diphosphate (ThDP)-dependent enzyme structures and discuss how this may be correlated with the kinetic properties of half-of-the-sites reactivity, and negative cooperativity. It is likely these structural and kinetic hallmarks may arise through reciprocal coupling of active sites. This mode of communication between distant active sites is not unique to ThDP-dependent enzymes, but is widespread in other classes of oligomeric enzyme. Thus, it appears likely to be a general phenomenon reflecting a powerful mechanism of accelerating the rate of a chemical pathway. Finally, we speculate on the early evolutionary history of the cofactor and its ancient association with protein and RNA.

    Funded by: Wellcome Trust

    Cellular and molecular life sciences : CMLS 2007;64;7-8;892-905

  • PPARGC1A coding variation may initiate impaired NEFA clearance during glucose challenge.

    Franks PW, Ekelund U, Brage S, Luan J, Schafer AJ, O'Rahilly S, Barroso I and Wareham NJ

    Genetic Epidemiology and Clinical Research Group, Department of Public Health and Clinical Medicine, Section for Medicine, Umeå University Hospital, Umeå, Sweden.

    The peroxisome proliferator-activated receptor gamma coactivator 1-alpha protein, encoded by the PPARGC1A gene, transcriptionally activates a complex pathway of lipid and glucose metabolism and is expressed primarily in tissues of high metabolic activity such as liver, heart and exercising oxidative skeletal muscle fibre. Ppargc1a-null mice develop systemic dyslipidaemia and hepatic steatosis. In humans, NEFAs downregulate PPARGC1A expression in skeletal muscle. Furthermore, a common non-synonymous coding variant at PPARGC1A (Gly482Ser, rs8192678) is associated with decreased PPARGC1A mRNA levels and increased type 2 diabetes risk.

    In a population-based sample of 691 healthy middle-aged Europids we assessed whether Gly482Ser is associated with levels of NEFA when fasting and in response to an oral glucose challenge. We also assessed the potential effect-modifying role of adipose tissue mass on these phenotypes.

    Results: After adjustment for age, sex, fat mass and fat-free mass, the Ser482 allele was associated with higher NEFA at 30 min and 2 h and with NEFA AUC (all values p≤0.02). Furthermore, suggestive evidence of interaction between fat mass and Gly482Ser was observed for fasting NEFA (p=0.059). After stratification by level of obesity, genotype associations were observed in the obese for fasting NEFA (p=0.028) and NEFA at 30 min (p=0.013) and 2 h (p=0.002), and with NEFA AUC (p=0.005), but no significant associations were observed in lean individuals (all values p>0.6).

    Our observations indicate that NEFA clearance is blunted following a glucose load in carriers of the PPARGC1A Ser482 allele. This association is augmented by obesity.

    Funded by: Medical Research Council: MC_U106179471, MC_U106179473; Wellcome Trust: 077016

    Diabetologia 2007;50;3;569-73

  • A common variant in the FTO gene is associated with body mass index and predisposes to childhood and adult obesity.

    Frayling TM, Timpson NJ, Weedon MN, Zeggini E, Freathy RM, Lindgren CM, Perry JR, Elliott KS, Lango H, Rayner NW, Shields B, Harries LW, Barrett JC, Ellard S, Groves CJ, Knight B, Patch AM, Ness AR, Ebrahim S, Lawlor DA, Ring SM, Ben-Shlomo Y, Jarvelin MR, Sovio U, Bennett AJ, Melzer D, Ferrucci L, Loos RJ, Barroso I, Wareham NJ, Karpe F, Owen KR, Cardon LR, Walker M, Hitman GA, Palmer CN, Doney AS, Morris AD, Smith GD, Hattersley AT and McCarthy MI

    Genetics of Complex Traits, Institute of Biomedical and Clinical Science, Peninsula Medical School, Magdalen Road, Exeter, UK.

    Obesity is a serious international health problem that increases the risk of several common diseases. The genetic factors predisposing to obesity are poorly understood. A genome-wide search for type 2 diabetes-susceptibility genes identified a common variant in the FTO (fat mass and obesity associated) gene that predisposes to diabetes through an effect on body mass index (BMI). An additive association of the variant with BMI was replicated in 13 cohorts with 38,759 participants. The 16% of adults who are homozygous for the risk allele weighed about 3 kilograms more and had 1.67-fold increased odds of obesity when compared with those not inheriting a risk allele. This association was observed from age 7 years upward and reflects a specific increase in fat mass.

    Funded by: Medical Research Council: G0000934, G0500070, G0600705, G9815508, MC_U106179471, MC_U106188470; NIA NIH HHS: Z99 AG999999; Wellcome Trust: 079557

    Science (New York, N.Y.) 2007;316;5826;889-94

  • Definition of the zebrafish genome using flow cytometry and cytogenetic mapping.

    Freeman JL, Adeniyi A, Banerjee R, Dallaire S, Maguire SF, Chi J, Ng BL, Zepeda C, Scott CE, Humphray S, Rogers J, Zhou Y, Zon LI, Carter NP, Yang F and Lee C

    Department of Pathology, Brigham and Women's Hospital, Boston, Massachusetts 02115, USA.

    Background: The zebrafish (Danio rerio) is an important vertebrate model organism system for biomedical research. The syntenic conservation between the zebrafish and human genome allows one to investigate the function of human genes using the zebrafish model. To facilitate analysis of the zebrafish genome, genetic maps have been constructed and sequence annotation of a reference zebrafish genome is ongoing. However, the duplicative nature of teleost genomes, including the zebrafish, complicates accurate assembly and annotation of a representative genome sequence. Cytogenetic approaches provide "anchors" that can be integrated with accumulating genomic data.

    Results: Here, we cytogenetically define the zebrafish genome by first estimating the size of each linkage group (LG) chromosome using flow cytometry, followed by the cytogenetic mapping of 575 bacterial artificial chromosome (BAC) clones onto metaphase chromosomes. Of the 575 BAC clones, 544 clones localized to apparently unique chromosomal locations. 93.8% of these clones were assigned to a specific LG chromosome location using fluorescence in situ hybridization (FISH) and compared to the LG chromosome assignment reported in the zebrafish genome databases. Thirty-one BAC clones localized to multiple chromosomal locations in several different hybridization patterns. From these data, a refined second generation probe panel for each LG chromosome was also constructed.

    Conclusion: The chromosomal mapping of the 575 large-insert DNA clones allows for these clones to be integrated into existing zebrafish mapping data. An accurately annotated zebrafish reference genome serves as a valuable resource for investigating the molecular basis of human diseases using zebrafish mutant models.

    Funded by: NCI NIH HHS: R01-CA111560; NHLBI NIH HHS: T32 HL007627; Wellcome Trust

    BMC genomics 2007;8;195

  • Construction, visualisation, and clustering of transcription networks from microarray expression data.

    Freeman TC, Goldovsky L, Brosch M, van Dongen S, Mazière P, Grocock RJ, Freilich S, Thornton J and Enright AJ

    Division of Pathway Medicine, University of Edinburgh Medical School, Edinburgh, United Kingdom.

    Network analysis transcends conventional pairwise approaches to data analysis as the context of components in a network graph can be taken into account. Such approaches are increasingly being applied to genomics data, where functional linkages are used to connect genes or proteins. However, while microarray gene expression datasets are now abundant and of high quality, few approaches have been developed for analysis of such data in a network context. We present a novel approach for 3-D visualisation and analysis of transcriptional networks generated from microarray data. These networks consist of nodes representing transcripts connected by virtue of their expression profile similarity across multiple conditions. Analysing genome-wide gene transcription across 61 mouse tissues, we describe the unusual topography of the large and highly structured networks produced, and demonstrate how they can be used to visualise, cluster, and mine large datasets. This approach is fast, intuitive, and versatile, and allows the identification of biological relationships that may be missed by conventional analysis techniques. This work has been implemented in a freely available open-source application named BioLayout Express(3D).

    Funded by: Wellcome Trust

    PLoS computational biology 2007;3;10;2032-42

  • Backseat drivers take the wheel.

    Futreal PA

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

    Somatic mutations in human cancers are comprised of those that contribute to the oncogenic phenotype, driver mutations, and those that reflect the general patterns of exposure and disrepair but are otherwise noncontributory, passenger mutations. Distinguishing drivers that can be of low frequency in any given tumor type from often more numerous passengers is a key challenge. In this issue of Cancer Cell, Fröhling and colleagues tackle this challenge admirably for the known cancer gene FLT3 in acute myeloid leukemia--undertaking a systematic resequencing and functional validation approach, identifying important rare driver mutations as well as passenger mutations in patients negative for the more common activating mutations.

    Funded by: Wellcome Trust: 077012

    Cancer cell 2007;12;6;493-4

  • Fxna, a novel gene differentially expressed in the rat ovary at the time of folliculogenesis, is required for normal ovarian histogenesis.

    Garcia-Rudaz C, Luna F, Tapia V, Kerr B, Colgin L, Galimi F, Dissen GA, Rawlings ND and Ojeda SR

    Division of Neuroscience, Oregon National Primate Research Center, Oregon Health and Science University, Beaverton, OR, USA.

    In rodents, the formation of ovarian follicles occurs after birth. In recent years, several factors required for follicular assembly and the growth of the newly formed follicles have been identified. We now describe a novel gene, Fxna, identified by differential display in the neonatal rat ovary. Fxna encodes an mRNA of 5.4 kb, and a protein of 898 amino acids. Fxna is a transmembrane metallopeptidase from family M28, localized to the endoplasmic reticulum. In the ovary, Fxna mRNA is expressed in granulosa cells; its abundance is maximal 48 hours after birth, i.e. during the initiation of follicular assembly. Reducing Fxna mRNA levels via lentiviral-mediated delivery of short hairpin RNAs to neonatal ovaries resulted in substantial loss of primordial, primary and secondary follicles, and structural disorganization of the ovary, with many abnormal follicles containing more than one oocyte and clusters of somatic cells not associated with any oocytes. These abnormalities were not attributable to either increased apoptosis or decreased proliferation of granulosa cells. The results indicate that Fxna is required for the organization of somatic cells and oocytes into discrete follicular structures. As an endoplasmic reticulum-bound peptidase, Fxna may facilitate follicular organization by processing precursor proteins required for intraovarian cell-to-cell communication.

    Funded by: NCRR NIH HHS: RR-00163; NICHD NIH HHS: HD-24870, TW/HD00668, U54 HD18185

    Development (Cambridge, England) 2007;134;5;945-57

  • Large-scale mapping of mutations affecting zebrafish development.

    Geisler R, Rauch GJ, Geiger-Rudolph S, Albrecht A, van Bebber F, Berger A, Busch-Nentwich E, Dahm R, Dekens MP, Dooley C, Elli AF, Gehring I, Geiger H, Geisler M, Glaser S, Holley S, Huber M, Kerr A, Kirn A, Knirsch M, Konantz M, Küchler AM, Maderspacher F, Neuhauss SC, Nicolson T, Ober EA, Praeg E, Ray R, Rentzsch B, Rick JM, Rief E, Schauerte HE, Schepp CP, Schönberger U, Schonthaler HB, Seiler C, Sidi S, Söllner C, Wehner A, Weiler C and Nüsslein-Volhard C

    Department 3--Genetics, Max-Planck-Institut für Entwicklungsbiologie, Spemannstr. 35/III, 72076 Tübingen, Germany.

    Background: Large-scale mutagenesis screens in the zebrafish employing the mutagen ENU have isolated several hundred mutant loci that represent putative developmental control genes. In order to realize the potential of such screens, systematic genetic mapping of the mutations is necessary. Here we report on a large-scale effort to map the mutations generated in mutagenesis screening at the Max Planck Institute for Developmental Biology by genome scanning with microsatellite markers.

    Results: We have selected a set of microsatellite markers and developed methods and scoring criteria suitable for efficient, high-throughput genome scanning. We have used these methods to successfully obtain a rough map position for 319 mutant loci from the Tübingen I mutagenesis screen and subsequent screening of the mutant collection. For 277 of these the corresponding gene is not yet identified. Mapping was successful for 80% of the tested loci. By comparing 21 mutation and gene positions of cloned mutations we have validated the correctness of our linkage group assignments and estimated the standard error of our map positions to be approximately 6 cM.

    Conclusion: By obtaining rough map positions for over 300 zebrafish loci with developmental phenotypes, we have generated a dataset that will be useful not only for cloning of the affected genes, but also to suggest allelism of mutations with similar phenotypes that will be identified in future screens. Furthermore this work validates the usefulness of our methodology for rapid, systematic and inexpensive microsatellite mapping of zebrafish mutations.

    BMC genomics 2007;8;11

  • RNA interference in parasitic helminths: current situation, potential pitfalls and future prospects.

    Geldhof P, Visser A, Clark D, Saunders G, Britton C, Gilleard J, Berriman M and Knox D

    Faculty of Veterinary Medicine, Department of Virology, Parasitology and Immunology, Salisburylaan 133, B-9820 Merelbeke, Belgium.

    RNA interference (RNAi) has become an invaluable tool for the functional analysis of genes in a wide variety of organisms including the free-living nematode Caenorhabditis elegans. Recently, attempts have been made to apply this technology to parasitic helminths of animals and plants with variable success. Gene knockdown has been reported for Schistosoma mansoni by soaking or electroporating different life-stages in dsRNA. Similar approaches have been tested on parasitic nematodes which clearly showed that, under certain conditions, it was possible to interfere with gene expression. However, despite these successes, the current utility of this technology in parasite research is questionable. First, problems have arisen with the specificity of RNAi. Treatment of the parasites with dsRNA resulted, in many cases, in non-specific effects. Second, the current RNAi methods have a limited efficiency and effects are sometimes difficult to reproduce. This was especially the case in strongylid parasites where only a small number of genes were susceptible to RNAi-mediated gene knockdown. The future application of RNAi in parasite functional genomics will greatly depend on how we can overcome these difficulties. Optimization of the dsRNA delivery methods and in vitro culture conditions will be the major challenges.

    Funded by: Wellcome Trust

    Parasitology 2007;134;Pt 5;609-19

  • The obesity-associated FTO gene encodes a 2-oxoglutarate-dependent nucleic acid demethylase.

    Gerken T, Girard CA, Tung YC, Webby CJ, Saudek V, Hewitson KS, Yeo GS, McDonough MA, Cunliffe S, McNeill LA, Galvanovskis J, Rorsman P, Robins P, Prieur X, Coll AP, Ma M, Jovanovic Z, Farooqi IS, Sedgwick B, Barroso I, Lindahl T, Ponting CP, Ashcroft FM, O'Rahilly S and Schofield CJ

    Chemistry Research Laboratory and Oxford Centre for Integrative Systems Biology, University of Oxford, 12 Mansfield Road, Oxford, Oxon OX1 3TA, UK.

    Variants in the FTO (fat mass and obesity associated) gene are associated with increased body mass index in humans. Here, we show by bioinformatics analysis that FTO shares sequence motifs with Fe(II)- and 2-oxoglutarate-dependent oxygenases. We find that recombinant murine Fto catalyzes the Fe(II)- and 2OG-dependent demethylation of 3-methylthymine in single-stranded DNA, with concomitant production of succinate, formaldehyde, and carbon dioxide. Consistent with a potential role in nucleic acid demethylation, Fto localizes to the nucleus in transfected cells. Studies of wild-type mice indicate that Fto messenger RNA (mRNA) is most abundant in the brain, particularly in hypothalamic nuclei governing energy balance, and that Fto mRNA levels in the arcuate nucleus are regulated by feeding and fasting. Studies can now be directed toward determining the physiologically relevant FTO substrate and how nucleic acid methylation status is linked to increased fat mass.

    Funded by: Medical Research Council: G108/617, G9824984, MC_U137761446; NIGMS NIH HHS: U54 GM064346; Wellcome Trust: 068086, 077016

    Science (New York, N.Y.) 2007;318;5855;1469-72

  • Draft genome of the filarial nematode parasite Brugia malayi.

    Ghedin E, Wang S, Spiro D, Caler E, Zhao Q, Crabtree J, Allen JE, Delcher AL, Guiliano DB, Miranda-Saavedra D, Angiuoli SV, Creasy T, Amedeo P, Haas B, El-Sayed NM, Wortman JR, Feldblyum T, Tallon L, Schatz M, Shumway M, Koo H, Salzberg SL, Schobel S, Pertea M, Pop M, White O, Barton GJ, Carlow CK, Crawford MJ, Daub J, Dimmic MW, Estes CF, Foster JM, Ganatra M, Gregory WF, Johnson NM, Jin J, Komuniecki R, Korf I, Kumar S, Laney S, Li BW, Li W, Lindblom TH, Lustigman S, Ma D, Maina CV, Martin DM, McCarter JP, McReynolds L, Mitreva M, Nutman TB, Parkinson J, Peregrín-Alvarez JM, Poole C, Ren Q, Saunders L, Sluder AE, Smith K, Stanke M, Unnasch TR, Ware J, Wei AD, Weil G, Williams DJ, Zhang Y, Williams SA, Fraser-Liggett C, Slatko B, Blaxter ML and Scott AL

    Division of Infectious Diseases, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA.

    Parasitic nematodes that cause elephantiasis and river blindness threaten hundreds of millions of people in the developing world. We have sequenced the approximately 90 megabase (Mb) genome of the human filarial parasite Brugia malayi and predict approximately 11,500 protein coding genes in 71 Mb of robustly assembled sequence. Comparative analysis with the free-living, model nematode Caenorhabditis elegans revealed that, despite these genes having maintained little conservation of local synteny during approximately 350 million years of evolution, they largely remain in linkage on chromosomal units. More than 100 conserved operons were identified. Analysis of the predicted proteome provides evidence for adaptations of B. malayi to niches in its human and vector hosts and insights into the molecular basis of a mutualistic relationship with its Wolbachia endosymbiont. These findings offer a foundation for rational drug design.

    Funded by: NIAID NIH HHS: R01 AI048562, R01 AI048562-09, U01-AI50903; NIEHS NIH HHS: R15 ES013128, R15 ES013128-01; NLM NIH HHS: R01 LM006845, R01 LM006845-08, R01 LM007938, R01 LM007938-04

    Science (New York, N.Y.) 2007;317;5845;1756-60

  • Chromosomal evolution in tenrecs (Microgale and Oryzorictes, Tenrecidae) from the Central Highlands of Madagascar.

    Gilbert C, Goodman SM, Soarimalala V, Olson LE, O'Brien PC, Elder FF, Yang F, Ferguson-Smith MA and Robinson TJ

    Evolutionary Genomics Group, Department of Botany and Zoology, University of Stellenbosch, Stellenbosch, South Africa.

    Tenrecs (Tenrecidae) are a widely diversified assemblage of small eutherian mammals that occur in Madagascar and Western and Central Africa. With the exception of a few early karyotypic descriptions based on conventional staining, nothing is known about the chromosomal evolution of this family. We present a detailed analysis of G-banded and molecularly defined chromosomes based on fluorescence in situ hybridization (FISH) that allows a comprehensive comparison between the karyotypes of 11 species of two closely related Malagasy genera, Microgale (10 species) and Oryzorictes (one species), of the subfamily Oryzorictinae. The karyotypes of Microgale taiva and M. parvula (2n = 32) were found to be identical to that of O. hova (2n = 32) most likely reflecting the ancestral karyotypes of both genera, as well as that of the Oryzorictinae. Parsimony analysis of chromosomal rearrangements that could have arisen following Whole Arm Reciprocal Translocations (WARTs) showed, however, that these are more likely to be the result of Robertsonian translocations. A single most parsimonious tree was obtained that provides strong support for three species associations within Microgale, all of which are consistent with previous molecular and morphological investigations. By expanding on a recently published molecular clock for the Tenrecidae we were able to place our findings in a temporal framework that shows strong chromosomal rate heterogeneity within the Oryzorictinae. We use these data to critically examine the possible role of chromosomal rearrangements in speciation within Microgale.

    Funded by: Wellcome Trust

    Chromosome research : an international journal on the molecular, supramolecular and evolutionary aspects of chromosome biology 2007;15;8;1075-91

  • WormBook: the online review of Caenorhabditis elegans biology.

    Girard LR, Fiedler TJ, Harris TW, Carvalho F, Antoshechkin I, Han M, Sternberg PW, Stein LD and Chalfie M

    Division of Biology, 156-29, Pasadena, CA 91125, USA.

    WormBook is an open-access, online collection of original, peer-reviewed chapters on the biology of Caenorhabditis elegans and related nematodes. Since WormBook was launched in June 2005 with 12 chapters, it has grown to over 100 chapters, covering nearly every aspect of C. elegans research, from Cell Biology and Neurobiology to Evolution and Ecology. WormBook also serves as the text companion to WormBase, the C. elegans model organism database. Objects such as genes, proteins and cells are linked to the relevant pages in WormBase, providing easily accessible background information. Additionally, WormBook chapters contain links to other relevant topics in WormBook, and the in-text citations are linked to their abstracts in PubMed and full-text references, if available. Since WormBook is online, its chapters are able to contain movies and complex images that would not be possible in a print version. WormBook is designed to keep up with the rapid pace of discovery in the field of C. elegans research and continues to grow. WormBook represents a generic publishing infrastructure that is easily adaptable to other research communities to facilitate the dissemination of knowledge in the field.

    Funded by: NHGRI NIH HHS: P41 HG02223

    Nucleic acids research 2007;35;Database issue;D472-5

  • Setting the tempo in development: an investigation of the zebrafish somite clock mechanism.

    Giudicelli F, Ozbudak EM, Wright GJ and Lewis J

    Vertebrate Development Laboratory, Cancer Research UK London Research Institute, London, United Kingdom.

    The somites of the vertebrate embryo are clocked out sequentially from the presomitic mesoderm (PSM) at the tail end of the embryo. Formation of each somite corresponds to one cycle of oscillation of the somite segmentation clock--a system of genes whose expression switches on and off periodically in the cells of the PSM. We have previously proposed a simple mathematical model explaining how the oscillations, in zebrafish at least, may be generated by a delayed negative feedback loop in which the products of two Notch target genes, her1 and her7, directly inhibit their own transcription, as well as that of the gene for the Notch ligand DeltaC; Notch signalling via DeltaC keeps the oscillations of neighbouring cells in synchrony. Here we subject the model to quantitative tests. We show how to read temporal information from the spatial pattern of stripes of gene expression in the anterior PSM and in this way obtain values for the biosynthetic delays and molecular lifetimes on which the model critically depends. Using transgenic lines of zebrafish expressing her1 or her7 under heat-shock control, we confirm the regulatory relationships postulated by the model. From the timing of somite segmentation disturbances following a pulse of her7 misexpression, we deduce that although her7 continues to oscillate in the anterior half of the PSM, it governs the future somite segmentation behaviour of the cells only while they are in the posterior half. In general, the findings strongly support the mathematical model of how the somite clock works, but they do not exclude the possibility that other oscillator mechanisms may operate upstream from the her7/her1 oscillator or in parallel with it.

    PLoS biology 2007;5;6;e150

  • Cutaneous immune responses in the common carp detected using transcript analysis.

    Gonzalez SF, Chatziandreou N, Nielsen ME, Li W, Rogers J, Taylor R, Santos Y and Cossins A

    Department of Veterinary Pathobiology, Laboratory for Fish Diseases, The Royal Veterinary and Agricultural University, Stigbøjlen 7, DK-1870 Frederiksberg C, Denmark.

    In order to detect new immune-related genes in common carp (Cyprinus carpio L.) challenged by an ectoparasitic infection, two cDNA libraries were constructed from carp skin sampled at 3 and 72h after infection with Ichthyophthirius multifiliis. In a total of 3500 expressed sequence tags (ESTs) we identified 82 orthologues of genes of immune relevance previously described in other organisms. Of these, 61 have never been described before in C. carpio, thus shedding light on some key components of the defence mechanisms of this species. Among the newly described genes, full-length molecules of prostaglandin D2 synthase (PGDS), the CC chemokine molecule SCYA103, and a second gene for the carp β2-microglobulin (β2m), β2m-2, were described. Transcript amounts of the genes PGDS, interferon (IFN), SCYA103, complement factor 7 (C7), complement factor P (FP), complement factor D (FD) and β2m-2 were evaluated by real-time quantitative PCR (RQ-PCR). Samples from skin, blood and liver from fish challenged with I. multifiliis were taken at 3, 12, 24, 36 and 48h post infection. Higher expression levels of most of these transcripts were observed in skin from uninfected fish, compared to the transcript levels detected in blood and liver from the same animals. Also, there was significant down-regulation of the genes PGDS and β2m-2 in skin, whilst significant up-regulation was observed for the C7 and SCYA103 genes in liver of fish infected with the parasite. These results confirm the active role of fish skin in the immune response against infections, acting as an important site of expression of immune-related molecules.

    Molecular immunology 2007;44;7;1664-79

  • Genome-wide dynamics of SAPHIRE, an essential complex for gene activation and chromatin boundaries.

    Gordon M, Holt DG, Panigrahi A, Wilhelm BT, Erdjument-Bromage H, Tempst P, Bähler J and Cairns BR

    University of Utah, Huntsman Cancer Institute, 2000 Circle of Hope, Salt Lake City, UT 84112, USA.

    In this study, we characterize a four-protein nucleosome-binding complex from Schizosaccharomyces pombe, termed SAPHIRE, that includes two orthologs of human Lsd1, a histone demethylase. The SAPHIRE complex is essential for cell viability, whereas saphire mutants lacking key conserved catalytic residues are viable but thermosensitive, suggesting that SAPHIRE has both an important enzymatic function and an essential nonenzymatic function. SAPHIRE is present in (or adjacent to) particular heterochromatic loci and also in the transcription start site regions of many highly active polymerase II genes. However, ribosomal protein genes are notably SAPHIRE deficient. SAPHIRE promotes activation, as target genes are selectively attenuated in saphire mutants. Interestingly, saphire mutants display increased histone H3 lysine 4 dimethylation, a modification typically associated with euchromatin. SAPHIRE localization is dynamic, as activated genes rapidly acquire SAPHIRE. Furthermore, saphire mutants dramatically shift a heterochromatin-euchromatin boundary in Chr1, suggesting a novel role in boundary regulation.

    Funded by: Cancer Research UK: A6517; Wellcome Trust: 077118

    Molecular and cellular biology 2007;27;11;4058-69

  • Modulation of steroidogenic gene expression and hormone production of H295R cells by pharmaceuticals and other environmentally active compounds.

    Gracia T, Hilscherova K, Jones PD, Newsted JL, Higley EB, Zhang X, Hecker M, Murphy MB, Yu RM, Lam PK, Wu RS and Giesy JP

    Department of Zoology, National Food Safety and Toxicology Center, Center for Integrative Toxicology, Michigan State University, East Lansing, MI 48824, USA.

    The H295R cell bioassay was used to evaluate the potential endocrine disrupting effects of 18 of the most commonly used pharmaceuticals in the United States. Exposures for 48 h with single pharmaceuticals and binary mixtures were conducted; the expression of five steroidogenic genes, 3betaHSD2, CYP11beta1, CYP11beta2, CYP17 and CYP19, was quantified by Q-RT-PCR. Production of the steroid hormones estradiol (E2), testosterone (T) and progesterone (P) was also evaluated. Antibiotics were shown to modulate gene expression and hormone production. Amoxicillin up-regulated the expression of CYP11beta2 and CYP19 by more than 2-fold and induced estradiol production up to almost 3-fold. Erythromycin significantly increased CYP11beta2 expression and the production of P and E2 by 3.5- and 2.4-fold, respectively, while production of T was significantly decreased. The beta-blocker salbutamol caused the greatest induction of CYP17, more than 13-fold, and significantly decreased E2 production. The binary mixture of cyproterone and salbutamol significantly down-regulated expression of CYP19, while a mixture of ethynylestradiol and trenbolone increased E2 production 3.7-fold. Estradiol production was significantly affected by changes in concentrations of trenbolone, cyproterone, and ethynylestradiol. Exposures with individual pharmaceuticals showed the possible secondary effects that drugs may exert on steroid production. Results from binary mixture exposures suggested the types of interaction that may occur between drugs and the joint effects that result from such interactions. Dose-response results indicated that although two chemicals may share a common mechanism of action, the concentration effects observed may be significantly different.

    Toxicology and applied pharmacology 2007;225;2;142-53

  • Toward a molecular catalogue of synapses.

    Grant SG

    Genes to Cognition Programme, The Wellcome Trust Sanger Institute, Hinxton, Cambridge, Cambridgeshire, UK.

    1906 was a landmark year in the history of the study of the nervous system, most notably for the first 'neuroscience' Nobel prize given to the anatomists Ramón y Cajal and Camillo Golgi. 1906 is less well known for another event, also of great significance for neuroscience, namely the publication of Charles Sherrington's book 'The Integrative Action of the Nervous System'. It was Cajal and Golgi who debated the anatomical evidence for the synapse and it was Sherrington who laid its foundation in electrophysiological function. In tribute to these pioneers in synaptic biology, this article will address the issue of synapse diversity from the molecular point of view. In particular I will reflect upon efforts to obtain a complete molecular characterisation of the synapse and the unexpectedly high degree of molecular complexity found within it. A case will be made for developing approaches that can be used to generate a general catalogue of synapse types based on molecular markers, which should have wide application.

    Funded by: Wellcome Trust

    Brain research reviews 2007;55;2;445-9

  • Ultra-high resolution array painting facilitates breakpoint sequencing.

    Gribble SM, Kalaitzopoulos D, Burford DC, Prigmore E, Selzer RR, Ng BL, Matthews NS, Porter KM, Curley R, Lindsay SJ, Baptista J, Richmond TA and Carter NP

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Objective: To describe a considerably advanced method of array painting, which allows the rapid, ultra-high resolution mapping of translocation breakpoints such that rearrangement junction fragments can be amplified directly and sequenced.

    Method: Ultra-high resolution array painting involves the hybridisation of probes generated by the amplification of small numbers of flow-sorted derivative chromosomes to oligonucleotide arrays designed to tile breakpoint regions at extremely high resolution.

    Results: Ultra-high resolution array painting of four balanced translocation cases is shown to map breakpoints rapidly and efficiently, to a point where junction fragments can be amplified easily and sequenced. With this new development, breakpoints can be mapped using just two array experiments: the first using whole-genome array painting to tiling resolution large insert clone arrays, the second using ultra-high-resolution oligonucleotide arrays targeted to the breakpoint regions. In this way, breakpoints can be mapped and then sequenced in a few weeks.

    Funded by: Wellcome Trust: 077008

    Journal of medical genetics 2007;44;1;51-8

  • Annotating noncoding RNA genes.

    Griffiths-Jones S

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.

    Noncoding RNA genes produce a functional RNA product rather than a translated protein. More than 1500 homologs of known "classical" RNA genes can be annotated in the human genome sequence, and automatic homology-based methods predict up to 5000 related sequences. Methods to predict novel RNA genes on a whole-genome scale are immature at present, but their use hints at tens of thousands of such genes in the human genome. Messenger RNA-like transcripts with no protein-coding potential are routinely discovered by high-throughput transcriptome analyses. Meanwhile, various experimental studies have suggested that the vast majority of the human genome is transcribed, although the proportion of the detected RNAs that is functional remains unknown.

    Funded by: Wellcome Trust

    Annual review of genomics and human genetics 2007;8;279-98

  • FAK is required for axonal sorting by Schwann cells.

    Grove M, Komiyama NH, Nave KA, Grant SG, Sherman DL and Brophy PJ

    Centre for Neuroscience Research, University of Edinburgh, Edinburgh EH9 1QH, Scotland, UK.

    Signaling by laminins and axonal neuregulin has been implicated in regulating axon sorting by myelin-forming Schwann cells. However, the signal transduction mechanisms are unknown. Focal adhesion kinase (FAK) has been linked to alpha6beta1 integrin and ErbB receptor signaling, and we show that myelination by Schwann cells lacking FAK is severely impaired. Mutant Schwann cells could interdigitate between axon bundles, indicating that FAK signaling was not required for process extension. However, Schwann cell FAK was required to stimulate cell proliferation, suggesting that amyelination was caused by insufficient Schwann cells. ErbB2 receptor and AKT were robustly phosphorylated in mutant Schwann cells, indicating that neuregulin signaling from axons was unimpaired. These findings demonstrate the vital relationship between axon defasciculation and Schwann cell number and show the importance of FAK in regulating cell proliferation in the developing nervous system.

    Funded by: Wellcome Trust

    The Journal of cell biology 2007;176;3;277-82

  • Re-annotation and re-analysis of the Campylobacter jejuni NCTC11168 genome sequence.

    Gundogdu O, Bentley SD, Holden MT, Parkhill J, Dorrell N and Wren BW

    Pathogen Molecular Department, London School of Hygiene & Tropical Medicine, UK.

    Background: Campylobacter jejuni is the leading bacterial cause of human gastroenteritis in the developed world. To improve our understanding of this important human pathogen, the C. jejuni NCTC11168 genome was sequenced and published in 2000. The original annotation was a milestone in Campylobacter research, but is outdated. We now describe the complete re-annotation and re-analysis of the C. jejuni NCTC11168 genome using current database information, novel tools and annotation techniques not used during the original annotation.

    Results: Re-annotation was carried out using sequence database searches such as FASTA, along with programs such as TMHMM for additional support. The re-annotation also utilises sequence data from additional Campylobacter strains and species not available during the original annotation. Re-annotation was accompanied by a full literature search that was incorporated into the updated EMBL file [EMBL: AL111168]. The C. jejuni NCTC11168 re-annotation reduced the total number of coding sequences from 1654 to 1643, of which 90.0% have additional information regarding the identification of new motifs and/or relevant literature. Re-annotation has led to 18.2% of coding sequence product functions being revised.

    Conclusions: Major updates were made to genes involved in the biosynthesis of important surface structures such as lipooligosaccharide, capsule and both O- and N-linked glycosylation. This re-annotation will be a key resource for Campylobacter research and will also provide a prototype for the re-annotation and re-interpretation of other bacterial genomes.

    Funded by: Wellcome Trust

    BMC genomics 2007;8;162

  • Improving the power to detect differentially expressed genes in comparative microarray experiments by including information from self-self hybridizations.

    Gusnanto A, Tom B, Burns P, Macaulay I, Thijssen-Timmer DC, Tijssen MR, Langford C, Watkins N, Ouwehand W, Berzuini C and Dudbridge F

    Medical Research Council-Biostatistics Unit, Institute of Public Health, Cambridge CB2 2SR, UK.

    Our ability to detect differentially expressed genes in a microarray experiment can be hampered when the number of biological samples of interest is limited. In this situation, we propose the use of information from self-self hybridizations to sharpen our inference of differential expression. A unified modelling strategy is developed to allow better estimation of the error variance. This principle is similar to the use of a pooled variance estimate in the two-sample t-test. The results from real dataset examples suggest that we can detect more genes that are differentially expressed in the combined models. Our simulation study provides evidence that this method increases sensitivity compared to using the information from comparative hybridizations alone, given the same control for false discovery rate. The largest increase in sensitivity occurs when the amount of information in the comparative hybridization is limited.

    Funded by: Medical Research Council: MC_U105260799, MC_U105261167

    Computational biology and chemistry 2007;31;3;178-85
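
    The pooled-variance analogy in the abstract above can be illustrated with a minimal sketch. It is not the authors' unified model: it assumes a single gene, treats the comparative and self-self log-ratios as sharing one error variance, and uses the hypothetical names pooled_variance and pooled_t_statistic.

```python
import math
from statistics import mean, variance


def pooled_variance(comparative, self_self):
    """Pool the sample variances of two sets of log-ratios for one gene.

    Assumption: both sets share the same error variance, as in the
    classical two-sample t-test.
    """
    n1, n2 = len(comparative), len(self_self)
    s1 = variance(comparative)  # sample variance (n - 1 denominator)
    s2 = variance(self_self)
    return ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)


def pooled_t_statistic(comparative, self_self):
    """Test whether the mean comparative log-ratio differs from zero.

    The self-self hybridizations contribute only to the variance estimate,
    raising the degrees of freedom from n1 - 1 to n1 + n2 - 2.
    """
    n1, n2 = len(comparative), len(self_self)
    var = pooled_variance(comparative, self_self)
    t = mean(comparative) / math.sqrt(var / n1)
    return t, n1 + n2 - 2


# Hypothetical log-ratios for one gene.
comparative = [1.0, 1.2, 0.8]
self_self = [0.1, -0.1, 0.0, 0.2, -0.2]
t_value, dof = pooled_t_statistic(comparative, self_self)
```

    With only three comparative arrays, the variance estimate alone would rest on 2 degrees of freedom; pooling with five self-self arrays gives 6, which is the kind of gain the abstract describes when comparative information is limited.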

  • Schistosoma mansoni genome: closing in on a final gene set.

    Haas BJ, Berriman M, Hirai H, Cerqueira GG, Loverde PT and El-Sayed NM

    The J.C. Venter Institute, Rockville, MD 20850, USA.

    The Schistosoma mansoni genome sequencing consortium has recently released the latest versions of the genome assembly as well as an automated preliminary gene structure annotation. The combined datasets constitute a vast resource for researchers to exploit in a variety of post-genomic studies with an emphasis on transcriptomic and proteomic tools. Here we present an innovative method used for combining diverse sources of evidence including ab initio gene predictions, protein and transcript sequence homologies, and cross-genome sequence homologies between S. mansoni and Schistosoma japonicum to define a comprehensive list of protein-coding genes.

    Funded by: NIAID NIH HHS: AI48828; Wellcome Trust: 13557021

    Experimental parasitology 2007;117;3;225-8

  • Lessons learned from the initial sequencing of the pig genome: comparative analysis of an 8 Mb region of pig chromosome 17.

    Hart EA, Caccamo M, Harrow JL, Humphray SJ, Gilbert JG, Trevanion S, Hubbard T, Rogers J and Rothschild MF

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Background: We describe here the sequencing, annotation and comparative analysis of an 8 Mb region of pig chromosome 17, which provides a useful test region to assess coverage and quality for the pig genome sequencing project. We report our findings comparing the annotation of draft sequence assembled at different depths of coverage.

    Results: Within this region we annotated 71 loci, of which 53 are orthologous to human known coding genes. When compared to the syntenic regions in human (20q13.13-q13.33) and mouse (chromosome 2, 167.5 Mb-178.3 Mb), this region was found to be highly conserved with respect to gene order. The most notable difference between the three species is the presence of a large expansion of zinc finger coding genes and pseudogenes on mouse chromosome 2 between Edn3 and Phactr3 that is absent from pig and human. All of our annotation has been made publicly available in the Vertebrate Genome Annotation browser, VEGA. We assessed the impact of coverage on sequence assembly across this region and found, as expected, that increased sequence depth resulted in fewer, longer contigs. One-third of our annotated loci could not be fully re-aligned back to the low coverage version of the sequence, principally because the transcripts are fragmented over several contigs.

    Conclusion: We have demonstrated the considerable advantages of sequencing at increased read depths and discuss the implications that lower coverage sequence may have on subsequent comparative and functional studies, particularly those involving complex loci such as GNAS.

    Funded by: Biotechnology and Biological Sciences Research Council: BBE0116401; Wellcome Trust: 077198

    Genome biology 2007;8;8;R168

  • Specialist fungi, versatile genomes.

    Hertz-Fowler C and Pain A

    Nature reviews. Microbiology 2007;5;5;332-3

  • Comparative genomic analyses of seventeen Streptococcus pneumoniae strains: insights into the pneumococcal supragenome.

    Hiller NL, Janto B, Hogg JS, Boissy R, Yu S, Powell E, Keefe R, Ehrlich NE, Shen K, Hayes J, Barbadora K, Klimke W, Dernovoy D, Tatusova T, Parkhill J, Bentley SD, Post JC, Ehrlich GD and Hu FZ

    Allegheny General Hospital, Allegheny-Singer Research Institute, Center for Genomic Sciences, Pittsburgh, PA 15212, USA.

    The distributed-genome hypothesis (DGH) states that pathogenic bacteria possess a supragenome that is much larger than the genome of any single bacterium and that these pathogens utilize genetic recombination and a large, noncore set of genes as a means of diversity generation. We sequenced the genomes of eight nasopharyngeal strains of Streptococcus pneumoniae isolated from pediatric patients with upper respiratory symptoms and performed quantitative genomic analyses among these and nine publicly available pneumococcal strains. Coding sequences from all strains were grouped into 3,170 orthologous gene clusters, of which 1,454 (46%) were conserved among all 17 strains. The majority of the gene clusters, 1,716 (54%), were not found in all strains. Genic differences per strain pair ranged from 35 to 629 orthologous clusters, with each strain's genome containing between 21 and 32% noncore genes. The distribution of the orthologous clusters per genome for the 17 strains was entered into the finite-supragenome model, which predicted that (i) the S. pneumoniae supragenome contains more than 5,000 orthologous clusters and (ii) 99% of the orthologous clusters (approximately 3,000) that are represented in the S. pneumoniae population at frequencies of ≥0.1 can be identified if 33 representative genomes are sequenced. These extensive genic diversity data support the DGH and provide a basis for understanding the great differences in clinical phenotype associated with various pneumococcal strains. When these findings are taken together with previous studies that demonstrated the presence of a supragenome for Streptococcus agalactiae and Haemophilus influenzae, it appears that the possession of a distributed genome is a common host interaction strategy.

    Funded by: NIDCD NIH HHS: DC02148, DC04173, DC05659; Wellcome Trust

    Journal of bacteriology 2007;189;22;8186-95
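
    The core versus noncore split reported in the abstract above can be sketched as a simple classification over a presence/absence table. This is an illustration only, not the finite-supragenome model: the toy clusters, strain labels and function names are hypothetical, and only the totals 3,170, 1,454 and 1,716 come from the abstract.

```python
def split_core_distributed(presence, n_strains):
    """Split orthologous clusters into core and distributed (noncore) sets.

    presence maps a cluster name to the set of strains that carry it.
    A cluster is core only if every one of the n_strains carries it.
    """
    core = {c for c, strains in presence.items() if len(strains) == n_strains}
    distributed = set(presence) - core
    return core, distributed


def percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# Toy presence/absence data for three hypothetical strains A, B and C.
toy = {
    "cluster1": {"A", "B", "C"},
    "cluster2": {"A"},
    "cluster3": {"A", "B", "C"},
    "cluster4": {"B", "C"},
}
core, distributed = split_core_distributed(toy, n_strains=3)

# Consistency check on the figures quoted in the abstract.
core_pct = percent(1454, 3170)         # 46
distributed_pct = percent(1716, 3170)  # 54
```

    The two reported counts sum to the 3,170 clusters, and the rounded percentages match the 46% and 54% given in the abstract.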

  • Complete genome of acute rheumatic fever-associated serotype M5 Streptococcus pyogenes strain manfredo.

    Holden MT, Scott A, Cherevach I, Chillingworth T, Churcher C, Cronin A, Dowd L, Feltwell T, Hamlin N, Holroyd S, Jagels K, Moule S, Mungall K, Quail MA, Price C, Rabbinowitsch E, Sharp S, Skelton J, Whitehead S, Barrell BG, Kehoe M and Parkhill J

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK.

    Comparisons of the 1.84-Mb genome of serotype M5 Streptococcus pyogenes strain Manfredo with previously sequenced genomes emphasized the role of prophages in diversification of S. pyogenes and the close relationship between strain Manfredo and MGAS8232, another acute rheumatic fever-associated strain.

    Funded by: Wellcome Trust

    Journal of bacteriology 2007;189;4;1473-7

  • Multidrug-resistant Salmonella enterica serovar paratyphi A harbors IncHI1 plasmids similar to those found in serovar typhi.

    Holt KE, Thomson NR, Wain J, Phan MD, Nair S, Hasan R, Bhutta ZA, Quail MA, Norbertczak H, Walker D, Dougan G and Parkhill J

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Salmonella enterica serovars Typhi and Paratyphi A cause systemic infections in humans which are referred to as enteric fever. Multidrug-resistant (MDR) serovar Typhi isolates emerged in the 1980s, and in recent years MDR serovar Paratyphi A infections have become established as a significant problem across Asia. MDR in serovar Typhi is almost invariably associated with IncHI1 plasmids, but the genetic basis of MDR in serovar Paratyphi A has remained predominantly undefined. The DNA sequence of an IncHI1 plasmid, pAKU_1, encoding MDR in a serovar Paratyphi A strain has been determined. Significantly, this plasmid shares a common IncHI1-associated DNA backbone with the serovar Typhi plasmid pHCM1 and an S. enterica serovar Typhimurium plasmid pR27. Plasmids pAKU_1 and pHCM1 share 14 antibiotic resistance genes encoded within similar mobile elements, which appear to form a 24-kb composite transposon that has transferred as a single unit into different positions into their IncHI1 backbones. Thus, these plasmids have acquired similar antibiotic resistance genes independently via the horizontal transfer of mobile DNA elements. Furthermore, two IncHI1 plasmids from a Vietnamese isolate of serovar Typhi were found to contain features of the backbone sequence of pAKU_1 rather than pHCM1, with the composite transposon inserted in the same location as in the pAKU_1 sequence. Our data show that these serovar Typhi and Paratyphi A IncHI1 plasmids share highly conserved core DNA and have acquired similar mobile elements encoding antibiotic resistance genes in past decades.

    Funded by: Medical Research Council: G0600805; Wellcome Trust

    Journal of bacteriology 2007;189;11;4257-64

  • Generation of active protein phosphatase 2A is coupled to holoenzyme assembly.

    Hombauer H, Weismann D, Mudrak I, Stanzel C, Fellner T, Lackner DH and Ogris E

    Department of Medical Biochemistry, Max F. Perutz Laboratories, Medical University of Vienna, Vienna, Austria.

    Protein phosphatase 2A (PP2A) is a prime example of the multisubunit architecture of protein serine/threonine phosphatases. Until substrate-specific PP2A holoenzymes assemble, a constitutively active, but nonspecific, catalytic C subunit would constitute a risk to the cell. While it has been assumed that the severe proliferation impairment of yeast lacking the structural PP2A subunit, TPD3, is due to the unrestricted activity of the C subunit, we recently obtained evidence for the existence of the C subunit in a low-activity conformation that requires the RRD/PTPA proteins for the switch into the active conformation. To study whether and how maturation of the C subunit is coupled with holoenzyme assembly, we analyzed PP2A biogenesis in yeast. Here we show that the generation of the catalytically active C subunit depends on the physical and functional interaction between RRD2 and the structural subunit, TPD3. The phenotype of the tpd3Delta strain is therefore caused by impaired, rather than increased, PP2A activity. TPD3/RRD2-dependent C subunit maturation is under the surveillance of the PP2A methylesterase, PPE1, which upon malfunction of PP2A biogenesis, prevents premature generation of the active C subunit and holoenzyme assembly by counteracting the untimely methylation of the C subunit. We propose a novel model of PP2A biogenesis in which a tightly controlled activation cascade protects cells from untargeted activity of the free catalytic PP2A subunit.

    PLoS biology 2007;5;6;e155

  • Ensembl 2007.

    Hubbard TJ, Aken BL, Beal K, Ballester B, Caccamo M, Chen Y, Clarke L, Coates G, Cunningham F, Cutts T, Down T, Dyer SC, Fitzgerald S, Fernandez-Banet J, Graf S, Haider S, Hammond M, Herrero J, Holland R, Howe K, Howe K, Johnson N, Kahari A, Keefe D, Kokocinski F, Kulesha E, Lawson D, Longden I, Melsopp C, Megy K, Meidl P, Ouverdin B, Parker A, Prlic A, Rice S, Rios D, Schuster M, Sealy I, Severin J, Slater G, Smedley D, Spudich G, Trevanion S, Vilella A, Vogel J, White S, Wood M, Cox T, Curwen V, Durbin R, Fernandez-Suarez XM, Flicek P, Kasprzyk A, Proctor G, Searle S, Smith J, Ureta-Vidal A and Birney E

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.

    The Ensembl project provides a comprehensive and integrated source of annotation of chordate genome sequences. Over the past year the number of genomes available from Ensembl has increased from 15 to 33, with the addition of sites for the mammalian genomes of elephant, rabbit, armadillo, tenrec, platypus, pig, cat, bush baby, common shrew, microbat and European hedgehog; the fish genomes of stickleback and medaka and the second example of the genomes of the sea squirt (Ciona savignyi) and the mosquito (Aedes aegypti). Some of the major features added during the year include the first complete gene sets for genomes with low-sequence coverage, the introduction of new strain variation data and the introduction of new orthology/paralog annotations based on gene trees.

    Funded by: Biotechnology and Biological Sciences Research Council: BBS/B/13446, BBS/B/13470; Wellcome Trust: 062023

    Nucleic acids research 2007;35;Database issue;D610-7

  • Identification of common genetic variation that modulates alternative splicing.

    Hull J, Campino S, Rowlands K, Chan MS, Copley RR, Taylor MS, Rockett K, Elvidge G, Keating B, Knight J and Kwiatkowski D

    University Department of Paediatrics, John Radcliffe Hospital, Oxford, United Kingdom.

    Alternative splicing of genes is an efficient means of generating variation in protein function. Several disease states have been associated with rare genetic variants that affect splicing patterns. Conversely, splicing efficiency of some genes is known to vary between individuals without apparent ill effects. What is not clear is whether commonly observed phenotypic variation in splicing patterns, and hence potential variation in protein function, is to a significant extent determined by naturally occurring DNA sequence variation and in particular by single nucleotide polymorphisms (SNPs). In this study, we surveyed the splicing patterns of 250 exons in 22 individuals who had been previously genotyped by the International HapMap Project. We identified 70 simple cassette exon alternative splicing events in our experimental system; for six of these, we detected consistent differences in splicing pattern between individuals, with a highly significant association between splice phenotype and neighbouring SNPs. Remarkably, for five out of six of these events, the strongest correlation was found with the SNP closest to the intron-exon boundary, although the distance between these SNPs and the intron-exon boundary ranged from 2 bp to greater than 1,000 bp. Two of these SNPs were further investigated using a minigene splicing system, and in each case the SNPs were found to exert cis-acting effects on exon splicing efficiency in vitro. The functional consequences of these SNPs could not be predicted using bioinformatic algorithms. Our findings suggest that phenotypic variation in splicing patterns is determined by the presence of SNPs within flanking introns or exons. Effects on splicing may represent an important mechanism by which SNPs influence gene function.

    Funded by: Medical Research Council: G0600230, G19/9; Wellcome Trust: 074318

    PLoS genetics 2007;3;6;e99

  • Completing the map of human genetic variation.

    Human Genome Structural Variation Working Group, Eichler EE, Nickerson DA, Altshuler D, Bowcock AM, Brooks LD, Carter NP, Church DM, Felsenfeld A, Guyer M, Lee C, Lupski JR, Mullikin JC, Pritchard JK, Sebat J, Sherry ST, Smith D, Valle D and Waterston RH

    Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA.

    Funded by: Wellcome Trust: 077008

    Nature 2007;447;7141;161-5

  • A high utility integrated map of the pig genome.

    Humphray SJ, Scott CE, Clark R, Marron B, Bender C, Camm N, Davis J, Jenks A, Noon A, Patel M, Sehra H, Yang F, Rogatcheva MB, Milan D, Chardon P, Rohrer G, Nonneman D, de Jong P, Meyers SN, Archibald A, Beever JE, Schook LB and Rogers J

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA UK.

    Background: The domestic pig is being increasingly exploited as a system for modeling human disease. It also has substantial economic importance for meat-based protein production. Physical clone maps have underpinned large-scale genomic sequencing and enabled focused cloning efforts for many genomes. Comparative genetic maps indicate that there is more structural similarity between pig and human than, for example, mouse and human, and we have used this close relationship between human and pig as a way of facilitating map construction.

    Results: Here we report the construction of the most highly continuous bacterial artificial chromosome (BAC) map of any mammalian genome, for the pig (Sus scrofa domestica) genome. The map provides a template for the generation and assembly of high-quality anchored sequence across the genome. The physical map integrates previous landmark maps with restriction fingerprints and BAC end sequences from over 260,000 BACs derived from 4 BAC libraries and takes advantage of alignments to the human genome to improve the continuity and local ordering of the clone contigs. We estimate that over 98% of the euchromatin of the 18 pig autosomes and the X chromosome along with localized coverage on Y is represented in 172 contigs, with chromosome 13 (218 Mb) represented by a single contig. The map is accessible through pre-Ensembl, where links to marker and sequence data can be found.

    Conclusion: The map will enable immediate electronic positional cloning of genes, benefiting the pig research community and further facilitating use of the pig as an alternative animal model for human disease. The clone map and BAC end sequence data can also help to support the assembly of maps and genome sequences of other artiodactyls.

    Funded by: Biotechnology and Biological Sciences Research Council: BBE0116401; Wellcome Trust: 077198

    Genome biology 2007;8;7;R139

  • G-quadruplexes in promoters throughout the human genome.

    Huppert JL and Balasubramanian S

    Cambridge University Chemical Laboratory, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK.

    Certain G-rich DNA sequences readily form four-stranded structures called G-quadruplexes. These sequence motifs are located in telomeres as a repeated unit, and elsewhere in the genome, where their function is currently unknown. It has been proposed that G-quadruplexes may be directly involved in gene regulation at the level of transcription. In support of this hypothesis, we show that the promoter regions (1 kb upstream of the transcription start site, TSS) of genes are significantly enriched in quadruplex motifs relative to the rest of the genome, with >40% of human gene promoters containing one or more quadruplex motifs. Furthermore, these promoter quadruplexes strongly associate with nuclease hypersensitive sites identified throughout the genome via biochemical measurement. Regions of the human genome that are both nuclease hypersensitive and within promoters show a remarkable (230-fold) enrichment of quadruplex elements, compared to the rest of the genome. These quadruplex motifs identified in promoter regions also show an interesting structural bias towards more stable forms. These observations support the proposal that promoter G-quadruplexes are directly involved in the regulation of gene expression.

    Funded by: Cancer Research UK: A5709

    Nucleic acids research 2007;35;2;406-13
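
    A motif scan of the kind described in the abstract above might look like the following sketch. The abstract does not state the search pattern; the pattern used here (four runs of three or more G separated by loops of 1 to 7 bases) is a commonly used folding rule and is an assumption, as is the function name find_quadruplex_motifs.

```python
import re

# Assumed motif: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+ (not taken from the abstract).
G_PATTERN = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")
# The same motif on the opposite strand appears as runs of C on this strand.
C_PATTERN = re.compile(r"C{3,}[ACGT]{1,7}C{3,}[ACGT]{1,7}C{3,}[ACGT]{1,7}C{3,}")


def find_quadruplex_motifs(sequence):
    """Return (start, end, strand) for non-overlapping putative motifs.

    Coordinates are 0-based, end-exclusive, on the given strand.
    "+" marks a G-rich match, "-" a C-rich match (motif on the other strand).
    """
    seq = sequence.upper()
    hits = [(m.start(), m.end(), "+") for m in G_PATTERN.finditer(seq)]
    hits += [(m.start(), m.end(), "-") for m in C_PATTERN.finditer(seq)]
    return sorted(hits)


# Hypothetical 1 kb promoter windows would be scanned the same way;
# a telomere-like repeat is used here as a small test sequence.
example = "AT" + "GGGTTAGGGTTAGGGTTAGGG" + "AT"
example_hits = find_quadruplex_motifs(example)
```

    Scanning both the G-rich and C-rich patterns matters because a quadruplex on either strand of a promoter would count towards enrichment.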

  • A second generation human haplotype map of over 3.1 million SNPs.

    International HapMap Consortium, Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, Belmont JW, Boudreau A, Hardenbol P, Leal SM, Pasternak S, Wheeler DA, Willis TD, Yu F, Yang H, Zeng C, Gao Y, Hu H, Hu W, Li C, Lin W, Liu S, Pan H, Tang X, Wang J, Wang W, Yu J, Zhang B, Zhang Q, Zhao H, Zhao H, Zhou J, Gabriel SB, Barry R, Blumenstiel B, Camargo A, Defelice M, Faggart M, Goyette M, Gupta S, Moore J, Nguyen H, Onofrio RC, Parkin M, Roy J, Stahl E, Winchester E, Ziaugra L, Altshuler D, Shen Y, Yao Z, Huang W, Chu X, He Y, Jin L, Liu Y, Shen Y, Sun W, Wang H, Wang Y, Wang Y, Xiong X, Xu L, Waye MM, Tsui SK, Xue H, Wong JT, Galver LM, Fan JB, Gunderson K, Murray SS, Oliphant AR, Chee MS, Montpetit A, Chagnon F, Ferretti V, Leboeuf M, Olivier JF, Phillips MS, Roumy S, Sallée C, Verner A, Hudson TJ, Kwok PY, Cai D, Koboldt DC, Miller RD, Pawlikowska L, Taillon-Miller P, Xiao M, Tsui LC, Mak W, Song YQ, Tam PK, Nakamura Y, Kawaguchi T, Kitamoto T, Morizono T, Nagashima A, Ohnishi Y, Sekine A, Tanaka T, Tsunoda T, Deloukas P, Bird CP, Delgado M, Dermitzakis ET, Gwilliam R, Hunt S, Morrison J, Powell D, Stranger BE, Whittaker P, Bentley DR, Daly MJ, de Bakker PI, Barrett J, Chretien YR, Maller J, McCarroll S, Patterson N, Pe'er I, Price A, Purcell S, Richter DJ, Sabeti P, Saxena R, Schaffner SF, Sham PC, Varilly P, Altshuler D, Stein LD, Krishnan L, Smith AV, Tello-Ruiz MK, Thorisson GA, Chakravarti A, Chen PE, Cutler DJ, Kashuk CS, Lin S, Abecasis GR, Guan W, Li Y, Munro HM, Qin ZS, Thomas DJ, McVean G, Auton A, Bottolo L, Cardin N, Eyheramendy S, Freeman C, Marchini J, Myers S, Spencer C, Stephens M, Donnelly P, Cardon LR, Clarke G, Evans DM, Morris AP, Weir BS, Tsunoda T, Mullikin JC, Sherry ST, Feolo M, Skol A, Zhang H, Zeng C, Zhao H, Matsuda I, Fukushima Y, Macer DR, Suda E, Rotimi CN, Adebamowo CA, Ajayi I, Aniagwu T, Marshall PA, Nkwodimmah C, Royal CD, Leppert MF, Dixon M, Peiffer A, Qiu R, Kent A, Kato K, Niikawa N, Adewole IF, Knoppers BM, Foster MW, Clayton EW, Watkin J, Gibbs RA, Belmont JW, Muzny D, Nazareth L, Sodergren E, Weinstock GM, Wheeler DA, Yakub I, Gabriel SB, Onofrio RC, Richter DJ, Ziaugra L, Birren BW, Daly MJ, Altshuler D, Wilson RK, Fulton LL, Rogers J, Burton J, Carter NP, Clee CM, Griffiths M, Jones MC, McLay K, Plumb RW, Ross MT, Sims SK, Willey DL, Chen Z, Han H, Kang L, Godbout M, Wallenburg JC, L'Archevêque P, Bellemare G, Saeki K, Wang H, An D, Fu H, Li Q, Wang Z, Wang R, Holden AL, Brooks LD, McEwen JE, Guyer MS, Wang VO, Peterson JL, Shi M, Spiegel J, Sung LM, Zacharia LF, Collins FS, Kennedy K, Jamieson R and Stewart J

    The Scripps Research Institute, 10550 North Torrey Pines Road MEM275, La Jolla, California 92037, USA.

    We describe the Phase II HapMap, which characterizes over 3.1 million human single nucleotide polymorphisms (SNPs) genotyped in 270 individuals from four geographically diverse populations and includes 25-35% of common SNP variation in the populations surveyed. The map is estimated to capture untyped common variation with an average maximum r² of between 0.9 and 0.96 depending on population. We demonstrate that the current generation of commercial genome-wide genotyping products captures common Phase II SNPs with an average maximum r² of up to 0.8 in African and up to 0.95 in non-African populations, and that potential gains in power in association studies can be obtained through imputation. These data also reveal novel aspects of the structure of linkage disequilibrium. We show that 10-30% of pairs of individuals within a population share at least one region of extended genetic identity arising from recent ancestry and that up to 1% of all common variants are untaggable, primarily because they lie within recombination hotspots. We show that recombination rates vary systematically around genes and between genes of different function. Finally, we demonstrate increased differentiation at non-synonymous, compared to synonymous, SNPs, resulting from systematic differences in the strength or efficacy of natural selection between populations.

    Funded by: Wellcome Trust: 077008, 077011, 077046, 081682

    Nature 2007;449;7164;851-61
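
    The r-squared statistic quoted in the abstract above measures linkage disequilibrium between a pair of SNPs. The sketch below computes it from phased haplotypes using the standard definition; it does not reproduce the consortium's tagging or imputation analysis, and the function name and toy haplotypes are hypothetical.

```python
def ld_r_squared(haplotypes):
    """Compute r^2 between two biallelic SNPs from phased haplotypes.

    haplotypes is a list of (a, b) pairs, each allele coded 0 or 1.
    r^2 = D^2 / (pA * (1 - pA) * pB * (1 - pB)), with D = pAB - pA * pB.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a SNP is monomorphic")
    d = p_ab - p_a * p_b
    return d * d / denom


# A perfectly correlated pair: one SNP fully predicts (tags) the other.
perfect = [(1, 1), (1, 1), (0, 0), (0, 0)]
# An uncorrelated pair: all four haplotypes are equally frequent.
independent = [(1, 1), (1, 0), (0, 1), (0, 0)]
r2_perfect = ld_r_squared(perfect)
r2_independent = ld_r_squared(independent)
```

    A "maximum r²" for an untyped variant would then be the largest such value over the typed SNPs nearby, which is how a value near 1 indicates that the variant is well captured.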

  • Evolutionary consequences of a large duplication event in Trypanosoma brucei: chromosomes 4 and 8 are partial duplicons.

    Jackson AP

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.

    Background: Gene order along the genome sequence of the human parasite Trypanosoma brucei provides evidence for a 0.5 Mb duplication, comprising the 3' regions of chromosomes 4 and 8. Here, the principal aim was to examine the contribution made by this duplication event to the T. brucei genome sequence, emphasising the consequences for gene content and the evolutionary change subsequently experienced by paralogous gene copies. The duplicated region may be browsed online at

    Results: Comparisons of trypanosomatid genomes demonstrated widespread gene loss from each duplicon, but also showed that 47% of duplicated genes were retained on both chromosomes as paralogous loci. Secreted and surface-expressed genes were over-represented among retained paralogs, reflecting a bias towards important factors at the host-parasite interface, and consistent with a dosage-balance hypothesis. Genetic divergence in both coding and regulatory regions of retained paralogs was bimodal, with a deficit in moderately divergent paralogs; in particular, non-coding sequences were either conserved or entirely remodelled. The conserved paralogs included examples of remarkable sequence conservation, but also considerable divergence of both coding and regulatory regions. Sequence divergence typically displayed strong negative selection; but several features, such as asymmetric evolutionary rates, positively-selected codons and other non-neutral substitutions, suggested that divergence of some paralogs was driven by functional change. The absence of orthologs to retained paralogs in T. congolense indicated that the duplication event was specific to T. brucei.

    Conclusion: The duplication of this chromosomal region doubled the dosage of many genes. Rather than creating 'more of the same', these results show that paralogs were structurally modified according to various evolutionary trajectories. The retention of paralogs, and subsequent elaboration of both their primary structures and regulatory regions, strongly suggests that this duplication was a seminal development, stimulating functional innovation and fundamentally altering the genetic repertoire of T. brucei relative to other trypanosomatids.

    Funded by: Wellcome Trust

    BMC genomics 2007;8;432

  • Origins of amino acid transporter loci in trypanosomatid parasites.

    Jackson AP

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.

    Background: Large amino acid transporter gene families were identified from the genome sequences of three parasitic protists, Trypanosoma brucei, Trypanosoma cruzi and Leishmania major. These genes encode molecular sensors of the external host environment for trypanosomatid cells and are crucial to modulation of gene expression as the parasite passes through different life stages. This study provides a comprehensive phylogenetic account of the origins of these genes, redefining each locus according to a positional criterion, through the integration of phyletic identity with comparative gene order information.

    Results: Each locus was individually specified by its surrounding gene order and associated with homologs showing the same position ('homoeologs') in other species, where available. Bayesian and maximum likelihood phylogenies were in general agreement on systematic relationships and confirmed several 'orthology sets' of genes retained since divergence from the common ancestor. Reconciliation analysis quantified the scale of duplication and gene loss, as well as identifying further apparent orthology sets, which lacked conservation of genomic position. These instances suggested substantial genomic restructuring or transposition. Other analyses identified clear instances of evolutionary rate changes post-duplication, the effects of concerted evolution within tandem gene arrays and gene conversion events between syntenic loci.

    Conclusion: Despite their importance to cell function and parasite development, the repertoires of AAT loci in trypanosomatid parasites are relatively fluid in both complement and gene dosage. Some loci are ubiquitous and, after an ancient origin through transposition, originated through descent from the ancestral trypanosomatid. However, reconciliation analysis demonstrated that unilateral expansions of gene number through tandem gene duplication, transposition of gene duplicates to otherwise well conserved genomic positions, and differential patterns of gene loss have produced largely customised and idiosyncratic AAT repertoires in all three species, not least in T. brucei, which seems to have retained fewer ancestral loci and has acquired novel loci through a complex mix of tandem and transpositive duplication.

    Funded by: Wellcome Trust

    BMC evolutionary biology 2007;7;26

  • Tandem gene arrays in Trypanosoma brucei: comparative phylogenomic analysis of duplicate sequence variation.

    Jackson AP

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK.

    Background: The genome sequence of the protistan parasite Trypanosoma brucei contains many tandem gene arrays. Gene duplicates are created through tandem duplication and are expressed through polycistronic transcription, suggesting that the primary purpose of long, tandem arrays is to increase gene dosage in an environment where individual gene promoters are absent. This report presents the first account of the tandem gene arrays in the T. brucei genome, employing several related genome sequences to establish how variation is created and removed.

    Results: A systematic survey of tandem gene arrays showed that substantial sequence variation existed across the genome; variation from different regions of an array often produced inconsistent phylogenetic affinities. Phylogenetic relationships of gene duplicates were consistent with concerted evolution being a widespread homogenising force. However, tandem duplicates were not usually identical; therefore, any homogenising effect was coincident with divergence among duplicates. Allelic gene conversion was detected using various criteria and was apparently able to both remove and introduce sequence variation. Tandem arrays containing structural heterogeneity demonstrated how sequence homogenisation and differentiation can occur within a single locus.

    Conclusion: The use of multiple genome sequences in a comparative analysis of tandem gene arrays identified substantial sequence variation among gene duplicates. The distribution of sequence variation is determined by a dynamic balance of conservative and innovative evolutionary forces. Gene trees from various species showed that intraspecific duplicates evolve in concert, perhaps through frequent gene conversion, although this does not prevent sequence divergence, especially where structural heterogeneity physically separates a duplicate from its neighbours. In describing dynamics of sequence variation that have consequences beyond gene dosage, this survey provides a basis for uncovering the hidden functionality within tandem gene arrays in trypanosomatids.

    Funded by: Wellcome Trust

    BMC evolutionary biology 2007;7;54

  • Genome variation and evolution of the malaria parasite Plasmodium falciparum.

    Jeffares DC, Pain A, Berry A, Cox AV, Stalker J, Ingle CE, Thomas A, Quail MA, Siebenthall K, Uhlemann AC, Kyes S, Krishna S, Newbold C, Dermitzakis ET and Berriman M

    Informatics Division, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, CB10 1SA Hinxton, UK.

    Infections with the malaria parasite Plasmodium falciparum result in more than 1 million deaths each year worldwide. Deciphering the evolutionary history and genetic variation of P. falciparum is critical for understanding the evolution of drug resistance, identifying potential vaccine candidates and appreciating the effect of parasite variation on prevalence and severity of malaria in humans. Most studies of natural variation in P. falciparum have been either in depth over small genomic regions (up to the size of a small chromosome) or genome wide but only at low resolution. In an effort to complement these studies with genome-wide data, we undertook shotgun sequencing of a Ghanaian clinical isolate (with fivefold coverage), the IT laboratory isolate (with onefold coverage) and the chimpanzee parasite P. reichenowi (with twofold coverage). We compared these sequences with the fully sequenced P. falciparum 3D7 isolate genome. We describe the most salient features of P. falciparum polymorphism and adaptive evolution with relation to gene function, transcript and protein expression and cellular localization. This analysis uncovers the primary evolutionary changes that have occurred since the P. falciparum-P. reichenowi speciation and changes that are occurring within P. falciparum.

    Funded by: Wellcome Trust: 077046, 079643

    Nature genetics 2007;39;1;120-5

  • In silico functional and structural characterisation of ferlin proteins by mapping disease-causing mutations and evolutionary information onto three-dimensional models of their C2 domains.

    Jiménez JL and Bashir R

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Ferlins are C2 domain proteins involved in membrane fusion events, including membrane repair and synaptic exocytosis, and their deficiency can result in muscular dystrophy and deafness. We have undertaken a structural study of their C2 domains by sequence comparison and homology modelling to understand the function of these poorly characterised proteins and to predict the molecular impact of disease-causing mutations. We observe that non-conservative mutations affecting buried residues tend to result in detrimental phenotypes, likely because of decreased protein stability, whereas most variants with replacements in surface residues do not. The few cases of exposed residues altered in variants known to cause diseases are found in conserved areas of functional importance, including essential calcium-binding regions, as deduced by analogy to other characterised C2 domains. Furthermore, we report distinct features of some C2 domains in the two known ferlin subfamilies that correlate with the presence or absence of the DysF domains. Taken together, our results highlight potential targets for further experimental analyses to understand the function of ferlin proteins. We believe our modelling data will aid the diagnosis of diseases associated with ferlin mutations and the development of therapeutic strategies.

    Funded by: Wellcome Trust

    Journal of the neurological sciences 2007;260;1-2;114-23

  • A systematic comparative and structural analysis of protein phosphorylation sites based on the mtcPTM database.

    Jiménez JL, Hegemann B, Hutchins JR, Peters JM and Durbin R

    Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    mtcPTM is an online repository of human and mouse phosphosites in which data are hierarchically organized to preserve biologically relevant experimental information, thus allowing straightforward comparisons of phosphorylation patterns found under different conditions. The database also contains the largest available collection of atomic models of phosphorylatable proteins. Detailed analysis of this structural dataset reveals that phosphorylation sites are found in a heterogeneous range of structural and sequence contexts. mtcPTM is available on the web

    Funded by: Wellcome Trust

    Genome biology 2007;8;5;R90

  • Epigenetic identification of ADAMTS18 as a novel 16q23.1 tumor suppressor frequently silenced in esophageal, nasopharyngeal and multiple other carcinomas.

    Jin H, Wang X, Ying J, Wong AH, Li H, Lee KY, Srivastava G, Chan AT, Yeo W, Ma BB, Putti TC, Lung ML, Shen ZY, Xu LY, Langford C and Tao Q

    Cancer Epigenetics Laboratory, Sir YK Pao Center for Cancer, Department of Clinical Oncology, Hong Kong Cancer Institute, Chinese University of Hong Kong, China.

    Tumor suppressor genes (TSGs) are often located at chromosomal regions with frequent deletions in tumors. Loss of 16q23 occurs frequently in multiple tumors, indicating the presence of critical TSGs at this locus, such as the well-studied WWOX. Herein, we found that ADAMTS18, located next to WWOX, was significantly downregulated in multiple carcinoma cell lines. No deletion of ADAMTS18 was detected with multiplex differential DNA-PCR or high-resolution 1-Mb array-based comparative genomic hybridization (CGH) analysis. Instead, methylation of the ADAMTS18 promoter CpG island was frequently detected with methylation-specific PCR and bisulfite genome sequencing in multiple carcinoma cell lines and primary carcinomas, but not in any nontumor cell line or normal epithelial tissue. Both pharmacological and genetic demethylation dramatically induced ADAMTS18 expression, indicating that CpG methylation directly contributes to the tumor-specific silencing of ADAMTS18. Ectopic ADAMTS18 expression led to significant inhibition of both anchorage-dependent and -independent growth of carcinoma cells lacking the expression. Thus, through functional epigenetics, we identified ADAMTS18 as a novel functional tumor suppressor that is frequently inactivated epigenetically in multiple carcinomas.

    Funded by: Wellcome Trust: 079643

    Oncogene 2007;26;53;7490-8

  • Immunohistochemical characterization of cytokeratins in the abnormal corneal endothelium of posterior polymorphous corneal dystrophy patients.

    Jirsova K, Merjava S, Martincova R, Gwilliam R, Ebenezer ND, Liskova P and Filipec M

    Ocular Tissue Bank, General Teaching Hospital and Charles University, U Nemocnice 2, Prague 128 08, Czech Republic.

    Posterior polymorphous corneal dystrophy (PPCD) is a hereditary bilateral disorder affecting Descemet's membrane and the endothelium. The aim of the present study was to determine the spectrum of cytokeratin (CK) expression in cells on the posterior surface of the cornea in PPCD patients. Ten corneal buttons and one specimen of the trabecular meshwork (TM) from PPCD patients who underwent graft or glaucoma surgery were used, as well as six corneal buttons and two TM specimens obtained from healthy donors as controls. Cryosections were fixed and indirect immunofluorescent staining was performed using antibodies directed against a wide spectrum of cytokeratins (CKs). The number of positive cells and the intensity of the staining were assessed using fluorescent microscopy. All 10 PPCD corneal specimens had areas of endothelium displaying typical endothelial morphology as well as areas consisting of layers two to six cells thick with both flat endothelial-like cells and polygonal cells with round nuclei and a large cytoplasm. Both of these morphologically distinct cell types showed strong immunostaining for CK7, CK19, CK8 and CK18, while weaker positive signals were observed for CK1, CK3/12, CK4, CK5/6, CK10, CK10/13, CK14, CK16 and CK17. PPCD endothelium was completely negative for CK2e, CK9, CK15, and CK20. Focal positivity was detected in PPCD TM for CK4, CK7 and CK19. CK8 and CK18 were the only CKs expressed in control endothelium. PPCD and control epithelium displayed similar staining patterns. The distinct positivity for CK3/12, CK4, CK5/6, CK10/13, CK14, CK16 and CK17 was observed in aberrant PPCD endothelium for the first time. We demonstrate that the abnormal endothelium of PPCD patients expresses a mixture of CKs, with CK7 and CK19 predominating. In terms of CK composition, the aberrant PPCD endothelium shares features of both simple and squamous stratified epithelium with a proliferative capacity. The wide spectrum of CK expression is most probably not indicative of the transformation of endothelial cells to a distinct epithelial phenotype, but more likely reflects the modified differentiation of metaplastic epithelium.

    Experimental eye research 2007;84;4;680-6

  • Structural variation on the short arm of the human Y chromosome: recurrent multigene deletions encompassing Amelogenin Y.

    Jobling MA, Lo IC, Turner DJ, Bowden GR, Lee AC, Xue Y, Carvalho-Silva D, Hurles ME, Adams SM, Chang YM, Kraaijenbrink T, Henke J, Guanti G, McKeown B, van Oorschot RA, Mitchell RJ, de Knijff P, Tyler-Smith C and Parkin EJ

    Department of Genetics, University of Leicester, University Road, Leicester LE1 7RH, UK.

    Structural polymorphism is increasingly recognized as a major form of human genome variation, and is particularly prevalent on the Y chromosome. Assay of the Amelogenin Y gene (AMELY) on Yp is widely used in DNA-based sex testing, and sometimes reveals males who have interstitial deletions. In a collection of 45 deletion males from 12 populations, we used a combination of sequence-tagged site mapping, and binary-marker and Y-short tandem repeat haplotyping to understand the structural basis of this variation. Of the 45 deletion males, 41 carry indistinguishable deletions, 3.0-3.8 Mb in size. Breakpoint mapping strongly implicates a mechanism of non-allelic homologous recombination between the proximal major array of TSPY gene-containing repeats, and a single distal copy of TSPY; this is supported by the estimation of TSPY copy number in deleted and non-deleted males. The remaining four males carry three distinct non-recurrent deletions (2.5-4.0 Mb), which may be due to non-homologous mechanisms. Haplotyping shows that TSPY-mediated deletions have arisen seven times independently in the sample. One instance, represented by 30 chromosomes mostly of Indian origin within haplogroup J2e1*/M241, has a time-to-most-recent-common-ancestor of approximately 7700+/-1300 years. In addition to AMELY, deletion males all lack the genes PRKY and TBL1Y, and the rarer deletion classes also lack PCDH11Y. The persistence and expansion of deletion lineages, together with direct phenotypic evidence, suggests that absence of these genes has no major deleterious effects.

    Funded by: Wellcome Trust: 057559

    Human molecular genetics 2007;16;3;307-16

  • The nicotinic acetylcholine receptor gene family of the nematode Caenorhabditis elegans: an update on nomenclature.

    Jones AK, Davis P, Hodgkin J and Sattelle DB

    Department of Physiology, Anatomy and Genetics, University of Oxford, South Parks Road, Oxford, UK.

    The simple nematode, Caenorhabditis elegans, possesses the most extensive known gene family of nicotinic acetylcholine receptor (nAChR)-like subunits. Whilst all show greatest similarity with nAChR subunits of both invertebrates and vertebrates, phylogenetic analysis suggests that just over half of these (32) may represent other members of the cys-loop ligand-gated ion channel superfamily. We have introduced a novel nomenclature system for these "Orphan" subunits, designating them as lgc genes (ligand-gated ion channels of the cys-loop superfamily), which can also be applied in future to unnamed and uncharacterised members of the cys-loop ligand-gated ion channel superfamily. We present here the resulting updated version of the C. elegans nAChR gene family and related ligand-gated ion channel genes.

    Funded by: Medical Research Council: G0701197(84652)

    Invertebrate neuroscience : IN 2007;7;2;129-31

  • Mapping the platelet profile for functional genomic studies and demonstration of the effect size of the GP6 locus.

    Jones CI, Garner SF, Angenent W, Bernard A, Berzuini C, Burns P, Farndale RW, Hogwood J, Rankin A, Stephens JC, Tom BD, Walton J, Dudbridge F, Ouwehand WH, Goodall AH and Bloodomics Consortium

    Department of Cardiovascular Sciences, University of Leicester, Leicester, UK.

    Background: Evidence suggests the wide variation in platelet response within the population is genetically controlled. Unraveling the complex relationship between sequence variation and platelet phenotype requires accurate and reproducible measurement of platelet response.

    Objective: To develop a methodology suitable for measuring signaling pathway-specific platelet phenotype, to use this to measure platelet response in a large cohort, and to demonstrate the effect size of sequence variation in a relevant model gene.

    Methods: Three established platelet assays were evaluated: mobilization of [Ca(2+)](i), aggregometry and flow cytometry, each in response to adenosine 5'-diphosphate (ADP) or the glycoprotein (GP) VI-specific crosslinked collagen-related peptide (CRP). Flow cytometric measurement of fibrinogen binding and P-selectin expression in response to a single, intermediate dose of each agonist gave the best combination of reproducibility and inter-individual variability and was used to measure the platelet response in 506 healthy volunteers. Pathway specificity was ensured by blocking the main subsidiary signaling pathways.

    Results: Individuals were identified who were hypo- or hyper-responders for both pathways, or who had differential responses to the two agonists, or between outcomes. 89 individuals, retested three months later using the same methodology, showed high concordance between the two visits in all four assays (r(2) = 0.872, 0.868, 0.766 and 0.549); all subjects retaining their phenotype at recall. The effect of sequence variation at the GP6 locus accounted for approximately 35% of the variation in the CRP-XL response.
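    The concordance figures above are reported as r(2) values. As a minimal sketch, assuming r(2) denotes the squared Pearson correlation between paired measurements from the two visits (the abstract does not state which statistic was used), the calculation could look like the following. The function name `r_squared` and the toy values are illustrative assumptions, not data or code from the study.

```python
def r_squared(visit1, visit2):
    """Squared Pearson correlation between two paired lists of measurements.

    Assumption: this is one common reading of "r(2)"; the study may have
    used a different concordance statistic.
    """
    if len(visit1) != len(visit2) or len(visit1) < 2:
        raise ValueError("need two paired lists with at least two values each")
    n = len(visit1)
    mean1 = sum(visit1) / n
    mean2 = sum(visit2) / n
    # Sums of cross-products and squared deviations about the means.
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(visit1, visit2))
    var1 = sum((a - mean1) ** 2 for a in visit1)
    var2 = sum((b - mean2) ** 2 for b in visit2)
    if var1 == 0 or var2 == 0:
        raise ValueError("correlation is undefined for constant measurements")
    return (cov * cov) / (var1 * var2)


if __name__ == "__main__":
    # Hypothetical responses (arbitrary units) for four volunteers at two visits.
    first_visit = [12.0, 35.5, 48.0, 71.2]
    recall_visit = [15.1, 33.0, 52.4, 69.8]
    print(f"r^2 = {r_squared(first_visit, recall_visit):.3f}")
```

    A value near 1 indicates that individuals kept their relative ranking and spacing between visits, which is the sense in which the abstract describes the assays as reproducible.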

    Conclusion: Genotype-phenotype association studies in a well-characterized, large cohort provide a powerful strategy to measure the effect of sequence variation in genes regulating the platelet response.

    Funded by: Medical Research Council: G0500707, MC_U105260799, MC_U105261167

    Journal of thrombosis and haemostasis : JTH 2007;5;8;1756-65

  • Advances in the genomics of ticks and tick-borne pathogens.

    Jongejan F, Nene V, de la Fuente J, Pain A and Willadsen P

    Ticks and the diseases for which they are vectors engage in complex interactions with their mammalian hosts. These interactions involve the developmental processes of tick and pathogen, and interplay between the defensive responses and counter responses of host, tick and pathogen. Understanding these interactions has long been an intractable problem, but progress is now being made thanks to the flood of genomic information on host, tick and pathogen, and the attendant, novel experimental tools that have been generated. Each advance reveals new levels of complexity, but there are encouraging signs that genomics is leading to novel means of parasite control.

    Trends in parasitology 2007;23;9;391-6

  • Sequence and functional analyses of Haemophilus spp. genomic islands.

    Juhas M, Power PM, Harding RM, Ferguson DJ, Dimopoulou ID, Elamin AR, Mohd-Zain Z, Hood DW, Adegbola R, Erwin A, Smith A, Munson RS, Harrison A, Mansfield L, Bentley S and Crook DW

    Clinical Microbiology and Infectious Diseases, NDCLS, University of Oxford, Headley Way, Oxford OX3 9DU, UK.

    Background: A major part of horizontal gene transfer that contributes to the diversification and adaptation of bacteria is facilitated by genomic islands. The evolution of these islands is poorly understood. Some progress was made with the identification of a set of phylogenetically related genomic islands among the Proteobacteria, recognized from the investigation of the evolutionary origins of a Haemophilus influenzae antibiotic resistance island, namely ICEHin1056. More clarity comes from this comparative analysis of seven complete sequences of the ICEHin1056 genomic island subfamily.

    Results: These genomic islands have core and accessory genes in approximately equal proportion, with none demonstrating recent acquisition from other islands. The number of variable sites within core genes is similar to that found in the host bacteria. Furthermore, the GC content of the core genes is similar to that of the host bacteria (38% to 40%). Most of the core gene content is formed by the syntenic type IV secretion system dependent conjugative module and replicative module. GC content and lack of variable sites indicate that the antibiotic resistance genes were acquired relatively recently. An analysis of conjugation efficiency and antibiotic susceptibility demonstrates that phenotypic expression of genomic island-borne genes differs between different hosts.
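    The GC content comparison mentioned above rests on a simple proportion: the number of G and C bases divided by the number of unambiguous bases. The following is a minimal sketch under that assumption; the function name `gc_content`, the handling of ambiguous bases, and the toy sequence are illustrative choices, not the authors' pipeline.

```python
def gc_content(seq: str) -> float:
    """Return GC content as a percentage of unambiguous A/C/G/T bases.

    Assumption: ambiguous symbols such as N are ignored rather than
    counted in the denominator.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / total


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from ICEHin1056.
    toy_gene = "ATGAAATTAGCGTTTACCGATAAATTGGCA"
    print(f"GC content: {gc_content(toy_gene):.1f}%")
```

    Applied per gene, a value that departs markedly from the host range (38% to 40% in the abstract) is the kind of signal used to argue that a gene was acquired recently.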

    Conclusion: Genomic islands of the ICEHin1056 subfamily have a longstanding relationship with H. influenzae and H. parainfluenzae and are co-evolving as semi-autonomous genomes within the 'supragenomes' of their host species. They have promoted bacterial diversity and adaptation through becoming efficient vectors of antibiotic resistance by the recent acquisition of antibiotic resistance transposons.

    Genome biology 2007;8;11;R237

  • Radial chromatin positioning is shaped by local gene density, not by gene expression.

    Küpper K, Kölbl A, Biener D, Dittrich S, von Hase J, Thormeyer T, Fiegler H, Carter NP, Speicher MR, Cremer T and Cremer M

    Department of Biology II, Anthropology and Human Genetics, Ludwig Maximilians University, Munich, Germany.

    G- and R-bands of metaphase chromosomes are characterized by profound differences in gene density, CG content, replication timing, and chromatin compaction. The preferential localization of gene-dense, transcriptionally active, and early replicating chromatin in the nuclear interior and of gene-poor, later replicating chromatin at the nuclear envelope has been demonstrated to be evolutionarily conserved in various cell types. Yet, the impact of different local chromatin features on the radial nuclear arrangement of chromatin is still not well understood. In particular, it is not known whether radial chromatin positioning is preferentially shaped by local gene density per se or by other related parameters such as replication timing or transcriptional activity. The interdependence of these distinct chromatin features on the linear deoxyribonucleic acid (DNA) sequence precludes a simple dissection of these parameters with respect to their importance for the reorganization of the linear DNA organization into the distinct radial chromatin arrangements observed in the nuclear space. To analyze this problem, we generated probe sets of pooled bacterial artificial chromosome (BAC) clones from HSA 11, 12, 18, and 19 representing R/G-band-assigned chromatin, segments with different gene density and gene loci with different expression levels. Using multicolor 3D fluorescent in situ hybridization (FISH) and 3D image analysis, we determined their localization in the nucleus and their positions within or outside the corresponding chromosome territory (CT). For each BAC, data on local gene density within 2- and 10-Mb windows, as well as GC (guanine and cytosine) content, replication timing and expression levels were determined. A correlation analysis of these parameters with nuclear positioning revealed regional gene density as the decisive parameter determining the radial positioning of chromatin in the nucleus in contrast to band assignment, replication timing, and transcriptional activity. We demonstrate a polarized distribution of gene-dense vs gene-poor chromatin within CTs with respect to the nuclear border. Whereas we confirm previous reports that a particular gene-dense and transcriptionally highly active region of about 2 Mb on 11p15.5 often loops out from the territory surface, gene-dense and highly expressed sequences were not generally found preferentially at the CT surface as previously suggested.

    Funded by: Wellcome Trust

    Chromosoma 2007;116;3;285-306

  • Denoising inferred functional association networks obtained by gene fusion analysis.

    Kamburov A, Goldovsky L, Freilich S, Kapazoglou A, Kunin V, Enright AJ, Tsaftaris A and Ouzounis CA

    Computational Genomics Group, The European Bioinformatics Institute, EMBL Cambridge Outstation, Cambridge CB10 1SD, UK.

    Background: Gene fusion detection - also known as the 'Rosetta Stone' method - involves the identification of fused composite genes in a set of reference genomes, which indicates potential interactions between its un-fused counterpart genes in query genomes. The precision of this method typically improves with an ever-increasing number of reference genomes.

    Results: In order to explore the usefulness and scope of this approach for protein interaction prediction and generate a high-quality, non-redundant set of interacting pairs of proteins across a wide taxonomic range, we have exhaustively performed gene fusion analysis for 184 genomes using an efficient variant of a previously developed protocol. By analyzing interaction graphs and applying a threshold that limits the maximum number of possible interactions within the largest graph components, we show that we can reduce the number of implausible interactions due to the detection of promiscuous domains. With this generally applicable approach, we generate a robust set of over 2 million distinct and testable interactions encompassing 696,894 proteins in 184 species or strains, most of which have never been the subject of high-throughput experimental proteomics. We investigate the cumulative effect of increasing numbers of genomes on the fidelity and quantity of predictions, and show that, for large numbers of genomes, predictions do not become saturated but continue to grow linearly, for the majority of the species. We also examine the percentage of component (and composite) proteins with relation to the number of genes and further validate the functional categories that are highly represented in this robust set of detected genome-wide interactions.

    Conclusion: We illustrate the phylogenetic and functional diversity of gene fusion events across genomes, and their usefulness for accurate prediction of protein interaction and function.

    BMC genomics 2007;8;460

  • MultiPhyl: a high-throughput phylogenomics webserver using distributed computing.

    Keane TM, Naughton TJ and McInerney JO

    Pathogen Sequencing Unit, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK.

    With the number of fully sequenced genomes increasing steadily, there is greater interest in performing large-scale phylogenomic analyses from large numbers of individual gene families. Maximum likelihood (ML) has been shown repeatedly to be one of the most accurate methods for phylogenetic construction. Recently, there have been a number of algorithmic improvements in maximum-likelihood-based tree search methods. However, it can still take a long time to analyse the evolutionary history of many gene families using a single computer. Distributed computing refers to a method of combining the computing power of multiple computers in order to perform some larger overall calculation. In this article, we present the first high-throughput implementation of a distributed phylogenetics platform, MultiPhyl, capable of using the idle computational resources of many heterogeneous non-dedicated machines to form a phylogenetics supercomputer. MultiPhyl allows a user to upload hundreds or thousands of amino acid or nucleotide alignments simultaneously and perform computationally intensive tasks such as model selection, tree searching and bootstrapping of each of the alignments using many desktop machines. The program implements a set of 88 amino acid models and 56 nucleotide maximum likelihood models and a variety of statistical methods for choosing between alternative models. A MultiPhyl webserver is available for public use at:

    Nucleic acids research 2007;35;Web Server issue;W33-7

  • Ability of SPI2 mutant of S. typhi to effectively induce antibody responses to the mucosal antigen enterotoxigenic E. coli heat labile toxin B subunit after oral delivery to humans.

    Khan S, Chatfield S, Stratford R, Bedwell J, Bentley M, Sulsh S, Giemza R, Smith S, Bongard E, Cosgrove CA, Johnson J, Dougan G, Griffin GE, Makin J and Lewis DJ

    Microscience, Wokingham Berkshire RG41 5TU, UK.

    We have evaluated an oral vaccine based on a Salmonella enterica serovar Typhi (S. typhi) Ty2 derivative TSB7 harboring deletion mutations in ssaV (SPI-2) and aroC together with a chromosomally integrated copy of eltB encoding the B subunit of enterotoxigenic Escherichia coli heat labile toxin (LT-B) in volunteers. Two oral doses of 10(8) or 10(9) CFU were administered to two groups of volunteers and both doses were well tolerated, with no vaccinemia, and only transient stool shedding. Immune responses to LT-B and S. typhi lipopolysaccharide were demonstrated in 67 and 97% of subjects, respectively, without evidence of anti-carrier immunity preventing boosting of LT-B responses in many cases. Further development of this salmonella-based (spi-VEC) system for oral delivery of heterologous antigens appears warranted.

    Funded by: Wellcome Trust: 076962

    Vaccine 2007;25;21;4175-82

  • Genomic regulatory blocks encompass multiple neighboring genes and maintain conserved synteny in vertebrates.

    Kikuta H, Laplante M, Navratilova P, Komisarczuk AZ, Engström PG, Fredman D, Akalin A, Caccamo M, Sealy I, Howe K, Ghislain J, Pezeron G, Mourrain P, Ellingsen S, Oates AC, Thisse C, Thisse B, Foucher I, Adolf B, Geling A, Lenhard B and Becker TS

    Sars Centre for Marine Molecular Biology, University of Bergen, 5008 Bergen, Norway.

    We report evidence for a mechanism for the maintenance of long-range conserved synteny across vertebrate genomes. We found the largest mammal-teleost conserved chromosomal segments to be spanned by highly conserved noncoding elements (HCNEs), their developmental regulatory target genes, and phylogenetically and functionally unrelated "bystander" genes. Bystander genes are not specifically under the control of the regulatory elements that drive the target genes and are expressed in patterns that are different from those of the target genes. Reporter insertions distal to zebrafish developmental regulatory genes pax6.1/2, rx3, id1, and fgf8 and miRNA genes mirn9-1 and mirn9-5 recapitulate the expression patterns of these genes even if located inside or beyond bystander genes, suggesting that the regulatory domain of a developmental regulatory gene can extend into and beyond adjacent transcriptional units. We termed these chromosomal segments genomic regulatory blocks (GRBs). After whole genome duplication in teleosts, GRBs, including HCNEs and target genes, were often maintained in both copies, while bystander genes were typically lost from one GRB, strongly suggesting that evolutionary pressure acts to keep the single-copy GRBs of higher vertebrates intact. We show that loss of bystander genes and other mutational events suffered by duplicated GRBs in teleost genomes permits target gene identification and HCNE/target gene assignment. These findings explain the absence of evolutionary breakpoints from large vertebrate chromosomal segments and will aid in the recognition of position effect mutations within human GRBs.

    Funded by: Wellcome Trust: 077198

    Genome research 2007;17;5;545-55

  • Reduced ENaC protein abundance contributes to the lower blood pressure observed in pendrin-null mice.

    Kim YH, Pech V, Spencer KB, Beierwaltes WH, Everett LA, Green ED, Shin W, Verlander JW, Sutliff RL and Wall SM

    Department of Medicine, Emory University, Atlanta, Georgia, USA.

    Pendrin (encoded by Pds, Slc26a4) is a Cl(-)/HCO(3)(-) exchanger expressed in the apical regions of type B and non-A, non-B intercalated cells of kidney and mediates renal Cl(-) absorption, particularly when upregulated. Aldosterone increases blood pressure by increasing absorption of both Na(+) and Cl(-) through increased protein abundance and function of Na(+) transporters, such as the epithelial Na(+) channel (ENaC) and the Na(+)-Cl(-) cotransporter (NCC), as well as Cl(-) transporters, such as pendrin. Because aldosterone analogs do not increase blood pressure in Slc26a4(-/-) mice, we asked whether Na(+) excretion and Na(+) transporter protein abundance are altered in kidneys from these mutant mice. Thus wild-type and Slc26a4-null mice were given a NaCl-replete, a NaCl-restricted, or NaCl-replete diet and aldosterone or aldosterone analogs. Abundance of the major renal Na(+) transporters was examined with immunoblots and immunohistochemistry. Slc26a4-null mice showed an impaired ability to conserve Na(+) during dietary NaCl restriction. Under treatment conditions in which circulating aldosterone is increased, alpha-, beta-, and 85-kDa gamma-ENaC subunit protein abundances were reduced 15-35%, whereas abundance of the 70-kDa fragment of gamma-ENaC was reduced approximately 70% in Slc26a4-null relative to wild-type mice. Moreover, ENaC-dependent changes in transepithelial voltage were much lower in cortical collecting ducts from Slc26a4-null than from wild-type mice. Thus, in kidney, ENaC protein abundance and function are modulated by pendrin or through a pendrin-dependent downstream event. The reduced ENaC protein abundance and function observed in Slc26a4-null mice contribute to their lower blood pressure and reduced ability to conserve Na(+) during NaCl restriction.

    Funded by: PHS HHS: P01 061521

    American journal of physiology. Renal physiology 2007;293;4;F1314-24

  • Finding cis-regulatory elements using comparative genomics: some lessons from ENCODE data.

    King DC, Taylor J, Zhang Y, Cheng Y, Lawson HA, Martin J, ENCODE groups for Transcriptional Regulation and Multispecies Sequence Analysis, Chiaromonte F, Miller W and Hardison RC

    Center for Comparative Genomics and Bioinformatics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA.

    Identification of functional genomic regions using interspecies comparison will be most effective when the full span of relationships between genomic function and evolutionary constraint are utilized. We find that sets of putative transcriptional regulatory sequences, defined by ENCODE experimental data, have a wide span of evolutionary histories, ranging from stringent constraint shown by deep phylogenetic comparisons to recent selection on lineage-specific elements. This diversity of evolutionary histories can be captured, at least in part, by the suite of available comparative genomics tools, especially after correction for regional differences in the neutral substitution rate. Putative transcriptional regulatory regions show alignability in different clades, and the genes associated with them are enriched for distinct functions. Some of the putative regulatory regions show evidence for recent selection, including a primate-specific, distal promoter that may play a novel role in regulation.

    Funded by: NHGRI NIH HHS: HG002238; NIDDK NIH HHS: DK65806

    Genome research 2007;17;6;775-86

  • Africans in Yorkshire? The deepest-rooting clade of the Y phylogeny within an English genealogy.

    King TE, Parkin EJ, Swinfield G, Cruciani F, Scozzari R, Rosa A, Lim SK, Xue Y, Tyler-Smith C and Jobling MA

    Department of Genetics, University of Leicester, Leicester, UK.

    The presence of Africans in Britain has been recorded since Roman times, but has left no apparent genetic trace among modern inhabitants. Y chromosomes belonging to the deepest-rooting clade of the Y phylogeny, haplogroup (hg) A, are regarded as African-specific, and no examples have been reported from Britain or elsewhere in Western Europe. We describe the presence of an hgA1 chromosome in an indigenous British male; comparison with African examples suggests a Western African origin. Seven out of 18 men carrying the same rare east-Yorkshire surname as the original male also carry hgA1 chromosomes, and documentary research resolves them into two genealogies with most-recent-common-ancestors living in Yorkshire in the late 18th century. Analysis using 77 Y-short tandem repeats (STRs) is consistent with coalescence a few generations earlier. Our findings represent the first genetic evidence of Africans among 'indigenous' British, and emphasize the complexity of human migration history as well as the pitfalls of assigning geographical origin from Y-chromosomal haplotypes.

    Funded by: Wellcome Trust: 057559

    European journal of human genetics : EJHG 2007;15;3;288-93

  • Arginine methylation at histone H3R2 controls deposition of H3K4 trimethylation.

    Kirmizis A, Santos-Rosa H, Penkett CJ, Singer MA, Vermeulen M, Mann M, Bähler J, Green RD and Kouzarides T

    Gurdon Institute and Department of Pathology, Tennis Court Road, Cambridge CB2 1QN, UK.

    Modifications on histones control important biological processes through their effects on chromatin structure. Methylation at lysine 4 on histone H3 (H3K4) is found at the 5' end of active genes and contributes to transcriptional activation by recruiting chromatin-remodelling enzymes. An adjacent arginine residue (H3R2) is also known to be asymmetrically dimethylated (H3R2me2a) in mammalian cells, but its location within genes and its function in transcription are unknown. Here we show that H3R2 is also methylated in budding yeast (Saccharomyces cerevisiae), and by using an antibody specific for H3R2me2a in a chromatin immunoprecipitation-on-chip analysis we determine the distribution of this modification on the entire yeast genome. We find that H3R2me2a is enriched throughout all heterochromatic loci and inactive euchromatic genes and is present at the 3' end of moderately transcribed genes. In all cases the pattern of H3R2 methylation is mutually exclusive with the trimethyl form of H3K4 (H3K4me3). We show that methylation at H3R2 abrogates the trimethylation of H3K4 by the Set1 methyltransferase. The specific effect on H3K4me3 results from the occlusion of Spp1, a Set1 methyltransferase subunit necessary for trimethylation. Thus, the inability of Spp1 to recognize H3 methylated at R2 prevents Set1 from trimethylating H3K4. These results provide the first mechanistic insight into the function of arginine methylation on chromatin.

    Funded by: Cancer Research UK: A6517; Wellcome Trust: 077118, 092096

    Nature 2007;449;7164;928-32

  • The landscape of histone modifications across 1% of the human genome in five human cell lines.

    Koch CM, Andrews RM, Flicek P, Dillon SC, Karaöz U, Clelland GK, Wilcox S, Beare DM, Fowler JC, Couttet P, James KD, Lefebvre GC, Bruce AW, Dovey OM, Ellis PD, Dhami P, Langford CF, Weng Z, Birney E, Carter NP, Vetrie D and Dunham I

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.

    We generated high-resolution maps of histone H3 lysine 9/14 acetylation (H3ac), histone H4 lysine 5/8/12/16 acetylation (H4ac), and histone H3 at lysine 4 mono-, di-, and trimethylation (H3K4me1, H3K4me2, H3K4me3, respectively) across the ENCODE regions. Studying each modification in five human cell lines including the ENCODE Consortium common cell lines GM06990 (lymphoblastoid) and HeLa-S3, as well as K562, HFL-1, and MOLT4, we identified clear patterns of histone modification profiles with respect to genomic features. H3K4me3, H3K4me2, and H3ac modifications are tightly associated with the transcriptional start sites (TSSs) of genes, while H3K4me1 and H4ac have more widespread distributions. TSSs reveal characteristic patterns of both types of modification present and the position relative to TSSs. These patterns differ between active and inactive genes and in particular the state of H3K4me3 and H3ac modifications is highly predictive of gene activity. Away from TSSs, modification sites are enriched in H3K4me1 and relatively depleted in H3K4me3 and H3ac. Comparison between cell lines identified differences in the histone modification profiles associated with transcriptional differences between the cell lines. These results provide an overview of the functional relationship among histone modifications and gene expression in human cells.

    Funded by: NHGRI NIH HHS: R01HG03110, U01HG003168

    Genome research 2007;17;6;691-707

  • Paired-end mapping reveals extensive structural variation in the human genome.

    Korbel JO, Urban AE, Affourtit JP, Godwin B, Grubert F, Simons JF, Kim PM, Palejev D, Carriero NJ, Du L, Taillon BE, Chen Z, Tanzer A, Saunders AC, Chi J, Yang F, Carter NP, Hurles ME, Weissman SM, Harkins TT, Gerstein MB, Egholm M and Snyder M

    Molecular Biophysics and Biochemistry Department, Yale University, New Haven, CT 06520, USA.

    Structural variation of the genome involves kilobase- to megabase-sized deletions, duplications, insertions, inversions, and complex combinations of rearrangements. We introduce high-throughput and massive paired-end mapping (PEM), a large-scale genome-sequencing method to identify structural variants (SVs) approximately 3 kilobases (kb) or larger that combines the rescue and capture of paired ends of 3-kb fragments, massive 454 sequencing, and a computational approach to map DNA reads onto a reference genome. PEM was used to map SVs in an African and in a putatively European individual and identified shared and divergent SVs relative to the reference genome. Overall, we fine-mapped more than 1000 SVs and documented that the number of SVs among humans is much larger than initially hypothesized; many of the SVs potentially affect gene function. The breakpoint junction sequences of more than 200 SVs were determined with a novel pooling strategy and computational analysis. Our analysis provided insights into the mechanisms of SV formation in humans.

    Funded by: NCRR NIH HHS: RR19895; Wellcome Trust: 077008, 077014

    Science (New York, N.Y.) 2007;318;5849;420-6

  • New tools and expanded data analysis capabilities at the Protein Structure Prediction Center.

    Kryshtafovych A, Prlic A, Dmytriv Z, Daniluk P, Milostan M, Eyrich V, Hubbard T and Fidelis K

    Genome Center, University of California, Davis, California 95616, USA.

    We outline the main tasks performed by the Protein Structure Prediction Center in support of the CASP7 experiment and provide a brief review of the major measures used in the automatic evaluation of predictions. We describe in more detail the software developed to facilitate analysis of modeling success over and beyond the available templates and the adopted Java-based tool enabling visualization of multiple structural superpositions between target and several models/templates. We also give an overview of the CASP infrastructure provided by the Center and discuss the organization of the results web pages available through

    Funded by: NLM NIH HHS: LM07085-01; Wellcome Trust: 077198

    Proteins 2007;69 Suppl 8;19-26

  • An RNA G-quadruplex in the 5' UTR of the NRAS proto-oncogene modulates translation.

    Kumari S, Bugaut A, Huppert JL and Balasubramanian S

    University Chemical Laboratory, Lensfield Road, Cambridge CB2 1EW, UK.

    Guanine-rich nucleic acid sequences can adopt noncanonical four-stranded secondary structures called guanine (G)-quadruplexes. Bioinformatics analysis suggests that G-quadruplex motifs are prevalent in genomes, which raises the need to elucidate their function. There is now evidence for the existence of DNA G-quadruplexes at telomeres with associated biological function. A recent hypothesis supports the notion that gene promoter elements contain DNA G-quadruplex motifs that control gene expression at the transcriptional level. We discovered a highly conserved, thermodynamically stable RNA G-quadruplex in the 5' untranslated region (UTR) of the gene transcript of the human NRAS proto-oncogene. Using a cell-free translation system coupled to a reporter gene assay, we have demonstrated that this NRAS RNA G-quadruplex modulates translation. This is the first example of translational repression by an RNA G-quadruplex. Bioinformatics analysis has revealed 2,922 other 5' UTR RNA G-quadruplex elements in the human genome. We propose that RNA G-quadruplexes in the 5' UTR modulate gene expression at the translational level.

    Funded by: Cancer Research UK: A4081

    Nature chemical biology 2007;3;4;218-21

  • A comprehensive antibody panel for immunohistochemical analysis of formalin-fixed, paraffin-embedded hematopoietic neoplasms of mice: analysis of mouse specific and human antibodies cross-reactive with murine tissue.

    Kunder S, Calzada-Wack J, Hölzlwimmer G, Müller J, Kloss C, Howat W, Schmidt J, Höfler H, Warren M and Quintanilla-Martinez L

    GSF Research Center for Environment and Health, Institute of Pathology, Neuherberg 85764, Germany.

    Immunohistochemistry is an indispensable tool in human pathology enabling immunophenotypic characterization of tumor cells. Immunohistochemical analyses of mouse models of human hematopoietic neoplasias have become an important aspect for comparison of murine entities with their human counterparts. The aim of this study was to establish a diagnostic antibody panel for analysis of murine lymphomas/leukemias, useful in formalin-fixed/paraffin-embedded tissue. Overall, 48 antibodies (4 rabbit monoclonal, 12 rabbit polyclonal, 2 goat polyclonal, 11 rat, and 19 mouse monoclonal), which were either mouse-specific (14) or cross-reactive with murine tissue (34) were tested for staining quality and diagnostic value in 468 murine hematopoietic neoplasms. Specific staining was achieved with 29 antibodies, of which 18 were human antibodies cross-reactive with murine tissue. Only 23 (B220, BCL-2, BCL-6, CD117, CD138 (2x), CD3 (2x), CD43, CD45, CD5, CD79 alpha cy, cyclin D1, Ki-67 (2x), Mac-3, Mac-2, lysozyme, mast cell tryptase, MPO, Pax-5, TdT, and TER-119) were regarded as valuable for diagnostic evaluation. Immunohistochemistry was also established in an automated immunostainer for high throughput analysis. The antibody panel developed is useful for the classification of murine lymphomas and leukemias analyzed, and a valuable tool for human and veterinary pathologists involved in the diagnostic interpretation of murine models of hematopoietic neoplasias.

    Toxicologic pathology 2007;35;3;366-75

  • A network of multiple regulatory layers shapes gene expression in fission yeast.

    Lackner DH, Beilharz TH, Marguerat S, Mata J, Watt S, Schubert F, Preiss T and Bähler J

    Cancer Research UK Fission Yeast Functional Genomics Group, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Gene expression is controlled at multiple layers, and cells may integrate different regulatory steps for coherent production of proper protein levels. We applied various microarray-based approaches to determine key gene-expression intermediates in exponentially growing fission yeast, providing genome-wide data for translational profiles, mRNA steady-state levels, polyadenylation profiles, start-codon sequence context, mRNA half-lives, and RNA polymerase II occupancy. We uncovered widespread and unexpected relationships between distinct aspects of gene expression. Translation and polyadenylation are aligned on a global scale with both the lengths and levels of mRNAs: efficiently translated mRNAs have longer poly(A) tails and are shorter, more stable, and more efficiently transcribed on average. Transcription and translation may be independently but congruently optimized to streamline protein production. These rich data sets, all acquired under a standardized condition, reveal a substantial coordination between regulatory layers and provide a basis for a systems-level understanding of multilayered gene-expression programs.

    Funded by: Cancer Research UK: A6517; Wellcome Trust: 077118

    Molecular cell 2007;26;1;145-55

  • hORFeome v3.1: a resource of human open reading frames representing over 10,000 human genes.

    Lamesch P, Li N, Milstein S, Fan C, Hao T, Szabo G, Hu Z, Venkatesan K, Bethel G, Martin P, Rogers J, Lawlor S, McLaren S, Dricot A, Borick H, Cusick ME, Vandenhaute J, Dunham I, Hill DE and Vidal M

    Center for Cancer Systems Biology and Department of Cancer Biology, Dana-Farber Cancer Institute, and Department of Genetics, Harvard Medical School, Boston, MA 02115, USA.

    Complete sets of cloned protein-encoding open reading frames (ORFs), or ORFeomes, are essential tools for large-scale proteomics and systems biology studies. Here we describe human ORFeome version 3.1 (hORFeome v3.1), currently the largest publicly available resource of full-length human ORFs. Generated by Gateway recombinational cloning, this collection contains 12,212 ORFs, representing 10,214 human genes, and corresponds to a 51% expansion of the original hORFeome v1.1. An online human ORFeome database, hORFDB, was built and serves as the central repository for all cloned human ORFs. This expansion of the original ORFeome resource greatly increases the potential experimental search space for large-scale proteomics studies, which will lead to the generation of more comprehensive datasets.

    Genomics 2007;89;3;307-15

  • The role of neuronal complexes in human X-linked brain diseases.

    Laumonnier F, Cuthbert PC and Grant SG

    Genes to Cognition Programme, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridge, UK.

    Beyond finding individual genes that are involved in medical disorders, an important challenge is the integration of sets of disease genes with the complexities of basic biological processes. We examine this issue by focusing on neuronal multiprotein complexes and their components encoded on the human X chromosome. Multiprotein signaling complexes in the postsynaptic terminal of central nervous system synapses are essential for the induction of neuronal plasticity and cognitive processes in animals. The prototype complex is the N-methyl-D-aspartate receptor complex/membrane-associated guanylate kinase-associated signaling complex (NRC/MASC) comprising 185 proteins and embedded within the postsynaptic density (PSD), which is a set of complexes totaling approximately 1,100 proteins. It is striking that 86% (6 of 7) of X-linked NRC/MASC genes and 49% (19 of 39) of X-chromosomal PSD genes are already known to be involved in human psychiatric disorders. Moreover, of the 69 known proteins mutated in X-linked mental retardation, 19 (28%) encode postsynaptic proteins. The high incidence of involvement in cognitive disorders is also found in mouse mutants and indicates that the complexes are functioning as integrated entities or molecular machines and that disruption of different components impairs their overall role in cognitive processes. We also noticed that NRC/MASC genes appear to be more strongly associated with mental retardation and autism spectrum disorders. We propose that systematic studies of PSD and NRC/MASC genes in mice and humans will give a high yield of novel genes important for human disease and new mechanistic insights into higher cognitive functions.

    Funded by: Wellcome Trust

    American journal of human genetics 2007;80;2;205-20

  • Identification of a novel tumor transforming gene GAEC1 at 7q22 which encodes a nuclear protein and is frequently amplified and overexpressed in esophageal squamous cell carcinoma.

    Law FB, Chen YW, Wong KY, Ying J, Tao Q, Langford C, Lee PY, Law S, Cheung RW, Chui CH, Tsao SW, Lam KY, Wong J, Srivastava G and Tang JC

    Department of Pathology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China.

    By comparative DNA fingerprinting, we identified a 357-bp DNA fragment frequently amplified in esophageal squamous cell carcinomas (ESCC). This fragment overlaps with an expressed sequence tag mapped to 7q22. Further 5' and 3'-rapid amplification of cDNA ends revealed that it is part of a novel, single-exon gene with full-length mRNA of 2052 bp and encodes a nuclear protein of 109 amino acids (approximately 15 kDa). This gene, designated as gene amplified in esophageal cancer 1 (GAEC1), was located within a 1-2 Mb amplicon at 7q22.1 identified by high-resolution 1 Mb array-comparative genomic hybridization in 6/10 ESCC cell lines. GAEC1 was ubiquitously expressed in normal tissues including esophageal and gastrointestinal organs, with amplification and overexpression in 6/10 (60%) ESCC cell lines and 34/99 (34%) primary tumors. Overexpression of GAEC1 in 3T3 mouse fibroblasts caused foci formation and colony formation in soft agar, comparable to H-ras, and injection of GAEC1-transfected 3T3 cells into athymic nude mice formed undifferentiated sarcoma in vivo, indicating that GAEC1 is a transforming oncogene. Although no significant correlation was observed between GAEC1 amplification and clinicopathological parameters and prognosis, our study demonstrated that overexpressed GAEC1 has tumorigenic potential and suggests that overexpressed GAEC1 may play an important role in ESCC pathogenesis.

    Funded by: Wellcome Trust: 079643

    Oncogene 2007;26;40;5877-88

  • Selective colonization of insoluble substrates by human faecal bacteria.

    Leitch EC, Walker AW, Duncan SH, Holtrop G and Flint HJ

    Microbial Ecology Group, Aberdeen, UK.

    Insoluble plant polysaccharides and endogenous mucin are important energy sources for human colonic microorganisms. The object of this study was to determine whether or not specific communities colonize these substrates. Using faecal samples from four individuals as inocula for an anaerobic in vitro continuous flow system, the colonization of wheat bran, high amylose starch and porcine gastric mucin was examined. Recovered substrates were extensively washed and the remaining tightly attached bacterial communities were identified using polymerase chain reaction-amplified 16S rRNA gene sequences and fluorescent in situ hybridization. The substrate had a major influence on the species of attached bacteria detected. Sequences retrieved from bran were dominated by clostridial cluster XIVa bacteria, including uncultured relatives of Clostridium hathewayi, Eubacterium rectale and Roseburia species. Bacteroides species were also detected. The most abundant sequences recovered from starch were related to the cultured species Ruminococcus bromii, Bifidobacterium adolescentis, Bifidobacterium breve and E. rectale. The most commonly recovered sequences from mucin were from Bifidobacterium bifidum and uncultured bacteria related to Ruminococcus lactaris. This study suggests that a specific subset of bacteria is likely to be the primary colonizers of particular insoluble colonic substrates. For a given substrate, however, the primary colonizing species may vary between host individuals.

    Environmental microbiology 2007;9;3;667-79

  • Common ABCB1 polymorphisms are not associated with multidrug resistance in epilepsy using a gene-wide tagging approach.

    Leschziner GD, Andrew T, Leach JP, Chadwick D, Coffey AJ, Balding DJ, Bentley DR, Pirmohamed M and Johnson MR

    Imperial College London, London, UK.

    P-glycoprotein, the product of the ABCB1 gene, is a proposed mechanism of pharmacoresistance in epilepsy. Previous attempts to correlate the ABCB1 C3435T SNP, or a three-SNP haplotype containing C3435T with epilepsy pharmacoresistance have produced discordant findings. We analysed these single nucleotide polymorphisms (SNPs), plus a more comprehensive set of tagging SNPs describing common variation in ABCB1 in a case-control study. No significant association of C3435T (P=0.55), the three-SNP haplotype (lowest P=0.14) or any gene-wide tagging SNP (lowest P=0.17) with multidrug resistance in epilepsy was identified. Meta-analysis of studies using the same definition of multidrug resistance (n=1064) also demonstrated no significant association of C3435T with multidrug resistance (P=0.31). These findings suggest that C3435T is unlikely to be a marker for epilepsy multidrug resistance. In addition, no evidence for a role of other common ABCB1 polymorphisms was found using a potentially more powerful gene-wide tagging approach.

    Funded by: Wellcome Trust

    Pharmacogenetics and genomics 2007;17;3;217-20

  • ABCB1 genotype and PGP expression, function and therapeutic drug response: a critical review and recommendations for future research.

    Leschziner GD, Andrew T, Pirmohamed M and Johnson MR

    Division of Neurosciences, Imperial College, London, UK.

    The product of the ABCB1 gene, P-glycoprotein (PGP), is a transmembrane active efflux pump for a variety of drugs. It is a putative mechanism of multidrug resistance in a range of diseases. It is postulated that ABCB1 polymorphisms contribute to variability in PGP function, and that therefore multidrug resistance is, at least in part, genetically determined. However, studies of ABCB1 genotype or haplotype and PGP expression, activity or drug response have produced inconsistent results. This critical review of ABCB1 genotype and PGP function, including mRNA expression, PGP-substrate drug pharmacokinetics and drug response, highlights methodological limitations of existing studies, including inadequate power, potential confounding by co-morbidity and co-medication, multiple testing, poor definition of disease phenotype and outcomes, and analysis of multiple drugs that might not be PGP substrates. We have produced recommendations for future research that will aid clarification of the association between ABCB1 genotypes and factors related to PGP activity.

    Funded by: Wellcome Trust

    The pharmacogenomics journal 2007;7;3;154-79

  • The association between polymorphisms in RLIP76 and drug response in epilepsy.

    Leschziner GD, Jorgensen AL, Andrew T, Williamson PR, Marson AG, Coffey AJ, Middleditch C, Balding DJ, Rogers J, Bentley DR, Chadwick D, Johnson MR and Pirmohamed M

    Imperial College London, Division of Neuroscience, Charing Cross Campus, Room 10E07, St Dunstan's Road, London W6 8RF, UK.

    Introduction: Approximately 30% of patients with epilepsy are resistant to treatment with anti-epileptic drugs (AEDs). The ABC drug transporter proteins are hypothesized to mediate drug resistance in epilepsy. More recently, a non-ABC putative transporter, RLIP76, has also been proposed to be involved in the mechanism of pharmacoresistance. One previous association study of six polymorphisms in RLIP76 failed to find any association with drug resistance in a retrospective cohort of epilepsy patients. We aimed to look for an association with outcomes reflecting drug response in a larger prospective cohort, with gene-wide coverage.

    We investigated the role of common polymorphisms in RLIP76 in epilepsy pharmacoresistance by genotyping 23 common RLIP76 polymorphisms in a prospective cohort of 503 epilepsy patients, from the standard and new anti-epileptic drugs (SANAD) prospective study of new and old AEDs. A total of 13 of these were tested for association with four outcomes reflecting response to drugs: time to first seizure, time to 12-month remission, time to withdrawal due to inadequate seizure control, and time to withdrawal due to unacceptable adverse drug events.

    Results: No significant associations, allowing for multiple testing, were found in the whole cohort. There was also no effect in a subgroup of patients on carbamazepine, which is thought to be a RLIP76 substrate, although two polymorphisms were associated with time to first seizure (p = 0.007).

    Discussion: We failed to demonstrate any association between RLIP76 polymorphisms and four different measures of drug response in the larger cohort, but a subgroup analysis of patients receiving carbamazepine suggested an association that should be investigated further.

    Conclusions: Our data suggest that common variants in RLIP76 are unlikely to contribute to epilepsy drug response.

    Pharmacogenomics 2007;8;12;1715-22

  • Variation of 52 new Y-STR loci in the Y Chromosome Consortium worldwide panel of 76 diverse individuals.

    Lim SK, Xue Y, Parkin EJ and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    We have established 16 small multiplex reactions of two to four loci to amplify 52 recently described single-copy simple Y-STRs and typed these loci in a worldwide panel of 74 diverse men and two women. Two Y-STRs were found to be commonly multicopy in this sample set and were excluded from the study. Of the remaining 50, four (DYS481, DYS570, DYS576 and DYS643) showed higher diversities than the commonly used loci and can potentially provide increased haplotype discrimination in both forensic and anthropological work. Ten loci showed occasional missing alleles, duplicated peaks or intermediate-sized alleles.

    Funded by: Wellcome Trust

    International journal of legal medicine 2007;121;2;124-7

  • The Staphylococci: A Postgenomic View

    Lindsay JA

    Bacterial Pathogenomics. 2007;120-40

  • MASTR: multiple alignment and structure prediction of non-coding RNAs using simulated annealing.

    Lindgreen S, Gardner PP and Krogh A

    Bioinformatics Centre, Department of Molecular Biology, University of Copenhagen, Ole Maaloes Vej 5, 2200 Copenhagen N, Denmark.

    Motivation: As more non-coding RNAs are discovered, the importance of methods for RNA analysis increases. Since the structure of ncRNA is intimately tied to the function of the molecule, programs for RNA structure prediction are necessary tools in this growing field of research. Furthermore, it is known that RNA structure is often evolutionarily more conserved than sequence. However, few existing methods are capable of simultaneously considering multiple sequence alignment and structure prediction.

    Result: We present a novel solution to the problem of simultaneous structure prediction and multiple alignment of RNA sequences. Using Markov chain Monte Carlo in a simulated annealing framework, the algorithm MASTR (Multiple Alignment of STructural RNAs) iteratively improves both sequence alignment and structure prediction for a set of RNA sequences. This is done by minimizing a combined cost function that considers sequence conservation, covariation and basepairing probabilities. The results show that the method is very competitive to similar programs available today, both in terms of accuracy and computational efficiency.

    Availability: Source code available from

    Bioinformatics (Oxford, England) 2007;23;24;3304-11

  • Sequencing and analysis of chromosome 1 of Eimeria tenella reveals a unique segmental organization.

    Ling KH, Rajandream MA, Rivailler P, Ivens A, Yap SJ, Madeira AM, Mungall K, Billington K, Yee WY, Bankier AT, Carroll F, Durham AM, Peters N, Loo SS, Isa MN, Novaes J, Quail M, Rosli R, Nor Shamsudin M, Sobreira TJ, Tivey AR, Wai SF, White S, Wu X, Kerhornou A, Blake D, Mohamed R, Shirley M, Gruber A, Berriman M, Tomley F, Dear PH and Wan KL

    Malaysia Genome Institute, UKM-MTDC Smart Technology Centre, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor DE, Malaysia.

    Eimeria tenella is an intracellular protozoan parasite that infects the intestinal tracts of domestic fowl and causes coccidiosis, a serious and sometimes lethal enteritis. Eimeria falls in the same phylum (Apicomplexa) as several human and animal parasites such as Cryptosporidium, Toxoplasma, and the malaria parasite, Plasmodium. Here we report the sequencing and analysis of the first chromosome of E. tenella, a chromosome believed to carry loci associated with drug resistance and known to differ between virulent and attenuated strains of the parasite. The chromosome, which appears to be representative of the genome, is gene-dense and rich in simple-sequence repeats, many of which appear to give rise to repetitive amino acid tracts in the predicted proteins. Most striking is the segmentation of the chromosome into repeat-rich regions peppered with transposon-like elements and telomere-like repeats, alternating with repeat-free regions. Predicted genes differ in character between the two types of segment, and the repeat-rich regions appear to be associated with strain-to-strain variation.

    Funded by: Medical Research Council: MC_U105131672; Wellcome Trust

    Genome research 2007;17;3;311-9

  • A physical analysis of the Y chromosome shows no additional deletions, other than Gr/Gr, associated with testicular germ cell tumour.

    Linger R, Dudakia D, Huddart R, Easton D, Bishop DT, Stratton MR and Rapley EA

    Testicular Cancer Genetics Team, Section of Cancer Genetics, Institute of Cancer Research, Brookes Lawley Building, Sutton, Surrey SM2 5NG, UK.

    Testicular germ cell tumour (TGCT) is the most common malignancy in men aged 15-45 years. A small deletion on the Y chromosome known as 'gr/gr' was shown to be associated with a two-fold increased risk of TGCT, increasing to three-fold in cases with a family history of TGCT. Additional deletions of the Y chromosome, known as AZFa, AZFb and AZFc, are described in patients with infertility; however, complete deletions of these regions have not been identified in TGCT patients. We screened the Y chromosome in a series of TGCT cases to evaluate whether additional deletions of Y were implicated in TGCT susceptibility. Single copy Y chromosome STS markers with an average inter-marker spacing of 128 kb were examined in constitutional DNA of 271 index TGCT patients. Three markers showed evidence of deletions: sY1291, indicative of 'gr/gr' (eight out of 271; 2.9%); Y-DAZ3, contained within 'gr/gr' (21 out of 271; 7.7%); and G66152, for which a single deletion was identified in one TGCT case. No other markers demonstrated deletions. While several regions of the Y chromosome are known to be deleted and associated with infertility, our study provides no evidence to suggest that regions of Y deletion, other than 'gr/gr', are associated with susceptibility to TGCT in UK patients.

    British journal of cancer 2007;96;2;357-61

  • Molecular analysis of the VSX1 gene in familial keratoconus.

    Liskova P, Ebenezer ND, Hysi PG, Gwilliam R, El-Ashry MF, Moodaley LC, Hau S, Twa M, Tuft SJ and Bhattacharya SS

    Division of Molecular Genetics, Institute of Ophthalmology, UCL, London, UK.

    Purpose: To evaluate the role of the visual system homeobox gene 1 (VSX1) in the pathogenesis of familial keratoconus.

    Methods: Families with two or more individuals with keratoconus were recruited and their members examined. The coding region and intron-exon junctions of the VSX1 gene were sequenced in affected individuals. In cases where there were possible pathogenic changes, segregation within the pedigree was analyzed. A meta-analysis of reports of an association between the p.D144E change and the keratoconus phenotype was performed.

    Results: Probands from a panel of 85 apparently unrelated keratoconus families were included. Eleven sequence variants were observed, including the previously reported c.432C>G (p.D144E) change and two novel intronic single nucleotide polymorphisms. However, these three changes did not cosegregate with the disease phenotype.

    Conclusions: We excluded the c.432C>G sequence alteration as the direct cause of the disease. Lack of possibly pathogenic VSX1 sequence variants in the familial panel suggests that involvement of this gene in the pathogenesis of keratoconus is likely to be confined to a small number of pedigrees, at least in the population studied.

    Molecular vision 2007;13;1887-91

  • Comment on "A common genetic variant is associated with adult and childhood obesity".

    Loos RJ, Barroso I, O'Rahilly S and Wareham NJ

    Medical Research Council Epidemiology Unit, Cambridge, UK.

    Herbert et al. (Reports, 14 April 2006, p. 279) found that the rs7566605 genetic variant, located upstream of the INSIG2 gene, was consistently associated with increased body mass index. However, we found no evidence of association between rs7566605 and body mass index in two large ethnically homogeneous population-based cohorts. On the contrary, an opposite tendency was observed.

    Funded by: Medical Research Council: G9824984, MC_U106179471, MC_U106188470; Wellcome Trust: 077016

    Science (New York, N.Y.) 2007;315;5809;187; author reply 187

  • TCF7L2 polymorphisms modulate proinsulin levels and beta-cell function in a British Europid population.

    Loos RJ, Franks PW, Francis RW, Barroso I, Gribble FM, Savage DB, Ong KK, O'Rahilly S and Wareham NJ

    Medical Research Council Epidemiology Unit, Strangeways Research Laboratory, Cambridge, UK.

    Rapidly accumulating evidence shows that common T-cell transcription factor (TCF)7L2 polymorphisms confer risk of type 2 diabetes through unknown mechanisms. We examined the association between four TCF7L2 single nucleotide polymorphisms (SNPs), including rs7903146, and measures of insulin sensitivity and insulin secretion in 1,697 Europid men and women of the population-based MRC (Medical Research Council)-Ely study. The T (minor) allele of rs7903146 was strongly and positively associated with fasting proinsulin (P = 4.55 x 10^-9) and 32,33 split proinsulin (P = 1.72 x 10^-4) relative to total insulin levels; i.e., differences between T/T and C/C homozygotes amounted to 21.9 and 18.4%, respectively. Notably, the insulin-to-glucose ratio (IGR) at 30-min oral glucose tolerance test (OGTT), a frequently used surrogate of first-phase insulin secretion, was not associated with the TCF7L2 SNP (P > 0.7). However, the insulin response (IGR) at 60-min OGTT was significantly lower in T-allele carriers (P = 3.5 x 10^-3). The T-allele was also associated with higher A1C concentrations (P = 1.2 x 10^-2) and reduced beta-cell function, assessed by homeostasis model assessment of beta-cell function (P = 2.8 x 10^-2). Similar results were obtained for the other TCF7L2 SNPs. Of note, both major genes involved in proinsulin processing (PC1, PC2) contain TCF-binding sites in their promoters. Our findings suggest that the TCF7L2 risk allele may predispose to type 2 diabetes by impairing beta-cell proinsulin processing. The risk allele increases proinsulin levels and diminishes the 60-min but not 30-min insulin response during OGTT. The strong association between the TCF7L2 risk allele and fasting proinsulin but not insulin levels is notable, as, in this unselected and largely normoglycemic population, external influences on beta-cell stress are unlikely to be major factors influencing the efficiency of proinsulin processing.

    Funded by: Medical Research Council: G9824984, MC_U106179471, MC_U106179472, MC_U106188470; Wellcome Trust: 071187, 077016

    Diabetes 2007;56;7;1943-7

  • Altered retinal microRNA expression profile in a mouse model of retinitis pigmentosa.

    Loscher CJ, Hokamp K, Kenna PF, Ivens AC, Humphries P, Palfi A and Farrar GJ

    Smurfit Institute of Genetics, Trinity College Dublin, College Green, Dublin 2, Ireland.

    Background: The role played by microRNAs (miRs) as common regulators in physiologic processes such as development and various disease states was recently highlighted. Retinitis pigmentosa (RP) linked to RHO (which encodes rhodopsin) is the most frequent form of inherited retinal degeneration that leads to blindness, for which there are no current therapies. Little is known about the cellular mechanisms that connect mutations within RHO to eventual photoreceptor cell death by apoptosis.

    Results: Global miR expression profiling using miR microarray technology and quantitative real-time RT-PCR (qPCR) was performed in mouse retinas. RNA samples from retina of a mouse model of RP carrying a mutant Pro347Ser RHO transgene and from wild-type retina, brain and a whole-body representation (prepared by pooling total RNA from eight different mouse organs) exhibited notably different miR profiles. Expression of retina-specific and recently described retinal miRs was semi-quantitatively demonstrated in wild-type mouse retina. Alterations greater than twofold were found in the expression of nine miRs in Pro347Ser as compared with wild-type retina (P < 0.05). Expression of miR-1 and miR-133 decreased by more than 2.5-fold (P < 0.001), whereas expression of miR-96 and miR-183 increased by more than 3-fold (P < 0.001) in Pro347Ser retinas, as validated by qPCR. Potential retinal targets for these miRs were predicted in silico.

    Conclusion: This is the first miR microarray study to focus on evaluating altered miR expression in retinal disease. Additionally, novel retinal preference for miR-376a and miR-691 was identified. The results obtained contribute toward elucidating the function of miRs in normal and diseased retina. Modulation of expression of retinal miRs may represent a future therapeutic strategy for retinopathies such as RP.

    Genome biology 2007;8;11;R248

  • Functional cell permeable motifs within medically relevant proteins.

    Low W, Mortlock A, Petrovska L, Dottorini T, Dougan G and Crisanti A

    Biological Sciences, Imperial College London, Imperial College Road, 5th floor SAF Building, London SW7 2AZ, UK.

    Increasing experimental evidence indicates that short polybasic peptides are able to translocate across the membrane of living cells. However, these peptides, often derived from viruses and insects, may induce unspecific effects that could mask the action of their cargoes. Here, we show that a panel of lysine and/or arginine-rich peptides, derived from human proteins involved in cell signalling pathways leading to inflammation, possess the intrinsic ability to cross intact cellular membranes. These peptides are also capable of carrying a biologically active cargo. One of these peptides, encompassing the cell permeable sequence of the Toll-like receptor 4 (TLR4) adaptor protein (TIRAP) and modified to carry a dominant-negative domain of the same TIRAP protein, selectively inhibited the production of pro-inflammatory cytokines upon LPS challenge, in in vitro, ex vivo and in vivo experiments. Docking studies indicated that this inhibition might be mediated by the disruption of the recruitment of downstream effector molecules. These results show for the first time the therapeutic potential of cell permeable peptides derived from human proteins involved in disease.

    Funded by: Wellcome Trust: 076962

    Journal of biotechnology 2007;129;3;555-64

  • Islands of euchromatin-like sequence and expressed polymorphic sequences within the short arm of human chromosome 21.

    Lyle R, Prandini P, Osoegawa K, ten Hallers B, Humphray S, Zhu B, Eyras E, Castelo R, Bird CP, Gagos S, Scott C, Cox A, Deutsch S, Ucla C, Cruts M, Dahoun S, She X, Bena F, Wang SY, Van Broeckhoven C, Eichler EE, Guigo R, Rogers J, de Jong PJ, Reymond A and Antonarakis SE

    Department of Genetic Medicine and Development, University of Geneva Medical School, and University Hospitals, 1211 Geneva, Switzerland.

    The goals of the human genome project did not include sequencing of the heterochromatic regions. We describe here an initial sequence of 1.1 Mb of the short arm of human chromosome 21 (HSA21p), estimated to be 10% of 21p. This region contains extensive euchromatic-like sequence and includes on average one transcript every 100 kb. These transcripts show multiple inter- and intrachromosomal copies, and extensive copy number and sequence variability. The sequencing of the "heterochromatic" regions of the human genome is likely to reveal many additional functional elements and provide important evolutionary information.

    Funded by: NHGRI NIH HHS: HG002385

    Genome research 2007;17;11;1690-6

  • Comparative gene expression profiling of in vitro differentiated megakaryocytes and erythroblasts identifies novel activatory and inhibitory platelet membrane proteins.

    Macaulay IC, Tijssen MR, Thijssen-Timmer DC, Gusnanto A, Steward M, Burns P, Langford CF, Ellis PD, Dudbridge F, Zwaginga JJ, Watkins NA, van der Schoot CE and Ouwehand WH

    Department of Haematology, University of Cambridge, Cambridge, UK.

    To identify previously unknown platelet receptors we compared the transcriptomes of in vitro differentiated megakaryocytes (MKs) and erythroblasts (EBs). RNA was obtained from purified, biologically paired MK and EB cultures and compared using cDNA microarrays. Bioinformatic analysis of MK-up-regulated genes identified 151 transcripts encoding transmembrane domain-containing proteins. Although many of these were known platelet genes, a number of previously unidentified or poorly characterized transcripts were also detected. Many of these transcripts, including G6b, G6f, LRRC32, LAT2, and the G protein-coupled receptor SUCNR1, encode proteins with structural features or functions that suggest they may be involved in the modulation of platelet function. Immunoblotting on platelets confirmed the presence of the encoded proteins, and flow cytometric analysis confirmed the expression of G6b, G6f, and LRRC32 on the surface of platelets. Through comparative analysis of expression in platelets and other blood cells we demonstrated that G6b, G6f, and LRRC32 are restricted to the platelet lineage, whereas LAT2 and SUCNR1 were also detected in other blood cells. The identification of the succinate receptor SUCNR1 in platelets is of particular interest, because physiologically relevant concentrations of succinate were shown to potentiate the effect of low doses of a variety of platelet agonists.

    Funded by: Medical Research Council: MC_U105260799

    Blood 2007;109;8;3260-9

  • Karyotype evolution in Rhinolophus bats (Rhinolophidae, Chiroptera) illuminated by cross-species chromosome painting and G-banding comparison.

    Mao X, Nie W, Wang J, Su W, Ao L, Feng Q, Wang Y, Volleth M and Yang F

    Key Laboratory of Cellular and Molecular Evolution, Kunming Institute of Zoology, Kunming, Yunnan, PR China.

    Rhinolophus (Rhinolophidae) is the second most speciose genus in Chiroptera and has extensively diversified diploid chromosome numbers (from 2n = 28 to 62). In spite of many attempts to explore the karyotypic evolution of this genus, most studies have been based on conventional Giemsa staining rather than G-banding. Here we have made a whole set of chromosome-specific painting probes from flow-sorted chromosomes of Aselliscus stoliczkanus (Hipposideridae). These probes have been utilized to establish the first genome-wide homology maps among six Rhinolophus species with four different diploid chromosome numbers (2n = 36, 44, 58, and 62) and three species from other families: Rousettus leschenaulti (2n = 36, Pteropodidae), Hipposideros larvatus (2n = 32, Hipposideridae), and Myotis altarium (2n = 44, Vespertilionidae) by fluorescence in situ hybridization. To facilitate integration with published maps, human paints were also hybridized to A. stoliczkanus chromosomes. Our painting results substantiate the wide occurrence of whole-chromosome arm conservation in Rhinolophus bats and suggest that Robertsonian translocations of different combinations account for their karyotype differences. Parsimony analysis using chromosomal characters has provided some new insights into the Rhinolophus ancestral karyotype and phylogenetic relationships among the Rhinolophus species studied so far. In addition to Robertsonian translocations, our results suggest that whole-arm (reciprocal) translocations involving multiple non-homologous chromosomes could also have been involved in the karyotypic evolution within Rhinolophus, in particular in those bats with low and medium diploid numbers.

    Chromosome research : an international journal on the molecular, supramolecular and evolutionary aspects of chromosome biology 2007;15;7;835-48

  • Shaken not stirred: a global research cocktail served in Hinxton.

    Marguerat S, Wilhelm BT and Bähler J

    Cancer Research UK Fission Yeast Functional Genomics Group, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    A report of the 2007 Cold Spring Harbor Laboratory/Wellcome Trust Conference on Functional Genomics and Systems Biology, Hinxton, UK, 10-13 October 2007.

    Funded by: Cancer Research UK: A6517; Wellcome Trust: 077118

    Genome biology 2007;8;11;320

  • Breaking the waves: improved detection of copy number variation from microarray-based comparative genomic hybridization.

    Marioni JC, Thorne NP, Valsesia A, Fitzgerald T, Redon R, Fiegler H, Andrews TD, Stranger BE, Lynch AG, Dermitzakis ET, Carter NP, Tavaré S and Hurles ME

    Computational Biology Group, Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Centre for Mathematical Sciences, Wilberforce Road, Cambridge CB3 0WA, UK.

    Background: Large-scale high throughput studies using microarray technology have established that copy number variation (CNV) throughout the genome is more frequent than previously thought. Such variation is known to play an important role in the presence and development of phenotypes such as HIV-1 infection and Alzheimer's disease. However, methods for analyzing the complex data produced and identifying regions of CNV are still being refined.

    Results: We describe the presence of a genome-wide technical artifact, spatial autocorrelation or 'wave', which occurs in a large dataset used to determine the location of CNV across the genome. By removing this artifact we are able to obtain both a more biologically meaningful clustering of the data and an increase in the number of CNVs identified by current calling methods without a major increase in the number of false positives detected. Moreover, removing this artifact is critical for the development of a novel model-based CNV calling algorithm - CNVmix - that uses cross-sample information to identify regions of the genome where CNVs occur. For regions of CNV that are identified by both CNVmix and current methods, we demonstrate that CNVmix is better able to categorize samples into groups that represent copy number gains or losses.

    Conclusion: Removing artifactual 'waves' (which appear to be a general feature of array comparative genomic hybridization (aCGH) datasets) and using cross-sample information when identifying CNVs enables more biological information to be extracted from aCGH experiments designed to investigate copy number variation in normal individuals.

    Funded by: Wellcome Trust

    Genome biology 2007;8;10;R228

  • Renin enhancer is crucial for full response in Renin expression to an in vivo stimulus.

    Markus MA, Goy C, Adams DJ, Lovicu FJ and Morris BJ

    Basic & Clinical Genomics Laboratory, School of Medical Sciences and Bosch Institute, Building F13, University of Sydney, NSW 2006, Australia.

    We showed recently that deletion of a strong enhancer located 2.7 kb upstream of the renin gene in mice produces a strain with mild hypotension and salt-sensitivity. Here we set out to compare responses in renin expression in kidney and extrarenal tissues in these "REKO" mice. REKO and wild-type mice were placed on a low NaCl/enalapril regimen for 1 week, and then Ren-1(c) mRNA and renin enzyme activities were measured in tissues and plasma. In untreated REKO mice, renin and Ren-1(c) mRNA were reduced significantly in kidney, submandibular gland, adrenal, heart, and brain. In situ hybridization indicated a marked reduction in Ren-1(c) mRNA in juxtaglomerular cells and granular ducts of submandibular gland. After the chronic stimulus, the response in renal Ren-1(c) mRNA in REKO mice was blunted by 54% compared with wild-type mice, and was accompanied by almost complete exhaustion of renin stores. Response in plasma renin was blunted by 47%, this being mirrored in heart (54% decline), in which renin is derived mostly from the bloodstream. In adrenal a 55% reduction was seen. These data are consistent with inability of REKO mice to adequately replenish renal renin stores during chronic stimulation of renin secretion. In conclusion, the renin enhancer is critical for replenishment of renin stores and response in renin to a chronic in vivo stimulus.

    Hypertension 2007;50;5;933-8

  • Chromosomally unstable mouse tumours have genomic alterations similar to diverse human cancers.

    Maser RS, Choudhury B, Campbell PJ, Feng B, Wong KK, Protopopov A, O'Neil J, Gutierrez A, Ivanova E, Perna I, Lin E, Mani V, Jiang S, McNamara K, Zaghlul S, Edkins S, Stevens C, Brennan C, Martin ES, Wiedemeyer R, Kabbarah O, Nogueira C, Histen G, Aster J, Mansour M, Duke V, Foroni L, Fielding AK, Goldstone AH, Rowe JM, Wang YA, Look AT, Stratton MR, Chin L, Futreal PA and DePinho RA

    Department of Medical Oncology, Dana Farber Cancer Institute, Boston, Massachusetts 02115, USA.

    Highly rearranged and mutated cancer genomes present major challenges in the identification of pathogenetic events driving the neoplastic transformation process. Here we engineered lymphoma-prone mice with chromosomal instability to assess the usefulness of mouse models in cancer gene discovery and the extent of cross-species overlap in cancer-associated copy number aberrations. Along with targeted re-sequencing, our comparative oncogenomic studies identified FBXW7 and PTEN to be commonly deleted both in murine lymphomas and in human T-cell acute lymphoblastic leukaemia/lymphoma (T-ALL). The murine cancers acquire widespread recurrent amplifications and deletions targeting loci syntenic to those not only in human T-ALL but also in diverse human haematopoietic, mesenchymal and epithelial tumours. These results indicate that murine and human tumours experience common biological processes driven by orthologous genetic events in their malignant evolution. The highly concordant nature of genomic events encourages the use of genomically unstable murine cancer models in the discovery of biological driver events in the human oncogenome.

    Funded by: Medical Research Council: G0500389; Wellcome Trust: 077012, 088340

    Nature 2007;447;7147;966-71

  • Transcriptional regulatory network for sexual differentiation in fission yeast.

    Mata J, Wilbrey A and Bähler J

    Cancer Research UK Fission Yeast Functional Genomics Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge CB10 1HH, UK.

    Background: Changes in gene expression are hallmarks of cellular differentiation. Sexual differentiation in fission yeast (Schizosaccharomyces pombe) provides a model system for gene expression programs accompanying and driving cellular specialization. The expression of hundreds of genes is modulated in successive waves during meiosis and sporulation in S. pombe, and several known transcription factors are critical for these processes.

    Results: We used DNA microarrays to investigate meiotic gene regulation by examining transcriptomes after genetic perturbations (gene deletion and/or overexpression) of rep1, mei4, atf21 and atf31, which encode known transcription factors controlling sexual differentiation. This analysis reveals target genes at a genome-wide scale and uncovers combinatorial control by Atf21p and Atf31p. We also studied two transcription factors not previously implicated in sexual differentiation whose meiotic induction depended on Mei4p: Rsv2p induces stress-related genes during spore formation, while Rsv1p represses glucose-metabolism genes. Our data further reveal negative feedback interactions: both Rep1p and Mei4p not only activate specific gene expression waves (early and middle genes, respectively) but are also required for repression of genes induced in the previous waves (Ste11p-dependent and early genes, respectively).

    Conclusion: These data give insight into regulatory principles controlling the extensive gene expression program driving sexual differentiation and highlight sophisticated interactions and combinatorial control among transcription factors. Besides triggering simultaneous expression of gene waves, transcription factors also repress genes in the previous wave and induce other factors that in turn regulate a subsequent wave. These dependencies ensure an ordered and timely succession of transcriptional waves during cellular differentiation.

    Funded by: Cancer Research UK: A6517; Wellcome Trust: 077118

    Genome biology 2007;8;10;R217

  • Genetic relatedness of the Streptococcus pneumoniae capsular biosynthetic loci.

    Mavroidi A, Aanensen DM, Godoy D, Skovsted IC, Kaltoft MS, Reeves PR, Bentley SD and Spratt BG

    Department of Infectious Disease Epidemiology, Imperial College London, Room G22, Old Medical School Building, St. Mary's Hospital, Norfolk Place, London W2 1PG, United Kingdom.

    Streptococcus pneumoniae (the pneumococcus) produces 1 of 91 capsular polysaccharides (CPS) that define the serotype. The cps loci of 88 pneumococcal serotypes whose CPS is synthesized by the Wzy-dependent pathway were compared with each other and with additional streptococcal polysaccharide biosynthetic loci and were clustered according to the proportion of shared homology groups (HGs), weighted for the sequence similarities between the genes encoding the shared HGs. The cps loci of the 88 pneumococcal serotypes were distributed into eight major clusters and 21 subclusters. All serotypes within the same serogroup fell into the same major cluster, but in six cases, serotypes within the same serogroup were in different subclusters and, conversely, nine subclusters included completely different serotypes. The closely related cps loci within a subcluster were compared to the known CPS structures to relate gene content to structure. The Streptococcus oralis and Streptococcus mitis polysaccharide biosynthetic loci clustered within the pneumococcal cps loci and were in a subcluster that also included the cps locus of pneumococcal serotype 21, whereas the Streptococcus agalactiae cps loci formed a single cluster that was not closely related to any of the pneumococcal cps clusters.
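
    One way to picture a similarity based on the proportion of shared homology groups, weighted by sequence similarity, is the sketch below. The abstract does not give the exact formula, so the normalization by the union of homology groups, the function name and the example data are all assumptions made for illustration.

```python
def locus_similarity(hgs_a, hgs_b, pair_identity):
    """Weighted proportion of shared homology groups between two loci.

    hgs_a, hgs_b: sets of homology-group labels present in each locus.
    pair_identity: dict mapping a shared homology group to the sequence
        identity (0..1) of the corresponding genes in the two loci.
        Shared groups missing from the dict are given weight 1.0.
    """
    union = hgs_a | hgs_b
    if not union:
        return 0.0
    shared = hgs_a & hgs_b
    weighted_shared = sum(pair_identity.get(hg, 1.0) for hg in shared)
    # Assumed normalization: divide by the number of distinct groups overall.
    return weighted_shared / len(union)


# Hypothetical loci and identities, not data from the paper.
locus_x = {"HG1", "HG2", "HG3"}
locus_y = {"HG1", "HG2", "HG4"}
identity = {"HG1": 0.9, "HG2": 0.7}

sim = locus_similarity(locus_x, locus_y, identity)
print(round(sim, 3))
```

    A matrix of such pairwise values could then be passed to any standard hierarchical clustering routine to obtain clusters and subclusters.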

    Funded by: Wellcome Trust

    Journal of bacteriology 2007;189;21;7841-55

  • Genetic and molecular analysis of the central and peripheral circadian clockwork of mice.

    Maywood ES, O'Neill JS, Reddy AB, Chesham JE, Prosser HM, Kyriacou CP, Godinho SI, Nolan PM and Hastings MH

    Division of Neurobiology, MRC Laboratory of Molecular Biology, Cambridge CB2 0QH, United Kingdom.

    A hierarchy of interacting, tissue-based clocks controls circadian physiology and behavior in mammals. Preeminent are the suprachiasmatic nuclei (SCN): central hypothalamic pacemakers synchronized to solar time via retinal afferents and in turn responsible for internal synchronization of other clocks present in major organ systems. The SCN and peripheral clocks share essentially the same cellular timing mechanism. This consists of autoregulatory transcriptional/posttranslational feedback loops in which the Period (Per) and Cryptochrome (Cry) "clock" genes are negatively regulated by their protein products. Here, we review recent studies directed at understanding the molecular and cellular bases of the mammalian clock. At the cellular level, we demonstrate the role of F-box protein Fbxl3 (characterized by the afterhours mutation) in directing the proteasomal degradation of Cry and thereby controlling negative feedback and circadian period of the molecular loops. Within SCN neural circuitry, we describe how neuropeptidergic signaling by VIP synchronizes and sustains the cellular clocks. At the hypothalamic level, signaling via a different SCN neuropeptide, prokineticin, is not required for pacemaking but is necessary for control of circadian behavior. Finally, we consider how metabolic pathways are coordinated in time, focusing on liver function and the role of glucocorticoid signals in driving the circadian transcriptome and proteome.

    Funded by: Biotechnology and Biological Sciences Research Council; Medical Research Council; Wellcome Trust

    Cold Spring Harbor symposia on quantitative biology 2007;72;85-94

  • Prediction of microRNA targets.

    Mazière P and Enright AJ

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Recently, microRNAs (miRNAs) have been shown to be important regulators of genes in many organisms and have already been implicated in a growing number of diseases. miRNAs are short (21-23 nucleotides) RNAs that bind to the 3' untranslated regions of target genes. This binding event causes translational repression of the target gene and, evidence now suggests, also stimulates rapid degradation of the target transcript. miRNAs represent a new species of regulator, controlling the levels of potentially large numbers of proteins, many of which might be important drug targets. Expression studies show that miRNAs are highly differentially expressed, with specific miRNAs active in certain tissues at certain times. In many cancers, miRNA expression is significantly altered, and this has been shown to be a useful diagnostic tool. Several computational approaches have been developed for the prediction of miRNA targets.
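
    A common simplification used in target prediction is to scan a 3' untranslated region for perfect matches to the miRNA "seed" (nucleotides 2-8 from the 5' end). The sketch below illustrates only that step and is not a method taken from this review; the sequences are toy inputs, and real predictors also weigh factors such as site conservation and pairing energy.

```python
# RNA complement table (A-U, C-G); wobble pairing is deliberately ignored.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA string written 5'->3'."""
    return rna.translate(COMPLEMENT)[::-1]


def seed_sites(mirna, utr):
    """Find perfect seed-match sites for a miRNA in a 3' UTR.

    Returns the site motif and the 0-based start positions in the UTR.
    """
    seed = mirna[1:8]                 # nucleotides 2-8 of the miRNA
    site = reverse_complement(seed)   # motif expected on the target mRNA
    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start)
        start = utr.find(site, start + 1)  # allow overlapping matches
    return site, positions


# Toy inputs, written 5'->3' in the RNA alphabet.
mirna = "UGAGGUAGUAGGUUGUAUAGUU"
utr = "AAACUACCUCAGGGCUACCUCUU"

site, positions = seed_sites(mirna, utr)
print(site, positions)
```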

    Drug discovery today 2007;12;11-12;452-8

  • Spreading of mammalian DNA-damage response factors studied by ChIP-chip at damaged telomeres.

    Meier A, Fiegler H, Muñoz P, Ellis P, Rigler D, Langford C, Blasco MA, Carter N and Jackson SP

    The Wellcome Trust and Cancer Research UK Gurdon Institute, Department of Zoology, University of Cambridge, Cambridge, UK.

    Phosphorylated histone H2AX (gammaH2AX) is generated in nucleosomes flanking sites of DNA double-strand breaks, triggering the recruitment of DNA-damage response proteins such as MDC1 and 53BP1. Here, we study shortened telomeres in senescent human cells. We show that most telomeres trigger gammaH2AX formation, which spreads up to 570 kb into the subtelomeric regions. Furthermore, we reveal that the spreading patterns of 53BP1 and MDC1 are very similar to that of gammaH2AX, consistent with a structural link between these factors. Moreover, different subsets of telomeres signal in different cell lines, with those that signal tending to equate to the shortest telomeres of the corresponding cell line, thus linking telomere attrition with DNA-damage signalling. Notably, we find that, in some cases, gammaH2AX spreading is modulated in a manner suggesting that H2AX distribution or its ability to be phosphorylated is not uniform along the chromosome. Finally, we observe weak gammaH2AX signals at telomeres of proliferating cells, but not in hTERT immortalised cells, suggesting that low telomerase activity leads to telomere uncapping and senescence in proliferating primary cells.

    Funded by: Cancer Research UK: A5290; Wellcome Trust

    The EMBO journal 2007;26;11;2707-18

  • Report of a female patient with mental retardation and tall stature due to a chromosomal rearrangement disrupting the OPHN1 gene on Xq12.

    Menten B, Buysse K, Vermeulen S, Meersschaut V, Vandesompele J, Ng BL, Carter NP, Mortier GR and Speleman F

    Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium.

    We report on a patient with mental retardation, seizures and tall stature with advanced bone age in whom a de novo apparently balanced chromosomal rearrangement 46,XX,t(X;9)(q12;p13.3) was identified. Using array CGH on flow-sorted derivative chromosomes (array painting) and subsequent FISH and qPCR analysis, we mapped and sequenced both breakpoints. The Xq12 breakpoint was located within the gene coding for oligophrenin 1 (OPHN1) whereas the 9p13.3 breakpoint was assigned to a non-coding segment within a gene dense region. Disruption of OPHN1 by the Xq12 breakpoint was considered the major cause of the abnormal phenotype observed in the proband.

    Funded by: Wellcome Trust

    European journal of medical genetics 2007;50;6;446-54

  • Lamin A/C polymorphisms, type 2 diabetes, and the metabolic syndrome: case-control and quantitative trait studies.

    Mesa JL, Loos RJ, Franks PW, Ong KK, Luan J, O'Rahilly S, Wareham NJ and Barroso I

    Medical Research Council Epidemiology Unit, Cambridge, U.K.

    Mutations in the LMNA gene, encoding the nuclear envelope protein lamin A/C, are responsible for a number of distinct disease entities including Dunnigan-type familial partial lipodystrophy. Dunnigan-type lipodystrophy is characterized by loss of subcutaneous adipose tissue, insulin resistance, dyslipidemia, and type 2 diabetes and shares many of the features of the metabolic syndrome. Furthermore, several genome-wide linkage scans for type 2 diabetes have found evidence of linkage at chromosome 1q21.2, the region that harbors the LMNA gene. Therefore, LMNA is a biological and positional candidate for type 2 diabetes susceptibility. Previous studies have reported association between a common LMNA variant (1908C>T; rs4641) and adverse metabolic traits in ethnically diverse populations from Asia and North America. In the present study, we characterized the common variation across the LMNA gene (including rs4641) and tested for association with type 2 diabetes in two large case-control studies (n = 2,052) and with features of the metabolic syndrome in a separate cohort study (n = 1,572). Despite our study being sufficiently powered to detect effects similar and even smaller in magnitude than those previously reported, none of the LMNA single nucleotide polymorphisms were statistically significantly associated with type 2 diabetes or the metabolic syndrome. Thus, it appears unlikely that variation at LMNA substantially increases the risk of type 2 diabetes or related traits in U.K. Europids.

    Funded by: Medical Research Council: MC_U106179471, MC_U106179472, MC_U106188470; Wellcome Trust: 077016

    Diabetes 2007;56;3;884-9

  • Genome of the marsupial Monodelphis domestica reveals innovation in non-coding sequences.

    Mikkelsen TS, Wakefield MJ, Aken B, Amemiya CT, Chang JL, Duke S, Garber M, Gentles AJ, Goodstadt L, Heger A, Jurka J, Kamal M, Mauceli E, Searle SM, Sharpe T, Baker ML, Batzer MA, Benos PV, Belov K, Clamp M, Cook A, Cuff J, Das R, Davidow L, Deakin JE, Fazzari MJ, Glass JL, Grabherr M, Greally JM, Gu W, Hore TA, Huttley GA, Kleber M, Jirtle RL, Koina E, Lee JT, Mahony S, Marra MA, Miller RD, Nicholls RD, Oda M, Papenfuss AT, Parra ZE, Pollock DD, Ray DA, Schein JE, Speed TP, Thompson K, VandeBerg JL, Wade CM, Walker JA, Waters PD, Webber C, Weidman JR, Xie X, Zody MC, Broad Institute Genome Sequencing Platform, Broad Institute Whole Genome Assembly Team, Graves JA, Ponting CP, Breen M, Samollow PB, Lander ES and Lindblad-Toh K

    Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, Massachusetts 02142, USA.

    We report a high-quality draft of the genome sequence of the grey, short-tailed opossum (Monodelphis domestica). As the first metatherian ('marsupial') species to be sequenced, the opossum provides a unique perspective on the organization and evolution of mammalian genomes. Distinctive features of the opossum chromosomes provide support for recent theories about genome evolution and function, including a strong influence of biased gene conversion on nucleotide sequence composition, and a relationship between chromosomal characteristics and X chromosome inactivation. Comparison of opossum and eutherian genomes also reveals a sharp difference in evolutionary innovation between protein-coding and non-coding functional elements. True innovation in protein-coding genes seems to be relatively rare, with lineage-specific differences being largely due to diversification and rapid turnover in gene families involved in environmental interactions. In contrast, about 20% of eutherian conserved non-coding elements (CNEs) are recent inventions that postdate the divergence of Eutheria and Metatheria. A substantial proportion of these eutherian-specific CNEs arose from sequence inserted by transposable elements, pointing to transposons as a major creative force in the evolution of mammalian gene regulation.

    Funded by: Medical Research Council: MC_U137761446; Wellcome Trust: 062023

    Nature 2007;447;7141;167-77

  • Pfam: a domain-centric method for analyzing proteins and proteomes.

    Mistry J and Finn R

    The constant deluge of genome sequencing data means that annotating, classifying, and comparing proteins or proteomes can seem like an endless task. Furthermore, discovering and accessing such data is fundamental to biologists. There are, however, databases that perform these tasks. Pfam, a protein families database, is one such database. In this chapter, the use of the web interface to Pfam and the resources provided (annotation, sequence alignments, phylogenetic trees, profile hidden Markov models [HMMs]) are described. The exploitation of tools for searching sequences against the library of Pfam HMMs, searching for domain combinations, searching by taxonomy, browsing proteomes, and comparing proteomes is outlined in detail.

    Methods in molecular biology (Clifton, N.J.) 2007;396;43-58

  • Predicting active site residue annotations in the Pfam database.

    Mistry J, Bateman A and Finn RD

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    Background: Approximately 5% of Pfam families are enzymatic, but only a small fraction of the sequences within these families (<0.5%) have had the residues responsible for catalysis determined. To increase the active site annotations in the Pfam database, we have developed a strict set of rules, chosen to reduce the rate of false positives, which enable the transfer of experimentally determined active site residue data to other sequences within the same Pfam family.

    Description: We have created a large database of predicted active site residues. On comparing our active site predictions to those found in UniProtKB, Catalytic Site Atlas, PROSITE and MEROPS we find that we make many novel predictions. On investigating the small subset of predictions made by these databases that are not predicted by us, we found these sequences did not meet our strict criteria for prediction. We assessed the sensitivity and specificity of our methodology and estimate that only 3% of our predicted sequences are false positives.

    Conclusion: We have predicted 606110 active site residues, of which 94% are not found in UniProtKB, and have increased the active site annotations in Pfam by more than 200 fold. Although implemented for Pfam, the tool we have developed for transferring the data can be applied to any alignment with associated experimental active site data and is available for download. Our active site predictions are re-calculated at each Pfam release to ensure they are comprehensive and up to date. They provide one of the largest available databases of active site annotation.

    Funded by: Wellcome Trust: 087656

    BMC bioinformatics 2007;8;298

  • Differential var gene expression in the organs of patients dying of falciparum malaria.

    Montgomery J, Mphande FA, Berriman M, Pain A, Rogerson SJ, Taylor TE, Molyneux ME and Craig A

    Malawi-Liverpool-Wellcome Programme of Clinical Tropical Research, College of Medicine, Blantyre, Malawi.

    Sequestration of parasitized erythrocytes in the microcirculation of tissues is thought to be important in the pathogenesis of severe falciparum malaria. A major variant surface antigen, var/Plasmodium falciparum erythrocyte membrane protein 1, expressed on the surface of the infected erythrocyte, mediates cytoadherence to vascular endothelium. To address the question of tissue-specific accumulation of variant types, we used the unique resource generated by the clinicopathological study of fatal paediatric malaria in Blantyre, Malawi, to analyse var gene transcription in patients dying with falciparum malaria. Despite up to 102 different var genes being expressed by P. falciparum populations in a single host, only one to two of these genes were expressed at high levels in the brains and hearts of these patients. These major var types differed between organs. However, identical var types were expressed in the brains of multiple patients from a single malaria season. These results provide the first evidence of organ-specific accumulation of P. falciparum variant types and suggest that parasitized erythrocytes can exhibit preferential binding in the body, supporting the hypothesis of cytoadherence-linked pathogenesis.

    Funded by: NIAID NIH HHS: R01 AI034969-10A1, R01 AI34969; Wellcome Trust: 042390, 071376

    Molecular microbiology 2007;65;4;959-67

  • A survey of genomic properties for the detection of regulatory polymorphisms.

    Montgomery SB, Griffith OL, Schuetz JM, Brooks-Wilson A and Jones SJ

    Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, Canada.

    Advances in the computational identification of functional noncoding polymorphisms will aid in cataloging novel determinants of health and identifying genetic variants that explain human evolution. To date, however, the development and evaluation of such techniques has been limited by the availability of known regulatory polymorphisms. We have attempted to address this by assembling, from the literature, a computationally tractable set of regulatory polymorphisms within the ORegAnno database. We have further used 104 regulatory single-nucleotide polymorphisms from this set and 951 polymorphisms of unknown function, from 2-kb and 152-bp noncoding upstream regions of genes, to investigate the discriminatory potential of 23 properties related to gene regulation and population genetics. Among the most important properties detected in this region are distance to transcription start site, local repetitive content, sequence conservation, minor and derived allele frequencies, and presence of a CpG island. We further used the entire set of properties to evaluate their collective performance in detecting regulatory polymorphisms. Using a 10-fold cross-validation approach, we were able to achieve a sensitivity and specificity of 0.82 and 0.71, respectively, and we show that this performance is strongly influenced by the distance to the transcription start site.

    PLoS computational biology 2007;3;6;e106

  • Clustering of phosphorylation site recognition motifs can be exploited to predict the targets of cyclin-dependent kinase.

    Moses AM, Hériché JK and Durbin R

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1HH, UK.

    Protein kinases are critical to cellular signalling and post-translational gene regulation, but their biological substrates are difficult to identify. We show that cyclin-dependent kinase (CDK) consensus motifs are frequently clustered in CDK substrate proteins. Based on this, we introduce a new computational strategy to predict the targets of CDKs and use it to identify new biologically interesting candidates. Our data suggest that regulatory modules may exist in protein sequence as clusters of short sequence motifs.

    Funded by: Wellcome Trust

    Genome biology 2007;8;2;R23

  • Regulatory evolution in proteins by turnover and lineage-specific changes of cyclin-dependent kinase consensus sites.

    Moses AM, Liku ME, Li JJ and Durbin R

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1HH, United Kingdom.

    Evolutionary change in gene regulation is a key mechanism underlying the genetic component of organismal diversity. Here, we study evolution of regulation at the posttranslational level by examining the evolution of cyclin-dependent kinase (CDK) consensus phosphorylation sites in the protein subunits of the pre-replicative complex (RC). The pre-RC, an assembly of proteins formed during an early stage of DNA replication, is believed to be regulated by CDKs throughout the animals and fungi. Interestingly, although orthologous pre-RC components often contain clusters of CDK consensus sites, the positions and numbers of sites do not seem conserved. By analyzing protein sequences from both distantly and closely related species, we confirm that consensus sites can turn over rapidly even when the local cluster of sites is preserved, consistent with the notion that precise positioning of phosphorylation events is not required for regulation. We also identify evolutionary changes in the clusters of sites and further examine one replication protein, Mcm3, where a cluster of consensus sites near a nucleocytoplasmic transport signal is confined to a specific lineage. We show that the presence or absence of the cluster of sites in different species is associated with differential regulation of the transport signal. These findings suggest that the CDK regulation of MCM nuclear localization was acquired in the lineage leading to Saccharomyces cerevisiae after the divergence with Candida albicans. Our results begin to explore the dynamics of regulatory evolution at the posttranslational level and show interesting similarities to recent observations of regulatory evolution at the level of transcription.

    Funded by: NCI NIH HHS: 5F31CA110268-03; NIGMS NIH HHS: R01 GM59704; Wellcome Trust

    Proceedings of the National Academy of Sciences of the United States of America 2007;104;45;17713-8

  • Critical assessment of methods of protein structure prediction-Round VII.

    Moult J, Fidelis K, Kryshtafovych A, Rost B, Hubbard T and Tramontano A

    Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute, Rockville, Maryland 20850, USA.

    This paper is an introduction to the supplemental issue of the journal PROTEINS, dedicated to the seventh CASP experiment to assess the state of the art in protein structure prediction. The paper describes the conduct of the experiment, the categories of prediction included, and outlines the evaluation and assessment procedures. Highlights are improvements in model accuracy relative to that obtainable from knowledge of a single best template structure; convergence of the accuracy of models produced by automatic servers toward that produced by human modeling teams; the emergence of methods for predicting the quality of models; and rapidly increasing practical applications of the methods.

    Funded by: NIGMS NIH HHS: GM072354; NLM NIH HHS: LM07085; Wellcome Trust: 077198

    Proteins 2007;69 Suppl 8;3-9

  • Mouse Phenotype Database Integration Consortium: integration [corrected] of mouse phenome data resources.

    Mouse Phenotype Database Integration Consortium, Hancock JM, Adams NC, Aidinis V, Blake A, Bogue M, Brown SD, Chesler EJ, Davidson D, Duran C, Eppig JT, Gailus-Durner V, Gates H, Gkoutos GV, Greenaway S, Hrabé de Angelis M, Kollias G, Leblanc S, Lee K, Lengger C, Maier H, Mallon AM, Masuya H, Melvin DG, Müller W, Parkinson H, Proctor G, Reuveni E, Schofield P, Shukla A, Smith C, Toyoda T, Vasseur L, Wakana S, Walling A, White J, Wood J and Zouberakis M

    Understanding the functions encoded in the mouse genome will be central to an understanding of the genetic basis of human disease. To achieve this it will be essential to be able to characterize the phenotypic consequences of variation and alterations in individual genes. Data on the phenotypes of mouse strains are currently held in a number of different forms (detailed descriptions of mouse lines, first-line phenotyping data on novel mutations, data on the normal features of inbred lines) at many sites worldwide. For the most efficient use of these data sets, we have initiated a process to develop standards for the description of phenotypes (using ontologies) and file formats for the description of phenotyping protocols and phenotype data sets. This process is ongoing and needs to be supported by the wider mouse genetics and phenotyping communities to succeed. We invite interested parties to contact us as we develop this process further.

    Funded by: Medical Research Council: MC_U127527203, MC_U142684171, MC_U142684172, MC_U142684175

    Mammalian genome : official journal of the International Mammalian Genome Society 2007;18;3;157-63

  • New developments in the InterPro database.

    Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Buillard V, Cerutti L, Copley R, Courcelle E, Das U, Daugherty L, Dibley M, Finn R, Fleischmann W, Gough J, Haft D, Hulo N, Hunter S, Kahn D, Kanapin A, Kejariwal A, Labarga A, Langendijk-Genevaux PS, Lonsdale D, Lopez R, Letunic I, Madera M, Maslen J, McAnulla C, McDowall J, Mistry J, Mitchell A, Nikolskaya AN, Orchard S, Orengo C, Petryszak R, Selengut JD, Sigrist CJ, Thomas PD, Valentin F, Wilson D, Wu CH and Yeats C

    EMBL Outstation-European Bioinformatics Institute Hinxton, Cambridge, UK.

    InterPro is an integrated resource for protein families, domains and functional sites, which integrates the following protein signature databases: PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, PIRSF, SUPERFAMILY, Gene3D and PANTHER. The latter two new member databases have been integrated since the last publication in this journal. There have been several new developments in InterPro, including an additional reading field, new database links, extensions to the web interface and additional match XML files. InterPro has always provided matches to UniProtKB proteins on the website and in the match XML file on the FTP site. Additional matches to proteins in UniParc (UniProt archive) are now available for download in the new match XML files only. The latest InterPro release (13.0) contains more than 13 000 entries, covering over 78% of all proteins in UniProtKB. The database is available for text- and sequence-based searches via a webserver, and for download by anonymous FTP. The InterProScan search tool is now also available via a web service.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/F010435/1; Wellcome Trust: 087656

    Nucleic acids research 2007;35;Database issue;D224-8

  • Localization of type 1 diabetes susceptibility to the MHC class I genes HLA-B and HLA-A.

    Nejentsev S, Howson JM, Walker NM, Szeszko J, Field SF, Stevens HE, Reynolds P, Hardy M, King E, Masters J, Hulme J, Maier LM, Smyth D, Bailey R, Cooper JD, Ribas G, Campbell RD, Clayton DG, Todd JA and Wellcome Trust Case Control Consortium

    Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, Cambridge Institute of Medical Research, University of Cambridge, CB2 0XY, UK.

    The major histocompatibility complex (MHC) on chromosome 6 is associated with susceptibility to more common diseases than any other region of the human genome, including almost all disorders classified as autoimmune. In type 1 diabetes the major genetic susceptibility determinants have been mapped to the MHC class II genes HLA-DQB1 and HLA-DRB1 (refs 1-3), but these genes cannot completely explain the association between type 1 diabetes and the MHC region. Owing to the region's extreme gene density, the multiplicity of disease-associated alleles, strong associations between alleles, limited genotyping capability, and inadequate statistical approaches and sample sizes, which, and how many, loci within the MHC determine susceptibility remains unclear. Here, in several large type 1 diabetes data sets, we analyse a combined total of 1,729 polymorphisms, and apply statistical methods (recursive partitioning and regression) to pinpoint disease susceptibility to the MHC class I genes HLA-B and HLA-A (risk ratios >1.5; P(combined) = 2.01 × 10^-19 and 2.35 × 10^-13, respectively) in addition to the established associations of the MHC class II genes. Other loci with smaller and/or rarer effects might also be involved, but to find these, future searches must take into account both the HLA class II and class I genes and use even larger samples. Taken together with previous studies, we conclude that MHC-class-I-mediated events, principally involving HLA-B*39, contribute to the aetiology of type 1 diabetes.

    Funded by: Medical Research Council: G0000934, G0600681; Wellcome Trust: 076113

    Nature 2007;450;7171;887-92

  • Sequencing and association analysis of the type 1 diabetes-linked region on chromosome 10p12-q11.

    Nejentsev S, Smink LJ, Smyth D, Bailey R, Lowe CE, Payne F, Masters J, Godfrey L, Lam A, Burren O, Stevens H, Nutland S, Walker NM, Smith A, Twells R, Barratt BJ, Wright C, French L, Chen Y, Deloukas P, Rogers J, Dunham I and Todd JA

    Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, UK.

    Background: In an effort to locate susceptibility genes for type 1 diabetes (T1D) several genome-wide linkage scans have been undertaken. A chromosomal region designated IDDM10 retained genome-wide significance in a combined analysis of the main linkage scans. Here, we studied sequence polymorphisms in 23 Mb on chromosome 10p12-q11, including the putative IDDM10 region, to identify genes associated with T1D.

    Results: Initially, we resequenced the functional candidate genes, CREM and SDF1, located in this region, genotyped 13 tag single nucleotide polymorphisms (SNPs) and found no association with T1D. We then undertook analysis of the whole 23 Mb region. We constructed and sequenced a contig tile path from two bacterial artificial clone libraries. By comparison with a clone library from an unrelated person used in the Human Genome Project, we identified 12,058 SNPs. We genotyped 303 SNPs and 25 polymorphic microsatellite markers in 765 multiplex T1D families and followed up 22 associated polymorphisms in up to 2,857 families. We found nominal evidence of association in six loci (P = 0.05 - 0.0026), located near the PAPD1 gene. Therefore, we resequenced 38.8 kb in this region, found 147 SNPs and genotyped 84 of them in the T1D families. We also tested 13 polymorphisms in the PAPD1 gene and in five other loci in 1,612 T1D patients and 1,828 controls from the UK. Overall, only the D10S193 microsatellite marker located 28 kb downstream of PAPD1 showed nominal evidence of association in both T1D families and in the case-control sample (P = 0.037 and 0.03, respectively).

    Conclusion: We conclude that polymorphisms in the CREM and SDF1 genes have no major effect on T1D. The weak T1D association that we detected in the association scan near the PAPD1 gene may be either false or due to a small genuine effect, and cannot explain linkage at the IDDM10 region.

    Funded by: Wellcome Trust: 068545/Z/02

    BMC genetics 2007;8;24

  • Modeling insertional mutagenesis using gene length and expression in murine embryonic stem cells.

    Nord AS, Vranizan K, Tingley W, Zambon AC, Hanspers K, Fong LG, Hu Y, Bacchetti P, Ferrin TE, Babbitt PC, Doniger SW, Skarnes WC, Young SG and Conklin BR

    Department of Medicine, MacDonald Medical Research Laboratories, University of California at Los Angeles, California, USA.

    Background: High-throughput mutagenesis of the mammalian genome is a powerful means to facilitate analysis of gene function. Gene trapping in embryonic stem cells (ESCs) is the most widely used form of insertional mutagenesis in mammals. However, the rules governing its efficiency are not fully understood, and the effects of vector design on the likelihood of gene-trapping events have not been tested on a genome-wide scale.

    In this study, we used public gene-trap data to model gene-trap likelihood. Using the association of gene length and gene expression with gene-trap likelihood, we constructed spline-based regression models that characterize which genes are susceptible and which genes are resistant to gene-trapping techniques. We report results for three classes of gene-trap vectors, showing that both length and expression are significant determinants of trap likelihood for all vectors. Using our models, we also quantitatively identified hotspots of gene-trap activity, which represent loci where the high likelihood of vector insertion is controlled by factors other than length and expression. These formalized statistical models describe a high proportion of the variance in the likelihood of a gene being trapped by expression-dependent vectors and a lower, but still significant, proportion of the variance for vectors that are predicted to be independent of endogenous gene expression.

    The findings of significant expression and length effects reported here further the understanding of the determinants of vector insertion. Results from this analysis can be applied to help identify other important determinants of this important biological phenomenon and could assist planning of large-scale mutagenesis efforts.

    Funded by: NHGRI NIH HHS: HG002766; NHLBI NIH HHS: HL66621

    PloS one 2007;2;7;e617

  • A second large plasmid encodes conjugative transfer and antimicrobial resistance in O119:H2 and some typical O111 enteropathogenic Escherichia coli strains.

    Nwaneshiudu AI, Mucci T, Pickard DJ and Okeke IN

    Department of Biology, Haverford College, 370 Lancaster Avenue, Haverford, PA 19041, USA.

    A novel and functional conjugative transfer system identified in O119:H2 enteropathogenic Escherichia coli (EPEC) strain MB80 by subtractive hybridization is encoded on a large multidrug resistance plasmid, distinct from the well-described EPEC adherence factor (EAF) plasmid. Variants of the MB80 conjugative resistance plasmid were identified in other EPEC strains, including the prototypical O111:NM strain B171, from which the EAF plasmid has been sequenced. This separate large plasmid and the selective advantage that it confers in the antibiotic era have been overlooked because it comigrates with the virulence plasmid on conventional gels.

    Journal of bacteriology 2007;189;16;6074-9

  • A Slicer-independent role for Argonaute 2 in hematopoiesis and the microRNA pathway.

    O'Carroll D, Mecklenbrauker I, Das PP, Santana A, Koenig U, Enright AJ, Miska EA and Tarakhovsky A

    The Laboratory for Lymphocyte Signaling, The Rockefeller University, New York, New York 10021, USA.

    Binding of microRNA (miRNA) to mRNA within the RNA-induced silencing complex (RISC) leads to either translational inhibition or to destruction of the target mRNA. Both of these functions are executed by Argonaute 2 (Ago2). Using hematopoiesis in mice as a model system to study the physiological function of Ago2 in vivo, we found that Ago2 controls early development of lymphoid and erythroid cells. We show that the unique and defining feature of Ago2, the Slicer endonuclease activity, is dispensable for hematopoiesis. Instead, we identified Ago2 as a key regulator of miRNA homeostasis. Deficiency in Ago2 impairs miRNA biogenesis from precursor-miRNAs followed by a reduction in miRNA expression levels. Collectively, our data identify Ago2 as a highly specialized member of the Argonaute family with an essential nonredundant Slicer-independent function within the mammalian miRNA pathway.

    Genes & development 2007;21;16;1999-2004

  • Identification of secretory granule phosphatidylinositol 4,5-bisphosphate-interacting proteins using an affinity pulldown strategy.

    Osborne SL, Wallis TP, Jimenez JL, Gorman JJ and Meunier FA

    Molecular Dynamics of Synaptic Function Laboratory, School of Biomedical Sciences, University of Queensland, St. Lucia, Queensland 4072, Australia.

    Phosphatidylinositol 4,5-bisphosphate (PtdIns(4,5)P2) synthesis is required for calcium-dependent exocytosis in neurosecretory cells. We developed a PtdIns(4,5)P2 bead pulldown strategy combined with subcellular fractionation to identify endogenous chromaffin granule proteins that interact with PtdIns(4,5)P2. We identified two synaptotagmin isoforms, synaptotagmins 1 and 7; spectrin; alpha-adaptin; and synaptotagmin-like protein 4 (granuphilin) by mass spectrometry and Western blotting. The interaction between synaptotagmin 7 and PtdIns(4,5)P2 and its functional relevance was investigated. The 45-kDa isoform of synaptotagmin 7 was found to be highly expressed in adrenal chromaffin cells compared with PC12 cells and to mainly localize to secretory granules by subcellular fractionation, immunoisolation, and immunocytochemistry. We demonstrated that synaptotagmin 7 binds PtdIns(4,5)P2 via the C2B domain in the absence of calcium and via both the C2A and C2B domains in the presence of calcium. We mutated the polylysine stretch in synaptotagmin 7 C2B and demonstrated that this mutant domain lacks the calcium-independent PtdIns(4,5)P2 binding. Synaptotagmin 7 C2B domain inhibited catecholamine release from digitonin-permeabilized chromaffin cells, and this inhibition was abrogated with the C2B polylysine mutant. These data indicate that synaptotagmin 7 C2B-effector interactions, which occur via the polylysine stretch, including calcium-independent PtdIns(4,5)P2 binding, are important for chromaffin granule exocytosis.

    Molecular & cellular proteomics : MCP 2007;6;7;1158-69

  • Common variation in the LMNA gene (encoding lamin A/C) and type 2 diabetes: association analyses in 9,518 subjects.

    Owen KR, Groves CJ, Hanson RL, Knowler WC, Shuldiner AR, Elbein SC, Mitchell BD, Froguel P, Ng MC, Chan JC, Jia W, Deloukas P, Hitman GA, Walker M, Frayling TM, Hattersley AT, Zeggini E and McCarthy MI

    Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Churchill Hospital, Old Road, Headington, Oxford OX3 7LJ, U.K.

    Mutations in the LMNA gene (encoding lamin A/C) underlie familial partial lipodystrophy, a syndrome of monogenic insulin resistance and diabetes. LMNA maps to the well-replicated diabetes-linkage region on chromosome 1q, and there are reported associations between LMNA single nucleotide polymorphisms (SNPs) (particularly rs4641; H566H) and metabolic syndrome components. We examined the relationship between LMNA variation and type 2 diabetes (using six tag SNPs capturing >90% of common variation) in several large datasets. Analysis of 2,490 U.K. diabetic case and 2,556 control subjects revealed no significant associations at either genotype or haplotype level: the minor allele at rs4641 was no more frequent in case subjects (allelic odds ratio [OR] 1.07 [95% CI 0.98-1.17], P = 0.15). In 390 U.K. trios, family-based association analyses revealed nominally significant overtransmission of the major allele at rs12063564 (P = 0.01), which was not corroborated in other samples. Finally, genotypes for 2,817 additional subjects from the International 1q Consortium revealed no consistent case-control or family-based associations with LMNA variants. Across all our data, the OR for the rs4641 minor allele approached but did not attain significance (1.07 [0.99-1.15], P = 0.08). Our data do not therefore support a major effect of LMNA variation on diabetes risk. However, in a meta-analysis including other available data, there is evidence that rs4641 has a modest effect on diabetes susceptibility (1.10 [1.04-1.16], P = 0.001).

    Funded by: NIA NIH HHS: T32 AG 00219; NIDDK NIH HHS: K24 DK 2673, R01 DK 073490, R01 DK 39311, R01 DK 54261, U01 DK 58026; Wellcome Trust: 076113, 079557

    Diabetes 2007;56;3;879-83

  • Chromosome painting among Proboscidea, Hyracoidea and Sirenia: support for Paenungulata (Afrotheria, Mammalia) but not Tethytheria.

    Pardini AT, O'Brien PC, Fu B, Bonde RK, Elder FF, Ferguson-Smith MA, Yang F and Robinson TJ

    Evolutionary Genomics Group, Department of Botany and Zoology, University of Stellenbosch, Private Bag X1, Matieland, 7602 Stellenbosch, South Africa.

    Despite marked improvements in the interpretation of systematic relationships within Eutheria, particular nodes, including Paenungulata (Hyracoidea, Sirenia and Proboscidea), remain ambiguous. The combination of a rapid radiation, a deep divergence and an extensive morphological diversification has resulted in a limited phylogenetic signal confounding resolution within this clade both at the morphological and nucleotide levels. Cross-species chromosome painting was used to delineate regions of homology between Loxodonta africana (2n=56), Procavia capensis (2n=54), Trichechus manatus latirostris (2n=48) and an outgroup taxon, the aardvark (Orycteropus afer, 2n=20). Changes specific to each lineage were identified and although the presence of a minimum of 11 synapomorphies confirmed the monophyly of Paenungulata, no change characterizing intrapaenungulate relationships was evident. The reconstruction of an ancestral paenungulate karyotype and the estimation of rates of chromosomal evolution indicate a reduced rate of genomic repatterning following the paenungulate radiation. In comparison to data available for other mammalian taxa, the paenungulate rate of chromosomal evolution is slow to moderate. As a consequence, the absence of a chromosomal character uniting two paenungulates (at the level of resolution characterized in this study) may be due to a reduced rate of chromosomal change relative to the length of time separating successive divergence events.

    Funded by: Wellcome Trust

    Proceedings. Biological sciences / The Royal Society 2007;274;1615;1333-40

  • Sequence variants in the autophagy gene IRGM and multiple other replicating loci contribute to Crohn's disease susceptibility.

    Parkes M, Barrett JC, Prescott NJ, Tremelling M, Anderson CA, Fisher SA, Roberts RG, Nimmo ER, Cummings FR, Soars D, Drummond H, Lees CW, Khawaja SA, Bagnall R, Burke DA, Todhunter CE, Ahmad T, Onnie CM, McArdle W, Strachan D, Bethel G, Bryan C, Lewis CM, Deloukas P, Forbes A, Sanderson J, Jewell DP, Satsangi J, Mansfield JC, Wellcome Trust Case Control Consortium, Cardon L and Mathew CG

    Inflammatory Bowel Disease Research Group, Addenbrooke's Hospital, University of Cambridge, Cambridge CB2 2QQ, UK.

    A genome-wide association scan in individuals with Crohn's disease by the Wellcome Trust Case Control Consortium detected strong association at four novel loci. We tested 37 SNPs from these and other loci for association in an independent case-control sample. We obtained replication for the autophagy-inducing IRGM gene on chromosome 5q33.1 (replication P = 6.6 x 10(-4), combined P = 2.1 x 10(-10)) and for nine other loci, including NKX2-3, PTPN2 and gene deserts on chromosomes 1q and 5p13.

    Funded by: Medical Research Council: G0000934; Wellcome Trust: 068545/Z/02, 072029

    Nature genetics 2007;39;7;830-2

  • Randomized controlled comparison of ofloxacin, azithromycin, and an ofloxacin-azithromycin combination for treatment of multidrug-resistant and nalidixic acid-resistant typhoid fever.

    Parry CM, Ho VA, Phuong le T, Bay PV, Lanh MN, Tung le T, Tham NT, Wain J, Hien TT and Farrar JJ

    Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam.

    Isolates of Salmonella enterica serovar Typhi that are multidrug resistant (MDR, resistant to chloramphenicol, ampicillin, and trimethoprim-sulfamethoxazole) and have reduced susceptibility to fluoroquinolones (nalidixic acid resistant, Na(r)) are common in Asia. The optimum treatment for infections caused by such isolates is not established. This study compared different antimicrobial regimens for the treatment of MDR/Na(r) typhoid fever. Vietnamese children and adults with uncomplicated typhoid fever were entered into an open randomized controlled trial. Ofloxacin (20 mg/kg of body weight/day for 7 days), azithromycin (10 mg/kg/day for 7 days), and ofloxacin (15 mg/kg/day for 7 days) combined with azithromycin (10 mg/kg/day for the first 3 days) were compared. Of the 241 enrolled patients, 187 were eligible for analysis (186 S. enterica serovar Typhi, 1 Salmonella enterica serovar Paratyphi A). Eighty-seven percent (163/187) of the patients were children; of the S. enterica serovar Typhi isolates, 88% (165/187) were MDR and 93% (173/187) were Na(r). The clinical cure rate was 64% (40/63) with ofloxacin, 76% (47/62) with ofloxacin-azithromycin, and 82% (51/62) with azithromycin (P = 0.053). The mean (95% confidence interval [CI]) fever clearance time for patients treated with azithromycin (5.8 days [5.1 to 6.5 days]) was shorter than that for patients treated with ofloxacin-azithromycin (7.1 days [6.2 to 8.1 days]) and ofloxacin (8.2 days [7.2 to 9.2 days]) (P < 0.001). Positive fecal carriage immediately posttreatment was detected in 19.4% (12/62) of patients treated with ofloxacin, 6.5% (4/62) of those treated with the combination, and 1.6% (1/62) of those treated with azithromycin (P = 0.006). Both antibiotics were well tolerated. Uncomplicated typhoid fever due to isolates of MDR S. enterica serovar Typhi with reduced susceptibility to fluoroquinolones (Na(r)) can be successfully treated with a 7-day course of azithromycin.

    Funded by: Wellcome Trust

    Antimicrobial agents and chemotherapy 2007;51;3;819-25

  • Interaction analysis of the CBLB and CTLA4 genes in type 1 diabetes.

    Payne F, Cooper JD, Walker NM, Lam AC, Smink LJ, Nutland S, Stevens HE, Hutchings J and Todd JA

    Juvenile Diabetes Research Foundation/Wellcome Trust, Diabetes and Inflammation Laboratory, Cambridge Institute for Medical Research, University of Cambridge, Wellcome Trust/MRC Building, Addenbrooke's Hospital, Cambridge, UK.

    Gene-gene interaction analyses have been suggested as a potential strategy to help identify common disease susceptibility genes. Recently, evidence of a statistical interaction between polymorphisms in two negative immunoregulatory genes, CBLB and CTLA4, has been reported in type 1 diabetes (T1D). This study, in 480 Danish families, reported an association between T1D and a synonymous coding SNP in exon 12 of the CBLB gene (rs3772534 G>A; minor allele frequency, MAF=0.24; derived relative risk, RR for G allele=1.78; P=0.046). Furthermore, evidence of a statistical interaction with the known T1D susceptibility-associated CTLA4 polymorphism rs3087243 (laboratory name CT60, G>A) was reported (P<0.0001), such that the CBLB SNP rs3772534 G allele was overtransmitted to offspring with the CTLA4 rs3087243 G/G genotype. We have, therefore, attempted to obtain additional support for this finding in both large family and case-control collections. In a primary analysis, no evidence for an association of the CBLB SNP rs3772534 with disease was found in either sample set (2162 parent-child trios, P=0.33; 3453 cases and 3655 controls, P=0.69). In the case-only statistical interaction analysis between rs3772534 and rs3087243, there was also no support for an effect (1994 T1D affected offspring, and 3215 cases, P=0.92). These data highlight the need for large, well-characterized populations, offering the possibility of obtaining additional support for initial observations owing to the low prior probability of identifying reproducible evidence of gene-gene interactions in the analysis of common disease-associated variants in human populations.

    Funded by: Medical Research Council: G0000934; Wellcome Trust

    Journal of leukocyte biology 2007;81;3;581-3

  • The practical implications of comparative kinetoplastid genomics.

    Peacock CS

    Wellcome Trust Sanger Institute, UK.

    SEB experimental biology series 2007;58;25-45

  • Comparative genomic analysis of three Leishmania species that cause diverse human disease.

    Peacock CS, Seeger K, Harris D, Murphy L, Ruiz JC, Quail MA, Peters N, Adlem E, Tivey A, Aslett M, Kerhornou A, Ivens A, Fraser A, Rajandream MA, Carver T, Norbertczak H, Chillingworth T, Hance Z, Jagels K, Moule S, Ormond D, Rutter S, Squares R, Whitehead S, Rabbinowitsch E, Arrowsmith C, White B, Thurston S, Bringaud F, Baldauf SL, Faulconbridge A, Jeffares D, Depledge DP, Oyola SO, Hilley JD, Brito LO, Tosi LR, Barrell B, Cruz AK, Mottram JC, Smith DF and Berriman M

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.

    Leishmania parasites cause a broad spectrum of clinical disease. Here we report the sequencing of the genomes of two species of Leishmania: Leishmania infantum and Leishmania braziliensis. The comparison of these sequences with the published genome of Leishmania major reveals marked conservation of synteny and identifies only approximately 200 genes with a differential distribution between the three species. L. braziliensis, contrary to Leishmania species examined so far, possesses components of a putative RNA-mediated interference pathway, telomere-associated transposable elements and spliced leader-associated SLACS retrotransposons. We show that pseudogene formation and gene loss are the principal forces shaping the different genomes. Genes that are differentially distributed between the species encode proteins implicated in host-pathogen interactions and parasite survival in the macrophage.

    Funded by: Medical Research Council: G0000508; Wellcome Trust: 076355, 085775

    Nature genetics 2007;39;7;839-47

  • Angiotensin II increases chloride absorption in the cortical collecting duct in mice through a pendrin-dependent mechanism.

    Pech V, Kim YH, Weinstein AM, Everett LA, Pham TD and Wall SM

    Renal Division, Department of Medicine, Emory University School of Medicine, Atlanta, Georgia 30322, USA.

    Pendrin (Slc26a4) localizes to type B and non-A, non-B intercalated cells in the distal convoluted tubule, the connecting tubule, and the cortical collecting duct (CCD), where it mediates apical Cl(-)/HCO(3)(-) exchange. The purpose of this study was to determine whether angiotensin II increases transepithelial net chloride transport, J(Cl), in mouse CCD through a pendrin-dependent mechanism. J(Cl) and transepithelial voltage, V(T), were measured in CCDs perfused in vitro from wild-type and Slc26a4 null mice ingesting a NaCl-replete diet or a NaCl-replete diet and furosemide. In CCDs from wild-type mice ingesting a NaCl-replete diet, V(T) and J(Cl) were not different from zero either in the presence or absence of angiotensin II (10(-8) M) in the bath. Thus further experiments employed mice given the high-NaCl diet and furosemide to upregulate renal pendrin expression. CCDs from furosemide-treated wild-type mice had a lumen-negative V(T) and absorbed Cl(-). With angiotensin II in the bath, Cl(-) absorption doubled although V(T) did not become more lumen negative. In contrast, in CCDs from furosemide-treated Slc26a4 null mice, Cl(-) secretion and a V(T) of approximately 0 were observed, neither of which changed with angiotensin II application. Inhibiting ENaC with benzamil abolished V(T) although J(Cl) fell only approximately 50%. Thus substantial Cl(-) absorption is observed in the absence of an electromotive force. Attenuating apical anion exchange with the peritubular application of the H(+)-ATPase inhibitor bafilomycin abolished benzamil-insensitive Cl(-) absorption. In conclusion, angiotensin II increases transcellular Cl(-) absorption in the CCD through a pendrin- and H(+)-ATPase-dependent process.

    Funded by: NIDDK NIH HHS: DK-52935

    American journal of physiology. Renal physiology 2007;292;3;F914-20

  • Environmental and genetic modifiers of squint penetrance during zebrafish embryogenesis.

    Pei W, Williams PH, Clark MD, Stemple DL and Feldman B

    Medical Genetics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA.

    The Nodal-related subgroup of the TGFbeta superfamily of secreted cytokines regulates the specification of the mesodermal and endodermal germ layers during gastrulation. Two Nodal-related proteins - Squint (Sqt) and Cyclops (Cyc) - are expressed during germ-layer specification in zebrafish. Genetic sqt mutant phenotypes have defined a variable requirement for zygotic Sqt, but not for maternal Sqt, in midline mesendoderm development. However, a comparison of phenotypes arising from oocytes or zygotes injected with Sqt antisense morpholinos has suggested a novel requirement for maternal Sqt in dorsal specification. In this study we examined maternal-zygotic mutants for each of two sqt alleles and we also compared phenotypes of closely related zygotic and maternal-zygotic sqt mutants. Each of these approaches indicated there is no general requirement for maternal Sqt. To better understand the dispensability of maternal and zygotic Sqt, we sought out developmental contexts that more rigorously demand intact Sqt signalling. We found that sqt penetrance is influenced by genetic modifiers, by environmental temperature, by levels of residual Activin-like activity and by Heat-Shock Protein 90 (HSP90) activity. Therefore, Sqt may confer an evolutionary advantage by protecting early-stage embryos against detrimental interacting alleles and environmental challenges.

    Funded by: NHGRI NIH HHS: Z01 HG200309-05, Z99 HG999999; Wellcome Trust

    Developmental biology 2007;308;2;368-78

  • Diet and the evolution of human amylase gene copy number variation.

    Perry GH, Dominy NJ, Claw KG, Lee AS, Fiegler H, Redon R, Werner J, Villanea FA, Mountain JL, Misra R, Carter NP, Lee C and Stone AC

    School of Human Evolution and Social Change, Arizona State University, Tempe, Arizona 85287, USA.

    Starch consumption is a prominent characteristic of agricultural societies and hunter-gatherers in arid environments. In contrast, rainforest and circum-arctic hunter-gatherers and some pastoralists consume much less starch. This behavioral variation raises the possibility that different selective pressures have acted on amylase, the enzyme responsible for starch hydrolysis. We found that copy number of the salivary amylase gene (AMY1) is correlated positively with salivary amylase protein level and that individuals from populations with high-starch diets have, on average, more AMY1 copies than those with traditionally low-starch diets. Comparisons with other loci in a subset of these populations suggest that the extent of AMY1 copy number differentiation is highly unusual. This example of positive selection on a copy number-variable gene is, to our knowledge, one of the first discovered in the human genome. Higher AMY1 copy numbers and protein levels probably improve the digestion of starchy foods and may buffer against the fitness-reducing effects of intestinal disease.

    Funded by: NCRR NIH HHS: C06 RR014491-01, C06 RR016483-01, RR014491, RR015087, RR016483, U42 RR015087-01; Wellcome Trust

    Nature genetics 2007;39;10;1256-60

  • Noncanonical cell death pathways act during Drosophila oogenesis.

    Peterson JS, Bass BP, Jue D, Rodriguez A, Abrams JM and McCall K

    Department of Biology, Boston University, Boston, Massachusetts 02115, USA.

    Programmed cell death (PCD) is a highly conserved process that occurs during development and in response to adverse conditions. In Drosophila, most PCDs require the genes within the H99 deficiency, the adaptor molecule Ark, and caspases. Here we investigate 10 cell death genes for their potential roles in two distinct types of PCD that occur in oogenesis: developmental nurse cell PCD and starvation-induced PCD. Most of the genes investigated were found to have little effect on late stage developmental PCD in oogenesis, although ark mutants showed a partial inhibition. Mid-stage starvation-induced germline PCD was found to be independent of the upstream activators and ark although it requires caspases, suggesting an apoptosome-independent mechanism of caspase activation in mid-oogenesis. These results indicate that novel pathways must control PCD in the ovary.

    Funded by: NIGMS NIH HHS: R01 GM060574-05, R01 GM072124, R01 GM60574

    Genesis (New York, N.Y. : 2000) 2007;45;6;396-404

  • A generalized transducing phage for the murine pathogen Citrobacter rodentium.

    Petty NK, Toribio AL, Goulding D, Foulds I, Thomson N, Dougan G and Salmond GP

    Department of Biochemistry, University of Cambridge, Cambridge CB2 1QW, UK.

    A virulent phage (phiCR1) capable of generalized transduction in Citrobacter rodentium was isolated from the environment and characterized. C. rodentium is a natural pathogen of mice, causing transmissible murine colonic hyperplasia. Its genome sequence has recently been completed and will soon be fully annotated and published. C. rodentium is an important model organism for infections caused by the human pathogens enteropathogenic and enterohaemorrhagic Escherichia coli (EPEC and EHEC). phiCR1 uses a lipopolysaccharide receptor, has a genome size of approximately 300 kb, and is able to transduce a variety of markers. phiCR1 is the first reported transducing phage for C. rodentium and will be a useful tool for functional genomic analysis of this important natural murine pathogen.

    Funded by: Wellcome Trust: 076962

    Microbiology (Reading, England) 2007;153;Pt 9;2984-8

  • The SCL transcriptional network and BMP signaling pathway interact to regulate RUNX1 activity.

    Pimanda JE, Donaldson IJ, de Bruijn MF, Kinston S, Knezevic K, Huckle L, Piltz S, Landry JR, Green AR, Tannahill D and Göttgens B

    Cambridge Institute for Medical Research, University of Cambridge, Cambridge CB2 2XY, United Kingdom.

    Hematopoietic stem cell (HSC) development is regulated by several signaling pathways and a number of key transcription factors, which include Scl/Tal1, Runx1, and members of the Smad family. However, it remains unclear how these various determinants interact. Using a genome-wide computational screen based on the well characterized Scl +19 HSC enhancer, we have identified a related Smad6 enhancer that also targets expression to blood and endothelial cells in transgenic mice. Smad6, Bmp4, and Runx1 transcripts are concentrated along the ventral aspect of the E10.5 dorsal aorta in the aorta-gonad-mesonephros region from which HSCs originate. Moreover, Smad6, an inhibitor of Bmp4 signaling, binds and inhibits Runx1 activity, whereas Smad1, a positive mediator of Bmp4 signaling, transactivates the Runx1 promoter. Taken together, our results integrate three key determinants of HSC development: the Scl transcriptional network, Runx1 activity, and the Bmp4/Smad signaling pathway.

    Funded by: Wellcome Trust

    Proceedings of the National Academy of Sciences of the United States of America 2007;104;3;840-5

  • Gata2, Fli1, and Scl form a recursively wired gene-regulatory circuit during early hematopoietic development.

    Pimanda JE, Ottersbach K, Knezevic K, Kinston S, Chan WY, Wilson NK, Landry JR, Wood AD, Kolb-Kokocinski A, Green AR, Tannahill D, Lacaud G, Kouskoff V and Göttgens B

    Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Cambridge CB2 0XY, United Kingdom.

    Conservation of the vertebrate body plan has been attributed to the evolutionary stability of gene-regulatory networks (GRNs). We describe a regulatory circuit made up of Gata2, Fli1, and Scl/Tal1 and their enhancers, Gata2-3, Fli1+12, and Scl+19, that operates during specification of hematopoiesis in the mouse embryo. We show that the Fli1+12 enhancer, like the Gata2-3 and Scl+19 enhancers, targets hematopoietic stem cells (HSCs) and relies on a combination of Ets, Gata, and E-Box motifs. We show that the Gata2-3 enhancer also uses a similar cluster of motifs and that Gata2, Fli1, and Scl are expressed in embryonic day-11.5 dorsal aorta where HSCs originate and in fetal liver where they multiply. The three HSC enhancers in these tissues and in ES cell-derived hemangioblast equivalents are bound by each of these transcription factors (TFs) and form a fully connected triad that constitutes a previously undescribed example of both this network motif in mammalian development and a GRN kernel operating during the specification of a mammalian stem cell.

    Funded by: Wellcome Trust

    Proceedings of the National Academy of Sciences of the United States of America 2007;104;45;17692-7

  • Frequent activating FGFR2 mutations in endometrial carcinomas parallel germline mutations associated with craniosynostosis and skeletal dysplasia syndromes.

    Pollock PM, Gartside MG, Dejeza LC, Powell MA, Mallon MA, Davies H, Mohammadi M, Futreal PA, Stratton MR, Trent JM and Goodfellow PJ

    Cancer and Cell Biology Division, Translational Genomics Research Institute, Phoenix, AZ, USA.

    Endometrial carcinoma is the most common gynecological malignancy in the United States. Although most women present with early disease confined to the uterus, the majority of persistent or recurrent tumors are refractory to current chemotherapies. We have identified a total of 11 different FGFR2 mutations in 3/10 (30%) of endometrial cell lines and 19/187 (10%) of primary uterine tumors. Mutations were seen primarily in tumors of the endometrioid histologic subtype (18/115 cases investigated, 16%). The majority of the somatic mutations identified were identical to germline activating mutations in FGFR2 and FGFR3 that cause Apert Syndrome, Beare-Stevenson Syndrome, hypochondroplasia, achondroplasia and SADDAN syndrome. The two most common somatic mutations identified were S252W (in eight tumors) and N550K (in five samples). Four novel mutations were identified, three of which are also likely to result in receptor gain-of-function. Extensive functional analyses have already been performed on many of these mutations, demonstrating they result in receptor activation through a variety of mechanisms. The discovery of activating FGFR2 mutations in endometrial carcinoma raises the possibility of employing anti-FGFR molecularly targeted therapies in patients with advanced or recurrent endometrial carcinoma.

    Funded by: NCI NIH HHS: CA091842, R01 CA109544, R01 CA71754; Wellcome Trust: 077012

    Oncogene 2007;26;50;7158-62

  • F0 generation mice fully derived from gene-targeted embryonic stem cells allowing immediate phenotypic analyses.

    Poueymirou WT, Auerbach W, Frendewey D, Hickey JF, Escaravage JM, Esau L, Doré AT, Stevens S, Adams NC, Dominguez MG, Gale NW, Yancopoulos GD, DeChiara TM and Valenzuela DM

    Regeneron Pharmaceuticals, Inc., Tarrytown, New York 10591, USA.

    A useful approach for exploring gene function involves generating mutant mice from genetically modified embryonic stem (ES) cells. Recent advances in genetic engineering of ES cells have shifted the bottleneck in this process to the generation of mice. Conventional injections of ES cells into blastocyst hosts produce F0 generation chimeras that are only partially derived from ES cells, requiring additional breeding to obtain mutant mice that can be phenotyped. The tetraploid complementation approach directly yields mice that are almost entirely derived from ES cells, but it is inefficient, works only with certain hybrid ES cell lines and suffers from nonspecific lethality and abnormalities, complicating phenotypic analyses. Here we show that laser-assisted injection of either inbred or hybrid ES cells into eight-cell-stage embryos efficiently yields F0 generation mice that are fully ES cell-derived and healthy, exhibit 100% germline transmission and allow immediate phenotypic analysis, greatly accelerating gene function assignment.

    Nature biotechnology 2007;25;1;91-9

  • Speciation in the Genus Bordetella as deduced from comparative genome analyses

    Preston A

    Bordetella: Molecular Microbiology. 2007;1-16

  • Integrating sequence and structural biology with DAS.

    Prlić A, Down TA, Kulesha E, Finn RD, Kähäri A and Hubbard TJ

    The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Background: The Distributed Annotation System (DAS) is a network protocol for exchanging biological data. It is frequently used to share annotations of genomes and protein sequences.

    Results: Here we present several extensions to the current DAS 1.5 protocol. These provide new commands to share alignments and three-dimensional molecular structure data, add the possibility of registering and discovering DAS servers, and provide a convention for supplying different types of data plots. We present examples of web sites and applications that use the new extensions. We operate a public registry of DAS sources, which now includes entries for more than 250 distinct sources.

    Conclusion: Our DAS extensions are essential for the management of the growing number of services and exchange of diverse biological data sets. In addition, the extensions allow new types of applications to be developed and scientific questions to be addressed. The registry of DAS sources is available at

    Funded by: Wellcome Trust: 062023, 077198

    BMC bioinformatics 2007;8;333
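
    As an illustration of how a client addresses a DAS server, the sketch below builds request URLs of the general form server/das/source/command used by the DAS 1.5 protocol. The server address and source name are hypothetical placeholders, not entries from the registry described above, and the helper name das_request_url is an assumption made for this example. No network request is made.

```python
from urllib.parse import urlencode


def das_request_url(server, source, command, **params):
    """Build a DAS-style request URL: <server>/das/<source>/<command>?<args>.

    `server` and `source` are supplied by the caller; the values used below
    are hypothetical and exist only for illustration.
    """
    base = f"{server.rstrip('/')}/das/{source}/{command}"
    if not params:
        return base
    # Keep ':' and ',' readable, since segment ranges use them.
    return f"{base}?{urlencode(params, safe=':,')}"


# Hypothetical server and data source names.
url = das_request_url(
    "http://das.example.org", "my_genome", "features", segment="1:1000,2000"
)
print(url)
```

    The same helper can be pointed at other commands by changing the command string, which is how a single client can query several kinds of data from one source.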

  • Olfactory bulb hypoplasia in Prokr2 null mice stems from defective neuronal progenitor migration and differentiation.

    Prosser HM, Bradley A and Caldwell MA

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    New neurons are added on a daily basis to the olfactory bulb (OB) of a mammal, and this phenomenon exists throughout its lifetime. These new cells are born in the subventricular zone and migrate to the OB via the rostral migratory stream (RMS). To examine the role of the prokineticin receptor 2 (Prokr2) in neurogenesis, we created a Prokr2 null mouse, and report a decrease in the volume of its OB and also a decrease in the number of bromodeoxyuridine (BrdU)-positive cells. There is disrupted architecture of the OB, with the glomerular layer containing terminal dUTP nick-end labeling (TUNEL) -positive nuclei and also a decrease in tyrosine hydroxylase-positive neurons in this layer. In addition, there are increased numbers of doublecortin-positive neuroblasts in the RMS and increased PSA-NCAM (polysialylated form of the neural cell adhesion molecule) -positive neuronal progenitors around the olfactory ventricle, indicating their detachment from homotypic chains is compromised. Finally, in support of this, Prokr2-deficient cells expanded in vitro as neurospheres are incapable of migrating towards a source of recombinant human prokineticin 2 (PROK2). Together, these findings suggest an important role for Prokr2 in OB neurogenesis.

    Funded by: Wellcome Trust

    The European journal of neuroscience 2007;26;12;3339-44

  • Prokineticin receptor 2 (Prokr2) is essential for the regulation of circadian behavior by the suprachiasmatic nuclei.

    Prosser HM, Bradley A, Chesham JE, Ebling FJ, Hastings MH and Maywood ES

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.

    The suprachiasmatic nucleus (SCN), the brain's principal circadian pacemaker, coordinates adaptive daily cycles of behavior and physiology, including the rhythm of sleep and wakefulness. The cellular mechanism sustaining SCN circadian timing is well characterized, but the neurochemical pathways by which SCN neurons coordinate circadian behaviors remain unknown. SCN transplant studies suggest a role for (unidentified) secreted factors, and one potential candidate is the SCN neuropeptide prokineticin 2 (Prok2). Prok2 and its cognate prokineticin receptor 2 (Prokr2/Gpcr73l1) are widely expressed in both the SCN and its neural targets, and Prok2 is light-regulated. Hence, they may contribute to cellular timing within the SCN, entrainment of the clock, and/or they may mediate circadian output. We show that a targeted null mutation of Prokr2 disrupts circadian coordination of the activity cycle and thermoregulation. Specifically, mice lacking Prokr2 lost precision in timing the onset of nocturnal locomotor activity; and under both a light/dark cycle and continuous darkness, there was a pronounced temporal redistribution of activity away from early to late circadian night. Moreover, the coherence of circadian behavior was significantly reduced, and nocturnal body temperature was depressed. Entrainment by light is not, however, dependent on Prokr2, and bioluminescence real-time imaging of organotypical SCN slices showed that the mutant SCN is fully competent as a circadian oscillator. We conclude that Prokr2 is not necessary for SCN cellular timekeeping or entrainment, but it is an essential link for coordination of circadian behavior and physiology by the SCN, especially in defining the onset and maintenance of circadian night.

    Funded by: Wellcome Trust

    Proceedings of the National Academy of Sciences of the United States of America 2007;104;2;648-53

  • Wnt5a functions in planar cell polarity regulation in mice.

    Qian D, Jones C, Rzadzinska A, Mark S, Zhang X, Steel KP, Dai X and Chen P

    Department of Cell Biology, Emory University School of Medicine, Atlanta, GA 30322, USA.

    Planar cell polarity (PCP) refers to the polarization of cells within the plane of a cell sheet. A distinctive epithelial PCP in vertebrates is the uniform orientation of stereociliary bundles of the sensory hair cells in the mammalian cochlea. In addition to establishing epithelial PCP, planar polarization is also required for convergent extension (CE), a polarized cellular movement that occurs during neural tube closure and cochlear extension. Studies in Drosophila and vertebrates have revealed a conserved PCP pathway, including Frizzled (Fz) receptors. Here we use the cochlea as a model system to explore the involvement of known ligands of Fz, Wnt morphogens, in PCP regulation. We show that Wnt5a forms a reciprocal expression pattern with a Wnt antagonist, the secreted frizzled-related protein 3 (Sfrp3 or Frzb), along the axis of planar polarization in the cochlear epithelium. We further demonstrate that Wnt5a antagonizes Frzb in regulating cochlear extension and stereociliary bundle orientation in vitro, and that Wnt5a(-/-) animals have a shortened and widened cochlea. Finally, we show that Wnt5a is required for proper subcellular distribution of a PCP protein, Ltap/Vangl2, and that Wnt5a interacts genetically with Ltap/Vangl2 for uniform orientation of stereocilia, cochlear extension, and neural tube closure. Together, these findings demonstrate that Wnt5a functions in PCP regulation in mice.

    Funded by: Medical Research Council: G0300212, MC_QA137918; NIDCD NIH HHS: R01 DC005213, R01 DC005213-06, R01 DC007423, R01 DC007423-01A2; Wellcome Trust

    Developmental biology 2007;306;1;121-33

  • Butyrate mediates decrease of histone acetylation centered on transcription start sites and down-regulation of associated genes.

    Rada-Iglesias A, Enroth S, Ameur A, Koch CM, Clelland GK, Respuela-Alonso P, Wilcox S, Dovey OM, Ellis PD, Langford CF, Dunham I, Komorowski J and Wadelius C

    Department of Genetics and Pathology, Rudbeck Laboratory, Uppsala University, Uppsala, SE-751 05 Sweden.

    Butyrate is a histone deacetylase inhibitor (HDACi) with anti-neoplastic properties, which theoretically reactivates epigenetically silenced genes by increasing global histone acetylation. However, recent studies indicate that a similar number or even more genes are down-regulated than up-regulated by this drug. We treated hepatocarcinoma HepG2 cells with butyrate and characterized the levels of acetylation at DNA-bound histones H3 and H4 by ChIP-chip along the ENCODE regions. In contrast to the global increases of histone acetylation, many genomic regions close to transcription start sites were deacetylated after butyrate exposure. In order to validate these findings, we found that both butyrate and trichostatin A treatment resulted in histone deacetylation at selected regions, while nucleosome loss or changes in histone H3 lysine 4 trimethylation (H3K4me3) did not occur in such locations. Furthermore, similar histone deacetylation events were observed when colon adenocarcinoma HT-29 cells were treated with butyrate. In addition, genes with deacetylated promoters were down-regulated by butyrate, and this was mediated at the transcriptional level by affecting RNA polymerase II (POLR2A) initiation/elongation. Finally, the global increase in acetylated histones was preferentially localized to the nuclear periphery, indicating that it might not be associated with euchromatin. Our results are significant for the evaluation of HDACi as anti-tumourigenic drugs, suggesting that previous models of action might need to be revised, and provide an explanation for the frequently observed repression of many genes during HDACi treatment.

    Funded by: NHGRI NIH HHS: 5 U01 HG003168

    Genome research 2007;17;6;708-19

  • PALB2, which encodes a BRCA2-interacting protein, is a breast cancer susceptibility gene.

    Rahman N, Seal S, Thompson D, Kelly P, Renwick A, Elliott A, Reid S, Spanova K, Barfoot R, Chagtai T, Jayatilake H, McGuffog L, Hanks S, Evans DG, Eccles D, Breast Cancer Susceptibility Collaboration (UK), Easton DF and Stratton MR

    Section of Cancer Genetics, Institute of Cancer Research, 15 Cotswold Road, Sutton, Surrey SM2 5NG, UK.

    PALB2 interacts with BRCA2, and biallelic mutations in PALB2 (also known as FANCN), similar to biallelic BRCA2 mutations, cause Fanconi anemia. We identified monoallelic truncating PALB2 mutations in 10/923 individuals with familial breast cancer compared with 0/1,084 controls (P = 0.0004) and show that such mutations confer a 2.3-fold higher risk of breast cancer (95% confidence interval (c.i.) = 1.4-3.9, P = 0.0025). The results show that PALB2 is a breast cancer susceptibility gene and further demonstrate the close relationship of the Fanconi anemia-DNA repair pathway and breast cancer predisposition.

    Funded by: Wellcome Trust: 068545/Z/02, 077012

    Nature genetics 2007;39;2;165-7
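
    The carrier counts quoted in the PALB2 abstract above (10/923 familial cases versus 0/1,084 controls) can be checked with a short calculation. The sketch below assumes a Fisher-type exact test on the 2x2 table; the abstract does not say which test the authors used, and the function name is hypothetical. Under that assumption, the probability that all 10 carriers fall among the cases by chance is a single hypergeometric term, which comes out at roughly 0.0004, in line with the reported P value.

```python
from math import comb


def prob_all_carriers_in_cases(n_cases: int, n_controls: int, n_carriers: int) -> float:
    """Hypergeometric probability that every carrier is a case.

    Assumes carriers are scattered at random over cases and controls
    (the null hypothesis of no association). Because "all carriers in
    cases" is the most extreme table in that direction, this single term
    is also the one-sided Fisher exact P value for that outcome.
    """
    total = n_cases + n_controls
    return comb(n_cases, n_carriers) / comb(total, n_carriers)


# Counts taken from the abstract: 10 carriers among 923 cases, 0 among 1,084 controls.
p_value = prob_all_carriers_in_cases(923, 1084, 10)
print(f"P = {p_value:.4f}")
```

    This reproduces only the case-control comparison; the 2.3-fold relative risk and its confidence interval come from a separate analysis that the abstract does not describe in enough detail to recompute.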

  • Gene Prediction

    Rajandream M

    Bioinformatics 2007;71-102

  • Downregulation of death-associated protein kinase 1 (DAPK1) in chronic lymphocytic leukemia.

    Raval A, Tanner SM, Byrd JC, Angerman EB, Perko JD, Chen SS, Hackanson B, Grever MR, Lucas DM, Matkovic JJ, Lin TS, Kipps TJ, Murray F, Weisenburger D, Sanger W, Lynch J, Watson P, Jansen M, Yoshinaga Y, Rosenquist R, de Jong PJ, Coggill P, Beck S, Lynch H, de la Chapelle A and Plass C

    Department of Molecular Virology, Immunology, and Medical Genetics, Human Cancer Genetics Program, The Comprehensive Cancer Center at The Ohio State University, Columbus, OH 43214, USA.

    The heritability of B cell chronic lymphocytic leukemia (CLL) is relatively high; however, no predisposing mutation has been convincingly identified. We show that loss or reduced expression of death-associated protein kinase 1 (DAPK1) underlies cases of heritable predisposition to CLL and the majority of sporadic CLL. Epigenetic silencing of DAPK1 by promoter methylation occurs in almost all sporadic CLL cases. Furthermore, we defined a disease haplotype, which segregates with the CLL phenotype in a large family. DAPK1 expression of the CLL allele is downregulated by 75% in germline cells due to increased HOXB7 binding. In the blood cells from affected family members, promoter methylation results in additional loss of DAPK1 expression. Thus, reduced expression of DAPK1 can result from germline predisposition, as well as epigenetic or somatic events causing or contributing to the CLL phenotype.

    Funded by: NCI NIH HHS: 5U01 CA86389, CA101956, CA110496, CA81534, P30 CA16058, T32 CA106196; Wellcome Trust

    Cell 2007;129;5;879-90

  • Unusual phyletic distribution of peptidases as a tool for identifying potential drug targets.

    Rawlings ND

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.

    Eukaryote homologues of carboxypeptidases Taq have been discovered by Niemirowicz et al. in the protozoan Trypanosoma cruzi, the causative agent of Chagas' disease. This is surprising, because the peptidase family was thought to be restricted to bacteria and archaea. In this issue of the Biochemical Journal, the authors propose that the Trypanosoma carboxypeptidases are potential drug targets for treatment of the disease. The authors also propose that the presence of the genes in the zooflagellates can be explained by a horizontal transfer of an ancestral gene from a prokaryote. Because peptidases are popular drug targets, identifying parasite or pathogen peptidases that have no homologues in their hosts would be a method to select the most promising targets. To understand how unusual this phyletic distribution is among the 183 families of peptidases, several other examples of horizontal transfers are presented, as well as some unusual losses of peptidase genes.

    The Biochemical journal 2007;401;2;e5-7

  • Mutations in ZDHHC9, which encodes a palmitoyltransferase of NRAS and HRAS, cause X-linked mental retardation associated with a Marfanoid habitus.

    Raymond FL, Tarpey PS, Edkins S, Tofts C, O'Meara S, Teague J, Butler A, Stevens C, Barthorpe S, Buck G, Cole J, Dicks E, Gray K, Halliday K, Hills K, Hinton J, Jones D, Menzies A, Perry J, Raine K, Shepherd R, Small A, Varian J, Widaa S, Mallya U, Moon J, Luo Y, Shaw M, Boyle J, Kerr B, Turner G, Quarrell O, Cole T, Easton DF, Wooster R, Bobrow M, Schwartz CE, Gecz J, Stratton MR and Futreal PA

    Cambridge Institute of Medical Research, University of Cambridge, Cambridge, CB2 2XY, UK.

    We have identified one frameshift mutation, one splice-site mutation, and two missense mutations in highly conserved residues in ZDHHC9 at Xq26.1 in 4 of 250 families with X-linked mental retardation (XLMR). In three of the families, the mental retardation phenotype is associated with a Marfanoid habitus, although none of the affected individuals meets the Ghent criteria for Marfan syndrome. ZDHHC9 is a palmitoyltransferase that catalyzes the posttranslational modification of NRAS and HRAS. The degree of palmitoylation determines the temporal and spatial location of these proteins in the plasma membrane and Golgi complex. The finding of mutations in ZDHHC9 suggests that alterations in the concentrations and cellular distribution of target proteins are sufficient to cause disease. This is the first XLMR gene to be reported that encodes a posttranslational modification enzyme, palmitoyltransferase. Furthermore, now that the first palmitoyltransferase that causes mental retardation has been identified, defects in other palmitoylation transferases become good candidates for causing other mental retardation syndromes.

    Funded by: NICHD NIH HHS: HD26202; Wellcome Trust

    American journal of human genetics 2007;80;5;982-7

  • Evolutionary and biomedical insights from the rhesus macaque genome.

    Rhesus Macaque Genome Sequencing and Analysis Consortium, Gibbs RA, Rogers J, Katze MG, Bumgarner R, Weinstock GM, Mardis ER, Remington KA, Strausberg RL, Venter JC, Wilson RK, Batzer MA, Bustamante CD, Eichler EE, Hahn MW, Hardison RC, Makova KD, Miller W, Milosavljevic A, Palermo RE, Siepel A, Sikela JM, Attaway T, Bell S, Bernard KE, Buhay CJ, Chandrabose MN, Dao M, Davis C, Delehaunty KD, Ding Y, Dinh HH, Dugan-Rocha S, Fulton LA, Gabisi RA, Garner TT, Godfrey J, Hawes AC, Hernandez J, Hines S, Holder M, Hume J, Jhangiani SN, Joshi V, Khan ZM, Kirkness EF, Cree A, Fowler RG, Lee S, Lewis LR, Li Z, Liu YS, Moore SM, Muzny D, Nazareth LV, Ngo DN, Okwuonu GO, Pai G, Parker D, Paul HA, Pfannkoch C, Pohl CS, Rogers YH, Ruiz SJ, Sabo A, Santibanez J, Schneider BW, Smith SM, Sodergren E, Svatek AF, Utterback TR, Vattathil S, Warren W, White CS, Chinwalla AT, Feng Y, Halpern AL, Hillier LW, Huang X, Minx P, Nelson JO, Pepin KH, Qin X, Sutton GG, Venter E, Walenz BP, Wallis JW, Worley KC, Yang SP, Jones SM, Marra MA, Rocchi M, Schein JE, Baertsch R, Clarke L, Csürös M, Glasscock J, Harris RA, Havlak P, Jackson AR, Jiang H, Liu Y, Messina DN, Shen Y, Song HX, Wylie T, Zhang L, Birney E, Han K, Konkel MK, Lee J, Smit AF, Ullmer B, Wang H, Xing J, Burhans R, Cheng Z, Karro JE, Ma J, Raney B, She X, Cox MJ, Demuth JP, Dumas LJ, Han SG, Hopkins J, Karimpour-Fard A, Kim YH, Pollack JR, Vinar T, Addo-Quaye C, Degenhardt J, Denby A, Hubisz MJ, Indap A, Kosiol C, Lahn BT, Lawson HA, Marklein A, Nielsen R, Vallender EJ, Clark AG, Ferguson B, Hernandez RD, Hirani K, Kehrer-Sawatzki H, Kolb J, Patil S, Pu LL, Ren Y, Smith DG, Wheeler DA, Schenck I, Ball EV, Chen R, Cooper DN, Giardine B, Hsu F, Kent WJ, Lesk A, Nelson DL, O'brien WE, Prüfer K, Stenson PD, Wallace JC, Ke H, Liu XM, Wang P, Xiang AP, Yang F, Barber GP, Haussler D, Karolchik D, Kern AD, Kuhn RM, Smith KE and Zwieg AS

    Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA.

    The rhesus macaque (Macaca mulatta) is an abundant primate species that diverged from the ancestors of Homo sapiens about 25 million years ago. Because they are genetically and physiologically similar to humans, rhesus monkeys are the most widely used nonhuman primate in basic and applied biomedical research. We determined the genome sequence of an Indian-origin Macaca mulatta female and compared the data with chimpanzees and humans to reveal the structure of ancestral primate genomes and to identify evidence for positive selection and lineage-specific expansions and contractions of gene families. A comparison of sequences from individual animals was used to investigate their underlying genetic diversity. The complete description of the macaque genome blueprint enhances the utility of this animal model for biomedical research and improves our understanding of the basic biology of the species.

    Funded by: NHGRI NIH HHS: U54 HG003068, U54 HG003079, U54 HG003273; Wellcome Trust: 062023

    Science (New York, N.Y.) 2007;316;5822;222-34

  • The brain-derived neurotrophic factor rs6265 (Val66Met) polymorphism and depression in Mexican-Americans.

    Ribeiro L, Busnello JV, Cantor RM, Whelan F, Whittaker P, Deloukas P, Wong ML and Licinio J

    Department of Psychiatry & Behavioral Sciences, Center on Pharmacogenomics, University of Miami Miller School of Medicine, Miami, Florida 33136-1013, USA.

    The hypothesis that brain-derived neurotrophic factor (BDNF) is involved in the pathogenesis of major depression is supported by several research findings; however, genetic studies assessing the relationship between BDNF and psychiatric disorders have produced conflicting results. We examined the effect of a BDNF polymorphism on depression susceptibility in Mexican-Americans. The single nucleotide polymorphism (Val66Met), which has been shown to have functional and behavioral effects, was genotyped in 284 depressed participants and 331 controls, showing association with depression (P=0.005). Individuals homozygous for the major allele (GG) had an increased chance of being depressed (OR=1.7, 95% CI 1.17-2.47). Our findings support the association of BDNF single nucleotide polymorphism rs6265 and depression, suggesting that this polymorphism may increase susceptibility to major depression in Mexican-Americans.

    Funded by: NCRR NIH HHS: RR000865, RR017365, RR16996; NHGRI NIH HHS: HG002500; NIDDK NIH HHS: DK063240; NIGMS NIH HHS: GM61394; NIMH NIH HHS: MH062777; Wellcome Trust: 077011

    Neuroreport 2007;18;12;1291-3

  • Requirement of bic/microRNA-155 for normal immune function.

    Rodriguez A, Vigorito E, Clare S, Warren MV, Couttet P, Soond DR, van Dongen S, Grocock RJ, Das PP, Miska EA, Vetrie D, Okkenhaug K, Enright AJ, Dougan G, Turner M and Bradley A

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    MicroRNAs are a class of small RNAs that are increasingly being recognized as important regulators of gene expression. Although hundreds of microRNAs are present in the mammalian genome, genetic studies addressing their physiological roles are at an early stage. We have shown that mice deficient for bic/microRNA-155 are immunodeficient and display increased lung airway remodeling. We demonstrate a requirement of bic/microRNA-155 for the function of B and T lymphocytes and dendritic cells. Transcriptome analysis of bic/microRNA-155-deficient CD4+ T cells identified a wide spectrum of microRNA-155-regulated genes, including cytokines, chemokines, and transcription factors. Our work suggests that bic/microRNA-155 plays a key role in the homeostasis and function of the immune system.

    Funded by: Medical Research Council: G117/424; Wellcome Trust: 077187

    Science (New York, N.Y.) 2007;316;5824;608-11

  • Chromosomal evolution of Arvicolinae (Cricetidae, Rodentia). II. The genome homology of two mole voles (genus Ellobius), the field vole and golden hamster revealed by comparative chromosome painting.

    Romanenko SA, Sitnikova NA, Serdukova NA, Perelman PL, Rubtsova NV, Bakloushinskaya IY, Lyapunova EA, Just W, Ferguson-Smith MA, Yang F and Graphodatsky AS

    Institute of Cytology and Genetics, SB RAS, Novosibirsk, 630090, Russia.

    Using cross-species chromosome painting, we have carried out a comprehensive comparison of the karyotypes of two Ellobius species with unusual sex determination systems: the Transcaucasian mole vole, Ellobius lutescens (2n = 17, X in both sexes), and the northern mole vole, Ellobius talpinus (2n = 54, XX in both sexes). Both Ellobius species have highly rearranged karyotypes. The chromosomal paints from the field vole (Microtus agrestis) detected, in total, 34 and 32 homologous autosomal regions in E. lutescens and E. talpinus karyotypes, respectively. No difference in hybridization pattern of the X paint (as well as Y paint) probes on male and female chromosomes was discovered. The set of golden hamster (Mesocricetus auratus) chromosomal painting probes revealed 44 and 43 homologous autosomal regions in E. lutescens and E. talpinus karyotypes, respectively. A comparative chromosome map was established based on the results of cross-species chromosome painting and a hypothetical ancestral Ellobius karyotype was reconstructed. A considerable number of rearrangements were detected; 31 and 7 fusion/fission rearrangements differentiated the karyotypes of E. lutescens and E. talpinus from the ancestral Ellobius karyotype. It seems that inversions have played a minor role in the genome evolution of these Ellobius species.

    Chromosome research : an international journal on the molecular, supramolecular and evolutionary aspects of chromosome biology 2007;15;7;891-7

  • Karyotype evolution and phylogenetic relationships of hamsters (Cricetidae, Muroidea, Rodentia) inferred from chromosomal painting and banding comparison.

    Romanenko SA, Volobouev VT, Perelman PL, Lebedev VS, Serdukova NA, Trifonov VA, Biltueva LS, Nie W, O'Brien PC, Bulatova NSh, Ferguson-Smith MA, Yang F and Graphodatsky AS

    Institute of Cytology and Genetics, SB RAS, Novosibirsk, 630090, Russia.

    The evolutionary success of rodents of the superfamily Muroidea makes this taxon the most interesting for evolution studies, including study at the chromosomal level. Chromosome-specific painting probes from the Chinese hamster and the Syrian (golden) hamster were used to delimit homologous chromosomal segments among 15 hamster species from eight genera: Allocricetulus, Calomyscus, Cricetulus, Cricetus, Mesocricetus, Peromyscus, Phodopus and Tscherskia (Cricetidae, Muroidea, Rodentia). Based on results of chromosome painting and G-banding, comparative maps between 20 rodent species have been established. The integrated maps demonstrate a high level of karyotype conservation among species in the Cricetus group (Cricetus, Cricetulus, Allocricetulus) with Tscherskia as its sister group. Species within the genera Mesocricetus and Phodopus also show a high degree of chromosomal conservation. Our results substantiate many of the conclusions suggested by other data and strengthen the topology of the Muroidea phylogenetic tree through the inclusion of genome-wide chromosome rearrangements. The derivation of the muroids karyotypes from the putative ancestral state involved centric fusions, fissions, addition of heterochromatic arms and a great number of inversions. Our results provide further insights into the karyotype relationships of all species investigated.

    Funded by: Wellcome Trust

    Chromosome research : an international journal on the molecular, supramolecular and evolutionary aspects of chromosome biology 2007;15;3;283-97

  • Global transcriptional responses of fission and budding yeast to changes in copper and iron levels: a comparative study.

    Rustici G, van Bakel H, Lackner DH, Holstege FC, Wijmenga C, Bähler J and Brazma A

    EMBL Outstation-Hinxton, European Bioinformatics Institute, Cambridge CB10 1SD, UK.

    Background: Recent studies in comparative genomics demonstrate that interspecies comparison represents a powerful tool for identifying both conserved and specialized biologic processes across large evolutionary distances. All cells must adjust to environmental fluctuations in metal levels, because levels that are too low or too high can be detrimental. Here we explore the conservation of metal homoeostasis in two distantly related yeasts.

    Results: We examined genome-wide gene expression responses to changing copper and iron levels in budding and fission yeast using DNA microarrays. The comparison reveals conservation of only a small core set of genes, defining the copper and iron regulons, with a larger number of additional genes being specific for each species. Novel regulatory targets were identified in Schizosaccharomyces pombe for Cuf1p (pex7 and SPAC3G6.05) and Fep1p (srx1, sib1, sib2, rds1, isu1, SPBC27B12.03c, SPAC1F8.02c, and SPBC947.05c). We also present evidence refuting a direct role of Cuf1p in the repression of genes involved in iron uptake. Remarkable differences were detected in responses of the two yeasts to excess copper, probably reflecting evolutionary adaptation to different environments.

    Conclusion: The considerable evolutionary distance between budding and fission yeast resulted in substantial diversion in the regulation of copper and iron homeostasis. Despite these differences, the conserved regulation of a core set of genes involved in the uptake of these metals provides valuable clues to key features of metal metabolism.

    Funded by: Cancer Research UK: A6517; Wellcome Trust: 077118

    Genome biology 2007;8;5;R73

  • Genome-wide detection and characterization of positive selection in human populations.

    Sabeti PC, Varilly P, Fry B, Lohmueller J, Hostetter E, Cotsapas C, Xie X, Byrne EH, McCarroll SA, Gaudet R, Schaffner SF, Lander ES, International HapMap Consortium, Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, Belmont JW, Boudreau A, Hardenbol P, Leal SM, Pasternak S, Wheeler DA, Willis TD, Yu F, Yang H, Zeng C, Gao Y, Hu H, Hu W, Li C, Lin W, Liu S, Pan H, Tang X, Wang J, Wang W, Yu J, Zhang B, Zhang Q, Zhao H, Zhao H, Zhou J, Gabriel SB, Barry R, Blumenstiel B, Camargo A, Defelice M, Faggart M, Goyette M, Gupta S, Moore J, Nguyen H, Onofrio RC, Parkin M, Roy J, Stahl E, Winchester E, Ziaugra L, Altshuler D, Shen Y, Yao Z, Huang W, Chu X, He Y, Jin L, Liu Y, Shen Y, Sun W, Wang H, Wang Y, Wang Y, Xiong X, Xu L, Waye MM, Tsui SK, Xue H, Wong JT, Galver LM, Fan JB, Gunderson K, Murray SS, Oliphant AR, Chee MS, Montpetit A, Chagnon F, Ferretti V, Leboeuf M, Olivier JF, Phillips MS, Roumy S, Sallée C, Verner A, Hudson TJ, Kwok PY, Cai D, Koboldt DC, Miller RD, Pawlikowska L, Taillon-Miller P, Xiao M, Tsui LC, Mak W, Song YQ, Tam PK, Nakamura Y, Kawaguchi T, Kitamoto T, Morizono T, Nagashima A, Ohnishi Y, Sekine A, Tanaka T, Tsunoda T, Deloukas P, Bird CP, Delgado M, Dermitzakis ET, Gwilliam R, Hunt S, Morrison J, Powell D, Stranger BE, Whittaker P, Bentley DR, Daly MJ, de Bakker PI, Barrett J, Chretien YR, Maller J, McCarroll S, Patterson N, Pe'er I, Price A, Purcell S, Richter DJ, Sabeti P, Saxena R, Schaffner SF, Sham PC, Varilly P, Altshuler D, Stein LD, Krishnan L, Smith AV, Tello-Ruiz MK, Thorisson GA, Chakravarti A, Chen PE, Cutler DJ, Kashuk CS, Lin S, Abecasis GR, Guan W, Li Y, Munro HM, Qin ZS, Thomas DJ, McVean G, Auton A, Bottolo L, Cardin N, Eyheramendy S, Freeman C, Marchini J, Myers S, Spencer C, Stephens M, Donnelly P, Cardon LR, Clarke G, Evans DM, Morris AP, Weir BS, Tsunoda T, Johnson TA, Mullikin JC, Sherry ST, Feolo M, Skol A, Zhang H, Zeng C, Zhao H, Matsuda I, Fukushima Y, Macer DR, Suda E, Rotimi CN, Adebamowo CA, Ajayi I, Aniagwu T, Marshall PA, Nkwodimmah C, Royal CD, Leppert MF, Dixon M, Peiffer A, Qiu R, Kent A, Kato K, Niikawa N, Adewole IF, Knoppers BM, Foster MW, Clayton EW, Watkin J, Gibbs RA, Belmont JW, Muzny D, Nazareth L, Sodergren E, Weinstock GM, Wheeler DA, Yakub I, Gabriel SB, Onofrio RC, Richter DJ, Ziaugra L, Birren BW, Daly MJ, Altshuler D, Wilson RK, Fulton LL, Rogers J, Burton J, Carter NP, Clee CM, Griffiths M, Jones MC, McLay K, Plumb RW, Ross MT, Sims SK, Willey DL, Chen Z, Han H, Kang L, Godbout M, Wallenburg JC, L'Archevêque P, Bellemare G, Saeki K, Wang H, An D, Fu H, Li Q, Wang Z, Wang R, Holden AL, Brooks LD, McEwen JE, Guyer MS, Wang VO, Peterson JL, Shi M, Spiegel J, Sung LM, Zacharia LF, Collins FS, Kennedy K, Jamieson R and Stewart J

    Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02139, USA.

    With the advent of dense maps of human genetic variation, it is now possible to detect positive natural selection across the human genome. Here we report an analysis of over 3 million polymorphisms from the International HapMap Project Phase 2 (HapMap2). We used 'long-range haplotype' methods, which were developed to identify alleles segregating in a population that have undergone recent selection, and we also developed new methods that are based on cross-population comparisons to discover alleles that have swept to near-fixation within a population. The analysis reveals more than 300 strong candidate regions. Focusing on the strongest 22 regions, we develop a heuristic for scrutinizing these regions to identify candidate targets of selection. In a complementary analysis, we identify 26 non-synonymous, coding, single nucleotide polymorphisms showing regional evidence of positive selection. Examination of these candidates highlights three cases in which two genes in a common biological process have apparently undergone positive selection in the same population: LARGE and DMD, both related to infection by the Lassa virus, in West Africa; SLC24A5 and SLC45A2, both involved in skin pigmentation, in Europe; and EDAR and EDA2R, both involved in development of hair follicles, in Asia.

    Funded by: Wellcome Trust: 077008, 077011, 077046, 081682

    Nature 2007;449;7164;913-8

  • Genomic analysis of human microRNA transcripts.

    Saini HK, Griffiths-Jones S and Enright AJ

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.

    MicroRNAs (miRNAs) are important genetic regulators of development, differentiation, growth, and metabolism. The mammalian genome encodes approximately 500 known miRNA genes. Approximately 50% are expressed from non-protein-coding transcripts, whereas the rest are located mostly in the introns of coding genes. Intronic miRNAs are generally transcribed coincidentally with their host genes. However, the nature of the primary transcript of intergenic miRNAs is largely unknown. We have performed a large-scale analysis of transcription start sites, polyadenylation signals, CpG islands, EST data, transcription factor-binding sites, and expression ditag data surrounding intergenic miRNAs in the human genome to improve our understanding of the structure of their primary transcripts. We show that a significant fraction of primary transcripts of intergenic miRNAs are 3-4 kb in length, with clearly defined 5' and 3' boundaries. We provide strong evidence for the complete transcript structure of a small number of human miRNAs.

    Funded by: Wellcome Trust

    Proceedings of the National Academy of Sciences of the United States of America 2007;104;45;17719-24

  • Genomewide association analysis of coronary artery disease.

    Samani NJ, Erdmann J, Hall AS, Hengstenberg C, Mangino M, Mayer B, Dixon RJ, Meitinger T, Braund P, Wichmann HE, Barrett JH, König IR, Stevens SE, Szymczak S, Tregouet DA, Iles MM, Pahlke F, Pollard H, Lieb W, Cambien F, Fischer M, Ouwehand W, Blankenberg S, Balmforth AJ, Baessler A, Ball SG, Strom TM, Braenne I, Gieger C, Deloukas P, Tobin MD, Ziegler A, Thompson JR, Schunkert H and WTCCC and the Cardiogenics Consortium

    University of Leicester, Leicester, United Kingdom.

    Background: Modern genotyping platforms permit a systematic search for inherited components of complex diseases. We performed a joint analysis of two genomewide association studies of coronary artery disease.

    Methods: We first identified chromosomal loci that were strongly associated with coronary artery disease in the Wellcome Trust Case Control Consortium (WTCCC) study (which involved 1926 case subjects with coronary artery disease and 2938 controls) and looked for replication in the German MI [Myocardial Infarction] Family Study (which involved 875 case subjects with myocardial infarction and 1644 controls). Data on other single-nucleotide polymorphisms (SNPs) that were significantly associated with coronary artery disease in either study (P<0.001) were then combined to identify additional loci with a high probability of true association. Genotyping in both studies was performed with the use of the GeneChip Human Mapping 500K Array Set (Affymetrix).

    Results: Of thousands of chromosomal loci studied, the same locus had the strongest association with coronary artery disease in both the WTCCC and the German studies: chromosome 9p21.3 (SNP, rs1333049) (P=1.80×10^-14 and P=3.40×10^-6, respectively). Overall, the WTCCC study revealed nine loci that were strongly associated with coronary artery disease (P<1.2×10^-5 and less than a 50% chance of being falsely positive). In addition to chromosome 9p21.3, two of these loci were successfully replicated (adjusted P<0.05) in the German study: chromosome 6q25.1 (rs6922269) and chromosome 2q36.3 (rs2943634). The combined analysis of the two studies identified four additional loci significantly associated with coronary artery disease (P<1.3×10^-6) and a high probability (>80%) of a true association: chromosomes 1p13.3 (rs599839), 1q41 (rs17465637), 10q11.21 (rs501120), and 15q22.33 (rs17228212).

    Conclusions: We identified several genetic loci that, individually and in aggregate, substantially affect the risk of development of coronary artery disease.

    Funded by: Medical Research Council: G0501942, G9806740; Wellcome Trust: 076113, 077011

    The New England journal of medicine 2007;357;5;443-53

  • Evolutionary vignettes of natural killer cell receptors.

    Sambrook JG and Beck S

    Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    The discovery of novel immune receptors has led to a recent renaissance of research into the innate immune system, following decades of intense research of the adaptive immune system. Of particular interest has been the discovery of the natural killer (NK) cell receptors which, depending on type, interact with classical or non-classical MHC class I antigens of the adaptive immune system, thus functioning at the interface of innate and adaptive immunity. Here, we review recent progress with respect to two such families of NK receptors, the killer immunoglobulin-like receptors (KIRs) and the killer cell lectin-like receptors (KLRs), and attempt to trace their evolution across vertebrates.

    Funded by: Wellcome Trust

    Current opinion in immunology 2007;19;5;553-60

  • Common variants in WFS1 confer risk of type 2 diabetes.

    Sandhu MS, Weedon MN, Fawcett KA, Wasson J, Debenham SL, Daly A, Lango H, Frayling TM, Neumann RJ, Sherva R, Blech I, Pharoah PD, Palmer CN, Kimber C, Tavendale R, Morris AD, McCarthy MI, Walker M, Hitman G, Glaser B, Permutt MA, Hattersley AT, Wareham NJ and Barroso I

    UK Medical Research Council (MRC) Epidemiology Unit, Strangeways Research Laboratory, Cambridge CB1 8RN, UK.

    We studied genes involved in pancreatic beta cell function and survival, identifying associations between SNPs in WFS1 and diabetes risk in UK populations that we replicated in an Ashkenazi population and in additional UK studies. In a pooled analysis comprising 9,533 cases and 11,389 controls, SNPs in WFS1 were strongly associated with diabetes risk. Rare mutations in WFS1 cause Wolfram syndrome; using a gene-centric approach, we show that variation in WFS1 also predisposes to common type 2 diabetes.

    Funded by: Medical Research Council: G0500070, MC_U106179471; Wellcome Trust: 068545/z/02, 077016

    Nature genetics 2007;39;8;951-3

  • Challenges and standards in integrating surveys of structural variation.

    Scherer SW, Lee C, Birney E, Altshuler DM, Eichler EE, Carter NP, Hurles ME and Feuk L

    The Centre for Applied Genomics and Program in Genetics and Genomic Biology, The Hospital for Sick Children, 101 College Street, Room 14-701, Ontario M5G 1L7, Canada.

    There has been an explosion of data describing newly recognized structural variants in the human genome. In the flurry of reporting, there has been no standard approach to collecting the data, assessing its quality or describing identified features. This risks becoming a rampant problem, in particular with respect to surveys of copy number variation and their application to disease studies. Here, we consider the challenges in characterizing and documenting genomic structural variants. From this, we derive recommendations for standards to be adopted, with the aim of ensuring the accurate presentation of this form of genetic variation to facilitate ongoing research.

    Funded by: Wellcome Trust: 077008, 077014

    Nature genetics 2007;39;7 Suppl;S7-15

  • Application of phage display to high throughput antibody generation and characterization.

    Schofield DJ, Pope AR, Clementel V, Buckell J, Chapple SDj, Clarke KF, Conquer JS, Crofts AM, Crowther SR, Dyson MR, Flack G, Griffin GJ, Hooks Y, Howat WJ, Kolb-Kokocinski A, Kunze S, Martin CD, Maslen GL, Mitchell JN, O'Sullivan M, Perera RL, Roake W, Shadbolt SP, Vincent KJ, Warford A, Wilson WE, Xie J, Young JL and McCafferty J

    Abcam Ltd, Cambridge Science Park, Cambridge CB4 0FW, UK.

    We have created a high quality phage display library containing over 10^10 human antibodies and describe its use in the generation of antibodies on an unprecedented scale. We have selected, screened and sequenced over 38,000 recombinant antibodies to 292 antigens, yielding over 7,200 unique clones. 4,400 antibodies were characterized by specificity testing and detailed sequence analysis and the data/clones are available online. Sensitive detection was demonstrated in a bead based flow cytometry assay. Furthermore, positive staining by immunohistochemistry on tissue microarrays was found for 37% (143/381) of antibodies. Thus, we have demonstrated the potential of and illuminated the issues associated with genome-wide monoclonal antibody generation.

    Funded by: Wellcome Trust

    Genome biology 2007;8;11;R254

  • An introduction to hidden Markov models.

    Schuster-Böckler B and Bateman A

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.

    This unit introduces the concept of hidden Markov models in computational biology. It describes them using simple biological examples, requiring as little mathematical knowledge as possible. The unit also presents a brief history of hidden Markov models and an overview of their current applications before concluding with a discussion of their limitations.

    Funded by: Wellcome Trust: 087656

    Current protocols in bioinformatics / editoral board, Andreas D. Baxevanis ... [et al.] 2007;Appendix 3;Appendix 3A

  • Reuse of structural domain-domain interactions in protein networks.

    Schuster-Böckler B and Bateman A

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK.

    Background: Protein interactions are thought to be largely mediated by interactions between structural domains. Databases such as iPfam relate interactions in protein structures to known domain families. Here, we investigate how the domain interactions from the iPfam database are distributed in protein interactions taken from the HPRD, MPact, BioGRID, DIP and IntAct databases.

    Results: We find that known structural domain interactions can explain only a subset of 4-19% of the available protein interactions; nevertheless, this fraction is still significantly bigger than expected by chance. There is a correlation between the frequency of a domain interaction and the connectivity of the proteins it occurs in. Furthermore, a large proportion of protein interactions can be attributed to a small number of domain interactions. We conclude that many, but not all, domain interactions constitute reusable modules of molecular recognition. A substantial proportion of domain interactions are conserved between E. coli, S. cerevisiae and H. sapiens. These domains are related to essential cellular functions, suggesting that many domain interactions were already present in the last universal common ancestor.

    Conclusion: Our results support the concept of domain interactions as reusable, conserved building blocks of protein interactions, but also highlight the limitations currently imposed by the small number of available protein structures.

    Funded by: Wellcome Trust: 087656

    BMC bioinformatics 2007;8;259

  • The original Lujan syndrome family has a novel missense mutation (p.N1007S) in the MED12 gene.

    Schwartz CE, Tarpey PS, Lubs HA, Verloes A, May MM, Risheg H, Friez MJ, Futreal PA, Edkins S, Teague J, Briault S, Skinner C, Bauer-Carlin A, Simensen RJ, Joseph SM, Jones JR, Gecz J, Stratton MR, Raymond FL and Stevenson RE

    A novel missense mutation in the mediator of RNA polymerase II transcription subunit 12 (MED12) gene has been found in the original family with Lujan syndrome and in a second family (K9359) that was initially considered to have Opitz-Kaveggia (FG) syndrome. A different missense mutation in the MED12 gene has been reported previously in the original family with FG syndrome and in five other families with compatible clinical findings. Neither sequence alteration has been found in over 1400 control X chromosomes. Lujan (Lujan-Fryns) syndrome is characterised by tall stature with asthenic habitus, macrocephaly, a tall narrow face, maxillary hypoplasia, a high narrow palate with dental crowding, a small or receding chin, long hands with hyperextensible digits, hypernasal speech, hypotonia, mild-to-moderate mental retardation, behavioural aberrations and dysgenesis of the corpus callosum. Although Lujan syndrome has not been previously considered to be in the differential diagnosis of FG syndrome, there are some overlapping clinical manifestations. Specifically, these are dysgenesis of the corpus callosum, macrocephaly/relative macrocephaly, a tall forehead, hypotonia, mental retardation and behavioural disturbances. Thus, it seems that these two X-linked mental retardation syndromes are allelic, with mutations in the MED12 gene.

    Funded by: NICHD NIH HHS: HD 26202; Wellcome Trust

    Journal of medical genetics 2007;44;7;472-7

  • JAK2 exon 12 mutations in polycythemia vera and idiopathic erythrocytosis.

    Scott LM, Tong W, Levine RL, Scott MA, Beer PA, Stratton MR, Futreal PA, Erber WN, McMullin MF, Harrison CN, Warren AJ, Gilliland DG, Lodish HF and Green AR

    University of Cambridge, Cambridge, United Kingdom.

    Background: The V617F mutation, which causes the substitution of phenylalanine for valine at position 617 of the Janus kinase (JAK) 2 gene (JAK2), is often present in patients with polycythemia vera, essential thrombocythemia, and idiopathic myelofibrosis. However, the molecular basis of these myeloproliferative disorders in patients without the V617F mutation is unclear.

    Methods: We searched for new mutations in members of the JAK and signal transducer and activator of transcription (STAT) gene families in patients with V617F-negative polycythemia vera or idiopathic erythrocytosis. The mutations were characterized biochemically and in a murine model of bone marrow transplantation.

    Results: We identified four somatic gain-of-function mutations affecting JAK2 exon 12 in 10 V617F-negative patients. Those with a JAK2 exon 12 mutation presented with an isolated erythrocytosis and distinctive bone marrow morphology, and several also had reduced serum erythropoietin levels. Erythroid colonies could be grown from their blood samples in the absence of exogenous erythropoietin. All such erythroid colonies were heterozygous for the mutation, whereas colonies homozygous for the mutation occur in most patients with V617F-positive polycythemia vera. BaF3 cells expressing the murine erythropoietin receptor and also carrying exon 12 mutations could proliferate without added interleukin-3. They also exhibited increased phosphorylation of JAK2 and extracellular regulated kinase 1 and 2, as compared with cells transduced by wild-type JAK2 or V617F JAK2. Three of the exon 12 mutations included a substitution of leucine for lysine at position 539 of JAK2. This mutation resulted in a myeloproliferative phenotype, including erythrocytosis, in a murine model of retroviral bone marrow transplantation.

    Conclusions: JAK2 exon 12 mutations define a distinctive myeloproliferative syndrome that affects patients who currently receive a diagnosis of polycythemia vera or idiopathic erythrocytosis.

    Funded by: NCI NIH HHS: CA66996, K01 CA115679; NHLBI NIH HHS: P01 HL32262; NIDDK NIH HHS: DK50654; Wellcome Trust: 077012

    The New England journal of medicine 2007;356;5;459-68

  • Genome sequence of a proteolytic (Group I) Clostridium botulinum strain Hall A and comparative analysis of the clostridial genomes.

    Sebaihia M, Peck MW, Minton NP, Thomson NR, Holden MT, Mitchell WJ, Carter AT, Bentley SD, Mason DR, Crossman L, Paul CJ, Ivens A, Wells-Bennik MH, Davis IJ, Cerdeño-Tárraga AM, Churcher C, Quail MA, Chillingworth T, Feltwell T, Fraser A, Goodhead I, Hance Z, Jagels K, Larke N, Maddison M, Moule S, Mungall K, Norbertczak H, Rabbinowitsch E, Sanders M, Simmonds M, White B, Whitehead S and Parkhill J

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, United Kingdom.

    Clostridium botulinum is a heterogeneous Gram-positive species that comprises four genetically and physiologically distinct groups of bacteria that share the ability to produce botulinum neurotoxin, the most poisonous toxin known to man, and the causative agent of botulism, a severe disease of humans and animals. We report here the complete genome sequence of a representative of Group I (proteolytic) C. botulinum (strain Hall A, ATCC 3502). The genome consists of a chromosome (3,886,916 bp) and a plasmid (16,344 bp), which carry 3650 and 19 predicted genes, respectively. Consistent with the proteolytic phenotype of this strain, the genome harbors a large number of genes encoding secreted proteases and enzymes involved in uptake and metabolism of amino acids. The genome also reveals a hitherto unknown ability of C. botulinum to degrade chitin. There is a significant lack of recently acquired DNA, indicating a stable genomic content, in strong contrast to the fluid genome of Clostridium difficile, which can form longer-term relationships with its host. Overall, the genome indicates that C. botulinum is adapted to a saprophytic lifestyle both in soil and aquatic environments. This pathogen relies on its toxin to rapidly kill a wide range of prey species, and to gain access to nutrient sources, it releases a large number of extracellular enzymes to soften and destroy rotting or decayed tissues.

    Funded by: Medical Research Council: G0700837; Wellcome Trust

    Genome research 2007;17;7;1082-92

  • A new environmentally resistant cell type from Dictyostelium.

    Serafimidis I, Bloomfield G, Skelton J, Ivens A and Kay RR

    MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, UK.

    This paper describes the serendipitous discovery and first characterization of a new resistant cell type from Dictyostelium, for which the name aspidocyte (from aspis: Greek for shield) is proposed. These cells are induced from amoebae by a range of toxins including heavy metals and antibiotics, and were first detected by their striking resistance to detergent lysis. Aspidocytes are separate, rounded or irregular-shaped cells, which are immotile but remain fully viable; once the toxic stress is removed, they revert to amoeboid cells within an hour. Induction takes a few hours and is completely blocked by the protein synthesis inhibitor cycloheximide. Aspidocytes lack a cell wall and their resistance to detergent lysis is active, requiring continued energy metabolism, and may be assisted by a complete cessation of endocytosis, as measured by uptake of the dye FM1-43. Microarray analysis shows that aspidocytes have a distinct pattern of gene expression, with a number of genes up-regulated that are predicted to be involved in lipid metabolism. Aspidocytes were initially detected in a hypersensitive mutant, in which the AMP deaminase gene is disrupted, suggesting that the inductive pathway involves AMP levels or metabolism. Since aspidocytes can also be induced from wild-type cells and are much more resistant than amoebae to a membrane-disrupting antibiotic, it is possible that they are an adaptation allowing Dictyostelium cells to survive a sudden onslaught of toxins in the wild.

    Funded by: Wellcome Trust: 066742

    Microbiology (Reading, England) 2007;153;Pt 2;619-30

  • A more convenient truth.

    Seth-Smith H

    Nature reviews. Microbiology 2007;5;4;248-50

  • Ocean's elevenses.

    Seth-Smith H

    Nature reviews. Microbiology 2007;5;1;9

  • Different evolutionary histories of the two classical class I genes BF1 and BF2 illustrate drift and selection within the stable MHC haplotypes of chickens.

    Shaw I, Powell TJ, Marston DA, Baker K, van Hateren A, Riegert P, Wiles MV, Milne S, Beck S and Kaufman J

    Institute for Animal Health, Compton, Berkshire, United Kingdom.

    Compared with the MHC of typical mammals, the chicken MHC (BF/BL region) of the B12 haplotype is smaller, simpler, and rearranged, with two classical class I genes of which only one is highly expressed. In this study, we describe the development of long-distance PCR to amplify some or all of each class I gene separately, allowing us to make the following points. First, six other haplotypes have the same genomic organization as B12, with a poorly expressed (minor) BF1 gene between DMB2 and TAP2 and a well-expressed (major) BF2 gene between TAP2 and C4. Second, the expression of the BF1 gene is crippled in three different ways in these haplotypes: enhancer A deletion (B12, B19), enhancer A divergence and transcription start site deletion (B2, B4, B21), and insertion/rearrangement leading to pseudogenes (B14, B15). Third, the three kinds of alterations in the BF1 gene correspond to dendrograms of the BF1 and poorly expressed class II B (BLB1) genes reflecting mostly neutral changes, while the dendrograms of the BF2 and well-expressed class II (BLB2) genes each have completely different topologies reflecting selection. The common pattern for the poorly expressed genes reflects the fact that the BF/BL region undergoes little recombination and allows us to propose a pattern of descent for these chicken MHC haplotypes from a common ancestor. Taken together, these data explain how stable MHC haplotypes predominantly express a single class I molecule, which in turn leads to striking associations of the chicken MHC with resistance to infectious pathogens and response to vaccines.

    Funded by: Wellcome Trust

    Journal of immunology (Baltimore, Md. : 1950) 2007;178;9;5744-52

  • Genes flanking Xist in mouse and human are separated on the X chromosome in American marsupials.

    Shevchenko AI, Zakharova IS, Elisaphenko EA, Kolesnikov NN, Whitehead S, Bird C, Ross M, Weidman JR, Jirtle RL, Karamysheva TV, Rubtsov NB, VandeBerg JL, Mazurok NA, Nesterova TB, Brockdorff N and Zakian SM

    Institute of Cytology and Genetics, Russian Academy of Sciences, Siberian Department, Novosibirsk, Russia.

    X inactivation, the transcriptional silencing of one of the two X chromosomes in female mammals, achieves dosage compensation of X-linked genes relative to XY males. In eutherian mammals X inactivation is regulated by the X-inactive specific transcript (Xist), a cis-acting non-coding RNA that triggers silencing of the chromosome from which it is transcribed. Marsupial mammals also undergo X inactivation but the mechanism is relatively poorly understood. We set out to analyse the X chromosome in Monodelphis domestica and Didelphis virginiana, focusing on characterizing the interval defined by the Chic1 and Slc16a2 genes that in eutherians flank the Xist locus. The synteny of this region is retained on chicken chromosome 4 where other loci belonging to the evolutionarily ancient stratum of the human X chromosome, the so-called X conserved region (XCR), are also located. We show that in both M. domestica and D. virginiana an evolutionary breakpoint has separated the Chic1 and Slc16a2 loci. Detailed analysis of opossum genomic sequences revealed linkage of Chic1 with the Lnx3 gene, recently proposed to be the evolutionary precursor of Xist, and Fip1, the evolutionary precursor of Tsx, a gene located immediately downstream of Xist in eutherians. We discuss these findings in relation to the evolution of Xist and X inactivation in mammals.

    Funded by: Wellcome Trust: 067065/Z/02/Z

    Chromosome research : an international journal on the molecular, supramolecular and evolutionary aspects of chromosome biology 2007;15;2;127-36

  • Chromosomal evolution of Arvicolinae (Cricetidae, Rodentia). I. The genome homology of tundra vole, field vole, mouse and golden hamster revealed by comparative chromosome painting.

    Sitnikova NA, Romanenko SA, O'Brien PC, Perelman PL, Fu B, Rubtsova NV, Serdukova NA, Golenishchev FN, Trifonov VA, Ferguson-Smith MA, Yang F and Graphodatsky AS

    Institute of Cytology and Genetics, SB RAS, Novosibirsk, Russia.

    Cross-species chromosome painting has become the mainstay of comparative cytogenetic and chromosome evolution studies. Here we have made a set of chromosomal painting probes for the field vole (Microtus agrestis) by DOP-PCR amplification of flow-sorted chromosomes. Together with painting probes of golden hamster (Mesocricetus auratus) and mouse (Mus musculus), the field vole probes have been hybridized onto the metaphases of the tundra vole (Microtus oeconomus). A comparative chromosome map between these two voles, golden hamster and mouse has been established based on the results of cross-species chromosome painting and G-banding comparisons. The sets of paints from the field vole, golden hamster and mouse identified a total of 27, 40 and 47 homologous autosomal regions, respectively, in the genome of tundra vole; 16, 41 and 51 fusion/fission rearrangements differentiate the karyotype of the tundra vole from the karyotypes of the field vole, golden hamster and mouse, respectively.

    Chromosome research : an international journal on the molecular, supramolecular and evolutionary aspects of chromosome biology 2007;15;4;447-56

  • Comparative genomics: from genotype to disease phenotype in the leishmaniases.

    Smith DF, Peacock CS and Cruz AK

    Immunology and Infection Unit, Department of Biology/Hull York Medical School, University of York, Heslington, York YO10 5YW, UK.

    Recent progress in sequencing the genomes of several Leishmania species, causative agents of cutaneous, mucocutaneous and visceral leishmaniasis, is revealing unusual features of potential relevance to parasite virulence and pathogenesis in the host. While the genomes of Leishmania major, Leishmania braziliensis and Leishmania infantum are highly similar in content and organisation, species-specific genes and mechanisms distinguish one from another. In particular, the presence of retrotransposons and the components of a putative RNA interference machinery in L. braziliensis suggest the potential for both greater diversity and more tractable experimentation in this Leishmania Viannia species.

    Funded by: Wellcome Trust: 076355

    International journal for parasitology 2007;37;11;1173-86

  • Flowering of strict photoperiodic Nicotiana varieties in non-inductive conditions by transgenic approaches.

    Smykal P, Gennen J, De Bodt S, Ranganath V and Melzer S

    Institute of Plant Sciences, ETH Zürich, Universitaetstrasse 2, 8092, Zurich, Switzerland.

    The genus Nicotiana contains species and varieties that respond differently to photoperiod for flowering time control as day-neutral, short-day and long-day plants. In classical photoperiodism studies, these varieties have been widely used to analyse the physiological nature for floral induction by day length. Since key regulators for flowering time control by day length have been identified in Arabidopsis thaliana by molecular genetic studies, it was intriguing to analyse how closely related plants in the Nicotiana genus with opposite photoperiodic requirements respond to certain flowering time regulators. SUPPRESSOR OF OVEREXPRESSION OF CONSTANS 1 (SOC1) and FRUITFULL (FUL) are two MADS box genes that are involved in the regulation of flowering time in Arabidopsis. SOC1 is a central flowering time pathway integrator, whereas the exact role of FUL for floral induction has not been established yet. The putative Nicotiana orthologs of SOC1 and FUL, NtSOC1 and NtFUL, were studied in day-neutral tobacco Nicotiana tabacum cv Hicks, in short-day tobacco N. tabacum cv Hicks Maryland Mammoth (MM) and long-day N. sylvestris plants. Both genes were similarly expressed under short- and long-day conditions in day-neutral and short-day tobaccos, but showed a different expression pattern in N. sylvestris. Overexpression of NtSOC1 and NtFUL caused flowering either in strict short-day (NtSOC1) or long-day (NtFUL) Nicotiana varieties under non-inductive photoperiods, indicating that these genes might be limiting for floral induction under non-inductive conditions in different Nicotiana varieties.

    Plant molecular biology 2007;65;3;233-42

  • Pharmacogenomics of non-small cell lung cancer

    Spicer J

    Current Pharmacogenomics 2007;5;228-234

  • Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures.

    Stark A, Lin MF, Kheradpour P, Pedersen JS, Parts L, Carlson JW, Crosby MA, Rasmussen MD, Roy S, Deoras AN, Ruby JG, Brennecke J, Harvard FlyBase curators, Berkeley Drosophila Genome Project, Hodges E, Hinrichs AS, Caspi A, Paten B, Park SW, Han MV, Maeder ML, Polansky BJ, Robson BE, Aerts S, van Helden J, Hassan B, Gilbert DG, Eastman DA, Rice M, Weir M, Hahn MW, Park Y, Dewey CN, Pachter L, Kent WJ, Haussler D, Lai EC, Bartel DP, Hannon GJ, Kaufman TC, Eisen MB, Clark AG, Smith D, Celniker SE, Gelbart WM and Kellis M

    The Broad Institute, Massachusetts Institute of Technology and Harvard University, Cambridge, Massachusetts 02140, USA.

    Sequencing of multiple related species followed by comparative genomics analysis constitutes a powerful approach for the systematic understanding of any genome. Here, we use the genomes of 12 Drosophila species for the de novo discovery of functional elements in the fly. Each type of functional element shows characteristic patterns of change, or 'evolutionary signatures', dictated by its precise selective constraints. Such signatures enable recognition of new protein-coding genes and exons, spurious and incorrect gene annotations, and numerous unusual gene structures, including abundant stop-codon readthrough. Similarly, we predict non-protein-coding RNA genes and structures, and new microRNA (miRNA) genes. We provide evidence of miRNA processing and functionality from both hairpin arms and both DNA strands. We identify several classes of pre- and post-transcriptional regulatory motifs, and predict individual motif instances with high confidence. We also study how discovery power scales with the divergence and number of species compared, and we provide general guidelines for comparative studies.

    Funded by: NHGRI NIH HHS: R01 HG002779-05, R01 HG002779-06, R01 HG004037, R01 HG004037-01A1; NIGMS NIH HHS: R01 GM067031, R01 GM067031-04

    Nature 2007;450;7167;219-32

  • Salmonella enterica serovar typhimurium exploits inflammation to compete with the intestinal microbiota.

    Stecher B, Robbiani R, Walker AW, Westendorf AM, Barthel M, Kremer M, Chaffron S, Macpherson AJ, Buer J, Parkhill J, Dougan G, von Mering C and Hardt WD

    Institute of Microbiology, Swiss Institute of Technology Zurich, Zurich, Switzerland.

    Most mucosal surfaces of the mammalian body are colonized by microbial communities ("microbiota"). A high density of commensal microbiota inhabits the intestine and shields from infection ("colonization resistance"). The virulence strategies allowing enteropathogenic bacteria to successfully compete with the microbiota and overcome colonization resistance are poorly understood. Here, we investigated manipulation of the intestinal microbiota by the enteropathogenic bacterium Salmonella enterica subspecies 1 serovar Typhimurium (S. Tm) in a mouse colitis model: we found that inflammatory host responses induced by S. Tm changed microbiota composition and suppressed its growth. In contrast to wild-type S. Tm, an avirulent invGsseD mutant failing to trigger colitis was outcompeted by the microbiota. This competitive defect was reverted if inflammation was provided concomitantly by mixed infection with wild-type S. Tm or in mice (IL10(-/-), VILLIN-HA(CL4-CD8)) with inflammatory bowel disease. Thus, inflammation is necessary and sufficient for overcoming colonization resistance. This reveals a new concept in infectious disease: in contrast to current thinking, inflammation is not always detrimental for the pathogen. Triggering the host's immune defence can shift the balance between the protective microbiota and the pathogen in favour of the pathogen.

    PLoS biology 2007;5;10;2177-89

  • Conservation and divergence of gene families encoding components of innate immune response systems in zebrafish.

    Stein C, Caccamo M, Laird G and Leptin M

    Institute for Genetics, University of Cologne, Zuelpicher Str. 47, 50674 Cologne, Germany.

    Background: The zebrafish has become a widely used model to study disease resistance and immunity. Although the genes encoding many components of immune signaling pathways have been found in teleost fish, it is not clear whether all components are present or whether the complexity of the signaling mechanisms employed by mammals is similar in fish.

    Results: We searched the genomes of the zebrafish Danio rerio and two pufferfish for genes encoding components of the Toll-like receptor and interferon signaling pathways, the NLR (NACHT-domain and leucine rich repeat containing) protein family, and related proteins. We find that most of the components known in mammals are also present in fish, with clearly recognizable orthologous relationships. The class II cytokines and their receptors have diverged extensively, obscuring orthologies, but the number of receptors is similar in all species analyzed. In the family of the NLR proteins, the canonical members are conserved. We also found a conserved NACHT-domain protein with WD40 repeats that had previously not been described in mammals. Additionally, we have identified in each of the three fish a large species-specific subgroup of NLR proteins that contain a novel amino-terminal domain that is not found in mammalian genomes.

    Conclusion: The main innate immune signaling pathways are conserved in mammals and teleost fish. Whereas the components that act downstream of the receptors are highly conserved, with orthologous sets of genes in mammals and teleosts, components that are known or assumed to interact with pathogens are more divergent and have undergone lineage-specific expansions.

    Funded by: Wellcome Trust: 077198

    Genome biology 2007;8;11;R251

  • Reductive evolution and niche adaptation inferred from the genome of Mycobacterium ulcerans, the causative agent of Buruli ulcer.

    Stinear TP, Seemann T, Pidot S, Frigui W, Reysset G, Garnier T, Meurice G, Simon D, Bouchier C, Ma L, Tichit M, Porter JL, Ryan J, Johnson PD, Davies JK, Jenkin GA, Small PL, Jones LM, Tekaia F, Laval F, Daffé M, Parkhill J and Cole ST

    Unité de Génétique Moléculaire Bactérienne, Institut Pasteur, 75725 Paris Cedex 15, France.

    Mycobacterium ulcerans is found in aquatic ecosystems and causes Buruli ulcer in humans, a neglected but devastating necrotic disease of subcutaneous tissue that is rampant throughout West and Central Africa. Here, we report the complete 5.8-Mb genome sequence of M. ulcerans and show that it comprises two circular replicons, a chromosome of 5632 kb and a virulence plasmid of 174 kb. The plasmid is required for production of the polyketide toxin mycolactone, which provokes necrosis. Comparisons with the recently completed 6.6-Mb genome of Mycobacterium marinum revealed >98% nucleotide sequence identity and genome-wide synteny. However, as well as the plasmid, M. ulcerans has accumulated 213 copies of the insertion sequence IS2404, 91 copies of IS2606, 771 pseudogenes, two bacteriophages, and multiple DNA deletions and rearrangements. These data indicate that M. ulcerans has recently evolved via lateral gene transfer and reductive evolution from the generalist, more rapid-growing environmental species M. marinum to become a niche-adapted specialist. Predictions based on genome inspection for the production of modified mycobacterial virulence factors, such as the highly abundant phthiodiolone lipids, were confirmed by structural analyses. Similarly, 11 protein-coding sequences identified as M. ulcerans-specific by comparative genomics were verified as such by PCR screening a diverse collection of 33 strains of M. ulcerans and M. marinum. This work offers significant insight into the biology and evolution of mycobacterial pathogens and is an important component of international efforts to counter Buruli ulcer.

    Genome research 2007;17;2;192-200

  • Relative impact of nucleotide and copy number variation on gene expression phenotypes.

    Stranger BE, Forrest MS, Dunning M, Ingle CE, Beazley C, Thorne N, Redon R, Bird CP, de Grassi A, Lee C, Tyler-Smith C, Carter N, Scherer SW, Tavaré S, Deloukas P, Hurles ME and Dermitzakis ET

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    Extensive studies are currently being performed to associate disease susceptibility with one form of genetic variation, namely, single-nucleotide polymorphisms (SNPs). In recent years, another type of common genetic variation has been characterized, namely, structural variation, including copy number variants (CNVs). To determine the overall contribution of CNVs to complex phenotypes, we have performed association analyses of expression levels of 14,925 transcripts with SNPs and CNVs in individuals who are part of the International HapMap project. SNPs and CNVs captured 83.6% and 17.7% of the total detected genetic variation in gene expression, respectively, but the signals from the two types of variation had little overlap. Interrogation of the genome for both types of variants may be an effective way to elucidate the causes of complex phenotypes and disease in humans.

    Funded by: Wellcome Trust: 065535, 076113, 077009, 077014, 077046

    Science (New York, N.Y.) 2007;315;5813;848-53

  • Population genomics of human gene expression.

    Stranger BE, Nica AC, Forrest MS, Dimas A, Bird CP, Beazley C, Ingle CE, Dunning M, Flicek P, Koller D, Montgomery S, Tavaré S, Deloukas P and Dermitzakis ET

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Genetic variation influences gene expression, and this variation in gene expression can be efficiently mapped to specific genomic regions and variants. Here we have used gene expression profiling of Epstein-Barr virus-transformed lymphoblastoid cell lines of all 270 individuals genotyped in the HapMap Consortium to elucidate the detailed features of genetic variation underlying gene expression variation. We find that gene expression is heritable and that differentiation between populations is in agreement with earlier small-scale studies. A detailed association analysis of over 2.2 million common SNPs per population (5% frequency in HapMap) with gene expression identified at least 1,348 genes with association signals in cis and at least 180 in trans. Replication in at least one independent population was achieved for 37% of cis signals and 15% of trans signals, respectively. Our results strongly support an abundance of cis-regulatory variation in the human genome. Detection of trans effects is limited but suggests that regulatory variation may be the key primary effect contributing to phenotypic variation in humans. We also explore several methodologies that improve the current state of analysis of gene expression variation.

    Funded by: Wellcome Trust: 077011, 077046

    Nature genetics 2007;39;10;1217-24

  • Proteomic and microarray analyses of the Dictyostelium Zak1-GSK-3 signaling pathway reveal a role in early development.

    Strmecki L, Bloomfield G, Araki T, Dalton E, Skelton J, Schilde C, Harwood A, Williams JG, Ivens A and Pears C

    Biochemistry Department, Oxford University, South Parks Rd., Oxford OX1 3QU, United Kingdom.

    GskA, the Dictyostelium GSK-3 orthologue, is modified and activated by the dual-specificity tyrosine kinase Zak1, and the two kinases form part of a signaling pathway that responds to extracellular cyclic AMP. We identify potential cellular effectors for the two kinases by analyzing the corresponding null mutants. There are proteins and mRNAs that are altered in abundance in only one or the other of the two mutants, indicating that each kinase has some unique functions. However, proteomic and microarray analyses identified a number of proteins and genes, respectively, that are similarly misregulated in both mutant strains. The positive correlation between the array data and the proteomic data is consistent with the Zak1-GskA signaling pathway's functioning by directly or indirectly regulating gene expression. The discoidin 1 genes are positively regulated by the pathway, while the abundance of the H5 protein is negatively regulated. Two of the targets, H5 and discoidin 1, are well-characterized markers for early development, indicating that the Zak1-GskA pathway plays a role in development earlier than previously observed.

    Funded by: Wellcome Trust: 053640/Z, 063612, 064724

    Eukaryotic cell 2007;6;2;245-52

  • Replication timing profile reflects the distinct functional and genomic features of the MHC class II region.

    Takousis P, Johonnett P, Williamson J, Sasieni P, Warnes G, Forshew T, Azuara V, Fisher A, Wu PJ, Jones T, Vatcheva R, Beck S and Sheer D

    Human Cytogenetics Laboratory, Cancer Research UK, London Research Institute, London, UK.

    The timing of DNA replication generally correlates with transcription, gene density and sequence composition. How is the timing affected if a genomic region has a combination of features that individually correlate with either early or late replication? The major histocompatibility complex (MHC) class II region is an AT-rich isochore that would be expected to replicate late, but it also contains coordinately regulated genes that are highly expressed in antigen-presenting cells and are strongly inducible in other cell types. Using cytological and biochemical assays, we find that the entire MHC replicates within the first half of S-phase, and that the class II region replicates slightly later than the adjacent regions irrespective of gene expression. These data suggest that despite AT-richness, an early-to-middle replication time in the class II region is defined by an open chromatin conformation that allows rapid transcriptional activation as a defence against pathogens.

    Funded by: Cancer Research UK: A8318, C5321/A8318; Medical Research Council: MC_U120027516

    Cell cycle (Georgetown, Tex.) 2007;6;19;2393-8

  • Analysis of genetic variation in Akt2/PKB-beta in severe insulin resistance, lipodystrophy, type 2 diabetes, and related metabolic phenotypes.

    Tan K, Kimber WA, Luan J, Soos MA, Semple RK, Wareham NJ, O'Rahilly S and Barroso I

    Metabolic Disease Group, Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, U.K.

    We previously reported a family in which a heterozygous missense mutation in Akt2 led to a dominantly inherited syndrome of insulin-resistant diabetes and partial lipodystrophy. To determine whether genetic variation in AKT2 plays a broader role in human metabolic disease, we sequenced the entire coding region and splice junctions of AKT2 in 94 unrelated patients with severe insulin resistance, 35 of whom had partial lipodystrophy. Two rare missense mutations (R208K and R467W) were identified in single individuals. However, insulin-stimulated kinase activities of these variants were indistinguishable from wild type. In two large case-control studies (total number of participants 2,200), none of 11 common single nucleotide polymorphisms (SNPs) in AKT2 showed significant association with type 2 diabetes. In a quantitative trait study of 1,721 extensively phenotyped individuals from the U.K., no association was found with any relevant intermediate metabolic trait. In summary, although heterozygous loss-of-function mutations in AKT2 can cause a syndrome of severe insulin resistance and lipodystrophy in humans, such mutations are uncommon causes of these syndromes. Furthermore, genetic variation in and around the AKT2 locus is unlikely to contribute significantly to the risk of type 2 diabetes or related intermediate metabolic traits in U.K. populations.

    Funded by: Medical Research Council: MC_U106179471; Wellcome Trust: 077016

    Diabetes 2007;56;3;714-9

  • A novel Gln358Glu mutation in ectodysplasin A associated with X-linked dominant incisor hypodontia.

    Tarpey P, Pemberton TJ, Stockton DW, Das P, Ninis V, Edkins S, Futreal PA, Wooster R, Kamath S, Nayak R, Stratton MR and Patel PI

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Funded by: NCI NIH HHS: C06 CA62528-01; NCRR NIH HHS: C06 RR10600-01, C06 RR14514-01; NIDCR NIH HHS: DE014102; Wellcome Trust

    American journal of medical genetics. Part A 2007;143;4;390-4

  • Mutations in UPF3B, a member of the nonsense-mediated mRNA decay complex, cause syndromic and nonsyndromic mental retardation.

    Tarpey PS, Raymond FL, Nguyen LS, Rodriguez J, Hackett A, Vandeleur L, Smith R, Shoubridge C, Edkins S, Stevens C, O'Meara S, Tofts C, Barthorpe S, Buck G, Cole J, Halliday K, Hills K, Jones D, Mironenko T, Perry J, Varian J, West S, Widaa S, Teague J, Dicks E, Butler A, Menzies A, Richardson D, Jenkinson A, Shepherd R, Raine K, Moon J, Luo Y, Parnau J, Bhat SS, Gardner A, Corbett M, Brooks D, Thomas P, Parkinson-Lawrence E, Porteous ME, Warner JP, Sanderson T, Pearson P, Simensen RJ, Skinner C, Hoganson G, Superneau D, Wooster R, Bobrow M, Turner G, Stevenson RE, Schwartz CE, Futreal PA, Srivastava AK, Stratton MR and Gécz J

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    Nonsense-mediated mRNA decay (NMD) is of universal biological significance. It has emerged as an important global RNA, DNA and translation regulatory pathway. By systematically sequencing 737 genes (annotated in the Vertebrate Genome Annotation database) on the human X chromosome in 250 families with X-linked mental retardation, we identified mutations in the UPF3 regulator of nonsense transcripts homolog B (yeast) (UPF3B) leading to protein truncations in three families: two with the Lujan-Fryns phenotype and one with the FG phenotype. We also identified a missense mutation in another family with nonsyndromic mental retardation. Three mutations lead to the introduction of a premature termination codon and subsequent NMD of mutant UPF3B mRNA. Protein blot analysis using lymphoblastoid cell lines from affected individuals showed an absence of the UPF3B protein in two families. The UPF3B protein is an important component of the NMD surveillance machinery. Our results directly implicate abnormalities of NMD in human disease and suggest at least partial redundancy of NMD pathways.

    Funded by: NICHD NIH HHS: HD26202; Wellcome Trust: 077012

    Nature genetics 2007;39;9;1127-33

  • Mutations in CUL4B, which encodes a ubiquitin E3 ligase subunit, cause an X-linked mental retardation syndrome associated with aggressive outbursts, seizures, relative macrocephaly, central obesity, hypogonadism, pes cavus, and tremor.

    Tarpey PS, Raymond FL, O'Meara S, Edkins S, Teague J, Butler A, Dicks E, Stevens C, Tofts C, Avis T, Barthorpe S, Buck G, Cole J, Gray K, Halliday K, Harrison R, Hills K, Jenkinson A, Jones D, Menzies A, Mironenko T, Perry J, Raine K, Richardson D, Shepherd R, Small A, Varian J, West S, Widaa S, Mallya U, Moon J, Luo Y, Holder S, Smithson SF, Hurst JA, Clayton-Smith J, Kerr B, Boyle J, Shaw M, Vandeleur L, Rodriguez J, Slaugh R, Easton DF, Wooster R, Bobrow M, Srivastava AK, Stevenson RE, Schwartz CE, Turner G, Gecz J, Futreal PA, Stratton MR and Partington M

    Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    We have identified three truncating, two splice-site, and three missense variants at conserved amino acids in the CUL4B gene on Xq24 in 8 of 250 families with X-linked mental retardation (XLMR). During affected subjects' adolescence, a syndrome emerged with delayed puberty, hypogonadism, relative macrocephaly, moderate short stature, central obesity, unprovoked aggressive outbursts, fine intention tremor, pes cavus, and abnormalities of the toes. This syndrome was first described by Cabezas et al., in a family that was included in our study and that carried a CUL4B missense variant. CUL4B is a ubiquitin E3 ligase subunit implicated in the regulation of several biological processes, and CUL4B is the first XLMR gene that encodes an E3 ubiquitin ligase. The relatively high frequency of CUL4B mutations in this series indicates that it is one of the most commonly mutated genes underlying XLMR and suggests that its introduction into clinical diagnostics should be a high priority.

    Funded by: NICHD NIH HHS: HD26202; Wellcome Trust

    American journal of human genetics 2007;80;2;345-52

  • ProteomeBinders: planning a European resource of affinity reagents for analysis of the human proteome.

    Taussig MJ, Stoevesandt O, Borrebaeck CA, Bradbury AR, Cahill D, Cambillau C, de Daruvar A, Dübel S, Eichler J, Frank R, Gibson TJ, Gloriam D, Gold L, Herberg FW, Hermjakob H, Hoheisel JD, Joos TO, Kallioniemi O, Koegl M, Konthur Z, Korn B, Kremmer E, Krobitsch S, Landegren U, van der Maarel S, McCafferty J, Muyldermans S, Nygren PA, Palcy S, Plückthun A, Polic B, Przybylski M, Saviranta P, Sawyer A, Sherman DJ, Skerra A, Templin M, Ueffing M and Uhlén M

    Technology Research Group, The Babraham Institute, Cambridge CB22 3AT, UK.

    ProteomeBinders is a new European consortium aiming to establish a comprehensive resource of well-characterized affinity reagents, including but not limited to antibodies, for analysis of the human proteome. Given the huge diversity of the proteome, the scale of the project is potentially immense but nevertheless feasible in the context of a pan-European or even worldwide coordination.

    Nature methods 2007;4;1;13-7

  • A genotype calling algorithm for the Illumina BeadArray platform.

    Teo YY, Inouye M, Small KS, Gwilliam R, Deloukas P, Kwiatkowski DP and Clark TG

    Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford OX3 7BN, UK.

    Motivation: Large-scale genotyping relies on the use of unsupervised automated calling algorithms to assign genotypes to hybridization data. A number of such calling algorithms have been recently established for the Affymetrix GeneChip genotyping technology. Here, we present a fast and accurate genotype calling algorithm for the Illumina BeadArray genotyping platforms. As the technology moves towards assaying millions of genetic polymorphisms simultaneously, there is a need for an integrated and easy-to-use software for calling genotypes.

    Results: We have introduced a model-based genotype calling algorithm which does not rely on having prior training data or require computationally intensive procedures. The algorithm can assign genotypes to hybridization data from thousands of individuals simultaneously and pools information across multiple individuals to improve the calling. The method can accommodate variations in hybridization intensities which result in dramatic shifts of the position of the genotype clouds by identifying the optimal coordinates to initialize the algorithm. By incorporating the process of perturbation analysis, we can obtain a quality metric measuring the stability of the assigned genotype calls. We show that this quality metric can be used to identify SNPs with low call rates and accuracy.

    Availability: The C++ executable for the algorithm described here is available by request from the authors.

    Funded by: Medical Research Council: G0600230, G19/9; Wellcome Trust: 077011, 082370

    Bioinformatics (Oxford, England) 2007;23;20;2741-6

  • Sequence-based analysis of pQBR103; a representative of a unique, transfer-proficient mega plasmid resident in the microbial community of sugar beet.

    Tett A, Spiers AJ, Crossman LC, Ager D, Ciric L, Dow JM, Fry JC, Harris D, Lilley A, Oliver A, Parkhill J, Quail MA, Rainey PB, Saunders NJ, Seeger K, Snyder LA, Squares R, Thomas CM, Turner SL, Zhang XX, Field D and Bailey MJ

    Centre for Ecology and Hydrology-Oxford, Oxford, UK.

    The plasmid pQBR103 was found within Pseudomonas populations colonizing the leaf and root surfaces of sugar beet plants growing at Wytham, Oxfordshire, UK. At 425 kb it is the largest self-transmissible plasmid yet sequenced from the phytosphere. It is known to enhance the competitive fitness of its host, and parts of the plasmid are known to be actively transcribed in the plant environment. Analysis of the complete sequence of this plasmid predicts a coding sequence (CDS)-rich genome containing 478 CDSs and an exceptional degree of genetic novelty; 80% of predicted coding sequences cannot be ascribed a function and 60% are orphans. Of those to which function could be assigned, 40% bore greatest similarity to sequences from Pseudomonas spp, and the majority of the remainder showed similarity to other gamma-proteobacterial genera and plasmids. pQBR103 has identifiable regions presumed responsible for replication and partitioning, but despite being tra+ lacks the full complement of any previously described conjugal transfer functions. The DNA sequence provided few insights into the functional significance of plant-induced transcriptional regions, but suggests that 14% of CDSs may be expressed (11 CDSs with functional annotation and 54 without), further highlighting the ecological importance of these novel CDSs. Comparative analysis indicates that pQBR103 shares significant regions of sequence with other plasmids isolated from sugar beet plants grown at the same geographic location. These plasmid sequences indicate there is more novelty in the mobile DNA pool accessible to phytosphere pseudomonas than is currently appreciated or understood.

    Funded by: Wellcome Trust: 082372

    The ISME journal 2007;1;4;331-40

  • A cytogenetically characterized, genome-anchored 10-Mb BAC set and CGH array for the domestic dog.

    Thomas R, Duke SE, Bloom SK, Breen TE, Young AC, Feiste E, Seiser EL, Tsai PC, Langford CF, Ellis P, Karlsson EK, Lindblad-Toh K and Breen M

    Department of Molecular Biomedical Sciences, College of Veterinary Medicine, North Carolina State University, 4700 Hillsborough Street, Raleigh, NC 27606, USA.

    The generation of a 7.5x dog genome assembly provides exciting new opportunities to interpret tumor-associated chromosome aberrations at the biological level. We present a genomic microarray for array comparative genomic hybridization (aCGH) analysis in the dog, comprising 275 bacterial artificial chromosome (BAC) clones spaced at intervals of approximately 10 Mb. Each clone has been positioned accurately within the genome assembly and assigned to a unique chromosome location by fluorescence in situ hybridization (FISH) analysis, both individually and as chromosome-specific BAC pools. The microarray also contains clones representing the dog orthologues of 31 genes implicated in human cancers. FISH analysis of the 10-Mb BAC clone set indicated excellent coverage of each dog chromosome by the genome assembly. The order of clones was consistent with the assembly, but the cytogenetic intervals between clones were variable. We demonstrate the application of the BAC array for aCGH analysis to identify both whole and partial chromosome imbalances using a canine histiocytic sarcoma case. Using BAC clones selected from the array as probes, multicolor FISH analysis was used to further characterize these imbalances, revealing numerous structural chromosome rearrangements. We outline the value of a combined aCGH/FISH approach, together with a well-annotated dog genome assembly, in canine and comparative cancer studies.

    Funded by: NINDS NIH HHS: R21 NS051190-01; Wellcome Trust

    The Journal of heredity 2007;98;5;474-84

  • Comparative genome analyses of the pathogenic Yersiniae based on the genome sequence of Yersinia enterocolitica strain 8081.

    Thomson NR, Howard S, Wren BW and Prentice MB

    The Pathogen Sequencing Unit, Wellcome Trust Genome Campus, The Wellcome Trust Sanger Institute, Cambridge, UK.

    This chapter summarises the findings from the Yersinia enterocolitica strain 8081 whole genome sequence and the associated microarray analysis. Sections 1 and 2 provide an introduction to the species and an overview of the general features of the genome. Section 3 identifies regions within the genome that highlight important differences in gene function separating the three pathogenic Yersinias. Section 4 describes genomic loci conferring important, species-specific, metabolic and virulence traits. Section 5 details extensive microarray data to provide an overview of species-specific core Y. enterocolitica gene functions and important insights into the intra-species differences between the high, low and non-pathogenic Y. enterocolitica biotypes.

    Advances in experimental medicine and biology 2007;603;2-16

  • Rheumatoid arthritis association at 6q23.

    Thomson W, Barton A, Ke X, Eyre S, Hinks A, Bowes J, Donn R, Symmons D, Hider S, Bruce IN, Wellcome Trust Case Control Consortium, Wilson AG, Marinou I, Morgan A, Emery P, YEAR Consortium, Carter A, Steer S, Hocking L, Reid DM, Wordsworth P, Harrison P, Strachan D and Worthington J

    Arthritis Research Campaign (arc)-Epidemiology Unit, Stopford Building, The University of Manchester, Manchester M13 9PT, UK.

    The Wellcome Trust Case Control Consortium (WTCCC) identified nine single SNPs putatively associated with rheumatoid arthritis at P = 1 x 10(-5) - 5 x 10(-7) in a genome-wide association screen. One, rs6920220, was unequivocally replicated (trend P = 1.1 x 10(-8)) in a validation study, as described here. This SNP maps to 6q23, between the genes oligodendrocyte lineage transcription factor 3 (OLIG3) and tumor necrosis factor-alpha-induced protein 3 (TNFAIP3).

    Funded by: Arthritis Research UK: 17552; Medical Research Council: G0000934, G0000934(68341); Wellcome Trust: 068545, 068545/Z/02, 076113

    Nature genetics 2007;39;12;1431-3

  • Whole-genome array-CGH for detection of submicroscopic chromosomal imbalances in children with mental retardation.

    Thuresson AC, Bondeson ML, Edeby C, Ellis P, Langford C, Dumanski JP and Annerén G

    Department of Genetics and Pathology, Rudbeck Laboratory, Uppsala University, Uppsala, Sweden.

    Chromosomal imbalances are the major cause of mental retardation (MR). Many of these imbalances are caused by submicroscopic deletions or duplications not detected by conventional cytogenetic methods. Microarray-based comparative genomic hybridization (array-CGH) is considered to be superior for the investigation of chromosomal aberrations in children with MR, and has been demonstrated to improve the diagnostic detection rate of these small chromosomal abnormalities. In this study we used 1 Mb genome-wide array-CGH to screen 48 children with MR and congenital malformations for submicroscopic chromosomal imbalances, where the underlying cause was unknown. All children were clinically investigated and subtelomere FISH analysis had been performed in all cases. Suspected microdeletion syndromes such as deletion 22q11.2, Williams-Beuren and Angelman syndromes were excluded before array-CGH analysis was performed. We identified de novo interstitial chromosomal imbalances in two patients (4%), and an interstitial deletion inherited from an affected mother in one patient (2%). In another two of the children (4%), suspected imbalances were detected but were also found in one of the non-affected parents. The yield of de novo alterations detected in this study is somewhat lower than previously described, which might reflect the importance of the patient selection criteria applied before array-CGH analysis is performed. However, array-CGH proved to be a high-quality and reliable tool for genome-wide screening of MR patients of unknown etiology.

    Cytogenetic and genome research 2007;118;1;1-7

  • Convergent adaptation of human lactase persistence in Africa and Europe.

    Tishkoff SA, Reed FA, Ranciaro A, Voight BF, Babbitt CC, Silverman JS, Powell K, Mortensen HM, Hirbo JB, Osman M, Ibrahim M, Omar SA, Lema G, Nyambo TB, Ghori J, Bumpstead S, Pritchard JK, Wray GA and Deloukas P

    Department of Biology, University of Maryland, College Park, Maryland 20742, USA.

    A SNP in the gene encoding lactase (LCT) (C/T-13910) is associated with the ability to digest milk as adults (lactase persistence) in Europeans, but the genetic basis of lactase persistence in Africans was previously unknown. We conducted a genotype-phenotype association study in 470 Tanzanians, Kenyans and Sudanese and identified three SNPs (G/C-14010, T/G-13915 and C/G-13907) that are associated with lactase persistence and that have derived alleles that significantly enhance transcription from the LCT promoter in vitro. These SNPs originated on different haplotype backgrounds from the European C/T-13910 SNP and from each other. Genotyping across a 3-Mb region demonstrated haplotype homozygosity extending >2.0 Mb on chromosomes carrying C-14010, consistent with a selective sweep over the past approximately 7,000 years. These data provide a marked example of convergent evolution due to strong selective pressure resulting from shared cultural traits: animal domestication and adult milk consumption.

    Funded by: NHGRI NIH HHS: F32HG03801, HG002772-1; NIGMS NIH HHS: R01GM076637; Wellcome Trust: 076113

    Nature genetics 2007;39;1;31-40

  • Burkholderia Hep_Hag autotransporter (BuHA) proteins elicit a strong antibody response during experimental glanders but not human melioidosis.

    Tiyawisutsri R, Holden MT, Tumapa S, Rengpipat S, Clarke SR, Foster SJ, Nierman WC, Day NP and Peacock SJ

    Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand.

    Background: The bacterial biothreat agents Burkholderia mallei and Burkholderia pseudomallei are the cause of glanders and melioidosis, respectively. Genomic and epidemiological studies have shown that B. mallei is a recently emerged, host restricted clone of B. pseudomallei.

    Results: Using bacteriophage-mediated immunoscreening we identified genes expressed in vivo during experimental equine glanders infection. A family of immunodominant antigens was identified that share protein domain architectures with hemagglutinins and invasins. These have been designated Burkholderia Hep_Hag autotransporter (BuHA) proteins. A total of 110/207 positive clones (53%) of a B. mallei expression library screened with sera from two infected horses belonged to this family. This contrasted with 6/189 positive clones (3%) of a B. pseudomallei expression library screened with serum from 21 patients with culture-proven melioidosis.

    Conclusion: Members of the BuHA proteins are found in other Gram-negative bacteria and have been shown to have important roles related to virulence. Compared with other bacterial species, the genomes of both B. mallei and B. pseudomallei contain a relative abundance of this family of proteins. The domain structures of these proteins suggest that they function as multimeric surface proteins that modulate interactions of the cell with the host and environment. Their effect on the cellular immune response to B. mallei and their potential as diagnostics for glanders require further study.

    Funded by: Wellcome Trust

    BMC microbiology 2007;7;19

  • Robust associations of four new chromosome regions from genome-wide analyses of type 1 diabetes.

    Todd JA, Walker NM, Cooper JD, Smyth DJ, Downes K, Plagnol V, Bailey R, Nejentsev S, Field SF, Payne F, Lowe CE, Szeszko JS, Hafler JP, Zeitels L, Yang JH, Vella A, Nutland S, Stevens HE, Schuilenburg H, Coleman G, Maisuria M, Meadows W, Smink LJ, Healy B, Burren OS, Lam AA, Ovington NR, Allen J, Adlem E, Leung HT, Wallace C, Howson JM, Guja C, Ionescu-Tîrgovişte C, Genetics of Type 1 Diabetes in Finland, Simmonds MJ, Heward JM, Gough SC, Wellcome Trust Case Control Consortium, Dunger DB, Wicker LS and Clayton DG

    Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, University of Cambridge, Addenbrooke's Hospital, Cambridge CB2 0XY, UK.

    The Wellcome Trust Case Control Consortium (WTCCC) primary genome-wide association (GWA) scan on seven diseases, including the multifactorial autoimmune disease type 1 diabetes (T1D), shows associations at P < 5 x 10(-7) between T1D and six chromosome regions: 12q24, 12q13, 16p13, 18p11, 12p13 and 4q27. Here, we attempted to validate these and six other top findings in 4,000 individuals with T1D, 5,000 controls and 2,997 family trios independent of the WTCCC study. We confirmed unequivocally the associations of 12q24, 12q13, 16p13 and 18p11 (P(follow-up) ≤ 1.35 x 10(-9); P(overall) ≤ 1.15 x 10(-14)), leaving eight regions with small effects or false-positive associations. We also obtained evidence for chromosome 18q22 (P(overall) = 1.38 x 10(-8)) from a GWA study of nonsynonymous SNPs. Several regions, including 18q22 and 18p11, showed association with autoimmune thyroid disease. This study increases the number of T1D loci with compelling evidence from six to at least ten.

    Funded by: Medical Research Council: G0000934; Wellcome Trust: 061858, 061859, 089989

    Nature genetics 2007;39;7;857-64

  • Look who's talking too: graduates developing skills through communication.

    Tomazou EM and Powell GT

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.

    Greater opportunities for young scientists to present their doctoral research to large general audiences will encourage development of transferable skills and involvement in the scientific community. We look at ways students communicate their research and explore the benefits of student-led meetings. The organization of the first Sanger-Cambridge Ph.D. Symposium provides an example of how students can act to establish forums for their work and we call on other young scientists to do the same.

    Nature reviews. Genetics 2007;8;9;724-6

  • IL23R variation determines susceptibility but not disease phenotype in inflammatory bowel disease.

    Tremelling M, Cummings F, Fisher SA, Mansfield J, Gwilliam R, Keniry A, Nimmo ER, Drummond H, Onnie CM, Prescott NJ, Sanderson J, Bredin F, Berzuini C, Forbes A, Lewis CM, Cardon L, Deloukas P, Jewell D, Mathew CG, Parkes M and Satsangi J

    IBD Research Group, Addenbrooke's Hospital, University of Cambridge, Cambridge, England, UK.

    Identification of inflammatory bowel disease (IBD) susceptibility genes is key to understanding pathogenic mechanisms. Recently, the North American IBD Genetics Consortium provided compelling evidence for an association between ileal Crohn's disease (CD) and the IL23R gene using genome-wide association scanning. External replication is a priority, both to confirm this finding in other populations and to validate this new technique. We tested for association between IL23R and IBD in a large independent UK panel to determine the size of the effect and explore subphenotype correlation and interaction with CARD15.

    Methods: Eight single nucleotide polymorphism markers in IL23R tested in the North American study were genotyped in 1902 cases of Crohn's disease (CD), 975 cases of ulcerative colitis (UC), and 1345 controls using MassARRAY. Data were analyzed using chi(2) statistics, and subgroup association was sought.

    Results: A highly significant association with CD was observed, with the strongest signal at coding variant Arg381Gln (allele frequency, 2.5% in CD vs 6.2% in controls [P = 1.1 x 10(-12)]; odds ratio, 0.38; 95% confidence interval, 0.29-0.50). A weaker effect was seen in UC (allele frequency, 4.6%; odds ratio, 0.73; 95% confidence interval, 0.55-0.96). Analysis accounting for Arg381Gln suggested that other loci within IL23R also influence IBD susceptibility. Within CD, there were no subphenotype associations or evidence of interaction with CARD15.

    Conclusions: This study shows an association between IL23R and all subphenotypes of CD with a smaller effect on UC. This extends the findings of the North American study, providing clear evidence that genome-wide association scanning can successfully identify true complex disease genes.

    Funded by: Medical Research Council: G0000934(68341); Wellcome Trust: 068545, 068545/Z/02, 077011

    Gastroenterology 2007;132;5;1657-64

  • The implications of alternative splicing in the ENCODE protein complement.

    Tress ML, Martelli PL, Frankish A, Reeves GA, Wesselink JJ, Yeats C, Olason PI, Albrecht M, Hegyi H, Giorgetti A, Raimondo D, Lagarde J, Laskowski RA, López G, Sadowski MI, Watson JD, Fariselli P, Rossi I, Nagy A, Kai W, Størling Z, Orsini M, Assenov Y, Blankenburg H, Huthmacher C, Ramírez F, Schlicker A, Denoeud F, Jones P, Kerrien S, Orchard S, Antonarakis SE, Reymond A, Birney E, Brunak S, Casadio R, Guigo R, Harrow J, Hermjakob H, Jones DT, Lengauer T, Orengo CA, Patthy L, Thornton JM, Tramontano A and Valencia A

    Structural Computational Biology Programme, Spanish National Cancer Research Centre, E-28029 Madrid, Spain.

    Alternative premessenger RNA splicing enables genes to generate more than one gene product. Splicing events that occur within protein coding regions have the potential to alter the biological function of the expressed protein and even to create new protein functions. Alternative splicing has been suggested as one explanation for the discrepancy between the number of human genes and functional complexity. Here, we carry out a detailed study of the alternatively spliced gene products annotated in the ENCODE pilot project. We find that alternative splicing in human genes is more frequent than has commonly been suggested, and we demonstrate that many of the potential alternative gene products will have markedly different structure and function from their constitutively spliced counterparts. For the vast majority of these alternative isoforms, little evidence exists to suggest they have a role as functional proteins, and it seems unlikely that the spectrum of conventional enzymatic or structural functions can be substantially extended through alternative splicing.

    Funded by: Wellcome Trust: 062023, 077198

    Proceedings of the National Academy of Sciences of the United States of America 2007;104;13;5495-500

  • Y Chromosome

    Tyler-Smith C

    Encyclopedia of Life Sciences. 2007

  • Mitochondrial permeabilization relies on BH3 ligands engaging multiple prosurvival Bcl-2 relatives, not Bak.

    Uren RT, Dewson G, Chen L, Coyne SC, Huang DC, Adams JM and Kluck RM

    The Walter and Eliza Hall Institute of Medical Research, Melbourne, Victoria 3050, Australia.

    The Bcl-2 family regulates apoptosis by controlling mitochondrial integrity. To clarify whether its prosurvival members function by sequestering their Bcl-2 homology 3 (BH3)-only ligands or their multidomain relatives Bak and Bax, we analyzed whether four prosurvival proteins differing in their ability to bind specific BH3 peptides or Bak could protect isolated mitochondria. Most BH3 peptides could induce temperature-dependent cytochrome c release, but permeabilization was prevented by Bcl-x(L), Bcl-w, Mcl-1, or BHRF1. However, their protection correlated with the ability to bind Bak rather than the added BH3 peptide and could be overcome only by BH3 peptides that bind directly to the appropriate prosurvival member. Mitochondria protected by both Bcl-x(L)-like and Mcl-1 proteins were disrupted only by BH3 peptides that engage both. BH3-only reagents freed Bak from Bcl-x(L) and Mcl-1 in mitochondrial and cell lysates. The findings support a model for the control of apoptosis in which certain prosurvival proteins sequester Bak/Bax, and BH3-only proteins must neutralize all protective prosurvival proteins to allow Bak/Bax to induce mitochondrial disruption.

    Funded by: NCI NIH HHS: CA80188; Wellcome Trust

    The Journal of cell biology 2007;177;2;277-87

  • Clustered gene expression changes flank targeted gene loci in knockout mice.

    Valor LM and Grant SG

    Genes to Cognition Programme, The Wellcome Trust Sanger Institute, Cambridge, United Kingdom.

    Background: Gene expression profiling using microarrays is a powerful technology widely used to study regulatory networks. Profiling of mRNA levels in mutant organisms has the potential to identify genes regulated by the mutated protein.

    Using tissues from multiple lines of knockout mice we have examined genome-wide changes in gene expression. We report that a significant proportion of changed genes were found near the targeted gene.

    The apparent clustering of these genes was explained by the presence of flanking DNA from the parental ES cell. We provide recommendations for the analysis and reporting of microarray data from knockout mice.

    Funded by: Wellcome Trust

    PloS one 2007;2;12;e1303

  • Integrating synapse proteomics with transcriptional regulation.

    Valor LM and Grant SG

    Genes to Cognition Programme, Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK.

    The mammalian postsynaptic proteome (PSP) comprises a highly interconnected set of approximately 1,000 proteins. The PSP is organized into macromolecular complexes that have a modular architecture defined by protein interactions and function. Signals initiated by neurotransmitter receptors are integrated by these complexes and their constituent enzymes to orchestrate multiple downstream cellular changes, including transcriptional regulation of genes at the nucleus. Genome-wide transcriptome studies are beginning to map the sets of genes regulated by the synapse proteome. Conversely, understanding the transcriptional regulation of genes encoding the synapse proteome will shed light on synapse formation. Mutations that disrupt synapse signalling complexes result in cognitive impairments in mice and humans, and recent evidence indicates that these mutations change gene expression profiles. We discuss the need for global approaches combining genetics, transcriptomics and proteomics in order to understand cognitive function and disruption in diseases.

    Behavior genetics 2007;37;1;18-30

  • Network activity-independent coordinated gene expression program for synapse assembly.

    Valor LM, Charlesworth P, Humphreys L, Anderson CN and Grant SG

    Genes to Cognition Programme, The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, United Kingdom.

    Global biological datasets generated by genomics, transcriptomics, and proteomics provide new approaches to understanding the relationship between the genome and the synapse. Combined transcriptome analysis and multielectrode recordings of neuronal network activity were used in mouse embryonic primary neuronal cultures to examine synapse formation and activity-dependent gene regulation. Evidence for a coordinated gene expression program for assembly of synapses was observed in the expression of 642 genes encoding postsynaptic and plasticity proteins. This synaptogenesis gene expression program preceded protein expression of synapse markers and onset of spiking activity. Continued expression was followed by maturation of morphology and electrical neuronal networks, which was then followed by the expression of activity-dependent genes. Thus, two distinct sequentially active gene expression programs underlie the genomic programs of synapse function.

    Funded by: Wellcome Trust

    Proceedings of the National Academy of Sciences of the United States of America 2007;104;11;4658-63

  • The Ras-association domain family (RASSF) members and their role in human tumourigenesis.

    van der Weyden L and Adams DJ

    Experimental Cancer Genetics Laboratory, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton Cambridge, UK.

    Ras proteins play a direct causal role in human cancer with activating mutations in Ras occurring in approximately 30% of tumours. Ras effectors also contribute to cancer, as mutations occur in Ras effectors, notably B-Raf and PI3-K, and drugs blocking elements of these pathways are in clinical development. In 2000, a new Ras effector was identified, RAS-association domain family 1 (RASSF1), and expression of the RASSF1A isoform of this gene is silenced in tumours by methylation of its promoter. Since methylation is reversible and demethylating agents are currently being used in clinical trials, detection of RASSF1A silencing by promoter hypermethylation has potential clinical uses in cancer diagnosis, prognosis and treatment. RASSF1A belongs to a new family of RAS effectors, of which there are currently 8 members (RASSF1-8). RASSF1-6 each contain a variable N-terminal segment followed by a Ras-association (RA) domain of the Ral-GDS/AF6 type, and a specialised coiled-coil structure known as a SARAH domain extending to the C-terminus. RASSF7-8 contain an N-terminal RA domain and a variable C-terminus. Members of the RASSF family are thought to function as tumour suppressors by regulating the cell cycle and apoptosis. This review will summarise our current knowledge of each member of the RASSF family and in particular what role they play in tumourigenesis, with a special focus on RASSF1A, whose promoter methylation is one of the most frequent alterations found in human tumours.

    Funded by: Cancer Research UK: A6997; Wellcome Trust

    Biochimica et biophysica acta 2007;1776;1;58-85

  • Subunit vaccines based on intimin and Efa-1 polypeptides induce humoral immunity in cattle but do not protect against intestinal colonisation by enterohaemorrhagic Escherichia coli O157:H7 or O26:H-.

    van Diemen PM, Dziva F, Abu-Median A, Wallis TS, van den Bosch H, Dougan G, Chanter N, Frankel G and Stevens MP

    Institute for Animal Health, Compton, Berkshire RG20 7NN, UK.

    Enterohaemorrhagic Escherichia coli (EHEC) infections in humans are an important public health concern and are commonly acquired via contact with ruminant faeces. Cattle are a key control point; however, cross-protective vaccines for the control of EHEC in the bovine reservoir do not yet exist. The EHEC serogroups that are predominantly associated with human infection in Europe and North America are O157 and O26. Intimin and EHEC factor for adherence (Efa-1) play important roles in intestinal colonisation of cattle by EHEC and are thus attractive candidates for the development of subunit vaccines. Immunisation of calves with the cell-binding domain of intimin subtypes beta or gamma via the intramuscular route induced antigen-specific serum IgG1 and, in some cases, salivary IgA responses, but did not reduce the magnitude or duration of faecal excretion of EHEC O26:H- (Int(280)-beta) or EHEC O157:H7 (Int(280)-gamma) upon subsequent experimental challenge. Similarly, immunisation of calves via the intramuscular route with the truncated Efa-1 protein (Efa-1') from EHEC O157:H7 or a mixture of the amino-terminal and central thirds of the full-length protein (Efa-1-N and M) did not protect against intestinal colonisation by EHEC O157:H7 (Efa-1') or EHEC O26:H- (Efa-1-N and M) despite the induction of humoral immunity. A portion of the serum IgG1 elicited by the truncated recombinant antigens in calves was confirmed to recognise native protein exposed on the bacterial surface. Calves immunised with a mixture of Int(280)-gamma and Efa-1' or an EHEC O157:H7 bacterin via the intramuscular route and then boosted via the intranasal route with the same antigens using cholera toxin B subunit as an adjuvant were also not protected against intestinal colonisation by EHEC O157:H7. These findings highlight the need for further studies to develop and test novel vaccines or treatments for control of this important foodborne pathogen.

    Funded by: Wellcome Trust: 076962

    Veterinary immunology and immunopathology 2007;116;1-2;47-58

  • A genome-wide association study for celiac disease identifies risk variants in the region harboring IL2 and IL21.

    van Heel DA, Franke L, Hunt KA, Gwilliam R, Zhernakova A, Inouye M, Wapenaar MC, Barnardo MC, Bethel G, Holmes GK, Feighery C, Jewell D, Kelleher D, Kumar P, Travis S, Walters JR, Sanders DS, Howdle P, Swift J, Playford RJ, McLaren WM, Mearin ML, Mulder CJ, McManus R, McGinnis R, Cardon LR, Deloukas P and Wijmenga C

    Centre for Gastroenterology, Institute of Cell and Molecular Science, Queen Mary University of London, London E1 2AT, UK.

    We tested 310,605 SNPs for association in 778 individuals with celiac disease and 1,422 controls. Outside the HLA region, the most significant finding (rs13119723; P = 2.0 x 10(-7)) was in the KIAA1109-TENR-IL2-IL21 linkage disequilibrium block. We independently confirmed association in two further collections (strongest association at rs6822844, 24 kb 5' of IL21; meta-analysis P = 1.3 x 10(-14), odds ratio = 0.63), suggesting that genetic variation in this region predisposes to celiac disease.

    Funded by: Medical Research Council: G0000934; Wellcome Trust: 068094, 068545/Z/02, GR068094MA

    Nature genetics 2007;39;7;827-9

  • Assembly of the Candida albicans genome into sixteen supercontigs aligned on the eight chromosomes.

    van het Hoog M, Rast TJ, Martchenko M, Grindle S, Dignard D, Hogues H, Cuomo C, Berriman M, Scherer S, Magee BB, Whiteway M, Chibana H, Nantel A and Magee PT

    Biotechnology Research Institute, National Research Council of Canada, Montreal, Quebec, Canada.

    Background: The 10.9x genomic sequence of Candida albicans, the most important human fungal pathogen, was published in 2004. Assembly 19 consisted of 412 supercontigs, of which 266 were a haploid set, since this fungus is diploid and contains an extensive degree of heterozygosity but lacks a complete sexual cycle. However, sequences of specific chromosomes were not determined.

    Results: Supercontigs from Assembly 19 (183, representing 98.4% of the sequence) were assigned to individual chromosomes purified by pulsed-field gel electrophoresis and hybridized to DNA microarrays. Nine Assembly 19 supercontigs were found to contain markers from two different chromosomes. Assembly 21 contains the sequence of each of the eight chromosomes and was determined using a synteny analysis with preliminary versions of the Candida dubliniensis genome assembly, bioinformatics, a sequence tagged site (STS) map of overlapping fosmid clones, and an optical map. The orientation and order of the contigs on each chromosome, repeat regions too large to be covered by a sequence run, such as the ribosomal DNA cluster and the major repeat sequence, and telomere placement were determined using the STS map. Sequence gaps were closed by PCR and sequencing of the products. The overall assembly was compared to an optical map; this identified some misassembled contigs and gave a size estimate for each chromosome.

    Conclusion: Assembly 21 reveals an ancient chromosome fusion, a number of small internal duplications followed by inversions, and a subtelomeric arrangement, including a new gene family, the TLO genes. Correlations of position with relatedness of gene families imply a novel method of dispersion. The sequence of the individual chromosomes of C. albicans raises interesting biological questions about gene family creation and dispersion, subtelomere organization, and chromosome evolution.

    Funded by: NIAID NIH HHS: N01 AI05406, R01 AI 16567

    Genome biology 2007;8;4;R52

  • Definition of a minimal region of deletion of chromosome 7 in uterine leiomyomas by tiling-path microarray CGH and mutation analysis of known genes in this region.

    Vanharanta S, Wortham NC, Langford C, El-Bahrawy M, van der Spuy Z, Sjöberg J, Lehtonen R, Karhu A, Tomlinson IP and Aaltonen LA

    Department of Medical Genetics, Biomedicum Helsinki, University of Helsinki, Helsinki, Finland.

    Somatic interstitial deletions of chromosome segment 7q22-q31 in uterine leiomyomas are a frequent event, thought to be indicative of a tumor suppressor gene in the region. Previous LOH and CGH studies have refined this region to 7q22.3-q31, although the target gene has not been identified. Here, we have used tiling-path resolution microarray CGH to further refine the region and to identify homozygous deletions in fibroids. Furthermore, we have screened all manually annotated genes in the region for mutations. We have refined the minimum deleted region at 7q22.3-q31 to 2.79 Mbp and identified a second region of deletion at 7q34. However, we identified no pathogenic coding variation.

    Genes, chromosomes & cancer 2007;46;5;451-8

  • Parallel evolution of conserved non-coding elements that target a common set of developmental regulatory genes from worms to humans.

    Vavouri T, Walter K, Gilks WR, Lehner B and Elgar G

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Background: The human genome contains thousands of non-coding sequences that are often more conserved between vertebrate species than protein-coding exons. These highly conserved non-coding elements (CNEs) are associated with genes that coordinate development, and have been proposed to act as transcriptional enhancers. Despite their extreme sequence conservation in vertebrates, sequences homologous to CNEs have not been identified in invertebrates.

    Results: Here we report that nematode genomes contain an alternative set of CNEs that share sequence characteristics, but not identity, with their vertebrate counterparts. CNEs thus represent a very unusual class of sequences that are extremely conserved within specific animal lineages yet are highly divergent between lineages. Nematode CNEs are also associated with developmental regulatory genes, and include well-characterized enhancers and transcription factor binding sites, supporting the proposed function of CNEs as cis-regulatory elements. Most remarkably, 40 of 156 human CNE-associated genes with invertebrate orthologs are also associated with CNEs in both worms and flies.

    Conclusion: A core set of genes that regulate development is associated with CNEs across three animal groups (worms, flies and vertebrates). We propose that these CNEs reflect the parallel evolution of alternative enhancers for a common set of developmental regulatory genes in different animal groups. This 're-wiring' of gene regulatory networks containing key developmental coordinators was probably a driving force during the evolution of animal body plans. CNEs may, therefore, represent the genomic traces of these 'hard-wired' core gene regulatory networks that specify the development of each alternative animal body plan.

    Funded by: Medical Research Council: G0401138, MC_U105260799

    Genome biology 2007;8;2;R15

  • Guidelines for molecular karyotyping in constitutional genetic diagnosis.

    Vermeesch JR, Fiegler H, de Leeuw N, Szuhai K, Schoumans J, Ciccone R, Speleman F, Rauch A, Clayton-Smith J, Van Ravenswaaij C, Sanlaville D, Patsalis PC, Firth H, Devriendt K and Zuffardi O

    Center for Human Genetics, University Hospital Gasthuisberg, Leuven, Belgium.

    Array-based whole genome investigation or molecular karyotyping enables the genome-wide detection of submicroscopic imbalances. Proof-of-principle experiments have demonstrated that molecular karyotyping outperforms conventional karyotyping with regard to detection of chromosomal imbalances. This article identifies areas for which the technology seems matured and areas that require more investigations. Molecular karyotyping should be part of the genetic diagnostic work-up of patients with developmental disorders. For the implementation of the technique for other constitutional indications and in prenatal diagnosis, more research is appropriate. Also, the article aims to provide best practice guidelines for the application of array comparative genomic hybridisation to ensure both technical and clinical quality criteria that will optimise and standardise results and reports in diagnostic laboratories. In short, both the specificity and the sensitivity of the arrays should be evaluated in every laboratory offering the diagnostic test. Internal and external quality control programmes are urgently needed to evaluate and standardise the test results between laboratories.

    European journal of human genetics : EJHG 2007;15;11;1105-14

  • Distinct cytokine-driven responses of activated blood gammadelta T cells: insights into unconventional T cell pleiotropy.

    Vermijlen D, Ellis P, Langford C, Klein A, Engel R, Willimann K, Jomaa H, Hayday AC and Eberl M

    Peter Gorer Department of Immunobiology, Guy's, King's and St. Thomas' Medical School, King's College London, London, UK.

    Human Vgamma9/Vdelta2 T cells comprise a small population of peripheral blood T cells that in many infectious diseases respond to the microbial metabolite, (E)-4-hydroxy-3-methyl-but-2-enyl pyrophosphate (HMB-PP), expanding to up to 50% of CD3(+) cells. This "transitional response," occurring temporally between the rapid innate and slower adaptive response, is widely viewed as proinflammatory and/or cytolytic. However, increasing evidence that different cytokines drive widely different effector functions in alphabeta T cells provoked us to apply cDNA microarrays to explore the potential pleiotropy of HMB-PP-activated Vgamma9/Vdelta2 T cells. The data and accompanying validations show that the related cytokines, IL-2, IL-4, or IL-21, each drive proliferation and comparable CD69 up-regulation but induce distinct effector responses that differ from prototypic alphabeta T cell responses. For example, the Th1-like response to IL-2 also includes expression of IL-5 and IL-13 that conversely are not induced by IL-4. The data identify specific molecules that may mediate gammadelta T cell effects. Thus, IL-21 induces a lymphoid-homing phenotype and high, unexpected expression of the follicular B cell-attracting chemokine CXCL13/BCA-1, suggesting a novel follicular B-helper-like T cell that may play a hitherto underappreciated role in humoral immunity early in infection. Such broad plasticity emphasizes the capacity of gammadelta T cells to influence the nature of the immune response to different challenges and has implications for the ongoing clinical application of cytokines together with Vgamma9/Vdelta2 TCR agonists.

    Funded by: Wellcome Trust: 071534

    Journal of immunology (Baltimore, Md. : 1950) 2007;178;7;4304-14

  • Genetic flux over time in the Salmonella lineage.

    Vernikos GS, Thomson NR and Parkhill J

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Background: DNA sequences that are shared between closely related organisms while being absent from their common ancestor and from sister lineages of that ancestor are likely to have been acquired by horizontal gene transfer. Over time, the composition of those sequences tends to become more similar to the compositional signature of their host (amelioration).

    Results: From a whole-genome comparative analysis of eleven Salmonella, three Escherichia coli and one Shigella strain, we inferred the relative time of insertion of putative horizontally acquired (PHA) genes in three Salmonella strains on different branches of the S. enterica phylogenetic tree. Compositional analysis suggests that most of the PHA genes are still undergoing an amelioration process and shows a clear correlation between time of insertion and the level of amelioration.

    Conclusion: The results show that older insertions include almost all functional classes. However, very recent horizontal transfer events in the Salmonella lineage involve primarily prophage elements that are shared only between very recently diverged lineages; despite this, the prophage sequence composition is close to that of the host, indicating that host adaptation, rather than amelioration, is likely to be the source of the compositional similarity. Almost half of the PHA genes were acquired at the base of the Salmonella lineage, whereas nearly three-quarters are shared between most S. enterica subspecies. The numerical distribution of PHA genes in the Salmonella tree topology correlates well with the divergence of the major Salmonella species, highlighting the major impact of horizontal transfer on the evolution of the salmonellae.

    Funded by: Wellcome Trust

    Genome biology 2007;8;6;R100

  • microRNA-155 regulates the generation of immunoglobulin class-switched plasma cells.

    Vigorito E, Perks KL, Abreu-Goodger C, Bunting S, Xiang Z, Kohlhaas S, Das PP, Miska EA, Rodriguez A, Bradley A, Smith KG, Rada C, Enright AJ, Toellner KM, Maclennan IC and Turner M

    Laboratory of Lymphocyte Signalling and Development, The Babraham Institute, Cambridge, CB22 3AT, UK.

    microRNA-155 (miR-155) is expressed by cells of the immune system after activation and has been shown to be required for antibody production after vaccination with attenuated Salmonella. Here we show the intrinsic requirement for miR-155 in B cell responses to thymus-dependent and -independent antigens. B cells lacking miR-155 generated reduced extrafollicular and germinal center responses and failed to produce high-affinity IgG1 antibodies. Gene-expression profiling of activated B cells indicated that miR-155 regulates an array of genes with diverse function, many of which are predicted targets of miR-155. The transcription factor Pu.1 is validated as a direct target of miR155-mediated inhibition. When Pu.1 is overexpressed in wild-type B cells, fewer IgG1 cells are produced, indicating that loss of Pu.1 regulation is a contributing factor to the miR-155-deficient phenotype. Our results implicate post-transcriptional regulation of gene expression for establishing the terminal differentiation program of B cells.

    Funded by: Medical Research Council: G0700287, G117/424, G8402371, MC_U105178806

    Immunity 2007;27;6;847-59

  • Say hello to our little friends.

    Walker A

    Nature reviews. Microbiology 2007;5;8;572-3

  • This place is big enough for both of us.

    Walker A and Crossman LC

    Nature reviews. Microbiology 2007;5;2;90-2

  • Urbane decay.

    Walker A and Seth-Smith H

    Nature reviews. Microbiology 2007;5;10;748-9

  • Ankyrin repeat domain-encoding genes in the wPip strain of Wolbachia from the Culex pipiens group.

    Walker T, Klasson L, Sebaihia M, Sanders MJ, Thomson NR, Parkhill J and Sinkins SP

    Peter Medawar Building for Pathogen Research and Department of Zoology, University of Oxford, South Parks Road, Oxford OX1 3PS, UK.

    Background: Wolbachia are obligate endosymbiotic bacteria maternally transmitted through the egg cytoplasm that are responsible for several reproductive disorders in their insect hosts, such as cytoplasmic incompatibility (CI) in infected mosquitoes. Species in the Culex pipiens complex display an unusually high number of Wolbachia-induced crossing types, and based on present data, only the wPip strain is present.

    Results: The sequencing of the wPip strain of Wolbachia revealed the presence of 60 ankyrin repeat domain (ANK) encoding genes, and expression studies of these genes were carried out in adult mosquitoes. One of these ANK genes, pk2, is shown to be part of an operon of three prophage-associated genes with sex-specific expression, and is present in two identical copies in the genome. Another homolog of pk2 is also present that is differentially expressed in different Cx. pipiens group strains. A further two ANK genes showed sex-specific regulation in wPip-infected Cx. pipiens group adults.

    Conclusion: The high number, variability and differential expression of ANK genes in wPip suggest an important role in Wolbachia biology, and the gene family provides both markers and promising candidates for the study of reproductive manipulation.

    Funded by: Wellcome Trust: 079059

    BMC biology 2007;5;39

  • Enteroaggregative Escherichia coli related to uropathogenic clonal group A.

    Wallace-Gadsden F, Johnson JR, Wain J and Okeke IN

    Department of Biology, Haverford College, Haverford, Pennsylvania 19041, USA.

    Enteroaggregative Escherichia coli (EAEC) are heterogeneous, diarrheagenic E. coli. Of EAEC strains from Nigeria, 10 independent antimicrobial-resistant isolates belonged to the multilocus sequence type 69 clonal complex, to which uropathogenic E. coli clonal group A belongs. This finding suggests a recent common ancestor for these distinct groups of pathogenic E. coli.

    Emerging infectious diseases 2007;13;5;757-60

  • A recessive genetic screen for host factors required for retroviral infection in a library of insertionally mutated Blm-deficient embryonic stem cells.

    Wang W and Bradley A

    Department of Cell Biology and Genetics, College of Life Sciences, Peking University, Beijing, PR China.

    Background: Host factors required for retroviral infection are potential targets for the modulation of diseases caused by retroviruses. During the retroviral life cycle, numerous cellular factors interact with the virus and play an essential role in infection. Cultured embryonic stem (ES) cells are susceptible to retroviral infection, therefore providing access to all of the genes required for this process to take place. In order to identify the host factors involved in retroviral infection, we designed and implemented a scheme for identifying ES cells that are resistant to retroviral infection and subsequent cloning of the mutated gene.

    Results: A library of mutant ES cells was established by genome-wide insertional mutagenesis in Blm-deficient ES cells, and a screen was performed by superinfection of the library at high multiplicity with a recombinant retrovirus carrying a positive and negative selection cassette. Stringent negative selection was then used to exclude the infected ES cells. We successfully recovered five independent clones of ES cells that are resistant to retroviral infection. Analysis of the mutations in these clones revealed four different homozygous and one compound heterozygous mutation in the mCat-1 locus, which confirms that mCat-1 is the ecotropic murine leukemia virus receptor in ES cells.

    Conclusion: We have demonstrated the feasibility and reliability of this recessive genetic approach to identifying critical genes required for retroviral infection in ES cells; the approach provides a unique opportunity to recover other cellular factors required for retroviral infection. The resulting insertionally mutated Blm-deficient ES cell library might also provide access to essential host cell components that are required for infection and replication for other types of virus.

    Funded by: Wellcome Trust

    Genome biology 2007;8;4;R48

  • Induced mitotic recombination of p53 in vivo.

    Wang W, Warren M and Bradley A

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.

    Genetic mosaics produced by FLP/FRT induced mitotic recombination have been widely used in Drosophila to study gene function in development. Recently, the Cre/loxP system has been applied to induce mitotic recombination in mouse embryonic stem cells and in many adult mouse tissues. We have used this strategy to generate a previously undescribed p53 mouse model in which expression of a ubiquitously expressed recombinase in a heterozygous p53 knockout animal produces mitotic recombinant clones homozygous for the p53 mutation. The induction of loss of heterozygosity in a few cells in an otherwise normal tissue mimics genetic aspects of tumorigenesis more closely than existing models and has revealed the possible cell autonomous nature of Wnt3. Our results suggest that inducible mitotic recombination can be used for clonal analysis of mutants in the mouse.

    Funded by: Wellcome Trust: 79643

    Proceedings of the National Academy of Sciences of the United States of America 2007;104;11;4501-5

  • Assessing the potential of immunohistochemistry for systematic gene expression profiling.

    Warford A, Flack G, Conquer JS, Zola H and McCafferty J

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.

    Immunohistochemistry (IHC) is a powerful technique for identifying sites of protein expression in tissues at the cellular and sub-cellular level. Here we have investigated the potential of using IHC for genome-wide expression screening by measuring the success rate and specificity of a panel of 35 monoclonal antibodies recognizing 5 well characterised CD antigens. Antibodies were pre-screened on acetone fixed frozen sections of spleen, tonsil and colon tissues. 19/35 antibodies gave staining with a success rate of 0/7 for JAM-2, 1/4 for CD99, 3/6 for CD138, 5/8 for CD45 and 10/10 for MHC-class II. 16/19 of these antibodies also gave staining on formalin fixed paraffin embedded tissue sections of tonsil and colon. All antibodies that had given staining were then profiled on tissues presented in human tissue microarrays. In the frozen microarrays 216 cores from 29 normal tissue types were present and in the formalin fixed paraffin array 344 cores from 35 normal and 4 cancers were represented. Where multiple antibodies were positive, there was evidence of consistent staining of the same tissues with several antibodies. In some cases differences in staining were observed potentially due to differential splice variants, polymorphisms or protein modification. With some antibodies there was evidence of cross-reactivity to inappropriate cells or structures. In addition the staining intensity with formalin fixation was changed quantitatively for some antibodies and in a few cases qualitatively, representing differential sensitivity of specific and non-specific epitopes to fixation. Accordingly, whilst IHC has potential for describing protein expression of unknown genes, these results emphasise a need to systematically address issues of specificity and sensitivity if appropriate profiles are to be described.

    Journal of immunological methods 2007;318;1-2;125-37

  • Targeting therapeutics: closing the gap between bench and bedside--a focus on tissue microarrays.

    Warford T

    Expert review of molecular diagnostics 2007;7;2;103-5

  • A Sall4 mutant mouse model useful for studying the role of Sall4 in early embryonic development and organogenesis.

    Warren M, Wang W, Spiden S, Chen-Murchie D, Tannahill D, Steel KP and Bradley A

    SALL4 is a homologue of the Drosophila homeotic gene spalt, a zinc finger transcription factor, required for inner cell mass proliferation in early embryonic development. It also interacts with other transcription factors to control the development of the anorectal region, kidney, heart, limbs, and brain. Truncating mutations in SALL4 cause Okihiro syndrome, manifest as Duane anomaly, radial ray defects and sensorineural and conductive deafness. We report the characterization of a novel murine Sall4 null allele created by bacterial recombineering in ES cells. Homozygous mutant mice exhibit early embryonic lethality. Heterozygous mutant mice recapitulate phenotypic features of Okihiro syndrome including deafness, lower anogenital tract abnormalities, renal hypoplasia, anencephaly, Hirschsprung's disease, and skeletal defects. This phenotype shows important differences in cardiac and ear manifestations to previously characterized Sall4 mutant alleles and should prove useful for the investigation of the influence of modifier alleles and protein interactions on the transcriptional regulatory function of Sall4.

    Funded by: Medical Research Council: G0300212, MC_QA137918; Wellcome Trust: 077187

    Genesis (New York, N.Y. : 2000) 2007;45;1;51-8

  • Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls.

    Wellcome Trust Case Control Consortium

    There is increasing evidence that genome-wide association (GWA) studies represent a powerful approach to the identification of genes involved in common human diseases. We describe a joint GWA study (using the Affymetrix GeneChip 500K Mapping Array Set) undertaken in the British population, which has examined approximately 2,000 individuals for each of 7 major diseases and a shared set of approximately 3,000 controls. Case-control comparisons identified 24 independent association signals at P < 5 x 10(-7): 1 in bipolar disorder, 1 in coronary artery disease, 9 in Crohn's disease, 3 in rheumatoid arthritis, 7 in type 1 diabetes and 3 in type 2 diabetes. On the basis of prior findings and replication studies thus far completed, almost all of these signals reflect genuine susceptibility effects. We observed association at many previously identified loci, and found compelling evidence that some loci confer risk for more than one of the diseases studied. Across all diseases, we identified a large number of further signals (including 58 loci with single-point P values between 10(-5) and 5 x 10(-7)) likely to yield additional susceptibility loci. The importance of appropriately large samples was confirmed by the modest effect sizes observed at most loci identified. This study thus represents a thorough validation of the GWA approach. It has also demonstrated that careful use of a shared control group represents a safe and effective approach to GWA analyses of multiple disease phenotypes; has generated a genome-wide genotype database for future studies of common diseases in the British population; and shown that, provided individuals with non-European ancestry are excluded, the extent of population stratification in the British population is generally modest. Our findings offer new avenues for exploring the pathophysiology of these important disorders. We anticipate that our data, results and software, which will be widely available to other investigators, will provide a powerful resource for human genetics research.

    Funded by: Chief Scientist Office: CZB/4/540, ETM/137, ETM/75; Medical Research Council: G0000934, G0100594, G0501942, G0600329, G0600705, G0800759, G0901461, G19/9, G90/106, G9806740, G9810900; Wellcome Trust: 076113, 077011

    Nature 2007;447;7145;661-78

  • Association scan of 14,500 nonsynonymous SNPs in four diseases identifies autoimmunity variants.

    Wellcome Trust Case Control Consortium, Australo-Anglo-American Spondylitis Consortium (TASC), Burton PR, Clayton DG, Cardon LR, Craddock N, Deloukas P, Duncanson A, Kwiatkowski DP, McCarthy MI, Ouwehand WH, Samani NJ, Todd JA, Donnelly P, Barrett JC, Davison D, Easton D, Evans DM, Leung HT, Marchini JL, Morris AP, Spencer CC, Tobin MD, Attwood AP, Boorman JP, Cant B, Everson U, Hussey JM, Jolley JD, Knight AS, Koch K, Meech E, Nutland S, Prowse CV, Stevens HE, Taylor NC, Walters GR, Walker NM, Watkins NA, Winzer T, Jones RW, McArdle WL, Ring SM, Strachan DP, Pembrey M, Breen G, St Clair D, Caesar S, Gordon-Smith K, Jones L, Fraser C, Green EK, Grozeva D, Hamshere ML, Holmans PA, Jones IR, Kirov G, Moskivina V, Nikolov I, O'Donovan MC, Owen MJ, Collier DA, Elkin A, Farmer A, Williamson R, McGuffin P, Young AH, Ferrier IN, Ball SG, Balmforth AJ, Barrett JH, Bishop TD, Iles MM, Maqbool A, Yuldasheva N, Hall AS, Braund PS, Dixon RJ, Mangino M, Stevens S, Thompson JR, Bredin F, Tremelling M, Parkes M, Drummond H, Lees CW, Nimmo ER, Satsangi J, Fisher SA, Forbes A, Lewis CM, Onnie CM, Prescott NJ, Sanderson J, Matthew CG, Barbour J, Mohiuddin MK, Todhunter CE, Mansfield JC, Ahmad T, Cummings FR, Jewell DP, Webster J, Brown MJ, Lathrop MG, Connell J, Dominiczak A, Marcano CA, Burke B, Dobson R, Gungadoo J, Lee KL, Munroe PB, Newhouse SJ, Onipinla A, Wallace C, Xue M, Caulfield M, Farrall M, Barton A, Biologics in RA Genetics and Genomics Study Syndicate (BRAGGS) Steering Committee, Bruce IN, Donovan H, Eyre S, Gilbert PD, Hilder SL, Hinks AM, John SL, Potter C, Silman AJ, Symmons DP, Thomson W, Worthington J, Dunger DB, Widmer B, Frayling TM, Freathy RM, Lango H, Perry JR, Shields BM, Weedon MN, Hattersley AT, Hitman GA, Walker M, Elliott KS, Groves CJ, Lindgren CM, Rayner NW, Timpson NJ, Zeggini E, Newport M, Sirugo G, Lyons E, Vannberg F, Hill AV, Bradbury LA, Farrar C, Pointon JJ, Wordsworth P, Brown MA, Franklyn JA, Heward JM, Simmonds MJ, Gough SC, Seal S, Breast Cancer Susceptibility Collaboration (UK), Stratton MR, Rahman N, Ban M, Goris A, Sawcer SJ, Compston A, Conway D, Jallow M, Newport M, Sirugo G, Rockett KA, Bumpstead SJ, Chaney A, Downes K, Ghori MJ, Gwilliam R, Hunt SE, Inouye M, Keniry A, King E, McGinnis R, Potter S, Ravindrarajah R, Whittaker P, Widden C, Withers D, Cardin NJ, Davison D, Ferreira T, Pereira-Gale J, Hallgrímsdóttir IB, Howie BN, Su Z, Teo YY, Vukcevic D, Bentley D, Brown MA, Compston A, Farrall M, Hall AS, Hattersley AT, Hill AV, Parkes M, Pembrey M, Stratton MR, Mitchell SL, Newby PR, Brand OJ, Carr-Smith J, Pearce SH, McGinnis R, Keniry A, Deloukas P, Reveille JD, Zhou X, Sims AM, Dowling A, Taylor J, Doan T, Davis JC, Savage L, Ward MM, Learch TL, Weisman MH and Brown M

    Genetic Epidemiology Group, Department of Health Sciences, University of Leicester, Adrian Building, University Road, Leicester LE1 7RH, UK.

    We have genotyped 14,436 nonsynonymous SNPs (nsSNPs) and 897 major histocompatibility complex (MHC) tag SNPs from 1,000 independent cases of ankylosing spondylitis (AS), autoimmune thyroid disease (AITD), multiple sclerosis (MS) and breast cancer (BC). Comparing these data against a common control dataset derived from 1,500 randomly selected healthy British individuals, we report initial association and independent replication in a North American sample of two new loci related to ankylosing spondylitis, ARTS1 and IL23R, and confirmation of the previously reported association of AITD with TSHR and FCRL3. These findings, enabled in part by increased statistical power resulting from the expansion of the control reference group to include individuals from the other disease groups, highlight notable new possibilities for autoimmune regulation and suggest that IL23R may be a common susceptibility factor for the major 'seronegative' diseases.

    Funded by: Arthritis Research UK: 17552; Cancer Research UK: A4994; Chief Scientist Office: CZB/4/540, ETM/137, ETM/75; Medical Research Council: G0000934, G0501942, G0600329, G0600705, G0701003, G0800759, G19/9, G90/106; Multiple Sclerosis Society: 730; Wellcome Trust: 057097, 076113, 081682, 089989

    Nature genetics 2007;39;11;1329-37

  • Quantitative trait association in parent offspring trios: Extension of case/pseudocontrol method and comparison of prospective and retrospective approaches.

    Wheeler E and Cordell HJ

    The Wellcome Trust Sanger Institute, Cambridge, UK.

    The case/pseudocontrol method provides a convenient framework for family-based association analysis of case-parent trios, incorporating several previously proposed methods such as the transmission/disequilibrium test and log-linear modelling of parent-of-origin effects. The method allows genotype and haplotype analysis at an arbitrary number of linked and unlinked multiallelic loci, as well as modelling of more complex effects such as epistasis, parent-of-origin effects, maternal genotype and mother-child interaction effects, and gene-environment interactions. Here we extend the method for analysis of quantitative as opposed to dichotomous (e.g. disease) traits. The resulting method can be thought of as a retrospective approach, modelling genotype given trait value, in contrast to prospective approaches that model trait given genotype. Through simulations and analytical derivations, we examine the power and properties of our proposed approach, and compare it to several previously proposed single-locus methods for quantitative trait association analysis. We investigate the performance of the different methods when extended to allow analysis of haplotype, maternal genotype and parent-of-origin effects. With randomly ascertained families, with or without population stratification, the prospective approach (modeling trait value given genotype) is found to be generally most effective, although the retrospective approach has some advantages with regard to estimation and interpretability of parameter estimates when applied to selected samples.

    Funded by: Wellcome Trust: 068615, 074524

    Genetic epidemiology 2007;31;8;813-33

  • Esophageal atresia, hypoplasia of zygomatic complex, microcephaly, cup-shaped ears, congenital heart defect, and mental retardation--new MCA/MR syndrome in two affected sibs and a mildly affected mother?

    Wieczorek D, Shaw-Smith C, Kohlhase J, Schmitt W, Buiting K, Coffey A, Howard E, Hehr U and Gillessen-Kaesbach G

    Institut für Humangenetik, Universitätsklinikum Essen, Germany, and Department of Medical Genetics, Addenbrooke's Hospital, Cambridge, UK.

    The previously undescribed combination of esophageal atresia, hypoplasia of the zygomatic complex, microcephaly, cup-shaped ears, congenital heart defect, and mental retardation was diagnosed in two siblings of different sexes, with the brother being more severely affected. The mother presented with zygomatic arch hypoplasia of the right side only. We discuss major differential diagnoses: Goldenhar, Feingold, CHARGE, and Treacher Collins syndromes show a few overlapping clinical features, but these diagnoses are unlikely as the clinical findings are unusual for Goldenhar syndrome and mutational screening of the MYCN, the CHD7, and the TCOF1 genes did not reveal any abnormalities. Autosomal recessive oto-facial syndrome, hypomandibular faciocranial dysostosis, and Ozkan syndromes were clinically excluded. A microdeletion 22q11.2 was excluded by FISH analysis, a microdeletion 2p23-p24 by microsatellite analyses, a subtelomeric chromosomal aberration by MLPA, and a small genomic deletion/duplication by CGH array. As X-inactivation studies did not show skewed X-inactivation in the mother, we consider X-chromosomal recessive inheritance of this condition less likely. We discuss autosomal dominant inheritance with variable expressivity or mosaicism in the mother as the likely genetic mechanism in this new multiple congenital anomaly/mental retardation (MCA/MR) syndrome.

    American journal of medical genetics. Part A 2007;143A;11;1135-42

  • The Israeli-Palestinian Science Organization.

    Wiesel T, Agre P, Arrow KJ, Atiyah M, Brézin E, Charfi FF, Cohen-Tanoudji C, Daar A, Jacob F, Kahneman D, Lee YT, Nicolaisen I, Nusseibeh S, Reuter H, Shoham Y, Sulston J, Walzer M and Yaari M

    Science (New York, N.Y.) 2007;315;5808;39

  • 'Oming in on schistosomes: prospects and limitations for post-genomics.

    Wilson RA, Ashton PD, Braschi S, Dillon GP, Berriman M and Ivens A

    Department of Biology, University of York, PO Box 373, York YO10 5YW, UK.

    The recent release of version 3 of the Schistosoma mansoni genome assembly has made a wealth of information available to researchers. Here, progress made in schistosome genomics and post-genomics is considered. The current status of knowledge about the genome, transcriptome, proteome, glycome and immunome is summarized and recent publications briefly reviewed. The prospects for advances in understanding schistosome biology are highlighted. Most importantly, the limitations (which are mostly technical) that need to be addressed before the full potential of the genome database(s) can be realized are emphasized.

    Funded by: Wellcome Trust

    Trends in parasitology 2007;23;1;14-20

  • Finding cis-regulatory modules in Drosophila using phylogenetic hidden Markov models.

    Wong WS and Nielsen R

    Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA.

    Motivation: Finding the regulatory modules for transcription factor binding is an important step in elucidating the complex molecular mechanisms underlying regulation of gene expression. There are numerous methods available for solving this problem; however, very few of them take advantage of the increasing availability of comparative genomic data.

    Results: We develop a method for finding regulatory modules in eukaryotic species using phylogenetic data. Using computer simulations and analysis of real data, we show that the use of a phylogenetic hidden Markov model can lead to an increase in accuracy of prediction over methods that do not take advantage of the data from multiple species.

    Availability: The new method is made accessible under GPL in a new publicly available JAVA program: EvoPromoter. It can be downloaded at

    Funded by: Wellcome Trust

    Bioinformatics (Oxford, England) 2007;23;16;2031-7

  • Mutations in ionotropic AMPA receptor 3 alter channel properties and are associated with moderate cognitive impairment in humans.

    Wu Y, Arai AC, Rumbaugh G, Srivastava AK, Turner G, Hayashi T, Suzuki E, Jiang Y, Zhang L, Rodriguez J, Boyle J, Tarpey P, Raymond FL, Nevelsteen J, Froyen G, Stratton M, Futreal A, Gecz J, Stevenson R, Schwartz CE, Valle D, Huganir RL and Wang T

    Institute of Genetic Medicine and Department of Pediatrics, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA.

    Ionotropic alpha-amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid (AMPA) receptors (iGluRs) mediate the majority of excitatory synaptic transmission in the CNS and are essential for the induction and maintenance of long-term potentiation and long-term depression, two cellular models of learning and memory. We identified a genomic deletion (0.4 Mb) involving the entire GRIA3 (encoding iGluR3) by using an X-array comparative genomic hybridization (CGH) and four missense variants (G833R, M706T, R631S, and R450Q) in functional domains of iGluR3 by sequencing 400 males with X-linked mental retardation (XLMR). Three variants were found in males with moderate MR and were absent in 500 control males. Expression studies in HEK293 cells showed that G833R resulted in a 78% reduction of iGluR3 due to protein misfolding. Whole-cell recording studies of iGluR3 homomers in HEK293 cells revealed that neither iGluR3-M706T (S2 domain) nor iGluR3-R631S (near channel core) had substantial channel function, whereas R450Q (S1 domain) was associated with accelerated receptor desensitization. When forming heteromeric receptors with iGluR2 in HEK293 cells, all four iGluR3 variants had altered desensitization kinetics. Our study provides the genetic and functional evidence that mutant iGluR3 with altered kinetic properties is associated with moderate cognitive impairment in humans.

    Funded by: NICHD NIH HHS: HD044789, HD24061, HD26202; NINDS NIH HHS: NS36715, NS41020

    Proceedings of the National Academy of Sciences of the United States of America 2007;104;46;18163-8

  • Interleukin-2 gene variation impairs regulatory T cell function and causes autoimmunity.

    Yamanouchi J, Rainbow D, Serra P, Howlett S, Hunter K, Garner VE, Gonzalez-Munoz A, Clark J, Veijola R, Cubbon R, Chen SL, Rosa R, Cumiskey AM, Serreze DV, Gregory S, Rogers J, Lyons PA, Healy B, Smink LJ, Todd JA, Peterson LB, Wicker LS and Santamaria P

    Julia McFarlane Diabetes Research Centre (JMDRC) and Department of Microbiology and Infectious Diseases, Institute of Inflammation, Infection and Immunity, Faculty of Medicine, The University of Calgary, Calgary, Alberta T2N 4N1, Canada.

    Autoimmune diseases are thought to result from imbalances in normal immune physiology and regulation. Here, we show that autoimmune disease susceptibility and resistance alleles on mouse chromosome 3 (Idd3) correlate with differential expression of the key immunoregulatory cytokine interleukin-2 (IL-2). In order to test directly that an approximately twofold reduction in IL-2 underpins the Idd3-linked destabilization of immune homeostasis, we show that engineered haplodeficiency of Il2 gene expression not only reduces T cell IL-2 production by twofold but also mimics the autoimmune dysregulatory effects of the naturally occurring susceptibility alleles of Il2. Reduced IL-2 production achieved by either genetic mechanism correlates with reduced function of CD4(+) CD25(+) regulatory T cells, which are critical for maintaining immune homeostasis.

    Funded by: Wellcome Trust: 061859

    Nature genetics 2007;39;3;329-37

  • Revisiting the molecular evolutionary history of Shigella spp.

    Yang J, Nie H, Chen L, Zhang X, Yang F, Xu X, Zhu Y, Yu J and Jin Q

    State Key Laboratory for Molecular Virology and Genetic Engineering, 6 Rongjing East Street, BDA Beijing 100176, PR China.

    The theory that Shigella is derived from multiple independent origins of Escherichia coli (Pupo et al. 2000) has been challenged by recent findings that the virulence plasmids (VPs) and the chromosomes share a similar evolutionary history (Escobar-Paramo et al. 2003), which suggests that an ancestral VP entered an E. coli strain only once, which gave rise to Shigella spp. In an attempt to resolve these conflicting theories, we constructed three phylogenetic trees in this study: a robust chromosomal tree using 23 housekeeping genes from 46 strains of Shigella and enteroinvasive E. coli (EIEC), a chromosomal tree using 4 housekeeping genes from 19 EcoR strains and 46 Shigella/EIEC strains, and a VP tree using 5 genes outside of the VP cell-entry region from 38 Shigella/EIEC strains. Both chromosomal trees group Shigella into three main clusters and five outliers, and strongly suggest that Shigella has multiple origins within E. coli. Most strikingly, the VP tree shows that the VPs from two main Shigella clusters, C1 and C2, are more closely related, which contradicts the chromosomal trees that place C2 and C3 next to each other but C1 at a distance. Additionally, we have identified a complete tra operon of the F-plasmid in the genome sequence of an EIEC strain and found that two other EIEC strains are also likely to possess a complete tra operon. All lines of evidence support an alternative multiorigin theory that transferable diverse ancestral VPs entered diverse origins of E. coli multiple times during a prolonged period of time, resulting in Shigella species with diverse genomes but similar pathogenic properties.

    Journal of molecular evolution 2007;64;1;71-9

  • Genome-wide association study of prostate cancer identifies a second risk locus at 8q24.

    Yeager M, Orr N, Hayes RB, Jacobs KB, Kraft P, Wacholder S, Minichiello MJ, Fearnhead P, Yu K, Chatterjee N, Wang Z, Welch R, Staats BJ, Calle EE, Feigelson HS, Thun MJ, Rodriguez C, Albanes D, Virtamo J, Weinstein S, Schumacher FR, Giovannucci E, Willett WC, Cancel-Tassin G, Cussenot O, Valeri A, Andriole GL, Gelmann EP, Tucker M, Gerhard DS, Fraumeni JF, Hoover R, Hunter DJ, Chanock SJ and Thomas G

    SAIC-Frederick, National Cancer Institute (NCI)-Frederick Cancer Research and Development Center, Frederick, Maryland 21702, USA.

    Recently, common variants on human chromosome 8q24 were found to be associated with prostate cancer risk. While conducting a genome-wide association study in the Cancer Genetic Markers of Susceptibility project with 550,000 SNPs in a nested case-control study (1,172 cases and 1,157 controls of European origin), we identified a new association at 8q24 with an independent effect on prostate cancer susceptibility. The most significant signal is 70 kb centromeric to the previously reported SNP, rs1447295, but shows little evidence of linkage disequilibrium with it. A combined analysis with four additional studies (total: 4,296 cases and 4,299 controls) confirms association with prostate cancer for rs6983267 in the centromeric locus (P = 9.42 x 10(-13); heterozygote odds ratio (OR): 1.26, 95% confidence interval (c.i.): 1.13-1.41; homozygote OR: 1.58, 95% c.i.: 1.40-1.78). Each SNP remained significant in a joint analysis after adjusting for the other (rs1447295 P = 1.41 x 10(-11); rs6983267 P = 6.62 x 10(-10)). These observations, combined with compelling evidence for a recombination hotspot between the two markers, indicate the presence of at least two independent loci within 8q24 that contribute to prostate cancer in men of European ancestry. We estimate that the population attributable risk of the new locus, marked by rs6983267, is higher than the locus marked by rs1447295 (21% versus 9%).

    Funded by: CCR NIH HHS: N01-RC-37004, N01-RC-45035; NCI NIH HHS: 5U01CA098233-04, CA55075, N01-CN-45165, T32 CA 09001, U01 CA098710; Wellcome Trust

    Nature genetics 2007;39;5;645-9

  • A second major histocompatibility complex susceptibility locus for multiple sclerosis.

    Yeo TW, De Jager PL, Gregory SG, Barcellos LF, Walton A, Goris A, Fenoglio C, Ban M, Taylor CJ, Goodman RS, Walsh E, Wolfish CS, Horton R, Traherne J, Beck S, Trowsdale J, Caillier SJ, Ivinson AJ, Green T, Pobywajlo S, Lander ES, Pericak-Vance MA, Haines JL, Daly MJ, Oksenberg JR, Hauser SL, Compston A, Hafler DA, Rioux JD and Sawcer S

    Department of Clinical Neurosciences, University of Cambridge, Addenbrooke's Hospital, Cambridge, United Kingdom.

    Objective: Variation in the major histocompatibility complex (MHC) on chromosome 6p21 is known to influence susceptibility to multiple sclerosis with the strongest effect originating from the HLA-DRB1 gene in the class II region. The possibility that other genes in the MHC independently influence susceptibility to multiple sclerosis has been suggested but remains unconfirmed.

    Methods: Using a combination of microsatellite, single nucleotide polymorphism, and human leukocyte antigen (HLA) typing, we screened the MHC in trio families looking for evidence of residual association above and beyond that attributable to the established DRB1*1501 risk haplotype. We then refined this analysis by extending the genotyping of classical HLA loci into independent cases and control subjects.

    Results: Screening confirmed the presence of residual association and suggested that this was maximal in the region of the HLA-C gene. Extending analysis of the classical loci confirmed that this residual association is partly due to allelic heterogeneity at the HLA-DRB1 locus, but also reflects an independent effect from the HLA-C gene. Specifically, the HLA-C*05 allele, or a variant in tight linkage disequilibrium with it, appears to exert a protective effect (p = 3.3 x 10(-5)).

    Interpretation: Variation in the HLA-C gene influences susceptibility to multiple sclerosis independently of any effect attributable to the nearby HLA-DRB1 gene.

    Funded by: Multiple Sclerosis Society: 588; NINDS NIH HHS: K08 NS46341, NS026799, NS032830, NS049477; Wellcome Trust: 048880, 057097

    Annals of neurology 2007;61;3;228-36

  • Insights into modern disease from our distant evolutionary past.

    Yngvadottir B

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    An EMBO workshop entitled 'Human Evolution and Disease' was held recently (6-9 December 2006, Hyderabad, India) where 141 scientists from many disciplines came together to discuss recent studies of human variation, origins and dispersal, natural selection and disease susceptibility. The meeting tackled the subject of human evolution and disease from the different perspectives of archaeology, linguistics, genetics and genomics based on both new and publicly available data sets. In this report, we highlight the latest fashion crazes in the discipline, in particular, the use of large public data sets and new methods to analyse modern human variation and the links between human evolution and disease susceptibility.

    European journal of human genetics : EJHG 2007;15;5;603-6

  • The V103I polymorphism of the MC4R gene and obesity: population based studies and meta-analysis of 29 563 individuals.

    Young EH, Wareham NJ, Farooqi S, Hinney A, Hebebrand J, Scherag A, O'Rahilly S, Barroso I and Sandhu MS

    MRC Epidemiology Unit, Strangeways Research Laboratory, Cambridge, UK.

    Background: Previous studies have suggested that a variant in the melanocortin-4 receptor (MC4R) gene is important in protecting against common obesity. Larger studies are needed, however, to confirm this relation.

    Methods: We assessed the association between the V103I polymorphism in the MC4R gene and obesity in three UK population based cohort studies, totalling 8304 individuals. We also did a meta-analysis of relevant studies, involving 10 975 cases and 18 588 controls, to place our findings in context.

    Finding: In an analysis of all studies, individuals carrying the isoleucine allele had an 18% (95% confidence interval 4-30%, P=0.015) lower risk of obesity compared with non-carriers. There was no heterogeneity among studies and no apparent publication bias.

    Interpretation: This study confirms that the V103I polymorphism protects against human obesity at a population level. As such it provides proof of principle that specific gene variants may, at least in part, explain susceptibility and resistance to common forms of human obesity. A better understanding of the mechanisms underlying this association will help determine whether changes in MC4R activity have therapeutic potential.

    Funded by: Medical Research Council: G0100103, G9824984, MC_U106179471, MC_U106188470; Wellcome Trust: 068086, 077016

    International journal of obesity (2005) 2007;31;9;1437-41

  • Mapping of KIT adjacent sequences on canid autosomes and B chromosomes.

    Yudkin DV, Trifonov VA, Kukekova AV, Vorobieva NV, Rubtsova NV, Yang F, Acland GM, Ferguson-Smith MA and Graphodatsky AS

    Institute of Cytology and Genetics, SB RAS, Novosibirsk, Russia.

    B chromosomes are often considered to be one of the most mysterious elements of karyotypes (Camacho, 2004). It is generally believed that mammalian B chromosomes do not contain any protein coding genes. The discovery of a conserved KIT gene in Canidae B chromosomes has changed this view. Here we performed analysis of sequences surrounding KIT in B chromosomes of the fox and raccoon dog. The presence of the RPL23A pseudogene was shown in canid B chromosomes. The 3' end fragment of the KDR gene was found in raccoon dog B chromosomes. The size of the B-specific fragment homologous to the autosome fragment was estimated to be a minimum of 480 kbp in both species. The origin and evolution of B chromosomes in Canidae are discussed.

    Funded by: NIMH NIH HHS: MH069688

    Cytogenetic and genome research 2007;116;1-2;100-3

  • A new function for the fragile X mental retardation protein in regulation of PSD-95 mRNA stability.

    Zalfa F, Eleuteri B, Dickson KS, Mercaldo V, De Rubeis S, di Penta A, Tabolacci E, Chiurazzi P, Neri G, Grant SG and Bagni C

    Dipartimento di Biologia, Università Tor Vergata, Via della Ricerca Scientifica 1, 00133 Rome, Italy.

    Fragile X syndrome (FXS) results from the loss of the fragile X mental retardation protein (FMRP), an RNA-binding protein that regulates a variety of cytoplasmic mRNAs. FMRP regulates mRNA translation and may be important in mRNA localization to dendrites. We report a third cytoplasmic regulatory function for FMRP: control of mRNA stability. In mice, we found that FMRP binds, in vivo, the mRNA encoding PSD-95, a key molecule that regulates neuronal synaptic signaling and learning. This interaction occurs through the 3' untranslated region of the PSD-95 (also known as Dlg4) mRNA, increasing message stability. Moreover, stabilization is further increased by mGluR activation. Although we also found that the PSD-95 mRNA is synaptically localized in vivo, localization occurs independently of FMRP. Through our functional analysis of this FMRP target we provide evidence that dysregulation of mRNA stability may contribute to the cognitive impairments in individuals with FXS.

    Funded by: Telethon: GGP05269; Wellcome Trust: 056523, 077155

    Nature neuroscience 2007;10;5;578-87

  • Replication of genome-wide association signals in UK samples reveals risk loci for type 2 diabetes.

    Zeggini E, Weedon MN, Lindgren CM, Frayling TM, Elliott KS, Lango H, Timpson NJ, Perry JR, Rayner NW, Freathy RM, Barrett JC, Shields B, Morris AP, Ellard S, Groves CJ, Harries LW, Marchini JL, Owen KR, Knight B, Cardon LR, Walker M, Hitman GA, Morris AD, Doney AS, Wellcome Trust Case Control Consortium (WTCCC), McCarthy MI and Hattersley AT

    Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Churchill Hospital, Oxford, OX3 7LJ, UK.

    The molecular mechanisms involved in the development of type 2 diabetes are poorly understood. Starting from genome-wide genotype data for 1924 diabetic cases and 2938 population controls generated by the Wellcome Trust Case Control Consortium, we set out to detect replicated diabetes association signals through analysis of 3757 additional cases and 5346 controls and by integration of our findings with equivalent data from other international consortia. We detected diabetes susceptibility loci in and around the genes CDKAL1, CDKN2A/CDKN2B, and IGF2BP2 and confirmed the recently described associations at HHEX/IDE and SLC30A8. Our findings provide insight into the genetic architecture of type 2 diabetes, emphasizing the contribution of multiple variants of modest effect. The regions identified underscore the importance of pathways influencing pancreatic beta cell development and function in the etiology of type 2 diabetes.

    Funded by: Medical Research Council: G0000934, G0500070; Wellcome Trust: 083948

    Science (New York, N.Y.) 2007;316;5829;1336-41

  • Y-chromosomal insights into the genetic impact of the caste system in India.

    Zerjal T, Pandya A, Thangaraj K, Ling EY, Kearley J, Bertoneri S, Paracchini S, Singh L and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambs, CB10 1SA, UK.

    The caste system has persisted in Indian Hindu society for around 3,500 years. Like the Y chromosome, caste is defined at birth, and males cannot change their caste. In order to investigate the genetic consequences of this system, we have analysed male-lineage variation in a sample of 227 Indian men of known caste, 141 from the Jaunpur district of Uttar Pradesh and 86 from the rest of India. We typed 131 Y-chromosomal binary markers and 16 microsatellites. We find striking evidence for male substructure: in particular, Brahmins and Kshatriyas (but not other castes) from Jaunpur each show low diversity and the predominance of a single distinct cluster of haplotypes. These findings confirm the genetic isolation and drift within the Jaunpur upper castes, which are likely to result from founder effects and social factors. In the other castes, there may be either larger effective population sizes, or less strict isolation, or both.

    Funded by: Wellcome Trust: 077009

    Human genetics 2007;121;1;137-44

  • Inhibition of the dopamine D1 receptor signaling by PSD-95.

    Zhang J, Vinuela A, Neely MH, Hallett PJ, Grant SG, Miller GM, Isacson O, Caron MG and Yao WD

    Department of Psychiatry, Harvard Medical School, New England Primate Research Center, Southborough, Massachusetts 01772, USA.

    Dopamine D1 receptors play an important role in movement, reward, and learning and are implicated in a number of neurological and psychiatric disorders. These receptors are concentrated in dendritic spines of neurons, including the spine head and the postsynaptic density. D1 within spines is thought to modulate the local channels and receptors to control the excitability and synaptic properties of spines. The molecular mechanisms mediating D1 trafficking, anchorage, and function in spines remain elusive. Here we show that the synaptic scaffolding protein PSD-95, thought to play a role in stabilizing glutamate receptors in the postsynaptic density, interacts with D1 and regulates its trafficking and function. Interestingly, the D1-PSD-95 interaction does not require the well characterized domains of PSD-95 but is mediated by the carboxyl-terminal tail of D1 and the NH(2) terminus of PSD-95, a region that has only recently been recognized to participate in protein-protein interaction. Co-expression of PSD-95 with D1 in mammalian cells inhibits the D1-mediated cAMP accumulation without altering the total expression level or the agonist binding properties of the receptor. The diminished D1 signaling is mediated by reduced D1 expression at the cell surface as a consequence of an enhanced constitutive, dynamin-dependent endocytosis. In addition, genetically engineered mice lacking PSD-95 show a heightened behavioral response to either a D1 agonist or the psychostimulant amphetamine. These studies demonstrate a role for a glutamatergic scaffold in dopamine receptor signaling and trafficking and identify a new potential target for the modulation of abnormal dopaminergic function.

    Funded by: NCRR NIH HHS: P51 RR000168-430106, RR00168; NINDS NIH HHS: P50 NS039793-07, P50 NS039793-08, P50 NS39793

    The Journal of biological chemistry 2007;282;21;15778-89

  • Pseudogenes in the ENCODE regions: consensus annotation, analysis of transcription, and evolution.

    Zheng D, Frankish A, Baertsch R, Kapranov P, Reymond A, Choo SW, Lu Y, Denoeud F, Antonarakis SE, Snyder M, Ruan Y, Wei CL, Gingeras TR, Guigó R, Harrow J and Gerstein MB

    Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA.

    Arising from either retrotransposition or genomic duplication of functional genes, pseudogenes are "genomic fossils" valuable for exploring the dynamics and evolution of genes and genomes. Pseudogene identification is an important problem in computational genomics, and is also critical for obtaining an accurate picture of a genome's structure and function. However, no consensus computational scheme for defining and detecting pseudogenes has been developed thus far. As part of the ENCyclopedia Of DNA Elements (ENCODE) project, we have compared several distinct pseudogene annotation strategies and found that different approaches and parameters often resulted in rather distinct sets of pseudogenes. We subsequently developed a consensus approach for annotating pseudogenes (derived from protein coding genes) in the ENCODE regions, resulting in 201 pseudogenes, two-thirds of which originated from retrotransposition. A survey of orthologs for these pseudogenes in 28 vertebrate genomes showed that a significant fraction (approximately 80%) of the processed pseudogenes are primate-specific sequences, highlighting the increasing retrotransposition activity in primates. Analysis of sequence conservation and variation also demonstrated that most pseudogenes evolve neutrally, and processed pseudogenes appear to have lost their coding potential immediately or soon after their emergence. In order to explore the functional implication of pseudogene prevalence, we have extensively examined the transcriptional activity of the ENCODE pseudogenes. We performed systematic series of pseudogene-specific RACE analyses. These, together with complementary evidence derived from tiling microarrays and high throughput sequencing, demonstrated that at least a fifth of the 201 pseudogenes are transcribed in one or more cell lines or tissues.

    Funded by: NHGRI NIH HHS: U01HG03147, U01HG03150, U01HG03156; PHS HHS: N01C012400; Wellcome Trust: 077198

    Genome research 2007;17;6;839-51

  • Towards efficient registration of medical images.

    Zhou H, Liu T, Lin F, Pang Y, Wu J and Wu J

    Queen Mary College, University of London, London, UK.

    In this paper we propose a Bayesian-based mutual information technique for image registration, combined with an established affine transformation model. Classical affine models allow the images to be approximately aligned. However, these affine models have proved inefficient and inaccurate in demanding circumstances, such as low-resolution images. To address this problem, we apply mutual information measures with importance sampling to the images in an attempt to simulate the probability distribution of intensity similarity across the images. The entire registration adopts a stopping criterion as discovered in the context of differential equations. Finally, experimental results demonstrate the favorable performance of the proposed algorithm.

    Computerized medical imaging and graphics : the official journal of the Computerized Medical Imaging Society 2007;31;6;374-82
