Sanger Institute - Publications 2007

Number of papers published in 2007: 144

  • Predicted functions and linkage specificities of the products of the Streptococcus pneumoniae capsular biosynthetic loci.

    Aanensen DM, Mavroidi A, Bentley SD, Reeves PR and Spratt BG

    Department of Infectious Disease Epidemiology, Imperial College London, Room G22, Old Medical School Building, St. Mary's Hospital, Norfolk Place, London W2 1PG, United Kingdom.

    The sequences of the capsular biosynthetic (cps) loci of 90 serotypes of Streptococcus pneumoniae have recently been determined. Bioinformatic procedures were used to predict the general functions of 1,973 of the 1,999 gene products and to identify proteins within the same homology group, Pfam family, and CAZy glycosyltransferase family. Correlating cps gene content with the 54 known capsular polysaccharide (CPS) structures provided tentative assignments of the specific functions of the different homology groups of each functional class (regulatory proteins, enzymes for synthesis of CPS constituents, polymerases, flippases, initial sugar transferases, glycosyltransferases [GTs], phosphotransferases, acetyltransferases, and pyruvyltransferases). Assignment of the glycosidic linkages catalyzed by the 342 GTs (92 homology groups) is problematic, but tentative assignments could be made by using this large set of cps loci and CPS structures to correlate the presence of particular GTs with specific glycosidic linkages, by correlating inverting or retaining linkages in CPS repeat units with the inverting or retaining mechanisms of the GTs predicted from their CAZy family membership, and by comparing the CPS structures of serotypes that have very similar cps gene contents. These large-scale comparisons between structure and gene content assigned the linkages catalyzed by 72% of the GTs, and all linkages were assigned in 32 of the serotypes with known repeat unit structures. Clear examples where very similar initial sugar transferases or glycosyltransferases catalyze different linkages in different serotypes were also identified. These assignments should provide a stimulus for biochemical studies to evaluate the reactions that are proposed.

    Funded by: Wellcome Trust

    Journal of bacteriology 2007;189;21;7856-76

  • WebACT: an online genome comparison suite.

    Abbott JC, Aanensen DM and Bentley SD

    Centre for Bioinformatics, Imperial College London, UK.

    Comparison of related genomes is an enormously powerful technique for explaining phenotypic differences and revealing recent evolutionary events. Genomes evolve through a host of mechanisms including long- and short-range intragenomic rearrangements, insertion of laterally acquired DNA, gene loss, and single-nucleotide polymorphisms. The Artemis Comparison Tool (ACT) was developed to enable the intuitive visualization of the consequences of such events in the context of two or more aligned genomes. WebACT is an online resource designed to allow the alignment of up to five genomic sequences within the ACT environment without the need for local software installation. Comparisons can be carried out between uploaded sequences, or those selected from the EMBL or RefSeq databases, using BLASTZ, MUMmer, or Basic Local Alignment Search Tool (BLAST). Precomputed comparisons can be selected from a database covering all the completed bacterial chromosome and plasmid sequences in the Genome Reviews database (1). This allows the rapid visualization of regions of interest, without the need to handle the full genome sequences. Here, we describe the process of using WebACT to prepare comparisons for visualization, and the selection of precomputed comparisons from the database. The use of ACT to view the selected comparison is then explored using examples from bacterial genomes.

    Funded by: Wellcome Trust

    Methods in molecular biology (Clifton, N.J.) 2007;395;57-74

  • BCL11B is required for positive selection and survival of double-positive thymocytes.

    Albu DI, Feng D, Bhattacharya D, Jenkins NA, Copeland NG, Liu P and Avram D

    Center for Cell Biology and Cancer Research, Albany Medical College, Albany, NY 12208, USA.

    Transcriptional control of gene expression in double-positive (DP) thymocytes remains poorly understood. We show that the transcription factor BCL11B plays a critical role in DP thymocytes by controlling positive selection of both CD4 and CD8 lineages. BCL11B-deficient DP thymocytes rearrange T cell receptor (TCR) alpha; however, they display impaired proximal TCR signaling and attenuated extracellular signal-regulated kinase phosphorylation and calcium flux, which are all required for initiation of positive selection. Further, provision of transgenic TCRs did not improve positive selection of BCL11B-deficient DP thymocytes. BCL11B-deficient DP thymocytes have altered expression of genes with a role in positive selection, TCR signaling, and other signaling pathways intersecting the TCR, which may account for the defect. BCL11B-deficient DP thymocytes also presented increased susceptibility to spontaneous apoptosis associated with high levels of cleaved caspase-3 and an altered balance of proapoptotic/prosurvival factors. This latter susceptibility was manifested even in the absence of TCR signaling and was only partially rescued by provision of the BCL2 transgene, indicating that control of DP thymocyte survival by BCL11B is nonredundant and, at least in part, independent of BCL2 prosurvival factors.

    Funded by: NHLBI NIH HHS: T32 HL007194, T32-HL-07194; NIAID NIH HHS: R01 AI067846, R01 AI067846-01A2; NIAMS NIH HHS: K01 AR-02194, K01 AR002194

    The Journal of experimental medicine 2007;204;12;3003-15

  • SISYPHUS--structural alignments for proteins with non-trivial relationships.

    Andreeva A, Prlić A, Hubbard TJ and Murzin AG

    MRC Centre for Protein Engineering, Hills Road, Cambridge CB2 2QH, UK.

    With the increasing amount of structural data, the number of homologous protein structures bearing topological irregularities is steadily growing. These include proteins with circular permutations, segment-swapping, context-dependent folding or chameleon sequences that can adopt alternative secondary structures. Their non-trivial structural relationships are readily identified during expert analysis, but their automatic identification with existing computational tools remains difficult or impossible. Such non-trivial cases of protein relationships are known to pose a problem to multiple alignment algorithms and to impede comparative modeling studies. They support an emerging concept of the evolutionarily changeable protein fold, which creates practical difficulties for the hierarchical classifications of protein structures. To facilitate the understanding of, and to provide a comprehensive annotation of, proteins with such non-trivial structural relationships, we have created SISYPHUS (from the Greek for "crafty"), a compendium to the SCOP database. The SISYPHUS database contains a collection of manually curated structural alignments and their inter-relationships. The multiple alignments are constructed for protein structural regions that range from oligomeric biological units or individual domains to fragments of different size. The SISYPHUS multiple alignments are displayed with SPICE, a browser that provides an integrated view of protein sequences, structures and their annotations. The database is available from

    Funded by: Medical Research Council: G0100305, MC_U105192716; Wellcome Trust: 077198

    Nucleic acids research 2007;35;Database issue;D253-9

  • The genome of Salmonella enterica serovar Typhi.

    Baker S and Dougan G

    The Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.

    The generation of complete genome sequences provides a blueprint that facilitates the genetic characterization of pathogens and their hosts. The genome of Salmonella enterica serovar Typhi (S. Typhi) harbors ~5 million base pairs encoding some 4000 genes, of which >200 are functionally inactive. Comparison of S. Typhi isolates from around the world indicates that they are highly related (clonal) and that they emerged from a single point of origin ~30,000-50,000 years ago. Evidence suggests that, as well as undergoing gene degradation, S. Typhi has also recently acquired genes, such as those encoding the Vi antigen, by horizontal transfer events.

    Funded by: Wellcome Trust

    Clinical infectious diseases : an official publication of the Infectious Diseases Society of America 2007;45 Suppl 1;S29-33

  • A linear plasmid truncation induces unidirectional flagellar phase change in H:z66 positive Salmonella Typhi.

    Baker S, Holt K, Whitehead S, Goodhead I, Perkins T, Stocker B, Hardy J and Dougan G

    The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.

    The process by which bacteria regulate flagellar expression is known as phase variation, and in Salmonella enterica this process permits the expression of one of two flagellin genes, fliC or fljB, at any one time. Salmonella Typhi (S. Typhi) is normally not capable of phase variation of flagellar antigen expression, because isolates harbour only the fliC gene (H:d) and lack an equivalent fljB locus. However, some S. Typhi isolates, exclusively from Indonesia, harbour an fljB equivalent encoded on a linear plasmid, pBSSB1, that drives the expression of a novel flagellin named H:z66. H:z66+ S. Typhi isolates were stimulated to change flagellar phase and genetically analysed for the mechanism of variation. The phase change was demonstrated to be unidirectional, reverting to expression from the resident chromosomal fliC gene. DNA sequencing demonstrated that pBSSB1 linear DNA was still detectable but that these derivatives had undergone deletion and were lacking fljA(z66) (encoding a flagellar repressor) and fljB(z66). The deletion end-point was found to involve one of the plasmid termini and a palindromic repeat sequence within fljB(z66), distinct from that found at the terminus of pBSSB1. These data demonstrate that, like some Streptomyces linear elements, at least one of the terminal inverted repeats of pBSSB1 is non-essential, but that a palindromic repeat sequence may be necessary for replication.

    Funded by: Wellcome Trust: 076962

    Molecular microbiology 2007;66;5;1207-18

  • SCOOP: a simple method for identification of novel protein superfamily relationships.

    Bateman A and Finn RD

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK.

    Motivation: Profile searches of sequence databases are a sensitive way to detect sequence relationships. Sophisticated profile-profile comparison algorithms that have been recently introduced increase search sensitivity even further.

    Results: In this article, a simpler approach than profile-profile comparison is presented that has a comparable performance to state-of-the-art tools such as COMPASS, HHsearch and PRC. This approach is called SCOOP (Simple Comparison Of Outputs Program), and is shown to find known relationships between families in the Pfam database as well as detect novel distant relationships between families. Several novel discoveries are presented including the discovery that a domain of unknown function (DUF283) found in Dicer proteins is related to double-stranded RNA-binding domains.

    Availability: SCOOP is freely available under a GNU GPL license from

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Funded by: Wellcome Trust: 087656

    Bioinformatics (Oxford, England) 2007;23;7;809-14

  • Bacterial therapeutics.

    Bentley S and Sebaihia M

    Nature reviews. Microbiology 2007;5;3;170-1

  • Variety is the spice of eukaryotic life.

    Berriman M and Pain A

    Nature reviews. Microbiology 2007;5;9;660-1

  • Genome plasticity of BCG and impact on vaccine efficacy.

    Brosch R, Gordon SV, Garnier T, Eiglmeier K, Frigui W, Valenti P, Dos Santos S, Duthoy S, Lacroix C, Garcia-Pelayo C, Inwald JK, Golby P, Garcia JN, Hewinson RG, Behr MA, Quail MA, Churcher C, Barrell BG, Parkhill J and Cole ST

    Unité de Génétique Moléculaire Bactérienne, Institut Pasteur, 28 Rue du Docteur Roux, 75724 Paris Cedex 15, France.

    To understand the evolution, attenuation, and variable protective efficacy of bacillus Calmette-Guérin (BCG) vaccines, Mycobacterium bovis BCG Pasteur 1173P2 has been subjected to comparative genome and transcriptome analysis. The 4,374,522-bp genome contains 3,954 protein-coding genes, 58 of which are present in two copies as a result of two independent tandem duplications, DU1 and DU2. DU1 is restricted to BCG Pasteur, although four forms of DU2 exist; DU2-I is confined to early BCG vaccines, like BCG Japan, whereas DU2-III and DU2-IV occur in the late vaccines. The glycerol-3-phosphate dehydrogenase gene, glpD2, is one of only three genes common to all four DU2 variants, implying that BCG requires higher levels of this enzyme to grow on glycerol. Further amplification of the DU2 region is ongoing, even within vaccine preparations used to immunize humans. An evolutionary scheme for BCG vaccines was established by analyzing DU2 and other markers. Lesions in genes encoding sigma-factors and pleiotropic transcriptional regulators, like PhoR and Crp, were also uncovered in various BCG strains; together with gene amplification, these affect gene expression levels, immunogenicity, and, possibly, protection against tuberculosis. Furthermore, the combined findings suggest that early BCG vaccines may even be superior to the later ones that are more widely used.

    Funded by: Wellcome Trust

    Proceedings of the National Academy of Sciences of the United States of America 2007;104;13;5596-601

  • Generation of an inducible and optimized piggyBac transposon system.

    Cadiñanos J and Bradley A

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.

    Genomic studies in the mouse have been slowed by the lack of transposon-mediated mutagenesis. However, since the resurrection of Sleeping Beauty (SB), the possibility of performing forward genetics in mice has been reinforced. Recently, piggyBac (PB), a functional transposon from insects, was also described to work in mammals. As the activity of PB is higher than that of SB11 and SB12, two hyperactive SB transposases, we have characterized and improved the PB system in mouse ES cells. We have generated a mouse codon-optimized version of the PB transposase coding sequence (CDS) which provides transposition levels greater than the original. We have also found that the promoter sequence predicted in the 5'-terminal repeat of the PB transposon is active in the mammalian context. Finally, we have engineered inducible versions of the optimized piggyBac transposase fused with ERT2. One of them, when induced, provides higher levels of transposition than the native piggyBac CDS, whereas in the absence of induction its activity is indistinguishable from background. We expect that these tools, adaptable to perform mouse-germline mutagenesis, will facilitate the identification of genes involved in pathological and physiological processes, such as cancer or ES cell differentiation.

    Funded by: Wellcome Trust

    Nucleic acids research 2007;35;12;e87

  • Methods and strategies for analyzing copy number variation using DNA microarrays.

    Carter NP

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    The association of DNA copy-number variation (CNV) with specific gene function and human disease has long been known, but the wide scope and prevalence of this form of variation has only recently been fully appreciated. The latest studies using microarray technology have demonstrated that as much as 12% of the human genome and thousands of genes are variable in copy number, and this diversity is likely to be responsible for a significant proportion of normal phenotypic variation. Current challenges involve developing methods not only for detecting and cataloging CNVs in human populations at increasingly higher resolution but also for determining the association of CNVs with biological function, recent human evolution, and common and complex human disease.

    Funded by: Wellcome Trust: 077008

    Nature genetics 2007;39;7 Suppl;S16-21

  • A recombineering based approach for high-throughput conditional knockout targeting vector construction.

    Chan W, Costantino N, Li R, Lee SC, Su Q, Melvin D, Court DL and Liu P

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, UK.

    Functional analysis of mammalian genes in vivo is primarily achieved through analysing knockout mice. Now that the sequencing of several mammalian genomes has been completed, understanding the functions of all genes represents the next major challenge in the post-genome era. Many research groups have generated knockout mice, but only by making individual knockouts, one gene at a time. New technological advances and the refinement of existing technologies are critical for genome-wide targeted mutagenesis in the mouse. We describe here new recombineering reagents and protocols that enable recombineering to be carried out in a 96-well format. Consequently, we are able to construct 96 conditional knockout targeting vectors simultaneously. Our new recombineering system makes it practical to generate large numbers of precisely engineered DNA constructs for functional genomics studies.

    Funded by: Intramural NIH HHS; Wellcome Trust

    Nucleic acids research 2007;35;8;e64

  • Serodiagnosis of Salmonella enterica serovar Typhi and S. enterica serovars Paratyphi A, B and C human infections.

    Chart H, Cheasty T, de Pinna E, Siorvanes L, Wain J, Alam D, Nizami Q, Bhutta Z and Threlfall EJ

    Laboratory of Enteric Pathogens, Department of Gastrointestinal Infections, Centre for Infections, Health Protection Agency, 61 Colindale Avenue, London NW9 5EQ, UK.

    The aim of this study was to evaluate an immunoassay for the detection of human serum antibodies to the LPS and flagellar antigens of Salmonella Typhi and Salmonella Paratyphi A, B and C, and to the Vi capsular polysaccharide of S. Typhi and S. Paratyphi C. A total of 330 sera were used; these originated from 15 patients who were culture-positive for S. Typhi and 15 healthy controls, together with 300 sera submitted to the Laboratory of Enteric Pathogens for Salmonella serodiagnosis. By SDS-PAGE/immunoblotting, all 15 sera from culture-positive patients had serum antibodies to the 9,12 LPS antigens and 10 had antibodies to the 'd' flagellar antigens. Of the 300 reference sera, 22 had antibodies to the 9,12 LPS antigens, one to the 1,4,5,12 LPS antigens and 12 to the 6,7 LPS antigens. Only two sera had antibodies to flagellar antigens, one of which bound to the 'b' and the other to the 'd' antigen. An ELISA was developed that successfully detected serum antibodies to the Vi capsular polysaccharides, but because of the kinetics of serum antibody production to the Vi, these antibodies may be of limited value in the serodiagnosis of acute infection with S. Typhi and S. Paratyphi C. The immunoassays described here provide a sensitive means of detecting serum antibodies to the LPS, flagellar and Vi antigens of S. Typhi and S. Paratyphi, and constitute a viable replacement for the Widal assay for the screening of sera. The Salmonella serodiagnosis protocols described here are the new standard operating procedures used by the Health Protection Agency's National Salmonella Reference Centre based in the Laboratory of Enteric Pathogens, Colindale, UK.

    Journal of medical microbiology 2007;56;Pt 9;1161-6

  • Genomix: a method for combining gene-finders' predictions, which uses evolutionary conservation of sequence and intron-exon structure.

    Coghlan A and Durbin R

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Motivation: Correct gene predictions are crucial for most analyses of genomes. However, in the absence of transcript data, gene prediction is still challenging. One way to improve gene-finding accuracy in such genomes is to combine the exons predicted by several gene-finders, so that gene-finders that make uncorrelated errors can correct each other.

    Results: We present a method for combining gene-finders called Genomix. Genomix selects the predicted exons that are best conserved within and/or between species in terms of sequence and intron-exon structure, and combines them into a gene structure. Genomix was used to combine predictions from four gene-finders for Caenorhabditis elegans, by selecting the predicted exons that are best conserved with C. briggsae and C. remanei. On a set of approximately 1500 confirmed C. elegans genes, Genomix increased the exon-level specificity by 10.1% and sensitivity by 2.7% compared to the best input gene-finder.

    Availability: Scripts and Supplementary Material can be found at

    Funded by: Wellcome Trust: 077192

    Bioinformatics (Oxford, England) 2007;23;12;1468-75

  • Adiponectin receptor genes: mutation screening in syndromes of insulin resistance and association studies for type 2 diabetes and metabolic traits in UK populations.

    Collins SC, Luan J, Thompson AJ, Daly A, Semple RK, O'Rahilly S, Wareham NJ and Barroso I

    Metabolic Disease Group, The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.

    Aims/hypothesis: Adiponectin is an adipokine with insulin-sensitising and anti-atherogenic properties. Several reports suggest that genetic variants in the adiponectin gene are associated with circulating levels of adiponectin, insulin sensitivity and type 2 diabetes risk. Recently two receptors for adiponectin have been cloned. Genetic studies have yielded conflicting results on the role of these genes and type 2 diabetes predisposition. In this study we aimed to evaluate the potential role of genetic variation in these genes in syndromes of severe insulin resistance, type 2 diabetes and in related metabolic traits in UK Europid populations.

    Materials and methods: Exons and splice junctions of the adiponectin receptor 1 and 2 genes (ADIPOR1; ADIPOR2) were sequenced in patients from our severe insulin resistance cohort (n=129). Subsequently, 24 polymorphisms were tested for association with type 2 diabetes in population-based type 2 diabetes case-control studies (n=2,127) and with quantitative traits in a population-based longitudinal study (n=1,721).

    Results: No missense or nonsense mutations in ADIPOR1 and ADIPOR2 were detected in the cohort of patients with severe insulin resistance. None of the 24 polymorphisms (allele frequency 2.3-48.3%) tested was associated with type 2 diabetes in the case-control study. Similarly, none of the polymorphisms was associated with fasting plasma insulin, fasting and 2-h post-load plasma glucose, 30-min insulin increment or BMI.

    Conclusions/interpretation: Genetic variation in ADIPOR1 and ADIPOR2 is not a major cause of extreme insulin resistance in humans, nor does it contribute in a significant manner to type 2 diabetes risk and related traits in UK Europid populations.

    Funded by: Medical Research Council: MC_U106179471; Wellcome Trust

    Diabetologia 2007;50;3;555-62

  • The population genetics of structural variation.

    Conrad DF and Hurles ME

    Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA.

    Population genetics is central to our understanding of human variation, and by linking medical and evolutionary themes, it enables us to understand the origins and impacts of our genomic differences. Despite current limitations in our knowledge of the locations, sizes and mutational origins of structural variants, our characterization of their population genetics is developing apace, bringing new insights into recent human adaptation, genome biology and disease. We summarize recent dramatic advances, describe the diverse mutational origins of chromosomal rearrangements and argue that their complexity necessitates a re-evaluation of existing population genetic methods.

    Funded by: Wellcome Trust: 077014

    Nature genetics 2007;39;7 Suppl;S30-6

  • Sink or swim.

    Crossman LC

    Nature reviews. Microbiology 2007;5;11;834-5

  • Tissue-specific histone modification and transcription factor binding in alpha globin gene expression.

    De Gobbi M, Anguita E, Hughes J, Sloane-Stanley JA, Sharpe JA, Koch CM, Dunham I, Gibbons RJ, Wood WG and Higgs DR

    Medical Research Council, Molecular Haematology Unit, Weatherall Institute of Molecular Medicine, Oxford University, Oxford, UK.

    To address the mechanism by which the human globin genes are activated during erythropoiesis, we have used a tiled microarray to analyze the pattern of transcription factor binding and associated histone modifications across the telomeric region of human chromosome 16 in primary erythroid and nonerythroid cells. This 220-kb region includes the alpha globin genes and 9 widely expressed genes flanking the alpha globin locus. The unbiased, comprehensive analysis of transcription factor binding and histone modifications (acetylation and methylation) described here not only identified all known cis-acting regulatory elements in the human alpha globin cluster but also demonstrated that there are no additional erythroid-specific regulatory elements in the 220-kb region tested. In addition, the pattern of histone modification distinguished promoter elements from potential enhancer elements across this region. Finally, comparison of the human and mouse orthologous regions in a unique mouse model, with both regions coexpressed in the same animal, showed significant differences that may explain how these 2 clusters are regulated differently in vivo.

    Funded by: Medical Research Council: MC_U137961145, MC_U137961147; NHGRI NIH HHS: U01 HG003168; Wellcome Trust

    Blood 2007;110;13;4503-10

  • Prominent use of distal 5' transcription start sites and discovery of a large number of additional exons in ENCODE regions.

    Denoeud F, Kapranov P, Ucla C, Frankish A, Castelo R, Drenkow J, Lagarde J, Alioto T, Manzano C, Chrast J, Dike S, Wyss C, Henrichsen CN, Holroyd N, Dickson MC, Taylor R, Hance Z, Foissac S, Myers RM, Rogers J, Hubbard T, Harrow J, Guigó R, Gingeras TR, Antonarakis SE and Reymond A

    Grup de Recerca en Informática Biomèdica, Institut Municipal d'Investigació Mèdica/Universitat Pompeu Fabra, 08003 Barcelona, Catalonia, Spain.

    This report presents systematic empirical annotation of transcript products from 399 annotated protein-coding loci across the 1% of the human genome targeted by the Encyclopedia of DNA elements (ENCODE) pilot project using a combination of 5' rapid amplification of cDNA ends (RACE) and high-density resolution tiling arrays. We identified previously unannotated and often tissue- or cell-line-specific transcribed fragments (RACEfrags), both 5' distal to the annotated 5' terminus and internal to the annotated gene bounds for the vast majority (81.5%) of the tested genes. Half of the distal RACEfrags span large segments of genomic sequence away from the main portion of the coding transcript and often overlap with the upstream-annotated gene(s). Notably, at least 20% of the resultant novel transcripts have changes in their open reading frames (ORFs), most of them fusing ORFs of adjacent transcripts. A significant fraction of distal RACEfrags show expression levels comparable to those of known exons of the same locus, suggesting that they are not merely part of rare minority splice forms. These results have significant implications concerning (1) our current understanding of the architecture of protein-coding genes; (2) our views on locations of regulatory regions in the genome; and (3) the interpretation of sequence polymorphisms mapping to regions hitherto considered to be "noncoding," ultimately relating to the identification of disease-related sequence alterations.

    Funded by: NCI NIH HHS: N01CO12400; NHGRI NIH HHS: U01 HG003147, U01 HG003150, U01HG03147, U01HG03150; PHS HHS: N01C012400; Wellcome Trust: 077198

    Genome research 2007;17;6;746-59

  • An H-NS-like stealth protein aids horizontal DNA transmission in bacteria.

    Doyle M, Fookes M, Ivens A, Mangan MW, Wain J and Dorman CJ

    Department of Microbiology, Moyne Institute of Preventive Medicine, Trinity College Dublin, Dublin 2, Ireland.

    The Sfh protein is encoded by self-transmissible plasmids involved in human typhoid and is closely related to the global regulator H-NS. We have found that Sfh provides a stealth function that allows the plasmids to be transmitted to new bacterial hosts with minimal effects on their fitness. Introducing the plasmid without the sfh gene imposes a mild H-NS(-) phenotype and a severe loss of fitness due to titration of the cellular pool of H-NS by the A+T-rich plasmid. This stealth strategy seems to be used widely to aid horizontal DNA transmission and has important implications for bacterial evolution.

    Funded by: Wellcome Trust

    Science (New York, N.Y.) 2007;315;5809;251-2

  • Evolution of genes and genomes on the Drosophila phylogeny.

    Drosophila 12 Genomes Consortium, Clark AG, Eisen MB, Smith DR, Bergman CM, Oliver B, Markow TA, Kaufman TC, Kellis M, Gelbart W, Iyer VN, Pollard DA, Sackton TB, Larracuente AM, Singh ND, Abad JP, Abt DN, Adryan B, Aguade M, Akashi H, Anderson WW, Aquadro CF, Ardell DH, Arguello R, Artieri CG, Barbash DA, Barker D, Barsanti P, Batterham P, Batzoglou S, Begun D, Bhutkar A, Blanco E, Bosak SA, Bradley RK, Brand AD, Brent MR, Brooks AN, Brown RH, Butlin RK, Caggese C, Calvi BR, Bernardo de Carvalho A, Caspi A, Castrezana S, Celniker SE, Chang JL, Chapple C, Chatterji S, Chinwalla A, Civetta A, Clifton SW, Comeron JM, Costello JC, Coyne JA, Daub J, David RG, Delcher AL, Delehaunty K, Do CB, Ebling H, Edwards K, Eickbush T, Evans JD, Filipski A, Findeiss S, Freyhult E, Fulton L, Fulton R, Garcia AC, Gardiner A, Garfield DA, Garvin BE, Gibson G, Gilbert D, Gnerre S, Godfrey J, Good R, Gotea V, Gravely B, Greenberg AJ, Griffiths-Jones S, Gross S, Guigo R, Gustafson EA, Haerty W, Hahn MW, Halligan DL, Halpern AL, Halter GM, Han MV, Heger A, Hillier L, Hinrichs AS, Holmes I, Hoskins RA, Hubisz MJ, Hultmark D, Huntley MA, Jaffe DB, Jagadeeshan S, Jeck WR, Johnson J, Jones CD, Jordan WC, Karpen GH, Kataoka E, Keightley PD, Kheradpour P, Kirkness EF, Koerich LB, Kristiansen K, Kudrna D, Kulathinal RJ, Kumar S, Kwok R, Lander E, Langley CH, Lapoint R, Lazzaro BP, Lee SJ, Levesque L, Li R, Lin CF, Lin MF, Lindblad-Toh K, Llopart A, Long M, Low L, Lozovsky E, Lu J, Luo M, Machado CA, Makalowski W, Marzo M, Matsuda M, Matzkin L, McAllister B, McBride CS, McKernan B, McKernan K, Mendez-Lago M, Minx P, Mollenhauer MU, Montooth K, Mount SM, Mu X, Myers E, Negre B, Newfeld S, Nielsen R, Noor MA, O'Grady P, Pachter L, Papaceit M, Parisi MJ, Parisi M, Parts L, Pedersen JS, Pesole G, Phillippy AM, Ponting CP, Pop M, Porcelli D, Powell JR, Prohaska S, Pruitt K, Puig M, Quesneville H, Ram KR, Rand D, Rasmussen MD, Reed LK, Reenan R, Reily A, Remington KA, Rieger TT, Ritchie MG, Robin C, Rogers YH, Rohde C, Rozas J, Rubenfield MJ, Ruiz A, Russo S, Salzberg SL, Sanchez-Gracia A, Saranga DJ, Sato H, Schaeffer SW, Schatz MC, Schlenke T, Schwartz R, Segarra C, Singh RS, Sirot L, Sirota M, Sisneros NB, Smith CD, Smith TF, Spieth J, Stage DE, Stark A, Stephan W, Strausberg RL, Strempel S, Sturgill D, Sutton G, Sutton GG, Tao W, Teichmann S, Tobari YN, Tomimura Y, Tsolas JM, Valente VL, Venter E, Venter JC, Vicario S, Vieira FG, Vilella AJ, Villasante A, Walenz B, Wang J, Wasserman M, Watts T, Wilson D, Wilson RK, Wing RA, Wolfner MF, Wong A, Wong GK, Wu CI, Wu G, Yamamoto D, Yang HP, Yang SP, Yorke JA, Yoshida K, Zdobnov E, Zhang P, Zhang Y, Zimin AV, Baldwin J, Abdouelleil A, Abdulkadir J, Abebe A, Abera B, Abreu J, Acer SC, Aftuck L, Alexander A, An P, Anderson E, Anderson S, Arachi H, Azer M, Bachantsang P, Barry A, Bayul T, Berlin A, Bessette D, Bloom T, Blye J, Boguslavskiy L, Bonnet C, Boukhgalter B, Bourzgui I, Brown A, Cahill P, Channer S, Cheshatsang Y, Chuda L, Citroen M, Collymore A, Cooke P, Costello M, D'Aco K, Daza R, De Haan G, DeGray S, DeMaso C, Dhargay N, Dooley K, Dooley E, Doricent M, Dorje P, Dorjee K, Dupes A, Elong R, Falk J, Farina A, Faro S, Ferguson D, Fisher S, Foley CD, Franke A, Friedrich D, Gadbois L, Gearin G, Gearin CR, Giannoukos G, Goode T, Graham J, Grandbois E, Grewal S, Gyaltsen K, Hafez N, Hagos B, Hall J, Henson C, Hollinger A, Honan T, Huard MD, Hughes L, Hurhula B, Husby ME, Kamat A, Kanga B, Kashin S, Khazanovich D, Kisner P, Lance K, Lara M, Lee W, Lennon N, Letendre F, LeVine R, Lipovsky A, Liu X, Liu J, Liu S, Lokyitsang T, Lokyitsang Y, Lubonja R, Lui A, MacDonald P, Magnisalis V, Maru K, Matthews C, McCusker W, McDonough S, Mehta T, Meldrim J, Meneus L, Mihai O, Mihalev A, Mihova T, Mittelman R, Mlenga V, Montmayeur A, Mulrain L, Navidi A, Naylor J, Negash T, Nguyen T, Nguyen N, Nicol R, Norbu C, Norbu N, Novod N, O'Neill B, Osman S, Markiewicz E, Oyono OL, Patti C, Phunkhang P, Pierre F, Priest M, Raghuraman S, Rege F, Reyes R, Rise C, Rogov P, Ross K, Ryan E, Settipalli S, Shea T, Sherpa N, Shi L, Shih D, Sparrow T, Spaulding J, Stalker J, Stange-Thomann N, Stavropoulos S, Stone C, Strader C, Tesfaye S, Thomson T, Thoulutsang Y, Thoulutsang D, Topham K, Topping I, Tsamla T, Vassiliev H, Vo A, Wangchuk T, Wangdi T, Weiand M, Wilkinson J, Wilson A, Yadav S, Young G, Yu Q, Zembek L, Zhong D, Zimmer A, Zwirko Z, Jaffe DB, Alvarez P, Brockman W, Butler J, Chin C, Gnerre S, Grabherr M, Kleber M, Mauceli E and MacCallum I

    Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA.

    Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here for the first time (sechellia, simulans, yakuba, erecta, ananassae, persimilis, willistoni, mojavensis, virilis and grimshawi), illustrate how rates and patterns of sequence divergence across taxa can illuminate evolutionary processes on a genomic scale. These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution. Despite remarkable similarities among these Drosophila species, we identified many putatively non-neutral changes in protein-coding genes, non-coding RNA genes, and cis-regulatory regions. These may prove to underlie differences in the ecology and behaviour of these diverse species.

    Funded by: Intramural NIH HHS: Z01 DK015600-12; Medical Research Council: MC_U105161047, MC_U137761446; NHGRI NIH HHS: R01 HG000747, R01 HG000747-16, R01 HG002779-05, R01 HG002779-06, R01 HG004037; NIGMS NIH HHS: F32 GM067504, R01 GM074813-04; NLM NIH HHS: R01 LM006845-08, R01 LM006845-09

    Nature 2007;450;7167;203-18

  • Genome-wide association study identifies novel breast cancer susceptibility loci.

    Easton DF, Pooley KA, Dunning AM, Pharoah PD, Thompson D, Ballinger DG, Struewing JP, Morrison J, Field H, Luben R, Wareham N, Ahmed S, Healey CS, Bowman R, SEARCH collaborators, Meyer KB, Haiman CA, Kolonel LK, Henderson BE, Le Marchand L, Brennan P, Sangrajrang S, Gaborieau V, Odefrey F, Shen CY, Wu PE, Wang HC, Eccles D, Evans DG, Peto J, Fletcher O, Johnson N, Seal S, Stratton MR, Rahman N, Chenevix-Trench G, Bojesen SE, Nordestgaard BG, Axelsson CK, Garcia-Closas M, Brinton L, Chanock S, Lissowska J, Peplonska B, Nevanlinna H, Fagerholm R, Eerola H, Kang D, Yoo KY, Noh DY, Ahn SH, Hunter DJ, Hankinson SE, Cox DG, Hall P, Wedren S, Liu J, Low YL, Bogdanova N, Schürmann P, Dörk T, Tollenaar RA, Jacobi CE, Devilee P, Klijn JG, Sigurdson AJ, Doody MM, Alexander BH, Zhang J, Cox A, Brock IW, MacPherson G, Reed MW, Couch FJ, Goode EL, Olson JE, Meijers-Heijboer H, van den Ouweland A, Uitterlinden A, Rivadeneira F, Milne RL, Ribas G, Gonzalez-Neira A, Benitez J, Hopper JL, McCredie M, Southey M, Giles GG, Schroen C, Justenhoven C, Brauch H, Hamann U, Ko YD, Spurdle AB, Beesley J, Chen X, kConFab, AOCS Management Group, Mannermaa A, Kosma VM, Kataja V, Hartikainen J, Day NE, Cox DR and Ponder BA

    CR-UK Genetic Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge CB1 8RN, UK.

    Breast cancer exhibits familial aggregation, consistent with variation in genetic susceptibility to the disease. Known susceptibility genes account for less than 25% of the familial risk of breast cancer, and the residual genetic variance is likely to be due to variants conferring more moderate risks. To identify further susceptibility alleles, we conducted a two-stage genome-wide association study in 4,398 breast cancer cases and 4,316 controls, followed by a third stage in which 30 single nucleotide polymorphisms (SNPs) were tested for confirmation in 21,860 cases and 22,578 controls from 22 studies. We used 227,876 SNPs that were estimated to correlate with 77% of known common SNPs in Europeans at r² > 0.5. SNPs in five novel independent loci exhibited strong and consistent evidence of association with breast cancer (P < 10⁻⁷). Four of these contain plausible causative genes (FGFR2, TNRC9, MAP3K1 and LSP1). At the second stage, 1,792 SNPs were significant at the P < 0.05 level compared with an estimated 1,343 that would be expected by chance, indicating that many additional common susceptibility alleles may be identifiable by this approach.

    Funded by: Breast Cancer Now: 2004NOV49, BREAST CANCER NOW RESEARCH CENTRE; Cancer Research UK: A3353

    Nature 2007;447;7148;1087-93

  • Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project.

    ENCODE Project Consortium, Birney E, Stamatoyannopoulos JA, Dutta A, Guigó R, Gingeras TR, Margulies EH, Weng Z, Snyder M, Dermitzakis ET, Thurman RE, Kuehn MS, Taylor CM, Neph S, Koch CM, Asthana S, Malhotra A, Adzhubei I, Greenbaum JA, Andrews RM, Flicek P, Boyle PJ, Cao H, Carter NP, Clelland GK, Davis S, Day N, Dhami P, Dillon SC, Dorschner MO, Fiegler H, Giresi PG, Goldy J, Hawrylycz M, Haydock A, Humbert R, James KD, Johnson BE, Johnson EM, Frum TT, Rosenzweig ER, Karnani N, Lee K, Lefebvre GC, Navas PA, Neri F, Parker SC, Sabo PJ, Sandstrom R, Shafer A, Vetrie D, Weaver M, Wilcox S, Yu M, Collins FS, Dekker J, Lieb JD, Tullius TD, Crawford GE, Sunyaev S, Noble WS, Dunham I, Denoeud F, Reymond A, Kapranov P, Rozowsky J, Zheng D, Castelo R, Frankish A, Harrow J, Ghosh S, Sandelin A, Hofacker IL, Baertsch R, Keefe D, Dike S, Cheng J, Hirsch HA, Sekinger EA, Lagarde J, Abril JF, Shahab A, Flamm C, Fried C, Hackermüller J, Hertel J, Lindemeyer M, Missal K, Tanzer A, Washietl S, Korbel J, Emanuelsson O, Pedersen JS, Holroyd N, Taylor R, Swarbreck D, Matthews N, Dickson MC, Thomas DJ, Weirauch MT, Gilbert J, Drenkow J, Bell I, Zhao X, Srinivasan KG, Sung WK, Ooi HS, Chiu KP, Foissac S, Alioto T, Brent M, Pachter L, Tress ML, Valencia A, Choo SW, Choo CY, Ucla C, Manzano C, Wyss C, Cheung E, Clark TG, Brown JB, Ganesh M, Patel S, Tammana H, Chrast J, Henrichsen CN, Kai C, Kawai J, Nagalakshmi U, Wu J, Lian Z, Lian J, Newburger P, Zhang X, Bickel P, Mattick JS, Carninci P, Hayashizaki Y, Weissman S, Hubbard T, Myers RM, Rogers J, Stadler PF, Lowe TM, Wei CL, Ruan Y, Struhl K, Gerstein M, Antonarakis SE, Fu Y, Green ED, Karaöz U, Siepel A, Taylor J, Liefer LA, Wetterstrand KA, Good PJ, Feingold EA, Guyer MS, Cooper GM, Asimenos G, Dewey CN, Hou M, Nikolaev S, Montoya-Burgos JI, Löytynoja A, Whelan S, Pardi F, Massingham T, Huang H, Zhang NR, Holmes I, Mullikin JC, Ureta-Vidal A, Paten B, Seringhaus M, Church D, Rosenbloom K, Kent WJ, Stone EA, NISC Comparative Sequencing Program, Baylor College of Medicine Human Genome Sequencing Center, Washington University Genome Sequencing Center, Broad Institute, Children's Hospital Oakland Research Institute, Batzoglou S, Goldman N, Hardison RC, Haussler D, Miller W, Sidow A, Trinklein ND, Zhang ZD, Barrera L, Stuart R, King DC, Ameur A, Enroth S, Bieda MC, Kim J, Bhinge AA, Jiang N, Liu J, Yao F, Vega VB, Lee CW, Ng P, Shahab A, Yang A, Moqtaderi Z, Zhu Z, Xu X, Squazzo S, Oberley MJ, Inman D, Singer MA, Richmond TA, Munn KJ, Rada-Iglesias A, Wallerman O, Komorowski J, Fowler JC, Couttet P, Bruce AW, Dovey OM, Ellis PD, Langford CF, Nix DA, Euskirchen G, Hartman S, Urban AE, Kraus P, Van Calcar S, Heintzman N, Kim TH, Wang K, Qu C, Hon G, Luna R, Glass CK, Rosenfeld MG, Aldred SF, Cooper SJ, Halees A, Lin JM, Shulha HP, Zhang X, Xu M, Haidar JN, Yu Y, Ruan Y, Iyer VR, Green RD, Wadelius C, Farnham PJ, Ren B, Harte RA, Hinrichs AS, Trumbower H, Clawson H, Hillman-Jackson J, Zweig AS, Smith K, Thakkapallayil A, Barber G, Kuhn RM, Karolchik D, Armengol L, Bird CP, de Bakker PI, Kern AD, Lopez-Bigas N, Martin JD, Stranger BE, Woodroffe A, Davydov E, Dimas A, Eyras E, Hallgrímsdóttir IB, Huppert J, Zody MC, Abecasis GR, Estivill X, Bouffard GG, Guan X, Hansen NF, Idol JR, Maduro VV, Maskeri B, McDowell JC, Park M, Thomas PJ, Young AC, Blakesley RW, Muzny DM, Sodergren E, Wheeler DA, Worley KC, Jiang H, Weinstock GM, Gibbs RA, Graves T, Fulton R, Mardis ER, Wilson RK, Clamp M, Cuff J, Gnerre S, Jaffe DB, Chang JL, Lindblad-Toh K, Lander ES, Koriabine M, Nefedov M, Osoegawa K, Yoshinaga Y, Zhu B and de Jong PJ

    We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.

    Funded by: NCI NIH HHS: F32 CA108313; NHGRI NIH HHS: K22 HG003169, K22 HG003169-01A1, P41 HG002371, P41 HG002371-03S1, R01 HG002238, R01 HG002238-15, R01 HG003110, R01 HG003110-03, R01 HG003129-03, R01 HG003143, R01 HG003143-04, R01 HG003521, R01 HG003521-01, R01 HG003532, R01 HG003532-01, R01 HG003541, R01 HG003541-03, U01 HG002523, U01 HG002523-01, U01 HG003147, U01 HG003147-02, U01 HG003150, U01 HG003150-03, U01 HG003151, U01 HG003151-03, U01 HG003156, U01 HG003156-03, U01 HG003157, U01 HG003157-03, U01 HG003161, U01 HG003161-03, U01 HG003162, U01 HG003162-03, U01 HG003168-02, U54 HG003067, U54 HG003067-01, U54 HG003079, U54 HG003079-01, U54 HG003273, U54 HG003273-01; Wellcome Trust: 062023, 077198

    Nature 2007;447;7146;799-816

  • Clinical and molecular genetic spectrum of congenital deficiency of the leptin receptor.

    Farooqi IS, Wangensteen T, Collins S, Kimber W, Matarese G, Keogh JM, Lank E, Bottomley B, Lopez-Fernandez J, Ferraz-Amaro I, Dattani MT, Ercan O, Myhre AG, Retterstol L, Stanhope R, Edge JA, McKenzie S, Lessan N, Ghodsi M, De Rosa V, Perna F, Fontana S, Barroso I, Undlien DE and O'Rahilly S

    Cambridge Institute for Medical Research, University Department of Clinical Biochemistry, Addenbrooke's Hospital, Cambridge, United Kingdom.

    Background: A single family has been described in which obesity results from a mutation in the leptin-receptor gene (LEPR), but the prevalence of such mutations in severe, early-onset obesity has not been systematically examined.

    Methods: We sequenced LEPR in 300 subjects with hyperphagia and severe early-onset obesity, including 90 probands from consanguineous families, and investigated the extent to which mutations cosegregated with obesity and affected receptor function. We evaluated metabolic, endocrine, and immune function in probands and affected relatives.

    Results: Of the 300 subjects, 8 (3%) had nonsense or missense LEPR mutations: 7 were homozygotes, and 1 was a compound heterozygote. All missense mutations resulted in impaired receptor signaling. Affected subjects were characterized by hyperphagia, severe obesity, alterations in immune function, and delayed puberty due to hypogonadotropic hypogonadism. Serum leptin levels were within the range predicted by the elevated fat mass in these subjects. Their clinical features were less severe than those of subjects with congenital leptin deficiency.

    Conclusions: The prevalence of pathogenic LEPR mutations in a cohort of subjects with severe, early-onset obesity was 3%. Circulating levels of leptin were not disproportionately elevated, suggesting that serum leptin cannot be used as a marker for leptin-receptor deficiency. Congenital leptin-receptor deficiency should be considered in the differential diagnosis in any child with hyperphagia and severe obesity in the absence of developmental delay or dysmorphism.

    Funded by: Medical Research Council: G0502115; Telethon: GJT04008; Wellcome Trust: 067457, 068086, 077016

    The New England journal of medicine 2007;356;3;237-47

  • Construction and use of spotted large-insert clone DNA microarrays for the detection of genomic copy number changes.

    Fiegler H, Redon R and Carter NP

    The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Microarray-based comparative genomic hybridization has become a widespread method for the analysis of DNA copy number changes across the human genome. Initial methods for microarray construction using large-insert clones required the preparation of DNA from large-scale cultures. This rapidly became an expensive and time-consuming process when expanded to the number of clones needed for higher resolution arrays. To overcome this problem, several PCR-based strategies have been developed to enable array construction from small amounts of cloned DNA. Here, we describe the construction of microarrays composed of human-specific large-insert clones (40-200 kb) using a specific degenerate oligonucleotide PCR strategy. In addition, we also describe array hybridization using manual and automated procedures and methods for array analysis. The technology and protocols described in this article can easily be adapted for other species, depending on the availability of clone libraries. According to our protocols, the procedure will take approximately 3 days from labeling the DNA to scanning the hybridized slides.

    Nature protocols 2007;2;3;577-87

  • ProServer: a simple, extensible Perl DAS server.

    Finn RD, Stalker JW, Jackson DK, Kulesha E, Clements J and Pettett R

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Summary: The increasing size and complexity of biological databases has led to a growing trend to federate rather than duplicate them. In order to share data between federated databases, protocols for the exchange mechanism must be developed. One such data exchange protocol that is widely used is the Distributed Annotation System (DAS). For example, DAS has enabled small experimental groups to integrate their data into the Ensembl genome browser. We have developed ProServer, a simple, lightweight, Perl-based DAS server that does not depend on a separate HTTP server. The ProServer package is easily extensible, allowing data to be served from almost any underlying data model. Recent additions to the DAS protocol have enabled both structure and alignment (sequence and structural) data to be exchanged. ProServer allows both of these data types to be served.

    Availability: ProServer can be downloaded from or CPAN. Details on the system requirements and installation of ProServer can be found at

    Funded by: Medical Research Council: G0100305; Wellcome Trust

    Bioinformatics (Oxford, England) 2007;23;12;1568-70

  • The Pfam protein families database.

    Finn RD, Tate J, Mistry J, Coggill PC, Sammut SJ, Hotz HR, Ceric G, Forslund K, Eddy SR, Sonnhammer EL and Bateman A

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton Hall, Hinxton, Cambridgeshire, CB10 1SA, UK.

    Pfam is a comprehensive collection of protein domains and families, represented as multiple sequence alignments and as profile hidden Markov models. The current release of Pfam (22.0) contains 9318 protein families. Pfam is now based not only on the UniProtKB sequence database, but also on NCBI GenPept and on sequences from selected metagenomics projects. Pfam is available on the web from the consortium members using a new, consistent and improved website design in the UK, the USA and Sweden, as well as from mirror sites in France and South Korea.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/F010435/1; Medical Research Council: G0100305; Wellcome Trust: 087656

    Nucleic acids research 2007;36;Database issue;D281-8

  • Y-chromosomal evidence for a limited Greek contribution to the Pathan population of Pakistan.

    Firasat S, Khaliq S, Mohyuddin A, Papaioannou M, Tyler-Smith C, Underhill PA and Ayub Q

    Biomedical and Genetic Engineering Division, Dr. AQ Khan Research Laboratories, Islamabad, Pakistan.

    Three Pakistani populations residing in northern Pakistan, the Burusho, Kalash and Pathan, claim descent from Greek soldiers associated with Alexander's invasion of southwest Asia. Earlier studies have excluded a substantial Greek genetic input into these populations, but left open the question of a smaller contribution. To re-investigate this question, we have now typed 90 binary polymorphisms and 16 multiallelic, short-tandem-repeat (STR) loci mapping to the male-specific portion of the human Y chromosome in 952 males, including 77 Greeks. In pairwise comparisons between the Greeks and the three Pakistani populations using genetic distance measures sensitive to recent events, the lowest distances were observed between the Greeks and the Pathans. Clade E3b1 lineages, which were frequent in the Greeks but not in Pakistan, were nevertheless observed in two Pathan individuals, one of whom shared a 16 Y-STR haplotype with the Greeks. The worldwide distribution of a shortened (9 Y-STR) version of this haplotype, determined from database information, was concentrated in Macedonia and Greece, suggesting an origin there. Although based on only a few unrelated descendants, this provides strong evidence for a European origin for a small proportion of the Pathan Y chromosomes.

    Funded by: Wellcome Trust: 077009

    European journal of human genetics : EJHG 2007;15;1;121-6

  • Ensembl 2008.

    Flicek P, Aken BL, Beal K, Ballester B, Caccamo M, Chen Y, Clarke L, Coates G, Cunningham F, Cutts T, Down T, Dyer SC, Eyre T, Fitzgerald S, Fernandez-Banet J, Gräf S, Haider S, Hammond M, Holland R, Howe KL, Howe K, Johnson N, Jenkinson A, Kähäri A, Keefe D, Kokocinski F, Kulesha E, Lawson D, Longden I, Megy K, Meidl P, Overduin B, Parker A, Pritchard B, Prlic A, Rice S, Rios D, Schuster M, Sealy I, Slater G, Smedley D, Spudich G, Trevanion S, Vilella AJ, Vogel J, White S, Wood M, Birney E, Cox T, Curwen V, Durbin R, Fernandez-Suarez XM, Herrero J, Hubbard TJ, Kasprzyk A, Proctor G, Smith J, Ureta-Vidal A and Searle S

    European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

    The Ensembl project is a comprehensive genome information system featuring an integrated set of genome annotation, databases and other information for chordate and selected model organism and disease vector genomes. As of release 47 (October 2007), Ensembl fully supports 35 species, with preliminary support for six additional species. New species in the past year include platypus and horse. Major additions and improvements to Ensembl since our previous report include extensive support for functional genomics data in the form of a specialized functional genomics database, genome-wide maps of protein-DNA interactions and the Ensembl regulatory build; support for customization of the Ensembl web interface through the addition of user accounts and user groups; and increased support for genome resequencing. We have also introduced new comparative genomics-based data mining options and report on the continued development of our software infrastructure.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/E010768/1, BB/E011640/1, BBE0116401, BBS/B/13438, BBS/B/13446, BBS/B/13462, BBS/B/13470; Wellcome Trust: 062023, 077198

    Nucleic acids research 2007;36;Database issue;D707-14

  • Testing of diabetes-associated WFS1 polymorphisms in the Diabetes Prevention Program.

    Florez JC, Jablonski KA, McAteer J, Sandhu MS, Wareham NJ, Barroso I, Franks PW, Altshuler D, Knowler WC and Diabetes Prevention Program Research Group

    Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA, USA.

    Aims/hypothesis: Wolfram syndrome (diabetes insipidus, diabetes mellitus, optic atrophy and deafness) is caused by mutations in the WFS1 gene. Recently, single nucleotide polymorphisms (SNPs) in WFS1 have been reproducibly associated with type 2 diabetes. We therefore examined the effects of these variants on diabetes incidence and response to interventions in the Diabetes Prevention Program (DPP), in which a lifestyle intervention or metformin treatment was compared with placebo.

    Methods: We genotyped the WFS1 SNPs rs10010131, rs752854 and rs734312 (H611R) in 3,548 DPP participants and performed Cox regression analysis using genotype, intervention and their interactions as predictors of diabetes incidence. We also evaluated the effect of these SNPs on insulin resistance and beta cell function at 1 year.

    Results: Although none of the three SNPs was associated with diabetes incidence in the overall cohort, white homozygotes for the previously reported protective alleles appeared less likely to develop diabetes in the lifestyle arm. Examination of the publicly available Diabetes Genetics Initiative genome-wide association dataset revealed that rs10012946, which is in strong linkage disequilibrium with the three WFS1 SNPs (r²=0.88-1.0), was associated with type 2 diabetes (allelic odds ratio 0.85, 95% CI 0.75-0.97, p=0.026). In the DPP, we noted a trend towards increased insulin secretion in carriers of the protective variants, although for most SNPs this was seen as compensatory for the diminished insulin sensitivity.

    Conclusions/interpretation: The previously reported protective effect of select WFS1 alleles may be magnified by a lifestyle intervention. These variants appear to confer an improvement in beta cell function.

    Funded by: Intramural NIH HHS; Medical Research Council: MC_U106179471; NIDDK NIH HHS: K23 DK65978-04, R01 DK072041-02, U01 DK048489, U01 DK048489-06

    Diabetologia 2007;51;3;451-7

  • PPARGC1A coding variation may initiate impaired NEFA clearance during glucose challenge.

    Franks PW, Ekelund U, Brage S, Luan J, Schafer AJ, O'Rahilly S, Barroso I and Wareham NJ

    Genetic Epidemiology and Clinical Research Group, Department of Public Health and Clinical Medicine, Section for Medicine, Umeå University Hospital, Umeå, Sweden.

    Aims/hypothesis: The peroxisome proliferator-activated receptor gamma coactivator 1-alpha protein, encoded by the PPARGC1A gene, transcriptionally activates a complex pathway of lipid and glucose metabolism and is expressed primarily in tissues of high metabolic activity such as liver, heart and exercising oxidative skeletal muscle fibre. Ppargc1a-null mice develop systemic dyslipidaemia and hepatic steatosis. In humans, NEFAs downregulate PPARGC1A expression in skeletal muscle. Furthermore, a common non-synonymous coding variant at PPARGC1A (Gly482Ser, rs8192678) is associated with decreased PPARGC1A mRNA levels and increased type 2 diabetes risk.

    Materials and methods: In a population-based sample of 691 healthy middle-aged Europids we assessed whether Gly482Ser is associated with levels of NEFA when fasting and in response to an oral glucose challenge. We also assessed the potential effect-modifying role of adipose tissue mass on these phenotypes.

    Results: After adjustment for age, sex, fat mass and fat-free mass, the Ser482 allele associated with higher NEFA at 30 min and 2 h and with NEFA AUC (all values p≤0.02). Furthermore, suggestive evidence of interaction between fat mass and Gly482Ser was observed for fasting NEFA (p=0.059). After stratification by level of obesity, genotype associations were observed in the obese for fasting NEFA (p=0.028) and NEFA at 30 min (p=0.013) and 2 h (p=0.002), and with NEFA AUC (p=0.005), but no significant associations were observed in lean individuals (all values p>0.6).

    Conclusions/interpretation: Our observations indicate that NEFA clearance is blunted following a glucose load in carriers of the PPARGC1A Ser482 allele. This association is augmented by obesity.

    Funded by: Medical Research Council: MC_U106179471, MC_U106179473; Wellcome Trust: 077016

    Diabetologia 2007;50;3;569-73

  • Replication of the association between variants in WFS1 and risk of type 2 diabetes in European populations.

    Franks PW, Rolandsson O, Debenham SL, Fawcett KA, Payne F, Dina C, Froguel P, Mohlke KL, Willer C, Olsson T, Wareham NJ, Hallmans G, Barroso I and Sandhu MS

    Department of Public Health and Clinical Medicine, Umeå University Hospital, Umeå, Sweden.

    Aims/hypothesis: Mutations at the gene encoding wolframin (WFS1) cause Wolfram syndrome, a rare neurological condition. Associations between single nucleotide polymorphisms (SNPs) at WFS1 and type 2 diabetes have recently been reported. Thus, our aim was to replicate those associations in a northern Swedish case-control study of type 2 diabetes. We also performed a meta-analysis of published and previously unpublished data from Sweden, Finland and France, to obtain updated summary effect estimates.

    Methods: Four WFS1 SNPs (rs10010131, rs6446482, rs752854 and rs734312 [H611R]) were genotyped in a type 2 diabetes case-control study (n = 1,296/1,412) of Swedish adults. Logistic regression was used to assess the association between each WFS1 SNP and type 2 diabetes, following adjustment for age, sex and BMI. We then performed a meta-analysis of 11 studies of type 2 diabetes, comprising up to 14,139 patients and 16,109 controls, to obtain a summary effect estimate for the WFS1 variants.

    Results: In the northern Swedish study, the minor allele at rs752854 was associated with reduced type 2 diabetes risk [odds ratio (OR) 0.85, 95% CI 0.75-0.96, p=0.010]. Borderline statistical associations were observed for the remaining SNPs. The meta-analysis of the four independent replication studies for SNP rs10010131 and correlated variants showed evidence for statistical association (OR 0.87, 95% CI 0.82-0.93, p=4.5 × 10⁻⁵). In an updated meta-analysis of all 11 studies, strong evidence of statistical association was also observed (OR 0.89, 95% CI 0.86-0.92; p=4.9 × 10⁻¹¹).

    Conclusions/interpretation: In this study of WFS1 variants and type 2 diabetes risk, we have replicated the previously reported associations between SNPs at this locus and the risk of type 2 diabetes.

    Funded by: Medical Research Council: G0600331, MC_U106179471; NIDDK NIH HHS: DK62370, DK72193; Wellcome Trust: 077016

    Diabetologia 2007;51;3;458-63

  • A common variant in the FTO gene is associated with body mass index and predisposes to childhood and adult obesity.

    Frayling TM, Timpson NJ, Weedon MN, Zeggini E, Freathy RM, Lindgren CM, Perry JR, Elliott KS, Lango H, Rayner NW, Shields B, Harries LW, Barrett JC, Ellard S, Groves CJ, Knight B, Patch AM, Ness AR, Ebrahim S, Lawlor DA, Ring SM, Ben-Shlomo Y, Jarvelin MR, Sovio U, Bennett AJ, Melzer D, Ferrucci L, Loos RJ, Barroso I, Wareham NJ, Karpe F, Owen KR, Cardon LR, Walker M, Hitman GA, Palmer CN, Doney AS, Morris AD, Smith GD, Hattersley AT and McCarthy MI

    Genetics of Complex Traits, Institute of Biomedical and Clinical Science, Peninsula Medical School, Magdalen Road, Exeter, UK.

    Obesity is a serious international health problem that increases the risk of several common diseases. The genetic factors predisposing to obesity are poorly understood. A genome-wide search for type 2 diabetes-susceptibility genes identified a common variant in the FTO (fat mass and obesity associated) gene that predisposes to diabetes through an effect on body mass index (BMI). An additive association of the variant with BMI was replicated in 13 cohorts with 38,759 participants. The 16% of adults who are homozygous for the risk allele weighed about 3 kilograms more and had 1.67-fold increased odds of obesity when compared with those not inheriting a risk allele. This association was observed from age 7 years upward and reflects a specific increase in fat mass.

    Funded by: Intramural NIH HHS: Z99 AG999999; Medical Research Council: G0000934, G0500070, G0600705, G9815508, MC_U106179471, MC_U106188470; Wellcome Trust: 079557, 090532

    Science (New York, N.Y.) 2007;316;5826;889-94

  • Definition of the zebrafish genome using flow cytometry and cytogenetic mapping.

    Freeman JL, Adeniyi A, Banerjee R, Dallaire S, Maguire SF, Chi J, Ng BL, Zepeda C, Scott CE, Humphray S, Rogers J, Zhou Y, Zon LI, Carter NP, Yang F and Lee C

    Department of Pathology, Brigham and Women's Hospital, Boston, Massachusetts 02115, USA.

    Background: The zebrafish (Danio rerio) is an important vertebrate model organism for biomedical research. The syntenic conservation between the zebrafish and human genomes allows one to investigate the function of human genes using the zebrafish model. To facilitate analysis of the zebrafish genome, genetic maps have been constructed and sequence annotation of a reference zebrafish genome is ongoing. However, the duplicative nature of teleost genomes, including the zebrafish, complicates accurate assembly and annotation of a representative genome sequence. Cytogenetic approaches provide "anchors" that can be integrated with accumulating genomic data.

    Results: Here, we cytogenetically define the zebrafish genome by first estimating the size of each linkage group (LG) chromosome using flow cytometry, followed by the cytogenetic mapping of 575 bacterial artificial chromosome (BAC) clones onto metaphase chromosomes. Of the 575 BAC clones, 544 clones localized to apparently unique chromosomal locations. Of these, 93.8% were assigned to a specific LG chromosome location using fluorescence in situ hybridization (FISH) and compared to the LG chromosome assignment reported in the zebrafish genome databases. Thirty-one BAC clones localized to multiple chromosomal locations in several different hybridization patterns. From these data, a refined second generation probe panel for each LG chromosome was also constructed.

    Conclusion: The chromosomal mapping of the 575 large-insert DNA clones allows for these clones to be integrated into existing zebrafish mapping data. An accurately annotated zebrafish reference genome serves as a valuable resource for investigating the molecular basis of human diseases using zebrafish mutant models.

    Funded by: NCI NIH HHS: R01 CA111560, R01-CA111560; Wellcome Trust

    BMC genomics 2007;8;195

  • The H-Invitational Database (H-InvDB), a comprehensive annotation resource for human genes and transcripts.

    Genome Information Integration Project And H-Invitational 2, Yamasaki C, Murakami K, Fujii Y, Sato Y, Harada E, Takeda J, Taniya T, Sakate R, Kikugawa S, Shimada M, Tanino M, Koyanagi KO, Barrero RA, Gough C, Chun HW, Habara T, Hanaoka H, Hayakawa Y, Hilton PB, Kaneko Y, Kanno M, Kawahara Y, Kawamura T, Matsuya A, Nagata N, Nishikata K, Noda AO, Nurimoto S, Saichi N, Sakai H, Sanbonmatsu R, Shiba R, Suzuki M, Takabayashi K, Takahashi A, Tamura T, Tanaka M, Tanaka S, Todokoro F, Yamaguchi K, Yamamoto N, Okido T, Mashima J, Hashizume A, Jin L, Lee KB, Lin YC, Nozaki A, Sakai K, Tada M, Miyazaki S, Makino T, Ohyanagi H, Osato N, Tanaka N, Suzuki Y, Ikeo K, Saitou N, Sugawara H, O'Donovan C, Kulikova T, Whitfield E, Halligan B, Shimoyama M, Twigger S, Yura K, Kimura K, Yasuda T, Nishikawa T, Akiyama Y, Motono C, Mukai Y, Nagasaki H, Suwa M, Horton P, Kikuno R, Ohara O, Lancet D, Eveno E, Graudens E, Imbeaud S, Debily MA, Hayashizaki Y, Amid C, Han M, Osanger A, Endo T, Thomas MA, Hirakawa M, Makalowski W, Nakao M, Kim NS, Yoo HS, De Souza SJ, Bonaldo Mde F, Niimura Y, Kuryshev V, Schupp I, Wiemann S, Bellgard M, Shionyu M, Jia L, Thierry-Mieg D, Thierry-Mieg J, Wagner L, Zhang Q, Go M, Minoshima S, Ohtsubo M, Hanada K, Tonellato P, Isogai T, Zhang J, Lenhard B, Kim S, Chen Z, Hinz U, Estreicher A, Nakai K, Makalowska I, Hide W, Tiffin N, Wilming L, Chakraborty R, Soares MB, Chiusano ML, Suzuki Y, Auffray C, Yamaguchi-Kabata Y, Itoh T, Hishiki T, Fukuchi S, Nishikawa K, Sugano S, Nomura N, Tateno Y, Imanishi T and Gojobori T

    Japan Biological Information Research Center, Japan Biological Informatics Consortium, Japan.

    Here we report the new features and improvements in our latest release of the H-Invitational Database (H-InvDB), a comprehensive annotation resource for human genes and transcripts. H-InvDB, originally developed as an integrated database of the human transcriptome based on extensive annotation of large sets of full-length cDNA (FLcDNA) clones, now provides annotation for 120 558 human mRNAs extracted from the International Nucleotide Sequence Databases (INSD), in addition to 54 978 human FLcDNAs, in the latest release, H-InvDB_4.6. We mapped these human transcripts onto the human genome sequence (NCBI build 36.1) and determined 34 699 human gene clusters, which could define 34 057 (98.1%) protein-coding and 642 (1.9%) non-protein-coding loci; 858 (2.5%) transcribed loci overlapped with predicted pseudogenes. For all these transcripts and genes, we provide comprehensive annotation including gene structures, gene functions, alternative splicing variants, functional non-protein-coding RNAs, functional domains, predicted subcellular localizations, metabolic pathways, predictions of protein 3D structure, mapping of SNPs and microsatellite repeat motifs, co-localization with orphan diseases, gene expression profiles, orthologous genes, protein-protein interactions (PPI) and annotation for gene families. The current H-InvDB annotation resources consist of two main views (Transcript view and Locus view) and eight sub-databases: the DiseaseInfo Viewer, H-ANGEL, the Clustering Viewer, G-integra, the TOPO Viewer, Evola, the PPI view and the Gene family/group.

    Funded by: NHLBI NIH HHS: P50 HL054998, R01 HL064541; Wellcome Trust: 077198

    Nucleic acids research 2007;36;Database issue;D793-9

  • The obesity-associated FTO gene encodes a 2-oxoglutarate-dependent nucleic acid demethylase.

    Gerken T, Girard CA, Tung YC, Webby CJ, Saudek V, Hewitson KS, Yeo GS, McDonough MA, Cunliffe S, McNeill LA, Galvanovskis J, Rorsman P, Robins P, Prieur X, Coll AP, Ma M, Jovanovic Z, Farooqi IS, Sedgwick B, Barroso I, Lindahl T, Ponting CP, Ashcroft FM, O'Rahilly S and Schofield CJ

    Chemistry Research Laboratory and Oxford Centre for Integrative Systems Biology, University of Oxford, 12 Mansfield Road, Oxford, Oxon OX1 3TA, UK.

    Variants in the FTO (fat mass and obesity associated) gene are associated with increased body mass index in humans. Here, we show by bioinformatics analysis that FTO shares sequence motifs with Fe(II)- and 2-oxoglutarate (2OG)-dependent oxygenases. We find that recombinant murine Fto catalyzes the Fe(II)- and 2OG-dependent demethylation of 3-methylthymine in single-stranded DNA, with concomitant production of succinate, formaldehyde, and carbon dioxide. Consistent with a potential role in nucleic acid demethylation, Fto localizes to the nucleus in transfected cells. Studies of wild-type mice indicate that Fto messenger RNA (mRNA) is most abundant in the brain, particularly in hypothalamic nuclei governing energy balance, and that Fto mRNA levels in the arcuate nucleus are regulated by feeding and fasting. Studies can now be directed toward determining the physiologically relevant FTO substrate and how nucleic acid methylation status is linked to increased fat mass.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/D011523/1; Medical Research Council: G108/617, G9824984, MC_U137761446; NIGMS NIH HHS: U54 GM064346; Wellcome Trust: 068086, 077016

    Science (New York, N.Y.) 2007;318;5855;1469-72

  • Draft genome of the filarial nematode parasite Brugia malayi.

    Ghedin E, Wang S, Spiro D, Caler E, Zhao Q, Crabtree J, Allen JE, Delcher AL, Guiliano DB, Miranda-Saavedra D, Angiuoli SV, Creasy T, Amedeo P, Haas B, El-Sayed NM, Wortman JR, Feldblyum T, Tallon L, Schatz M, Shumway M, Koo H, Salzberg SL, Schobel S, Pertea M, Pop M, White O, Barton GJ, Carlow CK, Crawford MJ, Daub J, Dimmic MW, Estes CF, Foster JM, Ganatra M, Gregory WF, Johnson NM, Jin J, Komuniecki R, Korf I, Kumar S, Laney S, Li BW, Li W, Lindblom TH, Lustigman S, Ma D, Maina CV, Martin DM, McCarter JP, McReynolds L, Mitreva M, Nutman TB, Parkinson J, Peregrín-Alvarez JM, Poole C, Ren Q, Saunders L, Sluder AE, Smith K, Stanke M, Unnasch TR, Ware J, Wei AD, Weil G, Williams DJ, Zhang Y, Williams SA, Fraser-Liggett C, Slatko B, Blaxter ML and Scott AL

    Division of Infectious Diseases, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA.

    Parasitic nematodes that cause elephantiasis and river blindness threaten hundreds of millions of people in the developing world. We have sequenced the approximately 90 megabase (Mb) genome of the human filarial parasite Brugia malayi and predict approximately 11,500 protein coding genes in 71 Mb of robustly assembled sequence. Comparative analysis with the free-living, model nematode Caenorhabditis elegans revealed that, despite these genes having maintained little conservation of local synteny during approximately 350 million years of evolution, they largely remain in linkage on chromosomal units. More than 100 conserved operons were identified. Analysis of the predicted proteome provides evidence for adaptations of B. malayi to niches in its human and vector hosts and insights into the molecular basis of a mutualistic relationship with its Wolbachia endosymbiont. These findings offer a foundation for rational drug design.

    Funded by: NIAID NIH HHS: R01 AI048562, R01 AI048562-09, U01-AI50903; NIEHS NIH HHS: R15 ES013128, R15 ES013128-01; NLM NIH HHS: R01 LM006845, R01 LM006845-08, R01 LM007938, R01 LM007938-04

    Science (New York, N.Y.) 2007;317;5845;1756-60

  • Improving the power to detect differentially expressed genes in comparative microarray experiments by including information from self-self hybridizations.

    Gusnanto A, Tom B, Burns P, Macaulay I, Thijssen-Timmer DC, Tijssen MR, Langford C, Watkins N, Ouwehand W, Berzuini C and Dudbridge F

    Medical Research Council-Biostatistics Unit, Institute of Public Health, Cambridge CB2 2SR, UK.

    Our ability to detect differentially expressed genes in a microarray experiment can be hampered when the number of biological samples of interest is limited. In this situation, we propose using information from self-self hybridizations to sharpen our inference of differential expression. A unified modelling strategy is developed to allow better estimation of the error variance; the principle is similar to the use of a pooled variance estimate in the two-sample t-test. Results from real datasets suggest that more differentially expressed genes can be detected with the combined models. Our simulation study provides evidence that this method increases sensitivity compared with using information from comparative hybridizations alone, at the same false discovery rate. The largest increase in sensitivity occurs when the amount of information in the comparative hybridizations is limited.

    Funded by: Medical Research Council: MC_U105260799, MC_U105261167

    Computational biology and chemistry 2007;31;3;178-85
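    The abstract likens its unified model to a pooled variance estimate in the two-sample t-test. As an illustration of that analogy only (not the authors' model), the sketch below pools the within-group variance of hypothetical comparative log-ratios with that of hypothetical self-self log-ratios for the same gene, assuming both share one error variance. The function names and data are hypothetical.

```python
import math
from statistics import mean


def pooled_variance(groups):
    """Return (pooled variance, degrees of freedom) for samples assumed to
    share one error variance: within-group sums of squares are added and
    divided by the summed degrees of freedom (n_i - 1 per group)."""
    ss = 0.0
    df = 0
    for g in groups:
        m = mean(g)
        ss += sum((x - m) ** 2 for x in g)
        df += len(g) - 1
    if df <= 0:
        raise ValueError("need at least one group with two or more values")
    return ss / df, df


def t_statistic(comparative, extra_groups=()):
    """One-sample t-statistic for the mean log-ratio of the comparative
    arrays, with the error variance pooled across any extra groups
    (here, hypothetical self-self log-ratios for the same gene)."""
    var, df = pooled_variance([comparative, *extra_groups])
    se = math.sqrt(var / len(comparative))
    return mean(comparative) / se, df


if __name__ == "__main__":
    # Hypothetical log-ratios for a single gene.
    comparative = [0.8, 1.1, 0.9]                # three comparative arrays
    self_self = [0.05, -0.1, 0.02, -0.03, 0.06]  # five self-self arrays

    t_alone, df_alone = t_statistic(comparative)
    t_pooled, df_pooled = t_statistic(comparative, [self_self])
    print(f"comparative only:      t = {t_alone:.2f} on {df_alone} df")
    print(f"pooled with self-self: t = {t_pooled:.2f} on {df_pooled} df")
```

    With only three comparative arrays the variance estimate rests on 2 degrees of freedom; adding five self-self arrays raises this to 6, which is the kind of gain the abstract describes when comparative information is limited. Pooling is only justified if the self-self and comparative error variances really are similar.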

  • Schistosoma mansoni genome: closing in on a final gene set.

    Haas BJ, Berriman M, Hirai H, Cerqueira GG, Loverde PT and El-Sayed NM

    The J.C. Venter Institute, Rockville, MD 20850, USA.

    The Schistosoma mansoni genome sequencing consortium has recently released the latest versions of the genome assembly as well as an automated preliminary gene structure annotation. The combined datasets constitute a vast resource for researchers to exploit in a variety of post-genomic studies, with an emphasis on transcriptomic and proteomic tools. Here we present an innovative method for combining diverse sources of evidence, including ab initio gene predictions, protein and transcript sequence homologies, and cross-genome sequence homologies between S. mansoni and Schistosoma japonicum, to define a comprehensive list of protein-coding genes.

    Funded by: NIAID NIH HHS: AI48828; Wellcome Trust: 13557021

    Experimental parasitology 2007;117;3;225-8

  • Lessons learned from the initial sequencing of the pig genome: comparative analysis of an 8 Mb region of pig chromosome 17.

    Hart EA, Caccamo M, Harrow JL, Humphray SJ, Gilbert JG, Trevanion S, Hubbard T, Rogers J and Rothschild MF

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Background: We describe here the sequencing, annotation and comparative analysis of an 8 Mb region of pig chromosome 17, which provides a useful test region to assess coverage and quality for the pig genome sequencing project. We report our findings comparing the annotation of draft sequence assembled at different depths of coverage.

    Results: Within this region we annotated 71 loci, of which 53 are orthologous to known human coding genes. When compared to the syntenic regions in human (20q13.13-q13.33) and mouse (chromosome 2, 167.5 Mb-178.3 Mb), this region was found to be highly conserved with respect to gene order. The most notable difference between the three species is the presence of a large expansion of zinc finger coding genes and pseudogenes on mouse chromosome 2 between Edn3 and Phactr3 that is absent from pig and human. All of our annotation has been made publicly available in the Vertebrate Genome Annotation browser, VEGA. We assessed the impact of coverage on sequence assembly across this region and found, as expected, that increased sequence depth resulted in fewer, longer contigs. One-third of our annotated loci could not be fully re-aligned back to the low coverage version of the sequence, principally because the transcripts are fragmented over several contigs.

    Conclusion: We have demonstrated the considerable advantages of sequencing at increased read depths and discuss the implications that lower coverage sequence may have on subsequent comparative and functional studies, particularly those involving complex loci such as GNAS.

    Funded by: Biotechnology and Biological Sciences Research Council: BBE0116401; Wellcome Trust: 077198

    Genome biology 2007;8;8;R168

  • Specialist fungi, versatile genomes.

    Hertz-Fowler C and Pain A

    Nature reviews. Microbiology 2007;5;5;332-3

  • Comparative genomic analyses of seventeen Streptococcus pneumoniae strains: insights into the pneumococcal supragenome.

    Hiller NL, Janto B, Hogg JS, Boissy R, Yu S, Powell E, Keefe R, Ehrlich NE, Shen K, Hayes J, Barbadora K, Klimke W, Dernovoy D, Tatusova T, Parkhill J, Bentley SD, Post JC, Ehrlich GD and Hu FZ

    Allegheny General Hospital, Allegheny-Singer Research Institute, Center for Genomic Sciences, Pittsburgh, PA 15212, USA.

    The distributed-genome hypothesis (DGH) states that pathogenic bacteria possess a supragenome that is much larger than the genome of any single bacterium and that these pathogens utilize genetic recombination and a large, noncore set of genes as a means of diversity generation. We sequenced the genomes of eight nasopharyngeal strains of Streptococcus pneumoniae isolated from pediatric patients with upper respiratory symptoms and performed quantitative genomic analyses among these and nine publicly available pneumococcal strains. Coding sequences from all strains were grouped into 3,170 orthologous gene clusters, of which 1,454 (46%) were conserved among all 17 strains. The majority of the gene clusters, 1,716 (54%), were not found in all strains. Genic differences per strain pair ranged from 35 to 629 orthologous clusters, with each strain's genome containing between 21 and 32% noncore genes. The distribution of the orthologous clusters per genome for the 17 strains was entered into the finite-supragenome model, which predicted that (i) the S. pneumoniae supragenome contains more than 5,000 orthologous clusters and (ii) 99% of the orthologous clusters (approximately 3,000) that are represented in the S. pneumoniae population at frequencies of ≥0.1 can be identified if 33 representative genomes are sequenced. These extensive genic diversity data support the DGH and provide a basis for understanding the great differences in clinical phenotype associated with various pneumococcal strains. When these findings are taken together with previous studies that demonstrated the presence of a supragenome for Streptococcus agalactiae and Haemophilus influenzae, it appears that the possession of a distributed genome is a common host interaction strategy.

    Funded by: NIDCD NIH HHS: DC02148, DC04173, DC05659, R01 DC002148, R01 DC004173, R01 DC005659; Wellcome Trust

    Journal of bacteriology 2007;189;22;8186-95

  • Multidrug-resistant Salmonella enterica serovar paratyphi A harbors IncHI1 plasmids similar to those found in serovar typhi.

    Holt KE, Thomson NR, Wain J, Phan MD, Nair S, Hasan R, Bhutta ZA, Quail MA, Norbertczak H, Walker D, Dougan G and Parkhill J

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Salmonella enterica serovars Typhi and Paratyphi A cause systemic infections in humans which are referred to as enteric fever. Multidrug-resistant (MDR) serovar Typhi isolates emerged in the 1980s, and in recent years MDR serovar Paratyphi A infections have become established as a significant problem across Asia. MDR in serovar Typhi is almost invariably associated with IncHI1 plasmids, but the genetic basis of MDR in serovar Paratyphi A has remained largely undefined. The DNA sequence of an IncHI1 plasmid, pAKU_1, encoding MDR in a serovar Paratyphi A strain has been determined. Significantly, this plasmid shares a common IncHI1-associated DNA backbone with the serovar Typhi plasmid pHCM1 and an S. enterica serovar Typhimurium plasmid pR27. Plasmids pAKU_1 and pHCM1 share 14 antibiotic resistance genes encoded within similar mobile elements, which appear to form a 24-kb composite transposon that has transferred as a single unit into different positions in their IncHI1 backbones. Thus, these plasmids have acquired similar antibiotic resistance genes independently via the horizontal transfer of mobile DNA elements. Furthermore, two IncHI1 plasmids from a Vietnamese isolate of serovar Typhi were found to contain features of the backbone sequence of pAKU_1 rather than pHCM1, with the composite transposon inserted in the same location as in the pAKU_1 sequence. Our data show that these serovar Typhi and Paratyphi A IncHI1 plasmids share highly conserved core DNA and have acquired similar mobile elements encoding antibiotic resistance genes in past decades.

    Funded by: Medical Research Council: G0600805; Wellcome Trust

    Journal of bacteriology 2007;189;11;4257-64

  • Identification of common genetic variation that modulates alternative splicing.

    Hull J, Campino S, Rowlands K, Chan MS, Copley RR, Taylor MS, Rockett K, Elvidge G, Keating B, Knight J and Kwiatkowski D

    University Department of Paediatrics, John Radcliffe Hospital, Oxford, United Kingdom.

    Alternative splicing of genes is an efficient means of generating variation in protein function. Several disease states have been associated with rare genetic variants that affect splicing patterns. Conversely, splicing efficiency of some genes is known to vary between individuals without apparent ill effects. What is not clear is whether commonly observed phenotypic variation in splicing patterns, and hence potential variation in protein function, is to a significant extent determined by naturally occurring DNA sequence variation and in particular by single nucleotide polymorphisms (SNPs). In this study, we surveyed the splicing patterns of 250 exons in 22 individuals who had been previously genotyped by the International HapMap Project. We identified 70 simple cassette exon alternative splicing events in our experimental system; for six of these, we detected consistent differences in splicing pattern between individuals, with a highly significant association between splice phenotype and neighbouring SNPs. Remarkably, for five out of six of these events, the strongest correlation was found with the SNP closest to the intron-exon boundary, although the distance between these SNPs and the intron-exon boundary ranged from 2 bp to greater than 1,000 bp. Two of these SNPs were further investigated using a minigene splicing system, and in each case the SNPs were found to exert cis-acting effects on exon splicing efficiency in vitro. The functional consequences of these SNPs could not be predicted using bioinformatic algorithms. Our findings suggest that phenotypic variation in splicing patterns is determined by the presence of SNPs within flanking introns or exons. Effects on splicing may represent an important mechanism by which SNPs influence gene function.

    Funded by: Medical Research Council: G0600230, G19/9; Wellcome Trust: 074318

    PLoS genetics 2007;3;6;e99

  • Completing the map of human genetic variation.

    Human Genome Structural Variation Working Group, Eichler EE, Nickerson DA, Altshuler D, Bowcock AM, Brooks LD, Carter NP, Church DM, Felsenfeld A, Guyer M, Lee C, Lupski JR, Mullikin JC, Pritchard JK, Sebat J, Sherry ST, Smith D, Valle D and Waterston RH

    Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA.

    Funded by: Wellcome Trust: 077008

    Nature 2007;447;7141;161-5

  • A high utility integrated map of the pig genome.

    Humphray SJ, Scott CE, Clark R, Marron B, Bender C, Camm N, Davis J, Jenks A, Noon A, Patel M, Sehra H, Yang F, Rogatcheva MB, Milan D, Chardon P, Rohrer G, Nonneman D, de Jong P, Meyers SN, Archibald A, Beever JE, Schook LB and Rogers J

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA UK.

    Background: The domestic pig is being increasingly exploited as a system for modeling human disease. It also has substantial economic importance for meat-based protein production. Physical clone maps have underpinned large-scale genomic sequencing and enabled focused cloning efforts for many genomes. Comparative genetic maps indicate that there is more structural similarity between pig and human than between, for example, mouse and human, and we have used this close relationship between human and pig as a way of facilitating map construction.

    Results: Here we report the construction of the most highly continuous bacterial artificial chromosome (BAC) map of any mammalian genome, for the pig (Sus scrofa domestica) genome. The map provides a template for the generation and assembly of high-quality anchored sequence across the genome. The physical map integrates previous landmark maps with restriction fingerprints and BAC end sequences from over 260,000 BACs derived from 4 BAC libraries and takes advantage of alignments to the human genome to improve the continuity and local ordering of the clone contigs. We estimate that over 98% of the euchromatin of the 18 pig autosomes and the X chromosome along with localized coverage on Y is represented in 172 contigs, with chromosome 13 (218 Mb) represented by a single contig. The map is accessible through pre-Ensembl, where links to marker and sequence data can be found.

    Conclusion: The map will enable immediate electronic positional cloning of genes, benefiting the pig research community and further facilitating use of the pig as an alternative animal model for human disease. The clone map and BAC end sequence data can also help to support the assembly of maps and genome sequences of other artiodactyls.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/E010520/1, BB/E010520/2, BBE0116401; Wellcome Trust: 077198

    Genome biology 2007;8;7;R139

  • A second generation human haplotype map of over 3.1 million SNPs.

    International HapMap Consortium, Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, Belmont JW, Boudreau A, Hardenbol P, Leal SM, Pasternak S, Wheeler DA, Willis TD, Yu F, Yang H, Zeng C, Gao Y, Hu H, Hu W, Li C, Lin W, Liu S, Pan H, Tang X, Wang J, Wang W, Yu J, Zhang B, Zhang Q, Zhao H, Zhao H, Zhou J, Gabriel SB, Barry R, Blumenstiel B, Camargo A, Defelice M, Faggart M, Goyette M, Gupta S, Moore J, Nguyen H, Onofrio RC, Parkin M, Roy J, Stahl E, Winchester E, Ziaugra L, Altshuler D, Shen Y, Yao Z, Huang W, Chu X, He Y, Jin L, Liu Y, Shen Y, Sun W, Wang H, Wang Y, Wang Y, Xiong X, Xu L, Waye MM, Tsui SK, Xue H, Wong JT, Galver LM, Fan JB, Gunderson K, Murray SS, Oliphant AR, Chee MS, Montpetit A, Chagnon F, Ferretti V, Leboeuf M, Olivier JF, Phillips MS, Roumy S, Sallée C, Verner A, Hudson TJ, Kwok PY, Cai D, Koboldt DC, Miller RD, Pawlikowska L, Taillon-Miller P, Xiao M, Tsui LC, Mak W, Song YQ, Tam PK, Nakamura Y, Kawaguchi T, Kitamoto T, Morizono T, Nagashima A, Ohnishi Y, Sekine A, Tanaka T, Tsunoda T, Deloukas P, Bird CP, Delgado M, Dermitzakis ET, Gwilliam R, Hunt S, Morrison J, Powell D, Stranger BE, Whittaker P, Bentley DR, Daly MJ, de Bakker PI, Barrett J, Chretien YR, Maller J, McCarroll S, Patterson N, Pe'er I, Price A, Purcell S, Richter DJ, Sabeti P, Saxena R, Schaffner SF, Sham PC, Varilly P, Altshuler D, Stein LD, Krishnan L, Smith AV, Tello-Ruiz MK, Thorisson GA, Chakravarti A, Chen PE, Cutler DJ, Kashuk CS, Lin S, Abecasis GR, Guan W, Li Y, Munro HM, Qin ZS, Thomas DJ, McVean G, Auton A, Bottolo L, Cardin N, Eyheramendy S, Freeman C, Marchini J, Myers S, Spencer C, Stephens M, Donnelly P, Cardon LR, Clarke G, Evans DM, Morris AP, Weir BS, Tsunoda T, Mullikin JC, Sherry ST, Feolo M, Skol A, Zhang H, Zeng C, Zhao H, Matsuda I, Fukushima Y, Macer DR, Suda E, Rotimi CN, Adebamowo CA, Ajayi I, Aniagwu T, Marshall PA, Nkwodimmah C, Royal CD, Leppert MF, Dixon M, Peiffer A, Qiu R, Kent A, Kato K, Niikawa N, Adewole IF, Knoppers BM, Foster MW, Clayton EW, Watkin J, Gibbs RA, Belmont JW, Muzny D, Nazareth L, Sodergren E, Weinstock GM, Wheeler DA, Yakub I, Gabriel SB, Onofrio RC, Richter DJ, Ziaugra L, Birren BW, Daly MJ, Altshuler D, Wilson RK, Fulton LL, Rogers J, Burton J, Carter NP, Clee CM, Griffiths M, Jones MC, McLay K, Plumb RW, Ross MT, Sims SK, Willey DL, Chen Z, Han H, Kang L, Godbout M, Wallenburg JC, L'Archevêque P, Bellemare G, Saeki K, Wang H, An D, Fu H, Li Q, Wang Z, Wang R, Holden AL, Brooks LD, McEwen JE, Guyer MS, Wang VO, Peterson JL, Shi M, Spiegel J, Sung LM, Zacharia LF, Collins FS, Kennedy K, Jamieson R and Stewart J

    The Scripps Research Institute, 10550 North Torrey Pines Road MEM275, La Jolla, California 92037, USA.

    We describe the Phase II HapMap, which characterizes over 3.1 million human single nucleotide polymorphisms (SNPs) genotyped in 270 individuals from four geographically diverse populations and includes 25-35% of common SNP variation in the populations surveyed. The map is estimated to capture untyped common variation with an average maximum r² of between 0.9 and 0.96 depending on population. We demonstrate that the current generation of commercial genome-wide genotyping products captures common Phase II SNPs with an average maximum r² of up to 0.8 in African and up to 0.95 in non-African populations, and that potential gains in power in association studies can be obtained through imputation. These data also reveal novel aspects of the structure of linkage disequilibrium. We show that 10-30% of pairs of individuals within a population share at least one region of extended genetic identity arising from recent ancestry and that up to 1% of all common variants are untaggable, primarily because they lie within recombination hotspots. We show that recombination rates vary systematically around genes and between genes of different function. Finally, we demonstrate increased differentiation at non-synonymous, compared to synonymous, SNPs, resulting from systematic differences in the strength or efficacy of natural selection between populations.

    Funded by: Wellcome Trust: 077008, 077011, 077046, 081682

    Nature 2007;449;7164;851-61
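    The abstract's "average maximum r²" measures how well a panel of typed SNPs captures untyped ones: for each untyped SNP take its best r² with any typed SNP, then average. The sketch below illustrates that bookkeeping under a simplifying assumption: r² is computed as the squared Pearson correlation of hypothetical per-individual allele counts (0, 1 or 2), whereas HapMap analyses estimate linkage disequilibrium from population data with their own methods. Function names and data are hypothetical.

```python
def r_squared(g1, g2):
    """Squared Pearson correlation between two allele-count vectors
    (0, 1 or 2 copies of an allele per individual)."""
    n = len(g1)
    if n != len(g2) or n < 2:
        raise ValueError("vectors must have equal length >= 2")
    m1 = sum(g1) / n
    m2 = sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:
        raise ValueError("monomorphic SNP: r^2 is undefined")
    return cov * cov / (v1 * v2)


def max_r_squared(untyped, typed_panel):
    """Best r^2 between one untyped SNP and any SNP on the typed panel."""
    return max(r_squared(untyped, tag) for tag in typed_panel)


def average_max_r_squared(untyped_snps, typed_panel):
    """Mean of the per-SNP maximum r^2 over a set of untyped SNPs."""
    scores = [max_r_squared(snp, typed_panel) for snp in untyped_snps]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical genotypes for six individuals.
    untyped = [[0, 1, 2, 1, 0, 2], [2, 1, 0, 1, 1, 0]]
    panel = [[0, 0, 1, 1, 0, 2], [0, 1, 2, 1, 0, 2]]
    print(f"average max r^2: {average_max_r_squared(untyped, panel):.3f}")
```

    Taking the maximum over the panel reflects the idea that an untyped SNP counts as captured if at least one typed SNP is a good proxy for it; an untaggable variant, in the abstract's sense, is one whose best r² stays low for every available tag.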

  • In silico functional and structural characterisation of ferlin proteins by mapping disease-causing mutations and evolutionary information onto three-dimensional models of their C2 domains.

    Jiménez JL and Bashir R

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Ferlins are C2 domain proteins involved in membrane fusion events, including membrane repair and synaptic exocytosis, and their deficiency can result in muscular dystrophy and deafness. We have undertaken a structural study of their C2 domains by sequence comparison and homology modelling to understand the function of these poorly characterised proteins and to predict the molecular impact of disease-causing mutations. We observe that non-conservative mutations affecting buried residues tend to result in detrimental phenotypes, likely because of decreased protein stability, whereas most variants with replacements in surface residues do not. The few cases of exposed residues altered in variants known to cause diseases are found in conserved areas of functional importance, including essential calcium-binding regions, as deduced by analogy to other characterised C2 domains. Furthermore, we report distinct features of some C2 domains in the two known ferlin subfamilies that correlate with the presence or absence of the DysF domains. Taken together, our results highlight potential targets for further experimental analyses to understand the function of ferlin proteins. We believe our modelling data will aid the diagnosis of diseases associated with ferlin mutations and the development of therapeutic strategies.

    Funded by: Wellcome Trust

    Journal of the neurological sciences 2007;260;1-2;114-23

  • Immunohistochemical characterization of cytokeratins in the abnormal corneal endothelium of posterior polymorphous corneal dystrophy patients.

    Jirsova K, Merjava S, Martincova R, Gwilliam R, Ebenezer ND, Liskova P and Filipec M

    Ocular Tissue Bank, General Teaching Hospital and Charles University, U Nemocnice 2, Prague 128 08, Czech Republic.

    Posterior polymorphous corneal dystrophy (PPCD) is a hereditary bilateral disorder affecting Descemet's membrane and the endothelium. The aim of the present study was to determine the spectrum of cytokeratin (CK) expression in cells on the posterior surface of the cornea in PPCD patients. Ten corneal buttons and one specimen of the trabecular meshwork (TM) from PPCD patients who underwent graft or glaucoma surgery were used, as well as six corneal buttons and two TM specimens obtained from healthy donors as controls. Cryosections were fixed and indirect immunofluorescent staining was performed using antibodies directed against a wide spectrum of cytokeratins (CKs). The number of positive cells and the intensity of the staining were assessed using fluorescence microscopy. All 10 PPCD corneal specimens had areas of endothelium displaying typical endothelial morphology as well as areas consisting of layers two to six cells thick with both flat endothelial-like cells and polygonal cells with round nuclei and a large cytoplasm. Both of these morphologically distinct cell types showed strong immunostaining for CK7, CK19, CK8 and CK18, while weaker positive signals were observed for CK1, CK3/12, CK4, CK5/6, CK10, CK10/13, CK14, CK16 and CK17. PPCD endothelium was completely negative for CK2e, CK9, CK15, and CK20. Focal positivity was detected in PPCD TM for CK4, CK7 and CK19. CK8 and CK18 were the only CKs expressed in control endothelium. PPCD and control epithelium displayed similar staining patterns. Distinct positivity for CK3/12, CK4, CK5/6, CK10/13, CK14, CK16 and CK17 was observed in aberrant PPCD endothelium for the first time. We demonstrate that the abnormal endothelium of PPCD patients expresses a mixture of CKs, with CK7 and CK19 predominating. In terms of CK composition, the aberrant PPCD endothelium shares features of both simple and stratified squamous epithelium with a proliferative capacity. The wide spectrum of CK expression is most probably not indicative of the transformation of endothelial cells to a distinct epithelial phenotype, but more likely reflects the modified differentiation of metaplastic epithelium.

    Experimental eye research 2007;84;4;680-6

  • Mapping the platelet profile for functional genomic studies and demonstration of the effect size of the GP6 locus.

    Jones CI, Garner SF, Angenent W, Bernard A, Berzuini C, Burns P, Farndale RW, Hogwood J, Rankin A, Stephens JC, Tom BD, Walton J, Dudbridge F, Ouwehand WH, Goodall AH and Bloodomics Consortium

    Department of Cardiovascular Sciences, University of Leicester, Leicester, UK.

    Background: Evidence suggests the wide variation in platelet response within the population is genetically controlled. Unraveling the complex relationship between sequence variation and platelet phenotype requires accurate and reproducible measurement of platelet response.

    Objective: To develop a methodology suitable for measuring signaling pathway-specific platelet phenotype, to use this to measure platelet response in a large cohort, and to demonstrate the effect size of sequence variation in a relevant model gene.

    Methods: Three established platelet assays were evaluated: mobilization of [Ca(2+)](i), aggregometry and flow cytometry, each in response to adenosine 5'-diphosphate (ADP) or the glycoprotein (GP) VI-specific crosslinked collagen-related peptide (CRP). Flow cytometric measurement of fibrinogen binding and P-selectin expression in response to a single, intermediate dose of each agonist gave the best combination of reproducibility and inter-individual variability and was used to measure the platelet response in 506 healthy volunteers. Pathway specificity was ensured by blocking the main subsidiary signaling pathways.

    Results: Individuals were identified who were hypo- or hyper-responders for both pathways, or who had differential responses to the two agonists, or between outcomes. Eighty-nine individuals retested three months later using the same methodology showed high concordance between the two visits in all four assays (r² = 0.872, 0.868, 0.766 and 0.549), and all subjects retained their phenotype at recall. Sequence variation at the GP6 locus accounted for approximately 35% of the variation in the CRP-XL response.

    Conclusion: Genotype-phenotype association studies in a well-characterized, large cohort provide a powerful strategy to measure the effect of sequence variation in genes regulating the platelet response.

    Funded by: Medical Research Council: G0500707, MC_U105260799, MC_U105261167

    Journal of thrombosis and haemostasis : JTH 2007;5;8;1756-65

  • Reduced ENaC protein abundance contributes to the lower blood pressure observed in pendrin-null mice.

    Kim YH, Pech V, Spencer KB, Beierwaltes WH, Everett LA, Green ED, Shin W, Verlander JW, Sutliff RL and Wall SM

    Department of Medicine, Emory University, Atlanta, Georgia, USA.

    Pendrin (encoded by Pds, Slc26a4) is a Cl(-)/HCO(3)(-) exchanger expressed in the apical regions of type B and non-A, non-B intercalated cells of kidney and mediates renal Cl(-) absorption, particularly when upregulated. Aldosterone increases blood pressure by increasing absorption of both Na(+) and Cl(-) through increased protein abundance and function of Na(+) transporters, such as the epithelial Na(+) channel (ENaC) and the Na(+)-Cl(-) cotransporter (NCC), as well as Cl(-) transporters, such as pendrin. Because aldosterone analogs do not increase blood pressure in Slc26a4(-/-) mice, we asked whether Na(+) excretion and Na(+) transporter protein abundance are altered in kidneys from these mutant mice. Thus wild-type and Slc26a4-null mice were given a NaCl-replete diet, a NaCl-restricted diet, or a NaCl-replete diet plus aldosterone or aldosterone analogs. Abundance of the major renal Na(+) transporters was examined with immunoblots and immunohistochemistry. Slc26a4-null mice showed an impaired ability to conserve Na(+) during dietary NaCl restriction. Under treatment conditions in which circulating aldosterone is increased, alpha-, beta-, and 85-kDa gamma-ENaC subunit protein abundances were reduced 15-35%, whereas abundance of the 70-kDa fragment of gamma-ENaC was reduced approximately 70% in Slc26a4-null relative to wild-type mice. Moreover, ENaC-dependent changes in transepithelial voltage were much lower in cortical collecting ducts from Slc26a4-null than from wild-type mice. Thus, in kidney, ENaC protein abundance and function are modulated by pendrin or through a pendrin-dependent downstream event. The reduced ENaC protein abundance and function observed in Slc26a4-null mice contribute to their lower blood pressure and reduced ability to conserve Na(+) during NaCl restriction.

    Funded by: PHS HHS: P01 061521

    American journal of physiology. Renal physiology 2007;293;4;F1314-24

  • Arginine methylation at histone H3R2 controls deposition of H3K4 trimethylation.

    Kirmizis A, Santos-Rosa H, Penkett CJ, Singer MA, Vermeulen M, Mann M, Bähler J, Green RD and Kouzarides T

    Gurdon Institute and Department of Pathology, Tennis Court Road, Cambridge CB2 1QN, UK.

    Modifications on histones control important biological processes through their effects on chromatin structure. Methylation at lysine 4 on histone H3 (H3K4) is found at the 5' end of active genes and contributes to transcriptional activation by recruiting chromatin-remodelling enzymes. An adjacent arginine residue (H3R2) is also known to be asymmetrically dimethylated (H3R2me2a) in mammalian cells, but its location within genes and its function in transcription are unknown. Here we show that H3R2 is also methylated in budding yeast (Saccharomyces cerevisiae), and by using an antibody specific for H3R2me2a in a chromatin immunoprecipitation-on-chip analysis we determine the distribution of this modification on the entire yeast genome. We find that H3R2me2a is enriched throughout all heterochromatic loci and inactive euchromatic genes and is present at the 3' end of moderately transcribed genes. In all cases the pattern of H3R2 methylation is mutually exclusive with the trimethyl form of H3K4 (H3K4me3). We show that methylation at H3R2 abrogates the trimethylation of H3K4 by the Set1 methyltransferase. The specific effect on H3K4me3 results from the occlusion of Spp1, a Set1 methyltransferase subunit necessary for trimethylation. Thus, the inability of Spp1 to recognize H3 methylated at R2 prevents Set1 from trimethylating H3K4. These results provide the first mechanistic insight into the function of arginine methylation on chromatin.

    Funded by: Cancer Research UK: A6517; Wellcome Trust: 077118, 092096

    Nature 2007;449;7164;928-32

  • Paired-end mapping reveals extensive structural variation in the human genome.

    Korbel JO, Urban AE, Affourtit JP, Godwin B, Grubert F, Simons JF, Kim PM, Palejev D, Carriero NJ, Du L, Taillon BE, Chen Z, Tanzer A, Saunders AC, Chi J, Yang F, Carter NP, Hurles ME, Weissman SM, Harkins TT, Gerstein MB, Egholm M and Snyder M

    Molecular Biophysics and Biochemistry Department, Yale University, New Haven, CT 06520, USA.

    Structural variation of the genome involves kilobase- to megabase-sized deletions, duplications, insertions, inversions, and complex combinations of rearrangements. We introduce high-throughput, massive paired-end mapping (PEM), a large-scale genome-sequencing method for identifying structural variants (SVs) of approximately 3 kilobases (kb) or larger. PEM combines the rescue and capture of paired ends of 3-kb fragments, massive 454 sequencing, and a computational approach to map DNA reads onto a reference genome. PEM was used to map SVs in an African and in a putatively European individual and identified shared and divergent SVs relative to the reference genome. Overall, we fine-mapped more than 1000 SVs and documented that the number of SVs among humans is much larger than initially hypothesized; many of the SVs potentially affect gene function. The breakpoint junction sequences of more than 200 SVs were determined with a novel pooling strategy and computational analysis. Our analysis provided insights into the mechanisms of SV formation in humans.

    Funded by: NCRR NIH HHS: RR19895; Wellcome Trust: 077008, 077014

    Science (New York, N.Y.) 2007;318;5849;420-6
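
    The core idea behind paired-end mapping can be illustrated with a short sketch: each captured fragment has a known approximate length (about 3 kb here), so the distance and orientation at which its two ends map onto the reference reveal structural differences. Ends that map too far apart suggest a deletion in the sample, ends that map too close suggest an insertion, and ends on the same strand suggest an inversion. This is a simplified illustration, not the published PEM pipeline; the `ReadPair` type, the 600-bp tolerance, and the assumption that concordant ends map to opposite strands are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Mapped positions of the two ends of one fragment (hypothetical type)."""
    chrom1: str
    pos1: int
    strand1: str
    chrom2: str
    pos2: int
    strand2: str


def classify_pair(pair: ReadPair, expected: int = 3000, tolerance: int = 600) -> str:
    """Classify a read pair by comparing its mapped span with the fragment size.

    Assumes (for illustration) that concordant ends map to opposite strands
    and that `expected` and `tolerance` describe the library's fragment sizes.
    """
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"
    if pair.strand1 == pair.strand2:
        # Same-strand ends are inconsistent with a simple linear fragment.
        return "inversion"
    span = abs(pair.pos2 - pair.pos1)
    if span > expected + tolerance:
        # Reference contains sequence that is absent between the sample's ends.
        return "deletion"
    if span < expected - tolerance:
        # Sample contains extra sequence not present in the reference.
        return "insertion"
    return "concordant"
```

    In practice a single discordant pair is weak evidence; an SV call would normally require several independent pairs supporting the same event.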

  • New tools and expanded data analysis capabilities at the Protein Structure Prediction Center.

    Kryshtafovych A, Prlic A, Dmytriv Z, Daniluk P, Milostan M, Eyrich V, Hubbard T and Fidelis K

    Genome Center, University of California, Davis, California 95616, USA.

    We outline the main tasks performed by the Protein Structure Prediction Center in support of the CASP7 experiment and provide a brief review of the major measures used in the automatic evaluation of predictions. We describe in more detail the software developed to facilitate analysis of modeling success over and beyond the available templates and the adopted Java-based tool enabling visualization of multiple structural superpositions between target and several models/templates. We also give an overview of the CASP infrastructure provided by the Center and discuss the organization of the results web pages available through

    Funded by: NLM NIH HHS: LM07085-01; Wellcome Trust: 077198

    Proteins 2007;69 Suppl 8;19-26

  • An RNA G-quadruplex in the 5' UTR of the NRAS proto-oncogene modulates translation.

    Kumari S, Bugaut A, Huppert JL and Balasubramanian S

    University Chemical Laboratory, Lensfield Road, Cambridge CB2 1EW, UK.

    Guanine-rich nucleic acid sequences can adopt noncanonical four-stranded secondary structures called guanine (G)-quadruplexes. Bioinformatics analysis suggests that G-quadruplex motifs are prevalent in genomes, which raises the need to elucidate their function. There is now evidence for the existence of DNA G-quadruplexes at telomeres with associated biological function. A recent hypothesis supports the notion that gene promoter elements contain DNA G-quadruplex motifs that control gene expression at the transcriptional level. We discovered a highly conserved, thermodynamically stable RNA G-quadruplex in the 5' untranslated region (UTR) of the gene transcript of the human NRAS proto-oncogene. Using a cell-free translation system coupled to a reporter gene assay, we have demonstrated that this NRAS RNA G-quadruplex modulates translation. This is the first example of translational repression by an RNA G-quadruplex. Bioinformatics analysis has revealed 2,922 other 5' UTR RNA G-quadruplex elements in the human genome. We propose that RNA G-quadruplexes in the 5' UTR modulate gene expression at the translational level.

    Funded by: Cancer Research UK: A4081

    Nature chemical biology 2007;3;4;218-21
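
    The bioinformatics search for 5' UTR G-quadruplex elements can be sketched as a sequence-pattern scan. The sketch below assumes the widely used folding rule of four runs of at least three guanines separated by loops of one to seven nucleotides; whether this study used exactly this rule, and the function name `find_g4_motifs`, are assumptions for illustration only.

```python
import re

# Assumed folding rule (not stated in the abstract): four runs of >= 3
# guanines separated by loops of 1-7 nucleotides (G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+).
# The loop class accepts both T (DNA) and U (RNA).
G4_MOTIF = re.compile(r"(?:G{3,}[ACGTU]{1,7}){3,}G{3,}")


def find_g4_motifs(seq: str) -> list[tuple[int, int, str]]:
    """Return (start, end, motif) for non-overlapping putative G4 motifs."""
    seq = seq.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_MOTIF.finditer(seq)]


print(find_g4_motifs("aaGGGAGGGAGGGAGGGaa"))
```

    Because `finditer` returns non-overlapping matches, overlapping alternative foldings of the same G-rich stretch are reported once; a genome-wide count would also need to decide how to handle such overlaps.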

  • A comprehensive antibody panel for immunohistochemical analysis of formalin-fixed, paraffin-embedded hematopoietic neoplasms of mice: analysis of mouse specific and human antibodies cross-reactive with murine tissue.

    Kunder S, Calzada-Wack J, Hölzlwimmer G, Müller J, Kloss C, Howat W, Schmidt J, Höfler H, Warren M and Quintanilla-Martinez L

    GSF Research Center for Environment and Health, Institute of Pathology, Neuherberg 85764, Germany.

    Immunohistochemistry is an indispensable tool in human pathology enabling immunophenotypic characterization of tumor cells. Immunohistochemical analyses of mouse models of human hematopoietic neoplasias have become an important aspect for comparison of murine entities with their human counterparts. The aim of this study was to establish a diagnostic antibody panel for analysis of murine lymphomas/leukemias, useful in formalin-fixed/paraffin-embedded tissue. Overall, 48 antibodies (4 rabbit monoclonal, 12 rabbit polyclonal, 2 goat polyclonal, 11 rat, and 19 mouse monoclonal), which were either mouse-specific (14) or cross-reactive with murine tissue (34) were tested for staining quality and diagnostic value in 468 murine hematopoietic neoplasms. Specific staining was achieved with 29 antibodies, of which 18 were human antibodies cross-reactive with murine tissue. Only 23 (B220, BCL-2, BCL-6, CD117, CD138 (2x), CD3 (2x), CD43, CD45, CD5, CD79 alpha cy, cyclin D1, Ki-67 (2x), Mac-3, Mac-2, lysozyme, mast cell tryptase, MPO, Pax-5, TdT, and TER-119) were regarded as valuable for diagnostic evaluation. Immunohistochemistry was also established in an automated immunostainer for high throughput analysis. The antibody panel developed is useful for the classification of murine lymphomas and leukemias analyzed, and a valuable tool for human and veterinary pathologists involved in the diagnostic interpretation of murine models of hematopoietic neoplasias.

    Toxicologic pathology 2007;35;3;366-75

  • hORFeome v3.1: a resource of human open reading frames representing over 10,000 human genes.

    Lamesch P, Li N, Milstein S, Fan C, Hao T, Szabo G, Hu Z, Venkatesan K, Bethel G, Martin P, Rogers J, Lawlor S, McLaren S, Dricot A, Borick H, Cusick ME, Vandenhaute J, Dunham I, Hill DE and Vidal M

    Center for Cancer Systems Biology and Department of Cancer Biology, Dana-Farber Cancer Institute, and Department of Genetics, Harvard Medical School, Boston, MA 02115, USA.

    Complete sets of cloned protein-encoding open reading frames (ORFs), or ORFeomes, are essential tools for large-scale proteomics and systems biology studies. Here we describe human ORFeome version 3.1 (hORFeome v3.1), currently the largest publicly available resource of full-length human ORFs (available at ). Generated by Gateway recombinational cloning, this collection contains 12,212 ORFs, representing 10,214 human genes, and corresponds to a 51% expansion of the original hORFeome v1.1. An online human ORFeome database, hORFDB, was built and serves as the central repository for all cloned human ORFs (). This expansion of the original ORFeome resource greatly increases the potential experimental search space for large-scale proteomics studies, which will lead to the generation of more comprehensive datasets.

    Genomics 2007;89;3;307-15

  • Host transmission of Salmonella enterica serovar Typhimurium is controlled by virulence factors and indigenous intestinal microbiota.

    Lawley TD, Bouley DM, Hoy YE, Gerke C, Relman DA and Monack DM

    Department of Microbiology and Immunology, 299 Campus Drive, Stanford University, Stanford, CA 94305, USA.

    Transmission is an essential stage of a pathogen's life cycle and remains poorly understood. We describe here a system in which persistently infected 129X1/SvJ mice provide a natural model of Salmonella enterica serovar Typhimurium transmission. In this model only a subset of the infected mice, termed supershedders, shed high levels (>10(8) CFU/g) of Salmonella serovar Typhimurium in their feces and, as a result, rapidly transmit infection. While most Salmonella serovar Typhimurium-infected mice show signs of intestinal inflammation, only supershedder mice develop colitis. Development of the supershedder phenotype depends on the virulence determinants Salmonella pathogenicity islands 1 and 2, and it is characterized by mucosal invasion and, importantly, high luminal abundance of Salmonella serovar Typhimurium within the colon. Immunosuppression of infected mice does not induce the supershedder phenotype, demonstrating that the immune response is not the main determinant of Salmonella serovar Typhimurium levels within the colon. In contrast, treatment of mice with antibiotics that alter the health-associated indigenous intestinal microbiota rapidly induces the supershedder phenotype in infected mice and predisposes uninfected mice to the supershedder phenotype for several days. These results demonstrate that the intestinal microbiota plays a critical role in controlling Salmonella serovar Typhimurium infection, disease, and transmissibility. This novel model should facilitate the study of host, pathogen, and intestinal microbiota factors that contribute to infectious disease transmission.

    Funded by: NIAID NIH HHS: AI26195, R01 AI026195

    Infection and immunity 2007;76;1;403-16

  • Common ABCB1 polymorphisms are not associated with multidrug resistance in epilepsy using a gene-wide tagging approach.

    Leschziner GD, Andrew T, Leach JP, Chadwick D, Coffey AJ, Balding DJ, Bentley DR, Pirmohamed M and Johnson MR

    Imperial College London, London, UK.

    P-glycoprotein, the product of the ABCB1 gene, is a proposed mechanism of pharmacoresistance in epilepsy. Previous attempts to correlate the ABCB1 C3435T SNP, or a three-SNP haplotype containing C3435T, with epilepsy pharmacoresistance have produced discordant findings. We analysed these single nucleotide polymorphisms (SNPs), plus a more comprehensive set of tagging SNPs describing common variation in ABCB1, in a case-control study. No significant association of C3435T (P=0.55), the three-SNP haplotype (lowest P=0.14) or any gene-wide tagging SNP (lowest P=0.17) with multidrug resistance in epilepsy was identified. Meta-analysis of studies using the same definition of multidrug resistance (n=1064) also demonstrated no significant association of C3435T with multidrug resistance (P=0.31). These findings suggest that C3435T is unlikely to be a marker for epilepsy multidrug resistance. In addition, no evidence for a role of other common ABCB1 polymorphisms was found using a potentially more powerful gene-wide tagging approach.

    Funded by: Wellcome Trust

    Pharmacogenetics and genomics 2007;17;3;217-20

  • The association between polymorphisms in RLIP76 and drug response in epilepsy.

    Leschziner GD, Jorgensen AL, Andrew T, Williamson PR, Marson AG, Coffey AJ, Middleditch C, Balding DJ, Rogers J, Bentley DR, Chadwick D, Johnson MR and Pirmohamed M

    Imperial College London, Division of Neuroscience, Charing Cross Campus, Room 10E07, St Dunstan's Road, London W6 8RF, UK.

    Introduction: Approximately 30% of patients with epilepsy are resistant to treatment with anti-epileptic drugs (AEDs). The ABC drug transporter proteins are hypothesized to mediate drug resistance in epilepsy. More recently, a non-ABC putative transporter, RLIP76, has also been proposed to be involved in the mechanism of pharmacoresistance. One previous association study of six polymorphisms in RLIP76 failed to find any association with drug resistance in a retrospective cohort of epilepsy patients. We aimed to look for an association with outcomes reflecting drug response in a larger prospective cohort, with gene-wide coverage.

    Patients and methods: We investigated the role of common polymorphisms in RLIP76 in epilepsy pharmacoresistance by genotyping 23 common RLIP76 polymorphisms in a prospective cohort of 503 epilepsy patients, from the standard and new anti-epileptic drugs (SANAD) prospective study of new and old AEDs. A total of 13 of these were tested for association with four outcomes reflecting response to drugs: time to first seizure, time to 12-month remission, time to withdrawal due to inadequate seizure control, and time to withdrawal due to unacceptable adverse drug events.

    Results: No significant associations, allowing for multiple testing, were found in the whole cohort. There was also no effect in a subgroup of patients on carbamazepine, which is thought to be an RLIP76 substrate, although two polymorphisms were associated with time to first seizure (p = 0.007).

    Discussion: We failed to demonstrate any association between RLIP76 polymorphisms and four different measures of drug response in the larger cohort, but a subgroup analysis of patients receiving carbamazepine suggested an association that should be investigated further.

    Conclusions: Our data suggest that common variants in RLIP76 are unlikely to contribute to epilepsy drug response.

    Pharmacogenomics 2007;8;12;1715-22

  • Sequencing and analysis of chromosome 1 of Eimeria tenella reveals a unique segmental organization.

    Ling KH, Rajandream MA, Rivailler P, Ivens A, Yap SJ, Madeira AM, Mungall K, Billington K, Yee WY, Bankier AT, Carroll F, Durham AM, Peters N, Loo SS, Isa MN, Novaes J, Quail M, Rosli R, Nor Shamsudin M, Sobreira TJ, Tivey AR, Wai SF, White S, Wu X, Kerhornou A, Blake D, Mohamed R, Shirley M, Gruber A, Berriman M, Tomley F, Dear PH and Wan KL

    Malaysia Genome Institute, UKM-MTDC Smart Technology Centre, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor DE, Malaysia.

    Eimeria tenella is an intracellular protozoan parasite that infects the intestinal tracts of domestic fowl and causes coccidiosis, a serious and sometimes lethal enteritis. Eimeria falls in the same phylum (Apicomplexa) as several human and animal parasites such as Cryptosporidium, Toxoplasma, and the malaria parasite, Plasmodium. Here we report the sequencing and analysis of the first chromosome of E. tenella, a chromosome believed to carry loci associated with drug resistance and known to differ between virulent and attenuated strains of the parasite. The chromosome, which appears to be representative of the genome, is gene-dense and rich in simple-sequence repeats, many of which appear to give rise to repetitive amino acid tracts in the predicted proteins. Most striking is the segmentation of the chromosome into repeat-rich regions peppered with transposon-like elements and telomere-like repeats, alternating with repeat-free regions. Predicted genes differ in character between the two types of segment, and the repeat-rich regions appear to be associated with strain-to-strain variation.

    Funded by: Biotechnology and Biological Sciences Research Council: S19705; Medical Research Council: MC_U105131672; Wellcome Trust

    Genome research 2007;17;3;311-9

  • Molecular analysis of the VSX1 gene in familial keratoconus.

    Liskova P, Ebenezer ND, Hysi PG, Gwilliam R, El-Ashry MF, Moodaley LC, Hau S, Twa M, Tuft SJ and Bhattacharya SS

    Division of Molecular Genetics, Institute of Ophthalmology, UCL, London, UK.

    Purpose: To evaluate the role of the visual system homeobox gene 1 (VSX1) in the pathogenesis of familial keratoconus.

    Methods: Families with two or more individuals with keratoconus were recruited and their members examined. The coding region and intron-exon junctions of the VSX1 gene were sequenced in affected individuals. In cases where there were possible pathogenic changes, segregation within the pedigree was analyzed. A meta-analysis of reports on the association of the p.D144E change with the keratoconus phenotype was performed.

    Results: Probands from a panel of 85 apparently unrelated keratoconus families were included. Eleven sequence variants were observed, including the previously reported c.432C>G (p.D144E) change and two novel intronic single nucleotide polymorphisms. However, these three changes did not cosegregate with the disease phenotype.

    Conclusions: We excluded the c.432C>G sequence alteration as the direct cause of the disease. The lack of possibly pathogenic VSX1 sequence variants in the familial panel suggests that involvement of this gene in the pathogenesis of keratoconus is likely to be confined to a small number of pedigrees, at least in the population studied.

    Molecular vision 2007;13;1887-91

  • Comment on "A common genetic variant is associated with adult and childhood obesity".

    Loos RJ, Barroso I, O'Rahilly S and Wareham NJ

    Medical Research Council Epidemiology Unit, Cambridge, UK.

    Herbert et al. (Reports, 14 April 2006, p. 279) found that the rs7566605 genetic variant, located upstream of the INSIG2 gene, was consistently associated with increased body mass index. However, we found no evidence of association between rs7566605 and body mass index in two large ethnically homogeneous population-based cohorts. Instead, a tendency in the opposite direction was observed.

    Funded by: Medical Research Council: G9824984, MC_U106179471, MC_U106188470; Wellcome Trust: 077016

    Science (New York, N.Y.) 2007;315;5809;187; author reply 187

  • TCF7L2 polymorphisms modulate proinsulin levels and beta-cell function in a British Europid population.

    Loos RJ, Franks PW, Francis RW, Barroso I, Gribble FM, Savage DB, Ong KK, O'Rahilly S and Wareham NJ

    Medical Research Council Epidemiology Unit, Strangeways Research Laboratory, Cambridge, UK.

    Rapidly accumulating evidence shows that common polymorphisms in transcription factor 7-like 2 (TCF7L2) confer risk of type 2 diabetes through unknown mechanisms. We examined the association between four TCF7L2 single nucleotide polymorphisms (SNPs), including rs7903146, and measures of insulin sensitivity and insulin secretion in 1,697 Europid men and women of the population-based MRC (Medical Research Council)-Ely study. The minor (T) allele of rs7903146 was strongly and positively associated with fasting proinsulin (P = 4.55 x 10(-9)) and 32,33 split proinsulin (P = 1.72 x 10(-4)) relative to total insulin levels; i.e., differences between T/T and C/C homozygotes amounted to 21.9 and 18.4%, respectively. Notably, the insulin-to-glucose ratio (IGR) at 30-min oral glucose tolerance test (OGTT), a frequently used surrogate of first-phase insulin secretion, was not associated with the TCF7L2 SNP (P > 0.7). However, the insulin response (IGR) at 60-min OGTT was significantly lower in T-allele carriers (P = 3.5 x 10(-3)). The T-allele was also associated with higher A1C concentrations (P = 1.2 x 10(-2)) and reduced beta-cell function, assessed by homeostasis model assessment of beta-cell function (P = 2.8 x 10(-2)). Similar results were obtained for the other TCF7L2 SNPs. Of note, both major genes involved in proinsulin processing (PC1, PC2) contain TCF-binding sites in their promoters. Our findings suggest that the TCF7L2 risk allele may predispose to type 2 diabetes by impairing beta-cell proinsulin processing. The risk allele increases proinsulin levels and diminishes the 60-min but not 30-min insulin response during OGTT. The strong association between the TCF7L2 risk allele and fasting proinsulin but not insulin levels is notable, as, in this unselected and largely normoglycemic population, external influences on beta-cell stress are unlikely to be major factors influencing the efficiency of proinsulin processing.

    Funded by: Medical Research Council: G9824984, MC_U106179471, MC_U106179472, MC_U106188470; Wellcome Trust: 071187, 077016

    Diabetes 2007;56;7;1943-7

  • Comparative gene expression profiling of in vitro differentiated megakaryocytes and erythroblasts identifies novel activatory and inhibitory platelet membrane proteins.

    Macaulay IC, Tijssen MR, Thijssen-Timmer DC, Gusnanto A, Steward M, Burns P, Langford CF, Ellis PD, Dudbridge F, Zwaginga JJ, Watkins NA, van der Schoot CE and Ouwehand WH

    Department of Haematology, University of Cambridge, Cambridge, UK.

    To identify previously unknown platelet receptors we compared the transcriptomes of in vitro differentiated megakaryocytes (MKs) and erythroblasts (EBs). RNA was obtained from purified, biologically paired MK and EB cultures and compared using cDNA microarrays. Bioinformatic analysis of MK-up-regulated genes identified 151 transcripts encoding transmembrane domain-containing proteins. Although many of these were known platelet genes, a number of previously unidentified or poorly characterized transcripts were also detected. Many of these transcripts, including G6b, G6f, LRRC32, LAT2, and the G protein-coupled receptor SUCNR1, encode proteins with structural features or functions that suggest they may be involved in the modulation of platelet function. Immunoblotting on platelets confirmed the presence of the encoded proteins, and flow cytometric analysis confirmed the expression of G6b, G6f, and LRRC32 on the surface of platelets. Through comparative analysis of expression in platelets and other blood cells we demonstrated that G6b, G6f, and LRRC32 are restricted to the platelet lineage, whereas LAT2 and SUCNR1 were also detected in other blood cells. The identification of the succinate receptor SUCNR1 in platelets is of particular interest, because physiologically relevant concentrations of succinate were shown to potentiate the effect of low doses of a variety of platelet agonists.

    Funded by: Medical Research Council: MC_U105260799

    Blood 2007;109;8;3260-9

  • Breaking the waves: improved detection of copy number variation from microarray-based comparative genomic hybridization.

    Marioni JC, Thorne NP, Valsesia A, Fitzgerald T, Redon R, Fiegler H, Andrews TD, Stranger BE, Lynch AG, Dermitzakis ET, Carter NP, Tavaré S and Hurles ME

    Computational Biology Group, Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Centre for Mathematical Sciences, Wilberforce Road, Cambridge CB3 0WA, UK.

    Background: Large-scale high throughput studies using microarray technology have established that copy number variation (CNV) throughout the genome is more frequent than previously thought. Such variation is known to play an important role in susceptibility to, and development of, phenotypes such as HIV-1 infection and Alzheimer's disease. However, methods for analyzing the complex data produced and identifying regions of CNV are still being refined.

    Results: We describe the presence of a genome-wide technical artifact, spatial autocorrelation or 'wave', which occurs in a large dataset used to determine the location of CNV across the genome. By removing this artifact we are able to obtain both a more biologically meaningful clustering of the data and an increase in the number of CNVs identified by current calling methods without a major increase in the number of false positives detected. Moreover, removing this artifact is critical for the development of a novel model-based CNV calling algorithm, CNVmix, that uses cross-sample information to identify regions of the genome where CNVs occur. For regions of CNV that are identified by both CNVmix and current methods, we demonstrate that CNVmix is better able to categorize samples into groups that represent copy number gains or losses.

    Conclusion: Removing artifactual 'waves' (which appear to be a general feature of array comparative genomic hybridization (aCGH) datasets) and using cross-sample information when identifying CNVs enables more biological information to be extracted from aCGH experiments designed to investigate copy number variation in normal individuals.

    Funded by: Wellcome Trust

    Genome biology 2007;8;10;R228
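
    The 'wave' artifact is described as spatial autocorrelation in aCGH signal. A simple way to quantify it is the lag-1 autocorrelation of log2 ratios taken in genomic order: independent noise gives values near zero, while a smooth wave pushes the value towards one. The sketch below is only such a diagnostic, under the assumption that probes are already sorted by position; it is not the paper's correction procedure or CNVmix, and the function name is hypothetical.

```python
def lag1_autocorrelation(values: list[float]) -> float:
    """Lag-1 autocorrelation of probe log2 ratios ordered by genomic position.

    Values near 0 suggest independent noise; clearly positive values suggest
    spatially correlated signal such as a 'wave' artifact.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least three probes")
    mean = sum(values) / n
    # Covariance between each probe and its right-hand neighbour.
    num = sum((values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1))
    # Total variance (unnormalized), as in the standard sample autocorrelation.
    den = sum((v - mean) ** 2 for v in values)
    if den == 0:
        raise ValueError("constant signal has undefined autocorrelation")
    return num / den
```

    For example, an alternating series such as `[1, -1, 1, -1]` gives a negative value, while a slowly varying series such as `[1, 1, 1, -1, -1, -1]` gives a positive one, which is the pattern a wave produces over many neighbouring probes.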

  • Renin enhancer is crucial for full response in Renin expression to an in vivo stimulus.

    Markus MA, Goy C, Adams DJ, Lovicu FJ and Morris BJ

    Basic & Clinical Genomics Laboratory, School of Medical Sciences and Bosch Institute, Building F13, University of Sydney, NSW 2006, Australia.

    We showed recently that deletion of a strong enhancer located 2.7 kb upstream of the renin gene in mice produces a strain with mild hypotension and salt-sensitivity. Here we set out to compare responses in renin expression in kidney and extrarenal tissues in these "REKO" mice. REKO and wild-type mice were placed on a low NaCl/enalapril regimen for 1 week, and then Ren-1(c) mRNA and renin enzyme activities were measured in tissues and plasma. In untreated REKO mice, renin and Ren-1(c) mRNA were reduced significantly in kidney, submandibular gland, adrenal, heart, and brain. In situ hybridization indicated a marked reduction in Ren-1(c) mRNA in juxtaglomerular cells and in granular ducts of the submandibular gland. After the chronic stimulus, the response in renal Ren-1(c) mRNA in REKO mice was blunted by 54% compared with wild-type mice and was accompanied by almost complete exhaustion of renin stores. The response in plasma renin was blunted by 47%, and this was mirrored in heart (54% decline), in which renin is derived mostly from the bloodstream. In the adrenal gland, a 55% reduction was seen. These data are consistent with an inability of REKO mice to adequately replenish renal renin stores during chronic stimulation of renin secretion. In conclusion, the renin enhancer is critical for replenishment of renin stores and for the renin response to a chronic in vivo stimulus.

    Hypertension (Dallas, Tex. : 1979) 2007;50;5;933-8

  • Chromosomally unstable mouse tumours have genomic alterations similar to diverse human cancers.

    Maser RS, Choudhury B, Campbell PJ, Feng B, Wong KK, Protopopov A, O'Neil J, Gutierrez A, Ivanova E, Perna I, Lin E, Mani V, Jiang S, McNamara K, Zaghlul S, Edkins S, Stevens C, Brennan C, Martin ES, Wiedemeyer R, Kabbarah O, Nogueira C, Histen G, Aster J, Mansour M, Duke V, Foroni L, Fielding AK, Goldstone AH, Rowe JM, Wang YA, Look AT, Stratton MR, Chin L, Futreal PA and DePinho RA

    Department of Medical Oncology, Dana Farber Cancer Institute, Boston, Massachusetts 02115, USA.

    Highly rearranged and mutated cancer genomes present major challenges in the identification of pathogenetic events driving the neoplastic transformation process. Here we engineered lymphoma-prone mice with chromosomal instability to assess the usefulness of mouse models in cancer gene discovery and the extent of cross-species overlap in cancer-associated copy number aberrations. Along with targeted re-sequencing, our comparative oncogenomic studies identified FBXW7 and PTEN to be commonly deleted both in murine lymphomas and in human T-cell acute lymphoblastic leukaemia/lymphoma (T-ALL). The murine cancers acquire widespread recurrent amplifications and deletions targeting loci syntenic to those not only in human T-ALL but also in diverse human haematopoietic, mesenchymal and epithelial tumours. These results indicate that murine and human tumours experience common biological processes driven by orthologous genetic events in their malignant evolution. The highly concordant nature of genomic events encourages the use of genomically unstable murine cancer models in the discovery of biological driver events in the human oncogenome.

    Funded by: Medical Research Council: G0500389; Wellcome Trust: 077012, 088340

    Nature 2007;447;7147;966-71

  • Genetic relatedness of the Streptococcus pneumoniae capsular biosynthetic loci.

    Mavroidi A, Aanensen DM, Godoy D, Skovsted IC, Kaltoft MS, Reeves PR, Bentley SD and Spratt BG

    Department of Infectious Disease Epidemiology, Imperial College London, Room G22, Old Medical School Building, St. Mary's Hospital, Norfolk Place, London W2 1PG, United Kingdom.

    Streptococcus pneumoniae (the pneumococcus) produces 1 of 91 capsular polysaccharides (CPS) that define the serotype. The cps loci of 88 pneumococcal serotypes whose CPS is synthesized by the Wzy-dependent pathway were compared with each other and with additional streptococcal polysaccharide biosynthetic loci and were clustered according to the proportion of shared homology groups (HGs), weighted for the sequence similarities between the genes encoding the shared HGs. The cps loci of the 88 pneumococcal serotypes were distributed into eight major clusters and 21 subclusters. All serotypes within the same serogroup fell into the same major cluster, but in six cases, serotypes within the same serogroup were in different subclusters and, conversely, nine subclusters included completely different serotypes. The closely related cps loci within a subcluster were compared to the known CPS structures to relate gene content to structure. The Streptococcus oralis and Streptococcus mitis polysaccharide biosynthetic loci clustered within the pneumococcal cps loci and were in a subcluster that also included the cps locus of pneumococcal serotype 21, whereas the Streptococcus agalactiae cps loci formed a single cluster that was not closely related to any of the pneumococcal cps clusters.

    Funded by: Wellcome Trust

    Journal of bacteriology 2007;189;21;7841-55
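
    The clustering criterion, the proportion of shared homology groups (HGs) weighted by the sequence similarity of the genes encoding them, can be sketched as a pairwise score between two cps loci. The exact weighting used in the paper is not given in the abstract; the formula below (sum of per-HG identities over the size of the union of HGs) and the function names are hypothetical, chosen only to show how such a score could feed a distance-based clustering.

```python
def weighted_hg_similarity(hgs_a: set[str], hgs_b: set[str],
                           identity: dict[str, float]) -> float:
    """Hypothetical weighted proportion of shared homology groups (HGs).

    hgs_a, hgs_b: HG identifiers present in two cps loci.
    identity: sequence identity (0.0-1.0) between the two loci's genes for
              each shared HG; every shared HG must have an entry.
    """
    union = hgs_a | hgs_b
    if not union:
        raise ValueError("both loci are empty")
    shared = hgs_a & hgs_b
    # Each shared HG contributes its sequence identity instead of a flat 1.
    return sum(identity[hg] for hg in shared) / len(union)


def hg_distance(hgs_a: set[str], hgs_b: set[str],
                identity: dict[str, float]) -> float:
    """Distance (0 = identical gene content) for a hierarchical clustering step."""
    return 1.0 - weighted_hg_similarity(hgs_a, hgs_b, identity)
```

    Computing `hg_distance` for every pair of loci yields a distance matrix that any standard hierarchical clustering routine could then group into clusters and subclusters.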

  • Lamin A/C polymorphisms, type 2 diabetes, and the metabolic syndrome: case-control and quantitative trait studies.

    Mesa JL, Loos RJ, Franks PW, Ong KK, Luan J, O'Rahilly S, Wareham NJ and Barroso I

    Medical Research Council Epidemiology Unit, Cambridge, U.K.

    Mutations in the LMNA gene, encoding the nuclear envelope protein lamin A/C, are responsible for a number of distinct disease entities including Dunnigan-type familial partial lipodystrophy. Dunnigan-type lipodystrophy is characterized by loss of subcutaneous adipose tissue, insulin resistance, dyslipidemia, and type 2 diabetes and shares many of the features of the metabolic syndrome. Furthermore, several genome-wide linkage scans for type 2 diabetes have found evidence of linkage at chromosome 1q21.2, the region that harbors the LMNA gene. Therefore, LMNA is a biological and positional candidate for type 2 diabetes susceptibility. Previous studies have reported association between a common LMNA variant (1908C>T; rs4641) and adverse metabolic traits in ethnically diverse populations from Asia and North America. In the present study, we characterized the common variation across the LMNA gene (including rs4641) and tested for association with type 2 diabetes in two large case-control studies (n = 2,052) and with features of the metabolic syndrome in a separate cohort study (n = 1,572). Although our study was sufficiently powered to detect effects similar to, and even smaller than, those previously reported, none of the LMNA single nucleotide polymorphisms were statistically significantly associated with type 2 diabetes or the metabolic syndrome. Thus, it appears unlikely that variation at LMNA substantially increases the risk of type 2 diabetes or related traits in U.K. Europids.

    Funded by: Medical Research Council: MC_U106179471, MC_U106179472, MC_U106188470; Wellcome Trust: 077016

    Diabetes 2007;56;3;884-9

  • Genome of the marsupial Monodelphis domestica reveals innovation in non-coding sequences.

    Mikkelsen TS, Wakefield MJ, Aken B, Amemiya CT, Chang JL, Duke S, Garber M, Gentles AJ, Goodstadt L, Heger A, Jurka J, Kamal M, Mauceli E, Searle SM, Sharpe T, Baker ML, Batzer MA, Benos PV, Belov K, Clamp M, Cook A, Cuff J, Das R, Davidow L, Deakin JE, Fazzari MJ, Glass JL, Grabherr M, Greally JM, Gu W, Hore TA, Huttley GA, Kleber M, Jirtle RL, Koina E, Lee JT, Mahony S, Marra MA, Miller RD, Nicholls RD, Oda M, Papenfuss AT, Parra ZE, Pollock DD, Ray DA, Schein JE, Speed TP, Thompson K, VandeBerg JL, Wade CM, Walker JA, Waters PD, Webber C, Weidman JR, Xie X, Zody MC, Broad Institute Genome Sequencing Platform, Broad Institute Whole Genome Assembly Team, Graves JA, Ponting CP, Breen M, Samollow PB, Lander ES and Lindblad-Toh K

    Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, Massachusetts 02142, USA.

    We report a high-quality draft of the genome sequence of the grey, short-tailed opossum (Monodelphis domestica). As the first metatherian ('marsupial') species to be sequenced, the opossum provides a unique perspective on the organization and evolution of mammalian genomes. Distinctive features of the opossum chromosomes provide support for recent theories about genome evolution and function, including a strong influence of biased gene conversion on nucleotide sequence composition, and a relationship between chromosomal characteristics and X chromosome inactivation. Comparison of opossum and eutherian genomes also reveals a sharp difference in evolutionary innovation between protein-coding and non-coding functional elements. True innovation in protein-coding genes seems to be relatively rare, with lineage-specific differences being largely due to diversification and rapid turnover in gene families involved in environmental interactions. In contrast, about 20% of eutherian conserved non-coding elements (CNEs) are recent inventions that postdate the divergence of Eutheria and Metatheria. A substantial proportion of these eutherian-specific CNEs arose from sequence inserted by transposable elements, pointing to transposons as a major creative force in the evolution of mammalian gene regulation.

    Funded by: Medical Research Council: MC_U137761446; Wellcome Trust: 062023

    Nature 2007;447;7141;167-77

  • Genomic expression patterns in cell separation mutants of Schizosaccharomyces pombe defective in the genes sep10(+) and sep15(+) coding for the Mediator subunits Med31 and Med8.

    Miklos I, Szilagyi Z, Watt S, Zilahi E, Batta G, Antunovics Z, Enczi K, Bähler J and Sipiczki M

    Department of Genetics and Applied Microbiology, University of Debrecen, Debrecen, Hungary.

    Cell division is controlled by a complex network involving regulated transcription of genes and posttranslational modification of proteins. The aim of this study is to demonstrate that the Mediator complex, a general regulator of transcription, is involved in the regulation of the second phase (cell separation) of cell division of the fission yeast Schizosaccharomyces pombe. In previous studies we have found that the fission yeast cell separation genes sep10(+) and sep15(+) code for proteins (Med31 and Med8) associated with the Mediator complex. Here, we show by genome-wide gene expression profiling of mutants defective in these genes that both Med8 and Med31 control large, partially overlapping sets of genes scattered over the entire genome and involved in diverse biological functions. Six cell separation genes controlled by the transcription factors Sep1 and Ace2 are among the target genes. Since neither sep1(+) nor ace2(+) is affected in the mutant cells, we propose that the Med8 and Med31 proteins act as coactivators of the Sep1-Ace2-dependent cell separation genes. The results also indicate that the subunits of Mediator may contribute to the coordination of cellular processes by fine-tuning the expression of larger sets of genes.

    Funded by: Cancer Research UK: A6517, C9546/A6517; Wellcome Trust: 077118

    Molecular genetics and genomics : MGG 2007;279;3;225-38

  • Critical assessment of methods of protein structure prediction-Round VII.

    Moult J, Fidelis K, Kryshtafovych A, Rost B, Hubbard T and Tramontano A

    Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute, Rockville, Maryland 20850, USA.

    This paper is an introduction to the supplemental issue of the journal PROTEINS, dedicated to the seventh CASP experiment to assess the state of the art in protein structure prediction. The paper describes the conduct of the experiment, the categories of prediction included, and outlines the evaluation and assessment procedures. Highlights are improvements in model accuracy relative to that obtainable from knowledge of a single best template structure; convergence of the accuracy of models produced by automatic servers toward that produced by human modeling teams; the emergence of methods for predicting the quality of models; and rapidly increasing practical applications of the methods.

    Funded by: NIGMS NIH HHS: GM072354; NLM NIH HHS: LM07085; Wellcome Trust: 077198

    Proteins 2007;69 Suppl 8;3-9

  • Genome-wide discovery and verification of novel structured RNAs in Plasmodium falciparum.

    Mourier T, Carret C, Kyes S, Christodoulou Z, Gardner PP, Jeffares DC, Pinches R, Barrell B, Berriman M, Griffiths-Jones S, Ivens A, Newbold C and Pain A

    Ancient DNA and Evolution Group, Department of Biology, University of Copenhagen, Copenhagen DK-2100, Denmark.

    We undertook a genome-wide search for novel noncoding RNAs (ncRNA) in the malaria parasite Plasmodium falciparum. We used the RNAz program to predict structures in the noncoding regions of the P. falciparum 3D7 genome that were conserved with at least one of seven other Plasmodium spp. genome sequences. By using Northern blot analysis for 76 high-scoring predictions and microarray analysis for the majority of candidates, we have verified the expression of 33 novel ncRNA transcripts including four members of a ncRNA family in the asexual blood stage. These transcripts represent novel structured ncRNAs in P. falciparum and are not represented in any RNA databases. We provide supporting evidence for purifying selection acting on the experimentally verified ncRNAs by comparing the nucleotide substitutions in the predicted ncRNA candidate structures in P. falciparum with the closely related chimp malaria parasite P. reichenowi. The high confirmation rate within a single parasite life cycle stage suggests that many more of the predictions may be expressed in other stages of the organism's life cycle.

    Funded by: Wellcome Trust

    Genome research 2007;18;2;281-92

  • Mouse Phenotype Database Integration Consortium: integration [corrected] of mouse phenome data resources.

    Mouse Phenotype Database Integration Consortium, Hancock JM, Adams NC, Aidinis V, Blake A, Bogue M, Brown SD, Chesler EJ, Davidson D, Duran C, Eppig JT, Gailus-Durner V, Gates H, Gkoutos GV, Greenaway S, Hrabé de Angelis M, Kollias G, Leblanc S, Lee K, Lengger C, Maier H, Mallon AM, Masuya H, Melvin DG, Müller W, Parkinson H, Proctor G, Reuveni E, Schofield P, Shukla A, Smith C, Toyoda T, Vasseur L, Wakana S, Walling A, White J, Wood J and Zouberakis M

    Understanding the functions encoded in the mouse genome will be central to an understanding of the genetic basis of human disease. To achieve this, it will be essential to characterize the phenotypic consequences of variation and alterations in individual genes. Data on the phenotypes of mouse strains are currently held in a number of different forms (detailed descriptions of mouse lines, first-line phenotyping data on novel mutations, data on the normal features of inbred lines) at many sites worldwide. For the most efficient use of these data sets, we have initiated a process to develop standards for the description of phenotypes (using ontologies) and file formats for the description of phenotyping protocols and phenotype data sets. This process is ongoing and needs to be supported by the wider mouse genetics and phenotyping communities to succeed. We invite interested parties to contact us as we develop this process further.

    Funded by: Medical Research Council: MC_U127527203, MC_U142684171, MC_U142684172, MC_U142684175

    Mammalian genome : official journal of the International Mammalian Genome Society 2007;18;3;157-63

  • New developments in the InterPro database.

    Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Buillard V, Cerutti L, Copley R, Courcelle E, Das U, Daugherty L, Dibley M, Finn R, Fleischmann W, Gough J, Haft D, Hulo N, Hunter S, Kahn D, Kanapin A, Kejariwal A, Labarga A, Langendijk-Genevaux PS, Lonsdale D, Lopez R, Letunic I, Madera M, Maslen J, McAnulla C, McDowall J, Mistry J, Mitchell A, Nikolskaya AN, Orchard S, Orengo C, Petryszak R, Selengut JD, Sigrist CJ, Thomas PD, Valentin F, Wilson D, Wu CH and Yeats C

    EMBL Outstation-European Bioinformatics Institute Hinxton, Cambridge, UK.

    InterPro is an integrated resource for protein families, domains and functional sites, which integrates the following protein signature databases: PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, PIRSF, SUPERFAMILY, Gene3D and PANTHER. The last two, Gene3D and PANTHER, are new member databases integrated since the last publication in this journal. There have been several new developments in InterPro, including an additional reading field, new database links, extensions to the web interface and additional match XML files. InterPro has always provided matches to UniProtKB proteins on the website and in the match XML file on the FTP site. Additional matches to proteins in UniParc (UniProt archive) are now available for download in the new match XML files only. The latest InterPro release (13.0) contains more than 13 000 entries, covering over 78% of all proteins in UniProtKB. The database is available for text- and sequence-based searches via a webserver and for download by anonymous FTP. The InterProScan search tool is now also available via a web service.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/F010435/1; Medical Research Council: G0100305; Wellcome Trust: 087656

    Nucleic acids research 2007;35;Database issue;D224-8

  • Localization of type 1 diabetes susceptibility to the MHC class I genes HLA-B and HLA-A.

    Nejentsev S, Howson JM, Walker NM, Szeszko J, Field SF, Stevens HE, Reynolds P, Hardy M, King E, Masters J, Hulme J, Maier LM, Smyth D, Bailey R, Cooper JD, Ribas G, Campbell RD, Clayton DG, Todd JA and Wellcome Trust Case Control Consortium

    Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, Cambridge Institute of Medical Research, University of Cambridge, CB2 0XY, UK.

    The major histocompatibility complex (MHC) on chromosome 6 is associated with susceptibility to more common diseases than any other region of the human genome, including almost all disorders classified as autoimmune. In type 1 diabetes the major genetic susceptibility determinants have been mapped to the MHC class II genes HLA-DQB1 and HLA-DRB1 (refs 1-3), but these genes cannot completely explain the association between type 1 diabetes and the MHC region. Owing to the region's extreme gene density, the multiplicity of disease-associated alleles, strong associations between alleles, limited genotyping capability, and inadequate statistical approaches and sample sizes, which, and how many, loci within the MHC determine susceptibility remains unclear. Here, in several large type 1 diabetes data sets, we analyse a combined total of 1,729 polymorphisms, and apply statistical methods (recursive partitioning and regression) to pinpoint disease susceptibility to the MHC class I genes HLA-B and HLA-A (risk ratios >1.5; P(combined) = 2.01 x 10(-19) and 2.35 x 10(-13), respectively) in addition to the established associations of the MHC class II genes. Other loci with smaller and/or rarer effects might also be involved, but to find these, future searches must take into account both the HLA class II and class I genes and use even larger samples. Taken together with previous studies, we conclude that MHC-class-I-mediated events, principally involving HLA-B*39, contribute to the aetiology of type 1 diabetes.

    Funded by: Medical Research Council: G0000934, G0600681; Wellcome Trust: 076113

    Nature 2007;450;7171;887-92

  • Modeling insertional mutagenesis using gene length and expression in murine embryonic stem cells.

    Nord AS, Vranizan K, Tingley W, Zambon AC, Hanspers K, Fong LG, Hu Y, Bacchetti P, Ferrin TE, Babbitt PC, Doniger SW, Skarnes WC, Young SG and Conklin BR

    Department of Medicine, MacDonald Medical Research Laboratories, University of California at Los Angeles, California, USA.

    Background: High-throughput mutagenesis of the mammalian genome is a powerful means to facilitate analysis of gene function. Gene trapping in embryonic stem cells (ESCs) is the most widely used form of insertional mutagenesis in mammals. However, the rules governing its efficiency are not fully understood, and the effects of vector design on the likelihood of gene-trapping events have not been tested on a genome-wide scale.

    Methodology/principal findings: In this study, we used public gene-trap data to model gene-trap likelihood. Using the association of gene length and gene expression with gene-trap likelihood, we constructed spline-based regression models that characterize which genes are susceptible and which genes are resistant to gene-trapping techniques. We report results for three classes of gene-trap vectors, showing that both length and expression are significant determinants of trap likelihood for all vectors. Using our models, we also quantitatively identified hotspots of gene-trap activity, which represent loci where the high likelihood of vector insertion is controlled by factors other than length and expression. These formalized statistical models describe a high proportion of the variance in the likelihood of a gene being trapped by expression-dependent vectors and a lower, but still significant, proportion of the variance for vectors that are predicted to be independent of endogenous gene expression.

    Conclusions/significance: The findings of significant expression and length effects reported here further the understanding of the determinants of vector insertion. Results from this analysis can be applied to help identify other important determinants of this important biological phenomenon and could assist planning of large-scale mutagenesis efforts.

    Funded by: NHGRI NIH HHS: HG002766; NHLBI NIH HHS: HL66621, U01 HL066621

    PloS one 2007;2;7;e617

  • Sequence variants in the autophagy gene IRGM and multiple other replicating loci contribute to Crohn's disease susceptibility.

    Parkes M, Barrett JC, Prescott NJ, Tremelling M, Anderson CA, Fisher SA, Roberts RG, Nimmo ER, Cummings FR, Soars D, Drummond H, Lees CW, Khawaja SA, Bagnall R, Burke DA, Todhunter CE, Ahmad T, Onnie CM, McArdle W, Strachan D, Bethel G, Bryan C, Lewis CM, Deloukas P, Forbes A, Sanderson J, Jewell DP, Satsangi J, Mansfield JC, Wellcome Trust Case Control Consortium, Cardon L and Mathew CG

    Inflammatory Bowel Disease Research Group, Addenbrooke's Hospital, University of Cambridge, Cambridge CB2 2QQ, UK.

    A genome-wide association scan in individuals with Crohn's disease by the Wellcome Trust Case Control Consortium detected strong association at four novel loci. We tested 37 SNPs from these and other loci for association in an independent case-control sample. We obtained replication for the autophagy-inducing IRGM gene on chromosome 5q33.1 (replication P = 6.6 x 10(-4), combined P = 2.1 x 10(-10)) and for nine other loci, including NKX2-3, PTPN2 and gene deserts on chromosomes 1q and 5p13.

    Funded by: Medical Research Council: G0000934; Wellcome Trust: 068545/Z/02, 072029

    Nature genetics 2007;39;7;830-2

  • Interaction analysis of the CBLB and CTLA4 genes in type 1 diabetes.

    Payne F, Cooper JD, Walker NM, Lam AC, Smink LJ, Nutland S, Stevens HE, Hutchings J and Todd JA

    Juvenile Diabetes Research Foundation/Wellcome Trust, Diabetes and Inflammation Laboratory, Cambridge Institute for Medical Research, University of Cambridge, Wellcome Trust/MRC Building, Addenbrooke's Hospital, Cambridge, UK.

    Gene-gene interaction analyses have been suggested as a potential strategy to help identify common disease susceptibility genes. Recently, evidence of a statistical interaction between polymorphisms in two negative immunoregulatory genes, CBLB and CTLA4, has been reported in type 1 diabetes (T1D). This study, in 480 Danish families, reported an association between T1D and a synonymous coding SNP in exon 12 of the CBLB gene (rs3772534 G>A; minor allele frequency, MAF=0.24; derived relative risk, RR for G allele=1.78; P=0.046). Furthermore, evidence of a statistical interaction with the known T1D susceptibility-associated CTLA4 polymorphism rs3087243 (laboratory name CT60, G>A) was reported (P<0.0001), such that the CBLB SNP rs3772534 G allele was overtransmitted to offspring with the CTLA4 rs3087243 G/G genotype. We have, therefore, attempted to obtain additional support for this finding in both large family and case-control collections. In a primary analysis, no evidence for an association of the CBLB SNP rs3772534 with disease was found in either sample set (2162 parent-child trios, P=0.33; 3453 cases and 3655 controls, P=0.69). In the case-only statistical interaction analysis between rs3772534 and rs3087243, there was also no support for an effect (1994 T1D affected offspring, and 3215 cases, P=0.92). Owing to the low prior probability of identifying reproducible evidence of gene-gene interactions in the analysis of common disease-associated variants in human populations, these data highlight the need for large, well-characterized populations in which initial observations can be tested for additional support.

    Funded by: Medical Research Council: G0000934; Wellcome Trust

    Journal of leukocyte biology 2007;81;3;581-3

  • Comparative genomic analysis of three Leishmania species that cause diverse human disease.

    Peacock CS, Seeger K, Harris D, Murphy L, Ruiz JC, Quail MA, Peters N, Adlem E, Tivey A, Aslett M, Kerhornou A, Ivens A, Fraser A, Rajandream MA, Carver T, Norbertczak H, Chillingworth T, Hance Z, Jagels K, Moule S, Ormond D, Rutter S, Squares R, Whitehead S, Rabbinowitsch E, Arrowsmith C, White B, Thurston S, Bringaud F, Baldauf SL, Faulconbridge A, Jeffares D, Depledge DP, Oyola SO, Hilley JD, Brito LO, Tosi LR, Barrell B, Cruz AK, Mottram JC, Smith DF and Berriman M

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.

    Leishmania parasites cause a broad spectrum of clinical disease. Here we report the sequencing of the genomes of two species of Leishmania: Leishmania infantum and Leishmania braziliensis. The comparison of these sequences with the published genome of Leishmania major reveals marked conservation of synteny and identifies only approximately 200 genes with a differential distribution between the three species. L. braziliensis, unlike the Leishmania species examined so far, possesses components of a putative RNA-mediated interference pathway, telomere-associated transposable elements and spliced leader-associated SLACS retrotransposons. We show that pseudogene formation and gene loss are the principal forces shaping the different genomes. Genes that are differentially distributed between the species encode proteins implicated in host-pathogen interactions and parasite survival in the macrophage.

    Funded by: Medical Research Council: G0000508; Wellcome Trust: 076355, 085775

    Nature genetics 2007;39;7;839-47

  • Environmental and genetic modifiers of squint penetrance during zebrafish embryogenesis.

    Pei W, Williams PH, Clark MD, Stemple DL and Feldman B

    Medical Genetics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA.

    The Nodal-related subgroup of the TGFbeta superfamily of secreted cytokines regulates the specification of the mesodermal and endodermal germ layers during gastrulation. Two Nodal-related proteins, Squint (Sqt) and Cyclops (Cyc), are expressed during germ-layer specification in zebrafish. Genetic sqt mutant phenotypes have defined a variable requirement for zygotic Sqt, but not for maternal Sqt, in midline mesendoderm development. However, a comparison of phenotypes arising from oocytes or zygotes injected with Sqt antisense morpholinos has suggested a novel requirement for maternal Sqt in dorsal specification. In this study we examined maternal-zygotic mutants for each of two sqt alleles and we also compared phenotypes of closely related zygotic and maternal-zygotic sqt mutants. Each of these approaches indicated there is no general requirement for maternal Sqt. To better understand the dispensability of maternal and zygotic Sqt, we sought out developmental contexts that more rigorously demand intact Sqt signalling. We found that sqt penetrance is influenced by genetic modifiers, by environmental temperature, by levels of residual Activin-like activity and by Heat-Shock Protein 90 (HSP90) activity. Therefore, Sqt may confer an evolutionary advantage by protecting early-stage embryos against detrimental interacting alleles and environmental challenges.

    Funded by: Intramural NIH HHS: Z01 HG200309-05, Z99 HG999999; Wellcome Trust

    Developmental biology 2007;308;2;368-78

  • Diet and the evolution of human amylase gene copy number variation.

    Perry GH, Dominy NJ, Claw KG, Lee AS, Fiegler H, Redon R, Werner J, Villanea FA, Mountain JL, Misra R, Carter NP, Lee C and Stone AC

    School of Human Evolution and Social Change, Arizona State University, Tempe, Arizona 85287, USA.

    Starch consumption is a prominent characteristic of agricultural societies and hunter-gatherers in arid environments. In contrast, rainforest and circum-arctic hunter-gatherers and some pastoralists consume much less starch. This behavioral variation raises the possibility that different selective pressures have acted on amylase, the enzyme responsible for starch hydrolysis. We found that copy number of the salivary amylase gene (AMY1) is correlated positively with salivary amylase protein level and that individuals from populations with high-starch diets have, on average, more AMY1 copies than those with traditionally low-starch diets. Comparisons with other loci in a subset of these populations suggest that the extent of AMY1 copy number differentiation is highly unusual. This example of positive selection on a copy number-variable gene is, to our knowledge, one of the first discovered in the human genome. Higher AMY1 copy numbers and protein levels probably improve the digestion of starchy foods and may buffer against the fitness-reducing effects of intestinal disease.

    Funded by: NCRR NIH HHS: C06 RR014491, C06 RR014491-01, C06 RR016483, C06 RR016483-01, RR014491, RR015087, RR016483, U42 RR015087, U42 RR015087-01; Wellcome Trust

    Nature genetics 2007;39;10;1256-60

  • F0 generation mice fully derived from gene-targeted embryonic stem cells allowing immediate phenotypic analyses.

    Poueymirou WT, Auerbach W, Frendewey D, Hickey JF, Escaravage JM, Esau L, Doré AT, Stevens S, Adams NC, Dominguez MG, Gale NW, Yancopoulos GD, DeChiara TM and Valenzuela DM

    Regeneron Pharmaceuticals, Inc., Tarrytown, New York 10591, USA.

    A useful approach for exploring gene function involves generating mutant mice from genetically modified embryonic stem (ES) cells. Recent advances in genetic engineering of ES cells have shifted the bottleneck in this process to the generation of mice. Conventional injections of ES cells into blastocyst hosts produce F0 generation chimeras that are only partially derived from ES cells, requiring additional breeding to obtain mutant mice that can be phenotyped. The tetraploid complementation approach directly yields mice that are almost entirely derived from ES cells, but it is inefficient, works only with certain hybrid ES cell lines and suffers from nonspecific lethality and abnormalities, complicating phenotypic analyses. Here we show that laser-assisted injection of either inbred or hybrid ES cells into eight-cell-stage embryos efficiently yields F0 generation mice that are fully ES cell-derived and healthy, exhibit 100% germline transmission and allow immediate phenotypic analysis, greatly accelerating gene function assignment.

    Nature biotechnology 2007;25;1;91-9

  • Integrating sequence and structural biology with DAS.

    Prlić A, Down TA, Kulesha E, Finn RD, Kähäri A and Hubbard TJ

    The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Background: The Distributed Annotation System (DAS) is a network protocol for exchanging biological data. It is frequently used to share annotations of genomes and protein sequence.

    Results: Here we present several extensions to the current DAS 1.5 protocol. These provide new commands to share alignments and three-dimensional molecular structure data, add the possibility of registering and discovering DAS servers, and provide a convention for serving different types of data plots. We present examples of web sites and applications that use the new extensions. We operate a public registry of DAS sources, which now includes entries for more than 250 distinct sources.

    Conclusion: Our DAS extensions are essential for the management of the growing number of services and exchange of diverse biological data sets. In addition, the extensions allow new types of applications to be developed and scientific questions to be addressed. The registry of DAS sources is publicly available online.

    Funded by: Wellcome Trust: 062023, 077198

    BMC bioinformatics 2007;8;333

  • Mosaic complementation demonstrates a regulatory role for myosin VIIa in actin dynamics of stereocilia.

    Prosser HM, Rzadzinska AK, Steel KP and Bradley A

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.

    We have developed a bacterial artificial chromosome transgenesis approach that allowed the expression of myosin VIIa from the mouse X chromosome. We demonstrated complementation of the Myo7a null mutant phenotype, producing a fine mosaic of two types of sensory hair cells within inner ear epithelia of hemizygous transgenic females due to X inactivation. Direct comparisons between neighboring auditory hair cells that differed only with respect to myosin VIIa expression revealed that mutant stereocilia are significantly longer than those of their complemented counterparts. Myosin VIIa-deficient hair cells showed abnormally persistent localization of whirlin, a protein directly linked to elongation of stereocilia, at stereocilia tips. Furthermore, myosin VIIa localized at the tips of all abnormally short stereocilia of mice deficient for either myosin XVa or whirlin. Our results strongly suggest that myosin VIIa regulates the establishment of a setpoint for stereocilium heights, and this novel role may influence their normal staircase-like arrangement within a bundle.

    Funded by: Medical Research Council: G0300212, MC_QA137918; Wellcome Trust

    Molecular and cellular biology 2007;28;5;1702-12

  • Wnt5a functions in planar cell polarity regulation in mice.

    Qian D, Jones C, Rzadzinska A, Mark S, Zhang X, Steel KP, Dai X and Chen P

    Department of Cell Biology, Emory University School of Medicine, Atlanta, GA 30322, USA.

    Planar cell polarity (PCP) refers to the polarization of cells within the plane of a cell sheet. A distinctive epithelial PCP in vertebrates is the uniform orientation of stereociliary bundles of the sensory hair cells in the mammalian cochlea. In addition to establishing epithelial PCP, planar polarization is also required for convergent extension (CE), a polarized cellular movement that occurs during neural tube closure and cochlear extension. Studies in Drosophila and vertebrates have revealed a conserved PCP pathway, including Frizzled (Fz) receptors. Here we use the cochlea as a model system to explore the involvement of known ligands of Fz, Wnt morphogens, in PCP regulation. We show that Wnt5a forms a reciprocal expression pattern with a Wnt antagonist, the secreted frizzled-related protein 3 (Sfrp3 or Frzb), along the axis of planar polarization in the cochlear epithelium. We further demonstrate that Wnt5a antagonizes Frzb in regulating cochlear extension and stereociliary bundle orientation in vitro, and that Wnt5a(-/-) animals have a shortened and widened cochlea. Finally, we show that Wnt5a is required for proper subcellular distribution of a PCP protein, Ltap/Vangl2, and that Wnt5a interacts genetically with Ltap/Vangl2 for uniform orientation of stereocilia, cochlear extension, and neural tube closure. Together, these findings demonstrate that Wnt5a functions in PCP regulation in mice.

    Funded by: Medical Research Council: G0300212, MC_QA137918; NIDCD NIH HHS: R01 DC005213, R01 DC005213-06, R01 DC007423, R01 DC007423-01A2; Wellcome Trust

    Developmental biology 2007;306;1;121-33

  • Downregulation of death-associated protein kinase 1 (DAPK1) in chronic lymphocytic leukemia.

    Raval A, Tanner SM, Byrd JC, Angerman EB, Perko JD, Chen SS, Hackanson B, Grever MR, Lucas DM, Matkovic JJ, Lin TS, Kipps TJ, Murray F, Weisenburger D, Sanger W, Lynch J, Watson P, Jansen M, Yoshinaga Y, Rosenquist R, de Jong PJ, Coggill P, Beck S, Lynch H, de la Chapelle A and Plass C

    Department of Molecular Virology, Immunology, and Medical Genetics, Human Cancer Genetics Program, The Comprehensive Cancer Center at The Ohio State University, Columbus, OH 43214, USA.

    The heritability of B cell chronic lymphocytic leukemia (CLL) is relatively high; however, no predisposing mutation has been convincingly identified. We show that loss or reduced expression of death-associated protein kinase 1 (DAPK1) underlies cases of heritable predisposition to CLL and the majority of sporadic CLL. Epigenetic silencing of DAPK1 by promoter methylation occurs in almost all sporadic CLL cases. Furthermore, we defined a disease haplotype, which segregates with the CLL phenotype in a large family. DAPK1 expression of the CLL allele is downregulated by 75% in germline cells due to increased HOXB7 binding. In the blood cells from affected family members, promoter methylation results in additional loss of DAPK1 expression. Thus, reduced expression of DAPK1 can result from germline predisposition, as well as epigenetic or somatic events causing or contributing to the CLL phenotype.

    Funded by: NCI NIH HHS: 5U01 CA86389, CA101956, CA110496, CA81534, P30 CA16058, T32 CA106196; Wellcome Trust

    Cell 2007;129;5;879-90

  • MEROPS: the peptidase database.

    Rawlings ND, Morton FR, Kok CY, Kong J and Barrett AJ

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.

    Peptidases (proteolytic enzymes or proteases), their substrates and inhibitors are of great relevance to biology, medicine and biotechnology. The MEROPS database aims to fulfil the need for an integrated source of information about these. The organizational principle of the database is a hierarchical classification in which homologous sets of peptidases and protein inhibitors are grouped into protein species, which are grouped into families and in turn grouped into clans. Important additions to the database include newly written, concise text annotations for peptidase clans and the small molecule inhibitors that are outside the scope of the standard classification; displays to show peptidase specificity compiled from our collection of known substrate cleavages; tables of peptidase-inhibitor interactions; and dynamically generated alignments of representatives of each protein species at the family level. New ways to compare peptidase and inhibitor complements between any two organisms whose genomes have been completely sequenced, or between different strains or subspecies of the same organism, have been devised.

    Funded by: Wellcome Trust

    Nucleic acids research 2007;36;Database issue;D320-5

  • Mutations in ZDHHC9, which encodes a palmitoyltransferase of NRAS and HRAS, cause X-linked mental retardation associated with a Marfanoid habitus.

    Raymond FL, Tarpey PS, Edkins S, Tofts C, O'Meara S, Teague J, Butler A, Stevens C, Barthorpe S, Buck G, Cole J, Dicks E, Gray K, Halliday K, Hills K, Hinton J, Jones D, Menzies A, Perry J, Raine K, Shepherd R, Small A, Varian J, Widaa S, Mallya U, Moon J, Luo Y, Shaw M, Boyle J, Kerr B, Turner G, Quarrell O, Cole T, Easton DF, Wooster R, Bobrow M, Schwartz CE, Gecz J, Stratton MR and Futreal PA

    Cambridge Institute of Medical Research, University of Cambridge, Cambridge, CB2 2XY, UK.

    We have identified one frameshift mutation, one splice-site mutation, and two missense mutations in highly conserved residues in ZDHHC9 at Xq26.1 in 4 of 250 families with X-linked mental retardation (XLMR). In three of the families, the mental retardation phenotype is associated with a Marfanoid habitus, although none of the affected individuals meets the Ghent criteria for Marfan syndrome. ZDHHC9 is a palmitoyltransferase that catalyzes the posttranslational modification of NRAS and HRAS. The degree of palmitoylation determines the temporal and spatial location of these proteins in the plasma membrane and Golgi complex. The finding of mutations in ZDHHC9 suggests that alterations in the concentrations and cellular distribution of target proteins are sufficient to cause disease. This is the first XLMR gene to be reported that encodes a posttranslational modification enzyme, palmitoyltransferase. Furthermore, now that the first palmitoyltransferase that causes mental retardation has been identified, defects in other palmitoylation transferases become good candidates for causing other mental retardation syndromes.

    Funded by: NICHD NIH HHS: HD26202, R01 HD026202; Wellcome Trust

    American journal of human genetics 2007;80;5;982-7

  • Evolutionary and biomedical insights from the rhesus macaque genome.

    Rhesus Macaque Genome Sequencing and Analysis Consortium, Gibbs RA, Rogers J, Katze MG, Bumgarner R, Weinstock GM, Mardis ER, Remington KA, Strausberg RL, Venter JC, Wilson RK, Batzer MA, Bustamante CD, Eichler EE, Hahn MW, Hardison RC, Makova KD, Miller W, Milosavljevic A, Palermo RE, Siepel A, Sikela JM, Attaway T, Bell S, Bernard KE, Buhay CJ, Chandrabose MN, Dao M, Davis C, Delehaunty KD, Ding Y, Dinh HH, Dugan-Rocha S, Fulton LA, Gabisi RA, Garner TT, Godfrey J, Hawes AC, Hernandez J, Hines S, Holder M, Hume J, Jhangiani SN, Joshi V, Khan ZM, Kirkness EF, Cree A, Fowler RG, Lee S, Lewis LR, Li Z, Liu YS, Moore SM, Muzny D, Nazareth LV, Ngo DN, Okwuonu GO, Pai G, Parker D, Paul HA, Pfannkoch C, Pohl CS, Rogers YH, Ruiz SJ, Sabo A, Santibanez J, Schneider BW, Smith SM, Sodergren E, Svatek AF, Utterback TR, Vattathil S, Warren W, White CS, Chinwalla AT, Feng Y, Halpern AL, Hillier LW, Huang X, Minx P, Nelson JO, Pepin KH, Qin X, Sutton GG, Venter E, Walenz BP, Wallis JW, Worley KC, Yang SP, Jones SM, Marra MA, Rocchi M, Schein JE, Baertsch R, Clarke L, Csürös M, Glasscock J, Harris RA, Havlak P, Jackson AR, Jiang H, Liu Y, Messina DN, Shen Y, Song HX, Wylie T, Zhang L, Birney E, Han K, Konkel MK, Lee J, Smit AF, Ullmer B, Wang H, Xing J, Burhans R, Cheng Z, Karro JE, Ma J, Raney B, She X, Cox MJ, Demuth JP, Dumas LJ, Han SG, Hopkins J, Karimpour-Fard A, Kim YH, Pollack JR, Vinar T, Addo-Quaye C, Degenhardt J, Denby A, Hubisz MJ, Indap A, Kosiol C, Lahn BT, Lawson HA, Marklein A, Nielsen R, Vallender EJ, Clark AG, Ferguson B, Hernandez RD, Hirani K, Kehrer-Sawatzki H, Kolb J, Patil S, Pu LL, Ren Y, Smith DG, Wheeler DA, Schenck I, Ball EV, Chen R, Cooper DN, Giardine B, Hsu F, Kent WJ, Lesk A, Nelson DL, O'brien WE, Prüfer K, Stenson PD, Wallace JC, Ke H, Liu XM, Wang P, Xiang AP, Yang F, Barber GP, Haussler D, Karolchik D, Kern AD, Kuhn RM, Smith KE and Zwieg AS

    Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA.

    The rhesus macaque (Macaca mulatta) is an abundant primate species that diverged from the ancestors of Homo sapiens about 25 million years ago. Because they are genetically and physiologically similar to humans, rhesus monkeys are the most widely used nonhuman primate in basic and applied biomedical research. We determined the genome sequence of an Indian-origin Macaca mulatta female and compared the data with chimpanzees and humans to reveal the structure of ancestral primate genomes and to identify evidence for positive selection and lineage-specific expansions and contractions of gene families. A comparison of sequences from individual animals was used to investigate their underlying genetic diversity. The complete description of the macaque genome blueprint enhances the utility of this animal model for biomedical research and improves our understanding of the basic biology of the species.

    Funded by: NHGRI NIH HHS: R01 HG002939, U54 HG003068, U54 HG003079, U54 HG003273; Wellcome Trust: 062023

    Science (New York, N.Y.) 2007;316;5822;222-34

  • Requirement of bic/microRNA-155 for normal immune function.

    Rodriguez A, Vigorito E, Clare S, Warren MV, Couttet P, Soond DR, van Dongen S, Grocock RJ, Das PP, Miska EA, Vetrie D, Okkenhaug K, Enright AJ, Dougan G, Turner M and Bradley A

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    MicroRNAs are a class of small RNAs that are increasingly being recognized as important regulators of gene expression. Although hundreds of microRNAs are present in the mammalian genome, genetic studies addressing their physiological roles are at an early stage. We have shown that mice deficient for bic/microRNA-155 are immunodeficient and display increased lung airway remodeling. We demonstrate a requirement of bic/microRNA-155 for the function of B and T lymphocytes and dendritic cells. Transcriptome analysis of bic/microRNA-155-deficient CD4+ T cells identified a wide spectrum of microRNA-155-regulated genes, including cytokines, chemokines, and transcription factors. Our work suggests that bic/microRNA-155 plays a key role in the homeostasis and function of the immune system.

    Funded by: Medical Research Council: G117/424; Wellcome Trust: 077187

    Science (New York, N.Y.) 2007;316;5824;608-11

  • Genome-wide detection and characterization of positive selection in human populations.

    Sabeti PC, Varilly P, Fry B, Lohmueller J, Hostetter E, Cotsapas C, Xie X, Byrne EH, McCarroll SA, Gaudet R, Schaffner SF, Lander ES, International HapMap Consortium, Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, Belmont JW, Boudreau A, Hardenbol P, Leal SM, Pasternak S, Wheeler DA, Willis TD, Yu F, Yang H, Zeng C, Gao Y, Hu H, Hu W, Li C, Lin W, Liu S, Pan H, Tang X, Wang J, Wang W, Yu J, Zhang B, Zhang Q, Zhao H, Zhao H, Zhou J, Gabriel SB, Barry R, Blumenstiel B, Camargo A, Defelice M, Faggart M, Goyette M, Gupta S, Moore J, Nguyen H, Onofrio RC, Parkin M, Roy J, Stahl E, Winchester E, Ziaugra L, Altshuler D, Shen Y, Yao Z, Huang W, Chu X, He Y, Jin L, Liu Y, Shen Y, Sun W, Wang H, Wang Y, Wang Y, Xiong X, Xu L, Waye MM, Tsui SK, Xue H, Wong JT, Galver LM, Fan JB, Gunderson K, Murray SS, Oliphant AR, Chee MS, Montpetit A, Chagnon F, Ferretti V, Leboeuf M, Olivier JF, Phillips MS, Roumy S, Sallée C, Verner A, Hudson TJ, Kwok PY, Cai D, Koboldt DC, Miller RD, Pawlikowska L, Taillon-Miller P, Xiao M, Tsui LC, Mak W, Song YQ, Tam PK, Nakamura Y, Kawaguchi T, Kitamoto T, Morizono T, Nagashima A, Ohnishi Y, Sekine A, Tanaka T, Tsunoda T, Deloukas P, Bird CP, Delgado M, Dermitzakis ET, Gwilliam R, Hunt S, Morrison J, Powell D, Stranger BE, Whittaker P, Bentley DR, Daly MJ, de Bakker PI, Barrett J, Chretien YR, Maller J, McCarroll S, Patterson N, Pe'er I, Price A, Purcell S, Richter DJ, Sabeti P, Saxena R, Schaffner SF, Sham PC, Varilly P, Altshuler D, Stein LD, Krishnan L, Smith AV, Tello-Ruiz MK, Thorisson GA, Chakravarti A, Chen PE, Cutler DJ, Kashuk CS, Lin S, Abecasis GR, Guan W, Li Y, Munro HM, Qin ZS, Thomas DJ, McVean G, Auton A, Bottolo L, Cardin N, Eyheramendy S, Freeman C, Marchini J, Myers S, Spencer C, Stephens M, Donnelly P, Cardon LR, Clarke G, Evans DM, Morris AP, Weir BS, Tsunoda T, Johnson TA, Mullikin JC, Sherry ST, Feolo M, Skol A, Zhang H, Zeng C, Zhao H, Matsuda I, Fukushima Y, Macer DR, Suda E, Rotimi CN, Adebamowo CA, Ajayi I, Aniagwu T, Marshall PA, Nkwodimmah C, Royal CD, Leppert MF, Dixon M, Peiffer A, Qiu R, Kent A, Kato K, Niikawa N, Adewole IF, Knoppers BM, Foster MW, Clayton EW, Watkin J, Gibbs RA, Belmont JW, Muzny D, Nazareth L, Sodergren E, Weinstock GM, Wheeler DA, Yakub I, Gabriel SB, Onofrio RC, Richter DJ, Ziaugra L, Birren BW, Daly MJ, Altshuler D, Wilson RK, Fulton LL, Rogers J, Burton J, Carter NP, Clee CM, Griffiths M, Jones MC, McLay K, Plumb RW, Ross MT, Sims SK, Willey DL, Chen Z, Han H, Kang L, Godbout M, Wallenburg JC, L'Archevêque P, Bellemare G, Saeki K, Wang H, An D, Fu H, Li Q, Wang Z, Wang R, Holden AL, Brooks LD, McEwen JE, Guyer MS, Wang VO, Peterson JL, Shi M, Spiegel J, Sung LM, Zacharia LF, Collins FS, Kennedy K, Jamieson R and Stewart J

    Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02139, USA.

    With the advent of dense maps of human genetic variation, it is now possible to detect positive natural selection across the human genome. Here we report an analysis of over 3 million polymorphisms from the International HapMap Project Phase 2 (HapMap2). We used 'long-range haplotype' methods, which were developed to identify alleles segregating in a population that have undergone recent selection, and we also developed new methods that are based on cross-population comparisons to discover alleles that have swept to near-fixation within a population. The analysis reveals more than 300 strong candidate regions. Focusing on the strongest 22 regions, we develop a heuristic for scrutinizing these regions to identify candidate targets of selection. In a complementary analysis, we identify 26 non-synonymous, coding, single nucleotide polymorphisms showing regional evidence of positive selection. Examination of these candidates highlights three cases in which two genes in a common biological process have apparently undergone positive selection in the same population: LARGE and DMD, both related to infection by the Lassa virus, in West Africa; SLC24A5 and SLC45A2, both involved in skin pigmentation, in Europe; and EDAR and EDA2R, both involved in development of hair follicles, in Asia.

    Funded by: Wellcome Trust: 077008, 077011, 077046, 081682

    Nature 2007;449;7164;913-8

  • Genomewide association analysis of coronary artery disease.

    Samani NJ, Erdmann J, Hall AS, Hengstenberg C, Mangino M, Mayer B, Dixon RJ, Meitinger T, Braund P, Wichmann HE, Barrett JH, König IR, Stevens SE, Szymczak S, Tregouet DA, Iles MM, Pahlke F, Pollard H, Lieb W, Cambien F, Fischer M, Ouwehand W, Blankenberg S, Balmforth AJ, Baessler A, Ball SG, Strom TM, Braenne I, Gieger C, Deloukas P, Tobin MD, Ziegler A, Thompson JR, Schunkert H and WTCCC and the Cardiogenics Consortium

    University of Leicester, Leicester, United Kingdom.

    Background: Modern genotyping platforms permit a systematic search for inherited components of complex diseases. We performed a joint analysis of two genomewide association studies of coronary artery disease.

    Methods: We first identified chromosomal loci that were strongly associated with coronary artery disease in the Wellcome Trust Case Control Consortium (WTCCC) study (which involved 1926 case subjects with coronary artery disease and 2938 controls) and looked for replication in the German MI [Myocardial Infarction] Family Study (which involved 875 case subjects with myocardial infarction and 1644 controls). Data on other single-nucleotide polymorphisms (SNPs) that were significantly associated with coronary artery disease in either study (P<0.001) were then combined to identify additional loci with a high probability of true association. Genotyping in both studies was performed with the use of the GeneChip Human Mapping 500K Array Set (Affymetrix).

    Results: Of thousands of chromosomal loci studied, the same locus had the strongest association with coronary artery disease in both the WTCCC and the German studies: chromosome 9p21.3 (SNP, rs1333049) (P=1.80x10(-14) and P=3.40x10(-6), respectively). Overall, the WTCCC study revealed nine loci that were strongly associated with coronary artery disease (P<1.2x10(-5) and less than a 50% chance of being falsely positive). In addition to chromosome 9p21.3, two of these loci were successfully replicated (adjusted P<0.05) in the German study: chromosome 6q25.1 (rs6922269) and chromosome 2q36.3 (rs2943634). The combined analysis of the two studies identified four additional loci significantly associated with coronary artery disease (P<1.3x10(-6)) and a high probability (>80%) of a true association: chromosomes 1p13.3 (rs599839), 1q41 (rs17465637), 10q11.21 (rs501120), and 15q22.33 (rs17228212).

    Conclusions: We identified several genetic loci that, individually and in aggregate, substantially affect the risk of development of coronary artery disease.

    Funded by: Medical Research Council: G0501942, G9806740; Wellcome Trust: 076113, 077011

    The New England journal of medicine 2007;357;5;443-53

  • Common variants in WFS1 confer risk of type 2 diabetes.

    Sandhu MS, Weedon MN, Fawcett KA, Wasson J, Debenham SL, Daly A, Lango H, Frayling TM, Neumann RJ, Sherva R, Blech I, Pharoah PD, Palmer CN, Kimber C, Tavendale R, Morris AD, McCarthy MI, Walker M, Hitman G, Glaser B, Permutt MA, Hattersley AT, Wareham NJ and Barroso I

    UK Medical Research Council (MRC) Epidemiology Unit, Strangeways Research Laboratory, Cambridge CB1 8RN, UK.

    We studied genes involved in pancreatic beta cell function and survival, identifying associations between SNPs in WFS1 and diabetes risk in UK populations that we replicated in an Ashkenazi population and in additional UK studies. In a pooled analysis comprising 9,533 cases and 11,389 controls, SNPs in WFS1 were strongly associated with diabetes risk. Rare mutations in WFS1 cause Wolfram syndrome; using a gene-centric approach, we show that variation in WFS1 also predisposes to common type 2 diabetes.

    Funded by: Medical Research Council: G0500070, MC_U106179471; Wellcome Trust: 068545/z/02, 077016

    Nature genetics 2007;39;8;951-3

  • Challenges and standards in integrating surveys of structural variation.

    Scherer SW, Lee C, Birney E, Altshuler DM, Eichler EE, Carter NP, Hurles ME and Feuk L

    The Centre for Applied Genomics and Program in Genetics and Genomic Biology, The Hospital for Sick Children, 101 College Street, Room 14-701, Toronto, Ontario M5G 1L7, Canada.

    There has been an explosion of data describing newly recognized structural variants in the human genome. In the flurry of reporting, there has been no standard approach to collecting the data, assessing its quality or describing identified features. This risks becoming a rampant problem, in particular with respect to surveys of copy number variation and their application to disease studies. Here, we consider the challenges in characterizing and documenting genomic structural variants. From this, we derive recommendations for standards to be adopted, with the aim of ensuring the accurate presentation of this form of genetic variation to facilitate ongoing research.

    Funded by: Wellcome Trust: 077008, 077014

    Nature genetics 2007;39;7 Suppl;S7-15

  • Genome sequence of a proteolytic (Group I) Clostridium botulinum strain Hall A and comparative analysis of the clostridial genomes.

    Sebaihia M, Peck MW, Minton NP, Thomson NR, Holden MT, Mitchell WJ, Carter AT, Bentley SD, Mason DR, Crossman L, Paul CJ, Ivens A, Wells-Bennik MH, Davis IJ, Cerdeño-Tárraga AM, Churcher C, Quail MA, Chillingworth T, Feltwell T, Fraser A, Goodhead I, Hance Z, Jagels K, Larke N, Maddison M, Moule S, Mungall K, Norbertczak H, Rabbinowitsch E, Sanders M, Simmonds M, White B, Whithead S and Parkhill J

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, United Kingdom.

    Clostridium botulinum is a heterogeneous Gram-positive species that comprises four genetically and physiologically distinct groups of bacteria that share the ability to produce botulinum neurotoxin, the most poisonous toxin known to man, and the causative agent of botulism, a severe disease of humans and animals. We report here the complete genome sequence of a representative of Group I (proteolytic) C. botulinum (strain Hall A, ATCC 3502). The genome consists of a chromosome (3,886,916 bp) and a plasmid (16,344 bp), which carry 3650 and 19 predicted genes, respectively. Consistent with the proteolytic phenotype of this strain, the genome harbors a large number of genes encoding secreted proteases and enzymes involved in uptake and metabolism of amino acids. The genome also reveals a hitherto unknown ability of C. botulinum to degrade chitin. There is a significant lack of recently acquired DNA, indicating a stable genomic content, in strong contrast to the fluid genome of Clostridium difficile, which can form longer-term relationships with its host. Overall, the genome indicates that C. botulinum is adapted to a saprophytic lifestyle both in soil and aquatic environments. This pathogen relies on its toxin to rapidly kill a wide range of prey species, and it releases a large number of extracellular enzymes that soften and destroy rotting or decayed tissues, giving it access to nutrient sources.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/D522797/1; Medical Research Council: G0700837; Wellcome Trust

    Genome research 2007;17;7;1082-92

  • A more convenient truth.

    Seth-Smith H

    Nature reviews. Microbiology 2007;5;4;248-50

  • Ocean's elevenses.

    Seth-Smith H

    Nature reviews. Microbiology 2007;5;1;9

  • Different evolutionary histories of the two classical class I genes BF1 and BF2 illustrate drift and selection within the stable MHC haplotypes of chickens.

    Shaw I, Powell TJ, Marston DA, Baker K, van Hateren A, Riegert P, Wiles MV, Milne S, Beck S and Kaufman J

    Institute for Animal Health, Compton, Berkshire, United Kingdom.

    Compared with the MHC of typical mammals, the chicken MHC (BF/BL region) of the B12 haplotype is smaller, simpler, and rearranged, with two classical class I genes of which only one is highly expressed. In this study, we describe the development of long-distance PCR to amplify some or all of each class I gene separately, allowing us to make the following points. First, six other haplotypes have the same genomic organization as B12, with a poorly expressed (minor) BF1 gene between DMB2 and TAP2 and a well-expressed (major) BF2 gene between TAP2 and C4. Second, the expression of the BF1 gene is crippled in three different ways in these haplotypes: enhancer A deletion (B12, B19), enhancer A divergence and transcription start site deletion (B2, B4, B21), and insertion/rearrangement leading to pseudogenes (B14, B15). Third, the three kinds of alterations in the BF1 gene correspond to dendrograms of the BF1 and poorly expressed class II B (BLB1) genes reflecting mostly neutral changes, while the dendrograms of the BF2 and well-expressed class II (BLB2) genes each have completely different topologies reflecting selection. The common pattern for the poorly expressed genes reflects the fact that the BF/BL region undergoes little recombination and allows us to propose a pattern of descent for these chicken MHC haplotypes from a common ancestor. Taken together, these data explain how stable MHC haplotypes predominantly express a single class I molecule, which in turn leads to striking associations of the chicken MHC with resistance to infectious pathogens and response to vaccines.

    Funded by: Wellcome Trust

    Journal of immunology (Baltimore, Md. : 1950) 2007;178;9;5744-52

  • Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures.

    Stark A, Lin MF, Kheradpour P, Pedersen JS, Parts L, Carlson JW, Crosby MA, Rasmussen MD, Roy S, Deoras AN, Ruby JG, Brennecke J, Harvard FlyBase curators, Berkeley Drosophila Genome Project, Hodges E, Hinrichs AS, Caspi A, Paten B, Park SW, Han MV, Maeder ML, Polansky BJ, Robson BE, Aerts S, van Helden J, Hassan B, Gilbert DG, Eastman DA, Rice M, Weir M, Hahn MW, Park Y, Dewey CN, Pachter L, Kent WJ, Haussler D, Lai EC, Bartel DP, Hannon GJ, Kaufman TC, Eisen MB, Clark AG, Smith D, Celniker SE, Gelbart WM and Kellis M

    The Broad Institute, Massachusetts Institute of Technology and Harvard University, Cambridge, Massachusetts 02140, USA.

    Sequencing of multiple related species followed by comparative genomics analysis constitutes a powerful approach for the systematic understanding of any genome. Here, we use the genomes of 12 Drosophila species for the de novo discovery of functional elements in the fly. Each type of functional element shows characteristic patterns of change, or 'evolutionary signatures', dictated by its precise selective constraints. Such signatures enable recognition of new protein-coding genes and exons, spurious and incorrect gene annotations, and numerous unusual gene structures, including abundant stop-codon readthrough. Similarly, we predict non-protein-coding RNA genes and structures, and new microRNA (miRNA) genes. We provide evidence of miRNA processing and functionality from both hairpin arms and both DNA strands. We identify several classes of pre- and post-transcriptional regulatory motifs, and predict individual motif instances with high confidence. We also study how discovery power scales with the divergence and number of species compared, and we provide general guidelines for comparative studies.

    Funded by: NHGRI NIH HHS: R01 HG002779-05, R01 HG002779-06, R01 HG004037, R01 HG004037-01A1; NIGMS NIH HHS: R01 GM067031, R01 GM067031-04, R01 GM083300

    Nature 2007;450;7167;219-32

  • Salmonella enterica serovar typhimurium exploits inflammation to compete with the intestinal microbiota.

    Stecher B, Robbiani R, Walker AW, Westendorf AM, Barthel M, Kremer M, Chaffron S, Macpherson AJ, Buer J, Parkhill J, Dougan G, von Mering C and Hardt WD

    Institute of Microbiology, Swiss Institute of Technology Zurich, Zurich, Switzerland.

    Most mucosal surfaces of the mammalian body are colonized by microbial communities ("microbiota"). A high density of commensal microbiota inhabits the intestine and shields the host from infection ("colonization resistance"). The virulence strategies allowing enteropathogenic bacteria to successfully compete with the microbiota and overcome colonization resistance are poorly understood. Here, we investigated manipulation of the intestinal microbiota by the enteropathogenic bacterium Salmonella enterica subspecies 1 serovar Typhimurium (S. Tm) in a mouse colitis model: we found that inflammatory host responses induced by S. Tm changed microbiota composition and suppressed its growth. In contrast to wild-type S. Tm, an avirulent invG sseD mutant failing to trigger colitis was outcompeted by the microbiota. This competitive defect was reverted if inflammation was provided concomitantly by mixed infection with wild-type S. Tm or in mice (IL10(-/-), VILLIN-HA(CL4-CD8)) with inflammatory bowel disease. Thus, inflammation is necessary and sufficient for overcoming colonization resistance. This reveals a new concept in infectious disease: in contrast to current thinking, inflammation is not always detrimental for the pathogen. Triggering the host's immune defence can shift the balance between the protective microbiota and the pathogen in favour of the pathogen.

    PLoS biology 2007;5;10;2177-89

  • Relative impact of nucleotide and copy number variation on gene expression phenotypes.

    Stranger BE, Forrest MS, Dunning M, Ingle CE, Beazley C, Thorne N, Redon R, Bird CP, de Grassi A, Lee C, Tyler-Smith C, Carter N, Scherer SW, Tavaré S, Deloukas P, Hurles ME and Dermitzakis ET

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    Extensive studies are currently being performed to associate disease susceptibility with one form of genetic variation, namely, single-nucleotide polymorphisms (SNPs). In recent years, another type of common genetic variation has been characterized, namely, structural variation, including copy number variants (CNVs). To determine the overall contribution of CNVs to complex phenotypes, we have performed association analyses of expression levels of 14,925 transcripts with SNPs and CNVs in individuals who are part of the International HapMap project. SNPs and CNVs captured 83.6% and 17.7% of the total detected genetic variation in gene expression, respectively, but the signals from the two types of variation had little overlap. Interrogation of the genome for both types of variants may be an effective way to elucidate the causes of complex phenotypes and disease in humans.

    Funded by: Wellcome Trust: 065535, 076113, 077009, 077014, 077046

    Science (New York, N.Y.) 2007;315;5813;848-53

  • Population genomics of human gene expression.

    Stranger BE, Nica AC, Forrest MS, Dimas A, Bird CP, Beazley C, Ingle CE, Dunning M, Flicek P, Koller D, Montgomery S, Tavaré S, Deloukas P and Dermitzakis ET

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Genetic variation influences gene expression, and this variation in gene expression can be efficiently mapped to specific genomic regions and variants. Here we have used gene expression profiling of Epstein-Barr virus-transformed lymphoblastoid cell lines of all 270 individuals genotyped in the HapMap Consortium to elucidate the detailed features of genetic variation underlying gene expression variation. We find that gene expression is heritable and that differentiation between populations is in agreement with earlier small-scale studies. A detailed association analysis of over 2.2 million common SNPs per population (5% frequency in HapMap) with gene expression identified at least 1,348 genes with association signals in cis and at least 180 in trans. Replication in at least one independent population was achieved for 37% of cis signals and 15% of trans signals, respectively. Our results strongly support an abundance of cis-regulatory variation in the human genome. Detection of trans effects is limited but suggests that regulatory variation may be the key primary effect contributing to phenotypic variation in humans. We also explore several methodologies that improve the current state of analysis of gene expression variation.

    Funded by: Wellcome Trust: 077011, 077046

    Nature genetics 2007;39;10;1217-24

  • Replication timing profile reflects the distinct functional and genomic features of the MHC class II region.

    Takousis P, Johonnett P, Williamson J, Sasieni P, Warnes G, Forshew T, Azuara V, Fisher A, Wu PJ, Jones T, Vatcheva R, Beck S and Sheer D

    Human Cytogenetics Laboratory, Cancer Research UK London Research Institute, London, UK.

    The timing of DNA replication generally correlates with transcription, gene density and sequence composition. How is the timing affected if a genomic region has a combination of features that individually correlate with either early or late replication? The major histocompatibility complex (MHC) class II region is an AT-rich isochore that would be expected to replicate late, but it also contains coordinately regulated genes that are highly expressed in antigen-presenting cells and are strongly inducible in other cell types. Using cytological and biochemical assays, we find that the entire MHC replicates within the first half of S-phase, and that the class II region replicates slightly later than the adjacent regions irrespective of gene expression. These data suggest that despite AT-richness, an early-to-middle replication time in the class II region is defined by an open chromatin conformation that allows rapid transcriptional activation as a defence against pathogens.

    Funded by: Cancer Research UK: A8318, C5321/A8318; Medical Research Council: MC_U120027516

    Cell cycle (Georgetown, Tex.) 2007;6;19;2393-8

  • Analysis of genetic variation in Akt2/PKB-beta in severe insulin resistance, lipodystrophy, type 2 diabetes, and related metabolic phenotypes.

    Tan K, Kimber WA, Luan J, Soos MA, Semple RK, Wareham NJ, O'Rahilly S and Barroso I

    Metabolic Disease Group, Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, U.K.

    We previously reported a family in which a heterozygous missense mutation in Akt2 led to a dominantly inherited syndrome of insulin-resistant diabetes and partial lipodystrophy. To determine whether genetic variation in AKT2 plays a broader role in human metabolic disease, we sequenced the entire coding region and splice junctions of AKT2 in 94 unrelated patients with severe insulin resistance, 35 of whom had partial lipodystrophy. Two rare missense mutations (R208K and R467W) were identified in single individuals. However, insulin-stimulated kinase activities of these variants were indistinguishable from wild type. In two large case-control studies (total number of participants 2,200), none of 11 common single nucleotide polymorphisms (SNPs) in AKT2 showed significant association with type 2 diabetes. In a quantitative trait study of 1,721 extensively phenotyped individuals from the U.K., no association was found with any relevant intermediate metabolic trait. In summary, although heterozygous loss-of-function mutations in AKT2 can cause a syndrome of severe insulin resistance and lipodystrophy in humans, such mutations are uncommon causes of these syndromes. Furthermore, genetic variation in and around the AKT2 locus is unlikely to contribute significantly to the risk of type 2 diabetes or related intermediate metabolic traits in U.K. populations.

    Funded by: Medical Research Council: MC_U106179471; Wellcome Trust: 077016

    Diabetes 2007;56;3;714-9

  • Mutations in UPF3B, a member of the nonsense-mediated mRNA decay complex, cause syndromic and nonsyndromic mental retardation.

    Tarpey PS, Raymond FL, Nguyen LS, Rodriguez J, Hackett A, Vandeleur L, Smith R, Shoubridge C, Edkins S, Stevens C, O'Meara S, Tofts C, Barthorpe S, Buck G, Cole J, Halliday K, Hills K, Jones D, Mironenko T, Perry J, Varian J, West S, Widaa S, Teague J, Dicks E, Butler A, Menzies A, Richardson D, Jenkinson A, Shepherd R, Raine K, Moon J, Luo Y, Parnau J, Bhat SS, Gardner A, Corbett M, Brooks D, Thomas P, Parkinson-Lawrence E, Porteous ME, Warner JP, Sanderson T, Pearson P, Simensen RJ, Skinner C, Hoganson G, Superneau D, Wooster R, Bobrow M, Turner G, Stevenson RE, Schwartz CE, Futreal PA, Srivastava AK, Stratton MR and Gécz J

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    Nonsense-mediated mRNA decay (NMD) is of universal biological significance. It has emerged as an important global RNA, DNA and translation regulatory pathway. By systematically sequencing 737 genes (annotated in the Vertebrate Genome Annotation database) on the human X chromosome in 250 families with X-linked mental retardation, we identified mutations in the UPF3 regulator of nonsense transcripts homolog B (yeast) (UPF3B) leading to protein truncations in three families: two with the Lujan-Fryns phenotype and one with the FG phenotype. We also identified a missense mutation in another family with nonsyndromic mental retardation. Three mutations lead to the introduction of a premature termination codon and subsequent NMD of mutant UPF3B mRNA. Protein blot analysis using lymphoblastoid cell lines from affected individuals showed an absence of the UPF3B protein in two families. The UPF3B protein is an important component of the NMD surveillance machinery. Our results directly implicate abnormalities of NMD in human disease and suggest at least partial redundancy of NMD pathways.

    Funded by: NICHD NIH HHS: HD26202, R01 HD026202; Wellcome Trust: 077012

    Nature genetics 2007;39;9;1127-33

  • ProteomeBinders: planning a European resource of affinity reagents for analysis of the human proteome.

    Taussig MJ, Stoevesandt O, Borrebaeck CA, Bradbury AR, Cahill D, Cambillau C, de Daruvar A, Dübel S, Eichler J, Frank R, Gibson TJ, Gloriam D, Gold L, Herberg FW, Hermjakob H, Hoheisel JD, Joos TO, Kallioniemi O, Koegl M, Koegll M, Konthur Z, Korn B, Kremmer E, Krobitsch S, Landegren U, van der Maarel S, McCafferty J, Muyldermans S, Nygren PA, Palcy S, Plückthun A, Polic B, Przybylski M, Saviranta P, Sawyer A, Sherman DJ, Skerra A, Templin M, Ueffing M and Uhlén M

    Technology Research Group, The Babraham Institute, Cambridge CB22 3AT, UK.

    ProteomeBinders is a new European consortium aiming to establish a comprehensive resource of well-characterized affinity reagents, including but not limited to antibodies, for analysis of the human proteome. Given the huge diversity of the proteome, the scale of the project is potentially immense but nevertheless feasible in the context of a pan-European or even worldwide coordination.

    Nature methods 2007;4;1;13-7

  • A genotype calling algorithm for the Illumina BeadArray platform.

    Teo YY, Inouye M, Small KS, Gwilliam R, Deloukas P, Kwiatkowski DP and Clark TG

    Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford OX3 7BN, UK.

    Motivation: Large-scale genotyping relies on the use of unsupervised automated calling algorithms to assign genotypes to hybridization data. A number of such calling algorithms have been recently established for the Affymetrix GeneChip genotyping technology. Here, we present a fast and accurate genotype calling algorithm for the Illumina BeadArray genotyping platforms. As the technology moves towards assaying millions of genetic polymorphisms simultaneously, there is a need for integrated, easy-to-use software for calling genotypes.

    Results: We have introduced a model-based genotype calling algorithm which does not rely on having prior training data or require computationally intensive procedures. The algorithm can assign genotypes to hybridization data from thousands of individuals simultaneously and pools information across multiple individuals to improve the calling. The method can accommodate variations in hybridization intensities which result in dramatic shifts of the position of the genotype clouds by identifying the optimal coordinates to initialize the algorithm. By incorporating the process of perturbation analysis, we can obtain a quality metric measuring the stability of the assigned genotype calls. We show that this quality metric can be used to identify SNPs with low call rates and accuracy.

    Availability: The C++ executable for the algorithm described here is available on request from the authors.

    Funded by: Medical Research Council: G0600230, G19/9; Wellcome Trust: 077011, 082370

    Bioinformatics (Oxford, England) 2007;23;20;2741-6

  • Sequence-based analysis of pQBR103; a representative of a unique, transfer-proficient mega plasmid resident in the microbial community of sugar beet.

    Tett A, Spiers AJ, Crossman LC, Ager D, Ciric L, Dow JM, Fry JC, Harris D, Lilley A, Oliver A, Parkhill J, Quail MA, Rainey PB, Saunders NJ, Seeger K, Snyder LA, Squares R, Thomas CM, Turner SL, Zhang XX, Field D and Bailey MJ

    Centre for Ecology and Hydrology-Oxford, Oxford, UK.

    The plasmid pQBR103 was found within Pseudomonas populations colonizing the leaf and root surfaces of sugar beet plants growing at Wytham, Oxfordshire, UK. At 425 kb it is the largest self-transmissible plasmid yet sequenced from the phytosphere. It is known to enhance the competitive fitness of its host, and parts of the plasmid are known to be actively transcribed in the plant environment. Analysis of the complete sequence of this plasmid predicts a coding sequence (CDS)-rich genome containing 478 CDSs and an exceptional degree of genetic novelty; 80% of predicted coding sequences cannot be ascribed a function and 60% are orphans. Of those to which function could be assigned, 40% bore greatest similarity to sequences from Pseudomonas spp, and the majority of the remainder showed similarity to other gamma-proteobacterial genera and plasmids. pQBR103 has identifiable regions presumed responsible for replication and partitioning, but despite being tra+ lacks the full complement of any previously described conjugal transfer functions. The DNA sequence provided few insights into the functional significance of plant-induced transcriptional regions, but suggests that 14% of CDSs may be expressed (11 CDSs with functional annotation and 54 without), further highlighting the ecological importance of these novel CDSs. Comparative analysis indicates that pQBR103 shares significant regions of sequence with other plasmids isolated from sugar beet plants grown at the same geographic location. These plasmid sequences indicate there is more novelty in the mobile DNA pool accessible to phytosphere pseudomonas than is currently appreciated or understood.

    Funded by: Wellcome Trust: 082372

    The ISME journal 2007;1;4;331-40

  • Chlamydia trachomatis: genome sequence analysis of lymphogranuloma venereum isolates.

    Thomson NR, Holden MT, Carder C, Lennard N, Lockey SJ, Marsh P, Skipp P, O'Connor CD, Goodhead I, Norbertzcak H, Harris B, Ormond D, Rance R, Quail MA, Parkhill J, Stephens RS and Clarke IN

    The Pathogen Sequencing Unit, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    Chlamydia trachomatis is the most common cause of sexually transmitted infections in the UK, a statistic that is also reflected globally. There are three biovariants of C. trachomatis: trachoma (serotypes A-C) and two sexually transmitted pathovars, serotypes D-K and lymphogranuloma venereum (LGV). Trachoma isolates and the sexually transmitted serotypes D-K are noninvasive, whereas the LGV strains are invasive, causing a disseminating infection of the local draining lymph nodes. Genome sequences are available for single isolates from the trachoma (serotype A) and sexually transmitted (serotype D) biotypes. We sequenced two isolates from the remaining biotype, LGV: a long-term laboratory-passaged strain and the recent "epidemic" LGV isolate causing proctitis. Although the genome of the LGV strain shows no additional genes that could account for the differences in disease outcome, we found evidence of functional gene loss and identified regions of heightened sequence variation that have previously been shown to be important sites for interstrain recombination. We have used new sequencing technologies to show that the recent clinical LGV isolate causing proctitis is unlikely to be a newly emerged strain but is most probably an old strain with relatively new clinical manifestations.

    Funded by: Wellcome Trust: 080348

    Genome research 2007;18;1;161-71

  • Rheumatoid arthritis association at 6q23.

    Thomson W, Barton A, Ke X, Eyre S, Hinks A, Bowes J, Donn R, Symmons D, Hider S, Bruce IN, Wellcome Trust Case Control Consortium, Wilson AG, Marinou I, Morgan A, Emery P, YEAR Consortium, Carter A, Steer S, Hocking L, Reid DM, Wordsworth P, Harrison P, Strachan D and Worthington J

    Arthritis Research Campaign (arc)-Epidemiology Unit, Stopford Building, The University of Manchester, Manchester M13 9PT, UK.

    The Wellcome Trust Case Control Consortium (WTCCC) identified nine single SNPs putatively associated with rheumatoid arthritis at P = 1 x 10(-5) - 5 x 10(-7) in a genome-wide association screen. One, rs6920220, was unequivocally replicated (trend P = 1.1 x 10(-8)) in a validation study, as described here. This SNP maps to 6q23, between the genes oligodendrocyte lineage transcription factor 3 (OLIG3) and tumor necrosis factor-alpha-induced protein 3 (TNFAIP3).

    Funded by: Arthritis Research UK: 17552; Medical Research Council: G0000934, G0000934(68341); Wellcome Trust: 068545, 068545/Z/02, 076113, 090532

    Nature genetics 2007;39;12;1431-3

  • Convergent adaptation of human lactase persistence in Africa and Europe.

    Tishkoff SA, Reed FA, Ranciaro A, Voight BF, Babbitt CC, Silverman JS, Powell K, Mortensen HM, Hirbo JB, Osman M, Ibrahim M, Omar SA, Lema G, Nyambo TB, Ghori J, Bumpstead S, Pritchard JK, Wray GA and Deloukas P

    Department of Biology, University of Maryland, College Park, Maryland 20742, USA.

    A SNP in the gene encoding lactase (LCT) (C/T-13910) is associated with the ability to digest milk as adults (lactase persistence) in Europeans, but the genetic basis of lactase persistence in Africans was previously unknown. We conducted a genotype-phenotype association study in 470 Tanzanians, Kenyans and Sudanese and identified three SNPs (G/C-14010, T/G-13915 and C/G-13907) that are associated with lactase persistence and that have derived alleles that significantly enhance transcription from the LCT promoter in vitro. These SNPs originated on different haplotype backgrounds from the European C/T-13910 SNP and from each other. Genotyping across a 3-Mb region demonstrated haplotype homozygosity extending >2.0 Mb on chromosomes carrying C-14010, consistent with a selective sweep over the past approximately 7,000 years. These data provide a marked example of convergent evolution due to strong selective pressure resulting from shared cultural traits: animal domestication and adult milk consumption.

    Funded by: NHGRI NIH HHS: F32 HG003801, F32HG03801, HG002772-1, R01 HG002772; NIGMS NIH HHS: R01 GM076637, R01GM076637; Wellcome Trust: 076113

    Nature genetics 2007;39;1;31-40

  • Robust associations of four new chromosome regions from genome-wide analyses of type 1 diabetes.

    Todd JA, Walker NM, Cooper JD, Smyth DJ, Downes K, Plagnol V, Bailey R, Nejentsev S, Field SF, Payne F, Lowe CE, Szeszko JS, Hafler JP, Zeitels L, Yang JH, Vella A, Nutland S, Stevens HE, Schuilenburg H, Coleman G, Maisuria M, Meadows W, Smink LJ, Healy B, Burren OS, Lam AA, Ovington NR, Allen J, Adlem E, Leung HT, Wallace C, Howson JM, Guja C, Ionescu-Tîrgovişte C, Genetics of Type 1 Diabetes in Finland, Simmonds MJ, Heward JM, Gough SC, Wellcome Trust Case Control Consortium, Dunger DB, Wicker LS and Clayton DG

    Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, University of Cambridge, Addenbrooke's Hospital, Cambridge CB2 0XY, UK.

    The Wellcome Trust Case Control Consortium (WTCCC) primary genome-wide association (GWA) scan on seven diseases, including the multifactorial autoimmune disease type 1 diabetes (T1D), shows associations at P < 5 x 10(-7) between T1D and six chromosome regions: 12q24, 12q13, 16p13, 18p11, 12p13 and 4q27. Here, we attempted to validate these and six other top findings in 4,000 individuals with T1D, 5,000 controls and 2,997 family trios independent of the WTCCC study. We confirmed unequivocally the associations of 12q24, 12q13, 16p13 and 18p11 (P(follow-up) ≤ 1.35 x 10(-9); P(overall) ≤ 1.15 x 10(-14)), leaving eight regions with small effects or false-positive associations. We also obtained evidence for chromosome 18q22 (P(overall) = 1.38 x 10(-8)) from a GWA study of nonsynonymous SNPs. Several regions, including 18q22 and 18p11, showed association with autoimmune thyroid disease. This study increases the number of T1D loci with compelling evidence from six to at least ten.

    Funded by: Medical Research Council: G0000934; Wellcome Trust: 061858, 061859, 089989

    Nature genetics 2007;39;7;857-64

  • Look who's talking too: graduates developing skills through communication.

    Tomazou EM and Powell GT

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.

    Greater opportunities for young scientists to present their doctoral research to large general audiences will encourage development of transferable skills and involvement in the scientific community. We look at ways students communicate their research and explore the benefits of student-led meetings. The organization of the first Sanger-Cambridge Ph.D. Symposium provides an example of how students can act to establish forums for their work and we call on other young scientists to do the same.

    Nature reviews. Genetics 2007;8;9;724-6

  • The implications of alternative splicing in the ENCODE protein complement.

    Tress ML, Martelli PL, Frankish A, Reeves GA, Wesselink JJ, Yeats C, Olason PI, Albrecht M, Hegyi H, Giorgetti A, Raimondo D, Lagarde J, Laskowski RA, López G, Sadowski MI, Watson JD, Fariselli P, Rossi I, Nagy A, Kai W, Størling Z, Orsini M, Assenov Y, Blankenburg H, Huthmacher C, Ramírez F, Schlicker A, Denoeud F, Jones P, Kerrien S, Orchard S, Antonarakis SE, Reymond A, Birney E, Brunak S, Casadio R, Guigo R, Harrow J, Hermjakob H, Jones DT, Lengauer T, Orengo CA, Patthy L, Thornton JM, Tramontano A and Valencia A

    Structural Computational Biology Programme, Spanish National Cancer Research Centre, E-28029 Madrid, Spain.

    Alternative premessenger RNA splicing enables genes to generate more than one gene product. Splicing events that occur within protein coding regions have the potential to alter the biological function of the expressed protein and even to create new protein functions. Alternative splicing has been suggested as one explanation for the discrepancy between the number of human genes and functional complexity. Here, we carry out a detailed study of the alternatively spliced gene products annotated in the ENCODE pilot project. We find that alternative splicing in human genes is more frequent than has commonly been suggested, and we demonstrate that many of the potential alternative gene products will have markedly different structure and function from their constitutively spliced counterparts. For the vast majority of these alternative isoforms, little evidence exists to suggest they have a role as functional proteins, and it seems unlikely that the spectrum of conventional enzymatic or structural functions can be substantially extended through alternative splicing.

    Funded by: Wellcome Trust: 062023, 077198

    Proceedings of the National Academy of Sciences of the United States of America 2007;104;13;5495-500

  • Determination and validation of principal gene products.

    Tress ML, Wesselink JJ, Frankish A, López G, Goldman N, Löytynoja A, Massingham T, Pardi F, Whelan S, Harrow J and Valencia A

    Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre, Madrid, Spain.

    Motivation: Alternative splicing has the potential to generate a wide range of protein isoforms. For many computational applications and for experimental research, it is important to be able to concentrate on the isoform that retains the core biological function. For many genes this is far from clear.

    Results: We have combined five methods into a pipeline that allows us to detect the principal variant for a gene. Most of the methods were based on conservation between species, at the level of both gene and protein. The five methods used were the conservation of exonic structure, the detection of non-neutral evolution, the conservation of functional residues, the existence of a known protein structure and the abundance of vertebrate orthologues. The pipeline was able to determine a principal isoform for 83% of a set of well-annotated genes with multiple variants.

    Funded by: NHGRI NIH HHS: U54 HG004555; Wellcome Trust: 077198

    Bioinformatics (Oxford, England) 2007;24;1;11-7

  • Germline rates of de novo meiotic deletions and duplications causing several genomic disorders.

    Turner DJ, Miretti M, Rajan D, Fiegler H, Carter NP, Blayney ML, Beck S and Hurles ME

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK.

    Meiotic recombination between highly similar duplicated sequences (nonallelic homologous recombination, NAHR) generates deletions, duplications, inversions and translocations, and it is responsible for genetic diseases known as 'genomic disorders', most of which are caused by altered copy number of dosage-sensitive genes. NAHR hot spots have been identified within some duplicated sequences. We have developed sperm-based assays to measure the de novo rate of reciprocal deletions and duplications at four NAHR hot spots. We used these assays to dissect the relative rates of NAHR between different pairs of duplicated sequences. We show that (i) these NAHR hot spots are specific to meiosis, (ii) deletions are generated at a higher rate than their reciprocal duplications in the male germline and (iii) some of these genomic disorders are likely to have been underascertained clinically, most notably that resulting from the duplication of 7q11, the reciprocal of the deletion causing Williams-Beuren syndrome.

    Funded by: Wellcome Trust: 077008, 077014

    Nature genetics 2007;40;1;90-5

  • Network activity-independent coordinated gene expression program for synapse assembly.

    Valor LM, Charlesworth P, Humphreys L, Anderson CN and Grant SG

    Genes to Cognition Programme, The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, United Kingdom.

    Global biological datasets generated by genomics, transcriptomics, and proteomics provide new approaches to understanding the relationship between the genome and the synapse. Combined transcriptome analysis and multielectrode recordings of neuronal network activity were used in mouse embryonic primary neuronal cultures to examine synapse formation and activity-dependent gene regulation. Evidence for a coordinated gene expression program for assembly of synapses was observed in the expression of 642 genes encoding postsynaptic and plasticity proteins. This synaptogenesis gene expression program preceded protein expression of synapse markers and onset of spiking activity. Continued expression was followed by maturation of morphology and electrical neuronal networks, which was then followed by the expression of activity-dependent genes. Thus, two distinct sequentially active gene expression programs underlie the genomic programs of synapse function.

    Funded by: Wellcome Trust

    Proceedings of the National Academy of Sciences of the United States of America 2007;104;11;4658-63

  • The Ras-association domain family (RASSF) members and their role in human tumourigenesis.

    van der Weyden L and Adams DJ

    Experimental Cancer Genetics Laboratory, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton Cambridge, UK.

    Ras proteins play a direct causal role in human cancer, with activating mutations in Ras occurring in approximately 30% of tumours. Ras effectors also contribute to cancer, as mutations occur in Ras effectors, notably B-Raf and PI3-K, and drugs blocking elements of these pathways are in clinical development. In 2000, a new Ras effector was identified, RAS-association domain family 1 (RASSF1), and expression of the RASSF1A isoform of this gene is silenced in tumours by methylation of its promoter. Since methylation is reversible and demethylating agents are currently being used in clinical trials, detection of RASSF1A silencing by promoter hypermethylation has potential clinical uses in cancer diagnosis, prognosis and treatment. RASSF1A belongs to a new family of RAS effectors, of which there are currently 8 members (RASSF1-8). RASSF1-6 each contain a variable N-terminal segment followed by a Ras-association (RA) domain of the Ral-GDS/AF6 type, and a specialised coiled-coil structure known as a SARAH domain extending to the C-terminus. RASSF7-8 contain an N-terminal RA domain and a variable C-terminus. Members of the RASSF family are thought to function as tumour suppressors by regulating the cell cycle and apoptosis. This review will summarise our current knowledge of each member of the RASSF family and in particular what role they play in tumourigenesis, with a special focus on RASSF1A, whose promoter methylation is one of the most frequent alterations found in human tumours.

    Funded by: Cancer Research UK: A6997; Wellcome Trust

    Biochimica et biophysica acta 2007;1776;1;58-85

  • A genome-wide association study for celiac disease identifies risk variants in the region harboring IL2 and IL21.

    van Heel DA, Franke L, Hunt KA, Gwilliam R, Zhernakova A, Inouye M, Wapenaar MC, Barnardo MC, Bethel G, Holmes GK, Feighery C, Jewell D, Kelleher D, Kumar P, Travis S, Walters JR, Sanders DS, Howdle P, Swift J, Playford RJ, McLaren WM, Mearin ML, Mulder CJ, McManus R, McGinnis R, Cardon LR, Deloukas P and Wijmenga C

    Centre for Gastroenterology, Institute of Cell and Molecular Science, Queen Mary University of London, London E1 2AT, UK.

    We tested 310,605 SNPs for association in 778 individuals with celiac disease and 1,422 controls. Outside the HLA region, the most significant finding (rs13119723; P = 2.0 x 10(-7)) was in the KIAA1109-TENR-IL2-IL21 linkage disequilibrium block. We independently confirmed association in two further collections (strongest association at rs6822844, 24 kb 5' of IL21; meta-analysis P = 1.3 x 10(-14), odds ratio = 0.63), suggesting that genetic variation in this region predisposes to celiac disease.

    Funded by: Medical Research Council: G0000934; Wellcome Trust: 068094, 068545/Z/02, GR068094MA

    Nature genetics 2007;39;7;827-9

  • Definition of a minimal region of deletion of chromosome 7 in uterine leiomyomas by tiling-path microarray CGH and mutation analysis of known genes in this region.

    Vanharanta S, Wortham NC, Langford C, El-Bahrawy M, van der Spuy Z, Sjöberg J, Lehtonen R, Karhu A, Tomlinson IP and Aaltonen LA

    Department of Medical Genetics, Biomedicum Helsinki, University of Helsinki, Helsinki, Finland.

    Somatic interstitial deletions of chromosome segment 7q22-q31 in uterine leiomyomas are a frequent event, thought to be indicative of a tumor suppressor gene in the region. Previous LOH and CGH studies have refined this region to 7q22.3-q31, although the target gene has not been identified. Here, we have used tiling-path resolution microarray CGH to further refine the region and to identify homozygous deletions in fibroids. Furthermore, we have screened all manually annotated genes in the region for mutations. We have refined the minimum deleted region at 7q22.3-q31 to 2.79 Mbp and identified a second region of deletion at 7q34. However, we identified no pathogenic coding variation.

    Genes, chromosomes & cancer 2007;46;5;451-8

  • Parallel evolution of conserved non-coding elements that target a common set of developmental regulatory genes from worms to humans.

    Vavouri T, Walter K, Gilks WR, Lehner B and Elgar G

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Background: The human genome contains thousands of non-coding sequences that are often more conserved between vertebrate species than protein-coding exons. These highly conserved non-coding elements (CNEs) are associated with genes that coordinate development, and have been proposed to act as transcriptional enhancers. Despite their extreme sequence conservation in vertebrates, sequences homologous to CNEs have not been identified in invertebrates.

    Results: Here we report that nematode genomes contain an alternative set of CNEs that share sequence characteristics, but not identity, with their vertebrate counterparts. CNEs thus represent a very unusual class of sequences that are extremely conserved within specific animal lineages yet are highly divergent between lineages. Nematode CNEs are also associated with developmental regulatory genes, and include well-characterized enhancers and transcription factor binding sites, supporting the proposed function of CNEs as cis-regulatory elements. Most remarkably, 40 of 156 human CNE-associated genes with invertebrate orthologs are also associated with CNEs in both worms and flies.

    Conclusion: A core set of genes that regulate development is associated with CNEs across three animal groups (worms, flies and vertebrates). We propose that these CNEs reflect the parallel evolution of alternative enhancers for a common set of developmental regulatory genes in different animal groups. This 're-wiring' of gene regulatory networks containing key developmental coordinators was probably a driving force during the evolution of animal body plans. CNEs may, therefore, represent the genomic traces of these 'hard-wired' core gene regulatory networks that specify the development of each alternative animal body plan.

    Funded by: Medical Research Council: G0401138, MC_U105260799

    Genome biology 2007;8;2;R15

  • Distinct cytokine-driven responses of activated blood gammadelta T cells: insights into unconventional T cell pleiotropy.

    Vermijlen D, Ellis P, Langford C, Klein A, Engel R, Willimann K, Jomaa H, Hayday AC and Eberl M

    Peter Gorer Department of Immunobiology, Guy's, King's and St. Thomas' Medical School, King's College London, London, UK.

    Human Vgamma9/Vdelta2 T cells comprise a small population of peripheral blood T cells that in many infectious diseases respond to the microbial metabolite, (E)-4-hydroxy-3-methyl-but-2-enyl pyrophosphate (HMB-PP), expanding to up to 50% of CD3(+) cells. This "transitional response," occurring temporally between the rapid innate and slower adaptive response, is widely viewed as proinflammatory and/or cytolytic. However, increasing evidence that different cytokines drive widely different effector functions in alphabeta T cells provoked us to apply cDNA microarrays to explore the potential pleiotropy of HMB-PP-activated Vgamma9/Vdelta2 T cells. The data and accompanying validations show that the related cytokines, IL-2, IL-4, or IL-21, each drive proliferation and comparable CD69 up-regulation but induce distinct effector responses that differ from prototypic alphabeta T cell responses. For example, the Th1-like response to IL-2 also includes expression of IL-5 and IL-13 that conversely are not induced by IL-4. The data identify specific molecules that may mediate gammadelta T cell effects. Thus, IL-21 induces a lymphoid-homing phenotype and high, unexpected expression of the follicular B cell-attracting chemokine CXCL13/BCA-1, suggesting a novel follicular B-helper-like T cell that may play a hitherto underappreciated role in humoral immunity early in infection. Such broad plasticity emphasizes the capacity of gammadelta T cells to influence the nature of the immune response to different challenges and has implications for the ongoing clinical application of cytokines together with Vgamma9/Vdelta2 TCR agonists.

    Funded by: Wellcome Trust: 071534

    Journal of immunology (Baltimore, Md. : 1950) 2007;178;7;4304-14

  • microRNA-155 regulates the generation of immunoglobulin class-switched plasma cells.

    Vigorito E, Perks KL, Abreu-Goodger C, Bunting S, Xiang Z, Kohlhaas S, Das PP, Miska EA, Rodriguez A, Bradley A, Smith KG, Rada C, Enright AJ, Toellner KM, Maclennan IC and Turner M

    Laboratory of Lymphocyte Signalling and Development, The Babraham Institute, Cambridge, CB22 3AT, UK.

    microRNA-155 (miR-155) is expressed by cells of the immune system after activation and has been shown to be required for antibody production after vaccination with attenuated Salmonella. Here we show the intrinsic requirement for miR-155 in B cell responses to thymus-dependent and -independent antigens. B cells lacking miR-155 generated reduced extrafollicular and germinal center responses and failed to produce high-affinity IgG1 antibodies. Gene-expression profiling of activated B cells indicated that miR-155 regulates an array of genes with diverse function, many of which are predicted targets of miR-155. The transcription factor Pu.1 is validated as a direct target of miR-155-mediated inhibition. When Pu.1 is overexpressed in wild-type B cells, fewer IgG1 cells are produced, indicating that loss of Pu.1 regulation is a contributing factor to the miR-155-deficient phenotype. Our results implicate post-transcriptional regulation of gene expression for establishing the terminal differentiation program of B cells.

    Funded by: Biotechnology and Biological Sciences Research Council: BBS/E/B/00001206, BBS/E/B/0000C223, BBS/E/B/0000M206; Medical Research Council: G0700287, G117/424, G8402371, MC_U105178806; Wellcome Trust: 079643

    Immunity 2007;27;6;847-59

  • Say hello to our little friends.

    Walker A

    Nature reviews. Microbiology 2007;5;8;572-3

  • This place is big enough for both of us.

    Walker A and Crossman LC

    Nature reviews. Microbiology 2007;5;2;90-2

  • Urbane decay.

    Walker A and Seth-Smith H

    Nature reviews. Microbiology 2007;5;10;748-9

  • A recessive genetic screen for host factors required for retroviral infection in a library of insertionally mutated Blm-deficient embryonic stem cells.

    Wang W and Bradley A

    Department of Cell Biology and Genetics, College of Life Sciences, Peking University, Beijing, PR China.

    Background: Host factors required for retroviral infection are potential targets for the modulation of diseases caused by retroviruses. During the retroviral life cycle, numerous cellular factors interact with the virus and play an essential role in infection. Cultured embryonic stem (ES) cells are susceptible to retroviral infection, therefore providing access to all of the genes required for this process to take place. In order to identify the host factors involved in retroviral infection, we designed and implemented a scheme for identifying ES cells that are resistant to retroviral infection and subsequent cloning of the mutated gene.

    Results: A library of mutant ES cells was established by genome-wide insertional mutagenesis in Blm-deficient ES cells, and a screen was performed by superinfection of the library at high multiplicity with a recombinant retrovirus carrying a positive and negative selection cassette. Stringent negative selection was then used to exclude the infected ES cells. We successfully recovered five independent clones of ES cells that are resistant to retroviral infection. Analysis of the mutations in these clones revealed four different homozygous and one compound heterozygous mutation in the mCat-1 locus, which confirms that mCat-1 is the ecotropic murine leukemia virus receptor in ES cells.

    Conclusion: We have demonstrated the feasibility and reliability of this recessive genetic approach to identifying critical genes required for retroviral infection in ES cells; the approach provides a unique opportunity to recover other cellular factors required for retroviral infection. The resulting insertionally mutated Blm-deficient ES cell library might also provide access to essential host cell components that are required for infection and replication for other types of virus.

    Funded by: Wellcome Trust

    Genome biology 2007;8;4;R48

  • A Sall4 mutant mouse model useful for studying the role of Sall4 in early embryonic development and organogenesis.

    Warren M, Wang W, Spiden S, Chen-Murchie D, Tannahill D, Steel KP and Bradley A

    SALL4 is a homologue of the Drosophila homeotic gene spalt, a zinc finger transcription factor, required for inner cell mass proliferation in early embryonic development. It also interacts with other transcription factors to control the development of the anorectal region, kidney, heart, limbs, and brain. Truncating mutations in SALL4 cause Okihiro syndrome, manifest as Duane anomaly, radial ray defects and sensorineural and conductive deafness. We report the characterization of a novel murine Sall4 null allele created by bacterial recombineering in ES cells. Homozygous mutant mice exhibit early embryonic lethality. Heterozygous mutant mice recapitulate phenotypic features of Okihiro syndrome including deafness, lower anogenital tract abnormalities, renal hypoplasia, anencephaly, Hirschsprung's disease, and skeletal defects. This phenotype shows important differences in cardiac and ear manifestations to previously characterized Sall4 mutant alleles and should prove useful for the investigation of the influence of modifier alleles and protein interactions on the transcriptional regulatory function of Sall4.

    Funded by: Medical Research Council: G0300212, MC_QA137918; Wellcome Trust: 077187

    Genesis (New York, N.Y. : 2000) 2007;45;1;51-8

  • Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls.

    Wellcome Trust Case Control Consortium

    There is increasing evidence that genome-wide association (GWA) studies represent a powerful approach to the identification of genes involved in common human diseases. We describe a joint GWA study (using the Affymetrix GeneChip 500K Mapping Array Set) undertaken in the British population, which has examined approximately 2,000 individuals for each of 7 major diseases and a shared set of approximately 3,000 controls. Case-control comparisons identified 24 independent association signals at P < 5 x 10(-7): 1 in bipolar disorder, 1 in coronary artery disease, 9 in Crohn's disease, 3 in rheumatoid arthritis, 7 in type 1 diabetes and 3 in type 2 diabetes. On the basis of prior findings and replication studies thus far completed, almost all of these signals reflect genuine susceptibility effects. We observed association at many previously identified loci, and found compelling evidence that some loci confer risk for more than one of the diseases studied. Across all diseases, we identified a large number of further signals (including 58 loci with single-point P values between 10(-5) and 5 x 10(-7)) likely to yield additional susceptibility loci. The importance of appropriately large samples was confirmed by the modest effect sizes observed at most loci identified. This study thus represents a thorough validation of the GWA approach. It has also demonstrated that careful use of a shared control group represents a safe and effective approach to GWA analyses of multiple disease phenotypes; has generated a genome-wide genotype database for future studies of common diseases in the British population; and has shown that, provided individuals with non-European ancestry are excluded, the extent of population stratification in the British population is generally modest. Our findings offer new avenues for exploring the pathophysiology of these important disorders. We anticipate that our data, results and software, which will be widely available to other investigators, will provide a powerful resource for human genetics research.

    Funded by: Chief Scientist Office: CZB/4/540; Medical Research Council: G0000934, G0100594, G0501942, G0600329, G0600705, G0800759, G0901461, G19/9, G90/106, G9806740, G9810900; Wellcome Trust: 076113, 077011, 090532

    Nature 2007;447;7145;661-78

  • Association scan of 14,500 nonsynonymous SNPs in four diseases identifies autoimmunity variants.

    Wellcome Trust Case Control Consortium, Australo-Anglo-American Spondylitis Consortium (TASC), Burton PR, Clayton DG, Cardon LR, Craddock N, Deloukas P, Duncanson A, Kwiatkowski DP, McCarthy MI, Ouwehand WH, Samani NJ, Todd JA, Donnelly P, Barrett JC, Davison D, Easton D, Evans DM, Leung HT, Marchini JL, Morris AP, Spencer CC, Tobin MD, Attwood AP, Boorman JP, Cant B, Everson U, Hussey JM, Jolley JD, Knight AS, Koch K, Meech E, Nutland S, Prowse CV, Stevens HE, Taylor NC, Walters GR, Walker NM, Watkins NA, Winzer T, Jones RW, McArdle WL, Ring SM, Strachan DP, Pembrey M, Breen G, St Clair D, Caesar S, Gordon-Smith K, Jones L, Fraser C, Green EK, Grozeva D, Hamshere ML, Holmans PA, Jones IR, Kirov G, Moskvina V, Nikolov I, O'Donovan MC, Owen MJ, Collier DA, Elkin A, Farmer A, Williamson R, McGuffin P, Young AH, Ferrier IN, Ball SG, Balmforth AJ, Barrett JH, Bishop TD, Iles MM, Maqbool A, Yuldasheva N, Hall AS, Braund PS, Dixon RJ, Mangino M, Stevens S, Thompson JR, Bredin F, Tremelling M, Parkes M, Drummond H, Lees CW, Nimmo ER, Satsangi J, Fisher SA, Forbes A, Lewis CM, Onnie CM, Prescott NJ, Sanderson J, Matthew CG, Barbour J, Mohiuddin MK, Todhunter CE, Mansfield JC, Ahmad T, Cummings FR, Jewell DP, Webster J, Brown MJ, Lathrop MG, Connell J, Dominiczak A, Marcano CA, Burke B, Dobson R, Gungadoo J, Lee KL, Munroe PB, Newhouse SJ, Onipinla A, Wallace C, Xue M, Caulfield M, Farrall M, Barton A, Biologics in RA Genetics and Genomics Study Syndicate (BRAGGS) Steering Committee, Bruce IN, Donovan H, Eyre S, Gilbert PD, Hilder SL, Hinks AM, John SL, Potter C, Silman AJ, Symmons DP, Thomson W, Worthington J, Dunger DB, Widmer B, Frayling TM, Freathy RM, Lango H, Perry JR, Shields BM, Weedon MN, Hattersley AT, Hitman GA, Walker M, Elliott KS, Groves CJ, Lindgren CM, Rayner NW, Timpson NJ, Zeggini E, Newport M, Sirugo G, Lyons E, Vannberg F, Hill AV, Bradbury LA, Farrar C, Pointon JJ, Wordsworth P, Brown MA, Franklyn JA, Heward JM, Simmonds MJ, Gough SC, Seal S, Breast Cancer Susceptibility Collaboration (UK), Stratton MR, Rahman N, Ban M, Goris A, Sawcer SJ, Compston A, Conway D, Jallow M, Newport M, Sirugo G, Rockett KA, Bumpstead SJ, Chaney A, Downes K, Ghori MJ, Gwilliam R, Hunt SE, Inouye M, Keniry A, King E, McGinnis R, Potter S, Ravindrarajah R, Whittaker P, Widden C, Withers D, Cardin NJ, Davison D, Ferreira T, Pereira-Gale J, Hallgrímsdóttir IB, Howie BN, Su Z, Teo YY, Vukcevic D, Bentley D, Brown MA, Compston A, Farrall M, Hall AS, Hattersley AT, Hill AV, Parkes M, Pembrey M, Stratton MR, Mitchell SL, Newby PR, Brand OJ, Carr-Smith J, Pearce SH, McGinnis R, Keniry A, Deloukas P, Reveille JD, Zhou X, Sims AM, Dowling A, Taylor J, Doan T, Davis JC, Savage L, Ward MM, Learch TL, Weisman MH and Brown M

    Genetic Epidemiology Group, Department of Health Sciences, University of Leicester, Adrian Building, University Road, Leicester LE1 7RH, UK.

    We have genotyped 14,436 nonsynonymous SNPs (nsSNPs) and 897 major histocompatibility complex (MHC) tag SNPs from 1,000 independent cases of ankylosing spondylitis (AS), autoimmune thyroid disease (AITD), multiple sclerosis (MS) and breast cancer (BC). Comparing these data against a common control dataset derived from 1,500 randomly selected healthy British individuals, we report initial association and independent replication in a North American sample of two new loci related to ankylosing spondylitis, ARTS1 and IL23R, and confirmation of the previously reported association of AITD with TSHR and FCRL3. These findings, enabled in part by increased statistical power resulting from the expansion of the control reference group to include individuals from the other disease groups, highlight notable new possibilities for autoimmune regulation and suggest that IL23R may be a common susceptibility factor for the major 'seronegative' diseases.

    Funded by: Arthritis Research UK: 17552; Cancer Research UK: A4994; Chief Scientist Office: CZB/4/540; Medical Research Council: G0000934, G0501942, G0600329, G0600705, G0701003, G0800759, G19/9, G90/106, G9810900; Multiple Sclerosis Society: 730; NCRR NIH HHS: M01 RR000425, UL1 RR024148; NIAMS NIH HHS: R01 AR046208, R01 AR048465; Wellcome Trust: 057097, 076113, 081682, 089989, 090532

    Nature genetics 2007;39;11;1329-37

  • Esophageal atresia, hypoplasia of zygomatic complex, microcephaly, cup-shaped ears, congenital heart defect, and mental retardation--new MCA/MR syndrome in two affected sibs and a mildly affected mother?

    Wieczorek D, Shaw-Smith C, Kohlhase J, Schmitt W, Buiting K, Coffey A, Howard E, Hehr U and Gillessen-Kaesbach G

    Institut für Humangenetik, Universitätsklinikum Essen, Germany, and Department of Medical Genetics, Addenbrooke's Hospital, Cambridge, UK.

    The previously undescribed combination of esophageal atresia, hypoplasia of the zygomatic complex, microcephaly, cup-shaped ears, congenital heart defect, and mental retardation was diagnosed in two siblings of different sexes, with the brother being more severely affected. The mother presented with zygomatic arch hypoplasia of the right side only. We discuss major differential diagnoses: Goldenhar, Feingold, CHARGE, and Treacher Collins syndromes show a few overlapping clinical features, but these diagnoses are unlikely because the clinical findings are unusual for Goldenhar syndrome and mutational screening of the MYCN, CHD7, and TCOF1 genes did not reveal any abnormalities. Autosomal recessive oto-facial syndrome, hypomandibular faciocranial dysostosis, and Ozkan syndrome were clinically excluded. A 22q11.2 microdeletion was excluded by FISH analysis, a 2p23-p24 microdeletion by microsatellite analyses, a subtelomeric chromosomal aberration by MLPA, and a small genomic deletion/duplication by CGH array. As X-inactivation studies did not show skewed X-inactivation in the mother, we consider X-chromosomal recessive inheritance of this condition less likely. We discuss autosomal dominant inheritance with variable expressivity or mosaicism in the mother as the likely genetic mechanism in this new multiple congenital anomaly/mental retardation (MCA/MR) syndrome.

    American journal of medical genetics. Part A 2007;143A;11;1135-42

  • The Israeli-Palestinian Science Organization.

    Wiesel T, Agre P, Arrow KJ, Atiyah M, Brézin E, Charfi FF, Cohen-Tanoudji C, Daar A, Jacob F, Kahneman D, Lee YT, Nicolaisen I, Nusseibeh S, Reuter H, Shoham Y, Sulston J, Walzer M and Yaari M

    Science (New York, N.Y.) 2007;315;5808;39

  • The vertebrate genome annotation (Vega) database.

    Wilming LG, Gilbert JG, Howe K, Trevanion S, Hubbard T and Harrow JL

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.

    The Vertebrate Genome Annotation (Vega) database was first made public in 2004 and has been designed to view manual annotation of human, mouse and zebrafish genomic sequences produced at the Wellcome Trust Sanger Institute. Since its initial release, the number of annotated human loci has more than doubled to close to 33 000, and the database now contains comprehensive annotation of 20 of the 24 human chromosomes, four whole mouse chromosomes and around 40% of the zebrafish Danio rerio genome. In addition, we offer manual annotation of a number of haplotype regions in mouse and human, and of regions of comparative interest in pig and dog, that are unique to Vega.

    Funded by: NHGRI NIH HHS: U54 HG004555; Wellcome Trust: 077198

    Nucleic acids research 2007;36;Database issue;D753-60

  • Finding cis-regulatory modules in Drosophila using phylogenetic hidden Markov models.

    Wong WS and Nielsen R

    Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA.

    Motivation: Finding the regulatory modules bound by transcription factors is an important step in elucidating the complex molecular mechanisms underlying regulation of gene expression. There are numerous methods available for solving this problem; however, very few of them take advantage of the increasing availability of comparative genomic data.

    Results: We develop a method for finding regulatory modules in eukaryotic species using phylogenetic data. Using computer simulations and analysis of real data, we show that the use of a phylogenetic hidden Markov model can increase prediction accuracy over methods that do not take advantage of data from multiple species.

    Availability: The new method is made accessible under the GPL in a new, publicly available Java program, EvoPromoter. It can be downloaded at

    Funded by: Wellcome Trust

    Bioinformatics (Oxford, England) 2007;23;16;2031-7

  • Interleukin-2 gene variation impairs regulatory T cell function and causes autoimmunity.

    Yamanouchi J, Rainbow D, Serra P, Howlett S, Hunter K, Garner VE, Gonzalez-Munoz A, Clark J, Veijola R, Cubbon R, Chen SL, Rosa R, Cumiskey AM, Serreze DV, Gregory S, Rogers J, Lyons PA, Healy B, Smink LJ, Todd JA, Peterson LB, Wicker LS and Santamaria P

    Julia McFarlane Diabetes Research Centre (JMDRC) and Department of Microbiology and Infectious Diseases, Institute of Inflammation, Infection and Immunity, Faculty of Medicine, The University of Calgary, Calgary, Alberta T2N 4N1, Canada.

    Autoimmune diseases are thought to result from imbalances in normal immune physiology and regulation. Here, we show that autoimmune disease susceptibility and resistance alleles on mouse chromosome 3 (Idd3) correlate with differential expression of the key immunoregulatory cytokine interleukin-2 (IL-2). To test directly whether an approximately twofold reduction in IL-2 underpins the Idd3-linked destabilization of immune homeostasis, we show that engineered haplodeficiency of Il2 gene expression not only reduces T cell IL-2 production by twofold but also mimics the autoimmune dysregulatory effects of the naturally occurring susceptibility alleles of Il2. Reduced IL-2 production achieved by either genetic mechanism correlates with reduced function of CD4(+) CD25(+) regulatory T cells, which are critical for maintaining immune homeostasis.

    Funded by: Wellcome Trust: 061859

    Nature genetics 2007;39;3;329-37

  • Genome-wide association study of prostate cancer identifies a second risk locus at 8q24.

    Yeager M, Orr N, Hayes RB, Jacobs KB, Kraft P, Wacholder S, Minichiello MJ, Fearnhead P, Yu K, Chatterjee N, Wang Z, Welch R, Staats BJ, Calle EE, Feigelson HS, Thun MJ, Rodriguez C, Albanes D, Virtamo J, Weinstein S, Schumacher FR, Giovannucci E, Willett WC, Cancel-Tassin G, Cussenot O, Valeri A, Andriole GL, Gelmann EP, Tucker M, Gerhard DS, Fraumeni JF, Hoover R, Hunter DJ, Chanock SJ and Thomas G

    SAIC-Frederick, National Cancer Institute (NCI)-Frederick Cancer Research and Development Center, Frederick, Maryland 21702, USA.

    Recently, common variants on human chromosome 8q24 were found to be associated with prostate cancer risk. While conducting a genome-wide association study in the Cancer Genetic Markers of Susceptibility project with 550,000 SNPs in a nested case-control study (1,172 cases and 1,157 controls of European origin), we identified a new association at 8q24 with an independent effect on prostate cancer susceptibility. The most significant signal is 70 kb centromeric to the previously reported SNP, rs1447295, but shows little evidence of linkage disequilibrium with it. A combined analysis with four additional studies (total: 4,296 cases and 4,299 controls) confirms association with prostate cancer for rs6983267 in the centromeric locus (P = 9.42 x 10(-13); heterozygote odds ratio (OR): 1.26, 95% confidence interval (c.i.): 1.13-1.41; homozygote OR: 1.58, 95% c.i.: 1.40-1.78). Each SNP remained significant in a joint analysis after adjusting for the other (rs1447295 P = 1.41 x 10(-11); rs6983267 P = 6.62 x 10(-10)). These observations, combined with compelling evidence for a recombination hotspot between the two markers, indicate the presence of at least two independent loci within 8q24 that contribute to prostate cancer in men of European ancestry. We estimate that the population attributable risk of the new locus, marked by rs6983267, is higher than the locus marked by rs1447295 (21% versus 9%).

    Funded by: CCR NIH HHS: N01-RC-37004, N01-RC-45035; Intramural NIH HHS; NCI NIH HHS: 5U01CA098233-04, CA55075, N01-CN-45165, T32 CA 09001, U01 CA098710; Wellcome Trust

    Nature genetics 2007;39;5;645-9

  • Insights into modern disease from our distant evolutionary past.

    Yngvadottir B

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    An EMBO workshop entitled 'Human Evolution and Disease' was held recently (6-9 December 2006, Hyderabad, India) where 141 scientists from many disciplines came together to discuss recent studies of human variation, origins and dispersal, natural selection and disease susceptibility. The meeting tackled the subject of human evolution and disease from the different perspectives of archaeology, linguistics, genetics and genomics based on both new and publicly available data sets. In this report, we highlight the latest fashion crazes in the discipline, in particular, the use of large public data sets and new methods to analyse modern human variation and the links between human evolution and disease susceptibility.

    European journal of human genetics : EJHG 2007;15;5;603-6

  • The V103I polymorphism of the MC4R gene and obesity: population based studies and meta-analysis of 29 563 individuals.

    Young EH, Wareham NJ, Farooqi S, Hinney A, Hebebrand J, Scherag A, O'Rahilly S, Barroso I and Sandhu MS

    MRC Epidemiology Unit, Strangeways Research Laboratory, Cambridge, UK.

    Background: Previous studies have suggested that a variant in the melanocortin-4 receptor (MC4R) gene is important in protecting against common obesity. Larger studies are needed, however, to confirm this relation.

    Methods: We assessed the association between the V103I polymorphism in the MC4R gene and obesity in three UK population based cohort studies, totalling 8304 individuals. We also did a meta-analysis of relevant studies, involving 10 975 cases and 18 588 controls, to place our findings in context.

    Finding: In an analysis of all studies, individuals carrying the isoleucine allele had an 18% (95% confidence interval 4-30%, P=0.015) lower risk of obesity compared with non-carriers. There was no heterogeneity among studies and no apparent publication bias.

    Interpretation: This study confirms that the V103I polymorphism protects against human obesity at a population level. As such it provides proof of principle that specific gene variants may, at least in part, explain susceptibility and resistance to common forms of human obesity. A better understanding of the mechanisms underlying this association will help determine whether changes in MC4R activity have therapeutic potential.

    Funded by: Medical Research Council: G0100103, G9824984, MC_U106179471, MC_U106188470; Wellcome Trust: 068086, 077016

    International journal of obesity (2005) 2007;31;9;1437-41

  • A new function for the fragile X mental retardation protein in regulation of PSD-95 mRNA stability.

    Zalfa F, Eleuteri B, Dickson KS, Mercaldo V, De Rubeis S, di Penta A, Tabolacci E, Chiurazzi P, Neri G, Grant SG and Bagni C

    Dipartimento di Biologia, Università Tor Vergata, Via della Ricerca Scientifica 1, 00133 Rome, Italy.

    Fragile X syndrome (FXS) results from the loss of the fragile X mental retardation protein (FMRP), an RNA-binding protein that regulates a variety of cytoplasmic mRNAs. FMRP regulates mRNA translation and may be important in mRNA localization to dendrites. We report a third cytoplasmic regulatory function for FMRP: control of mRNA stability. In mice, we found that FMRP binds, in vivo, the mRNA encoding PSD-95, a key molecule that regulates neuronal synaptic signaling and learning. This interaction occurs through the 3' untranslated region of the PSD-95 (also known as Dlg4) mRNA, increasing message stability. Moreover, stabilization is further increased by mGluR activation. Although we also found that the PSD-95 mRNA is synaptically localized in vivo, localization occurs independently of FMRP. Through our functional analysis of this FMRP target we provide evidence that dysregulation of mRNA stability may contribute to the cognitive impairments in individuals with FXS.

    Funded by: Telethon: GGP05269; Wellcome Trust: 056523, 077155

    Nature neuroscience 2007;10;5;578-87

  • Replication of genome-wide association signals in UK samples reveals risk loci for type 2 diabetes.

    Zeggini E, Weedon MN, Lindgren CM, Frayling TM, Elliott KS, Lango H, Timpson NJ, Perry JR, Rayner NW, Freathy RM, Barrett JC, Shields B, Morris AP, Ellard S, Groves CJ, Harries LW, Marchini JL, Owen KR, Knight B, Cardon LR, Walker M, Hitman GA, Morris AD, Doney AS, Wellcome Trust Case Control Consortium (WTCCC), McCarthy MI and Hattersley AT

    Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Churchill Hospital, Oxford, OX3 7LJ, UK.

    The molecular mechanisms involved in the development of type 2 diabetes are poorly understood. Starting from genome-wide genotype data for 1924 diabetic cases and 2938 population controls generated by the Wellcome Trust Case Control Consortium, we set out to detect replicated diabetes association signals through analysis of 3757 additional cases and 5346 controls and by integration of our findings with equivalent data from other international consortia. We detected diabetes susceptibility loci in and around the genes CDKAL1, CDKN2A/CDKN2B, and IGF2BP2 and confirmed the recently described associations at HHEX/IDE and SLC30A8. Our findings provide insight into the genetic architecture of type 2 diabetes, emphasizing the contribution of multiple variants of modest effect. The regions identified underscore the importance of pathways influencing pancreatic beta cell development and function in the etiology of type 2 diabetes.

    Funded by: Medical Research Council: G0000934, G0500070; Wellcome Trust: 083948, 090532

    Science (New York, N.Y.) 2007;316;5829;1336-41

  • Pseudogenes in the ENCODE regions: consensus annotation, analysis of transcription, and evolution.

    Zheng D, Frankish A, Baertsch R, Kapranov P, Reymond A, Choo SW, Lu Y, Denoeud F, Antonarakis SE, Snyder M, Ruan Y, Wei CL, Gingeras TR, Guigó R, Harrow J and Gerstein MB

    Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA.

    Arising from either retrotransposition or genomic duplication of functional genes, pseudogenes are "genomic fossils" valuable for exploring the dynamics and evolution of genes and genomes. Pseudogene identification is an important problem in computational genomics, and is also critical for obtaining an accurate picture of a genome's structure and function. However, no consensus computational scheme for defining and detecting pseudogenes has been developed thus far. As part of the ENCyclopedia Of DNA Elements (ENCODE) project, we have compared several distinct pseudogene annotation strategies and found that different approaches and parameters often resulted in rather distinct sets of pseudogenes. We subsequently developed a consensus approach for annotating pseudogenes (derived from protein coding genes) in the ENCODE regions, resulting in 201 pseudogenes, two-thirds of which originated from retrotransposition. A survey of orthologs for these pseudogenes in 28 vertebrate genomes showed that a significant fraction (approximately 80%) of the processed pseudogenes are primate-specific sequences, highlighting the increasing retrotransposition activity in primates. Analysis of sequence conservation and variation also demonstrated that most pseudogenes evolve neutrally, and processed pseudogenes appear to have lost their coding potential immediately or soon after their emergence. In order to explore the functional implication of pseudogene prevalence, we have extensively examined the transcriptional activity of the ENCODE pseudogenes. We performed a systematic series of pseudogene-specific RACE analyses. These, together with complementary evidence derived from tiling microarrays and high throughput sequencing, demonstrated that at least a fifth of the 201 pseudogenes are transcribed in one or more cell lines or tissues.

    Funded by: NCI NIH HHS: N01CO12400; NHGRI NIH HHS: U01 HG003147, U01 HG003150, U01 HG003156, U01HG03147, U01HG03150, U01HG03156; PHS HHS: N01C012400; Wellcome Trust: 077198

    Genome research 2007;17;6;839-51