Sanger Institute - Publications 2008

Number of papers published in 2008: 308

  • Large-scale molecular analysis of a 34 Mb interval on chromosome 6q: major refinement of the RP25 interval.

    Abd El-Aziz MM, Barragan I, O'Driscoll C, Borrego S, Abu-Safieh L, Pieras JI, El-Ashry MF, Prigmore E, Carter N, Antinolo G and Bhattacharya SS

    Department of Molecular Genetics, Institute of Ophthalmology, London EC1V 9EL, UK.

    A large-scale bioinformatics and molecular analysis of a 34 Mb interval on chromosome 6q12 was undertaken as part of our ongoing study to identify the gene responsible for an autosomal recessive retinitis pigmentosa (arRP) locus, RP25. Extensive bioinformatics analysis indicated in excess of 110 genes within the region and we also noted unfinished sequence on chromosome 6q in the Human Genome Database, between 58 and 61.2 Mb. Forty-three genes within the RP25 interval were considered as good candidates for mutation screening. Direct sequence analysis of the selected genes in 7 Spanish families with arRP revealed a total of 244 sequence variants, of which 67 were novel but none were pathogenic. This, together with previous reports, excludes 60 genes within the interval (approximately 55%) as disease causing for RP. To investigate if copy number variation (CNV) exists within RP25, a comparative genomic hybridization (CGH) analysis was performed on a consanguineous family. A clone from the tiling path array, chr6tp-19C7, spanning approximately 100 kb was found to be deleted in all affected members of the family, leading to a major refinement of the interval. This will eventually have a significant impact on cloning of the RP25 gene.

    Annals of human genetics 2008;72;Pt 4;463-77

  • EYS, encoding an ortholog of Drosophila spacemaker, is mutated in autosomal recessive retinitis pigmentosa.

    Abd El-Aziz MM, Barragan I, O'Driscoll CA, Goodstadt L, Prigmore E, Borrego S, Mena M, Pieras JI, El-Ashry MF, Safieh LA, Shah A, Cheetham ME, Carter NP, Chakarova C, Ponting CP, Bhattacharya SS and Antinolo G

    Department of Molecular Genetics, Institute of Ophthalmology, London EC1V 9EL, UK.

    Using a positional cloning approach supported by comparative genomics, we have identified a previously unreported gene, EYS, at the RP25 locus on chromosome 6q12 commonly mutated in autosomal recessive retinitis pigmentosa. Spanning over 2 Mb, this is the largest eye-specific gene identified so far. EYS is independently disrupted in four other mammalian lineages, including that of rodents, but is well conserved from Drosophila to man and is likely to have a role in the modeling of retinal architecture.

    Funded by: Medical Research Council: MC_U137761446; Wellcome Trust: 077008

    Nature genetics 2008;40;11;1285-7

  • Contemporary approaches for modifying the mouse genome.

    Adams DJ and van der Weyden L

    Experimental Cancer Genetics, The Wellcome Trust Sanger Institute, Hinxton, United Kingdom.

    The mouse is a premier experimental organism that has contributed significantly to our understanding of vertebrate biology. Manipulation of the mouse genome via embryonic stem (ES) cell technology makes it possible to engineer an almost limitless repertoire of mutations to model human disease and assess gene function. In this review we outline recent advances in mouse experimental genetics and provide a "how-to" guide for those people wishing to access this technology. We also discuss new technologies, such as transposon-mediated mutagenesis, and resources of targeting vectors and ES cells, which are likely to dramatically accelerate the pace with which we can assess gene function in vivo, and the progress of forward and reverse genetic screens in mice.

    Funded by: Cancer Research UK; Wellcome Trust

    Physiological genomics 2008;34;3;225-38

  • Text-mining assisted regulatory annotation.

    Aerts S, Haeussler M, van Vooren S, Griffith OL, Hulpiau P, Jones SJ, Montgomery SB, Bergman CM and Open Regulatory Annotation Consortium

    Laboratory of Neurogenetics, Department of Molecular and Developmental Genetics, VIB, Leuven, B-3000, Belgium.

    Background: Decoding transcriptional regulatory networks and the genomic cis-regulatory logic implemented in their control nodes is a fundamental challenge in genome biology. High-throughput computational and experimental analyses of regulatory networks and sequences rely heavily on positive control data from prior small-scale experiments, but the vast majority of previously discovered regulatory data remains locked in the biomedical literature.

    Results: We develop text-mining strategies to identify relevant publications and extract sequence information to assist the regulatory annotation process. Using a vector space model to identify Medline abstracts from papers likely to have high cis-regulatory content, we demonstrate that document relevance ranking can assist the curation of transcriptional regulatory networks and estimate that, minimally, 30,000 papers harbor unannotated cis-regulatory data. In addition, we show that DNA sequences can be extracted from primary text with high cis-regulatory content and mapped to genome sequences as a means of identifying the location, organism and target gene information that is critical to the cis-regulatory annotation process.

    Conclusion: Our results demonstrate that text-mining technologies can be successfully integrated with genome annotation systems, thereby increasing the availability of annotated cis-regulatory data needed to catalyze advances in the field of gene regulation.

    Genome biology 2008;9;2;R31

  • Genomic-scale prioritization of drug targets: the TDR Targets database.

    Agüero F, Al-Lazikani B, Aslett M, Berriman M, Buckner FS, Campbell RK, Carmona S, Carruthers IM, Chan AW, Chen F, Crowther GJ, Doyle MA, Hertz-Fowler C, Hopkins AL, McAllister G, Nwaka S, Overington JP, Pain A, Paolini GV, Pieper U, Ralph SA, Riechers A, Roos DS, Sali A, Shanmugam D, Suzuki T, Van Voorhis WC and Verlinde CL

    Instituto de Investigaciones Biotecnológicas, Universidad Nacional de General San Martín, San Martín 1650, Buenos Aires, Argentina.

    The increasing availability of genomic data for pathogens that cause tropical diseases has created new opportunities for drug discovery and development. However, if the potential of such data is to be fully exploited, the data must be effectively integrated and be easy to interrogate. Here, we discuss the development of the TDR Targets database, which encompasses extensive genetic, biochemical and pharmacological data related to tropical disease pathogens, as well as computationally predicted druggability for potential targets and compound desirability information. By allowing the integration and weighting of this information, this database aims to facilitate the identification and prioritization of candidate drug targets for pathogens.

    Funded by: NIGMS NIH HHS: R01 GM054762-14

    Nature reviews. Drug discovery 2008;7;11;900-7

  • DNA sequence and structural properties as predictors of human and mouse promoters.

    Akan P and Deloukas P

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Promoters play a central role in gene regulation, yet our power to discriminate them from non-promoter sequences in higher eukaryotes is mainly restricted to those associated with CpG islands. Here, we examined in silico the promoters of 30,954 human and 18,083 mouse transcripts in the DBTSS database, to assess the impact of particular sequence and structural features (propeller twist, bendability and nucleosome positioning preference) on promoter classification and prediction. Our analysis showed that a stricter-than-traditional definition of CpG islands captures low and high CpG count promoter classes more accurately than the traditional one. We observed that both human and mouse promoter sequences are flexible with the exception of the TATA box and TSS, which are rigid regions irrespective of association with a CpG island. Therefore varying levels of structural flexibility in promoters may affect their accessibility to proteins, and hence their specificity. For all features investigated, averaged values across core promoters discriminated CpG island associated promoters from background, whereas the same did not hold for promoters without a CpG island. However, local changes around -34 to -23 (expected position of TATA box) and the TSS were informative in discriminating promoters (both classes) from non-promoter sequences. Additionally, we investigated ATG deserts and observed that they occur in all promoter sets except those with a TATA-box and without a CpG island in human. Interestingly, all mouse promoter sets showed ATG codon depletion irrespective of the presence of a TATA-box, possibly reflecting a weaker contribution to TSS specificity in mouse.

    Funded by: Wellcome Trust: 077011

    Gene 2008;410;1;165-76

  • Order within a mosaic distribution of mitochondrial c-type cytochrome biogenesis systems?

    Allen JW, Jackson AP, Rigden DJ, Willis AC, Ferguson SJ and Ginger ML

    Department of Biochemistry, University of Oxford, UK.

    Mitochondrial cytochromes c and c(1) are present in all eukaryotes that use oxygen as the terminal electron acceptor in the respiratory chain. Maturation of c-type cytochromes requires covalent attachment of the heme cofactor to the protein, and there are at least five distinct biogenesis systems that catalyze this post-translational modification in different organisms and organelles. In this study, we use biochemical data, comparative genomic and structural bioinformatics investigations to provide a holistic view of mitochondrial c-type cytochrome biogenesis and its evolution. There are three pathways for mitochondrial c-type cytochrome maturation, only one of which is present in prokaryotes. We analyze the evolutionary distribution of these biogenesis systems, which include the Ccm system (System I) and the enzyme heme lyase (System III). We conclude that heme lyase evolved once and, in many lineages, replaced the multicomponent Ccm system (present in the proto-mitochondrial endosymbiont), probably as a consequence of lateral gene transfer. We find no evidence of a System III precursor in prokaryotes, and argue that System III is incompatible with multi-heme cytochromes common to bacteria, but absent from eukaryotes. The evolution of the eukaryotic-specific protein heme lyase is strikingly unusual, given that this protein provides a function (thioether bond formation) that is also ubiquitous in prokaryotes. The absence of any known c-type cytochrome biogenesis system from the sequenced genomes of various trypanosome species indicates the presence of a third distinct mitochondrial pathway. Interestingly, this system attaches heme to mitochondrial cytochromes c that contain only one cysteine residue, rather than the usual two, within the heme-binding motif. The isolation of single-cysteine-containing mitochondrial cytochromes c from free-living kinetoplastids, Euglena and the marine flagellate Diplonema papillatum suggests that this unique form of heme attachment is restricted to, but conserved throughout, the protist phylum Euglenozoa.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/C508118/1; Wellcome Trust

    The FEBS journal 2008;275;10;2385-402

  • Data growth and its impact on the SCOP database: new developments.

    Andreeva A, Howorth D, Chandonia JM, Brenner SE, Hubbard TJ, Chothia C and Murzin AG

    MRC Centre for Protein Engineering, Hills Road, Cambridge CB2 0QH, UK.

    The Structural Classification of Proteins (SCOP) database is a comprehensive ordering of all proteins of known structure, according to their evolutionary and structural relationships. The SCOP hierarchy comprises the following levels: Species, Protein, Family, Superfamily, Fold and Class. While keeping the original classification scheme intact, we have changed the production of SCOP in order to cope with a rapid growth of new structural data and to facilitate the discovery of new protein relationships. We describe ongoing developments and new features implemented in SCOP. A new update protocol supports batch classification of new protein structures by their detected relationships at Family and Superfamily levels in contrast to our previous sequential handling of new structural data by release date. We introduce pre-SCOP, a preview of the SCOP developmental version that enables earlier access to the information on new relationships. We also discuss the impact of worldwide Structural Genomics initiatives, which are producing new protein structures at an increasing rate, on the rates of discovery and growth of protein families and superfamilies. SCOP can be accessed at

    Funded by: Medical Research Council: MC_U105192716; NIGMS NIH HHS: R01-GM073109; Wellcome Trust: 077198

    Nucleic acids research 2008;36;Database issue;D419-25

  • Toward an online repository of Standard Operating Procedures (SOPs) for (meta)genomic annotation.

    Angiuoli SV, Gussman A, Klimke W, Cochrane G, Field D, Garrity G, Kodira CD, Kyrpides N, Madupu R, Markowitz V, Tatusova T, Thomson N and White O

    Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland 21201, USA.

    The methodologies used to generate genome and metagenome annotations are diverse and vary between groups and laboratories. Descriptions of the annotation process are helpful in interpreting genome annotation data. Some groups have produced Standard Operating Procedures (SOPs) that describe the annotation process, but standards are lacking for structure and content of these descriptions. In addition, there is no central repository to store and disseminate procedures and protocols for genome annotation. We highlight the importance of SOPs for genome annotation and endorse an online repository of SOPs.

    Funded by: PHS HHS: HHSN2662004000386

    Omics : a journal of integrative biology 2008;12;2;137-41

  • Consistently replicating locus linked to migraine on 10q22-q23.

    Anttila V, Nyholt DR, Kallela M, Artto V, Vepsäläinen S, Jakkula E, Wennerström A, Tikka-Kleemola P, Kaunisto MA, Hämäläinen E, Widén E, Terwilliger J, Merikangas K, Montgomery GW, Martin NG, Daly M, Kaprio J, Peltonen L, Färkkilä M, Wessman M and Palotie A

    Biomedicum Helsinki, Research Program in Molecular Medicine, University of Helsinki, 00290 Helsinki, Finland.

    Here, we present the results of two genome-wide scans in two diverse populations in which a consistent use of recently introduced migraine-phenotyping methods detects and replicates a locus on 10q22-q23, with an additional independent replication. No genetic variants have been convincingly established in migraine, and although several loci have been reported, none of them has been consistently replicated. We employed the three known migraine-phenotyping methods (clinical end diagnosis, latent-class analysis, and trait-component analysis) with robust multiple testing correction in a large sample set of 1675 individuals from 210 migraine families from Finland and Australia. Genome-wide multipoint linkage analysis that used the Kong and Cox exponential model in Finns detected a locus on 10q22-q23 with highly significant evidence of linkage (LOD 7.68 at 103 cM in female-specific analysis). The Australian sample showed a LOD score of 3.50 at the same locus (100 cM), as did the independent Finnish replication study (LOD score 2.41, at 102 cM). In addition, four previously reported loci on 8q21, 14q21, 18q12, and Xp21 were also replicated. A shared-segment analysis of 10q22-q23 linked Finnish families identified a 1.6-9.5 cM segment, centered on 101 cM, which shows in-family homology in 95% of affected Finns. This region was further studied with 1323 SNPs. Although no significant association was observed, four regions warranting follow-up studies were identified. These results support the use of symptomology-based phenotyping in migraine and suggest that the 10q22-q23 locus probably contains one or more migraine susceptibility variants.

    Funded by: NCRR NIH HHS: U54 RR020278; NIAAA NIH HHS: AA007535, AA013320, AA013326, AA014041, AA07728, AA10249, AA11998; NINDS NIH HHS: R01 NS37675

    American journal of human genetics 2008;82;5;1051-63

  • Bortezomib-induced peripheral neuropathy in multiple myeloma: a comprehensive review of the literature.

    Argyriou AA, Iconomou G and Kalofonos HP

    Department of Neurology, Saint Andrew's General Hospital of Patras, Greece.

    Bortezomib has demonstrated significant activity in clinical trials, mainly against recurrent or newly diagnosed multiple myeloma (MM). Peripheral neuropathy is a significant toxicity of bortezomib, requiring dose modification and potential changes in the treatment plan when it occurs. The mechanism underlying bortezomib-induced peripheral neuropathy (BIPN) is unknown. Metabolic changes resulting from the accumulation of bortezomib in the dorsal root ganglia cells, mitochondrial-mediated dysregulation of Ca(++) homeostasis, and dysregulation of neurotrophins may contribute to the pathogenesis of BIPN. It is increasingly recognized that BIPN may be a proteasome inhibitor class effect, producing primarily a small fiber and painful, axonal, sensory distal neuropathy. Incidence of BIPN is mainly related to various risk factors, including cumulative dose and evidence of preexisting neuropathy. Assessment of BIPN is based primarily on neurologic clinical examination and neurophysiologic methods. To date, apart from the use of dose reduction and schedule change algorithm, there is no effective treatment with neuroprotective agents for BIPN. Analgesics, tricyclic antidepressants, anticonvulsants, and vitamin supplements have been used as symptomatic treatment against bortezomib-associated neuropathic pain with some success. This review looks critically at the pathogenesis, incidence, risk factors, diagnosis, characteristics, and management of BIPN, and highlights areas for future research.

    Blood 2008;112;5;1593-9

  • Tumor necrosis factor SNP haplotypes are associated with iron deficiency anemia in West African children.

    Atkinson SH, Rockett KA, Morgan G, Bejon PA, Sirugo G, O'Connell MA, Hanchard N, Kwiatkowski DP and Prentice AM

    Medical Research Council (MRC) Keneba and MRC Laboratories, Banjul, The Gambia.

    Plasma levels of tumor necrosis factor-alpha (TNF-alpha) are significantly raised in malaria infection and TNF-alpha is thought to inhibit intestinal iron absorption and macrophage iron release. This study investigated putative functional single nucleotide polymorphisms (SNPs) and haplotypes across the major histocompatibility complex (MHC) class III region, including TNF and its immediate neighbors nuclear factor of kappa light polypeptide gene enhancer in B cells inhibitor-like 1 (IkappaBL) and lymphotoxin alpha (LTA), in relation to nutritional iron status and anemia, in a cohort of 780 children across a malaria season. The prevalence of iron deficiency anemia (IDA) increased over the malaria season (P < .001). The TNF(-308) AA genotype was associated with an increased risk of iron deficiency (adjusted OR 8.1; P = .001) and IDA (adjusted OR 5.1; P = .01) at the end of the malaria season. No genotypes were associated with IDA before the malaria season. Thus, TNF appears to be a risk factor for iron deficiency and IDA in children in a malaria-endemic environment and this is likely to be due to a TNF-alpha-induced block in iron absorption.

    Funded by: Medical Research Council

    Blood 2008;112;10;4276-83

  • Epigenetic marking prepares the human HOXA cluster for activation during differentiation of pluripotent cells.

    Atkinson SP, Koch CM, Clelland GK, Willcox S, Fowler JC, Stewart R, Lako M, Dunham I and Armstrong L

    North East Institute for Stem Cell Research, University of Newcastle upon Tyne, International Centre for Life, Newcastle upon Tyne, United Kingdom.

    Activation of Hox gene clusters is an early event in embryonic development since individual members play important roles in patterning of the body axis. Their functions require precise control of spatiotemporal expression to provide positional information for the cells of the developing embryo, and the manner by which this control is achieved has generated considerable interest. The situation is different in pluripotent cells, where HOX genes are not expressed but are held in potentio as bivalent chromatin domains, which are resolved upon differentiation to permit HOX cluster activation. In this study we have used differentiation of the pluripotent embryonal carcinoma cell line NTera2SP12 and the human embryonic stem cell line H9 to examine epigenetic changes that accompany activation of the HOXA cluster and show that specific genomic loci are marked by lysine methylation of histone H3 (H3K4 tri- and dimethyl, H3K9 trimethyl) and acetylation of histone H4 even in the undifferentiated cells. The precise locations of such modified histones may be involved in controlling the colinear expression of genes from the cluster.

    Stem cells (Dayton, Ohio) 2008;26;5;1174-85

  • Genome sequence of Staphylococcus aureus strain Newman and comparative analysis of staphylococcal genomes: polymorphism and evolution of two major pathogenicity islands.

    Baba T, Bae T, Schneewind O, Takeuchi F and Hiramatsu K

    Department of Microbiology and Infection Control Science, Juntendo University, Tokyo 113-8421, Japan.

    Strains of Staphylococcus aureus, an important human pathogen, display up to 20% variability in their genome sequence, and most sequence information is available for human clinical isolates that have not been subjected to genetic analysis of virulence attributes. S. aureus strain Newman, which was also isolated from a human infection, displays robust virulence properties in animal models of disease and has already been extensively analyzed for its molecular traits of staphylococcal pathogenesis. We report here the complete genome sequence of S. aureus Newman, which carries four integrated prophages, as well as two large pathogenicity islands. In agreement with the view that S. aureus Newman prophages contribute important properties to pathogenesis, fewer virulence factors are found outside of the prophages than for the highly virulent strain MW2. The absence of drug resistance genes reflects the general antibiotic-susceptible phenotype of S. aureus Newman. Phylogenetic analyses reveal clonal relationships between the staphylococcal strains Newman, COL, NCTC8325, and USA300 and a greater evolutionary distance to strains MRSA252, MW2, MSSA476, N315, Mu50, JH1, JH9, and RF122. However, polymorphism analysis of two large pathogenicity islands distributed among these strains shows that the two islands were acquired independently from the evolutionary pathway of the chromosomal backbones of staphylococcal genomes. Prophages and pathogenicity islands play central roles in S. aureus virulence and evolution.

    Journal of bacteriology 2008;190;1;300-10

  • High-throughput genotyping of Salmonella enterica serovar Typhi allowing geographical assignment of haplotypes and pathotypes within an urban District of Jakarta, Indonesia.

    Baker S, Holt K, van de Vosse E, Roumagnac P, Whitehead S, King E, Ewels P, Keniry A, Weill FX, Lightfoot D, van Dissel JT, Sanderson KE, Farrar J, Achtman M, Deloukas P and Dougan G

    The Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, United Kingdom.

    High-throughput epidemiological typing systems that provide phylogenetic and genotypic information are beneficial for tracking bacterial pathogens in the field. The incidence of Salmonella enterica serovar Typhi infection in Indonesia is high and is associated with atypical phenotypic traits such as expression of the j and the z66 flagellum antigens. Utilizing a high-throughput genotyping platform to investigate known nucleotide polymorphisms dispersed around the genome, we determined the haplotypes of 140 serovar Typhi isolates associated with Indonesia. We identified nine distinct serovar Typhi haplotypes circulating in Indonesia for more than 30 years, with eight of these present in a single Jakarta suburb within a 2-year period. One dominant haplotype, H59, is associated with j and z66 flagellum expression, representing a potential pathotype unique to Indonesia. Phylogenetic analysis suggests that H59 z66(+), j(+) isolates emerged relatively recently in terms of the origin of serovar Typhi and are geographically restricted. These data demonstrate the potential of high-throughput genotyping platforms for analyzing serovar Typhi populations in the field. The study also provides insight into the evolution of serovar Typhi and demonstrates the value of a molecular epidemiological technique that is exchangeable, that is internet friendly, and that has global utility.

    Funded by: Wellcome Trust

    Journal of clinical microbiology 2008;46;5;1741-6

  • Mobilization of the incQ plasmid R300B with a chromosomal conjugation system in Salmonella enterica serovar typhi.

    Baker S, Pickard D, Whitehead S, Farrar J and Dougan G

    The Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, United Kingdom.

    Salmonella pathogenicity island 7 (SPI-7) in Salmonella enterica serovar Typhi appears to be related to other genomic islands. Evidence suggests that SPI-7 is susceptible to spontaneous circularization, loss, and transposition. Here, we demonstrate that a region within SPI-7 has the ability to mobilize the small incQ plasmid R300B.

    Funded by: Wellcome Trust

    Journal of bacteriology 2008;190;11;4084-7

  • Dynamic nature of the proximal AZFc region of the human Y chromosome: multiple independent deletion and duplication events revealed by microsatellite analysis.

    Balaresque P, Bowden GR, Parkin EJ, Omran GA, Heyer E, Quintana-Murci L, Roewer L, Stoneking M, Nasidze I, Carvalho-Silva DR, Tyler-Smith C, de Knijff P and Jobling MA

    Department of Genetics, University of Leicester, Leicester, United Kingdom.

    The human Y chromosome shows frequent structural variants, some of which are selectively neutral, while others cause impaired fertility due to the loss of spermatogenic genes. The large-scale use of multiple Y-chromosomal microsatellites in forensic and population genetic studies can reveal such variants, through the absence or duplication of specific markers in haplotypes. We describe Y chromosomes in apparently normal males carrying null and duplicated alleles at the microsatellite DYS448, which lies in the proximal part of the azoospermia factor c (AZFc) region, important in spermatogenesis, and made up of "ampliconic" repeats that act as substrates for nonallelic homologous recombination (NAHR). Physical mapping in 26 DYS448 deletion chromosomes reveals that only three cases belong to a previously described class, representing independent occurrences of an approximately 1.5-Mb deletion mediated by recombination between the b1 and b3 repeat units. The remainder belong to five novel classes; none appears to be mediated through homologous recombination, and all remove some genes, but are likely to be compatible with normal fertility. A combination of deletion analysis with binary-marker and microsatellite haplotyping shows that the 26 deletions represent nine independent events. Nine DYS448 duplication chromosomes can be explained by four independent events. Some lineages have risen to high frequency in particular populations, in particular a deletion within haplogroup (hg) C(*)(xC3a,C3c) found in 18 Asian males. The nonrandom phylogenetic distribution of duplication and deletion events suggests possible structural predisposition to such mutations in hgs C and G.

    Funded by: Wellcome Trust: 057559, 077009

    Human mutation 2008;29;10;1171-80

  • Autosomal-dominant microtia linked to five tandem copies of a copy-number-variable region at chromosome 4p16.

    Balikova I, Martens K, Melotte C, Amyere M, Van Vooren S, Moreau Y, Vetrie D, Fiegler H, Carter NP, Liehr T, Vikkula M, Matthijs G, Fryns JP, Casteels I, Devriendt K and Vermeesch JR

    Center for Human Genetics, University of Leuven, 3000 Leuven, Belgium.

    Recently, large-scale benign copy-number variations (CNVs)--encompassing over 12% of the genome and containing genes considered to be dosage tolerant for human development--were uncovered in the human population. Here we present a family with a novel autosomal-dominantly inherited syndrome characterized by microtia, eye coloboma, and imperforation of the nasolacrimal duct. This phenotype is linked to a cytogenetically visible alteration at 4pter consisting of five copies of a copy-number-variable region, encompassing a low-copy repeat (LCR)-rich sequence. We demonstrate that the approximately 750 kb amplicon occurs in exact tandem copies. This is the first example of an amplified CNV associated with a Mendelian disorder, a discovery that implies that genome screens for genetic disorders should include the analysis of so-called benign CNVs and LCRs.

    Funded by: Wellcome Trust

    American journal of human genetics 2008;82;1;181-7

  • Breakpoint mapping and array CGH in translocations: comparison of a phenotypically normal and an abnormal cohort.

    Baptista J, Mercer C, Prigmore E, Gribble SM, Carter NP, Maloney V, Thomas NS, Jacobs PA and Crolla JA

    Wessex Regional Genetics Laboratory, Salisbury District Hospital, Salisbury, Wiltshire, UK.

    We report the analyses of breakpoints in 31 phenotypically normal and 14 abnormal carriers of balanced translocations. Our study assesses the differences between balanced translocations in normal carriers and those in abnormal carriers, focusing on the presence of genomic imbalances at the breakpoints or elsewhere in the genome, presence of cryptic chromosome rearrangements, and gene disruption. Our hypothesis is that all four features will be associated with phenotypic abnormalities and absent or much less frequent in a normal population. In the normal cohort, we identified neither genomic imbalances at the breakpoints or elsewhere in the genome nor cryptic chromosome rearrangements. In contrast, we identified candidate disease-causing imbalances in 4/14 abnormal patients. These were three breakpoint associated deletions and three deletions unrelated to the breakpoints. All six de novo deletions originated on the paternally inherited chromosome. Additional complexity was also present in one of these cases. Gene disruption by the breakpoints was present in 16/31 phenotypically normal individuals and in 5/14 phenotypically abnormal patients. Our results show that translocations in phenotypically abnormal patients are molecularly distinct from those in normal individuals: the former are more likely to be associated with genomic imbalances at the breakpoints or elsewhere and with chromosomal complexity, whereas the frequency of gene disruption is similar in both normal and abnormal translocation carriers.

    Funded by: Wellcome Trust

    American journal of human genetics 2008;82;4;927-36

  • A robust statistical method for case-control association testing with copy number variation.

    Barnes C, Plagnol V, Fitzgerald T, Redon R, Marchini J, Clayton D and Hurles ME

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Copy number variation (CNV) is pervasive in the human genome and can play a causal role in genetic diseases. The functional impact of CNV cannot be fully captured through linkage disequilibrium with SNPs. These observations motivate the development of statistical methods for performing direct CNV association studies. We show through simulation that current tests for CNV association are prone to false-positive associations in the presence of differential errors between cases and controls, especially if quantitative CNV measurements are noisy. We present a statistical framework for performing case-control CNV association studies that applies likelihood ratio testing of quantitative CNV measurements in cases and controls. We show that our methods are robust to differential errors and noisy data and can achieve maximal theoretical power. We illustrate the power of these methods for testing for association with binary and quantitative traits, and have made this software available as the R package CNVtools.

    Funded by: Wellcome Trust: 061860

    Nature genetics 2008;40;10;1245-52

  • Genome-wide association defines more than 30 distinct susceptibility loci for Crohn's disease.

    Barrett JC, Hansoul S, Nicolae DL, Cho JH, Duerr RH, Rioux JD, Brant SR, Silverberg MS, Taylor KD, Barmada MM, Bitton A, Dassopoulos T, Datta LW, Green T, Griffiths AM, Kistner EO, Murtha MT, Regueiro MD, Rotter JI, Schumm LP, Steinhart AH, Targan SR, Xavier RJ, NIDDK IBD Genetics Consortium, Libioulle C, Sandor C, Lathrop M, Belaiche J, Dewit O, Gut I, Heath S, Laukens D, Mni M, Rutgeerts P, Van Gossum A, Zelenika D, Franchimont D, Hugot JP, de Vos M, Vermeire S, Louis E, Belgian-French IBD Consortium, Wellcome Trust Case Control Consortium, Cardon LR, Anderson CA, Drummond H, Nimmo E, Ahmad T, Prescott NJ, Onnie CM, Fisher SA, Marchini J, Ghori J, Bumpstead S, Gwilliam R, Tremelling M, Deloukas P, Mansfield J, Jewell D, Satsangi J, Mathew CG, Parkes M, Georges M and Daly MJ

    Bioinformatics and Statistical Genetics, Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford OX3 7BN, UK.

    Several risk factors for Crohn's disease have been identified in recent genome-wide association studies. To advance gene discovery further, we combined data from three studies on Crohn's disease (a total of 3,230 cases and 4,829 controls) and carried out replication in 3,664 independent cases with a mixture of population-based and family-based controls. The results strongly confirm 11 previously reported loci and provide genome-wide significant evidence for 21 additional loci, including the regions containing STAT3, JAK2, ICOSLG, CDKAL1 and ITLN1. The expanded molecular understanding of the basis of this disease offers promise for informed therapeutic development.

    Funded by: Chief Scientist Office: CZB/4/540, ETM/137, ETM/75; Medical Research Council: G0000934, G0600329, G0800759; NIAID NIH HHS: AI06277, R01 AI062773, R01 AI062773-02; NIDDK NIH HHS: DK064869, DK62413, DK62420, DK62422, DK62423, DK62429, DK62431, DK62432, P30 DK040561-13, P30 DK063491-019004, P30 DK063491-029004, P30 DK063491-039004, P30 DK063491-049004, P30 DK063491-05, R01 DK064869-04, U01 DK062413-06, U01 DK062420, U01 DK062420-01, U01 DK062420-02, U01 DK062420-03, U01 DK062420-04, U01 DK062420-05, U01 DK062420-06, U01 DK062422, U01 DK062422-07, U01 DK062423, U01 DK062423-06, U01 DK062429, U01 DK062429-07, U01 DK062431, U01 DK062431-06; Wellcome Trust: 068545/Z/02

    Nature genetics 2008;40;8;955-62

  • Population-specific risk of type 2 diabetes conferred by HNF4A P2 promoter variants: a lesson for replication studies.

    Barroso I, Luan J, Wheeler E, Whittaker P, Wasson J, Zeggini E, Weedon MN, Hunt S, Venkatesh R, Frayling TM, Delgado M, Neuman RJ, Zhao J, Sherva R, Glaser B, Walker M, Hitman G, McCarthy MI, Hattersley AT, Permutt MA, Wareham NJ and Deloukas P

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Objective: Single nucleotide polymorphisms (SNPs) in the P2 promoter region of HNF4A were originally shown to be associated with predisposition for type 2 diabetes in Finnish, Ashkenazi, and, more recently, Scandinavian populations, but they generated conflicting results in additional populations. We aimed to investigate whether data from a large-scale mapping approach would replicate this association in novel Ashkenazi samples and in U.K. populations and whether these data would allow us to refine the association signal.

    Research design and methods: Using a dense linkage disequilibrium map of 20q, we selected SNPs from a 10-Mb interval centered on HNF4A. In a staged approach, we first typed 4,608 SNPs in case-control populations from four U.K. populations and an Ashkenazi population (n = 2,516). In phase 2, a subset of 763 SNPs was genotyped in 2,513 additional samples from the same populations.

    Results: Combined analysis of both phases demonstrated association between HNF4A P2 SNPs (rs1884613 and rs2144908) and type 2 diabetes in the Ashkenazim (n = 991; P < 1.6 × 10^-6). Importantly, these associations are significant in a subset of Ashkenazi samples (n = 531) not previously tested for association with P2 SNPs (odds ratio [OR] approximately 1.7; P < 0.002), thus providing replication within the Ashkenazim. In the U.K. populations, this association was not significant (n = 4,022; P > 0.5), and the estimate for the OR was much smaller (OR 1.04 [95% CI 0.91-1.19]).

    Conclusions: These data indicate that the risk conferred by HNF4A P2 is significantly different between U.K. and Ashkenazi populations (P < 0.00007), suggesting that the underlying causal variant remains unidentified. Interactions with other genetic or environmental factors may also contribute to this difference in risk between populations.
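    The U.K. figures quoted in the results (OR 1.04, 95% CI 0.91-1.19, P > 0.5) can be cross-checked against each other. This is a minimal sketch, assuming the interval is a Wald interval that is symmetric on the log scale and that a normal approximation holds. The abstract does not state how the interval was derived, so the implied standard error and p-value are approximations, not the authors' values.

```python
from math import erfc, log, sqrt

odds_ratio = 1.04             # reported point estimate, U.K. populations
ci_low, ci_high = 0.91, 1.19  # reported 95% confidence interval

# A Wald interval is symmetric on the log scale, so the geometric mean
# of the bounds should recover the point estimate.
geometric_mean = sqrt(ci_low * ci_high)

# Standard error of log(OR) implied by the interval width (assumption: Wald interval).
se_log_or = (log(ci_high) - log(ci_low)) / (2 * 1.96)

# Two-sided p-value for the null hypothesis OR = 1 under a normal approximation.
z = log(odds_ratio) / se_log_or
p_value = erfc(abs(z) / sqrt(2))

print(f"geometric mean of CI bounds: {geometric_mean:.3f}")
print(f"implied SE of log(OR): {se_log_or:.3f}")
print(f"approximate two-sided p: {p_value:.2f}")
```

    Under these assumptions the interval, the point estimate and the reported P > 0.5 are mutually consistent.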

    Funded by: Medical Research Council: MC_U106179471; PHS HHS: R01K049583; Wellcome Trust: 076113, 077016, 079557

    Diabetes 2008;57;11;3161-5

  • Re-evaluation of putative rheumatoid arthritis susceptibility genes in the post-genome wide association study era and hypothesis of a key pathway underlying susceptibility.

    Barton A, Thomson W, Ke X, Eyre S, Hinks A, Bowes J, Gibbons L, Plant D, Wellcome Trust Case Control Consortium, Wilson AG, Marinou I, Morgan A, Emery P, YEAR consortium, Steer S, Hocking L, Reid DM, Wordsworth P, Harrison P and Worthington J

    Arc-Epidemiology Unit, Stopford Building, The University of Manchester, Manchester, UK.

    Rheumatoid arthritis (RA) is an archetypal, common, complex autoimmune disease with both genetic and environmental contributions to disease aetiology. Two novel RA susceptibility loci have been reported from recent genome-wide and candidate gene association studies. We, therefore, investigated the evidence for association of the STAT4 and TRAF1/C5 loci with RA using imputed data from the Wellcome Trust Case Control Consortium (WTCCC). No evidence for association of variants mapping to the TRAF1/C5 gene was detected in the 1860 RA cases and 2930 control samples tested in that study. Variants mapping to the STAT4 gene did show evidence for association (rs7574865, P = 0.04). Given the association of the TRAF1/C5 locus in two previous large case-control series from populations of European descent and the evidence for association of the STAT4 locus in the WTCCC study, single nucleotide polymorphisms mapping to these loci were tested for association with RA in an independent UK series comprising DNA from >3000 cases with disease and >3000 controls and a combined analysis including the WTCCC data was undertaken. We confirm association of the STAT4 and the TRAF1/C5 loci with RA bringing to 5 the number of confirmed susceptibility loci. The effect sizes are less than those reported previously but are likely to be a more accurate reflection of the true effect size given the larger size of the cohort investigated in the current study.

    Funded by: Arthritis Research UK: 17552; Medical Research Council: G0000934

    Human molecular genetics 2008;17;15;2274-9

  • Rheumatoid arthritis susceptibility loci at chromosomes 10p15, 12q13 and 22q13.

    Barton A, Thomson W, Ke X, Eyre S, Hinks A, Bowes J, Plant D, Gibbons LJ, Wellcome Trust Case Control Consortium, YEAR Consortium, BIRAC Consortium, Wilson AG, Bax DE, Morgan AW, Emery P, Steer S, Hocking L, Reid DM, Wordsworth P, Harrison P and Worthington J

    Arthritis Research Campaign, Epidemiology Unit, The University of Manchester, Manchester, UK.

    The WTCCC study identified 49 SNPs putatively associated with rheumatoid arthritis at P = 1 × 10^-4 to 1 × 10^-5 (tier 3). Here we show that three of these SNPs, mapping to chromosome 10p15 (rs4750316), 12q13 (rs1678542) and 22q13 (rs3218253), are also associated (trend P = 4 × 10^-5, P = 4 × 10^-4 and P = 4 × 10^-4, respectively) in a validation study of 4,106 individuals with rheumatoid arthritis and an expanded reference group of 11,238 subjects, confirming them as true susceptibility loci in individuals of European ancestry.

    Funded by: Arthritis Research UK: 17552; Medical Research Council: G0000934; Wellcome Trust: 068545/Z/02

    Nature genetics 2008;40;10;1156-9

  • Convergent evolution in the genetic basis of Müllerian mimicry in Heliconius butterflies.

    Baxter SW, Papa R, Chamberlain N, Humphray SJ, Joron M, Morrison C, ffrench-Constant RH, McMillan WO and Jiggins CD

    Department of Zoology, University of Cambridge, Cambridge, United Kingdom.

    The neotropical butterflies Heliconius melpomene and H. erato are Müllerian mimics that display the same warningly colored wing patterns in local populations, yet pattern diversity between geographic regions. Linkage mapping has previously shown convergent red wing phenotypes in these species are controlled by loci on homologous chromosomes. Here, AFLP bulk segregant analysis using H. melpomene crosses identified genetic markers tightly linked to two red wing-patterning loci. These markers were used to screen a H. melpomene BAC library and a tile path was assembled spanning one locus completely and part of the second. Concurrently, a similar strategy was used to identify a BAC clone tightly linked to the locus controlling the mimetic red wing phenotypes in H. erato. A methionine rich storage protein (MRSP) gene was identified within this BAC clone, and comparative genetic mapping shows red wing color loci are in homologous regions of the genome of H. erato and H. melpomene. Subtle differences in these convergent phenotypes imply they evolved independently using somewhat different developmental routes, but are nonetheless regulated by the same switch locus. Genetic mapping of MRSP in a third related species, the "tiger" patterned H. numata, has no association with wing patterning and shows no evidence for genomic translocation of wing-patterning loci.

    Funded by: Biotechnology and Biological Sciences Research Council

    Genetics 2008;180;3;1567-77

  • A novel 154-bp deletion in the human mitochondrial DNA control region in healthy individuals.

    Behar DM, Blue-Smith J, Soria-Hernanz DF, Tzur S, Hadid Y, Bormans C, Moen A, Tyler-Smith C, Quintana-Murci L, Wells RS and Genographic Consortium

    Molecular Medicine Laboratory, Rambam Health Care Campus, Haifa, Israel.

    The biological role of the mitochondrial DNA (mtDNA) control region in mtDNA replication remains unclear. In a worldwide survey of mtDNA variation in the general population, we have identified a novel large control region deletion spanning positions 16154 to 16307 (m.16154_16307del154). The population prevalence of this deletion is low, since it was only observed in 1 out of over 120,000 mtDNA genomes studied. The deletion is present in a nonheteroplasmic state, and was transmitted by a mother to her two sons with no apparent past or present disease conditions. The identification of this large deletion in healthy individuals challenges the current view of the control region as playing a crucial role in the regulation of mtDNA replication, and supports the existence of a more complex system of multiple or epigenetically-determined replication origins.
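    The 154-bp size in the title follows from the coordinates given above, because both endpoint positions are part of the deletion. A minimal sketch of that arithmetic (the variable names are illustrative only):

```python
# Coordinates from the abstract: first and last deleted mtDNA positions.
start, end = 16154, 16307

# Both endpoints are deleted, so the length is inclusive of each.
deletion_length = end - start + 1

# Rebuild the variant name in the form used in the abstract.
variant_name = f"m.{start}_{end}del{deletion_length}"
print(deletion_length, variant_name)
```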

    Funded by: Wellcome Trust: 077009

    Human mutation 2008;29;12;1387-91

  • The dawn of human matrilineal diversity.

    Behar DM, Villems R, Soodyall H, Blue-Smith J, Pereira L, Metspalu E, Scozzari R, Makkan H, Tzur S, Comas D, Bertranpetit J, Quintana-Murci L, Tyler-Smith C, Wells RS, Rosset S and Genographic Consortium

    Molecular Medicine Laboratory, Rambam Health Care Campus, Haifa 31096, Israel.

    The quest to explain demographic history during the early part of human evolution has been limited because of the scarce paleoanthropological record from the Middle Stone Age. To shed light on the structure of the mitochondrial DNA (mtDNA) phylogeny at the dawn of Homo sapiens, we constructed a matrilineal tree composed of 624 complete mtDNA genomes from sub-Saharan Hg L lineages. We paid particular attention to the Khoi and San (Khoisan) people of South Africa because they are considered to be a unique relic of hunter-gatherer lifestyle and to carry paternal and maternal lineages belonging to the deepest clades known among modern humans. Both the tree phylogeny and coalescence calculations suggest that Khoisan matrilineal ancestry diverged from the rest of the human mtDNA pool 90,000-150,000 years before present (ybp) and that at least five additional, currently extant maternal lineages existed during this period in parallel. Furthermore, we estimate that a minimum of 40 other evolutionarily successful lineages flourished in sub-Saharan Africa during the period of modern human dispersal out of Africa approximately 60,000-70,000 ybp. Only much later, at the beginning of the Late Stone Age, about 40,000 ybp, did introgression of additional lineages occur into the Khoisan mtDNA pool. This process was further accelerated during the recent Bantu expansions. Our results suggest that the early settlement of humans in Africa was already matrilineally structured and involved small, separately evolving isolated populations.

    Funded by: Wellcome Trust

    American journal of human genetics 2008;82;5;1130-40

  • Genomic 'valleys of death'.

    Bentley S

    Nature reviews. Microbiology 2008;6;4;260-1

  • Genome of the actinomycete plant pathogen Clavibacter michiganensis subsp. sepedonicus suggests recent niche adaptation.

    Bentley SD, Corton C, Brown SE, Barron A, Clark L, Doggett J, Harris B, Ormond D, Quail MA, May G, Francis D, Knudson D, Parkhill J and Ishimaru CA

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, United Kingdom.

    Clavibacter michiganensis subsp. sepedonicus is a plant-pathogenic bacterium and the causative agent of bacterial ring rot, a devastating agricultural disease under strict quarantine control and zero tolerance in the seed potato industry. This organism appears to be largely restricted to an endophytic lifestyle, proliferating within plant tissues and unable to persist in the absence of plant material. Analysis of the genome sequence of C. michiganensis subsp. sepedonicus and comparison with the genome sequences of related plant pathogens revealed a dramatic recent evolutionary history. The genome contains 106 insertion sequence elements, which appear to have been active in extensive rearrangement of the chromosome compared to that of Clavibacter michiganensis subsp. michiganensis. There are 110 pseudogenes with overrepresentation in functions associated with carbohydrate metabolism, transcriptional regulation, and pathogenicity. Genome comparisons also indicated that there is substantial gene content diversity within the species, probably due to differential gene acquisition and loss. These genomic features and evolutionary dating suggest that there was recent adaptation for life in a restricted niche where nutrient diversity and perhaps competition are low, correlated with a reduced ability to exploit previously occupied complex niches outside the plant. Toleration of factors such as multiplication and integration of insertion sequence elements, genome rearrangements, and functional disruption of many genes and operons seems to indicate that there has been general relaxation of selective pressure on a large proportion of the genome.

    Journal of bacteriology 2008;190;6;2150-60

  • Susceptibility loci for intracranial aneurysm in European and Japanese populations.

    Bilguvar K, Yasuno K, Niemelä M, Ruigrok YM, von Und Zu Fraunberg M, van Duijn CM, van den Berg LH, Mane S, Mason CE, Choi M, Gaál E, Bayri Y, Kolb L, Arlier Z, Ravuri S, Ronkainen A, Tajima A, Laakso A, Hata A, Kasuya H, Koivisto T, Rinne J, Ohman J, Breteler MM, Wijmenga C, State MW, Rinkel GJ, Hernesniemi J, Jääskeläinen JE, Palotie A, Inoue I, Lifton RP and Günel M

    Department of Neurosurgery, Neurobiology, Yale Center for Human Genetics and Genomics, Yale University School of Medicine, New Haven, CT 06510, USA.

    Stroke is the world's third leading cause of death. One cause of stroke, intracranial aneurysm, affects approximately 2% of the population and accounts for 500,000 hemorrhagic strokes annually in mid-life (median age 50), most often resulting in death or severe neurological impairment. The pathogenesis of intracranial aneurysm is unknown, and because catastrophic hemorrhage is commonly the first sign of disease, early identification is essential. We carried out a multistage genome-wide association study (GWAS) of Finnish, Dutch and Japanese cohorts including over 2,100 intracranial aneurysm cases and 8,000 controls. Genome-wide genotyping of the European cohorts and replication studies in the Japanese cohort identified common SNPs on chromosomes 2q, 8q and 9p that show significant association with intracranial aneurysm with odds ratios 1.24-1.36. The loci on 2q and 8q are new, whereas the 9p locus was previously found to be associated with arterial diseases, including intracranial aneurysm. Associated SNPs on 8q likely act via SOX17, which is required for formation and maintenance of endothelial cells, suggesting a role in development and repair of the vasculature; CDKN2A at 9p may have a similar role. These findings have implications for the pathophysiology, diagnosis and therapy of intracranial aneurysm.

    Funded by: Howard Hughes Medical Institute; NINDS NIH HHS: R01 NS057756, U24 NS051869

    Nature genetics 2008;40;12;1472-7

  • Interaction of Salmonella enterica serovar Typhi with cultured epithelial cells: roles of surface structures in adhesion and invasion.

    Bishop A, House D, Perkins T, Baker S, Kingsley RA and Dougan G

    The Centre for Molecular Microbiology and Infection, Faculty of Life Sciences, Division of Molecular and Cell Biology, Imperial College London, London SW7 2AZ, UK.

    In this study we investigate the ability of Salmonella enterica serovar Typhi (S. Typhi) surface structures to influence invasion and adhesion in epithelial cell assay systems. In general, S. Typhi was found to be less adherent, invasive and cytotoxic than S. enterica serovar Typhimurium (S. Typhimurium). Culture conditions had little effect on adhesion of S. Typhi to cultured cells but had a marked influence on invasion. In contrast, bacterial growth conditions did not influence S. Typhi apical invasion of polarized cells. The levels of S. Typhi, but not S. Typhimurium, invasion were increased by application of bacteria to the basolateral surface of polarized cells. Expression of virulence (Vi) capsule by S. Typhi resulted in a modest reduction in adhesion, but profoundly reduced levels of invasion of non-polarized cells. However, Vi capsule expression had no effect on invasion of the apical or basolateral surfaces of polarized cells. Mutation of the staA, tcfA or pilS genes did not affect invasion or adhesion in either the presence or the absence of Vi capsule.

    Funded by: Wellcome Trust: 076962

    Microbiology (Reading, England) 2008;154;Pt 7;1914-26

  • Rebound depolarization in single units of the ventral cochlear nucleus: a contribution to grouping by common onset?

    Bleeck S, Ingham NJ, Verhey JL and Winter IM

    Institute of Sound and Vibration Research, University of Southampton, University Road, Southampton, SO17 1BJ, UK.

    Simultaneous grouping by common onset time is believed to be a powerful cue in auditory perception; components that start or stop roughly at the same time are judged as far more likely to have originated from the same source. Here we report a simple experiment designed to simulate a complex psychophysical paradigm first described by Darwin and Sutherland [(1984) Grouping frequency components of vowels. When is a harmonic not a harmonic? Quarterly J of Experimental Psychology: Hum Exp Psychol 36(A):193-208]. It is possible to change the perception of the vowel /I/ to /ε/ by manipulating the harmonics around the first formant (F1). Increasing the amplitude of one harmonic around F1 caused the perception of the vowel to change from /I/ to /ε/. Extending the increased component before the vowel could, however, greatly reduce this change. The role of neural adaptation in this effect was questioned by repeating the experiment but this time using a 'captor' tone which was switched on with the asynchronous harmonic and off when the vowel started. This time the vowel percept did change in a fashion analogous to the effect of an increase in the amplitude of the fourth harmonic (which is close to F1). This effect was explained by assuming that the captor had grouped with the leading portion of the asynchronous component enabling the remainder of the asynchronous component to be grouped with the remainder of the components. We propose a relatively low-level neuronal explanation for this grouping effect: the captor reduces the neural response to the leading segment of the asynchronous component by activating across-frequency suppression, either from the cochlea, or acting via a wideband inhibitor in the ventral cochlear nucleus. The reduction in neural response results in a release from adaptation with the offset of the captor terminating the inhibition, such that the response to the continuation of that component is now enhanced. Using a simplified paradigm we show that both primary-like and chopper units in the ventral cochlear nucleus of the anesthetized guinea pig may show a rebound in excitation when a captor is positioned so as to stimulate the suppressive sidebands in its receptive field. The strength of the rebound was positively correlated with the strength of the suppression. These and other results are consistent with the view that low-level mechanisms underlie the psychophysical captor effect.

    Funded by: Biotechnology and Biological Sciences Research Council

    Neuroscience 2008;154;1;139-46

  • Widespread duplications in the genomes of laboratory stocks of Dictyostelium discoideum.

    Bloomfield G, Tanaka Y, Skelton J, Ivens A and Kay RR

    MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 0QH, UK.

    Background: Duplications of stretches of the genome are an important source of individual genetic variation, but their unrecognized presence in laboratory organisms would be a confounding variable for genetic analysis.

    Results: We report here that duplications of 15 kb or more are common in the genome of the social amoeba Dictyostelium discoideum. Most stocks of the axenic 'workhorse' strains Ax2 and Ax3/4 obtained from different laboratories can be expected to carry different duplications. The auxotrophic strains DH1 and JH10 also bear previously unreported duplications. Strain Ax3/4 is known to carry a large duplication on chromosome 2 and this structure shows evidence of continuing instability; we find a further variable duplication on chromosome 5. These duplications are lacking in Ax2, which has instead a small duplication on chromosome 1. Stocks of the type isolate NC4 are similarly variable, though we have identified some approximating the assumed ancestral genotype. More recent wild-type isolates are almost without large duplications, but we can identify small deletions or regions of high divergence, possibly reflecting responses to local selective pressures. Duplications are scattered through most of the genome, and can be stable enough to reconstruct genealogies spanning decades of the history of the NC4 lineage. The expression level of many duplicated genes is increased with dosage, but for others it appears that some form of dosage compensation occurs.

    Conclusion: The genetic variation described here must underlie some of the phenotypic variation observed between strains from different laboratories. We suggest courses of action to alleviate the problem.

    Funded by: Medical Research Council; Wellcome Trust: 06724

    Genome biology 2008;9;4;R75

  • Contributions of the genome sequence of Erwinia amylovora to the fire blight community.

    Bocsanczy AM, Beer SV, Perna NT, Biehl B, Glasner JD, Cartinhour SW, Schneider DJ, DeClerck GA, Sebaihia M, Parkhill J and Bentley S

    Acta horticulturae 2008;793;163-70

  • The Serine/threonine kinase Stk33 exhibits autophosphorylation and phosphorylates the intermediate filament protein Vimentin.

    Brauksiepe B, Mujica AO, Herrmann H and Schmidt ER

    Institute of Molecular Genetics, Johannes Gutenberg-University, Mainz, Germany.

    Background: Colocalization of Stk33 with vimentin by double immunofluorescence in certain cells indicated that vimentin might be a target for phosphorylation by the novel kinase Stk33. We therefore tested in vitro the ability of Stk33 to phosphorylate recombinant full length vimentin and amino-terminal truncated versions thereof. In order to prove that Stk33 and vimentin are also in vivo associated proteins co-immunoprecipitation experiments were carried out. For testing the enzymatic activity of immunoprecipitated Stk33 we incubated precipitated Stk33 with recombinant vimentin proteins. To investigate whether Stk33 binds directly to vimentin, an in vitro co-sedimentation assay was performed.

    Results: The results of the kinase assays demonstrate that Stk33 is able to specifically phosphorylate the non-alpha-helical amino-terminal domain of vimentin in vitro. Furthermore, co-immunoprecipitation experiments employing cultured cell extracts indicate that Stk33 and vimentin are associated in vivo. Immunoprecipitated Stk33 has enzymatic activity as shown by successful phosphorylation of recombinant vimentin proteins. The results of the co-sedimentation assay suggest that vimentin binds directly to Stk33 and that no additional protein mediates the association.

    Conclusion: We hypothesize that Stk33 is involved in the in vivo dynamics of the intermediate filament cytoskeleton by phosphorylating vimentin.

    BMC biochemistry 2008;9;25

  • The BioGRID Interaction Database: 2008 update.

    Breitkreutz BJ, Stark C, Reguly T, Boucher L, Breitkreutz A, Livstone M, Oughtred R, Lackner DH, Bähler J, Wood V, Dolinski K and Tyers M

    Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Canada.

    The Biological General Repository for Interaction Datasets (BioGRID) database was developed to house and distribute collections of protein and genetic interactions from major model organism species. BioGRID currently contains over 198 000 interactions from six different species, as derived from both high-throughput studies and conventional focused studies. Through comprehensive curation efforts, BioGRID now includes a virtually complete set of interactions reported to date in the primary literature for both the budding yeast Saccharomyces cerevisiae and the fission yeast Schizosaccharomyces pombe. A number of new features have been added to the BioGRID including an improved user interface to display interactions based on different attributes, a mirror site and a dedicated interaction management system to coordinate curation across different locations. The BioGRID provides interaction data with monthly updates to Saccharomyces Genome Database, Flybase and Entrez Gene. Source code for the BioGRID and the linked Osprey network visualization system is now freely available without restriction.

    Funded by: Cancer Research UK: A6517; NCRR NIH HHS: 1R01RR024031-01; Wellcome Trust: 077118

    Nucleic acids research 2008;36;Database issue;D637-40

  • Population genomics: modeling the new and a renaissance of the old.

    Brinkman FS and Parkhill J

    Current opinion in microbiology 2008;11;5;439-41

  • Comparison of Mascot and X!Tandem performance for low and high accuracy mass spectrometry and the development of an adjusted Mascot threshold.

    Brosch M, Swamy S, Hubbard T and Choudhary J

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.

    It is a major challenge to develop effective sequence database search algorithms to translate molecular weight and fragment mass information obtained from tandem mass spectrometry into high quality peptide and protein assignments. We investigated the peptide identification performance of Mascot and X!Tandem for mass tolerance settings common for low and high accuracy mass spectrometry. We demonstrated that sensitivity and specificity of peptide identification can vary substantially for different mass tolerance settings, but this effect was more significant for Mascot. We present an adjusted Mascot threshold, which allows the user to freely select the best trade-off between sensitivity and specificity. The adjusted Mascot threshold was compared with the default Mascot and X!Tandem scoring thresholds and shown to be more sensitive at the same false discovery rates for both low and high accuracy mass spectrometry data.

    Funded by: Wellcome Trust: 077198

    Molecular & cellular proteomics : MCP 2008;7;5;962-70

  • Replacement of adenylate cyclase toxin in a lineage of Bordetella bronchiseptica.

    Buboltz AM, Nicholson TL, Parette MR, Hester SE, Parkhill J and Harvill ET

    Department of Veterinary and Biomedical Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA.

    Bordetella bronchiseptica is a gram-negative respiratory pathogen that infects a wide range of hosts and causes a diverse spectrum of disease. This diversity is likely affected by multiple factors, such as host immune status, polymicrobial infection, and strain diversity. In a murine model of infection, we found that the virulence of B. bronchiseptica strains, as measured by the mean lethal dose, varied widely. Strain 253 was less virulent than the typically studied strain, RB50. Transcriptome analysis showed that cyaA, the gene encoding adenylate cyclase toxin (CyaA), was the most downregulated transcript identified in strain 253 compared to that in strain RB50. Comparative genomic hybridization and genome sequencing of strain 253 revealed that the cya locus, which encodes, activates, and secretes CyaA, was replaced by an operon (ptp) predicted to encode peptide transport proteins. Other B. bronchiseptica strains from the same phylogenetic lineage as that of strain 253 also lacked the cya locus, contained the ptp genes, and were less virulent than strain RB50. Although the loss of CyaA would be expected to be counterselected since it is conserved among the classical bordetellae and believed to be important to their success, our data indicate that the loss of this toxin and the gain of the ptp genes occurred in an ancestral strain that then expanded into a lineage. This suggests that there may be ecological niches in which CyaA is not critical for the success of B. bronchiseptica.

    Funded by: NIAID NIH HHS: AI 053075, AI 065507, R01 AI053075-01A1, R01 AI053075-02, R01 AI053075-03, R01 AI053075-04, R01 AI053075-05; NIGMS NIH HHS: GM083113, R01 GM083113-01, R01 GM083113-02

    Journal of bacteriology 2008;190;15;5502-11

  • Identification of variation in the platelet transcriptome associated with glycoprotein 6 haplotype.

    Burns P, Gusnanto A, Macaulay IC, Rankin A, Tom B, Langford CF, Dudbridge F, Ouwehand WH, Watkins NA and Bloodomics Consortium

    Department of Haematology, University of Cambridge and National Health Service Blood and Transplant, Cambridge, UK.

    Platelet Glycoprotein VI (GPVI) is the activatory collagen signalling receptor that transmits an outside-in signal via the FcR gamma-chain. In Caucasians two GP6 haplotypes have been identified which encode GPVI isoforms that differ by five amino acids. The minor haplotype is associated with a modest but statistically significant reduction in GPVI abundance and reduced downstream signalling events. As GPVI is also expressed on megakaryocytes, different GPVI isoforms may imprint on the platelet transcriptome. We investigated the association of GP6 haplotype with transcription by comparing the transcriptomes of platelets from individuals homozygous for the major ('a') and minor ('b') haplotypes to identify differentially expressed (DE) transcripts. Platelet RNA was isolated from apheresis concentrates from 16 'aa' donors and eight 'bb' donors. mRNA was amplified using a template-switching PCR-based protocol and fluorescently labelled. Samples were randomly paired both within and between haplotypes and compared on a cDNA microarray. No consistently DE transcripts were identified within the 'aa' haplotype but 52 significantly DE transcripts were observed between haplotypes. Generally the fold differences were low (two- to four-fold) but were confirmed by qRT-PCR for selected transcripts (TUBB1, P = 0.0004; VWF, P = 0.0126). The results of this study indicate that there are subtle differences between the platelet transcriptomes of individuals who differ by GP6 haplotype. The identification of DE genes may identify critical pathways and nodes not previously known to be involved in platelet development and function.

    Funded by: Medical Research Council: MC_U105261167; Wellcome Trust

    Platelets 2008;19;4;258-67

  • Large-scale screening for novel low-affinity extracellular protein interactions.

    Bushell KM, Söllner C, Schuster-Boeckler B, Bateman A and Wright GJ

    Cell Surface Signalling Laboratory, Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, United Kingdom.

    Extracellular protein-protein interactions are essential for both intercellular communication and cohesion within multicellular organisms. Approximately a fifth of human genes encode membrane-tethered or secreted proteins, but they are largely absent from recent large-scale protein interaction datasets, making current interaction networks biased and incomplete. This discrepancy is due to the unsuitability of popular high-throughput methods to detect extracellular interactions because of the biochemical intractability of membrane proteins and their interactions. For example, cell surface proteins contain insoluble hydrophobic transmembrane regions, and their extracellular interactions are often highly transient, having half-lives of less than a second. To detect transient extracellular interactions on a large scale, we developed AVEXIS (avidity-based extracellular interaction screen), a high-throughput assay that overcomes these technical issues and can detect very transient interactions (half-lives ≤ 0.1 sec) with a low false-positive rate. We used it to systematically screen for receptor-ligand pairs within the zebrafish immunoglobulin superfamily and identified novel ligands for both well-known and orphan receptors. Genes encoding receptor-ligand pairs were often clustered phylogenetically and expressed in the same or adjacent tissues, immediately implying their involvement in similar biological processes. Using AVEXIS, we have determined the first systematic low-affinity extracellular protein interaction network, supported by independent biological data. This technique will now allow large-scale extracellular protein interaction mapping in a broad range of experimental contexts.

    Funded by: Wellcome Trust: 087656

    Genome research 2008;18;4;622-30

  • Subclonal phylogenetic structures in cancer revealed by ultra-deep sequencing.

    Campbell PJ, Pleasance ED, Stephens PJ, Dicks E, Rance R, Goodhead I, Follows GA, Green AR, Futreal PA and Stratton MR

    Wellcome Trust Sanger Institute, Hinxton CB10 1SA, United Kingdom.

    During the clonal expansion of cancer from an ancestral cell with an initiating oncogenic mutation to symptomatic neoplasm, the occurrence of somatic mutations (both driver and passenger) can be used to track the on-going evolution of the neoplasm. All subclones within a cancer are phylogenetically related, with the prevalence of each subclone determined by its evolutionary fitness and the timing of its origin relative to other subclones. Recently developed massively parallel sequencing platforms promise the ability to detect rare subclones of genetic variants without a priori knowledge of the mutations involved. We used ultra-deep pyrosequencing to investigate intraclonal diversification at the Ig heavy chain locus in 22 patients with B-cell chronic lymphocytic leukemia. Analysis of a non-polymorphic control locus revealed artifactual insertions and deletions resulting from sequencing errors and base substitutions caused by polymerase misincorporation during PCR amplification. We developed an algorithm to differentiate genuine haplotypes of somatic hypermutations from such artifacts. This proved capable of detecting multiple rare subclones with frequencies as low as 1 in 5000 copies and allowed the characterization of phylogenetic interrelationships among subclones within each patient. This study demonstrates the potential for ultra-deep resequencing to recapitulate the dynamics of clonal evolution in cancer cell populations.

    Funded by: Wellcome Trust: 088340

    Proceedings of the National Academy of Sciences of the United States of America 2008;105;35;13081-6

  • Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing.

    Campbell PJ, Stephens PJ, Pleasance ED, O'Meara S, Li H, Santarius T, Stebbings LA, Leroy C, Edkins S, Hardy C, Teague JW, Menzies A, Goodhead I, Turner DJ, Clee CM, Quail MA, Cox A, Brown C, Durbin R, Hurles ME, Edwards PA, Bignell GR, Stratton MR and Futreal PA

    Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

    Human cancers often carry many somatically acquired genomic rearrangements, some of which may be implicated in cancer development. However, conventional strategies for characterizing rearrangements are laborious and low-throughput and have low sensitivity or poor resolution. We used massively parallel sequencing to generate sequence reads from both ends of short DNA fragments derived from the genomes of two individuals with lung cancer. By investigating read pairs that did not align correctly with respect to each other on the reference human genome, we characterized 306 germline structural variants and 103 somatic rearrangements to the base-pair level of resolution. The patterns of germline and somatic rearrangement were markedly different. Many somatic rearrangements were from amplicons, although rearrangements outside these regions, notably including tandem duplications, were also observed. Some somatic rearrangements led to abnormal transcripts, including two from internal tandem duplications and two fusion transcripts created by interchromosomal rearrangements. Germline variants were predominantly mediated by retrotransposition, often involving AluY and LINE elements. The results demonstrate the feasibility of systematic, genome-wide characterization of rearrangements in complex human cancer genomes, raising the prospect of a new harvest of genes associated with cancer using this strategy.

    Funded by: Wellcome Trust: 077012, 088340

    Nature genetics 2008;40;6;722-9

  • Dictyostelium transcriptional responses to Pseudomonas aeruginosa: common and specific effects from PAO1 and PA14 strains.

    Carilla-Latorre S, Calvo-Garrido J, Bloomfield G, Skelton J, Kay RR, Ivens A, Martinez JL and Escalante R

    Instituto de Investigaciones Biomédicas Alberto Sols, Universidad Autónoma de Madrid-Consejo Superior de Investigaciones Científicas, Madrid, Spain.

    Background: Pseudomonas aeruginosa is one of the most relevant human opportunistic bacterial pathogens. Two strains (PAO1 and PA14) have been mainly used as models for studying virulence of P. aeruginosa. The strain PA14 is more virulent than PAO1 in a wide range of hosts including insects, nematodes and plants. Whereas some of the differences might be attributable to concerted action of determinants encoded in pathogenicity islands present in the genome of PA14, a global analysis of the differential host responses to these P. aeruginosa strains has not been addressed. Little is known about the host response to infection with P. aeruginosa and whether or not the global host transcription is being affected as a defense mechanism or altered in the benefit of the pathogen. Since the social amoeba Dictyostelium discoideum is a suitable host to study virulence of P. aeruginosa and other pathogens, we used available genomic tools in this model system to study the transcriptional host response to P. aeruginosa infection.

    Results: We have compared the virulence of the P. aeruginosa PAO1 and PA14 using D. discoideum and studied the transcriptional response of the amoeba upon infection. Our results showed that PA14 is more virulent in Dictyostelium than PAO1 using different plating assays. For studying the differential response of the host to infection by these model strains, D. discoideum cells were exposed to either P. aeruginosa PAO1 or P. aeruginosa PA14 (mixed with an excess of the non-pathogenic bacterium Klebsiella aerogenes as food supply) and after 4 hours, cellular RNA was extracted. A three-way comparison was made using whole-genome D. discoideum microarrays between RNA samples from cells treated with the two different strains and control cells exposed only to K. aerogenes. The transcriptomic analyses have shown the existence of common and specific responses to infection. The expression of 364 genes changed in a similar way upon infection with one or another strain, whereas 169 genes were differentially regulated depending on whether the infecting strain was either P. aeruginosa PAO1 or PA14. Effects on metabolism, signalling, stress response and cell cycle can be inferred from the genes affected.

    Conclusion: Our results show that pathogenic Pseudomonas strains invoke both a common transcriptional response from Dictyostelium and a strain-specific one, indicating that the infective process of bacterial pathogens can be strain-specific and is more complex than previously thought.

    BMC microbiology 2008;8;109

  • The Grandest Genetics Experiment Ever Performed on Man? A Y-Chromosomal Perspective on Genetic Variation in India

    Carvalho-Silva DR and Tyler-Smith C

    International journal of human genetics 2008;8;21-9

  • Artemis and ACT: viewing, annotating and comparing sequences stored in a relational database.

    Carver T, Berriman M, Tivey A, Patel C, Böhme U, Barrell BG, Parkhill J and Rajandream MA

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Motivation: Artemis and Artemis Comparison Tool (ACT) have become mainstream tools for viewing and annotating sequence data, particularly for microbial genomes. Since its first release, Artemis has been continuously developed and supported with additional functionality for editing and analysing sequences based on feedback from an active user community of laboratory biologists and professional annotators. Nevertheless, its utility has been somewhat restricted by its limitation to reading and writing from flat files. Therefore, a new version of Artemis has been developed, which reads from and writes to a relational database schema, and allows users to annotate more complex, often large and fragmented, genome sequences.

    Results: Artemis and ACT have now been extended to read and write directly to the Generic Model Organism Database (GMOD) Chado relational database schema. In addition, a Gene Builder tool has been developed to provide structured forms and tables to edit coordinates of gene models and edit functional annotation, based on standard ontologies, controlled vocabularies and free text.

    Availability: Artemis and ACT are freely available (under a GPL licence) for download (for MacOSX, UNIX and Windows) at the Wellcome Trust Sanger Institute web sites:

    Funded by: Wellcome Trust: 082372

    Bioinformatics (Oxford, England) 2008;24;23;2672-6

  • A novel mechanistic spectrum underlies glaucoma-associated chromosome 6p25 copy number variation.

    Chanda B, Asai-Coakwell M, Ye M, Mungall AJ, Barrow M, Dobyns WB, Behesti H, Sowden JC, Carter NP, Walter MA and Lehmann OJ

    Departments of Ophthalmology, University of Alberta, Edmonton, Alberta, Canada.

    The factors that mediate chromosomal rearrangement remain incompletely defined. Among regions prone to structural variant formation, chromosome 6p25 is one of the few in which disease-associated segmental duplications and segmental deletions have been identified, primarily through gene dosage attributable ocular phenotypes. Using array comparative genome hybridization, we studied ten 6p25 duplication and deletion pedigrees and amplified junction fragments from each. Analysis of the breakpoint architecture revealed that all the rearrangements were non-recurrent, and in contrast to most previous examples the majority of the segmental duplications and deletions utilized coupled homologous and non-homologous recombination mechanisms. One junction fragment exhibited an unprecedented 367 bp insert derived from tandemly arranged breakpoint elements. While this accorded with a recently described replication-based mechanism, it differed from the previous example in being unassociated with template switching, and occurring in a segmental deletion. These results extend the mechanisms involved in structural variant formation, provide strong evidence that a spectrum of recombination, DNA repair and replication underlie 6p25 rearrangements, and have implications for genesis of copy number variations in other genomic regions. These findings highlight the benefits of undertaking the extensive studies necessary to characterize structural variants at the base pair level.

    Funded by: Wellcome Trust: 077008

    Human molecular genetics 2008;17;22;3446-58

  • Multiple pathways differentially regulate global oxidative stress responses in fission yeast.

    Chen D, Wilkinson CR, Watt S, Penkett CJ, Toone WM, Jones N and Bähler J

    Cancer Research UK Fission Yeast Functional Genomics Group, Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, United Kingdom.

    Cellular protection against oxidative damage is relevant to ageing and numerous diseases. We analyzed the diversity of genome-wide gene expression programs and their regulation in response to various types and doses of oxidants in Schizosaccharomyces pombe. A small core gene set, regulated by the AP-1-like factor Pap1p and the two-component regulator Prr1p, was universally induced irrespective of oxidant and dose. Strong oxidative stresses led to a much larger transcriptional response. The mitogen-activated protein kinase (MAPK) Sty1p and the bZIP factor Atf1p were critical for the response to hydrogen peroxide. A newly identified zinc-finger protein, Hsr1p, is uniquely regulated by all three major regulatory systems (Sty1p-Atf1p, Pap1p, and Prr1p) and in turn globally supports gene expression in response to hydrogen peroxide. Although the overall transcriptional responses to hydrogen peroxide and t-butylhydroperoxide were similar, to our surprise, Sty1p and Atf1p were less critical for the response to the latter. Instead, another MAPK, Pmk1p, was involved in surviving this stress, although Pmk1p played only a minor role in regulating the transcriptional response. These data reveal a considerable plasticity and differential control of regulatory pathways in distinct oxidative stress conditions, providing both specificity and backup for protection from oxidative damage.

    Funded by: Cancer Research UK: A6517; Wellcome Trust: 077118

    Molecular biology of the cell 2008;19;1;308-17

  • Genetic analysis of human traits in vitro: drug response and gene expression in lymphoblastoid cell lines.

    Choy E, Yelensky R, Bonakdar S, Plenge RM, Saxena R, De Jager PL, Shaw SY, Wolfish CS, Slavik JM, Cotsapas C, Rivas M, Dermitzakis ET, Cahir-McFarland E, Kieff E, Hafler D, Daly MJ and Altshuler D

    Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America.

    Lymphoblastoid cell lines (LCLs), originally collected as renewable sources of DNA, are now being used as a model system to study genotype-phenotype relationships in human cells, including searches for QTLs influencing levels of individual mRNAs and responses to drugs and radiation. In the course of attempting to map genes for drug response using 269 LCLs from the International HapMap Project, we evaluated the extent to which biological noise and non-genetic confounders contribute to trait variability in LCLs. While drug responses could be technically well measured on a given day, we observed significant day-to-day variability and substantial correlation to non-genetic confounders, such as baseline growth rates and metabolic state in culture. After correcting for these confounders, we were unable to detect any QTLs with genome-wide significance for drug response. A much higher proportion of variance in mRNA levels may be attributed to non-genetic factors (intra-individual variance--i.e., biological noise, levels of the EBV virus used to transform the cells, ATP levels) than to detectable eQTLs. Finally, in an attempt to improve power, we focused analysis on those genes that had both detectable eQTLs and correlation to drug response; we were unable to detect evidence that eQTL SNPs are convincingly associated with drug response in the model. While LCLs are a promising model for pharmacogenetic experiments, biological noise and in vitro artifacts may reduce power and have the potential to create spurious association due to confounding.

    PLoS genetics 2008;4;11;e1000287

  • Kinase networks integrate profiles of N-methyl-D-aspartate receptor-mediated gene expression in hippocampus.

    Coba MP, Valor LM, Kopanitsa MV, Afinowi NO and Grant SG

    Genes to Cognition, Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, United Kingdom.

    The postsynaptic N-methyl-d-aspartate (NMDA) receptor activates multiple kinases and changes the phosphorylation of many postsynaptic proteins organized in signaling networks. Because the NMDA receptor is known to regulate gene expression, it is important to examine whether networks of kinases control signaling to gene expression. We examined the requirement of multiple kinases and NMDA receptor-interacting proteins for gene expression in mouse hippocampal slices. Protocols that induce long-term depression (LTD) and long-term potentiation (LTP) activated common kinases and overlapping gene expression profiles. Combinations of kinases were required for induction of each gene. Distinct combinations of kinases were required to up-regulate Arc, Npas4, Egr2, and Egr4 following either LTP or LTD protocols. Consistent with the combinatorial data, a mouse mutant model of the human cognition disease gene SAP102, which couples ERK kinase to the NMDA receptor, showed deregulated expression of specific genes. These data support a network model of postsynaptic integration where kinase signaling networks are recruited by differential synaptic activity and control both local synaptic events and activity-dependent gene expression.

    The Journal of biological chemistry 2008;283;49;34101-7

  • Priorities for nucleotide trace, sequence and annotation data capture at the Ensembl Trace Archive and the EMBL Nucleotide Sequence Database.

    Cochrane G, Akhtar R, Aldebert P, Althorpe N, Baldwin A, Bates K, Bhattacharyya S, Bonfield J, Bower L, Browne P, Castro M, Cox T, Demiralp F, Eberhardt R, Faruque N, Hoad G, Jang M, Kulikova T, Labarga A, Leinonen R, Leonard S, Lin Q, Lopez R, Lorenc D, McWilliam H, Mukherjee G, Nardone F, Plaister S, Robinson S, Sobhany S, Vaughan R, Wu D, Zhu W, Apweiler R, Hubbard T and Birney E

    EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    The Ensembl Trace Archive and the EMBL Nucleotide Sequence Database, known together as the European Nucleotide Archive, continue to see growth in data volume and diversity. Selected major developments of 2007 are presented briefly, along with data submission and retrieval information. In the face of increasing requirements for nucleotide trace, sequence and annotation data archiving, data capture priority decisions have been taken at the European Nucleotide Archive. Priorities are discussed in terms of how reliably information can be captured, the long-term benefits of its capture and the ease with which it can be captured.

    Funded by: Wellcome Trust: 062023, 077198, 085532

    Nucleic acids research 2008;36;Database issue;D5-12

  • Identifying protein domains with the Pfam database.

    Coggill P, Finn RD and Bateman A

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

    Pfam is a database of protein domain families, with each family represented by multiple sequence alignments and profile hidden Markov models (HMMs). In addition, each family has associated annotation, literature references, and links to other databases. The entries in Pfam are available via the World Wide Web and in flatfile format. This unit contains detailed information on how to access and utilize the information present in the Pfam database, namely the families, multiple alignments, and annotation. Details on running Pfam, both remotely and locally are presented.

    Funded by: Wellcome Trust: 087656

    Current protocols in bioinformatics / editorial board, Andreas D. Baxevanis ... [et al.] 2008;Chapter 2;Unit 2.5

  • nGASP--the nematode genome annotation assessment project.

    Coghlan A, Fiedler TJ, McKay SJ, Flicek P, Harris TW, Blasiar D, nGASP Consortium and Stein LD

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    Background: While the C. elegans genome is extensively annotated, relatively little information is available for other Caenorhabditis species. The nematode genome annotation assessment project (nGASP) was launched to objectively assess the accuracy of protein-coding gene prediction software in C. elegans, and to apply this knowledge to the annotation of the genomes of four additional Caenorhabditis species and other nematodes. Seventeen groups worldwide participated in nGASP, and submitted 47 prediction sets across 10 Mb of the C. elegans genome. Predictions were compared to reference gene sets consisting of confirmed or manually curated gene models from WormBase.

    Results: The most accurate gene-finders were 'combiner' algorithms, which made use of transcript- and protein-alignments and multi-genome alignments, as well as gene predictions from other gene-finders. Gene-finders that used alignments of ESTs, mRNAs and proteins came in second. There was a tie for third place between gene-finders that used multi-genome alignments and ab initio gene-finders. The median gene level sensitivity of combiners was 78% and their specificity was 42%, which is nearly the same accuracy reported for combiners in the human genome. C. elegans genes with exons of unusual hexamer content, as well as those with unusually many exons, short exons, long introns, a weak translation start signal, weak splice sites, or poorly conserved orthologs posed the greatest difficulty for gene-finders.

    Conclusion: This experiment establishes a baseline of gene prediction accuracy in Caenorhabditis genomes, and has guided the choice of gene-finders for the annotation of newly sequenced genomes of Caenorhabditis and other nematode species. We have created new gene sets for C. briggsae, C. remanei, C. brenneri, C. japonica, and Brugia malayi using some of the best-performing gene-finders.

    Funded by: NHGRI NIH HHS: P41 HG02223; Wellcome Trust

    BMC bioinformatics 2008;9;549

  • Finishing the finished human chromosome 22 sequence.

    Cole CG, McCann OT, Collins JE, Oliver K, Willey D, Gribble SM, Yang F, McLaren K, Rogers J, Ning Z, Beare DM and Dunham I

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Background: Although the human genome sequence was declared complete in 2004, the sequence was interrupted by 341 gaps, of which 308 lay in an estimated 28 Mb of euchromatin. While these gaps constitute only approximately 1% of the sequence, knowledge of the full complement of human genes and regulatory elements is incomplete without their sequences.

    Results: We have used a combination of conventional chromosome walking (aided by the availability of end sequences) in fosmid and bacterial artificial chromosome (BAC) libraries, whole chromosome shotgun sequencing, comparative genome analysis and long PCR to finish 8 of the 11 gaps in the initial chromosome 22 sequence. In addition, we have patched four regions of the initial sequence where the original clones were found to be deleted, or contained a deletion allele of a known gene, with a further 126 kb of new sequence. Over 1.018 Mb of new sequence has been generated to extend into and close the gaps, and we have annotated 16 new or extended gene structures and one pseudogene.

    Conclusion: Thus, we have made significant progress toward completing the sequence of the euchromatic regions of human chromosome 22 using a combination of detailed approaches. Our experience suggests that substantial work remains to close the outstanding gaps in the human genome sequence.

    Funded by: Wellcome Trust

    Genome biology 2008;9;5;R78

  • Subcellular localization of intracellular human proteins by construction of tagged fusion proteins and transient expression in COS-7 Cells.

    Collins JE

    Wellcome Trust Sanger Institute, Cambridge, UK.

    Identifying the subcellular compartment of a protein is an important step toward assigning protein function. Starting with a clone containing the open reading frame (ORF) of interest, it is possible to attach a variety of short amino acid tags or fluorescent proteins and detect the location of the protein, after transfection into a cell line, using fluorescence microscopy. By collecting data from various expression clone constructs, using a range of cell lines and double labeling with cellular compartment markers, a picture of the localization of a gene can be built up. This chapter describes how to obtain the ORF clone for your gene of interest, clone it into your choice of mammalian expression vector or vectors, transiently transfect for visualization, and where to get started when interpreting the results.

    Funded by: Wellcome Trust

    Methods in molecular biology (Clifton, N.J.) 2008;439;353-67

  • Mapping multiprotein complexes by affinity purification and mass spectrometry.

    Collins MO and Choudhary JS

    Proteomic Mass Spectrometry, The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    The combination of affinity purification and tandem mass spectrometry (MS) has emerged as a powerful approach to delineate biological processes. In particular, the use of epitope tags has allowed this approach to become scaleable and has bypassed difficulties associated with generation of antibodies. Single epitope tags and tandem affinity purification (TAP) tags have been used to systematically map protein complexes generating protein interaction data at a near proteome-wide scale. Recent developments in the design of tags, optimisation of purification conditions, experimental design and data analysis have greatly improved the sensitivity and specificity of this approach. Concomitant developments in MS, including high accuracy and high-throughput instrumentation together with quantitative MS methods, have facilitated large-scale and comprehensive analysis of multiprotein complexes.

    Current opinion in biotechnology 2008;19;4;324-30

  • Phosphoproteomic analysis of the mouse brain cytosol reveals a predominance of protein phosphorylation in regions of intrinsic sequence disorder.

    Collins MO, Yu L, Campuzano I, Grant SG and Choudhary JS

    Proteomic Mass Spectrometry, The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, United Kingdom.

    We analyzed the mouse forebrain cytosolic phosphoproteome using sequential (protein and peptide) IMAC purifications, enzymatic dephosphorylation, and targeted tandem mass spectrometry analysis strategies. In total, using complementary phosphoenrichment and LC-MS/MS strategies, 512 phosphorylation sites on 540 non-redundant phosphopeptides from 162 cytosolic phosphoproteins were characterized. Analysis of protein domains and amino acid sequence composition of this data set of cytosolic phosphoproteins revealed that it is significantly enriched in intrinsic sequence disorder, and this enrichment is associated with both cellular location and phosphorylation status. The majority of phosphorylation sites found by MS were located outside of structural protein domains (97%) but were mostly located in regions of intrinsic sequence disorder (86%). 368 phosphorylation sites were located in long regions of disorder (over 40 amino acids long), and 94% of proteins contained at least one such long region of disorder. In addition, we found that 58 phosphorylation sites in this data set occur in 14-3-3 binding consensus motifs, linear motifs that are associated with unstructured regions in proteins. These results demonstrate that in this data set protein phosphorylation is significantly depleted in protein domains and significantly enriched in disordered protein sequences and that enrichment of intrinsic sequence disorder may be a common feature of phosphoproteomes. This supports the hypothesis that disordered regions in proteins allow kinases, phosphatases, and phosphorylation-dependent binding proteins to gain access to target sequences to regulate local protein conformation and activity.

    Funded by: Wellcome Trust

    Molecular & cellular proteomics : MCP 2008;7;7;1331-48

  • Characterization of the genomes of a diverse collection of Salmonella enterica serovar Typhimurium definitive phage type 104.

    Cooke FJ, Brown DJ, Fookes M, Pickard D, Ivens A, Wain J, Roberts M, Kingsley RA, Thomson NR and Dougan G

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

    Salmonella enterica serovar Typhimurium definitive phage type 104 (DT104) has caused significant morbidity and mortality in humans and animals for almost three decades. We completed the full DNA sequence of one DT104 strain, NCTC13348, and showed that significant differences between the genome of this isolate and the genome of the previously sequenced strain Salmonella serovar Typhimurium LT2 are due to integrated prophage elements and Salmonella genomic island 1 encoding antibiotic resistance genes. Thirteen isolates of Salmonella serovar Typhimurium DT104 with different pulsed-field gel electrophoresis (PFGE) profiles were analyzed by using multilocus sequence typing (MLST), plasmid profiling, hybridization to a pan-Salmonella DNA microarray, and prophage-based multiplex PCR. All the isolates belonged to a single MLST type, sequence type ST19. Microarray data demonstrated that the gene contents of the 13 DT104 isolates were remarkably conserved. The PFGE DNA fragment size differences in these isolates could be explained to a great extent by differences in the prophage and plasmid contents. Thus, here the nature of variation in different Salmonella serovar Typhimurium DT104 isolates is further defined at the gene and whole-genome levels, illustrating how this phage type evolves over time.

    Funded by: Wellcome Trust

    Journal of bacteriology 2008;190;24;8155-62

  • A common genomic framework for a diverse assembly of plasmids in the symbiotic nitrogen fixing bacteria.

    Crossman LC, Castillo-Ramírez S, McAnnula C, Lozano L, Vernikos GS, Acosta JL, Ghazoui ZF, Hernández-González I, Meakin G, Walker AW, Hynes MF, Young JP, Downie JA, Romero D, Johnston AW, Dávila G, Parkhill J and González V

    The Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.

    This work centres on the genomic comparisons of two closely-related nitrogen-fixing symbiotic bacteria, Rhizobium leguminosarum biovar viciae 3841 and Rhizobium etli CFN42. These strains maintain a stable genomic core that is also common to other rhizobia species plus a very variable and significant accessory component. The chromosomes are highly syntenic, whereas plasmids are related by fewer syntenic blocks and have mosaic structures. The pairs of plasmids p42f-pRL12, p42e-pRL11 and p42b-pRL9 as well as large parts of p42c with pRL10 are shown to be similar, whereas the symbiotic plasmids (p42d and pRL10) are structurally unrelated and seem to follow distinct evolutionary paths. Even though purifying selection is acting on the whole genome, the accessory component is evolving more rapidly. This component is constituted largely of proteins for transport of diverse metabolites and elements of external origin. The present analysis allows us to conclude that a heterogeneous and quickly diversifying group of plasmids co-exists in a common genomic framework.

    Funded by: Biotechnology and Biological Sciences Research Council: 104/P16988, 208/BRE13665, 208/PRS12210; Wellcome Trust

    PloS one 2008;3;7;e2567

  • The complete genome, comparative and functional analysis of Stenotrophomonas maltophilia reveals an organism heavily shielded by drug resistance determinants.

    Crossman LC, Gould VC, Dow JM, Vernikos GS, Okazaki A, Sebaihia M, Saunders D, Arrowsmith C, Carver T, Peters N, Adlem E, Kerhornou A, Lord A, Murphy L, Seeger K, Squares R, Rutter S, Quail MA, Rajandream MA, Harris D, Churcher C, Bentley SD, Parkhill J, Thomson NR and Avison MB

    Pathogen Sequencing Unit, The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK.

    Background: Stenotrophomonas maltophilia is a nosocomial opportunistic pathogen of the Xanthomonadaceae. The organism has been isolated from both clinical and soil environments in addition to the sputum of cystic fibrosis patients and the immunocompromised. Whilst relatively distant phylogenetically, the closest sequenced relatives of S. maltophilia are the plant pathogenic xanthomonads.

    Results: The genome of the bacteremia-associated isolate S. maltophilia K279a is 4,851,126 bp and of high G+C content. The sequence reveals an organism with a remarkable capacity for drug and heavy metal resistance. In addition to a number of genes conferring resistance to antimicrobial drugs of different classes via alternative mechanisms, nine resistance-nodulation-division (RND)-type putative antimicrobial efflux systems are present. Functional genomic analysis confirms a role in drug resistance for several of the novel RND efflux pumps. S. maltophilia possesses potentially mobile regions of DNA and encodes a number of pili and fimbriae likely to be involved in adhesion and biofilm formation that may also contribute to increased antimicrobial drug resistance.

    Conclusion: The panoply of antimicrobial drug resistance genes and mobile genetic elements found suggests that the organism can act as a reservoir of antimicrobial drug resistance determinants in a clinical environment, which is an issue of considerable concern.

    Funded by: Wellcome Trust

    Genome biology 2008;9;4;R74

  • Response of Schizosaccharomyces pombe to zinc deficiency.

    Dainty SJ, Kennedy CA, Watt S, Bähler J and Whitehall SK

    Institute of Cell and Molecular Biosciences, Newcastle University, Framlington Place, Newcastle upon Tyne NE2 4HH, United Kingdom.

    A component of the cellular response to zinc deficiency operates via control of transcript abundance. Therefore, microarray analysis was employed to identify Schizosaccharomyces pombe genes whose mRNA levels are regulated by intracellular zinc status. A set of 57 genes whose mRNA levels were substantially reduced in response to zinc deficiency was identified, while the mRNA levels of 63 genes were increased by this condition. In order to investigate the mechanisms that control these responses, a genetic screen was employed to identify mutants with defective zinc-responsive gene expression. Two strains (II-1 and V7) that were identified by this screen harbor mutations that are linked to zrt1+, which encodes a putative Zrt/IRT-like protein (ZIP) zinc uptake transporter. Importantly, zrt1+ mRNA levels are increased in response to zinc deprivation, and cells lacking functional Zrt1 are highly impaired in their ability to proliferate at limiting zinc concentrations. Furthermore, zrt1 null cells were found to have severely reduced zinc contents, indicating that Zrt1 functions as a key regulator of intracellular zinc levels in fission yeast. The deletion of fet4+, another zinc-responsive gene encoding a putative metal ion transporter, exacerbated the phenotypes associated with the loss of Zrt1, suggesting that Fet4 also plays a role in zinc uptake under limiting conditions.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/C004752/1; Cancer Research UK: A6517, C9546/A6517; Wellcome Trust: 077118

    Eukaryotic cell 2008;7;3;454-64

  • The RNA WikiProject: community annotation of RNA families.

    Daub J, Gardner PP, Tate J, Ramsköld D, Manske M, Scott WG, Weinberg Z, Griffiths-Jones S and Bateman A

    The online encyclopedia Wikipedia has become one of the most important online references in the world and has a substantial and growing scientific content. A search of Google with many RNA-related keywords identifies a Wikipedia article as the top hit. We believe that the RNA community has an important and timely opportunity to maximize the content and quality of RNA information in Wikipedia. To this end, we have formed the RNA WikiProject as part of the larger Molecular and Cellular Biology WikiProject. We have created over 600 new Wikipedia articles describing families of noncoding RNAs based on the Rfam database, and invite the community to update, edit, and correct these articles. The Rfam database now redistributes this Wikipedia content as the primary textual annotation of its RNA families. Users can, therefore, for the first time, directly edit the content of one of the major RNA databases. We believe that this Wikipedia/Rfam link acts as a functioning model for incorporating community annotation into molecular biology databases.

    Funded by: Howard Hughes Medical Institute; Wellcome Trust: 077044

    RNA (New York, N.Y.) 2008;14;12;2462-4

  • Pathogenomics: an updated European Research Agenda.

    Demuth A, Aharonowitz Y, Bachmann TT, Blum-Oehler G, Buchrieser C, Covacci A, Dobrindt U, Emödy L, van der Ende A, Ewbank J, Fernández LA, Frosch M, García-Del Portillo F, Gilmore MS, Glaser P, Goebel W, Hasnain SE, Heesemann J, Islam K, Korhonen T, Maiden M, Meyer TF, Montecucco C, Oswald E, Parkhill J, Pucciarelli MG, Ron E, Svanborg C, Uhlin BE, Wai SN, Wehland J and Hacker J

    Institut für Molekulare Infektionsbiologie, Röntgenring 11, 97070 Würzburg, Germany.

    The emerging genomic technologies and bioinformatics provide novel opportunities for studying life-threatening human pathogens and to develop new applications for the improvement of human and animal health and the prevention, treatment, and diagnosis of infections. Based on the ecology and population biology of pathogens and related organisms and their connection to epidemiology, more accurate typing technologies and approaches will lead to better means of disease control. The analysis of the genome plasticity and gene pools of pathogenic bacteria including antigenic diversity and antigenic variation results in more effective vaccines and vaccine implementation programs. The study of newly identified and uncultivated microorganisms enables the identification of new threats. The scrutiny of the metabolism of the pathogen in the host allows the identification of new targets for anti-infectives and therapeutic approaches. The development of modulators of host responses and mediators of host damage will be facilitated by the research on interactions of microbes and hosts, including mechanisms of host damage, acute and chronic relationships as well as commensalisms. The study of multiple pathogenic and non-pathogenic microbes interacting in the host will improve the management of multiple infections and will allow probiotic and prebiotic interventions. Needless to iterate, the application of the results of improved prevention and treatment of infections into clinical tests will have a positive impact on the management of human and animal disease. The Pathogenomics Research Agenda draws on discussions with experts of the Network of Excellence "EuroPathoGenomics" at the management board meeting of the project held during 18-21 April 2007, in the Villa Vigoni, Menaggio, Italy. Based on a proposed European Research Agenda in the field of pathogenomics by the ERA-NET PathoGenoMics the meeting's participants updated the established list of topics as the research agenda for the future.

    Infection, genetics and evolution : journal of molecular epidemiology and evolutionary genetics in infectious diseases 2008;8;3;386-93

  • From gene expression to disease risk.

    Dermitzakis ET

    Nature genetics 2008;40;5;492-3

  • Regulatory variation and evolution: implications for disease.

    Dermitzakis ET

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, CB10 1SA Cambridge, United Kingdom.

    In the past few years, it has become apparent that there is a substantial amount of noncoding DNA that contributes to genome function. However, the multidimensionality of noncoding DNA properties does not allow us to readily identify, characterize, and assess the functional impact of mutations, polymorphisms, and interspecific substitutions. In this chapter, we discuss the evolutionary properties of some of the known noncoding genomic elements, namely regulatory regions, and the extensions of this to other potentially functionally important noncoding regions such as conserved noncoding regions. The implications of this analysis for studies looking at molecular phenotypes such as gene expression and whole-organism phenotypes (e.g., disease) are presented in the context of the exploration of noncoding DNA properties. The aim is to take advantage of current and emerging analysis methods for noncoding DNA to elucidate the genetic causes of phenotypic variation.

    Advances in genetics 2008;61;295-306

  • Minimum information specification for in situ hybridization and immunohistochemistry experiments (MISFISHIE).

    Deutsch EW, Ball CA, Berman JJ, Bova GS, Brazma A, Bumgarner RE, Campbell D, Causton HC, Christiansen JH, Daian F, Dauga D, Davidson DR, Gimenez G, Goo YA, Grimmond S, Henrich T, Herrmann BG, Johnson MH, Korb M, Mills JC, Oudes AJ, Parkinson HE, Pascal LE, Pollet N, Quackenbush J, Ramialison M, Ringwald M, Salgado D, Sansone SA, Sherlock G, Stoeckert CJ, Swedlow J, Taylor RC, Walashek L, Warford A, Wilkinson DG, Zhou Y, Zon LI, Liu AY and True LD

    Institute for Systems Biology, 1441 N 34th Street, Seattle, Washington 98103, USA.

    One purpose of the biomedical literature is to report results in sufficient detail that the methods of data collection and analysis can be independently replicated and verified. Here we present reporting guidelines for gene expression localization experiments: the minimum information specification for in situ hybridization and immunohistochemistry experiments (MISFISHIE). MISFISHIE is modeled after the Minimum Information About a Microarray Experiment (MIAME) specification for microarray experiments. Both guidelines define what information should be reported without dictating a format for encoding that information. MISFISHIE describes six types of information to be provided for each experiment: experimental design, biomaterials and treatments, reporters, staining, imaging data and image characterizations. This specification has benefited the consortium within which it was developed and is expected to benefit the wider research community. We welcome feedback from the scientific community to help improve our proposal.

    Funded by: Medical Research Council: MC_U117532048, MC_U127527203; NHLBI NIH HHS: R33 HL073712; NIDDK NIH HHS: DK63328, DK63400, DK63481, DK63483, DK63630, R01 DK079798, R01 DK079798-01A2

    Nature biotechnology 2008;26;3;305-12

  • X-linked protocadherin 19 mutations cause female-limited epilepsy and cognitive impairment.

    Dibbens LM, Tarpey PS, Hynes K, Bayly MA, Scheffer IE, Smith R, Bomar J, Sutton E, Vandeleur L, Shoubridge C, Edkins S, Turner SJ, Stevens C, O'Meara S, Tofts C, Barthorpe S, Buck G, Cole J, Halliday K, Jones D, Lee R, Madison M, Mironenko T, Varian J, West S, Widaa S, Wray P, Teague J, Dicks E, Butler A, Menzies A, Jenkinson A, Shepherd R, Gusella JF, Afawi Z, Mazarib A, Neufeld MY, Kivity S, Lev D, Lerman-Sagie T, Korczyn AD, Derry CP, Sutherland GR, Friend K, Shaw M, Corbett M, Kim HG, Geschwind DH, Thomas P, Haan E, Ryan S, McKee S, Berkovic SF, Futreal PA, Stratton MR, Mulley JC and Gécz J

    Department of Genetic Medicine, Level 9 Rieger Building, Women's and Children's Hospital, 72 King William Road, North Adelaide, South Australia 5006, Australia.

    Epilepsy and mental retardation limited to females (EFMR) is a disorder with an X-linked mode of inheritance and an unusual expression pattern. Disorders arising from mutations on the X chromosome are typically characterized by affected males and unaffected carrier females. In contrast, EFMR spares transmitting males and affects only carrier females. Aided by systematic resequencing of 737 X chromosome genes, we identified different protocadherin 19 (PCDH19) gene mutations in seven families with EFMR. Five mutations resulted in the introduction of a premature termination codon. Study of two of these demonstrated nonsense-mediated decay of PCDH19 mRNA. The two missense mutations were predicted to affect adhesiveness of PCDH19 through impaired calcium binding. PCDH19 is expressed in developing brains of human and mouse and is the first member of the cadherin superfamily to be directly implicated in epilepsy or mental retardation.

    Funded by: NICHD NIH HHS: N01-HD-4-3368, N01-HD-4-3383; NIGMS NIH HHS: GM061354; NIMH NIH HHS: R01 MH 64547, R01 MH064547-01, R01 MH064547-01S1, R01 MH064547-02, R01 MH064547-02S1, R01 MH064547-03, R01 MH064547-04, R01 MH064547-05; Wellcome Trust

    Nature genetics 2008;40;6;776-81

  • Altered patterns of gene expression underlying the enhanced immunogenicity of radiation-attenuated schistosomes.

    Dillon GP, Feltwell T, Skelton J, Coulson PS, Wilson RA and Ivens AC

    Department of Biology, University of York, York, United Kingdom.

    Background: Schistosome cercariae only elicit high levels of protective immunity against a challenge infection if they are optimally attenuated by exposure to ionising radiation that truncates their migration in the lungs. However, the underlying molecular mechanisms responsible for the altered phenotype of the irradiated parasite that primes for protection have yet to be identified.

    We have used a custom microarray comprising probes derived from lung-stage parasites to compare patterns of gene expression in schistosomula derived from normal and irradiated cercariae. These were transformed in vitro and cultured for four, seven, and ten days to correspond in development to the priming parasites, before RNA extraction. At these late times after the radiation insult, transcript suppression was the principal feature of the irradiated larvae. Individual gene analysis revealed that only seven were significantly down-regulated in the irradiated versus normal larvae at the three time-points; notably, four of the protein products are present in the tegument or associated with its membranes, perhaps indicating a perturbed function. Grouping of transcripts using Gene Ontology (GO) and subsequent Gene Set Enrichment Analysis (GSEA) proved more informative in teasing out subtle differences. Deficiencies in signalling pathways involving G-protein-coupled receptors suggest the parasite is less able to sense its environment. Reduction of cytoskeleton transcripts could indicate compromised structure which, coupled with a paucity of neuroreceptor transcripts, may mean the parasite is also unable to respond correctly to external stimuli.

    The transcriptional differences observed are concordant with the known extended transit of attenuated parasites through skin-draining lymph nodes and the lungs: prolonged priming of the immune system by the parasite, rather than over-expression of novel antigens, could thus explain the efficacy of the irradiated vaccine.

    Funded by: Biotechnology and Biological Sciences Research Council; NIAID NIH HHS: AI54711-02

    PLoS neglected tropical diseases 2008;2;5;e240

  • Modifier effects between regulatory and protein-coding variation.

    Dimas AS, Stranger BE, Beazley C, Finn RD, Ingle CE, Forrest MS, Ritchie ME, Deloukas P, Tavaré S and Dermitzakis ET

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK.

    Genome-wide associations have shown a lot of promise in dissecting the genetics of complex traits in humans with single variants, yet a large fraction of the genetic effects is still unaccounted for. Analyzing genetic interactions between variants (epistasis) is one of the potential ways forward. We investigated the abundance and functional impact of a specific type of epistasis, namely the interaction between regulatory and protein-coding variants. Using genotype and gene expression data from the 210 unrelated individuals of the original four HapMap populations, we have explored the combined effects of regulatory and protein-coding single nucleotide polymorphisms (SNPs). We predict that about 18% (1,502 out of 8,233 nsSNPs) of protein-coding variants are differentially expressed among individuals and demonstrate that regulatory variants can modify the functional effect of a coding variant in cis. Furthermore, we show that such interactions in cis can affect the expression of downstream targets of the gene containing the protein-coding SNP. In this way, a cis interaction between regulatory and protein-coding variants has a trans impact on gene expression. Given the abundance of both types of variants in human populations, we propose that joint consideration of regulatory and protein-coding variants may reveal additional genetic effects underlying complex traits and disease and may shed light on causes of differential penetrance of known disease variants.

    Funded by: Wellcome Trust

    PLoS genetics 2008;4;10;e1000244

  • Efficient targeted transcript discovery via array-based normalization of RACE libraries.

    Djebali S, Kapranov P, Foissac S, Lagarde J, Reymond A, Ucla C, Wyss C, Drenkow J, Dumais E, Murray RR, Lin C, Szeto D, Denoeud F, Calvo M, Frankish A, Harrow J, Makrythanasis P, Vidal M, Salehi-Ashtiani K, Antonarakis SE, Gingeras TR and Guigó R

    Grup de Recerca en Informàtica Biomèdica, Institut Municipal d'Investigació Mèdica/Universitat Pompeu Fabra, Dr. Aiguader 88, 08003 Barcelona, Spain.

    Rapid amplification of cDNA ends (RACE) is a widely used approach for transcript identification. Random clone selection from the RACE mixture, however, is an ineffective sampling strategy if the dynamic range of transcript abundances is large. To improve sampling efficiency of human transcripts, we hybridized the products of the RACE reaction onto tiling arrays and used the detected exons to delineate a series of reverse-transcriptase (RT)-PCRs, through which the original RACE transcript population was segregated into simpler transcript populations. We independently cloned the products and sequenced randomly selected clones. This approach, RACEarray, is superior to direct cloning and sequencing of RACE products because it specifically targets new transcripts and often results in overall normalization of transcript abundance. We show theoretically and experimentally that this strategy leads indeed to efficient sampling of new transcripts, and we investigated multiplexing the strategy by pooling RACE reactions from multiple interrogated loci before hybridization.

    Funded by: NCI NIH HHS: N01-CO-12400; NHGRI NIH HHS: U01 HG003147-01, U01 HG003147-02, U01 HG003147-02S1, U01 HG003147-02S2, U01 HG003147-02S3, U01 HG003150-01, U01 HG003150-02, U01 HG003150-03, U01 HG003150-03S1, U01 HG003150-03S2, U01HG003147, U01HG003150, U54 HG004555-02, U54 HG004557-01, U54 HG004557-02, U54 HG004557-02S1, U54 HG004557-03; Wellcome Trust: 077198

    Nature methods 2008;5;7;629-35

  • NestedMICA as an ab initio protein motif discovery tool.

    Doğruel M, Down TA and Hubbard TJ

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, UK.

    Background: Discovering overrepresented patterns in amino acid sequences is an important step in protein functional element identification. We adapted and extended NestedMICA, an ab initio motif finder originally developed for finding transcription binding site motifs, to find short protein signals, and compared its performance with another popular protein motif finder, MEME. NestedMICA, an open source protein motif discovery tool written in Java, is driven by a Monte Carlo technique called Nested Sampling. It uses multi-class sequence background models to represent different "uninteresting" parts of sequences that do not contain motifs of interest. In order to assess NestedMICA as a protein motif finder, we have tested it on synthetic datasets produced by spiking instances of known motifs into a randomly selected set of protein sequences. NestedMICA was also tested using a biologically-authentic test set, where we evaluated its performance with respect to varying sequence length.

    Results: Generally NestedMICA recovered most of the short (3-9 amino acid long) test protein motifs spiked into a test set of sequences at different frequencies. We showed that it can be used to find multiple motifs at the same time, too. In all the assessment experiments we carried out, its overall motif discovery performance was better than that of MEME.

    Conclusion: NestedMICA proved itself to be a robust and sensitive ab initio protein motif finder, even for relatively short motifs that exist in only a small fraction of sequences.

    Availability: NestedMICA is available under the Lesser GPL open-source license from:

    Funded by: Wellcome Trust: 077198

    BMC bioinformatics 2008;9;19

  • Foreword. Comparative cytogenetics in the genomics era: cytogenomics comes of age.

    Dobigny G and Yang F

    Institut de Recherche pour le Développement, Campus de Baillarguet CS30016, 34988, Montferrier sur Lez, France.

    Chromosome research : an international journal on the molecular, supramolecular and evolutionary aspects of chromosome biology 2008;16;1;1-4

  • Protein kinases of malaria parasites: an update.

    Doerig C, Billker O, Haystead T, Sharma P, Tobin AB and Waters NC

    INSERM U609, Wellcome Centre for Molecular Parasitology, University of Glasgow Biomedical Research Centre, Glasgow G12 8TA, Scotland, UK.

    Protein kinases (PKs) play crucial roles in the control of proliferation and differentiation in eukaryotic cells. Research on protein phosphorylation has expanded tremendously in the past few years, in part as a consequence of the realization that PKs represent attractive drug targets in a variety of diseases. Activity in Plasmodium PK research has followed this trend, and several reports on various aspects of this subject were delivered at the Molecular Approaches to Malaria 2008 meeting (MAM2008), a sharp increase from the previous meeting. Here, the authors of most of these communications join to propose an integrated update of the development of the rapidly expanding field of Plasmodium kinomics.

    Trends in parasitology 2008;24;12;570-7

  • An association analysis of murine anxiety genes in humans implicates novel candidate genes for anxiety disorders.

    Donner J, Pirkola S, Silander K, Kananen L, Terwilliger JD, Lönnqvist J, Peltonen L and Hovatta I

    Research Program of Molecular Neurology, Biomedicum Helsinki, Finland.

    Background: Human anxiety disorders are complex diseases with largely unknown etiology. We have taken a cross-species approach to identify genes that regulate anxiety-like behavior with inbred mouse strains that differ in their innate anxiety levels as a model. We previously identified 17 genes with expression levels that correlate with anxiety behavior across the studied strains. In the present study, we tested their 13 known human homologues as candidate genes for human anxiety disorders with a genetic association study.

    Methods: We describe an anxiety disorder study sample derived from a Finnish population-based cohort and consisting of 321 patients and 653 carefully matched control subjects, all interviewed to obtain DSM-IV diagnoses. We genotyped altogether 208 single nucleotide polymorphisms (SNPs) (all non-synonymous SNPs, SNPs that alter potential microRNA binding sites, and gap-filling SNPs selected on the basis of HapMap information) from the investigated anxiety candidate genes.

    Results: Specific alleles and haplotypes of six of the examined genes revealed some evidence for association (p ≤ .01). The most significant evidence for association with different anxiety disorder subtypes were: p = .0009 with ALAD (delta-aminolevulinate dehydratase) in social phobia, p = .009 with DYNLL2 (dynein light chain 2) in generalized anxiety disorder, and p = .004 with PSAP (prosaposin) in panic disorder.

    Conclusions: Our findings suggest that variants in these genes might predispose to specific human anxiety disorders. These results illustrate the potential utility of cross-species approaches in identification of candidate genes for psychiatric disorders.

    Funded by: Wellcome Trust: 089061

    Biological psychiatry 2008;64;8;672-80

  • A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis.

    Down TA, Rakyan VK, Turner DJ, Flicek P, Li H, Kulesha E, Gräf S, Johnson N, Herrero J, Tomazou EM, Thorne NP, Bäckdahl L, Herberth M, Howe KL, Jackson DK, Miretti MM, Marioni JC, Birney E, Hubbard TJ, Durbin R, Tavaré S and Beck S

    Wellcome Trust Cancer Research UK Gurdon Institute, and Department of Genetics, University of Cambridge, Tennis Court Road, Cambridge CB2 1QR, UK.

    DNA methylation is an indispensable epigenetic modification required for regulating the expression of mammalian genomes. Immunoprecipitation-based methods for DNA methylome analysis are rapidly shifting the bottleneck in this field from data generation to data analysis, necessitating the development of better analytical tools. In particular, an inability to estimate absolute methylation levels remains a major analytical difficulty associated with immunoprecipitation-based DNA methylation profiling. To address this issue, we developed a cross-platform algorithm, Bayesian tool for methylation analysis (Batman), for analyzing methylated DNA immunoprecipitation (MeDIP) profiles generated using oligonucleotide arrays (MeDIP-chip) or next-generation sequencing (MeDIP-seq). We developed the latter approach to provide a high-resolution whole-genome DNA methylation profile (DNA methylome) of a mammalian genome. Strong correlation of our data, obtained using mature human spermatozoa, with those obtained using bisulfite sequencing suggests that combining MeDIP-seq or MeDIP-chip with Batman provides a robust, quantitative and cost-effective functional genomic strategy for elucidating the function of DNA methylation.

    Funded by: Cancer Research UK: C14303/A8646; Wellcome Trust: 077198, 083563, 084071

    Nature biotechnology 2008;26;7;779-85

  • Gene expression changes linked to antimicrobial resistance, oxidative stress, iron depletion and retained motility are observed when Burkholderia cenocepacia grows in cystic fibrosis sputum.

    Drevinek P, Holden MT, Ge Z, Jones AM, Ketchell I, Gill RT and Mahenthiralingam E

    Cardiff School of Biosciences, Cardiff University, Cardiff, UK.

    Background: Bacteria from the Burkholderia cepacia complex (Bcc) are the only group of cystic fibrosis (CF) respiratory pathogens that may cause death by an invasive infection known as cepacia syndrome. Their large genome (> 7000 genes) and multiple pathways encoding the same putative functions make virulence factor identification difficult in these bacteria.

    Methods: A novel microarray was designed to the genome of Burkholderia cenocepacia J2315 and transcriptomics used to identify genes that were differentially regulated when the pathogen was grown in a CF sputum-based infection model. Sputum samples from CF individuals infected with the same B. cenocepacia strain as genome isolate were used, hence, other than a dilution into a minimal growth medium (used as the control condition), no further treatment of the sputum was carried out.

    Results: A total of 723 coding sequences were significantly altered, with 287 upregulated and 436 downregulated; the microarray-observed expression was validated by quantitative PCR on five selected genes. B. cenocepacia genes with putative functions in antimicrobial resistance, iron uptake, protection against reactive oxygen and nitrogen species, secretion and motility were among the most altered in sputum. Novel upregulated genes included: a transmembrane ferric reductase (BCAL0270) implicated in iron metabolism, a novel protease (BCAL0849) that may play a role in host tissue destruction, an organic hydroperoxide resistance gene (BCAM2753), an oxidoreductase (BCAL1107) and a nitrite/sulfite reductase (BCAM1676) that may play roles in resistance to the host defenses. The assumptions of growth under iron-depletion and oxidative stress formulated from the microarray data were tested and confirmed by independent growth of B. cenocepacia under each respective environmental condition.

    Conclusion: Overall, our first full transcriptomic analysis of B. cenocepacia demonstrated the pathogen alters expression of over 10% of the 7176 genes within its genome when it grows in CF sputum. Novel genetic pathways involved in responses to antimicrobial resistance, oxidative stress, and iron metabolism were revealed by the microarray analysis. Virulence factors such as the cable pilus and Cenocepacia Pathogenicity Island were unaltered in expression. However, B. cenocepacia sustained or increased expression of motility-associated genes in sputum, maintaining a potentially invasive phenotype associated with cepacia syndrome.

    Funded by: Wellcome Trust: 075586

    BMC infectious diseases 2008;8;121

  • Emergence of highly fluoroquinolone-resistant Salmonella enterica serovar Typhi in a community-based fever surveillance from Kolkata, India.

    Dutta S, Sur D, Manna B, Sen B, Bhattacharya M, Bhattacharya SK, Wain J, Nair S, Clemens JD and Ochiai RL

    International journal of antimicrobial agents 2008;31;4;387-9

  • The evolution of the DLK1-DIO3 imprinted domain in mammals.

    Edwards CA, Mungall AJ, Matthews L, Ryder E, Gray DJ, Pask AJ, Shaw G, Graves JA, Rogers J, SAVOIR consortium, Dunham I, Renfree MB and Ferguson-Smith AC

    Department of Physiology, Development, and Neuroscience, University of Cambridge, Cambridge, United Kingdom.

    A comprehensive, domain-wide comparative analysis of genomic imprinting between mammals that imprint and those that do not can provide valuable information about how and why imprinting evolved. The imprinting status, DNA methylation, and genomic landscape of the Dlk1-Dio3 cluster were determined in eutherian, metatherian, and prototherian mammals including tammar wallaby and platypus. Imprinting across the whole domain evolved after the divergence of eutherian from marsupial mammals and in eutherians is under strong purifying selection. The marsupial locus, at 1.6 megabases, is double that of eutherians due to the accumulation of LINE repeats. Comparative sequence analysis of the domain in seven vertebrates determined evolutionary conserved regions common to particular sub-groups and to all vertebrates. The emergence of Dlk1-Dio3 imprinting in eutherians has occurred on the maternally inherited chromosome and is associated with region-specific resistance to expansion by repetitive elements and the local introduction of noncoding transcripts including microRNAs and C/D small nucleolar RNAs. A recent mammal-specific retrotransposition event led to the formation of a completely new gene only in the eutherian domain, which may have driven imprinting at the cluster.

    Funded by: Medical Research Council: G0400156

    PLoS biology 2008;6;6;e135

  • Evolutionary expansion and anatomical specialization of synapse proteome complexity.

    Emes RD, Pocklington AJ, Anderson CN, Bayes A, Collins MO, Vickers CA, Croning MD, Malik BR, Choudhary JS, Armstrong JD and Grant SG

    Institute for Science and Technology in Medicine, Keele University, Thornburrow Drive, Hartshill, Stoke-on-Trent ST4 7QB, UK.

    Understanding the origins and evolution of synapses may provide insight into species diversity and the organization of the brain. Using comparative proteomics and genomics, we examined the evolution of the postsynaptic density (PSD) and membrane-associated guanylate kinase (MAGUK)-associated signaling complexes (MASCs) that underlie learning and memory. PSD and MASC orthologs found in yeast carry out basic cellular functions to regulate protein synthesis and structural plasticity. We observed marked changes in signaling complexity at the yeast-metazoan and invertebrate-vertebrate boundaries, with an expansion of key synaptic components, notably receptors, adhesion/cytoskeletal proteins and scaffold proteins. A proteomic comparison of Drosophila and mouse MASCs revealed species-specific adaptation with greater signaling complexity in mouse. Although synaptic components were conserved amongst diverse vertebrate species, mapping mRNA and protein expression in the mouse brain showed that vertebrate-specific components preferentially contributed to differences between brain regions. We propose that the evolution of synapse complexity around a core proto-synapse has contributed to invertebrate-vertebrate differences and to brain specialization.

    Funded by: Medical Research Council: G90/112, G90/93; Wellcome Trust: 077155

    Nature neuroscience 2008;11;7;799-806

  • Independent introduction of two lactase-persistence alleles into human populations reflects different history of adaptation to milk culture.

    Enattah NS, Jensen TG, Nielsen M, Lewinski R, Kuokkanen M, Rasinpera H, El-Shanti H, Seo JK, Alifrangis M, Khalil IF, Natah A, Ali A, Natah S, Comas D, Mehdi SQ, Groop L, Vestergaard EM, Imtiaz F, Rashed MS, Meyer B, Troelsen J and Peltonen L

    Department of Molecular Medicine, National Public Health Institute, Biomedicum Helsinki, 00251 Helsinki, Finland.

    The T(-13910) variant located in the enhancer element of the lactase (LCT) gene correlates perfectly with lactase persistence (LP) in Eurasian populations, whereas the variant is almost nonexistent among Sub-Saharan African populations that nonetheless show a high prevalence of LP. Here, we report identification of two new mutations among Saudis, also known for the high prevalence of LP. We confirmed the absence of the European T(-13910) and established two new mutations found as a compound allele: T/G(-13915) within the -13910 enhancer region and a synonymous SNP in exon 17 of the MCM6 gene T/C(-3712), -3712 bp from the LCT gene. The compound allele is driven to a high prevalence among Middle East population(s). Our functional analyses in vitro showed that both SNPs of the compound allele, located 10 kb apart, are required for the enhancer effect, most probably mediated through the binding of the hepatic nuclear factor 1 alpha (HNF1 alpha). A high selection coefficient (s) of approximately 0.04 for the LP phenotype was found for both T(-13910) and the compound allele. The European T(-13910) and the earlier identified East African G(-13907) LP allele share the same ancestral background and most likely the same history, probably related to the same cattle domestication event. In contrast, the compound Arab allele shows a different, highly divergent ancestral haplotype, suggesting that these two major global LP alleles have arisen independently, the latter perhaps in response to camel milk consumption. These results support the convergent evolution of the LP in diverse populations, most probably reflecting different histories of adaptation to milk culture.

    American journal of human genetics 2008;82;1;57-72

  • Nuclear envelope defects cause stem cell dysfunction in premature-aging mice.

    Espada J, Varela I, Flores I, Ugalde AP, Cadiñanos J, Pendás AM, Stewart CL, Tryggvason K, Blasco MA, Freije JM and López-Otín C

    Departamento de Bioquímica y Biología Molecular, Facultad de Medicina, Instituto Universitario de Oncología, Universidad de Oviedo, 33006 Oviedo, Spain.

    Nuclear lamina alterations occur in physiological aging and in premature aging syndromes. Because aging is also associated with abnormal stem cell homeostasis, we hypothesize that nuclear envelope alterations could have an important impact on stem cell compartments. To evaluate this hypothesis, we examined the number and functional competence of stem cells in Zmpste24-null progeroid mice, which exhibit nuclear lamina defects. We show that Zmpste24 deficiency causes an alteration in the number and proliferative capacity of epidermal stem cells. These changes are associated with an aberrant nuclear architecture of bulge cells and an increase in apoptosis of their supporting cells in the hair bulb region. These alterations are rescued in Zmpste24(-/-)Lmna(+/-) mutant mice, which do not manifest progeroid symptoms. We also report that molecular signaling pathways implicated in the regulation of stem cell behavior, such as Wnt and microphthalmia transcription factor, are altered in Zmpste24(-/-) mice. These findings establish a link between age-related nuclear envelope defects and stem cell dysfunction.

    The Journal of cell biology 2008;181;1;27-35

  • Evaluating the role of LPIN1 variation in insulin resistance, body weight, and human lipodystrophy in U.K. populations.

    Fawcett KA, Grimsey N, Loos RJ, Wheeler E, Daly A, Soos M, Semple R, Syddall H, Cooper C, Siniossoglou S, O'Rahilly S, Wareham NJ and Barroso I

    Metabolic Disease Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, U.K.

    Objective: Loss of lipin 1 activity causes lipodystrophy and insulin resistance in the fld mouse, and LPIN1 expression and common genetic variation were recently suggested to influence adiposity and insulin sensitivity in humans. We aimed to conduct a comprehensive association study to clarify the influence of common LPIN1 variation on adiposity and insulin sensitivity in U.K. populations and to examine the role of LPIN1 mutations in insulin resistance syndromes.

    Research design and methods: Twenty-two single nucleotide polymorphisms tagging common LPIN1 variation were genotyped in Medical Research Council (MRC) Ely (n = 1,709) and Hertfordshire (n = 2,901) population-based cohorts. LPIN1 exons, exon/intron boundaries, and 3' untranslated region were sequenced in 158 patients with idiopathic severe insulin resistance (including 23 lipodystrophic patients) and 48 control subjects.

    Results: We found no association between LPIN1 single nucleotide polymorphisms and fasting insulin but report a nominal association between rs13412852 and BMI (P = 0.042) in a meta-analysis of 8,504 samples from in-house and publicly available studies. Three rare nonsynonymous variants (A353T, R552K, and G582R) were detected in severely insulin-resistant patients. However, these did not cosegregate with disease in affected families, and Lipin1 protein expression and phosphorylation in patients with variants were indistinguishable from those in control subjects.

    Conclusions: Our data do not support a major effect of common LPIN1 variation on metabolic traits and suggest that mutations in LPIN1 are not a common cause of lipodystrophy in humans. The nominal associations with BMI and other metabolic traits in U.K. cohorts require replication in larger cohorts.

    Funded by: Medical Research Council: G0000934, G0000934(68341), G0701446, MC_U106188470, MC_U147574221, MC_U147585824, MC_UP_A620_1014, U.1475.00.002.00001.01 (85824), U.1475.00.004.00002.01(74221); Wellcome Trust: 068545, 068545/Z/02, 077016, 078986, 078986/Z/06/Z, 080952, 080952/Z/06/Z

    Diabetes 2008;57;9;2527-33

  • Collaborative genome-wide association analysis supports a role for ANK3 and CACNA1C in bipolar disorder.

    Ferreira MA, O'Donovan MC, Meng YA, Jones IR, Ruderfer DM, Jones L, Fan J, Kirov G, Perlis RH, Green EK, Smoller JW, Grozeva D, Stone J, Nikolov I, Chambert K, Hamshere ML, Nimgaonkar VL, Moskvina V, Thase ME, Caesar S, Sachs GS, Franklin J, Gordon-Smith K, Ardlie KG, Gabriel SB, Fraser C, Blumenstiel B, Defelice M, Breen G, Gill M, Morris DW, Elkin A, Muir WJ, McGhee KA, Williamson R, MacIntyre DJ, MacLean AW, St CD, Robinson M, Van Beck M, Pereira AC, Kandaswamy R, McQuillin A, Collier DA, Bass NJ, Young AH, Lawrence J, Ferrier IN, Anjorin A, Farmer A, Curtis D, Scolnick EM, McGuffin P, Daly MJ, Corvin AP, Holmans PA, Blackwood DH, Gurling HM, Owen MJ, Purcell SM, Sklar P, Craddock N and Wellcome Trust Case Control Consortium

    Department of Psychiatry, Massachusetts General Hospital, Boston, Massachusetts 02114, USA.

    To identify susceptibility loci for bipolar disorder, we tested 1.8 million variants in 4,387 cases and 6,209 controls and identified a region of strong association (rs10994336, P = 9.1 x 10(-9)) in ANK3 (ankyrin G). We also found further support for the previously reported CACNA1C (alpha 1C subunit of the L-type voltage-gated calcium channel; combined P = 7.0 x 10(-8), rs1006737). Our results suggest that ion channelopathies may be involved in the pathogenesis of bipolar disorder.

    Funded by: Chief Scientist Office; Medical Research Council: G0500791, G0701003, G9309834, G9623693N; NCRR NIH HHS: U54 RR020278; NIMH NIH HHS: MH062137, MH063445, MH067288, MH63420, N01MH80001; Wellcome Trust: 076113, 077011, 082371

    Nature genetics 2008;40;9;1056-8
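
    The scale of the scan described in this abstract (1.8 million variants) can be related to its reported P values with a simple multiple-testing calculation. The sketch below applies a Bonferroni correction, which is an assumption made here for illustration and not a criterion stated in the abstract; it treats all variants as independent tests and is therefore conservative. The function name `bonferroni_threshold` and the 0.05 family-wise error rate are likewise illustrative choices.

```python
def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Figures taken from the abstract above.
N_VARIANTS = 1_800_000
REPORTED_P = {
    "rs10994336 (ANK3)": 9.1e-9,
    "rs1006737 (CACNA1C, combined)": 7.0e-8,
}

threshold = bonferroni_threshold(N_VARIANTS)

# Flag which reported P values fall below the (assumed) threshold.
passes = {snp: p < threshold for snp, p in REPORTED_P.items()}

if __name__ == "__main__":
    print(f"Bonferroni threshold: {threshold:.2e}")
    for snp, ok in passes.items():
        print(f"{snp}: P = {REPORTED_P[snp]:.1e}, below threshold: {ok}")
```

    Under these assumptions the ANK3 signal falls below the per-test threshold while the combined CACNA1C signal does not, which is consistent with the abstract describing the latter as "further support" for a previously reported association.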

  • The minimum information about a genome sequence (MIGS) specification.

    Field D, Garrity G, Gray T, Morrison N, Selengut J, Sterk P, Tatusova T, Thomson N, Allen MJ, Angiuoli SV, Ashburner M, Axelrod N, Baldauf S, Ballard S, Boore J, Cochrane G, Cole J, Dawyndt P, De Vos P, DePamphilis C, Edwards R, Faruque N, Feldman R, Gilbert J, Gilna P, Glöckner FO, Goldstein P, Guralnick R, Haft D, Hancock D, Hermjakob H, Hertz-Fowler C, Hugenholtz P, Joint I, Kagan L, Kane M, Kennedy J, Kowalchuk G, Kottmann R, Kolker E, Kravitz S, Kyrpides N, Leebens-Mack J, Lewis SE, Li K, Lister AL, Lord P, Maltsev N, Markowitz V, Martiny J, Methe B, Mizrachi I, Moxon R, Nelson K, Parkhill J, Proctor L, White O, Sansone SA, Spiers A, Stevens R, Swift P, Taylor C, Tateno Y, Tett A, Turner S, Ussery D, Vaughan B, Ward N, Whetzel T, San Gil I, Wilson G and Wipat A

    Natural Environmental Research Council Centre for Ecology and Hydrology, Oxford OX1 3SR, UK.

    With the quantity of genomic data increasing at an exponential rate, it is imperative that these data be captured electronically, in a standard format. Standardization activities must proceed within the auspices of open-access and international working bodies. To tackle the issues surrounding the development of better descriptions of genomic investigations, we have formed the Genomic Standards Consortium (GSC). Here, we introduce the minimum information about a genome sequence (MIGS) specification with the intent of promoting participation in its development and discussing the resources that will be required to develop improved mechanisms of metadata capture and exchange. As part of its wider goals, the GSC also supports improving the 'transparency' of the information contained in existing genomic databases.

    Funded by: Medical Research Council: G8225539; NLM NIH HHS: Z99 LM999999

    Nature biotechnology 2008;26;5;541-7

  • Meeting report: the fifth Genomic Standards Consortium (GSC) workshop.

    Field D, Garrity GM, Sansone SA, Sterk P, Gray T, Kyrpides N, Hirschman L, Glöckner FO, Kottmann R, Angiuoli S, White O, Dawyndt P, Thomson N, Gil IS, Morrison N, Tatusova T, Mizrachi I, Vaughan R, Cochrane G, Kagan L, Murphy S, Schriml L and Genomic Standards Consortium

    NERC Center for Ecology and Hydrology, Oxford, United Kingdom.

    This meeting report summarizes the proceedings of the fifth Genomic Standards Consortium (GSC) workshop held December 12-14, 2007, at the European Bioinformatics Institute (EBI), Cambridge, UK. This fifth workshop served as a milestone event in the evolution of the GSC (launched in September 2005); the key outcome of the workshop was the finalization of a stable version of the MIGS specification (v2.0) for publication. This accomplishment enables, and also in some cases necessitates, downstream activities, which are described in the multiauthor, consensus-driven articles in this special issue of OMICS produced as a direct result of the workshop. This report briefly summarizes the workshop and overviews the special issue. In particular, it aims to explain how the various GSC-led projects are working together to help this community achieve its stated mission of further standardizing the descriptions of genomes and metagenomes and implementing improved mechanisms of data exchange and integration to enable more accurate comparative analyses. Further information about the GSC and its range of activities can be found on the GSC website.

    Omics : a journal of integrative biology 2008;12;2;109-13

  • Meeting report: the fourth Genomic Standards Consortium (GSC) workshop.

    Field D, Glöckner FO, Garrity GM, Gray T, Sterk P, Cochrane G, Vaughan R, Kolker E, Kottmann R, Kyrpides N, Angiuoli S, Dawyndt P, Guralnick R, Goldstein P, Hall N, Hirschman L, Kravitz S, Lister AL, Markowitz V, Thomson N and Whetzel T

    NERC Centre for Ecology and Hydrology, Mansfield Road, Oxford, OX1 3SR United Kingdom.

    This meeting report summarizes the proceedings of the "eGenomics: Cataloguing our Complete Genome Collection IV" workshop held June 6-8, 2007, at the National Institute for Environmental eScience (NIEeS), Cambridge, United Kingdom. This fourth workshop of the Genomic Standards Consortium (GSC) was a mix of short presentations, strategy discussions, and technical sessions. Speakers provided progress reports on the development of the "Minimum Information about a Genome Sequence" (MIGS) specification and the closely integrated "Minimum Information about a Metagenome Sequence" (MIMS) specification. The key outcome of the workshop was consensus on the next version of the MIGS/MIMS specification (v1.2). This drove further definition and restructuring of the MIGS/MIMS XML schema (syntax). With respect to semantics, a term vetting group was established to ensure that terms are properly defined and submitted to the appropriate ontology projects. Perhaps the single most important outcome of the workshop was a proposal to move beyond the concept of "minimum" to create a far richer XML schema that would define a "Genomic Contextual Data Markup Language" (GCDML) suitable for wider semantic integration across databases. GCDML will contain not only curated information (e.g., compliant with MIGS/MIMS), but also be extended to include a variety of data processing and calculations. Further information about the Genomic Standards Consortium and its range of activities can be found on the GSC website.

    Omics : a journal of integrative biology 2008;12;2;101-8

  • The Pfam protein families database.

    Finn RD, Tate J, Mistry J, Coggill PC, Sammut SJ, Hotz HR, Ceric G, Forslund K, Eddy SR, Sonnhammer EL and Bateman A

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton Hall, Hinxton, Cambridgeshire, CB10 1SA, UK.

    Pfam is a comprehensive collection of protein domains and families, represented as multiple sequence alignments and as profile hidden Markov models. The current release of Pfam (22.0) contains 9318 protein families. Pfam is now based not only on the UniProtKB sequence database, but also on NCBI GenPept and on sequences from selected metagenomics projects. Pfam is available on the web from the consortium members using a new, consistent and improved website design in the UK, the USA and Sweden, as well as from mirror sites in France and South Korea.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/F010435/1; Wellcome Trust: 087656

    Nucleic acids research 2008;36;Database issue;D281-8

  • Genetic determinants of ulcerative colitis include the ECM1 locus and five loci implicated in Crohn's disease.

    Fisher SA, Tremelling M, Anderson CA, Gwilliam R, Bumpstead S, Prescott NJ, Nimmo ER, Massey D, Berzuini C, Johnson C, Barrett JC, Cummings FR, Drummond H, Lees CW, Onnie CM, Hanson CE, Blaszczyk K, Inouye M, Ewels P, Ravindrarajah R, Keniry A, Hunt S, Carter M, Watkins N, Ouwehand W, Lewis CM, Cardon L, Wellcome Trust Case Control Consortium, Lobo A, Forbes A, Sanderson J, Jewell DP, Mansfield JC, Deloukas P, Mathew CG, Parkes M and Satsangi J

    Department of Medical and Molecular Genetics, King's College London School of Medicine, 8th Floor Guy's Tower, Guy's Hospital, London SE1 9RT, UK.

    We report results of a nonsynonymous SNP scan for ulcerative colitis and identify a previously unknown susceptibility locus at ECM1. We also show that several risk loci are common to ulcerative colitis and Crohn's disease (IL23R, IL12B, HLA, NKX2-3 and MST1), whereas autophagy genes ATG16L1 and IRGM, along with NOD2 (also known as CARD15), are specific for Crohn's disease. These data provide the first detailed illustration of the genetic relationship between these common inflammatory bowel diseases.

    Funded by: Chief Scientist Office: CZB/4/540, ETM/137, ETM/75; Medical Research Council: G0000934, G0400874, G0600329, G0800383, G0800759, G0802320, MC_QA137934, MC_U105260799; Wellcome Trust: 076113, 077011, 089120

    Nature genetics 2008;40;6;710-2

  • Ensembl 2008.

    Flicek P, Aken BL, Beal K, Ballester B, Caccamo M, Chen Y, Clarke L, Coates G, Cunningham F, Cutts T, Down T, Dyer SC, Eyre T, Fitzgerald S, Fernandez-Banet J, Gräf S, Haider S, Hammond M, Holland R, Howe KL, Howe K, Johnson N, Jenkinson A, Kähäri A, Keefe D, Kokocinski F, Kulesha E, Lawson D, Longden I, Megy K, Meidl P, Overduin B, Parker A, Pritchard B, Prlic A, Rice S, Rios D, Schuster M, Sealy I, Slater G, Smedley D, Spudich G, Trevanion S, Vilella AJ, Vogel J, White S, Wood M, Birney E, Cox T, Curwen V, Durbin R, Fernandez-Suarez XM, Herrero J, Hubbard TJ, Kasprzyk A, Proctor G, Smith J, Ureta-Vidal A and Searle S

    European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

    The Ensembl project is a comprehensive genome information system featuring an integrated set of genome annotation, databases and other information for chordate and selected model organism and disease vector genomes. As of release 47 (October 2007), Ensembl fully supports 35 species, with preliminary support for six additional species. New species in the past year include platypus and horse. Major additions and improvements to Ensembl since our previous report include extensive support for functional genomics data in the form of a specialized functional genomics database, genome-wide maps of protein-DNA interactions and the Ensembl regulatory build; support for customization of the Ensembl web interface through the addition of user accounts and user groups; and increased support for genome resequencing. We have also introduced new comparative genomics-based data mining options and report on the continued development of our software infrastructure.

    Funded by: Biotechnology and Biological Sciences Research Council: BBE0116401, BBS/B/13446, BBS/B/13470; Wellcome Trust: 062023, 077198

    Nucleic acids research 2008;36;Database issue;D707-14

  • Testing of diabetes-associated WFS1 polymorphisms in the Diabetes Prevention Program.

    Florez JC, Jablonski KA, McAteer J, Sandhu MS, Wareham NJ, Barroso I, Franks PW, Altshuler D, Knowler WC and Diabetes Prevention Program Research Group

    Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA, USA.

    Aims/hypothesis: Wolfram syndrome (diabetes insipidus, diabetes mellitus, optic atrophy and deafness) is caused by mutations in the WFS1 gene. Recently, single nucleotide polymorphisms (SNPs) in WFS1 have been reproducibly associated with type 2 diabetes. We therefore examined the effects of these variants on diabetes incidence and response to interventions in the Diabetes Prevention Program (DPP), in which a lifestyle intervention or metformin treatment was compared with placebo.

    Methods: We genotyped the WFS1 SNPs rs10010131, rs752854 and rs734312 (H611R) in 3,548 DPP participants and performed Cox regression analysis using genotype, intervention and their interactions as predictors of diabetes incidence. We also evaluated the effect of these SNPs on insulin resistance and beta cell function at 1 year.

    Results: Although none of the three SNPs was associated with diabetes incidence in the overall cohort, white homozygotes for the previously reported protective alleles appeared less likely to develop diabetes in the lifestyle arm. Examination of the publicly available Diabetes Genetics Initiative genome-wide association dataset revealed that rs10012946, which is in strong linkage disequilibrium with the three WFS1 SNPs (r(2)=0.88-1.0), was associated with type 2 diabetes (allelic odds ratio 0.85, 95% CI 0.75-0.97, p=0.026). In the DPP, we noted a trend towards increased insulin secretion in carriers of the protective variants, although for most SNPs this was seen as compensatory for the diminished insulin sensitivity.

    Conclusions/interpretation: The previously reported protective effect of select WFS1 alleles may be magnified by a lifestyle intervention. These variants appear to confer an improvement in beta cell function.

    Funded by: Medical Research Council: MC_U106179471; NIDDK NIH HHS: K23 DK65978-04, R01 DK072041-02, U01 DK048489, U01 DK048489-06

    Diabetologia 2008;51;3;451-7

  • The Catalogue of Somatic Mutations in Cancer (COSMIC).

    Forbes SA, Bhamra G, Bamford S, Dawson E, Kok C, Clements J, Menzies A, Teague JW, Futreal PA and Stratton MR

    Wellcome Trust Genome Campus, Hinxton, United Kingdom.

    COSMIC is currently the most comprehensive global resource for information on somatic mutations in human cancer, combining curation of the scientific literature with tumor resequencing data from the Cancer Genome Project at the Sanger Institute, U.K. Almost 4800 genes and 250000 tumors have been examined, resulting in over 50000 mutations available for investigation. This information can be accessed in a number of ways, the most convenient being the Web-based system which allows detailed data mining, presenting the results in easily interpretable formats. This unit describes the graphical system in detail, elaborating an example walkthrough and the many ways that the resulting information can be thoroughly investigated by combining data, respecializing the query, or viewing the results in different ways. Alternate protocols give an overview of the precompiled data files available for download.

    Funded by: Wellcome Trust: 077012

    Current protocols in human genetics / editorial board, Jonathan L. Haines ... [et al.] 2008;Chapter 10;Unit 10.11

  • Off-pathway, oxygen-dependent thiamine radical in the Krebs cycle.

    Frank RA, Kay CW, Hirst J and Luisi BF

    Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK.

    The catalytic cofactor thiamine diphosphate is found in many enzymes of central metabolism and is essential in all extant forms of life. We demonstrate the presence of an oxygen-dependent free radical in the thiamine diphosphate-dependent Escherichia coli 2-oxoglutarate dehydrogenase, which is a key component of the tricarboxylic acid (Krebs) cycle. The radical was sufficiently long-lived to be trapped by freezing in liquid nitrogen, and its electronic structure was investigated by electron paramagnetic resonance (EPR) and electron-nuclear double resonance (ENDOR). Taken together, the spectroscopic results revealed a delocalized pi radical on the enamine-thiazolium intermediate within the enzyme active site. The radical is generated as an intermediate during substrate turnover by a side reaction with molecular oxygen, resulting in the continuous production of reactive oxygen species under aerobic conditions. This off-pathway reaction may account for metabolic dysfunction associated with several neurodegenerative diseases. The possibility that the on-pathway reaction may proceed via a radical mechanism is discussed.

    Funded by: Wellcome Trust

    Journal of the American Chemical Society 2008;130;5;1662-8

  • Detection, imputation, and association analysis of small deletions and null alleles on oligonucleotide arrays.

    Franke L, de Kovel CG, Aulchenko YS, Trynka G, Zhernakova A, Hunt KA, Blauw HM, van den Berg LH, Ophoff R, Deloukas P, van Heel DA and Wijmenga C

    Complex Genetics Section, DBG-Department of Medical Genetics, University Medical Centre Utrecht, 3584 CG Utrecht, The Netherlands.

    Copy-number variation (CNV) is a major contributor to human genetic variation. Recently, CNV associations with human disease have been reported. Many genome-wide association (GWA) studies in complex diseases have been performed with sets of biallelic single-nucleotide polymorphisms (SNPs), but the available CNV methods are still limited. We present a new method (TriTyper) that can infer genotypes in case-control data sets for deletion CNVs, or SNPs with an extra, untyped allele at a high-resolution single SNP level. By accounting for linkage disequilibrium (LD), as well as intensity data, calling accuracy is improved. Analysis of 3102 unrelated individuals of European descent, genotyped with Illumina Infinium BeadChips, resulted in the identification of 1880 SNPs with a common untyped allele, and these SNPs are in strong LD with neighboring biallelic SNPs. Simulations indicate our method has superior power to detect associations compared to biallelic SNPs that are in LD with these SNPs, yet without increasing type I errors, as shown in a GWA analysis in celiac disease. Genotypes for 1204 triallelic SNPs could be fully imputed, with only biallelic-genotype calls, permitting association analysis of these SNPs in many published data sets. We estimate that 682 of the 1655 unique loci reflect deletions; this is on average 99 deletions per individual, four times greater than those detected by other methods. Whereas the identified loci are strongly enriched for known deletions, 61% have not been reported before. Genes overlapping with these loci more often have paralogs (p = 0.006) and biologically interact with fewer genes than expected (p = 0.004).

    Funded by: Medical Research Council: G0000934; Wellcome Trust: 068545/Z/02, GR068094MA

    American journal of human genetics 2008;82;6;1316-33

  • Replication of the association between variants in WFS1 and risk of type 2 diabetes in European populations.

    Franks PW, Rolandsson O, Debenham SL, Fawcett KA, Payne F, Dina C, Froguel P, Mohlke KL, Willer C, Olsson T, Wareham NJ, Hallmans G, Barroso I and Sandhu MS

    Department of Public Health and Clinical Medicine, Umeå University Hospital, Umeå, Sweden.

    Aims/hypothesis: Mutations at the gene encoding wolframin (WFS1) cause Wolfram syndrome, a rare neurological condition. Associations between single nucleotide polymorphisms (SNPs) at WFS1 and type 2 diabetes have recently been reported. Thus, our aim was to replicate those associations in a northern Swedish case-control study of type 2 diabetes. We also performed a meta-analysis of published and previously unpublished data from Sweden, Finland and France, to obtain updated summary effect estimates.

    Methods: Four WFS1 SNPs (rs10010131, rs6446482, rs752854 and rs734312 [H611R]) were genotyped in a type 2 diabetes case-control study (n = 1,296/1,412) of Swedish adults. Logistic regression was used to assess the association between each WFS1 SNP and type 2 diabetes, following adjustment for age, sex and BMI. We then performed a meta-analysis of 11 studies of type 2 diabetes, comprising up to 14,139 patients and 16,109 controls, to obtain a summary effect estimate for the WFS1 variants.

    Results: In the northern Swedish study, the minor allele at rs752854 was associated with reduced type 2 diabetes risk [odds ratio (OR) 0.85, 95% CI 0.75-0.96, p=0.010]. Borderline statistical associations were observed for the remaining SNPs. The meta-analysis of the four independent replication studies for SNP rs10010131 and correlated variants showed evidence for statistical association (OR 0.87, 95% CI 0.82-0.93, p=4.5 x 10(-5)). In an updated meta-analysis of all 11 studies, strong evidence of statistical association was also observed (OR 0.89, 95% CI 0.86-0.92; p=4.9 x 10(-11)).

    Conclusions/interpretation: In this study of WFS1 variants and type 2 diabetes risk, we have replicated the previously reported associations between SNPs at this locus and the risk of type 2 diabetes.

    Funded by: Medical Research Council: MC_U106179471; NIDDK NIH HHS: DK62370, DK72193, R01 DK072193-01, R01 DK072193-02, R01 DK072193-03; Wellcome Trust: 077016

    Diabetologia 2008;51;3;458-63
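
    The odds ratios, 95% confidence intervals and p values quoted in the Results of this abstract are linked by a standard normal approximation on the log-odds scale, and a short worked example may help readers check them. The sketch below assumes a Wald-type interval (log OR ± 1.96 standard errors), which is an assumption rather than something the abstract states; the helper name `p_from_or_ci` is hypothetical. Because the published values are rounded, the recovered p value is only approximate.

```python
import math


def p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate z score and two-sided p value from an OR and its 95% CI.

    Assumes the interval was built as exp(log(OR) +/- z_crit * SE).
    """
    log_or = math.log(odds_ratio)
    # The CI width on the log scale spans 2 * z_crit standard errors.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    # Two-sided tail probability of a standard normal variate.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


if __name__ == "__main__":
    # rs752854 in the northern Swedish study: OR 0.85, 95% CI 0.75-0.96.
    z, p = p_from_or_ci(0.85, 0.75, 0.96)
    print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

    For the rs752854 figures this gives a p value close to the reported 0.010. Applying the same function to the other estimates in the abstract yields values of the same order of magnitude as those reported, with differences attributable to rounding of the published OR and CI.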

  • Submicroscopic duplications of the hydroxysteroid dehydrogenase HSD17B10 and the E3 ubiquitin ligase HUWE1 are associated with mental retardation.

    Froyen G, Corbett M, Vandewalle J, Jarvela I, Lawrence O, Meldrum C, Bauters M, Govaerts K, Vandeleur L, Van Esch H, Chelly J, Sanlaville D, van Bokhoven H, Ropers HH, Laumonnier F, Ranieri E, Schwartz CE, Abidi F, Tarpey PS, Futreal PA, Whibley A, Raymond FL, Stratton MR, Fryns JP, Scott R, Peippo M, Sipponen M, Partington M, Mowat D, Field M, Hackett A, Marynen P, Turner G and Gécz J

    Human Genome Laboratory, Department for Molecular and Developmental Genetics, VIB, B-3000 Leuven, Belgium.

    Submicroscopic copy-number imbalances contribute significantly to the genetic etiology of human disease. Here, we report a novel microduplication hot spot at Xp11.22 identified in six unrelated families with predominantly nonsyndromic XLMR. All duplications segregate with the disease, including the large families MRX17 and MRX31. The minimal, commonly duplicated region contains three genes: RIBC1, HSD17B10, and HUWE1. RIBC1 could be excluded on the basis of its absence of expression in the brain and because it escapes X inactivation in females. For the other genes, expression array and quantitative PCR analysis in patient cell lines compared to controls showed a significant upregulation of HSD17B10 and HUWE1 as well as several important genes in their molecular pathways. Loss-of-function mutations of HSD17B10 have previously been associated with progressive neurological disease and XLMR. The E3 ubiquitin ligase HUWE1 has been implicated in TP53-associated regulation of the neuronal cell cycle. Here, we also report segregating sequence changes of highly conserved residues in HUWE1 in three XLMR families; these changes are possibly associated with the phenotype. Our findings demonstrate that an increased gene dosage of HSD17B10, HUWE1, or both contributes to the etiology of XLMR and suggest that point mutations in HUWE1 are associated with this disease too.

    Funded by: NICHD NIH HHS: HD26202; NINDS NIH HHS: NS31564; Wellcome Trust

    American journal of human genetics 2008;82;2;432-43

  • Common variation in the ABO glycosyltransferase is associated with susceptibility to severe Plasmodium falciparum malaria.

    Fry AE, Griffiths MJ, Auburn S, Diakite M, Forton JT, Green A, Richardson A, Wilson J, Jallow M, Sisay-Joof F, Pinder M, Peshu N, Williams TN, Marsh K, Molyneux ME, Taylor TE, Rockett KA and Kwiatkowski DP

    The Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford OX3 7BN, UK.

    There is growing epidemiological and molecular evidence that ABO blood group affects host susceptibility to severe Plasmodium falciparum infection. The high frequency of common ABO alleles means that even modest differences in susceptibility could have a significant impact on the health of people living in malaria endemic regions. We performed an association study, the first to utilize key molecular genetic variation underlying the ABO system, genotyping >9000 individuals across three African populations. Using population- and family-based tests, we demonstrated that alleles producing functional ABO enzymes are associated with greater risk of severe malaria phenotypes (particularly malarial anemia) in comparison with the frameshift deletion underlying blood group O: case-control allelic odds ratio (OR), 1.2; 95% confidence interval (CI), 1.09-1.32; P = 0.0003; family-studies allelic OR, 1.19; 95% CI, 1.08-1.32; P = 0.001; pooled across all studies allelic OR, 1.18; 95% CI, 1.11-1.26; P = 2 x 10(-7). We found suggestive evidence of a parent-of-origin effect at the ABO locus by analyzing the family trios. Non-O haplotypes inherited from mothers, but not fathers, are significantly associated with severe malaria (likelihood ratio test of Weinberg, P = 0.046). Finally, we used HapMap data to demonstrate a region of low F(ST) (-0.001) between the three main HapMap population groups across the ABO locus, an outlier in the empirical distribution of F(ST) across chromosome 9 (approximately 99.5-99.9th centile). This low F(ST) region may be a signal of long-standing balancing selection at the ABO locus, caused by multiple infectious pathogens including P. falciparum.

    Funded by: Medical Research Council: G0600230(77610); Wellcome Trust: 074586, 076934

    Human molecular genetics 2008;17;4;567-76

  • Advantages of q-PCR as a method of screening for gene targeting in mammalian cells using conventional and whole BAC-based constructs.

    Gómez-Rodríguez J, Washington V, Cheng J, Dutra A, Pak E, Liu P, McVicar DW and Schwartzberg PL

    Genetic Disease Research Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA.

    We evaluate here the use of real-time quantitative PCR (q-PCR) as a method for screening for homologous recombinants generated in mammalian cells from either conventional gene-targeting constructs or whole BAC-based constructs. Using gene-targeted events at different loci, we show that q-PCR is a highly sensitive and accurate method for screening for conventional gene targeting that can reduce the number of clones requiring follow-up screening by Southern blotting. We further compared q-PCR to fluorescent in situ hybridization (FISH) for the detection of gene-targeting events using full-length BAC-based constructs designed to introduce mutations either into one gene or simultaneously into two adjacent genes. We find that although BAC-based constructs appeared to have high rates of homologous recombination when evaluated by FISH, screening by FISH was prone to false positives that were detected by q-PCR. Our results demonstrate the utility of q-PCR as a screening tool for gene targeting and further highlight potential problems with the use of whole BAC-based constructs for homologous recombination.

    Funded by: Wellcome Trust

    Nucleic acids research 2008;36;18;e117

  • SrfB, a member of the Serum Response Factor family of transcription factors, regulates starvation response and early development in Dictyostelium.

    Galardi-Castilla M, Pergolizzi B, Bloomfield G, Skelton J, Ivens A, Kay RR, Bozzaro S and Sastre L

    Instituto de Investigaciones Biomédicas CSIC/UAM. Arturo Duperier, 4. 28029 Madrid, Spain.

    The Serum Response Factor (SRF) is an important regulator of cell proliferation and differentiation. The Dictyostelium discoideum srfB gene codes for an SRF homologue and is expressed in vegetative cells and during development under the control of three alternative promoters, which show different cell-type specific patterns of expression. The two more proximal promoters directed gene transcription in prestalk AB, stalk and lower-cup cells. The generation of a strain where the srfB gene has been interrupted (srfB(-)) has shown that this gene is required for regulation of actin-cytoskeleton-related functions, such as cytokinesis and macropinocytosis. The mutant failed to develop well in suspension, but could be rescued by cAMP pulsing, suggesting a defect in cAMP signaling. srfB(-) cells showed impaired chemotaxis to cAMP and defective lateral pseudopodium inhibition. Nevertheless, srfB(-) cells aggregated on agar plates and nitrocellulose filters 2 h earlier than wild type cells, and completed development, showing an increased tendency to form slug structures. Analysis of wild type and srfB(-) strains detected significant differences in the regulation of gene expression upon starvation. Genes coding for lysosomal and ribosomal proteins, developmentally-regulated genes, and some genes coding for proteins involved in cytoskeleton regulation were deregulated during the first stages of development.

    Funded by: Wellcome Trust

    Developmental biology 2008;316;2;260-74

  • ES cell pluripotency and germ-layer formation require the SWI/SNF chromatin remodeling component BAF250a.

    Gao X, Tate P, Hu P, Tjian R, Skarnes WC and Wang Z

    Cardiovascular Research Center, Massachusetts General Hospital, Harvard Medical School, Richard Simches Research Center, 185 Cambridge Street, Boston, MA 02114, USA.

    ATP-dependent chromatin remodeling complexes are a notable group of epigenetic modifiers that use the energy of ATP hydrolysis to change the structure of chromatin, thereby altering its accessibility to nuclear factors. BAF250a (ARID1a) is a unique and defining subunit of the BAF chromatin remodeling complex with the potential to facilitate chromosome alterations critical during development. Our studies show that ablation of BAF250a in early mouse embryos results in developmental arrest (about embryonic day 6.5) and absence of the mesodermal layer, indicating its critical role in early germ-layer formation. Moreover, BAF250a deficiency compromises ES cell pluripotency, severely inhibits self-renewal, and promotes differentiation into primitive endoderm-like cells under normal feeder-free culture conditions. Interestingly, this phenotype can be partially rescued by the presence of embryonic fibroblast cells. DNA microarray, immunostaining, and RNA analyses revealed that BAF250a-mediated chromatin remodeling contributes to the proper expression of numerous genes involved in ES cell self-renewal, including Sox2, Utf1, and Oct4. Furthermore, the pluripotency defects in BAF250a mutant ES cells appear to be cell lineage-specific. For example, embryoid body-based analyses demonstrated that BAF250a-ablated stem cells are defective in differentiating into fully functional mesoderm-derived cardiomyocytes and adipocytes but are capable of differentiating into ectoderm-derived neurons. Our results suggest that BAF250a is a key component of the gene regulatory machinery in ES cells controlling self-renewal, differentiation, and cell lineage decisions.

    Funded by: Howard Hughes Medical Institute

    Proceedings of the National Academy of Sciences of the United States of America 2008;105;18;6656-61

  • Mutation of miRNA target sequences during human evolution.

    Gardner PP and Vinther J

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK.

    It has long been hypothesized that changes in non-protein-coding genes and the regulatory sequences controlling expression could undergo positive selection. Here we identify 402 putative microRNA (miRNA) target sequences that have been mutated specifically in the human lineage and show that genes containing such deletions are more highly expressed than their mouse orthologs. Our findings indicate that some miRNA target mutations are fixed by positive selection and might have been involved in the evolution of human-specific traits.

    Trends in genetics : TIG 2008;24;6;262-5

  • Apheresis donors and platelet function: inherent platelet responsiveness influences platelet quality.

    Garner SF, Jones CI, Stephens J, Burns P, Walton J, Bernard A, Angenent W, Ouwehand WH, Goodall AH and BLOODOMICS Consortium

    Department of Haematology, University of Cambridge, Cambridge, UK.

    Background: Process-induced platelet (PLT) activation occurs with all production methods, including apheresis. Recent studies have highlighted the range and consistency of interindividual variation in the PLT response, but little is known about the contribution of a donor's inherent PLT responsiveness to the activation state of the apheresis PLTs or the effect of frequent apheresis on donors' PLTs.

    Study design and methods: The effect of the donors' PLT response on the apheresis PLTs was studied in 47 individuals selected as having PLTs with inherently low, intermediate, or high responsiveness. Whole-blood flow cytometry was used to measure PLT activation (levels of bound fibrinogen) before donation and in the apheresis PLTs. The effects of regular apheresis on the activation status of donors' PLTs were studied by comparing the in vivo activation status of PLTs from apheresis (n = 349) and whole-blood donors (n = 157), before donation. The effect of apheresis per se on PLT activation was measured in 10 apheresis donors before and after donation.

    Results: The level of PLT activation in the apheresis packs was generally higher than in the donor, and the most activated PLTs were from high-responder donors. There was no significant difference in PLT activation before donation between the apheresis and whole-blood donors (p = 0.697), and there was no consistent evidence of activation in the donors immediately after apheresis.

    Conclusion: The most activated apheresis PLTs were obtained from donors with more responsive PLTs. Regular apheresis, however, does not lead to PLT activation in the donors.

    Transfusion 2008;48;4;673-80

  • The Gene Ontology project in 2008.

    Gene Ontology Consortium

    The Gene Ontology (GO) project provides a set of structured, controlled vocabularies for community use in annotating genes, gene products and sequences. The ontologies have been extended and refined for several biological areas, and improvements to the structure of the ontologies have been implemented. To improve the quantity and quality of gene product annotations available from its public repository, the GO Consortium has launched a focused effort to provide comprehensive and detailed annotation of orthologous genes across a number of 'reference' genomes, including human and several key model organisms. Software developments include two releases of the ontology-editing tool OBO-Edit, and improvements to the AmiGO browser interface.

    Funded by: NHGRI NIH HHS: HG02273

    Nucleic acids research 2008;36;Database issue;D440-4

  • The H-Invitational Database (H-InvDB), a comprehensive annotation resource for human genes and transcripts.

    Genome Information Integration Project And H-Invitational 2, Yamasaki C, Murakami K, Fujii Y, Sato Y, Harada E, Takeda J, Taniya T, Sakate R, Kikugawa S, Shimada M, Tanino M, Koyanagi KO, Barrero RA, Gough C, Chun HW, Habara T, Hanaoka H, Hayakawa Y, Hilton PB, Kaneko Y, Kanno M, Kawahara Y, Kawamura T, Matsuya A, Nagata N, Nishikata K, Noda AO, Nurimoto S, Saichi N, Sakai H, Sanbonmatsu R, Shiba R, Suzuki M, Takabayashi K, Takahashi A, Tamura T, Tanaka M, Tanaka S, Todokoro F, Yamaguchi K, Yamamoto N, Okido T, Mashima J, Hashizume A, Jin L, Lee KB, Lin YC, Nozaki A, Sakai K, Tada M, Miyazaki S, Makino T, Ohyanagi H, Osato N, Tanaka N, Suzuki Y, Ikeo K, Saitou N, Sugawara H, O'Donovan C, Kulikova T, Whitfield E, Halligan B, Shimoyama M, Twigger S, Yura K, Kimura K, Yasuda T, Nishikawa T, Akiyama Y, Motono C, Mukai Y, Nagasaki H, Suwa M, Horton P, Kikuno R, Ohara O, Lancet D, Eveno E, Graudens E, Imbeaud S, Debily MA, Hayashizaki Y, Amid C, Han M, Osanger A, Endo T, Thomas MA, Hirakawa M, Makalowski W, Nakao M, Kim NS, Yoo HS, De Souza SJ, Bonaldo Mde F, Niimura Y, Kuryshev V, Schupp I, Wiemann S, Bellgard M, Shionyu M, Jia L, Thierry-Mieg D, Thierry-Mieg J, Wagner L, Zhang Q, Go M, Minoshima S, Ohtsubo M, Hanada K, Tonellato P, Isogai T, Zhang J, Lenhard B, Kim S, Chen Z, Hinz U, Estreicher A, Nakai K, Makalowska I, Hide W, Tiffin N, Wilming L, Chakraborty R, Soares MB, Chiusano ML, Suzuki Y, Auffray C, Yamaguchi-Kabata Y, Itoh T, Hishiki T, Fukuchi S, Nishikawa K, Sugano S, Nomura N, Tateno Y, Imanishi T and Gojobori T

    Japan Biological Information Research Center, Japan Biological Informatics Consortium, Japan.

    Here we report the new features and improvements in our latest release of the H-Invitational Database (H-InvDB), a comprehensive annotation resource for human genes and transcripts. H-InvDB, originally developed as an integrated database of the human transcriptome based on extensive annotation of large sets of full-length cDNA (FLcDNA) clones, now provides annotation for 120 558 human mRNAs extracted from the International Nucleotide Sequence Databases (INSD), in addition to 54 978 human FLcDNAs, in the latest release H-InvDB_4.6. We mapped those human transcripts onto the human genome sequences (NCBI build 36.1) and determined 34 699 human gene clusters, which could define 34 057 (98.1%) protein-coding and 642 (1.9%) non-protein-coding loci; 858 (2.5%) transcribed loci overlapped with predicted pseudogenes. For all these transcripts and genes, we provide comprehensive annotation including gene structures, gene functions, alternative splicing variants, functional non-protein-coding RNAs, functional domains, predicted subcellular localizations, metabolic pathways, predictions of protein 3D structure, mapping of SNPs and microsatellite repeat motifs, co-localization with orphan diseases, gene expression profiles, orthologous genes, protein-protein interactions (PPI) and annotation for gene families. The current H-InvDB annotation resources consist of two main views: Transcript view and Locus view and eight sub-databases: the DiseaseInfo Viewer, H-ANGEL, the Clustering Viewer, G-integra, the TOPO Viewer, Evola, the PPI view and the Gene family/group.
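    The locus counts and percentages quoted in this abstract are internally consistent, as a short arithmetic check shows. The sketch uses only the figures stated above; the variable names and the `percent` helper are illustrative and not part of H-InvDB.

```python
# Counts taken directly from the abstract.
total_clusters = 34699
protein_coding = 34057
non_protein_coding = 642
pseudogene_overlap = 858

def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal place."""
    return round(100 * part / whole, 1)

# The two locus classes partition the full set of gene clusters.
assert protein_coding + non_protein_coding == total_clusters

print(percent(protein_coding, total_clusters))      # protein-coding share
print(percent(non_protein_coding, total_clusters))  # non-protein-coding share
print(percent(pseudogene_overlap, total_clusters))  # overlap with predicted pseudogenes
```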

    Funded by: Wellcome Trust: 077198

    Nucleic acids research 2008;36;Database issue;D793-9

  • SLC9A6 mutations cause X-linked mental retardation, microcephaly, epilepsy, and ataxia, a phenotype mimicking Angelman syndrome.

    Gilfillan GD, Selmer KK, Roxrud I, Smith R, Kyllerman M, Eiklid K, Kroken M, Mattingsdal M, Egeland T, Stenmark H, Sjøholm H, Server A, Samuelsson L, Christianson A, Tarpey P, Whibley A, Stratton MR, Futreal PA, Teague J, Edkins S, Gecz J, Turner G, Raymond FL, Schwartz C, Stevenson RE, Undlien DE and Strømme P

    Department of Medical Genetics, Ullevål University Hospital, NO-0407 Oslo, Norway.

    Linkage analysis and DNA sequencing in a family exhibiting an X-linked mental retardation (XLMR) syndrome, characterized by microcephaly, epilepsy, ataxia, and absent speech and resembling Angelman syndrome, identified a deletion in the SLC9A6 gene encoding the Na(+)/H(+) exchanger NHE6. Subsequently, other mutations were found in a male with mental retardation (MR) who had been investigated for Angelman syndrome and in two XLMR families with epilepsy and ataxia, including the family designated as having Christianson syndrome. Therefore, mutations in SLC9A6 cause X-linked mental retardation. Additionally, males with findings suggestive of unexplained Angelman syndrome should be considered as potential candidates for SLC9A6 mutations.

    Funded by: NICHD NIH HHS: HD2606; NINDS NIH HHS: NS31564; Wellcome Trust

    American journal of human genetics 2008;82;4;1003-10

  • Phylogenomics of the dog and fox family (Canidae, Carnivora) revealed by chromosome painting.

    Graphodatsky AS, Perelman PL, Sokolovskaya NV, Beklemisheva VR, Serdukova NA, Dobigny G, O'Brien SJ, Ferguson-Smith MA and Yang F

    Institute of Cytology and Genetics, SB RAS, Novosibirsk, 630090, Russia.

    Canid species (dogs and foxes) have highly rearranged karyotypes and thus represent a challenge for conventional comparative cytogenetic studies. Among them, the domestic dog is one of the best-mapped species in mammals, constituting an ideal reference genome for comparative genomic study. Here we report the results of genome-wide comparative mapping of dog chromosome-specific probes onto chromosomes of the dhole, fennec fox, and gray fox, as well as the mapping of red fox chromosome-specific probes onto chromosomes of the corsac fox. We also present an integrated comparative chromosome map between the species studied here and all canids studied previously. The integrated map demonstrates an extensive conservation of whole chromosome arms across different canid species. In addition, we have generated a comprehensive genome phylogeny for the Canidae on the basis of the chromosome rearrangements revealed by comparative painting. This genome phylogeny has provided new insights into the karyotypic relationships among the canids. Our results, together with published data, allow the formulation of a likely Canidae ancestral karyotype (CAK, 2n = 82), and reveal that at least 6-24 chromosomal fission/fusion events are needed to convert the CAK karyotype to that of the modern canids.

    Chromosome research : an international journal on the molecular, supramolecular and evolutionary aspects of chromosome biology 2008;16;1;129-43

  • Tracking genome organization in rodents by Zoo-FISH.

    Graphodatsky AS, Yang F, Dobigny G, Romanenko SA, Biltueva LS, Perelman PL, Beklemisheva VR, Alkalaeva EZ, Serdukova NA, Ferguson-Smith MA, Murphy WJ and Robinson TJ

    Institute of Cytology and Genetics, SB RAS, Novosibirsk, 630090, Russia.

    The number of rodent species examined by modern comparative genomic approaches, particularly chromosome painting, is limited. The use of human whole-chromosome painting probes to detect regions of homology in the karyotypes of the rodent index species, the mouse and rat, has been hindered by the highly rearranged nature of their genomes. In contrast, recent studies have demonstrated that non-murid rodents display more conserved genomes, underscoring their suitability for comparative genomic and higher-order systematic studies. Here we provide the first comparative chromosome maps between human and representative rodents of three major rodent lineages: Castoridae, Pedetidae and Dipodidae. A comprehensive analysis of these data and those published for Sciuridae shows (1) that Castoridae, Pedetidae and Dipodidae form a monophyletic group, and (2) that the European beaver Castor fiber (Castoridae) and the birch mouse Sicista betulina (Dipodidae) are sister species to the exclusion of the springhare Pedetes capensis (Pedetidae), thus resolving an enduring trifurcation in rodent higher-level systematics. Our results together with published data on the Sciuridae allow the formulation of a putative rodent ancestral karyotype (2n = 50) that is thought to comprise the following 26 human chromosomal segments and/or segmental associations: HSA1pq, 1q/10p, 2pq, 2q, 3a, 3b/19p, 3c/21, 4b, 5, 6, 7a, 7b/16p, 8p/4a/8p, 8q, 9/11, 10q, 12a/22a, 12b/22b, 13, 14/15, 16q/19q, 17, 18, 20, X and Y. These findings provide insights into the likely composition of the ancestral rodent karyotype and an improved understanding of placental genome evolution.

    Funded by: Wellcome Trust

    Chromosome research : an international journal on the molecular, supramolecular and evolutionary aspects of chromosome biology 2008;16;2;261-74

  • Replication-timing-correlated spatial chromatin arrangements in cancer and in primate interphase nuclei.

    Grasser F, Neusser M, Fiegler H, Thormeyer T, Cremer M, Carter NP, Cremer T and Müller S

    Department of Biology II, Human Genetics, Ludwig-Maximilians University Munich, Planegg-Martinsried, Germany.

    Using published high-resolution data on S-phase replication timing, we determined the three-dimensional (3D) nuclear arrangement of 33 very-early-replicating and 31 very-late-replicating loci. We analyzed diploid human, non-human primate and rearranged tumor cells by 3D fluorescence in situ hybridization with the aim of investigating the impact of chromosomal structural changes on the nuclear organization of these loci. Overall, their topology was found to be largely conserved between cell types, species and in tumor cells. Early-replicating loci were localized in the nuclear interior, whereas late-replicating loci showed a broader distribution with a higher preference for the periphery than for late-BrdU-incorporation foci. However, differences in the spatial arrangement of early and late loci of chromosome 2, as compared with those from chromosome 5, 7 and 17, argue against replication timing as a major driving force for the 3D radial genome organization in human lymphoblastoid cell nuclei. Instead, genomic properties, and local gene density in particular, were identified as the decisive parameters. Further detailed comparisons of chromosome 7 loci in primate and tumor cells suggest that the inversions analyzed influence nuclear topology to a greater extent than the translocations, thus pointing to geometrical constraints in the 3D conformation of a chromosome territory.

    Funded by: Wellcome Trust

    Journal of cell science 2008;121;Pt 11;1876-86

  • ORegAnno: an open-access community-driven resource for regulatory annotation.

    Griffith OL, Montgomery SB, Bernier B, Chu B, Kasaian K, Aerts S, Mahony S, Sleumer MC, Bilenky M, Haeussler M, Griffith M, Gallo SM, Giardine B, Hooghe B, Van Loo P, Blanco E, Ticoll A, Lithwick S, Portales-Casamar E, Donaldson IJ, Robertson G, Wadelius C, De Bleser P, Vlieghe D, Halfon MS, Wasserman W, Hardison R, Bergman CM, Jones SJ and Open Regulatory Annotation Consortium

    Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC V5Z 4S6, Canada.

    ORegAnno is an open-source, open-access database and literature curation system for community-based annotation of experimentally identified DNA regulatory regions, transcription factor binding sites and regulatory variants. The current release comprises 30 145 records curated from 922 publications and describing regulatory sequences for over 3853 genes and 465 transcription factors from 19 species. A new feature called the 'publication queue' allows users to input relevant papers from scientific literature as targets for annotation. The queue contains 4438 gene regulation papers entered by experts and another 54 351 identified by text-mining methods. Users can enter or 'check out' papers from the queue for manual curation using a series of user-friendly annotation pages. A typical record entry consists of species, sequence type, sequence, target gene, binding factor, experimental outcome and one or more lines of experimental evidence. An evidence ontology was developed to describe and categorize these experiments. Records are cross-referenced to Ensembl or Entrez gene identifiers, PubMed and dbSNP and can be visualized in the Ensembl or UCSC genome browsers. All data are freely available through search pages, XML data dumps or web services.
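    The "typical record entry" described above can be pictured as a small data structure. The following is a minimal sketch only: the class name `RegulatoryRecord`, the field names and the validation rule are assumptions made for illustration and do not reproduce ORegAnno's actual schema or XML format. The example values are placeholders, not real database content.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class RegulatoryRecord:
    """Hypothetical container mirroring the fields listed in the abstract."""
    species: str
    sequence_type: str
    sequence: str
    target_gene: str
    binding_factor: str
    outcome: str
    evidence: List[str] = field(default_factory=list)

    def __post_init__(self):
        # The abstract states that a record carries one or more lines of evidence.
        if not self.evidence:
            raise ValueError("a record needs at least one line of experimental evidence")

# Placeholder values for illustration only.
record = RegulatoryRecord(
    species="EXAMPLE_SPECIES",
    sequence_type="transcription factor binding site",
    sequence="ACGTACGT",
    target_gene="EXAMPLE_GENE",
    binding_factor="EXAMPLE_FACTOR",
    outcome="positive",
    evidence=["EXAMPLE_ASSAY"],
)
print(record.target_gene, len(record.evidence))
```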

    Nucleic acids research 2008;36;Database issue;D107-13

  • miRBase: tools for microRNA genomics.

    Griffiths-Jones S, Saini HK, van Dongen S and Enright AJ

    Faculty of Life Sciences, University of Manchester, Michael Smith Building, Oxford Road, Manchester, UK.

    miRBase is the central online repository for microRNA (miRNA) nomenclature, sequence data, annotation and target prediction. The current release (10.0) contains 5071 miRNA loci from 58 species, expressing 5922 distinct mature miRNA sequences: a growth of over 2000 sequences in the past 2 years. miRBase provides a range of data to facilitate studies of miRNA genomics: all miRNAs are mapped to their genomic coordinates. Clusters of miRNA sequences in the genome are highlighted, and can be defined and retrieved with any inter-miRNA distance. The overlap of miRNA sequences with annotated transcripts, both protein- and non-coding, is described. Finally, graphical views of the locations of a wide range of genomic features in model organisms allow for the first time the prediction of the likely boundaries of many miRNA primary transcripts. miRBase is available online.
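    Defining clusters by an inter-miRNA distance, as mentioned above, can be sketched as a gap-based grouping of sorted genomic coordinates. This is not miRBase's implementation: the function `cluster_loci`, the choice to measure distance from the end of one locus to the start of the next, the omission of strand, and the example coordinates are all assumptions for illustration.

```python
from collections import defaultdict

def cluster_loci(loci, max_distance):
    """Group loci on the same chromosome whose gaps do not exceed max_distance.

    loci: iterable of (name, chromosome, start, end) tuples.
    Returns a list of clusters, each a list of locus names in genomic order.
    """
    by_chrom = defaultdict(list)
    for name, chrom, start, end in loci:
        by_chrom[chrom].append((start, end, name))

    clusters = []
    for chrom in sorted(by_chrom):
        current, current_end = [], None
        for start, end, name in sorted(by_chrom[chrom]):
            # Start a new cluster when the gap to the previous locus is too large.
            if current and start - current_end > max_distance:
                clusters.append(current)
                current, current_end = [], None
            current.append(name)
            current_end = end if current_end is None else max(current_end, end)
        if current:
            clusters.append(current)
    return clusters

# Made-up coordinates, not taken from miRBase.
example = [
    ("mir-a", "chr1", 1000, 1080),
    ("mir-b", "chr1", 1500, 1585),
    ("mir-c", "chr1", 60000, 60090),
    ("mir-d", "chr2", 1200, 1275),
]
print(cluster_loci(example, 10000))
```

    Changing `max_distance` changes the grouping, which is the sense in which clusters can be retrieved "with any inter-miRNA distance".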

    Funded by: Wellcome Trust

    Nucleic acids research 2008;36;Database issue;D154-8

  • The missing link: Bordetella petrii is endowed with both the metabolic versatility of environmental bacteria and virulence traits of pathogenic Bordetellae.

    Gross R, Guzman CA, Sebaihia M, dos Santos VA, Pieper DH, Koebnik R, Lechner M, Bartels D, Buhrmester J, Choudhuri JV, Ebensen T, Gaigalat L, Herrmann S, Khachane AN, Larisch C, Link S, Linke B, Meyer F, Mormann S, Nakunst D, Rückert C, Schneiker-Bekel S, Schulze K, Vorhölter FJ, Yevsa T, Engle JT, Goldman WE, Pühler A, Göbel UB, Goesmann A, Blöcker H, Kaiser O and Martinez-Arias R

    Chair of Microbiology, Biocenter, University of Würzburg, Am Hubland, D-97074 Würzburg, Germany.

    Background: Bordetella petrii is the only environmental species hitherto found among the otherwise host-restricted and pathogenic members of the genus Bordetella. Phylogenetically, it connects the pathogenic Bordetellae and environmental bacteria of the genera Achromobacter and Alcaligenes, which are opportunistic pathogens. B. petrii strains have been isolated from very different environmental niches, including river sediment, polluted soil, marine sponges and a grass root. Recently, clinical isolates associated with bone degenerative disease or cystic fibrosis have also been described.

    Results: In this manuscript we present the results of the analysis of the completely annotated genome sequence of the B. petrii strain DSMZ12804. B. petrii has a mosaic genome of 5,287,950 bp harboring numerous mobile genetic elements, including seven large genomic islands. Four of them are highly related to the clc element of Pseudomonas knackmussii B13, which encodes genes involved in the degradation of aromatics. Though being an environmental isolate, the sequenced B. petrii strain also encodes proteins related to virulence factors of the pathogenic Bordetellae, including the filamentous hemagglutinin, which is a major colonization factor of B. pertussis, and the master virulence regulator BvgAS. However, it lacks all known toxins of the pathogenic Bordetellae.

    Conclusion: The genomic analysis suggests that B. petrii represents an evolutionary link between free-living environmental bacteria and the host-restricted obligate pathogenic Bordetellae. Its remarkable metabolic versatility may enable B. petrii to thrive in very different ecological niches.

    BMC genomics 2008;9;449

  • Systemic spread is an early step in breast cancer.

    Hüsemann Y, Geigl JB, Schubert F, Musiani P, Meyer M, Burghart E, Forni G, Eils R, Fehm T, Riethmüller G and Klein CA

    Department of Pathology, Division of Oncogenomics, University of Regensburg, Regensburg 93053, Germany.

    It is widely accepted that metastasis is a late event in cancer progression. Here, however, we show that tumor cells can disseminate systemically from earliest epithelial alterations in HER-2 and PyMT transgenic mice and from ductal carcinoma in situ in women. Wild-type mice transplanted with single premalignant HER-2 transgenic glands displayed disseminated tumor cells and micrometastasis in bone marrow and lungs. The number of disseminated cancer cells and their karyotypic abnormalities were similar for small and large tumors in patients and mouse models. When activated by bone marrow transplantation into wild-type recipients, 80 early-disseminated cancer cells sufficed to induce lethal carcinosis. Therefore, release from dormancy of early-disseminated cancer cells may frequently account for metachronous metastasis.

    Cancer cell 2008;13;1;58-68

  • High divergence in primate-specific duplicated regions: human and chimpanzee chorionic gonadotropin beta genes.

    Hallast P, Saarela J, Palotie A and Laan M

    Department of Biotechnology, Institute of Molecular and Cell Biology, University of Tartu, Riia 23, 51010 Tartu, Estonia.

    Background: Low nucleotide divergence between human and chimpanzee does not sufficiently explain the species-specific morphological, physiological and behavioral traits. As gene duplication is a major prerequisite for the emergence of new genes and novel biological processes, comparative studies of human and chimpanzee duplicated genes may assist in understanding the mechanisms behind primate evolution. We addressed the divergence between human and chimpanzee duplicated genomic regions by using Luteinizing Hormone Beta (LHB)/Chorionic Gonadotropin Beta (CGB) gene cluster as a model. The placental CGB genes that are essential for implantation have evolved from an ancestral pituitary LHB gene by duplications in the primate lineage.

    Results: We shotgun sequenced and compared the human (45,165 bp) and chimpanzee (39,876 bp) LHB/CGB regions and hereby present evidence for structural variation resulting in discordant number of CGB genes (6 in human, 5 in chimpanzee). The scenario of species-specific parallel duplications was supported (i) as the most parsimonious solution requiring the least rearrangement events to explain the interspecies structural differences; (ii) by the phylogenetic trees constructed with fragments of intergenic regions; (iii) by the sequence similarity calculations. Across the orthologous regions of LHB/CGB cluster, substitutions and indels contributed approximately equally to the interspecies divergence and the distribution of nucleotide identity was correlated with the regional repeat content. Intraspecies gene conversion may have shaped the LHB/CGB gene cluster. The substitution divergence (1.8-2.59%) exceeded the estimates for single-copy loci two- to threefold and the fraction of transversional mutations was increased compared to the unique sequences (43% versus approximately 30%). Despite the high sequence identity among LHB/CGB genes, there are signs of functional differentiation among the gene copies. Estimates for dn/ds rate ratio suggested a purifying selection on LHB and CGB8, and a positive evolution of CGB1.

    Conclusion: If generalized, our data suggest that in addition to species-specific deletions and duplications, parallel duplication events may have contributed to genetic differences separating humans from their closest relatives. Compared to unique genomic segments, duplicated regions are characterized by high divergence promoted by intraspecies gene conversion and species-specific chromosomal rearrangements, including the alterations in gene copy number.

    Funded by: Wellcome Trust: 070191/Z/03/Z

    BMC evolutionary biology 2008;8;195

  • A novel streptococcal integrative conjugative element involved in iron acquisition.

    Heather Z, Holden MT, Steward KF, Parkhill J, Song L, Challis GL, Robinson C, Davis-Poynter N and Waller AS

    Centre for Preventive Medicine, Animal Health Trust, Lanwades Park, Kentford, Newmarket, Suffolk, UK.

    In this study, we determined the function of a novel non-ribosomal peptide synthetase (NRPS) system carried by a streptococcal integrative conjugative element (ICE), ICESe2. The NRPS shares similarity with the yersiniabactin system found in the high-pathogenicity island of Yersinia sp. and is the first of its kind to be identified in streptococci. We named the NRPS product 'equibactin' and genes of this locus eqbA-N. ICESe2, although absolutely conserved in Streptococcus equi, the causative agent of equine strangles, was absent from all strains of the closely related opportunistic pathogen Streptococcus zooepidemicus. Binding of EqbA, a DtxR-like regulator, to the eqbB promoter was increased in the presence of cations. Deletion of eqbA resulted in a small-colony phenotype. Further deletion of the irp2 homologue eqbE, or the genes eqbH, eqbI and eqbJ encoding a putative ABC transporter, or addition of the iron chelator nitrilotriacetate, reversed this phenotype, implicating iron toxicity. Quantification of (55)Fe accumulation and sensitivity to streptonigrin suggested that equibactin is secreted by S. equi and that the eqbH, eqbI and eqbJ genes are required for its associated iron import. In agreement with a structure-based model of equibactin synthesis, supplementation of chemically defined media with salicylate was required for equibactin production.

    Molecular microbiology 2008;70;5;1274-92

  • Sequence data swell for nematodes.

    Hertz-Fowler C and Pain A

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    With more than 80,000 described species that are extremely diverse in terms of ecology and biology, the Nematoda phylum is one of the most common animal phyla. This month's Genome Watch describes genomes of several nematodes, including that of the human filarial parasite Brugia malayi.

    Nature reviews. Microbiology 2008;6;11;800-1

  • Telomeric expression sites are highly conserved in Trypanosoma brucei.

    Hertz-Fowler C, Figueiredo LM, Quail MA, Becker M, Jackson A, Bason N, Brooks K, Churcher C, Fahkro S, Goodhead I, Heath P, Kartvelishvili M, Mungall K, Harris D, Hauser H, Sanders M, Saunders D, Seeger K, Sharp S, Taylor JE, Walker D, White B, Young R, Cross GA, Rudenko G, Barry JD, Louis EJ and Berriman M

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.

    Subtelomeric regions are often under-represented in genome sequences of eukaryotes. One of the best known examples of the use of telomere proximity for adaptive purposes is the bloodstream expression sites (BESs) of the African trypanosome Trypanosoma brucei. To enhance our understanding of BES structure and function in host adaptation and immune evasion, the BES repertoire from the Lister 427 strain of T. brucei was independently tagged and sequenced. BESs are polymorphic in size and structure but reveal a surprisingly conserved architecture in the context of extensive recombination. Very small BESs do exist and many functioning BESs do not contain the full complement of expression site associated genes (ESAGs). The consequences of duplicated or missing ESAGs, including ESAG9, a newly named ESAG12, and additional variant surface glycoprotein genes (VSGs) were evaluated by functional assays after BESs were tagged with a drug-resistance gene. Phylogenetic analysis of constituent ESAG families suggests that BESs are sequence mosaics and that extensive recombination has shaped the evolution of the BES repertoire. This work opens important perspectives in understanding the molecular mechanisms of antigenic variation, a widely used strategy for immune evasion in pathogens, and telomere biology.

    Funded by: NIAID NIH HHS: R01AI021729; Wellcome Trust: 095161

    PloS one 2008;3;10;e3527

  • A Myo6 mutation destroys coordination between the myosin heads, revealing new functions of myosin VI in the stereocilia of mammalian inner ear hair cells.

    Hertzano R, Shalit E, Rzadzinska AK, Dror AA, Song L, Ron U, Tan JT, Shitrit AS, Fuchs H, Hasson T, Ben-Tal N, Sweeney HL, de Angelis MH, Steel KP and Avraham KB

    Department of Human Molecular Genetics and Biochemistry, Tel Aviv University, Tel Aviv, Israel.

    Myosin VI, found in organisms from Caenorhabditis elegans to humans, is essential for auditory and vestibular function in mammals, since genetic mutations lead to hearing impairment and vestibular dysfunction in both humans and mice. Here, we show that a missense mutation in this molecular motor in an ENU-generated mouse model, Tailchaser, disrupts myosin VI function. Structural changes in the Tailchaser hair bundles include mislocalization of the kinocilia and branching of stereocilia. Transfection of GFP-labeled myosin VI into epithelial cells and delivery of endocytic vesicles to the early endosome revealed that the mutant phenotype displays disrupted motor function. The actin-activated ATPase rates measured for the D179Y mutation are decreased, and indicate loss of coordination of the myosin VI heads or 'gating' in the dimer form. Proper coordination is required for walking processively along, or anchoring to, actin filaments, and is apparently destroyed by the proximity of the mutation to the nucleotide-binding pocket. This loss of myosin VI function may not allow myosin VI to transport its cargoes appropriately at the base and within the stereocilia, or to anchor the membrane of stereocilia to actin filaments via its cargoes, both of which lead to structural changes in the stereocilia of myosin VI-impaired hair cells, and ultimately to deafness.

    Funded by: Medical Research Council: G0300212, MC_QA137918; NEI NIH HHS: R01-EY12695; NIDCD NIH HHS: R01-DC0099100; Wellcome Trust

    PLoS genetics 2008;4;10;e1000207

  • The genome sequence of the fish pathogen Aliivibrio salmonicida strain LFI1238 shows extensive evidence of gene decay.

    Hjerde E, Lorentzen MS, Holden MT, Seeger K, Paulsen S, Bason N, Churcher C, Harris D, Norbertczak H, Quail MA, Sanders S, Thurston S, Parkhill J, Willassen NP and Thomson NR

    Department of Molecular Biotechnology, Institute of Medical Biology, Faculty of Medicine, University of Tromsø, N-9037 Tromsø, Norway.

    Background: The fish pathogen Aliivibrio salmonicida is the causative agent of cold-water vibriosis in marine aquaculture. The Gram-negative bacterium causes tissue degradation, hemolysis and sepsis in vivo.

    Results: In total, 4 286 protein coding sequences were identified, and the 4.6 Mb genome of A. salmonicida has a six partite architecture with two chromosomes and four plasmids. Sequence analysis revealed a highly fragmented genome structure caused by the insertion of an extensive number of insertion sequence (IS) elements. The IS elements can be related to important evolutionary events such as gene acquisition, gene loss and chromosomal rearrangements. New A. salmonicida functional capabilities that may have been acquired through horizontal DNA transfer include genes involved in iron acquisition and protein secretion that play potential roles in pathogenicity. On the other hand, the degeneration of 370 genes and consequent loss of specific functions suggest that A. salmonicida has a reduced metabolic and physiological capacity in comparison to related Vibrionaceae species.

    Conclusion: Most prominent is the loss of several genes involved in the utilisation of the polysaccharide chitin. In particular, the disruption of three extracellular chitinases responsible for enzymatic breakdown of chitin makes A. salmonicida unable to grow on the polymer form of chitin. These and other losses could restrict the variety of carrier organisms A. salmonicida can attach to and associate with. Gene acquisition and gene loss may be related to the emergence of A. salmonicida as a fish pathogen.

    BMC genomics 2008;9;616

  • High-throughput sequencing provides insights into genome variation and evolution in Salmonella Typhi.

    Holt KE, Parkhill J, Mazzoni CJ, Roumagnac P, Weill FX, Goodhead I, Rance R, Baker S, Maskell DJ, Wain J, Dolecek C, Achtman M and Dougan G

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Isolates of Salmonella enterica serovar Typhi (Typhi), a human-restricted bacterial pathogen that causes typhoid, show limited genetic variation. We generated whole-genome sequences for 19 Typhi isolates using 454 (Roche) and Solexa (Illumina) technologies. Isolates, including the previously sequenced CT18 and Ty2 isolates, were selected to represent major nodes in the phylogenetic tree. Comparative analysis showed little evidence of purifying selection, antigenic variation or recombination between isolates. Rather, evolution in the Typhi population seems to be characterized by ongoing loss of gene function, consistent with a small effective population size. The lack of evidence for antigenic variation driven by immune selection is in contrast to strong adaptive selection for mutations conferring antibiotic resistance in Typhi. The observed patterns of genetic isolation and drift are consistent with the proposed key role of asymptomatic carriers of Typhi as the main reservoir of this pathogen, highlighting the need for identification and treatment of carriers.

    Funded by: Wellcome Trust: 067321

    Nature genetics 2008;40;8;987-93

  • Gastric cancer in Japan.

    Horowitz RE

    The New England journal of medicine 2008;359;22;2393-4; author reply 2394-5

  • Variation analysis and gene annotation of eight MHC haplotypes: the MHC Haplotype Project.

    Horton R, Gibson R, Coggill P, Miretti M, Allcock RJ, Almeida J, Forbes S, Gilbert JG, Halls K, Harrow JL, Hart E, Howe K, Jackson DK, Palmer S, Roberts AN, Sims S, Stewart CA, Traherne JA, Trevanion S, Wilming L, Rogers J, de Jong PJ, Elliott JF, Sawcer S, Todd JA, Trowsdale J and Beck S

    Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    The human major histocompatibility complex (MHC) is contained within about 4 Mb on the short arm of chromosome 6 and is recognised as the most variable region in the human genome. The primary aim of the MHC Haplotype Project was to provide a comprehensively annotated reference sequence of a single, human leukocyte antigen-homozygous MHC haplotype and to use it as a basis against which variations could be assessed from seven other similarly homozygous cell lines, representative of the most common MHC haplotypes in the European population. Comparison of the haplotype sequences, including four haplotypes not previously analysed, resulted in the identification of >44,000 variations, both substitutions and indels (insertions and deletions), which have been submitted to the dbSNP database. The gene annotation uncovered haplotype-specific differences and confirmed the presence of more than 300 loci, including over 160 protein-coding genes. Combined analysis of the variation and annotation datasets revealed 122 gene loci with coding substitutions of which 97 were non-synonymous. The haplotype (A3-B7-DR15; PGF cell line) designated as the new MHC reference sequence, has been incorporated into the human genome assembly (NCBI35 and subsequent builds), and constitutes the largest single-haplotype sequence of the human genome to date. The extensive variation and annotation data derived from the analysis of seven further haplotypes have been made publicly available and provide a framework and resource for future association studies of all MHC-associated diseases and transplant medicine.

    Funded by: NHGRI NIH HHS: U54 HG004555-01; Wellcome Trust: 048880, 062023, 077198

    Immunogenetics 2008;60;1;1-18

  • Array painting reveals a high frequency of balanced translocations in breast cancer cell lines that break in cancer-relevant genes.

    Howarth KD, Blood KA, Ng BL, Beavis JC, Chua Y, Cooke SL, Raby S, Ichimura K, Collins VP, Carter NP and Edwards PA

    Department of Pathology, Hutchison-MRC Research Centre, University of Cambridge, Cambridge, UK.

    Chromosome translocations in the common epithelial cancers are abundant, yet little is known about them. They have been thought to be almost all unbalanced and therefore dismissed as mostly mediating tumour suppressor loss. We present a comprehensive analysis by array painting of the chromosome translocations of breast cancer cell lines HCC1806, HCC1187 and ZR-75-30. In array painting, chromosomes are isolated by flow cytometry, amplified and hybridized to DNA microarrays. A total of 200 breakpoints were identified and all were mapped to 1 Mb resolution on bacterial artificial chromosome (BAC) arrays, then 40 selected breakpoints, including all balanced breakpoints, were further mapped on tiling-path BAC arrays or to around 2 kb resolution using oligonucleotide arrays. Many more of the translocations were balanced at 1 Mb resolution than expected, either reciprocal (eight in total) or balanced for at least one participating chromosome (19 paired breakpoints). Second, many of the breakpoints were at genes that are plausible targets of oncogenic translocation, including balanced breaks at CTCF, EP300/p300 and FOXP4. Two gene fusions were demonstrated, TAX1BP1-AHCY and RIF1-PKD1L1. Our results support the idea that chromosome rearrangements may play an important role in common epithelial cancers such as breast cancer.

    Funded by: Cancer Research UK: A4392; Wellcome Trust: 077008

    Oncogene 2008;27;23;3345-59

  • No evidence in a large UK collection for celiac disease risk variants reported by a Spanish study.

    Hunt KA, Franke L, Deloukas P, Wijmenga C and van Heel DA

    Funded by: Wellcome Trust: 077011

    Gastroenterology 2008;134;5;1629-30; author reply 1630-1

  • Newly identified genetic risk variants for celiac disease related to the immune response.

    Hunt KA, Zhernakova A, Turner G, Heap GA, Franke L, Bruinenberg M, Romanos J, Dinesen LC, Ryan AW, Panesar D, Gwilliam R, Takeuchi F, McLaren WM, Holmes GK, Howdle PD, Walters JR, Sanders DS, Playford RJ, Trynka G, Mulder CJ, Mearin ML, Verbeek WH, Trimble V, Stevens FM, O'Morain C, Kennedy NP, Kelleher D, Pennington DJ, Strachan DP, McArdle WL, Mein CA, Wapenaar MC, Deloukas P, McGinnis R, McManus R, Wijmenga C and van Heel DA

    Institute of Cell and Molecular Science, Barts and The London School of Medicine and Dentistry, 4 Newark Street, London E1 2AT, UK.

    Our genome-wide association study of celiac disease previously identified risk variants in the IL2-IL21 region. To identify additional risk variants, we genotyped 1,020 of the most strongly associated non-HLA markers in an additional 1,643 cases and 3,406 controls. Through joint analysis including the genome-wide association study data (767 cases, 1,422 controls), we identified seven previously unknown risk regions (P < 5 x 10(-7)). Six regions harbor genes controlling immune responses, including CCR3, IL12A, IL18RAP, RGS1, SH2B3 (nsSNP rs3184504) and TAGAP. Whole-blood IL18RAP mRNA expression correlated with IL18RAP genotype. Type 1 diabetes and celiac disease share HLA-DQ, IL2-IL21, CCR3 and SH2B3 risk regions. Thus, this extensive genome-wide association follow-up study has identified additional celiac disease risk variants in relevant biological pathways.

    Funded by: Medical Research Council: G0000934; Wellcome Trust: 068094, 068545/Z/02, 084743, GR068094MA

    Nature genetics 2008;40;4;395-402

  • The functional impact of structural variation in humans.

    Hurles ME, Dermitzakis ET and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Structural variation includes many different types of chromosomal rearrangement and encompasses millions of bases in every human genome. Over the past 3 years, the extent and complexity of structural variation has become better appreciated. Diverse approaches have been adopted to explore the functional impact of this class of variation. As disparate indications of the important biological consequences of genome dynamism are accumulating rapidly, we review the evidence that structural variation has an appreciable impact on cellular phenotypes, disease and human evolution.

    Funded by: Wellcome Trust: 077009, 077014, 077046

    Trends in genetics : TIG 2008;24;5;238-45

  • 1p36 is a preferential target of chromosome 1 deletions in astrocytic tumours and homozygously deleted in a subset of glioblastomas.

    Ichimura K, Vogazianou AP, Liu L, Pearson DM, Bäcklund LM, Plant K, Baird K, Langford CF, Gregory SG and Collins VP

    Department of Pathology, Division of Molecular Histopathology, University of Cambridge, Cambridge, UK.

    Astrocytic, oligodendroglial and mixed gliomas are the commonest gliomas in adults. They have distinct phenotypes and clinical courses, but as they exist as a continuous histological spectrum, differentiating them can be difficult. Co-deletions of total 1p and 19q are found in the majority of oligodendrogliomas and considered as a diagnostic marker and a prognostic indicator. The 1p status of astrocytomas has not yet been thoroughly examined. Using a chromosome 1 tile path array, we investigated 108 adult astrocytic tumours for copy number alterations. Total 1p deletions were rare (2%); however, partial deletions involving 1p36 were frequently identified in anaplastic astrocytomas (22%) and glioblastomas (34%). Multivariate analysis showed that patients with total 1p deletions had significantly longer survival (P=0.005). In nine glioblastomas homozygous deletions at 1p36 were identified. No somatic mutations were found among the five genes located in the homozygously deleted region. However, the CpG island of TNFRSF9 was hypermethylated in 19% of astrocytic tumours and 87% of glioma cell lines. TNFRSF9 expression was upregulated after demethylation of glioma cell lines. Akt3 amplifications were found in four glioblastomas. Our results indicate that 1p deletions are common in anaplastic astrocytomas and glioblastomas but are distinct from the 1p abnormalities in oligodendrogliomas.

    Funded by: Cancer Research UK: A6618; Wellcome Trust

    Oncogene 2008;27;14;2097-108

  • In vitro differential sensitivity of melanomas to phenothiazines is based on the presence of codon 600 BRAF mutation.

    Ikediobi ON, Reimers M, Durinck S, Blower PE, Futreal AP, Stratton MR and Weinstein JN

    Genomics and Bioinformatics Group, Laboratory of Molecular Pharmacology, National Cancer Institute, Bethesda, Maryland, USA.

    The panel of 60 human cancer cell lines (the NCI-60) assembled by the National Cancer Institute for anticancer drug discovery is a widely used resource. We previously sequenced 24 cancer genes in those cell lines. Eleven of the genes were found to be mutated in three or more of the lines. Using a pharmacogenomic approach, we analyzed the relationship between drug activity and mutations in those 11 genes (APC, RB1, KRAS, NRAS, BRAF, PIK3CA, PTEN, STK11, MADH4, TP53, and CDKN2A). That analysis identified an association between mutation in BRAF and the antiproliferative potential of phenothiazine compounds. Phenothiazines have been used as antipsychotics and as adjunct antiemetics during cancer chemotherapy and more recently have been reported to have anticancer properties. However, to date, the anticancer mechanism of action of phenothiazines has not been elucidated. To follow up on the initial pharmacologic observations in the NCI-60 screen, we did pharmacologic experiments on 11 of the NCI-60 cell lines and, prospectively, on an additional 24 lines. The studies provide evidence that BRAF mutation (codon 600) in melanoma as opposed to RAS mutation is predictive of an increase in sensitivity to phenothiazines as determined by 3-(4,5-dimethylthiazol-2-yl)-5-(3-carboxymethoxyphenyl)-2-(4-sulfophenyl)-2H-tetrazolium, inner salt assay (Wilcoxon P = 0.007). That pattern of increased sensitivity to phenothiazines based on the presence of codon 600 BRAF mutation may be unique to melanomas, as we do not observe it in a panel of colorectal cancers. The findings reported here have potential implications for the use of phenothiazines in the treatment of V600E BRAF mutant melanoma.

    Funded by: Wellcome Trust: 077012

    Molecular cancer therapeutics 2008;7;6;1337-46

  • A novel CpG island set identifies tissue-specific methylation at developmental gene loci.

    Illingworth R, Kerr A, Desousa D, Jørgensen H, Ellis P, Stalker J, Jackson D, Clee C, Plumb R, Rogers J, Humphray S, Cox T, Langford C and Bird A

    Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh, United Kingdom.

    CpG islands (CGIs) are dense clusters of CpG sequences that punctuate the CpG-deficient human genome and associate with many gene promoters. As CGIs also differ from bulk chromosomal DNA by their frequent lack of cytosine methylation, we devised a CGI enrichment method based on nonmethylated CpG affinity chromatography. The resulting library was sequenced to define a novel human blood CGI set that includes many that are not detected by current algorithms. Approximately half of CGIs were associated with annotated gene transcription start sites, the remainder being intra- or intergenic. Using an array representing over 17,000 CGIs, we established that 6%-8% of CGIs are methylated in genomic DNA of human blood, brain, muscle, and spleen. Inter- and intragenic CGIs are preferentially susceptible to methylation. CGIs showing tissue-specific methylation were overrepresented at numerous genetic loci that are essential for development, including HOX and PAX family members. The findings enable a comprehensive analysis of the roles played by CGI methylation in normal and diseased human tissues.

    Funded by: Wellcome Trust

    PLoS biology 2008;6;1;e22

  • Refined mapping of X-linked reticulate pigmentary disorder and sequencing of candidate genes.

    Jaeckle Santos LJ, Xing C, Barnes RB, Ades LC, Megarbane A, Vidal C, Xuereb A, Tarpey PS, Smith R, Khazab M, Shoubridge C, Partington M, Futreal A, Stratton MR, Gecz J and Zinn AR

    McDermott Center for Human Growth and Development, The University of Texas Southwestern Medical School, 5323 Harry Hines Boulevard, Dallas, TX 75390-8591, USA.

    X-linked reticulate pigmentary disorder with systemic manifestations in males (PDR) is very rare. Affected males are characterized by cutaneous and visceral symptoms suggestive of abnormally regulated inflammation. A genetic linkage study of a large Canadian kindred previously mapped the PDR gene to a greater than 40 Mb interval of Xp22-p21. The aim of this study was to identify the causative gene for PDR. The Canadian pedigree was expanded and additional PDR families recruited. Genetic linkage was performed using newer microsatellite markers. Positional and functional candidate genes were screened by PCR and sequencing of coding exons in affected males. The location of the PDR gene was narrowed to an approximately 4.9 Mb interval of Xp22.11-p21.3 between markers DXS1052 and DXS1061. All annotated coding exons within this interval were sequenced in one affected male from each of the three multiplex families as well as one singleton, but no causative mutation was identified. Sequencing of other X-linked genes outside of the linked interval also failed to identify the cause of PDR but revealed a novel nonsynonymous cSNP in the GRPR gene in the Maltese population. PDR is most likely due to a mutation within the linked interval not affecting currently annotated coding exons.

    Funded by: NIAMS NIH HHS: P30 AR041940; Wellcome Trust: 077010

    Human genetics 2008;123;5;469-76

  • The genome-wide patterns of variation expose significant substructure in a founder population.

    Jakkula E, Rehnström K, Varilo T, Pietiläinen OP, Paunio T, Pedersen NL, deFaire U, Järvelin MR, Saharinen J, Freimer N, Ripatti S, Purcell S, Collins A, Daly MJ, Palotie A and Peltonen L

    Department of Molecular Medicine, National Public Health Institute and Institute for Molecular Medicine Finland (FIMM), Helsinki, Finland.

    Although high-density SNP genotyping platforms generate a momentum for detailed genome-wide association (GWA) studies, an offshoot is a new insight into population genetics. Here, we present an example in one of the best-known founder populations by scrutinizing ten distinct Finnish early- and late-settlement subpopulations. By determining genetic distances, homozygosity, and patterns of linkage disequilibrium, we demonstrate that population substructure, and even individual ancestry, is detectable at a very high resolution and supports the concept of multiple historical bottlenecks resulting from consecutive founder effects. Given that genetic studies are currently aiming at identifying smaller and smaller genetic effects, recognizing and controlling for population substructure even at this fine level becomes imperative to avoid confounding and spurious associations. This study provides an example of the power of GWA data sets to demonstrate stratification caused by population history even within a seemingly homogeneous population, like the Finns. Further, the results provide interesting lessons concerning the impact of population history on the genome landscape of humans, as well as approaches to identify rare variants enriched in these subpopulations.

    Funded by: NCRR NIH HHS: U54RR020278; NHLBI NIH HHS: 1R01HL087679-01

    American journal of human genetics 2008;83;6;787-94

  • Rapidly regulated genes are intron poor.

    Jeffares DC, Penkett CJ and Bähler J

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    We show that genes with rapidly changing expression levels in response to stress contain significantly lower intron densities in yeasts, thale cress and mice. Therefore, we propose that introns can delay regulatory responses and are selected against in genes whose transcripts require rapid adjustment for survival of environmental challenges. These findings could provide an explanation for the apparent extensive intron loss during the evolution of some eukaryotic lineages.

    Funded by: Cancer Research UK: A6517; Wellcome Trust: 077118

    Trends in genetics : TIG 2008;24;8;375-8

  • Loss of prokineticin receptor 2 signaling predisposes mice to torpor.

    Jethwa PH, I'Anson H, Warner A, Prosser HM, Hastings MH, Maywood ES and Ebling FJ

    School of Biomedical Sciences, University of Nottingham Medical School, Queen's Medical Centre, Nottingham, UK.

    The genes encoding prokineticin 2 polypeptide (Prok2) and its cognate receptor (Prokr2/Gpcr73l1) are widely expressed in both the suprachiasmatic nucleus and its hypothalamic targets, and this signaling pathway has been implicated in the circadian regulation of behavior and physiology. We have previously observed that the targeted null mutation of Prokr2 disrupts circadian coordination of cycles of locomotor activity and thermoregulation. We have now observed spontaneous but sporadic bouts of torpor in the majority of these transgenic mice lacking Prokr2 signaling. During these torpor bouts, which lasted for up to 8 h, body temperature and locomotor activity decreased markedly. Oxygen consumption and carbon dioxide production also decreased, and there was a decrease in respiratory quotient. These spontaneous torpor bouts generally began toward the end of the dark phase or in the early light phase when the mice were maintained on a 12:12-h light-dark cycle and persisted when mice were exposed to continuous darkness. Periods of food deprivation (16-24 h) induced a substantial decrease in body temperature in all mice, but the duration and depth of hypothermia was significantly greater in mice lacking Prokr2 signaling compared with heterozygous and wild-type littermates. Likewise, when tested in metabolic cages, food deprivation produced greater decreases in oxygen consumption and carbon dioxide production in the transgenic mice than controls. We conclude that Prokr2 signaling plays a role in hypothalamic regulation of energy balance, and loss of this pathway results in physiological and behavioral responses normally only detected when mice are in negative energy balance.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/D525064/I, BBS/B/10765; Medical Research Council: U.1051.02.004(78799); Wellcome Trust

    American journal of physiology. Regulatory, integrative and comparative physiology 2008;294;6;R1968-79

  • Image capture and fusion of 3D surface texture using wavelet transform

    Jian, M. W., Dong, J. Y., Wu, J. H.

    Proceedings of the 2007 International Conference on Wavelet Analysis and Pattern Recognition, ICWAPR '07. 2008;338-343

  • Detection of genome-wide polymorphisms in the AT-rich Plasmodium falciparum genome using a high-density microarray.

    Jiang H, Yi M, Mu J, Zhang L, Ivens A, Klimczak LJ, Huyen Y, Stephens RM and Su XZ

    Laboratory of Malaria and Vector Research, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA.

    Background: Genetic mapping is a powerful method to identify mutations that cause drug resistance and other phenotypic changes in the human malaria parasite Plasmodium falciparum. For efficient mapping of a target gene, it is often necessary to genotype a large number of polymorphic markers. Currently, a community effort is underway to collect single nucleotide polymorphisms (SNP) from the parasite genome. Here we evaluate polymorphism detection accuracy of a high-density 'tiling' microarray with 2.56 million probes by comparing single feature polymorphisms (SFP) calls from the microarray with known SNP among parasite isolates.

    Results: We found that probe GC content, SNP position in a probe, probe coverage, and signal ratio cutoff values were important factors for accurate detection of SFP in the parasite genome. We established a set of SFP calling parameters that could predict mSFP (SFP called by multiple overlapping probes) with high accuracy (≥94%) and identified 121,087 mSFP genome-wide from five parasite isolates including 40,354 unique mSFP (excluding those from multi-gene families) and approximately 18,000 new mSFP, producing a genetic map with an average of one unique mSFP per 570 bp. Genomic copy number variation (CNV) among the parasites was also cataloged and compared.

    Conclusion: A large number of mSFP were discovered from the P. falciparum genome using a high-density microarray, most of which were in clusters of highly polymorphic genes at chromosome ends. Our method for accurate mSFP detection and the mSFP identified will greatly facilitate large-scale studies of genome variation in the P. falciparum parasite and provide useful resources for mapping important parasite traits.

    Funded by: NCI NIH HHS: N01-CO-12400; Wellcome Trust

    BMC genomics 2008;9;398

  • Drug susceptibility testing using molecular techniques can enhance tuberculosis diagnosis in a community with a high tuberculosis incidence

    Johnson, R

    JIDC. 2008;2;40-45

  • Large-scale population study of human cell lines indicates that dosage compensation is virtually complete.

    Johnston CM, Lovell FL, Leongamornlert DA, Stranger BE, Dermitzakis ET and Ross MT

    X Chromosome Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

    X chromosome inactivation in female mammals results in dosage compensation of X-linked gene products between the sexes. In humans there is evidence that a substantial proportion of genes escape from silencing. We have carried out a large-scale analysis of gene expression in lymphoblastoid cell lines from four human populations to determine the extent to which escape from X chromosome inactivation disrupts dosage compensation. We conclude that dosage compensation is virtually complete. Overall expression from the X chromosome is only slightly higher in females and can largely be accounted for by elevated female expression of approximately 5% of X-linked genes. We suggest that the potential contribution of escape from X chromosome inactivation to phenotypic differences between the sexes is more limited than previously believed.

    Funded by: Wellcome Trust

    PLoS genetics 2008;4;1;e9

  • A systematic library for comprehensive overexpression screens in Saccharomyces cerevisiae.

    Jones GM, Stalker J, Humphray S, West A, Cox T, Rogers J, Dunham I and Prelich G

    Department of Molecular Genetics, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, New York 10461, USA.

    Modern genetic analysis requires the development of new resources to systematically explore gene function in vivo. Overexpression screens are a powerful method to investigate genetic pathways, but the goal of routine and comprehensive overexpression screens has been hampered by the lack of systematic libraries. Here we describe the construction of a systematic collection of the Saccharomyces cerevisiae genome in a high-copy vector and its validation in two overexpression screens.

    Funded by: NIGMS NIH HHS: GM52486; Wellcome Trust

    Nature methods 2008;5;3;239-41

  • Six new loci associated with blood low-density lipoprotein cholesterol, high-density lipoprotein cholesterol or triglycerides in humans.

    Kathiresan S, Melander O, Guiducci C, Surti A, Burtt NP, Rieder MJ, Cooper GM, Roos C, Voight BF, Havulinna AS, Wahlstrand B, Hedner T, Corella D, Tai ES, Ordovas JM, Berglund G, Vartiainen E, Jousilahti P, Hedblad B, Taskinen MR, Newton-Cheh C, Salomaa V, Peltonen L, Groop L, Altshuler DM and Orho-Melander M

    Cardiology Division, Massachusetts General Hospital, Boston, Massachusetts 02114, USA.

    Blood concentrations of lipoproteins and lipids are heritable risk factors for cardiovascular disease. Using genome-wide association data from three studies (n = 8,816 that included 2,758 individuals from the Diabetes Genetics Initiative specific to the current paper as well as 1,874 individuals from the FUSION study of type 2 diabetes and 4,184 individuals from the SardiNIA study of aging-associated variables reported in a companion paper in this issue) and targeted replication association analyses in up to 18,554 independent participants, we show that common SNPs at 18 loci are reproducibly associated with concentrations of low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol, and/or triglycerides. Six of these loci are new (P < 5 x 10(-8) for each new locus). Of the six newly identified chromosomal regions, two were associated with LDL cholesterol (1p13 near CELSR2, PSRC1 and SORT1 and 19p13 near CILP2 and PBX4), one with HDL cholesterol (1q42 in GALNT2) and five with triglycerides (7q11 near TBL2 and MLXIPL, 8q24 near TRIB1, 1q42 in GALNT2, 19p13 near CILP2 and PBX4 and 1p31 near ANGPTL3). At 1p13, the LDL-associated SNP was also strongly correlated with CELSR2, PSRC1, and SORT1 transcript levels in human liver, and a proxy for this SNP was recently shown to affect risk for coronary artery disease. Understanding the molecular, cellular and clinical consequences of the newly identified loci may inform therapy and clinical care.

    Funded by: Wellcome Trust: 089061

    Nature genetics 2008;40;2;189-97

  • Clinical implication of recurrent copy number alterations in hepatocellular carcinoma and putative oncogenes in recurrent gains on 1q.

    Kim TM, Yim SH, Shin SH, Xu HD, Jung YC, Park CK, Choi JY, Park WS, Kwon MS, Fiegler H, Carter NP, Rhyu MG and Chung YJ

    Department of Microbiology, The Catholic University of Korea, Seoul, Korea.

    To elucidate the pathogenesis of hepatocellular carcinoma (HCC) and develop useful prognosis predictors, it is necessary to identify biologically relevant genomic alterations in HCC. In our study, we defined recurrently altered regions (RARs) common to many cases of HCC, which may contain tumor-related genes, using whole-genome array-CGH and explored their associations with the clinicopathologic features. Gene set enrichment analysis was performed to investigate the functional implications of RARs. On average, 23.1% of the total probes were altered per case. Mean numbers of altered probes were significantly higher in high-grade, larger and microvascular invasion (MVI)-positive tumors. In total, 32 RARs (14 gains and 18 losses) were defined, and the 4 most frequent RARs were gains in 1q21.1-q32.1 (64.5%), 1q32.1-q44 (59.2%) and 8q11.21-q24.3 (48.7%) and a loss in 17p13.3-p12 (51.3%). By focusing on RARs, we identified genes and functional pathways likely to be involved in hepatocarcinogenesis. Among genes in the recurrently gained regions on 1q, expression of KIF14 and TPM3 was significantly increased, suggesting their oncogenic potential in HCC. Some RARs showed significant associations with the clinical features. In particular, the recurrent loss in 9p24.2-p21.1 and gain in 8q11.21-q24.3 were associated with high tumor grade and MVI, respectively. Functional analysis showed that cytokine receptor binding and defense response to virus pathways are significantly enriched in high grade-related RARs. Taken together, our results and the strategy of analysis will help to elucidate the pathogenesis of HCC and to develop biomarkers for predicting the behavior of HCC.

    Funded by: Wellcome Trust: 077008

    International journal of cancer. Journal international du cancer 2008;123;12;2808-15

  • iMapper: a web application for the automated analysis and mapping of insertional mutagenesis sequence data against Ensembl genomes.

    Kong J, Zhu F, Stalker J and Adams DJ

    Experimental Cancer Genetics, The Wellcome Trust Sanger Institute, Hinxton, Cambs CB10 1HH, UK.

    SUMMARY: Insertional mutagenesis is a powerful method for gene discovery. To identify the location of insertion sites in the genome, linker-based polymerase chain reaction (PCR) methods (such as splinkerette-PCR) may be employed. We have developed a web application called iMapper (Insertional Mutagenesis Mapping and Analysis Tool) for the efficient analysis of insertion site sequence reads against vertebrate and invertebrate Ensembl genomes. Taking linker-based sequences as input, iMapper scans and trims the sequence to remove the linker and sequences derived from the insertional mutagen. The software then identifies and removes contaminating sequences derived from chimeric genomic fragments, vector or the transposon concatamer and then presents the clipped sequence reads to a sequence mapping server which aligns them to an Ensembl genome. Insertion sites can then be navigated in Ensembl in the context of genomic features such as gene structures. iMapper also generates text-based format for nucleic acid or protein sequences (FASTA) and generic file format (GFF) files of the clipped sequence reads and provides a graphical overview of the mapped insertion sites against a karyotype. iMapper is designed for high-throughput applications and can efficiently process thousands of DNA sequence reads. AVAILABILITY: iMapper is web based and can be accessed at

    Funded by: Cancer Research UK: C20510/A6997; Wellcome Trust: 76943

    Bioinformatics (Oxford, England) 2008;24;24;2923-5
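
    The trimming step described in the iMapper summary above can be illustrated with a minimal sketch. This is not iMapper's code: the function names, the example sequences and the assumed read layout (mutagen end, then genomic DNA, then linker) are hypothetical, and exact string matching stands in for whatever error-tolerant matching a production tool would need.

```python
def clip_read(read, mutagen_end, linker):
    """Return the genomic part of a read, or None if the mutagen end is absent.

    Assumed layout (hypothetical): [mutagen_end][genomic DNA][linker].
    Exact matching only; real reads contain sequencing errors.
    """
    start = read.find(mutagen_end)
    if start == -1:
        return None  # no recognisable mutagen/genome junction
    start += len(mutagen_end)
    end = read.find(linker, start)
    if end == -1:
        end = len(read)  # read stopped before reaching the linker
    return read[start:end]


def to_fasta(name, sequence):
    """Format one clipped read as a FASTA record."""
    return ">{}\n{}\n".format(name, sequence)


# Toy example with made-up sequences.
clipped = clip_read("AAATTTGGGCCCACGT", "AAATTT", "ACGT")
record = to_fasta("read1", clipped)
```

    The clipped sequence, rather than the raw read, is what would be handed to an aligner, so that mutagen and linker bases cannot produce spurious matches.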

  • Association analysis of allelic variants of USF1 in coronary atherosclerosis.

    Kristiansson K, Ilveskoski E, Lehtimäki T, Peltonen L, Perola M and Karhunen PJ

    Department of Molecular Medicine, National Public Health Institute, Helsinki, Finland.

    Objective: USF1 regulates the transcription of more than 40 cardiovascular-related genes and is well established as a gene associated with familial combined hyperlipidemia, a condition increasing the risk for coronary heart disease. No detailed data, however, exist on the impact of this gene on the critical outcome at the tissue level: different types of atherosclerotic lesions.

    Methods and Results: We analyzed USF1 in 2 autopsy series of altogether 700 middle-aged men (the Helsinki Sudden Death Study) with quantitative morphometric measurements of coronary atherosclerosis. SNP rs2516839, tagging common USF1 haplotypes, was associated with the presence of several types of atherosclerotic lesions, particularly with the proportion of advanced atherosclerotic plaques (P=0.02) and area of calcified lesions (P<0.001) of the coronary arteries. Importantly, carriers of risk alleles of rs2516839 also showed a 2-fold risk for sudden cardiac death (genotype TT versus CC; OR 2.10, 95% CI 1.17 to 3.75, P=0.04). The risk effect of rs2516839 was also present in aorta samples of the men.

    Conclusions: Our findings in this unique study sample suggest that USF1 contributes to atherosclerosis, the pathological arterial wall phenotype resulting in coronary heart disease and in its most dramatic consequence, sudden cardiac death.

    Arteriosclerosis, thrombosis, and vascular biology 2008;28;5;983-9

  • Isolated populations and complex disease gene identification.

    Kristiansson K, Naukkarinen J and Peltonen L

    National Public Health Institute and FIMM, Institute for Molecular Medicine Finland, Helsinki 00300, Finland.

    The utility of genetically isolated populations (population isolates) in the mapping and identification of genes is not limited to the study of rare diseases; isolated populations also provide a useful resource for studies aimed at improved understanding of the biology underlying common diseases and their component traits. Well-characterized human populations provide excellent study samples for many different genetic investigations, ranging from genome-wide association studies to the characterization of interactions between genes and the environment.

    Genome biology 2008;9;8;109

  • Molecular and morphological changes in placenta and embryo development associated with the inhibition of polyamine synthesis during midpregnancy in mice.

    López-García C, López-Contreras AJ, Cremades A, Castells MT, Marín F, Schreiber F and Peñafiel R

    Department of Biochemistry and Molecular Biology B and Immunology, Faculty of Medicine, University of Murcia, Campus de Espinardo, 30100 Murcia, Spain.

    Polyamines play an essential role in murine development, as demonstrated by both gene ablation in ornithine decarboxylase (ODC)-deficient embryos and pharmacological treatments of pregnant mice. However, the molecular and cellular mechanisms by which ODC inhibition affects embryonic development during critical periods of pregnancy are mostly unknown. Our present results demonstrate that the contragestational effect of alpha-difluoromethylornithine (DFMO), a suicide inhibitor of ODC, when given at d 7-9 of pregnancy, is associated with embryo growth arrest and marked alterations in the development of yolk sac and placenta. Blood island formation as well as the transcript levels of embryonary globins alpha-like x chain and beta-like y-chain was markedly decreased in the yolk sac. At the placental level, abnormal chorioallantoic attachment, absence of the spongiotrophoblast layer and a deficient development of the labyrinthine zone were evident. Real-time RT-PCR analysis showed that transcript levels of the steroidogenic genes steroidogenic acute regulatory protein, 3beta-hydroxysteroid dehydrogenase VI, and 17alpha-hydroxylase were markedly decreased by DFMO treatment in the developing placenta at d 9 and 10 of pregnancy. Plasma values of progesterone and androstenedione were also decreased by DFMO treatment. Transcriptomic analysis also detected changes in the expression of several genes involved in placentation and the differentiation of trophoblastic lineages. In conclusion, our results indicate that ODC inhibition at d 8 of pregnancy is related to alterations in yolk sac formation and trophoblast differentiation, affecting processes such as vasculogenesis and steroidogenesis.

    Endocrinology 2008;149;10;5012-23

  • Normal germ line establishment in mice carrying a deletion of the Ifitm/Fragilis gene family cluster.

    Lange UC, Adams DJ, Lee C, Barton S, Schneider R, Bradley A and Surani MA

    Wellcome Trust/Cancer Research UK Gurdon Institute of Cancer & Developmental Biology, University of Cambridge, Tennis Court Road, Cambridge CB2 1QN, United Kingdom.

    The family of interferon-inducible transmembrane proteins (Ifitm) consists of five highly sequence-related cell surface proteins, which are implicated in diverse cellular processes. Ifitm genes are conserved, widely expressed, and characteristically found in genomic clusters, such as the 67-kb Ifitm family locus on mouse chromosome 7. Recently, Ifitm1 and Ifitm3 have been suggested to mediate migration of early primordial germ cells (PGCs), a process that is little understood. To investigate Ifitm function during germ cell development, we used targeted chromosome engineering to generate mutants which either lack the entire Ifitm locus or carry a disrupted Ifitm3 gene only. Here we show that the mutations have no detectable effects on development of the germ line or on the generation of live young. Hence, contrary to previous reports, Ifitm genes are not essential for PGC migration. The Ifitm family is a striking example of a conserved gene cluster which appears to be functionally redundant during development.

    Funded by: Biotechnology and Biological Sciences Research Council; Cancer Research UK; Wellcome Trust: 065601

    Molecular and cellular biology 2008;28;15;4688-96

  • Invasive Salmonellosis in Humans

    Langridge GC, Wain J and Nair S

    Ecosal. 2008;Module8.6.2.2

  • Comparative genomics of the Rab protein family in Apicomplexan parasites

    Langsley G.

    Microbes and Infection. 2008

  • Host transmission of Salmonella enterica serovar Typhimurium is controlled by virulence factors and indigenous intestinal microbiota.

    Lawley TD, Bouley DM, Hoy YE, Gerke C, Relman DA and Monack DM

    Department of Microbiology and Immunology, 299 Campus Drive, Stanford University, Stanford, CA 94305, USA.

    Transmission is an essential stage of a pathogen's life cycle and remains poorly understood. We describe here a model in which persistently infected 129X1/SvJ mice provide a natural model of Salmonella enterica serovar Typhimurium transmission. In this model only a subset of the infected mice, termed supershedders, shed high levels (>10(8) CFU/g) of Salmonella serovar Typhimurium in their feces and, as a result, rapidly transmit infection. While most Salmonella serovar Typhimurium-infected mice show signs of intestinal inflammation, only supershedder mice develop colitis. Development of the supershedder phenotype depends on the virulence determinants Salmonella pathogenicity islands 1 and 2, and it is characterized by mucosal invasion and, importantly, high luminal abundance of Salmonella serovar Typhimurium within the colon. Immunosuppression of infected mice does not induce the supershedder phenotype, demonstrating that the immune response is not the main determinant of Salmonella serovar Typhimurium levels within the colon. In contrast, treatment of mice with antibiotics that alter the health-associated indigenous intestinal microbiota rapidly induces the supershedder phenotype in infected mice and predisposes uninfected mice to the supershedder phenotype for several days. These results demonstrate that the intestinal microbiota plays a critical role in controlling Salmonella serovar Typhimurium infection, disease, and transmissibility. This novel model should facilitate the study of host, pathogen, and intestinal microbiota factors that contribute to infectious disease transmission.

    Funded by: NIAID NIH HHS: AI26195

    Infection and immunity 2008;76;1;403-16

  • A single gene network accurately predicts phenotypic effects of gene perturbation in Caenorhabditis elegans.

    Lee I, Lehner B, Crombie C, Wong W, Fraser AG and Marcotte EM

    Center for Systems and Synthetic Biology, Department of Chemistry and Biochemistry, Institute for Cellular and Molecular Biology, University of Texas, 2500 Speedway, MBB 3.210, Austin, Texas 78712, USA.

    The fundamental aim of genetics is to understand how an organism's phenotype is determined by its genotype, and implicit in this is predicting how changes in DNA sequence alter phenotypes. A single network covering all the genes of an organism might guide such predictions down to the level of individual cells and tissues. To validate this approach, we computationally generated a network covering most C. elegans genes and tested its predictive capacity. Connectivity within this network predicts essentiality, identifying this relationship as an evolutionarily conserved biological principle. Critically, the network makes tissue-specific predictions: we accurately identify genes for most systematically assayed loss-of-function phenotypes, which span diverse cellular and developmental processes. Using the network, we identify 16 genes whose inactivation suppresses defects in the retinoblastoma tumor suppressor pathway, and we successfully predict that the dystrophin complex modulates EGF signaling. We conclude that an analogous network for human genes might be similarly predictive and thus facilitate identification of disease genes and rational therapeutic targets.

    Funded by: NIGMS NIH HHS: GM06779-01; Wellcome Trust

    Nature genetics 2008;40;2;181-8

  • Evolution of the Rhodococcus equi vap pathogenicity island seen through comparison of host-associated vapA and vapB virulence plasmids.

    Letek M, Ocampo-Sosa AA, Sanders M, Fogarty U, Buckley T, Leadon DP, González P, Scortti M, Meijer WG, Parkhill J, Bentley S and Vázquez-Boland JA

    Division of Microbial Pathogenesis, Centre for Infectious Diseases, Ashworth Laboratories, King's Buildings, University of Edinburgh, Edinburgh EH9 3JT, United Kingdom.

    The pathogenic actinomycete Rhodococcus equi harbors different types of virulence plasmids associated with specific nonhuman hosts. We determined the complete DNA sequence of a vapB(+) plasmid, typically associated with pig isolates, and compared it with that of the horse-specific vapA(+) plasmid type. pVAPB1593, a circular 79,251-bp element, had the same housekeeping backbone as the vapA(+) plasmid but differed over an approximately 22-kb region. This variable region encompassed the vap pathogenicity island (PAI), was clearly subject to selective pressures different from those affecting the backbone, and showed major genetic rearrangements involving the vap genes. The pVAPB1593 PAI harbored five different vap genes (vapB and vapJ to -M, with vapK present in two copies), which encoded products differing by 24 to 84% in amino acid sequence from the six full-length vapA(+) plasmid-encoded Vap proteins, consistent with a role for the specific vap gene complement in R. equi host tropism. Sequence analyses, including interpolated variable-order motifs for detection of alien DNA and reconstruction of Vap family phylogenetic relationships, suggested that the vap PAI was acquired by an ancestor plasmid via lateral gene transfer, subsequently evolving by vap gene duplication and sequence diversification to give different (host-adapted) plasmids. The R. equi virulence plasmids belong to a new family of actinobacterial circular replicons characterized by an ancient conjugative backbone and a horizontally acquired niche-adaptive plasticity region.

    Journal of bacteriology 2008;190;17;5797-805

  • Identification of ten loci associated with height highlights new biological pathways in human growth.

    Lettre G, Jackson AU, Gieger C, Schumacher FR, Berndt SI, Sanna S, Eyheramendy S, Voight BF, Butler JL, Guiducci C, Illig T, Hackett R, Heid IM, Jacobs KB, Lyssenko V, Uda M, Diabetes Genetics Initiative, FUSION, KORA, Prostate, Lung Colorectal and Ovarian Cancer Screening Trial, Nurses' Health Study, SardiNIA, Boehnke M, Chanock SJ, Groop LC, Hu FB, Isomaa B, Kraft P, Peltonen L, Salomaa V, Schlessinger D, Hunter DJ, Hayes RB, Abecasis GR, Wichmann HE, Mohlke KL and Hirschhorn JN

    Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA.

    Height is a classic polygenic trait, reflecting the combined influence of multiple as-yet-undiscovered genetic factors. We carried out a meta-analysis of genome-wide association study data of height from 15,821 individuals at 2.2 million SNPs, and followed up the strongest findings in >10,000 subjects. Ten newly identified and two previously reported loci were strongly associated with variation in height (P values from 4 x 10(-7) to 8 x 10(-22)). Together, these 12 loci account for approximately 2% of the population variation in height. Individuals with ≤8 height-increasing alleles and ≥16 height-increasing alleles differ in height by approximately 3.5 cm. The newly identified loci, along with several additional loci with strongly suggestive associations, encompass both strong biological candidates and unexpected genes, and highlight several pathways (let-7 targets, chromatin remodeling proteins and Hedgehog signaling) as important regulators of human stature. These results expand the picture of the biological regulation of human height and of the genetic architecture of this classical complex trait.

    Funded by: NCI NIH HHS: 5P01CA087969, 5U01CA098233, CA49449; NHGRI NIH HHS: HG02651; NHLBI NIH HHS: HL084729; NIDDK NIH HHS: 5 R01 DK 075787, DK62370, DK72193, R01 DK072193, R01 DK072193-01, R01 DK072193-02, R01 DK072193-03; Wellcome Trust: 089061

    Nature genetics 2008;40;5;584-91

  • Mapping short DNA sequencing reads and calling variants using mapping quality scores.

    Li H, Ruan J and Durbin R

    The Wellcome Trust Sanger Institute, Hinxton CB10 1SA, United Kingdom.

    New sequencing technologies promise a new era in the use of DNA sequence. However, some of these technologies produce very short reads, typically of a few tens of base pairs, and to use these reads effectively requires new algorithms and software. In particular, there is a major issue in efficiently aligning short reads to a reference genome and handling ambiguity or lack of accuracy in this alignment. Here we introduce the concept of mapping quality, a measure of the confidence that a read actually comes from the position it is aligned to by the mapping algorithm. We describe the software MAQ that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample. MAQ makes full use of mate-pair information and estimates the error probability of each read alignment. Error probabilities are also derived for the final genotype calls, using a Bayesian statistical model that incorporates the mapping qualities, error probabilities from the raw sequence quality scores, sampling of the two haplotypes, and an empirical model for correlated errors at a site. Both read mapping and genotype calling are evaluated on simulated data and real data. MAQ is accurate, efficient, versatile, and user-friendly. It is freely available at

    Funded by: Wellcome Trust

    Genome research 2008;18;11;1851-8
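
    The mapping quality concept introduced in the abstract above is conventionally reported on a Phred scale, Q = -10 log10(P), where P is the probability that the read is placed at the wrong position. The sketch below illustrates only that scaling and a simplified posterior over candidate positions. It is not the MAQ implementation: the function names, the cap of 99 and the toy likelihoods are assumptions made for illustration.

```python
import math


def posterior_wrong(best_likelihood, other_likelihoods):
    """Probability that the best hit is wrong, given the likelihoods of all
    candidate positions (simplified: uniform prior over positions)."""
    total = best_likelihood + sum(other_likelihoods)
    return 1.0 - best_likelihood / total


def mapping_quality(p_wrong, cap=99):
    """Phred-scale an error probability; the cap value is an assumption."""
    if p_wrong <= 0.0:
        return cap  # avoid log10(0) for a unique, error-free placement
    return min(cap, int(round(-10.0 * math.log10(p_wrong))))


# A read with one strong hit and one weaker alternative position.
p = posterior_wrong(0.9, [0.1])
q = mapping_quality(p)
```

    On this scale a quality of 30 corresponds to a 1 in 1,000 chance that the placement is wrong, and a read with two equally good hits scores close to zero, which lets downstream variant calling down-weight ambiguous alignments.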

  • Curcumin prevents and reverses murine cardiac hypertrophy.

    Li HL, Liu C, de Couto G, Ouzounian M, Sun M, Wang AB, Huang Y, He CW, Shi Y, Chen X, Nghiem MP, Liu Y, Chen M, Dawood F, Fukuoka M, Maekawa Y, Zhang L, Leask A, Ghosh AK, Kirshenbaum LA and Liu PP

    Division of Cardiology, Heart and Stroke/Richard Lewar Centre of Excellence, University Health Network, University of Toronto, Toronto, Ontario, Canada.

    Chromatin remodeling, particularly histone acetylation, plays a critical role in the progression of pathological cardiac hypertrophy and heart failure. We hypothesized that curcumin, a natural polyphenolic compound abundant in the spice turmeric and a known suppressor of histone acetylation, would suppress cardiac hypertrophy through the disruption of p300 histone acetyltransferase-dependent (p300-HAT-dependent) transcriptional activation. We tested this hypothesis using primary cultured rat cardiac myocytes and fibroblasts as well as two well-established mouse models of cardiac hypertrophy. Curcumin blocked phenylephrine-induced (PE-induced) cardiac hypertrophy in vitro in a dose-dependent manner. Furthermore, curcumin both prevented and reversed mouse cardiac hypertrophy induced by aortic banding (AB) and PE infusion, as assessed by heart weight/BW and lung weight/BW ratios, echocardiographic parameters, and gene expression of hypertrophic markers. Further investigation demonstrated that curcumin abrogated histone acetylation, GATA4 acetylation, and DNA-binding activity through blocking p300-HAT activity. Curcumin also blocked AB-induced inflammation and fibrosis through disrupting p300-HAT-dependent signaling pathways. Our results indicate that curcumin has the potential to protect against cardiac hypertrophy, inflammation, and fibrosis through suppression of p300-HAT activity and downstream GATA4, NF-kappaB, and TGF-beta-Smad signaling pathways.

    The Journal of clinical investigation 2008;118;3;879-93

  • Analysis of the DND1 gene in men with sporadic and familial testicular germ cell tumors.

    Linger R, Dudakia D, Huddart R, Tucker K, Friedlander M, Phillips KA, Hogg D, Jewett MA, Lohynska R, Daugaard G, Richard S, Chompret A, Stoppa-Lyonnet D, Bonaïti-Pellié C, Heidenreich A, Albers P, Olah E, Geczi L, Bodrogi I, Daly PA, Guilford P, Fosså SD, Heimdal K, Tjulandin SA, Liubchenko L, Stoll H, Weber W, Einhorn L, McMaster M, Korde L, Greene MH, Nathanson KL, Cortessis V, Easton DF, Bishop DT, Stratton MR and Rapley EA

    Testicular Cancer Genetics Team, Section of Cancer Genetics, Institute of Cancer Research, Sutton, Surrey, UK.

    A base substitution in the mouse Dnd1 gene resulting in a truncated Dnd protein has been shown to be responsible for germ cell loss and the development of testicular germ cell tumors (TGCT) in the 129 strain of mice. We investigated the human orthologue of this gene in 263 patients (165 with a family history of TGCT and 98 without) and found a rare heterozygous variant, p.Glu86Ala, in a single case. This variant was not present in control chromosomes (0/4,132). Analysis of the variant in an additional 842 index TGCT cases (269 with a family history of TGCT and 573 without) did not reveal any additional instances. The variant, p.Glu86Ala, is within a known functional domain of DND1 and is highly conserved through evolution. Although the variant may be a rare polymorphism, a change at such a highly conserved residue is characteristic of a disease-causing variant. Whether it is disease-causing or not, mutations in DND1 make, at most, a very small contribution to TGCT susceptibility in adults and adolescents.

    Funded by: NCI NIH HHS: 1R01 CA102042-01A1, ZIA CP010144-12; Wellcome Trust: 068545/Z/02

    Genes, chromosomes & cancer 2008;47;3;247-52

  • Lifelong reduction of LDL-cholesterol related to a common variant in the LDL-receptor gene decreases the risk of coronary artery disease: a Mendelian Randomisation study.

    Linsel-Nitschke P, Götz A, Erdmann J, Braenne I, Braund P, Hengstenberg C, Stark K, Fischer M, Schreiber S, El Mokhtari NE, Schaefer A, Schrezenmeir J, Rubin D, Hinney A, Reinehr T, Roth C, Ortlepp J, Hanrath P, Hall AS, Mangino M, Lieb W, Lamina C, Heid IM, Doering A, Gieger C, Peters A, Meitinger T, Wichmann HE, König IR, Ziegler A, Kronenberg F, Samani NJ, Schunkert H, Wellcome Trust Case Control Consortium (WTCCC) and Cardiogenics Consortium

    Medizinische Klinik II, Universität zu Lübeck, Lübeck, Germany.

    Background: Rare mutations of the low-density lipoprotein receptor gene (LDLR) cause familial hypercholesterolemia, which increases the risk for coronary artery disease (CAD). Less is known about the implications of common genetic variation in the LDLR gene regarding the variability of cholesterol levels and risk of CAD.

    Methods: Imputed genotype data at the LDLR locus on 1,644 individuals of a population-based sample were explored for association with LDL-C level. Replication of association with LDL-C level was sought for the most significant single nucleotide polymorphism (SNP) within the LDLR gene in three European samples comprising 6,642 adults and 533 children. Association of this SNP with CAD was examined in six case-control studies involving more than 15,000 individuals.

    Findings: Each copy of the minor T allele of SNP rs2228671 within LDLR (frequency 11%) was related to a decrease of LDL-C levels by 0.19 mmol/L (95% confidence interval (CI) [0.13-0.24] mmol/L, p = 1.5x10(-10)). This association with LDL-C was uniformly found in children, men, and women of all samples studied. In parallel, the T allele of rs2228671 was associated with a significantly lower risk of CAD (Odds Ratio per copy of the T allele: 0.82, 95% CI [0.76-0.89], p = 2.1x10(-7)). Adjustment for LDL-C levels by logistic regression or Mendelian Randomisation models abolished the significant association between rs2228671 with CAD completely, indicating a functional link between the genetic variant at the LDLR gene locus, change in LDL-C and risk of CAD.

    Conclusion: A common variant at the LDLR gene locus affects LDL-C levels and, thereby, the risk for CAD.

    Funded by: British Heart Foundation; Medical Research Council

    PloS one 2008;3;8;e2986

  • The conserved plant sterility gene HAP2 functions after attachment of fusogenic membranes in Chlamydomonas and Plasmodium gametes.

    Liu Y, Tewari R, Ning J, Blagborough AM, Garbom S, Pei J, Grishin NV, Steele RE, Sinden RE, Snell WJ and Billker O

    Department of Cell Biology, University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA.

    The cellular and molecular mechanisms that underlie species-specific membrane fusion between male and female gametes remain largely unknown. Here, by use of gene discovery methods in the green alga Chlamydomonas, gene disruption in the rodent malaria parasite Plasmodium berghei, and distinctive features of fertilization in both organisms, we report discovery of a mechanism that accounts for a conserved protein required for gamete fusion. A screen for fusion mutants in Chlamydomonas identified a homolog of HAP2, an Arabidopsis sterility gene. Moreover, HAP2 disruption in Plasmodium blocked fertilization and thereby mosquito transmission of malaria. HAP2 localizes at the fusion site of Chlamydomonas minus gametes, yet Chlamydomonas minus and Plasmodium hap2 male gametes retain the ability, using other, species-limited proteins, to form tight prefusion membrane attachments with their respective gamete partners. Membrane dye experiments show that HAP2 is essential for membrane merger. Thus, in two distantly related eukaryotes, species-limited proteins govern access to a conserved protein essential for membrane fusion.

    Funded by: Biotechnology and Biological Sciences Research Council; Medical Research Council: G0501670; NIGMS NIH HHS: GM056778, R01 GM056778; Wellcome Trust

    Genes & development 2008;22;8;1051-68

  • Identification of PLCL1 gene for hip bone size variation in females in a genome-wide association study.

    Liu YZ, Wilson SG, Wang L, Liu XG, Guo YF, Li J, Yan H, Deloukas P, Soranzo N, Chinappen-Horsley U, Cervino A, Williams FM, Xiong DH, Zhang YP, Jin TB, Levy S, Papasian CJ, Drees BM, Hamilton JJ, Recker RR, Spector TD and Deng HW

    School of Medicine, University of Missouri - Kansas City, Kansas City, Missouri, United States of America.

    Osteoporosis, the most prevalent metabolic bone disease among older people, increases risk for low trauma hip fractures (HF) that are associated with high morbidity and mortality. Hip bone size (BS) has been identified as one of the key measurable risk factors for HF. Although hip BS is highly genetically determined, genetic factors underlying the trait are still poorly defined. Here, we performed the first genome-wide association study (GWAS) of hip BS interrogating approximately 380,000 SNPs on the Affymetrix platform in 1,000 homogeneous unrelated Caucasian subjects, including 501 females and 499 males. We identified a gene, PLCL1 (phospholipase c-like 1), that had four SNPs associated with hip BS at, or approaching, a genome-wide significance level in our female subjects; the most significant SNP, rs7595412, achieved a p value of 3.72x10(-7). The gene's importance to hip BS was replicated using the Illumina genotyping platform in an independent UK cohort containing 1,216 Caucasian females. Two SNPs of the PLCL1 gene, rs892515 and rs9789480, surrounded by the four SNPs identified in our GWAS, achieved p values of 8.62x10(-3) and 2.44x10(-3), respectively, for association with hip BS. Imputation analyses on our GWAS and the UK samples further confirmed the replication signals; eight SNPs of the gene achieved combined imputed p values<10(-5) in the two samples. The PLCL1 gene's relevance to HF was also observed in a Chinese sample containing 403 females, including 266 with HF and 177 control subjects. A SNP of the PLCL1 gene, rs3771362 that is only approximately 0.6 kb apart from the most significant SNP detected in our GWAS (rs7595412), achieved a p value of 7.66x10(-3) (odds ratio = 0.26) for association with HF. Additional biological support for the role of PLCL1 in BS comes from previous demonstrations that the PLCL1 protein inhibits IP3 (inositol 1,4,5-trisphosphate)-mediated calcium signaling, an important pathway regulating mechanical sensing of bone cells. Our findings suggest that PLCL1 is a novel gene associated with variation in hip BS, and provide new insights into the pathogenesis of HF.

    Funded by: NIA NIH HHS: R01 AG026564, R21 AG027110; NIAAA NIH HHS: R21 AA015973; NIAMS NIH HHS: P50 AR055081, R01 AR050496-01; Wellcome Trust

    PloS one 2008;3;9;e3160

  • Common variants near MC4R are associated with fat mass, weight and risk of obesity.

    Loos RJ, Lindgren CM, Li S, Wheeler E, Zhao JH, Prokopenko I, Inouye M, Freathy RM, Attwood AP, Beckmann JS, Berndt SI, Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial, Jacobs KB, Chanock SJ, Hayes RB, Bergmann S, Bennett AJ, Bingham SA, Bochud M, Brown M, Cauchi S, Connell JM, Cooper C, Smith GD, Day I, Dina C, De S, Dermitzakis ET, Doney AS, Elliott KS, Elliott P, Evans DM, Sadaf Farooqi I, Froguel P, Ghori J, Groves CJ, Gwilliam R, Hadley D, Hall AS, Hattersley AT, Hebebrand J, Heid IM, KORA, Lamina C, Gieger C, Illig T, Meitinger T, Wichmann HE, Herrera B, Hinney A, Hunt SE, Jarvelin MR, Johnson T, Jolley JD, Karpe F, Keniry A, Khaw KT, Luben RN, Mangino M, Marchini J, McArdle WL, McGinnis R, Meyre D, Munroe PB, Morris AD, Ness AR, Neville MJ, Nica AC, Ong KK, O'Rahilly S, Owen KR, Palmer CN, Papadakis K, Potter S, Pouta A, Qi L, Nurses' Health Study, Randall JC, Rayner NW, Ring SM, Sandhu MS, Scherag A, Sims MA, Song K, Soranzo N, Speliotes EK, Diabetes Genetics Initiative, Syddall HE, Teichmann SA, Timpson NJ, Tobias JH, Uda M, SardiNIA Study, Vogel CI, Wallace C, Waterworth DM, Weedon MN, Wellcome Trust Case Control Consortium, Willer CJ, FUSION, Wraight, Yuan X, Zeggini E, Hirschhorn JN, Strachan DP, Ouwehand WH, Caulfield MJ, Samani NJ, Frayling TM, Vollenweider P, Waeber G, Mooser V, Deloukas P, McCarthy MI, Wareham NJ, Barroso I, Jacobs KB, Chanock SJ, Hayes RB, Lamina C, Gieger C, Illig T, Meitinger T, Wichmann HE, Kraft P, Hankinson SE, Hunter DJ, Hu FB, Lyon HN, Voight BF, Ridderstrale M, Groop L, Scheet P, Sanna S, Abecasis GR, Albai G, Nagaraja R, Schlessinger D, Jackson AU, Tuomilehto J, Collins FS, Boehnke M and Mohlke KL

    MRC Epidemiology Unit, Addenbrooke's Hospital, Cambridge CB2 0QQ, UK.

    To identify common variants influencing body mass index (BMI), we analyzed genome-wide association data from 16,876 individuals of European descent. After previously reported variants in FTO, the strongest association signal (rs17782313, P = 2.9 x 10(-6)) mapped 188 kb downstream of MC4R (melanocortin-4 receptor), mutations of which are the leading cause of monogenic severe childhood-onset obesity. We confirmed the BMI association in 60,352 adults (per-allele effect = 0.05 Z-score units; P = 2.8 x 10(-15)) and 5,988 children aged 7-11 (0.13 Z-score units; P = 1.5 x 10(-8)). In case-control analyses (n = 10,583), the odds for severe childhood obesity reached 1.30 (P = 8.0 x 10(-11)). Furthermore, we observed overtransmission of the risk allele to obese offspring in 660 families (P (pedigree disequilibrium test average; PDT-avg) = 2.4 x 10(-4)). The SNP location and patterns of phenotypic associations are consistent with effects mediated through altered MC4R function. Our findings establish that common variants near MC4R influence fat mass, weight and obesity risk at the population level and reinforce the need for large-scale data integration to identify variants influencing continuous biomedical traits.

    Funded by: British Heart Foundation; Cancer Research UK; Department of Health: DHCS/07/07/008; Medical Research Council: G0000934, G0000934(68341), G0400874, G0401527, G0600705, G0601261, G0601261(80227), G0701863, G9521010, G9521010(63660), G9824984, G9828345, MC_QA137934, MC_U105161047, MC_U105630924, MC_U106179472, MC_U106188470, MC_U147585824, MC_UP_A620_1014, U.1475.00.002.00001.01 (85824); NIDDK NIH HHS: F32 DK079466-01, K23 DK080145-01, P30 DK040561-13, R01 DK072193, R01 DK072193-01, R01 DK072193-02, R01 DK072193-03; Wellcome Trust: 068545, 076113, 077016, 079557, 084713

    Nature genetics 2008;40;6;768-75

  • Sites of strong Rec12/Spo11 binding in the fission yeast genome are associated with meiotic recombination and with centromeres.

    Ludin K, Mata J, Watt S, Lehmann E, Bähler J and Kohli J

    Institute of Cell Biology, University of Bern, 3012 Bern, Switzerland.

    Meiotic recombination arises from Rec12/Spo11-dependent formation of DNA double-strand breaks (DSBs) and their subsequent repair. We identified Rec12-binding peaks across the Schizosaccharomyces pombe genome using chromatin immunoprecipitation after reversible formaldehyde cross-linking combined with whole-genome DNA microarrays. Strong Rec12 binding coincided with previously identified DSBs at the recombination hotspots ura4A, mbs1, and mbs2 and correlated with DSB formation at a new site. In addition, Rec12 binding corresponded to eight novel conversion hotspots and correlated with crossover density in segments of chromosome I. Notably, Rec12 binding inversely correlated with guanine-cytosine (GC) content, contrary to findings in Saccharomyces cerevisiae. Although both replication origins and Rec12-binding sites preferred AT-rich gene-free regions, they seemed to exclude each other. We also uncovered a connection between binding sites of Rec12 and the meiotic cohesin Rec8. Rec12-binding peaks often lay within 2.5 kb of a Rec8-binding peak. Rec12 binding showed a preference for large intergenic regions and occurred preferentially near genes strongly expressed in meiosis. Surprisingly, Rec12 binding was also detected in centromeric core regions, which raises the intriguing possibility that Rec12 plays additional roles in meiotic chromosome dynamics.

    Funded by: Cancer Research UK: A6517, C9546/A6517; Wellcome Trust: 077118

    Chromosoma 2008;117;5;431-44

  • MYO5B mutations cause microvillus inclusion disease and disrupt epithelial cell polarity.

    Müller T, Hess MW, Schiefermeier N, Pfaller K, Ebner HL, Heinz-Erian P, Ponstingl H, Partsch J, Röllinghoff B, Köhler H, Berger T, Lenhartz H, Schlenck B, Houwen RJ, Taylor CJ, Zoller H, Lechner S, Goulet O, Utermann G, Ruemmele FM, Huber LA and Janecke AR

    Department of Pediatrics II, Innsbruck Medical University, 6020 Innsbruck, Austria.

    Following homozygosity mapping in a single kindred, we identified nonsense and missense mutations in MYO5B, encoding type Vb myosin motor protein, in individuals with microvillus inclusion disease (MVID). MVID is characterized by lack of microvilli on the surface of enterocytes and occurrence of intracellular vacuolar structures containing microvilli. In addition, mislocalization of transferrin receptor in MVID enterocytes suggests that MYO5B deficiency causes defective trafficking of apical and basolateral proteins in MVID.

    Funded by: Austrian Science Fund FWF: P 19486-B12

    Nature genetics 2008;40;10;1163-5

  • The neglected role of antibody in protection against bacteremia caused by nontyphoidal strains of Salmonella in African children.

    MacLennan CA, Gondwe EN, Msefula CL, Kingsley RA, Thomson NR, White SA, Goodall M, Pickard DJ, Graham SM, Dougan G, Hart CA, Molyneux ME and Drayson MT

    Malawi-Liverpool-Wellcome Trust Clinical Research Programme, College of Medicine, University of Malawi, Blantyre, Malawi.

    Nontyphoidal strains of Salmonella (NTS) are a common cause of bacteremia among African children. Cell-mediated immune responses control intracellular infection, but they do not protect against extracellular growth of NTS in the blood. We investigated whether antibody protects against NTS bacteremia in Malawian children, because we found this condition mainly occurs before 2 years of age, with relative sparing of infants younger than 4 months old. Sera from all healthy Malawian children tested who were older than 16 months contained anti-Salmonella antibody and successfully killed NTS. Killing was mediated by the complement membrane attack complex and was not augmented in the presence of blood leukocytes. Sera from most healthy children less than 16 months old lacked NTS-specific antibody, and sera lacking antibody did not kill NTS despite normal complement function. Addition of Salmonella-specific antibody, but not mannose-binding lectin, enabled NTS killing. All NTS strains tested had long-chain lipopolysaccharide and the rck gene, features that resist direct complement-mediated killing. Disruption of lipopolysaccharide biosynthesis enabled killing of NTS by serum lacking Salmonella-specific antibody. We conclude that Salmonella-specific antibody that overcomes the complement resistance of NTS develops by 2 years of life in Malawian children. This finding and the age-incidence of NTS bacteremia suggest that antibody protects against NTS bacteremia and support the development of vaccines against NTS that induce protective antibody.

    Funded by: Wellcome Trust: 067902

    The Journal of clinical investigation 2008;118;4;1553-62

  • Extensive chromosome rearrangements distinguish the karyotype of the hypovirulent species Candida dubliniensis from the virulent Candida albicans.

    Magee BB, Sanchez MD, Saunders D, Harris D, Berriman M and Magee PT

    Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, MN, USA.

    Candida dubliniensis and Candida albicans, the most common human fungal pathogen, have most of the same genes and high sequence similarity, but C. dubliniensis is less virulent. C. albicans causes both mucosal and hematogenously disseminated disease, whereas C. dubliniensis causes mostly mucosal infections. Pulsed-field electrophoresis, genomic restriction enzyme digests, Southern blotting, and the emerging sequence from the Wellcome Trust Sanger Institute were used to determine the karyotype of C. dubliniensis type strain CD36. Three chromosomes have two intact homologues. A translocation in the rDNA repeat on chromosome R exchanges telomere-proximal regions of R and chromosome 5. Translocations involving the remaining chromosomes occur at the Major Repeat Sequence (MRS). CD36 lacks an MRS on chromosome R but has one on chromosome 3. Of six other C. dubliniensis strains, no two had the same electrophoretic karyotype. Despite extensive chromosome rearrangements, karyotypic differences between C. dubliniensis and C. albicans are unlikely to affect gene expression. Karyotypic instability may account for the diminished pathogenicity of C. dubliniensis.

    Funded by: NIAID NIH HHS: AI16567, R01 AI016567-21, R01 AI016567-22, R01 AI016567-23; Wellcome Trust

    Fungal genetics and biology : FG & B 2008;45;3;338-50

  • Interferon regulatory factor-1 polymorphisms are associated with the control of Plasmodium falciparum infection.

    Mangano VD, Luoni G, Rockett KA, Sirima BS, Konaté A, Forton J, Clark TG, Bancone G, Sadighi Akha E, Kwiatkowski DP and Modiano D

    Dipartimento di Scienze di Sanità Pubblica, Sezione di Parassitologia, Università di Roma 'La Sapienza', Rome, Italy.

    We describe the haplotypic structure of the interferon regulatory factor-1 (IRF-1) locus in two West African ethnic groups, Fulani and Mossi, that differ in their susceptibility and immune response to Plasmodium falciparum malaria. Both populations showed significant associations between IRF-1 polymorphisms and carriage of P. falciparum infection, with different patterns of association that may reflect their different haplotypic architecture. Genetic variation at this locus therefore does not account for the Fulani-specific resistance to malaria, although it could contribute to the ability to clear parasites in populations living in endemic areas. We then conducted a case-control study of three haplotype-tagging single nucleotide polymorphisms (htSNPs) in 370 hospitalised malaria patients (160 severe and 210 uncomplicated) and 410 healthy population controls, all from the Mossi ethnic group. All three htSNPs showed correlation with blood infection levels in malaria patients, and the rs10065633 polymorphism was associated with severe disease (P=0.02). These findings provide the first evidence of the involvement in malaria susceptibility of a specific locus within the 5q31 region, previously shown to be linked with P. falciparum infection levels.

    Funded by: Medical Research Council; Wellcome Trust: 077383

    Genes and immunity 2008;9;2;122-9

  • Comparative cytogenetics of bats (Chiroptera): the prevalence of Robertsonian translocations limits the power of chromosomal characters in resolving interfamily phylogenetic relationships.

    Mao X, Nie W, Wang J, Su W, Feng Q, Wang Y, Dobigny G and Yang F

    Kunming Institute of Zoology, Kunming, Yunnan, People's Republic of China.

    Although the monophyly of Chiroptera is well supported by many independent studies, higher-level systematics, e.g. the monophyly of microbats, remains disputed by morphological and molecular studies. Chromosomal rearrangements, as one type of rare genomic change, have become increasingly popular in phylogenetic studies as alternatives to molecular and other morphological characters. Here, representatives of the families Megadermatidae and Emballonuridae are studied by comparative chromosome painting for the first time. The results have been integrated into published comparative maps, providing an opportunity to assess genome-wide chromosomal homologies between the representatives of eight bat families. Our results further substantiate the wide occurrence of Robertsonian translocations in bats, with the possible involvement of whole-arm reciprocal translocations (WARTs). In order to search for valid cytogenetic signature(s) for each family and superfamily, evolutionary chromosomal rearrangements identified by chromosomal painting and/or banding comparison are subjected to two independent analyses: (1) a cladistic analysis using parsimony and (2) the mapping of these chromosomal changes onto the molecularly defined phylogenetic tree available from the literature. Both analyses clearly indicate the prevalence of homoplasic events that reduce the reliability of chromosomal characters for resolving interfamily relationships in bats.

    Chromosome research : an international journal on the molecular, supramolecular and evolutionary aspects of chromosome biology 2008;16;1;155-70

  • Next-generation sequencing: applications beyond genomes.

    Marguerat S, Wilhelm BT and Bähler J

    Cancer Research UK, Fission Yeast Functional Genomics Group, The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, UK.

    The development of DNA sequencing more than 30 years ago has profoundly impacted biological research. In the last couple of years, remarkable technological innovations have emerged that allow the direct and cost-effective sequencing of complex samples at unprecedented scale and speed. These next-generation technologies make it feasible to sequence not only static genomes, but also entire transcriptomes expressed under different conditions. These and other powerful applications of next-generation sequencing are rapidly revolutionizing the way genomic studies are carried out. Below, we provide a snapshot of these exciting new approaches to understanding the properties and functions of genomes. Given that sequencing-based assays may increasingly supersede microarray-based assays, we also compare and contrast data obtained from these distinct approaches.

    Funded by: Cancer Research UK: A6517, C9546/A6517; Wellcome Trust: 077118

    Biochemical Society transactions 2008;36;Pt 5;1091-6

  • GeConT 2: gene context analysis for orthologous proteins, conserved domains and metabolic pathways.

    Martinez-Guerrero CE, Ciria R, Abreu-Goodger C, Moreno-Hagelsieb G and Merino E

    Departamento de Microbiología Molecular, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Morelos 62210, México.

    The Gene Context Tool (GeConT) allows users to visualize the genomic context of a gene or a group of genes and their orthologous relationships within fully sequenced bacterial genomes. The new version of the server incorporates information from the COG, Pfam and KEGG databases, allowing users to have an integrated graphical representation of the function of genes at multiple levels, their phylogenetic distribution and their genomic context. The sequence of any of the genes can be easily retrieved, as well as the 5' or 3' regulatory regions, greatly facilitating further types of analysis. GeConT 2 is available at:

    Nucleic acids research 2008;36;Web Server issue;W176-80

  • Gametogenesis in malaria parasites is mediated by the cGMP-dependent protein kinase.

    McRobert L, Taylor CJ, Deng W, Fivelman QL, Cummings RM, Polley SD, Billker O and Baker DA

    Department of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, London, United Kingdom.

    Malaria parasite transmission requires differentiation of male and female gametocytes into gametes within a mosquito following a blood meal. A mosquito-derived molecule, xanthurenic acid (XA), can trigger gametogenesis, but the signalling events controlling this process in the human malaria parasite Plasmodium falciparum remain unknown. A role for cGMP was revealed by our observation that zaprinast (an inhibitor of phosphodiesterases that hydrolyse cGMP) stimulates gametogenesis in the absence of XA. Using cGMP-dependent protein kinase (PKG) inhibitors in conjunction with transgenic parasites expressing an inhibitor-insensitive mutant PKG enzyme, we demonstrate that PKG is essential for XA- and zaprinast-induced gametogenesis. Furthermore, we show that intracellular calcium (Ca2+) is required for differentiation and acts downstream of or in parallel with PKG activation. This work defines a key role for PKG in gametogenesis, elucidates the hierarchy of signalling events governing this process in P. falciparum, and demonstrates the feasibility of selective inhibition of a crucial regulator of the malaria parasite life cycle.

    Funded by: Medical Research Council: 012174; Wellcome Trust: 066742

    PLoS biology 2008;6;6;e139

  • Microbiology in the post-genomic era.

    Medini D, Serruto D, Parkhill J, Relman DA, Donati C, Moxon R, Falkow S and Rappuoli R

    Novartis Vaccines and Diagnostics, 53100 Siena, Italy.

    Genomics has revolutionized every aspect of microbiology. Now, 13 years after the first bacterial genome was sequenced, it is important to pause and consider what has changed in microbiology research as a consequence of genomics. In this article, we review the evolving field of bacterial typing and the genomic technologies that enable comparative analysis of multiple genomes and the metagenomes of complex microbial environments, and address the implications of the genomic era for the future of microbiology.

    Nature reviews. Microbiology 2008;6;6;419-30

  • Recurrent rearrangements of chromosome 1q21.1 and variable pediatric phenotypes.

    Mefford HC, Sharp AJ, Baker C, Itsara A, Jiang Z, Buysse K, Huang S, Maloney VK, Crolla JA, Baralle D, Collins A, Mercer C, Norga K, de Ravel T, Devriendt K, Bongers EM, de Leeuw N, Reardon W, Gimelli S, Bena F, Hennekam RC, Male A, Gaunt L, Clayton-Smith J, Simonic I, Park SM, Mehta SG, Nik-Zainal S, Woods CG, Firth HV, Parkin G, Fichera M, Reitano S, Lo Giudice M, Li KE, Casuga I, Broomer A, Conrad B, Schwerzmann M, Räber L, Gallati S, Striano P, Coppola A, Tolmie JL, Tobias ES, Lilley C, Armengol L, Spysschaert Y, Verloo P, De Coene A, Goossens L, Mortier G, Speleman F, van Binsbergen E, Nelen MR, Hochstenbach R, Poot M, Gallagher L, Gill M, McClellan J, King MC, Regan R, Skinner C, Stevenson RE, Antonarakis SE, Chen C, Estivill X, Menten B, Gimelli G, Gribble S, Schwartz S, Sutcliffe JS, Walsh T, Knight SJ, Sebat J, Romano C, Schwartz CE, Veltman JA, de Vries BB, Vermeesch JR, Barber JC, Willatt L, Tassabehji M and Eichler EE

    University of Washington School of Medicine, Seattle 98195, USA.

    Background: Duplications and deletions in the human genome can cause disease or predispose persons to disease. Advances in technologies to detect these changes allow for the routine identification of submicroscopic imbalances in large numbers of patients.

    Methods: We tested for the presence of microdeletions and microduplications at a specific region of chromosome 1q21.1 in two groups of patients with unexplained mental retardation, autism, or congenital anomalies and in unaffected persons.

    Results: We identified 25 persons with a recurrent 1.35-Mb deletion within 1q21.1 from screening 5218 patients. The microdeletions had arisen de novo in eight patients, were inherited from a mildly affected parent in three patients, were inherited from an apparently unaffected parent in six patients, and were of unknown inheritance in eight patients. The deletion was absent in a series of 4737 control persons (P=1.1x10(-7)). We found considerable variability in the level of phenotypic expression of the microdeletion; phenotypes included mild-to-moderate mental retardation, microcephaly, cardiac abnormalities, and cataracts. The reciprocal duplication was enriched in nine children with mental retardation or autism spectrum disorder and other variable features (P=0.02). We identified three deletions and three duplications of the 1q21.1 region in an independent sample of 788 patients with mental retardation and congenital anomalies.

    Conclusions: We have identified recurrent molecular lesions that elude syndromic classification and whose disease manifestations must be considered in a broader context of development as opposed to being assigned to a specific disease. Clinical diagnosis in patients with these lesions may be most readily achieved on the basis of genotype rather than phenotype.

    Funded by: Department of Health; Howard Hughes Medical Institute; NICHD NIH HHS: HD043569, R01 HD043569-06; Wellcome Trust: 061183

    The New England journal of medicine 2008;359;16;1685-99

  • Key function for the CCAAT-binding factor Php4 to regulate gene expression in response to iron deficiency in fission yeast.

    Mercier A, Watt S, Bähler J and Labbé S

    Département de Biochimie, Faculté de Médecine, Université de Sherbrooke, 3001, 12e Ave. Nord, Sherbrooke, Quebec J1H 5N4, Canada.

    The fission yeast Schizosaccharomyces pombe responds to the deprivation of iron by inducing the expression of the php4+ gene, which encodes a negative regulatory subunit of the heteromeric CCAAT-binding factor. Once formed, the Php2/3/4/5 transcription complex is required to inactivate a subset of genes encoding iron-using proteins. Here, we used a pan-S. pombe microarray to study the transcriptional response to iron starvation and identified 86 genes that exhibit php4+-dependent changes on a genome-wide scale. One of these genes encodes the iron-responsive transcriptional repressor Fep1, whose mRNA levels were decreased after treatment with the permeant iron chelator 2,2'-dipyridyl. In addition, several genes encoding the components of iron-dependent biochemical pathways, including the tricarboxylic acid cycle, mitochondrial respiration, amino acid biosynthesis, and oxidative stress defense, were downregulated in response to iron deficiency. Furthermore, Php4 repressed transcription when brought to a promoter using a yeast DNA-binding domain, and iron deprivation was required for this repression. On the other hand, Php4 was constitutively active when glutathione levels were depleted within the cell. Based on these and previous results, we propose that iron-dependent inactivation of Php4 is regulated at two distinct levels: first, at the transcriptional level by the iron-responsive GATA factor Fep1 and second, at the posttranscriptional level by a mechanism yet to be identified, which inhibits Php4-mediated repressive function when iron is abundant.

    Funded by: Cancer Research UK: A6517, C9546/A6517; Wellcome Trust: 077118

    Eukaryotic cell 2008;7;3;493-508

  • Use of shotgun proteomics for the identification, confirmation, and correction of C. elegans gene annotations.

    Merrihew GE, Davis C, Ewing B, Williams G, Käll L, Frewen BE, Noble WS, Green P, Thomas JH and MacCoss MJ

    University of Washington, Department of Genome Sciences, Seattle, Washington 98195, USA.

    We describe a general mass spectrometry-based approach for gene annotation of any organism and demonstrate its effectiveness using the nematode Caenorhabditis elegans. We detected 6779 C. elegans proteins (67,047 peptides), including 384 that, although annotated in WormBase WS150, lacked cDNA or other prior experimental support. We also identified 429 new coding sequences that were unannotated in WS150. Nearly half (192/429) of the new coding sequences were confirmed with RT-PCR data. Thirty-three (approximately 8%) of the new coding sequences had been predicted to be pseudogenes, 151 (approximately 35%) reveal apparent errors in gene models, and 245 (57%) appear to be novel genes. In addition, we verified 6010 exon-exon splice junctions within existing WormBase gene models. Our work confirms that mass spectrometry is a powerful experimental tool for annotating sequenced genomes. In addition, the collection of identified peptides should facilitate future proteomics experiments targeted at specific proteins of interest.

    Funded by: Howard Hughes Medical Institute; NCRR NIH HHS: P41-RR011823; NIDDK NIH HHS: R01-DK069386; NIGMS NIH HHS: R21-GM074787

    Genome research 2008;18;10;1660-9

  • Williams-Beuren syndrome TRIM50 encodes an E3 ubiquitin ligase.

    Micale L, Fusco C, Augello B, Napolitano LM, Dermitzakis ET, Meroni G, Merla G and Reymond A

    Medical Genetics Unit, IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo, Italy.

    Williams-Beuren syndrome (WBS) is a neurodevelopmental and multisystemic disease that results from hemizygosity of approximately 25 genes mapping to chromosomal region 7q11.23. We report here the preliminary description of eight novel genes mapping within the WBS critical region and/or its syntenic mouse region. Three of these genes, TRIM50, TRIM73 and TRIM74, belong to the TRIpartite motif gene family, members of which have been shown to be associated with several human genetic diseases. We describe the preliminary functional characterization of these genes and show that Trim50 encodes an E3 ubiquitin ligase, raising the interesting hypothesis that the ubiquitin-mediated proteasome pathway might be involved in the WBS phenotype.

    Funded by: Telethon: GGP06122; Wellcome Trust: 077046

    European journal of human genetics : EJHG 2008;16;9;1038-49

  • Genomic expression patterns in cell separation mutants of Schizosaccharomyces pombe defective in the genes sep10+ and sep15+ coding for the Mediator subunits Med31 and Med8.

    Miklos I, Szilagyi Z, Watt S, Zilahi E, Batta G, Antunovics Z, Enczi K, Bähler J and Sipiczki M

    Department of Genetics and Applied Microbiology, University of Debrecen, Debrecen, Hungary.

    Cell division is controlled by a complex network involving regulated transcription of genes and posttranslational modification of proteins. The aim of this study is to demonstrate that the Mediator complex, a general regulator of transcription, is involved in the regulation of the second phase (cell separation) of cell division of the fission yeast Schizosaccharomyces pombe. In previous studies we have found that the fission yeast cell separation genes sep10+ and sep15+ code for proteins (Med31 and Med8) associated with the Mediator complex. Here, we show by genome-wide gene expression profiling of mutants defective in these genes that both Med8 and Med31 control large, partially overlapping sets of genes scattered over the entire genome and involved in diverse biological functions. Six cell separation genes controlled by the transcription factors Sep1 and Ace2 are among the target genes. Since neither sep1+ nor ace2+ is affected in the mutant cells, we propose that the Med8 and Med31 proteins act as coactivators of the Sep1-Ace2-dependent cell separation genes. The results also indicate that the subunits of Mediator may contribute to the coordination of cellular processes by fine-tuning the expression of larger sets of genes.

    Funded by: Cancer Research UK: A6517, C9546/A6517; Wellcome Trust: 077118

    Molecular genetics and genomics : MGG 2008;279;3;225-38

  • Oligosaccharyltransferase-subunit mutations in nonsyndromic mental retardation.

    Molinari F, Foulquier F, Tarpey PS, Morelle W, Boissel S, Teague J, Edkins S, Futreal PA, Stratton MR, Turner G, Matthijs G, Gecz J, Munnich A and Colleaux L

    Laboratoire de Génétique et Epigénétique des Maladies Métaboliques, Neurosensorielles et du Développement (INSERM U781), Université Paris Descartes, Hôpital Necker-Enfants Malades, F-75015 Paris, France.

    Mental retardation (MR) is the most frequent handicap among children and young adults. Although a large proportion of X-linked MR genes have been identified, only four genes responsible for autosomal-recessive nonsyndromic MR (AR-NSMR) have been described so far. Here, we report on two genes involved in autosomal-recessive and X-linked NSMR. First, autozygosity mapping in two sibs born to first-cousin French parents led to the identification of a region on 8p22-p23.1. This interval encompasses the gene N33/TUSC3 encoding one subunit of the oligosaccharyltransferase (OTase) complex, which catalyzes the transfer of an oligosaccharide chain onto nascent proteins, the key step of N-glycosylation. Sequencing N33/TUSC3 identified a 1 bp insertion, c.787_788insC, resulting in a premature stop codon, p.N263fsX300, and leading to mRNA decay. Surprisingly, glycosylation analyses of patient fibroblasts showed normal N-glycan synthesis and transfer, which may be due to functional compensation. Subsequently, screening of the X-linked N33/TUSC3 paralog, the IAP gene, identified a missense mutation (c.932T-->G, p.V311G) in a family with X-linked NSMR. Recent studies of fucosylation and polysialic-acid modification of neuronal cell-adhesion glycoproteins have shown the critical role of glycosylation in synaptic plasticity. However, our data provide the first demonstration that a defect in N-glycosylation can result in NSMR. Together, our results demonstrate that fine regulation of OTase activity is essential for normal cognitive-function development, thereby providing further insight into the pathophysiological bases of MR.

    American journal of human genetics 2008;82;5;1150-7

  • Fission yeast SWI/SNF and RSC complexes show compositional and functional differences from budding yeast.

    Monahan BJ, Villén J, Marguerat S, Bähler J, Gygi SP and Winston F

    Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA.

    SWI/SNF chromatin-remodeling complexes have crucial roles in transcription and other chromatin-related processes. The analysis of the two members of this class in Saccharomyces cerevisiae, SWI/SNF and RSC, has contributed heavily to our understanding of these complexes. To understand the in vivo functions of SWI/SNF and RSC in an evolutionarily distant organism, we have characterized these complexes in Schizosaccharomyces pombe. Although core components are conserved between the two yeasts, the compositions of S. pombe SWI/SNF and RSC differ from their S. cerevisiae counterparts and in some ways are more similar to metazoan complexes. Furthermore, several of the conserved proteins, including actin-like proteins, are markedly different between the two yeasts with respect to their requirement for viability. Finally, phenotypic and microarray analyses identified widespread requirements for SWI/SNF and RSC in transcription, including strong evidence that SWI/SNF directly represses iron-transport genes.

    Funded by: Cancer Research UK: A6517, C9546/A6517; NHGRI NIH HHS: HG3456, R01 HG003456-03; NIGMS NIH HHS: GM32967, R37 GM032967-25, R37 GM032967-29; Wellcome Trust: 077118

    Nature structural & molecular biology 2008;15;8;873-80

  • Genome-wide discovery and verification of novel structured RNAs in Plasmodium falciparum.

    Mourier T, Carret C, Kyes S, Christodoulou Z, Gardner PP, Jeffares DC, Pinches R, Barrell B, Berriman M, Griffiths-Jones S, Ivens A, Newbold C and Pain A

    Ancient DNA and Evolution Group, Department of Biology, University of Copenhagen, Copenhagen DK-2100, Denmark.

    We undertook a genome-wide search for novel noncoding RNAs (ncRNA) in the malaria parasite Plasmodium falciparum. We used the RNAz program to predict structures in the noncoding regions of the P. falciparum 3D7 genome that were conserved with at least one of seven other Plasmodium spp. genome sequences. By using Northern blot analysis for 76 high-scoring predictions and microarray analysis for the majority of candidates, we have verified the expression of 33 novel ncRNA transcripts including four members of a ncRNA family in the asexual blood stage. These transcripts represent novel structured ncRNAs in P. falciparum and are not represented in any RNA databases. We provide supporting evidence for purifying selection acting on the experimentally verified ncRNAs by comparing the nucleotide substitutions in the predicted ncRNA candidate structures in P. falciparum with the closely related chimp malaria parasite P. reichenowi. The high confirmation rate within a single parasite life cycle stage suggests that many more of the predictions may be expressed in other stages of the organism's life cycle.

    Funded by: Wellcome Trust

    Genome research 2008;18;2;281-92

  • Dynamic instability of the major urinary protein gene family revealed by genomic and phenotypic comparisons between C57 and 129 strain mice.

    Mudge JM, Armstrong SD, McLaren K, Beynon RJ, Hurst JL, Nicholson C, Robertson DH, Wilming LG and Harrow JL

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Background: The major urinary proteins (MUPs) of Mus musculus domesticus are deposited in urine in large quantities, where they bind and release pheromones and also provide an individual 'recognition signal' via their phenotypic polymorphism. Whilst important information about MUP functionality has been gained in recent years, the gene cluster is poorly studied in terms of structure, genic polymorphism and evolution.

    Results: We combine targeted sequencing, manual genome annotation and phylogenetic analysis to compare the Mup clusters of C57BL/6J and 129 strains of mice. We describe organizational heterogeneity within both clusters: a central array of cassettes containing Mup genes highly similar at the protein level, flanked by regions containing Mup genes displaying significantly elevated divergence. Observed genomic rearrangements in all regions have likely been mediated by endogenous retroviral elements. Mup loci with coding sequences that differ between the strains are identified--including a gene/pseudogene pair--suggesting that these inbred lineages exhibit variation that exists in wild populations. We have characterized the distinct MUP profiles in the urine of both strains by mass spectrometry. The total MUP phenotype data is reconciled with our genomic sequence data, matching all proteins identified in urine to annotated genes.

    Conclusion: Our observations indicate that the MUP phenotypic polymorphism observed in wild populations results from a combination of Mup gene turnover coupled with currently unidentified mechanisms regulating gene expression patterns. We propose that the structural heterogeneity described within the cluster reflects functional divergence within the Mup gene family.

    Funded by: NHGRI NIH HHS: U54 HG004555-01; Wellcome Trust: 077198

    Genome biology 2008;9;5;R91

  • Comparative gene mapping in cattle, Indian muntjac, and Chinese muntjac by fluorescence in situ hybridization.

    Murmann AE, Mincheva A, Scheuermann MO, Gautier M, Yang F, Buitkamp J, Strissel PL, Strick R, Rowley JD and Lichter P

    Department of Medicine, Section Hematology/Oncology, University of Chicago, Chicago, IL 60637, USA.

    The Indian muntjac (Muntiacus muntjak vaginalis) has a karyotype of 2n = 6 in the female and 2n = 7 in the male. The karyotypic evolution of Indian muntjac via extensive tandem fusions and several centric fusions is well documented by molecular cytogenetic studies mainly utilizing chromosome paints. To achieve higher resolution mapping, a set of 42 different genomic clones coding for 37 genes and the nucleolar organizer region was used to examine homologies between the cattle (2n = 60), human (2n = 46), Indian muntjac (2n = 6/7) and Chinese muntjac (2n = 46) karyotypes. These genomic clones were mapped by fluorescence in situ hybridization (FISH). Localization of genes on all three pairs of M. m. vaginalis chromosomes and on the acrocentric chromosomes of M. reevesi allowed not only the analysis of the evolution of syntenic regions within the muntjac genus but also allowed a broader comparison of synteny with more distantly related species, such as cattle and human, to shed more light onto the evolving genome organization.

    Genetica 2008;134;3;345-51

  • Effectiveness of continuous glucose monitoring in pregnant women with diabetes: randomised clinical trial.

    Murphy HR, Rayman G, Lewis K, Kelly S, Johal B, Duffield K, Fowler D, Campbell PJ and Temple RC

    Department of Diabetes and Endocrinology, Ipswich Hospital NHS Trust, Ipswich IP4 5PD.

    Objective: To evaluate the effectiveness of continuous glucose monitoring during pregnancy on maternal glycaemic control, infant birth weight, and risk of macrosomia in women with type 1 and type 2 diabetes.

    Design: Prospective, open label randomised controlled trial.

    Setting: Two secondary care multidisciplinary obstetric clinics for diabetes in the United Kingdom.

    Participants: 71 women with type 1 diabetes (n=46) or type 2 diabetes (n=25) allocated to antenatal care plus continuous glucose monitoring (n=38) or to standard antenatal care (n=33).

    Intervention: Continuous glucose monitoring was used as an educational tool to inform shared decision making and future therapeutic changes at intervals of 4-6 weeks during pregnancy. All other aspects of antenatal care were equal between the groups.

    Main outcome measures: The primary outcome was maternal glycaemic control during the second and third trimesters from measurements of HbA(1c) levels every four weeks. Secondary outcomes were birth weight and risk of macrosomia using birthweight standard deviation scores and customised birthweight centiles. Statistical analyses were done on an intention to treat basis.

    Results: Women randomised to continuous glucose monitoring had lower mean HbA(1c) levels from 32 to 36 weeks' gestation compared with women randomised to standard antenatal care: 5.8% (SD 0.6) v 6.4% (SD 0.7). Compared with infants of mothers in the control arm those of mothers in the intervention arm had decreased mean birthweight standard deviation scores (0.9 v 1.6; effect size 0.7 SD, 95% confidence interval 0.0 to 1.3), decreased median customised birthweight centiles (69% v 93%), and a reduced risk of macrosomia (odds ratio 0.36, 95% confidence interval 0.13 to 0.98).

    Conclusion: Continuous glucose monitoring during pregnancy is associated with improved glycaemic control in the third trimester, lower birth weight, and reduced risk of macrosomia.

    Trial registration: Current Controlled Trials ISRCTN84461581.

    BMJ (Clinical research ed.) 2008;337;a1680

  • Chromhome: a rich internet application for accessing comparative chromosome homology maps.

    Nagarajan S, Rens W, Stalker J, Cox T and Ferguson-Smith MA

    Cambridge Resource Centre for Comparative Genomics, Department of Veterinary Medicine, University of Cambridge, Madingley Road Cambridge CB3 0ES, UK.

    Background: Comparative genomics has become a significant research area in recent years, following the availability of a number of sequenced genomes. The comparison of genomes is of great importance in the analysis of functionally important genome regions. It can also be used to understand the phylogenetic relationships of species and the mechanisms leading to rearrangement of karyotypes during evolution. Many species have been studied at the cytogenetic level by cross species chromosome painting. With the large amount of such information, it has become vital to computerize the data and make them accessible worldwide. Chromhome is a comprehensive web application that is designed to provide cytogenetic comparisons among species and to fulfil this need.

    Results: The Chromhome application architecture is multi-tiered, with an interactive client layer, a business logic layer and a database layer. The Enterprise Java platform with the open source framework OpenLaszlo is used to implement the Rich Internet Chromhome Application. Raw data from cross-species comparative mapping are collected and the processed information is stored in the MySQL Chromhome database. Chromhome Release 1.0 contains 109 homology maps from 51 species. The data cover species from 14 orders and 30 families. The homology map displays all the chromosomes of the compared species as one image, making comparisons among species easier. Inferred data also provide maps of homologous regions that could serve as a guideline for researchers involved in phylogenetic or evolution based studies.

    Conclusion: Chromhome provides a useful resource for comparative genomics, holding graphical homology maps of a wide range of species. It brings together cytogenetic data of many genomes under one roof. Inferred painting can often determine the chromosomal homologous regions between two species, if each has been compared with a common third species. Inferred painting greatly reduces the need to map entire genomes and helps focus only on relevant regions of the chromosomes of the species under study. Future releases of Chromhome will accommodate more species and their respective gene and BAC maps, in addition to chromosome painting data. The Chromhome application provides a single-page interface (SPI) with a desktop style layout, delivering a better and richer user experience.

    Funded by: Wellcome Trust

    BMC bioinformatics 2008;9;168

  • Susceptibility to sequelae of human ocular chlamydial infection associated with allelic variation in IL10 cis-regulation.

    Natividad A, Holland MJ, Rockett KA, Forton J, Faal N, Joof HM, Mabey DC, Bailey RL and Kwiatkowski DP

    Infectious and Tropical Diseases Department, London School of Hygiene and Tropical Medicine, London WC1E 7HT, UK.

    Trachoma, an infectious disease of the conjunctiva caused by Chlamydia trachomatis, causes scarring and blindness in some infected individuals but not others. In an African community where trachoma is endemic, we have previously identified an IL10 haplotype that is associated with increased risk of scarring complications. Here we examine the hypothesis that the risk haplotype (H-RISK) affects levels of IL10 expression in the conjunctiva during active trachoma infection. To overcome potential genetic and environmental confounders we used the method of allele-specific quantification, which involved identifying subjects in the community who had active trachoma and were also heterozygous for the H-RISK. We find that there is allelic variation in cis-regulation of IL10 in the conjunctiva during active trachoma, with the H-RISK generating relatively more IL10 transcripts than other haplotypes in this population (average difference in IL10 allelic transcripts in the conjunctiva of heterozygous individuals infected with C. trachomatis of 23% (95% confidence interval: 14-32%, P < 0.0001)). These findings provide a plausible functional explanation for the observed genetic association, and support the hypothesis that an excessive IL10 response to C. trachomatis infection is a risk factor for scarring and blindness.

    Funded by: Wellcome Trust

    Human molecular genetics 2008;17;2;323-9

  • Meta-analysis of 32 genome-wide linkage studies of schizophrenia

    Ng MYM, Levinson DF, Faraone SV, Suarez BK et al

    Molecular Psychiatry 2008

  • Using gene expression to investigate the genetic basis of complex disorders.

    Nica AC and Dermitzakis ET

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge CB10 1HH, UK.

    The identification of complex disease susceptibility loci through genome-wide association studies (GWAS) has recently become possible and is now a method of choice for investigating the genetic basis of complex traits. The number of results from such studies is constantly increasing but the challenge ahead is to identify the biological context in which these statistically significant candidate variants act. Regulatory variation plays an important role in shaping phenotypic differences among individuals and thus is very likely to also influence disease susceptibility. As such, integrating gene expression data and other disease relevant intermediate phenotypes with GWAS results could potentially help prioritize fine-mapping efforts and provide a shortcut to disease biology. Combining these different levels of information in a meaningful way is, however, not trivial. In the present review, we outline the several approaches that have been explored so far to this end and their achievements. We also discuss the limitations of the methods and how upcoming technological developments could help circumvent these limitations. Overall, such efforts will be very helpful in understanding initially regulatory effects on disease and disease etiology in general.

    Funded by: Wellcome Trust: 077046

    Human molecular genetics 2008;17;R2;R129-34

  • Flying lemurs--the 'flying tree shrews'? Molecular cytogenetic evidence for a Scandentia-Dermoptera sister clade.

    Nie W, Fu B, O'Brien PC, Wang J, Su W, Tanomtong A, Volobouev V, Ferguson-Smith MA and Yang F

    State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, People's Republic of China.

    Background: Flying lemurs or Colugos (order Dermoptera) represent an ancient mammalian lineage that contains only two extant species. Although molecular evidence strongly supports that the orders Dermoptera, Scandentia, Lagomorpha, Rodentia and Primates form a superordinal clade called Supraprimates (or Euarchontoglires), the phylogenetic placement of Dermoptera within Supraprimates remains ambiguous.

    Results: To search for cytogenetic signatures that could help to clarify the evolutionary affinities within this superordinal group, we have established a genome-wide comparative map between human and the Malayan flying lemur (Galeopterus variegatus) by reciprocal chromosome painting using both human and G. variegatus chromosome-specific probes. The 22 human autosomal paints and the X chromosome paint defined 44 homologous segments in the G. variegatus genome. A putative inversion on GVA 11 was revealed by the hybridization patterns of human chromosome probes 16 and 19. Fifteen associations of human chromosome segments (HSA) were detected in the G. variegatus genome: HSA1/3, 1/10, 2/21, 3/21, 4/8, 4/18, 7/15, 7/16, 7/19, 10/16, 12/22 (twice), 14/15, 16/19 (twice). Reverse painting of G. variegatus chromosome-specific paints onto human chromosomes confirmed the above results, and defined the origin of the homologous human chromosomal segments in these associations. In total, G. variegatus paints revealed 49 homologous chromosomal segments in the HSA genome.

    Conclusion: Comparative analysis of our map with published maps from representative species of other placental orders, including Scandentia, Primates, Lagomorpha and Rodentia, suggests a signature rearrangement (HSA2q/21 association) that links Scandentia and Dermoptera to one sister clade. Our results thus provide new evidence for the hypothesis that Scandentia and Dermoptera have a closer phylogenetic relationship to each other than either of them has to Primates.

    BMC biology 2008;6;18

  • Molecular pathology of aneurysms.

    Niemelä M, Frösen J, Hernesniemi J, Dashti R and Palotie A

    Funded by: Wellcome Trust: 089062

    Surgical neurology 2008;70;1;36-8

  • Organ-specific requirements for Hdac1 in liver and pancreas formation.

    Noël ES, Casal-Sueiro A, Busch-Nentwich E, Verkade H, Dong PD, Stemple DL and Ober EA

    National Institute for Medical Research, Division of Developmental Biology, The Ridgeway, Mill Hill, London, UK.

    Liver, pancreas and lung originate from the presumptive foregut in temporal and spatial proximity. This requires precisely orchestrated transcriptional activation and repression of organ-specific gene expression within the same cell. Here, we show distinct roles for the chromatin remodelling factor and transcriptional repressor Histone deacetylase 1 (Hdac1) in endodermal organogenesis in zebrafish. Loss of Hdac1 causes defects in timely liver specification and in subsequent differentiation. Mosaic analyses reveal a cell-autonomous requirement for hdac1 within the hepatic endoderm. Our studies further reveal specific functions for Hdac1 in pancreas development. Loss of hdac1 causes the formation of ectopic endocrine clusters anteriorly to the main islet, as well as defects in exocrine pancreas specification and differentiation. In addition, we observe defects in extrahepatopancreatic duct formation and morphogenesis. Finally, loss of hdac1 results in an expansion of the foregut endoderm in the domain from which the liver and pancreas originate. Our genetic studies demonstrate that Hdac1 is crucial for regulating distinct steps in endodermal organogenesis. This suggests a model in which Hdac1 may directly or indirectly restrict foregut fates while promoting hepatic and exocrine pancreatic specification and differentiation, as well as pancreatic endocrine islet morphogenesis. These findings establish zebrafish as a tractable system to investigate chromatin remodelling factor functions in controlling gene expression programmes in vertebrate endodermal organogenesis.

    Funded by: Medical Research Council: MC_U117581329

    Developmental biology 2008;322;2;237-50

  • Transcription and chromatin organization of a housekeeping gene cluster containing an integrated beta-globin locus control region.

    Noordermeer D, Branco MR, Splinter E, Klous P, van Ijcken W, Swagemakers S, Koutsourakis M, van der Spek P, Pombo A and de Laat W

    Department of Cell Biology and Genetics, Erasmus Medical Center, Rotterdam, The Netherlands.

    The activity of locus control regions (LCR) has been correlated with chromatin decondensation, spreading of active chromatin marks, locus repositioning away from its chromosome territory (CT), increased association with transcription factories, and long-range interactions via chromatin looping. To investigate the relative importance of these events in the regulation of gene expression, we targeted the human beta-globin LCR in two opposite orientations to a gene-dense region in the mouse genome containing mostly housekeeping genes. We found that each oppositely oriented LCR influenced gene expression on both sides of the integration site and over a maximum distance of 150 kilobases. A subset of genes was transcriptionally enhanced, some of which in an LCR orientation-dependent manner. The locus resides mostly at the edge of its CT and integration of the LCR in either orientation caused a more frequent positioning of the locus away from its CT. Locus association with transcription factories increased moderately, both for loci at the edge and outside of the CT. These results show that nuclear repositioning is not sufficient to increase transcription of any given gene in this region. We identified long-range interactions between the LCR and two upregulated genes and propose that LCR-gene contacts via chromatin looping determine which genes are transcriptionally enhanced.

    Funded by: Medical Research Council

    PLoS genetics 2008;4;3;e1000016

  • Mutations in mRNA export mediator GLE1 result in a fetal motoneuron disease.

    Nousiainen HO, Kestilä M, Pakkasjärvi N, Honkala H, Kuure S, Tallila J, Vuopala K, Ignatius J, Herva R and Peltonen L

    Department of Molecular Medicine, National Public Health Institute, Helsinki 00290, Finland.

    The most severe forms of motoneuron disease manifest in utero and are characterized by marked atrophy of spinal cord motoneurons and fetal immobility. Here, we report that the defective gene underlying lethal motoneuron syndrome LCCS1 is the mRNA export mediator GLE1. Our finding of mutated GLE1 exposes a common pathway connecting the genes implicated in LCCS1, LCCS2 and LCCS3 and elucidates mRNA processing as a critical molecular mechanism in motoneuron development and maturation.

    Funded by: NIEHS NIH HHS: P01 ES11253-03; Wellcome Trust: 089061

    Nature genetics 2008;40;2;155-7

  • Post-genomic challenges for collaborative research in infectious diseases.

    Okeke IN and Wain J

    Department of Biology, Haverford College, Haverford, Pennsylvania 19041, USA.

    Although high-burden pathogens have been prioritized for sequencing, genomic research has yet to yield effective vaccines, diagnostics or therapeutics for the infectious diseases that burden developing countries. International research partnerships are needed more today than ever before, and we propose that increased participation by scientists in endemic areas would overcome current roadblocks and is an essential path towards translational research outcomes.

    Funded by: Wellcome Trust

    Nature reviews. Microbiology 2008;6;11;858-64

  • Reconfiguration of genomic anchors upon transcriptional activation of the human major histocompatibility complex.

    Ottaviani D, Lever E, Mitter R, Jones T, Forshew T, Christova R, Tomazou EM, Rakyan VK, Krawetz SA, Platts AE, Segarane B, Beck S and Sheer D

    Cancer Research UK London Research Institute, Lincoln's Inn Fields, London WC2A 3PX, United Kingdom.

    The folding of chromatin into topologically constrained loop domains is essential for genomic function. We have identified genomic anchors that define the organization of chromatin loop domains across the human major histocompatibility complex (MHC). This locus contains critical genes for immunity and is associated with more diseases than any other region of the genome. Classical MHC genes are expressed in a cell type-specific pattern and can be induced by cytokines such as interferon-gamma (IFNG). Transcriptional activation of the MHC was associated with a reconfiguration of chromatin architecture resulting from the formation of additional genomic anchors. These findings suggest that the dynamic arrangement of genomic anchors and loops plays a role in transcriptional regulation.

    Funded by: Cancer Research UK: A8318, C5321/A8318; NICHD NIH HHS: HD36512; Wellcome Trust: WT084071

    Genome research 2008;18;11;1778-86

  • Critical immunological pathways are downregulated in APECED patient dendritic cells.

    Pöntynen N, Strengell M, Sillanpää N, Saharinen J, Ulmanen I, Julkunen I and Peltonen L

    National Public Health Institute and FIMM, Institute for Molecular Medicine Finland, Biomedicum, Helsinki, Finland.

    Autoimmune polyendocrinopathy-candidiasis-ectodermal dystrophy (APECED) is a monogenic autoimmune disease caused by mutations in the autoimmune regulator (AIRE) gene. AIRE functions as a transcriptional regulator, and it has a central role in the development of immunological tolerance. AIRE regulates the expression of ectopic antigens in epithelial cells of the thymic medulla and has been shown to participate in the development of peripheral tolerance. However, the mechanism of action of AIRE has remained elusive. To further investigate the role of AIRE in host immune functions, we studied the properties and transcript profiles in in vitro monocyte-differentiated dendritic cells (moDCs) obtained from APECED patients and healthy controls. AIRE-deficient monocytes showed typical DC morphology and expressed DC marker proteins cluster of differentiation 86 and human leukocyte antigen class II. APECED patient-derived moDCs were functionally impaired: the transcriptional response of cytokine genes to pathogens was drastically reduced. Interestingly, some changes were observable already at the immature DC stage. Pathway analyses of transcript profiles revealed that the expression of the components of the host cell signaling pathways involved in cell-cell signalling, innate immune responses, and cytokine activity were reduced in APECED moDCs. Our observations support a role for AIRE in peripheral tolerance and are the first ones to show that AIRE has a critical role in DC responses to microbial stimuli in humans.

    Funded by: Wellcome Trust: 089061

    Journal of molecular medicine (Berlin, Germany) 2008;86;10;1139-52

  • Scheduling in a dynamic heterogeneous distributed system using estimation error

    Page AJ, Keane TM and Naughton TJ

    Journal of Parallel and Distributed Computing. 2008

  • Genomic adaptation: a fungal perspective.

    Pain A and Hertz-Fowler C

    Nature reviews. Microbiology 2008;6;8;572-3

  • The genome of the simian and human malaria parasite Plasmodium knowlesi.

    Pain A, Böhme U, Berry AE, Mungall K, Finn RD, Jackson AP, Mourier T, Mistry J, Pasini EM, Aslett MA, Balasubrammaniam S, Borgwardt K, Brooks K, Carret C, Carver TJ, Cherevach I, Chillingworth T, Clark TG, Galinski MR, Hall N, Harper D, Harris D, Hauser H, Ivens A, Janssen CS, Keane T, Larke N, Lapp S, Marti M, Moule S, Meyer IM, Ormond D, Peters N, Sanders M, Sanders S, Sargeant TJ, Simmonds M, Smith F, Squares R, Thurston S, Tivey AR, Walker D, White B, Zuiderwijk E, Churcher C, Quail MA, Cowman AF, Turner CM, Rajandream MA, Kocken CH, Thomas AW, Newbold CI, Barrell BG and Berriman M

    Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.

    Plasmodium knowlesi is an intracellular malaria parasite whose natural vertebrate host is Macaca fascicularis (the 'kra' monkey); however, it is now increasingly recognized as a significant cause of human malaria, particularly in southeast Asia. Plasmodium knowlesi was the first malaria parasite species in which antigenic variation was demonstrated, and it has a close phylogenetic relationship to Plasmodium vivax, the second most important species of human malaria parasite (reviewed in ref. 4). Despite their relatedness, there are important phenotypic differences between them, such as host blood cell preference, absence of a dormant liver stage or 'hypnozoite' in P. knowlesi, and length of the asexual cycle (reviewed in ref. 4). Here we present an analysis of the P. knowlesi (H strain, Pk1(A+) clone) nuclear genome sequence. This is the first monkey malaria parasite genome to be described, and it provides an opportunity for comparison with the recently completed P. vivax genome and other sequenced Plasmodium genomes. In contrast to other Plasmodium genomes, putative variant antigen families are dispersed throughout the genome and are associated with intrachromosomal telomere repeats. One of these families, the KIRs, contains sequences that collectively match over one-half of the host CD99 extracellular domain, which may represent an unusual form of molecular mimicry.

    Funded by: Wellcome Trust: 085775

    Nature 2008;455;7214;799-803

  • Estimating the ancestral recombinations graph (ARG) as compatible networks of SNP patterns.

    Parida L, Melé M, Calafell F, Bertranpetit J and Genographic Consortium

    Computational Biology Center, IBM TJ Watson Research, Yorktown Heights, NY 10598, USA.

    Traditionally, nonrecombinant regions of the genome, i.e., mtDNA or the Y chromosome, have been used for phylogeography, notably for ease of analysis. The topology of the phylogeny in this case is an acyclic graph, often a tree, which is easy to comprehend and somewhat easy to infer. However, recombination is an undeniable genetic fact for most of the genome. Driven by the need for a more complete analysis, we address the problem of estimating the ancestral recombination graph (ARG) from a collection of extant sequences. We exploit the coherence that is observed in the human haplotypes as patterns and present a network model of patterns to reconstruct the ARG. We test our model on simulations that closely mimic the observed haplotypes and observe promising results.

    Journal of computational biology : a journal of computational molecular cell biology 2008;15;9;1133-54

  • alpha-Isoform of calcium-calmodulin-dependent protein kinase II and postsynaptic density protein 95 differentially regulate synaptic expression of NR2A- and NR2B-containing N-methyl-d-aspartate receptors in hippocampus.

    Park CS, Elgersma Y, Grant SG and Morrison JH

    Department of Neuroscience, Mount Sinai School of Medicine, New York, NY 10029, USA.

    N-methyl-d-aspartate receptors (NMDARs) are critical determinants of bidirectional synaptic plasticity; however, studies of NMDAR function have been based primarily on pharmacological and electrophysiological manipulations, and it is still debated whether there are subunit-selective forms of long-term potentiation (LTP) and long-term depression (LTD). Here we provide ultrastructural analyses of axospinous synapses in cornu ammonis field 1 of hippocampus (CA1) stratum radiatum of transgenic mice with mutations to two key underlying postsynaptic density (PSD) proteins, postsynaptic density protein 95 (PSD-95) and the alpha-isoform of calcium-calmodulin-dependent protein kinase II (alphaCaMKII). Distribution profiles of synaptic proteins in these mice reveal very different patterns of subunit-specific NMDAR localization, which may be related to the divergent phenotypes of the two mutants. In PSD-95, Dlg, ZO-1/Dlg-homologous region (PDZ) 3-truncated mutant mice in which LTD could not be induced but LTP was found to be enhanced, we found a subtle, yet preferential displacement of synaptic N-methyl-d-aspartate receptor subunit 2B (NR2B) subunits in lateral regions of the synapse without affecting changes in the localization of N-methyl-d-aspartate receptor subunit 2A (NR2A) subunits. In persistent inhibitory alphaCaMKII Thr305 substituted with Asp in alpha-isoform of calcium-calmodulin kinase II (T305D) mutant mice with severely impaired LTP but stable LTD expression, we found a selective reduction of NR2A subunits at both the synapse and throughout the cytoplasm of the spine without any effect on the NR2B subunit. In an experiment of mutual exclusivity, neither PSD-95 nor alphaCaMKII localization was found to be affected by mutations to the corresponding PSD protein suggesting that they are functionally independent of the other in the regulation of NR2A- and NR2B-containing NMDARs preceding synaptic activity. Consequently, there may exist at least two distinct PSD-95 and alphaCaMKII-specific NMDAR complexes involved in mediating LTP and LTD through opposing signal transduction pathways in synapses of the hippocampus. The contrasting phenotypes of the PSD-95 and alphaCaMKII mutant mice further establish the prospect of an independent and, possibly, competing mechanism for the regulation of NMDAR-dependent bidirectional synaptic plasticity.

    Funded by: NIA NIH HHS: AG06647, R37 AG006647-20; Wellcome Trust

    Neuroscience 2008;151;1;43-55

  • Whole-Genome analysis of pathogen evolution

    Parkhill J

    Evolution in Health and Disease 2008;199-213

  • Whole-genome analysis of pathogen evolution

    Parkhill J

    Evolution in Health and Disease. 2008;199-214

  • Time to remove the model organism blinkers.

    Parkhill J

    Trends in microbiology 2008;16;11;510-1

  • Differential roles of the PKC novel isoforms, PKCdelta and PKCepsilon, in mouse and human platelets.

    Pears CJ, Thornber K, Auger JM, Hughes CE, Grygielska B, Protty MB, Pearce AC and Watson SP

    Department of Biochemistry, University of Oxford, Oxford, United Kingdom.

    Background: Increasing evidence suggests that individual isoforms of protein kinase C (PKC) play distinct roles in regulating platelet activation.

    Methodology/principal findings: In this study, we focus on the role of two novel PKC isoforms, PKCdelta and PKCepsilon, in both mouse and human platelets. PKCdelta is robustly expressed in human platelets and undergoes transient tyrosine phosphorylation upon stimulation by thrombin or the collagen receptor, GPVI, which becomes sustained in the presence of the pan-PKC inhibitor, Ro 31-8220. In mouse platelets, however, PKCdelta undergoes sustained tyrosine phosphorylation upon activation. In contrast the related isoform, PKCepsilon, is expressed at high levels in mouse but not human platelets. There is a marked inhibition in aggregation and dense granule secretion to low concentrations of GPVI agonists in mouse platelets lacking PKCepsilon in contrast to a minor inhibition in response to G protein-coupled receptor agonists. This reduction is mediated by inhibition of tyrosine phosphorylation of the FcRgamma-chain and downstream proteins, an effect also observed in wild-type mouse platelets in the presence of a PKC inhibitor.

    Conclusions: These results demonstrate a reciprocal relationship in levels of the novel PKC isoforms delta and epsilon in human and mouse platelets and a selective role for PKCepsilon in signalling through GPVI.

    Funded by: British Heart Foundation; Wellcome Trust

    PloS one 2008;3;11;e3793

  • Complete genome sequence of uropathogenic Proteus mirabilis, a master of both adherence and motility.

    Pearson MM, Sebaihia M, Churcher C, Quail MA, Seshasayee AS, Luscombe NM, Abdellah Z, Arrosmith C, Atkin B, Chillingworth T, Hauser H, Jagels K, Moule S, Mungall K, Norbertczak H, Rabbinowitsch E, Walker D, Whithead S, Thomson NR, Rather PN, Parkhill J and Mobley HL

    Department of Microbiology and Immunology, University of Michigan Medical School, Ann Arbor, MI 48109-0620, USA.

    The gram-negative enteric bacterium Proteus mirabilis is a frequent cause of urinary tract infections in individuals with long-term indwelling catheters or with complicated urinary tracts (e.g., due to spinal cord injury or anatomic abnormality). P. mirabilis bacteriuria may lead to acute pyelonephritis, fever, and bacteremia. Most notoriously, this pathogen uses urease to catalyze the formation of kidney and bladder stones or to encrust or obstruct indwelling urinary catheters. Here we report the complete genome sequence of P. mirabilis HI4320, a representative strain cultured in our laboratory from the urine of a nursing home patient with a long-term (≥30 days) indwelling urinary catheter. The genome is 4.063 Mb long and has a G+C content of 38.88%. There is a single plasmid consisting of 36,289 nucleotides. Annotation of the genome identified 3,685 coding sequences and seven rRNA loci. Analysis of the sequence confirmed the presence of previously identified virulence determinants, as well as a contiguous 54-kb flagellar regulon and 17 types of fimbriae. Genes encoding a potential type III secretion system were identified on a low-G+C-content genomic island containing 24 intact genes that appear to encode all components necessary to assemble a type III secretion system needle complex. In addition, the P. mirabilis HI4320 genome possesses four tandem copies of the zapE metalloprotease gene, genes encoding six putative autotransporters, an extension of the atf fimbrial operon to six genes, including an mrpJ homolog, and genes encoding at least five iron uptake mechanisms, two potential type IV secretion systems, and 16 two-component regulators.

    Funded by: NIAID NIH HHS: AI059722, F32 AI068324, F32 AI068324-01A2, T32 AI7528; Wellcome Trust

    Journal of bacteriology 2008;190;11;4027-37

  • Exclusion of biglycan mutations in a cohort of patients with neuromuscular disorders.

    Peat RA, Gécz J, Fallon JR, Tarpey PS, Smith R, Futreal A, Stratton MR, Lamandé SR, Yang N and North KN

    Institute for Neuromuscular Research, The Children's Hospital at Westmead, Sydney, Australia.

    Biglycan has been considered a good candidate for neuromuscular disease based on direct interactions with collagen VI and alpha-dystroglycan, both of which are linked with congenital muscular dystrophy (CMD). We screened 83 patients with CMD and other neuromuscular disorders and six controls for mutations and variations in the biglycan sequence. We identified a number of novel sequence variations. After family analysis and control screening we found that none of these polymorphisms were disease-causing mutations. Thus mutations in biglycan are not a common cause of neuromuscular disorders in our cohort.

    Funded by: NICHD NIH HHS: R01 HD023924-17; Wellcome Trust: 077012

    Neuromuscular disorders : NMD 2008;18;8;606-9

  • Chromosome painting shows that skunks (Mephitidae, Carnivora) have highly rearranged karyotypes.

    Perelman PL, Graphodatsky AS, Dragoo JW, Serdyukova NA, Stone G, Cavagna P, Menotti A, Nie W, O'Brien PC, Wang J, Burkett S, Yuki K, Roelke ME, O'Brien SJ, Yang F and Stanyon R

    Institute of Cytology and Genetics, SB RAS 630090, Novosibirsk, Russia.

    The karyotypic relationships of skunks (Mephitidae) with other major clades of carnivores are not yet established. Here, multi-directional chromosome painting was used to reveal the karyological relationships among skunks and between Mephitidae (skunks) and Procyonidae (raccoons). Representative species from three genera of Mephitidae (Mephitis mephitis, 2n = 50; Mephitis macroura, 2n = 50; Conepatus leuconotus, 2n = 46; Spilogale gracilis, 2n = 60) and one species of Procyonidae (Procyon lotor, 2n = 38) were studied. Chromosomal homology was mapped by hybridization of five sets of whole-chromosome paints derived from stone marten (Martes foina, 2n = 38), cat, skunks (M. mephitis; M. macroura) and human. The karyotype of the raccoon is highly conserved and identical to the hypothetical ancestral musteloid karyotype, suggesting that procyonids have a particular importance for establishing the karyological evolution within the caniforms. Ten fission events and five fusion events are necessary to generate the ancestral skunk karyotype from the ancestral carnivore karyotype. Our results show that Mephitidae joins Canidae and Ursidae as the third family of carnivores that are characterized by a high rate of karyotype evolution. Shared derived chromosomal fusion of stone marten chromosomes 6 and 14 phylogenetically links the American hog-nosed skunk and eastern spotted skunk.

    Chromosome research : an international journal on the molecular, supramolecular and evolutionary aspects of chromosome biology 2008;16;8;1215-31

  • Copy number variation and evolution in humans and chimpanzees.

    Perry GH, Yang F, Marques-Bonet T, Murphy C, Fitzgerald T, Lee AS, Hyland C, Stone AC, Hurles ME, Tyler-Smith C, Eichler EE, Carter NP, Lee C and Redon R

    School of Human Evolution & Social Change, Arizona State University, Tempe, Arizona 85287, USA.

    Copy number variants (CNVs) underlie many aspects of human phenotypic diversity and provide the raw material for gene duplication and gene family expansion. However, our understanding of their evolutionary significance remains limited. We performed comparative genomic hybridization on a single human microarray platform to identify CNVs among the genomes of 30 humans and 30 chimpanzees as well as fixed copy number differences between species. We found that human and chimpanzee CNVs occur in orthologous genomic regions far more often than expected by chance and are strongly associated with the presence of highly homologous intrachromosomal segmental duplications. By adapting population genetic analyses for use with copy number data, we identified functional categories of genes that have likely evolved under purifying or positive selection for copy number changes. In particular, duplications and deletions of genes with inflammatory response and cell proliferation functions may have been fixed by positive selection and involved in the adaptive phenotypic differentiation of humans and chimpanzees.

    Funded by: Howard Hughes Medical Institute; NCRR NIH HHS: RR014491, RR015087, RR016483; NHGRI NIH HHS: HG004221; Wellcome Trust

    Genome research 2008;18;11;1698-710

  • Molecular characterization of the Salmonella enterica serovar Typhi Vi-typing bacteriophage E1.

    Pickard D, Thomson NR, Baker S, Wain J, Pardo M, Goulding D, Hamlin N, Choudhary J, Threfall J and Dougan G

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.

    Some bacteriophages target potentially pathogenic bacteria by exploiting surface-associated virulence factors as receptors. For example, phage have been identified that exhibit specificity for Vi capsule producing Salmonella enterica serovar Typhi. Here we have characterized the Vi-associated E1-typing bacteriophage using a number of molecular approaches. The absolute requirement for Vi capsule expression for infectivity was demonstrated using different Vi-negative S. enterica derivatives. The phage particles were shown to have an icosahedral head and a long noncontractile tail structure. The genome is 45,362 bp in length with defined capsid and tail regions that exhibit significant homology to the S. enterica transducing phage ES18. Mass spectrometry was used to confirm the presence of a number of hypothetical proteins in the Vi phage E1 particle and demonstrate that a number of phage proteins are modified posttranslationally. The genome of the Vi phage E1 is significantly related to other bacteriophages belonging to the same serovar Typhi phage-typing set, and we demonstrate a role for phage DNA modification in determining host specificity.

    Funded by: Wellcome Trust

    Journal of bacteriology 2008;190;7;2580-7

  • Cytogenetic studies and karyotype nomenclature of three wild canid species: maned wolf (Chrysocyon brachyurus), bat-eared fox (Otocyon megalotis) and fennec fox (Fennecus zerda).

    Pieńkowska-Schelling A, Schelling C, Zawada M, Yang F, Bugno M and Ferguson-Smith M

    Department of Animal Sciences, Swiss Federal Institute of Technology, Zurich, Switzerland.

    We have analysed the chromosomes of three wild and endangered canid species: the maned wolf (Chrysocyon brachyurus), the bat-eared fox (Otocyon megalotis) and the fennec fox (Fennecus zerda) using classical and molecular cytogenetic methods. For the first time detailed and encompassing descriptions of the chromosomes are presented including the chromosomal assignment of nucleolar organizer regions and the 5S rRNA gene cluster. We propose a karyotype nomenclature with ideograms including more than 300 bands per haploid set for each of these three species which will form the basis for further research. In addition, we propose four basic different patterns of karyotype organization in the family Canidae. A comparison of these patterns with the most recent molecular phylogeny of Canidae revealed that the karyotype evolution of a species is not always strongly connected with its phylogenetic position. Our findings underline the need and justification for basic cytogenetic work in rare and exotic species.

    Cytogenetic and genome research 2008;121;1;25-34

  • Global transcript profiles of fat in monozygotic twins discordant for BMI: pathways behind acquired obesity.

    Pietiläinen KH, Naukkarinen J, Rissanen A, Saharinen J, Ellonen P, Keränen H, Suomalainen A, Götz A, Suortti T, Yki-Järvinen H, Oresic M, Kaprio J and Peltonen L

    Obesity Research Unit, Department of Psychiatry, Helsinki University Central Hospital, Helsinki, Finland.

    Background: The acquired component of complex traits is difficult to dissect in humans. Obesity represents such a trait, in which the metabolic and molecular consequences emerge from complex interactions of genes and environment. With the substantial morbidity associated with obesity, a deeper understanding of the concurrent metabolic changes is of considerable importance. The goal of this study was to investigate this important acquired component and expose obesity-induced changes in biological pathways in an identical genetic background.

    We used a special study design of "clonal controls," rare monozygotic twins discordant for obesity identified through a national registry of 2,453 young, healthy twin pairs. A total of 14 pairs were studied (eight male, six female; white), with a mean ± standard deviation (SD) age 25.8 ± 1.4 y and a body mass index (BMI) difference 5.2 ± 1.8 kg/m². Sequence analyses of mitochondrial DNA (mtDNA) in subcutaneous fat and peripheral leukocytes revealed no aberrant heteroplasmy between the co-twins. However, mtDNA copy number was reduced by 47% in the obese co-twin's fat. In addition, novel pathway analyses of the adipose tissue transcription profiles exposed significant down-regulation of mitochondrial branched-chain amino acid (BCAA) catabolism (p < 0.0001). In line with this finding, serum levels of insulin secretion-enhancing BCAAs were increased in obese male co-twins (9% increase, p = 0.025). Lending clinical relevance to the findings, in both sexes the observed aberrations in mitochondrial amino acid metabolism pathways in fat correlated closely with liver fat accumulation, insulin resistance, and hyperinsulinemia, early aberrations of acquired obesity in these healthy young adults.

    Conclusions: Our findings emphasize a substantial role of mitochondrial energy- and amino acid metabolism in obesity and development of insulin resistance.

    Funded by: NIAAA NIH HHS: AA-00145, AA-08315, AA-12502

    PLoS medicine 2008;5;3;e51

  • Endoglin expression in blood and endothelium is differentially regulated by modular assembly of the Ets/Gata hemangioblast code.

    Pimanda JE, Chan WY, Wilson NK, Smith AM, Kinston S, Knezevic K, Janes ME, Landry JR, Kolb-Kokocinski A, Frampton J, Tannahill D, Ottersbach K, Follows GA, Lacaud G, Kouskoff V and Göttgens B

    Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, United Kingdom.

    Endoglin is an accessory receptor for TGF-beta signaling and is required for normal hemangioblast, early hematopoietic, and vascular development. We have previously shown that an upstream enhancer, Eng -8, together with the promoter region, mediates robust endothelial expression yet is inactive in blood. To identify hematopoietic regulatory elements, we used array-based methods to determine chromatin accessibility across the entire locus. Subsequent transgenic analysis of candidate elements showed that an endothelial enhancer at Eng +9 when combined with an element at Eng +7 functions as a strong hemato-endothelial enhancer. Chromatin immunoprecipitation (ChIP)-chip analysis demonstrated specific binding of Ets factors to the promoter as well as to the -8, +7+9 enhancers in both blood and endothelial cells. By contrast Pu.1, an Ets factor specific to the blood lineage, and Gata2 binding was only detected in blood. Gata2 was bound only at +7 and GATA motifs were required for hematopoietic activity. This modular assembly of regulators gives blood and endothelial cells the regulatory freedom to independently fine-tune gene expression and emphasizes the role of regulatory divergence in driving functional divergence.

    Funded by: Biotechnology and Biological Sciences Research Council; Cancer Research UK

    Blood 2008;112;12;4512-22

  • BAC TransgeneOmics: a high-throughput method for exploration of protein function in mammals.

    Poser I, Sarov M, Hutchins JR, Hériché JK, Toyoda Y, Pozniakovsky A, Weigl D, Nitzsche A, Hegemann B, Bird AW, Pelletier L, Kittler R, Hua S, Naumann R, Augsburg M, Sykora MM, Hofemeister H, Zhang Y, Nasmyth K, White KP, Dietzel S, Mechtler K, Durbin R, Stewart AF, Peters JM, Buchholz F and Hyman AA

    Max Planck Institute for Molecular Cell Biology and Genetics, Pfotenhauerstrasse 108, D-01307 Dresden, Germany.

    The interpretation of genome sequences requires reliable and standardized methods to assess protein function at high throughput. Here we describe a fast and reliable pipeline to study protein function in mammalian cells based on protein tagging in bacterial artificial chromosomes (BACs). The large size of the BAC transgenes ensures the presence of most, if not all, regulatory elements and results in expression that closely matches that of the endogenous gene. We show that BAC transgenes can be rapidly and reliably generated using 96-well-format recombineering. After stable transfection of these transgenes into human tissue culture cells or mouse embryonic stem cells, the localization, protein-protein and/or protein-DNA interactions of the tagged protein are studied using generic, tag-based assays. The same high-throughput approach will be generally applicable to other model systems.

    Funded by: NHGRI NIH HHS: 1R01HG004428-01; Wellcome Trust: 077192

    Nature methods 2008;5;5;409-15

  • Mosaic complementation demonstrates a regulatory role for myosin VIIa in actin dynamics of stereocilia.

    Prosser HM, Rzadzinska AK, Steel KP and Bradley A

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.

    We have developed a bacterial artificial chromosome transgenesis approach that allowed the expression of myosin VIIa from the mouse X chromosome. We demonstrated the complementation of the Myo7a null mutant phenotype producing a fine mosaic of two types of sensory hair cells within inner ear epithelia of hemizygous transgenic females due to X inactivation. Direct comparisons between neighboring auditory hair cells that were different only with respect to myosin VIIa expression revealed that mutant stereocilia are significantly longer than those of their complemented counterparts. Myosin VIIa-deficient hair cells showed an abnormally persistent tip localization of whirlin, a protein directly linked to elongation of stereocilia, in stereocilia. Furthermore, myosin VIIa localized at the tips of all abnormally short stereocilia of mice deficient for either myosin XVa or whirlin. Our results strongly suggest that myosin VIIa regulates the establishment of a setpoint for stereocilium heights, and this novel role may influence their normal staircase-like arrangement within a bundle.

    Funded by: Medical Research Council: G0300212, MC_QA137918; Wellcome Trust

    Molecular and cellular biology 2008;28;5;1702-12

  • A large genome center's improvements to the Illumina sequencing system.

    Quail MA, Kozarewa I, Smith F, Scally A, Stephens PJ, Durbin R, Swerdlow H and Turner DJ

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.

    The Wellcome Trust Sanger Institute is one of the world's largest genome centers, and a substantial amount of our sequencing is performed with 'next-generation' massively parallel sequencing technologies: in June 2008 the quantity of purity-filtered sequence data generated by our Genome Analyzer (Illumina) platforms reached 1 terabase, and our average weekly Illumina production output is currently 64 gigabases. Here we describe a set of improvements we have made to the standard Illumina protocols to make the library preparation more reliable in a high-throughput environment, to reduce bias, tighten insert size distribution and reliably obtain high yields of data.

    Funded by: Medical Research Council: G0701805; Wellcome Trust: 079643

    Nature methods 2008;5;12;1005-10

  • A software framework for microarray and gene expression object model (MAGE-OM) array design annotation.

    Qureshi M and Ivens A

    Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, UK.

    Background: The MIAME and MAGE-OM standards defined by the MGED society provide a specification and implementation of a software infrastructure to facilitate the submission and sharing of data from microarray studies via public repositories. However, although the MAGE object model is flexible enough to support different annotation strategies, the annotation of array descriptions can be complex.

    Results: We have developed a graphical Java-based application (Adamant) to assist with submission of microarray designs to public repositories. Output of the application is fully compliant with the standards prescribed by the various public data repositories.

    Conclusion: Adamant will allow researchers to annotate and submit their own array designs to public repositories without requiring programming expertise, knowledge of the MAGE-OM or XML. The application has been used to submit a number of ArrayDesigns to the Array Express database.

    Funded by: Wellcome Trust

    BMC genomics 2008;9;133

  • An integrated resource for genome-wide identification and analysis of human tissue-specific differentially methylated regions (tDMRs).

    Rakyan VK, Down TA, Thorne NP, Flicek P, Kulesha E, Gräf S, Tomazou EM, Bäckdahl L, Johnson N, Herberth M, Howe KL, Jackson DK, Miretti MM, Fiegler H, Marioni JC, Birney E, Hubbard TJ, Carter NP, Tavaré S and Beck S

    Institute of Cell and Molecular Science, Barts and the London, London E1 2AT, United Kingdom.

    We report a novel resource (methylation profiles of DNA, or mPod) for human genome-wide tissue-specific DNA methylation profiles. mPod consists of three fully integrated parts, genome-wide DNA methylation reference profiles of 13 normal somatic tissues, placenta, sperm, and an immortalized cell line, a visualization tool that has been integrated with the Ensembl genome browser and a new algorithm for the analysis of immunoprecipitation-based DNA methylation profiles. We demonstrate the utility of our resource by identifying the first comprehensive genome-wide set of tissue-specific differentially methylated regions (tDMRs) that may play a role in cellular identity and the regulation of tissue-specific genome function. We also discuss the implications of our findings with respect to the regulatory potential of regions with varied CpG density, gene expression, transcription factor motifs, gene ontology, and correlation with other epigenetic marks such as histone modifications.

    Funded by: Cancer Research UK: C14303/A4646; Wellcome Trust: 077198

    Genome research 2008;18;9;1518-29

  • The pangenome structure of Escherichia coli: comparative genomic analysis of E. coli commensal and pathogenic isolates.

    Rasko DA, Rosovitz MJ, Myers GS, Mongodin EF, Fricke WF, Gajer P, Crabtree J, Sebaihia M, Thomson NR, Chaudhuri R, Henderson IR, Sperandio V and Ravel J

    Institute for Genome Sciences, Department of Microbiology & Immunology, University of Maryland School of Medicine, 20 Penn Street, Baltimore, MD 21201, USA.

    Whole-genome sequencing has been skewed toward bacterial pathogens as a consequence of the prioritization of medical and veterinary diseases. However, it is becoming clear that in order to accurately measure genetic variation within and between pathogenic groups, multiple isolates, as well as commensal species, must be sequenced. This study examined the pangenomic content of Escherichia coli. Six distinct E. coli pathovars can be distinguished using molecular or phenotypic markers, but only two of the six pathovars have been subjected to any genome sequencing previously. Thus, this report provides a seminal description of the genomic contents and unique features of three unsequenced pathovars, enterotoxigenic E. coli, enteropathogenic E. coli, and enteroaggregative E. coli. We also determined the first genome sequence of a human commensal E. coli isolate, E. coli HS, which will undoubtedly provide a new baseline from which workers can examine the evolution of pathogenic E. coli. Comparison of 17 E. coli genomes, 8 of which are new, resulted in identification of approximately 2,200 genes conserved in all isolates. We were also able to identify genes that were isolate and pathovar specific. Fewer pathovar-specific genes were identified than anticipated, suggesting that each isolate may have independently developed virulence capabilities. Pangenome calculations indicate that E. coli genomic diversity represents an open pangenome model containing a reservoir of more than 13,000 genes, many of which may be uncharacterized but important virulence factors. This comparative study of the species E. coli, while descriptive, should provide the basis for future functional work on this important group of pathogens.
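
    As an illustration of the terms used in this abstract, the following minimal sketch computes a conserved ("core") gene set, a pangenome, and isolate-specific genes from per-isolate gene lists. It assumes genes have already been clustered into shared identifiers; the isolate names and gene labels are hypothetical toy data, and this is not the method used in the paper.

```python
from collections import Counter


def summarise_pangenome(genomes):
    """Return (core, pan, isolate_specific) for a dict of isolate -> gene IDs.

    core: genes present in every isolate
    pan: genes present in at least one isolate
    isolate_specific: isolate -> genes found in that isolate only
    """
    gene_sets = {name: set(genes) for name, genes in genomes.items()}
    if not gene_sets:
        return set(), set(), {}

    core = set.intersection(*gene_sets.values())
    pan = set.union(*gene_sets.values())

    # Count in how many isolates each gene occurs.
    occurrences = Counter(g for genes in gene_sets.values() for g in genes)
    isolate_specific = {
        name: {g for g in genes if occurrences[g] == 1}
        for name, genes in gene_sets.items()
    }
    return core, pan, isolate_specific


# Hypothetical toy input: three isolates with made-up gene labels.
toy_genomes = {
    "isolate_1": ["geneA", "geneB", "geneC"],
    "isolate_2": ["geneA", "geneB", "geneD"],
    "isolate_3": ["geneA", "geneE"],
}
core, pan, specific = summarise_pangenome(toy_genomes)
```

    In this toy input only geneA is shared by all three isolates, so the core set has one member while the pangenome has five; adding further isolates can only shrink the core and grow the pangenome.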

    Funded by: NIAID NIH HHS: N01-AI-30071

    Journal of bacteriology 2008;190;20;6881-93

  • The MEROPS batch BLAST: a tool to detect peptidases and their non-peptidase homologues in a genome.

    Rawlings ND and Morton FR

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.

    Many of the 181 families of peptidases contain homologues that are known to have functions other than peptide bond hydrolysis. Distinguishing an active peptidase from a homologue that is not a peptidase requires specialist knowledge of the important active site residues, because replacement or lack of one of these catalytic residues is an important clue that the homologue in question is unlikely to hydrolyse peptide bonds. Now that the rate at which proteins are characterized is outstripped by the rate that genome sequences are determined, many genes are being incorrectly annotated because only sequence similarity is taken into consideration. We present a tool called the MEROPS batch BLAST which not only performs a comparison against the MEROPS sequence collection, but also does a pair-wise alignment with the closest homologue detected and calculates the position of the active site residues. A non-peptidase homologue can be distinguished by the absence or unacceptable replacement of any of these residues. An analysis of peptidase homologues in the genome of the bacterium Erythrobacter litoralis is presented as an example.
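
    The active-site check described above can be sketched roughly as follows. This is a minimal illustration, not the MEROPS implementation: it assumes a pairwise alignment is already available as two equal-length gapped strings, and the function names, toy sequences, and acceptable-residue sets are all hypothetical.

```python
def residues_at_active_sites(ref_aln, query_aln, active_sites):
    """Map active-site positions of the reference onto the aligned query.

    ref_aln, query_aln: equal-length aligned sequences, '-' marks a gap.
    active_sites: dict of 1-based ungapped reference position -> set of
                  residues regarded as acceptable at that position.
    Returns a dict of reference position -> residue (or '-') in the query.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")

    observed = {}
    ref_pos = 0
    for ref_res, query_res in zip(ref_aln, query_aln):
        if ref_res == "-":
            continue  # insertion in the query; no reference position consumed
        ref_pos += 1
        if ref_pos in active_sites:
            observed[ref_pos] = query_res
    return observed


def looks_like_active_peptidase(ref_aln, query_aln, active_sites):
    """True only if every active-site position carries an acceptable residue."""
    observed = residues_at_active_sites(ref_aln, query_aln, active_sites)
    return all(observed.get(pos) in allowed for pos, allowed in active_sites.items())


# Hypothetical toy alignment with a catalytic triad at reference positions 3, 4, 5.
toy_sites = {3: {"H"}, 4: {"D"}, 5: {"S"}}
intact = looks_like_active_peptidase("MKH-DSA", "MKHGDSA", toy_sites)
replaced = looks_like_active_peptidase("MKH-DSA", "MKHGNSA", toy_sites)
missing = looks_like_active_peptidase("MKHDSA", "MKHD-A", toy_sites)
```

    Here the first query retains all three residues, the second carries an unacceptable replacement (N for D), and the third lacks the serine entirely, so only the first would be kept as a likely active peptidase.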

    Funded by: Wellcome Trust

    Biochimie 2008;90;2;243-59

  • MEROPS: the peptidase database.

    Rawlings ND, Morton FR, Kok CY, Kong J and Barrett AJ

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.

    Peptidases (proteolytic enzymes or proteases), their substrates and inhibitors are of great relevance to biology, medicine and biotechnology. The MEROPS database aims to fulfil the need for an integrated source of information about these. The organizational principle of the database is a hierarchical classification in which homologous sets of peptidases and protein inhibitors are grouped into protein species, which are grouped into families and in turn grouped into clans. Important additions to the database include newly written, concise text annotations for peptidase clans and the small molecule inhibitors that are outside the scope of the standard classification; displays to show peptidase specificity compiled from our collection of known substrate cleavages; tables of peptidase-inhibitor interactions; and dynamically generated alignments of representatives of each protein species at the family level. New ways to compare peptidase and inhibitor complements between any two organisms whose genomes have been completely sequenced, or between different strains or subspecies of the same organism, have been devised.

    Funded by: Wellcome Trust

    Nucleic acids research 2008;36;Database issue;D320-5

  • A logic-based diagram of signalling pathways central to macrophage activation.

    Raza S, Robertson KA, Lacaze PA, Page D, Enright AJ, Ghazal P and Freeman TC

    Division of Pathway Medicine, University of Edinburgh, The Chancellor's Building, College of Medicine, 49 Little France Crescent, Edinburgh, EH16 4SB, UK.

    Background: The complex yet flexible cellular response to pathogens is orchestrated by the interaction of multiple signalling and metabolic pathways. The molecular regulation of this response has been studied in great detail but comprehensive and unambiguous diagrams describing these events are generally unavailable. Four key signalling cascades triggered early-on in the innate immune response are the toll-like receptor, interferon, NF-kappaB and apoptotic pathways, which co-operate to defend cells against a given pathogen. However, these pathways are commonly viewed as separate entities rather than an integrated network of molecular interactions.

    Results: Here we describe the construction of a logically represented pathway diagram which attempts to integrate these four pathways central to innate immunity using a modified version of the Edinburgh Pathway Notation. The pathway map is available in a number of electronic formats and editing is supported by yEd graph editor software.

    Conclusion: The map presents a powerful visual aid for interpreting the available pathway interaction knowledge and underscores the valuable contribution well constructed pathway diagrams make to communicating large amounts of molecular interaction data. Furthermore, we discuss issues with the limitations and scalability of pathways presented in this fashion, explore options for automated layout of large pathway networks and demonstrate how such maps can aid the interpretation of functional studies.

    Funded by: Biotechnology and Biological Sciences Research Council; Wellcome Trust

    BMC systems biology 2008;2;36

  • The Protein Feature Ontology: a tool for the unification of protein feature annotations.

    Reeves GA, Eilbeck K, Magrane M, O'Donovan C, Montecchi-Palazzi L, Harris MA, Orchard S, Jimenez RC, Prlic A, Hubbard TJ, Hermjakob H and Thornton JM

    EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

    Motivation: The advent of sequencing and structural genomics projects has provided a dramatic boost in the number of uncharacterized protein structures and sequences. Consequently, many computational tools have been developed to help elucidate protein function. However, such services are spread throughout the world, often with standalone web pages. Integration of these methods is needed and so far this has not been possible as there was no common vocabulary available that could be used as a standard language.

    Results: The Protein Feature Ontology has been developed to provide a structured controlled vocabulary for features on a protein sequence or structure and comprises approximately 100 positional terms, now integrated into the Sequence Ontology (SO) and 40 non-positional terms which describe features relating to the whole-protein sequence. In addition, post-translational modifications are described by using a pre-existing ontology, the Protein Modification Ontology (MOD). This ontology is being used to integrate over 150 distinct annotations provided by the BioSapiens Network of Excellence, a consortium comprising 19 partner sites in Europe.

    Availability: The Protein Feature Ontology can be browsed by accessing the ontology lookup service at the European Bioinformatics Institute.

    Funded by: Wellcome Trust: 062023, 077198

    Bioinformatics (Oxford, England) 2008;24;23;2767-72

  • Fission yeast MAP kinase Sty1 is recruited to stress-induced genes.

    Reiter W, Watt S, Dawson K, Lawrence CL, Bähler J, Jones N and Wilkinson CR

    Paterson Institute for Cancer Research, University of Manchester, Wilmslow Road, Manchester, UK.

    The stress-induced expression of many fission yeast genes is dependent upon the Sty1 mitogen-activated protein kinase (MAPK) and Atf1 transcription factor. Atf1 is phosphorylated by Sty1 yet this phosphorylation is not required for stress-induced gene expression, suggesting another mechanism exists whereby Sty1 activates transcription. Here we show that Sty1 associates with Atf1-dependent genes and is recruited to both their promoters and coding regions. This occurs in response to various stress conditions coincident with the kinetics of the activation of Sty1. Association with promoters is not a consequence of increased nuclear accumulation of Sty1 nor does it require the phosphorylation of Atf1. However, recruitment is completely abolished in a mutant lacking Sty1 kinase activity. Both Atf1 and its binding partner Pcr1 are required for association of Sty1 with Atf1-dependent promoters, suggesting that this heterodimer must be intact for optimal recruitment of the MAPK. However, many Atf1-dependent genes are still expressed in a pcr1Delta mutant but with significantly delayed kinetics, thus providing an explanation for the relatively mild stress sensitivity displayed by pcr1Delta. Consistent with this delay, Sty1 and Atf1 cannot be detected at these promoters in this condition, suggesting that their association with chromatin is weak or transient in the absence of Pcr1.

    Funded by: Cancer Research UK: A6517; Wellcome Trust: 077118, 098051

    The Journal of biological chemistry 2008;283;15;9945-56

  • Different pathways to acquiring resistance genes illustrated by the recent evolution of IncW plasmids.

    Revilla C, Garcillán-Barcia MP, Fernández-López R, Thomson NR, Sanders M, Cheung M, Thomas CM and de la Cruz F

    Departamento de Biología Molecular e Instituto de Biomedicina y Biotecnología de Cantabria, Universidad de Cantabria-CSIC-IDICAN, C. Herrera Oria s/n, 39011 Santander, Spain.

    DNA sequence analysis of five IncW plasmids (R388, pSa, R7K, pIE321, and pIE522) demonstrated that they share a considerable portion of their genomes and allowed us to define the IncW backbone. Among these plasmids, the backbone is stable and seems to have diverged recently, since the overall identity among its members is higher than 95%. The only gene in which significant variation was observed was trwA; the changes in the coding sequence correlated with parallel changes in the corresponding TrwA binding sites at oriT, suggesting a functional connection between both sets of changes. The present IncW plasmid diversity is shaped by the acquisition of antibiotic resistance genes as a consequence of the pressure exerted by antibiotic usage. Sequence comparisons pinpointed the insertion events that differentiated the five plasmids analyzed. Of greatest interest is that a single acquisition of a class I integron platform, into which different gene cassettes were later incorporated, gave rise to plasmids R388, pIE522, and pSa, while plasmids R7K and pIE321 do not contain the integron platform and arose in the antibiotic world because of the insertion of several antibiotic resistance transposons.

    Antimicrobial agents and chemotherapy 2008;52;4;1472-80

  • Use and misuse of the gene ontology annotations.

    Rhee SY, Wood V, Dolinski K and Draghici S

    Carnegie Institution for Science, Department of Plant Biology, 260 Panama Street, Stanford, California 94305, USA.

    The Gene Ontology (GO) project is a collaboration among model organism databases to describe gene products from all organisms using a consistent and computable language. GO produces sets of explicitly defined, structured vocabularies that describe biological processes, molecular functions and cellular components of gene products in both a computer- and human-readable manner. Here we describe key aspects of GO, which, when overlooked, can cause erroneous results, and address how these pitfalls can be avoided.

    Nature reviews. Genetics 2008;9;7;509-15

  • Male-pattern baldness susceptibility locus at 20p11.

    Richards JB, Yuan X, Geller F, Waterworth D, Bataille V, Glass D, Song K, Waeber G, Vollenweider P, Aben KK, Kiemeney LA, Walters B, Soranzo N, Thorsteinsdottir U, Kong A, Rafnar T, Deloukas P, Sulem P, Stefansson H, Stefansson K, Spector TD and Mooser V

    Department of Twin Research and Genetic Epidemiology, King's College London, London SE1 7EH, UK.

    We conducted a genome-wide association study for androgenic alopecia in 1,125 men and identified a newly associated locus at chromosome 20p11.22, confirmed in three independent cohorts (n = 1,650; OR = 1.60, P = 1.1 × 10⁻¹⁴ for rs1160312). The one man in seven who harbors risk alleles at both 20p11.22 and AR (encoding the androgen receptor) has a sevenfold-increased odds of androgenic alopecia (OR = 7.12, P = 3.7 × 10⁻¹⁵).

    Funded by: Wellcome Trust: 077011

    Nature genetics 2008;40;11;1282-4

  • WormBase 2007.

    Rogers A, Antoshechkin I, Bieri T, Blasiar D, Bastiani C, Canaran P, Chan J, Chen WJ, Davis P, Fernandes J, Fiedler TJ, Han M, Harris TW, Kishore R, Lee R, McKay S, Müller HM, Nakamura C, Ozersky P, Petcherski A, Schindelman G, Schwarz EM, Spooner W, Tuli MA, Van Auken K, Wang D, Wang X, Williams G, Yook K, Durbin R, Stein LD, Spieth J and Sternberg PW

    Sanger Institute, Wellcome Trust Genome Campus Hinxton, Cambridgeshire CB10 1SA, UK.

    WormBase is the major publicly available database of information about Caenorhabditis elegans, an important system for basic biological and biomedical research. Derived from the initial ACeDB database of C. elegans genetic and sequence information, WormBase now includes the genomic, anatomical and functional information about C. elegans, other Caenorhabditis species and other nematodes. As such, it is a crucial resource not only for C. elegans biologists but also for the larger biomedical and bioinformatics communities. Coverage of core areas of C. elegans biology will allow the biomedical community to make full use of the results of intensive molecular genetic analysis and functional genomic studies of this organism. Improved search and display tools, wider cross-species comparisons and extended ontologies are some of the features that will help scientists extend their research and take advantage of other nematode species genome sequences.

    Funded by: NHGRI NIH HHS: P41-HG02223

    Nucleic acids research 2008;36;Database issue;D612-7

  • Maximum-likelihood estimation of site-specific mutation rates in human mitochondrial DNA from partial phylogenetic classification.

    Rosset S, Wells RS, Soria-Hernanz DF, Tyler-Smith C, Royyuru AK, Behar DM and Genographic Consortium

    Department of Statistics and Operations Research, Tel Aviv University, Tel Aviv, Israel.

    The mitochondrial DNA hypervariable segment I (HVS-I) is widely used in studies of human evolutionary genetics, and therefore accurate estimates of mutation rates among nucleotide sites in this region are essential. We have developed a novel maximum-likelihood methodology for estimating site-specific mutation rates from partial phylogenetic information, such as haplogroup association. The resulting estimation problem is a generalized linear model, with a nonstandard link function. We develop inference and bias correction tools for our estimates and a hypothesis-testing approach for site independence. We demonstrate our methodology using 16,609 HVS-I samples from the Genographic Project. Our results suggest that mutation rates among nucleotide sites in HVS-I are highly variable. The 16,400-16,500 region exhibits significantly lower rates compared to other regions, suggesting potential functional constraints. Several loci identified in the literature as possible termination-associated sequences (TAS) do not yield statistically slower rates than the rest of HVS-I, casting doubt on their functional importance. Our tests do not reject the null hypothesis of independent mutation rates among nucleotide sites, supporting the use of site-independence assumption for analyzing HVS-I. Potential extensions of our methodology include its application to estimation of mutation rates in other genetic regions, like Y chromosome short tandem repeats.

    Funded by: Wellcome Trust

    Genetics 2008;180;3;1511-24

  • TreeFam: 2008 Update.

    Ruan J, Li H, Chen Z, Coghlan A, Coin LJ, Guo Y, Hériché JK, Hu Y, Kristiansen K, Li R, Liu T, Moses A, Qin J, Vang S, Vilella AJ, Ureta-Vidal A, Bolund L, Wang J and Durbin R

    Beijing Institute of Genomics of the Chinese Academy of Sciences, Beijing Genomics Institute, Beijing 101300, China.

    TreeFam was developed to provide curated phylogenetic trees for all animal gene families, as well as orthologue and paralogue assignments. Release 4.0 of TreeFam contains curated trees for 1314 families and automatically generated trees for another 14,351 families. We have expanded TreeFam to include 25 fully sequenced animal genomes, as well as four genomes from plant and fungal outgroup species. We have also introduced more accurate approaches for automatically grouping genes into families, for building phylogenetic trees, and for inferring orthologues and paralogues. The user interface for viewing phylogenetic trees and family information has been improved. Furthermore, a new Perl API lets users easily extract data from the TreeFam MySQL database.

    Funded by: Wellcome Trust

    Nucleic acids research 2008;36;Database issue;D735-40

  • Evolution of NMDA receptor cytoplasmic interaction domains: implications for organisation of synaptic signalling complexes.

    Ryan TJ, Emes RD, Grant SG and Komiyama NH

    Genes to Cognition Program, Wellcome Trust Sanger Institute, Cambridge, UK.

    Background: Glutamate gated postsynaptic receptors in the central nervous system (CNS) are essential for environmentally stimulated behaviours including learning and memory in both invertebrates and vertebrates. Though their genetics, biochemistry, physiology, and role in behaviour have been intensely studied in vitro and in vivo, their molecular evolution and structural aspects remain poorly understood. To understand how these receptors have evolved different physiological requirements we have investigated the molecular evolution of glutamate gated receptors and ion channels, in particular the N-methyl-D-aspartate (NMDA) receptor, which is essential for higher cognitive function. Studies of rodent NMDA receptors show that the C-terminal intracellular domain forms a signalling complex with enzymes and scaffold proteins, which is important for neuronal and behavioural plasticity.

    Results: The vertebrate NMDA receptor was found to have subunits with C-terminal domains up to 500 amino acids longer than invertebrates. This extension was specific to the NR2 subunit and occurred before the duplication and subsequent divergence of NR2 in the vertebrate lineage. The shorter invertebrate C-terminus lacked vertebrate protein interaction motifs involved with forming a signaling complex although the terminal PDZ interaction domain was conserved. The vertebrate NR2 C-terminal domain was predicted to be intrinsically disordered but with a conserved secondary structure.

    Conclusion: We highlight an evolutionary adaptation specific to vertebrate NMDA receptor NR2 subunits. Using in silico methods we find that evolution has shaped the NMDA receptor C-terminus into an unstructured but modular intracellular domain that parallels the expansion in complexity of an NMDA receptor signalling complex in the vertebrate lineage. We propose the NR2 C-terminus has evolved to be a natively unstructured yet flexible hub organising postsynaptic signalling. The evolution of the NR2 C-terminus and its associated signalling complex may contribute to species differences in behaviour and in particular cognitive function.

    Funded by: Wellcome Trust

    BMC neuroscience 2008;9;6

  • Annotation of mammalian primary microRNAs.

    Saini HK, Enright AJ and Griffiths-Jones S

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Background: MicroRNAs (miRNAs) are important regulators of gene expression and have been implicated in development, differentiation and pathogenesis. Hundreds of miRNAs have been discovered in mammalian genomes. Approximately 50% of mammalian miRNAs are expressed from introns of protein-coding genes; the primary transcript (pri-miRNA) is therefore assumed to be the host transcript. However, very little is known about the structure of pri-miRNAs expressed from intergenic regions. Here we annotate transcript boundaries of miRNAs in human, mouse and rat genomes using various transcription features. The 5' end of the pri-miRNA is predicted from transcription start sites, CpG islands and 5' CAGE tags mapped in the upstream flanking region surrounding the precursor miRNA (pre-miRNA). The 3' end of the pri-miRNA is predicted based on the mapping of polyA signals, and supported by cDNA/EST and ditags data. The predicted pri-miRNAs are also analyzed for promoter and insulator-associated regulatory regions.

    Results: We define sets of conserved and non-conserved human, mouse and rat pre-miRNAs using bidirectional BLAST and synteny analysis. Transcription features in their flanking regions are used to demarcate the 5' and 3' boundaries of the pri-miRNAs. The lengths and boundaries of primary transcripts are highly conserved between orthologous miRNAs. A significant fraction of pri-miRNAs have lengths between 1 and 10 kb, with very few introns. We annotate a total of 59 pri-miRNA structures, which include 82 pre-miRNAs. 36 pri-miRNAs are conserved in all 3 species. In total, 18 of the confidently annotated transcripts express more than one pre-miRNA. The upstream regions of 54% of the predicted pri-miRNAs are found to be associated with promoter and insulator regulatory sequences.
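    The bidirectional BLAST step mentioned above is commonly implemented as a reciprocal-best-hit filter: a pair is kept only when each sequence is the other's top-scoring hit. The following is a minimal sketch of that idea, not the authors' pipeline. The score dictionaries, the identifiers and the function names are hypothetical, and the synteny check described in the abstract is not modelled.

```python
def best_hits(scores):
    """Map each query to its highest-scoring target.

    scores: dict of (query, target) -> alignment score (hypothetical input
    format; real BLAST output would need to be parsed into this shape).
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical scores between species A and species B precursor sequences.
a_vs_b = {("A1", "B1"): 90, ("A1", "B2"): 40, ("A2", "B2"): 70, ("A3", "B2"): 60}
b_vs_a = {("B1", "A1"): 88, ("B2", "A2"): 71, ("B2", "A3"): 55}
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

    In this toy input, A3's best hit is B2, but B2's best hit is A2, so A3 is left without a conserved partner.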

    Conclusion: Little is known about the primary transcripts of intergenic miRNAs. Using comparative data, we are able to identify the boundaries of a significant proportion of human, mouse and rat pri-miRNAs. We confidently predict the transcripts including a total of 77, 58 and 47 human, mouse and rat pre-miRNAs respectively. Our computational annotations provide a basis for subsequent experimental validation of predicted pri-miRNAs.

    Funded by: Wellcome Trust

    BMC genomics 2008;9;564

  • Pfam 10 years on: 10,000 families and still growing.

    Sammut SJ, Finn RD and Bateman A

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton Hall, Hinxton, Cambridgeshire, CB10 1SA, UK.

    Classifications of proteins into groups of related sequences are in some respects like a periodic table for biology, allowing us to understand the underlying molecular biology of any organism. Pfam is a large collection of protein domains and families. Its scientific goal is to provide a complete and accurate classification of protein families and domains. The next release of the database will contain over 10,000 entries, which leads us to reflect on how far we are from completing this work. Currently Pfam matches 72% of known protein sequences, but for proteins with known structure Pfam matches 95%, which we believe represents the likely upper bound. Based on our analysis a further 28,000 families would be required to achieve this level of coverage for the current sequence database. We also show that as more sequences are added to the sequence databases the fraction of sequences that Pfam matches is reduced, suggesting that continued addition of new families is essential to maintain its relevance.

    Funded by: Wellcome Trust: 087656

    Briefings in bioinformatics 2008;9;3;210-9

  • Mendelian randomisation studies of type 2 diabetes: future prospects.

    Sandhu MS, Debenham SL, Barroso I and Loos RJ

    MRC Epidemiology Unit, Strangeways Research Laboratory, Cambridge, UK.

    Funded by: Medical Research Council: MC_U106179471, MC_U106188470; Wellcome Trust: 077016

    Diabetologia 2008;51;2;211-3

  • LDL-cholesterol concentrations: a genome-wide association study.

    Sandhu MS, Waterworth DM, Debenham SL, Wheeler E, Papadakis K, Zhao JH, Song K, Yuan X, Johnson T, Ashford S, Inouye M, Luben R, Sims M, Hadley D, McArdle W, Barter P, Kesäniemi YA, Mahley RW, McPherson R, Grundy SM, Wellcome Trust Case Control Consortium, Bingham SA, Khaw KT, Loos RJ, Waeber G, Barroso I, Strachan DP, Deloukas P, Vollenweider P, Wareham NJ and Mooser V

    Department of Public Health and Primary Care, Strangeways Research Laboratory, University of Cambridge, Cambridge, UK.

    Background: LDL cholesterol has a causal role in the development of cardiovascular disease. Improved understanding of the biological mechanisms that underlie the metabolism and regulation of LDL cholesterol might help to identify novel therapeutic targets. We therefore did a genome-wide association study of LDL-cholesterol concentrations.

    Methods: We used genome-wide association data from up to 11,685 participants with measures of circulating LDL-cholesterol concentrations across five studies, including data for 293,461 autosomal single nucleotide polymorphisms (SNPs) with a minor allele frequency of 5% or more that passed our quality control criteria. We also used data from a second genome-wide array in up to 4,337 participants from three of these five studies, with data for 290,140 SNPs. We did replication studies in two independent populations consisting of up to 4,979 participants. Statistical approaches, including meta-analysis and linkage disequilibrium plots, were used to refine association signals; we analysed pooled data from all seven populations to determine the effect of each SNP on variations in circulating LDL-cholesterol concentrations.

    Findings: In our initial scan, we found two SNPs (rs599839 [p=1.7×10⁻¹⁵] and rs4970834 [p=3.0×10⁻¹¹]) that showed genome-wide statistical association with LDL cholesterol at chromosomal locus 1p13.3. The second genome screen found a third statistically associated SNP at the same locus (rs646776 [p=4.3×10⁻⁹]). Meta-analysis of data from all studies showed an association of SNPs rs599839 (combined p=1.2×10⁻³³) and rs646776 (p=4.8×10⁻²⁰) with LDL-cholesterol concentrations. SNPs rs599839 and rs646776 both explained around 1% of the variation in circulating LDL-cholesterol concentrations and were associated with about 15% of an SD change in LDL cholesterol per allele, assuming an SD of 1 mmol/L.
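    The two summary figures in the Findings (a per-allele effect of about 0.15 SD and roughly 1% of variation explained) can be related through the usual additive-model approximation, variance explained ≈ 2p(1 − p)β², where p is the allele frequency and β the per-allele effect in SD units. The sketch below assumes Hardy-Weinberg equilibrium. The allele frequency of 0.22 is a hypothetical illustrative value, not a figure taken from the abstract.

```python
def variance_explained(beta_sd, allele_freq):
    """Fraction of trait variance explained by one biallelic SNP.

    beta_sd: per-allele effect in standard-deviation units.
    allele_freq: frequency of the effect allele (assumed, Hardy-Weinberg).
    """
    genotype_variance = 2 * allele_freq * (1 - allele_freq)
    return genotype_variance * beta_sd ** 2


# 0.15 SD per allele is from the abstract; 0.22 is an assumed frequency.
r2 = variance_explained(0.15, 0.22)
print(f"{r2:.4f}")  # about 0.0077, i.e. a little under 1%
```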

    Interpretation: We found evidence for a novel locus for LDL cholesterol on chromosome 1p13.3. These results potentially provide insight into the biological mechanisms that underlie the regulation of LDL cholesterol and might help in the discovery of novel therapeutic targets for cardiovascular disease.

    Funded by: Medical Research Council: G0000934, G0701863, MC_QA137934, MC_U105630924, MC_U106188470; Wellcome Trust: 068545/Z/02

    Lancet 2008;371;9611;483-91

  • Genetic association analyses of non-synonymous single nucleotide polymorphisms in diabetic nephropathy.

    Savage DA, Patterson CC, Deloukas P, Whittaker P, McKnight AJ, Morrison J, Boulton AJ, Demaine AG, Marshall SM, Millward BA, Thomas SM, Viberti GC, Walker JD, Sadlier D, Maxwell AP and Bain SC

    Nephrology Research Laboratory, Queen's University, Belfast, BT9 7AB, Northern Ireland, UK.

    Diabetic nephropathy, characterised by persistent proteinuria, hypertension and progressive kidney failure, affects a subset of susceptible individuals with diabetes. It is also a leading cause of end-stage renal disease (ESRD). Non-synonymous (ns) single nucleotide polymorphisms (SNPs) have been reported to contribute to genetic susceptibility in both monogenic disorders and common complex diseases. The objective of this study was to investigate whether nsSNPs are involved in susceptibility to diabetic nephropathy using a case-control design.

    Methods: White type 1 diabetic patients with (cases) and without (controls) nephropathy from eight centres in the UK and Ireland were genotyped for a selected subset of nsSNPs using Illumina's GoldenGate BeadArray assay. A χ² test for trend, stratified by centre, was used to assess differences in genotype distribution between cases and controls. Genomic control was used to adjust for possible inflation of test statistics, and the False Discovery Rate method was used to account for multiple testing.
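    The abstract does not say which False Discovery Rate procedure was applied. A common choice is the Benjamini-Hochberg step-up procedure, sketched here as an assumption rather than as the authors' exact method. The function name, the default level q and the example p-values are all hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans, True where the test is rejected at FDR level q.

    Step-up rule: find the largest rank k (1-based, p-values sorted
    ascending) with p_(k) <= k / m * q, then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for ten SNPs.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
flags = benjamini_hochberg(p)
```

    With these ten values only the two smallest p-values survive, which mirrors the pattern reported in the Results: nominally significant SNPs that no longer stand out once multiple testing is accounted for.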

    Results: We assessed 1,111 nsSNPs for association with diabetic nephropathy in 1,711 individuals with type 1 diabetes (894 cases, 817 controls). A number of SNPs demonstrated a significant difference in genotype distribution between groups before but not after correction for multiple testing. Furthermore, neither subgroup analysis (diabetic nephropathy with ESRD or diabetic nephropathy without ESRD) nor stratification by duration of diabetes revealed any significant differences between groups.

    The nsSNPs investigated in this study do not appear to contribute significantly to the development of diabetic nephropathy in patients with type 1 diabetes.

    Funded by: Wellcome Trust

    Diabetologia 2008;51;11;1998-2002

  • New case of interstitial deletion 12(q15-q21.2) in a girl with facial dysmorphism and mental retardation.

    Schluth C, Gesny R, Borck G, Redon R, Abadie V, Kleinfinger P, Munnich A, Lyonnet S and Colleaux L

    Département de Génétique, Hôpital Necker Enfants Malades, Paris, France.

    Interstitial deletions of the long arm of chromosome 12 are rare rearrangements with only 15 cases reported in the literature. The phenotype may include facial dysmorphism, developmental delay, ectodermal abnormalities, cardiac and renal malformations depending on breakpoints' position. Here, we describe a third case of 12(q15-q21.2) deletion ascertained through CGH-array analyses and provide a 5-year follow-up. The patient presented with pre- and postnatal growth retardation, congenital heart defect, developmental delay, and facial dysmorphism changing with age, underlining the importance of long-term follow-up. We compared this new case with previous observations of 12q deletions in order to propose phenotype-karyotype correlations.

    American journal of medical genetics. Part A 2008;146A;1;93-6

  • Repeated replication and a prospective meta-analysis of the association between chromosome 9p21.3 and coronary artery disease.

    Schunkert H, Götz A, Braund P, McGinnis R, Tregouet DA, Mangino M, Linsel-Nitschke P, Cambien F, Hengstenberg C, Stark K, Blankenberg S, Tiret L, Ducimetiere P, Keniry A, Ghori MJ, Schreiber S, El Mokhtari NE, Hall AS, Dixon RJ, Goodall AH, Liptau H, Pollard H, Schwarz DF, Hothorn LA, Wichmann HE, König IR, Fischer M, Meisinger C, Ouwehand W, Deloukas P, Thompson JR, Erdmann J, Ziegler A, Samani NJ and Cardiogenics Consortium

    Medizinische Klinik II, Universität zu Lübeck, Lübeck, Germany.

    Background: Recently, genome-wide association studies identified variants on chromosome 9p21.3 as affecting the risk of coronary artery disease (CAD). We investigated the association of this locus with CAD in 7 case-control studies and undertook a meta-analysis.

    A single-nucleotide polymorphism (SNP), rs1333049, representing the 9p21.3 locus, was genotyped in 7 case-control studies involving a total of 4645 patients with myocardial infarction or CAD and 5177 controls. The mode of inheritance was determined. In addition, in 5 of the 7 studies, we genotyped 3 additional SNPs to assess a risk-associated haplotype (ACAC). Finally, a meta-analysis of the present data and previously published samples was conducted. A limited fine mapping of the locus was performed. The risk allele (C) of the lead SNP, rs1333049, was uniformly associated with CAD in each study (P<0.05). In a pooled analysis, the odds ratio per copy of the risk allele was 1.29 (95% confidence interval, 1.22 to 1.37; P=0.0001). Haplotype analysis further suggested that this effect was not homogeneous across the haplotypic background (test for interaction, P=0.0079). An autosomal-additive mode of inheritance best explained the underlying association. The meta-analysis of the rs1333049 SNP in 12,004 cases and 28,949 controls increased the overall level of evidence for association with CAD to P=6.04×10⁻¹⁰ (odds ratio, 1.24; 95% confidence interval, 1.20 to 1.29). Genotyping of 31 additional SNPs in the region identified several with a highly significant association with CAD, but none had predictive information beyond that of the rs1333049 SNP.
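    To illustrate what a per-copy odds ratio means under an additive model, the sketch below converts the pooled estimate of 1.29 into genotype-level odds ratios. It assumes the effect is additive on the log-odds scale, so each extra copy of the risk allele multiplies the odds by the same factor. That reading of "autosomal-additive" is an assumption, and the function name is hypothetical.

```python
import math


def genotype_odds_ratio(per_allele_or, copies):
    """Odds ratio relative to zero risk alleles, assuming log-additive effects."""
    return math.exp(copies * math.log(per_allele_or))


# 1.29 is the pooled per-copy odds ratio reported in the abstract.
or_one_copy = genotype_odds_ratio(1.29, 1)
or_two_copies = genotype_odds_ratio(1.29, 2)
print(round(or_one_copy, 2), round(or_two_copies, 2))
```

    Under this assumption, carriers of two risk alleles would have roughly 1.66 times the odds of non-carriers (1.29 squared).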

    Conclusions: This broad replication provides unprecedented evidence for association between genetic variants at chromosome 9p21.3 and risk of CAD.

    Funded by: British Heart Foundation

    Circulation 2008;117;13;1675-84

  • Protein interactions in human genetic diseases.

    Schuster-Böckler B and Bateman A

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK.

    We present a novel method that combines protein structure information with protein interaction data to identify residues that form part of an interaction interface. Our prediction method can retrieve interaction hotspots with an accuracy of 60% (at a 20% false positive rate). The method was applied to all mutations in the Online Mendelian Inheritance in Man (OMIM) database, predicting 1,428 mutations to be related to an interaction defect. Combining predicted and hand-curated sets, we discuss how mutations affect protein interactions in general.

    Funded by: Wellcome Trust: 087656

    Genome biology 2008;9;1;R9

  • Opportunity knocks.

    Seth-Smith H and Walker A

    Nature reviews. Microbiology 2008;6;652-3

  • A poultry existence.

    Seth-Smith H

    Nature reviews. Microbiology 2008;6;8

  • Genome-wide transcriptional changes induced by phagocytosis or growth on bacteria in Dictyostelium.

    Sillo A, Bloomfield G, Balest A, Balbo A, Pergolizzi B, Peracino B, Skelton J, Ivens A and Bozzaro S

    Department of Clinical and Biological Sciences, University of Turin, Ospedale S. Luigi, 10043 Orbassano, Torino, Italy.

    Background: Phagocytosis plays a major role in the defense of higher organisms against microbial infection and provides also the basis for antigen processing in the immune response. Cells of the model organism Dictyostelium are professional phagocytes that exploit phagocytosis of bacteria as the preferred way to ingest food, besides killing pathogens. We have investigated Dictyostelium differential gene expression during phagocytosis of non-pathogenic bacteria, using DNA microarrays, in order to identify molecular functions and novel genes involved in phagocytosis.

    Results: The gene expression profiles of cells incubated for a brief time with bacteria were compared with cells either incubated in axenic medium or growing on bacteria. Transcriptional changes during exponential growth in axenic medium or on bacteria were also compared. We recognized 443 and 59 genes that are differentially regulated by phagocytosis or by the different growth conditions (growth on bacteria vs. axenic medium), respectively, and 102 genes regulated by both processes. Roughly one third of the genes are up-regulated compared to macropinocytosis and axenic growth. Functional annotation of differentially regulated genes with different tools revealed that phagocytosis induces profound changes in carbohydrate, amino acid and lipid metabolism, and in cytoskeletal components. Genes regulating translation and mitochondrial biogenesis are mostly up-regulated. Genes involved in sterol biosynthesis are selectively up-regulated, suggesting a shift in membrane lipid composition linked to phagocytosis. Very few changes were detected in genes required for vesicle fission/fusion, indicating that the intracellular traffic machinery is mostly in common between phagocytosis and macropinocytosis. A few putative receptors, including GPCR family 3 proteins, scaffolding and adhesion proteins, components of signal transduction and transcription factors have been identified, which could be part of a signalling complex regulating phagocytosis and adaptational downstream responses.

    Conclusion: The results highlight differences between phagocytosis and macropinocytosis, and provide the basis for targeted functional analysis of new candidate genes and for comparison studies with transcriptomes during infection with pathogenic bacteria.

    BMC genomics 2008;9;291

  • Polymorphisms in the estrogen receptor 1 and vitamin C and matrix metalloproteinase gene families are associated with susceptibility to lymphoma.

    Skibola CF, Bracci PM, Halperin E, Nieters A, Hubbard A, Paynter RA, Skibola DR, Agana L, Becker N, Tressler P, Forrest MS, Sankararaman S, Conde L, Holly EA and Smith MT

    School of Public Health, Division of Environmental Health Sciences, University of California Berkeley, Berkeley, California, USA.

    Background: Non-Hodgkin lymphoma (NHL) is the fifth most common cancer in the U.S. and few causes have been identified. Genetic association studies may help identify environmental risk factors and enhance our understanding of disease mechanisms.

    768 coding and haplotype tagging SNPs in 146 genes were examined using Illumina GoldenGate technology in a large population-based case-control study of NHL in the San Francisco Bay Area (1,292 cases and 1,375 controls are included here). Statistical analyses were restricted to HIV-negative participants of white non-Hispanic origin. Genes involved in steroidogenesis, immune function, cell signaling, sunlight exposure, xenobiotic metabolism/oxidative stress, energy balance, and uptake and metabolism of cholesterol, folate and vitamin C were investigated. Sixteen SNPs in eight pathways and nine haplotypes were associated with NHL after correction for multiple testing at the adjusted q<0.10 level. Eight SNPs were tested in an independent case-control study of lymphoma in Germany (494 NHL cases and 494 matched controls). Novel associations with common variants in estrogen receptor 1 (ESR1) and in the vitamin C receptor and matrix metalloproteinase gene families were observed. Four ESR1 SNPs were associated with follicular lymphoma (FL) in the U.S. study, with rs3020314 remaining associated with reduced risk of FL after multiple testing adjustments [odds ratio (OR) = 0.42, 95% confidence interval (CI) = 0.23-0.77] and replication in the German study (OR = 0.24, 95% CI = 0.06-0.94). Several SNPs and haplotypes in the matrix metalloproteinase-3 (MMP3) and MMP9 genes and in the vitamin C receptor genes, solute carrier family 23 member 1 (SLC23A1) and SLC23A2, showed associations with NHL risk.

    Our findings suggest a role for estrogen, vitamin C and matrix metalloproteinases in the pathogenesis of NHL that will require further validation.

    Funded by: NCI NIH HHS: CA104862, CA122663, CA45614, CA87014, CA89745

    PloS one 2008;3;7;e2816

  • Computational prediction of protein-protein interactions.

    Skrabanek L, Saini HK, Bader GD and Enright AJ

    Department of Physiology and Biophysics and Institute for Computational Biomedicine, Weill Medical College of Cornell University, 1300 York Avenue, New York, NY 10021, USA.

    Recently a number of computational approaches have been developed for the prediction of protein-protein interactions. Complete genome sequencing projects have provided the vast amount of information needed for these analyses. These methods utilize the structural, genomic, and biological context of proteins and genes in complete genomes to predict protein interaction networks and functional linkages between proteins. Given that experimental techniques remain expensive, time-consuming, and labor-intensive, these methods represent an important advance in proteomics. Some of these approaches utilize sequence data alone to predict interactions, while others combine multiple computational and experimental datasets to accurately build protein interaction maps for complete genomes. These methods represent a complementary approach to current high-throughput projects whose aim is to delineate protein interaction maps in complete genomes. We will describe a number of computational protocols for protein interaction prediction based on the structural, genomic, and biological context of proteins in complete genomes, and detail methods for protein interaction network visualization and analysis.

    Molecular biotechnology 2008;38;1;1-17

  • The effect of temperature on Natural Antisense Transcript (NAT) expression in Aspergillus flavus.

    Smith CA, Robertson D, Yates B, Nielsen DM, Brown D, Dean RA and Payne GA

    Department of Entomology and Plant Pathology, Oklahoma State University, Stillwater, OK 74078, USA.

    Naturally occurring Antisense Transcripts (NATs) compose an emerging group of regulatory RNAs. These regulatory elements appear in all organisms examined, but little is known about global expression of NATs in fungi. Analysis of currently available EST sequences suggests that 352 cis NATs are present in Aspergillus flavus. An Affymetrix GeneChip microarray containing probes for these cis NATs, as well as all predicted genes in A. flavus, allowed a whole genome expression analysis of these elements in response to two ecologically important temperatures for the fungus. RNA expression analysis showed that 32 NATs and 2,709 genes were differentially expressed between 37 °C, the optimum temperature for growth, and 28 °C, the conducive temperature for the biosynthesis of aflatoxin (AF) and many other secondary metabolites. These NATs correspond to sense genes with diverse functions including transcription initiation, carbohydrate processing and binding, temperature sensitive morphogenesis, and secondary metabolism. This is the first report of a whole genome transcriptional analysis of NAT expression in a fungus.

    Current genetics 2008;54;5;241-69

  • Understanding how morphogens work.

    Smith JC, Hagemann A, Saka Y and Williams PH

    Wellcome Trust/CR-UK Gurdon Institute, Department of Zoology, University of Cambridge, Tennis Court Road, Cambridge CB2 1QN, UK.

    In this article, we describe the mechanisms by which morphogens in the Xenopus embryo exert their long-range effects. Our results are consistent with the idea that signalling molecules such as activin and the nodal-related proteins traverse responding tissue not by transcytosis or by cytonemes but by movement through the extracellular space. We suggest, however, that additional experiments, involving real-time imaging of morphogens, are required for a real understanding of what influences signalling range and the shape of a morphogen gradient.

    Philosophical transactions of the Royal Society of London. Series B, Biological sciences 2008;363;1495;1387-92

  • Robust, persistent transgene expression in human embryonic stem cells is achieved with AAVS1-targeted integration.

    Smith JR, Maguire S, Davis LA, Alexander M, Yang F, Chandran S, ffrench-Constant C and Pedersen RA

    Department of Surgery, Cambridge Institute for Medical Research, Wellcome Trust/MRC Building, Addenbrooke's Hospital, Hills Road, Cambridge CB2 0XY, United Kingdom.

    Silencing and variegated transgene expression are poorly understood problems that can interfere with gene function studies in human embryonic stem cells (hESCs). We show that transgene expression (enhanced green fluorescent protein [EGFP]) from random integration sites in hESCs is affected by variegation and silencing, with only half of hESCs expressing the transgene, which is gradually lost after withdrawal of selection and differentiation. We tested the hypothesis that a transgene integrated into the adeno-associated virus type 2 (AAV2) target region on chromosome 19, known as the AAVS1 locus, would maintain transgene expression in hESCs. When we used AAV2 technology to target the AAVS1 locus, 4.16% of hESC clones achieved AAVS1-targeted integration. Targeted clones expressed Oct-4, stage-specific embryonic antigen-3 (SSEA3), and Tra-1-60 and differentiated into all three primary germ layers. EGFP expression from the AAVS1 locus showed significantly reduced variegated expression when in selection, with 90% +/- 4% of cells expressing EGFP compared with 57% +/- 32% for randomly integrated controls, and reduced tendency to undergo silencing, with 86% +/- 7% hESCs expressing EGFP 25 days after withdrawal of selection compared with 39% +/- 31% for randomly integrated clones. In addition, quantitative polymerase chain reaction analysis of hESCs also indicated significantly higher levels of EGFP mRNA in AAVS1-targeted clones as compared with randomly integrated clones. Transgene expression from the AAVS1 locus was shown to be stable during hESC differentiation, with more than 90% of cells expressing EGFP after 15 days of differentiation, as compared with approximately 30% for randomly integrated clones. These results demonstrate the utility of transgene integration at the AAVS1 locus in hESCs and its potential clinical application.

    Funded by: Medical Research Council: G0300300, G0300723, G0600275

    Stem cells (Dayton, Ohio) 2008;26;2;496-504

  • Conservation of the H19 noncoding RNA and H19-IGF2 imprinting mechanism in therians.

    Smits G, Mungall AJ, Griffiths-Jones S, Smith P, Beury D, Matthews L, Rogers J, Pask AJ, Shaw G, VandeBerg JL, McCarrey JR, SAVOIR Consortium, Renfree MB, Reik W and Dunham I

    The Babraham Institute, Laboratory of Developmental Genetics and Imprinting, Cambridge CB22 3AT, UK.

    Comparisons between eutherians and marsupials suggest limited conservation of the molecular mechanisms that control genomic imprinting in mammals. We have studied the evolution of the imprinted IGF2-H19 locus in therians. Although marsupial orthologs of protein-coding exons were easily identified, the use of evolutionarily conserved regions and low-stringency Bl2seq comparisons was required to delineate a candidate H19 noncoding RNA sequence. The therian H19 orthologs show miR-675 and exon structure conservation, suggesting functional selection on both features. Transcription start site sequences and poly(A) signals are also conserved. As in eutherians, marsupial H19 is maternally expressed and paternal methylation upstream of the gene originates in the male germline, encompasses a CTCF insulator, and spreads somatically into the H19 gene. The conservation in all therians of the mechanism controlling imprinting of the IGF2-H19 locus suggests a sequential model of imprinting evolution.

    Funded by: Medical Research Council: G0400154; Wellcome Trust

    Nature genetics 2008;40;8;971-6

  • Association of a nonsynonymous variant of DAOA with visuospatial ability in a bipolar family sample.

    Soronen P, Silander K, Antila M, Palo OM, Tuulio-Henriksson A, Kieseppä T, Ellonen P, Wedenoja J, Turunen JA, Pietiläinen OP, Hennah W, Lönnqvist J, Peltonen L, Partonen T and Paunio T

    Department of Molecular Medicine, National Public Health Institute, Helsinki, Finland.

    Background: Bipolar disorder and schizophrenia are hypothesized to share some genetic background.

    Methods: In a two-phase study, we evaluated the effect of five promising candidate genes for psychotic disorders, DAOA, COMT, DTNBP1, NRG1, and AKT1, on bipolar spectrum disorder, psychotic disorder, and related cognitive endophenotypes in a Finnish family-based sample ascertained for bipolar disorder.

    Results: In initial screening of 362 individuals from 63 families, we found only marginal evidence for association with the diagnosis-based dichotomous classification. Those associations did not strengthen when we genotyped the complete sample of 723 individuals from 180 families. We observed a significant association of DAOA variants rs3916966 and rs2391191 with visuospatial ability (Quantitative Transmission Disequilibrium Test [QTDT]; p = 4 × 10⁻⁶ and 5 × 10⁻⁶, respectively) (n = 159) with the two variants in almost complete linkage disequilibrium. The COMT variant rs165599 also associated with visuospatial ability, and in our dataset, we saw an additive effect of DAOA and COMT variants on this neuropsychological trait.

    Conclusions: The ancestral allele (Arg) of the nonsynonymous common DAOA variant rs2391191 (Arg30Lys) was found to predispose to impaired performance. The DAOA gene may play a role in predisposing individuals to a mixed phenotype of psychosis and mania and to impairments in related neuropsychological traits.

    Funded by: Wellcome Trust: 089061

    Biological psychiatry 2008;64;5;438-42

  • The novel mouse mutation Oblivion inactivates the PMCA2 pump and causes progressive hearing loss.

    Spiden SL, Bortolozzi M, Di Leva F, de Angelis MH, Fuchs H, Lim D, Ortolano S, Ingham NJ, Brini M, Carafoli E, Mammano F and Steel KP

    Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridge, UK.

    Progressive hearing loss is common in the human population, but we have few clues to the molecular basis. Mouse mutants with progressive hearing loss offer valuable insights, and ENU (N-ethyl-N-nitrosourea) mutagenesis is a useful way of generating models. We have characterised a new ENU-induced mouse mutant, Oblivion (allele symbol Obl), showing semi-dominant inheritance of hearing impairment. Obl/+ mutants showed increasing hearing impairment from post-natal day (P)20 to P90, and loss of auditory function was followed by a corresponding base to apex progression of hair cell degeneration. Obl/Obl mutants were small, showed severe vestibular dysfunction by 2 weeks of age, and were completely deaf from birth; sensory hair cells were completely degenerate in the basal turn of the cochlea, although hair cells appeared normal in the apex. We mapped the mutation to Chromosome 6. Mutation analysis of Atp2b2 showed a missense mutation (2630C>T) in exon 15, causing a serine to phenylalanine substitution (S877F) in transmembrane domain 6 of the PMCA2 pump, the resident Ca²⁺ pump of hair cell stereocilia. Transmembrane domain mutations in these pumps generally are believed to be incompatible with normal targeting of the protein to the plasma membrane. However, analyses of hair cells in cultured utricular maculae of Obl/Obl mice and of the mutant Obl pump in model cells showed that the protein was correctly targeted to the plasma membrane. Biochemical and biophysical characterisation showed that the pump had lost a significant portion of its non-stimulated Ca²⁺ exporting ability. These findings can explain the progressive loss of auditory function, and indicate the limits in our ability to predict mechanism from sequence alone.

    Funded by: Medical Research Council: G0300212, MC_QA137918; Telethon: GGP04169; Wellcome Trust

    PLoS genetics 2008;4;10;e1000238

  • Large recurrent microdeletions associated with schizophrenia.

    Stefansson H, Rujescu D, Cichon S, Pietiläinen OP, Ingason A, Steinberg S, Fossdal R, Sigurdsson E, Sigmundsson T, Buizer-Voskamp JE, Hansen T, Jakobsen KD, Muglia P, Francks C, Matthews PM, Gylfason A, Halldorsson BV, Gudbjartsson D, Thorgeirsson TE, Sigurdsson A, Jonasdottir A, Jonasdottir A, Bjornsson A, Mattiasdottir S, Blondal T, Haraldsson M, Magnusdottir BB, Giegling I, Möller HJ, Hartmann A, Shianna KV, Ge D, Need AC, Crombie C, Fraser G, Walker N, Lonnqvist J, Suvisaari J, Tuulio-Henriksson A, Paunio T, Toulopoulou T, Bramon E, Di Forti M, Murray R, Ruggeri M, Vassos E, Tosato S, Walshe M, Li T, Vasilescu C, Mühleisen TW, Wang AG, Ullum H, Djurovic S, Melle I, Olesen J, Kiemeney LA, Franke B, GROUP, Sabatti C, Freimer NB, Gulcher JR, Thorsteinsdottir U, Kong A, Andreassen OA, Ophoff RA, Georgi A, Rietschel M, Werge T, Petursson H, Goldstein DB, Nöthen MM, Peltonen L, Collier DA, St Clair D and Stefansson K

    CNS Division, deCODE genetics, Sturlugata 8, IS-101 Reykjavík, Iceland.

    Reduced fecundity, associated with severe mental disorders, places negative selection pressure on risk alleles and may explain, in part, why common variants have not been found that confer risk of disorders such as autism, schizophrenia and mental retardation. Thus, rare variants may account for a larger fraction of the overall genetic risk than previously assumed. In contrast to rare single nucleotide mutations, rare copy number variations (CNVs) can be detected using genome-wide single nucleotide polymorphism arrays. This has led to the identification of CNVs associated with mental retardation and autism. In a genome-wide search for CNVs associating with schizophrenia, we used a population-based sample to identify de novo CNVs by analysing 9,878 transmissions from parents to offspring. The 66 de novo CNVs identified were tested for association in a sample of 1,433 schizophrenia cases and 33,250 controls. Three deletions at 1q21.1, 15q11.2 and 15q13.3 showing nominal association with schizophrenia in the first sample (phase I) were followed up in a second sample of 3,285 cases and 7,951 controls (phase II). All three deletions significantly associate with schizophrenia and related psychoses in the combined sample. The identification of these rare, recurrent risk variants, having occurred independently in multiple founders and being subject to negative selection, is important in itself. CNV analysis may also point the way to the identification of additional and more prevalent risk variants in genes and pathways involved in schizophrenia.

    Funded by: Department of Health: PDA/02/06/016; Medical Research Council: G0901310; NIMH NIH HHS: R01 MH078075, R01MH71425-01A1; Wellcome Trust: 089061

    Nature 2008;455;7210;232-6

  • Accounting for non-genetic factors improves the power of eQTL studies

    Stegle O, Kannan A, Durbin R and Winn J

    Lecture notes in computer science 2008;4955;411-22

  • Insights from the complete genome sequence of Mycobacterium marinum on the evolution of Mycobacterium tuberculosis.

    Stinear TP, Seemann T, Harrison PF, Jenkin GA, Davies JK, Johnson PD, Abdellah Z, Arrowsmith C, Chillingworth T, Churcher C, Clarke K, Cronin A, Davis P, Goodhead I, Holroyd N, Jagels K, Lord A, Moule S, Mungall K, Norbertczak H, Quail MA, Rabbinowitsch E, Walker D, White B, Whitehead S, Small PL, Brosch R, Ramakrishnan L, Fischbach MA, Parkhill J and Cole ST

    Department of Microbiology, Monash University, Clayton 3800, Australia.

    Mycobacterium marinum, a ubiquitous pathogen of fish and amphibia, is a near relative of Mycobacterium tuberculosis, the etiologic agent of tuberculosis in humans. The genome of the M strain of M. marinum comprises a 6,636,827-bp circular chromosome with 5424 CDS, 10 prophages, and a 23-kb mercury-resistance plasmid. Prominent features are the very large number of genes (57) encoding polyketide synthases (PKSs) and nonribosomal peptide synthases (NRPSs) and the most extensive repertoire yet reported of the mycobacteria-restricted PE and PPE proteins, and related-ESX secretion systems. Some of the NRPS genes comprise a novel family and seem to have been acquired horizontally. M. marinum is used widely as a model organism to study M. tuberculosis pathogenesis, and genome comparisons confirmed the close genetic relationship between these two species, as they share 3000 orthologs with an average amino acid identity of 85%. Comparisons with the more distantly related Mycobacterium avium subspecies paratuberculosis and Mycobacterium smegmatis reveal how an ancestral generalist mycobacterium evolved into M. tuberculosis and M. marinum. M. tuberculosis has undergone genome downsizing and extensive lateral gene transfer to become a specialized pathogen of humans and other primates without retaining an environmental niche. M. marinum has maintained a large genome so as to retain the capacity for environmental survival while becoming a broad host range pathogen that produces disease strikingly similar to M. tuberculosis. The work described herein provides a foundation for using M. marinum to better understand the determinants of pathogenesis of tuberculosis.

    Funded by: NIAID NIH HHS: R01 AI036396; Wellcome Trust

    Genome research 2008;18;5;729-41

  • Distal transgene insertion affects CpG island maintenance during differentiation.

    Strathdee D, Whitelaw CB and Clark AJ

    Division of Gene Function and Development, Roslin Institute, Roslin, Midlothian EH25 9PS, United Kingdom.

    About half of all genes have a CpG island surrounding the promoter and transcription start site. Most promoter CpG islands are normally unmethylated in all tissues, irrespective of the expression level of the associated gene. Establishment of the appropriate patterns of DNA methylation in the genome is essential for normal development and patterns of gene expression. Aberrant methylation of CpG islands and silencing of the associated genes is frequently observed in cancer. One gene with a 5'-CpG island is cytoplasmic beta-actin, which is an abundantly expressed protein and a major component of microfilaments. Inserting a betageo cassette into the 3'-untranslated region of the beta-actin gene led to widespread but not ubiquitous lacZ expression in mice heterozygous for the modified beta-actin allele. Surprisingly, embryos homozygous for this insertion died at mid-gestation. The modified beta-actin allele was expressed in undifferentiated embryonic stem cells but was turned off as these cells differentiated in vitro and in vivo. We demonstrate that the insertion affects the maintenance of the methylation status of the CpG island of the modified beta-actin allele in differentiated but not in undifferentiated embryonic cells. These data suggest that there is a two-step process to defining a CpG island, requiring both embryonic establishment and a signal that maintains the CpG island in differentiated cells. Furthermore, they indicate that features built into the CpG island are not sufficient to direct CpG island maintenance during differentiation.

    Funded by: Biotechnology and Biological Sciences Research Council: Cad04943

    The Journal of biological chemistry 2008;283;17;11509-15

  • Genome resequencing and genetic variation.

    Stratton M

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    Nature biotechnology 2008;26;1;65-6

  • The emerging landscape of breast cancer susceptibility.

    Stratton MR and Rahman N

    Section of Cancer Genetics, Institute of Cancer Research, 15 Cotswold Road, Sutton, Surrey SM2 5NG, UK.

    The genetic basis of inherited predisposition to breast cancer has been assiduously investigated for the past two decades and has been the subject of several recent discoveries. Three reasonably well-defined classes of breast cancer susceptibility alleles with different levels of risk and prevalence in the population have become apparent: rare high-penetrance alleles, rare moderate-penetrance alleles and common low-penetrance alleles. The contribution of each component to breast cancer predisposition is still to be fully explored, as are the phenotypic characteristics of the cancers associated with them, the ways in which they interact, much of their biology and their clinical utility. These recent advances herald a new chapter in the exploration of susceptibility to breast cancer and are likely to provide insights relevant to other common, heterogeneous diseases.

    Nature genetics 2008;40;1;17-22

  • A DNA transposon-based approach to validate oncogenic mutations in the mouse.

    Su Q, Prosser HM, Campos LS, Ortiz M, Nakamura T, Warren M, Dupuy AJ, Jenkins NA, Copeland NG, Bradley A and Liu P

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, England.

    Large-scale cancer genome projects will soon be able to sequence many cancer genomes to comprehensively identify genetic changes in human cancer. Genome-wide association studies have also identified putative cancer associated loci. Functional validation of these genetic mutations in vivo is becoming a challenge. We describe here a DNA transposon-based platform that permits us to explore the oncogenic potential of genetic mutations in the mouse. Briefly, promoter-less human cancer gene cDNAs were first cloned into Sleeping Beauty (SB) transposons. DNA transposition in the mouse that carried both the transposons and the SB transposase made it possible for the cDNAs to be expressed from an appropriate endogenous genomic locus and in the relevant cell types for tumor development. Consequently, these mice developed a broad spectrum of tumors at very early postnatal stages. This technology thus complements the large-scale cancer genome projects.

    Funded by: Wellcome Trust

    Proceedings of the National Academy of Sciences of the United States of America 2008;105;50;19904-9

  • HapMap coverage for SNPs in the Japanese population.

    Takeuchi F, Serizawa M and Kato N

    Department of Medical Ecology and Informatics, Research Institute, International Medical Center of Japan, Tokyo, Japan.

    The coverage of human genomic variations is known to substantially affect the success of genome-wide association studies. We therefore assessed the SNP coverage in the HapMap database for a total of 1,304 subjects from the Japanese population by combining resequencing and high-density genotyping approaches. First, we resequenced 48 Japanese subjects in 86 genes (572 kb in total), and we then genotyped the subset of tag SNPs and also imputed genotypes for all of the detected SNPs in an additional panel of 1,256 subjects. Subsequently, we genotyped 555,352 tag SNPs selected from the HapMap in 72 Japanese subjects (from the panel of 1,256 subjects) and further imputed genotypes for all SNPs currently included in the HapMap. Of 738 common genic SNPs (1.3 per kb) that we detected by resequencing, 58% had already been genotyped in the HapMap, and 31% were not genotyped but had a proxy SNP in the HapMap with a linkage disequilibrium coefficient r² ≥ 0.8, whereas 11% were not represented in the current HapMap database. Thus, the HapMap coverage appears to be high although not thorough for SNPs in the Japanese population as compared to its coverage reported in Caucasians, and this needs to be considered when we interpret association results.

    Journal of human genetics 2008;53;1;96-9
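
    As an illustration of the coverage categories in the abstract above (58% genotyped plus 31% with a proxy, i.e. 89% of common genic SNPs covered directly or indirectly), the following is a minimal sketch of how a SNP might be classified from its best proxy r². It is not the authors' pipeline: the function names `r_squared` and `coverage_category` are hypothetical, and the sketch assumes phased haplotype and allele frequencies are already available.

```python
def r_squared(p_ab, p_a, p_b):
    """Squared correlation (r^2) between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a, p_b: frequencies of allele A (SNP 1) and allele B (SNP 2)
    """
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom


def coverage_category(genotyped_in_panel, best_proxy_r2, threshold=0.8):
    """Assign a SNP to one of the three categories used in the abstract.

    genotyped_in_panel: True if the SNP itself is in the reference panel
    best_proxy_r2: highest r^2 with any panel SNP, or None if no proxy exists
    threshold: r^2 cut-off for accepting a proxy (0.8 in the abstract)
    """
    if genotyped_in_panel:
        return "genotyped"
    if best_proxy_r2 is not None and best_proxy_r2 >= threshold:
        return "proxy"
    return "not represented"
```

    For example, two SNPs whose alleles always occur together at frequency 0.5 give `r_squared(0.5, 0.5, 0.5)` equal to 1, so the second SNP would count as a proxy for the first.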

  • Identification of CC2D2A as a Meckel syndrome gene adds an important piece to the ciliopathy puzzle.

    Tallila J, Jakkula E, Peltonen L, Salonen R and Kestilä M

    National Public Health Institute, Institute for Molecular Medicine Finland, Helsinki 00290, Finland.

    Meckel syndrome (MKS) is a lethal malformation disorder characterized classically by encephalocele, polycystic kidneys, and polydactyly. MKS is also one of the major contributors to syndromic neural tube defects (NTDs). Recent findings have shown primary cilia dysfunction in the molecular background of MKS, indicating that cilia are critical for early human development. However, even though four genes behind MKS have been identified to date, they elucidate only a minor proportion of the MKS cases. In this study, instead of traditional linkage analysis, we selected 10 nonrelated affected fetuses and looked for the homozygous regions shared by them. Based on this strategy, we identified the sixth locus and the fifth gene, CC2D2A (MKS6), behind MKS. The biological function of CC2D2A is uncharacterized, but the corresponding polypeptide is predicted to be involved in ciliary functions and it has a calcium binding domain (C2). Immunofluorescence staining of patient's fibroblast cells demonstrates that the cells lack cilia, providing evidence for the critical role of CC2D2A in cilia formation. Our finding is very significant not only to understand the molecular background of MKS, but also to obtain additional information about the function of the cilia, which can help to understand their significance in normal development and also in other ciliopathies, which are an increasing group of disorders with overlapping phenotypes.

    Funded by: NIEHS NIH HHS: P01 ES11253-03

    American journal of human genetics 2008;82;6;1361-7

  • Patterns of evolution in the unique tRNA gene arrays of the genus Entamoeba.

    Tawari B, Ali IK, Scott C, Quail MA, Berriman M, Hall N and Clark CG

    Department of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, United Kingdom.

    Genome sequencing of the protistan parasite Entamoeba histolytica HM-1:IMSS revealed that almost all the tRNA genes are organized into tandem arrays that make up over 10% of the genome. The 25 distinct array units contain up to 5 tRNA genes each and some also encode the 5S RNA. Between adjacent genes in array units are complex short tandem repeats (STRs) resembling microsatellites. To investigate the origins and evolution of this unique gene organization, we have undertaken a genome survey to determine the array unit organization in 4 other species of Entamoeba (Entamoeba dispar, Entamoeba moshkovskii, Entamoeba terrapinae, and Entamoeba invadens) and have explored the STR structure in other isolates of E. histolytica. The genome surveys revealed that E. dispar has the same array unit organization as E. histolytica, including the presence and numerical variation of STRs between adjacent genes. However, the individual repeat sequences are completely different to those in E. histolytica. All other species of Entamoeba studied also have tandem arrays of clustered tRNA genes, but the gene composition of the array units often differs from that in E. histolytica/E. dispar. None of the other species' arrays exhibit the complex STRs between adjacent genes although simple tandem duplications are occasionally seen. The degree of similarity in organization reflects the phylogenetic relationships among the species studied. Within individual isolates of E. histolytica most copies of the array unit are uniform in sequence with only minor variation in the number and organization of the STRs. Between isolates, however, substantial differences in STR number and organization can exist although the individual repeat sequences tend to be conserved. The origin of this unique gene organization in the genus Entamoeba clearly predates the common ancestor of the species investigated to date and its function remains unclear.

    Funded by: Biotechnology and Biological Sciences Research Council: G18391; Wellcome Trust: 067314, 085775

    Molecular biology and evolution 2008;25;1;187-98

  • Promoting coherent minimum reporting guidelines for biological and biomedical investigations: the MIBBI project.

    Taylor CF, Field D, Sansone SA, Aerts J, Apweiler R, Ashburner M, Ball CA, Binz PA, Bogue M, Booth T, Brazma A, Brinkman RR, Michael Clark A, Deutsch EW, Fiehn O, Fostel J, Ghazal P, Gibson F, Gray T, Grimes G, Hancock JM, Hardy NW, Hermjakob H, Julian RK, Kane M, Kettner C, Kinsinger C, Kolker E, Kuiper M, Le Novère N, Leebens-Mack J, Lewis SE, Lord P, Mallon AM, Marthandan N, Masuya H, McNally R, Mehrle A, Morrison N, Orchard S, Quackenbush J, Reecy JM, Robertson DG, Rocca-Serra P, Rodriguez H, Rosenfelder H, Santoyo-Lopez J, Scheuermann RH, Schober D, Smith B, Snape J, Stoeckert CJ, Tipton K, Sterk P, Untergasser A, Vandesompele J and Wiemann S

    European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK.

    Funded by: Biotechnology and Biological Sciences Research Council: E025080/1; Medical Research Council: MC_U142684171; NIBIB NIH HHS: EB005034-01, R01 EB005034, R01 EB005034-04

    Nature biotechnology 2008;26;8;889-96

  • Control of feeding behavior in C. elegans by human G protein-coupled receptors permits screening for agonist-expressing bacteria.

    Teng MS, Shadbolt P, Fraser AG, Jansen G and McCafferty J

    Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1HH, United Kingdom.

    G protein-coupled receptors (GPCRs) have a key role in many biological processes and are important drug targets for many human diseases. Therefore, understanding the molecular interactions between GPCRs and their ligands would improve drug design. Here, we describe an approach that allows the rapid identification of functional agonists expressed in bacteria. Transgenic Caenorhabditis elegans expressing the human chemokine receptor 5 (CCR5) in nociceptive neurons show avoidance behavior on encounter with the ligand MIP-1alpha and avoid feeding on Escherichia coli expressing MIP-1alpha compared with control bacteria. This system allows a simple activity screen, based on the distribution of transgenic worms in a binary food-choice assay, without a requirement for protein purification or tagging. By using this approach, a library of 68 MIP-1alpha variants was screened, and 13 critical agonist residues involved in CCR5 activation were identified, four of which (T8, A9, N22, and A25) have not been described previously, to our knowledge. Identified residues were subsequently validated in receptor binding assays and by calcium flux assays in mammalian cells. This approach serves not only for structure/function studies as demonstrated, but may be used to facilitate the discovery of agonists within bacterial libraries.

    Funded by: Wellcome Trust

    Proceedings of the National Academy of Sciences of the United States of America 2008;105;39;14826-31

  • Common statistical issues in genome-wide association studies: a review on power, data quality control, genotype calling and population structure.

    Teo YY

    Wellcome Trust Centre for Human Genetics, University of Oxford, UK.

    Genetic association studies which survey the entire genome have become a common design for uncovering the genetic basis of common diseases, including lipid-related traits. Such studies have identified several novel loci which influence blood lipids. The present review highlights the statistical challenges associated with such large-scale genetic studies and discusses the available methodological strategies for handling these issues.

    The successful analysis of genome-wide data assayed on commercial genotyping arrays depends on careful exploration of the data. Unaccounted sample failures, genotyping errors and population structure can introduce misleading signals that mimic genuine association. Careful interpretation of useful summary statistics and graphical data displays can minimize the extent of false associations that need to be followed up in replication or fine-mapping experiments.

    Summary: Recently published genome-wide studies are beginning to yield valuable insights into the importance of well designed methodological and statistical techniques for sensible interpretation of the plethora of genetic data generated.

    Funded by: Wellcome Trust

    Current opinion in lipidology 2008;19;2;133-43

  • Whole genome-amplified DNA: insights and imputation.

    Teo YY, Inouye M, Small KS, Fry AE, Potter SC, Dunstan SJ, Seielstad M, Barroso I, Wareham NJ, Rockett KA, Kwiatkowski DP and Deloukas P

    Funded by: Medical Research Council: G0600230, G19/9, MC_U106179471; Wellcome Trust: 077011, 077016

    Nature methods 2008;5;4;279-80

  • Perturbation analysis: a simple method for filtering SNPs with erroneous genotyping in genome-wide association studies.

    Teo YY, Small KS, Clark TG and Kwiatkowski DP

    Wellcome Trust Centre for Human Genetics, University of Oxford, United Kingdom.

    We introduce a simple and yet scientifically objective criterion for identifying SNPs with genotyping errors due to poor clustering. This yields a metric for assessing the stability of the assigned genotypes by evaluating the extent of discordance between the calls made with the unperturbed and perturbed intensities. The efficacy of the metric is evaluated by: (1) estimating the extent of over-dispersion of the Hardy-Weinberg equilibrium chi-square test statistics; (2) an interim case-control study, where we investigated the efficacy of the introduced metric and standard quality control filters in reducing the number of SNPs with evidence of phenotypic association which are attributed to genotyping errors; (3) investigating the call and concordance rates of SNPs identified by perturbation analysis which have been genotyped on both Affymetrix and Illumina platforms. Removing SNPs identified by the extent of discordance can reduce the degree of over-dispersion of the HWE test statistic. Sensible use of perturbation analysis in an association study can correctly identify SNPs with problematic genotyping, reducing the number required for visual inspection. SNPs identified by perturbation analysis had lower call and concordance rates, and removal of these SNPs significantly improved the performance for the remaining SNPs.

    Funded by: Medical Research Council; Wellcome Trust: 082370

    Annals of human genetics 2008;72;Pt 3;368-74
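
    The two quantities named in the abstract above, the discordance between genotype calls made from unperturbed and perturbed intensities and the Hardy-Weinberg equilibrium chi-square statistic, can be sketched as follows. This is only an illustrative sketch under stated assumptions: the function names are hypothetical, genotype calls are assumed to be simple labels with `None` for a missing call, and the perturbation and genotype-calling steps themselves are not shown because the abstract does not specify them.

```python
def discordance_rate(calls_unperturbed, calls_perturbed):
    """Fraction of samples whose genotype call differs between two call sets.

    Samples with a missing call (None) in either set are ignored.
    """
    pairs = [
        (a, b)
        for a, b in zip(calls_unperturbed, calls_perturbed)
        if a is not None and b is not None
    ]
    if not pairs:
        return 0.0
    return sum(a != b for a, b in pairs) / len(pairs)


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square statistic for departure from Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb: observed counts of the three genotypes at one SNP.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip genotype classes with zero expectation (monomorphic SNPs).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
```

    A SNP with a high discordance rate, or one contributing to an inflated distribution of `hwe_chi_square` values, would be a candidate for removal or visual inspection; any specific cut-off is left to the analyst.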

  • Maternal footprints of Southeast Asians in North India.

    Thangaraj K, Chaubey G, Kivisild T, Selvi Rani D, Singh VK, Ismail T, Carvalho-Silva D, Metspalu M, Bhaskar LV, Reddy AG, Chandra S, Pande V, Prathap Naidu B, Adarsh N, Verma A, Jyothi IA, Mallick CB, Shrivastava N, Devasena R, Kumari B, Singh AK, Dwivedi SK, Singh S, Rao G, Gupta P, Sonvane V, Kumari K, Basha A, Bhargavi KR, Lalremruata A, Gupta AK, Kaur G, Reddy KK, Rao AP, Villems R, Tyler-Smith C and Singh L

    Centre for Cellular and Molecular Biology, Hyderabad, India.

    We have analyzed 7,137 samples from 125 different caste, tribal and religious groups of India and 99 samples from three populations of Nepal for the length variation in the COII/tRNA(Lys) region of mtDNA. Samples showing length variation were subjected to detailed phylogenetic analysis based on HVS-I and informative coding region sequence variation. The overall frequencies of the 9-bp deletion and insertion variants in South Asia were 1.9 and 0.6%, respectively. We have also defined a novel deep-rooting haplogroup M43 and identified the rare haplogroup H14 in Indian populations carrying the 9-bp deletion by complete mtDNA sequencing. Moreover, we redefined haplogroup M6 and dissected it into two well-defined subclades. The presence of haplogroups F1 and B5a in Uttar Pradesh suggests minor maternal contribution from Southeast Asia to Northern India. The occurrence of haplogroup F1 in the Nepalese sample implies that Nepal might have served as a bridge for the flow of eastern lineages to India. The presence of R6 in the Nepalese, on the other hand, suggests that the gene flow between India and Nepal has been reciprocal.

    Funded by: Wellcome Trust: 077009

    Human heredity 2008;66;1;1-9

  • Characterization of 6q deletions in mature B cell lymphomas and childhood acute lymphoblastic leukemia.

    Thelander EF, Ichimura K, Corcoran M, Barbany G, Nordgren A, Heyman M, Berglund M, Mungall A, Rosenquist R, Collins VP, Grandér D, Larsson C and Lagercrantz S

    Medical Genetics Unit, Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden.

    The study was undertaken with the aim of outlining deletion patterns involving the long arm of chromosome 6, a common abnormality in lymphoproliferative disorders. Using a chromosome 6 specific tile path array, 60 samples from a total of 49 cases with mantle cell lymphoma (MCL), de novo diffuse large B-cell lymphoma (DLBCL), transformed DLBCL as well as preceding follicular lymphoma (FL), and childhood acute lymphoblastic leukemia (ALL), were characterized. Twenty-six of the studied cases, representing all diagnoses, showed a 6q deletion among which 85% involved a 3 Mb region in 6q21. The minimal deleted interval in 6q21 encompasses the FOXO3A, PRDM1 and HACE1 candidate genes. The PRDM1 gene was found homozygously deleted in a case of DLBCL. Moreover, in two DLBCL cases, an overlapping homozygous deletion was identified in 6q23.3-24.1, encompassing the TNFAIP3 gene among others. Taken together, we refined the deletion pattern within the long arm of chromosome 6 in four different types of hematological malignancies, suggesting the location of tumor suppressor genes involved in the tumor progression.

    Leukemia & lymphoma 2008;49;3;477-87

  • Phenotypical characteristics of idiopathic infantile nystagmus with and without mutations in FRMD7.

    Thomas S, Proudlock FA, Sarvananthan N, Roberts EO, Awan M, McLean R, Surendran M, Kumar AS, Farooq SJ, Degg C, Gale RP, Reinecke RD, Woodruff G, Langmann A, Lindner S, Jain S, Tarpey P, Raymond FL and Gottlob I

    Ophthalmology Group, University of Leicester, Leicester, UK.

    Idiopathic infantile nystagmus (IIN) consists of involuntary oscillations of the eyes. The familial form is most commonly X-linked. We recently found mutations in a novel gene FRMD7 (Xq26.2), which provided an opportunity to investigate a genetically defined and homogeneous group of patients with nystagmus. We compared clinical features and eye movement recordings of 90 subjects with mutation in the gene (FRMD7 group) to 48 subjects without mutations but with clinical IIN (non-FRMD7 group). Fifty-eight female obligate carriers of the mutation were also investigated. The median visual acuity (VA) was 0.2 logMAR (Snellen equivalent 6/9) in both groups and most patients had good stereopsis. The prevalence of strabismus was also similar (FRMD7: 7.8%, non-FRMD7: 10%). The presence of anomalous head posture (AHP) was significantly higher in the non-FRMD7 group (P < 0.0001). The amplitude of nystagmus was more strongly dependent on the direction of gaze in the FRMD7 group being lower at primary position (P < 0.0001), compared to non-FRMD7 group (P = 0.83). Pendular nystagmus waveforms were also more frequent in the FRMD7 group (P = 0.003). Fifty-three percent of the obligate female carriers of an FRMD7 mutation were clinically affected. The VA's in affected females were slightly better compared to affected males (P = 0.014). Subnormal optokinetic responses were found in a subgroup of obligate unaffected carriers, which may be interpreted as a sub-clinical manifestation. FRMD7 is a major cause of X-linked IIN. Most clinical and eye movement characteristics were similar in the FRMD7 group and non-FRMD7 group with most patients having good VA and stereopsis and low incidence of strabismus. Fewer patients in the FRMD7 group had AHPs, their amplitude of nystagmus being lower in primary position. Our findings are helpful in the clinical identification of IIN and genetic counselling of nystagmus patients.

    Brain : a journal of neurology 2008;131;Pt 5;1259-67

  • Vive la différence.

    Thomson NR

    Nature reviews. Microbiology 2008;6;7;502-3

  • Comparative genome analysis of Salmonella Enteritidis PT4 and Salmonella Gallinarum 287/91 provides insights into evolutionary and host adaptation pathways.

    Thomson NR, Clayton DJ, Windhorst D, Vernikos G, Davidson S, Churcher C, Quail MA, Stevens M, Jones MA, Watson M, Barron A, Layton A, Pickard D, Kingsley RA, Bignell A, Clark L, Harris B, Ormond D, Abdellah Z, Brooks K, Cherevach I, Chillingworth T, Woodward J, Norberczak H, Lord A, Arrowsmith C, Jagels K, Moule S, Mungall K, Sanders M, Whitehead S, Chabalgoity JA, Maskell D, Humphrey T, Roberts M, Barrow PA, Dougan G and Parkhill J

    The Pathogen Sequencing Unit, The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, United Kingdom.

    We have determined the complete genome sequences of a host-promiscuous Salmonella enterica serovar Enteritidis PT4 isolate P125109 and a chicken-restricted Salmonella enterica serovar Gallinarum isolate 287/91. Genome comparisons between these and other Salmonella isolates indicate that S. Gallinarum 287/91 is a recently evolved descendant of S. Enteritidis. Significantly, the genome of S. Gallinarum has undergone extensive degradation through deletion and pseudogene formation. Comparison of the pseudogenes in S. Gallinarum with those identified previously in other host-adapted bacteria reveals the loss of many common functional traits and provides insights into possible mechanisms of host and tissue adaptation. We propose that experimental analysis in chickens and mice of S. Enteritidis harboring mutations in functional homologs of the pseudogenes present in S. Gallinarum could provide an experimentally tractable route toward unraveling the genetic basis of host adaptation in S. enterica.

    Funded by: Wellcome Trust

    Genome research 2008;18;10;1624-37

  • Chlamydia trachomatis: genome sequence analysis of lymphogranuloma venereum isolates.

    Thomson NR, Holden MT, Carder C, Lennard N, Lockey SJ, Marsh P, Skipp P, O'Connor CD, Goodhead I, Norbertzcak H, Harris B, Ormond D, Rance R, Quail MA, Parkhill J, Stephens RS and Clarke IN

    The Pathogen Sequencing Unit, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    Chlamydia trachomatis is the most common cause of sexually transmitted infections in the UK, a statistic that is also reflected globally. There are three biovariants of C. trachomatis: trachoma (serotypes A-C) and two sexually transmitted pathovars; serotypes D-K and lymphogranuloma venereum (LGV). Trachoma isolates and the sexually transmitted serotypes D-K are noninvasive, whereas the LGV strains are invasive, causing a disseminating infection of the local draining lymph nodes. Genome sequences are available for single isolates from the trachoma (serotype A) and sexually transmitted (serotype D) biotypes. We sequenced two isolates from the remaining biotype, LGV, a long-term laboratory passaged strain and the recent "epidemic" LGV isolate causing proctitis. Although the genome of the LGV strain shows no additional genes that could account for the differences in disease outcome, we found evidence of functional gene loss and identified regions of heightened sequence variation that have previously been shown to be important sites for interstrain recombination. We have used new sequencing technologies to show that the recent clinical LGV isolate causing proctitis is unlikely to be a newly emerged strain but is most probably an old strain with relatively new clinical manifestations.

    Funded by: Wellcome Trust: 080348

    Genome research 2008;18;1;161-71

  • Evolutionary plasticity of genetic interaction networks.

    Tischler J, Lehner B and Fraser AG

    The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, UK.

    Non-additive genetic interactions contribute to many genetic disorders, but they are extremely difficult to predict. Here we show that genetic interactions identified in yeast, unlike gene functions or protein interactions, are not highly conserved in animals. Genetic interactions are therefore unlikely to represent simple redundancy between genes or pathways, and genetic interactions from yeast do not directly predict genetic interactions in higher eukaryotes, including humans.

    Funded by: Wellcome Trust

    Nature genetics 2008;40;4;390-1

  • Generation of a genomic tiling array of the human major histocompatibility complex (MHC) and its application for DNA methylation analysis.

    Tomazou EM, Rakyan VK, Lefebvre G, Andrews R, Ellis P, Jackson DK, Langford C, Francis MD, Bäckdahl L, Miretti M, Coggill P, Ottaviani D, Sheer D, Murrell A and Beck S

    The Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    Background: The major histocompatibility complex (MHC) is essential for human immunity and is highly associated with common diseases, including cancer. While the genetics of the MHC has been studied intensively for many decades, very little is known about the epigenetics of this most polymorphic and disease-associated region of the genome.

    Methods: To facilitate comprehensive epigenetic analyses of this region, we have generated a genomic tiling array of 2 Kb resolution covering the entire 4 Mb MHC region. The array has been designed to be compatible with chromatin immunoprecipitation (ChIP), methylated DNA immunoprecipitation (MeDIP), array comparative genomic hybridization (aCGH) and expression profiling, including of non-coding RNAs. The array comprises 7832 features, consisting of two replicates of both forward and reverse strands of MHC amplicons and appropriate controls.

    Results: Using MeDIP, we demonstrate the application of the MHC array for DNA methylation profiling and the identification of tissue-specific differentially methylated regions (tDMRs). Based on the analysis of two tissues and two cell types, we identified 90 tDMRs within the MHC and describe their characterisation.

    Conclusion: A tiling array covering the MHC region was developed and validated. Its successful application for DNA methylation profiling indicates that this array represents a useful tool for molecular analyses of the MHC in the context of medical genomics.

    Funded by: Cancer Research UK: A8318

    BMC medical genomics 2008;1;19

  • Determination and validation of principal gene products.

    Tress ML, Wesselink JJ, Frankish A, López G, Goldman N, Löytynoja A, Massingham T, Pardi F, Whelan S, Harrow J and Valencia A

    Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre, Madrid, Spain.

    Motivation: Alternative splicing has the potential to generate a wide range of protein isoforms. For many computational applications and for experimental research, it is important to be able to concentrate on the isoform that retains the core biological function. For many genes this is far from clear.

    Results: We have combined five methods into a pipeline that allows us to detect the principal variant for a gene. Most of the methods were based on conservation between species, at the level of both gene and protein. The five methods used were the conservation of exonic structure, the detection of non-neutral evolution, the conservation of functional residues, the existence of a known protein structure and the abundance of vertebrate orthologues. The pipeline was able to determine a principal isoform for 83% of a set of well-annotated genes with multiple variants.

    Funded by: NHGRI NIH HHS: U54 HG004555-01; Wellcome Trust: 077198

    Bioinformatics (Oxford, England) 2008;24;1;11-7

  • Multidirectional cross-species painting illuminates the history of karyotypic evolution in Perissodactyla.

    Trifonov VA, Stanyon R, Nesterenko AI, Fu B, Perelman PL, O'Brien PC, Stone G, Rubtsova NV, Houck ML, Robinson TJ, Ferguson-Smith MA, Dobigny G, Graphodatsky AS and Yang F

    Institute of Cytology and Genetics, Russian Academy of Sciences, Novosibirsk, 630090, Russia.

    The order Perissodactyla, the group of odd-toed ungulates, includes three extant families: Equidae, Tapiridae, and Rhinocerotidae. The extremely rapid karyotypic diversification in perissodactyls has so far prevented the establishment of genome-wide homology maps between these three families by traditional cytogenetic approaches. Here we report the first genome-wide comparative chromosome maps of African rhinoceroses, four tapir species, four equine species, and humans. These maps were established by multidirectional chromosome painting, with paint probes derived from flow-sorted chromosomes of Equus grevyi, Tapirus indicus, and Ceratotherium simum as well as painting probes from horse and human. The Malayan tapir (Tapirus indicus), Baird's tapir (T. bairdii), mountain tapir (T. pinchaque), lowland tapir (T. terrestris), and onager (E. hemionus onager), were studied by cross-species chromosome painting for the first time. Our results, when integrated with previously published comparative chromosome maps of the other perissodactyl species, have enabled the reconstruction of perissodactyl, ceratomorph, and equid ancestral karyotypes, and the identification of the defining evolutionary chromosomal rearrangements along each lineage. Our results allow a more reliable estimate of the mode and tempo of evolutionary chromosomal rearrangements, revealing a striking switch between the slowly evolving ceratomorphs and extremely rapidly evolving equids.

    Funded by: Wellcome Trust

    Chromosome research : an international journal on the molecular, supramolecular and evolutionary aspects of chromosome biology 2008;16;1;89-107

  • Burkholderia pseudomallei genome plasticity associated with genomic island variation.

    Tumapa S, Holden MT, Vesaratchavest M, Wuthiekanun V, Limmathurotsakul D, Chierakul W, Feil EJ, Currie BJ, Day NP, Nierman WC and Peacock SJ

    Mahidol-Oxford Tropical Medicine Research Unit, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand.

    Background: Burkholderia pseudomallei is a soil-dwelling saprophyte and the cause of melioidosis. Horizontal gene transfer contributes to the genetic diversity of this pathogen and may be an important determinant of virulence potential. The genome contains genomic island (GI) regions that encode a broad array of functions. Although there is some evidence for the variable distribution of genomic islands in B. pseudomallei isolates, little is known about the extent of variation between related strains or their association with disease or environmental survival.

    Results: Five islands from B. pseudomallei strain K96243 were chosen as representatives of different types of genomic islands present in this strain, and their presence investigated in other B. pseudomallei. In silico analysis of 10 B. pseudomallei genome sequences provided evidence for the variable presence of these regions, together with micro-evolutionary changes that generate GI diversity. The diversity of GIs in 186 isolates from NE Thailand (83 environmental and 103 clinical isolates) was investigated using multiplex PCR screening. The proportion of all isolates positive by PCR ranged from 12% for a prophage-like island (GI 9), to 76% for a metabolic island (GI 16). The presence of each of the five GIs did not differ between environmental and disease-associated isolates (p > 0.05 for all five islands). The cumulative number of GIs per isolate for the 186 isolates ranged from 0 to 5 (median 2, IQR 1 to 3). The distribution of cumulative GI number did not differ between environmental and disease-associated isolates (p = 0.27). The presence of GIs was defined for the three largest clones in this collection (each defined as a single sequence type, ST, by multilocus sequence typing); these were ST 70 (n = 15 isolates), ST 54 (n = 11), and ST 167 (n = 9). The rapid loss and/or acquisition of gene islands was observed within individual clones. Comparisons were drawn between isolates obtained from the environment and from patients with melioidosis in order to examine the role of genomic islands in virulence and clinical associations. There was no reproducible association between the individual or cumulative presence of five GIs and a range of clinical features in 103 patients with melioidosis.

    Conclusion: Horizontal gene transfer of mobile genetic elements can rapidly alter the gene repertoire of B. pseudomallei. This study confirms the utility of a range of approaches in defining the presence and significance of genomic variation in natural populations of B. pseudomallei.

    Funded by: Wellcome Trust

    BMC genomics 2008;9;190

  • Germline rates of de novo meiotic deletions and duplications causing several genomic disorders.

    Turner DJ, Miretti M, Rajan D, Fiegler H, Carter NP, Blayney ML, Beck S and Hurles ME

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK.

    Meiotic recombination between highly similar duplicated sequences (nonallelic homologous recombination, NAHR) generates deletions, duplications, inversions and translocations, and it is responsible for genetic diseases known as 'genomic disorders', most of which are caused by altered copy number of dosage-sensitive genes. NAHR hot spots have been identified within some duplicated sequences. We have developed sperm-based assays to measure the de novo rate of reciprocal deletions and duplications at four NAHR hot spots. We used these assays to dissect the relative rates of NAHR between different pairs of duplicated sequences. We show that (i) these NAHR hot spots are specific to meiosis, (ii) deletions are generated at a higher rate than their reciprocal duplications in the male germline and (iii) some of these genomic disorders are likely to have been underascertained clinically, most notably that resulting from the duplication of 7q11, the reciprocal of the deletion causing Williams-Beuren syndrome.

    Funded by: Wellcome Trust: 077008, 077014

    Nature genetics 2008;40;1;90-5

  • Long-range, high-throughput haplotype determination via haplotype-fusion PCR and ligation haplotyping.

    Turner DJ, Tyler-Smith C and Hurles ME

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK.

    Ligation Haplotyping is a robust, novel method for experimental determination of haplotypes over long distances, which can be applied to assaying both sequence and structural variation. The simplicity and efficacy of the method for genotyping large chromosomal rearrangements and haplotyping SNPs over long distances make it a valuable and powerful addition to the methodological repertoire, which will be beneficial to studies of population genetics and evolution, disease association and inheritance, and genomic variation. We illustrate the versatility of the method both by genotyping a Yp paracentric inversion, found in approximately 60% of Northwest European males, that strongly influences the germline rate of infertility-causing XY translocations and by haplotyping two autosomal SNPs that lie 16.4 kb apart on chromosome 7, and which influence an individual's susceptibility to systemic lupus erythematosus.

    Funded by: Wellcome Trust

    Nucleic acids research 2008;36;13;e82

  • An evolutionary perspective on Y-chromosomal variation and male infertility.

    Tyler-Smith C

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridgeshire, UK.

    Genetic variation on the Y chromosome is one of the best-documented causes of male infertility, but the genes responsible have still not been identified. This review discusses how an evolutionary perspective may help with interpretation of the data available and suggest novel approaches to identify key genes. Comparison with the chimpanzee Y chromosome indicates that USP9Y is dispensable in apes, but that multiple copies of TSPY1 may have an important role. Comparisons between infertile and control groups in search of genetic susceptibility factors are more complex for the Y chromosome than for the rest of the genome because of population stratification and require unusual levels of confirmation. But the extreme population stratification exhibited by the Y also allows populations particularly suitable for some studies to be identified, such as the partial AZFc deletions common in Northern European populations where further dissection of this complex structural region would be facilitated.

    Funded by: Wellcome Trust

    International journal of andrology 2008;31;4;376-82

  • Loss of Rassf1a cooperates with Apc(Min) to accelerate intestinal tumourigenesis.

    van der Weyden L, Arends MJ, Dovey OM, Harrison HL, Lefebvre G, Conte N, Gergely FV, Bradley A and Adams DJ

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK.

    Promoter methylation of the RAS-association domain family 1, isoform A gene (RASSF1A) is one of the most frequent events found in human tumours. In this study we set out to test the hypothesis that loss of Rassf1a can cooperate with inactivation of the adenomatous polyposis coli (Apc) gene to accelerate intestinal tumourigenesis using the Apc-Min (Apc(Min/+)) mouse model, as mutational or deletional inactivation of APC is a frequent early event in the genesis of intestinal cancer. Further, loss of RASSF1A has also been reported to occur in premalignant adenomas of the bowel. RASSF1A has been implicated in an array of pivotal cellular processes, including regulation of the cell cycle, apoptosis, microtubule stability and most recently in the beta-catenin signalling pathway. By interbreeding isoform specific Rassf1a knockout mice with Apc(+/Min) mice, we showed that loss of Rassf1a results in a significant increase in adenomas of the small intestine and accelerated intestinal tumourigenesis leading to the earlier death of adenocarcinoma-bearing mice and decreased overall survival. Comparative genomic hybridization of adenomas from Rassf1a(-/-); Apc(+/Min) mice revealed no evidence of aneuploidy or gross chromosomal instability (no difference to adenomas from Rassf1a(+/+); Apc(+/Min) mice). Immunohistochemical analysis of adenomas revealed increased nuclear beta-catenin accumulation in adenomas from Rassf1a(-/-); Apc(+/Min) mice, compared to those from Rassf1a(+/+); Apc(+/Min) mice, but no differences in proliferation marker (Ki67) staining patterns. Collectively these data demonstrate cooperation between inactivation of Rassf1a and Apc resulting in accelerated intestinal tumourigenesis, with adenomas showing increased nuclear accumulation of beta-catenin, supporting a mechanistic link via loss of the known interaction of Rassf1 with beta-TrCP that usually mediates degradation of beta-catenin.

    Funded by: Cancer Research UK: A8449; Wellcome Trust: 079643

    Oncogene 2008;27;32;4503-8

  • Graph clustering via a discrete uncoupling process.

    Van Dongen S

    SIAM journal on matrix analysis and applications 2008;30;121-141

  • Resolving the structural features of genomic islands: a machine learning approach.

    Vernikos GS and Parkhill J

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.

    Large inserts of horizontally acquired DNA that contain functionally related genes with limited phylogenetic distribution are often referred to as genomic islands (GIs), and structural definitions of these islands, based on common features, have been proposed. Although a large number of mobile elements fall well within the GI definition, there are several concerns about the structural consensus for GIs: The current GI definition was put forward 10 yr ago when only 12 complete bacterial genomes were available, a large number of GIs deviate from that definition, and in silico predictions assuming a full/partial GI structural model bias the sampling of the GI structural space toward "well-structured" GIs. In this study, the structural features of genomic regions are sampled by a hypothesis-free, bottom-up search, and these are exploited in a machine learning approach with the aim of explicitly quantifying and modeling the contribution of each feature to the GI structure. Performing a whole-genome-based comparative analysis between 37 strains of three different genera and 12 outgroup genomes, 668 genomic regions were sampled and used to train structural GI models. The data show that, overall, GIs from the three different genera fall into distinct, genus-specific structural families. However, decreasing the taxa resolution, by studying GI structures across different genus boundaries, provides models that converge on a fairly similar GI structure, further suggesting that GIs can be seen as a superfamily of mobile elements, with core and variable structural features, rather than a well-defined family.

    Funded by: Wellcome Trust

    Genome research 2008;18;2;331-42

  • High-resolution mapping of expression-QTLs yields insight into human gene regulation.

    Veyrieras JB, Kudaravalli S, Kim SY, Dermitzakis ET, Gilad Y, Stephens M and Pritchard JK

    Department of Human Genetics, The University of Chicago, Chicago, IL, USA.

    Recent studies of the HapMap lymphoblastoid cell lines have identified large numbers of quantitative trait loci for gene expression (eQTLs). Reanalyzing these data using a novel Bayesian hierarchical model, we were able to create a surprisingly high-resolution map of the typical locations of sites that affect mRNA levels in cis. Strikingly, we found a strong enrichment of eQTLs in the 250 bp just upstream of the transcription end site (TES), in addition to an enrichment around the transcription start site (TSS). Most eQTLs lie either within genes or close to genes; for example, we estimate that only 5% of eQTLs lie more than 20 kb upstream of the TSS. After controlling for position effects, SNPs in exons are approximately 2-fold more likely than SNPs in introns to be eQTLs. Our results suggest an important role for mRNA stability in determining steady-state mRNA levels, and highlight the potential of eQTL mapping as a high-resolution tool for studying the determinants of gene regulation.

    Funded by: Howard Hughes Medical Institute; NHGRI NIH HHS: HG002772, HG02585-01; NIGMS NIH HHS: GM077959

    PLoS genetics 2008;4;10;e1000214

  • Habitual energy expenditure modifies the association between NOS3 gene polymorphisms and blood pressure.

    Vimaleswaran KS, Franks PW, Barroso I, Brage S, Ekelund U, Wareham NJ and Loos RJ

    Medical Research Council Epidemiology Unit, Institute of Metabolic Science, Cambridge, UK.

    Background: The endothelial nitric-oxide synthase (NOS3) gene encodes the enzyme (eNOS) that synthesizes the molecule nitric oxide, which facilitates endothelium-dependent vasodilation in response to physical activity. Thus, energy expenditure may modify the association between the genetic variation at NOS3 and blood pressure.

    Methods: To test this hypothesis, we genotyped 11 NOS3 polymorphisms, capturing all common variations, in 726 men and women from the Medical Research Council (MRC) Ely Study (age (mean +/- s.d.): 55 +/- 10 years, body mass index: 26.4 +/- 4.1 kg/m(2)). Habitual/non-resting energy expenditure (NREE) was assessed via individually calibrated heart rate monitoring over 4 days.

    Results: The intronic variant, IVS25+15 [G-->A], was significantly associated with blood pressure; GG homozygotes had significantly lower levels of diastolic blood pressure (DBP) (-2.8 mm Hg; P = 0.016) and systolic blood pressure (SBP) (-1.9 mm Hg; P = 0.018) than A-allele carriers. The interaction between NREE and IVS25+15 was also significant for both DBP (P = 0.006) and SBP (P = 0.026), in such a way that the effect of the GG-genotype on blood pressure was stronger in individuals with higher NREE (DBP: -4.9 mm Hg, P = 0.02; SBP: -3.8 mm Hg, P = 0.03 for the third tertile). Similar results were observed when the outcome was dichotomously defined as hypertension.

    Conclusions: In summary, the NOS3 IVS25+15 is directly associated with blood pressure and hypertension in white Europeans. However, the associations are most evident in the individuals with the highest NREE. These results need further replication and have to be ideally tested in a trial before being informative for targeted disease prevention. Eventually, the selection of individuals for lifestyle intervention programs could be guided by knowledge of genotype.

    Funded by: Medical Research Council: MC_U106179471, MC_U106179473, MC_U106188470, U.1061.00.001 (79471), U.1061.00.005(79473); Wellcome Trust: 077016, 087636

    American journal of hypertension 2008;21;3;297-302

  • The Gly482Ser genotype at the PPARGC1A gene and elevated blood pressure: a meta-analysis involving 13,949 individuals.

    Vimaleswaran KS, Luan J, Andersen G, Muller YL, Wheeler E, Brito EC, O'Rahilly S, Pedersen O, Baier LJ, Knowler WC, Barroso I, Wareham NJ, Loos RJ and Franks PW

    Department of Public Health & Clinical Medicine, Umeå University Hospital, Umeå, Sweden.

    The protein encoded by the PPARGC1A gene is expressed at high levels in metabolically active tissues and is involved in the control of oxidative stress via reactive oxygen species detoxification. Several recent reports suggest that the PPARGC1A Gly482Ser (rs8192678) missense polymorphism may relate inversely with blood pressure. We used conventional meta-analysis methods to assess the association between Gly482Ser and systolic (SBP) or diastolic blood pressures (DBP) or hypertension in 13,949 individuals from 17 studies, of which 6,042 were previously unpublished observations. The studies comprised cohorts of white European, Asian, and American Indian adults, and adolescents from South America. Stratified analyses were conducted to control for population stratification. Pooled genotype frequencies were 0.47 (Gly482Gly), 0.42 (Gly482Ser), and 0.11 (Ser482Ser). We found no evidence of association between Gly482Ser and SBP [Gly482Gly: mean = 131.0 mmHg, 95% confidence interval (CI) = 130.5-131.5 mmHg; Gly482Ser mean = 133.1 mmHg, 95% CI = 132.6-133.6 mmHg; Ser482Ser: mean = 133.5 mmHg, 95% CI = 132.5-134.5 mmHg; P = 0.409] or DBP (Gly482Gly: mean = 80.3 mmHg, 95% CI = 80.0-80.6 mmHg; Gly482Ser mean = 81.5 mmHg, 95% CI = 81.2-81.8 mmHg; Ser482Ser: mean = 82.1 mmHg, 95% CI = 81.5-82.7 mmHg; P = 0.651). Contrary to previous reports, we did not observe significant effect modification by sex (SBP, P = 0.966; DBP, P = 0.715). We were also unable to confirm the previously reported association between the Ser482 allele and hypertension [odds ratio: 0.97, 95% CI = 0.87-1.08, P = 0.585]. These results were materially unchanged when analyses were focused on whites only. However, statistical evidence of gene-age interaction was apparent for DBP [Gly482Gly: 73.5 (72.8, 74.2), Gly482Ser: 77.0 (76.2, 77.8), Ser482Ser: 79.1 (77.4, 80.9), P = 4.20 x 10(-12)] and SBP [Gly482Gly: 121.4 (120.4, 122.5), Gly482Ser: 125.9 (124.6, 127.1), Ser482Ser: 129.2 (126.5, 131.9), P = 7.20 x 10(-12)] in individuals <50 yr (n = 2,511); these genetic effects were absent in those older than 50 yr (n = 5,088) (SBP, P = 0.41; DBP, P = 0.51). Our findings suggest that the PPARGC1A Ser482 allele may be associated with higher blood pressure, but this is only apparent in younger adults.

    Funded by: Medical Research Council: MC_U106179471, MC_U106188470; Wellcome Trust: 077016

    Journal of applied physiology (Bethesda, Md. : 1985) 2008;105;4;1352-8

  • Brain gene expression profiles of Cln1 and Cln5 deficient mice unravels common molecular pathways underlying neuronal degeneration in NCL diseases.

    von Schantz C, Saharinen J, Kopra O, Cooper JD, Gentile M, Hovatta I, Peltonen L and Jalanko A

    National Public Health Institute and FIMM, Institute for Molecular Medicine, Helsinki, Finland.

    Background: The neuronal ceroid lipofuscinoses (NCL) are a group of children's inherited neurodegenerative disorders, characterized by blindness, early dementia and pronounced cortical atrophy. The similar pathological and clinical profiles of the different forms of NCL suggest that common disease mechanisms may be involved. To explore the NCL-associated disease pathology and molecular pathways, we have previously produced targeted knock-out mice for Cln1 and Cln5. Both mouse-models replicate the NCL phenotype and neuropathology; the Cln1-/- model presents with early onset, severe neurodegenerative disease, whereas the Cln5-/- model produces a milder disease with a later onset.

    Results: Here we have performed quantitative gene expression profiling of the cortex from 1 and 4 month old Cln1-/- and Cln5-/- mice. Combined microarray datasets from both mouse models exposed a common affected pathway: genes regulating neuronal growth cone stabilization display similar aberrations in both models. We analyzed locus specific gene expression and showed regional clustering of Cln1 and three major genes of this pathway, further supporting a close functional relationship between the corresponding gene products; adenylate cyclase-associated protein 1 (Cap1), protein tyrosine phosphatase receptor type F (Ptprf) and protein tyrosine phosphatase 4a2 (Ptp4a2). The evidence from the gene expression data, indicating changes in the growth cone assembly, was substantiated by the immunofluorescence staining patterns of Cln1-/- and Cln5-/- cortical neurons. These primary neurons displayed abnormalities in cytoskeleton-associated proteins actin and beta-tubulin as well as abnormal intracellular distribution of growth cone associated proteins GAP-43, synapsin and Rab3.

    Conclusion: Our data provide the first evidence for a common molecular pathogenesis behind neuronal degeneration in INCL and vLINCL. Since CLN1 and CLN5 code for proteins with distinct functional roles these data may have implications for other forms of NCLs as well.

    BMC genomics 2008;9;146

  • Does my genome look big in this?

    Walker A and Langridge G

    Nature reviews. Microbiology 2008;6;12;878-9

  • Single-cell genomics.

    Walker A and Parkhill J

    Nature reviews. Microbiology 2008;6;3;176-7

  • Genome-wide association study identifies genes for biomarkers of cardiovascular disease: serum urate and dyslipidemia.

    Wallace C, Newhouse SJ, Braund P, Zhang F, Tobin M, Falchi M, Ahmadi K, Dobson RJ, Marçano AC, Hajat C, Burton P, Deloukas P, Brown M, Connell JM, Dominiczak A, Lathrop GM, Webster J, Farrall M, Spector T, Samani NJ, Caulfield MJ and Munroe PB

    Clinical Pharmacology and The Genome Centre, Barts and The London, Queen Mary's School of Medicine and Dentistry, London, EC1M 6BQ, UK.

    Many common diseases are accompanied by disturbances in biochemical traits. Identifying the genetic determinants could provide novel insights into disease mechanisms and reveal avenues for developing new therapies. Here, we report a genome-wide association analysis for commonly measured serum and urine biochemical traits. As part of the WTCCC, 500,000 SNPs genome wide were genotyped in 1955 hypertensive individuals characterized for 25 serum and urine biochemical traits. For each trait, we assessed association with individual SNPs, adjusting for age, sex, and BMI. Lipid measurements were further examined in a meta-analysis of genome-wide data from a type 2 diabetes scan. The most promising associations were examined in two epidemiological cohorts. We discovered association between serum urate and SLC2A9, a glucose transporter (p = 2 x 10(-15)) and confirmed this in two independent cohorts, GRAPHIC study (p = 9 x 10(-15)) and TwinsUK (p = 8 x 10(-19)). The odds ratio for hyperuricaemia (defined as urate >0.4 mMol/l) is 1.89 (95% CI = 1.36-2.61) per copy of common allele. We also replicated many genes previously associated with serum lipids and found previously unrecognized association between LDL levels and SNPs close to genes encoding PSRC1 and CELSR2 (p = 1 x 10(-7)). The common allele was associated with a 6% increase in nonfasting serum LDL. This region showed increased association in the meta-analysis (p = 4 x 10(-14)). This finding provides a potential biological mechanism for the recent association of this same allele of the same SNP with increased risk of coronary disease.

    Funded by: Wellcome Trust: 076113/B/04/Z

    American journal of human genetics 2008;82;1;139-49

  • SCANPS: a web server for iterative protein sequence database searching by dynamic programing, with display in a hierarchical SCOP browser.

    Walsh TP, Webber C, Searle S, Sturrock SS and Barton GJ

    College of Life Sciences, University of Dundee, Dundee, UK.

    SCANPS performs iterative profile searching similar to PSI-BLAST but with full dynamic programing on each cycle and on-the-fly estimation of significance. This combination gives good sensitivity and selectivity that outperforms PSI-BLAST in domain-searching benchmarks. Although computationally expensive, SCANPS exploits onchip parallelism (MMX and SSE2 instructions on Intel chips) as well as MPI parallelism to give acceptable turnround times even for large databases. A web server developed to run SCANPS searches is now available. The server interface allows a range of different protein sequence databases to be searched including the SCOP database of protein domains. The server provides the user with regularly updated versions of the main protein sequence databases and is backed up by significant computing resources which ensure that searches are performed rapidly. For SCOP searches, the results may be viewed in a new tree-based representation that reflects the structure of the SCOP hierarchy; this aids the user in placing each hit in the context of its SCOP classification and understanding its relationship to other domains in SCOP.

    Nucleic acids research 2008;36;Web Server issue;W25-9

  • A genome wide RNAi screen by time lapse microscopy in order to identify mitotic genes - Computational aspects and challenges

    Walter, T, Held, M, Neumann, B, Heriche, J. K, Conrad, C, Pepperkok, R, Ellenberg, J.

    2008 5th IEEE International Symposium on Biomedical Imaging. 2008;328-331

  • Characterization of the antimicrobial peptide attacin loci from Glossina morsitans.

    Wang J, Hu C, Wu Y, Stuart A, Amemiya C, Berriman M, Toyoda A, Hattori M and Aksoy S

    Yale University School of Medicine, Department of Epidemiology and Public Health, 60 College Street, New Haven, CT 06510, USA.

    The antimicrobial peptide Attacin is an immune effector molecule that can inhibit the growth of gram-negative bacteria. In Glossina morsitans morsitans, which serves as the sole vector of African trypanosomes, Attacins also play a role in trypanosome resistance, and in maintaining parasite numbers at homeostatic levels in infected individuals. We characterized the attacin encoding loci from a Bacterial Artificial Chromosome (BAC) library. The attacin genes are organized into three clusters. Cluster 1 contains two attacin (attA) genes located in head-to-head orientation, cluster 2 contains two closely related genes (attA and attB) located in a similar transcriptional orientation, and cluster 3 contains a single attacin gene (attD). Coding and transcription regulatory sequences of attA and attB are nearly identical, but differ significantly from attD. Putative AttA and AttB have signal peptide sequences, but lack the pro domain typically present in insect Attacins. Putative AttD lacks both domains. Analysis of attacin cDNA sequences shows polymorphisms that could arise either from allelic variations or from the presence of additional attacin genomic loci. Real time-PCR analysis reveals that attA and attB expression is induced in the fat body of flies per os challenged with Escherichia coli and parasitized with trypanosomes. In the midgut, expression of these attacins is similarly induced following microbial challenge, but reduced in response to parasite infections. Transcription of AttD is significantly less relative to the other two genes, and is preferentially induced in the fat body of parasitized flies. These results indicate that the different attacin genes may be differentially regulated.

    Funded by: NHGRI NIH HHS: HG02526, U01 HG002526-01, U01 HG002526-02, U01 HG002526-03; NIAID NIH HHS: AI51584, R01 AI051584-05, R01 AI068932-01A2; Wellcome Trust: 085775

    Insect molecular biology 2008;17;3;293-302

  • The diploid genome sequence of an Asian individual.

    Wang J, Wang W, Li R, Li Y, Tian G, Goodman L, Fan W, Zhang J, Li J, Zhang J, Guo Y, Feng B, Li H, Lu Y, Fang X, Liang H, Du Z, Li D, Zhao Y, Hu Y, Yang Z, Zheng H, Hellmann I, Inouye M, Pool J, Yi X, Zhao J, Duan J, Zhou Y, Qin J, Ma L, Li G, Yang Z, Zhang G, Yang B, Yu C, Liang F, Li W, Li S, Li D, Ni P, Ruan J, Li Q, Zhu H, Liu D, Lu Z, Li N, Guo G, Zhang J, Ye J, Fang L, Hao Q, Chen Q, Liang Y, Su Y, San A, Ping C, Yang S, Chen F, Li L, Zhou K, Zheng H, Ren Y, Yang L, Gao Y, Yang G, Li Z, Feng X, Kristiansen K, Wong GK, Nielsen R, Durbin R, Bolund L, Zhang X, Li S, Yang H and Wang J

    Beijing Genomics Institute at Shenzhen, Shenzhen 518000, China.

    Here we present the first diploid genome sequence of an Asian individual. The genome was sequenced to 36-fold average coverage using massively parallel sequencing technology. We aligned the short reads onto the NCBI human reference genome to 99.97% coverage, and guided by the reference genome, we used uniquely mapped reads to assemble a high-quality consensus sequence for 92% of the Asian individual's genome. We identified approximately 3 million single-nucleotide polymorphisms (SNPs) inside this region, of which 13.6% were not in the dbSNP database. Genotyping analysis showed that SNP identification had high accuracy and consistency, indicating the high sequence quality of this assembly. We also carried out heterozygote phasing and haplotype prediction against HapMap CHB and JPT haplotypes (Chinese and Japanese, respectively), sequence comparison with the two available individual genomes (J. D. Watson and J. C. Venter), and structural variation identification. These variations were considered for their potential biological impact. Our sequence data and analyses demonstrate the potential usefulness of next-generation sequencing technologies for personal genomics.

    Funded by: NHGRI NIH HHS: R01 HG003229, R01 HG003229-04; Wellcome Trust: 077192

    Nature 2008;456;7218;60-5

  • Global role for polyadenylation-assisted nuclear RNA degradation in posttranscriptional gene silencing.

    Wang SW, Stevenson AL, Kearsey SE, Watt S and Bähler J

    Division of Molecular and Genomic Medicine, National Health Research Institutes, 35 Keyan Road, Zhunan Town, Miaoli County 350, Taiwan.

    Fission yeast Cid14, a component of the TRAMP (Cid14/Trf4-Air1-Mtr4 polyadenylation) complex, polyadenylates nuclear RNA and stimulates degradation by the exosome for RNA quality control. Here, we analyze patterns of global gene expression in cells lacking the Cid14 or the Dis3/Rrp44 subunit of the nuclear exosome. We found that transcripts from many genes induced during meiosis, including key regulators, accumulated in the absence of Cid14 or Dis3. Moreover, our data suggest that additional substrates include transcripts involved in heterochromatin assembly. Mutant cells lacking Cid14 and/or Dis3 accumulate transcripts corresponding to naturally silenced repeat elements within heterochromatic domains, reflecting defects in centromeric gene silencing and derepression of subtelomeric gene expression. We also uncover roles for Cid14 and Dis3 in maintaining the genomic integrity of ribosomal DNA. Our data indicate that polyadenylation-assisted nuclear RNA turnover functions in eliminating a variety of RNA targets to control diverse processes, such as heterochromatic gene silencing, meiotic differentiation, and maintenance of genomic integrity.

    Funded by: Cancer Research UK: A6517; Wellcome Trust: 077118

    Molecular and cellular biology 2008;28;2;656-65

  • Chromosomal transposition of PiggyBac in mouse embryonic stem cells.

    Wang W, Lin C, Lu D, Ning Z, Cox T, Melvin D, Wang X, Bradley A and Liu P

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, United Kingdom.

    Transposon systems are widely used for generating mutations in various model organisms. PiggyBac (PB) has recently been shown to transpose efficiently in the mouse germ line and other mammalian cell lines. To facilitate PB's application in mammalian genetics, we characterized the properties of the PB transposon in mouse embryonic stem (ES) cells. We first measured the transposition efficiencies of PB transposon in mouse embryonic stem cells. We next constructed a PB/SB hybrid transposon to compare PB and Sleeping Beauty (SB) transposon systems and demonstrated that PB transposition was inhibited by DNA methylation. The excision and reintegration rates of a single PB from two independent genomic loci were measured and its ability to mutate genes with gene trap cassettes was tested. We examined PB's integration site distribution in the mouse genome and found that PB transposition exhibited local hopping. The comprehensive information from this study should facilitate further exploration of the potential of PB and SB DNA transposons in mammalian genetics.

    Funded by: NIGMS NIH HHS: 5R21GM079528; Wellcome Trust

    Proceedings of the National Academy of Sciences of the United States of America 2008;105;27;9290-5

  • Genome analysis of the platypus reveals unique signatures of evolution.

    Warren WC, Hillier LW, Marshall Graves JA, Birney E, Ponting CP, Grützner F, Belov K, Miller W, Clarke L, Chinwalla AT, Yang SP, Heger A, Locke DP, Miethke P, Waters PD, Veyrunes F, Fulton L, Fulton B, Graves T, Wallis J, Puente XS, López-Otín C, Ordóñez GR, Eichler EE, Chen L, Cheng Z, Deakin JE, Alsop A, Thompson K, Kirby P, Papenfuss AT, Wakefield MJ, Olender T, Lancet D, Huttley GA, Smit AF, Pask A, Temple-Smith P, Batzer MA, Walker JA, Konkel MK, Harris RS, Whittington CM, Wong ES, Gemmell NJ, Buschiazzo E, Vargas Jentzsch IM, Merkel A, Schmitz J, Zemann A, Churakov G, Kriegs JO, Brosius J, Murchison EP, Sachidanandam R, Smith C, Hannon GJ, Tsend-Ayush E, McMillan D, Attenborough R, Rens W, Ferguson-Smith M, Lefèvre CM, Sharp JA, Nicholas KR, Ray DA, Kube M, Reinhardt R, Pringle TH, Taylor J, Jones RC, Nixon B, Dacheux JL, Niwa H, Sekita Y, Huang X, Stark A, Kheradpour P, Kellis M, Flicek P, Chen Y, Webber C, Hardison R, Nelson J, Hallsworth-Pepin K, Delehaunty K, Markovic C, Minx P, Feng Y, Kremitzki C, Mitreva M, Glasscock J, Wylie T, Wohldmann P, Thiru P, Nhan MN, Pohl CS, Smith SM, Hou S, Nefedov M, de Jong PJ, Renfree MB, Mardis ER and Wilson RK

    Genome Sequencing Center, Washington University School of Medicine, Campus Box 8501, 4444 Forest Park Avenue, St Louis, Missouri 63108, USA.

    We present a draft genome sequence of the platypus, Ornithorhynchus anatinus. This monotreme exhibits a fascinating combination of reptilian and mammalian characters. For example, platypuses have a coat of fur adapted to an aquatic lifestyle; platypus females lactate, yet lay eggs; and males are equipped with venom similar to that of reptiles. Analysis of the first monotreme genome aligned these features with genetic innovations. We find that reptile and platypus venom proteins have been co-opted independently from the same gene families; milk protein genes are conserved despite platypuses laying eggs; and immune gene family expansions are directly related to platypus biology. Expansions of protein, non-protein-coding RNA and microRNA families, as well as repeat elements, are identified. Sequencing of this genome now provides a valuable resource for deep mammalian comparative analyses, as well as for monotreme biology and conservation.

    Funded by: Medical Research Council: MC_U137761446; NCI NIH HHS: P01 CA013106-37; NHGRI NIH HHS: HG002238, R01 HG004037-02, R01HG02385, U54 HG003079; NIGMS NIH HHS: R01 GM59290; Wellcome Trust: 062023

    Nature 2008;453;7192;175-83

  • Rapid Virulence Annotation (RVA): identification of virulence factors using a bacterial genome library and multiple invertebrate hosts.

    Waterfield NR, Sanchez-Contreras M, Eleftherianos I, Dowling A, Yang G, Wilkinson P, Parkhill J, Thomson N, Reynolds SE, Bode HB, Dorus S and Ffrench-Constant RH

    Department of Biology and Biochemistry, University of Bath, Bath BA2 7AY, United Kingdom.

    Current sequence databases now contain numerous whole genome sequences of pathogenic bacteria. However, many of the predicted genes lack any functional annotation. We describe an assumption-free approach, Rapid Virulence Annotation (RVA), for the high-throughput parallel screening of genomic libraries against four different taxa: insects, nematodes, amoeba, and mammalian macrophages. These hosts represent different aspects of both the vertebrate and invertebrate immune system. Here, we apply RVA to the emerging human pathogen Photorhabdus asymbiotica using "gain of toxicity" assays of recombinant Escherichia coli clones. We describe a wealth of potential virulence loci and attribute biological function to several putative genomic islands, which may then be further characterized using conventional molecular techniques. The application of RVA to other pathogen genomes promises to ascribe biological function to otherwise uncharacterized virulence genes.

    Funded by: Biotechnology and Biological Sciences Research Council

    Proceedings of the National Academy of Sciences of the United States of America 2008;105;41;15967-72

  • urg1: a uracil-regulatable promoter system for fission yeast with short induction and repression times.

    Watt S, Mata J, López-Maury L, Marguerat S, Burns G and Bähler J

    Cancer Research United Kingdom Fission Yeast Functional Genomics Group, Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.

    Background: The fission yeast Schizosaccharomyces pombe is a popular genetic model organism with powerful experimental tools. The thiamine-regulatable nmt1 promoter and derivatives, which take >15 hours for full induction, are most commonly used for controlled expression of ectopic genes. Given the short cell cycle of fission yeast, however, a promoter system that can be rapidly regulated, similar to the GAL system for budding yeast, would provide a key advantage for many experiments.

    We used S. pombe microarrays to identify three neighbouring genes (urg1, urg2, and urg3) whose transcript levels rapidly and strongly increased in response to uracil, a condition which otherwise had little effect on global gene expression. We cloned the promoter of urg1 (uracil-regulatable gene) to create several PCR-based gene targeting modules for replacing native promoters with the urg1 promoter (Purg1) in the normal chromosomal locations of genes of interest. The kanMX6 and natMX6 markers allow selection under urg1 induced and repressed conditions, respectively. Some modules also allow N-terminal tagging of gene products placed under urg1 control. Using pom1 as a proof-of-principle, we observed a maximal increase of Purg1-pom1 transcripts after uracil addition within less than 30 minutes, and a similarly rapid decrease after uracil removal. The induced and repressed transcriptional states remained stable over 24-hour periods. RT-PCR comparisons showed that both induced and repressed Purg1-pom1 transcript levels were lower than corresponding P3nmt1-pom1 levels (wild-type nmt1 promoter) but higher than P81nmt1-pom1 levels (weak nmt1 derivative).

    We exploited the urg1 promoter system to rapidly induce pom1 expression at defined cell-cycle stages, showing that ectopic pom1 expression leads to cell branching in G2-phase but much less so in G1-phase. The high temporal resolution provided by the urg1 promoter should facilitate experimental design and improve the genetic toolbox for the fission yeast community.

    Funded by: Cancer Research UK: A6517, C9546/A6517; Wellcome Trust: 077118

    PloS one 2008;3;1;e1428

  • Genome-wide association analysis identifies 20 loci that influence adult height.

    Weedon MN, Lango H, Lindgren CM, Wallace C, Evans DM, Mangino M, Freathy RM, Perry JR, Stevens S, Hall AS, Samani NJ, Shields B, Prokopenko I, Farrall M, Dominiczak A, Diabetes Genetics Initiative, Wellcome Trust Case Control Consortium, Johnson T, Bergmann S, Beckmann JS, Vollenweider P, Waterworth DM, Mooser V, Palmer CN, Morris AD, Ouwehand WH, Cambridge GEM Consortium, Zhao JH, Li S, Loos RJ, Barroso I, Deloukas P, Sandhu MS, Wheeler E, Soranzo N, Inouye M, Wareham NJ, Caulfield M, Munroe PB, Hattersley AT, McCarthy MI and Frayling TM

    Genetics of Complex Traits, Institute of Biomedical and Clinical Science, Peninsula Medical School, Magdalen Road, Exeter EX1 2LU, UK.

    Adult height is a model polygenic trait, but there has been limited success in identifying the genes underlying its normal variation. To identify genetic variants influencing adult human height, we used genome-wide association data from 13,665 individuals and genotyped 39 variants in an additional 16,482 samples. We identified 20 variants associated with adult height (P < 5 x 10(-7), with 10 reaching P < 1 x 10(-10)). Combined, the 20 SNPs explain approximately 3% of height variation, with an approximately 5 cm difference between the 6.2% of people with 17 or fewer 'tall' alleles compared to the 5.5% with 27 or more 'tall' alleles. The loci we identified implicate genes in Hedgehog signaling (IHH, HHIP, PTCH1), extracellular matrix (EFEMP1, ADAMTSL3, ACAN) and cancer (CDK6, HMGA2, DLEU7) pathways, and provide new insights into human growth and developmental processes. Finally, our results provide insights into the genetic architecture of a classic quantitative trait.

    Funded by: British Heart Foundation: FS/05/061/19501, PG/02/128/14470, PG02/128; Medical Research Council: G0600705, G0701863, G9521010, G9521010(63660), G9521010D, MC_U106188470; Wellcome Trust: 076113, 077011, 077016

    Nature genetics 2008;40;5;575-83

  • Association of the timing of puberty with a chromosome 2 locus.

    Wehkalampi K, Widén E, Laine T, Palotie A and Dunkel L

    Hospital for Children and Adolescents, Helsinki University Hospital, 00029 Helsinki, Finland.

    Context: Twin studies indicate that the timing of pubertal onset is under strong genetic control. However, genes controlling pubertal timing in the general population have not yet been identified.

    Objective: To facilitate the identification of genes influencing the timing of pubertal growth and maturation, we conducted linkage mapping of constitutional delay of growth and puberty (CDGP), an extreme variant of normal pubertal timing, in extended families.

    Fifty-two families multiply affected with CDGP were genotyped with 383 multiallelic markers. CDGP was defined based on growth charts (the age at onset of growth spurt, peak height velocity, or attaining adult height taking place at least 1.5 sd later than average). Chromosomal regions cosegregating with CDGP were identified with parametric affected-only linkage analysis using CDGP as a dichotomized trait.

    Results: The genome-wide scan detected linkage of CDGP to a region on chromosome 2p13-2q13. The two-point heterogeneity LOD (HLOD) score was 1.62 (alpha = 0.27), and the corresponding multipoint HLOD was 2.54 (alpha = 0.31). Fine-mapping the region at 1 cM resolution increased the multipoint HLOD score to 4.44 (alpha = 0.41). The linkage became weaker if family members diagnosed with CDGP without growth data were also included in the analyses.

    Conclusions: The pericentromeric region of chromosome 2 harbors a gene predisposing to pubertal delay in multiply affected pedigrees. Our data suggest that this locus may be a component of the internal clock controlling the timing of the onset of puberty.

    The Journal of clinical endocrinology and metabolism 2008;93;12;4833-9

  • How degrading: Cyp26s in hindbrain development.

    White RJ and Schilling TF

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

    The vitamin A derivative retinoic acid performs many functions in vertebrate development and is thought to act as a diffusible morphogen that patterns the anterior-posterior axis of the hindbrain. Recent work in several systems has led to insights into how the spatial distribution of retinoic acid is regulated. These have shown local control of synthesis and degradation, and computational models suggest that degradation by the Cyp26 enzymes plays a critical role in the formation of a morphogen gradient as well as its ability to compensate for fluctuations in RA levels.

    Funded by: NICHD NIH HHS: R13 HD057707-01A1; NIDCR NIH HHS: R01 DE013828-07, R01 DE013828-08; NIGMS NIH HHS: P50 GM076516-01A1; NINDS NIH HHS: NS-41353, R01 NS041353-06, R01 NS041353-07, R01 NS041353-08

    Developmental dynamics : an official publication of the American Association of Anatomists 2008;237;10;2775-90

  • Feedback circuit among INK4 tumor suppressors constrains human glioblastoma development.

    Wiedemeyer R, Brennan C, Heffernan TP, Xiao Y, Mahoney J, Protopopov A, Zheng H, Bignell G, Furnari F, Cavenee WK, Hahn WC, Ichimura K, Collins VP, Chu GC, Stratton MR, Ligon KL, Futreal PA and Chin L

    Department of Medical Oncology, Dana-Farber Cancer Institute and Harvard Medical School, Boston, MA 02115, USA.

    We have developed a nonheuristic genome topography scan (GTS) algorithm to characterize the patterns of genomic alterations in human glioblastoma (GBM), identifying frequent p18(INK4C) and p16(INK4A) codeletion. Functional reconstitution of p18(INK4C) in GBM cells null for both p16(INK4A) and p18(INK4C) resulted in impaired cell-cycle progression and tumorigenic potential. Conversely, RNAi-mediated depletion of p18(INK4C) in p16(INK4A)-deficient primary astrocytes or established GBM cells enhanced tumorigenicity in vitro and in vivo. Furthermore, acute suppression of p16(INK4A) in primary astrocytes induced a concomitant increase in p18(INK4C). Together, these findings uncover a feedback regulatory circuit in the astrocytic lineage and demonstrate a bona fide tumor suppressor role for p18(INK4C) in human GBM wherein it functions cooperatively with other INK4 family members to constrain inappropriate proliferation.

    Funded by: NCI NIH HHS: P01 CA095616-060004, P01 CA095616-070004, P01CA95616, R01 CA099041-05, R01CA99041; NIAMS NIH HHS: T32 AR007098; Wellcome Trust

    Cancer cell 2008;13;4;355-64

  • Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide resolution.

    Wilhelm BT, Marguerat S, Watt S, Schubert F, Wood V, Goodhead I, Penkett CJ, Rogers J and Bähler J

    Cancer Research UK Fission Yeast Functional Genomics Group, Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, UK.

    Recent data from several organisms indicate that the transcribed portions of genomes are larger and more complex than expected, and that many functional properties of transcripts are based not on coding sequences but on regulatory sequences in untranslated regions or non-coding RNAs. Alternative start and polyadenylation sites and regulation of intron splicing add additional dimensions to the rich transcriptional output. This transcriptional complexity has been sampled mainly using hybridization-based methods under one or few experimental conditions. Here we applied direct high-throughput sequencing of complementary DNAs (RNA-Seq), supplemented with data from high-density tiling arrays, to globally sample transcripts of the fission yeast Schizosaccharomyces pombe, independently from available gene annotations. We interrogated transcriptomes under multiple conditions, including rapid proliferation, meiotic differentiation and environmental stress, as well as in RNA processing mutants to reveal the dynamic plasticity of the transcriptional landscape as a function of environmental, developmental and genetic factors. High-throughput sequencing proved to be a powerful and quantitative method to sample transcriptomes deeply at maximal resolution. In contrast to hybridization, sequencing showed little, if any, background noise and was sensitive enough to detect widespread transcription in >90% of the genome, including traces of RNAs that were not robustly transcribed or rapidly degraded. The combined sequencing and strand-specific array data provide rich condition-specific information on novel, mostly non-coding transcripts, untranslated regions and gene structures, thus improving the existing genome annotation. Sequence reads spanning exon-exon or exon-intron junctions give unique insight into a surprising variability in splicing efficiency across introns, genes and conditions. 
    Splicing efficiency was largely coordinated with transcript levels, and increased transcription led to increased splicing in test genes. Hundreds of introns showed such regulated splicing during cellular proliferation or differentiation.

    Funded by: Cancer Research UK: A6517, C9546/A6517; Wellcome Trust: 077118

    Nature 2008;453;7199;1239-43

  • Reliability of the long case.

    Wilkinson TJ, Campbell PJ and Judd SJ

    Department of Medicine, Christchurch School of Medicine and Health Sciences, University of Otago, Dunedin, New Zealand.

    Objectives: The use of long cases for summative assessment of clinical competence is limited by concerns about unreliability. This study aims to explore the reliability of long cases and how reliability is affected by supplementation with short cases.

    Methods: We performed a statistical analysis of examinations held by the Royal Australasian College of Physicians in 2005 and 2006 to determine overall reliability and sources of variance in reliability according to candidate ability, case difficulty and inter-examiner differences.

    Results: Scores for 546 long cases in 2005 and 773 long cases in 2006 were analysed. In 2006, 38% of the total variation in long case data was explained by variation in candidate ability, with other significant contributors to variance being candidate x case and candidate x examiner interactions. Similar figures were found for the 2005 examinations. A short case is less reliable than a long case, but when examiner time is taken into account, three short cases are as reliable as one long case. Any combination of short and long cases would require 4-5 hours of testing time in order to achieve dependability > 0.7.

    Conclusions: Long cases can be optimised for reliability but time limits their use as the sole tool in a high-stakes examination. Further examiner training, better case selection, or greater use of short cases would have minimal impact on reliability. Reliability can be improved by either increasing examination time or including additional methods of summative assessment, such as might be provided by workplace assessment.

    Medical education 2008;42;9;887-93

  • Genomic resources and microarrays for the common carp Cyprinus carpio L

    Williams, D. R., Li, W., Hughes, M. A., Gonzalez, S. F., Vernon, C., Vidal, M. C., Jeney, Z., Jeney, G., Dixon, P., McAndrew, B., Bartfai, R., Orban, L., Trudeau, V., Rogers, J., Matthews, L., Fraser, E. J., Gracey, A. Y., Cossins, A. R.

    J Fish Biol. 2008;72;2095-2117

  • The vertebrate genome annotation (Vega) database.

    Wilming LG, Gilbert JG, Howe K, Trevanion S, Hubbard T and Harrow JL

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.

    The Vertebrate Genome Annotation (Vega) database was first made public in 2004 and has been designed to view manual annotation of human, mouse and zebrafish genomic sequences produced at the Wellcome Trust Sanger Institute. Since its initial release, the number of human annotated loci has more than doubled to close to 33 000 and now contains comprehensive annotation on 20 of the 24 human chromosomes, four whole mouse chromosomes and around 40% of the zebrafish Danio rerio genome. In addition, we offer manual annotation of a number of haplotype regions in mouse and human and regions of comparative interest in pig and dog that are unique to Vega.

    Funded by: NHGRI NIH HHS: U54 HG004555-02; Wellcome Trust: 077198

    Nucleic acids research 2008;36;Database issue;D753-60

  • Sequence differentiation in regions identified by a genome scan for local adaptation.

    Wood HM, Grahame JW, Humphray S, Rogers J and Butlin RK

    Institute of Integrative and Comparative Biology, University of Leeds, Leeds LS29JT, UK.

    Genome scans using large numbers of randomly selected markers have revealed a small proportion of loci that deviate from neutral expectations and so may mark genomic regions that contribute to local adaptation. Measurements of sequence differentiation and identification of genes in these regions is important but difficult, especially in organisms with limited genetic information available. We have followed up a genome scan in the marine gastropod, Littorina saxatilis, by searching a bacterial artificial chromosome library with differentiated and undifferentiated markers, sequencing four bacterial artificial chromosomes and then analysing sequence variation in population samples for fragments at, and close to the original marker polymorphisms. We show that sequence differentiation follows the patterns expected from the original marker frequencies, that differentiated markers identify independent and highly localized sites and that these sites fall outside coding regions. Two differentiated loci are characterized by insertions of putative transposable elements that appear to have increased in frequency recently and which might influence expression of downstream genes. These results provide strong candidate loci for the study of local adaptation in Littorina. They demonstrate an approach that can be applied to follow up genome scans in other taxa and they show that the genome scan approach can lead rapidly to candidate genes in nonmodel organisms.

    Funded by: Biotechnology and Biological Sciences Research Council

    Molecular ecology 2008;17;13;3123-35

  • Databases, data tombs and dust in the wind.

    Wren JD and Bateman A

    As biomedical data accumulates, the need to store, share and organize it grows. Consequently, the number of Internet-accessible databases has been rapidly growing on an annual basis. Bioinformatics regularly publishes descriptions of biomedically relevant databases, Nucleic Acids Research has published an annual database issue since 1996 and now a new open-access journal, Database: The Journal of Biological Databases and Curation, will soon be launched by Oxford University Press in 2009. Since databases can be made publicly available on the Internet without publication, it is worth considering what factors prioritize publication of database descriptions in a peer-reviewed journal. In general, publication of a database description in a journal advertises it as a valuable resource for scientific research. Implicitly, it is assumed that this resource is publicly available (most likely for free) and will be maintained. However, therein lies the problem: Database papers are simply not of the same nature as regular research articles. Over time, some databases simply become inaccessible, some are created but not maintained or updated, and some databases are never used (Galperin, 2006). Thus, for database creators, reviewers and journal editors, there are several additional considerations to judge, prior to publication, how potentially valuable these new databases may be.

    Bioinformatics (Oxford, England) 2008;24;19;2127-8

  • Variation of the oxytocin/neurophysin I (OXT) gene in four human populations.

    Xu Y, Xue Y, Asan, Daly A, Wu L and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambs., CB10 1SA, UK.

    Oxytocin is a short peptide with multiple functions in human biology and has been implicated in autism. We aimed to determine the normal pattern of variation around the oxytocin gene and resequenced it and its flanking regions in 91 individuals from four HapMap populations and one chimpanzee. We identified 14 single nucleotide polymorphisms (SNPs), all noncoding, including eight that were novel. Population genetic analyses were largely consistent with a neutral evolutionary history, but a Hudson-Kreitman-Aguadé (HKA) test revealed more variation within the human population than expected from the level of chimpanzee-human divergence.

    Funded by: Wellcome Trust: 077009

    Journal of human genetics 2008;53;7;637-43

  • Adaptive evolution of UGT2B17 copy-number variation.

    Xue Y, Sun D, Daly A, Yang F, Zhou X, Zhao M, Huang N, Zerjal T, Lee C, Carter NP, Hurles ME and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK.

    The human UGT2B17 gene varies in copy number from zero to two per individual and also differs in mean number between populations from Africa, Europe, and East Asia. We show that such a high degree of geographical variation is unusual and investigate its evolutionary history. This required first reinterpreting the reference sequence in this region of the genome, which is misassembled from the two different alleles separated by an artifactual gap. A corrected assembly identifies the polymorphism as a 117 kb deletion arising by nonallelic homologous recombination between approximately 4.9 kb segmental duplications and allows the deletion breakpoint to be identified. We resequenced approximately 12 kb of DNA spanning the breakpoint in 91 humans from three HapMap and one extended HapMap populations and one chimpanzee. Diversity was unusually high and the time to the most recent common ancestor was estimated at approximately 2.4 or approximately 3.0 million years by two different methods, with evidence of balancing selection in Europe. In contrast, diversity was low in East Asia where a single haplotype predominated, suggesting positive selection for the deletion in this part of the world.

    Funded by: Wellcome Trust

    American journal of human genetics 2008;83;3;337-46

  • Y-chromosomal diversity in Lebanon is structured by recent historical events.

    Zalloua PA, Xue Y, Khalife J, Makhoul N, Debiane L, Platt DE, Royyuru AK, Herrera RJ, Hernanz DF, Blue-Smith J, Wells RS, Comas D, Bertranpetit J, Tyler-Smith C and Genographic Consortium

    The Lebanese American University, Chouran, Beirut 1102 2801, Lebanon.

    Lebanon is an eastern Mediterranean country inhabited by approximately four million people with a wide variety of ethnicities and religions, including Muslim, Christian, and Druze. In the present study, 926 Lebanese men were typed with Y-chromosomal SNP and STR markers, and unusually, male genetic variation within Lebanon was found to be more strongly structured by religious affiliation than by geography. We therefore tested the hypothesis that migrations within historical times could have contributed to this situation. Y-haplogroup J*(xJ2) was more frequent in the putative Muslim source region (the Arabian Peninsula) than in Lebanon, and it was also more frequent in Lebanese Muslims than in Lebanese non-Muslims. Conversely, haplogroup R1b was more frequent in the putative Christian source region (western Europe) than in Lebanon and was also more frequent in Lebanese Christians than in Lebanese non-Christians. The most common R1b STR-haplotype in Lebanese Christians was otherwise highly specific for western Europe and was unlikely to have reached its current frequency in Lebanese Christians without admixture. We therefore suggest that the Islamic expansion from the Arabian Peninsula beginning in the seventh century CE introduced lineages typical of this area into those who subsequently became Lebanese Muslims, whereas the Crusader activity in the 11th-13th centuries CE introduced western European lineages into Lebanese Christians.

    Funded by: Wellcome Trust

    American journal of human genetics 2008;82;4;873-82

  • Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes.

    Zeggini E, Scott LJ, Saxena R, Voight BF, Marchini JL, Hu T, de Bakker PI, Abecasis GR, Almgren P, Andersen G, Ardlie K, Boström KB, Bergman RN, Bonnycastle LL, Borch-Johnsen K, Burtt NP, Chen H, Chines PS, Daly MJ, Deodhar P, Ding CJ, Doney AS, Duren WL, Elliott KS, Erdos MR, Frayling TM, Freathy RM, Gianniny L, Grallert H, Grarup N, Groves CJ, Guiducci C, Hansen T, Herder C, Hitman GA, Hughes TE, Isomaa B, Jackson AU, Jørgensen T, Kong A, Kubalanza K, Kuruvilla FG, Kuusisto J, Langenberg C, Lango H, Lauritzen T, Li Y, Lindgren CM, Lyssenko V, Marvelle AF, Meisinger C, Midthjell K, Mohlke KL, Morken MA, Morris AD, Narisu N, Nilsson P, Owen KR, Palmer CN, Payne F, Perry JR, Pettersen E, Platou C, Prokopenko I, Qi L, Qin L, Rayner NW, Rees M, Roix JJ, Sandbaek A, Shields B, Sjögren M, Steinthorsdottir V, Stringham HM, Swift AJ, Thorleifsson G, Thorsteinsdottir U, Timpson NJ, Tuomi T, Tuomilehto J, Walker M, Watanabe RM, Weedon MN, Willer CJ, Wellcome Trust Case Control Consortium, Illig T, Hveem K, Hu FB, Laakso M, Stefansson K, Pedersen O, Wareham NJ, Barroso I, Hattersley AT, Collins FS, Groop L, McCarthy MI, Boehnke M and Altshuler D

    Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.

    Genome-wide association (GWA) studies have identified multiple loci at which common variants modestly but reproducibly influence risk of type 2 diabetes (T2D). Established associations to common and rare variants explain only a small proportion of the heritability of T2D. As previously published analyses had limited power to identify variants with modest effects, we carried out meta-analysis of three T2D GWA scans comprising 10,128 individuals of European descent and approximately 2.2 million SNPs (directly genotyped and imputed), followed by replication testing in an independent sample with an effective sample size of up to 53,975. We detected at least six previously unknown loci with robust evidence for association, including the JAZF1 (P = 5.0 x 10^-14), CDC123-CAMK1D (P = 1.2 x 10^-10), TSPAN8-LGR5 (P = 1.1 x 10^-9), THADA (P = 1.1 x 10^-9), ADAMTS9 (P = 1.2 x 10^-8) and NOTCH2 (P = 4.1 x 10^-8) gene regions. Our results illustrate the value of large discovery and follow-up samples for gaining further insights into the inherited basis of T2D.

    Funded by: Department of Health: DHCS/07/07/008; Medical Research Council: G0000649, G0600705, G0601261, MC_U106179471; NCI NIH HHS: CA87969; NHGRI NIH HHS: HG002651, N01-HG-65403, U01 HG004171, U01 HG004399; NHLBI NIH HHS: HL084729, T32 HL007627; NIDA NIH HHS: U54 DA021519; NIDDK NIH HHS: DK062370, DK072193, DK58845, P30 DK040561-12, R01 DK029867, R01 DK072193, R01 DK072193-01, R01 DK072193-02, R01 DK072193-03; Wellcome Trust: 072960, 076113, 077016, 079557, GR072960

    Nature genetics 2008;40;5;638-45

  • A genomic-based approach combining in vivo selection in mice to identify a novel virulence gene in Leishmania.

    Zhang WW, Peacock CS and Matlashewski G

    Department of Microbiology and Immunology, McGill University, Montreal, Quebec, Canada.

    Background: Infection with Leishmania results in a broad spectrum of pathologies where L. infantum and L. donovani cause fatal visceral leishmaniasis and L. major causes destructive cutaneous lesions. The identification and characterization of Leishmania virulence genes may define the genetic basis for these different pathologies.

    Comparison of the recently completed L. major and L. infantum genomes revealed a relatively small number of genes that are absent or present as pseudogenes in L. major and potentially encode proteins in L. infantum. To investigate the potential role of genetic differences between species in visceral infection, seven genes initially classified as absent in L. major but present in L. infantum were cloned from the closely related L. donovani genome and introduced into L. major. The transgenic L. major expressing the L. donovani genes were then introduced into BALB/c mice to select for parasites with increased virulence in the spleen to determine whether any of the L. donovani genes increased visceral infection levels. During the course of these experiments, one of the selected genes (LinJ32_V3.1040 (Li1040)) was reclassified as also present in the L. major genome. Interestingly, only the Li1040 gene significantly increased visceral infection in the L. major transfectants. The Li1040 gene encodes a protein containing a putative component of an endosomal protein sorting complex involved with protein transport.

    Conclusions: These observations demonstrate that the levels of expression and sequence variations in genes ubiquitously shared between Leishmania species have the potential to significantly influence virulence and tissue tropism.

    Funded by: Canadian Institutes of Health Research; Wellcome Trust

    PLoS neglected tropical diseases 2008;2;6;e248

  • Neo-sex chromosomes in the black muntjac recapitulate incipient evolution of mammalian sex chromosomes.

    Zhou Q, Wang J, Huang L, Nie W, Wang J, Liu Y, Zhao X, Yang F and Wang W

    CAS-Max Planck Junior Research Group, State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, People's Republic of China.

    Background: The regular mammalian X and Y chromosomes diverged from each other at least 166 to 148 million years ago, leaving few traces of their early evolution, including degeneration of the Y chromosome and evolution of dosage compensation.

    Results: We studied the intriguing case of black muntjac, in which a recent X-autosome fusion and a subsequent large autosomal inversion within just the past 0.5 million years have led to inheritance patterns identical to the traditional X-Y (neo-sex chromosomes). We compared patterns of genome evolution in 35-kilobase noncoding regions and 23 gene pairs on the homologous neo-sex chromosomes. We found that neo-Y alleles have accumulated more mutations, comprising a wide variety of mutation types, which indicates cessation of recombination and is consistent with an ongoing neo-Y degeneration process. Putative deleterious mutations were observed in coding regions of eight investigated genes as well as cis-regulatory regions of two housekeeping genes. In vivo assays characterized a neo-Y insertion in the promoter of the CLTC gene that causes a significant reduction in allelic expression. A neo-Y-linked deletion in the 3'-untranslated region of gene SNX22 abolished a microRNA target site. Finally, expression analyses revealed complex patterns of expression divergence between neo-Y and neo-X alleles.

    Conclusion: The nascent neo-sex chromosome system of black muntjacs is a valuable model in which to study the evolution of sex chromosomes in mammals. Our results illustrate the degeneration scenarios in various genomic regions. Of particular importance, we report--for the first time--that regulatory mutations were probably able to accelerate the degeneration process of Y and contribute to further evolution of dosage compensation.

    Genome biology 2008;9;6;R98

  • Immunoglobulin light chain (IgL) genes in zebrafish: Genomic configurations and inversional rearrangements between (V(L)-J(L)-C(L)) gene clusters.

    Zimmerman AM, Yeo G, Howe K, Maddox BJ and Steiner LA

    Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA.

    In mammals, immunoglobulin light chain (IgL) genes are localized to two chromosomal regions (designated kappa and lambda). Here we report a genome-wide survey of IgL genes in the zebrafish revealing (V(L)-J(L)-C(L)) clusters spanning 5 separate chromosomes. To elucidate IgL loci present in the zebrafish genome assembly (Zv6), conventional sequence similarity searches and a novel scanning approach based on recombination signal sequence (RSS) motifs were applied. RT-PCR with zebrafish cDNA was used to confirm annotations, evaluate VJ-rearrangement possibilities and show that each chromosomal locus is expressed. In contrast to other vertebrates in which IgL exon usage has been studied, inversional rearrangements between (V(L)-J(L)-C(L)) clusters were found. Inter-cluster rearrangements may convey a selective advantage for editing self-reactive receptors and poise zebrafish, by virtue of their extensive numbers of V(L), J(L) and C(L), to have greater potential for immunoglobulin gene shuffling than traditionally studied mouse and human models.

    Funded by: Wellcome Trust: 077198

    Developmental and comparative immunology 2008;32;4;421-34
