Sanger Institute - Publications 2009

Number of papers published in 2009: 239

  • PSD-95 is essential for hallucinogen and atypical antipsychotic drug actions at serotonin receptors.

    Abbas AI, Yadav PN, Yao WD, Arbuckle MI, Grant SG, Caron MG and Roth BL

    Department of Biochemistry, Case Western Reserve University School of Medicine, Cleveland, Ohio 44106, USA.

    Here, we report that postsynaptic density protein of 95 kDa (PSD-95), a postsynaptic density scaffolding protein, classically conceptualized as being essential for the regulation of ionotropic glutamatergic signaling at the postsynaptic membrane, plays an unanticipated and essential role in mediating the actions of hallucinogens and atypical antipsychotic drugs at 5-HT(2A) and 5-HT(2C) serotonergic G-protein-coupled receptors. We show that PSD-95 is crucial for normal 5-HT(2A) and 5-HT(2C) expression in vivo and that PSD-95 maintains normal receptor expression by promoting apical dendritic targeting and stabilizing receptor turnover in vivo. Significantly, 5-HT(2A)- and 5-HT(2C)-mediated downstream signaling is impaired in PSD-95(null) mice, and the 5-HT(2A)-mediated head-twitch response is abnormal. Furthermore, the ability of 5-HT(2A) inverse agonists to normalize behavioral changes induced by glutamate receptor antagonists is abolished in the absence of PSD-95 in vivo. These results demonstrate that PSD-95, in addition to the well known role it plays in scaffolding macromolecular glutamatergic signaling complexes, profoundly modulates metabotropic 5-HT(2A) and 5-HT(2C) receptor function.

    Funded by: NCRR NIH HHS: RR00168; NIDA NIH HHS: DA021420; NIGMS NIH HHS: T32 GM007250; NIMH NIH HHS: MH-73853, MH61887, R01 MH061887-07, R01 MH061887-08, R01 MH061887-09, R01 MH061887-10, R01 MH061887-11, R01 MH061887-12, U19 MH082441-01, U19 MH082441-010001, U19 MH082441-019003, U19 MH082441-02, U19 MH082441-020001, U19 MH082441-03S1, U19 MH082441-04, U19 MH082441-040001, U19 MH082441-05, U19MH82441; NINDS NIH HHS: NS-19576, NS057311; Wellcome Trust

    The Journal of neuroscience : the official journal of the Society for Neuroscience 2009;29;22;7124-36

  • Newly discovered breast cancer susceptibility loci on 3p24 and 17q23.2.

    Ahmed S, Thomas G, Ghoussaini M, Healey CS, Humphreys MK, Platte R, Morrison J, Maranian M, Pooley KA, Luben R, Eccles D, Evans DG, Fletcher O, Johnson N, dos Santos Silva I, Peto J, Stratton MR, Rahman N, Jacobs K, Prentice R, Anderson GL, Rajkovic A, Curb JD, Ziegler RG, Berg CD, Buys SS, McCarty CA, Feigelson HS, Calle EE, Thun MJ, Diver WR, Bojesen S, Nordestgaard BG, Flyger H, Dörk T, Schürmann P, Hillemanns P, Karstens JH, Bogdanova NV, Antonenkova NN, Zalutsky IV, Bermisheva M, Fedorova S, Khusnutdinova E, SEARCH, Kang D, Yoo KY, Noh DY, Ahn SH, Devilee P, van Asperen CJ, Tollenaar RA, Seynaeve C, Garcia-Closas M, Lissowska J, Brinton L, Peplonska B, Nevanlinna H, Heikkinen T, Aittomäki K, Blomqvist C, Hopper JL, Southey MC, Smith L, Spurdle AB, Schmidt MK, Broeks A, van Hien RR, Cornelissen S, Milne RL, Ribas G, González-Neira A, Benitez J, Schmutzler RK, Burwinkel B, Bartram CR, Meindl A, Brauch H, Justenhoven C, Hamann U, GENICA Consortium, Chang-Claude J, Hein R, Wang-Gohrke S, Lindblom A, Margolin S, Mannermaa A, Kosma VM, Kataja V, Olson JE, Wang X, Fredericksen Z, Giles GG, Severi G, Baglietto L, English DR, Hankinson SE, Cox DG, Kraft P, Vatten LJ, Hveem K, Kumle M, Sigurdson A, Doody M, Bhatti P, Alexander BH, Hooning MJ, van den Ouweland AM, Oldenburg RA, Schutte M, Hall P, Czene K, Liu J, Li Y, Cox A, Elliott G, Brock I, Reed MW, Shen CY, Yu JC, Hsu GC, Chen ST, Anton-Culver H, Ziogas A, Andrulis IL, Knight JA, kConFab, Australian Ovarian Cancer Study Group, Beesley J, Goode EL, Couch F, Chenevix-Trench G, Hoover RN, Ponder BA, Hunter DJ, Pharoah PD, Dunning AM, Chanock SJ and Easton DF

    Department of Oncology, University of Cambridge, UK.

    Genome-wide association studies (GWAS) have identified seven breast cancer susceptibility loci, but these explain only a small fraction of the familial risk of the disease. Five of these loci were identified through a two-stage GWAS involving 390 familial cases and 364 controls in the first stage, and 3,990 cases and 3,916 controls in the second stage. To identify additional loci, we tested over 800 promising associations from this GWAS in a further two stages involving 37,012 cases and 40,069 controls from 33 studies in the CGEMS collaboration and Breast Cancer Association Consortium. We found strong evidence for additional susceptibility loci on 3p (rs4973768: per-allele OR = 1.11, 95% CI = 1.08-1.13, P = 4.1 x 10(-23)) and 17q (rs6504950: per-allele OR = 0.95, 95% CI = 0.92-0.97, P = 1.4 x 10(-8)). Potential causative genes include SLC4A7 and NEK10 on 3p and COX11 on 17q.

    Funded by: Cancer Research UK: 10118, 11021, A10123, C1287/A10118, C1287/A5260, C1287/A7497, C490/A11021; NCI NIH HHS: 5UO1CA098233, CA-06-503, CA-58860, CA-92044, CA-95-011, CA49449, CA50385, CA65725, CA67262, CA87969, P30 CA062203, P50 CA116201, R01 CA102740-01A2, R01 CA104021-04, R01 CA122340, U01 CA69398, U01 CA69417, U01 CA69446, U01 CA69467, U01 CA69631, U01 CA69638, UO1 CA098710, UO1 CA69467

    Nature genetics 2009;41;5;585-90

  • Genetic diversity amongst isolates of Neospora caninum, and the development of a multiplex assay for the detection of distinct strains.

    Al-Qassab S, Reichel MP, Ivens A and Ellis JT

    Department of Medical and Molecular Biosciences, University of Technology, Sydney, P.O. Box 123, Broadway, New South Wales 2007, Australia.

    Infection with Neospora caninum is regarded as a significant cause of abortion in cattle. Despite the economic impact of this infection, relatively little is known about the biology of this parasite. In this study, mini and microsatellite DNAs were detected in the genome of N. caninum and eight loci were identified that each contained repetitive DNA which was polymorphic among different isolates of this parasite. A multiplex PCR assay was developed for the detection of genetic variation within N. caninum based on length polymorphism associated with three different repetitive markers. The utility of the multiplex PCR was demonstrated in that it was able to distinguish amongst strains of N. caninum used as either vaccine or challenge strains in animal vaccination experiments and that it could genotype N. caninum associated with naturally acquired infections of animals. The multiplex PCR is simple, rapid, informative and sensitive and should provide a valuable tool for further studies on the epidemiology of N. caninum in different host species.

    Molecular and cellular probes 2009;23;3-4;132-9

  • SnoopCGH: software for visualizing comparative genomic hybridization data.

    Almagro-Garcia J, Manske M, Carret C, Campino S, Auburn S, Macinnis BL, Maslen G, Pain A, Newbold CI, Kwiatkowski DP and Clark TG

    Wellcome Trust Sanger Institute, Hinxton, The Weatherall Institute of Molecular Medicine and Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK. jg10@sanger.ac.uk

    Array-based comparative genomic hybridization (CGH) technology is used to discover and validate genomic structural variation, including copy number variants, insertions, deletions and other structural variants (SVs). The visualization and summarization of the array CGH data outputs, potentially across many samples, is an important process in the identification and analysis of SVs. We have developed a software tool for SV analysis using data from array CGH technologies, which is also amenable to short-read sequence data. Availability and implementation: SnoopCGH is written in Java and is available from http://snoopcgh.sourceforge.net/

    Funded by: Medical Research Council: G0600718, G19/9; Wellcome Trust

    Bioinformatics (Oxford, England) 2009;25;20;2732-3

  • Manual annotation and analysis of the defensin gene cluster in the C57BL/6J mouse reference genome.

    Amid C, Rehaume LM, Brown KL, Gilbert JG, Dougan G, Hancock RE and Harrow JL

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK. ca1@sanger.ac.uk

    Background: Host defense peptides are a critical component of the innate immune system. Human alpha- and beta-defensin genes are subject to copy number variation (CNV) and historically the organization of mouse alpha-defensin genes has been poorly defined. Here we present the first full manual genomic annotation of the mouse defensin region on Chromosome 8 of the reference strain C57BL/6J, and the analysis of the orthologous regions of the human and rat genomes. Problems were identified with the reference assemblies of all three genomes. Defensins have been studied for over two decades and their naming has become a critical issue due to incorrect identification of defensin genes derived from different mouse strains and the duplicated nature of this region.

    Results: The defensin gene cluster region on mouse Chromosome 8 A2 contains 98 gene loci: 53 are likely active defensin genes and 22 defensin pseudogenes. Several TATA box motifs were found for human and mouse defensin genes that likely impact gene expression. Three novel defensin genes belonging to the Cryptdin Related Sequences (CRS) family were identified. All additional mouse defensin loci on Chromosomes 1, 2 and 14 were annotated and unusual splice variants identified. Comparison of the mouse alpha-defensins in the three main mouse reference gene sets Ensembl, Mouse Genome Informatics (MGI), and NCBI RefSeq reveals significant inconsistencies in annotation and nomenclature. We are collaborating with the Mouse Genome Nomenclature Committee (MGNC) to establish a standardized naming scheme for alpha-defensins.

    Conclusions: Prior to this analysis, there was no reliable reference gene set available for the mouse strain C57BL/6J defensin genes, demonstrating that manual intervention is still critical for the annotation of complex gene families and heavily duplicated regions. Accurate gene annotation is facilitated by the annotation of pseudogenes and regulatory elements. Manually curated gene models will be incorporated into the Ensembl and Consensus Coding Sequence (CCDS) reference sets. Elucidation of the genomic structure of this complex gene cluster on the mouse reference sequence, and adoption of a clear and unambiguous naming scheme, will provide a valuable tool to support studies on the evolution, regulatory mechanisms and biological functions of defensins in vivo.

    Funded by: NHGRI NIH HHS: U54 HG004555-03; Wellcome Trust: 077198

    BMC genomics 2009;10;606

  • Testing for rare variant associations in complex diseases.

    Asimit J and Zeggini E

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK. ja11@sanger.ac.uk.

    The study of rare variants holds the promise of accounting for some of the missing heritability in complex traits. Next-generation sequencing technologies enable probing of variation across the full spectrum of allele frequencies. Multiple methods for the analysis of rare variants have been proposed and, recently, Ionita-Laza et al. have presented an approach with the theoretical capacity to detect risk and protective variants. The identification of rare risk variants could have major implications in understanding complex disease etiopathogenesis.

    Genome medicine 2009;1;11;24

  • ABACAS: algorithm-based automatic contiguation of assembled sequences.

    Assefa S, Keane TM, Otto TD, Newbold C and Berriman M

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SA, UK. sa4@sanger.ac.uk

    Summary: Due to the availability of new sequencing technologies, we are now increasingly interested in sequencing closely related strains of existing finished genomes. Recently a number of de novo and mapping-based assemblers have been developed to produce high quality draft genomes from new sequencing technology reads. New tools are necessary to take contigs from a draft assembly through to a fully contiguated genome sequence. ABACAS is intended as a tool to rapidly contiguate (align, order, orientate), visualize and design primers to close gaps on shotgun assembled contigs based on a reference sequence. The input to ABACAS is a set of contigs which will be aligned to the reference genome, ordered and orientated, visualized in the ACT comparative browser, and optimal primer sequences are automatically generated. Availability and Implementation: ABACAS is implemented in Perl and is freely available for download from http://abacas.sourceforge.net.

    Funded by: Wellcome Trust: WT085775/Z/08/Z

    Bioinformatics (Oxford, England) 2009;25;15;1968-9

  • Loci influencing lipid levels and coronary heart disease risk in 16 European population cohorts.

    Aulchenko YS, Ripatti S, Lindqvist I, Boomsma D, Heid IM, Pramstaller PP, Penninx BW, Janssens AC, Wilson JF, Spector T, Martin NG, Pedersen NL, Kyvik KO, Kaprio J, Hofman A, Freimer NB, Jarvelin MR, Gyllensten U, Campbell H, Rudan I, Johansson A, Marroni F, Hayward C, Vitart V, Jonasson I, Pattaro C, Wright A, Hastie N, Pichler I, Hicks AA, Falchi M, Willemsen G, Hottenga JJ, de Geus EJ, Montgomery GW, Whitfield J, Magnusson P, Saharinen J, Perola M, Silander K, Isaacs A, Sijbrands EJ, Uitterlinden AG, Witteman JC, Oostra BA, Elliott P, Ruokonen A, Sabatti C, Gieger C, Meitinger T, Kronenberg F, Döring A, Wichmann HE, Smit JH, McCarthy MI, van Duijn CM, Peltonen L and ENGAGE Consortium

    Department of Epidemiology and Biostatistics, Erasmus University Medical Center, P.O. Box 2040, 3000 CA Rotterdam, The Netherlands.

    Recent genome-wide association (GWA) studies of lipids have been conducted in samples ascertained for other phenotypes, particularly diabetes. Here we report the first GWA analysis of loci affecting total cholesterol (TC), low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol and triglycerides sampled randomly from 16 population-based cohorts and genotyped using mainly the Illumina HumanHap300-Duo platform. Our study included a total of 17,797-22,562 persons, aged 18-104 years and from geographic regions spanning from the Nordic countries to Southern Europe. We established 22 loci associated with serum lipid levels at a genome-wide significance level (P < 5 x 10(-8)), including 16 loci that were identified by previous GWA studies. The six newly identified loci in our cohort samples are ABCG5 (TC, P = 1.5 x 10(-11); LDL, P = 2.6 x 10(-10)), TMEM57 (TC, P = 5.4 x 10(-10)), CTCF-PRMT8 region (HDL, P = 8.3 x 10(-16)), DNAH11 (LDL, P = 6.1 x 10(-9)), FADS3-FADS2 (TC, P = 1.5 x 10(-10); LDL, P = 4.4 x 10(-13)) and MADD-FOLH1 region (HDL, P = 6 x 10(-11)). For three loci, effect sizes differed significantly by sex. Genetic risk scores based on lipid loci explain up to 4.8% of variation in lipids and were also associated with increased intima media thickness (P = 0.001) and coronary heart disease incidence (P = 0.04). The genetic risk score improves the screening of high-risk groups of dyslipidemia over classical risk factors.

    Funded by: Chief Scientist Office: CZB/4/710; Medical Research Council: MC_U127527180, MC_U127561128; NHLBI NIH HHS: 5R01HL087679-02; Wellcome Trust: 089061

    Nature genetics 2009;41;1;47-55

  • A novel system of polymorphic and diverse NK cell receptors in primates.

    Averdam A, Petersen B, Rosner C, Neff J, Roos C, Eberle M, Aujard F, Münch C, Schempp W, Carrington M, Shiina T, Inoko H, Knaust F, Coggill P, Sehra H, Beck S, Abi-Rached L, Reinhardt R and Walter L

    Department of Primate Genetics, German Primate Centre, Göttingen, Germany.

    There are two main classes of natural killer (NK) cell receptors in mammals, the killer cell immunoglobulin-like receptors (KIR) and the structurally unrelated killer cell lectin-like receptors (KLR). While KIR represent the most diverse group of NK receptors in all primates studied to date, including humans, apes, and Old and New World monkeys, KLR represent the functional equivalent in rodents. Here, we report a first digression from this rule in lemurs, where the KLR (CD94/NKG2) rather than KIR constitute the most diverse group of NK cell receptors. We demonstrate that natural selection contributed to such diversification in lemurs and particularly targeted KLR residues interacting with the peptide presented by MHC class I ligands. We further show that lemurs lack a strict ortholog or functional equivalent of MHC-E, the ligands of non-polymorphic KLR in "higher" primates. Our data support the existence of a hitherto unknown system of polymorphic and diverse NK cell receptors in primates and of combinatorial diversity as a novel mechanism to increase NK cell receptor repertoire.

    Funded by: NIAID NIH HHS: AI 31168, R01 AI031168-16A2, R01 AI031168-17; PHS HHS: HHSN261200800001E

    PLoS genetics 2009;5;10;e1000688

  • Gene body methylation of the dimethylarginine dimethylamino-hydrolase 2 (Ddah2) gene is an epigenetic biomarker for neural stem cell differentiation.

    Bäckdahl L, Herberth M, Wilson G, Tate P, Campos LS, Cortese R, Eckhardt F and Beck S

    UCL Cancer Institute, University College London, London WC1E 6BT, UK. l.backdahl@ucl.ac.uk

    DNA methylation is an important epigenetic mark that is involved in the regulation of many cellular processes such as gene expression, genomic imprinting and silencing of repetitive elements. Because of their ability to cause and capture phenotypic plasticity, epigenetic marks such as DNA methylation represent potential biomarkers to distinguish between different types of tissues and stages of differentiation. Here, we have identified differential DNA methylation in the gene body of the nitric oxide inhibitor Ddah2 that discriminates embryonic stem cells from neural stem cells and is positively correlated with differential gene expression.

    Funded by: Wellcome Trust: WT-084071

    Epigenetics : official journal of the DNA Methylation Society 2009;4;4;248-54

  • Complete genome sequence of Macrococcus caseolyticus strain JCSCS5402, [corrected] reflecting the ancestral genome of the human-pathogenic staphylococci.

    Baba T, Kuwahara-Arai K, Uchiyama I, Takeuchi F, Ito T and Hiramatsu K

    Department of Microbiology and Infection Control Science, Juntendo University, 2-1-1 Hongo, Bunkyo, Tokyo 113-8421, Japan. tbaba@juntendo.ac.jp

    We isolated the methicillin-resistant Macrococcus caseolyticus strain JCSC5402 from animal meat in a supermarket and determined its whole-genome nucleotide sequence. This is the first report on the genome analysis of a macrococcal species that is evolutionarily closely related to the human pathogens Staphylococcus aureus and Bacillus anthracis. The essential biological pathways of M. caseolyticus are similar to those of staphylococci. However, the species has a small chromosome (2.1 MB) and lacks many sugar and amino acid metabolism pathways and a plethora of virulence genes that are present in S. aureus. On the other hand, M. caseolyticus possesses a series of oxidative phosphorylation machineries that are closely related to those in the family Bacillaceae. We also discovered a probable primordial form of a Macrococcus methicillin resistance gene complex, mecIRAm, on one of the eight plasmids harbored by the M. caseolyticus strain. This is the first finding of a plasmid-encoding methicillin resistance gene. Macrococcus is considered to reflect the genome of ancestral bacteria before the speciation of staphylococcal species and may be closely associated with the origin of the methicillin resistance gene complex of the notorious human pathogen methicillin-resistant S. aureus.

    Journal of bacteriology 2009;191;4;1180-90

  • Genomic complexity of the Y-STR DYS19: inversions, deletions and founder lineages carrying duplications.

    Balaresque P, Parkin EJ, Roewer L, Carvalho-Silva DR, Mitchell RJ, van Oorschot RA, Henke J, Stoneking M, Nasidze I, Wetton J, de Knijff P, Tyler-Smith C and Jobling MA

    Department of Genetics, University of Leicester, University Road, Leicester, LE1 7RH, UK.

    The Y-STR DYS19 is firmly established in the repertoire of Y-chromosomal markers used in forensic analysis yet is poorly understood at the molecular level, lying in a complex genomic environment and exhibiting null alleles, as well as duplications and occasional triplications in population samples. Here, we analyse three null alleles and 51 duplications and show that DYS19 can also be involved in inversion events, so that even its location within the short arm of the Y chromosome is uncertain. Deletion mapping in the three chromosomes carrying null alleles shows that their deletions are less than approximately 300 kb in size. Haplotypic analysis with binary markers shows that they belong to three different haplogroups and so represent independent events. In contrast, a collection of 51 DYS19 duplication chromosomes belong to only four haplogroups: two are singletons and may represent somatic mutation in lymphoblastoid cell lines, but two, in haplogroups G and C3c, represent founder lineages that have spread widely in Central Europe/West Asia and East Asia, respectively. Consideration of candidate mechanisms underlying both deletions and duplications provides no evidence for the involvement of non-allelic homologous recombination, and they are likely to represent sporadic events with low mutation rates. Understanding the basis and population distribution of these DYS19 alleles will aid in the utilisation and interpretation of profiles that contain them.

    Funded by: Wellcome Trust: 057559, 077009

    International journal of legal medicine 2009;123;1;15-23

  • Replication analysis identifies TYK2 as a multiple sclerosis susceptibility factor.

    Ban M, Goris A, Lorentzen AR, Baker A, Mihalova T, Ingram G, Booth DR, Heard RN, Stewart GJ, Bogaert E, Dubois B, Harbo HF, Celius EG, Spurkland A, Strange R, Hawkins C, Robertson NP, Dudbridge F, Wason J, De Jager PL, Hafler D, Rioux JD, Ivinson AJ, McCauley JL, Pericak-Vance M, Oksenberg JR, Hauser SL, Sexton D, Haines J, Sawcer S, Wellcome Trust Case-Control Consortium (WTCCC) and Compston A

    Department of Clinical Neuroscience, Addenbrooke's Hospital, University of Cambridge, Cambridge, UK. mb531@medschl.cam.ac.uk

    In a recent genome-wide association study (GWAS) based on 12,374 non-synonymous single nucleotide polymorphisms we identified a number of candidate multiple sclerosis susceptibility genes. Here, we describe the extended analysis of 17 of these loci undertaken using an additional 4234 patients, 2983 controls and 2053 trio families. In the final analysis combining all available data, we found that evidence for association was substantially increased for one of the 17 loci, rs34536443 from the tyrosine kinase 2 (TYK2) gene (P=2.7 x 10(-6), odds ratio=1.32 (1.17-1.47)). This single nucleotide polymorphism results in an amino acid substitution (proline to alanine) in the kinase domain of TYK2, which is predicted to influence the levels of phosphorylation and therefore activity of the protein and so is likely to have a functional role in multiple sclerosis.

    Funded by: Medical Research Council: G0000934, G0700061, MC_U105292688; NINDS NIH HHS: NS 049477-01A1, R01 NS049477-01A1; Wellcome Trust: 061858, 068545/Z/02, 076113, 085475

    European journal of human genetics : EJHG 2009;17;10;1309-13

  • Genome-wide association study and meta-analysis find that over 40 loci affect risk of type 1 diabetes.

    Barrett JC, Clayton DG, Concannon P, Akolkar B, Cooper JD, Erlich HA, Julier C, Morahan G, Nerup J, Nierras C, Plagnol V, Pociot F, Schuilenburg H, Smyth DJ, Stevens H, Todd JA, Walker NM, Rich SS and Type 1 Diabetes Genetics Consortium

    Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, Cambridge Institute for Medical Research, University of Cambridge, Addenbrooke's Hospital, Cambridge, UK.

    Type 1 diabetes (T1D) is a common autoimmune disorder that arises from the action of multiple genetic and environmental risk factors. We report the findings of a genome-wide association study of T1D, combined in a meta-analysis with two previously published studies. The total sample set included 7,514 cases and 9,045 reference samples. Forty-one distinct genomic locations provided evidence for association with T1D in the meta-analysis (P < 10(-6)). After excluding previously reported associations, we further tested 27 regions in an independent set of 4,267 cases, 4,463 controls and 2,319 affected sib-pair (ASP) families. Of these, 18 regions were replicated (P < 0.01; overall P < 5 × 10(-8)) and 4 additional regions provided nominal evidence of replication (P < 0.05). The many new candidate genes suggested by these results include IL10, IL19, IL20, GLIS3, CD69 and IL27.

    Funded by: Medical Research Council: G0000934; NIDDK NIH HHS: DK46635, K08 DK002876-06, R01 DK046635-15, U01 DK062418, U01 DK062418-06; NIMH NIH HHS: MH 63420, MH059565, MH059571, MH059588, MH060879, MH061675, MH067257, MH59566, MH59586, MH59587, MH60870; Wellcome Trust: 061858, 076113

    Nature genetics 2009;41;6;703-7

  • Cloud computing.

    Bateman A and Wood M

    Bioinformatics (Oxford, England) 2009;25;12;1475

  • Phospholipid scramblases and Tubby-like proteins belong to a new superfamily of membrane tethered transcription factors.

    Bateman A, Finn RD, Sims PJ, Wiedmer T, Biegert A and Söding J

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK. agb@sanger.ac.uk

    Motivation: Phospholipid scramblases (PLSCRs) constitute a family of cytoplasmic membrane-associated proteins that were identified based upon their capacity to mediate a Ca(2+)-dependent bidirectional movement of phospholipids across membrane bilayers, thereby collapsing the normally asymmetric distribution of such lipids in cell membranes. The exact function and mechanism(s) of these proteins nevertheless remains obscure: data from several laboratories now suggest that in addition to their putative role in mediating transbilayer flip/flop of membrane lipids, the PLSCRs may also function to regulate diverse processes including signaling, apoptosis, cell proliferation and transcription. A major impediment to deducing the molecular details underlying the seemingly disparate biology of these proteins is the current absence of any representative molecular structures to provide guidance to the experimental investigation of their function.

    Results: Here, we show that the enigmatic PLSCR family of proteins is directly related to another family of cellular proteins with a known structure. The Arabidopsis protein At5g01750 from the DUF567 family was solved by X-ray crystallography and provides the first structural model for this family. This model identifies that the presumed C-terminal transmembrane helix is buried within the core of the PLSCR structure, suggesting that palmitoylation may represent the principal membrane anchorage for these proteins. The fold of the PLSCR family is also shared by Tubby-like proteins. A search of the PDB with the HHpred server suggests a common evolutionary ancestry. Common functional features also suggest that tubby and PLSCR share a functional origin as membrane tethered transcription factors with capacity to modulate phosphoinositide-based signaling.

    Funded by: NHLBI NIH HHS: HL036946, HL063819, HL076215; Wellcome Trust: 087656, WT077044/Z/05/Z

    Bioinformatics (Oxford, England) 2009;25;2;159-62

  • Expression screening and annotation of a zebrafish myoblast cDNA library.

    Baxendale S, Chen CK, Tang H, Davison C, Hateren LV, Croning MD, Humphray SJ, Hubbard SJ and Ingham PW

    MRC Centre for Developmental and Biomedical Genetics, University of Sheffield, Sheffield S10 2TN, UK.

    To analyse the myogenic transcriptome and identify novel genes involved in muscle development in an in vivo context, we have constructed a muscle specific cDNA library from GFP-expressing myoblasts purified by fluorescent activated cell sorting of transgenic zebrafish embryos. We have generated 153,428 EST sequences from this library that have been clustered into consensi, mapped to the genome assembly Zv6 and analysed for protein homology. Expression analysis of a randomly picked sample of clones using whole mount in situ hybridisation, identified 30 genes that are expressed specifically within the myotome, one third of which represent novel sequences. These genes have been assigned to syn-expression groups. The sequencing of the myoblast enriched cDNA library has significantly increased the number of zebrafish ESTs, facilitating the prediction of new spliced transcripts in the genome assembly and providing a transcriptome of an in vivo myoblast cell.

    Gene expression patterns : GEP 2009;9;2;73-82

  • Neuroproteomics: understanding the molecular organization and complexity of the brain.

    Bayés A and Grant SG

    Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.

    Advances in technology have equipped the field of neuroproteomics with refined tools for the study of the expression, interaction and function of proteins in the nervous system. In combination with bioinformatics, neuroproteomics can address the organization of dynamic, functional protein networks and macromolecular structures that underlie physiological, anatomical and behavioural processes. Furthermore, neuroproteomics is contributing to the elucidation of disease mechanisms and is a powerful tool for the identification of biomarkers.

    Funded by: Medical Research Council; Wellcome Trust

    Nature reviews. Neuroscience 2009;10;9;635-46

  • Interaction of Salmonella enterica with basil and other salad leaves.

    Berger CN, Shaw RK, Brown DJ, Mather H, Clare S, Dougan G, Pallen MJ and Frankel G

    Department of Life Science, Division of Cell and Molecular Biology, Imperial College London, London, UK.

    Contaminated salad leaves have emerged as important vehicles for the transmission of enteric pathogens to humans. A recent outbreak of Salmonella enterica serovar Senftenberg (S. Senftenberg) in the United Kingdom has been traced to the consumption of contaminated basil. Using the outbreak strain of S. Senftenberg, we found that it binds to basil, lettuce, rocket and spinach leaves showing a pattern of diffuse adhesion. Flagella were seen linking S. Senftenberg to the leaf epidermis, and the deletion of fliC (encoding phase-1 flagella) resulted in a significantly reduced level of adhesion. In contrast, although flagella linking S. enterica serovar Typhimurium to the basil leaf epidermis were widespread, deletion of fliC did not affect leaf attachment levels. These results implicate the role of flagella in Salmonella leaf attachment and suggest that different Salmonella serovars use strain-specific mechanisms to attach to salad leaves.

    Funded by: Biotechnology and Biological Sciences Research Council; Medical Research Council

    The ISME journal 2009;3;2;261-5

  • The genome of the blood fluke Schistosoma mansoni.

    Berriman M, Haas BJ, LoVerde PT, Wilson RA, Dillon GP, Cerqueira GC, Mashiyama ST, Al-Lazikani B, Andrade LF, Ashton PD, Aslett MA, Bartholomeu DC, Blandin G, Caffrey CR, Coghlan A, Coulson R, Day TA, Delcher A, DeMarco R, Djikeng A, Eyre T, Gamble JA, Ghedin E, Gu Y, Hertz-Fowler C, Hirai H, Hirai Y, Houston R, Ivens A, Johnston DA, Lacerda D, Macedo CD, McVeigh P, Ning Z, Oliveira G, Overington JP, Parkhill J, Pertea M, Pierce RJ, Protasio AV, Quail MA, Rajandream MA, Rogers J, Sajid M, Salzberg SL, Stanke M, Tivey AR, White O, Williams DL, Wortman J, Wu W, Zamanian M, Zerlotini A, Fraser-Liggett CM, Barrell BG and El-Sayed NM

    Wellcome Trust Sanger Institute, Cambridge CB10 1SD, UK. mb4@sanger.ac.uk

    Schistosoma mansoni is responsible for the neglected tropical disease schistosomiasis that affects 210 million people in 76 countries. Here we present analysis of the 363 megabase nuclear genome of the blood fluke. It encodes at least 11,809 genes, with an unusual intron size distribution, and new families of micro-exon genes that undergo frequent alternative splicing. As the first sequenced flatworm, and a representative of the Lophotrochozoa, it offers insights into early events in the evolution of the animals, including the development of a body pattern with bilateral symmetry, and the development of tissues into organs. Our analysis has been informed by the need to find new drug targets. The deficits in lipid metabolism that make schistosomes dependent on the host are revealed, and the identification of membrane receptors, ion channels and more than 300 proteases provide new insights into the biology of the life cycle and new targets. Bioinformatics approaches have identified metabolic chokepoints, and a chemogenomic screen has pinpointed schistosome proteins for which existing drugs may be active. The information generated provides an invaluable resource for the research community to develop much needed new control tools for the treatment and eradication of this important and neglected disease.

    Funded by: FIC NIH HHS: 5D43TW006580, 5D43TW007012-03; NIAID NIH HHS: AI054711-01A2, AI48828, U01 AI048828-01, U01 AI048828-02; NIGMS NIH HHS: R01 GM083873-07, R01 GM083873-08; NLM NIH HHS: R01 LM006845-08, R01 LM006845-09; Wellcome Trust: 086151, WT085775/Z/08/Z

    Nature 2009;460;7253;352-8

  • Genomic and phenotypic variation in epidemic-spanning Salmonella enterica serovar Enteritidis isolates.

    Betancor L, Yim L, Fookes M, Martinez A, Thomson NR, Ivens A, Peters S, Bryant C, Algorta G, Kariuki S, Schelotto F, Maskell D, Dougan G and Chabalgoity JA

    Departamento de Desarrollo Biotecnológico, Instituto de Higiene, Universidad de la República, Montevideo, Uruguay. laurabet@higiene.edu.uy

    Background: Salmonella enterica serovar Enteritidis (S. Enteritidis) has caused major epidemics of gastrointestinal infection in many different countries. In this study we investigate genome divergence and pathogenic potential in S. Enteritidis isolated before, during and after an epidemic in Uruguay.

    Results: 266 S. Enteritidis isolates were genotyped using RAPD-PCR and a selection were subjected to PFGE analysis. From these, 29 isolates spanning different periods, genetic profiles and sources of isolation were assayed for their ability to infect human epithelial cells and subjected to comparative genomic hybridization using a Salmonella pan-array and the sequenced strain S. Enteritidis PT4 P125109 as reference. Six other isolates from distant countries were included as external comparators. Two hundred and thirty three chromosomal genes as well as the virulence plasmid were found as variable among S. Enteritidis isolates. Ten out of the 16 chromosomal regions that varied between different isolates correspond to phage-like regions. The 2 oldest pre-epidemic isolates lack phage SE20 and harbour other phage encoded genes that are absent in the sequenced strain. Besides variation in prophage, we found variation in genes involved in metabolism and bacterial fitness. Five epidemic strains lack the complete Salmonella virulence plasmid. Significantly, strains with indistinguishable genetic patterns still showed major differences in their ability to infect epithelial cells, indicating that the approach used was insufficient to detect the genetic basis of this differential behaviour.

    Conclusion: The recent epidemic of S. Enteritidis infection in Uruguay has been driven by the introduction of closely related strains of phage type 4 lineage. Our results confirm previous reports demonstrating a high degree of genetic homogeneity among S. Enteritidis isolates. However, 10 of the regions of variability described here are for the first time reported as being variable in S. Enteritidis. In particular, the oldest pre-epidemic isolates carry phage-associated genetic regions not previously reported in S. Enteritidis. Overall, our results support the view that phages play a crucial role in the generation of genetic diversity in S. Enteritidis and that phage SE20 may be a key marker for the emergence of particular isolates capable of causing epidemics.

    Funded by: Wellcome Trust: 078168/Z/05/Z

    BMC microbiology 2009;9;237

  • Public health. The cholera crisis in Africa.

    Bhattacharya S, Black R, Bourgeois L, Clemens J, Cravioto A, Deen JL, Dougan G, Glass R, Grais RF, Greco M, Gust I, Holmgren J, Kariuki S, Lambert PH, Liu MA, Longini I, Nair GB, Norrby R, Nossal GJ, Ogra P, Sansonetti P, von Seidlein L, Songane F, Svennerholm AM, Steele D and Walker R

    Indian Council of Medical Research, Ansari Nagar, New Delhi, 110029, India.

    Science (New York, N.Y.) 2009;324;5929;885

  • Calcium-dependent signaling and kinases in apicomplexan parasites.

    Billker O, Lourido S and Sibley LD

    The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    Calcium controls many critical events in the complex life cycles of apicomplexan parasites including protein secretion, motility, and development. Calcium levels are normally tightly regulated and rapid release of calcium into the cytosol activates a family of calcium-dependent protein kinases (CDPKs), which are normally characteristic of plants. CDPKs present in apicomplexans have acquired a number of unique domain structures likely reflecting their diverse functions. Calcium regulation in parasites is closely linked to signaling by cyclic nucleotides and their associated kinases. This Review summarizes the pivotal roles that calcium- and cyclic nucleotide-dependent kinases play in unique aspects of parasite biology.

    Funded by: Medical Research Council: G0501670; NIAID NIH HHS: AI34036, R01 AI034036, R01 AI034036-17, R01 AI082423-01, R21 AI067051

    Cell host & microbe 2009;5;6;612-22

  • A mouse chromosome 4 balancer ENU-mutagenesis screen isolates eleven lethal lines.

    Boles MK, Wilkinson BM, Maxwell A, Lai L, Mills AA, Nishijima I, Salinger AP, Moskowitz I, Hirschi KK, Liu B, Bradley A and Justice MJ

    Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA. mb144070@bcm.tmc.edu

    Background: ENU-mutagenesis is a powerful technique to identify genes regulating mammalian development. To functionally annotate the distal region of mouse chromosome 4, we performed an ENU-mutagenesis screen using a balancer chromosome targeted to this region of the genome.

    Results: We isolated 11 lethal lines that map to the region of chromosome 4 between D4Mit117 and D4Mit281. These lines form 10 complementation groups. The majority of lines die during embryonic development between E5.5 and E12.5 and display defects in gastrulation, cardiac development, and craniofacial development. One line displayed postnatal lethality and neurological defects, including ataxia and seizures.

    Conclusion: These eleven mutants allow us to query gene function within the distal region of mouse chromosome 4 and demonstrate that new mouse models of mammalian developmental defects can easily and quickly be generated and mapped with the use of ENU-mutagenesis in combination with balancer chromosomes. The low number of mutations isolated in this screen compared with other balancer chromosome screens indicates that the functions of genes in different regions of the genome vary widely.

    Funded by: NCI NIH HHS: R01 CA115503, R01 CA115503-01A1; NHLBI NIH HHS: R01 HL76260; NICHD NIH HHS: U01 HD39372

    BMC genetics 2009;10;12

  • Mosaic 22q13 deletions: evidence for concurrent mosaic segmental isodisomy and gene conversion.

    Bonaglia MC, Giorda R, Beri S, Bigoni S, Sensi A, Baroncini A, Capucci A, De Agostini C, Gwilliam R, Deloukas P, Dunham I and Zuffardi O

    Eugenio Medea Scientific Institute, Bosisio Parini, Lecco, Italy. clara.bonaglia@bp.lnf.it

    Although 22q terminal deletions are well documented, very few patients with mosaicism have been reported. We describe two new cases with mosaic 22q13.2-qter deletion, detected by karyotype analysis, showing the neurological phenotype of 22q13.3 deletion syndrome. Case 1 represents an exceptional case of mosaicism for maternal 22q13.2-qter deletion (45% of cells) and 22q13.2-qter paternal segmental isodisomy (55% of cells). This complex situation was suspected because cytogenetic, FISH and array-CGH analyses showed the presence of an 8.8 Mb mosaic 22q13.2-qter deletion, whereas microsatellite marker analysis was consistent with maternal deletion without any evidence of mosaic deletion. Molecular analysis led to the definition of very close, but not coincident, deletion and uniparental disomy (UPD) break points. Furthermore, we demonstrated that the segmental UPD arose by gene conversion in the same region. In Case 2, mosaicism for a paternal 8.9 Mb 22q13.2-qter deletion (73% of cells) was detected. In both patients, the level of mosaicism was also verified in saliva samples. We propose possible causative mechanisms for both rearrangements. Although the size of the deletions was quite similar, the phenotype was more severe in Case 2 than in Case 1. As maternal UPD 22 has not been generally associated with any defects and as the size of the deletion is very similar in the two cases, phenotype severity is likely to depend entirely on the degree of mosaicism in each individual.

    Funded by: Telethon: GGP06208; Wellcome Trust: 077011

    European journal of human genetics : EJHG 2009;17;4;426-33

  • Family-based analysis of tumor necrosis factor and lymphotoxin-alpha tag polymorphisms with type 1 diabetes in the population of South Croatia.

    Boraska V, Zeggini E, Groves CJ, Rayner NW, Skrabić V, Diakite M, Rockett KA, Kwiatkowski D, McCarthy MI and Zemunik T

    Department of Medical Biology, Medical School, University of Split, Split, Croatia. vboraska@mefst.hr

    Tumor necrosis factor (TNF) and lymphotoxin-alpha (LTA) are cytokines with a wide range of inflammatory and immunomodulatory activities. Type 1 diabetes is an autoimmune disease characterized by destruction of insulin-producing pancreatic beta cells. The aim of the present study was to evaluate the association of polymorphisms in the TNF/LTA gene region with susceptibility to type 1 diabetes. We investigated 11 TNF/LTA tag polymorphisms, designed to capture the majority of common variation in the region, in 160 trio families from South Croatia. We observed overtransmission of alleles from parents to affected child at five variants: (rs909253, allele C, p = 1.2x10(-4); rs1041981, allele A, p = 1.1x10(-4); rs1800629 (G-308A), allele A, p = 1.2x10(-4); rs361525 (G-238A), allele G, p = 8.2x10(-3) and rs3093668, allele G, p = 0.014). We also identified overtransmission of the rs1800629(G-308A)-rs361525(G-238A) A-G haplotype, p = 2.384x10(-5). The present study found an association of the TNF/LTA gene region with type 1 diabetes. A careful assessment of TNF/LTA variants adjusted for linkage disequilibrium with HLA loci is needed to further clarify the role of these genes in type 1 diabetes susceptibility in the population of South Croatia.

    Funded by: Wellcome Trust: 079557, 082370, 088885

    Human immunology 2009;70;3;195-9

  • Catweasel mice: a novel role for Six1 in sensory patch development and a model for branchio-oto-renal syndrome.

    Bosman EA, Quint E, Fuchs H, Hrabé de Angelis M and Steel KP

    The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK.

    Large-scale mouse mutagenesis initiatives have provided new mouse mutants that are useful models of human deafness and vestibular dysfunction. Catweasel is a novel N-ethyl-N-nitrosourea (ENU)-induced mutation. Heterozygous catweasel mutant mice exhibit mild headtossing associated with a posterior crista defect. We mapped the catweasel mutation to a critical region of 13 Mb on chromosome 12 containing the Six1, -4 and -6 genes. We identified a basepair substitution in exon 1 of the Six1 gene that changes a conserved glutamic acid (E) at position 121 to a glycine (G) in the Six1 homeodomain. Cwe/Cwe animals lack Preyer and righting reflexes, display severe headshaking and have severely truncated cochlea and semicircular canals. Cwe/Cwe animals had very few hair cells in the utricle, but their ampullae and cochlea were devoid of any hair cells. Bmp4, Jag1 and Sox2 expression were largely absent at early stages of sensory development and NeuroD expression was reduced in the developing vestibulo-acoustic ganglion. Lastly we show that Six1 genetically interacts with Jag1. We propose that the catweasel phenotype is due to a hypomorphic mutation in Six1 and that catweasel mice are a suitable model for branchio-oto-renal syndrome. In addition Six1 has a pivotal role in early sensory patch development and may act in the same genetic pathway as Jag1.

    Funded by: Medical Research Council; Wellcome Trust

    Developmental biology 2009;328;2;285-96

  • IRS2 variants and syndromes of severe insulin resistance.

    Bottomley WE, Soos MA, Adams C, Guran T, Howlett TA, Mackie A, Miell J, Monson JP, Temple R, Tenenbaum-Rakover Y, Tymms J, Savage DB, Semple RK, O'Rahilly S and Barroso I

    Funded by: Wellcome Trust: 077016, 077016/Z/05/Z, 078986, 078986/Z/06/Z, 080952, 080952/Z/06/Z

    Diabetologia 2009;52;6;1208-11

  • The genome sequence of taurine cattle: a window to ruminant biology and evolution.

    Bovine Genome Sequencing and Analysis Consortium, Elsik CG, Tellam RL, Worley KC, Gibbs RA, Muzny DM, Weinstock GM, Adelson DL, Eichler EE, Elnitski L, Guigó R, Hamernik DL, Kappes SM, Lewin HA, Lynn DJ, Nicholas FW, Reymond A, Rijnkels M, Skow LC, Zdobnov EM, Schook L, Womack J, Alioto T, Antonarakis SE, Astashyn A, Chapple CE, Chen HC, Chrast J, Câmara F, Ermolaeva O, Henrichsen CN, Hlavina W, Kapustin Y, Kiryutin B, Kitts P, Kokocinski F, Landrum M, Maglott D, Pruitt K, Sapojnikov V, Searle SM, Solovyev V, Souvorov A, Ucla C, Wyss C, Anzola JM, Gerlach D, Elhaik E, Graur D, Reese JT, Edgar RC, McEwan JC, Payne GM, Raison JM, Junier T, Kriventseva EV, Eyras E, Plass M, Donthu R, Larkin DM, Reecy J, Yang MQ, Chen L, Cheng Z, Chitko-McKown CG, Liu GE, Matukumalli LK, Song J, Zhu B, Bradley DG, Brinkman FS, Lau LP, Whiteside MD, Walker A, Wheeler TT, Casey T, German JB, Lemay DG, Maqbool NJ, Molenaar AJ, Seo S, Stothard P, Baldwin CL, Baxter R, Brinkmeyer-Langford CL, Brown WC, Childers CP, Connelley T, Ellis SA, Fritz K, Glass EJ, Herzig CT, Iivanainen A, Lahmers KK, Bennett AK, Dickens CM, Gilbert JG, Hagen DE, Salih H, Aerts J, Caetano AR, Dalrymple B, Garcia JF, Gill CA, Hiendleder SG, Memili E, Spurlock D, Williams JL, Alexander L, Brownstein MJ, Guan L, Holt RA, Jones SJ, Marra MA, Moore R, Moore SS, Roberts A, Taniguchi M, Waterman RC, Chacko J, Chandrabose MM, Cree A, Dao MD, Dinh HH, Gabisi RA, Hines S, Hume J, Jhangiani SN, Joshi V, Kovar CL, Lewis LR, Liu YS, Lopez J, Morgan MB, Nguyen NB, Okwuonu GO, Ruiz SJ, Santibanez J, Wright RA, Buhay C, Ding Y, Dugan-Rocha S, Herdandez J, Holder M, Sabo A, Egan A, Goodell J, Wilczek-Boney K, Fowler GR, Hitchens ME, Lozado RJ, Moen C, Steffen D, Warren JT, Zhang J, Chiu R, Schein JE, Durbin KJ, Havlak P, Jiang H, Liu Y, Qin X, Ren Y, Shen Y, Song H, Bell SN, Davis C, Johnson AJ, Lee S, Nazareth LV, Patel BM, Pu LL, Vattathil S, Williams RL, Curry S, Hamilton C, Sodergren E, Wheeler DA, Barris W, Bennett GL, Eggen A, Green RD, Harhay GP, Hobbs M, Jann O, Keele JW, Kent MP, Lien S, McKay SD, McWilliam S, Ratnakumar A, Schnabel RD, Smith T, Snelling WM, Sonstegard TS, Stone RT, Sugimoto Y, Takasuga A, Taylor JF, Van Tassell CP, Macneil MD, Abatepaulo AR, Abbey CA, Ahola V, Almeida IG, Amadio AF, Anatriello E, Bahadue SM, Biase FH, Boldt CR, Carroll JA, Carvalho WA, Cervelatti EP, Chacko E, Chapin JE, Cheng Y, Choi J, Colley AJ, de Campos TA, De Donato M, Santos IK, de Oliveira CJ, Deobald H, Devinoy E, Donohue KE, Dovc P, Eberlein A, Fitzsimmons CJ, Franzin AM, Garcia GR, Genini S, Gladney CJ, Grant JR, Greaser ML, Green JA, Hadsell DL, Hakimov HA, Halgren R, Harrow JL, Hart EA, Hastings N, Hernandez M, Hu ZL, Ingham A, Iso-Touru T, Jamis C, Jensen K, Kapetis D, Kerr T, Khalil SS, Khatib H, Kolbehdari D, Kumar CG, Kumar D, Leach R, Lee JC, Li C, Logan KM, Malinverni R, Marques E, Martin WF, Martins NF, Maruyama SR, Mazza R, McLean KL, Medrano JF, Moreno BT, Moré DD, Muntean CT, Nandakumar HP, Nogueira MF, Olsaker I, Pant SD, Panzitta F, Pastor RC, Poli MA, Poslusny N, Rachagani S, Ranganathan S, Razpet A, Riggs PK, Rincon G, Rodriguez-Osorio N, Rodriguez-Zas SL, Romero NE, Rosenwald A, Sando L, Schmutz SM, Shen L, Sherman L, Southey BR, Lutzow YS, Sweedler JV, Tammen I, Telugu BP, Urbanski JM, Utsunomiya YT, Verschoor CP, Waardenberg AJ, Wang Z, Ward R, Weikard R, Welsh TH, White SN, Wilming LG, Wunderlich KR, Yang J and Zhao FQ

    To understand the biology and evolution of ruminants, the cattle genome was sequenced to about sevenfold coverage. The cattle genome contains a minimum of 22,000 genes, with a core set of 14,345 orthologs shared among seven mammalian species of which 1217 are absent or undetected in noneutherian (marsupial or monotreme) genomes. Cattle-specific evolutionary breakpoint regions in chromosomes have a higher density of segmental duplications, enrichment of repetitive elements, and species-specific variations in genes associated with lactation and immune responsiveness. Genes involved in metabolism are generally highly conserved, although five metabolic genes are deleted or extensively diverged from their human orthologs. The cattle genome sequence thus provides a resource for understanding mammalian evolution and accelerating livestock genetic improvement for milk and meat production.

    Funded by: Biotechnology and Biological Sciences Research Council: BBS/B/13446; NHGRI NIH HHS: U54 HG003273, U54 HG003273-04, U54 HG003273-04S1, U54 HG003273-05, U54 HG003273-05S1, U54 HG003273-05S2, U54 HG003273-06, U54 HG003273-06S1, U54 HG003273-06S2, U54 HG003273-07, U54 HG003273-08; NIDA NIH HHS: P30 DA018310; Wellcome Trust: 062023, 077198

    Science (New York, N.Y.) 2009;324;5926;522-8

  • Genome-wide survey of SNP variation uncovers the genetic structure of cattle breeds.

    Bovine HapMap Consortium, Gibbs RA, Taylor JF, Van Tassell CP, Barendse W, Eversole KA, Gill CA, Green RD, Hamernik DL, Kappes SM, Lien S, Matukumalli LK, McEwan JC, Nazareth LV, Schnabel RD, Weinstock GM, Wheeler DA, Ajmone-Marsan P, Boettcher PJ, Caetano AR, Garcia JF, Hanotte O, Mariani P, Skow LC, Sonstegard TS, Williams JL, Diallo B, Hailemariam L, Martinez ML, Morris CA, Silva LO, Spelman RJ, Mulatu W, Zhao K, Abbey CA, Agaba M, Araujo FR, Bunch RJ, Burton J, Gorni C, Olivier H, Harrison BE, Luff B, Machado MA, Mwakaya J, Plastow G, Sim W, Smith T, Thomas MB, Valentini A, Williams P, Womack J, Woolliams JA, Liu Y, Qin X, Worley KC, Gao C, Jiang H, Moore SS, Ren Y, Song XZ, Bustamante CD, Hernandez RD, Muzny DM, Patil S, San Lucas A, Fu Q, Kent MP, Vega R, Matukumalli A, McWilliam S, Sclep G, Bryc K, Choi J, Gao H, Grefenstette JJ, Murdoch B, Stella A, Villa-Angulo R, Wright M, Aerts J, Jann O, Negrini R, Goddard ME, Hayes BJ, Bradley DG, Barbosa da Silva M, Lau LP, Liu GE, Lynn DJ, Panzitta F and Dodds KG

    The imprints of domestication and breed development on the genomes of livestock likely differ from those of companion animals. A deep draft sequence assembly of shotgun reads from a single Hereford female and comparative sequences sampled from six additional breeds were used to develop probes to interrogate 37,470 single-nucleotide polymorphisms (SNPs) in 497 cattle from 19 geographically and biologically diverse breeds. These data show that cattle have undergone a rapid recent decrease in effective population size from a very large ancestral population, possibly due to bottlenecks associated with domestication, selection, and breed formation. Domestication and artificial selection appear to have left detectable signatures of selection within the cattle genome, yet the current levels of diversity within breeds are at least as great as exists within humans.

    Funded by: NHGRI NIH HHS: U54 HG003273; NIGMS NIH HHS: R01 GM083606-02, R01 GM083606-05

    Science (New York, N.Y.) 2009;324;5926;528-32

  • Trypanosomatid genomes contain several subfamilies of ingi-related retroposons.

    Bringaud F, Berriman M and Hertz-Fowler C

    Centre de Résonance Magnétique des Systèmes Biologiques, UMR-5536 CNRS, Université Victor Segalen Bordeaux 2, 146 rue Léo Saignat, 33076 Bordeaux, France. bringaud@rmsb.u-bordeaux2.fr

    Retroposons are ubiquitous transposable elements found in the genomes of most eukaryotes, including trypanosomatids. The African and American trypanosomes (Trypanosoma brucei and Trypanosoma cruzi) contain long autonomous retroposons of the ingi clade (Tbingi and L1Tc, respectively) and short nonautonomous truncated versions (TbRIME and NARTc, respectively), as well as degenerate ingi-related retroposons devoid of coding capacity (DIREs). In contrast, Leishmania major contains only remnants of extinct retroposons (LmDIREs) and of short nonautonomous heterogeneous elements (LmSIDERs). We extend this comparative and evolutionary analysis of retroposons to the genomes of two other African trypanosomes (Trypanosoma congolense and Trypanosoma vivax) and another Leishmania sp. (Leishmania braziliensis). Three new potentially functional retroposons of the ingi clade have been identified: Tvingi in T. vivax and Tcoingi and L1Tco in T. congolense. T. congolense is the first trypanosomatid containing two classes of potentially active retroposons of the ingi clade. We analyzed sequences located upstream of these new long autonomous ingi-related elements, which code for the recognition site of the retroposon-encoded endonuclease. The closely related Tcoingi and Tvingi elements show the same conserved pattern, indicating that the Tcoingi- and Tvingi-encoded endonucleases share site specificity. Similarly, the conserved pattern previously identified upstream of L1Tc has also been detected at the same relative position upstream of L1Tco elements. A phylogenetic analysis of all ingi-related retroposons identified so far, including DIREs, clearly shows that several distinct subfamilies have emerged and coexisted, though in the course of trypanosomatid evolution, only a few have been maintained as active elements in modern trypanosomatid (sub)species.

    Funded by: Wellcome Trust: WT085775/Z/08/Z

    Eukaryotic cell 2009;8;10;1532-42

  • Genomic variation in a global village: report of the 10th annual Human Genome Variation Meeting 2008.

    Brookes AJ, Chanock SJ, Hudson TJ, Peltonen L, Abecasis G, Kwok PY and Scherer SW

    Department of Genetics, University of Leicester, Leicester, United Kingdom.

    The Centre for Applied Genomics of the Hospital for Sick Children and the University of Toronto hosted the 10th Human Genome Variation (HGV) Meeting in Toronto, Canada, in October 2008, welcoming about 240 registrants from 34 countries. During the 3 days of plenary workshops, keynote address, and poster sessions, a strong cross-disciplinary trend was evident, integrating expertise from technology and computation, through biology and medicine, to ethics and law. Single nucleotide polymorphisms (SNPs) as well as the larger copy number variants (CNVs) are recognized by ever-improving array and next-generation sequencing technologies, and the data are being incorporated into studies that are increasingly genome-wide as well as global in scope. A greater challenge is to convert data to information, through databases, and to use the information for greater understanding of human variation. In the wake of publications of the first individual genome sequences, an inaugural public forum provided the opportunity to debate whether we are ready for personalized medicine through direct-to-consumer testing. The HGV meetings foster collaboration, and fruits of the interactions from 2008 are anticipated for the 11th annual meeting in September 2009.

    Human mutation 2009;30;7;1134-8

  • Accurate and sensitive peptide identification with Mascot Percolator.

    Brosch M, Yu L, Hubbard T and Choudhary J

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, United Kingdom.

    Sound scoring methods for sequence database search algorithms such as Mascot and Sequest are essential for sensitive and accurate peptide and protein identifications from proteomic tandem mass spectrometry data. In this paper, we present a software package that interfaces Mascot with Percolator, a well performing machine learning method for rescoring database search results, and demonstrate it to be amenable for both low and high accuracy mass spectrometry data, outperforming all available Mascot scoring schemes as well as providing reliable significance measures. Mascot Percolator can be readily used as a stand alone tool or integrated into existing data analysis pipelines.

    Funded by: Wellcome Trust: 077198

    Journal of proteome research 2009;8;6;3176-81

  • Observational study on variability between biobanks in the estimation of DNA concentration.

    Brown J, Donev AN, Aslanidis C, Bracegirdle P, Dixon KP, Foedinger M, Gwilliam R, Hardy M, Illig T, Ke X, Krinka D, Lagerberg C, Laiho P, Lewis DH, McArdle W, Patton S, Ring SM, Schmitz G, Stevens H, Tybring G, Wichmann HE, Ollier WE and Yuille MA

    Centre for Integrated Genomic Medical Research, University of Manchester, Manchester, UK. Jay.Brown@cmmc.nhs.uk

    Background: There is little confidence in the consistency of estimation of DNA concentrations when samples move between laboratories. Evidence on this consistency is largely anecdotal. Therefore there is a need first to measure this consistency among different laboratories and then identify and implement remedies. A pilot experiment to test logistics and provide initial data on consistency was therefore conceived.

    Methods: DNA aliquots at nominal concentrations between 10 and 300 ng/µl were dispensed into the wells of 96-well plates by one participant - the coordinating centre. Participants estimated the concentration in each well and returned estimates to the coordinating centre.

    Results: Considerable overall variability was observed among estimates. There were statistically significant differences between participants' measurements and between fluorescence emission and absorption spectroscopy.

    Conclusion: Anecdotal evidence of variability in DNA concentration estimation has been substantiated. Reduction in variability between participants will require the identification of major sources of variation, specification of effective remedies and their implementation.

    BMC research notes 2009;2;208

  • Functional diversity for REST (NRSF) is defined by in vivo binding affinity hierarchies at the DNA sequence level.

    Bruce AW, López-Contreras AJ, Flicek P, Down TA, Dhami P, Dillon SC, Koch CM, Langford CF, Dunham I, Andrews RM and Vetrie D

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, United Kingdom. awb41@cam.ac.uk

    The molecular events that contribute to, and result from, the in vivo binding of transcription factors to their cognate DNA sequence motifs in mammalian genomes are poorly understood. We demonstrate that variations within the DNA sequence motifs that bind the transcriptional repressor REST (NRSF) encode in vivo DNA binding affinity hierarchies that contribute to regulatory function during lineage-specific and developmental programs in fundamental ways. First, canonical sequence motifs for REST facilitate strong REST binding and control functional classes of REST targets that are common to all cell types, whilst atypical motifs participate in weak interactions and control those targets, which are cell- or tissue-specific. Second, variations in REST binding relate directly to variations in expression and chromatin configurations of REST's target genes. Third, REST clearance from its binding sites is also associated with variations in the RE1 motif. Finally, and most surprisingly, weak REST binding sites reside in DNA sequences that show the highest levels of constraint through evolution, thus facilitating their roles in maintaining tissue-specific functions. These relationships have never been reported in mammalian systems for any transcription factor.

    Funded by: Wellcome Trust

    Genome research 2009;19;6;994-1005

  • Genome-wide microarray-based comparative genomic hybridization analysis of lymphoplasmacytic lymphomas reveals heterogeneous aberrations.

    Buckley PG, Walsh SH, Laurell A, Sundström C, Roos G, Langford CF, Dumanski JP and Rosenquist R

    Department of Cancer Genetics, Royal College of Surgeons in Ireland, Dublin, Ireland. pbuckley@rcsi.ie

    Lymphoplasmacytic lymphoma (LPL) is not a sharply delineated lymphoma entity, either morphologically, phenotypically, or clinically. The diagnosis is often made by excluding other small cell lymphomas with plasmacytic differentiation, thus a genetic diagnostic marker would be of great benefit. Conventional cytogenetic techniques have previously demonstrated a deletion of 6q in a proportion of cases, varying from 7 to 55%. In this report, we apply array-based comparative genomic hybridization on 11 LPL samples. Genomic aberrations were detected in 9 of 11 cases, and included gains and losses. In general, the number of genetic aberrations was relatively low (two to three abnormalities per case). Recurrent aberrations detected were deletion of 6q (two cases), deletion of chromosome 17 (two cases), gain of 3q (two cases), and gain of chromosome 7 (two cases). This report not only confirms the reported loss of 6q in a proportion of cases but also highlights the genetic heterogeneity of LPL, in accordance with the known immunophenotypical, morphological, and clinical diversity of the disease.

    Funded by: Wellcome Trust

    Leukemia & lymphoma 2009;50;9;1528-34

  • The T3SS effector EspT defines a new category of invasive enteropathogenic E. coli (EPEC) which form intracellular actin pedestals.

    Bulgin R, Arbeloa A, Goulding D, Dougan G, Crepin VF, Raymond B and Frankel G

    Centre for Molecular Microbiology and Infection, Division of Cell and Molecular Biology, Imperial College London, London, United Kingdom.

    Enteropathogenic Escherichia coli (EPEC) strains are defined as extracellular pathogens which nucleate actin rich pedestal-like membrane extensions on intestinal enterocytes to which they intimately adhere. EPEC infection is mediated by type III secretion system effectors, which modulate host cell signaling. Recently we have shown that the WxxxE effector EspT activates Rac1 and Cdc42 leading to formation of membrane ruffles and lamellipodia. Here we report that EspT-induced membrane ruffles facilitate EPEC invasion into non-phagocytic cells in a process involving Rac1 and Wave2. Internalized EPEC resides within a vacuole and Tir is localized to the vacuolar membrane, resulting in actin polymerization and formation of intracellular pedestals. To the best of our knowledge this is the first time a pathogen has been shown to induce formation of actin comets across a vacuole membrane. Moreover, our data breaks the dogma of EPEC as an extracellular pathogen and defines a new category of invasive EPEC.

    Funded by: Medical Research Council: G0700823; Wellcome Trust

    PLoS pathogens 2009;5;12;e1000683

  • The evolution of protein domain families.

    Buljan M and Bateman A

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK.

    Protein domains are the common currency of protein structure and function. Over 10,000 such protein families have now been collected in the Pfam database. Using these data along with animal gene phylogenies from TreeFam allowed us to investigate the gain and loss of protein domains. Most gains and losses of domains occur at protein termini. We show that the nature of changes is similar after speciation or duplication events. However, changes in domain architecture happen at a higher frequency after gene duplication. We suggest that the bias towards protein termini is largely because insertion and deletion of domains at most positions in a protein are likely to disrupt the structure of existing domains. We can also use Pfam to trace the evolution of specific families. For example, the immunoglobulin superfamily can be traced over 500 million years during its expansion into one of the largest families in the human genome. It can be shown that this protein family has its origins in basic animals such as the poriferan sponges where it is found in cell-surface-receptor proteins. We can trace how the structure and sequence of this family diverged during vertebrate evolution into constant and variable domains that are found in the antibodies of our immune system as well as in neural and muscle proteins.

    Funded by: Wellcome Trust: 087656, WT077044/Z/05/Z

    Biochemical Society transactions 2009;37;Pt 4;751-5

  • Evolution of pathogenicity and sexual reproduction in eight Candida genomes.

    Butler G, Rasmussen MD, Lin MF, Santos MA, Sakthikumar S, Munro CA, Rheinbay E, Grabherr M, Forche A, Reedy JL, Agrafioti I, Arnaud MB, Bates S, Brown AJ, Brunke S, Costanzo MC, Fitzpatrick DA, de Groot PW, Harris D, Hoyer LL, Hube B, Klis FM, Kodira C, Lennard N, Logue ME, Martin R, Neiman AM, Nikolaou E, Quail MA, Quinn J, Santos MC, Schmitzberger FF, Sherlock G, Shah P, Silverstein KA, Skrzypek MS, Soll D, Staggs R, Stansfield I, Stumpf MP, Sudbery PE, Srikantha T, Zeng Q, Berman J, Berriman M, Heitman J, Gow NA, Lorenz MC, Birren BW, Kellis M and Cuomo CA

    UCD School of Biomolecular and Biomedical Science, Conway Institute, University College Dublin, Belfield, Dublin 4, Ireland. geraldine.butler@ucd.ie

    Candida species are the most common cause of opportunistic fungal infection worldwide. Here we report the genome sequences of six Candida species and compare these and related pathogens and non-pathogens. There are significant expansions of cell wall, secreted and transporter gene families in pathogenic species, suggesting adaptations associated with virulence. Large genomic tracts are homozygous in three diploid species, possibly resulting from recent recombination events. Surprisingly, key components of the mating and meiosis pathways are missing from several species. These include major differences at the mating-type loci (MTL); Lodderomyces elongisporus lacks MTL, and components of the a1/2 cell identity determinant were lost in other species, raising questions about how mating and cell types are controlled. Analysis of the CUG leucine-to-serine genetic-code change reveals that 99% of ancestral CUG codons were erased and new ones arose elsewhere. Lastly, we revise the Candida albicans gene catalogue, identifying many new genes.

    Funded by: Medical Research Council: G0400284; NHGRI NIH HHS: R01 HG004037-02, U54 HG003067, U54 HG003067-06; NIAID NIH HHS: HHSN266200400001C, R01 AI050113, R01 AI075096-04; NIDCR NIH HHS: R01 DE015873; Wellcome Trust

    Nature 2009;459;7247;657-62

  • Integrin-mediated axoglial interactions initiate myelination in the central nervous system.

    Câmara J, Wang Z, Nunes-Fonseca C, Friedman HC, Grove M, Sherman DL, Komiyama NH, Grant SG, Brophy PJ, Peterson A and ffrench-Constant C

    Department of Pathology, University of Cambridge, Cambridge CB2 1QP, England, UK.

    All but the smallest-diameter axons in the central nervous system are myelinated, but the signals that initiate myelination are unknown. Our prior work has shown that integrin signaling forms part of the cell-cell interactions that ensure only those oligodendrocytes contacting axons survive. Here, therefore, we have asked whether integrins regulate the interactions that lead to myelination. Using homologous recombination to insert a single-copy transgene into the hypoxanthine phosphoribosyl transferase (hprt) locus, we find that mice expressing a dominant-negative beta1 integrin in myelinating oligodendrocytes require a larger axon diameter to initiate timely myelination. Mice with a conditional deletion of focal adhesion kinase (a signaling molecule activated by integrins) exhibit a similar phenotype. Conversely, transgenic mice expressing dominant-negative beta3 integrin in oligodendrocytes display no myelination abnormalities. We conclude that beta1 integrin plays a key role in the axoglial interactions that sense axon size and initiate myelination, such that loss of integrin signaling leads to a delay in myelination of small-diameter axons.

    Funded by: Multiple Sclerosis Society: 669; Wellcome Trust

    The Journal of cell biology 2009;185;4;699-712

  • Somatic and germline genetics at the JAK2 locus.

    Campbell PJ

    Wellcome Trust Sanger Institute, Hinxton, UK. pc8@sanger.ac.uk

    Myeloproliferative neoplasms are hematological malignancies frequently associated with somatically acquired mutation of the JAK2 gene. A new study shows that these mutations are preferentially found within a particular inherited JAK2 haplotype, implying the existence of a strong, but uncharacterized, interaction between somatic and germline genetics at this locus.

    Funded by: Wellcome Trust: 088340

    Nature genetics 2009;41;4;385-6

  • Reticulin accumulation in essential thrombocythemia: prognostic significance and relationship to therapy.

    Campbell PJ, Bareford D, Erber WN, Wilkins BS, Wright P, Buck G, Wheatley K, Harrison CN and Green AR

    Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, UK.

    Purpose: Essential thrombocythemia (ET) manifests substantial interpatient heterogeneity in rates of thrombosis, hemorrhage, and disease transformation. Bone marrow histology reflects underlying disease activity in ET but many morphological features show poor reproducibility.

    Patients and methods: We evaluated the clinical significance of bone marrow reticulin, a measure previously shown to have relatively high interobserver reliability, in a large, prospectively-studied cohort of ET patients.

    Results: Reticulin grade positively correlated with white blood cell (P = .05) and platelet counts (P = .0001) at diagnosis. Elevated reticulin levels at presentation predicted higher rates of arterial thrombosis (hazard ratio [HR], 1.8; 95% CI, 1.1 to 2.9; P = .01), major hemorrhage (HR, 2.0; 95% CI, 1.0 to 3.9; P = .05), and myelofibrotic transformation (HR, 5.5; 95% CI, 1.7 to 18.4; P = .0007) independently of known risk factors. Higher reticulin levels at diagnosis were associated with greater subsequent falls in hemoglobin levels in patients treated with anagrelide (P < .0001), but not in those receiving hydroxyurea (P = .9). Moreover, serial trephine specimens in patients randomly assigned to anagrelide showed significantly greater increases in reticulin grade compared with those allocated to hydroxyurea (P = .0003), and four patients who developed increased bone marrow reticulin on anagrelide showed regression of fibrosis when switched to hydroxyurea. These data suggest that patients receiving anagrelide therapy should undergo surveillance bone marrow biopsy every 2 to 3 years and that those who show substantially increasing reticulin levels are at risk of myelofibrotic transformation and may benefit from changing therapy before adverse clinical features develop.

    Conclusion: Our results demonstrate that bone marrow reticulin grade at diagnosis represents an independent prognostic marker in ET, reflecting activity and/or duration of disease, with implications for the monitoring of patients receiving anagrelide.

    Funded by: Cancer Research UK; Medical Research Council; Wellcome Trust

    Journal of clinical oncology : official journal of the American Society of Clinical Oncology 2009;27;18;2991-9

  • TLR9 polymorphisms in African populations: no association with severe malaria, but evidence of cis-variants acting on gene expression.

    Campino S, Forton J, Auburn S, Fry A, Diakite M, Richardson A, Hull J, Jallow M, Sisay-Joof F, Pinder M, Molyneux ME, Taylor TE, Rockett K, Clark TG and Kwiatkowski DP

    Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK. sc11@sanger.ac.uk

    Background: During malaria infection the Toll-like receptor 9 (TLR9) is activated through induction with Plasmodium DNA or another malaria motif not yet identified. Although TLR9 activation by malaria parasites is well reported, its implications for susceptibility to severe malaria are not clear. The aim of this study was to assess the contribution of genetic variation at TLR9 to severe malaria.

    Methods: This study explores the contribution of TLR9 genetic variants to severe malaria using two approaches. First, an association study of four common single nucleotide polymorphisms was performed on both family- and population-based studies from Malawian and Gambian populations (n>6000 individuals). Subsequently, it was assessed whether TLR9 expression is affected by cis-acting variants and if these variants could be mapped. For this work, an allele specific expression (ASE) assay on a panel of HapMap cell lines was carried out.

    Results: No convincing association was found with polymorphisms in TLR9 for malaria severity, in either Gambian or Malawian populations, using both case-control and family based study designs. Using an allele specific expression assay it was observed that TLR9 expression is affected by cis-acting variants; these results were replicated in a second experiment using biological replicates.

    Conclusion: By using the largest cohorts analysed to date, as well as a standardized phenotype definition and study design, no association of TLR9 genetic variants with severe malaria was found. This analysis considered all common variants in the region, but it remains possible that there are rare variants with association signals. This report also shows that TLR9 expression is potentially modulated through cis-regulatory variants, which may lead to differential inflammatory responses to infection between individuals.

    Funded by: Medical Research Council: G0600230, G19/9; Wellcome Trust

    Malaria journal 2009;8;44

  • Evolutionary breakpoints in the gibbon suggest association between cytosine methylation and karyotype evolution.

    Carbone L, Harris RA, Vessere GM, Mootnick AR, Humphray S, Rogers J, Kim SK, Wall JD, Martin D, Jurka J, Milosavljevic A and de Jong PJ

    Children's Hospital and Research Center Oakland, Oakland, California, United States of America. lcarbone@chori.org

    Gibbon species have accumulated an unusually high number of chromosomal changes since diverging from the common hominoid ancestor 15-18 million years ago. The cause of this increased rate of chromosomal rearrangements is not known, nor is it known if genome architecture has a role. To address this question, we analyzed sequences spanning 57 breaks of synteny between northern white-cheeked gibbons (Nomascus l. leucogenys) and humans. We find that the breakpoint regions are enriched in segmental duplications and repeats, with Alu elements being the most abundant. Alus located near the gibbon breakpoints (<150 bp) have a higher CpG content than other Alus. Bisulphite allelic sequencing reveals that these gibbon Alus have a lower average density of methylated cytosine than their human orthologues. The finding of higher CpG content and lower average CpG methylation suggests that the gibbon Alu elements are epigenetically distinct from their human orthologues. The association between undermethylation and chromosomal rearrangement in gibbons suggests a correlation between epigenetic state and structural genome variation in evolution.

    PLoS genetics 2009;5;6;e1000538

  • Genome watch: What a scorcher!

    Cerdeño-Tárraga AM

    This month's Genome Watch looks at the publication of four hyperthermophilic archaeal genomes, three of which belong to the Crenarchaeota phylum and one of which belongs to the newly defined Nanoarchaeota phylum.

    Nature reviews. Microbiology 2009;7;6;408-9

  • Induction of antibody responses to African horse sickness virus (AHSV) in ponies after vaccination with recombinant modified vaccinia Ankara (MVA).

    Chiam R, Sharp E, Maan S, Rao S, Mertens P, Blacklaws B, Davis-Poynter N, Wood J and Castillo-Olivares J

    Animal Health Trust, Lanwades Park, Kentford, Newmarket, Suffolk, United Kingdom.

    Background: African horse sickness virus (AHSV) causes a non-contagious, infectious disease in equids, with mortality rates that can exceed 90% in susceptible horse populations. AHSV vaccines play a crucial role in the control of the disease; however, there are concerns over the use of polyvalent live attenuated vaccines particularly in areas where AHSV is not endemic. Therefore, it is important to consider alternative approaches for AHSV vaccine development. We have carried out a pilot study to investigate the ability of recombinant modified vaccinia Ankara (MVA) vaccines expressing VP2, VP7 or NS3 genes of AHSV to stimulate immune responses against AHSV antigens in the horse.

    Methodology/principal findings: VP2, VP7 and NS3 genes from AHSV-4/Madrid87 were cloned into the vaccinia transfer vector pSC11 and recombinant MVA viruses generated. Antigen expression or transcription of the AHSV genes from cells infected with the recombinant viruses was confirmed. Pairs of ponies were vaccinated with MVAVP2, MVAVP7 or MVANS3 and both MVA vector and AHSV antigen-specific antibody responses were analysed. Vaccination with MVAVP2 induced a strong AHSV neutralising antibody response (VN titre up to a value of 2). MVAVP7 also induced AHSV antigen-specific responses, detected by western blotting. NS3 specific antibody responses were not detected.

    Conclusions: This pilot study demonstrates the immunogenicity of recombinant MVA vectored AHSV vaccines, in particular MVAVP2, and indicates that further work to investigate whether these vaccines would confer protection from lethal AHSV challenge in the horse is justifiable.

    Funded by: Biotechnology and Biological Sciences Research Council

    PloS one 2009;4;6;e5997

  • Lineage-specific biology revealed by a finished genome assembly of the mouse.

    Church DM, Goodstadt L, Hillier LW, Zody MC, Goldstein S, She X, Bult CJ, Agarwala R, Cherry JL, DiCuccio M, Hlavina W, Kapustin Y, Meric P, Maglott D, Birtle Z, Marques AC, Graves T, Zhou S, Teague B, Potamousis K, Churas C, Place M, Herschleb J, Runnheim R, Forrest D, Amos-Landgraf J, Schwartz DC, Cheng Z, Lindblad-Toh K, Eichler EE, Ponting CP and Mouse Genome Sequencing Consortium

    National Center for Biotechnology Information, Bethesda, Maryland, United States of America. church@ncbi.nlm.nih.gov

    The mouse (Mus musculus) is the premier animal model for understanding human disease and development. Here we show that a comprehensive understanding of mouse biology is only possible with the availability of a finished, high-quality genome assembly. The finished clone-based assembly of the mouse strain C57BL/6J reported here has over 175,000 fewer gaps and over 139 Mb more of novel sequence, compared with the earlier MGSCv3 draft genome assembly. In a comprehensive analysis of this revised genome sequence, we are now able to define 20,210 protein-coding genes, over a thousand more than predicted in the human genome (19,042 genes). In addition, we identified 439 long, non-protein-coding RNAs with evidence for transcribed orthologs in human. We analyzed the complex and repetitive landscape of 267 Mb of sequence that was missing or misassembled in the previously published assembly, and we provide insights into the reasons for its resistance to sequencing and assembly by whole-genome shotgun approaches. Duplicated regions within newly assembled sequence tend to be of more recent ancestry than duplicates in the published draft, correcting our initial understanding of recent evolution on the mouse lineage. These duplicates appear to be largely composed of sequence regions containing transposable elements and duplicated protein-coding genes; of these, some may be fixed in the mouse population, but at least 40% of segmentally duplicated sequences are copy number variable even among laboratory mouse strains. Mouse lineage-specific regions contain 3,767 genes drawn mainly from rapidly-changing gene families associated with reproductive functions. The finished mouse genome assembly, therefore, greatly improves our understanding of rodent-specific biology and allows the delineation of ancestral biological functions that are shared with human from derived functions that are not.

    Funded by: Medical Research Council: MC_U127561112, MC_U137761446, MC_U142684174; NHGRI NIH HHS: HG002385, R01 HG002385

    PLoS biology 2009;7;5;e1000112

  • Tumor necrosis factor and lymphotoxin-alpha polymorphisms and severe malaria in African populations.

    Clark TG, Diakite M, Auburn S, Campino S, Fry AE, Green A, Richardson A, Small K, Teo YY, Wilson J, Jallow M, Sisay-Joof F, Pinder M, Griffiths MJ, Peshu N, Williams TN, Marsh K, Molyneux ME, Taylor TE, Rockett KA and Kwiatkowski DP

    Wellcome Trust Centre for Human Genetics, University of Oxford, Nuffield Department of Medicine, John Radcliffe Hospital, Oxford, United Kingdom. tgc@well.ox.ac.uk

    The tumor necrosis factor gene (TNF) and lymphotoxin-alpha gene (LTA) have long attracted attention as candidate genes for susceptibility traits for malaria, and several of their polymorphisms have been found to be associated with severe malaria (SM) phenotypes. In a large study involving >10,000 individuals and encompassing 3 African populations, we found evidence to support the reported associations between the TNF -238 polymorphism and SM in The Gambia. However, no TNF/LTA polymorphisms were found to be associated with SM in cohorts in Kenya and Malawi. It has been suggested that the causal polymorphisms regulating the TNF and LTA responses may be located some distance from the genes. Therefore, more-detailed mapping of variants across TNF/LTA genes and their flanking regions in the Gambian and allied populations may need to be undertaken to find any causal polymorphisms.

    Funded by: Medical Research Council: G0600230, G0600718, G19/9; Wellcome Trust: 076934

    The Journal of infectious diseases 2009;199;4;569-75

  • Neurotransmitters drive combinatorial multistate postsynaptic density networks.

    Coba MP, Pocklington AJ, Collins MO, Kopanitsa MV, Uren RT, Swamy S, Croning MD, Choudhary JS and Grant SG

    Genes to Cognition, Wellcome Trust Sanger Institute, Cambridgeshire, UK.

    The mammalian postsynaptic density (PSD) comprises a complex collection of approximately 1100 proteins. Despite extensive knowledge of individual proteins, the overall organization of the PSD is poorly understood. Here, we define maps of molecular circuitry within the PSD based on phosphorylation of postsynaptic proteins. Activation of a single neurotransmitter receptor, the N-methyl-D-aspartate receptor (NMDAR), changed the phosphorylation status of 127 proteins. Stimulation of ionotropic and metabotropic glutamate receptors and dopamine receptors activated overlapping networks with distinct combinatorial phosphorylation signatures. Using peptide array technology, we identified specific phosphorylation motifs and switching mechanisms responsible for the integration of neurotransmitter receptor pathways and their coordination of multiple substrates in these networks. These combinatorial networks confer high information-processing capacity and functional diversity on synapses, and their elucidation may provide new insights into disease mechanisms and new opportunities for drug discovery.

    Funded by: Medical Research Council: G90/93; Wellcome Trust: 066717

    Science signaling 2009;2;68;ra19

  • Neuroproteomics

    Collins MO, Grant SGN

    Encyclopedia of Neuroscience 2009;971-81

  • Genomic analysis reveals extensive gene duplication within the bovine TRB locus.

    Connelley T, Aerts J, Law A and Morrison WI

    The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Roslin, EH25 9RG, UK. timothy.connelley@ed.ac.uk

    Background: Diverse TR and IG repertoires are generated by V(D)J somatic recombination. Genomic studies have been pivotal in cataloguing the V, D, J and C genes present in the various TR/IG loci and describing how duplication events have expanded the number of these genes. Such studies have also provided insights into the evolution of these loci and the complex mechanisms that regulate TR/IG expression. In this study we analyze the sequence of the third bovine genome assembly to characterize the germline repertoire of bovine TRB genes and compare the organization, evolution and regulatory structure of the bovine TRB locus with that of humans and mice.

    Results: The TRB locus in the third bovine genome assembly is distributed over 5 scaffolds, extending to approximately 730 Kb. The available sequence contains 134 TRBV genes, assigned to 24 subgroups, and 3 clusters of DJC genes, each comprising a single TRBD gene, 5-7 TRBJ genes and a single TRBC gene. Seventy-nine of the TRBV genes are predicted to be functional. Comparison with the human and murine TRB loci shows that the gene order, as well as the sequences of non-coding elements that regulate TRB expression, are highly conserved in the bovine. Dot-plot analyses demonstrate that expansion of the genomic TRBV repertoire has occurred via a complex and extensive series of duplications, predominantly involving DNA blocks containing multiple genes. These duplication events have resulted in massive expansion of several TRBV subgroups, most notably TRBV6, 9 and 21 which contain 40, 35 and 16 members respectively. Similarly, duplication has led to the generation of a third DJC cluster. Analysis of cDNA data confirms the diversity of the TRBV genes and, in addition, identifies a substantial number of TRBV genes, predominantly from the larger subgroups, which are still absent from the genome assembly. The observed gene duplication within the bovine TRB locus has created a repertoire of phylogenetically diverse functional TRBV genes, which is substantially larger than that described for humans and mice.

    Conclusion: The analyses completed in this study reveal that, although the gene content and organization of the bovine TRB locus are broadly similar to that of humans and mice, multiple duplication events have led to a marked expansion in the number of TRB genes. Similar expansions in other ruminant TR loci suggest strong evolutionary pressures in this lineage have selected for the development of enlarged sets of TR genes that can contribute to diverse TR repertoires.

    Funded by: Wellcome Trust: 075820

    BMC genomics 2009;10;192

  • First report of human infection with Salmonella enterica serovar Apapa resulting from exposure to a pet lizard.

    Cooke FJ, De Pinna E, Maguire C, Guha S, Pickard DJ, Farrington M and Threlfall EJ

    Addenbrooke's Hospital, Cambridge, United Kingdom. fiona.cooke@addenbrookes.nhs.uk

    We present the first documented human case of Salmonella enterica serovar Apapa infection, isolated concurrently from a hospital inpatient and a pet lizard. The isolates were identical by biochemical profiling and pulsed-field gel electrophoresis. This rare serotype is known to be associated with reptiles. The current practice for avoiding reptile-associated infections is reviewed.

    Journal of clinical microbiology 2009;47;8;2672-4

  • Large scale association analysis of novel genetic loci for coronary artery disease.

    Coronary Artery Disease Consortium, Samani NJ, Deloukas P, Erdmann J, Hengstenberg C, Kuulasmaa K, McGinnis R, Schunkert H, Soranzo N, Thompson J, Tiret L and Ziegler A

    Background: Combined analysis of 2 genome-wide association studies in cases enriched for family history recently identified 7 loci (on 1p13.3, 1q41, 2q36.3, 6q25.1, 9p21, 10q11.21, and 15q22.33) that may affect risk of coronary artery disease (CAD). Apart from the 9p21 locus, the other loci await substantive replication. Furthermore, the effect of these loci on CAD risk in a broader range of individuals remains to be determined.

    Methods and results: We undertook association analysis of single nucleotide polymorphisms at each locus with CAD risk in 11,550 cases and 11,205 controls from 9 European studies. The 9p21.3 locus showed unequivocal association (rs1333049, combined odds ratio [OR]=1.20, 95% CI [1.16 to 1.25], probability value=2.81 x 10(-21)). We also confirmed association signals at 1p13.3 (rs599839, OR=1.13 [1.08 to 1.19], P=1.44 x 10(-7)), 1q41 (rs3008621, OR=1.10 [1.04 to 1.17], P=1.02 x 10(-3)), and 10q11.21 (rs501120, OR=1.11 [1.05 to 1.18], P=4.34 x 10(-4)). The associations with 6q25.1 (rs6922269, P=0.020) and 2q36.3 (rs2943634, P=0.032) were borderline and not statistically significant after correction for multiple testing. The 15q22.33 locus did not replicate. The 10q11.21 locus showed a possible sex interaction (P=0.015), with a significant effect in women (OR=1.29 [1.15 to 1.45], P=1.86 x 10(-5)) but not men (OR=1.03 [0.96 to 1.11], P=0.387). There were no other strong interactions of any of the loci with other traditional risk factors. The loci at 9p21, 1p13.3, 2q36.3, and 10q11.21 acted independently and cumulatively increased CAD risk by 15% (12% to 18%), per additional risk allele.

    Conclusions: The findings provide strong evidence for association between at least 4 genetic loci and CAD risk. Cumulatively, these novel loci have a significant impact on risk of CAD at least in European populations.

    Funded by: British Heart Foundation: CH/03/001/15569, RG/08/014/24067; Medical Research Council: G0401527, G0701863, MC_U106179471; Wellcome Trust: 077011, 082371, 091746

    Arteriosclerosis, thrombosis, and vascular biology 2009;29;5;774-80

  • From small reads do mighty genomes grow.

    Croucher NJ

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK. microbes@sanger.ac.uk

    This month's Genome Watch discusses the use of next-generation sequencing technologies to assemble draft genomes for two pseudomonad species.

    Nature reviews. Microbiology 2009;7;9;621

  • Influenza-specific amino acid substitution model

    Cuong DC, Vinh LS, Quang LS

    KSE 2009 - The 1st International Conference on Knowledge and Systems Engineering. 2009;5361735;19-25

  • X-box binding protein 1 contributes to induction of the Kaposi's sarcoma-associated herpesvirus lytic cycle under hypoxic conditions.

    Dalton-Griffin L, Wilson SJ and Kellam P

    Department of Infection, UCL, London, United Kingdom.

    Kaposi's sarcoma-associated herpesvirus (KSHV), like other herpesviruses, has two stages to its life cycle: latency and lytic replication. KSHV is required for development of Kaposi's sarcoma, a tumor of endothelial origin, and is associated with the B-cell tumor primary effusion lymphoma (PEL) and the plasmablastic variant of multicentric Castleman's disease, all of which are characterized by predominantly latent KSHV infection. Recently, we and others have shown that the activated form of transcription factor X-box binding protein 1 (XBP-1) is a physiological trigger of KSHV lytic reactivation in PEL. Here, we show that XBP-1s transactivates the ORF50/RTA promoter through an ACGT core containing the XBP-1 response element, an element previously identified as a weakly active hypoxia response element (HRE). Hypoxia induces the KSHV lytic cycle, and active HREs that respond to hypoxia-inducible factor 1alpha are present in the ORF50/RTA promoter. Hypoxia also induces active XBP-1s, and here, we show that both transcription factors contribute to the induction of RTA expression, leading to the production of infectious KSHV under hypoxic conditions.

    Funded by: Cancer Research UK; Medical Research Council; Wellcome Trust

    Journal of virology 2009;83;14;7202-9

  • A truncation mutation in TBC1D4 in a family with acanthosis nigricans and postprandial hyperinsulinemia.

    Dash S, Sano H, Rochford JJ, Semple RK, Yeo G, Hyden CS, Soos MA, Clark J, Rodin A, Langenberg C, Druet C, Fawcett KA, Tung YC, Wareham NJ, Barroso I, Lienhard GE, O'Rahilly S and Savage DB

    Departments of Medicine and Clinical Biochemistry, University of Cambridge, Addenbrooke's Hospital, Cambridge, United Kingdom.

    Tre-2, BUB2, CDC16, 1 domain family member 4 (TBC1D4) (AS160) is a Rab-GTPase activating protein implicated in insulin-stimulated glucose transporter 4 (GLUT4) translocation in adipocytes and myotubes. To determine whether loss-of-function mutations in TBC1D4 might impair GLUT4 translocation and cause insulin resistance in humans, we screened the coding regions of this gene in 156 severely insulin-resistant patients. A female presenting at age 11 years with acanthosis nigricans and extreme postprandial hyperinsulinemia was heterozygous for a premature stop mutation (R363X) in TBC1D4. After demonstrating reduced expression of wild-type TBC1D4 protein and expression of the truncated protein in lymphocytes from the proband, we further characterized the biological effects of the truncated protein in 3T3L1 adipocytes. Prematurely truncated TBC1D4 protein tended to increase basal cell membrane GLUT4 levels (P = 0.053) and significantly reduced insulin-stimulated GLUT4 cell membrane translocation (P < 0.05). When coexpressed with wild-type TBC1D4, the truncated protein dimerized with full-length TBC1D4, suggesting that the heterozygous truncated variant might interfere with its wild-type counterpart in a dominant negative fashion. Two overweight family members with the mutation also manifested normal fasting glucose and insulin levels but disproportionately elevated insulin levels following an oral glucose challenge. This family provides unique genetic evidence of TBC1D4 involvement in human insulin action.

    Funded by: British Heart Foundation; Medical Research Council: G0600414; NCI NIH HHS: P30 CA023108; NIDDK NIH HHS: DK25336, R01 DK025336; Wellcome Trust

    Proceedings of the National Academy of Sciences of the United States of America 2009;106;23;9350-5

  • Genetics, gene expression and bioinformatics of the pituitary gland.

    Davis SW, Potok MA, Brinkmeier ML, Carninci P, Lyons RH, MacDonald JW, Fleming MT, Mortensen AH, Egashira N, Ghosh D, Steel KP, Osamura RY, Hayashizaki Y and Camper SA

    University of Michigan, Ann Arbor, MI 48109-0618, USA.

    Genetic cases of congenital pituitary hormone deficiency are common and many are caused by transcription factor defects. Mouse models with orthologous mutations are invaluable for uncovering the molecular mechanisms that lead to problems in organ development and typical patient characteristics. We are using mutant mice defective in the transcription factors PROP1 and POU1F1 for gene expression profiling to identify target genes for these critical transcription factors and candidates for cases of pituitary hormone deficiency of unknown aetiology. These studies reveal critical roles for Wnt signalling pathways, including the TCF/LEF transcription factors and interacting proteins of the groucho family, bone morphogenetic protein antagonists and targets of notch signalling. Current studies are investigating the roles of novel homeobox genes and pathways that regulate the transition from proliferation to differentiation, cell adhesion and cell migration. Pituitary adenomas are a common human health problem, yet most cases are sporadic, necessitating alternative approaches to traditional Mendelian genetic studies. Mouse models of adenoma formation offer the opportunity for gene expression profiling during progressive stages of hyperplasia, adenoma and tumorigenesis. This approach holds promise for the identification of relevant pathways and candidate genes as risk factors for adenoma formation, understanding mechanisms of progression, and identifying drug targets and clinically relevant biomarkers.

    Funded by: NICHD NIH HHS: HD R3730428, R01 HD030428-07, R01 HD034283-15, R01 HD34283, R37 HD030428-19; Wellcome Trust

    Hormone research 2009;71 Suppl 2;101-15

  • A common variant associated with dyslexia reduces expression of the KIAA0319 gene.

    Dennis MY, Paracchini S, Scerri TS, Prokunina-Olsson L, Knight JC, Wade-Martins R, Coggill P, Beck S, Green ED and Monaco AP

    Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.

    Numerous genetic association studies have implicated the KIAA0319 gene on human chromosome 6p22 in dyslexia susceptibility. The causative variant(s) remains unknown but may modulate gene expression, given that (1) a dyslexia-associated haplotype has been implicated in the reduced expression of KIAA0319, and (2) the strongest association has been found for the region spanning exon 1 of KIAA0319. Here, we test the hypothesis that variant(s) responsible for reduced KIAA0319 expression resides on the risk haplotype close to the gene's transcription start site. We identified seven single-nucleotide polymorphisms on the risk haplotype immediately upstream of KIAA0319 and determined that three of these are strongly associated with multiple reading-related traits. Using luciferase-expressing constructs containing the KIAA0319 upstream region, we characterized the minimal promoter and additional putative transcriptional regulator regions. This revealed that the minor allele of rs9461045, which shows the strongest association with dyslexia in our sample (max p-value = 0.0001), confers reduced luciferase expression in both neuronal and non-neuronal cell lines. Additionally, we found that the presence of this rs9461045 dyslexia-associated allele creates a nuclear protein-binding site, likely for the transcriptional silencer OCT-1. Knocking down OCT-1 expression in the neuronal cell line SHSY5Y using an siRNA restores KIAA0319 expression from the risk haplotype to nearly that seen from the non-risk haplotype. Our study thus pinpoints a common variant as altering the function of a dyslexia candidate gene and provides an illustrative example of the strategic approach needed to dissect the molecular basis of complex genetic traits.

    Funded by: Wellcome Trust: 074318

    PLoS genetics 2009;5;3;e1000436

  • The evolution of protein functions and networks: a family-centric approach.

    Dessailly BH, Reid AJ, Yeats C, Lees JG, Cuff A and Orengo CA

    Department of Structural and Molecular Biology, University College London, London WC1E 6BT, UK.

    The study of superfamilies of protein domains using a combination of structure, sequence and function data provides insights into deep evolutionary history. In the present paper, analyses of functional diversity within such superfamilies as defined in the CATH-Gene3D resource are described. These analyses focus on structure-function relationships in very large and diverse superfamilies, and on the evolution of domain superfamily members in protein-protein complexes.

    Funded by: Biotechnology and Biological Sciences Research Council

    Biochemical Society transactions 2009;37;Pt 4;745-50

  • A genetic association study in the Gambia using tagging polymorphisms in the major histocompatibility complex class III region implicates a HLA-B associated transcript 2 polymorphism in severe malaria susceptibility.

    Diakite M, Clark TG, Auburn S, Campino S, Fry AE, Green A, Morris AP, Richardson A, Jallow M, Sisay-Joof F, Pinder M, Kwiatkowski DP and Rockett KA

    Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK.

    The tumour necrosis factor (TNF) gene and other genes flanking it in the major histocompatibility complex (MHC) class III region are potentially important mediators of both immunity and pathogenesis of malaria. We investigated the association of severe malaria with 11 haplotype tagging-polymorphisms for 11 MHC class III candidate genes, including TNF, lymphotoxin alpha (LTA), allograft inflammatory factor 1 (AIF1), and HLA-B associated transcript 2 (BAT2). An analysis of 2,162 case-controls demonstrated the first evidence of association between a BAT2 polymorphism (rs1046089) and severe malaria.

    Funded by: Medical Research Council: G0200454(62635), G0600230(77610), G0600718(80133); Wellcome Trust: 077383, 081682, 082370

    Human genetics 2009;125;1;105-9

  • Common regulatory variation impacts gene expression in a cell type-dependent manner.

    Dimas AS, Deutsch S, Stranger BE, Montgomery SB, Borel C, Attar-Cohen H, Ingle C, Beazley C, Gutierrez Arcelus M, Sekowska M, Gagnebin M, Nisbett J, Deloukas P, Dermitzakis ET and Antonarakis SE

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, CB10 1HH, Cambridge, UK.

    Studies correlating genetic variation to gene expression facilitate the interpretation of common human phenotypes and disease. As functional variants may be operating in a tissue-dependent manner, we performed gene expression profiling and association with genetic variants (single-nucleotide polymorphisms) on three cell types of 75 individuals. We detected cell type-specific genetic effects, with 69 to 80% of regulatory variants operating in a cell type-specific manner, and identified multiple expression quantitative trait loci (eQTLs) per gene, unique or shared among cell types and positively correlated with the number of transcripts per gene. Cell type-specific eQTLs were found at larger distances from genes and at lower effect size, similar to known enhancers. These data suggest that the complete regulatory variant repertoire can only be uncovered in the context of cell-type specificity.

    Funded by: Wellcome Trust: 077011, 077046

    Science (New York, N.Y.) 2009;325;5945;1246-50

  • Ectopic recombination of a malaria var gene during mitosis associated with an altered var switch rate.

    Duffy MF, Byrne TJ, Carret C, Ivens A and Brown GV

    Department of Medicine at RMH, University of Melbourne, Parkville 3050, Australia. mduffy@unimelb.edu.au

    The Plasmodium falciparum var multigene family encodes P. falciparum erythrocyte membrane protein 1, which is responsible for the pathogenic traits of antigenic variation and adhesion of infected erythrocytes to host receptors during malaria infection. Clonal antigenic variation of P. falciparum erythrocyte membrane protein 1 is controlled by the switching between exclusively transcribed var genes. The tremendous diversity of the var gene repertoire both within and between parasite strains is critical for the parasite's strategy of immune evasion. We show that ectopic recombination between var genes occurs during mitosis, providing P. falciparum with opportunities to diversify its var repertoire, even during the course of a single infection. We show that the regulation of the recombined var gene has been disrupted, resulting in its persistent activation although the regulation of most other var genes is unaffected. The var promoter and intron of the recombined var gene are not responsible for its atypically persistent activity, and we conclude that altered subtelomeric cis sequence is the most likely cause of the persistent activity of the recombined var gene.

    Journal of molecular biology 2009;389;3;453-69

  • Genetic Loci associated with C-reactive protein levels and risk of coronary heart disease.

    Elliott P, Chambers JC, Zhang W, Clarke R, Hopewell JC, Peden JF, Erdmann J, Braund P, Engert JC, Bennett D, Coin L, Ashby D, Tzoulaki I, Brown IJ, Mt-Isa S, McCarthy MI, Peltonen L, Freimer NB, Farrall M, Ruokonen A, Hamsten A, Lim N, Froguel P, Waterworth DM, Vollenweider P, Waeber G, Jarvelin MR, Mooser V, Scott J, Hall AS, Schunkert H, Anand SS, Collins R, Samani NJ, Watkins H and Kooner JS

    Faculty of Medicine, Imperial College London, London, United Kingdom. p.elliott@imperial.ac.uk

    Context: Plasma levels of C-reactive protein (CRP) are independently associated with risk of coronary heart disease, but whether CRP is causally associated with coronary heart disease or merely a marker of underlying atherosclerosis is uncertain.

    Objective: To investigate association of genetic loci with CRP levels and risk of coronary heart disease.

    Design, setting, and participants: We first carried out a genome-wide association (n = 17,967) and replication study (n = 13,615) to identify genetic loci associated with plasma CRP concentrations. Data collection took place between 1989 and 2008 and genotyping between 2003 and 2008. We carried out a mendelian randomization study of the most closely associated single-nucleotide polymorphism (SNP) in the CRP locus and published data on other CRP variants involving a total of 28,112 cases and 100,823 controls, to investigate the association of CRP variants with coronary heart disease. We compared our finding with that predicted from meta-analysis of observational studies of CRP levels and risk of coronary heart disease. For the other loci associated with CRP levels, we selected the most closely associated SNP for testing against coronary heart disease among 14,365 cases and 32,069 controls.

    Main outcome measure: Risk of coronary heart disease.

    Results: Polymorphisms in 5 genetic loci were strongly associated with CRP levels (% difference per minor allele): SNP rs6700896 in LEPR (-14.8%; 95% confidence interval [CI], -17.6% to -12.0%; P = 6.2 x 10(-22)), rs4537545 in IL6R (-11.5%; 95% CI, -14.4% to -8.5%; P = 1.3 x 10(-12)), rs7553007 in the CRP locus (-20.7%; 95% CI, -23.4% to -17.9%; P = 1.3 x 10(-38)), rs1183910 in HNF1A (-13.8%; 95% CI, -16.6% to -10.9%; P = 1.9 x 10(-18)), and rs4420638 in APOE-CI-CII (-21.8%; 95% CI, -25.3% to -18.1%; P = 8.1 x 10(-26)). Association of SNP rs7553007 in the CRP locus with coronary heart disease gave an odds ratio (OR) of 0.98 (95% CI, 0.94 to 1.01) per 20% lower CRP level. Our mendelian randomization study of variants in the CRP locus showed no association with coronary heart disease: OR, 1.00; 95% CI, 0.97 to 1.02; per 20% lower CRP level, compared with OR, 0.94; 95% CI, 0.94 to 0.95; predicted from meta-analysis of the observational studies of CRP levels and coronary heart disease (z score, -3.45; P < .001). SNPs rs6700896 in LEPR (OR, 1.06; 95% CI, 1.02 to 1.09; per minor allele), rs4537545 in IL6R (OR, 0.94; 95% CI, 0.91 to 0.97), and rs4420638 in the APOE-CI-CII cluster (OR, 1.16; 95% CI, 1.12 to 1.21) were all associated with risk of coronary heart disease.

    Conclusion: The lack of concordance between the effect on coronary heart disease risk of CRP genotypes and CRP levels argues against a causal association of CRP with coronary heart disease.

    Funded by: British Heart Foundation: LSHM33 CT-2007-037273, SP/04/002; Cancer Research UK; Department of Health; Medical Research Council; NHLBI NIH HHS: 5R01HL087679-02; NIMH NIH HHS: 1RL1MH083268-01; Wellcome Trust: 089061

    JAMA : the journal of the American Medical Association 2009;302;1;37-48

  • Genome-wide association study identifies variants at 9p21 and 22q13 associated with development of cutaneous nevi.

    Falchi M, Bataille V, Hayward NK, Duffy DL, Bishop JA, Pastinen T, Cervino A, Zhao ZZ, Deloukas P, Soranzo N, Elder DE, Barrett JH, Martin NG, Bishop DT, Montgomery GW and Spector TD

    Department of Twin Research & Genetic Epidemiology, King's College London, St. Thomas' Hospital Campus, London, UK. m.falchi@imperial.ac.uk

    A high melanocytic nevi count is the strongest known risk factor for cutaneous melanoma. We conducted a genome-wide association study for nevus count using 297,108 SNPs in 1,524 twins, with validation in an independent cohort of 4,107 individuals. We identified strongly associated variants in MTAP, a gene adjacent to the familial melanoma susceptibility locus CDKN2A on 9p21 (rs4636294, combined P = 3.4 x 10(-15)), as well as in PLA2G6 on 22q13.1 (rs2284063, combined P = 3.4 x 10(-8)). In addition, variants in these two loci showed association with melanoma risk in 3,131 melanoma cases from two independent studies, including rs10757257 at 9p21, combined P = 3.4 x 10(-8), OR = 1.23 (95% CI = 1.15-1.30) and rs132985 at 22q13.1, combined P = 2.6 x 10(-7), OR = 1.23 (95% CI = 1.15-1.30). This provides the first report of common variants associated to nevus number and demonstrates association of these variants with melanoma susceptibility.

    Funded by: Cancer Research UK: 10589, C588/A4994; Department of Health; NCI NIH HHS: CA88363, R01 CA083115-08, R01 CA83115; Wellcome Trust: 077011, 091746

    Nature genetics 2009;41;8;915-9

  • Targeted tandem affinity purification of PSD-95 recovers core postsynaptic complexes and schizophrenia susceptibility proteins.

    Fernández E, Collins MO, Uren RT, Kopanitsa MV, Komiyama NH, Croning MD, Zografos L, Armstrong JD, Choudhary JS and Grant SG

    Genes to Cognition Programme, The Wellcome Trust Sanger Institute, Cambridge, UK.

    The molecular complexity of mammalian proteomes demands new methods for mapping the organization of multiprotein complexes. Here, we combine mouse genetics and proteomics to characterize synapse protein complexes and interaction networks. New tandem affinity purification (TAP) tags were fused to the carboxyl terminus of PSD-95 using gene targeting in mice. Homozygous mice showed no detectable abnormalities in PSD-95 expression, subcellular localization or synaptic electrophysiological function. Analysis of multiprotein complexes purified under native conditions by mass spectrometry defined known and new interactors: 118 proteins comprising crucial functional components of synapses, including glutamate receptors, K+ channels, scaffolding and signaling proteins, were recovered. Network clustering of protein interactions generated five connected clusters, with two clusters containing all the major ionotropic glutamate receptors and one cluster with voltage-dependent K+ channels. Annotation of clusters with human disease associations revealed that multiple disorders map to the network, with a significant correlation of schizophrenia within the glutamate receptor clusters. This targeted TAP tagging strategy is generally applicable to mammalian proteomics and systems biology approaches to disease.

    Funded by: Wellcome Trust

    Molecular systems biology 2009;5;269

  • Convergent extension movements and ciliary function are mediated by ofd1, a zebrafish orthologue of the human oral-facial-digital type 1 syndrome gene.

    Ferrante MI, Romio L, Castro S, Collins JE, Goulding DA, Stemple DL, Woolf AS and Wilson SW

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SA, UK.

    In humans, OFD1 is mutated in oral-facial-digital type I syndrome leading to prenatal death in hemizygous males and dysmorphic faces and brain malformations, with polycystic kidneys presenting later in life in heterozygous females. To elucidate the function of Ofd1, we have studied its function during zebrafish embryonic development. In wild-type embryos, ofd1 mRNA is widely expressed and Ofd1-green fluorescent protein (GFP) fusion localizes to the centrosome/basal body. Disrupting Ofd1 using antisense morpholinos (MOs) led to bent body axes, hydrocephalus and oedema. Laterality was randomized in the brain, heart and viscera, likely a consequence of shorter cilia with disrupted axonemes and perturbed intravesicular fluid flow in Kupffer's vesicle. Embryos injected with ofd1 MOs also displayed convergent extension (CE) defects, which were enhanced by loss of Slb/Wnt11 or Tri/Vangl2, two proteins functioning in a non-canonical Wnt/Planar Cell Polarity (PCP) pathway. Pronephric glomerular midline fusion was compromised in vangl2 and ofd1 loss of function embryos and we suggest this anomaly may be a novel CE defect. Thus, Ofd1 is required for ciliary motility and function in zebrafish, supporting data showing that Ofd1 is essential for primary cilia function in mice. In addition, our data show that Ofd1 is important for CE during gastrulation, consistent with data linking primary cilia and non-canonical Wnt/PCP signalling.

    Funded by: Biotechnology and Biological Sciences Research Council; Wellcome Trust: 075311, WT077037/Z/05/Z

    Human molecular genetics 2009;18;2;289-303

  • Meeting Report: "Metagenomics, Metadata and Meta-analysis" (M3) Special Interest Group at ISMB 2009.

    Field D, Friedberg I, Sterk P, Kottmann R, Glöckner FO, Hirschman L, Garrity GM, Cochrane G, Wooley J and Gilbert J

    This report summarizes the proceedings of the "Metagenomics, Metadata and Meta-analysis" (M3) Special Interest Group (SIG) meeting held at the Intelligent Systems for Molecular Biology 2009 conference. The Genomic Standards Consortium (GSC) hosted this meeting to explore the bottlenecks and emerging solutions for obtaining biological insights through large-scale comparative analysis of metagenomic datasets. The M3 SIG included 16 talks, half of which were selected from submitted abstracts, a poster session and a panel discussion involving members of the GSC Board. This report summarizes this one-day SIG, attempts to identify shared themes and recapitulates community recommendations for the future of this field. The GSC will also host an M3 workshop at the Pacific Symposium on Biocomputing (PSB) in January 2010. Further information about the GSC and its range of activities can be found at http://gensc.org/.

    Standards in genomic sciences 2009;1;3;278-82

  • Meeting Report from the Genomic Standards Consortium (GSC) Workshops 6 and 7.

    Field D, Sterk P, Kyrpides N, Kottmann R, Glöckner FO, Hirschman L, Garrity GM, Wooley J and Gilna P

    This report summarizes the proceedings of the 6th and 7th workshops of the Genomic Standards Consortium (GSC), held back-to-back in 2008. GSC 6 focused on furthering the activities of GSC working groups, while GSC 7 focused on outreach to the wider community. GSC 6 was held October 10-14, 2008 at the European Bioinformatics Institute, Cambridge, United Kingdom and included a two-day workshop focused on the refinement of the Genomic Contextual Data Markup Language (GCDML). GSC 7 was held as the opening day of the International Congress on Metagenomics 2008 in San Diego, California. Major achievements of these combined meetings included an agreement from the International Nucleotide Sequence Database Consortium (INSDC) to create a "MIGS" keyword for capturing "Minimum Information about a Genome Sequence" compliant information within INSDC (DDBJ/EMBL/GenBank) records, launch of GCDML 1.0, MIGS compliance of the first set of "Genomic Encyclopedia of Bacteria and Archaea" project genomes, approval of a proposal to extend MIGS to 16S rRNA sequences within a "Minimum Information about an Environmental Sequence", finalization of plans for the GSC eJournal, "Standards in Genomic Sciences" (SIGS), and the formation of a GSC Board. Subsequently, the GSC has been awarded a Research Co-ordination Network (RCN4GSC) grant from the National Science Foundation, held the first SIGS workshop and launched the journal. The GSC will also be hosting outreach workshops at both ISMB 2009 and PSB 2010 focused on "Metagenomics, Metadata and MetaAnalysis" (M(3)). Further information about the GSC and its range of activities can be found at http://gensc.org, including videos of all the presentations at GSC 7.

    Standards in genomic sciences 2009;1;1;68-71

  • DECIPHER: Database of Chromosomal Imbalance and Phenotype in Humans Using Ensembl Resources.

    Firth HV, Richards SM, Bevan AP, Clayton S, Corpas M, Rajan D, Van Vooren S, Moreau Y, Pettett RM and Carter NP

    Cambridge University Department of Medical Genetics, Addenbrooke's Hospital, Cambridge CB2 2QQ, UK. hvf21@cam.ac.uk

    Many patients suffering from developmental disorders harbor submicroscopic deletions or duplications that, by affecting the copy number of dosage-sensitive genes or disrupting normal gene expression, lead to disease. However, many aberrations are novel or extremely rare, making clinical interpretation problematic and genotype-phenotype correlations uncertain. Identification of patients sharing a genomic rearrangement and having phenotypic features in common leads to greater certainty in the pathogenic nature of the rearrangement and enables new syndromes to be defined. To facilitate the analysis of these rare events, we have developed an interactive web-based database called DECIPHER (Database of Chromosomal Imbalance and Phenotype in Humans Using Ensembl Resources) which incorporates a suite of tools designed to aid the interpretation of submicroscopic chromosomal imbalance, inversions, and translocations. DECIPHER catalogs common copy-number changes in normal populations and thus, by exclusion, enables changes that are novel and potentially pathogenic to be identified. DECIPHER enhances genetic counseling by retrieving relevant information from a variety of bioinformatics resources. Known and predicted genes within an aberration are listed in the DECIPHER patient report, and genes of recognized clinical importance are highlighted and prioritized. DECIPHER enables clinical scientists worldwide to maintain records of phenotype and chromosome rearrangement for their patients and, with informed consent, share this information with the wider clinical research community through display in the genome browser Ensembl. By sharing cases worldwide, clusters of rare cases having phenotype and structural rearrangement in common can be identified, leading to the delineation of new syndromes and furthering understanding of gene function.

    Funded by: Wellcome Trust: WT077008

    American journal of human genetics 2009;84;4;524-33

  • Genetic association study for RSV bronchiolitis in infancy at the 5q31 cytokine cluster.

    Forton JT, Rowlands K, Rockett K, Hanchard N, Herbert M, Kwiatkowski DP and Hull J

    The Wellcome Trust Centre for Human Genetics, University of Oxford, UK. julian.forton@paediatrics.ox.ac.uk

    Background: The pathophysiological basis of severe respiratory syncytial virus (RSV) bronchiolitis in infancy is poorly understood and has hindered vaccine development. Studies implicate the cell-mediated immune response in the pathogenesis of the disease. A recent twin study estimated a heritable contribution of 22% to RSV bronchiolitis. Genetic epidemiology provides a new approach to identifying important immune determinants of disease severity.

    Methods: A comprehensive high-density gene-region association study for severe RSV bronchiolitis in infancy at 5q31 across 11 genes including the Th2-cytokine cluster was performed. A haplotype tagging approach was used to analyse genetic variation at 113 single nucleotide polymorphisms (SNPs) in 780 independent cases and 1045 controls. The study had sufficient power to detect small effects, perform extensive haplotype analysis and analyse both a principal phenotype and a refined age-limited phenotype enriched for first-exposure RSV infection.

    Results: SNP associations were found at IL4 and a highly significant risk haplotype was identified across IL13 CNS-1 and IL4 (odds ratio 1.69, p<0.0001), present in both case-control and family-based analyses. All associations were strongest for a phenotype limited to <6 months of age, implicating this locus in primary RSV disease. The same risk haplotype has previously been shown to be associated with increased IL13 expression.

    Conclusions: A haplotype at IL13-IL4, which is associated with increased IL13 production, confers an increased risk of severe primary RSV bronchiolitis in early infancy. This study, together with previous studies implicating the same locus in atopic sensitisation, suggests that primary RSV bronchiolitis and atopy share a genetic contribution at the IL13-IL4 locus.

    Funded by: Medical Research Council; Wellcome Trust: 071472, 082370

    Thorax 2009;64;4;345-52

  • Rfam: updates to the RNA families database.

    Gardner PP, Daub J, Tate JG, Nawrocki EP, Kolbe DL, Lindgreen S, Wilkinson AC, Finn RD, Griffiths-Jones S, Eddy SR and Bateman A

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK. pg5@sanger.ac.uk

    Rfam is a collection of RNA sequence families, represented by multiple sequence alignments and covariance models (CMs). The primary aim of Rfam is to annotate new members of known RNA families on nucleotide sequences, particularly complete genomes, using sensitive BLAST filters in combination with CMs. A minority of families with a very broad taxonomic range (e.g. tRNA and rRNA) provide the majority of the sequence annotations, whilst the majority of Rfam families (e.g. snoRNAs and miRNAs) have a limited taxonomic range and provide a limited number of annotations. Recent improvements to the website, methodologies and data used by Rfam are discussed. Rfam is freely available on the Web at http://rfam.sanger.ac.uk/ and http://rfam.janelia.org/.

    Funded by: Howard Hughes Medical Institute; Wellcome Trust: 077044

    Nucleic acids research 2009;37;Database issue;D136-40

  • A home for RNA families at RNA Biology.

    Gardner PP and Bateman AG

    RNA biology 2009;6;2-4

  • Reduced TFAP2A function causes variable optic fissure closure and retinal defects and sensitizes eye development to mutations in other morphogenetic regulators.

    Gestri G, Osborne RJ, Wyatt AW, Gerrelli D, Gribble S, Stewart H, Fryer A, Bunyan DJ, Prescott K, Collin JR, Fitzgerald T, Robinson D, Carter NP, Wilson SW and Ragge NK

    Department of Cell and Developmental Biology, UCL, London, UK.

    Mutations in the transcription factor encoding TFAP2A gene underlie branchio-oculo-facial syndrome (BOFS), a rare dominant disorder characterized by distinctive craniofacial, ocular, ectodermal and renal anomalies. To elucidate the range of ocular phenotypes caused by mutations in TFAP2A, we took three approaches. First, we screened a cohort of 37 highly selected individuals with severe ocular anomalies plus variable defects associated with BOFS for mutations or deletions in TFAP2A. We identified one individual with a de novo TFAP2A four amino acid deletion, a second individual with two non-synonymous variations in an alternative splice isoform TFAP2A2, and a sibling-pair with a paternally inherited whole gene deletion with variable phenotypic expression. Second, we determined that TFAP2A is expressed in the lens, neural retina, nasal process, and epithelial lining of the oral cavity and palatal shelves of human and mouse embryos--sites consistent with the phenotype observed in patients with BOFS. Third, we used zebrafish to examine how partial abrogation of the fish ortholog of TFAP2A affects the penetrance and expressivity of ocular phenotypes due to mutations in genes encoding bmp4 or tcf7l1a. In both cases, we observed synthetic, enhanced ocular phenotypes including coloboma and anophthalmia when tfap2a is knocked down in embryos with bmp4 or tcf7l1a mutations. These results reveal that mutations in TFAP2A are associated with a wide range of eye phenotypes and that hypomorphic tfap2a mutations can increase the risk of developmental defects arising from mutations at other loci.

    Funded by: Medical Research Council: G0501487, G0700089; Wellcome Trust: 074376, 078047, WT077008

    Human genetics 2009;126;6;791-803

  • Neonates harbour highly active gammadelta T cells with selective impairments in preterm infants.

    Gibbons DL, Haque SF, Silberzahn T, Hamilton K, Langford C, Ellis P, Carr R and Hayday AC

    Peter Gorer Department of Immunobiology, London, UK.

    Acknowledgement of the breadth of T-cell pleiotropy has provoked increasing interest in the degree to which functional responsiveness is elicited by environmental cues versus differentiation. This is particularly relevant for young animals requiring rapid responses to acute environmental exposure. In young mice, gammadelta T cells are disproportionately important for immuno-protection. To examine the situation in humans, we compared populations and clones of T cells from term and preterm babies, and adults. By comparison with alphabeta T cells, neonate-derived gammadelta cells show stronger, pleiotropic functional responsiveness, and lack signatory deficits in IFN-gamma production. Emphasising the acquisition of functional competence in utero, IFN-gamma was produced by gammadelta cells sampled from premature births, and, although one month's post-partum environmental exposure invariably increased their TNF-alpha production, it had no consistent effect on IFN-gamma or IL-2. In sum, gammadelta cells seem well positioned at birth to contribute to immuno-protection and immuno-regulation, possibly compensating for selective immaturity in the alphabeta compartment. With regard to the susceptibilities of preterm babies to viral infection, gammadelta cells from preterm neonates were commonly impaired in Toll-like receptor-3 and -7 expression and compared with cells from term babies failed to optimise cytokine production in response to coincident TCR and TLR agonists.

    Funded by: PHS HHS: R0161799; Wellcome Trust: 071534

    European journal of immunology 2009;39;7;1794-806

  • The porin OmpD from nontyphoidal Salmonella is a key target for a protective B1b cell antibody response.

    Gil-Cruz C, Bobat S, Marshall JL, Kingsley RA, Ross EA, Henderson IR, Leyton DL, Coughlan RE, Khan M, Jensen KT, Buckley CD, Dougan G, MacLennan IC, López-Macías C and Cunningham AF

    School of Immunity and Infection and Medical Research Council Centre for Immune Regulation, University of Birmingham, Birmingham B15 2TT, United Kingdom.

    Invasive nontyphoidal Salmonella (NTS), including Salmonella typhimurium (STm), are major yet poorly-recognized killers of infants in sub-Saharan Africa. Death in these children is usually associated with bacteremia, commonly in the absence of gastrointestinal symptoms. Evidence from humans and animal studies suggests that severe infection and bacteremia occur when specific Ab is lacking. Understanding how Ab responses to Salmonella are regulated will help develop vaccines against these devastating infections. STm induces atypical Ab responses characterized by prominent, accelerated, extrafollicular T-independent (TI) Ab against a range of surface antigens. These responses develop without concomitant germinal centers, which only appear as infection resolves. Here, we show STm rapidly induces a population of TI B220(+)CD5(-) B1b cells during infection and TI Ab from B1b cells targets the outer membrane protein (Omp) porins OmpC, OmpD and OmpF but not flagellin. When porins are used as immunogens they can ablate bacteremia and provide equivalent protection against STm as killed bacterial vaccine and this is wholly B cell-dependent. Furthermore Ab from porin-immunized chimeras, that have B1b cells, is sufficient to impair infection. Infecting with porin-deficient bacteria identifies OmpD, a protein absent from Salmonella Typhi, as a key target of Ab in these infections. This work broadens the recognized repertoire of TI protein antigens and highlights the importance of Ab from different B cell subsets in controlling STm infection. OmpD is a strong candidate vaccine target and may, in part, explain the lack of cross-protection between Salmonella Typhi and STm infections.

    Funded by: Biotechnology and Biological Sciences Research Council; Medical Research Council; Wellcome Trust

    Proceedings of the National Academy of Sciences of the United States of America 2009;106;24;9803-8

  • A general basis for cognition in the evolution of synapse signaling complexes.

    Grant SG

    Genes to Cognition Programme, Wellcome Trust Sanger Institute, Cambridge, United Kingdom. sg3@sanger.ac.uk

    Beneath the complexity of the human brain are molecular principles shaped by evolution explaining the origins of the behavioral repertoire. The role of the nervous system is to provide a repertoire of behaviors allowing the animal to respond and adapt to changing environments during the course of its life. Multiprotein complexes in the postsynaptic terminal of synapses control adaptive and cognitive processes in metazoan nervous systems. These multiprotein complexes are organized into molecular networks that detect and respond to patterns of neural activity. Combinations of proteins are used to build different complexes and pathways producing great diversity. These complexes evolved from an ancestral core set of proteins controlling adaptive behaviors in unicellular organisms known as the protosynapse. Later expansion in numbers and interactions resulted in more complex synapses in invertebrates and vertebrates. The resultant combinatorial complexity has contributed to the neuroanatomical, neurophysiological, and behavioral diversity in these species. Mutations in genes encoding the complexes result in many human diseases of the nervous system. This general mechanism of cognition provides a useful template for studying evolution of behavior in all animals.

    Funded by: Wellcome Trust

    Cold Spring Harbor symposia on quantitative biology 2009;74;249-57

  • Genomic and epigenetic evidence for oxytocin receptor deficiency in autism.

    Gregory SG, Connelly JJ, Towers AJ, Johnson J, Biscocho D, Markunas CA, Lintas C, Abramson RK, Wright HH, Ellis P, Langford CF, Worley G, Delong GR, Murphy SK, Cuccaro ML, Persico A and Pericak-Vance MA

    Duke Center for Human Genetics, DUMC, Durham, NC, USA. simon.gregory@duke.edu

    Background: Autism comprises a spectrum of behavioral and cognitive disturbances of childhood development and is known to be highly heritable. Although numerous approaches have been used to identify genes implicated in the development of autism, less than 10% of autism cases have been attributed to single gene disorders.

    Methods: We describe the use of high-resolution genome-wide tilepath microarrays and comparative genomic hybridization to identify copy number variants within 119 probands from multiplex autism families. We next carried out DNA methylation analysis by bisulfite sequencing in a proband and his family, expanding this analysis to methylation analysis of peripheral blood and temporal cortex DNA of autism cases and matched controls from independent datasets. We also assessed oxytocin receptor (OXTR) gene expression within the temporal cortex tissue by quantitative real-time polymerase chain reaction (PCR).

    Results: Our analysis revealed a genomic deletion containing the oxytocin receptor gene, OXTR (MIM accession no.: 167055), previously implicated in autism, was present in an autism proband and his mother who exhibits symptoms of obsessive-compulsive disorder. The proband's affected sibling did not harbor this deletion but instead may exhibit epigenetic misregulation of this gene through aberrant gene silencing by DNA methylation. Further DNA methylation analysis of the CpG island known to regulate OXTR expression identified several CpG dinucleotides that show independent statistically significant increases in the DNA methylation status in the peripheral blood cells and temporal cortex in independent datasets of individuals with autism as compared to control samples. Associated with the increase in methylation of these CpG dinucleotides is our finding that OXTR mRNA showed decreased expression in the temporal cortex tissue of autism cases matched for age and sex compared to controls.

    Conclusion: Together, these data provide further evidence for the role of OXTR and the oxytocin signaling pathway in the etiology of autism and, for the first time, implicate the epigenetic regulation of OXTR in the development of the disorder. See the related commentary by Gurrieri and Neri: http://www.biomedcentral.com/1741-7015/7/63.

    Funded by: NINDS NIH HHS: P01-NS026630; PHS HHS: R01-NIH080647

    BMC medicine 2009;7;62

  • Genetic utility of broadly defined bipolar schizoaffective disorder as a diagnostic concept.

    Hamshere ML, Green EK, Jones IR, Jones L, Moskvina V, Kirov G, Grozeva D, Nikolov I, Vukcevic D, Caesar S, Gordon-Smith K, Fraser C, Russell E, Breen G, St Clair D, Collier DA, Young AH, Ferrier IN, Farmer A, McGuffin P, Wellcome Trust Case Control Consortium, Holmans PA, Owen MJ, O'Donovan MC and Craddock N

    Biostatistics and Bioinformatics Unit and Department of Psychological Medicine, School of Medicine, Cardiff University, Heath Park, Cardiff CF14 4XN, UK.

    Background: Psychiatric phenotypes are currently defined according to sets of descriptive criteria. Although many of these phenotypes are heritable, it would be useful to know whether any of the various diagnostic categories in current use identify cases that are particularly helpful for biological-genetic research.

    Aims: To use genome-wide genetic association data to explore the relative genetic utility of seven different descriptive operational diagnostic categories relevant to bipolar illness within a large UK case-control bipolar disorder sample.

    Method: We analysed our previously published Wellcome Trust Case Control Consortium (WTCCC) bipolar disorder genome-wide association data-set, comprising 1868 individuals with bipolar disorder and 2938 controls genotyped for 276 122 single nucleotide polymorphisms (SNPs) that met stringent criteria for genotype quality. For each SNP we performed a test of association (bipolar disorder group v. control group) and used the number of associated independent SNPs statistically significant at P<0.00001 as a metric for the overall genetic signal in the sample. We next compared this metric with that obtained using each of seven diagnostic subsets of the group with bipolar disorder: Research Diagnostic Criteria (RDC): bipolar I disorder; manic disorder; bipolar II disorder; schizoaffective disorder, bipolar type; DSM-IV: bipolar I disorder; bipolar II disorder; schizoaffective disorder, bipolar type.

    Results: The RDC schizoaffective disorder, bipolar type (v. controls) stood out from the other diagnostic subsets as having a significant excess of independent association signals (P<0.003) compared with that expected in samples of the same size selected randomly from the total bipolar disorder group data-set. The strongest association in this subset of participants with bipolar disorder was at rs4818065 (P = 2.42 x 10(-7)). Biological systems implicated included gamma-aminobutyric acid (GABA)(A) receptors. Genes having at least one associated polymorphism at P<10(-4) included B3GALTS, A2BP1, GABRB1, AUTS2, BSN, PTPRG, GIRK2 and CDH12.

    Conclusions: Our findings show that individuals with broadly defined bipolar schizoaffective features have either a particularly strong genetic contribution or that, as a group, are genetically more homogeneous than the other phenotypes tested. The results point to the importance of using diagnostic approaches that recognise this group of individuals. Our approach can be applied to similar data-sets for other psychiatric and non-psychiatric phenotypes.

    Funded by: Medical Research Council: G0000647, G0000934, G0701003; Wellcome Trust: 060620

    The British journal of psychiatry : the journal of mental science 2009;195;1;23-9

  • Cross-species chromosome painting corroborates microchromosome fusion during karyotype evolution of birds.

    Hansmann T, Nanda I, Volobouev V, Yang F, Schartl M, Haaf T and Schmid M

    Department of Human Genetics, University of Würzburg, Würzburg, Germany.

    The stone curlew, also known as thick-knee (Burhinus oedicnemus, BOE), represents a phylogenetically young species of the shorebirds (Charadriiformes) that exhibits one of the most atypical genome organizations known within the class of Aves, due to an extremely low diploid number (2n = 42) and only 6 pairs of microchromosomes in its complement. This distinct deviation from the 'typical' avian karyotype is attributed to repeated fusions of ancestral microchromosomes. In order to compare different species with this atypical avian karyotype and to investigate the chromosome rearrangement patterns, chromosome-specific painting probes representing the whole genome of the stone curlew were used to delineate chromosome homology between BOE and 5 species belonging to 5 different avian orders: herring gull (Charadriiformes), cockatiel (Psittaciformes), rock pigeon (Columbiformes), great gray owl (Strigiformes) and Eurasian coot (Gruiformes). Paints derived from the 20 BOE autosomes delimited 28 to 33 evolutionarily conserved segments in the karyotypes of the 5 species, similar to the number recognized by BOE paints in such a basal lineage as the chicken (28 conserved segments). This suggests a high degree of conservation in genome organization in birds. BOE paints also revealed some species-specific rearrangements. In particular, chromosomes BOE1-4 and 14, as well as to a large extent BOE5 and 6, showed conserved synteny with macrochromosomes, whereas homologous regions for BOE7-13 are found to be largely distributed on microchromosomes in the species investigated. Interestingly, the 6 pairs of BOE microchromosomes 15-20 appear to have undergone very few rearrangements in the 5 lineages investigated. Although the arrangements of BOE homologous segments on some chromosomes can be explained by complex fusions and inversions, the occurrence of homologous regions at multiple sites may point to fission of ancestral chromosomes in the karyotypes of the species investigated. However, the present results demonstrate that the ancestral microchromosomes most likely experienced fusion in the stone curlew lineage forming the medium-sized BOE chromosomes, while they have been conserved as microchromosomes in the other neoavian lineages.

    Cytogenetic and genome research 2009;126;3;281-304

  • Identification of MAMDC1 as a candidate susceptibility gene for systemic lupus erythematosus (SLE).

    Hellquist A, Zucchelli M, Lindgren CM, Saarialho-Kere U, Järvinen TM, Koskenmies S, Julkunen H, Onkamo P, Skoog T, Panelius J, Räisänen-Sokolowski A, Hasan T, Widen E, Gunnarson I, Svenungsson E, Padyukov L, Assadi G, Berglind L, Mäkelä VV, Kivinen K, Wong A, Cunningham Graham DS, Vyse TJ, D'Amato M and Kere J

    Department of Biosciences and Nutrition, Karolinska Institutet, Huddinge, Sweden.

    Background: Systemic lupus erythematosus (SLE) is a complex autoimmune disorder with multiple susceptibility genes. We have previously reported suggestive linkage to the chromosomal region 14q21-q23 in Finnish SLE families.

    Genetic fine mapping of this region in the same family material, together with a large collection of parent-affected trios from the UK and two independent case-control cohorts from Finland and Sweden, indicated that a novel uncharacterized gene, MAMDC1 (MAM domain containing glycosylphosphatidylinositol anchor 2, also known as MDGA2, MIM 611128), represents a putative susceptibility gene for SLE. In a combined analysis of the whole dataset, significant evidence of association was detected for the MAMDC1 intronic single nucleotide polymorphisms (SNPs) rs961616 (P-value = 0.001, Odds Ratio (OR) = 1.292, 95% CI 1.103-1.513) and rs2297926 (P-value = 0.003, OR = 1.349, 95% CI 1.109-1.640). By Northern blot, real-time PCR (qRT-PCR) and immunohistochemical (IHC) analyses, we show that MAMDC1 is expressed in several tissues and cell types, and that the corresponding mRNA is up-regulated by the pro-inflammatory cytokines tumour necrosis factor alpha (TNF-alpha) and interferon gamma (IFN-gamma) in THP-1 monocytes. Based on its homology to known proteins with similar structure, MAMDC1 appears to be a novel member of the adhesion molecules of the immunoglobulin superfamily (IgCAM), which is involved in cell adhesion, migration, and recruitment to inflammatory sites. Remarkably, some IgCAMs have been shown to interact with ITGAM, the product of another SLE susceptibility gene recently discovered in two independent genome wide association (GWA) scans.

    Significance: Further studies focused on MAMDC1 and other molecules involved in these pathways might thus provide new insight into the pathogenesis of SLE.

    PloS one 2009;4;12;e8037

  • Analysis of expressed sequence tags from the four main developmental stages of Trypanosoma congolense.

    Helm JR, Hertz-Fowler C, Aslett M, Berriman M, Sanders M, Quail MA, Soares MB, Bonaldo MF, Sakurai T, Inoue N and Donelson JE

    Department of Biochemistry, Carver College of Medicine, University of Iowa, Iowa City, IA 52242, USA.

    Trypanosoma congolense is one of the most economically important pathogens of livestock in Africa. Culture-derived parasites of each of the three main insect stages of the T. congolense life cycle, i.e., the procyclic, epimastigote and metacyclic stages, and bloodstream stage parasites isolated from infected mice, were used to construct stage-specific cDNA libraries and expressed sequence tags (ESTs or cDNA clones) in each library were sequenced. Thirteen EST clusters encoding different variant surface glycoproteins (VSGs) were detected in the metacyclic library and 26 VSG EST clusters were found in the bloodstream library, 6 of which are shared by the metacyclic library. Rare VSG ESTs are present in the epimastigote library, and none were detected in the procyclic library. ESTs encoding enzymes that catalyze oxidative phosphorylation and amino acid metabolism are about twice as abundant in the procyclic and epimastigote stages as in the metacyclic and bloodstream stages. In contrast, ESTs encoding enzymes involved in glycolysis, the citric acid cycle and nucleotide metabolism are about the same in all four developmental stages. Cysteine proteases, kinases and phosphatases are the most abundant enzyme groups represented by the ESTs. All four libraries contain T. congolense-specific expressed sequences not present in the Trypanosoma brucei and Trypanosoma cruzi genomes. Normalized cDNA libraries were constructed from the metacyclic and bloodstream stages, and found to be further enriched for T. congolense-specific ESTs. Given that cultured T. congolense offers an experimental advantage over other African trypanosome species, these ESTs provide a basis for further investigation of the molecular properties of these four developmental stages, especially the epimastigote and metacyclic stages for which it is difficult to obtain large quantities of organisms. The T. congolense EST databases are available at: http://www.sanger.ac.uk/Projects/T_congolense/EST_index.shtml. The sequence data have been submitted to EMBL under the following accession numbers: FN263376-FN292969.

    Funded by: NIAID NIH HHS: R01 AI059451-05; Wellcome Trust

    Molecular and biochemical parasitology 2009;168;1;34-42

  • Rapid evolution of virulence and drug resistance in the emerging zoonotic pathogen Streptococcus suis.

    Holden MT, Hauser H, Sanders M, Ngo TH, Cherevach I, Cronin A, Goodhead I, Mungall K, Quail MA, Price C, Rabbinowitsch E, Sharp S, Croucher NJ, Chieu TB, Mai NT, Diep TS, Chinh NT, Kehoe M, Leigh JA, Ward PN, Dowson CG, Whatmore AM, Chanter N, Iversen P, Gottschalk M, Slater JD, Smith HE, Spratt BG, Xu J, Ye C, Bentley S, Barrell BG, Schultsz C, Maskell DJ and Parkhill J

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom. mh3@sanger.ac.uk

    Background: Streptococcus suis is a zoonotic pathogen that infects pigs and can occasionally cause serious infections in humans. S. suis infections occur sporadically in humans in Europe and North America, but a recent major outbreak has been described in China with high levels of mortality. The mechanisms of S. suis pathogenesis in humans and pigs are poorly understood.

    The sequencing of whole genomes of S. suis isolates provides opportunities to investigate the genetic basis of infection. Here we describe whole genome sequences of three S. suis strains from the same lineage: one from European pigs, and two from human cases from China and Vietnam. Comparative genomic analysis was used to investigate the variability of these strains. S. suis is phylogenetically distinct from other Streptococcus species for which genome sequences are currently available. Accordingly, approximately 40% of the approximately 2 Mb genome is unique in comparison to other Streptococcus species. Finer genomic comparisons within the species showed a high level of sequence conservation; virtually all of the genome is common to the S. suis strains. The only exceptions are three approximately 90 kb regions, present in the two isolates from humans, composed of integrative conjugative elements and transposons. Carried in these regions are coding sequences associated with drug resistance. In addition, small-scale sequence variation has generated pseudogenes in putative virulence and colonization factors.

    The genomic inventories of genetically related S. suis strains, isolated from distinct hosts and diseases, exhibit high levels of conservation. However, the genomes provide evidence that horizontal gene transfer has contributed to the evolution of drug resistance.

    Funded by: Wellcome Trust: 089472

    PloS one 2009;4;7;e6072

  • Genomic evidence for the evolution of Streptococcus equi: host restriction, increased virulence, and genetic exchange with human pathogens.

    Holden MT, Heather Z, Paillot R, Steward KF, Webb K, Ainslie F, Jourdan T, Bason NC, Holroyd NE, Mungall K, Quail MA, Sanders M, Simmonds M, Willey D, Brooks K, Aanensen DM, Spratt BG, Jolley KA, Maiden MC, Kehoe M, Chanter N, Bentley SD, Robinson C, Maskell DJ, Parkhill J and Waller AS

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

    The continued evolution of bacterial pathogens has major implications for both human and animal disease, but the exchange of genetic material between host-restricted pathogens is rarely considered. Streptococcus equi subspecies equi (S. equi) is a host-restricted pathogen of horses that has evolved from the zoonotic pathogen Streptococcus equi subspecies zooepidemicus (S. zooepidemicus). These pathogens share approximately 80% genome sequence identity with the important human pathogen Streptococcus pyogenes. We sequenced and compared the genomes of S. equi 4047 and S. zooepidemicus H70 and screened S. equi and S. zooepidemicus strains from around the world to uncover evidence of the genetic events that have shaped the evolution of the S. equi genome and led to its emergence as a host-restricted pathogen. Our analysis provides evidence of functional loss due to mutation and deletion, coupled with pathogenic specialization through the acquisition of bacteriophage encoding a phospholipase A(2) toxin, and four superantigens, and an integrative conjugative element carrying a novel iron acquisition system with similarity to the high pathogenicity island of Yersinia pestis. We also highlight that S. equi, S. zooepidemicus, and S. pyogenes share a common phage pool that enhances cross-species pathogen evolution. We conclude that the complex interplay of functional loss, pathogenic specialization, and genetic exchange between S. equi, S. zooepidemicus, and S. pyogenes continues to influence the evolution of these important streptococci.

    Funded by: Wellcome Trust: 047072, 087622, 089472

    PLoS pathogens 2009;5;3;e1000346

  • The genome of Burkholderia cenocepacia J2315, an epidemic pathogen of cystic fibrosis patients.

    Holden MT, Seth-Smith HM, Crossman LC, Sebaihia M, Bentley SD, Cerdeño-Tárraga AM, Thomson NR, Bason N, Quail MA, Sharp S, Cherevach I, Churcher C, Goodhead I, Hauser H, Holroyd N, Mungall K, Scott P, Walker D, White B, Rose H, Iversen P, Mil-Homens D, Rocha EP, Fialho AM, Baldwin A, Dowson C, Barrell BG, Govan JR, Vandamme P, Hart CA, Mahenthiralingam E and Parkhill J

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SA, United Kingdom. mh3@sanger.ac.uk

    Bacterial infections of the lungs of cystic fibrosis (CF) patients cause major complications in the treatment of this common genetic disease. Burkholderia cenocepacia infection is particularly problematic since this organism has high levels of antibiotic resistance, making it difficult to eradicate; the resulting chronic infections are associated with severe declines in lung function and increased mortality rates. B. cenocepacia strain J2315 was isolated from a CF patient and is a member of the epidemic ET12 lineage that originated in Canada or the United Kingdom and spread to Europe. The 8.06-Mb genome of this highly transmissible pathogen comprises three circular chromosomes and a plasmid and encodes a broad array of functions typical of this metabolically versatile genus, as well as numerous virulence and drug resistance functions. Although B. cenocepacia strains can be isolated from soil and can be pathogenic to both plants and man, J2315 is representative of a lineage of B. cenocepacia rarely isolated from the environment and which spreads between CF patients. Comparative analysis revealed that ca. 21% of the genome is unique in comparison to other strains of B. cenocepacia, highlighting the genomic plasticity of this species. Pseudogenes in virulence determinants suggest that the pathogenic response of J2315 may have been recently selected to promote persistence in the CF lung. The J2315 genome contains evidence that its unique and highly adapted genetic content has played a significant role in its success as an epidemic CF pathogen.

    Funded by: Wellcome Trust

    Journal of bacteriology 2009;191;1;261-77

  • Gene ontology analysis of GWA study data sets provides insights into the biology of bipolar disorder.

    Holmans P, Green EK, Pahwa JS, Ferreira MA, Purcell SM, Sklar P, Wellcome Trust Case-Control Consortium, Owen MJ, O'Donovan MC and Craddock N

    MRC Centre for Neuropsychiatric Genetics and Genomics, Department of Psychological Medicine and Neurology, School of Medicine, Heath Park, CF23 6BQ Cardiff, UK. holmanspa@cardiff.ac.uk

    We present a method for testing overrepresentation of biological pathways, indexed by gene-ontology terms, in lists of significant SNPs from genome-wide association studies. This method corrects for linkage disequilibrium between SNPs, variable gene size, and multiple testing of nonindependent pathways. The method was applied to the Wellcome Trust Case-Control Consortium Crohn disease (CD) data set. At a general level, the biological basis of CD is relatively well known for a complex genetic trait, and it thus acted as a test of the method. The method, known as ALIGATOR (Association LIst Go AnnoTatOR), successfully detected biological pathways implicated in CD. The method was also applied to a meta-analysis of bipolar disorder, and it implicated the modulation of transcription and cellular activity, including that which occurs via hormonal action, as an important player in pathogenesis.

    Funded by: Medical Research Council; Wellcome Trust

    American journal of human genetics 2009;85;1;13-24

  • Detecting SNPs and estimating allele frequencies in clonal bacterial populations by sequencing pooled DNA.

    Holt KE, Teo YY, Li H, Nair S, Dougan G, Wain J and Parkhill J

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. kh2@sanger.ac.uk

    SUMMARY: Here, we present a method for estimating the frequencies of SNP alleles present within pooled samples of DNA using high-throughput short-read sequencing. The method was tested on real data from six strains of the highly monomorphic pathogen Salmonella Paratyphi A, sequenced individually and in a pool. A variety of read mapping and quality-weighting procedures were tested to determine the optimal parameters, which afforded ≥80% sensitivity of SNP detection and strong correlation with true SNP frequency at poolwide read depth of 40x, declining only slightly at read depths 20-40x. AVAILABILITY: The method was implemented in Perl and relies on the open-source software Maq for read mapping and SNP calling. The Perl script is freely available from ftp://ftp.sanger.ac.uk/pub/pathogens/pools/.

    Funded by: Wellcome Trust

    Bioinformatics (Oxford, England) 2009;25;16;2074-5

  • Extensive molecular differences between anterior- and posterior-half-sclerotomes underlie somite polarity and spinal nerve segmentation.

    Hughes DS, Keynes RJ and Tannahill D

    Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge, CB3 2DY, UK. dsthughes@gmail.com

    Background: The polarization of somite-derived sclerotomes into anterior and posterior halves underlies vertebral morphogenesis and spinal nerve segmentation. To characterize the full extent of molecular differences that underlie this polarity, we have undertaken a systematic comparison of gene expression between the two sclerotome halves in the mouse embryo.

    Results: Several hundred genes are differentially-expressed between the two sclerotome halves, showing that a marked degree of molecular heterogeneity underpins the development of somite polarity.

    Conclusion: We have identified a set of genes that warrant further investigation as regulators of somite polarity and vertebral morphogenesis, as well as repellents of spinal axon growth. Moreover the results indicate that, unlike the posterior half-sclerotome, the central region of the anterior-half-sclerotome does not contribute bone and cartilage to the vertebral column, being associated instead with the development of the segmented spinal nerves.

    Funded by: Medical Research Council; Wellcome Trust

    BMC developmental biology 2009;9;30

  • InterPro: the integrative protein signature database.

    Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Das U, Daugherty L, Duquenne L, Finn RD, Gough J, Haft D, Hulo N, Kahn D, Kelly E, Laugraud A, Letunic I, Lonsdale D, Lopez R, Madera M, Maslen J, McAnulla C, McDowall J, Mistry J, Mitchell A, Mulder N, Natale D, Orengo C, Quinn AF, Selengut JD, Sigrist CJ, Thimma M, Thomas PD, Valentin F, Wilson D, Wu CH and Yeats C

    EMBL Outstation European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK. hunter@ebi.ac.uk

    The InterPro database (http://www.ebi.ac.uk/interpro/) integrates together predictive models or 'signatures' representing protein domains, families and functional sites from multiple, diverse source databases: Gene3D, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY and TIGRFAMs. Integration is performed manually and approximately half of the total approximately 58,000 signatures available in the source databases belong to an InterPro entry. Recently, we have started to also display the remaining un-integrated signatures via our web interface. Other developments include the provision of non-signature data, such as structural data, in new XML files on our FTP site, as well as the inclusion of matchless UniProtKB proteins in the existing match XML files. The web interface has been extended and now links out to the ADAN predicted protein-protein interaction database and the SPICE and Dasty viewers. The latest public release (v18.0) covers 79.8% of UniProtKB (v14.1) and consists of 16 549 entries. InterPro data may be accessed either via the web address above, via web services, by downloading files by anonymous FTP or by using the InterProScan search software (http://www.ebi.ac.uk/Tools/InterProScan/).

    Funded by: Biotechnology and Biological Sciences Research Council: BB/F010435/1, BB/F010508/1; NIGMS NIH HHS: GM081084; Wellcome Trust: 087656

    Nucleic acids research 2009;37;Database issue;D211-5

  • Complete genome sequence and comparative genome analysis of enteropathogenic Escherichia coli O127:H6 strain E2348/69.

    Iguchi A, Thomson NR, Ogura Y, Saunders D, Ooka T, Henderson IR, Harris D, Asadulghani M, Kurokawa K, Dean P, Kenny B, Quail MA, Thurston S, Dougan G, Hayashi T, Parkhill J and Frankel G

    Division of Bioenvironmental Science, Frontier Science Research Center, University of Miyazaki, Miyazaki, Japan.

    Enteropathogenic Escherichia coli (EPEC) was the first pathovar of E. coli to be implicated in human disease; however, no EPEC strain has been fully sequenced until now. Strain E2348/69 (serotype O127:H6 belonging to E. coli phylogroup B2) has been used worldwide as a prototype strain to study EPEC biology, genetics, and virulence. Studies of E2348/69 led to the discovery of the locus of enterocyte effacement-encoded type III secretion system (T3SS) and its cognate effectors, which play a vital role in attaching and effacing lesion formation on gut epithelial cells. In this study, we determined the complete genomic sequence of E2348/69 and performed genomic comparisons with other important E. coli strains. We identified 424 E2348/69-specific genes, most of which are carried on mobile genetic elements, and a number of genetic traits specifically conserved in phylogroup B2 strains irrespective of their pathotypes, including the absence of the ETT2-related T3SS, which is present in E. coli strains belonging to all other phylogroups. The genome analysis revealed the entire gene repertoire related to E2348/69 virulence. Interestingly, E2348/69 contains only 21 intact T3SS effector genes, all of which are carried on prophages and integrative elements, compared to over 50 effector genes in enterohemorrhagic E. coli O157. As E2348/69 is the most-studied pathogenic E. coli strain, this study provides a genomic context for the vast amount of existing experimental data. The unexpected simplicity of the E2348/69 T3SS provides the first opportunity to fully dissect the entire virulence strategy of attaching and effacing pathogens in the genomic context.

    Funded by: Medical Research Council: G0700151; Wellcome Trust

    Journal of bacteriology 2009;191;1;347-54

  • Weak preservation of local neutral substitution rates across mammalian genomes.

    Imamura H, Karro JE and Chuang JH

    Boston College, Department of Biology, Chestnut Hill, MA 02467, USA. himamura@itg.be

    Background: The rate at which neutral (non-functional) bases undergo substitution is highly dependent on their location within a genome. However, it is not clear how fast these location-dependent rates change, or to what extent the substitution rate patterns are conserved between lineages. To address this question, which is critical not only for understanding the substitution process but also for evaluating phylogenetic footprinting algorithms, we examine ancestral repeats: a predominantly neutral dataset with a significantly higher genomic density than other datasets commonly used to study substitution rate variation. Using this repeat data, we measure the extent to which orthologous ancestral repeat sequences exhibit similar substitution patterns in separate mammalian lineages, allowing us to ascertain how well local substitution rates have been preserved across species.

    Results: We calculated substitution rates for each ancestral repeat in each of three independent mammalian lineages (primate - from human/macaque alignments, rodent - from mouse/rat alignments, and laurasiatheria - from dog/cow alignments). We then measured the correlation of local substitution rates among these lineages. Overall we found the correlations between lineages to be statistically significant, but too weak to have much predictive power (r² < 5%). These correlations were found to be primarily driven by regional effects at the scale of several hundred kb or larger. A few repeat classes (e.g. 7SK, Charlie8, and MER121) also exhibited stronger conservation of rate patterns, likely due to the effect of repeat-specific purifying selection. These classes should be excluded when estimating local neutral substitution rates.

    Conclusion: Although local neutral substitution rates have some correlations among mammalian species, these correlations have little predictive power on the scale of individual repeats. This indicates that local substitution rates have changed significantly among the lineages we have studied, and are likely to have changed even more for more diverged lineages. The correlations that do persist are too weak to be responsible for many of the highly conserved elements found by phylogenetic footprinting algorithms, leading us to conclude that such elements must be conserved due to selective forces.

    BMC evolutionary biology 2009;9;89

  • Common variants at five new loci associated with early-onset inflammatory bowel disease.

    Imielinski M, Baldassano RN, Griffiths A, Russell RK, Annese V, Dubinsky M, Kugathasan S, Bradfield JP, Walters TD, Sleiman P, Kim CE, Muise A, Wang K, Glessner JT, Saeed S, Zhang H, Frackelton EC, Hou C, Flory JH, Otieno G, Chiavacci RM, Grundmeier R, Castro M, Latiano A, Dallapiccola B, Stempak J, Abrams DJ, Taylor K, McGovern D, Western Regional Alliance for Pediatric IBD, Silber G, Wrobel I, Quiros A, International IBD Genetics Consortium, Barrett JC, Hansoul S, Nicolae DL, Cho JH, Duerr RH, Rioux JD, Brant SR, Silverberg MS, Taylor KD, Barmuda MM, Bitton A, Dassopoulos T, Datta LW, Green T, Griffiths AM, Kistner EO, Murtha MT, Regueiro MD, Rotter JI, Schumm LP, Steinhart AH, Targan SR, Xavier RJ, NIDDK IBD Genetics Consortium, Libioulle C, Sandor C, Lathrop M, Belaiche J, Dewit O, Gut I, Heath S, Laukens D, Mni M, Rutgeerts P, Van Gossum A, Zelenika D, Franchimont D, Hugot JP, de Vos M, Vermeire S, Louis E, Belgian-French IBD Consortium, Wellcome Trust Case Control Consortium, Cardon LR, Anderson CA, Drummond H, Nimmo E, Ahmad T, Prescott NJ, Onnie CM, Fisher SA, Marchini J, Ghori J, Bumpstead S, Gwillam R, Tremelling M, Delukas P, Mansfield J, Jewell D, Satsangi J, Mathew CG, Parkes M, Georges M, Daly MJ, Heyman MB, Ferry GD, Kirschner B, Lee J, Essers J, Grand R, Stephens M, Levine A, Piccoli D, Van Limbergen J, Cucchiara S, Monos DS, Guthery SL, Denson L, Wilson DC, Grant SF, Daly M, Silverberg MS, Satsangi J and Hakonarson H

    Center for Applied Genomics, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA.

    The inflammatory bowel diseases (IBD) Crohn's disease and ulcerative colitis are common causes of morbidity in children and young adults in the western world. Here we report the results of a genome-wide association study in early-onset IBD involving 3,426 affected individuals and 11,963 genetically matched controls recruited through international collaborations in Europe and North America, thereby extending the results from a previous study of 1,011 individuals with early-onset IBD. We have identified five new regions associated with early-onset IBD susceptibility, including 16p11 near the cytokine gene IL27 (rs8049439, P = 2.41 x 10(-9)), 22q12 (rs2412973, P = 1.55 x 10(-9)), 10q22 (rs1250550, P = 5.63 x 10(-9)), 2q37 (rs4676410, P = 3.64 x 10(-8)) and 19q13.11 (rs10500264, P = 4.26 x 10(-10)). Our scan also detected associations at 23 of 32 loci previously implicated in adult-onset Crohn's disease and at 8 of 17 loci implicated in adult-onset ulcerative colitis, highlighting the close pathogenetic relationship between early- and adult-onset IBD.

    Funded by: Canadian Institutes of Health Research; Chief Scientist Office: CZB/4/540, ETM/137, ETM/75; Medical Research Council: G0600329, G0800675, G0800759; NCRR NIH HHS: C06-RR11234, M01 RR002172-26, M01-RR00064; NIDDK NIH HHS: DK062423, DK069513, K24 DK060617-07, P30 DK040561-14, T32 DK007477, U01 DK062420, U01 DK062420-08; Wellcome Trust: 072789/Z/03/Z

    Nature genetics 2009;41;12;1335-40

  • Transposon-mediated genome manipulation in vertebrates.

    Ivics Z, Li MA, Mátés L, Boeke JD, Nagy A, Bradley A and Izsvák Z

    Max Delbrück Center for Molecular Medicine, Berlin, Germany. zivics@mdc-berlin.de

    Transposable elements are DNA segments with the unique ability to move about in the genome. This inherent feature can be exploited to harness these elements as gene vectors for genome manipulation. Transposon-based genetic strategies have been established in vertebrate species over the last decade, and current progress in this field suggests that transposable elements will serve as indispensable tools. In particular, transposons can be applied as vectors for somatic and germline transgenesis, and as insertional mutagens in both loss-of-function and gain-of-function forward mutagenesis screens. In addition, transposons will gain importance in future cell-based clinical applications, including nonviral gene transfer into stem cells and the rapidly developing field of induced pluripotent stem cells. Here we provide an overview of transposon-based methods used in vertebrate model organisms with an emphasis on the mouse system and highlight the most important considerations concerning genetic applications of the transposon systems.

    Funded by: NCI NIH HHS: P01 CA016519-340010; NIGMS NIH HHS: R01 GM036481-23

    Nature methods 2009;6;6;415-22

  • Genome-wide and fine-resolution association analysis of malaria in West Africa.

    Jallow M, Teo YY, Small KS, Rockett KA, Deloukas P, Clark TG, Kivinen K, Bojang KA, Conway DJ, Pinder M, Sirugo G, Sisay-Joof F, Usen S, Auburn S, Bumpstead SJ, Campino S, Coffey A, Dunham A, Fry AE, Green A, Gwilliam R, Hunt SE, Inouye M, Jeffreys AE, Mendy A, Palotie A, Potter S, Ragoussis J, Rogers J, Rowlands K, Somaskantharajah E, Whittaker P, Widden C, Donnelly P, Howie B, Marchini J, Morris A, SanJoaquin M, Achidi EA, Agbenyega T, Allen A, Amodu O, Corran P, Djimde A, Dolo A, Doumbo OK, Drakeley C, Dunstan S, Evans J, Farrar J, Fernando D, Hien TT, Horstmann RD, Ibrahim M, Karunaweera N, Kokwaro G, Koram KA, Lemnge M, Makani J, Marsh K, Michon P, Modiano D, Molyneux ME, Mueller I, Parker M, Peshu N, Plowe CV, Puijalon O, Reeder J, Reyburn H, Riley EM, Sakuntabhai A, Singhasivanon P, Sirima S, Tall A, Taylor TE, Thera M, Troye-Blomberg M, Williams TN, Wilson M, Kwiatkowski DP, Wellcome Trust Case Control Consortium and Malaria Genomic Epidemiology Network

    MRC Laboratories, Fajara, Banjul, Gambia.

    We report a genome-wide association (GWA) study of severe malaria in The Gambia. The initial GWA scan included 2,500 children genotyped on the Affymetrix 500K GeneChip, and a replication study included 3,400 children. We used this to examine the performance of GWA methods in Africa. We found considerable population stratification, and also that signals of association at known malaria resistance loci were greatly attenuated owing to weak linkage disequilibrium (LD). To investigate possible solutions to the problem of low LD, we focused on the HbS locus, sequencing this region of the genome in 62 Gambian individuals and then using these data to conduct multipoint imputation in the GWA samples. This increased the signal of association, from P = 4 × 10(-7) to P = 4 × 10(-14), with the peak of the signal located precisely at the HbS causal variant. Our findings provide proof of principle that fine-resolution multipoint imputation, based on population-specific sequencing data, can substantially boost authentic GWA signals and enable fine mapping of causal variants in African populations.

    Funded by: Chief Scientist Office: CZB/4/540, ETM/137, ETM/75; Howard Hughes Medical Institute; Medical Research Council: G0600230, G0600230(77610), G0600329, G0600718, G0800759, G19/9, G9828345, MC_U190081977, MC_U190081993; NIAID NIH HHS: U19 AI065683, U19 AI065683-04; Wellcome Trust: 061858, 064890, 076113, 076934, 077011, 077383, 077383/Z/05/Z, 081682, 089062

    Nature genetics 2009;41;6;657-65

  • Repetitive sequence variation and dynamics in the ribosomal DNA array of Saccharomyces cerevisiae as revealed by whole-genome resequencing.

    James SA, O'Kelly MJ, Carter DM, Davey RP, van Oudenaarden A and Roberts IN

    National Collection of Yeast Cultures, Institute of Food Research, Norwich Research Park, Colney, Norwich NR4 7UA, United Kingdom.

    Ribosomal DNA (rDNA) plays a key role in ribosome biogenesis, encoding genes for the structural RNA components of this important cellular organelle. These genes are vital for efficient functioning of the cellular protein synthesis machinery and as such are highly conserved and normally present in high copy numbers. In the baker's yeast Saccharomyces cerevisiae, there are more than 100 rDNA repeats located at a single locus on chromosome XII. Stability and sequence homogeneity of the rDNA array is essential for function, and this is achieved primarily by the mechanism of gene conversion. Detecting variation within these arrays is extremely problematic due to their large size and repetitive structure. In an attempt to address this, we have analyzed over 35 Mbp of rDNA sequence obtained from whole-genome shotgun sequencing (WGSS) of 34 strains of S. cerevisiae. Contrary to expectation, we find significant rDNA sequence variation exists within individual genomes. Many of the detected polymorphisms are not fully resolved. For this type of sequence variation, we introduce the term partial single nucleotide polymorphism, or pSNP. Comparative analysis of the complete data set reveals that different S. cerevisiae genomes possess different patterns of rDNA polymorphism, with much of the variation located within the rapidly evolving nontranscribed intergenic spacer (IGS) region. Furthermore, we find that strains known to have either structured or mosaic/hybrid genomes can be distinguished from one another based on rDNA pSNP number, indicating that pSNP dynamics may provide a reliable new measure of genome origin and stability.

    Funded by: Biotechnology and Biological Sciences Research Council; Wellcome Trust

    Genome research 2009;19;4;626-35

  • Effects of calcium signaling on Plasmodium falciparum erythrocyte invasion and post-translational modification of gliding-associated protein 45 (PfGAP45).

    Jones ML, Cottingham C and Rayner JC

    Department of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294, USA.

    Plasmodium falciparum erythrocyte invasion is powered by an actin/myosin motor complex that is linked both to the tight junction and to the merozoite cytoskeleton through the Inner Membrane Complex (IMC). The IMC association of the myosin motor, PfMyoA, is maintained by its association with three proteins: PfMTIP, a myosin light chain, PfGAP45, an IMC peripheral membrane protein, and PfGAP50, an integral membrane protein of the IMC. This protein complex is referred to as the glideosome, and given its central role in erythrocyte invasion, this complex is likely the target of several specific regulatory effectors that ensure it is properly localized, assembled, and activated as the merozoite prepares to invade its target cell. However, little is known about how erythrocyte invasion as a whole is regulated, or about how or whether that regulation impacts the glideosome. Here we show that P. falciparum erythrocyte invasion is regulated by the release of intracellular calcium via the cyclic-ADP Ribose (cADPR) pathway, but that inhibition of cADPR-mediated calcium release does not affect PfGAP45 phosphorylation or glideosome association. By contrast, the serine/threonine kinase inhibitor, staurosporine, affects both PfGAP45 isoform distribution and the integrity of the glideosome complex. These data identify specific regulatory elements involved in controlling P. falciparum erythrocyte invasion and reveal that the assembly status of the merozoite glideosome, which is central to erythrocyte invasion, is surprisingly dynamic.

    Funded by: NIAID NIH HHS: T32 AI055438, T32 AI055438-05

    Molecular and biochemical parasitology 2009;168;1;55-62

  • Use of a genetic isolate to identify rare disease variants: C7 on 5p associated with MS.

    Kallio SP, Jakkula E, Purcell S, Suvela M, Koivisto K, Tienari PJ, Elovaara I, Pirttilä T, Reunanen M, Bronnikov D, Viander M, Meri S, Hillert J, Lundmark F, Harbo HF, Lorentzen AR, De Jager PL, Daly MJ, Hafler DA, Palotie A, Peltonen L and Saarela J

    Finnish Institute for Molecular Medicine, Biomedicum, Helsinki, Finland.

    Large case-control genome-wide association studies primarily expose common variants contributing to disease pathogenesis with modest effects. Thus, alternative strategies are needed to tackle rare, possibly more penetrant alleles. One strategy is to use special populations with a founder effect and isolation, resulting in allelic enrichment. For multiple sclerosis such a unique setting is reported in Southern Ostrobothnia in Finland, where the prevalence and familial occurrence of multiple sclerosis (MS) are exceptionally high. Here, we have studied one of the best replicated MS loci, 5p, and monitored for haplotypes shared among 72 regional MS cases, the majority of which are genealogically distantly related. The haplotype analysis over the 45 Mb region, covering the linkage peak identified in Finnish MS families, revealed only modest association at IL7R (P = 0.04), recently implicated in MS, whereas the most significant association was found with one haplotype covering the C7-FLJ40243 locus (P = 0.0001), 5.1 Mb centromeric of IL7R. The finding was validated in an independent sample from the isolate and resulted in an odds ratio of 2.73 (P = 0.000003) in the combined data set. The identified relatively rare risk haplotype contains C7 (complement component 7), an important player of the innate immune system. Suggestive association with alleles of the region was also seen in more heterogeneous populations. Interestingly, complement activity also correlated with the identified risk haplotype. These results suggest that the MS predisposing locus on 5p is more complex than assumed and exemplify the power of population isolates in the identification of rare disease alleles.

    Funded by: NCRR NIH HHS: U54 RR020278; NIMH NIH HHS: R01MH71425-01A1; NINDS NIH HHS: R01 NS 43559; Wellcome Trust: 089061, 089062

    Human molecular genetics 2009;18;9;1670-83

  • Planning the human variome project: the Spain report.

    Kaput J, Cotton RG, Hardman L, Watson M, Al Aqeel AI, Al-Aama JY, Al-Mulla F, Alonso S, Aretz S, Auerbach AD, Bapat B, Bernstein IT, Bhak J, Bleoo SL, Blöcker H, Brenner SE, Burn J, Bustamante M, Calzone R, Cambon-Thomsen A, Cargill M, Carrera P, Cavedon L, Cho YS, Chung YJ, Claustres M, Cutting G, Dalgleish R, den Dunnen JT, Díaz C, Dobrowolski S, dos Santos MR, Ekong R, Flanagan SB, Flicek P, Furukawa Y, Genuardi M, Ghang H, Golubenko MV, Greenblatt MS, Hamosh A, Hancock JM, Hardison R, Harrison TM, Hoffmann R, Horaitis R, Howard HJ, Barash CI, Izagirre N, Jung J, Kojima T, Laradi S, Lee YS, Lee JY, Gil-da-Silva-Lopes VL, Macrae FA, Maglott D, Marafie MJ, Marsh SG, Matsubara Y, Messiaen LM, Möslein G, Netea MG, Norton ML, Oefner PJ, Oetting WS, O'Leary JC, de Ramirez AM, Paalman MH, Parboosingh J, Patrinos GP, Perozzi G, Phillips IR, Povey S, Prasad S, Qi M, Quin DJ, Ramesar RS, Richards CS, Savige J, Scheible DG, Scott RJ, Seminara D, Shephard EA, Sijmons RH, Smith TD, Sobrido MJ, Tanaka T, Tavtigian SV, Taylor GR, Teague J, Töpel T, Ullman-Cullere M, Utsunomiya J, van Kranen HJ, Vihinen M, Webb E, Weber TK, Yeager M, Yeom YI, Yim SH, Yoo HS and Contributors to the Human Variome Project Planning Meeting

    Division of Personalised Nutrition and Medicine, FDA/National Center for Toxicological Research, Jefferson, Arkansas 72079, USA. James.kaput@fda.hhs.gov

    The remarkable progress in characterizing the human genome sequence, exemplified by the Human Genome Project and the HapMap Consortium, has led to the perception that knowledge and the tools (e.g., microarrays) are sufficient for many if not most biomedical research efforts. A large amount of data from diverse studies proves this perception inaccurate at best, and at worst, an impediment for further efforts to characterize the variation in the human genome. Because variation in genotype and environment are the fundamental basis to understand phenotypic variability and heritability at the population level, identifying the range of human genetic variation is crucial to the development of personalized nutrition and medicine. The Human Variome Project (HVP; http://www.humanvariomeproject.org/) was proposed initially to systematically collect mutations that cause human disease and create a cyber infrastructure to link locus specific databases (LSDB). We report here the discussions and recommendations from the 2008 HVP planning meeting held in San Feliu de Guixols, Spain, in May 2008.

    Human mutation 2009;30;4;496-510

  • The impact of newly identified loci on coronary heart disease, stroke and total mortality in the MORGAM prospective cohorts.

    Karvanen J, Silander K, Kee F, Tiret L, Salomaa V, Kuulasmaa K, Wiklund PG, Virtamo J, Saarela O, Perret C, Perola M, Peltonen L, Cambien F, Erdmann J, Samani NJ, Schunkert H, Evans A and MORGAM Project

    Department of Health Promotion and Chronic Disease Prevention, National Public Health Institute, Helsinki, Finland. juha.karvanen@ktl.fi

    Recently, genome wide association studies (GWAS) have identified a number of single nucleotide polymorphisms (SNPs) as being associated with coronary heart disease (CHD). We estimated the effect of these SNPs on incident CHD, stroke and total mortality in the prospective cohorts of the MORGAM Project. We studied cohorts from Finland, Sweden, France and Northern Ireland (total N=33,282, including 1,436 incident CHD events and 571 incident stroke events). The lead SNPs at seven loci identified thus far and additional SNPs (in total 42) were genotyped using a case-cohort design. We estimated the effect of the SNPs on disease history at baseline, disease events during follow-up and classic risk factors. Multiple testing was taken into account using false discovery rate (FDR) analysis. SNP rs1333049 on chromosome 9p21.3 was associated with both CHD and stroke (HR=1.20, 95% CI 1.08-1.34 for incident CHD events and 1.15, 0.99-1.34 for incident stroke). SNP rs11670734 (19q12) was associated with total mortality and stroke. SNP rs2146807 (10q11.21) showed some association with the fatality of acute coronary event. SNP rs2943634 (2q36.3) was associated with high density lipoprotein (HDL) cholesterol and SNPs rs599839, rs4970834 (1p13.3) and rs17228212 (15q22.23) were associated with non-HDL cholesterol. SNPs rs2943634 (2q36.3) and rs12525353 (6q25.1) were associated with blood pressure. These findings underline the need for replication studies in prospective settings and confirm the candidacy of several SNPs that may play a role in the etiology of cardiovascular disease.

    Funded by: Wellcome Trust: 089061

    Genetic epidemiology 2009;33;3;237-46

  • Replication of restless legs syndrome loci in three European populations.

    Kemlink D, Polo O, Frauscher B, Gschliesser V, Högl B, Poewe W, Vodicka P, Vavrova J, Sonka K, Nevsimalova S, Schormair B, Lichtner P, Silander K, Peltonen L, Gieger C, Wichmann HE, Zimprich A, Roeske D, Müller-Myhsok B, Meitinger T and Winkelmann J

    Helmholtz Zentrum Munich, National Research Center of Environment and Health, Institute of Human Genetics, Munich, Germany.

    Background: Restless legs syndrome (RLS) is associated with common variants in three intronic and intergenic regions in MEIS1, BTBD9, and MAP2K5/LBXCOR1 on chromosomes 2p, 6p and 15q.

    Methods: Our study investigated these variants in 649 RLS patients and 1230 controls from the Czech Republic (290 cases and 450 controls), Austria (269 cases and 611 controls) and Finland (90 cases and 169 controls). Ten single nucleotide polymorphisms (SNPs) within the three genomic regions were selected according to the results of previous genome-wide scans. Samples were genotyped using Sequenom platforms.

    Results: We replicated associations for all loci in the combined sample set (rs2300478 in MEIS1, p = 1.26 x 10(-5), odds ratio (OR) = 1.47, rs3923809 in BTBD9, p = 4.11 x 10(-5), OR = 1.58 and rs6494696 in MAP2K5/LBXCOR1, p = 0.04764, OR = 1.27). Analysing only familial cases against all controls, all three loci were significantly associated. Using sporadic cases only, we could confirm the association only with BTBD9.

    Conclusion: Our study shows that variants in these three loci confer consistent disease risks in patients of European descent. Among the known loci, BTBD9 seems to be the most consistent in its effect on RLS across populations and is also most independent of familial clustering.

    Funded by: Wellcome Trust: 089061

    Journal of medical genetics 2009;46;5;315-8

  • Cyclin-dependent kinase inhibits reinitiation of a normal S-phase program during G2 in fission yeast.

    Kiang L, Heichinger C, Watt S, Bähler J and Nurse P

    Laboratory of Yeast Genetics and Cell Biology, The Rockefeller University, 1230 York Avenue, New York, NY 10065, USA. lkiang@rockefeller.edu

    To achieve faithful replication of the genome once in each cell cycle, reinitiation of S phase is prevented in G(2) and origins are restricted from refiring within S phase. We have investigated the block to rereplication during G(2) in fission yeast. The DNA synthesis that occurs when G(2)/M cyclin-dependent kinase (CDK) activity is depleted has been assumed to be repeated rounds of S phase without mitosis, but this has not been demonstrated to be the case. We show here that on G(2)/M CDK depletion in G(2), repeated S phases are induced, which are correlated with normal G(1)/S transcription and attainment of doublings in cell size. Mostly normal mitotic S-phase origins are utilized, although at different efficiencies, and replication is essentially equal across the genome. We conclude that CDK inhibits reinitiation of S phase during G(2), and if G(2)/M CDK is depleted, replication results from induction of a largely normal S-phase program with only small differences in origin usage and efficiency.

    Funded by: Cancer Research UK; NIGMS NIH HHS: GM 07739

    Molecular and cellular biology 2009;29;15;4025-32

  • Linkage and linkage disequilibrium scan for autism loci in an extended pedigree from Finland.

    Kilpinen H, Ylisaukko-oja T, Rehnström K, Gaál E, Turunen JA, Kempas E, von Wendt L, Varilo T and Peltonen L

    Department of Molecular Medicine, Institute for Molecular Medicine, Finland.

    Population isolates, such as Finland, have proved beneficial in mapping rare causative genetic variants due to a limited number of founders resulting in reduced genetic heterogeneity and extensive linkage disequilibrium (LD). We have here used this special opportunity to identify rare alleles in autism by genealogically tracing 20 autism families into one extended pedigree with verified genealogical links reaching back to the 17th century. In this unique pedigree, we performed a dense microsatellite marker genome-wide scan of linkage and LD and followed initial findings with extensive fine-mapping. We identified a putative autism susceptibility locus at 19p13.3 and obtained further evidence for previously identified loci at 1q23 and 15q11-q13. Most promising candidate genes were TLE2 and TLE6 clustered at 19p13 and ATP1A2 at 1q23.

    Funded by: Wellcome Trust: 089061

    Human molecular genetics 2009;18;15;2912-21

  • Typhoid Fever

    Kingsley RA and Dougan G

    Vaccines for Biodefense and Emerging and Neglected Diseases 2009;Chapter 57;1147–1161

  • Support for the involvement of large copy number variants in the pathogenesis of schizophrenia.

    Kirov G, Grozeva D, Norton N, Ivanov D, Mantripragada KK, Holmans P, International Schizophrenia Consortium, Wellcome Trust Case Control Consortium, Craddock N, Owen MJ and O'Donovan MC

    Department of Psychological Medicine, Cardiff University, Heath Park, Cardiff, UK.

    We investigated the involvement of rare (<1%) copy number variants (CNVs) in 471 cases of schizophrenia and 2792 controls that had been genotyped using the Affymetrix GeneChip 500K Mapping Array. Large CNVs >1 Mb were 2.26 times more common in cases (P = 0.00027), with the effect coming mostly from deletions (odds ratio, OR = 4.53, P = 0.00013) although duplications were also more common (OR = 1.71, P = 0.04). Two large deletions were found in two cases each, but in no controls: a deletion at 22q11.2 known to be a susceptibility factor for schizophrenia and a deletion on 17p12, at 14.0-15.4 Mb. The latter is known to cause hereditary neuropathy with liability to pressure palsies. The same deletion was found in 6 of 4618 (0.13%) cases and 6 of 36 092 (0.017%) controls in the re-analysed data of two recent large CNV studies of schizophrenia (OR = 7.82, P = 0.001), with the combined significance level for all three studies achieving P = 5 x 10(-5). One large duplication on 16p13.1, which has been previously implicated as a susceptibility factor for autism, was found in three cases and six controls (0.6% versus 0.2%, OR = 2.98, P = 0.13). We also provide the first support for a recently reported association between deletions at 15q11.2 and schizophrenia (P = 0.026). This study confirms the involvement of rare CNVs in the pathogenesis of schizophrenia and contributes to the growing list of specific CNVs that are implicated.
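
    The carrier frequencies and odds ratios quoted in this abstract can be reproduced from the reported counts alone. The following is a minimal sketch, assuming the ORs are plain unadjusted 2x2 odds ratios; the function names are illustrative and not taken from the paper, and the P values are not recomputed because the abstract does not state which test was used.

```python
def odds_ratio(case_carriers, case_total, control_carriers, control_total):
    """Unadjusted odds ratio from carrier counts in cases and controls."""
    case_odds = case_carriers / (case_total - case_carriers)
    control_odds = control_carriers / (control_total - control_carriers)
    return case_odds / control_odds


def carrier_percent(carriers, total):
    """Carrier frequency expressed as a percentage."""
    return 100.0 * carriers / total


# 17p12 deletion in the re-analysed data: 6 of 4618 cases, 6 of 36092 controls
or_17p12 = odds_ratio(6, 4618, 6, 36092)

# 16p13.1 duplication in this study: 3 of 471 cases, 6 of 2792 controls
or_16p13 = odds_ratio(3, 471, 6, 2792)

print(f"17p12 deletion:      OR = {or_17p12:.2f}")
print(f"16p13.1 duplication: OR = {or_16p13:.2f}")
print(f"17p12 carriers: {carrier_percent(6, 4618):.2f}% of cases, "
      f"{carrier_percent(6, 36092):.3f}% of controls")
```

    Under that assumption the counts give odds ratios of 7.82 and 2.98 and carrier frequencies of 0.13% and 0.017%, matching the figures in the abstract.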

    Funded by: Medical Research Council; NIMH NIH HHS: 2 P50 MH066392-05A1; Wellcome Trust: 076113

    Human molecular genetics 2009;18;8;1497-503

  • Meta-analysis of 28,141 individuals identifies common variants within five new loci that influence uric acid concentrations.

    Kolz M, Johnson T, Sanna S, Teumer A, Vitart V, Perola M, Mangino M, Albrecht E, Wallace C, Farrall M, Johansson A, Nyholt DR, Aulchenko Y, Beckmann JS, Bergmann S, Bochud M, Brown M, Campbell H, EUROSPAN Consortium, Connell J, Dominiczak A, Homuth G, Lamina C, McCarthy MI, ENGAGE Consortium, Meitinger T, Mooser V, Munroe P, Nauck M, Peden J, Prokisch H, Salo P, Salomaa V, Samani NJ, Schlessinger D, Uda M, Völker U, Waeber G, Waterworth D, Wang-Sattler R, Wright AF, Adamski J, Whitfield JB, Gyllensten U, Wilson JF, Rudan I, Pramstaller P, Watkins H, PROCARDIS Consortium, Doering A, Wichmann HE, KORA Study, Spector TD, Peltonen L, Völzke H, Nagaraja R, Vollenweider P, Caulfield M, WTCCC, Illig T and Gieger C

    Institute of Epidemiology, Helmholtz Zentrum München, National Research Center for Environment and Health, Neuherberg, Germany.

    Elevated serum uric acid levels cause gout and are a risk factor for cardiovascular disease and diabetes. To investigate the polygenetic basis of serum uric acid levels, we conducted a meta-analysis of genome-wide association scans from 14 studies totalling 28,141 participants of European descent, resulting in identification of 954 SNPs distributed across nine loci that exceeded the threshold of genome-wide significance, five of which are novel. Overall, the common variants associated with serum uric acid levels fall in the following nine regions: SLC2A9 (p = 5.2x10(-201)), ABCG2 (p = 3.1x10(-26)), SLC17A1 (p = 3.0x10(-14)), SLC22A11 (p = 6.7x10(-14)), SLC22A12 (p = 2.0x10(-9)), SLC16A9 (p = 1.1x10(-8)), GCKR (p = 1.4x10(-9)), LRRC16A (p = 8.5x10(-9)), and near PDZK1 (p = 2.7x10(-9)). Identified variants were analyzed for gender differences. We found that the minor allele for rs734553 in SLC2A9 has greater influence in lowering uric acid levels in women and the minor allele of rs2231142 in ABCG2 elevates uric acid levels more strongly in men compared to women. To further characterize the identified variants, we analyzed their association with a panel of metabolites. rs12356193 within SLC16A9 was associated with DL-carnitine (p = 4.0x10(-26)) and propionyl-L-carnitine (p = 5.0x10(-8)) concentrations, which in turn were associated with serum UA levels (p = 1.4x10(-57) and p = 8.1x10(-54), respectively), forming a triangle between SNP, metabolites, and UA levels. Taken together, these associations highlight additional pathways that are important in the regulation of serum uric acid levels and point toward novel potential targets for pharmacological intervention to prevent or treat hyperuricemia. In addition, these findings strongly support the hypothesis that transport proteins are key in regulating serum uric acid levels.

    Funded by: Arthritis Research UK; British Heart Foundation: FS/05/061/19501, PG02/128; Chief Scientist Office: CZB/4/710; Medical Research Council: G0400874, G9521010, G9521010D, MC_U127561128; NIA NIH HHS: N01-AG-1-2109; NIAAA NIH HHS: AA007535; Wellcome Trust: 076113/B/04/Z

    PLoS genetics 2009;5;6;e1000504

  • Parental origin of sequence variants associated with complex diseases.

    Kong A, Steinthorsdottir V, Masson G, Thorleifsson G, Sulem P, Besenbacher S, Jonasdottir A, Sigurdsson A, Kristinsson KT, Jonasdottir A, Frigge ML, Gylfason A, Olason PI, Gudjonsson SA, Sverrisson S, Stacey SN, Sigurgeirsson B, Benediktsdottir KR, Sigurdsson H, Jonsson T, Benediktsson R, Olafsson JH, Johannsson OT, Hreidarsson AB, Sigurdsson G, DIAGRAM Consortium, Ferguson-Smith AC, Gudbjartsson DF, Thorsteinsdottir U and Stefansson K

    deCODE genetics, Sturlugata 8, 101 Reykjavík, Iceland. kong@decode.is

    The effects of susceptibility variants may depend on the parent from which they are inherited. Although many associations between sequence variants and human traits have been discovered through genome-wide associations, the impact of parental origin has largely been ignored. Here we show that for 38,167 Icelanders genotyped using single nucleotide polymorphism (SNP) chips, the parental origin of most alleles can be determined. For this we used a combination of genealogy and long-range phasing. We then focused on SNPs that associate with diseases and are within 500 kilobases of known imprinted genes. Seven independent SNP associations were examined. Five (one with breast cancer, one with basal-cell carcinoma and three with type 2 diabetes) have parental-origin-specific associations. These variants are located in two genomic regions, 11p15 and 7q32, each harbouring a cluster of imprinted genes. Furthermore, we observed a novel association between the SNP rs2334499 at 11p15 and type 2 diabetes. Here the allele that confers risk when paternally inherited is protective when maternally transmitted. We identified a differentially methylated CTCF-binding site at 11p15 and demonstrated correlation of rs2334499 with decreased methylation of that site.

    Funded by: NIAMS NIH HHS: K08 AR055688-02, K08 AR055688-03; NIDDK NIH HHS: R01 DK029867

    Nature 2009;462;7275;868-74

  • Replication in genome-wide association studies.

    Kraft P, Zeggini E and Ioannidis JP

    Departments of Epidemiology and Biostatistics, Harvard School of Public Health, Boston, MA, USA.

    Replication helps ensure that a genotype-phenotype association observed in a genome-wide association (GWA) study represents a credible association and is not a chance finding or an artifact due to uncontrolled biases. We discuss prerequisites for exact replication; issues of heterogeneity; advantages and disadvantages of different methods of data synthesis across multiple studies; frequentist vs. Bayesian inferences for replication; and challenges that arise from multi-team collaborations. While consistent replication can greatly improve the credibility of a genotype-phenotype association, it may not eliminate spurious associations due to biases shared by many studies. Conversely, lack of replication in well-powered follow-up studies usually invalidates the initially proposed association, although occasionally it may point to differences in linkage disequilibrium or effect modifiers across studies.

    Funded by: NCRR NIH HHS: UL1 RR025752-01

    Statistical science : a review journal of the Institute of Mathematical Statistics 2009;24;4;561-573

  • Constitutional haploinsufficiency of tumor suppressor genes in mentally retarded patients with microdeletions in 17p13.1.

    Krepischi-Santos AC, Rajan D, Temple IK, Shrubb V, Crolla JA, Huang S, Beal S, Otto PA, Carter NP, Vianna-Morgante AM and Rosenberg C

    Department of Genetics and Evolutionary Biology, Institute of Biosciences, University of São Paulo, São Paulo, Brazil.

    Chromosome microdeletions or duplications are detected in 10-20% of patients with mental impairment and normal karyotypes. A few cases have been reported of mental impairment with microdeletions comprising tumor suppressor genes. By array-CGH we detected 4 mentally impaired individuals carrying de novo microdeletions sharing an overlapping segment of approximately 180 kb in 17p13.1. This segment encompasses 18 genes, including 3 involved in cancer, namely KCTD11/REN, DLG4/PSD95, and GPS2. Furthermore, in 2 of the patients, the deletions also included TP53, the most frequently inactivated gene in human cancers. The 3 tumor suppressor genes KCTD11, DLG4, and GPS2, in addition to the GABARAP gene, have a known or suspected function in neuronal development and are candidates for causing mental impairment in our patients. Among our 4 patients with deletions in 17p13.1, 3 were part of a Brazilian cohort of 300 mentally retarded individuals, suggesting that this segment may be particularly prone to rearrangements and appears to be an important cause (approximately 1%) of mental retardation. Further, the constitutive deletion of tumor suppressor genes in these patients, particularly TP53, probably confers a significantly increased lifetime risk for cancer and warrants careful oncological surveillance of these patients. Constitutional chromosome deletions containing tumor suppressor genes in patients with mental impairment or congenital abnormalities may represent an important mechanism linking abnormal phenotypes with increased risks of cancer.

    Cytogenetic and genome research 2009;125;1;1-7

  • Cross-species chromosome painting in Cetartiodactyla: reconstructing the karyotype evolution in key phylogenetic lineages.

    Kulemzina AI, Trifonov VA, Perelman PL, Rubtsova NV, Volobuev V, Ferguson-Smith MA, Stanyon R, Yang F and Graphodatsky AS

    Institute of Cytology and Genetics, Russian Academy of Sciences, Novosibirsk, 630090, Russia.

    Recent molecular and morphological studies place Artiodactyla and Cetacea into the order Cetartiodactyla. Within the Cetartiodactyla such families as Bovidae, Cervidae, and Suidae are well studied by comparative chromosome painting, but many taxa that are crucial for understanding cetartiodactyl phylogeny remain poorly studied. Here we present the genome-wide comparative maps of five cetartiodactyl species obtained by chromosome painting with human and dromedary paint probes from four taxa: Cetacea, Hippopotamidae, Giraffidae, and Moschidae. This is the first molecular cytogenetic report on pilot whale, hippopotamus, okapi, and Siberian musk deer. Our results, when integrated with previously published comparative chromosome maps, allow us to reconstruct the evolutionary pathway and rates of chromosomal rearrangements in Cetartiodactyla. We hypothesize that the putative cetartiodactyl ancestral karyotype (CAK) contained 25-26 pairs of autosomes, 2n = 52-54, and that the association of human chromosomes 8/9 could be a cytogenetic signature that unites non-camelid cetartiodactyls. There are no unambiguous cytogenetic landmarks that unite Hippopotamidae and Cetacea. If we superimpose chromosome rearrangements on the supertree generated by Price and colleagues, several homoplasy events are needed to explain cetartiodactyl karyotype evolution. Our results apparently favour a model of non-random breakpoints in chromosome evolution. Cetartiodactyl karyotype evolution is characterized by alternating periods of low and fast rates in various lineages. The highest rates are found in the Suina (Suidae+Tayasuidae) lineage (1.76 rearrangements per million years (R/My)) and the lowest in Cetaceans (0.07 R/My). Our study demonstrates that the combined use of human and camel paints is highly informative for revealing evolutionary karyotypic rearrangements among cetartiodactyl species.

    Funded by: Wellcome Trust

    Chromosome research : an international journal on the molecular, supramolecular and evolutionary aspects of chromosome biology 2009;17;3;419-36

  • Common genetic variation in the melatonin receptor 1B gene (MTNR1B) is associated with decreased early-phase insulin response.

    Langenberg C, Pascoe L, Mari A, Tura A, Laakso M, Frayling TM, Barroso I, Loos RJ, Wareham NJ, Walker M and RISC Consortium

    MRC Epidemiology Unit, Institute of Metabolic Science, Addenbrooke's Hospital, Cambridge CB2 0QQ, UK. claudia.langenberg@mrc-epid.cam.ac.uk

    We investigated whether variation in MTNR1B, which was recently identified as a common genetic determinant of fasting glucose levels in healthy, diabetes-free individuals, is associated with measures of beta cell function and whole-body insulin sensitivity.

    Methods: We studied 1,276 healthy individuals of European ancestry at 19 centres of the Relationship between Insulin Sensitivity and Cardiovascular disease (RISC) study. Whole-body insulin sensitivity was assessed by euglycaemic-hyperinsulinaemic clamp and indices of beta cell function were derived from a 75 g oral glucose tolerance test (including 30 min insulin response and glucose sensitivity). We studied rs10830963 in MTNR1B using additive genetic models, adjusting for age, sex and recruitment centre.

    Results: The minor (G) allele of rs10830963 in MTNR1B (frequency 0.30 in HapMap Centre d'Etude du Polymorphisme [Utah residents with northern and western European ancestry] [CEU]; 0.29 in RISC participants) was associated with higher levels of fasting plasma glucose (standardised beta [95% CI] 0.17 [0.085, 0.25] per G allele, p = 5.8 x 10(-5)), consistent with recent observations. In addition, the G-allele was significantly associated with lower early insulin response (-0.19 [-0.28, -0.10], p = 1.7 x 10(-5)), as well as with decreased beta cell glucose sensitivity (-0.11 [-0.20, -0.027], p = 0.010). No associations were observed with clamp-assessed insulin sensitivity (p = 0.15) or different measures of body size (p > 0.7 for all).

    Genetic variation in MTNR1B is associated with defective early insulin response and decreased beta cell glucose sensitivity, which may contribute to the higher glucose levels of non-diabetic individuals carrying the minor G allele of rs10830963 in MTNR1B.

    Funded by: Biotechnology and Biological Sciences Research Council; Medical Research Council: G0701863, MC_U106188470; Wellcome Trust: 077016, 077016/Z/05/Z

    Diabetologia 2009;52;8;1537-42

  • Testing the water: marine metagenomics.

    Langridge G

    Nature reviews. Microbiology 2009;7;8;552

  • Nontyphoidal Salmonella serovars cause different degrees of invasive disease globally.

    Langridge GC, Nair S and Wain J

    The Journal of infectious diseases 2009;199;4;602-3

  • Antibiotic treatment of Clostridium difficile carrier mice triggers a supershedder state, spore-mediated transmission, and severe disease in immunocompromised hosts.

    Lawley TD, Clare S, Walker AW, Goulding D, Stabler RA, Croucher N, Mastroeni P, Scott P, Raisen C, Mottram L, Fairweather NF, Wren BW, Parkhill J and Dougan G

    Microbial Pathogenesis Laboratory and Pathogen Genomics, Hinxton, United Kingdom. tl2@sanger.ac.uk

    Clostridium difficile persists in hospitals by exploiting an infection cycle that is dependent on humans shedding highly resistant and infectious spores. Here we show that human virulent C. difficile can asymptomatically colonize the intestines of immunocompetent mice, establishing a carrier state that persists for many months. C. difficile carrier mice consistently shed low levels of spores but, surprisingly, do not transmit infection to cohabiting mice. However, antibiotic treatment of carriers triggers a highly contagious supershedder state, characterized by a dramatic reduction in the intestinal microbiota species diversity, C. difficile overgrowth, and excretion of high levels of spores. Stopping antibiotic treatment normally leads to recovery of the intestinal microbiota species diversity and suppresses C. difficile levels, although some mice persist in the supershedding state for extended periods. Spore-mediated transmission to immunocompetent mice treated with antibiotics results in self-limiting mucosal inflammation of the large intestine. In contrast, transmission to mice whose innate immune responses are compromised (Myd88(-/-)) leads to a severe intestinal disease that is often fatal. Thus, mice can be used to investigate distinct stages of the C. difficile infection cycle and can serve as a valuable surrogate for studying the spore-mediated transmission and interactions between C. difficile and the host and its microbiota, and the results obtained should guide infection control measures.

    Funded by: Wellcome Trust

    Infection and immunity 2009;77;9;3661-9

  • Proteomic and genomic characterization of highly infectious Clostridium difficile 630 spores.

    Lawley TD, Croucher NJ, Yu L, Clare S, Sebaihia M, Goulding D, Pickard DJ, Parkhill J, Choudhary J and Dougan G

    Microbial Pathogenesis Laboratory, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, United Kingdom. tl2@sanger.ac.uk

    Clostridium difficile, a major cause of antibiotic-associated diarrhea, produces highly resistant spores that contaminate hospital environments and facilitate efficient disease transmission. We purified C. difficile spores using a novel method and show that they exhibit significant resistance to harsh physical or chemical treatments and are also highly infectious, with <7 environmental spores per cm(2) reproducibly establishing a persistent infection in exposed mice. Mass spectrometric analysis identified approximately 336 spore-associated polypeptides, with a significant proportion linked to translation, sporulation/germination, and protein stabilization/degradation. In addition, proteins from several distinct metabolic pathways associated with energy production were identified. Comparison of the C. difficile spore proteome to those of other clostridial species defined 88 proteins as the clostridial spore "core" and 29 proteins as C. difficile spore specific, including proteins that could contribute to spore-host interactions. Thus, our results provide the first molecular definition of C. difficile spores, opening up new opportunities for the development of diagnostic and therapeutic approaches.

    Funded by: Wellcome Trust

    Journal of bacteriology 2009;191;17;5377-86

  • GLIDERS--a web-based search engine for genome-wide linkage disequilibrium between HapMap SNPs.

    Lawrence R, Day-Williams AG, Mott R, Broxholme J, Cardon LR and Zeggini E

    Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK. z.r.yang@ex.ac.uk

    Background: A number of tools for the examination of linkage disequilibrium (LD) patterns between nearby alleles exist, but none are available for quickly and easily investigating LD at longer ranges (>500 kb). We have developed a web-based query tool (GLIDERS: Genome-wide LInkage DisEquilibrium Repository and Search engine) that enables the retrieval of pairwise associations with r2 ≥ 0.3 across the human genome for any SNP genotyped within HapMap phase 2 and 3, regardless of distance between the markers.

    Description: GLIDERS is an easy to use web tool that only requires the user to enter rs numbers of SNPs they want to retrieve genome-wide LD for (both nearby and long-range). The intuitive web interface handles both manual entry of SNP IDs as well as allowing users to upload files of SNP IDs. The user can limit the resulting inter SNP associations with easy to use menu options. These include MAF limit (5-45%), distance limits between SNPs (minimum and maximum), r2 (0.3 to 1), HapMap population sample (CEU, YRI and JPT+CHB combined) and HapMap build/release. All resulting genome-wide inter-SNP associations are displayed on a single output page, which has a link to a downloadable tab delimited text file.

    Conclusion: GLIDERS is a quick and easy way to retrieve genome-wide inter-SNP associations and to explore LD patterns for any number of SNPs of interest. GLIDERS can be useful in identifying SNPs with long-range LD. This can highlight mis-mapping or other potential association signal localisation problems.
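
    The r2 threshold and distance limits described above can be illustrated with a minimal sketch. It is not the GLIDERS implementation: it assumes phased haplotype counts for two biallelic SNPs and uses the standard r2 definition, D squared divided by the product of the four allele frequencies. The function names, the tuple layout and the rs identifiers in the usage note are hypothetical.

```python
def ld_r2(n_AB, n_Ab, n_aB, n_ab):
    """Return r2 between two biallelic SNPs from four haplotype counts.

    n_AB is the number of haplotypes carrying allele A at the first SNP
    and allele B at the second, and so on (assumed, phased input).
    """
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / n                      # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / n              # allele frequency at SNP 1
    p_B = (n_AB + n_aB) / n              # allele frequency at SNP 2
    d = p_AB - p_A * p_B                 # disequilibrium coefficient D
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        return 0.0                       # a monomorphic SNP carries no LD signal
    return d * d / denom


def filter_pairs(pairs, min_r2=0.3, min_distance=0, max_distance=None):
    """Keep (snp1, snp2, distance_bp, r2) tuples that pass the limits."""
    kept = []
    for snp1, snp2, distance, r2 in pairs:
        if r2 < min_r2 or distance < min_distance:
            continue
        if max_distance is not None and distance > max_distance:
            continue
        kept.append((snp1, snp2, distance, r2))
    return kept
```

    For example, `filter_pairs([("rsX", "rsY", 600000, 0.8), ("rsX", "rsZ", 1000, 0.9)], min_distance=500000)` keeps only the first, long-range pair.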

    Funded by: Wellcome Trust: 079557, 079557MA, 088885/Z/09/Z

    BMC bioinformatics 2009;10;367

  • The brain structural disposition to social interaction.

    Lebreton M, Barnes A, Miettunen J, Peltonen L, Ridler K, Veijola J, Tanskanen P, Suckling J, Jarvelin MR, Jones PB, Isohanni M, Bullmore ET and Murray GK

    Brain Mapping Unit, University of Cambridge, Cambridge, UK.

    Social reward dependence (RD) in humans is a stable pattern of attitudes and behaviour hypothesized to represent a favourable disposition towards social relationships and attachment as a personality dimension. It has been theorized that this long-term disposition to openness is linked to the capacity to process primary reward. Using brain structure measures from magnetic resonance imaging, and a measure of RD from Cloninger's temperament and character inventory, a self-reported questionnaire, in 41 male subjects sampled from a general population birth cohort, we investigated the neuro-anatomical basis of social RD. We found that higher social RD in men was significantly associated with increased gray matter density in the orbitofrontal cortex, basal ganglia and temporal lobes, regions that have been previously shown to be involved in processing of primary rewards. These findings provide evidence for a brain structural disposition to social interaction, and that sensitivity to social reward shares a common neural basis with systems for processing primary reward information.

    Funded by: Medical Research Council

    The European journal of neuroscience 2009;29;11;2247-52

  • Statistical estimation of cell-cycle progression and lineage commitment in Plasmodium falciparum reveals a homogeneous pattern of transcription in ex vivo culture.

    Lemieux JE, Gomez-Escobar N, Feller A, Carret C, Amambua-Ngwa A, Pinches R, Day F, Kyes SA, Conway DJ, Holmes CC and Newbold CI

    Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, Headington, Oxford, OX3 9DS, United Kingdom.

    We have cultured Plasmodium falciparum directly from the blood of infected individuals to examine patterns of mature-stage gene expression in patient isolates. Analysis of the transcriptome of P. falciparum is complicated by the highly periodic nature of gene expression because small variations in the stage of parasite development between samples can lead to an apparent difference in gene expression values. To address this issue, we have developed statistical likelihood-based methods to estimate cell cycle progression and commitment to asexual or sexual development lineages in our samples based on microscopy and gene expression patterns. In cases subsequently matched for temporal development, we find that transcriptional patterns in ex vivo culture display little variation across patients with diverse clinical profiles and closely resemble transcriptional profiles that occur in vitro. These statistical methods, available to the research community, assist in the design and interpretation of P. falciparum expression profiling experiments where it is difficult to separate true differential expression from cell-cycle dependent expression. We reanalyze an existing dataset of in vivo patient expression profiles and conclude that previously observed discrete variation is consistent with the commitment of a varying proportion of the parasite population to the sexual development lineage.

    Funded by: Wellcome Trust

    Proceedings of the National Academy of Sciences of the United States of America 2009;106;18;7559-64

  • An ENU-induced mutation of miR-96 associated with progressive hearing loss in mice.

    Lewis MA, Quint E, Glazier AM, Fuchs H, De Angelis MH, Langford C, van Dongen S, Abreu-Goodger C, Piipari M, Redshaw N, Dalmay T, Moreno-Pelayo MA, Enright AJ and Steel KP

    Wellcome Trust Sanger Institute, Hinxton, UK.

    Progressive hearing loss is common in the human population, but little is known about the molecular basis. We report a new N-ethyl-N-nitrosourea (ENU)-induced mouse mutant, diminuendo, with a single base change in the seed region of Mirn96. Heterozygotes show progressive loss of hearing and hair cell anomalies, whereas homozygotes have no cochlear responses. Most microRNAs are believed to downregulate target genes by binding to specific sites on their mRNAs, so mutation of the seed should lead to target gene upregulation. Microarray analysis revealed 96 transcripts with significantly altered expression in homozygotes; notably, Slc26a5, Ocm, Gfi1, Ptprq and Pitpnm1 were downregulated. Hypergeometric P-value analysis showed that hundreds of genes were upregulated in mutants. Different genes, with target sites complementary to the mutant seed, were downregulated. This is the first microRNA found associated with deafness, and diminuendo represents a model for understanding and potentially moderating progressive hair cell degeneration in hearing loss more generally.

    Funded by: Medical Research Council: G0300212, MC_QA137918; Wellcome Trust: 077189, 077198

    Nature genetics 2009;41;5;614-8

  • Fast and accurate short read alignment with Burrows-Wheeler transform.

    Li H and Durbin R

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK.

    Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies calls for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals.

    Results: We implemented Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows-Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is approximately 10-20x faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package.

    Availability: http://maq.sourceforge.net.
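
    The backward search with the Burrows-Wheeler Transform named in the Results can be sketched as follows. This is a toy illustration of the general technique, not BWA's code: it handles exact matches only (BWA additionally allows mismatches and gaps), builds the suffix array by naive sorting, and stores a full occurrence table, so it is only suitable for short strings. All function and variable names are assumptions made for this sketch.

```python
def bwt_index(text):
    """Build suffix array, C table and occurrence table for text + '$'."""
    text += "$"                                  # sentinel, sorts before A/C/G/T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)       # character preceding each suffix
    alphabet = sorted(set(text))
    # C[c]: number of characters in text that are strictly smaller than c
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return sa, C, occ


def backward_search(pattern, sa, C, occ):
    """Return sorted start positions of exact matches of pattern."""
    lo, hi = 0, len(sa)                          # half-open suffix-array interval
    for ch in reversed(pattern):                 # consume the pattern right to left
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []                            # interval is empty: no match
    return sorted(sa[lo:hi])
```

    With `sa, C, occ = bwt_index("ACGTACGT")`, calling `backward_search("CGT", sa, C, occ)` returns `[1, 5]`. Each pattern character narrows the suffix-array interval in constant time, which is why the search cost depends on the read length rather than on the reference length.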

    Funded by: Wellcome Trust: 077192/Z/05/Z

    Bioinformatics (Oxford, England) 2009;25;14;1754-60

  • The Sequence Alignment/Map format and SAMtools.

    Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R and 1000 Genome Project Data Processing Subgroup

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK, Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA.

    Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments.

    Availability: http://samtools.sourceforge.net.
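
    As an illustration of the format summarized above, the following minimal sketch splits one tab-delimited SAM alignment line into its eleven mandatory fields and derives the reference span from the CIGAR string. The field names follow the current SAM specification rather than the abstract, optional tags are ignored, and the read name and coordinates in the usage note are invented. It is not part of SAMtools.

```python
import re
from typing import NamedTuple


class SamRecord(NamedTuple):
    qname: str   # read name
    flag: int    # bitwise flag (0x4 unmapped, 0x10 reverse strand)
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment operations
    rnext: str   # reference name of the mate
    pnext: int   # position of the mate
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # base qualities


def parse_sam_line(line):
    """Parse the mandatory columns of one SAM alignment line."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    return SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]),
                     f[5], f[6], int(f[7]), int(f[8]), f[9], f[10])


CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_span(cigar):
    """Count reference bases covered; M, D, N, = and X consume reference."""
    if cigar == "*":
        return 0
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")
```

    For the invented line `"read1\t16\tchr1\t100\t60\t5M1I4M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII"`, the record has `pos == 100`, the `0x10` flag bit marks a reverse-strand alignment, and `reference_span("5M1I4M")` is 9 because the inserted base does not consume reference.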

    Funded by: NHGRI NIH HHS: R01 HG004719-01, R01 HG004719-02, R01 HG004719-02S1, R01 HG004719-03, R01 HG004719-04, U54HG002750; Wellcome Trust: 077192/Z/05/Z

    Bioinformatics (Oxford, England) 2009;25;16;2078-9

  • Chromosomal mobilization and reintegration of Sleeping Beauty and PiggyBac transposons.

    Liang Q, Kong J, Stalker J and Bradley A

    The Sleeping Beauty and PiggyBac DNA transposon systems have recently been developed as tools for insertional mutagenesis. We have compared the chromosomal mobilization efficiency and insertion site preference of the two transposons mobilized from the same donor site in mouse embryonic stem (ES) cells under conditions in which there were no selective constraints on the transposons' insertion sites. Compared with Sleeping Beauty, PiggyBac exhibits higher transposition efficiencies, no evidence for local hopping and a significant bias toward reintegration in intragenic regions, which demonstrate its utility for insertional mutagenesis. Although Sleeping Beauty had no detectable genomic bias with respect to insertions in genes or intergenic regions, both Sleeping Beauty and PiggyBac transposons displayed preferential integration into actively transcribed loci.

    Genesis (New York, N.Y. : 2000) 2009;47;6;404-8

  • Mining mammalian genomes for folding competent proteins using Tat-dependent genetic selection in Escherichia coli.

    Lim HK, Mansell TJ, Linderman SW, Fisher AC, Dyson MR and DeLisa MP

    School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, New York 14853, USA.

    Recombinant expression of eukaryotic proteins in Escherichia coli is often limited by poor folding and solubility. To address this problem, we employed a recently developed genetic selection for protein folding and solubility based on the bacterial twin-arginine translocation (Tat) pathway to rapidly identify properly folded recombinant proteins or soluble protein domains of mammalian origin. The coding sequences for 29 different mammalian polypeptides were cloned as sandwich fusions between an N-terminal Tat export signal and a C-terminal selectable marker, namely beta-lactamase. Hence, expression of the selectable marker and survival on selective media was linked to Tat export of the target mammalian protein. Since the folding quality control feature of the Tat pathway prevents export of misfolded proteins, only correctly folded fusion proteins reached the periplasm and conferred cell survival. In general, the ability to confer growth was found to relate closely to the solubility profile and molecular weight of the protein, although other features such as number of contiguous hydrophobic amino acids and cysteine content may also be important. These results highlight the capacity of Tat selection to reveal the folding potential of mammalian proteins and protein domains without the need for structural or functional information about the target protein.

    Protein science : a publication of the Protein Society 2009;18;12;2537-49

  • DNA methylation-histone modification relationships across the desmin locus in human primary cells.

    Lindahl Allen M, Koch CM, Clelland GK, Dunham I and Antoniou M

    Nuclear Biology Group, King's College London School of Medicine, Department of Medical and Molecular Genetics, 8th Floor Tower Wing, Guy's Hospital, London SE1 9RT, UK. Marianne_LindahlAllen@hms.harvard.edu

    Background: We present here an extensive epigenetic analysis of a 500 kb region, which encompasses the human desmin gene (DES) and its 5' locus control region (LCR), the only muscle-specific transcriptional regulatory element of this type described to date. These data complement and extend Encyclopaedia of DNA Elements (ENCODE) studies on region ENr133. We analysed histone modifications and underlying DNA methylation patterns in physiologically relevant DES expressing (myoblast/myotube) and non-expressing (peripheral blood mononuclear) primary human cells.

    Results: We found that in expressing myoblast/myotube but not peripheral blood mononuclear cell (PBMC) cultures, histone H4 acetylation displays a broadly distributed enrichment across a gene rich 200 kb region whereas H3 acetylation localizes at the transcriptional start site (TSS) of genes. We show that the DES LCR and TSS of DES are enriched with hyperacetylated domains of acetylated histone H3, with H3 lysine 4 di- and tri-methylation (H3K4me2 and me3) exhibiting a different distribution pattern across this locus. The CpG island that extends into the first intron of DES is methylation-free regardless of the gene's expression status and in non-expressing PBMCs is marked with histone H3 lysine 27 tri-methylation (H3K27me3).

    Conclusion: Overall, our results constitute the first study correlating patterns of histone modifications and underlying DNA methylation of a muscle-specific LCR and its associated downstream gene region whilst additionally placing this within a much broader genomic context. Our results clearly show that there are distinct patterns of histone H3 and H4 acetylation and H3 methylation at the DES LCR, promoter and intragenic region. In addition, the presence of H3K27me3 at the DES methylation-free CpG only in non-expressing PBMCs may serve to silence this gene in non-muscle tissues. Generally, our work demonstrates the importance of using multiple, physiologically relevant tissue types that represent different expressing/non-expressing states when investigating epigenetic marks and that underlying DNA methylation status should be correlated with histone modification patterns when studying chromatin structure.

    Funded by: Medical Research Council: G78/7909

    BMC molecular biology 2009;10;51

  • Genome-wide association scan meta-analysis identifies three loci influencing adiposity and fat distribution.

    Lindgren CM, Heid IM, Randall JC, Lamina C, Steinthorsdottir V, Qi L, Speliotes EK, Thorleifsson G, Willer CJ, Herrera BM, Jackson AU, Lim N, Scheet P, Soranzo N, Amin N, Aulchenko YS, Chambers JC, Drong A, Luan J, Lyon HN, Rivadeneira F, Sanna S, Timpson NJ, Zillikens MC, Zhao JH, Almgren P, Bandinelli S, Bennett AJ, Bergman RN, Bonnycastle LL, Bumpstead SJ, Chanock SJ, Cherkas L, Chines P, Coin L, Cooper C, Crawford G, Doering A, Dominiczak A, Doney AS, Ebrahim S, Elliott P, Erdos MR, Estrada K, Ferrucci L, Fischer G, Forouhi NG, Gieger C, Grallert H, Groves CJ, Grundy S, Guiducci C, Hadley D, Hamsten A, Havulinna AS, Hofman A, Holle R, Holloway JW, Illig T, Isomaa B, Jacobs LC, Jameson K, Jousilahti P, Karpe F, Kuusisto J, Laitinen J, Lathrop GM, Lawlor DA, Mangino M, McArdle WL, Meitinger T, Morken MA, Morris AP, Munroe P, Narisu N, Nordström A, Nordström P, Oostra BA, Palmer CN, Payne F, Peden JF, Prokopenko I, Renström F, Ruokonen A, Salomaa V, Sandhu MS, Scott LJ, Scuteri A, Silander K, Song K, Yuan X, Stringham HM, Swift AJ, Tuomi T, Uda M, Vollenweider P, Waeber G, Wallace C, Walters GB, Weedon MN, Wellcome Trust Case Control Consortium, Witteman JC, Zhang C, Zhang W, Caulfield MJ, Collins FS, Davey Smith G, Day IN, Franks PW, Hattersley AT, Hu FB, Jarvelin MR, Kong A, Kooner JS, Laakso M, Lakatta E, Mooser V, Morris AD, Peltonen L, Samani NJ, Spector TD, Strachan DP, Tanaka T, Tuomilehto J, Uitterlinden AG, van Duijn CM, Wareham NJ, Hugh Watkins, Procardis Consortia, Waterworth DM, Boehnke M, Deloukas P, Groop L, Hunter DJ, Thorsteinsdottir U, Schlessinger D, Wichmann HE, Frayling TM, Abecasis GR, Hirschhorn JN, Loos RJ, Stefansson K, Mohlke KL, Barroso I, McCarthy MI and Giant Consortium

    Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom.

    To identify genetic loci influencing central obesity and fat distribution, we performed a meta-analysis of 16 genome-wide association studies (GWAS, N = 38,580) informative for adult waist circumference (WC) and waist-hip ratio (WHR). We selected 26 SNPs for follow-up, for which the evidence of association with measures of central adiposity (WC and/or WHR) was strong and disproportionate to that for overall adiposity or height. Follow-up studies in a maximum of 70,689 individuals identified two loci strongly associated with measures of central adiposity; these map near TFAP2B (WC, P = 1.9x10(-11)) and MSRA (WC, P = 8.9x10(-9)). A third locus, near LYPLAL1, was associated with WHR in women only (P = 2.6x10(-8)). The variants near TFAP2B appear to influence central adiposity through an effect on overall obesity/fat-mass, whereas LYPLAL1 displays a strong female-only association with fat distribution. By focusing on anthropometric measures of central obesity and fat distribution, we have identified three loci implicated in the regulation of human adiposity.

    Funded by: Biotechnology and Biological Sciences Research Council; British Heart Foundation; Medical Research Council: 0600705, G0000649, G0000934, G0500539, G0600705, G0601261, G0701863, G0801056, G9521010, G9521010D, MC_QA137934, MC_U106188470, MC_UP_A620_1014; NHLBI NIH HHS: HL084729, HL087679; NIDDK NIH HHS: DK062370, DK067288, DK07191, DK072193, DK075787, DK079466, DK080145, F32 DK079466-01, K23 DK080145-01, R01 DK029867, R01 DK072193-04; PHS HHS: G02651; Wellcome Trust: 064890, 068545/Z/02, 081682, 086596/Z/08/Z, GR069224, GR072960, GR076113

    PLoS genetics 2009;5;6;e1000508

  • HI: haplotype improver using paired-end short reads.

    Long Q, MacArthur D, Ning Z and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Hinxton, Cambs, UK. ql2@sanger.ac.uk

    Summary: We present a program to improve haplotype reconstruction by incorporating information from paired-end reads, and demonstrate its utility on simulated data. We find that given a fixed coverage, longer reads (implying fewer of them) are preferable.

    Availability: The executable and user manual can be freely downloaded from ftp://ftp.sanger.ac.uk/pub/zn1/HI.

    Funded by: Wellcome Trust

    Bioinformatics (Oxford, England) 2009;25;18;2436-7

  • Adventitious changes in long-range gene expression caused by polymorphic structural variation and promoter competition.

    Lower KM, Hughes JR, De Gobbi M, Henderson S, Viprakasit V, Fisher C, Goriely A, Ayyub H, Sloane-Stanley J, Vernimmen D, Langford C, Garrick D, Gibbons RJ and Higgs DR

    Medical Research Council Molecular Haematology Unit, Weatherall Institute of Molecular Medicine, The John Radcliffe Hospital, Headington, Oxford, OX3 9DS, United Kingdom.

    It is well established that all of the cis-acting sequences required for fully regulated human alpha-globin expression are contained within a region of approximately 120 kb of conserved synteny. Here, we show that activation of this cluster in erythroid cells dramatically affects expression of apparently unrelated and noncontiguous genes in the 500 kb surrounding this domain, including a gene (NME4) located 300 kb from the alpha-globin cluster. Changes in NME4 expression are mediated by physical cis-interactions between this gene and the alpha-globin regulatory elements. Polymorphic structural variation within the globin cluster, altering the number of alpha-globin genes, affects the pattern of NME4 expression by altering the competition for the shared alpha-globin regulatory elements. These findings challenge the concept that the genome is organized into discrete, insulated regulatory domains. In addition, this work has important implications for our understanding of genome evolution, the interpretation of genome-wide expression, expression-quantitative trait loci, and copy number variant analyses.

    Funded by: Medical Research Council; Wellcome Trust

    Proceedings of the National Academy of Sciences of the United States of America 2009;106;51;21771-6

  • Amplification and overexpression of Hsa-miR-30b, Hsa-miR-30d and KHDRBS3 at 8q24.22-q24.23 in medulloblastoma.

    Lu Y, Ryan SL, Elliott DJ, Bignell GR, Futreal PA, Ellison DW, Bailey S and Clifford SC

    Northern Institute for Cancer Research, Newcastle University, Newcastle upon Tyne, United Kingdom.

    Background: Medulloblastoma is the most common malignant brain tumour of childhood. The identification of critical genes involved in its pathogenesis will be central to advances in our understanding of its molecular basis, and the development of improved therapeutic approaches.

    We performed a SNP-array based genome-wide copy number analysis in medulloblastoma cell lines, to identify regions of genomic amplification and homozygous deletion, which may harbour critical disease genes. A series of novel and established medulloblastoma defects were detected (MYC amplification (n = 4), 17q21.31 high-level gain (n = 1); 9p21.1-p21.3 (n = 1) and 6q23.1 (n = 1) homozygous deletion). Most notably, a novel recurrent region of genomic amplification at 8q24.22-q24.23 was identified (n = 2), and selected for further investigation. Additional analysis by interphase fluorescence in situ hybridisation (iFISH), PCR-based mapping and SNP-array revealed this novel amplification at 8q24.22-q24.23 is independent of MYC amplification at 8q24.21, and is unique to medulloblastoma in over 800 cancer cell lines assessed from different tumour types, suggesting it contains key genes specifically involved in medulloblastoma development. Detailed mapping identified a 3Mb common minimal region of amplification harbouring 3 coding genes (ZFAT1, LOC286094, KHDRBS3) and two genes encoding micro-RNAs (hsa-miR-30b, hsa-miR-30d). Of these, only expression of hsa-miR-30b, hsa-miR-30d and KHDRBS3 correlated with copy number status, and all three of these transcripts also displayed evidence of elevated expression in sub-sets of primary medulloblastomas, measured relative to the normal cerebellum.

    These data implicate hsa-miR-30b, hsa-miR-30d and KHDRBS3 as putative oncogenic target(s) of a novel recurrent medulloblastoma amplicon at 8q24.22-q24.23. Our findings suggest critical roles for these genes in medulloblastoma development, and further support the contribution of micro-RNA species to medulloblastoma pathogenesis.

    Funded by: Cancer Research UK: C8464/A5497

    PloS one 2009;4;7;e6159

  • Biology of Genomes: making sense of sequence.

    Macarthur DG

    Human Evolution, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. dm8@sanger.ac.uk.

    A report on the Biology of Genomes meeting held at Cold Spring Harbor Laboratory, NY, USA, 5-9 May 2009.

    Genome medicine 2009;1;6;61

  • A genome-wide association study identifies a novel locus on chromosome 18q12.2 influencing white cell telomere length.

    Mangino M, Richards JB, Soranzo N, Zhai G, Aviv A, Valdes AM, Samani NJ, Deloukas P and Spector TD

    Department of Twin Research & Genetic Epidemiology, King's College London, London, UK.

    Background: Telomere length is a predictor for a number of common age related diseases and is a heritable trait.

    To identify new loci associated with mean leukocyte telomere length we conducted a genome wide association study of 314,075 single nucleotide polymorphisms (SNPs) and validated the results in a second cohort (n for both cohorts combined = 2790). We identified two novel associated variants (rs2162440, p = 2.6 x 10(-6); and rs7235755, p = 5.5 x 10(-6)) on chromosome 18q12.2 in the same region as the VPS34/PIK3C3 gene, which has been directly implicated in the pathway controlling telomere length variation in yeast.

    Conclusion: These results provide new insights into the pathways regulating telomere homeostasis in humans.

    Funded by: Wellcome Trust: 077011

    Journal of medical genetics 2009;46;7;451-4

  • LookSeq: a browser-based viewer for deep sequencing data.

    Manske HM and Kwiatkowski DP

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, United Kingdom. mm6@sanger.ac.uk

    Sequencing a genome to great depth can be highly informative about heterogeneity within an individual or a population. Here we address the problem of how to visualize the multiple layers of information contained in deep sequencing data. We propose an interactive AJAX-based web viewer for browsing large data sets of aligned sequence reads. By enabling seamless browsing and fast zooming, the LookSeq program assists the user to assimilate information at different levels of resolution, from an overview of a genomic region to fine details such as heterogeneity within the sample. A specific problem, particularly if the sample is heterogeneous, is how to depict information about structural variation. LookSeq provides a simple graphical representation of paired sequence reads that is more revealing about potential insertions and deletions than are conventional methods.

    Funded by: Medical Research Council: G0600718, G19/9; Wellcome Trust

    Genome research 2009;19;11;2125-32

  • SNP-o-matic.

    Manske HM and Kwiatkowski DP

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK. mm6@sanger.ac.uk

    Motivation: High throughput sequencing technologies generate large amounts of short reads. Mapping these to a reference sequence consumes large amounts of processing time and memory, and read mapping errors can lead to noisy or incorrect alignments. SNP-o-matic is a fast, memory-efficient and stringent read mapping tool offering a variety of analytical output functions, with an emphasis on genotyping.

    Availability: http://snpomatic.sourceforge.net.

    Funded by: Medical Research Council: G0600718, G19/9; Wellcome Trust

    Bioinformatics (Oxford, England) 2009;25;18;2434-5

  • High prevalence of four long QT syndrome founder mutations in the Finnish population.

    Marjamaa A, Salomaa V, Newton-Cheh C, Porthan K, Reunanen A, Karanko H, Jula A, Lahermo P, Väänänen H, Toivonen L, Swan H, Viitasalo M, Nieminen MS, Peltonen L, Oikarinen L, Palotie A and Kontula K

    Research Program in Molecular Medicine, Biomedicum Helsinki, University of Helsinki, Helsinki, Finland.

    Aims: Long QT syndrome (LQTS) is an inherited arrhythmia disorder with an estimated prevalence of 0.01%-0.05%. In Finland, four founder mutations constitute up to 70% of the known genetic spectrum of LQTS. In the present survey, we sought to estimate the actual prevalence of the founder mutations and to determine their effect sizes in the general Finnish population.

    We genotyped 6334 subjects aged ≥30 years from a population cohort (Health 2000 study) for the four Finnish founder mutations using Sequenom MALDI-TOF mass spectrometry. The electrocardiogram (ECG) parameters were measured from digital 12-lead ECGs, and QT intervals were adjusted for age, sex, and heart rate using linear regression. A total of 27 individuals carried one of the founder mutations resulting in their collective prevalence estimate of 0.4% (95% CI 0.3%-0.6%). The KCNQ1 G589D mutation (n=8) was associated with a 50 ms (SE 7.0) prolongation of the adjusted QT interval (P=9.0x10(-13)). The KCNH2 R176W variant (n=16) resulted in a 22 ms (SE 4.7) longer adjusted QT interval (P=2.1x10(-6)).

    Conclusion: In Finland 1 individual out of 250 carries a LQTS founder mutation, which is the highest documented prevalence of LQTS mutations that lead to a marked QT prolongation.

    Funded by: NHLBI NIH HHS: HL 080025; Wellcome Trust: 089061, 089062

    Annals of medicine 2009;41;3;234-40
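
    The prevalence figure in the abstract above can be checked with a short calculation. The sketch below recomputes the carrier prevalence from the reported counts (27 carriers among 6334 subjects) and attaches a Wilson score interval. The abstract does not say which interval method the authors used, so the Wilson interval is an assumption made for illustration; it gives roughly 0.3% to 0.6%, in line with the reported range.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%).

    Assumption: the abstract does not state its CI method; Wilson is used
    here only as a reasonable choice for a small proportion.
    """
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, centre - half, centre + half

# Counts taken from the abstract: 27 carriers out of 6334 genotyped subjects.
prevalence, low, high = wilson_interval(27, 6334)
print(f"prevalence = {prevalence:.2%}, 95% CI {low:.2%} to {high:.2%}")
```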

  • Donor-recipient mismatch for common gene deletion polymorphisms in graft-versus-host disease.

    McCarroll SA, Bradner JE, Turpeinen H, Volin L, Martin PJ, Chilewski SD, Antin JH, Lee SJ, Ruutu T, Storer B, Warren EH, Zhang B, Zhao LP, Ginsburg D, Soiffer RJ, Partanen J, Hansen JA, Ritz J, Palotie A and Altshuler D

    Department of Molecular Biology, Massachusetts General Hospital, Boston, Massachusetts, USA. mccarroll@genetics.med.harvard.edu

    Transplantation and pregnancy, in which two diploid genomes reside in one body, can each lead to diseases in which immune cells from one individual target antigens encoded in the other's genome. One such disease, graft-versus-host disease (GVHD) after hematopoietic stem cell transplantation (HSCT, or bone marrow transplant), is common even after transplants between HLA-identical siblings, indicating that cryptic histocompatibility loci exist outside the HLA locus. The immune system of an individual whose genome is homozygous for a gene deletion could recognize epitopes encoded by that gene as alloantigens. Analyzing common gene deletions in three HSCT cohorts (1,345 HLA-identical sibling donor-recipient pairs), we found that risk of acute GVHD was greater (odds ratio (OR) = 2.5; 95% confidence interval (CI) 1.4-4.6) when donor and recipient were mismatched for homozygous deletion of UGT2B17, a gene expressed in GVHD-affected tissues and giving rise to multiple histocompatibility antigens. Human genome structural variation merits investigation as a potential mechanism in diseases of alloimmunity.

    Funded by: NCI NIH HHS: CA18029, P01 CA018029-270048, P01 CA018029-349016; NHLBI NIH HHS: HL087690, P01 HL070149-05, R01 HL087690-03; NIAID NIH HHS: AI29530, AI33484, P01 AI029530, P01 AI029530-130007, P01 AI033484-13; PHS HHS: HA070149

    Nature genetics 2009;41;12;1341-4

  • Microduplications of 16p11.2 are associated with schizophrenia.

    McCarthy SE, Makarov V, Kirov G, Addington AM, McClellan J, Yoon S, Perkins DO, Dickel DE, Kusenda M, Krastoshevsky O, Krause V, Kumar RA, Grozeva D, Malhotra D, Walsh T, Zackai EH, Kaplan P, Ganesh J, Krantz ID, Spinner NB, Roccanova P, Bhandari A, Pavon K, Lakshmi B, Leotta A, Kendall J, Lee YH, Vacic V, Gary S, Iakoucheva LM, Crow TJ, Christian SL, Lieberman JA, Stroup TS, Lehtimäki T, Puura K, Haldeman-Englert C, Pearl J, Goodell M, Willour VL, Derosse P, Steele J, Kassem L, Wolff J, Chitkara N, McMahon FJ, Malhotra AK, Potash JB, Schulze TG, Nöthen MM, Cichon S, Rietschel M, Leibenluft E, Kustanovich V, Lajonchere CM, Sutcliffe JS, Skuse D, Gill M, Gallagher L, Mendell NR, Wellcome Trust Case Control Consortium, Craddock N, Owen MJ, O'Donovan MC, Shaikh TH, Susser E, Delisi LE, Sullivan PF, Deutsch CK, Rapoport J, Levy DL, King MC and Sebat J

    Recurrent microdeletions and microduplications of a 600-kb genomic region of chromosome 16p11.2 have been implicated in childhood-onset developmental disorders. We report the association of 16p11.2 microduplications with schizophrenia in two large cohorts. The microduplication was detected in 12/1,906 (0.63%) cases and 1/3,971 (0.03%) controls (P = 1.2 x 10(-5), OR = 25.8) from the initial cohort, and in 9/2,645 (0.34%) cases and 1/2,420 (0.04%) controls (P = 0.022, OR = 8.3) of the replication cohort. The 16p11.2 microduplication was associated with a 14.5-fold increased risk of schizophrenia (95% CI (3.3, 62)) in the combined sample. A meta-analysis of datasets for multiple psychiatric disorders showed a significant association of the microduplication with schizophrenia (P = 4.8 x 10(-7)), bipolar disorder (P = 0.017) and autism (P = 1.9 x 10(-7)). In contrast, the reciprocal microdeletion was associated only with autism and developmental disorders (P = 2.3 x 10(-13)). Head circumference was larger in patients with the microdeletion than in patients with the microduplication (P = 0.0007).

    Funded by: Medical Research Council: G0800509; NCRR NIH HHS: RR000037; NICHD NIH HHS: HD04147; NIDCR NIH HHS: DE016442; NIGMS NIH HHS: GM081519; NIMH NIH HHS: 1U24MH081810, K99 MH086756-01, K99 MH086756-02, MH061009, MH071523, MH074027, MH076431, MH077139, MH081810, MH083989, MH31340, MH44245, N01 MH90001, R00 MH086756-03, R01 MH091350-03, ZIA MH002581-19; PHS HHS: HF004222; Wellcome Trust: 076113

    Nature genetics 2009;41;11;1223-7
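
    The carrier counts quoted in the abstract above are enough to compute a crude odds ratio for each cohort. The sketch below does this from a 2x2 table. It is only an illustration: the abstract does not state which estimator or adjustment the authors used, and the crude value for the initial cohort (about 25.2) lands close to, but not exactly on, the reported 25.8, while the replication cohort gives about 8.3.

```python
def odds_ratio(case_carriers, cases, control_carriers, controls):
    """Crude odds ratio from a 2x2 table of carriers and non-carriers."""
    a = case_carriers                 # carriers among cases
    b = cases - case_carriers         # non-carriers among cases
    c = control_carriers              # carriers among controls
    d = controls - control_carriers   # non-carriers among controls
    return (a * d) / (b * c)

# Counts as reported in the abstract.
initial = odds_ratio(12, 1906, 1, 3971)
replication = odds_ratio(9, 2645, 1, 2420)
print(f"initial cohort OR ~ {initial:.1f}")
print(f"replication cohort OR ~ {replication:.1f}")
```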

  • Geographical structure and differential natural selection among North European populations.

    McEvoy BP, Montgomery GW, McRae AF, Ripatti S, Perola M, Spector TD, Cherkas L, Ahmadi KR, Boomsma D, Willemsen G, Hottenga JJ, Pedersen NL, Magnusson PK, Kyvik KO, Christensen K, Kaprio J, Heikkilä K, Palotie A, Widen E, Muilu J, Syvänen AC, Liljedahl U, Hardiman O, Cronin S, Peltonen L, Martin NG and Visscher PM

    Queensland Institute of Medical Research, Brisbane, Queensland, Australia. brian.mcevoy@qimr.edu.au

    Population structure can provide novel insight into the human past, and recognizing and correcting for such stratification is a practical concern in gene mapping by many association methodologies. We investigate these patterns, primarily through principal component (PC) analysis of whole genome SNP polymorphism, in 2099 individuals from populations of Northern European origin (Ireland, United Kingdom, Netherlands, Denmark, Sweden, Finland, Australia, and HapMap European-American). The major trends (PC1 and PC2) demonstrate an ability to detect geographic substructure, even over a small area like the British Isles, and this information can then be applied to finely dissect the ancestry of the European-Australian and European-American samples. They simultaneously point to the importance of considering population stratification in what might be considered a small homogeneous region. There is evidence from F(ST)-based analysis of genic and nongenic SNPs that differential positive selection has operated across these populations despite their short divergence time and relatively similar geographic and environmental range. The pressure appears to have been focused on genes involved in immunity, perhaps reflecting a response to an infectious disease epidemic. Such an event may explain a striking selective sweep centered on the rs2508049-G allele, close to the HLA-G gene on chromosome 6. Evidence of the sweep extends over an 8-Mb/3.5-cM region. Overall, the results illustrate the power of dense genotype and sample data to explore regional population variation, the events that have crafted it, and their implications in both explaining disease prevalence and mapping these genes by association.

    Funded by: Biotechnology and Biological Sciences Research Council; Department of Health; Wellcome Trust

    Genome research 2009;19;5;804-14

  • Mutations in the seed region of human miR-96 are responsible for nonsyndromic progressive hearing loss.

    Mencía A, Modamio-Høybjør S, Redshaw N, Morín M, Mayo-Merino F, Olavarrieta L, Aguirre LA, del Castillo I, Steel KP, Dalmay T, Moreno F and Moreno-Pelayo MA

    Unidad de Genética Molecular, Hospital Ramón y Cajal, Madrid, Spain.

    MicroRNAs (miRNAs) bind to complementary sites in their target mRNAs to mediate post-transcriptional repression, with the specificity of target recognition being crucially dependent on the miRNA seed region. Impaired miRNA target binding resulting from SNPs within mRNA target sites has been shown to lead to pathologies associated with dysregulated gene expression. However, no pathogenic mutations within the mature sequence of a miRNA have been reported so far. Here we show that point mutations in the seed region of miR-96, a miRNA expressed in hair cells of the inner ear, result in autosomal dominant, progressive hearing loss. This is the first study implicating a miRNA in a mendelian disorder. The identified mutations have a strong impact on miR-96 biogenesis and result in a significant reduction of mRNA targeting. We propose that these mutations alter the regulatory role of miR-96 in maintaining gene expression profiles in hair cells required for their normal function.

    Funded by: Action on Hearing Loss: G41; Medical Research Council: G0300212, MC_QA137918; Wellcome Trust

    Nature genetics 2009;41;5;609-13

  • Genetic structure of nomadic Bedouin from Kuwait.

    Mohammad T, Xue Y, Evison M and Tyler-Smith C

    Division of Genomic Medicine, University of Sheffield, Sheffield, UK.

    Bedouin are traditionally nomadic inhabitants of the Persian Gulf who claim descent from two male lineages: Adnani and Qahtani. We have investigated whether or not this tradition is reflected in the current genetic structure of a sample of 153 Bedouin males from six Kuwaiti tribes, including three tribes from each traditional lineage. Volunteers were genotyped using a panel of autosomal and Y-STRs, and Y-SNPs. The samples clustered with their geographical neighbours in both the autosomal and Y-chromosomal analyses, and showed strong evidence of genetic isolation and drift. Although there was no evidence of segregation into the two male lineages, other aspects of genetic structure were in accord with tradition.

    Funded by: Wellcome Trust: 077009

    Heredity 2009;103;5;425-33

  • Current computational methods for prioritizing candidate regulatory polymorphisms.

    Montgomery S

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Discovery of DNA sequence variants responsible for human phenotypic variation is key to advances in molecular diagnostics and medicines. Historically, variants that alter the protein-coding sequence of genes have been targeted when attempting to identify a trait's etiology; this is done because the rules governing these regions are generally well-understood and candidate variants can be easily selected. However, the effects of variants on gene regulation are increasingly regarded as being as important as protein-coding variation in uncovering the nature of phenotypic variation. I discuss resources and methodology that have recently been developed to computationally prioritize variants that may alter gene expression.

    Methods in molecular biology (Clifton, N.J.) 2009;569;89-114

  • Discovery of mating in the major African livestock pathogen Trypanosoma congolense.

    Morrison LJ, Tweedie A, Black A, Pinchbeck GL, Christley RM, Schoenefeld A, Hertz-Fowler C, MacLeod A, Turner CM and Tait A

    Wellcome Centre for Molecular Parasitology, University of Glasgow, Glasgow Biomedical Research Centre, Glasgow, United Kingdom. lm78y@udcf.gla.ac.uk

    The protozoan parasite, Trypanosoma congolense, is one of the most economically important pathogens of livestock in Africa and, through its impact on cattle health and productivity, has a significant effect on human health and well-being. Despite the importance of this parasite, our knowledge of some of the fundamental biological processes is limited. For example, it is unknown whether mating takes place. In this paper we have taken a population genetics based approach to address this question. The availability of genome sequence of the parasite allowed us to identify polymorphic microsatellite markers, which were used to genotype T. congolense isolates from livestock in a discrete geographical area of The Gambia. The data showed a high level of diversity with a large number of distinct genotypes, but a deficit in heterozygotes. Further analysis identified cryptic genetic subdivision into four sub-populations. In one of these, parasite genotypic diversity could only be explained by the occurrence of frequent mating in T. congolense. These data are completely inconsistent with previous suggestions that the parasite expands asexually in the absence of mating. The discovery of mating in this species of trypanosome has significant consequences for the spread of critical traits, such as drug resistance, as well as for fundamental aspects of the biology and epidemiology of this neglected but economically important pathogen.

    Funded by: Wellcome Trust: 074732, 079703

    PloS one 2009;4;5;e5564

  • Inferring selection on amino acid preference in protein domains.

    Moses AM and Durbin R

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK. alan.moses@utoronto.ca

    Models that explicitly account for the effect of selection on new mutations have been proposed to account for "codon bias" or the excess of "preferred" codons that results from selection for translational efficiency and/or accuracy. In principle, such models can be applied to any mutation that results in a preferred allele, but in most cases, the fitness effect of a specific mutation cannot be predicted. Here we show that it is possible to assign preferred and unpreferred states to amino acid changing mutations that occur in protein domains. We propose that mutations that lead to more common amino acids (at a given position in a domain) can be considered "preferred alleles" just as are synonymous mutations leading to codons for more abundant tRNAs. We use genome-scale polymorphism data to show that alleles for preferred amino acids in protein domains occur at higher frequencies in the population, as has been shown for preferred codons. We show that this effect is quantitative, such that there is a correlation between the shift in frequency of preferred alleles and the predicted fitness effect. As expected, we also observe a reduction in the numbers of polymorphisms and substitutions at more important positions in domains, consistent with stronger selection at those positions. We examine the derived allele frequency distribution and polymorphism to divergence ratios of preferred and unpreferred differences and find evidence for both negative and positive selections acting to maintain protein domains in the human population. Finally, we analyze a model for selection on amino acid preferences in protein domains and find that it is consistent with the quantitative effects that we observe.

    Funded by: Wellcome Trust: 077192

    Molecular biology and evolution 2009;26;3;527-36

  • Gene-wide analyses of genome-wide association data sets: evidence for multiple common risk alleles for schizophrenia and bipolar disorder and for overlap in genetic risk.

    Moskvina V, Craddock N, Holmans P, Nikolov I, Pahwa JS, Green E, Wellcome Trust Case Control Consortium, Owen MJ and O'Donovan MC

    Department of Psychological Medicine, School of Medicine, Cardiff University, Cardiff, UK.

    Genome-wide association (GWAS) analyses have identified susceptibility loci for many diseases, but most risk for any complex disorder remains unattributed. There is therefore scope for complementary approaches to these data sets. Gene-wide approaches potentially offer additional insights. They might identify association to genes through multiple signals. Also, by providing support for genes rather than single nucleotide polymorphisms (SNPs), they offer an additional opportunity to compare the results across data sets. We have undertaken gene-wide analysis of two GWAS data sets: schizophrenia and bipolar disorder. We performed two forms of analysis, one based on the smallest P-value per gene, the other on a truncated product of P method. For each data set and at a range of statistical thresholds, we observed significantly more SNPs within genes (P(min) for excess<0.001) showing evidence for association than expected whereas this was not true for extragenic SNPs (P(min) for excess>0.1). At a range of thresholds of significance, we also observed substantially more associated genes than expected (P(min) for excess in schizophrenia=1.8 x 10(-8), in bipolar=2.4 x 10(-6)). Moreover, an excess of genes showed evidence for association across disorders. Among those genes surpassing thresholds highly enriched for true association, we observed evidence for association to genes reported in other GWAS data sets (CACNA1C) or to closely related family members of those genes including CSF2RB, CACNA1B and DGKI. Our analyses show that association signals are enriched in and around genes, large numbers of genes contribute to both disorders and gene-wide analyses offer useful complementary approaches to more standard methods.

    Funded by: Medical Research Council: G0000934, G0800509; Wellcome Trust: 076113, 079643

    Molecular psychiatry 2009;14;3;252-60
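
    The two gene-wide statistics named in the abstract above (smallest P-value per gene, and a truncated product of P-values) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the SNP P-values are made up, the truncation point `tau` is a hypothetical choice, and the null distribution is simulated under the assumption that SNPs are independent, which real data with linkage disequilibrium would violate. The abstract does not describe how significance was assessed.

```python
import math
import random

def min_p(p_values):
    """Smallest single-SNP P-value within a gene."""
    return min(p_values)

def truncated_product(p_values, tau=0.05):
    """Log of the product of all P-values <= tau (0 if none qualify)."""
    return sum(math.log(p) for p in p_values if p <= tau)

def monte_carlo_p(p_values, tau=0.05, n_sim=10000, seed=0):
    """Gene-wide P-value for the truncated product statistic.

    Assumption: independent SNPs, so null P-values are simulated as
    independent uniforms. A more negative log-product is more extreme.
    """
    rng = random.Random(seed)
    observed = truncated_product(p_values, tau)
    k = len(p_values)
    hits = 0
    for _ in range(n_sim):
        sim = [1.0 - rng.random() for _ in range(k)]  # uniform on (0, 1]
        if truncated_product(sim, tau) <= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)

# Hypothetical SNP P-values for one gene (illustrative only).
snp_p = [0.003, 0.04, 0.2, 0.6, 0.01]
gene_p = monte_carlo_p(snp_p)
print(f"min P = {min_p(snp_p)}, truncated-product gene-wide P ~ {gene_p:.4f}")
```

    Working with the log of the product avoids numerical underflow when a gene contains many small P-values.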

  • Novel genes in cell cycle control and lipid metabolism with dynamically regulated binding sites for sterol regulatory element-binding protein 1 and RNA polymerase II in HepG2 cells detected by chromatin immunoprecipitation with microarray detection.

    Motallebipour M, Enroth S, Punga T, Ameur A, Koch C, Dunham I, Komorowski J, Ericsson J and Wadelius C

    Department of Genetics and Pathology, Rudbeck Laboratory, Uppsala University, Uppsala, Sweden.

    Sterol regulatory element-binding proteins 1 and 2 (SREBP-1 and SREBP-2) are important regulators of genes involved in cholesterol and fatty acid metabolism, but have also been implicated in the regulation of the cell cycle and have been associated with the pathogenesis of type 2 diabetes, atherosclerosis and obesity, among others. In this study, we aimed to characterize the binding sites of SREBP-1 and RNA polymerase II through chromatin immunoprecipitation and microarray analysis in 1% of the human genome, as defined by the Encyclopaedia of DNA Elements consortium, in a hepatocellular carcinoma cell line (HepG2). Our data identified novel binding sites for SREBP-1 in genes directly or indirectly involved in cholesterol metabolism, e.g. apolipoprotein C-III (APOC3). The most interesting biological findings were the binding sites for SREBP-1 in genes for host cell factor C1 (HCFC1), involved in cell cycle regulation, and for filamin A (FLNA). For RNA polymerase II, we found binding sites at classical promoters, but also in intergenic and intragenic regions. Furthermore, we found evidence of sterol-regulated binding of SREBP-1 and RNA polymerase II to HCFC1 and FLNA. From the results of this work, we infer that SREBP-1 may be involved in processes other than lipid metabolism.

    The FEBS journal 2009;276;7;1878-90

  • Abnormal behavior in a chromosome-engineered mouse model for human 15q11-13 duplication seen in autism.

    Nakatani J, Tamada K, Hatanaka F, Ise S, Ohta H, Inoue K, Tomonaga S, Watanabe Y, Chung YJ, Banerjee R, Iwamoto K, Kato T, Okazawa M, Yamauchi K, Tanda K, Takao K, Miyakawa T, Bradley A and Takumi T

    Osaka Bioscience Institute, Suita, Osaka 565-0874, Japan.

    Substantial evidence suggests that chromosomal abnormalities contribute to the risk of autism. The duplication of human chromosome 15q11-13 is known to be the most frequent cytogenetic abnormality in autism. We have modeled this genetic change in mice by using chromosome engineering to generate a 6.3 Mb duplication of the conserved linkage group on mouse chromosome 7. Mice with a paternal duplication display poor social interaction, behavioral inflexibility, abnormal ultrasonic vocalizations, and correlates of anxiety. An increased MBII52 snoRNA within the duplicated region, affecting the serotonin 2c receptor (5-HT2cR), correlates with altered intracellular Ca(2+) responses elicited by a 5-HT2cR agonist in neurons of mice with a paternal duplication. This chromosome-engineered mouse model for autism seems to replicate various aspects of human autistic phenotypes and validates the relevance of the human chromosome abnormality. This model will facilitate forward genetics of developmental brain disorders and serve as an invaluable tool for therapeutic development.

    Cell 2009;137;7;1235-46

  • Genetic structure of Europeans: a view from the North-East.

    Nelis M, Esko T, Mägi R, Zimprich F, Zimprich A, Toncheva D, Karachanak S, Piskácková T, Balascák I, Peltonen L, Jakkula E, Rehnström K, Lathrop M, Heath S, Galan P, Schreiber S, Meitinger T, Pfeufer A, Wichmann HE, Melegh B, Polgár N, Toniolo D, Gasparini P, D'Adamo P, Klovins J, Nikitina-Zake L, Kucinskas V, Kasnauskiene J, Lubinski J, Debniak T, Limborska S, Khrunin A, Estivill X, Rabionet R, Marsal S, Julià A, Antonarakis SE, Deutsch S, Borel C, Attar H, Gagnebin M, Macek M, Krawczak M, Remm M and Metspalu A

    Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia.

    Using principal component (PC) analysis, we studied the genetic constitution of 3,112 individuals from Europe as portrayed by more than 270,000 single nucleotide polymorphisms (SNPs) genotyped with the Illumina Infinium platform. In cohorts where the sample size was >100, one hundred randomly chosen samples were used for analysis to minimize the sample size effect, resulting in a total of 1,564 samples. This analysis revealed that the genetic structure of the European population correlates closely with geography. The first two PCs highlight the genetic diversity corresponding to the northwest to southeast gradient and position the populations according to their approximate geographic origin. The resulting genetic map forms a triangular structure with a) Finland, b) the Baltic region, Poland and Western Russia, and c) Italy as its vertexes, and with d) Central- and Western Europe in its centre. Inter- and intra- population genetic differences were quantified by the inflation factor lambda (λ) (ranging from 1.00 to 4.21), fixation index (F(st)) (ranging from 0.000 to 0.023), and by the number of markers exhibiting significant allele frequency differences in pair-wise population comparisons. The estimated lambda was used to assess the real diminishing impact to association statistics when two distinct populations are merged directly in an analysis. When the PC analysis was confined to the 1,019 Estonian individuals (0.1% of the Estonian population), a fine structure emerged that correlated with the geography of individual counties. With at least two cohorts available from several countries, genetic substructures were investigated in Czech, Finnish, German, Estonian and Italian populations. Together with previously published data, our results allow the creation of a comprehensive European genetic map that will greatly facilitate inter-population genetic studies including genome wide association studies (GWAS).

    PloS one 2009;4;5;e5472
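
    The inflation factor lambda mentioned in the abstract above is commonly computed as the median of the observed 1-degree-of-freedom chi-square association statistics divided by the median of the chi-square distribution with 1 degree of freedom (about 0.455). The sketch below uses that common genomic-control definition; the abstract itself does not give the formula, and the test statistics shown are hypothetical.

```python
import statistics

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def inflation_factor(chi2_stats):
    """Genomic inflation factor: observed median over the expected median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN

# Hypothetical per-SNP chi-square statistics from comparing two populations.
chi2_stats = [0.10, 0.35, 0.60, 0.95, 1.80, 2.40, 5.10]
lam = inflation_factor(chi2_stats)
print(f"lambda ~ {lam:.2f}")
```

    A value near 1 indicates no inflation, while larger values indicate that allele frequencies differ systematically between the groups being compared.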

  • Genome-wide association study identifies eight loci associated with blood pressure.

    Newton-Cheh C, Johnson T, Gateva V, Tobin MD, Bochud M, Coin L, Najjar SS, Zhao JH, Heath SC, Eyheramendy S, Papadakis K, Voight BF, Scott LJ, Zhang F, Farrall M, Tanaka T, Wallace C, Chambers JC, Khaw KT, Nilsson P, van der Harst P, Polidoro S, Grobbee DE, Onland-Moret NC, Bots ML, Wain LV, Elliott KS, Teumer A, Luan J, Lucas G, Kuusisto J, Burton PR, Hadley D, McArdle WL, Wellcome Trust Case Control Consortium, Brown M, Dominiczak A, Newhouse SJ, Samani NJ, Webster J, Zeggini E, Beckmann JS, Bergmann S, Lim N, Song K, Vollenweider P, Waeber G, Waterworth DM, Yuan X, Groop L, Orho-Melander M, Allione A, Di Gregorio A, Guarrera S, Panico S, Ricceri F, Romanazzi V, Sacerdote C, Vineis P, Barroso I, Sandhu MS, Luben RN, Crawford GJ, Jousilahti P, Perola M, Boehnke M, Bonnycastle LL, Collins FS, Jackson AU, Mohlke KL, Stringham HM, Valle TT, Willer CJ, Bergman RN, Morken MA, Döring A, Gieger C, Illig T, Meitinger T, Org E, Pfeufer A, Wichmann HE, Kathiresan S, Marrugat J, O'Donnell CJ, Schwartz SM, Siscovick DS, Subirana I, Freimer NB, Hartikainen AL, McCarthy MI, O'Reilly PF, Peltonen L, Pouta A, de Jong PE, Snieder H, van Gilst WH, Clarke R, Goel A, Hamsten A, Peden JF, Seedorf U, Syvänen AC, Tognoni G, Lakatta EG, Sanna S, Scheet P, Schlessinger D, Scuteri A, Dörr M, Ernst F, Felix SB, Homuth G, Lorbeer R, Reffelmann T, Rettig R, Völker U, Galan P, Gut IG, Hercberg S, Lathrop GM, Zelenika D, Deloukas P, Soranzo N, Williams FM, Zhai G, Salomaa V, Laakso M, Elosua R, Forouhi NG, Völzke H, Uiterwaal CS, van der Schouw YT, Numans ME, Matullo G, Navis G, Berglund G, Bingham SA, Kooner JS, Connell JM, Bandinelli S, Ferrucci L, Watkins H, Spector TD, Tuomilehto J, Altshuler D, Strachan DP, Laan M, Meneton P, Wareham NJ, Uda M, Jarvelin MR, Mooser V, Melander O, Loos RJ, Elliott P, Abecasis GR, Caulfield M and Munroe PB

    Center for Human Genetic Research, Massachusetts General Hospital, Boston, Massachusetts, USA. cnewtoncheh@chgr.mgh.harvard.edu

    Elevated blood pressure is a common, heritable cause of cardiovascular disease worldwide. To date, identification of common genetic variants influencing blood pressure has proven challenging. We tested 2.5 million genotyped and imputed SNPs for association with systolic and diastolic blood pressure in 34,433 subjects of European ancestry from the Global BPgen consortium and followed up findings with direct genotyping (N ≤ 71,225 European ancestry, N ≤ 12,889 Indian Asian ancestry) and in silico comparison (CHARGE consortium, N = 29,136). We identified association between systolic or diastolic blood pressure and common variants in eight regions near the CYP17A1 (P = 7 × 10(-24)), CYP1A2 (P = 1 × 10(-23)), FGF5 (P = 1 × 10(-21)), SH2B3 (P = 3 × 10(-18)), MTHFR (P = 2 × 10(-13)), c10orf107 (P = 1 × 10(-9)), ZNF652 (P = 5 × 10(-9)) and PLCD3 (P = 1 × 10(-8)) genes. All variants associated with continuous blood pressure were associated with dichotomous hypertension. These associations between common variants and blood pressure and hypertension offer mechanistic insights into the regulation of blood pressure and may point to novel targets for interventions to prevent cardiovascular disease.

    Funded by: British Heart Foundation: FS/05/061/19501, PG02/128, SP/04/002; Cancer Research UK; Chief Scientist Office: CZB/4/540, ETM/137, ETM/75; Medical Research Council: 85374, G0000934, G0400874, G0401527, G0501942, G0600329, G0701863, G0800759, G0801056, G9521010, G9521010D, MC_QA137934, MC_U105630924, MC_U106188470, MC_U137686854; NCRR NIH HHS: U54RR020278; NHGRI NIH HHS: 1Z01HG000024; NHLBI NIH HHS: K23 HL080025-04, K23HL083102, K23HL80025, R01 HL056931, R01 HL056931-02, R01 HL056931-03, R01 HL056931-04, R01HL056931, R01HL087676, R01HL087679; NIA NIH HHS: N01-AG-1-2109, N01AG-821336, N01AG-916413; NICHD NIH HHS: N01-HD-1-3107; NIDA NIH HHS: U54DA021519; NIDDK NIH HHS: DK062370, DK072193, R01 DK029867, R01 DK072193-04, U01DK062418; NIEHS NIH HHS: P30ES007033; NIMH NIH HHS: RL1MH083268; NIMHD NIH HHS: 263MD821336, 263MD916413; PHS HHS: 263-MA-410953; Wellcome Trust: 061858, 068545/Z/02, 070191/Z/03/Z, 076113, 076113/B/04/Z, 077011, 077016, 077016/Z/05/Z, 079557, 079895, 088885, 089061, WT088885/Z/09/Z

    Nature genetics 2009;41;6;666-76

  • Chromosomal rearrangements underlying karyotype differences between Chinese pangolin (Manis pentadactyla) and Malayan pangolin (Manis javanica) revealed by chromosome painting.

    Nie W, Wang J, Su W, Wang Y and Yang F

    State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, PR China. whnie@mail.kiz.ac.cn

    The Chinese pangolin (Manis pentadactyla), a representative species of the order Pholidota, has been enlisted in the mammalian whole-genome sequencing project mainly because of its phylogenetic importance. Previous studies showed that the diploid number of M. pentadactyla could vary from 2n = 36 to 42. To further characterize the genome organization of M. pentadactyla and to elucidate the chromosomal mechanisms underlying the karyotype diversity of Pholidota, we flow-sorted the chromosomes of 2n = 40 M. pentadactyla, and generated a set of chromosome-specific probes by DOP-PCR amplification of flow-sorted chromosomes. A comparative chromosome map between M. pentadactyla and the Malayan pangolin (Manis javanica, 2n = 38), as well as between human and M. pentadactyla, was established by chromosome painting for the first time. Our results demonstrate that seven Robertsonian rearrangements, together with considerable variations in the quantity of heterochromatin and in the number of nucleolar organizer regions (NORs), differentiate the karyotypes of 2n = 38 M. javanica and 2n = 40 M. pentadactyla. Moreover, we confirm that the M. javanica Y chromosome bears one NOR. Comparison of human homologous segment associations found in the genomes of M. javanica and M. pentadactyla revealed seven shared associations (HSA 1q/11, 2p/5, 2q/10q, 4p+q/20, 5/13, 6/19p and 8q/10p) that could constitute the potential Pholidota-specific signature rearrangements.

    Chromosome research : an international journal on the molecular, supramolecular and evolutionary aspects of chromosome biology 2009;17;3;321-9

  • Common genetic variation near the phospholamban gene is associated with cardiac repolarisation: meta-analysis of three genome-wide association studies.

    Nolte IM, Wallace C, Newhouse SJ, Waggott D, Fu J, Soranzo N, Gwilliam R, Deloukas P, Savelieva I, Zheng D, Dalageorgou C, Farrall M, Samani NJ, Connell J, Brown M, Dominiczak A, Lathrop M, Zeggini E, Wain LV, Wellcome Trust Case Control Consortium, DCCT/EDIC Research Group, Newton-Cheh C, Eijgelsheim M, Rice K, de Bakker PI, QTGEN consortium, Pfeufer A, Sanna S, Arking DE, QTSCD consortium, Asselbergs FW, Spector TD, Carter ND, Jeffery S, Tobin M, Caulfield M, Snieder H, Paterson AD, Munroe PB and Jamshidi Y

    Unit of Genetic Epidemiology and Bioinformatics, Department of Epidemiology, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands.

    To identify loci affecting the electrocardiographic QT interval, a measure of cardiac repolarisation associated with risk of ventricular arrhythmias and sudden cardiac death, we conducted a meta-analysis of three genome-wide association studies (GWAS) including 3,558 subjects from the TwinsUK and BRIGHT cohorts in the UK and the DCCT/EDIC cohort from North America. Five loci were significantly associated with QT interval at P<1x10(-6). To validate these findings we performed an in silico comparison with data from two QT consortia: QTSCD (n = 15,842) and QTGEN (n = 13,685). Analysis confirmed the association between common variants near NOS1AP (P = 1.4x10(-83)) and the phospholamban (PLN) gene (P = 1.9x10(-29)). The most associated SNP near NOS1AP (rs12143842) explains 0.82% variance; the SNP near PLN (rs11153730) explains 0.74% variance of QT interval duration. We found no evidence for interaction between these two SNPs (P = 0.99). PLN is a key regulator of cardiac diastolic function and is involved in regulating intracellular calcium cycling; it has only recently been identified as a susceptibility locus for QT interval. These data offer further mechanistic insights into genetic influence on the QT interval which may predispose to life-threatening arrhythmias and sudden cardiac death.

    Funded by: Biotechnology and Biological Sciences Research Council: G20234; British Heart Foundation: 06/094, FS/05/061/19501, PG02/128, SP/02/001; Department of Health; Medical Research Council: G0400874, G0501942, G9521010, G9521010D; NCRR NIH HHS: UL1RR025005; NHGRI NIH HHS: U01HG004402; NHLBI NIH HHS: HL054512, HL86694, K23-HL-080025, N01 HC-15103, N01 HC-55222, N01-HC-25195, N01-HC-35129, N01-HC-45133, N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022, N01-HC-75150, N01-HC-85079, N01-HC-85086, N02-HL-6-4278, R01 HL087652, R01HL086694, R01HL087641, R01HL59367, U01 HL080295; NIA NIH HHS: N01-AG-1-2109; NIDDK NIH HHS: N01-DK-6-2204, R01-DK-077510; PHS HHS: 263-MA-410953, HHSN268200625226C; Wellcome Trust: WT088885/Z/09/Z

    PloS one 2009;4;7;e6138

  • Developmentally regulated impediments to skin reinnervation by injured peripheral sensory axon terminals.

    O'Brien GS, Martin SM, Söllner C, Wright GJ, Becker CG, Portera-Cailliau C and Sagasti A

    Department of Molecular Cell and Developmental Biology, University of California, Los Angeles, CA 90095, USA.

    The structural plasticity of neurites in the central nervous system (CNS) diminishes dramatically after initial development, but the peripheral nervous system (PNS) retains substantial plasticity into adulthood. Nevertheless, functional reinnervation by injured peripheral sensory neurons is often incomplete [1-6]. To investigate the developmental control of skin reinnervation, we imaged the regeneration of trigeminal sensory axon terminals in live zebrafish larvae following laser axotomy. When axons were injured during early stages of outgrowth, regenerating and uninjured axons grew into denervated skin and competed with one another for territory. At later stages, after the establishment of peripheral arbor territories, the ability of uninjured neighbors to sprout diminished severely, and although injured axons reinitiated growth, they were repelled by denervated skin. Regenerating axons were repelled specifically by their former territories, suggesting that local inhibitory factors persist in these regions. Antagonizing the function of several members of the Nogo receptor (NgR)/RhoA pathway improved the capacity of injured axons to grow into denervated skin. Thus, as in the CNS, impediments to reinnervation in the PNS arise after initial establishment of axon arbor structure.

    Funded by: NIDCR NIH HHS: R01 DE018496-01A2; Wellcome Trust

    Current biology : CB 2009;19;24;2086-90

  • Functional genomics in zebrafish permits rapid characterization of novel platelet membrane proteins.

    O'Connor MN, Salles II, Cvejic A, Watkins NA, Walker A, Garner SF, Jones CI, Macaulay IC, Steward M, Zwaginga JJ, Bray SL, Dudbridge F, de Bono B, Goodall AH, Deckmyn H, Stemple DL, Ouwehand WH and Bloodomics Consortium

    Department of Haematology, University of Cambridge, Cambridge, United Kingdom.

    In this study, we demonstrate the suitability of the vertebrate Danio rerio (zebrafish) for functional screening of novel platelet genes in vivo by reverse genetics. Comparative transcript analysis of platelets and their precursor cell, the megakaryocyte, together with nucleated blood cell elements, endothelial cells, and erythroblasts, identified novel platelet membrane proteins with hitherto unknown roles in thrombus formation. We determined the phenotype induced by antisense morpholino oligonucleotide (MO)-based knockdown of 5 of these genes in a laser-induced arterial thrombosis model. To validate the model, the genes for platelet glycoprotein (GP) IIb and the coagulation protein factor VIII were targeted. MO-injected fish showed normal thrombus initiation but severely impaired thrombus growth, consistent with the mouse knockout phenotypes, and concomitant knockdown of both resulted in spontaneous bleeding. Knockdown of 4 of the 5 novel platelet proteins altered arterial thrombosis, as demonstrated by modified kinetics of thrombus initiation and/or development. We identified a putative role for BAMBI and LRRC32 in promotion and DCBLD2 and ESAM in inhibition of thrombus formation. We conclude that phenotypic analysis of MO-injected zebrafish is a fast and powerful method for initial screening of novel platelet proteins for function in thrombosis.

    Funded by: Wellcome Trust: WT077037/Z/05/Z, WT082597/Z/07/Z

    Blood 2009;113;19;4754-62

  • Somatic mutation databases as tools for molecular epidemiology and molecular pathology of cancer: proposed guidelines for improving data collection, distribution, and integration.

    Olivier M, Petitjean A, Teague J, Forbes S, Dunnick JK, den Dunnen JT, Langerød A, Wilkinson JM, Vihinen M, Cotton RG, Hainaut P, IARC and EC FP6

    Group of Molecular Carcinogenesis and Biomarkers, International Agency for Research on Cancer, World Health Organization, Lyon, France. molivier@iarc.fr

    There are currently fewer than 40 locus-specific databases (LSDBs) and one large general database that curate data on somatic mutations in human cancer genes. These databases differ in scope and use different annotation standards and database systems, resulting in duplicated efforts in data curation, and making it difficult for users to find clear and consistent information. As data related to somatic mutations are generated at an increasing pace, it is urgent to create a framework for improving the collection of this information and making it more accessible to clinicians, scientists, and epidemiologists to facilitate research on biomarkers. Here we propose a data flow for improving the connectivity between existing databases and we provide practical guidelines for data reporting, database contents, and annotation standards. These proposals are based on common standards recommended by the Human Genome Variation Society (HGVS) with additions related to specific requirements of somatic mutations in cancer. Indeed, somatic mutations may be used in molecular pathology and clinical studies to characterize tumor types, help treatment choice, predict response to treatment and patient outcome, or in epidemiological studies as markers for tumor etiology or exposure assessment. Thus, specific annotations are required to cover these diverse research topics. This initiative is meant to promote collaboration and discussion on these issues and the development of adequate resources that would avoid the loss of extremely valuable information generated by years of basic and clinical research.

    Human mutation 2009;30;3;275-82

  • Findings from bipolar disorder genome-wide association studies replicate in a Finnish bipolar family-cohort.

    Ollila HM, Soronen P, Silander K, Palo OM, Kieseppä T, Kaunisto MA, Lönnqvist J, Peltonen L, Partonen T and Paunio T

    Funded by: Wellcome Trust: 089061

    Molecular psychiatry 2009;14;4;351-3

  • Genetic variation in LIN28B is associated with the timing of puberty.

    Ong KK, Elks CE, Li S, Zhao JH, Luan J, Andersen LB, Bingham SA, Brage S, Smith GD, Ekelund U, Gillson CJ, Glaser B, Golding J, Hardy R, Khaw KT, Kuh D, Luben R, Marcus M, McGeehin MA, Ness AR, Northstone K, Ring SM, Rubin C, Sims MA, Song K, Strachan DP, Vollenweider P, Waeber G, Waterworth DM, Wong A, Deloukas P, Barroso I, Mooser V, Loos RJ and Wareham NJ

    Medical Research Council (MRC) Epidemiology Unit, Addenbrooke's Hospital, Cambridge, UK. ken.ong@mrc-epid.cam.ac.uk

    The timing of puberty is highly variable. We carried out a genome-wide association study for age at menarche in 4,714 women and report an association in LIN28B on chromosome 6 (rs314276, minor allele frequency (MAF) = 0.33, P = 1.5 × 10(-8)). In independent replication studies in 16,373 women, each major allele was associated with 0.12 years earlier menarche (95% CI = 0.08-0.16; P = 2.8 × 10(-10); combined P = 3.6 × 10(-16)). This allele was also associated with earlier breast development in girls (P = 0.001; N = 4,271); earlier voice breaking (P = 0.006, N = 1,026) and more advanced pubic hair development in boys (P = 0.01; N = 4,588); a faster tempo of height growth in girls (P = 0.00008; N = 4,271) and boys (P = 0.03; N = 4,588); and shorter adult height in women (P = 3.6 × 10(-7); N = 17,274) and men (P = 0.006; N = 9,840) in keeping with earlier growth cessation. These studies identify variation in LIN28B, a potent and specific regulator of microRNA processing, as the first genetic determinant regulating the timing of human pubertal growth and development.

    Funded by: Cancer Research UK; Medical Research Council: 73437, G0000934, G0401527, G0401527(74922), G0701863, G9815508, MC_U105630924, MC_U106179471, MC_U106179472, MC_U106179473, MC_U106188470, MC_U123092720, MC_U123092721, U.1061.00.001 (79471), U.1061.00.004(79472); Wellcome Trust: 068049, 068545/Z/02, 076467/Z/05/Z, 077011, 077016, 077016/Z/05/Z, 079996

    Nature genetics 2009;41;6;729-33

  • Combined effects of three independent SNPs greatly increase the risk estimate for RA at 6q23.

    Orozco G, Hinks A, Eyre S, Ke X, Gibbons LJ, Bowes J, Flynn E, Martin P, Wellcome Trust Case Control Consortium, YEAR consortium, Wilson AG, Bax DE, Morgan AW, Emery P, Steer S, Hocking L, Reid DM, Wordsworth P, Harrison P, Thomson W, Barton A and Worthington J

    arc-Epidemiology Unit, Stopford Building, The University of Manchester, Manchester M13 9PT, UK. gisela.orozco@manchester.ac.uk

    The most consistent finding derived from the WTCCC GWAS for rheumatoid arthritis (RA) was association to a SNP at 6q23. We performed a fine-mapping of the region in order to search the 6q23 region for additional disease variants. 3962 RA patients and 3531 healthy controls were included in the study. We found 18 SNPs associated with RA. The SNP showing the strongest association was rs6920220 [P = 2.6 x 10(-6), OR (95% CI) 1.22 (1.13-1.33)]. The next most strongly associated SNP was rs13207033 [P = 0.0001, OR (95% CI) 0.86 (0.8-0.93)] which was perfectly correlated with rs10499194, a SNP previously associated with RA in a US/European series. Additionally, we found a number of new potential RA markers, including rs5029937, located in intron 2 of TNFAIP3. Of the 18 associated SNPs, three polymorphisms, rs6920220, rs13207033 and rs5029937, remained significant after conditional logistic regression analysis. The combination of the carriage of both risk alleles of rs6920220 and rs5029937 together with the absence of the protective allele of rs13207033 was strongly associated with RA when compared with carriage of none [OR of 1.86 (95% CI) (1.51-2.29)]. This equates to an effect size of 1.50 (95% CI 1.21-1.85) compared with controls and is higher than that obtained for any SNP individually. This is the first study to show that the confirmed loci from the GWA studies, which confer only a modest effect size, could harbour a significantly greater effect once the effect of additional risk variants is accounted for.

    Funded by: Arthritis Research UK: 17552, 18475; Medical Research Council: G0000934; Wellcome Trust: 061858, 068545/Z/02

    Human molecular genetics 2009;18;14;2693-9

  • The discovery of genes implicated in myocardial infarction

    Ouwehand WH, Bloodomics-Cardiogenics-Consortia

    Journal of Thrombosis and Haemostasis; 22nd Congress of the International Society on Thrombosis and Haemostasis. 2009;7;305-7

  • Sites of differential DNA methylation between placenta and peripheral blood: molecular markers for noninvasive prenatal diagnosis of aneuploidies.

    Papageorgiou EA, Fiegler H, Rakyan V, Beck S, Hulten M, Lamnissou K, Carter NP and Patsalis PC

    Cytogenetics and Genomics Department, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus.

    The use of epigenetic differences between maternal whole blood and fetal (placental) DNA is one of the main areas of interest for the development of noninvasive prenatal diagnosis of aneuploidies. However, the lack of detailed chromosome-wide identification of differentially methylated sites has limited the application of this approach. In this study, we describe an analysis of chromosome-wide methylation status using methylation DNA immunoprecipitation coupled with high-resolution tiling oligonucleotide array analysis specific for chromosomes 21, 18, 13, X, and Y using female whole blood and placental DNA. We identified more than 2000 regions of differential methylation between female whole blood and placental DNA on each of the chromosomes tested. A subset of the differentially methylated regions identified was validated by real-time quantitative polymerase chain reaction. Additionally, correlation of these regions with CpG islands, genes, and promoter regions was investigated. Between 56 and 83% of the regions were located within nongenic regions whereas only 1 to 11% of the regions overlapped with CpG islands; of these, up to 65% were found in promoter regions. In summary, we identified a large number of previously unreported fetal epigenetic molecular markers that have the potential to be developed into targets for noninvasive prenatal diagnosis of trisomy 21 and other common aneuploidies. In addition, we demonstrated the effectiveness of the methylation DNA immunoprecipitation approach in the enrichment of hypermethylated fetal DNA.

    The American journal of pathology 2009;174;5;1609-18

  • Ethical data release in genome-wide association studies in developing countries.

    Parker M, Bull SJ, de Vries J, Agbenyega T, Doumbo OK and Kwiatkowski DP

    Ethox Centre, University of Oxford, Oxford, United Kingdom. michael.parker@ethox.ox.ac.uk

    Funded by: Medical Research Council: G0600230, G0600718, G19/9; PHS HHS: 566; Wellcome Trust: 077383/Z/05/Z, 087285/Z/08/Z

    PLoS medicine 2009;6;11;e1000143

  • A strand-specific RNA-Seq analysis of the transcriptome of the typhoid bacillus Salmonella typhi.

    Perkins TT, Kingsley RA, Fookes MC, Gardner PP, James KD, Yu L, Assefa SA, He M, Croucher NJ, Pickard DJ, Maskell DJ, Parkhill J, Choudhary J, Thomson NR and Dougan G

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    High-density, strand-specific cDNA sequencing (ssRNA-seq) was used to analyze the transcriptome of Salmonella enterica serovar Typhi (S. Typhi). By mapping sequence data to the entire S. Typhi genome, we analyzed the transcriptome in a strand-specific manner and further defined transcribed regions encoded within prophages, pseudogenes, previously un-annotated, and 3'- or 5'-untranslated regions (UTR). An additional 40 novel candidate non-coding RNAs were identified beyond those previously annotated. Proteomic analysis was combined with transcriptome data to confirm and refine the annotation of a number of hypothetical genes. ssRNA-seq was also combined with microarray and proteome analysis to further define the S. Typhi OmpR regulon and identify novel OmpR regulated transcripts. Thus, ssRNA-seq provides a novel and powerful approach to the characterization of the bacterial transcriptome.

    Funded by: Wellcome Trust

    PLoS genetics 2009;5;7;e1000569

  • Circulating beta-carotene levels and type 2 diabetes-cause or effect?

    Perry JR, Ferrucci L, Bandinelli S, Guralnik J, Semba RD, Rice N, Melzer D, DIAGRAM Consortium, Saxena R, Scott LJ, McCarthy MI, Hattersley AT, Zeggini E, Weedon MN and Frayling TM

    Peninsula Medical School, Exeter, UK.

    Circulating beta-carotene levels are inversely associated with risk of type 2 diabetes, but the causal direction of this association is not certain. In this study we used a Mendelian randomisation approach to provide evidence for or against the causal role of the antioxidant vitamin beta-carotene in type 2 diabetes.

    Methods: We used a common polymorphism (rs6564851) near the BCMO1 gene, which is strongly associated with circulating beta-carotene levels (p = 2 x 10(-24)), with each G allele associated with a 0.27 standard deviation increase in levels. We used data from the InCHIANTI and Uppsala Longitudinal Study of Adult Men (ULSAM) studies to estimate the association between beta-carotene levels and type 2 diabetes. We next used a triangulation approach to estimate the expected effect of rs6564851 on type 2 diabetes risk and compared this with the observed effect using data from 4549 type 2 diabetes patients and 5579 controls from the Diabetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium.

    Results: A 0.27 standard deviation increase in beta-carotene levels was associated with an OR of 0.90 (95% CI 0.86-0.95) for type 2 diabetes in the InCHIANTI study. This association was similar to that of the ULSAM study (OR 0.90 [0.84-0.97]). In contrast, there was no association between rs6564851 and type 2 diabetes (OR 0.98 [0.93-1.04], p = 0.58); this effect size was also smaller than that expected, given the known associations between rs6564851 and beta-carotene levels, and the associations between beta-carotene levels and type 2 diabetes.

    Our findings in this Mendelian randomisation study are in keeping with randomised controlled trials suggesting that beta-carotene is not causally protective against type 2 diabetes.

    Funded by: NIA NIH HHS: N01-AG-6-2101, N01-AG-6-2103, N01-AG-6-2106, R01 AG24233-01, Z01 AG000015-50

    Diabetologia 2009;52;10;2117-21

  • Meta-analysis of genome-wide association data identifies two loci influencing age at menarche.

    Perry JR, Stolk L, Franceschini N, Lunetta KL, Zhai G, McArdle PF, Smith AV, Aspelund T, Bandinelli S, Boerwinkle E, Cherkas L, Eiriksdottir G, Estrada K, Ferrucci L, Folsom AR, Garcia M, Gudnason V, Hofman A, Karasik D, Kiel DP, Launer LJ, van Meurs J, Nalls MA, Rivadeneira F, Shuldiner AR, Singleton A, Soranzo N, Tanaka T, Visser JA, Weedon MN, Wilson SG, Zhuang V, Streeten EA, Harris TB, Murray A, Spector TD, Demerath EW, Uitterlinden AG and Murabito JM

    Institute of Biomedical and Clinical Science, Peninsula Medical School, Exeter, UK.

    We conducted a meta-analysis of genome-wide association data to detect genes influencing age at menarche in 17,510 women. The strongest signal was at 9q31.2 (P = 1.7 × 10(-9)), where the nearest genes include TMEM38B, FKTN, FSD1L, TAL2 and ZNF462. The next best signal was near the LIN28B gene (rs7759938; P = 7.0 × 10(-9)), which also influences adult height. We provide the first evidence for common genetic variants influencing female sexual maturation.

    Funded by: NCRR NIH HHS: M01 RR 16500, M01 RR016500-02; NHLBI NIH HHS: N01 HC025195, N01 HC055015, N01 HC055016, N01 HC055018, N01 HC055019, N01 HC055020, N01 HC055021, N01 HC055022, N01-HC-25195, N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022, N02 HL64278, N02-HL-6-4278, U01 HL072515, U01 HL072515-06, U01 HL72515; NIA NIH HHS: N.1-AG-1-1, N.1-AG-1-2111, N01 AG012100, N01-AG-12100, N01-AG-5-0002, R01 AR/AG 41398, R21 AG032598-02, R21AG032598, U19 AG023122, U19 AG023122-05; NIAMS NIH HHS: R01 AR041398-15; NIDDK NIH HHS: P30 DK072488, P30 DK072488-02; NIMHD NIH HHS: 263 MD 821336, 263 MD 9164, 263 MD821336, 263 MD9164 13; Wellcome Trust

    Nature genetics 2009;41;6;648-50

  • OSBPL10, a novel candidate gene for high triglyceride trait in dyslipidemic Finnish subjects, regulates cellular lipid metabolism.

    Perttilä J, Merikanto K, Naukkarinen J, Surakka I, Martin NW, Tanhuanpää K, Grimard V, Taskinen MR, Thiele C, Salomaa V, Jula A, Perola M, Virtanen I, Peltonen L and Olkkonen VM

    National Institute for Health and Welfare/Public Health Genomics Unit, Biomedicum, 00251 Helsinki, Finland.

    Analysis of variants in three genes encoding oxysterol-binding protein (OSBP) homologues (OSBPL2, OSBPL9, OSBPL10) in Finnish families with familial low high-density lipoprotein (HDL) levels (N = 426) or familial combined hyperlipidemia (N = 684) revealed suggestive linkage of OSBPL10 single-nucleotide polymorphisms (SNPs) with extreme end high triglyceride (TG; >90th percentile) trait. Prompted by this initial finding, we carried out association analysis in a metabolic syndrome subcohort (Genmets) of Health2000 examination survey (N = 2,138), revealing association of multiple OSBPL10 SNPs with high serum TG levels (>95th percentile). To investigate whether OSBPL10 could be the gene underlying the observed linkage and association, we carried out functional experiments in the human hepatoma cell line Huh7. Silencing of OSBPL10 increased the incorporation of [(3)H]acetate into cholesterol and both [(3)H]acetate and [(3)H]oleate into triglycerides and enhanced the accumulation of secreted apolipoprotein B100 in growth medium, suggesting that the encoded protein ORP10 suppresses hepatic lipogenesis and very-low-density lipoprotein production. ORP10 was shown to associate dynamically with microtubules, consistent with its involvement in intracellular transport or organelle positioning. The data introduce OSBPL10 as a gene whose variation may contribute to high triglyceride levels in dyslipidemic Finnish subjects and provide evidence for ORP10 as a regulator of cellular lipid metabolism.

    Journal of molecular medicine (Berlin, Germany) 2009;87;8;825-35

  • Agouti C57BL/6N embryonic stem cells for mouse genetic resources.

    Pettitt SJ, Liang Q, Rairdan XY, Moran JL, Prosser HM, Beier DR, Lloyd KC, Bradley A and Skarnes WC

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK.

    We report the characterization of a highly germline competent C57BL/6N mouse embryonic stem cell line, JM8. To simplify breeding schemes, the dominant agouti coat color gene was restored in JM8 cells by targeted repair of the C57BL/6 nonagouti mutation. These cells provide a robust foundation for large-scale mouse knockout programs that aim to provide a public resource of targeted mutations in the C57BL/6 genetic background.

    Funded by: NHGRI NIH HHS: U01-HG004080; PHS HHS: U01-42430; Wellcome Trust: 077188, WT077187

    Nature methods 2009;6;7;493-5

  • Variation in Salmonella enterica serovar typhi IncHI1 plasmids during the global spread of resistant typhoid fever.

    Phan MD, Kidgell C, Nair S, Holt KE, Turner AK, Hinds J, Butcher P, Cooke FJ, Thomson NR, Titball R, Bhutta ZA, Hasan R, Dougan G and Wain J

    The Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.

    A global collection of plasmids of the IncHI1 incompatibility group from Salmonella enterica serovar Typhi were analyzed by using a combination of DNA sequencing, DNA sequence analysis, PCR, and microarrays. The IncHI1 resistance plasmids of serovar Typhi display a backbone of conserved gene content and arrangement, within which are embedded preferred acquisition sites for horizontal DNA transfer events. The variable regions appear to be preferred acquisition sites for DNA, most likely through composite transposition, which is presumably driven by the acquisition of resistance genes. Plasmid multilocus sequence typing, a molecular typing method for IncHI1 plasmids, was developed using variation in six conserved loci to trace the spread of these plasmids and to elucidate their evolutionary relationships. The application of this method to a collection of 36 IncHI1 plasmids revealed a chronological clustering of plasmids despite their difference in geographical origins. Our findings suggest that the predominant plasmid types present after 1993 have not evolved directly from the earlier predominant plasmid type but have displaced them. We propose that antibiotic selection acts to maintain resistance genes on the plasmid, but there is also competition between plasmids encoding the same resistance phenotype.

    Funded by: Wellcome Trust: 080039

    Antimicrobial agents and chemotherapy 2009;53;2;716-27

  • Preparation of bacteriophage lysates and pure DNA.

    Pickard DJ

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Preparation of pure bacteriophage DNA used to rely on using CsCl gradients to give high purity or methods that yielded DNA that was either of low recovery or subject to significant genomic contamination. Recently though, new methods have come along that allow the purification of DNA from plate lysates that are not only capable of high yield but also, for all intents and purposes, free of genomic contamination (i.e. no visible genomic contamination on restriction analysis or when used for bacteriophage sequencing). The protocol that forms the basis of this short section can be used to prepare bacteriophage DNA from one or two 9 cm L-agar plates. For these preps, the use of agarose in the top agar is recommended to avoid any restriction inhibitors that may be present in some agar preparations.

    Methods in molecular biology (Clifton, N.J.) 2009;502;3-9

  • Association of AKT1 with verbal learning, verbal memory, and regional cortical gray matter density in twins.

    Pietiläinen OP, Paunio T, Loukola A, Tuulio-Henriksson A, Kieseppä T, Thompson P, Toga AW, van Erp TG, Silventoinen K, Soronen P, Hennah W, Turunen JA, Wedenoja J, Palo OM, Silander K, Lönnqvist J, Kaprio J, Cannon TD and Peltonen L

    FIMM, Institute for Molecular Medicine Finland and National Public Health Institute, Biomedicum, Helsinki, Finland.

    AKT1, encoding the protein kinase B, has been associated with the genetic etiology of schizophrenia and bipolar disorder. However, few data exist on the role of different alleles of AKT1 in measurable quantitative endophenotypes, such as cognitive abilities and neuroanatomical features, showing deviations in schizophrenia and bipolar disorder. We evaluated the contribution of AKT1 to quantitative cognitive traits and 3D high-resolution neuroanatomical images in a Finnish twin sample consisting of 298 twins: 61 pairs with schizophrenia (8 concordant), 31 pairs with bipolar disorder (5 concordant) and 65 control pairs matched for age, sex and demographics. An AKT1 allele defined by the SNP rs1130214 located in the UTR of the gene revealed association with cognitive traits related to verbal learning and memory (P = 0.0005 for a composite index). This association was further fortified by a higher degree of resemblance of verbal memory capacity in pairs sharing the rs1130214 genotype compared to pairs not sharing the genotype. Furthermore, the same allele was also associated with decreased gray matter density in medial and dorsolateral prefrontal cortex (P < 0.05). Our findings support the role of AKT1 in the genetic background of cognitive and anatomical features, known to be affected by psychotic disorders. The established association of the same allelic variant of AKT1 with both cognitive and neuroanatomical aberrations could suggest that AKT1 exerts its effect on verbal learning and memory via neural networks involving prefrontal cortex.

    Funded by: Wellcome Trust: 089061

    American journal of medical genetics. Part B, Neuropsychiatric genetics : the official publication of the International Society of Psychiatric Genetics 2009;150B;5;683-92

  • Mother's genome or maternally-inherited genes acting in the fetus influence gestational age in familial preterm birth.

    Plunkett J, Feitosa MF, Trusgnich M, Wangler MF, Palomar L, Kistka ZA, DeFranco EA, Shen TT, Stormo AE, Puttonen H, Hallman M, Haataja R, Luukkonen A, Fellman V, Peltonen L, Palotie A, Daw EW, An P, Teramo K, Borecki I and Muglia LJ

    Department of Pediatrics, Washington University School of Medicine, St Louis, MO, USA.

    Objective: While multiple lines of evidence suggest the importance of genetic contributors to risk of preterm birth, the nature of the genetic component has not been identified. We perform segregation analyses to identify the best fitting genetic model for gestational age, a quantitative proxy for preterm birth.

    Methods: Because either mother or infant can be considered the proband from a preterm delivery and there is evidence to suggest that genetic factors in either one or both may influence the trait, we performed segregation analysis for gestational age either attributed to the infant (infant's gestational age), or the mother (by averaging the gestational ages at which her children were delivered), using 96 multiplex preterm families.

    Results: These data lend further support to a genetic component contributing to birth timing since sporadic (i.e. no familial resemblance) and nontransmission (i.e. environmental factors alone contribute to gestational age) models are strongly rejected. Analyses of gestational age attributed to the infant support a model in which mother's genome and/or maternally-inherited genes acting in the fetus are largely responsible for birth timing, with a smaller contribution from the paternally-inherited alleles in the fetal genome.

    Conclusion: Our findings suggest that genetic influences on birth timing are important and likely complex.

    Funded by: NIGMS NIH HHS: T32 GM081739-02, T32GM081739

    Human heredity 2009;68;3;209-19

  • Variants in MTNR1B influence fasting glucose levels.

    Prokopenko I, Langenberg C, Florez JC, Saxena R, Soranzo N, Thorleifsson G, Loos RJ, Manning AK, Jackson AU, Aulchenko Y, Potter SC, Erdos MR, Sanna S, Hottenga JJ, Wheeler E, Kaakinen M, Lyssenko V, Chen WM, Ahmadi K, Beckmann JS, Bergman RN, Bochud M, Bonnycastle LL, Buchanan TA, Cao A, Cervino A, Coin L, Collins FS, Crisponi L, de Geus EJ, Dehghan A, Deloukas P, Doney AS, Elliott P, Freimer N, Gateva V, Herder C, Hofman A, Hughes TE, Hunt S, Illig T, Inouye M, Isomaa B, Johnson T, Kong A, Krestyaninova M, Kuusisto J, Laakso M, Lim N, Lindblad U, Lindgren CM, McCann OT, Mohlke KL, Morris AD, Naitza S, Orrù M, Palmer CN, Pouta A, Randall J, Rathmann W, Saramies J, Scheet P, Scott LJ, Scuteri A, Sharp S, Sijbrands E, Smit JH, Song K, Steinthorsdottir V, Stringham HM, Tuomi T, Tuomilehto J, Uitterlinden AG, Voight BF, Waterworth D, Wichmann HE, Willemsen G, Witteman JC, Yuan X, Zhao JH, Zeggini E, Schlessinger D, Sandhu M, Boomsma DI, Uda M, Spector TD, Penninx BW, Altshuler D, Vollenweider P, Jarvelin MR, Lakatta E, Waeber G, Fox CS, Peltonen L, Groop LC, Mooser V, Cupples LA, Thorsteinsdottir U, Boehnke M, Barroso I, Van Duijn C, Dupuis J, Watanabe RM, Stefansson K, McCarthy MI, Wareham NJ, Meigs JB and Abecasis GR

    [1] Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Oxford OX3 7LJ, UK. [2] Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK. [3] These authors contributed equally to this work.

    To identify previously unknown genetic loci associated with fasting glucose concentrations, we examined the leading association signals in ten genome-wide association scans involving a total of 36,610 individuals of European descent. Variants in the gene encoding melatonin receptor 1B (MTNR1B) were consistently associated with fasting glucose across all ten studies. The strongest signal was observed at rs10830963, where each G allele (frequency 0.30 in HapMap CEU) was associated with an increase of 0.07 (95% CI = 0.06-0.08) mmol/l in fasting glucose levels (P = 3.2 x 10(-50)) and reduced beta-cell function as measured by homeostasis model assessment (HOMA-B, P = 1.1 x 10(-15)). The same allele was associated with an increased risk of type 2 diabetes (odds ratio = 1.09 (1.05-1.12), per G allele P = 3.3 x 10(-7)) in a meta-analysis of 13 case-control studies totaling 18,236 cases and 64,453 controls. Our analyses also confirm previous associations of fasting glucose with variants at the G6PC2 (rs560887, P = 1.1 x 10(-57)) and GCK (rs4607517, P = 1.0 x 10(-25)) loci.

    Funded by: Medical Research Council: G0000649, G016121, G0500539, G0601261, G0701863, MC_U106179471, MC_U106188470; NCRR NIH HHS: RR-163736; NHGRI NIH HHS: HG-02651, R01 HG002651-05; NHLBI NIH HHS: HC-25195, HL-084729, HL-087679, N01 HC025195, N02-HL-6-4278, R01 HL087679-02, U01 HL084729-03; NIDA NIH HHS: DA-021519, U54 DA021519, U54 DA021519-04; NIDDK NIH HHS: DK-062370, DK-065978, DK-072193, DK-078616, DK-080140, DK069922, K23 DK065978-05, K24 DK080140-01, K24 DK080140-02, K24 DK080140-05, R01 DK029867, R01 DK062370, R01 DK062370-05, R01 DK069922-02, R01 DK072193-04, R01 DK078616-01A1; NIMH NIH HHS: MH059160, R01 MH059160-04; Wellcome Trust: 076113, 077011, 077016, 079557, 083948, 089061, GR069224, GR072960

    Nature genetics 2009;41;1;77-81

  • Linkage disequilibrium mapping of the replicated type 2 diabetes linkage signal on chromosome 1q.

    Prokopenko I, Zeggini E, Hanson RL, Mitchell BD, Rayner NW, Akan P, Baier L, Das SK, Elliott KS, Fu M, Frayling TM, Groves CJ, Gwilliam R, Scott LJ, Voight BF, Hattersley AT, Hu C, Morris AD, Ng M, Palmer CN, Tello-Ruiz M, Vaxillaire M, Wang CR, Stein L, Chan J, Jia W, Froguel P, Elbein SC, Deloukas P, Bogardus C, Shuldiner AR, McCarthy MI and International Type 2 Diabetes 1q Consortium

    Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.

    Objective: Linkage of the chromosome 1q21-25 region to type 2 diabetes has been demonstrated in multiple ethnic groups. We performed common variant fine-mapping across a 23-Mb interval in a multiethnic sample to search for variants responsible for this linkage signal.

    Research design and methods: In all, 5,290 single nucleotide polymorphisms (SNPs) were successfully genotyped in 3,179 type 2 diabetes case and control subjects from eight populations with evidence of 1q linkage. Samples were ascertained using strategies designed to enhance power to detect variants causal for 1q linkage. After imputation, we estimate approximately 80% coverage of common variation across the region (r(2) > 0.8, Europeans). Association signals of interest were evaluated through in silico replication and de novo genotyping in approximately 8,500 case subjects and 12,400 control subjects.

    Results: Association mapping of the 23-Mb region identified two strong signals, both of which were restricted to the subset of European-descent samples. The first mapped to the NOS1AP (CAPON) gene region (lead SNP: rs7538490, odds ratio 1.38 [95% CI 1.21-1.57], P = 1.4 x 10(-6), in 999 case subjects and 1,190 control subjects); the second mapped within an extensive region of linkage disequilibrium that includes the ASH1L and PKLR genes (lead SNP: rs11264371, odds ratio 1.48 [1.18-1.76], P = 1.0 x 10(-5), under a dominant model). However, there was no evidence for association at either signal on replication, and, across all data (>24,000 subjects), there was no indication that these variants were causally related to type 2 diabetes status.

    Conclusions: Detailed fine-mapping of the 23-Mb region of replicated linkage has failed to identify common variant signals contributing to the observed signal. Future studies should focus on identification of causal alleles of lower frequency and higher penetrance.

    Funded by: Medical Research Council: G0601201, G0601201(79675); NCI NIH HHS: K07-CA67960; NCRR NIH HHS: M01 RR14288; NIA NIH HHS: T32-AG00219; NIDDK NIH HHS: K24-DK02673, R01-DK073490, R01-DK39311, R01-DK54261, U01-DK58026; Wellcome Trust: 072960, 077011, 079557, 088885, GR072960

    Diabetes 2009;58;7;1704-9

  • The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes.

    Pruitt KD, Harrow J, Harte RA, Wallin C, Diekhans M, Maglott DR, Searle S, Farrell CM, Loveland JE, Ruef BJ, Hart E, Suner MM, Landrum MJ, Aken B, Ayling S, Baertsch R, Fernandez-Banet J, Cherry JL, Curwen V, Dicuccio M, Kellis M, Lee J, Lin MF, Schuster M, Shkeda A, Amid C, Brown G, Dukhanina O, Frankish A, Hart J, Maidak BL, Mudge J, Murphy MR, Murphy T, Rajan J, Rajput B, Riddick LD, Snow C, Steward C, Webb D, Weber JA, Wilming L, Wu W, Birney E, Haussler D, Hubbard T, Ostell J, Durbin R and Lipman D

    National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland 20894, USA. Pruitt@ncbi.nlm.nih.gov

    Effective use of the human and mouse genomes requires reliable identification of genes and their products. Although multiple public resources provide annotation, different methods are used that can result in similar but not identical representation of genes, transcripts, and proteins. The collaborative consensus coding sequence (CCDS) project tracks identical protein annotations on the reference mouse and human genomes with a stable identifier (CCDS ID), and ensures that they are consistently represented on the NCBI, Ensembl, and UCSC Genome Browsers. Importantly, the project coordinates on manually reviewing inconsistent protein annotations between sites, as well as annotations for which new evidence suggests a revision is needed, to progressively converge on a complete protein-coding set for the human and mouse reference genomes, while maintaining a high standard of reliability and biological accuracy. To date, the project has identified 20,159 human and 17,707 mouse consensus coding regions from 17,052 human and 16,893 mouse genes. Three evaluation methods indicate that the entries in the CCDS set are highly likely to represent real proteins, more so than annotations from contributing groups not included in CCDS. The CCDS database thus centralizes the function of identifying well-supported, identically-annotated, protein-coding regions.

    Funded by: NHGRI NIH HHS: 1U54HG004555-01, U54 HG004555-03; Wellcome Trust: 062023, 077198, WT062023, WT077198

    Genome research 2009;19;7;1316-23

  • Improved protocols for the Illumina Genome Analyzer sequencing system.

    Quail MA, Swerdlow H and Turner DJ

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom.

    In this unit, we describe a set of improvements we have made to the standard Illumina Genome Analyzer protocols to make the sequencing process more reliable in a high-throughput environment, reduce amplification bias, narrow the distribution of insert sizes, and reliably obtain high yields of data.

    Funded by: Wellcome Trust: 098051, WT079643

    Current protocols in human genetics / editorial board, Jonathan L. Haines ... [et al.] 2009;Chapter 18;Unit 18.2

  • Design of a high density SNP genotyping assay in the pig using SNPs identified and characterized by next generation sequencing technology.

    Ramos AM, Crooijmans RP, Affara NA, Amaral AJ, Archibald AL, Beever JE, Bendixen C, Churcher C, Clark R, Dehais P, Hansen MS, Hedegaard J, Hu ZL, Kerstens HH, Law AS, Megens HJ, Milan D, Nonneman DJ, Rohrer GA, Rothschild MF, Smith TP, Schnabel RD, Van Tassell CP, Taylor JF, Wiedmann RT, Schook LB and Groenen MA

    Wageningen University, Animal Breeding and Genomics Centre, Wageningen, The Netherlands.

    Background: The dissection of complex traits of economic importance to the pig industry requires the availability of a significant number of genetic markers, such as single nucleotide polymorphisms (SNPs). This study was conducted to discover several hundreds of thousands of porcine SNPs using next generation sequencing technologies and use these SNPs, as well as others from different public sources, to design a high-density SNP genotyping assay.

    Methodology/principal findings: A total of 19 reduced representation libraries derived from four swine breeds (Duroc, Landrace, Large White, Pietrain) and a Wild Boar population and three restriction enzymes (AluI, HaeIII and MspI) were sequenced using Illumina's Genome Analyzer (GA). The SNP discovery effort resulted in the de novo identification of over 372K SNPs. More than 549K SNPs were used to design the Illumina Porcine 60K+SNP iSelect Beadchip, now commercially available as the PorcineSNP60. A total of 64,232 SNPs were included on the Beadchip. Results from genotyping the 158 individuals used for sequencing showed a high overall SNP call rate (97.5%). Of the 62,621 loci that could be reliably scored, 58,994 were polymorphic, yielding a SNP conversion success rate of 94%. The average minor allele frequency (MAF) for all scorable SNPs was 0.274.

    Conclusions/significance: Overall, the results of this study indicate the utility of using next generation sequencing technologies to identify large numbers of reliable SNPs. In addition, the validation of the PorcineSNP60 Beadchip demonstrated that the assay is an excellent tool that will likely be used in a variety of future studies in pigs.

    PloS one 2009;4;8;e6524
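
    The conversion success rate quoted in the abstract above can be checked directly from the two locus counts it reports. The short sketch below does only that arithmetic; the function and variable names are illustrative and do not come from the paper.

```python
# Locus counts quoted in the PorcineSNP60 abstract above.
scorable_loci = 62_621      # loci that could be reliably scored
polymorphic_loci = 58_994   # scorable loci that were polymorphic


def conversion_rate(polymorphic: int, scorable: int) -> float:
    """Return the fraction of scorable loci that were polymorphic."""
    if scorable <= 0:
        raise ValueError("scorable must be a positive count")
    return polymorphic / scorable


rate = conversion_rate(polymorphic_loci, scorable_loci)
print(f"SNP conversion success rate: {rate:.1%}")
```

    The result rounds to the 94% figure stated in the abstract.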

  • A genome-wide association study of testicular germ cell tumor.

    Rapley EA, Turnbull C, Al Olama AA, Dermitzakis ET, Linger R, Huddart RA, Renwick A, Hughes D, Hines S, Seal S, Morrison J, Nsengimana J, Deloukas P, UK Testicular Cancer Collaboration, Rahman N, Bishop DT, Easton DF and Stratton MR

    Section of Cancer Genetics, Institute of Cancer Research, Sutton, Surrey, UK.

    We conducted a genome-wide association study for testicular germ cell tumor (TGCT), genotyping 307,666 SNPs in 730 cases and 1,435 controls from the UK and replicating associations in a further 571 cases and 1,806 controls. We found strong evidence for susceptibility loci on chromosome 5 (per allele OR = 1.37 (95% CI = 1.19-1.58), P = 3 x 10(-13)), chromosome 6 (OR = 1.50 (95% CI = 1.28-1.75), P = 10(-13)) and chromosome 12 (OR = 2.55 (95% CI = 2.05-3.19), P = 10(-31)). KITLG, encoding the ligand for the receptor tyrosine kinase KIT, which has previously been implicated in the pathogenesis of TGCT and the biology of germ cells, may explain the association on chromosome 12.

    Funded by: Cancer Research UK: 10118, A4994; Medical Research Council: G0000934, G0700491; Wellcome Trust: 068545/Z/02, 077012

    Nature genetics 2009;41;7;807-10

  • A large and accurate collection of peptidase cleavages in the MEROPS database.

    Rawlings ND

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.

    Peptidases are enzymes that hydrolyse peptide bonds in proteins and peptides. Peptidases are important in pathological conditions such as Alzheimer's disease, tumour and parasite invasion, and for processing viral polyproteins. The MEROPS database is an Internet resource containing information on peptidases, their substrates and inhibitors. The database now includes details of cleavage positions in substrates, both physiological and non-physiological, natural and synthetic. There are 39 118 cleavages in the collection; including 34 606 from a total of 10 513 different proteins and 2677 cleavages in synthetic substrates. The number of cleavages designated as 'physiological' is 13 307. The data are derived from 6095 publications. At least one substrate cleavage is known for 45% of the 2415 different peptidases recognized in the MEROPS database. The website now has three new displays: two showing peptidase specificity as a logo and a frequency matrix, the third showing a dynamically generated alignment between each protein substrate and its most closely related homologues. Many of the proteins described in the literature as peptidase substrates have been studied only in vitro. On the assumption that a physiologically relevant cleavage site would be conserved between species, the conservation of every site in terms of peptidase preference has been examined and a number have been identified that are not conserved. There are a number of cogent reasons why a site might not be conserved. Each poorly conserved site has been examined and a reason postulated. Some sites are identified that are very poorly conserved where cleavage is more likely to be fortuitous than of physiological relevance. This data-set is freely available via the Internet and is a useful training set for algorithms to predict substrates for peptidases and cleavage positions within those substrates. The data may also be useful for the design of inhibitors and for engineering novel specificities into peptidases. Database URL: http://merops.sanger.ac.uk.

    Database : the journal of biological databases and curation 2009;2009;bap015

  • Lessons learnt from large-scale exon re-sequencing of the X chromosome.

    Raymond FL, Whibley A, Stratton MR and Gecz J

    Cambridge Institute of Medical Research, University of Cambridge, Cambridge, UK. flr24@cam.ac.uk

    A candidate gene approach to identifying novel causes of disease is concept-limiting and in the new era of high throughput sequencing there is now no need to restrict the experiment to a few interesting genes. We have recently completed a large-scale exon re-sequencing project using Sanger sequencing technology to analyse approximately 1 Mb of coding sequence of the X chromosome in probands from >200 families with various forms of intellectual disability. We review the lessons learnt from this experience. Comparing large data sets will certainly reveal pathogenic mutations in genes that could not be identified previously. However, the task of distinguishing pathogenic mutations from rare sequence variants is not easy and is the most substantial challenge for the next decade. High-throughput technology has the attraction of being cheap, fast and comprehensive, but projects that require exhaustive, detailed coverage of a genomic region may need to combine a large-scale approach with small-scale follow-up of regions that are difficult to sequence. The number of rare truncating variants present in coding regions of the X chromosome that are not pathogenic was 1%. The importance of the quality of the starting material both clinically and molecularly and the number of sequence variants both rare and common that any one individual has across their coding sequence is discussed.

    Funded by: Wellcome Trust

    Human molecular genetics 2009;18;R1;R60-4

  • Comparative genomic hybridization: microarray design and data interpretation.

    Redon R and Carter NP

    Wellcome Trust Sanger Institute, Cambridge, UK.

    Microarray-based Comparative Genomic Hybridization (array-CGH) has been applied for a decade to screen for submicroscopic DNA gains and losses in tumor and constitutional DNA samples. This method has become increasingly flexible with the integration of new biological resources generated by genome sequencing projects. In this chapter, we describe alternative strategies for whole genome screening and high resolution breakpoint mapping of copy number changes by array-CGH, as well as tools available for accurate analysis of array-CGH experiments. Although most methods listed here have been designed for microarrays comprising large-insert clones, they can be adapted easily to other types of microarray platforms, such as those constructed from printed or synthesized oligonucleotides.

    Funded by: Wellcome Trust: 077008

    Methods in molecular biology (Clifton, N.J.) 2009;529;37-49

  • Comparative genomic hybridization: DNA labeling, hybridization and detection.

    Redon R, Fitzgerald T and Carter NP

    Wellcome Trust Sanger Institute, Cambridge, UK.

    Array-CGH involves the comparison of a test to a reference genome using a microarray composed of target sequences with known chromosomal coordinates. The test and reference DNA samples are used as templates to generate two probe DNAs labeled with distinct fluorescent dyes. The two probe DNAs are co-hybridized on a microarray in the presence of Cot-1 DNA to suppress unspecific hybridization of repeat sequences. After slide washes and drying, microarray images are acquired on a laser scanner and fluorescent intensities from every target sequence spot on the array are extracted using dedicated computer programs. Intensity ratios are calculated and normalized to enable data interpretation. Although the protocols explained in this chapter correspond primarily to the use of large-insert clone microarrays in either manual or automated fashion, necessary adaptations for hybridization on microarrays comprising shorter target DNA sequences are also briefly described.

    Funded by: Wellcome Trust: 077008

    Methods in molecular biology (Clifton, N.J.) 2009;529;267-78
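
    The abstract above says that intensity ratios are calculated and normalized, without naming the procedure. As a minimal sketch only, the example below assumes one common choice: a log2 test/reference ratio per spot, centred on the array median. The function name and the intensity values are hypothetical and are not taken from the chapter.

```python
import math
from statistics import median


def normalized_log2_ratios(test, reference):
    """Compute per-spot log2(test/reference) ratios, centred on their median.

    Median centring is an assumed normalization, chosen for illustration;
    the chapter may describe a different procedure.
    """
    if len(test) != len(reference):
        raise ValueError("test and reference must have the same number of spots")
    raw = [math.log2(t / r) for t, r in zip(test, reference)]
    centre = median(raw)
    return [value - centre for value in raw]


# Hypothetical background-subtracted intensities for five spots.
test_intensities = [1000.0, 2000.0, 500.0, 1000.0, 1000.0]
reference_intensities = [500.0, 500.0, 500.0, 500.0, 500.0]

ratios = normalized_log2_ratios(test_intensities, reference_intensities)
print(ratios)
```

    After centring, spots near zero are read as balanced copy number, while positive and negative values point to relative gain and loss in the test sample.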

  • Comparative genomic hybridization: DNA preparation for microarray fabrication.

    Redon R, Rigler D and Carter NP

    Wellcome Trust Sanger Institute, Cambridge, UK.

    The spatial resolution of microarray-based comparative genomic hybridization (array-CGH) is dependent on the length and density of target DNA sequences covering the chromosomal region of interest. Here we describe the methods developed at the Wellcome Trust Sanger Institute (Cambridge, UK) to construct microarrays comprising large-insert clones available through genome sequencing projects. These methods are applicable to Bacterial and Phage Artificial Chromosomes (BAC and PAC) as well as fosmid and cosmid clones. The protocols are scalable for the construction of microarrays composed of several hundred up to several tens of thousands of clones.

    Funded by: Wellcome Trust: 077008

    Methods in molecular biology (Clifton, N.J.) 2009;529;259-66

  • The Gene Ontology's Reference Genome Project: a unified framework for functional annotation across species.

    Reference Genome Group of the Gene Ontology Consortium

    The Gene Ontology (GO) is a collaborative effort that provides structured vocabularies for annotating the molecular function, biological role, and cellular location of gene products in a highly systematic way and in a species-neutral manner with the aim of unifying the representation of gene function across different organisms. Each contributing member of the GO Consortium independently associates GO terms to gene products from the organism(s) they are annotating. Here we introduce the Reference Genome project, which brings together those independent efforts into a unified framework based on the evolutionary relationships between genes in these different organisms. The Reference Genome project has two primary goals: to increase the depth and breadth of annotations for genes in each of the organisms in the project, and to create data sets and tools that enable other genome annotation efforts to infer GO annotations for homologous genes in their organisms. In addition, the project has several important incidental benefits, such as increasing annotation consistency across genome databases, and providing important improvements to the GO's logical structure and biological content.

    Funded by: British Heart Foundation: SP/07/007/23671; Medical Research Council: G0500293; NHGRI NIH HHS: HG00022, P41 HG000330, P41 HG001315, P41 HG002273, P41 HG002659-06, P41 HG02223; NHLBI NIH HHS: HL64541; NICHD NIH HHS: HD033745; NIGMS NIH HHS: GM64426, U24 GM07790

    PLoS computational biology 2009;5;7;e1000431

  • Allelic variants in HTR3C show association with autism.

    Rehnström K, Ylisaukko-oja T, Nummela I, Ellonen P, Kempas E, Vanhala R, von Wendt L, Järvelä I and Peltonen L

    Department of Molecular Medicine, National Public Health Institute and Institute for Molecular Medicine Finland, Helsinki, Finland. karola.rehnstrom@ktl.fi

    Autism spectrum disorders (ASDs) are severe neurodevelopmental disorders with a strong genetic component. Only a few predisposing genes have been identified so far. We have previously performed a genome-wide linkage screen for ASDs in Finnish families where the most significant linkage peak was identified at 3q25-27. Here, 11 positional and functionally relevant candidate genes at 3q25-27 were tested for association with autistic disorder. Genotypes of 125 single nucleotide polymorphisms (SNPs) were determined in 97 families with at least one individual affected with autistic disorder. The most significant association was observed using two non-synonymous SNPs in HTR3C, rs6766410 and rs6807362, both resulting in P = 0.0012 in family-based association analysis. In addition, the haplotype C-C corresponding to amino acids N163-A405 was overtransmitted to affected individuals (P = 0.006). Sequencing revealed no other variants in the coding region or splice sites of HTR3C. Based on the association analysis results in a previously identified linkage region, we propose that HTR3C represents a novel candidate locus for ASDs and should be tested in other populations.

    Funded by: Wellcome Trust: 089061

    American journal of medical genetics. Part B, Neuropsychiatric genetics : the official publication of the International Society of Psychiatric Genetics 2009;150B;5;741-6

  • An essential role for the Plasmodium Nek-2 Nima-related protein kinase in the sexual development of malaria parasites.

    Reininger L, Tewari R, Fennell C, Holland Z, Goldring D, Ranford-Cartwright L, Billker O and Doerig C

    INSERM U609-Wellcome Centre for Molecular Parasitology, Biomedical Research Centre, Faculty of Biomedical and Life Sciences, University of Glasgow, 120 University Place, Glasgow G12 8TA, Scotland.

    The molecular control of cell division and development in malaria parasites is far from understood. We previously showed that a Plasmodium gametocyte-specific NIMA-related protein kinase, nek-4, is required for completion of meiosis in the ookinete, the motile form that develops from the zygote in the mosquito vector. Here, we show that another NIMA-related kinase, Pfnek-2, is also predominantly expressed in gametocytes, and that Pfnek-2 is an active enzyme displaying an in vitro substrate preference distinct from that of Pfnek-4. A functional nek-2 gene is required for transmission of both Plasmodium falciparum and the rodent malaria parasite Plasmodium berghei to the mosquito vector, which is explained by the observation that disruption of the nek-2 gene in P. berghei causes dysregulation of DNA replication during meiosis and blocks ookinete development. This has implications (i) in our understanding of sexual development of malaria parasites and (ii) in the context of control strategies aimed at interfering with malaria transmission.

    Funded by: Medical Research Council; Wellcome Trust

    The Journal of biological chemistry 2009;284;31;20858-68

  • Replication and extension of genome-wide association study results for obesity in 4923 adults from northern Sweden.

    Renström F, Payne F, Nordström A, Brito EC, Rolandsson O, Hallmans G, Barroso I, Nordström P, Franks PW and GIANT Consortium

    Department of Public Health and Clinical Medicine, Umeå University Hospital, Umeå, Sweden.

    Recent genome-wide association studies (GWAS) have identified multiple risk loci for common obesity (FTO, MC4R, TMEM18, GNPDA2, SH2B1, KCTD15, MTCH2, NEGR1 and PCSK1). Here we extend those studies by examining associations with adiposity and type 2 diabetes in Swedish adults. The nine single nucleotide polymorphisms (SNPs) were genotyped in 3885 non-diabetic and 1038 diabetic individuals with available measures of height, weight and body mass index (BMI). Adipose mass and distribution were objectively assessed using dual-energy X-ray absorptiometry in a sub-group of non-diabetics (n = 2206). In models with adipose mass traits, BMI or obesity as outcomes, the most strongly associated SNP was FTO rs1121980 (P < 0.001). Five other SNPs (SH2B1 rs7498665, MTCH2 rs4752856, MC4R rs17782313, NEGR1 rs2815752 and GNPDA2 rs10938397) were significantly associated with obesity. To summarize the overall genetic burden, a weighted risk score comprising a subset of SNPs was constructed; those in the top quintile of the score were heavier (+2.6 kg) and had more total (+2.4 kg), gynoid (+191 g) and abdominal (+136 g) adipose tissue than those in the lowest quintile (all P < 0.001). The genetic burden score significantly increased diabetes risk, with those in the highest quintile (n = 193/594 cases/controls) being at 1.55-fold (95% CI 1.21-1.99; P < 0.0001) greater risk of type 2 diabetes than those in the lowest quintile (n = 130/655 cases/controls). In summary, we have statistically replicated six of the previously associated obese-risk loci and our results suggest that the weight-inducing effects of these variants are explained largely by increased adipose accumulation.

    Funded by: Wellcome Trust

    Human molecular genetics 2009;18;8;1489-96
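
    The weighted risk score described in the abstract above is, in general form, a sum of risk-allele counts multiplied by per-SNP weights, with individuals then grouped into quintiles of the score. The sketch below illustrates that general idea only. The weights, the two-SNP subset and the rank-based quintile rule are assumptions made for illustration, not the values or method used in the paper.

```python
def weighted_risk_score(allele_counts, weights):
    """Sum weight * risk-allele count (0, 1 or 2) over the scored SNPs."""
    if allele_counts.keys() != weights.keys():
        raise ValueError("allele_counts and weights must cover the same SNPs")
    return sum(weights[snp] * allele_counts[snp] for snp in weights)


def quintile_labels(scores):
    """Label each score 1 (lowest fifth) to 5 (highest fifth) by rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    labels = [0] * n
    for rank, index in enumerate(order):
        labels[index] = rank * 5 // n + 1
    return labels


# Hypothetical per-allele weights; the paper's actual weights are not given here.
example_weights = {"rs1121980": 0.5, "rs17782313": 0.25}
# Hypothetical genotype: two risk alleles at the first SNP, one at the second.
example_person = {"rs1121980": 2, "rs17782313": 1}

score = weighted_risk_score(example_person, example_weights)
print(score)
print(quintile_labels(list(range(10))))
```

    Comparing the top and bottom quintiles, as the abstract does, then amounts to contrasting individuals labelled 5 with those labelled 1.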

  • Comparative genomic analysis of ten Streptococcus pneumoniae temperate bacteriophages.

    Romero P, Croucher NJ, Hiller NL, Hu FZ, Ehrlich GD, Bentley SD, García E and Mitchell TJ

    Division of Infection and Immunity, Glasgow Biomedical Research Centre, University of Glasgow, Glasgow, United Kingdom.

    Streptococcus pneumoniae is an important human pathogen that often carries temperate bacteriophages. As part of a program to characterize the genetic makeup of prophages associated with clinical strains and to assess the potential roles that they play in the biology and pathogenesis in their host, we performed comparative genomic analysis of 10 temperate pneumococcal phages. All of the genomes are organized into five major gene clusters: lysogeny, replication, packaging, morphogenesis, and lysis clusters. All of the phage particles observed showed a Siphoviridae morphology. The only genes that are well conserved in all the genomes studied are those involved in the integration and the lysis of the host in addition to two genes, of unknown function, within the replication module. We observed that a high percentage of the open reading frames contained no similarities to any sequences catalogued in public databases; however, genes that were homologous to known phage virulence genes, including the pblB gene of Streptococcus mitis and the vapE gene of Dichelobacter nodosus, were also identified. Interestingly, bioinformatic tools showed the presence of a toxin-antitoxin system in the phage phiSpn_6, and this represents the first time that an addiction system in a pneumophage has been identified. Collectively, the temperate pneumophages contain a diverse set of genes with various levels of similarity among them.

    Funded by: NIDCD NIH HHS: DC02148, DC04173, DC05659

    Journal of bacteriology 2009;191;15;4854-62

  • Partial lipodystrophy and insulin resistant diabetes in a patient with a homozygous nonsense mutation in CIDEC.

    Rubio-Cabezas O, Puri V, Murano I, Saudek V, Semple RK, Dash S, Hyden CS, Bottomley W, Vigouroux C, Magré J, Raymond-Barker P, Murgatroyd PR, Chawla A, Skepper JN, Chatterjee VK, Suliman S, Patch AM, Agarwal AK, Garg A, Barroso I, Cinti S, Czech MP, Argente J, O'Rahilly S, Savage DB and LD Screening Consortium

    Department of Endocrinology, Hospital Infantil Universitario Niño Jesús, Madrid, Spain.

    Lipodystrophic syndromes are characterized by adipose tissue deficiency. Although rare, they are of considerable interest as they, like obesity, typically lead to ectopic lipid accumulation, dyslipidaemia and insulin resistant diabetes. In this paper we describe a female patient with partial lipodystrophy (affecting limb, femorogluteal and subcutaneous abdominal fat), white adipocytes with multiloculated lipid droplets and insulin-resistant diabetes, who was found to be homozygous for a premature truncation mutation in the lipid droplet protein cell death-inducing Dffa-like effector C (CIDEC) (E186X). The truncation disrupts the highly conserved CIDE-C domain and the mutant protein is mistargeted and fails to increase the lipid droplet size in transfected cells. In mice, Cidec deficiency also reduces fat mass and induces the formation of white adipocytes with multilocular lipid droplets, but in contrast to our patient, Cidec null mice are protected against diet-induced obesity and insulin resistance. In addition to describing a novel autosomal recessive form of familial partial lipodystrophy, these observations also suggest that CIDEC is required for unilocular lipid droplet formation and optimal energy storage in human fat.

    Funded by: Medical Research Council: G0600414; NIDDK NIH HHS: DK30898, DK32520, DK54387, DK60837, P30 DK032520-25, P30 DK032520-26, R01 DK054387-13, R37 DK030898-23; Wellcome Trust: 077016, 077016/Z/05/Z

    EMBO molecular medicine 2009;1;5;280-7

  • The versatility and adaptation of bacteria from the genus Stenotrophomonas.

    Ryan RP, Monchy S, Cardinale M, Taghavi S, Crossman L, Avison MB, Berg G, van der Lelie D and Dow JM

    BIOMERIT Research Centre, Department of Microbiology, BioSciences Institute, University College Cork, Cork, Ireland. r.ryan@ucc.ie

    The genus Stenotrophomonas comprises at least eight species. These bacteria are found throughout the environment, particularly in close association with plants. Strains of the most predominant species, Stenotrophomonas maltophilia, have an extraordinary range of activities that include beneficial effects for plant growth and health, the breakdown of natural and man-made pollutants that are central to bioremediation and phytoremediation strategies and the production of biomolecules of economic value, as well as detrimental effects, such as multidrug resistance, in human pathogenic strains. Here, we discuss the versatility of the bacteria in the genus Stenotrophomonas and the insight that comparative genomic analysis of clinical and endophytic isolates of S. maltophilia has brought to our understanding of the adaptation of this genus to various niches.

    Funded by: Austrian Science Fund FWF: P 20542-B16; Wellcome Trust

    Nature reviews. Microbiology 2009;7;7;514-25

  • Presence of interstereocilial links in waltzer mutants suggests Cdh23 is not essential for tip link formation.

    Rzadzinska AK and Steel KP

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, United Kingdom.

    Cadherin23 has been proposed to form the upper part of the tip link, an interstereocilial link believed to control opening of transducer channels of sensory hair cells. However, we detect tip link-like links in mouse mutants with null alleles of Cdh23, suggesting the presence of other components that permit formation of a link between the tip of one stereocilium and the side of the adjacent taller stereocilium.

    Funded by: Medical Research Council: G0300212, MC_QA137918; Wellcome Trust

    Neuroscience 2009;158;2;365-8

  • Genome-wide association analysis of metabolic traits in a birth cohort from a founder population.

    Sabatti C, Service SK, Hartikainen AL, Pouta A, Ripatti S, Brodsky J, Jones CG, Zaitlen NA, Varilo T, Kaakinen M, Sovio U, Ruokonen A, Laitinen J, Jakkula E, Coin L, Hoggart C, Collins A, Turunen H, Gabriel S, Elliot P, McCarthy MI, Daly MJ, Järvelin MR, Freimer NB and Peltonen L

    Department of Human Genetics and Los Angeles, Los Angeles, California 90095, USA.

    Genome-wide association studies (GWAS) of longitudinal birth cohorts enable joint investigation of environmental and genetic influences on complex traits. We report GWAS results for nine quantitative metabolic traits (triglycerides, high-density lipoprotein, low-density lipoprotein, glucose, insulin, C-reactive protein, body mass index, and systolic and diastolic blood pressure) in the Northern Finland Birth Cohort 1966 (NFBC1966), drawn from the most genetically isolated Finnish regions. We replicate most previously reported associations for these traits and identify nine new associations, several of which highlight genes with metabolic functions: high-density lipoprotein with NR1H3 (LXRA), low-density lipoprotein with AR and FADS1-FADS2, glucose with MTNR1B, and insulin with PANK1. Two of these new associations emerged after adjustment of results for body mass index. Gene-environment interaction analyses suggested additional associations, which will require validation in larger samples. The currently identified loci, together with quantified environmental exposures, explain little of the trait variation in NFBC1966. The association observed between low-density lipoprotein and an infrequent variant in AR suggests the potential of such a cohort for identifying associations with both common, low-impact and rarer, high-impact quantitative trait loci.

    Funded by: NCRR NIH HHS: U54 RR020278; NIGMS NIH HHS: GM053275-14; NIMH NIH HHS: MH083268; Wellcome Trust: 089061

    Nature genetics 2009;41;1;35-46

  • The Pakistan Risk of Myocardial Infarction Study: a resource for the study of genetic, lifestyle and other determinants of myocardial infarction in South Asia.

    Saleheen D, Zaidi M, Rasheed A, Ahmad U, Hakeem A, Murtaza M, Kayani W, Faruqui A, Kundi A, Zaman KS, Yaqoob Z, Cheema LA, Samad A, Rasheed SZ, Mallick NH, Azhar M, Jooma R, Gardezi AR, Memon N, Ghaffar A, Fazal-ur-Rehman, Khan N, Shah N, Ali Shah A, Samuel M, Hanif F, Yameen M, Naz S, Sultana A, Nazir A, Raza S, Shazad M, Nasim S, Javed MA, Ali SS, Jafree M, Nisar MI, Daood MS, Hussain A, Sarwar N, Kamal A, Deloukas P, Ishaq M, Frossard P and Danesh J

    Center for Non-Communicable Diseases, Karachi, Pakistan. ds436@medschl.cam.ac.uk

    The burden of coronary heart disease (CHD) is increasing at a greater rate in South Asia than in any other region globally, but there is little direct evidence about its determinants. The Pakistan Risk of Myocardial Infarction Study (PROMIS) is an epidemiological resource to enable reliable study of genetic, lifestyle and other determinants of CHD in South Asia. By March 2009, PROMIS had recruited over 5,000 cases of first-ever confirmed acute myocardial infarction (MI) and over 5,000 matched controls aged 30-80 years. For each participant, information has been recorded on demographic factors, lifestyle, medical and family history, anthropometry, and a 12-lead electrocardiogram. A range of biological samples has been collected and stored, including DNA, plasma, serum and whole blood. During its next stage, the study aims to expand recruitment to achieve a total of about 20,000 cases and about 20,000 controls, and, in subsets of participants, to enrich the resource by collection of monocytes, establishment of lymphoblastoid cell lines, and by resurveying participants. Measurements in progress include profiling of candidate biochemical factors, assay of 45,000 variants in 2,100 candidate genes, and a genomewide association scan of over 650,000 genetic markers. We have established a large epidemiological resource for CHD in South Asia. In parallel with its further expansion and enrichment, the PROMIS resource will be systematically harvested to help identify and evaluate genetic and other determinants of MI in South Asia. Findings from this study should advance scientific understanding and inform regionally appropriate disease prevention and control strategies.

    Funded by: British Heart Foundation; Medical Research Council; Wellcome Trust: 077011

    European journal of epidemiology 2009;24;6;329-38

  • The Schistosoma japonicum genome reveals features of host-parasite interplay.

    Schistosoma japonicum Genome Sequencing and Functional Analysis Consortium

    Schistosoma japonicum is a parasitic flatworm that causes human schistosomiasis, which is a significant cause of morbidity in China and the Philippines. Here we present a draft genomic sequence for the worm. The genome provides a global insight into the molecular architecture and host interaction of this complex metazoan pathogen, revealing that it can exploit host nutrients, neuroendocrine hormones and signalling pathways for growth, development and maturation. Having a complex nervous system and a well-developed sensory system, S. japonicum can accept stimulation of the corresponding ligands as a physiological response to different environments, such as fresh water or the tissues of its intermediate and mammalian hosts. Numerous proteases, including cercarial elastase, are implicated in mammalian skin penetration and haemoglobin degradation. The genomic information will serve as a valuable platform to facilitate development of new interventions for schistosomiasis control.

    Funded by: NIAID NIH HHS: AI39461; Wellcome Trust: 085775

    Nature 2009;460;7253;345-51

  • Expression analysis of the Theileria parva subtelomere-encoded variable secreted protein gene family.

    Schmuckli-Maurer J, Casanova C, Schmied S, Affentranger S, Parvanova I, Kang'a S, Nene V, Katzer F, McKeever D, Müller J, Bishop R, Pain A and Dobbelaere DA

    Molecular Pathobiology, Vetsuisse Faculty, University of Bern, Bern, Switzerland.

    Background: The intracellular protozoan parasite Theileria parva transforms bovine lymphocytes inducing uncontrolled proliferation. Proteins released from the parasite are assumed to contribute to phenotypic changes of the host cell and parasite persistence. With 85 members, genes encoding subtelomeric variable secreted proteins (SVSPs) form the largest gene family in T. parva. The majority of SVSPs contain predicted signal peptides, suggesting secretion into the host cell cytoplasm.

    We analysed SVSP expression in T. parva-transformed cell lines established in vitro by infection of T or B lymphocytes with cloned T. parva parasites. Microarray and quantitative real-time PCR analysis revealed mRNA expression for a wide range of SVSP genes. The pattern of mRNA expression was largely defined by the parasite genotype and not by host background or cell type, and found to be relatively stable in vitro over a period of two months. Interestingly, immunofluorescence analysis carried out on cell lines established from a cloned parasite showed that expression of a single SVSP encoded by TP03_0882 is limited to only a small percentage of parasites. Epitope-tagged TP03_0882 expressed in mammalian cells was found to translocate into the nucleus, a process that could be attributed to two different nuclear localisation signals.

    Conclusions: Our analysis reveals a complex pattern of Theileria SVSP mRNA expression, which depends on the parasite genotype. Whereas in cell lines established from a cloned parasite transcripts can be found corresponding to a wide range of SVSP genes, only a minority of parasites appear to express a particular SVSP protein. The fact that a number of SVSPs contain functional nuclear localisation signals suggests that proteins released from the parasite could contribute to phenotypic changes of the host cell. This initial characterisation will facilitate future studies on the regulation of SVSP gene expression and the potential biological role of these enigmatic proteins.

    Funded by: Wellcome Trust: 064654, 077431, GR075820MA

    PloS one 2009;4;3;e4839

  • Genome flexibility in Neisseria meningitidis.

    Schoen C, Tettelin H, Parkhill J and Frosch M

    Institut für Hygiene und Mikrobiologie, der Universität Würzburg, Josef-Schneider-Strasse 2, Bau E1, Würzburg 97877, Germany. cschoen@hygiene.uni-wuerzburg.de

    Neisseria meningitidis usually lives as a commensal bacterium in the upper airways of humans. However, occasionally some strains can also cause life-threatening diseases such as sepsis and bacterial meningitis. Comparative genomics demonstrates that only very subtle genetic differences between carriage and disease strains might be responsible for the observed virulence differences and that N. meningitidis is, evolutionarily, a very recent species. Comparative genome sequencing also revealed a panoply of genetic mechanisms underlying its enormous genomic flexibility which also might affect the virulence of particular strains. From these studies, N. meningitidis emerges as a paradigm for organisms that use genome variability as an adaptation to changing and thus challenging environments.

    Vaccine 2009;27 Suppl 2;B103-11

  • Sequence and analysis of a plasmid-encoded mercury resistance operon from Mycobacterium marinum identifies MerH, a new mercuric ion transporter.

    Schué M, Dover LG, Besra GS, Parkhill J and Brown NL

    School of Biosciences, The University of Birmingham, Edgbaston, Birmingham, B15 2TT, United Kingdom. matschue@yahoo.fr

    In this study, we report the DNA sequence and biological analysis of a mycobacterial mercury resistance operon encoding a novel Hg(2+) transporter. MerH was found to transport mercuric ions in Escherichia coli via a pair of essential cysteine residues but only when coexpressed with the mercuric reductase.

    Funded by: Medical Research Council; Wellcome Trust

    Journal of bacteriology 2009;191;1;439-44

  • The key role of genomics in modern vaccine and drug design for emerging infectious diseases.

    Seib KL, Dougan G and Rappuoli R

    Novartis Vaccines and Diagnostics, Siena, Italy.

    It can be argued that the arrival of the "genomics era" has significantly shifted the paradigm of vaccine and therapeutics development from microbiological to sequence-based approaches. Genome sequences provide a previously unattainable route to investigate the mechanisms that underpin pathogenesis. Genomics, transcriptomics, metabolomics, structural genomics, proteomics, and immunomics are being exploited to perfect the identification of targets, to design new vaccines and drugs, and to predict their effects in patients. Furthermore, human genomics and related studies are providing insights into aspects of host biology that are important in infectious disease. This ever-growing body of genomic data and new genome-based approaches will play a critical role in the future to enable timely development of vaccines and therapeutics to control emerging infectious diseases.

    Funded by: Wellcome Trust

    PLoS genetics 2009;5;10;e1000612

  • Genome watch: breaking the ICE.

    Seth-Smith H and Croucher NJ

    Nature reviews. Microbiology 2009;7;5;328-9

  • Co-evolution of genomes and plasmids within Chlamydia trachomatis and the emergence in Sweden of a new variant strain.

    Seth-Smith HM, Harris SR, Persson K, Marsh P, Barron A, Bignell A, Bjartling C, Clark L, Cutcliffe LT, Lambden PR, Lennard N, Lockey SJ, Quail MA, Salim O, Skilton RJ, Wang Y, Holland MJ, Parkhill J, Thomson NR and Clarke IN

    Molecular Microbiology Group, University Medical School, Southampton General Hospital, Southampton, SO16 6YD, UK. hss@sanger.ac.uk

    Background: Chlamydia trachomatis is the most common cause of sexually transmitted infections globally and the leading cause of preventable blindness in the developing world. There are two biovariants of C. trachomatis: 'trachoma', causing ocular and genital tract infections, and the invasive 'lymphogranuloma venereum' strains. Recently, a new variant of the genital tract C. trachomatis emerged in Sweden. This variant escaped routine diagnostic tests because it carries a plasmid with a deletion. Failure to detect this strain has meant it has spread rapidly across the country, provoking a worldwide alert. In addition to being a key diagnostic target, the plasmid has been linked to chlamydial virulence. Analysis of chlamydial plasmids and their cognate chromosomes was undertaken to provide insights into the evolutionary relationship between chromosome and plasmid. This is essential knowledge if the plasmid is to continue to be relied on as a key diagnostic marker, and for an understanding of the evolution of Chlamydia trachomatis.

    Results: The genomes of two new C. trachomatis strains were sequenced, together with plasmids from six C. trachomatis isolates, including the new variant strain from Sweden. The plasmid from the new Swedish variant has a 377 bp deletion in the first predicted coding sequence, abolishing the site used for PCR detection, resulting in negative diagnosis. In addition, the variant plasmid has a 44 bp duplication downstream of the deletion. The region containing the second predicted coding sequence is the most highly conserved region of the plasmids investigated. Phylogenetic analyses of the plasmids and chromosomes are fully congruent. Moreover, these analyses also show that ocular and genital strains diverged from a common C. trachomatis progenitor.

    Conclusion: The evolutionary pathways of the chlamydial genome and plasmid imply that inheritance of the plasmid is tightly linked with its cognate chromosome. These data suggest that the plasmid is not a highly mobile genetic element and does not transfer readily between isolates. Comparative analysis of the plasmid sequences has revealed the most conserved regions that should be used to design future plasmid based nucleic acid amplification tests, to avoid diagnostic failures.

    Funded by: Medical Research Council: G0601640; Wellcome Trust

    BMC genomics 2009;10;239

  • Worldwide patterns of haplotype diversity at 9p21.3, a locus associated with type 2 diabetes and coronary heart disease.

    Silander K, Tang H, Myles S, Jakkula E, Timpson NJ, Cavalli-Sforza L and Peltonen L

    Institute of Molecular Medicine FIMM, University of Helsinki, and Unit of Public Health Genomics, National Institute for Health and Welfare, Tukholmankatu 8, 00290 Helsinki, Finland. kaisa.silander@thl.fi.

    A 100 kb region on 9p21.3 harbors two major disease susceptibility loci: one for type 2 diabetes (T2D) and one for coronary heart disease (CHD). The single nucleotide polymorphisms (SNPs) associated with these two diseases in Europeans reside on two adjacent haplotype blocks with independent effects on disease. To help delimit the regions that likely harbor the disease-causing variants in populations of non-European origin, we studied the haplotype diversity and allelic history of the 9p21.3 region using 938 unrelated individuals from 51 populations (Human Genome Diversity Panel). We used SNP data from Illumina's 650Y SNP arrays supplemented with five additional SNPs within the region of interest. Haplotype frequencies were analyzed with the EM algorithm implemented in PLINK. For the T2D locus, the TT risk haplotype of SNPs rs10811661 and rs10757283 was present at similar frequencies in all global populations, while a shared 6-SNP haplotype that carries the protective C allele of rs10811661 was found at a frequency of 2.9% in Africans and 41.3% in East Asians and was associated with low haplotype diversity. For the CHD locus, all populations shared a core risk haplotype spanning >17.5 kb, which shows a dramatic increase in frequency between African (11.5%) and Middle Eastern (63.7%) populations. Interestingly, two SNPs (rs2891168 and rs10757278) tagging this CHD risk haplotype are most strongly associated with CHD status according to independent clinical fine-mapping studies. The large variation in linkage disequilibrium patterns identified between the populations demonstrates the importance of allelic background data when selecting SNPs for replication in global populations. Intriguingly, the protective allele for T2D and the risk allele for CHD show an increase in frequency in non-Africans compared to Africans, implying different population histories for these two adjacent disease loci.

    Genome medicine 2009;1;5;51

  • Genomic and genetic analyses of diversity and plant interactions of Pseudomonas fluorescens.

    Silby MW, Cerdeño-Tárraga AM, Vernikos GS, Giddens SR, Jackson RW, Preston GM, Zhang XX, Moon CD, Gehrig SM, Godfrey SA, Knight CG, Malone JG, Robinson Z, Spiers AJ, Harris S, Challis GL, Yaxley AM, Harris D, Seeger K, Murphy L, Rutter S, Squares R, Quail MA, Saunders E, Mavromatis K, Brettin TS, Bentley SD, Hothersall J, Stephens E, Thomas CM, Parkhill J, Levy SB, Rainey PB and Thomson NR

    Department of Molecular Biology and Microbiology, Tufts University School of Medicine, Centre for Adaptation Genetics and Drug Resistance, Boston, MA 02111, USA. mark.silby@tufts.edu

    Background: Pseudomonas fluorescens are common soil bacteria that can improve plant health through nutrient cycling, pathogen antagonism and induction of plant defenses. The genome sequences of strains SBW25 and Pf0-1 were determined and compared to each other and with P. fluorescens Pf-5. A functional genomic in vivo expression technology (IVET) screen provided insight into genes used by P. fluorescens in its natural environment and an improved understanding of the ecological significance of diversity within this species.

    Results: Comparisons of three P. fluorescens genomes (SBW25, Pf0-1, Pf-5) revealed considerable divergence: 61% of genes are shared, the majority located near the replication origin. Phylogenetic and average amino acid identity analyses showed a low overall relationship. A functional screen of SBW25 defined 125 plant-induced genes including a range of functions specific to the plant environment. Orthologues of 83 of these exist in Pf0-1 and Pf-5, with 73 shared by both strains. The P. fluorescens genomes carry numerous complex repetitive DNA sequences, some resembling Miniature Inverted-repeat Transposable Elements (MITEs). In SBW25, repeat density and distribution revealed 'repeat deserts' lacking repeats, covering approximately 40% of the genome.

    Conclusions: P. fluorescens genomes are highly diverse. Strain-specific regions around the replication terminus suggest genome compartmentalization. The genomic heterogeneity among the three strains is reminiscent of a species complex rather than a single species. That 42% of plant-inducible genes were not shared by all strains reinforces this conclusion and shows that ecological success requires specialized and core functions. The diversity also indicates the significant size of genetic information within the Pseudomonas pan genome.

    Funded by: Biotechnology and Biological Sciences Research Council: 104/P16729, P15257; Wellcome Trust

    Genome biology 2009;10;5;R51

  • Meta-analysis of genome-wide scans for human adult stature identifies novel loci and associations with measures of skeletal frame size.

    Soranzo N, Rivadeneira F, Chinappen-Horsley U, Malkina I, Richards JB, Hammond N, Stolk L, Nica A, Inouye M, Hofman A, Stephens J, Wheeler E, Arp P, Gwilliam R, Jhamai PM, Potter S, Chaney A, Ghori MJ, Ravindrarajah R, Ermakov S, Estrada K, Pols HA, Williams FM, McArdle WL, van Meurs JB, Loos RJ, Dermitzakis ET, Ahmadi KR, Hart DJ, Ouwehand WH, Wareham NJ, Barroso I, Sandhu MS, Strachan DP, Livshits G, Spector TD, Uitterlinden AG and Deloukas P

    Human Genetics Department, Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.

    Recent genome-wide (GW) scans have identified several independent loci affecting human stature, but their contribution through the different skeletal components of height is still poorly understood. We carried out a genome-wide scan in 12,611 participants, followed by replication in an additional 7,187 individuals, and identified 17 genomic regions with GW-significant association with height. Of these, two are entirely novel (rs11809207 in CATSPER4, combined P-value = 6.1x10(-8) and rs910316 in TMED10, P-value = 1.4x10(-7)) and two had previously been described with weak statistical support (rs10472828 in NPR3, P-value = 3x10(-7) and rs849141 in JAZF1, P-value = 3.2x10(-11)). One locus (rs1182188 at GNA12) identifies the first height eQTL. We also assessed the contribution of height loci to the upper- (trunk) and lower-body (hip axis and femur) skeletal components of height. We find evidence for several loci associated with trunk length (including rs6570507 in GPR126, P-value = 4x10(-5) and rs6817306 in LCORL, P-value = 4x10(-4)), hip axis length (including rs6830062 at LCORL, P-value = 4.8x10(-4) and rs4911494 at UQCC, P-value = 1.9x10(-4)), and femur length (including rs710841 at PRKG2, P-value = 2.4x10(-5) and rs10946808 at HIST1H1D, P-value = 6.4x10(-6)). Finally, we used conditional analyses to explore a possible differential contribution of the height loci to these different skeletal size measurements. In addition to validating four novel loci controlling adult stature, our study represents the first effort to assess the contribution of genetic loci to three skeletal components of height. Further statistical tests in larger numbers of individuals will be required to verify if the height loci affect height preferentially through these subcomponents of height.

    Funded by: Medical Research Council: G0000934, G0701863, MC_QA137934, MC_U106188470; Wellcome Trust: 068545/Z/02

    PLoS genetics 2009;5;4;e1000445

  • MET and autism susceptibility: family and case-control studies.

    Sousa I, Clark TG, Toma C, Kobayashi K, Choma M, Holt R, Sykes NH, Lamb JA, Bailey AJ, Battaglia A, Maestrini E, Monaco AP and International Molecular Genetic Study of Autism Consortium (IMGSAC)

    Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.

    Autism is a common, severe and highly heritable neurodevelopmental disorder. The International Molecular Genetic Study of Autism Consortium (IMGSAC) genome screen for linkage in affected sib-pair families identified a chromosome 7q susceptibility locus (AUTS1) that has subsequently shown evidence of increased sharing in several independent multiplex samples and in two meta-analyses. Taking into account the location of the MET gene under this linkage peak, and the fact that it has recently been reported to be associated with autism, the gene was further analyzed as a promising autism candidate. The gene encodes a transmembrane receptor tyrosine kinase of the hepatocyte growth factor/scatter factor (HGF/SF). MET is best known as an oncogene, but its signalling also participates in immune function, peripheral organ development and repair, and the development of the cerebral cortex and cerebellum (all of which have been observed earlier as being dysregulated in individuals with autism). Here we present a family-based association analysis covering the entire MET locus. Significant results were obtained in both single locus and haplotype approaches with a single nucleotide polymorphism in intron 1 (rs38845, P<0.004) and with one intronic haplotype (AAGTG, P<0.009) in 325 multiplex IMGSAC families and 10 IMGSAC trios. Although these results failed to replicate in an independent sample of 82 Italian trios, the association itself was confirmed by a case-control analysis performed using the Italian cohort (P<0.02). The previously reported positive association of rs1858830 failed to replicate in this study. Overall, our findings provide further evidence that MET may play a role in autism susceptibility.

    Funded by: Medical Research Council; Wellcome Trust: 075491

    European journal of human genetics : EJHG 2009;17;6;749-58

  • Is the thrifty genotype hypothesis supported by evidence based on confirmed type 2 diabetes- and obesity-susceptibility variants?

    Southam L, Soranzo N, Montgomery SB, Frayling TM, McCarthy MI, Barroso I and Zeggini E

    Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.

    According to the thrifty genotype hypothesis, the high prevalence of type 2 diabetes and obesity is a consequence of genetic variants that have undergone positive selection during historical periods of erratic food supply. The recent expansion in the number of validated type 2 diabetes- and obesity-susceptibility loci, coupled with access to empirical data, enables us to look for evidence in support (or otherwise) of the thrifty genotype hypothesis using proven loci.

    Methods: We employed a range of tests to obtain complementary views of the evidence for selection: we determined whether the risk allele at associated 'index' single-nucleotide polymorphisms is derived or ancestral, calculated the integrated haplotype score (iHS) and assessed the population differentiation statistic fixation index (F(ST)) for 17 type 2 diabetes and 13 obesity loci.

    Results: We found no evidence for significant differences for the derived/ancestral allele test. None of the studied loci showed strong evidence for selection based on the iHS score. We find a high F(ST) for rs7901695 at TCF7L2, the largest type 2 diabetes effect size found to date.

    Our results provide some evidence for selection at specific loci, but there are no consistent patterns of selection that provide conclusive confirmation of the thrifty genotype hypothesis. Discovery of more signals and more causal variants for type 2 diabetes and obesity is likely to allow more detailed examination of these issues.

    Funded by: Medical Research Council: G0601261, G0601261(80227); Wellcome Trust: 077016, 079557, 088885, WT077016/Z/05/Z, WT088885/Z/09/Z

    Diabetologia 2009;52;9;1846-51

  • Genetic determinants of height growth assessed longitudinally from infancy to adulthood in the northern Finland birth cohort 1966.

    Sovio U, Bennett AJ, Millwood IY, Molitor J, O'Reilly PF, Timpson NJ, Kaakinen M, Laitinen J, Haukka J, Pillas D, Tzoulaki I, Molitor J, Hoggart C, Coin LJ, Whittaker J, Pouta A, Hartikainen AL, Freimer NB, Widen E, Peltonen L, Elliott P, McCarthy MI and Jarvelin MR

    Department of Epidemiology and Public Health, Imperial College London, London, United Kingdom.

    Recent genome-wide association (GWA) studies have identified dozens of common variants associated with adult height. However, it is unknown how these variants influence height growth during childhood. We derived peak height velocity in infancy (PHV1) and puberty (PHV2) and timing of pubertal height growth spurt from parametric growth curves fitted to longitudinal height growth data to test their association with known height variants. The study consisted of N = 3,538 singletons from the prospective Northern Finland Birth Cohort 1966 with genotype data and frequent height measurements (on average 20 measurements per person) from 0-20 years. Twenty-six of the 48 variants tested associated with adult height (p<0.05, adjusted for sex and principal components) in this sample, all in the same direction as in previous GWA scans. Seven SNPs in or near the genes HHIP, DLEU7, UQCC, SF3B4/SV2A, LCORL, and HIST1H1D associated with PHV1 and five SNPs in or near SOCS2, SF3B4/SV2A, C17orf67, CABLES1, and DOT1L with PHV2 (p<0.05). We formally tested variants for interaction with age (infancy versus puberty) and found biologically meaningful evidence for an age-dependent effect for the SNP in SOCS2 (p = 0.0030) and for the SNP in HHIP (p = 0.045). We did not have similar prior evidence for the association between height variants and timing of pubertal height growth spurt as we had for PHVs, and none of the associations were statistically significant after correction for multiple testing. The fact that in this sample, less than half of the variants associated with adult height had a measurable effect on PHV1 or PHV2 is likely to reflect limited power to detect these associations in this dataset. Our study is the first genetic association analysis on longitudinal height growth in a prospective cohort from birth to adulthood and gives grounding for future research on the genetic regulation of human height during different periods of growth.

    Funded by: Medical Research Council: G0500539, G0600705; NHLBI NIH HHS: 5R01HL087679-02; NIMH NIH HHS: 1RL1MH083268-01; Wellcome Trust: GR069224

    PLoS genetics 2009;5;3;e1000409

  • Genomic and genic deletions of the FOX gene cluster on 16q24.1 and inactivating mutations of FOXF1 cause alveolar capillary dysplasia and other malformations.

    Stankiewicz P, Sen P, Bhatt SS, Storer M, Xia Z, Bejjani BA, Ou Z, Wiszniewska J, Driscoll DJ, Maisenbacher MK, Bolivar J, Bauer M, Zackai EH, McDonald-McGinn D, Nowaczyk MM, Murray M, Hustead V, Mascotti K, Schultz R, Hallam L, McRae D, Nicholson AG, Newbury R, Durham-O'Donnell J, Knight G, Kini U, Shaikh TH, Martin V, Tyreman M, Simonic I, Willatt L, Paterson J, Mehta S, Rajan D, Fitzgerald T, Gribble S, Prigmore E, Patel A, Shaffer LG, Carter NP, Cheung SW, Langston C and Shaw-Smith C

    Dept of Molecular & Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA. pawels@bcm.edu

    Alveolar capillary dysplasia with misalignment of pulmonary veins (ACD/MPV) is a rare, neonatally lethal developmental disorder of the lung with defining histologic abnormalities typically associated with multiple congenital anomalies (MCA). Using array CGH analysis, we have identified six overlapping microdeletions encompassing the FOX transcription factor gene cluster in chromosome 16q24.1q24.2 in patients with ACD/MPV and MCA. Subsequently, we have identified four different heterozygous mutations (frameshift, nonsense, and no-stop) in the candidate FOXF1 gene in unrelated patients with sporadic ACD/MPV and MCA. Custom-designed, high-resolution microarray analysis of additional ACD/MPV samples revealed one microdeletion harboring FOXF1 and two distinct microdeletions upstream of FOXF1, implicating a position effect. DNA sequence analysis revealed that in six of nine deletions, both breakpoints occurred in the portions of Alu elements showing eight to 43 base pairs of perfect microhomology, suggesting a replication error, Microhomology-Mediated Break-Induced Replication (MMBIR)/Fork Stalling and Template Switching (FoSTeS), as the mechanism of their formation. In contrast to the association of point mutations in FOXF1 with bowel malrotation, microdeletions of FOXF1 were associated with hypoplastic left heart syndrome and gastrointestinal atresias, probably due to haploinsufficiency for the neighboring FOXC2 and FOXL1 genes. These differences reveal the phenotypic consequences of gene alterations in cis.

    Funded by: Wellcome Trust

    American journal of human genetics 2009;84;6;780-91

  • Common variants conferring risk of schizophrenia.

    Stefansson H, Ophoff RA, Steinberg S, Andreassen OA, Cichon S, Rujescu D, Werge T, Pietiläinen OP, Mors O, Mortensen PB, Sigurdsson E, Gustafsson O, Nyegaard M, Tuulio-Henriksson A, Ingason A, Hansen T, Suvisaari J, Lonnqvist J, Paunio T, Børglum AD, Hartmann A, Fink-Jensen A, Nordentoft M, Hougaard D, Norgaard-Pedersen B, Böttcher Y, Olesen J, Breuer R, Möller HJ, Giegling I, Rasmussen HB, Timm S, Mattheisen M, Bitter I, Réthelyi JM, Magnusdottir BB, Sigmundsson T, Olason P, Masson G, Gulcher JR, Haraldsson M, Fossdal R, Thorgeirsson TE, Thorsteinsdottir U, Ruggeri M, Tosato S, Franke B, Strengman E, Kiemeney LA, Genetic Risk and Outcome in Psychosis (GROUP), Melle I, Djurovic S, Abramova L, Kaleda V, Sanjuan J, de Frutos R, Bramon E, Vassos E, Fraser G, Ettinger U, Picchioni M, Walker N, Toulopoulou T, Need AC, Ge D, Yoon JL, Shianna KV, Freimer NB, Cantor RM, Murray R, Kong A, Golimbet V, Carracedo A, Arango C, Costas J, Jönsson EG, Terenius L, Agartz I, Petursson H, Nöthen MM, Rietschel M, Matthews PM, Muglia P, Peltonen L, St Clair D, Goldstein DB, Stefansson K and Collier DA

    deCODE genetics, Sturlugata 8, IS-101 Reykjavik, Iceland.

    Schizophrenia is a complex disorder, caused by both genetic and environmental factors and their interactions. Research on pathogenesis has traditionally focused on neurotransmitter systems in the brain, particularly those involving dopamine. Schizophrenia has been considered a separate disease for over a century, but in the absence of clear biological markers, diagnosis has historically been based on signs and symptoms. A fundamental message emerging from genome-wide association studies of copy number variations (CNVs) associated with the disease is that its genetic basis does not necessarily conform to classical nosological disease boundaries. Certain CNVs confer not only high relative risk of schizophrenia but also of other psychiatric disorders. The structural variations associated with schizophrenia can involve several genes and the phenotypic syndromes, or the 'genomic disorders', have not yet been characterized. Single nucleotide polymorphism (SNP)-based genome-wide association studies with the potential to implicate individual genes in complex diseases may reveal underlying biological pathways. Here we combined SNP data from several large genome-wide scans and followed up the most significant association signals. We found significant association with several markers spanning the major histocompatibility complex (MHC) region on chromosome 6p21.3-22.1, a marker located upstream of the neurogranin gene (NRGN) on 11q24.2 and a marker in intron four of transcription factor 4 (TCF4) on 18q21.2. Our findings implicating the MHC region are consistent with an immune component to schizophrenia risk, whereas the association with NRGN and TCF4 points to perturbation of pathways involved in brain development, memory and cognition.

    Funded by: Department of Health: PDA/02/06/016; NHLBI NIH HHS: 1R01HL087679-01; NIMH NIH HHS: R01 MH078075; Wellcome Trust: 089061

    Nature 2009;460;7256;744-7

  • Arena syndrome is caused by a missense mutation in PLP1.

    Stevenson RE, Tarpey P, May MM, Stratton MR and Schwartz CE

    JC Self Research Institute of Human Genetics, Greenwood Genetic Center, Greenwood, South Carolina 29646, USA. res@ggc.org

    American journal of medical genetics. Part A 2009;149A;5;1081

  • Loci at chromosomes 13, 19 and 20 influence age at natural menopause.

    Stolk L, Zhai G, van Meurs JB, Verbiest MM, Visser JA, Estrada K, Rivadeneira F, Williams FM, Cherkas L, Deloukas P, Soranzo N, de Keyzer JJ, Pop VJ, Lips P, Lebrun CE, van der Schouw YT, Grobbee DE, Witteman J, Hofman A, Pols HA, Laven JS, Spector TD and Uitterlinden AG

    Department of Internal Medicine, Erasmus MC, Rotterdam, The Netherlands.

    We conducted a genome-wide association study for age at natural menopause in 2,979 European women and identified six SNPs in three loci associated with age at natural menopause: chromosome 19q13.4 (rs1172822; -0.4 year per T allele (39%); P = 6.3 × 10(-11)), chromosome 20p12.3 (rs236114; +0.5 year per A allele (21%); P = 9.7 × 10(-11)) and chromosome 13q34 (rs7333181; +0.5 year per A allele (12%); P = 2.5 × 10(-8)). These common genetic variants regulate timing of ovarian aging, an important risk factor for breast cancer, osteoporosis and cardiovascular disease.

    Funded by: Wellcome Trust: 077011

    Nature genetics 2009;41;6;645-7

  • Detection of single nucleotide polymorphisms based on the multilocus sequence typing database of Staphylococcus aureus using locked nucleic acid oligonucleotides.

    Stone M, Bamford K and Wain J

    Journal of medical microbiology 2009;58;Pt 5;693-5

  • The cancer genome.

    Stratton MR, Campbell PJ and Futreal PA

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK. mrs@sanger.ac.uk

    All cancers arise as a result of changes that have occurred in the DNA sequence of the genomes of cancer cells. Over the past quarter of a century much has been learnt about these mutations and the abnormal genes that operate in human cancers. We are now, however, moving into an era in which it will be possible to obtain the complete DNA sequence of large numbers of cancer genomes. These studies will provide us with a detailed and comprehensive perspective on how individual cancers have developed.

    Funded by: Wellcome Trust: 077012, 088340

    Nature 2009;458;7239;719-24

  • Deep short-read sequencing of chromosome 17 from the mouse strains A/J and CAST/Ei identifies significant germline variation and candidate genes that regulate liver triglyceride levels.

    Sudbery I, Stalker J, Simpson JT, Keane T, Rust AG, Hurles ME, Walter K, Lynch D, Teboul L, Brown SD, Li H, Ning Z, Nadeau JH, Croniger CM, Durbin R and Adams DJ

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1HH, UK. ims@sanger.ac.uk

    Genome sequences are essential tools for comparative and mutational analyses. Here we present the short read sequence of mouse chromosome 17 from the Mus musculus domesticus derived strain A/J, and the Mus musculus castaneus derived strain CAST/Ei. We describe approaches for the accurate identification of nucleotide and structural variation in the genomes of vertebrate experimental organisms, and show how these techniques can be applied to help prioritize candidate genes within quantitative trait loci.

    Funded by: Cancer Research UK; Medical Research Council: G0800024; Wellcome Trust

    Genome biology 2009;10;10;R112

  • Human Molecular Genetics

    Sudbery, P., Sudbery, I.

    3rd Edition;Pearson Educational;ISBN 978-0132051576;2009

  • Confirmation of multiple risk Loci and genetic impacts by a genome-wide association study of type 2 diabetes in the Japanese population.

    Takeuchi F, Serizawa M, Yamamoto K, Fujisawa T, Nakashima E, Ohnaka K, Ikegami H, Sugiyama T, Katsuya T, Miyagishi M, Nakashima N, Nawata H, Nakamura J, Kono S, Takayanagi R and Kato N

    Department of Medical Ecology and Informatics, Research Institute, International Medical Center of Japan, Tokyo, Japan.

    Objective: To identify novel type 2 diabetes gene variants and confirm previously identified ones, a three-staged genome-wide association study was performed in the Japanese population.

    Research Design and Methods: In the stage 1 scan, we genotyped 519 case and 503 control subjects with 482,625 single nucleotide polymorphism (SNP) markers; in the stage 2 panel comprising 1,110 case subjects and 1,014 control subjects, we assessed 1,456 SNPs (P < 0.0025, stage 1); in addition to direct genotyping, 964 healthy control subjects formed the in silico control panel. Along with genome-wide exploration, we aimed to replicate the disease association of 17 SNPs from 16 candidate loci previously identified in Europeans. The associated and/or replicated loci (23 SNPs; P < 7 x 10(-5) for genome-wide exploration and P < 0.05 for replication) were examined in the stage 3 panel comprising 4,000 case subjects and 12,569 population-based samples, from which 4,889 nondiabetic control subjects were preselected. The 12,569 subjects were used for overall risk assessment in the general population.

    Results: Four loci, one novel with suggestive evidence (PEPD on 19q13, P = 1.4 x 10(-5)) and three previously reported, were identified; the associations of CDKAL1, CDKN2A/CDKN2B, and KCNQ1 were confirmed (P < 10(-19)). Moreover, significant associations were replicated in five other candidate loci: TCF7L2, IGF2BP2, SLC30A8, HHEX, and KCNJ11. There was substantial overlap of type 2 diabetes susceptibility genes between the two populations, whereas effect size and explained variance tended to be higher in the Japanese population.

    Conclusions: The strength of association was more prominent in the Japanese population than in Europeans for more than half of the confirmed type 2 diabetes loci.

    Diabetes 2009;58;7;1690-9

  • Mutation spectrum of Meckel syndrome genes: one group of syndromes or several distinct groups?

    Tallila J, Salonen R, Kohlschmidt N, Peltonen L and Kestilä M

    National Institute of Health and Welfare, Public Health Genomics Unit and FIMM, Institute for Molecular Medicine Finland, Helsinki 00290, Finland.

    Meckel syndrome (MKS) is a lethal malformation syndrome that belongs to the group of disorders that are associated with primary cilia dysfunction. A total of five genes are known to be involved in the molecular background of MKS. Here we have systematically analyzed all these genes in a total of 29 MKS families. Seven of the families were Finnish and the rest originated from elsewhere in Europe. We found 12 novel mutations in 13 families. Mutations in the MKS genes are also found in other syndromes and it seems reasonable to assume that there is a correlation between the syndromes and the mutations. To obtain some supportive information, we collected all the previously published mutations in the genes to see whether the different syndromes are dictated by the nature of the mutations. Based on this study, mutations play a role in the clinical phenotype, given that the same allelic combination of mutations has never been reported in two clinically distinct syndromes.

    Funded by: Wellcome Trust: 089061

    Human mutation 2009;30;8;E813-30

  • A systematic, large-scale resequencing screen of X-chromosome coding exons in mental retardation.

    Tarpey PS, Smith R, Pleasance E, Whibley A, Edkins S, Hardy C, O'Meara S, Latimer C, Dicks E, Menzies A, Stephens P, Blow M, Greenman C, Xue Y, Tyler-Smith C, Thompson D, Gray K, Andrews J, Barthorpe S, Buck G, Cole J, Dunmore R, Jones D, Maddison M, Mironenko T, Turner R, Turrell K, Varian J, West S, Widaa S, Wray P, Teague J, Butler A, Jenkinson A, Jia M, Richardson D, Shepherd R, Wooster R, Tejada MI, Martinez F, Carvill G, Goliath R, de Brouwer AP, van Bokhoven H, Van Esch H, Chelly J, Raynaud M, Ropers HH, Abidi FE, Srivastava AK, Cox J, Luo Y, Mallya U, Moon J, Parnau J, Mohammed S, Tolmie JL, Shoubridge C, Corbett M, Gardner A, Haan E, Rujirabanjerd S, Shaw M, Vandeleur L, Fullston T, Easton DF, Boyle J, Partington M, Hackett A, Field M, Skinner C, Stevenson RE, Bobrow M, Turner G, Schwartz CE, Gecz J, Raymond FL, Futreal PA and Stratton MR

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Large-scale systematic resequencing has been proposed as the key future strategy for the discovery of rare, disease-causing sequence variants across the spectrum of human complex disease. We have sequenced the coding exons of the X chromosome in 208 families with X-linked mental retardation (XLMR), the largest direct screen for constitutional disease-causing mutations thus far reported. The screen has discovered nine genes implicated in XLMR, including SYP, ZNF711 and CASK reported here, confirming the power of this strategy. The study has, however, also highlighted issues confronting whole-genome sequencing screens, including the observation that loss of function of 1% or more of X-chromosome genes is compatible with apparently normal existence.

    Funded by: Cancer Research UK: 10118; NICHD NIH HHS: HD26202; Wellcome Trust: 077012

    Nature genetics 2009;41;5;535-43

  • 'Putting our heads together': insights into genomic conservation between human and canine intracranial tumors.

    Thomas R, Duke SE, Wang HJ, Breen TE, Higgins RJ, Linder KE, Ellis P, Langford CF, Dickinson PJ, Olby NJ and Breen M

    Department of Molecular Biomedical Sciences, College of Veterinary Medicine, North Carolina State University, Raleigh, NC 27606, USA.

    Numerous attributes render the domestic dog a highly pertinent model for cancer-associated gene discovery. We performed microarray-based comparative genomic hybridization analysis of 60 spontaneous canine intracranial tumors to examine the degree to which dog and human patients exhibit aberrations of ancestrally related chromosome regions, consistent with a shared pathogenesis. Canine gliomas and meningiomas both demonstrated chromosome copy number aberrations (CNAs) that share evolutionarily conserved synteny with those previously reported in their human counterpart. Interestingly, however, genomic imbalances orthologous to some of the hallmark aberrations of human intracranial tumors, including chromosome 22/NF2 deletions in meningiomas and chromosome 1p/19q deletions in oligodendrogliomas, were not major events in the dog. Furthermore, and perhaps most significantly, we identified highly recurrent CNAs in canine intracranial tumors for which the human orthologue has been reported previously at low frequency but which have not, thus far, been associated intimately with the pathogenesis of the tumor. The presence of orthologous CNAs in canine and human intracranial cancers is strongly suggestive of their biological significance in tumor development and/or progression. Moreover, the limited genetic heterogeneity within purebred dog populations, coupled with the contrasting organization of the dog and human karyotypes, offers tremendous opportunities for refining evolutionarily conserved regions of tumor-associated genomic imbalance that may harbor novel candidate genes involved in their pathogenesis. A comparative approach to the study of canine and human intracranial tumors may therefore provide new insights into their genetic etiology, towards development of more sophisticated molecular subclassification and tailored therapies in both species.

    Funded by: NINDS NIH HHS: NS051190, R21 NS051190-01; Wellcome Trust

    Journal of neuro-oncology 2009;94;3;333-49

  • Microarray-based cytogenetic profiling reveals recurrent and subtype-associated genomic copy number aberrations in feline sarcomas.

    Thomas R, Valli VE, Ellis P, Bell J, Karlsson EK, Cullen J, Lindblad-Toh K, Langford CF and Breen M

    Department of Molecular Biomedical Sciences, College of Veterinary Medicine, North Carolina State University, Raleigh, NC 27606, USA. rachael_thomas@ncsu.edu

    Injection-site-associated sarcomas (ISAS), commonly arising at the site of routine vaccine administration, afflict as many as 22,000 domestic cats annually in the USA. These tumors are typically more aggressive and prone to recurrence than spontaneous sarcomas (non-ISAS), generally receiving a poorer long-term prognosis and warranting a more aggressive therapeutic approach. Although certain clinical and histological factors are highly suggestive of ISAS, timely diagnosis and optimal clinical management may be hindered by the absence of definitive markers that can distinguish between tumors with underlying injection-related etiology and their spontaneous counterpart. Specific nonrandom chromosome copy number aberrations (CNAs) have been associated with the clinical behavior of a vast spectrum of human tumors, providing an extensive resource of potential diagnostic and prognostic biomarkers. Although similar principles are now being applied with great success in other species, their relevance to feline molecular oncology has not yet been investigated in any detail. We report the construction of a genomic microarray platform for detection of recurrent CNAs in feline tumors through cytogenetic assignment of 210 large-insert DNA clones selected at intervals of approximately 15 Mb from the feline genome sequence assembly. Microarray-based profiling of 19 ISAS and 27 non-ISAS cases identified an extensive range of genomic imbalances that were highly recurrent throughout the combined panel of 46 sarcomas. Deletions of two specific regions were significantly associated with the non-ISAS phenotype. Further characterization of these regions may ultimately permit molecular distinction between ISAS and non-ISAS, as a tool for predicting tumor behavior and prognosis, as well as refining means for therapeutic intervention.

    Chromosome research : an international journal on the molecular, supramolecular and evolutionary aspects of chromosome biology 2009;17;8;987-1000

  • Influence of genetic background on tumor karyotypes: evidence for breed-associated cytogenetic aberrations in canine appendicular osteosarcoma.

    Thomas R, Wang HJ, Tsai PC, Langford CF, Fosmire SP, Jubala CM, Getzy DM, Cutter GR, Modiano JF and Breen M

    Department of Molecular Biomedical Sciences, College of Veterinary Medicine, North Carolina State University, 4700 Hillsborough Street, Raleigh, NC 27606, USA.

    Recurrent chromosomal aberrations in solid tumors can reveal the genetic pathways involved in the evolution of a malignancy and in some cases predict biological behavior. However, the role of individual genetic backgrounds in shaping karyotypes of sporadic tumors is unknown. The genetic structure of purebred dog breeds, coupled with their susceptibility to spontaneous cancers, provides a robust model with which to address this question. We tested the hypothesis that there is an association between breed and the distribution of genomic copy number imbalances in naturally occurring canine tumors through assessment of a cohort of Golden Retrievers and Rottweilers diagnosed with spontaneous appendicular osteosarcoma. Our findings reveal significant correlations between breed and tumor karyotypes that are independent of gender, age at diagnosis, and histological classification. These data indicate for the first time that individual genetic backgrounds, as defined by breed in dogs, influence tumor karyotypes in a cancer with extensive genomic instability.

    Chromosome research : an international journal on the molecular, supramolecular and evolutionary aspects of chromosome biology 2009;17;3;365-77

  • Common variants in the region around Osterix are associated with bone mineral density and growth in childhood.

    Timpson NJ, Tobias JH, Richards JB, Soranzo N, Duncan EL, Sims AM, Whittaker P, Kumanduri V, Zhai G, Glaser B, Eisman J, Jones G, Nicholson G, Prince R, Seeman E, Spector TD, Brown MA, Peltonen L, Smith GD, Deloukas P and Evans DM

    MRC Centre for Causal Analyses in Translational Epidemiology, Department of Social Medicine, University of Bristol, Bristol, UK.

    Peak bone mass achieved in adolescence is a determinant of bone mass in later life. In order to identify genetic variants affecting bone mineral density (BMD), we performed a genome-wide association study of BMD and related traits in 1518 children from the Avon Longitudinal Study of Parents and Children (ALSPAC). We compared results with a scan of 134 adults with high or low hip BMD. We identified associations with BMD in an area of chromosome 12 containing the Osterix (SP7) locus, a transcription factor responsible for regulating osteoblast differentiation (ALSPAC: P = 5.8 x 10(-4); Australia: P = 3.7 x 10(-4)). This region has previously shown evidence of association with adult hip and lumbar spine BMD in an Icelandic population, as well as nominal association in a UK population. A meta-analysis of these existing studies revealed strong association between SNPs in the Osterix region and adult lumbar spine BMD (P = 9.9 x 10(-11)). In light of these findings, we genotyped a further 3692 individuals from ALSPAC who had whole body BMD and confirmed the association in children as well (P = 5.4 x 10(-5)). Moreover, all SNPs were related to height in ALSPAC children, but not weight or body mass index, and when height was included as a covariate in the regression equation, the association with total body BMD was attenuated. We conclude that genetic variants in the region of Osterix are associated with BMD in children and adults probably through primary effects on growth.

    Funded by: Medical Research Council: G0800582; Wellcome Trust

    Human molecular genetics 2009;18;8;1510-7

  • Sir2 paralogues cooperate to regulate virulence genes and antigenic variation in Plasmodium falciparum.

    Tonkin CJ, Carret CK, Duraisingh MT, Voss TS, Ralph SA, Hommel M, Duffy MF, Silva LM, Scherf A, Ivens A, Speed TP, Beeson JG and Cowman AF

    The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia.

    Cytoadherence of Plasmodium falciparum-infected erythrocytes in the brain, organs and peripheral microvasculature is linked to morbidity and mortality associated with severe malaria. Parasite-derived P. falciparum Erythrocyte Membrane Protein 1 (PfEMP1) molecules displayed on the erythrocyte surface are responsible for cytoadherence and undergo antigenic variation in the course of an infection. Antigenic variation of PfEMP1 is achieved by in situ switching and mutually exclusive transcription of the var gene family, a process that is controlled by epigenetic mechanisms. Here we report characterisation of the P. falciparum silent information regulators A and B (PfSir2A and PfSir2B) and their involvement in mutual exclusion and silencing of the var gene repertoire. Analysis of P. falciparum parasites lacking either PfSir2A or PfSir2B shows that these NAD(+)-dependent histone deacetylases are required for silencing of different var gene subsets classified by their conserved promoter type. We also demonstrate that in the absence of either of these molecules mutually exclusive expression of var genes breaks down. We show that var gene silencing originates within the promoter and PfSir2 paralogues are involved in cis spreading of silenced chromatin into adjacent regions. Furthermore, parasites lacking PfSir2A but not PfSir2B have considerably longer telomeric repeats, demonstrating a role for this molecule in telomeric end protection. This work highlights the pivotal but distinct role for both PfSir2 paralogues in epigenetic silencing of P. falciparum virulence genes and the control of pathogenicity of malaria infection.

    PLoS biology 2009;7;4;e84

  • Genome-wide haplotype association study identifies the SLC22A3-LPAL2-LPA gene cluster as a risk locus for coronary artery disease.

    Trégouët DA, König IR, Erdmann J, Munteanu A, Braund PS, Hall AS, Grosshennig A, Linsel-Nitschke P, Perret C, DeSuremain M, Meitinger T, Wright BJ, Preuss M, Balmforth AJ, Ball SG, Meisinger C, Germain C, Evans A, Arveiler D, Luc G, Ruidavets JB, Morrison C, van der Harst P, Schreiber S, Neureuther K, Schäfer A, Bugert P, El Mokhtari NE, Schrezenmeir J, Stark K, Rubin D, Wichmann HE, Hengstenberg C, Ouwehand W, Wellcome Trust Case Control Consortium, Cardiogenics Consortium, Ziegler A, Tiret L, Thompson JR, Cambien F, Schunkert H and Samani NJ

    Institut National de la Santé Et de la Recherche Médicale (INSERM) Unité Mixte de Recherche (UMR_S) 525, Université Pierre et Marie Curie (UPMC). Paris 06, Paris 75013, France. david.tregouet@upmc.fr

    We identify the SLC22A3-LPAL2-LPA gene cluster as a strong susceptibility locus for coronary artery disease (CAD) through a genome-wide haplotype association (GWHA) study. This locus was not identified from previous genome-wide association (GWA) studies focused on univariate analyses of SNPs. The proposed approach may have wide utility for analyzing GWA data for other complex traits.

    Funded by: British Heart Foundation; Medical Research Council; Wellcome Trust

    Nature genetics 2009;41;3;283-5

  • Gene-environmental interaction regarding alcohol-metabolizing enzymes in the Japanese general population.

    Tsuchihashi-Makaya M, Serizawa M, Yanai K, Katsuya T, Takeuchi F, Fujioka A, Yamori Y, Ogihara T and Kato N

    Division of Genomic Epidemiology, Department of Clinical Research and Informatics, Research Institute, International Medical Center of Japan, Tokyo, Japan. mimakaya@ri.imcj.go.jp

    Epidemiological studies have shown that excessive alcohol consumption is a potent risk factor for developing hypertension. In addition, some polymorphisms of the alcohol metabolism genes have been reported to exert significant impacts on the risk of alcoholism. We investigated the relevance of genetic susceptibility to drinking behavior and its influence on the sensitivity to pressor effects of alcohol in the Japanese general population. We initially screened SNPs in four candidate genes by resequencing. From 35 SNPs thus identified, 10 tag SNPs were selected and used for large-scale association analysis in a total of 5724 subjects. Among the SNPs tested, significant association (P<0.001) with drinking behavior was observed for ADH1B Arg47His (rs1229984) and ALDH2 Glu487Lys (rs671) polymorphisms. All subjects homozygous for the Lys allele (AA genotype) of rs671 turned out to be nondrinkers and the combination of two SNP genotypes appeared to substantially influence people's drinking behavior in a synergistic manner. rs671 was significantly associated with blood pressure (P=0.0001-0.0491) in subgroups of drinkers. In the context of gene-environment interaction, our data clearly show the genetic impacts of two SNPs on drinking behavior and of one SNP on the sensitivity to the pressor effects of alcohol in the Japanese general population.

    Hypertension research : official journal of the Japanese Society of Hypertension 2009;32;3;207-13

  • High-throughput haplotype determination over long distances by haplotype fusion PCR and ligation haplotyping.

    Turner DJ and Hurles ME

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK. djt@sanger.ac.uk

    When combined with haplotype fusion PCR (HF-PCR), ligation haplotyping is a robust, high-throughput method for empirical determination of haplotypes, which can be applied to assaying both sequence and structural variation over long distances. Unlike alternative approaches to haplotype determination, such as allele-specific PCR and long PCR, HF-PCR and ligation haplotyping do not suffer from mispriming or template-switching errors. In this method, HF-PCR is used to juxtapose DNA sequences from single-molecule templates, which contain single-nucleotide polymorphisms (SNPs) or paralogous sequence variants (PSVs) separated by several kilobases. HF-PCR uses an emulsion-based fusion PCR, which can be performed rapidly and in a 96-well format. Subsequently, a ligation-based assay is performed on the HF-PCR products to determine haplotypes. Products are resolved by capillary electrophoresis. Once optimized, the procedure can be performed quickly, taking a day and a half to generate phased haplotypes from genomic DNA.

    Funded by: Wellcome Trust: 077014, 077014/Z/05/Z

    Nature protocols 2009;4;12;1771-83

  • Next-generation sequencing of vertebrate experimental organisms.

    Turner DJ, Keane TM, Sudbery I and Adams DJ

    Experimental Cancer Genetics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK.

    Next-generation sequencing technologies are revolutionizing biology by allowing for genome-wide transcription factor binding-site profiling, transcriptome sequencing, and more recently, whole-genome resequencing. While it is currently not possible to generate complete de novo assemblies of higher-vertebrate genomes using next-generation sequencing, improvements in sequence read lengths and throughput, coupled with new assembly algorithms for large data sets, will soon make this a reality. These developments will in turn spawn a revolution in how genomic data are used to understand genetics and how model organisms are used for disease gene discovery. This review provides an overview of the current next-generation sequencing platforms and the newest computational tools for the analysis of next-generation sequencing data. We also describe how next-generation sequencing may be applied in the context of vertebrate model organism genetics.

    Funded by: Cancer Research UK; Medical Research Council; Wellcome Trust

    Mammalian genome : official journal of the International Mammalian Genome Society 2009;20;6;327-38

  • Genetic similarity of chromosome 6 between patients receiving hematopoietic stem cell transplantation and HLA matched sibling donors.

    Turpeinen H, Volin L, Nikkinen L, Ojala P, Palotie A, Saarela J and Partanen J

    Finnish Red Cross Blood Service, Kivihaantie 7, 00310 Helsinki, Finland. hannu.turpeinen@veripalvelu.fi

    Background: Matching for HLA genes located on chromosome 6 is required in hematopoietic stem cell transplantation to reduce the incidence of graft-versus-host disease. However, a considerable proportion of patients still suffer from it, obviously due to genetic differences outside the HLA gene region.

    Design and Methods: We studied the similarity of almost 4,000 single nucleotide polymorphisms on chromosome 6 between patients receiving hematopoietic stem cell transplantation and their HLA-matched sibling donors.

    Results: We observed that as a result of routine HLA matching the siblings in fact shared surprisingly long chromosomal fragments with similar single nucleotide polymorphism genotypes--from 11.65 Mb to 134.66 Mb. The number of genes mapped on these shared fragments varied from 402 to 1,302. Considering the whole chromosome 6, the HLA-matched siblings were apparently identical for 65.2-97.8% of the single nucleotide polymorphisms.

    Conclusions: Potentially, genes similar in some transplantation pairs while different in others might have a significant role in determining the outcome after hematopoietic stem cell transplantation.

    Haematologica 2009;94;4;528-35

  • A high-throughput splinkerette-PCR method for the isolation and sequencing of retroviral insertion sites.

    Uren AG, Mikkers H, Kool J, van der Weyden L, Lund AH, Wilson CH, Rance R, Jonkers J, van Lohuizen M, Berns A and Adams DJ

    Division of Molecular Genetics, Cancer Genomics Centre, Netherlands Cancer Institute, Plesmanlaan, Amsterdam, The Netherlands.

    Insertional mutagens such as viruses and transposons are a useful tool for performing forward genetic screens in mice to discover cancer genes. These screens are most effective when performed using hundreds of mice; however, until recently, the cost-effective isolation and sequencing of insertion sites has been a major limitation to performing screens on this scale. Here we present a method for the high-throughput isolation of insertion sites using a highly efficient splinkerette-PCR method coupled with capillary or 454 sequencing. This protocol includes a description of the procedure for DNA isolation, DNA digestion, linker or splinkerette ligation, primary and secondary PCR amplification, and sequencing. This method, which takes about 1 week to perform, has allowed us to isolate hundreds of thousands of insertion sites from mouse tumors and, unlike other methods, has been specifically optimized for the murine leukemia virus (MuLV), and can easily be performed in a 96-well plate format for the efficient multiplex isolation of insertion sites.

    Funded by: Cancer Research UK: A6542; Wellcome Trust: 098051

    Nature protocols 2009;4;5;789-98

  • Megaoesophagus in Rassf1a-null mice.

    van der Weyden L, Happerfield L, Arends MJ and Adams DJ

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK. lvdw@sanger.ac.uk

    Megaoesophagus, or oesophageal achalasia, is a neuromuscular disorder characterized by an absence of peristalsis and flaccid dilatation of the oesophagus, resulting in the retention of ingesta in the dilated segment. The aetiology and pathogenesis of idiopathic (or primary) megaoesophagus are still poorly understood and very little is known about the genetic causes of megaoesophagus in humans. Attempts to develop animal models of this condition have been largely unsuccessful and although the ICRC/HiCri strain of mice spontaneously develop megaoesophagus, the underlying genetic cause remains unknown. In this report, we show that aged Rassf1a-null mice have an enhanced susceptibility to megaoesophagus compared with wild-type littermates (approximately 20% vs. approximately 2% incidence respectively; P = 0.01). Histological examination of the dilated oesophaguses shows a reduction in the numbers of nerve cells (both ganglia and nerve fibres) in the myenteric plexus of the dilated mid and lower oesophagus that was confirmed by S100 immunohistochemistry. There was also a chronic inflammatory infiltrate and subsequent fibrosis of the myenteric plexus and the muscle layers. These appearances closely mimic the gross and histopathological findings in human cases of megaoesophagus/achalasia, thus demonstrating that this is a representative mouse model of the disease. Thus, we have identified a genetic cause of the development of megaoesophagus/achalasia that could be screened for in patients, and may eventually facilitate the development of therapies that could prevent further progression of the disease once it is diagnosed at an early stage.

    Funded by: Cancer Research UK; Wellcome Trust

    International journal of experimental pathology 2009;90;2;101-8

  • Somatic mutations of the histone H3K27 demethylase gene UTX in human cancer.

    van Haaften G, Dalgliesh GL, Davies H, Chen L, Bignell G, Greenman C, Edkins S, Hardy C, O'Meara S, Teague J, Butler A, Hinton J, Latimer C, Andrews J, Barthorpe S, Beare D, Buck G, Campbell PJ, Cole J, Forbes S, Jia M, Jones D, Kok CY, Leroy C, Lin ML, McBride DJ, Maddison M, Maquire S, McLay K, Menzies A, Mironenko T, Mulderrig L, Mudie L, Pleasance E, Shepherd R, Smith R, Stebbings L, Stephens P, Tang G, Tarpey PS, Turner R, Turrell K, Varian J, West S, Widaa S, Wray P, Collins VP, Ichimura K, Law S, Wong J, Yuen ST, Leung SY, Tonon G, DePinho RA, Tai YT, Anderson KC, Kahnoski RJ, Massie A, Khoo SK, Teh BT, Stratton MR and Futreal PA

    Wellcome Trust Sanger Institute, Hinxton, UK.

    Somatically acquired epigenetic changes are present in many cancers. Epigenetic regulation is maintained via post-translational modifications of core histones. Here, we describe inactivating somatic mutations in the histone lysine demethylase gene UTX, pointing to histone H3 lysine methylation deregulation in multiple tumor types. UTX reintroduction into cancer cells with inactivating UTX mutations resulted in slowing of proliferation and marked transcriptional changes. These data identify UTX as a new human cancer gene.

    Funded by: Wellcome Trust: 077012, 088340

    Nature genetics 2009;41;5;521-3

  • Improving global and regional resolution of male lineage differentiation by simple single-copy Y-chromosomal short tandem repeat polymorphisms.

    Vermeulen M, Wollstein A, van der Gaag K, Lao O, Xue Y, Wang Q, Roewer L, Knoblauch H, Tyler-Smith C, de Knijff P and Kayser M

    Department of Forensic Molecular Biology, Erasmus University Medical Center Rotterdam, 3000 CA Rotterdam, The Netherlands.

    We analyzed 67 short tandem repeat polymorphisms from the non-recombining part of the Y-chromosome (Y-STRs), including 49 rarely studied simple single-copy (ss)Y-STRs and 18 widely used Y-STRs, in 590 males from 51 populations belonging to 8 worldwide regions (HGDP-CEPH panel). Although autosomal DNA profiling provided no evidence for close relationship, we found 18 Y-STR haplotypes (defined by 67 Y-STRs) that were shared by two to five men in 13 worldwide populations, revealing high and widespread levels of cryptic male relatedness. Maximal (95.9%) haplotype resolution was achieved with the best 25 out of 67 Y-STRs in the global dataset, and with the best 3-16 markers in regional datasets (89.6-100% resolution). From the 49 rarely studied ssY-STRs, the 25 most informative markers were sufficient to reach the highest possible male lineage differentiation in the global (92.2% resolution), and 3-15 markers in the regional datasets (85.4-100%). Considerably lower haplotype resolutions were obtained with the three commonly used Y-STR sets (Minimal Haplotype, PowerPlex Y, and AmpFlSTR Yfiler). Six ssY-STRs (DYS481, DYS533, DYS549, DYS570, DYS576 and DYS643) were most informative to supplement the existing Y-STR kits for increasing haplotype resolution, or - together with additional ssY-STRs - as a new set for maximizing male lineage differentiation. Mutation rates of the 49 ssY-STRs were estimated from 403 meiotic transfers in deep-rooted pedigrees, and ranged from approximately 4.8 x 10(-4) for 31 ssY-STRs with no mutations observed to 1.3 x 10(-2) and 1.5 x 10(-2) for DYS570 and DYS576, respectively, the latter representing the highest mutation rates reported for human Y-STRs so far. Our findings thus demonstrate that ssY-STRs are useful for maximizing global and regional resolution of male lineages, either as a new set, or when added to commonly used Y-STR sets, and support their application to forensic, genealogical and anthropological studies.

    Funded by: Wellcome Trust: 077009

    Forensic science international. Genetics 2009;3;4;205-13

  • Genome-wide association study of smoking initiation and current smoking.

    Vink JM, Smit AB, de Geus EJ, Sullivan P, Willemsen G, Hottenga JJ, Smit JH, Hoogendijk WJ, Zitman FG, Peltonen L, Kaprio J, Pedersen NL, Magnusson PK, Spector TD, Kyvik KO, Morley KI, Heath AC, Martin NG, Westendorp RG, Slagboom PE, Tiemeier H, Hofman A, Uitterlinden AG, Aulchenko YS, Amin N, van Duijn C, Penninx BW and Boomsma DI

    Department of Biological Psychology, Center for Neurogenomic and Cognitive Research, VU University Amsterdam, The Netherlands. jm.vink@psy.vu.nl

    For the identification of genes associated with smoking initiation and current smoking, genome-wide association analyses were carried out in 3497 subjects. Significant genes that replicated in three independent samples (n = 405, 5810, and 1648) were visualized into a biologically meaningful network showing cellular location and direct interaction of their proteins. Several interesting groups of proteins stood out, including glutamate receptors (e.g., GRIN2B, GRIN2A, GRIK2, GRM8), proteins involved in tyrosine kinase receptor signaling (e.g., NTRK2, GRB14), transporters (e.g., SLC1A2, SLC9A9) and cell-adhesion molecules (e.g., CDH23). We conclude that a network-based genome-wide association approach can identify genes influencing smoking behavior.

    Funded by: NIMH NIH HHS: MH074027, MH077139, MH081802; Wellcome Trust

    American journal of human genetics 2009;84;3;367-79

  • Milk and two oligosaccharides.

    Walker A

    Nature reviews. Microbiology 2009;7;7;483

  • Single domain antibodies against the collagen signalling receptor glycoprotein VI are inhibitors of collagen induced thrombus formation.

    Walker A, Pugh N, Garner SF, Stephens J, Maddox B, Ouwehand WH, Farndale RW, Steward M and Bloodomics Consortium

    Domantis Ltd., 315 Cambridge Science Park, Cambridge, UK. Adam.Walker@Domantis.com

    Human Domain Antibodies (dAbs) that bind to and inhibit the function of platelet glycoprotein VI (GPVI) have been isolated from phage display libraries and their efficacy demonstrated using in vitro models of platelet activation. Here we describe the properties of one such antibody, BLO8-1, which has been shown to specifically inhibit the binding of recombinant human GPVI to cross-linked collagen related peptide (CRP-XL) in vitro. BLO8-1 specifically binds to the platelet cell surface and prevents CRP-XL induced platelet aggregation in platelet-rich plasma, as well as inhibiting thrombus formation in whole blood under arterial shear conditions. Using a series of mutant GPVI molecules, BLO8-1 was shown to recognize an epitope within the collagen binding domain of GPVI, therefore the anti-thrombotic effect of this dAb is predicted to be due to direct blocking of the collagen-GPVI interaction. These data, together with the desirable properties of Domain Antibodies, show that dAbs could potentially be used to generate novel biopharmaceuticals with anti-thrombotic properties.

    Funded by: British Heart Foundation: RG/09/003/27122; Medical Research Council: G0500707

    Platelets 2009;20;4;268-76

  • CLIP: construction of cDNA libraries for high-throughput sequencing from RNAs cross-linked to proteins in vivo.

    Wang Z, Tollervey J, Briese M, Turner D and Ule J

    MRC-Laboratory of Molecular Biology, Hills Road, Cambridge CB2 0QH, UK.

    UV cross-linking and immunoprecipitation assay (CLIP) can identify direct interaction sites between RNA-binding proteins and RNAs in vivo, and has been used to study several proteins in tissues and cell cultures. The main challenge of the method is to specifically amplify the low amount of isolated RNA. The current protocol is optimised for efficient RNA purification and ligation of barcoded RNA adapters. High-throughput sequencing of the multiplexed cDNA library allows for a comprehensive coverage of the target sequences.

    Funded by: Medical Research Council: MC_U105185858; Wellcome Trust: 089701

    Methods (San Diego, Calif.) 2009;48;3;287-93

  • A HaemAtlas: characterizing gene expression in differentiated human blood cells.

    Watkins NA, Gusnanto A, de Bono B, De S, Miranda-Saavedra D, Hardie DL, Angenent WG, Attwood AP, Ellis PD, Erber W, Foad NS, Garner SF, Isacke CM, Jolley J, Koch K, Macaulay IC, Morley SL, Rendon A, Rice KM, Taylor N, Thijssen-Timmer DC, Tijssen MR, van der Schoot CE, Wernisch L, Winzer T, Dudbridge F, Buckley CD, Langford CF, Teichmann S, Göttgens B, Ouwehand WH and Bloodomics Consortium

    Department of Haematology, University of Cambridge, National Health Service Blood and Transplant, Cambridge, United Kingdom. naw23@cam.ac.uk

    Hematopoiesis is a carefully controlled process that is regulated by complex networks of transcription factors that are, in part, controlled by signals resulting from ligand binding to cell-surface receptors. To further understand hematopoiesis, we have compared gene expression profiles of human erythroblasts, megakaryocytes, B cells, cytotoxic and helper T cells, natural killer cells, granulocytes, and monocytes using whole genome microarrays. A bioinformatics analysis of these data was performed focusing on transcription factors, immunoglobulin superfamily members, and lineage-specific transcripts. We observed that the number of lineage-specific genes varies by 2 orders of magnitude, ranging from 5 for cytotoxic T cells to 878 for granulocytes. In addition, we have identified novel coexpression patterns for key transcription factors involved in hematopoiesis (eg, GATA3-GFI1 and GATA2-KLF1). This study represents the most comprehensive analysis of gene expression in hematopoietic cells to date and has identified genes that play key roles in lineage commitment and cell function. The data, which are freely accessible, will be invaluable for future studies on hematopoiesis and the role of specific genes and will also aid the understanding of the recent genome-wide association studies.

    Funded by: Wellcome Trust

    Blood 2009;113;19;e1-9

  • The global consequence of disruption of the AcrAB-TolC efflux pump in Salmonella enterica includes reduced expression of SPI-1 and other attributes required to infect the host.

    Webber MA, Bailey AM, Blair JM, Morgan E, Stevens MP, Hinton JC, Ivens A, Wain J and Piddock LJ

    Antimicrobial Agents Research Group, Division of Immunity and Infection, University of Birmingham, Edgbaston, Birmingham, United Kingdom.

    The mechanisms by which RND pumps contribute to pathogenicity are currently not understood. Using the AcrAB-TolC system as a paradigm multidrug-resistant efflux pump and Salmonella enterica serovar Typhimurium as a model pathogen, we have demonstrated that AcrA, AcrB, and TolC are each required for efficient adhesion to and invasion of epithelial cells and macrophages by Salmonella in vitro. In addition, AcrB and TolC are necessary for Salmonella to colonize poultry. Mutants lacking acrA, acrB, or tolC showed differential expression of major operons and proteins involved in pathogenesis. These included chemotaxis and motility genes, including cheWY and flgLMK and 14 Salmonella pathogenicity island (SPI)-1-encoded type III secretion system genes, including sopE, and associated effector proteins. Reverse transcription-PCR confirmed these data for identical mutants in two other S. Typhimurium backgrounds. Western blotting showed reduced production of SipA, SipB, and SipC. The absence of AcrB or TolC also caused widespread repression of chemotaxis and motility genes in these mutants, and for acrB::aph, this was associated with decreased motility. For mutants lacking a functional acrA or acrB gene, the nap and nir operons were repressed, and both mutants grew poorly in anaerobic conditions. All phenotypes were restored to that of the wild type by trans-complementation with the wild-type allele of the respective inactivated gene. These data explain how mutants lacking a component of AcrAB-TolC are attenuated and that this phenotype is a result of decreased expression of numerous genes encoding proteins involved in pathogenicity. The link between antibiotic resistance and pathogenicity establishes the AcrAB-TolC system as fundamental to the biology of Salmonella.

    Funded by: Biotechnology and Biological Sciences Research Council; Medical Research Council

    Journal of bacteriology 2009;191;13;4276-85

  • Swift: primary data analysis for the Illumina Solexa sequencing platform.

    Whiteford N, Skelly T, Curtis C, Ritchie ME, Löhr A, Zaranek AW, Abnizova I and Brown C

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK. nava.whiteford@nanoporetech.com

    MOTIVATION: Primary data analysis methods are of critical importance in second generation DNA sequencing. Improved methods have the potential to increase yield and reduce the error rates. Openly documented analysis tools enable the user to understand the primary data; this is important for the optimization and validity of their scientific work. RESULTS: In this article, we describe Swift, a new tool for performing primary data analysis on the Illumina Solexa Sequencing Platform. Swift is the first tool, outside of the vendor's own software, which completes the full analysis process, from raw images through to base calls. As such it provides an alternative to, and independent validation of, the vendor-supplied tool. Our results show that Swift is able to increase yield by 13.8%, at comparable error rate.

    Funded by: Wellcome Trust

    Bioinformatics (Oxford, England) 2009;25;17;2194-9

  • Comparative genomics of the emerging human pathogen Photorhabdus asymbiotica with the insect pathogen Photorhabdus luminescens.

    Wilkinson P, Waterfield NR, Crossman L, Corton C, Sanchez-Contreras M, Vlisidou I, Barron A, Bignell A, Clark L, Ormond D, Mayho M, Bason N, Smith F, Simmonds M, Churcher C, Harris D, Thompson NR, Quail M, Parkhill J and Ffrench-Constant RH

    School of Biosciences, University of Exeter in Cornwall, Penryn TR10 9EZ, UK. P.A.Wilkinson@exeter.ac.uk

    Background: The Gram-negative bacterium Photorhabdus asymbiotica (Pa) has been recovered from human infections in both North America and Australia. Recently, Pa has been shown to have a nematode vector that can also infect insects, like its sister species the insect pathogen P. luminescens (Pl). To understand the relationship between pathogenicity to insects and humans in Photorhabdus we have sequenced the complete genome of Pa strain ATCC43949 from North America. This strain (formerly referred to as Xenorhabdus luminescens strain 2) was isolated in 1977 from the blood of an 80 year old female patient with endocarditis, in Maryland, USA. Here we compare the complete genome of Pa ATCC43949 with that of the previously sequenced insect pathogen P. luminescens strain TT01 which was isolated from its entomopathogenic nematode vector collected from soil in Trinidad and Tobago.

    Results: We found that the human pathogen Pa had a smaller genome (5,064,808 bp) than that of the insect pathogen Pl (5,688,987 bp) but that each pathogen carries approximately one megabase of DNA that is unique to each strain. The reduced size of the Pa genome is associated with a smaller diversity in insecticidal genes such as those encoding the Toxin complexes (Tc's), Makes caterpillars floppy (Mcf) toxins and the Photorhabdus Virulence Cassettes (PVCs). The Pa genome, however, also shows the addition of a plasmid related to pMT1 from Yersinia pestis and several novel pathogenicity islands including a novel Type Three Secretion System (TTSS) encoding island. Together these data suggest that Pa may show virulence against man via the acquisition of the pMT1-like plasmid and specific effectors, such as SopB, that promote its persistence inside human macrophages. Interestingly the loss of insecticidal genes in Pa is not reflected by a loss of pathogenicity towards insects.

    Conclusion: Our results suggest that North American isolates of Pa have acquired virulence against man via the acquisition of a plasmid and specific virulence factors with similarity to those shown to play roles in pathogenicity against humans in other bacteria.

    Funded by: Biotechnology and Biological Sciences Research Council

    BMC genomics 2009;10;302

  • Signal initiation in biological systems: the properties and detection of transient extracellular protein interactions.

    Wright GJ

    Cell Surface Signalling Laboratory, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK. gw2@sanger.ac.uk

    Individual cells within biological systems frequently coordinate their functions through signals initiated by specific extracellular protein interactions involving receptors that bridge the cellular membrane. Due to their biochemical nature, these membrane-embedded receptor proteins are difficult to manipulate and their interactions are characterised by very weak binding strengths that cannot be detected using popular high throughput assays. This review will provide a general outline of the biochemical attributes of receptor proteins focussing in particular on the biophysical properties of their transient interactions. Methods that are able to detect these weak extracellular binding events and especially those that can be used for identifying novel interactions will be compared. Finally, I discuss the feasibility of constructing a complete and accurate extracellular protein interaction map, and the methods that are likely to be useful in achieving this goal.

    Molecular bioSystems 2009;5;12;1405-12

  • CARM1 is required in embryonic stem cells to maintain pluripotency and resist differentiation.

    Wu Q, Bruce AW, Jedrusik A, Ellis PD, Andrews RM, Langford CF, Glover DM and Zernicka-Goetz M

    Wellcome Trust and Cancer Research UK Gurdon Institute, Cambridge, United Kingdom.

    Histone H3 methylation at R17 and R26 recently emerged as a novel epigenetic mechanism regulating pluripotency in mouse embryos. Blastomeres of four-cell embryos with high H3 methylation at these sites show unrestricted potential, whereas those with lower levels cannot support development when aggregated in chimeras of like cells. Increasing histone H3 methylation, through expression of coactivator-associated-protein-arginine-methyltransferase 1 (CARM1) in embryos, elevates expression of key pluripotency genes and directs cells to the pluripotent inner cell mass. We demonstrate CARM1 is also required for the self-renewal and pluripotency of embryonic stem (ES) cells. In ES cells, CARM1 depletion downregulates pluripotency genes leading to their differentiation. CARM1 associates with Oct4/Pou5f1 and Sox2 promoters that display detectable levels of R17/26 histone H3 methylation. In CARM1 overexpressing ES cells, histone H3 arginine methylation is also at the Nanog promoter to which CARM1 now associates. Such cells express Nanog at elevated levels and delay their response to differentiation signals. Thus, like in four-cell embryo blastomeres, histone H3 arginine methylation by CARM1 in ES cells allows epigenetic modulation of pluripotency.

    Funded by: Medical Research Council: G0300723, G0800784; Wellcome Trust: 064421

    Stem cells (Dayton, Ohio) 2009;27;11;2637-45

  • Human Y chromosome base-substitution mutation rate measured by direct sequencing in a deep-rooting pedigree.

    Xue Y, Wang Q, Long Q, Ng BL, Swerdlow H, Burton J, Skuce C, Taylor R, Abdellah Z, Zhao Y, Asan, MacArthur DG, Quail MA, Carter NP, Yang H and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Hinxton, Cambs CB10 1SA, UK. ylx@sanger.ac.uk

    Understanding the key process of human mutation is important for many aspects of medical genetics and human evolution. In the past, estimates of mutation rates have generally been inferred from phenotypic observations or comparisons of homologous sequences among closely related species. Here, we apply new sequencing technology to measure directly one mutation rate, that of base substitutions on the human Y chromosome. The Y chromosomes of two individuals separated by 13 generations were flow sorted and sequenced by Illumina (Solexa) paired-end sequencing to an average depth of 11x or 20x, respectively. Candidate mutations were further examined by capillary sequencing in cell-line and blood DNA from the donors and additional family members. Twelve mutations were confirmed in approximately 10.15 Mb; eight of these had occurred in vitro and four in vivo. The latter could be placed in different positions on the pedigree and led to a mutation-rate measurement of 3.0 x 10(-8) mutations/nucleotide/generation (95% CI: 8.9 x 10(-9)-7.0 x 10(-8)), consistent with estimates of 2.3 x 10(-8)-6.3 x 10(-8) mutations/nucleotide/generation for the same Y-chromosomal region from published human-chimpanzee comparisons depending on the generation and split times assumed.

    Funded by: Wellcome Trust

    Current biology : CB 2009;19;17;1453-7

  • Generation of Paint Probes by Flow-Sorted and Microdissected Chromosomes

    Yang F, Trifonov VA, Ng BL, Kosyakova N and Carter NP

    Fluorescence In Situ Hybridization (FISH) - Application Guide. 2009;35-52

  • Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads.

    Ye K, Schulz MH, Long Q, Apweiler R and Ning Z

    EMBL Outstation European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK. k.ye@lumc.nl

    Motivation: There is a strong demand in the genomic community to develop effective algorithms to reliably identify genomic variants. Indel detection using next-gen data is difficult and identification of long structural variations is extremely challenging.

    Results: We present Pindel, a pattern growth approach, to detect breakpoints of large deletions and medium-sized insertions from paired-end short reads. We use both simulated reads and real data to demonstrate the efficiency of the computer program and accuracy of the results.

    Availability: The binary code and a short user manual can be freely downloaded from http://www.ebi.ac.uk/~kye/pindel/.

    Contact: k.ye@lumc.nl; zn1@sanger.ac.uk.

    Bioinformatics (Oxford, England) 2009;25;21;2865-71

  • Interaction of enteric bacterial pathogens with murine embryonic stem cells.

    Yu J, Rossi R, Hale C, Goulding D and Dougan G

    The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom. jy1@sanger.ac.uk

    Embryonic stem (ES) cells are susceptible to genetic manipulation and retain the potential to differentiate into diverse cell types, which are factors that make them potentially attractive cells for studying host-pathogen interactions. Murine ES cells were found to be susceptible to invasion by Salmonella enterica serovar Typhimurium and Shigella flexneri and to the formation of attaching and effacing lesions by enteropathogenic Escherichia coli. S. enterica serovar Typhimurium and S. flexneri cell entry was dependent on the Salmonella pathogenicity island 1 and Shigella mxi/spa type III secretion systems, respectively. Microscopy studies indicated that both S. enterica serovar Typhimurium and S. flexneri were located in intracellular niches in ES cells that were similar to the niches occupied in differentiated cells. ES cells were eventually killed following bacterial invasion, but no evidence of activation of classical caspase-associated apoptotic or innate immune pathways was found. To demonstrate the potential of mutant ES cells, we employed an ES cell line defective in cholesterol synthesis and found that the mutant cells were less susceptible to infection by Salmonella and Shigella than the parental ES cells. Thus, we highlighted the practical use of genetically modified ES cells for studying microbe-host interactions.

    Funded by: Wellcome Trust

    Infection and immunity 2009;77;2;585-97

  • Generation of transgene-free induced pluripotent mouse stem cells by the piggyBac transposon.

    Yusa K, Rad R, Takeda J and Bradley A

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Induced pluripotent stem cells (iPSCs) have been generated from somatic cells by transgenic expression of Oct4 (Pou5f1), Sox2, Klf4 and Myc. A major difficulty in the application of this technology for regenerative medicine, however, is the delivery of reprogramming factors. Whereas retroviral transduction increases the risk of tumorigenicity, transient expression methods have considerably lower reprogramming efficiencies. Here we describe an efficient piggyBac transposon-based approach to generate integration-free iPSCs. Transposons carrying 2A peptide-linked reprogramming factors induced reprogramming of mouse embryonic fibroblasts with equivalent efficiencies to retroviral transduction. We removed transposons from these primary iPSCs by re-expressing transposase. Transgene-free iPSCs could be identified by negative selection. piggyBac excised without a footprint, leaving the iPSC genome without any genetic alteration. iPSCs fulfilled all criteria of pluripotency, such as pluripotency gene expression, teratoma formation and contribution to chimeras. piggyBac transposon-based reprogramming may be used to generate therapeutically applicable iPSCs.

    Funded by: Wellcome Trust: 077187, WT077187

    Nature methods 2009;6;5;363-9
